Usage Guide
This guide will help you understand how to effectively use Whisper2Linux, covering basic operations, advanced features, and tips for getting the most out of your voice-controlled Linux experience.
Basic Operation
Configuring Whisper2Linux
The defaults are designed to work with the Setup Guide : Docker.
-
Open the
whisper2linux.py
file in a text editor. -
If you followed Setup Guide: Host Remotely you must update all three API_URLs:
WHISPER_API_URL = "http://localhost:9000/asr" #Automatic Speech Recognition
TTS_API_URL = "http://localhost:8000/v1/audio/speech" #Text-To-Speech
OLLAMA_API_URL = "http://localhost:11434/api/chat" #Ollama LLM Model -
Save the changes to the file.
Starting Whisper2Linux
- Finish a Setup Requirements and Install Whisper2Linux.
- Open a terminal in the Whisper2Linux directory.
- Run the following command:
python whisper2linux.py
- You should see a message indicating that Whisper2Linux is listening for the Ctrl+Alt key combination.
Activating Voice Input
- Hold down the
Ctrl
andAlt
keys simultaneously. - You'll hear a beep, indicating that Whisper2Linux is ready to listen.
- Speak your command or text.
- Release the
Ctrl
andAlt
keys to process your input.
Command Types
Whisper2Linux supports several types of voice inputs:
1. Basic Typing
To have Whisper2Linux type out exactly what you say:
- Example: Say "How far is it from San Francisco to Paris?"
- Result: Whisper2Linux will type "How far is it from San Francisco to Paris?" at the current cursor position.
2. Direct AI Assistant Response
To get a direct response from the AI assistant:
- Example: Say "Olga: How far is it from San Francisco to Paris?"
- Result: The AI assistant (Olga) will respond with information about the distance between San Francisco and Paris.
3. Type and Respond
To both type your question and get an AI response:
- Example: Say "How far is it from San Francisco to Paris? Olga respond."
- Result: Whisper2Linux will type your question and then Olga will speak the response aloud.
4. Web Search
To perform a web search:
- Example: Say "Olga: Search for the latest Linux kernel release."
- Result: Olga will search the web and provide a summary of the latest Linux kernel release information.
5. System Commands
Whisper2Linux can execute various system commands:
-
Copy Text:
- Say: "Olga: Copy this to clipboard."
- Result: The spoken text will be copied to Whisper2Linux's in-memory clipboard.
-
Paste Text:
- Say: "Olga: Paste."
- Result: Whisper2Linux will paste the text stored in its in-memory clipboard at the current cursor position.
6. Keyboard Shortcuts
You can trigger keyboard shortcuts using voice commands:
-
Example: Say "Olga: Press Enter."
-
Result: Whisper2Linux will simulate pressing the Enter key.
-
Example: Say "Olga: Select all."
-
Result: Whisper2Linux will simulate pressing Ctrl+A to select all text in the current application.
Advanced Features
Fuzzy Command Matching
Whisper2Linux uses fuzzy matching to interpret commands, allowing for natural variations in speech. This means you don't have to be exact with your command wording.
Continuous Audio Processing
The application records and processes audio in chunks, enabling real-time responsiveness to your voice inputs.
Context-Aware Responses
When using the AI assistant (Olga), it maintains context from previous interactions within the same session, allowing for more natural, conversational interactions.
Tips for Effective Use
-
Speak Clearly: While the speech recognition is robust, clear enunciation will improve accuracy.
-
Use the Trigger Word: When you want a direct AI response, start your query with the trigger word (default is "Olga").
-
Experiment with Commands: Try variations of commands to see what works best for you. The fuzzy matching system is designed to be flexible.
-
Customize Common Commands: If you find yourself using certain phrases frequently, consider adding custom commands to streamline your workflow.
-
Monitor Performance: Pay attention to the performance logs to identify any bottlenecks or areas for optimization.
Troubleshooting
If you encounter issues while using Whisper2Linux:
-
No Response to Voice Input:
- Ensure your microphone is properly connected and selected as the default input device.
- Check if the
Ctrl
andAlt
keys are being detected by watching the console output.
-
Incorrect Transcriptions:
- Try speaking more slowly and clearly.
- Adjust your microphone position or settings.
- Consider using a higher quality microphone if problems persist.
-
Slow Response Times:
- Check your internet connection, as API calls require a stable connection.
- Ensure your system meets the recommended hardware requirements.
- Consider using a more powerful GPU if processing large language models locally.
-
Unexpected Command Execution:
- Review the fuzzy matching thresholds in the code and adjust if necessary.
- Be more specific with your commands, especially for system operations.
-
Audio Playback Issues:
- Verify that your system's audio output is properly configured.
- Check the console for any error messages related to audio playback.
Customization
Whisper2Linux is designed to be customizable. You can:
- Modify the
TRIGGER_WORD
in the code to change the AI assistant's name. - Add new commands by extending the
commands
dictionary in the source code. - Adjust audio settings like
SAMPLE_RATE
andRECORDING_CHUNK_DURATION
for different audio quality and latency trade-offs. - Fine-tune
ACTIVATION_DELAY
andRESUME_DELAY
to adjust the key press sensitivity.
For more detailed customization options, refer to the Customization Guide.
Logging and Debugging
To help with troubleshooting and optimization, Whisper2Linux offers logging options:
- Run with memory logging:
python whisper2linux.py --log memory
- Run with file logging:
python whisper2linux.py --log file --log-file /path/to/logfile.log
These logs can provide valuable insights into the application's performance and any issues that may occur.
Security and Privacy Considerations
Remember that Whisper2Linux is designed with privacy in mind:
- Audio data is processed in-memory and not stored persistently.
- By default, no logs are kept unless explicitly enabled.
- The application only monitors the
Ctrl
andAlt
keys for activation.
However, be mindful of sensitive information when using voice commands in public or shared spaces.
Conclusion
Whisper2Linux offers a powerful way to interact with your Linux desktop using voice commands. By understanding its features and following these usage guidelines, you can significantly enhance your productivity and enjoy a more accessible computing experience. Don't hesitate to explore, customize, and make Whisper2Linux work best for your unique needs!