Skip to main content

Usage Guide

This guide will help you understand how to effectively use Whisper2Linux, covering basic operations, advanced features, and tips for getting the most out of your voice-controlled Linux experience.

Basic Operation

Configuring Whisper2Linux

The defaults are designed to work with the Setup Guide : Docker.

  1. Open the whisper2linux.py file in a text editor.

  2. If you followed Setup Guide: Host Remotely you must update all three API_URLs:

    WHISPER_API_URL = "http://localhost:9000/asr" #Automatic Speech Recognition
    TTS_API_URL = "http://localhost:8000/v1/audio/speech" #Text-To-Speech
    OLLAMA_API_URL = "http://localhost:11434/api/chat" #Ollama LLM Model
  3. Save the changes to the file.

Starting Whisper2Linux

  1. Finish a Setup Requirements and Install Whisper2Linux.
  2. Open a terminal in the Whisper2Linux directory.
  3. Run the following command:
    python whisper2linux.py
  4. You should see a message indicating that Whisper2Linux is listening for the Ctrl+Alt key combination.

Activating Voice Input

  1. Hold down the Ctrl and Alt keys simultaneously.
  2. You'll hear a beep, indicating that Whisper2Linux is ready to listen.
  3. Speak your command or text.
  4. Release the Ctrl and Alt keys to process your input.

Command Types

Whisper2Linux supports several types of voice inputs:

1. Basic Typing

To have Whisper2Linux type out exactly what you say:

  • Example: Say "How far is it from San Francisco to Paris?"
  • Result: Whisper2Linux will type "How far is it from San Francisco to Paris?" at the current cursor position.

2. Direct AI Assistant Response

To get a direct response from the AI assistant:

  • Example: Say "Olga: How far is it from San Francisco to Paris?"
  • Result: The AI assistant (Olga) will respond with information about the distance between San Francisco and Paris.

3. Type and Respond

To both type your question and get an AI response:

  • Example: Say "How far is it from San Francisco to Paris? Olga respond."
  • Result: Whisper2Linux will type your question and then Olga will speak the response aloud.

To perform a web search:

  • Example: Say "Olga: Search for the latest Linux kernel release."
  • Result: Olga will search the web and provide a summary of the latest Linux kernel release information.

5. System Commands

Whisper2Linux can execute various system commands:

  • Copy Text:

    • Say: "Olga: Copy this to clipboard."
    • Result: The spoken text will be copied to Whisper2Linux's in-memory clipboard.
  • Paste Text:

    • Say: "Olga: Paste."
    • Result: Whisper2Linux will paste the text stored in its in-memory clipboard at the current cursor position.

6. Keyboard Shortcuts

You can trigger keyboard shortcuts using voice commands:

  • Example: Say "Olga: Press Enter."

  • Result: Whisper2Linux will simulate pressing the Enter key.

  • Example: Say "Olga: Select all."

  • Result: Whisper2Linux will simulate pressing Ctrl+A to select all text in the current application.

Advanced Features

Fuzzy Command Matching

Whisper2Linux uses fuzzy matching to interpret commands, allowing for natural variations in speech. This means you don't have to be exact with your command wording.

Continuous Audio Processing

The application records and processes audio in chunks, enabling real-time responsiveness to your voice inputs.

Context-Aware Responses

When using the AI assistant (Olga), it maintains context from previous interactions within the same session, allowing for more natural, conversational interactions.

Tips for Effective Use

  1. Speak Clearly: While the speech recognition is robust, clear enunciation will improve accuracy.

  2. Use the Trigger Word: When you want a direct AI response, start your query with the trigger word (default is "Olga").

  3. Experiment with Commands: Try variations of commands to see what works best for you. The fuzzy matching system is designed to be flexible.

  4. Customize Common Commands: If you find yourself using certain phrases frequently, consider adding custom commands to streamline your workflow.

  5. Monitor Performance: Pay attention to the performance logs to identify any bottlenecks or areas for optimization.

Troubleshooting

If you encounter issues while using Whisper2Linux:

  1. No Response to Voice Input:

    • Ensure your microphone is properly connected and selected as the default input device.
    • Check if the Ctrl and Alt keys are being detected by watching the console output.
  2. Incorrect Transcriptions:

    • Try speaking more slowly and clearly.
    • Adjust your microphone position or settings.
    • Consider using a higher quality microphone if problems persist.
  3. Slow Response Times:

    • Check your internet connection, as API calls require a stable connection.
    • Ensure your system meets the recommended hardware requirements.
    • Consider using a more powerful GPU if processing large language models locally.
  4. Unexpected Command Execution:

    • Review the fuzzy matching thresholds in the code and adjust if necessary.
    • Be more specific with your commands, especially for system operations.
  5. Audio Playback Issues:

    • Verify that your system's audio output is properly configured.
    • Check the console for any error messages related to audio playback.

Customization

Whisper2Linux is designed to be customizable. You can:

  1. Modify the TRIGGER_WORD in the code to change the AI assistant's name.
  2. Add new commands by extending the commands dictionary in the source code.
  3. Adjust audio settings like SAMPLE_RATE and RECORDING_CHUNK_DURATION for different audio quality and latency trade-offs.
  4. Fine-tune ACTIVATION_DELAY and RESUME_DELAY to adjust the key press sensitivity.

For more detailed customization options, refer to the Customization Guide.

Logging and Debugging

To help with troubleshooting and optimization, Whisper2Linux offers logging options:

  • Run with memory logging: python whisper2linux.py --log memory
  • Run with file logging: python whisper2linux.py --log file --log-file /path/to/logfile.log

These logs can provide valuable insights into the application's performance and any issues that may occur.

Security and Privacy Considerations

Remember that Whisper2Linux is designed with privacy in mind:

  • Audio data is processed in-memory and not stored persistently.
  • By default, no logs are kept unless explicitly enabled.
  • The application only monitors the Ctrl and Alt keys for activation.

However, be mindful of sensitive information when using voice commands in public or shared spaces.

Conclusion

Whisper2Linux offers a powerful way to interact with your Linux desktop using voice commands. By understanding its features and following these usage guidelines, you can significantly enhance your productivity and enjoy a more accessible computing experience. Don't hesitate to explore, customize, and make Whisper2Linux work best for your unique needs!