Customization Guide

Whisper2Linux is designed to be customized, so you can tailor it to your own needs and preferences. This guide walks through the main customization options and shows how to implement each one.

1. Changing the Trigger Word

The trigger word (default: "Olga") can be easily changed:

Open whisper2linux.py
Locate the TRIGGER_WORD variable
Change its value to your preferred trigger word:

TRIGGER_WORD = "your_preferred_trigger_word"
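How the trigger word is matched against a transcription is not shown in this guide. The sketch below assumes a case-insensitive match on the first word, optionally followed by punctuation; `split_trigger` is a hypothetical helper for testing a new trigger word, not a function from whisper2linux.py.

```python
import re

TRIGGER_WORD = "Olga"

def split_trigger(transcription, trigger_word=TRIGGER_WORD):
    """Return the text after the trigger word, or None if it is absent.

    Hypothetical helper: assumes the trigger must be the first word and
    may be followed by punctuation such as ':' or ','.
    """
    pattern = rf"^\s*{re.escape(trigger_word)}\b[\s,:.!?-]*(.*)$"
    match = re.match(pattern, transcription, flags=re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None
```

Short, distinctive words tend to work best here, because a trigger that also occurs at the start of ordinary sentences would match more often than intended.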

2. Customizing the AI Assistant's Personality

To modify the AI assistant's personality and behavior:

Find the SYSTEM_MESSAGE variable
Edit the content to reflect the desired personality and capabilities:

SYSTEM_MESSAGE = """
You are a helpful AI assistant named {trigger_word}. Your role is to...
[Add your custom instructions here]
"""
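The `{trigger_word}` placeholder suggests the template is filled in at runtime. Assuming this is done with `str.format` (the actual mechanism in whisper2linux.py may differ), a minimal sketch looks like this; `build_system_message` is a hypothetical name.

```python
TRIGGER_WORD = "Olga"

SYSTEM_MESSAGE = """
You are a helpful AI assistant named {trigger_word}. Keep answers short.
"""

def build_system_message(template=SYSTEM_MESSAGE, trigger_word=TRIGGER_WORD):
    """Fill the {trigger_word} placeholder and trim surrounding whitespace."""
    return template.format(trigger_word=trigger_word).strip()
```

If the template is processed with `str.format`, any literal braces in your custom instructions need to be doubled (`{{` and `}}`), otherwise they are treated as placeholders.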

3. Adding Custom Commands

To add new voice commands:

Define a new function for your command:

def cmd_custom_action():
    # Implement your custom action here
    print("Executing custom action")

Add the new command to the commands dictionary:

commands.update({
    "custom action": cmd_custom_action
})

You can now say "Olga: Custom action" to trigger the new command.
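The dictionary key is lowercase while the spoken phrase may come back capitalized or with trailing punctuation. The sketch below shows one way such a lookup could be normalized; `dispatch` is a hypothetical function and the real lookup in whisper2linux.py may work differently.

```python
def cmd_custom_action():
    return "Executing custom action"

commands = {"custom action": cmd_custom_action}

def dispatch(spoken_text, registry=commands):
    """Look up a command by its normalized phrase and run it.

    Hypothetical dispatcher: lowercases the text and strips trailing
    punctuation before the lookup. Returns None for unknown phrases.
    """
    key = spoken_text.strip().lower().rstrip(".!?")
    handler = registry.get(key)
    return handler() if handler else None
```

Under this assumption, keys in the commands dictionary should be written in lowercase and without punctuation.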

4. Modifying Existing Commands

To change the behavior of existing commands:

Locate the function for the command you want to modify (e.g., cmd_copy)
Edit the function to implement the desired behavior

Example: Modifying the copy command to append a timestamp:

import logging
import time

def cmd_copy(transcription):
    # Append the time of the copy to the stored text
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
    state.in_memory_clipboard = f"{transcription} (Copied at {timestamp})"
    logging.debug(f"Copied to in-memory clipboard: {state.in_memory_clipboard}")

5. Adjusting Audio Settings

To customize audio recording and playback:

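Modify the SAMPLE_RATE variable to change the audio quality. Whisper models work on 16 kHz audio internally, so a higher rate does not necessarily improve transcription, and your transcription backend may need to resample: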

SAMPLE_RATE = 44100  # CD quality audio

Adjust the RECORDING_CHUNK_DURATION for different responsiveness:

RECORDING_CHUNK_DURATION = 0.05  # Shorter chunks for faster processing
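The two settings interact: the number of audio frames in each chunk is the sample rate multiplied by the chunk duration. The helpers below are hypothetical and only illustrate the arithmetic; the byte size assumes 16-bit mono PCM, which may not match your recording format.

```python
def frames_per_chunk(sample_rate, chunk_duration):
    """Number of audio frames in one recording chunk."""
    return int(round(sample_rate * chunk_duration))

def chunk_bytes(sample_rate, chunk_duration, channels=1, bytes_per_sample=2):
    """Raw size of one chunk, assuming 16-bit mono PCM by default."""
    return frames_per_chunk(sample_rate, chunk_duration) * channels * bytes_per_sample
```

With the values shown above, 44100 Hz and 0.05 s give 2205 frames per chunk. Halving the duration halves the chunk size and doubles how often chunks are handled.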

6. Customizing API Endpoints

To use different API services:

Update the API URL variables:

WHISPER_API_URL = "https://your-custom-whisper-api.com/transcribe"
TTS_API_URL = "https://your-custom-tts-api.com/synthesize"
OLLAMA_API_URL = "https://your-custom-ollama-api.com/chat"

Modify the corresponding API call functions to match the new API's requirements
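If you switch endpoints often, one option is to read the URLs from environment variables instead of editing the source each time. This is an optional pattern, not something Whisper2Linux is known to do; the environment variable names simply mirror the variable names above, and `load_api_urls` is a hypothetical helper.

```python
import os

# Defaults are the placeholder URLs from this guide.
DEFAULT_URLS = {
    "WHISPER_API_URL": "https://your-custom-whisper-api.com/transcribe",
    "TTS_API_URL": "https://your-custom-tts-api.com/synthesize",
    "OLLAMA_API_URL": "https://your-custom-ollama-api.com/chat",
}

def load_api_urls(environ=os.environ, defaults=DEFAULT_URLS):
    """Return each URL from the environment, falling back to the default."""
    return {name: environ.get(name, default) for name, default in defaults.items()}
```

Calling `load_api_urls()` once at startup and assigning the results to the three variables keeps the rest of the code unchanged.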

7. Implementing Custom Error Handling

To add custom error handling:

Create a custom error handling function:

def custom_error_handler(error_type, error_message):
    logging.error(f"Custom error handler: {error_type} - {error_message}")
    # Implement your custom error handling logic here

Use this function in try-except blocks throughout the code:

try:
    # Some operation
    pass  # Replace with the code that may fail
except Exception as e:
    custom_error_handler("OperationError", str(e))
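Repeating the same try-except block in many places gets tedious. One possible alternative is a decorator that routes exceptions to the handler; `handle_errors` and `risky_transcribe` are hypothetical names used only for illustration.

```python
import functools
import logging

def custom_error_handler(error_type, error_message):
    logging.error(f"Custom error handler: {error_type} - {error_message}")

def handle_errors(error_type, default=None):
    """Decorator: send any exception from the wrapped function to
    custom_error_handler and return `default` instead of raising."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                custom_error_handler(error_type, str(e))
                return default
        return wrapper
    return decorator

@handle_errors("TranscriptionError", default="")
def risky_transcribe(audio):
    # Stand-in for a real operation that may fail.
    if not audio:
        raise ValueError("empty audio buffer")
    return "ok"
```

Swallowing every exception can hide real bugs, so this pattern is best reserved for operations where a fallback value is genuinely acceptable.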

8. Adding Custom Logging

To implement custom logging:

Create a custom logger:

import logging

def setup_custom_logger():
    logger = logging.getLogger("WhisperLinuxCustomLogger")
    logger.setLevel(logging.DEBUG)
    # Attach the file handler only once, so repeated calls do not duplicate log lines
    if not logger.handlers:
        handler = logging.FileHandler("custom_whisperlinux.log")
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger

custom_logger = setup_custom_logger()

Use the custom logger throughout the code:

custom_logger.debug("Debug message")
custom_logger.info("Info message")
custom_logger.warning("Warning message")
custom_logger.error("Error message")

9. Customizing the Activation Method

To change how Whisper2Linux is activated:

Modify the handle_recording function:

def handle_recording():
    # Implement your custom activation logic here
    # For example, using a specific hotkey or phrase
    if custom_activation_condition:
        state.recording_started = True
        play_beep()
        state.audio_data = []
        state.last_recording_time = time.time()
        record_audio_continuously()

Update the main event loop to use your new activation method.
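The `custom_activation_condition` placeholder above is left for you to define. As one illustration, the sketch below detects a double press within a short time window; `DoublePressDetector` is hypothetical, and how key events reach it depends on the input library you use.

```python
import time

class DoublePressDetector:
    """Report True when two presses arrive within `window` seconds."""

    def __init__(self, window=0.4, clock=time.monotonic):
        self.window = window
        self.clock = clock  # injectable so the logic can be tested without waiting
        self._last_press = None

    def press(self):
        now = self.clock()
        is_double = (
            self._last_press is not None
            and (now - self._last_press) <= self.window
        )
        # Reset after a double press so a third press starts a new pair.
        self._last_press = None if is_double else now
        return is_double
```

The return value of `press()` could then serve as the activation condition, called from whatever key handler your setup provides.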

10. Implementing Custom Text Processing

To add custom text processing before executing commands:

Create a new function for text processing:

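import re

def custom_text_processor(text):
    # Implement your custom text processing here
    # For example, removing filler words or correcting common mistakes
    # \b matches whole words only, so "um" is not stripped out of "summer"
    without_fillers = re.sub(r"\b(?:um|uh)\b[,.]?", "", text.lower())
    # Collapse the extra spaces left behind by the removed words
    processed_text = re.sub(r"\s+", " ", without_fillers).strip()
    return processed_text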

Integrate this function into the command processing pipeline:

def process_transcription(transcription):
    processed_transcription = custom_text_processor(transcription)
    # Rest of the processing logic
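Correcting common mistakes, the other use mentioned above, can follow the same idea. The sketch below applies a table of known misrecognitions; the entries in `CORRECTIONS` are made-up examples and `apply_corrections` is a hypothetical helper.

```python
import re

# Hypothetical examples of recurring misrecognitions; replace with your own.
CORRECTIONS = {
    "whisper to linux": "Whisper2Linux",
    "get hub": "GitHub",
}

def apply_corrections(text, corrections=CORRECTIONS):
    """Replace each known misrecognition, matching whole words only."""
    for wrong, right in corrections.items():
        pattern = rf"\b{re.escape(wrong)}\b"
        # A function replacement avoids backslash handling in `right`.
        text = re.sub(pattern, lambda m, r=right: r, text, flags=re.IGNORECASE)
    return text
```

Matching on word boundaries keeps a correction such as "get hub" from firing inside unrelated text like "forget hubris".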

11. Adding Custom Shortcuts

To implement custom keyboard shortcuts:

Define new shortcut functions:

import subprocess

def custom_shortcut_1():
    # Implement custom shortcut action
    subprocess.run(["xdotool", "key", "ctrl+alt+1"])

def custom_shortcut_2():
    # Another custom shortcut
    subprocess.run(["xdotool", "key", "ctrl+alt+2"])

Add these to the commands dictionary:

commands.update({
    "shortcut one": custom_shortcut_1,
    "shortcut two": custom_shortcut_2
})
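When you need many shortcuts, writing one function per key combination becomes repetitive. A small factory can generate them instead; `make_shortcut` and `shortcut_commands` are hypothetical names, and the sketch assumes xdotool is installed, as in the functions above.

```python
import subprocess

def make_shortcut(keys, runner=subprocess.run):
    """Return a command function that sends `keys` through xdotool.

    `runner` is injectable so the factory can be exercised without
    xdotool installed; by default it is subprocess.run.
    """
    def shortcut():
        return runner(["xdotool", "key", keys], check=False)
    return shortcut

shortcut_commands = {
    "shortcut one": make_shortcut("ctrl+alt+1"),
    "shortcut two": make_shortcut("ctrl+alt+2"),
}
```

The resulting dictionary can be merged into the existing one with `commands.update(shortcut_commands)`.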

12. Customizing the User Interface

While Whisper2Linux is primarily a voice-controlled application, you might want to add a simple GUI for configuration or status display:

Install tkinter, which Debian and Ubuntu package separately from Python:

sudo apt-get install python3-tk

Implement a basic GUI:

import threading
import tkinter as tk

def create_gui():
    root = tk.Tk()
    root.title("Whisper2Linux Status")
    
    status_label = tk.Label(root, text="Whisper2Linux is running")
    status_label.pack()
    
    stop_button = tk.Button(root, text="Stop", command=stop_whisper2linux)
    stop_button.pack()
    
    root.mainloop()

def stop_whisper2linux():
    # Implement stop logic here
    print("Stopping Whisper2Linux")
    # You might want to set a flag to stop the main loop

# Run the GUI in a separate thread
gui_thread = threading.Thread(target=create_gui)
gui_thread.start()
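The stop logic is left open above, with a hint to set a flag for the main loop. One common way to share such a flag between threads is `threading.Event`. The sketch below uses a stand-in `main_loop`, since the real loop in whisper2linux.py is not shown here.

```python
import threading

stop_event = threading.Event()

def stop_whisper2linux():
    """Ask the main loop to exit; safe to call from the GUI thread."""
    stop_event.set()

def main_loop(poll_interval=0.01, on_tick=None):
    """Stand-in main loop that runs until stop_event is set."""
    ticks = 0
    while not stop_event.is_set():
        if on_tick:
            on_tick()
        ticks += 1
        # wait() returns early as soon as the event is set.
        stop_event.wait(poll_interval)
    return ticks
```

Using `stop_event.wait()` instead of `time.sleep()` lets the loop react to the Stop button without waiting out the full interval.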

13. Implementing Custom Wake Word Detection

To use a custom wake word instead of key presses:

Implement a wake word detection function:

def detect_wake_word(audio_chunk):
    # Implement wake word detection logic
    # This could use a pre-trained model or a simple energy threshold
    wake_word_detected = False  # Placeholder: replace with your detector's result
    return wake_word_detected

# In the main audio processing loop:
if detect_wake_word(audio_chunk):
    handle_recording()
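The simple energy threshold mentioned above can be sketched as follows. Note that an energy check only detects that something loud enough was said, not which word, so it is better seen as a gate in front of a real wake word model. The function names and the threshold value are assumptions, and samples are assumed to be floats in the range [-1, 1].

```python
import math

def rms_energy(samples):
    """Root-mean-square amplitude of a chunk of float samples in [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(samples, threshold=0.02):
    """True when the chunk is louder than `threshold` (an assumed value)."""
    return rms_energy(samples) > threshold
```

A suitable threshold depends on the microphone and room noise, so it is worth logging `rms_energy` for a few seconds of silence and speech before picking a value.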

14. Adding Support for Multiple Languages

To support multiple languages:

Modify the Whisper API call to specify the language:

def transcribe_audio_from_memory(audio_buffer, language='en'):
    params = {
        'task': 'transcribe',
        'language': language,
        'output': 'txt',
        'encode': False
    }
    # Rest of the function remains the same

Implement a language detection or selection mechanism:

def detect_language(audio_chunk):
    # Implement language detection logic
    # This could be based on the user's settings or automatic detection
    detected_language = 'en'  # Placeholder: replace with the detected or configured language
    return detected_language

# In the main processing loop:
detected_lang = detect_language(audio_chunk)
transcription = transcribe_audio_from_memory(audio_buffer, language=detected_lang)
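For the settings-based variant, it helps to normalize whatever the user configured before passing it to the transcription call. The sketch below is one way to do that; `select_language` is hypothetical and `SUPPORTED_LANGUAGES` is an assumed subset that you would extend to match your Whisper backend.

```python
# Assumed subset of language codes; extend to match your Whisper backend.
SUPPORTED_LANGUAGES = {"en", "de", "fr", "es"}

def select_language(preferred, supported=SUPPORTED_LANGUAGES, default="en"):
    """Normalize a code such as 'de-DE' or 'FR' and fall back to `default`."""
    if not preferred:
        return default
    code = preferred.strip().lower().replace("_", "-").split("-")[0]
    return code if code in supported else default
```

Falling back to a default avoids sending an unsupported code to the API when a setting is missing or mistyped.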

Conclusion

These customization options provide a starting point for tailoring Whisper2Linux to your specific needs. Remember to thoroughly test any changes you make to ensure they don't negatively impact the application's performance or reliability. As you become more familiar with the codebase, you'll likely discover even more ways to customize and extend Whisper2Linux's functionality.

Always back up your original code before making significant changes, and consider using version control (like Git) to manage your customizations. This will allow you to easily revert changes if needed and keep track of your modifications over time.

Happy customizing!
