Customization Guide
Whisper2Linux is designed to be highly customizable, allowing users to tailor the application to their specific needs and preferences. This guide will walk you through various customization options and how to implement them.
1. Changing the Trigger Word
The trigger word (default: "Olga") can be easily changed:
- Open whisper2linux.py
- Locate the TRIGGER_WORD variable
- Change its value to your preferred trigger word:
TRIGGER_WORD = "your_preferred_trigger_word"
2. Customizing the AI Assistant's Personality
To modify the AI assistant's personality and behavior:
- Find the SYSTEM_MESSAGE variable
- Edit the content to reflect the desired personality and capabilities:
SYSTEM_MESSAGE = """
You are a helpful AI assistant named {trigger_word}. Your role is to...
[Add your custom instructions here]
"""
3. Adding Custom Commands
To add new voice commands:
- Define a new function for your command:
def cmd_custom_action():
    # Implement your custom action here
    print("Executing custom action")
- Add the new command to the commands dictionary:
commands.update({
    "custom action": cmd_custom_action
})
Now you can use "Olga: Custom action" to trigger your new command.
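The lookup that connects a spoken phrase to an entry in the commands dictionary is not shown in this guide. The following is a minimal sketch of how such a lookup could work, assuming transcriptions have the form "Olga: Custom action" and dictionary keys are lowercase. `dispatch_command` is a hypothetical helper for illustration, not a function from whisper2linux.py, and the command here returns a string only so the result is easy to inspect.

```python
TRIGGER_WORD = "Olga"

def cmd_custom_action():
    return "Executing custom action"

commands = {"custom action": cmd_custom_action}

def dispatch_command(transcription, trigger_word=TRIGGER_WORD):
    """Hypothetical helper: run the command spoken after the trigger word."""
    text = transcription.strip()
    # Ignore anything that does not begin with the trigger word (case-insensitive)
    if not text.lower().startswith(trigger_word.lower()):
        return None
    # Drop the trigger word, then separators and trailing punctuation
    phrase = text[len(trigger_word):].lstrip(" :,").rstrip(" .!?").lower()
    handler = commands.get(phrase)
    return handler() if handler else None

result = dispatch_command("Olga: Custom action")
```

Normalizing case and punctuation before the lookup matters because speech-to-text output often adds capitals, commas, or a trailing period.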
4. Modifying Existing Commands
To change the behavior of existing commands:
- Locate the function for the command you want to modify (e.g., cmd_copy)
- Edit the function to implement the desired behavior
Example: Modifying the copy command to append a timestamp:
import logging
import time

def cmd_copy(transcription):
    # Append the current local time to the copied text
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
    state.in_memory_clipboard = f"{transcription} (Copied at {timestamp})"
    logging.debug(f"Copied to in-memory clipboard: {state.in_memory_clipboard}")
5. Adjusting Audio Settings
To customize audio recording and playback:
- Modify the SAMPLE_RATE variable to change the audio quality:
SAMPLE_RATE = 44100  # CD quality audio
- Adjust the RECORDING_CHUNK_DURATION variable to change responsiveness:
RECORDING_CHUNK_DURATION = 0.05 # Shorter chunks for faster processing
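Together, these two settings determine how many samples each recorded chunk contains. The arithmetic below is a small worked example, assuming RECORDING_CHUNK_DURATION is expressed in seconds (the guide does not state the unit). `samples_per_chunk` is a hypothetical helper name.

```python
SAMPLE_RATE = 44100               # samples per second
RECORDING_CHUNK_DURATION = 0.05   # assumed to be seconds

def samples_per_chunk(sample_rate, chunk_duration):
    """Number of audio samples in one chunk, rounded to a whole sample."""
    return round(sample_rate * chunk_duration)

chunk_samples = samples_per_chunk(SAMPLE_RATE, RECORDING_CHUNK_DURATION)
chunks_per_second = round(1 / RECORDING_CHUNK_DURATION)
```

With the values above, each chunk holds 2205 samples and the recorder handles 20 chunks per second. Halving the chunk duration doubles the number of chunks to process.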
6. Customizing API Endpoints
To use different API services:
- Update the API URL variables:
WHISPER_API_URL = "https://your-custom-whisper-api.com/transcribe"
TTS_API_URL = "https://your-custom-tts-api.com/synthesize"
OLLAMA_API_URL = "https://your-custom-ollama-api.com/chat"
- Modify the corresponding API call functions to match the new API's requirements
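If you switch between endpoints often, one option is to read the URLs from environment variables rather than editing the file each time. This is a sketch only. The environment variable names and the `get_api_url` helper are assumptions for illustration and are not part of Whisper2Linux.

```python
import os

def get_api_url(env_name, default):
    """Return the URL set in the environment, or the default if unset or empty."""
    value = os.environ.get(env_name, "").strip()
    return value or default

# Hypothetical variable names; the defaults mirror the placeholders above
WHISPER_API_URL = get_api_url("WHISPER_API_URL", "https://your-custom-whisper-api.com/transcribe")
TTS_API_URL = get_api_url("TTS_API_URL", "https://your-custom-tts-api.com/synthesize")
OLLAMA_API_URL = get_api_url("OLLAMA_API_URL", "https://your-custom-ollama-api.com/chat")
```

Setting, for example, `WHISPER_API_URL` in your shell before launching the script would then override the default without a code change.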
7. Implementing Custom Error Handling
To add custom error handling:
- Create a custom error handling function:
def custom_error_handler(error_type, error_message):
    logging.error(f"Custom error handler: {error_type} - {error_message}")
    # Implement your custom error handling logic here
- Use this function in try-except blocks throughout the code:
try:
    pass  # Replace with the operation you want to guard
except Exception as e:
    custom_error_handler("OperationError", str(e))
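Repeating the same try-except block around many functions gets tedious. A decorator is one way to apply the handler consistently. The sketch below assumes you are willing to return a fallback value when an operation fails. `handle_errors` and `risky_operation` are hypothetical names used only for illustration.

```python
import functools
import logging

def custom_error_handler(error_type, error_message):
    logging.error(f"Custom error handler: {error_type} - {error_message}")

def handle_errors(error_type, fallback=None):
    """Hypothetical decorator: route exceptions to custom_error_handler."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                custom_error_handler(error_type, str(e))
                return fallback
        return wrapper
    return decorator

@handle_errors("OperationError", fallback="")
def risky_operation(value):
    return str(10 / value)
```

Calling `risky_operation(0)` logs the division error through the handler and returns the fallback instead of raising.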
8. Adding Custom Logging
To implement custom logging:
- Create a custom logger:
import logging

def setup_custom_logger():
    logger = logging.getLogger("WhisperLinuxCustomLogger")
    handler = logging.FileHandler("custom_whisperlinux.log")
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger

custom_logger = setup_custom_logger()
- Use the custom logger throughout the code:
custom_logger.debug("Debug message")
custom_logger.info("Info message")
custom_logger.warning("Warning message")
custom_logger.error("Error message")
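One thing to watch for: `logging.getLogger` returns the same logger object for a given name, so calling a setup function twice attaches two handlers and every message is written twice. The variant below is a sketch that guards against this. The `name` and `log_path` parameters are additions for illustration, not part of the original function.

```python
import logging

def setup_custom_logger(name="WhisperLinuxCustomLogger",
                        log_path="custom_whisperlinux.log"):
    """Create the logger once; later calls reuse the existing handler."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # only attach a handler on the first call
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger
```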
9. Customizing the Activation Method
To change how Whisper2Linux is activated:
- Modify the handle_recording function:
def handle_recording():
    # Implement your custom activation logic here
    # For example, using a specific hotkey or phrase
    if custom_activation_condition:
        state.recording_started = True
        play_beep()
        state.audio_data = []
        state.last_recording_time = time.time()
        record_audio_continuously()
- Update the main event loop to use your new activation method.
10. Implementing Custom Text Processing
To add custom text processing before executing commands:
- Create a new function for text processing:
import re

def custom_text_processor(text):
    # Implement your custom text processing here
    # For example, removing filler words or correcting common mistakes
    # \b matches whole words only, so "umbrella" and "huh" are left intact
    processed_text = re.sub(r"\b(um|uh)\b", "", text.lower())
    # Collapse the extra spaces left behind by removed words
    return " ".join(processed_text.split())
- Integrate this function into the command processing pipeline:
def process_transcription(transcription):
    processed_transcription = custom_text_processor(transcription)
    # Rest of the processing logic
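The text processor can also correct phrases the speech recognizer commonly gets wrong. The sketch below uses a small lookup table. The entries in `CORRECTIONS` are invented examples, and `apply_corrections` is a hypothetical helper, so replace both with the mistakes you actually observe.

```python
import re

# Hypothetical examples of misheard phrases mapped to the intended text
CORRECTIONS = {
    "custom auction": "custom action",
    "short cut": "shortcut",
}

def apply_corrections(text, corrections=CORRECTIONS):
    """Replace each misheard phrase, matching whole words case-insensitively."""
    for wrong, right in corrections.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        text = re.sub(pattern, right, text, flags=re.IGNORECASE)
    return text

fixed = apply_corrections("Custom auction")
```

Running the corrections before the command lookup means a misheard phrase can still reach the right entry in the commands dictionary.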
11. Adding Custom Shortcuts
To implement custom keyboard shortcuts:
- Define new shortcut functions:
import subprocess

def custom_shortcut_1():
    # Implement custom shortcut action (here: send Ctrl+Alt+1)
    subprocess.run(["xdotool", "key", "ctrl+alt+1"])

def custom_shortcut_2():
    # Another custom shortcut (here: send Ctrl+Alt+2)
    subprocess.run(["xdotool", "key", "ctrl+alt+2"])
- Add these to the commands dictionary:
commands.update({
    "shortcut one": custom_shortcut_1,
    "shortcut two": custom_shortcut_2
})
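When you need more than a couple of shortcuts, writing one function per key combination becomes repetitive. A small factory is one alternative. This is a sketch: `build_xdotool_command` and `make_shortcut` are hypothetical helpers, and the `runner` parameter exists only so the behavior can be checked without sending real key presses.

```python
import subprocess

def build_xdotool_command(key_combo):
    """Return the argument list for sending one key combination."""
    return ["xdotool", "key", key_combo]

def make_shortcut(key_combo, runner=subprocess.run):
    """Hypothetical factory: create a command function for a key combination."""
    def shortcut():
        return runner(build_xdotool_command(key_combo))
    return shortcut

# Nothing is executed here; xdotool only runs when a shortcut is called
shortcut_commands = {
    "shortcut one": make_shortcut("ctrl+alt+1"),
    "shortcut two": make_shortcut("ctrl+alt+2"),
}
```

The resulting dictionary could be passed to `commands.update(...)` in the same way as the hand-written functions.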
12. Customizing the User Interface
While Whisper2Linux is primarily a voice-controlled application, you might want to add a simple GUI for configuration or status display:
- Install a GUI library like tkinter:
sudo apt-get install python3-tk
- Implement a basic GUI:
import threading
import tkinter as tk

def create_gui():
    root = tk.Tk()
    root.title("Whisper2Linux Status")
    status_label = tk.Label(root, text="Whisper2Linux is running")
    status_label.pack()
    stop_button = tk.Button(root, text="Stop", command=stop_whisper2linux)
    stop_button.pack()
    root.mainloop()

def stop_whisper2linux():
    # Implement stop logic here
    print("Stopping Whisper2Linux")
    # You might want to set a flag to stop the main loop

# Run the GUI in a separate thread
gui_thread = threading.Thread(target=create_gui)
gui_thread.start()
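The stop flag mentioned in the comment above can be implemented with `threading.Event`, which is safe to set from one thread and read from another. The sketch below uses a stand-in `main_loop`, since the real loop in whisper2linux.py is not shown in this guide. `stop_event` and `main_loop` are hypothetical names.

```python
import threading
import time

stop_event = threading.Event()

def stop_whisper2linux():
    """Signal the main loop to exit; safe to call from the GUI thread."""
    stop_event.set()

def main_loop(poll_interval=0.01):
    """Stand-in for the real main loop; counts iterations until stopped."""
    iterations = 0
    while not stop_event.is_set():
        iterations += 1
        time.sleep(poll_interval)  # placeholder for real audio processing
    return iterations
```

With this in place, the Stop button's `command=stop_whisper2linux` would end the loop at its next check of the flag.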
13. Implementing Custom Wake Word Detection
To use a custom wake word instead of key presses:
- Implement a wake word detection function:
def detect_wake_word(audio_chunk):
    # Implement wake word detection logic
    # This could use a pre-trained model or a simple energy threshold
    return wake_word_detected

# In the main audio processing loop:
if detect_wake_word(audio_chunk):
    handle_recording()
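As an illustration of the "simple energy threshold" option, the sketch below computes the root-mean-square (RMS) level of a chunk. Note that this detects only that something loud enough was heard, not that a particular word was spoken, so it is a voice-activity check rather than true wake word detection. It assumes samples are floats in the range -1.0 to 1.0, and the default threshold is an arbitrary value you would need to tune for your microphone.

```python
import math

def rms_energy(audio_chunk):
    """Root-mean-square level of a sequence of float samples."""
    if len(audio_chunk) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in audio_chunk) / len(audio_chunk))

def is_speech_like(audio_chunk, threshold=0.02):
    """Return True when the chunk is at least as loud as the threshold."""
    return rms_energy(audio_chunk) >= threshold

silence = [0.0] * 100
loud = [0.5, -0.5] * 50
```

Here `is_speech_like(silence)` is False and `is_speech_like(loud)` is True. Recognizing a specific wake word would require a trained model on top of a check like this.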
14. Adding Support for Multiple Languages
To support multiple languages:
- Modify the Whisper API call to specify the language:
def transcribe_audio_from_memory(audio_buffer, language='en'):
    params = {
        'task': 'transcribe',
        'language': language,
        'output': 'txt',
        'encode': False
    }
    # Rest of the function remains the same
- Implement a language detection or selection mechanism:
def detect_language(audio_chunk):
    # Implement language detection logic
    # This could be based on the user's settings or automatic detection
    return detected_language

# In the main processing loop:
detected_lang = detect_language(audio_chunk)
transcription = transcribe_audio_from_memory(audio_buffer, language=detected_lang)
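For the settings-based option, a selection function can normalize whatever the user configured and fall back to a default when the value is missing or unsupported. This is a sketch under assumptions: the list of supported codes is an arbitrary subset chosen for illustration, and `select_language` is a hypothetical helper.

```python
SUPPORTED_LANGUAGES = ("en", "de", "fr", "es")  # hypothetical subset
DEFAULT_LANGUAGE = "en"

def select_language(preferred, supported=SUPPORTED_LANGUAGES,
                    default=DEFAULT_LANGUAGE):
    """Map a user setting such as "de_DE" or "fr-CA" to a two-letter code."""
    if not preferred:
        return default
    # Keep only the primary language part, e.g. "fr-CA" -> "fr"
    code = preferred.strip().lower().replace("_", "-").split("-")[0]
    return code if code in supported else default

language = select_language("de_DE")
```

The result can be passed as the `language` argument of `transcribe_audio_from_memory`.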
Conclusion
These customization options provide a starting point for tailoring Whisper2Linux to your specific needs. Remember to thoroughly test any changes you make to ensure they don't negatively impact the application's performance or reliability. As you become more familiar with the codebase, you'll likely discover even more ways to customize and extend Whisper2Linux's functionality.
Always back up your original code before making significant changes, and consider using version control (like Git) to manage your customizations. This will allow you to easily revert changes if needed and keep track of your modifications over time.
Happy customizing!