System Requirements
To ensure optimal performance of Whisper2Linux, your system should meet or exceed the following requirements. They are sized for the real-time audio processing, speech recognition, and natural language processing that Whisper2Linux performs.
Software Requirements
- Operating System:
- Manjaro GNOME (tested)
- Any Linux distribution with an X11 desktop environment (compatibility may vary)
- Python Version: Python 3.8 or higher
- Required Python Packages:
- requests: For making HTTP requests to API endpoints
- sounddevice: For audio recording and playback
- soundfile: For audio file handling
- numpy: For numerical operations, particularly audio data manipulation
- Xlib (installed via the python-xlib package): For X Window System interaction
- rapidfuzz: For fuzzy string matching in command processing
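Before the first run, a small pre-flight script can confirm the software requirements. The sketch below is not part of Whisper2Linux. It makes three assumptions:

- The import names match the package list above. Note that Xlib is the import name of python-xlib.
- An XDG_SESSION_TYPE of "x11" means the session is X11.
- If that variable is unset, a set DISPLAY is taken as evidence of X11.

```python
import importlib.util
import os
import sys

# Import names of the required packages (python-xlib imports as "Xlib").
REQUIRED_MODULES = ("requests", "sounddevice", "soundfile", "numpy", "Xlib", "rapidfuzz")


def python_version_ok(minimum=(3, 8)):
    """True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the modules that cannot be found (find_spec does not import them)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def is_x11_session(env=os.environ):
    """Heuristic X11 check; Wayland sessions with XWayland also set DISPLAY."""
    session = env.get("XDG_SESSION_TYPE")
    if session:
        return session == "x11"
    # No session type reported: fall back to whether an X display is set.
    return bool(env.get("DISPLAY"))


if __name__ == "__main__":
    print("Python >= 3.8:", "OK" if python_version_ok() else "too old")
    missing = missing_modules()
    print("Missing packages:", ", ".join(missing) if missing else "none")
    print("X11 session:", "yes" if is_x11_session() else "no")
```

The XDG_SESSION_TYPE check comes first because a GNOME Wayland session also exports DISPLAY for XWayland, so DISPLAY alone would give a false positive.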
Hardware Requirements
- CPU: 8 cores or more recommended
- The multi-threaded nature of Whisper2Linux benefits from multiple cores for concurrent processing of audio, speech recognition, and command execution.
- Memory: 64GB RAM recommended
- Ample RAM ensures smooth operation when dealing with audio buffers, language models, and concurrent processing tasks.
- Storage: 256GB or more on an SSD (NVMe preferred)
- Fast storage improves overall system responsiveness and reduces latency in audio processing and model loading.
- GPU:
- Optimal Performance: NVIDIA RTX 3090 or RTX 4090
- These high-end GPUs provide excellent performance for running large language models and speech recognition tasks.
- Minimum Required: NVIDIA GPU with at least 10GB VRAM
- Examples include GTX 1080 Ti, RTX 3060, or RTX 3080
- The GPU is crucial for accelerating machine learning models used in speech recognition and natural language processing.
- Audio Input:
- A working microphone (built-in or external)
- Whisper2Linux is configured to use the default system microphone, but this can be customized in the code if needed.
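The Python standard library can give a rough pre-flight check of the hardware recommendations. This is a Linux-only sketch, not Whisper2Linux's own code.

- RAM is read through os.sysconf.
- Disk space is measured on the root filesystem with shutil.disk_usage.
- VRAM is queried with nvidia-smi, which ships with the NVIDIA driver.
- The microphone check uses sounddevice only if it imports cleanly.
- os.cpu_count() counts logical CPUs (hardware threads), not physical cores.
- The 90% slack factor is an assumption. It allows for the OS reporting slightly less capacity than the marketed size.

```python
import os
import shutil
import subprocess

# Thresholds from the recommendations above.
MIN_LOGICAL_CPUS = 8
MIN_RAM_GIB = 64
MIN_DISK_GB = 256
MIN_VRAM_GIB = 10
# Assumption: accept 90% of each target, since kernel reservations and
# filesystem overhead make reported capacity smaller than the marketed size.
SLACK = 0.9


def total_ram_gib():
    """Physical RAM visible to the OS, in GiB (Linux sysconf values)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3


def disk_total_gb(path="/"):
    """Size of the filesystem holding `path` (not the whole drive), in GB."""
    return shutil.disk_usage(path).total / 1e9


def gpu_vram_gib():
    """VRAM of each NVIDIA GPU in GiB, or [] if nvidia-smi is unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True, timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return []
    sizes = []
    for line in result.stdout.splitlines():
        try:
            sizes.append(int(line.strip()) / 1024)  # nvidia-smi reports MiB
        except ValueError:
            continue  # skip entries such as "[N/A]"
    return sizes


def default_microphone():
    """Name of the default input device, or None if none can be found."""
    try:
        import sounddevice as sd  # optional; needs the PortAudio library
    except Exception:  # package missing, PortAudio missing, or init failure
        return None
    try:
        return sd.query_devices(kind="input")["name"]
    except (ValueError, sd.PortAudioError):
        return None


def check_hardware():
    """Map each requirement to True if it appears to be met."""
    vram = gpu_vram_gib()
    return {
        "cpu": (os.cpu_count() or 0) >= MIN_LOGICAL_CPUS,  # logical CPUs
        "ram": total_ram_gib() >= MIN_RAM_GIB * SLACK,
        "disk": disk_total_gb() >= MIN_DISK_GB * SLACK,
        "gpu": bool(vram) and max(vram) >= MIN_VRAM_GIB * SLACK,
        "microphone": default_microphone() is not None,
    }


if __name__ == "__main__":
    for requirement, ok in check_hardware().items():
        print(f"{requirement:<11} {'OK' if ok else 'CHECK'}")
```

A "CHECK" result is a prompt to look closer, not a hard failure. For example, the root filesystem may be smaller than the drive it sits on.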
Network Requirements
- Internet Connection:
- A stable and fast internet connection is required for API calls to external services (Whisper API, TTS API, and Ollama API).
- Recommended minimum speed: 10 Mbps download, 5 Mbps upload
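A small probe can confirm that the API endpoints are reachable and give a rough latency figure. It is a sketch. Any HTTP response counts as reachable, including an error status, because the server still answered.

Only the Ollama URL is filled in. It uses Ollama's default port 11434 and its /api/tags model-listing route. The Whisper and TTS URLs depend on your deployment and have to be added by hand.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url, timeout=3.0):
    """Return seconds until the server responded, or None if unreachable."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            pass  # response headers received; the body is not needed
    except urllib.error.HTTPError as err:
        err.close()  # an error status still proves the server answered
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, refused connections and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return None
    return time.perf_counter() - start


if __name__ == "__main__":
    endpoints = {
        "Ollama": "http://localhost:11434/api/tags",
        # Add your Whisper and TTS URLs here; they depend on your deployment.
    }
    for name, url in endpoints.items():
        latency = probe(url)
        status = f"{latency * 1000:.0f} ms" if latency is not None else "unreachable"
        print(f"{name}: {status}")
```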
Additional Software
- xdotool:
- This tool is used for simulating keyboard input and must be installed separately.
- Installation on Arch-based systems:
sudo pacman -S xdotool
- Installation on Debian-based systems:
sudo apt-get install xdotool
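The keyboard simulation that xdotool provides can be driven from Python through subprocess. The helpers below are hypothetical and not Whisper2Linux's own code.

- "xdotool type" types text into the focused window. The 12 ms delay between keystrokes is xdotool's default.
- "xdotool key" sends a key combination such as ctrl+l.

```python
import shutil
import subprocess


def build_type_command(text, delay_ms=12):
    """argv for typing `text` into the focused X11 window."""
    # "--" ends option parsing, so text beginning with "-" is typed
    # literally instead of being read as an xdotool option.
    return ["xdotool", "type", "--delay", str(delay_ms), "--", text]


def build_key_command(combo):
    """argv for pressing a key combination such as "ctrl+l"."""
    return ["xdotool", "key", combo]


def run_xdotool(argv):
    """Run an xdotool command, failing early if the tool is not installed."""
    if shutil.which("xdotool") is None:
        raise RuntimeError("xdotool is not installed")
    subprocess.run(argv, check=True)
```

For example, run_xdotool(build_key_command("ctrl+l")) followed by run_xdotool(build_type_command("hello")) would focus a browser's address bar and type into it. The argv builders are kept separate from the runner so they can be inspected without an X display.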
Optional: Remote Processing
If you choose to run the processing workloads remotely (e.g., on the Akash Network):
- Akash Network Account: Required for deploying services on the Akash decentralized cloud
- AKT Tokens: Necessary for paying for compute resources on the Akash Network
- Docker: Knowledge of Docker and containerization is beneficial for managing remote deployments
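When the services run on Akash instead of locally, the client mainly needs different base URLs. One way to make that switchable is to read them from environment variables. This is a sketch under these assumptions:

- The variable names WHISPER_API_URL, TTS_API_URL and OLLAMA_API_URL are hypothetical and not taken from Whisper2Linux's code.
- A URL set in the environment, such as the one Akash assigns to your deployment, overrides the local default.

```python
import os

# Hypothetical service names; Whisper2Linux's real configuration may differ.
SERVICES = ("WHISPER", "TTS", "OLLAMA")


def resolve_endpoints(env, local_defaults):
    """Return a base URL per service, preferring <SERVICE>_API_URL from env."""
    endpoints = {}
    for service in SERVICES:
        url = env.get(f"{service}_API_URL") or local_defaults.get(service)
        if not url:
            raise KeyError(f"no endpoint configured for {service}")
        endpoints[service] = url.rstrip("/")  # normalise a trailing slash
    return endpoints


if __name__ == "__main__":
    # Ollama listens on localhost:11434 by default; Whisper and TTS have no
    # known default here, so they must come from the environment.
    local = {"OLLAMA": "http://localhost:11434"}
    try:
        for service, url in resolve_endpoints(os.environ, local).items():
            print(f"{service}: {url}")
    except KeyError as err:
        print(f"Configuration incomplete: {err.args[0]}")
```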