
API Integration

Whisper2Linux integrates with several APIs to provide its core functionality. This document details the APIs used, how they are integrated, and how to configure or customize these integrations.

Overview of API Integrations

Whisper2Linux utilizes three main API integrations:

  1. Whisper API for speech-to-text transcription
  2. Text-to-Speech (TTS) API for generating spoken responses
  3. Ollama API for natural language processing and AI-driven responses

1. Whisper API Integration

The Whisper API is used for converting speech to text.

Configuration

The Whisper API endpoint is defined in the WHISPER_API_URL variable:

WHISPER_API_URL = "http://192.168.1.186:9000/asr"

Usage in Code

The Whisper API is called in the transcribe_audio_from_memory function:

def transcribe_audio_from_memory(audio_buffer):
    # Upload the in-memory FLAC audio as a multipart form field.
    files = {
        'audio_file': ('audio.flac', audio_buffer, 'audio/flac')
    }
    # Query parameters: transcribe English speech and return plain text.
    params = {
        'task': 'transcribe',
        'language': 'en',
        'output': 'txt',
        'encode': False
    }
    response = requests.post(WHISPER_API_URL, files=files, params=params)
    # Process the response...

Customization

To use a different speech-to-text service:

  1. Update the WHISPER_API_URL to point to your preferred service.
  2. Modify the transcribe_audio_from_memory function to match the new API's requirements.
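
Step 1 can be done without editing the source if the endpoint is read from the environment. The sketch below is an assumption rather than part of Whisper2Linux: the `endpoint_from_env` helper and the `W2L_WHISPER_API_URL` variable name are hypothetical, and the default is the value shown under Configuration.

```python
import os

# Default taken from the configuration shown above.
DEFAULT_WHISPER_API_URL = "http://192.168.1.186:9000/asr"


def endpoint_from_env(var_name, default):
    """Return the URL stored in environment variable var_name, or default if unset or blank."""
    value = os.environ.get(var_name, "").strip()
    return value or default


# W2L_WHISPER_API_URL is a hypothetical variable name chosen for this example.
WHISPER_API_URL = endpoint_from_env("W2L_WHISPER_API_URL", DEFAULT_WHISPER_API_URL)
```

The same helper could serve the TTS and Ollama endpoints, each with its own variable name.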

2. Text-to-Speech (TTS) API Integration

The TTS API is used to generate spoken responses from text.

Configuration

The TTS API settings are defined by these variables:

TTS_API_URL = "http://192.168.1.186:8000/v1/audio/speech"
TTS_VOICE = "alloy"

Usage in Code

The TTS API is called in the synthesize_speech function:

def synthesize_speech(text, voice=TTS_VOICE):
    # Ask the TTS service to render the text as WAV audio at normal speed.
    response = requests.post(TTS_API_URL, json={
        "model": "tts-1-hd",
        "input": text,
        "voice": voice,
        "response_format": "wav",
        "speed": 1.0
    })
    # Process the response...
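
The excerpt does not show how the response is processed. Since the request asks for `"response_format": "wav"`, one plausible first step is to confirm that the returned bytes parse as WAV before playing them. The following is a minimal sketch under that assumption; `wav_duration_seconds` is a hypothetical helper, not a function from Whisper2Linux, and it uses only the standard library.

```python
import io
import wave


def wav_duration_seconds(data):
    """Parse WAV bytes and return the clip length in seconds.

    Raises wave.Error (or EOFError for truncated input) if the bytes
    are not a complete WAV file.
    """
    with wave.open(io.BytesIO(data), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / rate
```

Inside `synthesize_speech` this might be called as `wav_duration_seconds(response.content)`. If a TTS server streams its output, the frame count in the WAV header may be a placeholder, so the computed duration should be treated as a sanity check rather than an exact value.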

Customization

To use a different TTS service:

  1. Update the TTS_API_URL to point to your preferred service.
  2. Modify the synthesize_speech function to match the new API's requirements.
  3. Adjust the TTS_VOICE variable if your new service offers different voice options.

3. Ollama API Integration

The Ollama API is used for natural language processing and generating AI-driven responses.

Configuration

The Ollama API settings are defined by these variables:

OLLAMA_API_URL = "http://192.168.1.186:11434/api/chat"
OLLAMA_MODEL = "mistral-nemo"

Usage in Code

The Ollama API is primarily used in the cmd_respond function:

def cmd_respond(transcription):
    # ... (preprocessing logic)
    # Send the system prompt and the user's query as one non-streaming chat request.
    response = requests.post(OLLAMA_API_URL, json={
        "model": OLLAMA_MODEL,
        "messages": [
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": query}
        ],
        "stream": False
    })
    # Process the response...
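
With `"stream": False`, Ollama's `/api/chat` endpoint returns a single JSON object whose `message.content` field holds the assistant's reply. How `cmd_respond` actually processes it is not shown, so the helper below is only a sketch of one defensive way to pull the text out; the name `extract_reply` is hypothetical.

```python
def extract_reply(payload):
    """Return the assistant text from a non-streaming /api/chat response body."""
    message = payload.get("message") if isinstance(payload, dict) else None
    content = message.get("content") if isinstance(message, dict) else None
    if not isinstance(content, str):
        # Covers error bodies and any other unexpected structure.
        raise ValueError("unexpected chat response shape")
    return content.strip()
```

In `cmd_respond` this could be called as `extract_reply(response.json())`.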

Customization

To use a different language model or AI service:

  1. Update the OLLAMA_API_URL to point to your preferred service.
  2. Modify the OLLAMA_MODEL variable if using a different model within the same service.
  3. Adjust the cmd_respond function to match the new API's request and response format.
  4. Update the SYSTEM_MESSAGE variable to provide appropriate context for your chosen AI model.

API Error Handling

Whisper2Linux implements error handling for API calls to ensure robustness:

try:
    response = requests.post(API_URL, ...)
    response.raise_for_status()
    # Process successful response
except requests.exceptions.RequestException as e:
    logging.error(f"API call failed: {e}")
    # Handle the error (e.g., provide a fallback response)

This pattern is used across all API integrations to catch and log errors, allowing the application to gracefully handle API failures.
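
To avoid repeating the try/except block at every call site, the pattern could be factored into a small wrapper. This is a sketch, not the project's code: `call_with_fallback` is a hypothetical name, and the request is passed in as a callable so the example runs without a network. It catches `OSError` because `requests.exceptions.RequestException` derives from `IOError`, which is an alias of `OSError` in Python 3.

```python
import logging


def call_with_fallback(send, fallback):
    """Run send() and return its result, or fallback if the call fails.

    send is any zero-argument callable that performs the request and
    returns the processed result.
    """
    try:
        return send()
    except OSError as e:
        # Also covers requests.exceptions.RequestException, an OSError subclass.
        logging.error(f"API call failed: {e}")
        return fallback
```

A call site might then read `call_with_fallback(lambda: synthesize_speech(text), None)`, assuming the wrapped function raises on failure instead of swallowing the error itself.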

Performance Considerations

API calls can impact the responsiveness of Whisper2Linux. To optimize performance:

  1. Caching: Implement caching for frequent or repetitive queries to reduce API calls.
  2. Asynchronous Calls: Consider using asynchronous API calls to prevent blocking the main application thread.
  3. Timeout Settings: Set appropriate timeouts for API calls to prevent hanging on slow responses.
  4. Rate Limiting: Implement rate limiting to avoid overloading the APIs and potentially hitting usage limits.
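
As an illustration of the caching point, the sketch below wraps a query function with `functools.lru_cache` so that repeated questions are answered from memory. The names `make_cached_responder` and `ask` are hypothetical, and lowercasing the query before lookup is an assumption that may not suit every use.

```python
from functools import lru_cache


def make_cached_responder(ask, maxsize=128):
    """Wrap ask(query) so repeated queries are answered from memory."""

    @lru_cache(maxsize=maxsize)
    def _cached(normalized_query):
        return ask(normalized_query)

    def respond(query):
        # Normalize whitespace and case so trivially different inputs share an entry.
        return _cached(" ".join(query.lower().split()))

    return respond
```

Because `lru_cache` stores only return values, a call that raises is not cached and is retried on the next request. For the timeout point, `requests.post` accepts a `timeout` argument in seconds, for example `requests.post(url, json=payload, timeout=10)`.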

Security Considerations

When working with external APIs, consider the following security practices:

  1. API Key Management: If the APIs require authentication, store API keys securely and never hard-code them in the source.
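  2. HTTPS: Use HTTPS for any endpoint reached over an untrusted network so that data is encrypted in transit. The default endpoints shown in this document use plain HTTP on a private network address.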
  3. Input Sanitization: Sanitize all user inputs before sending them to the APIs to prevent injection attacks.
  4. Response Validation: Validate and sanitize API responses before processing them to guard against malicious data.
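
The first two practices could be sketched as follows. Everything here is an assumption for illustration: the `W2L_API_KEY` variable name, the Bearer scheme (the services above are not shown requiring any authentication), and the policy of accepting plain HTTP only for private or loopback addresses.

```python
import ipaddress
import os
from urllib.parse import urlparse


def auth_headers(env_var="W2L_API_KEY"):
    """Build an Authorization header from an environment variable, if it is set."""
    key = os.environ.get(env_var, "").strip()
    return {"Authorization": f"Bearer {key}"} if key else {}


def is_encrypted_or_local(url):
    """Return True for HTTPS URLs and for plain-HTTP URLs on private or loopback addresses."""
    parts = urlparse(url)
    if parts.scheme == "https":
        return True
    host = parts.hostname or ""
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname that is not an IP literal is treated as remote
    return address.is_private or address.is_loopback
```

A startup check might log a warning for any configured endpoint where `is_encrypted_or_local` returns False.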

Extending API Integrations

To add new API integrations to Whisper2Linux:

  1. Define new configuration variables for the API endpoint and any necessary settings.
  2. Create functions to handle the API calls, following the existing patterns for error handling and logging.
  3. Integrate the new API functionality into the appropriate parts of the application logic.
  4. Update the documentation to reflect the new API integration and its configuration options.
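
Steps 1 and 2 might look like the sketch below for an imagined translation service. The endpoint, the request and response field names, and the `cmd_translate` function are all hypothetical and do not describe any real API. The HTTP function is passed in as `post` (it would be `requests.post` in the application) so the example can be exercised without a network.

```python
import logging

# Hypothetical endpoint and setting for an imagined translation service.
TRANSLATE_API_URL = "http://translate.example/translate"
TRANSLATE_TARGET = "de"


def cmd_translate(text, post):
    """Send text to the hypothetical translation API and return the translation.

    Returns None if the call fails or the response lacks the expected field.
    """
    try:
        response = post(
            TRANSLATE_API_URL,
            json={"q": text, "target": TRANSLATE_TARGET},
            timeout=10,
        )
        response.raise_for_status()
        return response.json()["translatedText"]
    except (OSError, KeyError, ValueError) as e:
        # OSError covers requests' exceptions; KeyError and ValueError cover bad bodies.
        logging.error(f"Translation API call failed: {e}")
        return None
```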

Testing API Integrations

It's crucial to thoroughly test API integrations:

  1. Unit Tests: Write unit tests for individual API call functions.
  2. Mock Responses: Use mocked API responses in tests to cover various scenarios, including errors.
  3. Integration Tests: Perform integration tests with actual API calls in a controlled environment.
  4. Rate Limit Testing: Test behavior when approaching or exceeding API rate limits.
  5. Fallback Behavior: Ensure the application gracefully handles API unavailability.
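
Points 1, 2, and 5 can be combined in a small `unittest` suite that replaces the HTTP call with `unittest.mock.Mock`. The function under test, `fetch_transcription`, is a hypothetical stand-in written for this example (with a placeholder URL), not the project's `transcribe_audio_from_memory`; it takes the HTTP function as a parameter so that a mock can be supplied directly.

```python
import unittest
from unittest.mock import Mock


def fetch_transcription(post, audio_bytes):
    """Hypothetical function under test: return transcribed text, or "" on failure."""
    try:
        response = post("http://whisper.example/asr", files={"audio_file": audio_bytes})
        response.raise_for_status()
        return response.text.strip()
    except OSError:
        return ""


class FetchTranscriptionTests(unittest.TestCase):
    def test_success_returns_stripped_text(self):
        post = Mock(return_value=Mock(text=" hello world \n"))
        self.assertEqual(fetch_transcription(post, b"fake-audio"), "hello world")
        post.assert_called_once()

    def test_http_error_returns_empty_string(self):
        response = Mock()
        response.raise_for_status.side_effect = OSError("500 Server Error")
        self.assertEqual(fetch_transcription(Mock(return_value=response), b"fake-audio"), "")

    def test_connection_failure_returns_empty_string(self):
        post = Mock(side_effect=ConnectionError("service unavailable"))
        self.assertEqual(fetch_transcription(post, b"fake-audio"), "")
```

Saved in a test module, the suite can be run with `python -m unittest`.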

Conclusion

API integrations are central to Whisper2Linux: they provide speech recognition, text-to-speech conversion, and AI-generated responses. Understanding how each one is configured and called lets you customize, extend, and maintain the application, and attention to performance, security, and error handling keeps it reliable.