Text-to-Speech Setup

Configure voice synthesis services to enable AI voice responses in Talk Buddy. This guide covers both online and local TTS (Text-to-Speech) service options for natural-sounding conversation practice.

Understanding TTS Services

What is Text-to-Speech?

TTS services convert AI text responses into spoken voice output:

AI voice responses: Hear what the AI character says during practice
Natural conversations: Voice output makes dialogue feel more realistic
Multiple voices: Different characters can have distinct voices
Language support: Various languages and accents (service-dependent)

Service Options

Online Services (Default)

Pre-configured services: Ready to use immediately

Pros: No setup required, high-quality voices, multiple languages
Cons: Requires internet, potential privacy concerns, usage limits
Best for: Quick start, testing, occasional practice

Local Services (Recommended for Privacy)

Self-hosted services: Run on your own computer

Pros: Complete privacy, offline capability, no usage limits
Cons: Requires setup, uses system resources, may have fewer voice options
Best for: Regular practice, privacy-conscious users, offline environments

Quick Start (Online Services)

Default Configuration

Talk Buddy comes pre-configured with working TTS services:

Check Current Status

Look at status footer: Voice indicator should be green (●)
If green: You’re ready for voice-enabled practice
If red/gray: Follow troubleshooting steps below

Test Voice Service

Go to Settings: Click “Settings” in sidebar
Open the Voice tab: Look for voice synthesis configuration
Click “Test Voice”: Verify service is working
Listen for voice: Should hear test speech output
Check audio quality: Verify voice is clear and understandable

Troubleshooting Online Services

Connection Issues

Check internet: Verify stable connection
Firewall settings: Ensure Talk Buddy can access external services
Service status: Online services may occasionally be unavailable
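The connection checks above can be scripted. The following is a minimal sketch using only Python's standard library; the function name is illustrative, and the host you probe (for example, the host of your configured online service) is something you supply yourself.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example (hypothetical host): can_reach("example.com", 443)
```

A False result for a host that other applications can reach usually points to a firewall rule rather than a general internet outage.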

Audio Problems

System volume: Check computer audio settings and volume
Audio device: Verify correct speakers/headphones selected
Driver issues: Update audio drivers if needed

Local TTS Setup (Speaches)

Why Use Local Services?

Privacy benefits:

No data sent to external servers
Complete offline functionality
No usage limits or quotas

Performance benefits:

Faster response times (no network latency)
Consistent availability
Customizable voices for your use case

Installing Speaches

System Requirements

Operating System: Windows 10+, macOS 10.14+, Linux (Ubuntu 18.04+)
RAM: 4GB minimum, 8GB recommended
Storage: 2-10GB for voice models
CPU: Modern processor (last 5 years recommended)
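Before downloading voice models, it can help to confirm that the target drive has enough room. The sketch below uses Python's standard library; the function name is illustrative and not part of Speaches or Talk Buddy, and the 10 GB figure in the example is simply the upper end of the storage range listed above.

```python
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


# Example: check the current directory against the 10 GB upper bound.
# has_free_space(".", 10)
```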

Installation Steps

Option 1: Docker Installation (Recommended)

# Pull the Speaches Docker image
docker pull ghcr.io/tts-ai/speaches:latest

# Run Speaches container with TTS enabled
docker run -d \
  --name speaches \
  -p 8000:8000 \
  ghcr.io/tts-ai/speaches:latest

Option 2: Python Installation

# Check that Python 3.8+ is installed
python --version

# Install Speaches via pip
pip install speaches

# Start Speaches server with TTS
speaches serve --host 0.0.0.0 --port 8000 --enable-tts

Option 3: Binary Installation

Download: Get binary from Speaches releases
Extract: Unzip to preferred location
Run: Execute the binary to start server
Configure: Set to run on port 8000 with TTS enabled
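Whichever option you choose, the server may need a few seconds (longer on first start, while models load) before it accepts requests. The sketch below polls the /health path used in the Network Diagnostics commands later in this guide; the function name, retry count, and delay are illustrative assumptions, and the health path may differ in your Speaches version.

```python
import time
import urllib.request

# Bypass any system proxy: the service is expected to run on this machine.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_until_healthy(base_url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll <base_url>/health until it answers HTTP 200 or the attempts run out."""
    health_url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        try:
            with _opener.open(health_url, timeout=2) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers refused connections, timeouts, and HTTP error statuses.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example: wait_until_healthy("http://localhost:8000")
```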

Configuring Talk Buddy for Local Voice Synthesis

Update Service URL

Open Talk Buddy Settings
Go to the Voice tab
Find the Service URL field
Change to local address: http://localhost:8000
Save settings
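A mistyped Service URL (a missing http:// prefix, a stray space, a non-numeric port) is an easy way to end up with a failed voice test. The helper below is a hypothetical sketch of the kind of check you could run on the value before pasting it into the Service URL field; it is not part of Talk Buddy.

```python
from urllib.parse import urlsplit


def normalize_service_url(raw: str) -> str:
    """Validate a service URL and return it without surrounding spaces or a trailing slash."""
    url = raw.strip()
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"URL must start with http:// or https://: {raw!r}")
    if not parts.hostname:
        raise ValueError(f"URL has no host name: {raw!r}")
    _ = parts.port  # raises ValueError when the port is not a valid number
    return url.rstrip("/")


# Example: normalize_service_url(" http://localhost:8000/ ")
```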

Test Local Connection

Click “Test Voice” in settings
Verify connection: Should show successful connection
Test voice synthesis: Should hear test speech
Check status footer: Voice indicator should be green

Speaches Voice Configuration

Voice Model Selection

Speaches supports multiple voice models:

Fast Models (Lower quality, faster synthesis)

Good for: Real-time conversation, older hardware
Model examples: speaches-ai/Kokoro-82M-v1.0-ONNX, microsoft/speecht5_tts

High-Quality Models (Better voices, slower synthesis)

Good for: High-quality practice, powerful hardware
Model examples: speaches-ai/Kokoro-82M-v1.0-ONNX, suno/bark

Voice Characteristics

Configure different voices for scenarios:

# Example: Configure female voice
speaches serve --tts-voice "en_US-amy-medium" --port 8000

# Example: Configure male voice
speaches serve --tts-voice "en_US-ryan-high" --port 8000

Language Configuration

Set up for your language:

# Example: Configure for Spanish TTS
speaches serve --tts-language es --tts-voice "es_ES-marta-medium" --port 8000

# Example: Configure for French TTS
speaches serve --tts-language fr --tts-voice "fr_FR-siwis-medium" --port 8000

Configuration File

Create a configuration file named speaches.yaml:

server:
  host: "0.0.0.0"
  port: 8000
  
stt:
  enabled: true
  model: "Systran/faster-whisper-medium"
  
tts:
  enabled: true
  model: "speaches-ai/piper-en_US-lessac-medium"
  voice_speed: 1.0
  voice_pitch: 0.0
  output_format: "wav"

Advanced TTS Configuration

Multiple Voice Setup

Character-Specific Voices

Configure different voices for different AI characters:

Interview scenarios: Professional, clear voice
Customer service: Friendly, approachable voice
Technical scenarios: Authoritative, confident voice
Casual conversation: Relaxed, conversational voice

Voice Switching

In Talk Buddy settings (Voice tab):

Primary voice: Default voice for most scenarios
Alternative voices: Different voices for specific contexts
Test voices: Verify each voice works well for intended use
Document preferences: Keep track of which voices work best

Performance Optimization

Hardware Optimization

For better local TTS performance:

Use SSD storage: Faster model loading
Increase RAM: Better model caching
Use GPU acceleration: If supported by your TTS service
Close other applications: Free resources for voice synthesis

Voice Quality vs Speed

Choose appropriate balance:

Fast models: Real-time conversation priority
Quality models: Natural-sounding voice priority
Balanced models: Good compromise for most uses
Specialized models: Optimized for specific languages or use cases

Security and Privacy

Local Service Security

Secure your local installation:

Firewall configuration: Only allow local connections
Network isolation: Keep TTS service on local network only
Regular updates: Maintain current software versions
Access control: Restrict who can access the service
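One detail worth checking: the example commands and configuration file in this guide bind the server to 0.0.0.0, which listens on every network interface. Binding to 127.0.0.1 (or localhost) restricts the service to the machine it runs on. The sketch below shows one way to check a host value; the function name is illustrative.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True if `host` only accepts connections from the local machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example, a LAN host name): treat as not loopback.
        return False


# Example: is_loopback_host("0.0.0.0") is False, is_loopback_host("127.0.0.1") is True
```

If other devices on your network need the service, keep 0.0.0.0 and rely on firewall rules instead.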

Data Privacy

Understand data handling:

Local processing: No data leaves your computer
No logging: Configure services to not store text/audio
Temporary processing: Text processed in memory only
User control: You control all data and processing

Troubleshooting TTS Issues

Common Problems

No Audio Output

Symptoms: Silent AI responses, no voice heard
Solutions:

Check system volume: Verify computer audio not muted
Test audio device: Confirm speakers/headphones work with other apps
Check TTS service: Verify service is running and connected
Test different voice: Try alternative voice models

Poor Voice Quality

Symptoms: Robotic voice, audio artifacts, unclear speech
Solutions:

Try different voice model: Some models sound more natural
Check audio settings: Verify sample rate and format settings
Update audio drivers: Ensure latest audio drivers installed
Reduce system load: Close other applications using audio

Service Connection Errors

Symptoms: Red Voice indicator, connection timeouts
Solutions:

Verify service running: Check if Speaches or online service is available
Test network connectivity: Ensure internet access for online services
Check firewall: Confirm Talk Buddy can access the voice synthesis service
Restart services: Stop and start the voice service, restart Talk Buddy

Slow Voice Generation

Symptoms: Long delays between AI text and voice output
Solutions:

Use faster models: Switch to smaller, quicker voice models (Voice tab in Settings)
Optimize hardware: Close other applications, upgrade hardware
Check network: Ensure stable, fast internet for online services
Local processing: Switch to the Built-in (offline) option in the Voice tab for better performance

Advanced Troubleshooting

Log Analysis

Check service logs for errors:

# View Speaches logs
docker logs speaches

# Check system audio logs (macOS)
log show --predicate 'subsystem == "com.apple.coreaudio"' --last 5m

# Windows audio troubleshooting
# Use Windows Audio troubleshooter in Settings
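Container logs can be long, so it may be easier to filter them for likely problems. The sketch below assumes the container is named speaches, as in the Docker installation step; the keyword list and function names are illustrative guesses, not an official Speaches log format.

```python
import subprocess
from typing import List

# Assumed keywords; adjust them to whatever your logs actually contain.
ERROR_MARKERS = ("error", "exception", "traceback", "failed")


def find_error_lines(log_text: str) -> List[str]:
    """Return the log lines that contain any of the assumed error keywords."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line.lower() for marker in ERROR_MARKERS)
    ]


def container_error_lines(container: str = "speaches", tail: int = 200) -> List[str]:
    """Fetch the last `tail` lines of `docker logs` output and filter them."""
    result = subprocess.run(
        ["docker", "logs", "--tail", str(tail), container],
        capture_output=True,
        text=True,
        check=False,
    )
    # docker logs replays the container's stdout and stderr on separate streams.
    return find_error_lines(result.stdout + result.stderr)
```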

Network Diagnostics

Test service connectivity:

# Test local Speaches service
curl http://localhost:8000/health

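# Test TTS endpoint and save the audio to a file
# (adjust the file extension to match your output format)
curl -X POST http://localhost:8000/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test", "voice": "en_US-amy-medium"}' \
  --output tts-test.wav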
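The same request can be sent from Python, which makes it easier to save the audio and reuse the check in scripts. This sketch mirrors the curl command above: the /tts path and the text and voice fields come from that command and may differ in your Speaches version. The function names are illustrative.

```python
import json
import urllib.request


def build_tts_request(base_url: str, text: str, voice: str) -> urllib.request.Request:
    """Build the POST request for the TTS endpoint shown in the curl example."""
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(base_url: str, text: str, voice: str, out_path: str,
                timeout: float = 30.0) -> int:
    """Send the request, write the returned audio to `out_path`, and return its size in bytes."""
    # Bypass any system proxy: the service is expected to run on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(build_tts_request(base_url, text, voice), timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)


# Example:
# save_speech("http://localhost:8000", "Hello, this is a test", "en_US-amy-medium", "test.wav")
```

A zero-byte file or an HTTP error here points to the service rather than to Talk Buddy or your audio device.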

Performance Monitoring

Monitor resource usage:

CPU usage: TTS processing should use <30% CPU
Memory usage: Voice models require 1-3GB RAM typically
Network usage: Online services use bandwidth during synthesis
Audio latency: Monitor delay between text and voice output
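Audio latency is the easiest of these to measure yourself. The sketch below times any synthesis function you pass in and reports the median over several runs, which is less sensitive to a slow first call (model loading) than a single measurement. The function name and the idea of passing in your own synthesize callable are assumptions for illustration.

```python
import statistics
import time
from typing import Callable


def median_latency(synthesize: Callable[[str], object], text: str, runs: int = 5) -> float:
    """Return the median wall-clock time in seconds of `runs` calls to synthesize(text)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize(text)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Example with a stand-in function; replace the lambda with a real call to your TTS service.
# median_latency(lambda text: text.upper(), "Hello, this is a test")
```

Running this once per voice model gives a like-for-like comparison when choosing between faster and higher-quality models.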

Service Comparison

Online vs Local TTS

Aspect	Online Services	Local Services
Setup	Ready immediately	Requires installation
Privacy	Data sent externally	Complete privacy
Voice Quality	Often excellent	Varies by model
Speed	Network dependent	Hardware dependent
Cost	May have usage limits	Free after setup
Offline	Requires internet	Works offline
Voices	Many options	Depends on models

Recommended Configurations

For Students

Start with: Default online services
Upgrade to: Local services if practicing frequently
Best for: Learning and experimenting with voice-enabled practice

For Teachers

Recommended: Local services for privacy and reliability
Classroom use: Local services avoid internet dependency
Best for: Consistent, private classroom experience

For Professionals

Recommended: Local services for confidential practice
Corporate use: Local services meet security requirements
Best for: Professional development with privacy

Quick Setup Checklist

Online Voice Synthesis (5 minutes)

Open Talk Buddy
Check Voice status indicator (should be green)
Go to Settings → Voice tab and test the service
Verify audio output device is working
Test with sample text

Local Voice Synthesis / Built-in (offline) (45 minutes)

Install Speaches (Docker, Python, or binary)
Start Speaches service on port 8000 with voice synthesis enabled
Configure Talk Buddy to use localhost:8000 (Voice tab in Settings)
Test connection in Settings
Verify voice synthesis works
Configure for automatic startup (optional)

Troubleshooting (15 minutes)

Check audio output device and system volume
Verify service connectivity (green status indicator)
Test with simple text synthesis
Check network/firewall settings if needed
Review logs for error messages

With proper voice setup, your Talk Buddy conversations become immersive and natural. Choose the option that best fits your privacy needs and desired voice quality!

Related Guides:

Listening Setup - Configure speech recognition input
AI Model Integration - Set up conversation AI
Connection Issues - Fix connectivity problems
