Self-hosted transcription
replace OpenAI Whisper API with predictable VPS cost

// screenshot of speaches.ai ↗
Speaches is a self-hosted speech-to-text (STT) and text-to-speech (TTS) server with OpenAI-compatible API. Wraps Whisper (STT), Piper / Kokoro (TTS), and exposes them as the standard /v1/audio/transcriptions and /v1/audio/speech OpenAI endpoints.
Speaches is a self-hosted speech-to-text (STT) and text-to-speech (TTS) server with OpenAI-compatible API. Wraps Whisper (STT), Piper / Kokoro (TTS), and exposes them as the standard /v1/audio/transcriptions and /v1/audio/speech OpenAI endpoints.
Drop-in replacement for OpenAI Whisper API at $0/transcription — runs on your own VPS or GPU.
Concrete scenarios where teams pick Speaches over the SaaS alternative.
replace OpenAI Whisper API with predictable VPS cost
synthesize speech for self-hosted Alexa-style apps
bulk transcribe podcasts, meetings, lectures
live captions, voice control
Whisper handles 100+ languages out of the box
If your team profile matches one of these, Speaches is a strong fit out of the box.
building voice features without OpenAI per-minute costs
bulk transcribing back catalogs without metered API spend
needing voice processing without cloud upload
building self-hosted Alexa/Google Home alternatives
integrating voice into LLM agents (Open WebUI, LibreChat)
When evaluating self-hosted options for this category, here are the dimensions on which Speaches consistently lands above the alternatives.
The stack you'll plug Speaches into — services, protocols, and adjacent apps in the BluixApps catalog.
ghcr.io/speaches-ai/speaches:latest (release-tagged)/v1/audio/transcriptions + /v1/audio/speech endpointsOperational guidance from running this in production — what to do before you scale, what to lock down, what surprises people.
8000:8000 · ghcr.io/speaches-ai/speaches:latest-cpu