Speaches

Speaches is a self-hosted speech-to-text (STT) and text-to-speech (TTS) server with an OpenAI-compatible API. It wraps Whisper for STT and Piper or Kokoro for TTS, and exposes them through the standard OpenAI endpoints /v1/audio/transcriptions and /v1/audio/speech.

AI / LLM · Min 2048 MB RAM · Port 8000 (http) · Tier pro

// What it is

A closer look.

A drop-in replacement for the OpenAI Whisper API with no per-transcription fee: it runs on your own VPS or GPU server, so the only cost is hosting.
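To make "drop-in" concrete, here is a minimal sketch of a transcription call using only the Python standard library. It assumes the server listens on port 8000 and accepts the same multipart form fields as OpenAI's endpoint (`file` and `model`). The helper name `build_transcription_request` and the model id in the usage note are illustrative assumptions, not values taken from the Speaches documentation.

```python
import io
import urllib.request
import uuid


def build_transcription_request(base_url, audio_bytes, filename, model):
    """Build (but do not send) a multipart POST for /v1/audio/transcriptions.

    Hypothetical helper: the field names follow the OpenAI transcription API.
    """
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Plain form field carrying the model name.
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    body.write(model.encode() + b"\r\n")

    # File field carrying the raw audio bytes.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes + b"\r\n")

    # Closing boundary ends the multipart body.
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body.getvalue(),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send it, pass the request to `urllib.request.urlopen` and parse the JSON response, for example `build_transcription_request("http://localhost:8000", open("talk.wav", "rb").read(), "talk.wav", "Systran/faster-whisper-base")`. The model id is an assumption; use whichever model your instance has installed.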

// Use cases

What it's for.

Concrete scenarios where teams pick Speaches over the SaaS alternative.

Self-hosted transcription

replace the OpenAI Whisper API with a predictable VPS cost

Voice assistant TTS

synthesize speech for self-hosted Alexa-style apps

Audio content production

bulk-transcribe podcasts, meetings, and lectures

Real-time streaming STT

live captions, voice control

Multi-language speech

Whisper handles roughly 100 languages out of the box

// Who it's for

Built for these teams.

If your team profile matches one of these, Speaches is a strong fit out of the box.

Profile A

Indie SaaS founders

building voice features without OpenAI per-minute costs

Profile B

Podcasters

bulk transcribing back catalogs without metered API spend

Profile C

Privacy-bound apps

needing voice processing without cloud upload

Profile D

Voice assistant developers

building self-hosted Alexa/Google Home alternatives

Profile E

AI engineers

integrating voice into LLM agents (Open WebUI, LibreChat)

// Differentiators

Why teams pick Speaches.

When evaluating self-hosted options for this category, here are the dimensions on which Speaches consistently lands above the alternatives.

✓ OpenAI-compatible API: any OpenAI SDK works when pointed at Speaches
✓ Multiple model sizes: Whisper tiny / base / small / medium / large
✓ CPU + GPU support: runs on modest hardware for testing, scales with a GPU
✓ MIT license: commercial use unrestricted
✓ Streaming support: real-time transcription for live audio
✓ Active development: frequent releases tracking upstream Whisper

// Integrations

Connects to.

The stack you'll plug Speaches into — services, protocols, and adjacent apps in the BluixApps catalog.

OpenAI SDKs

Python, JS, every official OpenAI client works

STT engines

Whisper (multiple sizes), Faster-Whisper (optimized)

TTS engines

Piper (fast), Kokoro (quality)

Audio formats

MP3, WAV, M4A, OGG, FLAC input; WAV / MP3 output

VAD

Voice Activity Detection for streaming

Webhook support

async transcription completion callbacks

HTTP REST

primary API surface

// Adoption & deployment

Notable users & community

5k+ GitHub stars (rapidly growing)
Featured in self-hosted voice AI guides
Active Discord community
Strong adoption in privacy-bound voice applications
Frequent releases matching OpenAI API evolution

What we ship

Docker compose: Speaches server + model cache volume
Pinned ghcr.io/speaches-ai/speaches:latest (release-tagged)
HTTPS via Let's Encrypt; API key auth via proxy
Whisper-base + Kokoro voices pre-downloaded
GPU passthrough optional (significantly faster for large models)
OpenAI-compatible /v1/audio/transcriptions + /v1/audio/speech endpoints
Stateless service — no backup needed beyond config
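As an illustration of the shipped /v1/audio/speech endpoint, the sketch below builds a text-to-speech request with the Python standard library. It assumes the JSON body follows OpenAI's speech API (`model`, `input`, `voice`, `response_format`) and that the API-key proxy expects an `Authorization: Bearer` header. The helper name, model id, and voice name are hypothetical.

```python
import json
import urllib.request


def build_speech_request(base_url, text, model, voice,
                         response_format="mp3", api_key=None):
    """Build (but do not send) a JSON POST for /v1/audio/speech.

    Hypothetical helper: field names follow the OpenAI speech API.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the proxy in front of Speaches checks a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers=headers,
    )
```

Sending the request with `urllib.request.urlopen` should return raw audio bytes in the requested format, which can be written straight to a file.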

// Tips & operations

Run it properly.

Operational guidance from running this in production — what to lock down, what surprises people.

// PERFORMANCE

GPU strongly recommended for large models

Whisper large is impractically slow on CPU; medium is acceptable on a modern CPU

// DEPLOYMENT

Pre-download models

the first request for a model triggers its download; bake models into the image to avoid that stall

// OPERATIONS

Audio format conversion

Speaches transcodes via ffmpeg; some formats need explicit re-encoding
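When a file does need explicit re-encoding, one common approach is to convert it to 16 kHz mono 16-bit PCM WAV before upload, since Whisper works on 16 kHz mono audio internally. The sketch below only builds and runs a standard ffmpeg command; the function names are hypothetical, and it assumes `ffmpeg` is on the PATH.

```python
import subprocess


def ffmpeg_reencode_cmd(src, dst, sample_rate=16000):
    """Return an ffmpeg argument list that converts src to mono PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file, any format ffmpeg can decode
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        str(dst),
    ]


def reencode(src, dst, sample_rate=16000):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_reencode_cmd(src, dst, sample_rate), check=True)
```

For example, `reencode("episode.m4a", "episode.wav")` writes a WAV file that can then be posted to the transcription endpoint.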

// RELIABILITY

Mind disk usage

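Whisper model sizes by parameter count: tiny 39M, base 74M, small 244M, medium 769M, large 1.55B. Files on disk are larger than these figures suggest, since 16-bit weights take 2 bytes per parameter.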
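For capacity planning, a rough estimate of the model cache can be computed from Whisper's published parameter counts (39M, 74M, 244M, 769M, and 1550M, the same figures quoted above). This sketch assumes 2 bytes per parameter (16-bit weights); quantized models will be smaller and the real files carry some extra metadata, so treat the result as an approximation.

```python
# Whisper parameter counts in millions, per the upstream model card.
PARAMS_MILLIONS = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}


def estimated_size_mb(model, bytes_per_param=2):
    """Approximate on-disk size in decimal MB (millions of params x bytes each)."""
    return PARAMS_MILLIONS[model] * bytes_per_param


def total_cache_mb(models, bytes_per_param=2):
    """Approximate size of a cache volume holding all the given models."""
    return sum(estimated_size_mb(m, bytes_per_param) for m in models)
```

Under these assumptions, caching base and medium together needs about `total_cache_mb(["base", "medium"])` = 1686 MB, before any TTS voices.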

// DEPLOYMENT

Streaming adds overhead

VAD and chunking add latency, most noticeably on CPU

// SECURITY

Auth at proxy layer

Speaches has no built-in auth; protect it with an API-key proxy
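The check such a proxy performs can be sketched in a few lines. This is not part of Speaches: it is a hypothetical helper showing one way a proxy might validate the `Authorization: Bearer <key>` header that OpenAI clients send, using a constant-time comparison so the key is not leaked through timing differences.

```python
import hmac


def is_authorized(headers, expected_key):
    """Return True if the request headers carry the expected Bearer token.

    `headers` is any mapping of header names to values; names are matched
    case-insensitively.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids short-circuiting on the first differing byte.
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())
```

A proxy would call this before forwarding to port 8000 and return HTTP 401 when it yields False.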

Min RAM: 2048 MB

Min disk (GB): not listed

Access port: 8000

Protocol: http

BluixApps tier: pro

Project resources

Official site: speaches.ai