Multilingual TTS
17 languages from one model
Official site: github.com
XTTS-v2 is Coqui AI's multilingual text-to-speech model. It covers 17 languages, clones a voice from a 6-second reference sample, delivers expressive emotional speech, and can stream its output. It is among the strongest open-weight TTS models and a common choice for self-hosted speech synthesis projects.
The voice counterpart to "open SDXL": best-in-class open weights, although they are released under the Coqui Public Model License (CPML), which permits non-commercial use only.
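Voice cloning needs two inputs: a supported language code and a reference clip of about six seconds. The sketch below only validates those inputs and assembles keyword arguments. It does not synthesize anything. It assumes that the argument names (`text`, `speaker_wav`, `language`, `file_path`) match the Coqui TTS Python API's `tts_to_file` call. The helper name `build_clone_request` and the hard 6-second minimum are illustrative choices, not part of Coqui's API.

```python
import wave
from pathlib import Path

# The 17 language codes listed on the XTTS-v2 model card.
XTTS_V2_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
})

# Assumption: treat ~6 s of reference audio as the floor for a usable clone.
MIN_REFERENCE_SECONDS = 6.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_clone_request(text: str, language: str, speaker_wav: str, out_path: str) -> dict:
    """Hypothetical helper: validate inputs, return kwargs assumed to fit tts_to_file()."""
    if language not in XTTS_V2_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if not Path(speaker_wav).is_file():
        raise FileNotFoundError(speaker_wav)
    duration = wav_duration_seconds(speaker_wav)
    if duration < MIN_REFERENCE_SECONDS:
        raise ValueError(
            f"reference clip is {duration:.1f} s; need at least {MIN_REFERENCE_SECONDS} s"
        )
    return {"text": text, "speaker_wav": speaker_wav, "language": language, "file_path": out_path}
```

With the Coqui TTS package installed, you would pass the returned dict as `TTS("tts_models/multilingual/multi-dataset/xtts_v2").tts_to_file(**kwargs)`. Cross-language cloning uses the same call with an English reference clip and, for example, `language="es"`.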
Concrete scenarios where teams pick XTTS-v2 (Coqui) over a hosted SaaS TTS service.
17 languages from one model
6-second sample → speech in cloned voice
chunked audio output, low latency
English speaker → speak in Spanish/French/Italian
natural prosody, not robotic
REST endpoints for programmatic use
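With streaming output, the caller receives audio in chunks while the model is still generating the rest of the utterance, so playback can begin before synthesis finishes. The sketch below makes three assumptions. First, chunks arrive as raw 16-bit mono PCM bytes. Second, the sample rate is 24 kHz, which is XTTS-v2's output rate. Third, a stand-in generator takes the place of the model. `write_stream_to_wav` and `fake_stream` are hypothetical names, not Coqui functions.

```python
import time
import wave
from typing import Iterable, Iterator, Optional

SAMPLE_RATE = 24_000  # assumption: 24 kHz mono output
SAMPLE_WIDTH = 2      # 16-bit PCM


def write_stream_to_wav(chunks: Iterable[bytes], path: str) -> Optional[float]:
    """Hypothetical helper: append each PCM chunk to a WAV file as it arrives.

    Returns the time to first chunk in seconds, or None if the stream was empty.
    """
    start = time.perf_counter()
    first_chunk_latency: Optional[float] = None
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            if first_chunk_latency is None:
                # Latency a listener notices before playback could start.
                first_chunk_latency = time.perf_counter() - start
            # A player or socket could consume `chunk` here; we just persist it.
            wav.writeframes(chunk)
    return first_chunk_latency


def fake_stream(n_chunks: int, chunk_ms: int = 200) -> Iterator[bytes]:
    """Stand-in for a streaming TTS model: yields silent PCM chunks."""
    frames_per_chunk = SAMPLE_RATE * chunk_ms // 1000
    for _ in range(n_chunks):
        yield b"\x00\x00" * frames_per_chunk
```

The sketch measures time to first chunk rather than total synthesis time. With streaming, the first-chunk figure is the latency a listener actually perceives.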
If your team profile matches one of these, XTTS-v2 (Coqui) is a strong fit out of the box.
generating multi-language content
creating character voices
narrating content in multiple languages
producing demo videos at scale
auto-narrating articles for screen readers
selling voice synthesis services (review the CPML first, since it restricts commercial use)
When evaluating self-hosted options for this category, here are the dimensions on which XTTS-v2 (Coqui) consistently lands above the alternatives.
The stack you'll plug XTTS-v2 (Coqui) into — services, protocols, and adjacent apps in the BluixApps catalog.
/root/bluixapps/xtts.txt
bluixapps_ensure_nvidia_runtime
Operational guidance from running this in production: what to lock down and what surprises people.
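The `bluixapps_ensure_nvidia_runtime` step indicates that the deployment expects an NVIDIA GPU. The preflight check below is a minimal sketch. It assumes that an installed driver puts `nvidia-smi` on `PATH`. The check looks for that binary, then runs `nvidia-smi -L` to list the visible GPUs. The function name `nvidia_gpus` is hypothetical, and this is not the catalog's own implementation.

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def nvidia_gpus(
    which: Callable[[str], Optional[str]] = shutil.which,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Hypothetical preflight: return GPU lines from `nvidia-smi -L`, or [] if none are visible.

    `which` and `run` are injectable so the check can be tested without a GPU.
    """
    binary = which("nvidia-smi")
    if binary is None:
        return []  # no driver tools on PATH
    result = run([binary, "-L"], capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []  # driver present but not usable (e.g. no device visible)
    return [line for line in result.stdout.splitlines() if line.startswith("GPU ")]
```

If the check fails fast on an empty list, a missing driver or runtime shows up at startup instead of at the first synthesis request.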