Audio/video transcription at production speed
Official site: github.com
WhisperX is the production-grade enhancement of OpenAI Whisper — adds 70× real-time inference speed, word-level timestamps via forced alignment, and speaker diarization via pyannote-audio. The standard for serious transcription + subtitling pipelines.
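Combining forced alignment with diarization comes down to interval overlap. Alignment gives each word a start and end time, diarization gives speaker turns, and each word is labeled with the speaker whose turn overlaps it most. The sketch below is a minimal, stdlib-only illustration of that idea. It assumes hypothetical dict shapes (`word`/`start`/`end` for words, `speaker`/`start`/`end` for turns). It is not WhisperX's own implementation, and the helper names `overlap` and `assign_speakers` are made up for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical input shapes (illustrative only, not the WhisperX API):
#   word: {"word": str, "start": float, "end": float}
#   turn: {"speaker": str, "start": float, "end": float}


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Dict], turns: List[Dict]) -> List[Dict]:
    """Return copies of the words, each labeled with the most-overlapping speaker."""
    labeled = []
    for w in words:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for t in turns:
            ov = overlap(w["start"], w["end"], t["start"], t["end"])
            if ov > best_overlap:
                best_speaker, best_overlap = t["speaker"], ov
        # A word that falls entirely in a gap between turns keeps speaker=None.
        labeled.append({**w, "speaker": best_speaker})
    return labeled


if __name__ == "__main__":
    words = [
        {"word": "Hi", "start": 0.10, "end": 0.40},
        {"word": "there.", "start": 0.45, "end": 0.90},
        {"word": "Hello!", "start": 1.30, "end": 1.80},
    ]
    turns = [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0},
        {"speaker": "SPEAKER_01", "start": 1.2, "end": 2.0},
    ]
    for w in assign_speakers(words, turns):
        print(w["speaker"], w["word"])
```

Choosing the maximum overlap, rather than the turn containing a word's midpoint, keeps words that straddle a turn boundary with whichever speaker covers most of them.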
Where vanilla Whisper is research code, WhisperX is the engineering-grade version.
Concrete scenarios where teams pick WhisperX over SaaS alternatives.
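70× real-time transcription at production speed
Speaker diarization (who said what)
Word-level timestamps for precise subtitle generation
Voice activity detection (VAD): skip silence, faster results
Long-form audio: entire podcasts, full meetings
Multilingual: 99 languages from the Whisper backbone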
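Skipping silence works because a VAD pass returns only the speech regions, which can then be packed into chunks no longer than Whisper's 30-second input window and transcribed in batches. The following is a simplified greedy version of that cut-and-merge step. It assumes the VAD output is a list of `(start, end)` pairs in seconds; `merge_speech` and its parameters are hypothetical, not part of the WhisperX API.

```python
from typing import List, Optional, Tuple

# Hypothetical VAD output: one (start_s, end_s) pair per detected speech region.
Interval = Tuple[float, float]


def merge_speech(regions: List[Interval], max_chunk_s: float = 30.0) -> List[Interval]:
    """Greedily pack VAD speech regions into chunks of at most max_chunk_s seconds.

    Short pauses inside a chunk are kept (the chunk spans them); silence between
    chunks is never sent to the model. A region longer than max_chunk_s is cut
    into fixed-size pieces.
    """
    chunks: List[Interval] = []
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for start, end in sorted(regions):
        # Cut over-long regions so no chunk exceeds the model's input window.
        while end - start > max_chunk_s:
            if cur_start is not None:
                chunks.append((cur_start, cur_end))
                cur_start = cur_end = None
            chunks.append((start, start + max_chunk_s))
            start += max_chunk_s
        if cur_start is None:
            cur_start, cur_end = start, end
        elif max(cur_end, end) - cur_start <= max_chunk_s:
            cur_end = max(cur_end, end)  # merge: chunk still fits the window
        else:
            chunks.append((cur_start, cur_end))  # flush and start a new chunk
            cur_start, cur_end = start, end
    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks


if __name__ == "__main__":
    speech = [(0.0, 10.0), (100.0, 110.0)]
    chunks = merge_speech(speech)
    print(chunks)  # [(0.0, 10.0), (100.0, 110.0)]
    print(sum(e - s for s, e in chunks), "s of audio sent to the model")
```

With two 10-second speech regions separated by 90 seconds of silence, only 20 seconds of audio reach the model, which is where the "faster results" on sparse recordings come from.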
If your team profile matches one of these, WhisperX is a strong fit out of the box.
Generating multi-speaker transcripts
Auto-generating SRT/VTT subtitles
Transcribing customer calls at scale
Meeting transcription (Zoom transcripts, Teams summaries)
Captioning content
Building voice-to-text pipelines
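For the subtitle and multi-speaker use cases, the last step is turning word-level timestamps into subtitle cues. Here is a minimal SRT writer, assuming each word is a dict with `word`, `start`, `end` and an optional `speaker` label. The cue rules (`max_chars`, `max_gap_s`, a break on speaker change) are illustrative defaults, not WhisperX settings, and both function names are hypothetical. WebVTT output differs mainly in the `WEBVTT` header line and a `.` instead of `,` before the milliseconds.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_chars: int = 42, max_gap_s: float = 1.0) -> str:
    """Group timed words into numbered SRT cues.

    A new cue starts when the next word would push the line past max_chars,
    when the pause before it exceeds max_gap_s, or when the speaker changes.
    """
    cues: List[List[Dict]] = []
    for w in words:
        if cues:
            last = cues[-1]
            text_len = len(" ".join(x["word"] for x in last)) + 1 + len(w["word"])
            same_speaker = last[-1].get("speaker") == w.get("speaker")
            gap = w["start"] - last[-1]["end"]
            if text_len <= max_chars and gap <= max_gap_s and same_speaker:
                last.append(w)
                continue
        cues.append([w])

    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(x["word"] for x in cue)
        speaker = cue[0].get("speaker")
        if speaker:
            text = f"[{speaker}] {text}"  # prefix the diarization label, if any
        start, end = srt_timestamp(cue[0]["start"]), srt_timestamp(cue[-1]["end"])
        blocks.append(f"{i}\n{start} --> {end}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    words = [
        {"word": "Hello", "start": 0.0, "end": 0.4, "speaker": "SPEAKER_00"},
        {"word": "world.", "start": 0.5, "end": 0.9, "speaker": "SPEAKER_00"},
        {"word": "Hi!", "start": 3.0, "end": 3.3, "speaker": "SPEAKER_01"},
    ]
    print(words_to_srt(words))
```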
When evaluating self-hosted options for this category, here are the dimensions on which WhisperX consistently lands above the alternatives.
The stack you'll plug WhisperX into — services, protocols, and adjacent apps in the BluixApps catalog.
/root/bluixapps/whisperx.txt
bluixapps_ensure_nvidia_runtime
Operational guidance from running this in production: what to lock down, and what surprises people.