CatalogStacksModulesSaaSMobileLabs → Become a partner
HomeCatalog🤖 AI / LLMWhisperX
Screenshot of WhisperX

// official site: github.com ↗

AI / LLM · PRO TIER

WhisperXpro

WhisperX is the production-grade enhancement of OpenAI Whisper — adds 70× real-time inference speed, word-level timestamps via forced alignment, and speaker diarization via pyannote-audio. The standard for serious transcription + subtitling pipelines.

🤖 AI / LLM Min 6144 MB RAM Port 8002 (http) Tier pro
// What it is

A closer look.

WhisperX is the production-grade enhancement of OpenAI Whisper — adds 70× real-time inference speed, word-level timestamps via forced alignment, and speaker diarization via pyannote-audio. The standard for serious transcription + subtitling pipelines.

Where vanilla Whisper is research code, WhisperX is the engineering-grade version.

// Use cases

What it's for.

Concrete scenarios where teams pick WhisperX over the SaaS alternative.

Audio/video transcription

at production speed

Multi-speaker labelling

(who said what)

Word-level timestamps

for precise subtitle generation

VAD pre-processing

skip silence, faster results

Batch processing

entire podcasts, full meetings

Multilingual

99 languages from Whisper backbone

// Who it's for

Built for these teams.

If your team profile matches one of these, WhisperX is a strong fit out of the box.

Profile A

Podcast producers

generating multi-speaker transcripts

Profile B

Video platforms

auto-generating SRT/VTT subtitles

Profile C

Call center analytics

transcribing customer calls at scale

Profile D

Meeting note systems

(Zoom transcripts, Teams summaries)

Profile E

Accessibility teams

captioning content

Profile F

AI app developers

building voice-to-text pipelines

// Differentiators

Why teams pick WhisperX.

When evaluating self-hosted options for this category, here are the dimensions on which WhisperX consistently lands above the alternatives.

  • BSD-4-Clause — fully open
  • Faster than vanilla Whisper — by 10-70× (batched + VAD + faster-whisper backend)
  • Word-level timestamps — forced alignment via wav2vec2
  • Diarization — via pyannote-audio integration
  • Production-tested — by major podcasts, transcription services
  • Better than commercial APIs — for many use cases at zero per-minute cost
// Integrations

Connects to.

The stack you'll plug WhisperX into — services, protocols, and adjacent apps in the BluixApps catalog.

REST API server
POST audio → JSON/SRT/VTT
Whisper backbone
uses faster-whisper or original Whisper
pyannote-audio
for speaker diarization
VAD
Silero or pyannote VAD preprocessor
Pair with
Ollama/vLLM for "transcribe → summarize" pipelines
Output formats
JSON, SRT, VTT, TSV, TXT
// Adoption & deployment

Notable users & community

  • 17k+ GitHub stars (parent WhisperX)
  • Used by podcast platforms (Podscribe-style services)
  • Featured in major transcription tooling roundups
  • Active research community + commercial integrations
  • Multiple production-tested deployments

What we ship

  • Docker (ghcr.io/jim60105/whisperx:latest) with NVIDIA runtime
  • Persistent volumes: audio (input), output (transcripts), models cache
  • Port 8002 mapped
  • HF_TOKEN environment variable for diarization
  • Install report at /root/bluixapps/whisperx.txt
  • Sample curl commands for transcription + diarization
  • Output format selection guide
  • Use case examples (podcast, video subtitles, meetings)
  • Pairing suggestions (LLM for summarization)
  • GPU pre-flight check via bluixapps_ensure_nvidia_runtime
  • Backup hook covers audio + output + models cache
// Tips & operations

Run it properly.

Operational guidance from running this in production — what to lock down, what surprises people.

// PERFORMANCE
VRAM
6 GB GPU for large-v3 model; 4 GB CPU fallback works
// SECURITY
Speed
70× real-time on RTX 4090; 10-30× on RTX 3060
// OPERATIONS
Diarization
requires HF_TOKEN + accept pyannote terms
// RELIABILITY
VAD preprocessing
enables "skip silence" — massive speed-up on podcasts
// DEPLOYMENT
Language detection
automatic or specify language for accuracy
// SCALING
Output formats
// MAINTENANCE
Best languages
EN, ES, FR, DE, IT, PT, RU, ZH, JA
// COSTS
Production
batch via API, async queue, file upload limit ~100 MB
6144
// min ram (MB)
15
// min disk (GB)
8002
// access port
http
// protocol
pro
// bluixapps tier

Project resources

Official sitegithub.com ↗