AI / LLM · PRO TIER

WhisperXpro

WhisperX is the production-grade enhancement of OpenAI Whisper — adds 70× real-time inference speed, word-level timestamps via forced alignment, and speaker diarization via pyannote-audio. The standard for serious transcription + subtitling pipelines.

Install via WHMCS → Visit github.com ↗

🤖 AI / LLM Min 6144 MB RAM Port 8002 (http) Tier pro

// What it is

A closer look.

WhisperX is the production-grade enhancement of OpenAI Whisper — adds 70× real-time inference speed, word-level timestamps via forced alignment, and speaker diarization via pyannote-audio. The standard for serious transcription + subtitling pipelines.

Where vanilla Whisper is research code, WhisperX is the engineering-grade version.

// Use cases

What it's for.

Concrete scenarios where teams pick WhisperX over the SaaS alternative.

◆

Audio/video transcription

at production speed

◈

Multi-speaker labelling

(who said what)

◇

Word-level timestamps

for precise subtitle generation

▣

VAD pre-processing

skip silence, faster results

▦

Batch processing

entire podcasts, full meetings

▩

Multilingual

99 languages from Whisper backbone

// Who it's for

Built for these teams.

If your team profile matches one of these, WhisperX is a strong fit out of the box.

Profile A

Podcast producers

generating multi-speaker transcripts

Profile B

Video platforms

auto-generating SRT/VTT subtitles

Profile C

Call center analytics

transcribing customer calls at scale

Profile D

Meeting note systems

(Zoom transcripts, Teams summaries)

Profile E

Accessibility teams

captioning content

Profile F

AI app developers

building voice-to-text pipelines

// Differentiators

Why teams pick WhisperX.

When evaluating self-hosted options for this category, here are the dimensions on which WhisperX consistently lands above the alternatives.

✓BSD-4-Clause — fully open
✓Faster than vanilla Whisper — by 10-70× (batched + VAD + faster-whisper backend)
✓Word-level timestamps — forced alignment via wav2vec2
✓Diarization — via pyannote-audio integration
✓Production-tested — by major podcasts, transcription services
✓Better than commercial APIs — for many use cases at zero per-minute cost

// Integrations

Connects to.

The stack you'll plug WhisperX into — services, protocols, and adjacent apps in the BluixApps catalog.

◇

REST API server

POST audio → JSON/SRT/VTT

◈

Whisper backbone

uses faster-whisper or original Whisper

◆

pyannote-audio

for speaker diarization

▣

VAD

Silero or pyannote VAD preprocessor

▦

Pair with

Ollama/vLLM for "transcribe → summarize" pipelines

▩

Output formats

JSON, SRT, VTT, TSV, TXT

// Adoption & deployment

Notable users & community

17k+ GitHub stars (parent WhisperX)
Used by podcast platforms (Podscribe-style services)
Featured in major transcription tooling roundups
Active research community + commercial integrations
Multiple production-tested deployments

What we ship

Docker (ghcr.io/jim60105/whisperx:latest) with NVIDIA runtime
Persistent volumes: audio (input), output (transcripts), models cache
Port 8002 mapped
HF_TOKEN environment variable for diarization
Install report at /root/bluixapps/whisperx.txt
Sample curl commands for transcription + diarization
Output format selection guide
Use case examples (podcast, video subtitles, meetings)
Pairing suggestions (LLM for summarization)
GPU pre-flight check via bluixapps_ensure_nvidia_runtime
Backup hook covers audio + output + models cache

// Tips & operations

Run it properly.

Operational guidance from running this in production — what to lock down, what surprises people.

// PERFORMANCE

VRAM

6 GB GPU for large-v3 model; 4 GB CPU fallback works

// SECURITY

Speed

70× real-time on RTX 4090; 10-30× on RTX 3060

// OPERATIONS

Diarization

requires HF_TOKEN + accept pyannote terms

// RELIABILITY

VAD preprocessing

enables "skip silence" — massive speed-up on podcasts

// DEPLOYMENT

Language detection

automatic or specify language for accuracy

// SCALING

Output formats

// MAINTENANCE

Best languages

EN, ES, FR, DE, IT, PT, RU, ZH, JA

// COSTS

Production

batch via API, async queue, file upload limit ~100 MB

6144

// min ram (MB)

// min disk (GB)

8002

// access port

http

// protocol

pro

// bluixapps tier

// Alternatives in AI / LLM

Compare with

Project resources

Official sitegithub.com ↗