Text-to-video
high-fidelity short videos from prompts
Official site: github.com
CogVideoX is THUDM's open-source text-to-video model family (Tsinghua University and Zhipu AI, China). It generates 6-10 second videos at 720×480 with strong temporal consistency and plausible physics. It comes in multiple model sizes (2B and 5B parameters) for different VRAM budgets. The code and the 2B weights are licensed under Apache 2.0; the 5B weights use the separate CogVideoX License.
One of the highest-quality open video models, and an open-source answer to closed competitors such as Runway and Pika.
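As a rough illustration of prompt-to-video generation with the 2B or 5B checkpoints, here is a minimal sketch using the Hugging Face diffusers `CogVideoXPipeline`. Several details are assumptions rather than anything stated in this listing: the model IDs `THUDM/CogVideoX-2b` and `THUDM/CogVideoX-5b`, the 8 fps output rate, the float16/bfloat16 choice per model size, and the helper names `frame_count` and `generate`, which are hypothetical. It also assumes `torch` and `diffusers` are installed and a CUDA GPU is available.

```python
def frame_count(seconds: int, fps: int = 8) -> int:
    """Frames for a clip of the given length, plus the initial frame.

    Assumption: 8 fps output, so a 6-second clip is 6 * 8 + 1 = 49 frames.
    """
    return seconds * fps + 1


def generate(prompt: str, out_path: str = "output.mp4",
             model_id: str = "THUDM/CogVideoX-2b",
             seconds: int = 6, fps: int = 8) -> str:
    # Heavy imports stay inside the function so frame_count() can be
    # used on a machine without the GPU stack installed.
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    # Assumption: float16 for the 2B checkpoint, bfloat16 for larger ones.
    dtype = torch.float16 if model_id.endswith("2b") else torch.bfloat16
    pipe = CogVideoXPipeline.from_pretrained(model_id, torch_dtype=dtype)

    # Both calls trade speed for lower peak VRAM use.
    pipe.enable_model_cpu_offload()
    pipe.vae.enable_tiling()

    frames = pipe(
        prompt=prompt,
        num_frames=frame_count(seconds, fps),
        num_inference_steps=50,
        guidance_scale=6.0,
    ).frames[0]

    # Write the generated frames to an MP4 file at the same frame rate.
    export_to_video(frames, out_path, fps=fps)
    return out_path
```

Calling `generate("A paper boat drifting down a rain-soaked street at dusk")` would download the weights on first use and write `output.mp4`. Passing `model_id="THUDM/CogVideoX-5b"` switches to the larger checkpoint at the cost of more VRAM and time.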
Concrete scenarios where teams pick CogVideoX over the SaaS alternative.
high-fidelity short videos from prompts
extend a still image into motion
re-style or extend existing video
campaign visuals, social ads
bring concepts to life
high-quality storyboard videos
If your team profile matches one of these, CogVideoX is a strong fit out of the box.
producing high-quality video content
offering video generation to clients
studying video generation
generating campaign visuals
offering premium video gen tier
When evaluating self-hosted options for this category, here are the dimensions on which CogVideoX consistently lands above the alternatives.
The stack you'll plug CogVideoX into — services, protocols, and adjacent apps in the BluixApps catalog.
THUDM/CogVideo repo
/root/bluixapps/cogvideox.txt
bluixapps_ensure_nvidia_runtime
Operational guidance from running this in production: what to lock down, what surprises people.