CatalogStacksModulesSaaSMobileLabs → Become a partner
HomeCatalog🤖 AI / LLMAphrodite Engine
Screenshot of Aphrodite Engine

// official site: github.com ↗

AI / LLM · PRO TIER

Aphrodite Enginepro

Aphrodite Engine is a vLLM fork by Pygmalion AI that adds advanced sampling methods (top-a, min-p, mirostat, smoothing factor), broader quantization (EXL2, GGUF, AQLM, SqueezeLLM), and KoboldAI API compatibility. Designed for roleplay, creative writing, and exploration scenarios that need finer sampling control than vanilla vLLM provides.

🤖 AI / LLM Min 16384 MB RAM Port 2242 (http) Tier pro
// What it is

A closer look.

Aphrodite Engine is a vLLM fork by Pygmalion AI that adds advanced sampling methods (top-a, min-p, mirostat, smoothing factor), broader quantization (EXL2, GGUF, AQLM, SqueezeLLM), and KoboldAI API compatibility. Designed for roleplay, creative writing, and exploration scenarios that need finer sampling control than vanilla vLLM provides.

// Use cases

What it's for.

Concrete scenarios where teams pick Aphrodite Engine over the SaaS alternative.

Creative writing pipelines

advanced samplers for varied output

Roleplay AI

preserving character voice across long conversations

GGUF / EXL2

quantization support (more than vLLM)

Triple-API compatibility

OpenAI + KoboldAI + native

Karras schedulers

alternative sampling distributions

Mirostat / smoothing

target perplexity sampling

// Who it's for

Built for these teams.

If your team profile matches one of these, Aphrodite Engine is a strong fit out of the box.

Profile A

AI roleplay platforms

(character.ai-style)

Profile B

Interactive fiction creators

needing varied LLM output

Profile C

Pygmalion AI community

members and their products

Profile D

Power users

wanting more sampler control than vLLM

Profile E

Researchers

exploring novel sampling methods

// Differentiators

Why teams pick Aphrodite Engine.

When evaluating self-hosted options for this category, here are the dimensions on which Aphrodite Engine consistently lands above the alternatives.

  • AGPL-3.0 — fully open
  • Advanced samplers — not available in vanilla vLLM:
  • GGUF + EXL2 quantization — broader than vLLM's GPTQ/AWQ
  • KoboldAI API — drop-in for SillyTavern, KoboldHorde, RisuAI
  • Pygmalion community — models work natively
// Integrations

Connects to.

The stack you'll plug Aphrodite Engine into — services, protocols, and adjacent apps in the BluixApps catalog.

OpenAI v1
/v1/chat/completions, /v1/completions
KoboldAI
/api/v1/generate (for SillyTavern, RisuAI, etc.)
Aphrodite native
/v1/internal/* for advanced samplers
Quantization
GGUF (llama.cpp), EXL2 (ExLlamaV2), AWQ, GPTQ, SqueezeLLM, Bitsandbytes, AQLM
Pair with
SillyTavern (canonical roleplay UI), Pygmalion-tuned models
Multi-GPU
--tensor-parallel-size N
// Adoption & deployment

Notable users & community

  • 1.5k+ GitHub stars
  • PygmalionAI community + commercial Pygmalion service
  • Used in roleplay AI platforms
  • Active development by Alpin + contributors
  • Featured in r/LocalLLaMA roleplay sub-communities

What we ship

  • Docker (alpindale/aphrodite-engine:latest)
  • Default model: NousResearch/Meta-Llama-3.1-8B-Instruct (configurable)
  • Persistent volume: /opt/aphrodite/models (HF cache)
  • Port 2242 (Aphrodite default)
  • --launch-kobold-api for SillyTavern/RisuAI compatibility
  • Install report at /root/bluixapps/aphrodite.txt
  • Sample API call with advanced samplers (min_p, smoothing_factor)
  • Quantization format guide
  • HF_TOKEN environment variable
  • GPU pre-flight check via bluixapps_ensure_nvidia_runtime
  • Backup hook covers model cache
// Tips & operations

Run it properly.

Operational guidance from running this in production — what to lock down, what surprises people.

// PERFORMANCE
GGUF for diverse hardware
works on consumer GPUs without modern features
// SECURITY
EXL2 for speed
fastest quantization format, ExLlamaV2 lineage
// OPERATIONS
Sampler combos for RP
// RELIABILITY
Mirostat
target perplexity sampling, set mirostat: 1, mirostat_tau: 5
// DEPLOYMENT
Multi-shard
tensor parallel like vLLM
// SCALING
vs vLLM
same core engine, Aphrodite adds samplers + GGUF/EXL2
// MAINTENANCE
vs TGI
Aphrodite for RP/creative, TGI for HF integration
16384
// min ram (MB)
40
// min disk (GB)
2242
// access port
http
// protocol
pro
// bluixapps tier

Project resources

Official sitegithub.com ↗