Home›Catalog›🤖 AI / LLM›Aphrodite Engine

AI / LLM · PRO TIER

Aphrodite Enginepro

Aphrodite Engine is a vLLM fork by Pygmalion AI that adds advanced sampling methods (top-a, min-p, mirostat, smoothing factor), broader quantization (EXL2, GGUF, AQLM, SqueezeLLM), and KoboldAI API compatibility. Designed for roleplay, creative writing, and exploration scenarios that need finer sampling control than vanilla vLLM provides.

Install via WHMCS → Visit github.com ↗

🤖 AI / LLM Min 16384 MB RAM Port 2242 (http) Tier pro

// What it is

A closer look.

Aphrodite Engine is a vLLM fork by Pygmalion AI that adds advanced sampling methods (top-a, min-p, mirostat, smoothing factor), broader quantization (EXL2, GGUF, AQLM, SqueezeLLM), and KoboldAI API compatibility. Designed for roleplay, creative writing, and exploration scenarios that need finer sampling control than vanilla vLLM provides.

// Use cases

What it's for.

Concrete scenarios where teams pick Aphrodite Engine over the SaaS alternative.

◆

Creative writing pipelines

advanced samplers for varied output

◈

Roleplay AI

preserving character voice across long conversations

◇

GGUF / EXL2

quantization support (more than vLLM)

▣

Triple-API compatibility

OpenAI + KoboldAI + native

▦

Karras schedulers

alternative sampling distributions

▩

Mirostat / smoothing

target perplexity sampling

// Who it's for

Built for these teams.

If your team profile matches one of these, Aphrodite Engine is a strong fit out of the box.

Profile A

AI roleplay platforms

(character.ai-style)

Profile B

Interactive fiction creators

needing varied LLM output

Profile C

Pygmalion AI community

members and their products

Profile D

Power users

wanting more sampler control than vLLM

Profile E

Researchers

exploring novel sampling methods

// Differentiators

Why teams pick Aphrodite Engine.

When evaluating self-hosted options for this category, here are the dimensions on which Aphrodite Engine consistently lands above the alternatives.

✓AGPL-3.0 — fully open
✓Advanced samplers — not available in vanilla vLLM:
✓GGUF + EXL2 quantization — broader than vLLM's GPTQ/AWQ
✓KoboldAI API — drop-in for SillyTavern, KoboldHorde, RisuAI
✓Pygmalion community — models work natively

// Integrations

Connects to.

The stack you'll plug Aphrodite Engine into — services, protocols, and adjacent apps in the BluixApps catalog.

◇

OpenAI v1

/v1/chat/completions, /v1/completions

◈

KoboldAI

/api/v1/generate (for SillyTavern, RisuAI, etc.)

◆

Aphrodite native

/v1/internal/* for advanced samplers

▣

Quantization

GGUF (llama.cpp), EXL2 (ExLlamaV2), AWQ, GPTQ, SqueezeLLM, Bitsandbytes, AQLM

▦

Pair with

SillyTavern (canonical roleplay UI), Pygmalion-tuned models

▩

Multi-GPU

--tensor-parallel-size N

// Adoption & deployment

Notable users & community

1.5k+ GitHub stars
PygmalionAI community + commercial Pygmalion service
Used in roleplay AI platforms
Active development by Alpin + contributors
Featured in r/LocalLLaMA roleplay sub-communities

What we ship

Docker (alpindale/aphrodite-engine:latest)
Default model: NousResearch/Meta-Llama-3.1-8B-Instruct (configurable)
Persistent volume: /opt/aphrodite/models (HF cache)
Port 2242 (Aphrodite default)
--launch-kobold-api for SillyTavern/RisuAI compatibility
Install report at /root/bluixapps/aphrodite.txt
Sample API call with advanced samplers (min_p, smoothing_factor)
Quantization format guide
HF_TOKEN environment variable
GPU pre-flight check via bluixapps_ensure_nvidia_runtime
Backup hook covers model cache

// Tips & operations

Run it properly.

Operational guidance from running this in production — what to lock down, what surprises people.

// PERFORMANCE

GGUF for diverse hardware

works on consumer GPUs without modern features

// SECURITY

EXL2 for speed

fastest quantization format, ExLlamaV2 lineage

// OPERATIONS

Sampler combos for RP

// RELIABILITY

Mirostat

target perplexity sampling, set mirostat: 1, mirostat_tau: 5

// DEPLOYMENT

Multi-shard

tensor parallel like vLLM

// SCALING

vs vLLM

same core engine, Aphrodite adds samplers + GGUF/EXL2

// MAINTENANCE

vs TGI

Aphrodite for RP/creative, TGI for HF integration

16384

// min ram (MB)

// min disk (GB)

2242

// access port

http

// protocol

pro

// bluixapps tier

// Alternatives in AI / LLM

Compare with

Project resources

Official sitegithub.com ↗