AI / LLM · PRO TIER

LiteLLMpro

LiteLLM is an OpenAI-compatible proxy that fronts 100+ LLM providers — OpenAI, Anthropic, Google, Mistral, Cohere, Hugging Face, Ollama, AWS Bedrock, Azure, and dozens more. Your code calls litellm.completion() (or the proxy's OpenAI-compatible REST endpoint) and LiteLLM routes to the actual provider with retries, fallbacks, cost tracking, and load balancing.

Install via WHMCS → Visit litellm.ai ↗

🤖 AI / LLM Min 1024 MB RAM Port 4000 (http) Tier pro

// What it is

A closer look.

LiteLLM is an OpenAI-compatible proxy that fronts 100+ LLM providers — OpenAI, Anthropic, Google, Mistral, Cohere, Hugging Face, Ollama, AWS Bedrock, Azure, and dozens more. Your code calls litellm.completion() (or the proxy's OpenAI-compatible REST endpoint) and LiteLLM routes to the actual provider with retries, fallbacks, cost tracking, and load balancing.

It's the "LLM router" pattern — the one piece of infrastructure that makes provider-switching painless.

// Use cases

What it's for.

Concrete scenarios where teams pick LiteLLM over the SaaS alternative.

◆

Provider-agnostic LLM apps

write code once, switch backends via config

◈

Cost optimization

route cheap queries to cheaper models automatically

◇

High availability

fallback chains across providers when one is down

▣

Budget enforcement

per-user / per-team spend limits with alerts

▦

Migration

gradual swap from OpenAI to Anthropic without code changes

// Who it's for

Built for these teams.

If your team profile matches one of these, LiteLLM is a strong fit out of the box.

Profile A

AI platform teams

standardizing LLM access across the org

Profile B

Enterprises

needing audit trail + budget enforcement on LLM usage

Profile C

Multi-LLM apps

wanting to A/B test providers without code refactor

Profile D

Cost-conscious startups

routing dev traffic to cheap providers, prod to premium

Profile E

Resellers

offering "OpenAI-compatible API" while proxying to multiple backends

// Differentiators

Why teams pick LiteLLM.

When evaluating self-hosted options for this category, here are the dimensions on which LiteLLM consistently lands above the alternatives.

✓OpenAI-compatible API — every OpenAI SDK works without code changes
✓100+ providers — most comprehensive LLM router in OSS
✓Cost + token tracking — built-in spend analytics per key
✓Routing rules — match queries to models by cost, latency, region
✓MIT license — clean for commercial / production
✓Active development — releases multiple times per week

// Integrations

Connects to.

The stack you'll plug LiteLLM into — services, protocols, and adjacent apps in the BluixApps catalog.

◇

LLM providers

OpenAI, Anthropic, Google, AWS Bedrock, Azure, Mistral, Cohere, HuggingFace, Ollama, vLLM, custom

◈

Observability

Langfuse, Helicone, OpenTelemetry, Prometheus metrics

◆

Caching

Redis-backed response cache to avoid duplicate API calls

▣

Auth

JWT, API keys with per-key rate limits + budgets

▦

Database

Postgres for spend tracking, key management, audit log

▩

Admin UI

built-in dashboard for keys, costs, model usage

▼

SDK clients

Python (native), JS via OpenAI SDK pointed at proxy

// Adoption & deployment

Notable users & community

15k+ GitHub stars
Adopted by major AI platform teams as standard LLM router
Featured in enterprise AI architecture guides
Backed by BerriAI with active commercial enterprise offering
Strong Discord, weekly releases, predictable roadmap

What we ship

Docker compose: LiteLLM proxy + Postgres + Redis
Pinned ghcr.io/berriai/litellm:latest (locked to release tag)
HTTPS via Let's Encrypt; admin UI with random master key
Pre-configured for Ollama detection on same VPS
Postgres for spend tracking + key persistence
Redis for response caching
Backup hook covers Postgres (keys + spend history)

// Tips & operations

Run it properly.

Operational guidance from running this in production — what to lock down, what surprises people.

// PERFORMANCE

Always run with database

without Postgres, spend tracking + key management don't persist

// SECURITY

Set budgets per key

without budget caps, a single buggy client can rack up huge bills

// OPERATIONS

Use response caching

Redis cache on identical prompts saves significant cost

// RELIABILITY

Monitor via Langfuse

built-in LiteLLM → Langfuse integration captures every call for debugging

// DEPLOYMENT

Health check models

LiteLLM's health endpoint pings each provider; integrate with uptime monitoring

// SCALING

Update frequently

provider APIs change; LiteLLM releases track them; stale versions = silent failures

1024

// min ram (MB)

// min disk (GB)

4000

// access port

http

// protocol

pro

// bluixapps tier

// Alternatives in AI / LLM

Compare with

Project resources

Official sitelitellm.ai ↗