AI / LLM · PRO TIER

Arize Phoenix

Arize Phoenix is an LLM observability and tracing platform. It is OpenTelemetry-native, captures every LLM call (prompt, response, tokens, latency, cost), and visualizes the resulting traces for debugging. It is built specifically for AI applications, where a stack trace cannot tell you why the model produced a bad answer.

🤖 AI / LLM · Min 1024 MB RAM · Port 6006 (http) · Tier pro
// What it is

A closer look.

Arize Phoenix is the closest self-hostable equivalent to LangSmith and Helicone. It is developed by Arize AI, a company that specializes in ML observability.

// Use cases

What it's for.

Concrete scenarios where teams pick Arize Phoenix over the SaaS alternative.

LLM debugging

replay traces to understand why responses were wrong

Cost analysis

track token spend per prompt template, per user, per feature

Latency profiling

find slow chain steps in agent workflows

Eval frameworks

run benchmarks on LLM outputs with consistent metrics

A/B testing prompts

compare prompt versions on real traffic
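
As a rough illustration of the cost analysis and latency profiling scenarios above, the sketch below aggregates simplified span records in plain Python. The `Span` fields, the helper names, and the per-token prices are all hypothetical placeholders, not Phoenix's schema or real provider rates; in practice the records would come from exported trace data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified record of one traced step (not Phoenix's schema)."""
    template: str           # prompt template that produced the call
    step: str                # chain or agent step name
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float


# Placeholder prices in USD per 1,000 tokens; substitute your provider's rates.
PROMPT_PRICE_PER_1K = 0.50
COMPLETION_PRICE_PER_1K = 1.50


def cost_usd(span):
    """Estimated cost of a single span under the placeholder prices."""
    return (
        span.prompt_tokens * PROMPT_PRICE_PER_1K
        + span.completion_tokens * COMPLETION_PRICE_PER_1K
    ) / 1000


def cost_by_template(spans):
    """Total estimated spend per prompt template."""
    totals = defaultdict(float)
    for span in spans:
        totals[span.template] += cost_usd(span)
    return dict(totals)


def slowest_step(spans):
    """Step name with the highest cumulative latency (needs at least one span)."""
    totals = defaultdict(float)
    for span in spans:
        totals[span.step] += span.latency_ms
    return max(totals, key=totals.get)


example_spans = [
    Span("summarize", "llm_call", 1000, 200, 900.0),
    Span("summarize", "retrieval", 0, 0, 150.0),
    Span("classify", "llm_call", 400, 0, 300.0),
]
spend = cost_by_template(example_spans)
bottleneck = slowest_step(example_spans)
```

The same grouping works for per-user or per-feature spend if the records carry those attributes; only the grouping key changes.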

// Who it's for

Built for these teams.

If your team profile matches one of these, Arize Phoenix is a strong fit out of the box.

Profile A

AI engineers

debugging RAG and agent pipelines in production

Profile B

ML observability teams

standardizing LLM monitoring across products

Profile C

Product teams

running prompt experiments with measurable metrics

Profile D

Compliance / audit teams

maintaining LLM call audit trails

Profile E

AI platform teams

providing observability as a service to internal teams

// Differentiators

Why teams pick Arize Phoenix.

When evaluating self-hosted options in this category, these are the dimensions on which Arize Phoenix tends to stand out.

  • OpenTelemetry-native: standard tracing protocol, integrates with any observability stack
  • Elastic License 2.0 (source-available, not Apache 2.0): free to self-host; review the repository's LICENSE file for the restrictions on offering it as a managed service
  • LangChain / LlamaIndex first-class: auto-instrumentation, no manual tracing code
  • Evals built-in: LLM eval framework included
  • Self-hosted: keep prompts and responses on your infrastructure
  • Active development: backed by Arize AI's commercial product
// Integrations

Connects to.

The stack you'll plug Arize Phoenix into — services, protocols, and adjacent apps in the BluixApps catalog.

Auto-instrumentation: LangChain, LlamaIndex, OpenAI SDK, LiteLLM, Bedrock
OpenTelemetry: send traces from any OpenTelemetry-compatible SDK
LLM providers: captures calls to OpenAI, Anthropic, Ollama, and any OpenAI-compatible API
Eval frameworks: Phoenix Evals for built-in benchmarking
Datasets API: curate prompt/response datasets for fine-tuning
Webhook export: push traces to downstream systems
REST API: programmatic access to trace data
// Adoption & deployment

Notable users & community

  • 5k+ GitHub stars
  • Adopted by Arize AI customers + LangChain users
  • Active Slack community
  • Featured in LLM observability stack guides
  • Strong roadmap with continuous feature additions

What we ship

  • Docker compose: Phoenix + Postgres + persistent storage
  • arizephoenix/phoenix image (the catalog lists the `latest` tag, which floats; pin a specific release tag if you need reproducible deployments)
  • HTTPS via Let's Encrypt
  • Pre-configured OpenTelemetry endpoint for trace ingestion
  • Auto-detects LiteLLM on same VPS for combined observability stack
  • Persistent volume for trace storage
  • Backup hook covers Postgres + trace exports
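
To make the pre-configured trace endpoint concrete, here is a minimal sketch of the settings an application would need in order to send traces to it. It assumes Phoenix accepts OTLP over HTTP at the standard `/v1/traces` path on the access port listed on this page (6006); the helper name `phoenix_otlp_env` and the example hostnames are hypothetical, and the two variable names are the standard OpenTelemetry exporter settings rather than anything specific to this catalog.

```python
from __future__ import annotations

from typing import Dict, Optional


def phoenix_otlp_env(
    host: str, port: Optional[int] = 6006, https: bool = False
) -> Dict[str, str]:
    """Build OpenTelemetry exporter settings for a Phoenix deployment.

    Hypothetical helper: pass port=None when a reverse proxy terminates
    HTTPS on the default port.
    """
    scheme = "https" if https else "http"
    netloc = host if port is None else f"{host}:{port}"
    return {
        # Standard OpenTelemetry variable for the trace-specific OTLP endpoint.
        "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT": f"{scheme}://{netloc}/v1/traces",
        # OTLP over HTTP with protobuf payloads.
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    }


local_env = phoenix_otlp_env("localhost")
proxied_env = phoenix_otlp_env("phoenix.example.com", port=None, https=True)
```

Exporting these values into the application's environment before it starts is usually enough for an OpenTelemetry SDK exporter to pick them up; confirm the exact endpoint against your own deployment.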
// Tips & operations

Run it properly.

Operational guidance from running this in production — what to do before you scale, what to lock down, what surprises people.

// PERFORMANCE
Set up auto-instrumentation early
manual tracing is tedious; auto-instrumentation gives 80% coverage in minutes
// SECURITY
Mind storage growth
every LLM call captured; cold storage / TTL policies essential at scale
// OPERATIONS
Sampling for production
high-traffic apps don't need 100% trace sampling; reduce to 10-50% to control cost
// RELIABILITY
Use the Evals framework
Phoenix Evals provides built-in LLM-as-judge evaluators, which saves writing eval code from scratch
// DEPLOYMENT
Auth via reverse proxy
Phoenix has minimal built-in auth; protect with Authelia / OAuth proxy
// SCALING
Persistent storage
traces are valuable data; mount volume from day one
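
The sampling and storage-growth tips interact, so a small sketch may help when sizing a deployment. It shows one common way to sample deterministically by hashing the trace ID (OpenTelemetry SDKs typically offer a ratio-based sampler built on the same idea), plus a back-of-the-envelope storage estimate. The function names and the example figures (calls per day, bytes per call) are assumptions for illustration, not measurements of Phoenix.

```python
import hashlib


def keep_trace(trace_id: str, ratio: float) -> bool:
    """Deterministically keep roughly `ratio` of traces, keyed on the trace ID.

    The same trace ID always gets the same decision, so every span of a
    trace is kept or dropped together.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(ratio * 2**64)


def daily_storage_gb(calls_per_day: int, avg_bytes_per_call: int, ratio: float) -> float:
    """Rough daily trace volume in GB (10**9 bytes) at a given sampling ratio."""
    return calls_per_day * avg_bytes_per_call * ratio / 1e9


# Assumed workload: 1,000,000 calls/day at about 4,000 bytes per stored call.
full_volume = daily_storage_gb(1_000_000, 4_000, 1.0)
sampled_volume = daily_storage_gb(1_000_000, 4_000, 0.25)
```

Because the decision depends only on the trace ID, raising the ratio later keeps every trace that was already being kept, which makes before-and-after comparisons easier.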
Min RAM: 1024 MB
Min disk: 5 GB
Access port: 6006
Protocol: http
BluixApps tier: pro
Docker image: arizephoenix/phoenix:latest (ports 6006:6006)

Project resources

Official site: github.com ↗
// Alternatives in AI / LLM

Compare with