HomeCatalog🤖 AI / LLMOllama
Screenshot of Ollama website

// screenshot of ollama.com ↗

AI / LLM · PRO TIER

Ollamapro

Ollama is the de-facto standard for running local large language models on your own hardware. A single binary + REST API that pulls models from a public registry (Llama 3.3, Mistral, Qwen, DeepSeek, Phi-4 and dozens more), handles quantization, GPU offload, and exposes a simple /api/generate and /api/chat interface that's API-compatible with the OpenAI SDK.

🤖 AI / LLM Min 4096 MB RAM Port 11434 (http) Tier pro
// What it is

A closer look.

Ollama is the de-facto standard for running local large language models on your own hardware. A single binary + REST API that pulls models from a public registry (Llama 3.3, Mistral, Qwen, DeepSeek, Phi-4 and dozens more), handles quantization, GPU offload, and exposes a simple /api/generate and /api/chat interface that's API-compatible with the OpenAI SDK.

It's the boring, reliable engine that every other self-hosted AI tool ends up integrating against — Open WebUI, AnythingLLM, LibreChat, Flowise, LangChain, LiteLLM, n8n.

// Use cases

What it's for.

Concrete scenarios where teams pick Ollama over the SaaS alternative.

Private chat assistants

internal company chat that never sends prompts to OpenAI

GDPR-compliant LLM access

EU customers in healthcare, legal, finance who can't push prompts to US clouds

Cost control at scale

predictable per-month VPS bill vs metered API spend

Air-gapped inference

on-prem or restricted-network environments

AI app development backbone

local dev loop for engineers building on top of LLMs

// Who it's for

Built for these teams.

If your team profile matches one of these, Ollama is a strong fit out of the box.

Profile A

AI developers & ML engineers

fast local dev loop, no API rate limits, no $ per token while building

Profile B

Privacy-bound enterprises

legal, healthcare, finance, gov teams forbidden from US-hosted LLM APIs

Profile C

Hosting providers

resellers offering "private AI VPS" to their customers as a higher-margin SKU

Profile D

Researchers & academics

evaluating open models without paying OpenAI / Anthropic per experiment

Profile E

Indie SaaS founders

predictable per-month VPS cost beats unpredictable per-token bills as traffic grows

// Differentiators

Why teams pick Ollama.

When evaluating self-hosted options for this category, here are the dimensions on which Ollama consistently lands above the alternatives.

  • OpenAI-compatible API — most existing client code works with a URL change
  • Massive model catalog — with one-command pulls (ollama pull llama3.3)
  • Apache 2.0 license — commercial use unencumbered
  • CPU-capable — for small models (TinyLlama, Phi-3) — runs on a $7/mo VPS for testing
  • GPU optional — but supported (CUDA, ROCm, Apple Metal) when you scale to 7B+ models
  • Single binary — operational simplicity, no Python venv hell
// Integrations

Connects to.

The stack you'll plug Ollama into — services, protocols, and adjacent apps in the BluixApps catalog.

Chat UIs
Open WebUI, AnythingLLM, LibreChat, Khoj all detect Ollama as first-class backend
Workflow builders
n8n + Flowise + Langflow + Typebot have native Ollama nodes
LLM SDKs
LangChain, LlamaIndex, Semantic Kernel, Haystack all support Ollama natively
OpenAI-proxy gateways
LiteLLM proxies Ollama as if it were OpenAI for legacy clients
IDE assistants
Continue.dev, Aider, Cline, Cody let devs hit local Ollama for code completion
Model formats
pulls Hugging Face GGUF directly; Modelfile lets you fork & customize
Embeddings endpoint
/api/embeddings works with Chroma, Qdrant, pgvector RAG stacks
// Adoption & deployment

Notable users & community

  • 110k+ GitHub stars, top of awesome-selfhosted AI category
  • Integrated by Continue.dev, Cline, Aider, LangChain, LlamaIndex, LiteLLM, OpenWebUI as a first-class backend
  • Active Discord, weekly model drops, strong macOS and Linux maintainer community
  • Backed by ollama.ai company — sustainable dev model with permissive Apache 2.0 license
  • Cited in countless "self-host your AI stack" guides on r/selfhosted, r/LocalLLaMA

What we ship

  • Docker compose stack: Ollama server + GPU passthrough config (off by default)
  • Pre-allocated model storage volume at /var/lib/ollama for persistence across upgrades
  • Pinned ollama/ollama:0.5.4 image, tracked weekly against upstream
  • HTTP-only by default on 127.0.0.1:11434; SSL + auth via Nginx Proxy Manager when paired with Open WebUI
  • Sizing guidance shipped in customer docs: 8 GB RAM minimum for 3B models, 16 GB recommended for 7B, GPU required for 13B+
  • Backup hook captures /var/lib/ollama before each update (models can be 4-20 GB — opt-in)
// Tips & operations

Run it properly.

Operational guidance from running this in production — what to do before you scale, what to lock down, what surprises people.

// PERFORMANCE
Pre-pull models
before exposing the service — first request triggers a multi-GB download that times out user calls
// SECURITY
Tune OLLAMA_KEEP_ALIVE
default unloads model after 5 min idle; set 1h for warm latency, -1 to keep forever
// OPERATIONS
Verify GPU detection
with ollama ps — if model says "100% CPU" you're not using your GPU; check NVIDIA drivers + CUDA toolkit
// RELIABILITY
Never expose Ollama directly
no built-in auth; always behind nginx + basic auth, OAuth proxy, or a chat-UI gateway
// DEPLOYMENT
Memory budget rule
7B Q4 ≈ 5 GB, 13B Q4 ≈ 9 GB, 70B Q4 ≈ 40 GB; size VPS accordingly
// SCALING
Disk cleanup
ollama list then ollama rm unused; models silently accumulate in /usr/share/ollama/.ollama/models
4096
// min ram (MB)
20
// min disk (GB)
11434
// access port
http
// protocol
pro
// bluixapps tier
11434:11434 · ollama/ollama:latest
// docker image

Project resources

Official siteollama.com ↗
// Alternatives in AI / LLM

Compare with