← Back to Payloads
Tutorial2026-06-25

LiteLLM: A Unified API Proxy for Every LLM in Your Stack in 5 Minutes

Three models, three SDKs, three error formats. LiteLLM is the OpenAI-compatible proxy that sits in front of Claude, GPT, Gemini, Ollama, and 100+ providers, giving your whole stack one endpoint, one SDK, built-in cost tracking, virtual keys, fallbacks, and budgets. Five-minute setup. Version-controlled YAML. Pays for itself the day you add your second backend.
Quick Access
Install command
$ mrt install tutorial
Browse related skills
LiteLLM: A Unified API Proxy for Every LLM in Your Stack in 5 Minutes

LiteLLM: A Unified API Proxy for Every LLM in Your Stack in 5 Minutes

You have three models in your stack. Claude for production chat. GPT-5 for the eval step. Ollama for local dev. Three SDKs. Three error formats. Three rate-limit headers. Every team I have worked with in 2026 hits the same wall: the model layer is now a multi-vendor tax, paid in onboarding time and the silent cost of an engineer who gives up and hard-codes the SDK that works.

Hi guys, Mr. Technology here.

The fix is LiteLLM — BerriAI's OpenAI-compatible proxy that sits in front of every model you call, speaks the OpenAI wire format, and routes to 100+ providers under one URL. Five minutes of setup gives you a single /v1/chat/completions endpoint that handles Claude, GPT, Gemini, Bedrock, Vertex, Mistral, Ollama, vLLM, and any local server. Your app talks OpenAI. LiteLLM translates.

The Setup

bash pip install 'litellm[proxy]'

Drop a YAML at the root of your repo:

```yaml model_list:

  • model_name: claude-sonnet

litellm_params: model: claude-sonnet-4-6 api_key: os.environ/ANTHROPIC_API_KEY

  • model_name: gpt-5

litellm_params: model: gpt-5 api_key: os.environ/OPENAI_API_KEY

  • model_name: ollama-qwen

litellm_params: model: ollama/qwen2.5-coder:32b api_base: http://localhost:11434 ```

Spin it up:

bash litellm --config litellm_config.yaml --port 4000

Your entire org hits http://localhost:4000/v1/chat/completions with the OpenAI SDK, picks model="claude-sonnet", and gets a response. Same code in prod, staging, eval, and your laptop's dev loop.

Why This Wins

One SDK, one error format. Every provider's quirks — Anthropic's prompt caching headers, Gemini's safety blocks, Bedrock's sigv4, Ollama's missing system role — get normalized at the proxy. Your app catches openai.APIError and openai.RateLimitError the same way for every upstream.

Cost tracking is built in. LiteLLM ships a virtual-key system where you issue per-team or per-engineer keys, set monthly USD budgets, and watch spend land in Postgres or SQLite. Every request logs model, prompt_tokens, completion_tokens, cost_usd, user, team. When finance asks what the eval pipeline cost last month, you answer in ten seconds with SELECT team, SUM(cost_usd) FROM litellm_logs GROUP BY team.

Fallbacks and budgets. Add litellm_params.fallbacks: [gpt-5] to the Claude model and a 429 from Anthropic transparently retries on GPT-5. Add rpm: 100 and tpm: 500000 per virtual key and a runaway script cannot burn the budget. Add timeout: 30 once and your fleet stops hanging on a dead Ollama.

Streaming, function calling, JSON mode, vision — LiteLLM translates every OpenAI feature to the upstream's equivalent. The application code does not change when you swap backends.

The Pattern That Saved Me

Three rules make this stick past the demo:

1. One config file, version-controlled. The YAML lives next to your docker-compose.yml. New models are a PR. "Who added the new key?" is git log litellm_config.yaml.

2. Virtual keys for humans, real keys for CI. Engineers get sk-litellm-<name> with a $200/month budget. CI gets a service-account key with a 10x budget and an auditable tag.

3. Run the proxy in Docker, not on your laptop. docker run -p 4000:4000 ghcr.io/berriai/litellm:main is the same command in dev, CI, and prod. Your laptop is a client, not a server.

When To Skip This

If you ship one model to one customer and never plan to add another, you do not need a proxy. The moment you have two backends — even Claude prod plus Ollama dev — the proxy pays for itself in onboarding time alone.

LiteLLM is the boring infrastructure move that turns a multi-vendor LLM stack from a coordination problem into a config file. Spend the five minutes. Stop hand-rolling provider adapters.

Mr. Technology


*LiteLLM 1.50+ (June 2026), Apache 2.0, github.com/BerriAI/litellm. Supports 100+ providers including OpenAI, Anthropic, Google, AWS Bedrock, Vertex, Azure, Mistral, Groq, Together, Fireworks, Ollama, vLLM, and any OpenAI-compatible endpoint. Built-in spend tracking via Postgres/SQLite/Prisma, virtual keys, per-key budgets, fallbacks, retries, streaming, function calling, JSON mode, vision. UI on :4000/ui when LITELLM_UI=1.*

Related Dispatches