Tutorial · 2026-06-02 · 3 min read

Setting Up LiteLLM as a Unified API Proxy: One Endpoint, Every LLM

Stop writing provider-specific code for OpenAI, Anthropic, and Google. LiteLLM is the open-source proxy that gives you one OpenAI-compatible endpoint for every LLM, with virtual keys and spend tracking built in. Twenty minutes from zero to a unified API.

If you're running an LLM application and you've written provider-specific code to handle OpenAI, Anthropic, and Google separately, you're wasting engineering time. The fix is LiteLLM: an open-source proxy that gives you a single OpenAI-compatible endpoint for every LLM you want to use.

Here's how to set it up properly.

Why You Need This

The cost of provider fragmentation isn't visible until you have it. Three different API client libraries, three different error formats, three different rate limit headers, three different streaming protocols, three different ways to handle function calling. When you want to add a new model, you reimplement all of it.

LiteLLM solves this with a proxy pattern. You point your application at one OpenAI-compatible endpoint, configure the proxy with your provider credentials, and your code never knows it's talking to Claude, GPT-4, or Gemini. Switching models is a config change, not a code change.

The Setup

**Step 1: Install LiteLLM.** It ships as a pip package and runs as a server.

pip install 'litellm[proxy]'

**Step 2: Create a config file.** This is where you declare which providers and models are available.

litellm_config.yaml

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gemini-pro
    litellm_params:
      model: gemini/gemini-1.5-pro
      api_key: os.environ/GEMINI_API_KEY

The `model_name` is what your application uses. The `model` field is the actual provider/model. Your app only ever sees the alias.
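To make the alias idea concrete, here is a minimal sketch of the lookup the proxy performs conceptually. It mirrors the three config entries as plain Python data; `MODEL_LIST` and `resolve_alias` are hypothetical names for illustration, not part of LiteLLM.

```python
# Illustration only: the three entries from litellm_config.yaml as Python data.
MODEL_LIST = [
    {"model_name": "gpt-4o",
     "litellm_params": {"model": "openai/gpt-4o"}},
    {"model_name": "claude-sonnet",
     "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"}},
    {"model_name": "gemini-pro",
     "litellm_params": {"model": "gemini/gemini-1.5-pro"}},
]


def resolve_alias(alias: str) -> tuple[str, str]:
    """Return (provider, model) for an alias. Hypothetical helper, not LiteLLM code."""
    for entry in MODEL_LIST:
        if entry["model_name"] == alias:
            # "anthropic/claude-..." splits into provider and model at the first slash.
            provider, _, model = entry["litellm_params"]["model"].partition("/")
            return provider, model
    raise KeyError(f"unknown model alias: {alias}")


print(resolve_alias("claude-sonnet"))  # prints ('anthropic', 'claude-3-5-sonnet-20241022')
```

Because the application only ever sends the alias, pointing `claude-sonnet` at a different underlying model later would only touch the config entry.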

**Step 3: Start the proxy.**

litellm --config litellm_config.yaml --port 4000

You now have a local OpenAI-compatible endpoint at `http://localhost:4000`.
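A quick way to confirm the proxy is up is to ask it which models it serves. The sketch below uses only the standard library and assumes the proxy exposes the OpenAI-style `/v1/models` route and returns a `data` list of objects with an `id` field; the helper names are made up for this example.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000"  # the endpoint from Step 3


def models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-style model listing route (assumed: /v1/models)."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_aliases(base_url: str = PROXY_URL, api_key: str = "sk-anything") -> list[str]:
    """Send the request and return the model ids the proxy reports."""
    with urllib.request.urlopen(models_request(base_url, api_key), timeout=10) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"data": [{"id": "gpt-4o"}, ...]}
    return [item["id"] for item in payload["data"]]
```

With the proxy running, calling `list_aliases()` should return the aliases from your config rather than the provider model names.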

Pointing Your App at It

Any OpenAI client library works once you point its base URL at the proxy:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-anything",  # Auth handled by proxy
)

response = client.chat.completions.create(
    model="claude-sonnet",  # Use your alias
    messages=[{"role": "user", "content": "Hello!"}],
)

Same code works for `gpt-4o`, `gemini-pro`, or any other model in your config. Switch models by changing the string. The proxy handles format translation, retry logic, and rate limit handling.
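As a sketch of what "switch models by changing the string" looks like in practice, the helper below sends one prompt to several aliases through the same client. `ask_each` and `EchoClient` are hypothetical names; `EchoClient` is an offline stand-in that mimics the call shape of the OpenAI client so the example runs without a proxy.

```python
from types import SimpleNamespace


def ask_each(client, aliases, prompt):
    """Send the same prompt to every alias and collect the replies by alias."""
    replies = {}
    for alias in aliases:
        response = client.chat.completions.create(
            model=alias,  # the only thing that changes between providers
            messages=[{"role": "user", "content": prompt}],
        )
        replies[alias] = response.choices[0].message.content
    return replies


class EchoClient:
    """Offline stand-in with the same call shape as the OpenAI client (hypothetical)."""

    def __init__(self):
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, model, messages):
        # Echo the alias and the last user message instead of calling a provider.
        text = f"[{model}] {messages[-1]['content']}"
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


print(ask_each(EchoClient(), ["gpt-4o", "claude-sonnet", "gemini-pro"], "Hello!"))
```

Passing the real `client` from the snippet above in place of `EchoClient()` should route each request through the proxy, assuming all three aliases are in your config.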

The Two Features That Make It Indispensable

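**Virtual Keys.** Generate scoped API keys for each user or service without exposing provider credentials. In current LiteLLM releases, key generation expects a master key and a database behind the proxy, so send the master key as the bearer token:

curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "team-alpha", "models": ["gpt-4o", "claude-sonnet"]}'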

You get back a key that only works for the specified models. Revoke it when the team leaves.
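If you would rather mint keys from a script than from curl, a rough Python equivalent might look like this. It assumes the response JSON carries the new key in a `key` field and that `admin_key` is whatever key your proxy accepts for admin calls; both helper names are hypothetical.

```python
import json
import urllib.request


def key_generate_request(base_url, admin_key, user_id, models):
    """Build the POST request for /key/generate with the same body as the curl call."""
    body = json.dumps({"user_id": user_id, "models": models}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/key/generate",
        data=body,
        headers={
            "Authorization": f"Bearer {admin_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_key(base_url, admin_key, user_id, models):
    """Send the request and return the generated key (assumed field name: "key")."""
    req = key_generate_request(base_url, admin_key, user_id, models)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["key"]
```

A call such as `generate_key("http://localhost:4000", admin_key, "team-alpha", ["gpt-4o", "claude-sonnet"])` would then hand back a key you can give to that team's service.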

**Spend Tracking.** Every request gets logged with cost. Hit `/spend/logs` to see who's spending what on which model. For a team with monthly AI budgets, this is the visibility you need.
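Once you have the log rows, rolling them up per user and model is a few lines. This is a sketch over made-up sample rows: the field names `user`, `model`, and `spend` are assumptions about the log shape, so check them against what your proxy version actually returns.

```python
from collections import defaultdict


def spend_by_user_and_model(rows):
    """Sum the spend of each (user, model) pair across log rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["user"], row["model"])] += float(row["spend"])
    return dict(totals)


# Hypothetical rows shaped like spend log entries (field names are assumptions).
sample_rows = [
    {"user": "team-alpha", "model": "gpt-4o", "spend": 0.5},
    {"user": "team-alpha", "model": "gpt-4o", "spend": 0.25},
    {"user": "team-alpha", "model": "claude-sonnet", "spend": 1.0},
    {"user": "team-beta", "model": "gemini-pro", "spend": 0.125},
]

for (user, model), total in sorted(spend_by_user_and_model(sample_rows).items()):
    print(f"{user:12} {model:14} ${total:.3f}")
```

Sorting the totals by amount instead of by name gives a quick view of which team and model combination is eating the monthly budget.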

The Gotchas

Don't run this without authentication. By default, the proxy is open. Add `master_key: sk-your-key` under `general_settings` in your config and pass it as the API key from clients.

Don't skip the proxy's rate limit config. If you don't set per-model limits, a runaway loop in your app can drain your API budget in minutes.

Don't forget to set `LITELLM_LOG=INFO` during development. The proxy logs every request including the full prompt and response. Use it. The first time you see exactly what your app is sending, you'll catch issues you didn't know existed.

That's the setup. About twenty minutes from zero to a unified API across every provider. The only question is what to migrate first.

— Mr. Technology