
Setting Up LiteLLM as a Unified API Proxy: One Endpoint, Every LLM
If you're running an LLM application and you've written provider-specific code to handle OpenAI, Anthropic, and Google separately, you're wasting engineering time. That's not a failure of technical elegance; you probably just haven't met LiteLLM yet. It's an open-source proxy that gives you a single OpenAI-compatible endpoint for every LLM you want to use.
Here's how to set it up properly.
The cost of provider fragmentation isn't visible until you have it. Three different API client libraries, three different error formats, three different rate limit headers, three different streaming protocols, three different ways to handle function calling. When you want to add a new model, you reimplement all of it.
LiteLLM solves this with a proxy pattern. You point your application at one OpenAI-compatible endpoint, configure the proxy with your provider credentials, and your code never knows it's talking to Claude, GPT-4, or Gemini. Switching models is a config change, not a code change.
**Step 1: Install LiteLLM.** It ships as a pip package and runs as a server.

    pip install 'litellm[proxy]'

**Step 2: Create a config file.** This is where you declare which providers and models are available. Save it as `litellm_config.yaml`.

    model_list:
      - model_name: gpt-4o
        litellm_params:
          model: openai/gpt-4o
          api_key: os.environ/OPENAI_API_KEY
      - model_name: claude-sonnet
        litellm_params:
          model: anthropic/claude-3-5-sonnet-20241022
          api_key: os.environ/ANTHROPIC_API_KEY
      - model_name: gemini-pro
        litellm_params:
          model: gemini/gemini-1.5-pro
          api_key: os.environ/GEMINI_API_KEY

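The `model_name` is the alias your application requests. The `model` field under `litellm_params` is the actual provider and model, and an `os.environ/NAME` value tells the proxy to read that key from the environment. Your app only ever sees the alias.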
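Because the config refers to keys as `os.environ/NAME`, a missing environment variable is an easy way to end up with a proxy that cannot reach one provider. A minimal preflight sketch, assuming the config has already been loaded into a Python dict (for example with a YAML parser); the helper name `missing_env_vars` is hypothetical and not part of LiteLLM:

```python
import os

ENV_PREFIX = "os.environ/"

def missing_env_vars(config, environ=None):
    """Return env var names referenced in model_list that are unset or empty."""
    environ = os.environ if environ is None else environ
    missing = []
    for entry in config.get("model_list", []):
        for value in entry.get("litellm_params", {}).values():
            if isinstance(value, str) and value.startswith(ENV_PREFIX):
                name = value[len(ENV_PREFIX):]
                if not environ.get(name) and name not in missing:
                    missing.append(name)
    return missing

# Same shape as the config file, written as a dict for illustration.
config = {
    "model_list": [
        {"model_name": "gpt-4o",
         "litellm_params": {"model": "openai/gpt-4o",
                            "api_key": "os.environ/OPENAI_API_KEY"}},
        {"model_name": "claude-sonnet",
         "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022",
                            "api_key": "os.environ/ANTHROPIC_API_KEY"}},
    ]
}

# Pass a fake environment so the example is deterministic.
print(missing_env_vars(config, environ={"OPENAI_API_KEY": "sk-test"}))
```

Running a check like this before starting the proxy turns a confusing runtime auth failure into an explicit list of variables to export.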
**Step 3: Start the proxy.**

    litellm --config litellm_config.yaml --port 4000

You now have a local OpenAI-compatible endpoint at `http://localhost:4000`.
Any OpenAI client library works without modification:
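
    from openai import OpenAI

    # Point the standard OpenAI client at the local LiteLLM proxy.
    client = OpenAI(
        base_url="http://localhost:4000",
        api_key="sk-anything",  # provider keys live in the proxy; any value works until a master key is set
    )

    response = client.chat.completions.create(
        model="claude-sonnet",  # the alias from model_list, not the provider's model ID
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
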
The same code works for `gpt-4o`, `gemini-pro`, or any other alias in your config. Switch models by changing the string. The proxy handles format translation, retries, and rate limiting.
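Since the endpoint speaks the OpenAI wire format, the same idea works without any SDK. Here is a sketch using only the standard library, assuming the proxy from Step 3 is listening on `localhost:4000` and serves the OpenAI-style `/chat/completions` route; `build_chat_request` and `ask` are hypothetical helper names:

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000"  # assumption: proxy started as in Step 3

def build_chat_request(alias, prompt, api_key="sk-anything"):
    """Build an OpenAI-style chat completion request for a model alias."""
    body = json.dumps({
        "model": alias,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{PROXY_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def ask(alias, prompt):
    """Send the request and return the reply text (needs a running proxy)."""
    with urllib.request.urlopen(build_chat_request(alias, prompt), timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]

# Only the alias changes between providers; the request shape stays the same.
for alias in ("gpt-4o", "claude-sonnet", "gemini-pro"):
    req = build_chat_request(alias, "Hello!")
    print(req.get_method(), req.full_url, json.loads(req.data)["model"])
```

The loop only builds requests, so it runs without a proxy; call `ask("claude-sonnet", "Hello!")` once the proxy is up to make a real round trip.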
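**Virtual Keys.** Generate scoped API keys for each user or service without exposing provider credentials. Key management needs a master key and a Postgres database configured for the proxy; authenticate the call with the master key:

    curl -X POST http://localhost:4000/key/generate \
      -H "Authorization: Bearer sk-your-key" \
      -H "Content-Type: application/json" \
      -d '{"user_id": "team-alpha", "models": ["gpt-4o", "claude-sonnet"]}'
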
You get back a key that only works for the specified models. Revoke it when the team leaves.
**Spend Tracking.** Every request gets logged with cost. Hit `/spend/logs` to see who's spending what on which model. For a team with monthly AI budgets, this is the visibility you need.
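As a rough illustration of turning those logs into a per-team view, here is a sketch that totals cost by user and model. The entry shape (`user`, `model`, and `spend` fields) and the sample rows are assumptions made for the example; check the actual `/spend/logs` response from your proxy version before relying on these field names.

```python
from collections import defaultdict

def spend_by_user_and_model(log_entries):
    """Sum the `spend` field per (user, model) pair."""
    totals = defaultdict(float)
    for entry in log_entries:
        key = (entry.get("user") or "unknown", entry.get("model") or "unknown")
        totals[key] += float(entry.get("spend") or 0.0)
    return dict(totals)

# Hypothetical rows, not real proxy output.
sample_logs = [
    {"user": "team-alpha", "model": "gpt-4o", "spend": 0.5},
    {"user": "team-alpha", "model": "gpt-4o", "spend": 0.25},
    {"user": "team-alpha", "model": "claude-sonnet", "spend": 1.0},
    {"user": "team-beta", "model": "gpt-4o", "spend": 0.125},
]

totals = spend_by_user_and_model(sample_logs)
# Print the most expensive (user, model) pairs first.
for (user, model), cost in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{user:12} {model:15} ${cost:.4f}")
```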
Don't run this without authentication. By default, the proxy is open. Add `master_key: sk-your-key` under `general_settings` in your config and pass it as the API key from clients.
Don't skip the proxy's rate limit config. If you don't set per-model limits, a runaway loop in your app can drain your API budget in minutes.
Don't forget to turn on logging during development. Set `LITELLM_LOG=INFO` for request-level logs, or `LITELLM_LOG=DEBUG` when you want the full prompt and response. Use it. The first time you see exactly what your app is sending, you'll catch issues you didn't know existed.
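To make the rate-limit warning concrete: one way to catch a missing limit before deploying is to lint the config. This sketch assumes limits are declared as `rpm` (requests per minute) and `tpm` (tokens per minute) under each entry's `litellm_params`, and that the config has been loaded into a dict; `models_without_limits` is a hypothetical helper and the numbers are placeholders.

```python
def models_without_limits(config, required=("rpm", "tpm")):
    """Return {alias: [missing limit names]} for entries lacking a rate limit."""
    problems = {}
    for entry in config.get("model_list", []):
        params = entry.get("litellm_params", {})
        missing = [name for name in required if name not in params]
        if missing:
            problems[entry.get("model_name", "<unnamed>")] = missing
    return problems

# Placeholder limits; pick values that match your provider quotas and budget.
config = {
    "model_list": [
        {"model_name": "gpt-4o",
         "litellm_params": {"model": "openai/gpt-4o", "rpm": 60, "tpm": 100000}},
        {"model_name": "claude-sonnet",
         "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022", "rpm": 60}},
        {"model_name": "gemini-pro",
         "litellm_params": {"model": "gemini/gemini-1.5-pro"}},
    ]
}

for alias, missing in models_without_limits(config).items():
    print(f"{alias}: missing {', '.join(missing)}")
```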
That's the setup. About twenty minutes from zero to a unified API across every provider. The only question is what to migrate first.
— Mr. Technology