
Setting Up LiteLLM as a Unified API Proxy: One Endpoint, Every LLM
If you're running an LLM application and you've written provider-specific code to handle OpenAI, Anthropic, and Google separately, you're wasting engineering time. That's rarely a design failure; more likely you just haven't met LiteLLM yet. LiteLLM is an open-source proxy that gives you a single OpenAI-compatible endpoint for every LLM you want to use.
Here's how to set it up properly.
The cost of provider fragmentation isn't visible until you have it. Three different API client libraries, three different error formats, three different rate limit headers, three different streaming protocols, three different ways to handle function calling. When you want to add a new model, you reimplement all of it.
LiteLLM solves this with a proxy pattern. You point your application at one OpenAI-compatible endpoint, configure the proxy with your provider credentials, and your code never knows it's talking to Claude, GPT-4, or Gemini. Switching models is a config change, not a code change.
Step 1: Install LiteLLM. It ships as a pip package and runs as a server.
```bash
pip install 'litellm[proxy]'
```
Step 2: Create a config file. This is where you declare which providers and models are available.
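```yaml
# litellm_config.yaml
model_list:
  - model_name: gpt-4o            # alias your application uses
    litellm_params:
      model: openai/gpt-4o        # actual provider/model
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gemini-pro
    litellm_params:
      model: gemini/gemini-1.5-pro
      api_key: os.environ/GEMINI_API_KEY
```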
The model_name is what your application uses. The model field is the actual provider/model. Your app only ever sees the alias.
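To make the alias indirection concrete, here is a minimal Python sketch of the lookup the config describes. It is a toy illustration, not LiteLLM's routing code: the `CONFIG` dict mirrors the config's shape by hand, the aliases are the ones used in this post, and `build_alias_map` is a hypothetical helper.

```python
# Toy illustration of alias -> provider/model lookup (not LiteLLM internals).
# CONFIG mirrors the shape of the YAML config as a plain Python dict.
CONFIG = {
    "model_list": [
        {"model_name": "gpt-4o",
         "litellm_params": {"model": "openai/gpt-4o"}},
        {"model_name": "claude-sonnet",
         "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"}},
        {"model_name": "gemini-pro",
         "litellm_params": {"model": "gemini/gemini-1.5-pro"}},
    ]
}


def build_alias_map(config: dict) -> dict[str, str]:
    """Return {alias: "provider/model"}, raising on malformed entries."""
    aliases = {}
    for i, entry in enumerate(config.get("model_list", [])):
        alias = entry.get("model_name")
        target = entry.get("litellm_params", {}).get("model")
        if not alias or not target:
            raise ValueError(
                f"model_list[{i}] needs model_name and litellm_params.model"
            )
        # Simplification: if an alias repeats, this toy keeps the last entry.
        aliases[alias] = target
    return aliases


ALIASES = build_alias_map(CONFIG)
for alias, target in ALIASES.items():
    print(f"{alias} -> {target}")
```

The application only ever passes the left-hand name; everything to the right of the arrow can change in the config without touching application code.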
Step 3: Start the proxy.
```bash
litellm --config litellm_config.yaml --port 4000
```
You now have a local OpenAI-compatible endpoint at http://localhost:4000.
Any OpenAI client library works without modification:
```python
from openai import OpenAI

# Point the standard OpenAI client at the proxy instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-anything",  # Auth handled by proxy
)

response = client.chat.completions.create(
    model="claude-sonnet",  # Use your alias
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
The same code works for gpt-4o, gemini-pro, or any other model in your config. Switch models by changing the string. The proxy handles format translation and, once you configure them, retries and rate limiting.
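As a rough sketch of what "changing the string" means on the wire, the snippet below builds the same chat request for each alias using only the standard library. It assumes the proxy from Step 3 is at http://localhost:4000 and serves the OpenAI-style `/chat/completions` route; `build_chat_request` and `ask` are hypothetical helper names, and the key is a placeholder.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000"  # the proxy started in Step 3
API_KEY = "sk-anything"              # placeholder; use a real key once auth is on


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at the proxy."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{PROXY_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the request and return the first choice's text (needs a running proxy)."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Build (but do not send) one request per alias: only the model field differs.
for alias in ("gpt-4o", "claude-sonnet", "gemini-pro"):
    req = build_chat_request(alias, "Hello!")
    print(req.full_url, json.loads(req.data)["model"])
```

With the proxy running, `ask("claude-sonnet", "Hello!")` and `ask("gpt-4o", "Hello!")` go through the identical code path.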
Virtual Keys. Generate scoped API keys for each user or service without exposing provider credentials. This requires a master key and a Postgres database (DATABASE_URL) configured on the proxy:
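```bash
# The bearer token here must be the proxy's master key.
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-anything" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "team-alpha", "models": ["gpt-4o", "claude-sonnet"]}'
```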
You get back a key that only works for the specified models. Revoke it when the team leaves.
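If you provision keys for several teams, the curl call above is easy to script. The following is a sketch under assumptions: `TEAMS`, `build_key_request`, and `generate_key` are hypothetical names, `MASTER_KEY` is a placeholder, and the response is assumed to return the new key in a `key` field.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000"
MASTER_KEY = "sk-anything"  # placeholder for the proxy's master key

# Hypothetical teams and the aliases each one may call.
TEAMS = {
    "team-alpha": ["gpt-4o", "claude-sonnet"],
    "team-beta": ["gemini-pro"],
}


def build_key_request(user_id: str, models: list[str]) -> urllib.request.Request:
    """Build the same POST /key/generate call as the curl example."""
    body = {"user_id": user_id, "models": models}
    return urllib.request.Request(
        f"{PROXY_URL}/key/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_key(user_id: str, models: list[str]) -> str:
    """Send the request; assumes the JSON response carries the key under "key"."""
    with urllib.request.urlopen(build_key_request(user_id, models), timeout=30) as resp:
        return json.load(resp)["key"]


# Build (but do not send) one request per team.
REQUESTS = {team: build_key_request(team, models) for team, models in TEAMS.items()}
for team, req in REQUESTS.items():
    print(team, "->", req.data.decode("utf-8"))
```

Calling `generate_key("team-alpha", TEAMS["team-alpha"])` against a running proxy would return the scoped key to hand to that team.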
Spend Tracking. Every request gets logged with cost. Hit /spend/logs to see who's spending what on which model. For a team with monthly AI budgets, this is the visibility you need.
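Once you have the log rows, rolling them up per user and model takes a few lines. This sketch assumes each row is a JSON object with `user`, `model`, and `spend` fields; check the field names against your proxy's actual /spend/logs output. `summarize_spend` is a hypothetical helper and `SAMPLE_LOGS` is made-up data for illustration only.

```python
from collections import defaultdict


def summarize_spend(logs: list[dict]) -> dict[tuple[str, str], float]:
    """Total spend per (user, model) pair from a list of log rows."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for row in logs:
        # Missing or empty fields are grouped under "unknown" rather than dropped.
        key = (row.get("user") or "unknown", row.get("model") or "unknown")
        totals[key] += float(row.get("spend") or 0.0)
    return dict(totals)


# Made-up rows in the assumed shape; real rows carry more fields.
SAMPLE_LOGS = [
    {"user": "team-alpha", "model": "gpt-4o", "spend": 0.25},
    {"user": "team-alpha", "model": "gpt-4o", "spend": 0.5},
    {"user": "team-alpha", "model": "claude-sonnet", "spend": 0.125},
    {"user": "team-beta", "model": "gemini-pro", "spend": 0.5},
    {"model": "gpt-4o", "spend": 0.125},  # row with no user recorded
]

SUMMARY = summarize_spend(SAMPLE_LOGS)
# Print the biggest spenders first.
for (user, model), spend in sorted(SUMMARY.items(), key=lambda kv: -kv[1]):
    print(f"{user:12} {model:15} ${spend:.3f}")
```

Grouping by the pair rather than by user alone keeps the "who is spending what on which model" question answerable from a single table.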
Don't run this without authentication. By default, the proxy is open. Set master_key: sk-your-key under general_settings in your config (the value must start with sk-) and pass it as the API key from clients.
Don't skip the proxy's rate limit config. If you don't set per-model limits, a runaway loop in your app can drain your API budget in minutes.
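Don't forget to turn on logging during development. Set LITELLM_LOG=INFO, or LITELLM_LOG=DEBUG when you need the most detail, and the proxy logs what each request is sending, prompts and responses included. Use it, but keep that verbosity out of production, where the logs would capture user data. The first time you see exactly what your app is sending, you'll catch issues you didn't know existed.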
That's the setup. About twenty minutes from zero to a unified API across every provider. The only question is what to migrate first.
— Mr. Technology