
I have six different LLM API keys in my .env file. OpenAI. Anthropic. Google. Mistral. Groq. Together. Every project I touch has a different one in the lead role — and every provider ships a different SDK, a different response shape, a different streaming behavior, a different tool-calling format. Switching costs in this stack are real.
LiteLLM fixes it. It's an open-source proxy that exposes an OpenAI-compatible endpoint and routes to 100+ providers. Set it up once, swap models with a string, and your application code doesn't change.
Here's the part that takes 10 minutes once you know it.
You run LiteLLM as a local proxy on port 4000. Your app talks to it like it's OpenAI. The proxy translates to whatever provider you've configured. To move from GPT-4o to Claude to Llama, you change the model name in your request — that's it.
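```bash
# Depending on the version, the CLI may honor only one --model flag;
# `litellm --config config.yaml` (config shown below) is the dependable route for several models.
litellm \
  --model openai/gpt-4o \
  --model anthropic/claude-sonnet-4-5 \
  --model gemini/gemini-2.0-flash
```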
That's the first step. Three providers, one endpoint, one SDK.
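To see what "one endpoint" means on the wire, here is a minimal sketch that builds, but does not send, two OpenAI-style requests using only the standard library. `chat_request` is a hypothetical helper of mine, and the `gpt-4o` / `claude-sonnet` aliases assume a proxy config that defines them.

```python
import json
import urllib.request


def chat_request(model: str, prompt: str, base_url: str = "http://localhost:4000"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer anything",  # placeholder key, as in a local setup
        },
        method="POST",
    )


# Same URL, same headers, same message list: only the model string differs.
for alias in ("gpt-4o", "claude-sonnet"):
    req = chat_request(alias, "Explain KV cache in 2 sentences.")
    print(req.full_url, json.loads(req.data)["model"])
```

Passing one of these to `urllib.request.urlopen` would actually send it, assuming the proxy is running.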
```bash
pip install 'litellm[proxy]==1.51.0'
```
Create a config.yaml — this is where you centralize everything:
```yaml
model_list:
  # model_name is the alias your app sends in the request's `model` field
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gemini-flash
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: os.environ/GEMINI_API_KEY

litellm_settings:
  drop_params: true  # silently ignore params a provider doesn't support
  telemetry: false   # opt out of phoning home
```
drop_params: true is the one flag that saves you hours. Anthropic doesn't have frequency_penalty? LiteLLM drops it instead of erroring. Always set it.
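If the behavior is hard to picture, here is a toy model of the idea. This is a sketch, not LiteLLM's implementation: `SUPPORTED` and `prepare_params` are hypothetical names, and the parameter sets are simplified assumptions rather than real provider capability tables.

```python
# Illustrative only: a toy model of what drop_params does conceptually.
# SUPPORTED and prepare_params are hypothetical, not part of LiteLLM.
SUPPORTED = {
    "openai": {"temperature", "max_tokens", "frequency_penalty"},
    "anthropic": {"temperature", "max_tokens"},
}


def prepare_params(provider: str, params: dict, drop_params: bool) -> dict:
    """Return the params to send, dropping or rejecting unsupported ones."""
    unsupported = set(params) - SUPPORTED[provider]
    if unsupported and not drop_params:
        raise ValueError(f"{provider} does not support: {sorted(unsupported)}")
    return {k: v for k, v in params.items() if k in SUPPORTED[provider]}


request = {"temperature": 0.2, "frequency_penalty": 0.5}
print(prepare_params("anthropic", request, drop_params=True))
# → {'temperature': 0.2}
```

With `drop_params=False`, the same call raises instead, which is the error the flag saves you from.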
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",
    api_key="anything",  # not enforced locally
)

resp = client.chat.completions.create(
    model="claude-sonnet",  # swap to "gpt-4o" with no other change
    messages=[{"role": "user", "content": "Explain KV cache in 2 sentences."}],
)
print(resp.choices[0].message.content)
```
Your code doesn't know — and doesn't care — which provider answered. Run an A/B test by changing one string. Migrate providers in a PR with a single line of diff.
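The one-string A/B test can be sketched as a loop over model aliases. `compare_models` is a hypothetical helper, and `client` is assumed to be any object with the OpenAI SDK's `chat.completions.create` interface.

```python
def compare_models(client, models, prompt):
    """Send the same prompt to each model alias and collect the replies."""
    results = {}
    for model in models:
        resp = client.chat.completions.create(
            model=model,  # the only thing that varies between runs
            messages=[{"role": "user", "content": prompt}],
        )
        results[model] = resp.choices[0].message.content
    return results
```

With the proxy running, you would pass `OpenAI(base_url="http://localhost:4000", api_key="anything")` as `client` and a list such as `["gpt-4o", "claude-sonnet"]` as `models`, assuming those aliases exist in your config.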
Fallback chains. When a provider hiccups, LiteLLM retries the next one — no code change required.
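```yaml
# If your version ignores a per-model key, LiteLLM's docs also show the form
# litellm_settings.fallbacks: [{"gpt-4o": ["claude-sonnet", "gemini-flash"]}]
- model_name: gpt-4o
  litellm_params:
    model: openai/gpt-4o
    api_key: os.environ/OPENAI_API_KEY
    fallbacks: ["claude-sonnet", "gemini-flash"]  # tried in order if gpt-4o fails
```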
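Conceptually, a fallback chain is a try-in-order loop. The sketch below shows that idea on the client side; it is not LiteLLM's internals, and `call_with_fallbacks` and `flaky` are hypothetical names used only for illustration.

```python
def call_with_fallbacks(call, models):
    """Try each model alias in order; return (model, result) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def flaky(model):
    """Stand-in for a provider call: the first alias always times out."""
    if model == "gpt-4o":
        raise TimeoutError("provider timed out")
    return f"answer from {model}"


print(call_with_fallbacks(flaky, ["gpt-4o", "claude-sonnet", "gemini-flash"]))
# → ('claude-sonnet', 'answer from claude-sonnet')
```

The point of letting the proxy do this is that none of this loop has to live in your application.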
Cost routing. Cheap model by default, expensive model for hard queries.
```python
def route(prompt: str) -> str:
    return "gemini-flash" if len(prompt) < 500 else "claude-sonnet"
```
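To show where the router plugs in, here is a small sketch. `build_request` is a hypothetical helper, and `route` is repeated so the block runs on its own; prompt length is only a rough proxy for difficulty.

```python
def route(prompt: str) -> str:
    """Pick a model alias by prompt length (the heuristic from the text)."""
    return "gemini-flash" if len(prompt) < 500 else "claude-sonnet"


def build_request(prompt: str) -> dict:
    """Build chat-completion kwargs with the routed model filled in."""
    return {
        "model": route(prompt),
        "messages": [{"role": "user", "content": prompt}],
    }


print(build_request("What is a KV cache?")["model"])
# → gemini-flash
```

With an OpenAI-style client pointed at the proxy, the call becomes `client.chat.completions.create(**build_request(prompt))`.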
Free logging. LiteLLM tracks cost, latency, and token counts for every call, and the proxy can persist them once you attach a database or a logging callback. That's the "log every model call" pattern with configuration instead of custom code.
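The same pattern can also be kept on the client side if you want a record next to your app. This is a sketch under assumptions: `open_log`, `log_call`, and the `calls` table are hypothetical, and the `usage.prompt_tokens` / `usage.completion_tokens` fields follow the OpenAI response shape for non-streaming calls.

```python
import sqlite3
import time


def open_log(path=":memory:"):
    """Open (or create) the hypothetical call-log database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS calls ("
        "model TEXT, latency_ms REAL, prompt_tokens INTEGER, completion_tokens INTEGER)"
    )
    return db


def log_call(db, client, model, messages):
    """Call the model, then record latency and token usage in SQLite."""
    start = time.perf_counter()
    resp = client.chat.completions.create(model=model, messages=messages)
    latency_ms = (time.perf_counter() - start) * 1000
    db.execute(
        "INSERT INTO calls (model, latency_ms, prompt_tokens, completion_tokens) "
        "VALUES (?, ?, ?, ?)",
        (model, latency_ms, resp.usage.prompt_tokens, resp.usage.completion_tokens),
    )
    db.commit()
    return resp
```

Pass a file path to `open_log` to keep the log between runs; the default in-memory database disappears when the process exits.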
Don't assume provider-specific params are ignored. They aren't: LiteLLM passes supported params through verbatim, and temperature=0 means subtly different things to different providers. Unsupported params raise an error unless drop_params: true is set. And always pass model= explicitly in the request body. If you don't, the proxy picks the first model in your list, which is rarely the one you wanted.
If you only ever use one provider, skip LiteLLM — the abstraction is overhead you don't earn back. The win shows up at the second provider, or the day a regional outage forces a failover in 30 seconds.
Set it up once on a Monday. By Friday you'll have routed around two outages and A/B tested three models without touching a line of business logic. That's the win.
LiteLLM is the only piece of LLM infrastructure I install in every project before writing code. The day a provider hiccups, you'll thank past-you.