tutorial | 2026-06-01

LiteLLM in 10 Minutes: One Interface, Every LLM Provider

Six API keys, six SDKs, six sets of weird edge cases. LiteLLM gives you a single OpenAI-compatible endpoint and routes to any provider — so swapping from GPT to Claude to Llama becomes a one-string change. Here's the setup that actually works in 10 minutes.

I have six different LLM API keys in my .env file. OpenAI. Anthropic. Google. Mistral. Groq. Together. Every project I touch has a different one in the lead role — and every provider ships a different SDK, a different response shape, a different streaming behavior, a different tool-calling format. Switching costs in this stack are real.

LiteLLM fixes it. It's an open-source proxy that gives you an OpenAI-compatible endpoint and routes to literally any provider. Set it up once, swap models with a string, and your code never changes.

Here's the part that takes 10 minutes once you know it.

The Core Idea

You run LiteLLM as a local proxy on port 4000. Your app talks to it like it's OpenAI. The proxy translates to whatever provider you've configured. To move from GPT-4o to Claude to Llama, you change the model name in your request — that's it.

```bash
# Start the proxy with three models in 30 seconds.
# The models are listed in config.yaml (see Install and Configure).
litellm --config config.yaml --port 4000
```

That's the first step. Three providers, one endpoint, one SDK.
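To make the "one string" claim concrete, here is a minimal sketch of the OpenAI-style request body the proxy receives. The helper name `build_chat_request` is hypothetical, and the aliases `gpt-4o` and `claude-sonnet` assume matching entries exist in your proxy config.

```python
import json


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion body (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


prompt = "Explain KV cache in 2 sentences."
to_gpt = build_chat_request("gpt-4o", prompt)
to_claude = build_chat_request("claude-sonnet", prompt)

# Collect the keys whose values differ between the two request bodies.
changed = [key for key in to_gpt if to_gpt[key] != to_claude[key]]
print(changed)
print(json.dumps(to_claude, indent=2))
```

Only the `model` field differs between the two bodies, so everything else in your calling code can stay put.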

Install and Configure

```bash
pip install 'litellm[proxy]==1.51.0'
```

Create a config.yaml — this is where you centralize everything:

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gemini-flash
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: os.environ/GEMINI_API_KEY

litellm_settings:
  drop_params: true   # silently ignore params a provider doesn't support
  telemetry: false    # opt out of phoning home
```

drop_params: true is the one flag that saves you hours. Anthropic doesn't have frequency_penalty? LiteLLM drops it instead of erroring. Always set it.
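Conceptually, the flag switches between "reject" and "filter" for unsupported params. The sketch below illustrates that behavior only. It is not LiteLLM's implementation, and the `SUPPORTED_PARAMS` table and `prepare_params` helper are assumptions made up for the example.

```python
# Illustrative only: this table is an assumption for the sketch,
# not LiteLLM's real per-provider capability data.
SUPPORTED_PARAMS = {
    "openai": {"temperature", "max_tokens", "frequency_penalty"},
    "anthropic": {"temperature", "max_tokens"},
}


def prepare_params(provider: str, params: dict, drop_params: bool) -> dict:
    """Return the params to forward, dropping or rejecting unsupported ones."""
    supported = SUPPORTED_PARAMS[provider]
    unsupported = sorted(set(params) - supported)
    if unsupported and not drop_params:
        raise ValueError(f"{provider} does not support: {unsupported}")
    return {k: v for k, v in params.items() if k in supported}


request = {"temperature": 0.2, "frequency_penalty": 0.5}
print(prepare_params("anthropic", request, drop_params=True))
```

With `drop_params=False` the same call raises instead, which is the failure mode the flag is there to avoid.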

Point Your Code at the Proxy

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",
    api_key="anything",  # not enforced locally
)

resp = client.chat.completions.create(
    model="claude-sonnet",  # swap to "gpt-4o" with no other change
    messages=[{"role": "user", "content": "Explain KV cache in 2 sentences."}],
)
print(resp.choices[0].message.content)
```

Your code doesn't know — and doesn't care — which provider answered. Run an A/B test by changing one string. Migrate providers in a PR with a single line of diff.
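One way to run that A/B test is to assign each user to a model alias deterministically, so the same user always hits the same variant. This is a minimal sketch: `pick_variant` and the `VARIANTS` list are hypothetical names, and the aliases assume matching `model_name` entries in your proxy config.

```python
import hashlib

# Assumed aliases; they must match model_name entries in your proxy config.
VARIANTS = ["gpt-4o", "claude-sonnet"]


def pick_variant(user_id: str, variants: list[str] = VARIANTS) -> str:
    """Map a user to a variant deterministically via a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


model = pick_variant("user-123")
# The chosen alias is the only thing that changes in the request, e.g.
# client.chat.completions.create(model=model, messages=[...])
print(model)
```

A hash is used instead of `random.choice` so assignments survive restarts without storing any state.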

Patterns That Pay Off Immediately

Fallback chains. When a provider hiccups, LiteLLM retries the next one — no code change required.

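```yaml
model_list:
  # ...existing entries, plus:
  - model_name: reliable
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  # ...existing settings, plus: if "reliable" fails, try these in order
  fallbacks: [{"reliable": ["claude-sonnet", "gemini-flash"]}]
```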
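For intuition, a fallback chain boils down to "try each model in order and return the first success". The sketch below shows that logic in plain Python. It is not LiteLLM's router code, and `complete_with_fallbacks` and `fake_call` are hypothetical names used only for illustration.

```python
from typing import Callable, Sequence


def complete_with_fallbacks(
    call_model: Callable[[str], str],
    models: Sequence[str],
) -> tuple[str, str]:
    """Try each model in order; return (model, reply) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model)
        except Exception as exc:  # a real router would catch narrower errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_call(model: str) -> str:
    """Stand-in for a provider call; pretends the primary is down."""
    if model == "gpt-4o":
        raise TimeoutError("provider timed out")
    return f"answer from {model}"


print(complete_with_fallbacks(fake_call, ["gpt-4o", "claude-sonnet", "gemini-flash"]))
```

Letting the proxy own this loop is the point: your application sends one request and never sees the failed attempt.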

Cost routing. Cheap model by default, expensive model for hard queries.

```python
def route(prompt: str) -> str:
    return "gemini-flash" if len(prompt) < 500 else "claude-sonnet"
```

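Free logging. LiteLLM tracks cost, latency, and token counts for every call. Persisting them takes one piece of setup (a database for the proxy, or a logging callback), but no application code. That's the "log every model call" pattern, handled at the proxy instead of in your app.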
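For an application-side record of the same fields, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, the `open_log` and `log_call` helpers, and the `PRICES` values are all assumptions for illustration. In real use the token counts would come from `resp.usage.prompt_tokens` and `resp.usage.completion_tokens` on the OpenAI SDK response.

```python
import sqlite3
import time

# Hypothetical USD prices per 1M tokens (input, output); replace with real rates.
PRICES = {"gemini-flash": (0.10, 0.40), "claude-sonnet": (3.00, 15.00)}


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the call log; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS calls ("
        "ts REAL, model TEXT, latency_ms REAL, "
        "prompt_tokens INTEGER, completion_tokens INTEGER, cost_usd REAL)"
    )
    return conn


def log_call(conn, model, latency_ms, prompt_tokens, completion_tokens) -> float:
    """Insert one call record and return its estimated cost in USD."""
    in_price, out_price = PRICES[model]
    cost = (prompt_tokens * in_price + completion_tokens * out_price) / 1_000_000
    conn.execute(
        "INSERT INTO calls VALUES (?, ?, ?, ?, ?, ?)",
        (time.time(), model, latency_ms, prompt_tokens, completion_tokens, cost),
    )
    conn.commit()
    return cost


conn = open_log()
log_call(conn, "claude-sonnet", 820.0, 1000, 200)
log_call(conn, "gemini-flash", 310.0, 1000, 200)
for row in conn.execute("SELECT model, COUNT(*), SUM(cost_usd) FROM calls GROUP BY model"):
    print(row)
```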

The Mistakes I Made

Don't assume provider-specific params are ignored. They aren't: LiteLLM forwards every param the target provider supports, and temperature=0 means subtly different things to different providers. drop_params: true only drops params a provider doesn't support at all. And always pass model= explicitly in the request body. If you don't, the proxy picks the first model in your list, which is rarely the one you wanted.

If you only ever use one provider, skip LiteLLM — the abstraction is overhead you don't earn back. The win shows up at the second provider, or the day a regional outage forces a failover in 30 seconds.

Set it up once on a Monday. By Friday you'll have routed around two outages and A/B tested three models without touching a line of business logic. That's the win.


LiteLLM is the only piece of LLM infrastructure I install in every project before writing code. The day a provider hiccups, you'll thank past-you.
