← Back to Payloads
Opinion2026-06-26

Fine-Tuning Is Dead for 95% of Use Cases. Stop Telling People to Fine-Tune.

Frontier APIs, prompt caching, distillation, and 1M-token context windows have made most custom LoRAs worse than calling Claude with a good system prompt. The fine-tuning cottage industry is selling 2023 infrastructure to a 2026 market.
Quick Access
Install command
$ mrt install opinion
Browse related skills
Fine-Tuning Is Dead for 95% of Use Cases. Stop Telling People to Fine-Tune.

Fine-Tuning Is Dead for 95% of Use Cases. Stop Telling People to Fine-Tune.

Hot take: if your blog post or conference talk tells a generalist engineering team to fine-tune a model in 2026, you are giving them bad advice. The fine-tuning cottage industry is selling 2023 infrastructure to a 2026 market. Unsloth made fine-tuning cheap. Cheap is not the same as right.

The Math Has Flipped

In 2023, fine-tuning won on cost. GPT-3.5 was $0.002 per 1K output tokens; a fine-tuned Llama-2-7B on your own H100 was effectively free per token after the GPU was paid off. That math is dead. In June 2026, Claude Sonnet 4.5 sits at $3 / $15 per million input / output tokens, GPT-5-mini at $0.25 / $2, Gemini 2.5 Flash at $0.30 / $2.50. With Anthropic prompt caching at a 10x discount on cached reads, you can put the entire company knowledge base inside the prompt for under ten cents per request (Anthropic prompt caching pricing).

A 7B fine-tune on dedicated hardware burns $400-800/month for a single A100-80GB before storage, observability, and the engineer on pager when the base model updates. Break-even versus gpt-5-mini is north of 40 million output tokens per month — a bar almost no internal use case clears in year one.

Quality Has Flipped Too

The second argument was quality. A fine-tuned 7B on your domain beat GPT-3.5 on your domain. Of course it did — GPT-3.5 was a generalist and you had a narrow task. In 2026 the generalists are not 2023's. Claude Sonnet 4.5, GPT-5, and Gemini 2.5 Pro are frontier models trained on trillions of tokens. A LoRA on a 7B base is a 2% delta on top of a starting point that is now 40% behind.

The METR studies on developer productivity and the Arize and Braintrust enterprise evals converge on the same finding: prompt engineering plus tool use plus frontier models beats small fine-tunes at lower total cost. The headline "our fine-tuned 8B beats GPT-4!" posts from 2024 were honest then. They are lies now (METR — Measuring the Impact of AI on Developer Productivity).

The Maintenance Tax Nobody Prices In

Fine-tunes rot. Llama 4 ships. Qwen 3 ships. Your LoRA was on Llama 3.1. You have three choices: re-fine-tune on the new base, stay on the old base and fall behind, or merge into a model the ecosystem is leaving behind. Every team running fine-tunes in production has hit this.

Frontier vendors snapshot the model. Your fine-tune target stays frozen for the contract's lifetime. When a new model ships, you re-evaluate with a one-line API change (Braintrust — Eval-Driven Development).

The 5% That Actually Needs It

Fine-tuning is not dead. It is niche:

  • High-volume structured output in a stable schema — tax forms, medical coding, fixed JSON where prompt overhead dominates.
  • On-device or air-gapped inference — defense, medical devices, embedded in a car. No API in a Faraday cage.
  • Latency-critical loops under 30ms — trading, robotics. No round-trip budget.
  • Distillation from a frontier model you already pay for — you have a Claude contract, you need a cheap local 3B, you distill. The fine-tune is downstream of the API, not a replacement.

None of those justifications fit on a tweet. If your reason for fine-tuning fits on a tweet, you do not need a fine-tune. You need a better prompt and a cached frontier API call.

The Take

I will die on this hill: the fine-tuning industry — the Substack posts, the "fine-tune your own GPT" courses, the LoRA-on-Mistral tutorials — is selling 2023 infrastructure to a 2026 market. Frontier APIs, prompt caching, distillation, and 1M-token context windows have made most custom LoRAs worse than calling Claude with a good system prompt. The remaining 5% is real, narrow, and not what Twitter is pitching.

Before you write the "how to fine-tune Llama 4 on your docs" tutorial, ask one question: have you exhausted the cheap options? Prompt caching. Tool use. Long context. Distillation. If the answer is no, write the prompt. The fine-tune will still be there in six months when you have earned the right to need one.

Mr. Technology

Related Dispatches