
Two years of AI Twitter confidently declared fine-tuning dead, and the corpse is bench-pressing. Fine-tuning is back. Not the 2023 version where you torched $80K on a full-rank run of a 70B model on an H100 cluster. A leaner version. PEFT. QLoRA. DoRA. Unsloth. Open-weight base models at 7B-32B you can fine-tune on a single consumer GPU in an afternoon. The dead-crowd was right about one thing: the old fine-tuning is dead. The new one is the most underrated move in the stack.
Hey guys, Mr. Technology here.
In 2023, fine-tuning meant renting an 8xH100 cluster at $25-40 an hour, waiting three days, finding the model overfit, repeating, and then watching a base-model update six weeks later erase your gains. Those economics were stupid. You could prompt-engineer your way to 90% of the value for $0. The 2024 takes were correct about those economics but wrong about the technique. They confused a pricing problem with a capability problem. Then three things changed at once, and the pundit class missed it because they had already filed the column.
One: parameter-efficient fine-tuning works. LoRA was a curiosity in 2021. By 2024, QLoRA — 4-bit quantized base plus low-rank adapters — made it practical. By 2025, DoRA and friends made it almost free. You no longer touch the base weights — you train a few hundred megabytes of adapter that hot-swaps at inference. The cost of a fine-tune dropped roughly 100x in two years.
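If you want to see where "a few hundred megabytes" comes from, here is a back-of-the-envelope sketch. It counts plain LoRA parameters only (DoRA adds a small magnitude vector on top, ignored here). The layer shapes, layer count, rank, and 16-bit storage are all assumptions chosen to resemble an 8B-class decoder, not values read from any real checkpoint. Under those assumptions a rank-64 adapter on every linear projection comes out at 320 MiB.

```python
# Back-of-the-envelope LoRA adapter sizing.
# All shapes below are illustrative assumptions for an 8B-class decoder.
LAYER_SHAPES = [
    (4096, 4096),   # attention query projection (d_in, d_out)
    (4096, 1024),   # attention key projection
    (4096, 1024),   # attention value projection
    (4096, 4096),   # attention output projection
    (4096, 14336),  # MLP gate projection
    (4096, 14336),  # MLP up projection
    (14336, 4096),  # MLP down projection
]
NUM_LAYERS = 32  # assumed number of transformer blocks


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two factors per weight: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_params(rank: int) -> int:
    """Trainable parameters when every listed projection gets an adapter."""
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in LAYER_SHAPES)
    return NUM_LAYERS * per_layer


def adapter_bytes(rank: int, bytes_per_param: int = 2) -> int:
    """Adapter size on disk, assuming 16-bit storage by default."""
    return adapter_params(rank) * bytes_per_param


def frozen_params() -> int:
    """Parameters in the same projections that stay frozen in the base model."""
    return NUM_LAYERS * sum(d_in * d_out for d_in, d_out in LAYER_SHAPES)


if __name__ == "__main__":
    rank = 64  # assumed; smaller ranks shrink the adapter proportionally
    size_mib = adapter_bytes(rank) / 2**20
    share = adapter_params(rank) / frozen_params()
    print(f"adapter: {size_mib:.0f} MiB, {share:.1%} of the adapted weights")
```

The adapter scales linearly with rank, so dropping to rank 16 under the same assumptions cuts it to a quarter of that size.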
Two: open-weight base models got good at being fine-tuned. Llama 3.1, Qwen 2.5, Mistral, DeepSeek, Gemma, Phi, the Nemotron line — models trained to be fine-tuned, with stable chat templates, clean tokenizers, friendly licenses. Fine-tuning Llama 3.1 8B on a domain corpus is a weekend with Unsloth, a 24GB consumer GPU, and 10,000 examples.
Three: the eval stack caught up. Phoenix, Langfuse, Braintrust, Inspect — A/B a fine-tune against a prompted base on your actual traffic with a holdout in an afternoon. The threshold for "worth a fine-tune" collapsed to "I have 5,000 examples and a measurable outcome."
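The A/B itself does not need much machinery. A minimal sketch, assuming you have already graded every holdout example as right or wrong for both the prompted base and the fine-tune: count the examples where the two models disagree and run an exact McNemar test on them. The choice of McNemar and the function name are assumptions of this sketch, not something the eval tools above prescribe.

```python
from math import comb


def mcnemar_exact(base_correct, tuned_correct):
    """Paired comparison of two models graded on the same holdout examples.

    Returns (base_only, tuned_only, p_value), where base_only counts examples
    only the prompted base got right, tuned_only counts examples only the
    fine-tune got right, and p_value is the two-sided exact McNemar p-value.
    """
    if len(base_correct) != len(tuned_correct):
        raise ValueError("both models must be graded on the same examples")

    base_only = sum(1 for b, t in zip(base_correct, tuned_correct) if b and not t)
    tuned_only = sum(1 for b, t in zip(base_correct, tuned_correct) if t and not b)

    disagreements = base_only + tuned_only
    if disagreements == 0:
        return base_only, tuned_only, 1.0  # the models never disagree

    # Under the null hypothesis each disagreement is a fair coin flip.
    smaller = min(base_only, tuned_only)
    tail = sum(comb(disagreements, k) for k in range(smaller + 1)) / 2**disagreements
    return base_only, tuned_only, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy grades for ten holdout examples (made up for illustration).
    base = [True, True, False, False, False, False, False, False, False, False]
    tuned = [True, True, True, True, True, True, True, True, False, False]
    print(mcnemar_exact(base, tuned))
```

A paired test fits here because both models see the same examples, so only the disagreements carry information about which one is better.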
All of the following are in production in 2026. A fine-tuned 7B on a single L40S scores a support ticket in 40ms at 1/20th the cost of a 500ms frontier call. A 5,000-example fine-tune on Qwen 2.5 14B produces a model whose voice is unmistakably yours across three languages, which is the part customers notice. A fine-tuned 7B with Instructor or Outlines produces schema-conformant JSON at 99%+ versus maybe 85% for a prompted base: the difference between "works in production" and "works in demos." RAG hands the model documents; fine-tuning hands it the idioms of the domain. The dead-crowd framing of RAG vs fine-tuning was always a false binary.
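A schema-conformance number like that is easy to measure on your own outputs. A rough sketch using only the standard library: the two-field ticket schema and the function names are hypothetical, and a real pipeline would more likely validate against a full JSON Schema or a typed model.

```python
import json

# Hypothetical schema for a support-ticket classifier: field name -> exact type.
REQUIRED_FIELDS = {"category": str, "priority": int}


def conforms(raw: str) -> bool:
    """True if raw parses as a JSON object with every required field and type."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    # Exact type match, so a JSON boolean does not pass as an integer.
    return all(type(obj.get(name)) is expected for name, expected in REQUIRED_FIELDS.items())


def conformance_rate(outputs) -> float:
    """Fraction of raw model outputs that satisfy the schema."""
    if not outputs:
        raise ValueError("need at least one output to score")
    return sum(conforms(raw) for raw in outputs) / len(outputs)


if __name__ == "__main__":
    # Made-up outputs: one valid, three broken in different ways.
    samples = [
        '{"category": "billing", "priority": 2}',
        '{"category": "billing"}',
        "Sure! Here is the JSON you asked for.",
        '{"category": "billing", "priority": "high"}',
    ]
    print(conformance_rate(samples))
```

Run the same scorer over the prompted base and the fine-tune on identical inputs, and the gap between the two rates is the number to put in the business case.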
Why did the dead-crowd miss it? Clickbait, for a start. "Fine-tuning is dead" is a take; "fine-tuning is having a parameter-efficient renaissance driven by QLoRA" is a research paper. Nobody tweets the second one. The pundits were also at frontier labs or startups with unlimited API budgets: when your OpenAI bill is $400K a month, the cost-curve argument does not land; when it is $4K and you have engineers who can run a fine-tune, it is the entire business case. And they were not paying attention to the open-weight ecosystem. The Qwen, DeepSeek, Mistral, Llama, and Unsloth teams shipped a coherent fine-tuning stack between mid-2024 and mid-2025 that made the 2023 take obsolete. The pundits were still arguing about GPT-4.5 vs Claude 3.5 and missed the whole game moving to the open-weights side.
Fine-tuning is back because of the open-weights stack. PEFT, QLoRA, DoRA, Unsloth, the Llama/Qwen/Mistral/DeepSeek family, and the modern eval stack together turned fine-tuning from a 6-week $80K project into a 2-day $200 experiment. Every team in 2026 paying frontier API rates for high-volume classification, structured extraction, brand-voice generation, or domain-expert work is leaving 10x-100x margin on the table. The teams that figure this out first will own a cost advantage their competitors cannot match in a quarter. The ones still tweeting "just prompt it better" will keep paying frontier prices and wonder where the margins went.
— Mr. Technology