← Back to Payloads
Open Source2026-06-22

LLaMA-Factory Is the Boring Fine-Tuning Framework That Actually Ships, and That Is the Highest Compliment in Infrastructure

Unsloth is the fast kid. Axolotl is the configurable kid. LLaMA-Factory is the kid who shows up with the right YAML, the right dataset utilities, and a web UI the rest of your team can use without a PhD. ~50,000 GitHub stars, 200+ model recipes, Apache 2.0, and the only fine-tuning framework quietly adopted by Alibaba, Microsoft, Tencent, IBM, NVIDIA, and Baidu.
Quick Access
Install command
$ mrt install llama-factory
Browse related skills
LLaMA-Factory Is the Boring Fine-Tuning Framework That Actually Ships, and That Is the Highest Compliment in Infrastructure

LLaMA-Factory Is the Boring Fine-Tuning Framework That Actually Ships, and That Is the Highest Compliment in Infrastructure

The fine-tuning stack of 2026 has a celebrity — Unsloth, with its Triton kernels and 2x speedup headlines — and a research workhorse — Axolotl, with its YAML configs and multi-node pretraining reputation. Neither is what most teams use to ship a fine-tune into production. That job belongs to LLaMA-Factory, the largest fine-tuning project on GitHub that nobody in the West talks about.

Hi guys, Mr. Technology here.

LLaMA-Factory is the Apache 2.0 fine-tuning framework built by Yaowei Zheng (hiyouga). ~50,000 GitHub stars, 200+ supported model architectures, 100+ dataset formats, a built-in web UI (LLaMA Board), and integration with Hugging Face Transformers, PEFT, TRL, Liger Kernel, Unsloth, FlashAttention 2, BAdam, GaLore, DoRA. The institutional adopters are the part nobody in the U.S. press has noticed: Alibaba, Microsoft, Tencent, IBM, NVIDIA, Baidu, the Chinese Academy of Sciences. When Microsoft Research Asia publishes a fine-tuning paper, the code runs on LLaMA-Factory. When Alibaba ships a domain-tuned Qwen variant, it ships from LLaMA-Factory. That is infrastructure adoption.

What It Actually Is

A thin wrapper around the Hugging Face training stack. A train.py entry point, a YAML for model + dataset + method + hyperparameters, a CLI that does the right thing by default. The piece most people miss is the dataset layer — LLaMA-Factory ships a unified preprocessing pipeline handling Alpaca, ShareGPT, OpenAI messages format, KTO, DPO, ranking, retrieval-augmented QA, and ~100 more formats through a single dataset_dir config key. Point it at a JSON Lines file, and the trainer, loss function, chat template, masking, and packing all get wired correctly. The win is the dataset abstraction — the part every team gets wrong the first ten times they fine-tune on custom data.

The web UI is the piece I underestimated. LLaMA Board is a Gradio app that loads a model, lets a non-engineer pick a dataset, pick a LoRA rank, click Start, watch the loss curve. For teams past 20 engineers where prompt-engineer, data team, and engineering team are different people, this matters more than a 30% training speedup.

The Method Zoo

LLaMA-Factory ships every fine-tuning method that has produced a real result in the last two years in a single config: full-parameter fine-tuning with DeepSpeed ZeRO-2/3 or FSDP, freeze tuning, LoRA / QLoRA with bitsandbytes 4-bit, DoRA / rsLoRA / LoRA+, GaLore, BAdam, every alignment method (DPO, KTO, ORPO, SimPO, CPO, RM), NEFTune, MoE routing tweaks. And the two that matter most: Liger Kernel (enable_liger_kernel: true for a 60% VRAM cut) and the Unsloth backend (use_unsloth: true for a 2x speedup).

LLaMA-Factory is the orchestration layer that lets you pick the backend. Run the same YAML with one backend for a VRAM cut, flip to another for speed, flip back when you want to debug a forward pass without custom kernels. The framework exposes the trade-off instead of making it for you. That is rare, and it is the difference between a research tool and infrastructure.

What It Beats, What It Loses To

Versus Unsloth. Unsloth is faster on a single GPU — the Triton kernels pay off, especially on a 24GB consumer card with QLoRA. For single-GPU QLoRA runs, Unsloth is the right answer. For everything else, LLaMA-Factory is.

Versus Axolotl. Axolotl is the labs' tool — YAML-first, multi-node-first, full-FT-first. Excellent for pretraining a 70B on a 32-GPU cluster. Famously brittle when you deviate from a known recipe. For applied fine-tuning in a product team, LLaMA-Factory is more flexible, more documented, more frequently updated.

Versus raw Hugging Face + PEFT + TRL. The pure HF stack is the substrate. LLaMA-Factory is a smart wrapper. You can do everything it does by writing your own training script — three weeks for you, three more for your colleague. LLaMA-Factory is the HF stack with the parts you would have written yourself, already written, tested, and documented in Chinese and English.

What I Don't Love

The defaults lag on new architectures. LLaMA-Factory often trails Axolotl by a week on chat template and RoPE scaling for fresh Qwen or Llama variants. Pull requests from Alibaba and Microsoft Research Asia land within days, but the day-zero experience is rough.

The dataset schema lock-in is real. Multi-turn tool calls with structured output, retrieval-augmented chains, agent traces take a day to massage into shape. The win is that once you have done it, your dataset works across every supported model.

Docs are excellent in Chinese and uneven in English. Most primary documentation, blog posts, and tutorial videos are in Chinese first. English translations are good but lag by weeks.

The Take

LLaMA-Factory is not the most exciting fine-tuning framework. It is not the fastest, the most configurable, or the most research-friendly. It is the most adopted — by an order of magnitude, in any honest count of institutional users. The reason is the boring one: it ships, it works, and it does the part everyone else makes you do yourself.

If you are choosing a fine-tuning framework today, the answer is LLaMA-Factory for the orchestration layer, Unsloth for the GPU-efficient QLoRA path, Axolotl for multi-node pretraining, and raw HF for the architecture nobody has wrapped yet. LLaMA-Factory is the only one that makes the orchestration part look easy, and in 2026, the easy option that ships is the highest compliment in infrastructure.

Mr. Technology


*LLaMA-Factory: github.com/hiyouga/LLaMA-Factory — Apache 2.0, ~50,000 GitHub stars, 200+ model architectures (Llama 3.x/4, Qwen 2/3, Mistral, Gemma 2/3, Phi-3/4, DeepSeek-V3, GLM-4, InternLM, Baichuan, Yi, Command R+), 100+ dataset formats. LLaMA Board web UI on localhost:7860. Liger Kernel + Unsloth backends configurable per run. CLI: pip install llamafactory then llamafactory-cli train/eval/export/webui.*

Related Dispatches