Tutorial · 2026-06-08

Bun + Ollama: A 60-Second Local LLM Dev Loop With Hot Reload

Stop paying the model-load tax on every prompt edit. Bun's built-in --hot flag plus a long-lived Ollama client gives you sub-second prompt iteration. Sixty seconds of setup, real afternoon-long productivity.

You are building a tool that calls a local LLM. You change the prompt. You save. You restart the script. You wait for the model to load. You curse. You do this forty times an afternoon.

The fix is hot reload on the prompt and a long-lived Bun process that keeps the Ollama connection warm. Sixty seconds of setup. Real, saved-in-the-muscle productivity.

The Setup (One Minute)

You need two things: Ollama running and Bun installed.

```bash
# Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Start the server before pulling (skip if the installer already runs it as a service)
ollama serve &
ollama pull llama3.2:3b

# Bun
curl -fsSL https://bun.sh/install | bash
```

Pick a small model for dev. llama3.2:3b keeps the loop under a second. Save the 70B for staging.

The Project

```bash
mkdir llm-dev-loop && cd llm-dev-loop
bun init -y
bun add ollama
```

Two files. index.ts is the runner. prompt.txt is the prompt. That separation is the trick: Bun watches the files a module imports and re-runs the module when any of them changes, so the prompt is imported as text rather than read from disk.

```typescript
// index.ts
import ollama from "ollama";
// Importing the prompt puts prompt.txt in the module graph that --hot watches;
// a file read with readFileSync is not tracked as an import.
import prompt from "./prompt.txt";

const res = await ollama.chat({
  model: "llama3.2:3b",
  messages: [{ role: "user", content: prompt }],
  stream: false,
});

console.log("\n=== MODEL ===");
console.log(res.message.content);
console.log("=== END ===\n");
```

The Hot Reload

Bun has --hot built in. It reloads modules on file change, keeps the parent process alive, and does not tear down the Ollama connection on every save.

```bash
bun --hot run index.ts
```

Edit prompt.txt. Save. The script reruns in under 200ms. The Ollama server keeps the model resident between requests (for five minutes after the last call by default), so you iterate on the prompt in real time: same feel as tweaking a React component, none of the model-loading tax.
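Because the process survives each reload, anything parked on globalThis carries over from one run to the next, while module-level variables are reset. A minimal sketch of that idea, assuming a hypothetical `reloadCount` property and `recordReload` helper (neither is part of the script above):

```typescript
// Hypothetical helper: counts how many times this module has been evaluated
// in the current process. Under `bun --hot`, module-level variables start
// fresh on every reload, but properties on globalThis persist.
type HotState = { reloadCount?: number };

function recordReload(
  state: HotState = globalThis as unknown as HotState,
): number {
  state.reloadCount = (state.reloadCount ?? 0) + 1;
  return state.reloadCount;
}

// Printed once per evaluation; the number climbs with each save under --hot.
console.log(`evaluation #${recordReload()}`);
```

The same trick could hold a cache of earlier model outputs, so a new prompt can be compared against the previous run without leaving the terminal.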

The Pattern That Saved Me

Three things made this sticky:

**1. A seed file for deterministic testing.** Drop a few example inputs in seeds/. Loop over them in dev. You stop staring at one example and start seeing the prompt across the dataset.

**2. A system.txt separate from prompt.txt.** Most of your prompt engineering lives in the system message. Keep user input separate so you can iterate on each independently.

**3. A --smoke flag for CI.** Same script, different entry point. Bun.argv makes this a one-line check.

```typescript
if (Bun.argv.includes("--smoke")) {
  // run 3 fixed seeds, assert no truncation, exit 0
}
```
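The three pieces above can be sketched as a handful of small functions. This is one possible shape under stated assumptions, not the author's implementation: `buildMessages`, `isTruncated`, and `smoke` are hypothetical names, and the truncation check assumes the chat response carries a `done_reason` field whose value is "length" when generation stopped at the token limit.

```typescript
// Hypothetical shapes and helpers, not part of the original script.
type ChatMessage = { role: "system" | "user"; content: string };

// Assumed subset of a chat response: `done_reason` is "length" when
// generation stopped at the token limit instead of finishing naturally.
type ChatResult = { message: { content: string }; done_reason?: string };

// Combine the system prompt (system.txt) with one seed input.
function buildMessages(system: string, userInput: string): ChatMessage[] {
  return [
    { role: "system", content: system.trim() },
    { role: "user", content: userInput.trim() },
  ];
}

// A reply counts as truncated if the model hit its length limit
// or returned nothing at all.
function isTruncated(result: ChatResult): boolean {
  return (
    result.done_reason === "length" || result.message.content.trim() === ""
  );
}

// Run every seed through `chat` and return the seeds whose replies
// were truncated. An empty array means the smoke run passed.
async function smoke(
  system: string,
  seeds: string[],
  chat: (messages: ChatMessage[]) => Promise<ChatResult>,
): Promise<string[]> {
  const failures: string[] = [];
  for (const seed of seeds) {
    const result = await chat(buildMessages(system, seed));
    if (isTruncated(result)) failures.push(seed);
  }
  return failures;
}
```

Passing `chat` in as a parameter keeps the helpers independent of the client. In index.ts it could wrap the existing ollama.chat call, and the --smoke branch could exit non-zero whenever the returned list is not empty.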

The Result

You go from "edit → restart → wait for model load → read output" to "edit → read output". The whole afternoon opens up. You start telling the prompts apart immediately: the bad ones, the ones that work across the seeds, the ones that drift on edge cases. Same idea as TDD for prompts: fast feedback, observable behavior, no waiting on infrastructure.

The loop is small, free, and fits in a paragraph of your CLAUDE.md so the team uses the same pattern. Do it once on Monday, get it back for the rest of the quarter.

Mr. Technology


*Bun 1.1+ (--hot flag shipped stable in 1.1.0, April 2024). Ollama 0.5+ with llama3.2:3b (~2GB VRAM, ~150ms first-token on M2 Pro, 50-80ms steady-state). Tested June 2026 on Linux + macOS. Pair with watchman on Linux for faster file-change detection.*
