
You are building a tool that calls a local LLM. You change the prompt. You save. You restart the script. You wait for the model to load. You curse. You do this forty times an afternoon.
The fix is hot reload on the prompt plus a long-lived Bun process, with the Ollama server keeping the model loaded between runs. Sixty seconds of setup. Real productivity that sticks in muscle memory.
You need two things: Ollama running and Bun installed.
```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &   # skip if Ollama is already running as a service
ollama pull llama3.2:3b
curl -fsSL https://bun.sh/install | bash
```
Pick a small model for dev. llama3.2:3b keeps the loop under a second. Save the 70B for staging.
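If you want the dev/staging split to live in config rather than in your head, one minimal approach is to read the model name from an environment variable. `LLM_MODEL` and `pickModel` are hypothetical names chosen for this sketch; neither Ollama nor Bun reads them.

```typescript
// Hypothetical convention: LLM_MODEL swaps the model without touching code,
// e.g. LLM_MODEL=llama3.1:70b in staging. Unset or blank falls back to dev.
const DEV_MODEL = "llama3.2:3b";

function pickModel(env: Record<string, string | undefined> = process.env): string {
  const override = env.LLM_MODEL?.trim();
  return override ? override : DEV_MODEL;
}

console.log(`model: ${pickModel()}`);
```

Run `LLM_MODEL=llama3.1:70b bun --hot run index.ts` in staging and pass `pickModel()` as the `model` field instead of the hard-coded string.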
```bash
mkdir llm-dev-loop && cd llm-dev-loop
bun init -y
bun add ollama
```
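Two files. index.ts is the runner. prompt.txt is the prompt. That separation is the trick: `bun --hot` watches every file in the import graph and re-runs the code when one changes, so the prompt has to be imported rather than read with readFileSync, which Bun does not track.
```typescript
// index.ts
import ollama from "ollama";
// Bun's text loader imports a .txt file as a string. Because the file is now
// part of the module graph, `bun --hot` re-runs this script when it changes.
// (A readFileSync call is not tracked, so edits would not trigger a reload.)
import prompt from "./prompt.txt";

const res = await ollama.chat({
  model: "llama3.2:3b",
  messages: [{ role: "user", content: prompt }],
  stream: false,
});

console.log("\n=== MODEL ===");
console.log(res.message.content);
console.log("=== END ===\n");
```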
Bun has --hot built in. When a watched file changes, it re-runs your code inside the same process instead of restarting it, so there is no cold start on every save.
```bash
bun --hot run index.ts
```
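Because `--hot` keeps the process, anything stored on `globalThis` survives a reload. Here is a small sketch, assuming you want to know whether an edit actually changed the model's answer. The `recordRun` helper is hypothetical; it keeps the previous response and a run counter on `globalThis`.

```typescript
// Hypothetical helper: remember the previous response across `bun --hot` reloads.
// --hot re-runs your modules but keeps the process, so globalThis survives.
export {};

declare global {
  var lastOutput: string | undefined;
  var runCount: number | undefined;
}

function recordRun(output: string): { run: number; changed: boolean } {
  // Nothing to compare against on the first run, so report "unchanged".
  const changed = globalThis.lastOutput !== undefined && globalThis.lastOutput !== output;
  const run = (globalThis.runCount ?? 0) + 1;
  globalThis.lastOutput = output;
  globalThis.runCount = run;
  return { run, changed };
}
```

In index.ts, call `recordRun(res.message.content)` after the chat call and log something like `run #4 (output changed)`, so a save that made no difference is obvious at a glance.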
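Edit prompt.txt. Save. Bun re-runs the script in place, and the reload itself takes under 200ms. The model stays resident because the Ollama server keeps it loaded between requests (for five minutes after the last call, by default). You iterate on the prompt in real time: same feel as tweaking a React component, none of the model-loading tax.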
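If your sessions have long pauses, you can ask Ollama to keep the model loaded longer and to load it before your first edit. This sketch relies on Ollama's `keep_alive` request field and on its documented behavior of loading a model when `messages` is empty; the 30-minute value and the helper names are illustrative. The request type is declared locally so the block has no dependencies.

```typescript
// Request shape for Ollama's chat endpoint, typed locally for this sketch.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type ChatRequest = {
  model: string;
  messages: ChatMessage[];
  stream: false;
  keep_alive?: string | number; // "30m", seconds, or a negative value to stay loaded
};

// Assumption: 30 minutes covers one working session (Ollama's default is 5m).
const KEEP_ALIVE = "30m";

// An empty messages array asks Ollama to load the model without generating a reply.
function warmupRequest(model: string): ChatRequest {
  return { model, messages: [], stream: false, keep_alive: KEEP_ALIVE };
}

// Each request carries keep_alive, which resets how long the model stays loaded.
function promptRequest(model: string, prompt: string): ChatRequest {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    stream: false,
    keep_alive: KEEP_ALIVE,
  };
}
```

Send `warmupRequest("llama3.2:3b")` through `ollama.chat` once, then use `promptRequest(...)` for each run. Setting `OLLAMA_KEEP_ALIVE` on the server applies the same idea to every client.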
Three things made this sticky:
**1. A seeds/ folder for repeatable testing.** Drop a few example inputs in seeds/ and loop over them in dev. You stop staring at one example and start seeing how the prompt behaves across the dataset.
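One way to wire the seeds loop, assuming one input per `seeds/*.txt` file. The prompt you are iterating on goes in as the system message and each seed as the user message. `temperature: 0` and a fixed `seed` in Ollama's `options` ask for repeatable output, so a change in the answers should come from your edit rather than from sampling. The chat function is passed in, which keeps the loop testable without a running server; `runSeeds` and `loadSeeds` are hypothetical names.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

type ChatRequest = {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  stream: false;
  options: { temperature: number; seed: number };
};
type ChatFn = (req: ChatRequest) => Promise<{ message: { content: string } }>;
type Seed = { name: string; input: string };

// Assumed layout: one example input per seeds/*.txt file.
function loadSeeds(dir = "./seeds"): Seed[] {
  return readdirSync(dir)
    .filter((file) => file.endsWith(".txt"))
    .sort() // stable order so runs line up when you compare them
    .map((file) => ({ name: file, input: readFileSync(join(dir, file), "utf-8") }));
}

// Send the same instructions with every seed; fixed sampling settings
// ask Ollama for repeatable output across runs.
async function runSeeds(
  chat: ChatFn,
  model: string,
  instructions: string,
  seeds: Seed[] = loadSeeds(),
): Promise<Map<string, string>> {
  const outputs = new Map<string, string>();
  for (const { name, input } of seeds) {
    const res = await chat({
      model,
      messages: [
        { role: "system", content: instructions },
        { role: "user", content: input },
      ],
      stream: false,
      options: { temperature: 0, seed: 42 },
    });
    outputs.set(name, res.message.content);
    console.log(`--- ${name} ---\n${res.message.content}\n`);
  }
  return outputs;
}
```

In index.ts: `await runSeeds((req) => ollama.chat(req), "llama3.2:3b", prompt);`. The seed files are read with readdirSync/readFileSync, so `--hot` does not react to edits in seeds/; it re-runs when an imported file changes.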
**2. A system.txt separate from prompt.txt.** Most of your prompt engineering lives in the system message. Keep user input separate so you can iterate on each independently.
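A minimal sketch of the split, assuming both files hold plain text. The `buildMessages` helper is hypothetical; it skips the system turn when system.txt is blank, so the request then carries only the user input.

```typescript
type ChatMessage = { role: "system" | "user"; content: string };

// system.txt holds the instructions you tune; prompt.txt holds the user input.
function buildMessages(system: string, user: string): ChatMessage[] {
  const messages: ChatMessage[] = [];
  // Whitespace-only system text is treated as "no system message".
  if (system.trim()) messages.push({ role: "system", content: system.trim() });
  messages.push({ role: "user", content: user });
  return messages;
}
```

In index.ts, import both files (`import system from "./system.txt"`) rather than reading them with readFileSync, so both sit in the module graph that `bun --hot` watches, then pass `messages: buildMessages(system, prompt)`.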
**3. A --smoke flag for CI.** Same script, different entry point. Bun's Bun.argv makes this one line.
```typescript
if (Bun.argv.includes("--smoke")) {
  // run 3 fixed seeds, assert no truncation, exit 0
}
```
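Filled in, the smoke check might look like this. It is a sketch: the three inputs are placeholders, it treats Ollama's `done_reason: "length"` (generation stopped at the token limit) as truncation and an empty reply as a failure, and it exits non-zero so CI notices. It reads `process.argv`, which holds the same arguments as `Bun.argv` under Bun, and loads the `ollama` package only when `--smoke` is passed.

```typescript
export {};

type SmokeRequest = {
  model: string;
  messages: { role: "user"; content: string }[];
  stream: false;
  options: { temperature: number; seed: number };
};
type SmokeResponse = { message: { content: string }; done_reason?: string };
type SmokeChat = (req: SmokeRequest) => Promise<SmokeResponse>;

// Hypothetical fixed inputs; replace with three files from seeds/.
const SMOKE_SEEDS = [
  "Summarize in one sentence: Bun can hot reload modules.",
  "List three uses for a local LLM.",
  "Reply with the word OK.",
];

// Returns one message per failing seed; an empty array means the smoke test passed.
async function smoke(chat: SmokeChat, model: string, seeds: string[] = SMOKE_SEEDS): Promise<string[]> {
  const failures: string[] = [];
  for (const [i, input] of seeds.entries()) {
    const res = await chat({
      model,
      messages: [{ role: "user", content: input }],
      stream: false,
      options: { temperature: 0, seed: 42 },
    });
    // "length" means the model hit its token limit mid-answer.
    if (res.done_reason === "length") failures.push(`seed ${i}: truncated`);
    else if (!res.message.content.trim()) failures.push(`seed ${i}: empty response`);
  }
  return failures;
}

if (process.argv.includes("--smoke")) {
  // Imported lazily so the helpers above have no hard dependency on the package.
  const { default: ollama } = await import("ollama");
  const failures = await smoke((req) => ollama.chat(req), "llama3.2:3b");
  for (const failure of failures) console.error(failure);
  process.exit(failures.length === 0 ? 0 : 1);
}
```

In CI, run `bun run index.ts --smoke`; any truncated or empty answer fails the job.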
The loop shrinks from edit → restart → wait for model load → read output to just edit → read output. The whole afternoon opens up. You notice immediately which prompts are bad, which hold up across the seeds, and which drift on edge cases. It's the same idea as TDD, applied to prompts: fast feedback, observable behavior, no waiting on infrastructure.
The loop is small, free, and fits in a paragraph of your CLAUDE.md so the team uses the same pattern. Do it once on Monday, get it back for the rest of the quarter.
— Mr. Technology
*Bun 1.1+ (--hot flag shipped stable in 1.1.0, April 2024). Ollama 0.5+ with llama3.2:3b (~2GB VRAM, ~150ms first-token on M2 Pro, 50-80ms steady-state). Tested June 2026 on Linux + macOS. Pair with watchman on Linux for faster file-change detection.*