
You spent $200 last month on Copilot. Your M2 Pro has 32GB of unified memory sitting idle. The 7B coding model you've been calling from OpenRouter fits on your laptop. You do not need to pay for a coding assistant in 2026. The setup is brew install ollama → ollama pull qwen2.5-coder:7b → install Continue.dev → point it at the local endpoint. Free. Private. Does not phone home.
Hey guys, Mr. Technology here.
Continue.dev is an open-source VSCode/JetBrains extension that gives you inline edits, autocomplete, and a chat sidebar. The trick is ~/.continue/config.json lets you point it at any OpenAI-compatible endpoint — including Ollama on http://localhost:11434. Same UX as Copilot. Your data never leaves your laptop. Your inference cost is electricity.
```bash brew install ollama ollama serve & # runs on http://localhost:11434 ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:32b ```
Verify with curl http://localhost:11434/api/generate -d '{"model":"qwen2.5-coder:7b","prompt":"print hi","stream":false}'. Returns JSON in under a second on M2 Pro. If it hangs, your model did not load — check ollama list.
Install from the VSCode Marketplace (ext install continue.continue). Open the config:
bash mkdir -p ~/.continue code ~/.continue/config.json
Drop this in:
json { "models": [ { "title": "Qwen 2.5 Coder 7B (Local)", "provider": "ollama", "model": "qwen2.5-coder:7b", "apiBase": "http://localhost:11434" } ], "tabAutocompleteModel": { "title": "Qwen 2.5 Coder 1.5B", "provider": "ollama", "model": "qwen2.5-coder:1.5b", "apiBase": "http://localhost:11434" } }
Two models, not one. The 7B handles chat and inline edits. The 1.5B handles autocomplete — because autocomplete fires 50+ times an hour and the smaller model is 4x faster. This is the single most useful Continue.dev configuration tip nobody writes down.
Continue loads ~/.continue/prompts/ as slash commands. Drop a file called review.md:
```markdown
name: review description: Code review focused on production safety
Review the highlighted code for: SQL injection, missing input validation, hardcoded secrets, error swallowing, and N+1 queries. Be terse. Use bullet points. ```
Type /review in the chat sidebar on any code block. Same UX as Cursor's /review, no subscription.
ollama serve is competing with the chat model for the same GPU. Run two Ollama instances on different ports, or drop the small model — autocomplete gets noticeably worse but chat stays sharp."contextLength": 16384 and bump maxTokens in the same config."allowAnonymousTelemetry": false to be sure, or block Ollama's outbound traffic with Little Snitch / pf.Continue.dev + Ollama is the boring free local coding-assistant setup that just works in 2026. No subscription. No data leaving your machine. A Qwen 2.5 Coder 7B on a 32GB Mac handles 90% of what Copilot does, with the remaining 10% being tasks where you want a frontier model — and for those, point Continue at OpenRouter and add a second model entry. The 15-minute setup is the highest-leverage afternoon you will spend on your dev environment this month.
— Mr. Technology
*Tested June 2026 with Continue 1.1.x, Ollama 0.5+, Qwen 2.5 Coder (1.5B / 7B / 32B), on macOS 15+ and Ubuntu 24.04. The two-model split (large for chat, small for autocomplete) is the only non-obvious config. Set "allowAnonymousTelemetry": false in Continue's settings.json if privacy matters. Add an OpenRouter model entry under models[] for the 10% of tasks where you need a frontier model — same UX, costs only the actual calls.*