You spent $200 last month on Copilot. Your M2 Pro has 32GB sitting idle. A 15-minute setup gives you a free, private, code-aware VSCode AI assistant.

Continue.dev + Ollama: A Free Local AI Coding Assistant in VSCode (15-Minute Setup)

You spent $200 last month on Copilot. Your M2 Pro has 32GB of unified memory sitting idle. The 7B coding model you've been calling from OpenRouter fits on your laptop. You do not need to pay for a coding assistant in 2026. The setup is brew install ollama → ollama pull qwen2.5-coder:7b → install Continue.dev → point it at the local endpoint. Free. Private. Does not phone home.

Hey guys, Mr. Technology here.

What You Get

Continue.dev is an open-source VSCode/JetBrains extension that gives you inline edits, autocomplete, and a chat sidebar. The trick is that ~/.continue/config.json lets you point it at any OpenAI-compatible endpoint, including Ollama on http://localhost:11434. Copilot-style UX. Your code never leaves your laptop. Your inference cost is electricity.

Step 1 — Ollama

```bash
brew install ollama
ollama serve &              # runs on http://localhost:11434
ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:1.5b   # small model for tab autocomplete (Step 2)
# optional: bigger model if you have the memory (~20GB)
ollama pull qwen2.5-coder:32b
```

Verify with curl http://localhost:11434/api/generate -d '{"model":"qwen2.5-coder:7b","prompt":"print hi","stream":false}'. Once the model is loaded, it returns JSON in under a second on an M2 Pro; the first call takes longer while the weights load into memory. If it hangs, your model did not load: check ollama list.
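If you would rather script that check than eyeball the listing, here is a minimal sketch. has_model and missing_models are hypothetical helpers, not Ollama commands; they assume the default ollama list layout, where the first column of every row after the header is the model tag.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an Ollama command): succeed if a model tag
# appears in `ollama list` output read from stdin.
has_model() {
  local tag="$1"
  # Skip the header row and compare the first column (NAME) exactly.
  awk -v t="$tag" 'NR > 1 && $1 == t { found = 1 } END { exit !found }'
}

# Check several tags against one listing; print each missing tag and
# return 1 if any are missing.
missing_models() {
  local listing tag rc=0
  listing="$(cat)"
  for tag in "$@"; do
    if ! printf '%s\n' "$listing" | has_model "$tag"; then
      echo "missing: $tag"
      rc=1
    fi
  done
  return "$rc"
}

# Example, checking both models the Step 2 config uses:
# ollama list | missing_models qwen2.5-coder:7b qwen2.5-coder:1.5b
```

Any tag it prints is one you still need to ollama pull.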

Step 2 — Continue.dev

Install from the VSCode Marketplace (ext install continue.continue). Open the config:

```bash
mkdir -p ~/.continue
code ~/.continue/config.json
```

Drop this in:

```json
{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7B (Local)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5 Coder 1.5B",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b",
    "apiBase": "http://localhost:11434"
  }
}
```

Two models, not one. The 7B handles chat and inline edits. The 1.5B handles autocomplete, because autocomplete fires 50+ times an hour and the smaller model is roughly 4x faster. This is the single most useful Continue.dev configuration tip, and the easiest one to miss.
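To check the speed gap on your own hardware instead of taking the 4x figure on faith, you can hit the same /api/generate endpoint from Step 1 with each model. A non-streaming response includes eval_count (tokens generated) and eval_duration (nanoseconds), so tokens per second is eval_count / eval_duration × 10^9. The sketch below is a hypothetical helper that extracts those two fields with grep, so it needs no jq; it assumes the response is a single JSON object.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one non-streaming /api/generate response on stdin
# and print generation speed in tokens per second.
tokens_per_sec() {
  local json count dur
  json="$(cat)"
  # The leading quote in each pattern stops "prompt_eval_count" and
  # "prompt_eval_duration" from matching.
  count="$(printf '%s' "$json" | grep -o '"eval_count":[[:space:]]*[0-9]*' | grep -o '[0-9]*$' | head -n 1)"
  dur="$(printf '%s' "$json" | grep -o '"eval_duration":[[:space:]]*[0-9]*' | grep -o '[0-9]*$' | head -n 1)"
  if [ -z "$count" ] || [ -z "$dur" ] || [ "$dur" -eq 0 ]; then
    echo "no eval_count/eval_duration in response" >&2
    return 1
  fi
  # eval_duration is in nanoseconds.
  awk -v c="$count" -v d="$dur" 'BEGIN { printf "%.1f\n", c / (d / 1e9) }'
}

# Example: compare both models on the same prompt.
# for m in qwen2.5-coder:1.5b qwen2.5-coder:7b; do
#   printf '%s: ' "$m"
#   curl -s http://localhost:11434/api/generate \
#     -d "{\"model\":\"$m\",\"prompt\":\"def fib(n):\",\"stream\":false}" | tokens_per_sec
# done
```

Ollama reports model load time separately as load_duration, so a cold first call should not distort the tokens-per-second number.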

Step 3 — Slash Commands

Continue loads ~/.continue/prompts/ as slash commands. Drop a file called review.md:

```markdown
---
name: review
description: Code review focused on production safety
---
Review the highlighted code for: SQL injection, missing input validation, hardcoded secrets, error swallowing, and N+1 queries. Be terse. Use bullet points.
```

Type /review in the chat sidebar on any code block. Same UX as Cursor's /review, no subscription.
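Once /review works, each additional command is just another file in the same folder. Here is a minimal sketch of a hypothetical new_prompt helper that writes a file with the same frontmatter as review.md. It assumes Continue picks up any .md file in ~/.continue/prompts/ the same way it picks up the review example, and it refuses to overwrite an existing command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a Continue prompt file with the same
# frontmatter layout as review.md.
# Usage: new_prompt DIR NAME DESCRIPTION BODY
new_prompt() {
  local dir="$1" name="$2" description="$3" body="$4"
  local file="$dir/$name.md"
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi
  mkdir -p "$dir"
  # Unquoted EOF expands the variables once; their contents are not re-expanded.
  cat > "$file" <<EOF
---
name: $name
description: $description
---
$body
EOF
}

# Example (commented out so sourcing this file has no side effects):
# new_prompt ~/.continue/prompts tests "Unit tests for the selection" \
#   "Write unit tests for the highlighted code. Cover edge cases and error paths. Be terse."
```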

Gotchas

Autocomplete latency. If the 1.5B feels slow, it is competing with the chat model for the same GPU under a single ollama serve. Run two Ollama instances on different ports, or drop the small model: autocomplete gets noticeably worse, but chat stays sharp.
Context window. Qwen 2.5 Coder 7B has a 32K context window. Continue's default pulls 4K tokens from the current file plus chat history. For longer files, add "contextLength": 16384 to the 7B's model entry and raise "maxTokens" inside its "completionOptions" in the same config.
Privacy. Ollama runs entirely on-device. Continue's anonymous telemetry is on by default, so set "allowAnonymousTelemetry": false in ~/.continue/config.json to turn it off. To be sure, also block Ollama's outbound traffic with Little Snitch / pf once your models are pulled.
Memory. The 32B model needs roughly 20GB of VRAM or unified memory. An M2 Max with 64GB handles it; an 8GB M1 does not. On a PC with integrated graphics, stay on the 7B.
VSCode, not Cursor. Continue works in Cursor, but the chat UX is rough: Cursor has not merged the integration into its sidebar. VSCode users: this is the afternoon you wire it up.
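For the two-instance option under Autocomplete latency, Ollama's OLLAMA_HOST environment variable sets the address ollama serve binds to, and a second server started by the same user reads the same local model store, so nothing has to be pulled twice. The sketch below is a hypothetical start_autocomplete_server helper; port 11435 and the /tmp log path are arbitrary choices, not Ollama defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a second Ollama server dedicated to tab autocomplete.
# Port 11435 is an arbitrary free port, not an Ollama default.
start_autocomplete_server() {
  local port="${1:-11435}"
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama not found on PATH" >&2
    return 127
  fi
  # OLLAMA_HOST sets the address `ollama serve` listens on (default 127.0.0.1:11434).
  OLLAMA_HOST="127.0.0.1:${port}" ollama serve >"/tmp/ollama-${port}.log" 2>&1 &
  # Print the PID so you can stop this instance later with kill.
  echo "$!"
}

# Then point Continue's tabAutocompleteModel at it in ~/.continue/config.json:
#   "apiBase": "http://localhost:11435"
# and leave the chat model on http://localhost:11434.
```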
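For the 32B memory gotcha, a rough way to choose between the 7B and 32B tags is to read total RAM (sysctl hw.memsize on macOS, MemTotal in /proc/meminfo on Linux) and apply a cutoff. The 48GB threshold below is an assumption, not an Ollama rule: it leaves headroom above the ~20GB the 32B model needs for the OS, your editor, and the 1.5B autocomplete model. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: suggest a Qwen 2.5 Coder chat model from total memory.

# Print total system memory in whole GiB.
total_mem_gb() {
  case "$(uname -s)" in
    Darwin) echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 )) ;;
    # On Linux, MemTotal in /proc/meminfo is reported in kB.
    *) awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo ;;
  esac
}

# Assumed cutoff: 48 GiB or more leaves room for the ~20GB 32B model.
pick_model() {
  local gb="$1"
  if [ "$gb" -ge 48 ]; then
    echo "qwen2.5-coder:32b"
  else
    echo "qwen2.5-coder:7b"
  fi
}

# Example:
# ollama pull "$(pick_model "$(total_mem_gb)")"
```

The cutoff is deliberately conservative; if you have 32GB and want to try the 32B anyway, watch memory pressure while it runs.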

The Take

Continue.dev + Ollama is the boring, free, local coding-assistant setup that just works in 2026. No subscription. No code leaving your machine. In my testing, Qwen 2.5 Coder 7B on a 32GB Mac handles about 90% of what Copilot does. The remaining 10% are tasks where you want a frontier model, and for those you point Continue at OpenRouter by adding a second model entry. These 15 minutes are the highest-leverage time you will spend on your dev environment this month.

— Mr. Technology

*Tested June 2026 with Continue 1.1.x, Ollama 0.5+, Qwen 2.5 Coder (1.5B / 7B / 32B), on macOS 15+ and Ubuntu 24.04. The two-model split (large for chat, small for autocomplete) is the only non-obvious config. Set "allowAnonymousTelemetry": false in ~/.continue/config.json if privacy matters. Add an OpenRouter model entry under models[] for the 10% of tasks where you need a frontier model: same UX, and you pay only for the calls you make.*