← Back to Payloads
2026-06-01

How to Set Up a Local LLM with Ollama for Development

Run capable LLMs on your own hardware in under 10 minutes. Privacy, speed, and zero per-token costs — here's how to set up Ollama for development.
Quick Access
Install command
$ mrt install how-to-set-up-a-local-llm-with-ollama-for-development
Browse related skills
How to Set Up a Local LLM with Ollama for Development

Why Run LLMs Locally?

Running large language models locally gives you privacy, speed, and cost control. No API calls, no data leaving your machine, no per-token fees. Ollama makes this shockingly easy.

This guide gets you from zero to running a local model in under 10 minutes.

Step 1: Install Ollama

macOS / Linux: ``bash curl -fsSL https://ollama.com/install.sh | sh

Windows: Download the installer at ollama.com/download.

Verify it worked: ``bash ollama --version

Step 2: Pull Your First Model

Ollama's model library is available at ollama.com/library. Popular starting points:

  • llama3.2 — General purpose, good balance of speed and capability
  • codellama — Fine-tuned for code generation and explanation
  • mistral — Compact but surprisingly capable

Pull one with: ``bash ollama pull llama3.2

First pull downloads the model weights (several GB depending on size). Subsequent runs are near-instant.

Step 3: Run the Model

bash ollama run llama3.2

You're in an interactive REPL. Type prompts, hit Enter, get responses. Exit with /bye or Ctrl+C.

Step 4: Use the API from Code

Ollama runs a local REST API on port 11434 by default. This makes it trivial to integrate into any project.

```python import requests

response = requests.post("http://localhost:11434/api/generate", json={ "model": "llama3.2", "prompt": "Explain async/await in Python", "stream": False })

print(response.json()["response"]) ```

For a streaming response: ```python import requests

stream = requests.post("http://localhost:11434/api/generate", json={ "model": "llama3.2", "prompt": "Write a Python decorator", }, stream=True)

for line in stream.iter_lines(): if line: print(line.decode(), end="", flush=True) ```

Step 5: Keep Multiple Models

You can have several models installed simultaneously. Switch between them by name: ``bash ollama run codellama # Switch to a code-specialized model

List what's installed: ``bash ollama list

Hardware Considerations

LLMs need RAM. As a rough guide:

  • 7B parameter models — ~4-6 GB RAM minimum
  • 13B parameter models — ~8-12 GB RAM minimum
  • Quantized models (q4_0, q5_1) use less memory at slight quality cost

MacBooks with M-series chips run these efficiently. On x86 Linux, an Nvidia GPU significantly speeds things up but isn't required.

A Real Workflow Example

Here's how I use this daily: I keep Ollama running in the background, and my editor shortcuts send selected code to the local API for explanation or refactoring suggestions. No context windows, no API costs, instant responses.

```bash

Pipe code directly to the model

echo "def fibonacci(n):" | ollama run llama3.2 ```

What's Next

Once you're comfortable with the basics, explore:

  • Custom system prompts via environment variables
  • Modelfile configurations for fine-tuned control
  • Ollama's OpenAI-compatible server mode for drop-in API replacement

The barrier to running capable LLMs locally has never been lower. Give it 20 minutes and you'll wonder why you waited.

Related Dispatches