
Running large language models locally gives you privacy, speed, and cost control. No API calls, no data leaving your machine, no per-token fees. Ollama makes this shockingly easy.
This guide gets you from zero to running a local model in under 10 minutes.
macOS / Linux: ``bash curl -fsSL https://ollama.com/install.sh | sh
Windows: Download the installer at ollama.com/download.
Verify it worked: ``bash ollama --version
Ollama's model library is available at ollama.com/library. Popular starting points:
Pull one with: ``bash ollama pull llama3.2
First pull downloads the model weights (several GB depending on size). Subsequent runs are near-instant.
bash ollama run llama3.2
You're in an interactive REPL. Type prompts, hit Enter, get responses. Exit with /bye or Ctrl+C.
Ollama runs a local REST API on port 11434 by default. This makes it trivial to integrate into any project.
```python import requests
response = requests.post("http://localhost:11434/api/generate", json={ "model": "llama3.2", "prompt": "Explain async/await in Python", "stream": False })
print(response.json()["response"]) ```
For a streaming response: ```python import requests
stream = requests.post("http://localhost:11434/api/generate", json={ "model": "llama3.2", "prompt": "Write a Python decorator", }, stream=True)
for line in stream.iter_lines(): if line: print(line.decode(), end="", flush=True) ```
You can have several models installed simultaneously. Switch between them by name: ``bash ollama run codellama # Switch to a code-specialized model
List what's installed: ``bash ollama list
LLMs need RAM. As a rough guide:
MacBooks with M-series chips run these efficiently. On x86 Linux, an Nvidia GPU significantly speeds things up but isn't required.
Here's how I use this daily: I keep Ollama running in the background, and my editor shortcuts send selected code to the local API for explanation or refactoring suggestions. No context windows, no API costs, instant responses.
```bash
echo "def fibonacci(n):" | ollama run llama3.2 ```
Once you're comfortable with the basics, explore:
The barrier to running capable LLMs locally has never been lower. Give it 20 minutes and you'll wonder why you waited.