Tutorial | 2026-06-26

Run a Private ChatGPT Clone on Your Laptop in 10 Minutes: Ollama + Open WebUI

Spin up a private ChatGPT clone on your laptop in 10 minutes with Ollama and Open WebUI. Zero data leaves your machine, zero subscription, full OpenAI-compatible API.

You don't need to send your company's roadmap to OpenAI to use a capable model at your desk. You need two things: Ollama to run the model locally, and Open WebUI to give it a ChatGPT-style chat interface. Total install time on a MacBook Pro with an M-series chip: under ten minutes. Total cost: zero. Total data leaving your machine: zero.

Hey guys, Mr. Technology here.

Step 1: Install Ollama

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
```

That's the whole runtime. Ollama bundles llama.cpp, a model registry client, and a local API server in one binary. The install script targets Linux; on macOS, download the app from ollama.com instead of piping curl. Either way, the daemon listens on http://127.0.0.1:11434, and ollama serve starts it if it isn't already running.
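To confirm from code that the daemon is reachable, a minimal sketch along these lines can help. It assumes the default address above and Ollama's `/api/tags` listing endpoint; the helper names `summarize_tags` and `list_local_models` are hypothetical, not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default address used in this tutorial


def summarize_tags(payload: dict) -> list:
    """Turn an /api/tags response into sorted (model name, size in GB) pairs."""
    models = payload.get("models", [])
    return sorted((m["name"], round(m["size"] / 1e9, 1)) for m in models)


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> list:
    """Ask the running daemon which models are installed.

    Raises urllib.error.URLError if nothing is listening on base_url.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return summarize_tags(json.load(resp))
```

Calling `list_local_models()` right after install should return an empty list; a connection error means the daemon is not running yet.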

Step 2: Pull a Real Model

Skip the toy 7B. For coding, summarization, and chat you want a 14B–32B parameter model that fits in unified memory. On a 32 GB Mac, this is the sweet spot:

```bash
ollama pull qwen2.5-coder:14b
ollama pull llama3.1:8b
ollama pull nomic-embed-text
```

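The first one is your coding workhorse. The second is a faster general chat model. The third is for embeddings if you later wire up RAG. Download sizes vary widely: roughly 9 GB for the 14B model, about 5 GB for the 8B, and a few hundred MB for the embedding model. Everything caches to ~/.ollama/models, and re-pulling a tag only fetches layers that changed.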
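For a first taste of what the embedding model is for, here is a rough sketch that compares two texts by cosine similarity. It assumes Ollama's native `/api/embed` endpoint with an `input` list and an `embeddings` field in the response (older releases exposed a differently shaped `/api/embeddings`); the helper names are hypothetical.

```python
import json
import math
import urllib.request


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed(texts: list, model: str = "nomic-embed-text",
          base_url: str = "http://127.0.0.1:11434") -> list:
    """Return one embedding vector per input text from the local daemon."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["embeddings"]
```

With the daemon running, `a, b = embed(["reset my password", "I forgot my login"])` followed by `cosine_similarity(a, b)` gives a score you can compare against unrelated text pairs. A real RAG setup would store the vectors rather than recompute them.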

Quick sanity check:

```bash
ollama run qwen2.5-coder:14b "Write a Python one-liner to flatten a nested list"
```

If you get a sensible answer, your stack is alive.

Step 3: Stand Up Open WebUI

The fastest path is Docker. One container, one volume, no fuss.

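```bash
# --add-host lets the container reach Ollama on a Linux host;
# Docker Desktop already resolves host.docker.internal on its own.
docker run -d \
  --name open-webui \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```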

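Open http://localhost:3000 and create an account (the first signup becomes admin). The interface looks for Ollama on the host at host.docker.internal:11434. Docker Desktop on macOS and Windows resolves that name on its own; on Linux the container needs the --add-host=host.docker.internal:host-gateway flag to reach it. Every model you ollama pull appears in the model dropdown. Streaming, markdown, code highlighting, conversation history, image attachments: it's all there.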

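If Docker isn't your thing, pip install open-webui followed by open-webui serve gives you the same UI and features inside your existing Python environment. Note that it listens on port 8080 by default rather than 3000, and check the project's docs for the Python version it currently supports.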

Step 4: Talk to It From Your Code

Ollama exposes an OpenAI-compatible endpoint, so your existing client code barely changes:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",
    api_key="ollama",  # any string works
)

resp = client.chat.completions.create(
    model="qwen2.5-coder:14b",
    messages=[{"role": "user", "content": "Refactor this function for readability"}],
)
print(resp.choices[0].message.content)
```
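If you want tokens as they arrive instead of one final message, the same endpoint can stream. The sketch below uses only the standard library and assumes the usual OpenAI-style server-sent-event format (`data: {...}` lines ending with `data: [DONE]`); the function names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and anything else
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def stream_chat(prompt: str, model: str = "qwen2.5-coder:14b",
                base_url: str = "http://127.0.0.1:11434/v1") -> Iterator[str]:
    """Send one user message and yield the reply piece by piece."""
    body = json.dumps({
        "model": model,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer ollama"},  # placeholder key
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        yield from iter_stream_text(line.decode("utf-8") for line in resp)
```

A loop such as `for piece in stream_chat("Explain list comprehensions"): print(piece, end="", flush=True)` prints the answer as it is generated. The official client offers the same behavior through `stream=True`; the parser is split out here mainly so it can be checked without a running server.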

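That same base URL works in Continue, Cline, and most other tools that let you point the OpenAI endpoint somewhere else. Cursor is a partial exception: it may route requests through its own backend, in which case a localhost URL is not reachable without a tunnel. One local daemon, nearly every client.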

The Take

You went from a blank laptop to a private ChatGPT clone with persistent chat history, multi-model support, and an OpenAI-compatible API in ten minutes. No cloud accounts, no rate limits, no $20/month subscription; the only login is the local one Open WebUI asks you to create. The only thing missing is scale, and that's a problem you solve with vLLM and a GPU box when you actually need one.

Mr. Technology


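*Ollama: MIT-licensed local model runner, macOS/Linux/Windows. Open WebUI: self-hosted; recent releases ship under the project's own license with a branding clause rather than plain MIT, so check the repository for current terms. Models: qwen2.5-coder (Apache 2.0), llama3.1 (Llama 3.1 Community License), nomic-embed-text (Apache 2.0). RAM budget: 8B models need ~8 GB, 14B need ~12 GB, 32B need ~24 GB. If you would rather the container stay down after a manual docker stop, swap --restart always for --restart unless-stopped.*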
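The RAM budgets quoted above (8B ~8 GB, 14B ~12 GB, 32B ~24 GB) can be turned into a quick lookup. This is only a sketch: the 4 GB headroom for the OS and other apps is an assumption, and the function name is hypothetical.

```python
from typing import Optional

# Rough budgets quoted in this article, in GB of RAM per model size class.
RAM_BUDGET_GB = {"8B": 8, "14B": 12, "32B": 24}


def largest_model_class(total_ram_gb: float, headroom_gb: float = 4.0) -> Optional[str]:
    """Return the biggest size class that fits after reserving headroom.

    headroom_gb is an assumed allowance for the OS and other apps.
    Returns None when even the smallest class does not fit.
    """
    usable = total_ram_gb - headroom_gb
    fitting = [(need, name) for name, need in RAM_BUDGET_GB.items() if need <= usable]
    return max(fitting)[1] if fitting else None
```

For example, `largest_model_class(32)` returns `"32B"` and `largest_model_class(16)` returns `"14B"`. Long context windows raise real memory use, so treat the result as an upper bound.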
