Zep open-sourced Graphiti and nobody is talking about it. Bi-temporal model, episode-based provenance, MCP server, ~27K stars, 18.5% gains over full-context on LongMemEval with 90% lower latency.

Graphiti Is the Open-Source Temporal Knowledge Graph That Beats Mem0, MemGPT, and Every RAG-on-Chat-History Hack

Every agent I ship has the same disease. The model is fine. The tools are fine. The harness is fine. The memory is a half-broken pile of summarized chat transcripts that turns into mush by the third turn. The answer is the same thing I keep coming back to: stop stuffing chat into a vector store and start using a temporal knowledge graph. The library is Graphiti: the open-source engine behind Zep, Apache 2.0, ~27K stars, an arXiv paper (2501.13956), and a freshly shipped MCP server.

Hey guys, Mr. Technology here.

What Graphiti Actually Is

Graphiti is a Python framework for building and querying temporal context graphs for AI agents. You feed it a stream of unstructured text — chat messages, emails, tool outputs, support tickets — and it extracts entities, relationships, and the time window during which each fact was true. You query with a hybrid retriever that blends semantic embeddings, BM25, and graph traversal.

It runs on Neo4j, FalkorDB (Redis-based, recommended for production), or Amazon Neptune, with a pluggable LLM and embedding provider. The LLM is the extraction worker; the graph is the source of truth. Provider-agnostic. (Graphiti docs)
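
To make "hybrid retriever" less hand-wavy, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge ranked lists from different searchers into a single ranking. The function name, the fact IDs, and the constant k=60 are assumptions for illustration. This is not Graphiti's retriever code, and Graphiti's own fusion and reranking may work differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical result lists from three searchers over the same graph.
semantic = ["fact_lisbon", "fact_berlin", "fact_job"]
bm25 = ["fact_lisbon", "fact_job"]
graph_walk = ["fact_job", "fact_lisbon"]

fused = reciprocal_rank_fusion([semantic, bm25, graph_walk])
print(fused)
```

A fact that shows up near the top of several lists outranks one that only a single searcher found, which is the whole reason to blend them.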

The Bi-Temporal Model Is The Whole Point

Most agent memory systems do the obvious wrong thing: they store a fact, and when it changes, they overwrite it. The next time your agent looks, the old fact is gone. The agent confidently tells the user they live in Berlin when they moved to Lisbon six weeks ago. Welcome to amnesia.

Graphiti's model is bi-temporal: it tracks two timelines for every fact. The first is when the fact was true in the world, carried on each fact edge as valid_at (when the fact became true) and invalid_at (when it stopped being true). When the LLM extracts a new fact that contradicts an existing one, Graphiti does not delete the old fact. It sets invalid_at on the old edge to close its validity window and adds a new edge with a fresh valid_at. The historical record stays intact. This is the model temporal databases have used for thirty years, and almost nobody in the LLM world is doing it.

The second timeline is when the system learned the fact: created_at, recorded at ingestion and tied to the episode, the raw source data the fact was extracted from. That is provenance. When your agent gets something wrong, you trace the exact message that produced the bad fact and invalidate the subgraph in one call.
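
Here is a minimal sketch of the close-the-window-instead-of-overwriting behavior described above, in plain Python with no Graphiti dependency. The class, the function names, and the toy contradiction rule (same subject and predicate, different object) are assumptions for illustration, and expired_at is an extra field added for this sketch. Graphiti decides contradictions with an LLM, not a string comparison.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FactEdge:
    subject: str
    predicate: str
    obj: str
    valid_at: datetime                     # when the fact became true in the world
    created_at: datetime                   # when the system recorded the fact
    invalid_at: Optional[datetime] = None  # when it stopped being true, if known
    expired_at: Optional[datetime] = None  # when the system retired this record


def assert_fact(edges, new_edge):
    """Add new_edge, closing any open edge it supersedes instead of deleting it."""
    for edge in edges:
        contradicts = (
            edge.subject == new_edge.subject
            and edge.predicate == new_edge.predicate
            and edge.obj != new_edge.obj
        )
        if contradicts and edge.invalid_at is None:
            edge.invalid_at = new_edge.valid_at    # close the validity window
            edge.expired_at = new_edge.created_at  # note when we learned it
    edges.append(new_edge)


def facts_as_of(edges, when):
    """Return the edges whose validity window contains `when`."""
    return [
        e for e in edges
        if e.valid_at <= when and (e.invalid_at is None or when < e.invalid_at)
    ]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


edges = []
assert_fact(edges, FactEdge("user", "LIVES_IN", "Berlin",
                            valid_at=utc(2024, 1, 10), created_at=utc(2024, 1, 12)))
assert_fact(edges, FactEdge("user", "LIVES_IN", "Lisbon",
                            valid_at=utc(2025, 5, 1), created_at=utc(2025, 6, 12)))

print([e.obj for e in facts_as_of(edges, utc(2025, 6, 15))])  # ['Lisbon']
print([e.obj for e in facts_as_of(edges, utc(2024, 6, 1))])   # ['Berlin']
print(len(edges))                                             # 2, nothing was deleted
```

The Berlin edge is still in the list, so "where did the user live last summer?" remains answerable after the move.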

Episodes, Not Chunks

Graphiti stores episodes, not chunks. An episode is a unit of source data (one message, one document, one tool call) with a reference timestamp, passed as reference_time when you call add_episode, recording when it occurred. The LLM walks each episode during ingestion, extracts entities and edges, and the episode stays around as the ground-truth record. This enables incremental updates without recomputation: new episodes arrive, the extractor runs only on the new text, existing edges get re-validated, and contradictions close validity windows.
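
A rough sketch of the episode idea, under heavy assumptions: a regex stands in for the LLM extractor, and EpisodeStore, toy_extract, and every field name here are made up for illustration, not Graphiti's API. What it shows is that ingestion touches only the new episode, and that every fact keeps pointers back to the episodes that produced it.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    episode_id: str
    body: str
    timestamp: datetime  # when the source event happened


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    episode_ids: list = field(default_factory=list)  # provenance pointers


def toy_extract(body):
    """Stand-in for the LLM extractor: only understands 'X lives in Y'."""
    return [(m.group(1), "LIVES_IN", m.group(2))
            for m in re.finditer(r"(\w+) lives in (\w+)", body)]


class EpisodeStore:
    def __init__(self):
        self.episodes = {}  # episode_id -> Episode, kept as ground truth
        self.facts = {}     # (subject, predicate, obj) -> Fact

    def ingest(self, episode):
        """Extract from the new episode only; older episodes are not reprocessed."""
        self.episodes[episode.episode_id] = episode
        for triple in toy_extract(episode.body):
            fact = self.facts.setdefault(triple, Fact(*triple))
            fact.episode_ids.append(episode.episode_id)

    def sources_of(self, triple):
        """Trace a fact back to the raw episodes it was extracted from."""
        return [self.episodes[eid] for eid in self.facts[triple].episode_ids]


store = EpisodeStore()
store.ingest(Episode("ep1", "Ana lives in Berlin",
                     datetime(2024, 1, 10, tzinfo=timezone.utc)))
store.ingest(Episode("ep2", "As I said, Ana lives in Berlin",
                     datetime(2024, 2, 3, tzinfo=timezone.utc)))

sources = store.sources_of(("Ana", "LIVES_IN", "Berlin"))
print([e.episode_id for e in sources])  # ['ep1', 'ep2']
```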

The latency win comes from retrieval, and it is the 90% reduction Zep's paper reports against full-context prompting. As an illustration: when your agent has 50,000 tokens of history, stuffing it all into the context window costs ~3 seconds of prefill. Graphiti's retriever returns a tightly scoped subgraph, a few hundred tokens of structured facts, and the model prefills in under 300ms. A tenth of the latency and a hundredth of the tokens, with accuracy that matches or beats full context on the benchmarks below.
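
The arithmetic behind "tenth the latency, hundredth the tokens" is worth checking. This sketch uses the illustrative figures from the paragraph above; the 500-token subgraph size is my assumption for "a few hundred tokens".

```python
def reduction(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


full_context_tokens = 50_000  # from the text
subgraph_tokens = 500         # assumption standing in for "a few hundred tokens"
full_prefill_s = 3.0          # from the text (~3 seconds)
subgraph_prefill_s = 0.3      # from the text (300 ms upper bound)

print(f"token reduction:   {reduction(full_context_tokens, subgraph_tokens):.0%}")
print(f"latency reduction: {reduction(full_prefill_s, subgraph_prefill_s):.0%}")
print(f"token ratio:       {full_context_tokens // subgraph_tokens}x")
```

With these inputs the token reduction is 99% and the latency reduction is 90%. Latency shrinks less than the token count, presumably because some per-request overhead does not scale with prompt length.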

The Benchmark Wins Are Real

The Zep team published the numbers in their paper (arXiv 2501.13956) and they have held up. On DMR (Deep Memory Retrieval), the original MemGPT/Letta eval, Graphiti-backed Zep scores 94.8% versus MemGPT's 93.4% and recursive summarization's 35.3%. (Zep paper)

The more interesting number is on LongMemEval, which tests real enterprise memory like "remember the user's preference across 500 turns of mixed business and casual chat." Full-context gpt-4o scores 60.2%. Zep-on-Graphiti scores 71.2% with gpt-4o, which is the 18.5% relative gain in the headline, and 63.8% with gpt-4o-mini, both with roughly 90% lower latency. The smaller, cheaper model with Graphiti (63.8%) beats the larger, more expensive model that relies on the context window (60.2%).

The MCP Server Is The Trick That Makes It Stick

The biggest product unlock in the last month is that Graphiti ships a first-party MCP server. Drop it into Claude Code, Cursor, or any MCP-aware client:

json

{
  "mcpServers": {
    "graphiti": {
      "command": "uvx",
      "args": ["graphiti-mcp-server"]
    }
  }
}

After that, Claude has add_memory, search_memory_nodes, search_memory_facts, and get_episodes as native tools and uses them automatically when relevant. Add the MCP server, point it at a FalkorDB instance, and your agent has memory that does not break.
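
If you would rather script the setup than hand-edit JSON, a small helper can merge the server entry into an existing config without clobbering other servers. This is a sketch under assumptions: add_mcp_server is a hypothetical helper of mine, and the file path is a placeholder, since each MCP client keeps its config in its own location.

```python
import json
import tempfile
from pathlib import Path

# Entry copied from the config above.
GRAPHITI_ENTRY = {"command": "uvx", "args": ["graphiti-mcp-server"]}


def add_mcp_server(config_path, name, entry):
    """Add or replace one server under "mcpServers", keeping the rest of the file."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file that already lists another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "mcp.json"  # placeholder path, not a real client location
    existing = {"mcpServers": {"other": {"command": "npx", "args": ["other-server"]}}}
    cfg_path.write_text(json.dumps(existing))

    merged = add_mcp_server(cfg_path, "graphiti", GRAPHITI_ENTRY)
    on_disk = json.loads(cfg_path.read_text())

print(sorted(merged["mcpServers"]))  # ['graphiti', 'other']
```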

Where It Fits

Mem0 is a vector store with light entity extraction — fast, simple, and amnesia with extra steps. No temporal model, no validity windows. Fine for a single session, lost across weeks.

Letta (MemGPT) pages context in and out of the LLM like virtual memory. Clever, but opinionated about the agent loop. Graphiti is loop-agnostic and beats it on DMR, 94.8% to 93.4%.

RAG-on-chat-history is the pattern most teams ship. Embed every message, top-k retrieve. It works until the user contradicts a message from three weeks ago — you retrieve both, the model is confused, the user gets a worse answer than if you had no memory. Graphiti's contradiction detection is the explicit fix.
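
The failure mode is easy to reproduce. In this toy sketch a bag-of-words vector stands in for a real embedding model (an assumption, as are the function names and the sample messages), and top-k retrieval happily returns both the stale message and the update.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_k(query, messages, k=2):
    """Plain similarity search: no notion of time or contradiction."""
    q = embed(query)
    ranked = sorted(messages, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]


history = [
    "I live in Berlin and love the winters",  # three weeks ago
    "What is a good pasta recipe",
    "Update: I live in Lisbon now",           # today
]

hits = top_k("where do I live", history)
print(hits)  # both the Berlin message and the Lisbon message come back
```

Nothing in the similarity score says which of the two is still true. That ordering has to come from somewhere else, and a validity window on the fact is one place to put it.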

The Take

RAG-on-chat-history is a placeholder. Vector stores with summary memory are a placeholder. MemGPT-style hierarchical paging is a clever placeholder. The actual fix for agent memory is a graph that knows when facts are true and where they came from — and the open-source implementation of that fix is Graphiti. There is no longer a good reason to ship an amnesiac agent in 2026.

— Mr. Technology

Repo: github.com/getzep/graphiti (Apache 2.0, ~27K stars). Backends: Neo4j, FalkorDB, Neptune. Paper: Zep: A Temporal Knowledge Graph Architecture for Agent Memory (arXiv 2501.13956). MCP server: graphiti-mcp-server on PyPI. Benchmarks: DMR 94.8% (vs MemGPT 93.4%), LongMemEval up to 18.5% gain over full-context with 90% latency reduction. Tested June 2026 with graphiti 0.16.x on FalkorDB 1.4.