Open Source | 2026-06-08

Mem0's New Memory Algorithm Hits 92.5 on LoCoMo at 7K Tokens

Mem0's v3 algorithm scores 92.5 on LoCoMo and 94.4 on LongMemEval at roughly one-quarter the tokens of full-context approaches, and ships an agent-native signup flow that is the real story. The first open-source agent memory layer in 2026 worth betting production traffic on.

Mem0 just shipped a v3 memory algorithm that scores 92.5 on LoCoMo, 94.4 on LongMemEval, and 64.1 on BEAM (1M) — while averaging 6,800 tokens per retrieval call. Full-context approaches on the same benchmarks burn 25,000+. That is the entire pitch in one line: same accuracy, roughly one-quarter the tokens. It is the first open-source agent memory layer in 2026 I would bet production traffic on. Apache 2.0 license, Python and Node SDKs, repo at mem0ai/mem0.

What changed in the algorithm

The old Mem0 did extraction, update, and delete in a multi-pass loop. It worked. It also forgot things the model thought it had remembered. The new algorithm is single-pass ADD-only. One LLM call extracts facts, period. Memories accumulate; nothing is overwritten. Update and delete were the wrong abstraction — most "memory edits" in practice are new memories with a timestamp, not destructive rewrites.
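
To make the ADD-only idea concrete, here is a minimal sketch of an append-only memory log. It is not Mem0's implementation: the `MemoryRecord` and `AppendOnlyStore` names, the `key` field, and the rule that the newest record for a key wins at read time are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    """One extracted fact. Records are immutable once written."""
    key: str  # hypothetical topic label, e.g. "hosting"
    text: str
    created_at: datetime


@dataclass
class AppendOnlyStore:
    """Illustrative ADD-only log: there is no update() and no delete()."""
    records: list[MemoryRecord] = field(default_factory=list)

    def add(self, key: str, text: str, when: Optional[datetime] = None) -> MemoryRecord:
        record = MemoryRecord(key, text, when or datetime.now(timezone.utc))
        self.records.append(record)
        return record

    def current(self, key: str) -> Optional[MemoryRecord]:
        """Resolve an 'edit' at read time: the newest record for the key wins."""
        matches = [r for r in self.records if r.key == key]
        return max(matches, key=lambda r: r.created_at, default=None)

    def history(self, key: str) -> list[MemoryRecord]:
        """Older facts stay available instead of being overwritten."""
        return sorted((r for r in self.records if r.key == key), key=lambda r: r.created_at)


store = AppendOnlyStore()
store.add("hosting", "App runs on Heroku", datetime(2025, 1, 10, tzinfo=timezone.utc))
store.add("hosting", "App moved to Fly.io", datetime(2026, 3, 2, tzinfo=timezone.utc))

latest = store.current("hosting")
print(latest.text)
print(len(store.history("hosting")))
```

The "edit" here is just a second record with a later timestamp. The earlier fact is still in the log, which is the property the ADD-only design is after.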

The retrieval side got a real upgrade. Three scoring signals run in parallel and fuse: semantic similarity (vector search), BM25 keyword matching, and entity matching. The combined score beats any single signal. The under-rated change sits alongside those three: agent-generated facts are now first-class. When your agent confirms an action, such as "I shipped PR #482", that fact is stored with the same weight as a user-stated fact. The previous version buried agent output, which is the most common reason agent memory felt unreliable in 2025.
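
A rough sketch of what fusing the three signals can look like follows. The article does not say how Mem0 combines them, so the min-max normalization, the weighted sum, the weights, and the toy vectors below are all assumptions; only the choice of signals (vector similarity, BM25, entity match) comes from the description above.

```python
import math
from collections import Counter

# Toy corpus. The vectors stand in for real embeddings and are made up.
MEMORIES = [
    {"text": "User hosts the API on Fly.io", "vec": [0.9, 0.1, 0.0], "entities": {"Fly.io"}},
    {"text": "Agent shipped PR #482", "vec": [0.1, 0.9, 0.1], "entities": {"PR #482"}},
    {"text": "User prefers Postgres", "vec": [0.6, 0.2, 0.5], "entities": {"Postgres"}},
]


def tokenize(text):
    return text.lower().split()


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Plain Okapi BM25 over whitespace tokens."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def normalize(scores):
    """Min-max scale to [0, 1] so the three signals are comparable."""
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]


def fused_search(query, query_vec, query_entities, weights=(0.5, 0.3, 0.2)):
    """Score every memory on all three signals, then combine with a weighted sum."""
    semantic = normalize([cosine(query_vec, m["vec"]) for m in MEMORIES])
    keyword = normalize(bm25_scores(query, [m["text"] for m in MEMORIES]))
    entity = [1.0 if query_entities & m["entities"] else 0.0 for m in MEMORIES]
    fused = [
        weights[0] * s + weights[1] * k + weights[2] * e
        for s, k, e in zip(semantic, keyword, entity)
    ]
    return sorted(zip(fused, (m["text"] for m in MEMORIES)), reverse=True)


ranked = fused_search("where is the api hosted", [1.0, 0.0, 0.0], {"Fly.io"})
for score, text in ranked:
    print(f"{score:.2f}  {text}")
```

The point of the weighted sum is that a memory which is only moderately close in embedding space can still rank first when the keyword and entity signals agree on it.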

The numbers, exactly

All from Mem0's published research paper, run on the same production-representative model stack, single-pass retrieval, no agentic loops:

  • LoCoMo (1,540 questions, 5 categories): 92.5 overall. Old algorithm: 71.4.
  • LongMemEval (500 questions, 6 categories): 94.4 overall. Old algorithm: 67.8. +53.6 on assistant memory recall specifically.
  • BEAM 1M (700 questions, 35 conversations): 64.1.
  • BEAM 10M (200 questions, 10 conversations): 48.6.
  • Mean tokens per retrieval call: ~6,800, vs 25,000+ for full-context.
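
The headline deltas can be checked directly from the figures in the list. This snippet only does arithmetic on the quoted numbers and treats 25,000 as the lower bound for full-context, so the token ratio it prints is an upper bound.

```python
# Figures quoted in the list above.
locomo_new, locomo_old = 92.5, 71.4
longmemeval_new, longmemeval_old = 94.4, 67.8
tokens_mem0, tokens_full_context = 6_800, 25_000  # 25,000 is the quoted lower bound

locomo_gain = round(locomo_new - locomo_old, 1)
longmemeval_gain = round(longmemeval_new - longmemeval_old, 1)
token_ratio = tokens_mem0 / tokens_full_context

print(f"LoCoMo gain: +{locomo_gain} points")
print(f"LongMemEval gain: +{longmemeval_gain} points")
print(f"Token ratio: {token_ratio:.1%} of full-context")
```

That works out to gains of 21.1 and 26.6 points, at no more than about 27% of the full-context token budget, which is where "roughly one-quarter" comes from.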

The evaluation framework is open-sourced, so the numbers are reproducible. That is more than most of the agent-memory space can claim. Letta's filesystem paper, for example, scores 74% on LoCoMo, but the comparison set is much narrower and the reproduction code lives in their own blog post.

What it actually looks like

The default config wires up OpenAI gpt-5-mini for extraction, text-embedding-3-small for embeddings, Qdrant for vectors, and SQLite for history. You can swap any of it.

```python
from mem0 import Memory

# Default configuration: the stack described above.
m = Memory()

# Store a short exchange under a user ID.
messages = [
    {"role": "user", "content": "I'm building a Postgres-backed API on Fly.io."},
    {"role": "assistant", "content": "Got it. I'll keep that in mind."},
]
m.add(messages, user_id="rami")

# Retrieve memories for the same user and print each with its score.
results = m.search("Where is the user's app hosted?", filters={"user_id": "rami"})
for r in results["results"]:
    print(r["memory"], r["score"])
```

That is the entire library API for the common path. For teams, the self-hosted Docker compose brings up the full server with auth, dashboards, and graph store. The cloud tier does the same minus the ops.
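
Swapping a default component might look roughly like the sketch below. The provider and model names are the defaults named earlier; the nested dict layout and `Memory.from_config` follow the open-source SDK as I understand it, so treat both as assumptions and check them against the version you install. `with_component` and the `/tmp/qdrant` path are hypothetical, for illustration only.

```python
# Provider names and models are the defaults named in the article.
# The dict layout and Memory.from_config are assumed to match the installed SDK.
config = {
    "llm": {"provider": "openai", "config": {"model": "gpt-5-mini"}},
    "embedder": {"provider": "openai", "config": {"model": "text-embedding-3-small"}},
    "vector_store": {"provider": "qdrant", "config": {"host": "localhost", "port": 6333}},
}


def build_memory(cfg):
    """Create a Memory instance from a config dict (needs mem0ai installed and an API key)."""
    from mem0 import Memory  # imported lazily so the dict can be inspected without the SDK
    return Memory.from_config(cfg)


def with_component(cfg, section, provider, **settings):
    """Return a copy of cfg with one section replaced, leaving the others untouched."""
    updated = dict(cfg)
    updated[section] = {"provider": provider, "config": settings}
    return updated


# Hypothetical swap: keep the LLM and embedder, point the vector store at a local path.
local_cfg = with_component(config, "vector_store", "qdrant", path="/tmp/qdrant")
print(local_cfg["vector_store"])
```

Calling `build_memory(local_cfg)` would then take the place of `Memory()` in the snippet above.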

Where it fits — and where it does not

Mem0 is a focused memory layer, not an agent framework. That distinction matters. Letta (formerly MemGPT) is a full agent runtime with a memory subsystem; Cognee is a knowledge engine that fuses graph and vector; Mem0 is the bolt-on that any agent stack can adopt. If you already have a LangGraph or CrewAI or hand-rolled agent, you add Mem0 as the persistent memory primitive and you are done. If you need a turnkey agent runtime with planning, tool calling, and memory in one package, Mem0 is the wrong abstraction — pick Letta or LangGraph.
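
The bolt-on pattern can be sketched as a thin wrapper around one agent turn: recall, act, remember. The `add` and `search` shapes mirror the calls shown earlier; `MemoryLayer`, `run_turn`, `FakeMemory`, and `echo_llm` are hypothetical names, and the fake store exists only so the sketch runs without any service or API key.

```python
from typing import Callable, Protocol


class MemoryLayer(Protocol):
    """The two calls the agent needs; mirrors the add/search shape used in this article."""
    def add(self, messages: list[dict], user_id: str): ...
    def search(self, query: str, filters: dict) -> dict: ...


def run_turn(memory: MemoryLayer, llm: Callable[[str], str], user_id: str, user_msg: str) -> str:
    # 1. Recall: pull relevant memories for this user before calling the model.
    hits = memory.search(user_msg, filters={"user_id": user_id})["results"]
    context = "\n".join(f"- {h['memory']}" for h in hits)

    # 2. Act: the existing agent logic stays as it is; memory only enriches the prompt.
    reply = llm(f"Known facts:\n{context}\n\nUser: {user_msg}")

    # 3. Remember: store both sides of the exchange, including the agent's own output.
    memory.add(
        [{"role": "user", "content": user_msg}, {"role": "assistant", "content": reply}],
        user_id=user_id,
    )
    return reply


class FakeMemory:
    """In-process stand-in so the sketch runs without any service (word overlap only)."""
    def __init__(self):
        self.items = []

    def add(self, messages, user_id):
        self.items += [(user_id, m["content"]) for m in messages]

    def search(self, query, filters):
        words = set(query.lower().split())
        hits = [
            {"memory": text, "score": 1.0}
            for uid, text in self.items
            if uid == filters["user_id"] and words & set(text.lower().split())
        ]
        return {"results": hits}


def echo_llm(prompt: str) -> str:
    """Hypothetical model call: reports how many fact lines it was given."""
    return f"facts seen: {prompt.count('- ')}"


mem = FakeMemory()
first = run_turn(mem, echo_llm, "rami", "my app runs on fly.io")
second = run_turn(mem, echo_llm, "rami", "where does my app run")
print(first)
print(second)
```

Because `run_turn` depends only on the two-method protocol, the planner, tool calls, and framework around it do not need to change when the memory layer is added.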

The clever bit, and the part I did not expect, is the onboarding flow. As of mid-2026 Mem0 ships an agent signup path:

```bash
npm install -g @mem0/cli
mem0 init --agent --agent-caller claude-code
mem0 add "I am using mem0"
mem0 search "am I using mem0"
```

An agent mints a working API key in under five seconds — no email, no dashboard, no OTP. The human owner claims the account later. In 2026 the user of an API is increasingly an agent, not a person, and the fact that Mem0's CLI treats that as a first-class case is a real product signal. Most auth flows still assume a human at a keyboard.

Verdict

Three things to be clear about. One: the algorithm is genuinely better. Single-pass ADD-only extraction is the right call, multi-signal retrieval is the right call, and the benchmark uplift is large and, because the evaluation framework is open-sourced, reproducible. Two: this is a memory layer, not an agent framework. If you need a planner, you still need LangGraph or Letta. Three: the agent signup flow is the most under-reported product decision in the agent-memory space in 2026, and other vendors will copy it.

If you are building any agent that needs to remember user-specific facts, prior decisions, or evolving domain state, install mem0ai, swap in m.add() and m.search(), and stop paying 25,000 tokens per retrieval. The v2-to-v3 migration guide is in the docs and the upgrade is one call away.

Mr. Technology
