Letta: The Open-Source Agent Framework That Finally Treats the LLM Like an Operating System

Most agent memory is retrieval-augmented guessing. You store facts in a vector database, shove the top-k results into the context window on every turn, and hope the model notices. The retrieval score becomes a lottery. The relevance threshold becomes a knob you tweak. The LLM, the most expensive component in the entire stack, is treated as a passive reader of whatever you decided was important.

Letta, the open-source descendant of the MemGPT paper, takes a different bet: give the model explicit memory-management tool calls and let it page its own context the way a kernel pages RAM. That sounds like a small architectural choice. It is not. It is the only design I've seen in open-source agent infrastructure that takes the LLM seriously as a system, not a function call.

The OS Analogy Is The Whole Point

The original MemGPT paper, from Charles Packer and the Berkeley group in 2023, framed the problem precisely: a fixed context window is a hard memory limit, and bolting external retrieval onto a stateless model is a hack. The fix was to give the model a tiered memory hierarchy it could manage itself:

Core memory — in-context blocks the model can read, write, and rewrite via tool calls. The LLM's "RAM." In Letta, these are explicitly named blocks (persona, human, project_context), each with a character limit and description. They live in the prompt on every turn.
Recall memory — the full conversation history, persisted to disk, searchable but not in the active context. The LLM's pagefile.
Archival memory — processed, indexed knowledge in a vector or graph backend. The LLM's long-term storage.

The model decides when to read from recall, when to write to core, when to evict to archival. It is doing its own memory management. The framework just gives it the syscalls.
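
To make the hierarchy concrete, here is a minimal Python sketch of the three tiers and a few "syscalls" over them. It is not Letta's implementation. The class and method names (Block, TieredMemory, core_append, evict, search) are hypothetical, and a case-insensitive substring match stands in for real vector or full-text search. The sketch shows the shape of the design: only core blocks reach the prompt, they have hard limits, and making room means explicitly paging something out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    """Core-memory block: rendered into every prompt, capped at `limit` characters."""
    label: str
    value: str
    limit: int


@dataclass
class TieredMemory:
    core: dict[str, Block] = field(default_factory=dict)  # "RAM": always in context
    recall: list[str] = field(default_factory=list)       # "pagefile": every message, searched on demand
    archival: list[str] = field(default_factory=list)     # "disk": long-term knowledge

    def log_message(self, msg: str) -> None:
        # Every message is persisted, even after it scrolls out of the context window.
        self.recall.append(msg)

    def core_append(self, label: str, text: str) -> str:
        block = self.core[label]
        candidate = f"{block.value}\n{text}" if block.value else text
        if len(candidate) > block.limit:
            # Refuse instead of truncating: the model has to evict something first.
            return f"error: '{label}' would exceed {block.limit} chars"
        block.value = candidate
        return "ok"

    def evict(self, label: str, line: str) -> str:
        # Page one line out of core memory and into archival storage.
        block = self.core[label]
        lines = block.value.split("\n")
        if line not in lines:
            return f"error: line not found in '{label}'"
        lines.remove(line)
        block.value = "\n".join(lines)
        self.archival.append(line)
        return "ok"

    @staticmethod
    def search(store: list[str], query: str) -> list[str]:
        # Case-insensitive substring match, standing in for vector or full-text search.
        q = query.lower()
        return [item for item in store if q in item.lower()]

    def render_core(self) -> str:
        # Only core memory is injected into the context window each turn.
        return "\n".join(f"<{b.label}>\n{b.value}\n</{b.label}>" for b in self.core.values())
```

In this sketch, refusing an over-limit write rather than silently truncating is what forces the model to decide what to evict, and that decision is the memory management.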

How It Actually Works

The API is deceptively simple. Create an agent with named memory blocks, then send messages:

```python
import os

from letta_client import Letta

client = Letta(token=os.environ["LETTA_API_KEY"])  # API key read from the environment

# Named core-memory blocks are rendered into the agent's prompt on every turn.
agent = client.agents.create(
    model="openai/gpt-5.2",
    memory_blocks=[
        {"label": "human", "value": "Name: Aashi. Senior backend engineer. Prefers Postgres."},
        {"label": "persona", "value": "I am a precise, no-fluff engineering assistant."},
    ],
    tools=["web_search", "memory_insert", "memory_replace"],
)

response = client.agents.messages.create(agent.id, input="What database should I use?")
for message in response.messages:
    print(message)
```

Behind that API, every turn is a structured loop. The model is prompted with the current state of every memory block, the message buffer, and a toolset of memory syscalls. That toolset includes core-memory editors (core_memory_append and core_memory_replace in the classic MemGPT set, or memory_insert and memory_replace as attached in the example above), plus archival_memory_insert, archival_memory_search, and conversation_search. The framework parses the tool calls, mutates persisted state, and re-injects updated blocks into the next prompt. The model does the work; the framework is the persistence layer.
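
The loop described above can be sketched in a few dozen lines of Python. This is a hypothetical illustration, not Letta's internals. The tool names mirror the ones attached in the earlier example, but their signatures, the prompt layout, and the scripted stand-in model are assumptions made so the sketch runs without an LLM.

```python
from __future__ import annotations

from typing import Callable

Blocks = dict[str, dict]  # hypothetical state: label -> {"value": str, "limit": int}
Reply = dict              # {"tool": name, "args": {...}} or {"content": text}


def render_prompt(blocks: Blocks, buffer: list[str]) -> str:
    """Rebuild the prompt from persisted state; runs before every model step."""
    core = "\n".join(f"[{label}] {b['value']}" for label, b in blocks.items())
    return "CORE MEMORY:\n" + core + "\n\nMESSAGES:\n" + "\n".join(buffer)


def memory_insert(blocks: Blocks, label: str, text: str) -> str:
    block = blocks[label]
    candidate = f"{block['value']} {text}".strip()
    if len(candidate) > block["limit"]:
        return "error: over limit"
    block["value"] = candidate
    return "ok"


def memory_replace(blocks: Blocks, label: str, old: str, new: str) -> str:
    block = blocks[label]
    if old not in block["value"]:
        return "error: text not found"
    candidate = block["value"].replace(old, new, 1)
    if len(candidate) > block["limit"]:
        return "error: over limit"
    block["value"] = candidate
    return "ok"


TOOLS: dict[str, Callable[..., str]] = {
    "memory_insert": memory_insert,
    "memory_replace": memory_replace,
}


def run_turn(model: Callable[[str], Reply], blocks: Blocks, buffer: list[str],
             user_msg: str, max_steps: int = 5) -> str | None:
    buffer.append(f"user: {user_msg}")
    for _ in range(max_steps):
        reply = model(render_prompt(blocks, buffer))
        if "tool" in reply:
            # Framework side: execute the call, persist the mutation, record the result.
            result = TOOLS[reply["tool"]](blocks, **reply["args"])
            buffer.append(f"tool {reply['tool']}: {result}")
            continue  # the next step re-renders the prompt with the updated blocks
        buffer.append(f"assistant: {reply['content']}")
        return reply["content"]
    return None  # step budget exhausted without a final answer


def scripted_model(replies: list[Reply], seen: list[str]) -> Callable[[str], Reply]:
    """Stand-in for an LLM: records each prompt and returns canned replies in order."""
    it = iter(replies)

    def model(prompt: str) -> Reply:
        seen.append(prompt)
        return next(it)

    return model
```

The detail that matters is that render_prompt runs before every model step, so a memory write made in step one is already visible to the model in step two of the same turn.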

The addition that actually changed how I think about this is sleep-time compute. Instead of forcing the model to consolidate memory during a live conversation (which adds latency while the user waits for the LLM to tidy its notes), Letta spins up a background agent that processes, summarizes, and rewrites memory blocks while the user is idle. MemGPT's original design bundled everything into one agent; sleep-time compute separates the concerns.
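
A rough sketch of that split, under heavy assumptions: a primary path that only records raw notes and replies at once, and a background worker that consolidates them. The names are hypothetical. A simple dedupe stands in for the LLM rewrite a real sleep-time agent would perform, and a Python thread stands in for the second agent.

```python
from __future__ import annotations

import queue
import threading


class SleepTimeMemory:
    """Hypothetical split: the primary path records raw notes and replies at once;
    a background worker consolidates them into the block the next turn will read."""

    def __init__(self) -> None:
        self.raw_notes: list[str] = []
        self.core_block = ""  # what the primary agent would see on its next turn
        self._lock = threading.Lock()
        self._jobs: queue.Queue[None] = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def primary_turn(self, user_msg: str) -> str:
        with self._lock:
            self.raw_notes.append(user_msg)
        self._jobs.put(None)  # schedule consolidation without waiting for it
        return f"ack: {user_msg}"  # the user gets a reply immediately

    def _worker(self) -> None:
        while True:
            self._jobs.get()
            with self._lock:
                notes = list(self.raw_notes)
            consolidated = self._consolidate(notes)  # the slow step, off the critical path
            with self._lock:
                self.core_block = consolidated
            self._jobs.task_done()

    @staticmethod
    def _consolidate(notes: list[str]) -> str:
        # Stand-in for an LLM rewrite: drop duplicate notes, keep first-seen order.
        return "\n".join(dict.fromkeys(n.strip() for n in notes))

    def wait_idle(self) -> None:
        """Block until every scheduled consolidation has finished (useful in tests)."""
        self._jobs.join()
```

The user-facing call never waits on _consolidate. In this sketch the cost of tidying memory simply moves off the critical path, which is the trade the sleep-time pattern makes.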

Why It Matters In 2026

Letta Code, the terminal coding agent shipped in December 2025 and the #1 model-agnostic open-source agent on Terminal-Bench as of mid-2026, is the proof of concept. A coding agent that retains your repo conventions, your naming preferences, and the half-finished refactor from last Tuesday across sessions is qualitatively different from one that starts cold every morning.

The Letta API decouples agent identity, memory, and state from the underlying model. Swap GPT-5.2 for Opus 4.5, or for a local Qwen3.5, and the agent keeps its memory and history. I haven't seen another major framework do this as cleanly.
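
The decoupling is easiest to see as a data-model choice. In this hypothetical sketch, which is not Letta's schema, the model is a single field on a serializable agent record. Swapping it is then a one-field update that leaves identity, memory blocks, and history untouched. The second model handle is a placeholder.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class AgentState:
    """Hypothetical agent record: everything that makes the agent 'itself'."""
    agent_id: str
    model: str  # a pointer to an inference backend; nothing else depends on it
    blocks: dict[str, str] = field(default_factory=dict)
    history: list[str] = field(default_factory=list)


def swap_model(state: AgentState, new_model: str) -> AgentState:
    # Identity, memory blocks, and history carry over unchanged.
    return replace(state, model=new_model)


def save(state: AgentState) -> str:
    return json.dumps(asdict(state))


def load(raw: str) -> AgentState:
    return AgentState(**json.loads(raw))


state = AgentState("agent-1", "openai/gpt-5.2", {"human": "Prefers Postgres."}, ["user: hi"])
restored = load(save(swap_model(state, "local/qwen3.5")))  # hypothetical model handle
```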

The Honest Limitations

The MemGPT approach has real costs. The model is doing memory work that competes with the actual task. Without sleep-time compute, you pay a latency tax on every turn as the model decides what to evict, recall, or archive. Memory quality is bounded by the model's tool-use capability: Letta's own leaderboard ranks GPT-5.2 and Opus 4.5 at the top and notes that smaller open models degrade the experience. And this is not a drop-in replacement for RAG. If you have a static document corpus, you want a vector database with hybrid search. Letta's archival memory is for knowledge the agent has produced or consumed, not arbitrary retrieval.

The deeper limitation: memory management is still a context engineering problem, and the LLM is doing it heuristically. There is no guarantee the model writes the right fact to the right block, or recalls the right fact at the right time. The benchmarks (LoCoMo, LongMemEval, BEAM) measure recall, not judgment. Production systems still need to instrument for memory quality.
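
One way to start instrumenting, sketched under assumptions: wrap block writes so every mutation is logged with its before and after values. Then check a hand-labeled set of expected facts against the block each one should live in. AuditedBlocks, MemoryWrite, and missing_facts are hypothetical names. Substring matching is a deliberately crude proxy for "the right fact landed in the right block."

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class MemoryWrite:
    ts: float
    label: str
    before: str
    after: str


@dataclass
class AuditedBlocks:
    """Hypothetical wrapper: records every block mutation for offline review."""
    blocks: dict[str, str]
    log: list[MemoryWrite] = field(default_factory=list)

    def write(self, label: str, new_value: str) -> None:
        before = self.blocks.get(label, "")
        self.blocks[label] = new_value
        self.log.append(MemoryWrite(time.time(), label, before, new_value))

    def missing_facts(self, expected: dict[str, list[str]]) -> dict[str, list[str]]:
        """Return, per block, the expected facts that are not present in that block."""
        missing: dict[str, list[str]] = {}
        for label, facts in expected.items():
            value = self.blocks.get(label, "").lower()
            absent = [f for f in facts if f.lower() not in value]
            if absent:
                missing[label] = absent
        return missing
```

A check like this catches the kind of failure a pure recall score can miss: the fact was stored, just in the wrong block.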

The Take

Letta is the only open-source agent framework I've seen that treats memory as a first-class system component rather than a feature bolted on top of an LLM call. The OS-inspired hierarchy is the right abstraction. Sleep-time compute is the right production pattern. Model-agnostic persistence is the right bet for a world where the frontier model changes every six months. The framework has rough edges — latency cost, model-dependence, evolving developer experience — but the architectural foundation is sound in a way that most of the agent-framework field is not.

If you are building an agent that needs to remember anything beyond the current conversation, run pip install letta-client and pay attention to the memory block design. That is where the actual work is. The LLM is the easy part.

Repo: github.com/letta-ai/letta — Apache 2.0, 19K+ stars, self-hostable, Python and TypeScript SDKs, Letta Code CLI, sleep-time compute. Original MemGPT paper: Packer et al., 2023. Letta Code launched December 2025; #1 on Terminal-Bench open-source category as of May 2026.