Hindsight vs Letta (MemGPT): Agent Memory Compared (2026)

If you're building AI agents that need to remember things across sessions, you've probably narrowed your search to a handful of frameworks. Two that show up in nearly every evaluation: Hindsight and Letta (formerly MemGPT).

They both solve the agent memory problem — but they solve it in fundamentally different ways. Hindsight is a memory layer you plug into any agent stack. Letta is a full agent runtime that happens to have excellent memory built in. That distinction drives every other difference between the two.

This guide breaks down the architectures, tradeoffs, and use cases so you can pick the right one without spending a week on each.


Hindsight vs Letta: Quick Comparison

| | Hindsight | Letta (MemGPT) |
|---|---|---|
| What it is | Standalone memory layer | Full agent runtime |
| License | MIT | Apache 2.0 |
| GitHub stars | ~4K (growing fast) | ~21K |
| Memory approach | Passive extraction + multi-strategy retrieval | Agent self-edits its own memory blocks |
| Memory tiers | Unified store with 4 parallel retrieval strategies | Core (RAM) · Recall (disk cache) · Archival (cold) |
| Retrieval | Semantic + keyword + graph + temporal, cross-encoder reranking | Agentic tool calls against memory tiers |
| Benchmark | 91.4% on LongMemEval | Not published |
| SDKs | Python, TypeScript, Go | Python |
| Integration model | Library — drop into any agent framework | Platform — agents run inside Letta |
| Setup time | Minutes (one Docker command) | Hours (runtime + ADE + configuration) |
| Managed cloud | Yes | Yes ($20–200/mo) |
| Self-hosted | Yes (free, MIT) | Yes (free, Apache 2.0) |

Agent Memory Architecture: Library vs. Platform

This is the core difference, and everything else flows from it.

Letta: An Operating System for Agents

Letta started as the MemGPT research project — a paper that proposed treating LLM context like an operating system manages virtual memory. The agent gets a fixed context window (analogous to RAM), and Letta provides system calls for paging information in and out of longer-term storage.

That OS metaphor isn't just branding. Letta is a full agent runtime. Your agents don't just use Letta for memory — they run inside Letta. The framework handles the agent loop, tool execution, memory management, and state persistence. It provides an Agent Development Environment (ADE) for visual debugging, monitoring, and inspecting memory state.

This is powerful if you want an integrated platform. You get agent orchestration, memory, tool management, and observability in one package. But it means adopting Letta as your agent framework, not just your memory system. If you're already invested in LangGraph, CrewAI, or your own agent loop, Letta doesn't slot in as a memory layer — it replaces your stack.

Hindsight: A Memory Layer, Nothing More

Hindsight takes the opposite approach. It's a standalone memory service that any agent can call. Your agent framework handles the loop, the tools, the orchestration. Hindsight handles remembering.

The architecture is built around four parallel retrieval strategies — semantic search, keyword search, knowledge graph traversal, and temporal filtering — that execute simultaneously on every query. As the survey paper "Memory in the Age of AI Agents" documents, multi-strategy retrieval is a critical capability for modern agent memory systems. Results are fused using reciprocal rank fusion and then scored by a cross-encoder reranker. This multi-strategy approach is why Hindsight scores 91.4% on the LongMemEval benchmark — different types of memory queries (temporal, relational, factual) need different retrieval strategies, and no single strategy handles all of them well.
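To make the fusion step concrete, here is a minimal sketch of reciprocal rank fusion over the outputs of several retrieval strategies. The document IDs and ranked lists are invented for illustration, and this is a textbook RRF implementation, not Hindsight's actual code:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse multiple ranked result lists into one ordering.

    Each list contains document IDs, best first. A document's fused
    score is the sum of 1 / (k + rank) over every list it appears in,
    so items ranked well by several strategies rise to the top.
    k=60 is the constant from the original RRF paper.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of the four strategies for a single query.
semantic = ["doc_a", "doc_b", "doc_c"]
keyword  = ["doc_b", "doc_a", "doc_d"]
graph    = ["doc_c", "doc_b"]
temporal = ["doc_d", "doc_b"]

fused = reciprocal_rank_fusion([semantic, keyword, graph, temporal])
# doc_b appears in all four lists, so it ranks first.
```

In a full pipeline, the fused list would then be passed to a cross-encoder reranker for final scoring, as the retrieval row in the comparison table describes.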

Because it's a library and not a runtime, Hindsight works with whatever you're already using. Python, TypeScript, Go — pick an SDK, point it at the Hindsight server, and your agent has persistent memory. The agent framework doesn't matter. The LLM provider doesn't matter. It's also MCP-native, so any MCP-compatible client can use it as a memory tool out of the box.


Agent Memory Management: Who Edits Memory?

This is where the philosophical difference between the two systems becomes concrete.

Letta: The Agent Manages Its Own Memory

In Letta, the agent is responsible for deciding what to remember. Memory is organized into three tiers:

  • Core Memory — a small block that stays in the context window at all times, like working RAM. Contains the agent's persona, user info, and critical context. The agent can read and write to it directly.
  • Recall Memory — conversation history stored outside the context window, like a disk cache. The agent can search it when it needs to look back at what was said.
  • Archival Memory — long-term storage for large amounts of information the agent might need eventually, like cold storage. The agent inserts and queries it via tool calls.

The key design choice: the agent self-edits. When the agent decides something is important, it calls a function to write it to core, recall, or archival memory. When it needs to recall something, it calls a function to search. This is elegant because the agent uses its own reasoning to decide what matters — there's no separate extraction pipeline making those decisions.
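The self-editing loop can be sketched as a set of memory tools the agent calls. The class below is a toy illustration of the three-tier split, with a naive substring search standing in for real retrieval; the method names mirror the pattern described above but should not be read as Letta's actual API:

```python
# Illustrative sketch of agent-editable memory tiers (not Letta's
# actual implementation). Each method corresponds to a tool the
# agent could call from inside its reasoning loop.

class AgentMemory:
    def __init__(self):
        self.core = {}        # always in the context window
        self.recall = []      # conversation history, searchable
        self.archival = []    # long-term cold storage

    def core_memory_replace(self, key, value):
        """Agent rewrites a block of its always-in-context memory."""
        self.core[key] = value

    def archival_memory_insert(self, text):
        """Agent files something away for eventual retrieval."""
        self.archival.append(text)

    def archival_memory_search(self, query):
        """Agent searches cold storage (naive substring match here)."""
        return [t for t in self.archival if query.lower() in t.lower()]

# The agent decides, mid-conversation, that a fact is worth keeping:
memory = AgentMemory()
memory.core_memory_replace("user", "Name: Ada. Prefers concise answers.")
memory.archival_memory_insert("Ada's project deadline is March 14.")
results = memory.archival_memory_search("deadline")
```

The important point is that every one of these calls is initiated by the model itself, which is exactly where the token cost and judgment dependence discussed below come from.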

The tradeoff is that memory quality depends entirely on the model's judgment. If the model decides something isn't worth remembering, it's gone. If it writes a poorly structured memory entry, retrieval suffers. And every memory operation costs inference tokens, since the agent has to reason about what to store and how to store it.

Hindsight: The System Extracts Memory Passively

Hindsight doesn't ask the agent to manage memory. Instead, the agent sends interaction data to Hindsight, and the system handles extraction automatically:

  1. Fact extraction — pulls discrete facts, preferences, and events from raw interactions
  2. Entity resolution — links extracted facts to known entities, deduplicating and merging references to the same person, project, or concept
  3. Reflection — periodically synthesizes accumulated facts into higher-order observations (mental models, patterns, evolving assessments)

The agent doesn't need to decide what to remember or how to structure it. It just sends its interactions and queries memory when it needs context. This keeps the agent code simple and avoids spending inference tokens on memory management decisions.

The tradeoff is less fine-grained control. With Letta, the agent can be very deliberate about what it stores and how. With Hindsight, the extraction pipeline makes those decisions. In practice, passive extraction tends to be more consistent — it catches things the agent might not think to save — but it can't capture agent-specific reasoning about why something matters.


Developer Experience: Agent Memory Integration

Getting Started with Letta

Letta requires more upfront investment. You're adopting a runtime, so you need to:

  1. Install and run the Letta server
  2. Define your agent's persona, memory blocks, and tool configuration
  3. Learn Letta's agent loop model and how agents interact with memory tiers
  4. Optionally set up the ADE for debugging

The payoff is a fully integrated development environment. The ADE lets you visually inspect memory state, watch agents reason about what to store, and debug retrieval in real time. If you're building agents from scratch and want an opinionated, batteries-included platform, the learning curve is worth it. Expect to spend a few hours getting comfortable. Letta is also model-agnostic, so you can bring whatever LLM you prefer.

Getting Started with Hindsight

Hindsight is designed to get out of the way:

```shell
docker run -p 8723:8723 vectorize/hindsight
```

That's the self-hosted setup. From there, you install an SDK (Python, TypeScript, or Go), initialize a client, and start adding memories. Typical integration is under thirty minutes, including reading the docs.
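The integration pattern looks roughly like the sketch below. The client class, method name, and endpoint path here are illustrative assumptions, not the actual Hindsight SDK surface; consult the official docs for the real names:

```python
# Hypothetical integration sketch. MemoryClient, remember(), and the
# /memories endpoint are illustrative assumptions, NOT the real
# Hindsight SDK -- check the official docs for actual names.
import json
import urllib.request

class MemoryClient:
    def __init__(self, base_url="http://localhost:8723", agent_id="my-agent"):
        self.base_url = base_url
        self.agent_id = agent_id

    def _payload(self, text):
        # Shape the interaction data the server would ingest.
        return {"agent_id": self.agent_id, "content": text}

    def remember(self, text):
        # Send one interaction to the memory service
        # (requires a running server).
        req = urllib.request.Request(
            f"{self.base_url}/memories",
            data=json.dumps(self._payload(text)).encode(),
            headers={"Content-Type": "application/json"},
        )
        return urllib.request.urlopen(req)

client = MemoryClient()
payload = client._payload("User prefers weekly summaries.")
```

Whatever the real method names turn out to be, the shape is the same: your agent loop stays untouched, and memory becomes a couple of calls against a local or hosted service.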

There's less to learn because there's less surface area. Hindsight does one thing — memory — and exposes a small API for it. The tradeoff is that you don't get Letta's built-in agent orchestration, visual debugging, or tool management. You bring those yourself.


Pricing: Hindsight vs Letta

Letta

  • Self-hosted: Free (Apache 2.0)
  • Managed cloud: $20/mo (starter) to $200/mo (teams), with enterprise tiers above that

The managed cloud handles hosting, scaling, and persistence. Self-hosting is free but you're responsible for infrastructure and updates.

Hindsight

  • Self-hosted: Free (MIT license)
  • Managed cloud: Available through Vectorize.io with usage-based pricing

Both offer genuinely usable self-hosted options. Neither gates core features behind a paid tier.


When to Choose Letta for Agent Memory

Letta is the right choice when:

  • You're building agents from scratch and want a complete runtime, not just memory. Letta gives you the agent loop, tool execution, state management, and memory in one package.
  • You want agents that reason about their own memory. Letta's self-editing model is genuinely innovative — the agent decides what's important and structures its own memory. For use cases where that deliberation matters (complex personal assistants, long-running autonomous agents), this is a meaningful advantage.
  • You value visual debugging. The ADE is excellent for understanding what's happening inside your agent's memory. If you're doing research or iterating on agent behavior, the observability tooling is a differentiator.
  • You want an opinionated platform. If the "pick your own everything" approach sounds exhausting, Letta makes decisions for you and provides a cohesive experience.

When to Choose Hindsight for Agent Memory

Hindsight is the right choice when:

  • You already have an agent framework and just need memory. If you're running LangGraph, CrewAI, AutoGen, or a custom agent loop, Hindsight drops in without replacing anything.
  • You need high retrieval accuracy across diverse query types. The four parallel retrieval strategies with cross-encoder reranking handle temporal, relational, and factual queries consistently. The 91.4% LongMemEval score reflects this.
  • You want fast integration. One Docker command to self-host. SDKs in three languages. MCP-native. Most teams integrate in under an hour.
  • You're building multi-agent systems where different agents share the same memory. Hindsight is framework-agnostic, so heterogeneous agent stacks can all read and write to the same memory service.
  • You want passive memory extraction. Your agents focus on their tasks. Hindsight handles the remembering.

Verdict: Hindsight vs Letta for Agent Memory

Letta and Hindsight are solving different problems with different philosophies.

Choose Letta if you want a full agent platform with memory deeply integrated into the runtime. Its OS-inspired architecture and agent self-editing model are genuinely novel, and the ADE provides observability that standalone memory systems can't match. You're trading flexibility for a cohesive, opinionated experience.

Choose Hindsight if you want a memory layer that works with whatever agent stack you already have. Its multi-strategy retrieval and passive extraction pipeline deliver strong accuracy without requiring your agents to manage their own memory. You're trading the integrated platform experience for flexibility and faster integration.

The honest answer: most teams evaluating both already have an agent framework. If that's you, Hindsight is the simpler path — it adds memory to your stack without replacing it. If you're starting fresh and want the runtime to handle everything, Letta is worth the deeper investment.

As IBM's research on AI agent memory explains, the ability for agents to learn from experience is becoming a core architectural requirement. Whether that learning happens through agent self-editing (Letta's approach) or passive extraction (Hindsight's approach), the right choice depends on how much control your agents need over their own memory.

Further reading: