Hindsight: Building AI Agents That Actually Learn

Modern AI agents are expected to do more than answer isolated questions. We ask them to manage projects, collaborate with teams, track evolving information, and develop stable viewpoints over time. Yet most agents today still behave like stateless tools—each interaction starts fresh, with little ability to accumulate experience or learn.
This gap isn’t a limitation of language models themselves. It’s a limitation of how we design agent memory.
At Vectorize, this realization led us to build Hindsight: an open-source agent memory system designed not just for recall, but for learning.
The Problem We Hit Building Agents
We first encountered this problem while building AI agents for our own internal workflows, including an AI project manager. These agents needed to remember prior decisions, understand people and entities over time, and refine their judgments as new information arrived.
When we looked at existing agent memory solutions, we were disappointed. Most systems amounted to little more than:
- Semantic search over conversation logs
- Large chunks of history stuffed back into the context window
- Closed, opaque implementations
These approaches helped with short-term recall, but they didn’t allow agents to learn. They blurred facts and inferences, struggled with long time horizons, and produced inconsistent reasoning across sessions.
What Human Memory Gets Right
Human memory isn’t static or deterministic. What we recall depends on:
- Context: the situation we’re in and the task at hand
- Time: recency matters, but so do sequence and causality
- Entities: people, places, and concepts anchor memory
- Beliefs: we form opinions that evolve as evidence accumulates
Crucially, humans distinguish between what we observed and what we believe. We can explain why we hold a belief, revise it when contradicted, and maintain a coherent perspective over long periods of time.
Most agent memory systems ignore these distinctions.
Designing Memory for Learning, Not Storage
With these principles in mind, we set out to build a memory system that treats memory as a first-class substrate for reasoning, not just a retrieval add-on.
That effort became Hindsight.
Along the way, we collaborated with researchers at Virginia Tech and practitioners at The Washington Post, who were facing similar challenges in long-horizon reasoning, explainability, and consistency. These discussions directly shaped both the system design and the research behind it.
The Hindsight Architecture
Hindsight is built around a simple but powerful abstraction: retain, recall, and reflect.
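To make the abstraction concrete, here is a minimal sketch of what that three-operation interface could look like in application code. The names and signatures below are illustrative assumptions for exposition, not Hindsight's actual API:

```python
from typing import Protocol

class AgentMemory(Protocol):
    # Illustrative interface only: these names and signatures are
    # assumptions for exposition, not Hindsight's actual API.
    def retain(self, content: str, session_id: str) -> None:
        """Ingest raw conversation into structured, linked memories."""
        ...

    def recall(self, query: str, token_budget: int) -> list[str]:
        """Return just enough relevant memory to fit the token budget."""
        ...

    def reflect(self) -> None:
        """Revise beliefs and their confidence scores as evidence accrues."""
        ...
```

Everything that follows is a refinement of these three verbs.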

Four Memory Networks
Instead of storing everything in a single undifferentiated store, Hindsight organizes memory into four distinct networks:
- World: objective facts about the external world
- Experience: the agent’s own actions and interactions
- Observation: synthesized, preference-neutral summaries of entities
- Opinion: subjective beliefs with confidence scores that evolve over time
This separation provides epistemic clarity: the agent can distinguish evidence from inference, and developers can inspect why an agent answered the way it did.
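To illustrate that separation, here is a small sketch of how memories tagged by network might be represented. The type names and fields are assumptions for illustration, not Hindsight's internal schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Network(Enum):
    WORLD = "world"              # objective facts about the external world
    EXPERIENCE = "experience"    # the agent's own actions and interactions
    OBSERVATION = "observation"  # preference-neutral entity summaries
    OPINION = "opinion"          # subjective beliefs with confidence scores

@dataclass
class MemoryItem:
    network: Network
    text: str
    confidence: Optional[float] = None  # meaningful only for OPINION items

# Evidence and inference stay distinguishable at the type level:
fact = MemoryItem(Network.WORLD, "The Q3 launch moved to August 12.")
belief = MemoryItem(Network.OPINION,
                    "Dana prefers conservative timelines.",
                    confidence=0.7)
```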
Retain: Turning Conversations into Structured Memory
Hindsight’s retention layer converts raw conversational streams into narrative facts with:
- Temporal ranges
- Canonical entities
- Causal and semantic links
Rather than fragmenting memory into isolated snippets, Hindsight stores self-contained narratives that preserve reasoning and context. These memories form a temporal, entity-aware graph that grows over time.
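As a hypothetical sketch, a node in such a graph could carry the temporal range, canonical entities, and links described above. The structure is an assumption based on that description, not Hindsight's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class NarrativeFact:
    # Self-contained narrative text, not an isolated snippet.
    text: str
    # Temporal range the fact covers (ISO-8601 dates for simplicity).
    t_start: str
    t_end: str
    # Canonical entity identifiers the fact is anchored to.
    entities: list[str] = field(default_factory=list)
    # Causal and semantic links to other facts, by fact ID.
    causes: list[int] = field(default_factory=list)
    related: list[int] = field(default_factory=list)

fact = NarrativeFact(
    text=("After the June 11 standup, Dana approved the Q3 launch plan "
          "because the security review had cleared."),
    t_start="2024-06-11",
    t_end="2024-06-11",
    entities=["person:dana", "project:q3-launch"],
)
```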

Recall: Finding What Actually Matters
When an agent needs memory, Hindsight doesn’t return a fixed top-k list. Instead, it performs multi-strategy retrieval, combining:
- Semantic vector search
- Keyword (BM25) search
- Graph traversal over entities and causal links
- Temporal filtering
These signals are fused and reranked to return just enough relevant memory to fit within a token budget—allowing models to focus their attention where it matters most.
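As a sketch of how such fusion can work, the following combines per-strategy rankings with reciprocal-rank fusion and then greedily packs results under a token budget. The fusion rule, constants, and token estimate are assumptions chosen for illustration; Hindsight's actual implementation may differ:

```python
def fuse_and_pack(query, candidates, rankers, token_budget, k=60):
    # rankers: functions that each rank the candidates by one strategy
    # (vector similarity, BM25, graph traversal, temporal filtering).
    scores = {id(c): 0.0 for c in candidates}
    for rank_fn in rankers:
        for rank, cand in enumerate(rank_fn(query, candidates)):
            # Reciprocal-rank fusion: items ranked highly by any
            # strategy accumulate score; k damps the tail.
            scores[id(cand)] += 1.0 / (k + rank)

    ranked = sorted(candidates, key=lambda c: -scores[id(c)])

    # Greedy packing: return just enough memory to fit the budget.
    packed, used = [], 0
    for cand in ranked:
        cost = len(cand.text.split())  # crude token estimate
        if used + cost > token_budget:
            continue
        packed.append(cand)
        used += cost
    return packed
```

Packing under an explicit budget, rather than returning a fixed top-k, is what lets the model spend its context on the memories that matter.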

Reflect: Reasoning, Opinions, and Learning
Reflection is where learning happens.
Hindsight supports preference-conditioned reasoning through configurable behavioral parameters such as skepticism, literalism, and empathy. Given the same facts, agents with different dispositions can form different opinions—just like humans do.
Importantly, opinions are not static. As new evidence arrives, Hindsight reinforces, weakens, or revises existing beliefs, updating confidence scores over time. Opinions become trajectories, not labels.
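As a toy sketch, an evidence-driven update might nudge confidence toward 1.0 on supporting evidence and toward 0.0 on contradiction, with a skepticism parameter damping reinforcement and amplifying weakening. The update rule, step size, and parameter semantics are illustrative assumptions, not Hindsight's actual reflection algorithm:

```python
def update_confidence(confidence: float, supports: bool,
                      skepticism: float = 0.5) -> float:
    # Illustrative rule only: skeptical agents reinforce beliefs more
    # slowly and abandon them more readily than trusting ones.
    if supports:
        step = (1.0 - confidence) * 0.2 * (1.0 - skepticism)
        return confidence + step
    step = confidence * 0.2 * (1.0 + skepticism)
    return max(0.0, confidence - step)

# The same evidence moves differently-disposed agents differently:
c_trusting = update_confidence(0.6, supports=True, skepticism=0.1)  # ~0.672
c_skeptic  = update_confidence(0.6, supports=True, skepticism=0.9)  # ~0.608
```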

Does It Work? The Evidence
We evaluated Hindsight on two demanding long-horizon memory benchmarks: LongMemEval and LoCoMo.
The results were striking:
- With an open-source 20B model, Hindsight lifted LongMemEval accuracy from 39.0% (a full-context baseline) to 83.6%
- Scaling the LLM backbone pushed performance to 91.4%, achieving state-of-the-art results
- On LoCoMo, Hindsight consistently outperformed existing open memory systems and matched or exceeded frontier-backed and closed-source approaches
These gains weren’t about larger models—they came from better memory. The architecture itself carried the performance improvements across multi-session reasoning, temporal queries, and preference-sensitive tasks.
The full details are in our paper, published on arXiv in collaboration with Virginia Tech and The Washington Post: Hindsight is 20/20: Building Agent Memory that Retains, Recalls, and Reflects
Why We Open Sourced Hindsight
As we looked more closely at popular agent frameworks, we were surprised by how often they violated a basic engineering principle: separation of concerns. Memory selection was pushed into prompts, and models were left to infer relevance from massive context dumps.
We believe agent memory deserves its own dedicated layer.
That’s why we made Hindsight open source and free. Our goal is to help the community build agents that:
- Learn from experience
- Maintain consistent perspectives
- Explain their reasoning
- Scale across long time horizons
What’s Next
Hindsight is just getting started.
We’re exploring richer opinion dynamics, controlled forgetting, privacy-aware memory management, and deeper integration with tool-using agents. We also see opportunities to jointly learn memory extraction, retrieval, and reflection rather than treating them as fixed pipelines.
If you’re building long-lived agents—or want to help shape what agent memory becomes next—we’d love your involvement.
- Check out (and star) the GitHub repo
- Join the Slack community
- Read the paper: Hindsight is 20/20: Building Agent Memory that Retains, Recalls, and Reflects
- Sign up for early access to the Hindsight cloud service