Hindsight: Building AI Agents That Actually Learn

Chris Latimer

Modern AI agents are expected to do more than answer isolated questions. We ask them to manage projects, collaborate with teams, track evolving information, and develop stable viewpoints over time. Yet most agents today still behave like stateless tools—each interaction starts fresh, with little ability to accumulate experience or learn.

This gap isn’t a limitation of language models themselves. It’s a limitation of how we design agent memory.

At Vectorize, this realization led us to build Hindsight: an open-source agent memory system designed not just for recall, but for learning.

The Problem We Hit Building Agents

We first encountered this problem while building AI agents for our own internal workflows, including an AI project manager. These agents needed to remember prior decisions, understand people and entities over time, and refine their judgments as new information arrived.

When we looked at existing agent memory solutions, we were disappointed. Most systems amounted to little more than:

  • Semantic search over conversation logs
  • Large chunks of history stuffed back into the context window
  • Closed, opaque implementations

These approaches helped with short-term recall, but they didn’t allow agents to learn. They blurred facts and inferences, struggled with long time horizons, and produced inconsistent reasoning across sessions.

What Human Memory Gets Right

Human memory isn’t static or deterministic. What we recall depends on:

  • Context: the situation we’re in and the task at hand
  • Time: recency matters, but so do sequence and causality
  • Entities: people, places, and concepts anchor memory
  • Beliefs: we form opinions that evolve as evidence accumulates

Crucially, humans distinguish between what we observed and what we believe. We can explain why we hold a belief, revise it when contradicted, and maintain a coherent perspective over long periods of time.

Most agent memory systems ignore these distinctions.

Designing Memory for Learning, Not Storage

With these principles in mind, we set out to build a memory system that treats memory as a first-class substrate for reasoning, not just a retrieval add-on.

That effort became Hindsight.

Along the way, we collaborated with researchers at Virginia Tech and practitioners at The Washington Post, who were facing similar challenges in long-horizon reasoning, explainability, and consistency. These discussions directly shaped both the system design and the research behind it.

The Hindsight Architecture

Hindsight is built around a simple but powerful abstraction: retain, recall, and reflect.

[Figure: The Hindsight memory networks, showing how retain, recall, and reflect flow across World Facts, Experiences, Observations, and Opinions.]

Four Memory Networks

Instead of storing everything in a single undifferentiated store, Hindsight organizes memory into four distinct networks:

  • World: objective facts about the external world
  • Experience: the agent’s own actions and interactions
  • Observation: synthesized, preference-neutral summaries of entities
  • Opinion: subjective beliefs with confidence scores that evolve over time

This separation provides epistemic clarity: the agent can distinguish evidence from inference, and developers can inspect why an agent answered the way it did.
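To make the separation concrete, here is a minimal sketch of what routing memories into four networks could look like. The `Network` enum mirrors the list above, but the `MemoryEntry` shape and its field names are our own illustration, not Hindsight's actual API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Network(Enum):
    """The four memory networks described above."""
    WORLD = "world"              # objective facts about the external world
    EXPERIENCE = "experience"    # the agent's own actions and interactions
    OBSERVATION = "observation"  # preference-neutral entity summaries
    OPINION = "opinion"          # subjective beliefs with evolving confidence


@dataclass
class MemoryEntry:
    """Illustrative memory record; these fields are hypothetical."""
    network: Network
    content: str
    entities: list[str] = field(default_factory=list)
    confidence: float | None = None  # only meaningful for OPINION entries


# Separate networks keep evidence and inference distinguishable:
fact = MemoryEntry(Network.WORLD, "The launch moved to Q3.", ["launch"])
belief = MemoryEntry(Network.OPINION, "The launch is still at risk.",
                     ["launch"], confidence=0.6)
```

Because every entry carries its network, a developer can trace an answer back to the facts it rests on, or to the opinions layered on top of them.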

Retain: Turning Conversations into Structured Memory

Hindsight’s retention layer converts raw conversational streams into narrative facts with:

  • Temporal ranges
  • Canonical entities
  • Causal and semantic links

Rather than fragmenting memory into isolated snippets, Hindsight stores self-contained narratives that preserve reasoning and context. These memories form a temporal, entity-aware graph that grows over time.
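As a rough illustration of what one such memory unit might carry, the sketch below models a narrative fact with a temporal range, canonical entities, and causal links. The `NarrativeFact` class and its fields are hypothetical, not Hindsight's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class NarrativeFact:
    """One retained memory unit; this shape is a sketch only."""
    id: str
    text: str        # self-contained narrative, not an isolated snippet
    start: datetime  # temporal range the fact covers
    end: datetime
    entities: list[str] = field(default_factory=list)   # canonical entity ids
    caused_by: list[str] = field(default_factory=list)  # causally linked fact ids


# Two linked facts form an edge in the temporal, entity-aware graph:
cut = NarrativeFact(
    "fact-001",
    "Finance cut Project Apollo's Q2 budget by 20%.",
    datetime(2024, 4, 1), datetime(2024, 4, 1),
    entities=["finance", "project:apollo"],
)
slip = NarrativeFact(
    "fact-002",
    "Project Apollo's launch slipped from June to August.",
    datetime(2024, 4, 15), datetime(2024, 4, 15),
    entities=["project:apollo"],
    caused_by=["fact-001"],  # causal link back to the budget cut
)
```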

[Figure: The Hindsight retention pipeline: input, fact extraction, memory-unit creation, and routing into the distinct memory networks.]

Recall: Finding What Actually Matters

When an agent needs memory, Hindsight doesn’t return a fixed top-k list. Instead, it performs multi-strategy retrieval, combining:

  • Semantic vector search
  • Keyword (BM25) search
  • Graph traversal over entities and causal links
  • Temporal filtering

These signals are fused and reranked to return just enough relevant memory to fit within a token budget—allowing models to focus their attention where it matters most.
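For intuition, here is a minimal sketch of one standard way to fuse several ranked lists, reciprocal rank fusion, followed by greedy packing into a token budget. The fusion formula and the `fuse_and_pack` helper are illustrative assumptions; Hindsight's actual fusion and reranking logic may differ.

```python
from collections import defaultdict


def fuse_and_pack(result_lists, token_cost, budget, k=60):
    """Fuse ranked retrieval lists (e.g. vector, BM25, graph, temporal)
    with reciprocal rank fusion, then greedily pack the top-scoring
    memories into a token budget. RRF and greedy packing are
    illustrative choices, not Hindsight's actual algorithm."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, memory_id in enumerate(results):
            scores[memory_id] += 1.0 / (k + rank + 1)

    packed, used = [], 0
    for memory_id in sorted(scores, key=scores.get, reverse=True):
        cost = token_cost[memory_id]
        if used + cost <= budget:
            packed.append(memory_id)
            used += cost
    return packed


# Each strategy returns its own ranked list of memory ids:
vector = ["m1", "m2", "m3"]
bm25 = ["m2", "m4"]
graph = ["m3", "m2"]
temporal = ["m1", "m4"]
costs = {"m1": 120, "m2": 80, "m3": 200, "m4": 60}
print(fuse_and_pack([vector, bm25, graph, temporal], costs, budget=300))
# -> ['m1', 'm2', 'm4']: the strongest cross-strategy consensus that fits
```

The appeal of rank-based fusion is that it needs no score calibration across strategies: a memory that several independent retrievers rank highly rises to the top even if their raw scores are incomparable.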

[Figure: Intelligent parallel retrieval: semantic search, keyword search, graph traversal, and time-series filtering fused into the final contextual memory set.]

Reflect: Reasoning, Opinions, and Learning

Reflection is where learning happens.

Hindsight supports preference-conditioned reasoning through configurable behavioral parameters such as skepticism, literalism, and empathy. Given the same facts, agents with different dispositions can form different opinions—just like humans do.

Importantly, opinions are not static. As new evidence arrives, Hindsight reinforces, weakens, or revises existing beliefs, updating confidence scores over time. Opinions become trajectories, not labels.
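As a toy illustration of how such trajectories might behave, the sketch below nudges an opinion's confidence up on supporting evidence and down on contradicting evidence, modulated by a skepticism parameter. The update rule and the `update_confidence` function are entirely our own assumption, not Hindsight's algorithm.

```python
def update_confidence(confidence, supports, skepticism=0.5, rate=0.2):
    """Move confidence toward 1.0 on supporting evidence and toward
    0.0 on contradicting evidence. A more skeptical agent discounts
    support and amplifies contradiction. Hypothetical update rule."""
    if supports:
        step = rate * (1.0 - skepticism)
        return confidence + step * (1.0 - confidence)
    step = rate * (1.0 + skepticism)
    return confidence - min(step, 1.0) * confidence


# The same evidence stream yields different trajectories per disposition:
for skepticism in (0.1, 0.9):
    c = 0.5
    for supports in (True, True, False, True):
        c = update_confidence(c, supports, skepticism)
    print(f"skepticism={skepticism}: confidence={c:.2f}")
```

Run on the same four pieces of evidence, a credulous agent ends up more confident than a skeptical one, which is exactly the preference-conditioned divergence described above.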

[Figure: The reflection flow, from input query through memory recall, context building, and opinion adjustments to the final response.]

Does It Work? The Evidence

We evaluated Hindsight on two demanding long-horizon memory benchmarks: LongMemEval and LoCoMo.

The results were striking:

  • With an open-source 20B model, Hindsight improved LongMemEval accuracy from 39.0% to 83.6% over a full-context baseline
  • Scaling the LLM backbone pushed performance to 91.4%, achieving state-of-the-art results
  • On LoCoMo, Hindsight consistently outperformed existing open memory systems and matched or exceeded frontier-backed and closed-source approaches

These gains weren’t about larger models—they came from better memory. The architecture itself carried the performance improvements across multi-session reasoning, temporal queries, and preference-sensitive tasks.

All of these details are in our paper, published on arXiv in collaboration with researchers at Virginia Tech and practitioners at The Washington Post: Hindsight is 20/20: Building Agent Memory that Retains, Recalls, and Reflects.

Why We Open Sourced Hindsight

As we looked more closely at popular agent frameworks, we were surprised by how often they violated a basic engineering principle: separation of concerns. Memory selection was pushed into prompts, and models were left to infer relevance from massive context dumps.

We believe agent memory deserves its own dedicated layer.

That’s why we made Hindsight open source and free. Our goal is to help the community build agents that:

  • Learn from experience
  • Maintain consistent perspectives
  • Explain their reasoning
  • Scale across long time horizons

What’s Next

Hindsight is just getting started.

We’re exploring richer opinion dynamics, controlled forgetting, privacy-aware memory management, and deeper integration with tool-using agents. We also see opportunities to jointly learn memory extraction, retrieval, and reflection rather than treating them as fixed pipelines.

If you’re building long-lived agents—or want to help shape what agent memory becomes next—we’d love your involvement.