Agent Memory

Build AI Agents That Actually Learn

Most agent memory systems are just search. Hindsight gives your agents real memory: the ability to retain what they learn, recall it when relevant, and reflect on experience to form new understanding. Not within a single session, but across weeks and months.

[Image: Hindsight memory network graph]

Search Is Not Memory

Most “agent memory” systems chop conversations into embedding vectors and push them into a database. The information is in there somewhere, as a floating point array, with no concept of when it happened or what it means.

Agents Forget Everything
Every session starts from zero. Your support agent asks the customer to repeat their issue. Your coding assistant forgets your architecture. Context from prior interactions is gone.
Same Mistakes on Repeat
The agent tries the same failed approach over and over. It has no way to remember that something didn't work, no mechanism to learn from experience.
RAG Is Not Memory
Vector search can't reason about time, can't distinguish facts from synthesized knowledge, and treats all information identically. You end up with keyword search that happens to use embeddings.

RAG vs Agent Memory

RAG (Retrieval-Augmented Generation)

  • Retrieves static document chunks
  • Single-strategy vector similarity search
  • No concept of time or causality
  • Top-K results with unpredictable token usage
  • No learning from interactions
  Result: search, not memory

Agent Memory (Hindsight)

  • Extracts structured facts, experiences, beliefs
  • Multi-strategy: semantic + keyword + graph + temporal
  • Full temporal reasoning with causal chain tracking
  • Token budget system with predictable costs
  • Agents reflect and form new understanding
  Result: agents that genuinely learn

Three Operations That Turn Data Into Knowledge

Storing things is trivial. The hard part is structuring experience into knowledge the agent can use to behave differently. Hindsight organizes memory around three core operations.

Retain
New information arrives and the memory system extracts structured facts (self-contained narrative units, not arbitrary 512-token chunks), identifies entities and relationships, stamps everything with temporal metadata, and classifies the type of information. The output is a queryable graph, not a flat list.
Recall
Multi-strategy retrieval combines semantic search, keyword matching, graph traversal, and temporal filtering. Results are fused and reranked against a token budget so the agent gets the right context for the current task, not a dump of everything vaguely related.
Reflect
The operation most memory systems don't have, and where learning actually happens. The agent reasons over accumulated memories to synthesize new understanding, form observations from evidence, and consult mental models you've curated.

Quick Example

from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Retain: store a memory
client.retain(bank_id="user-123", content="Alice prefers email and works at Google.")

# Recall: retrieve relevant memories
memories = client.recall(bank_id="user-123", query="How should I contact Alice?")

# Reflect: reason about what the agent knows
insight = client.reflect(bank_id="user-123", query="What do I know about Alice?")

Multi-Strategy Retrieval

A single retrieval approach isn't enough. Vectors can't answer “what happened last Tuesday?” and keyword search misses paraphrases. Hindsight runs four strategies in parallel, fuses the results, and reranks against a token budget.

  • Semantic (vector): conceptual similarity, paraphrasing
  • Keyword (BM25): names, exact terms, identifiers
  • Graph traversal: related entities, indirect connections
  • Temporal: "last Tuesday," "Q3," time-range queries

Fusion & Reranking

4 strategies → Reciprocal Rank Fusion → Cross-encoder reranking → Token budget

The agent gets the right context for the current task, fit to a predictable token budget. No more dumping everything vaguely related into the context window.
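
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion followed by a greedy budget trim. It is illustrative only: the function names, the k constant of 60, and the example scores are assumptions rather than Hindsight's implementation, and the cross-encoder reranking stage is omitted.

from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    # Standard RRF: each strategy's list contributes 1 / (k + rank) per memory.
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def fit_to_budget(ranked_ids, token_counts, budget):
    # Greedily keep the highest-ranked memories that still fit the token budget.
    selected, used = [], 0
    for memory_id in ranked_ids:
        cost = token_counts[memory_id]
        if used + cost <= budget:
            selected.append(memory_id)
            used += cost
    return selected

# Hypothetical per-strategy results for one query (best first)
semantic = ["m3", "m1", "m7"]
keyword  = ["m1", "m4"]
graph    = ["m7", "m2"]
temporal = ["m4", "m3"]

fused = reciprocal_rank_fusion([semantic, keyword, graph, temporal])
context = fit_to_budget(fused, {"m1": 120, "m2": 300, "m3": 80, "m4": 200, "m7": 150}, budget=400)

The useful property: a memory that only one strategy finds can still surface if that strategy ranks it highly, while memories several strategies agree on rise to the top before the budget cut is applied.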

Benchmark Performance

GPT-4o: 60.2%
Zep: 71.2%
Supermemory: 85.2%
Hindsight: 91.4%

LongMemEval benchmark, independently verified by Virginia Tech and The Washington Post. Read the paper →

A Hierarchical Learning System

Lumping all memories together means the agent can't distinguish facts from synthesized knowledge. Hindsight uses a hierarchy: mental models first, then observations, then raw facts, so the agent reasons from the most refined knowledge available.

World Facts
Objective claims about the external world. "Alice works on the infrastructure team." "The API uses REST." Things the agent was told, not things it concluded.
Example: "The customer has two active subscriptions and prefers email."
Experiences
The agent's own actions and interactions. "I recommended Python for this task." "I sent the weekly report on Friday." The agent needs to know what it has done and said.
Example: "Last time I issued a refund without a request number, the API errored."
Observations
Consolidated knowledge automatically synthesized from multiple facts. The consolidation engine detects patterns, tracks evidence, and evolves observations as new information arrives, capturing the full journey rather than just the latest state.
Example: "User was previously a React enthusiast but has deliberately switched to Vue."
Mental Models
User-curated summaries for common queries. When you create a mental model, Hindsight runs a reflect operation and stores the result. Future queries check mental models first, giving you consistency, speed, and explicit control over key topics.
Example: "Team Communication Preferences: prefers async Slack over meetings."

When someone asks “why did you suggest that?” the agent traces through specific world facts and experiences, points to the observations that informed its thinking, and references the mental model it consulted. That's a fundamentally different answer from “here are the five most similar text chunks I found.”
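
If it helps to picture the hierarchy in code, here is one way it could be modeled as typed memory records consulted from the most refined level down. The type names, fields, and ordering below are assumptions drawn from the descriptions above, not Hindsight's actual schema.

from dataclasses import dataclass, field
from datetime import datetime
from enum import IntEnum

class MemoryType(IntEnum):
    # Ordered from most refined to most raw, matching the hierarchy above
    MENTAL_MODEL = 0   # user-curated summary, checked first
    OBSERVATION = 1    # knowledge synthesized from multiple facts
    EXPERIENCE = 2     # the agent's own actions and interactions
    WORLD_FACT = 3     # objective claims the agent was told

@dataclass
class Memory:
    type: MemoryType
    content: str
    recorded_at: datetime
    sources: list[str] = field(default_factory=list)  # IDs of memories this was derived from

def order_for_reasoning(memories: list[Memory]) -> list[Memory]:
    # Present the most refined knowledge first, then the raw supporting facts.
    return sorted(memories, key=lambda m: (m.type, m.recorded_at))

Reasoning from mental models and observations first keeps answers consistent, and a provenance field like sources is what would let the agent trace "why did you suggest that?" back to the underlying facts and experiences.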

Built for Production

Hindsight is used in production at enterprises and growing startups. These are the capabilities that matter when you move past prototypes.

Predictable Token Costs
The token budget system returns optimal context within a limit you set. No more unpredictable bills from top-K retrieval dumping everything into the context window.
Temporal Reasoning
"What changed since last month?" actually works. Temporal spreading activation follows causal chains across time with decay, something vector search fundamentally cannot do.
Open Source, MIT Licensed
Full transparency, no vendor lock-in. Self-host with Docker in under a minute, or use Hindsight Cloud. Contribute, fork, inspect the code.
Any LLM Provider
Works with OpenAI, Anthropic, Google Gemini, Groq, Cohere, Ollama, and LMStudio. Swap providers without changing your memory architecture.
Multi-Agent Shared Memory
Memory banks let agents share common ground while keeping private context separate. A researcher and a coder on the same project see the same world facts.
Configurable Disposition
Tune how skeptical, literal, or empathetic the agent is when forming beliefs. Given identical evidence, a cautious agent and a trusting one reach different conclusions.
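
To give a flavor of what temporal spreading activation with decay can mean in practice, here is a toy pass over a small causal graph of memories. The graph, the half-life, the hop decay, and the function name are made-up illustrations of the idea, not the engine Hindsight ships.

import math
from datetime import datetime

# Toy causal graph: memory ID -> (timestamp, IDs of memories it led to)
memories = {
    "deploy_failed": (datetime(2024, 6, 3), ["rollback", "postmortem"]),
    "rollback":      (datetime(2024, 6, 3), []),
    "postmortem":    (datetime(2024, 6, 5), ["new_ci_rule"]),
    "new_ci_rule":   (datetime(2024, 6, 7), []),
}

def spread_activation(seed_ids, now, half_life_days=30.0, hop_decay=0.5):
    # Score memories by following causal links outward from the seeds,
    # decaying by hop count and by how long ago each memory happened.
    scores = {}
    frontier = [(seed, 1.0) for seed in seed_ids]
    while frontier:
        memory_id, activation = frontier.pop()
        timestamp, caused = memories[memory_id]
        age_days = (now - timestamp).days
        time_decay = math.exp(-math.log(2) * age_days / half_life_days)
        score = activation * time_decay
        if score <= scores.get(memory_id, 0.0):
            continue  # already reached via a stronger path
        scores[memory_id] = score
        for next_id in caused:
            frontier.append((next_id, activation * hop_decay))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = spread_activation(["deploy_failed"], now=datetime(2024, 7, 1))

Memories closer in the causal chain and nearer in time score higher, which is the kind of ordering that makes "what changed since last month?" return a meaningful answer instead of a similarity dump.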

Build agents that actually remember.

Hindsight is open source and runs in under a minute. Start building agents with real memory today.