Four retrieval strategies running in parallel. Token-budget optimization. Not RAG. Not a vector database wrapper. 94.6% on LongMemEval, peer-reviewed and independently reproducible.
Integration
No schema design. No manual tagging. No migration. Your agent starts building memory from the first conversation.
# pip install hindsight-client
from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Store a memory, then query it back.
client.retain(bank_id="my-bank", content="Alice prefers Slack over email")
results = client.recall(bank_id="my-bank", query="How does Alice communicate?")

Full REST API reference at hindsight.vectorize.io/api-reference
Under the hood
All four retrieval strategies run in parallel. Results merge under a token budget rather than a fixed top-K. You get predictable context size, predictable cost, and the most relevant memories from four different angles.
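To illustrate the idea, here is a minimal sketch of token-budget merging. This is a hypothetical illustration, not Hindsight's actual implementation: candidates from several retrievers are pooled, ranked by score, deduplicated, and added until a fixed token budget is spent, instead of taking a fixed top-K from each retriever.

```python
def merge_by_token_budget(result_lists, budget_tokens, count_tokens):
    """Merge ranked (score, text) lists from several retrievers
    under a total token budget (illustrative sketch)."""
    # Pool all candidates and sort by score, best first.
    candidates = sorted(
        (item for results in result_lists for item in results),
        key=lambda pair: pair[0],
        reverse=True,
    )
    merged, seen, used = [], set(), 0
    for score, text in candidates:
        if text in seen:               # drop duplicates across retrievers
            continue
        cost = count_tokens(text)
        if used + cost > budget_tokens:
            continue                   # skip items that would bust the budget
        merged.append(text)
        seen.add(text)
        used += cost
    return merged, used

# Toy usage with a whitespace tokenizer standing in for a real one.
lists = [
    [(0.9, "Alice prefers Slack over email"), (0.4, "Bob likes email")],
    [(0.8, "Alice prefers Slack over email"), (0.7, "Alice is in sales")],
]
ctx, used = merge_by_token_budget(
    lists, budget_tokens=10, count_tokens=lambda t: len(t.split())
)
```

The point of the budget cutoff is that context size (and therefore cost) stays bounded no matter how many retrievers contribute or how long individual memories are.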
LongMemEval is a peer-reviewed benchmark for agent memory systems. Our results are independently reproducible. The benchmark code is open source.
Built from the same codebase as Hindsight Cloud. No feature gating, no usage limits, no phone-home telemetry.
docker run -p 8888:8888 -e HINDSIGHT_API_LLM_API_KEY=$OPENAI_API_KEY ghcr.io/vectorize-io/hindsight

Managed Hindsight. We handle infrastructure, scaling, backups, and upgrades. You focus on your agent.
Same API, same MCP interface, same everything. Just no ops.
Open source, MIT licensed. Self-host or use Hindsight Cloud.
94.6%
LongMemEval — highest score of any memory system