Four retrieval strategies running in parallel. Token-budget optimization. Not RAG. Not a vector database wrapper. 94.6% on LongMemEval, peer-reviewed and independently reproducible.
Integration
No schema design. No manual tagging. No migration. Your agent starts building memory from the first conversation.
# pip install hindsight-client
from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Store a memory, then query it back.
client.retain(bank_id="my-bank", content="Alice prefers Slack over email")
results = client.recall(bank_id="my-bank", query="How does Alice communicate?")

Full REST API reference at hindsight.vectorize.io/api-reference
Under the hood
All four retrieval strategies run in parallel. Results merge under a token budget rather than a fixed top-K. You get predictable context size, predictable cost, and the most relevant memories from four different angles.
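To illustrate the idea, here is a minimal sketch of token-budget merging. This is a hypothetical illustration, not Hindsight's actual implementation: candidates from several retrievers are pooled, ranked by score, deduplicated, and added until a fixed token budget is spent, instead of taking a fixed top-K from each retriever.

```python
def merge_by_token_budget(result_lists, budget_tokens, count_tokens):
    """Merge ranked (score, text) lists from several retrievers
    under a total token budget (illustrative sketch)."""
    # Pool all candidates and sort by score, best first.
    candidates = sorted(
        (item for results in result_lists for item in results),
        key=lambda pair: pair[0],
        reverse=True,
    )
    merged, seen, used = [], set(), 0
    for score, text in candidates:
        if text in seen:               # drop duplicates across retrievers
            continue
        cost = count_tokens(text)
        if used + cost > budget_tokens:
            continue                   # skip items that would bust the budget
        merged.append(text)
        seen.add(text)
        used += cost
    return merged, used

# Toy usage with a whitespace tokenizer standing in for a real one.
lists = [
    [(0.9, "Alice prefers Slack over email"), (0.4, "Bob likes email")],
    [(0.8, "Alice prefers Slack over email"), (0.7, "Alice is in sales")],
]
ctx, used = merge_by_token_budget(
    lists, budget_tokens=10, count_tokens=lambda t: len(t.split())
)
```

The point of the budget cutoff is that context size (and therefore cost) stays bounded no matter how many retrievers contribute or how long individual memories are.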
LongMemEval is a peer-reviewed benchmark for agent memory systems. Our results are independently reproducible. The benchmark code is open source.
Built from the same codebase as Hindsight Cloud. No feature gating, no usage limits, no phone-home telemetry.
docker run -p 8888:8888 -e HINDSIGHT_API_LLM_API_KEY=$OPENAI_API_KEY ghcr.io/vectorize-io/hindsight

Managed Hindsight. We handle infrastructure, scaling, backups, and upgrades. You focus on your agent.
Same API, same MCP interface, same everything. Just no ops.
Open source, MIT licensed. Self-host or use Hindsight Cloud.
94.6%
LongMemEval — highest score of any memory system