What 94.6% memory accuracy means for your users

Your users notice when an agent forgets their preferences or asks the same question twice. Hindsight scores 94.6% on LongMemEval, ahead of every system compared on this page, both overall and in each of the five evaluation dimensions. Our results were independently reproduced by academic AI researchers and published for peer review.

Overall LongMemEval Scores

Composite scores across all five evaluation dimensions. Higher means your agent remembers more and forgets less.

System         Score
GPT-4o         60.2%
Zep            71.2%
Supermemory    85.2%
Hindsight      94.6%

Per-Dimension Breakdown

How each system performs across the five core memory capabilities that your users depend on.

Dimension              GPT-4o    Zep      Supermemory    Hindsight
Single-Session         72.1%     78.4%    88.3%          96.2%
Cross-Session          55.8%     68.1%    83.7%          93.8%
Temporal Reasoning     48.3%     74.6%    81.2%          92.1%
Knowledge Update       58.7%     65.3%    84.9%          95.4%
Multi-Hop Reasoning    66.1%     69.8%    88.1%          95.1%
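
If you want to check the margins yourself, the per-dimension figures can be compared in a few lines of Python. The sketch below is illustrative only: the scores are transcribed from the table above, and the helper names (`lead_over_runner_up`, `unweighted_mean`) are made up for this example and are not part of Hindsight or LongMemEval.

```python
# Percent accuracy per dimension, transcribed from the table on this page.
SCORES = {
    "Single-Session":      {"GPT-4o": 72.1, "Zep": 78.4, "Supermemory": 88.3, "Hindsight": 96.2},
    "Cross-Session":       {"GPT-4o": 55.8, "Zep": 68.1, "Supermemory": 83.7, "Hindsight": 93.8},
    "Temporal Reasoning":  {"GPT-4o": 48.3, "Zep": 74.6, "Supermemory": 81.2, "Hindsight": 92.1},
    "Knowledge Update":    {"GPT-4o": 58.7, "Zep": 65.3, "Supermemory": 84.9, "Hindsight": 95.4},
    "Multi-Hop Reasoning": {"GPT-4o": 66.1, "Zep": 69.8, "Supermemory": 88.1, "Hindsight": 95.1},
}


def lead_over_runner_up(scores, system):
    """For each dimension, return (best other system, margin in percentage points)."""
    leads = {}
    for dimension, row in scores.items():
        others = {name: value for name, value in row.items() if name != system}
        runner_up = max(others, key=others.get)
        leads[dimension] = (runner_up, round(row[system] - others[runner_up], 1))
    return leads


def unweighted_mean(scores, system):
    """Plain average of a system's five dimension scores (hypothetical helper)."""
    return round(sum(row[system] for row in scores.values()) / len(scores), 2)


if __name__ == "__main__":
    for dimension, (runner_up, margin) in lead_over_runner_up(SCORES, "Hindsight").items():
        print(f"{dimension}: +{margin} points over {runner_up}")
    print("Unweighted mean:", unweighted_mean(SCORES, "Hindsight"))
```

The unweighted mean is a rough cross-check only. It need not match the published composite exactly, because this page does not state how the composite weights each dimension.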

Methodology

All scores on this page come from LongMemEval, a public benchmark for evaluating long-term memory in conversational AI systems. The benchmark was developed by independent researchers and is one of the most rigorous public evaluations of agent memory capabilities available today.

LongMemEval tests five core dimensions of memory performance:

  • Single-Session — recalling information within the same conversation.
  • Cross-Session — retaining facts across separate conversations over time.
  • Temporal Reasoning — understanding when events occurred and reasoning about time.
  • Knowledge Update — correctly updating beliefs when information changes.
  • Multi-Hop Reasoning — combining multiple stored facts to answer complex queries.

Together, these dimensions provide a holistic picture of how well a memory system supports real-world agent workflows where conversations span days, weeks, or months.
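
To make the scoring concrete, here is a minimal sketch of how a per-dimension accuracy and an overall accuracy could be derived from graded answers. It assumes each benchmark question carries a dimension label and a correct/incorrect grade. The record format and function names are hypothetical, and this is not the official LongMemEval scoring code.

```python
from collections import defaultdict


def accuracy_by_dimension(results):
    """results: iterable of (dimension, is_correct) pairs.

    Returns {dimension: percent of questions answered correctly}.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for dimension, is_correct in results:
        totals[dimension] += 1
        correct[dimension] += int(is_correct)
    return {d: 100.0 * correct[d] / totals[d] for d in totals}


def overall_accuracy(results):
    """Percent correct over all questions, so larger dimensions weigh more."""
    results = list(results)
    return 100.0 * sum(int(ok) for _, ok in results) / len(results)


if __name__ == "__main__":
    # Toy grades for illustration only, not real benchmark data.
    graded = [
        ("Single-Session", True),
        ("Single-Session", False),
        ("Temporal Reasoning", True),
    ]
    print(accuracy_by_dimension(graded))
    print(overall_accuracy(graded))
```

Under this assumption, the overall figure is weighted by how many questions each dimension contains, so it can differ from a plain average of the five dimension scores.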

Feature Comparison

Performance is only part of the picture. You also need to own your data and deploy on your terms.

System         Score    License    Self-Host
Hindsight      94.6%    MIT        Yes (Docker)
Supermemory    85.2%    Closed     Enterprise only
Zep            71.2%    Mixed      Via Graphiti
GPT-4o         60.2%    N/A        No

Start building agents that learn

Open source, MIT licensed. Self-host or use Hindsight Cloud.