Hermes Agent Memory Providers: All 7 Options Compared

Hermes Agent ships with seven external memory providers. One command — hermes memory setup — opens a picker and installs whichever you choose. The hard part isn't setup. It's knowing which one to pick.
This article covers all seven: what each one does, how it stores and retrieves memory, what it costs, and who it's built for.
A quick framing note before the comparisons: all seven providers layer on top of Hermes's built-in memory system, which always runs. MEMORY.md and USER.md stay active regardless of which provider you pick. External providers add structured capture, better retrieval, and cross-session persistence — they don't replace the foundation.
At a Glance
| Provider | Storage | Cost | Unique Approach | LongMemEval |
|---|---|---|---|---|
| Hindsight | Local or Cloud | Free (local) | Knowledge graph + reflect synthesis | 91.4% (Gemini-3) |
| Holographic | Local SQLite | Free | HRR algebra + trust scoring, zero deps | — |
| OpenViking | Self-hosted | Free | Tiered L0/L1/L2 loading, 80-90% token savings | — |
| Mem0 | Cloud | Freemium | Server-side LLM extraction, dual memory scope | 67.6% (GPT-4o)† |
| Honcho | Cloud | Varies | Dialectic user modeling — OSS is AGPL v3.0 | — |
| ByteRover | Local or Cloud | Freemium | Knowledge tree in human-readable Markdown | — |
| RetainDB | Cloud | Paid | Hybrid search: vector + BM25 + reranking | — |
† Mem0 score from LongMemEval-S, a benchmark variant — not directly comparable to the full LongMemEval used for Hindsight.
Hindsight 
Hindsight is the only provider that stores structured knowledge rather than text. Where every other provider captures memories as semantic chunks — paragraphs or sentences that get retrieved by similarity — Hindsight extracts discrete facts ("production database runs on port 5433"), named entities (people, services, projects), and relationships between them ("auth service depends on Redis for sessions"). This gives retrieval precision: ask about a specific service and Hindsight surfaces facts about that service, not chunks that happened to mention it nearby.
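To make the contrast concrete, here is a minimal Python sketch of the difference between chunk retrieval and fact retrieval. The class names and fields are illustrative, not Hindsight's actual schema — the point is that a fact store can filter on the entity itself rather than on surface similarity.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str    # named entity the fact is about
    predicate: str  # relationship or attribute
    obj: str        # value or related entity

# A chunk store can only match on textual similarity;
# a fact store can filter on the entity directly.
facts = [
    Fact("production-db", "runs_on_port", "5433"),
    Fact("auth-service", "depends_on", "redis"),
    Fact("auth-service", "owned_by", "platform-team"),
]

def recall(entity: str) -> list[Fact]:
    """Return every fact whose subject or object is the entity."""
    return [f for f in facts if entity in (f.subject, f.obj)]

print([f.predicate for f in recall("auth-service")])  # → ['depends_on', 'owned_by']
```

Asking about auth-service returns only facts anchored to that entity — nothing is surfaced just because it appeared in a nearby paragraph.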
The reflect operation is unique in this group: a periodic synthesis pass that reads across all stored memories to derive higher-level insights and consolidate related facts. Other providers accumulate raw extractions; Hindsight actively refines what it knows.
On LongMemEval — the standard benchmark for agent memory — Hindsight scores 91.4% with Gemini-3 and 89.0% with an open-source 120B model. These are the highest results of any provider tested, and they're achievable without sending data to a cloud API.
Setup: hermes memory setup → select Hindsight → set HINDSIGHT_API_KEY in .env if using Cloud, or leave blank for local daemon. Installs hindsight-client.
Storage: Local PostgreSQL daemon by default (no cloud account required). Hindsight Cloud is available for shared memory across machines or teams.
Tools: hindsight_recall, hindsight_retain, hindsight_reflect
Best for: Developers who want the highest retrieval accuracy, anyone working with privacy-sensitive data, teams who need structured knowledge rather than raw text retrieval.
For full configuration options, see the Hindsight integration guide.
Holographic
Holographic is the most technically unusual option in the list. Instead of vector embeddings or keyword search, it uses HRR (Holographic Reduced Representations) — a mathematical framework where memories are stored as superposed complex-valued vectors, and recall is algebraic rather than similarity-based. The practical implication: retrieval is sub-millisecond and runs on pure SQLite with zero pip dependencies beyond what Hermes already requires.
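HRR binding and recall can be demonstrated in a few lines of numpy: binding is circular convolution (computed here via FFT), superposition is vector addition, and unbinding is circular correlation with the key. This is the textbook HRR construction, not Holographic's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 2048  # high dimensionality keeps unbinding noise low

def bind(a, b):
    """Circular convolution via FFT: binds a key to a value."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=D)

def unbind(trace, key):
    """Circular correlation: approximate inverse of bind."""
    inv = np.concatenate(([key[0]], key[-1:0:-1]))  # involution of key
    return bind(trace, inv)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

keys = {n: rng.normal(0, 1 / np.sqrt(D), D) for n in ("port", "owner", "region")}
vals = {n: rng.normal(0, 1 / np.sqrt(D), D) for n in ("v5433", "platform", "eu-west")}

# One superposed trace holds all three key/value pairs at once.
trace = (bind(keys["port"], vals["v5433"])
         + bind(keys["owner"], vals["platform"])
         + bind(keys["region"], vals["eu-west"]))

# Unbinding with a key recovers a noisy copy of its value;
# a cleanup step picks the closest known vector.
noisy = unbind(trace, keys["port"])
best = max(vals, key=lambda n: cosine(noisy, vals[n]))
print(best)  # → v5433
```

The algebraic recall is a handful of FFTs and a dot-product sweep — no index, no nearest-neighbor search — which is why sub-millisecond retrieval on plain SQLite is plausible.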
Trust scoring adds another layer: memories that are recalled and confirmed across multiple sessions gain higher trust scores and are weighted more heavily in retrieval. Memories that contradict newer information decay. Over time the memory store self-corrects rather than accumulating noise.
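The self-correction loop can be sketched as a simple update rule: move trust toward 1.0 on confirmation and toward 0.0 on contradiction. The rate constant and the exact update are assumptions for illustration — Holographic's real scoring is internal to the provider.

```python
def update_trust(trust: float, confirmed: bool, rate: float = 0.2) -> float:
    """Nudge trust toward 1.0 on confirmation, toward 0.0 on contradiction."""
    target = 1.0 if confirmed else 0.0
    return trust + rate * (target - trust)

trust = 0.5
for outcome in (True, True, True):   # recalled and confirmed across 3 sessions
    trust = update_trust(trust, outcome)
print(round(trust, 3))               # climbs toward 1.0

trust = update_trust(trust, False)   # contradicted by newer information
print(round(trust, 3))               # decays back down
```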
The trade-off is that Holographic doesn't do LLM-based extraction — it doesn't pull structured facts out of conversations the way Hindsight does. It stores and retrieves conversational content, but doesn't build a knowledge graph.
Setup: hermes memory setup → select Holographic. No API keys, no external services, no additional installs.
Storage: Local SQLite. Nothing leaves your machine.
Tools: 2 tools (minimal surface area by design)
Best for: Users who want enhanced local memory with zero additional dependencies, air-gapped environments, anyone who wants memory that self-corrects over time.
OpenViking
OpenViking comes from ByteDance's open-source team (volcengine) and takes a filesystem-first approach to agent context. Rather than a database, it organizes memory, resources, and skills into a filesystem hierarchy — a "context database" where each piece of knowledge lives as a file at the appropriate level of the tree.
The standout feature is L0/L1/L2 tiered loading:
- L0 (Abstract): A one-sentence summary (~50 tokens). Loaded first for quick retrieval.
- L1 (Overview): Core information and usage scenarios (~500 tokens). Loaded when the agent needs to plan.
- L2 (Details): The full original content. Loaded only when the agent needs deep context.
The agent reads L0 summaries first and escalates to L1 or L2 only when necessary. The result, per published benchmarks, is 80-90% token cost reduction compared to loading full context on every turn — significant savings for high-volume or long-running deployments.
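The escalation logic amounts to loading the cheapest tier that satisfies the agent's current need. A minimal sketch — the tier names come from the article, but the field contents and the `load` API are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ContextFile:
    l0: str  # ~50-token abstract
    l1: str  # ~500-token overview
    l2: str  # full original content

def load(entry: ContextFile, need: str) -> str:
    """Return the cheapest tier sufficient for the task at hand."""
    if need == "scan":   # quick relevance check
        return entry.l0
    if need == "plan":   # the agent is deciding what to do
        return entry.l1
    return entry.l2      # deep context: load everything

doc = ContextFile(l0="Deploy runbook summary.",
                  l1="Steps, owners, rollback notes...",
                  l2="Full 4,000-word runbook text...")
print(load(doc, "scan"))  # → Deploy runbook summary.
```

Because most turns only need a scan, the bulk of the token budget is spent only on the few files the agent actually escalates.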
Setup: hermes memory setup → select OpenViking → set OPENVIKING_ENDPOINT and OPENVIKING_API_KEY in .env. Requires running the OpenViking server (self-hosted).
Storage: Self-hosted server. You control the infrastructure.
Tools: 5 tools (tied with RetainDB for the highest tool count)
Best for: Cost-conscious deployments at scale, users who want filesystem-transparent memory they can inspect and edit directly, teams already comfortable running self-hosted services.
Mem0 
Mem0 is the fastest to get working. Install the provider, add your API key, and within 30 seconds Hermes is capturing and recalling memories automatically. It uses server-side LLM extraction — Mem0's infrastructure decides what's worth keeping from each conversation — and includes a circuit breaker that prevents memory failures from blocking agent responses.
The dual memory scope is a practical differentiator: session memories (short-term, scoped to the current conversation) and user memories (long-term, persisted across all sessions). Both are searched and injected before each response, giving the agent both immediate context and historical context simultaneously.
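A conceptual sketch of dual-scope retrieval — this is not the Mem0 SDK, and the naive word-overlap `relevant` function is a stand-in for real semantic search:

```python
# Conceptual sketch of dual-scope memory -- not Mem0's actual API.
session_memories = {"sess-42": ["user is debugging a Redis timeout"]}
user_memories = {"alice": ["prefers concise answers", "works on the auth service"]}

def relevant(memory: str, query: str) -> bool:
    # Stand-in for semantic search: naive word overlap.
    return bool(set(memory.lower().split()) & set(query.lower().split()))

def build_context(user_id: str, session_id: str, query: str) -> list[str]:
    """Search both scopes and merge: immediate context first, history second."""
    hits = [m for m in session_memories.get(session_id, []) if relevant(m, query)]
    hits += [m for m in user_memories.get(user_id, []) if relevant(m, query)]
    return hits

print(build_context("alice", "sess-42", "why is the auth service hitting timeouts"))
```

The query pulls the short-term debugging context from the session scope and the long-term profile fact from the user scope in a single pass — the "both at once" behavior described above.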
On LongMemEval-S, Mem0 scores 67.6% with GPT-4o. The result reflects the trade-off inherent in cloud-based extraction: convenient and fast to set up, but less precise than structured knowledge extraction.
A free tier is available. Self-hosted open-source mode exists but requires more setup than the cloud version.
Setup: hermes memory setup → select Mem0 → set MEM0_API_KEY in .env.
Storage: Mem0 Cloud by default. Self-hosted possible with additional configuration.
Tools: mem0_add, mem0_search, mem0_get_all
Best for: Developers who want the fastest setup, users who need both session-scoped and long-term memory as distinct concepts, anyone who wants a free starting point and can decide on infrastructure later.
Honcho
Honcho takes a different approach to memory than every other provider here. Rather than storing facts about what you've told the agent, it builds a model of how you think — your reasoning patterns, your communication style, your decision-making tendencies over time. It calls this dialectic user modeling, and it runs as a background process that deepens its model of you across sessions.
The result is memory that becomes more accurate as a model of you, rather than a growing archive of facts. For personal assistants and long-running agents, this is a genuinely different value proposition.
One important note: Honcho's open-source codebase is licensed under AGPL v3.0. If you self-host Honcho as part of a networked application, you may be required to make the source of the AGPL-covered code — and, depending on how your application integrates with it, potentially more — available to users who interact with it over the network. Using Honcho's managed cloud service avoids this obligation — the license applies to the OSS code, not the SaaS product. For commercial or proprietary products, verify your licensing situation before self-hosting.
Setup: hermes memory setup → select Honcho → set HONCHO_API_KEY in .env.
Storage: Honcho Cloud. Enterprise VPC self-hosting available.
Tools: 4 tools
Best for: Personal assistant use cases where the agent should develop a deepening model of the user, anyone who was already using Honcho under the old integration and wants to continue.
ByteRover
ByteRover organizes agent knowledge into a hierarchical knowledge tree stored as human-readable Markdown files in .brv/context-tree/. Unlike most providers that abstract storage into a database, ByteRover keeps everything inspectable — you can open the files in any text editor and see exactly what your agent knows, organized into a navigable hierarchy.
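An illustrative layout — the `.brv/context-tree/` root is described above, but the subdirectories and filenames here are hypothetical, just to show the shape of a navigable Markdown hierarchy:

```
.brv/context-tree/
├── architecture/
│   ├── services.md        # what each service does
│   └── data-stores.md     # databases, caches, queues
├── conventions/
│   └── code-style.md      # agreed patterns and naming
└── decisions/
    └── api-versioning.md  # why v2 endpoints are path-versioned
```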
The pre-compression extraction hook is the most distinctive integration point: ByteRover fires specifically before Hermes compresses a long conversation, ensuring in-flight knowledge gets captured into the tree before context is summarized away. For users with very long sessions where compression fires frequently, this is a meaningful safeguard.
The CLI (brv) manages the knowledge tree and is a required dependency for the Hermes provider.
Setup: hermes memory setup → select ByteRover → install brv CLI → set BRV_API_KEY in .env (Cloud) or configure local storage.
Storage: Local by default. Cloud option available.
Tools: 3 tools
Best for: Developers who want full visibility into what their agent knows, users who want human-readable knowledge storage they can edit manually, anyone whose sessions are long enough that compression regularly fires.
RetainDB
RetainDB focuses on retrieval quality through hybrid search: every query runs vector similarity, BM25 keyword matching, and a cross-encoder reranking step in parallel. The combination addresses a known weakness of vector-only retrieval — exact keyword matches (variable names, error codes, specific technical terms) that semantic search can miss.
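One standard way to fuse parallel result lists is reciprocal rank fusion (RRF). Whether RetainDB uses RRF specifically isn't stated — this sketch just shows why fusing a keyword ranking with a vector ranking recovers exact-match hits that semantic search alone would drop (a cross-encoder reranker would then reorder the fused list).

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Vector search missed the doc containing the exact error code "ECONNRESET";
# BM25 ranked it first. Fusion surfaces it near the top of the merged list.
vector_hits = ["retry-guide", "timeout-faq", "redis-notes"]
bm25_hits = ["econnreset-runbook", "retry-guide"]
print(rrf([vector_hits, bm25_hits]))
```

A document that appears in only one list can still outrank documents that appear low in both — which is exactly the behavior you want for variable names and error codes.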
RetainDB is the only provider in this list that is paid-only with no free tier. It's a cloud service; no self-hosted option is available.
Setup: hermes memory setup → select RetainDB → set RETAINDB_API_KEY in .env.
Storage: RetainDB Cloud.
Tools: 5 tools (matching OpenViking for highest tool count)
Best for: Teams with strict retrieval precision requirements, technical use cases where exact keyword matching matters as much as semantic relevance (code, error messages, configuration values), and budgets that justify a paid tier.
Which One Should You Pick?
You want the best recall accuracy → Hindsight. 91.4% on LongMemEval, structured extraction, knowledge graph. The only provider with a reflect operation that synthesizes across all stored memories.
You want zero additional dependencies → Holographic. Pure SQLite, no pip installs. Works out of the box in any environment including air-gapped.
You want to minimize token costs at scale → OpenViking. L0/L1/L2 tiered loading reduces token consumption 80-90% by loading summaries first and full details only when needed.
You want the fastest setup → Mem0. Freemium, 30-second setup, free tier available. Good starting point before committing to infrastructure.
You want the agent to model how you think → Honcho. Dialectic user modeling builds a behavioral model of the user over time, not just a fact store. Check licensing if self-hosting.
You want human-readable memory you can inspect and edit → ByteRover. Knowledge tree as Markdown files. Full visibility into what the agent knows.
You need hybrid keyword + semantic retrieval → RetainDB. Vector + BM25 + reranking. The only paid-only option, but retrieval precision is the strongest of the cloud providers.
Conclusion
The seven providers cover a wide range of trade-offs: local vs. cloud, free vs. paid, structured extraction vs. raw retrieval, accuracy vs. token efficiency. Most users will find the right choice from this short list:
- Starting out → Mem0 (free, fast)
- Privacy or accuracy matters → Hindsight (local, best benchmarks)
- Constrained environment → Holographic (zero deps)
- Modeling the user over time → Honcho (check AGPL if self-hosting)
- Token cost at scale → OpenViking (tiered loading)
Run hermes memory setup and you'll be prompted through the full configuration for whichever you choose. If you go with Hindsight, the integration guide covers configuration options, recall tuning, and multi-machine setup.
For context on how Hermes's built-in memory layers work before adding a provider, see How Hermes Agent Memory Actually Works. Comparing Hermes to OpenClaw? See OpenClaw vs Hermes Agent: Memory Compared.