How Hermes Agent Memory Actually Works (And How to Make It Better)

Most people who set up Hermes Agent expect memory to work like a recorder — everything gets captured, everything is retrievable later. Then they run a few sessions and notice the memory files are empty. Or they ask Hermes something it should know from last week and it draws a blank.
This isn't a bug. It's a mismatch between expectations and how the system is actually designed.
Hermes Agent's memory is more layered than it first appears. At its core, two distinct things are happening — built-in memory that the agent manages itself, and a separate skills system that handles the self-improving part. They work differently, live in different places, and have different limitations.
On top of that, Hermes recently shipped a new pluggable memory provider system that fundamentally changes how external memory works. If you set it up after this update, the experience is quite different from what earlier documentation described.
This article explains how the whole system actually works, where the built-in memory falls short, and how to choose an external provider if you want more.
The Four Memory Layers
Hermes doesn't have one memory system — it has four. Most confusion comes from treating them as one thing.
Layer 1: Prompt memory (hot)
Two small files in ~/.hermes/memories/:
- MEMORY.md (~2,200 characters, ~800 tokens) — durable facts: environment details, project conventions, discovered workarounds, lessons learned
- USER.md (~1,375 characters, ~500 tokens) — user profile: preferences, communication style, identity
Both are loaded as a frozen snapshot into the system prompt at session start — frozen to keep the LLM prefix cache stable. Updates written during a session are persisted to disk immediately but won't appear in the system prompt until the next session. This is the agent's "working knowledge" — small, always present, and refreshed at every session start.
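The frozen-snapshot behavior is easy to model in a few lines of Python. This is an illustration, not Hermes internals — the file names match the article, but the prompt format and helper are invented:

```python
from pathlib import Path
import tempfile

def build_system_prompt(memories_dir: Path) -> str:
    """Snapshot MEMORY.md and USER.md once, at session start."""
    memory = (memories_dir / "MEMORY.md").read_text()
    user = (memories_dir / "USER.md").read_text()
    return f"## Durable facts\n{memory}\n\n## User profile\n{user}"

with tempfile.TemporaryDirectory() as d:
    memories = Path(d)
    (memories / "MEMORY.md").write_text("prod DB runs on port 5433")
    (memories / "USER.md").write_text("prefers concise answers")

    prompt = build_system_prompt(memories)  # frozen for this session

    # A mid-session save is persisted to disk immediately...
    with (memories / "MEMORY.md").open("a") as f:
        f.write("\nstaging uses port 5434")

    # ...but the frozen prompt doesn't see it until the next session starts.
    assert "5434" not in prompt
    assert "5434" in build_system_prompt(memories)
```

The point of the frozen copy is cache stability: the system-prompt prefix stays byte-identical for the whole session, so the provider's prefix cache keeps hitting.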
Layer 2: Session archive (cold recall)
All CLI and messaging sessions are stored in a SQLite database (~/.hermes/state.db). The agent can search this via the session_search tool when it needs episodic recall — "did we discuss X before?" or "what happened with the auth service last week?". Results are summarized by a configurable LLM call (defaults to whatever provider you have set up).
The key distinction: architecture determines access, not agent judgment. Prompt memory is always in context. Session archive is only accessed when the agent explicitly calls session_search. The design keeps the system prompt small and cache-stable while enabling rich historical recall when needed.
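A minimal model of the session archive: an FTS5 table of transcripts that is only queried when the agent explicitly asks. The schema and queries here are my illustration, not the actual state.db layout:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(started_at, transcript)")
db.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [
        ("2024-05-01", "debugged the auth service timeout, fix was raising pool size"),
        ("2024-05-03", "planned the billing migration to the new schema"),
    ],
)

def session_search(query: str) -> list[tuple[str, str]]:
    """Episodic recall: runs only when the agent calls the tool."""
    return db.execute(
        "SELECT started_at, transcript FROM sessions "
        "WHERE sessions MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()

hits = session_search("auth service")
assert len(hits) == 1 and hits[0][0] == "2024-05-01"
```

Nothing from this table ever enters the prompt on its own — that is the architectural split the article describes: prompt memory is pushed into context, the archive is pulled on demand.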
Layer 3: Skills (procedural memory)
When Hermes completes a complex task, it writes a reusable skill document — a markdown file capturing the approach, tools used, and steps that worked. Skills are stored in ~/.hermes/skills/, are searchable, and self-improve as the agent reuses and refines them. This is the self-improving part people expect memory to handle.
Layer 4: External provider (optional)
The pluggable provider system described later in this article. Adds structured extraction, entity resolution, and cross-session persistence on top of the built-in layers.
Both prompt memory and skills are agent-curated, but they trigger differently. Skills are created reactively after task completion — a clear, predictable event. Prompt memory is updated on the agent's judgment, prompted periodically by a configurable nudge_interval and flushed proactively in gateway mode before idle timeout. In short sessions or between nudge intervals, you may see nothing written to MEMORY.md.
Where the Built-in System Has Limits
The built-in layers are well-designed for what they are. The gaps become apparent in specific situations.
Prompt memory is small by design. The combined ~1,300 token budget for MEMORY.md and USER.md is intentional — it keeps the system prompt stable for prefix caching. But it means the agent is constantly making judgment calls about what's worth the space. When an add would exceed the character limit, the tool returns an error, and the agent must consolidate or remove entries before retrying. Nuanced specifics can get compressed away.
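The budget pressure is concrete: a hypothetical memory-add that rejects writes past the character cap, forcing the agent to consolidate first. The 2,200-character figure is from the article; the API shape is invented:

```python
MEMORY_LIMIT = 2_200  # characters for MEMORY.md, per the article

class MemoryFile:
    """Illustrative stand-in for the built-in memory tool's add path."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def _size(self) -> int:
        return sum(len(e) + 1 for e in self.entries)  # +1 newline per entry

    def add(self, fact: str) -> dict:
        if self._size() + len(fact) + 1 > MEMORY_LIMIT:
            # Over budget: error back to the agent instead of silently truncating.
            return {"ok": False, "error": "memory full: consolidate or remove entries"}
        self.entries.append(fact)
        return {"ok": True}

mem = MemoryFile()
assert mem.add("prod DB on port 5433")["ok"]
assert not mem.add("x" * 3000)["ok"]  # oversized add is rejected outright
```

Returning an error (rather than truncating) is what pushes curation onto the agent — it has to decide what to merge or drop, which is exactly where nuance gets lost.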
Session search is keyword-based. The SQLite FTS5 index works well when you search using the same words stored in the transcript. It struggles with rephrasing and relational questions — "what did I tell you about the auth service?" when the stored session says "authentication microservice uses Redis for session tokens." The agent has to know to call session_search and use the right query terms.
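The keyword limitation is easy to demonstrate: FTS5 matches only the tokens actually stored, so "auth service" misses a transcript that says "authentication microservice" unless the query falls back to prefix matching. Schema is illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(transcript)")
db.execute(
    "INSERT INTO sessions VALUES "
    "('authentication microservice uses Redis for session tokens')"
)

def hit_count(query: str) -> int:
    return db.execute(
        "SELECT count(*) FROM sessions WHERE sessions MATCH ?", (query,)
    ).fetchone()[0]

assert hit_count("auth service") == 0  # neither token matches as stored
assert hit_count("auth*") == 1         # prefix query reaches "authentication"
```

The agent can work around this with prefix terms or rephrased queries, but it has to think of that itself — there is no semantic layer doing it automatically.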
No entity resolution or relationship tracking. Neither prompt memory nor the session archive has any concept that "Alice" and "my coworker Alice from engineering" are the same person, or that two mentions of "the database" across different sessions refer to the same system. Facts accumulate without structure.
Compression can lose in-flight context. Before compressing a long conversation, Hermes triggers a dedicated memory flush — a separate model call where only the memory tool is available, giving the agent a chance to save important facts before compression collapses the history. Facts that weren't flagged during that flush are gone once the context is summarized.
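The flush-then-compress ordering can be sketched as follows. The function names and the IMPORTANT-tag heuristic are my stand-ins — the real flush is a separate model call with only the memory tool exposed; only the sequencing is from the article:

```python
def memory_flush(history: list[str], saved: list[str]) -> None:
    """Stand-in for the dedicated flush pass: save anything flagged durable."""
    for line in history:
        if "IMPORTANT" in line:
            saved.append(line)

def compress(history: list[str], saved: list[str]) -> list[str]:
    memory_flush(history, saved)      # runs BEFORE the summary collapses history
    summary = f"<summary of {len(history)} messages>"
    return [summary]                  # unflagged detail is gone from context

saved: list[str] = []
history = ["small talk", "IMPORTANT: prod DB moved to port 5433", "more chatter"]
history = compress(history, saved)

assert history == ["<summary of 3 messages>"]
assert any("5433" in s for s in saved)  # the flagged fact survived compression
```

Anything the flush pass doesn't flag meets the fate the article describes: it exists only inside the summary, in whatever compressed form the summarizer kept.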
None of this means the built-in system is bad — the four-layer design is intentional and keeps memory transparent, local, and inspectable. But it explains why users who need structured recall, entity resolution, or automatic extraction across long sessions reach for an external provider.
The New Memory Provider System
Hermes recently shipped a pluggable memory provider system that changes the setup significantly. Previously, the only external memory option was Honcho, which had to be configured manually and wasn't enabled by default — a source of significant confusion in the community. The new system replaces that with a unified interface: seven external providers ship with Hermes, and a setup wizard walks you through picking and configuring one.
```
hermes memory setup
```
This opens an interactive picker, installs any required dependencies, prompts for credentials, and activates the provider. One command replaces what used to be manual config file editing.
The architecture is two layers:
- Built-in memory (always on): MEMORY.md and USER.md continue to work as described above. This layer never turns off.
- One external provider (optional): Adds structured capture, better retrieval, cross-session persistence, and provider-specific capabilities on top of the built-in layer.
Check what's active at any time:
```
hermes memory status
```
Turn the external provider off to go back to built-in only:
```
hermes memory off
```
The Seven Providers
| Provider | Storage | Cost | LongMemEval | Key Capability |
|---|---|---|---|---|
| Hindsight | Local or Cloud | Free (local) | 91.4% (Gemini-3) | Knowledge graph, structured facts + entities, reflect synthesis |
| Honcho | Cloud | Varies | — | Dialectic user modeling, peer cards — OSS is AGPL v3.0 |
| Mem0 | Cloud | Freemium | 67.6% (GPT-4o)† | Server-side LLM extraction, circuit breaker |
| OpenViking | Self-hosted | Free | — | Filesystem hierarchy, tiered memory loading (L0/L1/L2) |
| Holographic | Local SQLite | Free | — | FTS5 + trust scoring + HRR algebra — zero pip dependencies |
| RetainDB | Cloud | Paid | — | Hybrid search: vector + BM25 + reranking |
| ByteRover | Local or Cloud | Freemium | — | Pre-compression extraction, knowledge tree |
† Mem0 score from LongMemEval-S, a benchmark variant — not directly comparable to the full LongMemEval used for Hindsight.
Hindsight is the only provider with a reflect operation — a cross-memory synthesis that reads across all stored memories to derive higher-level insights and update the knowledge graph. It's also the only provider that stores structured knowledge (facts, entities, relationships) rather than text chunks, which means retrieval is precise rather than approximate. On LongMemEval, the standard benchmark for agent memory, Hindsight scores 91.4% with Gemini-3, 89.0% with an OSS-120B model, and 83.6% with OSS-20B — the highest results of any provider tested, and notably achievable with open-source models. It runs locally by default with no recurring cost; Hindsight Cloud is available if you want to share memory across machines.
Holographic is notable for zero pip dependencies — it runs on local SQLite and implements its own retrieval using HRR (Holographic Reduced Representations) algebra with trust scoring. Good choice if you want enhanced local memory without installing anything extra.
Honcho is the original external provider and remains the best option if you want dialectic user modeling — a system specifically designed to build a model of how the user thinks, not just what they've said. License note: Honcho's open-source codebase is licensed under the GNU Affero General Public License v3.0 (AGPL v3.0). AGPL is a strong copyleft license: if you self-host Honcho as part of a networked service or application, you are required to release the full source code of that application under AGPL as well. This does not apply if you use Honcho's managed cloud service. For personal self-hosted use this typically isn't an issue, but if you're building a commercial or proprietary product and want to run Honcho yourself, consult legal counsel before doing so — or use the cloud service instead.
Mem0, RetainDB, and ByteRover are cloud-based with varying pricing. Mem0 is the easiest to set up and has a free tier.
OpenViking uses a tiered loading system (L0/L1/L2) that prioritizes recently accessed or frequently relevant memories over flat retrieval.
Choosing Based on Your Situation
All seven providers improve on built-in memory. The right one depends on what matters most to you.
You want the best recall accuracy and don't mind a full local setup → Hindsight. It scores 91.4% on LongMemEval with Gemini-3 — highest of any provider tested, and 89.0% with an open-source 120B model. Structured extraction means it retrieves the right facts, not just chunks that happened to match your query. Runs locally with PostgreSQL bundled; nothing leaves your machine.
You want enhanced local memory with zero extra dependencies → Holographic. Pure SQLite, no pip installs beyond Hermes itself. If you're running in an air-gapped environment or just want something that can't break due to an external API going down, Holographic is the right call. Trade-off: less structured extraction than Hindsight.
You want the fastest setup and are fine with cloud → Mem0. Free tier, 30-second setup, reliable extraction. Scores 67.6% on LongMemEval-S — solid for most use cases. Best starting point if you want something working immediately and can decide on infrastructure later.
You want Hermes to deeply understand how you think, not just what you've said → Honcho. The dialectic user modeling approach builds a model of your reasoning patterns over time, not just a fact store. This is the most differentiated option for personal assistants where the relationship between agent and user deepens with use. Note that Honcho's open-source codebase is AGPL v3.0 licensed — fine for personal use or using the managed cloud service, but self-hosting it as part of a commercial or proprietary product carries source-disclosure obligations.
You're running Hermes in a team environment and need shared memory → Hindsight (Cloud or self-hosted server) or RetainDB. Hindsight's Cloud option gives you a managed shared memory store with the same structured extraction as the local version. RetainDB's hybrid search (vector + BM25 + reranking) is worth evaluating if retrieval precision across a large shared corpus is the priority.
You need memory that survives context compression reliably → ByteRover. Its pre-compression extraction hook fires specifically before Hermes compresses context, ensuring in-flight facts are captured before they're summarized away. Useful for very long-running sessions where compression fires frequently.
Quick reference:
| Situation | Provider |
|---|---|
| Best accuracy, local-first | Hindsight |
| Zero dependencies, local | Holographic |
| Fastest setup, free tier | Mem0 |
| Deep user modeling | Honcho |
| Team / shared memory | Hindsight Cloud or RetainDB |
| Long sessions with frequent compression | ByteRover |
Setting Up Hindsight
Hindsight gives Hermes structured long-term memory: facts, entities, relationships, and a reflect operation that synthesizes across everything it knows. It runs locally by default — no cloud account required, nothing leaves your machine.
```
hermes memory setup
```
Select Hindsight from the picker. The wizard installs hindsight-client, prompts for your configuration, and writes the activation key to config.yaml. On next launch, Hindsight is active.
What changes once it's running:
Before each response, Hindsight queries its knowledge graph for facts relevant to your current message and injects them into context. The agent has relevant history without needing to search for it.
After each turn, the conversation is sent to Hindsight for extraction in the background. Facts are pulled out ("production database runs on port 5433"), entities are identified (people, services, projects), and relationships are mapped ("auth service depends on Redis for sessions"). The agent doesn't spend tokens on this — it happens after the response, not during it.
Periodically, the reflect operation synthesizes across all stored memories to derive higher-level insights and consolidate related facts. This is the closest thing to genuine self-improvement in the memory layer: the system actively refines what it knows rather than just accumulating raw extractions.
Three tools are added to Hermes's tool registry: hindsight_recall, hindsight_retain, and hindsight_reflect. The agent can call these explicitly when needed, but the prefetch-and-sync loop means most memory operations happen automatically.
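The loop can be put together as a toy model — recall before the turn, retain after it. The recall/retain names echo the tools above; everything else, including the keyword "extraction" rule, is a deliberately crude stand-in for Hindsight's LLM-based extraction and knowledge graph:

```python
import re

class ToyHindsight:
    """Toy stand-in: real Hindsight extracts facts/entities with an LLM."""

    def __init__(self) -> None:
        self.facts: list[str] = []

    def retain(self, turn: str) -> None:
        # Crude 'extraction': keep sentences that look like declarative facts.
        for sentence in re.split(r"[.!?]\s*", turn):
            if " runs on " in sentence or " depends on " in sentence:
                self.facts.append(sentence.strip())

    def recall(self, message: str) -> list[str]:
        # Prefetch: facts sharing any word with the incoming message.
        words = set(message.lower().split())
        return [f for f in self.facts if words & set(f.lower().split())]

memory = ToyHindsight()
# After a turn: background extraction, no agent tokens spent.
memory.retain("FYI the production database runs on port 5433. Also, hi!")
# Before the next response: prefetch relevant facts into context.
context = memory.recall("what port is the database on?")
assert any("5433" in fact for fact in context)
```

The structural point survives the simplification: retention happens after the response rather than during it, and recall happens before the model ever sees your message — which is why the agent rarely needs to call the tools by hand.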
For multi-machine setups or shared memory across team instances, point Hindsight at a remote server by setting HINDSIGHT_API_KEY in .env during setup. The full provider documentation covers configuration options, recall tuning, and team deployments.
FAQ
Why hasn't Hermes written anything to my memory files?
The built-in memory is agent-curated — Hermes writes to MEMORY.md and USER.md when it judges something worth persisting, aided by a periodic nudge that prompts it to reflect and save. In very short or narrowly task-focused sessions, nothing notable may be flagged. Also note that mid-session writes don't appear in the system prompt until the next session starts — so even if saves occurred, you won't see them reflected until you restart. If you want automatic capture of everything regardless of session length, an external provider like Hindsight handles that in the background without relying on the agent's judgment.
What's the difference between memory and skills?
Memory (MEMORY.md / USER.md) stores facts about you, your projects, and your environment. Skills are task procedures — markdown documents Hermes writes after completing complex tasks so it can reuse the approach later. Skills are the primary self-improving mechanism. Memory is contextual recall. They're stored separately and work independently.
Do I need an external provider?
No. The built-in memory works without any setup and is sufficient for casual use. An external provider is worth adding if you want automatic extraction rather than agent-initiated saves, structured retrieval that handles rephrasing and relational questions, or memory that persists reliably across long sessions.
Can I use multiple external providers at once?
No. The system enforces a single active external provider. Built-in memory always runs alongside whichever external provider you pick, but you can't run Hindsight and Mem0 simultaneously.
What happened to the old Honcho setup?
Honcho still works and is one of the seven providers. The change is that it now lives inside the unified provider system rather than being a separate standalone integration. If you had Honcho configured previously, Hermes auto-migrates your config — no data loss, no manual steps needed.
Conclusion
Hermes Agent's memory is more capable than new users expect — it's just not automatic by default in the way most people assume. The built-in layer handles persistent facts and user context through agent-managed files. The skills system handles the self-improving loop. These are separate things, and understanding the distinction resolves most of the confusion.
The new provider system makes adding structured, automatic memory straightforward: one command, an interactive wizard, and you have a full external memory backend running alongside the built-in layer.
If you're starting fresh, Hindsight is worth trying first. It's the only provider with structured knowledge extraction, entity resolution, and a reflect operation that synthesizes across everything it knows — and it runs locally with no recurring cost. On LongMemEval it scores highest among all providers tested.
Run hermes memory setup, pick a provider, and let it run for a week. The difference from built-in-only is noticeable within a few sessions. If you go with Hindsight, the integration guide covers the full configuration reference.
For a detailed breakdown of all seven providers, see Hermes Agent Memory Providers: All 7 Options Compared. If you're running into issues with your current setup, Hermes Agent Memory Not Working? Here's Why covers the most common problems and fixes. Comparing Hermes to OpenClaw? See OpenClaw vs Hermes Agent: Memory Compared.