Recall without a vector search: how Memoir finds what it knows

Most agent memory recalls by embedding the query and scanning vectors. Memoir recalls the way you'd browse folders: the agent sees the map of named paths, picks the relevant ones, and reads them — no embeddings, no similarity scan, no extra model.

"Recall" doesn't have to mean "embed the query and cosine-search a vector DB." That's the reflex, but it's a choice — and Memoir makes a different one. Because Memoir already organized every memory into a meaningful taxonomy at write time, recall becomes navigation. And the thing doing the navigating — your agent — is already an LLM, so it can do the choosing itself.

Memoir recalls the way you'd browse folders, not the way a search engine guesses: the agent sees the map of named paths, picks the relevant ones, then reads them. No embeddings, no similarity scan, no extra model.

Let's follow a returning chat. The user says: "set things up the way I like them." Stored from earlier sessions are preferences.ui.theme = "dark mode", preferences.editor.keymap = "vim", profile.personal.health = "allergic to penicillin", and dozens more. Here's how the assistant finds the right ones.

[Figure: caller-driven recall. Message ("set things up the way I like") → summarize (map of stored paths; no LLM) → agent picks (preferences.*) → get (read those keys; no LLM) → answer ("dark mode, vim, …"). The agent asks Memoir for a map of paths, picks the preferences paths itself, then reads exactly those.]
summarize → pick → get. The map and read steps run no LLM inside Memoir; the pick step is the agent's own judgment.
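
As a rough sketch of how a host might wire those three steps together, the snippet below injects the pick step as a function. Everything named here (`MemoirTools`, `callerDrivenRecall`, `fakeTools`, `pickPreferences`) is hypothetical, the summarize result is flattened for brevity, and a prefix filter stands in for the model's choice.

```typescript
// Shapes loosely mirror the tool calls in this post; the interface is an assumption.
interface MemoryEntry {
  key: string;
  content: string;
}

interface MemoirTools {
  summarize(args: { depth: number }): Record<string, number>;
  get(args: { keys: string[] }): MemoryEntry[];
}

// The pick step is injected: in a real host it is the agent's own LLM judgment.
type PickFn = (message: string, pathCounts: Record<string, number>) => string[];

function callerDrivenRecall(tools: MemoirTools, pick: PickFn, message: string): MemoryEntry[] {
  const pathCounts = tools.summarize({ depth: 3 }); // 1. map: no LLM inside Memoir
  const keys = pick(message, pathCounts);           // 2. pick: the agent's judgment
  return tools.get({ keys });                       // 3. read: no LLM inside Memoir
}

// Toy stand-ins so the sketch runs without a server or a model.
const fakeData: Record<string, string> = {
  "preferences.ui.theme": "dark mode",
  "preferences.editor.keymap": "vim",
  "profile.personal.health": "allergic to penicillin",
};

const fakeTools: MemoirTools = {
  // The toy ignores depth and counts every stored path once.
  summarize: () => {
    const counts: Record<string, number> = {};
    for (const key of Object.keys(fakeData)) counts[key] = 1;
    return counts;
  },
  get: ({ keys }) =>
    keys
      .filter((key) => Object.prototype.hasOwnProperty.call(fakeData, key))
      .map((key) => ({ key, content: fakeData[key] })),
};

// A prefix filter stands in for the model's choice.
const pickPreferences: PickFn = (_message, pathCounts) =>
  Object.keys(pathCounts).filter((path) => path.indexOf("preferences.") === 0);

const recalled = callerDrivenRecall(fakeTools, pickPreferences, "set things up the way I like them");
console.log(recalled);
```

Keeping the pick as an injected function makes the boundary explicit: the memory layer only maps and reads, and the judgment stays with the caller.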

Step 1 — get the map with memoir_summarize

The agent first asks for a histogram of stored paths, grouped by taxonomy depth. It's a pure structural scan — instant, no API key, no model:

memoir_summarize({ depth: 3 })
// →
{ "namespaces": { "default": {
    "preferences.ui.theme": 1,
    "preferences.editor.keymap": 1,
    "profile.personal.health": 1,
    "context.people.family": 2
}}}

The agent now sees what exists as readable paths, not opaque vector ids. metrics.* and the internal taxonomy namespace are excluded, so the map covers user facts only.
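
To make "pure structural scan" concrete, here is a minimal sketch of how such a histogram could be computed from a flat list of keys: truncate each path to the requested depth and count. The function name, the deeper `context.people.family.*` keys, and the `metrics.recall.count` key are illustrative assumptions, not Memoir's internals.

```typescript
// Assumed input: a flat list of stored keys. Names are illustrative.
function summarizePaths(keys: string[], depth: number): Record<string, number> {
  const histogram: Record<string, number> = {};
  for (const key of keys) {
    const segments = key.split(".");
    // Skip the metrics branch, mirroring the exclusion described above.
    if (segments[0] === "metrics") continue;
    // Truncate the path to the requested depth and count it in that bucket.
    const bucket = segments.slice(0, depth).join(".");
    histogram[bucket] = (histogram[bucket] || 0) + 1;
  }
  return histogram;
}

const storedKeys = [
  "preferences.ui.theme",
  "preferences.editor.keymap",
  "profile.personal.health",
  "context.people.family.sister",  // hypothetical deeper keys that
  "context.people.family.brother", // collapse into one depth-3 bucket
  "metrics.recall.count",          // hypothetical; excluded from the map
];

const depth3 = summarizePaths(storedKeys, 3);
const depth1 = summarizePaths(storedKeys, 1);
console.log(depth3);
console.log(depth1);
```

The scan is string splitting and counting, which is why this step needs neither a model nor an API key.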

Step 2 — the agent picks

The host model reads the map and selects the paths relevant to "the way I like things" → preferences.ui.theme and preferences.editor.keymap. Memoir doesn't choose — the calling agent does, using the full conversation it already has in context.

[Figure: the agent picks from the map. In the output of memoir_summarize({ depth: 3 }), preferences.ui.theme (1) and preferences.editor.keymap (1) are marked as picked; profile.personal.health (1) and context.people.family (2) are dimmed.]
The agent is already an LLM — let it choose.
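
One way a host could frame the pick step is sketched below: render the map into a prompt, then keep only the picks that actually exist in the map. The prompt wording, the JSON-array reply format, and both function names are assumptions for illustration; the model reply is hard-coded so the sketch runs without a model.

```typescript
// Render the map as a compact prompt for the host model (format is an assumption).
function buildPickPrompt(userMessage: string, pathCounts: Record<string, number>): string {
  const lines = Object.keys(pathCounts).map((p) => `${p} (${pathCounts[p]})`);
  return [
    `User message: ${userMessage}`,
    "Stored memory paths:",
    ...lines,
    "Reply with a JSON array of the paths worth reading.",
  ].join("\n");
}

// Keep only picks that appear in the map, so a made-up path
// never reaches the read step. Malformed replies yield no picks.
function validatePicks(modelReply: string, pathCounts: Record<string, number>): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(modelReply);
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  return parsed.filter(
    (p): p is string =>
      typeof p === "string" && Object.prototype.hasOwnProperty.call(pathCounts, p)
  );
}

const pathMap: Record<string, number> = {
  "preferences.ui.theme": 1,
  "preferences.editor.keymap": 1,
  "profile.personal.health": 1,
  "context.people.family": 2,
};

const pickPrompt = buildPickPrompt("set things up the way I like them", pathMap);

// A reply the host model might produce; the last path does not exist in the map.
const reply = '["preferences.ui.theme", "preferences.editor.keymap", "preferences.ui.font"]';
const picked = validatePicks(reply, pathMap);
console.log(picked); // ["preferences.ui.theme", "preferences.editor.keymap"]
```

Validating against the map is cheap insurance: the read step only ever sees paths that the summarize step reported.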

Step 3 — read exactly those with memoir_get

A batched exact-path fetch. No model, milliseconds:

memoir_get({ keys: ["preferences.ui.theme", "preferences.editor.keymap"] })
// →
[ { "key": "preferences.ui.theme", "content": "dark mode" },
  { "key": "preferences.editor.keymap", "content": "vim" } ]

Drilling a large store

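For a big store, the depth-3 map can be too large to hand to the agent in one prompt, so the agent drills by depth instead: ask for the top level, pick a branch, narrow, repeat. Each step shows only one level of one branch, so the prompt stays roughly constant-size, and the number of steps tracks the depth of the taxonomy rather than the number of stored memories. That is the O(log n)-shaped move.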

[Figure: the narrowing funnel. Depth 1: preferences (28) · context (25) · knowledge (24); the agent picks preferences. Prefix = preferences: .ui · .editor · .food …; the agent picks preferences.ui. Get: preferences.ui.theme. Prompts stay constant-size at each step.]
Top-level histogram → pick a branch → narrow → read.

Vector recall fans out: it compares the query to everything. Memoir narrows in: it walks the tree. That is an O(n) exhaustive scan versus an O(log n) drill, assuming the taxonomy stays reasonably balanced.
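
The drill can be sketched as a loop: list the children of the current prefix, let the picker choose one, and stop when the choice is itself a stored key. This is a toy over an in-memory key list. The `prefix` argument is an assumption drawn from the funnel figure, `preferences.ui.font` is a made-up key, and a scripted picker stands in for the agent.

```typescript
// Histogram of the next path segment under `prefix`.
// The `prefix` argument is an assumption drawn from the funnel figure.
function childCounts(keys: string[], prefix: string): Record<string, number> {
  const base: string[] = prefix === "" ? [] : prefix.split(".");
  const counts: Record<string, number> = {};
  for (const key of keys) {
    const segments = key.split(".");
    if (segments.length <= base.length) continue; // nothing below this key
    if (segments.slice(0, base.length).join(".") !== prefix) continue; // other branch
    const child = segments.slice(0, base.length + 1).join(".");
    counts[child] = (counts[child] || 0) + 1;
  }
  return counts;
}

type BranchPicker = (options: Record<string, number>) => string;

// Drill until the chosen branch is itself a stored key (a leaf).
// `widest` records the largest number of options shown in any one step.
function drill(
  keys: string[],
  pickBranch: BranchPicker
): { leaf: string; steps: number; widest: number } {
  let prefix = "";
  let steps = 0;
  let widest = 0;
  while (keys.indexOf(prefix) === -1) {
    const options = childCounts(keys, prefix);
    const choice = pickBranch(options);
    if (!Object.prototype.hasOwnProperty.call(options, choice)) {
      throw new Error(`picker chose an unknown branch: ${choice}`);
    }
    widest = Math.max(widest, Object.keys(options).length);
    prefix = choice;
    steps += 1;
  }
  return { leaf: prefix, steps, widest };
}

const drillKeys = [
  "preferences.ui.theme",
  "preferences.ui.font", // hypothetical extra key
  "preferences.editor.keymap",
  "profile.personal.health",
  "context.people.family",
];

// Scripted stand-in for the agent: always follow the branch leading to `wanted`.
const wanted = "preferences.ui.theme";
const scriptedPicker: BranchPicker = (options) =>
  Object.keys(options).filter((o) => wanted === o || wanted.indexOf(o + ".") === 0)[0];

const outcome = drill(drillKeys, scriptedPicker);
console.log(outcome); // { leaf: "preferences.ui.theme", steps: 3, widest: 3 }
```

Each iteration shows the picker only the children of one node, which is what keeps the per-step prompt small even as the store grows.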

The recall modes

The caller-driven drill above (summarize → get) is the recommended default, because the agent with full context picks the keys. There's also a one-shot memoir_recall shortcut with a mode for when the host can't drill:

Four recall modes, compared by who picks the keys, whether an LLM runs inside Memoir, whether a key is needed, and latency:

  • caller drill (default): summarize → get. Picks: host agent. LLM inside Memoir: none. Key: no. Latency: fast. Suits: an agent that has context.
  • lexical: keyword rank. Picks: keyword rank. LLM inside Memoir: none. Key: no. Latency: instant. Suits: quick lookups.
  • single: 1 LLM call. Picks: Memoir's LLM. LLM inside Memoir: 1 call. Key: yes. Latency: ~0.5–0.8s. Suits: a small, fuzzy store.
  • tiered: multi-step LLM. Picks: Memoir's LLM. LLM inside Memoir: 2–3 calls. Key: yes. Latency: ~1–2s. Suits: a large or noisy store.
More modes exist — but if the host can drill, it should. single/tiered need a key and fall back to lexical without one.
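
The fallback rule can be written down in a few lines. The sketch below also includes a toy keyword rank to show the kind of work a lexical mode does without a model. Both functions are illustrative assumptions: the scoring in particular is a plausible stand-in, not Memoir's actual ranking.

```typescript
type RecallMode = "lexical" | "single" | "tiered";

// single and tiered need an API key; without one they fall back to lexical.
function effectiveMode(requested: RecallMode, hasApiKey: boolean): RecallMode {
  const needsKey = requested === "single" || requested === "tiered";
  return needsKey && !hasApiKey ? "lexical" : requested;
}

interface StoredEntry {
  key: string;
  content: string;
}

// Illustrative keyword rank: score = number of query terms found in the
// path or content. This is NOT Memoir's actual ranking, only a stand-in.
function keywordRank(query: string, entries: StoredEntry[]): StoredEntry[] {
  const terms = query.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
  return entries
    .map((entry) => {
      const haystack = (entry.key + " " + entry.content).toLowerCase();
      const score = terms.filter((t) => haystack.indexOf(t) !== -1).length;
      return { entry, score };
    })
    .filter((scored) => scored.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((scored) => scored.entry);
}

const sampleEntries: StoredEntry[] = [
  { key: "profile.personal.health", content: "allergic to penicillin" },
  { key: "preferences.editor.keymap", content: "vim" },
  { key: "preferences.ui.theme", content: "dark mode" },
];

console.log(effectiveMode("tiered", false)); // "lexical"
const ranked = keywordRank("dark theme, vim keys", sampleEntries);
console.log(ranked.map((e) => e.key));
```

A lexical rank only sees word overlap, which is one reason the caller drill, where the agent reads the labels with the conversation in context, is the recommended default.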

Why this works

Two ideas carry the whole design:

  • Structure beats similarity. Because writes are classified into a meaningful taxonomy, reads are navigation. You don't guess relevance with vectors — you see the labels.
  • The agent is already an LLM. Don't pay for a second model inside the memory layer to pick keys. Hand the map to the agent that already understands the conversation — keyless, low-latency recall by default.
[Figure: fan-out vs narrow-in. Vector: the query fans out to every embedding (compare to everything; needs a model). Memoir: the query walks down a tidy tree to a leaf (no model).]
O(n) similarity scan vs O(log n) taxonomy drill.
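
A back-of-the-envelope version of that comparison, under assumptions the post does not state: n stored paths, a balanced taxonomy, and at most b children per node. The O(n) figure describes an exhaustive scan; the drill estimate degrades if the tree is lopsided.

```latex
% Assumptions (not from the post): n stored paths, a balanced taxonomy,
% and at most b children per node.
\begin{align*}
\text{exhaustive vector scan:} \quad & n \text{ similarity comparisons} \\
\text{taxonomy drill:} \quad & d = \lceil \log_b n \rceil \text{ steps, each listing at most } b \text{ entries} \\
\text{entries shown in total:} \quad & \le b \, \lceil \log_b n \rceil \\
\text{example } (n = 1000,\ b = 10): \quad & d = 3, \quad 3 \times 10 = 30 \text{ entries versus } 1000 \text{ comparisons}
\end{align*}
```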

Remember classifies a fact into a path; recall walks back to it. The intelligence lives in the structure — so finding a memory is browsing, not searching. The same flow powers the Claude Code, Hermes, and OpenClaw plugins.