Stop guessing what your agent cost
Memoir tracks turns, tool calls, errors, and latency per git branch — automatically, with zero LLM in the loop. Here's what changes when you can finally read that.
The invoice problem
You finish a feature. Diff looks clean. You merge. Three weeks later the bill arrives and you have no idea which branches were the expensive ones — let alone why.
The information was there the whole time. Every Claude Code session writes a JSONL transcript: tool calls, tool results, timestamps, model output. Nobody reads it. Even if you wanted to, doing it after the fact across forty branches isn't a project — it's a punishment.
The fix is to read it once, when it's cheap, and store the answer where you can find it later.
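As a sketch of what "reading it once" looks like: walk the transcript line by line, count what you care about, and throw the rest away. The entry shapes below are illustrative assumptions, not Claude Code's actual transcript schema.

```python
import json

# Hypothetical JSONL transcript lines; field names ("type", "is_error")
# are assumptions for illustration, not the real schema.
transcript = [
    '{"type": "tool_call", "name": "read_file"}',
    '{"type": "tool_result", "is_error": false, "content": "..."}',
    '{"type": "tool_call", "name": "grep"}',
    '{"type": "tool_result", "is_error": true, "content": "permission denied"}',
]

calls = errors = 0
for line in transcript:
    entry = json.loads(line)
    if entry["type"] == "tool_call":
        calls += 1  # one tool invocation
    elif entry["type"] == "tool_result" and entry["is_error"]:
        errors += 1  # a call that came back failed

print(calls, errors)  # 2 1
```

No model in the loop: the whole pass is `json.loads` plus integer counting.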
Branches are budgets
Metrics aren't a global counter. They're per-branch. The Stop hook reads which memoir branch you're on (it follows your code branch automatically) and writes the deltas to metrics.turn.<branch>.
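A minimal in-memory sketch of that per-branch accumulation. Memoir's real store is a tree write under metrics.turn.<branch>, not a Python dict, and the field names here are illustrative.

```python
from collections import Counter

# Per-branch tallies, keyed by branch name (illustrative stand-in
# for Memoir's tree write under metrics.turn.<branch>).
metrics: dict[str, Counter] = {}

def record_turn(branch: str, calls: int, errors: int, latency_ms: int) -> None:
    """Add one turn's deltas into the branch's running tally."""
    tally = metrics.setdefault(branch, Counter())
    tally["turns"] += 1
    tally["calls"] += calls
    tally["errors"] += errors
    tally["latency_ms"] += latency_ms

record_turn("feature/foo", calls=12, errors=0, latency_ms=90_000)
record_turn("feature/foo", calls=7, errors=1, latency_ms=45_000)

avg_latency_ms = metrics["feature/foo"]["latency_ms"] / metrics["feature/foo"]["turns"]
print(avg_latency_ms)  # 67500.0
```

Because each turn is just integer addition into the current branch's bucket, the work stays attributable without any instrumentation step.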
That single design choice changes what you can do:
- One branch ≈ one feature ≈ one cost line. Finish a branch, look at its row, know what the work took. No instrumentation step. No "remember to start a timer."
- Identity survives merges. When a feature lands on main, its metrics stay on feature/foo. main doesn't accumulate every branch's tally into one inflated bucket — each feature stays attributable.
- Branches rank against each other. Same agent, same model, four columns. The shape of the work is legible at a glance.
Branch                        Turns  Calls  Errors  Avg latency
main                             76    390      10       85.2 s
feature/stop.hook.stats           3     93       0      391.1 s
feature/metric.codebase.stas      8     86       1      140.9 s
feature/add.forget.ui             1      4       0       52.1 s

feature/stop.hook.stats jumps out: 3 turns, 93 tool calls, ~6.5 minutes per turn. That's the shape of "agent did a lot of exploratory reading per request." feature/add.forget.ui is the opposite — tight, surgical, one turn. You can read the style of work each branch represents, not just its cost.
Catch the thrash, not the invoice
Two columns are diagnostic, not descriptive: Repeats and Errors.
- Repeats > 0 means the agent issued the same tool call with the same args more than once. Usually a sign it didn't read the previous result. A branch heavy on repeats is a prompt that's leaving the agent guessing.
- Errors trending up is a tools / permissions / setup story. A tall error bar means a branch fought the sandbox, the auth, or the file system. You don't need to know which call — you just know to investigate before the next branch repeats it.
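Detecting a repeat is cheap: key each call by its name plus a canonical serialization of its arguments, and count collisions. A sketch, with hypothetical call records:

```python
import json
from collections import Counter

# Hypothetical tool-call records; keying on (name, sorted-args JSON)
# is one way to define "same tool call with the same args".
calls = [
    {"name": "read_file", "args": {"path": "src/main.py"}},
    {"name": "grep", "args": {"pattern": "TODO"}},
    {"name": "read_file", "args": {"path": "src/main.py"}},  # repeat
]

seen = Counter(
    (c["name"], json.dumps(c["args"], sort_keys=True)) for c in calls
)
# Each extra occurrence beyond the first counts as one repeat.
repeats = sum(n - 1 for n in seen.values() if n > 1)
print(repeats)  # 1
```

Sorting the args before serializing makes the key stable even when the agent emits the same arguments in a different order.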
Catching this during the work, not in the post-mortem, is the whole point.
If your only signal that a branch was expensive is the next invoice, you've already paid to find out.
Find the shape of cheap features
Some branches are cheap. Low calls per turn, low output chars, low latency. They tend to share two things: a clear target and a focused prompt.
Once you can see which branches were cheap, you can deliberately replicate their shape — same scoping discipline, same instructions, same agent. Cheap branches aren't a coincidence; they're a pattern that's now visible.
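With the table in hand, "cheap shape" is just a sort. Using the (turns, calls) pairs from the table above:

```python
# Rank branches by tool calls per turn, cheapest shape first.
# Numbers come from the per-branch table in this post.
branches = {
    "main": (76, 390),
    "feature/stop.hook.stats": (3, 93),
    "feature/metric.codebase.stas": (8, 86),
    "feature/add.forget.ui": (1, 4),
}

by_calls_per_turn = sorted(
    branches, key=lambda b: branches[b][1] / branches[b][0]
)
print(by_calls_per_turn[0])   # feature/add.forget.ui
print(by_calls_per_turn[-1])  # feature/stop.hook.stats
```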
Free is the whole point
Every previous attempt at "agent observability" hit the same wall: the observability cost more than the agent did. Add a layer that calls a model to summarize each turn, and three weeks later your audit budget has eaten your build budget.
The metrics path runs no model calls. It's:
- a Python read of the JSONL transcript (string lengths, counts, timestamp deltas)
- integer addition into the existing branch tally
- one tree write under metrics.turn.<branch>

Milliseconds per turn, kilobytes per branch. Toggleable with MEMOIR_NO_METRICS=1. The schema stays narrow on purpose: nine integers, no per-tool breakdown, no per-model split, no dollar conversion. Wider schemas need expensive merges. Narrow schemas stay free — and a free signal is one you'll actually leave on.
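The arithmetic behind "milliseconds per turn" really is this small. A sketch with assumed timestamp and role fields, using only string lengths and a timestamp delta:

```python
from datetime import datetime

# Illustrative turn events: (timestamp, role, text).
# The ISO timestamp format and role names are assumptions.
events = [
    ("2024-05-01T10:00:00", "user", "Fix the failing test"),
    ("2024-05-01T10:01:25", "assistant", "Patched the fixture " * 40),
]

start = datetime.fromisoformat(events[0][0])
end = datetime.fromisoformat(events[-1][0])
latency_s = (end - start).total_seconds()

# Char counts, not tokens: just string lengths over the model's output.
output_chars = sum(len(text) for _ts, role, text in events if role == "assistant")

print(latency_s, output_chars)  # 85.0 800
```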
Char counts aren't tokens, but they correlate to within ~5% on real workloads. That's plenty to compare branches against each other. For invoicing, your provider's dashboard is still the source of truth. The point isn't to replace it — it's to give you a number you can read before it shows up there.
Try it now
Once Memoir is installed, metrics start collecting on the next turn — no flag to flip. View them with:
$ /memoir:ui

The Statistics modal's Metrics tab renders the per-branch table plus four bar distributions: avg latency, tool errors, output chars, tool result chars.
Read the meter while you're driving, not after the bill arrives.