Stop guessing what your agent cost

Memoir tracks turns, tool calls, errors, and latency per git branch — automatically, with zero LLM in the loop. Here's what changes when you can finally read that.

The invoice problem

You finish a feature. Diff looks clean. You merge. Three weeks later the bill arrives and you have no idea which branches were the expensive ones — let alone why.

The information was there the whole time. Every Claude Code session writes a JSONL transcript: tool calls, tool results, timestamps, model output. Nobody reads it. Even if you wanted to, doing it after the fact across forty branches isn't a project — it's a punishment.

The fix is to read it once, when it's cheap, and store the answer where you can find it later.

Branches are budgets

Metrics aren't a global counter. They're per-branch. The Stop hook reads which memoir branch you're on (it follows your code branch automatically) and writes the deltas to metrics.turn.<branch>.

That single design choice changes what you can do:

  • One branch ≈ one feature ≈ one cost line. Finish a branch, look at its row, know what the work took. No instrumentation step. No "remember to start a timer."
  • Identity survives merges. When a feature lands on main, its metrics stay on feature/foo. main doesn't accumulate every branch's tally into one inflated bucket — each feature stays attributable.
  • Branches rank against each other. Same agent, same model, four columns. The shape of the work is legible at a glance.

Branch                              Turns    Calls   Errors   Avg latency
main                                   76      390       10       85.2 s
feature/stop.hook.stats                 3       93        0      391.1 s
feature/metric.codebase.stas            8       86        1      140.9 s
feature/add.forget.ui                   1        4        0       52.1 s

feature/stop.hook.stats jumps out: 3 turns, 93 tool calls, ~6.5 minutes per turn. That's the shape of "agent did a lot of exploratory reading per request." feature/add.forget.ui is the opposite — tight, surgical, one turn. You can read the style of work each branch represents, not just its cost.
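
To make the "shape" reading concrete, here is a minimal sketch that derives a few ratios from the four rows above. The row values are copied from the table; the ratio names (calls_per_turn, error_rate, minutes_per_turn) are illustrative and are not fields Memoir is known to store.

```python
# Rows copied from the table above:
# (branch, turns, calls, errors, avg latency in seconds per turn)
ROWS = [
    ("main", 76, 390, 10, 85.2),
    ("feature/stop.hook.stats", 3, 93, 0, 391.1),
    ("feature/metric.codebase.stas", 8, 86, 1, 140.9),
    ("feature/add.forget.ui", 1, 4, 0, 52.1),
]


def shape(rows):
    """Return derived ratios per branch, sorted by calls per turn (highest first)."""
    out = []
    for branch, turns, calls, errors, latency_s in rows:
        out.append({
            "branch": branch,
            # Guard against empty branches so the sketch never divides by zero.
            "calls_per_turn": calls / turns if turns else 0.0,
            "error_rate": errors / calls if calls else 0.0,
            "minutes_per_turn": latency_s / 60,
        })
    return sorted(out, key=lambda r: r["calls_per_turn"], reverse=True)


for row in shape(ROWS):
    print(
        f'{row["branch"]:<32}'
        f'{row["calls_per_turn"]:>8.1f}'
        f'{row["error_rate"]:>8.1%}'
        f'{row["minutes_per_turn"]:>8.1f}'
    )
```

Sorted this way, feature/stop.hook.stats lands first at 31 calls per turn and feature/add.forget.ui last at 4, which matches the exploratory-versus-surgical reading above.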

Catch the thrash, not the invoice

Two columns are diagnostic, not descriptive: Repeats (not shown in the excerpt above) and Errors.

  • Repeats > 0 means the agent issued the same tool call with the same args more than once. Usually a sign it didn't read the previous result. A branch heavy on repeats is a prompt that's leaving the agent guessing.
  • Errors trending up is a tools / permissions / setup story. A tall error bar means a branch fought the sandbox, the auth, or the file system. You don't need to know which call — you just know to investigate before the next branch repeats it.
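
A repeat count is cheap to compute. The sketch below is one way to do it, assuming each tool call has already been reduced to a name plus an args dict. That record shape and the counting rule (every occurrence after the first counts as one repeat) are assumptions for illustration, not Memoir's documented behavior.

```python
import json
from collections import Counter


def count_repeats(tool_calls):
    """Count tool calls that exactly duplicate an earlier call (same name, same args)."""
    seen = Counter()
    for call in tool_calls:
        # Serialize args with sorted keys so key order does not hide a duplicate.
        key = (call["name"], json.dumps(call["args"], sort_keys=True))
        seen[key] += 1
    # Each occurrence beyond the first is counted as one repeat.
    return sum(n - 1 for n in seen.values())


# Hypothetical calls from one session: the same Read is issued three times.
calls = [
    {"name": "Read", "args": {"path": "a.py"}},
    {"name": "Read", "args": {"path": "a.py"}},
    {"name": "Grep", "args": {"pattern": "TODO", "path": "."}},
    {"name": "Read", "args": {"path": "a.py"}},
]
print(count_repeats(calls))
```

A stricter variant could count only back-to-back duplicates, which would ignore a legitimate re-read after an edit.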

Catching this during the work, not in the post-mortem, is the whole point.

If your only signal that a branch was expensive is the next invoice, you've already paid to find out.

Find the shape of cheap features

Some branches are cheap. Low calls per turn, low output chars, low latency. They tend to share two things: a clear target and a focused prompt.

Once you can see which branches were cheap, you can deliberately replicate their shape — same scoping discipline, same instructions, same agent. Cheap branches aren't a coincidence; they're a pattern that's now visible.

Free is the whole point

Every previous attempt at "agent observability" hit the same wall: the observability cost more than the agent did. Add a layer that calls a model to summarize each turn, and three weeks later your audit budget has eaten your build budget.

The metrics path runs no model calls. It's:

  • a Python read of the JSONL transcript (string lengths, counts, timestamp deltas)
  • integer addition into the existing branch tally
  • one tree write under metrics.turn.<branch>
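
The three steps above can be sketched in a few lines of standard-library Python. Everything specific here is an assumption: the record fields (ts, kind, text, is_error) are simplified stand-ins rather than the real transcript schema, the counter names cover only some of the nine integers, and a plain dict stands in for the tree write.

```python
import json
from datetime import datetime


def turn_deltas(lines):
    """Derive counters for one turn from JSONL lines, with no model call."""
    deltas = {"turns": 1, "calls": 0, "errors": 0,
              "output_chars": 0, "tool_result_chars": 0, "latency_ms": 0}
    stamps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        stamps.append(datetime.fromisoformat(rec["ts"]))
        kind = rec["kind"]
        if kind == "tool_call":
            deltas["calls"] += 1
        elif kind == "tool_result":
            deltas["tool_result_chars"] += len(rec["text"])
            if rec.get("is_error"):
                deltas["errors"] += 1
        elif kind == "output":
            deltas["output_chars"] += len(rec["text"])
    if stamps:
        # Latency here is the span between the first and last record of the turn.
        span = max(stamps) - min(stamps)
        deltas["latency_ms"] = int(span.total_seconds() * 1000)
    return deltas


def add_into(tally, deltas):
    """Integer addition of one turn's deltas into the existing branch tally."""
    for key, value in deltas.items():
        tally[key] = tally.get(key, 0) + value
    return tally


# Hypothetical transcript lines for a single turn.
sample = [
    '{"ts": "2025-01-01T10:00:00+00:00", "kind": "tool_call"}',
    '{"ts": "2025-01-01T10:00:02+00:00", "kind": "tool_result", "text": "abc", "is_error": true}',
    '{"ts": "2025-01-01T10:00:05+00:00", "kind": "output", "text": "done!"}',
]

store = {}  # stand-in for the tree; one key per branch
key = "metrics.turn.feature/foo"
store[key] = add_into(store.get(key, {}), turn_deltas(sample))
print(store[key])
```

Because each step is a length, a count, or a subtraction, the cost scales with transcript size and nothing else.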

It costs milliseconds per turn and kilobytes per branch. Set MEMOIR_NO_METRICS=1 to turn it off. The schema stays narrow on purpose: nine integers, no per-tool breakdown, no per-model split, no dollar conversion. Wider schemas need expensive merges. Narrow schemas stay free, and a free signal is one you'll actually leave on.

Char counts aren't tokens, but they track token counts to within ~5% on real workloads. That's plenty to compare branches against each other. For invoicing, your provider's dashboard is still the source of truth. The point isn't to replace it; it's to give you a number you can read before it shows up there.


Try it now

Once Memoir is installed, metrics start collecting on the next turn — no flag to flip. View them with:

$ /memoir:ui

The Statistics modal's Metrics tab renders the per-branch table plus four bar distributions: avg latency, tool errors, output chars, tool result chars.

Read the meter while you're driving, not after the bill arrives.