Token ledger — where did the tokens go?

a pocket guide to spending less on claude code. ten rules, in order of how much they actually move the needle. read it before you start a long session — not after the receipts come in.

most important

cache preservation

pick a model. stick with it.

Never switch models mid-session. Each model has its own cache; switching forces a full cold re-read of identical content.
Don't add or remove MCP servers mid-session — changing the tool list busts the cache.
Decide your model at session start; if a subtask needs a cheaper one, spin a subagent or open a fresh short session — never switch the main one.

rule of thumb: the cache is a coffee that goes cold the second you stir it. don't stir.
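
Why switching busts the cache can be sketched with a toy model. This is a simplification under stated assumptions, not how the real cache is implemented: it treats the cache as an exact-match lookup keyed on model, tool list, and prompt prefix, and the names `cache_key`, `ToyPromptCache`, `model-a`, and `model-b` are hypothetical.

```python
import hashlib
import json


def cache_key(model: str, tools: list[str], prefix: str) -> str:
    """Hypothetical key: changing the model, the tool list, or the prefix gives a new key."""
    payload = json.dumps({"model": model, "tools": tools, "prefix": prefix})
    return hashlib.sha256(payload.encode()).hexdigest()


class ToyPromptCache:
    """Toy stand-in for a prompt cache (assumption: exact-match lookup only)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def request(self, model: str, tools: list[str], prefix: str) -> str:
        key = cache_key(model, tools, prefix)
        if key in self._seen:
            return "hit"
        self._seen.add(key)  # first sighting: pay the full read, then remember it
        return "miss"


cache = ToyPromptCache()
print(cache.request("model-a", ["gh"], "system + history"))         # miss (cold start)
print(cache.request("model-a", ["gh"], "system + history"))         # hit  (nothing changed)
print(cache.request("model-b", ["gh"], "system + history"))         # miss (model switched)
print(cache.request("model-a", ["gh", "extra"], "system + history"))  # miss (tool list changed)
```

The content is identical in all four requests; only the model or tool list moved, and that alone is enough to lose the hit in this model.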

context management

clear, compact, watch.

Use /clear between unrelated tasks — stale context costs tokens on every message.
Run /compact at ~70% context usage instead of waiting for auto-compact at the default 95%.
Add custom compact instructions: keep current goal, changed files, errors, decisions. Drop everything else.
Keep /usage open and glance at it.
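
The cost of stale context can be sketched with simple arithmetic. This is an illustration with made-up numbers, assuming every turn re-sends the whole accumulated context as input; it ignores output tokens and any cache discount. The function name `total_input_tokens` is hypothetical.

```python
def total_input_tokens(turn_sizes: list[int], carried_context: int = 0) -> int:
    """Sum input tokens over a session where each turn re-sends all prior context."""
    total = 0
    context = carried_context
    for new_tokens in turn_sizes:
        context += new_tokens  # this turn's message joins the context
        total += context       # the whole context goes out as input on this turn
    return total


turns = [1_000] * 10  # ten short messages on a new task (illustrative sizes)

# Without /clear: 50,000 tokens left over from the previous task ride along every turn.
print(total_input_tokens(turns, carried_context=50_000))  # 555000

# With /clear first: only the new task's context is re-sent.
print(total_input_tokens(turns))  # 55000
```

In this example the leftover context accounts for 500,000 of the 555,000 input tokens, which is why clearing between unrelated tasks matters more than trimming individual prompts.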

CLAUDE.md

under 200 lines. essentials only.

It loads every single session — every line is a tax you pay forever.
Move workflow-specific instructions (PR review, migrations, etc.) into Skills so they load on demand.
If you can't justify a line to a future you, it shouldn't be there.
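
A line budget is easy to check mechanically. The sketch below assumes CLAUDE.md sits in the current directory; the helper names `count_lines` and `within_budget` are hypothetical, and the 200-line ceiling is simply this guide's rule.

```python
from pathlib import Path

LINE_BUDGET = 200  # this guide's ceiling: stay under 200 lines


def count_lines(text: str) -> int:
    """Count lines the way an editor would, ignoring a trailing newline."""
    return len(text.splitlines())


def within_budget(text: str, budget: int = LINE_BUDGET) -> bool:
    """True when the file is strictly under the budget."""
    return count_lines(text) < budget


if __name__ == "__main__":
    path = Path("CLAUDE.md")  # assumed location: project root, current directory
    if path.exists():
        text = path.read_text(encoding="utf-8")
        status = "within" if within_budget(text) else "over"
        print(f"{path}: {count_lines(text)} lines ({status} budget of {LINE_BUDGET})")
```

Running it as a pre-commit step or before a long session is one way to notice the file creeping upward.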

model choice

decide before you start.

Default to Sonnet.
Opus only for genuinely hard architectural / reasoning work.
Haiku for subagents doing simple things.
Decide upfront: switching mid-session breaks the cache (see cache preservation, above).

prompting

be specific.

"add input validation to the login function in auth.ts" — not "improve the codebase".
Use plan mode (Shift+Tab) before complex tasks so Claude doesn't burn tokens going down the wrong path.

subagents

delegate the noise.

Send verbose operations (test runs, log processing, doc fetching) to a subagent.
The noisy output stays in the subagent's context. Only the summary comes back to you.
Pair with Haiku for simple, repetitive subagent work.

MCP servers

trim. then commit.

Disable unused servers — even when deferred they add overhead.
Prefer CLI tools (gh, aws, gcloud) over their MCP equivalents.
Configure every server you'll need before a long task. Adding one midway busts the cache.

extended thinking

don't think when you don't need to.

Lower effort with /effort or set MAX_THINKING_TOKENS=8000 for simple tasks.
Disable thinking entirely in /config for straightforward work; it's invisible spend.

cache ttl awareness

mind the clock. your plan decides how forgiving it is.

pro plan

5-minute TTL

go idle mid-task and the next message pays the full re-upload cost. don't take long coffee breaks.

max plan

1-hour TTL

much more forgiving. you can step away and pick up where you left off without bleeding tokens.
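
The effect of the two TTLs on a break can be sketched as a simple comparison. This assumes the timer restarts on every message, so only the idle gap since the last one matters; the function name `cache_is_warm` and the 12-minute break are illustrative.

```python
FIVE_MINUTES = 5 * 60  # seconds, the shorter TTL described above
ONE_HOUR = 60 * 60     # seconds, the longer TTL described above


def cache_is_warm(idle_seconds: float, ttl_seconds: float) -> bool:
    """True while the gap since the last message is still shorter than the TTL."""
    return idle_seconds < ttl_seconds


coffee_break = 12 * 60  # a 12-minute break, in seconds

print(cache_is_warm(coffee_break, FIVE_MINUTES))  # False: next message pays the full re-read
print(cache_is_warm(coffee_break, ONE_HOUR))      # True: pick up where you left off
```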

.claudeignore

tell it what not to look at.

Add node_modules, build artifacts, and generated files to .claudeignore: anything Claude shouldn't read but might pull in.
If you've ever watched it grep a lockfile, you needed this yesterday.

stop the bleed. save the tokens.

the ten rules.

pick a model. stick with it.

clear, compact, watch.

under 200 lines. essentials only.

decide before you start.

be specific.

delegate the noise.

trim. then commit.

don't think when you don't need to.

mind the clock. your plan decides how forgiving it is.

tell it what not to look at.

now go audit a session.
