token saver · field guide · v1 · 10 rules · 4 min read

save the tokens. (the expensive ones.)

a pocket guide to spending less on claude code. ten rules, in order of how much they actually move the needle. read it before you start a long session — not after the receipts come in.

the headline rule
don't switch
never change models mid-session. each one has its own cache.
cache ttl
5 min / 1 hr
pro plan vs. max plan. on pro, don't go idle mid-task.
CLAUDE.md cap
≤ 200 lines
loaded every session. essentials only.
compact early
at ~70%
not the default 95%. preserve the current goal, errors, and decisions.

the ten rules.

ordered roughly by impact. the cache one matters more than all the others combined.
most important
01
cache preservation

pick a model. stick with it.

  • Never switch models mid-session. Each model has its own cache; switching forces a full cold re-read of identical content.
  • Don't add or remove MCP servers mid-session — changing the tool list busts the cache.
  • Decide your model at session start. If a subtask needs a cheaper one, spin up a subagent or open a fresh short session; never switch the main one.
rule of thumb: the cache is a coffee that goes cold the second you stir it. don't stir.
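
to put a rough number on a cold re-read, here is a minimal cost model. the multipliers are assumptions taken from Anthropic's published prompt-caching pricing (cache reads at roughly 0.1x the base input price, 5-minute cache writes at roughly 1.25x); the function name and token counts are invented for illustration, and the prefix is held at a fixed size, which real sessions never do.

```python
# Rough cost model for a cached prefix, measured in "base input token" units.
# ASSUMPTION: these multipliers mirror Anthropic's published prompt-caching
# pricing at the time of writing; check current pricing before relying on them.
CACHE_READ_MULT = 0.10   # reading an already-cached prefix
CACHE_WRITE_MULT = 1.25  # writing a prefix into the 5-minute cache


def prefix_cost(prefix_tokens: int, turns: int, cold_turns: int = 1) -> float:
    """Cost of re-sending the same prefix over `turns` messages.

    `cold_turns` counts the messages that miss the cache: the first one,
    plus one for every model switch or tool-list change.
    """
    if not 1 <= cold_turns <= turns:
        raise ValueError("cold_turns must be between 1 and turns")
    warm_turns = turns - cold_turns
    return prefix_tokens * (
        cold_turns * CACHE_WRITE_MULT + warm_turns * CACHE_READ_MULT
    )


# Hypothetical session: a 100k-token prefix over 20 messages.
steady = prefix_cost(100_000, turns=20)                  # one cold start
switched = prefix_cost(100_000, turns=20, cold_turns=3)  # two mid-session switches
print(f"steady: {steady:,.0f}  switched: {switched:,.0f}")
```

under these assumptions, every extra cold turn costs about as much as a dozen warm ones.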
02
context management

clear, compact, watch.

  • Use /clear between unrelated tasks — stale context costs tokens on every message.
  • Run /compact at ~70%, not the default 95%.
  • Add custom compact instructions: keep current goal, changed files, errors, decisions. Drop everything else.
  • Check /usage regularly so the numbers never surprise you.
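
the 70% rule is simple arithmetic. a minimal sketch, assuming you can read the used-token count off the session and that the window size (200,000 here) is a placeholder for whatever your model actually offers:

```python
def context_status(used_tokens: int, window_tokens: int, threshold: float = 0.70) -> str:
    """Report context fill and suggest compacting once `threshold` is reached."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    fill = used_tokens / window_tokens
    action = "compact now" if fill >= threshold else "ok"
    return f"{fill:.0%} used: {action}"


# Hypothetical numbers: 150k tokens used out of an assumed 200k window.
print(context_status(150_000, 200_000))
```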
03
CLAUDE.md

under 200 lines. essentials only.

  • It loads every single session — every line is a tax you pay forever.
  • Move workflow-specific instructions (PR review, migrations, etc.) into Skills so they load on demand.
  • If you can't justify a line to a future you, it shouldn't be there.
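
the cap is easy to enforce mechanically. a small sketch that counts lines in CLAUDE.md and flags anything over 200; the function name is made up, and the path assumes the file sits in the current directory:

```python
from pathlib import Path

LINE_CAP = 200  # the cap this guide recommends


def check_claude_md(path: str = "CLAUDE.md", cap: int = LINE_CAP) -> tuple[int, bool]:
    """Return (line count, whether the file is within the cap)."""
    text = Path(path).read_text(encoding="utf-8")
    count = len(text.splitlines())
    return count, count <= cap


# Only report when the file exists, so the sketch is safe to run anywhere.
if Path("CLAUDE.md").exists():
    count, ok = check_claude_md()
    print(f"CLAUDE.md: {count} lines ({'ok' if ok else 'over the cap'})")
```

wired into a pre-commit hook or CI step, this keeps the file from creeping back up.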
04
model choice

decide before you start.

  • Default to Sonnet.
  • Opus only for genuinely hard architectural / reasoning work.
  • Haiku for subagents doing simple things.
  • Decide upfront — switching mid-session breaks the cache (see Rule 01).
05
prompting

be specific.

  • "add input validation to the login function in auth.ts" — not "improve the codebase".
  • Use plan mode (Shift+Tab) before complex tasks so Claude doesn't burn tokens going down the wrong path.
06
subagents

delegate the noise.

  • Send verbose operations (test runs, log processing, doc fetching) to a subagent.
  • The noisy output stays in their context. Only the summary comes back to you.
  • Pair with Haiku for simple, repetitive subagent work.
07
MCP servers

trim. then commit.

  • Disable unused servers — even when deferred they add overhead.
  • Prefer CLI tools (gh, aws, gcloud) over their MCP equivalents.
  • Configure every server you'll need before a long task. Adding one midway busts the cache.
08
extended thinking

don't think when you don't need to.

  • Lower effort with /effort or set MAX_THINKING_TOKENS=8000 for simple tasks.
  • Disable thinking entirely in /config for straightforward work; it's invisible spend.
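
if you launch sessions from a script, the cap can be set per launch instead of globally. a minimal sketch: MAX_THINKING_TOKENS is the variable named above, while the helper name and the commented-out launch line (which assumes a `claude` executable on your PATH) are illustrative.

```python
import os


def thinking_env(budget_tokens: int = 8000) -> dict[str, str]:
    """Copy the current environment with a thinking-token cap applied."""
    if budget_tokens < 0:
        raise ValueError("budget_tokens must be non-negative")
    env = dict(os.environ)  # copy, so the parent process stays untouched
    env["MAX_THINKING_TOKENS"] = str(budget_tokens)
    return env


env = thinking_env()
print(env["MAX_THINKING_TOKENS"])
# To launch with the cap (ASSUMPTION: the `claude` CLI is on PATH):
#   import subprocess; subprocess.run(["claude"], env=env)
```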
09
cache ttl awareness

mind the clock. your plan decides how forgiving it is.

pro plan
5-minute TTL
go idle mid-task and the next message pays the full re-upload cost. don't take long coffee breaks.
max plan
1-hour TTL
much more forgiving. you can step away and pick up where you left off without bleeding tokens.
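
the clock check itself is one comparison. a sketch using the TTL figures quoted in this guide, and assuming the timer restarts with every message you send; the function and dictionary names are made up:

```python
from datetime import datetime, timedelta

# ASSUMPTION: per-plan TTLs are the figures quoted in this guide.
CACHE_TTL = {"pro": timedelta(minutes=5), "max": timedelta(hours=1)}


def cache_is_warm(last_message: datetime, now: datetime, plan: str) -> bool:
    """True if `now` still falls inside the plan's cache TTL window."""
    return now - last_message < CACHE_TTL[plan]


last = datetime(2025, 1, 1, 12, 0)
back = last + timedelta(minutes=12)  # a twelve-minute break
print(cache_is_warm(last, back, "pro"), cache_is_warm(last, back, "max"))
```

a twelve-minute break is cold on the 5-minute window and still warm on the 1-hour one.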
10
.claudeignore

tell it what not to look at.

  • Exclude node_modules, build artifacts, generated files — anything Claude shouldn't read but might pull in.
  • If you've ever watched it grep a lockfile, you needed this yesterday.

now go audit a session.

drop a jsonl into the ledger and see which rule you've been breaking.

open the ledger
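
for a rough do-it-yourself audit, a sketch that totals token usage from a session jsonl. the field names follow the usage object returned by the Anthropic API; the assumption that each record nests it under `message.usage` is mine and may not match your transcript files, so treat the layout as hypothetical.

```python
import json
from collections import Counter
from typing import Iterable

USAGE_KEYS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)


def summarize_usage(lines: Iterable[str]) -> Counter:
    """Sum token usage across JSONL records that carry a usage block."""
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the audit
        if not isinstance(record, dict):
            continue
        # ASSUMPTION: usage lives at record["message"]["usage"].
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        for key in USAGE_KEYS:
            value = usage.get(key)
            if isinstance(value, int):
                totals[key] += value
    return totals


# Made-up sample records, not real session data.
sample = [
    '{"message": {"usage": {"input_tokens": 12, "cache_read_input_tokens": 9000, "output_tokens": 300}}}',
    '{"message": {"usage": {"input_tokens": 8, "cache_creation_input_tokens": 9500, "output_tokens": 120}}}',
    '{"type": "summary"}',
]
totals = summarize_usage(sample)
print(dict(totals))
```

a cache_creation_input_tokens total that rivals cache_read_input_tokens suggests the cache kept getting rebuilt, which points back at rule 01 or rule 09.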