Stop Burning Claude Code Tokens on Invisible Overhead

# Stop Burning Claude Code Tokens on Invisible Overhead *Stop paying for context you don't need. A practical guide to reclaiming your Claude Code budget.* --- If you've noticed your Claude Code usage climbing faster than your output, you're not imagining things. The culprit isn't necessarily the model—it's the invisible infrastructure running beneath every prompt. Most developers treat Claude Code like a blank canvas: you ask, it answers, tokens flow. But in reality, every session arrives pre-loaded with dozens of hidden token costs: project rules, plugin hooks, tool schemas, conversation history, and caching behavior. Left unmanaged, these overhead patterns can consume **up to 73% of your token budget** before you've written a single line of productive code. This guide breaks down where those tokens go—and exactly how to get them back. --- ## 🔍 Where Your Tokens Actually Go Before you optimize, understand the anatomy of a Claude Code request. Every prompt triggers: 1. **Project context injection** (`CLAUDE.md`, `.cursorrules`, etc.) 2. **Active skill files** (auto-loaded based on task detection) 3. **MCP tool schemas** (full definitions from connected servers) 4. **Hook-injected context** (UserPromptSubmit, SessionStart, etc.) 5. **Full conversation history** (re-tokenized on every turn) 6. **Cache state management** (TTL-based invalidation) 7. **Extended thinking overhead** (if globally enabled) Multiply these by 20–50 interactions per session, and overhead compounds rapidly. --- ## 🧹 The 9 Overhead Patterns (And How to Fix Them) ### 1. CLAUDE.md Bloat (~14% of tokens) **Problem**: Project rule files grow silently. A 4,800-token `CLAUDE.md` loads on *every* turn, creating a constant baseline tax. **Fix**: ```bash # Check your file size wc -w ~/.claude/CLAUDE.md .claude/CLAUDE.md # Refactor aggressively: ✅ Move framework-specific rules to project-level configs ✅ Extract reusable patterns into Skills (load-on-demand) ✅ Delete rules you can't recall writing ✅ Convert verbose explanations into 3-word imperatives 🎯 Target: <1,500 tokens combined (global + project) ``` ### 2. Conversation History Re-reads (~13%) **Problem**: Every follow-up re-tokenizes the entire chat. Message #30 pays to re-read messages 1–29. **Fix**: - Edit previous messages instead of adding follow-ups (`↑` → edit → resend) - Hard-cap conversations at 20 messages; use `/compact` to summarize & continue - Start fresh chats with a 3-bullet summary when context grows stale ### 3. Hook Injection Waste (~11%) **Problem**: UserPromptSubmit hooks inject context before *every* prompt. Four hooks = ~2,400–6,200 tokens of pre-prompt overhead. **Fix**: ```json // Audit ~/.claude/settings.json { "hooks": { "UserPromptSubmit": [ // Keep only universally relevant hooks: "git-branch", "active-env", "project-root" // Remove task-specific injectors that fire on every prompt ] } } ``` ### 4. Cache Misses on Session Resume (~10%) **Problem**: Anthropic's prompt cache has a ~5-minute TTL. Pause >5 minutes = cache miss = full-price re-tokenization. **Fix**: - Send a trivial "ping" prompt before breaks <10 min to reset cache TTL - For longer breaks: keep baseline overhead lean so misses cost less - Consider 1-hour cache lifetime on paid plans if you resume frequently ### 5. Skill Loading on Irrelevant Tasks (~7%) **Problem**: Skills auto-load when Claude detects "possible relevance." Conservative detection = unnecessary token spend. **Fix**: ```bash # Audit usage over 7 days ls -lt ~/.claude/skills/ | head -10 # Disable unused skills: claude skills disable legacy-refactor claude skills disable experimental-api # Keep 3–4 core skills active; archive the rest ``` ### 6. "Just-in-Case" Tool Definitions (~6%) **Problem**: Every connected MCP server ships its full tool schema on *every* request—whether used or not. **Fix**: ```bash # List active MCP servers claude mcp list # Remove unused servers from auto-load: # Edit ~/.claude/settings.json → mcpServers # Prefer project-level MCP configs over global defaults ``` ### 7. Extended Thinking on Simple Tasks (~5%) **Problem**: Leaving "Advanced Thinking" globally enabled burns 2,000–3,000 tokens of `` on trivial requests. **Fix**: - Default: Extended thinking **OFF** - Toggle ON per-message (`Alt+T` in Claude Code) only when needed - Reserve for architecture decisions, debugging, multi-step planning ### 8. Wrong-Direction Generation (~4%) **Problem**: Letting Claude finish a misguided 400-line response wastes output tokens. **Fix**: - `Cmd+.` (Mac) / `Ctrl+.` (Win) to stop generation immediately - Redirect from the last useful point instead of restarting ### 9. Plugin Auto-Update Redundancy (~3%) **Problem**: SessionStart hooks from multiple plugins add ~50 tokens each; 9 plugins = ~450 tokens/session startup. **Fix**: - Reduce SessionStart hooks to only essential context (branch, env vars) - Disable verbose "loaded successfully" notifications --- ## 🚫 Optimization Traps to Avoid Not all "efficiency hacks" deliver. Here's what *doesn't* work: | Strategy | Why It Fails | |----------|-------------| | Switching to cheaper models for "simple" tasks | Overhead dominates model cost; lean context > cheap model | | Aggressively clearing context between tasks | Loses valuable continuity; better to compact & continue | | Disabling all skills globally | Manual re-instruction costs more than selective loading | | Scheduling work during "off-peak" hours | Anthropic's rate limits affect few users; overhead matters more | | Downgrading subscription tiers | Cost-per-work-hour stays similar; limits just feel more painful | --- ## 🧠 The Mental Model Shift > **Claude Code sessions aren't blank slates—they're pre-charged invoices.** Before your first productive token, every session pays for: - Your `CLAUDE.md` (always) - Active plugin hooks (always) - Loaded Skill files (when relevance detected) - MCP tool schemas (always) - Conversation history (always) - Cache recompilation (on resume after TTL) **Productive tokens = Total tokens − Overhead tax** If overhead is 73%, prompt quality barely moves the needle. Cut the tax first. --- ## 📋 Your 30-Minute Audit Checklist ```bash □ Check CLAUDE.md size: wc -w ~/.claude/CLAUDE.md .claude/CLAUDE.md → Target: <1,500 tokens combined □ Audit hooks: cat ~/.claude/settings.json | jq '.hooks' → Keep only universally relevant hooks □ List MCP servers: claude mcp list → Remove unused servers from auto-load □ Review active skills: ls ~/.claude/skills/ → Disable anything not used in past 7 days □ Test cache behavior: Send prompt → wait 6 min → send again → Monitor token usage spike (cache miss indicator) □ Toggle extended thinking: Default OFF, enable per-task via Alt+T ``` --- ## 🔧 Pro Tips for Sustainable Optimization 1. **Version your context files**: Treat `CLAUDE.md` like code—commit changes, review diffs, revert bloat. 2. **Use project-level configs**: Keep global settings minimal; push specificity to `.claude/` per project. 3. **Monitor token usage**: Use proxy tools or Anthropic's dashboard to spot overhead spikes early. 4. **Document your optimizations**: Share team-wide guidelines to prevent context creep. 5. **Re-audit monthly**: Overhead compounds silently. Schedule quarterly context reviews. --- ## 💡 The Bottom Line Most "Claude got dumber" complaints trace to context bloat—not model degradation. The same subscription plan can deliver 2–3× more productive work when the invisible tax is cut. Start small: trim your `CLAUDE.md`, disable one unused skill, and watch your token efficiency climb. Then scale the discipline. Your future self—and your usage dashboard—will thank you. --- *Optimization isn't about using less Claude. It's about making every token count.*