# Use other models in Claude Code

85% of what Claude Code does doesn't need Claude.

## TL;DR

- Only 15% of Claude Code token usage requires Opus-level reasoning; the other 85% is commodity work any cheap model handles.
- OpenRouter is the quickest documented path: two env vars unlock 400+ models, including DeepSeek V3 at $0.14/M input (about 35x cheaper than Opus 4.7).
- LiteLLM is the right choice for teams: it adds budget caps, rate limits, and audit logs between Claude Code and any provider.
- Non-Claude models degrade in long agentic sessions (200+ turns); keep Opus for sessions over 150 turns or tasks requiring multi-file reasoning.
- The rule: if a junior dev solves it in under two minutes, route it to a cheap model.

---

That figure isn't hype. It comes from OpenRouter routing analysis showing that only 15% of token usage in Claude Code sessions demands Opus-level reasoning ([source](https://openrouter.ai/docs/guides/coding-agents/claude-code-integration)). The other 85% is file reads, search operations, boilerplate generation, and simple edits.

Most developers burn flagship AI spend on grunt work. The fix is a task-routing map, not just a config change.

## Task taxonomy: what actually needs flagship reasoning

Categorize every task Claude Code performs into two buckets.

**Commodity work** (safe for cheap models):

- Reading source files to understand context
- Grepping for variable definitions, imports, or usages
- Generating getters, setters, or boilerplate scaffolding
- Writing docstrings and inline comments
- Creating test stubs from existing code patterns
- Simple one-file edits with clear instructions

**High-value work** (keep on Opus):

- Refactoring across multiple files with shared state
- Debugging race conditions or concurrency issues
- Designing API contracts or schema migrations
- Security review of authentication or data handling logic
- Long agentic sessions where context coherence matters

A simple heuristic: if the task takes a junior developer under two minutes, route it to a cheap model. If it requires a senior engineer's judgment, use Opus.

The 85/15 split means you can cut spend by 70–85% without touching the tasks where Claude's reasoning depth actually shows.

![diagram](http://localhost:3001/api/media/file/use-other-models-in-claude-code-1.webp)

## Path 1: OpenRouter (individuals, fastest setup)

OpenRouter documents the Claude Code integration in its own guides ([openrouter.ai/docs/guides/coding-agents/claude-code-integration](https://openrouter.ai/docs/guides/coding-agents/claude-code-integration)). Two environment variables and you're done:

```bash
# Send Claude Code's API traffic to OpenRouter instead of the Anthropic API.
# Confirm the exact base URL and auth variable against the linked guide.
export ANTHROPIC_BASE_URL=https://openrouter.ai/api/v1
export ANTHROPIC_API_KEY=your_openrouter_key
```

Start Claude Code as normal. Use `/model` mid-session to switch without restarting. OpenRouter exposes 400+ models through the same Anthropic Messages API format Claude Code already speaks, so there are no adapters and no wrappers.
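
Before picking specific models, it can help to write the two-bucket taxonomy from earlier down as data. The following is a minimal sketch, not part of Claude Code or OpenRouter: the category labels, the `Tier` enum, and `route_tier` are all hypothetical names, one label per bullet in the taxonomy.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"        # commodity work: any low-cost model
    FLAGSHIP = "flagship"  # high-value work: keep on Opus


# Hypothetical labels, one per bullet in the commodity list.
COMMODITY = {
    "read_file",
    "search",
    "boilerplate",
    "docstring",
    "test_stub",
    "single_file_edit",
}

# Hypothetical labels, one per bullet in the high-value list.
HIGH_VALUE = {
    "multi_file_refactor",
    "concurrency_debug",
    "api_design",
    "security_review",
    "long_agentic_session",
}


def route_tier(category: str) -> Tier:
    """Map a task category to a model tier.

    Anything not explicitly listed as commodity work goes to the
    flagship tier (an assumption of this sketch, chosen to be safe).
    """
    if category in COMMODITY:
        return Tier.CHEAP
    return Tier.FLAGSHIP
```

For example, `route_tier("docstring")` returns `Tier.CHEAP`, while `route_tier("security_review")` and any unrecognized label return `Tier.FLAGSHIP`. Defaulting unknown work to the expensive tier trades some savings for fewer surprises.
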

**Which models to use for commodity work:**

| Task type | Recommended model | Input cost |
|-----------|-------------------|------------|
| File reads, search, grep | DeepSeek V3 (`deepseek/deepseek-chat`) | $0.14/M |
| Boilerplate, docstrings | Llama 3.3 70B (`meta-llama/llama-3.3-70b-instruct`) | Free (rate-limited) |
| Simple edits, test stubs | DeepSeek V3 or Qwen3 72B | $0.14–$0.29/M |
| Architecture, debugging | Claude Opus 4.7 (`anthropic/claude-opus-4-7`) | $5/M |

OpenRouter response metadata includes `total_cost` and `usage` fields per request, so you can measure spend per task type and validate the 85/15 split against your actual usage.

## Path 2: LiteLLM proxy (teams, compliance, governance)

Claude Code's LLM gateway documentation covers LiteLLM as a gateway option ([code.claude.com/docs/en/llm-gateway](https://code.claude.com/docs/en/llm-gateway)). It sits between Claude Code and multiple model providers, adding:

- Per-user and per-team spending caps
- Rate limiting that prevents one session from burning the monthly budget
- Centralized API key management, so developers never touch provider credentials
- Fallback chains: if DeepSeek V3 hits rate limits, route to GPT-4o-mini automatically
- Audit logs for compliance

A minimal LiteLLM config for a team using DeepSeek as default with Opus fallback:

```yaml
model_list:
  # Default route: cheap model for commodity work
  - model_name: claude-code-default
    litellm_params:
      model: deepseek/deepseek-chat
      api_key: os.environ/DEEPSEEK_API_KEY
  # Flagship route: Opus for high-value work
  - model_name: claude-code-opus
    litellm_params:
      model: anthropic/claude-opus-4-7
      api_key: os.environ/ANTHROPIC_API_KEY

router_settings:
  # If the default route fails, retry the request on Opus
  fallbacks:
    - {"claude-code-default": ["claude-code-opus"]}
```

Point Claude Code at the LiteLLM proxy the same way you'd point it at OpenRouter:

```bash
# The proxy listens locally; the key is the LiteLLM key, not a provider key.
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=your_litellm_master_key
```

One documented enterprise deployment uses LiteLLM to proxy Claude Code to Azure AI Foundry, satisfying compliance requirements that block direct Anthropic
API calls entirely. Cost tracking alone justifies the setup for any team spending more than $100/month on LLM APIs: you can't optimize what you can't measure.

## The cost math

Claude Opus 4.7 costs $5/M input and $25/M output on OpenRouter ([source](https://openrouter.ai/anthropic/claude-opus-4.7)). DeepSeek V3 runs at $0.14/M input and $1.10/M output. That's roughly a 35x difference on input.

In one documented Opus 4.6 session spanning 170 turns, input tokens were 96.2% of the total bill. If 85% of those turns were commodity tasks, the majority of that spend went to work DeepSeek handles equally well. Model routers built around task-complexity routing report 70–85% savings in documented cases, consistent with the 85/15 split.

Over a week of active development at 4–6 hours per day, a developer running all tasks through Opus might spend $200–$400 on API costs. Routing the 85% commodity tasks to DeepSeek V3 brings that to $30–$60, with zero quality loss on file reads, boilerplate, and search.

## The real risk: context degradation in long sessions

Non-Claude models show measurable degradation in long agentic sessions, particularly in the 200–380 turn range. Claude re-references the task definition more frequently, keeping decisions anchored. As one practitioner put it: *"A fast agent that drifts can generate more cleanup work than a slower, more consistent one."*

Three things don't work on non-Claude backends:

- **Thinking**: extended reasoning mode
- **Compaction**: Claude Code's built-in context compression
- **Skills**: the slash-command skill system

Vision-plus-code coherence also degrades: Claude Opus handles a screenshot alongside CSS debugging as a single unified task. Other models fragment the context.

The routing map needs a session-length parameter. Under 50 turns on a focused task, cheap models are safe. Beyond 150 turns, or any session involving cross-file refactoring or debugging chains, switch to Opus.
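
The session-length thresholds above can be sketched as a small decision function. This is only an illustration under stated assumptions: `pick_tier` and its return labels are hypothetical, and the article gives no rule for the 50 to 150 turn band, so the middle branch here is a placeholder rather than a recommendation from the source.

```python
def pick_tier(turns: int, cross_file: bool) -> str:
    """Choose a model tier from session length and task shape.

    turns:      number of turns so far in the session
    cross_file: True for cross-file refactoring or debugging chains
    """
    # Beyond 150 turns, or any cross-file work: switch to Opus.
    if cross_file or turns > 150:
        return "opus"
    # Under 50 turns on a focused task: cheap models are safe.
    if turns < 50:
        return "cheap"
    # 50 to 150 turns is not covered by the article; this label is an
    # assumption meaning "stay cheap, but watch for drift".
    return "cheap_with_review"
```

A short focused session (`pick_tier(20, False)`) stays on the cheap tier, while `pick_tier(20, True)` and `pick_tier(200, False)` both return `"opus"`.
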
The 85/15 rule applies to token volume: a single long agentic session might flip entirely into the 15% that needs Opus.

## Putting it together

You don't need a cheaper Claude plan; you need a routing strategy. OpenRouter starts in two lines of shell config. LiteLLM scales to a team with compliance requirements. The savings are real (70–85% documented), but only if you respect where each model breaks.

Use cheap models for commodity work, keep session length in mind, and save Opus for the tasks that actually need it. The developers who get this right aren't just spending less. They're shipping faster, because they stopped paying flagship prices for file reads.
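
As a sanity check on the savings figure, here is a back-of-the-envelope blended-rate calculation using the input prices quoted in the cost math section. It is a sketch under simplifying assumptions: it treats the 85% commodity share as a share of input tokens, and it ignores output pricing, on the grounds that input dominated the bill in the session the article cites. The function name `blended_rate` is illustrative.

```python
# Input prices in dollars per million tokens, as quoted in the article.
OPUS_INPUT = 5.00
DEEPSEEK_INPUT = 0.14

# Assumption: 85% of input tokens are commodity work routed to the cheap model.
COMMODITY_SHARE = 0.85


def blended_rate(flagship: float, cheap: float, cheap_share: float) -> float:
    """Average price per million tokens when cheap_share of tokens use the cheap model."""
    return (1 - cheap_share) * flagship + cheap_share * cheap


rate = blended_rate(OPUS_INPUT, DEEPSEEK_INPUT, COMMODITY_SHARE)
savings = 1 - rate / OPUS_INPUT

print(f"{rate:.3f} $/M, {savings:.1%} saved")  # prints: 0.869 $/M, 82.6% saved
```

Under these assumptions the blended input rate is about $0.87/M, a reduction of roughly 83%, which sits inside the 70–85% range reported above.
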
