Save 30-60% on Claude Code costs with an installable skill, CLI tools, and 11 deep-dive guides.
Claude Code (official plugin system):
/plugin marketplace add Sagargupta16/claude-cost-optimizer
/plugin install cost-mode@claude-cost-optimizerMulti-agent (Cursor, Cline, Codex, 40+ agents):
npx skills add Sagargupta16/claude-cost-optimizerThen activate in any session:
/cost-mode # Standard (40-60% output token reduction)
/cost-mode lite # Professional brevity (20-40% output reduction)
/cost-mode strict # Telegraphic, max savings (60-70% output reduction)
/cost-mode off # Resume normal behavior
| Feature | How It Saves Tokens |
|---|---|
| Strips filler | Drops pleasantries, hedging, restating your question, trailing summaries |
| Suggests cheaper models | "Haiku handles this -- /model haiku" for simple tasks |
| Suggests CLI tools | "Use prettier/eslint --fix directly" instead of burning LLM tokens |
| Session awareness | Reminds to /compact after 20+ turns, fresh sessions for new tasks |
| Minimal code gen | Diffs over rewrites, no obvious comments, no speculative error handling |
| Auto-deactivates | Full clarity for security warnings, destructive ops, and when you're confused |
Technical accuracy is never sacrificed. Code in commits and PRs is written normally.
claude-rate runs on your filesystem -- no signup, no GitHub upload, no network round-trip. Pick whichever runner fits your shell:
# curl one-shot (no Node, no install)
curl -sSL https://raw.githubusercontent.com/Sagargupta16/claude-cost-optimizer/main/tools/claude-rate/install.sh | sh -s -- .
# curl, persistent install
curl -sSL https://raw.githubusercontent.com/Sagargupta16/claude-cost-optimizer/main/tools/claude-rate/install.sh | sh -s -- --installAdd --fix to print copy-pasteable fix commands, --strict to fail CI when the grade drops below B, or --json for machine-readable output. See tools/claude-rate/README.md for the full breakdown.
The local rater inspects things the web analyzer can't: real MCP server count from .mcp.json, hooks, .claudeignore coverage gaps vs files actually on disk, accidentally-committed secrets, and cost-mode skill installation status.
Not live yet. These pages are built in site/ but the deployed site currently serves the landing page only. Until they ship, use
claude-rateabove.
| Tool | What It Does |
|---|---|
| Repo Analyzer | Paste a GitHub URL to get a cost audit, grade (A+ to F), and recommendations |
| Cost Calculator | Estimate monthly spend based on your model, sessions, and config |
| Badge Checker | Score your setup and get a shields.io badge for your repo |
Real project, 30-turn session, Opus 4.8:
BEFORE (no optimization): AFTER (5 minutes of setup):
CLAUDE.md: 6,200 chars (truncated) CLAUDE.md: 2,800 chars (under limit)
.claudeignore: missing .claudeignore: 12 patterns
MCP servers: 6 active MCP servers: 2 active
System prompt: ~15,000 tokens/turn System prompt: ~5,500 tokens/turn
Session cost: $2.85 Session cost: $1.12
Monthly (3x/day): $188.10 Monthly (3x/day): $73.92
Savings: $114.18/month (61%)
Even without installing the skill, these 5 changes cut costs immediately:
| # | Strategy | Savings | Guide |
|---|---|---|---|
| 1 | Keep CLAUDE.md under 4,000 characters -- content beyond 4K is silently truncated | 10-20% | Context Optimization |
| 2 | Use Haiku for simple tasks (--model haiku) -- 5x cheaper than Opus |
20-40% | Model Selection |
| 3 | Use Plan Mode before coding -- prevents wasted iterative cycles | 15-25% | Workflow Patterns |
| 4 | Add .claudeignore -- stop Claude from reading node_modules, dist, lock files |
5-15% | Context Optimization |
| 5 | Delegate to subagents -- isolate expensive searches from main context | 20-40% | Workflow Patterns |
Full walkthrough: Getting Started in 5 Minutes
cost-mode is the first skill. More are planned:
| Skill | Status | What It Does | Cost Impact |
|---|---|---|---|
| cost-mode | Live | Concise responses, model routing suggestions, session awareness | 30-60% cost savings |
| claudeignore-gen | Planned | Auto-generates .claudeignore based on your project's tech stack | 5-15% input reduction |
| context-compress | Planned | Rewrites your CLAUDE.md to be shorter while keeping all essential info | 10-20% input reduction |
| cache-optimizer | Planned | Detects cache-busting patterns and suggests fixes to maximize prompt cache hits | 10-25% input reduction |
| budget-guard | Planned | Per-session and per-day spending limits with warnings before you blow past them | Prevents overspend |
Want to build one? Skills are just SKILL.md files -- see CONTRIBUTING.md and the skills/cost-mode/ directory for the pattern.
11 deep-dive guides covering every optimization area:
| Guide | What You'll Learn |
|---|---|
| 00 - Getting Started | Zero to optimized in 5 minutes -- the essential setup |
| 01 - Understanding Costs | How billing works, what costs the most, where money goes |
| 02 - Context Optimization | Reduce input tokens: CLAUDE.md, .claudeignore, file reads |
| 03 - Model Selection | When to use Opus vs Sonnet vs Haiku (with decision tree) |
| 04 - Workflow Patterns | Plan mode, subagents, commands, batch operations |
| 05 - Team Budgeting | Per-developer budgets, cost tracking, ROI calculation |
| 06 - Access Methods & Pricing | Compare API vs Bedrock vs Vertex AI vs Claude Code pricing |
| 07 - MCP & Agent Cost Impact | MCP server overhead, subagent costs, Agent SDK patterns |
| 08 - Prompt Caching Deep Dive | Cache mechanics, TTL economics, maximizing hit rates, ROI math |
| 09 - Subscription Plan Value | Choose the right plan, maximize allowance, upgrade/downgrade signals |
| 10 - Three-Tier Task Routing | Skip the LLM for Tier 0 tasks, route cheap tasks to Haiku, save Opus for complex work |
Also: Visual Diagrams (Mermaid flowcharts) | One-Page Cheatsheet
Copy-paste configs that are already optimized:
CLAUDE.md: Minimal | Standard | Comprehensive | Monorepo
By Stack: React+Vite | Next.js | FastAPI | MERN | Terraform | Go | Rust | Django | Rails | Spring Boot
Settings: Cost-Conscious | Balanced | Performance-First
Commands: /cost-check | /budget-mode | /quick-fix | /optimize
7 tools for measuring, tracking, and reducing costs. Full tools documentation
| Tool | What It Does |
|---|---|
| Token Estimator | Estimate token count and cost for any file |
| Usage Analyzer | Find cost hotspots across your sessions |
| Badge Generator | Grade your project config (A+ to F) from the CLI |
| MCP Cost Server | In-session cost estimation via MCP |
| VS Code Extension | Token count and cost in the status bar |
| GitHub Action | Automated cost audit on PRs |
| Budget Hooks | Track tool calls, log costs, warn at thresholds |
| Model | Input / 1M | Output / 1M | Cache Hit / 1M | 5m Cache Write / 1M | 1h Cache Write / 1M | Context | Max Output |
|---|---|---|---|---|---|---|---|
| Fable 5 (most capable) | $10.00 | $50.00 | $1.00 | $12.50 | $20.00 | 1M | 128K |
| Mythos 5 (limited, Glasswing) | $10.00 | $50.00 | $1.00 | $12.50 | $20.00 | 1M | 128K |
| Opus 4.8 (Opus flagship) | $5.00 | $25.00 | $0.50 | $6.25 | $10.00 | 1M | 128K |
| Opus 4.7 | $5.00 | $25.00 | $0.50 | $6.25 | $10.00 | 1M | 128K |
| Opus 4.6 | $5.00 | $25.00 | $0.50 | $6.25 | $10.00 | 1M | 128K |
| Opus 4.5 | $5.00 | $25.00 | $0.50 | $6.25 | $10.00 | 200K | 64K |
| Opus 4.1 | $15.00 | $75.00 | $1.50 | $18.75 | $30.00 | 200K | 32K |
| Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $3.75 | $6.00 | 1M | 64K |
| Sonnet 4.5 | $3.00 | $15.00 | $0.30 | $3.75 | $6.00 | 200K | 64K |
| Haiku 4.5 | $1.00 | $5.00 | $0.10 | $1.25 | $2.00 | 200K | 64K |
| Mythos Preview (retires 2026-06-30) | $25.00 | $125.00 | $2.50 | $31.25 | $50.00 | 1M | -- |
1M context on Fable 5, Mythos 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 bills at standard rates across the full window (no long-context premium). Batch API: 50% off both input and output (up to 300K output via the output-300k-2026-03-24 beta). Fast Mode (Opus 4.8, 4.7, and 4.6, research preview): Opus 4.8 is 2x ($10/$50), Opus 4.7 and 4.6 are 6x ($30/$150); up to 2.5x output tokens/sec. Regional endpoints (Bedrock / Vertex AI / Claude API inference_geo: "us" for 4.6+ models): +10%. Subscriptions: Pro $20/mo (or $200/yr ≈ $16.67/mo, ~17% off), Max 5x $100/mo, Max 20x $200/mo. Web search: $10 per 1,000 searches plus token costs. Web fetch: free beyond token costs. Code execution: free with web search/fetch; otherwise 1,550 free hours/month then $0.05/hour per container. Bash tool: +245 input tokens. Text editor tool: +700 input tokens.
Opus 4.8 / 4.7 tokenizer caveat: The new tokenizer (Opus 4.7 and later) may use up to 35% more tokens for the same fixed text. Effective per-task cost is higher than posted pricing suggests — factor this into budgets, especially when comparing to Opus 4.6 / Sonnet 4.6.
Fast Mode (research preview): Available on Opus 4.8, Opus 4.7, and Opus 4.6 via the
fast-mode-2026-02-01beta header (speed: "fast"). Pricing is per-model: Opus 4.8 = 2x ($10/$50 per MTok), Opus 4.7 / 4.6 = 6x ($30/$150 per MTok) (4.6 Fast Mode is deprecated as of the 4.8 launch). Up to 2.5x more output tokens/second; speed gain is on output tokens/sec, not time-to-first-token. Opus 4.8 Fast Mode is Claude API + Managed Agents only. Not available on Claude Platform on AWS, Batch API, or Priority Tier. Switching speeds invalidates prompt cache. Join the waitlist.Fable 5 / Mythos 5 (GA 2026-06-09): Anthropic's new Mythos-class tier above Opus, at 2x Opus 4.8's price ($10/$50). Same specs for both: 1M context at standard rates, 128K max output, always-on adaptive thinking (control depth with
effort;thinking: disablednot supported), 4.7-generation tokenizer. Fable 5 is GA everywhere (Claude API, Claude Platform on AWS, Bedrock, Vertex AI, Microsoft Foundry) and includes safety classifiers that can decline requests -- a refusal returns HTTP 200 withstop_reason: "refusal", pre-output refusals are not billed, and the betafallbacksparameter plus fallback credit make retrying on another model cheap. Mythos 5 is the same model without the classifiers, limited to approved Project Glasswing customers. No Fast Mode on either; Batch API supported ($5/$25). Requires 30-day data retention (no zero-data-retention option).Mythos Preview: superseded by Mythos 5 -- retires 2026-06-30. Was the invite-only defensive-cybersecurity research preview under Project Glasswing.
Looking for older model IDs and pricing? See the Legacy & Retired Models section below for migration context.
Reference only -- don't use these for new work. Kept for migration context if you're unwinding code that still pins old model IDs.
Recently retired (requests now fail):
| Model | Retired on | Migrate to |
|---|---|---|
Claude Opus 3 (claude-3-opus-20240229) |
2026-01-05 | Opus 4.8 |
Claude Sonnet 3.7 (claude-3-7-sonnet-20250219) |
2026-02-19 | Sonnet 4.6 |
Claude Haiku 3.5 (claude-3-5-haiku-20241022) |
2026-02-19 (still on Bedrock + Vertex AI) | Haiku 4.5 |
Claude Haiku 3 (claude-3-haiku-20240307) |
2026-04-20 | Haiku 4.5 |
| Claude Sonnet 3.5 v1 / v2, Sonnet 3, Claude 2.x, Claude 1.x, Instant 1.x | 2024-2025 | See deprecations page |
Deprecated, retiring soon:
| Model | Retirement date | Migrate to |
|---|---|---|
Claude Sonnet 4 (claude-sonnet-4-20250514) |
2026-06-15 | Sonnet 4.6 |
Claude Opus 4 (claude-opus-4-20250514) |
2026-06-15 | Opus 4.8 |
Claude Mythos Preview (claude-mythos-preview) |
2026-06-30 | Mythos 5 (Glasswing) |
Claude Opus 4.1 (claude-opus-4-1-20250805) |
2026-08-05 | Opus 4.8 |
Older snapshots still callable (not retired, but not the headline tier):
| Snapshot | Pricing | Context | Why use |
|---|---|---|---|
| Opus 4.7 | $5/$25 | 1M | Previous flagship -- pin if not yet re-tuned for 4.8 |
| Opus 4.6 | $5/$25 | 1M | Pinned workloads / older tokenizer |
| Opus 4.5 | $5/$25 | 200K | Pinned workloads only |
| Opus 4.1 | $15/$75 | 200K | Compatibility only -- 3x more expensive |
| Sonnet 4.5 | $3/$15 | 200K | Pinned workloads only |
Authoritative source: platform.claude.com/docs/en/about-claude/model-deprecations.
The cheatsheet has a more detailed legacy table including last-known pricing for every retired tier.
- Task Comparison -- same task, optimized vs not
- Model Comparison -- Opus vs Sonnet vs Haiku
- Context Size Impact -- how CLAUDE.md size affects cost
- Community Leaderboard -- crowdsourced cost-per-task data
- Case Studies -- real-world optimization stories
- GitHub Discussions -- ask questions, share strategies
- Contributing -- from starring the repo to building new skills
- Issue Templates -- tips, benchmarks, case studies
| Project | What It Does | Savings |
|---|---|---|
| caveman | Brevity skill -- strips filler from responses | 50-75% output |
| claude-mem | Compresses session context for handoff | Input reduction |
| claudetop | htop-style monitoring with cost tracking | Visibility |
How much does Claude Code actually cost?
With Pro ($20/mo or $200/yr ≈ $16.67/mo with annual billing — 17% off), Max 5x ($100/mo), or Max 20x ($200/mo), you get included usage. Heavy users report $3-15/day without optimization, $1-5/day with it. Opus 4.8, 4.7, and 4.6 at $5/$25 are 3x cheaper per token than Opus 4.1 ($15/$75) — but the new tokenizer (Opus 4.7 and later) can use up to 35% more tokens, so the effective gap is closer to ~2x.
Does this apply to the Claude API too?
Context optimization, model selection, and prompt engineering apply to both. The skill and commands are Claude Code-specific.
Will optimization reduce output quality?
No. These strategies eliminate waste (duplicate context, unnecessary file reads, expensive models for simple tasks). Quality stays the same or improves -- less noise means better reasoning.
What's the biggest single change I can make?
Install cost-mode (npx skills add Sagargupta16/claude-cost-optimizer) and switch to Haiku for routine tasks. Combined: 50-70% savings.
If this repo helped you save money, consider giving it a star!
If you found this useful, check out my other AI/Claude tools:
| Project | Description |
|---|---|
| claude-code-recipes | 50+ copy-paste recipes for Claude Code - commands, subagents, hooks, skills |
| claude-skills | Custom Claude Code plugin marketplace with dev-workflow, FARM stack, and more |
| agent-recipes | AI agent workflows for real-world dev tasks - code review, testing, security |
| ai-git-hooks | AI-powered git hooks - auto-review diffs, generate commit messages, security scanning |
| mcp-toolkit | Production-ready middleware for MCP servers - auth, caching, rate limiting |
MIT - use these strategies, templates, and tools however you want.