Claude Cost Optimizer

Save 30-60% on Claude Code costs with an installable skill, CLI tools, and 11 deep-dive guides.

Install

Claude Code (official plugin system):

/plugin marketplace add Sagargupta16/claude-cost-optimizer
/plugin install cost-mode@claude-cost-optimizer

Multi-agent (Cursor, Cline, Codex, 40+ agents):

npx skills add Sagargupta16/claude-cost-optimizer

Then activate in any session:

/cost-mode              # Standard (40-60% output token reduction)
/cost-mode lite         # Professional brevity (20-40% output reduction)
/cost-mode strict       # Telegraphic, max savings (60-70% output reduction)
/cost-mode off          # Resume normal behavior

What cost-mode Does

Feature	How It Saves Tokens
Strips filler	Drops pleasantries, hedging, restating your question, trailing summaries
Suggests cheaper models	"Haiku handles this -- `/model haiku`" for simple tasks
Suggests CLI tools	"Use `prettier`/`eslint --fix` directly" instead of burning LLM tokens
Session awareness	Reminds to `/compact` after 20+ turns, fresh sessions for new tasks
Minimal code gen	Diffs over rewrites, no obvious comments, no speculative error handling
Auto-deactivates	Full clarity for security warnings, destructive ops, and when you're confused

Technical accuracy is never sacrificed. Code in commits and PRs is written normally.

Full skill documentation

Rate your setup

Locally, in 5 seconds (recommended)

claude-rate runs on your filesystem -- no signup, no GitHub upload, no network round-trip. Pick whichever runner fits your shell:

# curl one-shot (no Node, no install)
curl -sSL https://raw.githubusercontent.com/Sagargupta16/claude-cost-optimizer/main/tools/claude-rate/install.sh | sh -s -- .

# curl, persistent install
curl -sSL https://raw.githubusercontent.com/Sagargupta16/claude-cost-optimizer/main/tools/claude-rate/install.sh | sh -s -- --install

Add --fix to print copy-pasteable fix commands, --strict to fail CI when the grade drops below B, or --json for machine-readable output. See tools/claude-rate/README.md for the full breakdown.

The local rater inspects things the web analyzer can't: real MCP server count from .mcp.json, hooks, .claudeignore coverage gaps vs files actually on disk, accidentally-committed secrets, and cost-mode skill installation status.

Public repos -- web tools

Not live yet. These pages are built in site/ but the deployed site currently serves the landing page only. Until they ship, use claude-rate above.

Tool	What It Does
Repo Analyzer	Paste a GitHub URL to get a cost audit, grade (A+ to F), and recommendations
Cost Calculator	Estimate monthly spend based on your model, sessions, and config
Badge Checker	Score your setup and get a shields.io badge for your repo

Before vs After

Real project, 30-turn session, Opus 4.8:

BEFORE (no optimization):                 AFTER (5 minutes of setup):
  CLAUDE.md:        6,200 chars (truncated)  CLAUDE.md:        2,800 chars (under limit)
  .claudeignore:    missing                   .claudeignore:    12 patterns
  MCP servers:      6 active                  MCP servers:      2 active
  System prompt:    ~15,000 tokens/turn       System prompt:    ~5,500 tokens/turn
  Session cost:     $2.85                     Session cost:     $1.12
  Monthly (3x/day): $188.10                   Monthly (3x/day): $73.92

  Savings: $114.18/month (61%)

Quick Start (5 Minutes, No Skill Needed)

Even without installing the skill, these 5 changes cut costs immediately:

#	Strategy	Savings	Guide
1	Keep CLAUDE.md under 4,000 characters -- content beyond 4K is silently truncated	10-20%	Context Optimization
2	Use Haiku for simple tasks (`--model haiku`) -- 5x cheaper than Opus	20-40%	Model Selection
3	Use Plan Mode before coding -- prevents wasted iterative cycles	15-25%	Workflow Patterns
4	Add `.claudeignore` -- stop Claude from reading node_modules, dist, lock files	5-15%	Context Optimization
5	Delegate to subagents -- isolate expensive searches from main context	20-40%	Workflow Patterns

Full walkthrough: Getting Started in 5 Minutes

Skills Roadmap

cost-mode is the first skill. More are planned:

Skill	Status	What It Does	Cost Impact
cost-mode	Live	Concise responses, model routing suggestions, session awareness	30-60% cost savings
claudeignore-gen	Planned	Auto-generates .claudeignore based on your project's tech stack	5-15% input reduction
context-compress	Planned	Rewrites your CLAUDE.md to be shorter while keeping all essential info	10-20% input reduction
cache-optimizer	Planned	Detects cache-busting patterns and suggests fixes to maximize prompt cache hits	10-25% input reduction
budget-guard	Planned	Per-session and per-day spending limits with warnings before you blow past them	Prevents overspend

Want to build one? Skills are just SKILL.md files -- see CONTRIBUTING.md and the skills/cost-mode/ directory for the pattern.

Guides

11 deep-dive guides covering every optimization area:

Guide	What You'll Learn
00 - Getting Started	Zero to optimized in 5 minutes -- the essential setup
01 - Understanding Costs	How billing works, what costs the most, where money goes
02 - Context Optimization	Reduce input tokens: CLAUDE.md, .claudeignore, file reads
03 - Model Selection	When to use Opus vs Sonnet vs Haiku (with decision tree)
04 - Workflow Patterns	Plan mode, subagents, commands, batch operations
05 - Team Budgeting	Per-developer budgets, cost tracking, ROI calculation
06 - Access Methods & Pricing	Compare API vs Bedrock vs Vertex AI vs Claude Code pricing
07 - MCP & Agent Cost Impact	MCP server overhead, subagent costs, Agent SDK patterns
08 - Prompt Caching Deep Dive	Cache mechanics, TTL economics, maximizing hit rates, ROI math
09 - Subscription Plan Value	Choose the right plan, maximize allowance, upgrade/downgrade signals
10 - Three-Tier Task Routing	Skip the LLM for Tier 0 tasks, route cheap tasks to Haiku, save Opus for complex work

Also: Visual Diagrams (Mermaid flowcharts) | One-Page Cheatsheet

Templates

Copy-paste configs that are already optimized:

CLAUDE.md: Minimal | Standard | Comprehensive | Monorepo

Settings: Cost-Conscious | Balanced | Performance-First

Commands: /cost-check | /budget-mode | /quick-fix | /optimize

CLI Tools

7 tools for measuring, tracking, and reducing costs. Full tools documentation

Tool	What It Does
Token Estimator	Estimate token count and cost for any file
Usage Analyzer	Find cost hotspots across your sessions
Badge Generator	Grade your project config (A+ to F) from the CLI
MCP Cost Server	In-session cost estimation via MCP
VS Code Extension	Token count and cost in the status bar
GitHub Action	Automated cost audit on PRs
Budget Hooks	Track tool calls, log costs, warn at thresholds

Pricing Reference (verified 2026-06-12)

Model	Input / 1M	Output / 1M	Cache Hit / 1M	5m Cache Write / 1M	1h Cache Write / 1M	Context	Max Output
Fable 5 (most capable)	$10.00	$50.00	$1.00	$12.50	$20.00	1M	128K
Mythos 5 (limited, Glasswing)	$10.00	$50.00	$1.00	$12.50	$20.00	1M	128K
Opus 4.8 (Opus flagship)	$5.00	$25.00	$0.50	$6.25	$10.00	1M	128K
Opus 4.7	$5.00	$25.00	$0.50	$6.25	$10.00	1M	128K
Opus 4.6	$5.00	$25.00	$0.50	$6.25	$10.00	1M	128K
Opus 4.5	$5.00	$25.00	$0.50	$6.25	$10.00	200K	64K
Opus 4.1	$15.00	$75.00	$1.50	$18.75	$30.00	200K	32K
Sonnet 4.6	$3.00	$15.00	$0.30	$3.75	$6.00	1M	64K
Sonnet 4.5	$3.00	$15.00	$0.30	$3.75	$6.00	200K	64K
Haiku 4.5	$1.00	$5.00	$0.10	$1.25	$2.00	200K	64K
Mythos Preview (retires 2026-06-30)	$25.00	$125.00	$2.50	$31.25	$50.00	1M	--

1M context on Fable 5, Mythos 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 bills at standard rates across the full window (no long-context premium). Batch API: 50% off both input and output (up to 300K output via the output-300k-2026-03-24 beta). Fast Mode (Opus 4.8, 4.7, and 4.6, research preview): Opus 4.8 is 2x ($10/$50), Opus 4.7 and 4.6 are 6x ($30/$150); up to 2.5x output tokens/sec. Regional endpoints (Bedrock / Vertex AI / Claude API inference_geo: "us" for 4.6+ models): +10%. Subscriptions: Pro $20/mo (or $200/yr ≈ $16.67/mo, ~17% off), Max 5x $100/mo, Max 20x $200/mo. Web search: $10 per 1,000 searches plus token costs. Web fetch: free beyond token costs. Code execution: free with web search/fetch; otherwise 1,550 free hours/month then $0.05/hour per container. Bash tool: +245 input tokens. Text editor tool: +700 input tokens.

Opus 4.8 / 4.7 tokenizer caveat: The new tokenizer (Opus 4.7 and later) may use up to 35% more tokens for the same fixed text. Effective per-task cost is higher than posted pricing suggests — factor this into budgets, especially when comparing to Opus 4.6 / Sonnet 4.6.

Fast Mode (research preview): Available on Opus 4.8, Opus 4.7, and Opus 4.6 via the fast-mode-2026-02-01 beta header (speed: "fast"). Pricing is per-model: Opus 4.8 = 2x ($10/$50 per MTok), Opus 4.7 / 4.6 = 6x ($30/$150 per MTok) (4.6 Fast Mode is deprecated as of the 4.8 launch). Up to 2.5x more output tokens/second; speed gain is on output tokens/sec, not time-to-first-token. Opus 4.8 Fast Mode is Claude API + Managed Agents only. Not available on Claude Platform on AWS, Batch API, or Priority Tier. Switching speeds invalidates prompt cache. Join the waitlist.

Fable 5 / Mythos 5 (GA 2026-06-09): Anthropic's new Mythos-class tier above Opus, at 2x Opus 4.8's price ($10/$50). Same specs for both: 1M context at standard rates, 128K max output, always-on adaptive thinking (control depth with effort; thinking: disabled not supported), 4.7-generation tokenizer. Fable 5 is GA everywhere (Claude API, Claude Platform on AWS, Bedrock, Vertex AI, Microsoft Foundry) and includes safety classifiers that can decline requests -- a refusal returns HTTP 200 with stop_reason: "refusal", pre-output refusals are not billed, and the beta fallbacks parameter plus fallback credit make retrying on another model cheap. Mythos 5 is the same model without the classifiers, limited to approved Project Glasswing customers. No Fast Mode on either; Batch API supported ($5/$25). Requires 30-day data retention (no zero-data-retention option).

Mythos Preview: superseded by Mythos 5 -- retires 2026-06-30. Was the invite-only defensive-cybersecurity research preview under Project Glasswing.

Looking for older model IDs and pricing? See the Legacy & Retired Models section below for migration context.

Legacy & Retired Models

Reference only -- don't use these for new work. Kept for migration context if you're unwinding code that still pins old model IDs.

Recently retired (requests now fail):

Model	Retired on	Migrate to
Claude Opus 3 (`claude-3-opus-20240229`)	2026-01-05	Opus 4.8
Claude Sonnet 3.7 (`claude-3-7-sonnet-20250219`)	2026-02-19	Sonnet 4.6
Claude Haiku 3.5 (`claude-3-5-haiku-20241022`)	2026-02-19 (still on Bedrock + Vertex AI)	Haiku 4.5
Claude Haiku 3 (`claude-3-haiku-20240307`)	2026-04-20	Haiku 4.5
Claude Sonnet 3.5 v1 / v2, Sonnet 3, Claude 2.x, Claude 1.x, Instant 1.x	2024-2025	See deprecations page

Deprecated, retiring soon:

Model	Retirement date	Migrate to
Claude Sonnet 4 (`claude-sonnet-4-20250514`)	2026-06-15	Sonnet 4.6
Claude Opus 4 (`claude-opus-4-20250514`)	2026-06-15	Opus 4.8
Claude Mythos Preview (`claude-mythos-preview`)	2026-06-30	Mythos 5 (Glasswing)
Claude Opus 4.1 (`claude-opus-4-1-20250805`)	2026-08-05	Opus 4.8

Older snapshots still callable (not retired, but not the headline tier):

Snapshot	Pricing	Context	Why use
Opus 4.7	$5/$25	1M	Previous flagship -- pin if not yet re-tuned for 4.8
Opus 4.6	$5/$25	1M	Pinned workloads / older tokenizer
Opus 4.5	$5/$25	200K	Pinned workloads only
Opus 4.1	$15/$75	200K	Compatibility only -- 3x more expensive
Sonnet 4.5	$3/$15	200K	Pinned workloads only

Authoritative source: platform.claude.com/docs/en/about-claude/model-deprecations.

The cheatsheet has a more detailed legacy table including last-known pricing for every retired tier.

Benchmarks & Case Studies

Task Comparison -- same task, optimized vs not
Model Comparison -- Opus vs Sonnet vs Haiku
Context Size Impact -- how CLAUDE.md size affects cost
Community Leaderboard -- crowdsourced cost-per-task data
Case Studies -- real-world optimization stories

Community

GitHub Discussions -- ask questions, share strategies
Contributing -- from starring the repo to building new skills
Issue Templates -- tips, benchmarks, case studies

Complementary Projects

Project	What It Does	Savings
caveman	Brevity skill -- strips filler from responses	50-75% output
claude-mem	Compresses session context for handoff	Input reduction
claudetop	htop-style monitoring with cost tracking	Visibility

FAQ

How much does Claude Code actually cost?

With Pro ($20/mo or $200/yr ≈ $16.67/mo with annual billing — 17% off), Max 5x ($100/mo), or Max 20x ($200/mo), you get included usage. Heavy users report $3-15/day without optimization, $1-5/day with it. Opus 4.8, 4.7, and 4.6 at $5/$25 are 3x cheaper per token than Opus 4.1 ($15/$75) — but the new tokenizer (Opus 4.7 and later) can use up to 35% more tokens, so the effective gap is closer to ~2x.

Does this apply to the Claude API too?

Context optimization, model selection, and prompt engineering apply to both. The skill and commands are Claude Code-specific.

Will optimization reduce output quality?

No. These strategies eliminate waste (duplicate context, unnecessary file reads, expensive models for simple tasks). Quality stays the same or improves -- less noise means better reasoning.

What's the biggest single change I can make?

Install cost-mode (npx skills add Sagargupta16/claude-cost-optimizer) and switch to Haiku for routine tasks. Combined: 50-70% savings.

Star History

If this repo helped you save money, consider giving it a star!

More AI Developer Tools

If you found this useful, check out my other AI/Claude tools:

Project	Description
claude-code-recipes	50+ copy-paste recipes for Claude Code - commands, subagents, hooks, skills
claude-skills	Custom Claude Code plugin marketplace with dev-workflow, FARM stack, and more
agent-recipes	AI agent workflows for real-world dev tasks - code review, testing, security
ai-git-hooks	AI-powered git hooks - auto-review diffs, generate commit messages, security scanning
mcp-toolkit	Production-ready middleware for MCP servers - auth, caching, rate limiting

License

MIT - use these strategies, templates, and tools however you want.

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
.agents/plugins		.agents/plugins
.claude-plugin		.claude-plugin
.github		.github
benchmarks		benchmarks
case-studies		case-studies
docs		docs
guides		guides
hooks		hooks
plugins/cost-mode		plugins/cost-mode
site		site
skills/cost-mode		skills/cost-mode
templates		templates
tools		tools
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
cheatsheet.md		cheatsheet.md
renovate.json		renovate.json
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Claude Cost Optimizer

Install

What cost-mode Does

Rate your setup

Locally, in 5 seconds (recommended)

Public repos -- web tools

Before vs After

Quick Start (5 Minutes, No Skill Needed)

Skills Roadmap

Guides

Templates

CLI Tools

Pricing Reference (verified 2026-06-12)

Legacy & Retired Models

Benchmarks & Case Studies

Community

Complementary Projects

FAQ

Star History

More AI Developer Tools

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Claude Cost Optimizer

Install

What cost-mode Does

Rate your setup

Locally, in 5 seconds (recommended)

Public repos -- web tools

Before vs After

Quick Start (5 Minutes, No Skill Needed)

Skills Roadmap

Guides

Templates

CLI Tools

Pricing Reference (verified 2026-06-12)

Legacy & Retired Models

Benchmarks & Case Studies

Community

Complementary Projects

FAQ

Star History

More AI Developer Tools

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages