Local control plane for model-slot routing, fallbacks, and provider switching across CLI coding tools.
Keep your client speaking in slots. Control what actually answers.
Most CLI coding tools assume stable access to expensive frontier models.
| Situation | The Failure Mode |
|---|---|
| Single provider | One outage, one price hike — your workflow stops |
| Hand-editing configs | Every backend swap requires touching LiteLLM config manually |
| No fallback logic | A dead route has no graceful alternative |
| No routing visibility | You don't know what's actually answering your slots |
| Locked-in clients | claude, codex assume you speak their model names |
"Most routing tools treat the client as the problem. BrokeLLM treats the routing layer as the solution."
BrokeLLM puts a local control plane in front of multiple providers. Clients keep speaking in stable lanes — Claude-style (sonnet, opus, haiku, custom, subagent), Codex-style (gpt54, gpt54mini, gpt53codex, gpt52codex, gpt52, gpt51codexmax, gpt51codexmini) — while you control which provider and model actually answers.
One stable local interface. Backends become swappable.
BrokeLLM is a governed execution system, not just a proxy, router, or harness in isolation.
Its runtime is composed of multiple cooperating layers:
- Execution: runs requests and tool-facing flows against the selected model/provider path
- Enforcement: applies boundary checks, policy rules, and runtime constraints
- Adjudication: separates worker, verifier, and adversary review paths
- Verdict System: resolves outcomes through formal verdict algebra
- Evidence Binding: anchors decisions to logs, artifacts, state, and execution traces
- Orchestration: controls retries, escalation, lifecycle flow, and route transitions
The diagrams below are views into those layers. No single diagram is the whole system.
BrokeLLM is not merely:
- a proxy
- a router
- a simple harness
- a wrapper around model calls
Those are mechanisms inside the runtime, not the runtime's full identity.
git clone https://github.com/B-A-M-N/BrokeLLM.git
cd BrokeLLM
./install.sh
cp .env.template .env
broke doctor
broke listNew to AI-assisted tools? See
HUMAN_ONLY/FOR_BEGINNERS.mdfor a beginner-facing guide.
| Client | Status | Notes |
|---|---|---|
| Claude CLI | ✅ Verified | Works against the local LiteLLM gateway |
| Codex CLI | ✅ Verified | Custom provider config via Responses API wire format |
| Gemini CLI | Raw endpoint works; CLI path not yet reliable |
BrokeLLM is built around three ideas:
The point: clients keep speaking in slots. You control what actually answers.
- Claude-style lanes:
sonnet,opus,haiku,custom,subagent - Codex-style lanes:
gpt54,gpt54mini,gpt53codex,gpt52codex,gpt52,gpt51codexmax,gpt51codexmini - Interactive route switching with
broke swap - Team and profile presets for reusable configurations
- Fallback chain support per slot
- Drift warnings for floating aliases
- Health, validation, explain, route, metrics, and probe commands
- Focused regression tests for control-plane behavior
- Single canonical truth boundary — no split-brain state
- BrokeLLM runtime proxy on
http://localhost:4000 - LiteLLM execution layer behind the proxy on an internal port
- Unified local entrypoint for
claudeandcodex - Simple one-command install
- Multi-key pools per provider using
*_API_KEY,*_API_KEY_2,*_API_KEY_3 - Live key rotation policy for new requests without restarting the client session
- Per-key health state:
healthy,cooldown,blocked,auth_failed - Atomic policy/state writes with generation tracking
- Reason-coded rotation logs for
429,401/403, timeout, and manual disable - Applies to both Claude and Codex, because both route through the same BrokeLLM proxy on
:4000
- Per-lane live model policy across Claude, Codex, and semantic lane families
- New requests can switch models without restarting the client session
- Per-model health state:
healthy,cooldown,blocked,incompatible - Model policy changes are generation-tracked and applied only to new requests
- In-flight requests stay on the model/key they already started with
- Harness mode is separate from
BROKE_MODE=cli|router - Supported profiles:
off,throughput,balanced,high_assurance - Verdict algebra:
ACCEPT,ACCEPT_WITH_WARNINGS,RETRY_NARROW,RETRY_BROAD,ESCALATE,BLOCK - Blocks integrity and boundary violations
- Retries weak or incomplete work
- Warns on mediocre but still operational work
- Persists last verdict and categories for later inspection
- Caches stable role doctrine, normalized evidence packets, and repeat-review verdicts to reduce prompt drift and no-op re-review churn
broke harness runturns BrokeLLM into the launch boundary for a mediated CLI-agent run with per-run ledgering and shimmed command observation
broke acp-serverexposes a Broke-owned ACP-compatible stdio control surface for harness mode by default- Gemini can sit behind native ACP-style transport semantics
- Claude runs as a persistent
stream-jsonCLI session normalized behind the Broke ACP facade - Codex runs as a persistent
app-serverJSON-RPC session normalized behind the Broke ACP facade - OpenRouter, Groq, Cerebras, and local models run as headless logical ACP lanes with persistent Broke-owned session state and ephemeral provider calls
- Session truth, cancellation, and lane-result schema stay Broke-owned even when provider runtimes differ
- BrokeLLM proxy binds to
127.0.0.1:4000 - LiteLLM binds to
127.0.0.1:4001 - Launch preflight enforces the pinned LiteLLM version, lockfile drift checks,
.pthallowlisting,PYTHONPATHhygiene,600runtime file permissions, and harness shim integrity before BrokeLLM starts or launches a client - Client requests require a local BrokeLLM token, except local health endpoints
- Proxy-to-LiteLLM traffic uses a separate internal master token
- Runtime policy/state/token files are written with restricted local permissions and shared file locking
- Claude, Codex, and Gemini launch through a sanitized client env to reduce stale auth/base-url leakage
- Provider secrets are injected only into the specific child processes that need them; BrokeLLM no longer globally exports
.env - Launched client sessions write an audit trail to
.launch_audit.logshowing the injected env surface and secret counts, never secret values or secret variable names - The proxy fails closed on unknown routed model ids with
400 invalid_model - The proxy rate-limits repeated invalid auth attempts from the same client address
- Top-level dependency input is pinned exactly in
requirements.txt - Full transitive lock lives in
requirements.lock - Installs use
pip --require-hashes - Optional local wheel mirror lives in
vendor/wheels make lockrefreshes the hash-verified lock filemake mirror-wheelsrefreshes the local wheel mirror
normalkeeps compatibility highest while still using targeted child-process env injectionhardenedadds the same isolated env launch with stricter process umaskstrictruns launched clients underbwrapwith an explicit filesystem/environment boundary and no arbitrary outbound network- In
strict, Claude and Codex get a local-only bridge back to the BrokeLLM proxy over a Unix socket; Gemini is blocked because it still needs direct network access - Strict-mode home binds are narrowed to client-specific state directories instead of broad
$HOME/.config/$HOME/.cacheexposure, and the sandbox only mounts BrokeLLM'sbin/helpers rather than the full BrokeLLM repo - Sandbox profiles apply to launched clients, not the internal BrokeLLM proxy itself
| Provider | Notes |
|---|---|
| OpenRouter | Recommended; includes free-tier models |
| Groq | Fast inference |
| Cerebras | High-throughput option |
| GitHub Models | GPT-4o-mini and others |
| Gemini | Direct endpoint verified; CLI path experimental |
| Hugging Face | Free-tier access |
LiteLLM itself supports far more providers. BrokeLLM is an intentionally smaller, opinionated layer on top.
All health, drift, and routing state is resolved through a single canonical normalization boundary in the control plane.
doctor, route, explain, and validate do not maintain separate views of backend identity. They consume the same shared resolver — so the tool always reports one consistent truth about:
- which backend a slot maps to
- whether that backend is healthy
- whether the route is pinned or floating
- whether fallback state is relevant
If those surfaces disagree, the control plane stops being trustworthy. That design rule matters more than any single command.
See ARCHITECTURE.md for the full truth-boundary specification.
broke # start the full wrapper
broke start # start only the gateway
broke stop
broke restart
broke statusbroke list
broke models
broke swap
broke swap <team>
broke fallback <slot> # interactive backend picker
broke fallback <slot> <fb1> [fb2...]
broke fallback policybroke team save <name> [cli|route]
broke team load <name>
broke team list
broke team delete <name>
broke team fallback <team> <slot> <fb1> [fb2...]
broke team access <team> [--slots sonnet,gpt54] [--rpm N] [--tpm N]
0means unlimited forrpmandtpm. Omitted values mean "leave unchanged."
broke profile new <name> <team> [--desc "text"] [--slots sonnet,gpt54] [--rpm N] [--tpm N]
broke profile load <name>
broke profile list
broke profile delete <name>broke doctor
broke validate
broke explain <slot>
broke route <slot>
broke metrics
broke probe <slot>
broke key-policy
broke key-state
broke model-policy
broke model-state
broke sandbox
broke harnessbroke key-policy
broke key-policy openrouter --mode round_robin --cooldown 45 --retries 2 --order OPENROUTER_API_KEY_2,OPENROUTER_API_KEY
broke key-state
broke key-state set OPENROUTER_API_KEY_2 blockedPolicy changes apply to new requests without forcing you to exit the current client session. In-flight requests keep using the key they already started with.
broke model-policy
broke model-policy gpt54 --mode priority_order --cooldown 60 --retries 2 --order OR/Qwen-3.6+,Groq/Kimi-K2
broke model-state
broke model-state set "Cerebras/GPT-OSS-120B" cooldownModel policy changes apply to new requests without forcing you to exit the current client session. In-flight requests keep using the model they already started with.
broke sandbox
broke sandbox set hardened
broke sandbox set strictstrict is intentionally optional. It is more isolated, but it can break clients that expect broader filesystem or session access.
Network policy by profile:
normal→ unrestrictedhardened→ unrestrictedstrict→ local-only for Claude/Codex, denied for Gemini
broke harness
broke harness checklist
broke harness run --policy balanced -- --help
broke harness set throughput
broke harness set balanced
broke harness set high_assurance
broke harness evaluate --worker ACCEPT --verifier RETRY_BROAD --adversary ACCEPT_WITH_WARNINGS --risk normal --retries 0
broke harness evaluate --task "fix fallback routing" --diff "bin/_mapping.py changed" --tests "python3 -m unittest"Harness mode is orthogonal to cli vs router mode. It does not replace routing. It adds a runtime decision layer on top of deterministic checks and optional worker/verifier/adversary verdict inputs.
The harness now persists review-lane control-plane state, not just final verdict summaries:
- durable
worker/verifier/adversarylane records - logical lane session ids independent of backend runtime lifetime
- typed evidence artifacts for task, diff, tests, commands, policy events, and retry history
- explicit observation vs inference separation in evidence packets
- durable per-lane verdict contributions
- degraded review-lane policy as part of verdict algebra
Use broke harness checklist to inspect the current implementation checklist from the CLI, or open broke dashboard to inspect lane health, checklist progress, and last lane contributions visually.
broke harness run is the first mediated-runtime slice. It registers a run_id, builds a per-run env plan, prepends harness shims for key tools, writes an append-only event ledger under .runtime/harness/<run_id>/events.jsonl, and opens a completion checkpoint when the launched client exits.
Spec docs:
- Harness Core
- Harness Code Profile
- Harness GSD Overlay
- Harness Lane Checklist
- Harness ACP Lane Spec
- Harness ACP Hardening
Worker lane options:
- default:
broke_routerwithbroke_managedcredentials - elevated option:
provider_directwithprovider_managedcredentials for the worker lane - Gemini surface-direct option:
client_directwithprovider_managedcredentials for harnessing Gemini CLI around BrokeLLM rather than through it
Example:
broke harness run --worker-route provider_direct --elevated --provider claude -- --help
broke harness run --worker-route client_direct --provider gemini -- --helpProvider-direct worker runs are explicit on purpose. They are treated as authority expansion and are denied unless --elevated is present.
broke snapshot save
broke snapshot list
broke snapshot restore <id>broke freeze # toggle freeze
broke freeze on # block swap-style state changes
broke freeze offbroke provider
broke provider list
broke provider swap [claude|codex|gemini]broke doctor
broke explain gpt54
broke route gpt54Use these together to isolate different failure classes:
| Command | What It Tells You |
|---|---|
broke doctor |
Whether the gateway and upstream backends are healthy |
broke explain gpt54 |
Configured slot, backend, health view, and fallback chain |
broke route gpt54 |
What would actually be selected for that slot right now |
This separates: provider failure / bad mapping state / access-policy blocking / fallback activation risk.
Route gpt54 through one provider lane, haiku through another, inspect, save:
broke list
broke explain gpt54
broke route gpt54
broke team save workCreate a restricted profile for a production app:
broke profile new myapp work --desc "Production app" --slots gpt54 --rpm 30
broke connect --profile myappProgrammatic clients can select a profile explicitly with X-Broke-Profile: myapp when using the global gateway token. Profile-bound tokens created with broke connect --profile ... are pinned to that profile and reject mismatched X-Broke-Profile overrides.
Codex — verified on April 3, 2026 using a custom provider configuration over the Responses API wire format:
env BROKE_DUMMY=dummy \
codex exec \
-c 'model_provider="broke"' \
-c 'model_providers.broke={name="BrokeLLM",base_url="http://localhost:4000/v1",env_key="BROKE_DUMMY",wire_api="responses"}' \
-m 'GitHub/GPT-4o-mini' --skip-git-repo-check --json \
"Respond with exactly OK and nothing else."The previous
OPENAI_BASE_URLenv override approach was not reliable for Codex and is no longer the recommended path.
Gemini — BrokeLLM's Google-compatible HTTP endpoint works. The Gemini CLI itself is not reliable through BrokeLLM: even with GOOGLE_GEMINI_BASE_URL and GEMINI_API_KEY_AUTH_MECHANISM set, the CLI internally routes to Gemini/2.0-Flash which BrokeLLM does not expose. Until that changes upstream, Gemini CLI compatibility should not be claimed.
make verifyCurrent regression coverage:
- Launch preflight integrity enforcement
- Health agreement across
doctor,route, andexplain - Pinned-state persistence during swap
- Floating alias drift warnings
- Zero-value limit semantics
- Installer interpreter consistency
- Env-name consistency
- Multi-key deployment generation across providers
- Live key-policy generation updates
- Live key-state persistence and cooldown behavior
- Live model-policy generation updates
- Live model-state persistence and cooldown behavior
- Local proxy auth and internal upstream auth enforcement
- Strict sandbox local-only bridge and launch audit hooks
- Python 3.8+
curllsof- One or more provider API keys
claudeorcodexCLI installed
./install.shThe installer will:
- Verify Python
- Install locked LiteLLM dependencies from
requirements.lock - Link
brokeinto yourPATH - Create
.envfrom.env.templateif needed - Initialize the default routing files
Optional supply-chain modes:
./install.sh --mirror-wheels
./install.sh --use-wheel-mirror
./install.sh --offline--mirror-wheelsdownloads the exact locked artifacts intovendor/wheels--use-wheel-mirrorprefers the local wheel mirror during install--offlineinstalls only fromvendor/wheels
cp .env.template .envFill in whichever keys you plan to use:
OPENROUTER_API_KEY=
OPENROUTER_API_KEY_2=
GROQ_API_KEY=
GROQ_API_KEY_2=
CEREBRAS_API_KEY=
CEREBRAS_API_KEY_2=
GITHUB_TOKEN=
GITHUB_TOKEN_2=
GEMINI_API_KEY=
GEMINI_API_KEY_2=
HF_TOKEN=
HF_TOKEN_2=You do not need every key — only the providers you actively route to need valid credentials.
Optional security overrides:
BROKE_CLIENT_TOKEN=
BROKE_INTERNAL_MASTER_KEY=If you leave those unset, BrokeLLM generates local tokens automatically and stores them in restricted local files.
Dependency integrity files:
Free-tier models and routes can change availability, latency, quota behavior, or backing model behavior without notice.
For a more stable setup:
- Prefer pinned model entries over floating aliases
- Define fallback chains for important slots
- Save known-good mappings as teams
- Run
broke validateandbroke doctorbefore relying on a route
| Situation | Good Fit? |
|---|---|
| You want cheaper or mixed-provider routing behind a stable local interface | ✅ |
| You need deterministic slot routing for CLI AI tools | ✅ |
| You care about health, drift, and explainability — not blind proxying | ✅ |
| You want to preserve the client-facing slot experience while changing backends freely | ✅ |
| You only use one provider and don't care about routing behavior | ❌ |
| You want fully managed SaaS simplicity instead of a local control plane | ❌ |
| You don't need observability, fallback chains, or slot abstraction | ❌ |
Most CLI coding tools assume you have stable access to expensive frontier models.
I did not.
BrokeLLM exists because I needed a way to keep using those tools without being locked into a single provider or cost structure. The goal was not to emulate a provider. The goal was to create a local control plane that lets the client experience stay the same while the backend becomes flexible, cheaper, and under user control.
I am a single father and primary caregiver to a young child with autism, and I am currently unemployed. This project is for anyone else in a similar situation who still wants to write code, experiment, and keep moving.
Control-plane behavior lives in bin/_mapping.py.
CLI entrypoint is bin/broke.
Truth-boundary specification is in ARCHITECTURE.md.
Issues and improvements are welcome, especially around:
- Harness runtime contracts — tighten ACP lane behavior, PTY supervision, and adapter truth boundaries
- Execution hardening — improve shim integrity, sandboxing, and hostile-runtime resistance
- Operator visibility — better dashboard views, lane health, and run-channel observability
- Cross-provider mediation — Claude, Codex, and Gemini behavior parity behind the Broke harness facade
- Documentation — harness architecture, ACP semantics, and operator workflows
Current milestone: harness-oriented execution boundary and ACP/runtime mediation.
Suggested next steps:
- stabilize the harness runtime contract across Claude, Codex, and Gemini
- harden PTY supervision, lane health, and ACP adapter behavior
- keep documentation aligned with the harness-first development direction
MIT — see LICENSE.
LiteLLM is a separate upstream project with its own licensing and enterprise/commercial components. Review upstream terms independently if you distribute or extend those parts.
BrokeLLM is a personal open-source tool. It is not affiliated with Anthropic, OpenAI, Google, GitHub, Groq, Cerebras, Hugging Face, or BerriAI.
You are responsible for your API usage and billing, your provider terms, your local environment security, and pinning and reviewing dependency versions appropriately.
Keep your client speaking in slots. Control what actually answers. ⚡








