A small operating file for keeping long-running AI agent sessions from degrading.
Capable models can do useful long work. The problem is not usually raw intelligence. The problem is operational drift:
- the context window gets treated like durable memory
- old notes get mistaken for current evidence
- the agent rewrites plans instead of acting
- loops form quietly
- safety checks become cages
- restarts lose the actual next action
Weasel is a small public CLAUDE.md pattern for that problem. It gives an agent a simple contract: act when checks pass, verify the result, keep state compact, and make degradation visible before it compounds.
There is a useful public name for one version of this failure: recognition without arrest. The agent sees the constraint, says the right thing about it, then still ships the wrong action. Weasel is not the research corpus for that pattern. It is the practical side: a small operating file for making that next wrong action harder to take.
This is what the failure often looks like after a long context window:
> I should check the test output before continuing.
> Let me think about the best approach.
> Actually, I should verify the state first.
> The next step would be to update the configuration.
A lot of words. Zero shipped work.
The goal is not to make agents look alive. The goal is to solve a boring problem: keep useful agent work grounded in state, evidence, checkpoints, and measurable action.
- CLAUDE.md - the copyable operating contract
- AGENTS.md - compatibility shim for agents that look for
AGENTS.md - DEMO.md - a two-minute test prompt
- demo - runtime screenshots
No framework. No server. No hidden system. Just one file you can put in a repo and ask a capable coding agent to follow.
The model is not the whole system.
Different models have different strengths. Some are better at synthesis. Some are better at creative width. Some are useful as red-team pressure. Some are cheap and good enough for extraction or formatting.
Those strengths only matter after the agent can operate on the same basic wavelength:
- use current evidence
- act when checks pass
- name the real blocker when checks fail
- keep state small enough to hand off
- verify what changed
- do not turn a safety rule into a hiding place
Weasel is that shared behavior layer. Route by model strength later. First make the agent grounded enough, compact enough, and action-oriented enough to trust with the next step.
- Open Claude, Kimi, ChatGPT, or another capable coding model.
- Paste the quick-test prompt from DEMO.md.
- Give it a real task.
The shift should show up immediately: fewer status paragraphs, more direct action, clearer blockers.
These demos show the same operating pattern under different capable models.
- 60-second demo: Edge on Kimi runtime demo
- Cross-model demo: same operating file across Kimi and Claude
The screenshot is not the product. The useful shape is the state packet:
| What appears in the cycle | Why it matters |
|---|---|
| Compact balance/state lines | The agent reports live state instead of narrating intent. |
| Open positions/tasks listed clearly | The next action is grounded in current work, not vague context. |
| Integrity gap called out | The agent names uncertainty instead of hiding it. |
| Rule lapse surfaced | The agent reports process drift before it compounds. |
| Named next condition | The loop pauses only with a reason and a next trigger. |
The point is not the labels in one screenshot. The point is the operating shape: compact, honest, actionable. No filler. No "I should." Just what is true, what happened, and what is next.
Long-running agent sessions tend to fail in repeatable ways:
- context gets heavy and the agent stops seeing the current task clearly
- old notes get treated as live evidence
- plans keep getting rewritten instead of executed
- safety checks become cages instead of useful guardrails
- restarts lose the next action
- "I would do X" replaces doing X
The core rule:
The context window is working memory, not durable memory. Logs are not learning. A rule only matters if it changes the next action.
Weasel is built around that constraint. It does not try to replace the model. It gives the model a smaller surface to stay honest against.
The point is not to make an agent sound more serious. The point is to make the work easier to inspect: what was true, what was done, what was verified, and what still blocks the next action.
CLAUDE.md gives an agent six compact rules:
- act after checks
- live evidence first
- keep state compact
- break loops early
- learn only what changes behavior
- keep safety rails useful
The AGENTS.md file is only a shim. The canonical rules stay in CLAUDE.md so the repo remains simple.
Copy CLAUDE.md into a project where you use a coding agent.
Then start the agent with:
Read CLAUDE.md, then continue this task. Keep state compact and act when checks pass.
For long sessions, ask it to maintain a small state file:
current_task:
current_state:
last_action:
last_verification:
open_blockers:
next_action:The state file is not a diary. It is a handoff surface.
Weasel is not:
- an agent framework
- a benchmark
- a model router
- a desktop assistant
- a claim that a model is conscious
- a replacement for good engineering judgment
It is a small public operating contract for keeping agent work coherent across time.
This repo should stay practical. Good contributions look like:
- a clearer rule that changes the next action
- a small demo showing before/after behavior
- a reproducible case where the file prevented drift
- a case where the file failed and the rule needs tightening
- a compatibility note for another capable model or agent tool
Bad contributions are bigger theories with no changed behavior.
This repo is intentionally small. It does not include private logs, account details, platform automation, internal playbooks, API keys, hidden prompts, or deployment-specific rules.
If the operating file lands for your setup, follow along:
- X: @MohitJaswa27
- Reddit: u/Mother-Grapefruit-45 and r/SituationBrief
- Discord: The Brief
Issues and pull requests are welcome too.
Created by MJ with Iris and Cody/Codex. Runtime demos recorded under Kimi Code and Claude Code. Public integrity pass by Raven.
MIT. Copyright (c) 2026 MJ. See LICENSE.
