Weasel

A small operating file for keeping long-running AI agent sessions from degrading.

Capable models can do useful long work. The problem is not usually raw intelligence. The problem is operational drift:

the context window gets treated like durable memory
old notes get mistaken for current evidence
the agent rewrites plans instead of acting
loops form quietly
safety checks become cages
restarts lose the actual next action

Weasel is a small public CLAUDE.md pattern for that problem. It gives an agent a simple contract: act when checks pass, verify the result, keep state compact, and make degradation visible before it compounds.

There is a useful public name for one version of this failure: recognition without arrest. The agent sees the constraint, says the right thing about it, then still ships the wrong action. Weasel is not the research corpus for that pattern. It is the practical side: a small operating file for making that next wrong action harder to take.

This is what the failure often looks like after a long context window:

> I should check the test output before continuing.
> Let me think about the best approach.
> Actually, I should verify the state first.
> The next step would be to update the configuration.

A lot of words. Zero shipped work.

The goal is not to make agents look alive. The goal is to solve a boring problem: keep useful agent work grounded in state, evidence, checkpoints, and measurable action.

CLAUDE.md - the copyable operating contract
AGENTS.md - compatibility shim for agents that look for AGENTS.md
DEMO.md - a two-minute test prompt
demo - runtime screenshots

No framework. No server. No hidden system. Just one file you can put in a repo and ask a capable coding agent to follow.

The Bet

The model is not the whole system.

Different models have different strengths. Some are better at synthesis. Some are better at creative width. Some are useful as red-team pressure. Some are cheap and good enough for extraction or formatting.

Those strengths only matter after the agent can operate on the same basic wavelength:

use current evidence
act when checks pass
name the real blocker when checks fail
keep state small enough to hand off
verify what changed
do not turn a safety rule into a hiding place

Weasel is that shared behavior layer. Route by model strength later. First make the agent grounded enough, compact enough, and action-oriented enough to trust with the next step.

Try It in 60 Seconds

Open Claude, Kimi, ChatGPT, or another capable coding model.
Paste the quick-test prompt from DEMO.md.
Give it a real task.

The shift should show up immediately: fewer status paragraphs, more direct action, clearer blockers.

Watch It Run

These demos show the same operating pattern under different capable models.

60-second demo: Edge on Kimi runtime demo
Cross-model demo: same operating file across Kimi and Claude

What a Real Cycle Looks Like

The screenshot is not the product. The useful shape is the state packet:

What appears in the cycle	Why it matters
Compact balance/state lines	The agent reports live state instead of narrating intent.
Open positions/tasks listed clearly	The next action is grounded in current work, not vague context.
Integrity gap called out	The agent names uncertainty instead of hiding it.
Rule lapse surfaced	The agent reports process drift before it compounds.
Named next condition	The loop pauses only with a reason and a next trigger.

The point is not the labels in one screenshot. The point is the operating shape: compact, honest, actionable. No filler. No "I should." Just what is true, what happened, and what is next.

Why This Exists

Long-running agent sessions tend to fail in repeatable ways:

context gets heavy and the agent stops seeing the current task clearly
old notes get treated as live evidence
plans keep getting rewritten instead of executed
safety checks become cages instead of useful guardrails
restarts lose the next action
"I would do X" replaces doing X

The core rule:

The context window is working memory, not durable memory. Logs are not learning. A rule only matters if it changes the next action.

Weasel is built around that constraint. It does not try to replace the model. It gives the model a smaller surface to stay honest against.

The point is not to make an agent sound more serious. The point is to make the work easier to inspect: what was true, what was done, what was verified, and what still blocks the next action.

What It Does

CLAUDE.md gives an agent six compact rules:

act after checks
live evidence first
keep state compact
break loops early
learn only what changes behavior
keep safety rails useful

The AGENTS.md file is only a shim. The canonical rules stay in CLAUDE.md so the repo remains simple.

How To Use It

Copy CLAUDE.md into a project where you use a coding agent.

Then start the agent with:

Read CLAUDE.md, then continue this task. Keep state compact and act when checks pass.

For long sessions, ask it to maintain a small state file:

current_task:
current_state:
last_action:
last_verification:
open_blockers:
next_action:

The state file is not a diary. It is a handoff surface.

What This Is Not

Weasel is not:

an agent framework
a benchmark
a model router
a desktop assistant
a claim that a model is conscious
a replacement for good engineering judgment

It is a small public operating contract for keeping agent work coherent across time.

How This Should Grow

This repo should stay practical. Good contributions look like:

a clearer rule that changes the next action
a small demo showing before/after behavior
a reproducible case where the file prevented drift
a case where the file failed and the rule needs tightening
a compatibility note for another capable model or agent tool

Bad contributions are bigger theories with no changed behavior.

Public Boundary

This repo is intentionally small. It does not include private logs, account details, platform automation, internal playbooks, API keys, hidden prompts, or deployment-specific rules.

Connect

If the operating file lands for your setup, follow along:

X: @MohitJaswa27
Reddit: u/Mother-Grapefruit-45 and r/SituationBrief
Discord: The Brief

Issues and pull requests are welcome too.

Credits

Created by MJ with Iris and Cody/Codex. Runtime demos recorded under Kimi Code and Claude Code. Public integrity pass by Raven.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Weasel

The Bet

Try It in 60 Seconds

Watch It Run

What a Real Cycle Looks Like

Why This Exists

What It Does

How To Use It

What This Is Not

How This Should Grow

Public Boundary

Connect

Credits

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
demo		demo
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
DEMO.md		DEMO.md
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Weasel

The Bet

Try It in 60 Seconds

Watch It Run

What a Real Cycle Looks Like

Why This Exists

What It Does

How To Use It

What This Is Not

How This Should Grow

Public Boundary

Connect

Credits

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages