jaswalmohit8-collab/weasel

Weasel

A small operating file for keeping long-running AI agent sessions from degrading.

Capable models can do useful long work. The problem is not usually raw intelligence. The problem is operational drift:

  • the context window gets treated like durable memory
  • old notes get mistaken for current evidence
  • the agent rewrites plans instead of acting
  • loops form quietly
  • safety checks become cages
  • restarts lose the actual next action

Weasel is a small public CLAUDE.md pattern for that problem. It gives an agent a simple contract: act when checks pass, verify the result, keep state compact, and make degradation visible before it compounds.

There is a useful public name for one version of this failure: recognition without arrest. The agent sees the constraint, says the right thing about it, then still ships the wrong action. Weasel is not the research corpus for that pattern. It is the practical side: a small operating file for making that next wrong action harder to take.

This is what the failure often looks like late in a long session, once the context window is heavy:

> I should check the test output before continuing.
> Let me think about the best approach.
> Actually, I should verify the state first.
> The next step would be to update the configuration.

A lot of words. Zero shipped work.

The goal is not to make agents look alive. The goal is to solve a boring problem: keep useful agent work grounded in state, evidence, checkpoints, and measurable action.

  • CLAUDE.md - the copyable operating contract
  • AGENTS.md - compatibility shim for agents that look for AGENTS.md
  • DEMO.md - a two-minute test prompt
  • demo - runtime screenshots

No framework. No server. No hidden system. Just one file you can put in a repo and ask a capable coding agent to follow.

The Bet

The model is not the whole system.

Different models have different strengths. Some are better at synthesis. Some are better at creative width. Some are useful as red-team pressure. Some are cheap and good enough for extraction or formatting.

Those strengths only matter after the agent can operate on the same basic wavelength:

  • use current evidence
  • act when checks pass
  • name the real blocker when checks fail
  • keep state small enough to hand off
  • verify what changed
  • do not turn a safety rule into a hiding place

Weasel is that shared behavior layer. Route by model strength later. First make the agent grounded enough, compact enough, and action-oriented enough to trust with the next step.

Try It in 60 Seconds

  1. Open Claude, Kimi, ChatGPT, or another capable coding model.
  2. Paste the quick-test prompt from DEMO.md.
  3. Give it a real task.

The shift should show up immediately: fewer status paragraphs, more direct action, clearer blockers.

Watch It Run

These demos show the same operating pattern under different capable models.

What a Real Cycle Looks Like

Edge agent state after one completed cycle

The screenshot is not the product. The useful shape is the state packet:

| What appears in the cycle | Why it matters |
| --- | --- |
| Compact balance/state lines | The agent reports live state instead of narrating intent. |
| Open positions/tasks listed clearly | The next action is grounded in current work, not vague context. |
| Integrity gap called out | The agent names uncertainty instead of hiding it. |
| Rule lapse surfaced | The agent reports process drift before it compounds. |
| Named next condition | The loop pauses only with a reason and a next trigger. |

The point is not the labels in one screenshot. The point is the operating shape: compact, honest, actionable. No filler. No "I should." Just what is true, what happened, and what is next.

Why This Exists

Long-running agent sessions tend to fail in repeatable ways:

  • context gets heavy and the agent stops seeing the current task clearly
  • old notes get treated as live evidence
  • plans keep getting rewritten instead of executed
  • safety checks become cages instead of useful guardrails
  • restarts lose the next action
  • "I would do X" replaces doing X

The core rule:

The context window is working memory, not durable memory. Logs are not learning. A rule only matters if it changes the next action.

Weasel is built around that constraint. It does not try to replace the model. It gives the model a smaller surface to stay honest against.

The point is not to make an agent sound more serious. The point is to make the work easier to inspect: what was true, what was done, what was verified, and what still blocks the next action.
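One way to picture the core rule is to treat every note as a claim with an observation time, and to re-check it before acting once it is old enough. The sketch below is illustrative only: the `Note` type, the `needs_recheck` helper, and the five-minute default are assumptions made for this example, not part of CLAUDE.md.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """A remembered claim plus when it was last observed (hypothetical type)."""
    claim: str
    observed_at: float  # seconds since the epoch, e.g. from time.time()


def needs_recheck(note: Note, now: float, max_age_s: float = 300.0) -> bool:
    """Return True when the note is too old to count as live evidence.

    The 300-second default is an arbitrary assumption; pick a threshold
    that matches how fast the underlying state can change.
    """
    return now - note.observed_at > max_age_s
```

A note that fails this test is not wrong. It is just not evidence yet: re-run the check it came from, then act.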

What It Does

CLAUDE.md gives an agent six compact rules:

  1. act after checks
  2. live evidence first
  3. keep state compact
  4. break loops early
  5. learn only what changes behavior
  6. keep safety rails useful

The AGENTS.md file is only a shim. The canonical rules stay in CLAUDE.md so the repo remains simple.
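As a rough illustration of the first rule (act after checks, then verify, and name the blocker when a check fails), a single cycle could be sketched like this. Everything here is hypothetical: `run_cycle`, `CycleResult`, and the callback shapes are assumptions for the example, and Weasel itself ships no code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CycleResult:
    acted: bool
    verified: bool
    blocker: Optional[str]  # None when nothing blocks the next action


def run_cycle(
    checks: dict[str, Callable[[], bool]],
    act: Callable[[], None],
    verify: Callable[[], bool],
) -> CycleResult:
    """Hypothetical single cycle: check, act, verify, or name the blocker."""
    # Run every check against live state and collect the names that fail.
    failed = [name for name, check in checks.items() if not check()]
    if failed:
        # Do not act and do not narrate intent: name the real blocker.
        return CycleResult(False, False, "check failed: " + ", ".join(failed))

    act()

    # An action only counts once its result has been verified.
    if not verify():
        return CycleResult(True, False, "verification failed after action")
    return CycleResult(True, True, None)
```

The design choice worth noting is that a failed check returns a named blocker rather than raising or retrying, so the caller always gets something it can write into a state file.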

How To Use It

Copy CLAUDE.md into a project where you use a coding agent.

Then start the agent with:

Read CLAUDE.md, then continue this task. Keep state compact and act when checks pass.

For long sessions, ask it to maintain a small state file:

current_task:
current_state:
last_action:
last_verification:
open_blockers:
next_action:

The state file is not a diary. It is a handoff surface.
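If you want to read and write that state file from a script, a minimal sketch might look like the following. It assumes one `key: value` pair per line with single-line values, using the six field names above; the `AgentState`, `dump_state`, and `load_state` names are made up for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class AgentState:
    """The six handoff fields from the state file, all single-line strings."""
    current_task: str = ""
    current_state: str = ""
    last_action: str = ""
    last_verification: str = ""
    open_blockers: str = ""
    next_action: str = ""


def dump_state(state: AgentState) -> str:
    """Render the state as 'key: value' lines in a fixed field order."""
    lines = [f"{f.name}: {getattr(state, f.name)}".rstrip() for f in fields(state)]
    return "\n".join(lines) + "\n"


def load_state(text: str) -> AgentState:
    """Parse 'key: value' lines; unknown keys and malformed lines are ignored."""
    known = {f.name for f in fields(AgentState)}
    values = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() in known:
            values[key.strip()] = value.strip()
    return AgentState(**values)
```

Ignoring unknown keys is deliberate here: it keeps the file from quietly growing into a diary.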

What This Is Not

Weasel is not:

  • an agent framework
  • a benchmark
  • a model router
  • a desktop assistant
  • a claim that a model is conscious
  • a replacement for good engineering judgment

It is a small public operating contract for keeping agent work coherent across time.

How This Should Grow

This repo should stay practical. Good contributions look like:

  • a clearer rule that changes the next action
  • a small demo showing before/after behavior
  • a reproducible case where the file prevented drift
  • a case where the file failed and the rule needs tightening
  • a compatibility note for another capable model or agent tool

Bad contributions are bigger theories with no changed behavior.

Public Boundary

This repo is intentionally small. It does not include private logs, account details, platform automation, internal playbooks, API keys, hidden prompts, or deployment-specific rules.

Connect

If the operating file lands for your setup, follow along:

Issues and pull requests are welcome too.

Credits

Created by MJ with Iris and Cody/Codex. Runtime demos recorded under Kimi Code and Claude Code. Public integrity pass by Raven.

License

MIT. Copyright (c) 2026 MJ. See LICENSE.
