agent-team

A configurable, Bayesian-aware multi-agent dev team, built on sagent + the Claude CLI. The default profile is a team for building and debugging applications that use BlackJAX — but the workspace, roster, models, and prompts are all config, so you can point it at any project.

Status: pre-release scaffold (milestone 2, in progress). Extracted from the BlackJAX team's internal channel and being generalized + decoupled. Not yet published; the quickstart below is provisional. See DECOUPLING.md for the remaining extraction work.

Two ways to use it

| Surface | What | Best for |
| --- | --- | --- |
| Agent pack (Claude Code native) | a Bayesian-aware set of agents (tl, swe, statistician, …) + methodology checklists dropped into your repo's .claude/. One entry agent fans out to subagents. No server. | "finalize this PR", "debug why this won't sample" |
| Channel (sagent server) | a persistent multi-agent team with a web UI, peer messaging, and a shared worklog. Configurable roster: the full team, or a single curated agent (e.g. just the statistician) as a debug surface. | ongoing coordination; a dedicated debugging surface |

Both share the same roles, methodology, and workspace config; only the runtime differs.

Requirements

  • The claude CLI, authenticated — the framework shells out to claude --print.
  • Python ≥ 3.12 and uv.
  • Loopback-only HTTP; write-capable roles are sandboxed.
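The first two requirements can be checked before launching. The following is a minimal sketch, not part of agent-team: the function name `check_requirements` is hypothetical, and it only tests that `claude` and `uv` are on PATH and that the interpreter is new enough. It cannot tell whether the claude CLI is authenticated.

```python
import shutil
import sys


def check_requirements() -> dict[str, bool]:
    """Report which of the stated prerequisites are visible from this process."""
    return {
        # Python >= 3.12 is required by the README.
        "python>=3.12": sys.version_info >= (3, 12),
        # Presence on PATH only; authentication is not verified here.
        "claude": shutil.which("claude") is not None,
        "uv": shutil.which("uv") is not None,
    }
```

Calling `check_requirements()` returns a mapping you can print or assert on in a setup script.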

Quickstart (channel) — provisional

No clone — one line (runs the server straight from GitHub; sagent comes along via the direct-URL dep, no extra flags):

uvx --from git+https://github.com/blackjax-devs/agent-team.git agent-team serve --port 8767
# UI at http://127.0.0.1:8767/
# persistent install instead:  uv tool install git+https://github.com/blackjax-devs/agent-team.git

From a clone (for development or a custom profile):

uv sync                        # installs the package + the `agent-team` console script
agent-team serve --port 8767   # or: python -m agent_team.serve --port 8767

Run it in tmux. However you launch it, serve is a foreground, long-running process — start it inside a tmux (or screen) session so it survives terminal/SSH disconnects and you can detach (Ctrl-b then d) and reattach (tmux attach -t agent-team) without killing the team:

tmux new -s agent-team
# …run the serve command above, then Ctrl-b d to detach; Ctrl-C in the window stops it

The bundled default profile ships inside the package, so it works out of the box from any cwd. To use a custom profile, set AGENT_TEAM_PROFILE_DIR to your profile dir. Per-session audit data (main.jsonl, sessions/) lands in the launch cwd by default; set SAGENT_DATA_DIR to relocate it.
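If you launch the server from a script, the two environment variables above can be set per invocation rather than globally. This is a sketch under assumptions: `build_serve_invocation` is a hypothetical helper, and it only assembles the command line and environment shown in this README without running anything.

```python
import os


def build_serve_invocation(port=8767, profile_dir=None, data_dir=None, base_env=None):
    """Return (argv, env) for `agent-team serve`; nothing is executed here."""
    env = dict(os.environ if base_env is None else base_env)
    if profile_dir is not None:
        # Custom profile; when unset, the bundled default profile is used.
        env["AGENT_TEAM_PROFILE_DIR"] = str(profile_dir)
    if data_dir is not None:
        # Relocates main.jsonl and sessions/ away from the launch cwd.
        env["SAGENT_DATA_DIR"] = str(data_dir)
    argv = ["agent-team", "serve", "--port", str(port)]
    return argv, env
```

The result can be passed to `subprocess.run(argv, env=env)`, which blocks for as long as the server runs.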

agent-team --help prints usage and exits without booting the server.

Stopping the server

serve shuts down gracefully on SIGINT/SIGTERM: the HTTP server stops first, then each agent's runtime drains and persists its tape, so no work is lost and the next launch resumes cleanly. Prefer this over a hard kill.

# In tmux (recommended) — in the serve window press Ctrl-C, or send it without
# attaching (the scriptable version of the same graceful stop):
tmux send-keys -t agent-team C-c

# Not in tmux — SIGTERM the process listening on the port (default 8767), but
# ONLY if one is found (so it's a no-op, not an error, when already stopped):
pid=$(ss -ltnp 2>/dev/null | grep ':8767 ' | grep -oP 'pid=\K[0-9]+' | head -1)
if [ -n "$pid" ]; then kill -TERM "$pid"; else echo "no agent-team server on :8767"; fi

Don't pkill -f 'agent-team serve': the pattern matches your own shell/script (its command line contains the string), so it can signal the wrong process.

Give it a few seconds to drain. If it doesn't exit — a wedged agent can block the drain — escalate to SIGKILL (kill -KILL <pid>); tapes are persisted incrementally, so little is lost. Avoid tmux kill-session as a first move: it hard-kills the process and skips the graceful drain.
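The escalation described above (SIGTERM, wait for the drain, SIGKILL only if it hangs) can be sketched for the case where a Python script started the server as a child process. `stop_gracefully` is a hypothetical helper and the 10-second default is an assumption standing in for "a few seconds"; POSIX signals are assumed.

```python
import signal
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 10.0) -> str:
    """Send SIGTERM, wait for the drain, and fall back to SIGKILL on timeout.

    Returns "term" if the process exited within the grace period, else "kill".
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "term"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: skips the graceful drain
        proc.wait()  # reap the child so it does not linger as a zombie
        return "kill"
```

A return value of "kill" is a hint that an agent was wedged and that the drain did not complete.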

Restart, recovery & large tapes

The channel persists each agent's full conversation (its sagent tape) to SAGENT_DATA_DIR/sessions/<role>.sagent/session.jsonl, so stopping and relaunching serve resumes every agent where it left off.

After any restart, give the team a few minutes to settle, and don't touch anything in the meantime. On boot each agent re-feeds its resumed tape to the claude CLI and re-establishes its MCP bridge. This warmup can run for up to ~90s, after which agents recover on their first real turn (≈2–3 min total). Don't send work or trigger another restart/slim during this window: acting on a still-settling agent can crash its compaction and wedge it.

Resume-slim (the slim mode — materialize's fallback). The resumed tape is trimmed to roughly its last AGENT_TEAM_RESUME_KEEP messages (default 120), snapped to a clean turn boundary, before it is re-fed to the CLI. This bounds the first-turn re-feed so a large tape (a long-running coordinator can reach megabytes) can't stall the MCP handshake on boot. It is deterministic (no model call), the on-disk history is never rewritten, and small tapes are left untouched. Set AGENT_TEAM_RESUME_KEEP=0 to disable (resume the full tape), or raise it to keep more context. A server restart is therefore the reliable recovery for an agent whose tape has grown too large to re-feed.
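The trimming rule can be illustrated with a small sketch. It is not the project's implementation: the record shape (a dict with a "role" key) and the definition of a turn boundary (a message whose role is "user") are assumptions made for illustration, and `slim_tape` is a hypothetical name.

```python
def slim_tape(messages: list[dict], keep: int = 120) -> list[dict]:
    """Keep roughly the last `keep` messages, starting at a turn boundary.

    Assumption: a turn begins at a message whose "role" is "user".
    The input list is never modified, mirroring the untouched on-disk history.
    """
    if keep <= 0 or len(messages) <= keep:
        return list(messages)  # disabled (0) or already small: leave as-is
    start = len(messages) - keep
    # Snap forward so the kept window never starts mid-turn.
    for i in range(start, len(messages)):
        if messages[i]["role"] == "user":
            return messages[i:]
    # No boundary inside the window: fall back to a plain count-based cut.
    return messages[start:]
```

Snapping forward keeps the result at or below `keep` messages, which matches the goal of bounding the first-turn re-feed.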

The slim is anchored across agents on the lead (AGENT_TEAM_RESUME_LEAD, default tl): the lead is trimmed by record count, the wall-clock floor of its recent window is taken from the inbound-message timestamps on its tape, and every other agent is trimmed to that same floor. Without this, the busy coordinator gets a shorter recent window than its quieter peers (same record count = less wall-clock), so peers resume holding work the lead has forgotten — they desync. Idle agents (no activity since the floor) keep only a last-turn breadcrumb, not a stale window.

Resume mode (AGENT_TEAM_RESUME_MODE, default materialize). Three modes:

  • materialize (default) — instead of re-feeding the tape, write claude's session JSONL directly from the resumed context and resume it with a native --resume. There is no re-feed at all, so startup is effectively instant and each agent continues exactly mid-thread. It couples to claude's session-file format, so it is gated by a boot drift-canary (a throwaway claude round-trip diffed against the materializer's output); on drift — or if claude isn't available at boot — it falls back to slim.
  • slim — the trimming described above: the deterministic fallback the canary degrades to, and selectable directly.
  • full — resume the untrimmed tape (no slimming): the original fat-tape re-feed behaviour, kept as an escape hatch.

Per-agent restart (debug console). The console at /debug exposes three intensities per agent (also POST /api/restart {role, mode}, loopback-only):

| mode | what it does | use when |
| --- | --- | --- |
| slim | compacts the tape (summarize old turns, keep the recent thread); agent stays on-task | a long-but-healthy agent you don't want to interrupt |
| soft | clears history, preserves the inbox | a blunt reset |
| reanchor | clears history, then re-seeds the last couple of turns + a "re-sync with the lead before acting" directive | recovering a stuck/confused agent |
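The restart endpoint can also be called from a script. The sketch below only builds the request. The JSON body encoding, the Content-Type header, and the default port are assumptions; the README gives just the path and the `{role, mode}` fields, and the response format is not documented here.

```python
import json
import urllib.request

RESTART_MODES = ("slim", "soft", "reanchor")


def build_restart_request(role: str, mode: str, port: int = 8767) -> urllib.request.Request:
    """Build (but do not send) a POST /api/restart request for one agent."""
    if mode not in RESTART_MODES:
        raise ValueError(f"mode must be one of {RESTART_MODES}, got {mode!r}")
    body = json.dumps({"role": role, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://127.0.0.1:{port}/api/restart",  # loopback-only endpoint
        data=body,
        headers={"Content-Type": "application/json"},  # assumed encoding
        method="POST",
    )
```

Sending it is a matter of `urllib.request.urlopen(build_restart_request("statistician", "reanchor"), timeout=10)` from the same machine as the server.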

slim runs a model call to summarize, so it only fits a warm, settled agent. The server refuses a live slim when the context is large (> AGENT_TEAM_SLIM_MAX_MESSAGES) or a turn is in flight — re-feeding a big tape inside the compact call is exactly what wedges an agent. For a very large or unresponsive tape, prefer a server restart (resume-slim) or reanchor.
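The refusal rule reads naturally as a guard. The following sketch is illustrative only: `can_live_slim` is a hypothetical name, and the default of 250 is taken from the AGENT_TEAM_SLIM_MAX_MESSAGES knob.

```python
def can_live_slim(context_messages: int, turn_in_flight: bool, max_messages: int = 250) -> tuple[bool, str]:
    """Decide whether a live slim is safe, returning (allowed, reason)."""
    if turn_in_flight:
        return False, "a turn is in flight; retry once the agent is idle"
    if context_messages > max_messages:
        return False, "context too large for a live slim; restart the server or use reanchor"
    return True, "ok"
```

The comparison is strict, so a context of exactly `max_messages` is still allowed in this sketch.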

Related env knobs:

  • AGENT_TEAM_RESUME_MODE (default materialize) — materialize (native --resume, canary-gated, effectively-instant startup) | slim (fallback) | full.
  • AGENT_TEAM_RESUME_KEEP (default 120) — messages kept per tape on resume (slim mode/fallback); 0 resumes the full tape.
  • AGENT_TEAM_RESUME_LEAD (default tl) — the role whose slim cut anchors every other agent's wall-clock floor (slim mode); empty/absent disables cross-agent anchoring.
  • AGENT_TEAM_SLIM_MAX_MESSAGES (default 250) — max resolved-context size at which a live slim is allowed; above it the server refuses and points at a restart.
  • AGENT_TEAM_MCP_CONNECT_TIMEOUT_SEC (default 25) — seconds boot waits for the CLI's MCP bridge before respawning.
  • AGENT_TEAM_COMPACT_TRIGGER (default 0.80) — context-utilization fraction at which an agent auto-compacts mid-run.

Profiles

Behavior is driven by a profile dir (AGENT_TEAM_PROFILE_DIR): a team.toml (roster, per-role model, workspace, sandbox, session-id namespace, presets) + role prompt templates. The bundled default profile ships inside the package (agent_team/profiles/default) and targets "an app that uses BlackJAX"; it is used automatically when AGENT_TEAM_PROFILE_DIR is unset. The BlackJAX dev team itself runs this same framework against a private profile (its monorepo + worklog).

Presets in the default profile: dev-team (the full five) and statistician-only (a single curated debug agent).

License

Apache License 2.0 — see LICENSE.
