feat(fleet): add MCP channel server for real-time agent communication#642

Closed
Wirasm wants to merge 602 commits into main from feature/fleet-channels

Wirasm (Owner) commented Mar 24, 2026

Summary

Adds a kild-fleet MCP channel server that watches inbox files and pushes notifications into Claude Code sessions via the channels protocol. Reduces fleet communication latency from ~1s (Claude inbox polling) to ~100ms (fs.watch + stdio notification). Also exposes MCP tools so agents can write status/reports/messages without shelling out to CLI commands.

Changes

  • New: integrations/channel.rs — installs channel server, patches .mcp.json, cleanup on destroy
  • New: kild init-channels CLI command — installs server.ts + bun deps at ~/.kild/channels/fleet/
  • New: FleetConfig in kild-config with [fleet] channels toggle (default: false)
  • Modified: fleet_agent_flags() appends --dangerously-load-development-channels server:kild-fleet when channels enabled
  • Modified: daemon_spawn.rs calls setup_channel_integration() in the spawn sequence
  • Modified: destroy.rs cleans up .mcp.json from project root for --main sessions

Architecture

Agent brain (Claude session)         Agent worker (Claude session)
  └─ kild-fleet MCP server            └─ kild-fleet MCP server
       │  fs.watch(KILD_FLEET_DIR)         │  fs.watch(KILD_INBOX)
       │                                   │
       └──── ~/.kild/inbox/<project>/ ─────┘
              (existing file protocol)

The channel server is a TypeScript/Bun MCP server embedded as a string constant in the Rust binary. It reads $KILD_INBOX and $KILD_FLEET_DIR from the PTY environment (already injected by the inbox module). The file-based inbox protocol remains the source of truth — the channel is a notification + tooling layer on top.

MCP Tools Exposed

| Tool | Role | Action |
|------|------|--------|
| report_status | worker | Write status + optional report.md |
| send_to_brain | worker | Write task.md to brain's inbox |
| send_to_worker | brain | Write task.md to worker's inbox |
| list_fleet | both | List fleet members with status |

Files Changed

15 files changed (+616, -23)

File list
  • crates/kild-core/src/sessions/integrations/channel.rs (new, 442 lines)
  • crates/kild/src/commands/init_channels.rs (new, 59 lines)
  • crates/kild-config/src/types.rs — FleetConfig struct
  • crates/kild-config/src/loading.rs — merge logic
  • crates/kild-config/src/lib.rs — re-export
  • crates/kild-core/src/lib.rs — re-export
  • crates/kild-core/src/sessions/daemon_spawn.rs — wire channel integration
  • crates/kild-core/src/sessions/fleet.rs — channels flag in fleet_agent_flags()
  • crates/kild-core/src/sessions/destroy.rs — .mcp.json cleanup
  • crates/kild-core/src/sessions/daemon_helpers.rs — re-export
  • crates/kild-core/src/sessions/integrations/mod.rs — register module
  • crates/kild-paths/src/lib.rs — channels_dir(), fleet_channel_dir()
  • crates/kild/src/app/misc.rs — init-channels clap def
  • crates/kild/src/app/mod.rs — subcommand registration
  • crates/kild/src/commands/mod.rs — dispatch

Testing

  • cargo fmt --check && cargo clippy --all -- -D warnings passes
  • cargo test --all passes (0 failures)
  • cargo build --all succeeds
  • E2E: kild init-channels installs server + deps
  • E2E: Brain create gets .mcp.json + channels flag
  • E2E: Worker create gets .mcp.json + channels flag
  • E2E: kild inject → worker receives task, writes status + report
  • E2E: Brain destroy cleans up .mcp.json from project root
  • E2E: Graceful degradation when [fleet] channels = false (no .mcp.json, no flag)

Configuration

[fleet]
channels = true  # default: false (research preview)

Requires Bun runtime. Run kild init-channels to install dependencies.
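
A rough sketch of how an opt-in toggle like this could be merged across config layers. The struct name `FleetConfigSketch`, the `Option<bool>` representation, and the "project overrides user" precedence are all assumptions for illustration; the actual `FleetConfig` and merge logic in kild-config may differ. Only the default of `false` comes from this description.

```rust
/// Hypothetical shape of the `[fleet]` section.
#[derive(Debug, Default, Clone, PartialEq)]
struct FleetConfigSketch {
    /// `None` means the key was not set in that config file.
    channels: Option<bool>,
}

impl FleetConfigSketch {
    /// Assumed precedence: a project-level value wins over a user-level one.
    fn merge(user: &Self, project: &Self) -> Self {
        Self {
            channels: project.channels.or(user.channels),
        }
    }

    /// Channels stay off unless some layer explicitly enables them.
    fn channels_enabled(&self) -> bool {
        self.channels.unwrap_or(false)
    }
}

fn main() {
    let user = FleetConfigSketch { channels: Some(true) };
    let project = FleetConfigSketch::default();
    let merged = FleetConfigSketch::merge(&user, &project);
    println!("channels enabled: {}", merged.channels_enabled());
}
```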

Wirasm and others added 30 commits February 11, 2026 09:43
* investigate: mutex unwrap in get_process_metrics can panic (#333)

* fix: replace mutex unwrap with proper error handling in get_process_metrics (#333)

`get_process_metrics()` used `.unwrap()` on `SYSTEM.lock()` which would
panic if the mutex was poisoned. Replace with `.map_err()` to convert
into `ProcessError::SystemError`, which the caller already handles
gracefully via `Option`.

Fixes #333
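
The pattern in the fix above, sketched with the standard library only. `read_locked` and the `Vec<u32>` payload are placeholders; the real code locks a `SYSTEM` static and the exact error payload is an assumption.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Debug, PartialEq)]
enum ProcessError {
    SystemError(String),
}

/// Converts a poisoned lock into an error instead of panicking via unwrap().
fn read_locked(state: &Mutex<Vec<u32>>) -> Result<usize, ProcessError> {
    let guard = state
        .lock()
        .map_err(|e| ProcessError::SystemError(format!("lock failed: {e}")))?;
    Ok(guard.len())
}

fn main() {
    let state = Arc::new(Mutex::new(vec![1, 2, 3]));
    println!("healthy: {:?}", read_locked(&state));

    // Poison the mutex by panicking in another thread while holding the lock.
    let poisoner = Arc::clone(&state);
    let _ = thread::spawn(move || {
        let _guard = poisoner.lock().unwrap();
        panic!("simulated failure while holding the lock");
    })
    .join();

    // The caller now gets an Err it can handle instead of a second panic.
    println!("poisoned: {:?}", read_locked(&state));
}
```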
* Investigate issue #289: Ghostty focus/hide after CG migration

Root cause: focus_window doesn't unminimize — AXRaised + activate_app
don't undo kAXMinimizedAttribute set by kild hide. Need to add
UnminimizeAndRaise action that sets kAXMinimized=false before raising.

* fix: unminimize Ghostty window before raising on focus (#289)

After the Core Graphics migration (#286), `kild focus` could not restore
a window previously minimized via `kild hide` because the AX API focus
path only raised the window without unsetting kAXMinimizedAttribute.

Changes:
- Add UnminimizeAndRaise variant to WindowAction enum
- Add ax_unminimize_and_raise_window function mirroring existing pattern
- Check is_minimized in focus_window and unminimize before raising
- Add test for new WindowAction variant

Fixes #289

* fix: add specific unminimize failure logging and exhaustive match test

Add dedicated focus_unminimize_failed log event to distinguish unminimize
failures from general AX raise failures — helps debug Ghostty AX quirks.

Replace array length test with exhaustive match test for WindowAction
variants, providing stronger compile-time guarantees.
* Investigate issue #320: daemon PTY sessions exit immediately after open/resume

* fix: detect early PTY exit in daemon sessions (#320)

When a daemon PTY process exits immediately after spawn (bad resume
session, missing binary, env issue), kild now detects it within 200ms
instead of letting the user discover it later via `kild attach`.

Changes:
- Add exit_code field to DaemonSession and SessionInfo wire type
- Store exit_code in handle_pty_exit before transitioning to Stopped
- Add get_session_info() and read_scrollback() daemon client functions
- Add DaemonPtyExitedEarly error variant with exit code and scrollback
- Add post-creation health check in both create_session and open_session
  daemon paths (200ms grace period + daemon status poll)
- Clean up stopped daemon session on early exit detection

Fixes #320
* fix: send SIGWINCH to PTY when terminal element resizes

The embedded terminal was hardcoded to 80x24. GPUI computes correct
cols/rows from element bounds in prepaint() but never sent them to the
PTY. Store the PTY master handle in Terminal (via Arc<Mutex<>>), add
ResizeHandle that bundles the refs needed for resize operations, and
call resize_if_changed() from prepaint() when dimensions change.

This sends SIGWINCH to the child process and reflows the terminal grid
so programs like vim/htop respond to window resize correctly.

Closes #310

* fix: replace panic with proper error handling in PTY resize

- Replace .expect() on current_size lock with map_err + early return
  to prevent UI crash on lock poison
- Replace silent if-let-Ok on pty_master lock with explicit error
  logging and propagation
- Make resize_if_changed() return Result<(), TerminalError>, using the
  previously-unused PtyResize error variant
- Handle Result in TerminalElement::prepaint() with tracing::error
- Fix SIGWINCH comment accuracy (updates kernel winsize, not direct signal)
- Document lock scope management in ordering comment
* investigate issue #332: DaemonClientError missing KildError trait

* fix: implement KildError trait for DaemonClientError (#332)

DaemonClientError was the only error type in kild-core missing the
KildError trait implementation, breaking the documented error handling
contract.

Changes:
- Add KildError impl with DAEMON_* error codes matching existing convention
- NotRunning is the only user error (user can fix by starting daemon)
- Add tests covering all 5 variants for error_code() and is_user_error()

Fixes #332
* Investigate issue #334: health module has zero test coverage

* test: add unit tests for health module (#334)

The health module had 537 lines of code with zero test coverage.
This adds 24 tests covering health status calculation, session
enrichment, aggregation, snapshot storage, and history cleanup.

Refactors storage functions to accept &Path parameter for testability
with tempfile::TempDir, keeping public API unchanged.

Fixes #334
)

* investigate: process module dependency on agents module (#326)

Analyze the upward dependency from process to agents module and
document implementation plan to decouple via caller-passed patterns.

* refactor: remove process module dependency on agents module (#326)

The process module imported the agents module to look up agent-specific
process name patterns during process detection, creating an upward
dependency from a low-level utility to a domain-specific layer.

Move agent pattern resolution to the caller (terminal/handler.rs) by
adding an `additional_patterns` parameter to `find_process_by_name()`
and `generate_search_patterns()`. Add `get_all_process_patterns()` to
the agents module for bidirectional pattern resolution.

Fixes #326
…inal (#349)

* feat: add scrollback buffer and scroll wheel support to embedded terminal

Wire GPUI ScrollWheelEvent to alacritty_terminal's scroll_display() via
a hitbox-based mouse event listener in TerminalElement. Supports both
trackpad (pixel deltas) and mouse wheel (line deltas).

Add a "Scrollback" badge at top-right when scrolled up from bottom so
users know they're viewing history. The badge disappears when scrolled
back to bottom.

The 10,000-line scrollback buffer was already provided by
TermConfig::default() — this change just adds the scroll UI.

Closes #311

* fix: add visual fallback for scrollback badge paint failure

Paint a thin accent bar when badge text rendering fails so the user
still gets a visible "scrolled up" indicator.
* refactor: remove inverted dependency from errors module to agents

Add `supported_agents` field to `ConfigError::InvalidAgent` so the
error message is fully constructed at the call site. This removes the
`use crate::agents::supported_agents_string` import from the base
errors module, which should have no domain-module dependencies.

Closes #325

* fix: use realistic supported_agents in dispatch error test

Use supported_agents_string() instead of String::new() so the test
validates with real data rather than producing a non-actionable error
message pattern.
…ion agent (#344)

* investigate: open --no-agent inherits session agent name (#288)

* fix: open --no-agent sets agent to 'shell' instead of inheriting session agent (#288)

The BareShell match arm in open_session used session.agent.clone() which
inherited the original agent name (e.g. "claude") instead of "shell".
This caused incorrect display in list/status output despite the actual
shell command running correctly.

Fixes #288
* investigate issue #321: sync_daemon_session_status field-dropping bug

* fix: use targeted JSON patching in sync_daemon_session_status (#321)

sync_daemon_session_status() was using save_session_to_file() which
round-trips through the Session struct, silently dropping fields from
newer binary versions (e.g., task_list_id). This is the same class of
bug fixed in agent_status.rs by PR #319.

Changes:
- Add patch_session_json_fields() for atomic multi-field JSON patching
- Replace save_session_to_file() with field-level patches in sync_daemon_session_status
- Add test verifying unknown fields survive multi-field patching

Fixes #321
* refactor: decouple kild CLI from kild-daemon server embedding

Make kild-daemon a standalone binary instead of embedding it as a library
dependency in the CLI. The CLI now spawns kild-daemon as a subprocess for
both foreground and background modes, and auto-start discovers the binary
as a sibling executable.

This removes tokio and kild-daemon dependencies from the CLI crate,
resulting in a smaller binary and cleaner dependency separation.

Closes #324

* docs: update architecture docs for standalone kild-daemon binary

Clarify that kild-daemon is now a standalone binary spawned as a subprocess
by the CLI, rather than being embedded as a library dependency. Auto-start
discovers the binary as a sibling executable.

* refactor: extract shared find_sibling_binary utility, improve daemon startup

- Extract binary discovery logic into `daemon::find_sibling_binary()` in
  kild-core, eliminating duplication across autostart.rs, daemon.rs (CLI),
  and handler.rs (shim)
- Add child process crash detection to CLI background daemon start loop
  (matching autostart.rs behavior)
- Add debug logging to CLI daemon socket readiness loop
- Add structured error logging to kild-daemon main.rs for config load,
  runtime init, and server failures
Remove unused PtyRead and ChannelSend variants, the unused impl block
(error_code, is_user_error), and #[allow(dead_code)] attributes. These
were left over from before the terminal rendering was wired up.
* feat: add mouse selection and copy/paste to embedded terminal

Wire GPUI mouse events to alacritty_terminal's Selection API for
click-drag text selection with visual highlight. Cmd+C copies selection
to clipboard (falls back to SIGINT with no selection), Cmd+V pastes
clipboard to PTY. Supports single-click, double-click (word), and
triple-click (line) selection.

* fix: address review feedback for terminal mouse selection

- Surface PTY write failures to user via error banner (Cmd+C/Cmd+V)
- Add bounds clamping in pixel_to_grid to prevent overflow on cast
- Add debug logging when selection coordinates are clamped
- Fix inaccurate comments (mouse down handler, Cmd+C behavior)
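
A minimal sketch of the bounds clamping mentioned above, assuming a fixed cell size and a zero-based grid. The signature is hypothetical; the real `pixel_to_grid` works with GPUI and alacritty_terminal types not shown here.

```rust
/// Maps a pixel position to a (column, row) cell, clamped to the grid so an
/// out-of-bounds drag cannot produce an out-of-range index.
fn pixel_to_grid(x: f32, y: f32, cell_w: f32, cell_h: f32, cols: usize, rows: usize) -> (usize, usize) {
    // saturating_sub keeps the upper bound at 0 for an empty grid.
    let max_col = cols.saturating_sub(1) as f32;
    let max_row = rows.saturating_sub(1) as f32;
    // Clamp in floating point first, then cast; the cast itself saturates.
    let col = (x / cell_w).floor().clamp(0.0, max_col) as usize;
    let row = (y / cell_h).floor().clamp(0.0, max_row) as usize;
    (col, row)
}

fn main() {
    // Inside the grid.
    println!("{:?}", pixel_to_grid(25.0, 45.0, 10.0, 20.0, 80, 24));
    // Dragged past the top-left and bottom-right edges.
    println!("{:?}", pixel_to_grid(-5.0, 1.0e9, 10.0, 20.0, 80, 24));
}
```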
* refactor: move MergeReadiness and compute_merge_readiness to kild-core

Move merge readiness business logic from CLI layer (stats.rs) into
kild-core so both CLI and kild-ui can import it without duplication.

- Add MergeReadiness enum to kild-core/src/git/types.rs
- Add compute_merge_readiness() to kild-core/src/git/operations.rs
- Export via git/mod.rs and lib.rs
- Update CLI to import from kild-core
- Move 11 unit tests to kild-core

Closes #330

* refactor: make compute an inherent method on MergeReadiness

Move computation logic from free function to MergeReadiness::compute()
for better type encapsulation. Remove backward-compat wrapper per
project guidelines. Add edge-case tests for CI Pending/Unknown states
and Draft PR handling.
)

* refactor: extract shared kild-protocol crate for IPC message types

Extract ClientMessage, DaemonMessage, and SessionInfo wire types from
kild-daemon into a new kild-protocol crate with only serde/serde_json
dependencies. All three consumers (kild-daemon, kild-core, kild-tmux-shim)
now import typed enums from kild-protocol instead of hand-crafting JSON,
eliminating protocol drift across three independent implementations.

- kild-protocol: new crate with typed enums and serde roundtrip tests
- kild-daemon: re-exports types from kild-protocol (zero behavior change)
- kild-core: replace json!() + .get() chains with typed construction/matching
- kild-tmux-shim: same typed rewrite for all IPC functions

* refactor: add typed SessionStatus and ErrorCode enums to kild-protocol

Replace stringly-typed fields with compile-checked enums:

- SessionInfo.status: String → SessionStatus enum (Creating, Running, Stopped)
- DaemonMessage::Error.code: String → ErrorCode enum with #[serde(other)] fallback
- DaemonClientError::DaemonError now carries ErrorCode for typed error matching

Fix silent failures in daemon client (kild-core):
- read_scrollback: base64 decode errors now surface instead of unwrap_or_default()
- read_scrollback: unexpected response types return ProtocolError instead of empty vec
- get_session_info: unexpected responses return ProtocolError instead of Ok(None)
- get_session_status: unexpected responses warn + return ProtocolError instead of
  silently returning Ok(None) with misleading "completed" event name

Wire format unchanged — serde rename_all ensures backward compatibility.
Split the 2,280-line operations.rs into 5 files by responsibility:
- naming.rs: path sanitization, branch names, project ID generation
- validation.rs: branch/arg validation, git directory checks
- status.rs: diff stats, worktree status, git stats aggregation
- health.rs: branch health metrics with pub(super) shared helpers
- overlaps.rs: file overlap detection across kilds

Updated mod.rs with re-exports so callers use cleaner paths
(e.g. crate::git::kild_branch_name instead of
crate::git::operations::kild_branch_name). Tests moved with
their code, no behavioral changes.
* feat: auto-detect runtime mode on `kild open`

When a daemon-created kild is stopped and reopened with `kild open`,
it now automatically reopens in daemon mode without requiring the
`--daemon` flag.

- Add `runtime_mode: Option<RuntimeMode>` to Session struct (persists
  across stop/start via `#[serde(default)]`)
- Set runtime_mode during `create_session` based on resolved mode
- Change `open_session` to accept `Option<RuntimeMode>` where `None`
  triggers auto-detection: explicit flag > session stored mode > config
  > Terminal default
- Add `resolve_explicit_runtime_mode` CLI helper that returns `None`
  when no flags passed (vs `resolve_runtime_mode` which always resolves)
- UI actions pass `None` for auto-detect instead of hardcoded Terminal

Closes #297

* docs: update kild open runtime mode resolution

Document auto-detection behavior where kild open now uses the session's
stored runtime_mode by default, only using config or flags when explicit
overrides are provided.

Resolution chain: --daemon/--no-daemon flag > session's stored mode >
config > Terminal default

* refactor: extract runtime mode resolution to dedicated function

Extract the nested unwrap_or_else cascade into resolve_effective_runtime_mode()
which returns (RuntimeMode, source) for clearer logging. The source now
distinguishes "config" from "default" instead of lumping both as "default".

Add unit tests for all 4 resolution branches (explicit, session, config,
default) and a persistence test verifying runtime_mode survives stop + reload.
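
The resolution chain described above (explicit flag > session's stored mode > config > Terminal default) can be sketched like this. The parameter list and the `&'static str` source labels are assumptions; the commit only states that the function returns `(RuntimeMode, source)` and distinguishes "config" from "default".

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum RuntimeMode {
    Terminal,
    Daemon,
}

/// Returns the effective mode plus a label saying which layer decided it.
fn resolve_effective_runtime_mode(
    explicit: Option<RuntimeMode>,
    session: Option<RuntimeMode>,
    config: Option<RuntimeMode>,
) -> (RuntimeMode, &'static str) {
    if let Some(mode) = explicit {
        return (mode, "explicit");
    }
    if let Some(mode) = session {
        return (mode, "session");
    }
    if let Some(mode) = config {
        return (mode, "config");
    }
    (RuntimeMode::Terminal, "default")
}

fn main() {
    // A daemon-created session reopened without flags stays in daemon mode.
    let (mode, source) = resolve_effective_runtime_mode(None, Some(RuntimeMode::Daemon), None);
    println!("{mode:?} (from {source})");
}
```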

* test: add auto-detect (None) cases to store and serialization tests

Clarify doc comment on Command::OpenKild.runtime_mode to distinguish
CLI-level semantics (no flag passed) from Session-level semantics
(legacy session). Add runtime_mode: None variants to store contract
test and both serialization round-trip tests.
Move applescript_escape from terminal::common::escape to a top-level
escape module in kild-core. This eliminates the cross-domain dependency
where notify reached into terminal internals for a general-purpose
string escaping function.

Also fix pre-existing unused imports in git/health.rs exposed by
cargo fmt (moved test-only imports into #[cfg(test)] module).

Closes #327
* feat: add cursor blink animation to embedded terminal

Add a 530ms on/off cursor blink to the kild-ui terminal. The blink
timer lives in TerminalView using an epoch-based invalidation pattern:
each keystroke resets the blink so the cursor stays solid while typing.
Unfocused cursors remain static (no blinking).

Also fix pre-existing unused imports in git/health.rs (moved test-only
imports into #[cfg(test)] module).

Closes #314
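
One way to picture the epoch-based invalidation described above, stripped of GPUI and timers. `BlinkState` and its methods are hypothetical names; in the real view a spawned task would call something like `tick` every 530ms and exit once it reports a stale epoch.

```rust
/// Hypothetical blink state; field and method names are illustrative.
struct BlinkState {
    epoch: u64,
    cursor_visible: bool,
}

impl BlinkState {
    fn new() -> Self {
        Self { epoch: 0, cursor_visible: true }
    }

    /// Called on each keystroke: show the cursor and invalidate older timers.
    /// Returns the epoch a freshly spawned timer should carry.
    fn reset(&mut self) -> u64 {
        self.epoch += 1;
        self.cursor_visible = true;
        self.epoch
    }

    /// Called by a timer on every blink interval. Returns false when the
    /// timer's epoch has been superseded, so that timer loop can stop.
    fn tick(&mut self, timer_epoch: u64) -> bool {
        if timer_epoch != self.epoch {
            return false;
        }
        self.cursor_visible = !self.cursor_visible;
        true
    }
}

fn main() {
    let mut state = BlinkState::new();
    let first_timer = state.reset();
    println!("tick accepted: {}", state.tick(first_timer));
    let _second_timer = state.reset(); // a keystroke arrives
    println!("old timer still valid: {}", state.tick(first_timer));
    println!("cursor visible: {}", state.cursor_visible);
}
```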

* refactor: extract shared blink timer, fix comment, add debug logging

- Extract duplicated blink timer loop into `spawn_blink_timer()` method
- Split catch-all `_ => break` into explicit `Ok(false)` (stale epoch)
  and `Err(e)` (view dropped) arms with debug-level tracing
- Fix misleading "thin bar" comment to accurately describe cursor_visible
  vs has_focus responsibility split
* feat: add `kild completions` subcommand for shell tab-completion

Add `kild completions <shell>` that generates shell completion scripts
for bash, zsh, fish, powershell, and elvish using clap_complete.
Completions are generated dynamically from the clap CLI definition so
they stay in sync as commands change.

Closes #89

* docs: add completions command to development reference
* refactor: decompose sessions/handler.rs into focused modules

Extract the remaining functions from handler.rs (2,943 lines) into
focused, single-responsibility modules:

- create.rs: create_session + helpers
- open.rs: open_session, restart_session + helpers
- stop.rs: stop_session
- list.rs: list_sessions, get_session, sync_daemon_session_status
- daemon_helpers.rs: build_daemon_create_request, ensure_shim_binary,
  create_zdotdir_wrapper, compute_spawn_id

handler.rs becomes a pure re-export facade (~18 lines) preserving the
session_ops::* API used by lib.rs, dispatch.rs, and health/handler.rs.

All tests migrated to their respective modules. No API changes, no
behavior changes — pure code movement following the established pattern
from destroy.rs, complete.rs, and agent_status.rs extractions.

Closes #329

* docs: update module structure for decomposed sessions handlers

* refactor: rename and relocate misplaced tests in sessions modules

- Rename persistence lifecycle tests in create.rs to reflect what they
  actually test (save/load/remove cycles, not destroy behavior)
- Move destroy_session tests from create.rs to destroy.rs where they
  belong
* Investigate issue #369: --json commands must return valid JSON for all states

* fix: --json commands return valid JSON for all states (#369)

Several CLI commands violated the JSON contract by printing plain text
for empty or error states when --json was set. Fix pr, stats --all, and
overlaps commands to always output valid JSON regardless of state.

Changes:
- pr: output JSON for no-remote and no-PR-found paths
- stats --all: output [] for empty sessions
- overlaps: output JSON object for empty/insufficient kild states
- Add integration tests for stats --all --json and overlaps --json
- Fix pre-existing duplicate imports in health.rs and trailing newline in create.rs

Fixes #369

* archive investigation artifact for #369

* fix: add reason field to no-PR-found JSON response for consistency

Both pr.rs empty-state JSON responses now include a "reason" field,
matching the pattern used in overlaps.rs.

* fix: gate macOS-only imports behind cfg(target_os = "macos")

The applescript and escape imports in iterm.rs and terminal_app.rs were
missing the cfg gate, causing unused import errors on Linux CI.
…nal (#383)

* fix: render wide characters (CJK/emoji) at double cell width in terminal

Break text runs at wide character boundaries so each wide char is
positioned individually at its exact grid column. This prevents width
errors from accumulating across mixed normal/wide character runs.

Also extend the focused cursor to span 2 cell widths when sitting on
a wide character.

Closes #315

* fix: clarify wide char comment to explain text shaper mismatch
Root causes identified:
- remove_worktree_force() never calls branch deletion
- detect_orphaned_branches() only matches legacy "worktree-" prefix

Fix: explicit branch deletion in destroy_session() using session
metadata, and update cleanup detection to match kild/ prefix.
Root cause: window resolution fails before run_assertion() is called,
error propagates to main() where it's dropped without printing.
Secondary issue: process::exit(1) doesn't flush stdout buffers.

Fix: catch window resolution errors in assert handler and format as
assertion failure output. Flush stdout before exit.
When `kild-peek assert --app "NonExistentApp" --exists` failed, it exited
with code 1 but printed nothing. Window resolution errors were propagated
as infrastructure errors instead of formatted as assertion failures.

Changes:
- Catch window resolution errors in assert handler and format as
  "Assertion: FAIL" with diagnostic details (or JSON when --json)
- Flush stdout before process::exit(1) to prevent lost output in pipes
- Add integration tests for assert failure output (plain, JSON, clean)

Fixes #354
`kild complete` was succeeding even when no PR existed because
`is_pr_merged()` conflated "no PR" with "not merged". Now uses
`check_pr_exists()` for early detection and returns NoPrFound error.

Changes:
- Add NoPrFound error variant to SessionError
- Add PR existence check before merge check in complete_session
- Simplify CLI handler (remove redundant pre-check, core is source of truth)
- Add test for new error variant

Fixes #358
Wirasm and others added 28 commits February 26, 2026 23:09
* refactor(ipc): consolidate thread-local connection pools into kild-protocol

Both kild-core and kild-tmux-shim maintained nearly identical thread-local
IpcConnection pools (~30 lines each). Extract the common take/release pattern
into kild_protocol::pool so both crates delegate to a single implementation.

No new dependencies added to kild-protocol — the pool is pure RefCell +
thread_local logic with no tracing needed.

Closes #583

* docs: update CLAUDE.md for kild_protocol::pool consolidation

Three entries referenced the old per-crate thread-local pools. Update
them to reflect that both kild-core and kild-tmux-shim now delegate to
the shared kild_protocol::pool module.

* fix(ipc): address review feedback on pool consolidation

- Fix take() doc to describe all 3 behavioral paths (reuse, evict stale,
  fresh) instead of collapsing eviction into "otherwise"
- Document single-socket-path invariant in module doc
- Return bool from take()/release() so callers can emit their own tracing
  events — restores connection_reused, connection_created, connection_cached,
  and connection_dropped_on_return debug events in both kild-core and shim
- Add test_take_evicts_stale_cached_connection test covering the "released
  alive, server closes, next take reconnects" path
- Align wording: "take" instead of "reuse a cached one" in doc summaries
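
A simplified sketch of the take/release pattern with the three paths named above (reuse, evict stale, fresh). `FakeConnection`, its `alive` flag, and the closure-based `connect` parameter are stand-ins; the real pool in `kild_protocol::pool` holds an `IpcConnection` and its signatures are not shown in this log.

```rust
use std::cell::RefCell;

/// Stand-in for a real IPC connection; `alive` simulates a liveness check.
#[derive(Debug)]
struct FakeConnection {
    id: u32,
    alive: bool,
}

thread_local! {
    // One cached connection per thread, matching the single-socket-path invariant.
    static CACHED: RefCell<Option<FakeConnection>> = RefCell::new(None);
}

/// Returns a connection and `true` if it was reused from the cache.
/// A stale cached connection is dropped and replaced with a fresh one.
fn take(connect: impl FnOnce() -> FakeConnection) -> (FakeConnection, bool) {
    let cached = CACHED.with(|slot| slot.borrow_mut().take());
    match cached {
        Some(conn) if conn.alive => (conn, true),
        _ => (connect(), false),
    }
}

/// Returns `true` if the connection was cached, `false` if it was dropped.
fn release(conn: FakeConnection) -> bool {
    if !conn.alive {
        return false;
    }
    CACHED.with(|slot| *slot.borrow_mut() = Some(conn));
    true
}

fn main() {
    let (conn, reused) = take(|| FakeConnection { id: 1, alive: true });
    println!("id={} reused={}", conn.id, reused);
    println!("cached on release: {}", release(conn));
    let (conn, reused) = take(|| FakeConnection { id: 2, alive: true });
    println!("id={} reused={}", conn.id, reused);
}
```

Returning the bool rather than logging inside the pool keeps the shared module free of a tracing dependency while letting each caller emit its own events.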
…sions (#609)

PR #584 fixed PTY timing issues (#540) by skipping PTY delivery for fleet
claude sessions, intending to rely on "dropbox task.md + Claude inbox".
However, only the dropbox write was implemented — the Claude Code inbox
write was never added. Since Claude Code polls its inbox file (not the
dropbox), the initial prompt was silently dropped.

Changes:
- Move write_to_inbox from CLI inject.rs to kild-core fleet.rs
- Call write_to_inbox from deliver_initial_prompt_for_session when
  skip_pty is true (fleet claude sessions)
- Update inject.rs to use the kild-core version
- Add tests for write_to_inbox in fleet.rs
When --initial-prompt is used with fleet claude sessions, the CLI now:
1. Prints a deprecation warning telling the agent to use kild inject
2. Delivers the prompt via the reliable inbox path as fallback

Also updates kild-brain.md to instruct the brain to always use
kild create + kild inject (two-step), never --initial-prompt.

The --initial-prompt delivery path has been unreliable for fleet
sessions across PRs #540, #584, #609. The inject path is battle-tested.

Changes:
- create.rs/open.rs: detect fleet claude sessions with --initial-prompt,
  print warning, deliver via fleet::write_to_inbox as fallback
- fleet.rs: make is_claude_fleet_agent and fleet_mode_active pub
- kild-brain.md: replace --initial-prompt usage with create + inject
* fix(fleet): store augmented command with fleet flags in session file

The session file stored the base agent command before fleet flags were
appended, so `kild list` showed a command missing --agent-id,
--agent-name, and --team-name flags. Use the already-computed
fleet_command instead so the stored command matches what actually runs.

Closes #607

* docs: clarify fleet-augmented command in spawn_daemon_agent doc comment
* refactor(tests): fix naming convention in agent, terminal, editor tests

Update `define_agent_backend!` macro to generate test names that follow
the `<subject>_<expected_behavior>` convention using `paste` for ident
concatenation (e.g. `amp_backend_returns_correct_name`).

Rename trait test functions:
- `test_agent_backend_basic_methods` → `mock_agent_backend_delegates_name_display_and_yolo_correctly`
- `test_terminal_backend_basic_methods` → `mock_terminal_backend_name_and_availability_are_accessible`
- `test_editor_backend_basic_methods` → `mock_editor_backend_is_available_and_not_a_terminal_editor`

Closes #523

* fix(tests): move paste to dev section, rename remaining test_ prefixes

Move `paste` from production to dev/test section in workspace Cargo.toml
to match `temp-env` placement convention.

Rename remaining sibling tests that were missed in the initial pass:
- test_terminal_backend_execute_spawn → mock_terminal_backend_execute_spawn_returns_window_title
- test_terminal_backend_close_window → mock_terminal_backend_close_window_does_not_panic
- test_editor_backend_open → mock_editor_backend_open_succeeds

* refactor(tests): extract shared macro tests, fix stale doc comment

Address review agent findings:
- Extract 4 shared test functions into define_agent_backend_tests! helper
  macro, eliminating ~30 lines of duplication between the two arms
- Update module doc comment to reflect the new test_prefix parameter
- Trim redundant return-type clause from terminal close_window test comment
)

* fix(daemon): detect and restart stale daemons after binary upgrade

After cargo install, old daemon processes kept running from the previous
binary. Auto-start saw the daemon alive via ping and returned immediately,
never comparing binaries.

Now the daemon records its binary path + mtime at startup in
~/.kild/daemon.bin. On auto-start, if the running daemon's recorded mtime
differs from the current binary on disk, it is gracefully restarted.

- kild daemon status warns when the daemon is stale
- kild daemon restart added as a convenience command (stop + start)
- ensure_daemon_running auto-restarts stale daemons transparently

Closes #608

* fix: address code review issues for stale daemon detection

- Fix autostart logic: stale restart now bypasses auto_start guard,
  preventing Disabled error after stopping a stale daemon
- Fix mtime fallback: return false (not stale) when mtime is unreadable,
  preventing false-positive restart loops
- Fix bin_path derivation: use KildPaths-based bin_file_path() in both
  daemon and client for consistent path resolution
- Add proper error logging to kild-core read_bin_file (match daemon version)
- Extract spawn_daemon_background() helper to deduplicate CLI start/restart
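
The staleness rule, including the "unreadable mtime means not stale" fallback from the list above, might look roughly like this. The function signatures and the use of whole seconds are assumptions; treating a missing recorded mtime as "not stale" is also an assumption, since the log only mentions the unreadable case.

```rust
use std::fs;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Reads a file's modification time as seconds since the Unix epoch.
/// Any failure (missing file, unsupported platform) yields None.
fn file_mtime_secs(path: &Path) -> Option<u64> {
    let modified = fs::metadata(path).ok()?.modified().ok()?;
    Some(modified.duration_since(UNIX_EPOCH).ok()?.as_secs())
}

/// The daemon counts as stale only when both mtimes are known and differ.
/// If either side is unknown, report "not stale" to avoid restart loops.
fn is_daemon_stale(recorded_mtime: Option<u64>, current_mtime: Option<u64>) -> bool {
    match (recorded_mtime, current_mtime) {
        (Some(recorded), Some(current)) => recorded != current,
        _ => false,
    }
}

fn main() {
    // Use this executable as a stand-in for the daemon binary on disk.
    let current = std::env::current_exe().ok().and_then(|p| file_mtime_secs(&p));
    println!("stale vs. older record: {}", is_daemon_stale(Some(0), current));
    println!("stale when unreadable: {}", is_daemon_stale(Some(0), None));
}
```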

* fix: address findings from silent failure, simplifier, and comment agents

Silent failure fixes:
- Log find_sibling_binary error in is_daemon_stale instead of swallowing
- Return Option<u32> from spawn_daemon_background to avoid printing PID 0
- Log mtime read failure in write_bin_file instead of silent fallback
- Wait for socket removal (not just PID file) in stop_stale_daemon
- Add error! log on restart stop timeout path
- Make handle_daemon_start PID read failure soft (was hard error)
- Log mtime parse failures in both read_bin_file implementations

Simplification:
- Replace needs_spawn flag with early return spawn_daemon() in
  ensure_daemon_running for clearer control flow

Comment fixes:
- Update run_server docstring to include bin file write step
- Match bin_file_path doc to daemon version with path literal
- Add actionable hint when stale daemon stop fails
* refactor(git): consolidate direct git2 usages behind git/ module API

Add git/query.rs with high-level query functions (is_git_repo,
get_origin_url, has_any_remote, has_uncommitted_changes, etc.) and
git/test_support.rs for test helpers. Replace all direct git2 usages
outside git/ module:

- forge/registry.rs: use git::get_origin_url instead of Repository::open
- sessions/destroy.rs: delegate has_remote_configured to git::has_any_remote
- sessions/info.rs: use git::has_uncommitted_changes for status check
- projects/types.rs: delegate is_git_repo to git::is_git_repo
- cleanup/operations.rs: rewrite to use git::list_local_branch_names,
  worktree_active_branches, list_worktree_entries, is_worktree_valid
- cleanup/handler.rs: use git::delete_local_branch, remove Repository usage

Rename ProjectError::Git2CheckFailed to GitCheckFailed with GitError
source type instead of raw git2::Error.

Closes #593

* fix(git): restore error logging in query functions, improve error handling

- has_uncommitted_changes: log repo-open and status-check failures
  instead of silently discarding via .ok()?
- is_worktree_valid: log repo-open and HEAD failures at debug level
  instead of silently returning false
- ensure_in_repo: distinguish NotFound from unexpected errors,
  consistent with is_git_repo
- list_local_branch_names: log unreadable branch names at debug level
- validate_cleanup_request: properly map NotInRepository vs other
  GitErrors instead of discarding error context
- Improve doc comments for WorktreeEntry, list_worktree_entries,
  worktree_active_branches, ensure_in_repo, is_worktree_valid

* refactor(cleanup): flatten nested match blocks for clarity

- Flatten three-level nested match in worktree_active_branches into
  sequential match-with-continue, removing one level of indentation
- Replace match-on-result-then-propagate in scan_for_orphans and
  cleanup_orphaned_resources with map_err+? to reduce boilerplate
- Flatten nested match chain in collect_session_worktree_paths into
  sequential match-with-continue
- Replace assert_eq!(..., true/false) with assert!/assert!(!...) in tests

* fix(hooks): filter and tag claude-status hook events by source (#611)

The claude-status hook was forwarding all Claude Code lifecycle events
(SubagentStop, TeammateIdle, TaskCompleted) to the honryu brain session,
making subagent noise indistinguishable from the primary agent finishing.

Replace per-event message formats ([DONE], [WAITING], [IDLE], [ERROR])
with a unified tagged format: [EVENT] <branch> <tag>: <summary>

Tags: agent.stop, subagent.stop, teammate.idle, task.completed,
agent.waiting, agent.idle

By default only primary agent events (Stop, Notification) are forwarded.
SubagentStop, TeammateIdle, and TaskCompleted are dropped unless
KILD_HOOK_VERBOSE=1 is set in the session environment.

Closes #611
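
The filtering and tagging rule described above can be sketched in Rust as follows. This is only an illustration: the real claude-status hook appears to be a shell script, and `HookEvent`, `forward`, and `is_verbose` are hypothetical names. Mapping `Notification` to `agent.waiting` is also an assumption (the commit lists `agent.idle` as a separate tag without saying how the two are chosen).

```rust
/// Claude Code lifecycle events named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookEvent {
    Stop,
    Notification,
    SubagentStop,
    TeammateIdle,
    TaskCompleted,
}

impl HookEvent {
    /// Primary agent events are forwarded by default.
    pub fn is_primary(self) -> bool {
        matches!(self, Self::Stop | Self::Notification)
    }

    /// Tag used in the unified message format.
    pub fn tag(self) -> &'static str {
        match self {
            Self::Stop => "agent.stop",
            // Assumption: the real hook may pick agent.waiting or agent.idle.
            Self::Notification => "agent.waiting",
            Self::SubagentStop => "subagent.stop",
            Self::TeammateIdle => "teammate.idle",
            Self::TaskCompleted => "task.completed",
        }
    }
}

/// Interprets the value of KILD_HOOK_VERBOSE; only "1" enables verbose mode.
pub fn is_verbose(env_value: Option<&str>) -> bool {
    env_value == Some("1")
}

/// Returns the tagged message, or None when the event is dropped.
pub fn forward(event: HookEvent, branch: &str, summary: &str, verbose: bool) -> Option<String> {
    if !event.is_primary() && !verbose {
        return None;
    }
    Some(format!("[EVENT] {branch} {}: {summary}", event.tag()))
}
```

With `verbose` false, `forward(HookEvent::SubagentStop, ...)` yields `None`, while `forward(HookEvent::Stop, "feature-x", "done", false)` yields `[EVENT] feature-x agent.stop: done`.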

* fix(hooks): prevent verbose events from writing dedup gate

Review agents found a bug: in verbose mode, TeammateIdle can write the
.idle_sent gate file before Stop fires, silently suppressing the primary
completion signal to the brain.

Fix: introduce WRITE_GATE flag so only primary events (Stop, idle_prompt)
write the gate. Verbose-only events (SubagentStop, TeammateIdle,
TaskCompleted) forward but never touch the gate.

Also:
- Update doc comment to describe event tagging and forwarding behavior
- Strengthen test assertions: verify KILD_HOOK_VERBOSE conditional
  expression (not just name), include TaskCompleted in forward block
  check, assert Stop is not gated on KILD_HOOK_VERBOSE, assert
  WRITE_GATE presence

* fix(core): decouple process/ from sessions/ module

process/cleanup.rs imported Session from sessions/, creating a
bidirectional dependency between what should be a lower-level utility
module and the session domain.

Invert the dependency: rename cleanup_session_pid_files to
cleanup_pid_files accepting &[String] (PID keys) instead of &Session.
Add Session::pid_keys() method so the session-aware key extraction
stays in sessions/types where it belongs.

After this change, process/ has zero imports from sessions/.

Closes #589
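
A minimal sketch of the inverted dependency, assuming a simplified `Session` with hypothetical fields and a hypothetical `<key>.pid` file naming scheme (the real path format comes from get_pid_file_path and is not shown here). The point is only that `cleanup_pid_files` takes plain keys and no longer needs to know about `Session`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for the session type in sessions/types; fields are hypothetical.
pub struct Session {
    pub id: String,
    pub agent_spawn_ids: Vec<String>,
}

impl Session {
    /// Session-aware key extraction stays with the session type.
    pub fn pid_keys(&self) -> Vec<String> {
        if self.agent_spawn_ids.is_empty() {
            // Assumed fallback when no agents are tracked: use the session id.
            return vec![self.id.clone()];
        }
        self.agent_spawn_ids.clone()
    }
}

/// Lower-level helper: accepts PID keys only, with no import from sessions/.
/// Returns how many PID files were actually removed.
pub fn cleanup_pid_files(pid_dir: &Path, keys: &[String]) -> usize {
    let mut removed = 0;
    for key in keys {
        let path = pid_dir.join(format!("{key}.pid"));
        match fs::remove_file(&path) {
            Ok(()) => removed += 1,
            // A missing file is not an error: there is nothing to clean up.
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => eprintln!("failed to remove {}: {e}", path.display()),
        }
    }
    removed
}
```

A caller in the session layer would then write `cleanup_pid_files(&pid_dir, &session.pid_keys())`.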

* fix(core): restore warn for no-agents fallback, improve doc comments

Address review findings:
- Restore warn! log in pid_keys() when session has no tracked agents
  (diagnostic signal was accidentally dropped during refactor)
- Clarify pid_keys() doc to mention empty-spawn-id edge case and
  potential duplicate keys for legacy agents
- Reference get_pid_file_path in cleanup_pid_files doc for path format

* refactor(terminal): extract duplicated AppleScript patterns

Add high-level helpers (spawn_via_applescript, close_via_applescript,
focus_via_applescript, hide_via_applescript) that combine template
substitution with osascript execution, reducing boilerplate in iTerm
and Terminal.app backends.

Move window ID validation into a default close_window trait method
that delegates to a new close_window_by_id required method, removing
require_window_id duplication from all four backends.

Closes #436
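
The default-method pattern for window ID validation might look roughly like this. The signatures, the `TerminalError` variants, and `RecordingBackend` are assumptions made for the sketch; only the names close_window and close_window_by_id come from the commit.

```rust
use std::cell::RefCell;

#[derive(Debug, PartialEq)]
pub enum TerminalError {
    MissingWindowId,
}

pub trait TerminalBackend {
    /// Required: backend-specific close, only called with a validated ID.
    fn close_window_by_id(&self, window_id: &str) -> Result<(), TerminalError>;

    /// Provided: validates the ID once, then delegates to the backend.
    fn close_window(&self, window_id: Option<&str>) -> Result<(), TerminalError> {
        match window_id {
            Some(id) if !id.is_empty() => self.close_window_by_id(id),
            _ => Err(TerminalError::MissingWindowId),
        }
    }
}

/// Hypothetical backend that records which windows it was asked to close.
#[derive(Default)]
pub struct RecordingBackend {
    pub closed: RefCell<Vec<String>>,
}

impl TerminalBackend for RecordingBackend {
    fn close_window_by_id(&self, window_id: &str) -> Result<(), TerminalError> {
        self.closed.borrow_mut().push(window_id.to_string());
        Ok(())
    }
}
```

Each backend implements only close_window_by_id, so the validation cannot drift between backends.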

* fix(terminal): address review findings from automated agents

- Fix stale doc in platform_unsupported! macro (close_window → close_window_by_id)
- Add non-macOS stubs for high-level AppleScript helpers to match low-level pattern
- Update close_window_by_id doc to include Alacritty alongside Ghostty
- Update CLAUDE.md trait snippet to reflect new close_window_by_id and defaults

* refactor(git): extract kild-git crate from kild-core (#587)

Move the self-contained git/ module into its own workspace crate,
following the pattern of kild-config and kild-paths extractions.

- Create crates/kild-git/ with all git operations (worktree management,
  branch health, status, queries, remote ops, naming, validation)
- Extract detect_project/detect_project_at into kild-git::project
- kild-core depends on kild-git and re-exports all public types/functions
  for backward compatibility
- handler.rs (create_worktree) and overlaps.rs stay in kild-core since
  they depend on kild-core internals (files module, session types, config)
- Move KildError impl for GitError to kild-core::errors (avoids circular dep)
- Update all consumers (sessions, cleanup, CLI sync) to use git:: re-exports

No behavior changes — pure structural extraction.

Closes #587

* fix: gate test_support behind feature flag, clean up imports

Address review feedback:

- Gate kild-git test_support module behind `testing` feature flag so
  test helpers don't ship in production binaries (restores #[cfg(test)]
  behavior from before extraction)
- kild-core uses the feature via dev-dependencies for its tests
- Replace types::* glob import with explicit {ProjectInfo, WorktreeInfo}
  in handler.rs
- Remove misleading inline module-source comments from re-export block
  (items are alphabetically sorted, labels were inaccurate)

* feat(ui): add cursor blink support with BlinkManager

Add epoch-based cursor blink timer to TerminalView. The BlinkManager
toggles cursor visibility every 500ms via cx.spawn(), with epoch tracking
to cancel stale timers when a new cycle starts.

Cursor stays visible during typing — pause() on every keystroke resets
the blink timer. Unfocused terminals show a static hollow block cursor
(no blinking).

Closes #471
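
The epoch-tracking idea can be shown without any UI framework. In this sketch the timer task is replaced by explicit `tick` calls, and the method names other than pause() are hypothetical; the real BlinkManager schedules its timers via cx.spawn().

```rust
use std::time::Duration;

/// Interval between cursor visibility toggles, per the commit message.
pub const BLINK_INTERVAL: Duration = Duration::from_millis(500);

pub struct BlinkManager {
    epoch: u64,
    visible: bool,
}

impl BlinkManager {
    pub fn new() -> Self {
        Self { epoch: 0, visible: true }
    }

    /// Called on every keystroke: shows the cursor and starts a new cycle.
    /// Returns the epoch that the newly scheduled timer must carry.
    pub fn pause(&mut self) -> u64 {
        self.epoch = self.epoch.wrapping_add(1);
        self.visible = true;
        self.epoch
    }

    /// Called by a timer after BLINK_INTERVAL. A timer from an older cycle
    /// carries a stale epoch and is ignored, which is how it gets cancelled.
    /// Returns true when the toggle was applied.
    pub fn tick(&mut self, timer_epoch: u64) -> bool {
        if timer_epoch != self.epoch {
            return false;
        }
        self.visible = !self.visible;
        true
    }

    pub fn visible(&self) -> bool {
        self.visible
    }
}
```

No timer handle has to be stored or aborted: starting a new cycle bumps the epoch, and any timer still in flight from the previous cycle becomes a no-op.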

* fix(ui): address review findings in cursor blink

- Stop blink timer on focus loss, restart on focus gain — no more
  wasteful repaints on unfocused terminals, and cursor state is
  always reset when focus returns (even via mouse click)
- Replace .unwrap_or(false) with explicit match on view update —
  view-released teardown is now a distinct code path, not silently
  collapsed with stale-epoch exits
- Encapsulate toggle logic behind toggle_if_current() method —
  blink field stays pub(super) for the spawn closure but mutation
  is self-contained
- Rename pause() to reset() — matches actual semantics (restarts
  the blink cycle, not a pause)
- Remove two-step new() + enable() — blink starts inert, render()
  drives the lifecycle based on focus state
- Tighten module visibility: pub mod blink → mod blink
- Fix doc comments for accuracy

* refactor(notify): extract NotificationBackend trait with platform backends

Add trait-based extensibility for the notification subsystem, following
the existing TerminalBackend and ForgeBackend patterns:

- NotificationBackend trait with name/is_available/send interface
- MacOsNotificationBackend (osascript) and LinuxNotificationBackend (notify-send)
- Registry with platform-based auto-detection
- NotifyError type with KildError implementation

Also add standard From/TryFrom trait implementations:
- From<AgentMode> for OpenMode (kild-protocol)
- From<kild_protocol::SessionStatus> for core SessionStatus
- From<&ProcessInfo> for ProcessMetadata

Closes #435
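
A rough sketch of the trait and registry shape, under stated assumptions: the method signatures, the `SendFailed` variant, and `StubBackend` are hypothetical, and the real backends shell out to osascript or notify-send rather than returning immediately.

```rust
#[derive(Debug, PartialEq)]
pub enum NotifyError {
    SendFailed(String),
}

pub trait NotificationBackend {
    fn name(&self) -> &'static str;
    fn is_available(&self) -> bool;
    fn send(&self, title: &str, message: &str) -> Result<(), NotifyError>;
}

/// Picks the first backend that reports itself as available.
pub fn detect(backends: &[Box<dyn NotificationBackend>]) -> Option<&dyn NotificationBackend> {
    backends.iter().find(|b| b.is_available()).map(|b| &**b)
}

/// Returns Ok(true) when a notification was sent and Ok(false) when it was
/// skipped because no backend is available.
pub fn send_via_backend(
    backends: &[Box<dyn NotificationBackend>],
    title: &str,
    message: &str,
) -> Result<bool, NotifyError> {
    match detect(backends) {
        Some(backend) => backend.send(title, message).map(|()| true),
        None => Ok(false),
    }
}

/// Hypothetical backend for illustration; it never touches the OS.
pub struct StubBackend {
    pub name: &'static str,
    pub available: bool,
}

impl NotificationBackend for StubBackend {
    fn name(&self) -> &'static str {
        self.name
    }
    fn is_available(&self) -> bool {
        self.available
    }
    fn send(&self, _title: &str, _message: &str) -> Result<(), NotifyError> {
        Ok(())
    }
}
```

Distinguishing "sent" from "skipped" in the return value lets the caller avoid logging a completed send when nothing was dispatched.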

* fix(notify): address review findings from 4 review agents

- Remove dead NotifyError::IoError variant (YAGNI — no caller produces it)
- Remove redundant which::which check in LinuxNotificationBackend::send
  (is_available already gates dispatch; Command::new handles missing binary)
- Return bool from send_via_backend to distinguish sent vs skipped, fixing
  misleading send_completed log when no backend is available
- Remove premature lib.rs re-exports of NotificationBackend/NotifyError
  (no callers outside kild-core)
- Simplify registry detect() with find+map instead of verbose find_map
- Inline detect_backend() into send_via_backend (single caller)
- Add warn log on wildcard arm in From<SessionStatus> for unknown variants
- Add doc note on format_notification_message re "needs input" for Error
- Remove redundant test comments

* refactor: rename *Info types to domain-role names

Rename generic *Info types to more descriptive domain-role names:

- WorktreeInfo → WorktreeState (kild-git)
- ProjectInfo → GitProjectState (kild-git)
- BranchInfo → BranchState (kild-git)
- PrInfo → PullRequest (kild-core forge)
- NativeWindowInfo → NativeWindow (kild-core terminal)

Pure rename, no behavior changes. Updates all usages across kild-git,
kild-core, kild CLI, and documentation.

Closes #522

* refactor: rename test functions to match new type names

Align test function names with the renamed types:
- test_worktree_info → test_worktree_state
- test_worktree_info_preserves_original_branch_name → test_worktree_state_preserves_original_branch_name
- test_project_info → test_git_project_state
- test_branch_info → test_branch_state
- test_pr_info_serde_roundtrip → test_pull_request_serde_roundtrip
- test_pr_info_with_none_summaries → test_pull_request_with_none_summaries
- test_native_window_info_fields → test_native_window_fields

* refactor: rename *Info/*Data types to domain-role names

Rename ambiguous types across sessions and process modules:

- SessionInfo (protocol) → DaemonSessionStatus
- SessionInfo (core) → SessionSnapshot
- DestroySafetyInfo → DestroySafety
- AgentStatusInfo → AgentStatusRecord
- AgentProcessData → AgentProcessDto
- ProcessInfo → ProcessSnapshot

Also renames the DaemonSession::to_session_info() method to
to_daemon_session_status() for consistency.

Pure rename — no behavior changes.

Closes #521

* fix: update stale string references after type renames

Fix error messages, test names, and CLAUDE.md that still referenced
old type names (SessionInfo, AgentStatusInfo) after the rename refactor.

* fix: update ProcessInfo → ProcessSnapshot in From impl and tests

* remove accidental research file
#624)

* fix(daemon): use real terminal size for PTY instead of hardcoded 24x80

Daemon PTYs were created with hardcoded 24x80 dimensions regardless of
the actual terminal size. This caused agents like Claude Code to render
incorrectly until the attach window connected and sent a resize.

Add a resolution chain for initial PTY dimensions:
1. --rows/--cols CLI flags (explicit override)
2. [daemon] default_rows/default_cols config (set-and-forget)
3. ioctl(TIOCGWINSZ) on stdout (real terminal detection)
4. 24x80 fallback (no TTY available)

This fixes the common case where `kild create --daemon` runs from a real
terminal, and enables non-TTY callers (Claude Code, scripts) to specify
dimensions via flags or config.

* fix(daemon): resolve PTY dimensions independently + introduce OpenSessionRequest

- Fix resolve_pty_size to resolve each dimension independently via
  .or() chaining instead of requiring both --rows AND --cols. Previously
  passing only --cols 220 silently discarded the flag. Same fix for config
  default_rows/default_cols.
- Add debug logging to PTY size resolution and ioctl fallback paths.
- Introduce OpenSessionRequest struct matching CreateSessionRequest
  pattern, removing #[allow(clippy::too_many_arguments)] from open_session.
- Merge with_rows/with_cols into with_pty_size on both request types.
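
The per-dimension resolution via .or() chaining can be sketched as below. `SizeHint`, `PtySize`, and the function signature are assumptions for illustration, and the detected size is passed in as a parameter instead of calling ioctl(TIOCGWINSZ).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PtySize {
    pub rows: u16,
    pub cols: u16,
}

/// Optional rows/cols from one source (CLI flags or config).
#[derive(Debug, Clone, Copy, Default)]
pub struct SizeHint {
    pub rows: Option<u16>,
    pub cols: Option<u16>,
}

/// Used when no source provides a value (no TTY available).
pub const FALLBACK: PtySize = PtySize { rows: 24, cols: 80 };

/// Resolves rows and cols independently: CLI flag, then config,
/// then the detected terminal size, then the fallback.
pub fn resolve_pty_size(cli: SizeHint, config: SizeHint, detected: Option<PtySize>) -> PtySize {
    let rows = cli
        .rows
        .or(config.rows)
        .or(detected.map(|d| d.rows))
        .unwrap_or(FALLBACK.rows);
    let cols = cli
        .cols
        .or(config.cols)
        .or(detected.map(|d| d.cols))
        .unwrap_or(FALLBACK.cols);
    PtySize { rows, cols }
}
```

Passing only a column override, for example `SizeHint { rows: None, cols: Some(220) }`, keeps the 220 columns and lets the rows fall through to the next source instead of discarding the flag.
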
* Investigate issue #520: rename *Manager types to domain-role names

* feat(hooks): add HTTP hook endpoint and Claude Code hook enhancements (#629-#634)

Replace the fragile shell-script-based hook pipeline with typed Rust in
the daemon for events that support HTTP hooks.

- Add hyper HTTP listener to daemon on localhost:19222 for Stop and
  SubagentStop events with in-memory IdleGate replacing .idle_sent file
- Rewrite claude.rs settings patching: HTTP hooks for Stop/SubagentStop,
  command hooks for TeammateIdle/TaskCompleted/Notification
- Add prompt hook on Stop for task verification before stopping (#630)
- Add SessionStart hook for auto-priming via kild prime --self --raw (#631)
- Add kild report command for TaskCompleted structured reporting (#632)
- Add kild check-queue command and inject --queue for TeammateIdle
  auto-reassign (#633)
- Add --self and --raw flags to kild prime command
- Add hooks_port to DaemonConfig and DaemonRuntimeConfig (default 19222)
- Remove file-based idle gate from dropbox.rs and daemon_request.rs

* fix(hooks): address review findings — error handling, type safety, tests, docs

- Propagate write_task error in check_queue instead of silently dropping tasks
- Replace unwrap_or_default() with proper JSON parse error handling in report
- Separate prompt-hook detection from has_our_hook to prevent over-broad matching
- Add 1 MiB body size limit to HTTP hook endpoint
- DRY DEFAULT_HOOKS_PORT constant in kild-protocol, imported by 3 crates
- Log config load failure in resolve_hooks_port with fallback warning
- Log actionable message on HTTP hook bind failure with port number
- Refactor IdleGate to HashSet, HookResult to BrainForward + typed AgentStatus
- Add HookDecision enum replacing Option<String> for hook responses
- Add 7 queue/report unit tests (FIFO ordering, peek idempotency, overwrite)
- Fix test_ensure_claude_status_hook_always_overwrites to test actual overwrite
- Update CLAUDE.md: hook architecture, new commands, hooks_port config, queue protocol

* perf: process kill retry, PID jitter, daemon runtime

- Add process.wait() after kill() to verify termination and prevent
  zombie processes
- Add +/-20% random jitter to PID file polling interval to prevent
  thundering herd when multiple kild create commands run simultaneously
- Switch daemon from multi-threaded tokio runtime to current_thread
  since it's I/O-bound with low concurrency (PTY reads + IPC)

Closes #478

* fix: bounded kill wait, PID-based jitter, revert daemon runtime

- Replace process.wait() with bounded 500ms poll loop to avoid blocking
  indefinitely on uninterruptible sleep
- Use PID-based jitter instead of SystemTime nanos which are correlated
  across simultaneous launches
- Revert current_thread runtime: daemon multiplexes PTYs + IPC across
  multiple sessions, single-threaded would stall all sessions on any
  slow operation

* fix: address review feedback on kill wait loop and PID jitter

- Reuse existing `system` in kill wait loop instead of allocating
  a new System::new() on each of up to 50 poll iterations
- Add debug log when kill wait times out (process didn't exit
  within 500ms after SIGKILL)
- Hoist constant jitter calculation above the polling loop
- Align docstring to use "decorrelate" instead of "thundering herd"

* refactor: simplify kill wait loop and jitter arithmetic

- Replace exited flag with early return on process exit
- Use elapsed() pattern consistent with read_pid_file_with_retry
- Remove unnecessary signed-integer casts in jitter computation —
  stays in u64 since BASE_INTERVAL_MS (100) > JITTER_RANGE_MS (20)
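
One way the unsigned jitter arithmetic could work is sketched below. The constant names and values come from the commit message; the exact mapping from PID to offset is an assumption, as is the function name.

```rust
pub const BASE_INTERVAL_MS: u64 = 100;
pub const JITTER_RANGE_MS: u64 = 20;

/// Derives a polling interval in [80, 120] ms from the process ID.
/// The subtraction cannot underflow because BASE_INTERVAL_MS is larger
/// than JITTER_RANGE_MS, so no signed casts are needed.
pub fn jittered_interval_ms(pid: u32) -> u64 {
    // Offset in 0..=2 * JITTER_RANGE_MS, decorrelated across processes by PID.
    let offset = u64::from(pid) % (2 * JITTER_RANGE_MS + 1);
    BASE_INTERVAL_MS - JITTER_RANGE_MS + offset
}
```

Because the value depends only on the PID, it is constant for the lifetime of the process and can be computed once before the polling loop.
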
* refactor: rename *Manager types to domain-role names

Rename four types that used the vague `Manager` suffix to descriptive
domain-role names per the Code Naming Contract:

- SessionManager → DaemonSessionStore (kild-daemon)
- PtyManager → PtyStore (kild-daemon)
- ProjectManager → ProjectRegistry (kild-core)
- TeamManager → TeamStore (kild-ui)

No behavior changes — pure mechanical rename across all usages, docs,
and CLAUDE.md/AGENTS.md references.

Closes #520

* fix: address review feedback on *Manager rename

- Rename team_manager field → team_store across all UI call sites
- Rename 13 test functions test_project_manager_* → test_project_registry_*
- Fix 2 doc comments still saying "project manager"
…e error display (#628)

* refactor: extract magic strings, decompose long functions, standardize error display

- Add WORKTREE_ADMIN_PREFIX constant to naming.rs alongside KILD_BRANCH_PREFIX
- Use kild_branch_name() instead of raw format!("kild/{}") in pr.rs, detail_view.rs
- Use KILD_BRANCH_PREFIX in kild_branch_name() function body
- Add SHIM_VERSION constant for tmux version string in shim commands

- Extract kill_tracked_agents() from destroy_session() (127 lines → helper)
- Extract sweep_ui_daemon_sessions() from destroy_session() (55 lines → helper)
- Extract resolve_resume_args() from open_session() (46 lines → helper)

- Add display_operation_error() helper for consistent CLI error formatting
- Standardize error display across open, hide, focus, diff, health, stats,
  sync, commits, and teammates commands to use color::error() consistently

Closes #438

* fix: address PR review — restore comment, tighten visibility, fix test helpers

- Restore non-fatal comment on daemon cleanup in kill_tracked_agents()
- Change WORKTREE_ADMIN_PREFIX to pub(crate) — no external callers
- Replace raw format!("kild/...") in cleanup/handler.rs and overlaps.rs
  test helpers with kild_branch_name() / kild_worktree_admin_name()

* fix: address review feedback from PR review agents

- Fix kill_tracked_agents docstring: clarify daemon errors are always
  non-fatal, not gated on force flag
- Fix resolve_resume_args docstring: document is_bare_shell parameter
  and error return paths
- Remove .unwrap() in kill_tracked_agents, use indexing instead
- Change display_operation_error to use impl Display over &dyn Display
- Inline WORKTREE_ADMIN_PREFIX constant (single use site)
- Reuse kild_branch variable in pr.rs no_pr_found path
- Use imported color module instead of crate::color in teammates.rs
- Add kild-git crate to CLAUDE.md workspace structure
…635)

* Investigate issue #520: rename *Manager types to domain-role names

* feat(brain): add memory, hooks, maxTurns to kild-brain agent; deprecate --initial-prompt

- Add `memory: user`, `maxTurns: 200`, and agent-scoped `hooks:` to
  kild-brain frontmatter (PreToolUse bash guard + Stop fleet snapshot)
- Create `.claude/hooks/brain-bash-guard.sh` to enforce "no source code
  access" constraint at the hook level
- Replace ~40 lines of manual memory management with auto-memory note
- Fix router skill to use create-then-inject instead of --initial-prompt
- Deprecate --initial-prompt in CLI help text with runtime warnings
- Update CLAUDE.md with deprecation notes and inject-based brain setup

* fix: address review findings on brain hooks and deprecation warnings

- Guard: switch from fragile grep+sed JSON parsing to jq, fail closed
  on parse failure, add ERR trap, block subshell invocations (bash -c,
  sh -c), remove overly broad src/ pattern, document advisory nature
- Deprecation: eliminate contradictory double-warning for fleet sessions
  by branching into fleet-specific vs general paths (not both)
- Deprecation: use color::warning/color::hint consistently in open.rs
  (was bare eprintln, now matches create.rs)
- Deprecation: remove redundant initial_prompt_for_warning clone in
  create.rs, clone at use site instead
- Logging: add structured error!() events for inbox fallback failures
  in both create.rs and open.rs
- Stop hook: log stderr to file instead of /dev/null, ensure dir exists
- SKILL.md: add session-active check after sleep 5 before injecting,
  warn user if session not ready instead of silently losing the message

* refactor(fleet): replace dropbox protocol with universal inbox

Remove the complex dropbox messaging system (task IDs, history.jsonl,
flock locking, ack files) in favor of a simpler file-based inbox
protocol at ~/.kild/inbox/<project_id>/<branch>/.

Key changes:
- Delete dropbox.rs (2,100 lines) and its CLI commands (check-queue, report)
- Add inbox.rs with streamlined read_inbox_state() and generate_prime_context()
- Extract fleet instruction generation into fleet_instructions.rs
- Simplify Claude hooks: remove Stop prompt hook, SessionStart auto-prime,
  check-queue from TeammateIdle, and report from TaskCompleted
- Rewrite inbox/inject/prime CLI commands for the new protocol
- Add inbox path helpers to kild-paths

* fix(fleet): address review findings from PR #637

Critical fixes:
- Remove write_task("honryu",...) from forward_to_brain that clobbered
  the brain's task.md on every worker Stop event
- Populate fleet entries in kild prime --json (was always empty vec)
- Surface errors in handle_all_prime (was silently returning Ok)

Error handling fixes:
- Narrow status file read to only swallow NotFound, warn on other I/O errors
- Await spawn_blocking JoinHandle in forward_to_brain to catch panics
- Add eprintln for inbox status init failure (was warn-only)

Simplification:
- Remove unused _is_brain parameter from ensure_inbox
- Flatten write_task return from Result<Option<()>> to Result<bool>
- Extract build_fleet_entries() and render_fleet_table() helpers
- Merge _resolved wrappers into primary read_inbox_state/generate_prime_context
- Extract write_fleet_instructions_to() shared helper
- Warn on corrupt fleet instruction markers (begin without end)

Docs:
- Fix README.md stale dropbox references and removed --task/--report/--status flags
- Fix wave planner skill referencing deleted dropbox.rs
…erations (#638)

* fix(daemon): restore pooled connection timeout after short-timeout operations

Four functions (ping_daemon, get_session_status, get_session_info,
read_scrollback) set a 2s read timeout on pooled IPC connections but
never restored the default 30s before returning to the pool. The next
caller on the same thread inherited the corrupted 2s timeout, causing
spurious timeouts on slower operations.

Save the original timeout before overriding, and restore it before
returning the connection to the pool. If the restore fails, the
connection is dropped rather than poisoning the pool.

Add IpcConnection::get_read_timeout() to support the save/restore
pattern.

* refactor(daemon): extract RAII timeout guard for pooled connection operations

Replace 8 manual save/set/restore cycles across 4 functions with
IpcConnection::with_read_timeout() — a closure-based helper that
saves the original timeout, sets a short one, runs the caller's
closure, and restores the original on return.
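
A minimal sketch of the closure-based helper, assuming the connection wraps a Unix domain socket (so it only builds on Unix) and that I/O errors surface as io::Result. The struct layout and exact signature are guesses; only the names IpcConnection, get_read_timeout, and with_read_timeout come from the commits.

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::time::Duration;

pub struct IpcConnection {
    stream: UnixStream,
}

impl IpcConnection {
    pub fn new(stream: UnixStream) -> Self {
        Self { stream }
    }

    pub fn get_read_timeout(&self) -> io::Result<Option<Duration>> {
        self.stream.read_timeout()
    }

    /// Runs `op` with a temporary read timeout and restores the original
    /// timeout afterwards, whether or not `op` succeeded.
    pub fn with_read_timeout<T>(
        &mut self,
        timeout: Duration,
        op: impl FnOnce(&mut Self) -> io::Result<T>,
    ) -> io::Result<T> {
        let original = self.get_read_timeout()?;
        self.stream.set_read_timeout(Some(timeout))?;
        let result = op(self);
        // If the restore fails, the error is returned so the caller can drop
        // the connection instead of handing it back to the pool.
        self.stream.set_read_timeout(original)?;
        result
    }
}
```

Unlike a true RAII guard, this sketch does not restore the timeout if the closure panics.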

* fix(session): mark session Stopped even when daemon is unreachable

When the daemon isn't running, `kild stop` would fail with DaemonError
and leave the session stuck Active on disk. The early return at
stop.rs:124 fired before the session status was updated.

Consolidate the `is_daemon_unreachable` check from list.rs into a proper
`is_unreachable()` method on DaemonClientError. Use it in both
stop_session() and stop_teammate() to treat unreachable daemon errors
(NotRunning, ConnectionFailed, ProtocolError, Io) as "PTY already dead"
instead of blocking the stop flow.
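
The classification and its use in the stop path might look like this sketch. The four unreachable variants are the ones listed above; their payload types, the `Rejected` variant, `SessionStatus`, and the shape of `stop_session` are hypothetical simplifications.

```rust
#[derive(Debug)]
pub enum DaemonClientError {
    NotRunning,
    ConnectionFailed(String),
    ProtocolError(String),
    Io(std::io::Error),
    /// Hypothetical variant: the daemon answered, but refused the request.
    Rejected(String),
}

impl DaemonClientError {
    /// True when the daemon could not be reached at all, in which case the
    /// PTY is treated as already dead.
    pub fn is_unreachable(&self) -> bool {
        matches!(
            self,
            Self::NotRunning | Self::ConnectionFailed(_) | Self::ProtocolError(_) | Self::Io(_)
        )
    }
}

#[derive(Debug, PartialEq)]
pub enum SessionStatus {
    Active,
    Stopped,
}

/// Marks the session Stopped unless the daemon reported a real failure.
pub fn stop_session(
    status: &mut SessionStatus,
    daemon_result: Result<(), DaemonClientError>,
) -> Result<(), DaemonClientError> {
    match daemon_result {
        Ok(()) => {}
        // Unreachable daemon: nothing left to stop, so keep going.
        Err(e) if e.is_unreachable() => {}
        Err(e) => return Err(e),
    }
    *status = SessionStatus::Stopped;
    Ok(())
}
```

The status update now happens after the error check rather than being skipped by an early return.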

* fix(session): use is_unreachable() in destroy path for consistency
… ID races (#640)

* fix(shim): hold registry lock across load-modify-save to prevent pane ID races

state::load() and state::save() each acquired independent flocks. Between
the two calls, a concurrent split-window could read the same next_pane_id,
causing duplicate pane IDs and orphaned daemon PTYs.

Add LockedRegistry guard type that holds both the Flock and PaneRegistry.
save(self) writes while the lock is still held. Update all 8 callers in
commands.rs to use load_and_lock() + locked.save(). Add thread-level
concurrency test verifying unique pane ID allocation.
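
The guard idea can be illustrated with an in-process Mutex standing in for the file lock. This is a deliberate simplification: the real code holds a Flock on the registry file and writes to disk, and `RegistryFile` and `allocate_pane_id` are hypothetical names.

```rust
use std::sync::{Mutex, MutexGuard};

#[derive(Debug, Clone, Default)]
pub struct PaneRegistry {
    pub next_pane_id: u32,
}

/// Stand-in for the on-disk registry plus its lock.
#[derive(Default)]
pub struct RegistryFile {
    inner: Mutex<PaneRegistry>,
}

/// Holds the lock together with the loaded registry, so the lock spans the
/// whole load-modify-save sequence.
pub struct LockedRegistry<'a> {
    guard: MutexGuard<'a, PaneRegistry>,
    registry: PaneRegistry,
}

impl RegistryFile {
    pub fn load_and_lock(&self) -> LockedRegistry<'_> {
        let guard = self.inner.lock().expect("registry lock poisoned");
        let registry = guard.clone();
        LockedRegistry { guard, registry }
    }
}

impl LockedRegistry<'_> {
    pub fn registry_mut(&mut self) -> &mut PaneRegistry {
        &mut self.registry
    }

    /// Writes the modified registry back while the lock is still held.
    /// Consuming self releases the lock only after the write.
    pub fn save(self) {
        let LockedRegistry { mut guard, registry } = self;
        *guard = registry;
    }
}

/// Allocates a pane ID under a single lock acquisition.
pub fn allocate_pane_id(file: &RegistryFile) -> u32 {
    let mut locked = file.load_and_lock();
    let id = locked.registry_mut().next_pane_id;
    locked.registry_mut().next_pane_id += 1;
    locked.save();
    id
}
```

Because load and save share one lock acquisition, two concurrent callers can never observe the same next_pane_id.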

* fix(shim): gate standalone load() behind #[cfg(test)], keep save() for init_registry

* fix(shim): address review findings from PR #640

Update module doc to reflect that load() is now test-only and save()
is init-only. Add read/write phase comments to handle_new_session.
Restructure handle_new_window to do all reads before writes,
eliminating interleaved registry()/registry_mut() calls.

* feat(ui): add terminal reconnection on daemon disconnect

When the daemon reader connection drops, the terminal view now shows
"Press R to reconnect" instead of a dead-end error. The Reconnect action
spawns an async task that calls connect_for_attach(), builds a new
Terminal::from_daemon(), and replaces the terminal + event task
atomically. Handles edge cases: daemon gone, session destroyed, multiple
rapid presses, local terminals unaffected.

* fix(ui): address review findings from PR #641

- Use actual terminal dimensions on reconnect instead of hardcoded 24x80
- Log error on reconnect_state lock poison instead of silently swallowing
- Accept uppercase R for reconnect key (CapsLock resilience)
- Fix theme::surface_1() -> theme::surface() compilation error

Add a kild-fleet MCP channel server that watches inbox files and pushes
notifications into Claude Code sessions via the channels protocol. This
reduces fleet communication latency from ~1s (Claude inbox polling) to
~100ms (fs.watch + stdio notification).

The channel server (TypeScript/Bun) is embedded in the Rust binary and
installed to ~/.kild/channels/fleet/ via `kild init-channels`. It exposes
MCP tools (report_status, send_to_worker, send_to_brain, list_fleet) so
agents can communicate without shelling out to CLI commands.

Gated behind [fleet] channels = true config flag (default: false).
Requires Bun runtime. Graceful degradation when unavailable.

New command: `kild init-channels` — installs server + bun deps.
New config: `[fleet] channels` — enables channel server for fleet sessions.
…tall

Channel server writes a .channel breadcrumb to the inbox dir after MCP
handshake completes. `kild inbox` surfaces this as [channel] next to the
status line, so you can confirm the channel server is actually connected.

Stale breadcrumbs are cleaned on session create/open (ensure_inbox).

Also skip `bun install` in `kild init-channels` when node_modules exists.