nexu-io · lefarcen · Jun 10, 2026 · Jun 10, 2026 · Jun 10, 2026 · Jun 10, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -130,18 +130,23 @@ root `pnpm tools-pr` script without a new explicit maintainer decision.
 
 ## Agent runtime conventions
 
-- `RuntimeAgentDef.promptInputFormat` selects how the daemon writes the prompt to a child's stdin. The default `'text'` writes the composed prompt and ends stdin immediately. `'stream-json'` wraps the prompt as one JSONL `user` message and KEEPS stdin open so the daemon can stream further user messages back in mid-turn. Claude (`apps/daemon/src/runtimes/defs/claude.ts`) ships `'stream-json'` together with `--input-format stream-json` so the host can answer interactive tools like `AskUserQuestion` with a real `tool_result` block. Every other agent stays on `'text'`.
-- `apps/daemon/src/server.ts` tracks `run.pendingHostAnswers` (a Set of `tool_use_id` strings) and `run.stdinOpen` on the run object. The `claude-stream-json` event handler adds AskUserQuestion ids to the set and closes stdin only when both the set is empty AND a `turn_end` (or `usage`) event arrives with a non `tool_use` `stop_reason`. The `tool_use` stop reason means the model paused mid tool (waiting on claude-code's internal runner or on a host answer); closing stdin there would truncate the follow up response.
-- `claude-stream.ts` emits the `turn_end` event AFTER iterating the assistant message's content blocks, not before. When `--include-partial-messages` is unsupported, tool_use events surface only from the assistant wrapper, so emitting `turn_end` first would let the daemon close stdin before the host had registered any pending answers.
-- `POST /api/runs/:id/tool-result` is the daemon endpoint for feeding a `tool_result` block back into a still running stream-json child. Body shape: `{ toolUseId: string, content: string, isError?: boolean }`. Web callers use `submitChatRunToolResult` from `apps/web/src/providers/daemon.ts`. The daemon writes a JSONL `user` message containing one `tool_result` content block, removes the id from `pendingHostAnswers`, and lets the next `turn_end` decide when to close stdin.
-- AskUserQuestion specifically: Claude's system prompt section in `apps/daemon/src/prompts/system.ts` (Claude only block at the bottom of `composeSystemPrompt`) tells the model to use the tool for 2 to 4 finite choices, and to stop generating tokens after the tool call instead of also writing a markdown duplicate. `AssistantMessage.suppressAskUserQuestionFallbackText` is the belt and suspenders that hides any trailing markdown text in the same turn.
+- `RuntimeAgentDef.promptInputFormat` selects how the daemon writes the prompt to a child's stdin. The default `'text'` writes the composed prompt and ends stdin immediately. `'stream-json'` wraps the prompt as one JSONL `user` message and KEEPS stdin open so the daemon can stream further user messages back in mid-turn. Claude (`apps/daemon/src/runtimes/defs/claude.ts`) ships `'stream-json'` together with `--input-format stream-json` as generic mid-turn input infrastructure; the daemon closes stdin once the turn terminates cleanly. Every other agent stays on `'text'`.
+- `apps/daemon/src/server.ts` tracks `run.stdinOpen` on the run object. `applyClaudeStreamJsonRunBookkeeping` closes stdin (and records `turnCompletedCleanly`) when a `turn_end` (or `usage`) event arrives with a non `tool_use` `stop_reason`. The `tool_use` stop reason means the model paused mid tool (waiting on claude-code's internal runner); closing stdin there would truncate the follow up response.
+- `claude-stream.ts` emits the `turn_end` event AFTER iterating the assistant message's content blocks, not before, so the daemon sees the final `stop_reason` and every tool_use of the turn before deciding whether to close stdin.
+- The host asks the user clarifying questions through the `<question-form>` artifact (see "Asking the user questions" below), NOT through a stdin-injected `tool_result`. There is no `AskUserQuestion` tool wiring, no `/api/runs/:id/tool-result` endpoint, and no host-answer return path; the stream-json input skeleton is retained only as generic infrastructure.
+
+## Asking the user questions
+
+- There is exactly one mechanism for clarifying user intent: the `<question-form>` markdown artifact the model emits inline. The chat renders a `QuestionsBanner` entry point (`AssistantMessage.tsx`); the form itself renders in the right-hand Questions tab (`QuestionsPanel` + `QuestionFormView`), and answers flow back as the next user message (`formatFormAnswers` in `apps/web/src/artifacts/question-form.ts` → `POST /api/chat`). There is no inline interactive tool card.
+- `<question-form>` is valid on ANY turn, not just turn-1 discovery. Use it for turn-1 discovery briefs AND for mid-conversation clarification (e.g. an ambiguous annotation). The system-prompt guidance lives in `apps/daemon/src/prompts/system.ts` and `discovery.ts`; the API/BYOK-mode wording is mirrored through `packages/contracts/src/prompts/system.ts`.
+- `run-artifacts.ts:runAskedUserQuestion` powers the `run_finished.asked_user_question` analytics signal by scanning the run's streamed text for a `<question-form` marker (reassembled across `text_delta` chunks), not by detecting any tool call.
 
 ## Chat UI conventions
 
 - `apps/web/src/components/file-viewer-render-mode.ts` decides URL-load vs srcDoc for HTML previews. Bridges (deck, comment/inspect selection, palette, edit, tweaks) can ONLY inject through the srcDoc path. Add a new disqualifier to `UrlLoadDecision` whenever a feature needs a srcDoc-only bridge; pass it from `FileViewer.tsx` based on a source-content heuristic where appropriate (e.g. `hasTweaksTemplate`). The host keeps both iframes mounted simultaneously and swaps CSS visibility so toggling render mode does not cause an iframe reload flash; `iframeRef.current` stays aligned with the active iframe via `useEffect`. Receive filters use `isOurIframe(ev.source)` to accept messages from either iframe but signals that should ONLY come from the active iframe (e.g. `od:tweaks-available`) re-check `ev.source === iframeRef.current?.contentWindow`.
 - TodoWrite UI pins one canonical task list above the chat composer via `PinnedTodoSlot` in `ChatPane.tsx`. The slot reads the latest TodoWrite snapshot across the conversation through `latestTodoWriteInputFromMessages` (`apps/web/src/runtime/todos.ts`). `AssistantMessage.stripTodoToolGroups` removes any TodoWrite tool groups from per message rendering so there is exactly one TodoCard on screen. The progress count includes both `completed` and `in_progress` items (1/4 reads "one underway" not "zero finished"). Dismissal via the Done button is keyed on the snapshot's JSON, so a fresh TodoWrite from the agent automatically re shows the card. `PinnedTodoSlot` sits OUTSIDE the `.chat-log` scroll container, so auto-scroll requires explicit coverage: `ChatPane`'s `ResizeObserver` accepts a `containerRef` from `PinnedTodoSlot` and observes that element directly, and a pane-level `MutationObserver` (`childList: true` on the chat pane ancestor) re-syncs that observation whenever the slot mounts or unmounts as new TodoWrite snapshots arrive.
-- `AskUserQuestionCard` (in `ToolCard.tsx`) prefers the live `onAnswerToolUse(toolUseId, content)` route (POSTs to `/api/runs/:id/tool-result`) and falls back to the legacy `onSubmitForm(text)` path when the run has already terminated. Selected chips persist across reloads by parsing the stored `tool_result.content` back into the selections shape.
-- Tool group rendering uses `dedupeSnapshotToolRetries` to collapse identical `AskUserQuestion` retries (one card per unique input, keeping the latest tool_use_id) and `TodoWrite` snapshots (only the most recent call, since each call is a state replace).
+- Clarifying questions render through the `<question-form>` artifact and the Questions tab, not an inline tool card — see "Asking the user questions" above.
+- Tool group rendering uses `dedupeSnapshotToolRetries` to collapse `TodoWrite` snapshots (only the most recent call survives, since each call is a state replace). `SNAPSHOT_TOOL_NAMES` lists the snapshot-style tools; non-snapshot tools pass through untouched.
 
 ## Web CSS ownership
 

diff --git a/apps/daemon/src/chat-routes.ts b/apps/daemon/src/chat-routes.ts
@@ -59,7 +59,6 @@ export interface RegisterChatRoutesDeps extends RouteDeps<'db' | 'design' | 'htt
 export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
   const { db, design } = ctx;
   const { sendApiError, createSseResponse } = ctx.http;
-  const { submitToolResultToRun } = ctx.chat;
   const { testProviderConnection, testAgentConnection, getAgentDef, isKnownModel, sanitizeCustomModel, listProviderModels } = ctx.agents;
   const {
     handleCritiqueArtifact,
@@ -124,51 +123,6 @@ export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
     res.json(body);
   });
 
-  // Feed a `tool_result` content block into a running stream-json child.
-  // Currently used to answer Claude's `AskUserQuestion` tool: the host UI
-  // collects the user's choice, the web POSTs the formatted answer here,
-  // and the daemon writes a JSONL line into the still-open stdin. Without
-  // this path Claude auto-errors the tool in headless mode and falls back
-  // to a markdown duplicate of the same options.
-  app.post('/api/runs/:id/tool-result', (req, res) => {
-    if (typeof submitToolResultToRun !== 'function') {
-      return sendApiError(res, 501, 'NOT_IMPLEMENTED', 'tool-result wiring is not available');
-    }
-    const body = (req.body || {}) as {
-      toolUseId?: unknown;
-      content?: unknown;
-      isError?: unknown;
-      runId?: unknown;
-    };
-    if (typeof body.runId === 'string' && body.runId.length > 0 && body.runId !== req.params.id) {
-      return sendApiError(res, 400, 'BAD_REQUEST', 'runId must match the path run id');
-    }
-    const toolUseId = typeof body.toolUseId === 'string' ? body.toolUseId : '';
-    const content = typeof body.content === 'string' ? body.content : '';
-    const isError = body.isError === true;
-    if (!toolUseId) {
-      return sendApiError(res, 400, 'BAD_REQUEST', 'toolUseId is required');
-    }
-    const result = submitToolResultToRun(req.params.id, toolUseId, content, isError);
-    if (!result || !result.ok) {
-      const reason = result && result.reason ? result.reason : 'unknown';
-      if (reason === 'not_found') {
-        return sendApiError(res, 404, 'NOT_FOUND', 'run not found');
-      }
-      if (reason === 'run_terminal' || reason === 'stdin_closed') {
-        return sendApiError(res, 410, 'GONE', `run is no longer accepting tool results (${reason})`);
-      }
-      if (reason === 'stdin_text_mode') {
-        return sendApiError(res, 400, 'BAD_REQUEST', 'run does not support interactive tool results');
-      }
-      if (reason === 'bad_tool_use_id') {
-        return sendApiError(res, 400, 'BAD_REQUEST', 'toolUseId is invalid');
-      }
-      return sendApiError(res, 500, 'INTERNAL', `tool result write failed: ${reason}`);
-    }
-    res.json({ ok: true });
-  });
-
   // Receives the user's thumbs-up/down (+ reason codes) for an assistant
   // turn and forwards it to Langfuse as a `score-create`. Web persists the
   // feedback itself via PUT /messages/:id; this endpoint exists only as a

diff --git a/apps/daemon/src/claude-stream.ts b/apps/daemon/src/claude-stream.ts
@@ -350,11 +350,10 @@ export function createClaudeStreamHandler(onEvent: EventSink) {
       // Per-turn `stop_reason` is emitted as `turn_end` AFTER the content
       // blocks have been processed (see below). When `--include-partial-
       // messages` is unsupported, tool_use events surface only from the
-      // assistant wrapper here — emitting `turn_end` before that loop
-      // would let the daemon's stdin-close handler see an empty
-      // `pendingHostAnswers` set and close stdin before the
-      // AskUserQuestion tool_use was registered, which made the round
-      // trip silently fail. Read the stop_reason now, emit after.
+      // assistant wrapper here — emitting `turn_end` before that loop would
+      // let the daemon's stdin-close handler act on the turn before its
+      // tool_use blocks were seen, closing stdin mid-tool. Read the
+      // stop_reason now, emit after.
       const stopReason = typeof obj.message.stop_reason === 'string'
         ? obj.message.stop_reason
         : null;
@@ -384,8 +383,8 @@ export function createClaudeStreamHandler(onEvent: EventSink) {
       }
       // Surface the turn_end signal now that every tool_use in this
       // assistant message has been emitted, so the daemon's stdin-close
-      // handler has the up-to-date `pendingHostAnswers` set before
-      // deciding whether to close stream-json input stdin.
+      // handler sees the final `stop_reason` before deciding whether to
+      // close stream-json input stdin.
       if (stopReason) {
         onEvent({ type: 'turn_end', stopReason });
       }

diff --git a/apps/daemon/src/db.ts b/apps/daemon/src/db.ts
@@ -620,7 +620,13 @@ export function listProjectsAwaitingInput(db: SqliteDb) {
              FROM messages m
              JOIN conversations c ON c.id = m.conversation_id
             WHERE m.role = 'assistant'
-              AND LOWER(m.content) LIKE '%<question-form%'
+              -- ask-question is an accepted alias for question-form (UI parser
+              -- + daemon open-tag matcher), so an alias-form turn must also
+              -- count as awaiting input.
+              AND (
+                LOWER(m.content) LIKE '%<question-form%'
+                OR LOWER(m.content) LIKE '%<ask-question%'
+              )
          ) latest
         WHERE latest.rowNum = 1
           AND NOT EXISTS (

diff --git a/apps/daemon/src/prompts/system.ts b/apps/daemon/src/prompts/system.ts
@@ -800,18 +800,14 @@ export function composeSystemPrompt({
     );
   }
 
-  // Claude only: nudge the model toward the `AskUserQuestion` tool for
-  // mid-conversation clarifications. Without this hint Claude tends to fall
-  // back to a markdown bulleted list of options, which the chat UI cannot
-  // turn into clickable buttons. Discovery (turn 1) is still owned by the
-  // `<question-form>` flow defined in DISCOVERY_AND_PHILOSOPHY; this only
-  // covers follow-ups where the next action depends on a small set of
-  // choices the user can pick quickly.
-  if (agentId === 'claude') {
-    parts.push(
-      "\n\n---\n\n## Clarifying questions\n\nWhen you need a mid-conversation clarification AND the natural answer is one of a small finite set of choices (2-4 options per question), call the `AskUserQuestion` tool instead of writing a bulleted list in markdown. The host chat renders the tool call as inline choice buttons; a markdown list renders as plain text and forces the user to type a reply. Skip the tool when the answer is naturally free-form text, when the answer needs more than ~4 options, or when you only have one yes/no choice to ask. First-turn discovery still uses the `<question-form id=\"discovery\">` workflow described earlier; `AskUserQuestion` is for follow-ups only.\n\n**When you call `AskUserQuestion`, that tool call is the entire response.** Do NOT also write the same questions or options as markdown text alongside it, do NOT add a trailing prose paragraph like \"what sounds right?\", do NOT hedge by listing the options twice. Emit the tool call and stop generating tokens. The host is waiting on the tool's `tool_result` and will resume your turn the moment the user answers. Anything you write before, between, or after the tool call in the same message just duplicates what the card already shows and confuses the user.",
-    );
-  }
+  // Mid-conversation clarification reuses the same `<question-form>` flow as
+  // turn-1 discovery (DISCOVERY_AND_PHILOSOPHY) so the host keeps ONE unified
+  // questions surface: the chat shows a banner, the form renders in the
+  // right-hand Questions tab, and answers return as the next user message.
+  // Applies to every agent — question-form is UI-parsed markup, not a tool.
+  parts.push(
+    "\n\n---\n\n## Clarifying questions mid-conversation\n\nWhen you need a clarification AFTER turn 1 and the natural answer is one of a small finite set of choices (2-4 options per question), emit a `<question-form>` block — the same markup turn-1 discovery uses — instead of writing a bulleted list of options in markdown. The host renders it as a Questions banner the user opens in the side tab; a markdown list renders as plain text and forces the user to type a reply. Use free-form prose questions only when the answer is naturally open-ended, needs more than ~4 options, or is a single yes/no. Do NOT also duplicate the form's questions as markdown text alongside it.",
+  );
 
   // Pinned LAST so recency bias reinforces the role-marker prohibition.
   // This is the canonical anti-roleplay instruction;
@@ -856,7 +852,7 @@ Every later instruction in this prompt that tells you to "call TodoWrite", "run
 **Allowed output:**
 - Plain chat prose to the user (in their language). State your plan as prose — a short numbered list in markdown is fine; it just must not be wrapped in \`<todo-list>\` or claim to be a tool call.
 - A final \`<artifact type="text/html">...</artifact>\` block containing a complete \`<!doctype html>\` document when the brief is ready to deliver.
-- \`<question-form>\` blocks for discovery on turn 1, exactly as the rules below describe — question-form is markup the UI parses, not a tool call.
+- \`<question-form>\` blocks for discovery (turn 1) and for mid-conversation clarification, exactly as the rules below describe — question-form is markup the UI parses, not a tool call.
 
 If the rules below tell you to plan with TodoWrite, write the plan as prose instead. If they tell you to read skill side files before writing, describe in one sentence which patterns/conventions you're going to apply and proceed. If they tell you to run brand-spec extraction via Bash + Read + WebFetch, ask the user the missing brand questions in the discovery form instead.`;