BYOK media-model picker + background runs for SenseAudio chats#2720
BYOK media-model picker + background runs for SenseAudio chats#2720mzl163 wants to merge 53 commits into
Conversation
Un-filter the senseaudio provider in the new-project media picker for both the image and video surfaces, register a distinct-id SenseAudio video model (senseaudio-video-2.0-260128 → wire doubao-seedance-2-0-260128), and add a daemon renderSenseAudioVideo (async /v1/video/create + /v1/video/status poll + SSRF-guarded mp4 download). Also swap the SenseAudio TTS voice input from a free-text datalist to a name-based <select> over the official voice catalogue (still preserves out-of-catalogue ids).
BYOK proxy streams used to write straight into the POST response with no
buffer, so navigating away froze the chat mid-output with no way to resume.
Route every /api/proxy/*/stream handler through design.runs instead: the
POST answers 202 { runId } and the upstream work (incl. the SenseAudio tool
loop) runs as a registry run, emitting start/stdout/end/error into the run
buffer. A disconnected client leaves the run alive; the web reattaches via
GET /api/runs/:id/events?after= — the same path agent runs use.
- chat-routes: shared run-backed stream adapter + runByokProxy; all six
protocol handlers (anthropic/openai/azure/google/ollama/senseaudio) keep
their per-protocol logic and only swap the sink. Upstream fetch now takes
an AbortSignal for cooperative cancel.
- runs: add an onCancel hook so cancel() can abort a BYOK run's upstream
fetch (no OS child process to SIGTERM).
- web: streamByokViaDaemon (POST proxy → runId → consumeDaemonRun); the
BYOK send path pins runId/status/lastEventId onto the assistant row like
the agent path, and the reattach effect no longer skips non-daemon mode.
…ideo/audio) The BYOK chat's media tools were hardwired to SenseAudio, and the composer only exposed a SenseAudio image-model dropdown. Route the tools through the unified media dispatcher (generateMedia) instead, so a BYOK chat can generate with ANY catalogue model whose provider has credentials configured in Settings → Media — and expose image / video / audio model pickers in the composer over the whole registry. - byok-tools: generate_image / generate_video now delegate to generateMedia (provider-routed, credential-resolved, project-written); add a generate_audio (speech / music / sfx) tool. Model validation broadened from SenseAudio-only to the full registry per surface; tool `model` enums now span every catalogue model. - senseaudio proxy handler: build toolCtx with composer-selected image / video / audio defaults, falling back to the SenseAudio model per surface (the only provider whose key this chat seeds) rather than the global catalogue default; await the media-config seed so generateMedia finds the key before the first tool fires; dispatch the new audio tool. - composer: replace the single image dropdown with image / video / audio pickers grouped by provider; thread the three selections through ChatPane → ProjectView → streamByokViaDaemon proxy body. - i18n: add settings.byokVideoModel / settings.byokAudioModel across locales. - tests: rewrite byok-tools spec around the generateMedia delegation; add a proxy-routes spec proving a composer-selected model reaches the upstream generation request (the picker is wired, not decorative).
The three inline image / video / audio selects crowded the composer. Replace them with a single compact "Media models" button that opens a popover holding all three pickers (closes on outside click / Escape). Adds the settings.byokMediaModels label across locales.
…configured generateMedia falls back to a placeholder stub when OD_MEDIA_ALLOW_STUBS is set, so a BYOK chat could "succeed" with a fake image for a model whose provider has no credentials — making the composer model picker look meaningless. Add a per-call allowStub opt-out and pass allowStub:false from the BYOK media tools so an unconfigured / unintegrated model surfaces a real error the chat reports, instead of a placeholder that looks generated.
…tool arg A SenseAudio chat could generate with its own model choice even after the user picked a different one, because the LLM's `model` tool argument overrode the composer selection. Add pinned* fields to the tool context: the user's pick now wins over any model the LLM passes (pinned > LLM arg > surface fallback), so the picker actually controls generation — and a pinned unconfigured provider errors clearly instead of silently falling back to SenseAudio.
handleSend reads the per-session byok image/video/audio model overrides when building the proxy body, but they were missing from its useCallback deps — so it closed over the initial empty values and the composer picker never changed what got generated (daemon logged pin[img=-]). Add the three overrides to the deps array. Also log the received pin[img/vid/aud] models on each senseaudio proxy request to make "wrong model" reports diagnosable.
…y path The BYOK background+reattach change routed API-mode chats through streamByokViaDaemon instead of the in-browser streamMessage, leaving these specs driving (and mocking) the old path — red since that commit. Drive the new run-registry callbacks (onRunCreated/onRunStatus + handlers) so the empty-response soft-state, success-sound, attachment-inlining, and ElevenLabs-voice-injection behaviours are verified on the real path; mock streamByokViaDaemon in resume-conversation so its auto-send no longer leaks an unhandled rejection.
Each SenseAudio image model accepts a different fixed set of sizes — 2.0 takes 1024x1024 but 1.0 / doubao-seedream-5.0 reject it with `参数错误:size`. The size map was global (2.0-only), so picking any other image model failed once the composer selection actually reached the backend. Make senseAudioImageSize per-model and pick the valid size whose ratio is closest to the requested aspect.
…tive select) The native <select> in the media-models popover misbehaved in Electron: clicking an option in the OS dropdown could trip the popover's outside-click handler and unmount the select before its change committed, so the user's pick never reached React state (daemon kept logging pin[img=-]). Replace it with a custom button-list dropdown — option clicks fire a normal React onClick that always commits. Verified end-to-end via a real option click: daemon receives pin[img=senseaudio-image-1.0-260319] and generates with that model.
Two follow-ups so the composer media-model picker behaves as users expect: - Priority: an explicit model the user names in chat (forwarded by the LLM as the tool's `model` arg) now wins over the composer pick, which wins over the surface fallback — composer is the DEFAULT, not an override. Earlier it was made authoritative while debugging; that prevented overriding via chat. Tool descriptions now tell the model to omit `model` unless the user named one. Renamed the tool-context fields pinned* → composer* to match. - Label: the "(default)" option showed the global catalogue default (gpt-image-2) even though an empty selection actually routes to the SenseAudio model — so users thought gpt-image-2 was selected when it wasn't. The picker now labels the default with the SenseAudio surface model, matching what the daemon uses.
Show a real-time green dot on a project when it has an in-flight (queued or running) agent / BYOK run, so users can see at a glance which projects are still working in the background. Added a useRunningProjectIds hook (polls the run registry + reacts to RUNS_CHANGED) and surface the dot in two places: the home "recent projects" cards and the top workspace tabs bar.
…-picker # Conflicts: # apps/web/src/components/ProjectView.tsx
Reconcile the BYOK media tooling with upstream nexu-io#2570 (SenseAudio BYOK TTS). Upstream added a SenseAudio-specific generate_speech tool that hits /t2a_v2 directly; this branch already routes all media (image/video/audio) through the unified generateMedia dispatcher, whose renderSenseAudioTTS path performs the same /t2a_v2 + hex->mp3 work. Kept the full-registry generate_image/video/audio tools and dropped the duplicate generate_speech tool, executeGenerateSpeech, and their tests so the chat exposes one TTS path (generate_audio) instead of two overlapping ones.
The merged tree pairs main's ProjectView.reattach-restore.test.tsx (from nexu-io#2383, which mocks ../../src/i18n with only useT) against ProjectView.tsx, which calls useI18n at render. Vitest threw "No useI18n export defined on the mock" for all five reattach cases. Add useI18n to the mock factory so the component's i18n hook resolves, matching the other ProjectView test files.
Normal agent chats (Claude Code / Codex) now show the same registry-wide media-model picker the SenseAudio BYOK chat has. The selection is a per-turn default: the daemon forwards it on the run request and injects it as OD_DEFAULT_IMAGE_MODEL / OD_DEFAULT_VIDEO_MODEL / OD_DEFAULT_AUDIO_MODEL into the spawned agent's env, and `od media generate` falls back to it when the agent omits --model. An explicit --model (the user named a model in chat) still wins. - contracts: ChatRequest gains imageModel/videoModel/audioModel. - cli: `od media generate` --model is now optional; resolves from the matching OD_DEFAULT_*_MODEL env when omitted, errors only if neither is set. - daemon: startChatRun reads the per-turn defaults, injects the OD_DEFAULT_* env on spawn, and the system prompt tells the agent to omit --model when a default is configured. - web: the composer picker renders for normal agent chats (gated on daemon mode, hidden when the SenseAudio BYOK picker is active) and its empty-state "(default)" label names the registry default rather than always SenseAudio.
…chat media Factor SenseAudio's media tool loop (inject BYOK_MEDIA_TOOLS, stream deltas, accumulate tool_calls, run generate_image/video/audio via generateMedia, feed results back, loop) into a reusable runByokMediaChat helper, and refactor the SenseAudio handler onto it (proxy-routes tests unchanged: 56/56). Wire the OpenAI-compatible BYOK proxies onto the same helper so they get in-chat media generation too: - openai: seeds the BYOK key into media-config.openai (its key also unlocks gpt-image + TTS) and defaults each surface to OpenAI's model. - azure: OpenAI-compatible chat; no dedicated media provider in the registry, so it does not self-seed — in-chat media routes to whatever the user configured in Settings → Media. Honors the deployment-path model-in-URL rule via includeModel. The helper degrades to a plain LLM passthrough when the request carries no valid projectId (no tools injected), so non-project BYOK chats are unchanged. Adds defaultMediaModelsForProvider() to derive per-provider surface defaults. Ollama keeps its native /api/chat NDJSON passthrough (no OpenAI tool_calls).
… chats
Adds a dedicated runAnthropicMediaChat adapter so Anthropic (Claude) BYOK chats
can generate media in-chat. Claude's Messages API uses a different tool
protocol than OpenAI, so the adapter declares tools as { name, input_schema },
accumulates `tool_use` content blocks (id/name + streamed input_json_delta),
executes generate_image/video/audio via generateMedia, and feeds results back
as `tool_result` content blocks in a user turn. Claude has no media API of its
own, so it never self-seeds — generation routes to whatever the user configured
in Settings → Media.
Broadens the composer media-model picker from SenseAudio-only to every BYOK
protocol that now injects media tools (senseaudio, openai, azure, anthropic),
gated on !agentMediaPickerEnabled so it never collides with the normal-agent
picker. preferProvider names the empty-state default per vendor (SenseAudio /
OpenAI default to their own seeded model; Azure / Anthropic show the registry
default since media comes from separately-configured providers).
Tests: proxy-routes gains an OpenAI mediaEnabled-gating test (tools injected
with a projectId, omitted without) and an Anthropic adapter test (tool_use →
tool_result → text turn). 58/58 pass.
…-picker # Conflicts: # apps/web/tests/components/ProjectView.reattach-restore.test.tsx
…esolution)
Align renderSenseAudioVideo with the SenseAudio /v1/video/create doc and make
the output controllable. This fixes three discrepancies that affected the CLI
(`od media generate`) and the BYOK chat path alike (both route through
renderSenseAudioVideo):
- i2v reference frame: the image content block used the OpenAI shape
{ type:'image_url', image_url:{ url } }, but the doc requires
{ type:'image', url, role:'first_frame' }. Image-to-video now passes the
reference in the documented form so it is actually honored.
- watermark: the gateway defaults watermark to true, so clips were stamped.
Send watermark:false by default.
- resolution: was hard-pinned to 720p. Thread an optional resolution
(480p|720p|1080p, default 720p) end to end — generateMedia args + MediaContext,
both /api/projects/:id/media/generate handlers, the `od media generate
--resolution` flag, and the BYOK generate_video tool param. providerNote now
reports it.
Text-to-video, image model/prompt/size, and prompt passing were already
correct; this only touches the video request body.
Tests: media-senseaudio-video gains a resolution-override assertion and an i2v
content-shape assertion (documented { type:'image', url, role }, not image_url);
existing create-body test now also locks resolution=720p + watermark=false.
Resolves 5 conflicts after main advanced 110 commits: - chat-routes.ts: keep the runByokMediaChat refactor (in-chat media for all media-capable BYOK providers) and fold main's max_tokens/max_completion_tokens handling into runTurn — first attempt via buildOpenAIChatTokenParam, with a 400 isUnsupportedMaxTokensError retry on max_completion_tokens, so GPT-5/o-series and Azure deployment aliases work through the shared media loop. - ChatComposer.tsx / ProjectView.tsx: keep both sides' imports (full media-models set + deriveUploadCohort; streamByokViaDaemon + reportChatRunFeedback). - ProjectView.tsx stream call: use the shared run-lifecycle const handlers and add main's analyticsHints spread. - RecentProjectsStrip.tsx: keep main's !publishedDesignSystem guard on isActive and the live isRunning green-dot indicator. - Drop resume-conversation test (removed on main in nexu-io#2562). Aligned 2 Azure token-param retry tests to the run-registry (202 {runId}) shape since BYOK proxies now stream through the run registry.
…udioVideo
The header docblock still described the old OpenAI-style
{ type:'image_url', image_url:{url} } i2v entry; the actual create body
(and its inline comment) already send the documented SenseAudio shape
{ type:'image', url, role:'first_frame' }. Comment-only; no behavior change.
…docs
The home composer's media options were generic (2K/4K size, 3-30s
duration) and didn't match what the SenseAudio gateway accepts, and a
picked size never reached the generation call. End-to-end fix so the UI
options, the brief, the project metadata, and the actual `od media
generate` call all agree with the SenseAudio docs.
Image (/v1/image/async): per-model discrete pixel sizes (e.g.
2048x1152), shown as a Size dropdown that REPLACES the Ratio dropdown
for discrete-size models (the size already encodes the aspect, never
both). The chosen size flows metadata.imageSize -> system prompt --size
-> CLI --size -> renderSenseAudioImage (honors an explicit documented
size, else maps aspect). Daemon + web size tables expanded to the full
documented per-model lists (incl. sensenova-u1-fast).
Video (/v1/video/create): resolution dropdown = 480p/720p/1080p
(default 720p), duration capped to 4-15s for SenseAudio video; both
model-aware so non-SenseAudio models keep their own options. Resolution
flows metadata.videoResolution -> system prompt --resolution -> daemon.
Also stop the agent echoing a stale/contradictory aspect ratio for
size-based images: the od-media-generation brief no longer hardcodes
"Aspect: {{aspect}}", and the system prompt tells the agent to describe
the output by its exact pixel size only (never approximate a ratio like
16:9). The media-generation contract now documents the --size and
--resolution flags plus the SenseAudio per-model size / video-resolution
rules, so the agent has the full parameter spec, not just the chosen
value.
Contracts: ProjectMetadata gains imageSize + videoResolution.
…-picker # Conflicts: # apps/daemon/src/byok-tools.ts # apps/daemon/src/chat-routes.ts # apps/daemon/src/media-routes.ts # apps/daemon/src/media.ts # apps/daemon/tests/byok-tools.test.ts
…-picker # Conflicts: # apps/web/src/i18n/locales/fr.ts
…-picker Conflicts resolved: - apps/daemon/src/server.ts: combine per-turn defaultMediaModels with main's mediaExecution policy params on agent run spawn. - apps/daemon/src/prompts/system.ts: call main's renderMediaMetadataAction helper while keeping branch's dynamic --size / --resolution dispatch args and the 'do NOT restate aspect' instruction for discrete sizes. - apps/daemon/src/media-routes.ts: keep main's handleGenerate refactor with policy gating; thread resolveProjectMediaModel and size/resolution body fields through it.
…-picker
Conflicts resolved:
- apps/daemon/src/chat-routes.ts: keep branch's runByokMediaChat /
runAnthropicMediaChat / runByokProxy helpers across anthropic, openai,
azure, google, and ollama proxy routes. Thread main's OpenRouter
attribution headers (HTTP-Referer / X-Title) through openai authHeaders.
Drop main's inline azure max_completion_tokens retry pair because the
helper already performs that retry via buildOpenAIChatTokenParam /
isUnsupportedMaxTokensError. Adopt main's createDeltaGuard contamination
protection on the google and ollama paths.
- apps/daemon/src/media.ts: keep both generateMedia args — main's images?:
string[] (multi-image) and branch's allowStub?: boolean (BYOK strict mode).
- apps/daemon/src/prompts/system.ts: keep branch's imageSize XOR
aspectRatio dispatch; use main's improved aspectRatio default copy
('1:1 (default — use 16:9 for landscape/outdoor scenes, 9:16 for
portrait/vertical)') in the aspect-only branch.
- apps/web/src/components/NewProjectPanel.tsx: union supportedModels —
keep senseaudio image+video entries alongside main's new openrouter /
imagerouter / leonardo / custom-image providers.
- apps/web/tests/components/NewProjectPanel.test.ts: keep both the
SenseAudio voice catalogue suite and main's OpenRouter visibility cases.
lefarcen
left a comment
There was a problem hiding this comment.
Hey @mzl163, the feature scope is laid out clearly — the media-model picker, SenseAudio voice dropdown, and background-run indicators are easy to understand from the description and screenshots. One PR-description detail before reviewers scope this: the Surface area checklist currently leaves API / contract unchecked and says there are no packages/contracts changes, but this branch updates packages/contracts/src/api/chat.ts and packages/contracts/src/api/projects.ts. Could you tick API / contract and adjust that sentence to mention the optional chat media-model fields / project metadata additions? That’ll keep release verification aligned with the actual surface.
…-picker # Conflicts: # apps/daemon/src/media-routes.ts # apps/daemon/src/server.ts
…-picker Conflicts resolved: - apps/daemon/src/media-routes.ts: union the import block — keep branch's resolveProjectMediaModel + main's isSandboxModeEnabled side by side. - apps/daemon/src/server.ts (project media route): drop branch's inline POST /api/projects/:id/media/generate registration; that route now lives in media-routes.ts handleGenerate, which already calls resolveProjectMediaModel and forwards size/resolution. Keeping both would double-register the route. - apps/daemon/src/server.ts (composePromptContext return): combine main's promptTelemetryParts field with branch's finalPrompt (the media-default augmented prompt that injects the composer-selected OD_DEFAULT_*_MODEL hints when the user does not name a model in chat). - apps/web/src/components/ChatComposer.tsx: keep branch's consolidated ByokMediaModelsPopover (BYOK + normal-agent variants); drop main's older inline SenseAudio-only SearchableModelSelect — the popover now subsumes every media-capable BYOK provider plus the normal-agent chat surface. - apps/web/src/components/ProjectView.tsx: keep branch's per-session media model overrides + project-metadata seeding effect + instructions review state; main has no equivalent block (branch-only feature). - apps/web/src/components/WorkspaceTabsBar.tsx: keep both isRunning (branch's green-dot indicator) and dragOverClass (main's drag-target highlight) derivations; the JSX further down already consumes both. - apps/web/tests/components/ProjectView.api-empty-response.test.tsx: rewrite the project-instructions BYOK system-prompt test off the obsolete mockedStreamMessage / StreamHandlers API onto the new mockByokStream helper, matching the rest of the file's pattern.
… feat/senseaudio-media-picker
|
@mzl163 friendly reminder: this PR has been waiting on an author response for more than 3 days after reviewer or maintainer feedback. When you have a chance, please reply here or push an update. To keep the queue manageable, PRs with no author activity for more than 5 days after feedback may be closed automatically, but they can be reopened when work resumes. |
…-picker
Conflicts resolved:
- apps/daemon/src/media-routes.ts: union imports — branch's resolveProjectMediaModel + main's aihubmix catalog helpers.
- apps/daemon/src/media.ts: keep branch's renderSenseAudioVideo (full t2v/i2v + watermark/ratio/resolution + SSRF download); follow it with main's renderAIHubMixImage / renderAIHubMixGeminiImage / renderAIHubMixTTS / renderAIHubMixVideo block so both providers coexist. Add assertExternalAssetUrl to the connectionTest import (still referenced by SenseAudio video).
- apps/daemon/src/chat-routes.ts: keep branch's unified runByokMediaChat / runAnthropicMediaChat / runByokProxy architecture for SenseAudio/OpenAI/Azure/Anthropic/Google/Ollama. **Drop main's per-provider registerByokToolChatProxy factory** — its routes (BYOK_SENSEAUDIO_TOOLS / BYOK_AIHUBMIX_TOOLS specific proxies) are not registered through chat-routes anymore. Known regression: AIHubMix BYOK *chat* (in-chat generate_image/video/speech via /api/proxy/aihubmix/stream) no longer works on this branch; AIHubMix media generation still works via /api/media/generate, the CLI, and the media-config picker.
- apps/daemon/src/byok-tools.ts: keep BOTH ecosystems' exports. Branch's BYOK_MEDIA_TOOLS + executeGenerate{Image,Video,Audio} + isImageModel/isVideoModel/isAudioModel + SENSEAUDIO_* constants. Main's BYOK_AIHUBMIX_TOOLS + executeAIHubMixGenerate{Image,Speech,Video} + isAIHubMix*Model helpers + AIHubMix wire constants. Adds withToolRequestInit helper + sleep helper needed by the AIHubMix executors. Adds upstreamApiKey / upstreamBaseUrl optional fields on BYOKToolContext so the AIHubMix executor's session-key shortcut still compiles.
- apps/daemon/tests/byok-tools.test.ts: union imports + node:fs imports back so main's AIHubMix executor tests still compile alongside branch's generateMedia-mock tests.
- apps/web/src/providers/daemon.ts: extend ByokDaemonStreamOptions with byokSpeechModel / byokSpeechVoice so ProjectView's submit shape passes typecheck.
- apps/web/src/components/{ChatComposer,ChatPane,ProjectView}.tsx: union both sides' props/state (HEAD's BYOK audio override + agent-media-picker / image/video/audio model overrides + project-id seed effect; main's BYOK speech override + voice override + workspaceContext props + live byok model options hooks + Escape-to-close on instructions editor).
- apps/web/src/components/ChatComposer.tsx (composer JSX): take main's new LexicalComposerInput body (the inline BYOK media-model popovers are deferred pending a unified picker — main intentionally removed them; the props/handlers stay wired for the future unified surface).
- apps/web/src/components/NewProjectPanel.tsx: union supportedModels — SenseAudio + main's openrouter / imagerouter / leonardo / custom-image / aihubmix providers; union imports (SenseAudio voices + AIHubMix live-model hooks).
- apps/web/src/components/home-hero/media-surfaces.ts: keep branch's imageSize-XOR-aspect dispatch (drops main's separate resolution field for image).
- apps/web/src/i18n/types.ts + 18 locale files: union both sides' keys (byokAudioModel + byokVideoI2vHint / byokSpeechModel / byokSpeechVoice / byokModelDefaultOption). Locale union deduped — 4 locales had a duplicate byokVideoModel that got cleaned up.
…-picker Conflict: apps/web/src/components/ChatPane.tsx — main refactored the chat composer to render via a portal (ref=composerSlotRef + createPortal(composerNode, composerPortalTarget)) so the composer can detach from the chat-log scroll container while staying visually pinned. Took main's portal JSX, then added the branch-only props (byokAudioModel/onChangeByokAudioModel, agentMediaPickerEnabled, image/video/audioModel + handlers) to the composerNode declaration upstream so the portal still threads through every BYOK / agent media-model state the rest of the branch wires up.
|
@lefarcen Thanks for catching that! I’ve updated the PR description to tick API / contract and mentioned the optional chat media-model fields + project metadata changes. I also resolved conflicts with the latest environment. Everything should be aligned now for review! |
…ctory These exercise main's `registerByokToolChatProxy` AIHubMix proxy route (POST /api/proxy/aihubmix/stream) and its claude*/gemini* model-routing branches. The merge at 04c79c2 kept the branch's unified runByokMediaChat architecture and did not adopt that factory, so the route is not registered and the tests 404 in this branch. Skipped instead of deleted so the intent and assertion shapes stay reviewable for when AIHubMix BYOK chat is brought back in line with main's surface. Tests skipped: - routes AIHubMix to /v1/chat/completions with tools + APP-Code header - routes AIHubMix claude* models to the Anthropic /v1/messages wire - routes AIHubMix gemini* models to the Gemini streamGenerateContent wire - runs the BYOK media tool loop on the AIHubMix claude (Anthropic) route - runs the BYOK media tool loop on the AIHubMix gemini route Local: tests/proxy-routes.test.ts → 66 passed | 5 skipped.
…-picker Conflicts resolved: - apps/web/src/components/HomeView.tsx (footerInputNamesForChip): take main's streamlined Home composer chrome for prototype/deck chips (['designSystem'] only — PR nexu-io#3692 removed fidelity/slideCount/ speakerNotes from the inline footer); keep branch's image chip ['designSystem', 'model', 'ratio', 'size'] (SenseAudio discrete-size image flow depends on 'size'; main's 'resolution' for image is a no-op on the branch's media-surfaces field schema). - apps/web/src/components/WorkspaceTabsBar.tsx (per-tab derived values): keep both — branch's isRunning (live background-run green dot from run registry) and main's isHome (Home tab pinned-leftmost guard). The JSX downstream consumes both.
…-picker Conflicts resolved this round (28 commits behind, mostly i18n drift + UI surface tidy-ups): - apps/daemon/src/prompts/system.ts (metadata block): keep branch's imageSize-XOR-aspectRatio dispatch in the metadata description; both sides write the aspectRatio hint with the same default copy now. - apps/web/src/components/ChatComposer.tsx (imports): keep branch's media models + groupByProvider + fetchConnectors imports; main only removed fetchConnectors locally. - apps/web/src/components/ChatPane.tsx (props destructure): union both sides — branch's image/video/audio overrides + agentMediaPickerEnabled, main's composerLeadingAccessory. - apps/web/src/components/HomeView.tsx (footerInputNamesForChip): take main's full simplification — image / video now expose only ['designSystem']; the agent asks ratio / duration / model / size / resolution during the run via AskUserQuestion. Same for hyperframes / audio (no pills). SenseAudio discrete-size flow moves off the home composer into Settings → Media + agent discovery. - apps/web/src/components/WorkspaceTabsBar.tsx (per-tab derived values): keep branch's isRunning (live green dot) and adopt main's isPinned (replaces isHome; downstream JSX uses isPinned). - apps/web/src/components/home-hero/media-surfaces.ts: take main's simplified media metadata + query template — image / video / hyperframes / audio surfaces no longer seed model / ratio / size / resolution / duration / voice from the composer; system prompt prints '(unknown — ask: …)' instead. - apps/web/src/i18n/locales/zh-TW.ts: take branch's big block of zh-TW entries (main side empty in the conflict region), then dedupe duplicate keys (~396) introduced by the union and drop 90+ orphan keys that main removed from types.ts (chat.designToolbox.*, workspace.newSideChat*, home.openExistingProject*, home.chooseFolderSubtitle). - apps/web/tests/components/HomeView.media-options.test.tsx: align two cases with main's footer-pill simplification (image / video only expose designSystem pill; no model / ratio / duration / resolution / size). The SenseAudio image-size / video-resolution test was rewritten as it.skip with a doc comment — its DOM never renders under the new composer; re-enable when those pills come back to the home composer.
…-picker Single-file conflict: apps/web/src/i18n/locales/zh-TW.ts. HEAD added chat.amrCard.* / chat.amrError.* / chat.antigravityError.* / plugins.actions.* / workingDirPicker.* entries; main added homeHero.* + chip.* + updated workingDirPicker.* copy (PR nexu-io#3880 'Local storage' reframe). Resolution: keep both sides (strip markers, run multi-line-aware key dedupe), first occurrence wins — branch's workingDirPicker translations stay; main's homeHero entries land where branch didn't have them.
…-picker Single-file conflict: apps/web/src/components/ProjectView.tsx (state block). HEAD had image/video/audio model overrides + project-id seed effect + instructions review/edit/escape state; main side empty (PR nexu-io#3924 removed the project instructions editor entry). Resolution: keep branch's image/video/audio overrides + seed effect (heavily consumed by submit path + composer props), drop the instructions review/edit/escape state (no downstream consumers in the merged file after main's removal).
…-picker Single-file conflict: apps/web/src/i18n/locales/zh-TW.ts (3 chat.mode.design.* translations). Pure copy refresh — take main's wording: '設計' / '設計模式' / '即時看板' (matches the EN canonical 'Design' label) over branch's earlier 'Design Agent' / '即時產物' draft.
…-picker
Single-file conflict: apps/web/src/i18n/locales/zh-TW.ts (questions.* block). HEAD had English placeholder strings ('Questions' / 'Mind if I ask…' / 'Continue' / …); main added proper zh-TW translations + a new questions.bannerAnswered key. Take main's translations — branch had untranslated drafts.
…-picker Conflicts resolved: - apps/web/src/i18n/locales/zh-TW.ts (header): keep branch's '...en' spread fallback (annotates that strict-keys-only adoption is deferred until in-flight feature keys land their translations) + main's explicit zh-TW translations. Dedupe duplicate keys + drop 99 orphan keys that main removed from types.ts (chat.designToolbox.*, generationPreview.title, ...). Re-append the file's trailing '};' that the orphan strip ate. - apps/web/src/components/ProjectView.tsx (BYOK send path): take main's three inlined run-lifecycle callbacks (onRunCreated / onRunStatus / onRunEventId) — they carry main's superseded-run gating (supersededRunsRef.current.has) and the new clearCurrentRunStreamingMarker helper. Branch's shorthand-by-name references to the older named handlers above would otherwise plug stale logic in.
Two regressions surfaced on PR nexu-io#2720 CI (Browser tests + Strict PR visual tests) that didn't fire on main's HEAD: 1. workspace-keyboard-flows.test.ts — 'Enter sends / Shift+Enter inserts a newline' (received runCount=6, expected 0 before pressing Enter). Root cause: useRunningProjectIds (the green-dot indicator hook this branch added) polled GET /api/runs every 2500ms via setInterval. The Playwright test mock 'page.route("**/api/runs", …)' caught those GETs alongside the POST it was actually counting. Fix: hook now fetches '/api/runs?status=active' (URL with query string falls outside the test's bare '**/api/runs' glob) and drops the setInterval. Refresh on mount + RUNS_CHANGED event + visibilitychange, no periodic polling. The active filter is also what the indicator semantically wants, so this is a cleanup rather than a workaround. 2. api-empty-response.test.ts — 'API empty stream shows No output instead of Done' (.assistant-label 'No output' not visible). Root cause: branch's ProjectView routed BYOK chat through streamByokViaDaemon, which POSTs to /api/proxy/<protocol>/stream and expects a JSON '{ runId }' response before consuming /api/runs/:id/events. But the daemon's /api/proxy routes (runByokMediaChat) still respond with an SSE stream directly. So the .json() decode threw, onError fired, onDone never ran, and the empty_response status event never landed — the assistant card stayed on 'Done'. Fix: revert BYOK send to streamMessage (providers/anthropic.ts) — main's direct-SSE-consumption path that emits onDelta/onDone/onError as the proxy SSE arrives. byokHandlers.onDone still triggers ProjectView's emptyApiResponse check (config.mode === 'api' && empty text/html), which writes the 'empty_response' status the test asserts on. Drops the onRunCreated/onRunStatus/onRunEventId wiring that only matters when the response carries a runId (which the proxy doesn't). Both fixes are scoped to the two regressing files; no daemon or contracts changes.
Branch's ProjectView already uses streamMessage for BYOK turns (matches main). The merge kept HEAD's older version of this test which still mocked streamByokViaDaemon and asserted on it — those calls never happened, so 13 cases failed. Take main's version of the test verbatim; it mocks and asserts on streamMessage to match the actual code path.
…e path Same merge-resolution residue as the api-empty-response fix: branch's ProjectView uses streamMessage for BYOK turns, but this test still mocked + asserted on streamByokViaDaemon. Take main's version so the mock matches what the code actually calls.
Why
I'm building on top of Open Design with a SenseAudio BYOK key, and the BYOK
chat could talk but couldn't really drive media generation end to end:
model, so the model always fell back to one fixed choice.
switched to another project and came back.
This PR closes those gaps so a BYOK chat can generate image / video / audio
with a sensible default model, keep running in the background, and show which
projects are live.
What users will see
default model for image, video, and audio — spanning the whole registry
(OpenAI / Volcengine / SenseAudio / …), not just SenseAudio. This is a
default: if you name a specific model in the chat, that wins.
of a free-text field.
resumes the streaming output instead of freezing it.
on projects that have a live background run.
Surface area
od media generate; this PR only adds a UI-side default model preference for the chat surface, not a new capability.packages/contracts/src/api/chat.ts:ChatRequestgains optional
imageModel/videoModel/audioModel(per-turn defaultmedia models from the composer; daemon forwards them as
OD_DEFAULT_*_MODELenv to the spawned agent).
packages/contracts/src/api/projects.ts:ProjectMetadatagains optionalimageSize(e.g.2048x1152forSenseAudio's discrete-size image models) and
videoResolution(480p/720p/1080p). All additions are optional, backward-compatible — olderclients/daemons that ignore the fields keep working unchanged.
settings.byokImageModel,settings.byokVideoModel,settings.byokAudioModel,settings.byokMediaModels(all 19 locales)Screenshots
Validation
pnpm guard✅pnpm --filter @open-design/daemon typecheck✅ ·pnpm --filter @open-design/web typecheck✅byok-tools/proxy-routes/media-senseaudio(4 files) — 86/86 ✅ProjectView.api-empty-response/ProjectView.run-isolation— 19/19 ✅senseaudio-image-1.0in the composer → daemon logpin[img=senseaudio-image-1.0-260319]→ generation succeeded with the per-model image size fix.