BYOK media-model picker + background runs for SenseAudio chats #2720

Open
mzl163 wants to merge 53 commits into nexu-io:main from mzl163:feat/senseaudio-media-picker

mzl163 commented May 22, 2026

Contributor

Why

I'm building on top of Open Design with a SenseAudio BYOK key, and the BYOK
chat could talk but couldn't really drive media generation end to end:

  • When creating a project I couldn't pick a SenseAudio image/video model.
  • In the chat composer there was no way to set a default image/video/audio
    model, so the model always fell back to one fixed choice.
  • SenseAudio TTS voices had to be typed by hand — no way to discover them.
  • A project generating media in the background would freeze mid-output if I
    switched to another project and came back.

This PR closes those gaps so a BYOK chat can generate image / video / audio
with a sensible default model, keep running in the background, and show which
projects are live.

What users will see

  • The New Project panel lets you pick SenseAudio image/video models.
  • The chat composer has a new "Media models" popover where you set a
    default model for image, video, and audio — spanning the whole registry
    (OpenAI / Volcengine / SenseAudio / …), not just SenseAudio. This is a
    default: if you name a specific model in the chat, that wins.
  • SenseAudio TTS voices are now a dropdown (showing voice names) instead
    of a free-text field.
  • BYOK chats gain background capability: leaving a project and coming back
    resumes the streaming output instead of freezing it.
  • Home recent-project cards and the top workspace tab bar show a green dot
    on projects that have a live background run.

Surface area

  • UI — composer "Media models" popover, model pickers in New Project, green-dot run indicators (recent projects + workspace tabs)
  • Keyboard shortcut
  • CLI / env var — N/A: media generation is already exposed via od media generate; this PR only adds a UI-side default model preference for the chat surface, not a new capability.
  • API / contract: in packages/contracts/src/api/chat.ts, ChatRequest
    gains optional imageModel / videoModel / audioModel (per-turn default
    media models from the composer; the daemon forwards them as
    OD_DEFAULT_*_MODEL env vars to the spawned agent). In
    packages/contracts/src/api/projects.ts, ProjectMetadata gains optional
    imageSize (e.g. 2048x1152 for SenseAudio's discrete-size image models)
    and videoResolution (480p / 720p / 1080p). All additions are optional
    and backward-compatible: older clients/daemons that ignore the fields
    keep working unchanged.
  • Extension point
  • i18n keys: settings.byokImageModel, settings.byokVideoModel, settings.byokAudioModel, settings.byokMediaModels (all 19 locales)
  • New top-level dependency
  • Default behavior change
  • None

Screenshots

[Five screenshots attached, one of them a WeCom capture; no captions.]

Validation

  • pnpm guard
  • pnpm --filter @open-design/daemon typecheck ✅ · pnpm --filter @open-design/web typecheck
  • daemon: byok-tools / proxy-routes / media-senseaudio (4 files) — 86/86 ✅
  • web: ProjectView.api-empty-response / ProjectView.run-isolation — 19/19 ✅
  • Desktop smoke: picked senseaudio-image-1.0 in the composer → daemon log pin[img=senseaudio-image-1.0-260319] → generation succeeded with the per-model image size fix.

unknown added 14 commits May 21, 2026 17:15
Un-filter the senseaudio provider in the new-project media picker for both
the image and video surfaces, register a distinct-id SenseAudio video model
(senseaudio-video-2.0-260128 → wire doubao-seedance-2-0-260128), and add a
daemon renderSenseAudioVideo (async /v1/video/create + /v1/video/status
poll + SSRF-guarded mp4 download). Also swap the SenseAudio TTS voice input
from a free-text datalist to a name-based <select> over the official voice
catalogue (still preserves out-of-catalogue ids).
BYOK proxy streams used to write straight into the POST response with no
buffer, so navigating away froze the chat mid-output with no way to resume.
Route every /api/proxy/*/stream handler through design.runs instead: the
POST answers 202 { runId } and the upstream work (incl. the SenseAudio tool
loop) runs as a registry run, emitting start/stdout/end/error into the run
buffer. A disconnected client leaves the run alive; the web reattaches via
GET /api/runs/:id/events?after= — the same path agent runs use.

- chat-routes: shared run-backed stream adapter + runByokProxy; all six
  protocol handlers (anthropic/openai/azure/google/ollama/senseaudio) keep
  their per-protocol logic and only swap the sink. Upstream fetch now takes
  an AbortSignal for cooperative cancel.
- runs: add an onCancel hook so cancel() can abort a BYOK run's upstream
  fetch (no OS child process to SIGTERM).
- web: streamByokViaDaemon (POST proxy → runId → consumeDaemonRun); the
  BYOK send path pins runId/status/lastEventId onto the assistant row like
  the agent path, and the reattach effect no longer skips non-daemon mode.
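
The reattach flow in this commit can be pictured as an append-only event buffer with a replay cursor. The following is a minimal sketch under stated assumptions, not the daemon's code: `RunBuffer`, `RunEvent`, and `eventsAfter` are hypothetical names, and only the event kinds (start/stdout/end/error) and the `after` cursor idea come from the commit message.

```typescript
type RunEventKind = "start" | "stdout" | "end" | "error";

interface RunEvent {
  id: number; // monotonically increasing within one run
  kind: RunEventKind;
  data: string;
}

// Hypothetical in-memory run buffer; the real registry lives in design.runs.
class RunBuffer {
  private events: RunEvent[] = [];
  private nextId = 1;

  // Upstream work appends here whether or not a client is connected.
  append(kind: RunEventKind, data = ""): RunEvent {
    const event: RunEvent = { id: this.nextId++, kind, data };
    this.events.push(event);
    return event;
  }

  // What a GET .../events?after=<id> handler could replay on reattach.
  eventsAfter(lastEventId: number): RunEvent[] {
    return this.events.filter((e) => e.id > lastEventId);
  }
}

const bufferedRun = new RunBuffer();
bufferedRun.append("start");
bufferedRun.append("stdout", "Hello");
// The client navigates away after event 2; the run keeps writing.
bufferedRun.append("stdout", ", world");
bufferedRun.append("end");

// On return, the client asks only for what it missed.
const missedEvents = bufferedRun.eventsAfter(2);
const resumedText = missedEvents
  .filter((e) => e.kind === "stdout")
  .map((e) => e.data)
  .join("");
```

Because the client stores the last event id it rendered, a replay never duplicates output it already has, which is presumably why the web side pins lastEventId onto the assistant row.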
…ideo/audio)

The BYOK chat's media tools were hardwired to SenseAudio, and the composer
only exposed a SenseAudio image-model dropdown. Route the tools through the
unified media dispatcher (generateMedia) instead, so a BYOK chat can generate
with ANY catalogue model whose provider has credentials configured in
Settings → Media — and expose image / video / audio model pickers in the
composer over the whole registry.

- byok-tools: generate_image / generate_video now delegate to generateMedia
  (provider-routed, credential-resolved, project-written); add a
  generate_audio (speech / music / sfx) tool. Model validation broadened from
  SenseAudio-only to the full registry per surface; tool `model` enums now
  span every catalogue model.
- senseaudio proxy handler: build toolCtx with composer-selected image / video
  / audio defaults, falling back to the SenseAudio model per surface (the only
  provider whose key this chat seeds) rather than the global catalogue default;
  await the media-config seed so generateMedia finds the key before the first
  tool fires; dispatch the new audio tool.
- composer: replace the single image dropdown with image / video / audio
  pickers grouped by provider; thread the three selections through
  ChatPane → ProjectView → streamByokViaDaemon proxy body.
- i18n: add settings.byokVideoModel / settings.byokAudioModel across locales.
- tests: rewrite byok-tools spec around the generateMedia delegation; add a
  proxy-routes spec proving a composer-selected model reaches the upstream
  generation request (the picker is wired, not decorative).
The three inline image / video / audio selects crowded the composer. Replace
them with a single compact "Media models" button that opens a popover holding
all three pickers (closes on outside click / Escape). Adds the
settings.byokMediaModels label across locales.
…configured

generateMedia falls back to a placeholder stub when OD_MEDIA_ALLOW_STUBS is
set, so a BYOK chat could "succeed" with a fake image for a model whose
provider has no credentials — making the composer model picker look
meaningless. Add a per-call allowStub opt-out and pass allowStub:false from the
BYOK media tools so an unconfigured / unintegrated model surfaces a real error
the chat reports, instead of a placeholder that looks generated.
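
A rough sketch of the opt-out described in this commit, assuming a much simpler signature than the real generateMedia: `generateMediaSketch`, `providerConfigured`, and the error text are hypothetical, and treating any non-empty OD_MEDIA_ALLOW_STUBS value as "set" is an assumption.

```typescript
interface GenerateArgs {
  model: string;
  providerConfigured: boolean; // stand-in for "credentials resolved"
  allowStub?: boolean; // per-call opt-out; omitted behaves like true
}

type MediaResult = { kind: "real" | "stub"; model: string };

// Hypothetical reduction of the dispatcher to the stub decision only.
function generateMediaSketch(
  args: GenerateArgs,
  env: Record<string, string | undefined>,
): MediaResult {
  if (args.providerConfigured) return { kind: "real", model: args.model };
  const stubsEnabled = Boolean(env["OD_MEDIA_ALLOW_STUBS"]);
  if (stubsEnabled && args.allowStub !== false) {
    return { kind: "stub", model: args.model };
  }
  throw new Error(`No credentials configured for ${args.model}`);
}

const stubEnv = { OD_MEDIA_ALLOW_STUBS: "1" };

// A caller that does not opt out still receives the placeholder.
const lenientResult = generateMediaSketch(
  { model: "gpt-image-2", providerConfigured: false },
  stubEnv,
);

// The BYOK tools pass allowStub:false, so the same situation surfaces an error.
let strictError = "";
try {
  generateMediaSketch(
    { model: "gpt-image-2", providerConfigured: false, allowStub: false },
    stubEnv,
  );
} catch (err) {
  strictError = err instanceof Error ? err.message : String(err);
}
```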
…tool arg

A SenseAudio chat could generate with its own model choice even after the user
picked a different one, because the LLM's `model` tool argument overrode the
composer selection. Add pinned* fields to the tool context: the user's pick now
wins over any model the LLM passes (pinned > LLM arg > surface fallback), so the
picker actually controls generation — and a pinned unconfigured provider errors
clearly instead of silently falling back to SenseAudio.
handleSend reads the per-session byok image/video/audio model overrides when
building the proxy body, but they were missing from its useCallback deps — so
it closed over the initial empty values and the composer picker never changed
what got generated (daemon logged pin[img=-]). Add the three overrides to the
deps array. Also log the received pin[img/vid/aud] models on each senseaudio
proxy request to make "wrong model" reports diagnosable.
…y path

The BYOK background+reattach change routed API-mode chats through
streamByokViaDaemon instead of the in-browser streamMessage, leaving these
specs driving (and mocking) the old path — red since that commit. Drive the
new run-registry callbacks (onRunCreated/onRunStatus + handlers) so the
empty-response soft-state, success-sound, attachment-inlining, and
ElevenLabs-voice-injection behaviours are verified on the real path; mock
streamByokViaDaemon in resume-conversation so its auto-send no longer leaks an
unhandled rejection.
Each SenseAudio image model accepts a different fixed set of sizes — 2.0 takes
1024x1024 but 1.0 / doubao-seedream-5.0 reject it with `参数错误:size` ("parameter error: size"). The
size map was global (2.0-only), so picking any other image model failed once
the composer selection actually reached the backend. Make senseAudioImageSize
per-model and pick the valid size whose ratio is closest to the requested
aspect.
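
The "closest ratio" selection can be sketched as follows. This is an illustration only: the model ids and size lists in `SIZES_BY_MODEL` are made up (the real per-model lists come from the SenseAudio docs), `pickClosestSize` is a hypothetical name, and comparing ratios on a log scale is a design choice of this sketch, not something the commit states.

```typescript
// Hypothetical per-model size table; ids and sizes are placeholders.
const SIZES_BY_MODEL: Record<string, string[]> = {
  "example-model-a": ["1024x1024", "1536x1024", "1024x1536"],
  "example-model-b": ["2048x1152", "1152x2048", "2048x2048"],
};

// Parse "16:9" or "2048x1152" into a width/height ratio.
function ratioOf(spec: string, separator: string): number {
  const parts = spec.split(separator).map(Number);
  const width = parts[0] ?? NaN;
  const height = parts[1] ?? NaN;
  return width / height;
}

// Pick the model's valid size whose ratio is nearest the requested aspect.
// Log-scale distance treats 2:1 and 1:2 as equally far from 1:1.
function pickClosestSize(model: string, aspect: string): string | undefined {
  const sizes = SIZES_BY_MODEL[model];
  if (!sizes || sizes.length === 0) return undefined;
  const target = Math.log(ratioOf(aspect, ":"));
  let best: string | undefined;
  let bestDistance = Infinity;
  for (const size of sizes) {
    const distance = Math.abs(Math.log(ratioOf(size, "x")) - target);
    if (distance < bestDistance) {
      best = size;
      bestDistance = distance;
    }
  }
  return best;
}

const exactMatch = pickClosestSize("example-model-b", "16:9"); // 2048x1152 is exactly 16:9
const nearestMatch = pickClosestSize("example-model-a", "16:9"); // no 16:9 entry, 3:2 is nearest
```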
…tive select)

The native <select> in the media-models popover misbehaved in Electron:
clicking an option in the OS dropdown could trip the popover's outside-click
handler and unmount the select before its change committed, so the user's pick
never reached React state (daemon kept logging pin[img=-]). Replace it with a
custom button-list dropdown — option clicks fire a normal React onClick that
always commits. Verified end-to-end via a real option click: daemon receives
pin[img=senseaudio-image-1.0-260319] and generates with that model.
Two follow-ups so the composer media-model picker behaves as users expect:

- Priority: an explicit model the user names in chat (forwarded by the LLM as
  the tool's `model` arg) now wins over the composer pick, which wins over the
  surface fallback — composer is the DEFAULT, not an override. Earlier it was
  made authoritative while debugging; that prevented overriding via chat. Tool
  descriptions now tell the model to omit `model` unless the user named one.
  Renamed the tool-context fields pinned* → composer* to match.
- Label: the "(default)" option showed the global catalogue default
  (gpt-image-2) even though an empty selection actually routes to the
  SenseAudio model — so users thought gpt-image-2 was selected when it wasn't.
  The picker now labels the default with the SenseAudio surface model, matching
  what the daemon uses.
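
The final precedence from this commit (model named in chat, then composer pick, then surface fallback) fits in one small resolver. A minimal sketch, assuming hypothetical field names `composerImageModel` and `fallbackImageModel` and a placeholder fallback id; the real tool context may be shaped differently.

```typescript
interface ImageToolContext {
  composerImageModel?: string; // the composer's default, if the user set one
  fallbackImageModel: string; // per-surface fallback for this chat
}

// Treat empty or whitespace-only strings as "not provided".
function nonEmpty(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Precedence: LLM tool arg (user named a model) > composer pick > fallback.
function resolveImageModel(
  llmModelArg: string | undefined,
  ctx: ImageToolContext,
): string {
  return (
    nonEmpty(llmModelArg) ??
    nonEmpty(ctx.composerImageModel) ??
    ctx.fallbackImageModel
  );
}

const withComposerPick: ImageToolContext = {
  composerImageModel: "senseaudio-image-1.0-260319",
  fallbackImageModel: "placeholder-surface-fallback",
};

const namedInChat = resolveImageModel("gpt-image-2", withComposerPick);
const composerDefault = resolveImageModel(undefined, withComposerPick);
const surfaceFallback = resolveImageModel("", {
  fallbackImageModel: "placeholder-surface-fallback",
});
```

The earlier "pinned" behaviour would correspond to swapping the first two operands, which is why it blocked overriding the model via chat.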
Show a real-time green dot on a project when it has an in-flight (queued or
running) agent / BYOK run, so users can see at a glance which projects are
still working in the background. Added a useRunningProjectIds hook (polls the
run registry + reacts to RUNS_CHANGED) and surface the dot in two places: the
home "recent projects" cards and the top workspace tabs bar.
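
The derivation behind the green dot is small enough to sketch as a pure function. The `RunSummary` shape, the terminal status names, and `runningProjectIds` are assumptions for illustration; only "queued or running counts as in-flight" comes from the commit message, and the polling and RUNS_CHANGED wiring is omitted.

```typescript
// Only "queued" and "running" are named in the commit; the rest are placeholders.
type RunStatus = "queued" | "running" | "succeeded" | "failed" | "canceled";

interface RunSummary {
  id: string;
  projectId: string;
  status: RunStatus;
}

// Collect the ids of projects that have at least one in-flight run.
function runningProjectIds(runs: RunSummary[]): Set<string> {
  const live = new Set<string>();
  for (const run of runs) {
    if (run.status === "queued" || run.status === "running") {
      live.add(run.projectId);
    }
  }
  return live;
}

const liveProjects = runningProjectIds([
  { id: "r1", projectId: "p-alpha", status: "running" },
  { id: "r2", projectId: "p-alpha", status: "succeeded" },
  { id: "r3", projectId: "p-beta", status: "failed" },
  { id: "r4", projectId: "p-gamma", status: "queued" },
]);
```

A card or tab would then show the dot when `liveProjects.has(project.id)` is true.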
…-picker

# Conflicts:
#	apps/web/src/components/ProjectView.tsx
Reconcile the BYOK media tooling with upstream nexu-io#2570 (SenseAudio BYOK TTS).
Upstream added a SenseAudio-specific generate_speech tool that hits /t2a_v2
directly; this branch already routes all media (image/video/audio) through the
unified generateMedia dispatcher, whose renderSenseAudioTTS path performs the
same /t2a_v2 + hex->mp3 work. Kept the full-registry generate_image/video/audio
tools and dropped the duplicate generate_speech tool, executeGenerateSpeech,
and their tests so the chat exposes one TTS path (generate_audio) instead of
two overlapping ones.
lefarcen requested a review from PerishCode May 22, 2026 11:53
lefarcen added the labels size/XXL (PR changes 1500+ lines), risk/high (High risk: apps/desktop, daemon, auth, migration, workflows, package deps), and type/feature (New feature) May 22, 2026
unknown added 9 commits May 22, 2026 20:36
The merged tree pairs main's ProjectView.reattach-restore.test.tsx (from
nexu-io#2383, which mocks ../../src/i18n with only useT) against ProjectView.tsx,
which calls useI18n at render. Vitest threw "No useI18n export defined on the
mock" for all five reattach cases. Add useI18n to the mock factory so the
component's i18n hook resolves, matching the other ProjectView test files.
Normal agent chats (Claude Code / Codex) now show the same registry-wide
media-model picker the SenseAudio BYOK chat has. The selection is a per-turn
default: the daemon forwards it on the run request and injects it as
OD_DEFAULT_IMAGE_MODEL / OD_DEFAULT_VIDEO_MODEL / OD_DEFAULT_AUDIO_MODEL into
the spawned agent's env, and `od media generate` falls back to it when the
agent omits --model. An explicit --model (the user named a model in chat)
still wins.
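
The CLI-side fallback described above could look roughly like this. A sketch under assumptions: `resolveCliModel` and its error text are hypothetical, and the environment is passed in as a plain object so the example stays self-contained; only the three OD_DEFAULT_*_MODEL variable names and the "explicit --model wins" rule come from the commit.

```typescript
type Surface = "image" | "video" | "audio";

// Env var names as given in the commit message.
const DEFAULT_MODEL_ENV: Record<Surface, string> = {
  image: "OD_DEFAULT_IMAGE_MODEL",
  video: "OD_DEFAULT_VIDEO_MODEL",
  audio: "OD_DEFAULT_AUDIO_MODEL",
};

// Explicit --model wins; otherwise use the per-surface default from env;
// error only if neither is set.
function resolveCliModel(
  surface: Surface,
  flagModel: string | undefined,
  env: Record<string, string | undefined>,
): string {
  if (flagModel && flagModel.trim() !== "") return flagModel.trim();
  const fromEnv = env[DEFAULT_MODEL_ENV[surface]];
  if (fromEnv && fromEnv.trim() !== "") return fromEnv.trim();
  throw new Error(
    `--model is required when ${DEFAULT_MODEL_ENV[surface]} is not set`,
  );
}

const agentEnv = { OD_DEFAULT_IMAGE_MODEL: "senseaudio-image-1.0-260319" };

const fromDefault = resolveCliModel("image", undefined, agentEnv);
const fromFlag = resolveCliModel("image", "gpt-image-2", agentEnv);

let missingModelError = "";
try {
  resolveCliModel("video", undefined, agentEnv);
} catch (err) {
  missingModelError = err instanceof Error ? err.message : String(err);
}
```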

- contracts: ChatRequest gains imageModel/videoModel/audioModel.
- cli: `od media generate` --model is now optional; resolves from the matching
  OD_DEFAULT_*_MODEL env when omitted, errors only if neither is set.
- daemon: startChatRun reads the per-turn defaults, injects the OD_DEFAULT_*
  env on spawn, and the system prompt tells the agent to omit --model when a
  default is configured.
- web: the composer picker renders for normal agent chats (gated on daemon
  mode, hidden when the SenseAudio BYOK picker is active) and its empty-state
  "(default)" label names the registry default rather than always SenseAudio.
…chat media

Factor SenseAudio's media tool loop (inject BYOK_MEDIA_TOOLS, stream deltas,
accumulate tool_calls, run generate_image/video/audio via generateMedia, feed
results back, loop) into a reusable runByokMediaChat helper, and refactor the
SenseAudio handler onto it (proxy-routes tests unchanged: 56/56).

Wire the OpenAI-compatible BYOK proxies onto the same helper so they get
in-chat media generation too:
- openai: seeds the BYOK key into media-config.openai (its key also unlocks
  gpt-image + TTS) and defaults each surface to OpenAI's model.
- azure: OpenAI-compatible chat; no dedicated media provider in the registry,
  so it does not self-seed — in-chat media routes to whatever the user
  configured in Settings → Media. Honors the deployment-path model-in-URL rule
  via includeModel.

The helper degrades to a plain LLM passthrough when the request carries no
valid projectId (no tools injected), so non-project BYOK chats are unchanged.
Adds defaultMediaModelsForProvider() to derive per-provider surface defaults.
Ollama keeps its native /api/chat NDJSON passthrough (no OpenAI tool_calls).
… chats

Adds a dedicated runAnthropicMediaChat adapter so Anthropic (Claude) BYOK chats
can generate media in-chat. Claude's Messages API uses a different tool
protocol than OpenAI, so the adapter declares tools as { name, input_schema },
accumulates `tool_use` content blocks (id/name + streamed input_json_delta),
executes generate_image/video/audio via generateMedia, and feeds results back
as `tool_result` content blocks in a user turn. Claude has no media API of its
own, so it never self-seeds — generation routes to whatever the user configured
in Settings → Media.

Broadens the composer media-model picker from SenseAudio-only to every BYOK
protocol that now injects media tools (senseaudio, openai, azure, anthropic),
gated on !agentMediaPickerEnabled so it never collides with the normal-agent
picker. preferProvider names the empty-state default per vendor (SenseAudio /
OpenAI default to their own seeded model; Azure / Anthropic show the registry
default since media comes from separately-configured providers).

Tests: proxy-routes gains an OpenAI mediaEnabled-gating test (tools injected
with a projectId, omitted without) and an Anthropic adapter test (tool_use →
tool_result → text turn). 58/58 pass.
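
For readers unfamiliar with the Anthropic streaming tool protocol, the accumulation step can be sketched as below. This is not the adapter's code: `collectToolCalls` and `toToolResult` are hypothetical helpers, the event union is trimmed to the fields the sketch needs, and the sample tool output string is invented.

```typescript
// Trimmed subset of Anthropic Messages streaming events.
type StreamEvent =
  | {
      type: "content_block_start";
      index: number;
      content_block:
        | { type: "tool_use"; id: string; name: string }
        | { type: "text" };
    }
  | {
      type: "content_block_delta";
      index: number;
      delta:
        | { type: "input_json_delta"; partial_json: string }
        | { type: "text_delta"; text: string };
    }
  | { type: "content_block_stop"; index: number };

interface ToolCall {
  id: string;
  name: string;
  input: unknown;
}

// Join streamed partial_json fragments per block index, parse on block stop.
function collectToolCalls(events: StreamEvent[]): ToolCall[] {
  const pending = new Map<number, { id: string; name: string; json: string }>();
  const done: ToolCall[] = [];
  for (const ev of events) {
    if (ev.type === "content_block_start") {
      if (ev.content_block.type === "tool_use") {
        const { id, name } = ev.content_block;
        pending.set(ev.index, { id, name, json: "" });
      }
    } else if (ev.type === "content_block_delta") {
      const block = pending.get(ev.index);
      if (block && ev.delta.type === "input_json_delta") {
        block.json += ev.delta.partial_json;
      }
    } else {
      const block = pending.get(ev.index);
      if (block) {
        // A tool call with no arguments streams no JSON at all.
        done.push({ id: block.id, name: block.name, input: JSON.parse(block.json || "{}") });
        pending.delete(ev.index);
      }
    }
  }
  return done;
}

// The result goes back to Claude as a tool_result block in a user turn.
function toToolResult(call: ToolCall, output: string) {
  return { type: "tool_result" as const, tool_use_id: call.id, content: output };
}

const sampleEvents: StreamEvent[] = [
  { type: "content_block_start", index: 0, content_block: { type: "tool_use", id: "toolu_01", name: "generate_image" } },
  { type: "content_block_delta", index: 0, delta: { type: "input_json_delta", partial_json: '{"prompt":"a red ' } },
  { type: "content_block_delta", index: 0, delta: { type: "input_json_delta", partial_json: 'fox"}' } },
  { type: "content_block_stop", index: 0 },
];

const toolCalls = collectToolCalls(sampleEvents);
const toolResults = toolCalls.map((call) => toToolResult(call, "image written to project"));
```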
…-picker

# Conflicts:
#	apps/web/tests/components/ProjectView.reattach-restore.test.tsx
…esolution)

Align renderSenseAudioVideo with the SenseAudio /v1/video/create doc and make
the output controllable. This fixes three discrepancies that affected the CLI
(`od media generate`) and the BYOK chat path alike (both route through
renderSenseAudioVideo):

- i2v reference frame: the image content block used the OpenAI shape
  { type:'image_url', image_url:{ url } }, but the doc requires
  { type:'image', url, role:'first_frame' }. Image-to-video now passes the
  reference in the documented form so it is actually honored.
- watermark: the gateway defaults watermark to true, so clips were stamped.
  Send watermark:false by default.
- resolution: was hard-pinned to 720p. Thread an optional resolution
  (480p|720p|1080p, default 720p) end to end — generateMedia args + MediaContext,
  both /api/projects/:id/media/generate handlers, the `od media generate
  --resolution` flag, and the BYOK generate_video tool param. providerNote now
  reports it.

Text-to-video, image model/prompt/size, and prompt passing were already
correct; this only touches the video request body.

Tests: media-senseaudio-video gains a resolution-override assertion and an i2v
content-shape assertion (documented { type:'image', url, role }, not image_url);
existing create-body test now also locks resolution=720p + watermark=false.
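
To make the three fixes concrete, here is a hedged sketch of a create-body builder. Only the i2v image block shape, `resolution`, and `watermark` are taken from the commit message; the `model` and `content` field names, the text block shape, and `buildVideoCreateBody` itself are assumptions, so check the SenseAudio /v1/video/create doc before relying on them.

```typescript
type VideoResolution = "480p" | "720p" | "1080p";

type ContentBlock =
  | { type: "text"; text: string } // assumed shape for the prompt
  | { type: "image"; url: string; role: "first_frame" }; // documented i2v shape per the commit

interface VideoCreateBody {
  model: string; // assumed field name
  content: ContentBlock[]; // assumed field name
  resolution: VideoResolution;
  watermark: boolean;
}

// Hypothetical builder: defaults to 720p and watermark off.
function buildVideoCreateBody(opts: {
  model: string;
  prompt: string;
  referenceImageUrl?: string;
  resolution?: VideoResolution;
}): VideoCreateBody {
  const content: ContentBlock[] = [{ type: "text", text: opts.prompt }];
  if (opts.referenceImageUrl) {
    // NOT the OpenAI-style { type: "image_url", image_url: { url } } block.
    content.push({ type: "image", url: opts.referenceImageUrl, role: "first_frame" });
  }
  return {
    model: opts.model,
    content,
    resolution: opts.resolution ?? "720p",
    watermark: false,
  };
}

const textToVideoBody = buildVideoCreateBody({
  model: "doubao-seedance-2-0-260128",
  prompt: "a paper boat drifting down a stream",
});

const imageToVideoBody = buildVideoCreateBody({
  model: "doubao-seedance-2-0-260128",
  prompt: "slow zoom out",
  referenceImageUrl: "https://example.com/first-frame.png",
  resolution: "1080p",
});
```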
Resolves 5 conflicts after main advanced 110 commits:

- chat-routes.ts: keep the runByokMediaChat refactor (in-chat media for all
  media-capable BYOK providers) and fold main's max_tokens/max_completion_tokens
  handling into runTurn — first attempt via buildOpenAIChatTokenParam, with a
  400 isUnsupportedMaxTokensError retry on max_completion_tokens, so GPT-5/o-series
  and Azure deployment aliases work through the shared media loop.
- ChatComposer.tsx / ProjectView.tsx: keep both sides' imports (full media-models
  set + deriveUploadCohort; streamByokViaDaemon + reportChatRunFeedback).
- ProjectView.tsx stream call: use the shared run-lifecycle const handlers and
  add main's analyticsHints spread.
- RecentProjectsStrip.tsx: keep main's !publishedDesignSystem guard on isActive
  and the live isRunning green-dot indicator.
- Drop resume-conversation test (removed on main in nexu-io#2562).

Aligned 2 Azure token-param retry tests to the run-registry (202 {runId}) shape
since BYOK proxies now stream through the run registry.
…udioVideo

The header docblock still described the old OpenAI-style
{ type:'image_url', image_url:{url} } i2v entry; the actual create body
(and its inline comment) already send the documented SenseAudio shape
{ type:'image', url, role:'first_frame' }. Comment-only; no behavior change.
mzl163 marked this pull request as ready for review May 25, 2026 09:44
…docs

The home composer's media options were generic (2K/4K size, 3-30s
duration) and didn't match what the SenseAudio gateway accepts, and a
picked size never reached the generation call. End-to-end fix so the UI
options, the brief, the project metadata, and the actual `od media
generate` call all agree with the SenseAudio docs.

Image (/v1/image/async): per-model discrete pixel sizes (e.g.
2048x1152), shown as a Size dropdown that REPLACES the Ratio dropdown
for discrete-size models (the size already encodes the aspect, never
both). The chosen size flows metadata.imageSize -> system prompt --size
-> CLI --size -> renderSenseAudioImage (honors an explicit documented
size, else maps aspect). Daemon + web size tables expanded to the full
documented per-model lists (incl. sensenova-u1-fast).

Video (/v1/video/create): resolution dropdown = 480p/720p/1080p
(default 720p), duration capped to 4-15s for SenseAudio video; both
model-aware so non-SenseAudio models keep their own options. Resolution
flows metadata.videoResolution -> system prompt --resolution -> daemon.

Also stop the agent echoing a stale/contradictory aspect ratio for
size-based images: the od-media-generation brief no longer hardcodes
"Aspect: {{aspect}}", and the system prompt tells the agent to describe
the output by its exact pixel size only (never approximate a ratio like
16:9). The media-generation contract now documents the --size and
--resolution flags plus the SenseAudio per-model size / video-resolution
rules, so the agent has the full parameter spec, not just the chosen
value.

Contracts: ProjectMetadata gains imageSize + videoResolution.
lefarcen requested a review from Eli-tangerine May 25, 2026 16:03
unknown added 4 commits May 28, 2026 14:59
…-picker

# Conflicts:
#	apps/daemon/src/byok-tools.ts
#	apps/daemon/src/chat-routes.ts
#	apps/daemon/src/media-routes.ts
#	apps/daemon/src/media.ts
#	apps/daemon/tests/byok-tools.test.ts
…-picker

# Conflicts:
#	apps/web/src/i18n/locales/fr.ts
…-picker

Conflicts resolved:
- apps/daemon/src/server.ts: combine per-turn defaultMediaModels with main's
  mediaExecution policy params on agent run spawn.
- apps/daemon/src/prompts/system.ts: call main's renderMediaMetadataAction
  helper while keeping branch's dynamic --size / --resolution dispatch
  args and the 'do NOT restate aspect' instruction for discrete sizes.
- apps/daemon/src/media-routes.ts: keep main's handleGenerate refactor with
  policy gating; thread resolveProjectMediaModel and size/resolution body
  fields through it.
lefarcen requested a review from elihahah666 May 29, 2026 06:11
…-picker

Conflicts resolved:
- apps/daemon/src/chat-routes.ts: keep branch's runByokMediaChat /
  runAnthropicMediaChat / runByokProxy helpers across anthropic, openai,
  azure, google, and ollama proxy routes. Thread main's OpenRouter
  attribution headers (HTTP-Referer / X-Title) through openai authHeaders.
  Drop main's inline azure max_completion_tokens retry pair because the
  helper already performs that retry via buildOpenAIChatTokenParam /
  isUnsupportedMaxTokensError. Adopt main's createDeltaGuard contamination
  protection on the google and ollama paths.
- apps/daemon/src/media.ts: keep both generateMedia args — main's images?:
  string[] (multi-image) and branch's allowStub?: boolean (BYOK strict mode).
- apps/daemon/src/prompts/system.ts: keep branch's imageSize XOR
  aspectRatio dispatch; use main's improved aspectRatio default copy
  ('1:1 (default — use 16:9 for landscape/outdoor scenes, 9:16 for
  portrait/vertical)') in the aspect-only branch.
- apps/web/src/components/NewProjectPanel.tsx: union supportedModels —
  keep senseaudio image+video entries alongside main's new openrouter /
  imagerouter / leonardo / custom-image providers.
- apps/web/tests/components/NewProjectPanel.test.ts: keep both the
  SenseAudio voice catalogue suite and main's OpenRouter visibility cases.

lefarcen left a comment

Contributor

Hey @mzl163, the feature scope is laid out clearly — the media-model picker, SenseAudio voice dropdown, and background-run indicators are easy to understand from the description and screenshots. One PR-description detail before reviewers scope this: the Surface area checklist currently leaves API / contract unchecked and says there are no packages/contracts changes, but this branch updates packages/contracts/src/api/chat.ts and packages/contracts/src/api/projects.ts. Could you tick API / contract and adjust that sentence to mention the optional chat media-model fields / project metadata additions? That’ll keep release verification aligned with the actual surface.

…-picker

# Conflicts:
#	apps/daemon/src/media-routes.ts
#	apps/daemon/src/server.ts
lefarcen requested a review from elihahah666 June 3, 2026 06:33
unknown added 2 commits June 3, 2026 21:13
…-picker

Conflicts resolved:
- apps/daemon/src/media-routes.ts: union the import block — keep branch's
  resolveProjectMediaModel + main's isSandboxModeEnabled side by side.
- apps/daemon/src/server.ts (project media route): drop branch's inline
  POST /api/projects/:id/media/generate registration; that route now lives
  in media-routes.ts handleGenerate, which already calls
  resolveProjectMediaModel and forwards size/resolution. Keeping both would
  double-register the route.
- apps/daemon/src/server.ts (composePromptContext return): combine main's
  promptTelemetryParts field with branch's finalPrompt (the media-default
  augmented prompt that injects the composer-selected OD_DEFAULT_*_MODEL
  hints when the user does not name a model in chat).
- apps/web/src/components/ChatComposer.tsx: keep branch's consolidated
  ByokMediaModelsPopover (BYOK + normal-agent variants); drop main's older
  inline SenseAudio-only SearchableModelSelect — the popover now subsumes
  every media-capable BYOK provider plus the normal-agent chat surface.
- apps/web/src/components/ProjectView.tsx: keep branch's per-session media
  model overrides + project-metadata seeding effect + instructions review
  state; main has no equivalent block (branch-only feature).
- apps/web/src/components/WorkspaceTabsBar.tsx: keep both isRunning (branch's
  green-dot indicator) and dragOverClass (main's drag-target highlight)
  derivations; the JSX further down already consumes both.
- apps/web/tests/components/ProjectView.api-empty-response.test.tsx:
  rewrite the project-instructions BYOK system-prompt test off the obsolete
  mockedStreamMessage / StreamHandlers API onto the new mockByokStream
  helper, matching the rest of the file's pattern.

github-actions Bot commented Jun 4, 2026

Contributor

@mzl163 friendly reminder: this PR has been waiting on an author response for more than 3 days after reviewer or maintainer feedback.

When you have a chance, please reply here or push an update. To keep the queue manageable, PRs with no author activity for more than 5 days after feedback may be closed automatically, but they can be reopened when work resumes.

unknown added 2 commits June 4, 2026 20:26
…-picker

Conflicts resolved:
- apps/daemon/src/media-routes.ts: union imports — branch's resolveProjectMediaModel + main's aihubmix catalog helpers.
- apps/daemon/src/media.ts: keep branch's renderSenseAudioVideo (full t2v/i2v + watermark/ratio/resolution + SSRF download); follow it with main's renderAIHubMixImage / renderAIHubMixGeminiImage / renderAIHubMixTTS / renderAIHubMixVideo block so both providers coexist. Add assertExternalAssetUrl to the connectionTest import (still referenced by SenseAudio video).
- apps/daemon/src/chat-routes.ts: keep branch's unified runByokMediaChat / runAnthropicMediaChat / runByokProxy architecture for SenseAudio/OpenAI/Azure/Anthropic/Google/Ollama. **Drop main's per-provider registerByokToolChatProxy factory** — its routes (BYOK_SENSEAUDIO_TOOLS / BYOK_AIHUBMIX_TOOLS specific proxies) are not registered through chat-routes anymore. Known regression: AIHubMix BYOK *chat* (in-chat generate_image/video/speech via /api/proxy/aihubmix/stream) no longer works on this branch; AIHubMix media generation still works via /api/media/generate, the CLI, and the media-config picker.
- apps/daemon/src/byok-tools.ts: keep BOTH ecosystems' exports. Branch's BYOK_MEDIA_TOOLS + executeGenerate{Image,Video,Audio} + isImageModel/isVideoModel/isAudioModel + SENSEAUDIO_* constants. Main's BYOK_AIHUBMIX_TOOLS + executeAIHubMixGenerate{Image,Speech,Video} + isAIHubMix*Model helpers + AIHubMix wire constants. Adds withToolRequestInit helper + sleep helper needed by the AIHubMix executors. Adds upstreamApiKey / upstreamBaseUrl optional fields on BYOKToolContext so the AIHubMix executor's session-key shortcut still compiles.
- apps/daemon/tests/byok-tools.test.ts: union imports + node:fs imports back so main's AIHubMix executor tests still compile alongside branch's generateMedia-mock tests.
- apps/web/src/providers/daemon.ts: extend ByokDaemonStreamOptions with byokSpeechModel / byokSpeechVoice so ProjectView's submit shape passes typecheck.
- apps/web/src/components/{ChatComposer,ChatPane,ProjectView}.tsx: union both sides' props/state (HEAD's BYOK audio override + agent-media-picker / image/video/audio model overrides + project-id seed effect; main's BYOK speech override + voice override + workspaceContext props + live byok model options hooks + Escape-to-close on instructions editor).
- apps/web/src/components/ChatComposer.tsx (composer JSX): take main's new LexicalComposerInput body (the inline BYOK media-model popovers are deferred pending a unified picker — main intentionally removed them; the props/handlers stay wired for the future unified surface).
- apps/web/src/components/NewProjectPanel.tsx: union supportedModels — SenseAudio + main's openrouter / imagerouter / leonardo / custom-image / aihubmix providers; union imports (SenseAudio voices + AIHubMix live-model hooks).
- apps/web/src/components/home-hero/media-surfaces.ts: keep branch's imageSize-XOR-aspect dispatch (drops main's separate resolution field for image).
- apps/web/src/i18n/types.ts + 18 locale files: union both sides' keys (byokAudioModel + byokVideoI2vHint / byokSpeechModel / byokSpeechVoice / byokModelDefaultOption). Locale union deduped — 4 locales had a duplicate byokVideoModel that got cleaned up.
…-picker

Conflict: apps/web/src/components/ChatPane.tsx — main refactored the chat composer to render via a portal (ref=composerSlotRef + createPortal(composerNode, composerPortalTarget)) so the composer can detach from the chat-log scroll container while staying visually pinned. Took main's portal JSX, then added the branch-only props (byokAudioModel/onChangeByokAudioModel, agentMediaPickerEnabled, image/video/audioModel + handlers) to the composerNode declaration upstream so the portal still threads through every BYOK / agent media-model state the rest of the branch wires up.

mzl163 commented Jun 4, 2026

Contributor Author

@lefarcen Thanks for catching that! I’ve updated the PR description to tick API / contract and to mention the optional chat media-model fields + project metadata changes. I also resolved the conflicts with the latest main. Everything should be aligned now for review!

unknown added 13 commits June 4, 2026 21:00
…ctory

These exercise main's `registerByokToolChatProxy` AIHubMix proxy route
(POST /api/proxy/aihubmix/stream) and its claude*/gemini* model-routing
branches. The merge at 04c79c2 kept the branch's unified runByokMediaChat
architecture and did not adopt that factory, so the route is not registered
and the tests 404 in this branch. Skipped instead of deleted so the intent
and assertion shapes stay reviewable for when AIHubMix BYOK chat is brought
back in line with main's surface.

Tests skipped:
- routes AIHubMix to /v1/chat/completions with tools + APP-Code header
- routes AIHubMix claude* models to the Anthropic /v1/messages wire
- routes AIHubMix gemini* models to the Gemini streamGenerateContent wire
- runs the BYOK media tool loop on the AIHubMix claude (Anthropic) route
- runs the BYOK media tool loop on the AIHubMix gemini route

Local: tests/proxy-routes.test.ts → 66 passed | 5 skipped.
…-picker

Conflicts resolved:
- apps/web/src/components/HomeView.tsx (footerInputNamesForChip): take
  main's streamlined Home composer chrome for prototype/deck chips
  (['designSystem'] only — PR nexu-io#3692 removed fidelity/slideCount/
  speakerNotes from the inline footer); keep branch's image chip
  ['designSystem', 'model', 'ratio', 'size'] (SenseAudio discrete-size
  image flow depends on 'size'; main's 'resolution' for image is a no-op
  on the branch's media-surfaces field schema).
- apps/web/src/components/WorkspaceTabsBar.tsx (per-tab derived values):
  keep both — branch's isRunning (live background-run green dot from
  run registry) and main's isHome (Home tab pinned-leftmost guard).
  The JSX downstream consumes both.
…-picker

Conflicts resolved this round (28 commits behind, mostly i18n drift + UI surface tidy-ups):
- apps/daemon/src/prompts/system.ts (metadata block): keep branch's
  imageSize-XOR-aspectRatio dispatch in the metadata description; both
  sides write the aspectRatio hint with the same default copy now.
- apps/web/src/components/ChatComposer.tsx (imports): keep branch's media
  models + groupByProvider + fetchConnectors imports; main only removed
  fetchConnectors locally.
- apps/web/src/components/ChatPane.tsx (props destructure): union both
  sides — branch's image/video/audio overrides + agentMediaPickerEnabled,
  main's composerLeadingAccessory.
- apps/web/src/components/HomeView.tsx (footerInputNamesForChip): take
  main's full simplification — image / video now expose only
  ['designSystem']; the agent asks ratio / duration / model / size /
  resolution during the run via AskUserQuestion. Same for hyperframes /
  audio (no pills). SenseAudio discrete-size flow moves off the home
  composer into Settings → Media + agent discovery.
- apps/web/src/components/WorkspaceTabsBar.tsx (per-tab derived values):
  keep branch's isRunning (live green dot) and adopt main's isPinned
  (replaces isHome; downstream JSX uses isPinned).
- apps/web/src/components/home-hero/media-surfaces.ts: take main's
  simplified media metadata + query template — image / video / hyperframes
  / audio surfaces no longer seed model / ratio / size / resolution /
  duration / voice from the composer; system prompt prints
  '(unknown — ask: …)' instead.
- apps/web/src/i18n/locales/zh-TW.ts: take branch's big block of zh-TW
  entries (main side empty in the conflict region), then dedupe duplicate
  keys (~396) introduced by the union and drop 90+ orphan keys that main
  removed from types.ts (chat.designToolbox.*, workspace.newSideChat*,
  home.openExistingProject*, home.chooseFolderSubtitle).
- apps/web/tests/components/HomeView.media-options.test.tsx: align two
  cases with main's footer-pill simplification (image / video only expose
  designSystem pill; no model / ratio / duration / resolution / size).
  The SenseAudio image-size / video-resolution test was rewritten as
  it.skip with a doc comment — its DOM never renders under the new
  composer; re-enable when those pills come back to the home composer.
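
For the `WorkspaceTabsBar.tsx` item in the list above, keeping both per-tab derived values might look like the sketch below. The `WorkspaceTab` shape, the `pinned` field, and the `deriveTabState` helper are hypothetical; the commit message only states that `isRunning` (from the run registry) and `isPinned` are both kept.

```typescript
// Hypothetical tab shape; the real props in WorkspaceTabsBar.tsx may differ.
interface WorkspaceTab {
  id: string;
  projectId: string | null;
  pinned: boolean;
}

interface TabDerived {
  isRunning: boolean; // branch: live background-run green dot
  isPinned: boolean; // main: replaces the earlier isHome guard
}

// Derive both flags for one tab so the JSX can consume them side by side.
function deriveTabState(
  tab: WorkspaceTab,
  runningProjectIds: ReadonlySet<string>,
): TabDerived {
  return {
    isRunning: tab.projectId !== null && runningProjectIds.has(tab.projectId),
    isPinned: tab.pinned,
  };
}
```

The two flags are independent, which is why the conflict could be resolved as a union rather than a choice between sides.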
…-picker

Single-file conflict: apps/web/src/i18n/locales/zh-TW.ts. HEAD added chat.amrCard.* / chat.amrError.* / chat.antigravityError.* / plugins.actions.* / workingDirPicker.* entries; main added homeHero.* + chip.* + updated workingDirPicker.* copy (PR nexu-io#3880 'Local storage' reframe). Resolution: keep both sides (strip markers, run multi-line-aware key dedupe), first occurrence wins — branch's workingDirPicker translations stay; main's homeHero entries land where branch didn't have them.
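
A rough sketch of the "first occurrence wins" rule described above, assuming the locale entries have already been parsed into key/value pairs. The actual dedupe ran over multi-line source text; the `dedupeFirstWins` helper and the sample keys below are illustrative only.

```typescript
type Entry = [string, string];

// Keep the first entry seen for each key and drop later duplicates,
// preserving the original order of the surviving entries.
function dedupeFirstWins(entries: Entry[]): Entry[] {
  const seen = new Set<string>();
  const out: Entry[] = [];
  for (const [key, value] of entries) {
    if (seen.has(key)) continue; // a later duplicate loses
    seen.add(key);
    out.push([key, value]);
  }
  return out;
}

// Hypothetical keys: the branch's entry comes first, so it stays.
const merged = dedupeFirstWins([
  ['workingDirPicker.title', 'branch copy'],
  ['homeHero.title', 'main copy'],
  ['workingDirPicker.title', 'main copy'],
]);
```

Because the branch's block sits above main's in the merged file, this ordering is what lets the branch's `workingDirPicker.*` translations survive.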
…-picker

Single-file conflict: apps/web/src/components/ProjectView.tsx (state block). HEAD had image/video/audio model overrides + project-id seed effect + instructions review/edit/escape state; main side empty (PR nexu-io#3924 removed the project instructions editor entry). Resolution: keep branch's image/video/audio overrides + seed effect (heavily consumed by submit path + composer props), drop the instructions review/edit/escape state (no downstream consumers in the merged file after main's removal).
…-picker

Single-file conflict: apps/web/src/i18n/locales/zh-TW.ts (3 chat.mode.design.* translations). Pure copy refresh — take main's wording: '設計' / '設計模式' / '即時看板' (matches the EN canonical 'Design' label) over branch's earlier 'Design Agent' / '即時產物' draft.
…-picker

Single-file conflict: apps/web/src/i18n/locales/zh-TW.ts (questions.* block). HEAD had English placeholder strings ('Questions' / 'Mind if I ask…' / 'Continue' / …); main added proper zh-TW translations + a new questions.bannerAnswered key. Take main's translations — branch had untranslated drafts.
…-picker

Conflicts resolved:
- apps/web/src/i18n/locales/zh-TW.ts (header): keep branch's '...en' spread fallback (annotates that strict-keys-only adoption is deferred until in-flight feature keys land their translations) + main's explicit zh-TW translations. Dedupe duplicate keys + drop 99 orphan keys that main removed from types.ts (chat.designToolbox.*, generationPreview.title, ...). Re-append the file's trailing '};' that the orphan strip ate.
- apps/web/src/components/ProjectView.tsx (BYOK send path): take main's three inlined run-lifecycle callbacks (onRunCreated / onRunStatus / onRunEventId) — they carry main's superseded-run gating (supersededRunsRef.current.has) and the new clearCurrentRunStreamingMarker helper. Branch's shorthand-by-name references to the older named handlers above would otherwise plug stale logic in.

Two regressions surfaced on PR nexu-io#2720 CI (Browser tests + Strict PR visual tests) that didn't fire on main's HEAD:

1. workspace-keyboard-flows.test.ts — 'Enter sends / Shift+Enter inserts a newline' (received runCount=6, expected 0 before pressing Enter).
   Root cause: useRunningProjectIds (the green-dot indicator hook this branch added) polled GET /api/runs every 2500ms via setInterval. The Playwright test mock 'page.route("**/api/runs", …)' caught those GETs alongside the POST it was actually counting.
   Fix: hook now fetches '/api/runs?status=active' (URL with query string falls outside the test's bare '**/api/runs' glob) and drops the setInterval. Refresh on mount + RUNS_CHANGED event + visibilitychange, no periodic polling. The active filter is also what the indicator semantically wants, so this is a cleanup rather than a workaround.

2. api-empty-response.test.ts — 'API empty stream shows No output instead of Done' (.assistant-label 'No output' not visible).
   Root cause: branch's ProjectView routed BYOK chat through streamByokViaDaemon, which POSTs to /api/proxy/<protocol>/stream and expects a JSON '{ runId }' response before consuming /api/runs/:id/events. But the daemon's /api/proxy routes (runByokMediaChat) still respond with an SSE stream directly. So the .json() decode threw, onError fired, onDone never ran, and the empty_response status event never landed — the assistant card stayed on 'Done'.
   Fix: revert BYOK send to streamMessage (providers/anthropic.ts) — main's direct-SSE-consumption path that emits onDelta/onDone/onError as the proxy SSE arrives. byokHandlers.onDone still triggers ProjectView's emptyApiResponse check (config.mode === 'api' && empty text/html), which writes the 'empty_response' status the test asserts on. Drops the onRunCreated/onRunStatus/onRunEventId wiring that only matters when the response carries a runId (which the proxy doesn't).
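
The empty-response check that the second fix relies on can be sketched as below. Only the condition (`config.mode === 'api'` with empty text/html) and the `'empty_response'` status come from the commit message; the `finalStatusFor` helper, the `'done'` status name, and the whitespace trimming are assumptions.

```typescript
type FinalStatus = 'done' | 'empty_response';

// Decide the final status once the stream has ended (the onDone path).
// An API-mode turn that produced neither text nor HTML is reported as
// 'empty_response' instead of a normal completion.
function finalStatusFor(mode: string, text: string, html: string): FinalStatus {
  const isEmpty = text.trim() === '' && html.trim() === '';
  return mode === 'api' && isEmpty ? 'empty_response' : 'done';
}
```

The check only runs if `onDone` fires, which is why a thrown `.json()` decode on the SSE body left the card on 'Done'.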

Both fixes are scoped to the two regressing files; no daemon or contracts changes.
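
For the first fix, the event-driven refresh (no `setInterval`) might be structured like this framework-free sketch. The `watchRunningProjectIds` function, its parameters, and the `'RUNS_CHANGED'` event string are assumptions; the real `useRunningProjectIds` is a React hook, and only the URL and the trigger list come from the commit message.

```typescript
type FetchActiveRuns = (url: string) => Promise<string[]>;

// Refresh on start ("mount"), on a runs-changed event, and on
// visibilitychange. There is deliberately no periodic timer.
function watchRunningProjectIds(
  fetchActiveRuns: FetchActiveRuns,
  runsEvents: EventTarget,
  visibilityEvents: EventTarget,
  onChange: (ids: Set<string>) => void,
): () => void {
  let disposed = false;
  const refresh = (): void => {
    fetchActiveRuns('/api/runs?status=active')
      .then((ids) => {
        if (!disposed) onChange(new Set(ids));
      })
      .catch(() => {
        // Keep the last known set when a refresh fails.
      });
  };
  runsEvents.addEventListener('RUNS_CHANGED', refresh);
  visibilityEvents.addEventListener('visibilitychange', refresh);
  refresh(); // initial fetch
  // Cleanup: stop listening and ignore any in-flight response.
  return () => {
    disposed = true;
    runsEvents.removeEventListener('RUNS_CHANGED', refresh);
    visibilityEvents.removeEventListener('visibilitychange', refresh);
  };
}
```

In a browser the second target would typically be `document`; passing the targets in keeps the sketch runnable outside one.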

Branch's ProjectView already uses streamMessage for BYOK turns (matches main).
The merge kept HEAD's older version of this test which still mocked
streamByokViaDaemon and asserted on it — those calls never happened, so
13 cases failed. Take main's version of the test verbatim; it mocks and
asserts on streamMessage to match the actual code path.
…e path

Same merge-resolution residue as the api-empty-response fix: branch's
ProjectView uses streamMessage for BYOK turns, but this test still
mocked + asserted on streamByokViaDaemon. Take main's version so the
mock matches what the code actually calls.

Labels

risk/high High risk: apps/desktop, daemon, auth, migration, workflows, package deps size/XXL PR changes 1500+ lines type/feature New feature

5 participants