Problem
Open Design already has a no-API-key image path, but it only fires in one
narrow cell: when the chat agent is Codex and the project is an image
project, the daemon injects the renderCodexImagegenOverride prompt (PR #622)
so Codex calls its built-in image_gen tool. Every other combination needs a
provider API key.
The result is a coverage gap for the (large) set of users who have a ChatGPT
subscription and a signed-in Codex CLI but no OPENAI_API_KEY:
| Chat agent | How they generate an image today | Needs an API key? |
| --- | --- | --- |
| Codex | built-in image_gen via the #622 prompt override | No |
| Claude Code / Gemini / any other | od media generate → openai / volcengine / … provider | Yes |
So a prototyper running Claude Code (or Gemini, or Cursor) in Open Design
cannot generate an image at all unless they go get an API key — even though the
machine already has a perfectly good, already-authenticated image generator
sitting in the Codex CLI.
Proposed solution
Add a local codex-cli media provider with one image model,
codex-image-gen, that drives the operator's already-signed-in Codex CLI.
On od media generate --surface image --model codex-image-gen … the daemon
spawns a headless codex exec turn, tells it to run its built-in image_gen
tool, and reads back the produced PNG. Each generation is billed against the
user's ChatGPT subscription; Open Design never sees an API key.
The increment over #622 is agent independence: because this is a real
provider behind od media generate, it works no matter which coding agent
drives the chat. #622 only rewires generation when Codex itself is the agent.
This is the same shape as the existing hyperframes provider — a local
renderer with credentialsRequired: false and settingsVisible: false, run
inside the daemon process — not a hosted API. It is not a generic provider
router or a provider-account pool (the things docs/external-media-orchestration.md
rules out): it drives the single local user's own Codex login, with no pooling,
no OD-owned budget, and no stored credentials.
It also slots cleanly under the run-scoped media execution policy (#3106):
since it goes through od media generate, a run with
mediaExecution: { mode: "enabled", allowedModels: [...] } governs it for free,
the same way it governs every other model.
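To make the policy interaction concrete, here is a minimal sketch of an allow-list check. Only the `mode: "enabled"` and `allowedModels` fields come from the text above; the `MediaExecution` type name, the `"disabled"` variant, and the `isModelAllowed` helper are assumptions for illustration and may not match the real #3106 implementation.

```typescript
// Hypothetical policy shape. Only `mode: "enabled"` and `allowedModels`
// appear in the proposal; the "disabled" variant is an assumption.
type MediaExecution =
  | { mode: "enabled"; allowedModels: string[] }
  | { mode: "disabled" };

// Hypothetical helper: a model may run only if the policy is enabled
// and the model id is on the allow-list.
function isModelAllowed(policy: MediaExecution, model: string): boolean {
  if (policy.mode !== "enabled") return false;
  return policy.allowedModels.some((allowed) => allowed === model);
}
```

Under this sketch, codex-image-gen needs no special casing: it is simply one more model id that a run either lists or does not.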
Feasibility evidence (smoke on a real machine)
codex-cli 0.132.0, signed in with an official ChatGPT account (OAuth, no
OPENAI_API_KEY, no third-party model_provider gateway). The built-in
image_gen tool is mounted and produces a real PNG with no key:
$ printf '%s' 'Use your built-in image_gen tool to generate: an orange cat on a
blue cube, flat design, 1:1. Copy the output to /tmp/codex-imagegen-smoke.png
and report the final path. If image_gen is not available, say
IMAGE_GEN_UNAVAILABLE and stop.' \
| codex exec --json --skip-git-repo-check -
…
$ file /tmp/codex-imagegen-smoke.png
/tmp/codex-imagegen-smoke.png: PNG image data, 1254 x 1254, 8-bit/color RGB
The --json event stream shows only command_execution + agent_message
items — no API-key path, no scripts/image_gen.py CLI fallback. The image came
from the built-in tool.
Also verified that the conservative --sandbox workspace-write (the daemon's
default codex sandbox on macOS/Linux) is enough — image_gen reaches its
backend over codex's own connection, not the sandboxed shell, so it works
without danger-full-access.
Note: official OAuth login is a hard prerequisite — on a machine where Codex
is pointed at a third-party relay/gateway, image_gen is not mounted. The
provider detects this and fails loudly (see risks).
Mechanism
- Register provider codex-cli + model codex-image-gen (caps: ['t2i'],
credentialsRequired: false, settingsVisible: false) in both
apps/web/src/media/models.ts and the daemon mirror
apps/daemon/src/media-models.ts.
- Dispatcher branch in apps/daemon/src/media.ts: spawn
codex exec --json --skip-git-repo-check --sandbox workspace-write -C <tmp>,
prompt via stdin (reusing the existing resolveAgentExecutable(codexAgentDef)
for the binary and codexNeedsDangerFullAccessSandbox() for the Windows/WSL
sandbox escalation), render into a private temp workspace, and return the PNG
bytes to the generic dispatcher, which writes them into the project.
- Confirm success from ground truth, because image_gen leaves no distinct
event in codex exec --json: codex exits 0, the file exists at the path we
handed it, and it carries a PNG signature. Otherwise throw; there is no stub
fallback (same as hyperframes).
- Timeout with SIGTERM→SIGKILL; stream codex's progress through
onProgress.
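The dispatcher and ground-truth steps above can be sketched as two pure helpers. This is an illustration under stated assumptions, not the planned implementation: `buildCodexExecArgs`, `hasPngSignature`, `requireGeneratedPng`, and the `CodexRunResult` shape are hypothetical names, and the trailing `-` argument is borrowed from the smoke command rather than from the mechanism bullet. The flags themselves and the three success conditions (exit code 0, file present, PNG signature) are the ones listed above.

```typescript
// The fixed 8-byte signature that opens every PNG file.
const PNG_SIGNATURE: readonly number[] = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

type CodexSandbox = "workspace-write" | "danger-full-access";

// Hypothetical helper: assembles the argv from the dispatcher bullet.
// The trailing "-" mirrors the smoke command, where the prompt arrives on stdin.
function buildCodexExecArgs(workspace: string, sandbox: CodexSandbox): string[] {
  return ["exec", "--json", "--skip-git-repo-check", "--sandbox", sandbox, "-C", workspace, "-"];
}

// True when the buffer starts with the PNG signature.
function hasPngSignature(bytes: Uint8Array): boolean {
  if (bytes.length < PNG_SIGNATURE.length) return false;
  return PNG_SIGNATURE.every((expected, i) => bytes[i] === expected);
}

// Hypothetical result shape: `output` is null when the expected file is missing.
interface CodexRunResult {
  exitCode: number | null;
  output: Uint8Array | null;
}

// Applies the three ground-truth checks and throws instead of falling back to a stub.
function requireGeneratedPng(result: CodexRunResult): Uint8Array {
  if (result.exitCode !== 0) {
    throw new Error(`codex exec exited with code ${String(result.exitCode)}`);
  }
  if (result.output === null) {
    throw new Error("codex exec finished but the expected image file is missing");
  }
  if (!hasPngSignature(result.output)) {
    throw new Error("codex exec output does not carry a PNG signature");
  }
  return result.output;
}
```

Keeping these helpers free of I/O means the mock-based tests can cover the success and failure paths without spawning codex.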
Risks / self-disclosure
- New shape: the daemon spawns an agent CLI as a generation backend. Today
the daemon shells out to npx hyperframes (local render); this extends that
to codex exec. Subprocess lifecycle is bounded (timeout + SIGKILL, temp
workspace cleaned in finally).
- Sandbox choice. Uses workspace-write + network (the daemon's existing
codex policy), escalating to danger-full-access only where Codex has no
working sandbox (Windows/WSL), via the existing helper.
- It spends the user's ChatGPT quota. Each generation consumes the user's
own subscription budget. That's the point (no key needed), but it must never
be the default model — it is opt-in via the model picker only; the default
image model stays gpt-image-2.
- Login dependency. Requires an official ChatGPT-authenticated Codex CLI.
If image_gen is unavailable (no login / relay gateway), the provider throws
a clear, actionable error rather than degrading to a placeholder.
- First version is t2i only — no i2i/inpaint/reference-image yet.
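The bounded subprocess lifecycle mentioned in the first risk (timeout, then SIGTERM escalating to SIGKILL) could look roughly like the sketch below. The `Killable` interface, the `armTimeout` name, and both duration parameters are assumptions for illustration; the proposal does not specify concrete timeout or grace values. A Node.js `ChildProcess` should satisfy `Killable` structurally, but the sketch deliberately avoids importing it.

```typescript
// Minimal structural view of a child process; hypothetical, for illustration.
interface Killable {
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
}

// Sends SIGTERM after `timeoutMs`, then SIGKILL if `graceMs` more elapse.
// Returns a cancel function to call once the child has exited on its own.
function armTimeout(child: Killable, timeoutMs: number, graceMs: number): () => void {
  let killTimer: ReturnType<typeof setTimeout> | undefined;
  const termTimer = setTimeout(() => {
    child.kill("SIGTERM");
    killTimer = setTimeout(() => {
      child.kill("SIGKILL");
    }, graceMs);
  }, timeoutMs);
  return () => {
    clearTimeout(termTimer);
    if (killTimer !== undefined) clearTimeout(killTimer);
  };
}
```

The caller would invoke the returned cancel function from the child's exit handler, alongside the temp-workspace cleanup in `finally`, so that no timer outlives the generation.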
Happy to implement this; PR to follow with the provider, mock-based tests, and
real-generation evidence.