Add a codex-cli image provider: no-API-key generation via the signed-in Codex CLI #4128

@fancy-agent

Problem

Open Design already has a no-API-key image path, but it only fires in one
narrow cell: when the chat agent is Codex and the project is an image
project, the daemon injects the renderCodexImagegenOverride prompt (PR #622)
so Codex calls its built-in image_gen tool. Every other combination needs a
provider API key.

The result is a coverage gap for the (large) set of users who have a ChatGPT
subscription and a signed-in Codex CLI but no OPENAI_API_KEY:

Chat agent                          How they generate an image today                             Needs an API key?
Codex                               built-in image_gen via the #622 prompt override              No
Claude Code / Gemini / any other    od media generate via an openai / volcengine / … provider    Yes

So a prototyper running Claude Code (or Gemini, or Cursor) in Open Design
cannot generate an image at all unless they go get an API key — even though the
machine already has a perfectly good, already-authenticated image generator
sitting in the Codex CLI.

Proposed solution

Add a local codex-cli media provider with one image model,
codex-image-gen, that drives the operator's already-signed-in Codex CLI.
On od media generate --surface image --model codex-image-gen … the daemon
spawns a headless codex exec turn, tells it to run its built-in image_gen
tool, and reads back the produced PNG. Each generation bills against the user's
ChatGPT subscription; Open Design never sees an API key.

The increment over #622 is agent independence: because this is a real
provider behind od media generate, it works no matter which coding agent
drives the chat. #622 only rewires generation when Codex itself is the agent.

This is the same shape as the existing hyperframes provider — a local
renderer with credentialsRequired: false and settingsVisible: false, run
inside the daemon process — not a hosted API. It is not a generic provider
router or a provider-account pool (the things docs/external-media-orchestration.md
rules out): it drives the single local user's own Codex login, with no pooling,
no OD-owned budget, and no stored credentials.
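
As a rough illustration of that shape, the registry entry might look like the sketch below. The MediaModel type and its field layout are assumptions made for this example; only the provider id, the model id, and the caps, credentialsRequired, and settingsVisible values come from this proposal, and the real definitions in models.ts may differ.

```typescript
// Hypothetical descriptor type, for illustration only; the real type in
// apps/web/src/media/models.ts may carry different or additional fields.
type MediaCapability = "t2i" | "i2i";

interface MediaModel {
  provider: string;
  id: string;
  caps: MediaCapability[];
  credentialsRequired: boolean; // false: no API key is collected or stored
  settingsVisible: boolean; // false: no provider settings panel is shown
}

// Values taken from this proposal.
const codexImageGen: MediaModel = {
  provider: "codex-cli",
  id: "codex-image-gen",
  caps: ["t2i"],
  credentialsRequired: false,
  settingsVisible: false,
};
```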

It also slots cleanly under the run-scoped media execution policy (#3106):
since it goes through od media generate, a run with
mediaExecution: { mode: "enabled", allowedModels: [...] } governs it for free,
the same way it governs every other model.
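
To make the policy interaction concrete, here is a minimal sketch of an allow-list check. The MediaExecutionPolicy type, the "disabled" variant, and the isModelAllowed helper are hypothetical names invented for this example; the actual enforcement in #3106 may be structured differently.

```typescript
// Hypothetical policy shape: only `mode: "enabled"` and `allowedModels`
// appear in this proposal; the "disabled" variant is assumed for illustration.
type MediaExecutionPolicy =
  | { mode: "disabled" }
  | { mode: "enabled"; allowedModels: string[] };

// Returns true only when generation is enabled and the model is allow-listed.
function isModelAllowed(policy: MediaExecutionPolicy, model: string): boolean {
  if (policy.mode !== "enabled") return false;
  return policy.allowedModels.some((allowed) => allowed === model);
}

const examplePolicy: MediaExecutionPolicy = {
  mode: "enabled",
  allowedModels: ["gpt-image-2", "codex-image-gen"],
};
```

Under this sketch, a run that omits codex-image-gen from allowedModels would reject it exactly as it would reject any other unlisted model.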

Feasibility evidence (smoke on a real machine)

codex-cli 0.132.0, signed in with an official ChatGPT account (OAuth, no
OPENAI_API_KEY, no third-party model_provider gateway). The built-in
image_gen tool is mounted and produces a real PNG with no key:

$ printf '%s' 'Use your built-in image_gen tool to generate: an orange cat on a
  blue cube, flat design, 1:1. Copy the output to /tmp/codex-imagegen-smoke.png
  and report the final path. If image_gen is not available, say
  IMAGE_GEN_UNAVAILABLE and stop.' \
  | codex exec --json --skip-git-repo-check -
…
$ file /tmp/codex-imagegen-smoke.png
/tmp/codex-imagegen-smoke.png: PNG image data, 1254 x 1254, 8-bit/color RGB

The --json event stream shows only command_execution + agent_message
items — no API-key path, no scripts/image_gen.py CLI fallback. The image came
from the built-in tool.

Also verified that the conservative --sandbox workspace-write (the daemon's
default codex sandbox on macOS/Linux) is enough — image_gen reaches its
backend over codex's own connection, not the sandboxed shell, so it works
without danger-full-access.

Note: official OAuth login is a hard prerequisite — on a machine where Codex
is pointed at a third-party relay/gateway, image_gen is not mounted. The
provider detects this and fails loudly (see risks).
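
One way the provider could detect a missing image_gen tool is to reuse the sentinel from the smoke prompt above. This is only a sketch: buildImagePrompt and assertImageGenAvailable are hypothetical helper names, the error wording is illustrative, and it assumes the agent's text messages have already been collected from the codex exec output.

```typescript
// Sentinel string taken from the smoke-test prompt above.
const UNAVAILABLE_SENTINEL = "IMAGE_GEN_UNAVAILABLE";

// Builds a prompt in the same form as the smoke test.
function buildImagePrompt(description: string, outputPath: string): string {
  return [
    `Use your built-in image_gen tool to generate: ${description}.`,
    `Copy the output to ${outputPath} and report the final path.`,
    `If image_gen is not available, say ${UNAVAILABLE_SENTINEL} and stop.`,
  ].join(" ");
}

// Throws an actionable error if any agent message carries the sentinel.
function assertImageGenAvailable(agentMessages: string[]): void {
  const unavailable = agentMessages.some(
    (message) => message.indexOf(UNAVAILABLE_SENTINEL) !== -1,
  );
  if (unavailable) {
    throw new Error(
      "codex-cli: image_gen is not available. Sign in to Codex with an " +
        "official ChatGPT account (no third-party relay) and retry.",
    );
  }
}
```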

Mechanism

  • Register provider codex-cli + model codex-image-gen (caps: ['t2i'],
    credentialsRequired: false, settingsVisible: false) in both
    apps/web/src/media/models.ts and the daemon mirror
    apps/daemon/src/media-models.ts.
  • Dispatcher branch in apps/daemon/src/media.ts: spawn
    codex exec --json --skip-git-repo-check --sandbox workspace-write -C <tmp>,
    prompt via stdin (reusing the existing resolveAgentExecutable(codexAgentDef)
    for the binary and codexNeedsDangerFullAccessSandbox() for the Windows/WSL
    sandbox escalation), render into a private temp workspace, return the PNG
    bytes to the generic dispatcher which writes them into the project.
  • Confirm success from ground truth, because image_gen leaves no distinct
    event in codex exec --json: codex exits 0, the file exists at the path we
    handed it, and it carries a PNG signature. Otherwise throw — no stub
    fallback (same as hyperframes).
  • Timeout with SIGTERM→SIGKILL; stream codex's progress through onProgress.
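
The ground-truth check in the third bullet can be sketched as follows. The 8-byte PNG signature is standard; hasPngSignature and confirmGeneration are hypothetical helper names, and the sketch assumes the caller has already read the file (passing null when it does not exist).

```typescript
// Standard 8-byte PNG file signature: 89 50 4E 47 0D 0A 1A 0A.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function hasPngSignature(bytes: Uint8Array): boolean {
  if (bytes.length < PNG_SIGNATURE.length) return false;
  return PNG_SIGNATURE.every((byte, i) => bytes[i] === byte);
}

// Applies the three checks in order and throws on the first failure;
// there is deliberately no placeholder or stub fallback.
function confirmGeneration(
  exitCode: number | null,
  fileBytes: Uint8Array | null, // null when no file exists at the requested path
): Uint8Array {
  if (exitCode !== 0) {
    throw new Error(`codex exited with code ${exitCode}`);
  }
  if (fileBytes === null) {
    throw new Error("codex produced no file at the requested path");
  }
  if (!hasPngSignature(fileBytes)) {
    throw new Error("output file does not carry a PNG signature");
  }
  return fileBytes;
}
```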

Risks / self-disclosure

  • New shape: the daemon spawns an agent CLI as a generation backend. Today
    the daemon shells out to npx hyperframes (local render); this extends that
    to codex exec. Subprocess lifecycle is bounded (timeout + SIGKILL, temp
    workspace cleaned in finally).
  • Sandbox choice. Uses workspace-write + network (the daemon's existing
    codex policy), escalating to danger-full-access only where Codex has no
    working sandbox (Windows/WSL), via the existing helper.
  • It spends the user's ChatGPT quota. Each generation consumes the user's
    own subscription budget. That's the point (no key needed), but it must never
    be the default model — it is opt-in via the model picker only; the default
    image model stays gpt-image-2.
  • Login dependency. Requires an official ChatGPT-authenticated Codex CLI.
    If image_gen is unavailable (no login / relay gateway), the provider throws
    a clear, actionable error rather than degrading to a placeholder.
  • First version is t2i only — no i2i/inpaint/reference-image yet.
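
The bounded subprocess lifecycle from the first risk (timeout, SIGTERM then SIGKILL, temp workspace removed in finally) could look roughly like the sketch below. It uses only Node's standard child_process, fs/promises, os, and path modules; runBounded, its parameters, the grace period, and the temp-directory prefix are hypothetical choices for this example, not the daemon's actual code.

```typescript
import { spawn } from "node:child_process";
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface RunResult {
  code: number | null; // null when the process was ended by a signal
  timedOut: boolean;
}

// Runs a command in a private temp workspace. After timeoutMs it sends
// SIGTERM; if the process is still alive graceMs later it sends SIGKILL.
async function runBounded(
  cmd: string,
  args: string[],
  timeoutMs: number,
  graceMs = 2000,
): Promise<RunResult> {
  const workdir = await mkdtemp(join(tmpdir(), "codex-image-"));
  try {
    return await new Promise<RunResult>((resolve, reject) => {
      const child = spawn(cmd, args, { cwd: workdir, stdio: "ignore" });
      let timedOut = false;
      let killTimer: ReturnType<typeof setTimeout> | undefined;
      const termTimer = setTimeout(() => {
        timedOut = true;
        child.kill("SIGTERM");
        killTimer = setTimeout(() => child.kill("SIGKILL"), graceMs);
      }, timeoutMs);
      const clearTimers = () => {
        clearTimeout(termTimer);
        if (killTimer !== undefined) clearTimeout(killTimer);
      };
      child.once("error", (err) => {
        clearTimers();
        reject(err);
      });
      child.once("close", (code) => {
        clearTimers();
        resolve({ code, timedOut });
      });
    });
  } finally {
    // The workspace is removed whether the run succeeded, failed, or timed out.
    await rm(workdir, { recursive: true, force: true });
  }
}
```

A real provider would pass the resolved codex binary and its exec arguments here, pipe the prompt to stdin instead of ignoring stdio, and read the PNG out of the workspace before the finally block removes it.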

Happy to implement this; PR to follow with the provider, mock-based tests, and
real-generation evidence.

Labels: enhancement
