This proposes a concrete, lightweight design for the Reasoning band (the yellow @1Hz lane in the 24/06 architecture), and — more importantly — asks the WG to settle the supervision, which is the open design choice. It builds on the System-2 causal head already merged in #81.
Encoded Visual History (from the World Model, #85/#93)
--> Scenario Description --> Predicted Scenario
--> Video-Language-Model loss (student/teacher) # front camera only, 1 Hz
Objective (from the 24/06 notes): help the policy handle edge cases. The band classifies the driving scenario w.r.t. the ODD and emits (a) a classification vector to the Trajectory Planner (to modulate the trajectory) and/or (b) scenario-description text/tokens, learned without explicit labels besides the trajectory (a student/teacher VLM setup). Edge cases to stress-test come from the KIT long-tail set.
This is a design sketch for discussion — corrections and advice very welcome.
Design principles
Cheap & 1 Hz. Small trainable heads on top of the already-computed Encoded Visual History; no extra backbone pass.
Opt-in, decoupled. Default off → the Reactive/World paths are unchanged; the band is additive with its own loss module (matches the "separate loss modules per branch" action item from 24/06).
No part of the Reasoning band is wired today. The 24/06 meeting redefined its supervision as a scenario-description VLM (student/teacher) setup, which is not yet specified or built.
Proposed design (modular, opt-in)
C1 — Scenario encoder. Consume the Encoded Visual History [B, 896] (1 Hz) → small MLP/attention head → scenario latent.
C3 — Scenario-description head (optional). A light decoder emitting scenario-description tokens, trained against a teacher (see open questions).
C4 — Planner coupling. Feed C2's vector into the Trajectory Planner via a zero-init adaptive gate (FiLM-style; no-op at init so the reactive baseline is unchanged — see decisions).
C5 — reasoning_loss module (separate per-branch loss): student/teacher distillation for C3 (+ optional classification supervision for C2).
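To make the "no-op at init" property of the C4 gate concrete, here is a minimal NumPy sketch of a zero-initialised FiLM-style gate. The class name, the feature width (256) and the classification-vector width (16) are assumptions for illustration only; none of them come from the planner code.

```python
import numpy as np


class ZeroInitFiLMGate:
    """FiLM-style modulation whose projections start at zero (hypothetical sketch)."""

    def __init__(self, cond_dim: int, feat_dim: int):
        # All weights and biases are zero, so gamma = beta = 0 at initialisation.
        self.w_gamma = np.zeros((cond_dim, feat_dim))
        self.b_gamma = np.zeros(feat_dim)
        self.w_beta = np.zeros((cond_dim, feat_dim))
        self.b_beta = np.zeros(feat_dim)

    def __call__(self, planner_feat: np.ndarray, scenario_vec: np.ndarray) -> np.ndarray:
        gamma = scenario_vec @ self.w_gamma + self.b_gamma
        beta = scenario_vec @ self.w_beta + self.b_beta
        # Residual form: (1 + gamma) * x + beta reduces to x when gamma = beta = 0.
        return (1.0 + gamma) * planner_feat + beta


rng = np.random.default_rng(0)
planner_feat = rng.normal(size=(4, 256))  # assumed planner feature, [B, D]
scenario_vec = rng.normal(size=(4, 16))   # assumed C2 classification vector, [B, K]

gate = ZeroInitFiLMGate(cond_dim=16, feat_dim=256)
out = gate(planner_feat, scenario_vec)
```

At initialisation `out` matches `planner_feat`, so enabling the gate cannot change the reactive baseline until training moves the projection weights away from zero.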
LINGO front-cam; 1 Hz aligns with the World Model; KIT = the edge cases
Implementation plan (phased, additive)
Module skeleton + synthetic test — ReasoningHead with the I/O contract, tested on random tensors; default off / zeros fallback so the rest is unchanged.
C4 — wire the classification vector into the planner behind a flag.
C3 + C5 — scenario-description head + student/teacher reasoning_loss, once the teacher/supervision is fixed (open questions below).
Edge-case eval on the KIT long-tail set.
Would land as a separate PR after the World Model (#85) merges, the same way the World Model was built.
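The first phase (module skeleton + synthetic test) could look roughly like the sketch below. It is written in NumPy to stay framework-neutral and is not the proposed implementation: the latent width, the number of scenario classes, the `enabled` flag name and the softmax output are all assumptions. Only the `[B, 896]` input contract and the default-off / zeros fallback come from the text above.

```python
import numpy as np

HISTORY_DIM = 896  # Encoded Visual History width from the I/O contract


class ReasoningHead:
    """Skeleton: Encoded Visual History -> scenario classification vector."""

    def __init__(self, num_classes: int = 8, latent_dim: int = 128,
                 enabled: bool = False, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.enabled = enabled          # default off: the band is opt-in
        self.num_classes = num_classes  # assumed class count
        self.w1 = rng.normal(scale=0.02, size=(HISTORY_DIM, latent_dim))
        self.b1 = np.zeros(latent_dim)
        self.w2 = rng.normal(scale=0.02, size=(latent_dim, num_classes))
        self.b2 = np.zeros(num_classes)

    def __call__(self, history: np.ndarray) -> np.ndarray:
        if history.ndim != 2 or history.shape[1] != HISTORY_DIM:
            raise ValueError(f"expected [B, {HISTORY_DIM}], got {history.shape}")
        if not self.enabled:
            # Zeros fallback: downstream consumers see a constant, inert vector.
            return np.zeros((history.shape[0], self.num_classes))
        latent = np.tanh(history @ self.w1 + self.b1)      # scenario latent (C1)
        logits = latent @ self.w2 + self.b2
        logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
        probs = np.exp(logits)
        return probs / probs.sum(axis=1, keepdims=True)


# Synthetic test on random tensors, as in the first phase of the plan.
history = np.random.default_rng(1).normal(size=(2, HISTORY_DIM))
off = ReasoningHead(enabled=False)(history)
on = ReasoningHead(enabled=True)(history)
```

With the flag off, the head returns zeros of the right shape; with it on, each row is a probability vector over the assumed scenario classes.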
Open questions (supervision — need WG / @m-zain-khawaja input)
Defaults proposed above; the points that really need your call:
Teacher signal — OK with an open-weights VLM (Qwen2-VL / InternVL) as an offline, train-only auto-labeller, or do you prefer the Alpamayo CoC autolabeler from the start?
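If the teacher ends up being an offline, train-only auto-labeller, one possible shape for the student side of `reasoning_loss` is hard-label (sequence-level) distillation: token-level cross-entropy between the student's logits and the teacher's cached caption tokens. The sketch below assumes the teacher captions are already tokenised and padded; `PAD_ID`, the tensor shapes and the function signature are hypothetical, and soft-label (logit) distillation would be an equally valid alternative that this does not cover.

```python
import numpy as np

PAD_ID = 0  # hypothetical padding token id


def reasoning_loss(student_logits: np.ndarray, teacher_tokens: np.ndarray) -> float:
    """Mean cross-entropy of student logits [B, T, V] against teacher token ids [B, T]."""
    # Log-softmax over the vocabulary axis, shifted for numerical stability.
    shifted = student_logits - student_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability the student assigns to each teacher token.
    picked = np.take_along_axis(log_probs, teacher_tokens[..., None], axis=-1).squeeze(-1)
    # Ignore padded positions when averaging.
    mask = teacher_tokens != PAD_ID
    return float(-(picked * mask).sum() / max(mask.sum(), 1))


teacher_tokens = np.array([[1, 2, 0], [3, 0, 0]])  # cached teacher captions, padded
uniform_logits = np.zeros((2, 3, 5))               # untrained student, vocab size 5
loss_uniform = reasoning_loss(uniform_logits, teacher_tokens)

confident_logits = np.full((2, 3, 5), -20.0)
np.put_along_axis(confident_logits, teacher_tokens[..., None], 20.0, axis=-1)
loss_confident = reasoning_loss(confident_logits, teacher_tokens)
```

A student with uniform predictions scores `log(V)` per token, and a student that matches the teacher scores close to zero. Because the loss only needs cached token ids, the teacher VLM never has to be loaded at inference time.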
DriveVLM-RL: https://arxiv.org/abs/2603.18315 — precedent for keeping the VLM only at training time and removing it at inference (asynchronous), which is the deployability argument for the student/teacher setup proposed here (the teacher VLM is train-only; the student stays in the network).
Summary
Flow from the 24/06 design sketch by @m-zain-khawaja:
Design principles
SceneContext + causal head, already merged) and feat(world-model): slow World Model branch (JEPA) + AutoE2E wiring (#13) #85's Encoded Visual History (896).
Current state: what exists / what's missing
Exists in main: SceneContext + System-2 causal head (structured classification output, not free-form text). Merged but not wired to the planner.
Missing (this proposal):
Proposed design (modular, opt-in)
[B, 896] (1 Hz) → small MLP/attention head → scenario latent.
SceneContext from feat(reasoning): optional System 2 causal head behind a typed SceneContext #81 → the planner-facing signal.
reasoning_loss module (separate per-branch loss): student/teacher distillation for C3 (+ optional classification supervision for C2).
Reuse map
SceneContext + causal head (#81)
losses/ (per-branch modules)
Key design decisions: proposed defaults (to confirm with the WG)
Each comes with a SOTA-grounded default so we can converge fast; happy to change any.
reasoning_loss, 1:1 start
ResidualMapFusion alpha=0 pattern (won't destabilise)
Implementation plan (phased, additive)
ReasoningHead with the I/O contract, tested on random tensors; default off / zeros fallback so the rest is unchanged.
reasoning_loss, once the teacher/supervision is fixed (open questions below).
Would land as a separate PR after the World Model (#85) merges, the same way the World Model was built.
Open questions (supervision — need WG / @m-zain-khawaja input)
If you confirm the teacher signal + the v1 student output, I'll start with C1–C2 + the (zero-init) planner gate, reusing #81.
References