feat(data_parsing): L2D 1 Hz sequential multi-view windows for the World Model (#16) by gcordova10 · Pull Request #95 · autowarefoundation/auto_e2e

gcordova10 · 2026-06-29T06:21:57Z

What this adds

L2DDataset currently loads only the current frame and leaves visual_history
as zeros, with no future frames — so the World Model / JEPA (#13, #85) has neither
history nor targets. This PR adds sequential 1 Hz multi-view windows (the
"Sequential frame access (needed for #13)" checklist item in #16).

Design (testable without `lerobot`)

data_parsing/l2d/world_model_windows.py (new, pure / dataset-agnostic):
- stride_for_hz(source_hz, wm_hz) — L2D 10 Hz → 1 Hz = stride 10.
- window_offsets(N, stride) — past [-(N-1)s..0] (oldest→newest, current last)
  and future [s..N*s].
- required_margins(N, stride) = ((N-1)*s, N*s).
- build_windows(load_frame, row, ep_start, ep_end, num_frames=4, stride=10) →
  (history_frames, future_frames), each [N, V, 3, H, W]; raises IndexError
  if a window would cross the episode boundary. Takes a load_frame(row)->[V,3,H,W]
  callable, so it has no dataset/lerobot dependency and is unit-tested with a
  synthetic loader.
data_parsing/l2d/dataset.py (opt-in, default OFF → byte-identical to before):
- new args include_world_model_windows=False, wm_num_frames=4, wm_hz=1.0, source_hz=10.0;
- _load_multiview_frame(row) refactor, reused for the current frame and every
  window frame;
- _build_sample_index margins take the max of the egomotion window (64/64)
  and the World Model windows, so a valid frame always has a complete window;
- __getitem__ emits history_frames / future_frames when enabled;
  L2DSample gains the two optional keys (NotRequired, version-guarded import
  for py3.10/3.12).
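
For reviewers, a minimal sketch of the windowing arithmetic described above. It is not the merged code: frames are returned as plain Python lists rather than stacked [N, V, 3, H, W] tensors, and treating ep_end as exclusive is an assumption made for the example.

```python
from typing import Callable, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def window_offsets(num_frames: int, stride: int) -> Tuple[List[int], List[int]]:
    """Row offsets relative to the current row, as (past, future).

    past runs oldest -> newest and ends at 0 (the current frame);
    future starts one stride ahead and ends at num_frames * stride.
    """
    if num_frames < 1 or stride < 1:
        raise ValueError("num_frames and stride must be >= 1")
    past = [-(num_frames - 1 - i) * stride for i in range(num_frames)]
    future = [(i + 1) * stride for i in range(num_frames)]
    return past, future


def required_margins(num_frames: int, stride: int) -> Tuple[int, int]:
    """Rows needed before and after the current row for a complete window."""
    return (num_frames - 1) * stride, num_frames * stride


def build_windows(
    load_frame: Callable[[int], Frame],
    row: int,
    ep_start: int,
    ep_end: int,  # assumption: exclusive upper bound of the episode
    num_frames: int = 4,
    stride: int = 10,
) -> Tuple[List[Frame], List[Frame]]:
    """Load the past and future windows, refusing to cross the episode boundary."""
    past, future = window_offsets(num_frames, stride)
    first, last = row + past[0], row + future[-1]
    if first < ep_start or last >= ep_end:
        raise IndexError(
            f"window [{first}, {last}] leaves episode [{ep_start}, {ep_end})"
        )
    history = [load_frame(row + o) for o in past]
    targets = [load_frame(row + o) for o in future]
    return history, targets


# Synthetic loader: the "frame" is just its row index, so ordering is easy to inspect.
history, future = build_windows(lambda r: r, row=30, ep_start=0, ep_end=71)
print(history)  # [0, 10, 20, 30]
print(future)   # [40, 50, 60, 70]

# Combining with the egomotion margins (64/64) by taking the max on each side.
wm_before, wm_after = required_margins(4, 10)              # (30, 40)
margins = (max(64, wm_before), max(64, wm_after))          # (64, 64)
```

With the defaults (N=4, stride 10) the World Model needs 30 rows before and 40 after, so the existing 64/64 egomotion margins already dominate; the max only changes the valid index set for larger N or stride.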

How it fits

- #13 (Proposal: Feature Reconstruction Loss for FutureState (Auxiliary Self-Supervised Task)), train_il: compute_step_loss consumes exactly batch["history_frames"]
  / batch["future_frames"] → encode_history → predict_future → jepa_loss (these were
  synthetic before).
- #85 (feat(world-model): slow World Model branch (JEPA) + AutoE2E wiring (#13)), World Model: history_frames feeds WorldActionModel; future_frames
  are the JEPA targets.
- Does not overlap #93 (Proposal: World Action Model (1 Hz camera multi-view encoder without BEV view + encoded features temporal fusion)); different area: data_parsing.

1 Hz / 10 Hz

L2D is 10 Hz (egomotion.py, _DT=0.1), so source_hz=10.0 (stride 10).
Parametrized: a 30 FPS source → source_hz=30 (stride 30 at the default wm_hz=1.0).
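
A small worked example of the rate conversion, as a sketch only. The body of stride_for_hz below is a guess at reasonable behaviour, and rejecting non-integer ratios is an assumption of this example, not something the PR states. window_span is a hypothetical helper added for illustration.

```python
from typing import Tuple


def stride_for_hz(source_hz: float, wm_hz: float) -> int:
    """Number of source rows between consecutive World Model frames."""
    if source_hz <= 0 or wm_hz <= 0:
        raise ValueError("rates must be positive")
    ratio = source_hz / wm_hz
    stride = round(ratio)
    # Assumption: a ratio that is not (nearly) an integer is treated as an error.
    if stride < 1 or abs(ratio - stride) > 1e-6:
        raise ValueError(f"{source_hz} Hz is not an integer multiple of {wm_hz} Hz")
    return stride


def window_span(source_hz: float, wm_hz: float = 1.0, num_frames: int = 4) -> Tuple[int, int, int]:
    """Hypothetical helper: (stride, rows needed before, rows needed after)."""
    stride = stride_for_hz(source_hz, wm_hz)
    return stride, (num_frames - 1) * stride, num_frames * stride


print(window_span(10.0))  # (10, 30, 40)
print(window_span(30.0))  # (30, 90, 120)
```

In both cases the window covers the same wall-clock span (3 s of history, 4 s of future at 1 Hz with N=4); only the number of source rows changes.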

Scope / not in this PR (kept honest)

- Camera calibration (BEV camera_params): #16 (TODO: Dataset Selection and DataLoader Implementation) lists it, but I keep this PR to
  the sequential windows. l2d/camera.py already has make_camera_params_placeholder()
  (identity) with a documented TODO to parse intrinsic @ extrinsic → [3,4] per view
  from L2D's extrinsic_RDF.yaml; BEV fusion runs on its learnable pseudo-projection
  meanwhile (not blocking). Follow-up: the real YAML parser.
- Performance: decoding N×2 multi-view frames per sample is expensive; for real
  training prefer pre-extraction (data_parsing/pre_extracted.py) or caching.
  Functional correctness here does not depend on either.
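
To illustrate the caching option, a hedged sketch that wraps a frame loader in functools.lru_cache so that windows sharing rows do not decode them twice. decode_multiview_frame is a stand-in for the real decoder, and load_window_rows is a hypothetical helper; neither is part of this PR.

```python
from functools import lru_cache
from typing import Callable, List, Tuple

decode_calls = 0  # counts how often the (expensive) decoder actually runs


def decode_multiview_frame(row: int) -> Tuple[str, int]:
    """Stand-in for the expensive multi-view video decode of one row."""
    global decode_calls
    decode_calls += 1
    return ("frame", row)


# Keep the most recently used rows in memory; maxsize is an arbitrary choice here.
cached_load = lru_cache(maxsize=256)(decode_multiview_frame)


def load_window_rows(
    load_frame: Callable[[int], Tuple[str, int]],
    row: int,
    num_frames: int = 4,
    stride: int = 10,
) -> List[Tuple[str, int]]:
    """Load the 2N frames of one sample: N past (incl. current) and N future."""
    offsets = [k * stride for k in range(-(num_frames - 1), num_frames + 1)]
    return [load_frame(row + o) for o in offsets]


# Three samples one stride apart share most of their rows.
for row in (30, 40, 50):
    load_window_rows(cached_load, row)

print(decode_calls)  # 10 decodes instead of 3 * 8 = 24
```

The saving shown here relies on neighbouring samples being read in order within one process. With shuffled sampling or several DataLoader workers the hit rate would be much lower, which is why pre-extraction is likely the more robust option for real training.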

Tests

pytest Model/tests/test_world_model_windows.py → 14 passed. ruff + mypy clean.
Diff: 2 new files + 2 edited (dataset.py, __init__.py).

…rld Model (autowarefoundation#16)

Implements the 'sequential frame access (needed for the feature reconstruction loss autowarefoundation#13)' item of autowarefoundation#16, on the agreed LeRobot L2D dataset.

Problem: L2DDataset loaded only the current frame and returned visual_history as zeros, so the World Model had no past context and no future targets for the JEPA loss.

- world_model_windows.py (new, dataset-agnostic + unit-testable without lerobot): stride_for_hz (10 Hz -> 1 Hz = stride 10; parametrised for e.g. 30 fps), window_offsets, required_margins, build_windows(load_frame, row, ep_start, ep_end, N, stride) -> (history_frames, future_frames), each [N, V, 3, H, W], oldest->newest, with episode-boundary checks (no cross-episode leakage).
- L2DDataset: opt-in include_world_model_windows (default OFF -> byte-identical). Emits history_frames/future_frames [N,7,3,H,W]; valid-index margins take the max of egomotion (64/64) and the World-Model window; _load_multiview_frame refactored and reused. Feeds train_il's JEPA term directly.
- 14 tests (pure windowing logic, no dataset download); mypy/ruff clean.

Not done (flagged): camera calibration extraction (BEV works via pseudo_projection; L2D calib API unverified offline) and pre-extraction/caching for the heavy multi-view video decode. Both are follow-ups; functional correctness is independent.

Signed-off-by: GABRIELA CORDOVA <100548769@alumnos.uc3m.es>

gcordova10 · 2026-06-29T06:46:37Z

CI green. This implements the "Sequential frame access (needed for #13)" item from #16: 1 Hz multi-view past/future windows for the World Model.

New pure, dataset-agnostic module data_parsing/l2d/world_model_windows.py (stride_for_hz, window_offsets, required_margins, build_windows) — unit-tested without lerobot (14 tests) by injecting a synthetic frame loader.
L2DDataset extended opt-in (include_world_model_windows=False by default → unchanged otherwise): emits history_frames / future_frames [N, V, 3, H, W], with episode-boundary checks.

Feeds the JEPA directly: #13's compute_step_loss consumes batch["history_frames"] / ["future_frames"], and #85's World Model uses them as input / targets (they were synthetic placeholders before). Independent of #85 (different area) and doesn't overlap #93.

Kept honest about scope (details in the PR body): BEV camera calibration is left as a documented TODO in l2d/camera.py (BEV runs on the learnable pseudo-projection meanwhile), and pre-extraction/caching for decode throughput is a noted follow-up. Ready for review.

m-zain-khawaja

approved - thanks @gcordova10

This was referenced Jun 29, 2026

feat(world-model): slow World Model branch (JEPA) + AutoE2E wiring (#13) #85

Merged

Proposal: Feature Reconstruction Loss for FutureState (Auxiliary Self-Supervised Task) #13

Closed

m-zain-khawaja approved these changes Jun 30, 2026

m-zain-khawaja merged commit ab3651e into autowarefoundation:main Jun 30, 2026
1 check passed

This was referenced Jun 30, 2026

TODO: Dataset Selection and DataLoader Implementation #16

Closed

feat(data_parsing): offline pre-extraction of the World Model 1 Hz windows (#16) #100

Merged

gcordova10 deleted the feat/world-model-dataloader branch June 30, 2026 15:30
