Add ManiSkill3 environments (Franka Panda + SIMPLER Bridge) by qhua360 · Pull Request #251 · galilai-group/stable-worldmodel

qhua360 · 2026-06-09T04:42:44Z

Summary

Adds ManiSkill3 (SAPIEN) manipulation tasks as standard SWM environments via a single robot-agnostic wrapper. Two stationary-arm families:

Franka Panda table-top manipulation (swm/MS*) — PickCube, PushCube, PullCube, PokeCube, StackCube, LiftPegUpright, PegInsertionSide, PlugCharger, PickSingleYCB, RollBall.
SIMPLER / real2sim Bridge digital twins (WidowX, swm/Simpler*) — carrot-on-plate, spoon-on-towel, stack-cube, eggplant-in-basket.

Includes data collection (collect_maniskill.py) and world-model eval support (LeWM training data config + goal-conditioned MPC eval config). World.evaluate + SWM's policies/solvers also handle policy eval via the task's native success detector (mapped onto terminated).

Design

ManiSkillWrapper(gym.Wrapper) mirrors FetchWrapper: wraps gym.make(..., num_envs=1), debatches torch→numpy, maps info['success'] → terminated, lifts env_name/proprio/state/instruction into info, renders (H,W,3) uint8. No robot-specific branching.
TASK_SPECS is the single extension point — one line per task/embodiment.
Goal-conditioned eval: _set_state/_set_goal_state restore/compare the flat ManiSkill sim state (info['state']), mirroring the PushT env, so SWM's MPC eval (eval_wm.py) works.
Opt-in maniskill dependency group (GPU-only — needs CUDA+Vulkan; excluded from uv sync --all-extras/[all], installed via uv sync --group maniskill), lazily imported so import stable_worldmodel stays CPU-importable.
Assets auto-download on first use (MS_SKIP_ASSET_DOWNLOAD_PROMPT=1).
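The "single extension point" idea above can be illustrated with a small, self-contained sketch. This is not the PR's actual `tasks.py`: the `TaskSpec` class, its field names, `build_registrations`, and every SWM id other than `swm/MSPickCube-v0` are hypothetical, and the pairing of `swm/MSPickCube-v0` with `PickCube-v1` is assumed from the results reported below.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical spec: the ManiSkill task an SWM id wraps, plus gym.make kwargs."""
    maniskill_id: str
    make_kwargs: dict = field(default_factory=dict)


# One entry per task/embodiment. Ids and field names here are illustrative only.
TASK_SPECS = {
    "swm/MSPickCube-v0": TaskSpec("PickCube-v1"),
    "swm/MSPushCube-v0": TaskSpec("PushCube-v1"),
    # The Bridge digital twins only support obs_mode='rgb+segmentation'.
    "swm/SimplerCarrotOnPlate-v0": TaskSpec(
        "PutCarrotOnPlateInScene-v1", {"obs_mode": "rgb+segmentation"}
    ),
}


def build_registrations(specs):
    """Turn the spec table into (id, kwargs) pairs for one generic register loop."""
    return [
        (swm_id, {"env_name": spec.maniskill_id, **spec.make_kwargs})
        for swm_id, spec in specs.items()
    ]


registrations = build_registrations(TASK_SPECS)
```

With a table like this, adding a robot or task means adding one dictionary entry; the registration loop itself never needs robot-specific branches.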

Factors of Variation

The env declares a variation_space of visual distribution-shift knobs (SIMPLER-aligned), applied in _apply_variations using ManiSkill's own SAPIEN idioms (cf. digital_twins/so100_arm/grasp_cube.py) then scene.update_render()+get_obs() so the frame reflects them; sampled values flow into info['variation.*']. Each was verified on an A100 to change the rendered frame via reset(options={'variation_values': ...}):

| Factor | Mechanism | Δ pixels |
| --- | --- | --- |
| `light.intensity` | `scene.set_ambient_light` | 17074 |
| `camera.angle_delta` | render-camera `set_local_pose` | 15300 |
| `object.color` | cube material `set_base_color` | 102 (small object) |
| `rendering.transparent_arm` | robot-link material alpha | 5129 |

Deferred (documented): table.color/background.color — those surfaces are textured, so set_base_color is a verified no-op; they need a texture swap / greenscreen. Distractors and object-pose-as-settable-factor are follow-ups (object pose is already seed-randomized per episode).

Success rate

Demo replay. No trained model is bundled, so scripts/examples/maniskill_demo_replay.py replays ManiSkill's official demonstrations through the wrapper — restoring each demo's initial env_state then replaying its recorded actions, reporting success_rate:

PickCube-v1 (motionplanning demos, pd_joint_pos): success_rate = 20/20 = 100.0%

This confirms the env + success→terminated wiring yields and detects real successes (vs. ~0% from a random policy). (Reproduction uses the initial env_state, not the seed — batched-GPU demos aren't seed-reproducible; the absolute-joint demos replay deterministically.)
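The replay logic described above (restore each demo's initial state, replay its actions, count successes) can be sketched as follows. This is an illustration, not the contents of `maniskill_demo_replay.py`: `replay_success_rate`, the `set_state` method name, the demo dictionary keys, and the toy `CounterEnv` are all assumptions made so the example runs without ManiSkill.

```python
def replay_success_rate(env, demos):
    """Replay each demo from its recorded initial state and count successes.

    `env` is assumed to expose set_state(state) and a Gymnasium-style step();
    `demos` is assumed to be a list of dicts with 'env_state' and 'actions'.
    """
    successes = 0
    for demo in demos:
        env.set_state(demo["env_state"])  # restore the state, not the seed
        terminated = False
        for action in demo["actions"]:
            _, _, terminated, truncated, _ = env.step(action)
            if terminated or truncated:
                break
        successes += bool(terminated)  # success is mapped onto `terminated`
    return successes / len(demos)


class CounterEnv:
    """Toy stand-in for the wrapped env: succeeds once the state reaches 3."""

    def set_state(self, state):
        self.state = state

    def step(self, action):
        self.state += action
        success = self.state >= 3
        return self.state, 0.0, success, False, {"success": success}


demos = [
    {"env_state": 0, "actions": [1, 1, 1]},  # reaches 3, so it succeeds
    {"env_state": 1, "actions": [1, 0, 0]},  # stalls at 2, so it fails
]
rate = replay_success_rate(CounterEnv(), demos)
print(f"success_rate = {rate:.1%}")  # prints "success_rate = 50.0%"
```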

World-model MPC. A LeWM world model trained on a collected PickCube dataset and evaluated with goal-conditioned MPC reached 2% (1/50) goal-reaching success on swm/MSPickCube-v0 (random-policy data, 30 epochs, goal threshold 2.0 on the flat sim state) — a real but low baseline:

python scripts/data/collect_maniskill.py
python scripts/train/lewm.py data=maniskill output_model_name=lewm_mspickcube wandb.enabled=false
python scripts/plan/eval_wm.py --config-name maniskill
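For reference, the goal-reaching criterion quoted above (threshold 2.0 on the flat sim state) could be checked along these lines. The PR does not state which distance metric is used, so the Euclidean norm here is an assumption, and `goal_reached` is a hypothetical helper name.

```python
import math

GOAL_THRESHOLD = 2.0  # value quoted in the PR; the metric below is assumed


def goal_reached(state, goal_state, threshold=GOAL_THRESHOLD):
    """Return True when the flat state lies within `threshold` of the goal (L2)."""
    return math.dist(state, goal_state) < threshold


near = goal_reached([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])  # distance is about 1.73
far = goal_reached([0.0, 0.0, 0.0], [2.0, 2.0, 2.0])   # distance is about 3.46
```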

Creating an eval dataset

World-model MPC eval is wired (env callables + collect/train/eval configs), but, exactly like PushT, it runs on a prepared dataset, not raw World.collect output. The dataset must be in the benchmark layout: per-row episode_idx/step_idx + ep_len/ep_offset, with pixels (HWC), action, proprio, and state (the flat ManiSkill sim state used by _set_state), the same format as the provided pusht_expert_train.h5. World.collect output isn't natively in that layout (Lance hides the index columns; HDF5 omits them), so the eval dataset is produced/provided separately. The trained checkpoint's config.json must be the model block (save_pretrained(config_key='model')). Real Open-X datasets (DROID / BridgeData V2) are for training policies, not for MPC eval: they carry no ManiSkill sim state to restore.
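A minimal sketch of how the index columns named above might be derived from a list of episode lengths. The exact semantics are an assumption: here `episode_idx`/`step_idx` are per-row, while `ep_len`/`ep_offset` are per-episode, with `ep_offset` taken to be the first row of each episode in the flattened table. `build_index_columns` is a hypothetical helper, not part of SWM.

```python
from itertools import accumulate


def build_index_columns(episode_lengths):
    """Derive per-row and per-episode index columns from episode lengths."""
    ep_len = list(episode_lengths)
    # Assumed meaning: ep_offset[i] is the first row of episode i.
    ep_offset = [0, *accumulate(ep_len)][:-1]
    episode_idx, step_idx = [], []
    for ep, length in enumerate(ep_len):
        episode_idx.extend([ep] * length)
        step_idx.extend(range(length))
    return {
        "episode_idx": episode_idx,
        "step_idx": step_idx,
        "ep_len": ep_len,
        "ep_offset": ep_offset,
    }


cols = build_index_columns([3, 2])
# cols["episode_idx"] == [0, 0, 0, 1, 1]
# cols["step_idx"]    == [0, 1, 2, 0, 1]
# cols["ep_offset"]   == [0, 3]
```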

Verification (A100, CUDA 12.8, NumPy 2)

pytest tests/envs/test_maniskill.py → 5 passed (registry + variation-space CPU tests + 2 GPU rollouts). On CPU/CI without ManiSkill the GPU tests skip.
FoV per-factor pixel-diffs above; render() frames visually confirmed (real Panda/WidowX scenes).
collect→train→eval verified on an H100 (LeWM goal-reaching 2% / 1-of-50 on swm/MSPickCube-v0).

Notes

Panda tasks default to native joint control; pass control_mode='pd_ee_delta_pose' for a uniform 7-D EE action.
Google Robot SIMPLER tasks aren't ported to ManiSkill3 yet (out of scope).

🤖 Generated with Claude Code

Wrap ManiSkill3 (SAPIEN) manipulation tasks as standard SWM gym envs via a single robot-agnostic ManiSkillWrapper, registering two stationary-arm families: Franka Panda table-top manipulation (swm/MS*) and the SIMPLER / real2sim Bridge digital twins on WidowX (swm/Simpler*).
- ManiSkillWrapper mirrors FetchWrapper: wraps gym.make at num_envs=1, debatches torch->numpy, maps the native success detector onto `terminated` (what World.evaluate scores), and lifts env_name/proprio/state/instruction into info. Robot-agnostic: proprio flattens obs['agent'], the render camera auto-detects, and **kwargs pass through to gym.make. mani_skill is imported lazily (GPU-only).
- tasks.py holds a declarative TASK_SPECS registry; adding a robot/task is a one-line entry. envs/__init__.py registers all ids in one generic loop.
- Optional [maniskill] extra (mani-skill), kept out of [all] (GPU-only).
- Tests: CPU-safe registry checks + GPU rollout test that skips without mani_skill installed. Docs page + mkdocs nav entry.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
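The lazy-import behaviour mentioned in this commit can be sketched with the standard library alone. The helper names `maniskill_available` and `load_maniskill` are hypothetical; only the package name `mani_skill` comes from the commit message.

```python
import importlib
import importlib.util


def maniskill_available():
    """True if the optional mani_skill package can be located, without importing it."""
    return importlib.util.find_spec("mani_skill") is not None


def load_maniskill():
    """Import mani_skill on first use so the parent package stays CPU-importable."""
    if not maniskill_available():
        raise ImportError(
            "mani_skill is not installed; install the optional ManiSkill dependency."
        )
    return importlib.import_module("mani_skill")
```

Calling `load_maniskill()` only inside the wrapper's constructor keeps a plain `import stable_worldmodel` free of GPU-only imports.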

Verified on a Lambda H100 (CUDA 12.8, driver 580.105.08): both families render real (224,224,3) frames, step with scalar reward + bool terminated, expose success/instruction, and run end-to-end through World.evaluate. pytest tests/envs/test_maniskill.py: 4 passed (2 CPU + 2 GPU).
Two fixes surfaced against the real ManiSkill3 API:
- render_mode is a read-only property on gym.Wrapper; pass it to gym.make instead of assigning it on the wrapper.
- The Bridge digital-twin tasks only support obs_mode='rgb+segmentation' (not plain 'rgb'); set it per-task in TASK_SPECS.
Docs: note first-run asset downloads (scene + WidowX robot; public, no token).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Set MS_SKIP_ASSET_DOWNLOAD_PROMPT=1 (via os.environ.setdefault) in the wrapper so missing scene/robot assets download automatically on first gym.make instead of prompting — the interactive prompt raises EOFError under headless/non-interactive stdin. Overridable: set the var to 0 to be prompted. Verified on the H100 by deleting the WidowX robot asset and re-creating the Bridge env non-interactively (auto-downloaded, no EOFError). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
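The override behaviour described in this commit follows directly from how `os.environ.setdefault` works. A minimal illustration (the variable name is from the commit; the surrounding code is only a sketch, not the wrapper itself):

```python
import os

# setdefault only writes the value when the variable is unset, so a user who
# exports MS_SKIP_ASSET_DOWNLOAD_PROMPT=0 beforehand keeps the interactive prompt.
os.environ.setdefault("MS_SKIP_ASSET_DOWNLOAD_PROMPT", "1")

skip_prompt = os.environ["MS_SKIP_ASSET_DOWNLOAD_PROMPT"]
```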

Note in the ManiSkill docs that ManiSkill's motion-planning solvers (mplib==0.1.1, pinned on Linux) segfault under NumPy 2 due to a NumPy-1.x ABI mismatch in mplib's compiled extension. Clarify this does NOT affect the integration — the simulator, wrapper, rendering, and World.evaluate all run on NumPy 2; only the optional MP trajectory generators are hit — and give the workarounds (numpy<2 env, or replay download_demo datasets). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Remove the mplib/NumPy-2 limitation admonition: ManiSkill's motion-planning solvers are not part of this integration's intended workflow, so calling out their NumPy-2 incompatibility was more confusing than useful. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resolves PR feedback on the ManiSkill envs.
Factors of Variation: replace the empty variation_space with applied, verified visual FoV (SIMPLER-aligned), each confirmed to change the rendered frame on an A100 via reset(options={'variation_values': ...}):
- light.intensity -> scene.set_ambient_light (17074 px)
- camera.angle_delta -> render-camera set_local_pose (15300 px)
- object.color -> cube material set_base_color (102 px)
- rendering.transparent_arm -> robot-link material alpha (5129 px)
Applied in _apply_variations using ManiSkill's own SAPIEN idioms (cf. digital_twins/so100_arm/grasp_cube.py), then scene.update_render()+get_obs() so the frame reflects the change. Sampled values flow to info['variation.*'].
table.color/background.color are deferred: those surfaces are textured, so set_base_color is a no-op (verified); they need a texture swap / greenscreen.
Comment style: remove the `# -----` dividers (env.py) and `# --- X ---` section labels (tasks.py).
Success rate: add scripts/examples/maniskill_demo_replay.py, which restores each ManiSkill demo's initial env_state and replays its actions through the wrapper. PickCube motionplanning demos reproduce 20/20 = 100%, confirming the env + success->terminated wiring yields and detects real successes.
Verified on an A100 (CUDA 12.8): 5/5 pytest, FoV pixel-diffs above, demo replay 100%.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Missed two `# --- ... ---` section dividers in tests/envs/test_maniskill.py during the earlier comment cleanup (which only touched env.py/tasks.py). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a "World-model (MPC) evaluation — follow-up" note to the ManiSkill docs: what eval_wm.py needs (a sim-collected dataset in the pusht_expert_train.h5 layout: episode_idx/step_idx/ep_len/ep_offset + HWC pixels + flat state; goal-conditioning callables; a model-block checkpoint config), the validated 2% goal-reaching result, and that real Open-X datasets (DROID/Bridge) are for training policies, not MPC-eval datasets. Eval scaffolding lives on the maniskill-wm-eval branch for the follow-up PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Fold the collect/train/eval scaffolding into the env integration:
- env.py: _set_state/_set_goal_state + flat get_state() recording + goal-distance terminated (PushT-style goal-conditioned MPC eval). Validated on H100 (exact state round-trip; goal-distance termination fires).
- scripts/data/collect_maniskill.py + config: RandomPolicy collection on PickCube.
- scripts/train/config/data/maniskill.yaml: LeWM training data config.
- scripts/plan/config/maniskill.yaml: goal-conditioned eval config.
Docs: fold the world-model MPC result (LeWM 2% / 1-of-50 goal-reaching on swm/MSPickCube-v0) and the collect->train->eval setup into the Success rate section; note that the MPC eval consumes a dataset in the benchmark layout (episode_idx/step_idx/ep_len/ep_offset + HWC pixels + flat state), supplied separately like PushT's pusht_expert_train.h5, and that the checkpoint config must be the model block (config_key='model').
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The world-model MPC eval does not run on raw collect output. Mark the config as a template and point it at the artifacts it actually needs:
- header note: requires a benchmark-format dataset (episode_idx/step_idx/ep_len/ep_offset + HWC pixels + flat state, like pusht_expert_train.h5) and a model-block checkpoint config (save_pretrained config_key='model').
- eval.dataset_name: maniskill/mspickcube_random.lance -> maniskill/mspickcube_eval.h5 (collect output isn't readable by eval; supply a prepared dataset).
- policy: lewm_mspickcube -> lewm_mspickcube/weights_epoch_30.pt (a bare run name is ambiguous across epochs).
- docs: note that scripts/plan/config/maniskill.yaml is a template.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CI's `uv sync --all-extras` installed the GPU-only maniskill extra, pulling mani-skill -> mplib -> toppra. No toppra version ships cp311/cp312 wheels (0.6.8 is cp310-only, no sdist), so the install failed on 3.11/3.12; on 3.10 it installed and the GPU rollout tests ran and crashed on missing Vulkan.
mani-skill is GPU-only (CUDA + Vulkan) with no clean wheel install on linux py3.11/3.12, so it doesn't belong in `--all-extras`. Move it from [project.optional-dependencies] to a [dependency-groups] `maniskill` group (same mechanism as `dev`); `--all-extras` ignores non-default groups, so CI resolves cleanly with no workflow change. Opt in via `uv sync --group maniskill`.
Also:
- test: skip the rollout test when no CUDA is available (defensive: it needs a GPU/Vulkan even if the group is installed locally).
- docs: install via `uv sync --group maniskill`; note it's an opt-in group.
Verified: `uv export --all-extras` excludes mani-skill/mplib/toppra; `uv export --group maniskill` includes them; uv.lock unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

qhua360 and others added 11 commits June 8, 2026 23:45

Remove remaining # --- divider comments in the maniskill test

b562759


qhua360 wants to merge 11 commits into galilai-group:main from qhua360:add-maniskill-envs
