Add ManiSkill3 environments (Franka Panda + SIMPLER Bridge)#251

Open
qhua360 wants to merge 11 commits into galilai-group:main from qhua360:add-maniskill-envs

qhua360 commented Jun 9, 2026

Summary

Adds ManiSkill3 (SAPIEN) manipulation tasks as standard SWM environments via a single robot-agnostic wrapper. Two stationary-arm families:

  • Franka Panda table-top manipulation (swm/MS*) — PickCube, PushCube, PullCube, PokeCube, StackCube, LiftPegUpright, PegInsertionSide, PlugCharger, PickSingleYCB, RollBall.
  • SIMPLER / real2sim Bridge digital twins (WidowX, swm/Simpler*) — carrot-on-plate, spoon-on-towel, stack-cube, eggplant-in-basket.

Includes data collection (collect_maniskill.py) and world-model eval support (a LeWM training data config plus a goal-conditioned MPC eval config). Policy eval also works through World.evaluate with SWM's policies/solvers, scored by each task's native success detector (mapped onto terminated).

Design

  • ManiSkillWrapper(gym.Wrapper) mirrors FetchWrapper: wraps gym.make(..., num_envs=1), debatches torch→numpy, maps info['success'] → terminated, lifts env_name/proprio/state/instruction into info, renders (H,W,3) uint8. No robot-specific branching.
  • TASK_SPECS is the single extension point — one line per task/embodiment.
  • Goal-conditioned eval: _set_state/_set_goal_state restore/compare the flat ManiSkill sim state (info['state']), mirroring the PushT env, so SWM's MPC eval (eval_wm.py) works.
  • Opt-in maniskill dependency group (GPU-only — needs CUDA+Vulkan; excluded from uv sync --all-extras/[all], installed via uv sync --group maniskill), lazily imported so import stable_worldmodel stays CPU-importable.
  • Assets auto-download on first use (MS_SKIP_ASSET_DOWNLOAD_PROMPT=1).
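A minimal sketch of the per-step debatching described in the Design bullets, written against plain Python containers so it runs without ManiSkill. `postprocess_step` and `_to_host` are hypothetical helpers, not ManiSkillWrapper's actual methods; torch tensors are detected by duck typing (`.detach()`), and falling back to the native `terminated` flag when `info` has no `success` key is an assumption.

```python
from typing import Any


def _to_host(x: Any) -> Any:
    """Drop the num_envs=1 batch dim, converting torch tensors to NumPy first (duck-typed)."""
    if hasattr(x, "detach"):  # torch.Tensor -> NumPy array on the host
        x = x.detach().cpu().numpy()
    if isinstance(x, dict):
        return {k: _to_host(v) for k, v in x.items()}
    if isinstance(x, (list, tuple)) or getattr(x, "ndim", 0) > 0:
        return x[0]  # batched values carry a leading dim of size 1
    return x  # 0-d values and strings: nothing to strip


def postprocess_step(obs, reward, terminated, truncated, info, env_name: str):
    """Turn one batched (num_envs=1) step into a single-env Gymnasium step tuple."""
    obs, info = _to_host(obs), _to_host(info)
    reward = float(_to_host(reward))
    truncated = bool(_to_host(truncated))
    # The task's native success detector becomes `terminated` (what World.evaluate scores).
    terminated = bool(info.get("success", _to_host(terminated)))
    info["env_name"] = env_name  # lift the env name into info
    return obs, reward, terminated, truncated, info
```

Keeping this logic in a pure function (rather than branching on the robot) is what lets one wrapper serve both the Panda and WidowX families.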

Factors of Variation

The env declares a variation_space of visual distribution-shift knobs (SIMPLER-aligned). _apply_variations applies them using ManiSkill's own SAPIEN idioms (cf. digital_twins/so100_arm/grasp_cube.py), then calls scene.update_render() and get_obs() so the returned frame reflects them; sampled values flow into info['variation.*']. Each factor was verified on an A100 to change the rendered frame via reset(options={'variation_values': ...}):

Factor                       Mechanism                        Δ pixels
light.intensity              scene.set_ambient_light          17074
camera.angle_delta           render-camera set_local_pose     15300
object.color                 cube material set_base_color     102 (small object)
rendering.transparent_arm    robot-link material alpha        5129
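One plausible way to compute a Δ-pixel count like those in the table, assuming it counts positions whose RGB value differs between a default render and a varied render of the same (H, W, 3) frame. The PR does not state the exact metric, and `changed_pixel_count` is a hypothetical helper.

```python
def changed_pixel_count(frame_a, frame_b) -> int:
    """Count (row, col) positions whose RGB triple differs between two (H, W, 3) frames.

    NumPy equivalent for uint8 arrays: int((a != b).any(axis=-1).sum()).
    """
    if len(frame_a) != len(frame_b) or len(frame_a[0]) != len(frame_b[0]):
        raise ValueError("frames must have the same (H, W) shape")
    return sum(
        1
        for row_a, row_b in zip(frame_a, frame_b)
        for px_a, px_b in zip(row_a, row_b)
        if tuple(px_a) != tuple(px_b)
    )
```

Under this reading, the small object.color count (102) is expected: only the cube's pixels can change.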

Deferred (documented): table.color/background.color. Those surfaces are textured, so set_base_color is a verified no-op; they need a texture swap or greenscreen. Distractors and object pose as a settable factor are follow-ups (object pose is already seed-randomized per episode).

Success rate

Demo replay. No trained model is bundled, so scripts/examples/maniskill_demo_replay.py replays ManiSkill's official demonstrations through the wrapper — restoring each demo's initial env_state then replaying its recorded actions, reporting success_rate:

PickCube-v1 (motionplanning demos, pd_joint_pos): success_rate = 20/20 = 100.0%

This confirms the env + success→terminated wiring yields and detects real successes (vs. ~0% from a random policy). (Reproduction uses the initial env_state, not the seed — batched-GPU demos aren't seed-reproducible; the absolute-joint demos replay deterministically.)
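The replay loop described above can be sketched roughly as follows. The `ReplayEnv` protocol (`restore_state`, `step`) and `replay_success_rate` are hypothetical stand-ins for the script's real calls; counting a demo as successful at its first `terminated=True` step is an assumption based on the success→terminated mapping.

```python
from typing import Any, Protocol, Sequence


class ReplayEnv(Protocol):
    """Hypothetical minimal interface: restore a saved sim state, then step actions."""

    def restore_state(self, state: Any) -> None: ...

    def step(self, action: Any) -> tuple: ...


def replay_success_rate(env: ReplayEnv, demos: Sequence[tuple[Any, Sequence[Any]]]) -> float:
    """Replay (initial_state, actions) demos and return the fraction that succeed."""
    successes = 0
    for initial_state, actions in demos:
        env.restore_state(initial_state)  # restore the recorded env_state, not a seed
        for action in actions:
            _, _, terminated, truncated, _ = env.step(action)
            if terminated:  # success is mapped onto `terminated`
                successes += 1
                break
            if truncated:
                break
    return successes / len(demos) if demos else 0.0
```

Restoring state rather than reseeding matters here because, as noted above, batched-GPU demos are not seed-reproducible.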

World-model MPC. A LeWM world model trained on a collected PickCube dataset and evaluated with goal-conditioned MPC reached 2% (1/50) goal-reaching success on swm/MSPickCube-v0 (random-policy data, 30 epochs, goal threshold 2.0 on the flat sim state) — a real but low baseline:

python scripts/data/collect_maniskill.py
python scripts/train/lewm.py data=maniskill output_model_name=lewm_mspickcube wandb.enabled=false
python scripts/plan/eval_wm.py --config-name maniskill

Creating an eval dataset

World-model MPC eval is wired up (env callables plus collect/train/eval configs), but, exactly like PushT, it runs on a prepared dataset rather than raw World.collect output. The dataset must use the benchmark layout: per-row episode_idx/step_idx plus ep_len/ep_offset, with pixels (HWC), action, proprio, and state (the flat ManiSkill sim state used by _set_state), i.e. the same format as the provided pusht_expert_train.h5. World.collect output isn't natively in that layout (Lance hides the index columns; HDF5 omits them), so the eval dataset is produced or provided separately. The trained checkpoint's config.json must be the model block (save_pretrained(config_key='model')). Real Open-X datasets (DROID / BridgeData V2) are for training policies, not for MPC eval, since they carry no ManiSkill sim state to restore.
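Assuming rows are stored grouped by episode, the index columns can be derived from per-row `episode_idx` as below. `episode_index` is a hypothetical helper, and returning `ep_len`/`ep_offset` as per-episode lists (rather than broadcast per row) is an assumption about the benchmark layout.

```python
from itertools import groupby


def episode_index(episode_idx: list[int]) -> tuple[list[int], list[int], list[int]]:
    """From per-row episode ids (contiguous per episode), return (step_idx, ep_len, ep_offset).

    step_idx has one entry per row; ep_len and ep_offset have one entry per episode.
    """
    step_idx: list[int] = []
    ep_len: list[int] = []
    ep_offset: list[int] = []
    offset = 0
    for _, rows in groupby(episode_idx):  # groups consecutive equal ids
        n = sum(1 for _ in rows)
        ep_offset.append(offset)  # first row of this episode
        ep_len.append(n)
        step_idx.extend(range(n))  # 0..n-1 within the episode
        offset += n
    return step_idx, ep_len, ep_offset
```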

Verification (A100, CUDA 12.8, NumPy 2)

  • pytest tests/envs/test_maniskill.py: 5 passed (registry + variation-space CPU tests + 2 GPU rollouts). On CPU/CI without ManiSkill, the GPU tests skip.
  • FoV per-factor pixel-diffs above; render() frames visually confirmed (real Panda/WidowX scenes).
  • collect→train→eval verified on an H100 (LeWM goal-reaching 2% / 1-of-50 on swm/MSPickCube-v0).
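The CPU/CI skip behavior can be approximated with a guard like this (a sketch; `maniskill_gpu_available` is a hypothetical name). It checks that `mani_skill` and `torch` are importable and that CUDA is visible, but it does not probe Vulkan, which the rollouts also need.

```python
import importlib.util


def maniskill_gpu_available() -> bool:
    """True only if mani_skill and torch are importable and a CUDA device is visible."""
    if importlib.util.find_spec("mani_skill") is None:
        return False
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so CPU-only environments never pay for it

    return bool(torch.cuda.is_available())


# In a pytest module one might write (assumed usage):
# requires_gpu = pytest.mark.skipif(not maniskill_gpu_available(), reason="needs mani_skill + CUDA")
```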

Notes

  • Panda tasks default to native joint control; pass control_mode='pd_ee_delta_pose' for a uniform 7-D EE action.
  • Google Robot SIMPLER tasks aren't ported to ManiSkill3 yet (out of scope).
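For the `pd_ee_delta_pose` note, a hypothetical helper that packs the 7-D action, assuming the layout is 3 position deltas, 3 rotation deltas, then 1 gripper command, normalized to [-1, 1]. Check `env.action_space` on the real env before relying on this.

```python
def ee_delta_action(dpos, drot, gripper) -> list[float]:
    """Pack an assumed 7-D pd_ee_delta_pose action: [dx, dy, dz, drx, dry, drz, gripper]."""
    if len(dpos) != 3 or len(drot) != 3:
        raise ValueError("dpos and drot must each have 3 components")

    def clip(v) -> float:
        return max(-1.0, min(1.0, float(v)))  # assumed normalized action bounds

    return [clip(v) for v in (*dpos, *drot, gripper)]


# Assumed usage: extra kwargs are forwarded by the wrapper to ManiSkill's gym.make.
# env = gym.make("swm/MSPickCube-v0", control_mode="pd_ee_delta_pose")
```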

🤖 Generated with Claude Code

qhua360 and others added 11 commits June 8, 2026 23:45
Wrap ManiSkill3 (SAPIEN) manipulation tasks as standard SWM gym envs via
a single robot-agnostic ManiSkillWrapper, registering two stationary-arm
families: Franka Panda table-top manipulation (swm/MS*) and the SIMPLER /
real2sim Bridge digital twins on WidowX (swm/Simpler*).

- ManiSkillWrapper mirrors FetchWrapper: wraps gym.make at num_envs=1,
  debatches torch->numpy, maps the native success detector onto
  `terminated` (what World.evaluate scores), and lifts
  env_name/proprio/state/instruction into info. Robot-agnostic: proprio
  flattens obs['agent'], the render camera auto-detects, and **kwargs pass
  through to gym.make. mani_skill is imported lazily (GPU-only).
- tasks.py holds a declarative TASK_SPECS registry; adding a robot/task is
  a one-line entry. envs/__init__.py registers all ids in one generic loop.
- Optional [maniskill] extra (mani-skill), kept out of [all] (GPU-only).
- Tests: CPU-safe registry checks + GPU rollout test that skips without
  mani_skill installed. Docs page + mkdocs nav entry.
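A sketch of what a declarative registry plus one generic registration loop could look like. `TaskSpec`, `register_all`, the `env_id` kwarg, and the Bridge entry's ids are illustrative assumptions (only `swm/MSPickCube-v0` → `PickCube-v1` and the Bridge `obs_mode` come from this PR). `register` is injected so the loop runs without Gymnasium, though `gymnasium.register` accepts the same `id`/`entry_point`/`kwargs` keywords.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TaskSpec:
    swm_id: str  # id exposed by SWM
    maniskill_id: str  # underlying ManiSkill env id
    kwargs: dict = field(default_factory=dict)  # per-task gym.make overrides


TASK_SPECS = [
    TaskSpec("swm/MSPickCube-v0", "PickCube-v1"),
    # Hypothetical Bridge entry; the digital twins need obs_mode="rgb+segmentation".
    TaskSpec("swm/SimplerCarrotOnPlate-v0", "PutCarrotOnPlateInScene-v1", {"obs_mode": "rgb+segmentation"}),
]


def register_all(register: Callable[..., Any], entry_point: str) -> list[str]:
    """Register every spec through one generic loop; returns the registered ids."""
    ids = []
    for spec in TASK_SPECS:
        register(id=spec.swm_id, entry_point=entry_point, kwargs={"env_id": spec.maniskill_id, **spec.kwargs})
        ids.append(spec.swm_id)
    return ids
```

Adding a task or robot then means appending one `TaskSpec` line; the loop itself never changes.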

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Verified on a Lambda H100 (CUDA 12.8, driver 580.105.08): both families
render real (224,224,3) frames, step with scalar reward + bool terminated,
expose success/instruction, and run end-to-end through World.evaluate.
pytest tests/envs/test_maniskill.py: 4 passed (2 CPU + 2 GPU).

Two fixes surfaced against the real ManiSkill3 API:
- render_mode is a read-only property on gym.Wrapper; pass it to gym.make
  instead of assigning it on the wrapper.
- The Bridge digital-twin tasks only support obs_mode='rgb+segmentation'
  (not plain 'rgb'); set it per-task in TASK_SPECS.

Docs: note first-run asset downloads (scene + WidowX robot; public, no token).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Set MS_SKIP_ASSET_DOWNLOAD_PROMPT=1 (via os.environ.setdefault) in the
wrapper so missing scene/robot assets download automatically on first
gym.make instead of prompting — the interactive prompt raises EOFError
under headless/non-interactive stdin. Overridable: set the var to 0 to be
prompted. Verified on the H100 by deleting the WidowX robot asset and
re-creating the Bridge env non-interactively (auto-downloaded, no EOFError).
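A minimal sketch of the lazy import with the `setdefault` behavior described above. `import_maniskill` is a hypothetical name; importing `mani_skill.envs` is the usual way to register ManiSkill's env ids, and the error message simply points at the opt-in install.

```python
import importlib
import os


def import_maniskill():
    """Import mani_skill only when a ManiSkill env is built (keeps the package CPU-importable)."""
    # Auto-download missing assets instead of prompting (the prompt raises EOFError
    # on non-interactive stdin). setdefault keeps an explicit user value such as "0".
    os.environ.setdefault("MS_SKIP_ASSET_DOWNLOAD_PROMPT", "1")
    try:
        return importlib.import_module("mani_skill.envs")  # importing registers env ids
    except ImportError as err:
        raise ImportError("ManiSkill envs need the opt-in group: uv sync --group maniskill") from err
```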

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Note in the ManiSkill docs that ManiSkill's motion-planning solvers
(mplib==0.1.1, pinned on Linux) segfault under NumPy 2 due to a NumPy-1.x
ABI mismatch in mplib's compiled extension. Clarify this does NOT affect
the integration — the simulator, wrapper, rendering, and World.evaluate
all run on NumPy 2; only the optional MP trajectory generators are hit —
and give the workarounds (numpy<2 env, or replay download_demo datasets).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Remove the mplib/NumPy-2 limitation admonition: ManiSkill's motion-planning
solvers are not part of this integration's intended workflow, so calling out
their NumPy-2 incompatibility was more confusing than useful.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resolves PR feedback on the ManiSkill envs.

Factors of Variation: replace the empty variation_space with applied,
verified visual FoV (SIMPLER-aligned), each confirmed to change the rendered
frame on an A100 via reset(options={'variation_values': ...}):
  - light.intensity   -> scene.set_ambient_light        (17074 px)
  - camera.angle_delta -> render-camera set_local_pose   (15300 px)
  - object.color      -> cube material set_base_color    (  102 px)
  - rendering.transparent_arm -> robot-link material alpha (5129 px)
Applied in _apply_variations using ManiSkill's own SAPIEN idioms (cf.
digital_twins/so100_arm/grasp_cube.py), then scene.update_render()+get_obs()
so the frame reflects the change. Sampled values flow to info['variation.*'].
table.color/background.color are deferred: those surfaces are textured, so
set_base_color is a no-op (verified) — they need a texture swap / greenscreen.
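Flattening sampled values into `info['variation.*']` keys can be sketched as follows. `variation_info` is hypothetical, and whether `variation_values` is passed as nested or dotted keys is an assumption; the helper accepts both.

```python
from collections.abc import Mapping


def variation_info(values: Mapping, prefix: str = "variation") -> dict:
    """Flatten sampled variation values into 'variation.<factor>' info keys."""
    out: dict = {}
    for key, val in values.items():
        name = f"{prefix}.{key}"
        if isinstance(val, Mapping):  # nested form, e.g. {"light": {"intensity": 0.3}}
            out.update(variation_info(val, name))
        else:  # leaf value, e.g. {"light.intensity": 0.3}
            out[name] = val
    return out


# Hypothetical usage (per the PR, values are supplied through reset options):
# obs, info = env.reset(options={"variation_values": {"light.intensity": 0.3}})
```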

Comment style: remove the `# -----` dividers (env.py) and `# --- X ---`
section labels (tasks.py).

Success rate: add scripts/examples/maniskill_demo_replay.py — restores each
ManiSkill demo's initial env_state and replays its actions through the wrapper.
PickCube motionplanning demos reproduce 20/20 = 100%, confirming the env +
success->terminated wiring yields and detects real successes.

Verified on an A100 (CUDA 12.8): 5/5 pytest, FoV pixel-diffs above, demo
replay 100%.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Missed two `# --- ... ---` section dividers in tests/envs/test_maniskill.py
during the earlier comment cleanup (which only touched env.py/tasks.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a "World-model (MPC) evaluation — follow-up" note to the ManiSkill docs:
what eval_wm.py needs (a sim-collected dataset in the pusht_expert_train.h5
layout: episode_idx/step_idx/ep_len/ep_offset + HWC pixels + flat state; goal-
conditioning callables; a model-block checkpoint config), the validated 2%
goal-reaching result, and that real Open-X datasets (DROID/Bridge) are for
training policies, not MPC-eval datasets. Eval scaffolding lives on the
maniskill-wm-eval branch for the follow-up PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Fold the collect/train/eval scaffolding into the env integration:
- env.py: _set_state/_set_goal_state + flat get_state() recording + goal-distance
  terminated (PushT-style goal-conditioned MPC eval). Validated on H100 (exact
  state round-trip; goal-distance termination fires).
- scripts/data/collect_maniskill.py + config: RandomPolicy collection on PickCube.
- scripts/train/config/data/maniskill.yaml: LeWM training data config.
- scripts/plan/config/maniskill.yaml: goal-conditioned eval config.
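The goal-distance termination can be sketched on flat state vectors as below. The 2.0 threshold is the one quoted for the swm/MSPickCube-v0 run; using Euclidean distance and an inclusive `<=` comparison are assumptions, and `goal_reached` is a hypothetical helper.

```python
import math
from typing import Sequence

GOAL_THRESHOLD = 2.0  # threshold quoted for the swm/MSPickCube-v0 MPC run


def goal_reached(state: Sequence[float], goal_state: Sequence[float], threshold: float = GOAL_THRESHOLD) -> bool:
    """Goal-distance termination on flat sim states (Euclidean distance is an assumption)."""
    if len(state) != len(goal_state):
        raise ValueError("state and goal_state must have the same length")
    return math.dist(state, goal_state) <= threshold
```

Because the flat state mixes positions, orientations, and velocities, a single scalar threshold is a coarse criterion, which is one reason a low success rate on it is hard to interpret in isolation.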

Docs: fold the world-model MPC result (LeWM 2% / 1-of-50 goal-reaching on
swm/MSPickCube-v0) and the collect->train->eval setup into the Success rate
section; note that the MPC eval consumes a dataset in the benchmark layout
(episode_idx/step_idx/ep_len/ep_offset + HWC pixels + flat state), supplied
separately like PushT's pusht_expert_train.h5, and that the checkpoint config
must be the model block (config_key='model').

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The world-model MPC eval does not run on raw collect output. Mark the config as
a template and point it at the artifacts it actually needs:
- header note: requires a benchmark-format dataset (episode_idx/step_idx/ep_len/
  ep_offset + HWC pixels + flat state, like pusht_expert_train.h5) and a
  model-block checkpoint config (save_pretrained config_key='model').
- eval.dataset_name: maniskill/mspickcube_random.lance -> maniskill/mspickcube_eval.h5
  (collect output isn't readable by eval; supply a prepared dataset).
- policy: lewm_mspickcube -> lewm_mspickcube/weights_epoch_30.pt (a bare run name
  is ambiguous across epochs).
- docs: note that scripts/plan/config/maniskill.yaml is a template.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CI's `uv sync --all-extras` installed the GPU-only maniskill extra, pulling
mani-skill -> mplib -> toppra. No toppra version ships cp311/cp312 wheels
(0.6.8 is cp310-only, no sdist), so the install failed on 3.11/3.12; on 3.10 it
installed and the GPU rollout tests ran and crashed on missing Vulkan.

mani-skill is GPU-only (CUDA + Vulkan) with no clean wheel install on linux
py3.11/3.12, so it doesn't belong in `--all-extras`. Move it from
[project.optional-dependencies] to a [dependency-groups] `maniskill` group
(same mechanism as `dev`); `--all-extras` ignores non-default groups, so CI
resolves cleanly with no workflow change. Opt in via `uv sync --group maniskill`.

Also:
- test: skip the rollout test when no CUDA is available (defensive — it needs a
  GPU/Vulkan even if the group is installed locally).
- docs: install via `uv sync --group maniskill`; note it's an opt-in group.

Verified: `uv export --all-extras` excludes mani-skill/mplib/toppra;
`uv export --group maniskill` includes them; uv.lock unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>