AbdelStark
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 96 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 43 additions & 19 deletions
diff --git a/crates/pwm-testkit/src/bin/pwm.rs b/crates/pwm-testkit/src/bin/pwm.rs
Lines changed: 87 additions & 30 deletions
@@ -0,0 +1,96 @@
+# Agent context
+
+Context for coding agents working in this repository. Keep it current when public
+behavior changes.
+
+## What this is
+
+ProvableWorldModel is a commit-and-audit proof system: anyone can verify, on a CPU
+with no floating point, that a committed quantized
+[le-wm](https://github.com/lucas-maes/le-wm) JEPA world model was run exactly as
+claimed. It adapts the [CommitLLM](https://github.com/lambdaclass/CommitLLM)
+scheme (Freivalds matmul checks plus exact integer replay, Merkle commitments, a
+Fiat-Shamir transcript) from language models to a world model, and because the
+model runs in exact integer fixed point it closes CommitLLM's one open hole:
+non-reproducible attention. There is no proving circuit and no arithmetization.
+
+## Current state
+
+- The full le-wm V0 predictor (latent_dim 192, history 3, depth 6, 16 heads,
+  dim_head 64, mlp 2048; 2,437 ops) proves and verifies in the Rust prover. Per
+  ConditionalBlock: AdaLN-zero conditioning, 16-head self-attention, GELU
+  feed-forward, gated residuals, with the multi-head reshapes as Slice/Concat over
+  a named-buffer block DAG (`pwm-core::block`, `prove_block`/`verify_block`).
+- The exporter ingests the real `quentinll/lewm-pusht` checkpoint, takes a real
+  PushT expert episode from `lerobot/pusht`, encodes the real observation frames
+  through the checkpoint's own ViT encoder plus projector and the real expert
+  action through the action encoder, quantizes the full 192-dim V0 subgraph, folds
+  BatchNorm, and commits the manifest. `pwm prove-predictor <bundle>` then proves
+  and verifies the real weights on the real observation and action.
+- Tiers P0 (one predictor step), P1 (rollout), P2 (fixed-candidate planning) are
+  implemented and tested. P3 (full CEM planner) and P4 (pixel-to-plan, the ViT
+  encoder inside the proof) are deferred. The image encoder is the trusted offline
+  step today.
+- The proof attests the exact integer (quantized) relation, not float or PyTorch
+  equivalence. Per-tensor activation-scale calibration for float-faithful outputs
+  is a further refinement; activations stay int8 throughout the current scheme.
+- The argmin uniqueness check and the Freivalds probability bound are formally
+  verified in Lean 4 (no `sorry`), under `lean/`.
+
+## Layout
+
+Five small crates, one trust anchor. The verifier depends on neither the exporter
+nor any Python or float runtime.
+
+- `pwm-core`: fields (M31 value, Fp61 audit), fixed point, tensors, Merkle
+  commitments, Fiat-Shamir transcript, the Freivalds check, the block DAG, the
+  trace model. `no_std`.
+- `pwm-export`: checkpoint to quantized integer graph, manifest, golden vectors,
+  the Rust integer reference, the lerobot/pusht data adapter (Python under
+  `crates/pwm-export/python`).
+- `pwm-prover`: run the reference inference, commit the trace, derive challenges,
+  emit the `AuditArtifact`.
+- `pwm-verifier`: CPU, `no_std`, float-free. Freivalds-check the linears,
+  recompute the rest, check rollout, cost, argmin.
+- `pwm-testkit`: golden vectors, accept and reject suites, the mutation harness,
+  the `pwm` demo CLI (`crates/pwm-testkit/src/bin/pwm.rs`).
+
+## Run the demo
+
+```bash
+docker compose up --build                                  # full predictor, synthetic weights
+docker compose --profile real up --build export predictor-real   # real pretrained checkpoint
+docker compose --profile compact up --build prover verifier      # tiny two-party handoff
+```
+
+Without Docker:
+
+```bash
+cargo run -p pwm-testkit --bin pwm --release -- prove-predictor          # synthetic
+cargo run -p pwm-testkit --bin pwm --release -- prove-predictor <bundle> # real weights
+cargo test --workspace                                                   # accept + reject suites
+```
+
+The `pwm` CLI prints a four-stage pipeline (EXPORT, PROVE, VERIFY, TAMPER); it
+honors `NO_COLOR` and `--json`.
+
+## Conventions
+
+- Stable Rust toolchain (MSRV 1.85). No nightly. The verifier is `no_std` and
+  float-free; keep it that way.
+- Prose (README, demo docs, website, printed CLI output) uses no em-dashes. Rust
+  doc-comments may keep them per repo style.
+- Commits and PRs carry no tool attribution and no `Co-Authored-By` lines.
+- One logical change per branch and PR. Branch from `main`, open a PR, keep CI
+  green (fmt, clippy `-D warnings`, the full test suite, SPDX headers, link check,
+  the no-em-dash check). Confirm before any destructive git operation.
+- Keep complexity and file size as low as the task allows. Small, focused files.
+
+## Pointers
+
+- [README.md](README.md): overview and quickstart.
+- [demo/README.md](demo/README.md): the three demo modes and the real-checkpoint steps.
+- [specs.md](specs.md): the normative commit-and-audit specification.
+- [roadmap.md](roadmap.md): the plan, the pivot rationale, and the sequencing.
+- [website/](website/): the interactive explainer, deployed to GitHub Pages on
+  push to `main` via `.github/workflows/pages.yml`.
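
The AGENTS.md text above describes the verifier as Freivalds-checking the linear layers, and the demo output states the relation as `v·x == r·z`. As a reading aid, here is a minimal sketch of that check for one matrix-vector product `z = W x`. It is not the repository's implementation: the names `P`, `to_field`, `dot`, `precompute_v`, and `freivalds_check`, the row-major `Vec<Vec<i8>>` weight layout, and the use of 2^61 - 1 as the modulus are all assumptions made for illustration. The challenge `r` is passed in as a plain argument here, whereas the document says it is derived from the Fiat-Shamir transcript.

```rust
/// Illustrative audit modulus: the Mersenne prime 2^61 - 1 (an assumption).
const P: u128 = (1u128 << 61) - 1;

/// Map a signed integer into the field [0, P).
fn to_field(a: i64) -> u128 {
    (a as i128).rem_euclid(P as i128) as u128
}

/// Dot product of two field vectors, reduced mod P at every step.
fn dot(a: &[u128], b: &[u128]) -> u128 {
    a.iter()
        .zip(b)
        .fold(0u128, |acc, (&x, &y)| (acc + x * y % P) % P)
}

/// v = r^T W for a row-major W with `cols` columns.
/// This depends only on the weights and the challenge, not on the input x.
fn precompute_v(w: &[Vec<i8>], r: &[u128], cols: usize) -> Vec<u128> {
    (0..cols)
        .map(|j| {
            w.iter().zip(r).fold(0u128, |acc, (row, &ri)| {
                (acc + ri * to_field(row[j] as i64) % P) % P
            })
        })
        .collect()
}

/// Hypothetical Freivalds check: accept a claimed z = W x iff v·x == r·z (mod P).
fn freivalds_check(w: &[Vec<i8>], r: &[u64], x: &[i64], z: &[i64]) -> bool {
    // Shape checks: one challenge entry and one output entry per row of W.
    if w.len() != r.len() || w.len() != z.len() {
        return false;
    }
    if w.iter().any(|row| row.len() != x.len()) {
        return false;
    }
    let rf: Vec<u128> = r.iter().map(|&a| a as u128 % P).collect();
    let xf: Vec<u128> = x.iter().map(|&a| to_field(a)).collect();
    let zf: Vec<u128> = z.iter().map(|&a| to_field(a)).collect();
    let v = precompute_v(w, &rf, x.len());
    dot(&v, &xf) == dot(&rf, &zf)
}
```

With `W = [[1, 2], [3, 4]]`, `x = [5, 6]`, and `r = [7, 11]`, the honest output `z = [17, 39]` passes, while a forged `z = [17, 40]` fails. An honest `z` always passes because the integer identity `z = W x` survives reduction mod `P`. For a uniformly random `r`, a wrong `z` passes with probability at most `1/P`.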
@@ -40,16 +40,30 @@ docker compose up --build
 ```
 
 ```
-[prover] model   le-wm V0 predictor (6 blocks, 16 heads), synthetic weights (pass a bundle for real)
-[prover] config  dim=192, history=3, heads=16, dim_head=64, mlp=2048, depth=6
-[prover] weights 30 tensors, 10764288 int8 params
-[prover] graph   2437 ops over the named-buffer block DAG
-[prover] infer   exact integer forward pass in 36 ms
-[prover]   z_next[..6] [-2, -1, 0, 1, 2, -2]  (predicted next-latent head)
-[verifier] challenge  replayed the Fiat-Shamir transcript, derived the Freivalds r
-[verifier] checks     Freivalds v·x == r·z on every projection; exact recompute of attention, softmax, GELU, LayerNorm, residuals
-[verifier] ACCEPT  in 26 ms
-[verifier] tamper  forged matmul op Some(2) -> REJECT FreivaldsCheckFailed { op_id: 2 }
+ProvableWorldModel  commit-and-audit over the le-wm world model
+  pipeline   checkpoint -> quantize -> commit -> encode -> run -> prove -> verify
+
+[stage 1/4] EXPORT  offline, trusted
+  ├ model    le-wm V0 predictor (6 blocks, 16 heads), synthetic weights (pass a bundle for real)
+  ├ config   dim=192, history=3, heads=16, dim_head=64, mlp=2048, depth=6
+  ├ weights  30 tensors, 10,764,288 int8 params
+  ├ inputs   z_history [3x192], action embedding [3x192]
+  └ source   synthetic quantized latents
+
+[stage 2/4] PROVE  exact integer inference + commitment
+  ├ graph    2,437 ops over the named-buffer block DAG
+  │          per block: AdaLN-zero, 16-head attention, GELU FFN, gated residuals
+  ├ infer    exact integer forward pass in 131 ms
+  └ z_next   [-2, -1, 0, 1, 2, -2]  (predicted next-latent head)
+
+[stage 3/4] VERIFY  no_std, float-free
+  ├ challenge replayed the Fiat-Shamir transcript, derived the Freivalds r
+  ├ checks   Freivalds v·x == r·z on every projection
+  │          exact recompute of attention, softmax, GELU, LayerNorm, residuals
+  └ verdict  ACCEPT  in 39 ms
+
+[stage 4/4] TAMPER  forge one matmul output
+  └ forged matmul op 2 -> REJECT FreivaldsCheckFailed { op_id: 2 }  (caught)
 ```
 
 That default is the real architecture with synthetic weights (fast, no checkpoint).
@@ -95,15 +109,25 @@ cargo run -p pwm-testkit --bin pwm --release -- prove-predictor /tmp/lewm_predic
 ```
 
 ```
-[prover] model   le-wm V0 predictor (6 blocks, 16 heads), REAL quantized checkpoint weights
-[prover] config  dim=192, history=3, heads=16, dim_head=64, mlp=2048, depth=6
-[prover] inputs  z_history [3x192], action embedding [3x192]
-[prover]   source  real PushT expert episode (lerobot/pusht): 3 frames @ frameskip 5 -> ViT encoder; real 2D action + agent state
-[prover] graph   2437 ops over the named-buffer block DAG
-[prover] infer   exact integer forward pass in 34 ms
-[prover]   z_next[..6] [11, 55, 32, -73, -57, 13]  (predicted next-latent head)
-[verifier] ACCEPT  in 24 ms
-[verifier] tamper  forged matmul op Some(2) -> REJECT FreivaldsCheckFailed { op_id: 2 }
+[stage 1/4] EXPORT  offline, trusted
+  ├ model    le-wm V0 predictor (6 blocks, 16 heads), REAL quantized checkpoint weights
+  ├ source   quentinll/lewm-pusht (Hugging Face, MIT), int8-quantized V0 subgraph
+  ├ config   dim=192, history=3, heads=16, dim_head=64, mlp=2048, depth=6
+  ├ weights  30 tensors, 10,764,288 int8 params
+  ├ inputs   z_history [3x192], action embedding [3x192]
+  └ source   real PushT expert episode (lerobot/pusht): 3 frames @ frameskip 5 -> ViT encoder; real 2D action + agent state
+
+[stage 2/4] PROVE  exact integer inference + commitment
+  ├ graph    2,437 ops over the named-buffer block DAG
+  │          per block: AdaLN-zero, 16-head attention, GELU FFN, gated residuals
+  ├ infer    exact integer forward pass in 130 ms
+  └ z_next   [11, 55, 32, -73, -57, 13]  (predicted next-latent head)
+
+[stage 3/4] VERIFY  no_std, float-free
+  └ verdict  ACCEPT  in 38 ms
+
+[stage 4/4] TAMPER  forge one matmul output
+  └ forged matmul op 2 -> REJECT FreivaldsCheckFailed { op_id: 2 }  (caught)
 ```
 
 A real expert episode (consistent observation and action) goes through the
 
@@ -67,6 +67,35 @@ fn hex8(root: &[u8; 32]) -> String {
         .collect::<String>()
         + "..."
 }
+// --- pipeline visuals (box-drawing tree, respects NO_COLOR) ---
+fn commas(n: usize) -> String {
+    let s = n.to_string();
+    let b = s.as_bytes();
+    let mut out = String::new();
+    for (i, c) in b.iter().enumerate() {
+        if i > 0 && (b.len() - i) % 3 == 0 {
+            out.push(',');
+        }
+        out.push(*c as char);
+    }
+    out
+}
+fn stage(n: u32, total: u32, name: &str, sub: &str) {
+    println!(
+        "\n{} {}  {}",
+        paint("35;1", &format!("[stage {n}/{total}]")),
+        paint("1", name),
+        dim(sub)
+    );
+}
+// tree-branch connector: mid (├) for inner rows, end (└) for the last
+fn li(last: bool) -> String {
+    dim(if last { "  \u{2514}" } else { "  \u{251c}" })
+}
+// continuation line under a branch (│)
+fn cont() -> String {
+    dim("  \u{2502}")
+}
 fn next_latent(next: &[i64]) -> Vec<i64> {
     next.iter().skip((SEQ - 1) * DIM).copied().collect()
 }
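
The `commas` helper added in the hunk above produces the grouped figures in the new README output (`10,764,288` params, `2,437` ops). Below is a small standalone copy for checking it in isolation. The function body is reproduced from the diff with added comments. The wrapper `grouped_examples` is hypothetical and exists only to show the expected strings.

```rust
/// Format an unsigned integer with a comma between each group of three
/// digits, counted from the right (copied from the diff above).
fn commas(n: usize) -> String {
    let s = n.to_string();
    let b = s.as_bytes(); // decimal digits are ASCII, so bytes map 1:1 to chars
    let mut out = String::new();
    for (i, c) in b.iter().enumerate() {
        // Insert a separator when the number of remaining digits is a
        // multiple of three, except before the first digit.
        if i > 0 && (b.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(*c as char);
    }
    out
}

/// Hypothetical helper: the two counts printed by the demo CLI.
fn grouped_examples() -> (String, String) {
    (commas(10_764_288), commas(2_437))
}
```

Here `grouped_examples()` returns `("10,764,288", "2,437")`, and values below 1,000 are returned unchanged.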
@@ -501,80 +530,108 @@ fn main() {
                 "{}",
                 paint(
                     "36;1",
-                    "ProvableWorldModel: full le-wm predictor into prove_block\n"
+                    "ProvableWorldModel  commit-and-audit over the le-wm world model"
                 )
             );
-            println!("{} {}  {}", tag("prover"), paint("1", "model "), label);
+            println!(
+                "{}",
+                dim("  pipeline   checkpoint -> quantize -> commit -> encode -> run -> prove -> verify")
+            );
+
+            // --- stage 1: export (offline, trusted) ---
+            stage(1, 4, "EXPORT", "offline, trusted");
+            println!("{} model    {}", li(false), label);
             if is_real {
                 println!(
-                    "{} source  quentinll/lewm-pusht {}",
-                    tag("prover"),
+                    "{} source   quentinll/lewm-pusht {}",
+                    li(false),
                     dim("(Hugging Face, MIT), int8-quantized V0 subgraph")
                 );
             }
             println!(
-                "{} config  dim={}, history={}, heads={}, dim_head={}, mlp={}, depth={}  {}",
-                tag("prover"),
+                "{} config   dim={}, history={}, heads={}, dim_head={}, mlp={}, depth={}",
+                li(false),
                 dims.d,
                 dims.s,
                 dims.h,
                 dims.dh,
                 dims.mlp,
-                dims.depth,
-                dim("(self-attention + AdaLN + GELU FFN + residuals)")
+                dims.depth
             );
             println!(
-                "{} weights {} tensors, {} int8 params",
-                tag("prover"),
+                "{} weights  {} tensors, {} int8 params",
+                li(false),
                 weights.len(),
-                params
+                commas(params)
             );
             println!(
-                "{} inputs  z_history [{}x{}], action embedding [{}x{}]",
-                tag("prover"),
+                "{} inputs   z_history [{}x{}], action embedding [{}x{}]",
+                li(false),
                 dims.s,
                 dims.d,
                 dims.s,
                 dims.d
             );
-            println!("{}   source  {}", tag("prover"), dim(&input_source));
+            println!("{} source   {}", li(true), dim(&input_source));
+
+            // --- stage 2: prove (exact integer inference + commitment) ---
+            stage(2, 4, "PROVE", "exact integer inference + commitment");
             println!(
-                "{} graph   {} ops over the named-buffer block DAG",
-                tag("prover"),
-                proven.ops.len()
+                "{} graph    {} ops over the named-buffer block DAG",
+                li(false),
+                commas(proven.ops.len())
             );
             println!(
-                "{} infer   exact integer forward pass in {}",
-                tag("prover"),
+                "{}          {}",
+                cont(),
+                dim("per block: AdaLN-zero, 16-head attention, GELU FFN, gated residuals")
+            );
+            println!(
+                "{} infer    exact integer forward pass in {}",
+                li(false),
                 ok(&ms(infer))
             );
             println!(
-                "{}   z_next[..6] {:?}  {}",
-                tag("prover"),
+                "{} z_next   {:?}  {}",
+                li(true),
                 &out[..out.len().min(6)],
                 dim("(predicted next-latent head)")
             );
+
+            // --- stage 3: verify (no_std, float-free) ---
+            stage(3, 4, "VERIFY", "no_std, float-free");
             println!(
-                "{} challenge  replayed the Fiat-Shamir transcript, derived the Freivalds r",
-                tag("verifier")
+                "{} challenge replayed the Fiat-Shamir transcript, derived the Freivalds r",
+                li(false)
             );
             println!(
-                "{} checks     Freivalds {} on every projection; exact recompute of attention, softmax, GELU, LayerNorm, residuals",
-                tag("verifier"),
+                "{} checks   Freivalds {} on every projection",
+                li(false),
                 dim("v\u{00b7}x == r\u{00b7}z")
             );
+            println!(
+                "{}          {}",
+                cont(),
+                dim("exact recompute of attention, softmax, GELU, LayerNorm, residuals")
+            );
             match res {
-                Ok(_) => println!("{} {}  in {}", tag("verifier"), ok("ACCEPT"), ms(vtime)),
+                Ok(_) => println!("{} verdict  {}  in {}", li(true), ok("ACCEPT"), ms(vtime)),
                 Err(e) => {
-                    println!("{} {}  {e:?}", tag("verifier"), bad("REJECT"));
+                    println!("{} verdict  {}  {e:?}", li(true), bad("REJECT"));
                     exit(1);
                 }
             }
+
+            // --- stage 4: tamper (forge one matmul output) ---
+            stage(4, 4, "TAMPER", "forge one matmul output");
             match reject {
                 Some(e) => println!(
-                    "{} tamper  forged matmul op {forged_op:?} -> {} {e}",
-                    tag("verifier"),
-                    bad("REJECT")
+                    "{} forged matmul op {} -> {} {}  {}",
+                    li(true),
+                    forged_op.map(|i| i.to_string()).unwrap_or_default(),
+                    bad("REJECT"),
+                    e,
+                    dim("(caught)")
                 ),
                 None => {
                     eprintln!("tamper undetected (bug)");