Skip to content

Commit e9060cc

Browse files
authored
Merge pull request #175 from AbdelStark/demo/visual-pipeline-and-site
demo: staged visual pipeline output, refresh site/readme/agent context
2 parents 383915b + 9963b49 commit e9060cc

6 files changed

Lines changed: 330 additions & 76 deletions

File tree

AGENTS.md

Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
# Agent context
2+
3+
Context for coding agents working in this repository. Keep it current when public
4+
behavior changes.
5+
6+
## What this is
7+
8+
ProvableWorldModel is a commit-and-audit proof system: anyone can verify, on a CPU
9+
with no floating point, that a committed quantized
10+
[le-wm](https://github.com/lucas-maes/le-wm) JEPA world model was run exactly as
11+
claimed. It adapts the [CommitLLM](https://github.com/lambdaclass/CommitLLM)
12+
scheme (Freivalds matmul checks plus exact integer replay, Merkle commitments, a
13+
Fiat-Shamir transcript) from language models to a world model, and because the
14+
model runs in exact integer fixed point it closes CommitLLM's one open hole:
15+
non-reproducible attention. There is no proving circuit and no arithmetization.
16+
17+
## Current state
18+
19+
- The full le-wm V0 predictor (latent_dim 192, history 3, depth 6, 16 heads,
20+
dim_head 64, mlp 2048; 2,437 ops) proves and verifies in the Rust prover. Per
21+
ConditionalBlock: AdaLN-zero conditioning, 16-head self-attention, GELU
22+
feed-forward, gated residuals, with the multi-head reshapes as Slice/Concat over
23+
a named-buffer block DAG (`pwm-core::block`, `prove_block`/`verify_block`).
24+
- The exporter ingests the real `quentinll/lewm-pusht` checkpoint, takes a real
25+
PushT expert episode from `lerobot/pusht`, encodes the real observation frames
26+
through the checkpoint's own ViT encoder plus projector and the real expert
27+
action through the action encoder, quantizes the full 192-dim V0 subgraph, folds
28+
BatchNorm, and commits the manifest. `pwm prove-predictor <bundle>` then proves
29+
and verifies the real weights on the real observation and action.
30+
- Tiers P0 (one predictor step), P1 (rollout), P2 (fixed-candidate planning) are
31+
implemented and tested. P3 (full CEM planner) and P4 (pixel-to-plan, the ViT
32+
encoder inside the proof) are deferred. The image encoder is the trusted offline
33+
step today.
34+
- The proof attests the exact integer (quantized) relation, not float or PyTorch
35+
equivalence. Per-tensor activation-scale calibration for float-faithful outputs
36+
is a further refinement; activations stay int8 throughout the current scheme.
37+
- The argmin uniqueness check and the Freivalds probability bound are formally
38+
verified in Lean 4 (no `sorry`), under `lean/`.
39+
40+
## Layout
41+
42+
Five small crates, one trust anchor. The verifier depends on neither the exporter
43+
nor any Python or float runtime.
44+
45+
- `pwm-core`: fields (M31 value, Fp61 audit), fixed point, tensors, Merkle
46+
commitments, Fiat-Shamir transcript, the Freivalds check, the block DAG, the
47+
trace model. `no_std`.
48+
- `pwm-export`: checkpoint to quantized integer graph, manifest, golden vectors,
49+
the Rust integer reference, the lerobot/pusht data adapter (Python under
50+
`crates/pwm-export/python`).
51+
- `pwm-prover`: run the reference inference, commit the trace, derive challenges,
52+
emit the `AuditArtifact`.
53+
- `pwm-verifier`: CPU, `no_std`, float-free. Freivalds-check the linears,
54+
recompute the rest, check rollout, cost, argmin.
55+
- `pwm-testkit`: golden vectors, accept and reject suites, the mutation harness,
56+
the `pwm` demo CLI (`crates/pwm-testkit/src/bin/pwm.rs`).
57+
58+
## Run the demo
59+
60+
```bash
61+
docker compose up --build # full predictor, synthetic weights
62+
docker compose --profile real up --build export predictor-real # real pretrained checkpoint
63+
docker compose --profile compact up --build prover verifier # tiny two-party handoff
64+
```
65+
66+
Without Docker:
67+
68+
```bash
69+
cargo run -p pwm-testkit --bin pwm --release -- prove-predictor # synthetic
70+
cargo run -p pwm-testkit --bin pwm --release -- prove-predictor <bundle> # real weights
71+
cargo test --workspace # accept + reject suites
72+
```
73+
74+
The `pwm` CLI prints a four-stage pipeline (EXPORT, PROVE, VERIFY, TAMPER); it
75+
honors `NO_COLOR` and `--json`.
76+
77+
## Conventions
78+
79+
- Stable Rust toolchain (MSRV 1.85). No nightly. The verifier is `no_std` and
80+
float-free; keep it that way.
81+
- Prose (README, demo docs, website, printed CLI output) uses no em-dashes. Rust
82+
doc-comments may keep them per repo style.
83+
- Commits and PRs carry no tool attribution and no `Co-Authored-By` lines.
84+
- One logical change per branch and PR. Branch from `main`, open a PR, keep CI
85+
green (fmt, clippy `-D warnings`, the full test suite, SPDX headers, link check,
86+
the no-em-dash check). Confirm before any destructive git operation.
87+
- Keep complexity and file size as low as the task allows. Small, focused files.
88+
89+
## Pointers
90+
91+
- [README.md](README.md): overview and quickstart.
92+
- [demo/README.md](demo/README.md): the three demo modes and the real-checkpoint steps.
93+
- [specs.md](specs.md): the normative commit-and-audit specification.
94+
- [roadmap.md](roadmap.md): the plan, the pivot rationale, and the sequencing.
95+
- [website/](website/): the interactive explainer, deployed to GitHub Pages on
96+
push to `main` via `.github/workflows/pages.yml`.

README.md

Lines changed: 43 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -40,16 +40,30 @@ docker compose up --build
4040
```
4141

4242
```
43-
[prover] model le-wm V0 predictor (6 blocks, 16 heads), synthetic weights (pass a bundle for real)
44-
[prover] config dim=192, history=3, heads=16, dim_head=64, mlp=2048, depth=6
45-
[prover] weights 30 tensors, 10764288 int8 params
46-
[prover] graph 2437 ops over the named-buffer block DAG
47-
[prover] infer exact integer forward pass in 36 ms
48-
[prover] z_next[..6] [-2, -1, 0, 1, 2, -2] (predicted next-latent head)
49-
[verifier] challenge replayed the Fiat-Shamir transcript, derived the Freivalds r
50-
[verifier] checks Freivalds v·x == r·z on every projection; exact recompute of attention, softmax, GELU, LayerNorm, residuals
51-
[verifier] ACCEPT in 26 ms
52-
[verifier] tamper forged matmul op Some(2) -> REJECT FreivaldsCheckFailed { op_id: 2 }
43+
ProvableWorldModel commit-and-audit over the le-wm world model
44+
pipeline checkpoint -> quantize -> commit -> encode -> run -> prove -> verify
45+
46+
[stage 1/4] EXPORT offline, trusted
47+
├ model le-wm V0 predictor (6 blocks, 16 heads), synthetic weights (pass a bundle for real)
48+
├ config dim=192, history=3, heads=16, dim_head=64, mlp=2048, depth=6
49+
├ weights 30 tensors, 10,764,288 int8 params
50+
├ inputs z_history [3x192], action embedding [3x192]
51+
└ source synthetic quantized latents
52+
53+
[stage 2/4] PROVE exact integer inference + commitment
54+
├ graph 2,437 ops over the named-buffer block DAG
55+
│ per block: AdaLN-zero, 16-head attention, GELU FFN, gated residuals
56+
├ infer exact integer forward pass in 131 ms
57+
└ z_next [-2, -1, 0, 1, 2, -2] (predicted next-latent head)
58+
59+
[stage 3/4] VERIFY no_std, float-free
60+
├ challenge replayed the Fiat-Shamir transcript, derived the Freivalds r
61+
├ checks Freivalds v·x == r·z on every projection
62+
│ exact recompute of attention, softmax, GELU, LayerNorm, residuals
63+
└ verdict ACCEPT in 39 ms
64+
65+
[stage 4/4] TAMPER forge one matmul output
66+
└ forged matmul op 2 -> REJECT FreivaldsCheckFailed { op_id: 2 } (caught)
5367
```
5468

5569
That default is the real architecture with synthetic weights (fast, no checkpoint).
@@ -95,15 +109,25 @@ cargo run -p pwm-testkit --bin pwm --release -- prove-predictor /tmp/lewm_predic
95109
```
96110

97111
```
98-
[prover] model le-wm V0 predictor (6 blocks, 16 heads), REAL quantized checkpoint weights
99-
[prover] config dim=192, history=3, heads=16, dim_head=64, mlp=2048, depth=6
100-
[prover] inputs z_history [3x192], action embedding [3x192]
101-
[prover] source real PushT expert episode (lerobot/pusht): 3 frames @ frameskip 5 -> ViT encoder; real 2D action + agent state
102-
[prover] graph 2437 ops over the named-buffer block DAG
103-
[prover] infer exact integer forward pass in 34 ms
104-
[prover] z_next[..6] [11, 55, 32, -73, -57, 13] (predicted next-latent head)
105-
[verifier] ACCEPT in 24 ms
106-
[verifier] tamper forged matmul op Some(2) -> REJECT FreivaldsCheckFailed { op_id: 2 }
112+
[stage 1/4] EXPORT offline, trusted
113+
├ model le-wm V0 predictor (6 blocks, 16 heads), REAL quantized checkpoint weights
114+
├ source quentinll/lewm-pusht (Hugging Face, MIT), int8-quantized V0 subgraph
115+
├ config dim=192, history=3, heads=16, dim_head=64, mlp=2048, depth=6
116+
├ weights 30 tensors, 10,764,288 int8 params
117+
├ inputs z_history [3x192], action embedding [3x192]
118+
└ source real PushT expert episode (lerobot/pusht): 3 frames @ frameskip 5 -> ViT encoder; real 2D action + agent state
119+
120+
[stage 2/4] PROVE exact integer inference + commitment
121+
├ graph 2,437 ops over the named-buffer block DAG
122+
│ per block: AdaLN-zero, 16-head attention, GELU FFN, gated residuals
123+
├ infer exact integer forward pass in 130 ms
124+
└ z_next [11, 55, 32, -73, -57, 13] (predicted next-latent head)
125+
126+
[stage 3/4] VERIFY no_std, float-free
127+
└ verdict ACCEPT in 38 ms
128+
129+
[stage 4/4] TAMPER forge one matmul output
130+
└ forged matmul op 2 -> REJECT FreivaldsCheckFailed { op_id: 2 } (caught)
107131
```
108132

109133
A real expert episode (consistent observation and action) goes through the

crates/pwm-testkit/src/bin/pwm.rs

Lines changed: 87 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -67,6 +67,35 @@ fn hex8(root: &[u8; 32]) -> String {
6767
.collect::<String>()
6868
+ "..."
6969
}
70+
// --- pipeline visuals (box-drawing tree, respects NO_COLOR) ---
71+
fn commas(n: usize) -> String {
72+
let s = n.to_string();
73+
let b = s.as_bytes();
74+
let mut out = String::new();
75+
for (i, c) in b.iter().enumerate() {
76+
if i > 0 && (b.len() - i) % 3 == 0 {
77+
out.push(',');
78+
}
79+
out.push(*c as char);
80+
}
81+
out
82+
}
83+
fn stage(n: u32, total: u32, name: &str, sub: &str) {
84+
println!(
85+
"\n{} {} {}",
86+
paint("35;1", &format!("[stage {n}/{total}]")),
87+
paint("1", name),
88+
dim(sub)
89+
);
90+
}
91+
// tree-branch connector: mid (├) for inner rows, end (└) for the last
92+
fn li(last: bool) -> String {
93+
dim(if last { " \u{2514}" } else { " \u{251c}" })
94+
}
95+
// continuation line under a branch (│)
96+
fn cont() -> String {
97+
dim(" \u{2502}")
98+
}
7099
fn next_latent(next: &[i64]) -> Vec<i64> {
71100
next.iter().skip((SEQ - 1) * DIM).copied().collect()
72101
}
@@ -501,80 +530,108 @@ fn main() {
501530
"{}",
502531
paint(
503532
"36;1",
504-
"ProvableWorldModel: full le-wm predictor into prove_block\n"
533+
"ProvableWorldModel commit-and-audit over the le-wm world model"
505534
)
506535
);
507-
println!("{} {} {}", tag("prover"), paint("1", "model "), label);
536+
println!(
537+
"{}",
538+
dim(" pipeline checkpoint -> quantize -> commit -> encode -> run -> prove -> verify")
539+
);
540+
541+
// --- stage 1: export (offline, trusted) ---
542+
stage(1, 4, "EXPORT", "offline, trusted");
543+
println!("{} model {}", li(false), label);
508544
if is_real {
509545
println!(
510-
"{} source quentinll/lewm-pusht {}",
511-
tag("prover"),
546+
"{} source quentinll/lewm-pusht {}",
547+
li(false),
512548
dim("(Hugging Face, MIT), int8-quantized V0 subgraph")
513549
);
514550
}
515551
println!(
516-
"{} config dim={}, history={}, heads={}, dim_head={}, mlp={}, depth={} {}",
517-
tag("prover"),
552+
"{} config dim={}, history={}, heads={}, dim_head={}, mlp={}, depth={}",
553+
li(false),
518554
dims.d,
519555
dims.s,
520556
dims.h,
521557
dims.dh,
522558
dims.mlp,
523-
dims.depth,
524-
dim("(self-attention + AdaLN + GELU FFN + residuals)")
559+
dims.depth
525560
);
526561
println!(
527-
"{} weights {} tensors, {} int8 params",
528-
tag("prover"),
562+
"{} weights {} tensors, {} int8 params",
563+
li(false),
529564
weights.len(),
530-
params
565+
commas(params)
531566
);
532567
println!(
533-
"{} inputs z_history [{}x{}], action embedding [{}x{}]",
534-
tag("prover"),
568+
"{} inputs z_history [{}x{}], action embedding [{}x{}]",
569+
li(false),
535570
dims.s,
536571
dims.d,
537572
dims.s,
538573
dims.d
539574
);
540-
println!("{} source {}", tag("prover"), dim(&input_source));
575+
println!("{} source {}", li(true), dim(&input_source));
576+
577+
// --- stage 2: prove (exact integer inference + commitment) ---
578+
stage(2, 4, "PROVE", "exact integer inference + commitment");
541579
println!(
542-
"{} graph {} ops over the named-buffer block DAG",
543-
tag("prover"),
544-
proven.ops.len()
580+
"{} graph {} ops over the named-buffer block DAG",
581+
li(false),
582+
commas(proven.ops.len())
545583
);
546584
println!(
547-
"{} infer exact integer forward pass in {}",
548-
tag("prover"),
585+
"{} {}",
586+
cont(),
587+
dim("per block: AdaLN-zero, 16-head attention, GELU FFN, gated residuals")
588+
);
589+
println!(
590+
"{} infer exact integer forward pass in {}",
591+
li(false),
549592
ok(&ms(infer))
550593
);
551594
println!(
552-
"{} z_next[..6] {:?} {}",
553-
tag("prover"),
595+
"{} z_next {:?} {}",
596+
li(true),
554597
&out[..out.len().min(6)],
555598
dim("(predicted next-latent head)")
556599
);
600+
601+
// --- stage 3: verify (no_std, float-free) ---
602+
stage(3, 4, "VERIFY", "no_std, float-free");
557603
println!(
558-
"{} challenge replayed the Fiat-Shamir transcript, derived the Freivalds r",
559-
tag("verifier")
604+
"{} challenge replayed the Fiat-Shamir transcript, derived the Freivalds r",
605+
li(false)
560606
);
561607
println!(
562-
"{} checks Freivalds {} on every projection; exact recompute of attention, softmax, GELU, LayerNorm, residuals",
563-
tag("verifier"),
608+
"{} checks Freivalds {} on every projection",
609+
li(false),
564610
dim("v\u{00b7}x == r\u{00b7}z")
565611
);
612+
println!(
613+
"{} {}",
614+
cont(),
615+
dim("exact recompute of attention, softmax, GELU, LayerNorm, residuals")
616+
);
566617
match res {
567-
Ok(_) => println!("{} {} in {}", tag("verifier"), ok("ACCEPT"), ms(vtime)),
618+
Ok(_) => println!("{} verdict {} in {}", li(true), ok("ACCEPT"), ms(vtime)),
568619
Err(e) => {
569-
println!("{} {} {e:?}", tag("verifier"), bad("REJECT"));
620+
println!("{} verdict {} {e:?}", li(true), bad("REJECT"));
570621
exit(1);
571622
}
572623
}
624+
625+
// --- stage 4: tamper (forge one matmul output) ---
626+
stage(4, 4, "TAMPER", "forge one matmul output");
573627
match reject {
574628
Some(e) => println!(
575-
"{} tamper forged matmul op {forged_op:?} -> {} {e}",
576-
tag("verifier"),
577-
bad("REJECT")
629+
"{} forged matmul op {} -> {} {} {}",
630+
li(true),
631+
forged_op.map(|i| i.to_string()).unwrap_or_default(),
632+
bad("REJECT"),
633+
e,
634+
dim("(caught)")
578635
),
579636
None => {
580637
eprintln!("tamper undetected (bug)");

0 commit comments

Comments
 (0)