Add web-based shiny hunter with multi-worker parallel hunting #5
Merged
Wizard-based UI (save state → record macro → hunt) with a bare-WASM hunt worker that bypasses WasmBoy's postMessage overhead for 40-80x speedup over the current spike's 4 att/sec.
12 tasks: extract WASM binary, build bare-WASM core + hunt worker, macro recorder, hunt orchestration, UI components, 3 wizard steps, page shell, integration test, and performance benchmark.
WasmBoy's package.json has "browser": UMD which webpack picks over "module": ESM. The UMD bundle fails to chunk-split in Next.js, causing a 404 at runtime. Force webpack to resolve wasmboy to the ESM entry.
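One way to force that resolution is an exact-match webpack alias. The sketch below is a hedged illustration, not the PR's actual config: the helper name `aliasWasmboyToEsm` is hypothetical, and the ESM path is an assumption that should be checked against the `"module"` field of the installed wasmboy `package.json`.

```typescript
// Minimal structural type for the part of the webpack config this sketch touches.
interface WebpackLikeConfig {
  resolve?: { alias?: Record<string, string> };
}

// ASSUMPTION: placeholder path; use whatever wasmboy's "module" field points at.
const WASMBOY_ESM_ENTRY = "wasmboy/dist/wasmboy.wasm.esm.js";

// Hypothetical helper: add an alias so a bare `import ... from "wasmboy"`
// resolves to the ESM build instead of the UMD "browser" entry.
function aliasWasmboyToEsm(
  config: WebpackLikeConfig,
  esmEntry: string = WASMBOY_ESM_ENTRY,
): WebpackLikeConfig {
  const resolve = config.resolve ?? {};
  // The trailing "$" makes webpack match the bare specifier "wasmboy" only.
  resolve.alias = { ...(resolve.alias ?? {}), wasmboy$: esmEntry };
  config.resolve = resolve;
  return config;
}
```

In a Next.js config this would typically be called from the `webpack(config)` hook, returning the modified config.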
The recorder pauses WasmBoy's internal rendering loop and drives frames manually via tick(). But tick() only executes the WASM frame without rendering. Add renderFrame() that reads the frame buffer from WASM memory and draws it to the canvas.
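The pixel conversion behind such a `renderFrame()` might look like the sketch below. It assumes the frame buffer is packed RGB (3 bytes per pixel) at the Game Boy's 160x144 resolution; the buffer layout and the helper name `rgbToRgba` are assumptions, not taken from the PR.

```typescript
const GB_WIDTH = 160;
const GB_HEIGHT = 144;

// Hypothetical helper: expand packed RGB pixels into the RGBA layout that
// canvas ImageData expects, with every pixel fully opaque.
function rgbToRgba(
  rgb: Uint8Array,
  width: number = GB_WIDTH,
  height: number = GB_HEIGHT,
): Uint8ClampedArray {
  const pixels = width * height;
  if (rgb.length < pixels * 3) {
    throw new RangeError(`expected ${pixels * 3} bytes, got ${rgb.length}`);
  }
  const rgba = new Uint8ClampedArray(pixels * 4);
  for (let i = 0; i < pixels; i++) {
    rgba.set(rgb.subarray(i * 3, i * 3 + 3), i * 4); // copy R, G, B
    rgba[i * 4 + 3] = 255; // alpha
  }
  return rgba;
}
```

In the browser, the result can be wrapped in an `ImageData` of the same width and height and drawn with `putImageData` on the canvas's 2D context.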
Audio was previously disabled in both windowed and headless modes to avoid the mobile audio context handshake. Enable it when running windowed so the user gets sound while playing the game in steps 1 and 3 (post-shiny playback). Headless mode (used by the recorder) keeps audio off.
Saves a per-game checkpoint (save state + verified macro) to IndexedDB after Steps 1 and 2 complete. When the user returns and loads a ROM that matches an existing checkpoint, the Save State step offers to skip directly to Hunt (or Record Macro if no macro was saved yet), removing the need to re-play through the intro every session. Checkpoints are keyed by game/region (one per slot).
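The resume decision described above can be sketched as a pure function. The `Checkpoint` shape, the key format, and the function names are hypothetical; only the behaviour (skip to Hunt when a macro exists, otherwise to Record Macro) comes from the commit message.

```typescript
type WizardStep = "save-state" | "record-macro" | "hunt";

// Hypothetical checkpoint record; the PR stores something like this in IndexedDB.
interface Checkpoint {
  game: string;
  region: string;
  saveState: Uint8Array;
  macro?: Uint8Array; // absent until Step 2 has completed
}

// One checkpoint per game/region slot (key format is an assumption).
function checkpointKey(game: string, region: string): string {
  return `${game}/${region}`;
}

// Decide which step the wizard can offer to jump to for a loaded ROM.
function resumeStep(checkpoint: Checkpoint | undefined): WizardStep {
  if (checkpoint === undefined) return "save-state";
  return checkpoint.macro !== undefined ? "hunt" : "record-macro";
}
```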
WasmBoy's singleton worker becomes unusable after the recorder drives its frame loop manually with _runWasmExport — its postMessage wrapper times out on subsequent SET_MEMORY calls. Verifying the recorded macro through that path was producing 'Verify failed: undefined' errors. Spawn a fresh hunt worker (the same one used for the actual hunt) for verification: load the recorded state, replay the macro once with delayWindow=1, and read back species + DVs. The bare worker has its own WASM instance and is independent of WasmBoy's worker. Also makes RecordingSession.stop() async so callers wait for the last in-flight frame to finish before reading the final macro.
Three independent fixes/optimizations in the bare WASM core:
1. Game Boy address translation (gbToWasm). WasmBoy's gameBoyMemory
   blob is not a flat 0x0000-0xFFFF GB address space; it packs VRAM,
   WRAM, and other internal memory at fixed offsets within the
   gameBoyInternalMemory region. Reading species at GB 0xD164 with
   the previous naive 'base + gbAddr' formula landed in the OTHER
   region and returned zeros. Map GB addresses to WASM offsets the
   same way WasmBoy's getWasmBoyOffsetFromGameBoyOffset does.

2. Patch a typo bug in WasmBoy 0.7.1's compiled WASM binary. Each of
   the four sound channels' loadState() uses the channel's runtime
   cycleCounter value as the save-state slot index instead of the
   constant saveStateSlot:

       Channel1.cycleCounter = load<i32>(getSaveStateMemoryOffset(0x00,
         Channel1.cycleCounter)); // should be Channel1.saveStateSlot

   After the first state load cycleCounter holds garbage, so the
   second loadState computes an offset far past the end of WASM
   linear memory and traps with 'memory access out of bounds'.
   Patch the four buggy 'global.get cycleCounter' instructions in
   the binary at instantiation time, replacing them with
   'i32.const SLOT' (slots 7-10). Handles both 2-byte (single-byte
   LEB128) and 3-byte (two-byte LEB128) global.get encodings.

3. Configure the headless emulator for speed: enable audio/graphics/
   timers batch processing and disable scanline rendering. Replace
   the per-frame executeFrame() loop in tick() with a single
   executeMultipleFrames() call to remove JS↔WASM crossing overhead.

Also adds an ensureMem helper that refreshes the Uint8Array view if
the underlying ArrayBuffer was detached by a memory.grow.
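The in-place instruction rewrite from fix 2 can be sketched as follows. This is an illustrative reconstruction, not the PR's code: the function name is hypothetical and the sketch assumes the byte offset of each buggy instruction is already known. It relies on two facts from the WebAssembly binary format: `global.get` is opcode 0x23 followed by an unsigned LEB128 index, and `i32.const` is opcode 0x41 followed by a signed LEB128 value, which may be padded with a continuation byte so the instruction keeps its original length.

```typescript
const OP_GLOBAL_GET = 0x23;
const OP_I32_CONST = 0x41;

// Hypothetical helper: overwrite `global.get <idx>` at `offset` with
// `i32.const <slot>` of the same byte length, so no other offsets shift.
function patchGlobalGetToConst(code: Uint8Array, offset: number, slot: number): void {
  if (code[offset] !== OP_GLOBAL_GET) {
    throw new Error(`no global.get at offset ${offset}`);
  }
  // 0..63 is the range a single signed LEB128 byte can hold as a positive value.
  if (!Number.isInteger(slot) || slot < 0 || slot > 63) {
    throw new RangeError("slot must be an integer in 0..63");
  }
  const first = code[offset + 1];
  if (first === undefined) throw new RangeError("truncated instruction");
  const twoByteIndex = (first & 0x80) !== 0;

  if (twoByteIndex) {
    const second = code[offset + 2];
    if (second === undefined || (second & 0x80) !== 0) {
      throw new Error("only 1- and 2-byte LEB128 indices are handled");
    }
    code[offset] = OP_I32_CONST;
    code[offset + 1] = 0x80 | slot; // low 7 bits + continuation flag
    code[offset + 2] = 0x00; // padding byte, sign bit clear
  } else {
    code[offset] = OP_I32_CONST;
    code[offset + 1] = slot;
  }
}
```

Keeping the byte length unchanged matters because a shorter or longer instruction would invalidate the function body size and section size fields that precede it in the binary.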
Previous worker only posted progress every 50th attempt, every shiny, and on attempt 1. With the bare WASM core running at ~0.5 attempts/sec, that meant the UI counter stayed at '1' for ~100s before jumping to 50 — not a useful progress indicator. Post on every attempt instead. The message rate stays low enough (< 1 message/sec) for the main thread to handle without back-pressure. Also includes the latestSpecies in progress messages (used by the verify path) and adds a few phase-level console logs for diagnosing worker startup.
WasmBoy's own benchmark config uses tileCaching because it renders frames; our hunt worker doesn't render at all, so disabling scanline rendering entirely wins by a wider margin on dialog-heavy scenes (e.g. the starter selection where text crawls and sprites animate). Switching our flags to match the benchmark's config slowed Pokémon Red's macro replay from ~2.0s to ~2.8s per attempt. Comment-only change.
The hunt walks the 65,536-frame delay window without replacement, seeded by masterSeed. Previously the seed was random, causing the worker to tick the bootstrap state forward by an average of 32k frames (~80s at 400fps) before attempt 1 could even start. Random seeding made sense in the Python implementation where multiple parallel workers each took a different slice of the window. The browser worker is single-threaded today so there's nothing to coordinate — random seeding only added startup latency. When we add multi-worker support, each worker will be assigned an explicit slice via a startDelay parameter rather than a hashed seed.
Each progress message now ships the framebuffer with it (instead of firing separate 'frame' messages every 200ms), and Hunt.tsx draws the GB screen plus a 'real Pokemon dialog' textbox showing the attempt number, species name, DVs, and a SHINY! line when applicable. The textbox is rendered tile-by-tile from pokered's 8x8 1bpp font (font.png + font_extra.png) and the Crystal shiny sparkle, ported from the Python monitor. The MonitorGrid component is updated to render canvas cells with the same overlay (kept around for reuse when we add multi-worker hunting), but the Hunt step now uses just the single live canvas.
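For context on what the overlay displays, the sketch below shows the standard Generation II rule for deriving the HP DV and shininess from the four stored DVs. It is a hedged illustration of the well-known game rule; the PR does not show its own check, and the `Dvs` type and function names here are hypothetical.

```typescript
// Each DV is a 4-bit value (0..15). HP is not stored; it is derived.
interface Dvs {
  atk: number;
  def: number;
  spd: number;
  spc: number;
}

// HP DV is built from the lowest bit of each of the other four DVs.
function hpDv(dvs: Dvs): number {
  return ((dvs.atk & 1) << 3) | ((dvs.def & 1) << 2) | ((dvs.spd & 1) << 1) | (dvs.spc & 1);
}

// Gen II shiny rule: Defense, Speed and Special are all 10, and Attack is
// one of 2, 3, 6, 7, 10, 11, 14, 15 (exactly the values with bit 1 set).
function isShiny(dvs: Dvs): boolean {
  return dvs.def === 10 && dvs.spd === 10 && dvs.spc === 10 && (dvs.atk & 2) !== 0;
}
```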
…orker For the first three attempts, log how long readState, macro replay, and the settle loop take so we can pinpoint where any slowdown lives. For attempt 1 only, re-replay the macro from the saved pre-macro state with 30-frame chunked polling to find the earliest frame at which the party's species and DV bytes become readable. If that frame is well below macroTotalFrames the macro was over-recorded and the hunt could be sped up by trimming the trailing wait. These logs run on every hunt right now; we can gate them behind a debug flag once we're past the perf-tuning phase.
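The chunked polling described above can be sketched like this. The callbacks `advance` and `isReadable` are hypothetical stand-ins for ticking the emulator and checking the party bytes; the function reports the first chunk boundary at which the data is readable, so its answer is accurate only to within one chunk.

```typescript
// Hypothetical helper: step the emulator in fixed-size chunks and report the
// first polled frame at which the party data can be read, or null if never.
function findEarliestReadableFrame(
  totalFrames: number,
  chunk: number,
  advance: (frames: number) => void,
  isReadable: () => boolean,
): number | null {
  if (chunk < 1) throw new RangeError("chunk must be >= 1");
  let frame = 0;
  while (frame < totalFrames) {
    const step = Math.min(chunk, totalFrames - frame);
    advance(step);
    frame += step;
    if (isReadable()) return frame;
  }
  return null;
}
```

If the returned frame is far below the macro's total frame count, the gap is an upper bound on how much trailing wait could be trimmed.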
Splits the 65,536-frame delay window into N contiguous slices and
spawns one Web Worker per slice. With ~6 workers on a typical machine,
steady-state throughput goes from ~0.5 to ~3 attempts/sec.
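A minimal sketch of the slicing, assuming any remainder is spread over the first few workers (the commit message does not say how uneven divisions are handled). The field names match the start-message parameters described below; the function name is hypothetical.

```typescript
interface Slice {
  workerId: number;
  startDelay: number;
  delayCount: number;
}

// Split [0, windowSize) into `workerCount` contiguous, non-overlapping slices.
// ASSUMPTION: leftover delays go one each to the lowest-numbered workers.
function sliceDelayWindow(workerCount: number, windowSize: number = 65536): Slice[] {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError("workerCount must be a positive integer");
  }
  const base = Math.floor(windowSize / workerCount);
  const extra = windowSize % workerCount;
  const slices: Slice[] = [];
  let start = 0;
  for (let id = 0; id < workerCount; id++) {
    const count = base + (id < extra ? 1 : 0);
    slices.push({ workerId: id, startDelay: start, delayCount: count });
    start += count;
  }
  return slices;
}
```

With six workers, 65,536 does not divide evenly, so under this assumption four slices hold 10,923 delays and two hold 10,922.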
Worker (hunt-worker.ts):
- Takes a workerId, startDelay, and delayCount instead of masterSeed
+ delayWindow. Bootstraps once to startDelay, then walks delayCount
consecutive delays linearly (no wrap-around — each slice is its own
contiguous chunk).
- All outbound messages carry workerId so the main thread can route
per-worker progress into the monitor grid.
- New 'pause' inbound message: sets the paused flag without finding a
shiny, used to coordinate stop-on-shiny across the pool.
- Diagnostic timing/earliest-species logs are gated to worker 0 to
keep the console readable.
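The worker-side walk can be sketched as below. The `workerId`, `startDelay`, and `delayCount` fields come from the commit message; the `type` discriminator, the callbacks, and the result shape are hypothetical.

```typescript
interface StartMessage {
  type: "start";
  workerId: number;
  startDelay: number;
  delayCount: number;
}

interface SliceResult {
  attempts: number;
  shinyDelay: number | null;
}

// Walk the slice linearly with no wrap-around. `attempt` runs one hunt attempt
// and returns true on a shiny; `isPaused` is checked at each attempt boundary.
function runSlice(
  start: StartMessage,
  attempt: (delay: number) => boolean,
  isPaused: () => boolean,
): SliceResult {
  let attempts = 0;
  for (let i = 0; i < start.delayCount; i++) {
    if (isPaused()) break;
    const delay = start.startDelay + i;
    attempts++;
    if (attempt(delay)) return { attempts, shinyDelay: delay };
  }
  return { attempts, shinyDelay: null };
}
```

In a real worker the loop has to yield to the event loop between attempts (for example by awaiting a zero-delay timer); otherwise an inbound 'pause' message is never delivered and the flag cannot change mid-slice.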
Main thread (hunt.ts):
- defaultWorkerCount() picks max(2, min(6, hardwareConcurrency-1)).
- Each worker gets its own cloned state buffers (postMessage transfers
detach, so cloning is required per worker).
- onProgress aggregates totalAttempts and attemptsPerSec across the
pool; onWorkerProgress feeds the per-cell monitor grid.
- On shiny, broadcasts 'pause' to every other worker so they stop at
their next attempt boundary while the user reviews the result.
- Stop/resume now broadcast to every worker.
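The worker-count formula and the per-worker cloning can be sketched as follows. The formula is the one quoted above; taking `hardwareConcurrency` as a parameter, the fallback of 2 when it is unavailable, and the `cloneForWorker` helper are assumptions made for this sketch.

```typescript
// max(2, min(6, hardwareConcurrency - 1)), leaving one core for the main thread.
// ASSUMPTION: treat a missing hardwareConcurrency as 2 cores.
function defaultWorkerCount(hardwareConcurrency: number | undefined): number {
  const cores = hardwareConcurrency ?? 2;
  return Math.max(2, Math.min(6, cores - 1));
}

// Hypothetical helper: give each worker its own copy of the saved state, since
// transferring a buffer via postMessage detaches it on the sending side.
function cloneForWorker(state: Uint8Array): Uint8Array {
  return state.slice(); // slice() copies into a fresh ArrayBuffer
}
```

On the main thread, each copy's underlying buffer can then be placed in the transfer list of that worker's `postMessage` call without affecting the original or the other copies.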
UI (Hunt.tsx + MonitorGrid):
- The single live preview canvas is replaced by a per-worker grid: one
cell per worker, each cell rendering that worker's latest framebuffer
with the GB-style overlay (W{n} {NAME} / ATK ../DEF.. / SPD../SPC../HP..).
- Grid is capped at 2 rows so every worker is visible at once
(cols = max(2, ceil(N/2))).
- Shiny playback uses a separate canvas that only mounts when the
user clicks Play.
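The grid sizing rule can be checked with a small sketch. The column formula is the one quoted above; the function names and the derived row count are illustrative.

```typescript
// cols = max(2, ceil(N / 2)), as stated in the commit message.
function gridColumns(workerCount: number): number {
  return Math.max(2, Math.ceil(workerCount / 2));
}

// Rows needed to fit every worker; never exceeds 2 because cols >= N / 2.
function gridRows(workerCount: number): number {
  return Math.ceil(workerCount / gridColumns(workerCount));
}
```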
Verify (verify.ts):
- Updated to the new start-message shape (workerId=0, startDelay=0,
delayCount=1).
Summary
Builds out the entire web-based shiny hunter as a 3-step wizard (Save State → Record Macro → Hunt), implementing the design at `docs/superpowers/specs/2026-04-30-web-shiny-hunter-design.md` and the plan at `docs/superpowers/plans/2026-04-30-web-shiny-hunter.md`.
The hunt loop runs across N parallel Web Workers (by default up to 6, chosen from `hardwareConcurrency`) using a bare-WASM core extracted from WasmBoy, achieving ~3 attempts/sec steady-state on a typical machine. Each worker walks a contiguous slice of the 65,536-frame delay window; the UI shows a per-worker monitor grid with the GB-style textbox overlay (species, DVs, shiny indicator), just like the Python `monitor.py`.
Highlights
Test plan
Known limitations