
[Web] Improve large tensor loading in wasm runtime#19771

Draft
MakotoUwu wants to merge 1 commit into apache:main from MakotoUwu:ole-34/apache-tvm-webgpu-runtime-gemma4
@MakotoUwu

This splits out the Web/WebGPU runtime-only portion of #19766 into a smaller PR, following reviewer feedback that the compiler-side changes should be handled separately.

This PR keeps the scope to web/ runtime code:

  • reorder FFI implementation includes before runtime implementation includes in the wasm single-translation-unit build, avoiding static initialization ordering issues during module startup
  • make ArrayDecodeStorage tolerate f32-to-bf16 records whose payload is already native float32-sized, while preserving the existing packed-bf16 expansion path
  • load large tensor-cache records in chunks to avoid oversized JS-to-wasm decode/copy calls
  • unpack kTVMFFIShape callback results as JS number arrays so chunked tensor views can pass explicit shape tuples
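The second bullet (tolerating f32-to-bf16 records whose payload is already float32-sized) can be illustrated with a size check before decoding. This is only a sketch in TypeScript, not the actual ArrayDecodeStorage code, which lives in the C++ wasm runtime. The helper name `decodeF32ToBf16Record` and its parameters are hypothetical, and it assumes a little-endian bf16 payload and a float32 payload already in platform byte order.

```typescript
// Hypothetical helper for illustration; not part of the TVM runtime API.
function decodeF32ToBf16Record(
  payload: Uint8Array,
  numElements: number,
): Float32Array {
  const f32Bytes = numElements * 4;
  const bf16Bytes = numElements * 2;

  if (payload.byteLength === f32Bytes) {
    // Payload is already float32-sized: copy the bytes through unchanged
    // (assumes the payload uses the platform's byte order).
    const out = new Float32Array(numElements);
    new Uint8Array(out.buffer).set(payload);
    return out;
  }
  if (payload.byteLength !== bf16Bytes) {
    throw new Error(
      `unexpected payload size ${payload.byteLength} for ${numElements} elements`,
    );
  }

  // Packed bf16 path: each 16-bit value is the upper half of a float32,
  // so shifting it left by 16 bits yields the float32 bit pattern.
  const view = new DataView(
    payload.buffer,
    payload.byteOffset,
    payload.byteLength,
  );
  const bits = new Uint32Array(numElements);
  for (let i = 0; i < numElements; i++) {
    bits[i] = view.getUint16(i * 2, true) << 16;
  }
  return new Float32Array(bits.buffer);
}
```

Branching on the payload size keeps the packed-bf16 path untouched while letting float32-sized records pass through, which matches the behavior the bullet describes.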

Local validation:

  • npm run lint from web/
  • npx tsc --noEmit --pretty false from web/
  • git diff --check

I could not run the full local npm run prepwasm && npm run build path on this machine because Emscripten (emcc/emsdk) is not installed. The earlier broad PR had the Apache wasm CI pass before the compiler-side CI failures; this PR is intended to let the wasm job validate the runtime-only split independently.

@gemini-code-assist (bot) left a comment

Code Review

This pull request restructures TVM-FFI includes in wasm_runtime.cc to prevent static initialization crashes and updates ArrayDecodeStorage to tolerate uncompressed float32 weights under the 'f32-to-bf16' format. In runtime.ts, it introduces chunked record loading and copying (up to 128MB chunks) to handle large tensors efficiently, and adds support for kTVMFFIShape types. The review feedback suggests optimizing these chunking loops by utilizing the cached makeShapeTuple method on the Instance class rather than invoking the FFI this.ctx.makeShapeTuple repeatedly, which reduces redundant FFI round-trips.
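To make the chunked loading concrete, here is a rough sketch of how a record could be split into pieces of at most 128 MB, each with an explicit shape. It is not the code from this PR: `planChunks`, `ChunkPlan`, and `MAX_CHUNK_BYTES` are hypothetical names, and splitting along the first axis of a contiguous row-major tensor is an assumption, since the summary does not say which axis the runtime uses.

```typescript
// Chunk size limit taken from the review summary (128 MB).
const MAX_CHUNK_BYTES = 128 * 1024 * 1024;

// Hypothetical description of one chunk of a tensor record.
interface ChunkPlan {
  rowStart: number; // first row (index along axis 0) covered by the chunk
  shape: number[]; // explicit shape of the chunk view
  byteOffset: number; // offset of the chunk within the record payload
  byteLength: number; // number of payload bytes in the chunk
}

// Split a contiguous row-major tensor along its first axis (an assumption)
// so that each chunk holds at most maxChunkBytes, or one row if a single
// row is already larger than the limit.
function planChunks(
  shape: number[],
  bytesPerElement: number,
  maxChunkBytes: number = MAX_CHUNK_BYTES,
): ChunkPlan[] {
  if (shape.length === 0) {
    throw new Error("scalar tensors do not need chunking");
  }
  const [rows, ...rest] = shape;
  const rowBytes = rest.reduce((a, b) => a * b, 1) * bytesPerElement;
  const rowsPerChunk = Math.max(1, Math.floor(maxChunkBytes / rowBytes));

  const chunks: ChunkPlan[] = [];
  for (let start = 0; start < rows; start += rowsPerChunk) {
    const count = Math.min(rowsPerChunk, rows - start);
    chunks.push({
      rowStart: start,
      shape: [count, ...rest],
      byteOffset: start * rowBytes,
      byteLength: count * rowBytes,
    });
  }
  return chunks;
}
```

Each plan entry carries its own shape, which is why the chunked path needs a shape tuple per chunk; only the last chunk's shape differs when the row count does not divide evenly.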

Comment thread web/src/runtime.ts Outdated
Comment on lines +1469 to +1471
this.ctx.makeShapeTuple(
...chunkShape.map((value) => new Scalar(value, "int")),
),

medium

We can leverage the cached makeShapeTuple method on the Instance class instead of directly calling the FFI this.ctx.makeShapeTuple on every chunk. This avoids redundant FFI round-trips to create the same shape tuple multiple times across chunks and records, improving performance.

                    this.makeShapeTuple(chunkShape),

Comment thread web/src/runtime.ts Outdated
Comment on lines +1513 to +1515
const chunkShapeTuple = this.ctx.makeShapeTuple(
...chunkShape.map((value) => new Scalar(value, "int")),
);

medium

We can leverage the cached makeShapeTuple method on the Instance class instead of directly calling the FFI this.ctx.makeShapeTuple on every chunk. This avoids redundant FFI round-trips to create the same shape tuple multiple times across chunks and records, improving performance.

                  const chunkShapeTuple = this.makeShapeTuple(chunkShape);
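The idea behind both suggestions, reusing a shape tuple instead of rebuilding it for every chunk, can be sketched as a small memoizing wrapper. This only illustrates the caching pattern and is not the actual Instance implementation; whether Instance.makeShapeTuple caches this way is taken from the review comment and not verified here. `ShapeTupleCache` and the `create` callback are hypothetical names, with `create` standing in for the FFI call.

```typescript
// Hypothetical cache for illustration; T stands in for the runtime's
// shape tuple object.
class ShapeTupleCache<T> {
  private readonly cache = new Map<string, T>();
  private readonly create: (shape: number[]) => T;

  constructor(create: (shape: number[]) => T) {
    this.create = create;
  }

  // Return the tuple for this shape, invoking create only on the first
  // request for a given shape.
  get(shape: number[]): T {
    const key = shape.join(",");
    if (!this.cache.has(key)) {
      this.cache.set(key, this.create(shape));
    }
    return this.cache.get(key) as T;
  }
}
```

In a chunked loop most chunks share one shape and only the final chunk may differ, so a cache of this kind would create at most two tuples per record layout. A real implementation would also need to decide when cached tuples are disposed, which this sketch leaves out.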

@MakotoUwu MakotoUwu force-pushed the ole-34/apache-tvm-webgpu-runtime-gemma4 branch from 012380a to 9c4334d Compare June 14, 2026 20:31