TensorWasm
Wasmtime Fork Assessment
Decision
TensorWasm does NOT fork Wasmtime.
The original question was whether to configure Cranelift's existing flags or
fork wasmtime-cranelift to install a compilation hook that flags GPU-offload
candidates. After investigation we chose a third path: walk a simplified
intermediate representation (see [tensor_wasm_jit::detector::BlockIR]) that the
TensorWasm front-end populates from the Wasm bytes via wasmparser.
This avoids:
- A long-lived patch against wasmtime-cranelift that drifts with every upstream minor release.
- A maintenance burden that diverges from upstream JIT improvements.
- The risk of subtle correctness bugs at the CLIF-rewrite boundary.
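The simplified-IR approach described above can be illustrated with a small sketch. It assumes a hypothetical per-block summary (BlockSummary) and a stand-in operator enum (Op) in place of the operators a real Wasm parser yields; neither name nor any field comes from the TensorWasm code base, and the actual BlockIR may look quite different.

```rust
/// Stand-in for the operators a Wasm parser would yield.
/// Hypothetical: a real front-end would map parser operators onto this.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    /// Any non-SIMD instruction.
    Scalar,
    /// Any v128 (SIMD) instruction.
    V128,
    /// Loop header, with a statically known trip count if one was derived.
    LoopStart { trip_count: Option<u32> },
    /// End of a block or loop.
    End,
}

/// Hypothetical simplified summary of one block.
#[derive(Debug, Default, Clone, PartialEq)]
struct BlockSummary {
    total_ops: u32,
    v128_ops: u32,
    static_trip_count: Option<u32>,
}

/// Walk the operator stream once and collect structural features.
/// Control markers (LoopStart, End) are not counted as ops; for nested
/// loops this sketch simply keeps the most recent trip count.
fn summarize(ops: &[Op]) -> BlockSummary {
    let mut summary = BlockSummary::default();
    for op in ops {
        match *op {
            Op::Scalar => summary.total_ops += 1,
            Op::V128 => {
                summary.total_ops += 1;
                summary.v128_ops += 1;
            }
            Op::LoopStart { trip_count } => summary.static_trip_count = trip_count,
            Op::End => {}
        }
    }
    summary
}
```

Because the summary is built purely from the operator stream, nothing in it depends on Cranelift internals, which is the property the decision relies on.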
Cost
The trade-off is that TensorWasm's detector cannot see Cranelift's downstream optimisation results (register pressure, constant folding, loop unrolling). In practice this matters less than expected because:
- The detector triggers based on structural features (v128 op ratio, static loop trip count), which wasmparser can extract directly from Wasm.
- Cranelift's post-pass optimisations rarely change the v128 ratio of a basic block by more than a few percent.
- When the detector misclassifies a candidate, DeoptGuard (S13) catches the error at runtime and re-executes on CPU.
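The three points above can be sketched as a structural check followed by a guarded execution with CPU fallback. Everything here is an assumption made for illustration: the thresholds, the BlockFeatures struct, and the run_with_guard helper are hypothetical and do not reflect the actual detector or the DeoptGuard API.

```rust
/// Hypothetical thresholds; the real values are not stated in this document.
const MIN_V128_RATIO: f64 = 0.5;
const MIN_TRIP_COUNT: u32 = 64;

/// Hypothetical structural features of one basic block.
struct BlockFeatures {
    total_ops: u32,
    v128_ops: u32,
    static_trip_count: Option<u32>,
}

/// A block is a candidate when it is SIMD-heavy and its loop is known
/// to run long enough. An unknown trip count never qualifies.
fn is_offload_candidate(b: &BlockFeatures) -> bool {
    if b.total_ops == 0 {
        return false;
    }
    let ratio = f64::from(b.v128_ops) / f64::from(b.total_ops);
    let long_enough = b.static_trip_count.map_or(false, |n| n >= MIN_TRIP_COUNT);
    ratio >= MIN_V128_RATIO && long_enough
}

/// Try the offload path for candidates; if it reports a failure
/// (standing in for a tripped guard), re-execute on the CPU path.
fn run_with_guard<T>(
    candidate: bool,
    offload: impl FnOnce() -> Result<T, ()>,
    cpu: impl FnOnce() -> T,
) -> T {
    if candidate {
        if let Ok(value) = offload() {
            return value;
        }
    }
    cpu()
}
```

The fallback is what makes a cheap structural heuristic acceptable: a misclassification costs a wasted offload attempt rather than a wrong result.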
When we would fork
If empirical evidence shows that Cranelift's downstream IR contains
information we can't derive from wasmparser (most likely: post-inlining
trip count refinement), we will revisit the decision. The risk register
(docs/RISKS.md) tracks this entry alongside the rest of
the open architectural risks.
Upstream contributions
A known upstream limitation is that Cranelift does not currently expose
cranelift::Module's CLIF passes as a public extension point. Should
upstream add such a hook, the simplified IR becomes optional and the
project can opt into richer Cranelift integration without a fork.
Status: Decision of record. Last reviewed: 2026-05-28 (v0.3.7).