Craton TensorWasm — Glossary
Short definitions of recurring terms in TensorWasm's documentation, RFCs, and crate-internal comments. Each entry is a two-to-three-line paragraph; deeper material lives in the linked specs and design docs.
UVM (Unified Virtual Memory)
CUDA's managed-memory facility, accessed via cudaMallocManaged, that
lets host code and device kernels share a single address space. The
driver page-migrates the backing pages between host RAM and GPU memory
on demand. TensorWasm's UnifiedBuffer is allocated through UVM so a
Wasm guest's linear memory is directly addressable by a launched
kernel.
MPS (Multi-Process Service)
NVIDIA's CUDA context-sharing daemon (nvidia-cuda-mps-control) that
multiplexes several client processes onto one device-side context, so
co-located tenants share SMs without serialising at the driver. See
docs/MPS-SETUP.md for the startup procedure, capabilities, and
the runtime probe TensorWasm uses to decide between MPS-shared and
per-tenant CUDA contexts.
MIG (Multi-Instance GPU)
Hardware-level GPU partitioning on A100, H100, and later datacenter
parts. The GPU is split into up to seven isolated instances, each with
its own SMs, L2 slice, and memory partition. MIG is the long-term
mitigation for the L2 cache timing side channel called out in
SECURITY.md; it provides hardware
isolation that MPS alone does not.
PTX
NVIDIA's intermediate-representation assembly for CUDA kernels —
human-readable, architecture-agnostic, and consumed by ptxas to
produce the SASS that runs on a specific compute capability. Wasm
guests upload PTX via wasi_cuda_load_ptx; TensorWasm validates it
with ptxas --gpu-name <arch> before caching the compiled module.
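The validation step can be pictured as building a ptxas command line. The following is a minimal sketch, not TensorWasm's actual code: the helper names `ptxas_args` and `ptxas_command` and the file paths are hypothetical, and only the `--gpu-name` and `--output-file` flags are taken from ptxas itself.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: the argument list for compiling `ptx` for `arch`
/// (for example "sm_80") into the output file `out`.
pub fn ptxas_args(arch: &str, ptx: &Path, out: &Path) -> Vec<String> {
    vec![
        "--gpu-name".to_string(),
        arch.to_string(),
        "--output-file".to_string(),
        out.display().to_string(),
        ptx.display().to_string(),
    ]
}

/// Builds (but does not run) the `ptxas` invocation. A caller would run it
/// with `.status()` and treat a non-zero exit as a validation failure.
pub fn ptxas_command(arch: &str, ptx: &Path, out: &Path) -> Command {
    let mut cmd = Command::new("ptxas");
    cmd.args(ptxas_args(arch, ptx, out));
    cmd
}
```

Keeping argument construction separate from execution lets the command line be inspected or logged without requiring the CUDA toolkit to be installed.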
WMMA
Warp Matrix Multiply-Accumulate: the tensor-core instruction class
(the wmma.* family in PTX, with wmma.mma.sync as the compute step) that
performs a small fixed-shape
matrix-multiply-and-accumulate in one warp-wide operation. WMMA is the
path through which the auto-offload pipeline lights up the tensor
cores on Volta-and-newer hardware.
BLAKE3 fingerprint
A 32-byte BLAKE3 digest used as the cache key in
tensor-wasm-exec for compiled Wasm modules. The full digest is
stored (post-H11 fix); earlier code truncated it to 16 bytes, which the
H11 hardening item ruled out as too narrow a key under an
adversarially-chosen-collision threat model.
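To make the difference between the full and the truncated key concrete, here is a small sketch of a cache keyed by the whole 32-byte digest. The `Fingerprint` and `ModuleCache` types are hypothetical stand-ins, the digest is assumed to be computed elsewhere (for instance with the `blake3` crate), and "truncated" is assumed to mean the first 16 bytes.

```rust
use std::collections::HashMap;

/// Hypothetical cache key: the full 32-byte digest, computed elsewhere.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Fingerprint(pub [u8; 32]);

impl Fingerprint {
    /// The narrower key shape: only the first 16 bytes (an assumption
    /// about which half the earlier code kept).
    pub fn truncated(&self) -> [u8; 16] {
        let mut out = [0u8; 16];
        out.copy_from_slice(&self.0[..16]);
        out
    }
}

/// Hypothetical compiled-module cache keyed by the full digest.
#[derive(Default)]
pub struct ModuleCache {
    entries: HashMap<Fingerprint, Vec<u8>>,
}

impl ModuleCache {
    pub fn insert(&mut self, key: Fingerprint, compiled: Vec<u8>) {
        self.entries.insert(key, compiled);
    }

    pub fn get(&self, key: &Fingerprint) -> Option<&[u8]> {
        self.entries.get(key).map(|v| v.as_slice())
    }
}
```

Two digests that agree on their first 16 bytes but differ later would share a truncated key, while the full-digest cache keeps them apart.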
Deopt guard
A check that the JIT inserts around an auto-offloaded kernel to verify, at each invocation, that the preconditions used to admit the offload still hold. On a miss (an alignment, shape, dtype, or tenant-quota violation), the guard falls back to CPU execution rather than launching the kernel.
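The guard logic can be sketched roughly as follows. Every type and field name here (`Admitted`, `Call`, `Route`, `GuardMiss`) is hypothetical, and the order in which the conditions are checked is an assumption; the sketch only illustrates the check-then-fall-back shape.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DType { F32, F16 }

/// Which precondition failed at this invocation.
#[derive(Debug, PartialEq, Eq)]
pub enum GuardMiss { Alignment, Shape, DType, TenantQuota }

/// Preconditions recorded when the offload was admitted (hypothetical).
/// `align` is assumed to be non-zero.
pub struct Admitted { pub align: usize, pub shape: [usize; 2], pub dtype: DType }

/// The facts observed for one invocation (hypothetical).
pub struct Call {
    pub addr: usize,
    pub shape: [usize; 2],
    pub dtype: DType,
    pub quota_left: u64,
    pub cost: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Route { Gpu, CpuFallback(GuardMiss) }

impl Admitted {
    /// Re-checks every admission precondition against this call.
    pub fn check(&self, call: &Call) -> Result<(), GuardMiss> {
        if call.addr % self.align != 0 { return Err(GuardMiss::Alignment); }
        if call.shape != self.shape { return Err(GuardMiss::Shape); }
        if call.dtype != self.dtype { return Err(GuardMiss::DType); }
        if call.cost > call.quota_left { return Err(GuardMiss::TenantQuota); }
        Ok(())
    }

    /// Chooses the GPU only when the guard passes; otherwise the CPU path.
    pub fn route(&self, call: &Call) -> Route {
        match self.check(call) {
            Ok(()) => Route::Gpu,
            Err(miss) => Route::CpuFallback(miss),
        }
    }
}
```

Returning the reason for a miss, rather than a bare boolean, makes it possible to count which precondition fails most often.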
Dispatch future
The F3-backend abstraction in tensor-wasm-exec over kernel-launch
completion. The same DispatchFuture type covers the busy-poll,
sleep-loop, event-driven, and CUDA-async backends, so the call site
does not need to know which polling strategy the operator selected.
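The idea of hiding the polling strategy behind one type can be illustrated with a standard-library-only sketch. This is not the real DispatchFuture: `SketchDispatch` and `WaitStrategy` are hypothetical names, the handle blocks rather than implementing `std::future::Future`, a plain thread stands in for the kernel launch, and only the busy-poll and sleep-loop strategies are modelled.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// How the waiter checks for completion (two of the four strategies).
#[derive(Clone, Copy)]
pub enum WaitStrategy {
    BusyPoll,
    SleepLoop(Duration),
}

/// Hypothetical completion handle; the call site only ever calls `wait`.
pub struct SketchDispatch {
    done: Arc<AtomicBool>,
    strategy: WaitStrategy,
}

impl SketchDispatch {
    /// Runs `work` on a background thread as a stand-in for a kernel launch.
    pub fn launch(strategy: WaitStrategy, work: impl FnOnce() + Send + 'static) -> Self {
        let done = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&done);
        thread::spawn(move || {
            work();
            // Release pairs with the Acquire load in `is_done`.
            flag.store(true, Ordering::Release);
        });
        Self { done, strategy }
    }

    pub fn is_done(&self) -> bool {
        self.done.load(Ordering::Acquire)
    }

    /// Blocks until completion using whichever strategy was selected.
    pub fn wait(&self) {
        while !self.is_done() {
            match self.strategy {
                WaitStrategy::BusyPoll => std::hint::spin_loop(),
                WaitStrategy::SleepLoop(pause) => thread::sleep(pause),
            }
        }
    }
}
```

The caller writes `dispatch.wait()` in both cases; only the constructor argument changes when the operator picks a different strategy.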
Back-pressure semaphore
The bounded counting semaphore that caps the number of in-flight
asynchronous kernel dispatches per tenant context. The bound prevents
the CUDA driver from saturating on a single tenant's burst and is the
mechanism through which tensor-wasm-tenant quota counters take
effect on the async path.
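A bounded in-flight counter of this kind might look like the sketch below. The names `DispatchSlots` and `Permit` are hypothetical, and this version is a non-blocking try-acquire rather than a semaphore that parks the caller; it only shows how a fixed bound and a drop-released permit fit together.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical per-tenant cap on in-flight dispatches.
pub struct DispatchSlots {
    in_flight: AtomicUsize,
    limit: usize,
}

/// Held for the lifetime of one dispatch; releases its slot on drop.
pub struct Permit {
    slots: Arc<DispatchSlots>,
}

impl DispatchSlots {
    pub fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { in_flight: AtomicUsize::new(0), limit })
    }

    /// Takes a slot if one is free; returns `None` when the bound is reached.
    pub fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let mut current = self.in_flight.load(Ordering::Relaxed);
        loop {
            if current >= self.limit {
                return None;
            }
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::Acquire,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(Permit { slots: Arc::clone(self) }),
                Err(actual) => current = actual, // lost a race; retry
            }
        }
    }

    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Relaxed)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.slots.in_flight.fetch_sub(1, Ordering::Release);
    }
}
```

Tying the release to `Drop` means a dispatch that fails or is cancelled still returns its slot.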
Tenant capability
A newtype guard token (added by the H17 hardening item) required to
mutate any TenantContext quota counter. The token is unforgeable
within the crate's public surface, so any code path that updates a
counter must first acquire a TenantCapability; the type
system enforces the invariant.
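The capability-token pattern can be sketched in a few lines. Only the names TenantContext and TenantCapability come from the entry above; the fields, the methods, and the choice to mint the token when the context is created are assumptions made for illustration.

```rust
mod tenant {
    /// The private field means code outside this module cannot construct
    /// the token with a struct literal.
    pub struct TenantCapability {
        _private: (),
    }

    /// Hypothetical context with a single quota counter.
    pub struct TenantContext {
        kernel_launches: u64,
    }

    impl TenantContext {
        /// Assumed minting point: the token is handed out with the context.
        pub fn new() -> (Self, TenantCapability) {
            (
                Self { kernel_launches: 0 },
                TenantCapability { _private: () },
            )
        }

        /// Mutation requires presenting the token by reference.
        pub fn charge_launches(&mut self, _cap: &TenantCapability, n: u64) {
            self.kernel_launches = self.kernel_launches.saturating_add(n);
        }

        /// Reading the counter needs no capability.
        pub fn launches(&self) -> u64 {
            self.kernel_launches
        }
    }
}
```

Outside the `tenant` module, writing `tenant::TenantCapability { _private: () }` fails to compile, so the only way to call `charge_launches` is with a token obtained from `TenantContext::new`.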
Snapshot wire format
The v2 binary format for captured-instance snapshots: zstd-compressed
bincode 2.x legacy encoding, wrapped in a header carrying a magic
constant, a format version byte, and a CRC32 over the payload. The
authoritative byte-layout spec is
crates/tensor-wasm-snapshot/FORMAT.md;
the cross-version compatibility promise is in
SNAPSHOT-COMPATIBILITY.md.
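As an illustration of the header-plus-checksum idea only, the sketch below wraps and validates an already-encoded payload. The byte layout is NOT taken from FORMAT.md: the magic value `TWSN`, the field order, the little-endian CRC field, and the use of the IEEE CRC-32 polynomial are all assumptions, and the zstd and bincode layers are left out.

```rust
pub const MAGIC: [u8; 4] = *b"TWSN"; // hypothetical magic constant
pub const VERSION: u8 = 2; // assumed version byte for the v2 format
const HEADER_LEN: usize = 4 + 1 + 4; // magic + version + CRC32

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError { TooShort, BadMagic, UnsupportedVersion(u8), CrcMismatch }

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Prepends the assumed header to an already-compressed payload.
pub fn seal(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&MAGIC);
    out.push(VERSION);
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Validates magic, version, and checksum, then returns the payload.
pub fn open(bytes: &[u8]) -> Result<&[u8], HeaderError> {
    if bytes.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    let (header, payload) = bytes.split_at(HEADER_LEN);
    if header[..4] != MAGIC {
        return Err(HeaderError::BadMagic);
    }
    if header[4] != VERSION {
        return Err(HeaderError::UnsupportedVersion(header[4]));
    }
    let stored = u32::from_le_bytes([header[5], header[6], header[7], header[8]]);
    if stored != crc32(payload) {
        return Err(HeaderError::CrcMismatch);
    }
    Ok(payload)
}
```

Checking the magic and version before the checksum lets a reader reject foreign or newer files without scanning the whole payload.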
Auto-offload
The runtime decision to JIT a Wasm function down to a GPU kernel,
made by the auto-offload detector based on a vector-op-density
heuristic on the function's body. See
AUTO-OFFLOAD.md for the precise admission rules
and the patterns the detector recognises versus rejects.
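A density heuristic of this general shape can be sketched as the fraction of vector operations in a function body compared against a threshold. The `OpKind` classification and the threshold parameter are hypothetical; the real admission rules are the ones in AUTO-OFFLOAD.md.

```rust
/// Hypothetical coarse classification of the operations in a function body.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum OpKind { Vector, Scalar, Control }

/// Fraction of operations that are vector ops; 0.0 for an empty body.
pub fn vector_density(body: &[OpKind]) -> f64 {
    if body.is_empty() {
        return 0.0;
    }
    let vector_ops = body.iter().filter(|kind| **kind == OpKind::Vector).count();
    vector_ops as f64 / body.len() as f64
}

/// Admits the function for offload when density meets the threshold.
/// The threshold is an illustrative parameter, not a documented value.
pub fn admits(body: &[OpKind], threshold: f64) -> bool {
    vector_density(body) >= threshold
}
```

For a body with three vector ops and one control op, the density is 0.75, so it would pass a threshold of 0.5 and fail one of 0.9.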
Cold start
The interval from the first request for a function instance to that
instance being fully warm and serving steady-state latency. The five
additive components — image fetch, Wasm compile, snapshot restore,
CUDA-context warmup, first-request overhead — are decomposed in
COLD-START.md; snapshot capture/restore latency is
the dominant lever on this number.
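Because the five components are additive, the cold-start figure is simply their sum. The struct below is a hypothetical way to carry that breakdown; the field names mirror the components listed above, and any numbers fed into it are the caller's own measurements.

```rust
use std::time::Duration;

/// Hypothetical breakdown of one cold start into its five components.
pub struct ColdStart {
    pub image_fetch: Duration,
    pub wasm_compile: Duration,
    pub snapshot_restore: Duration,
    pub cuda_context_warmup: Duration,
    pub first_request_overhead: Duration,
}

impl ColdStart {
    /// Total cold-start time: the plain sum of the components.
    pub fn total(&self) -> Duration {
        self.image_fetch
            + self.wasm_compile
            + self.snapshot_restore
            + self.cuda_context_warmup
            + self.first_request_overhead
    }
}
```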
WASI-cuda
The host-interface package, currently versioned as
wasi:cuda/host@0.2.0, that gives Wasm guests typed access to PTX
load and kernel-launch host functions. The WIT lives under wit/ and
is the surface against which both the explicit kernel-dispatch path
and the auto-offload fast path are bound.
pliron / cuda-oxide
The two Rust scaffold backends prototyped in RFC 0001 (next-generation lowering) as candidates for the post-v1 kernel-IR pipeline. Neither is wired into the v0.3.x release — they are scaffold-only — and the RFC records the trade-offs that will inform the eventual lowering-backend choice.