TensorWasm

Feature-Status Matrix — Canonical Source of Truth

This file is the single canonical status source for every major TensorWasm feature: whether it is wired, landed, scaffold-only, hardware-gated, proven on hardware, or planned, and which Cargo feature flag (if any) it sits behind. It exists to end the scaffold-vs-wired drift between README.md, CHANGELOG.md, and docs/OPENAI-COMPAT.md.

Workspace version: 0.3.7 (root Cargo.toml).

When this table and any prose elsewhere disagree, this table wins. README.md, CHANGELOG.md, and the per-feature spec docs defer to it for status; they may add narrative, but the Status column here is authoritative.

Status vocabulary

Term | Meaning
--- | ---
Wired | Reachable and functional on the default build or under a documented feature flag; works end-to-end through the invoke / HTTP / store path it claims.
Landed | Implementation present and exercised, but the production-relevant verification (typically the GPU path) is #[ignore]d pending hardware; host-side behaviour runs today.
Scaffold | Compiles, and surface-area-stable types + tests exist, but the call site returns a documented not-yet-wired sentinel (e.g. NotYetWired / NotYetImplemented / FEATURE_NOT_EXPOSED).
Hardware-gated | Code is written and reviewed but has never executed against real NVIDIA silicon; CI links the CUDA stubs and the proving test is #[ignore = "requires CUDA hardware"]. See HARDWARE-GATED-WORK.md.
Proven on hardware | The path has actually executed against real NVIDIA silicon and produced verified-correct output. The proving test is still #[ignore = "requires CUDA hardware"], so it is skipped in GPU-less CI and needs a GPU runner or a local --features cuda --include-ignored run. Here #[ignore] means "CI has no GPU", not "unproven".
Planned-v0.4 | Not yet implemented; tracked for v0.4+.

Scaffold is deliberately not called "Wired": a green default-build test does not prove that the driver call works. A path is marked Proven on hardware only once it has actually run on a GPU; an unrun CUDA path stays Hardware-gated. The end-to-end GPU launch path (kernel_args_e2e / vector_add_end_to_end_real_ptx_real_kernel) is Proven on hardware, which matches the claim presentation.md makes ("Proven on real silicon").
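
The vocabulary and the rule above amount to a small decision procedure. The Rust sketch below is illustrative only: Status, Evidence, and classify are hypothetical names, not TensorWasm types, and the Landed vs Hardware-gated split is simplified to "does any host-side behaviour run today".

```rust
/// Hypothetical mirror of the status vocabulary (not a TensorWasm type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Wired,
    Landed,
    Scaffold,
    HardwareGated,
    ProvenOnHardware,
    PlannedV04,
}

/// Hypothetical evidence collected for one feature path.
#[derive(Debug, Clone, Copy)]
struct Evidence {
    implemented: bool,
    returns_not_yet_wired_sentinel: bool,
    needs_gpu: bool,
    ran_on_real_gpu: bool,
    host_side_runs_today: bool,
}

fn classify(e: &Evidence) -> Status {
    if !e.implemented {
        return Status::PlannedV04;
    }
    if e.returns_not_yet_wired_sentinel {
        // A green default-build test is not enough: the call site still refuses.
        return Status::Scaffold;
    }
    if !e.needs_gpu {
        return Status::Wired;
    }
    if e.ran_on_real_gpu {
        // An #[ignore] on the proving test only means "CI has no GPU".
        return Status::ProvenOnHardware;
    }
    // GPU path never executed on silicon: Landed if host-side code runs, else gated.
    if e.host_side_runs_today {
        Status::Landed
    } else {
        Status::HardwareGated
    }
}
```

The ordering matters: the sentinel check runs before any GPU check, so a scaffold can never be promoted just because its tests are green.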

Matrix

Feature | Crate(s) | Status | Cargo feature flag | Notes / Tracking-ID
--- | --- | --- | --- | ---
Typed multi-value guest args | tensor-wasm-exec, tensor-wasm-cli, tensor-wasm-api | Wired | none (default) | T33. WasmArg enum + JSON↔Val codec; --args <JSON> is plumbed end-to-end through the CLI, HTTP /invoke{,-async,-stream}, and SpawnConfig::with_args. call_export_with_args supersedes the deprecated call_export.
/invoke-stream SSE streaming | tensor-wasm-wasi-gpu, tensor-wasm-api | Wired | none (default) | T34. Guest wasi:tensor/host.emit-chunk calls surface as SSE event: chunk frames via StreamingContext. Honors T36 cooperative deadlines (DEADLINE-ELAPSED → terminal event: error). Replaces the single not_yet_wired frame emitted in 0.3.7. Spec: STREAMING.md.
Signed kernel registry / DiskRegistry | tensor-wasm-jit (+ tensor-wasm-api for HTTP) | Wired | kernel-registry (jit); kernel-registry-api (api) | T35. Disk-persisted DiskRegistry over tensor-wasm-artifacts::DiskArtifactStore; restart-safe, paginated (list_paginated, cap 1000), with an optional publisher allowlist. The HTTP backend is selected by TENSOR_WASM_API_KERNEL_REGISTRY_DIR. CLI kernel publish|list|verify is wired (B6.4): publish BLAKE3-hashes and signs a KernelManifest and POSTs it to /kernels; list GETs and renders the manifest table; verify recomputes the HMAC locally and compares it in constant time against a manifest blob on disk. Replaces the prior v0.3.6 scaffold that exited FEATURE_NOT_EXPOSED (3). Servers built without --features kernel-registry-api (the default) return 503 kernel_registry_not_configured, which the CLI surfaces as a clear error. Source: crates/tensor-wasm-cli/src/cmd/kernel.rs. Signing envelope is v2 (T12). Spec: KERNEL-REGISTRY.md.
Cooperative epoch yield / deadlines | tensor-wasm-wasi-gpu, tensor-wasm-exec | Wired | none (default) | T36 plus the accompanying executor change. wasi:scheduler/host@0.1.0 SchedulerContext; the executor's per-invocation Instant deadline drives both the scheduler verdicts (CONTINUE / DEADLINE-NEAR / DEADLINE-ELAPSED) and BackPressure acquire rejection (DEADLINE_NEAR_WINDOW = 50ms). Spec: COOPERATIVE-YIELD.md.
Pre-instantiated instance pool | tensor-wasm-exec | Wired | none (default) | T37. InstancePool + InstancePoolConfig are wired through the invoke path; one channel per (tenant, module-hash) pair, with pre-spawn and reset-on-return. Spec: INSTANCE-POOL.md.
Differential correctness oracle | tensor-wasm-jit | Landed (host); GPU path Hardware-gated | differential-oracle | T38. Proptest harness driving DifferentialOracle over matmul / vector_add / conv2d blueprints, with a per-kernel tolerance table. Host (Wasmtime CPU) verdicts run end-to-end today; CUDA GPU verdicts are #[ignore] pending the S22 self-hosted runner. Spec: DIFFERENTIAL-ORACLE.md.
Per-tenant GPU memory quotas (in-process) | tensor-wasm-tenant | Wired | none (default) | T39. TenantContextBuilder::with_gpu_memory_bytes_cap + consume_gpu_bytes / release_gpu_bytes. The in-process counter is the primary accounting source. Spec: GPU-QUOTAS.md.
GPU memory quotas (cuMemPool + host-side cap) | tensor-wasm-mem | Hardware-gated | gpu-mem-pool (strict superset of cudarc-backend) | T39. TenantMemPool gives each tenant a cuMemPool and enforces the cap host-side in allocate (CAS over live_bytes vs cap_bytes), routing allocations through cuMemAllocFromPoolAsync (UnifiedBuffer::new_in_tenant_pool). NB: CU_MEMPOOL_ATTR_RELEASE_THRESHOLD is a retention hint, NOT a driver-level cap; a hardware run (GPU-VALIDATION-2026-05-30 BUG-1) disproved the earlier driver-pin claim. Requires CUDA 11.2+; the driver-reject test is #[ignore]. Bypass-resistant for pool-routed allocations; a tenant holding a raw driver handle is out of scope (see GPU-QUOTAS.md).
Artifact-backed snapshots | tensor-wasm-snapshot, tensor-wasm-artifacts | Wired (default; BREAKING for the default writer) | artifact-backing (on by default) | T40. The default snapshot envelope flipped from legacy inline v3 to DiskArtifactStore-backed v4. Reads still accept legacy v2/v3 indefinitely. Opt out per call via capture_legacy(), or build with --no-default-features --features signed-snapshots.
Unified content-addressed artifact store | tensor-wasm-artifacts | Wired | none (always-on crate; no [features]) | Roadmap #9. ArtifactStore trait + InMemoryArtifactStore + fully implemented DiskArtifactStore (BLAKE3 content hash + zstd + HMAC-SHA256 + 16-byte magic). Backs snapshots (T40) and the JIT disk cache (T30). Spec: ARTIFACT-STORE.md.
OpenAI-compat gateway | tensor-wasm-api | Wired | none (always-on routes) | T41. /v1/completions and /v1/chat/completions translate to an internal invoke via TENSOR_WASM_API_OPENAI_MODEL_MAP (model:uuid,…); responses are buffered or SSE. Closes the 0.3.7 501 openai_not_yet_wired scaffold. Caveats, all deferred to v0.5: argv marshalling calls _start() -> () (no typed prompt arg yet); usage token counts are zeros (no tokenizer); multimodal content is dropped. Spec: OPENAI-COMPAT.md.
CUDA backend: cust (UVM) | tensor-wasm-mem | Hardware-gated | unified-memory (pulls cust) | Default cust 0.3 backing (historically the "default" GPU backing): cuMemAllocManaged + cuMemAdvise + cuMemPrefetchAsync. Round-trip tests are #[ignore].
CUDA backend: cudarc | tensor-wasm-mem | Hardware-gated | cudarc-backend (pulls cudarc) | cust → cudarc migration spike (W1.2 / CUDARC-SPIKE.md). Parallel UnifiedBuffer impl. Allocation/prefetch tests are #[ignore].
CUDA backend: cuda-oxide | tensor-wasm-mem, tensor-wasm-jit | Scaffold | cuda-oxide-backend | RFC 0001 v0.5 cust successor. Dep-less (no cuda-oxide crate pulled in): on tensor-wasm-mem it exposes CudaOxideUnifiedBuffer, which returns the NOT_YET_WIRED sentinel; on tensor-wasm-jit it pulls pliron 0.15 (crates.io) for the pliron_dialect scaffold. The git-pinned experimental-cuda-oxide-host-backend feature and its cuda-host / cuda-core / cuda-async git deps were removed this cycle pending crates.io publication (re-add per CUDA-OXIDE-CUTOVER.md).
Real CUDA host functions / kernel launch | tensor-wasm-exec, tensor-wasm-wasi-gpu, tensor-wasm-snapshot, tensor-wasm-tenant | Proven on hardware (end-to-end launch); lower-level host fns still partly Hardware-gated | cuda | Real wasi:cuda host fns, contexts, cuLaunchKernel, GPU snapshot restore. Typed argv lowering for scalar + pointer kernel args (W1.1). The full launch path (Wasm guest → wasi:cuda → cuLaunchKernel → results read back and asserted, c[i]==a[i]+b[i] from managed memory) is VERIFIED ON REAL SILICON: kernel_args_e2e (incl. vector_add_end_to_end_real_ptx_real_kernel) passes 8/8 on an RTX 2060 (cc 7.5, CUDA 13.2), re-confirmed 2026-06-01. These tests remain #[ignore = "requires CUDA hardware"] only so that GPU-less CI skips them; running them needs a GPU runner or a local --features cuda --include-ignored run. Lower-level cuda host fns still flagged UNVERIFIED-PENDING-HARDWARE in HARDWARE-GATED-WORK.md keep that flag unless this e2e proof covers them. See GPU-VALIDATION-2026-05-30.md.
JIT auto-offload pipeline | tensor-wasm-jit | Wired (pipeline always compiled); CUDA wiring Hardware-gated | auto-offload (gates extra CUDA-side wiring) | The Cranelift-free detector → BlockIR → PTX-text pipeline is always compiled and runs on the host; the feature only gates CUDA-side wiring, tested under --features cuda. Spec: AUTO-OFFLOAD.md.
JIT MatMul / wmma PTX emission | tensor-wasm-jit | Scaffold (refused by default); wmma hardware-unverified | runtime EmitConfig::enable_experimental_matmul (no Cargo flag) | MatMul returns EmitError::NotYetImplemented by default. The wmma.mma.sync sm_80+ lowering fires only when enable_experimental_matmul = true, and even then the emitted PTX is hardware-unverified (the dev RTX 2060 is SM_75 and cannot run it; it needs the SM_89 runner). See HARDWARE-GATED-WORK.md item 7.
pliron PTX pipeline (stages) | tensor-wasm-jit | Scaffold | cuda-oxide-backend; stage-2 twasm.* → llvm.* under pliron-llvm-backend | The pliron_dialect / pliron_lowering stages return NotYetImplemented / NotYetWired sentinels. pliron-llvm-backend is a strict superset of cuda-oxide-backend and carries a hard llvm-sys = "221" dependency (llvm-sys encodes the LLVM version in its crate version, so this needs a system LLVM 22.1). Spec: PLIRON-PIPELINE.md.
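
The cooperative-yield row describes one per-invocation deadline feeding both the scheduler verdicts and back-pressure. A minimal sketch of that idea, assuming hypothetical names (Verdict, verdict, try_acquire) that only mirror the three verdicts and the 50 ms window from the table; the real SchedulerContext and BackPressure types may behave differently:

```rust
use std::time::{Duration, Instant};

/// Window before the deadline in which new work is refused (value taken from the table).
const DEADLINE_NEAR_WINDOW: Duration = Duration::from_millis(50);

/// Hypothetical mirror of CONTINUE / DEADLINE-NEAR / DEADLINE-ELAPSED.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Continue,
    DeadlineNear,
    DeadlineElapsed,
}

/// Derive a verdict from the invocation's deadline and the current instant.
fn verdict(deadline: Instant, now: Instant) -> Verdict {
    // checked_duration_since returns None when `now` is already past `deadline`.
    match deadline.checked_duration_since(now) {
        None => Verdict::DeadlineElapsed,
        Some(remaining) if remaining.is_zero() => Verdict::DeadlineElapsed,
        Some(remaining) if remaining <= DEADLINE_NEAR_WINDOW => Verdict::DeadlineNear,
        Some(_) => Verdict::Continue,
    }
}

/// Hypothetical back-pressure gate: admit work only while the verdict is Continue.
fn try_acquire(deadline: Instant, now: Instant) -> Result<(), Verdict> {
    match verdict(deadline, now) {
        Verdict::Continue => Ok(()),
        other => Err(other),
    }
}
```

Passing `now` in explicitly keeps both functions pure, so they can be tested without sleeping.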
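
The host-side cap in the cuMemPool quota row is described as a CAS over live_bytes vs cap_bytes. A minimal sketch of that accounting step alone, assuming a hypothetical QuotaCounter type; the cuMemAllocFromPoolAsync call and any rollback after a failed driver allocation are deliberately left out:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-tenant accounting: bytes currently live vs the tenant cap.
struct QuotaCounter {
    live_bytes: AtomicU64,
    cap_bytes: u64,
}

impl QuotaCounter {
    fn new(cap_bytes: u64) -> Self {
        Self { live_bytes: AtomicU64::new(0), cap_bytes }
    }

    /// Reserve `n` bytes, or return false if that would exceed the cap.
    fn try_reserve(&self, n: u64) -> bool {
        let mut current = self.live_bytes.load(Ordering::Acquire);
        loop {
            // Reject on overflow or when the new total would pass the cap.
            let next = match current.checked_add(n) {
                Some(total) if total <= self.cap_bytes => total,
                _ => return false,
            };
            // Publish only if no other thread changed the counter in between.
            match self.live_bytes.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return true,
                Err(observed) => current = observed, // retry with the fresh value
            }
        }
    }

    /// Return `n` previously reserved bytes to the tenant's budget.
    fn release(&self, n: u64) {
        self.live_bytes.fetch_sub(n, Ordering::AcqRel);
    }
}
```

Because the check and the update happen in one atomic step, two racing allocations cannot both slip under the cap; in a real allocator the reservation would precede the driver call and be released if that call fails.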
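
The kernel registry's verify step recomputes the HMAC locally and compares it in constant time. A dependency-free sketch of just the comparison, where ct_eq is a hypothetical helper; production code would more typically rely on the subtle crate's ConstantTimeEq or the hmac crate's Mac::verify_slice, since a hand-rolled loop gives no hard guarantee against compiler optimisations:

```rust
/// Compare two MAC tags without stopping at the first differing byte.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    // Tag length is public (fixed by the MAC algorithm), so an early return here leaks nothing.
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; any difference leaves a set bit.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}
```

The loop visits every byte regardless of where the first mismatch is, so the running time does not reveal how long the matching prefix was.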

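The OpenAI gateway's TENSOR_WASM_API_OPENAI_MODEL_MAP is a comma-separated model:uuid list. A minimal parsing sketch, assuming that format exactly as written in the matrix; parse_model_map and looks_like_uuid are hypothetical, and the duplicate check, lowercasing, and splitting on the last ':' (so a model name such as llama3:8b keeps its colon) are illustrative choices, not the gateway's documented behaviour:

```rust
use std::collections::HashMap;

/// Parse a hypothetical "model:uuid,model:uuid" string into model -> lowercase uuid.
fn parse_model_map(raw: &str) -> Result<HashMap<String, String>, String> {
    let mut map = HashMap::new();
    for entry in raw.split(',').map(str::trim).filter(|e| !e.is_empty()) {
        // Split on the LAST ':' because a UUID never contains one.
        let (model, id) = entry
            .rsplit_once(':')
            .ok_or_else(|| format!("entry {entry:?} is not model:uuid"))?;
        if model.is_empty() || !looks_like_uuid(id) {
            return Err(format!("entry {entry:?} has an empty model or malformed uuid"));
        }
        if map.insert(model.to_string(), id.to_ascii_lowercase()).is_some() {
            return Err(format!("model {model:?} is mapped twice"));
        }
    }
    Ok(map)
}

/// Shape check only: 8-4-4-4-12 hex digits separated by hyphens.
fn looks_like_uuid(s: &str) -> bool {
    s.len() == 36
        && s.char_indices().all(|(i, c)| match i {
            8 | 13 | 18 | 23 => c == '-',
            _ => c.is_ascii_hexdigit(),
        })
}
```

Rejecting malformed entries at startup, rather than at request time, turns a typo in the environment variable into an immediate, readable error.
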
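Other capabilities (mostly always on)

These sit outside the main matrix. Most ship by default and are configured at runtime (env/config); a few are controlled by a default-on or opt-in Cargo feature, as the Configuration column states. They are included here so the matrix doubles as a complete status reference: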

Capability | Status | Configuration
--- | --- | ---
Signed snapshots (HMAC-SHA256, v3) | Wired | signed-snapshots feature (on by default)
Capability-to-registry binding | Wired (unconditional; enforced even with default-features = false) | strict-cap-binding gates only the typed *_strict admin APIs
Scoped bearer tokens / audit log / per-token rate limit / HTTP metrics | Wired | env vars (TENSOR_WASM_API_*)
Async execution + epoch interruption | Wired | always-on behaviour, not a flag
NVIDIA MPS shared contexts | Runtime-selected | env/config, not a Cargo flag; see MPS-SETUP.md
OTLP exporter | Opt-in | otlp feature (tensor-wasm-core)

Sources

  • ../CHANGELOG.md [0.3.7] section and the T-task wire-up notes (T30, T33–T41, T8/T9/T12).
  • Per-crate [features] tables in each crates/*/Cargo.toml.
  • OPENAI-COMPAT.md — T41 wiring + v0.5 caveats.
  • HARDWARE-GATED-WORK.md — authoritative inventory of unverified-on-silicon paths.
  • The per-feature spec docs linked in the Notes column.