TensorWasm

Run untrusted code. On the GPU. Safely. At serverless speed.

License: Apache-2.0

Validated on real GPU: RTX 2060

Tagged releases (v0.1 → v0.3.7)

CPU speed vs Wasmtime 45: Tied

Open external-audit findings

Modern problems require modern solutions.

Modern AI and data workloads want two things that have always been at odds: the isolation of a sandbox — so you can run untrusted, multi-tenant code without it touching the host or its neighbours — and the raw throughput of the GPU, so the work actually finishes on time.

Traditional sandboxes give you safety but keep you on the CPU. Hand-written GPU code gives you speed but no isolation. TensorWasm gives you both in one runtime: sandboxed .wasm guests that dispatch real CUDA kernels through a typed host interface.

This isn't a whitepaper. The full path — Wasm guest → wasi:cuda → cuLaunchKernel → read results back — runs end-to-end on a real NVIDIA GPU, with tests asserting the GPU actually computed the right answer. On the pure-CPU path, throughput is statistically tied with upstream Wasmtime 45.

A snapshot subsystem captures and restores Wasm + GPU state, so cycling many small functions doesn't mean paying full instantiation cost every time. Apache-2.0, with a permissive trademark policy — commercial use, modification, and redistribution all permitted. No open-core bait-and-switch.

Sandboxed by construction

Every workload is a WebAssembly module isolated by Wasmtime. Untrusted code stays in its lane — memory-safe, capability-gated, and deadline-enforced. No escape hatches.

GPU-native, not GPU-adjacent

Guests reach the GPU through a typed wasi:cuda interface. Wasm linear memory is backed by CUDA Unified Memory, so data is reachable from the GPU without a copy.

Multi-tenant from the first line

One process, many tenants — each with scoped bearer tokens, per-token rate limits, and per-tenant GPU memory quotas. Isolation is the architecture, not a deployment pattern bolted on.

Production-ready ops included

Prometheus metrics, end-to-end OpenTelemetry traces, a drop-in Grafana dashboard, structured audit logs, published SLOs, and one runbook per alert — all shipped in the repo.

The Technical Edge

Why experts choose TensorWasm

Typed wasi:cuda host interface

Guests perform explicit kernel dispatch today, with opt-in automatic offload on the roadmap. Wasm linear memory is backed by CUDA Unified Memory for zero-copy data sharing. Requires CUDA 12.0+ and SM_70+ for standard kernels; the CPU path runs anywhere Wasmtime runs.
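To make "explicit kernel dispatch through a typed interface" concrete, here is a minimal host-side sketch. Every name in it (LaunchParams, LaunchError, GpuHost, CpuMockHost, launch_saxpy) is hypothetical and not taken from TensorWasm's actual wasi:cuda definition. A CPU mock stands in for the real cuLaunchKernel call so the example runs without a GPU; the point is only that the guest passes typed arguments and the host validates them before anything is launched.

```rust
// Hypothetical sketch: these names are illustrative assumptions,
// not TensorWasm's real wasi:cuda interface.

/// 1-D launch geometry a guest might pass (assumed shape).
#[derive(Debug, Clone, Copy)]
pub struct LaunchParams {
    pub grid_x: u32,
    pub block_x: u32,
}

#[derive(Debug, PartialEq)]
pub enum LaunchError {
    BadGeometry,
    LengthMismatch,
    TooFewThreads,
}

/// Typed boundary: the guest hands over typed slices and scalars,
/// never a raw device pointer.
pub trait GpuHost {
    fn launch_saxpy(
        &mut self,
        params: LaunchParams,
        a: f32,
        x: &[f32],
        y: &mut [f32],
    ) -> Result<(), LaunchError>;
}

/// CPU stand-in for a real backend that would call cuLaunchKernel.
pub struct CpuMockHost;

impl GpuHost for CpuMockHost {
    fn launch_saxpy(
        &mut self,
        params: LaunchParams,
        a: f32,
        x: &[f32],
        y: &mut [f32],
    ) -> Result<(), LaunchError> {
        // Validate before "launching": CUDA allows at most 1024 threads per block.
        if params.grid_x == 0 || params.block_x == 0 || params.block_x > 1024 {
            return Err(LaunchError::BadGeometry);
        }
        if x.len() != y.len() {
            return Err(LaunchError::LengthMismatch);
        }
        // This mock requires one thread per element (no grid-stride loop).
        let threads = params.grid_x as usize * params.block_x as usize;
        if threads < y.len() {
            return Err(LaunchError::TooFewThreads);
        }
        // Each element computes y[i] = a * x[i] + y[i], as a saxpy kernel would.
        for (yi, xi) in y.iter_mut().zip(x.iter()) {
            *yi = a * *xi + *yi;
        }
        Ok(())
    }
}
```

A caller would construct a CpuMockHost, pass LaunchParams { grid_x: 1, block_x: 32 } with two equal-length slices, and read the result back from y. A GPU-backed implementation of the same trait would keep the validation and replace the loop with the actual launch.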

Multi-tenant isolation & quotas

Scoped bearer tokens, per-token rate limits, and per-tenant GPU memory quotas, all served from one fleet. An OpenAI-compatible gateway exposing /v1/completions and /v1/chat/completions, with streaming responses, sits in front and applies auth and audit to every request.
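As an illustration of how per-token rate limits and per-tenant memory quotas can be enforced, the sketch below pairs a token bucket with simple byte accounting. This is one common technique, shown under assumptions: the types TokenBucket and GpuQuota, their methods, and the caller-supplied clock are all hypothetical and say nothing about how TensorWasm implements its limits.

```rust
use std::collections::HashMap;

/// Token bucket (hypothetical): bursts up to `capacity` requests,
/// refilled continuously at `refill_per_sec`.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_secs: f64,
}

impl TokenBucket {
    pub fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_secs: 0.0 }
    }

    /// Try to admit one request at `now_secs`. The clock is passed in
    /// so the logic stays deterministic and testable.
    pub fn try_acquire(&mut self, now_secs: f64) -> bool {
        let elapsed = (now_secs - self.last_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_secs = self.last_secs.max(now_secs);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Per-tenant GPU memory accounting (hypothetical): every tenant
/// gets the same byte limit in this sketch.
pub struct GpuQuota {
    limit_bytes: u64,
    used: HashMap<String, u64>,
}

impl GpuQuota {
    pub fn new(limit_bytes: u64) -> Self {
        Self { limit_bytes, used: HashMap::new() }
    }

    /// Reserve `bytes` for `tenant`; refuse if it would exceed the limit.
    pub fn reserve(&mut self, tenant: &str, bytes: u64) -> bool {
        let used = self.used.entry(tenant.to_string()).or_insert(0);
        match used.checked_add(bytes) {
            Some(total) if total <= self.limit_bytes => {
                *used = total;
                true
            }
            _ => false,
        }
    }

    /// Return previously reserved bytes to the tenant's budget.
    pub fn release(&mut self, tenant: &str, bytes: u64) {
        if let Some(used) = self.used.get_mut(tenant) {
            *used = used.saturating_sub(bytes);
        }
    }
}
```

In a gateway, one bucket would typically be kept per bearer token and checked before a request is dispatched, while reserve and release would bracket each GPU allocation made on a tenant's behalf.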

Snapshots & fast cold-starts

An 11-crate Rust workspace wrapping Wasmtime (not a fork). A snapshot subsystem captures and restores combined Wasm + GPU state so high-churn, small-function workloads avoid full instantiation cost on every cycle.
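The capture-and-restore idea can be sketched in a few lines. This is a toy model under stated assumptions, not TensorWasm's snapshot format: InstanceState, Snapshot, cold_start, capture, and restore are hypothetical names, GPU buffers are modelled as plain host vectors, and a deep copy stands in for whatever the real subsystem does.

```rust
/// Hypothetical combined guest state: Wasm side plus GPU side.
#[derive(Clone, Debug, PartialEq)]
pub struct InstanceState {
    pub linear_memory: Vec<u8>,
    pub globals: Vec<i64>,
    /// GPU buffers modelled as host vectors for this sketch.
    pub gpu_buffers: Vec<Vec<f32>>,
}

/// An opaque copy of the state at one point in time.
pub struct Snapshot(InstanceState);

impl InstanceState {
    /// Stands in for the expensive path: compile, instantiate, initialise.
    pub fn cold_start(mem_bytes: usize) -> Self {
        Self {
            linear_memory: vec![0; mem_bytes],
            globals: vec![0],
            gpu_buffers: vec![vec![0.0; 4]],
        }
    }

    /// Capture Wasm and GPU state together so they cannot drift apart.
    pub fn capture(&self) -> Snapshot {
        Snapshot(self.clone())
    }

    /// Reset to the captured state instead of paying for a new cold start.
    pub fn restore(&mut self, snap: &Snapshot) {
        *self = snap.0.clone();
    }
}
```

The intended pattern is to call cold_start once, capture immediately after initialisation, and call restore between invocations so each small function starts from the same clean state.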

Ready to secure the future?

Request Expert Briefing