Getting Started with Craton TensorWasm
Welcome to TensorWasm — a runtime for deploying WebAssembly functions with first-class GPU acceleration. This guide takes you from a clean checkout to invoking your first deployed function in about fifteen minutes.
Audience
This guide is written for Rust developers who want to deploy Wasm functions that can offload work to a GPU. You should be comfortable with cargo, basic shell usage, and have a passing familiarity with HTTP APIs. Prior CUDA experience is helpful but not required — TensorWasm can auto-offload many SIMD-shaped Rust loops without you ever writing a kernel by hand.
If you're operating TensorWasm in production rather than developing functions for it, jump straight to DEPLOYMENT.md.
Prerequisites
| Requirement | Why | Notes |
|---|---|---|
| Rust toolchain | Building TensorWasm and your Wasm guests | rustup will pick up the nightly channel automatically from rust-toolchain.toml. |
| CUDA 12.0+ | GPU offload (optional) | Only required if you want GPU acceleration. See CUDA-SETUP.md. |
| Docker | Observability stack | Used by the bundled docker-compose.yml to spin up Jaeger, Prometheus, and Grafana. |
Verify your toolchain:
rustup --version
cargo --version
nvcc --version # only if you want GPU offload
docker --version
If nvcc is missing, that's fine — TensorWasm will fall back to CPU execution for kernels that haven't been explicitly compiled to PTX.
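The same checks can be scripted as a small preflight helper. The sketch below is illustrative and not part of TensorWasm: the names `tool_available` and `missing_tools` are hypothetical, and it assumes each tool accepts a `--version` flag, as the commands above do.

```rust
use std::process::{Command, Stdio};

/// Returns true if `tool --version` can be spawned and exits successfully.
/// A tool that is not on PATH fails to spawn and is reported as unavailable.
fn tool_available(tool: &str) -> bool {
    Command::new(tool)
        .arg("--version")
        .stdout(Stdio::null()) // discard the version banner
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Returns the subset of `required` tools that are not available.
fn missing_tools<'a>(required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|tool| !tool_available(tool))
        .collect()
}
```

Calling `missing_tools(&["rustup", "cargo", "docker"])` returns an empty vector when the required tools are installed. `nvcc` is left out of that list because GPU offload is optional.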
Hello, TensorWasm
The five steps below walk you from git clone to a deployed function answering HTTP requests.
1. Clone and build the workspace
git clone https://github.com/craton-co/craton-tensor-wasm.git
cd craton-tensor-wasm
cargo build --workspace
The first build pulls a handful of large dependencies (wasmtime, the CUDA bindings, the OpenTelemetry stack) and takes 5–10 minutes on a modern laptop. Subsequent builds are incremental.
If you hit build errors related to CUDA, see BUILD.md for the CUDA_ROOT and CUDA_ARCH overrides.
2. Write a hello-world Wasm module
From the parent directory of your craton-tensor-wasm checkout, so that the new crate sits beside the checkout as a sibling:
cargo new --lib hello
cd hello
rustup target add wasm32-wasip1
Replace src/lib.rs with:
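// Export `add` under its unmangled name so the host can look it up.
// On the Rust 2024 edition, write #[unsafe(no_mangle)] instead.
#[no_mangle]
pub extern "C" fn add(a: f32, b: f32) -> f32 {
    a + b
}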
Then add to Cargo.toml:
[lib]
crate-type = ["cdylib"]
Build it:
cargo build --target wasm32-wasip1
You now have target/wasm32-wasip1/debug/hello.wasm. For richer examples — including GPU kernels — read the Wasm Developer Guide.
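Because `add` is an ordinary Rust function, it can also be checked natively before building for Wasm. The sketch below extends the `src/lib.rs` from this step with a unit test. The test module and its name are illustrative additions, and the sketch assumes you run `cargo test` without `--target`, so the test executes on the host.

```rust
// src/lib.rs: the exported function plus a host-side unit test.
#[no_mangle]
pub extern "C" fn add(a: f32, b: f32) -> f32 {
    a + b
}

#[cfg(test)]
mod tests {
    use super::add;

    // Runs on the host via `cargo test`; no Wasm runtime is involved.
    #[test]
    fn adds_two_floats() {
        assert_eq!(add(2.0, 3.0), 5.0);
    }
}
```

The test module is compiled only under `cargo test`, so it adds nothing to the `.wasm` artifact.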
3. Run it locally with the tensor-wasm CLI
From your tensor-wasm checkout:
cargo run --bin tensor-wasm -- run ../hello/target/wasm32-wasip1/debug/hello.wasm
This loads the module into an in-process runtime and runs it once. Use this loop for fast iteration during development — there's no HTTP layer, no scheduler, just the engine.
For the full CLI surface, see CLI.md.
4. Start the HTTP server
To expose your function over the network, start the API gateway:
cargo run --bin tensor-wasm -- serve --addr 0.0.0.0:8080
You should see startup logs ending in something like:
tensor-wasm-api listening on 0.0.0.0:8080
The server is stateless at the gateway tier. Durability lives in snapshots, which are covered in DEPLOYMENT.md.
5. Deploy and invoke
In a second shell:
# Deploy
curl -X POST http://localhost:8080/functions \
-H 'Content-Type: application/octet-stream' \
--data-binary @../hello/target/wasm32-wasip1/debug/hello.wasm
# Response includes a function id, e.g. {"id":"fn_abc123"}
# Invoke (replace fn_abc123 with the id returned by the deploy call)
curl -X POST http://localhost:8080/functions/fn_abc123/invoke \
-H 'Content-Type: application/json' \
-d '{"export":"add","args":[2.0,3.0]}'
# => {"result": 5.0}
You've just deployed and invoked your first TensorWasm function. For the full HTTP surface — async invocation, streaming, batching — see API.md.
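If you drive the API from Rust rather than curl, the request pieces can be assembled as in the sketch below. It relies only on the shapes visible in the curl example above (the `/functions/<id>/invoke` path, the `export` and `args` fields, and an `id` field in the deploy response). The helper names are hypothetical, and the hand-rolled string handling is a stand-in for a real JSON library.

```rust
/// Builds the invoke path for a deployed function id.
fn invoke_path(function_id: &str) -> String {
    format!("/functions/{}/invoke", function_id)
}

/// Renders the invoke body. `{:?}` keeps the decimal point ("2.0", not "2").
/// Assumes `export` needs no JSON escaping and every arg is finite.
fn invoke_body(export: &str, args: &[f32]) -> String {
    let rendered: Vec<String> = args.iter().map(|a| format!("{:?}", a)).collect();
    format!(
        "{{\"export\":\"{}\",\"args\":[{}]}}",
        export,
        rendered.join(",")
    )
}

/// Pulls the string value of "id" out of a deploy response.
/// Minimal scan for illustration only; it does not handle escaped quotes.
fn extract_id(response: &str) -> Option<&str> {
    let after_key = &response[response.find("\"id\"")? + 4..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let value = after_colon.strip_prefix('"')?;
    let end = value.find('"')?;
    Some(&value[..end])
}
```

Feeding the result of `extract_id` into `invoke_path` avoids copying the function id by hand between the deploy and invoke calls.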
Observability
TensorWasm emits OpenTelemetry traces and Prometheus metrics out of the box. The fastest way to see them is the bundled compose stack:
docker compose up -d jaeger prometheus grafana
TENSOR_WASM_OTLP_ENDPOINT=http://localhost:4317 \
TENSOR_WASM_LOG=debug \
cargo run --bin tensor-wasm -- serve --addr 0.0.0.0:8080
Then open http://localhost:16686 for Jaeger traces of every invocation, including per-host-call spans for kernel launches.
The TENSOR_WASM_LOG env var follows the tracing-subscriber directive format: TENSOR_WASM_LOG=info,tensor_wasm_wasm=debug is a useful default during development.
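To make the directive shape concrete, the sketch below splits a directive string into a default level and per-target overrides. It is a simplified illustration, not tracing-subscriber's actual parser: `LogFilter` and `parse_directives` are hypothetical names, every bare token is assumed to be a level, and span or field filters are out of scope.

```rust
use std::collections::BTreeMap;

/// Simplified view of a directive string such as "info,my_crate=debug".
#[derive(Debug, PartialEq)]
struct LogFilter {
    /// Level from a bare directive such as "info", if one was given.
    default_level: Option<String>,
    /// Per-target overrides from "target=level" directives.
    targets: BTreeMap<String, String>,
}

fn parse_directives(spec: &str) -> LogFilter {
    let mut filter = LogFilter {
        default_level: None,
        targets: BTreeMap::new(),
    };
    // Directives are comma-separated; empty segments are skipped.
    for directive in spec.split(',').map(str::trim).filter(|d| !d.is_empty()) {
        match directive.split_once('=') {
            Some((target, level)) => {
                filter.targets.insert(target.to_string(), level.to_string());
            }
            // Assumption: a token without '=' is the default level.
            None => filter.default_level = Some(directive.to_string()),
        }
    }
    filter
}
```

Parsing `info,tensor_wasm_wasm=debug` this way yields a default level of `info` and one override that raises the `tensor_wasm_wasm` target to `debug`.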
For the full metrics catalog and dashboard layout, see OBSERVABILITY.md.
Next steps
Now that you have an end-to-end loop working, here are the docs to read next:
- WASM-DEVELOPER-GUIDE.md: write real compute functions, including hand-tuned GPU kernels via wasi:cuda/host@0.2.0.
- AUTO-OFFLOAD.md: learn when and how TensorWasm automatically promotes hot SIMD loops to GPU kernels, no PTX required.
- API.md: the full HTTP API, covering deploy, invoke, list, delete, and snapshots.
- CUDA-SETUP.md: installing CUDA, picking an sm_* arch, and validating your GPU.
- DEPLOYMENT.md: running TensorWasm in production, capacity planning, and disaster recovery.
Welcome aboard.