Craton TensorWasm — Documentation Index

The single-page sitemap for every Markdown document shipped with Craton TensorWasm. The grouping mirrors how a reader actually navigates the project: pick the section that matches your role, follow the link, and the linked doc is the contract.

The wave tag in parentheses (W1.1, W2.3, etc.) records which v0.2–v0.4 hardening wave landed the document; docs without a tag predate the wave program. The link path is relative to this file (i.e. relative to docs/), so ../GOVERNANCE.md points at the repository root.

If a doc is reachable from README.md as well, it is listed in the table at the bottom of that file. The Missing cross-links section further down this page enumerates the docs that are reachable only via this index; a future PR may add anchors for them in README.md.

What this index is and is not

This index is the canonical inventory of in-repository Markdown documentation: every doc that ships in the source tree has one primary section below, and the few docs that are cross-listed (for example AUTO-OFFLOAD.md) say so in the second row. The summary text is the single-sentence abstract a reader needs to decide whether to open the doc.

This index is not a tutorial, a learning path, or a status page.

The index is also not the right surface to host long-form content. Anything that needs more than the one-line summary below belongs in the linked doc itself; if a section starts growing past the "one row per doc" rule, the section is wrong, not the rule.

Contents

  1. What this index is and is not
  2. Conventions
  3. Getting started
  4. Architecture and internals
  5. API surface
  6. Performance and benchmarking
  7. CUDA
  8. Operations
  9. Security
  10. Governance and supply chain
  11. Snapshots
  12. Missing cross-links
  13. Audience routing
  14. How to extend this index

Conventions

  • Link paths are relative to this file (docs/INDEX.md). A leading ../ therefore steps up into the repository root; a bare filename resolves inside docs/ itself.
  • Wave tags trace each doc back to a workstream entry in PATH-TO-V1.md: W<wave>.<task> matches the row in the per-area workstream tables. Docs that predate the wave program are marked with an em dash.
  • Summaries are deliberately single-sentence. If a doc warrants more context the doc itself should open with that paragraph so this index can quote a one-liner from it.
  • No emoji, no badges. This file is a flat sitemap, optimised for grep, not for visual scanning.
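The link-path convention above can be checked mechanically. The following is a minimal sketch, assuming only that the index lives at docs/INDEX.md as stated; the helper name `resolve_link` is hypothetical and not part of the repository.

```python
import posixpath

# Location of this index inside the repository, as stated in the conventions.
INDEX_PATH = "docs/INDEX.md"


def resolve_link(link: str, index_path: str = INDEX_PATH) -> str:
    """Turn a link written in the index into a repository-root-relative path.

    `resolve_link` is a hypothetical helper used only for illustration.
    """
    base_dir = posixpath.dirname(index_path)  # "docs"
    return posixpath.normpath(posixpath.join(base_dir, link))


if __name__ == "__main__":
    # A leading ../ steps up into the repository root.
    print(resolve_link("../GOVERNANCE.md"))
    # A bare filename resolves inside docs/ itself.
    print(resolve_link("CLI.md"))
    # Subdirectories of docs/ resolve the same way.
    print(resolve_link("runbooks/README.md"))
```

Using `posixpath` rather than `os.path` keeps the result identical on every platform, since link paths in the index always use forward slashes.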

Getting started

The narrow on-ramp for a new contributor or operator: from a clean checkout to a function running against a deployed gateway. Start with GETTING-STARTED.md, then branch into the role-specific guide (CLI.md for the developer, the production-deployment tutorial for the operator).

| Doc | Wave | One-sentence summary |
| --- | --- | --- |
| GETTING-STARTED.md | — | Fifteen-minute onboarding tutorial that walks a Rust developer from a clean checkout to invoking a deployed Wasm function. |
| CLI.md | — | Complete reference for the tensor-wasm developer CLI, its subcommands, global flags, exit codes, and JSON argument conventions. |
| tutorials/production-deployment.md | W3.8 | End-to-end tutorial that takes a competent SRE from a fresh Kubernetes cluster to a production-ready TensorWasm deployment with mTLS, Prometheus, Grafana, audit log, and a deployed function. |
| MIGRATING-FROM-WASMTIME-WASMER.md | W3.9 | Honest evaluation guide for teams already running upstream Wasmtime, Wasmer, or a Spin/Wasmer-Edge FaaS deciding whether to move workloads onto TensorWasm. |
| WASM-DEVELOPER-GUIDE.md | — | Walkthrough for writing Wasm guests against TensorWasm, from a trivial add(a, b) through the wasi:cuda host imports and the auto-offload fast path. |
| BUILD.md | — | Build matrix for the three supported configurations (no-CUDA, CUDA host, CUDA stub) plus the canonical feature-flag taxonomy. |

Architecture and internals

Background for contributors changing the runtime itself: how the crates fit together, the upstream-pinning decisions, the JIT pipeline shape, and the cold-start latency model that the snapshot subsystem exists to fight. The root ARCHITECTURE.md is the entry point; everything else in this section is a deeper cut into one subsystem.

| Doc | Wave | One-sentence summary |
| --- | --- | --- |
| ../ARCHITECTURE.md | W5.2 (refresh) | The eleven-crate dependency graph, the layered execution model, and the trust boundaries between Wasm guest, host process, and CUDA driver. |
| WASMTIME-FORK.md | — | The decision record that explains why TensorWasm does not fork Wasmtime, and which alternative simplified-IR path the JIT detector walks instead. |
| RISKS.md | — | Living risk register tracking architectural risks, upstream pinning decisions, and known limitations, refreshed alongside every CHANGELOG.md release. |
| AUTO-OFFLOAD.md | — | User-facing reference for the auto-offload pipeline: which Wasm patterns the detector recognises, which it rejects, and how to enable it. |
| CUDARC-SPIKE.md | W1.2 | The cust-to-cudarc migration spike record: version chosen, API mapping table, known gaps, and the recommended cutover plan. |
| COLD-START.md | — | The five-component additive model for cold-start latency on a TensorWasm node and the operator levers that affect each component. |
| INSTANCE-POOL.md | B5.8 | Roadmap feature #5 (pre-instantiated instance pool): the wired (T37) warm pool through the invoke path, configuration knobs, and the reset-on-return contract. |
| KERNEL-REGISTRY.md | B6.3 | Roadmap feature #3 (signed kernel registry): HMAC-SHA256 KernelManifest records and the wired (T35) disk-persisted DiskRegistry over the artifact store, with paginated GET /kernels. |
| DIFFERENTIAL-ORACLE.md | B5.9 | Roadmap feature #6 (differential JIT correctness oracle): bit-identity assertion contract between the Wasmtime CPU path and the JIT PTX path, plus the per-kernel tolerance policy. |
| ARTIFACT-STORE.md | B6.6 | Roadmap feature #9 (unified content-addressed signed artifact store): the tensor-wasm-artifacts trait surface, on-disk envelope, and the now-wired convergence that backs snapshots (T40) and the JIT L2 cache (T30). |
| glossary.md | — | Short paragraph definitions of recurring CUDA, Wasm, and TensorWasm-internal terms (UVM, MPS, MIG, PTX, WMMA, BLAKE3 fingerprint, deopt guard, dispatch future, etc.). |

API surface

The stable wire and binary surfaces TensorWasm commits to: HTTP REST, audit-log JSON, the published Rust + OpenAPI reference archive, and the mTLS contract that fronts them all. The hand-written REST reference in crates/tensor-wasm-api/API.md is the canonical surface for humans; the per-release rustdoc + OpenAPI bundle described in API-REFERENCE.md is the canonical surface for tooling.

| Doc | Wave | One-sentence summary |
| --- | --- | --- |
| ../crates/tensor-wasm-api/API.md | — | Hand-written REST reference for every endpoint the tensor-wasm-api gateway serves, with request/response examples for each route. |
| API-REFERENCE.md | W4.8 | Publication policy for the per-release rustdoc + OpenAPI archive: what is in it, what is not, the URL contract, and the workflow that produces it. |
| AUDIT-LOG.md | W2.2 | Wire-format schema, sink configuration, rotation guidance, and stable-string contract for the structured audit log emitted on state-mutating routes. |
| STREAMING.md | B6.1 | Roadmap feature #2 (streaming HTTP invoke responses): the wasi:tensor/host.emit-chunk host-fn contract and the wired (T34) SSE / chunked-transfer path that surfaces real guest chunks. |
| OPENAI-COMPAT.md | B4.9 / B5.6 | Roadmap feature #10 (OpenAI-compatible inference gateway shim): the /v1/completions and /v1/chat/completions routes, wired (T41) to internal invoke via TENSOR_WASM_API_OPENAI_MODEL_MAP (buffered or SSE), closing the earlier 501 openai_not_yet_wired scaffold. |
| deployment/mtls.md | W2.8 | Two production deployment shapes — self-terminated rustls and reverse-proxy fronting — with a recommended path for the v0.4 binary that still binds plaintext. |

Performance and benchmarking

How TensorWasm measures itself, where the published numbers come from, the operator-side SLO contract, and the runbook/dashboard pair that turns burn-rate alerts into mitigation steps. Internal regression (PERFORMANCE.md + committed baseline.json) and external comparison (BENCHMARKING.md) are split deliberately — the latter pulls in the anti-cheating checklist a reader needs to reproduce a result a blog post would publish.

| Doc | Wave | One-sentence summary |
| --- | --- | --- |
| PERFORMANCE.md | — | How TensorWasm measures performance, what the current reference numbers look like, and how the CI regression gate built on the committed baseline.json works. |
| BENCHMARKING.md | — | Companion to PERFORMANCE.md focused on external comparisons: same-workload, same-hardware, same-statistics rules for honest competitive benchmarks. |
| CAPACITY-PLANNING.md | W4.4 | Three reference SKUs, four sizing formulas, and tenants-per-host curves that translate the SLO targets and bench medians into a host-sizing answer. |
| SLO.md | W1.9 | The project's commitment to numeric availability, latency, and error-rate targets for the HTTP surface and kernel-dispatch path. |
| dashboards/README.md | W2.5 | Index for the importable Grafana dashboard (tensor-wasm-overview.json) that gives one stat panel per SLI and one row per subsystem. |
| runbooks/README.md | W2.6 | One-page-per-alert operator manual: the alert → runbook mapping from SLO.md §7 with shared mitigation-step structure across every page. |

CUDA

The toolkit-install path, the multi-tenant MPS daemon contract, the kernel-authoring guide, and the auto-offload reference for the JIT pipeline. A reader new to TensorWasm's CUDA story should read CUDA-SETUP.md first (matrix of toolkit + driver + arch), then CUDA-KERNELS.md to write a kernel, then MPS-SETUP.md once they need more than ~8 co-located tenants on one GPU.

| Doc | Wave | One-sentence summary |
| --- | --- | --- |
| CUDA-SETUP.md | W1.6 | The exact toolkit, driver, compiler, environment-variable, and verification matrix to bring a CUDA host online for TensorWasm development. |
| MPS-SETUP.md | — | NVIDIA MPS daemon startup, capabilities, limits, and the runtime probe TensorWasm uses to decide between MPS-shared and per-tenant CUDA contexts. |
| AUTO-OFFLOAD.md | — | User-facing reference for which Wasm patterns the auto-offload JIT recognises and how to enable it (also listed under Architecture). |
| CUDA-KERNELS.md | W4.5 | Practical guide for developers writing CUDA kernels that load and dispatch under TensorWasm's wasi:cuda surface, covering both explicit and auto-offload paths. |
| PLIRON-PIPELINE.md | — | Four-wave implementation plan for the Pliron-based auto-offload pipeline (Wasm to PTX via the interim LoweredOp IR and cuda-oxide), companion to RFC 0001. |
| CUDA-OXIDE-CUTOVER.md | — | Eight-step cutover runbook for the day cuda-oxide v0.2 ships: dependency bump through default-backend flip, gated on four pre-conditions per RFC 0001 Option C. |
| HARDWARE-GATED-WORK.md | — | Authoritative inventory of the CUDA code paths that are written but unverified on hardware (allocation/prefetch backends, async dispatch, device-memory host fns, try_grow_in_place, experimental wmma MatMul, cuda-oxide host backend) and how the gated gpu.yml CI lane validates each. |

Operations

Everything an operator running TensorWasm in production needs after the gateway is alive: deployment topology, manifest sets for the three supported orchestrators (Kubernetes, Helm, Nomad), upgrade and backup playbooks, and the observability contract. The three orchestrator deliverables (deploy/k8s/, deploy/helm/tensor-wasm/, deploy/nomad/) describe the same single-instance runtime — an operator can switch between them without re-learning the env-var surface.

| Doc | Wave | One-sentence summary |
| --- | --- | --- |
| DEPLOYMENT.md | — | The canonical production-topology reference: load balancer, gateway replicas, GPU pool, MPS, and disaster-recovery sequencing. |
| ../deploy/k8s/README.md | W2.7 | Plain-YAML Kubernetes reference manifests (namespace, configmap, deployment, service, ServiceMonitor) for self-managed installs. |
| ../deploy/helm/tensor-wasm/README.md | W2.7 | Templated Helm chart for the same single-gateway topology as the plain manifests, with a values-driven install surface. |
| ../deploy/nomad/README.md | W5.6 | HashiCorp Nomad reference job specs (docker and raw_exec) for the same single-instance runtime as the k8s and Helm assets. |
| UPGRADE.md | W3.3 | Operator-facing fleet upgrade playbook describing the opinionated sequence for rolling a running TensorWasm deployment from one release to another. |
| BACKUP-RESTORE.md | W3.7 | What a production TensorWasm deployment must back up, the tested strategies, the restore paths, and the validation procedure that confirms a backup is good. |
| OBSERVABILITY.md | — | The tracing span schema, the optional OTLP exporter stack, and how to wire a local collector for development. |
| CONFIG.md | B2.9 | Single-source reference for every environment variable consumed by tensor-wasm, grouped by crate, with default + type + effect columns. |
| GPU-QUOTAS.md | B6.5 | Roadmap feature #8 (per-tenant GPU memory quotas): the wired (T39) in-process counter as primary accounting via TenantContextBuilder, plus the host-side cuMemPool cap (hardware-gated, behind gpu-mem-pool). |
| COOPERATIVE-YIELD.md | B6.4 | Roadmap feature #4 (cooperative deadlines via WASI yield): the wasi:scheduler/host@0.1.0 protocol, the CONTINUE / DEADLINE-NEAR / DEADLINE-ELAPSED return codes, and the embedder wiring snippet. |

Security

The threat model, the v0.1 audit findings, the backport policy that governs how security fixes flow into supported release branches, and the runbook a maintainer follows to rehearse a coordinated disclosure end to end before a real CVE arrives. Reports go to security@craton.com.ar (covered in ../SECURITY.md).

| Doc | Wave | One-sentence summary |
| --- | --- | --- |
| ../SECURITY.md | W3.5 (backport policy), M8.5 (snapshot HMAC) | TensorWasm's threat model, isolation strategy summary, the optional snapshot HMAC authentication (cross-linked to the v2 → v3 migration), and the backport policy that decides which security fixes land on which release branches. |
| SECURITY-AUDIT.md | — | The v0.1 security-audit findings: methodology (manual walk + cargo-fuzz), per-asset verdict, and the follow-up tracking for partially-mitigated items. |
| TESTING.md | B2.9 | Testing conventions across the workspace: unit/integration/CUDA/fuzz layers, the #[ignore] policy for hardware-gated tests, and the CI matrix that runs them. |
| FUZZING.md | B2.9 | The fuzz/ directory layout, per-target corpora, the nightly + weekly cron schedule, and the v0.5 24-hour gate that determines when a target counts as "covered". |
| runbooks/cve-disclosure-dry-run.md | W5.5 | Manual procedure for rehearsing the CVE disclosure pipeline end-to-end on a test repository before a real CVE arrives. |

Governance and supply chain

The maintainer registry, the decision process, the RFC pipeline, release engineering (CHANGELOG, MIGRATION, PATH-TO-V1), and the supply-chain commitments (SBOM, reproducible builds, Wasmtime cadence, trademark policy) that together ground the v1.0 gate in PATH-TO-V1.md. The split is deliberate: GOVERNANCE.md is the rules, MAINTAINERS.md is the registry, and the RFC pipeline is the mechanism that produces every other change that touches them.

| Doc | Wave | One-sentence summary |
| --- | --- | --- |
| ../GOVERNANCE.md | W1.8 | Lightweight, opinionated governance model for a small core team, modeled on Wasmtime/ripgrep/zellij rather than the CNCF TOC. |
| ../MAINTAINERS.md | W5.4 | Source of truth for the current maintainer roster and the active-maintainer count used by the quorum math in GOVERNANCE.md. |
| TRADEMARK.md | W3.4 | Permissive trademark policy: forks and re-implementations may reuse the "Craton TensorWasm" and "TensorWasm" names; only substantive forks must rebrand for clarity. |
| SBOM.md | W4.3 | What the CycloneDX SBOM shipped with every release contains, what it does not contain, how to regenerate it locally, and the maintainer contract. |
| REPRODUCIBLE-BUILDS.md | W3.6 | Recipe for two independent builds of the same release tag producing bit-identical sha256 digests on Linux x86_64 with the pinned toolchain. |
| WASMTIME-UPGRADE.md | W2.9 | Cadence policy for Wasmtime version bumps: quarterly minor bumps, major bumps case-by-case, plus the per-bump maintainer checklist. |
| RELEASE.md | B2.9 | Release-engineering runbook: tag preconditions, the per-release CHANGELOG / SBOM / cosign step sequence, and the @craton-co/release ownership contract. |
| ../CHANGELOG.md | W3.1 | Keep-a-Changelog log of every notable change, grouped by semver release; the [Unreleased] section tracks the v0.2–v0.4 wave work staged on main. |
| MIGRATION-v0-to-v1.md | W3.2 | Operational checklist a v0.x deployment follows to land on v1.0 cleanly, populated continuously between v0.1 and v1.0. |
| PATH-TO-V1.md | — | The proposed five-milestone roadmap from the current v0.1.0 preview to a v1.0 production release, with explicit anti-goals and open decisions. |
| FEATURE-STATUS.md | — | Canonical per-feature status matrix (Wired / Landed / Scaffold / Hardware-gated / Planned-v0.4) mapping each major feature to its crate(s) and Cargo feature flag; the single source of truth that README, CHANGELOG, and OPENAI-COMPAT defer to for status. |
| ../rfcs/README.md | W1.7 | Lightweight RFC process: one contributor writes a doc, opens a PR, gives reviewers a week, and a maintainer decides. |
| ../rfcs/TEMPLATE.md | W1.7 | The required starting point for a new RFC; copy to rfcs/0000-short-kebab-slug.md and fill in the sections in order. |

Snapshots

The on-disk format spec and the cross-version compatibility promise. SNAPSHOT-COMPATIBILITY.md is the which-version-restores-what contract; crates/tensor-wasm-snapshot/FORMAT.md is the what-the-bytes-are spec. The two are kept in sync deliberately: a wire-format bump touches both files in the same PR.

| Doc | Wave | One-sentence summary |
| --- | --- | --- |
| SNAPSHOT-COMPATIBILITY.md | W1.3, M8.5 (v2 → v3) | The cross-version compatibility promise: which TensorWasm versions can restore which on-disk snapshot versions, the format-bump procedure, and the v2 → v3 signed-snapshot migration (provision key → configure reader → configure writer → flip to strict mode). |
| ../crates/tensor-wasm-snapshot/FORMAT.md | — | The wire-format specification — the byte layout SnapshotWriter::capture produces and SnapshotReader::restore consumes, including the magic constant and current version. |

Missing cross-links

The docs below ship in the repository and are reachable through this index, but do not currently have inbound links from README.md. They are flagged here so a future PR can add table rows for them. This index does not modify any other document; the missing anchors are intentionally left for a separate change.

The runbook README itself is linked from README.md, but the individual runbook pages it lists are not — they are reachable only through that index, which is the intended structure for an on-call manual (operators navigate from the alert payload via the index, not from the project landing page). Listed here for completeness.

Discoverable only through this index

| Doc | Why it should appear in README.md |
| --- | --- |
| ../crates/tensor-wasm-snapshot/FORMAT.md | Snapshot wire-format spec — currently reachable only from SNAPSHOT-COMPATIBILITY.md and crate-internal docs; the format is part of the public contract per SNAPSHOT-COMPATIBILITY.md and warrants a top-level anchor. |
| ../deploy/nomad/README.md | Nomad reference manifests (W5.6) ship alongside the k8s and Helm assets, both of which are already in the README.md Operations table. Adding a row keeps the three orchestrators symmetric. |
| ../rfcs/TEMPLATE.md | Linked from the Contributing section of README.md (line 235); already covered, listed here only so the inventory is complete. |
| RISKS.md | Living risk register, referenced from CHANGELOG.md but not directly from README.md's Architecture & reference table. |

Discoverable only via runbooks/README.md (by design)

These pages are intentionally reached through their index because the alert payload, not the project landing page, is what an operator opens during a page:

| Doc | Wave | Summary |
| --- | --- | --- |
| runbooks/availability-fast-burn.md | W2.6 | Runbook for the 14.4× burn-rate alert against availability_http. |
| runbooks/availability-slow-burn.md | W2.6 | Runbook for the 6× burn-rate alert against availability_http. |
| runbooks/availability-very-slow-burn.md | W2.6 | Runbook for the 1× sustained-burn alert against availability_http. |
| runbooks/dispatch-latency-spike.md | W2.6 | Runbook for the kernel-dispatch P95 latency SLO breach (tensor_wasm_kernel_latency_seconds). |
| runbooks/invoke-latency-spike.md | W2.6 | Runbook for the POST /functions/{id}/invoke P95 latency SLO breach. |
| runbooks/healthz-slow.md | W2.6 | Runbook for the GET /healthz P95 latency SLO breach. |
| runbooks/rollback.md | W2.6 | Manual procedure for reverting a TensorWasm node from a bad release. |
| runbooks/oncall-paging.md | W2.6 | Manual procedure for escalating from operator-handling to waking the on-call maintainer. |
| runbooks/trace-id.md | W2.6 | Non-alert reference for finding logs that share a given trace id. |
| runbooks/disaster-recovery.md | W3.7 | Manual procedure for bringing a TensorWasm deployment back online after the host kernel, snapshot store, or CUDA driver has been lost. |
| runbooks/cve-disclosure-dry-run.md | W5.5 | Manual procedure for rehearsing the CVE disclosure pipeline end-to-end on a test repository (also listed under Security). |
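For readers unfamiliar with burn-rate multipliers such as the 14.4×, 6×, and 1× figures named in the runbooks above, the arithmetic can be sketched as follows. This is an illustration only: the 30-day SLO period and the 1 h / 6 h / 72 h windows are assumptions borrowed from common SRE practice, not values taken from this index. SLO.md is the authority for the project's actual periods and windows.

```python
# ASSUMPTION: a 30-day SLO period; the real period is defined in SLO.md.
SLO_PERIOD_HOURS = 30 * 24


def budget_consumed(burn_rate: float, window_hours: float,
                    period_hours: float = SLO_PERIOD_HOURS) -> float:
    """Fraction of the error budget spent if `burn_rate` holds for `window_hours`.

    A burn rate of 1 spends the budget exactly over the whole SLO period.
    """
    return burn_rate * window_hours / period_hours


def hours_to_exhaustion(burn_rate: float,
                        period_hours: float = SLO_PERIOD_HOURS) -> float:
    """Hours until the whole budget is gone at a constant burn rate."""
    return period_hours / burn_rate


if __name__ == "__main__":
    # ASSUMPTION: window lengths are illustrative, not quoted from the runbooks.
    examples = [(14.4, 1), (6.0, 6), (1.0, 72)]
    for rate, window in examples:
        spent = budget_consumed(rate, window)
        left = hours_to_exhaustion(rate)
        print(f"{rate:>5}x over {window:>2} h: {spent:.1%} of budget spent, "
              f"exhausted after {left:.0f} h at this rate")
```

Under these assumptions the fast-burn multiplier corresponds to spending roughly 2% of the budget in one hour, which is why it pages sooner than the slower alerts.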

Audience routing

A quick lookup table for "which docs should I read first?" by role. Each row lists three documents in the order a fresh reader should take them on.

How to extend this index

When a new doc lands, add one row to the most-relevant section above and (if appropriate) an inbound link from README.md. Keep summaries to a single sentence; if a doc warrants a paragraph, the doc itself should carry that introduction in its first line so this index can quote it. The wave tag is the W-number from PATH-TO-V1.md's workstream tables; leave it as an em dash if the doc predates the wave program.
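A reviewer adding a row can sanity-check its wave cell before opening the PR. The sketch below assumes the `W<wave>.<task>` shape described in the conventions; accepting the `B` and `M` prefixes is an assumption based on tags that already appear in this index (for example B5.8 and M8.5), and `extract_wave_tags` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Matches tags such as W2.6, B5.8, or M8.5 anywhere in a cell.
# ASSUMPTION: B and M prefixes follow the same <number>.<number> shape as W.
TAG_RE = re.compile(r"\b([WBM])(\d+)\.(\d+)\b")

# The placeholder used for docs that predate the wave program (an em dash).
NO_TAG = "\u2014"


def extract_wave_tags(cell: str) -> List[Tuple[str, int, int]]:
    """Return every (prefix, wave, task) tag found in a Wave cell.

    An em-dash cell yields an empty list; any other cell without a
    recognisable tag is rejected so typos surface during review.
    """
    cell = cell.strip()
    if cell == NO_TAG:
        return []
    tags = [(p, int(w), int(t)) for p, w, t in TAG_RE.findall(cell)]
    if not tags:
        raise ValueError(f"unrecognised wave cell: {cell!r}")
    return tags


if __name__ == "__main__":
    for cell in ["W3.8", "W5.2 (refresh)", "B4.9 / B5.6", NO_TAG]:
        print(repr(cell), "->", extract_wave_tags(cell))
```

Annotations such as "(refresh)" are ignored rather than rejected, since several existing rows carry them.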

If an existing doc moves or is renamed, update both the row here and the corresponding row in README.md. The two surfaces are kept in sync deliberately: this index is the complete inventory, the README.md table is the curated landing surface. When the inventory and the landing surface drift apart, the Missing cross-links section above is the ledger that records the gap.
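The drift between the inventory and the landing surface can be computed rather than maintained by hand. The following is a rough sketch, assuming both files use inline Markdown links of the form `[text](path.md)`; the function names are hypothetical, and no such script is known to ship with the repository.

```python
import posixpath
import re
from typing import List, Set

# Captures the target of an inline Markdown link that ends in .md,
# ignoring any #fragment. ASSUMPTION: links use the [text](path.md) form.
LINK_RE = re.compile(r"\]\(([^)\s#]+\.md)(?:#[^)\s]*)?\)")


def linked_docs(markdown: str, base_dir: str) -> Set[str]:
    """Collect linked .md files as repository-root-relative paths."""
    docs = set()
    for target in LINK_RE.findall(markdown):
        if "://" in target:  # skip external URLs
            continue
        docs.add(posixpath.normpath(posixpath.join(base_dir, target)))
    return docs


def missing_from_readme(index_md: str, readme_md: str) -> List[str]:
    """Docs linked from docs/INDEX.md that README.md does not link."""
    return sorted(linked_docs(index_md, "docs") - linked_docs(readme_md, ""))


if __name__ == "__main__":
    # Tiny inline samples; a real run would read the two files from disk.
    index_sample = "[RISKS](RISKS.md) [CLI](CLI.md) [Gov](../GOVERNANCE.md)"
    readme_sample = "[CLI](docs/CLI.md) [Governance](GOVERNANCE.md#quorum)"
    print(missing_from_readme(index_sample, readme_sample))
```

Both link sets are normalised to root-relative paths first, because the index writes links relative to docs/ while README.md writes them relative to the repository root.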