
Craton TensorWasm — Documentation Index

The single-page sitemap for every Markdown document shipped with Craton TensorWasm. The grouping mirrors how a reader actually navigates the project: pick the section that matches your role, follow the link, and the linked doc is the contract.

The tag in each table's Wave column (W1.1, W2.3, etc.) records which v0.2–v0.4 hardening wave landed the document; docs without a tag predate the wave program. Link paths are relative to this file (that is, relative to docs/), so ../GOVERNANCE.md points at the repository root.
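As a minimal sketch of that resolution rule, assuming POSIX-style separators and that this index lives at docs/INDEX.md, the helper below maps a link as written on this page to a repository-relative path. The name resolve_link is hypothetical and is not part of the repository.

```python
import posixpath

# Assumption: this index lives at docs/INDEX.md, so links resolve against docs/.
INDEX_DIR = "docs"


def resolve_link(link: str, base_dir: str = INDEX_DIR) -> str:
    """Return the repository-relative path that a link in this index points at."""
    return posixpath.normpath(posixpath.join(base_dir, link))


if __name__ == "__main__":
    for link in ("../GOVERNANCE.md", "CLI.md", "runbooks/rollback.md"):
        print(link, "->", resolve_link(link))
```

A leading ../ collapses to the repository root, while a bare filename stays under docs/.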

If a doc is also reachable from README.md, it is listed in the table at the bottom of that file. The Missing cross-links section near the foot of this page enumerates the docs that are reachable only via this index; a future PR may add anchors for them in README.md.

What this index is and is not

This index is the canonical inventory of in-repository Markdown documentation: every doc that ships in the source tree has one primary row in the sections below, and the few repeat listings (for example AUTO-OFFLOAD.md under CUDA) are marked "also listed under". The summary text is the single-sentence abstract a reader needs to decide whether to open the doc.

This index is not a tutorial, a learning path, or a status page.

For a learning path see GETTING-STARTED.md followed by the Audience routing table below.
For the v1.0 roadmap and status see PATH-TO-V1.md and the [Unreleased] section of ../CHANGELOG.md.
For the published rustdoc + OpenAPI archive (rendered, hosted, per release) see API-REFERENCE.md.

The index is also not the right surface to host long-form content. Anything that needs more than the one-line summary below belongs in the linked doc itself; if a section starts growing past the "one row per doc" rule the section is wrong, not the rule.

What this index is and is not
Conventions
Getting started
Architecture and internals
API surface
Performance and benchmarking
CUDA
Operations
Security
Governance and supply chain
Snapshots
Missing cross-links
Audience routing
How to extend this index

Conventions

Link paths are relative to this file (docs/INDEX.md). A leading ../ therefore steps up into the repository root; a bare filename resolves inside docs/ itself.
Wave tags trace each doc back to a workstream entry in PATH-TO-V1.md: W<wave>.<task> matches the row in the per-area workstream tables. Docs that predate the wave program are marked with an em dash.
Summaries are deliberately single-sentence. If a doc warrants more context the doc itself should open with that paragraph so this index can quote a one-liner from it.
No emoji, no badges. This file is a flat sitemap, optimised for grep, not for visual scanning.
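The tag convention above lends itself to mechanical parsing. The sketch below is illustrative only: parse_wave_cell is a hypothetical helper, and it treats the leading letter as an opaque prefix because the tables on this page also carry B- and M-prefixed tags whose meaning this index does not define.

```python
import re
from typing import List, Tuple

# One tag: an uppercase prefix letter, then <wave>.<task>, e.g. W3.8.
TAG = re.compile(r"\b([A-Z])(\d+)\.(\d+)\b")

EM_DASH = "\u2014"  # marker used for docs that predate the wave program


def parse_wave_cell(cell: str) -> List[Tuple[str, int, int]]:
    """Extract (prefix, wave, task) triples from one Wave-column cell."""
    if cell.strip() == EM_DASH:
        return []
    return [(prefix, int(wave), int(task)) for prefix, wave, task in TAG.findall(cell)]


if __name__ == "__main__":
    print(parse_wave_cell("W3.8"))
    print(parse_wave_cell("W3.5 (backport policy), M8.5 (snapshot HMAC)"))
    print(parse_wave_cell(EM_DASH))
```

Parenthesised annotations such as "(refresh)" are ignored, so a cell with two tags simply yields two triples.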

Getting started

The narrow on-ramp for a new contributor or operator: from a clean checkout to a function running against a deployed gateway. Start with GETTING-STARTED.md, then branch into the role-specific guide (CLI.md for the developer, the production-deployment tutorial for the operator).

Doc	Wave	One-sentence summary
GETTING-STARTED.md	—	Fifteen-minute onboarding tutorial that walks a Rust developer from a clean checkout to invoking a deployed Wasm function.
CLI.md	—	Complete reference for the `tensor-wasm` developer CLI, its subcommands, global flags, exit codes, and JSON argument conventions.
tutorials/production-deployment.md	W3.8	End-to-end tutorial that takes a competent SRE from a fresh Kubernetes cluster to a production-ready TensorWasm deployment with mTLS, Prometheus, Grafana, audit log, and a deployed function.
MIGRATING-FROM-WASMTIME-WASMER.md	W3.9	Honest evaluation guide for teams already running upstream Wasmtime, Wasmer, or a Spin/Wasmer-Edge FaaS deciding whether to move workloads onto TensorWasm.
WASM-DEVELOPER-GUIDE.md	—	Walkthrough for writing Wasm guests against TensorWasm, from a trivial `add(a, b)` through the `wasi:cuda` host imports and the auto-offload fast path.
BUILD.md	—	Build matrix for the three supported configurations (no-CUDA, CUDA host, CUDA stub) plus the canonical feature-flag taxonomy.

Architecture and internals

Background for contributors changing the runtime itself: how the crates fit together, the upstream-pinning decisions, the JIT pipeline shape, and the cold-start latency model that the snapshot subsystem exists to fight. The root ARCHITECTURE.md is the entry point; everything else in this section is a deeper cut into one subsystem.

Doc	Wave	One-sentence summary
../ARCHITECTURE.md	W5.2 (refresh)	The eleven-crate dependency graph, the layered execution model, and the trust boundaries between Wasm guest, host process, and CUDA driver.
WASMTIME-FORK.md	—	The decision record that explains why TensorWasm does not fork Wasmtime, and which alternative simplified-IR path the JIT detector walks instead.
RISKS.md	—	Living risk register tracking architectural risks, upstream pinning decisions, and known limitations, refreshed alongside every `CHANGELOG.md` release.
AUTO-OFFLOAD.md	—	User-facing reference for the auto-offload pipeline: which Wasm patterns the detector recognises, which it rejects, and how to enable it.
CUDARC-SPIKE.md	W1.2	The `cust` → `cudarc` migration spike record: version chosen, API mapping table, known gaps, and the recommended cutover plan.
COLD-START.md	—	The five-component additive model for cold-start latency on a TensorWasm node and the operator levers that affect each component.
INSTANCE-POOL.md	B5.8	Roadmap feature #5 (pre-instantiated instance pool): the wired (T37) warm pool through the invoke path, configuration knobs, and the reset-on-return contract.
KERNEL-REGISTRY.md	B6.3	Roadmap feature #3 (signed kernel registry): HMAC-SHA256 `KernelManifest` records and the wired (T35) disk-persisted `DiskRegistry` over the artifact store, with paginated `GET /kernels`.
DIFFERENTIAL-ORACLE.md	B5.9	Roadmap feature #6 (differential JIT correctness oracle): bit-identity assertion contract between the Wasmtime CPU path and the JIT PTX path, plus the per-kernel tolerance policy.
ARTIFACT-STORE.md	B6.6	Roadmap feature #9 (unified content-addressed signed artifact store): the `tensor-wasm-artifacts` trait surface, on-disk envelope, and the now-wired convergence that backs snapshots (T40) and the JIT L2 cache (T30).
glossary.md	—	Short paragraph definitions of recurring CUDA, Wasm, and TensorWasm-internal terms (UVM, MPS, MIG, PTX, WMMA, BLAKE3 fingerprint, deopt guard, dispatch future, etc.).

API surface

The stable wire and binary surfaces TensorWasm commits to: HTTP REST, audit-log JSON, the published Rust + OpenAPI reference archive, and the mTLS contract that fronts them all. The hand-written REST reference in crates/tensor-wasm-api/API.md is the canonical surface for humans; the per-release rustdoc + OpenAPI bundle described in API-REFERENCE.md is the canonical surface for tooling.

Doc	Wave	One-sentence summary
../crates/tensor-wasm-api/API.md	—	Hand-written REST reference for every endpoint the `tensor-wasm-api` gateway serves, with request/response examples for each route.
API-REFERENCE.md	W4.8	Publication-policy for the per-release rustdoc + OpenAPI archive: what is in it, what is not, the URL contract, and the workflow that produces it.
AUDIT-LOG.md	W2.2	Wire-format schema, sink configuration, rotation guidance, and stable-string contract for the structured audit log emitted on state-mutating routes.
STREAMING.md	B6.1	Roadmap feature #2 (streaming HTTP `invoke` responses): the `wasi:tensor/host.emit-chunk` host-fn contract and the wired (T34) SSE / chunked-transfer path that surfaces real guest chunks.
OPENAI-COMPAT.md	B4.9 / B5.6	Roadmap feature #10 (OpenAI-compatible inference gateway shim): the `/v1/completions` and `/v1/chat/completions` routes, wired (T41) to internal invoke via `TENSOR_WASM_API_OPENAI_MODEL_MAP` (buffered or SSE), closing the earlier `501 openai_not_yet_wired` scaffold.
deployment/mtls.md	W2.8	Two production deployment shapes — self-terminated rustls and reverse-proxy fronting — with a recommended path for the v0.4 binary that still binds plaintext.

Performance and benchmarking

How TensorWasm measures itself, where the published numbers come from, the operator-side SLO contract, and the runbook/dashboard pair that turns burn-rate alerts into mitigation steps. Internal regression (PERFORMANCE.md + committed baseline.json) and external comparison (BENCHMARKING.md) are split deliberately — the latter pulls in the anti-cheating checklist a reader needs to reproduce a result a blog post would publish.

Doc	Wave	One-sentence summary
PERFORMANCE.md	—	How TensorWasm measures performance, what the current reference numbers look like, and how the committed-`baseline.json` CI regression gate works.
BENCHMARKING.md	—	Companion to PERFORMANCE.md focused on external comparisons: same-workload, same-hardware, same-statistics rules for honest competitive benchmarks.
CAPACITY-PLANNING.md	W4.4	Three reference SKUs, four sizing formulas, and tenants-per-host curves that translate the SLO targets and bench medians into a host-sizing answer.
SLO.md	W1.9	The project's commitment to numeric availability, latency, and error-rate targets for the HTTP surface and kernel-dispatch path.
dashboards/README.md	W2.5	Index for the importable Grafana dashboard (`tensor-wasm-overview.json`) that gives one stat panel per SLI and one row per subsystem.
runbooks/README.md	W2.6	One-page-per-alert operator manual: the alert → runbook mapping from `SLO.md` §7 with shared mitigation-step structure across every page.

CUDA

The toolkit-install path, the multi-tenant MPS daemon contract, the kernel-authoring guide, and the auto-offload reference for the JIT pipeline. A reader new to TensorWasm's CUDA story should read CUDA-SETUP.md first (matrix of toolkit + driver + arch), then CUDA-KERNELS.md to write a kernel, then MPS-SETUP.md once they need more than ~8 co-located tenants on one GPU.

Doc	Wave	One-sentence summary
CUDA-SETUP.md	W1.6	The exact toolkit, driver, compiler, environment-variable, and verification matrix to bring a CUDA host online for TensorWasm development.
MPS-SETUP.md	—	NVIDIA MPS daemon startup, capabilities, limits, and the runtime probe TensorWasm uses to decide between MPS-shared and per-tenant CUDA contexts.
AUTO-OFFLOAD.md	—	User-facing reference for which Wasm patterns the auto-offload JIT recognises and how to enable it (also listed under Architecture).
CUDA-KERNELS.md	W4.5	Practical guide for developers writing CUDA kernels that load and dispatch under TensorWasm's `wasi:cuda` surface, covering both explicit and auto-offload paths.
PLIRON-PIPELINE.md	—	Four-wave implementation plan for the Pliron-based auto-offload pipeline (Wasm to PTX via the interim `LoweredOp` IR and cuda-oxide), companion to RFC 0001.
CUDA-OXIDE-CUTOVER.md	—	Eight-step cutover runbook for the day cuda-oxide v0.2 ships: dependency bump through default-backend flip, gated on four pre-conditions per RFC 0001 Option C.
HARDWARE-GATED-WORK.md	—	Authoritative inventory of the CUDA code paths that are written but unverified on hardware (allocation/prefetch backends, async dispatch, device-memory host fns, `try_grow_in_place`, experimental wmma MatMul, cuda-oxide host backend) and how the gated `gpu.yml` CI lane validates each.

Operations

Everything an operator running TensorWasm in production needs after the gateway is alive: deployment topology, manifest sets for the three supported deployment paths (plain Kubernetes manifests, Helm, Nomad), upgrade and backup playbooks, and the observability contract. The three deliverables (deploy/k8s/, deploy/helm/tensor-wasm/, deploy/nomad/) describe the same single-instance runtime, so an operator can switch between them without re-learning the env-var surface.

Doc	Wave	One-sentence summary
DEPLOYMENT.md	—	The canonical production-topology reference: load balancer, gateway replicas, GPU pool, MPS, and disaster-recovery sequencing.
../deploy/k8s/README.md	W2.7	Plain-YAML Kubernetes reference manifests (namespace, configmap, deployment, service, ServiceMonitor) for self-managed installs.
../deploy/helm/tensor-wasm/README.md	W2.7	Templated Helm chart for the same single-gateway topology as the plain manifests, with a values-driven install surface.
../deploy/nomad/README.md	W5.6	HashiCorp Nomad reference job specs (docker and raw_exec) for the same single-instance runtime as the k8s and Helm assets.
UPGRADE.md	W3.3	Operator-facing fleet upgrade playbook describing the opinionated sequence for rolling a running TensorWasm deployment from one release to another.
BACKUP-RESTORE.md	W3.7	What a production TensorWasm deployment must back up, the tested strategies, the restore paths, and the validation procedure that confirms a backup is good.
OBSERVABILITY.md	—	The `tracing` span schema, the optional OTLP exporter stack, and how to wire a local collector for development.
CONFIG.md	B2.9	Single-source reference for every environment variable consumed by `tensor-wasm`, grouped by crate, with default + type + effect columns.
GPU-QUOTAS.md	B6.5	Roadmap feature #8 (per-tenant GPU memory quotas): the wired (T39) in-process counter as primary accounting via `TenantContextBuilder`, plus the host-side `cuMemPool` cap (hardware-gated, behind `gpu-mem-pool`).
COOPERATIVE-YIELD.md	B6.4	Roadmap feature #4 (cooperative deadlines via WASI yield): the `wasi:scheduler/host@0.1.0` protocol, the CONTINUE / DEADLINE-NEAR / DEADLINE-ELAPSED return codes, and the embedder wiring snippet.

Security

The threat model, the v0.1 audit findings, the backport policy that governs how security fixes flow into supported release branches, and the runbook a maintainer follows to rehearse a coordinated disclosure end to end before a real CVE arrives. Reports go to security@craton.com.ar (covered in ../SECURITY.md).

Doc	Wave	One-sentence summary
../SECURITY.md	W3.5 (backport policy), M8.5 (snapshot HMAC)	TensorWasm's threat model, isolation strategy summary, the optional snapshot HMAC authentication (cross-linked to the v2 → v3 migration), and the backport policy that decides which security fixes land on which release branches.
SECURITY-AUDIT.md	—	The v0.1 security-audit findings: methodology (manual walk + `cargo-fuzz`), per-asset verdict, and the follow-up tracking for partially-mitigated items.
TESTING.md	B2.9	Testing conventions across the workspace: unit/integration/CUDA/fuzz layers, the `#[ignore]` policy for hardware-gated tests, and the CI matrix that runs them.
FUZZING.md	B2.9	The `fuzz/` directory layout, per-target corpora, the nightly + weekly cron schedule, and the v0.5 24-hour gate that determines when a target counts as "covered".
runbooks/cve-disclosure-dry-run.md	W5.5	Manual procedure for rehearsing the CVE disclosure pipeline end-to-end on a test repository before a real CVE arrives.

Governance and supply chain

The maintainer registry, the decision process, the RFC pipeline, release engineering (CHANGELOG, MIGRATION, PATH-TO-V1), and the supply-chain commitments (SBOM, reproducible builds, Wasmtime cadence, trademark policy) that together ground the v1.0 gate in PATH-TO-V1.md. The split is deliberate: GOVERNANCE.md is the rules, MAINTAINERS.md is the registry, and the RFC pipeline is the mechanism that produces every other change that touches them.

Doc	Wave	One-sentence summary
../GOVERNANCE.md	W1.8	Lightweight, opinionated governance model for a small core team, modeled on Wasmtime/ripgrep/zellij rather than the CNCF TOC.
../MAINTAINERS.md	W5.4	Source of truth for the current maintainer roster and the active-maintainer count used by the quorum math in `GOVERNANCE.md`.
TRADEMARK.md	W3.4	Permissive trademark policy: forks and re-implementations may reuse the "Craton TensorWasm" and "TensorWasm" names; only substantive forks must rebrand for clarity.
SBOM.md	W4.3	What the CycloneDX SBOM shipped with every release contains, what it does not contain, how to regenerate it locally, and the maintainer contract.
REPRODUCIBLE-BUILDS.md	W3.6	Recipe for two independent builds of the same release tag producing bit-identical artifacts, confirmed by matching sha256 digests, on Linux x86_64 with the pinned toolchain.
WASMTIME-UPGRADE.md	W2.9	Cadence policy for Wasmtime version bumps: quarterly minor bumps, major bumps case-by-case, plus the per-bump maintainer checklist.
RELEASE.md	B2.9	Release-engineering runbook: tag preconditions, the per-release CHANGELOG / SBOM / cosign step sequence, and the `@craton-co/release` ownership contract.
../CHANGELOG.md	W3.1	Keep-a-Changelog log of every notable change, grouped by semver release; the `[Unreleased]` section tracks the v0.2–v0.4 wave work staged on `main`.
MIGRATION-v0-to-v1.md	W3.2	Operational checklist a v0.x deployment follows to land on v1.0 cleanly, populated continuously between v0.1 and v1.0.
PATH-TO-V1.md	—	The proposed five-milestone roadmap from the current v0.1.0 preview to a v1.0 production release, with explicit anti-goals and open decisions.
FEATURE-STATUS.md	—	Canonical per-feature status matrix (Wired / Landed / Scaffold / Hardware-gated / Planned-v0.4) mapping each major feature to its crate(s) and Cargo feature flag; the single source of truth that README, CHANGELOG, and OPENAI-COMPAT defer to for status.
../rfcs/README.md	W1.7	Lightweight RFC process: one contributor writes a doc, opens a PR, gives reviewers a week, and a maintainer decides.
../rfcs/TEMPLATE.md	W1.7	The required starting point for a new RFC; copy to `rfcs/0000-short-kebab-slug.md` and fill in the sections in order.

Snapshots

The on-disk format spec and the cross-version compatibility promise. SNAPSHOT-COMPATIBILITY.md is the which-version-restores-what contract; crates/tensor-wasm-snapshot/FORMAT.md is the what-the-bytes-are spec. The two are kept in sync deliberately: a wire-format bump touches both files in the same PR.

Doc	Wave	One-sentence summary
SNAPSHOT-COMPATIBILITY.md	W1.3, M8.5 (v2 → v3)	The cross-version compatibility promise: which TensorWasm versions can restore which on-disk snapshot versions, the format-bump procedure, and the v2 → v3 signed-snapshot migration (provision key → configure reader → configure writer → flip to strict mode).
../crates/tensor-wasm-snapshot/FORMAT.md	—	The wire-format specification — the byte layout `SnapshotWriter::capture` produces and `SnapshotReader::restore` consumes, including the magic constant and current version.

Missing cross-links

The docs below ship in the repository and are reachable through this index, but do not currently have inbound links from README.md. They are flagged here so a future PR can add table rows for them. This index does not modify any other document; the missing anchors are intentionally left for a separate change.

The runbook README itself is linked from README.md, but the individual runbook pages it lists are not. They are reachable only through that index, which is the intended structure for an on-call manual: operators navigate from the alert payload via the index, not from the project landing page. Those pages are listed in the second table below for completeness.

Discoverable only through this index

Doc	Why it should appear in `README.md`
../crates/tensor-wasm-snapshot/FORMAT.md	Snapshot wire-format spec — currently reachable only from `SNAPSHOT-COMPATIBILITY.md` and crate-internal docs; the format is part of the public contract per `SNAPSHOT-COMPATIBILITY.md` and warrants a top-level anchor.
../deploy/nomad/README.md	Nomad reference manifests (W5.6) ship alongside the k8s and Helm assets, both of which are already in the `README.md` Operations table. Adding a row keeps the three orchestrators symmetric.
../rfcs/TEMPLATE.md	Linked from the Contributing section of `README.md` (line 235); already covered, listed here only so the inventory is complete.
RISKS.md	Living risk register, referenced from `CHANGELOG.md` but not directly from `README.md`'s Architecture & reference table.
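The gap this table records is a set difference: docs linked from this index minus docs linked from README.md. A minimal sketch follows, assuming both files use inline Markdown links and that paths are normalised to the repository root first. The function names are hypothetical, and the regular expression is a simplification that ignores reference-style links.

```python
import posixpath
import re
from typing import Set

# Inline Markdown link targets ending in .md, with an optional #anchor.
MD_LINK = re.compile(r"\]\(([^)\s#]+\.md)(?:#[^)\s]*)?\)")


def md_targets(markdown: str, base_dir: str) -> Set[str]:
    """Collect repository-relative .md link targets found in Markdown text."""
    return {
        posixpath.normpath(posixpath.join(base_dir, target))
        for target in MD_LINK.findall(markdown)
    }


def missing_from_readme(index_text: str, readme_text: str) -> Set[str]:
    """Docs linked from docs/INDEX.md that README.md does not link."""
    return md_targets(index_text, "docs") - md_targets(readme_text, ".")


if __name__ == "__main__":
    index_text = "[Risks](RISKS.md) [CLI](CLI.md) [Nomad](../deploy/nomad/README.md)"
    readme_text = "See [CLI](docs/CLI.md)."
    print(sorted(missing_from_readme(index_text, readme_text)))
```

Normalising both sides to repository-relative paths matters because the same doc is written as CLI.md here and as docs/CLI.md in README.md.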

Discoverable only via runbooks/README.md (by design)

These pages are intentionally reached through their index because the alert payload, not the project landing page, is what an operator opens during a page:

Doc	Wave	Summary
runbooks/availability-fast-burn.md	W2.6	Runbook for the 14.4× burn-rate alert against `availability_http`.
runbooks/availability-slow-burn.md	W2.6	Runbook for the 6× burn-rate alert against `availability_http`.
runbooks/availability-very-slow-burn.md	W2.6	Runbook for the 1× sustained-burn alert against `availability_http`.
runbooks/dispatch-latency-spike.md	W2.6	Runbook for the kernel-dispatch P95 latency SLO breach (`tensor_wasm_kernel_latency_seconds`).
runbooks/invoke-latency-spike.md	W2.6	Runbook for the `POST /functions/{id}/invoke` P95 latency SLO breach.
runbooks/healthz-slow.md	W2.6	Runbook for the `GET /healthz` P95 latency SLO breach.
runbooks/rollback.md	W2.6	Manual procedure for reverting a TensorWasm node from a bad release.
runbooks/oncall-paging.md	W2.6	Manual procedure for escalating from operator-handling to waking the on-call maintainer.
runbooks/trace-id.md	W2.6	Non-alert reference for finding logs that share a given trace id.
runbooks/disaster-recovery.md	W3.7	Manual procedure for bringing a TensorWasm deployment back online after the host kernel, snapshot store, or CUDA driver has been lost.
runbooks/cve-disclosure-dry-run.md	W5.5	Manual procedure for rehearsing the CVE disclosure pipeline end-to-end on a test repository (also listed under Security).
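The burn-rate multipliers in the availability rows above can be read with a little arithmetic. The sketch below assumes the conventional definition (observed error ratio divided by the error budget, 1 minus the SLO target) together with a hypothetical 99.9% target and 30-day budget period; the project's actual targets and alert windows live in SLO.md and are not reproduced here.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Observed error ratio divided by the error budget (1 - SLO target)."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def hours_to_exhaustion(rate: float, period_days: float = 30.0) -> float:
    """Hours until a full period's budget is spent at a constant burn rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return period_days * 24.0 / rate


if __name__ == "__main__":
    # Hypothetical numbers: 1.44% of requests failing against a 99.9% target.
    print(f"burn rate: {burn_rate(0.0144, 0.999):.1f}x")
    for rate in (14.4, 6.0, 1.0):
        print(f"{rate}x burn exhausts the budget in {hours_to_exhaustion(rate):.1f} h")
```

Under those assumptions a sustained 14.4× burn spends a 30-day budget in about 50 hours, a 6× burn in 5 days, and a 1× burn exactly at the end of the period, which is why the three alerts carry different urgency.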

Audience routing

A quick lookup table for "which docs should I read first?" by role. Each row lists three documents in the order a new reader should work through them.

Role	First	Second	Third
Wasm developer	GETTING-STARTED.md	WASM-DEVELOPER-GUIDE.md	../crates/tensor-wasm-api/API.md
CUDA kernel author	CUDA-SETUP.md	CUDA-KERNELS.md	AUTO-OFFLOAD.md
SRE / operator	DEPLOYMENT.md	tutorials/production-deployment.md	runbooks/README.md
Capacity planner	SLO.md	CAPACITY-PLANNING.md	PERFORMANCE.md
Security reviewer	../SECURITY.md	SECURITY-AUDIT.md	AUDIT-LOG.md
Release engineer	../CHANGELOG.md	UPGRADE.md	REPRODUCIBLE-BUILDS.md
Compliance / auditor	SBOM.md	AUDIT-LOG.md	SECURITY-AUDIT.md
Maintainer (new)	../GOVERNANCE.md	../MAINTAINERS.md	../rfcs/README.md
Wasmtime/Wasmer evaluator	MIGRATING-FROM-WASMTIME-WASMER.md	WASMTIME-FORK.md	WASMTIME-UPGRADE.md
On-call (paged)	runbook from alert payload	runbooks/README.md	runbooks/trace-id.md

How to extend this index

When a new doc lands, add one row to the most-relevant section above and (if appropriate) an inbound link from README.md. Keep summaries to a single sentence; if a doc warrants a paragraph, the doc itself should carry that introduction in its first line so this index can quote it. The wave tag is the W-number from PATH-TO-V1.md's workstream tables; leave it as an em dash if the doc predates the wave program.
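The single-sentence rule can be checked mechanically before a row is added. The helper below is a heuristic sketch with a hypothetical name: it treats a terminal mark followed by whitespace and a capital letter as the start of a second sentence, so it can misfire on abbreviations that are followed by a capitalised word.

```python
import re

# Heuristic: ". ", "! " or "? " followed by a capital letter suggests
# that a second sentence has started.
SENTENCE_BREAK = re.compile(r"[.!?]\s+[A-Z]")


def is_single_sentence(summary: str) -> bool:
    """Return True if the summary looks like exactly one sentence."""
    text = summary.strip()
    return text.endswith(".") and SENTENCE_BREAK.search(text) is None


if __name__ == "__main__":
    ok = "Build matrix for the three supported configurations plus the feature-flag taxonomy."
    too_long = "Covers setup. Also covers teardown."
    print(is_single_sentence(ok), is_single_sentence(too_long))
```

Version strings and filenames such as v0.2 or README.md do not trip the check, because the period there is not followed by whitespace.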

If an existing doc moves or is renamed, update both the row here and the corresponding row in README.md. The two surfaces are kept in sync deliberately: this index is the complete inventory, the README.md table is the curated landing surface. When the inventory and the landing surface drift apart, the Missing cross-links section above is the ledger that records the gap.