TensorWasm
Craton TensorWasm CLI
The tensor-wasm binary is the developer-facing entry point to Craton TensorWasm. It wraps the same tensor-wasm-exec engine that powers the server (see API.md) so anything that runs against a deployed function can also be exercised locally without standing up infrastructure.
The CLI is built as part of the workspace — see BUILD.md for prerequisites and feature flags. After cargo build -p tensor-wasm-cli you will find the binary at target/<profile>/tensor-wasm (tensor-wasm.exe on Windows).
tensor-wasm --help
prints the top-level synopsis. Every subcommand also supports --help for its own flags.
Global behaviour
- Logging is configured via TENSOR_WASM_LOG (which uses tracing-subscriber's EnvFilter directive syntax); RUST_LOG is honoured as a fallback. The default level is warn; set TENSOR_WASM_LOG=tensor_wasm_exec=debug to drill into the executor or TENSOR_WASM_LOG=info to surface routine progress. Diagnostics are written to stderr so stdout stays clean for command output (critical for --output json). Security warning: setting the level to trace (notably reqwest=trace) causes reqwest to log outbound request headers, including the Authorization: Bearer <token> header set by TENSOR_WASM_TOKEN. Do not enable trace-level logging in production; the CLI does not currently install a tracing field-redaction layer.
- Exit codes follow the Unix convention: 0 on success, non-zero on any user or runtime error. Errors print to stderr with a chained-cause summary courtesy of anyhow. The snapshot and kernel subcommands additionally use 2 for local validation failures (bad path, malformed key, oversized archive) and 3 for "the API endpoint is not yet shipped" so CI can tell those apart from a generic failure (1).
- Arguments and outputs that involve guest data use JSON. Use --args '[1.0, 2.0]'-style values; non-array JSON is rejected with a clear message.
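A CI wrapper that needs to branch on these exit codes could map them to labels along the following lines. This is a minimal Python sketch; the function name `classify_exit_code` and the label strings are illustrative and are not part of the CLI. It assumes, per the text above, that only the snapshot and kernel subcommands give 2 and 3 a dedicated meaning.

```python
def classify_exit_code(code, subcommand):
    """Map a tensor-wasm exit status to a coarse label (labels are hypothetical)."""
    if code == 0:
        return "success"
    # Only snapshot and kernel are documented to use the dedicated 2 and 3 codes.
    if subcommand in ("snapshot", "kernel"):
        if code == 2:
            return "local-validation-failed"
        if code == 3:
            return "feature-not-exposed"
    # Everything else is treated as a generic failure.
    return "failure"
```

A pipeline might, for instance, treat "feature-not-exposed" as a skip rather than a hard failure while the server routes are still pending.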
Global flags
These flags are accepted on every subcommand (they are global clap flags):
- --tenant <u64>: tenant id advertised on outbound API requests via the X-TensorWasm-Tenant header. Defaults to 0, which suppresses the header entirely for backwards compatibility.
- --ca-cert <PATH>: trust an additional PEM-encoded private CA root for outbound HTTPS. The certificate is added alongside the system trust store, not instead of it, so use this for an internal/self-signed CA rather than --insecure. The file must be PEM (not DER) and contain a BEGIN CERTIFICATE block, or the CLI fails fast with an actionable error.
- --insecure: disable TLS certificate verification entirely (danger_accept_invalid_certs). Security hazard: this exposes the connection to man-in-the-middle attacks and can leak the TENSOR_WASM_TOKEN bearer credential. A loud warning is logged on every invocation. Intended only for local dev against a throwaway cert; prefer --ca-cert everywhere else.
Environment variables
- TENSOR_WASM_TOKEN: when set (and non-empty after trimming), sent as Authorization: Bearer <token> on every outbound request. If a token is configured and the target URL is a non-loopback http:// endpoint, the CLI logs a one-shot warning before the token leaves the process (it does not refuse, since operators may legitimately be on a trusted private network).
- TENSOR_WASM_LOG / RUST_LOG: EnvFilter directive for log verbosity (see above; default warn).
- TENSOR_WASM_REQUIRE_KEY_PERMS: when set to 1 on Unix, snapshot save / snapshot restore / kernel publish / kernel verify refuse (hard error, exit 2) to use an HMAC key file that is group- or world-readable, instead of merely warning. Lets security-conscious deployments fail closed on a leaked-readable signing key.
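The header and warning rules above can be summarised in a short Python sketch. The helper names are hypothetical, and two details are assumptions rather than documented behaviour: that the token is sent in its trimmed form, and that "loopback" means the literal host localhost or a loopback IP address.

```python
import ipaddress
from urllib.parse import urlsplit


def outbound_headers(tenant, token):
    """Build the auth/tenant headers described above (sketch only)."""
    headers = {}
    if tenant != 0:  # tenant 0 suppresses the header entirely
        headers["X-TensorWasm-Tenant"] = str(tenant)
    if token is not None and token.strip():
        # Assumption: the trimmed token is what goes on the wire.
        headers["Authorization"] = "Bearer " + token.strip()
    return headers


def is_loopback_host(host):
    """Assumed loopback test: 'localhost' or a loopback IP literal."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname counts as non-loopback here


def should_warn_plaintext_token(url, token):
    """True when a configured token would travel over non-loopback http://."""
    parts = urlsplit(url)
    has_token = token is not None and token.strip() != ""
    return has_token and parts.scheme == "http" and not is_loopback_host(parts.hostname or "")
```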
Subcommands
tensor-wasm run <file.wasm> [--export <name>] [--args <json>]
Run a Wasm module locally against an in-process TensorWasmEngine.
- <file.wasm>: path to the module to execute. Must exist and be readable.
- --export <name>: function to invoke. Defaults to main.
- --args <json>: arguments to forward to the guest, encoded as a JSON array. Each element is converted to the closest-fitting wasm value type before being passed to call_export_with_args:

| JSON literal | Wasm value type |
|---|---|
| integer in i32 range (e.g. 1, -2147483648) | i32 |
| integer outside i32 range (e.g. 2147483648) | i64 |
| non-integer numeric (e.g. 2.5) | f64 |
| anything else (string, array, null) | rejected with a parse error |

The export's declared signature must accept the resulting parameter types or wasmtime returns an error. f32 cannot be selected from a JSON literal unambiguously; build a Wasm wrapper that demotes from f64 if you need 32-bit floats from the CLI.
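To predict which wasm type a given --args element will take, the mapping above can be mirrored in a few lines of Python. This is a sketch of the documented rules, not the CLI's code; the function names are hypothetical. Rejecting booleans and integers beyond the i64 range is an assumption (the text only says "anything else" is rejected), as is treating every float-typed literal such as 1.0 as f64.

```python
import json

I32_MIN, I32_MAX = -2**31, 2**31 - 1
I64_MIN, I64_MAX = -2**63, 2**63 - 1


def wasm_type_for(value):
    """Return the wasm value type name for one parsed JSON element."""
    # bool is a subclass of int in Python, so reject it before the int check.
    if isinstance(value, bool):
        raise ValueError("unsupported argument: %r" % (value,))
    if isinstance(value, int):
        if I32_MIN <= value <= I32_MAX:
            return "i32"
        if I64_MIN <= value <= I64_MAX:
            return "i64"
        # Assumption: values wider than i64 are an error.
        raise ValueError("integer does not fit in i64: %r" % (value,))
    if isinstance(value, float):
        return "f64"
    raise ValueError("unsupported argument: %r" % (value,))


def classify_args(args_json):
    """Parse an --args string and list the wasm type of each element."""
    parsed = json.loads(args_json)
    if not isinstance(parsed, list):
        raise ValueError("--args must be a JSON array")
    return [wasm_type_for(v) for v in parsed]
```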
Examples:
# Legacy () -> () export — prints `ok`.
tensor-wasm run tests/wasm-fixtures/noop.wasm --export noop
# (i32, i32) -> i32 adder — prints `3`.
tensor-wasm run tests/wasm-fixtures/adder.wasm --export add --args '[1, 2]'
# (f64) -> f64 doubler — prints `3.0`.
tensor-wasm run tests/wasm-fixtures/doubler.wasm --export double --args '[1.5]'
On success the command prints the export's result list. An empty result list (the () -> () case) collapses to the literal ok for stable scripting; a single-element result unwraps to the scalar (so a -> i32 adder prints 3, not [3]); multi-element results print as a JSON array. On failure the chained-cause stack is written to stderr and the process exits non-zero. This subcommand exercises the same compile-and-spawn path that tensor-wasm-api's POST /functions/{id}/invoke handler uses, so local runs are a faithful reproduction of server behaviour.
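The three output shapes can be illustrated with a small Python sketch. The function name is hypothetical, and the exact whitespace the CLI uses inside a multi-element array is not specified here, so the array formatting below is only Python's default.

```python
import json


def render_results(results):
    """Render an export's result list following the rules described above."""
    if not results:
        return "ok"  # () -> () exports collapse to the literal ok
    if len(results) == 1:
        return json.dumps(results[0])  # single result unwraps to a scalar
    return json.dumps(results)  # multiple results print as a JSON array
```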
tensor-wasm deploy <file.wasm> --server <url> [--name <name>] [--output <text|json>]
Upload a Wasm module to a TensorWasm server.
- <file.wasm>: path to the artefact to deploy. Capped at 64 MiB to match the server's request-body limit.
- --server <url>: base URL of the target server (e.g. http://localhost:8080). Must use http:// or https:// and have a non-empty host.
- --name <name>: tenant-supplied display name. Defaults to the file stem when omitted.
- --output <text|json>: output format. text (default) prints the assigned function id bare; json prints a stable machine-readable envelope {"id":"<...>"} (compact, one line) for scripting / CI.
tensor-wasm deploy streams the Wasm bytes through a base64 encoder and POSTs {"name": ..., "wasm_b64": ...} to /functions on the target server. On success the response carries the assigned function id, which is printed to stdout for piping into subsequent tensor-wasm invoke calls.
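For clients that want to build the same request body without the CLI, a rough Python equivalent might look like this. It is a sketch under assumptions: the helper name is hypothetical, the standard base64 alphabet is assumed, and the file is read in one go rather than streamed as the CLI does.

```python
import base64
from pathlib import Path

MAX_WASM_BYTES = 64 * 1024 * 1024  # the documented 64 MiB cap


def build_deploy_body(wasm_path, name=None):
    """Return the JSON-serialisable body for POST /functions (sketch)."""
    path = Path(wasm_path)
    if path.stat().st_size > MAX_WASM_BYTES:
        raise ValueError("module exceeds the 64 MiB cap")
    # Assumption: standard (not URL-safe) base64 alphabet.
    wasm_b64 = base64.b64encode(path.read_bytes()).decode("ascii")
    # The display name defaults to the file stem, as with --name omitted.
    return {"name": name if name is not None else path.stem, "wasm_b64": wasm_b64}
```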
tensor-wasm invoke <id> --server <url> [--export <name>] [--args <json>] [--output <text|json>]
Call a deployed function by id.
- <id>: the function identifier returned by an earlier tensor-wasm deploy. Validated locally against a strict identifier charset ([A-Za-z0-9._-], rejecting the empty string and the traversal tokens . and ..) before any request is made, so a fat-fingered id fails fast with a clear "invalid character" message rather than an opaque server round-trip.
- --server <url>: base URL of the target TensorWasm server.
- --export <name>: exported function to call on the deployed module. Defaults to _start (the WASI command convention); the server falls back to main when _start is absent.
- --args <json>: arguments forwarded to the function as a JSON array. The CLI validates that the value parses as a JSON array (non-array JSON is rejected) and embeds it as the args field; when omitted, args is an empty array.
- --output <text|json>: output format. text (default) pretty-prints the server response; json prints a stable machine-readable envelope {"id":..., "export":..., "response":<parsed-or-raw>} (compact, one line) for scripting / CI.
The subcommand issues a POST /functions/{id}/invoke against the server with the JSON body {"export": "...", "args": [...]} and prints the response to stdout. The wire envelope (including args) is fixed today, but argument pass-through is not yet wired into the executor server-side — the API handler deserialises the envelope strictly (to bound the DoS surface) but does not yet thread args into call_export_with_args. The shape is locked now so clients written against this CLI keep working once the server learns to consume args. Non-2xx responses surface as non-zero exits with the error envelope forwarded to stderr.
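The local id check and the request envelope can be sketched together in Python. The helper names are hypothetical, and the way the base URL and path are joined (stripping a trailing slash) is an assumption rather than documented behaviour.

```python
import json
import re

_ID_RE = re.compile(r"[A-Za-z0-9._-]+")


def validate_identifier(ident):
    """Apply the documented charset check to a function or instance id."""
    if ident in ("", ".", ".."):
        raise ValueError("identifier is empty or a traversal token")
    if not _ID_RE.fullmatch(ident):
        raise ValueError("invalid character in identifier: %r" % (ident,))
    return ident


def build_invoke_request(server, function_id, export="_start", args_json=None):
    """Return (url, body) for POST /functions/{id}/invoke (sketch)."""
    validate_identifier(function_id)
    args = [] if args_json is None else json.loads(args_json)
    if not isinstance(args, list):
        raise ValueError("--args must be a JSON array")
    # Assumption: a trailing slash on the base URL is tolerated.
    url = server.rstrip("/") + "/functions/" + function_id + "/invoke"
    return url, {"export": export, "args": args}
```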
tensor-wasm bench <file.wasm> [--export <name>] [--n <iters>] [--output <text|json>]
Benchmark a Wasm export locally and print a P50 / P95 / P99 / max latency table. Each iteration spawns a fresh instance, invokes the export, and terminates — so the reported numbers are end-to-end (including cold start), not steady-state.
- <file.wasm>: path to the module.
- --export <name>: function to invoke per iteration. Defaults to main.
- --n <iters>: iteration count. Defaults to 100. Must be at least 1.
- --output <text|json>: output format. text (default) prints the percentile table; json emits a machine-readable document of the same percentiles for CI perf gates.
Example:
tensor-wasm bench tests/wasm-fixtures/vector_add.wasm --export add --n 1000
Sample output:
bench: export=`add` iterations=1000
+-----------+--------------+
| percentile| latency|
+-----------+--------------+
| P50 | 312.40 us |
| P95 | 589.10 us |
| P99 | 1.20 ms |
| max | 4.83 ms |
+-----------+--------------+
Percentiles use the nearest-rank method on the sorted sample buffer. For a steady-state micro-benchmark (single instance, repeated calls), use the Criterion suite in tensor-wasm-bench — see BUILD.md.
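For reference, the nearest-rank method picks the sample at rank ceil(p/100 × n) in the sorted buffer. The Python sketch below illustrates that definition; the function name is hypothetical and this is not the CLI's own implementation.

```python
import math


def nearest_rank(samples, percentile):
    """Return the nearest-rank percentile (0 < percentile <= 100) of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number percentiles stay exact.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]
```

With this definition the 100th percentile is simply the maximum sample, which corresponds to the max row of the table.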
tensor-wasm snapshot save --instance <id> --output <out.tensor-wasm> --server <url>
tensor-wasm snapshot restore --input <in.tensor-wasm> --as-instance <id> --server <url>
Capture or restore an instance's state from a .tensor-wasm archive via the TensorWasm HTTP API — both sub-actions are server-backed, not a local file format dance. Arguments are passed as named flags, not positionals.
tensor-wasm snapshot save POSTs /instances/{id}/snapshot (the instance id is percent-encoded into the path) and streams the server-produced archive to --output. The download is written to a tempfile in the output's directory and atomically renamed on success, so a failed transfer never leaves a half-written snapshot at the target path. Flags:
- --instance <id>: identifier of the running instance to snapshot. Must be non-empty.
- --output <path>: where to write the resulting .tensor-wasm archive. The parent directory must exist.
- --server <url>: base URL of the target server.
- --max-restore-bytes <bytes>: cap on the number of bytes the CLI will accept from the server and write to disk. Defaults to 256 MiB; values above the default are clamped down so a malicious server cannot fill the operator's disk by streaming an unbounded response body.
- --hmac-key-file <PATH>: optional HMAC signing key (see below).
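The tempfile-then-rename pattern and the byte cap described for snapshot save can be sketched in Python as follows. The function names are hypothetical and the sketch only illustrates the general technique; it does not reproduce the CLI's actual download code.

```python
import os
import tempfile

DEFAULT_MAX_BYTES = 256 * 1024 * 1024  # documented default: 256 MiB


def effective_cap(requested=None):
    """Clamp a requested cap down to the default, as described above."""
    if requested is None:
        return DEFAULT_MAX_BYTES
    return min(requested, DEFAULT_MAX_BYTES)


def write_atomically(chunks, dest, max_bytes):
    """Stream chunks to a temp file beside dest, then rename into place."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        written = 0
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                written += len(chunk)
                if written > max_bytes:
                    raise ValueError("response exceeds the byte cap")
                tmp.write(chunk)
        os.replace(tmp_path, dest)  # rename within the same directory
    except BaseException:
        # On any failure, remove the partial temp file and leave dest untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written
```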
tensor-wasm snapshot restore streams the on-disk archive up to POST /instances/restore and prints the restored instance id from the server's {"id": ...} ack. Flags:
- --input <path>: path to the .tensor-wasm archive to upload. Must be a regular file.
- --as-instance <id>: identifier to assign to the restored instance. Validated against a strict identifier charset ([A-Za-z0-9._-], rejecting empty and . / ..) so it can be carried safely in the X-TensorWasm-As-Instance header.
- --server <url>: base URL of the target server.
- --max-archive-bytes <bytes>: cap on the on-disk (compressed) archive the CLI will upload. Defaults to 256 MiB; this bounds the compressed upload only; the decompressed footprint is enforced server-side and may be larger. The deprecated alias --max-decompressed is accepted for one release; prefer --max-archive-bytes.
- --hmac-key-file <PATH> / --require-signature: optional signature verification (see below).
Validation errors (missing/unwritable output, oversized archive, malformed key) are caught locally and exit 2 (LOCAL_VALIDATION_FAILED) before any network call. The /instances/... API routes are planned but not yet merged server-side; until they land, a route-miss 404 from the server is surfaced as exit 3 (FEATURE_NOT_EXPOSED) with a tracking-issue pointer rather than a silent success.
Signed snapshots: --hmac-key-file and --require-signature
Both snapshot save and snapshot restore accept --hmac-key-file <PATH> pointing at a 32-byte HMAC-SHA256 key. The file is interpreted as 64 hex characters when its trimmed length matches, otherwise as 32 raw bytes; any other length is rejected locally with exit code 2 (LOCAL_VALIDATION_FAILED) before the CLI dials the server. The hex-encoded key is forwarded as the X-TensorWasm-Snapshot-HMAC-Key request header — the server uses it to sign (on save) or verify (on restore) the archive. The CLI refuses to send the key over plaintext http:// to a non-loopback host (exit 2); use https:// or a loopback target. snapshot restore additionally accepts --require-signature, which sends X-TensorWasm-Snapshot-Require-Signature: true so the server refuses to rehydrate any unsigned (v2) archive. On Unix, a group/world-readable key file logs a warning; set TENSOR_WASM_REQUIRE_KEY_PERMS=1 to make that a hard error. See docs/SNAPSHOT-FORMAT.md for the on-disk layout of the signed frame.
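The key-file interpretation rule can be sketched in Python as below. The function names are hypothetical. Two points are assumptions: that a 64-character file which is not valid hex is rejected rather than reinterpreted, and that the 32-raw-byte check applies to the untrimmed file contents.

```python
import binascii


def parse_hmac_key(file_bytes):
    """Decode a key file as 64 hex characters or 32 raw bytes (sketch)."""
    trimmed = file_bytes.strip()
    if len(trimmed) == 64:
        try:
            return binascii.unhexlify(trimmed)
        except (binascii.Error, ValueError) as exc:
            # Assumption: non-hex content of this length is an error.
            raise ValueError("64-character key file is not valid hex") from exc
    if len(file_bytes) == 32:
        return file_bytes
    raise ValueError("key must be 64 hex characters or 32 raw bytes")


def key_header_value(key):
    """Hex-encode the key for the X-TensorWasm-Snapshot-HMAC-Key header."""
    return key.hex()
```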
tensor-wasm kernel publish|list|verify
Publish, list, or verify entries in the signed kernel registry (roadmap feature #3). The server-side /kernels routes are gated behind the api crate's kernel-registry-api Cargo feature; a server built without it returns 503 kernel_registry_not_configured, which the CLI surfaces as a normal error envelope. See docs/KERNEL-REGISTRY.md for the manifest schema and signing envelope.
tensor-wasm kernel publish <name> <version> --ptx-file <PATH> --sm <SM> --key-file <PATH> [--publisher <id>] --server <url> computes BLAKE3 over the PTX text, builds a KernelManifest, signs it with the HMAC-SHA256 key from --key-file, and POSTs the bundle to /kernels. The server re-verifies the signature and digest before accepting.
- <name> / <version>: positional kernel name (e.g. matmul.f32) and SemVer-style version. Both must be non-empty.
- --ptx-file <PATH>: PTX text the manifest references. Capped at 16 MiB.
- --sm <SM>: compute capability the PTX targets (e.g. 80 for sm_80).
- --key-file <PATH>: 32-byte HMAC-SHA256 signing key, same on-disk format as snapshot --hmac-key-file (64 hex chars or 32 raw bytes).
- --publisher <id>: optional advisory publisher identifier baked into the manifest (not covered by the signature in v0.3.x). Defaults to tensor-wasm-cli.
- --server <url>: base URL of the target server.
tensor-wasm kernel list --server <url> GETs /kernels and renders the manifest table (name@version sm=<n> publisher=<id>). Empty registries print (no kernels registered).
tensor-wasm kernel verify <name@version> --manifest-file <PATH> --key-file <PATH> is local-only — it reads a manifest JSON off disk, recomputes the HMAC under --key-file, and compares against the manifest's signature in constant time, also asserting the name@version selector matches. Useful for gating a build pipeline on a manifest verifying before it gets uploaded. No server is contacted.
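The shape of such a local check (selector match, HMAC recomputation, constant-time comparison) can be sketched in Python. Everything about the manifest layout here is hypothetical: the field names, the hex-encoded signature, and above all the `signing_payload` canonical form are stand-ins, since the real signed byte layout is defined in docs/KERNEL-REGISTRY.md and is not reproduced in this page.

```python
import hashlib
import hmac
import json


def signing_payload(manifest):
    """HYPOTHETICAL canonical bytes: sorted JSON minus signature and publisher."""
    signed = {k: v for k, v in manifest.items() if k not in ("signature", "publisher")}
    return json.dumps(signed, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_manifest(selector, manifest, key):
    """Check the name@version selector, then the HMAC in constant time."""
    name, sep, version = selector.partition("@")
    if not sep or not name or not version:
        raise ValueError("selector must look like name@version")
    if manifest.get("name") != name or manifest.get("version") != version:
        return False
    expected = hmac.new(key, signing_payload(manifest), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, str(manifest.get("signature", "")))
```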
tensor-wasm serve [--addr <host:port>] [--token <TOKEN>]... [--tenant-header-policy <optional|required>] [--cors-origin <ORIGIN>]... [--max-body-bytes <bytes>] [--allow-plaintext-public]
Run the TensorWasm HTTP API gateway in-process: builds the axum router, binds it to --addr, and serves until Ctrl-C (also SIGTERM on Unix, for clean docker stop / pod termination). This is the entrypoint the README quickstart and docs/GETTING-STARTED.md point at. On a successful bind it prints listening on http://<addr>.
- --addr <host:port>: bind address. Defaults to 127.0.0.1:8080. Production deployments should pass 0.0.0.0:<port> (or a private subnet address) explicitly.
- --token <TOKEN>: bearer token the gateway accepts. Repeat to allowlist multiple tokens. If omitted entirely, the gateway falls back to reading TENSOR_WASM_API_TOKENS from the environment; empty/unset there means dev mode (auth disabled, with a startup warning).
- --tenant-header-policy <optional|required>: X-TensorWasm-Tenant header policy. optional (default) mirrors TENSOR_WASM_API_REQUIRE_TENANT unset; required mirrors TENSOR_WASM_API_REQUIRE_TENANT=1 (requests without the header are rejected 400).
- --cors-origin <ORIGIN>: origin to allow via CORS; repeatable. Currently informational: the api crate does not yet expose a programmatic CORS knob, so a non-empty value only logs a warning. Kept so the quickstart command parses cleanly.
- --max-body-bytes <bytes>: maximum inbound request body size. Defaults to 64 MiB to match the api crate's MAX_REQUEST_BODY_BYTES. Currently informational: passing a non-default value logs a warning and the gateway uses the compiled-in 64 MiB cap until the api crate grows a body-cap knob.
- --allow-plaintext-public (env: TENSOR_WASM_ALLOW_PLAINTEXT_PUBLIC): acknowledge knowingly exposing a dev-mode (no-auth) deployment on a non-loopback address. Required when --addr resolves to 0.0.0.0, ::, or any non-loopback IP and no token allowlist is configured; without it the CLI refuses to bind such a configuration. Setting it does NOT enable auth; it merely silences the safety gate, and a 60-second recurring warning keeps the misconfiguration visible in long-running logs.
The bind-safety gate runs before the wasmtime engine is initialised, so a misconfigured invocation fails fast with a single-line error naming both the address and the fix.
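The decision the bind-safety gate makes can be summarised with a Python sketch. The function names are hypothetical, and the address handling is simplified: only IP literals and the name localhost are recognised, whereas the real CLI resolves --addr.

```python
import ipaddress


def host_of(addr):
    """Split 'host:port' (including '[v6]:port') and return the bare host."""
    host, _, _port = addr.rpartition(":")
    return host.strip("[]")


def is_loopback(host):
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # simplification: unresolved names count as non-loopback


def bind_allowed(addr, tokens, allow_plaintext_public=False):
    """Refuse a no-auth bind on a non-loopback address unless acknowledged."""
    if tokens:  # a token allowlist means auth is on, so the gate does not apply
        return True
    if is_loopback(host_of(addr)):
        return True
    return allow_plaintext_public
```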
tensor-wasm metrics --server <url> [--output <text|json>]
Fetch and print the /metrics endpoint of a TensorWasm server.
- --server <url>: base URL of the target server.
- --output <text|json>: output format. text (default) prints the raw Prometheus text exposition (pipe to grep '^tensor_wasm_' to filter to TensorWasm's own series); json emits a machine-readable document with the parsed samples for CI scripts.
Server-controlled label values are stripped of ASCII control bytes before printing so a malicious server cannot smuggle ANSI escapes into the operator's terminal.
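A filter of this kind can be as small as the Python sketch below. The function name is hypothetical, and the exact byte set the CLI strips is an assumption: here it is taken to be 0x00 through 0x1F plus DEL (0x7F).

```python
def strip_control_chars(value):
    """Drop ASCII control characters so escape sequences cannot reach the terminal."""
    return "".join(ch for ch in value if not (ord(ch) < 0x20 or ord(ch) == 0x7F))
```

With the ESC byte removed, a sequence such as a colour code degrades to harmless visible text.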
tensor-wasm observe [--addr <url>] [--interval <secs>] [--output <text|json>]
Live operator dashboard. Polls GET /healthz and GET /metrics against the target server on a fixed cadence and rewrites a single screen with the most actionable signals. Intended for on-call incident triage when neither a browser nor a Grafana session is available.
- --addr <url>: base URL of the target server. Defaults to http://localhost:8080. Must use http:// or https:// and have a non-empty host.
- --interval <secs>: refresh cadence, in seconds. Defaults to 2. Must be at least 1.
- --output <text|json>: output format. text (default) renders the in-place ANSI dashboard; json emits one machine-readable JSON document per tick as newline-delimited JSON (NDJSON). In this mode the screen is not cleared between ticks, so the stream is appendable into a log pipeline.
Auth/tenant headers (Authorization: Bearer ..., X-TensorWasm-Tenant) are attached when configured, identical to every other HTTP-shaped subcommand. The refresh loop exits cleanly on Ctrl-C; per-tick fetch failures (network blips, server restart) are rendered into the board rather than aborting the loop.
Example:
TENSOR_WASM_TOKEN=devtoken tensor-wasm observe --addr https://tensor-wasm.example.com --interval 5
Sample output:
Craton TensorWasm — operator dashboard
target: http://localhost:8080 interval: 2s
--------------------------------------------------
liveness: /healthz ok
uptime: n/a
functions: ?
jobs.active:?
instances: 3
gpu.memory: 1.00 GiB
--------------------------------------------------
endpoint req/s p50 p95
/healthz 4.50 n/a n/a
/invoke 10.00 10.0ms 275.0ms
--------------------------------------------------
Ctrl-C to exit.
Cells render as ? (for counts) or n/a (for percentages, latencies) when the underlying Prometheus series is absent from the scrape — for example, tensor_wasm_functions_total and tensor_wasm_jobs_active are reserved series names not yet emitted by tensor-wasm-core::metrics, so they show ? against current servers until they land. The dashboard never substitutes a misleading zero.
Prometheus parsing is done in-process with a small inline parser; no extra dependency is pulled in for the dashboard. The histogram percentiles use linear interpolation across the _bucket series, matching PromQL's histogram_quantile() for buckets that share a path label.
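As an illustration of linear interpolation across cumulative buckets, the Python sketch below follows the general shape of histogram_quantile(). It is not the dashboard's parser: the function name and input layout are hypothetical, the first bucket's lower bound is assumed to be 0, and edge cases are simplified.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    Returns None when the data cannot support an estimate (rendered as n/a).
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be within [0, 1]")
    ordered = sorted(buckets)
    if not ordered or not math.isinf(ordered[-1][0]):
        return None  # a +Inf bucket is required
    total = ordered[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # Rank lands in the +Inf bucket: report the highest finite bound.
                return ordered[-2][0] if len(ordered) > 1 else None
            if count == prev_count:
                return bound  # nothing to interpolate over
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return None
```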
tensor-wasm completions <shell> [--out-dir <dir>]
Emit a shell-completion script for the named shell. Supported values match clap_complete::Shell: bash, zsh, fish, elvish, powershell.
By default the script is written to stdout. Pass --out-dir <dir> to write it to a conventional filename inside <dir> instead — used to regenerate the committed scaffolding under crates/tensor-wasm-cli/completions/.
Wire-up examples:
# bash, system-wide
tensor-wasm completions bash | sudo tee /etc/bash_completion.d/tensor-wasm
# zsh, per-user
tensor-wasm completions zsh > ~/.zsh/completions/_tensor-wasm
# fish
tensor-wasm completions fish > ~/.config/fish/completions/tensor-wasm.fish
# PowerShell, current session
tensor-wasm completions powershell | Out-String | Invoke-Expression
# Regenerate the committed scaffolding
tensor-wasm completions bash --out-dir crates/tensor-wasm-cli/completions
tensor-wasm completions zsh --out-dir crates/tensor-wasm-cli/completions
tensor-wasm completions fish --out-dir crates/tensor-wasm-cli/completions
tensor-wasm man [--out-dir <dir>]
Generate roff(7) man pages for tensor-wasm and every subcommand, sourced from
the same clap definitions the help output uses. With no flags, the root page
is written to stdout. With --out-dir <dir>, the root page plus one
tensor-wasm-<sub>.1 per subcommand is written under <dir> (this is how the
committed scaffolding under crates/tensor-wasm-cli/man/
is regenerated).
# Regenerate all committed man pages in one pass
tensor-wasm man --out-dir crates/tensor-wasm-cli/man
# Quick preview without committing
tensor-wasm man | man -l -
Shell completions
Pre-generated completion scripts for bash, zsh, and fish live under
crates/tensor-wasm-cli/completions/.
That directory's README.md
covers per-OS install paths (system-wide vs per-user, Linux vs macOS, the
zsh $fpath story, and the fish ~/.config/fish/completions/ convention).
Short version:
| Shell | File | Install path (per-user) |
|---|---|---|
| bash | crates/tensor-wasm-cli/completions/tensor-wasm.bash | ~/.local/share/bash-completion/completions/tensor-wasm |
| zsh | crates/tensor-wasm-cli/completions/_tensor-wasm | ~/.zsh/completions/_tensor-wasm (on $fpath) |
| fish | crates/tensor-wasm-cli/completions/tensor-wasm.fish | ~/.config/fish/completions/tensor-wasm.fish |
Regenerate after any clap-flag change with tensor-wasm completions <shell> --out-dir crates/tensor-wasm-cli/completions.
Man pages
Pre-generated .1 man pages live under
crates/tensor-wasm-cli/man/:
- tensor-wasm.1: root command + global flags
- tensor-wasm-run.1, tensor-wasm-deploy.1, tensor-wasm-invoke.1, tensor-wasm-bench.1, tensor-wasm-snapshot.1, tensor-wasm-serve.1, tensor-wasm-metrics.1, tensor-wasm-observe.1, tensor-wasm-completions.1, tensor-wasm-man.1: one per top-level subcommand
The man --out-dir generator walks the clap command tree, so it also emits a
tensor-wasm-kernel.1 page for the kernel subcommand; that page is not yet
committed under crates/tensor-wasm-cli/man/, so re-running the regeneration
command below will add it on the next pass.
That directory's README.md covers
per-OS install paths (Linux man-db, macOS, WSL on Windows) and the
mandb reindex step.
Regenerate after any clap-flag change with tensor-wasm man --out-dir crates/tensor-wasm-cli/man.
Cross-references
- BUILD.md: workspace build matrix and feature flags (the CLI is part of cargo build --workspace).
- API.md: REST surface that tensor-wasm deploy / invoke / metrics target.
- AUTO-OFFLOAD.md: JIT path triggered by tensor-wasm run and tensor-wasm bench when a guest is auto-offload-eligible.
Stability
The CLI surface — subcommand names, required positional arguments, and the long-form flags listed above — is considered stable for the v0.1 release window. Short-form flag aliases and the machine-readable output format (the --output json envelopes) are not yet stable.