Craton TensorWasm — Reference Grafana dashboards
This directory holds the importable Grafana dashboards that ship with
TensorWasm. They are the operator-facing companion to
docs/SLO.md and the docs/runbooks/ set: every alert
in those documents has at least one panel here, and every SLI defined
in docs/SLO.md has a top-row stat panel.
Status: v0.3 gate. The dashboard is checked in alongside
PATH-TO-V1.md milestone v0.3 ("Production
observability"). Several panels reference metrics that have not yet
been emitted by the code; those panels will show "No data" / "N/A"
until the metric ships. See Metric inventory.
Files
| File | Purpose |
|---|---|
| tensor-wasm-overview.json | The main overview dashboard. SLO summary + HTTP + tenant + snapshot + JIT + back-pressure rows. Imports cleanly into any Grafana with a Prometheus datasource configured. |
| README.md | This file. |
How to import
- Open Grafana.
- Navigate to Dashboards → New → Import.
- Either:
  - Upload JSON file: select docs/dashboards/tensor-wasm-overview.json from your local checkout, or
  - Paste JSON / URL: paste the raw contents of the file, or the raw-file URL (e.g. https://github.com/craton-co/craton-tensor-wasm/raw/main/docs/dashboards/tensor-wasm-overview.json).
- Grafana prompts for the DS_PROMETHEUS datasource variable. Select the Prometheus datasource that scrapes your TensorWasm process's GET /metrics endpoint. Any Prometheus datasource works; the dashboard does not depend on Mimir, Cortex, or any distribution-specific feature.
- Click Import. Grafana assigns a fresh uid and id; the JSON file does not pin either, so two Grafana instances will not clash if they both import this dashboard.
To pull updates after the dashboard JSON changes upstream, re-import the new JSON over the existing dashboard. Grafana offers a "Replace existing" toggle that preserves the dashboard's URL.
Prometheus scrape interval
The dashboard uses $__rate_interval in every rate() and
histogram_quantile() expression, which automatically picks a window
appropriate for the current panel resolution and the configured scrape
interval. The recommended scrape interval is 15 seconds (the value used
in Prometheus's example configuration; the built-in default is 1 minute).
Going below 5 seconds makes per-panel resolution choppy without adding
signal, and going above 60 seconds smears the latency P95 panels.
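As a rough illustration of how the window is chosen, the sketch below reproduces the formula Grafana documents for its Prometheus datasource: the larger of the panel interval plus one scrape interval, and four scrape intervals. The 15-second panel interval used in the loop is an illustrative assumption, not a value taken from the dashboard JSON.

```python
def rate_interval(panel_interval_s: float, scrape_interval_s: float) -> float:
    """Approximate Grafana's $__rate_interval, in seconds.

    Documented formula: max($__interval + scrape interval, 4 * scrape interval).
    """
    return max(panel_interval_s + scrape_interval_s, 4 * scrape_interval_s)


if __name__ == "__main__":
    # Assumed panel interval of 15 s; scrape intervals from the text above.
    for scrape in (5, 15, 60):
        print(f"scrape={scrape}s -> rate window={rate_interval(15, scrape)}s")
```

With a 60-second scrape interval the window is already four minutes wide, which is consistent with the advice above about long scrape intervals smearing the P95 panels.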
Configure the scrape in your prometheus.yml:
    scrape_configs:
      - job_name: tensor-wasm
        metrics_path: /metrics
        scrape_interval: 15s
        static_configs:
          - targets: ['tensor-wasm.your-host:8080']
Dashboard variables
Three templating variables are defined at the dashboard level.
| Name | Type | Source | Default | Purpose |
|---|---|---|---|---|
| DS_PROMETHEUS | datasource | Grafana | (prompt at import) | The Prometheus datasource feeding every panel. Selected once at import. |
| tenant | query | label_values(tensor_wasm_active_instances, tenant) | All | Filters the Tenant row panels by tenant. Multi-select with "All" included. |
| route | query | label_values(tensor_wasm_http_requests_total, route) | All | Filters the HTTP traffic row panels by axum route template (e.g. /healthz, /functions/:id/invoke). Multi-select with "All" included. |
The tenant and route variables degrade gracefully: if the
underlying label does not yet exist on the metric (per W2.3), the
variable's option list will be empty and the panels will fall through
to whatever series exists (typically a single un-labeled aggregate).
Panel inventory
The grid is 24 columns wide. Rows from top to bottom:
SLO summary (top row, stat panels — thresholds match docs/SLO.md §3)
- availability_http (30d): 99.5% target
- error_rate_invoke (5m): ≤ 1.0% target
- healthz P95 (5m): ≤ 10 ms target
- invoke P95 (5m): ≤ 100 ms host-only / ≤ 500 ms with GPU dispatch
- dispatch P95 (5m): ≤ 50 µs host-only; CUDA-host TBD (v0.4)
Build identity (header row, stat panel)
- Binary version: stat panel reading the tensor_wasm_build_info info-style gauge. Recommended PromQL (render the gauge's version label directly with no aggregation, so a heterogeneous fleet shows one row per running binary):

      tensor_wasm_build_info

  In Grafana, set the panel's value mapping to the version label via Value options → Fields → Labels → version (or use label_values(tensor_wasm_build_info, version) in a templated caption). The same metric carries git_sha, rustc_version, profile, and target labels; surface whichever are useful as secondary stat panels or as the panel tooltip. The gauge is always 1; the payload is the label set, not the number. Useful during an upgrade window to confirm every replica reports the expected version; cross-references the tensor_wasm_build_info check in docs/UPGRADE.md §6.
HTTP traffic
- Requests/sec by route: timeseries
- Error rate (5xx) by route: timeseries, 1% threshold line
- HTTP latency P50 / P95 / P99 by route: timeseries, log Y axis
Tenant
- Active instances by tenant: stacked timeseries
- GPU memory by tenant: stacked timeseries, bytes
- Kernel dispatches/sec by tenant: timeseries
Snapshot
- Snapshot capture P95: timeseries
- Snapshot restore P95: timeseries
- Snapshot disk round-trip P95: timeseries
JIT / auto-offload
- JIT cache hit ratio: timeseries; carries two queries (the v0.3 proxy via offload_success / offload_fallback and the intended jit_cache_warm_hit / cold_miss_then_insert ratio that fills in once the metric pair ships)
- JIT emit_text P95 by blueprint: timeseries
Back-pressure
- Back-pressure permits (used vs available): timeseries
- Back-pressure permit utilization: timeseries, 80% / 95% threshold lines
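To make the 80% / 95% threshold lines on the permit-utilization panel concrete, here is a minimal sketch of the arithmetic. It assumes utilization is used / (used + available), with "available" meaning permits still free; the actual panel query may differ. The band names "ok", "warning", and "critical" are hypothetical labels for this example only.

```python
from typing import Optional


def permit_utilization(used: float, available: float) -> Optional[float]:
    """Fraction of permits in use, assuming total = used + available.

    Returns None when no permits are reported, mirroring a "No data" panel.
    """
    total = used + available
    if total <= 0:
        return None
    return used / total


def threshold_band(utilization: float) -> str:
    """Map a utilization fraction onto the panel's 80% / 95% threshold lines."""
    if utilization >= 0.95:
        return "critical"  # hypothetical band name
    if utilization >= 0.80:
        return "warning"   # hypothetical band name
    return "ok"


if __name__ == "__main__":
    u = permit_utilization(used=19, available=1)
    print(u, threshold_band(u))
```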
Metric inventory
Audited against crates/tensor-wasm-core/src/metrics.rs and
crates/tensor-wasm-api/src/http_metrics.rs (the sources of truth
for currently-emitted metrics).
Exists today
These metrics back panels that render real data immediately after import:
- tensor_wasm_active_instances (gauge)
- tensor_wasm_gpu_memory_used_bytes (gauge)
- tensor_wasm_kernel_dispatches_total (counter)
- tensor_wasm_kernel_latency_seconds_bucket / tensor_wasm_kernel_latency_seconds_count / tensor_wasm_kernel_latency_seconds_sum (histogram, 14 buckets spanning 10 µs to 10 s)
- tensor_wasm_instance_spawns_total (counter)
- tensor_wasm_instance_terminations_total (counter)
- tensor_wasm_offload_success_total (counter)
- tensor_wasm_offload_fallback_total (counter)
- tensor_wasm_http_requests_total{route,method,status} (counter, landed in W2.3 via tensor_wasm_api::http_metrics)
- tensor_wasm_http_request_duration_seconds_bucket / tensor_wasm_http_request_duration_seconds_count / tensor_wasm_http_request_duration_seconds_sum (histogram, 12 buckets spanning 1 ms to 10 s, labels route / method / status, landed in W2.3)
- tensor_wasm_http_requests_in_flight{route,method} (gauge, landed in W2.3; capacity panel only, not an SLI)
- tensor_wasm_build_info{version,git_sha,rustc_version,profile,target} (info-style gauge, value always 1; landed in W4.9). Powers the Build identity stat panel and is referenced by the UPGRADE.md post-upgrade verification step that confirms every replica reports the expected version after a rolling deploy.
- tensor_wasm_jobs_active (gauge, single series; landed in C3). Incremented when POST /functions/:id/invoke-async records a JobRecord in the API-layer registry, decremented when the spawned task transitions the job to Completed or Failed. The Async-invocation row's "Pending jobs" panel renders this series directly. v0.3.x is intentionally a single series; the v0.4 follow-up will switch the field to a Family<TenantLabels, ...> and the panel's PromQL will tolerate the relabel without edit.
- tensor_wasm_gpu_memory_bytes_per_tenant{tenant_id} (gauge family; landed in C3). Updated by tensor-wasm-tenant on every consume_bytes / release_bytes accounting transition when the context is constructed with TenantContextBuilder::with_metrics. Additive to the pre-existing single-series total at tensor_wasm_gpu_memory_used_bytes; sum by () (tensor_wasm_gpu_memory_bytes_per_tenant) is expected to track that total within scrape jitter. The Tenant row's "GPU memory by tenant" panel switches from the single-series total to this family automatically once at least one tenant has been observed.
Caveat: tensor_wasm_active_instances and
tensor_wasm_kernel_dispatches_total are emitted today as aggregates
with no tenant label. The Tenant-row panels that read them query
sum by (tenant) (...), which collapses to a single series until
per-tenant labeling lands. That is intentional: the panel layout
should not change when the labels appear, only the number of series.
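The expectation stated under Exists today, that the per-tenant GPU-memory family sums to the single-series total, can be spot-checked against a saved /metrics scrape. The sketch below is one way to do that under stated assumptions: the 5% tolerance is a hypothetical stand-in for "scrape jitter", and the regex-based parser is deliberately naive (it does not handle label values containing a closing brace).

```python
import re

# Matches "name{labels} value" or "name value" in Prometheus text exposition.
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def sum_metric(exposition: str, name: str) -> float:
    """Sum every sample of `name` across all label sets."""
    total = 0.0
    for line in exposition.splitlines():
        if line.startswith("#"):
            continue  # skip HELP / TYPE comments
        match = SAMPLE.match(line)
        if match and match.group(1) == name:
            total += float(match.group(3))
    return total


def per_tenant_tracks_total(exposition: str, tolerance: float = 0.05) -> bool:
    """True if the per-tenant sum is within `tolerance` of the total gauge."""
    per_tenant = sum_metric(exposition, "tensor_wasm_gpu_memory_bytes_per_tenant")
    total = sum_metric(exposition, "tensor_wasm_gpu_memory_used_bytes")
    if total == 0:
        return per_tenant == 0
    return abs(per_tenant - total) / total <= tolerance
```

Pass the body of a GET /metrics response as `exposition`. A False result on a single scrape is not necessarily a bug, since the two gauges are updated at different points in the accounting path.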
TODO, will fill in when remaining follow-ups land
These metrics are referenced by panels but are not yet emitted by the code. The corresponding panels will show "No data" / "N/A" until the listed follow-up ships the missing instrumentation. Each is flagged inline in the panel description.
(The HTTP gateway counter / duration / in-flight metrics that were previously in this list landed in W2.3 — see Exists today.)
Tenant labeling on existing metrics (a relabeling, not a new metric):
- tensor_wasm_active_instances{tenant} (add label)
- tensor_wasm_kernel_dispatches_total{tenant} (add label)
Per-tenant GPU memory is covered by the additive family
tensor_wasm_gpu_memory_bytes_per_tenant{tenant_id} rather than by
relabeling the existing gauge (see Exists today). The pre-existing
single-series tensor_wasm_gpu_memory_used_bytes total stays, so
existing alerts that aggregate against it do not break.
Snapshot subsystem (proposed instrumentation point: tensor-wasm-mem
snapshot save/restore paths):
- tensor_wasm_snapshot_capture_seconds_bucket (histogram)
- tensor_wasm_snapshot_restore_seconds_bucket (histogram)
- tensor_wasm_snapshot_disk_round_trip_seconds_bucket (histogram)
JIT / auto-offload subsystem (proposed instrumentation point:
tensor-wasm-jit cache and emit paths):
- tensor_wasm_jit_cache_warm_hit_total (counter)
- tensor_wasm_jit_cache_cold_miss_then_insert_total (counter)
- tensor_wasm_jit_emit_text_seconds_bucket{blueprint} (histogram)
Back-pressure semaphore (proposed instrumentation point:
tensor-wasm-exec dispatch semaphore):
- tensor_wasm_backpressure_permits_used (gauge)
- tensor_wasm_backpressure_permits_available (gauge)
When a metric in this TODO list ships, no dashboard edit is required — the panel that references it will start populating with data on the next scrape. The panel descriptions should then be updated in a follow-up PR to remove the "shows N/A until …" hedge.
Editing the dashboard
The JSON is the source of truth. If you edit the dashboard in Grafana's UI:
- Use Dashboard settings → JSON Model to copy the updated JSON.
- Strip the id and uid fields from the top level; they are instance-specific and should not be committed.
- Replace the contents of tensor-wasm-overview.json with the stripped JSON.
- Commit per the rules in docs/SLO.md §9 (any threshold or query change that affects an SLO panel must land in the same PR as the SLO change).
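The strip step above is easy to forget, so it may help to script it. The following is a minimal sketch, not a tool that ships with the repository; the function names and the exported.json path in the usage note are hypothetical. It removes only the top-level id and uid keys and leaves per-panel id values untouched.

```python
import json


def strip_instance_fields(dashboard: dict) -> dict:
    """Return a copy of the dashboard without the top-level id and uid keys."""
    return {k: v for k, v in dashboard.items() if k not in ("id", "uid")}


def rewrite_dashboard(src_path: str, dst_path: str) -> None:
    """Read an exported JSON model and write the stripped version."""
    with open(src_path, encoding="utf-8") as src:
        dashboard = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(strip_instance_fields(dashboard), dst, indent=2)
        dst.write("\n")  # keep a trailing newline for clean diffs
```

For example, after saving the JSON Model to a scratch file, calling rewrite_dashboard("exported.json", "docs/dashboards/tensor-wasm-overview.json") would produce the file to commit.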
Cross-references
- docs/SLO.md: SLI/SLO definitions and burn-rate alert thresholds that this dashboard visualizes.
- docs/OBSERVABILITY.md: tracing schema and OTLP setup; the metrics half of the same observability story.
- crates/tensor-wasm-core/src/metrics.rs: source of truth for currently-emitted metric names.
- PATH-TO-V1.md: v0.3 exit criterion "Reference Grafana dashboard committed" is satisfied by this directory.
Status: v0.3 gate. HTTP request rate, error rate, and latency panels
render real data as of W2.3. The async-invocation "Pending jobs"
panel and the per-tenant GPU-memory breakdown render real data as of
C3 (tensor_wasm_jobs_active,
tensor_wasm_gpu_memory_bytes_per_tenant). The remaining "No data"
panels (snapshot histograms, JIT cache counters, back-pressure
gauges, and tenant labeling on the remaining counter/gauge pair) are
tracked in Metric inventory.