Craton TensorWasm — Audit Log
This document is the v0.4 "Audit log" exit criterion from
PATH-TO-V1.md. It defines the wire-format schema of
each audit record, the configuration knob that selects the destination
sink, the operational guidance for log rotation, and the integration
contract with the W2.8 mTLS / reverse-proxy story.
If you are running TensorWasm in production: this is the file your
auditor will ask for. If you are building a tail-side consumer (SIEM,
Loki, BigQuery, ...): the schema below is the
contract — error.kind and action strings are stable across patch
releases.
Contents
1. Record schema
2. Sample records
3. Configuration
4. What gets logged (and what does not)
5. Log rotation and storage
6. mTLS / XFCC integration
7. Latency budget
8. Stability
9. Related
1. Record schema
Every state-mutating API call produces exactly one JSON object, emitted as a single line (JSONL convention — no embedded newlines, no leading or trailing whitespace). The full schema:
| Field | Type | Required | Stability | Description |
|---|---|---|---|---|
| ts_unix_ms | u64 | yes | stable | Wall-clock time the record was synthesised. Millisecond precision, Unix epoch. |
| request_id | UUIDv4 string | yes | stable | Per-request id generated by the audit middleware. Surfaced into request extensions so application logs can correlate. |
| actor.kind | "bearer" \| "dev" | yes | stable | bearer for a token-authenticated caller; dev when no TENSOR_WASM_API_TOKENS is configured. |
| actor.token_id | u64 or null | optional | stable | Stable process-local hash of the bearer string. null for dev-mode actors. |
| actor.scope.kind | tag string | yes | stable | One of "wildcard", "tenant_set", "dev". See below. |
| actor.scope.tenants | [u64, ...] | optional | stable | Sorted list of allowed tenant ids. Present only when scope.kind == "tenant_set". |
| action | tag string | yes | stable | One of "create_function", "delete_function", "invoke_function", "invoke_function_async". |
| resource.function_id | UUIDv4 string | optional | stable | Function id parsed from the URL. Absent for POST /functions (the id is assigned by the handler). |
| resource.tenant_id | u64 | optional | stable | Tenant id resolved from X-TensorWasm-Tenant. Absent for routes that do not bind to a tenant. |
| outcome.status_code | u16 | yes | stable | HTTP status code returned to the client. |
| outcome.error_kind | string | optional | stable | The error.kind value from the JSON error envelope, when the response was non-2xx. |
| latency_ms | u64 | yes | stable | End-to-end handler latency. |
| peer_addr | string or null | optional | additive | Caller's peer socket address. Populated only when the listener is bound via axum::extract::ConnectInfo; today the serve() helper does not wire this, so the value is always null in v0.4. |
| client_cert_subject | string or null | optional | stable | Client-cert Subject DN recovered from X-Forwarded-Client-Cert. Populated when an XFCC-aware reverse proxy fronts the gateway. See §6. |
The scope object is internally tagged: kind is always present, and
the extra fields are gated on that tag. This makes the record easy to
pattern-match in jq:
jq 'select(.actor.scope.kind == "tenant_set") | .actor.scope.tenants'
actor.scope.kind semantics
| kind | Meaning |
|---|---|
| wildcard | Token covers every tenant (tenant=* or a legacy bare entry). |
| tenant_set | Token covers only the tenant ids listed in tenants (sorted, stable order). |
| dev | Dev-mode pass-through: no allowlist was configured. Should not appear in prod. |
A dev record landing in your production audit stream is the
fingerprint of a misconfigured deployment that silently ran without
TENSOR_WASM_API_TOKENS set. Alert on it.
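For tail-side consumers, the check above can be sketched in a few lines of Python. This is a minimal illustration, not part of the gateway: the function names (parse_record, is_dev_record, allowed_tenants) are hypothetical, and the required-field list is taken from the schema table in this section.

```python
import json

# Top-level fields the schema table marks as required.
REQUIRED = ("ts_unix_ms", "request_id", "actor", "action", "outcome", "latency_ms")

def parse_record(line: str) -> dict:
    """Parse one JSONL audit line and check the required top-level fields."""
    record = json.loads(line)
    missing = [key for key in REQUIRED if key not in record]
    if missing:
        raise ValueError(f"audit record missing fields: {missing}")
    return record

def is_dev_record(record: dict) -> bool:
    """True when the record came from a dev-mode actor (no token allowlist)."""
    actor = record["actor"]
    return actor.get("kind") == "dev" or actor.get("scope", {}).get("kind") == "dev"

def allowed_tenants(record: dict):
    """Return the tenant allowlist, or None when the scope is not tenant_set."""
    scope = record["actor"]["scope"]
    return scope["tenants"] if scope["kind"] == "tenant_set" else None

line = (
    '{"ts_unix_ms":1716491220123,"request_id":"b8b6f7e0-3c12-4d51-a0a0-9d7b67c3a5e1",'
    '"actor":{"kind":"bearer","token_id":14217683123456789,'
    '"scope":{"kind":"tenant_set","tenants":[1,2,7]}},'
    '"action":"invoke_function","resource":{"tenant_id":7},'
    '"outcome":{"status_code":200},"latency_ms":14,'
    '"peer_addr":null,"client_cert_subject":null}'
)
record = parse_record(line)
print(is_dev_record(record), allowed_tenants(record))  # False [1, 2, 7]
```

A production alert would typically call is_dev_record on every line of the stream and page when it returns True.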
2. Sample records
2.1 Successful invoke
{
"ts_unix_ms": 1716491220123,
"request_id": "b8b6f7e0-3c12-4d51-a0a0-9d7b67c3a5e1",
"actor": {
"kind": "bearer",
"token_id": 14217683123456789,
"scope": { "kind": "tenant_set", "tenants": [1, 2, 7] }
},
"action": "invoke_function",
"resource": {
"function_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
"tenant_id": 7
},
"outcome": { "status_code": 200 },
"latency_ms": 14,
"peer_addr": null,
"client_cert_subject": null
}
2.2 Tenant-scope denial
{
"ts_unix_ms": 1716491220456,
"request_id": "27e0f9c1-1e84-4b62-9d7c-7a1f4d5b1a3a",
"actor": {
"kind": "bearer",
"token_id": 14217683123456789,
"scope": { "kind": "tenant_set", "tenants": [1, 2, 7] }
},
"action": "invoke_function",
"resource": {
"function_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
"tenant_id": 99
},
"outcome": { "status_code": 403, "error_kind": "tenant_scope_denied" },
"latency_ms": 0,
"peer_addr": null,
"client_cert_subject": null
}
2.3 mTLS-fronted deploy
{
"ts_unix_ms": 1716491220789,
"request_id": "44b3a812-6f4c-49c8-b1c5-c0c8a8a2e2bd",
"actor": {
"kind": "bearer",
"token_id": 14217683123456789,
"scope": { "kind": "wildcard" }
},
"action": "create_function",
"resource": { "tenant_id": 7 },
"outcome": { "status_code": 200 },
"latency_ms": 21,
"peer_addr": null,
"client_cert_subject": "CN=client-prod,O=Acme"
}
3. Configuration (TENSOR_WASM_API_AUDIT_LOG)
The audit-log destination is selected at server startup by the
environment variable TENSOR_WASM_API_AUDIT_LOG:
| Value | Resulting sink | Use case |
|---|---|---|
| (unset) or empty | stdout (JSONL) | Default. Container runtimes capture stdout and forward it. |
| stdout | stdout (JSONL) | Explicit form; recommended for self-documenting deployments. |
| none | no-op (audit disabled) | Downstream consumer already aggregates Prometheus + OTel; no third stream wanted. |
| file:/path/to/audit.log | append-only file (JSONL) | Bare-metal hosts, classic syslog-style consumers. |
Unrecognised values fall back to stdout with a tracing::warn! at
startup. A file: path that cannot be opened logs tracing::error! and
also falls back to stdout: refusing to start because a log target is
unavailable would be hostile in container environments where the
backing volume mounts asynchronously.
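The selection rules can be summarised as a small decision function. The Python below is an illustrative model of the table and fallback behaviour, not the gateway's Rust code; select_sink and its return shape are hypothetical. Treating a bare "file:" with no path as unrecognised is an assumption, and the file-open failure fallback is not modelled.

```python
def select_sink(value):
    """Map a TENSOR_WASM_API_AUDIT_LOG value to (sink, path, warning).

    `value` is None when the variable is unset.
    """
    if value is None or value == "" or value == "stdout":
        return ("stdout", None, None)
    if value == "none":
        return ("noop", None, None)
    # Assumption: "file:" must be followed by a non-empty path.
    if value.startswith("file:") and len(value) > len("file:"):
        return ("file", value[len("file:"):], None)
    # Unrecognised values fall back to stdout with a startup warning.
    return ("stdout", None, f"unrecognised audit log target {value!r}; using stdout")

for candidate in (None, "stdout", "none", "file:/var/log/tensor-wasm/audit.log", "syslog"):
    print(repr(candidate), "->", select_sink(candidate))
```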
3.1 Stdout sink
# Default — equivalent.
unset TENSOR_WASM_API_AUDIT_LOG
TENSOR_WASM_API_AUDIT_LOG=stdout tensor-wasm serve --addr 0.0.0.0:8080
The stdout sink writes each record via println!, then mirrors it at
tracing::info! level on the tensor_wasm_api::audit target. The
info! mirror is what an OTel collector picks up alongside the
per-request span, so audit records correlate with traces by
request_id / traceparent.
3.2 File sink
mkdir -p /var/log/tensor-wasm && chown tensor-wasm:tensor-wasm /var/log/tensor-wasm
TENSOR_WASM_API_AUDIT_LOG=file:/var/log/tensor-wasm/audit.log \
tensor-wasm serve --addr 0.0.0.0:8080
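Each record is appended with write_all + flush. The flush forces a
write(2) per record, so a process crash loses at most one record. Note
that write_all + flush hands the data to the kernel but is not an fsync,
so a kernel crash or power loss can still lose records that have not
reached the disk. On commodity NVMe the per-record cost we measured is
~30–80 µs (Linux ext4, single-writer). See §7.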
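To make the append pattern concrete, here is a Python analogue of a lock-guarded, flush-per-record JSONL writer. It is a sketch of the technique only: FileJsonlWriter is a hypothetical name, and the gateway's actual FileJsonSink is Rust code whose details may differ.

```python
import json
import threading

class FileJsonlWriter:
    """Append-only JSONL writer: one lock, one write + flush per record."""

    def __init__(self, path):
        # Mode "a" opens with O_APPEND, so every write lands at end of file.
        self._file = open(path, "a", encoding="utf-8")
        self._lock = threading.Lock()

    def emit(self, record):
        # Serialise outside the lock; compact separators keep one record per line.
        line = json.dumps(record, separators=(",", ":"))
        with self._lock:
            self._file.write(line + "\n")
            # flush() pushes the userspace buffer to the kernel; it does not fsync.
            self._file.flush()

    def close(self):
        with self._lock:
            self._file.close()
```

Holding the lock only around the write keeps concurrent emitters from interleaving partial lines.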
3.3 Disabled
TENSOR_WASM_API_AUDIT_LOG=none tensor-wasm serve --addr 0.0.0.0:8080
Selecting none swaps in the NoopSink: the middleware still computes
the timestamp and request id (so handlers' Extension<Uuid> lookups
keep working), but the record is dropped before serialisation. This
mode is for deployments that have a separate compliance pipeline
(typically built on the W2.3 HTTP request metrics + OTel spans) and do
not want a third stream.
4. What gets logged (and what does not)
4.1 State-mutating routes that emit records
| Method | Path | action |
|---|---|---|
| POST | /functions | create_function |
| DELETE | /functions/{id} | delete_function |
| POST | /functions/{id}/invoke | invoke_function |
| POST | /functions/{id}/invoke-async | invoke_function_async |
Records are emitted for every outcome that reaches the audit layer,
including 4xx denials and 5xx server errors. The exception is a 401
rejection: the audit middleware sits after bearer_auth, tenant_scope,
and rate_limit in the chain, so a 401 from bearer_auth short-circuits
before the audit layer runs and emits no record (no authenticated actor
exists yet). 403 tenant_scope_denied denials and 429 rate_limited
rejections do emit, because the actor is known by then.
4.2 Read-only routes that do NOT emit
| Method | Path | Reason |
|---|---|---|
| GET | /healthz | Probe noise: every container orchestrator hits this every few seconds. |
| GET | /metrics | Prometheus scrape, typically every 15 s. |
| GET | /jobs/{id} | Poll loop on async invocations: high cardinality, low value. |
| any | unknown route | 404 from the router; nothing meaningful to audit. |
The route filter is AuditAction::classify. Suppression happens
before the record is constructed, so read-only paths pay only for the
route classification itself.
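The two route tables in this section amount to a method-and-path lookup. The Python sketch below mirrors them for consumers that want to replay or test the mapping; it is not a port of AuditAction::classify, and details such as trailing-slash handling are assumptions.

```python
import re

_ID = r"[^/]+"  # any single path segment; the gateway's id validation may be stricter

_ROUTES = [
    ("POST", re.compile(r"^/functions$"), "create_function"),
    ("DELETE", re.compile(rf"^/functions/{_ID}$"), "delete_function"),
    ("POST", re.compile(rf"^/functions/{_ID}/invoke$"), "invoke_function"),
    ("POST", re.compile(rf"^/functions/{_ID}/invoke-async$"), "invoke_function_async"),
]

def classify(method, path):
    """Return the audit action for a request, or None when nothing is audited."""
    for route_method, pattern, action in _ROUTES:
        if method == route_method and pattern.match(path):
            return action
    return None

print(classify("POST", "/functions/f47ac10b/invoke"))  # invoke_function
print(classify("GET", "/healthz"))                     # None
```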
4.3 What the record does NOT carry
The audit log is intentionally narrow. It does not contain:
- Request bodies: Wasm module bytes, invocation arguments, or any payload data. Logging the body would inflate the log 64× and potentially capture secrets the caller passed as invocation input.
- Response bodies, including JSON results from successful invocations.
- The bearer string itself: only the SipHash-derived token_id appears. The hash is keyed with process-local random state by the standard library; it is stable within a process lifetime but not comparable across restarts.
- PII or user identifiers beyond what the operator built into the bearer-token allocation scheme.
If you need richer per-request introspection, attach an OTel collector
and consume the http.request spans the gateway emits. The audit log
is the minimum who-when-what trail, not a debugging fire-hose.
5. Log rotation and storage
5.1 Stdout sink
When the gateway runs under a container runtime, stdout capture is the runtime's responsibility. Defaults:
- Docker with the default json-file log driver: rotates at the per-container limits (defaults are unbounded; set --log-opt max-size=100m --log-opt max-file=5 explicitly).
- Kubernetes: kubelet rotates container logs at the node-level configuration (containerLogMaxSize and containerLogMaxFiles, defaults 10Mi × 5). Configure your log shipper (Fluent Bit, Vector, Loki Promtail) to honour rotation. The kubelet rewrites the file on rotation, so a naive tail -F keeps working.
- systemd with journald: rotation policy lives in /etc/systemd/journald.conf (SystemMaxUse, MaxFileSec).
For a SIEM pipeline, treat the JSONL lines as the log-shipping
contract. Each line is independently parseable; out-of-order delivery
across hosts is acceptable because consumers can re-sort on the
ts_unix_ms field (wall-clock time, so subject to clock skew between hosts).
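As an illustration of re-sorting on the consumer side, the sketch below merges several per-host JSONL streams by ts_unix_ms. It assumes each individual stream is already in timestamp order; merge_streams is a hypothetical helper name, and clock skew between hosts is not corrected.

```python
import heapq
import json

def _records(lines):
    """Yield parsed records from an iterable of JSONL lines, skipping blanks."""
    for line in lines:
        if line.strip():
            yield json.loads(line)

def merge_streams(*streams):
    """Lazily merge per-host streams into one sequence ordered by ts_unix_ms."""
    return heapq.merge(*(_records(s) for s in streams),
                       key=lambda record: record["ts_unix_ms"])

host_a = ['{"ts_unix_ms": 100, "request_id": "a1"}',
          '{"ts_unix_ms": 300, "request_id": "a2"}']
host_b = ['{"ts_unix_ms": 200, "request_id": "b1"}']
print([r["request_id"] for r in merge_streams(host_a, host_b)])  # ['a1', 'b1', 'a2']
```

If a stream cannot be assumed ordered, sorting the combined records with sorted(..., key=...) is the simpler fallback at the cost of holding them all in memory.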
5.2 File sink
The FileJsonSink opens the path append-only at startup; it does not
rotate the file itself. Operators are expected to use one of:
- logrotate with the copytruncate strategy (recommended for the v0.4 binary, since the sink holds a long-lived file handle):

      /var/log/tensor-wasm/audit.log {
          daily
          rotate 14
          compress
          delaycompress
          missingok
          notifempty
          copytruncate
          create 0640 tensor-wasm tensor-wasm
      }

  copytruncate means logrotate copies the file contents to the rotated archive and truncates the original in place. The gateway's open file descriptor keeps pointing at the same (now empty) inode and resumes appending without an explicit reopen signal. This pattern requires no process restart, but records written in the short window between the copy and the truncate can be lost. The create directive has no effect when copytruncate is set, so it is harmless but redundant here.
- Rotate by SIGHUP-driven restart (simpler but loses a few records in flight): use a sidecar that periodically runs mv audit.log audit.log.N, gzip, and systemctl reload tensor-wasm-api. Requires the reopen-on-SIGHUP feature, which is not yet implemented in v0.4 (see §8). Until then, copytruncate is the supported path.
5.3 Long-term retention
The audit log is the durable record of who did what, so your compliance and incident-response window length determines retention. For SOC 2 / ISO 27001 the typical target is 12 months. The records are dense (a typical line is ~300–500 bytes), so a node serving 100 state-mutating calls per second (about 8.6 million records per day) produces roughly 2.4 to 4 GiB per day before gzip. Plan accordingly.
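The sizing estimate is simple arithmetic and can be re-run for your own request rate. The helper below is a hypothetical convenience; the 100 calls per second and 300 to 500 byte line sizes are the figures quoted above, for which the upper bound works out to about 4 GiB per day.

```python
SECONDS_PER_DAY = 86_400
GIB = 2**30

def daily_volume_gib(records_per_second, bytes_per_record):
    """Uncompressed audit-log volume per day, in GiB."""
    return records_per_second * SECONDS_PER_DAY * bytes_per_record / GIB

low = daily_volume_gib(100, 300)
high = daily_volume_gib(100, 500)
print(f"{low:.1f} to {high:.1f} GiB/day")  # 2.4 to 4.0 GiB/day
```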
6. mTLS / XFCC integration (W2.8)
The audit middleware recovers the client_cert_subject field from the
X-Forwarded-Client-Cert (XFCC) request header when an mTLS-terminating
reverse proxy fronts the gateway. The pattern is documented in
docs/deployment/mtls.md §4 (Architecture B).
6.1 What we parse
The XFCC value is a ;-separated list of key=value pairs per the
Envoy XFCC spec.
We extract the first Subject="..." component, unescape backslash-escaped
quotes (\") inside it, and record the inner DN as client_cert_subject. Other
components (URI=, Hash=, DNS=) are intentionally ignored in
v0.4; adding them is a forward-compatible additive change.
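The extraction step can be approximated with a single regular expression. The Python below is a sketch for consumers who want to reproduce or test the behaviour, not the gateway's parser: xfcc_subject is a hypothetical name, the sample header values are invented, and the sketch assumes inner quotes are escaped with a backslash as in Envoy's XFCC format. It does not guard against a Subject= substring nested inside another quoted value.

```python
import re

# Subject="..." at the start of the header or after a ';' or ',' separator.
# The quoted value may contain backslash-escaped characters such as \".
_SUBJECT = re.compile(r'(?:^|[;,])\s*Subject="((?:[^"\\]|\\.)*)"')

def xfcc_subject(header):
    """Return the first Subject DN in an XFCC header value, or None."""
    match = _SUBJECT.search(header)
    if match is None:
        return None
    # Undo the backslash escaping of embedded double quotes.
    return match.group(1).replace('\\"', '"')

header = 'By=spiffe://example/gw;Hash=abc123;Subject="CN=client-prod,O=Acme";URI=spiffe://example/client'
print(xfcc_subject(header))  # CN=client-prod,O=Acme
```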
6.2 Trust boundary
The audit middleware does not validate that XFCC came from a trusted
proxy. Any caller can set the header on a plaintext request and
control the recorded client_cert_subject value. This is the classic
forwarded-header trust bug (same shape as X-Forwarded-For
spoofing). Two mitigations:
- Bind the gateway to a private network and refuse plaintext external traffic. When the only caller that can reach the listener is your trusted proxy (Architecture B), the spoofing surface vanishes.
- Configure the proxy to overwrite XFCC. Envoy's forward_client_cert_details: SANITIZE_SET mode replaces any incoming XFCC with the value Envoy computes itself; the equivalent in nginx is a defensive proxy_set_header X-Forwarded-Client-Cert ""; immediately before the trusted write.
A future PR will add an opt-in trusted-proxy CIDR allowlist
(TENSOR_WASM_API_TRUSTED_PROXY_CIDRS) so the audit middleware can
gate XFCC parsing on the connection's remote address. See the
W2.8 mTLS doc, "TODO (v0.4)" in §7.4.
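Until that lands, the gating idea can be prototyped outside the gateway. The sketch below shows one way such a check could work; it is purely illustrative. The comma-separated format for TENSOR_WASM_API_TRUSTED_PROXY_CIDRS is an assumption (the variable is not implemented yet), and both function names are hypothetical.

```python
import ipaddress

def parse_cidrs(value):
    """Parse a comma-separated CIDR list (assumed format) into network objects."""
    return [ipaddress.ip_network(item.strip())
            for item in value.split(",") if item.strip()]

def should_trust_xfcc(remote_addr, trusted):
    """True when the connection's remote address falls inside a trusted CIDR."""
    addr = ipaddress.ip_address(remote_addr)
    # Membership tests across IP versions (v4 address, v6 network) are simply False.
    return any(addr in network for network in trusted)

trusted = parse_cidrs("10.0.0.0/8, 192.168.1.0/24")
print(should_trust_xfcc("10.1.2.3", trusted))     # True
print(should_trust_xfcc("203.0.113.7", trusted))  # False
```

A gate like this needs the connection's remote address, which the v0.4 serve() helper does not yet expose (peer_addr is always null).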
6.3 Architecture A — self-terminated mTLS
When TensorWasm itself terminates the TLS handshake (Architecture A in
the W2.8 doc), the client cert is available in tokio_rustls's
session state, not via the XFCC header. The audit middleware does not
yet consume that source — Architecture A itself is not implemented in
v0.4. When serve_tls() lands, the cert Subject from
rustls::server::ServerConnection::peer_certificates() should populate
the same client_cert_subject field. The on-wire shape does not
change.
7. Latency budget
The audit middleware is documented in the source to add < 100 µs per state-mutating request under typical load. The measurement:
| Sink | Observed cost per emit | Notes |
|---|---|---|
| StdoutJsonSink | ~6–18 µs | serde_json::to_string (~3–6 µs) + println! lock + write. |
| FileJsonSink | ~30–80 µs | adds Mutex<File> lock + write_all + flush (one write(2) per record; see §3.2). |
| NoopSink | ~50 ns | record is built but never serialised; just an Arc dispatch. |
| read-only routes | 0 | route filter short-circuits before record construction. |
These are local figures from a modern x86 workstation (Ryzen 9, NVMe);
operators should re-measure under realistic disk load and contention.
If the file sink shows tails > 100 µs in your environment, the
recommended fix is to wrap the write in tokio::task::spawn_blocking
— the trait's emit is sync today because the in-process latency we
observed does not justify the additional task spawn (5–10 µs of its
own) and the cadence is bounded by the upstream rate limit anyway.
The middleware runs after the handler returns, so the cost is
hidden behind the response future's .await — the client receives the
bytes once the inner future yields, but the audit work overlaps with
TCP write(2) flushing the response. End-to-end client latency
reflects this overlap; the per-request budget above is the added
serial cost.
8. Stability guarantees
The following are part of the public contract and will only change across a major version bump:
- The four action tag strings (create_function, delete_function, invoke_function, invoke_function_async).
- The actor.kind tag strings (bearer, dev).
- The actor.scope.kind tag strings (wildcard, tenant_set, dev).
- The outcome.error_kind values; these are the same kind strings documented in crates/tensor-wasm-api/API.md.
- The TENSOR_WASM_API_AUDIT_LOG env var name and its three accepted shapes (unset/stdout, none, file:<path>).
The following are not stable across patch releases:
- The token_id integer value for a given bearer string: derived from the standard library's randomly-seeded SipHash and re-seeded per process.
- The latency_ms distribution, which improves as the executor speeds up.
- Whether peer_addr is populated by the serve() helper. Today it is always null; a future PR will wire into_make_service_with_connect_info.
v0.4 limitations to track for v0.5:
- No SIGHUP-driven file reopen; rotation requires copytruncate or a process restart.
- No trusted-proxy CIDR allowlist; XFCC is parsed unconditionally when the header is present.
- No structured emitter for an OTLP collector; the tracing::info! mirror is the integration point today.
- peer_addr always null (see above).
9. Related
- docs/PATH-TO-V1.md: v0.4 "Audit log" exit criterion.
- crates/tensor-wasm-api/API.md: HTTP surface, error envelope, the kind strings that appear in outcome.error_kind.
- docs/deployment/mtls.md: XFCC source and trust boundary discussion (§6 above).
- docs/SLO.md: request-latency SLOs the audit middleware must not violate (the < 100 µs budget defended in §7).
- crates/tensor-wasm-api/src/audit.rs: implementation, sink trait, configuration.
Status: v0.4 release. Schema and the four action strings are
frozen. peer_addr wiring and the trusted-proxy CIDR allowlist are
the two open items targeted for v0.5.