Craton TensorWasm — Audit Log

This document is the v0.4 "Audit log" exit criterion from PATH-TO-V1.md. It defines the wire-format schema of each audit record, the configuration knob that selects the destination sink, the operational guidance for log rotation, and the integration contract with the W2.8 mTLS / reverse-proxy story.

If you are running TensorWasm in production, this is the file your auditor will ask for. If you are building a tail-side consumer (SIEM, Loki, BigQuery, ...), the schema below is the contract: the error.kind and action strings change only across a major version bump (see §8).

1. Record schema · 2. Sample records · 3. Configuration · 4. What gets logged (and what does not) · 5. Log rotation and storage · 6. mTLS / XFCC integration · 7. Latency budget · 8. Stability · 9. Related

1. Record schema

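Every state-mutating API call that reaches the audit middleware produces exactly one JSON object (§4.1 describes the one exception, unauthenticated 401 rejections), emitted as a single line (JSONL convention: no embedded newlines, no leading or trailing whitespace). The samples in §2 are pretty-printed for readability only. The full schema: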

Field	Type	Required	Stability	Description
`ts_unix_ms`	`u64`	yes	stable	Wall-clock time the record was synthesised. Millisecond precision, Unix epoch.
`request_id`	UUIDv4 string	yes	stable	Per-request id generated by the audit middleware. Surfaced into request extensions so application logs can correlate.
`actor.kind`	`"bearer"\|"dev"`	yes	stable	`bearer` for a token-authenticated caller; `dev` when no `TENSOR_WASM_API_TOKENS` is configured.
`actor.token_id`	`u64` or `null`	optional	stable	Stable process-local hash of the bearer string. `null` for dev-mode actors.
`actor.scope.kind`	tag string	yes	stable	One of `"wildcard"`, `"tenant_set"`, `"dev"`. See below.
`actor.scope.tenants`	`[u64, ...]`	optional	stable	Sorted list of allowed tenant ids. Present only when `scope.kind == "tenant_set"`.
`action`	tag string	yes	stable	One of `"create_function"`, `"delete_function"`, `"invoke_function"`, `"invoke_function_async"`.
`resource.function_id`	UUIDv4 string	optional	stable	Function id parsed from the URL. Absent for `POST /functions` (the id is assigned by the handler).
`resource.tenant_id`	`u64`	optional	stable	Tenant id resolved from `X-TensorWasm-Tenant`. Absent for routes that do not bind to a tenant.
`outcome.status_code`	`u16`	yes	stable	HTTP status code returned to the client.
`outcome.error_kind`	string	optional	stable	The `error.kind` value from the JSON error envelope, when the response was non-2xx.
`latency_ms`	`u64`	yes	stable	End-to-end handler latency.
`peer_addr`	string or `null`	optional	additive	Caller's peer socket address. Populated only when the listener is bound via `axum::extract::ConnectInfo`; today the `serve()` helper does not wire this — value is always `null` in v0.4.
`client_cert_subject`	string or `null`	optional	stable	Client-cert Subject DN recovered from `X-Forwarded-Client-Cert`. Populated when an XFCC-aware reverse proxy fronts the gateway. See §6.

The scope object is internally tagged: kind is always present, and the extra fields are gated on that tag. This makes the record easy to pattern-match in jq:

jq 'select(.actor.scope.kind == "tenant_set") | .actor.scope.tenants'

`actor.scope.kind` semantics

`kind`	Meaning
`wildcard`	Token covers every tenant (`tenant=*` or a legacy bare entry).
`tenant_set`	Token covers only the tenant ids listed in `tenants` (sorted, stable order).
`dev`	Dev-mode pass-through — no allowlist was configured. Should not appear in prod.

A dev record landing in your production audit stream is the fingerprint of a misconfigured deployment that silently ran without TENSOR_WASM_API_TOKENS set. Alert on it.
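
The internally tagged scope object maps naturally onto an enum. The following is a minimal sketch of how a consumer or test harness might model it; the `Scope` type and its method names are hypothetical and are not taken from the gateway source.

```rust
/// Hypothetical in-memory model of `actor.scope`; names are illustrative.
#[derive(Debug, Clone, PartialEq)]
enum Scope {
    Wildcard,
    TenantSet { tenants: Vec<u64> },
    Dev,
}

impl Scope {
    /// The `kind` tag, which is always present on the wire.
    fn kind(&self) -> &'static str {
        match self {
            Scope::Wildcard => "wildcard",
            Scope::TenantSet { .. } => "tenant_set",
            Scope::Dev => "dev",
        }
    }

    /// Render the scope object as compact JSON. Only `tenant_set` carries
    /// the extra `tenants` field, emitted in sorted order.
    fn to_json(&self) -> String {
        match self {
            Scope::TenantSet { tenants } => {
                let mut sorted = tenants.clone();
                sorted.sort_unstable();
                let ids: Vec<String> = sorted.iter().map(|t| t.to_string()).collect();
                format!(r#"{{"kind":"tenant_set","tenants":[{}]}}"#, ids.join(","))
            }
            other => format!(r#"{{"kind":"{}"}}"#, other.kind()),
        }
    }

    /// True for the dev-mode scope, which should never appear in production.
    fn should_alert(&self) -> bool {
        matches!(self, Scope::Dev)
    }
}
```

A production alert rule would apply the same `should_alert` test to every record in the stream, typically in the SIEM's own query language rather than in Rust.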

2. Sample records

2.1 Successful invoke

{
  "ts_unix_ms": 1716491220123,
  "request_id": "b8b6f7e0-3c12-4d51-a0a0-9d7b67c3a5e1",
  "actor": {
    "kind": "bearer",
    "token_id": 14217683123456789,
    "scope": { "kind": "tenant_set", "tenants": [1, 2, 7] }
  },
  "action": "invoke_function",
  "resource": {
    "function_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
    "tenant_id": 7
  },
  "outcome": { "status_code": 200 },
  "latency_ms": 14,
  "peer_addr": null,
  "client_cert_subject": null
}

2.2 Tenant-scope denial

{
  "ts_unix_ms": 1716491220456,
  "request_id": "27e0f9c1-1e84-4b62-9d7c-7a1f4d5b1a3a",
  "actor": {
    "kind": "bearer",
    "token_id": 14217683123456789,
    "scope": { "kind": "tenant_set", "tenants": [1, 2, 7] }
  },
  "action": "invoke_function",
  "resource": {
    "function_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
    "tenant_id": 99
  },
  "outcome": { "status_code": 403, "error_kind": "tenant_scope_denied" },
  "latency_ms": 0,
  "peer_addr": null,
  "client_cert_subject": null
}

2.3 mTLS-fronted deploy

{
  "ts_unix_ms": 1716491220789,
  "request_id": "44b3a812-6f4c-49c8-b1c5-c0c8a8a2e2bd",
  "actor": {
    "kind": "bearer",
    "token_id": 14217683123456789,
    "scope": { "kind": "wildcard" }
  },
  "action": "create_function",
  "resource": { "tenant_id": 7 },
  "outcome": { "status_code": 200 },
  "latency_ms": 21,
  "peer_addr": null,
  "client_cert_subject": "CN=client-prod,O=Acme"
}

3. Configuration (`TENSOR_WASM_API_AUDIT_LOG`)

The audit-log destination is selected at server startup by the environment variable TENSOR_WASM_API_AUDIT_LOG:

Value	Resulting sink	Use case
(unset) or empty	stdout (JSONL)	Default. Container runtimes capture stdout and forward it.
`stdout`	stdout (JSONL)	Explicit form — recommended for self-documenting deployments.
`none`	no-op (audit disabled)	Downstream consumer already aggregates Prometheus + OTel; no third stream wanted.
`file:/path/to/audit.log`	append-only file (JSONL)	Bare-metal hosts, classic syslog-style consumers.

Unrecognised values fall back to stdout with a tracing::warn! at startup. A file: path that cannot be opened logs tracing::error! and also falls back to stdout: refusing to start because a log target is unavailable would be hostile in container environments where the backing volume mounts asynchronously.
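
The selection rules in the table and the fallback for unrecognised values can be sketched as a pure function. This is an illustration only: `SinkChoice` and `select_sink` are hypothetical names, exact case-sensitive matching with no whitespace trimming is an assumption, and the second fallback (a `file:` path that cannot be opened) happens later, when the file is actually opened.

```rust
use std::path::PathBuf;

/// Hypothetical result of parsing `TENSOR_WASM_API_AUDIT_LOG`.
#[derive(Debug, PartialEq)]
enum SinkChoice {
    Stdout,
    Noop,
    File(PathBuf),
}

/// Returns the selected sink and whether a startup warning is due
/// (true only when an unrecognised value fell back to stdout).
fn select_sink(raw: Option<&str>) -> (SinkChoice, bool) {
    match raw {
        // Unset, empty, and the explicit form all select stdout.
        None | Some("") | Some("stdout") => (SinkChoice::Stdout, false),
        Some("none") => (SinkChoice::Noop, false),
        Some(value) => match value.strip_prefix("file:") {
            // The path is not opened here; an open failure is handled later.
            Some(path) => (SinkChoice::File(PathBuf::from(path)), false),
            // Anything else falls back to stdout and should be warned about.
            None => (SinkChoice::Stdout, true),
        },
    }
}
```

At startup this could be called as `select_sink(std::env::var("TENSOR_WASM_API_AUDIT_LOG").ok().as_deref())`.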

3.1 Stdout sink

# Default: leaving the variable unset is equivalent to setting it to stdout.
unset TENSOR_WASM_API_AUDIT_LOG
TENSOR_WASM_API_AUDIT_LOG=stdout tensor-wasm serve --addr 0.0.0.0:8080

The stdout sink writes each record via println!, then mirrors it at tracing::info! level on the tensor_wasm_api::audit target. The info! mirror is what an OTel collector picks up alongside the per-request span, so audit records correlate with traces by request_id / traceparent.

3.2 File sink

mkdir -p /var/log/tensor-wasm && chown tensor-wasm:tensor-wasm /var/log/tensor-wasm
TENSOR_WASM_API_AUDIT_LOG=file:/var/log/tensor-wasm/audit.log \
  tensor-wasm serve --addr 0.0.0.0:8080

Each record is appended with write_all + flush. The flush forces a write(2) per record so a process crash loses at most one record; on commodity NVMe the worst case we measured is ~30–80 µs (Linux ext4, single-writer). See §7.
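
A minimal sketch of an append-only sink in the shape described above, using only the standard library. The `FileSink` type is hypothetical and is not the gateway's `FileJsonSink`; the choice to write the record and its newline in a single `write_all` call is an assumption made here so that each line reaches the file in one piece.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::sync::Mutex;

/// Hypothetical append-only JSONL sink guarded by a mutex.
struct FileSink {
    file: Mutex<File>,
}

impl FileSink {
    /// Open (or create) the file in append mode; the handle is long-lived.
    fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { file: Mutex::new(file) })
    }

    /// Append one already-serialised record as a single line.
    fn emit(&self, json_line: &str) -> io::Result<()> {
        // Recover the guard even if another thread panicked while holding it.
        let mut file = self.file.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
        let mut buf = String::with_capacity(json_line.len() + 1);
        buf.push_str(json_line);
        buf.push('\n');
        file.write_all(buf.as_bytes())?;
        // A bare `File` has no userspace buffer, so this flush is a no-op;
        // it only matters if the handle is wrapped in a buffered writer.
        file.flush()
    }
}
```

Because the file is opened with `append(true)`, every write lands at the current end of file, which is also what lets the `copytruncate` rotation in §5.2 work without reopening the handle.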

3.3 Disabled

TENSOR_WASM_API_AUDIT_LOG=none tensor-wasm serve --addr 0.0.0.0:8080

Selecting none swaps in the NoopSink: the middleware still computes the timestamp and request id (so handlers' Extension<Uuid> lookups keep working), but the record is dropped before serialisation. This mode is for deployments that have a separate compliance pipeline (typically built on the W2.3 HTTP request metrics + OTel spans) and do not want a third stream.

4. What gets logged (and what does not)

4.1 State-mutating routes that emit records

Method	Path	`action`
`POST`	`/functions`	`create_function`
`DELETE`	`/functions/{id}`	`delete_function`
`POST`	`/functions/{id}/invoke`	`invoke_function`
`POST`	`/functions/{id}/invoke-async`	`invoke_function_async`

Records are emitted for every outcome that reaches the audit layer, including 4xx denials and 5xx server errors. The one exception is a 401: bearer_auth runs before the audit middleware, so a 401 from bearer_auth short-circuits before the audit layer runs and emits no record (no authenticated actor exists yet). Rejections from tenant_scope (403 tenant_scope_denied) and rate_limit (429 rate_limited) do emit a record, because the actor is known by then.

4.2 Read-only routes that do NOT emit

Method	Path	Reason
`GET`	`/healthz`	Probe noise — every container orchestrator hits this every few seconds.
`GET`	`/metrics`	Prometheus scrape — typically every 15 s.
`GET`	`/jobs/{id}`	Poll loop on async invocations — high cardinality, low value.
any	unknown route	404 from the router; nothing meaningful to audit.

The route filter is AuditAction::classify. Suppression happens before the record is constructed (and therefore before serialisation), so the entire mechanism is zero cost on the read-only paths.
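
The two route tables above amount to a small classification function. The sketch below shows one way to express it; `classify_route` is a hypothetical stand-in and does not reproduce the real `AuditAction::classify` signature, which this document does not show.

```rust
/// Map a request to its audit `action` tag, or `None` for routes that
/// are not audited (read-only and unknown routes).
fn classify_route(method: &str, path: &str) -> Option<&'static str> {
    // Split "/functions/{id}/invoke" into ["functions", "{id}", "invoke"].
    let segments: Vec<&str> = path.trim_matches('/').split('/').collect();
    match (method, segments.as_slice()) {
        ("POST", ["functions"]) => Some("create_function"),
        ("DELETE", ["functions", id]) if !id.is_empty() => Some("delete_function"),
        ("POST", ["functions", id, "invoke"]) if !id.is_empty() => Some("invoke_function"),
        ("POST", ["functions", id, "invoke-async"]) if !id.is_empty() => {
            Some("invoke_function_async")
        }
        // /healthz, /metrics, /jobs/{id}, and unknown routes emit nothing.
        _ => None,
    }
}
```

Returning `None` before any record is built is what keeps the read-only paths free of audit overhead.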

4.3 What the record does NOT carry

The audit log is intentionally narrow. It does not contain:

Request bodies: Wasm module bytes, invocation arguments, or any payload data. Logging the body would inflate the log 64× and potentially capture secrets the caller passed as invocation input.
Response bodies: including JSON results from successful invocations.
The bearer string itself: only the SipHash-derived token_id appears. The hash is keyed with process-local random state by the standard library; it is stable within a process lifetime but not comparable across restarts.
PII or user identifiers beyond what the operator built into the bearer-token allocation scheme.
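
To illustrate why token_id is stable within a process but not across restarts, here is a minimal sketch built on the standard library's randomly keyed hasher. Treat it as an assumption about the general approach: `TokenHasher` is a hypothetical name, and the gateway's actual derivation may differ in detail.

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;

/// Hypothetical holder for the per-process random hashing keys.
struct TokenHasher {
    state: RandomState,
}

impl TokenHasher {
    /// Draws fresh random keys; a new process would get different ones.
    fn new() -> Self {
        Self { state: RandomState::new() }
    }

    /// Hash the bearer string. The raw string never leaves this function.
    fn token_id(&self, bearer: &str) -> u64 {
        self.state.hash_one(bearer)
    }
}
```

Two calls on the same `TokenHasher` return the same id for the same bearer string, while a second `TokenHasher` (standing in for a restarted process) almost certainly returns a different one. Consumers should therefore correlate on token_id only within a single process lifetime.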

If you need richer per-request introspection, attach an OTel collector and consume the http.request spans the gateway emits. The audit log is the minimum who-when-what trail, not a debugging fire-hose.

5. Log rotation and storage

5.1 Stdout sink

When the gateway runs under a container runtime, stdout capture is the runtime's responsibility. Defaults:

Docker with the default json-file log driver: rotates at the per-container limits (defaults are unbounded — set --log-opt max-size=100m --log-opt max-file=5 explicitly).
Kubernetes: kubelet rotates container logs at the node-level configuration (containerLogMaxSize and containerLogMaxFiles, defaults 10Mi × 5). Configure your log shipper (Fluent Bit, Vector, Loki Promtail) to honour rotation. The kubelet rewrites the file on rotation, so a naive tail -F keeps working.
systemd with journald: rotation policy lives in /etc/systemd/journald.conf (SystemMaxUse, MaxFileSec).

For a SIEM pipeline, treat the JSONL lines as the log-shipping contract. Each line is independently parseable, and out-of-order delivery across hosts is acceptable because consumers can re-sort on ts_unix_ms (to within the clock skew between hosts).

5.2 File sink

The FileJsonSink opens the path append-only at startup; it does not rotate the file itself. Operators are expected to use one of:

logrotate with the copytruncate strategy (recommended for the v0.4 binary, since the sink holds a long-lived file handle):
```
/var/log/tensor-wasm/audit.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
    create 0640 tensor-wasm tensor-wasm
}
```
copytruncate means logrotate copies the file contents to the rotated archive and then truncates the original in place. The gateway's open file descriptor keeps pointing at the same (now empty) inode and, because the file is opened append-only, resumes writing at the new end of file without an explicit reopen signal. This pattern requires no process restart, but it is not lossless: records written in the short window between the copy and the truncate can be lost, as the logrotate man page warns.

create has no effect when copytruncate is set, because no new file is created; the directive above is harmless but redundant.
Rotate by SIGHUP-driven reopen (simpler, but loses a few records in flight): use a sidecar that periodically runs mv audit.log audit.log.N, gzips the archive, and then runs systemctl reload tensor-wasm-api. This requires the reopen-on-SIGHUP feature, which is not yet implemented in v0.4 (see §8). Until then, copytruncate is the supported path.

5.3 Long-term retention

The audit log is the durable record of who did what, so your compliance and incident-response window determines retention. For SOC 2 / ISO 27001 the typical target is 12 months. The records are dense (a typical line is ~300–500 bytes), so a node serving 100 state-mutating calls per second produces roughly 2.4–4 GiB per day before gzip (100 × 86,400 s × 300–500 bytes). Plan accordingly.
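
The sizing estimate is simple enough to recompute for your own call rate. A small sketch follows, with hypothetical helper names; it assumes a constant call rate and a fixed record size, and ignores compression.

```rust
/// Uncompressed audit volume per day for a steady call rate.
fn bytes_per_day(calls_per_second: u64, bytes_per_record: u64) -> u64 {
    const SECONDS_PER_DAY: u64 = 86_400;
    calls_per_second * bytes_per_record * SECONDS_PER_DAY
}

/// Convert a byte count to GiB (2^30 bytes).
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0 * 1024.0)
}
```

For 100 calls per second this gives about 2.4 GiB per day at 300 bytes per record and about 4.0 GiB per day at 500 bytes per record. Multiply by the retention window in days, and by the number of nodes, to size long-term storage.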

6. mTLS / XFCC integration (W2.8)

The audit middleware recovers the client_cert_subject field from the X-Forwarded-Client-Cert (XFCC) request header when an mTLS-terminating reverse proxy fronts the gateway. The pattern is documented in docs/deployment/mtls.md §4 (Architecture B).

6.1 What we parse

The XFCC value is a ;-separated list of key=value pairs per the Envoy XFCC spec. We extract the first Subject="..." component, unescape backslash-escaped \" sequences, and record the inner DN as client_cert_subject. Other components (URI=, Hash=, DNS=) are intentionally ignored in v0.4; adding them is a forward-compatible additive change.
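
The extraction described above can be sketched as a short scanner. This is an illustration under stated assumptions, not the gateway's parser: `xfcc_subject` is a hypothetical name, the key match is case-sensitive, and the naive search for `Subject="` could be fooled by that text appearing inside another quoted value.

```rust
/// Return the first `Subject="..."` value from an XFCC header, with
/// backslash-escaped quotes unescaped. `None` if absent or unterminated.
fn xfcc_subject(header: &str) -> Option<String> {
    const KEY: &str = "Subject=\"";
    let start = header.find(KEY)? + KEY.len();
    let mut subject = String::new();
    let mut chars = header[start..].chars();
    while let Some(c) = chars.next() {
        match c {
            // A backslash escapes the next character; `\"` becomes `"`.
            '\\' => match chars.next() {
                Some('"') => subject.push('"'),
                Some(other) => {
                    subject.push('\\');
                    subject.push(other);
                }
                None => return None,
            },
            // An unescaped quote closes the value.
            '"' => return Some(subject),
            other => subject.push(other),
        }
    }
    None // the closing quote was never found
}
```

For the header `Hash=abc;Subject="CN=client-prod,O=Acme";URI=spiffe://x` this yields `CN=client-prod,O=Acme`, matching the sample record in §2.3.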

6.2 Trust boundary

The audit middleware does not validate that XFCC came from a trusted proxy. Any caller can set the header on a plaintext request and control the recorded client_cert_subject value. This is the classic forwarded-header trust bug (same shape as X-Forwarded-For spoofing). Two mitigations:

Bind the gateway to a private network and refuse plaintext external traffic. When the only callers that can reach the listener are your trusted proxy (Architecture B) the spoofing surface vanishes.
Configure the proxy to overwrite XFCC. Envoy's forward_client_cert_details: SANITIZE_SET mode replaces any incoming XFCC with the value Envoy computes itself; the equivalent in nginx is a defensive proxy_set_header X-Forwarded-Client-Cert ""; immediately before the trusted write.

A future PR will add an opt-in trusted-proxy CIDR allowlist (TENSOR_WASM_API_TRUSTED_PROXY_CIDRS) so the audit middleware can gate XFCC parsing on the connection's remote address. See the W2.8 mTLS doc, "TODO (v0.4)" in §7.4.
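
As a rough idea of what gating on the remote address could look like, here is a sketch of an IPv4 CIDR membership check. It is purely illustrative: the allowlist is not implemented, the format of TENSOR_WASM_API_TRUSTED_PROXY_CIDRS is not specified in this document, and the helper names and the IPv4-only scope are assumptions.

```rust
use std::net::Ipv4Addr;

/// Parse "10.0.0.0/8" into (network address, prefix length).
fn parse_cidr(text: &str) -> Option<(Ipv4Addr, u8)> {
    let (addr, len) = text.split_once('/')?;
    let addr: Ipv4Addr = addr.parse().ok()?;
    let len: u8 = len.parse().ok()?;
    if len > 32 {
        return None;
    }
    Some((addr, len))
}

/// True when `ip` falls inside the given network.
fn cidr_contains(network: (Ipv4Addr, u8), ip: Ipv4Addr) -> bool {
    let (net, len) = network;
    // A /0 prefix matches everything; shifting a u32 by 32 would overflow.
    let mask = if len == 0 { 0 } else { u32::MAX << (32 - u32::from(len)) };
    (u32::from(net) & mask) == (u32::from(ip) & mask)
}
```

With such a check in place, the middleware would parse XFCC only when the connection's peer address matches one of the configured networks, and would otherwise leave client_cert_subject as null.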

6.3 Architecture A — self-terminated mTLS

When TensorWasm itself terminates the TLS handshake (Architecture A in the W2.8 doc), the client cert is available in tokio_rustls's session state, not via the XFCC header. The audit middleware does not yet consume that source — Architecture A itself is not implemented in v0.4. When serve_tls() lands, the cert Subject from rustls::server::ServerConnection::peer_certificates() should populate the same client_cert_subject field. The on-wire shape does not change.

7. Latency budget

The audit middleware is documented in the source to add < 100 µs per state-mutating request under typical load. The measurement:

Sink	Observed cost per emit	Notes
`StdoutJsonSink`	~6–18 µs	`serde_json::to_string` (~3–6 µs) + `println!` lock + write.
`FileJsonSink`	~30–80 µs	adds `Mutex<File>` lock + `write_all` + `flush` (one `write(2)` per record, no `fsync`).
`NoopSink`	~50 ns	record is built but never serialised; just an Arc dispatch.
read-only routes	0	route filter short-circuits before record construction.

These are local figures from a modern x86 workstation (Ryzen 9, NVMe); operators should re-measure under realistic disk and contention. If the file sink shows tails > 100 µs in your environment, the recommended fix is to wrap the write in tokio::task::spawn_blocking. The trait's emit is sync today because the in-process latency we observed does not justify the additional task spawn (5–10 µs of its own), and the cadence is bounded by the upstream rate limit anyway.
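
For re-measuring in your own environment, a tail-latency probe can be as small as the sketch below. The helper name is hypothetical, the closure stands in for whatever sink call you want to time, and a serious benchmark would also control for warm-up and contention.

```rust
use std::time::{Duration, Instant};

/// Time `emit` once per iteration and return the 99th-percentile sample.
/// Returns `None` when no iterations were requested.
fn emit_latency_p99<F: FnMut()>(iterations: usize, mut emit: F) -> Option<Duration> {
    if iterations == 0 {
        return None;
    }
    let mut samples: Vec<Duration> = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        emit();
        samples.push(start.elapsed());
    }
    samples.sort_unstable();
    // Index of the 99th percentile, clamped to the last sample.
    let index = (iterations * 99 / 100).min(iterations - 1);
    Some(samples[index])
}
```

Comparing the returned value against the 100 µs budget indicates whether the synchronous emit is still acceptable on the target disk.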

The middleware runs after the handler returns, so the cost is hidden behind the response future's .await — the client receives the bytes once the inner future yields, but the audit work overlaps with TCP write(2) flushing the response. End-to-end client latency reflects this overlap; the per-request budget above is the added serial cost.

8. Stability guarantees

The following are part of the public contract and will only change across a major version bump:

The four action tag strings (create_function, delete_function, invoke_function, invoke_function_async).
The actor.kind tag strings (bearer, dev).
The actor.scope.kind tag strings (wildcard, tenant_set, dev).
The outcome.error_kind values — these are the same kind strings documented in crates/tensor-wasm-api/API.md.
The TENSOR_WASM_API_AUDIT_LOG env var name and its three accepted shapes (unset/stdout, none, file:<path>).

The following are not stable across patch releases:

The token_id integer value for a given bearer string — derived from the standard library's randomly-seeded SipHash and re-seeded per process.
The latency_ms distribution — improves as the executor speeds up.
Whether peer_addr is populated by the serve() helper. Today it is always null; a future PR will wire into_make_service_with_connect_info.

v0.4 limitations to track for v0.5:

No SIGHUP-driven file reopen; rotation requires copytruncate or a process restart.
No trusted-proxy CIDR allowlist; XFCC is parsed unconditionally when the header is present.
No structured emitter for an OTLP collector — the tracing::info! mirror is the integration point today.
peer_addr always null (see above).

9. Related

docs/PATH-TO-V1.md — v0.4 "Audit log" exit criterion.
crates/tensor-wasm-api/API.md — HTTP surface, error envelope, the kind strings that appear in outcome.error_kind.
docs/deployment/mtls.md — XFCC source and trust boundary discussion (§6 above).
docs/SLO.md — request-latency SLOs the audit middleware must not violate (the < 100 µs budget defended in §7).
crates/tensor-wasm-api/src/audit.rs — implementation, sink trait, configuration.

Status: v0.4 release. Schema and the four action strings are frozen. peer_addr wiring and the trusted-proxy CIDR allowlist are the two open items targeted for v0.5.