Craton Shield
Craton Shield Core -- Performance & WCET Results
Craton Shield Core -- Performance & WCET Results
Benchmark Methodology
All latency figures were collected using the criterion micro-benchmark harness with the following settings:
| Parameter | Value |
|---|---|
| Minimum iterations | 100 |
| Warm-up time | 3 s |
| Measurement time | 5 s |
| Confidence level | 95 % |
| Noise threshold | 2 % |
| Statistical method | Linear regression (criterion) |
Each benchmark is run in isolation (single-threaded) to capture the deterministic worst-case path. Results represent the mean across all measured iterations; criterion also reports lower/upper confidence-interval bounds which are omitted here for brevity.
Hardware & Software Environment
| Component | Detail |
|---|---|
| CPU | x86_64 (AMD Zen 3 / Intel Alder Lake equivalent) |
| OS | Linux 6.x (kernel preemption disabled for bench) |
| Rust | 1.82+ stable |
| Profile | release with lto = true, codegen-units = 1 |
| Allocator | System (benchmarks only; runtime is #[no_alloc]) |
| Target | x86_64-unknown-linux-gnu |
All benchmarks were run with CPU frequency scaling disabled
(performance governor) and process affinity pinned to an isolated core.
Latency Results
Current (v0.7.0 — after hot-path optimizations)
| Operation | Mean Latency | Notes |
|---|---|---|
| CAN frame submission | ~265 ns | 5 detectors, standard 8-byte frame |
| Ethernet packet inspection | ~28 ns | SOME/IP with hash-indexed allow-list |
| ETH inspection (rejected) | ~158 ns | Full allowlist miss (worst case) |
| Firewall (128 rules, first match) | ~9 ns | Sorted-priority early exit |
| Firewall (128 rules, last match) | ~166 ns | Worst case: all 128 rules scanned |
| Firewall (64 port rules) | ~252 ns | L4 matching, last match |
| Policy engine (64 rules) | ~199 ns | First-match miss, full scan to default deny |
| EWMA anomaly score update | ~45 ns | Single signal update |
| SHA-256 (256 bytes) | ~461 ns | Software implementation |
| SHA-256 (4 KiB) | ~7.5 us | Software implementation |
| HMAC-SHA256 (64 bytes) | ~276 ns | Software implementation |
| AES-128-GCM encrypt (256 bytes) | ~907 ns | Software implementation |
| Event log append | ~108 ns | HMAC-chained, cached serialization |
| Full platform tick cycle | ~73 ns | All subsystems, idle state |
| Runtime submit_can_frame | ~265 ns | Full pipeline: monitor + IDS + tick |
Before vs After (optimization impact)
The following optimizations were applied in v0.7.0. "Before" numbers are from the v0.5.0 baseline measured on the same hardware.
| Operation | Before (v0.5.0) | After (v0.7.0) | Speedup | What Changed |
|---|---|---|---|---|
| CAN frame submission | ~464 ns | ~265 ns | 1.8x | Sorted allowlist (binary search insert) |
| Ethernet packet inspection | ~1.2 us | ~28 ns | 43x | Hash-indexed allow-list (FNV-1a, O(1) lookup) |
| Firewall (128 rules) | ~23 us (full scan) | ~9 ns (first match) | 2,500x | Priority-sorted rules with early exit |
| Event log append | ~350 ns | ~108 ns | 3.2x | Cached prev-entry serialization |
| Full platform tick (idle) | ~5 us | ~73 ns | 68x | All subsystem optimizations combined |
| Runtime submit_can_frame | ~464 ns | ~265 ns | 1.8x | All subsystem optimizations combined |
Optimization details
-
Firewall: sorted-priority early exit — Rules are now maintained in ascending priority order via insertion sort.
evaluate()returns on the first matching rule instead of scanning all rules. First-match is O(1); worst-case (last-match) is still O(n) but at ~1.3 ns/rule. -
Firewall: hash-based rate limiter — Token-bucket lookup uses multiplicative hash + linear probing instead of O(32) linear scan.
-
ETH monitor: hash-indexed allow-list —
is_service_allowed()uses a 128-entry FNV-1a hash table over(src_mac, dst_mac, service_id)for O(1) average-case lookup, replacing O(64) linear scan. -
Event logger: cached serialization — Stores the serialized form of the most recently appended entry, avoiding a re-serialize on each
append()for theprev_hashcomputation. -
CAN monitor: sorted allowlist — IDs are maintained in sorted order with O(log n) binary search for duplicate detection during
add(). -
Policy engine: time validation fast path — Bitwise OR check
(valid_from | valid_until) == 0skips time constraint evaluation for rules with no time bounds (the common case). -
Anomaly detectors: bit-mask indexing —
HistogramDetectorandMarkovDetectorusevalue & (N-1)when N is a power of two, replacing division with a single AND instruction.
Scaling Behavior
| Firewall Rules | First Match | Last Match | Per-rule cost |
|---|---|---|---|
| 8 | ~9 ns | ~17 ns | 2.1 ns |
| 32 | ~10 ns | ~46 ns | 1.4 ns |
| 64 | ~10 ns | ~89 ns | 1.4 ns |
| 128 | ~10 ns | ~190 ns | 1.5 ns |
| Policy Rules | Full scan (miss) | Per-rule cost |
|---|---|---|
| 8 | ~32 ns | 4.0 ns |
| 16 | ~60 ns | 3.8 ns |
| 32 | ~118 ns | 3.7 ns |
| 64 | ~227 ns | 3.5 ns |
| CAN Rules | process_frame | Per-rule cost |
|---|---|---|
| 1 | ~233 ns | — |
| 5 | ~235 ns | ~0.5 ns |
| 16 | ~242 ns | ~0.6 ns |
| 64 | ~320 ns | ~1.4 ns |
Memory Footprint
Binary Size (release, LTO, stripped)
| Capacity Tier | Approx. Binary Size | Description |
|---|---|---|
| Minimal (CAN only) | ~180 KiB | CAN monitor + IDS + firewall |
| Standard (CAN + ETH) | ~280 KiB | Above + Ethernet monitor + anomaly |
| Full (all subsystems) | ~420 KiB | All 21 subsystems enabled |
| With crypto (software) | ~520 KiB | Full + ring-based crypto |
Stack Usage
| Context | Stack Depth |
|---|---|
vs_platform_init | ~2.4 KiB |
vs_submit_can_frame | ~1.2 KiB |
vs_submit_eth_packet | ~1.8 KiB |
vs_platform_tick | ~3.2 KiB |
vs_get_health | ~0.8 KiB |
| Maximum (any FFI entry point) | <4 KiB |
Stack measurements obtained via -Z emit-stack-sizes and manual analysis of
the call graph. No recursive calls exist in any execution path.
Heap Usage
The Craton Shield runtime performs zero heap allocations during steady-state
operation. All buffers are statically sized or stack-allocated. The only
allocations occur during vs_platform_init (one-time setup of the Mutex-
wrapped global state) and are freed on vs_platform_shutdown.
WCET Considerations
Craton Shield is designed for deterministic execution suitable for ASIL-B applications:
- No heap allocation in the hot path. All data structures use fixed-size arrays and ring buffers.
- Bounded loops only. Every loop in the runtime has a compile-time or configuration-time upper bound (e.g., max 128 firewall rules at base tier, max 64-byte CAN payload, max 9216-byte Ethernet frame).
- No recursion. The call graph is statically provable to be acyclic.
- Deterministic branching. All match/switch arms are exhaustive; no data-dependent iteration counts in the critical path.
- Panic-safe FFI boundary.
catch_unwindguards prevent stack unwinding across the C/Rust boundary. Panics are counted and reported viavs_get_panic_count(). - Rate limiting is O(1) per frame (token bucket with constant-time refill).
- No floating-point in the critical path (EWMA uses fixed-point arithmetic).
- No system calls in
submit_can_frame,submit_eth_packet, orplatform_tick(mutex acquisition uses futex on Linux, which is a syscall only on contention).
Automotive Timing Budget Compliance
| Bus / Domain | Typical Budget | Before (v0.5.0) | Margin | After (v0.7.0) | Margin |
|---|---|---|---|---|---|
| CAN (500 kbit/s) | ~10 us/frame | ~464 ns | 21x | ~265 ns | 37x |
| Automotive Ethernet | ~100 us/packet | ~1.2 us | 83x | ~28 ns | 3,571x |
| Full tick (idle) | 1 ms budget | ~5 us | 200x | ~73 ns | 13,700x |
All measured latencies remain well within the timing budgets defined by ISO 11898 (CAN), ISO 11898-2 (CAN-FD), and IEEE 802.3 (Automotive Ethernet), leaving substantial margin for jitter, OS scheduling, and interrupt latency on production ECU hardware.
Target Platform Estimates
The benchmarks above are from an x86_64 workstation. On representative automotive targets, expected scaling factors are:
| Target MCU | Clock | Estimated Scaling Factor |
|---|---|---|
| ARM Cortex-R52 (Traveo) | 400 MHz | ~8-12x slower |
| ARM Cortex-A53 (S32G) | 1.0 GHz | ~3-5x slower |
| Infineon AURIX TC3xx | 300 MHz | ~10-15x slower |
Even with a 15x scaling factor on the slowest target (AURIX TC3xx), the optimized latencies remain well within budget:
| Operation (scaled 15x) | Estimated Latency | Budget | Margin |
|---|---|---|---|
| CAN frame submission | ~4.0 us | 10 us | 2.5x |
| ETH packet inspection | ~0.4 us | 100 us | 250x |
| Full tick (idle) | ~1.1 us | 1 ms | 900x |