Craton HSM
Benchmarks
Craton HSM ships two benchmark harnesses built on Criterion.rs. The first measures raw Rust cryptographic throughput; the second loads the compiled PKCS#11 shared library through C_GetFunctionList and exercises the same code path real consumers (OpenSSL engine, Java SunPKCS11, NSS) go through. The numbers below are reported as medians; full Criterion HTML reports land in target/criterion/report/index.html after each run.
Harness
Two benches live under benches/:
| Harness | File | What it measures |
|---|---|---|
| Direct Rust API | benches/crypto_bench.rs | Cryptographic backend performance with no FFI, no session state, no ABI marshalling. |
| PKCS#11 C ABI | benches/pkcs11_abi_bench.rs | End-to-end performance through the C ABI, including catch_unwind, pointer marshalling, audit logging, and session bookkeeping. |
The ABI harness dynamically loads libcraton_hsm.so / craton_hsm.dll via libloading and, when SOFTHSM2_LIB is set, loads SoftHSMv2 in the same process and runs identical operations inside the same Criterion report. Every ABI iteration includes the full C_*Init + C_* pair (for example C_SignInit + C_Sign), which reflects how PKCS#11 consumers actually call the module.
All published numbers were collected on Windows 11, x86_64, single-threaded, with --release, LTO enabled, and RUSTFLAGS="-C target-cpu=native". Criterion used 100 samples per benchmark (10 for RSA key generation).
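Because the figures on this page are medians over a fixed number of samples, a single slow iteration barely moves them. The sketch below is not Criterion's code (Criterion derives its estimates with bootstrap resampling); it only illustrates how a median is taken from raw samples. The function name median_ns is hypothetical.

```rust
/// Median of timing samples (for example, nanoseconds per iteration).
/// Returns None for an empty slice.
fn median_ns(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    // total_cmp is a total order over f64, so the comparison is defined even for NaN.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        Some(sorted[mid])
    } else {
        // Even count: average the two middle samples.
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}
```

With samples of 1.0, 1.0, 1.0, 1.0 and 1000.0 the median stays at 1.0, whereas the mean would be pulled to roughly 200.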
Running the benches
# Direct Rust API
RUSTFLAGS="-C target-cpu=native" cargo bench --bench crypto_bench
# PKCS#11 C ABI (default RustCrypto backend)
RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench
# PKCS#11 C ABI with the aws-lc-rs backend
RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench \
--features awslc-backend --no-default-features
To enable the SoftHSMv2 head-to-head, point SOFTHSM2_LIB at the installed library before invoking cargo bench:
# Linux
SOFTHSM2_LIB=/usr/lib/softhsm/libsofthsm2.so \
RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench
# macOS
SOFTHSM2_LIB=$(brew --prefix softhsm)/lib/softhsm/libsofthsm2.so \
RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench
# Windows (from a POSIX-style shell such as Git Bash)
SOFTHSM2_LIB=C:/SoftHSM2/SoftHSM2/lib/softhsm2-x64.dll \
RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench
The harness provisions a SoftHSMv2 token under target/bench-tokens/, writes a softhsm2.conf with absolute paths, and initialises the token with the same PINs Craton HSM uses before the comparison begins.
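The config-writing step can be pictured with a minimal sketch like the one below. It is an illustration, not the harness source: the function name write_softhsm_conf is hypothetical, and the exact set of keys the harness writes is an assumption. directories.tokendir and objectstore.backend are standard SoftHSMv2 configuration keys. Token initialisation is not shown.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Creates `token_dir` and writes a softhsm2.conf inside it that points
/// back at the directory by absolute path. Returns the config file path.
/// (Hypothetical helper; the real harness may lay files out differently.)
fn write_softhsm_conf(token_dir: &Path) -> io::Result<PathBuf> {
    // Make the path absolute without canonicalize(), which on Windows
    // would produce a \\?\-prefixed path.
    let abs_dir = if token_dir.is_absolute() {
        token_dir.to_path_buf()
    } else {
        env::current_dir()?.join(token_dir)
    };
    fs::create_dir_all(&abs_dir)?;

    let conf_path = abs_dir.join("softhsm2.conf");
    let body = format!(
        "directories.tokendir = {}\nobjectstore.backend = file\n",
        abs_dir.display()
    );
    fs::write(&conf_path, body)?;
    Ok(conf_path)
}
```

A caller would then export the returned path through SOFTHSM2_CONF before loading the SoftHSMv2 library, so that the library reads this file rather than a system-wide one.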
Measured operations
The ABI harness covers the operations listed below. Sizes are the payload passed to a single C_* call; key generation benches measure C_GenerateKey / C_GenerateKeyPair only.
| Group | Mechanism | Data size |
|---|---|---|
| pkcs11_rsa_sign_2048 | RSA PKCS#1 v1.5 + SHA-256 sign | 32 B |
| pkcs11_rsa_verify_2048 | RSA PKCS#1 v1.5 + SHA-256 verify | 32 B |
| pkcs11_ecdsa_p256_sign | ECDSA P-256 sign (raw hash) | 32 B |
| pkcs11_ecdsa_p256_verify | ECDSA P-256 verify (raw hash) | 32 B |
| pkcs11_aes_gcm_encrypt_4kb | AES-256-GCM encrypt | 4 KB |
| pkcs11_aes_gcm_decrypt_4kb | AES-256-GCM decrypt | 4 KB |
| pkcs11_sha256_digest_4kb | SHA-256 digest | 4 KB |
| pkcs11_keygen_rsa_2048 | RSA-2048 key pair generation | — |
| pkcs11_keygen_ec_p256 | EC P-256 key pair generation | — |
| pkcs11_keygen_aes_256 | AES-256 key generation | — |
The direct Rust harness additionally covers RSA-3072 and RSA-4096 sign, Ed25519 sign and verify, AES-GCM at 256 B / 4 KB / 64 KB, SHA-256 and SHA-512 at 4 KB, ML-KEM-512/768 encapsulate and decapsulate, and ML-DSA-44/65 sign and verify.
Direct Rust API results
Baseline numbers from the RustCrypto backend, measured against the crate's Rust API (no PKCS#11 overhead):
| Operation | Time |
|---|---|
| RSA-2048 sign | 1.806 ms |
| RSA-2048 verify | 206.2 us |
| RSA-4096 sign | 11.94 ms |
| ECDSA P-256 sign | 339.7 us |
| ECDSA P-256 verify | 289.6 us |
| Ed25519 sign | 45.99 us |
| Ed25519 verify | 47.44 us |
| AES-GCM encrypt, 256 B | 1.396 us |
| AES-GCM encrypt, 4 KB | 5.970 us |
| AES-GCM encrypt, 64 KB | 62.35 us |
| AES-GCM decrypt, 256 B | 0.589 us |
| AES-GCM decrypt, 4 KB | 3.822 us |
| AES-GCM decrypt, 64 KB | 58.05 us |
| SHA-256, 4 KB | 18.63 us |
| SHA-512, 4 KB | 10.45 us |
| ML-KEM-512 encap | 56.11 us |
| ML-KEM-768 encap | 82.43 us |
| ML-KEM-512 decap | 97.04 us |
| ML-KEM-768 decap | 179.9 us |
RSA-3072 sign is covered by the harness, but its timing is not published in docs/benchmarks.md; run cargo bench --bench crypto_bench locally for that size.
Backend comparison: RustCrypto vs aws-lc-rs
Both columns were built with target-cpu=native. aws-lc-rs provides assembly-optimised primitives (AES-NI, AVX2, Montgomery multiplication) and is the backend required for FIPS operation.
| Operation | RustCrypto | aws-lc-rs | Speedup |
|---|---|---|---|
| RSA-2048 sign | 2.001 ms | 1.628 ms | 1.2x |
| RSA-2048 verify | 222.0 us | 26.79 us | 8.3x |
| ECDSA P-256 sign | 331.6 us | 291.8 us | 1.1x |
| ECDSA P-256 verify | 298.3 us | 66.44 us | 4.5x |
| AES-GCM decrypt, 4 KB | 3.590 us | 2.015 us | 1.8x |
| SHA-256, 4 KB | 16.63 us | 11.52 us | 1.4x |
| SHA-512, 4 KB | 10.14 us | 8.362 us | 1.2x |
| RSA-2048 keygen | 214.7 ms | 91.42 ms | 2.3x |
| EC P-256 keygen | 184.4 us | 157.3 us | 1.2x |
The aws-lc-rs encrypt path draws a fresh nonce from SystemRandom on every call, while the RustCrypto path uses a cached per-key counter nonce. Decrypt numbers are therefore the fairer algorithmic comparison: aws-lc-rs is 1.8x faster on AES-GCM decrypt at 4 KB.
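The Speedup column is the RustCrypto time divided by the aws-lc-rs time, rounded to one decimal place. A small sketch of that arithmetic follows; the function names are hypothetical, and both inputs must be expressed in the same unit.

```rust
/// Ratio by which `candidate` is faster than `baseline`.
/// Both times must use the same unit (for example, microseconds).
fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

/// Formats a ratio in the style used by the tables, e.g. "8.3x".
fn format_speedup(ratio: f64) -> String {
    format!("{:.1}x", ratio)
}
```

For RSA-2048 verify, speedup(222.0, 26.79) is about 8.29, which formats as 8.3x. For AES-GCM decrypt at 4 KB, speedup(3.590, 2.015) is about 1.78, which formats as 1.8x.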
PKCS#11 ABI results
The following table is the three-way head-to-head through C_GetFunctionList. All three columns were produced by the same Criterion run on the same machine; Craton HSM has all nine optimisations applied.
| Operation | Craton HSM (RustCrypto) | Craton HSM (aws-lc-rs) | SoftHSMv2 2.6.1 | Best vs SoftHSM |
|---|---|---|---|---|
| RSA-2048 sign | 2.558 ms | 1.837 ms | 1.522 ms | SoftHSM 1.2x |
| RSA-2048 verify | 303.5 us | 251.0 us | 37.82 us | SoftHSM 6.6x |
| ECDSA P-256 sign | 511.5 us | 363.2 us | 89.10 us | SoftHSM 4.1x |
| ECDSA P-256 verify | 506.5 us | 338.0 us | 109.1 us | SoftHSM 3.1x |
| SHA-256 digest, 4 KB | 26.0 us | 15.58 us | 9.90 us | SoftHSM 1.6x |
| AES-GCM encrypt, 4 KB | 6.173 us | 4.419 us | — | Craton HSM only |
| AES-GCM decrypt, 4 KB | 5.605 us | 4.094 us | — | Craton HSM only |
| RSA-2048 keygen | 313.6 ms | 208.6 ms | 80.99 ms | SoftHSM 2.6x |
| EC P-256 keygen | 824.4 us | 824.9 us | 224.5 us | SoftHSM 3.7x |
| AES-256 keygen | 18.83 us | 17.79 us | 90.63 us | Craton HSM 5.1x |
SoftHSMv2 does not accept null GCM parameters, so AES-GCM encrypt / decrypt cannot be benchmarked against it through the same harness; those cells are intentionally empty.
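The last column can be reproduced by taking the faster of the two Craton HSM backends and comparing it with the SoftHSMv2 time. The sketch below shows one way to do that, assuming all three times share a unit; the Winner type and the function name are hypothetical.

```rust
/// Which module is faster for a given operation.
#[derive(Debug, PartialEq)]
enum Winner {
    Craton,
    SoftHsm,
}

/// Picks the faster Craton HSM backend, then reports the winner and the
/// ratio between the slower and the faster time.
fn best_vs_softhsm(rustcrypto: f64, awslc: f64, softhsm: f64) -> (Winner, f64) {
    let craton_best = rustcrypto.min(awslc);
    if craton_best <= softhsm {
        (Winner::Craton, softhsm / craton_best)
    } else {
        (Winner::SoftHsm, craton_best / softhsm)
    }
}
```

For RSA-2048 verify the inputs 303.5, 251.0 and 37.82 give SoftHSM at about 6.64, shown as 6.6x. For AES-256 keygen the inputs 18.83, 17.79 and 90.63 give Craton HSM at about 5.09, shown as 5.1x.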
Where Craton HSM leads
- AES-256 key generation is 5.1x faster. Craton HSM draws bytes directly from the OS CSPRNG (SystemRandom), while SoftHSMv2 routes through Botan's DRBG stack.
- AES-GCM encrypt / decrypt have no SoftHSMv2 comparison available; the Craton HSM aws-lc-rs path completes 4 KB decrypt in 4.094 us.
- Post-quantum: ML-KEM-768 encapsulation in 74.46 us and ML-DSA-65 signing in 563.3 us, neither of which SoftHSMv2 implements.
Where SoftHSMv2 leads
- RSA-2048 verify is 6.6x faster — Botan's Montgomery multiplication uses hand-written ADX/MULX assembly that RustCrypto and aws-lc-rs do not yet match on this code path.
- ECDSA P-256 sign is 4.1x faster; Botan uses a precomputed base-point table and wNAF scalar multiplication.
- EC P-256 keygen is 3.7x faster and SHA-256 digest is 1.6x faster for the same assembly-level reasons.
For RSA-2048 sign, the gap is 1.2x. No SoftHSMv2 numbers are published for RSA-3072, RSA-4096, or Ed25519 in this comparison.
Post-quantum results
ML-KEM, ML-DSA, and SLH-DSA use pure-Rust implementations (ml-kem, ml-dsa, slh-dsa crates). The numbers therefore do not vary across backends.
| Operation | Time |
|---|---|
| ML-DSA-44 sign | 711.9 us |
| ML-DSA-65 sign | 563.3 us |
| ML-DSA-44 verify | 158.1 us |
| ML-DSA-65 verify | 270.1 us |
| ML-KEM-512 encap | 51.84 us |
| ML-KEM-768 encap | 74.46 us |
| ML-KEM-512 decap | 84.95 us |
| ML-KEM-768 decap | 135.3 us |
Methodology notes
- Single-threaded. PKCS#11 mandates a singleton module state, so every benchmark runs on one thread. Published figures therefore reflect single-operation latency, not peak throughput of a parallel workload.
- Warm caches. Keys are pre-generated during Criterion setup; the measured iterations exercise warm CPU caches and warm parsed-key caches (see below).
- target-cpu=native. Every number was collected with RUSTFLAGS="-C target-cpu=native". Distribution binaries built without this flag will be slower, most noticeably on ML-KEM decapsulation (up to 25%).
- Persistence. The benches run with the default in-memory token (persist_objects = false). Enabling persistence adds an EncryptedStore write to key-generation and object-mutation paths; encrypted storage uses AES-GCM over each object blob, so the overhead is roughly one AES-GCM encrypt of the serialised object per write.
- Async audit logging. The audit log's SHA-256 hash chain and JSON serialisation run on a dedicated thread fed by an mpsc channel; C_Sign / C_Verify / C_Encrypt enqueue and return in sub-microsecond time. With synchronous audit logging (pre-optimisation), ABI-layer AES-GCM decrypt at 4 KB was 12.57 us; after moving audit off the hot path the same operation is 5.605 us.
- RSA key-generation variance is high. Prime-search timing depends on the candidates produced by the RNG; Criterion reports a wider interval for these benches than for any other.
- Release mode only. cargo bench builds with release-level optimisations automatically (its bench profile inherits from release). Debug-mode numbers are not representative and should not be reported.
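The async audit logging note describes a hot path that only enqueues an event while a dedicated thread maintains the hash chain. The sketch below illustrates that pattern with the standard library alone. It is not Craton HSM's implementation: every name is hypothetical, JSON serialisation is left out, and because std has no SHA-256 the chain uses a non-cryptographic 64-bit hash as a stand-in.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// One link of the chain: the event plus the hash of the previous link.
#[derive(Debug, Clone)]
struct AuditEntry {
    event: String,
    prev_hash: u64,
    hash: u64,
}

/// Stand-in for SHA-256 (assumption for this sketch only). DefaultHasher
/// is NOT cryptographic and must not be used for a real audit log.
fn link_hash(prev_hash: u64, event: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev_hash.hash(&mut h);
    event.hash(&mut h);
    h.finish()
}

/// Spawns the logger thread. The Sender is the hot-path handle: sending
/// only enqueues the event. Joining the handle after every Sender has
/// been dropped returns the completed chain.
fn spawn_audit_logger() -> (Sender<String>, JoinHandle<Vec<AuditEntry>>) {
    let (tx, rx) = mpsc::channel::<String>();
    let handle = thread::spawn(move || {
        let mut chain = Vec::new();
        let mut prev_hash = 0u64;
        // The loop ends once all senders are dropped and the queue is empty.
        for event in rx {
            let hash = link_hash(prev_hash, &event);
            chain.push(AuditEntry { event, prev_hash, hash });
            prev_hash = hash;
        }
        chain
    });
    (tx, handle)
}
```

In this arrangement the caller pays for a channel send, while hashing happens on the logger thread; that is the general reason moving audit work off the hot path lowers per-operation latency.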
Environment variables
| Variable | Purpose | Default |
|---|---|---|
| CRATON_HSM_LIB | Path to the Craton HSM shared library. | Auto-detected under target/release/. |
| SOFTHSM2_LIB | Path to the SoftHSMv2 shared library. Enables the head-to-head run. | Unset (comparison disabled). |
| SOFTHSM2_CONF | SoftHSMv2 config file. | Written by the harness. |
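The CRATON_HSM_LIB fallback can be sketched as follows. This is an illustration under assumptions rather than the harness source: the function names are hypothetical, the .so and .dll file names are the ones mentioned earlier on this page, and the macOS .dylib name is a guess.

```rust
use std::path::PathBuf;

/// Shared-library file name for the current platform.
/// The macOS name is an assumption; only the .so and .dll names are
/// stated on this page.
fn default_lib_name() -> &'static str {
    if cfg!(target_os = "windows") {
        "craton_hsm.dll"
    } else if cfg!(target_os = "macos") {
        "libcraton_hsm.dylib"
    } else {
        "libcraton_hsm.so"
    }
}

/// An explicit, non-empty override wins; otherwise fall back to the
/// default location under target/release/.
fn resolve_lib_path(override_path: Option<String>) -> PathBuf {
    match override_path {
        Some(p) if !p.is_empty() => PathBuf::from(p),
        _ => PathBuf::from("target").join("release").join(default_lib_name()),
    }
}
```

A caller would pass std::env::var("CRATON_HSM_LIB").ok() as the argument. Taking the override as a parameter instead of reading the environment inside the function keeps the logic easy to test.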
Known limitations
- No multi-threaded benchmarks. The PKCS#11 spec's global-state model makes parallel measurements unrepresentative.
- AES-GCM cannot be compared with SoftHSMv2 through this harness because the two libraries disagree on whether CK_GCM_PARAMS may be null.
- RSA-3072 / RSA-4096 and Ed25519 numbers are produced by the Rust-level harness only; no corresponding SoftHSMv2 rows are published in the reference document.
- YubiHSM and p11-kit-proxy are not part of the harness. See Comparison gaps below.
Comparison gaps
Craton HSM publishes head-to-head numbers against SoftHSMv2 only. The reference docs/benchmarks.md in the source tree does not include YubiHSM, p11-kit-proxy, or other HSM vendors, and this page intentionally omits speculation. If you need comparative numbers against those modules, run the harness locally with SOFTHSM2_LIB set to the target module's library path; the same harness loads any C_GetFunctionList-compatible module. Be aware that algorithm parameter conventions may differ between modules.