Craton HSM

Benchmarks

Craton HSM ships two benchmark harnesses built on Criterion.rs. The first measures raw Rust cryptographic throughput; the second loads the compiled PKCS#11 shared library through C_GetFunctionList and exercises the same code path real consumers (OpenSSL engine, Java SunPKCS11, NSS) go through. The numbers below are reported as medians; full Criterion HTML reports land in target/criterion/report/index.html after each run.

Harness

Two benches live under benches/:

Harness	File	What it measures
Direct Rust API	`benches/crypto_bench.rs`	Cryptographic backend performance with no FFI, no session state, no ABI marshalling.
PKCS#11 C ABI	`benches/pkcs11_abi_bench.rs`	End-to-end performance through the C ABI, including `catch_unwind`, pointer marshalling, audit logging, and session bookkeeping.

The ABI harness dynamically loads libcraton_hsm.so / craton_hsm.dll via libloading and, when SOFTHSM2_LIB is set, loads SoftHSMv2 in the same process and runs identical operations inside the same Criterion report. Every ABI iteration includes the full C_*Init + C_* pair (for example C_SignInit + C_Sign), which reflects how PKCS#11 consumers actually call the module.
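To make the "Init plus operation" pairing concrete, here is a minimal std-only sketch of one iteration dispatched through a function table. It is not the harness code: `FunctionList`, `get_function_list`, the mock entry points, and the trimmed argument lists are all hypothetical stand-ins for the real `CK_FUNCTION_LIST`, and the mock "signature" merely copies its input.

```rust
use std::os::raw::c_ulong;

// Hypothetical, heavily trimmed stand-in for CK_FUNCTION_LIST.
type CkRv = c_ulong;
const CKR_OK: CkRv = 0;
const CKR_BUFFER_TOO_SMALL: CkRv = 0x150;

#[repr(C)]
struct FunctionList {
    sign_init: extern "C" fn(session: c_ulong, key: c_ulong) -> CkRv,
    sign: extern "C" fn(
        session: c_ulong,
        data: *const u8,
        data_len: c_ulong,
        sig: *mut u8,
        sig_len: *mut c_ulong,
    ) -> CkRv,
}

// Mock entry points (assumption: a real module would do key lookup and signing).
extern "C" fn mock_sign_init(_session: c_ulong, _key: c_ulong) -> CkRv {
    CKR_OK
}

extern "C" fn mock_sign(
    _session: c_ulong,
    data: *const u8,
    data_len: c_ulong,
    sig: *mut u8,
    sig_len: *mut c_ulong,
) -> CkRv {
    unsafe {
        // On entry *sig_len holds the output buffer capacity.
        if *sig_len < data_len {
            return CKR_BUFFER_TOO_SMALL;
        }
        // Mock "signature": copy the input bytes into the output buffer.
        std::ptr::copy_nonoverlapping(data, sig, data_len as usize);
        *sig_len = data_len;
    }
    CKR_OK
}

static MOCK_LIST: FunctionList = FunctionList {
    sign_init: mock_sign_init,
    sign: mock_sign,
};

// Plays the role of C_GetFunctionList: returns the table of entry points.
fn get_function_list() -> &'static FunctionList {
    &MOCK_LIST
}

// One measured iteration: the Init call and the operation run back to back,
// so both contribute to the reported time.
fn sign_iteration(fl: &FunctionList, session: c_ulong, key: c_ulong, data: &[u8]) -> Vec<u8> {
    let mut sig = vec![0u8; data.len()];
    let mut sig_len = sig.len() as c_ulong;
    assert_eq!((fl.sign_init)(session, key), CKR_OK);
    let rv = (fl.sign)(
        session,
        data.as_ptr(),
        data.len() as c_ulong,
        sig.as_mut_ptr(),
        &mut sig_len,
    );
    assert_eq!(rv, CKR_OK);
    sig.truncate(sig_len as usize);
    sig
}
```

In a Criterion bench, a function like `sign_iteration` would be the body of the timed closure, which is why per-call setup cost shows up in every ABI figure.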

All published numbers were collected on Windows 11, x86_64, single-threaded, with --release, LTO enabled, and RUSTFLAGS="-C target-cpu=native". Criterion used 100 samples per benchmark (10 for RSA key generation).
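As an illustration of why a median is a reasonable summary for these runs, the sketch below computes one from a list of per-iteration timings. Criterion performs its own statistical analysis; `median_ns` is a hypothetical helper written for this page, not part of either harness.

```rust
/// Median of a set of timings in nanoseconds; `None` for an empty input.
fn median_ns(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    // total_cmp gives a total order over f64, so sorting cannot panic on NaN.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        // Even count: average the two middle values.
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}
```

A single slow outlier (for example, a sample interrupted by the scheduler) moves the mean but leaves the median almost unchanged, which matters most for the 10-sample RSA key-generation benches.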

Running the benches

# Direct Rust API
RUSTFLAGS="-C target-cpu=native" cargo bench --bench crypto_bench

# PKCS#11 C ABI (default RustCrypto backend)
RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench

# PKCS#11 C ABI with the aws-lc-rs backend
RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench \
    --features awslc-backend --no-default-features

To enable the SoftHSMv2 head-to-head, point SOFTHSM2_LIB at the installed library before invoking cargo bench:

# Linux
SOFTHSM2_LIB=/usr/lib/softhsm/libsofthsm2.so \
    RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench

# macOS
SOFTHSM2_LIB=$(brew --prefix softhsm)/lib/softhsm/libsofthsm2.so \
    RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench

# Windows (from a POSIX-style shell such as Git Bash)
SOFTHSM2_LIB=C:/SoftHSM2/SoftHSM2/lib/softhsm2-x64.dll \
    RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench

The harness provisions a SoftHSMv2 token under target/bench-tokens/, writes a softhsm2.conf with absolute paths, and initialises the token with the same PINs Craton HSM uses before the comparison begins.
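The provisioning step can be pictured with the sketch below, which builds the token directory path and renders a small softhsm2.conf. The keys `directories.tokendir`, `objectstore.backend`, and `log.level` are standard SoftHSMv2 settings, but the exact file the harness writes is not shown in this document, so the chosen values and both function names are assumptions.

```rust
use std::path::{Path, PathBuf};

/// Token directory under the workspace root (hypothetical helper).
/// Pass an absolute workspace root to get an absolute token path.
fn bench_token_dir(workspace_root: &Path) -> PathBuf {
    workspace_root.join("target").join("bench-tokens")
}

/// Render a minimal softhsm2.conf pointing at `token_dir`.
/// Assumption: file-backed object store and error-level logging.
fn render_softhsm2_conf(token_dir: &Path) -> String {
    format!(
        "directories.tokendir = {}\nobjectstore.backend = file\nlog.level = ERROR\n",
        token_dir.display()
    )
}
```

SoftHSMv2 locates its configuration through the SOFTHSM2_CONF environment variable, so a harness would write this text to a file and export that file's path before loading the library.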

Measured operations

The ABI harness covers the operations listed below. Sizes are the payload passed to a single C_* call; key generation benches measure C_GenerateKey / C_GenerateKeyPair only.

Group	Mechanism	Data size
`pkcs11_rsa_sign_2048`	RSA PKCS#1 v1.5 + SHA-256 sign	32 B
`pkcs11_rsa_verify_2048`	RSA PKCS#1 v1.5 + SHA-256 verify	32 B
`pkcs11_ecdsa_p256_sign`	ECDSA P-256 sign (raw hash)	32 B
`pkcs11_ecdsa_p256_verify`	ECDSA P-256 verify (raw hash)	32 B
`pkcs11_aes_gcm_encrypt_4kb`	AES-256-GCM encrypt	4 KB
`pkcs11_aes_gcm_decrypt_4kb`	AES-256-GCM decrypt	4 KB
`pkcs11_sha256_digest_4kb`	SHA-256 digest	4 KB
`pkcs11_keygen_rsa_2048`	RSA-2048 key pair generation	—
`pkcs11_keygen_ec_p256`	EC P-256 key pair generation	—
`pkcs11_keygen_aes_256`	AES-256 key generation	—

The direct Rust harness additionally covers RSA-3072 and RSA-4096 sign, Ed25519 sign and verify, AES-GCM at 256 B / 4 KB / 64 KB, SHA-256 and SHA-512 at 4 KB, ML-KEM-512/768 encapsulate and decapsulate, and ML-DSA-44/65 sign and verify.

Direct Rust API results

Baseline numbers from the RustCrypto backend, measured against the crate's Rust API (no PKCS#11 overhead):

Operation	Time
RSA-2048 sign	1.806 ms
RSA-2048 verify	206.2 us
RSA-4096 sign	11.94 ms
ECDSA P-256 sign	339.7 us
ECDSA P-256 verify	289.6 us
Ed25519 sign	45.99 us
Ed25519 verify	47.44 us
AES-GCM encrypt, 256 B	1.396 us
AES-GCM encrypt, 4 KB	5.970 us
AES-GCM encrypt, 64 KB	62.35 us
AES-GCM decrypt, 256 B	0.589 us
AES-GCM decrypt, 4 KB	3.822 us
AES-GCM decrypt, 64 KB	58.05 us
SHA-256, 4 KB	18.63 us
SHA-512, 4 KB	10.45 us
ML-KEM-512 encap	56.11 us
ML-KEM-768 encap	82.43 us
ML-KEM-512 decap	97.04 us
ML-KEM-768 decap	179.9 us

docs/benchmarks.md does not publish an RSA-3072 timing; run cargo bench --bench crypto_bench locally for that size.

Backend comparison: RustCrypto vs aws-lc-rs

Both columns were built with target-cpu=native. aws-lc-rs provides assembly-optimised primitives (AES-NI, AVX2, Montgomery multiplication) and is the backend required for FIPS operation.

Operation	RustCrypto	aws-lc-rs	Speedup
RSA-2048 sign	2.001 ms	1.628 ms	1.2x
RSA-2048 verify	222.0 us	26.79 us	8.3x
ECDSA P-256 sign	331.6 us	291.8 us	1.1x
ECDSA P-256 verify	298.3 us	66.44 us	4.5x
AES-GCM decrypt, 4 KB	3.590 us	2.015 us	1.8x
SHA-256, 4 KB	16.63 us	11.52 us	1.4x
SHA-512, 4 KB	10.14 us	8.362 us	1.2x
RSA-2048 keygen	214.7 ms	91.42 ms	2.3x
EC P-256 keygen	184.4 us	157.3 us	1.2x

The aws-lc-rs encrypt path draws a fresh nonce from SystemRandom on every call, while the RustCrypto path uses a cached per-key counter nonce. Decrypt numbers are therefore the fairer algorithmic comparison: aws-lc-rs is 1.8x faster on AES-GCM decrypt at 4 KB.
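To show why a counter nonce is cheaper than a random one, here is a minimal sketch of a per-key counter nonce source. The source only says the RustCrypto path uses "a cached per-key counter nonce"; the 4-byte prefix plus 8-byte big-endian counter layout and the `CounterNonce` type are assumptions made for illustration, not Craton HSM's actual format.

```rust
/// Hypothetical per-key nonce source for a 96-bit AES-GCM nonce:
/// 4-byte fixed prefix followed by an 8-byte big-endian counter.
struct CounterNonce {
    prefix: [u8; 4],
    counter: u64,
}

impl CounterNonce {
    fn new(prefix: [u8; 4]) -> Self {
        Self { prefix, counter: 0 }
    }

    /// Returns the next nonce, or `None` once the counter is exhausted.
    /// Refusing to wrap guarantees a nonce is never reused under one key.
    fn next_nonce(&mut self) -> Option<[u8; 12]> {
        let current = self.counter;
        self.counter = current.checked_add(1)?;
        let mut nonce = [0u8; 12];
        nonce[..4].copy_from_slice(&self.prefix);
        nonce[4..].copy_from_slice(&current.to_be_bytes());
        Some(nonce)
    }
}
```

Producing a nonce this way is an increment and two small copies, with no call into the system RNG, which is the asymmetry that makes the encrypt columns hard to compare across backends.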

PKCS#11 ABI results

The following table is the three-way head-to-head through C_GetFunctionList. All three columns were produced by the same Criterion run on the same machine; Craton HSM has all nine optimisations applied.

Operation	Craton HSM (RustCrypto)	Craton HSM (aws-lc-rs)	SoftHSMv2 2.6.1	Best vs SoftHSM
RSA-2048 sign	2.558 ms	1.837 ms	1.522 ms	SoftHSM 1.2x
RSA-2048 verify	303.5 us	251.0 us	37.82 us	SoftHSM 6.6x
ECDSA P-256 sign	511.5 us	363.2 us	89.10 us	SoftHSM 4.1x
ECDSA P-256 verify	506.5 us	338.0 us	109.1 us	SoftHSM 3.1x
SHA-256 digest, 4 KB	26.0 us	15.58 us	9.90 us	SoftHSM 1.6x
AES-GCM encrypt, 4 KB	6.173 us	4.419 us	—	Craton HSM only
AES-GCM decrypt, 4 KB	5.605 us	4.094 us	—	Craton HSM only
RSA-2048 keygen	313.6 ms	208.6 ms	80.99 ms	SoftHSM 2.6x
EC P-256 keygen	824.4 us	824.9 us	224.5 us	SoftHSM 3.7x
AES-256 keygen	18.83 us	17.79 us	90.63 us	Craton HSM 5.1x

SoftHSMv2 does not accept null GCM parameters, so AES-GCM encrypt / decrypt cannot be benchmarked against it through the same harness; those cells are intentionally empty.
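The last column of the table can be reproduced with a small calculation: take the faster of the two Craton HSM backends, compare it with SoftHSMv2, and report the slower time divided by the faster one. The sketch below assumes that is how the column was derived (the source does not say so explicitly); `best_vs_softhsm` and `round1` are hypothetical names, and all inputs must be in the same unit.

```rust
/// Returns the faster module and how many times faster it is,
/// using the better of the two Craton HSM backends.
fn best_vs_softhsm(rustcrypto: f64, awslc: f64, softhsm: f64) -> (&'static str, f64) {
    let best_craton = rustcrypto.min(awslc);
    if best_craton <= softhsm {
        ("Craton HSM", softhsm / best_craton)
    } else {
        ("SoftHSM", best_craton / softhsm)
    }
}

/// Round to one decimal place, matching the table's presentation.
fn round1(x: f64) -> f64 {
    (x * 10.0).round() / 10.0
}
```

For the RSA-2048 verify row (303.5 us, 251.0 us, 37.82 us) this yields SoftHSM at roughly 6.6x, and for AES-256 keygen (18.83 us, 17.79 us, 90.63 us) it yields Craton HSM at roughly 5.1x, in line with the published cells.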

Where Craton HSM leads

AES-256 key generation is 5.1x faster. Craton HSM draws bytes directly from the OS CSPRNG (SystemRandom), while SoftHSMv2 routes through Botan's DRBG stack.
AES-GCM encrypt / decrypt have no SoftHSMv2 comparison available; the Craton HSM aws-lc-rs path completes 4 KB decrypt in 4.094 us.
Post-quantum algorithms: ML-KEM-768 encapsulation takes 74.46 us and ML-DSA-65 signing takes 563.3 us. SoftHSMv2 implements neither.

Where SoftHSMv2 leads

RSA-2048 verify is 6.6x faster. Botan's Montgomery multiplication uses hand-written ADX/MULX assembly that RustCrypto and aws-lc-rs do not yet match on this code path.
ECDSA P-256 sign is 4.1x faster; Botan uses a precomputed base-point table and wNAF scalar multiplication.
EC P-256 keygen is 3.7x faster and SHA-256 digest is 1.6x faster for the same assembly-level reasons.

For RSA-2048 sign, the gap is 1.2x. No SoftHSMv2 numbers are published for RSA-3072, RSA-4096, or Ed25519 in this comparison.

Post-quantum results

ML-KEM, ML-DSA, and SLH-DSA use pure-Rust implementations (ml-kem, ml-dsa, slh-dsa crates). The numbers therefore do not vary across backends.

Operation	Time
ML-DSA-44 sign	711.9 us
ML-DSA-65 sign	563.3 us
ML-DSA-44 verify	158.1 us
ML-DSA-65 verify	270.1 us
ML-KEM-512 encap	51.84 us
ML-KEM-768 encap	74.46 us
ML-KEM-512 decap	84.95 us
ML-KEM-768 decap	135.3 us

Methodology notes

Single-threaded. PKCS#11 mandates a singleton module state, so every benchmark runs on one thread. Published figures therefore reflect single-operation latency, not peak throughput of a parallel workload.
Warm caches. Keys are pre-generated during Criterion setup; the measured iterations exercise warm CPU caches and warm parsed-key caches (see below).
target-cpu=native. Every number was collected with RUSTFLAGS="-C target-cpu=native". Distribution binaries built without this flag will be slower, most noticeably on ML-KEM decapsulation (up to 25%).
Persistence. The benches run with the default in-memory token (persist_objects = false). Enabling persistence adds an EncryptedStore write to key-generation and object-mutation paths; encrypted storage uses AES-GCM over each object blob, so the overhead is roughly one AES-GCM encrypt of the serialised object per write.
Async audit logging. The audit log's SHA-256 hash chain and JSON serialisation run on a dedicated thread fed by an mpsc channel; C_Sign / C_Verify / C_Encrypt enqueue and return in sub-microsecond time. With synchronous audit logging (pre-optimisation), ABI-layer AES-GCM decrypt at 4 KB was 12.57 us; after moving audit off the hot path the same operation is 5.605 us.
RSA key-generation variance is high. Prime-search timing depends on the candidates produced by the RNG; Criterion reports a wider interval for these benches than for any other.
Release mode only. cargo bench sets --release automatically. Debug-mode numbers are not representative and should not be reported.
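The async audit logging note above can be sketched with the standard library alone: callers enqueue an event on an mpsc channel and return, while a dedicated thread maintains the hash chain. This is an illustration under stated assumptions, not Craton HSM's implementation. `AuditLogger` is a hypothetical type, events are plain strings rather than JSON, and `DefaultHasher` stands in for SHA-256, which the standard library does not provide.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::mpsc;
use std::thread;

struct AuditLogger {
    tx: mpsc::Sender<String>,
    worker: thread::JoinHandle<Vec<u64>>,
}

impl AuditLogger {
    /// Spawn the dedicated audit thread and return a handle to it.
    fn start() -> Self {
        let (tx, rx) = mpsc::channel::<String>();
        let worker = thread::spawn(move || {
            let mut chain = Vec::new();
            let mut prev: u64 = 0;
            // The loop ends once every sender has been dropped.
            for event in rx {
                // Each link hashes the previous link together with the event.
                let mut hasher = DefaultHasher::new();
                prev.hash(&mut hasher);
                event.hash(&mut hasher);
                prev = hasher.finish();
                chain.push(prev);
            }
            chain
        });
        Self { tx, worker }
    }

    /// Hot path: enqueue the event and return without hashing anything.
    fn record(&self, event: &str) {
        // A send error only means the worker has already exited.
        let _ = self.tx.send(event.to_owned());
    }

    /// Close the channel, wait for the worker, and return the chain.
    fn shutdown(self) -> Vec<u64> {
        let AuditLogger { tx, worker } = self;
        drop(tx);
        worker.join().expect("audit worker panicked")
    }
}
```

The cost left on the calling thread is one allocation and one channel send, which is consistent with the drop in ABI-layer AES-GCM decrypt time described above once hashing and serialisation moved off the hot path.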

Environment variables

Variable	Purpose	Default
`CRATON_HSM_LIB`	Path to the Craton HSM shared library.	Auto-detected under `target/release/`.
`SOFTHSM2_LIB`	Path to the SoftHSMv2 shared library. Enables the head-to-head run.	Unset (comparison disabled).
`SOFTHSM2_CONF`	SoftHSMv2 config file.	Written by the harness.
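The defaults in the table can be expressed as a small resolution step. The sketch below is an assumption about how such lookups might be written, not the harness source: the function names are hypothetical, the macOS file name `libcraton_hsm.dylib` is a guess, and the other two file names come from the Harness section.

```rust
use std::path::PathBuf;

/// Platform-specific library file name.
fn default_lib_name() -> &'static str {
    if cfg!(target_os = "windows") {
        "craton_hsm.dll"
    } else if cfg!(target_os = "macos") {
        "libcraton_hsm.dylib" // assumption: not stated in this document
    } else {
        "libcraton_hsm.so"
    }
}

/// Use the CRATON_HSM_LIB value when set, else fall back to target/release/.
fn resolve_craton_lib(override_path: Option<String>) -> PathBuf {
    match override_path {
        Some(p) if !p.is_empty() => PathBuf::from(p),
        _ => PathBuf::from("target").join("release").join(default_lib_name()),
    }
}

/// The head-to-head run is enabled only when SOFTHSM2_LIB is set.
fn softhsm_comparison_enabled(softhsm2_lib: Option<String>) -> bool {
    softhsm2_lib.map_or(false, |p| !p.is_empty())
}
```

A caller would pass `std::env::var("CRATON_HSM_LIB").ok()` and `std::env::var("SOFTHSM2_LIB").ok()`; taking the values as arguments keeps the functions easy to test without touching the process environment.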

Known limitations

No multi-threaded benchmarks. The PKCS#11 spec's global-state model makes parallel measurements unrepresentative.
AES-GCM cannot be compared with SoftHSMv2 through this harness because the two libraries disagree on whether CK_GCM_PARAMS may be null.
RSA-3072 / RSA-4096 and Ed25519 numbers are produced by the Rust-level harness only; no corresponding SoftHSMv2 rows are published in the reference document.
YubiHSM and p11-kit-proxy are not part of the harness. See Comparison gaps below.

Comparison gaps

Craton HSM publishes head-to-head numbers against SoftHSMv2 only. The reference docs/benchmarks.md in the source tree does not include YubiHSM, p11-kit-proxy, or other HSM vendors, and this page intentionally omits speculation. If you need comparative numbers against those modules, run the harness locally and set SOFTHSM2_LIB to the target module's library path; the same harness loads any C_GetFunctionList-compatible module. Be aware that algorithm parameter conventions may differ between modules, as the AES-GCM case above shows.