Craton HSM

Benchmarks

Craton HSM ships two benchmark harnesses built on Criterion.rs. The first measures raw Rust cryptographic throughput; the second loads the compiled PKCS#11 shared library through C_GetFunctionList and exercises the same code path real consumers (OpenSSL engine, Java SunPKCS11, NSS) go through. The numbers below are reported as medians; full Criterion HTML reports land in target/criterion/report/index.html after each run.

Harness

Two benches live under benches/:

| Harness | File | What it measures |
| --- | --- | --- |
| Direct Rust API | benches/crypto_bench.rs | Cryptographic backend performance with no FFI, no session state, no ABI marshalling. |
| PKCS#11 C ABI | benches/pkcs11_abi_bench.rs | End-to-end performance through the C ABI, including catch_unwind, pointer marshalling, audit logging, and session bookkeeping. |

The ABI harness dynamically loads libcraton_hsm.so / craton_hsm.dll via libloading and, when SOFTHSM2_LIB is set, loads SoftHSMv2 in the same process and runs identical operations inside the same Criterion report. Every ABI iteration includes the full C_*Init + C_* pair (for example C_SignInit + C_Sign), which reflects how PKCS#11 consumers actually call the module.
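The contents of one measured ABI iteration can be sketched in plain Rust. This is a simplified, dependency-free model, not the harness's code: `SignModule`, `CountingModule`, and `median_init_sign` are hypothetical names, and the mock stands in for the C_SignInit / C_Sign function pointers obtained from C_GetFunctionList. The loop times each init+sign pair and takes the median. Criterion itself batches iterations and applies its own statistics, so treat this only as an illustration of what each sample contains.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the C_SignInit / C_Sign pair that the real
/// harness calls through function pointers from C_GetFunctionList.
trait SignModule {
    fn sign_init(&mut self);
    fn sign(&mut self, data: &[u8]) -> Vec<u8>;
}

/// Mock module that only counts calls so the loop structure can be checked.
#[derive(Default)]
struct CountingModule {
    init_calls: usize,
    sign_calls: usize,
    active: bool,
}

impl SignModule for CountingModule {
    fn sign_init(&mut self) {
        self.init_calls += 1;
        self.active = true;
    }

    fn sign(&mut self, data: &[u8]) -> Vec<u8> {
        // As with single-part C_Sign, a completed call ends the operation,
        // so the next signature needs a fresh sign_init.
        assert!(self.active, "sign called without sign_init");
        self.active = false;
        self.sign_calls += 1;
        data.to_vec() // placeholder "signature"
    }
}

/// Times `iterations` init+sign pairs and returns the median sample
/// (the upper median when `iterations` is even).
fn median_init_sign<M: SignModule>(module: &mut M, data: &[u8], iterations: usize) -> Duration {
    assert!(iterations > 0, "need at least one sample");
    let mut samples: Vec<Duration> = (0..iterations)
        .map(|_| {
            let start = Instant::now();
            module.sign_init(); // the Init call is inside the timed region
            black_box(module.sign(black_box(data)));
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[iterations / 2]
}

fn main() {
    let mut module = CountingModule::default();
    let median = median_init_sign(&mut module, &[0u8; 32], 101);
    println!(
        "median init+sign: {:?} ({} inits, {} signs)",
        median, module.init_calls, module.sign_calls
    );
}
```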

All published numbers were collected on Windows 11, x86_64, single-threaded, with --release, LTO enabled, and RUSTFLAGS="-C target-cpu=native". Criterion used 100 samples per benchmark (10 for RSA key generation).

Running the benches

# Direct Rust API
RUSTFLAGS="-C target-cpu=native" cargo bench --bench crypto_bench

# PKCS#11 C ABI (default RustCrypto backend)
RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench

# PKCS#11 C ABI with the aws-lc-rs backend
RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench \
    --features awslc-backend --no-default-features

To enable the SoftHSMv2 head-to-head, point SOFTHSM2_LIB at the installed library before invoking cargo bench:

# Linux
SOFTHSM2_LIB=/usr/lib/softhsm/libsofthsm2.so \
    RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench

# macOS
SOFTHSM2_LIB=$(brew --prefix softhsm)/lib/softhsm/libsofthsm2.so \
    RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench

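# Windows (Git Bash or another POSIX-style shell; cmd and PowerShell do not accept the VAR=value prefix syntax)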
SOFTHSM2_LIB=C:/SoftHSM2/SoftHSM2/lib/softhsm2-x64.dll \
    RUSTFLAGS="-C target-cpu=native" cargo bench --bench pkcs11_abi_bench

The harness provisions a SoftHSMv2 token under target/bench-tokens/, writes a softhsm2.conf with absolute paths, and initialises the token with the same PINs Craton HSM uses before the comparison begins.
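A minimal sketch of the config-writing step, assuming the harness only needs a token directory and file-backed storage. `absolutise` and `render_softhsm2_conf` are hypothetical helpers; `directories.tokendir`, `objectstore.backend`, and `log.level` are standard softhsm2.conf keys, and SoftHSMv2 finds the file through the SOFTHSM2_CONF environment variable. Resolving the token directory to an absolute path keeps the config valid whatever working directory the benchmark process starts in. Token initialisation with PINs is not modelled here.

```rust
use std::path::{Path, PathBuf};

/// Joins a relative `dir` onto `base`; absolute paths are returned unchanged.
fn absolutise(dir: &Path, base: &Path) -> PathBuf {
    if dir.is_absolute() {
        dir.to_path_buf()
    } else {
        base.join(dir)
    }
}

/// Renders a minimal softhsm2.conf that keeps file-backed tokens in `token_dir`.
fn render_softhsm2_conf(token_dir: &Path) -> String {
    format!(
        "directories.tokendir = {}\nobjectstore.backend = file\nlog.level = ERROR\n",
        token_dir.display()
    )
}

fn main() -> std::io::Result<()> {
    let cwd = std::env::current_dir()?;
    let token_dir = absolutise(Path::new("target/bench-tokens"), &cwd);
    // A full harness would create token_dir, write this text to a file, and
    // point SOFTHSM2_CONF at that file before loading libsofthsm2.
    print!("{}", render_softhsm2_conf(&token_dir));
    Ok(())
}
```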

Measured operations

The ABI harness covers the operations listed below. Sizes are the payload passed to a single C_* call; key generation benches measure C_GenerateKey / C_GenerateKeyPair only.

| Group | Mechanism | Data size |
| --- | --- | --- |
| pkcs11_rsa_sign_2048 | RSA PKCS#1 v1.5 + SHA-256 sign | 32 B |
| pkcs11_rsa_verify_2048 | RSA PKCS#1 v1.5 + SHA-256 verify | 32 B |
| pkcs11_ecdsa_p256_sign | ECDSA P-256 sign (raw hash) | 32 B |
| pkcs11_ecdsa_p256_verify | ECDSA P-256 verify (raw hash) | 32 B |
| pkcs11_aes_gcm_encrypt_4kb | AES-256-GCM encrypt | 4 KB |
| pkcs11_aes_gcm_decrypt_4kb | AES-256-GCM decrypt | 4 KB |
| pkcs11_sha256_digest_4kb | SHA-256 digest | 4 KB |
| pkcs11_keygen_rsa_2048 | RSA-2048 key pair generation | n/a |
| pkcs11_keygen_ec_p256 | EC P-256 key pair generation | n/a |
| pkcs11_keygen_aes_256 | AES-256 key generation | n/a |

The direct Rust harness additionally covers RSA-3072 and RSA-4096 sign, Ed25519 sign and verify, AES-GCM at 256 B / 4 KB / 64 KB, SHA-256 and SHA-512 at 4 KB, ML-KEM-512/768 encapsulate and decapsulate, and ML-DSA-44/65 sign and verify.

Direct Rust API results

Baseline numbers from the RustCrypto backend, measured against the crate's Rust API (no PKCS#11 overhead):

| Operation | Time (median) |
| --- | --- |
| RSA-2048 sign | 1.806 ms |
| RSA-2048 verify | 206.2 us |
| RSA-4096 sign | 11.94 ms |
| ECDSA P-256 sign | 339.7 us |
| ECDSA P-256 verify | 289.6 us |
| Ed25519 sign | 45.99 us |
| Ed25519 verify | 47.44 us |
| AES-GCM encrypt, 256 B | 1.396 us |
| AES-GCM encrypt, 4 KB | 5.970 us |
| AES-GCM encrypt, 64 KB | 62.35 us |
| AES-GCM decrypt, 256 B | 0.589 us |
| AES-GCM decrypt, 4 KB | 3.822 us |
| AES-GCM decrypt, 64 KB | 58.05 us |
| SHA-256, 4 KB | 18.63 us |
| SHA-512, 4 KB | 10.45 us |
| ML-KEM-512 encap | 56.11 us |
| ML-KEM-768 encap | 82.43 us |
| ML-KEM-512 decap | 97.04 us |
| ML-KEM-768 decap | 179.9 us |

The direct harness covers RSA-3072, but no RSA-3072 timing is published in docs/benchmarks.md; run cargo bench --bench crypto_bench locally to obtain that figure.

Backend comparison: RustCrypto vs aws-lc-rs

Both backends were built with target-cpu=native. aws-lc-rs provides assembly-optimised primitives (AES-NI, AVX2, Montgomery multiplication) and is the backend required for FIPS operation. The Speedup column is the RustCrypto time divided by the aws-lc-rs time.

| Operation | RustCrypto | aws-lc-rs | Speedup |
| --- | --- | --- | --- |
| RSA-2048 sign | 2.001 ms | 1.628 ms | 1.2x |
| RSA-2048 verify | 222.0 us | 26.79 us | 8.3x |
| ECDSA P-256 sign | 331.6 us | 291.8 us | 1.1x |
| ECDSA P-256 verify | 298.3 us | 66.44 us | 4.5x |
| AES-GCM decrypt, 4 KB | 3.590 us | 2.015 us | 1.8x |
| SHA-256, 4 KB | 16.63 us | 11.52 us | 1.4x |
| SHA-512, 4 KB | 10.14 us | 8.362 us | 1.2x |
| RSA-2048 keygen | 214.7 ms | 91.42 ms | 2.3x |
| EC P-256 keygen | 184.4 us | 157.3 us | 1.2x |

The aws-lc-rs encrypt path draws a fresh nonce from SystemRandom on every call, while the RustCrypto path uses a cached per-key counter nonce. Decrypt numbers are therefore the fairer algorithmic comparison: aws-lc-rs is 1.8x faster on AES-GCM decrypt at 4 KB.
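For readers unfamiliar with counter nonces, the sketch below shows one common layout: the deterministic construction from NIST SP 800-38D, with a fixed 4-byte field followed by a 64-bit invocation counter. It is an assumption about what "cached per-key counter nonce" means here, not Craton HSM's code, and `CounterNonce` is a hypothetical type. Producing a nonce this way is a copy and an increment, whereas SystemRandom needs an RNG call per encryption, which is why the encrypt timings are skewed between the two backends.

```rust
/// Hypothetical per-key AES-GCM nonce source: a 4-byte fixed field followed by
/// a 64-bit big-endian invocation counter (96 bits in total).
struct CounterNonce {
    fixed: [u8; 4],
    counter: u64,
}

impl CounterNonce {
    fn new(fixed: [u8; 4]) -> Self {
        Self { fixed, counter: 0 }
    }

    /// Returns the next nonce, or None when the counter is exhausted,
    /// because a GCM nonce must never repeat under the same key.
    fn next_nonce(&mut self) -> Option<[u8; 12]> {
        if self.counter == u64::MAX {
            return None; // stop instead of wrapping; the key must be rotated
        }
        let mut nonce = [0u8; 12];
        nonce[..4].copy_from_slice(&self.fixed);
        nonce[4..].copy_from_slice(&self.counter.to_be_bytes());
        self.counter += 1;
        Some(nonce)
    }
}

fn main() {
    let mut source = CounterNonce::new([0xAA, 0xBB, 0xCC, 0xDD]);
    for _ in 0..3 {
        let nonce = source.next_nonce().expect("counter not exhausted");
        println!("{:02x?}", nonce);
    }
}
```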

PKCS#11 ABI results

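The following table is the three-way head-to-head through C_GetFunctionList. All three timing columns were produced by the same Criterion run on the same machine; Craton HSM has all nine optimisations applied. The last column compares SoftHSMv2 with the faster of the two Craton HSM builds, naming the winner and the ratio of the two times.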

| Operation | Craton HSM (RustCrypto) | Craton HSM (aws-lc-rs) | SoftHSMv2 2.6.1 | Best vs SoftHSM |
| --- | --- | --- | --- | --- |
| RSA-2048 sign | 2.558 ms | 1.837 ms | 1.522 ms | SoftHSM 1.2x |
| RSA-2048 verify | 303.5 us | 251.0 us | 37.82 us | SoftHSM 6.6x |
| ECDSA P-256 sign | 511.5 us | 363.2 us | 89.10 us | SoftHSM 4.1x |
| ECDSA P-256 verify | 506.5 us | 338.0 us | 109.1 us | SoftHSM 3.1x |
| SHA-256 digest, 4 KB | 26.0 us | 15.58 us | 9.90 us | SoftHSM 1.6x |
| AES-GCM encrypt, 4 KB | 6.173 us | 4.419 us | | Craton HSM only |
| AES-GCM decrypt, 4 KB | 5.605 us | 4.094 us | | Craton HSM only |
| RSA-2048 keygen | 313.6 ms | 208.6 ms | 80.99 ms | SoftHSM 2.6x |
| EC P-256 keygen | 824.4 us | 824.9 us | 224.5 us | SoftHSM 3.7x |
| AES-256 keygen | 18.83 us | 17.79 us | 90.63 us | Craton HSM 5.1x |

SoftHSMv2 does not accept null GCM parameters, so AES-GCM encrypt / decrypt cannot be benchmarked against it through the same harness; those cells are intentionally empty.
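The comparison column can be reproduced from the three timing columns. The sketch below (hypothetical `best_vs_softhsm`) takes the faster of the two Craton HSM builds, compares it with SoftHSMv2, and formats the ratio to one decimal place; the example rows are copied from the table above, and all three inputs of a row must share one unit. The same kind of division, RustCrypto time over aws-lc-rs time, gives the Speedup column of the backend comparison.

```rust
/// Reproduces the "Best vs SoftHSM" cell for one row. Inputs are times in
/// the same unit; `softhsm` is None where SoftHSMv2 could not be measured.
fn best_vs_softhsm(rustcrypto: f64, aws_lc_rs: f64, softhsm: Option<f64>) -> String {
    let craton_best = rustcrypto.min(aws_lc_rs); // lower time is better
    match softhsm {
        None => "Craton HSM only".to_string(),
        Some(soft) if soft < craton_best => format!("SoftHSM {:.1}x", craton_best / soft),
        Some(soft) => format!("Craton HSM {:.1}x", soft / craton_best),
    }
}

fn main() {
    // (operation, RustCrypto, aws-lc-rs, SoftHSMv2), copied from the table.
    let rows = [
        ("RSA-2048 verify (us)", 303.5, 251.0, Some(37.82)),
        ("EC P-256 keygen (us)", 824.4, 824.9, Some(224.5)),
        ("AES-256 keygen (us)", 18.83, 17.79, Some(90.63)),
        ("AES-GCM decrypt, 4 KB (us)", 5.605, 4.094, None),
    ];
    for (op, rc, aws, soft) in rows {
        println!("{op}: {}", best_vs_softhsm(rc, aws, soft));
    }
}
```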

Where Craton HSM leads

  • AES-256 key generation is 5.1x faster. Craton HSM draws bytes directly from the OS CSPRNG (SystemRandom), while SoftHSMv2 routes through Botan's DRBG stack.
  • AES-GCM encrypt / decrypt have no SoftHSMv2 comparison available; the Craton HSM aws-lc-rs path completes 4 KB decrypt in 4.094 us.
  • Post-quantum: ML-KEM-768 encapsulation takes 74.46 us and ML-DSA-65 signing 563.3 us; SoftHSMv2 implements neither.

Where SoftHSMv2 leads

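  • RSA-2048 verify is 6.6x faster. Botan's Montgomery multiplication uses hand-written ADX/MULX assembly that RustCrypto and aws-lc-rs do not yet match on this code path. Note, however, that the direct aws-lc-rs figure (26.79 us) is already below SoftHSMv2's 37.82 us, which suggests that much of the ABI-level gap on the aws-lc-rs build lies in the PKCS#11 path rather than in the primitive itself.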
  • ECDSA P-256 sign is 4.1x faster; Botan uses a precomputed base-point table and wNAF scalar multiplication.
  • EC P-256 keygen is 3.7x faster and SHA-256 digest is 1.6x faster for the same assembly-level reasons.
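To make the wNAF idea concrete, here is a textbook width-w NAF recoding of a 64-bit scalar. It is an illustrative sketch, not Botan's or RustCrypto's implementation: real P-256 scalars are 256 bits, side-channel concerns are ignored, and `wnaf` and `from_wnaf` are hypothetical names. Nonzero digits are odd and bounded by 2^(w-1), and each is followed by at least w-1 zeros, so a precomputed table of odd multiples P, 3P, ..., (2^(w-1)-1)P (negatives come from cheap point negation) covers every addition, and on average only about one digit in w+1 is nonzero.

```rust
/// Width-w NAF recoding, least significant digit first.
fn wnaf(k: u64, w: u32) -> Vec<i64> {
    assert!((2..=16).contains(&w), "window width out of range");
    let modulus: i128 = 1 << w; // 2^w
    let half: i128 = 1 << (w - 1); // 2^(w-1)
    let mut k = k as i128; // i128 so k - d cannot overflow
    let mut digits = Vec::new();
    while k > 0 {
        let mut d = 0;
        if k & 1 == 1 {
            d = k & (modulus - 1); // k mod 2^w
            if d >= half {
                d -= modulus; // use the signed residue in (-2^(w-1), 2^(w-1))
            }
            k -= d; // now divisible by 2^w, forcing the next w-1 digits to 0
        }
        digits.push(d as i64);
        k >>= 1;
    }
    digits
}

/// Reconstructs the scalar as the sum of d_i * 2^i.
fn from_wnaf(digits: &[i64]) -> i128 {
    digits.iter().rev().fold(0i128, |acc, &d| acc * 2 + d as i128)
}

fn main() {
    let k = 0xDEAD_BEEFu64;
    let digits = wnaf(k, 4);
    let nonzero = digits.iter().filter(|&&d| d != 0).count();
    println!("{k}: {} digits, {nonzero} nonzero: {:?}", digits.len(), digits);
    assert_eq!(from_wnaf(&digits), k as i128);
}
```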

For RSA-2048 sign the gap narrows to 1.2x (aws-lc-rs build, 1.837 ms against 1.522 ms). No SoftHSMv2 numbers are published for RSA-3072, RSA-4096, or Ed25519 in this comparison.

Post-quantum results

ML-KEM, ML-DSA, and SLH-DSA use pure-Rust implementations (ml-kem, ml-dsa, slh-dsa crates). The numbers therefore do not vary across backends.

| Operation | Time |
| --- | --- |
| ML-DSA-44 sign | 711.9 us |
| ML-DSA-65 sign | 563.3 us |
| ML-DSA-44 verify | 158.1 us |
| ML-DSA-65 verify | 270.1 us |
| ML-KEM-512 encap | 51.84 us |
| ML-KEM-768 encap | 74.46 us |
| ML-KEM-512 decap | 84.95 us |
| ML-KEM-768 decap | 135.3 us |

Methodology notes

  • Single-threaded. PKCS#11 mandates a singleton module state, so every benchmark runs on one thread. Published figures therefore reflect single-operation latency, not peak throughput of a parallel workload.
  • Warm caches. Keys are pre-generated during Criterion setup; the measured iterations exercise warm CPU caches and warm parsed-key caches (see below).
  • target-cpu=native. Every number was collected with RUSTFLAGS="-C target-cpu=native". Distribution binaries built without this flag will be slower, most noticeably on ML-KEM decapsulation (up to 25%).
  • Persistence. The benches run with the default in-memory token (persist_objects = false). Enabling persistence adds an EncryptedStore write to key-generation and object-mutation paths; encrypted storage uses AES-GCM over each object blob, so the overhead is roughly one AES-GCM encrypt of the serialised object per write.
  • Async audit logging. The audit log's SHA-256 hash chain and JSON serialisation run on a dedicated thread fed by an mpsc channel, so C_Sign / C_Verify / C_Encrypt only enqueue an audit event (a sub-microsecond step) before returning. With synchronous audit logging (pre-optimisation), ABI-layer AES-GCM decrypt at 4 KB took 12.57 us; after moving audit off the hot path the same operation takes 5.605 us, a 2.2x improvement.
  • RSA key-generation variance is high. Prime-search timing depends on the candidates produced by the RNG; Criterion reports a wider interval for these benches than for any other.
  • Release mode only. cargo bench sets --release automatically. Debug-mode numbers are not representative and should not be reported.
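The asynchronous audit design described under "Async audit logging" can be sketched with std::sync::mpsc. This is a hypothetical model, not Craton HSM's logger: `AuditEvent`, `serialise`, `chain_hash`, and `spawn_audit_thread` are illustrative names, the JSON is hand-rolled, and `DefaultHasher` stands in for SHA-256 only to keep the example dependency-free (it is not a cryptographic hash). The caller builds a small event and sends it; serialisation and chaining happen on the receiving thread, and each chain value depends on the previous one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::mpsc;
use std::thread;

/// Hypothetical audit event: the hot path builds only this small struct.
struct AuditEvent {
    op: &'static str,
    rv: u64, // PKCS#11 return value (CK_RV); 0 is CKR_OK
}

/// Hand-rolled JSON for the sketch; runs on the audit thread, not the caller.
fn serialise(ev: &AuditEvent) -> String {
    format!("{{\"op\":\"{}\",\"rv\":{}}}", ev.op, ev.rv)
}

/// Chains a record onto the previous head. DefaultHasher is a stand-in for
/// SHA-256 to avoid dependencies; it is NOT a cryptographic hash.
fn chain_hash(prev: u64, record: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev.hash(&mut h);
    record.hash(&mut h);
    h.finish()
}

/// Spawns the audit thread. Its handle yields (records processed, final
/// chain head) once every Sender has been dropped and the channel drains.
fn spawn_audit_thread() -> (mpsc::Sender<AuditEvent>, thread::JoinHandle<(usize, u64)>) {
    let (tx, rx) = mpsc::channel::<AuditEvent>();
    let handle = thread::spawn(move || {
        let mut head = 0u64; // genesis value
        let mut count = 0usize;
        for ev in rx {
            head = chain_hash(head, &serialise(&ev));
            count += 1;
        }
        (count, head)
    });
    (tx, handle)
}

fn main() {
    let (tx, audit) = spawn_audit_thread();
    for op in ["C_Sign", "C_Verify", "C_Encrypt"] {
        // Hot path: enqueue and return; no hashing or serialisation here.
        tx.send(AuditEvent { op, rv: 0 }).expect("audit thread alive");
    }
    drop(tx); // closing the channel lets the audit thread finish
    let (count, head) = audit.join().expect("audit thread panicked");
    println!("{count} records, chain head {head:016x}");
}
```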

Environment variables

| Variable | Purpose | Default |
| --- | --- | --- |
| CRATON_HSM_LIB | Path to the Craton HSM shared library. | Auto-detected under target/release/. |
| SOFTHSM2_LIB | Path to the SoftHSMv2 shared library. Enables the head-to-head run. | Unset (comparison disabled). |
| SOFTHSM2_CONF | SoftHSMv2 config file. | Written by the harness. |
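One plausible reading of "auto-detected under target/release/" is sketched below; the actual detection logic is not documented here, so `resolve_craton_lib` is hypothetical. It prefers a non-empty CRATON_HSM_LIB value and otherwise builds the platform's cdylib file name from `std::env::consts::DLL_PREFIX` and `DLL_SUFFIX`, which yields libcraton_hsm.so on Linux, craton_hsm.dll on Windows, and libcraton_hsm.dylib on macOS.

```rust
use std::env::consts::{DLL_PREFIX, DLL_SUFFIX};
use std::path::PathBuf;

/// Hypothetical lookup: a non-empty CRATON_HSM_LIB value wins; otherwise use
/// the platform's cdylib file name under target/release/.
fn resolve_craton_lib(env_value: Option<String>) -> PathBuf {
    match env_value {
        Some(path) if !path.is_empty() => PathBuf::from(path),
        _ => PathBuf::from("target")
            .join("release")
            .join(format!("{}craton_hsm{}", DLL_PREFIX, DLL_SUFFIX)),
    }
}

fn main() {
    // std::env::var returns Err when the variable is unset (or not valid Unicode).
    let lib = resolve_craton_lib(std::env::var("CRATON_HSM_LIB").ok());
    println!("would load {}", lib.display());
}
```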

Known limitations

  • No multi-threaded benchmarks. The PKCS#11 spec's global-state model makes parallel measurements unrepresentative.
  • AES-GCM cannot be compared with SoftHSMv2 through this harness because the two libraries disagree on whether CK_GCM_PARAMS may be null.
  • RSA-3072 / RSA-4096 and Ed25519 numbers are produced by the Rust-level harness only; no corresponding SoftHSMv2 rows are published in the reference document.
  • YubiHSM and p11-kit-proxy are not part of the harness. See Comparison gaps below.
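One way to see why parallel runs would mislead: if module-wide state sits behind a single process-global lock, as in the hypothetical sketch below, extra threads mostly queue on that lock, so a multi-threaded benchmark would largely measure contention rather than cryptographic cost. `ModuleState`, `MODULE`, `do_operation`, and `run_parallel` are illustrative names, not Craton HSM internals.

```rust
use std::sync::{Mutex, OnceLock};
use std::thread;

/// Hypothetical module-wide state, created once per process (the role
/// C_Initialize plays for a PKCS#11 library).
struct ModuleState {
    operations: u64,
}

static MODULE: OnceLock<Mutex<ModuleState>> = OnceLock::new();

fn module() -> &'static Mutex<ModuleState> {
    MODULE.get_or_init(|| Mutex::new(ModuleState { operations: 0 }))
}

/// Every operation takes the same global lock, so concurrent callers queue here.
fn do_operation() {
    let mut state = module().lock().expect("module lock poisoned");
    state.operations += 1;
}

/// Runs `ops_per_thread` operations on each of `threads` threads.
fn run_parallel(threads: usize, ops_per_thread: usize) {
    let workers: Vec<_> = (0..threads)
        .map(|_| thread::spawn(move || (0..ops_per_thread).for_each(|_| do_operation())))
        .collect();
    for w in workers {
        w.join().expect("worker panicked");
    }
}

fn operations_so_far() -> u64 {
    module().lock().expect("module lock poisoned").operations
}

fn main() {
    run_parallel(4, 10_000);
    println!("{} operations, all serialised through one lock", operations_so_far());
}
```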

Comparison gaps

Craton HSM publishes head-to-head numbers against SoftHSMv2 only. The reference docs/benchmarks.md in the source tree does not include YubiHSM, p11-kit-proxy, or other HSM vendors, and this page intentionally omits speculation. If you need comparative numbers against those modules, you can run the harness locally against their PKCS#11 libraries by setting SOFTHSM2_LIB to the target library path, since the same harness loads any C_GetFunctionList-compatible module. Be aware that algorithm parameter conventions may differ, and that the harness's token-provisioning step writes a SoftHSMv2-specific softhsm2.conf, so other modules may need their tokens initialised separately.