Craton HSM

Cluster and Replication

The craton-hsm-cluster crate builds a multi-node Craton HSM cluster with Raft consensus, authenticated inter-node RPCs, durable log and snapshot storage, and a per-peer health subsystem. Tenants, keys, roles, and configuration changes are replicated as state-machine events so every node converges on the same view.

  • Crate: craton-hsm-cluster
  • License: BSL 1.1
  • MSRV: Rust 1.75
  • Safety: #![deny(unsafe_code)] at the crate root.
  • Status: production-ready

Modules

  • raft — leader election, log replication, commit index tracking.
  • replication — higher-level key-replication events (KeyCreated, KeyUpdated, KeyDeleted, KeyRotated, ConfigChanged) with per-event SHA-256 checksums.
  • state_machine — applies committed log entries to the tenant / key state that the rest of the HSM reads from.
  • storage — Raft HardState (term, voted-for) and log persistence. Two implementations ship:
    • InMemoryStorage for tests.
    • FileStorage for single-process production use with atomic write-rename and fsync on every hard-state flush (the pattern is sketched after this list).
  • config — cluster configuration struct.
  • health — per-peer liveness tracking and cluster-wide quorum health.
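
The atomic write-rename pattern behind FileStorage is the standard crash-safe persistence idiom. The sketch below shows its general shape, assuming a Unix-like filesystem; the file names and layout are illustrative and are not FileStorage's actual on-disk scheme.

use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

// Crash-safe write sketch: write to a temporary file, fsync it, atomically
// rename it over the destination, then fsync the directory so the rename
// itself is durable. Illustrative only; not FileStorage's actual layout.
fn atomic_write(dir: &Path, name: &str, bytes: &[u8]) -> std::io::Result<()> {
    let tmp = dir.join(format!("{name}.tmp"));
    let dst = dir.join(name);

    let mut f = File::create(&tmp)?;
    f.write_all(bytes)?;
    f.sync_all()?; // flush contents and metadata before the rename

    fs::rename(&tmp, &dst)?; // atomic replacement on POSIX filesystems
    File::open(dir)?.sync_all()?; // persist the rename (Unix-like systems)
    Ok(())
}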

Quorum Sizing

Raft requires a majority of peers to elect a leader and commit entries. The practical sizing rule is:

Cluster size   Can tolerate   Notes
1              0 failures     Not a cluster; no fault tolerance.
3              1 failure      Minimum recommended.
5              2 failures     Common production sizing.
7              3 failures     Diminishing returns past this point.

Use odd peer counts. An even-sized cluster tolerates no more failures than the next-smaller odd size while requiring a larger majority to acknowledge every write.
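
The majority arithmetic behind the table is simple integer math. The helpers below are illustrative only and are not part of the crate's API.

// Quorum size and tolerated failures for an n-node Raft cluster.
// Illustrative helpers, not part of craton-hsm-cluster's API.
fn quorum(n: usize) -> usize {
    n / 2 + 1 // e.g. 5 nodes -> 3 acknowledgements required
}

fn tolerated_failures(n: usize) -> usize {
    n - quorum(n) // e.g. 5 nodes -> 2 failures tolerated
}

assert_eq!(quorum(3), 2);
assert_eq!(tolerated_failures(4), 1); // no better than a 3-node cluster
assert_eq!(tolerated_failures(5), 2);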

Mixing 0.1.0 and 0.1.1 nodes in the same cluster is unsupported due to the HMAC replay-protection addition in 0.1.1. Roll a cluster to a single version at a time.

Network Requirements

  • Low-latency, ideally single-AZ. Raft replicates synchronously to a majority; leader commit latency is gated by the slowest follower needed to complete that majority.
  • Clock skew tolerance: ±30 s by default, configurable via max_message_age_ms.
  • Wire encryption is out of scope for this crate. Deploy on an isolated network or tunnel inter-node traffic over mTLS. The crate provides authenticity and replay protection on the message payload; it does not encrypt it.
  • Transport is pluggable but not bundled. The crate defines the trait surface and the authenticated message format. Binary crates supply the concrete transport, typically tokio::net or a gRPC framework; a rough framing sketch follows this list.
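
As a rough illustration of the kind of transport a binary crate might supply, the sketch below length-prefixes opaque, already-authenticated message bytes over TCP with tokio and enforces a frame-size cap. The framing, function names, and cap are assumptions for illustration; the crate's actual transport trait and message format are defined in its API.

use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

// Hypothetical length-prefixed framing for authenticated RPC payloads.
// The real transport trait lives in craton-hsm-cluster; this only shows one
// plausible way a binary crate could move the framed bytes between peers.
const MAX_FRAME_BYTES: usize = 16 * 1024 * 1024; // illustrative cap

async fn send_frame(stream: &mut TcpStream, payload: &[u8]) -> std::io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| std::io::Error::new(std::io::ErrorKind::InvalidInput, "frame too large"))?;
    stream.write_all(&len.to_be_bytes()).await?;
    stream.write_all(payload).await?;
    stream.flush().await
}

async fn recv_frame(stream: &mut TcpStream) -> std::io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf).await?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_BYTES {
        return Err(std::io::Error::new(std::io::ErrorKind::InvalidData, "frame exceeds cap"));
    }
    let mut payload = vec![0u8; len];
    stream.read_exact(&mut payload).await?;
    Ok(payload)
}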

Security Properties

  • Authenticated RPCs. Every RPC carries an HMAC-SHA256 tag keyed with the cluster secret. Peers without the secret are silently ignored (no reply is returned), so unauthenticated nodes cannot provoke state changes or exfiltrate state. A minimal tagging sketch follows this list.
  • Replay protection. RPCs include nonces and monotonic per-source sequence numbers that peers track.
  • Bounded log entries. Any single log entry larger than MAX_LOG_ENTRY_BYTES (16 MiB) is rejected at read time. A corrupt or forged file cannot OOM the node.
  • Bounded snapshots. Snapshots larger than MAX_SNAPSHOT_BYTES (1 GiB) are rejected before being mapped into memory.
  • Fail-closed secret. If cluster_secret_hex is unset and allow_insecure is false (the default), cluster construction fails with a diagnostic. The insecure-no-cluster-secret feature exists for tests and demos only.
  • Zeroizing secrets. The cluster secret is held in Zeroizing<[u8; 32]>.
  • Snapshot integrity. StorageError::SnapshotIntegrityFailure surfaces on HMAC mismatch during snapshot load.
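
To make the authenticity property concrete, here is a minimal tagging and verification sketch using the hmac and sha2 crates. The exact message layout (where the nonce and sequence number sit relative to the body) is an assumption; the crate defines the real format, and a verification failure simply produces no reply.

use hmac::{Hmac, Mac};
use sha2::Sha256;

type HmacSha256 = Hmac<Sha256>;

// Tag an outgoing message body with the 32-byte cluster secret.
fn tag_message(secret: &[u8; 32], body: &[u8]) -> [u8; 32] {
    let mut mac = HmacSha256::new_from_slice(secret).expect("HMAC accepts any key length");
    mac.update(body);
    mac.finalize().into_bytes().into()
}

// Verify a received tag in constant time before acting on the message.
fn verify_message(secret: &[u8; 32], body: &[u8], tag: &[u8]) -> bool {
    let mut mac = HmacSha256::new_from_slice(secret).expect("HMAC accepts any key length");
    mac.update(body);
    mac.verify_slice(tag).is_ok()
}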

Feature Flag

Flag                         Default   Effect
insecure-no-cluster-secret   off       Allow RaftNode::new / RaftNode::from_config without a cluster secret. Tests / demos only.

Configuration Example

Provide the 32-byte cluster secret (hex-encoded) to each node via an environment variable. Never embed it in the binary or commit it to version control.

use craton_hsm_cluster::config::{ClusterConfig, PeerConfig};
use craton_hsm_cluster::storage::FileStorage;
use craton_hsm_cluster::raft::RaftNode;
use std::path::PathBuf;

let cfg = ClusterConfig {
    node_id: "node-a".into(),
    peers: vec![
        PeerConfig { id: "node-b".into(), address: "10.0.0.2:7700".into() },
        PeerConfig { id: "node-c".into(), address: "10.0.0.3:7700".into() },
    ],
    cluster_secret_hex: Some(std::env::var("CRATON_CLUSTER_SECRET")?),
    allow_insecure: false,
    ..ClusterConfig::default()
};
cfg.validate()?;  // Fails fast if the secret is missing.

let storage = FileStorage::open(PathBuf::from("/var/lib/craton/raft"))?;
let node    = RaftNode::new(cfg, storage)?;
# Ok::<(), Box<dyn std::error::Error>>(())

Failover Behaviour

  • On leader loss, followers run a randomized election timeout and elect a new leader. Writes are unavailable for the duration of the election window. The timeout mechanism is sketched after this list.
  • Read traffic may be served from any node, but clients that require linearizability should route through the current leader or use a read-index probe.
  • A partitioned minority cannot make progress: Raft refuses to commit without a majority ack, which is the correct behaviour for an HSM — a split-brain HSM would silently diverge key state.
  • The health module tracks per-peer liveness and exposes a cluster-wide quorum-health indicator for upstream monitoring.
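
For context on the election window, the sketch below shows the classic randomized election timeout from the Raft paper. The 150-300 ms range is the paper's example, not this crate's default, and the function is illustrative rather than part of the crate.

use rand::Rng;
use std::time::Duration;

// Each follower waits a different randomized interval before starting an
// election, which keeps candidates from repeatedly splitting the vote after
// leader loss. The range shown is the Raft paper's example, not the crate's.
fn election_timeout() -> Duration {
    let ms: u64 = rand::thread_rng().gen_range(150..=300);
    Duration::from_millis(ms)
}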

Replication Events

replication wraps committed state-machine entries as discrete events with SHA-256 integrity checksums:

  • KeyCreated, KeyUpdated, KeyDeleted, KeyRotated
  • ConfigChanged

The state machine in the consuming binary applies these to its key store; Craton HSM plugs the craton-hsm-auth tenant / key database into that seam.
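
An illustrative shape for a checksummed event, assuming the sha2 crate; the enum variants, field names, and helper functions below are placeholders, not the crate's actual types.

use sha2::{Digest, Sha256};

// Hypothetical event shape. The crate's real types differ, but each committed
// event carries a SHA-256 checksum over its serialized payload so the consuming
// state machine can detect corruption before applying the change.
enum ReplicationEvent {
    KeyCreated { key_id: String, payload: Vec<u8>, checksum: [u8; 32] },
    KeyDeleted { key_id: String, checksum: [u8; 32] },
}

fn checksum(payload: &[u8]) -> [u8; 32] {
    Sha256::digest(payload).into()
}

fn verify(payload: &[u8], expected: &[u8; 32]) -> bool {
    checksum(payload) == *expected
}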

Limitations

  • Transport not bundled. See Network Requirements above.
  • Single-process FileStorage. Two processes writing to the same Raft directory will corrupt it. Keep one node process per state directory.
  • No online membership change protocol yet. No joint consensus or single-server membership changes. Reconfigure by restarting the cluster.
  • Snapshot compaction is manual. The state machine decides when to snapshot and truncate.