# Craton HSM: Cluster and Replication
The craton-hsm-cluster crate builds a multi-node Craton HSM cluster
with Raft consensus, authenticated inter-node RPCs, durable log and
snapshot storage, and a per-peer health subsystem. Tenants, keys, roles,
and configuration changes are replicated as state-machine events so
every node converges on the same view.
- Crate: `craton-hsm-cluster`
- License: BSL 1.1
- MSRV: Rust 1.75
- Safety: `#![deny(unsafe_code)]` at the crate root.
- Status: production-ready
## Modules

- `raft` — leader election, log replication, commit index tracking.
- `replication` — higher-level key-replication events (`KeyCreated`, `KeyUpdated`, `KeyDeleted`, `KeyRotated`, `ConfigChanged`) with per-event SHA-256 checksums.
- `state_machine` — applies committed log entries to the tenant / key state that the rest of the HSM reads from.
- `storage` — Raft `HardState` (term, voted-for) and log persistence. Two implementations ship: `InMemoryStorage` for tests, and `FileStorage` for single-process production use with atomic write-rename and `fsync` on every hard-state flush.
- `config` — cluster configuration struct.
- `health` — per-peer liveness tracking and cluster-wide quorum health.
## Quorum Sizing
Raft requires a majority of peers to elect a leader and commit entries. The practical sizing rule is:
| Cluster size | Can tolerate | Notes |
|---|---|---|
| 1 | 0 failures | Not a cluster; no fault tolerance. |
| 3 | 1 failure | Minimum recommended. |
| 5 | 2 failures | Common production sizing. |
| 7 | 3 failures | Diminishing returns past this point. |
Use odd peer counts. An even-sized cluster tolerates the same number of failures as the next-smaller odd size while adding one more node that must be coordinated with on every write.
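The sizing table reduces to simple integer arithmetic. A minimal sketch (these helpers are illustrative, not crate API):

```rust
/// Smallest number of peers that constitutes a Raft majority.
fn majority(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

/// Failures a cluster can absorb while still electing a leader
/// and committing entries.
fn tolerated_failures(cluster_size: usize) -> usize {
    cluster_size - majority(cluster_size)
}

fn main() {
    for n in [1, 3, 4, 5, 7] {
        println!(
            "{n} nodes: majority {}, tolerates {} failures",
            majority(n),
            tolerated_failures(n)
        );
    }
    // Note 4 nodes tolerate only 1 failure, the same as 3 nodes —
    // the arithmetic behind the odd-peer-count rule.
    assert_eq!(tolerated_failures(3), tolerated_failures(4));
}
```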
Mixing 0.1.0 and 0.1.1 nodes in the same cluster is unsupported
due to the HMAC replay-protection addition in 0.1.1. Roll a cluster to
a single version at a time.
## Network Requirements
- Low-latency, ideally single-AZ. Raft is a synchronous-replication protocol; leader commit latency is bounded by the slowest majority follower.
- Clock skew tolerance: ±30 s by default, configurable via `max_message_age_ms`.
- Wire encryption is out of scope for this crate. Deploy on an isolated network or tunnel inter-node traffic over mTLS. The crate provides authenticity and replay protection on the message payload; it does not encrypt it.
- Transport is pluggable but not bundled. The crate defines the trait surface and the authenticated message format; binary crates supply the concrete transport — typically `tokio::net` or a gRPC framework.
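Since only the trait surface is crate-defined, a binary's transport shim might look like the following sketch. The trait, struct, and method names here are hypothetical, not the crate's actual API; the in-memory loopback stands in for a real socket transport:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Opaque, already-HMAC-tagged wire message (hypothetical shape).
pub struct SignedMessage {
    pub from: String,
    pub payload: Vec<u8>, // serialized RPC plus HMAC tag
}

/// What a concrete transport would implement; names are illustrative.
pub trait Transport {
    type Error;

    /// Fire-and-forget send to a peer address such as "10.0.0.2:7700".
    fn send(&self, peer_addr: &str, msg: SignedMessage) -> Result<(), Self::Error>;

    /// Non-blocking receive of the next inbound message, if any.
    fn recv(&self) -> Result<Option<SignedMessage>, Self::Error>;
}

/// Test double: messages sent loop straight back to the local queue.
struct LoopbackTransport {
    queue: Mutex<VecDeque<SignedMessage>>,
}

impl Transport for LoopbackTransport {
    type Error = ();

    fn send(&self, _peer_addr: &str, msg: SignedMessage) -> Result<(), ()> {
        self.queue.lock().unwrap().push_back(msg);
        Ok(())
    }

    fn recv(&self) -> Result<Option<SignedMessage>, ()> {
        Ok(self.queue.lock().unwrap().pop_front())
    }
}

fn main() {
    let t = LoopbackTransport { queue: Mutex::new(VecDeque::new()) };
    t.send("10.0.0.2:7700", SignedMessage { from: "node-a".into(), payload: vec![1, 2] })
        .unwrap();
    let got = t.recv().unwrap().expect("one message queued");
    assert_eq!(got.from, "node-a");
}
```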
## Security Properties
- Authenticated RPCs. Every RPC carries an HMAC-SHA256 tag keyed with the cluster secret. Peers without the secret are silently ignored — no reply is returned — so unauthenticated nodes cannot provoke state changes or exfiltrate state.
- Replay protection. RPCs include nonces and monotonic per-source sequence numbers that peers track.
- Bounded log entries. Any single log entry larger than `MAX_LOG_ENTRY_BYTES` (16 MiB) is rejected at read time, so a corrupt or forged file cannot OOM the node.
- Bounded snapshots. Snapshots larger than `MAX_SNAPSHOT_BYTES` (1 GiB) are rejected before being mapped into memory.
- Fail-closed secret. If `cluster_secret_hex` is unset and `allow_insecure` is `false` (the default), cluster construction fails with a diagnostic. The `insecure-no-cluster-secret` feature exists for tests and demos only.
- Zeroizing secrets. The cluster secret is held in `Zeroizing<[u8; 32]>`.
- Snapshot integrity. `StorageError::SnapshotIntegrityFailure` surfaces on HMAC mismatch during snapshot load.
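Two of these properties — sequence-number replay rejection and size-bounded reads — can be sketched with the standard library alone. The HMAC verification step is omitted, and all names below are illustrative rather than the crate's actual types:

```rust
use std::collections::HashMap;
use std::io::{self, Read};

const MAX_LOG_ENTRY_BYTES: u64 = 16 * 1024 * 1024; // 16 MiB, as documented above

/// Replay rejection: accept a message only if its sequence number is
/// strictly greater than anything previously seen from that source.
#[derive(Default)]
struct ReplayGuard {
    last_seq: HashMap<String, u64>,
}

impl ReplayGuard {
    fn check_and_record(&mut self, source: &str, seq: u64) -> bool {
        match self.last_seq.get(source) {
            Some(&last) if seq <= last => false, // replayed or reordered: reject
            _ => {
                self.last_seq.insert(source.to_string(), seq);
                true
            }
        }
    }
}

/// Bounded read: validate the length prefix *before* allocating, so a
/// forged length cannot OOM the node.
fn read_entry<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 8];
    r.read_exact(&mut len_buf)?;
    let len = u64::from_le_bytes(len_buf);
    if len > MAX_LOG_ENTRY_BYTES {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "log entry exceeds MAX_LOG_ENTRY_BYTES",
        ));
    }
    let mut buf = vec![0u8; len as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() {
    let mut guard = ReplayGuard::default();
    assert!(guard.check_and_record("node-b", 1));
    assert!(!guard.check_and_record("node-b", 1)); // replay rejected

    // A forged 2^60-byte length is rejected without allocating.
    let forged = (1u64 << 60).to_le_bytes();
    assert!(read_entry(&mut forged.as_slice()).is_err());
}
```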
## Feature Flag
| Flag | Default | Effect |
|---|---|---|
| `insecure-no-cluster-secret` | off | Allow `RaftNode::new` / `RaftNode::from_config` without a cluster secret. Tests / demos only. |
## Configuration Example
Rotate a 32-byte cluster secret into each node via an environment variable. Never embed it in the binary or commit it to version control.
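For example, a fresh secret can be generated with OpenSSL (assuming `openssl` is available on the host):

```shell
# 32 random bytes, hex-encoded to 64 characters, exported for the node process.
export CRATON_CLUSTER_SECRET="$(openssl rand -hex 32)"
```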
```rust
use craton_hsm_cluster::config::{ClusterConfig, PeerConfig};
use craton_hsm_cluster::raft::RaftNode;
use craton_hsm_cluster::storage::FileStorage;
use std::path::PathBuf;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = ClusterConfig {
        node_id: "node-a".into(),
        peers: vec![
            PeerConfig { id: "node-b".into(), address: "10.0.0.2:7700".into() },
            PeerConfig { id: "node-c".into(), address: "10.0.0.3:7700".into() },
        ],
        cluster_secret_hex: Some(std::env::var("CRATON_CLUSTER_SECRET")?),
        allow_insecure: false,
        ..ClusterConfig::default()
    };
    cfg.validate()?; // Fails fast if the secret is missing.

    let storage = FileStorage::open(PathBuf::from("/var/lib/craton/raft"))?;
    let node = RaftNode::new(cfg, storage)?;
    Ok(())
}
```
## Failover Behaviour
- On leader loss, followers run a randomized election timeout and elect a new leader. Writes are unavailable for the duration of the election window.
- Read traffic may be served from any node, but clients that require linearizability should route through the current leader or use a read-index probe.
- A partitioned minority cannot make progress: Raft refuses to commit without a majority ack, which is the correct behaviour for an HSM — a split-brain HSM would silently diverge key state.
- The `health` module tracks per-peer liveness and exposes a cluster-wide quorum-health indicator for upstream monitoring.
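Randomized election timeouts are what break repeated split votes after a leader loss. A std-only sketch of the jitter (a real node would use a proper RNG; deriving jitter from the clock here is purely illustrative):

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Pick an election timeout in [min_ms, max_ms). Each follower draws
/// its own value, so one of them usually times out first and wins the
/// election before the others even become candidates.
fn election_timeout(min_ms: u64, max_ms: u64) -> Duration {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before Unix epoch")
        .subsec_nanos() as u64;
    let jitter = nanos % (max_ms - min_ms);
    Duration::from_millis(min_ms + jitter)
}

fn main() {
    // A commonly cited Raft range is 150–300 ms; writes are
    // unavailable for roughly one such window during failover.
    let t = election_timeout(150, 300);
    assert!(t >= Duration::from_millis(150));
    assert!(t < Duration::from_millis(300));
}
```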
## Replication Events

The `replication` module wraps committed state-machine entries as discrete events with SHA-256 integrity checksums:

- `KeyCreated`, `KeyUpdated`, `KeyDeleted`, `KeyRotated`
- `ConfigChanged`
The state machine in the consuming binary applies these to its key
store; Craton HSM plugs the craton-hsm-auth tenant / key database into
that seam.
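That seam looks roughly like the sketch below: the consuming binary matches on committed events and mutates its own store. The enum payload shapes and the store type are assumptions for illustration; the real events also carry SHA-256 checksums, omitted here:

```rust
use std::collections::HashMap;

// Event names mirror the list above; payload shapes are hypothetical.
enum ReplicationEvent {
    KeyCreated { id: String, material: Vec<u8> },
    KeyUpdated { id: String, material: Vec<u8> },
    KeyRotated { id: String, material: Vec<u8> },
    KeyDeleted { id: String },
    ConfigChanged { key: String, value: String },
}

#[derive(Default)]
struct KeyStore {
    keys: HashMap<String, Vec<u8>>,
    config: HashMap<String, String>,
}

impl KeyStore {
    /// Apply one committed event. This must be deterministic so that
    /// every node converges on the same state from the same log.
    fn apply(&mut self, ev: ReplicationEvent) {
        use ReplicationEvent::*;
        match ev {
            KeyCreated { id, material }
            | KeyUpdated { id, material }
            | KeyRotated { id, material } => {
                self.keys.insert(id, material);
            }
            KeyDeleted { id } => {
                self.keys.remove(&id);
            }
            ConfigChanged { key, value } => {
                self.config.insert(key, value);
            }
        }
    }
}

fn main() {
    let mut store = KeyStore::default();
    store.apply(ReplicationEvent::KeyCreated { id: "k1".into(), material: vec![1, 2, 3] });
    assert_eq!(store.keys.get("k1"), Some(&vec![1, 2, 3]));
    store.apply(ReplicationEvent::KeyDeleted { id: "k1".into() });
    assert!(store.keys.is_empty());
}
```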
## Limitations
- Transport not bundled. See Network Requirements above.
- Single-process `FileStorage`. Two processes writing to the same Raft directory will corrupt it; keep one node process per state directory.
- No online membership change protocol yet. No joint consensus or single-server membership changes. Reconfigure by restarting the cluster.
- Snapshot compaction is manual. The state machine decides when to snapshot and truncate.
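Because compaction is left to the state machine, a consuming binary typically applies a simple threshold policy and then truncates the applied prefix of the log. A minimal sketch (types and field names are illustrative, not crate API):

```rust
/// Minimal compaction sketch: after snapshotting through
/// `applied_index`, drop the log entries the snapshot now covers.
struct Log {
    snapshot_index: u64,   // highest index covered by the last snapshot
    entries: Vec<Vec<u8>>, // entries after snapshot_index, in order
}

impl Log {
    /// Threshold policy: snapshot once enough entries have piled up.
    fn should_snapshot(&self, threshold: usize) -> bool {
        self.entries.len() >= threshold
    }

    /// Truncate the prefix now covered by a snapshot at `applied_index`.
    fn compact(&mut self, applied_index: u64) {
        let covered = (applied_index - self.snapshot_index) as usize;
        self.entries.drain(..covered.min(self.entries.len()));
        self.snapshot_index = applied_index;
    }
}

fn main() {
    let mut log = Log { snapshot_index: 0, entries: vec![vec![1], vec![2], vec![3]] };
    assert!(log.should_snapshot(3));
    log.compact(2); // snapshot through index 2; entry 3 is retained
    assert_eq!(log.entries.len(), 1);
    assert_eq!(log.snapshot_index, 2);
}
```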
## Related Documents
- Authentication and RBAC — tenants and roles replicated via the Raft state machine.
- KMIP server — usually deployed in front of a clustered state machine.