Craton HSM

Cluster and Replication

The craton-hsm-cluster crate builds a multi-node Craton HSM cluster with Raft consensus, authenticated inter-node RPCs, durable log and snapshot storage, and a per-peer health subsystem. Tenants, keys, roles, and configuration changes are replicated as state-machine events so every node converges on the same view.

  • Crate: craton-hsm-cluster
  • License: BSL 1.1
  • MSRV: Rust 1.75
  • Safety: #![deny(unsafe_code)] at the crate root.
  • Status: production-ready

Modules

  • raft — leader election, log replication, commit index tracking.
  • replication — higher-level key-replication events (KeyCreated, KeyUpdated, KeyDeleted, KeyRotated, ConfigChanged) with per-event SHA-256 checksums.
  • state_machine — applies committed log entries to the tenant / key state that the rest of the HSM reads from.
  • storage — Raft HardState (term, voted-for) and log persistence. Two implementations ship:
    • InMemoryStorage for tests.
    • FileStorage for single-process production use with atomic write-rename and fsync on every hard-state flush (the pattern is sketched after this list).
  • config — cluster configuration struct.
  • health — per-peer liveness tracking and cluster-wide quorum health.
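
The atomic write-rename pattern behind FileStorage is the standard crash-safe persistence idiom. The sketch below shows its general shape, assuming a Unix-like filesystem; the file names and layout are illustrative and are not FileStorage's actual on-disk scheme.

use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

// Crash-safe write sketch: write to a temporary file, fsync it, atomically
// rename it over the destination, then fsync the directory so the rename
// itself is durable. Illustrative only; not FileStorage's actual layout.
fn atomic_write(dir: &Path, name: &str, bytes: &[u8]) -> std::io::Result<()> {
    let tmp = dir.join(format!("{name}.tmp"));
    let dst = dir.join(name);

    let mut f = File::create(&tmp)?;
    f.write_all(bytes)?;
    f.sync_all()?; // flush contents and metadata before the rename

    fs::rename(&tmp, &dst)?; // atomic replacement on POSIX filesystems
    File::open(dir)?.sync_all()?; // persist the rename (Unix-like systems)
    Ok(())
}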

Quorum Sizing

Raft requires a majority of peers to elect a leader and commit entries. The practical sizing rule is:

Cluster size   Can tolerate   Notes
1              0 failures     Not a cluster; no fault tolerance.
3              1 failure      Minimum recommended.
5              2 failures     Common production sizing.
7              3 failures     Diminishing returns past this point.

Use odd peer counts. An even-sized cluster tolerates no more failures than the next-smaller odd size while requiring a larger majority to acknowledge every write.
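
The majority arithmetic behind the table is simple integer math. The helpers below are illustrative only and are not part of the crate's API.

// Quorum size and tolerated failures for an n-node Raft cluster.
// Illustrative helpers, not part of craton-hsm-cluster's API.
fn quorum(n: usize) -> usize {
    n / 2 + 1 // e.g. 5 nodes -> 3 acknowledgements required
}

fn tolerated_failures(n: usize) -> usize {
    n - quorum(n) // e.g. 5 nodes -> 2 failures tolerated
}

assert_eq!(quorum(3), 2);
assert_eq!(tolerated_failures(4), 1); // no better than a 3-node cluster
assert_eq!(tolerated_failures(5), 2);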

Mixing 0.1.0 and 0.1.1 nodes in the same cluster is unsupported due to the HMAC replay-protection addition in 0.1.1. Roll a cluster to a single version at a time.

Network Requirements

  • Low-latency, ideally single-AZ. Raft replicates synchronously to a majority; leader commit latency is gated by the slowest follower needed to complete that majority.
  • Clock skew tolerance: ±30 s by default, configurable via max_message_age_ms.
  • Wire encryption is out of scope for this crate. Deploy on an isolated network or tunnel inter-node traffic over mTLS. The crate provides authenticity and replay protection on the message payload; it does not encrypt it.
  • Transport is pluggable but not bundled. The crate defines the trait surface and the authenticated message format. Binary crates supply the concrete transport, typically tokio::net or a gRPC framework; a rough framing sketch follows this list.
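
As a rough illustration of the kind of transport a binary crate might supply, the sketch below length-prefixes opaque, already-authenticated message bytes over TCP with tokio and enforces a frame-size cap. The framing, function names, and cap are assumptions for illustration; the crate's actual transport trait and message format are defined in its API.

use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

// Hypothetical length-prefixed framing for authenticated RPC payloads.
// The real transport trait lives in craton-hsm-cluster; this only shows one
// plausible way a binary crate could move the framed bytes between peers.
const MAX_FRAME_BYTES: usize = 16 * 1024 * 1024; // illustrative cap

async fn send_frame(stream: &mut TcpStream, payload: &[u8]) -> std::io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| std::io::Error::new(std::io::ErrorKind::InvalidInput, "frame too large"))?;
    stream.write_all(&len.to_be_bytes()).await?;
    stream.write_all(payload).await?;
    stream.flush().await
}

async fn recv_frame(stream: &mut TcpStream) -> std::io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf).await?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_BYTES {
        return Err(std::io::Error::new(std::io::ErrorKind::InvalidData, "frame exceeds cap"));
    }
    let mut payload = vec![0u8; len];
    stream.read_exact(&mut payload).await?;
    Ok(payload)
}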

Security Properties

  • Authenticated RPCs. Every RPC carries an HMAC-SHA256 tag keyed with the cluster secret. Peers without the secret are silently ignored (no reply is returned), so unauthenticated nodes cannot provoke state changes or exfiltrate state. A minimal tagging sketch follows this list.
  • Replay protection. RPCs include nonces and monotonic per-source sequence numbers that peers track.
  • Bounded log entries. Any single log entry larger than MAX_LOG_ENTRY_BYTES (16 MiB) is rejected at read time. A corrupt or forged file cannot OOM the node.
  • Bounded snapshots. Snapshots larger than MAX_SNAPSHOT_BYTES (1 GiB) are rejected before being mapped into memory.
  • Fail-closed secret. If cluster_secret_hex is unset and allow_insecure is false (the default), cluster construction fails with a diagnostic. The insecure-no-cluster-secret feature exists for tests and demos only.
  • Zeroizing secrets. The cluster secret is held in Zeroizing<[u8; 32]>.
  • Snapshot integrity. StorageError::SnapshotIntegrityFailure surfaces on HMAC mismatch during snapshot load.
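
To make the authenticity property concrete, here is a minimal tagging and verification sketch using the hmac and sha2 crates. The exact message layout (where the nonce and sequence number sit relative to the body) is an assumption; the crate defines the real format, and a verification failure simply produces no reply.

use hmac::{Hmac, Mac};
use sha2::Sha256;

type HmacSha256 = Hmac<Sha256>;

// Tag an outgoing message body with the 32-byte cluster secret.
fn tag_message(secret: &[u8; 32], body: &[u8]) -> [u8; 32] {
    let mut mac = HmacSha256::new_from_slice(secret).expect("HMAC accepts any key length");
    mac.update(body);
    mac.finalize().into_bytes().into()
}

// Verify a received tag in constant time before acting on the message.
fn verify_message(secret: &[u8; 32], body: &[u8], tag: &[u8]) -> bool {
    let mut mac = HmacSha256::new_from_slice(secret).expect("HMAC accepts any key length");
    mac.update(body);
    mac.verify_slice(tag).is_ok()
}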

Feature Flag

Flag                         Default   Effect
insecure-no-cluster-secret   off       Allow RaftNode::new / RaftNode::from_config without a cluster secret. Tests / demos only.

Configuration Example

Provide the 32-byte cluster secret (hex-encoded) to each node via an environment variable. Never embed it in the binary or commit it to version control.

use craton_hsm_cluster::config::{ClusterConfig, PeerConfig};
use craton_hsm_cluster::storage::FileStorage;
use craton_hsm_cluster::raft::RaftNode;
use std::path::PathBuf;

let cfg = ClusterConfig {
    node_id: "node-a".into(),
    peers: vec![
        PeerConfig { id: "node-b".into(), address: "10.0.0.2:7700".into() },
        PeerConfig { id: "node-c".into(), address: "10.0.0.3:7700".into() },
    ],
    cluster_secret_hex: Some(std::env::var("CRATON_CLUSTER_SECRET")?),
    allow_insecure: false,
    ..ClusterConfig::default()
};
cfg.validate()?;  // Fails fast if the secret is missing.

let storage = FileStorage::open(PathBuf::from("/var/lib/craton/raft"))?;
let node    = RaftNode::new(cfg, storage)?;
# Ok::<(), Box<dyn std::error::Error>>(())

Failover Behaviour

  • On leader loss, followers run a randomized election timeout and elect a new leader. Writes are unavailable for the duration of the election window. The timeout mechanism is sketched after this list.
  • Read traffic may be served from any node, but clients that require linearizability should route through the current leader or use a read-index probe.
  • A partitioned minority cannot make progress: Raft refuses to commit without a majority ack, which is the correct behaviour for an HSM — a split-brain HSM would silently diverge key state.
  • The health module tracks per-peer liveness and exposes a cluster-wide quorum-health indicator for upstream monitoring.
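
For context on the election window, the sketch below shows the classic randomized election timeout from the Raft paper. The 150-300 ms range is the paper's example, not this crate's default, and the function is illustrative rather than part of the crate.

use rand::Rng;
use std::time::Duration;

// Each follower waits a different randomized interval before starting an
// election, which keeps candidates from repeatedly splitting the vote after
// leader loss. The range shown is the Raft paper's example, not the crate's.
fn election_timeout() -> Duration {
    let ms: u64 = rand::thread_rng().gen_range(150..=300);
    Duration::from_millis(ms)
}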

Replication Events

replication wraps committed state-machine entries as discrete events with SHA-256 integrity checksums:

  • KeyCreated, KeyUpdated, KeyDeleted, KeyRotated
  • ConfigChanged

The state machine in the consuming binary applies these to its key store; Craton HSM plugs the craton-hsm-auth tenant / key database into that seam.
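
An illustrative shape for a checksummed event, assuming the sha2 crate; the enum variants, field names, and helper functions below are placeholders, not the crate's actual types.

use sha2::{Digest, Sha256};

// Hypothetical event shape. The crate's real types differ, but each committed
// event carries a SHA-256 checksum over its serialized payload so the consuming
// state machine can detect corruption before applying the change.
enum ReplicationEvent {
    KeyCreated { key_id: String, payload: Vec<u8>, checksum: [u8; 32] },
    KeyDeleted { key_id: String, checksum: [u8; 32] },
}

fn checksum(payload: &[u8]) -> [u8; 32] {
    Sha256::digest(payload).into()
}

fn verify(payload: &[u8], expected: &[u8; 32]) -> bool {
    checksum(payload) == *expected
}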

Limitations

  • Transport not bundled. See Network Requirements above.
  • Single-process FileStorage. Two processes writing to the same Raft directory will corrupt it. Keep one node process per state directory.
  • No online membership change protocol yet. No joint consensus or single-server membership changes. Reconfigure by restarting the cluster.
  • Snapshot compaction is manual. The state machine decides when to snapshot and truncate.