Craton HSM

Backup and Recovery

Craton HSM's persistent state lives in a small number of files that are easy to back up, but the confidentiality constraints are strict: private keys must never leave the cryptographic boundary in plaintext. This page specifies what to back up, how to back it up safely, and how to restore — on the same host, on a new host, and across versions.

What to back up

| Item | Location | Contains | Backup method |
|---|---|---|---|
| Encrypted object store | [token].storage_path (default craton_hsm_store/) | AES-256-GCM-wrapped keys and token objects | File copy from a quiesced daemon, or craton-hsm-admin backup |
| Configuration | craton_hsm.toml (path in CRATON_HSM_CONFIG) | Security policy, paths, algorithm toggles | Version control |
| TLS server material | [daemon].tls_cert, [daemon].tls_key | PEM cert + private key for the gRPC endpoint | Encrypted offline storage |
| Audit log archive | Rotated segments of [audit].log_path | Tamper-evident operation history | Immutable object storage with chain verification at ingest |
| SO and user PIN envelope | Out of band | Needed to unlock the restored store | Enterprise password vault, separate from the backup bundle |

Deliberately not backed up:

  • Raw plaintext key material. The store is already encrypted under a key derived from the SO PIN; extracting cleartext keys is not a supported operation.
  • Runtime state (session table, DRBG state). These are deliberately volatile.
  • Process logs beyond what is captured centrally by the SIEM.

Backup procedure

Prefer the admin CLI's backup command over a raw file copy: it produces a single archive, validates store consistency, and does not require stopping the daemon.

craton-hsm-admin backup --pin <SO_PIN> --output backup-YYYYMMDD.enc

The output file is already wrapped in the store's encryption envelope. For long-term storage, wrap it again with an organization-level escrow key (for example, a gpg --encrypt --recipient backup@example.com pipeline into an S3 bucket with Object Lock enabled).

A complete backup bundle includes:

  • The output of craton-hsm-admin backup.
  • The current craton_hsm.toml, with any path that leaks host-specific layout redacted.
  • Versioned TLS material (cert and key), separately encrypted.
  • A manifest: Craton HSM version, host identity, timestamp, SHA-256 of each file.
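The manifest item can be generated mechanically once the other files are in one directory. The following is a minimal sketch, not a Craton HSM feature: the write_manifest function, the MANIFEST.txt file name, and the "# key: value" header layout are all assumptions made for illustration. The hash lines use the format that sha256sum -c reads, so the same file can drive verification later.

```shell
#!/usr/bin/env bash
# Sketch only: write_manifest and the MANIFEST.txt layout are hypothetical,
# not part of Craton HSM.
# Usage: write_manifest <bundle_dir> <craton_hsm_version>
write_manifest() {
    local bundle_dir="$1" hsm_version="$2" tmp
    # Build the manifest outside the bundle so it never hashes itself.
    tmp="$(mktemp)" || return 1
    (
        cd "$bundle_dir" || exit 1
        printf '# craton_hsm_version: %s\n' "$hsm_version"
        printf '# host: %s\n' "$(uname -n)"
        printf '# timestamp: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        # One "<sha256>  <name>" line per regular file in the bundle.
        for f in *; do
            if [ -f "$f" ] && [ "$f" != "MANIFEST.txt" ]; then
                sha256sum -- "$f" || exit 1
            fi
        done
    ) > "$tmp" && mv "$tmp" "$bundle_dir/MANIFEST.txt"
}
```

For example, write_manifest ./bundle 0.9.1 would leave ./bundle/MANIFEST.txt next to the archive, config, and TLS files. Only top-level files are hashed in this sketch; a bundle with subdirectories would need a recursive walk.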

Raw file-copy alternative (when the CLI is unavailable)

Only when craton-hsm-admin is not usable (for example, a partial disk failure):

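# Stop the daemon so the store files are closed and consistent.
sudo systemctl stop craton-hsm
# Archive the store directory, then encrypt it (gpg writes store.tgz.gpg).
tar czf store.tgz -C /var/lib/craton-hsm store
gpg --encrypt --recipient backup@example.com store.tgz
# Restart the daemon as soon as the copy is complete.
sudo systemctl start craton-hsm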

Raw file copies require a brief stop to guarantee consistency. Never copy the redb file while the daemon holds it open: you will get a torn file.

Backup cadence

  • Before every configuration change. Snapshot the pre-change state before applying the change.
  • Before any upgrade. The backup taken before a major-version upgrade is the one you are most likely to need for rollback.
  • Daily, for production deployments that generate or destroy keys during business operations.
  • Immediately after any batch key import.

Restoration

Restore to the same host

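# Stop the daemon before touching the store.
sudo systemctl stop craton-hsm
# Quarantine the current store rather than deleting it.
mv /var/lib/craton-hsm/store /var/lib/craton-hsm/store.quarantine.$(date +%s)
# Restore from the backup, restart, and confirm the token is visible.
craton-hsm-admin restore --pin <SO_PIN> --input backup-YYYYMMDD.enc
sudo systemctl start craton-hsm
craton-hsm-admin token info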

The quarantined store is kept until the restored state is confirmed good. Delete it only after a successful smoke test of at least one signing operation against a known key.

Restore to a new host

Restoring to a new host is the disaster-recovery (DR) case. Prerequisites:

  • The same Craton HSM version (or a newer version explicitly supported by the migration guide — see below).
  • A base OS image that matches the original in libc, CPU architecture, and endianness.
  • The SO PIN, delivered separately from the backup bundle.

Procedure:

  1. Install Craton HSM per ../getting-started/installation.
  2. Install the config file from the backup bundle. Adjust storage_path and log_path if the filesystem layout differs; never change serial_number or label during a restore.
  3. Install TLS material at the paths referenced in the config, with correct ownership and mode 0600.
  4. Run craton-hsm-admin restore --pin <SO_PIN> --input backup-YYYYMMDD.enc.
  5. Start the daemon. Verify that the power-on self-test (POST) passes (journalctl -u craton-hsm | grep -i self-test).
  6. Run craton-hsm-admin token info and compare the object count and label against the backup manifest.
  7. Sign a known test payload with a known key and verify against a pre-computed signature.
  8. Re-enroll the host in load balancers, service meshes, or monitoring only after the verification succeeds.
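The mode requirement in step 3 is easy to get wrong when files are copied from a bundle, so it may be worth checking before the daemon starts. A minimal sketch, assuming GNU stat (the -c option); check_private_mode is a hypothetical helper, not a Craton HSM command, and it checks the mode only, not ownership.

```shell
#!/usr/bin/env bash
# Sketch only: check_private_mode is a hypothetical helper.
# Succeeds when <path> has mode 600; returns 1 on any other mode,
# 2 when the file cannot be inspected.
check_private_mode() {
    local path="$1" mode
    mode="$(stat -c '%a' "$path")" || return 2
    if [ "$mode" != "600" ]; then
        printf '%s has mode %s, expected 600\n' "$path" "$mode" >&2
        return 1
    fi
}
```

It would be called once per file referenced by [daemon].tls_cert and [daemon].tls_key, with the restore stopped on the first failure.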

Cross-version restore

The store format is stable within a minor series. For cross-version restores:

  • Patch-level upgrade (for example, 0.9.0 → 0.9.1). Backward-compatible. Restore directly. Review the configuration validation rules in the release notes — tightened validation can reject a previously valid config (see ../getting-started/installation and the 0.9.1 notes on path validation and PBKDF2 iteration floor).
  • Minor-version upgrade within a pre-1.0 line. Read the migration notes for the target version before restoring. Some fields may need to be updated (for example, pbkdf2_iterations must be at least 100 000 from 0.9.1 onward).
  • Major-version upgrade (to 1.0). PQC key serialization may change. Export PQC keys via C_WrapKey or generate a fresh keypair after the upgrade. Always keep the pre-upgrade backup for the rollback window.

Never restore a newer-version backup into an older-version daemon. Compatibility runs in one direction only: a newer daemon can read an older backup, not the reverse.
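The direction rule can be enforced with a small guard in front of the restore command. The sketch below assumes the backup's version comes from the bundle manifest and the daemon's version is supplied by the operator; version_le and restore_allowed are hypothetical helpers. It relies on GNU sort -V, whose ordering of pre-release suffixes may differ from semantic versioning, so it is best limited to plain x.y.z versions.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical, not part of Craton HSM.

# version_le A B: succeeds when version A sorts at or before version B.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# restore_allowed BACKUP_VERSION DAEMON_VERSION: succeeds only when the
# backup is not newer than the daemon that would read it.
restore_allowed() {
    if version_le "$1" "$2"; then
        return 0
    fi
    printf 'refusing: backup %s is newer than daemon %s\n' "$1" "$2" >&2
    return 1
}
```

A passing guard only rules out the forbidden direction. Whether an older backup restores cleanly into a given newer daemon still depends on the migration notes for that version.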

Storage and handling of backup bundles

  • Store backups on encrypted volumes. The backup file is encrypted, but defense-in-depth matters — compromise of the backup plus the SO PIN compromises the HSM.
  • Keep the SO PIN and the backup on separate systems with separate access-control paths. An operator who can read the bundle should not be able to read the PIN, and vice versa.
  • Use object storage with immutability (S3 Object Lock, GCS Bucket Lock) for ransomware resistance.
  • Retain backups per regulatory requirement; at minimum, keep the two most recent successful backups plus one from the current quarter.
  • Test restore at least quarterly. An untested backup is a hypothesis, not a backup.

Restoration drill

Run this drill quarterly on a non-production host, and before any major-version upgrade.

  1. Pull the most recent backup from offsite (not the local copy).
  2. Verify bundle integrity: decrypt and list the archive without extracting.
  3. Stand up a clean host with no prior Craton HSM state.
  4. Restore per the "Restore to a new host" procedure above.
  5. Sign a known test vector with a known key; compare against the pre-recorded signature.
  6. Verify audit-log continuity: the first entry in the restored log must chain from the last entry in the most recent archived segment.
  7. Record the wall-clock time from step 3 to step 5 as the observed RTO.
  8. Destroy the drill host's key material. Do not leave a drill artefact running.
  9. File the drill report with RTO, issues encountered, and any procedure updates.
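For step 2, listing the archive shows that it decrypts, but not that its contents are unchanged. One way to add that check is to recompute the SHA-256 values recorded in the bundle manifest. This is a sketch under stated assumptions: the manifest is a file named MANIFEST.txt inside the bundle directory, its hash lines are in sha256sum format, and any other lines start with "#". The verify_bundle name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: verify_bundle and the MANIFEST.txt layout are assumptions.
# Succeeds when every file listed in the manifest exists and matches its hash.
verify_bundle() {
    local bundle_dir="$1"
    (
        cd "$bundle_dir" || exit 1
        # Drop comment lines, then let sha256sum recompute and compare.
        grep -v '^#' MANIFEST.txt | sha256sum -c -
    )
}
```

This catches modified or missing files that the manifest lists. It does not flag extra files that were added to the bundle, and it proves nothing if the manifest itself could have been altered, so the manifest should be covered by the same escrow encryption or stored separately.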

See ./runbook for related operational tasks and ./troubleshooting for failure diagnosis during restore.