Backup and Recovery
Craton HSM's persistent state lives in a small number of files that are easy to back up, but the confidentiality constraints are strict: private keys must never leave the cryptographic boundary in plaintext. This page specifies what to back up, how to back it up safely, and how to restore — on the same host, on a new host, and across versions.
What to back up
| Item | Location | Contains | Backup method |
|---|---|---|---|
| Encrypted object store | [token].storage_path (default craton_hsm_store/) | AES-256-GCM-wrapped keys and token objects | craton-hsm-admin backup (preferred), or file copy from a stopped daemon |
| Configuration | craton_hsm.toml (path in CRATON_HSM_CONFIG) | Security policy, paths, algorithm toggles | Version control |
| TLS server material | [daemon].tls_cert, [daemon].tls_key | PEM cert + private key for the gRPC endpoint | Encrypted offline storage |
| Audit log archive | Rotated segments of [audit].log_path | Tamper-evident operation history | Immutable object storage with chain verification at ingest |
| SO and user PIN envelope | Out of band | Needed to unlock the restored store | Enterprise password vault, separate from the backup bundle |
Deliberately not backed up:
- Raw plaintext key material. The store is already encrypted under a key derived from the SO PIN; extracting cleartext keys is not a supported operation.
- Runtime state (session table, DRBG state), which is volatile by design.
- Process logs beyond what is captured centrally by the SIEM.
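The audit-log row in the table above calls for chain verification at ingest, and the restoration drill later checks the same property across the archive boundary. This page does not specify the Craton HSM audit-log format, so the sketch below assumes a hypothetical format with one entry per line, where each line begins with the SHA-256 of the previous line. The helpers verify_chain and line_hash are hypothetical names; bash and GNU coreutils are assumed.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes a HYPOTHETICAL audit line format "<prev_hash> <payload>",
# where prev_hash is the SHA-256 hex digest of the previous complete line and
# the first line of the whole log uses 64 zeros. Adapt to the real format.

# line_hash STRING: SHA-256 hex digest of STRING (no trailing newline hashed).
line_hash() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# verify_chain SEGMENT... : check the segments, in order, as one continuous
# chain. Set CHAIN_START to the hash of the last already-verified line to
# resume from an earlier segment instead of from the genesis value.
verify_chain() {
  local expected line seg n
  expected=${CHAIN_START:-$(printf '0%.0s' {1..64})}
  for seg in "$@"; do
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
      n=$((n + 1))
      if [ "${line%% *}" != "$expected" ]; then
        echo "chain break at line $n of $seg" >&2
        return 1
      fi
      expected=$(line_hash "$line")
    done < "$seg"
  done
  return 0
}
```

Passing the last archived segment and the restored log to the same call checks that the two form one unbroken chain, with no separate boundary logic needed.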
Backup procedure
Prefer the admin CLI's backup command over a raw file copy: it produces a single archive, validates store consistency, and does not require stopping the daemon.
```shell
craton-hsm-admin backup --pin <SO_PIN> --output backup-YYYYMMDD.enc
```
The output file is already wrapped in the store's encryption envelope. For long-term storage, wrap it again with an organization-level escrow key (for example, a gpg --encrypt --recipient backup@example.com pipeline into an S3 bucket with Object Lock enabled).
Complete a backup bundle by including:
- The output of craton-hsm-admin backup.
- The current craton_hsm.toml, with any paths that reveal host-specific layout redacted.
- Versioned TLS material (cert and key), separately encrypted.
- A manifest: Craton HSM version, host identity, timestamp, token label and object count, and the SHA-256 of each file.
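The manifest in the last bullet can be produced with coreutils alone. This is a minimal sketch assuming bash and GNU coreutils. The function names write_manifest and verify_manifest are hypothetical. The version string is passed in by the caller because this page does not name a command that prints it. Other fields, such as the token label and object count, can be appended as further "#" lines.

```shell
#!/usr/bin/env bash
# Sketch: build and check a backup-bundle manifest with coreutils only.
# Metadata lines start with '#' so they are filtered out before the checksum
# lines are verified.

# write_manifest VERSION OUTFILE FILE...
write_manifest() {
  local version=$1 out=$2
  shift 2
  {
    printf '# craton-hsm-version: %s\n' "$version"
    printf '# host: %s\n' "$(uname -n)"
    printf '# created-utc: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    sha256sum -- "$@"   # one "<hash>  <path>" line per bundle file
  } > "$out"
}

# verify_manifest MANIFEST: re-hash every listed file (paths are resolved
# relative to the current directory, as written) and fail on any mismatch.
verify_manifest() {
  grep -v '^#' -- "$1" | sha256sum --check --quiet -
}
```

Running verify_manifest from the bundle directory after retrieval also covers the "verify bundle integrity" part of the restoration drill.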
Raw file-copy alternative (when the CLI is unavailable)
Only when craton-hsm-admin is not usable (for example, a partial disk failure):
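```shell
#!/usr/bin/env bash
set -euo pipefail   # abort on the first failed step instead of encrypting a partial archive
sudo systemctl stop craton-hsm
trap 'sudo systemctl start craton-hsm' EXIT   # restart the daemon even if a step fails
tar czf store.tgz -C /var/lib/craton-hsm store
gpg --encrypt --recipient backup@example.com store.tgz   # writes store.tgz.gpg
```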
Raw file copies require a brief stop to guarantee consistency. Never copy the redb database file while the daemon holds it open: the copy can capture a partially written (torn) state.
Backup cadence
- Every configuration change — before applying it, snapshot the pre-change state.
- Before any upgrade — the major-version backup is the one you will actually need.
- Daily, for production deployments that generate or destroy keys during business operations.
- Immediately after any batch key import.
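A missed daily backup is easy to detect from the backup directory alone. The following is a minimal monitoring sketch. It assumes backups land in one directory under the backup-YYYYMMDD.enc naming used on this page, and that GNU find is available. backup_is_fresh is a hypothetical helper and is not part of craton-hsm-admin.

```shell
#!/usr/bin/env bash
# backup_is_fresh DIR [MAX_AGE_MINUTES]
# Succeeds if DIR contains at least one backup-*.enc file modified within the
# last MAX_AGE_MINUTES (default 1440, i.e. 24 hours). Requires GNU find.
backup_is_fresh() {
  local dir=$1 max=${2:-1440}
  find "$dir" -maxdepth 1 -type f -name 'backup-*.enc' -mmin "-$max" | grep -q .
}
```

A scheduled check can call backup_is_fresh on the backup directory and alert on a non-zero exit status. File modification time is used as the signal, so copies that reset timestamps will look fresh.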
Restoration
Restore to the same host
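```shell
sudo systemctl stop craton-hsm
# Keep the current store aside until the restored state is confirmed good.
mv /var/lib/craton-hsm/store "/var/lib/craton-hsm/store.quarantine.$(date +%s)"
craton-hsm-admin restore --pin <SO_PIN> --input backup-YYYYMMDD.enc
sudo systemctl start craton-hsm
# Check the token label and object count of the restored store.
craton-hsm-admin token info
```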
The quarantined store is kept until the restored state is confirmed good. Delete it only after a successful smoke test of at least one signing operation against a known key.
Restore to a new host
A new host restore is the DR case. Prerequisites:
- The same Craton HSM version (or a newer version explicitly supported by the migration guide — see below).
- A base OS image that matches the original in libc, CPU architecture, and endianness.
- The SO PIN, delivered separately from the backup bundle.
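The libc and architecture prerequisites above can be checked mechanically before restoring. This sketch assumes Linux with uname and getconf, and expected values recorded on the original host (for example, alongside the backup manifest). host_matches is a hypothetical helper. On Linux, uname -m names big- and little-endian variants distinctly (ppc64 vs ppc64le, for instance), so comparing it usually covers endianness as well. The libc comparison here is strict, requiring the exact version. Relax it if your policy only requires the same libc family.

```shell
#!/usr/bin/env bash
# host_matches EXPECTED_ARCH EXPECTED_LIBC
# EXPECTED_ARCH is `uname -m` from the original host; EXPECTED_LIBC is
# `getconf GNU_LIBC_VERSION` (e.g. "glibc 2.36"). On non-glibc systems getconf
# fails and the value is treated as "unknown".
host_matches() {
  local want_arch=$1 want_libc=$2 arch libc
  arch=$(uname -m)
  libc=$(getconf GNU_LIBC_VERSION 2>/dev/null || echo unknown)
  if [ "$arch" != "$want_arch" ]; then
    echo "arch mismatch: host=$arch backup=$want_arch" >&2
    return 1
  fi
  if [ "$libc" != "$want_libc" ]; then
    echo "libc mismatch: host=$libc backup=$want_libc" >&2
    return 1
  fi
}
```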
Procedure:
- Install Craton HSM per ../getting-started/installation.
- Install the config file from the backup bundle. Adjust storage_path and log_path if the filesystem layout differs; never change serial_number or label during a restore.
- Install TLS material at the paths referenced in the config, with correct ownership and mode 0600.
- Run craton-hsm-admin restore --pin <SO_PIN> --input backup-YYYYMMDD.enc.
- Start the daemon. Verify that the power-on self-test (POST) passes: journalctl -u craton-hsm | grep -i self-test.
- Run craton-hsm-admin token info and compare the object count and label against the backup manifest.
- Sign a known test payload with a known key and verify the result against a pre-computed signature.
- Re-enroll the host in load balancers, service meshes, or monitoring only after the verification succeeds.
Cross-version restore
The store format is stable within a minor series. For cross-version restores:
- Patch-level upgrade (for example, 0.9.0 → 0.9.1). Backward-compatible; restore directly. First review the configuration validation rules in the release notes, because tightened validation can reject a previously valid config (see ../getting-started/installation and the 0.9.1 notes on path validation and the PBKDF2 iteration floor).
- Minor-version upgrade within a pre-1.0 line. Read the migration notes for the target version before restoring. Some fields may need to be updated (for example, pbkdf2_iterations must be at least 100 000 from 0.9.1 onward).
- Major-version upgrade (to 1.0). PQC key serialization may change. Export PQC keys via C_WrapKey before upgrading, or generate fresh keypairs after the upgrade. Always keep the pre-upgrade backup for the rollback window.
Never restore a newer-version backup into an older-version daemon. Compatibility runs in one direction only: newer daemons read older stores, not the reverse.
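The cross-version rules above reduce to a small decision table. This sketch assumes plain MAJOR.MINOR.PATCH version strings with every component below 1000; pre-release suffixes are not handled. restore_advice is a hypothetical helper that only classifies the restore and does not call craton-hsm-admin. It treats every minor upgrade as requiring the migration notes, which is the conservative reading for post-1.0 lines that this page does not cover.

```shell
#!/usr/bin/env bash
# restore_advice BACKUP_VERSION DAEMON_VERSION
# Prints: refuse | major-upgrade | read-migration-notes | direct
# Returns 1 for "refuse" so a script can stop before restoring.
restore_advice() {
  local bmaj bmin bpat dmaj dmin dpat b d
  IFS=. read -r bmaj bmin bpat <<< "$1"
  IFS=. read -r dmaj dmin dpat <<< "$2"
  # Encode each version as one integer (assumes every component is < 1000).
  b=$(( bmaj * 1000000 + bmin * 1000 + bpat ))
  d=$(( dmaj * 1000000 + dmin * 1000 + dpat ))
  if (( b > d )); then
    echo refuse                 # newer backup into an older daemon
    return 1
  elif (( dmaj > bmaj )); then
    echo major-upgrade          # e.g. 0.9.x -> 1.0: PQC serialization may change
  elif (( dmin > bmin )); then
    echo read-migration-notes   # minor upgrade: check field changes first
  else
    echo direct                 # same version or patch-level upgrade
  fi
}
```

For example, restore_advice 0.9.0 0.9.1 prints direct, while restore_advice 0.9.1 0.9.0 prints refuse and exits non-zero.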
Storage and handling of backup bundles
- Store backups on encrypted volumes. The backup file is encrypted, but defense-in-depth matters — compromise of the backup plus the SO PIN compromises the HSM.
- Keep the SO PIN and the backup on separate systems with separate access-control paths. An operator who can read the bundle should not be able to read the PIN, and vice versa.
- Use object storage with immutability (S3 Object Lock, GCS Bucket Lock) for ransomware resistance.
- Retain backups per regulatory requirement; at minimum, keep the two most recent successful backups plus one from the current quarter.
- Test restore at least quarterly. An untested backup is a hypothesis, not a backup.
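The minimum retention rule can be applied mechanically. This sketch assumes the backup-YYYYMMDD.enc naming from the backup command earlier on this page, so lexical order is chronological, and bash. retention_keep is a hypothetical helper that only prints what to keep and deletes nothing. "One from the current quarter" is read here as the oldest backup dated in the current calendar quarter. If your policy means something else, change the anchor selection.

```shell
#!/usr/bin/env bash
# retention_keep TODAY_YYYYMMDD FILE...
# Prints, sorted, the backups to keep: the two newest, plus the oldest backup
# dated in the current calendar quarter (if there is one).
retention_keep() {
  local today=$1
  shift
  local month=$((10#${today:4:2}))
  local qstart f d anchor=""
  # First day of the current quarter, e.g. 20240515 -> 20240401.
  qstart=$(printf '%s%02d01' "${today:0:4}" $(( (month - 1) / 3 * 3 + 1 )))
  while IFS= read -r f; do
    d=${f##*backup-}
    d=${d%.enc}
    if (( d >= qstart && d <= today )); then
      anchor=$f
      break
    fi
  done < <(printf '%s\n' "$@" | sort)
  {
    printf '%s\n' "$@" | sort | tail -n 2
    if [ -n "$anchor" ]; then printf '%s\n' "$anchor"; fi
  } | sort -u
}
```

If no backup falls inside the current quarter, only the two newest are printed, which is itself a sign that the backup cadence has slipped.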
Restoration drill
Run this drill quarterly on a non-production host, and before any major-version upgrade.
1. Pull the most recent backup from offsite (not the local copy).
2. Verify bundle integrity: decrypt and list the archive without extracting.
3. Stand up a clean host with no prior Craton HSM state.
4. Restore per the "Restore to a new host" procedure above.
5. Sign a known test vector with a known key; compare against the pre-recorded signature.
6. Verify audit-log continuity: the first entry in the restored log must chain from the last entry in the most recent archived segment.
7. Record the wall-clock time from step 3 to step 5 as the observed RTO.
8. Destroy the drill host's key material. Do not leave a drill artefact running.
9. File the drill report with RTO, issues encountered, and any procedure updates.
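The interval from step 3 to step 5 is easiest to time by capturing epoch seconds at each boundary. The following is a small sketch assuming bash and date(1). format_rto is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Capture timestamps during the drill, for example:
#   t_start=$(date -u +%s)   # at the start of step 3
#   t_end=$(date -u +%s)     # when step 5 completes
# format_rto START_EPOCH END_EPOCH: print the elapsed time as e.g. 1h18m43s.
format_rto() {
  local s=$(( $2 - $1 ))
  if (( s < 0 )); then
    echo "end time is before start time" >&2
    return 1
  fi
  printf '%dh%02dm%02ds\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}
```

Calling format_rto "$t_start" "$t_end" yields a value that can be pasted directly into the drill report.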
See ./runbook for related operational tasks and ./troubleshooting for failure diagnosis during restore.