Craton HSM

Fork Safety

Craton HSM is a single-process, multi-threaded PKCS#11 module. A Unix fork(2) call is hostile to this design because it duplicates the parent's process state (every lock, open file descriptor, DRBG state, and audit-log hash) into a child that neither the parent nor the module can safely coordinate with.

This page explains why forking after C_Initialize is dangerous, how the module detects and rejects the resulting state, and how to run Craton HSM under popular pre-forking servers such as Apache prefork, Gunicorn, and Phusion Passenger.

Why Forking Breaks a Loaded HSM

After fork(2) returns in the child, the child has:

  • Stale mutex state. parking_lot::Mutex, DashMap shard locks, and RwLock all inherit the locked-or-unlocked state from the parent at the instant of the fork. Only the thread that called fork exists in the child, so if any other thread held a lock in the parent when fork ran, the child starts with that lock held and no thread to release it.
  • Shared file descriptors. The persistent database and the audit log are open in both processes. Concurrent appends from parent and child interleave bytes at the kernel level and corrupt the structure.
  • Duplicated DRBG state. The HmacDrbg state is byte-identical in parent and child. Absent intervention, both produce the same random output, which is catastrophic for nonces, salts, and keygen.
  • Broken audit chain. Two processes appending to the log with the same previous_hash produce divergent chains that cannot be verified.
  • Split object store. In-memory session objects are duplicated but not coordinated; a C_DestroyObject in one process does not affect the other.

Rust's type system does not help here — fork is a property of the process model below the language runtime. The only safe posture is to treat any PKCS#11 state in the child as invalid.
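The duplicated-DRBG hazard from the list above can be illustrated with a toy generator. This is a minimal sketch, not the module's HmacDrbg: ToyGenerator is a hypothetical, non-cryptographic stand-in, clone() plays the role of fork copying the state, and reseed() mixes in a fixed value where a real DRBG would pull fresh entropy from the operating system.

```rust
/// Hypothetical, NON-cryptographic stand-in for a DRBG (SplitMix64 step).
/// It exists only to show what a byte-identical state copy implies.
#[derive(Clone)]
pub struct ToyGenerator {
    state: u64,
}

impl ToyGenerator {
    pub fn from_seed(seed: u64) -> Self {
        ToyGenerator { state: seed }
    }

    /// Deterministic output: the same state always yields the same value.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Stand-in for re-seeding. A real DRBG would mix in fresh OS entropy;
    /// here a caller-supplied non-zero value is XORed into the state.
    pub fn reseed(&mut self, fresh: u64) {
        self.state ^= fresh;
    }
}
```

Cloning a ToyGenerator and calling next_u64() on both copies returns the same value, which is the situation a forked child is in. After reseed() with a non-zero value on one copy, the two sequences diverge. This is why the child must not reuse inherited generator state.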

Detection: PID Comparison

C_Initialize records the process ID of the caller in an internal AtomicU64. Every subsequent PKCS#11 call retrieves the current PID (getpid(2) on Unix, GetCurrentProcessId() on Windows) and compares it against the stored value.

Parent (PID 1234):
  C_Initialize -> stores 1234 -> CKR_OK
  C_OpenSession -> PID check: 1234 == 1234 -> proceeds

  fork()
    |
    +-- Parent (still PID 1234):
    |     C_Sign -> PID check: 1234 == 1234 -> proceeds
    |
    +-- Child (PID 5678):
          C_Sign -> PID check: 5678 != 1234
                 -> CKR_CRYPTOKI_NOT_INITIALIZED

The child is forced through C_Initialize again. That call resets the HSM core, re-seeds the DRBG, re-opens its own log and database handles, and stores its own PID. The parent's open state is not touched.
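The check described above can be sketched in a few lines of Rust. This is an illustration under assumptions, not the module's source: ForkGuard, record, check, and current_pid are hypothetical names, and the value 0 is assumed to mean "not initialised". The CKR_* values are the numeric codes from the PKCS#11 headers.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Return values as defined in the PKCS#11 headers.
pub const CKR_OK: u64 = 0x0000_0000;
pub const CKR_CRYPTOKI_NOT_INITIALIZED: u64 = 0x0000_0190;

/// Hypothetical guard holding the PID recorded by C_Initialize.
/// 0 is used here as "no process has initialised yet" (an assumption).
pub struct ForkGuard {
    init_pid: AtomicU64,
}

impl ForkGuard {
    pub const fn new() -> Self {
        ForkGuard { init_pid: AtomicU64::new(0) }
    }

    /// Called from C_Initialize: remember which process owns the state.
    pub fn record(&self, pid: u64) {
        self.init_pid.store(pid, Ordering::SeqCst);
    }

    /// Called at the top of every other entry point: a mismatch means the
    /// caller is a forked child (or the module was never initialised).
    pub fn check(&self, pid: u64) -> u64 {
        if self.init_pid.load(Ordering::SeqCst) == pid {
            CKR_OK
        } else {
            CKR_CRYPTOKI_NOT_INITIALIZED
        }
    }
}

/// Current process ID. std::process::id() wraps getpid(2) on Unix and
/// GetCurrentProcessId() on Windows.
pub fn current_pid() -> u64 {
    u64::from(std::process::id())
}
```

With a guard that recorded PID 1234, check(1234) returns CKR_OK and check(5678) returns CKR_CRYPTOKI_NOT_INITIALIZED, matching the diagram. Passing the PID as a parameter keeps the comparison testable without forking; production code would pass current_pid().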

This is the same approach taken by OpenSC's PKCS#11 module and is the behaviour recommended by the PKCS#11 specification's guidance on library initialisation. Applications that call fork and expect the child to continue using a PKCS#11 session from the parent are in violation of the specification regardless of which module they use.

Windows

fork(2) does not exist on Windows. CreateProcess creates a process with a fresh address space; each process loads craton_hsm.dll independently and runs its own C_Initialize. The PID check still runs as defence in depth against any future change to the Windows process model.

pthread_atfork

Craton HSM does not register pthread_atfork handlers. In principle a prepare/parent/child handler trio could acquire every internal lock before the fork and release it after, but the module uses lock primitives (parking_lot, DashMap) whose internals are not safe to traverse from a pthread_atfork callback, and the persistent storage layer uses fs2 file locks that do not survive fork safely either.

Rather than try to make the state survivable, the module rejects calls from the child. This is the only posture that provides deterministic behaviour across backends and platforms.

Persistent Database Protection

Beyond the PID check, the persistent store uses an advisory exclusive file lock. When persistence is enabled, C_Initialize calls fs2::FileExt::try_lock_exclusive() on the database file. If a second process (forked child or an unrelated process pointed at the same storage_path) tries to open the database, the lock fails and C_Initialize returns CKR_GENERAL_ERROR with a log entry:

Database at <path> is locked by another process

This covers the case where a forked child re-initialises and attempts to reopen the parent's database. Because the parent still holds the lock, the child's attempt fails before it can write to the store, and the operator gets a clear error that points to the deployment mistake.

Guidance for Forking Servers

Apache prefork MPM

The prefork MPM creates worker processes with fork(2). The PKCS#11 module loaded by the parent cannot be used by the workers.

  • Do not call C_Initialize from the parent. Configure any PKCS#11 consumer (mod_ssl, OpenSSL engine, PKCS#11 provider) so that initialisation happens at first use inside the worker.
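  • If a module loads eagerly at startup, switch the MPM to event or worker, which serve requests from threads inside each child process, or use a PKCS#11 provider that supports post-fork re-initialisation. The event and worker MPMs still fork their child processes from the parent, so initialisation must still happen in the child.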
  • Alternatively, point every worker at craton-hsm-daemon over gRPC and let the workers be short-lived clients that make no assumption about in-process state.

Gunicorn

Gunicorn uses a pre-fork model by default. Use the --preload flag only for modules that do not touch the HSM. For HSM integration:

  • Run without --preload and initialise the PKCS#11 provider lazily in each worker (inside the application factory or a post_worker_init hook).
  • Or run with the gthread or gevent worker classes, which handle concurrent requests with threads or greenlets inside each worker. The workers are still forked from the arbiter, so initialise inside the worker, not the arbiter.
  • Or use the gRPC daemon and let workers be gRPC clients. This is the recommended pattern for horizontal scaling.

Phusion Passenger

Passenger uses smart spawning by default — the application is loaded in a spawner process and workers are forked from it. This is the worst case for a PKCS#11 module: any state created by the spawner becomes invalid in the worker.

  • Disable smart spawning: set passenger_spawn_method conservative; in Nginx, or PassengerSpawnMethod conservative in Apache. This makes each worker a fresh process that initialises the module from scratch.
  • Conservative spawning has a higher startup cost; accept it, or run the daemon and let workers be gRPC clients to amortise startup.

General Rule

Initialise the PKCS#11 module after the last fork, not before. If you cannot guarantee that, use the gRPC daemon and let every process be an independent client.
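One way an application can follow this rule is to initialise lazily, on first use inside the worker. The sketch below assumes a hypothetical helper, with_lazy_init, that takes two closures: init stands in for the application's call to C_Initialize, and op for any other PKCS#11 call. Neither closure is part of Craton HSM's API.

```rust
/// Return values as defined in the PKCS#11 headers.
pub const CKR_OK: u64 = 0x0000_0000;
pub const CKR_CRYPTOKI_NOT_INITIALIZED: u64 = 0x0000_0190;

/// Hypothetical application-side helper.
/// Runs `op`; if the module reports that it is not initialised (first use
/// in this process, or first use after a fork), runs `init` once and then
/// retries `op` a single time.
pub fn with_lazy_init<I, O>(mut init: I, mut op: O) -> u64
where
    I: FnMut() -> u64,
    O: FnMut() -> u64,
{
    let rv = op();
    if rv != CKR_CRYPTOKI_NOT_INITIALIZED {
        return rv; // success, or an unrelated error: pass it through
    }
    let init_rv = init();
    if init_rv != CKR_OK {
        return init_rv; // initialisation failed; do not retry
    }
    op()
}
```

Session handles opened before the fork are not valid in the child, so op should open its own session rather than reuse one captured from the parent. If init fails, for example because another process holds the database lock, the error is returned unchanged so the deployment problem stays visible.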

Multi-Process Access Without Forking

Several supported patterns keep a single HSM state authoritative across many processes without involving fork:

Scenario | Pattern
Single application | Direct dlopen / LoadLibrary; one initialised module per process
Multiple applications on the same host | craton-hsm-daemon over gRPC; every application is a client
Containers / Kubernetes | Sidecar daemon per pod, or a daemonset with one daemon per node
Multiple hosts | One daemon per host, optionally joined into a cluster (enterprise)

The daemon serialises operations through a single HsmCore, maintains a single file lock on the database, and owns a single audit log with an unbroken chain. Clients may come and go freely; the daemon owns the state.
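The value of a single log owner can be shown with a toy hash chain. This sketch is not the module's audit format: Entry, append, and verify are hypothetical, and std's DefaultHasher stands in for a cryptographic hash purely to keep the example free of dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical log entry: each entry commits to the hash of its predecessor.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub previous_hash: u64,
    pub message: String,
    pub hash: u64,
}

/// Toy link hash. DefaultHasher is NOT cryptographic; it is a stand-in.
fn link_hash(previous_hash: u64, message: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    previous_hash.hash(&mut hasher);
    message.hash(&mut hasher);
    hasher.finish()
}

/// Appends an entry chained to the current tail (0 for an empty log).
pub fn append(log: &mut Vec<Entry>, message: &str) {
    let previous_hash = log.last().map_or(0, |entry| entry.hash);
    let hash = link_hash(previous_hash, message);
    log.push(Entry { previous_hash, message: message.to_string(), hash });
}

/// Walks the log and checks that every entry links to the one before it.
pub fn verify(log: &[Entry]) -> bool {
    let mut expected_previous = 0u64;
    for entry in log {
        let recomputed = link_hash(entry.previous_hash, &entry.message);
        if entry.previous_hash != expected_previous || entry.hash != recomputed {
            return false;
        }
        expected_previous = entry.hash;
    }
    true
}
```

If two writers each append to their own copy of the same log, both copies verify on their own. If both entries land in one shared file, the second entry's previous_hash no longer matches the tail and verify returns false. With the daemon as the only writer, every append sees the true tail.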

Invariants Summary

  1. A child process after fork must call C_Initialize before any other PKCS#11 function; any other call returns CKR_CRYPTOKI_NOT_INITIALIZED.
  2. Only one process may hold the persistent database open at a time.
  3. The gRPC daemon is the supported path for multi-process access.
  4. Each process that loads the library has isolated token state; there is no shared memory across processes.
  5. Applications that need key material in each worker should generate it inside the forked worker, not in the parent before the fork.

Further Reading