
Aether: beating gzip-9 with a neural state-space model

A diagonal linear SSM with 66 parameters, adaptive from scratch per block. 17.1% smaller than gzip-9 on the Silesia Corpus.

by Craton engineering

The state-of-the-art for general-purpose lossless compression has not moved much in a decade. gzip from 1993, bzip2 from 1996, xz from 2009, zstd from 2015. The underlying models — Huffman, BWT, LZMA — are older still. We wanted to know what it would take to meaningfully improve on them, in pure Rust, at speeds that are plausible for production.

Aether is the answer. On the Silesia Corpus (the 202 MiB, 12-file industry standard), it compresses the data to 26.45% of its original size. gzip-9 lands at 31.91%, bzip2-9 at 25.72%. Aether sits between the two, much closer to bzip2: its output is 17.1% smaller than gzip-9's and only 2.8% larger than bzip2-9's.

The approach

Traditional archivers couple modeling and coding tightly. Huffman assumes a specific probability structure; LZMA assumes another. Aether separates them. The modeling layer produces a probability distribution; the coding layer compresses against whatever it is handed.
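A minimal sketch of that split, with illustrative trait names (not Aether's public API): the model hands the coder a cumulative-frequency interval for each byte, and the coder neither knows nor cares which pipeline produced it.

```rust
/// Illustrative interfaces for the model/coder split.
/// Probabilities are cumulative frequencies out of 1 << 15, matching the
/// 15-bit CDF precision of the range coder described later in this post.
pub trait ByteModel {
    /// (low, high) cumulative-frequency interval for `byte` in the current
    /// context; `high - low` is the byte's probability mass out of 1 << 15.
    fn interval(&self, byte: u8) -> (u32, u32);
    /// Online update after the byte has been coded.
    fn update(&mut self, byte: u8);
}

pub trait EntropyCoder {
    /// Narrow the coder's interval to [low, high) out of 1 << 15.
    fn encode(&mut self, low: u32, high: u32);
}

/// The coding loop never knows where the distribution came from.
pub fn compress(model: &mut impl ByteModel, coder: &mut impl EntropyCoder, data: &[u8]) {
    for &b in data {
        let (lo, hi) = model.interval(b);
        coder.encode(lo, hi);
        model.update(b);
    }
}
```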

The modeling layer is the interesting part:

  1. Content-Defined Chunking with FastCDC (16 KiB minimum, 512 KiB average, 4096 KiB maximum chunk size). Keeps boundary stability across edits; a minimal cut-point sketch follows this list.
  2. Entropy analysis + content-type detection per chunk.
  3. Semantic solid grouping — files of the same type are compressed together.
  4. Adaptive routing — per chunk, pick the smallest of five pipelines: BWT+MTF+RLE+NeuralSSM+RangeCoding, LZ77+Predictor+RangeCoding, plain predictor + range coding, zstd-19 fallback, or store (see the routing sketch after this list).
  5. Neural SSM Predictor: a diagonal linear state-space model with 66 learnable parameters that adapts from scratch per block, with a hierarchical RLE predictor and an order-2 literal context blend.
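Content-defined chunking (step 1) finds cut points from the data itself, so an insertion early in a file only shifts the one chunk it lands in rather than re-aligning everything after it. Here is a sketch of the underlying idea using a gear rolling hash, the primitive FastCDC builds on; the table, mask, and thresholds are illustrative, not Aether's actual FastCDC parameters.

```rust
/// Illustrative gear table: any fixed, well-mixed byte -> u64 mapping works.
/// This one is generated with SplitMix64 steps.
fn gear_table() -> [u64; 256] {
    let mut t = [0u64; 256];
    let mut x = 0x9E37_79B9_7F4A_7C15u64;
    for e in t.iter_mut() {
        x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = x;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *e = z ^ (z >> 31);
    }
    t
}

/// Content-defined chunking sketch: cut where the rolling hash hits the mask,
/// subject to minimum and maximum chunk sizes. Returns chunk lengths.
fn chunk_lengths(data: &[u8], gear: &[u64; 256]) -> Vec<usize> {
    const MIN: usize = 16 * 1024;
    const MAX: usize = 4096 * 1024;
    const MASK: u64 = (1 << 19) - 1; // ~512 KiB expected distance to a cut

    let mut lengths = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let mut hash = 0u64;
        let mut i = start;
        let end = (start + MAX).min(data.len());
        while i < end {
            hash = (hash << 1).wrapping_add(gear[data[i] as usize]);
            i += 1;
            // Cut only after MIN bytes, when the hash lands on the mask.
            if i - start >= MIN && (hash & MASK) == 0 {
                break;
            }
        }
        lengths.push(i - start);
        start = i;
    }
    lengths
}
```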
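Adaptive routing (step 4) is brute-force in spirit: run the candidate pipelines on the chunk and keep whichever output is smallest, plus a small tag so the decompressor knows which path to invert. A self-contained sketch, with the pipelines abstracted as plain functions rather than Aether's real ones:

```rust
/// Pick whichever pipeline produces the smallest output for this chunk.
/// In Aether the candidates would be the five paths listed above; here the
/// slice of (name, encoder) pairs keeps the sketch self-contained.
fn route_chunk(chunk: &[u8], pipelines: &[(&str, fn(&[u8]) -> Vec<u8>)]) -> (usize, Vec<u8>) {
    let mut best: Option<(usize, Vec<u8>)> = None;
    for (id, (_name, encode)) in pipelines.iter().enumerate() {
        let out = encode(chunk);
        if best.as_ref().map_or(true, |(_, b)| out.len() < b.len()) {
            best = Some((id, out));
        }
    }
    // The store fallback guarantees at least one candidate in practice.
    best.expect("pipeline list must not be empty")
}

/// Store path: incompressible data costs only the route tag.
fn store(chunk: &[u8]) -> Vec<u8> {
    chunk.to_vec()
}

fn main() {
    let chunk = b"example chunk of data, example chunk of data";
    let pipelines: &[(&str, fn(&[u8]) -> Vec<u8>)] = &[("store", store)];
    let (route, bytes) = route_chunk(chunk, pipelines);
    println!("chose pipeline {route}, {} bytes", bytes.len());
}
```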

The final coder is a custom byte-aligned range coder with 15-bit CDF precision. Every piece is measured; every piece earns its place in the pipeline.
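For reference, this is the general shape of a byte-aligned range encoder driven by power-of-two CDFs. It follows the well-known LZMA-style carry-handling scheme and is a sketch, not Aether's actual coder.

```rust
/// Byte-aligned range encoder sketch with 15-bit CDFs (LZMA-style carries).
pub struct RangeEncoder {
    low: u64,        // working low bound; bit 32 holds a pending carry
    range: u32,      // current interval width
    cache: u8,       // last byte held back in case a carry arrives
    cache_size: u64, // the held-back byte plus the run of 0xFF bytes behind it
    out: Vec<u8>,
}

impl RangeEncoder {
    pub fn new() -> Self {
        Self { low: 0, range: u32::MAX, cache: 0, cache_size: 1, out: Vec::new() }
    }

    /// Encode a symbol whose cumulative interval is [cum_lo, cum_hi)
    /// out of a total of 1 << 15.
    pub fn encode(&mut self, cum_lo: u32, cum_hi: u32) {
        let r = self.range >> 15;
        self.low += (r as u64) * (cum_lo as u64);
        self.range = r * (cum_hi - cum_lo);
        // Byte-aligned renormalization: ship a byte whenever the
        // interval width drops below 2^24.
        while self.range < (1 << 24) {
            self.shift_low();
            self.range <<= 8;
        }
    }

    fn shift_low(&mut self) {
        if self.low < 0xFF00_0000 || self.low > 0xFFFF_FFFF {
            // The held-back bytes can no longer change: flush them,
            // adding the carry bit (0 or 1) sitting in bit 32 of `low`.
            let carry = (self.low >> 32) as u8;
            self.out.push(self.cache.wrapping_add(carry));
            for _ in 1..self.cache_size {
                self.out.push(0xFFu8.wrapping_add(carry));
            }
            self.cache = (self.low >> 24) as u8;
            self.cache_size = 0;
        }
        self.cache_size += 1;
        self.low = (self.low << 8) & 0xFFFF_FFFF;
    }

    /// Flush the remaining state and return the coded bytes.
    pub fn finish(mut self) -> Vec<u8> {
        for _ in 0..5 {
            self.shift_low();
        }
        self.out
    }
}
```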

Why neural, and why 66 parameters

The "neural" label is loaded. We are not doing a transformer. We are using a tiny diagonal SSM with 32 exponential moving averages at timescales from 0.5 to 0.999, feeding binary classifiers, mixed adaptively with a hierarchical RLE predictor.

66 parameters is a deliberate choice. Fewer than a chess-engine evaluation function. Small enough to update in the inner loop with no GPU and no dependencies. Adaptive enough that it re-learns the source statistics from scratch for each block — no pre-trained weights, no model download, no non-determinism.
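To make that concrete, here is the kind of predictor being described: a bank of fixed-timescale EMAs of the input bits (the diagonal state), plus a logistic readout trained online by gradient descent as the block is coded. The layout and parameter count below are illustrative; this post does not spell out how the 66 parameters break down, and the constants here are ours.

```rust
/// Sketch of a tiny diagonal-SSM bit predictor. Only the readout weights and
/// bias are "learned"; the 32 decay rates are fixed timescales.
struct SsmBitPredictor {
    decay: [f32; 32],  // fixed timescales, spread from 0.5 to 0.999
    state: [f32; 32],  // h[i] <- decay[i] * h[i] + (1 - decay[i]) * bit
    w: [f32; 32],      // learned readout weights
    bias: f32,         // learned bias
}

impl SsmBitPredictor {
    fn new() -> Self {
        let mut decay = [0.0f32; 32];
        for (i, d) in decay.iter_mut().enumerate() {
            // Geometric spacing between 0.5 and 0.999.
            let t = i as f32 / 31.0;
            *d = 0.5 * (0.999f32 / 0.5).powf(t);
        }
        Self { decay, state: [0.0; 32], w: [0.0; 32], bias: 0.0 }
    }

    /// Probability that the next bit is 1.
    fn predict(&self) -> f32 {
        let z: f32 = self.bias
            + self.state.iter().zip(&self.w).map(|(h, w)| h * w).sum::<f32>();
        1.0 / (1.0 + (-z).exp())
    }

    /// Online update after seeing the actual bit: one logistic-loss gradient
    /// step on the readout, then advance the diagonal state.
    fn update(&mut self, bit: u8, p: f32, lr: f32) {
        let err = bit as f32 - p;
        for (w, h) in self.w.iter_mut().zip(&self.state) {
            *w += lr * err * h;
        }
        self.bias += lr * err;
        for (h, d) in self.state.iter_mut().zip(&self.decay) {
            *h = d * *h + (1.0 - d) * bit as f32;
        }
    }
}
```

In the compressor, a probability like this would be turned into a two-symbol CDF out of 1 << 15 and handed to the range coder, with the hierarchical RLE predictor and the order-2 literal context mixed in at the probability level.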

Where the cost goes

Compression runs at ~1.3 MiB/s on our internal corpus in the current release. Breakdown: BWT sort ~50%, neural SSM predict+update ~25%, range coding ~15%, everything else ~10%. The BWT sort is the obvious next optimization target; we have a plan that we haven't yet landed.

What it is good for, and what it is not

Good: cold archives, code repositories, text-heavy datasets, backups, anywhere the decompression happens rarely and the size matters. Also: situations where you want a reproducible single-binary pure-Rust compression stack.

Not good (yet): real-time compression of high-throughput streams. That is zstd's home turf, and we are not trying to displace it there.

Source: github.com/craton-co/aether-arch. Six crates: core, CLI, FFI (cbindgen), REST API, Wasm (decompress-only), Python (PyO3). Encryption: AES-256-GCM / ChaCha20-Poly1305 with Argon2id KDF.