Aether
Changelog
All notable changes to AetherArch will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Added
- Placeholder for new additions
[0.3.0-rc1] - TBD
Planned
- Archive splitting and spanning: multi-part archives across multiple files/disks
- Archive repair: parity block generation and recovery tool
- Format Freeze: finalize `.aet` binary format specification for 1.0
- Third-party security audit and FIPS 140-2 evaluation
[0.2.3] - 2026-03-18
Added
- Direct CDF override: `predict_cdf()` overrides on `NeuralSsmPredictor` and `Order0Model` build CDF tables in-place without `probs_to_cdf()` conversion; 2.6-3.4× predictor speedup
- Division→multiplication optimization: Precomputed reciprocals in `RlePredictor::predict()` and `NeuralSsmPredictor::predict()`/`predict_cdf()`, replacing 254 divisions per call with 1 division + 254 multiplies; 10% SSM predictor speedup, +20% end-to-end throughput
- LTO + codegen-units=1: Release profile optimizations for marginal (<5%) gains
- CLA Assistant: GitHub Actions workflow for contributor license agreement enforcement
- Dependabot action bumps: `actions/checkout` v6, `actions/upload-artifact` v7, `actions/download-artifact` v8, `dtolnay/rust-toolchain` 1.100.0
- Enlarged internal corpus: Test fixtures expanded from 87.1 KiB to 2.6 MiB (english.txt 2 MiB, source.rs 315 KiB, mixed.json 299 KiB) for more representative speed benchmarks
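The division→multiplication entry above can be illustrated with a minimal sketch. The function names and the 256-entry weight array are assumptions made for illustration; they are not AetherArch's actual predictor code.

```rust
/// Baseline (hypothetical): normalize weights with one division per element.
fn normalize_div(weights: &[f32; 256]) -> [f32; 256] {
    let sum: f32 = weights.iter().sum();
    let mut out = [0.0f32; 256];
    for (o, w) in out.iter_mut().zip(weights.iter()) {
        *o = *w / sum;
    }
    out
}

/// Same normalization with a single division: compute the reciprocal once,
/// then multiply each element by it.
fn normalize_recip(weights: &[f32; 256]) -> [f32; 256] {
    let sum: f32 = weights.iter().sum();
    let inv = 1.0 / sum; // the only division in this function
    let mut out = [0.0f32; 256];
    for (o, w) in out.iter_mut().zip(weights.iter()) {
        *o = *w * inv;
    }
    out
}
```

The two versions can differ in the last bit of a floating-point result, so in a compressor the encoder and decoder would need to use the same variant to stay in sync.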
Fixed
- Content-type detection: Fixed type mismatch in `analyzer.rs` text detection (`contains` call), improving routing decisions on Silesia corpus
Changed
- Silesia ratio: 26.45% (was 29.12% in 0.2.2) — improved content-type detection leads to better routing
- Silesia speed: 0.2 MB/s compress, 0.3 MB/s decompress (889s on 202 MiB)
- Internal corpus: 2.75% ratio (0.220 bpb) on enlarged 2.6 MiB corpus; 1.0 MB/s compress, 2.1 MB/s decompress
Investigated & Reverted
- AVX2/SIMD vectorization on SSM hot loops: CPU frequency penalty negates gains
- `fast_ln()` polynomial approximation in SSM mixer: ratio regression (+0.17%)
- 16 order-2 context buckets (was 8): count fragmentation on small corpus (+0.10%)
- Unrolled binary search in `decode_cdf()`: 16% slower due to instruction cache pressure
- SA-based BWT via SA-IS: mathematically incorrect for cyclic rotations
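The changelog does not spell out why the SA-IS approach was incorrect. One plausible reading is that a suffix array sorts suffixes, while a cyclic BWT sorts rotations, and the two orders can disagree when the input has no unique sentinel. The sketch below uses naive sorting and hypothetical function names purely to show that disagreement; it is not the project's BWT code.

```rust
/// Start indices of all cyclic rotations of `s`, in lexicographic order (naive).
fn sorted_rotations(s: &[u8]) -> Vec<usize> {
    let n = s.len();
    let mut idx: Vec<usize> = (0..n).collect();
    idx.sort_by(|&a, &b| {
        let ra = (0..n).map(|k| s[(a + k) % n]);
        let rb = (0..n).map(|k| s[(b + k) % n]);
        ra.cmp(rb)
    });
    idx
}

/// Start indices of all suffixes of `s`, in lexicographic order (naive).
fn sorted_suffixes(s: &[u8]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..s.len()).collect();
    idx.sort_by(|&a, &b| s[a..].cmp(&s[b..]));
    idx
}

/// Byte that cyclically precedes each start index, i.e. the BWT last column.
fn last_column(s: &[u8], order: &[usize]) -> Vec<u8> {
    let n = s.len();
    order.iter().map(|&i| s[(i + n - 1) % n]).collect()
}
```

For the input `bab`, the rotation order is `[1, 0, 2]` and yields the last column `bba`, while the suffix order is `[1, 2, 0]` and yields `bab`.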
[0.2.2] - 2026-02-26
Added
- Enhanced enterprise gating: `threading` and `cloud` modules now gated behind the `enterprise` feature flag
- Examples directory: `basic_compress.rs`, `basic_decompress.rs`, `streaming_extract.rs` in `aether-core/examples/`
- Benchmark comparison: `aet bench --compare` flag for external tool comparison (gzip, bzip2, xz, zstd)
- Wasm crate (`aether-wasm`): wasm-bindgen decompress-only API (`verify`, `list_files`, `extract_file`)
- Performance optimization: `#[inline]` on predictor hot paths, precomputed `a_inv` in NeuralSSM EMA loop
Changed
- BENCHMARKS.md: Updated with 0.2.1 numbers, external tool comparison tables, 179 tests
[0.2.1] - 2026-02-13
Added
- Compression analytics: `CompressionAnalytics` struct with per-method block counts, byte distributions, group stats, timing; CLI `--analytics` flag
- Dictionary pretraining: `Dictionary` module for training/saving/loading `.aed` files; `save_state()`/`load_state()` on `ProbabilityPredictor` trait; implemented for Order0, RLE, NeuralSSM; `FLAG_HAS_DICTIONARY` header flag; `Compressor/Decompressor::with_dictionary()`; CLI `aet train` command and `--dictionary` flag on compress/extract
- Archive migration tool: `Migrator` struct in `pipeline::migrate` for decompress→recompress with new settings (predictor change, dictionary, encryption); CLI `aet migrate` command
- REST API server (`aether-server` crate): `axum`-based HTTP server with `/compress`, `/extract`, `/verify`, `/list`, `/health`, `/version` endpoints; multipart upload; configurable port and max upload size
- Cloud storage adapters: `StorageBackend` trait with `CloudReader` (Read+Seek via range requests); S3, GCS, Azure Blob stub implementations; URL parser for `s3://`, `gs://`, `az://` schemes; 5 unit tests
- C FFI crate (`aether-ffi`): `aether.h` header via cbindgen, lifecycle/compression/decompression/error APIs, 10 unit tests
- Python bindings (`aether-python`): PyO3 module with `compress()`, `extract()`, `verify()`, `list_files()`, encryption support, type stubs
- Encryption (enterprise feature): AES-256-GCM and ChaCha20-Poly1305 with Argon2id KDF, per-block nonces, 57-byte EncryptionHeader, CLI `--password`/`--cipher` flags, 18 tests
- Multi-threaded decompression (enterprise feature): two-phase (sequential I/O → parallel CPU) across solid groups, `Decompressor::with_max_threads()`, CLI `--threads` flag, 5 integration tests
- Pure-Rust suffix array: replaced `cdivsufsort` C binding with `divsufsort` crate (eliminates unsafe FFI)
- Criterion benchmarks: roundtrip, BWT, range coder, and predictor benchmarks (`cargo bench -p aether-core`)
- ROADMAP.md with phased production readiness plan
- CHANGELOG.md (this file)
- CONTRIBUTING.md with development guidelines
- SECURITY.md with responsible disclosure policy
- Apache-2.0 license
- GitHub Actions CI pipeline (test, clippy, fmt, audit)
- `deny.toml` for cargo-deny license and CVE auditing
- `#[non_exhaustive]` on all public enums for forward compatibility
- Bounds checking on archive-supplied sizes before memory allocation
- `MAX_DECOMPRESSED_BLOCK_SIZE` (64 MiB) safety cap
- `MAX_FILE_COUNT` and `MAX_BLOCK_COUNT` sanity limits
- Named constant `BWT_DECISIVE_RATIO` (was magic 0.55)
- MSRV policy: Rust 1.75.0
- `enterprise` feature flag for future gated features
- Comprehensive `///` rustdoc on all public API items
- `MAX_BWT_INPUT_SIZE` (8 MiB) guard to prevent OOM from BWT doubled-text allocation
- Bounds checks on `encoded_len`/`lz_len` in decompressor (crafted archive defense)
- `ContextMixer` memory warning rustdoc for parallel compression workloads
- Format stability warning in `lz_preprocess.rs` module docs
- Fuzz targets: `fuzz_block_header`, `fuzz_streaming_metadata`, `fuzz_decode_block`, `fuzz_range_coder`
- `PredictorId::Rle` variant (0x0005): `RlePredictor` now has its own predictor ID
- `context-mixer` feature flag: gates `ContextMixer`, `Lz4AwarePredictor`, and related tests
- `lz4` feature flag: gates `lz_preprocess` module and `lz4_flex` dependency
- Memory backpressure: `Compressor::with_max_threads()` limits concurrent group compression (default 4)
- `#[ignore]` on slow hyperparameter sweep tests
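The bounds-checking entries above describe validating archive-supplied sizes before allocating. A minimal sketch of that pattern follows; only the constant name and its 64 MiB value come from this changelog, while `SizeError` and `alloc_block_buffer` are hypothetical names chosen for illustration.

```rust
/// Safety cap named in the changelog (64 MiB).
const MAX_DECOMPRESSED_BLOCK_SIZE: u64 = 64 * 1024 * 1024;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum SizeError {
    TooLarge { declared: u64, limit: u64 },
}

/// Hypothetical helper: check a size read from an untrusted header
/// before reserving memory for it.
fn alloc_block_buffer(declared_len: u64) -> Result<Vec<u8>, SizeError> {
    if declared_len > MAX_DECOMPRESSED_BLOCK_SIZE {
        return Err(SizeError::TooLarge {
            declared: declared_len,
            limit: MAX_DECOMPRESSED_BLOCK_SIZE,
        });
    }
    // At most 64 MiB here, so the cast cannot truncate on 32- or 64-bit targets.
    Ok(Vec::with_capacity(declared_len as usize))
}
```

Rejecting the size before the allocation means a crafted header claiming an enormous block fails with an error instead of exhausting memory.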
Changed
- `MAX_DECODE_SIZE` in range coder now aligned with `MAX_DECOMPRESSED_BLOCK_SIZE` (was hardcoded 16 MiB)
- Router sync section: replaced redundant `encode_block` re-encode with `sync_predictor` for LZ77 and PredictorRans paths (eliminates double-encode, fixes cross-block state contamination)
- Decompressor: added `sync_predictor` calls after LZ77/LZ4/PredictorRans decode to maintain symmetric cross-block state with compressor
- Error messages in `decompress_block`/`decompress_block_streaming` now include block ID, archive offset, group ID, and compression method
- `bwt_mtf_encode_parts`/`bwt_mtf_encode` now return `Result` (was infallible)
- `lz4_flex` pinned to `=0.11.3` (format stability)
- Split `decompress.rs` into `decompress.rs` (shared types), `decompress_seekable.rs`, and `decompress_streaming.rs`; public API unchanged
- `lz4_flex` is now an optional dependency (enabled via `lz4` feature, in `default`)
- Parallel compression uses bounded `rayon::ThreadPoolBuilder` instead of global pool
[0.1.8] - 2026-01-20
Added
- Streaming decompression (`Read`-only path, no `Seek` required)
  - `read_metadata_streaming()` for sequential metadata parsing
  - `extract_all_streaming()`/`extract_with_streaming_metadata()` two-phase API
  - `verify_streaming()`/`list_files_streaming()` full streaming support
  - Per-group predictors via `HashMap<u32, Box<dyn ProbabilityPredictor>>`
  - CLI: `"-"` stdin sentinel for Extract, List, Verify commands
- Skip `sync_predictor` during decompression optimization
  - `predictor_state_flag` in BlockHeader (byte 13)
  - `CompressedChunk.predictor_synced` tracking during compression
  - Backward compatible: old archives always sync (same as 0.1.7)
- 9 new integration tests: streaming roundtrip (4), streaming verify (2), streaming list (1), metadata detection (1), two-phase extraction (1)
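The per-group predictor map mentioned above can be sketched as follows. The trait here is a stand-in with two made-up methods, and `Order0` and `predictor_for` are hypothetical; the real `ProbabilityPredictor` trait has a different, richer interface.

```rust
use std::collections::HashMap;

/// Stand-in trait for this sketch only.
trait ProbabilityPredictor {
    fn update(&mut self, byte: u8);
    fn count(&self, byte: u8) -> u32;
}

/// Toy order-0 frequency counter used as the per-group model.
struct Order0 {
    counts: [u32; 256],
}

impl ProbabilityPredictor for Order0 {
    fn update(&mut self, byte: u8) {
        self.counts[byte as usize] += 1;
    }
    fn count(&self, byte: u8) -> u32 {
        self.counts[byte as usize]
    }
}

/// Return the predictor for `group_id`, creating a fresh one on first use,
/// so that state learned in one group never leaks into another.
fn predictor_for(
    map: &mut HashMap<u32, Box<dyn ProbabilityPredictor>>,
    group_id: u32,
) -> &mut Box<dyn ProbabilityPredictor> {
    map.entry(group_id)
        .or_insert_with(|| Box::new(Order0 { counts: [0; 256] }))
}
```

Keying the map by group ID lets blocks from different groups arrive interleaved in a sequential stream while each group keeps its own model state.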
[0.1.7] - 2025-11-15
Changed
- Custom range coder replacing `constriction` crate (LZMA-style carry-propagating encoder + subtraction-based decoder)
- `predict_cdf()` method on `ProbabilityPredictor` trait (returns `[u16; 257]` CDF)
- Chunk sizes: MIN 4 KiB → 16 KiB, AVG 64 KiB → 512 KiB, MAX 512 KiB → 4096 KiB
- RLE decoder hardening: saturating arithmetic, MAX_DECODE_SIZE=16 MiB guard
- Extract command now shows MiB/s speed
- `wrapping_mul` in LCG test generators, mid-payload corruption offsets
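The last entry concerns test helpers. A minimal sketch of an LCG-based test-data generator using `wrapping_mul`, plus a mid-payload corruption helper, might look like this. The type and function names are hypothetical, and the multiplier and increment are Knuth's MMIX constants, chosen here for illustration rather than taken from the project's tests.

```rust
/// Hypothetical 64-bit linear congruential generator for reproducible test data.
struct Lcg(u64);

impl Lcg {
    fn next_byte(&mut self) -> u8 {
        // Plain `*` would panic on overflow in debug builds;
        // wrapping_mul/wrapping_add make the mod-2^64 arithmetic explicit.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 56) as u8 // the high bits of an LCG are the best mixed
    }
}

/// Deterministic pseudo-random payload for a given seed.
fn pseudo_random_bytes(seed: u64, len: usize) -> Vec<u8> {
    let mut rng = Lcg(seed);
    (0..len).map(|_| rng.next_byte()).collect()
}

/// Flip every bit of the byte in the middle of the payload.
fn corrupt_mid_payload(data: &mut [u8]) {
    if !data.is_empty() {
        let mid = data.len() / 2;
        data[mid] ^= 0xFF;
    }
}
```

Corrupting the middle of the payload, rather than the first or last byte, exercises integrity checks on data that sits past the header and before the trailer.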
[0.1.6] - 2025-09-22
Added
- NeuralSsmPredictor: diagonal SSM + RLE baseline + order-2 context
- Adaptive mixer with EMA log-likelihood sensitivity
- Silesia-tuned hyperparameters (D=32, lr=0.01, o2_blend=0.30)
[0.1.5] - 2025-07-28
Added
- BWT + MTF + RUNA/RUNB RLE preprocessing pipeline
- Custom LZ77 encoder (min-match-3, lazy matching, 64KB window)
- Adaptive routing cascade: BWT → LZ77 → plain RC → Zstd → Store
- RlePredictor: hierarchical 3-context predictor for RLE streams
- Semantic solid grouping by content type
- Parallel group compression via rayon
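For readers unfamiliar with the MTF stage of the preprocessing pipeline listed above, here is a minimal textbook move-to-front sketch. It is a generic illustration with hypothetical function names, not the project's `bwt_mtf_encode` implementation, and it omits the RUNA/RUNB run-length step.

```rust
/// Move-to-front encode: emit each byte's current position in a recency list,
/// then move that byte to the front of the list.
fn mtf_encode(input: &[u8]) -> Vec<u8> {
    let mut table: Vec<u8> = (0..=255).collect();
    let mut out = Vec::with_capacity(input.len());
    for &b in input {
        // Every byte value is in the table, so the search always succeeds.
        let pos = table.iter().position(|&x| x == b).unwrap();
        out.push(pos as u8);
        let v = table.remove(pos);
        table.insert(0, v);
    }
    out
}

/// Inverse transform: look up the byte at each emitted position,
/// then apply the same move-to-front update.
fn mtf_decode(codes: &[u8]) -> Vec<u8> {
    let mut table: Vec<u8> = (0..=255).collect();
    let mut out = Vec::with_capacity(codes.len());
    for &c in codes {
        let v = table.remove(c as usize);
        out.push(v);
        table.insert(0, v);
    }
    out
}
```

Repeated bytes, which BWT output tends to cluster, become runs of zeros: `mtf_encode(b"aaab")` gives `[97, 0, 0, 98]`, and those zero runs are what a later run-length stage can shrink.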
[0.1.0] - 2025-05-16
Added
- Initial AetherArch implementation
- `.aet` binary archive format
- Order0Model, ContextMixer, Lz4AwarePredictor
- FastCDC content-defined chunking
- BLAKE3 integrity checksums
- CRC32 header/trailer verification
- CLI tool (`aet`) with compress, extract, list, verify, bench commands
Copyright 2024-2026 Craton Software Company. Licensed under Apache-2.0.