
I built a small lossless preprocessing library called STRATA.
It exposes structural transforms such as:
- 2D predictors
- cube rotation
- radial reordering for 3D voxels
- YCoCg-R colour transform
- automatic per-input transform selection
Repo: https://github.com/rjamesy/strata
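To make one of these transforms concrete, here is a minimal sketch of the standard lifting-based YCoCg-R colour transform on a single pixel. It is the textbook reversible form, not code taken from STRATA; the function names `rgb_to_ycocg_r` and `ycocg_r_to_rgb` are hypothetical and chosen only for illustration.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R using integer lifting steps (exactly invertible)."""
    co = r - b
    t = b + (co >> 1)   # >> on Python ints is a floor shift, also for negatives
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse transform: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round-trip check on a coarse grid of 8-bit RGB values.
for r in range(0, 256, 17):
    for g in range(0, 256, 17):
        for b in range(0, 256, 17):
            assert ycocg_r_to_rgb(*rgb_to_ycocg_r(r, g, b)) == (r, g, b)
```

Note that Co and Cg need one more bit than the input channels (for 8-bit RGB they range over roughly -255..255), so a real implementation has to pick a wider storage type for the chroma planes.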
The unusual finding is that STRATA works better as a preprocessor for general-purpose codecs than as a standalone codec.
In particular, on shape-aware data, STRATA preprocessing followed by zstd-1 can beat raw zstd-22 on both speed and compression ratio; for 3D volumes, pairing it with zstd-22 improves the ratio at about the same speed. All figures below are relative to raw zstd-22:
- 27 MB RGB photo + YCoCg-R + zstd-1: 463× faster and 5.4% smaller
- Smooth 2D heightmap + 2D predictor + zstd-1: 4.2× faster and 44.5% smaller
- 64³ volume + cube rotation + radial + zstd-22: 14.4% smaller at roughly the same speed
The mechanism appears to be simple: zstd at level 22 performs expensive long-range string matching, but smooth or structured raw data may contain few exact repetitions to match. STRATA exposes the redundancy directly, so even zstd-1 can exploit it, and the total work decreases.
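The effect can be reproduced in miniature with the standard library alone. The sketch below is an illustration under stated assumptions, not STRATA's code: it uses `zlib` as a stand-in for zstd (level 9 as the "slow" setting, level 1 as the "fast" one), a synthetic 16-bit heightmap, and a simple plane predictor (left + up - upper-left). All function names are hypothetical.

```python
import math
import struct
import zlib

W, H = 256, 256


def make_heightmap(w, h):
    """Synthetic smooth 16-bit surface (illustrative data, not a real dataset)."""
    return [[round(30000 + 20000 * math.sin(x / 90.0) * math.cos(y / 110.0))
             for x in range(w)] for y in range(h)]


def predict_residuals(grid):
    """Replace each sample by its residual against left + up - upleft (mod 2**16)."""
    out = []
    for y, row in enumerate(grid):
        res_row = []
        for x, value in enumerate(row):
            left = row[x - 1] if x > 0 else 0
            up = grid[y - 1][x] if y > 0 else 0
            upleft = grid[y - 1][x - 1] if x > 0 and y > 0 else 0
            res_row.append((value - (left + up - upleft)) & 0xFFFF)
        out.append(res_row)
    return out


def undo_residuals(res):
    """Inverse of predict_residuals: rebuild samples in raster order."""
    grid = [[0] * len(res[0]) for _ in res]
    for y, res_row in enumerate(res):
        for x, r in enumerate(res_row):
            left = grid[y][x - 1] if x > 0 else 0
            up = grid[y - 1][x] if y > 0 else 0
            upleft = grid[y - 1][x - 1] if x > 0 and y > 0 else 0
            grid[y][x] = (r + left + up - upleft) & 0xFFFF
    return grid


def to_bytes(grid):
    """Serialise a grid as little-endian unsigned 16-bit samples."""
    flat = [v for row in grid for v in row]
    return struct.pack("<%dH" % len(flat), *flat)


heightmap = make_heightmap(W, H)
residuals = predict_residuals(heightmap)
assert undo_residuals(residuals) == heightmap  # the transform is lossless

raw_slow = zlib.compress(to_bytes(heightmap), level=9)   # raw data, slow setting
pre_fast = zlib.compress(to_bytes(residuals), level=1)   # residuals, fast setting
print("raw bytes:", W * H * 2)
print("raw + zlib-9:", len(raw_slow))
print("predictor + zlib-1:", len(pre_fast))
```

On this synthetic surface neighbouring samples differ by hundreds, so the raw low-order bytes look nearly random to an LZ matcher, while the residuals are confined to a handful of small values that even the fast setting encodes compactly. Real data and real zstd will behave differently in detail; the benchmark script in the repo is the source of the figures quoted above.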
The results are reproducible: bench/preprocess_demo.py writes a CSV covering all tested combinations.
Caveats:
- STRATA does not beat WebP-lossless on natural-photo RGB, though it narrows the compressed-size gap from about 50% to about 13%.
- It ties bzip2 on plain text.
Licence: the project is MIT-licensed.