u/Fantastic_Scratch767

I built a small lossless preprocessing library called STRATA.

It exposes structural transforms such as:

  • 2D predictors
  • cube rotation
  • radial reordering for 3D voxels
  • YCoCg-R colour transform
  • automatic per-input transform selection
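
For anyone unfamiliar with the colour transform in the list above, here is a minimal sketch of the standard YCoCg-R lifting steps in plain Python. It is written from the published definition of YCoCg-R, not copied from the STRATA source, and the function names are made up for illustration. Note that Co and Cg need one extra bit of range (they can go negative for 8-bit input).

```python
# Sketch of the reversible YCoCg-R transform on single integer pixels.
# Function names are hypothetical; STRATA's own API may look different.

def rgb_to_ycocg_r(r, g, b):
    """Forward lifting steps. Python's >> floors on negative ints,
    which is what the reversible transform requires."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Inverse: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

# Round-trip check on a few pixels: the transform is exactly lossless.
for pixel in [(0, 0, 0), (255, 0, 0), (12, 200, 77), (255, 255, 255)]:
    assert ycocg_r_to_rgb(*rgb_to_ycocg_r(*pixel)) == pixel
```

Because every step is an integer add, subtract or shift, the inverse recovers the input bit for bit, which is what makes it usable ahead of a lossless codec.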

Repo: https://github.com/rjamesy/strata

The unusual finding is that STRATA works better as a preprocessor for general-purpose codecs than as a standalone codec.

In particular, on shape-aware data, STRATA preprocessing followed by zstd-1 can beat raw zstd-22 on both speed and compression ratio. In the third case below, zstd-22 is used on both sides and the gain is in size only:

  • 27 MB RGB photo + YCoCg-R + zstd-1: 463× faster and 5.4% smaller than raw zstd-22
  • Smooth 2D heightmap + 2D predictor + zstd-1: 4.2× faster and 44.5% smaller than raw zstd-22
  • 64³ volume + cube rotation + radial + zstd-22: 14.4% smaller than raw zstd-22 at roughly the same speed

The mechanism appears to be simple: at level 22, zstd spends most of its effort on expensive long-range string matching, but smooth or structured raw data may contain few exact repetitions for it to find. STRATA's transforms expose the redundancy directly, so even zstd-1 can exploit it, and total work decreases.
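
A rough way to see this effect without installing anything is the sketch below. It is an illustration under several assumptions, not STRATA's code: the heightmap is synthetic, the predictor is a generic left + up - upleft gradient predictor, and zlib from the Python standard library stands in for zstd (level 1 as the "fast" setting, level 9 as the "slow" one). All names are hypothetical.

```python
import math
import zlib
from array import array

W, H = 128, 128  # small synthetic 16-bit heightmap

def make_heightmap():
    # Smooth surface with values between 10000 and 50000.
    return [[int(30000 + 20000 * math.sin(x / 37.0) * math.cos(y / 53.0))
             for x in range(W)] for y in range(H)]

def predict(img, x, y):
    # Gradient predictor; neighbours outside the image count as 0.
    left = img[y][x - 1] if x > 0 else 0
    up = img[y - 1][x] if y > 0 else 0
    upleft = img[y - 1][x - 1] if x > 0 and y > 0 else 0
    return left + up - upleft

def forward(img):
    # Residual = actual - predicted, wrapped to 16 bits so it stays lossless.
    return [[(img[y][x] - predict(img, x, y)) & 0xFFFF for x in range(W)]
            for y in range(H)]

def inverse(res):
    # Rebuild in raster order so every neighbour is already decoded.
    img = [[0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            img[y][x] = (res[y][x] + predict(img, x, y)) & 0xFFFF
    return img

def to_bytes(rows):
    return array("H", [v for row in rows for v in row]).tobytes()

heightmap = make_heightmap()
residuals = forward(heightmap)
assert inverse(residuals) == heightmap  # round trip is exact

raw_slow = len(zlib.compress(to_bytes(heightmap), 9))   # raw data, slow level
pre_fast = len(zlib.compress(to_bytes(residuals), 1))   # residuals, fast level
print(f"raw + zlib-9:       {raw_slow} bytes")
print(f"predicted + zlib-1: {pre_fast} bytes")
```

On smooth data like this the residuals cluster near zero, so the cheap setting has an easy job, while the raw samples rarely repeat exactly no matter how hard the matcher searches. Real numbers will differ with zstd and real inputs.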

The results are reproducible: bench/preprocess_demo.py writes a CSV covering all tested combinations.

Caveats:

  • STRATA does not beat WebP-lossless on natural-photo RGB, though it narrows the gap from about 50% to 13%.
  • It ties bzip2 on plain text.

The project is MIT-licensed.
u/Fantastic_Scratch767 — 12 days ago