u/Nonkilife

[Seeking Review] SPX: A Lossless Image Codec using RCT + MED + Sharding + rANS

Hi all,

I've spent the last few months developing a lossless image compressor called SPX that aims to balance compression density and encoding speed: a compression ratio better than WebP (m6) but below JXL (e7), with significantly faster encoding.

I did some testing, and the encoding speed holds up across most datasets, but the compression savings are less consistent.

https://preview.redd.it/kg6xfriluuzg1.png?width=1231&format=png&auto=webp&s=0a2f2a5e4de4c1df3059f6f24536cad59bcf9d92

I think I've hit my limit as a self-taught amateur developer who knows a little Python. I can't come up with any new ideas to improve it, so Gemini suggested coming here for professional advice.

It's an Apache 2.0 open-source project. Any suggestions on how to improve the compression ratio without losing too much speed are highly appreciated. Thank you!

GitHub: https://github.com/nonkilife/SPX-Image-Lossless-Compression

Quick Start: pip install spx-codec

==

// The Architecture:

SPX isn't a fundamental breakthrough, but a streamlined 4-part pipeline designed for modern CPU throughput:

  1. RCT: Reversible Color Transform (Green-sub).
  2. MED: Branchless Median Edge Detector.
  3. Stateless Sharding: Pixels are allocated into 42 shards based on local gradient (v), luminance (i), and direction (t). These three parameters can be re-tuned to suit different image types for better performance.
  4. Entropy Coding: Rust-based 4-way Interleaved rANS.
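To make steps 1–2 concrete, here's a minimal NumPy sketch of a green-sub RCT and the MED predictor. This is my own illustration, not SPX's actual Rust code, and SPX's exact RCT variant may differ:

```python
import numpy as np

def rct_forward(rgb):
    # Green-sub reversible color transform: keep G, store R-G and B-G.
    # (Illustrative; SPX's exact variant may differ.)
    c = rgb.astype(np.int16)
    r, g, b = c[..., 0], c[..., 1], c[..., 2]
    return np.stack([r - g, g, b - g], axis=-1)

def rct_inverse(ycc):
    rg, g, bg = ycc[..., 0], ycc[..., 1], ycc[..., 2]
    return np.stack([rg + g, g, bg + g], axis=-1).astype(np.uint8)

def med_predict(w, n, nw):
    # Median Edge Detector (LOCO-I / JPEG-LS), written with vectorized
    # selects instead of branches so it maps well onto SIMD.
    mx = np.maximum(w, n)
    mn = np.minimum(w, n)
    return np.where(nw >= mx, mn, np.where(nw <= mn, mx, w + n - nw))
```

The round trip `rct_inverse(rct_forward(img))` is bit-exact on uint8 input, which is what makes the transform "reversible".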

// Customization & Extensibility:

  • Dynamic Sharding: The (i, v, t) boundaries for pixel classification are not hard-coded. They can be easily re-tuned to accommodate specialized image distributions.
  • Flexible Entropy Modeling: The rANS probability modes are stored in .npz format. This allows users to swap or retrain templates for specific datasets without re-compiling the core Rust engine.
  • Adaptive Framework: While the current design is fairly conventional, the architecture is intended as a "compression sandbox" that can be adapted to domain-specific needs.
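As a sketch of what re-tunable (i, v, t) sharding means in practice, here's a hypothetical classifier using `np.digitize`. The boundary values and the small shard count are made up for illustration; SPX's tuned 42-shard boundaries live in its own config:

```python
import numpy as np

# Hypothetical boundaries -- illustration only, not SPX's tuned values.
I_BOUNDS = [64, 160]     # luminance: dark / mid / bright
V_BOUNDS = [4, 16, 48]   # local gradient magnitude buckets
T_BOUNDS = []            # direction: collapsed to one bucket here

def shard_index(i, v, t, i_bounds=I_BOUNDS, v_bounds=V_BOUNDS, t_bounds=T_BOUNDS):
    """Map a pixel's (luminance, gradient, direction) features to a shard id."""
    ii = np.digitize(i, i_bounds)
    vi = np.digitize(v, v_bounds)
    ti = np.digitize(t, t_bounds)
    n_v = len(v_bounds) + 1
    n_t = len(t_bounds) + 1
    return (ii * n_v + vi) * n_t + ti
```

Retuning for a specialized dataset then amounts to swapping in different boundary lists, with the shard count following automatically from their lengths.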

// The Performance (Snapshot on AMD Ryzen 5 3500X):

  • Encoding Speed: ~12 MB/s on Kodak, peaking at 44 MB/s on standard synthetic sets.
  • Compression Ratio: Consistently 25-30% smaller than PNG; sits between WebP (M6) and JXL (E7) most of the time.
  • Validation: Bit-perfect verification (MSE = 0) with an integrated unified benchmark suite.
  • Target Data: Tested on CLIC, DIV2K, Tecnick, ICI, and Kodak (primarily natural photography).
  • Limitation: Validation on synthetic images is currently limited, so consistency in those specific domains remains a known unknown.
  • Comparative Benchmark: https://github.com/nonkilife/SPX-Image-Lossless-Compression/blob/main/technical/BENCHMARK.md

// The Bottleneck:

I've reached a point where manual optimizations (branchless logic, LUT, SIMD-friendly structures) are no longer yielding significant gains.

I've experimented with:

  • Predictors: Swapping MED for GAP or Paeth (MED still wins on speed/ratio balance).
  • Context: Adding UR, UU, LL pixel data to MED (speed tumbled, ratio improvement was negligible).
  • Sharding: Tested >5,000 shard combinations (up to ~60 shards) via Monte Carlo simulation; the current 42-shard model seems to be the "sweet spot" for speed. Adaptive sharding based on per-image fingerprints (e.g. H-entropy, AAD, size, R:G:B proportions) was also tested, but the compression gain was minor and the speed loss significant.
  • rANS PDF: After analyzing the CLIC 2021 dataset, high-bit modes proved too overhead-heavy for most shards.
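For readers unfamiliar with the predictors above, this is the standard PNG-style Paeth predictor I benchmarked against MED (generic algorithm, not SPX-specific code); note how branchy it is compared to the select-based MED, which is part of why MED wins on speed:

```python
def paeth_predict(w, n, nw):
    # Paeth predictor (PNG filter type 4): pick whichever neighbor is
    # closest to the linear estimate w + n - nw.
    p = w + n - nw
    pa, pb, pc = abs(p - w), abs(p - n), abs(p - nw)
    if pa <= pb and pa <= pc:
        return w
    if pb <= pc:
        return n
    return nw
```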

While roughly 90% of these approaches turned out to be dead ends, there is still unexplored territory:

  • 8-way Interleaving: I've considered scaling the rANS core to 8-way interleaving. However, initial analysis suggests my current Zen 2 architecture (3500X) might suffer from cache port contention or register pressure at that level. I've stuck with 4-way as a stable, high-efficiency baseline.
  • C++ & AVX-512: The current engine is a Python/Rust hybrid. I suspect a pure C++ implementation leveraging AVX-512 could push the throughput slightly higher, but that currently exceeds my personal technical stack.
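For anyone who wants to poke at the entropy stage, here's a minimal single-state, byte-wise rANS coder in pure Python. The real SPX core is 4-way interleaved Rust; this only shows the underlying mechanics (constants like `SCALE_BITS = 12` are my illustrative choices, not SPX's):

```python
SCALE_BITS = 12     # probabilities quantized into 1 << 12 slots
RANS_L = 1 << 16    # lower bound of the normalized state interval

def rans_encode(symbols, freq):
    """Encode symbols with a frequency table summing to 1 << SCALE_BITS."""
    cum = [0]
    for f in freq:
        cum.append(cum[-1] + f)
    state, stack = RANS_L, []
    for s in reversed(symbols):                  # rANS consumes input in reverse
        f = freq[s]
        x_max = ((RANS_L >> SCALE_BITS) << 8) * f
        while state >= x_max:                    # byte-wise renormalization
            stack.append(state & 0xFF)
            state >>= 8
        state = ((state // f) << SCALE_BITS) + (state % f) + cum[s]
    return state, stack

def rans_decode(state, stack, freq, n):
    """Decode n symbols; exact inverse of rans_encode."""
    cum = [0]
    for f in freq:
        cum.append(cum[-1] + f)
    sym_of = [s for s, f in enumerate(freq) for _ in range(f)]
    stack, out = list(stack), []
    mask = (1 << SCALE_BITS) - 1
    for _ in range(n):
        slot = state & mask
        s = sym_of[slot]
        out.append(s)
        state = freq[s] * (state >> SCALE_BITS) + slot - cum[s]
        while state < RANS_L:                    # pull renormalization bytes back
            state = (state << 8) | stack.pop()
    return out
```

N-way interleaving (the 4-way vs 8-way question above) keeps N independent `state` variables and round-robins symbols across them, so the divisions and renormalizations of different states can overlap in the CPU pipeline.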