ex_data_sketch v0.8.0 — Deterministic Foundations
ex_data_sketch v0.8.0 is out. This release invests entirely in the substrate that all 15 existing sketches share, laying the groundwork for v0.9.0, which will add Broadway / GenStage streaming integrations and ETS / DETS / Zarr support.
What's new:
Deterministic hashing. Every sketch now goes through a validated, byte-stable hash layer. HLL, ULL, Theta, and CMS accept hash_strategy: :murmur3 for Apache DataSketches interop; this option was silently ignored in v0.7.x. XXHash3 remains the default and fastest path (~30M items/sec at p=14 on the Rust NIF).
Binary stability & corruption detection. Serialized sketches now carry a CRC32C trailer and an embedded hash metadata block (EXSK v2). Bit-flip corruption that previously would silently produce wrong estimates is now caught and returns a structured DeserializationError. v0.8.0 reads v1 frames; v0.7.x cannot read v2, so stage your rollout accordingly.
Murmur3 hot path. Eight new Rust NIFs extend in-Rust hashing to Murmur3. The Murmur3 path is within 8% of XXH3 throughput. No more falling off the fast path when you select :murmur3.
Precompiled NIFs for Windows. x86_64 and ARM64 MSVC targets join the matrix. 16 artifacts total (8 targets x 2 NIF versions). No Rust toolchain needed on any supported platform.
Property-locked guarantees. 14 StreamData properties lock HLL/ULL monotonicity and error bounds, KLL/REQ rank consistency, CMS overestimation-only, and Bloom/XorFilter/Cuckoo no-false-negative. A 200-mutation fuzz suite verifies that binary v2 corruption never silently propagates.
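To make the corruption-detection idea concrete, here is a minimal sketch of a checksum trailer in plain Elixir. It is not the EXSK v2 layout: the FrameSketch module and its frame shape are hypothetical, and :erlang.crc32/1 (plain CRC-32) stands in for CRC32C. It only shows the general pattern of recomputing the checksum before trusting the payload.

```elixir
defmodule FrameSketch do
  @moduledoc """
  Illustrative only: a payload followed by a 4-byte checksum trailer.
  This is NOT the EXSK v2 layout, and :erlang.crc32/1 is plain CRC-32,
  used here as a stand-in for CRC32C.
  """

  # Append a big-endian 32-bit checksum of the payload.
  def encode(payload) when is_binary(payload) do
    <<payload::binary, :erlang.crc32(payload)::unsigned-big-32>>
  end

  # Split off the trailer, recompute, and compare before trusting the payload.
  def decode(frame) when is_binary(frame) and byte_size(frame) >= 4 do
    payload_size = byte_size(frame) - 4
    <<payload::binary-size(payload_size), stored::unsigned-big-32>> = frame

    if :erlang.crc32(payload) == stored do
      {:ok, payload}
    else
      {:error, :checksum_mismatch}
    end
  end

  def decode(_frame), do: {:error, :truncated}
end

frame = FrameSketch.encode("sketch bytes")
{:ok, "sketch bytes"} = FrameSketch.decode(frame)

# Flip one bit in the first payload byte; the trailer no longer matches.
<<first, rest::binary>> = frame
corrupted = <<Bitwise.bxor(first, 1), rest::binary>>
{:error, :checksum_mismatch} = FrameSketch.decode(corrupted)
```

Without the trailer check, the corrupted bytes would decode into a structurally valid but wrong payload, which is the silent failure mode the release notes describe.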
Breaking changes (2):
- EXSK v2 is one-way. v0.7.x readers can't decode v2 frames. Deploy readers first, then producers.
- hash_strategy: :murmur3 is no longer silently overridden to :xxhash3. Sketches that specified Murmur3 will now actually use it; estimates are correct but differ from v0.7.x.
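The "readers first, then producers" ordering can be illustrated with a toy version gate. Everything here is hypothetical: the VersionGate module and the one-byte version header are assumptions for illustration, not the real EXSK header. The point is only that a reader limited to version 1 must reject a version 2 frame, while a reader that knows both versions accepts either.

```elixir
defmodule VersionGate do
  # Hypothetical header: a one-byte format version, then the body.
  # The real EXSK header layout is not described here.
  def decode(<<version, body::binary>>, supported) do
    if version in supported do
      {:ok, version, body}
    else
      {:error, {:unsupported_version, version}}
    end
  end
end

old_reader = [1]      # stands in for a v0.7.x reader
new_reader = [1, 2]   # stands in for a v0.8.0 reader

v1_frame = <<1, "payload">>
v2_frame = <<2, "payload">>

# An upgraded reader handles frames from both old and new producers.
{:ok, 1, "payload"} = VersionGate.decode(v1_frame, new_reader)
{:ok, 2, "payload"} = VersionGate.decode(v2_frame, new_reader)

# An old reader fails as soon as any producer starts emitting v2.
{:error, {:unsupported_version, 2}} = VersionGate.decode(v2_frame, old_reader)
```

Upgrading every reader before any producer means the last case never occurs during the rollout.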
One-liner upgrade:
{:ex_data_sketch, "~> 0.8.0"}
Most users need no code changes. Full migration guide ships in HexDocs.
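For context, the dependency tuple normally lives in the deps function of mix.exs. The fragment below is a sketch of that placement and assumes a standard Mix project with no other dependencies listed. The requirement "~> 0.8.0" allows any 0.8.x release but not 0.9.0.

```elixir
# mix.exs (fragment; belongs inside your project module)
defp deps do
  [
    # Accepts >= 0.8.0 and < 0.9.0
    {:ex_data_sketch, "~> 0.8.0"}
  ]
end
```

If mix.lock still pins a 0.7.x version, running mix deps.update ex_data_sketch refreshes the lock entry.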
Stats: 1,317 tests, 171 properties, 92.7% coverage, 0 Credo issues.