u/Shoddy_One4465

▲ 9 r/erlang+1 crossposts

ex_data_sketch v0.8.0 — Deterministic Foundations

ex_data_sketch v0.8.0 is out. This release invests entirely in the substrate that all 15 existing sketches share, laying the groundwork for v0.9.0, which will add streaming integrations (Broadway / GenStage) and ETS / DETS / Zarr support.

What's new:

  • Deterministic hashing. Every sketch now goes through a validated, byte-stable hash layer. HLL, ULL, Theta, and CMS accept hash_strategy: :murmur3 for Apache DataSketches interop; in v0.7.x this option was silently ignored. XXHash3 remains the default and fastest path (~30 M items/sec at p=14 on the Rust NIF).

  • Binary stability & corruption detection. Serialized sketches now carry a CRC32C trailer and an embedded hash metadata block (EXSK v2). Bit-flip corruption that previously would silently produce wrong estimates is now caught and returns a structured DeserializationError. v0.8.0 reads v1 frames; v0.7.x cannot read v2 — stage your rollout accordingly.

  • Murmur3 hot path. 8 new Rust NIFs extend in-Rust hashing to Murmur3. The Murmur3 path is within 8% of XXH3 throughput. No more falling off the fast path when you select :murmur3.

  • Precompiled NIFs for Windows. x86_64 and ARM64 MSVC targets join the matrix. 16 artifacts total (8 targets x 2 NIF versions). No Rust toolchain needed on any supported platform.

  • Property-locked guarantees. 14 StreamData properties lock HLL/ULL monotonicity and error bounds, KLL/REQ rank consistency, CMS overestimation-only, and Bloom/XorFilter/Cuckoo no-false-negative. A 200-mutation fuzz suite verifies that binary v2 corruption never silently propagates.
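
To make the "CMS overestimation-only" guarantee concrete, here is a minimal sketch in plain Elixir. It does not use ex_data_sketch: TinyCMS and its functions are hypothetical names, and :erlang.phash2/2 stands in for the library's XXH3 / Murmur3 hash layer. The real suite expresses this as a StreamData property; the version below checks it on one fixed input.

```elixir
defmodule TinyCMS do
  # Hypothetical toy Count-Min Sketch; NOT the ex_data_sketch API.
  # `depth` rows of `width` counters, stored in a map keyed by {row, col}.
  defstruct width: 64, depth: 4, counts: %{}

  def new(width \\ 64, depth \\ 4), do: %__MODULE__{width: width, depth: depth}

  # Each row salts the hash with its row index; :erlang.phash2/2 is deterministic.
  defp col(item, row, width), do: :erlang.phash2({row, item}, width)

  def add(%__MODULE__{} = cms, item) do
    counts =
      Enum.reduce(0..(cms.depth - 1), cms.counts, fn row, acc ->
        Map.update(acc, {row, col(item, row, cms.width)}, 1, &(&1 + 1))
      end)

    %{cms | counts: counts}
  end

  # Estimate = minimum over rows. Collisions can only inflate a counter,
  # so the estimate is never below the true count.
  def estimate(%__MODULE__{} = cms, item) do
    0..(cms.depth - 1)
    |> Enum.map(fn row -> Map.get(cms.counts, {row, col(item, row, cms.width)}, 0) end)
    |> Enum.min()
  end
end

items = for i <- 1..500, do: rem(i * i, 37)
cms = Enum.reduce(items, TinyCMS.new(), &TinyCMS.add(&2, &1))
truth = Enum.frequencies(items)

overestimates_only? =
  Enum.all?(truth, fn {item, count} -> TinyCMS.estimate(cms, item) >= count end)
```

Every counter an item touches is incremented on each add, so each row's counter is at least the item's true count, and so is the minimum. That is the invariant a property test can lock in across random inputs.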

Breaking changes (2):

  1. EXSK v2 is one-way. v0.7.x readers can't decode v2 frames. Deploy readers first, then producers.
  2. hash_strategy: :murmur3 is no longer silently overridden to :xxhash3. Sketches that specified Murmur3 will now actually use it — estimates are correct but differ from v0.7.x.
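
The rollout order in item 1 can be illustrated with a toy versioned frame. This is a sketch under loud assumptions: the "TOY" layout below is invented and is not the real EXSK format, ToyFrame is a hypothetical module, :erlang.crc32/1 computes plain CRC-32 as a stand-in for CRC32C, and errors are bare tuples rather than the library's structured DeserializationError.

```elixir
defmodule ToyFrame do
  # Hypothetical layout, NOT the real EXSK format:
  #   v1: "TOY" <> <<1>> <> payload
  #   v2: "TOY" <> <<2>> <> payload <> <<crc::32>>
  # :erlang.crc32/1 is plain CRC-32, used here as a stand-in for CRC32C.

  def encode_v1(payload), do: <<"TOY", 1, payload::binary>>

  def encode_v2(payload) do
    body = <<"TOY", 2, payload::binary>>
    <<body::binary, :erlang.crc32(body)::32>>
  end

  # A "new" reader: accepts v1 and v2, and verifies the v2 trailer.
  def decode_new(<<"TOY", 1, payload::binary>>), do: {:ok, payload}

  def decode_new(<<"TOY", 2, rest::binary>> = frame) when byte_size(rest) >= 4 do
    body_size = byte_size(frame) - 4
    <<body::binary-size(body_size), crc::32>> = frame
    <<"TOY", 2, payload::binary>> = body

    if :erlang.crc32(body) == crc, do: {:ok, payload}, else: {:error, :checksum_mismatch}
  end

  def decode_new(_), do: {:error, :unknown_format}

  # An "old" reader: only knows v1, so every v2 frame is rejected.
  def decode_old(<<"TOY", 1, payload::binary>>), do: {:ok, payload}
  def decode_old(_), do: {:error, :unknown_format}
end

frame = ToyFrame.encode_v2("sketch-bytes")

# Flip one bit in the first payload byte (byte offset 4, after "TOY" and the version).
<<head::binary-size(4), byte, tail::binary>> = frame
corrupted = <<head::binary, :erlang.bxor(byte, 1), tail::binary>>

new_reads_v1 = ToyFrame.decode_new(ToyFrame.encode_v1("sketch-bytes"))
new_reads_v2 = ToyFrame.decode_new(frame)
new_reads_corrupted = ToyFrame.decode_new(corrupted)
old_reads_v2 = ToyFrame.decode_old(frame)
```

The new reader handles both versions and rejects the bit-flipped frame, while the old reader rejects any v2 frame outright. That asymmetry is why readers have to be upgraded before any producer starts emitting v2.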

One-liner upgrade:

{:ex_data_sketch, "~> 0.8.0"}

Most users need no code changes. Full migration guide ships in HexDocs.

Stats: 1,317 tests, 171 properties, 92.7% coverage, 0 credo issues.

GitHub | Hex | Docs

u/Shoddy_One4465 — 1 day ago
▲ 20 r/erlang+1 crossposts

Released v0.2.0 of ExSystolic -- a BEAM-native systolic array simulator. If you're into parallel algorithms, dataflow computing, or just like seeing deterministic parallelism on the BEAM, this might be interesting.

What's a systolic array? It's a grid of simple processing elements (PEs) connected by FIFO links, all driven by a global clock. Data pulses through the grid one tick at a time. The canonical use case is matrix multiplication, but the same pattern works for convolution, shortest paths (over the tropical semiring), and any sliding-window computation.

A systolic array is a hardware execution model designed for high-throughput, repetitive computations where data flows through a grid of simple processing units. Instead of constantly moving data back and forth from memory (the real bottleneck in modern systems), it keeps data in motion and reuses it as it propagates through the array. This makes it extremely efficient for workloads dominated by linear algebra—especially matrix multiplications, convolutions, and streaming transformations.
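
For readers who want to see the data movement spelled out, here is a minimal stand-alone simulation in plain Elixir. It is a sketch, not the ExSystolic API: ToySystolic and its helpers are hypothetical names, and it assumes an output-stationary layout where PE {i, j} accumulates c[i][j] while a-values flow east and b-values flow south with skewed (delayed) edge streams.

```elixir
defmodule ToySystolic do
  # Hypothetical stand-alone simulation; NOT the ExSystolic API.
  # Output-stationary grid: PE {i, j} accumulates c[i][j].
  # Each PE holds {acc, a_reg, b_reg}; a flows east, b flows south.

  def matmul(a, b) do
    n = length(a)
    k = length(b)
    m = length(hd(b))

    grid = for i <- 0..(n - 1), j <- 0..(m - 1), into: %{}, do: {{i, j}, {0, 0, 0}}

    # The last product reaches PE {n-1, m-1} at tick n + m + k - 3.
    final = Enum.reduce(0..(n + m + k - 3), grid, &tick(&2, &1, a, b, k))

    for i <- 0..(n - 1), do: for(j <- 0..(m - 1), do: elem(final[{i, j}], 0))
  end

  # One global clock tick: every PE reads its neighbours' registers from
  # the previous tick, so evaluation order inside a tick does not matter.
  defp tick(prev, t, a, b, k) do
    Map.new(prev, fn {{i, j}, {acc, _a, _b}} ->
      a_in = if j == 0, do: west(a, i, t, k), else: elem(prev[{i, j - 1}], 1)
      b_in = if i == 0, do: north(b, j, t, k), else: elem(prev[{i - 1, j}], 2)
      {{i, j}, {acc + a_in * b_in, a_in, b_in}}
    end)
  end

  # Skewed edge streams: row i (column j) is delayed by i (j) ticks, padded with 0.
  defp west(a, i, t, k), do: fetch(a, i, t - i, t - i, k)
  defp north(b, j, t, k), do: fetch(b, t - j, j, t - j, k)

  defp fetch(mat, r, c, idx, k) when idx >= 0 and idx < k, do: mat |> Enum.at(r) |> Enum.at(c)
  defp fetch(_mat, _r, _c, _idx, _k), do: 0
end

c = ToySystolic.matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# c == [[19, 22], [43, 50]]
```

Each input value is read from the edge once and then handed from PE to PE, which is the reuse-in-motion idea described above: no PE goes back to memory for an operand its neighbour already holds.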

This is why systolic designs underpin much of today’s AI and high-performance compute stack. Google TPUs, accelerators from NVIDIA, and chips used by Tesla all leverage similar principles to power neural networks, computer vision, and real-time inference. The same model also applies to signal processing, scientific computing, and even emerging database and graph workloads—making systolic execution a compelling abstraction for building next-generation data and compute systems.

What's new in v0.2.0:

  • Parallel backend -- splits arrays into tiles, dispatches them in parallel via Task.Supervisor or a Poolex worker pool. The interpreted (sequential) backend still works.
  • Proven determinism -- both backends follow the same 6-step BSP contract. Conformance tests verify that interpreted and partitioned backends produce identical PE states and trace events. The parallel backend uses ordered: true dispatch and sorts trace events by {tick, coord}.
  • Pluggable topology -- ExSystolic.Space behaviour with a new links/2 callback. Default is 2D grid, but you can implement graph spaces, hierarchical layouts, etc.
  • Shared link operations -- Backend.LinkOps eliminates triple-duplicated inject/read/write logic (~150 LOC removed).
  • 98.4% test coverage, 185 tests + 34 doctests, 0 dialyzer errors.
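
The determinism bullet can be sketched in plain Elixir. This is an illustration under assumptions, not ExSystolic's backend code: ToyDispatch, its tile representation, and the {tick, coord, :fired} event shape are hypothetical, and it uses Task.async_stream/3 directly rather than a Task.Supervisor or Poolex pool.

```elixir
defmodule ToyDispatch do
  # Hypothetical illustration; NOT ExSystolic's backend code.
  # A "tile" is a list of coords; stepping a coord yields one trace event.

  defp step_tile(tile, tick), do: Enum.map(tile, fn coord -> {tick, coord, :fired} end)

  def run_sequential(tiles, ticks) do
    for tick <- 0..(ticks - 1), tile <- tiles, event <- step_tile(tile, tick), do: event
  end

  def run_parallel(tiles, ticks) do
    # The outer flat_map acts as a per-tick barrier: all tiles finish
    # tick t before any tile starts tick t + 1.
    Enum.flat_map(0..(ticks - 1), fn tick ->
      # ordered: true yields results in input order,
      # regardless of which task finishes first.
      tiles
      |> Task.async_stream(&step_tile(&1, tick), ordered: true)
      |> Enum.flat_map(fn {:ok, events} -> events end)
    end)
  end

  # Canonical order: sort by {tick, coord} so traces can be compared directly.
  def canonical(events), do: Enum.sort_by(events, fn {tick, coord, _} -> {tick, coord} end)
end

tiles = [[{0, 0}, {0, 1}], [{1, 0}, {1, 1}]]

seq = ToyDispatch.run_sequential(tiles, 3)
# Feed the tiles in a different order to show that sorting removes the dependence.
par = ToyDispatch.run_parallel(Enum.reverse(tiles), 3)

same_trace? = ToyDispatch.canonical(seq) == ToyDispatch.canonical(par)
```

Ordered dispatch removes scheduling nondeterminism within a tick, and the {tick, coord} sort removes any dependence on how the array was tiled, so the two traces compare equal.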

Quick example (2x2 GEMM, both backends):

alias ExSystolic.{Array, Clock, PE.MAC, Examples.GEMM}

a = [[1,2],[3,4]]
b = [[5,6],[7,8]]

array =
  Array.new(rows: 2, cols: 2)
  |> Array.fill(MAC)
  |> Array.connect(:west_to_east)
  |> Array.connect(:north_to_south)
  |> Array.input(:west, GEMM.west_streams(a, 2, 2, 2))
  |> Array.input(:north, GEMM.north_streams(b, 2, 2, 2))

# Sequential
interp = Clock.run(array, ticks: 5) |> Array.result_matrix()

# Parallel (same result!)
part = Clock.run(array, ticks: 5, backend: :partitioned) |> Array.result_matrix()

interp == part  # => true

The README has a full tutorial on systolic arrays, including image convolution and shortest-path examples.

Would love feedback, especially on the Space/topology abstraction and the parallel dispatch design.

u/Shoddy_One4465 — 11 days ago