u/Brilliant-Weight-234

How are you doing reproducible MySQL benchmarking across versions or configs?
r/mysql

I’ve been looking into how people actually benchmark MySQL setups in a way that produces results you can trust and compare over time.

On paper it sounds simple, but once you try to compare across:

  • different MySQL versions
  • config changes
  • environments

it gets messy quite quickly.

Typical issues I keep hearing about:

  • results that are hard to reproduce
  • leftover state affecting runs
  • difficulty explaining why numbers differ, not just that they do

The part that seems especially tricky is controlling the full lifecycle:

  • clean state between runs
  • consistent warmup
  • repeatable execution
  • attaching diagnostics so results are interpretable
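To make the lifecycle concrete, here's a minimal sketch of what that loop can look like. Everything here is hypothetical scaffolding, not any particular tool's API: `reset`, `warmup`, and `workload` are callables you'd wire to your own scripts (e.g. shelling out to `mysql` to reload a dataset, restarting `mysqld`, or invoking sysbench):

```python
import statistics
import time

def run_benchmark(iterations, reset, warmup, workload):
    """Run `workload` repeatedly, resetting and warming up before each
    iteration so every run starts from the same state."""
    results = []
    for i in range(iterations):
        reset()    # e.g. drop/reload the dataset, restart mysqld with the target config
        warmup()   # e.g. run the workload once, untimed, to warm caches/buffer pool
        start = time.monotonic()
        metrics = workload()  # timed run; return whatever diagnostics you collect
        elapsed = time.monotonic() - start
        results.append({"iteration": i, "elapsed_s": elapsed, "metrics": metrics})
    times = [r["elapsed_s"] for r in results]
    summary = {
        "runs": len(times),
        "median_s": statistics.median(times),
        "stdev_s": statistics.stdev(times) if len(times) > 1 else 0.0,
    }
    return results, summary
```

The useful part isn't the timing itself, it's that reset/warmup are forced to happen every iteration and the per-run diagnostics are attached to each result, so when two runs differ you have something to explain the difference with.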

We’ve been working on a framework that tries to make this more deterministic:

  • explicit DB lifecycle per iteration
  • hooks for diagnostics/profiling
  • consistent execution + reporting

There’s a beta here if anyone is curious:
https://mariadb.org/mariadb-foundation-releases-the-beta-of-the-test-automation-framework-taf-2-5/

Mostly interested in how others approach this:

  • Do you trust your benchmarking results?
  • How do you ensure reproducibility?
  • Are you using existing tools or mostly custom scripts?
  • What tends to break consistency the most?

Would be great to hear real-world approaches.

u/Brilliant-Weight-234 — 10 days ago