u/Brilliant-Weight-234

How are you doing reproducible MySQL benchmarking across versions or configs?
r/mysql

I’ve been looking into how people actually benchmark MySQL setups in a way that produces results you can trust and compare over time.

On paper it sounds simple, but once you try to compare across:

  • different MySQL versions
  • config changes
  • environments

it gets messy quite quickly.

Typical issues I keep hearing about:

  • results that are hard to reproduce
  • leftover state affecting runs
  • difficulty explaining why numbers differ, not just that they do

The part that seems especially tricky is controlling the full lifecycle:

  • clean state between runs
  • consistent warmup
  • repeatable execution
  • attaching diagnostics so results are interpretable
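To make the lifecycle concrete, here's a minimal sketch of what that loop can look like. Everything here is hypothetical scaffolding, not any particular tool's API: `reset`, `warmup`, and `workload` are callables you'd wire to your own scripts (e.g. shelling out to `mysql` to reload a dataset, restarting `mysqld`, or invoking sysbench):

```python
import statistics
import time

def run_benchmark(iterations, reset, warmup, workload):
    """Run `workload` repeatedly, resetting and warming up before each
    iteration so every run starts from the same state."""
    results = []
    for i in range(iterations):
        reset()    # e.g. drop/reload the dataset, restart mysqld with the target config
        warmup()   # e.g. run the workload once, untimed, to warm caches/buffer pool
        start = time.monotonic()
        metrics = workload()  # timed run; return whatever diagnostics you collect
        elapsed = time.monotonic() - start
        results.append({"iteration": i, "elapsed_s": elapsed, "metrics": metrics})
    times = [r["elapsed_s"] for r in results]
    summary = {
        "runs": len(times),
        "median_s": statistics.median(times),
        "stdev_s": statistics.stdev(times) if len(times) > 1 else 0.0,
    }
    return results, summary
```

The useful part isn't the timing itself, it's that reset/warmup are forced to happen every iteration and the per-run diagnostics are attached to each result, so when two runs differ you have something to explain the difference with.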

We’ve been working on a framework that tries to make this more deterministic:

  • explicit DB lifecycle per iteration
  • hooks for diagnostics/profiling
  • consistent execution + reporting

There’s a beta here if anyone is curious:
https://mariadb.org/mariadb-foundation-releases-the-beta-of-the-test-automation-framework-taf-2-5/

Mostly interested in how others approach this:

  • Do you trust your benchmarking results?
  • How do you ensure reproducibility?
  • Are you using existing tools or mostly custom scripts?
  • What tends to break consistency the most?

Would be great to hear real-world approaches.

u/Brilliant-Weight-234 — 10 days ago