TL;DR: During blue-green broker deployments with 50K+ partition movements, Cruise Control moves all replicas over X hours, then executes all 10K+ leadership changes in a concentrated burst at the end, causing client latency spikes. Looking for ways to spread leadership movements throughout the rebalance.
Background scenario:
We run a 9-broker Kafka cluster and do blue-green deployments where we add 9 new brokers and rebalance the entire cluster. Our typical rebalance involves ~55,000 partition movements.
The execution is sequential:
- Move ALL replicas (X hours)
- Then move ALL leadership (concentrated burst)
This causes a "leadership storm" at the end where thousands of leadership changes happen rapidly, leading to client connection disruptions and request timeouts.
Questions:
- Is this sequential execution (replicas → leadership) fundamental to CC's architecture, or are we missing a config option?
- Has anyone else dealt with this during large rebalances or blue-green deployments?