AOS8 cluster client speed asymmetry issue
Hoping for a quick fix before entering TAC hell... We have a pair of 7210s running 8.10.0.22, and for a little while we've had a somewhat reproducible issue where some clients' download speeds max out between 10 and 30 Mbps. I am now about 90% sure that this happens only to clients whose anchor controller is different from their AP's anchor controller, e.g.:
AP01 is anchored to controller A
Client X on AP01 is anchored to controller A, speeds are great
Client Y on AP01 is anchored to controller B, speeds are asymmetrically poor
Clients with the same anchor as their AP can reliably get 700 Mbps symmetrical on 6 GHz, but when the anchors differ it'll be something like 20 Mbps down and 400 Mbps up.
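For anyone who wants to check the same pattern on their own cluster, here is a minimal Python sketch of how the speed tests could be tallied by whether the client and AP share an anchor. The record layout, the client names, and the sample numbers are all hypothetical (only loosely modeled on the figures above); nothing here is pulled from the controllers.

```python
from collections import defaultdict
from statistics import median

# Hypothetical speed-test records, entered by hand:
# (client, ap_anchor, client_anchor, down_mbps, up_mbps)
samples = [
    ("client-x", "A", "A", 700, 700),
    ("client-y", "A", "B", 20, 400),
    ("client-z", "B", "B", 690, 710),
    ("client-w", "B", "A", 25, 390),
]


def summarize(records):
    """Group results by whether client and AP anchors match, then take medians."""
    groups = defaultdict(list)
    for _client, ap_anchor, client_anchor, down, up in records:
        key = "same" if ap_anchor == client_anchor else "split"
        groups[key].append((down, up))
    return {
        key: {
            "n": len(vals),
            "median_down": median(d for d, _ in vals),
            "median_up": median(u for _, u in vals),
        }
        for key, vals in groups.items()
    }


for key, stats in summarize(samples).items():
    print(key, stats)
```

If the "split" group is consistently slow on download only, that supports the anchor-mismatch theory; if slow clients show up in both groups, something else is going on.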
I'm not sure exactly when this started, though a likely start was a few weeks ago when we went from 8.10.0.19 to 8.10.0.22.
This happens on any kind of SSID (enterprise, open, SAE), though all are tunneled. I do not really want to convert the entire campus to bridged SSIDs as a workaround.
We have jumbo frames (9198) enabled everywhere involved (APs, switch uplinks, controller LAGs). The controllers each have an MCLAG to our CX6400 cores, which have a VSX ISL with MTU 9500.
No wired ports involved show any drops or errors. Each controller has a port-channel of two 10GbE links. We see jumbos incrementing on the switches, and giants incrementing on the controllers. Controller port-channels and interfaces all confirm jumbo/9198 is enabled.
When I look at the datapath tunnel list, it seems like the AP tunnels are all MTU 9000, but the tunnel between the controllers is 1500. Not sure what's expected here.
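To put a rough number on why that mismatch might matter, here is a small Python sketch of plain IPv4 fragmentation arithmetic. It assumes a 20-byte IPv4 header with no options, ignores tunnel encapsulation overhead entirely, and assumes the controllers would actually fragment rather than handle oversized packets some other way. None of that is confirmed for AOS8, so treat it as an illustration only.

```python
import math

IPV4_HEADER = 20  # bytes; assumes no IP options


def ipv4_fragment_count(packet_len, mtu, header=IPV4_HEADER):
    """Fragments needed to carry an IPv4 packet of packet_len total bytes over a path with this MTU."""
    if packet_len <= mtu:
        return 1
    payload = packet_len - header
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - header) // 8 * 8
    return math.ceil(payload / per_fragment)


# A 9000-byte packet (sized for the AP tunnels) crossing a 1500-MTU path
print(ipv4_fragment_count(9000, 1500))
# A standard 1500-byte packet crossing the same path
print(ipv4_fragment_count(1500, 1500))
```

Under those assumptions, a packet sized for the 9000-MTU AP tunnels would need several fragments to cross a 1500-MTU hop, while a standard-size packet would need none.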
Rebooting APs does not resolve it, and a phased reboot of the controllers last night did not help either. I'm considering going to 8.13.2.0 tonight as a last resort before starting the TAC journey. Any advice is welcome!