We upgraded four GKE clusters from 1.27 to 1.28 two weeks ago. No workload changes, no node pool changes, same namespace structure. Our network egress bill jumped 40% across all four clusters overnight.
Digging into the billing export, I see Network Internet Egress from Americas to Americas SKU up 35% and Network Inter Region Egress up 50%. But nothing changed in our service mesh or ingress controllers.
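For anyone comparing notes, here's roughly how I'm slicing the exported billing rows to get those per-SKU deltas. This is a sketch, not the real pipeline: the `sku`/`cost` field names mirror the detailed billing export schema, and the row data you'd feed in comes from your own export query.

```python
from collections import defaultdict

def cost_by_sku(rows):
    """Sum cost per SKU from billing-export rows shaped like
    {"sku": "...", "cost": float}."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["sku"]] += row["cost"]
    return dict(totals)

def sku_deltas(before_rows, after_rows):
    """Percent change per SKU between two equal windows, e.g. the
    week before vs. the week after the 1.28 upgrade."""
    before, after = cost_by_sku(before_rows), cost_by_sku(after_rows)
    return {
        sku: round(100.0 * (after.get(sku, 0.0) - cost) / cost, 1)
        for sku, cost in before.items()
        if cost > 0
    }
```

Feeding it the two windows gives you a dict like `{"Network Inter Region Egress": 50.0}`, which is how I spotted that the jump is concentrated in the two SKUs above rather than smeared across everything.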
Checked the usual suspects: north-south traffic through LoadBalancer services looks flat. No new external endpoints. VPC Flow Logs show the same source/destination pairs as before.
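To sanity-check the "same source/destination pairs" claim, I collapsed the flow-log records to (src, dst) pairs and diffed the two windows. Sketch below; the pair data is whatever you pull out of your VPC Flow Logs export, not anything GKE-specific.

```python
def diff_flow_pairs(before, after):
    """Compare two collections of (src_ip, dst_ip) pairs pulled from
    VPC Flow Logs for equal windows before and after the upgrade."""
    before, after = set(before), set(after)
    return {
        "new": sorted(after - before),   # pairs only seen post-upgrade
        "gone": sorted(before - after),  # pairs that disappeared
    }
```

An empty `new` list is what I meant by "same source/destination pairs as before" — the extra bytes are flowing between endpoints that already talked to each other, just more of it.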
Then I noticed something: GKE 1.28 enables Container Network Interface (CNI) managed node prefixes by default on new node pools. Our node pools weren't new, but the upgrade may have rolled the feature onto them anyway. That feature can add control plane communication on the node's network interface, and intra-VPC traffic isn't automatically free: anything crossing zones or regions is still billed, which would line up with the Inter Region Egress SKU jumping.
Also looking at kube-proxy mode: 1.28 defaults to iptables, but if you were running ipvs before, the migration could change packet pathing.
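One cheap way to confirm which mode kube-proxy actually landed in after the roll: it reports its active mode on its metrics port (10249) at `/proxyMode`. Sketch below; the fetch has to run on the node itself (the endpoint binds to localhost), and note this doesn't apply to Dataplane V2 clusters, which don't run kube-proxy at all.

```python
import urllib.request

def fetch_proxy_mode(url="http://127.0.0.1:10249/proxyMode"):
    # kube-proxy's metrics server answers with its active mode as plain
    # text, e.g. "iptables" or "ipvs". Run this from the node; the
    # endpoint is bound to localhost.
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read().decode().strip()

def mode_flipped(pre_upgrade, post_upgrade):
    # True if the upgrade changed the proxy mode (and thus packet pathing).
    return pre_upgrade != post_upgrade
```

If `mode_flipped` comes back True across the upgrade, that's a concrete packet-path change to rule in or out before chasing the node-prefix theory.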
Anyone else seeing this? Is there a Prometheus-based way to prove this is control plane overhead, maybe by comparing container_network_transmit_bytes_total against the billing data and looking for a mismatch? I'd rather not rebuild all four clusters just to test the node prefix theory.
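On the Prometheus angle, the best I've come up with is a bound rather than a proof: sum the per-pod transmit counters (`increase(container_network_transmit_bytes_total[...])`) over the billing window, convert the billed egress SKU usage to bytes, and see how much of the bill the pod interfaces can't account for. Whatever's left over is candidate control plane / node-level overhead. Rough sketch, with the caveat that the pod counter includes intra-VPC traffic, so this is loose:

```python
def unaccounted_fraction(pod_tx_bytes, billed_egress_bytes):
    """Fraction of billed egress bytes not explainable by pod
    interface transmit counters.

    pod_tx_bytes: cluster-wide sum of the pod transmit counter
        increase over the billing window.
    billed_egress_bytes: usage amount for the egress SKUs from the
        billing export, converted to bytes.
    """
    if billed_egress_bytes <= 0:
        return 0.0
    gap = billed_egress_bytes - pod_tx_bytes
    # Pods can legitimately transmit more than is billed (free
    # same-zone traffic), so clamp at zero.
    return max(gap, 0.0) / billed_egress_bytes
```

If that fraction grew sharply at the upgrade boundary while pod-level transmit stayed flat, that would at least point the finger away from workloads and toward node/control plane traffic.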