CVE reduction gone wrong: 2GB container images deployed and audited in production
Our security team decided to tackle our CVE backlog by building minimal container images. In practice, "minimal" ended up meaning strip everything, then add it all back when builds started failing. We shipped 2GB images to production last month.
A compliance auditor showed up yesterday for a routine check and asked why our container images were the size of small VMs. I had to explain to our CTO why our CVE reduction effort tripled deployment bandwidth and made our security posture look worse on paper than before we started.
We didn't catch it ourselves because everything worked. Images deployed, services ran, CVE numbers went down. Nobody checked actual image size because that wasn't the metric we were watching. The debug utilities and build dependencies that crept back in during troubleshooting just stayed there.
Pull times went from 2 minutes to 8. That showed up in our deploy metrics, but we blamed the registry.
The thing I keep coming back to is that we had no automated check on image composition after the build. CVE count was the only signal we were watching and it told us we were fine.
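For concreteness, here is a minimal sketch of the kind of post-build gate that was missing. It assumes the CI runner has the Docker CLI and that some earlier step has written the image's installed package names to a text file, one per line. The names `MAX_IMAGE_BYTES`, `FORBIDDEN_PACKAGES`, `check_image`, the 300 MiB budget, and the denylist contents are all hypothetical placeholders, not something we actually run.

```python
import subprocess
import sys

# Hypothetical size budget and denylist; both are assumptions to tune per image.
MAX_IMAGE_BYTES = 300 * 1024 * 1024
FORBIDDEN_PACKAGES = {"gcc", "make", "gdb", "strace"}


def check_image(size_bytes, installed_packages,
                max_bytes=MAX_IMAGE_BYTES, forbidden=FORBIDDEN_PACKAGES):
    """Return a list of violation messages; an empty list means the image passes."""
    violations = []
    if size_bytes > max_bytes:
        violations.append(f"image is {size_bytes} bytes, budget is {max_bytes}")
    # Build tools and debug utilities that should not ship in a runtime image.
    leaked = sorted(set(installed_packages) & set(forbidden))
    if leaked:
        violations.append("forbidden packages present: " + ", ".join(leaked))
    return violations


def image_size_bytes(image):
    """Read the image size in bytes using `docker image inspect`."""
    result = subprocess.run(
        ["docker", "image", "inspect", image, "--format", "{{.Size}}"],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def main(argv):
    """Expects argv = [script, image reference, path to package list file]."""
    image, package_file = argv[1], argv[2]
    with open(package_file) as f:
        packages = [line.strip() for line in f if line.strip()]
    problems = check_image(image_size_bytes(image), packages)
    for problem in problems:
        print("FAIL:", problem)
    return 1 if problems else 0
```

A CI step would call `sys.exit(main(sys.argv))` right after the build, so a non-zero exit blocks the push. A check like this would have failed our 2GB images on size alone, independent of what the CVE scanner reported.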
Has anyone actually solved image composition validation in CI? I'm looking for something that catches bloat before it reaches production, not something that only counts CVEs.