A simple AI agent override mistake wiped out our ART metrics improvement
Still can't believe I did this. We rolled out a new AI agent setup a couple of months ago for tier 1 tickets. It's supposed to auto-resolve simple stuff like password resets and basic app crashes, cutting average resolution time (ART) from 45 minutes down to under 5, per early reports. The whole point was compressing time to value on every employee request, and management loves the dashboards showing SLAs green across the board.
I was tweaking permissions yesterday because some high-priority incidents were getting stuck in the queue. The agent was too aggressive on P2s, so I wrote a quick bulk update script that pulled back a few hundred open tickets from last week across a couple of categories. Tested on staging first, everything fine. But I was rushing, end of day Friday, brain-dead from back-to-back meetings, and hit the prod endpoint instead.
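For anyone who wants to avoid the same thing, here's a rough sketch of the kind of guard I should have had on that script. It's not what I actually ran. The hostname, the `guard_bulk_update` function, and the `prod-<count>` confirmation string are all made up for illustration. The idea is just dry-run by default, plus a typed confirmation before anything live touches prod.

```python
from typing import Optional, Sequence
from urllib.parse import urlparse

# Hypothetical production hostname; swap in whatever your real one is.
PROD_HOSTS = {"servicedesk.example.com"}


class ProdGuardError(RuntimeError):
    """Raised when a live bulk run targets prod without confirmation."""


def is_prod(base_url: str) -> bool:
    # Compare on the parsed hostname so paths and ports don't matter.
    return urlparse(base_url).hostname in PROD_HOSTS


def guard_bulk_update(
    base_url: str,
    ticket_ids: Sequence[int],
    dry_run: bool = True,
    confirm: Optional[str] = None,
) -> dict:
    """Decide whether a bulk run may proceed; performs no network calls."""
    count = len(ticket_ids)
    if dry_run:
        # Default path: only report what would be touched.
        return {"mode": "dry-run", "target": base_url, "tickets": count}
    if is_prod(base_url):
        # Live run on prod needs a confirmation that includes the count,
        # so you have to look at how many tickets you are about to change.
        expected = f"prod-{count}"
        if confirm != expected:
            raise ProdGuardError(
                f"refusing to touch {count} prod tickets; "
                f"pass confirm={expected!r} to proceed"
            )
    return {"mode": "live", "target": base_url, "tickets": count}
```

With something like this, a tired Friday run against the wrong URL either stays a dry run or blows up with an error that spells out how many prod tickets it was about to change.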
The script ran in 90 seconds and marked every matching ticket as resolved with the canned note from the agent: 'instant intervention complete user notified'. ART plummeted overnight from a 12-minute average to 2.3 minutes. It looks amazing at first glance until you dig in. That's roughly an 80% reduction, but now 800 tickets show as resolved with zero human touch, including around 60 serious cases like broken payroll access and CRM outages.
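Quick sanity check on that percentage, plus a toy illustration of why the number is meaningless. The 12 and 2.3 are from the dashboard. The three-ticket sample and the `zero_touch` flag are invented, and I'm assuming for the toy that bulk-closed tickets get counted at 0 minutes, which may not match how your dashboard computes ART.

```python
from statistics import mean


def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


def art_minutes(tickets, include_zero_touch=True):
    """Mean resolution time over (minutes, zero_touch) pairs."""
    durations = [
        minutes
        for minutes, zero_touch in tickets
        if include_zero_touch or not zero_touch
    ]
    return mean(durations)


# Dashboard figures from the post.
print(round(pct_reduction(12, 2.3), 1))  # 80.8

# Made-up sample: three real resolutions plus three bulk closures.
sample = [
    (10.0, False), (12.0, False), (14.0, False),
    (0.0, True), (0.0, True), (0.0, True),
]
print(art_minutes(sample))                            # 6.0
print(art_minutes(sample, include_zero_touch=False))  # 12.0
```

The real work didn't get any faster in that sample. The average halves only because closures that nobody worked are counted in it.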
Morning meeting: the CTO pulls up the metrics dashboard, screaming about how ART never looked this good, but the finance director is furious because their month-end reports are gone. Service desk phones are melting down with employees calling back to say their issues vanished. SLAs technically hit, but the audit trail shows my ID did a bulk closure on everything. Now I'm scrambling to reopen them without triggering false alerts or double-counting stats.
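Here's roughly how I'm thinking about the reopen so it's safe to rerun and doesn't pollute the stats a second time. Everything here is hypothetical: the `Ticket` fields, the `exclude_from_art` flag, and the `plan_reopen` helper are stand-ins, not our ticketing system's real API. It only builds a plan in memory and doesn't call anything.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List

# The canned note the bulk script stamped on every ticket it closed.
CANNED_NOTE = "instant intervention complete user notified"


@dataclass(frozen=True)
class Ticket:
    # Hypothetical fields; a real ticketing schema will differ.
    id: int
    status: str
    closed_by: str
    note: str
    exclude_from_art: bool = False


def plan_reopen(tickets: Iterable[Ticket], actor_id: str) -> List[Ticket]:
    """Return reopened copies of tickets closed by the bad bulk run.

    A ticket qualifies only if it is still resolved, was closed by
    `actor_id`, and carries the canned note. Already-reopened tickets
    no longer match, so running the plan twice changes nothing more.
    """
    plan = []
    for t in tickets:
        hit = (
            t.status == "resolved"
            and t.closed_by == actor_id
            and t.note == CANNED_NOTE
        )
        if hit:
            # Flag the bogus closure so it can be filtered out of ART
            # instead of being counted once now and again on re-close.
            plan.append(replace(t, status="open", exclude_from_art=True))
    return plan
```

Matching on both the closing ID and the canned note is meant to keep tickets that a human legitimately resolved in the same window out of the reopen list.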
The team is pissed I bypassed QA, my manager wants a post-mortem ASAP, and now legal is asking about compliance since some were security tickets. We can recover most of the data, but the embarrassment is killing me. Has anyone nuked their core metrics like this with AI overrides, and how bad does this usually blow up?