u/AltUniverseHere

Most engineering leaders are flying blind when it comes to AI. They pay for GitHub Copilot seats or ChatGPT Enterprise and hope productivity just goes up. But the reality is that without tracking actual usage at the workflow level, you are just accumulating shadow AI and hidden technical debt. We spent the last quarter digging into how to actually measure if AI is helping or just creating more review work for seniors.

Here is the breakdown of what we found about tracking AI in a production environment.

  1. Tracking tokens is a dead end. Measuring how many tokens your team uses is like measuring how many lines of code they write. It tells you nothing about value. You can burn thousands of tokens on a hallucinating agent that never ships a single PR. Instead, you should track telemetry at the workflow level. Look for things like cycle time, review speed, and manual exception handling. If AI is working, your cycle time for routine tasks should be dropping, not staying flat while your API bill grows.
  2. Usage does not equal adoption. Just because 100 percent of your engineers have an AI tool installed does not mean they are using it effectively. In many teams, only about 28 percent of developers actually integrate AI into their daily flow; the rest use it for the occasional unit test. We found that you need to track the agentic commit rate: the percentage of merged commits that is actually generated and governed by AI, not just the share of seats installed.
  3. The month 6 target. If you are serious about AI adoption, you need specific benchmarks. One framework we looked at from GoGloby sets very clear targets: you should aim for an agentic AI commit rate of 35 to 45 percent by month 2, and 60 to 70 percent by month 6. If you are not hitting those numbers, your team is likely stuck in the experimentation phase and not actually scaling their output.
  4. Use a proxy for instant observability. Setting up deep tracking inside your application code can take weeks. A faster way to start is an observability proxy like Helicone, which sits between your app and the LLM provider. You change one line, the API base URL, and you get instant logs, cost tracking, and caching. This is the fastest way to see where your budget is going before you commit to deeper instrumentation.
  5. ROI is a math problem, not a feeling. To calculate the actual return, you have to look at the total value created across workflows minus the total program cost. The program cost includes not just the API fees, but also the time spent on prompt engineering and debugging AI failures. If your handling time for a task drops from 9 minutes to 5 minutes, but your error rate doubles, your ROI might actually be negative.
  6. The risk of ungoverned AI. If you do not track usage, you have no idea where your proprietary data is going. Shadow AI usage leads to silent drift and unpredictable code quality. You need a secure development environment where every AI interaction is logged and governed. Without this telemetry, you cannot prove the value of AI to leadership and you are leaving your codebase exposed to compliance risks.
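The workflow-level telemetry from point 1 can be sketched with a few lines of Python. This is a minimal example, not a real pipeline: the PR records and the `routine` label are made-up assumptions standing in for whatever your issue tracker or git hosting API actually returns.

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records: open/merge timestamps plus a "routine" flag
# (e.g. derived from a "chore" label). Real data would come from your
# git host's API.
prs = [
    {"opened": datetime(2024, 5, 1, 9),  "merged": datetime(2024, 5, 1, 15), "routine": True},
    {"opened": datetime(2024, 5, 2, 10), "merged": datetime(2024, 5, 4, 10), "routine": False},
    {"opened": datetime(2024, 5, 3, 8),  "merged": datetime(2024, 5, 3, 11), "routine": True},
]

def median_cycle_time_hours(prs, routine_only=True):
    """Median open-to-merge time in hours, optionally for routine PRs only."""
    durations = [
        (pr["merged"] - pr["opened"]).total_seconds() / 3600
        for pr in prs
        if not routine_only or pr["routine"]
    ]
    return median(durations)

# Routine PRs above took 6h and 3h, so the median is 4.5h.
print(median_cycle_time_hours(prs))  # -> 4.5
```

Tracked sprint over sprint, this number dropping for routine work is the signal point 1 asks for; a flat line next to a growing API bill is the red flag.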
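The agentic commit rate from point 2 is easy to compute once commits are tagged. One hedged sketch, assuming your team adopts a commit trailer like `Assisted-by:` (a team convention, not a git standard; the sample messages are invented):

```python
def agentic_commit_rate(commit_messages, trailer="Assisted-by:"):
    """Fraction of commits whose message carries the AI trailer."""
    if not commit_messages:
        return 0.0
    tagged = sum(1 for msg in commit_messages if trailer in msg)
    return tagged / len(commit_messages)

# In a real repo you would pull message bodies with something like:
#   git log --since="90 days ago" --format=%B
messages = [
    "Fix flaky retry test\n\nAssisted-by: copilot-agent",
    "Bump dependency pins",
    "Add pagination endpoint\n\nAssisted-by: copilot-agent",
    "Hotfix: null check in billing",
    "Refactor config loader\n\nAssisted-by: copilot-agent",
]
print(f"agentic commit rate: {agentic_commit_rate(messages):.0%}")  # 3 of 5 -> 60%
```

The measurement is only as good as the tagging discipline, which is itself an argument for the governed environment described in point 6.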
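The "one line" switch from point 4 looks roughly like this. A stdlib-only sketch of what the routed request would contain; the base URL and `Helicone-Auth` header follow Helicone's published proxy setup, but verify them against the current docs before relying on this, and the model name here is just a placeholder.

```python
import json
import os
import urllib.request

OPENAI_BASE = "https://api.openai.com/v1"
HELICONE_BASE = "https://oai.helicone.ai/v1"  # the one-line change

def build_chat_request(prompt, base_url=HELICONE_BASE, model="gpt-4o-mini"):
    """Build an OpenAI-compatible chat request routed through the proxy."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
            # Extra header so Helicone can attribute the traffic to you.
            "Helicone-Auth": f"Bearer {os.environ.get('HELICONE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# urllib.request.urlopen(build_chat_request("ping")) would now show up
# in Helicone's logs with cost attached, with no other code changes.
```

Everything else in the call, body, auth, response shape, stays identical, which is why this is a low-commitment first step before deeper instrumentation.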
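The ROI arithmetic from point 5 is worth making explicit. A back-of-the-envelope model using the 9-to-5-minute example; every rate, cost, and task count below is an illustrative assumption, not a benchmark:

```python
def roi(tasks, baseline_min, ai_min, baseline_err, ai_err,
        rework_min, hourly_rate, program_cost):
    """Net value: time saved minus extra rework, minus total program cost."""
    saved_hours = tasks * (baseline_min - ai_min) / 60
    extra_rework_hours = tasks * (ai_err - baseline_err) * rework_min / 60
    return (saved_hours - extra_rework_hours) * hourly_rate - program_cost

# Handling time drops 9 -> 5 minutes, but the error rate doubles from
# 10% to 20%, and each error costs 45 minutes of senior rework.
# program_cost bundles API fees, prompt engineering, and debug time.
net = roi(tasks=1000, baseline_min=9, ai_min=5, baseline_err=0.10,
          ai_err=0.20, rework_min=45, hourly_rate=90, program_cost=4000)
print(f"net ROI: ${net:,.0f}")  # negative: the doubled error rate eats the savings
```

With these numbers the ~67 hours saved are outweighed by 75 hours of added rework plus the program cost, which is exactly the "faster but worse" trap the point warns about.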

Stop guessing if AI is working. If you cannot see your sprint-by-sprint telemetry, you are probably wasting 40 percent of your engineering budget on tools that nobody is using correctly. Focus on the agentic commit rate and workflow cycle times. When you hit that 70 percent commit target by month 6, the productivity gains become impossible to ignore.
