I'm building in the agent observability space and trying to get a real picture from people actually running agents in production, not the theoretical version.
Three questions:
- Last time an agent did something unexpected in prod, what tipped you off? Customer report, dashboard, manual review, something else?
- What's your current monitoring setup for agent behavior, if you have one?
- Where do your evals tend to miss real issues?
Not selling anything in the comments, just trying to understand where the actual gaps are.