u/FormExtension7920

Building in the agent observability space and trying to get a real picture from people actually running this stuff in production, not the theoretical version.

Three questions:

  1. Last time an agent did something unexpected in prod, what tipped you off? Customer report, dashboard, manual review, something else?
  2. What's your current monitoring setup for agent behavior, if you have one?
  3. Where do your evals tend to miss real issues?

Not selling anything in the comments; just trying to understand where the actual gaps are.

reddit.com
u/FormExtension7920 — 14 days ago

Hey all, I'm building in the AI observability space and trying to understand what actually sucks about the current tools before I add more of the same to the pile.

Some stuff I keep hearing:

- Evals only catch what you already knew to look for

- Dashboards look healthy while agents quietly degrade

- Setup is heavy; you end up instrumenting forever

- Pricing scales in weird ways with trace volume

What's actually been your experience? Specifically:

  1. A failure mode that slipped through your current tooling and that you only caught from a user complaint

  2. If you could wave a wand and fix one thing about your setup, what would it be

  3. What made you switch tools, or stop using one entirely

Trying to learn what's broken. Happy to share back what I find.

u/FormExtension7920 — 15 days ago