u/Fabulous-Bite8265

How do you actually debug your AI agents?

I've been running AI agents in production for 6 months (Cursor, Claude Code, custom Mastra pipelines) and debugging them is still a nightmare.

Last week alone:

- An agent silently hallucinated a config value; I only caught it two days later.

- A regression after a prompt update, with no idea when it broke.

- $80 in API costs on a task I expected to cost $8.
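For context, the kind of stopgap I've been bolting on looks roughly like this (stdlib-only sketch, all names are mine, nothing tied to any particular framework): validate agent-produced config against an allowlist instead of trusting it, and hard-stop on a per-task budget instead of finding out from the bill.

```python
# Guardrail sketch: catch hallucinated config keys and runaway spend early.

ALLOWED_CONFIG_KEYS = {"model", "temperature", "max_tokens"}

def validate_config(config: dict) -> list[str]:
    """Return a list of problems instead of silently accepting bad config."""
    problems = [f"unknown key: {k}" for k in config if k not in ALLOWED_CONFIG_KEYS]
    if not isinstance(config.get("temperature", 0.0), (int, float)):
        problems.append("temperature is not numeric")
    return problems

class CostGuard:
    """Accumulate spend per task and raise before it runs away."""
    def __init__(self, budget_usd: float):
        self.budget = budget_usd
        self.spent = 0.0

    def record(self, usd: float) -> None:
        self.spent += usd
        if self.spent > self.budget:
            raise RuntimeError(
                f"budget exceeded: ${self.spent:.2f} > ${self.budget:.2f}"
            )

guard = CostGuard(budget_usd=8.00)
guard.record(3.50)  # within budget, no complaint

# Flags the typo'd key the agent hallucinated instead of swallowing it.
print(validate_config({"model": "gpt-x", "tempreture": 0.7}))
```

It's crude, but it turns "caught it two days later" into "caught it on the call that produced it".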

I'm spending more time reading logs than actually building.

How are you handling this? Are you manually reviewing outputs? Did you build something internally? Or have you given up and accepted the chaos?

Genuinely curious if this is just me or if it's a shared pain.

u/Fabulous-Bite8265 — 6 days ago