
I've noticed that agents look very tempting for cost saving and scaling, but most of you aren't adopting them because of the possibility of them messing things up. I've been following this and many other subs around LLMs and agents, and everything from the top posts to the most recent is about agents going off and doing something they're not supposed to do: drifting and ignoring the system prompt. Real examples:
- "Never delete user data" → agent calls `DROP TABLE users` the next turn
- "Don't share internal pricing" → agent leaks cost basis to a customer
- "Verify identity first" → agent skips to the action
- Add 10 more rules → model quietly drops the first 5
I am sure that if you have run agents in prod, this has happened to you (especially as your system prompts and context get larger). You can test this yourself and watch the rules break almost immediately.
Prompt-based rules are suggestions, not constraints. Re-prompting fixes one case and breaks two. Post-hoc evals only tell you what already went wrong. NeMo Guardrails and Guardrails AI help with content safety but don't cover business logic or your own specification.
After tackling this from a few angles, I finally got something solid: a proxy that sits between your app and your LLM, reads rules from a plain markdown file, and enforces them at runtime. Provider-agnostic, one base-URL change, works with LangGraph/CrewAI/custom stacks. Example rules:
- Maximum discount is 15%.
- Never reveal internal pricing or cost basis.
Without it: the agent offers 90% off and mentions your margin. With it: 15% max, no margin talk.
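To make the idea concrete, here's a minimal sketch of what runtime enforcement of those two rules could look like. This is not the actual tool; the rule format (markdown bullets), function names, and the regex-based discount check are all my own assumptions for illustration:

```python
import re

MAX_DISCOUNT = 15  # from the rule "Maximum discount is 15%."

def load_rules(markdown_text: str) -> list[str]:
    """Collect '- ' bullet lines from a markdown rules file."""
    return [line[2:].strip() for line in markdown_text.splitlines()
            if line.startswith("- ")]

def check_discount(response: str, max_pct: int = MAX_DISCOUNT) -> tuple[bool, str]:
    """Block a model response that offers a discount above the cap."""
    for m in re.finditer(r"(\d+)\s*%\s*(?:off|discount)", response, re.IGNORECASE):
        if int(m.group(1)) > max_pct:
            return False, f"rule violated: discount {m.group(1)}% > {max_pct}%"
    return True, "ok"

rules = load_rules(
    "- Maximum discount is 15%.\n"
    "- Never reveal internal pricing or cost basis."
)
ok, reason = check_discount("Sure, I can give you 90% off today!")
print(ok, reason)  # False rule violated: discount 90% > 15%
```

In the proxy setup described above, a check like this would run on every response before it reaches the user, and the app itself only changes its client's base URL to point at the proxy instead of the provider.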
Curious whether this kind of approach would stop your LLMs from outputting incorrect stuff or keep your agents from going off track; it definitely did for my (specific) use cases.
What's everyone doing for this in prod? Shadow evals? Re-prompt loops? Something I'm missing?