u/Cloaky233

Last week at my day job, I watched a customer support agent, an AI one, confidently issue a refund using a policy that had changed three weeks earlier. Nobody told the agent. The doc that defined the policy was last updated in 2024. The actual current rule was buried in a Slack thread from a Tuesday standup. The agent did exactly what its training said to do. The training was wrong because the company had moved on without it.

This is not a model problem. The model is fine. The model is, frankly, smarter than half my team. The problem is that the agent has no idea how this specific company actually works.

Think about what we let AI agents do today. They write production code. They open pull requests. They debug stack traces at 3am. They have full access to our codebase and they ship straight to main.

But ask the same agent: "should we approve this pricing exception for Customer X?" and it has nothing. No context. No history. No idea that we already gave that customer two exceptions this quarter, and that a third would tank our margin model. The agent that just shipped 400 lines of Rust into production cannot answer a question my junior PM answers in 10 seconds.

Why?

Because the knowledge a company actually runs on doesn't live in code. It lives in:

  • A Slack thread from last Tuesday where the founder said "no more refunds above $50 without my approval."
  • The senior support rep who's been there four years and just knows which customers get the legacy flow.
  • A Notion doc someone started in 2024 and abandoned in March.
  • The retro after that one bad incident where the team quietly agreed to never page Marcus again on weekends.

Every company runs on hundreds of these. They are the actual operating manual. They never get written down because they change every week. The doc would be stale by the time you finished writing it.

So I'm building something I think every company is going to need: NanoEval.

A layer that watches how a company actually decides things, in real time, from where the decisions actually happen: Slack, tickets, meetings, AI sessions, the whole exhaust. It extracts what I'm calling semantic variables (the rules and thresholds the company decides on) and tracks them on a graph that updates over time.

When the refund threshold changes after Tuesday's standup, the graph updates. Wednesday's agent runs on the new rule. The graph can also answer "what did we believe two weeks ago and why did we change it." Documents can't do that. Documents lie about being current.
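
To make "semantic variable" concrete, here's a minimal sketch of what one tracked rule could look like, with the as-of lookup described above. Every name in it (VariableVersion, as_of, the refund example and its sources) is my illustration, not NanoEval's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class VariableVersion:
    """One observed value of a semantic variable, with provenance."""
    value: str             # the rule as believed at that point in time
    source: str            # where the decision happened (e.g. a Slack permalink)
    observed_at: datetime  # when it was decided
    reason: str            # why it changed, if anyone said

# The graph node keeps every version instead of overwriting, so
# "what did we believe two weeks ago and why did we change it"
# is just a lookup. History is oldest-first.
refund_approval_threshold = [
    VariableVersion(
        value="refunds above $100 auto-approved",
        source="notion://policies/refunds",
        observed_at=datetime(2024, 11, 2, tzinfo=timezone.utc),
        reason="original policy doc",
    ),
    VariableVersion(
        value="no refunds above $50 without founder approval",
        source="slack://support-team/tuesday-standup",
        observed_at=datetime(2025, 2, 11, tzinfo=timezone.utc),
        reason="founder call in Tuesday standup",
    ),
]

def as_of(history: list[VariableVersion], when: datetime) -> VariableVersion | None:
    """Return the version that was current at `when`, or None if the
    variable didn't exist yet."""
    current = None
    for version in sorted(history, key=lambda v: v.observed_at):
        if version.observed_at <= when:
            current = version
    return current

# Wednesday's agent gets the new rule; an audit of January gets the old one.
print(as_of(refund_approval_threshold, datetime.now(timezone.utc)).value)
print(as_of(refund_approval_threshold, datetime(2025, 1, 1, tzinfo=timezone.utc)).value)
```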

The output ships as a skills file in the agentskills.io / MCP format. Any agent (Claude, Cursor, ChatGPT, your in-house one) calls the same brain. They stop guessing.
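
Here's a hedged sketch of what the MCP end of that could look like, using the official Python SDK's FastMCP server. The server name, the get_rule tool, and the in-memory store are all placeholders I made up; the real thing would query the temporal graph.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("company-brain")

# Stand-in for the temporal graph: current values only.
CURRENT_RULES = {
    "refund_approval_threshold": {
        "value": "no refunds above $50 without founder approval",
        "source": "slack://support-team/tuesday-standup",
        "updated": "2025-02-11",
    },
}

@mcp.tool()
def get_rule(name: str) -> str:
    """Return the current value of a semantic variable with provenance,
    so the calling agent can cite where the rule came from."""
    rule = CURRENT_RULES.get(name)
    if rule is None:
        return f"unknown variable: {name}"
    return f"{rule['value']} (source: {rule['source']}, updated {rule['updated']})"

if __name__ == "__main__":
    mcp.run()
```

Every agent connects to the one server, so when Tuesday's standup moves the threshold, Wednesday's Claude session and Wednesday's in-house agent read the same updated rule.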

And here's the actual point. Once an agent has this, it's no longer just a builder. It can sit in on the decisions too. It can flag when a proposed pricing exception contradicts a rule from last sprint. It can notice that the same question got asked three times in three different Slack channels this week. It can tell you that the runbook your on-call agent is following was deprecated in March.
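
The flagging itself is almost trivial once the variables exist. A toy sketch, assuming the extraction problem is solved elsewhere (the regex here is a stand-in for the fine-tuned extractor I mention below):

```python
import re

# Current belief, pulled from the graph as of today.
REFUND_LIMIT_USD = 50  # "no refunds above $50 without founder approval"

def flag_contradiction(proposal: str) -> str | None:
    """Return a warning if a proposed action exceeds the current refund
    limit, citing the rule's origin. Amount parsing is deliberately naive."""
    match = re.search(r"\$(\d+(?:\.\d{1,2})?)", proposal)
    if match and float(match.group(1)) > REFUND_LIMIT_USD:
        return (f"Conflicts with refund_approval_threshold set in Tuesday's "
                f"standup: amounts above ${REFUND_LIMIT_USD} need founder approval.")
    return None

print(flag_contradiction("Approve a $120 refund for Customer X"))
```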

AI already builds with us. The next frontier is AI deciding with us. Not replacing the humans, augmenting the room. The thing in the corner of the meeting that quietly knows the entire history of every decision the company has ever made.

I'm one week in. Pre-MVP. Solo. Building from Bangalore. Fine-tuning small open-weight models for the variable extraction because foundation models over RAG hallucinate process. Temporal graph architecture is settling. First prototype in 4 weeks.

If you've deployed an AI agent at your company and watched it confidently do the wrong thing, you already know why this needs to exist.

If you want to follow what I'm building or be a design partner, my DMs are open. If you want to tell me I'm wrong, also open. Both are useful.
