So I've been sitting in on a lot of ERP AI chatbot scoping conversations lately, some for clients, some for people just starting to evaluate, and there's this one thing I keep seeing that genuinely makes me uncomfortable every time.
A team gets a demo. The chatbot looks incredible in the demo; it answers cross-system questions, pulls live data, triggers approvals, and handles follow-ups in context. Everyone in the room is excited. They sign.
And then somewhere between month four and month eight, someone on the operations team quietly mentions that employees are still submitting tickets for the same queries the chatbot was supposed to handle. The chatbot in the demo ran against a 200-row test dataset; the production ERP had 11 years of transaction history and three custom modules the vendors hadn't seen. And the IT person in the room goes quiet because they already knew.
The demo chatbot retrieved data from a clean, prepped environment.
The production chatbot was connected to the actual ERP: the one with seven years of custom modules, a Salesforce instance, a data warehouse nobody documented properly, and a legacy approval workflow that only three people in the company fully understand.
Those are not the same build. The proposal treated them like they were.
What I've started realising is that there are basically two types of ERP AI chatbots, and vendors don't volunteer which one they're actually scoping. One reads your ERP data. One acts on it: triggers workflows, executes approvals, and escalates vendor SLA breaches without someone manually catching them. The first one saves an employee a few minutes per query. The second one removes entire process steps. The price difference in the proposal is not proportional to the capability difference in production.
And the gap almost never shows up at launch. It shows up at the six-month adoption review when daily active usage is 20% of what was projected, and nobody can clearly explain why.
From what I've seen, the questions that actually expose this before you sign anything:
- Did a person with real ERP engineering experience review this scope, or just an AI product team?
- Does the proposal include post-launch model retraining, or does it stop at go-live?
- What happens when an employee's query falls outside the training data?
A good team will walk you through the failure mode. A team that hasn't actually built inside your ERP will give you a very confident non-answer.
Anyway. Curious if anyone else has been through this. What was the gap between what got demoed and what got deployed? Was it the cross-system queries? The compliance architecture that got added as an afterthought? The retraining that was supposed to be quarterly and never happened?
What actually happened vs. what the proposal said.