u/Best_District593

So I've been sitting in on a lot of ERP AI chatbot scoping conversations lately, some for clients, some for people just starting to evaluate, and there's this one thing I keep seeing that genuinely makes me uncomfortable every time.

A team gets a demo. The chatbot looks incredible in the demo; it answers cross-system questions, pulls live data, triggers approvals, and handles follow-ups in context. Everyone in the room is excited. They sign.

And then somewhere between month four and month eight, someone on the operations team quietly mentions that employees are still submitting tickets for the same queries the chatbot was supposed to handle. The chatbot in the demo ran against a 200-row test dataset; the production ERP had 11 years of transaction history and three custom modules the vendors hadn't seen. And the IT person in the room goes quiet because they already knew.

The demo chatbot retrieved data from a clean, prepped environment.

The production chatbot was connected to the actual ERP: the one with seven years of custom modules, a Salesforce instance, a data warehouse nobody documented properly, and a legacy approval workflow that only three people in the company fully understand.

Those are not the same build. The proposal treated them like they were.

What I've started realising is that there are basically two types of ERP AI chatbots, and vendors don't volunteer which one they're actually scoping. One reads your ERP data. One acts on it: it triggers workflows, executes approvals, and escalates vendor SLA breaches without someone manually catching them. The first one saves an employee a few minutes per query. The second one removes entire process steps. The price difference in the proposal is not proportional to the capability difference in production.
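To make the read-vs-act split concrete, here's a rough Python sketch. The connector objects and method names are made up for illustration; they're not any vendor's actual API.

```python
# Hypothetical connector objects -- illustrative only, not a real ERP API.

def read_only_query(erp, question):
    """Type 1: retrieves and summarizes data. No side effects."""
    rows = erp.fetch(question)  # read path only
    return f"{len(rows)} matching records"

def action_query(erp, workflow, request):
    """Type 2: triggers a workflow step, so it needs authorization
    and audit guardrails the read-only build never had to consider."""
    if not workflow.is_authorized(request.user, request.action):
        return "escalated to a human approver"  # fail closed
    workflow.execute(request.action, request.payload)
    erp.log_audit(request)  # record who did what, and when
    return "executed"
```

The second function is where the price difference lives: authorization, execution, and audit logging all have to exist before the bot touches a live workflow.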

And the gap almost never shows up at launch. It shows up at the six-month adoption review when daily active usage is 20% of what was projected, and nobody can clearly explain why.

From what I've seen, the questions that actually expose this before you sign anything:

  • Did a person with real ERP engineering experience review this scope, or just an AI product team?
  • Does the proposal include post-launch model retraining, or does it stop at go-live?
  • What happens when an employee's query falls outside the training data?

A good team will walk you through the failure mode. A team that hasn't actually built inside your ERP will give you a very confident non-answer.

Anyway. Curious if anyone else has been through this. What was the gap between what got demoed and what got deployed? Was it the cross-system queries? The compliance architecture that got added as an afterthought? The retraining that was supposed to be quarterly and never happened?

What actually happened vs. what the proposal said.

reddit.com
u/Best_District593 — 8 days ago

Week ten of a twelve-week build. We're doing a query audit, the kind we should have done in week two, and we realise that roughly 40% of what the client's employees actually need to ask involves data that SAP Joule literally cannot see.

Their Salesforce instance. A data warehouse they'd been running since 2017. Both are completely outside Joule's reach.

We'd proposed Joule because the client was on S/4HANA Cloud, a clean single-vendor stack on paper, and Joule deploys fast for standard in-SAP queries. What we hadn't mapped properly was where their real query volume actually came from. Finance querying ERP data? Fine. Operations wanting to cross-reference CRM history with inventory status? Invisible.

So we went back and rebuilt the connector layer around a RAG approach over their OData layer. Three weeks added to the timeline. The client was decent about it; we'd caught it before deployment, so it wasn't a production disaster, just a painful rework conversation.

The thing that bothers me more than the three weeks: we had enough information to catch this in week one if we'd run the query audit then. The client gave us their top 50 employee requests in the kickoff doc. I looked at it again after week ten, and the cross-system stuff was right there. I just didn't weigh it properly when scoping the integration approach.
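For what it's worth, the audit itself doesn't need tooling; a throwaway script over the kickoff doc would have surfaced it in week one. Something like this, where the keyword lists are obviously illustrative, not a real taxonomy:

```python
# Rough week-one query audit: tag each employee request by which
# systems it touches, then count how many need more than the ERP alone.
# Keyword lists are invented for the example.

SYSTEM_HINTS = {
    "erp": ["invoice", "purchase order", "inventory", "gl account"],
    "crm": ["opportunity", "account history", "pipeline", "contact"],
    "warehouse": ["2019 report", "historical trend", "archived"],
}

def systems_touched(request):
    text = request.lower()
    return {sys for sys, words in SYSTEM_HINTS.items()
            if any(w in text for w in words)}

def cross_system_share(requests):
    """Fraction of requests that need data from more than one system."""
    multi = sum(1 for r in requests if len(systems_touched(r)) > 1)
    return multi / len(requests)
```

If that fraction comes back at 40%, you know before the architecture call that a single-platform copilot can't carry the load.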

The deployed version works. Adoption is actually decent. We pushed it through Teams, which helped a lot; people didn't have to change where they worked. The finance use case took a while to click, but it did.

The retraining cadence I'm less confident about. We went monthly, failure-case-driven. Client's finance team uses enough internal terminology that I think bi-weekly for the first four months would've gotten intent recognition sharper, faster. Hard to know. We didn't run the counterfactual.

Curious if anyone else has hit the query-type coverage problem on multi-platform environments, specifically where the client thinks they're running a clean single-ERP stack but their real workflows are pulling from three systems.

How early are you mapping that before it affects the architecture call?

reddit.com
u/Best_District593 — 8 days ago

We were twelve weeks into an SAP S/4HANA chatbot build when we realized we had scoped the wrong architecture.

The client wanted cross-system queries — ERP data combined with their Salesforce instance and a data warehouse they'd been running for seven years. The proposal we started with used SAP Joule as the base layer.

Joule is a good product. It's built natively into the SAP environment, deploys fast, and handles standard in-SAP queries without custom connectors. What it doesn't do well is reach outside the SAP ecosystem. The CRM and warehouse data were invisible to it, which meant the multi-system queries accounting for about 40% of what the client's employees actually needed to ask simply couldn't be answered.

We caught it before deployment, but only because we did a real query audit in week ten instead of week one, which was our mistake. We went back, redesigned the connector layer around a RAG approach over an OData layer, and added three weeks to the timeline.
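The retrieval side of that rework, reduced to its shape: flatten records pulled from the OData layer into tagged text chunks, then match chunks to a query. Term overlap stands in here for the embedding search a real build would use, and the field names and records are invented for the example:

```python
def _tokens(text):
    # Normalize: lowercase, strip the punctuation that flattened
    # field dumps produce ("material:", "pump-01,", "[s4hana]").
    return {t.strip(":,.[]").lower() for t in text.split()}

def record_to_chunk(source, record):
    """Flatten one OData entity into a text chunk, keeping the source
    system tag so answers can say where the data came from."""
    fields = ", ".join(f"{k}: {v}" for k, v in record.items())
    return f"[{source}] {fields}"

def retrieve(chunks, query, k=2):
    """Return the k chunks sharing the most terms with the query."""
    q = _tokens(query)
    return sorted(chunks, key=lambda c: len(q & _tokens(c)), reverse=True)[:k]
```

The point of the sketch is the pipeline shape: because every system gets flattened into the same chunk format before retrieval, Salesforce and warehouse records sit next to S/4HANA records, which is exactly what Joule couldn't give us.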

The deployed version handles cross-system queries now. Adoption is actually fine. But I think about how that conversation with the client would have gone if we'd launched the Joule version and they'd discovered the gap in production.

The thing I keep seeing in discussions about enterprise AI chatbots is that the native copilot vs. custom RAG decision gets treated as a style preference when it's actually a functional one. They're not interchangeable options for the same use case.

Native copilots (Joule, Oracle Digital Assistant, Copilot Studio) are the right choice if you're running a clean single-platform ERP stack, your use cases stay within that platform, and you want fast time-to-value on standard query types.

Custom RAG layers are the right choice if you have cross-system data requirements, you need workflow triggering rather than just data retrieval, or your ERP environment includes a legacy platform with non-standard data schemas.

Most mid-to-large enterprise environments are the second scenario. A lot of vendors still propose the first solution because it's easier to scope and faster to demonstrate.

I'm still not sure we got the retraining cadence right on this one. We built out a monthly model retraining loop based on failure case analysis, but the client's finance team uses enough domain-specific terminology that I wonder if bi-weekly would have produced better intent recognition in months two and three. No clean answer on that yet.

Anyone else navigating this in SAP or Oracle environments specifically?

Curious how others are handling the query-type coverage problem when the client's data environment spans more than one platform.

reddit.com
u/Best_District593 — 8 days ago