Title: OT/ICS people: have you seen an authorized action cause problems because it was valid but unsafe?

I’m trying to understand whether this is a real OT/ICS problem or whether I’m overthinking it.

I’m looking for real examples where:

the person was authorized
the session/access path was approved
the asset was legitimate
the command/change/action was technically valid
but it still caused, almost caused, or could have caused a problem because of timing, sequence, value, process state, or field context

Examples I’m thinking about:

Breaker/switch/pump/valve command issued at the wrong time
Rapid repeated open/close or start/stop commands
Wrong setpoint, threshold, mode, or register value
Vendor had approved remote access but had too much freedom once inside
Protection/automation/PLC logic change that passed normal workflow but was not safe in the real operating context
Interlocks or permissives existed, but did not cover the actual condition
Temporary vendor/maintenance access became permanent and later created risk
Operator or engineer selected the wrong asset or action in an HMI/SCADA system

For people who work around PLCs, SCADA, DCS, substations, water/wastewater, manufacturing, utilities, or industrial controls:

Have you seen this happen in the real world?

I’m especially interested in:

What happened?
What control was supposed to prevent it?
Why did that control fail or not apply?
Was it caught in real time, after the fact, or not at all?
Would any kind of real-time “second check” have helped, or would that be rejected because of uptime/availability risk?

Not looking for company names or sensitive details. Sanitized stories are fine.

I’m also interested in hearing “this is already solved by interlocks/procedures” or “this would never be allowed in a mature environment” if that’s your experience.

I posted a couple days ago asking:

Got a ton of good responses and a pretty clear split:

Camp 1:

“No remote access ever”
Everything on-site
Eliminate the problem entirely

Camp 2:

Remote access is unavoidable (utilities, manufacturing, distributed assets, etc.)
VPN → DMZ / jump host → session recording
Lock it down as much as possible

Both make sense depending on the environment.

What I didn’t expect was how consistent the answers were around what happens after someone gets in.

A few patterns that kept coming up:

1. It turns into trust pretty quickly

Example someone gave:

Vendor connects via a temporary cellular router
Direct to PLC
Save “before” logic, make changes, save “after”

That’s not really control… that’s “we’ll know what happened later if something breaks.”

2. Most controls stop at access, not actions

Even in more mature setups:

MFA, VPN, jump hosts, segmentation
Session recording
Protocol breaks

All solid.

But it’s still mostly:
“you got in the right way, so now do your thing”

3. Lots of monitoring, not much real-time stopping

I saw a lot of:

“we record sessions”
“someone should be watching”
“we can review logs if needed”

Didn’t see much:

“we can actually stop a bad command mid-execution”
“we validate changes against expected behavior in real time”

4. Everyone agrees mistakes are the bigger risk… but we don’t really control for them

One of the best comments was basically:

Feels true.

But most setups don’t actually prevent that mistake — they just make it traceable afterward.

Where I’m stuck (and curious if I’m off here):

Feels like we’re really good at:

Controlling who gets in
Logging what happened after

But there’s a gap in:

Controlling what they’re actually doing while they’re in

Especially in OT where:

A “valid” command can still be dangerous depending on timing / sequence / context
And a lot of damage comes from “authorized” actions, not exploits
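
To make the “valid but unsafe” distinction concrete, here is a rough Python sketch. Everything in it is made up for illustration: the asset name `discharge_valve`, the `ProcessState` fields, and the rule that closing a discharge valve against a running pump is unsafe. A real site would have its own rules.

```python
from dataclasses import dataclass


@dataclass
class ProcessState:
    # Hypothetical snapshot of field context at the moment the command arrives.
    pump_running: bool


@dataclass
class Command:
    asset: str
    action: str  # "open" or "close"


def is_valid(cmd: Command) -> bool:
    """Access-style check: the asset exists and supports this action."""
    return cmd.asset == "discharge_valve" and cmd.action in {"open", "close"}


def is_safe(cmd: Command, state: ProcessState) -> bool:
    """Context check (assumed rule): do not close the valve while the pump runs."""
    if cmd.action == "close" and state.pump_running:
        return False
    return True


cmd = Command(asset="discharge_valve", action="close")
print(is_valid(cmd))                                   # same command, always valid
print(is_safe(cmd, ProcessState(pump_running=False)))  # fine when the pump is off
print(is_safe(cmd, ProcessState(pump_running=True)))   # unsafe when the pump is on
```

The point is that `is_valid` never looks at process state, which is roughly where most access controls stop.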

Question to the people actually dealing with this:

If you allow vendor/remote access today…

Is there anything in your environment that:

Understands commands at the protocol level (not just IP/port/session)
Enforces guardrails in real time
Or blocks “valid but unsafe” actions

Or is it mostly:

access control + segmentation + logging + trust?
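
For what “protocol level” could mean in practice, here is a minimal sketch that decodes a Modbus/TCP Write Single Register request (function code 0x06) into unit, register, and value. The function name `decode_modbus_write` and the sample frame are my own assumptions; a real inspector would need to handle many more function codes, fragmentation, and other protocols.

```python
import struct

WRITE_SINGLE_REGISTER = 0x06


def decode_modbus_write(frame: bytes):
    """Return (unit_id, register, value) for a Write Single Register
    request, or None if the frame is anything else."""
    if len(frame) < 12:
        return None
    # MBAP header: transaction id, protocol id, length, unit id; then function code.
    _tid, protocol_id, _length, unit_id, func = struct.unpack(">HHHBB", frame[:8])
    if protocol_id != 0 or func != WRITE_SINGLE_REGISTER:
        return None
    register, value = struct.unpack(">HH", frame[8:12])
    return unit_id, register, value


# Sample frame (made up): unit 1, write value 1000 to register 16.
frame = bytes.fromhex("000100000006" "01" "06" "0010" "03e8")
print(decode_modbus_write(frame))  # (1, 16, 1000)
```

A firewall rule sees this as “TCP to port 502, allowed”. The decoded tuple is what a guardrail would actually need to reason about.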

(Not a pitch, just thinking out loud)

I’ve been wondering if an in-line approach could work here where:

It understands things like PLC commands
Learns what “normal/safe” looks like
And can stop something before it executes if it’s out of bounds (within strict boundaries + human in the loop)
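
A very rough sketch of that idea, assuming the simplest possible notion of “normal”: the mean plus or minus a few standard deviations of previously observed setpoint writes. That statistic is my stand-in, not how any real product works, and `learn_bounds`, `decide`, and the sample history are all hypothetical.

```python
from statistics import mean, pstdev


def learn_bounds(history, k=3.0):
    """Baseline: mean +/- k population standard deviations of past writes."""
    mu = mean(history)
    sigma = pstdev(history)
    return mu - k * sigma, mu + k * sigma


def decide(value, bounds):
    """Allow in-bounds writes; hold anything else for a human to approve."""
    low, high = bounds
    return "ALLOW" if low <= value <= high else "HOLD_FOR_APPROVAL"


# Made-up history of setpoint writes seen on one register.
history = [48.0, 50.0, 52.0, 50.0, 49.0, 51.0]
bounds = learn_bounds(history)

print(decide(51.0, bounds))  # ALLOW
print(decide(75.0, bounds))  # HOLD_FOR_APPROVAL
```

Note that the out-of-bounds case returns a hold rather than a hard block, which is the “human in the loop” part. Whether holding a write is acceptable from an availability standpoint is exactly the question.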

I can also see this breaking in a hundred ways in real environments, but I want to see where it could do some good.

u/RCCole20
