u/RCCole20

OT/ICS people: have you seen an authorized action cause problems because it was valid but unsafe?

I’m trying to understand whether this is a real OT/ICS problem or whether I’m overthinking it.

I’m looking for real examples where:

  • the person was authorized
  • the session/access path was approved
  • the asset was legitimate
  • the command/change/action was technically valid
  • but it still caused, almost caused, or could have caused a problem because of timing, sequence, value, process state, or field context

Examples I’m thinking about:

  • Breaker/switch/pump/valve command issued at the wrong time
  • Rapid repeated open/close or start/stop commands
  • Wrong setpoint, threshold, mode, or register value
  • Vendor had approved remote access but too much freedom once inside
  • Protection/automation/PLC logic change that passed normal workflow but was not safe in the real operating context
  • Interlocks or permissives existed, but did not cover the actual condition
  • Temporary vendor/maintenance access became permanent and later created risk
  • Operator or engineer selected the wrong asset or action in an HMI/SCADA system
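To make the “rapid repeated open/close” example concrete, here’s a rough Python sketch of the kind of check I have in mind: a per-asset sliding-window limit on commands. Everything in it is an assumption for illustration (the `ToggleRateGuard` name, the `PUMP-1` tag, the 3-commands-per-60-seconds default); real limits would have to come from the equipment and the process, not from me.

```python
from collections import defaultdict, deque


class ToggleRateGuard:
    """Hypothetical guard: rejects a command when an asset has already
    received `max_commands` accepted commands within `window_s` seconds."""

    def __init__(self, max_commands: int = 3, window_s: float = 60.0):
        self.max_commands = max_commands
        self.window_s = window_s
        # asset id -> timestamps of recently accepted commands
        self._history = defaultdict(deque)

    def allow(self, asset: str, now: float) -> bool:
        stamps = self._history[asset]
        # Drop timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_s:
            stamps.popleft()
        if len(stamps) >= self.max_commands:
            return False  # rejected commands are not recorded
        stamps.append(now)
        return True


guard = ToggleRateGuard(max_commands=2, window_s=10.0)
results = [guard.allow("PUMP-1", t) for t in (0.0, 1.0, 2.0)]
```

In this toy run the third command inside the 10-second window is refused, even though each command on its own is authorized and valid.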

For people who work around PLCs, SCADA, DCS, substations, water/wastewater, manufacturing, utilities, or industrial controls:

Have you seen this happen in the real world?

I’m especially interested in:

  1. What happened?
  2. What control was supposed to prevent it?
  3. Why did that control fail or not apply?
  4. Was it caught in real time, after the fact, or not at all?
  5. Would any kind of real-time “second check” have helped, or would that be rejected because of uptime/availability risk?

Not looking for company names or sensitive details. Sanitized stories are fine.

I’m also interested in hearing “this is already solved by interlocks/procedures” or “this would never be allowed in a mature environment” if that’s your experience.

reddit.com
u/RCCole20 — 6 days ago

I posted a couple days ago asking:

>

Got a ton of good responses and a pretty clear split:

Camp 1:

  • “No remote access ever”
  • Everything on-site
  • Eliminate the problem entirely

Camp 2:

  • Remote access is unavoidable (utilities, manufacturing, distributed assets, etc.)
  • VPN → DMZ / jump host → session recording
  • Lock it down as much as possible

Both make sense depending on the environment.

What I didn’t expect was how consistent the answers were around what happens after someone gets in.

A few patterns that kept coming up:

1. It turns into trust pretty quickly

Example someone gave:

  • Vendor connects via a temporary cellular router
  • Direct to PLC
  • Save “before” logic, make changes, save “after”

That’s not really control… that’s “we’ll know what happened later if something breaks.”

2. Most controls stop at access, not actions

Even in more mature setups:

  • MFA, VPN, jump hosts, segmentation
  • Session recording
  • Protocol breaks

All solid.

But it’s still mostly:
“you got in the right way, so now do your thing”

3. Lots of monitoring, not much real-time stopping

I saw a lot of:

  • “we record sessions”
  • “someone should be watching”
  • “we can review logs if needed”

Didn’t see much:

  • “we can actually stop a bad command mid-execution”
  • “we validate changes against expected behavior in real time”

4. Everyone agrees mistakes are the bigger risk… but we don’t really control for them

One of the best comments was basically:

>

Feels true.

But most setups don’t actually prevent that mistake — they just make it traceable afterward.

Where I’m stuck (and curious if I’m off here):

Feels like we’re really good at:

  • Controlling who gets in
  • Logging what happened after

But there’s a gap in:

  • Controlling what they’re actually doing while they’re in

Especially in OT where:

  • A “valid” command can still be dangerous depending on timing / sequence / context
  • And a lot of damage comes from “authorized” actions, not exploits
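A toy illustration of what I mean by “valid but dangerous depending on context”: the same start command is fine or not depending on process state at that moment. This is only a sketch under made-up assumptions. The asset name, the state keys (`suction_valve_open`, `tank_level_pct`), and the 10% threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    asset: str
    action: str


def check_context(cmd: Command, state: dict) -> list:
    """Return the reasons a command is unsafe in the current process state.

    An empty list means no objection was found by these (hypothetical) rules.
    """
    reasons = []
    if cmd.asset == "PUMP-1" and cmd.action == "start":
        if not state.get("suction_valve_open", False):
            reasons.append("suction valve closed")
        if state.get("tank_level_pct", 0.0) < 10.0:
            reasons.append("tank level below 10%")
    return reasons


start = Command("PUMP-1", "start")
ok_state = {"suction_valve_open": True, "tank_level_pct": 55.0}
bad_state = {"suction_valve_open": False, "tank_level_pct": 4.0}
```

With `ok_state` the command passes; with `bad_state` the identical command comes back with two objections. Nothing about the user, session, or asset changed in between.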

Question to the people actually dealing with this:

If you allow vendor/remote access today…

Is there anything in your environment that:

  • Understands commands at the protocol level (not just IP/port/session)
  • Enforces guardrails in real time
  • Or blocks “valid but unsafe” actions

Or is it mostly:

  • access control + segmentation + logging + trust?
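For anyone wondering what “understands commands at the protocol level” would look like in practice, here’s a minimal sketch using Modbus/TCP as the example. It decodes the standard 7-byte MBAP header plus function code, and for Write Single Register (0x06) compares the value against a per-register range. The `REGISTER_LIMITS` table, the register number, and the pass/block verdicts are assumptions I invented for illustration. This is not how any particular product works.

```python
import struct

WRITE_SINGLE_COIL = 0x05
WRITE_SINGLE_REGISTER = 0x06

# Hypothetical policy: register address -> (min, max) allowed raw value.
REGISTER_LIMITS = {100: (0, 800)}


def inspect_modbus_write(frame: bytes):
    """Return (verdict, detail) for one Modbus/TCP request frame."""
    if len(frame) < 8:
        return "block", "frame too short"
    # MBAP header: transaction id, protocol id, length, unit id; then function code.
    _tid, proto, _length, _unit, func = struct.unpack(">HHHBB", frame[:8])
    if proto != 0:
        return "block", "not Modbus/TCP"
    if func not in (WRITE_SINGLE_COIL, WRITE_SINGLE_REGISTER):
        return "pass", f"function 0x{func:02X} not inspected"
    if len(frame) < 12:
        return "block", "truncated write request"
    address, value = struct.unpack(">HH", frame[8:12])
    if func == WRITE_SINGLE_REGISTER:
        limits = REGISTER_LIMITS.get(address)
        if limits is None:
            return "block", f"register {address} not on allowlist"
        low, high = limits
        if not low <= value <= high:
            return "block", f"register {address} value {value} outside {low}..{high}"
    # Coil writes are decoded but not range-checked in this sketch.
    return "pass", f"write to {address} = {value}"


# Two Write Single Register requests to register 100: one in range, one not.
in_range = struct.pack(">HHHBBHH", 1, 0, 6, 1, WRITE_SINGLE_REGISTER, 100, 500)
too_high = struct.pack(">HHHBBHH", 2, 0, 6, 1, WRITE_SINGLE_REGISTER, 100, 900)
```

Both frames are well-formed and would sail through an IP/port/session control; only the second is blocked, and only because something looked at the register and the value.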

(Not a pitch, just thinking out loud)

I’ve been wondering if an in-line approach could work here where:

  • It understands things like PLC commands
  • Learns what “normal/safe” looks like
  • And can stop something before it executes if it’s out of bounds (within strict boundaries + human in the loop)
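Very roughly, the “learn normal, stop out-of-bounds, keep a human in the loop” idea could be sketched like this. “Learning” here is just mean ± k standard deviations over past accepted setpoints, which is a deliberately naive stand-in, and the function names, the three verdicts, and the sample numbers are all hypothetical.

```python
from statistics import mean, stdev


def learn_envelope(history, k=3.0):
    """Derive a (low, high) 'normal' band from past accepted setpoints."""
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma


def decide(value, envelope, hard_limits):
    """Classify a requested setpoint.

    - outside engineering hard limits -> "block"
    - inside hard limits but outside the learned band -> "hold_for_approval"
    - otherwise -> "allow"
    """
    low, high = envelope
    hard_low, hard_high = hard_limits
    if not hard_low <= value <= hard_high:
        return "block"
    if not low <= value <= high:
        return "hold_for_approval"
    return "allow"


history = [50.0, 52.0, 48.0, 51.0, 49.0]   # made-up past setpoints
envelope = learn_envelope(history)
hard_limits = (0.0, 100.0)                  # made-up engineering limits
```

The point of the middle verdict is that an unusual-but-possible value gets paused for a person to confirm rather than silently dropped, which is where I’d expect most of the uptime objections to land.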

I can also see this breaking in a hundred ways in real environments, but I want to see where it could do some good.

u/RCCole20 — 26 days ago