u/Slight_Analysis_5414

▲ 5 r/ROS +1 crossposts

Before a mobile robot hits hard E-stop: detecting wheel slip and odom jumps from /cmd_vel + /odom

Hi guys,

I’ve been working on a small ROS 2 project for AMR/AGV-style mobile robots.
Problem:
A robot may still be receiving valid velocity commands, but its physical motion no longer matches the command stream.
Examples:
- wheel slip on wet / oily floors
- odometry mismatch
- localization jumps
- stale / bursty velocity commands
- robot starts shaking or over-correcting before safety lidar / hardware E-stop cuts in

A normal timeout only checks:

Did a command arrive recently?

It does not check:

Is the robot still moving according to the command it was just given?

So I built a small inline ROS 2 topic filter:

/cmd_vel → Kinematic Guard → /safe_cmd_vel
                 ↑
               /odom

It has a passive observe mode first, so it can run without taking over control.
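
To make the idea concrete, here is a minimal sketch of a command-vs-motion check. It is not the actual logic in ros2_kinematic_guard. It assumes plain floats instead of ROS messages, and it assumes the measured velocity reflects real body motion (for example fused odometry). The class name `TrackingMonitor`, the tolerances, the window length, and the "OK" label are all hypothetical; only "BROKEN" comes from the status example in this post.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    cmd_v: float   # commanded linear velocity (m/s)
    cmd_w: float   # commanded angular velocity (rad/s)
    odom_v: float  # measured linear velocity (m/s)
    odom_w: float  # measured angular velocity (rad/s)


class TrackingMonitor:
    """Flags sustained disagreement between commanded and measured velocity."""

    def __init__(self, lin_tol=0.15, ang_tol=0.30, window=5):
        self.lin_tol = lin_tol              # hypothetical tolerance, m/s
        self.ang_tol = ang_tol              # hypothetical tolerance, rad/s
        self.recent = deque(maxlen=window)  # last `window` mismatch flags

    def update(self, s: Sample) -> str:
        mismatch = (abs(s.cmd_v - s.odom_v) > self.lin_tol
                    or abs(s.cmd_w - s.odom_w) > self.ang_tol)
        self.recent.append(mismatch)
        # Require a full window of mismatches so a single noisy sample
        # does not trip the guard.
        if len(self.recent) == self.recent.maxlen and all(self.recent):
            return "BROKEN"
        return "OK"


def guard(cmd_v, cmd_w, alignment, observe_only=True):
    """Pass the command through in observe mode; otherwise brake on BROKEN."""
    if alignment == "BROKEN" and not observe_only:
        return 0.0, 0.0
    return cmd_v, cmd_w
```

A real robot has actuator lag and acceleration limits, so comparing raw instantaneous values like this would probably need filtering or a delay model before it is usable in the field.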

Example status:

{
  "status": "RESYNCING",
  "causalAlignment": "BROKEN",
  "dominantCause": "WHEEL_SLIP",
  "guardAction": "BRAKE_AND_RESYNC"
}

The demo does not need a real robot, Gazebo, or Isaac Sim. It uses a lightweight mock AMR/AGV and injects wheel slip.
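
For anyone wondering what "lightweight mock with injected slip" can look like, here is a rough sketch. It is not the mock from the repo. It is a 1-D toy that assumes the measured speed drops by a fixed ratio once slip starts; `mock_odom`, `slip_from`, and `slip_ratio` are made-up names.

```python
def mock_odom(cmd_v, steps, dt=0.1, slip_from=None, slip_ratio=0.6):
    """Yield (t, cmd_v, odom_v, x) for a 1-D mock robot with optional slip.

    From step `slip_from` onward the measured speed is reduced by
    `slip_ratio`, while the commanded speed stays the same.
    """
    x = 0.0
    for k in range(steps):
        slipping = slip_from is not None and k >= slip_from
        odom_v = cmd_v * (1.0 - slip_ratio) if slipping else cmd_v
        x += odom_v * dt  # integrate measured motion
        yield k * dt, cmd_v, odom_v, x


if __name__ == "__main__":
    for t, cmd_v, odom_v, x in mock_odom(1.0, 6, slip_from=3):
        print(f"t={t:.1f} cmd={cmd_v:.2f} odom={odom_v:.2f} x={x:.2f}")
```

Feeding the `(cmd_v, odom_v)` pairs into whatever detector you are testing gives a repeatable slip scenario without a simulator.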

GitHub:
https://github.com/ZC502/ros2_kinematic_guard

ROS Discourse discussion:
https://discourse.openrobotics.org/t/detecting-execution-collapse-before-hard-e-stop-ros2-kinematic-guard-for-ros-2-amr-agv/54944

I’d be interested in feedback from people who have dealt with mobile robot slip, odometry jumps, or unexpected hard E-stop events in the field.

reddit.com
u/Slight_Analysis_5414 — 21 hours ago
▲ 0 r/quant

We recently ran into a failure mode, and I’m curious how others handle it in production systems. The setup was pretty standard:

- pre-trade risk checks (exposure / limits)

- order routing

- multi-service architecture with retries + async state updates

On paper, the risk check is a hard gate. But under certain conditions (retry + latency + delayed state propagation), we saw cases where:

order submission went through before the risk state was actually updated/cleared.

No missing rule.

No disabled control.

Just execution order drift.

What made it tricky:

- the system *knew* the correct order

- logs showed risk checks existed

- but enforcement lived in workflow/orchestration, not in execution state itself

So when things got slightly out of sync, the “gate” behaved more like a suggestion.

Curious how people here deal with this in practice:

  1. Do you enforce ordering at the execution layer (e.g. state machine / transactional constraints)?

  2. Or rely on orchestration guarantees (queues, retries, idempotency, etc.)?
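
To make option 1 concrete, here is a minimal sketch of enforcing the sequence in the order's own state, using a compare-and-set transition. It assumes an in-memory store standing in for a transactional database; `OrderStore`, `OrderState`, and `release` are hypothetical names, not anything from our system.

```python
import threading
from enum import Enum


class OrderState(Enum):
    NEW = "NEW"
    RISK_CLEARED = "RISK_CLEARED"
    RELEASED = "RELEASED"
    REJECTED = "REJECTED"


# Legal transitions; anything else is refused regardless of who asks.
ALLOWED = {
    (OrderState.NEW, OrderState.RISK_CLEARED),
    (OrderState.NEW, OrderState.REJECTED),
    (OrderState.RISK_CLEARED, OrderState.RELEASED),
}


class OrderStore:
    """In-memory stand-in for a transactional store (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {}

    def create(self, order_id):
        with self._lock:
            self._state[order_id] = OrderState.NEW

    def transition(self, order_id, expected, new):
        """Compare-and-set: succeeds only if the current state is `expected`."""
        with self._lock:
            if self._state.get(order_id) is not expected:
                return False
            if (expected, new) not in ALLOWED:
                return False
            self._state[order_id] = new
            return True


def release(store, order_id, send):
    # The venue call happens only after the state change is committed,
    # so a retry or a stale caller cannot release an uncleared order.
    if store.transition(order_id, OrderState.RISK_CLEARED, OrderState.RELEASED):
        send(order_id)
        return True
    return False
```

With this shape a duplicate `release` call from a retry is a no-op, because the second compare-and-set no longer finds the order in `RISK_CLEARED`.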

Also — how do you test this?

Most backtests don’t simulate:

- retry storms

- partial failures

- async drift between services
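
One way to make this kind of drift testable is a fake clock plus a deliberately lagging view of the risk state. The sketch below is an assumption-heavy toy, not our test harness: `LaggingReplica`, `route_order`, and `count_leaks` are hypothetical, and the "retry storm" is just one attempt per tick.

```python
class LaggingReplica:
    """Makes writes visible to readers only after `lag` ticks of a fake clock."""

    def __init__(self, lag):
        self.lag = lag
        self.now = 0
        self._pending = []   # (visible_at, key, value), in write order
        self._visible = {}

    def write(self, key, value):
        self._pending.append((self.now + self.lag, key, value))

    def tick(self, n=1):
        self.now += n

    def _flush(self):
        ready = [p for p in self._pending if p[0] <= self.now]
        self._pending = [p for p in self._pending if p[0] > self.now]
        for _, key, value in ready:
            self._visible[key] = value

    def read(self, key, default=None):
        self._flush()
        return self._visible.get(key, default)


def route_order(replica, account):
    """Gate under test: releases when the (possibly stale) view says CLEAR."""
    return replica.read(account) == "CLEAR"


def count_leaks(lag, retries):
    """Count orders released after the account was blocked at the source."""
    replica = LaggingReplica(lag)
    replica.write("acct", "CLEAR")
    replica.tick(lag)                  # CLEAR is now visible to readers
    replica.write("acct", "BLOCKED")   # risk service blocks the account
    leaks = 0
    for _ in range(retries):           # one routing attempt per tick
        if route_order(replica, "acct"):
            leaks += 1
        replica.tick()
    return leaks
```

Because the clock is fake, the scenario is deterministic: the same lag and retry count always produce the same number of leaked orders, so it can sit in a normal unit test suite.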

Feels like a lot of “we had the control” incidents are really “we didn’t enforce sequence at the state level.”

Would especially appreciate perspectives from anyone running high-frequency or multi-venue systems where latency + retries are unavoidable.

u/Slight_Analysis_5414 — 16 days ago
▲ 0 r/quant

I keep seeing the same failure pattern in trading infrastructure:

Pre-trade risk checks exist.
Compliance rules exist.
Approval chains exist.

And yet under latency spikes, retries, partial failures, or human override paths, the system still ends up doing this:

Order sent → risk state updated later

instead of:

Risk cleared → order released

Everyone “knew” the correct sequence.

The system still executed the wrong one.

That’s not a model problem.

That’s a state integrity problem.

Most systems still treat execution order as metadata:

if risk_check_passed:
    send_order()

instead of treating it as execution physics:

RiskCheck ⋅ Order ≠ Order ⋅ RiskCheck

These two operations should not commute.

Reversing them is not a small bug.

It’s a compliance failure.

Sometimes a regulatory one.

Sometimes a seven-figure one.
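
A toy sketch of what "these do not commute" means in code. The state fields, the limit, and the functions `risk_check` and `send_order` are all made up for illustration; the point is only that applying the same two operations in the opposite order ends in a different state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    exposure: float = 0.0
    limit: float = 100.0
    risk_cleared: bool = False
    sent: bool = False
    violation: bool = False


def risk_check(s, qty):
    """Clear the order only if it fits under the exposure limit."""
    return replace(s, risk_cleared=s.exposure + qty <= s.limit)


def send_order(s, qty):
    """Send the order; record a violation if it was not cleared first."""
    return replace(s, sent=True, exposure=s.exposure + qty,
                   violation=s.violation or not s.risk_cleared)


qty = 40.0
checked_then_sent = send_order(risk_check(State(), qty), qty)
sent_then_checked = risk_check(send_order(State(), qty), qty)

# Same two operations, different final state.
print(checked_then_sent.violation, sent_then_checked.violation)
```

In the reversed order the check also runs against an exposure that already includes the order, so even a "pass" there is measuring the wrong thing.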

The deeper issue is that this often isn’t just non-commutative.

It’s non-associative.

We implicitly assume:

(A ⋅ B) ⋅ C = A ⋅ (B ⋅ C)

but in real execution systems:

(context + approval) + transfer

is not the same as

context + (approval + transfer)

The grouping itself changes the validity of the action.
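
A toy sketch of how grouping alone can change validity. The rules are invented for illustration and do not describe any production model: an approval only counts as "scoped" if context was already present when it was joined, and a transfer is only valid if it joins onto an already scoped approval. `Term` and `combine` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    has_context: bool = False
    approval: bool = False   # an approval is present
    scoped: bool = False     # approval was granted with the context in view
    transfer: bool = False
    valid: bool = True


CONTEXT = Term(has_context=True)
APPROVAL = Term(approval=True)
TRANSFER = Term(transfer=True)


def combine(left, right):
    """Hypothetical binary operation: join `right` onto `left`."""
    # An approval becomes scoped only if context is already on the left.
    scoped = left.scoped or right.scoped or (left.has_context and right.approval)
    # A transfer must join onto an approval that is already scoped.
    ok = not right.transfer or left.scoped
    return Term(
        has_context=left.has_context or right.has_context,
        approval=left.approval or right.approval,
        scoped=scoped,
        transfer=left.transfer or right.transfer,
        valid=left.valid and right.valid and ok,
    )


left_grouped = combine(combine(CONTEXT, APPROVAL), TRANSFER)
right_grouped = combine(CONTEXT, combine(APPROVAL, TRANSFER))

print(left_grouped.valid, right_grouped.valid)
```

Both groupings contain the same three elements in the same left-to-right order; only the bracketing differs, and under these rules only the left-grouped one is valid.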

That means workflow safety is not just about order.

It’s about causal structure.

This is what pushed us away from rule engines and toward a non-associative operator model internally (we call it NARH).

Yes — this eventually leads into octonion-like structures, not because “fancy math is cool,” but because standard associative abstractions keep failing under production conditions.

If the operator sequence is invalid, the residual cannot converge to zero.

The system should fail mathematically before it fails financially.

We turned this into a pre-execution logic firewall called SARA:

  • not another audit log
  • not another policy engine
  • not another prompt wrapper

but a deterministic execution control plane: if the sequence cannot converge to a safe state, execution never reaches the API.

Think: pre-trade kill switch, not AI agent middleware.

Curious if others here have seen the same issue:

Are we over-relying on post-trade audit + soft policy checks, when what we actually need is hard execution integrity at the state transition layer?

Also curious whether anyone here has experimented with non-standard algebraic structures for workflow validation in production systems.

u/Slight_Analysis_5414 — 17 days ago

I keep seeing two extremes in fintech AI conversations:

  1. “AI will fix everything.”

  2. “AI agents can never safely go live in finance.”

From what I’m seeing, the issue is not just model quality. The harder blocker is operational and governance-related: many agent systems still don’t understand the order-sensitive — even non-commutative — nature of financial workflows (where doing A then B is not equivalent to doing B then A).

In finance, some action sequences are not merely “less optimal” when reversed — they become non-compliant, unsafe, or legally indefensible. Examples:

• suitability check -> recommendation

• risk check -> transfer

• review -> send

• authorization -> access

• backup -> delete

If those get reversed, it’s not just a bad UX outcome. It can become a control failure.
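
As a rough sketch, the pairs above can be written as a prerequisite table and checked against a proposed action sequence. The identifiers and the function `order_violations` are hypothetical, and this simple version assumes one prerequisite per action.

```python
# Each action maps to the action that must already have happened.
# Pairs taken from the examples above; the identifiers are hypothetical.
REQUIRES = {
    "recommendation": "suitability_check",
    "transfer": "risk_check",
    "send": "review",
    "access": "authorization",
    "delete": "backup",
}


def order_violations(actions):
    """Return (index, action, missing_prerequisite) for each unsafe step."""
    seen = set()
    violations = []
    for i, action in enumerate(actions):
        needed = REQUIRES.get(action)
        if needed is not None and needed not in seen:
            violations.append((i, action, needed))
        seen.add(action)
    return violations


if __name__ == "__main__":
    print(order_violations(["risk_check", "transfer"]))
    print(order_violations(["delete", "backup"]))
```

Note that this version lets a single earlier `risk_check` cover any number of later transfers. A real control would probably need the prerequisite to be tied to the specific action instance.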

That makes me think the missing layer in fintech AI adoption is not simply “better models,” but a pre-execution control layer that can:

• detect unsafe action order

• enforce tenant/user/session scope boundaries

• require human approval for high-impact actions

• leave an audit-ready, tamper-evident trail

• run in shadow mode before any production write access is granted

The shadow mode piece feels especially important. In a regulated environment, the first question is often not “can this agent work?” but “can we observe it safely, collect evidence, and understand what it would have done before letting it touch production systems?”
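
A minimal sketch of how shadow mode, approval routing, and a tamper-evident trail could fit together. This is an illustration under assumptions, not a product design: `AuditTrail` and `run_action` are hypothetical names, the trail is a plain SHA-256 hash chain kept in memory, and "high impact" is just a flag passed in by the caller.

```python
import hashlib
import json


class AuditTrail:
    """Append-only log where each entry commits to the previous one."""

    def __init__(self):
        self.entries = []

    def append(self, record):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self.entries.append({"prev": prev, "record": record, "hash": digest})

    def verify(self):
        """Recompute the chain; any edited entry breaks it."""
        prev = "0" * 64
        for e in self.entries:
            body = json.dumps(e["record"], sort_keys=True)
            expected = hashlib.sha256((prev + body).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True


def run_action(action, execute, trail, shadow=True, approved=False,
               high_impact=False):
    """Record the decision; call `execute` only outside shadow mode."""
    needs_approval = high_impact and not approved
    outcome = "NEEDS_APPROVAL" if needs_approval else "EXECUTE"
    trail.append({"action": action, "outcome": outcome, "shadow": shadow})
    if outcome == "EXECUTE" and not shadow:
        execute(action)
    return outcome
```

In shadow mode the trail still records what would have happened, which is the evidence a risk team can review before write access is granted. A hash chain like this only shows tampering if the latest hash is also stored somewhere the agent cannot write.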

So my current hypothesis is:

Fintech doesn’t necessarily lack AI capability. It lacks reliable control planes for agentic execution.

I’d really appreciate blunt feedback from operators, builders, risk/compliance folks, or security teams:

  1. Is order control actually a real blocker in your environment, or is this too narrow?

  2. Which workflows are painful enough to matter, but safe enough to pilot?

  3. What evidence would your team need before allowing an agent to take real actions?

  4. Is shadow mode + approval routing + audit evidence the most realistic path to production?

  5. For customer-facing or multi-tenant agents, is memory/scope isolation already good enough, or still a real risk?

I’m currently exploring a control-plane approach for order-sensitive (“non-commutative”) workflows, and I’m genuinely trying to understand whether the missing product in fintech AI is better models, or better execution controls.

u/Slight_Analysis_5414 — 20 days ago