u/Aggravating_Log9704

SD-WAN improved performance but network security issues remained during peak traffic. Why?

Rolled out SD-WAN across 23 sites 14 months ago. Latency dropped, MPLS costs came down, project paid for itself. That part was fine.

Problem showed up later. Next-gen firewall sits inline at each major hub and handles inspection okay under normal load. But twice last quarter, end-of-month reporting overlapped with a firmware push nobody coordinated. Inspection throughput dropped, and some traffic bypassed policy because the firewall was queueing and timing out.

SD-WAN saw nothing. Links up, paths optimal. No failover. Only found it digging through firewall logs by accident.
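A minimal sketch of that blind spot, assuming the path decision only looks at link-level metrics. All field names and thresholds here are hypothetical placeholders, not any vendor's API or defaults. It shows how a path can pass a link check while the inline firewall is already queueing and timing out.

```python
from dataclasses import dataclass


@dataclass
class PathStats:
    """Link-level metrics a path probe might report (hypothetical fields)."""
    link_up: bool
    latency_ms: float
    loss_pct: float


@dataclass
class InspectionStats:
    """Firewall-side metrics (hypothetical fields, not a vendor API)."""
    queue_depth: int
    timeouts_per_min: int


def link_healthy(p: PathStats,
                 max_latency_ms: float = 150.0,
                 max_loss_pct: float = 1.0) -> bool:
    # Link-only view: thresholds are illustrative placeholders.
    return p.link_up and p.latency_ms <= max_latency_ms and p.loss_pct <= max_loss_pct


def inspection_healthy(i: InspectionStats,
                       max_queue_depth: int = 1000,
                       max_timeouts_per_min: int = 0) -> bool:
    # Inspection view: any timeouts or a deep queue count as degraded.
    return i.queue_depth <= max_queue_depth and i.timeouts_per_min <= max_timeouts_per_min


def path_usable(p: PathStats, i: InspectionStats) -> bool:
    # A path is only usable if the link AND the inline inspection are healthy.
    return link_healthy(p) and inspection_healthy(i)


# Example: link looks fine, firewall is backed up (made-up numbers).
path = PathStats(link_up=True, latency_ms=35.0, loss_pct=0.1)
firewall = InspectionStats(queue_depth=5000, timeouts_per_min=40)
print(link_healthy(path), path_usable(path, firewall))
```

With only the link check, nothing would trigger a failover in this situation. The combined check is one way the inspection state could be fed into the decision, if the firewall metrics can be exported at all.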

What I can't figure out is why the pattern is so specific. Sites routing through regional hubs got hit. Branches backhauling directly to the data center were completely fine. Don't know if that's an architecture issue or just undersized hubs.

Is the answer bigger hardware or does inspection need to move somewhere else entirely? Anyone dealt with this?

reddit.com
u/Aggravating_Log9704 — 19 hours ago
▲ 12 r/scala

Why does Spark performance start to vary between similar jobs?

We have a few Spark jobs that are very similar in terms of logic and structure. They run on the same cluster with the same configs.

In theory performance should be close, but in practice it isn’t. Some runs finish in around 10–12 minutes, while others go past 20 minutes with no clear difference in input size.

Checked Spark UI, executors, stages, shuffle behavior. Nothing stands out. No failures, no obvious skew.

This started showing up more once more jobs were added to the cluster. Feels like resource contention, but it's not clear where it shows up.
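One way to test the contention hunch is to check whether run time tracks cluster load rather than input size. The sketch below is plain Python with made-up run records. The `concurrent_jobs` value is an assumption: it would have to come from wherever job overlap can be reconstructed, such as scheduler or resource manager logs.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Hypothetical records: (duration_min, input_gb, concurrent_jobs).
# These numbers are illustrative only, not measurements.
runs = [
    (11, 50, 1),
    (12, 51, 2),
    (10, 49, 1),
    (21, 50, 6),
    (23, 51, 7),
    (20, 49, 5),
]

durations = [r[0] for r in runs]
inputs = [r[1] for r in runs]
concurrency = [r[2] for r in runs]

print("duration vs input size: ", round(pearson(durations, inputs), 2))
print("duration vs concurrency:", round(pearson(durations, concurrency), 2))
```

If duration correlates strongly with concurrency and weakly with input size, that points at contention outside the job itself, which would also fit the Spark UI looking clean for each individual run.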

Has anyone seen this kind of variation across similar Spark jobs, and what usually causes it?

u/Aggravating_Log9704 — 7 days ago