Why would an ISP disable MSS clamping, break PMTUD, and silently drop fragmented TCP after 600s – resource conservation, security policy, or neglect?
Hello everyone!
First of all I'd like to say I am not a network engineer and everything stated here is something I spent time learning about in the past month (which may also include incorrect interpretations), so my understanding and knowledge is extremely limited. Please do correct me if I am wrong or making incorrect assumptions.
TL;DR:
ISP doesn't clamp MSS on PPPoE → PMTUD fails → TCP fragmentation → fragmented flows hit 600s stateful firewall timeout → silent drops. Trying to understand if this is neglect, resource-saving, or a valid security policy.
The problem:
I'm a streamer on a residential 500 Mbps upload plan. My RTMP/RTMPS streams to Twitch, Kick, Trovo, or Restream would drop consistently at ~600 seconds with the OBS error "WriteN, RTMP send error 10060." YouTube and VPN streams were perfectly stable.
Original findings:
- Disconnecting streams – Direct streams to Twitch, Kick, Trovo, Restream (both RTMP and RTMPS) died at ~600s. Ping to gateway and 8.8.8.8 stayed 0% packet loss; tcping to ingest server port 1935/443 remained 100% open even after OBS reported disconnect. Wireshark showed server sending Dup ACKs (with SACK) right before silence, then client retransmissions for ~15–20s, then client sends RST. No FIN from server.
- Working streams – YouTube (both RTMP and RTMPS) and, as a control test, Twitch RTMP over a VPN. All three stayed solid: no disconnects at 600s, no retransmissions, and a clean FIN when I manually ended the stream.
ISP ticket:
After gathering all the evidence, logs and Wireshark captures, I sent a support ticket to my ISP asking them to check whether any traffic management on their side (CGNAT/NAT/DPI/stateful firewall) was interfering with and timing out my connection.
After 30+ days of waiting for their response (with occasional check-ins from my side so the ticket wouldn't get closed), they said they had performed basic remote checks and concluded that their network uses only stateless equipment with no timers. They also incorrectly claimed Twitch streams use QUIC for video and RTMP only for signaling, mischaracterized my Wireshark evidence, claimed the VPN only works because it changes the ingest server, and ignored the fact that the issue is reproducible on the vast majority of platforms. Ultimately they suggested I update OBS, lower my bitrate, or try a different ingest server, all while failing to address my specific questions about the 600-second timeout. After realizing they were refusing to engage and provide support, I dove deeper.
Latest findings:
- MTU – Path MTU is 1492 (confirmed with ping -f -l). Typical PPPoE overhead.
- MSS Clamping – SYN from my PC advertises MSS=1460. No MSS clamping by ISP router.
- PMTUD – The first RTMP handshake packet (~1500 bytes at the IP layer) is sent with the DF flag set. The router fragments it anyway (maybe RFC 1191 not operating as intended?) and forwards it; the ICMP "Fragmentation needed" message (MTU=1492) arrives after the packet has been forwarded. The server's SACK proves partial delivery, so Windows never reduces its MTU. Fragmentation persists for the entire stream.
- Why YouTube and VPN work:
  - VPN works because MSS gets lowered to ~1380 by the tunnel, so packets stay under 1492 bytes.
  - YouTube works because Google's ingest edge clamps MSS to 1412 in SYN-ACK.
- The fix – Manually setting Windows MTU to 1492 fixes everything permanently.
- The conclusion – The observed fragmentation correlates with the stream failures. The resulting TCP instability appears to coincide with session expiration around ~600 seconds — behavior consistent with stateful network handling (firewall/BNG/DPI systems or similar mechanisms). Note: I obviously can’t confirm the exact internal classification or trigger mechanism; I’m only observing that when MTU is fixed and fragmentation disappears, the issue disappears as well.
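To sanity-check the numbers in the findings above, here is a minimal Python sketch of the MSS/MTU arithmetic. It assumes plain 20-byte IPv4 and 20-byte TCP headers with no options, and the function names are just illustrative helpers I made up, not anything from OBS or Windows.

```python
IPV4_HEADER = 20      # bytes, assuming no IP options
TCP_HEADER = 20       # bytes, assuming no TCP options
PPPOE_OVERHEAD = 8    # 6-byte PPPoE header + 2-byte PPP protocol field

ETHERNET_MTU = 1500
PPPOE_MTU = ETHERNET_MTU - PPPOE_OVERHEAD   # 1492, matches the measured path MTU


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ip_packet_size(mss):
    """Size of a full-sized TCP segment at the IP layer."""
    return mss + IPV4_HEADER + TCP_HEADER


def needs_fragmentation(mss, path_mtu):
    """True if a full-sized segment would exceed the path MTU."""
    return ip_packet_size(mss) > path_mtu


# MSS values taken from the observations in this post
cases = [
    ("direct, no clamping", 1460),
    ("clamped for PPPoE", mss_for_mtu(PPPOE_MTU)),
    ("YouTube SYN-ACK", 1412),
    ("VPN tunnel (approx.)", 1380),
]

for label, mss in cases:
    size = ip_packet_size(mss)
    verdict = "fragments" if needs_fragmentation(mss, PPPOE_MTU) else "fits"
    print(f"{label}: MSS {mss} -> {size}-byte IP packet, {verdict} at MTU {PPPOE_MTU}")
```

With MSS=1460 a full segment is 1500 bytes, 8 bytes over the 1492 path MTU, which is exactly the PPPoE overhead. An MSS of 1452 or lower stays within the limit, which is consistent with why the YouTube and VPN cases never fragment.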
The questions:
The technical mechanism is clear. What I'm trying to wrap my head around is why an ISP would operate like this. I've read that 600s is a common default timeout for fragmented or "unclassified" flows. I also understand that non-initial IP fragments lack TCP headers, so a firewall may not be able to reliably refresh state – essentially treating a still-active stream as idle.
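To illustrate the point about non-initial fragments, here is a rough Python sketch of how a 1500-byte IPv4 packet would be split for a 1492-byte MTU. It assumes a 20-byte IP header without options and follows the standard rule that every fragment except the last carries a multiple of 8 payload bytes. The `fragment` and `carries_tcp_header` helpers are hypothetical names for illustration, not how any particular router implements it.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options


def fragment(total_length, mtu, header=IPV4_HEADER):
    """Split an IPv4 datagram into fragments that fit the given MTU.

    Returns a list of tuples:
    (payload_offset_bytes, payload_bytes, fragment_total_bytes, more_fragments)
    """
    payload = total_length - header
    # All fragments except the last must carry a multiple of 8 payload bytes
    max_payload = (mtu - header) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_payload, payload - offset)
        more = offset + size < payload
        fragments.append((offset, size, header + size, more))
        offset += size
    return fragments


def carries_tcp_header(offset):
    """The TCP header (with ports) sits at the start of the IP payload,
    so only the fragment at offset 0 contains it."""
    return offset == 0


frags = fragment(1500, 1492)
# frags == [(0, 1472, 1492, True), (1472, 8, 28, False)]

for offset, size, total, more in frags:
    print(
        f"offset={offset:4d} payload={size:4d} total={total:4d} "
        f"MF={int(more)} tcp_header={carries_tcp_header(offset)}"
    )
```

So each oversized segment becomes one 1492-byte fragment that still has the TCP header and one tiny 28-byte fragment carrying only the last 8 payload bytes, with no ports or TCP flags in it. Whether and how the ISP's equipment tracks such fragments is something I can't see from my side; this only shows what the fragments look like on the wire.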
But this situation appears to originate from how the ISP handles MTU constraints:
- MSS clamping not enabled – Is this usually a conscious decision to save router CPU (clamping rewrites every SYN) or just neglect? Would enabling it on CPE/BNG be considered a "normal" and expected configuration for PPPoE customers?
- PMTUD broken (fragmentation despite DF, ICMP delivered but late) – Is this a common misconfiguration, or might it be deliberate (e.g., allowing fragments to reduce ICMP processing load)? Could it be an artifact of some hardware offloading?
- 600s silent drop of fragmented TCP – My research suggests this is often a resource-protection measure: VFR (virtual fragment reassembly) is expensive, and applying a short timeout prevents state-table bloat. I've also read that fragmented traffic is a common attack vector.
  - In your experience, is a 600s timeout on fragmented flows more likely a security policy (mitigating fragment attacks), a resource conservation measure (saving TCAM/RAM) or neglect (set it and forget it)?
  - If the ISP's own equipment causes the fragmentation by not clamping MSS and by PMTUD failing, can they still legitimately claim it's a security measure?
  - Is the silent drop (no RST to client or server) standard practice, or just bad implementation?
Extra context for those who are curious:
I'm asking these questions because I'm currently in a formal complaints process. The ISP is refusing to engage with the evidence and deflecting blame to my OBS settings, which I've already proven are not the cause. I've filed a first-level complaint and expect it to be rejected, especially since I explicitly asked how their traffic management practices might affect the use of applications under EU net neutrality transparency rules (Article 4), and they have simply ignored those questions. I plan to continue escalating it to the ISP commission and possibly the regulatory authority if they keep being dismissive and uncooperative. The answers to these questions will directly strengthen my complaint. Whether the cause is resource conservation, security policy, or neglect, any of them could represent a failure to comply with EU net neutrality rules depending on how it's framed, and understanding the ISP's internal reasoning will help me challenge their refusal to engage with the evidence.