u/Burninator05

Periodic partial failure

I have a commercial network that is having periodic issues. The network uses a single public IP, with NAT and 10.x.x.x networks on the inside. One of my users runs an application that delivers testing services (think Pearson VUE). Typically between 8 and 9 in the morning the application stops working, and it doesn't start working again until sometime overnight. Sometimes it works for several days before the issue recurs. While the application is down, other websites continue to work normally, and no users other than the testing staff are complaining.

The network consists of a single Cisco 3900 performing routing and several Cisco switches to get to the user location.

I have looked at potential QoS issues but didn't see anything that stood out and, honestly, I don't know enough about NAT to really know where to look. However, if it were a NAT issue, I would expect problems with other services and websites as well.

The testing app uses port 443 to reach out to a backend and behaves much like a virtual desktop. I am not blocking any 443 traffic across the network and have not made any network changes.
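
To pin down exactly when the 443 connection starts failing, a small probe along these lines could be left running on a machine inside the network. This is only a rough sketch: `backend.example.com` is a placeholder rather than the real backend, the function names are made up for illustration, and it only checks that a TCP connection to port 443 can be opened, not that the application session itself works.

```python
import socket
import time
from datetime import datetime


def probe_tcp(host, port=443, timeout=5.0):
    """Attempt one TCP connection and return (ok, detail)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and opens the TCP session.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - start) * 1000
            return True, f"connected in {elapsed_ms:.0f} ms"
    except OSError as exc:
        # Covers timeouts, refused connections, and DNS failures.
        return False, f"{type(exc).__name__}: {exc}"


def format_result(when, host, port, ok, detail):
    """Build one timestamped log line for a probe result."""
    status = "OK" if ok else "FAIL"
    return f"{when.isoformat(timespec='seconds')} {host}:{port} {status} {detail}"


def log_probes(host, port=443, count=10, interval=60.0):
    """Probe `count` times, `interval` seconds apart, printing each result."""
    lines = []
    for i in range(count):
        ok, detail = probe_tcp(host, port)
        line = format_result(datetime.now(), host, port, ok, detail)
        print(line, flush=True)
        lines.append(line)
        if i < count - 1:
            time.sleep(interval)
    return lines
```

Calling something like `log_probes("backend.example.com", count=1440, interval=60)` would log one line per minute for a day. The first FAIL timestamp could then be compared against what the router shows at that moment.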

We have worked with our ISP, and they have provided us a second interface on their PoP configured with a /30 for testing. When connected to this /30, the application works normally, even at times when it fails from inside my network.

This issue has come up in the past, but it had been about 9 months since it last happened. In the last 3 weeks, however, it has happened almost every day.

Any thoughts on what I should be looking at?

reddit.com
u/Burninator05 — 4 hours ago