u/Administrative_Row61

Arch Linux + MSI B650 Tomahawk WiFi + Realtek RTL8125: network speed degrades over time (900 Mbps → 50 Mbps) until reboot

Hi all,

I’m running Arch Linux and I’m trying to diagnose a networking issue with my onboard Realtek RTL8125 2.5GbE NIC.

Motherboard:

- MSI MAG B650 Tomahawk WiFi

The problem:

- After a reboot, internet speed is normal (~900 Mbps)
- After some hours of uptime, download/upload speed degrades badly (~50 Mbps or even lower)
- A reboot immediately restores full speed
- Latency/ping stays mostly fine
- No obvious packet loss in ping tests
- Happens on Ethernet only
- This started only recently. I didn’t intentionally change anything major besides normal Arch updates.

Motherboard NIC:

- RTL8125 2.5GbE Controller

I was originally using the in-kernel r8169 driver. I also tested r8168-dkms and even r8125-dkms, but the issue still happens.

When degraded:

- Internet becomes extremely slow
- iperf3 to another LAN machine collapses hard
- TCP retransmits become very high
- But ping to router and internet remains stable

Example:

ping 192.168.1.1

Stable: ~0.3–0.7 ms, 0% packet loss

ping 1.1.1.1

Also stable: ~10–11 ms, 0% packet loss

So latency is fine while throughput dies.

iperf3 example during degraded state:

[ 5] 0.00-1.00 sec 896 KBytes 7.33 Mbits/sec 61 retr
[ 5] 1.00-2.00 sec 512 KBytes 4.19 Mbits/sec 17 retr
...
[ 5] 0.00-10.00 sec 6.75 MBytes 5.66 Mbits/sec 146 retr

So retransmits explode under load.
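To compare a healthy run against a degraded one, the iperf3 lines can be reduced to interval, bitrate, and retransmit count. This is a minimal sketch, assuming iperf3's human-readable client output in which the retransmit count (when present) is the field right after the bitrate unit. The function name `iperf_retr` is made up for illustration.

```shell
# Hypothetical helper: condense iperf3 text output to "interval bitrate unit retr".
# Assumption: the retransmit count, when present, directly follows the "...bits/sec" field.
iperf_retr() {
    awk '
        /bits\/sec/ {
            # Locate the bitrate unit; the other fields are addressed relative to it.
            for (k = 1; k <= NF; k++) if ($k ~ /bits\/sec$/) break
            if (k > NF || k < 6) next
            # Lines without a numeric count (e.g. receiver summaries) get "-".
            retr = ($(k + 1) ~ /^[0-9]+$/) ? $(k + 1) : "-"
            print $(k - 5), $(k - 1), $k, retr
        }'
}

# Usage (the server address is a placeholder):
#   iperf3 -c <server> | iperf_retr
```

Saving this output once right after a reboot and once in the degraded state gives a direct side-by-side of bitrate and retransmits per interval.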

Things I already tested:

Drivers:

- r8169
- r8168-dkms
- r8125-dkms

No real improvement.

Offloads disabled:

sudo ethtool -K enp12s0 gro off gso off tso off

No change.

IRQ balancing:

Installed and enabled:

sudo pacman -S irqbalance

sudo systemctl enable --now irqbalance

The NIC interrupt was originally handled almost entirely by one CPU core.

After tweaking IRQ affinity and enabling RPS, interrupts spread a little more across CPUs, but the issue still happens eventually.

RPS enabled:

for f in /sys/class/net/enp12s0/queues/rx-*/rps_cpus; do
    echo ffffffff | sudo tee "$f"
done

Still degrades after some uptime.
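To check whether the NIC's interrupts are concentrated on one core again once the slowdown returns, /proc/interrupts can be summarized per CPU. This is a minimal sketch, assuming the IRQ line contains the interface name (enp12s0 here). `irq_share` is a hypothetical helper name.

```shell
# Hypothetical helper: per-CPU share of every IRQ line matching a pattern.
# Reads /proc/interrupts-formatted text on stdin; $1 is the pattern to match.
irq_share() {
    awk -v pat="$1" '
        # Header row: one column label per CPU.
        NR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) cpu[i] = $i; next }
        $0 ~ pat {
            total = 0
            for (i = 1; i <= ncpu; i++) total += $(i + 1)
            # Print: IRQ number, CPU label, raw count, percentage of this IRQ.
            for (i = 1; i <= ncpu; i++)
                printf "%s %s %s %.1f%%\n", $1, cpu[i], $(i + 1),
                       (total ? 100 * $(i + 1) / total : 0)
        }'
}

# Usage:
#   irq_share enp12s0 < /proc/interrupts
```

The counts are cumulative since boot, so comparing two runs a minute apart shows the current distribution better than a single reading.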

EEE already disabled:

EEE status: disabled

qdisc:

Tried:

- fq_codel
- pfifo_fast

No difference.

Other possibly relevant info:

This machine also runs:

- Docker
- k3s
- multiple bridges/veth interfaces

Interfaces include:

- docker0
- cni0
- flannel.1
- many veth devices

But even after stopping Docker + k3s, degraded throughput remained.

Things I noticed:

During normal operation:

ethtool enp12s0

shows:

Speed: 1000Mb/s
Duplex: Full
Link detected: yes

No link flaps.

Also:

ip -s link show enp12s0

shows almost no actual errors.

Question:

Has anyone seen:

- RTL8125 gradually degrading throughput over uptime on Linux?
- r8169/r8168/r8125 all behaving similarly?
- interrupt/softirq saturation causing long-term throughput collapse?

Could this still be:

- PCIe ASPM?
- MSI/MSI-X issue?
- AMD B650 chipset quirk?
- kernel regression?
- firmware/BIOS issue?
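For the PCIe-level suspects, the link and interrupt state the NIC is actually running with can be read from `lspci -vv`. This is a minimal sketch, assuming the usual `LnkCtl:` and `MSI`/`MSI-X` capability lines in that output. `pcie_summary` is a made-up name, and the 0c:00.0 address is only inferred from the interface name enp12s0, so it should be checked with plain `lspci` first.

```shell
# Hypothetical helper: pull ASPM and MSI/MSI-X state out of `lspci -vv` text on stdin.
pcie_summary() {
    awk '
        /LnkCtl:/ {
            line = $0
            sub(/^.*LnkCtl:[ \t]*/, "", line)   # drop the label
            sub(/;.*$/, "", line)               # keep only the ASPM part
            print "link: " line
        }
        /MSI(-X)?: Enable[+-]/ {
            line = $0
            sub(/^.*\] /, "", line)             # drop "Capabilities: [xx] "
            print "irq: " line
        }'
}

# Usage (device address is an assumption derived from the name enp12s0):
#   sudo lspci -vv -s 0c:00.0 | pcie_summary
```

Running it in both the healthy and the degraded state shows whether either setting differs between the two.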

Any ideas for deeper debugging would be appreciated because I’m running out of things to test.

Edit:

Additional diagnostic data (during issue / monitoring):

rx_missed: 0
rx_mac_missed: 2243 (and increasing over time)
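Because rx_mac_missed keeps climbing, it helps to see exactly which counters move between two points in time. This is a minimal sketch that diffs two saved `ethtool -S` snapshots, assuming the usual `name: value` layout. `counter_delta` and the file paths are hypothetical.

```shell
# Hypothetical helper: print every counter whose value changed between two
# saved `ethtool -S` snapshots ($1 = earlier file, $2 = later file).
counter_delta() {
    awk -F': *' '
        NF < 2 { next }                          # skip lines without "name: value"
        { gsub(/^[ \t]+/, "", $1) }              # strip leading indentation
        FNR == NR { before[$1] = $2; next }      # first file: remember values
        ($1 in before) && ($2 != before[$1]) {   # second file: report changes
            printf "%s %+d\n", $1, $2 - before[$1]
        }
    ' "$1" "$2"
}

# Usage (paths are placeholders):
#   ethtool -S enp12s0 > /tmp/before; sleep 60; ethtool -S enp12s0 > /tmp/after
#   counter_delta /tmp/before /tmp/after
```

Taking the snapshots during an iperf3 run in the degraded state ties any growing counter directly to the load.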

I also tried disabling ASPM (pcie_aspm=off) and it did not solve the issue.

I collected more low-level data while the issue is occurring:

- ethtool -S shows rx_missed remains relatively low but steadily increases over time under load
- rx_mac_missed increases gradually during sustained traffic
- /proc/net/softnet_stat shows non-zero drops in column 2 across multiple CPUs, indicating softnet backlog drops rather than NIC-level errors
- Disabling Docker and k3s does not eliminate the issue
- Interrupt distribution was initially heavily skewed to a single CPU core, but improving IRQ affinity + enabling RPS temporarily restores full throughput
- However, performance still degrades again after some uptime even with RPS enabled
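The softnet drops can be tracked over time by decoding /proc/net/softnet_stat, whose values are hexadecimal. This is a minimal sketch, assuming the first three columns are packets processed, backlog drops, and time_squeeze. `softnet_drops` is a hypothetical name, and rows are numbered in file order, which may not match the CPU number if some CPUs are offline.

```shell
# Hypothetical helper: decode the first three hex columns of softnet_stat (stdin).
# Assumption: column 1 = processed, column 2 = dropped, column 3 = time_squeeze.
softnet_drops() {
    row=0
    while read -r processed dropped squeezed _rest; do
        # $((0x...)) converts each hexadecimal field to decimal.
        echo "row$row processed=$((0x$processed)) dropped=$((0x$dropped)) time_squeeze=$((0x$squeezed))"
        row=$((row + 1))
    done
}

# Usage:
#   softnet_drops < /proc/net/softnet_stat
```

A dropped value that rises only in the degraded state would point at the per-CPU backlog queue (bounded by net.core.netdev_max_backlog) rather than at the NIC itself.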

reddit.com
u/Administrative_Row61 — 6 days ago