u/Quanord

What’s one performance metric you wish more teams watched earlier?

Been thinking about this after a rough on-call week.

Everyone watches CPU and memory. What actually tipped us off that something was wrong was queue depth and lock waits. Both had been creeping up for days before anything obvious showed up on the dashboards.
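For anyone wondering how you'd catch that kind of slow creep before a static threshold fires, here's a rough sketch, not what we actually run. It fits a straight line to a window of samples and flags the metric if the fitted growth across the window is a meaningful fraction of the window's average. The function names, the 10% default, and the sample numbers are all made up for illustration.

```python
from statistics import mean


def slope_per_sample(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def is_creeping(values, min_relative_growth=0.1):
    """True if the fitted trend rose by at least min_relative_growth
    (as a fraction of the window's mean) from the first to the last sample.

    min_relative_growth is a hypothetical tuning knob, not a recommended value.
    """
    if len(values) < 2:
        return False
    baseline = mean(values)
    if baseline <= 0:
        return False
    growth = slope_per_sample(values) * (len(values) - 1)
    return growth / baseline >= min_relative_growth


# Hypothetical queue-depth samples, e.g. one reading per day.
queue_depth = [100, 102, 101, 105, 107, 110, 112, 115]
if is_creeping(queue_depth):
    print("queue depth is trending up; worth a look before it pages someone")
```

The point of using a fitted slope instead of comparing the first and last sample is that one noisy reading doesn't trigger it. The same check would work on lock wait time or replication lag, as long as the samples are evenly spaced.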

What signals have saved you from a bad production incident? Replication lag, lock waits, queue depth… what else do teams watch that doesn’t get enough attention?

u/Quanord — 18 hours ago
r/PawChampClub

A while ago, our walks were a mess. Pulling, sudden stops, random lunges at things I couldn’t even see. I came home more trained than my dog.
What helped most was not some genius leash training trick. It was realizing my dog was too overstimulated outside to just magically walk nicely because I wanted that.
The biggest shift was slowing everything down. I stopped treating walks like we had somewhere to be. For a bit, the goal was just calm walking, not distance, not speed, not trying to prove we were a functional duo.
Another thing that helped was me finally noticing how often I was rewarding pulling without meaning to. Dog pulls, dog gets to the smell. Very solid business model from his side. Once I got more consistent and stopped moving when the leash stayed tight, he started getting it.
Also, timing mattered way more than I expected. If I waited until he was already locked onto something and pulling like he paid the bills, I was too late. It worked much better when I caught the moment earlier and redirected before his brain fully left the chat.
And honestly, shorter, better walks helped more than long, chaotic ones. I used to think a “good” walk had to be long. Turns out 15 calm minutes are way better than 40 minutes of public embarrassment.
We still have off days, but leash training made walks way calmer and way less annoying for both of us.

u/Quanord — 6 days ago