r/OReilly_Learning

▲ 254 r/OReilly_Learning+1 crossposts

What’s Your Most Controversial IT Opinion?

Fellow sysadmins, what’s your biggest unpopular IT opinion? Not the usual “users should reboot first” stuff, but the things you’ve learned after a few years in the trenches that you probably wouldn’t say too loudly in a meeting.

reddit.com
u/OReilly_Learning — 1 day ago
▲ 210 r/OReilly_Learning+2 crossposts

Google released two early-release chapters from the SRE Book 2nd Edition this week.

>One is the new "AI for SRE" chapter. It's an O'Reilly publication behind a paywall, but a free trial works. I read it last night; sharing the takeaways for anyone who doesn't want to read the full thing.

The condensed version:

  1. AI is not a human replacement. The book is firm on this. We still need humans for the high-stakes calls and to maintain the AI itself.
  2. Don't give AI full access on day one. Build trust the way you would with a junior engineer. Let it suggest fixes first, fix small issues next, only then expand its scope.
  3. If the agent can take an action, it must have a rollback. If there is no undo path, the access should not be granted. This is the line I think most teams shipping agents are skipping right now.
  4. When the agent fails or gives a bad suggestion, flag it. The chapter leans on the same principle as good postmortem culture: more feedback and more context mean better future execution.
  5. During incidents, the time-saver is not the fix, it is the searching. The chapter frames the agent as the thing that finds the right answer fast across tabs, runbooks, and prior incidents, instead of the thing that pushes the fix.
  6. Dashboards tell you something is broken. AI is positioned as the layer that tells you why, by reading the tickets and the user feedback that the dashboards do not capture.
  7. The framing that stuck with me most: AI does not reduce SRE workload, it raises the reliability ceiling. Cheaper reliability does not mean less work, it means higher reliability demanded across more services. Jevons paradox applied to ops.
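Point 3 is concrete enough to encode. A minimal sketch of what a "no rollback, no access" gate could look like; all names here are hypothetical, the chapter describes the principle, not an implementation:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentAction:
    """A proposed remediation from the agent."""
    description: str
    apply: Callable[[], None]
    rollback: Optional[Callable[[], None]] = None  # None = no undo path exists


def execute(action: AgentAction) -> bool:
    """Refuse any action that has no rollback; otherwise run it."""
    if action.rollback is None:
        print(f"DENIED (no rollback): {action.description}")
        return False
    action.apply()
    return True
```

The useful part is that the check lives in the executor, not in the agent: the agent can propose whatever it wants, but an irreversible action never gets past the gate.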

What I would add as a practitioner: the 5-level maturity model they propose is useful, but the gating criteria between levels are where the real engineering lives. "Agent suggested 50 fixes, 47 were good" sounds great until you ask which 3 were wrong and what they would have broken. Most teams I see skipping straight to autonomous remediation are not doing that work.

Worth a read if you are scoping AI in operations in the next year.

(Disclosure: I run Sherlocks, which builds in this space. This is not a pitch for it.)

u/OReilly_Learning — 3 days ago
▲ 27 r/OReilly_Learning+1 crossposts

A customer asked why ChatGPT was saying our product doesn’t support subscriptions. We’ve had subscriptions live for over a year, so that didn’t make any sense.

I tried it myself and got the same answer.

So I dug a bit deeper and hit our pricing page using GPTBot as the user agent. The response looked… fine at first glance. Layout, nav, footer, all there. But the actual content was basically empty divs where React would normally hydrate.

So yeah, the bots weren’t seeing our product. They were seeing a skeleton of it.

Checked a few others too: Perplexity was messing up our pricing, Claude was missing entire parts of the product. Every AI had a slightly different wrong version.

We ended up doing something pretty simple in hindsight.

Instead of trying to make bots understand our HTML, we just gave them a format they’re better at reading.

Now every page has a markdown version alongside it. Same content, just clean, structured, no JS needed. At build time we generate both /page and /page.md.
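For anyone curious what the build step amounts to, it's roughly this. The post doesn't show its pipeline, so this is an illustrative sketch with made-up names, not their actual code:

```python
from pathlib import Path


def emit_page(out_dir: Path, slug: str, markdown: str, html: str) -> None:
    """Write the rendered HTML page and its markdown twin side by side,
    so /<slug> and /<slug>.md both exist in the build output."""
    page_dir = out_dir / slug
    page_dir.mkdir(parents=True, exist_ok=True)
    (page_dir / "index.html").write_text(html, encoding="utf-8")
    (out_dir / f"{slug}.md").write_text(markdown, encoding="utf-8")
```

If your pages already start life as markdown or structured content, this is nearly free; the expensive direction is reconstructing clean markdown from rendered HTML.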

Then at the edge, we check the user agent. If it’s one of the known AI crawlers (GPTBot, ClaudeBot, etc.), we serve the markdown version. Otherwise it just goes to the normal site. It’s literally just a string match, so there’s basically no overhead.
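The edge check really is just a substring match. Something like this, with the caveat that the crawler list below is an example and needs ongoing upkeep as vendors add or rename their user agents:

```python
# Known AI crawler user-agent markers (illustrative subset; keep this updated).
AI_CRAWLER_MARKERS = (
    "GPTBot",
    "ClaudeBot",
    "PerplexityBot",
)


def is_ai_crawler(user_agent: str) -> bool:
    """Cheap substring match; negligible overhead per request."""
    return any(marker in user_agent for marker in AI_CRAWLER_MARKERS)


def route(path: str, user_agent: str) -> str:
    """Serve the markdown twin to known AI crawlers, normal site to everyone else."""
    return f"{path}.md" if is_ai_crawler(user_agent) else path
```

One design note: doing this at the edge on user agent rather than via content negotiation means well-behaved crawlers that don't send the expected UA still get the normal site, which fails safe.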

One small thing that made a surprisingly big difference: we normalized a lot of text to plain ASCII. Stuff like ₹ symbols, fancy quotes, em dashes. Models were weirdly inconsistent with those, but something like “INR 15000” gets reproduced correctly every time.
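The normalization itself is just a replacement table. A sketch of the idea; the mapping below is a small illustrative subset, not their full table:

```python
# Map troublesome non-ASCII characters to plain-ASCII equivalents.
REPLACEMENTS = {
    "\u20b9": "INR ",              # ₹ rupee sign
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en and em dashes
    "\u00a0": " ",                 # non-breaking space
}


def to_plain_ascii(text: str) -> str:
    """Replace known-problematic characters with ASCII equivalents."""
    for char, repl in REPLACEMENTS.items():
        text = text.replace(char, repl)
    return text
```

An explicit table beats a blanket "strip everything non-ASCII" pass because you control the replacement (₹ becomes "INR ", not an empty string), and legitimate non-ASCII you haven't listed passes through untouched.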

We’re also logging all bot requests now, mainly to see where markdown coverage is missing. That ended up being the most useful signal.

We did try going the SSR route first, thinking “just render everything for bots.” It technically worked, but added latency and still sent a lot of noisy HTML. Felt like we were maintaining two systems for no real gain, so we scrapped it pretty quickly.

Right now things are a lot more stable, but one thing we’re still figuring out is redirects. Bots cache pretty aggressively, and if they hit an old URL and get a 404, that seems to stick around longer than you’d expect.

Curious if anyone else has dealt with that part, how are you handling old URLs and keeping AI crawlers in sync?

u/OReilly_Learning — 13 days ago