r/sre

▲ 4 r/sre

Will Datadog bill me twice for APM if I delete and recreate a host?

Datadog's pricing table says that APM starts at $35 per host per month.

My question is: what happens if, during a month, I delete one of my hosts (for example, an AWS EC2 instance) and create a new one? Will I be billed twice ($70 for the month), or will the bill be calculated from the number of hours each host actually ran (so the total would be $35 for the month)?

Thank you
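The outcomes the question describes can be compared with a small model. This is a minimal sketch, assuming a 30-day (720-hour) month sampled hourly; the function names are hypothetical and this is not an official Datadog calculator. Model 3 follows the hourly high-water-mark scheme that Datadog's billing docs describe for infrastructure hosts (count hosts each hour, ignore the top 1% of hours, bill the highest remaining count). Whether APM on your plan is billed the same way is an assumption to confirm in Datadog's billing documentation or with their support.

```python
import math

PRICE_PER_HOST = 35.0  # USD per host per month, the APM list price quoted in the post


def distinct_host_bill(hourly_hosts):
    """Model 1: charge for every distinct host that appeared during the month."""
    seen = set()
    for hosts in hourly_hosts:
        seen.update(hosts)
    return len(seen) * PRICE_PER_HOST


def prorated_bill(hourly_hosts):
    """Model 2: charge each host for the fraction of the month it actually ran."""
    host_hours = sum(len(hosts) for hosts in hourly_hosts)
    return host_hours / len(hourly_hosts) * PRICE_PER_HOST


def high_water_mark_bill(hourly_hosts, keep_fraction=0.99):
    """Model 3 (ASSUMED to apply to APM): count concurrent hosts per hour,
    discard the busiest 1% of hours, and charge for the highest remaining count."""
    counts = sorted(len(hosts) for hosts in hourly_hosts)
    kept = max(1, math.floor(len(counts) * keep_fraction))
    return counts[kept - 1] * PRICE_PER_HOST


if __name__ == "__main__":
    # Old EC2 instance runs for the first half of the month, its replacement for the second.
    month = [{"i-old"} if hour < 360 else {"i-new"} for hour in range(720)]
    print(distinct_host_bill(month))    # 70.0
    print(prorated_bill(month))         # 35.0
    print(high_water_mark_bill(month))  # 35.0
```

In this example only the distinct-host model produces $70. Under model 3, a brief overlap while both instances run is also absorbed, as long as it stays within the 8 discarded hours (720 - 712).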

reddit.com
u/Ok-Transition-7857 — 15 hours ago
▲ 8 r/sre

New PM wants AI-generated root cause analysis. Am I overreacting to the quality?

Just started building out an agentic workflow for incident response, and our new PM is fully bought in on AI-generated RCA reports. He says it'll cut toil and catch patterns we miss manually. Sounds great in theory.

Then I saw the POC output, and it's flagging random correlations, like "high number of firefox texture limit events may indicate frontend rendering issues" showing up in a backend latency incident.

I pushed back, saying we need proper data correlation, not just anomaly pattern-matching, but he wants everyone committing AI outputs directly to the runbooks. I'm the platform lead, and this feels like it's going to create more review work, not less.

Has anyone dealt with AI RCA tools that actually reduce MTTR without creating a mountain of garbage to sift through first? Or is manual still king for complex incidents? Starting to lose faith in this whole direction.
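The "proper data correlation" point can be made concrete with a pre-filter: before an anomaly is allowed into an RCA draft, require that it come from the affected service or something it depends on, and that it fall inside the incident window. This is a minimal sketch, assuming you already have a service dependency map and timestamped signals; `Signal`, `correlated_signals`, `downstream_services`, the five-minute slack, and the service names are all hypothetical. It would drop a frontend Firefox event from a backend latency incident, but it is a sanity filter, not a substitute for real causal analysis.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Signal:
    service: str         # service that emitted the anomaly (hypothetical schema)
    name: str            # anomaly or event name
    timestamp: datetime


def downstream_services(roots, dependency_graph):
    """Return the roots plus every service they call, directly or transitively."""
    in_scope, stack = set(), list(roots)
    while stack:
        service = stack.pop()
        if service in in_scope:
            continue
        in_scope.add(service)
        stack.extend(dependency_graph.get(service, []))
    return in_scope


def correlated_signals(signals, affected, dependency_graph, start, end,
                       slack=timedelta(minutes=5)):
    """Keep signals that are on the affected services' dependency path
    and inside the incident window (padded by `slack` on each side)."""
    scope = downstream_services(affected, dependency_graph)
    lo, hi = start - slack, end + slack
    return [s for s in signals if s.service in scope and lo <= s.timestamp <= hi]


if __name__ == "__main__":
    graph = {"web-frontend": ["checkout-api"], "checkout-api": ["payments-db"]}
    t0 = datetime(2024, 1, 1, 12, 0)
    signals = [
        Signal("web-frontend", "firefox_texture_limit", t0),            # upstream caller: dropped
        Signal("payments-db", "slow_query", t0 + timedelta(minutes=2)),  # dependency, in window: kept
        Signal("payments-db", "slow_query", t0 - timedelta(hours=3)),    # dependency, too early: dropped
    ]
    for s in correlated_signals(signals, {"checkout-api"}, graph, t0, t0 + timedelta(minutes=30)):
        print(s.service, s.name, s.timestamp)
```

Filtering by the dependency direction matters here: for a backend latency incident, the callers (such as the frontend) are victims rather than candidate causes, so only the affected service and what it calls stay in scope.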

u/Appropriate-Plan5664 — 23 hours ago
▲ 3 r/sre

We're doing weekly live coding sessions on our open-source eBPF root cause analysis tool. Anyone interested in joining?

Hey everyone!

We've been building an open-source eBPF-based agent for automated root cause analysis and wanted to start opening up the development process to the community.

We're thinking of doing weekly live coding sessions where we work through the codebase together: debugging, building features, and discussing architecture decisions in real time.

Has anyone done something similar with their open-source project? Would love to know what worked. And if anyone's curious to join, happy to share the details in the comments.

u/Epifyse — 21 hours ago
▲ 0 r/sre

Stop Using Top! Try sysview – The Minimal Resource Monitor I Built (Open Source)

Tired of bloated monitoring tools or staring at endless lines in top/htop? I built sysview – a blazing fast, lightweight system resource monitor that shows you exactly what you need (CPU, RAM, disk, network) in a clean terminal UI. No clutter, no fuss, just instant visibility.


Why sysview beats the usual suspects:

  • 🚀 Minimal dependencies – install in seconds
  • ⚡ Real-time stats for CPU, memory, disk, and network
  • 🖥️ Perfect for SSH sessions and remote work
  • 👀 Designed for SREs who want actionable info, not noise

If you’re ready to ditch the old tools and see your system’s health at a glance, check out sysview! I’m actively maintaining it and would love your feedback or contributions.

Let me know what you think, or if you have feature requests – let’s make sysview the go-to tool for SREs!

Features

  • Cross-platform: Works on macOS and Linux
  • Color-coded: Green (good), Yellow (warning), Red (critical)
  • Progress bars: Visual representation of usage
  • Process tree: Hierarchical view of processes
  • Git integration: Status, branches, commits, remotes
  • Interactive: Real-time monitoring mode
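
The color-coded progress bars described above boil down to mapping a usage percentage to a severity level and drawing a fixed-width bar in that level's ANSI color. This is a minimal sketch, not sysview's actual code: the 70%/90% warning and critical thresholds, the bar width, and the function names are assumptions for illustration.

```python
# Standard ANSI SGR foreground color codes for the three severity levels.
GREEN, YELLOW, RED, RESET = "\033[32m", "\033[33m", "\033[31m", "\033[0m"
COLORS = {"good": GREEN, "warning": YELLOW, "critical": RED}


def severity(percent, warn=70.0, crit=90.0):
    """Classify usage; the thresholds are illustrative, not sysview's."""
    if percent >= crit:
        return "critical"
    if percent >= warn:
        return "warning"
    return "good"


def usage_bar(label, percent, width=20):
    """Render a line like 'CPU  [#####.....]  50.0%' (width=10),
    with the bar wrapped in the severity's color."""
    percent = max(0.0, min(100.0, percent))  # clamp to 0-100
    filled = round(width * percent / 100)
    color = COLORS[severity(percent)]
    return f"{label:<5}{color}[{'#' * filled}{'.' * (width - filled)}]{RESET} {percent:5.1f}%"


if __name__ == "__main__":
    for label, pct in [("CPU", 42.0), ("MEM", 78.5), ("DISK", 93.0)]:
        print(usage_bar(label, pct))
```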
u/UnitedYak6161 — 20 hours ago
▲ 0 r/sre

AWS Cloud / DevOps / SRE / Cloud Engineer

Hi,

I am looking for AWS cloud engineer, SRE, DevOps, or production support roles.

I have 10.5 years of total experience, including 5 years in cloud.

I can join immediately.

Are there any openings?

Or could anyone refer me?

u/Hi__J — 17 hours ago