Coming from an infrastructure background, I was accustomed to real-time alerting on hardware events. Since moving into the cloud, I’ve noticed the industry accepts a 24-72 hour delay in billing data (and that assumes you’re being more proactive than just looking at the monthly bill). I was using Cloudability at the time and even it was behind (because the provider data itself is behind). But I was able to build real-time alerting software that sends me a notice as soon as a resource usage event occurs (with the expected price impact). I’m considering open-sourcing the main functionality (monitoring/alerting) on GitHub and offering a paid upgrade for additional features (multiple users, support, anomaly detection, tagging analysis, AI/LLM token forecasting, MCP for BYOLLM, etc). Any thoughts on this approach?
u/Artistic_Lock_6483
Coming from an infrastructure background, I was accustomed to real-time alerting on hardware events. Since moving into the cloud, I’ve noticed the industry accepts a 24-72 hour delay in billing data (in other words, if you have a price spike, it started at least 24 hours ago). So I built real-time alerting software that sends me a notice as soon as a resource usage event occurs (with the expected price impact). I’m considering open-sourcing the main functionality (monitoring/alerting) on GitHub and offering a paid upgrade for additional features (multiple users, support, anomaly detection, tagging analysis, AI/LLM token forecasting, MCP for BYOLLM, etc). Any thoughts on this approach?
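To make "a notice with the expected price impact" concrete, here is a minimal sketch of the idea, not the actual tool. The rate table, the `UsageEvent` type, the resource names, and the $100/day threshold are all hypothetical placeholders; real rates depend on provider, region, and SKU.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical hourly rates in USD, for illustration only.
HOURLY_RATE_USD = {"vm.large": 0.40, "gpu.xlarge": 3.20}


@dataclass
class UsageEvent:
    resource_type: str  # key into HOURLY_RATE_USD
    count: int          # number of resources started by this event


def projected_daily_cost(event: UsageEvent) -> float:
    """Cost in USD if the resources from this event run for 24 hours."""
    return HOURLY_RATE_USD[event.resource_type] * event.count * 24


def alert_message(event: UsageEvent, threshold_usd: float = 100.0) -> Optional[str]:
    """Return an alert string when the projected daily cost reaches the threshold."""
    cost = projected_daily_cost(event)
    if cost < threshold_usd:
        return None  # below the threshold, stay quiet
    return (
        f"{event.count} x {event.resource_type} started: "
        f"~${cost:,.2f}/day if left running"
    )


if __name__ == "__main__":
    print(alert_message(UsageEvent("gpu.xlarge", 10)))
```

The point of the sketch is that the alert is driven by the usage event and a known rate, so it does not have to wait for the billing export to catch up.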
Ever notice how cloud spend horror stories typically happen over a weekend? It’s because billing data lags behind usage (24-72 hours depending on your cloud provider). People start paying attention first thing Monday morning, and whatever state things were in on Friday (when attentiveness is down) has only now hit the dashboard (and that assumes you’re looking at the right dashboard and not just waiting for the monthly bill). If your daily spend is $10k, a 72-hour billing delay (standard rating latency for AWS/Azure) results in $30,000 of unrecoverable spend before an alert even fires.
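The arithmetic behind that $30,000 figure can be sketched as below, assuming spend accrues at a constant rate over the delay window. The function name is made up for illustration.

```python
def spend_before_alert(daily_spend_usd: float, billing_delay_hours: float) -> float:
    """Spend accrued before delayed billing data can trigger an alert.

    Assumes a constant spend rate across the whole delay window.
    """
    return daily_spend_usd * billing_delay_hours / 24


if __name__ == "__main__":
    # $10k/day with a 72-hour delay, the example from the post.
    print(spend_before_alert(10_000, 72))
    # The same daily spend at the low end of the 24-72 hour range.
    print(spend_before_alert(10_000, 24))
```

Even at the 24-hour end of the range, a full day of spend has gone by before the data shows up.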
Our CFO kept asking me about the bill, and I was retroactively digging through reports (Cloudability and native Azure/AWS), but playing investigator was annoying. Coming from an infrastructure background, I expected to be alerted when things happened, not to find out only after the fact (didn’t monitoring software solve this like 10 years ago?!). I built my own solution for our use case, but I’m wondering why no one else is bothered by this.
Hey r/FinOps — pushed cletrics/finops-agents public this week. MIT. This community was in our head the whole time we were building it.
34 specialist agent personas + 6 named-pattern playbooks. Markdown files with YAML frontmatter. Drops into any modern coding assistant (Claude Code, Cursor, Copilot, Windsurf, Aider, OpenCode, Gemini CLI). No runtime, no telemetry, no network.
Why: when a dev asks their assistant "help me analyze the CUR" or "is this RDS oversized?", the generic answer is subtly wrong. CUR 2.0 columns ≠ CUR 1. GCP SUDs apply automatically, CUDs don't. Azure has 6 enrollment types. Each persona here is scoped tight to one niche with the schema, gotchas, and questions a senior practitioner asks first.
Categories: cloud-cost (8), commitments (5), kubernetes (3), data-platforms (3), governance (6), waste-detection (6), specialized (3).
Named-pattern playbooks you can cite in postmortems: Zombie NAT Gateway, Snapshot Sprawl, Cross-AZ Chatterbox, Idle Load Balancer, Oversized RDS, Untagged Spend Drift.
Repo: https://github.com/Cletrics/finops-agents
Pinned roadmap discussion: upvote candidate agents (Snowflake, Databricks, LLM API spend, GCP folder hierarchy, localizations).
PRs welcome. I'm working on the FinOps Professional cert (analyst + practitioner already) and built these to help in a small FinOps org. What's missing?