OpenAI Daybreak - security tooling for defenders

As expected, OpenAI is coming out strong against Anthropic’s security releases

“Defenders can bring secure code review, threat modeling, patch validation, dependency risk analysis, detection, and remediation guidance into the everyday development loop so software becomes more resilient from the start.”

reddit.com
u/AnswerPositive6598 — 2 days ago

Over the last few weeks, I've been building out an open source security and compliance tool for AWS and Azure. The initial output looked **pretty decent**, but as I put it to the test against real-world cloud environments, a number of **key gaps** emerged.

  1. Features in the documentation were completely **missing in code**
  2. **Test coverage** was very poor
  3. AWS checks **weren't mapped to CIS benchmarks**
  4. Initially, the AWS scan **covered only one region** (us-east-1) and the Azure scan only one subscription, not the others in that tenant (see the region sketch after this list)
  5. Reporting **verbiage was wrong**
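
The region gap in particular only went away once every check iterated over all enabled regions instead of defaulting to us-east-1. A minimal sketch of that pattern, assuming boto3 credentials are already configured; the helper names are illustrative, not taken from the actual toolkit:

```python
import boto3

def enabled_regions() -> list[str]:
    """Discover every region enabled for the account instead of assuming us-east-1."""
    ec2 = boto3.client("ec2", region_name="us-east-1")
    return [r["RegionName"] for r in ec2.describe_regions(AllRegions=False)["Regions"]]

def run_check_in_all_regions(check) -> list[dict]:
    """Run a per-region check callable against a session for each enabled region."""
    findings: list[dict] = []
    for region in enabled_regions():
        findings.extend(check(boto3.Session(region_name=region)))
    return findings
```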

I decided to dig deeper into how Claude Code works and ask it how we could have avoided or reduced these gaps. Its response was super interesting and probably not surprising to others on this subreddit, but it was definitely enlightening for me.

I then asked it to document all these gaps in a markdown file, which we then referenced in Claude.md to make sure we avoid them in the future. Some of the key lessons were:

  1. *Determinism is a legitimate choice in specific use cases.* For this particular toolkit, where every finding had to be legitimate and traceable, we decided to use direct, deterministic API calls to discover settings and map them to controls (a sketch of this pattern follows the list).
  2. *Every line in the documentation had one or more tests checking the actual implementation.* In the first one or two runs, we found a number of stubs.
  3. *Document all bugs and their fixes.* Anyone reading the repository now has an audit trail of what failure modes were encountered and how they were fixed.
  4. *Auditability: every output traces to a cause.* When the software produces a result, can you explain *why* it produced that result, in terms a human can follow?
  5. *Honest scope.* Document what the software does, but more importantly what it does not do. The initial README claimed comprehensive AWS scanning; we pared that back to what was actually covered and what wasn't.
  6. *Test extensively.* I scanned half a dozen cloud environments; I wish I had access to more. Each scan yielded more gaps and helped improve the tool.
  7. *Legibility.* Can a human read the code and understand what is going on? Can you, as the author, explain the purpose of each file in the repo?
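
To make the determinism and auditability lessons concrete, here is a minimal sketch of the "one check, one direct API call, evidence attached" pattern, again assuming boto3 is configured; the control ID, function name, and finding shape are illustrative rather than the toolkit's actual schema:

```python
import boto3
from botocore.exceptions import ClientError

def check_bucket_public_access_block(bucket: str) -> dict:
    """One finding per resource, with the raw API response kept as evidence."""
    s3 = boto3.client("s3")
    try:
        config = s3.get_public_access_block(Bucket=bucket)["PublicAccessBlockConfiguration"]
        passed, evidence = all(config.values()), config
    except ClientError as err:
        if err.response["Error"]["Code"] != "NoSuchPublicAccessBlockConfiguration":
            raise  # anything unexpected should fail loudly, not produce a silent PASS
        passed, evidence = False, {"PublicAccessBlockConfiguration": "not configured"}

    return {
        "control": "s3-public-access-block",  # illustrative control ID, not a real CIS mapping
        "resource": f"arn:aws:s3:::{bucket}",
        "status": "PASS" if passed else "FAIL",
        "evidence": evidence,  # the exact API response is the auditable "why"
    }
```

Because the verdict is derived only from the recorded evidence, anyone reviewing a finding can follow the reasoning without re-running the scan.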

This is on top of extensive use of plan, ultraplan, brainstorm, and other modes, which I found very insightful but which didn't fix the basic hallucination and code-quality issues I've enumerated above.

What are your guardrails to ensure you build trustworthy and reliable software?

reddit.com
u/AnswerPositive6598 — 6 days ago

If you have been following the “Trivy -> Checkmarx -> Dependabot -> Who else” saga, here are the top 10 things you can do to secure your dev environment:

  1. Pin GitHub Actions to full commit SHAs, not version tags

  2. If you aren’t sure whether you’ve been compromised, rotate all your creds anyway - GitHub keys, API keys, DB credentials, LLM keys, etc.

  3. Use short-lived credentials via OIDC, not long-lived cloud keys

  4. Protect publisher and maintainer accounts with MFA - even investing in hardware keys if you can afford it

  5. Scope every token to the minimum access it needs - be it a PyPI or npm token or a cloud account. Probably do an end-to-end access review immediately

  6. Add dependency cooldowns - don’t auto-install a newer version of a package the day it is released (see the sketch after this list)

  7. Audit OAuth grants in Google Workspace and Microsoft Entra (the Vercel hack was partly because of this)

  8. Have a supply chain incident response playbook

  9. Run SCA (software composition analysis) to find and fix known vulnerable or malicious package dependencies

  10. I’d love to say implement egress filtering, but in fast-moving dev environments that may not always be possible.
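
For the cooldown point, a minimal sketch of what a pre-upgrade gate could look like, using PyPI’s public JSON metadata; the threshold and function names are illustrative, and a real setup would live in CI or a pip/npm wrapper rather than in application code:

```python
from datetime import datetime, timezone

import requests

COOLDOWN_DAYS = 7  # illustrative threshold; tune to your risk tolerance

def release_age_days(package: str, version: str) -> float:
    """Age of a specific release, based on PyPI's per-version JSON metadata."""
    resp = requests.get(f"https://pypi.org/pypi/{package}/{version}/json", timeout=10)
    resp.raise_for_status()
    uploads = resp.json()["urls"]
    if not uploads:
        return 0.0  # no published files yet; treat as brand new (blocked)
    oldest = min(
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for f in uploads
    )
    return (datetime.now(timezone.utc) - oldest).total_seconds() / 86400

def allowed(package: str, version: str) -> bool:
    """Only allow releases that have been public longer than the cooldown window."""
    return release_age_days(package, version) >= COOLDOWN_DAYS
```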

Anything you’d add or change?

reddit.com
u/AnswerPositive6598 — 15 days ago