u/harbinger-alpha

LLM bug bounty landscape in 2026: program-by-program scope, median payouts, and the indirect-injection gap

I've been reporting LLM vulns to a few programs over the last year and got tired of the gap between what programs advertise and what actually gets paid.

The published info is scattered across program pages, policy revisions, and disclosed reports. Nobody's pulled it into a single comparative view. So I did. Program-by-program: Anthropic, OpenAI, Google, Meta, Microsoft, Hugging Face, plus the HackerOne/Bugcrowd long tail of SaaS-with-AI-bolted-on programs. What's scoped, what's practically out-of-scope regardless of what the policy says, realistic median payouts vs. the advertised ceilings.

A few hot takes the writeup lands on:

  • Median AI-bug payout in 2026 is ~$500–$2,500, not the $15k-$50k headline numbers. Those headlines refer to critical findings against core infra; the middle band is where most hunters live.
  • Indirect prompt injection is the biggest scope-vs-triage gap. Most programs nominally accept it. In practice triage under-rates it, especially multi-step chains, because the impact argument lives in a paragraph of report text rather than a single reproducible step.
  • OpenAI's program specifically: report framing matters more than the vuln itself. "I can make ChatGPT say X" is closed as known limitation. "I can make a third-party Custom GPT leak its uploaded knowledge files" is paid. Same underlying primitive, different framing, 10x different outcome.
  • Anthropic's program is the highest-signal in the space if you can invest multi-day effort per finding — their "universal jailbreak" bar is real but so are the payouts when you hit it.
  • The growth zone is the HackerOne/Bugcrowd long tail: SaaS products that bolted AI features onto existing surfaces, where triagers often don't know how to evaluate LLM-specific impact. Opportunity + frustration.

Bias disclosure: I run Wraith (AI-security platform, hands-on academy and cert for this discipline) so I care about hunters succeeding in this space because it validates the category. I've reported into several of these programs myself and flagged where I have direct experience vs. secondary.

Full writeup: wraith.sh/learn/state-of-llm-bug-bounties-2026 Pushback welcome — especially from hunters who've had different experiences with any of the programs. What would you add or disagree with?

reddit.com
u/harbinger-alpha — 7 hours ago

I built a free hands-on CTF-style course for AI/LLM security attacks — looking for red-team feedback

I've been doing AI security work for a while (pentest background,

PhD, eCPPT) and something kept bugging me: when colleagues asked

"where do I learn to break LLM agents?" I had nothing hands-on to

point them to. Every "AI security training" was either a whitepaper

or a $3k vendor course with slides.

So I wrote one. Six modules over the attack classes I run into in

production:

- Prompt Injection (direct)

- Indirect Prompt Injection (via retrieved content / RAG)

- System Prompt Extraction

- Tool Abuse / Excessive Agency

- Data Exfiltration

- Jailbreaks / Guardrail Bypass

Each module is a mini course: concept explainer (~10k words on

average), annotated walkthrough attacking a fictional product

(HyperionBot, Relay support copilot, Inkwell, Glyph SaaS), defense

patterns with priority order, knowledge check. Then a hands-on CTF

challenge against a chatbot I built to be deliberately-weak in that

specific way — you chat with it and try the attack yourself.

One technical note I'm curious about: the challenges use

deterministic trigger patterns layered under an LLM fallback, so

the intended-solution path reliably fires regardless of model

alignment on a given day. The target is Claude Haiku with a

roleplay-weak-character system prompt, plus pattern-matched canonical

leaks when the intended technique is detected. Works well enough that

the lesson lands without depending on alignment to hold a specific

way. I'd be interested in how other AI security educators handle

this — it's a practical problem when teaching an attack that a

well-aligned model will resist.

Free tier: concept reads + one practice challenge per module.

Full access (quizzes, defense content, advanced challenges) is a

monthly subscription; there's also a cert exam on top. Core

material is substantial even on the free tier if that's your

comfort level.

Link in comments. Three things I'd love feedback on from this sub:

  1. Am I wrong on any defense patterns? The guardrail-bypass /

    crescendo defense chapter I'm least confident about — that

    whole attack class is hard to defend against without breaking

    product UX.

  2. Attack classes I didn't cover that you'd want to see? Vector

    embedding poisoning, agentic memory poisoning, supply chain

    are all on my roadmap but haven't shipped.

  3. For anyone teaching AI security internally: what do you

    actually point your team at today? I'd genuinely like to know

    what the competition looks like from inside the industry.

reddit.com
u/harbinger-alpha — 2 days ago