u/haukebr — reddlx

The video is the full run. The command was literally this:

qagent "Goal: I successfully buy a backpack. Steps: 1. login (standard_user / secret_sauce) 2. add 'Sauce Labs Backpack' to cart 3. open the cart 4. checkout — fill First Name, Last Name, Zip 5. click Continue, then Finish. End: I see the 'Thank you for your order!' confirmation page" --url https://www.saucedemo.com/

That's the whole spec. No selectors, no fixtures, no await page.click(...). A real Playwright browser, end-to-end, with PASS/FAIL + evidence at the end.

What's working

google/gemma-4-26b-a4b-it via OpenRouter passes a real Gravity Forms submission (5 required fields, checkbox arrays, paired email confirmation) for ~$0.008/run, 5/5.
gpt-4.1-mini also passes 5/5, ~6× the cost.
--reporter=ndjson streams one JSON event per turn and a stable done envelope at the end (outcome, evidence, totalCost, finalUrl). Exit codes 0/1/2/3. So a Claude Code parent agent can shell out, parse tail -1, and act on a real verdict.

Install

npm install -g @qagent/cli
npx playwright install chromium
qagent config set apiKey sk-or-...
qagent config set provider openrouter #Or any other provider you like
qagent config set model google/gemma-4-26b-a4b-it

Claude code usage: Just tell it to test with qagent / read the readme.

Why did I build this? Because letting claude code test it's own creation fails SO MANY TIMES. And also takes ages, while burning tokens. I prefer to pay the API price (for gemma-4 that's less than a cent).