The video is the full run. The command was literally this:
qagent "Goal: I successfully buy a backpack. Steps: 1. login (standard_user / secret_sauce) 2. add 'Sauce Labs Backpack' to cart 3. open the cart 4. checkout — fill First Name, Last Name, Zip 5. click Continue, then Finish. End: I see the 'Thank you for your order!' confirmation page" --url https://www.saucedemo.com/
That's the whole spec. No selectors, no fixtures, no await page.click(...). A real Playwright browser, end-to-end, with PASS/FAIL + evidence at the end.
What's working
google/gemma-4-26b-a4b-itvia OpenRouter passes a real Gravity Forms submission (5 required fields, checkbox arrays, paired email confirmation) for ~$0.008/run, 5/5.gpt-4.1-minialso passes 5/5, ~6× the cost.--reporter=ndjsonstreams one JSON event per turn and a stabledoneenvelope at the end (outcome,evidence,totalCost,finalUrl). Exit codes 0/1/2/3. So a Claude Code parent agent can shell out, parsetail -1, and act on a real verdict.
Install
npm install -g @qagent/cli
npx playwright install chromium
qagent config set apiKey sk-or-...
qagent config set provider openrouter #Or any other provider you like
qagent config set model google/gemma-4-26b-a4b-it
Claude code usage: Just tell it to test with qagent / read the readme.
Why did I build this? Because letting claude code test it's own creation fails SO MANY TIMES. And also takes ages, while burning tokens. I prefer to pay the API price (for gemma-4 that's less than a cent).
u/haukebr — 7 days ago