u/IndianITCell

Building Open-source Agentic QA Harness with Memory

Hey Reddit,
I am the creator of agent-qa.

AI has accelerated development, letting devs build products at lightning speed. But the confidence that the product actually works isn't there. Coding agents can write tests on their own, but they greedily write tests just to make them pass.

The intention behind agent-qa is to provide an AI-native solution to E2E testing.
I have used Playwright as the kernel that executes planned actions in the QA harness.
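
To make the "kernel" idea concrete, here is a rough sketch of planned actions being handed to an execution kernel. This is not agent-qa's actual code: `PlannedAction`, `Kernel`, `RecordingKernel`, and `execute` are hypothetical names, and the kernel here only records calls so the example runs without a browser. A real kernel would presumably wrap a Playwright page instead.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class PlannedAction:
    """One step the planner wants executed (hypothetical shape)."""
    kind: str        # "goto", "click" or "fill"
    target: str      # URL for "goto", locator for "click"/"fill"
    value: str = ""  # text to type, only used by "fill"


class Kernel(Protocol):
    """The minimal surface the harness needs from an execution kernel."""
    def goto(self, url: str) -> None: ...
    def click(self, locator: str) -> None: ...
    def fill(self, locator: str, text: str) -> None: ...


class RecordingKernel:
    """Stand-in kernel: records calls instead of driving a browser."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def goto(self, url: str) -> None:
        self.calls.append(f"goto {url}")

    def click(self, locator: str) -> None:
        self.calls.append(f"click {locator}")

    def fill(self, locator: str, text: str) -> None:
        self.calls.append(f"fill {locator} {text}")


def execute(plan: List[PlannedAction], kernel: Kernel) -> None:
    """Dispatch each planned action to the matching kernel call."""
    for action in plan:
        if action.kind == "goto":
            kernel.goto(action.target)
        elif action.kind == "click":
            kernel.click(action.target)
        elif action.kind == "fill":
            kernel.fill(action.target, action.value)
        else:
            raise ValueError(f"unknown action kind: {action.kind!r}")


# Example plan with made-up URL and locators.
plan = [
    PlannedAction("goto", "https://example.com/login"),
    PlannedAction("fill", "#email", "qa@example.com"),
    PlannedAction("click", "text=Sign in"),
]
kernel = RecordingKernel()
execute(plan, kernel)
print(kernel.calls)
```

Keeping the kernel behind a small interface like this means the planning layer never touches browser APIs directly, so the same plan could in principle be run by a different backend.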

Looking forward to feedback.

GitHub - https://github.com/vostride/agent-qa
Consider giving it a ⭐
Thanks!

Demo - vostride.com/

u/IndianITCell — 20 hours ago

The Playwright maintenance trap (and why simply wrapping LLMs around it isn't enough)

Hey Reddit,
I’ve spent the last 5+ years fighting flaky E2E tests and false-positive pipeline failures.

Recently, I started experimenting with autonomous coding agents to write test scripts. They can generate hundreds of lines of code in seconds, but I quickly noticed a massive flaw: they greedily write brittle, static tests just to make the pipeline pass in the moment. The second a developer changes a CSS class or the UI renders a little late, the test breaks, and someone has to manually intervene.

I realized that simple "self-healing" or prompt-engineering an LLM to guess a new selector isn't an actual fix—it's just a band-aid.

So, I decided to take a completely different architectural approach. I open-sourced a framework called agent-qa, but instead of making it a standard test generator, I built it around the concept of an "evolving execution memory."

Here is how the architecture is set up:

  1. It uses Playwright (web) and Appium (mobile) strictly as a low-level execution kernel.
  2. The AI harness sits on top, but it doesn't write static code. It executes steps written in natural language by observing the actual screen state and accessibility tree.
  3. The core difference: With every test run, it builds persistent Product, Suite, and Test memory. If it encounters a broken UI element, it attempts alternative paths to heal the test on the fly. Crucially, it saves that successful intervention to its memory so it doesn't waste tokens or time making the same mistake on the next run.

It basically learns your application's navigation model over time, which cuts execution time and token usage on subsequent runs because known-good paths are reused instead of being rediscovered.
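
Here is a minimal sketch of the "heal once, remember it" loop described above. It is an illustration under my own assumptions, not the repo's implementation: `ExecutionMemory`, `run_step`, and `fake_ui` are hypothetical names, the three memory levels (Product, Suite, Test) are collapsed into a single dictionary, and the UI is faked with a function so the example runs without Playwright or Appium.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ExecutionMemory:
    """Maps a natural-language step to the locator that last worked for it."""
    known_good: Dict[str, str] = field(default_factory=dict)

    def recall(self, step: str) -> Optional[str]:
        return self.known_good.get(step)

    def remember(self, step: str, locator: str) -> None:
        self.known_good[step] = locator


def run_step(
    step: str,
    candidates: List[str],
    try_locator: Callable[[str], bool],
    memory: ExecutionMemory,
) -> Tuple[str, int]:
    """Try the remembered locator first, then the remaining candidates.

    Returns the locator that worked and how many attempts it took.
    """
    remembered = memory.recall(step)
    ordered = ([remembered] if remembered else []) + [
        c for c in candidates if c != remembered
    ]
    for attempts, locator in enumerate(ordered, start=1):
        if try_locator(locator):
            memory.remember(step, locator)  # save the successful path
            return locator, attempts
    raise RuntimeError(f"could not complete step: {step!r}")


def fake_ui(locator: str) -> bool:
    """Pretend the old CSS id was removed and only the role locator works."""
    return locator == "role=button[name='Submit']"


memory = ExecutionMemory()
candidates = ["#submit-old", "role=button[name='Submit']"]

first = run_step("Click the submit button", candidates, fake_ui, memory)
second = run_step("Click the submit button", candidates, fake_ui, memory)
print(first)   # healed on the second attempt
print(second)  # remembered locator works on the first attempt
```

The first run pays for one failed attempt before it heals; the second run goes straight to the remembered locator. In a real harness each avoided attempt would also be an avoided model call, which is where the token savings would come from.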

I’m looking for brutal feedback on this architecture from other QA and automation engineers.

Links to the open-source repo and the technical docs are in the comments. Let me know what you think of the execution memory approach.

reddit.com
u/IndianITCell — 1 day ago

▲ 1 r/opensource+2 crossposts

agent-qa: Write tests in natural language. QA harness framework for web & mobile.

Hey Reddit,
I am the creator of agent-qa.

AI has accelerated development, letting devs build products at lightning speed. But the confidence that the product actually works isn't there. Coding agents can write tests on their own, but they greedily write tests just to make them pass.

The intention behind agent-qa is to provide an AI-native solution to E2E testing.
I have used Playwright as the kernel that executes planned actions in the QA harness.

Looking forward to feedback.

GitHub - https://github.com/vostride/agent-qa
Consider giving it a ⭐
Thanks!

Demo - vostride.com/

github.com
u/IndianITCell — 13 hours ago
▲ 6 r/Appium+7 crossposts

Introducing agent-qa: Open-source AI end-to-end testing for web and mobile apps.

Hey, I am the creator of agent-qa.

AI has accelerated development, letting devs build products at lightning speed. But the confidence that the product actually works isn't there. Coding agents can write tests on their own, but they greedily write tests just to make them pass.

The intention behind agent-qa is to provide an AI-native solution to E2E testing.

vostride.com
u/IndianITCell — 13 hours ago