u/corporate925

do you feel like you're losing your actual testing instinct because of AI

i can still read a suite and understand what's happening just fine, and i'm confident making decisions around coverage strategy, risk areas, what needs exploratory attention versus what's safe to automate

but lately i've been relying heavily on tools like drizz or testim to actually write the entire end to end test logic, to the point where i barely write assertions from scratch anymore. describe the flow, tool produces the script, i review and push...

but i'm wondering: are we becoming worse testers in terms of actual instinct? especially juniors who are starting with these tools already in place, are they even developing the foundational thinking anymore?

how are you handling this??

reddit.com
u/corporate925 — 3 days ago
▲ 31 r/antiai+1 crossposts

i work in testing and my team replaced genuine testing instinct with AI tooling.

i'm a QA engineer at a corporate setup. one tester, multiple markets, and a pipeline that needs proof of passing runs across three execution platforms before anything merges. sometime around february the team decided the repetitive overhead was too high and brought in AI tooling (drizz, testim, copilot) to absorb it

on the surface it worked exactly as advertised. regression coverage went up. sprint velocity improved. the number of automated test cases in the suite nearly doubled in two months. management looked at the dashboard and saw green

what the dashboard doesn't show is that nobody fully understands what half those tests are actually verifying anymore

the assertions were generated fast. the flows were mapped by tooling that has no concept of what the product is supposed to do for a real user. tests were written against implementation detail instead of behaviour because the AI had no way of knowing the difference and nobody slowed down long enough to catch it. the suite grew and the collective comprehension of what the suite meant quietly shrank in the opposite direction
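to make the implementation-vs-behaviour thing concrete, here's the distinction in miniature. toy python, every name and number invented for illustration:

```python
# toy function, invented for illustration. the point is where the
# assertion attaches, not the feature itself.
def apply_discount(price, code):
    # internal detail: an inline rate lookup that could be refactored
    # into a table, a config file, a service call...
    rate = 0.10 if code == "SAVE10" else 0.0
    return round(price * (1 - rate), 2)

# behaviour-level assertions: they survive any refactor that keeps the
# user-visible output identical
assert apply_discount(100.0, "SAVE10") == 90.0
assert apply_discount(100.0, "OTHER") == 100.0

# an implementation-level test would instead assert on the intermediate
# rate (or some internal state), and break the moment the internals move
# even though every user-visible price stays the same
```

generated suites tend to accumulate the second kind, because the tooling can only see the internals, not the intent.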

the junior testers who came in after the tooling was already in place have almost no debugging instinct. they can prompt. they cannot tell you why a flaky test is flaky or what an assertion being too tightly coupled to internal state actually means for regression confidence. that understanding is supposed to come from writing tests badly first and learning from it. the tooling skipped that entire phase and called it efficiency

when something fails in production now the investigation takes longer than it used to. not because the bugs are more complex but because the test that should have caught it was generated by something that approximated coverage without understanding the risk surface

the velocity numbers are real. the sprint metrics are green. and i genuinely cannot tell you with confidence whether the next release is safe to ship or whether we have just built an elaborate system for feeling like we can

that gap between appearance and reality is the part nobody is measuring and nobody wants to talk about because the dashboard looks fine

u/corporate925 — 3 days ago
▲ 7 r/scrum+1 crossposts

tracked 3 months of my own PR failures. the test suite is blocking me in ways nobody else can see

around january my commit pace started dropping. not because features got harder, but because i was spending more time getting PRs through the gate than actually developing. so i went back and tracked the past three months: 30+ PR failures across my own commits. the reasons weren't what i expected

genuine regressions were the minority. the majority split across three patterns: flaky locators tied to DOM attributes that shift between deployments, environment-specific failures from configuration drift between staging and rollout that nobody formally documented, and tests asserting against implementation details rather than behaviour. that last one is the worst. refactored a transformation module in february: cleaner logic, identical output, four tests failed because they were coupled to intermediate state that no longer existed. the feature worked but the suite disagreed
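the flaky-locator pattern is mechanical enough that you could even lint for it. a rough sketch (the heuristic and the xpaths are made up, python):

```python
import re

def is_position_coupled(xpath: str) -> bool:
    """rough heuristic: flag xpaths that select by numeric position
    (e.g. div[3]) instead of by a stable attribute the team owns."""
    return bool(re.search(r"\w+\[\d+\]", xpath))

# tied to layout: shifts whenever the DOM shifts between deployments
assert is_position_coupled("//div[3]/div[2]/button[1]")

# tied to intent: survives layout changes
assert not is_position_coupled("//button[@data-testid='submit-order']")
```

a check like this obviously won't catch everything, but running it over a generated suite would at least surface the locators most likely to go flaky first.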

a lot of these tests were written under automation pressure. the team needed coverage numbers up, the sprint had a TC automation quota, so tests got written fast. no time to think properly about selector strategy, assertion design, or whether the test was actually verifying behaviour versus internal structure. the suite grew, the metrics looked healthy, and the underlying fragility got baked in quietly

that's what i've been committing against for three months

the invisibility of it is what actually gets to me. sprint metrics don't capture time spent re-running pipelines or diagnosing flaky failures. from the outside my velocity looked low while the suite looked green. those two things were directly connected and nobody was looking at that relationship

started logging failure reasons instead of just counts. flaky infrastructure, environment drift, wrong assertion target, genuine regression. each one has a completely different fix and collapsing them all into a single failure metric is how this stays invisible for months
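the logging itself doesn't need to be fancy. a minimal sketch of the idea (python, test names and sample data invented):

```python
from collections import Counter
from enum import Enum

class Reason(Enum):
    FLAKY_INFRA = "flaky infrastructure"
    ENV_DRIFT = "environment drift"
    WRONG_ASSERTION = "wrong assertion target"
    REGRESSION = "genuine regression"

failures = []

def log_failure(test_name, reason):
    """record why a test failed, not just that it failed"""
    failures.append((test_name, reason))

# hypothetical sprint data
log_failure("checkout_e2e", Reason.FLAKY_INFRA)
log_failure("payment_schema", Reason.WRONG_ASSERTION)
log_failure("payment_schema", Reason.WRONG_ASSERTION)
log_failure("login_redirect", Reason.ENV_DRIFT)

by_reason = Counter(reason for _, reason in failures)
# a single "4 failures" metric hides that each bucket needs a
# completely different fix (infra, config, test design, code)
```

even a spreadsheet with those four categories would beat a raw failure count.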

i'm not sure what the fix looks like at the team level yet

u/corporate925 — 4 days ago

31 api integration test cases plus hiring interviews in the same sprint… my project is short staffed and it shows

this sprint started with API integration testing and development: 31+ service-to-service validations, payload schema checks, status code assertions, making sure the response contracts between upstream and downstream systems weren't drifting. postman collections across staging and rollout, edge case coverage on error handling flows, boundary testing on the data transformation logic. the kind of work that needs uninterrupted mental context, because a missed field in a nested JSON response is exactly the thing that surfaces two sprints later looking like a completely unrelated production issue
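the contract-drift part of this is sketchable. everything below (field names, shapes) is invented and a real suite would use a schema library, but this is the shape of the check:

```python
# expected contract for a parsed JSON response, names invented
EXPECTED_CONTRACT = {
    "order": {"id": int, "status": str, "totals": {"net": float, "tax": float}},
}

def check_contract(payload, contract, path=""):
    """return a list of contract violations: missing keys, type drift"""
    issues = []
    for key, expected in contract.items():
        here = f"{path}.{key}" if path else key
        if key not in payload:
            issues.append(f"missing field: {here}")
        elif isinstance(expected, dict):
            if not isinstance(payload[key], dict):
                issues.append(f"type drift at {here}: expected object")
            else:
                issues.extend(check_contract(payload[key], expected, here))
        elif not isinstance(payload[key], expected):
            issues.append(f"type drift at {here}: got {type(payload[key]).__name__}")
    return issues

# the nested missed field this kind of check exists to catch
response = {"order": {"id": 7, "status": "PAID", "totals": {"net": 99.5}}}
assert check_contract(response, EXPECTED_CONTRACT) == ["missing field: order.totals.tax"]
```

the recursion matters: a flat key check would pass this payload and the missing `tax` field would surface two sprints later, exactly as described above.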

midway through the sprint the project flagged a resourcing gap. short staffed because several project members are about to be released, the new joiner pipeline needed to move, and a QA perspective was needed in the hiring loop. so alongside the integration testing i'm now evaluating candidates: reviewing test design approach, probing automation architecture decisions, assessing regression strategy thinking. two completely different cognitive loads running in parallel on the same day

the context switching cost is hard to quantify but it's real. API testing requires holding the entire service contract in working memory: request structure, expected response schema, interdependencies across endpoints. breaking that for a 45 minute technical interview and returning to it is not a clean resume. you are cold-starting on a warm problem every single time

the hiring work is also not lightweight. prepping the scenario, running the session, debriefing after: close to two hours per candidate. the project needs people because the current workload is unsustainable, which means the short staffed problem is directly generating additional work while we're actively trying to solve it

what kept the automation side from slipping entirely was offloading the repetitive assertion scripting to tooling. been running browserstack and drizz for the response validation logic and schema checks on recurring endpoint patterns, and testim for some of the UI-side regression that was piling up alongside this. the boilerplate that used to eat a full afternoon is getting handled faster, which created just enough headroom to keep the integration coverage from falling behind while the interviews were running in parallel

without this buffer something would have had to give: either the API coverage would've been incomplete or the hiring rounds would've been rushed. neither was acceptable given where the project is right now

u/corporate925 — 7 days ago

what 21 automated test cases actually looked like this sprint

so i just want to document what this sprint actually looked like because i need someone else to understand what happened here

it started with 21 test cases. sounds fine right. 21 is not a lot. i've done more in a sprint without it being a problem. except these 21 had to cover 5 different markets and nobody, at any point in the history of this project, thought to standardise the locators across them

same component with the same functionality in five markets, but each with completely different xpaths. not slightly different, not even close enough to handle with a variable. actually different. so what looks like 21 test cases on paper is really 21 scripts written five times over with market-specific logic, because that's what the codebase required. that's where the first chunk of time went
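for what it's worth, the standardisation i wish existed is basically a per-market locator registry: one logical name per element, resolved per market, so the script logic stays single-sourced. market codes and xpaths below are invented:

```python
# one logical element, one xpath per market (all values invented)
LOCATORS = {
    "submit_button": {
        "us": "//button[@id='btn-submit']",
        "uk": "//div[@class='actions']//button[1]",
        "de": "//button[contains(@class,'absenden')]",
    },
}

def locator(name: str, market: str) -> str:
    """resolve a logical locator name for a given market, failing
    loudly instead of silently falling through to a wrong market"""
    try:
        return LOCATORS[name][market]
    except KeyError:
        raise KeyError(f"no locator '{name}' registered for market '{market}'")

assert locator("submit_button", "de") == "//button[contains(@class,'absenden')]"
```

the test script then calls `locator("submit_button", market)` once, and the five-markets-times-two-environments matrix stops multiplying the script count.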

then there are the environments. staging and rollout both need to pass before anything moves forward, and because the locator inconsistency exists across markets it also exists across environments. so you're not just running the same suite twice, you're validating that each variation holds in both places

and then there's the process to actually get the code merged. first you run everything locally in vscode and collect proof it passes. then you push to jenkins and run the same TCs across all markets in both environments and collect proof again. then you do the same run in browserstack and collect that proof too. all of that goes to the mentors and the merge only happens if they approve every single piece of it

the jenkins runs alone take long enough that i started context switching to other tasks just to not sit there watching it. browserstack after that is another full cycle on top of it. and after all of that the merge is still in review

none of this complexity came from the test cases being hard. the logic is straightforward, the scenarios are clear. the entire overhead came from basic structural decisions made before i joined the project that nobody went back to fix. locator standardisation across five markets would have made this a normal sprint instead of what i had to face

21 TCs is a reasonable ask. 21 TCs with this setup is a different number entirely and it was never scoped that way

u/corporate925 — 8 days ago

new to QA, got assigned GTM dataLayer test cases for the first time… this is how i'm approaching it

april release was straightforward for me. pure manual, functional checks, nothing i couldn't handle by just using the product carefully

july release is different. still have the manual TCs but now i have also been assigned 10 test cases that involve validating events in the browser console using GTMDataObjects. first time dealing with anything like this

the way i've broken it down: for the GTM side i'm not jumping straight into full automation. the plan is to open the browser console, trigger the event manually, and verify the dataLayer is pushing the right object with the right keys and values. once i understand what the correct output looks like consistently, i'll script the assertions around that. the manual step is intentional: it's to understand the data shape before i write anything

the GTMDataObjects give a structured way to verify what's actually being pushed to the dataLayer without having to dig through raw console output every time which helps
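once the manual pass has shown what a correct push looks like, the scripted assertion layer can stay small. a sketch assuming the dataLayer object has already been captured (e.g. via `driver.execute_script("return window.dataLayer")` in selenium, or copied out of the console); the event and key names are invented:

```python
def validate_event(push: dict, expected_keys: dict) -> list:
    """compare one captured dataLayer object against expected keys.
    a value of None means 'key must exist, any value is fine'."""
    problems = []
    for key, expected in expected_keys.items():
        if key not in push:
            problems.append(f"missing key: {key}")
        elif expected is not None and push[key] != expected:
            problems.append(f"{key}: expected {expected!r}, got {push[key]!r}")
    return problems

# hypothetical captured push for an add-to-cart event
captured = {"event": "add_to_cart", "item_id": "SKU-1", "value": 19.99}
assert validate_event(captured, {"event": "add_to_cart", "item_id": None}) == []
```

returning a list of problems instead of failing on the first one makes the console-validation pass faster: you see every wrong key in one run instead of fixing them one at a time.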

the split between manual TCs and these console ones is a bit awkward to manage in parallel but i'm treating them as two separate workflows rather than trying to merge them. manual runs first to catch functional issues, then console validation as a separate pass

not sure if this is the most efficient approach or if i'm overcomplicating the sequencing. would be useful to hear from anyone who's done GTM validation work before especially around how you structured the assertions

u/corporate925 — 10 days ago