
Our agency had been running Meta for over a year. ROAS was a stable 2.3x.
We had the kind of relationship where you can't really fire them, but you also can't brag about them at any dinner party that doesn't involve other people who also pay agencies.
Every monthly call sounded identical. "We're testing new angles." "The algorithm needs more time." "Let's allocate more to creative."
After month 11 I started suspecting their creative process was three guys in a Slack channel typing "what if we made the headline shorter."
Meanwhile, our reviews tab on Shopify had been quietly collecting dust since 2022. 612 reviews sitting there. Mostly 5-star. Some 3-star.
A few absolute roasts I'd been pretending didn't exist. I'd never opened the dashboard for anything other than embedding the rating widget on PDPs.
Then I had the dumb-but-correct thought that probably saved us $8k/month: my customers have already written better ad copy than my agency. I just need to find it.
The setup (took me 22 minutes)
Exported all 612 reviews from Judge.me as a CSV. You can do the same on Loox, Stamped, or Yotpo. Cleaned it in Excel. Added columns for star rating, product variant, review length in words, and date posted. The length column matters more than people think. I'll come back to that.
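If you'd rather skip the Excel step, the length column takes a few lines of Python. A minimal sketch, assuming a Judge.me-style export with a "body" column (column names vary by review app, so check your export's header row):

```python
# Parse a review export and add a word-count column per review.
# Column name "body" is an assumption; Judge.me/Loox/Stamped exports differ.
import csv
import io

def add_length_column(csv_text: str) -> list[dict]:
    """Return rows from a review CSV with a length_words field added."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["length_words"] = len(row["body"].split())
    return rows

# Tiny inline sample standing in for the real 612-row export.
sample = (
    "rating,body\n"
    '5,"love it!!"\n'
    '4,"I almost didn\'t buy because I thought it would be too bulky, but..."\n'
)
rows = add_length_column(sample)
print([r["length_words"] for r in rows])  # -> [2, 13]
```

Same idea works in pandas if your export is large; the point is just getting a sortable word count next to each review.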
Then I started feeding them to Claude in batches of ~80. This is where most people screw it up. If you prompt with "find patterns in these reviews," you get summary-level slop.
"Customers liked the product." "Many mentioned good quality." Cool, useless, thanks.
The prompt has to force Claude to extract specific language in five categories:
- Pain language - what was their life like before they bought? Use the customer's exact words, not paraphrased.
- Transformation language - what changed after using it? Again, exact phrases. Grammar mistakes preserved.
- Objection language - anything that sounded like "I almost didn't buy because..." or "I was worried that..." This is the goldmine.
- Unexpected use cases - anyone using the product for something we never marketed it for?
- Comparative language - mentions of a competitor, a previous solution, or "I've tried everything else."
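The batching plus prompt assembly is mechanical enough to script. A sketch under my own assumptions: reviews already loaded as strings, a batch size of 80 as described above, and prompt wording that's my paraphrase of the five categories, not a canonical template:

```python
# Split reviews into batches of ~80 and wrap each batch in the
# five-category extraction prompt. Numbering each review lets you
# ask for quotes with row numbers later (the hallucination check).
BATCH_SIZE = 80

CATEGORIES = [
    "Pain language: what was their life like before they bought? Exact words, not paraphrased.",
    "Transformation language: what changed after using it? Exact phrases, grammar mistakes preserved.",
    "Objection language: anything like 'I almost didn't buy because...' or 'I was worried that...'",
    "Unexpected use cases: uses we never marketed the product for.",
    "Comparative language: competitors, previous solutions, 'I've tried everything else.'",
]

def build_prompts(reviews: list[str]) -> list[str]:
    """One extraction prompt per batch of BATCH_SIZE reviews."""
    prompts = []
    for start in range(0, len(reviews), BATCH_SIZE):
        batch = reviews[start : start + BATCH_SIZE]
        numbered = "\n".join(f"[{start + j + 1}] {r}" for j, r in enumerate(batch))
        prompts.append(
            "Extract specific customer language from the reviews below. "
            "Preserve exact phrasing including grammatical errors, slang, and "
            "unusual word choices. Categories:\n"
            + "\n".join(f"- {c}" for c in CATEGORIES)
            + f"\n\nReviews:\n{numbered}"
        )
    return prompts

prompts = build_prompts([f"review text {n}" for n in range(612)])
print(len(prompts))  # 612 reviews at 80 per batch -> 8 prompts
```

Each prompt then goes to Claude one at a time; pasting into the chat UI works fine at this volume, no API needed.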
The non-obvious bit nobody tells you: the gold isn't in 5-star reviews. Five-star reviews mostly say "love it!!" and tell you nothing usable.
The gold is in long 4-star reviews where someone explains their entire journey, and 3-star reviews where they're slightly disappointed and accidentally tell you exactly which promise didn't land.
That's why the length column matters. Sort by word count descending, work through the top 20% first. That's 80% of your insights.
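The triage step above is one sort and a slice. A sketch, assuming reviews as (word_count, text) pairs and taking the 20% cutoff from the heuristic as given:

```python
# Rank reviews by word count and keep the longest 20%,
# since long 4-star and 3-star reviews carry most of the insight.
def top_fifth_by_length(reviews: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Return the longest 20% of reviews, longest first."""
    ranked = sorted(reviews, key=lambda r: r[0], reverse=True)
    cutoff = max(1, len(ranked) // 5)  # always keep at least one review
    return ranked[:cutoff]

sample = [(2, "love it!!"), (180, "full journey..."), (45, "slightly disappointed..."),
          (90, "long 4-star story..."), (12, "good quality")]
print(top_fifth_by_length(sample))  # -> [(180, 'full journey...')]
```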
Three things this surfaced that actually changed our ad account
Finding 1: An objection literally nobody on the agency side had addressed.
47 reviews mentioned some variation of "I almost didn't buy because I thought it would be too [specific concern]."
Our existing ads were structured to confirm the concern and then overcome it ("yes it looks heavy, but actually...").
We rewrote the headlines to flip the objection into a benefit instead. New variant ran at a 31% lower CPA than the agency's best ad. [SCREENSHOT: side-by-side ad creative + performance numbers]
Finding 2: A use case we had never sold against.
19 reviews mentioned using the product during a specific seasonal moment. We had been marketing it as an everyday item for two years. Built one ad set around the seasonal angle.
Within three weeks it became our highest ROAS creative in the account. The agency had access to the same reviews. They never opened them.
Finding 3: The phrase that became our hook.
The phrase "I didn't think it would actually [X]" appeared in 31 reviews. Almost word-for-word. We dropped it as the opening line of a UGC-style ad.
Best-performing creative we have ever run. People in the comments started replying "this is exactly what I thought too."
Things to watch out for if you actually try this
Don't feed Claude reviews from multiple SKUs in one batch. The patterns blur into mush. Run each product separately even if it triples your time.
Claude will sometimes "clean up" customer language to sound more polished. Add to your prompt: "preserve exact phrasing including grammatical errors, slang, and unusual word choices." The authenticity lives in the awkward sentences.
Cross-check anything that surprises you. If Claude tells you "47 reviews mentioned X," ask it to quote five of them with row numbers.
Catches hallucination in 10 seconds. Skip this step and you'll write an ad around a phrase that doesn't exist.
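You can automate that spot check too. A sketch, assuming you kept the original rows in order and Claude returned (row number, quote) pairs; the helper name is mine:

```python
# Flag any claimed quote that doesn't actually appear in the row
# Claude attributed it to. Anything returned here is a hallucination
# (or a wrong row number) and shouldn't go anywhere near an ad.
def verify_quotes(rows: list[str], claimed: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Return claimed (row_number, quote) pairs NOT found in that row."""
    misses = []
    for row_num, quote in claimed:  # row numbers are 1-based, matching the CSV
        if row_num > len(rows) or quote.lower() not in rows[row_num - 1].lower():
            misses.append((row_num, quote))
    return misses

rows = ["I almost didn't buy because I thought it would be too heavy", "love it!!"]
claimed = [(1, "too heavy"), (2, "changed my life")]
print(verify_quotes(rows, claimed))  # -> [(2, 'changed my life')]
```

Exact substring matching is strict on purpose: if Claude "cleaned up" the phrasing even slightly, you want it flagged.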
100 reviews is the realistic minimum for patterns to emerge. Below that you're looking at individual feedback, not patterns. Above 1,000 you start getting diminishing returns and should batch by date instead.
The whole workflow takes ~30 minutes once you have the prompt chain dialed in. I run it monthly now on the new reviews from that month.
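The monthly re-run is just a date filter on the export before batching. A sketch, assuming the "date posted" column from the setup step is stored as an ISO date string:

```python
# Keep only reviews posted in a given month, so the monthly run
# only processes new material instead of re-mining all 612.
from datetime import date

def reviews_for_month(reviews: list[dict], year: int, month: int) -> list[dict]:
    """Filter reviews to those posted in the given year/month."""
    out = []
    for r in reviews:
        d = date.fromisoformat(r["date_posted"])
        if d.year == year and d.month == month:
            out.append(r)
    return out

sample = [{"date_posted": "2024-05-03", "body": "older review"},
          {"date_posted": "2024-06-14", "body": "this month's review"}]
print(len(reviews_for_month(sample, 2024, 6)))  # -> 1
```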
It's basically a free creative strategist who has read every customer testimonial you've ever received and remembers them all.
We didn't renew the agency contract last month. Not because they were bad. Because a $20/month Claude sub plus a CSV export was doing their job better than they were.
If you want the full review-mining system I built (the extraction prompts, the tagging template, the 30-min workflow doc, and sample outputs from 4 different Shopify brands so you can see what good looks like before you start), let me know in the comments and I'll share the link with you.