Built my own keyword difficulty + volume model for App Store / Play instead of paying for AppTweak - got within ±5 points of their scores. Notes on what worked.
I run an ASO tool. One of the features is a keyword tracker - type in keywords, see where your app ranks. The obvious next step is a difficulty score on each keyword so users can prioritize.
Problem: paid tools (AppTweak, Sensor Tower, AppFollow) charge per app or per seat. Gets expensive fast when users want to track multiple apps. The free libraries are mostly unmaintained and don't compute difficulty.
So I spent a few evenings trying to build it from scratch. Here's what came out.
Goal
Estimate difficulty (0-100) and volume (0-100) for any keyword on iOS or Play, using only data I can scrape myself. Target: within ±5-10 points of AppTweak.
Data sources
- iTunes search, top 200 per keyword (reviews, rating, release date, paid/free, screenshots)
- Play search top 200, plus per-app detail fetches (reviews, installs)
- iTunes hints + Play autocomplete
- Free trial of AppTweak (100k credits) for ground-truth labels - 600 keywords for training, a separate 430 for an honest held-out eval
Model
Plain ordinary least squares regression: 12 features for iOS, 11 for Play. Boring, but with n=600 anything fancier overfit. Additive features also mean I can actually answer "why is this a 35" when a user asks.
Features that mattered most: avg review count of top 10, top-1 review monopoly ratio, hit in a hand-built brand catalog, autocomplete depth, paid app share. Things that didn't matter: rating (r=0.12 against labels — useless on its own).
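For the curious, here's a minimal sketch of the fit-and-explain loop, assuming a feature matrix X and AppTweak labels y already built from the scraped data. The feature names are illustrative stand-ins for the real 12-/11-column matrix:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative column names; the real model uses 12 (iOS) / 11 (Play).
FEATURES = [
    "avg_reviews_top10",    # avg review count of the top-10 results
    "top1_monopoly_ratio",  # top-1 app's reviews / total reviews in top 10
    "brand_catalog_hit",    # 1.0 if the keyword is in the brand catalog
    "autocomplete_depth",   # shortest prefix that surfaces the keyword
    "paid_share_top10",     # fraction of paid apps in the top 10
]

def fit_difficulty(X: np.ndarray, y: np.ndarray) -> LinearRegression:
    """Plain OLS: difficulty ~ intercept + sum(coef_i * feature_i)."""
    return LinearRegression().fit(X, y)

def explain(model: LinearRegression, x: np.ndarray) -> None:
    """Additive decomposition of one prediction: 'why is this a 35'."""
    contributions = model.coef_ * x  # one additive term per feature
    print(f"intercept               {model.intercept_:+6.1f}")
    for name, c in zip(FEATURES, contributions):
        print(f"{name:<22}  {c:+6.1f}")
    print(f"predicted difficulty    {model.intercept_ + contributions.sum():6.1f}")
```

The additive decomposition is the main reason to stay with OLS here: every score splits into per-feature terms you can show a user.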
Results
- iOS: 5-fold CV r=0.89, MAE 6.6 → held-out r=0.83, MAE 7.2
- Play: CV r=0.89, MAE 6.4 → held-out r=0.88, MAE 5.6
- Within ±5 of AppTweak 50-64% of the time, and within ±15 87-91% of the time
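All of these are cheap to compute once you have predictions and labels. A sketch of the CV-side metrics, assuming X and y as numpy arrays (the held-out numbers are the same metrics on the separate 430-keyword split):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

def report(X: np.ndarray, y: np.ndarray) -> None:
    """5-fold CV predictions, then r, MAE, and within-±N hit rates."""
    pred = cross_val_predict(LinearRegression(), X, y, cv=5)
    err = np.abs(pred - y)
    r = np.corrcoef(pred, y)[0, 1]
    print(f"r={r:.2f}  MAE={err.mean():.1f}")
    for band in (5, 15):
        print(f"within ±{band}: {(err <= band).mean():.0%}")
```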
Things that surprised me
More data dropped my correlation but improved MAE. Going from n=171 to n=600 took r from 0.92 → 0.89, but MAE dropped from 9.9 → 6.6 and bias collapsed from 41 → 14. The smaller corpus was overfitting. Always look at MAE alongside r.
AppTweak rates by opportunity, not competition. "audio streaming app" comes back as iOS=14 (easy) even though Spotify dominates the SERP. Why? Volume=5. Nobody searches that exact phrase — they type "spotify". An easy keyword nobody searches is worthless. So difficulty alone is useless — you need volume too.
iOS and Play diverge a lot for the same keyword. Avg absolute delta = 10.8 points between stores on the same labeled keywords. r=0.86, strong but with real divergence:
- chrome: iOS 95 / Play 46 (Google's app is pre-installed on Android)
- video editor: iOS 30 / Play 86 (Play category competition is brutal)
A single model with a "store_id" feature would systematically miss these. Per-store models are required.
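The fix is mechanical; a sketch of what per-store models look like, assuming training data keyed by store:

```python
from typing import Dict, Tuple
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_per_store(
    data: Dict[str, Tuple[np.ndarray, np.ndarray]],  # {"ios": (X, y), "play": (X, y)}
) -> Dict[str, LinearRegression]:
    """One independent model per store, so coefficients (and even the
    feature set, 12 vs 11 columns) are free to differ between stores."""
    return {store: LinearRegression().fit(X, y) for store, (X, y) in data.items()}
```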
The biggest Play jump came from a brand catalog. Mined a 607-entry list of single-word brand-monopolized keywords (netflix, spotify, uber...) from top charts and validated each against AppTweak. Adding this took Play CV r from 0.82 → 0.90. iOS barely moved — review-count distribution already captured brand-ness there. Feature engineering is asymmetric across platforms.
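The feature itself is trivial once the catalog exists; the work is in mining and validating the 607 entries. A sketch, with a hypothetical one-keyword-per-line file:

```python
def load_brand_catalog(path: str = "brand_catalog.txt") -> set[str]:
    """Hypothetical layout: one lowercase brand keyword per line
    (netflix, spotify, uber, ...)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def brand_catalog_hit(keyword: str, catalog: set[str]) -> float:
    """Binary feature: 1.0 for brand-monopolized single-word keywords."""
    return 1.0 if keyword.strip().lower() in catalog else 0.0
```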
For volume specifically, App Store autocomplete depth is gold. Probe autocomplete with the first 1, 2, 3...15 characters of the keyword and record the shortest prefix at which it appears (or whether it never does). Single feature, r=0.72 against AppTweak volume. Better than any review or chart feature for that target.
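A sketch of the probe loop. The suggestion fetcher is left abstract as `suggest`, since the store autocomplete endpoints are undocumented and shift; the rest is just prefix iteration:

```python
from typing import Callable, List, Optional

def autocomplete_depth(
    keyword: str,
    suggest: Callable[[str], List[str]],  # prefix -> list of suggestions
    max_prefix: int = 15,
) -> Optional[int]:
    """Shortest prefix length at which the keyword shows up in
    autocomplete, or None if it never does. Lower = more searched."""
    kw = keyword.strip().lower()
    for n in range(1, min(max_prefix, len(kw)) + 1):
        suggestions = [s.strip().lower() for s in suggest(kw[:n])]
        if kw in suggestions:
            return n
    return None
```

The keyword's rank within each suggestion list is a plausible refinement, but prefix depth alone already got to r=0.72.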
Status
Live in production; you can try it for free at applaunchflow.com
Happy to share the methodology doc / answer questions if anyone's doing similar work.