
GitHub has a serious fake engagement problem, and I wanted to see how visible it actually is through the public API. After going down that rabbit hole, it's worse than I thought.
Turns out: very visible. Yesterday's scan found 185 out of 185 engagers on a single repo were bots. Not 90%. Not "mostly suspicious". Every single one. The repo had zero legitimate stars.
What I built
phantomstars is a Python tool that runs daily via GitHub Actions (free, no servers):
- Scrapes GitHub Trending and searches for repos created in the last 7 days with sudden star spikes
- Pulls star and fork events from the last 24 hours per repo
- Bulk-fetches every engager's profile via the GraphQL API (account creation date, follower counts, repo history)
- Scores each account on a weighted model: account age (35%), profile completeness (30%), repo patterns (25%), activity history (10%)
- Detects coordinated campaigns using timestamp clustering and union-find: groups of 4+ suspicious accounts that engaged within a 3-hour window
- Files an issue directly on the targeted repo so the maintainer knows what's happening
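To make the scoring and campaign-detection steps above concrete, here is a minimal sketch, not the actual phantomstars code. The weights and the 4-account / 3-hour parameters come from the list above; everything else is an assumption for illustration: the signal names as dict keys, the idea that each signal is already normalised to 0.0 (looks real) through 1.0 (looks fake), and the rule that two accounts are linked when they hit the same repo within the window of each other.

```python
from collections import defaultdict
from datetime import timedelta

# Weights as stated in the post; the signal key names are hypothetical.
WEIGHTS = {
    "account_age": 0.35,
    "profile_completeness": 0.30,
    "repo_patterns": 0.25,
    "activity_history": 0.10,
}


def composite_score(signals):
    """Weighted sum of per-signal suspicion values, each clamped to [0, 1]."""
    return sum(
        weight * min(max(signals.get(name, 0.0), 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )


def find_campaigns(events, window=timedelta(hours=3), min_size=4):
    """Group already-suspicious accounts that engaged with a repo close in time.

    events: iterable of (login, repo, timestamp) tuples.
    Returns a sorted list of sorted login lists, one per campaign.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    by_repo = defaultdict(list)
    for login, repo, ts in events:
        find(login)  # register the account even if it never gets linked
        by_repo[repo].append((ts, login))

    # After sorting by time, linking neighbours that fall within the window
    # yields the same groups as linking every pair within the window.
    for items in by_repo.values():
        items.sort()
        for (t_prev, a), (t_next, b) in zip(items, items[1:]):
            if t_next - t_prev <= window:
                union(a, b)

    groups = defaultdict(set)
    for login in list(parent):
        groups[find(login)].add(login)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)
```

One consequence of linking transitively like this is that a long chain of closely spaced events can merge into one group spanning more than 3 hours in total. Whether that is desirable depends on how strictly "within a 3-hour window" is meant.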
Campaign IDs are deterministic SHA-256 fingerprints of the sorted member set, so the same group of bots gets the same ID across runs. You can track a farm across multiple days even as individual accounts get suspended.
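A fingerprint along those lines could look like the sketch below. The SHA-256-over-sorted-members idea is from the paragraph above; the lowercasing, the newline separator, and truncating to 16 hex characters are my assumptions for the example and may not match what the repo does.

```python
import hashlib


def campaign_id(logins, length=16):
    """Deterministic ID for a set of accounts, independent of input order."""
    # Deduplicate and lowercase (GitHub logins are case-insensitive), then sort
    # so the same member set always serialises to the same bytes.
    canonical = "\n".join(sorted({login.lower() for login in logins}))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
```

A hash of the exact member set only repeats when the membership is identical, so this sketch on its own would produce a new ID if an account dropped out of the group between runs. How membership changes are handled is not covered here.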
What the pattern actually looks like
It's remarkably consistent. Here's what a fake engagement campaign looks like in the raw data:
- 40-200 accounts, all created within the same 1-2 week window
- Zero original repositories, or only forks they never touched
- No bio, no location, no followers, no following
- All of them starring the same repo within a 90-minute window
- The target repo usually has a name implying it's a tool, hack, executor, or generator
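As a rough illustration of how mechanical that pattern is, the checklist above can be written as a strict all-or-nothing test. This is a sketch only: the real tool scores accounts probabilistically rather than with hard rules, the dict keys (`created_at`, `starred_at`, `original_repos`, and so on) are hypothetical, and the repo-name heuristic is left out.

```python
from datetime import timedelta


def matches_farm_pattern(
    accounts,
    min_accounts=40,
    max_creation_spread=timedelta(days=14),
    max_star_spread=timedelta(minutes=90),
):
    """True if a group of accounts fits every point of the checklist."""
    if len(accounts) < min_accounts:
        return False

    created = [a["created_at"] for a in accounts]
    starred = [a["starred_at"] for a in accounts]

    # Empty profile: no bio, no location, no social graph, no original repos.
    empty_profiles = all(
        not a.get("bio")
        and not a.get("location")
        and a.get("followers", 0) == 0
        and a.get("following", 0) == 0
        and a.get("original_repos", 0) == 0
        for a in accounts
    )

    return (
        empty_profiles
        and max(created) - min(created) <= max_creation_spread
        and max(starred) - min(starred) <= max_star_spread
    )
```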
Today's scan: 53 active campaigns across 3,560 accounts profiled. 798 classified as likely_fake. The repos being targeted are mostly low-quality AI tools and "executor" software that needs manufactured credibility fast.
Notifying the affected repo
When a repo hits a 40%+ fake engagement ratio or a campaign is detected, phantomstars opens an issue on that repo with the full suspect table: account logins, creation dates, composite scores, campaign membership. The maintainer sees it in their own issue tracker without having to find this project first.
Worth noting: a lot of these repos have issues disabled, which is a red flag on its own. Those get skipped silently.
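The notification rule described above boils down to a small decision function. The sketch below assumes the ratio is likely-fake engagers divided by all profiled engagers for that repo; the function and parameter names are made up for the example.

```python
def fake_ratio(total_engagers, likely_fake):
    """Share of engagers classified as likely fake (0.0 when nobody engaged)."""
    return likely_fake / total_engagers if total_engagers else 0.0


def should_open_issue(
    total_engagers,
    likely_fake,
    campaign_detected,
    issues_enabled,
    ratio_threshold=0.40,
):
    """Notify when the ratio hits the threshold or a campaign was detected."""
    if not issues_enabled:
        return False  # repos with issues disabled are skipped silently
    return campaign_detected or fake_ratio(total_engagers, likely_fake) >= ratio_threshold
```

For the 185-out-of-185 repo mentioned earlier, `should_open_issue(185, 185, False, True)` returns `True` on the ratio alone.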
Why I built this
Stars are how developers decide what to evaluate, what to depend on, what to recommend. When that signal is bought, it affects real decisions downstream. This started as curiosity about how measurable the problem was. It turned out to be more measurable than I expected.
It's part of broader research into AI slop distribution at JS Labs: https://labs.jamessawyer.co.uk/ai-slop-intelligence-dashboards/
The fake engagement problem and the AI content quality problem are really the same problem. Fake stars are the distribution layer that gets garbage in front of real users.
All open source. The data is append-only JSONL committed back to the repo after every run, queryable with jq.
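If you'd rather poke at the data from Python than jq, JSONL is just one JSON object per line. A minimal reader is sketched below; the field name you pass in (for example a `classification` field holding values like `likely_fake`) is an assumption, so check the README for the real schema and file paths.

```python
import json
from collections import Counter


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by(records, field):
    """Tally records by the value of one field; missing values count as 'unknown'."""
    return Counter(record.get(field, "unknown") for record in records)
```

Because the files are append-only, streaming them line by line like this also works on a file that is still growing between runs.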
Repo: https://github.com/tg12/phantomstars
Findings are probabilistic and false positives exist; the README explains the full scoring model. If your account shows up and you're a real person, there's a false positive process.
Questions welcome on the detection approach, GraphQL batching, or campaign ID stability.