I'm running Reddit posts and comments through a classifier that needs to flag threads where someone is comparing or migrating between S3-compatible object storage providers. Read-only, local, Ollama-backed.
Why an LLM at all and not just keyword matching: I'm not after mentions, I'm after signals — comparisons, migrations, pain points, "we tried X and switched to Y" stories. A keyword grep gives you every thread that says "S3" in passing. What I want is community intelligence, and that needs a model that can read intent.
There are plenty of tools doing adjacent things, but most are lead-gen platforms behind a paywall and the framing is "find prospects to message". I wanted to explore the community-intelligence angle without that — read-only, no outreach, just signal extraction — so I decided to build it myself.
Started on phi3.5:latest (~2 GB) because it's fast and cheap. It kept returning YES on Kubernetes infra threads and Microsoft Fabric / Copilot / data-warehouse posts. The model was latching onto generic "comparing options / which should I pick" surface patterns and dropping the domain anchor.
My first fix attempt was to add an exclusion list to the prompt: "NO if the post is about Kubernetes; NO if it's about data warehouses;..." Three categories in, I noticed I was building a blocklist that would never end. Worse, the model started pattern-matching on the negative categories themselves - they became another flavor of relevance signal.
Then I inverted the approach. Instead of "NO if {long list of off-domain things}", the prompt became "YES only if {short positive list of in-domain anchors} AND {intent clause}; otherwise NO." No exclusions at all.
Sample prompt that worked well:

```
Answer YES only if the text explicitly names:
- S3, or an S3-compatible provider (AWS S3, MinIO, Ceph,
  Garage, SeaweedFS, Backblaze B2, Cloudflare R2, Wasabi, Storj),
- or a tool for moving data between them (rclone, s5cmd,
  mc mirror, AWS DataSync, Cyberduck, boto3, aws cli),
AND the author is comparing options or planning to migrate.
Otherwise answer NO.
Do not infer. If no such name appears, answer NO.
```
False positives dropped sharply on the same model. Same prompt shape transferred cleanly to larger models.
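For anyone wiring up something similar: here's a stripped-down sketch of how the classification call can look against Ollama's standard `/api/generate` endpoint. The helper names are just for this sketch; the fail-closed verdict parser (anything that doesn't start with YES counts as NO) is the part that did the work.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

PROMPT_TEMPLATE = """Answer YES only if the text explicitly names:
- S3, or an S3-compatible provider (AWS S3, MinIO, Ceph,
  Garage, SeaweedFS, Backblaze B2, Cloudflare R2, Wasabi, Storj),
- or a tool for moving data between them (rclone, s5cmd,
  mc mirror, AWS DataSync, Cyberduck, boto3, aws cli),
AND the author is comparing options or planning to migrate.
Otherwise answer NO.
Do not infer. If no such name appears, answer NO.

Text:
{text}
"""

def parse_verdict(raw: str) -> bool:
    # Fail closed: anything that isn't a clear YES is treated as NO.
    return raw.strip().upper().startswith("YES")

def classify(text: str, model: str = "phi4:14b") -> bool:
    payload = json.dumps({
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(text=text),
        "stream": False,  # get one complete JSON response, not a stream
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_verdict(json.load(resp)["response"])
```

Parsing strictly matters more than it looks: small models love to answer "Yes, because..." or hedge, and a loose substring check ("YES" anywhere in the reply) reintroduces false positives.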
Model journey:
* phi3.5 — too small to hold the domain anchor reliably even with the positive gate. Dropped.
* qwen2.5:7b — large step up. Fits in ~5 GB VRAM. Was good enough to experiment with the prompts.
* phi4:14b — settled here for production. Fast and accurate enough for the classification task. Worth the extra VRAM for my use case.

A side problem worth mentioning: Reddit's open RSS only gives you current posts, which isn't enough to tell whether the model actually works — you need historical data to evaluate against. So I needed to seed a dataset. Tried Google's and Bing's search APIs first; both have been shut down. Ended up with the Brave Search API — the free tier was enough to pull more than 30K seed posts and comments.
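The seeding step, stripped down (the `site:reddit.com` query pattern and helper names are simplified for illustration; the endpoint and `X-Subscription-Token` header come straight from Brave's API docs):

```python
import json
import urllib.parse
import urllib.request

BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"

def build_seed_query(term: str, offset: int = 0, count: int = 20) -> str:
    # Restrict results to Reddit threads that mention the term.
    params = {"q": f"site:reddit.com {term}", "count": count, "offset": offset}
    return f"{BRAVE_ENDPOINT}?{urllib.parse.urlencode(params)}"

def fetch_seeds(term: str, api_key: str, offset: int = 0) -> list:
    req = urllib.request.Request(
        build_seed_query(term, offset),
        headers={"X-Subscription-Token": api_key,
                 "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Web results live under "web" -> "results" in Brave's response.
    return data.get("web", {}).get("results", [])
```

Paging through a list of provider and tool names ("minio migration", "rclone b2", ...) with increasing offsets is what got the seed set past 30K records.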
LLM-only classification on a dataset that size would've taken days, so I put a Bayes pre-filter in front of the LLM. Its weights are bootstrapped from the LLM's own classifications; after that, most posts never touch the LLM at all, and it only sees the ambiguous tail. I pushed 33K Reddit records through the pipeline in under an hour on a single laptop GPU this way.
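The pre-filter is nothing fancy — here's a minimal sketch of the idea, not the exact production code (class name and threshold value are illustrative): a word-level Naive Bayes trained on LLM verdicts, with a log-odds margin deciding whether a post is confidently in, confidently out, or gets routed to the LLM.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9']+")

def tokenize(text: str) -> set:
    return set(TOKEN.findall(text.lower()))

class BayesPrefilter:
    """Naive Bayes trained on LLM verdicts; only ambiguous posts go to the LLM."""

    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold  # log-odds margin for a confident call
        self.pos, self.neg = Counter(), Counter()
        self.n_pos = self.n_neg = 0

    def learn(self, text: str, label: bool) -> None:
        # Count document frequency per token (presence, not raw counts).
        (self.pos if label else self.neg).update(tokenize(text))
        if label:
            self.n_pos += 1
        else:
            self.n_neg += 1

    def score(self, text: str) -> float:
        s = math.log((self.n_pos + 1) / (self.n_neg + 1))  # class prior
        for tok in tokenize(text):
            p = (self.pos[tok] + 1) / (self.n_pos + 2)  # Laplace smoothing
            q = (self.neg[tok] + 1) / (self.n_neg + 2)
            s += math.log(p / q)
        return s

    def route(self, text: str):
        """True/False when confident, None to hand the post to the LLM."""
        s = self.score(text)
        if s > self.threshold:
            return True
        if s < -self.threshold:
            return False
        return None
```

In the pipeline, every `None` from `route()` goes to the LLM, and the LLM's verdict is fed back through `learn()` — so the filter keeps absorbing the ambiguous tail over time.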
Happy to answer questions on prompt structure or the Bayes pre-filter setup.
(English isn't my first language - used an LLM to help with phrasing. The technical content and decisions are mine.)