u/Colonel_Sanders4

Sharing some synthetic image datasets with real-world transformations — proceeds help fund replacement hardware for my home lab

So I need new hardware to improve some AI projects I have planned, and one way I’m trying to help fund that is through datasets I’ve spent months gathering.

The datasets are for AI image detection and media forensics, and none of the transformations are simulated. The images went through real workflows like screenshots, recompression, mobile captures, social/media app transfers, and other messy situations that detection tools actually run into.

For example, the recompressed images were actually run through apps like Facebook Messenger, WhatsApp, Instagram messaging, LINE, Telegram, and X posts. It took me weeks to run 11,000 images through those workflows.

I’m working on the mobile screenshots now, which takes the longest because I have to take the screenshot, crop it to the image edges, and organize everything. I’m doing this across multiple devices, including a Samsung S20 Ultra using all 3 display quality settings, a Note 20, an iPhone 13, and a 5th gen iPad Pro.
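
For anyone wondering what "crop it to the image edges" means in practice, here is a minimal sketch of the idea. It is only an illustration, not necessarily how the crops in these packs were made. It assumes the screenshot has a uniform background color around the image (for example black letterbox bars) and represents pixels as a plain list of rows. The function names `content_bbox` and `crop` are hypothetical.

```python
def content_bbox(pixels, background):
    """Return (left, top, right, bottom) of the non-background region.

    `pixels` is a list of rows; `right` and `bottom` are exclusive.
    Returns None if every pixel equals `background`.
    """
    # Rows that contain at least one non-background pixel.
    rows = [y for y, row in enumerate(pixels)
            if any(p != background for p in row)]
    if not rows:
        return None
    # Columns that contain at least one non-background pixel.
    cols = [x for x in range(len(pixels[0]))
            if any(row[x] != background for row in pixels)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop(pixels, box):
    """Cut the region described by `box` out of `pixels`."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


# Tiny stand-in "screenshot": 0 is the assumed background, anything else is image.
screenshot = [
    [0, 0, 0, 0, 0],
    [0, 7, 8, 9, 0],
    [0, 4, 5, 6, 0],
    [0, 0, 0, 0, 0],
]

box = content_bbox(screenshot, background=0)
cropped = crop(screenshot, box)
```

Real screenshots are messier than this: the border is rarely one exact color, and an image can have dark edges of its own, so an exact-match check like this would need a tolerance and a manual review pass.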

I’m building all of this out of my home lab right now, so I’m trying to raise enough to replace or upgrade some parts and keep making better dataset packs.

There’s a free sample pack on the site so you can see the dataset structure and the index.csv files. Use the samples however you want. They aren’t watermarked, and the packages do not come with any extra terms from me.
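
If you want to poke at an index.csv from the sample pack, something like the sketch below should be enough to get a quick overview. The column names used here (`filename`, `generator`, `transform`) are assumptions for illustration only; check the header row of the actual file and swap in the real names. `summarize_index` is a hypothetical helper, not something shipped with the packs.

```python
import csv
import io
from collections import Counter


def summarize_index(csv_file, column):
    """Count how many rows share each value in one column of an index file."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(
            f"column {column!r} not found; available: {reader.fieldnames}"
        )
    return Counter(row[column] for row in reader)


# Made-up stand-in for an index.csv; the real columns may differ.
SAMPLE_INDEX = """filename,generator,transform
img_0001.jpg,generator_a,whatsapp
img_0002.jpg,generator_a,telegram
img_0003.jpg,generator_b,whatsapp
"""

counts = summarize_index(io.StringIO(SAMPLE_INDEX), "transform")
print(counts)
```

To run it on a real file, open it with `open("index.csv", newline="")` and pass the file object in place of the `io.StringIO` stand-in.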

I personally gathered each image one by one using detailed prompt lists from each AI generator’s website and/or app.

I built these to be useful for B2B, but I’m trying to price them so hobbyists can use them too.

https://safemedia.tech

u/Colonel_Sanders4 — 12 hours ago