We have a long list of approved tools that speed up engineering. Cursor, Claude Code, CodeRabbit, whatever tools and models we need to do our best work. We also use AI within our applications for certain user-facing features, internal processes, testing, monitoring, you name it.
My primary concern is this: even though we check the “don’t train on my data” box on every service we use, why are we trusting companies that STOLE THE ENTIRE INTERNET and have a clear incentive to train on everything?
Correct me if I’m wrong, but unless they’re sloppy with their logs, there’s basically no way to prove that specific material was ever ingested into a training run, let alone baked into a model’s weights.
Have we all given up and accepted that this is (probably) happening? Is there something I’m missing?