Is anyone testing detectors against newer open-source models?
Open-source language models evolve rapidly and vary widely in style. Detectors may not be trained against those outputs yet. Is testing keeping pace with the ecosystem?
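For concreteness, here's the kind of harness I mean, as a minimal sketch. It assumes you've saved a folder of generations from whatever newer open models you care about; the detector endpoint, the `ai_probability` response field, and the 0.5 flag threshold are all placeholders I invented, not any real product's API.

```python
# Flag rate on known machine-generated text: one .txt file per generation,
# saved from whatever newer open-source models you want to cover. The
# detector endpoint, response field, and threshold are hypothetical.
from pathlib import Path

import requests

DETECTOR_URL = "https://example-detector.test/v1/score"  # hypothetical
SAMPLES_DIR = Path("samples/open_source_models")         # one .txt per sample

def score(text: str) -> float:
    """Ask the (hypothetical) detector for an AI-likelihood score in [0, 1]."""
    resp = requests.post(DETECTOR_URL, json={"text": text}, timeout=30)
    resp.raise_for_status()
    return resp.json()["ai_probability"]  # assumed response field

def main() -> None:
    paths = sorted(SAMPLES_DIR.glob("*.txt"))
    flagged = 0
    for path in paths:
        s = score(path.read_text())
        flagged += s >= 0.5  # assumed flag threshold
        print(f"{path.name}: {s:.2f}")
    print(f"flag rate: {flagged}/{len(paths)}")

if __name__ == "__main__":
    main()
```

Run the same loop over human-written controls too; the miss rate on machine text matters, but the false positive rate on real student writing is the number that should decide whether the tool gets used at all.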
If schools are going to use AI detection at all, there should be a clear written policy stating which tool is used, its known error rate, how scores factor into decisions, and what the appeal process looks like. Almost none of that exists anywhere right now.
Ran a few poems through some detectors out of curiosity and the results were all over the place. Whitman scored high, a haiku scored low, a spoken word piece was flagged heavily. Poetry has always broken rules; apparently that's suspicious now.
Better detection tools aren't the answer if schools still don't know how to respond to a flag responsibly. What's needed is clearer policy, proper investigation processes and genuine fairness. The tool is only one piece of a broken system.
Medical students write in precise clinical language because accuracy is literally a matter of life and death in their field. That same precision is what gets them flagged for AI. The tool doesn't understand context at all.
Tested about eight different detectors over the past month and not one of them gave consistent scores across repeated runs on the same text. You'd think by now someone would have cracked this. Apparently not even close.
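If anyone wants to reproduce the consistency problem, the check is easy to script. A minimal sketch, assuming a detector with an HTTP scoring endpoint; the URL and the `ai_probability` response field are placeholders I made up, not any real tool's API.

```python
# Resubmit the same text several times and look at the spread of the scores.
# The endpoint and response shape are hypothetical stand-ins; point this at
# whatever detector you are actually testing.
import statistics

import requests

DETECTOR_URL = "https://example-detector.test/v1/score"  # hypothetical

def score(text: str) -> float:
    """Ask the (hypothetical) detector for an AI-likelihood score in [0, 1]."""
    resp = requests.post(DETECTOR_URL, json={"text": text}, timeout=30)
    resp.raise_for_status()
    return resp.json()["ai_probability"]  # assumed response field

def consistency_check(text: str, runs: int = 5) -> None:
    """Print the spread of scores for one text across repeated submissions."""
    scores = [score(text) for _ in range(runs)]
    print("scores:", " ".join(f"{s:.2f}" for s in scores))
    print(f"spread: min={min(scores):.2f} max={max(scores):.2f} "
          f"stdev={statistics.stdev(scores):.3f}")
```

A standard deviation that's large relative to whatever threshold triggers a flag is a pretty direct argument that a single score shouldn't decide anything.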
Longer essays give a detector more text to work with, so scores tend to stabilize; the shortest submissions are where the wildest swings seem to show up.
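Easy enough to probe, at least roughly: score progressively longer prefixes of the same essay and watch how the number moves. Same caveat as the other sketches in this thread: the endpoint and `ai_probability` field are hypothetical placeholders.

```python
# Length sweep: score growing prefixes of one essay to see whether short
# excerpts produce noisier or more extreme detector scores.
import requests

DETECTOR_URL = "https://example-detector.test/v1/score"  # hypothetical

def score(text: str) -> float:
    """Ask the (hypothetical) detector for an AI-likelihood score in [0, 1]."""
    resp = requests.post(DETECTOR_URL, json={"text": text}, timeout=30)
    resp.raise_for_status()
    return resp.json()["ai_probability"]  # assumed response field

def length_sweep(essay: str) -> None:
    """Score 100-, 250-, 500-, and 1000-word prefixes of one essay."""
    words = essay.split()
    for n in (100, 250, 500, 1000):
        if n > len(words):
            break
        print(f"{n:>5} words -> score {score(' '.join(words[:n])):.2f}")
```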
The training data used for a detector likely shapes what it can recognize. If that data predates a newer model, or leaves out a genre like poetry or clinical writing, its scores on that material probably aren't worth much.