Is Anthropic using Claude's quirks as watermarks?
Genuine question here, and something that occurred to me as a possibility. All models have language tics or quirks: "You're absolutely right!", "Load bearing", "That's not nothing". And in some ways I find those quite strange, because it could be anything. It's a language model. What is so special about these snippets of language that the AI latches onto them? I doubt very much that "Load bearing" appeared astronomically more often in the training data than any other phrase.
And I also thought about the other companies that are distilling Claude, like the "hack" with Chinese accounts pulling Claude conversations to train their models. Is Anthropic using these verbal signals to prove distillation? I know I've seen some of these Chinese models display Claude-isms in chat. To me, it's kind of like a watermark: "this is from Claude," "Claude was used to train this."
Thoughts?