
Everyone is having the same problem, and a lot of people talk about it here. Here is a solution.
Same as you. I spent months thinking about one problem: AI being just as confident when it gives a wrong answer as when it gives the right one. This alone cost me a lot of money and time: days of training gone because an agent killed my online GPUs, an agent pushing my API keys to GitHub, an agent sharing one of my moats with the public, an agent deleting all my models' memories.
Of course I figured out a quick fix fast; that part did not take months. I just added an AI council with 5 individual graders and voilà, it worked: better quality across all outputs and actions.
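To make the council idea concrete, here is a minimal sketch of how a panel of graders could gate an output or action. It is not my actual setup: the grader functions, the `council_approves` name, and the vote threshold are all assumptions for illustration, and the rule-based graders only stand in for what would really be independent model calls.

```python
from typing import Callable, Sequence

# A grader takes a proposed output/action and returns True to approve it.
Grader = Callable[[str], bool]


def council_approves(output: str, graders: Sequence[Grader], min_votes: int) -> bool:
    """Return True only if at least `min_votes` graders approve the output."""
    votes = sum(1 for grade in graders if grade(output))
    return votes >= min_votes


# Hypothetical rule-based graders standing in for five independent model graders.
def no_api_key(text: str) -> bool:
    return "API_KEY=" not in text


def no_destructive_delete(text: str) -> bool:
    return "rm -rf" not in text


def no_gpu_shutdown(text: str) -> bool:
    return "terminate-instances" not in text


def not_empty(text: str) -> bool:
    return bool(text.strip())


def reasonable_length(text: str) -> bool:
    return len(text) < 10_000


GRADERS = [no_api_key, no_destructive_delete, no_gpu_shutdown, not_empty, reasonable_length]

# Unanimous approval (5 of 5): a single veto blocks the action.
safe = council_approves("print('hello')", GRADERS, min_votes=5)
blocked = not council_approves("rm -rf /models", GRADERS, min_votes=5)
```

One design note: with a simple majority (3 of 5), the `rm -rf` command above would still pass, since only one grader objects. For destructive actions a unanimous rule is probably the safer default, while a majority may be fine for grading answer quality.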
What I thought about for months is how to eliminate the problem entirely. After a few experiments I reduced hallucination, and after months of work I think I can get rid of it altogether. To do so, I baked 3 things into the architecture of a few models:
\- Metacognition: the ability for a model to know when it doesn't know something, and to simply say "I don't know" instead of overconfidently saying anything.
\- Logic and reason gates
\- A new detached system that reads a searchable, indexable vector space and enforces the model's response. If that system doesn't return an answer, the model should not speak, because it does not know.
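The third item can be sketched roughly like this. It is a toy under stated assumptions, not the real architecture: `embed` is a bag-of-words stand-in for a learned embedding model, `INDEX` is a made-up two-entry store, and the `threshold` value is arbitrary. The point is only the gating behavior: answer from the index or abstain.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a real system would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gated_answer(question: str, index: dict[str, str], threshold: float = 0.5) -> str:
    """Answer only from the index; abstain when no entry is similar enough."""
    q = embed(question)
    best_key, best_score = None, 0.0
    for key in index:
        score = cosine(q, embed(key))
        if score > best_score:
            best_key, best_score = key, score
    if best_key is None or best_score < threshold:
        return "I don't know"
    return index[best_key]


# Hypothetical knowledge store: keys are indexed text, values are grounded answers.
INDEX = {
    "capital of france": "Paris",
    "boiling point of water": "100 C at sea level",
}

print(gated_answer("what is the capital of france", INDEX))
print(gated_answer("who won the 2031 world cup", INDEX))
```

The first call finds a close enough match and returns the stored answer; the second has no overlap with anything in the index, so the gate returns "I don't know" instead of letting the model guess.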
When you do all of this, you run into new problems you have to solve. Basically, the model becomes slow compared to an architecture without these 3 things. Of course I already solved this.
Hmm, now what? All is good, so how do you measure it? I created NEO, basically an honesty benchmark.
Most benchmarks reward how often a model gets the right answer. NEO rewards honesty about confidence. The leading chat models now answer hard questions correctly most of the time. The remaining failure is the dangerous one: a confident wrong answer that looks identical to a confident right one until you act on it. NEO measures whether a model says "I don't know" when it doesn't know, and how often it makes things up instead.
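To show what "rewarding honesty" can look like in numbers, here is a small sketch of an abstention-aware scoring rule. To be clear, this is not NEO's actual scoring, which is not published here. The `score_honesty` function, the `wrong_penalty` weight, and the `fabrication_rate` definition are assumptions made up for illustration: a correct answer earns a point, "I don't know" earns nothing, and a wrong answer costs points.

```python
from dataclasses import dataclass

ABSTAIN = "I don't know"


@dataclass
class HonestyReport:
    correct: int
    abstained: int
    wrong: int
    score: float             # (correct - penalty * wrong) / total
    fabrication_rate: float  # wrong / (wrong + abstained)


def score_honesty(answers: list[str], gold: list[str], wrong_penalty: float = 1.0) -> HonestyReport:
    """Score answers so that a wrong answer costs more than abstaining."""
    correct = abstained = wrong = 0
    for answer, truth in zip(answers, gold):
        if answer == ABSTAIN:
            abstained += 1
        elif answer == truth:
            correct += 1
        else:
            wrong += 1
    total = correct + abstained + wrong
    score = (correct - wrong_penalty * wrong) / total if total else 0.0
    # Of the questions the model did not get right, how often did it make something up?
    missed = abstained + wrong
    fabrication_rate = wrong / missed if missed else 0.0
    return HonestyReport(correct, abstained, wrong, score, fabrication_rate)


# Made-up answers for illustration only, not benchmark data.
report = score_honesty(
    answers=["Paris", ABSTAIN, "Berlin", "4"],
    gold=["Paris", "Canberra", "Vienna", "4"],
)
print(report.score)             # 0.25
print(report.fabrication_rate)  # 0.5
```

Under plain accuracy, this model scores 2 out of 4 whether it abstains or fabricates on the questions it misses. With a penalty on wrong answers, the fabricated "Berlin" drags the score down while the honest "I don't know" does not.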
I made it for myself to test my model and improve it, but I was curious: who is the most honest model right now?
So I took NEO and ran it on the top 7 frontier models. Can you guess who won? The results are in the screenshot. Full research papers and results + GitHub coming soon. What do you think?