u/Cerru905

▲ 13 r/AIsafety+2 crossposts

AI safety evals should account for test-time compute

Many AI safety evaluations test whether a model is safe under a fixed, limited evaluation budget. Real adversaries with an economic incentive may spend far more test-time compute, and spend it more adaptively.

I go into more detail in this article, where I argue that safety claims should be “budget-labeled”: https://huggingface.co/blog/Cerru02/safety-evals-should-project-ttc
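
To make the budget point concrete, here is a rough toy sketch (not taken from the article). It assumes each attack attempt succeeds independently with the same probability, which is a strong simplification, since an adaptive attacker would not follow this curve exactly. The function names and the numbers are hypothetical, picked only for illustration.

```python
from typing import Dict, Iterable


def attack_success_at_budget(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` tries succeeds.

    Assumption (illustrative only): tries are independent and each one
    succeeds with the same probability `p_single`.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # P(at least one success) = 1 - P(every try fails)
    return 1.0 - (1.0 - p_single) ** attempts


def budget_labeled_claim(p_single: float, budgets: Iterable[int]) -> Dict[int, float]:
    """Report attack success per budget instead of a single number."""
    return {k: attack_success_at_budget(p_single, k) for k in budgets}


if __name__ == "__main__":
    # Hypothetical per-attempt success rate and attacker budgets.
    for k, rate in budget_labeled_claim(0.001, [1, 100, 10_000]).items():
        print(f"budget = {k:>6} attempts -> attack success ~ {rate:.4f}")
```

Under these assumptions, a model that looks safe at a budget of one attempt can look very different at a budget of thousands, so a claim like "0.1% attack success" only means something when the budget is stated next to it.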

Curious to hear what you guys think.

u/Cerru905 — 3 days ago