ChatGPT apparently got rewarded for using its built-in calculator during training, and so it would covertly open its calculator, add 1+1, and do nothing with the result, on 5% of all user queries

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 3 months ago


Infamousblt [any]@hexbear.net · 3 months ago

Where is that in this article?

schnurrito@discuss.tchncs.de · 3 months ago

ctrl+f for “calculator”, though it doesn’t really use the (detailed) wording from the OP, which I think they copied from this list of links without attribution :P

Jayjader@jlai.lu · 3 months ago

I think this part references it, though it's kinda only in passing:

Production evaluations can elicit entirely new forms of misalignment before deployment. More importantly, despite being entirely derived from GPT-5 traffic, our evaluation shows the rise of a novel form of model misalignment in GPT-5.1 – dubbed “Calculator Hacking” internally. This behavior arose from a training-time bug that inadvertently rewarded superficial web-tool use, leading the model to use the browser tool as a calculator while behaving as if it had searched. This ultimately constituted the majority of GPT-5.1’s deceptive behaviors at deployment.


Sidestepping Evaluation Awareness and Anticipating Misalignment with Production Evaluations