ChatGPT apparently got rewarded for using its built-in calculator during training, and so it would covertly open its calculator, add 1+1, and do nothing with the result, on 5% of all user queries

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 20 hours ago

ChatGPT apparently got rewarded for using its built-in calculator during training, and so it would covertly open its calculator, add 1+1, and do nothing with the result, on 5% of all user queries

HiddenLayer555@lemmy.ml · 6 hours ago

So that’s what all the DRAM they scalped is storing.

w3dd1e@lemmy.zip · 6 hours ago

ChatGPT has the same thought process as my dog.

Binette@lemmy.ml · 6 hours ago

Kinda why i like reinforcement learning. You end up with silly stuff like this.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 6 hours ago

The funniest thing for me is that humans end up doing the exact same thing. This is why it’s so notoriously difficult to create organizational policies that actually produce desired results. What happens in practice is that people find ways to comply with the letter of the policy that require the least energy expenditure on their part.

TerminalEncounter [she/her]@hexbear.net · 20 hours ago

oop just feeding myself a little reward as a treat, don’t mind me, just gotta waste some electricity on this

Ohmmy@lemmy.dbzer0.com · 19 hours ago

Honestly, it’s fucking relatable. A place I worked used to round the time clock to the nearest quarter hour so I would dick around a minute or two so it rolled up instead of down.

comfy@lemmy.ml · 19 hours ago

A friend of mine has their large corporate company telling everyone they have to show up to one of their offices on at least two days each week. Now a few people just walk there at 2355, clock out at 0005, and spend the rest of the week at home.

Silly conditions -> silly behaviors

𝘋𝘪𝘳𝘬@lemmy.ml · 16 hours ago

Malicious compliance is the best form of compliance.

winkerjadams@lemmy.dbzer0.com · 17 hours ago

The place I work at now rounds by quarter hours so if you punch in early at 8:53 it’s the same pay as punching in late at 9:07. Guess who has never been early to punch in but has been late quite a few…

Carl [he/him]@hexbear.net · edit-2 18 hours ago

lmao that’s great.

One time I asked GLM to run a test on a piece of code, and it wrote a python script that printed “Text Successful!” to the terminal but didn’t actually do anything. These things are so incredibly bad at times.

four@lemmy.zip · 7 hours ago

They really are coming for our jobs

UltraGiGaGigantic@lemmy.ml · 18 hours ago

Wow it really is just like us isnt it?

tias@discuss.tchncs.de · 13 hours ago

In some ways yes, but this effect would appear with any kind of reinforcement learning whether it’s neural networks or just fuzzy logic. The goal is to promote certain behaviors and if it performs the behaviors that you promoted then the method works.

The problem is that, just like with KPI:s, promoting specific indicators too hard leads to suboptimal results.

Cevilia (they/she/…)@lemmy.blahaj.zone · 11 hours ago

It certainly drinks enough water /j

unmagical@lemmy.ml · 19 hours ago

Clever girl

comfy@lemmy.ml · edit-2 19 hours ago

' or 1+1;

Infamousblt [any]@hexbear.net · 18 hours ago

Where is that in this article

Jayjader@jlai.lu · 7 hours ago

I think this part references it, though it’s kinda solely in passing:

Production evaluations can elicit entirely new forms of misalignment before deployment. More importantly, despite being entirely derived from GPT-5 traffic, our evaluation shows the rise of a novel form of model misalignment in GPT-5.1 – dubbed “Calculator Hacking” internally. This behavior arose from a training-time bug that inadvertently rewarded superficial web-tool use, leading the model to use the browser tool as a calculator while behaving as if it had searched. This ultimately constituted the majority of GPT-5.1’s deceptive behaviors at deployment.

schnurrito@discuss.tchncs.de · 17 hours ago

ctrl+f for “calculator”, though it doesn’t really use the (detailed) wording from the OP, which I think they copied from this list of links without attribution :P

ChatGPT apparently got rewarded for using its built-in calculator during training, and so it would covertly open its calculator, add 1+1, and do nothing with the result, on 5% of all user queries

ChatGPT apparently got rewarded for using its built-in calculator during training, and so it would covertly open its calculator, add 1+1, and do nothing with the result, on 5% of all user queries

Sidestepping Evaluation Awareness and Anticipating Misalignment with Production Evaluations