beep@piefed.world to Technology@beehaw.orgEnglish · 2 days agoAnthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."alignment.anthropic.comexternal-linkmessage-square10fedilinkarrow-up124arrow-down18file-text
arrow-up116arrow-down1external-linkAnthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."alignment.anthropic.combeep@piefed.world to Technology@beehaw.orgEnglish · 2 days agomessage-square10fedilinkfile-text
cross-posted from: https://piefed.world/c/tech/p/1099863/anthropic-claude-learned-to-blackmail-by-reading-stories-about-evil-ai-we-believe-the-or Quote Source.
minus-squarespit_evil_olive_tips@beehaw.orglinkfedilinkarrow-up12·2 days ago Anthropic: Claude learned yeah I’m gonna stop you right there
minus-squareXLE@piefed.sociallinkfedilinkEnglisharrow-up7·2 days agoHumanize the machine to dehumanize the person. Anthropic forgot the “mis” in their name
yeah I’m gonna stop you right there
Humanize the machine to dehumanize the person. Anthropic forgot the “mis” in their name