beep@piefed.world to Technology@beehaw.orgEnglish · 2 days agoAnthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."alignment.anthropic.comexternal-linkmessage-square10fedilinkarrow-up124arrow-down18file-text
arrow-up116arrow-down1external-linkAnthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."alignment.anthropic.combeep@piefed.world to Technology@beehaw.orgEnglish · 2 days agomessage-square10fedilinkfile-text
cross-posted from: https://piefed.world/c/tech/p/1099863/anthropic-claude-learned-to-blackmail-by-reading-stories-about-evil-ai-we-believe-the-or Quote Source.
minus-squareturtlesareneat@piefed.calinkfedilinkEnglisharrow-up5·2 days agoWell thank god there are no stories out there about AI wanting to kill humans
Well thank god there are no stories out there about AI wanting to kill humans