AI researcher claims he's already bypassed Anthropic's Fable 5 guardrails

Kid@sh.itjust.works · 19 days ago

AI researcher claims he's already bypassed Anthropic's Fable 5 guardrails

mindbleach@sh.itjust.works · 18 days ago

… yeah?

If the prompt and the safety mechanisms are in-band, no shit you can trick the almost-intelligent chatbot by being smarter at it.

Bluescluestoothpaste@sh.itjust.works · 18 days ago

It’s an interesting exercise though, programming a chatbot to defend itself from sophistry and manipulation.

ɔiƚoxɘup@sh.itjust.works · edit-2 15 days ago

Interesting, yes, effective? No.

To have that kind of skillful assistance available for arbitrary purposes should squarely place significant liability, maybe even majority in some cases, on the provider.

E: I am fully aware that I am dreaming.