Researchers at Cisco tested several well-known LLMs. They found of them could be tricked into bypassing guardrails, just through conversational prompts.
No shit, at this point. They’re inscrutable probabilistic chatbots and the “guardrails” are mostly in-band communication.
And these are tactics that occasionally work against humans. ‘We can neither confirm nor deny there was a warhead.’ How big was this warhead? ‘Three megatons.’ Big surprise that you can outsmart a program which people insist is not intelligent in any form.
This is why feeding in public data is fine - like, if Googling hard enough could turn up Geocities instructions for making meth, then that’s not really a secret - but any use of private information is a data breach with more steps.