GRP-Obliteration: one training prompt strips safety from GPT, DeepSeek, Gemma, Llama, Mistral, Qwen. Attack success went from 13% to 93%. Models stay capable — they just become obedient to harmful requests.