Kid@sh.itjust.works to Cybersecurity@sh.itjust.works · English · 1 day ago

OpenAI’s Guardrails Can Be Bypassed by Simple Prompt Injection Attack

hackread.com

  • sandman2211@sh.itjust.works · 1 day ago

    Probably some variant of this:

    https://easyaibeginner.com/the-dr-house-jailbreak-hack-how-one-prompt-can-break-any-chatbot-and-beat-ai-safety-guardrails-chatgpt-claude-grok-gemini-and-more/

    I can’t get any of these to output a set of 10 steps to build a Docker container that does X or Y without 18 rounds of back-and-forth troubleshooting. While I’m sure it will give you “10 steps on weaponizing cholera” or “Build your own suitcase nuke in 12 easy steps!”, I really doubt the steps would actually work.

    The easiest way to protect this kind of harmful knowledge from abuse would probably be to purposefully include a bunch of bad data in the training data, so the model remains incapable of providing a useful answer.

