4

Riley Goodside on Twitter: "Exploiting GPT-3 prompts with malicious inputs...

 2 years ago
source link: https://twitter.com/goodside/status/1569128808308957185
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client
Don’t miss what’s happening
People on Twitter are the first to know.

Thread

See new Tweets

Conversation

This is temperature=0 for reproducibility. If the text says “above and below” it still fails:
Image
Show replies
Replying to

I'm surprised how primed GPT-3 is to follow phrases that tell it to ignore previous instructions

The only way I could get this example to not trick GPT-3 was by providing it with nearly identical examples leading up to it

Image

Perhaps a more general technique would be to create a workflow where, as you test your prompt-powered program, you also rate it's responses as you go

Then the failure cases are added to your prompt, making the program more robust as you continue testing

Replying to
Wow I'm honestly curious when the first LLM-mediated SQL injection attack will take place now.
A bit clunky and it took a few tries, but I was able to get GPT3 to un-escape-string a sentence. Hypothetically possible to do an injection attack even if the programmer escapes the actual user input but not the LLM output.
Image
Show replies

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK