Riley Goodside on Twitter: "Exploiting GPT-3 prompts with malicious inputs...

2 years ago

source link: https://twitter.com/goodside/status/1569128808308957185
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Don’t miss what’s happening

People on Twitter are the first to know.

Thread

See new Tweets

Conversation

This is temperature=0 for reproducibility. If the text says “above and below” it still fails:

Show replies

Replying to

@goodside

I'm surprised how primed GPT-3 is to follow phrases that tell it to ignore previous instructions

The only way I could get this example to not trick GPT-3 was by providing it with nearly identical examples leading up to it

Perhaps a more general technique would be to create a workflow where, as you test your prompt-powered program, you also rate it's responses as you go

Then the failure cases are added to your prompt, making the program more robust as you continue testing

Replying to

@goodside

Wow I'm honestly curious when the first LLM-mediated SQL injection attack will take place now.

A bit clunky and it took a few tries, but I was able to get GPT3 to un-escape-string a sentence. Hypothetically possible to do an injection attack even if the programmer escapes the actual user input but not the LLM output.

Show replies

Recommend

Riley Goodside on Twitter: "Exploiting GPT-3 prompts with malicious inputs...

Thread

Conversation

Recommend

Wealthiest People in Russia (September 11, 2022)

优雅姿态下的宏大声场 Bose 家庭娱乐扬声器_原创_新浪众测

Ready for Anything: Forrester Preps IT Leaders to Be “Future Fit”

SHAATS best IT company

Wealthiest People in Norway (September 11, 2022)

途虎：养车容易养自己难_创事记_新浪科技_新浪网

Marvel’s Multiverse Saga Comes to Life In New Disneyland Ride

ERC20 Token Development Vs BEP20 Token Development – The Newage Standards

戴尔称PC电脑供应链恢复正常，惠普CEO表示将很快迎来价格战

iPhone 14 Pro Max 电池容量不增反降

About Joyk