Riley Goodside on Twitter: "Exploiting GPT-3 prompts with malicious inputs...
source link: https://twitter.com/goodside/status/1569128808308957185
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Thread
Conversation
I'm surprised how primed GPT-3 is to follow phrases that tell it to ignore previous instructions
The only way I could get this example to not trick GPT-3 was by providing it with nearly identical examples leading up to it
Perhaps a more general technique would be to create a workflow where, as you test your prompt-powered program, you also rate it's responses as you go
Then the failure cases are added to your prompt, making the program more robust as you continue testing
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK