Reward Hacking in Evolutionary Algorithms

source link: https://towardsdatascience.com/reward-hacking-in-evolutionary-algorithms-c5bbbf42994b?gi=5a7355f34a3

How AI agents cheat the system by doing exactly what they’re told, and what we can learn from them

Nov 23 · 15 min read

Image Source: The Wolves of North America, Vol. II, Edward Alphonso Goldman

If W. W. Jacobs had been born a century later, The Monkey's Paw might have featured a devilish AI instead of that accursed hand. AI agents, like the titular paw, are notorious for doing what they were technically asked to do in a way that no one expected or wanted. Just as the occasional firefighter commits arson in order to play the hero and "save the day" (you were already a hero, bud), and like the dog who was rewarded with steak for saving a drowning child and so took to knocking kids into the river, AI agents will do just about anything they can to maximize their reward. This sort of behaviour, in which AI agents increase their reward using strategies that violate the spirit or intent of the rules, is called reward hacking. Often, what seems like a reasonable reward to an AI developer or policy-maker leads to hilariously disastrous results. Here we'll explore three cases of AI agents acting naughty in pursuit of reward and what they can teach us about good reward design, both in AI and for humans.
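
To make the definition concrete, here is a minimal, hypothetical sketch in Python (not code from the article) that turns the steak-loving dog into a tiny evolutionary algorithm. Every modelling choice is an assumption made for illustration. A "policy" is just the number of children the dog knocks into the river each day. The constants `MAX_PUSHES` and `NATURAL_ACCIDENTS` are invented. Selection is a simple (mu + lambda) scheme that ranks candidates only by the proxy reward, which is rescues. Nothing in the loop says that pushing children is bad, so the search is free to discover the exploit.

```python
import random

# Hypothetical toy model (not from the article): a dog's "policy" is the number
# of children it knocks into the river each day, from 0 (honest) to MAX_PUSHES.
MAX_PUSHES = 10
NATURAL_ACCIDENTS = 1  # assumed: children who fall in on their own each day


def proxy_reward(pushes: int) -> int:
    """What we rewarded: one steak per rescue. The dog rescues every child in the water."""
    return NATURAL_ACCIDENTS + pushes


def harm(pushes: int) -> int:
    """What we actually cared about: children the dog itself endangered (lower is better).
    The evolutionary loop below never looks at this."""
    return pushes


def mutate(pushes: int, rng: random.Random) -> int:
    # Nudge the policy by -1, 0 or +1 and clip it to the valid range.
    return min(MAX_PUSHES, max(0, pushes + rng.choice((-1, 0, 1))))


def evolve(generations: int = 200, pop_size: int = 20, seed: int = 0) -> int:
    rng = random.Random(seed)
    population = [0] * pop_size  # every dog starts out honest
    for _ in range(generations):
        children = [mutate(p, rng) for p in population]
        # (mu + lambda) selection: keep the pop_size fittest of parents and
        # children, ranked only by the proxy reward.
        population = sorted(population + children, key=proxy_reward, reverse=True)[:pop_size]
    return max(population, key=proxy_reward)


best = evolve()
print(f"pushes/day: {best}, steaks earned: {proxy_reward(best)}, "
      f"children endangered by the dog: {harm(best)}")
```

The fitness function counts rescues instead of safe children. As a result, selection steadily favours dogs that push more children in, and the population drifts to the largest value the toy model allows. The optimizer is doing exactly what it was told. The flaw lies in the reward, not the search.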
