OpenAI Five: Goals and Progress

How OpenAI Five works

OpenAI Five is a team of five artificial neural networks. You can think of them as simulated “brains” that our team has designed to be well-shaped for learning Dota but that start with no knowledge. OpenAI Five sees the world as a list of 20,000 numbers which encode the visible game state, and chooses an action by emitting a list of 8 numbers. The OpenAI team writes the code which maps between game state or actions and these lists of numbers. Once trained, these neural networks are creatures of pure instinct: they implement memory but do not otherwise learn further. They play as a team, but we do not design special communication structures; we only provide them with an incentive.
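To make the "lists of numbers" idea concrete, here is a minimal sketch of what such a mapping layer could look like. Only the two sizes (20,000 and 8) come from the text above; the function names, the index-based feature layout, and the action field names are hypothetical and are not OpenAI's actual encoding.

```python
from typing import Dict, List

OBSERVATION_SIZE = 20_000  # length of the state vector, as stated in the text
ACTION_SIZE = 8            # length of the action vector, as stated in the text


def encode_state(features: Dict[int, float]) -> List[float]:
    """Write known feature values at fixed indices; all other slots stay 0.0.

    The index-to-meaning assignment is an assumption made for illustration.
    """
    vector = [0.0] * OBSERVATION_SIZE
    for index, value in features.items():
        if not 0 <= index < OBSERVATION_SIZE:
            raise ValueError(f"feature index {index} is out of range")
        vector[index] = float(value)
    return vector


def decode_action(numbers: List[int], field_names: List[str]) -> Dict[str, int]:
    """Attach a (hypothetical) name to each of the 8 emitted numbers."""
    if len(numbers) != ACTION_SIZE or len(field_names) != ACTION_SIZE:
        raise ValueError(f"expected exactly {ACTION_SIZE} numbers and names")
    return dict(zip(field_names, numbers))


# Illustrative usage with made-up field names.
HYPOTHETICAL_FIELDS = [f"choice_{i}" for i in range(ACTION_SIZE)]
observation = encode_state({0: 0.75, 19_999: 1.0})
action = decode_action([3, 0, 1, 4, 0, 2, 0, 5], HYPOTHETICAL_FIELDS)
```

The network itself only ever sees `observation` and only ever produces the raw numbers passed to `decode_action`; everything that gives those numbers meaning lives in hand-written code like this.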

Training

OpenAI Five’s neural networks start out with random parameters and use our general-purpose training system, Rapid, to learn better parameters. Rapid has OpenAI Five play copies of itself, generating 180 years of gameplay data each day across tens of thousands of simultaneous games, consuming 128,000 CPU cores and 256 GPUs. At each game frame, Rapid computes a numeric reward which is positive when something good has happened (e.g. an allied hero gained experience) and negative when something bad has happened (e.g. an allied hero was killed). Rapid then applies our Proximal Policy Optimization algorithm to update the parameters of the neural network, making actions which occurred soon before positive reward more likely and those soon before negative reward less likely.
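The two ideas in this paragraph, crediting actions that came shortly before a reward and nudging the policy toward them, can be sketched in a few lines. This is an illustrative toy, not Rapid's code: the discount factor, the clipping range of 0.2, and the function names are assumptions, and a plain discounted return stands in for whatever advantage estimate the real system uses. The clipped term itself follows the standard published form of the Proximal Policy Optimization objective.

```python
import math
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """For each frame, sum the rewards that follow it, weighting later ones less.

    Frames just before a positive reward get a large positive return, and
    frames just before a negative reward get a large negative one.
    """
    returns: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def ppo_clipped_objective(new_logp: float, old_logp: float,
                          advantage: float, clip_eps: float = 0.2) -> float:
    """Clipped surrogate objective for one action (to be maximized).

    ratio > 1 means the updated policy makes the action more likely than the
    policy that collected the data; clipping limits how much credit a single
    update can take for moving the ratio away from 1.
    """
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Three frames: nothing, nothing, then an allied hero gains experience (+1).
frame_returns = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
```

With `gamma=0.5`, the frame immediately before the reward is credited twice as much as the one before it, which is the "soon before" effect described above.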

Takeaway

Just like humans don’t plan out their muscle movements while planning out their day, the community (OpenAI included) had expected long-term planning to require algorithms which handle short-term and long-term plans separately—perhaps via a hierarchical reinforcement learning breakthrough. But despite its very simple underlying algorithm, OpenAI Five learns professional-level strategies from scratch—no human data provided.
