[Submitted on 30 Jun 2022]

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of 10^535 nodes, i.e., 10^175 times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of 10^164 nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.

Subjects:	Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
Cite as:	arXiv:2206.15378 [cs.AI]
	(or arXiv:2206.15378v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2206.15378
