Computer Out-Plays Humans in "Doom"

Deep Learning Key to Mastering Videogame's 3-D World

Byron Spice | Tuesday, September 27, 2016

An AI agent developed by two CMU computer science students outplayed both the game's built-in AI agents and human players in a recent Doom competition.

Kill or be killed is the essence of the classic video game Doom, and an artificial intelligence agent developed by two Carnegie Mellon University computer science students has proven to be the game's ultimate survivor — outplaying both the game's built-in AI agents and human players.

The students, Devendra Chaplot and Guillaume Lample, used deep-learning techniques to train the AI agent to negotiate the game's 3-D environment, still challenging after more than two decades because players must act based only on the portion of the game visible on the screen.

Their work builds on the groundbreaking research of Google's DeepMind, which used deep-learning methods to master two-dimensional Atari 2600 video games and, earlier this year, to defeat a world-class professional player in the board game Go. In contrast to the limited information Doom provides, both Atari games and Go give players a view of the entire playing field.

"The fact that their bot could actually compete with average human beings is impressive," said Ruslan Salakhutdinov, an associate professor of machine learning who was not involved in the student project. Simply navigating a 3-D world, much less competing successfully in this game environment, is a challenge for such AI agents, he noted.

Chaplot and Lample began attracting online attention for their work after posting a research paper, which they have submitted for review to a leading AI conference, along with YouTube videos of their game play that drew more than 100,000 views in the first three days. Then came the Visual Doom AI Competition, in which AI agents played against each other in deathmatches. On Sept. 22, organizers announced at the IEEE Computational Intelligence and Games Conference in Greece that the duo's agent had placed second in both tracks, behind a team from Facebook in one and a team from Intel in the other.

"We didn't train anything to kill humans," emphasized Chaplot, a master's degree student in the School of Computer Science's Language Technologies Institute. "We just trained it to play a game," albeit a game in which body counts are a way of keeping score. Moreover, the deep reinforcement learning techniques they used to teach their AI agent to play a virtual game might someday help self-driving cars operate safely on real-world streets and train robots to do a wide variety of tasks to help people, he noted.

Chaplot said humans have natural advantages in chasing and dodging enemies in Doom's 3-D world. The game's own built-in agents have to cheat, accessing maps and other game information, to be competitive. He and Lample, who recently finished his master's degree in the LTI, trained their AI agent, called Arnold, to play the game based only on what is visible on the screen, just like human players.

To do so, they combined several existing deep-learning techniques based on neural networks into an architecture of their own design. While the agent is navigating the game world, it employs a Deep Q-Network (DQN), the reinforcement-learning architecture DeepMind used to master Atari games. When an enemy comes into sight, the agent switches to a Deep Recurrent Q-Network (DRQN), which adds a long short-term memory (LSTM) module that helps the agent track the enemy's movements and predict where to shoot.
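The paper has the full details; what follows is only a minimal sketch of the two-network idea, written in PyTorch as an assumption on our part (the authors did not publish this code, and the names ConvFeatures, NavigationDQN and ActionDRQN, the layer sizes, and the 60x108 input resolution are all illustrative). A convolutional stack turns each screen frame into features, which feed either a feed-forward Q-head for navigation or an LSTM-backed Q-head for combat:

```python
import torch.nn as nn

class ConvFeatures(nn.Module):
    """Convolutional stack that turns a raw screen frame into a
    flat feature vector. All sizes here are illustrative."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten())

    def forward(self, frame):            # frame: (B, 3, 60, 108)
        return self.net(frame)           # -> (B, 4608)

class NavigationDQN(nn.Module):
    """Feed-forward Q-network used while exploring the map."""
    def __init__(self, feat_dim=4608, n_actions=8):
        super().__init__()
        self.features = ConvFeatures()
        self.q_head = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, n_actions))

    def forward(self, frame):
        return self.q_head(self.features(frame))   # one Q-value per action

class ActionDRQN(nn.Module):
    """Recurrent Q-network used in combat: an LSTM over per-frame
    features lets the agent track enemy movement across frames."""
    def __init__(self, feat_dim=4608, n_actions=8, hidden=512):
        super().__init__()
        self.features = ConvFeatures()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, frames, state=None):   # frames: (B, T, 3, 60, 108)
        b, t = frames.shape[:2]
        feats = self.features(frames.reshape(b * t, *frames.shape[2:]))
        out, state = self.lstm(feats.reshape(b, t, -1), state)
        return self.q_head(out), state        # Q-values per time step

def choose_action(nav_net, act_net, frames, enemy_visible, lstm_state):
    """Dispatch between the two networks. At test time,
    `enemy_visible` must come from a learned detector, since the
    agent sees only the screen."""
    if enemy_visible:
        q, lstm_state = act_net(frames, lstm_state)
        return int(q[0, -1].argmax()), lstm_state   # act on latest frame
    q = nav_net(frames[:, -1])                      # navigate on current frame
    return int(q[0].argmax()), lstm_state
```

At each step, a controller like choose_action picks whichever network matches the current phase of play, with the switch driven by whether an enemy is detected on screen.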

Though the AI agent relies only on visual information to play the game, Chaplot and Lample used an application programming interface (API) to access the game engine during training. This helped the agent learn to identify enemies and game pieces more quickly, Chaplot said. Without this aid, they found, the agent learned almost nothing in 50 hours of simulated game play, equivalent to more than 500 hours of computer time.
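One way to use such engine access, consistent with what the article describes, is auxiliary supervision: an extra output head is trained to predict whether an enemy is visible, using labels the engine provides, and its loss is optimized jointly with the Q-learning loss. The sketch below is again a hedged PyTorch illustration, not the authors' code; QNetWithGameFeatures, training_loss and aux_weight are hypothetical names, and a plain feed-forward net stands in for the full recurrent model:

```python
import torch.nn as nn
import torch.nn.functional as F

class QNetWithGameFeatures(nn.Module):
    """CNN with two heads: Q-values over actions, plus an auxiliary
    logit predicting whether an enemy is visible in the frame.
    The auxiliary labels come from the game engine via the API and
    are available only during training."""
    def __init__(self, n_actions=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten())
        feat_dim = 64 * 6 * 12                     # for a 60x108 input frame
        self.q_head = nn.Linear(feat_dim, n_actions)
        self.enemy_head = nn.Linear(feat_dim, 1)   # logit: enemy on screen?

    def forward(self, frame):                      # frame: (B, 3, 60, 108)
        h = self.conv(frame)
        return self.q_head(h), self.enemy_head(h).squeeze(1)

def training_loss(net, frame, action, td_target, enemy_label, aux_weight=1.0):
    """One-step Q-learning (TD) loss plus auxiliary cross-entropy.
    `enemy_label` is 1.0 when the engine reports a visible enemy."""
    q_values, enemy_logit = net(frame)
    q_taken = q_values.gather(1, action.unsqueeze(1)).squeeze(1)
    td_loss = F.smooth_l1_loss(q_taken, td_target)
    aux_loss = F.binary_cross_entropy_with_logits(enemy_logit, enemy_label)
    return td_loss + aux_weight * aux_loss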

Not only is Arnold fast and an accurate shot, it has also learned to dodge enemy fire, making it hard to kill. Though Arnold placed second in both tracks of the Visual Doom AI Competition, it had the fewest deaths and the best kill-to-death ratio by a significant margin. In track one, agents competed on a known map with a single weapon; in track two, they faced unknown maps and multiple weapons.

For More Information

Byron Spice | 412-268-9068 | bspice@cs.cmu.edu