Quixote: A NetHack Reinforcement Learning Framework and Agent
Chandler Watson, Department of Mathematics, Stanford University
CS 229, Spring 2019

Abstract

In the 1980s, Michael Toy and Glenn Wichman released a new game that would change the ecosystem of Unix games forever: Rogue. Rogue was a dungeon crawler, a game in which the objective is to explore a dungeon, typically in search of an artifact, and it spawned many spinoffs, including the open-source NetHack, enjoyed by many today. NetHack is a game both beloved and reviled for its incredible difficulty; finishing it is seen as a great accomplishment.

In this project, we explore the creation of a new NetHack reinforcement learning (RL) framework, Quixote, and of a Q-learning agent built on that framework. Additionally, we explore a domain-specific state representation and several Q-learning tweaks, allowing progress without involving deep architectures. Possible future directions include reorganizing and packaging Quixote into a bona fide Python package for public use, and incorporating deep RL into Quixote for higher performance.

Figure 1: A screenshot of the end of a game of NetHack. (https://www.reddit.com/r/nethack/comments/3ybcce/yay_my_first_360_ascension/)

Objective

NetHack is a "roguelike" Unix game:
• Permadeath and a low win rate
• Sparse "rewards" and a large action space
• A massive number of game mechanics
• Unpredictable NPCs

This makes it a difficult but interesting environment for RL, so we:
• Focus on navigation (10 actions; one plausible action set is sketched below)
• Maximize points at the end of the game: go deeper

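The poster does not enumerate its 10 navigation actions, so the following mapping is an illustration only, assuming the standard NetHack movement keys plus the two staircase commands.

```python
# A minimal sketch of a 10-action navigation space. The key bindings
# below are standard NetHack movement keys, but treating exactly these
# ten as the agent's action set is an assumption, not the poster's spec.
ACTIONS = {
    0: "h",  # west
    1: "j",  # south
    2: "k",  # north
    3: "l",  # east
    4: "y",  # northwest
    5: "u",  # northeast
    6: "b",  # southwest
    7: "n",  # southeast
    8: ">",  # descend stairs ("go deeper")
    9: "<",  # ascend stairs
}
```
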
Framework

• Quixote is an original framework for NetHack RL
• Interactivity allows rapid prototyping and debugging (see the loop sketch below)

Figure 2: The layout of the Quixote framework.

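Since the poster does not document Quixote's API, the following is a minimal sketch of the kind of agent-environment loop its layout suggests; the class and method names (QuixoteEnv, reset, step) are assumptions, not the framework's documented interface.

```python
# Hypothetical usage sketch of a Gym-style loop; QuixoteEnv and its
# reset/step methods are assumed names, not Quixote's documented API.
import random

from quixote import QuixoteEnv  # hypothetical import

env = QuixoteEnv()
state = env.reset()                  # start a fresh NetHack game
done = False
while not done:
    action = random.randrange(10)    # stand-in for an agent's choice
    state, reward, done = env.step(action)  # reward: change in game score
```
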
Model 1: Basic Q-learning

A simple tabular Q-learning model using the standard update

    Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]

• Hard to capture an informative enough state
• Simple ε-greedy strategy with a relatively high ε = 0.1
• Primary reward is the change in game score, plus a retracing penalty and an exploration bonus (see the reward sketch below)
• Domain-specific state representation (Figure 3)
• Tried both per-state/action and constant (α ≈ c) learning rates; opted for a constant learning rate later
• Validated the discount factor over 20 episodes

Figure 3: The state representation for Model 1.

Model 2: Approximate Q-Learning

To ensure that the model is able to generalize between states, we replace the table with linear function approximation (LFA), Q(s,a) \approx w_a^\top \phi(s), keeping the same hyperparameters, rewards, etc. as Model 1.

Model 3: ε Scheduling

Random play works well, so we bootstrap off of it:
• Much exploration is needed early, then exploitation
• Stepwise ε scheduling (Figure 4)

Figure 4: Stepwise epsilon scheduling.

We also treat randomness as an action:
• Q-learning decreases the amount of randomness over time
• The agent blends the learned policy and random play

Figure 7: The "random as action" model.

Sketches of these three models follow.

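The poster lists Model 1's reward components without weights or bookkeeping details; the sketch below shows one plausible combination of the delta-score reward, retracing penalty, and exploration bonus. The constants and the visited-set mechanics are assumptions, not the poster's values.

```python
# A plausible shaping of Model 1's reward (a sketch only; the poster
# names the components but not the weights or the exact bookkeeping).
RETRACE_PENALTY = 0.1   # hypothetical constant
EXPLORE_BONUS = 0.5     # hypothetical constant

def shaped_reward(score, prev_score, pos, visited):
    reward = score - prev_score      # primary reward: delta of game score
    if pos in visited:
        reward -= RETRACE_PENALTY    # discourage retracing old ground
    else:
        reward += EXPLORE_BONUS      # encourage visiting new tiles
        visited.add(pos)
    return reward
```
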
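Next, a minimal sketch of Model 1's tabular update and ε-greedy action choice, using the ε = 0.1 and constant learning rate the poster settles on; the α and γ values are placeholders, since the validated discount factor is not reported.

```python
import random
from collections import defaultdict

N_ACTIONS = 10
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # alpha and gamma are placeholders
Q = defaultdict(float)                   # Q[(state, action)] -> estimated value

def choose_action(state):
    """Epsilon-greedy over the navigation actions."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """The standard tabular Q-learning update with a constant learning rate."""
    best_next = max(Q[(next_state, a)] for a in range(N_ACTIONS))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```
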
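Model 2's linear function approximation is not spelled out on the poster; the sketch below uses the standard semi-gradient form Q(s,a) ≈ w_a · φ(s), leaving the feature map φ abstract because the state representation appears only in Figure 3. The feature count and step sizes are placeholders.

```python
import numpy as np

N_ACTIONS, N_FEATURES = 10, 32           # feature count is a placeholder

w = np.zeros((N_ACTIONS, N_FEATURES))    # one linear weight vector per action

def q_value(phi, action):
    """Linear approximation: Q(s, a) ~= w_a . phi(s)."""
    return w[action] @ phi

def lfa_update(phi, action, reward, phi_next, alpha=0.1, gamma=0.95):
    """Semi-gradient Q-learning step on the TD error."""
    target = reward + gamma * max(q_value(phi_next, a) for a in range(N_ACTIONS))
    td_error = target - q_value(phi, action)
    w[action] += alpha * td_error * phi
```
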
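Finally, Model 3's two ideas, stepwise ε scheduling (Figure 4) and treating randomness as an action (Figure 7), might be realized as below. The schedule's breakpoints and the extra "act randomly" arm are illustrative assumptions, built on the Q table and N_ACTIONS from the tabular sketch above.

```python
import random

def stepwise_epsilon(episode):
    """Stepwise schedule in the spirit of Figure 4: heavy exploration
    first, then exploitation. Breakpoints and values are assumptions."""
    if episode < 20:
        return 0.5
    if episode < 40:
        return 0.2
    return 0.05

RANDOM_ARM = N_ACTIONS  # extra "act randomly" action appended to the set

def select_action(state):
    """'Random as action' (Figure 7): random play is itself an arm the
    agent can choose; as learning progresses, Q-learning drives that
    arm's value down, shifting the blend from random play to policy."""
    best = max(range(N_ACTIONS + 1), key=lambda a: Q[(state, a)])
    if best == RANDOM_ARM:
        return random.randrange(N_ACTIONS)   # delegate to the random policy
    return best
```
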
Results

             Rand.   QL      QL + ε   AQL     AQL + ε
Mean score   45.60   44.92   51.96    32.53*  49.52

Figure 5: Results of running Quixote for 50 episodes (mean scores given). *Based on 30 episodes.

Discussion

While shallow QL alone appeared to perform only at the level of random play, the modifications above improved QL over that baseline. Several caveats apply:
• Hard to say how useful the state representation was
• Hard to say whether the linear model underfit
• Restarting whenever RL stalled introduced bias
• Variance between runs was too high to draw firm conclusions

In any case, the Quixote framework, despite several early bugs, was rugged and has great potential as an RL environment.

Future Directions

Both for the framework and for the agent:
• Much, much more testing!
• Implement deep RL architectures
• Explore more expressive state representations, e.g., directions to landmarks
• Finer-grained auxiliary rewards
• Use "meta-actions": go to a door, fight an enemy (Figure 6)

Figure 6: Possible "meta-actions" for Quixote.

References

A. Fern, "RL for Large State Spaces." Available: https://oregonstate.instructure.com/files/67025084/download
D. Takeshi, "Going Deeper into Reinforcement Learning: Understanding Q-Learning and Linear Function Approximation," 2016. Available: https://danieltakeshi.github.io/2016/10/31/going-deeper-into-reinforcement-learning-understanding-q-learning-and-linear-function-approximation/
"Q-learning," Wikipedia, 2019. Available: https://en.wikipedia.org/wiki/Q-learning
Y. Liao, K. Li, Z. Yang, "CS229 Final Report: Reinforcement Learning to Play Mario," 2012. Available: http://cs229.stanford.edu/proj2012/LiaoYiYang-RLtoPlayMario.pdf
