Comparing Learning Algorithms for Crossy Road Game
Sajana Weerawardena, Alwyn Tan, Nick Rubin

Overview
● Crossy Road is a 3D mobile version of Frogger, and is one of the top-grossing apps for iOS and Android
● Goal: create an intelligent agent to (hopefully) play indefinitely
● Infrastructure: we built our own Crossy Road simulator
● Implemented both game tree and reinforcement learning agents to compare how each performed

RL Approach
● Epsilon-greedy vanilla Q-learning (see the sketch below)
● Our state includes:
  ○ Safety status of the 8 positions surrounding the player
  ○ Type of the previous, current, and next rows (grass, road, or water)
  ○ Direction of moving objects in the previous, current, and next rows
  ○ Total existing states: 186,624 (breakdown below)
● Our rewards:
  ○ +7 for going forwards
  ○ -8 for going backwards
  ○ -300 if dead
● To train faster, we trained on only-road, only-water, and water/grass worlds, so that more states were explored
● Reduced epsilon as we trained
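The 186,624 figure is consistent with a simple product encoding (assuming a binary safe/unsafe flag for each of the 8 surrounding positions, and one of 3 row types plus one of 3 object directions for each of the 3 rows):

  2^8 × 3^3 × 3^3 = 256 × 27 × 27 = 186,624 states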

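Below is a minimal sketch of the tabular, epsilon-greedy Q-learning loop described above. The simulator interface (reset/step), the action set, and the hyperparameter values are illustrative assumptions, not necessarily what our implementation uses.

import random
from collections import defaultdict

# Tabular Q-learning with epsilon-greedy exploration and epsilon decay.
# `env` is a hypothetical simulator exposing reset() and step(action); states are
# the hashable feature tuples described above (safety flags, row types, directions).

ACTIONS = ["up", "down", "left", "right", "stay"]  # assumed action set

def train(env, episodes=3000, alpha=0.1, gamma=0.95,
          epsilon=1.0, epsilon_min=0.05, epsilon_decay=0.999):
    Q = defaultdict(float)  # maps (state, action) -> estimated value

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])

            # Rewards follow the scheme above: +7 forward, -8 backward, -300 on death.
            next_state, reward, done = env.step(action)

            # One-step Q-learning update.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state

        # Reduce epsilon as training progresses, as described above.
        epsilon = max(epsilon_min, epsilon * epsilon_decay)

    return Q
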
Game Tree Approach
● Minimax implementation with a variable depth (see the sketch below)
● All game objects (cars, logs, trees) treated as a single opponent that moves deterministically
● Evaluation function:
  ○ No penalty for going backwards
  ○ -99,999 if dead
  ○ Score = score * 1 / |x offset|, to discourage moving to the edge
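A minimal sketch of the depth-limited search with the evaluation function above. The game-state interface (legal_actions, apply, advance_world, is_dead, score, x_offset) is a hypothetical stand-in for our simulator; because all objects move deterministically, the opponent "turn" reduces to a single forced world update rather than a minimizing branch.

DEATH_PENALTY = -99_999

def evaluate(state):
    """Heavy penalty on death; otherwise score scaled down by distance from the
    center column to discourage hugging the edge. The max(1, ...) guard against a
    zero offset is an added assumption."""
    if state.is_dead():
        return DEATH_PENALTY
    return state.score() * 1.0 / max(1, abs(state.x_offset()))

def search(state, depth):
    """Return the best (value, action) pair looking `depth` player moves ahead."""
    if depth == 0 or state.is_dead():
        return evaluate(state), None

    best_value, best_action = float("-inf"), None
    for action in state.legal_actions():
        # Apply the player's move, then the deterministic object movement.
        child = state.apply(action).advance_world()
        value, _ = search(child, depth - 1)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action
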

Results
● Game tree: agent never dies unless stuck with no escape
● Reinforcement learning: still training, but slowly improving (see plot below)
  ○ Total states explored so far: ~8k (~4.3% of the state space)
● Game tree (minimax) results:

  Depth          1     2     3      4
  Average Score  213   390   2692   8536

[Plot: RL agent training progress]

Analysis
● The minimax agent, even with a low depth like 3, is able to achieve extremely high scores because it will always survive unless there is a state in which every action leads to a death
● The RL implementation isn't as successful for a few reasons:
  ○ Crossy Road is a complex game and our feature extractor isn't able to capture enough information about a state
  ○ The agent has only been trained for 20 hours (~3,000 game iterations) and it hasn't yet encountered the vast majority of the 186,624 states that our current feature extractor captures
● We expect that with more training our RL agent will improve
● Future work: function approximation, TD-learning, policy network (sketched below)

Challenges
● RL: struggled to craft a feature extractor that is robust enough in its representation of states, but not so complex that the state space is impossibly large. We are also still trying to optimize reward values.
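The future-work list mentions function approximation. As a purely illustrative sketch (not part of the current implementation), approximate Q-learning replaces the Q-table with a weight vector over hand-coded features, so that Q(s, a) = w · phi(s, a). The feature extractor and hyperparameters below are assumptions.

ACTIONS = ["up", "down", "left", "right", "stay"]  # assumed action set

def features(state, action):
    """Hypothetical feature extractor: `state` is assumed to be a dict of raw
    feature name -> value (safety flags, row types, ...), crossed with the action."""
    return {f"{name}&{action}": value for name, value in state.items()}

def q_value(weights, state, action):
    return sum(weights.get(f, 0.0) * v for f, v in features(state, action).items())

def update(weights, state, action, reward, next_state, done, alpha=0.01, gamma=0.95):
    """One approximate Q-learning step: nudge each active weight along the TD error."""
    best_next = 0.0 if done else max(q_value(weights, next_state, a) for a in ACTIONS)
    td_error = reward + gamma * best_next - q_value(weights, state, action)
    for f, v in features(state, action).items():
        weights[f] = weights.get(f, 0.0) + alpha * td_error * v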

github.com/alwyntan/Crossy-Road-AI