Planning From Pixels in Atari With Learned Symbolic Representations

Andrea Dittadi∗, Frederik K. Drachmann∗, Thomas Bolander
Technical University of Denmark, Copenhagen, Denmark

Abstract

Width-based planning methods have been shown to yield state-of-the-art performance in the Atari 2600 video game playing domain using pixel input. One approach consists in an episodic rollout version of the Iterated Width (IW) algorithm called RolloutIW, and uses the B-PROST boolean feature set to represent states. Another approach, π-IW, augments RolloutIW with a learned policy to improve how actions are picked in the rollouts. This policy is implemented as a neural network, and the feature set is derived from an intermediate representation learned by the policy network. Results suggest that learned features can be competitive with hand-crafted ones in the context of width-based search.

e.g. problems from the International Planning Competition (IPC) domains, can be solved efficiently using width-based search with very low values of k.

The essential benefit of using width-based algorithms is the ability to perform semi-structured (based on feature structures) exploration of the state space, and reach deep states that may be important for achieving the planning goals. In classical planning, width-based search has been integrated with heuristic search methods, leading to Best-First Width Search (Lipovetzky and Geffner 2017) that performed well at the 2017 International Planning Competition. Width-based search has also been adapted to reward-driven prob-