RL-DARTS: Differentiable Architecture Search for Reinforcement Learning
Yingjie Miao∗, Xingyou Song∗, Daiyi Peng, Summer Yue, Eugene Brevdo, Aleksandra Faust
Google Research, Brain Team

Abstract

We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL) to search for convolutional cells, applied to the Procgen benchmark. We outline the initial difficulties of applying neural architecture search techniques in RL, and demonstrate that by simply replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is compatible with both off-policy and on-policy RL algorithms, needing only minor changes to preexisting code. Surprisingly, we find that the supernet can be used as an actor for inference to generate replay data in standard RL training loops, and thus trained end-to-end. Throughout this training process, we show that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive against manually designed policies, while also verifying previous design choices for RL policies.

1 Introduction and Motivation

Over the last decade, advancements in deep reinforcement learning have heavily focused on algorithmic improvements [25, 71], data augmentation [51, 35], infrastructure upgrades [15, 27], and even hyperparameter tuning [72, 16]. Automating such methods for RL has given rise to the field of Automated Reinforcement Learning (AutoRL). Surprisingly, however, one relatively underdeveloped and unexplored area is the automation of large-scale policy architecture design. Recent work in larger-scale RL suggests that designing a policy's architecture can be just as important as the algorithm or the quality of data, if not more so, for metrics such as generalization, transferability, and efficiency.
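To make the core idea in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of a DARTS-style supernet acting as a drop-in image encoder for an RL policy: each cell blends a small set of candidate operations via softmax-weighted architecture logits, so the encoder remains differentiable and can be trained jointly with the RL loss. The candidate op set, channel sizes, and the use of PyTorch here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative candidate operations for one DARTS edge (assumed, not from the paper).
CANDIDATE_OPS = {
    "conv3x3": lambda c: nn.Conv2d(c, c, 3, padding=1),
    "conv5x5": lambda c: nn.Conv2d(c, c, 5, padding=2),
    "skip":    lambda c: nn.Identity(),
}

class MixedOp(nn.Module):
    """Softmax-weighted sum over candidate ops (the DARTS relaxation)."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([build(channels) for build in CANDIDATE_OPS.values()])
        # Architecture logits alpha, trained jointly with the policy's RL objective.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

class SupernetEncoder(nn.Module):
    """Hypothetical drop-in replacement for a fixed CNN image encoder in an RL policy."""
    def __init__(self, in_channels=3, channels=16, num_cells=3):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.cells = nn.ModuleList([MixedOp(channels) for _ in range(num_cells)])

    def forward(self, obs):
        x = F.relu(self.stem(obs))
        for cell in self.cells:
            x = F.relu(cell(x))
        return x.flatten(start_dim=1)  # flat features for the policy/value heads
```

In such a setup, the same supernet can also be used directly as the actor when collecting rollouts, since every candidate op contributes to the forward pass; discretizing the learned alphas afterwards yields a standalone cell architecture.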