Monte Carlo tree search compared to A∗ in airspace configuration decision problem

Gabriel Hondet, Benoît Viry
May 21, 2018

Preprint, May 2018. DOI: 10.13140/RG.2.2.31176.21762. Available at: https://www.researchgate.net/publication/326211735

Abstract

Dynamic airspace configuration is a highly combinatorial partitioning problem. Since exact methods tend to be overwhelmed by the complexity of such problems, a stochastic approach is considered here. This paper presents the Monte Carlo tree search algorithm using UCT search and adapted to one-player games. The application of the algorithm to the airspace partitioning problem is then detailed. Finally, the Monte Carlo tree search is compared to A∗.

Notations

• $m$: elementary module;
• $S$: air traffic control sector, group of modules;
• $P$: partition of the airspace, group of sectors;
• $t$: current time;
• $C(P, t)$: cost of a given partition $P$ at time $t$;
• $C_{tr}(P_1, P_2)$: cost of transition between partitions $P_1$ and $P_2$;
• $\pi = [P_1, \dots, P_n]$: a sequence of partitions (called path in the tree search algorithms);
• $f(\pi)$: cost associated with the sequence of partitions $\pi$;
• $\mathcal{N}$: the set of nodes;
• $(u, v) \in \mathcal{N}^2$: nodes;
• $h \colon \mathcal{N} \to \mathbb{R}^+$: heuristic estimating the cost from a node $u$ to a final node;
• $\mu_v$: mean value of outcomes of simulations run through or from a node $v$;
• $T_v$: number of simulations run through or from a node $v$;
• $\mathit{first}(\ell)$: returns the first element of $\ell$.

I Introduction

The airspace is divided into sectors, themselves divided into elementary modules. Each sector is managed by a controller working position composed of two controllers. During the day, sectors are split and merged to be able to manage the varying traffic. Splitting creates smaller sectors and is therefore used when traffic gets too dense. Conversely, merging sectors allows fewer controllers to manage the same airspace, and is therefore used when traffic becomes sparse.

Currently, configuration is mainly decided on the fly by the chief officer. This decision is based on the actual workload on each position. In this approach, future workload estimation is based on the controller's feelings and it therefore lacks a workload prediction tool. This paper aims at providing a method predicting the airspace configuration.

Several methods have been considered to solve the dynamic airspace configuration problem, for instance genetic algorithms in [13], constraint local search in [11], integer linear programming in [14] or dynamic programming in [3].

In this paper, a temporal sequence of configurations is considered, as in [14] or [13]. Two costs are considered to create the sequence, namely the cost associated with each configuration and a transition cost. The former is based on the workload estimated for one sector given by a simple model. More complex models might be used, such as a neural network (see [10]).

The sequence of configurations is extracted from a tree. The resulting sequence will therefore be a path from the root to a leaf of this tree. This problem is highly combinatorial (partitioning of the airspace). Since stochastic tree search algorithms combined with deep neural networks have proved themselves by outranking the best human player at the game of Go, one of the most highly combinatorial games, the Monte Carlo tree search algorithm [4] is used in this paper to fulfil the previously mentioned task.

II Previous related works

The dynamic airspace configuration problem requires a model of the airspace. In [13] or [14] the airspace is modelled via a graph. In those graphs, vertices represent elementary modules and an edge links two adjacent modules. In [14], to be able to produce a sequence of configurations, the graph is time dependent. Another way to model the problem is to use a constrained set of configurations as in [3]. This way any configuration will match specified requirements, which can be qualified as hard constraints.

In most cases the cost of a configuration is based on the workload. Each approach seems to give its own representation of the workload. For instance, [2] determined workload density proportionally to the time spent by aircraft in each sector. A simpler version [13] only uses the number of aircraft. On the other hand, more complex methods involving many more inputs are also available. For instance, [10] used several indicators, such as sector volume or vertical incoming flows in the next 15 and 60 minutes. Those indicators are used as inputs to a neural network forecasting the workload.

Other soft constraints appear to be relevant to have a better model of the problem. For instance in [13] and [2] a coordination cost is defined. It represents the surplus of work added by flights travelling from one sector to another. The shape of the resulting sector is considered, as the simpler the shape, the easier it is to manage. Complex shapes are therefore avoided, thanks to the notion of compactness in [11] and balconies in [13]. To smooth the transition between two configurations, the work associated with the reallocation of one or more modules is evaluated. This transition cost is included in the cost function being minimised (in [2]).

In this paper, each partition scheme is based upon the number of aircraft in each elementary module and the cost of transition from the previous partitioning. The set of available partitions can be computed from the set of elementary modules and a context which contains all available ATC sectors.

The Monte Carlo tree search algorithm has been widely used in two-player games. Only three years after its appearance in 1990, Brügmann applied it to the game of Go in [5]. While the Monte Carlo tree search is still extensively used in two-player games (and especially Go), its adaptation to one-player games has been worked on. Auer et al. propose an upper confidence bound formula in [8] which appears to be more efficient for one-player games. The introduced formula uses the standard deviation of the outcome of the simulations. The latter article also uses the rapid action value estimation (RAVE) technique to quicken the convergence of the algorithm. RAVE uses the "all moves as first" heuristic, in which all moves¹ seen during simulations are considered as a first move. This allows the algorithm to update more statistics in one simulation. Gelly et al. use in [9] the RAVE technique coupled with several heuristics. The heuristics bias the initialisation of a node in the search tree, pre-filling its statistics using prior knowledge of the problem.

¹ A move is informally considered as a decision taken regarding which state to choose while descending the search tree.

III Algorithm

The task of building an optimal sequence of partitions (lowering as much as possible the workload of each controller) is fulfilled by a stochastic tree search method, namely the Monte Carlo tree search. To assess the quality of the results, an exact method (here A∗) is used.

III.A Monte Carlo tree search

III.A.1 Overview

Monte Carlo tree search (MCTS) is a best-first search method using stochastic simulations. The algorithm is based on the computation of the reward expectancy of paths, which is estimated through Monte Carlo simulations. The algorithm actually uses two trees, an underlying tree associated with the model (e.g. a game tree) and a search tree. The latter is built incrementally, each step being composed of four phases, namely
1. selection (or tree walk): choosing successively the most promising nodes from the search tree,
2. expansion: adding new nodes to the search tree,
3. simulation (or random walk): choosing successively nodes from the model tree, from the expanded node to a leaf,
4. backpropagation (or backup): applying the result of the simulation to the previously selected nodes of the search tree (phases 1 and 2).
Those phases are repeated until a stopping criterion (e.g. memory or time) is reached, resulting in algorithm 1 where the tree policy aggregates phases 1 and 2 to create a new node of the search tree and the default policy gives an evaluation of the newly added node.

… best node, according to a tree policy, among the children of $u$ is chosen. This type of problem can be solved by bandit methods. Those methods consist in deciding, for a player facing several slot machines (multi-armed bandit), which machine will bring the highest reward knowing the past results. The objective of bandit methods is thus to maximise the reward and minimise the regret of not playing the best machine.

The bandit problem has been applied to MCTS via the Upper Confidence Tree algorithm in [12] using the UCB1 equation 1. Let $u$ be the node from which a child $v$ must be selected to go deeper in the tree, $T_u$ the number of simulations carried out from node $u$ (which includes any simulation from nodes in any subtree of $u$) and $\beta$ a chosen constant. The selected child $v$ is the one maximising the UCB1 equation

$$\mu_v + 2\beta\sqrt{\frac{2 \log T_u}{T_v}} \qquad (1)$$

While the previous equation is well suited for two-player games, it can be tweaked to improve its efficiency in one-player games or sequencing problems. An alternative using the standard deviation $\sigma_v$ of the outcome of the previous simulations involving node
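
The four phases of the overview and the UCB1 rule of equation 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy model (`DEPTH`, `MOVES`, `reward`) and every function name are hypothetical, chosen only for illustration. The sketch also assumes a reward in [0, 1] to be maximised, whereas the paper minimises a cost, and it assumes that the final path is read off by following the most visited child at each level.

```python
import math
import random


class Node:
    """Search-tree node holding T_v (visits) and mu_v (mean outcome)."""

    def __init__(self, state, parent=None):
        self.state = state      # tuple of moves chosen so far
        self.parent = parent
        self.children = []
        self.visits = 0         # T_v
        self.mean = 0.0         # mu_v


# Hypothetical toy model: a path is DEPTH moves, each taken from MOVES.
DEPTH = 4
MOVES = (0, 1, 2)


def is_leaf(state):
    return len(state) == DEPTH


def reward(state):
    """Toy reward in [0, 1]; an assumption, not the paper's cost function."""
    return sum(state) / (DEPTH * max(MOVES))


def ucb1(mu_v, t_u, t_v, beta):
    """Equation 1: mu_v + 2 * beta * sqrt(2 * log(T_u) / T_v)."""
    return mu_v + 2.0 * beta * math.sqrt(2.0 * math.log(t_u) / t_v)


def tree_policy(node, beta):
    """Phases 1 and 2: descend with UCB1, then expand one untried move."""
    while not is_leaf(node.state):
        if len(node.children) < len(MOVES):
            move = MOVES[len(node.children)]
            child = Node(node.state + (move,), parent=node)
            node.children.append(child)
            return child
        node = max(node.children,
                   key=lambda c: ucb1(c.mean, c.parent.visits, c.visits, beta))
    return node


def default_policy(state, rng):
    """Phase 3: random walk in the model tree down to a leaf."""
    while not is_leaf(state):
        state = state + (rng.choice(MOVES),)
    return reward(state)


def backup(node, outcome):
    """Phase 4: update T_v and mu_v from the new node up to the root."""
    while node is not None:
        node.visits += 1
        node.mean += (outcome - node.mean) / node.visits
        node = node.parent


def mcts(iterations=2000, beta=0.5, seed=0):
    rng = random.Random(seed)
    root = Node(state=())
    for _ in range(iterations):
        new_node = tree_policy(root, beta)
        backup(new_node, default_policy(new_node.state, rng))
    # Read off a path by following the most visited child at each level.
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.visits)
    return node.state


if __name__ == "__main__":
    print(mcts())
```

In this sketch a child is visited once as soon as it is created, so `ucb1` is never called with `t_v` equal to zero. The stopping criterion is a fixed iteration count, where a time or memory budget could be used instead.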
