See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/326211735

Monte Carlo tree search compared to A * in airspace configuration decision problem

Preprint · May 2018 DOI: 10.13140/RG.2.2.31176.21762

CITATIONS READS 0 75

2 authors, including:

Gabriel Hondet Ecole Nationale de l'Aviation Civile

2 PUBLICATIONS 0 CITATIONS

SEE PROFILE

All content following this page was uploaded by Gabriel Hondet on 05 July 2018.

The user has requested enhancement of the downloaded file. Monte Carlo tree search compared to A∗ in airspace configuration decision problem

Gabriel Hondet, Benoît Viry May 21, 2018

Abstract a node 푣; • 푓푖푟푠푡(ℓ): returns the first element of ℓ. Dynamic airspace configuration is a highly combi- natorial partitioning problem. Since exact meth- ods tend to be overwhelmed by the complexity of I Introduction such problems, a stochastic approach is here con- sidered. This paper presents the Monte Carlo tree The airspace is divided into sectors, themselves di- using UCT search and adapted to vided into elementary modules. Each sector is man- one player games. The application of the algorithm aged by a controller working position composed of to the airspace partitioning problem is then detailed. two controllers. During the day, sectors are split Finally the Monte Carlo tree search is compared to and merged to be able to manage the varying traffic. the A∗. Splitting creates smaller sectors and is therefore used when traffic gets too dense. On the opposite, merging sectors allows fewer controllers to manage the same Notations airspace, and is therefore used when traffic becomes sparse. • 푚: elementary module; Currently, configuration is mainly decided on the • 푆: air traffic control sector, group of modules; fly by the chief officer. This decision is basedon • 푃 : partition of the airspace, group of sectors; the actual workload on each position. In this ap- • 푡: current time; proach, future workload estimation is based on the • 퐶(푃 , 푡): cost of a given partition 푃 at time 푡; controller’s feelings and it therefore lacks a work- • 퐶푡푟(푃1, 푃2): cost of transition between partitions load prediction tool. This papers aims at providing 푃1 and 푃2; a method predicting the airspace configuration. • 휋 = [푃1, … , 푃푛]: a sequence of partitions (called Several methods have been considered to solve the path in the tree search algorithms); dynamic airspace configuration problem, for instance • 푓(휋): cost associated to sequence of partitions via genetic algorithms in [13], constraint local search 휋; in [11], integer linear programming in [14] or dynamic • 풩: the set of nodes; programming in [3]. • 푢, 푣 ∈ 풩2: nodes In this paper, a temporal sequence of configura- • ℎ∶ 풩 → ℝ+: heuristic estimating the cost from tions is considered, as in [14] or [13]. Two costs a node 푢 to a final node; are considered to create the sequence, namely the • 휇푣: mean value of outcomes of simulations run cost associated to each configuration and a transition through or from a node 푣; cost. The former is based on the workload estimated • 푇푣: number of simulations run through or from for one sector given by a simple model. More com-

1 II PREVIOUS RELATED WORKS 2 plex models might be used such as a neural network avoided, thanks to the notion of compactness in [11] (see [10]). and balconies in [13]. To smooth the transition be- The sequence of configurations is extracted from a tween two configurations, the work associated with tree. The resulting sequence will therefore be a path the reallocation of one or more modules is evaluated. from the root to a leaf of this tree. This problem is This transition cost is included in the cost function highly combinatorial (partitioning of the airspace). being minimised (in [2]). Since stochastic tree search algorithms, combined In this paper, each partition scheme is based upon with deep neural networks have proved themselves the number of aircraft in each elementary module and worthy by outranking the best human player of one the cost of transition from the previous partitioning. of the most highly combinatorial game which is the The set of available parititions can be computed from game of Go, the Monte Carlo tree search [4] algo- the set of elementary modules and a context which rithm is used in this paper to fulfill the previously contains all available ATC sectors. mentioned task. The Monte Carlo tree search algorithm has been widely used in two-player games. Only three years af- ter its apparition in 1990, Brügmann applies it to the II Previous related works Go game in [5]. While the Monte-Carlo tree search is still extensively used in two-player games (and es- The dynamic airspace configuration problem requires pecially the Go game), its adaptation to one player a model of the airspace. In [13] or [14] the airspace game has been worked on. Auer et al. propose an is modelled via a graph. In those graphs, vertices upper confidence bound formula in [8] which appears represent elementary modules and an edge links two to be more efficient for one player games. The intro- adjacent modules. In [14], to be able to produce a duced formula uses the standard deviation of the out- sequence of configurations, the graph is time depen- come of the simulations. The latter article also uses dent. An other way to model the problem is to use a the rapid action value estimation (RAVE) technique constrained set of configurations as in [3]. This way to quicken the convergence of the algorithm. RAVE any configuration will match specified requirements, uses the “all moves as first” heuristic, in which all which can be qualified as hard constraints. moves1 seen during simulations are considered as a In most cases the cost of a configuration is based first move. This allows the algorithm to update more on the workload. Each approach seems to give their statistics in one simulation. Gelly et al. use in [9] own representation of the workload. For instance, [2] the RAVE technique coupled with several heuristics. determined workload density proportionally to the The heuristics bias the initialisaton of a node in the time spent by aircraft in each sector. A simpler ver- search tree, pre filling its statistics using prior knowl- sion [13] only uses the number of aircraft. On the edge of the problem. other hand, more complex methods, involving many more inputs are also available. For instance, [10] used several indicators, such as sector volume, or vertical III Algorithm incoming flows in the next 15 and 60 minutes. Those indicators as inputs to a neural network forecasting The task of building an optimal sequence of parti- the workload. tions (lowering as much as possible the workload of Other soft constraints appear to be relevant to have each controller) is fulfilled by a stochastic tree search a better model of the problem. For instance in [13] method, namely the Monte Carlo tree search. To as- and [2] a coordination cost is defined. It represents sess the quality of the results, an exact method (here the surplus of work added by flights travelling from A∗) is used. one sector to an other. The shape of the resulting sector is considered, as the simpler is the shape, the 1a move is informally considered as a decision taken regard- easier it is to manage. Complex shapes are therefore ing which state to choose while descending the search tree III ALGORITHM 3

III.A Monte Carlo tree search best node – according to a tree policy – among the children of 푢 is chosen. This type of problem can be III.A.1 Overview solved by bandits methods. Those methods consist Monte-Carlo is a best-first search method using in, given a bandit in front of several slot machines stochastic simulations. The algorithm is based on (multi armed bandit), deciding which machine will the computation of the reward expectancy of paths bring the highest reward knowing the past results. which is estimated through Monte-Carlo simulations. The objective of bandits methods is thus to maximise The algorithm actually uses two trees, an underlying the reward and minimise the regret of not playing the tree associated to the model (e.g. a ) and best machine. a search tree. The latter is built incrementally, each The bandit problem has been applied to MCTS via step being composed of four phases, namely the Upper Confidence Tree algorithm in [12] using 1. selection (or tree walk): choosing successively the UCB1 equation 1. Let 푢 be the node from which most promising nodes from the search tree, a child 푣 must be selected to go deeper in the tree, 2. expansion: adding new nodes to the search tree, 푇푢 the number of simulations carried out from node 3. simulation (or random walk): choosing succes- 푢 (which includes any simulation from nodes in any sively nodes from the model tree, from the ex- subtree of 푢) and 훽 a chosen constant. The selected panded node to a leaf, child 푣 is the one maximising the UCB1 equation 4. (or backup): applying the re- 2 푙표푔 푇푢 sult of the simulation to the previously selected 휇푣 + 2훽√ (1) nodes of the search tree (phases 1 and 2). 푇푣 Those phases are repeated until a stopping criterion While the previous equation is well suited for two (e.g. memory or time) is reached, resulting in algo- players games, it can be tweaked to improve its effi- rithm 1 where the tree policy aggregates phases 1 and ciency in one player games or sequencing problems. 2 to create a new node of the search tree and the de- An alternative using the standard deviation 휎푣 of the fault policy gives an evaluation of the newly added outcome of the previous simulations involving node node. 푣 is proposed in [8] called the UCB1-tuned equation Algorithm 1 General MCTS [4] √ √푙표푔 푇 1 2 푙표푔 푇 휇 + 훽√ 푢 푚푖푛 ( , 휎2 + √ 푢 ) (2) procedure MctsSearchTree(푣0) 푣 푇 4 푣 푇 while within computational budget do ⎷ 푣 푣 휋 ← TreePolicy(푣0) 푣 ← 푓푖푟푠푡(휋) Δ ← DefaultPolicy(푣) Exploration exploitation trade-off The con- Backup(휋, Δ) stant 훽 answers to the exploration-exploitation end while dilemma. In the UCB1 equations 1 and 2, the right end procedure hand term increases the UCB value of less explored nodes to consider them as still promising. A high 훽 value will therefore make the algorithm prone to try unvisited nodes while a low value will consider almost III.A.2 Extending the search tree exclusively the results obtained so far, even if other paths are better but unexplored. Selection (algorithm 2) The aim of the selection is to build a path from the root of the search tree Expansion (algorithm 3) If the selected node has by choosing successively the most promising2 node. one or more unvisited children, the selection stops Given a node 푢 that has previously been selected, the and one of them is added to the search tree randomly, 2i.e. possibly leading to the best evaluation the latter being thus expanded by this new node. III ALGORITHM 4

III.A.3 Simulation and backpropagation The two steps described in this paragraph aim to guess the reward that can be expected from a path including a given node.

Algorithm 2 UCT algorithm Simulation Once a node has been expanded, ran- function TreePolicy(휋) dom nodes are chosen successively until a terminal 푣 ← 푓푖푟푠푡(휋) state is found. The cost of the resulting path, from if 푣 is terminal then the root to the node is then evaluated and backprop- return 휋 agated to all the ancestors of the expanded node. else if 푣 is not fully expanded then Heuristic One might want to bias the random- return Expand(푣) ∪휋 ness while choosing nodes. This would imply using a else heuristic which has to be able to discriminate a node 푓 ← BestChild(푣) between its siblings. return TreePolicy(푓 ∪ 휋) end if Backpropagation The backpropagation consists end if in updating the values required to carry out the tree end function policy. The values must be updated incrementally function BestChild(푣) since the backpropagation function has the current return 푎푟푔푚푎푥{UCB(푣′)|푣′ children of 푣} value of the parameters and the result of the simu- end function lation as parameters. Thus, for UCB1 equation, the expected reward (mean) and the total count are up- dated this way

Algorithm 4 UCB1 backpropagation procedure Backpropagate(푢, 푟) 훿 ← 푟 − 휇푢 휇 ← 휇 + 훿 푢 푢 푇푢+1 푇푢 ← 푇푢 + 1 end procedure

For the UCB1-tuned, the standard deviation has to be computed. It results in algorithm 5 which in- Algorithm 3 Expansion troduces the value 푚2(푢) associated to a node 푢 to function Expand(푢) compute the standard deviation via the formula ℓ ← {푣|푣 children of 푢, 푇푣 = 0} return random element from ℓ 2 푚2(푢) 휎푢 = . (3) end function 푇푢

III.B Sequence building Once the stopping criterium mentioned in III.A.1 is matched, the sequence of states can be extracted from IV MODEL 5

Algorithm 5 UCB1-tuned backpropagation Algorithm 6 Path building. The NextNode (here procedure Backpropagate(푢, 푟) max-child) function is one among those in III.B.1.

훿 ← 푟 − 휇푢 function MctsSearch(푢0, 푛) 휇 ← 휇 + 훿 푢 ← 푢 푢 푢 푇푢+1 0 ′ 훿 ← 푟 − 휇푢 휋 ← {푢} ′ 푚2(푢) ← 푚2(푢) + 훿훿 for 푖 = 1 to 푛 do 푇푢 ← 푇푢 + 1 MctsSearchTree(푢) end procedure 푢 ←NextNode(푢) 휋 ← 휋 ∪ {푢} end for the search tree. return 휋 end function function NextNode(푢) III.B.1 Choosing nodes return 푎푟푔푚푎푥{휇푣|푣 children of 푢} To build the final sequence, nodes are chosen accord- end function ing to a criterion. Chaslot et al. in [6] propose several methods to select nodes, returns the first element of 퐺 and 푓−insert(퐺, 푣) in- • max-child: select the child with highest mean serts 푣 in 퐺 ordering first by 푓 increasing then by 푔 reward; decreasing. • robust child: select the most visited root child; • secure child: select the child maximising a lower confidence bound. Heuristic To be sure to have the optimal solution, the heuristic ℎ has to be minimal i.e. let ℎ∗ be the optimal heuristic (the one giving the true distance III.B.2 Iterative pathfinding from a node to a final state), then for any node 푢, ∗ The path is built iteratively by calling successive ℎ(푢) ≤ ℎ (푢). Monte Carlo tree searches. Say an MCTS has been called on a node 푢. Once a stopping criterium is III.D Theoretical performances matched, a node 푣 among the ones reachable from 푢 In this paper, a high branching factor problem is con- is selected via one of the previously mentioned poli- sidered. cies. Then the MCTS algorithm is called back with Let 퐻 be the height of the tree, 퐾 the branching node 푣 as the root node. The final path is composed factor. Then, the complexity of the A∗ is expected to of the successive roots. The algorithm is described be, in the worst case, exponential in 퐻 (see [1]). in 6. Since the Monte carlo tree is a stochastic method, one must wait for a satisfactorily converging solution III.C A∗ rather than the exact solution. It has however been proven in [12] that the bias of the expected payoff To evaluate the exactness of the paths given by the tends to zero. MCTS algoithm, the A∗ algorithm is used. A∗ is an exact best first search pathfinding algorithm which uses a heuristic function to guide its search. The al- IV Model gorithm is given in 7 where 푢0 is the initial state, 푇 the set of terminal nodes, ℎ a heuristic function es- In this section, we discuss the model established to timating the cost from a state 푢 given as argument approach this configuration problem. First we define to a final state, 푘∶ 풩2 → ℝ. The function 푓푖푟푠푡(퐺) a structure for a controlled sector and all possible IV MODEL 6

Algorithm 7 A∗ algorithm [1] model. This collection is referred to as the context. ∗ It is the list of operation ATC sectors actually used procedure A (푢0) in ATC centres. 퐺 ← 푢0; 퐷 ← ∅; 푔(푢0) ← 0; 푓(푢0) ← 0 father(푢0) ← ∅ while 퐺 ≠ ∅ do IV.A.3 Transitions 푢 ← 푓푖푟푠푡(퐺); 퐺 ← 퐺\{푢} 퐷 ← 퐷 ∪ {푢} During the day, and with the evolution of the traf- if 푢 ∈ 푇 then fic, the workload (rigorously defined in IV.B) varies. return father To help controllers to maintain a homogeneous level end if of workload, the airspace partitioning needs to be for 푣 in childen of 푢 do adapted through successive transitions starting from if 푣 ∉ 퐷 ∪ 퐺 or [푔(푣) > 푔(푢) + 푘(푢, 푣)] an inital partition. This maneuver requires either co- then ordination or the opening of a new position and hence 푔(푣) ← 푔(푢) + 푘(푢, 푣) increases controller’s workload. This is the reason 푓(푣) ← 푔(푣) + ℎ(푣) why a full reconfiguration isn’t feasible, and only a father(푣) ← 푢 subset of all available transitions can be operated. 푓−insert(퐺, 푣) In the model considered here, three transitions are end if possible namely a merge, a transfer or a split. For end for an airspace composed of three modules 퐴, 퐵, and 퐶, end while those actions represent: end procedure • merge: {{퐴, 퐵}, {퐶}} → {{퐴, 퐵, 퐶}}

• split: {{퐴, 퐵, 퐶}} → {{퐴, 퐵}, {퐶}} transitions through time. Then a workload model is defined. And finally we define a cost per partition • transfer: {{퐴, 퐵}, {퐶}} → {{퐴}, {퐵, 퐶}} and a cost function aimed to be minimized. Each transition is composed of at most one merge, split or transfer; more would result in too complex IV.A State reconfigurations. IV.A.1 Time step Tree translation Since the overall goal is to compute a temporal se- quence of partitions over a day, the day has to be In the tree used in the algorithm exposed earlier, each sampled. In the model tree, a timestep separates a node has an embedded state. Considering a node 푢, node from its children. Practically, a timestep is one its children are the elements of the set of nodes having minute. as states the possible transitions from the state of 푢. Going down in the tree by selecting nodes is then equivalent to moving forward through time, each step IV.A.2 Partitions in the tree being a time step. In the configuration problem, the airspace needs tobe divided into sectors. Each sector represents an area IV.B Workload controlled by two controllers and is composed of one or more elementary modules. A partition 푃 is a set Let us now precise the model established to evaluate a of non-overlapping sectors covering all the airspace. workload metric. The workload depends on external A sector is a group of elementary modules, but in criteria such as sector and traffic complexity or hu- practice only a collection of sectors are allowed in this man factors (e.g. controller health or stress). In the IV MODEL 7

model assumed here, the workload felt by a controller with 훿ℎ(푃 , 푡) (resp. 훿푛(푃푡) and 훿푙(푃푡)) equals 1 if the is directly and only related to the traffic complexity. probability 푝high (resp. 푝normal and 푝low) is superior The simplest approch considers only the number of to the two others, and 0 otherwise. aircraft per sector 푁aircraft. We define two arbitary It is now possible to assign a cost to each partition. thresholds, namely 푡ℎlow and 푡ℎhigh, defining three This cost is linear regarding 푐+, 푐=, 푐− and 푛 (cardinal zones where the workload is low (resp. normal and of the partition 푃 ). The partition cost is defined as high) if 푁aircraft < 푡ℎlow (resp. 푡ℎlow ≤ 푁aircraft ≤ follows: 푡ℎhigh and 푡ℎhigh < 푁aircraft). Those workload levels 퐶(푃 , 푡) = 훼푐 + 훽푐 + 훾푐 + 휆푛 (7) are translated to probabilities 푝low, 푝normal and 푝high, + = − where 푝low(푆) is the probability that the sector 푆 is underloaded (resp. balanced and overloaded). In The parameters 훼, 훽, 훾 and 휆 (all positive) deter- mine a priority on which parameter to optimize. For short, for a sector 푆 at time 푡 and 푁aircraft: instance, a high value 훼 help to minimize 푐+, hence 푝푆,푡 = ퟙ (푁 ) low [0,푡ℎlow[ aircraft the overall number of sectors with too much traffic. 푝푆,푡 = ퟙ (푁 ) (4) In a real application, it maybe interesting to order normal [푡ℎlow,푡ℎhigh] aircraft those parameters as follow: 훼 > 훾 > 훽 > 휆. 푝푆,푡 = ퟙ (푁 ) high ]푡ℎlow,∞[ aircraft where ퟙ is the indicator function defined by: IV.C.2 Transition cost 1 if 푥 ∈ 퐼 In an operational context, each reconfiguration in- ퟙ퐼(푥) = { (5) creases the workload for controllers. This needs to 0 otherwise be considered via the definition of a transition cost. This expression of the workload, expressed in terms A transition is the difference of two partitions sep- of probabilities, allows us to use a more complex arated by only on time step. The transition cost is model for the workload such as a neural network then 0 if 푃 = 푃 (see [10]). This method learns the probabilities 푝low, 1 2 퐶푡푟(푃1, 푃2) = { (8) 푝normal and 푝high based on parameters including traf- 1 otherwise fic complexity and the complexity of the sector (e.g. airspace volume). IV.C.3 Objective function IV.C Partition cost and cost function Having defined a cost for partitions and transitions, it is now possible to aggregate everything in order to IV.C.1 Partition cost build a cost function over an entire path. For a path 휋 = [푃 , … , 푃 ] with a time from 푡 to 푡 , the cost In order to evaluate a given partition 푃 at time 푡, a 0 푛 0 푛 function is given by: cost 퐶(푃 , 푡) needs to be defined. In this paper, this cost depends on the workload in each sector. The 푛 definitions given in [7] are used to represent a high 푓(휋) = 퐶(푃0, 푡0) + ∑[퐶(푃푖, 푡푖)+ (respectively normal and low) cost for partition 푃 : 푖=1 (9)

푆,푡 2 휃 ⋅ 퐶푡푟(푃푖−1, 푃푖)] 푐+(푃 , 푡) = ∑ 훿ℎ(푆, 푡) ⋅ 푝high ⋅ |푆| 푆∈푃푡 with 휃 > 0 a parameter to determine. This is the −1 objective function to minimize in the A∗ algorithm. 푆 −2 푐=(푃 , 푡) = ( ∑ 훿푛(푆, 푡) ⋅ 푝푛표푟푚 ⋅ |푆| ) (6) The Monte Carlo Tree search algorithm maximizes 푆∈푃푡 an objective function. This function can be inter- 푆,푡 −2 preted as the reward of a path. Given the loss func- 푐−(푃 , 푡) = ∑ 훿푙(푆, 푡) ⋅ 푝low ⋅ |푆| tion defined previously, the target function canbe 푆∈푃푡 V RESULTS 8

constructed as: ∗ 5 A 1 푄(휋) = (10) MCTS 푓(휋) 4 IV.D Heuristic To use effectively the A∗ algorithm, a heuristic is 3 needed. Using the null heuristic (∀푢 ∈ 풩, ℎ(푢) = 0) – and thus using Dijkstra algorithm – revealed to be inefficient considering the branching factor. Let num of sectors 2 푘 ∈ ℕ, 푢 a node selected at time 푘. The heuristic ℎ(푢) gives the cost of the path starting from 푢 to a 1 leaf which is reached at a time 푡푓 . The heuristic is built without considering transitions (transition cost ∗ 0 5 10 15 20 and IV.A.3). Let 푃푖 be the partition with minimal cost at time 푖, the heuristic is then timestep

푓 ℎ(푢) = ∑ 퐶(푃 ∗, 푡 ) (11) 푖 푖 Figure 1: Grouping profile 푖=푘+1 This allows to generate the sequence of the best par- titions among all possible at each time step. 1. The first part of the scenario shows a busy situ- ation, all sectors are overloaded;

V Results 2. then follows a quiet moment with few aircraft, much of the sectors are underloaded; V.A Experimental setup 3. the end of the scenario has two overload modules For the tests, a fictitious area is used. It contains and the rest under normal or under load. 5 elementary modules and 12 control sectors. The resulting model tree has a mean branching factor of The expected behaviour is a tendency to increase the 6. Regarding the branching, the smallest French area number of sectors through splits to cope with part 1 – which is Paris West – has a branching factor of followed by many merges to reduce the number of 16 for 20 control sectors and 12 elementary modules. sectors in response to phase 2. The algorithms should Considering the already important branching factor, end with a specific sector created, the one overloaded a stochastic tree search method seems appropriate. in 3 and the rest grouped. Seeing 1, both algorithm The traffic is simulated the following way, each ele- behave as expected. mentary module has an associted list which, at index Accuracy can be quantitatively assessed by com- 푖 has the number of aircraft in it at time step 푖. Those paring the outcome of the MCTS algorithm against ∗ lists have been written by hand. the A cost and a greedy algorithm cost. The time al- lowed to the MCTS to run is considered long enough V.B Accuracy to allow the MCTS to converge to its best solution. The results are summarised in table 1. The graph 1 asserts the correct behaviour of both al- The graph 2 shows the evolution of the difference of gorithms A∗ and MCTS regarding grouping and de- costs with varying depth. The error rate is expected grouping. On the 푦 axis is indicated the number of to grow as the depth increase since the MCTS might sectors. The trafic activity is the following, fail to explore the best nodes among the exponen- VI CONCLUSION 9

Alg. Cost Err. A∗ 77.31 0% 850 MCTS 82.48 6.7% Greedy 90.50 17.1% 800 Table 1: Comparison of several methods on the gen- erated scenario (휃 = 3). 750 cost

140 700

120 650 100 0 0.5 1 1.5 2 80 time(s) cost 60

40 A∗ Figure 3: Monte Carlo tree search called with varying MCTS time allowed to compute each step, with a depth of 20 Greedy 40 and 휃 = 11. The cost seems to be approximately stable from 푡 = 0.5s. 0 5 10 15 20 depth pute the mean over all computed costs. The resulting graphs are shown in figure 3 in which it can be seen that the algorithm reaches its best cost Figure 2: Cost difference for varying depth with 0.5 when it runs for 1 second per step, but the cost is seconds allowed to MCTS and 휃 = 3. approximately stable from 0.5 seconds.

tially growing amount of them. The figure 2 does not show a constant growth of error rate between MCTS VI Conclusion and A∗. As hoped, the MCTS remains always better than the greedy algorithm. The presented work provides the basis of a method able to create dynamic airspace overture schemes. V.C Converging speed The model is built around a quantification of the workload felt by a air trafic controller. This quan- Since the MCTS converges to a solution, one expects tification is then used to create a cost function tobe to know the time required to be close enough to the minimised. The highly combinatorial nature of the optimal solution. To have an idea of this lapse, sev- problem makes it prone to be solved with stochastic eral MCTS are run with different computing time. methods. The Monte Carlo tree search algorithm is This results in a graph mapping the time allowed to then introduced since it has been used in some of the compute each step of the MCTS to the cost of the most combinatorial problem (game of Go). In this final path. To get rid of the undesired varations due paper, the algorithm is modified to take into account to the stochastic nature of the algorithm, the latter not only a result but all the path leading to this leaf. process is run several times; which allows one to com- It is also adapted to the one-player game nature of REFERENCES 10

the problem. [7] Etienne Ferrari. Prévision des ouvertures de secteurs de contrôle aérien, 2017. Further work Since all the tests has been run us- [8] Romaric Gaudel and Michèle Sebag. Feature se- ing the same fictitious area with the same trafic, nei- lection as a one-player game. 06 2010. ther the influence of the complexity of the airspace nor of the trafic have been assessed. It would be thus [9] Sylvain Gelly, Levente Kocsis, Marc Schoenauer, relevant to carry further tests on bigger areas with Michele Sebag, David Silver, Csaba Szepesvári, heavier and differing trafic. This can be emphasised and Olivier Teytaud. The grand challenge of by [7], in which the time needed to compute the cost : Monte carlo tree search and exten- with A∗ depends heavily on the trafic itself. sions. Communications of the ACM, 55(3):106– More complex workload models can also be used 113, 2012. such as neural networks in [10]. [10] David Gianazza. Forecasting workload and airspace configuration with neural networks and tree search methods. Artificial intelligence, References 174(7-8):530–549, 2010. [1] Jean-Marc Alliot, Thomas Schiex, Pascal Bris- [11] Peter Jägare, Pierre Flener, and Justin Pearson. set, and Frédérick Garcia. Intelligence artificielle Airspace sectorisation using constraint-based lo- et informatique théorique. Cépadues, 2nd edi- cal search. In ATM 2013, June 10-13, Chicago, tion, 2002. IL. Federal Aviation Administration, 2013.

[2] Judicaël Bedouet, Thomas Dubot, and Luis Ba- [12] Levente Kocsis and Csaba Szepesvári. Bandit sora. Towards an operational sectorisation based based monte-carlo planning. In ECML, vol- on deterministic and stochastic partitioning al- ume 6, pages 282–293. Springer, 2006. gorithms. In The Sixth SESAR Innovation Days, [13] Marina Sergeeva, Daniel Delahaye, Catherine 2016. Mancel, and Andrija Vidosavljevic. Dynamic [3] Michael Bloem and Pramod Gupta. Configur- airspace configuration by genetic algorithm. ing airspace sectors with approximate dynamic Journal of traffic and transportation engineering programming. 2010. (English edition), 4(3):300–314, 2017. [14] Tambet Treimuth, Daniel Delahaye, and San- [4] Edward Browne, Cameron Band Powley, Daniel dra Ulrich Ngueveu. A branch-and-price al- Whitehouse, Simon M Lucas, Peter I Cowl- gorithm for dynamic sector configuration. In ing, Philipp Rohlfshagen, Stephen Tavener, 8th International Conferenceon Applied Opera- Diego Perez, Spyridon Samothrakis, and Simon tional Research (ICAOR), volume 8, pages 47– Colton. A survey of monte carlo tree search 53, 2016. methods. IEEE Transactionson Computational Intelligence and AI in games, 4(1):1–43, 2012.

[5] Bernd Brügmann. Monte carlo go. October 1993.

[6] Guillaume Chaslot, Mark Winands, H Herik, Jos Uiterwijk, and Bruno Bouzy. Progressive strate- gies for monte-carlo tree search. 04:343–357, 11 2008.

View publication stats