Evolving Behaviour Trees for Supervisory Control of Robot Swarms
Elliott Hogg¹†, Sabine Hauert¹, David Harvey, Arthur Richards¹
¹Bristol Robotics Laboratory, University of Bristol, UK.
E-mail: [email protected], [email protected]
† Elliott Hogg is the presenter of this paper.

Abstract: Supervisory control of swarms is essential to their deployment in real-world scenarios, both to monitor their operation and to provide guidance. We explore mechanisms by which humans might provide supervisory control to swarms and improve their performance. Rather than have humans guess the correct form of supervisory control, we use artificial evolution to learn effective human-readable strategies that aid human operators' understanding and performance when controlling swarms. These strategies can be thoroughly tested and can provide knowledge to be used in the future in a variety of scenarios. Behaviour trees are applied to represent human-readable decision strategies which are produced through evolution. We investigate a simulated set of scenarios where a swarm of robots has to explore varying environments and reach sets of objectives. Effective supervisory control strategies are evolved to explore each environment using different local swarm behaviours. The evolved behaviour trees are animated alongside swarm simulations to enable clear understanding of the supervisory strategies. We conclude by identifying the strengths in accelerated testing and the benefits of this approach for scenario exploration and training of human operators.

Keywords: Human Swarm Interaction; Swarm; Artificial Evolution; Supervisory Control; Behaviour Trees

1. INTRODUCTION

Growing interest in the use of swarm systems has led to new questions regarding effective design for real-world environments [5]. Although their use has great potential in areas ranging from search and rescue to automated agriculture, there are still few examples of real-world deployment. Human supervision has been proposed to improve the performance of swarming by maintaining the scalability and robustness of swarms [4] whilst taking advantage of human intelligence and understanding.

Human Swarm Interaction (HSI) aims to adapt swarm models into hybrid systems which operate with the aid of a human operator. The operator can compensate for the limitations of swarming behaviours and increase performance by interacting with the swarm. The fundamental assumption is that the operator has information regarding the task that the swarm cannot perceive. Based on this knowledge, the operator can identify when the swarm behaviour requires adjustment. This overcomes the swarm's limited capability to understand its environment and the nuances of a given task. This control scheme allows the human to correct for poor performance, and research has highlighted significant improvements over purely autonomous swarms [1, 8].

In HSI, research has investigated ways in which an operator can convey high-level knowledge to the swarm to improve performance. Initial answers have defined methods to directly control how the swarm behaves. If the operator has information that the swarm cannot perceive, then the operator can take action to account for the swarm's lack of knowledge. These forms of control have been categorized into four types, as discussed in Kolling's survey on HSI [14]:

• Algorithmic: Changing the emergent behaviour of the swarm [1, 13].
• Parametric: Adjusting the characteristics of an emergent behaviour [6, 12].
• Environmental: The operator highlights points of interest within an environment [8, 26].
• Agent Control: The operator can take control of an agent and perform manual movements to influence its neighbors [22, 23].

Each control method has shown the ability to positively affect the swarm's performance in varying scenarios; however, it is difficult to identify the best choice of control method for a given scenario. We need to consider how the type of application and conditions should influence the way we choose to control the swarm. Other factors that also need consideration include cognitive limitations of the operator [16], the optimal timing for human interaction [20], the operator's knowledge of swarm dynamics [11], and how the level of interaction affects the robustness of the swarm [25]. Due to the high dimensionality of these scenarios, it is difficult to understand how HSI systems should be designed.

There is currently no way to explore supervisory control of swarms systematically. HSI studies are currently conducted using human trials. This is beneficial for understanding how humans behave when learning to control swarms; however, here we focus on exploring the broader context of swarm control and how to systematically design HSI systems. Rather than having the human guess the rules, we propose artificial evolution of decision-making to act as a surrogate for the human. In this investigation we generate a swarm supervisor which can view and control the swarm at accelerated simulation speed. Using this approach we can exploit the ability to test rapidly in different simulated conditions to gain a better understanding of HSI strategies. Solutions can be used to inform the training of human operators and our knowledge of swarm control, and we therefore consider the viability of this approach for scenario exploration within HSI.

The following section will discuss the background of the methodologies that have been used. Section 3 then discusses a subset of swarming scenarios that we have explored and how artificial evolution has been applied to produce control strategies. We then analyse and discuss our findings in sections 4 and 5.

2. BACKGROUND

Behaviour tree (BT) models have been chosen for their human readability and modularity. Trees can be modified and reordered without any need to redesign individual nodes; this is useful for artificial evolution, where structures can be modified through crossover and mutation. Examples of applications are given in [10, 17]. The human readability of BTs also enables us to represent strategies in an easy-to-understand form when presented to human operators for training.

2.1. Behaviour Trees

A BT is a hierarchical model which consists of actions, conditions, and operators connected by directed edges [3, 21]. BTs can represent many decision-making systems, such as finite state automata and subsumption architectures. An example tree is shown in figure 1. These trees are modular and can be constructed from a set of action and condition functions which are independent of order. This makes them well suited to artificial evolution.

Fig. 1 Example behaviour tree. Numbering shows the order in which the tree is ticked.

A BT is executed in a particular sequence; each step is referred to as a tick, which occurs at periodic time steps. The tree is ticked starting from the root of the tree, passing down from parent to child. When the tick arrives at a leaf (a node with no children), which can either be an action or condition node, a computation is made which will return either success, failure or running. The value returned then passes back up the tree depending on the structure of the tree. This process is shown in figure 1. Trees are ticked depth first and from left to right; hence, nodes on the left have higher priority.

The operator nodes can be thought of as logical gates which tick their children in steps. There are selectors, sequences, and parallel operators. When an action node is ticked, it returns success if the computation has been completed and failure if it could not be completed. For a condition node, the return value is dependent on the condition statement. Refer to Colledanchise and Ögren's book for greater coverage of behaviour tree concepts [3].

2.2. Genetic Programming of Behaviour Trees

Genetic programming (GP) of behaviour trees has been performed for a variety of applications [9, 17, 18]. The aim of GP is to evolve the structure of computer programs represented in the form of hierarchies [15]. We apply common practices in GP to evolve behaviour trees in this investigation. In GP the genome is the complete structure of the computer program. Because of this, we apply evolutionary operators which account for this structure. The modular nature of BTs allows us to modify the structure of a tree and still produce a valid executable BT. This makes them applicable to evolution, which is reliant on random manipulation of genomes. We apply three types of operator: single-point crossover, single-node mutation, and sub-tree growth. These techniques allow us to modify and combine genomes to explore different solutions.

3. METHODOLOGY

This section details how artificial evolution has been applied to develop swarm control strategies for a search and rescue (SAR) scenario. In this work, we use BTs to represent a particular strategy to control the swarm that is evolved to increase task performance.

3.1. Simulation Architecture

We use our own custom simulator programmed in Python. The simulator provides 2-D environments in which a scenario, the swarm size, and areas of interest to explore are specified. The agents are modelled as particles which can move in any direction with a degree of random motion, and thus the swarm's behaviour is not deterministic. Each agent follows a vector trajectory which is dependent on the emergent behaviour of the swarm. Potential field forces act as additional vectors to avoid obstacles and the bounds of the environment. The agents travel at a speed of 0.5 m/s. This system has been deployed on a super-computer cluster, which has allowed us to evaluate many scenarios in parallel. This is beneficial as we can rapidly assess new conditions simultaneously.

3.2. Scenario

A swarm of ten agents is tasked to search an environment to find a set of objectives as quickly as possible. We investigate performance over a range of environments and