In-silico Behavior Discovery System: An Application of Planning in Ethology
Proceedings of the Twenty-Fifth International Conference on Automated Planning and Scheduling

Haibo Wang1, Hanna Kurniawati1, Surya Singh1,2, and Mandyam Srinivasan1,2
{h.wang10, hannakur, spns, [email protected]}
1 School of Information Technology and Electrical Engineering
2 Queensland Brain Institute
The University of Queensland, Brisbane, Australia

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

It is now widely accepted that a variety of interaction strategies in animals achieve optimal or near optimal performance. The challenge is in determining the performance criteria being optimized. A difficulty in overcoming this challenge is the need for a large body of observational data to delineate the hypotheses, which can be tedious and time consuming, if not impossible, to obtain. To alleviate this difficulty, we propose a system, termed "in-silico behavior discovery", that will enable ethologists to simultaneously compare and assess various hypotheses with much less observational data. Key to this system is the use of Partially Observable Markov Decision Processes (POMDPs) to generate an optimal strategy under a given hypothesis. POMDPs enable the system to take into account imperfect information about the animals' dynamics and their operating environment. Given multiple hypotheses and a set of preliminary observational data, our system will compute the optimal strategy under each hypothesis, generate a set of synthesized data for each optimal strategy, and then rank the hypotheses based on the similarity between the set of synthesized data generated under each hypothesis and the provided observational data. In particular, this paper considers the development of this approach for studying mid-air collision-avoidance strategies of honeybees. To perform a feasibility study, we test the system using 100 data sets of close encounters between two honeybees. Preliminary results are promising, indicating that the system independently identifies the same hypothesis (optical flow centering) as discovered by neurobiologists/ethologists.

Figure 1: In-silico and conventional approaches as considered for studying honeybee collision avoidance. (The in-silico panel shows an iterative loop: from initial trajectory data and a set of hypotheses, compute an optimal strategy and simulated trajectories for each hypothesis, rank the hypotheses when initial data is available, and design and conduct animal experiments for the subset of hypotheses whose simulated trajectories are closest to the animals' trajectories. The conventional panel shows a single pass: design and conduct an animal experiment, collect trajectory data, and infer the performance criteria from a hypothesis.)

Introduction

What are the underlying strategies that animals take when interacting with other animals? This is a fundamental question in ethology. Aside from human curiosity, the answer to such a question may hold the key to significant technological advances. For instance, understanding how birds avoid collisions may help develop more efficient collision avoidance techniques for Unmanned Aerial Vehicles (UAVs), while understanding how cheetahs hunt may help develop better conservation management programs.

Although it is now widely accepted that a variety of interaction strategies in animals have been shaped to achieve optimal or near optimal performance (Breed and Moore 2012; Davies and Krebs 2012), determining the exact performance criteria that are being optimized remains a challenge. Existing approaches require ethologists to infer the criteria being optimized from many observations of how the animals interact. These approaches present two main difficulties. First, the inference is hard to do, because even extremely different performance criteria may generate similar observed data under certain scenarios. Second, obtaining observational data is often difficult, because some interactions rarely occur. For instance, to understand collision avoidance strategies in insects and birds, it is necessary to observe many near-collision encounters, but such events are rare because these animals are very adept at avoiding collisions. Ethologists can resort to experiments that deliberately cause close-encounter events, but such experiments are tedious, time-consuming, and may not faithfully capture the properties of the natural environment. Furthermore, such experiments may not be possible for animals that have become extinct, such as dinosaurs, in which case ethologists can only rely on limited historical data, such as fossil tracks.

This paper presents our preliminary work in developing a system, termed "in-silico behavior discovery", to enable ethologists to study animals' strategies by simultaneously comparing and assessing various performance criteria on the basis of limited observational data. The difference between an approach using our system and conventional approaches is illustrated in Figure 1.
Our system takes as many hypotheses as the user chooses to posit, together with data from preliminary experiments. Preliminary data are collision-avoidance encounter scenarios, where each encounter consists of a set of flight trajectories of all honeybees involved in one collision-avoidance scenario. Each hypothesis is a performance criterion that may govern the honeybees' collision-avoidance strategies. For each hypothesis, the system computes an optimal collision-avoidance strategy and generates simulated collision-avoidance trajectories based on that strategy. Given the set of simulated collision-avoidance trajectories, the system ranks the hypotheses based on the similarity between the set of simulated trajectories and the preliminary trajectory data. By being able to generate simulated data under various hypotheses, the in-silico behavior discovery system enables ethologists to "extract" more information from the available data and better focus their subsequent data gathering effort, thereby reducing the amount of exploratory data required to find the right performance criteria that explain the collision avoidance behavior of honeybees. This iterative process is illustrated in Figure 1.
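As a rough illustration of the ranking step just described, the sketch below loops over candidate performance criteria, solves for a strategy under each, simulates encounters, and scores each hypothesis by how closely its simulated trajectories match the observed ones. The function names (solve_pomdp, simulate) and the mean Euclidean trajectory distance are illustrative placeholders, not the metric or solver used in the paper.

```python
import numpy as np

def trajectory_distance(sim_traj, obs_traj):
    """Hypothetical similarity measure: mean Euclidean distance between
    corresponding positions after truncating to a common length."""
    n = min(len(sim_traj), len(obs_traj))
    return float(np.mean(np.linalg.norm(sim_traj[:n] - obs_traj[:n], axis=1)))

def rank_hypotheses(hypotheses, observed_encounters, solve_pomdp, simulate):
    """For each hypothesis (a performance criterion given as a reward function),
    compute an optimal strategy, generate simulated encounters, and score the
    hypothesis by the mean distance to the observed trajectories."""
    scores = {}
    for name, reward_fn in hypotheses.items():
        policy = solve_pomdp(reward_fn)          # optimal strategy under this criterion
        simulated = [simulate(policy) for _ in observed_encounters]
        scores[name] = float(np.mean([trajectory_distance(s, o)
                                      for s, o in zip(simulated, observed_encounters)]))
    # Lower mean distance means the simulated behavior is closer to the real honeybees.
    return sorted(scores.items(), key=lambda kv: kv[1])
```

The resulting ranking would then guide which hypotheses are worth testing in the next round of animal experiments, closing the loop shown in Figure 1.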
Obviously, a key question is how to generate the collision-avoidance strategy under a given performance criterion. In our system, we use the Partially Observable Markov Decision Process (POMDP) framework. One may quickly argue that it is highly unlikely that an insect such as a bee runs a POMDP solver in its brain. This may be true, but the purpose of our system is not to mimic honeybee neurology. Rather, we use the widely accepted idea in biology that most interaction strategies in animals achieve optimal or near optimal performance to develop a tool that helps ethologists predict and visualize their hypotheses prior to conducting animal experiments to test them, thereby helping them to design more focused and fruitful animal experiments. In fact, POMDPs allow us to relax the need to model the exact flight dynamics and perception of the honeybees. No two animals are exactly alike, even when they are of the same species. This uniqueness causes variations in parameters that are critical to generating the strategies. For instance, some honeybees have better vision than others, enabling them to sense impending collisions more accurately and hence avoid collisions more often; different honeybees have different wing beat frequencies, causing varying manoeuvrability; and so on. Our system frames these variations as stochastic uncertainties, a modelling approach commonly used in analysing group behavior, and takes them into account when computing the optimal collision-avoidance strategy under a given performance criterion.

We tested the feasibility of our in-silico behavior discovery

Background: Partially Observable Markov Decision Processes

A POMDP models an agent acting under uncertainty about its own state. At each step, the agent is in some state s ∈ S, performs an action a ∈ A, and perceives an observation o ∈ O. A POMDP represents the uncertainty in the effect of performing an action as a conditional probability function, called the transition function, T = f(s' | s, a), with f(s' | s, a) representing the probability that the agent moves from state s to s' after performing action a. Uncertainty in sensing is represented as a conditional probability function Z = g(o | s', a), where g(o | s', a) represents the probability that the agent perceives observation o ∈ O after performing action a and ending at state s'.

At each step, a POMDP agent also receives a reward R(s, a) if it takes action a from state s. The agent's goal is to choose a sequence of actions that will maximize its expected total reward. Because the agent cannot observe its state directly, it maintains a belief b, a probability distribution over S, and its initial belief is denoted as b0. When the sequence of actions may have infinitely many steps, we specify a discount factor γ ∈ (0, 1), so that the total reward is finite and the problem is well defined.

The solution of a POMDP problem is an optimal policy that maximizes the agent's expected total reward. A policy π : B → A assigns an action a to each belief b ∈ B and induces a value function V(b, π), which specifies the expected total reward of executing policy π from belief b. The value function is computed as

    V(b, \pi) = E\left[ \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \mid b, \pi \right]    (1)

To execute a policy π, a POMDP agent repeatedly performs action selection and belief updating. Suppose the agent's current belief is b. It selects the action a = π(b), performs action a, and receives an observation o according to the observation function Z. Afterwards, the agent updates b to a new belief b' given by

    b'(s') = \tau(b, a, o) = \eta \, Z(s', a, o) \int_{s \in S} T(s, a, s') \, b(s) \, ds    (2)

where η is a normalization constant. A more detailed review of the POMDP framework is available in (Kaelbling, Littman, and Cassandra 1998).

Although computing the optimal policy is computationally intractable (Papadimitriou and Tsitsiklis 1987), results over the past decade have shown that trading optimality for approximate optimality and speed (Pineau, Gordon, and Thrun 2003; Smith and Simmons 2005; Kurniawati, Hsu, and Lee 2008; Silver and Veness 2010) lets POMDPs become practical for various real-world problems (Bai et al. 2012; Horowitz and Burdick 2013; Koval, Pollard, and Srinivasa 2014; Williams and Young 2007).
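To make Equations (1) and (2) concrete, the following minimal sketch implements a discrete-state analogue of the belief update and a Monte Carlo estimate of the value of a policy. It assumes tabular arrays T[s, a, s'], Z[s', a, o], and R[s, a] and a policy given as a function of the belief; these names and the truncated horizon are illustrative assumptions, not the solver used in the paper.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Discrete analogue of Eq. (2): b'(s') = eta * Z[s', a, o] * sum_s T[s, a, s'] * b[s]."""
    unnormalized = Z[:, a, o] * (b @ T[:, a, :])   # one entry per successor state s'
    return unnormalized / unnormalized.sum()       # division by the sum plays the role of eta

def estimate_value(b0, policy, T, Z, R, gamma=0.95, horizon=200, episodes=1000, rng=None):
    """Monte Carlo estimate of Eq. (1): expected discounted total reward of
    following `policy` (a function mapping beliefs to actions) from belief b0,
    with the infinite sum truncated at `horizon` steps."""
    rng = np.random.default_rng() if rng is None else rng
    num_states, _, num_obs = Z.shape
    total = 0.0
    for _ in range(episodes):
        b = b0.copy()
        s = rng.choice(num_states, p=b0)           # sample a true initial state
        episode_return = 0.0
        for t in range(horizon):
            a = policy(b)
            episode_return += (gamma ** t) * R[s, a]
            s_next = rng.choice(num_states, p=T[s, a, :])
            o = rng.choice(num_obs, p=Z[s_next, a, :])
            b = belief_update(b, a, o, T, Z)
            s = s_next
        total += episode_return
    return total / episodes
```

For the continuous honeybee dynamics considered in the paper, this kind of naive enumeration is not feasible, which is where the approximate POMDP solvers cited above would presumably come in.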