Evolving Autonomous Agent Controllers as Analytical Mathematical Models

Paul Grouchy and Gabriele M.T. D'Eleuterio
University of Toronto Institute for Aerospace Studies, Toronto, Ontario, Canada M3H 5T6
[email protected]

Abstract

A novel Artificial Life paradigm is proposed where autonomous agents are controlled via genetically-encoded Evolvable Mathematical Models (EMMs). Agent/environment inputs are mapped to agent outputs via equation trees which are evolved using Genetic Programming. Equations use only the four basic mathematical operators: addition, subtraction, multiplication and division. Experiments on the discrete Double-T Maze with Homing problem are performed; the source code has been made available. Results demonstrate that autonomous controllers with learning capabilities can be evolved as analytical mathematical models of behavior, and that neuroplasticity and neuromodulation can emerge within this paradigm without having these special functionalities specified a priori.

Introduction

When looking to design an Artificial Intelligence (AI) capable of robust and adaptable behavior, one must first choose how to represent such an agent in silico. Although many such representations exist, two of the most common are direct computer code representations, which can be programmed autonomously via Genetic Programming (GP) (Koza, 1992; Poli et al., 2008), and Artificial Neural Networks (ANNs), which are computational models based on the biological neural networks of animal brains that can be designed using one of a variety of approaches (Floreano et al., 2008).

Both of these representations require significant a priori design decisions. For a GP approach, one must choose which programming operations to include (e.g., bit-shift, and, or, xor, if-then-else, etc.), with a trade-off between potential program capabilities and search space size (and thus the ability of GP to find a solution). With ANNs, one must choose from many different types (e.g., Feedforward, Recurrent, Continuous-Time, Spiking, etc.), with trade-offs that have not yet been systematically investigated (Floreano et al., 2008). Other important design decisions for ANNs include the type of activation function and the incorporation of learning capabilities. Thus, to arrive at an agent representation appropriate for their task, a designer must first invest a significant amount of time and computational resources. Furthermore, any such design decisions introduce additional experimenter bias into the simulation. Finally, complex behaviors require complex representations, adding to the already high computational costs of ALife algorithms while further obfuscating the inner workings of the evolved agents.

We propose Evolvable Mathematical Models (EMMs), a novel paradigm whereby autonomous agents are evolved via GP as analytical mathematical models of behavior. Agent inputs are mapped to outputs via a system of genetically encoded equations. Only the four basic mathematical operations (addition, subtraction, multiplication and division) are required, as they can be used to approximate any analytic function, thus eliminating the need to determine which operators to employ. More sophisticated operations can of course be added at the will of the designer; however, theoretically the four basic ones constitute the minimal set of operations required and accordingly the designer can be relieved of much of the guessing game. Furthermore, since agents are represented directly as mathematical equations, the evolved agents are amenable to mathematical analysis post facto.

The rest of this paper is organized as follows. In the next section, previous work on evolving ANNs with learning behaviors and on Genetic Programming for agent control is presented. This is followed by the algorithmic details of EMMs. A Double-T Maze problem that requires learning capabilities is then described, with the results of EMM experiments in this domain presented in the subsequent section. The source code for these experiments is provided. Finally, conclusions and future work will be discussed.
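To make the approximation claim concrete, here is a standard illustration (ours, not an example from the paper): an analytic function can be approximated near a point by a truncated Taylor series or by a Padé approximant, both of which use only the four basic operators. For the exponential function near $x = 0$,

$$ e^{x} \approx 1 + x + \frac{x^{2}}{2} + \frac{x^{3}}{6} \qquad \text{or} \qquad e^{x} \approx \frac{2 + x}{2 - x}, $$

where the second expression is the [1/1] Padé approximant and shows how division extends the reach of a fixed number of operations.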
Background

Neuroplasticity and Neuromodulation

Autonomous agents are often evolved as artificial neural networks (ANNs). Typically, when one is evaluating an ANN, either in simulation or hardware, the ANN's connection weights are fixed. It is only when generating offspring genomes that genetic operators such as mutation can modify the ANN's weights. This is contrary to biological neural networks, however, where "neuroplasticity" allows for connections between neurons to change during the lifespan of an organism (Pascual-Leone et al., 2005). Neuroplasticity (or "plasticity" for short) is what enables biological organisms to learn, modifying the way they react to certain inputs from the environment.

Hebbian learning, which is based on how learning is thought to occur in biology (Hebb, 1949), can be used to implement plasticity in ANNs (Floreano and Mondada, 1996). In this paradigm, local learning rules $\Delta w_{ij}$ are evolved for each network connection weight $w_{ij}$. After each timestep $t$ of an ANN's lifespan, each of its connection weights is updated using its associated learning rule

$$ w_{ij}^{t+\Delta t} = w_{ij}^{t} + \eta\, \Delta w_{ij}^{t+\Delta t}, \qquad (1) $$

where $0 \le \eta \le 1$ is the evolvable learning rate. An example learning rule $\Delta w_{ij}$ is the "plain Hebb rule," defined as

$$ \Delta w_{ij}^{t+\Delta t} = \left(1 - w_{ij}^{t}\right) v_{i}^{t} v_{j}^{t}, \qquad (2) $$

where $v_{i}^{t}$ is the output of neuron $i$ at the current timestep.

In the learning paradigm described above, connection weights are adjusted at every timestep throughout the lifetime of an agent. This differs from biological systems, which are theorized to use "neuromodulation" to control and stabilize learning (Bailey et al., 2000).

To improve learning algorithm performance, the Analog Genetic Encoding (AGE) algorithm, an encoding method based on biological gene regulatory networks that can evolve both the connection weights and structure of an ANN (Mattiussi and Floreano, 2004; Dürr et al., 2006), was modified to allow for Hebbian learning which could be enabled and disabled via neuromodulatory signals (Soltoggio et al., 2007; Dürr et al., 2008; Soltoggio et al., 2008). In this approach, a generalized Hebbian rule from (Niv et al., 2002) was modified to include a modulatory signal $m$:

$$ \Delta w_{ij}^{t+\Delta t} = m^{t} \eta \left( A v_{i}^{t} v_{j}^{t} + B v_{i}^{t} + C v_{j}^{t} + D \right), \qquad (3) $$

where $A$, $B$, $C$ and $D$ are evolvable parameters that determine the importance of the different types of Hebbian learning and $0 \le m^{t} \le 1$ is the current strength of the signal produced by one or more special modulatory neurons. These modulatory neurons operate in a manner similar to regular neurons, taking inputs from other neurons in the network through network connections and generating their output value $m$ using an activation function.
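The following minimal sketch shows how updates (1)–(3) are applied to a single scalar connection weight (the function and variable names are ours for illustration; they are not taken from the paper or its source code):

```python
def plain_hebb(w, v_i, v_j):
    # Eq. (2): "plain Hebb" rule. The (1 - w) factor saturates the
    # update, so the weight cannot grow without bound as correlated
    # activity repeats.
    return (1.0 - w) * v_i * v_j

def modulated_hebb(v_i, v_j, m, eta, A, B, C, D):
    # Eq. (3): generalized Hebbian rule gated by a modulatory signal
    # 0 <= m <= 1. Note that eta and m are already folded into the
    # increment here, so it is added to the weight directly.
    return m * eta * (A * v_i * v_j + B * v_i + C * v_j + D)

w, eta = 0.1, 0.05        # one connection weight w_ij, learning rate
v_i, v_j = 0.8, 0.6       # neuron outputs at the current timestep

# Eq. (1): per-timestep update with the evolvable learning rate eta.
w = w + eta * plain_hebb(w, v_i, v_j)

# With m = 0, rule (3) disables learning entirely for this timestep.
w = w + modulated_hebb(v_i, v_j, m=0.0, eta=eta, A=1.0, B=0.0, C=0.0, D=0.0)
```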
Fixed-Weight Learning

Learning behaviors have been observed in several fixed-weight ANN-based experiments as well. Fixed-weight recurrent neural networks seem to be able to accomplish certain tasks requiring learning, as the recurrent neural connections can act as a type of memory (e.g., (Stanley et al., 2003; Soltoggio et al., 2008)). Continuous-time recurrent neural networks (CTRNNs) have also demonstrated learning capabilities, sometimes even outperforming plastic neural networks (Jesper and Floreano, 2002; Tuci and Quinn, 2003).

Genetic Programming for Agent Control

Genetic Programming (GP) can be used to evolve autonomous controllers as computer programs. In the first such experiment (Koza and Rice, 1992), controllers were represented as variable-length trees containing sensor inputs and four preprogrammed macros, such as "if-less-than-or-equal." Programs were evolved that could control a simulated robot to find a box in an irregularly shaped world and push it to a wall from four different starting configurations.

A linear implementation of GP was used to evolve machine code to control a Khepera robot in (Nordin and Banzhaf, 1995, 1997). This work was recently extended in (Burbidge et al., 2009; Burbidge and Wilson, 2014), where machine code was evolved using Grammatical Evolution. Here, agent genomes are binary strings that are mapped to their machine-code phenotypes via a prespecified generative grammar.

GP has also been used to evolve competitors for the RoboCup robotic soccer tournament. The team "Darwin United" was evolved using GP with a variety of possible operations, including basic mathematical operators, reading from and writing to memory locations, and executing a variety of programmer-designed subroutines (Andre and Teller, 1999). A simple GP approach was also used to evolve robot goalie behaviors in (Adorni et al., 1999). That work is the most similar to our Evolvable Mathematical Models paradigm presented below, as it evolves mathematical equations as trees for robot control. However, those experiments were performed on a relatively simple task (especially considering the amount of information provided to the controller), had only one equation tree/agent output, employed two additional operators (sine and cosine), and did not allow for extra state variables/equation trees to be evolved.

Evolvable Mathematical Models

Our Evolvable Mathematical Models (EMM) algorithm uses Genetic Programming (GP) to evolve mathematical models of behavior. An earlier version of this algorithm that evolved one ODE per agent output (i.e., there were no extra state variables) was presented in (Grouchy and D'Eleuterio, 2010) and implemented in (Grouchy and Lipson, 2012). The core idea is that one can evolve autonomous agent controllers as mathematical equations that map from agent inputs to outputs.

Representation

An EMM-based agent is represented as a system of equations, with one equation for each of the $N$ experimenter-defined outputs $v_i$ in the simulation.
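The paper's exact genome layout is not given in this excerpt, so the following is only a sketch of the kind of representation described: one equation tree per output, with internal nodes drawn from the four basic operators and leaves that are constants or agent inputs (the node encoding and the protected-division convention are our assumptions):

```python
import operator

# Internal nodes use only the four basic operators.
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def evaluate(node, inputs):
    """Recursively evaluate an equation tree.

    A node is a float constant, a string naming an agent input, or a
    tuple (op, left, right) with op in OPS.
    """
    if isinstance(node, (int, float)):
        return float(node)
    if isinstance(node, str):
        return inputs[node]
    op, left, right = node
    lhs, rhs = evaluate(left, inputs), evaluate(right, inputs)
    if op == '/' and rhs == 0.0:
        return 1.0  # protected division, a common GP convention
    return OPS[op](lhs, rhs)

# A two-output agent: v_1 = x1 * (x2 - 0.5), v_2 = x1 / (x2 + 1.0).
agent = [('*', 'x1', ('-', 'x2', 0.5)),
         ('/', 'x1', ('+', 'x2', 1.0))]
outputs = [evaluate(tree, {'x1': 0.3, 'x2': 0.9}) for tree in agent]
```

Mutation and crossover would then operate on these trees as in standard GP, for example by replacing or swapping randomly chosen subtrees.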