arXiv:1903.10410v4 [cs.NE] 22 Sep 2020

A Conceptual Bio-Inspired Framework for the Evolution of Artificial General Intelligence

Sidney Pontes-Filho∗† and Stefano Nichele∗‡
∗Department of Computer Science, Oslo Metropolitan University, Oslo, Norway
†Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
‡Department of Holistic Systems, Simula Metropolitan, Oslo, Norway
Email: [email protected], [email protected]

Abstract—In this work, a conceptual bio-inspired parallel and distributed learning framework for the emergence of general intelligence is proposed, where agents evolve through environmental rewards and learn throughout their lifetime without supervision, i.e., self-learning through embodiment. The self-learning mechanism for the control of the agents is a biologically plausible model based on spiking neural networks. Network topologies become more complex through evolution, while the synaptic weights of the networks cannot be inherited, i.e., the newborn brains of the agents have no innate knowledge of the environment and the controllers are not trained by the evolutionary process. What is passed through the generations, and is subject to evolution, is the network topology and the type of its neurons, which have the intrinsic ability to learn. This ensures that the agents adapt during their lifetime to the described mutable environments. We envision that the described approach may lead to the emergence of the simplest form of artificial general intelligence.

Index Terms—Bio-inspired AI, Self-learning, Embodiment, Conceptual framework, Artificial General Intelligence

I. INTRODUCTION

The brain is a truly remarkable computing machine that continuously adapts through its sensory inputs. Rewards and penalties are encoded and learned throughout the lifetime of organisms living in an environment that continuously provides unlabeled data (our mutable world). The real-world environment does not provide such organisms and their brains with predefined fitness functions, labeled data, or a reinforcement learning process, as supervised and reinforcement learning do in Artificial Intelligence (AI) systems; the brain is a product of evolution. However, organisms selected by natural or sexual selection [1] know or are able to learn which sensory inputs, or sequences of inputs, may affect their survival and reproduction positively or negatively. One of the key components that ensures the survival and reproduction of a species is that positive (or negative) sensory inputs act as rewards (or penalties). Pleasure, joy, and desire may act as rewards and increase the lifetime of an organism and its will to reproduce, while pain, fear, and disgust may act as penalties and decrease the lifetime. All those feelings and emotions act as an evolutionary pressure for increasing the life expectancy and for succeeding in generating offspring [2]. One example is that desire and disgust may arise from the smell of food: the desire for nutritive food increases the life expectancy, while the disgust may come from spoiled food which may cause food poisoning, therefore causing a lifetime reduction.
Evolution by natural selection made it possible for living beings to be "interpreters" of sensory inputs, attracted by rewards and repulsed by penalties, even though their first ancestors did not know what was beneficial or harmful in the surrounding environment.

Artificial General Intelligence (AGI), or strong AI, has been pursued for many years by researchers in many fields, e.g., with the goal of reproducing human-level intelligence in machines, including the ability of generalization and self-adaptation. So far, the AI scientific community has achieved outstanding results in specific tasks, i.e., weak or narrow AI. Such narrow AI implementations require highly specialized computing systems for high performance and for the training process. In this work, we propose to tackle the quest for AGI in its simplest form through evolution. It is therefore essential to develop a mutable environment that mimics what the living beings with the first and simplest nervous systems faced. We define the simplest form of artificial general intelligence as the ability of an organism to adapt in a continuously changing environment of continuously increasing complexity. Our definition of self-learning covers the meaning of self-supervised learning [3] and self-reinforcement learning [4]. Therefore, a self-learning agent is capable of interpreting the reactions of the surrounding environment affected by the agent's actions; the sensory inputs are then utilized to supervise or reinforce the agent itself. This occurs because the inputs contain sensory cues for the agent to learn on its own (as, for example, the authors experienced pain after touching a candle's flame in their childhood).

In this work, we propose the Neuroevolution of Artificial General Intelligence (NAGI) framework. NAGI uses a bio-inspired model of biologically plausible spiking neurons [5] that controls a sensory-motor system in a mutable environment. Evolution affects the structure of the evolved network, i.e., its neurons and their connections, their neurotransmitters (excitatory or inhibitory), and their local bio-inspired learning algorithms. The learning rules of cells with different plasticity are an important factor under evolutionary control to generate neural networks which may have a long-term associative memory [6]. Moreover, the genotype of NAGI's agents does not contain the synaptic strength (weights) of the connections, to avoid any innate knowledge about the environment. However, controllers that are selected for reproduction are those rewarded for their ability of self-learning and adaptation to new environments, i.e., an artificial newborn brain of an agent is an "empty box" with the innate ability to learn how to survive in different environments during its lifetime. That is somewhat similar to how humans have a specialized brain part for learning quickly any language when they are born [2].

The remainder of this paper is organized as follows. It provides background knowledge for the proposed NAGI framework, and then describes related works.
Afterward, it contains a detailed explanation of the conceptual framework. Finally, we conclude this work by discussing the relevance of our approach for current AGI research, and we elaborate on possible future works which may include such a novel bio-inspired parallel and distributed learning method.

II. BACKGROUND

The proposed NAGI framework brings together several key approaches in artificial intelligence, artificial life, and evolutionary robotics [7], briefly reviewed in this section. A Spiking Neural Network (SNN) is a type of artificial neural network that consists of biologically plausible neuron models [5]. Such neurons communicate with spikes, or binary values in time series. SNNs incorporate the concept of time by intrinsically modeling the membrane potential within each neuron. Neurons spike when the membrane potential reaches a certain threshold. When the signals propagate as neurotransmitters to the neighboring neurons, their membrane potentials are therefore affected, increasing or decreasing. While SNNs are able to learn through unsupervised methods, i.e., Hebbian learning [8] and Spike-Timing-Dependent Plasticity (STDP) [9], spike trains are not differentiable and cannot be trained efficiently through backpropagation. NeuroEvolution of Augmenting Topologies (NEAT) [10] is a method that uses a Genetic Algorithm (GA) [11] to grow the topology of a simple neural network and adjust the weights of the connections to optimize a target fitness function, while allowing to keep diversity (speciation) in the population and to maintain compatible gene crossover with historical marking. The unsupervised learning used to adapt the weights in the proposed NAGI framework includes the Hebbian learning rule as STDP. In particular, the weight adaptation of STDP happens when the neuron produces a spike, or action potential, through the axon (i.e., output connection). Such an event allows the modification of the synaptic strength of the dendrites (i.e., input connections) that caused or did not cause that spike.
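To make this mechanism concrete, the following minimal sketch (not part of the original NAGI specification; all class names and constants are illustrative assumptions) shows a leaky integrate-and-fire neuron whose input weights are adapted with a trace-based Hebbian STDP rule: a synapse that spiked shortly before the postsynaptic spike is potentiated, and one that spikes shortly after it is depressed.

import numpy as np

# Illustrative constants; the paper does not fix these values.
TAU_M, V_THRESH, V_REST = 20.0, 1.0, 0.0   # membrane time constant (ms), spike threshold, rest potential
TAU_PLUS, TAU_MINUS = 20.0, 20.0           # STDP trace time constants (ms)
A_PLUS, A_MINUS = 0.01, 0.012              # potentiation / depression amplitudes

class LIFNeuronWithSTDP:
    def __init__(self, n_inputs, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.uniform(0.1, 0.5, n_inputs)   # weights are randomly initialized, never inherited
        self.v = V_REST                             # membrane potential
        self.pre_trace = np.zeros(n_inputs)         # decaying memory of presynaptic spikes
        self.post_trace = 0.0                       # decaying memory of postsynaptic spikes

    def step(self, pre_spikes, dt=1.0):
        """pre_spikes: binary vector of presynaptic spikes arriving in this time step."""
        # Leaky integration of the weighted input spikes.
        self.v += dt * (-(self.v - V_REST) / TAU_M) + self.w @ pre_spikes
        # Decay the plasticity traces, then register the new presynaptic spikes.
        self.pre_trace *= np.exp(-dt / TAU_PLUS)
        self.post_trace *= np.exp(-dt / TAU_MINUS)
        self.pre_trace += pre_spikes
        fired = self.v >= V_THRESH
        if fired:
            self.v = V_REST
            self.post_trace += 1.0
            # Pre-before-post: potentiate the synapses that contributed to this spike.
            self.w += A_PLUS * self.pre_trace
        # Post-before-pre: depress synapses that spike after the neuron has fired.
        self.w -= A_MINUS * self.post_trace * pre_spikes
        self.w = np.clip(self.w, 0.0, 1.0)
        return fired

Anti-Hebbian or symmetric variants (see the neuroplasticity section below) would only change the signs and the symmetry of the two trace terms.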
Funes and Pollack [12] describe the body/brain interaction (sensors and actuators vs. controller) as a "chicken and egg" problem; the course of natural evolution shows a history of body, nervous system, and environment all evolving simultaneously in cooperation with, and in response to, each other [13]. Embodied evolution [14] is an evolutionary learning method for training agents through embodiment, i.e., embodied agents learning in an environment. Thus, in nature, general intelligence is a result of evolved self-learning through embodiment.

III. RELATED WORK

The idea of neuroevolution with adaptive synapses is not new. Stanley et al. [15] present NEAT with adaptive synapses using Hebbian local learning rules, with the goal of training neural networks for controlling agents in an environment. The authors verify the difference in performance with and without adaptation on the dangerous food foraging domain. The differences between their approach and ours are that their environment is static throughout the agent lifetime and that they have inheritance of synaptic strength. Their results show that both networks with and without adaptive synapses reach the maximum fitness on that domain, and therefore both present "adaptation". An extended version of the previous method is Adaptive Hypercube-based NEAT (HyperNEAT) [16]. Adaptive HyperNEAT includes indirect encoding of the network topology as large geometric patterns.

A recent work that uses NEAT and no weight inheritance is Weight Agnostic Neural Networks (WANN), introduced by Gaier and Ha [17]. WANN is tested successfully on different supervised and reinforcement learning tasks whose weights are randomly initialized. Such promising results demonstrate that the network topology is as important as the connection weights in an artificial neural network. This is one of the motivations for the development of the NAGI framework. WANN and NEAT with adaptive synapses have been shown to be successful methods. However, such methods miss an important component for self-learning, which is a mutable environment as proposed in this work.

A recent review of neuroevolution can be found in [18]; it shows how competitive NEAT and its extensions are in comparison to deep neural networks trained with gradient-based methods for reinforcement learning tasks. Neuroevolution provides several extensions, which include indirect encoding to allow scalability, novelty search to promote diversity, meta-learning for training a network to learn how to learn, and the combination with deep learning for searching deep neural network architectures. Furthermore, its authors envisage that neuroevolution will be a key factor to reach AGI through meta-learning and open-ended evolution. However, in NEAT the neural weights are inherited, so there is no explicit target for general intelligence and adaptation.

A framework for the neuroevolution of SNNs and topology growth with genetic algorithms is proposed by Schaffer [19], with the goal of pattern generation and sequence detection. Eskandari et al. [20] propose a similar framework for artificial creature control, where the evolutionary process modifies and inherits the network topology and the SNN weights to perform a given task.

A method which tries to produce general intelligence incrementally is PathNet [21], where deep neural network paths are selected through evolution to perform the forward propagation and weight adjustment. Such an evolving selection allows the network to learn new tasks faster by re-using frozen (previously learned) paths without catastrophic forgetting.

Another framework that tries to produce low-level general intelligence is described by Voss [22]. It is a functional proof-of-concept prototype, owned by the company Adaptive A.I. Inc., which can interact with the virtual and real world through sensors and actuators. Its controller, which has conceptual general intelligence capabilities, consists of a memory to save all data and to store the proprietary cognitive algorithms.

Multi-agent environments have also been considered a valuable stepping-stone towards AGI because the behavior of agents must adapt to cooperate and compete among them. One of the first examples of such a multi-agent environment is the PolyWorld ecological simulator, introduced by Yaeger [23]. PolyWorld is a simulated environment of randomly generated food where evolving artificial organisms controlled by neural networks with Hebbian learning live. Organisms are able to eat, mate, fight, move, change the field of view, and utilize body brightness as a form of emergent communication. Their emergent behaviors are to some extent similar to the ones found in nature. Another recent multi-agent environment is presented by Lowe et al. [24]. However, in such an environment, the adaptation occurs in the actor-critic methods of their reinforcement learning framework. Such a method outperforms traditional reinforcement learning approaches on competitive and cooperative multi-agent environments. Another reinforcement learning method which exploits multi-agent environments is introduced by Jaderberg et al. [25]. In their work, they use the environment of Quake III Arena Capture the Flag, a 3D first-person multiplayer game. Their method exceeds human-level performance in this game, therefore the artificial agents are able to cooperate and compete among themselves and even with human players. One work that provides an open-source competitive multi-agent environment for research purposes is Neural MMO [26]. Here, the agents are players which need to survive and prosper in an environment similar to the ones used in Massively Multiplayer Online Role-Playing Games (MMORPGs).

One important aspect of natural evolution is the ability to endlessly produce diverse solutions of increasing complexity, i.e., open-ended evolution (OOE). In contrast, OOE is difficult to achieve in artificial systems. A conceptual framework for the implementation of OOE in evolutionary systems is presented by Taylor [27]. Embodiment plays a key role in OOE in the context of the agent and its morphology, as discussed by Bongard [28]. For an articulated summary and discussion of OOE see [29]. In [30] the authors argue that open-ended evolution is a grand challenge for artificial general intelligence, and artificial life methods are promising starting points for the inclusion of OOE in AI systems.

IV. FRAMEWORK CONCEPT

The main concept of the proposed NAGI framework is to mimic as closely as possible the evolution of general intelligence in biological organisms, starting from the simplest form. To do that, we propose a minimalistic model with the following components. An agent is equipped with a randomly-initialized minimal spiking neural network. The agent is placed in a mutable environment in order to be able to generalize (learn to learn), instead of merely learning to solve the specific environment. Agents are more likely to survive if they perform correct actions. Agents have access to the environment through sensory inputs. The environment also provides intrinsic rewards and penalties. New agents inherit the topologies of the controllers from the previous generation (untrained), with the possibility of complexification (e.g., new neurons and synapses can appear through genetic operators). Training or learning happens throughout a generation. The goal of the untrained inherited controllers is to possess a topology that supports the ability to learn new environments. Neural learning occurs by utilizing environmental information (sensory input and environmental rewards/penalties) and neuroplasticity (e.g., Hebbian learning through spike-timing-dependent plasticity). The expected result is an unsupervised evolving system that learns without explicit training, in a self-learning manner through embodiment. In this section, the components of the conceptual framework are described in detail.

A. Data representation

The data that flow to and from the spiking neural networks that control the agents are encoded as a firing rate, i.e., the number of spikes per second. The firing rate has minimum and maximum values, and is represented for simplification as a real number between 0 and 1 (i.e., range [0.0, 1.0]). The stimulus to the neural networks can be Poisson-distributed spikes, which have irregular interspike intervals, as observed in the human cortex [31]. That representation can be used for encoding input from binary environments (e.g., binary numbers 0 and 1, or Boolean values False and True) or multi-value environments (e.g., represented as grayscale from black to white), and allows for representation of minimum and maximum activation values of sensors and actuators.
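As an illustration of this rate coding, the sketch below (an assumption-laden example rather than code from the paper; the maximum rate of 100 Hz and the 1 ms bins are arbitrary choices) converts a real value in [0.0, 1.0] into a Poisson spike train and decodes a spike train back into an estimated normalized firing rate.

import numpy as np

MAX_RATE_HZ = 100.0  # assumed maximum firing rate; the paper only requires some maximum

def encode_poisson(value, duration_ms=500, rng=None):
    """Encode a value in [0.0, 1.0] as a Poisson spike train with irregular interspike intervals."""
    rng = rng or np.random.default_rng()
    rate_hz = np.clip(value, 0.0, 1.0) * MAX_RATE_HZ
    p_spike_per_ms = rate_hz / 1000.0                      # spike probability in each 1 ms bin
    return (rng.random(duration_ms) < p_spike_per_ms).astype(np.uint8)

def decode_rate(spike_train):
    """Decode a spike train back to a normalized firing rate in [0.0, 1.0]."""
    rate_hz = 1000.0 * spike_train.sum() / len(spike_train)
    return rate_hz / MAX_RATE_HZ

# A black (0) or white (1) sensor reading, or any grayscale value, maps to a spike train:
white_spikes = encode_poisson(1.0)
gray_spikes = encode_poisson(0.5)
print(decode_rate(white_spikes), decode_rate(gray_spikes))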
B. Self-Learning through Embodiment

A new agent learns through the reactions of an environment via embodiment (i.e., by having a "body" that affects an environment while sensing it). As such, the input of the neural network controller includes reward and penalty information for the learning process, such as the collision sensor in an autonomous robot whose activation represents a penalty, and a reward otherwise. This feedback information is the key factor for achieving self-learning. The concept of self-adaptation is closely connected with embodied cognition, a core property of living beings [32]. In contrast, supervised learning and reinforcement learning use the error of the neural network output to globally adjust the network model through methods of iterative error reduction, such as gradient descent. In embodied learning, the input itself is used to adjust the agent's controller. Such sensory input contains the reactions of the environment to the actions of the agent.

In the proposed framework, the local learning rules of the spiking neural network controller are responsible to correct the global behavior of the network according to agent experiences. This learning approach is, therefore, a result of self-learning through embodiment. The framework overview is depicted in Fig. 1.
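The contrast with globally supervised training can be sketched as a minimal sensing-acting loop. The example below is only illustrative (the environment interface, the controller interface, and the two extra input channels are assumptions, not the paper's specification): reward and penalty arrive as ordinary inputs, and only the local plasticity inside the controller makes use of them.

import numpy as np

def run_episode(env, controller, steps=1000):
    """Embodied loop: the controller is adjusted only by what it senses, never by an external loss."""
    reward, penalty = 0.0, 0.0
    for _ in range(steps):
        sensors = env.read_sensors()                  # e.g., collision sensor, camera pixel, ...
        # Reward and penalty are appended to the sensory vector: they are just more input,
        # so the controller's local learning rules can react to them.
        x = np.concatenate([sensors, [reward, penalty]])
        action = controller.stimulate(x)              # spiking controller with local plasticity
        reward, penalty = env.act(action)             # the environment reacts to the action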

Note that self-learning through embodiment only works with agents in reactive environments (environments that affect the agents and are affected by them), such as any sensory-motor system deployed in the real world. Non-reactive environments, on the other hand, do not react to any action of the agent, like any image classifier or object detector which only gives environmental information; thus there is no mutual interaction between an agent and a non-reactive environment. Therefore, we propose to create a virtual reactive environment for such cases. Virtual Embodied Learning (VEL) is the proposed method for such cases, when no reward and penalty feedback is available through the sensory input. VEL adds reward and penalty inputs to a given sensory-motor system, as illustrated in Fig. 1. It can also include the internal states of the agents, such as hunger and health. In addition, VEL can substitute supervised and reinforcement learning by using the loss of the model as penalty input and the opposite of the loss as reward input.

C. Mutable environment

To truly exploit and assess the self-learning capabilities and the generalization of the evolving spiking neural network, a mutable environment is proposed. The evolutionary goal of agents is to survive the changes in their environment. In the real world, living organisms inherit modifications to their body and/or behavior through the generations. For example, a species may evolve a camouflage, such as the stick insects [33], and another one may evolve the appearance of a poisonous or venomous animal, such as the false coral snakes [34]. The proposed mutable environment is a simple metaphor of such examples.

Fig. 2a shows mutable environments that every agent in the population faces during its lifetime. Each agent has one sensor which provides one bit of information (i.e., black or white) and can perform two actions (i.e., eat or avoid). In each generation, the agents are presented with environmental data from several environments. Each sample is presented for a given period to allow the agents' controllers to learn. In the first environment, the correct action eat is associated with the white color while the action avoid is associated with black. Once the environmental data has been consumed by the agent, there is an abrupt change in the interpretation of the environment (black and white are flipped) and the agents are presented with the environmental data again. Agents that perform well in many environments within each generation are more likely to go through to the next generation.

Fig. 2b presents more complex mutable environments where agents have two sensors and non-binary environmental values can be received. As shown in the figure, different environments are procedurally generated and presented in each generation, where abrupt changes in the labeling of correct and wrong actions happen. The set of actions may also be expanded to more than two, with different effects on the agents' lifetime and their fitness scores.
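A minimal sketch of the one-bit food environment of Fig. 2a could look as follows (illustrative only; the class name, the flipping schedule, and the sampling are assumptions rather than the paper's specification):

import random

class MutableFoodEnvironment:
    """One-bit mutable environment: the correct action for each color flips between environments."""
    ACTIONS = ("eat", "avoid")

    def __init__(self, eat_color="white"):
        self.eat_color = eat_color                  # color currently associated with "eat"

    def sample(self):
        return random.choice(("black", "white"))    # one bit of sensory information

    def correct_action(self, color):
        return "eat" if color == self.eat_color else "avoid"

    def mutate(self):
        # Abrupt change: the interpretation of black and white is flipped.
        self.eat_color = "black" if self.eat_color == "white" else "white"

env = MutableFoodEnvironment()
color = env.sample()
print(color, env.correct_action(color))
env.mutate()                                        # same sensory data, new correct behavior
print(color, env.correct_action(color))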
D. Neuroplasticity

Each neuron in the evolved spiking neural network may have a different plasticity rule. The different types of learning rules are subject to evolutionary control. Examples of learning rules include asymmetric Hebbian, symmetric Hebbian, asymmetric anti-Hebbian, and symmetric anti-Hebbian [9]. Together with the Hebbian learning rules, the genome also encodes the effectiveness of the potentiation and the depression of the synapse strengths, i.e., how strong the learning rules are going to be for reducing or increasing the weight of the synapses. Moreover, other types of learning rules discovered in neuroscience may be added together or in parallel to those, such as non-Hebbian learning, neuromodulation, and synapse fatigue [35]-[37]. The neuroplasticity will also be regulated by a maximum total value of synaptic strength that a neuron can have for its dendrites. In case this value is reached, the increase in the weight of a synapse will cause a decrease of the others in the same neuron. This type of weight normalization is reported in [38], [39] for biological neurons.
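The cap on total synaptic strength described above can be sketched as a normalization step applied after every plasticity update (an illustration under an assumed cap value, not the paper's exact rule):

import numpy as np

W_TOTAL_MAX = 5.0  # assumed maximum total synaptic strength per neuron (over its dendrites)

def normalize_dendrites(weights):
    """If the summed weight of a neuron's input synapses exceeds the cap,
    scale all of them down so that strengthening one synapse weakens the others."""
    total = weights.sum()
    if total > W_TOTAL_MAX:
        weights *= W_TOTAL_MAX / total
    return weights

w = np.array([1.5, 2.0, 1.0, 1.2])
w[1] += 0.8                      # STDP potentiates one synapse...
w = normalize_dendrites(w)       # ...and the cap redistributes strength away from the others
print(w, w.sum())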

E. Neuroevolution

The population of genotypes (spiking neural network controllers) for the agents is evolved through a modification of NEAT [10]. The genotypes of NEAT describe the topology and the weights of the synapses, while our proposed method does not evolve the weights but instead includes in the genotype the type of neurotransmitter and the neuroplasticity rule [9]. The weights of the spiking neural networks are randomly initialized in every generation because the agents should not have innate knowledge of the environment [2]. Therefore, the proposed framework focuses on the self-learning capabilities of the agents. Their lifetime will be longer when agents perform correct actions and shorter when they perform wrong actions. The lifetime of agents is used as the fitness score to define the best performing neural networks.

Algorithm 1 explains how an agent's genome is evaluated while it is in a mutable environment during its lifetime. The fitness score for the agent is equal to the time the agent is alive until its death. Each agent has a maximum life expectancy. Such life expectancy is reduced faster when an agent receives a penalty and it is reduced slower when the agent receives a reward. Both penalties and rewards reduce the lifetime of agents, as one agent is not meant to live for an infinite amount of time even if it always performs the correct action.

The neuroevolution process allows the growth of neural network topologies and therefore the population is initialized with minimal networks that complexify over time. Nevertheless, there may be a penalty on lifetime to avoid the generation of big networks which may have neuron groups that specialize for each different environment. This, in turn, allows the network to learn how to forget the previous environment and then be able to adapt to the new one [40]. Another reason to apply this penalty for the size of the network is that more neurons require more energy to maintain. This reduction of lifetime caused by the number of neurons can be regulated by a parameter; choosing it is therefore of high importance to the fitness and lifetime of the agent.

Algorithm 1 Agent's genome evaluation using mutable environment and virtual embodied learning
1: procedure EVALUATE(genome)
2:   agent ← new Agent(genome)                           ⊲ Agent is initialized with untrained neural network
3:   lifetime ← 0
4:   while agent is alive do
5:     if dataset is empty then
6:       dataset ← getNextDatasetForCurrentGeneration()  ⊲ Next temporary environment of the current generation
7:     data, label ← getRandomSampleAndDeleteFrom(dataset)
8:     reward ← False                                    ⊲ Initialization of reward
9:     penalty ← False                                   ⊲ Initialization of penalty
10:    while (agent is learningSample) ∧ (agent is alive) do  ⊲ Agent learns the presented sample with neuroplasticity for a period of time
11:      lifetime ← lifetime + 1
12:      action ← agent(data, reward, penalty)
13:      reward ← action = label
14:      penalty ← action ≠ label
15:      agent.healthReduction(reward, penalty)          ⊲ Penalty reduces the agent's health faster than reward, thus accelerating its death
16:  return lifetime
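A minimal executable sketch of this evaluation loop is given below. It is an interpretation of Algorithm 1, not a reference implementation: the Agent stub, the health constants, and the sample period are all assumptions made for illustration.

import random

LEARNING_PERIOD = 200                 # time steps each sample is presented (assumed)
PENALTY_DECAY, REWARD_DECAY = 3, 1    # penalty drains health faster than reward (assumed values)

class Agent:
    """Stub standing in for an untrained spiking controller built from a genome."""
    def __init__(self, genome, max_health=10_000):
        self.genome, self.health = genome, max_health
    def alive(self):
        return self.health > 0
    def act(self, data, reward, penalty):
        return random.choice(("eat", "avoid"))   # a real agent would use its SNN with local plasticity
    def health_reduction(self, reward, penalty):
        self.health -= PENALTY_DECAY if penalty else REWARD_DECAY

def evaluate(genome, environments):
    agent, lifetime = Agent(genome), 0
    for env in environments:                     # each temporary environment of the current generation
        for data, label in env:                  # samples with their momentary correct action
            reward = penalty = False
            for _ in range(LEARNING_PERIOD):
                if not agent.alive():
                    return lifetime              # fitness = time alive until death
                lifetime += 1
                action = agent.act(data, reward, penalty)
                reward, penalty = action == label, action != label
                agent.health_reduction(reward, penalty)
    return lifetime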

[Fig. 1 block diagram, components: Environment, Data, Agent (Sensors, Self-learning controller, Actuators), Action, "Correct action?" (Yes → Reward, No → Penalty), Internal states.]
Fig. 1. Illustration of virtual embodied learning or self-learning through embodiment in a non-reactive environment. In the case of a reactive environment, rewards and penalties are embedded within the environmental data.
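Since Fig. 1 depicts VEL for non-reactive environments, a small sketch of how a supervised loss can be recast as VEL reward and penalty inputs may help (assumptions: the loss is normalized to [0, 1]; the function name is illustrative):

def vel_signals(loss, max_loss=1.0):
    """Map a model loss to Virtual Embodied Learning inputs: the normalized loss drives the
    penalty channel and its opposite drives the reward channel."""
    penalty = min(max(loss / max_loss, 0.0), 1.0)
    reward = 1.0 - penalty
    return reward, penalty

print(vel_signals(0.1))   # good prediction: high reward, low penalty
print(vel_signals(0.9))   # bad prediction: low reward, high penalty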



Fig. 2. Samples of mutable environments that can be presented to the agents through their evolution. Agents can execute two actions (eat or avoid). Within each generation, after the agents have seen all samples of an environment, a new one is presented. (a) 1D environment where the agent has one binary sensor. (b) The agent has two non-binary sensors (i.e., the axes).

V. DISCUSSION AND CONCLUSION

While current AI methods such as deep learning and reinforcement learning (and their combinations) have proven to be successful in solving a multitude of challenging tasks, e.g., defeating humans in the real-time strategy game StarCraft II [41], there is a lot of debate around the limitation of current methods for breakthroughs in Artificial General Intelligence. One key difference between AI and AGI is the learning ability. Most AI methods (supervised, unsupervised, and reinforcement learning) are explicitly trained, while AGI needs some intrinsic ability to self-learn.

One of the open questions for AGI research is: how can artificial agents acquire the general skill of learning, in order to continuously adapt throughout their lifetime? In biological systems, we infer that self-learning is a result of rewards and penalties which are embedded in the sensory data living beings receive from the environment (unlabeled and mutable data). Their ability to learn through this form of self-learning through embodiment is a result of evolution.

One of the goals of the proposed NAGI framework is an AGI system that allows adaptation and general learning skills through the three main levels of self-organization in living systems [42]:
• Phylogeny, which includes the evolution of genetic representations;
• Ontogeny, which takes care of the morphogenetic process (growth) from a single cell to a multicellular machine, by following the genotype instructions;
• Epigenesis, which allows the emergence of a learning system through an indirect encoding between genotype and phenotype, where the phenotype is subject to modifications (learning) throughout the lifetime while interacting with the environment.
We, therefore, envision that the proposed spiking neural network model will include developmental and morphogenetic processes [43] in future extensions of the framework.

Another envisioned stepping-stone to AGI is the extension of the framework to artificial life multi-agent systems. Multi-agent environments will allow the emergence of more advanced strategies of adaptation and learning based on collaboration and competition.
In addition, the framework may benefit from extending the environment itself into an evolving agent, which can also allow for increased complexity and open-ended evolution. Finally, we expect that future implementations of the NAGI framework and its extensions will be deployed/embodied into real robot agents equipped with physical sensors.

In conclusion, this work proposes a novel general framework for the neuroevolution of artificial general intelligence (NAGI) in its simplest form, which can be extended to more complex tasks and environments. In NAGI, the general intelligence, i.e., learning to learn to adapt to different environments, is a result of self-learning through embodiment. Therefore, the learning process is not a result of explicit training with supervision or reinforcement learning, as there is no loss function used to adjust the neural network weights. The proposed neural network model is a bio-inspired model based on spiking neural networks. Their learning is based on spike-timing-dependent plasticity, which uses only input data for learning. As such, penalties and rewards are embedded within the environmental data sensed by the agents.

This work describes the details of the NAGI conceptual framework as a novel paradigm for self-learning. Therefore, experimental results are not included in this contribution. However, our current experimental results are promising, and they are part of a separate contribution.

The NAGI conceptual framework proposes a computational system which may allow the simplest form of general intelligence observed in nature to emerge. Self-learning through embodiment shifts the way machines currently learn by changing the paradigm of supervised and reinforcement learning. Our efforts also aim to reduce the gap between biological neural network computation and artificial intelligence implementations, allowing for a biologically-inspired neural network model that suits the artificial life paradigm of massively parallel, distributed, and local interactions.

ACKNOWLEDGEMENTS

We thank Gustavo Moreno e Mello, Kristine Heiney, Anis Yazidi, and Benedikt Vogler for thoughtful comments and discussions. This work was supported by the Norwegian Research Council SOCRATES project (grant number 270961).

REFERENCES

[1] S. J. Arnold and M. J. Wade, "On the measurement of natural and sexual selection: theory," Evolution, vol. 38, no. 4, pp. 709–719, 1984.
[2] A. Zador, "A critique of pure learning: What artificial neural networks can learn from animal brains," bioRxiv preprint bioRxiv:582643, 2019.
[3] P. Sermanet, C. Lynch, Y. Chebotar, J. Hsu, E. Jang, S. Schaal, S. Levine, and G. Brain, "Time-contrastive networks: Self-supervised learning from video," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 1134–1141.
[4] B. Hilleli and R. El-Yaniv, "Toward deep reinforcement learning without a simulator: An autonomous steering example," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[5] E. M. Izhikevich, "Simple model of spiking neurons," IEEE Transactions on Neural Networks, vol. 14, no. 6, pp. 1569–1572, 2003.
[6] B. F. Grewe, J. Gründemann, L. J. Kitch, J. A. Lecoq, J. G. Parker, J. D. Marshall, M. C. Larkin, P. E. Jercog, F. Grenier, J. Z. Li et al., "Neural ensemble dynamics underlying a long-term associative memory," Nature, vol. 543, no. 7647, p. 670, 2017.
[7] S. Doncieux, N. Bredeche, J.-B. Mouret, and A. E. G. Eiben, "Evolutionary robotics: what, why, and where to," Frontiers in Robotics and AI, vol. 2, p. 4, 2015.
[8] D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory. New York: Wiley, Jun. 1949.
[9] Y. Li, Y. Zhong, J. Zhang, L. Xu, Q. Wang, H. Sun, H. Tong, X. Cheng, and X. Miao, "Activity-dependent synaptic plasticity of a chalcogenide electronic synapse for neuromorphic systems," Scientific Reports, vol. 4, p. 4906, 2014.
[10] K. O. Stanley and R. Miikkulainen, "Evolving neural networks through augmenting topologies," Evolutionary Computation, vol. 10, no. 2, pp. 99–127, 2002.
[11] J. H. Holland, "Genetic algorithms," Scientific American, vol. 267, no. 1, pp. 66–73, 1992.
[12] P. Funes and J. Pollack, "Evolutionary body building: Adaptive physical designs for robots," Artificial Life, vol. 4, no. 4, pp. 337–357, 1998.
[13] C. Mautner and R. K. Belew, "Evolving robot morphology and control," Artificial Life and Robotics, vol. 4, no. 3, pp. 130–136, 2000.
[14] R. A. Watson, S. G. Ficiei, and J. B. Pollack, "Embodied evolution: embodying an evolutionary algorithm in a population of robots," in Proceedings of the 1999 Congress on Evolutionary Computation (CEC99), vol. 1, July 1999, pp. 335–342.
[15] K. O. Stanley, B. D. Bryant, and R. Miikkulainen, "Evolving adaptive neural networks with and without adaptive synapses," in The 2003 Congress on Evolutionary Computation (CEC'03), vol. 4. IEEE, 2003, pp. 2557–2564.
[16] S. Risi and K. O. Stanley, "Indirectly encoding neural plasticity as a pattern of local rules," in International Conference on Simulation of Adaptive Behavior. Springer, 2010, pp. 533–543.
[17] A. Gaier and D. Ha, "Weight agnostic neural networks," in Advances in Neural Information Processing Systems, 2019, pp. 5365–5379.
[18] K. O. Stanley, J. Clune, J. Lehman, and R. Miikkulainen, "Designing neural networks through neuroevolution," Nature Machine Intelligence, vol. 1, no. 1, pp. 24–35, 2019. [Online]. Available: https://doi.org/10.1038/s42256-018-0006-z
[19] J. D. Schaffer, "Evolving spiking neural networks: A novel growth algorithm corrects the teacher," in 2015 IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), May 2015, pp. 1–8.
[20] E. Eskandari, A. Ahmadi, S. Gomar, M. Ahmadi, and M. Saif, "Evolving spiking neural networks of artificial creatures using genetic algorithm," in 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 2016, pp. 411–418.
[21] C. Fernando, D. Banarse, C. Blundell, Y. Zwols, D. Ha, A. A. Rusu, A. Pritzel, and D. Wierstra, "Pathnet: Evolution channels gradient descent in super neural networks," arXiv preprint arXiv:1701.08734, 2017.
[22] P. Voss, "Essentials of general intelligence: The direct path to artificial general intelligence," in Artificial General Intelligence. Springer, 2007, pp. 131–157.
[23] L. Yaeger, "Computational genetics, physiology, metabolism, neural systems, learning, vision, and behavior or PolyWorld: Life in a new context," in Santa Fe Institute Studies in the Sciences of Complexity, vol. 17. Addison-Wesley, 1994, pp. 263–263.
[24] R. Lowe, Y. Wu, A. Tamar, J. Harb, O. P. Abbeel, and I. Mordatch, "Multi-agent actor-critic for mixed cooperative-competitive environments," in Advances in Neural Information Processing Systems, 2017, pp. 6379–6390.
[25] M. Jaderberg, W. M. Czarnecki, I. Dunning, L. Marris, G. Lever, A. G. Castaneda, C. Beattie, N. C. Rabinowitz, A. S. Morcos, A. Ruderman et al., "Human-level performance in first-person multiplayer games with population-based deep reinforcement learning," arXiv preprint arXiv:1807.01281, 2018.
[26] J. Suarez, Y. Du, P. Isola, and I. Mordatch, "Neural MMO: A massively multiagent game environment for training and evaluating intelligent agents," arXiv preprint arXiv:1903.00784, 2019.
[27] T. Taylor, "Evolutionary innovations and where to find them: Routes to open-ended evolution in natural and artificial systems," Artificial Life, vol. 25, forthcoming, 2019.
[28] J. Bongard, N. Cheney, Z. Mahoor, and J. Powers, "The role of embodiment in open-ended evolution," OOE3: The Third Workshop on Open-Ended Evolution, 2018.
[29] T. Taylor, M. Bedau, A. Channon, D. Ackley, W. Banzhaf, G. Beslon, E. Dolson, T. Froese, S. Hickinbotham, T. Ikegami et al., "Open-ended evolution: perspectives from the OEE workshop in York," Artificial Life, vol. 22, no. 3, pp. 408–423, 2016.
[30] K. O. Stanley, J. Lehman, and L. Soros, "Open-endedness: The last grand challenge you've never heard of," O'Reilly, 2017.
[31] D. Heeger, "Poisson model of spike generation," Handout, Stanford University, vol. 5, pp. 1–13, 2000.
[32] L. Smith and M. Gasser, "The development of embodied cognition: Six lessons from babies," Artificial Life, vol. 11, no. 1-2, pp. 13–29, 2005.
[33] S. Lev-Yadun, A. Dafni, M. A. Flaishman, M. Inbar, I. Izhaki, G. Katzir, and G. Ne'eman, "Plant coloration undermines herbivorous insect camouflage," BioEssays, vol. 26, no. 10, pp. 1126–1130, 2004.
[34] T. M. Davidson and J. Eisner, "United States coral snakes," Wilderness & Environmental Medicine, vol. 7, no. 1, pp. 38–45, 1996.
[35] H. K. Kato, A. M. Watabe, and T. Manabe, "Non-Hebbian synaptic plasticity induced by repetitive postsynaptic action potentials," Journal of Neuroscience, vol. 29, no. 36, pp. 11153–11160, 2009.
[36] J. P. Johansen, L. Diaz-Mataix, H. Hamanaka, T. Ozawa, E. Ycu, J. Koivumaa, A. Kumar, M. Hou, K. Deisseroth, E. S. Boyden et al., "Hebbian and neuromodulatory mechanisms interact to trigger associative memory formation," Proceedings of the National Academy of Sciences, vol. 111, no. 51, pp. E5584–E5592, 2014.
[37] T. Abrahamsson, B. Gustafsson, and E. Hanse, "Synaptic fatigue at the naive perforant path–dentate granule cell synapse in the rat," The Journal of Physiology, vol. 569, no. 3, pp. 737–750, 2005.
[38] S. Royer and D. Paré, "Conservation of total synaptic weight through balanced synaptic depression and potentiation," Nature, vol. 422, no. 6931, p. 518, 2003.
[39] S. El-Boustani, J. P. K. Ip, V. Breton-Provencher, H. Okuno, H. Bito, and M. Sur, "Locally coordinated synaptic plasticity shapes cell-wide plasticity of visual cortex neurons in vivo," bioRxiv preprint bioRxiv:249706, 2018.
[40] A. S. Benjamin, Successful Remembering and Successful Forgetting: A Festschrift in Honor of Robert A. Bjork. Psychology Press, 2011.
[41] O. Vinyals, I. Babuschkin, J. Chung, M. Mathieu, M. Jaderberg, W. M. Czarnecki, A. Dudzik, A. Huang, P. Georgiev, R. Powell, T. Ewalds, D. Horgan, M. Kroiss, I. Danihelka, J. Agapiou, J. Oh, V. Dalibard, D. Choi, L. Sifre, Y. Sulsky, S. Vezhnevets, J. Molloy, T. Cai, D. Budden, T. Paine, C. Gulcehre, Z. Wang, T. Pfaff, T. Pohlen, Y. Wu, D. Yogatama, J. Cohen, K. McKinney, O. Smith, T. Schaul, T. Lillicrap, C. Apps, K. Kavukcuoglu, D. Hassabis, and D. Silver, "AlphaStar: Mastering the real-time strategy game StarCraft II," https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/, 2019.
[42] M. Sipper, E. Sanchez, D. Mange, M. Tomassini, A. Pérez-Uribe, and A. Stauffer, "A phylogenetic, ontogenetic, and epigenetic view of bio-inspired hardware systems," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 83–97, 1997.
[43] R. Doursat, H. Sayama, and O. Michel, Morphogenetic Engineering: Toward Programmable Complex Systems. Springer, 2012.
