The Effect of Social Learning on Individual Learning and Evolution
Total Page:16
File Type:pdf, Size:1020Kb
The Effect of Social Learning on Individual Learning and Evolution Chris Marriott1 and Jobran Chebib2 1University of Washington, Tacoma, WA, USA 98402 2University of Calgary, Calgary, AB, Canada T2N 1N4 [email protected] Abstract individual learning strategies (see Denaro and Parisi (1997); Acerbi and Parisi (2006); Marriott et al. (2010)) and most We consider the effects of social learning on the individual notably is its ability to support cumulative cultural evolu- learning and genetic evolution of a colony of artificial agents tion (Mesoudi et al. (2006)) as witnessed in human culture. capable of genetic, individual and social modes of adaptation. We confirm that there is strong selection pressure to acquire While a wide range of mechanisms of learning have been traits of individual learning and social learning when these are studied in conjunction with evolution we have found little adaptive traits. We show that selection pressure for learning evidence of the study of social learning mechanisms on evo- of either kind can supress selection pressure for reproduction lution. Further, since social learning is a distinct form of or greater fitness. We show that social learning differs from learning identified by its social as opposed to individual na- individual learning in that it can support a second evolution- ary system that is decoupled from the biological evolution- ture, it may also interact with the individual learning pro- ary system. This decoupling leads to an emergent interaction cesses. In this study we explore the interactions between where immature agents are more likely to engage in learning social learning, individual learning and genetic adaptation. activities than mature agents. We have designed a model that allows for evolution of re- productive, individual learning and social learning abilities Introduction in conjunction with traits for fitness. These abilities are not just on/off for our agents but instead can be participated in When agents possess both genetic adaptation operating on a more or less than others. The abilities also come at a cost to generational time scale and learning operating on the agent’s the agent (in time) which limits the maximum fitness achiev- time scale there is the potential for interaction between able by the agent. these adaptive mechanisms. Sznajder et al. (2012) have sur- The model we have developed allows for the genetic in- veyed studies into this interaction and show conditions under formation and the learned information to be expressed in the which the presence of learning can accelerate or decelerate same format, allowing for direct comparison between ge- genetic adaptation. netic adaptation and the individual and social learning pro- In particular Paenke et al. (2007) found that if the fitness cesses. function in the direction of adaptation is concave and the step size of the learning algorithm is small genetic adap- Experimental Setup tation can occur at an increased rate. This acceleratory effect was seen in a number of different simulations (for Learning Task Our simulation involves agents that gather arXiv:1406.2720v1 [cs.AI] 10 Jun 2014 instance Hinton and Nowlan (1987); Fontanari and Meir resources from a number of different resource sites. Each (1990); Mayley (1997); Lande (2009)). site has five locations where resources might be found. Only On the other hand if this condition is not upheld, or indeed one, two, or three of the locations actually have resources the opposite conditions exists, then the learning can slow while the others are empty. We call this value the reward the natural genetic adaptation (see Papaj (1994); Ander- of the site. Say an agent is at a resource site with three re- son (1995); Dopazo et al. (2000); Borenstein et al. (2006)). sources hidden in the five locations. The agent must select Some simulations have shown both acceleratory and de- the order it will check the locations. Once the agent has celeratory effects under different conditions (Ancel (2000); all three resources it can stop checking, and the cost of this Paenke et al. (2006, 2007)). gather attempt is the number of locations checked. This cost Social learning is a form of learning that arises from so- is measured in time units. cial situatedness (Lindblom and Ziemke (2003)) and is char- Agents will have to select which sites to gather from each acterized by agents interacting with one another in order to day and what order to check locations at those sites. The learn. Social learning can accelerate learning beyond that of resource sites may not be repeated in a day, but can be re- peated again the next day. The next day the resource site will list of actions. Its daily reward is the total number of re- have the same number of resources and they will be stored sources these actions accrue. at the same locations. This allows for the agent to adapt its A new agent is created from a parent selected randomly strategy on future days. but not uniformly from the population. An agent’s chance More formally we can think of a resource site as a subset of being selected is equal to the number of breeding tick- of f1; 2; 3; 4; 5g with size equal to the reward. For instance ets it has. It is rewarded one breeding ticket for every five with reward equal to 3 the resource site might be the set resources it gathered today and one for each time unit the f1; 2; 4g. An attempt at this site by the agent is character- agent spent on reproducing today. So agents better at gath- ized by a permutation of the set f1; 2; 3; 4; 5g, for instance, ering and who spend more time on reproduction are more [5; 4; 2; 1; 3]. The cost to the agent (in time units) at this site likely to be a parent. The new agent’s genome is a mutated is equal to the number of entries in the permutation that must clone of its parent’s genome. be processed before the subset representing the site has been When an agent is born the agent with least fitness dies covered by the elements of the permutation. In our example to maintain a constant population size. An agent’s fitness it requires the first 4 elements of [5; 4; 2; 1; 3] to cover the is equal to the average daily reward during its life minus its set f1; 2; 4g resulting in a cost of 4 time units spent to gather age. As a result agents that die are either poorer at gathering, 3 resources. Notice the maximum cost is always 5 and the older, or both. minimum cost is equal to the reward. During a day an agent may spend time learning. For each An agent is allocated 50 time units in a day and may occurrence of an individual learning action in its daily action gather from as many sites as it has time units. Thus opti- list the agent will mutate its memome once. This activity mally an agent can gather 50 rewards from resource sites in will change the memome but leaves the genome unchanged. one day if all time units are dedicated to gathering. An agent The new memome is used to select the next day’s actions. may also spend its time units on activities that may increase An agent is selected for participation in a social gathering its long term fitness. Specifically it may spend units attempt- with probability proportional to the number of time units ing to reproduce, learn individually, or learn socially. These spent on socially learning. The selected agents then con- tasks each take one time unit and may be taken more than tribute their best five memeplexes into a pool. The best five once in a day. in the pool are then redistributed to the agents replacing their five least fit memeplexes. Again this exchange affects only Agent Design The agent consists of a genome and a mem- the memome leaving the genome unchanged. ome. The genome and the memome are internally identical, but play different roles in the agent. The genome is inert in- Population Design We considered three isolated popula- formation that remains static for the duration of the agent’s tions of 50 agents. Each agent in the initial population has a life. It is used during asexual reproduction to create a new genome consisting of geneplexes with only a single random agent. The memome is dynamic and may change over the gathering action. No genome was seeded with reproducing, lifetime of the agent. It is used to select daily behavior and individual learning, or social learning actions. may be spread socially. The first population served as a control and agents were Genomes and memomes are internally identical and in not capable of learning in any way. Their only means of be- this paragraph we will describe these internals from the per- havioral adaptation is genetic evolution. Any actions spent spective of a genome. However, everything is equally true by these agents on learning were wasted actions as they had of the memome. A genome consists of 50 geneplexes, each no result and thus these actions were slightly maladaptive for describing a series of actions that could be carried out by these agents. The memome and genome were identical in an agent in a single day. A geneplex’s total cost is the time these agents since their memome was incapable of change. cost of all actions in the geneplex and is always less than or Agents in the second population were capable of learning equal to 50. A geneplex’s fitness is equal to the total reward from the environment but not from each other.