Evolutionofand UsingSignals,Symbols,and


categories that are based on iconic and categorical Abstract representations. Initially, each cognitive system builds an This paper describes different types of models for the iconicoftheperceivedobject.Itcorresponds evolution of communication and language. It uses the to the sensory representation of an object, such as its distinction between signals, symbols, and words for the projection in theretina. The retinal imageofahorseisan analysis of evolutionarymodels of language.Inparticular,it iconic representation. Subsequently, this representation is show how evolutionary computation techniques, such as processed and used to build a categorical representation. Artificial Life,canbe used tostudytheemergenceofsyntax The object is represented by some essential (indexical) and symbols from simple communication signals. Initially, a computational model that evolves repertoires of isolated features that define its membership in a category. The signalsispresented.Thisstudyhassimulatedtheemergenceof category of horses is an example of categorical signalsfornamingfoodsinapopulationofforagers.Thistype representation. These categorical representations [12] are of model studies communication systems based on simple useful to sort out the extensive perceptual variability of signal-objectassociations.Subsequently,modelsthatstudythe objectsintherealworld.Indeed,theabilityofand emergence of grounded symbols are discussed in general, animals to create categories, e.g. through categorical includingadetaileddescriptionofaworkontheevolutionof perception,constitutesthe“groundwork”ofcognition[11]. simplesyntacticrules.Thismodelfocusesontheemergenceof Upon this basis, it is possible to build more complex symbol-symbol relationships in evolved . Finally, cognitive skills, such as language. The next level of computationalmodelsofsyntaxacquisitionandevolutionare discussed. These different types of computational models representation is called symbolic. Symbolic representation provide an operational definition of the signal/symbol/ makes it possible for us to name and describe our distinction. The simulation and analysis of these types of environment, in terms of objects’ categories, their modelswillhelpunderstandingtheroleofsymbolsandsymbol memberships, and their invariant features. The word acquisitionintheoriginoflanguage. “horse”issuchatypeofsymbolicrepresentation.Symbolic representations can be combined together to describe new Index Terms: Evolution of language, Artificial Life, Symbol entities and relations. For example, the word “horse” and grounding,Neuralnetworks “stripe” can be used together to describe the of “zebra.” Symbols constitute the basis of language, I. INTRODUCTION especiallyinlanguages. A. Icons,Indices,andSymbols Deacon[8,9]usesahierarchyofreferencingsystemsbased on the three levels of iconic, indexical, and symbolic Analyses of linguistic and communication systems are relationships.Iconsareassociatedwithentitiesintheworld mainly based on the semiotic distinction between icons, indices, and symbols. These distinctions, originally because of stimulus generalisation and conventional introduced by Peirce [22], have been reproposed and similarity.Indicesareassociatedtoworldentitiesbyspatio- temporal correlation or part-whole contiguity. They are slightlyrevisedinrecentlanguageoriginworks(e.g.,[12], [9]). Briefly, Peirce's original distinction between icons, typical of conditional learning based on simple stimulus indices,andsymbolsisbasedonthefactthatan"icon"has association. Indexical are used in common animal communication systems. Symbols have referential physicalresemblancewiththeobjectitrefersto,an"index" relationships to indexical relationships, and also to other is associated intime/spacewithanobjects,anda"symbol" 1 isbasedonasocialorimplicitagreement. symbols . Human languages are based on the use of such symbols. Harnad [11,12] distinguishes between three types of Recently,Deacon[9]proposedanexplanationoftheorigin representations that are used by natural (and artificial) cognitive systems to build mental representations and of language that is based on this hierarchical referencing classificationsoftheexternalenvironment.Hehypothesises system. His theory relies on the main distinction between communication with and without the use of symbolic thatsymbols(words)originatedasthenamesofperceptual representations to explain the evolutionary gap between Angelo Cangelosi, Centre for Neural and Adaptive Systems and Plymouth 1 Institute of Neuroscience, University of Plymouth, Plymouth PL4 8AA ForacritiqueofDeacon'suseoftheterm""for (UK).[email protected] describingrelationshipsbetweensymbolssee[15] animal and human communication systems. In fact, a theroleofthesemodelsforunderstandingtheevolutionof variety of animal communication systems exist and have cognition would be diminished. Instead, evolutionary and beenstudiedindetail[14].Thereisnoapparentcontinuity artificial life overcome the symbol between animal communication systems and complex grounding problem. For example, neural networks can be humanlanguages.Noanimal“simplelanguages”havebeen usedformodellingorganisms'neuralandcognitivesystems discovered, i.e., communication systems using some to build iconic and categorical representations. elementaryformsofwordcombinationsorsyntax.Thelack Subsequently, these representations can be used for high- of simplelanguageshelps explain thegapbetween animal levelsymbolicrepresentations.Infact,simulatedorganisms and human communication. Deacon [8,9] ascribes this to can use symbols whose semantic are made upof thesymbolacquisitionproblem.Indeed,themain categorical representations, such as the internal between animal and human communication pertains to representations of a neural network.Organisms'iconicand symbolic references. There is a significant difference categorical representations can be activated by the actual between the animal indexical referencing systemof simple presence of their referents in the organism's world, object-signal associations and that of humans’ symbolic therefore by directly grounded symbols in the external associations.Inanimals,simpleassociationsbetweenworld world. entitiesandsignals (e.g.,monkeys’calls)aremostlyinnate andcanbeexplainedbymeremechanismsofrotelearning Thedifferencebetweendifferenttypesofassociations(e.g., and conditional learning. A Vervet monkey always uses a simpleindexicalrelationshipsversussymbolicassociations) call in association with a specific predator [5]. Instead, and their relation to the computationalmodels oflanguage symbolic associations havedoublereferences,onebetween evolutionisrepresentedgraphicallyinFigure1.Theobjects the symbol and the object, and the second between the and symbols used in this example are takenfrom Savage- symbol itself and other symbols. When a complex of Rumbaugh & Rumbaugh’s [24] experimental stimulus set logical andsyntacticalrelationshipsexistbetweensymbols, in ape language research. In this figure, the upper level wecancallthemwordsanddistinguishgrammaticalclasses always refers to the linguistic representations of of words. A language-speaking human knowsthataword signals/symbols,whilstthelowerlevelreferstotheobjects refers to an object and also that the same word has present in the environment. Note that the relationship grammatical relationships with other words. Due to the between symbols and objects, which constitutes the possible combinatorial interrelationships between words, groundingofsymbolsintoentitiesoftherealworld,isnota there canbe anexponential growthof referencewith each direct link between mental symbols and real objects. newlyaddedword.Syntaxallowsthecombinationofmore Instead,itisalinkbetweenmentalentities(thesymbolsor wordstoexpressnewmeanings.Therefore,eachnewword words) and other mental entities (such as ) that of the lexicon can be used to exponentially increase the constitute the semantic reference. Therefore, the objects in overallnumberofmeaningsthatthelanguagecanexpress. the lower level refer to a semantic categorisation that organismscanmakeoftheseobjects.Theserepresentations B. Signals,Symbols,andWordsinLanguageEvolution aremediated bythe organisms' sensorimotorandcognitive In the last decade, computational modelling has been abilities. Solidfoods, such asabanana and anorange, are applied to the study of the evolution of language and represented with a link between themselves, and the two communication. These models deal with different types of drinks are also linked. In fact, apes group food together communication systems. Some rely on the use of simple because they require similar sensorimotor behaviour. In signals, while others use symbolic communication systems Rumbaugh'sexperiments,apesobtainfoodusingavending or complex syntactical structures. Amongst the different machine that gives solid food and pours drinks. Therefore types of computational approaches, evolutionary animals learn that foods are "given," while drinks are computation techniques, such as the synthetic approach of "poured." artificiallife[20,25],canbeusedtostudytheemergenceof communication. This approach permits the study of the Figure 1a represents a communication system based on different stages of semiotic complexity, from simple grounded signals. Communication relies on simple associations between signals and objects to symbolic indexical associations between objects and signals. This representations,andthentocomplexsyntacticrelationships situation refers to the models of the evolution oflanguage betweenwords. that only focus on lexicon emergence (e.g., [4], [27]). Communication signals, that are directly grounded in the Evolutionary computational models also offer the organisms' environment, do not have any symbolic or advantagetodealwiththesymbolgroundingproblem[12]. syntacticproperties. Computational cognitive models require an intrinsic link betweenthesymbolsusedinthemodel,suchaswords,and Figure1bshowsasystembasedongroundedsymbols(e.g., their semantic in the external environment. In [26],[2]).Inthetoplayertherearelinksbetweensymbols, classical rule-based cognitive models, and in some andreferencesbetweensymbolsandobjects.Followingthe evolutionarymodelsoflanguage,theintrinsiclinkbetween previous definitionof symbolby Deacon [9]and[13], we symbols and a world's entities does not exist. Organisms' can categorise this as a symbolic communication system internal symbols need sensorimotor grounding, otherwise duetotherelationshipsbetweensymbols. “pour” “give" “pour” “give" “pour” “give" “coke” “orange” “coke” “orange” “coke” “orange” “juice” “banana” “juice” “banana” “juice” “banana”

a b c Figure1-Visualisationofthedifferenttypesofassociationsincomputationalmodelsoflanguageevolution. 1a: Language based on simple indexical relationships between objects and signals, 1b: Language with groundedsymbolicassociations,1c:Languagewithnon-groundedsymbolicassociations.Objectandwordare inspiredbySavage-Rumbaugh&Rumbaugh’s(1978)experimentsonapelanguage.Seetextforexplanation. environment. These models, that focus on lexicon A symbolic communication system based on words is one emergence,donotmakeanyexplicitanddirectreferenceto thatsimulatestheevolutionofgroundedsymbols,wherethe theroleofsyntaxinlanguageorigin.Theiraimistomodel relationships between symbols are syntactic. For example, the early stages of the evolution of animal-like when the useof communication symbols is governed by a communication. set of grammatical rules, these symbols can be classified intowordclasses,suchasverbs,nouns,prepositions. A recent study by Cangelosi and Parisi [4] has simulated theevolutionofsignalsfornamingfoodsinapopulationof It is important to note that all these types of grounded foragers. This type of model studies communication symbolic communication systems can be simulated easily systems based on simple signal-object associations. through evolutionary and artificial life methodologies. Organisms learn and evolve simple stimulus associations Other computational modelling techniques can simulate between objects in the environment and signals. symbol systems but are limited in how they deal with the Communication signals only have referential relationships problemofsymbolgrounding.Figure1creferstomodelsof withtheworld'sentities. the origin of symbols and words with no direct symbol grounding. At the top, only symbol-symbol associations A. EvolutionofSignals:ModelSetup exist. This type of system is not based on grounded The simulation scenario is inspired by the use of symbolic associations as the links between symbols and communication signals observed in small groups of objectsaremissing.Symbolsareself-referencingandlacka animals. Animals have evolved the use of signals to direct grounding in the organisms' environment. These communicate about predators [5] or to communicate models are normally used to simulate the origin of syntax aboutthelocationandqualityoffood[14].The andwords(e.g.,[17]) simulation scenario uses the exchange of communicative signals between pairs of organisms concerning the quality Thefollowingtwosectionspresentmodelsfortheevolution offood.Morespecifically,individualorganismssignaleach ofcommunicationsystemsbasedonsignals(sectionII)and otherifencountered“mushrooms”areedibleorpoisonous. symbols (section III). Section IV briefly discusses the use The organisms live in an environment that contains two of computational models for the evolution of syntax and types of mushrooms: edible and poisonous. Organisms words. reproduce on the basis of their ability to eat the edible mushroomsandavoid the poisonous ones.They mustfirst II. EVOLUTIONOFCOMMUNICATIONUSINGSIGNALS categorise an encountered mushroom as either edible or Evolutionary computation has recently been applied to poisonous,andthentheymustrespondbyapproachingand studying the emergence and auto-organisation of eating edible mushrooms and by going away from communication lexicons.Somemodels havebeenusedfor poisonousones. the simulation of the emergence of simple lexicons in populationsofsimulatedorganisms(e.g.,[23],[4],[19]),in The simulation is based on Econet models small communities of robots [27], or in on line Internet [21]. There is a population of 100 organisms. Each agents [26]. In these studies, organisms evolve shared individual lives in an environment of 20x20 cells that lexicons for describing entities and relations of the contains 20 randomly distributed mushrooms (e.g., Figure 2). Ten mushrooms are edible and the other 10 are poisonousmushroom.Attheendoflife,theorganismsare poisonous. At the beginning of its life, an individual rankedintermsoftheirenergyandthe20individualswith organism is placed in a randomly selected cell with a the most energy are allowed to reproduce by generating 5 randomly selected orientation. When an organism happens offspring each. An offspring has the same connection tosteponacellcontainingamushroom,theorganismeats weights of its single parent with the exception of some themushroomanditdisappears. genetic mutations.Tenpercentoftheweightsaremodified by adding or subtracting a small random number. The process is repeated for 1000 generations. The selective reproduction of the individuals with most energy and the constant addition of variation to the genetic pool of connectionweightsthroughthegeneticmutationsresultsin an increase in average energy across the1000generations and the evolutionary emergence of the behaviour of approachingandeatingtheediblemushroomsandavoiding thepoisonousones.

Figure2-Environmentfortheforagingtask withedibleandpoisonousmushrooms.Each organismiscontrolledbyaneuralnetwork. The behaviour of each organism is controlled by a feedforward neural network with 14 input units, 5 output units,and5hiddenunits(Figure3).Oneinputunitencodes the location of the single nearest mushroom as the mushroom'sanglemeasuredclockwisefromtheorganism's current facing direction. This angle is mapped in the intervalfrom0to1.Teninputunitsencodethemushroom's Figure3-Neuralnetworkarchitectureforthe perceptual properties. The 10 edible mushrooms are listener and speaker organisms. Note the encodedas10patternsof10bit,witheachpatternobtained exchange of the communication signal by changing a single bit, randomly chosen, in the betweenthethreelinguisticunits. prototypical pattern 1111100000. Similarly, the 10 poisonous mushrooms are encoded as 10 single-bit During its lifetime, an organism wanders in its deviations from the prototype 0000011111. The 3 environment. In each cycle, one particular mushroom remaining input units (signal-encoding inputunits)encode happenstobethemushroomclosesttotheorganism.Ifthe oneof8possibleperceivedsignals:111,110,100,etc.Two mushroom is sufficiently near to the organism, i.e., it is ofthe5outputunitsencodeamovementoftheorganismin locatedinoneofthe8cellsadjacenttotheorganism'scell, theenvironment.Theorganismcaneitherproceedonestep theorganism perceivesboth the locationof the mushroom forward(11),turn90degreestotheleft(10)ortotheright (its angle with respect to the organism's facing direction) (01),orjustdonothing(00).Theremaining3outputunits and its perceptual properties (the pattern of 10 bits). (signal-encoding output units) encode one of 8 possible However, if the mushroom is more distant, the organism emitted signals in the same way as the signal encoding canperceivethemushroom'slocationbutnotitsperceptual inputunits. properties. The 10 input units encoding the mushroom's perceptualpropertiesallhave0activation. Astartingpopulationof100neuralnetworkswiththesame architecture and randomly assigned connection weights is The evolution across 1000 generations of three different generated initially in each simulation. The individual's populationsiscompared.Onepopulationhasnolanguage. energy(fitness)isincreasedeverytimetheorganismeatsan When an individual encounters a mushroom that is not ediblemushroomanditisdecreasediftheorganismeatsa locatedinoneofthe8cellsadjacenttotheindividual'scell, the organism can perceive the direction in which the mushroom lies but not the mushroom's perceptual properties. In a second type of population the language is provided externally by the researcher and it does not evolve. When an individual belonging to this population encounters a mushroom, the three inputunits ofits neural network that encode perceived signals have an activation patternof'100'iftheencounteredmushroomisedibleand anactivationpatternof'010'ifitispoisonous.Thesignals producedbytheorganismsareignored. In the third type of population, language is not externally providedbutitinsteadevolvesautonomously.Thescenario, whichhasbeeninspiredby[16],isthefollowing.Likethe organisms of the other two populations, an individual can perceivethenearestmushroom'sperceptualpropertiesonly ifthemushroomiscloseenough.However,ineachcycle,a secondindividualisselectedrandomlyfromthepopulation and is placed next to the first individual. This way it is exposedtothesameperceptualinputasthefirstindividual Figure 4 - Frequency distribution of the 8 with the only difference that the second individual has possible signals for the edible mushrooms. access to the perceptual properties of the mushroom Although there are some oscillations, the whatever the distance of the mushroom. The only taskfor population evolves a language that tends to thesecondindividualistolabelthemushroomforthefirst consistently use the pattern '010' to label individual. The binaryoutputof itssignal-encodingoutput ediblemushrooms. units in response to the perceptual properties of the mushroom is used as input to the signal-encoding input units of the first individual. Therefore, in this last Itisinterestingtoexaminewhatlinguisticsignalsevolvein population, when anindividualencounters amushroom, it the third population. Figure 4 shows the frequency has always access to a linguistic signal produced by a distribution of the 8 possible signals for the edible conspecific. However, in this population, unlike the mushrooms in a sample simulation. Although there are previous population,thequality of thesignalsprovidedby some oscillations, the population evolves a language that conspecificsisnotguaranteed.Whateversignalisgenerated tends to consistently use the pattern '010' to label edible by the conspecific's neural network, the signal is input to mushrooms(thepattern'110'isevolvedtolabelpoisonous the neural network of the individual that must decide mushrooms). In other populations, a similar tendency to whether to approach or go away from the mushroom. evolvetheuseofonlytwosignalswasfound.However,no Hence,the languagecanbeusefultotheseorganismsonly consistent pattern was observed in evolved signals. A if it evolves appropriately. The fitness of the listening populationcanbesaidtopossessanefficientlanguageif(a) organism only depends on its own ability to reach/avoid functionally distinct categories (in our case, edible and mushroom, and not on its linguistic ability. The speaking poisonousmushrooms)arelabelledwithdistinctsignals,(b) organismdoesnotreceiveanyfitnesspayoff. a single signal tends to be used to label all the instances within a category, (c) all the individuals in the population B. EvolutionofSignals:Results tend to use the same signal to label the same category. (Clark[7]hasarguedthatprinciplessimilartothesegovern We ran 10 replications and analysed the final fitness at the child's acquisition of the lexicon.) According to these generation 1000 and the structure of evolved lexicons in criteria,thelanguageevolvedbyourpopulationappearsto populations with auto-organisation of communication. The beratherefficient.Similarresultswereobtainedintheother comparison of the fitness values in the three populations replications of the simulation although different pairs of showsthatlanguageisausefuladditiontotheevolutionary signalsemergedforthetwocategoriesofmushrooms. adaptation of these organisms. The organisms with no language have an average energyof alittlemorethan150 Thissimulationshowsthatapopulationofsimpleartificial unitsattheendofevolutionwhilethetwopopulationswith organisms living in a simple environment can evolve an language have an average energy of more than 250 units. efficient language with an informative functionto help the On the other hand, the two populations with language do individuals interactwith theirenvironment.Duetosensory not differ very much from each other. Although, limitations, an individualcanperceivethelocation but not predictably, the population with externally provided the perceptual properties of a distant mushroom. This languagehasamoreregularincreaseinaverageenergythan represents a serious handicap because an individual can thepopulation withevolvedlanguage,thetwopopulations adopt an informed decision on whether toapproach orgo reachanequivalentlevelofenergyattheendofevolution. awayfromanencounteredmushroomonlyifthemushroom isveryclose.Inthesecircumstances,thepopulationevolves a simple language in the sense that individuals tend to category shareacommon binary prototype. When this 18- generate distinctive signals for edible and for poisonous bit string is input to the organism's neural network, the mushroomsandthesesignalsareusedbyotherindividuals mushroom should be classified into one of the six to decidewhethertoapproach oravoidamushroom.This categories. type of language is based on signal use, rather than on symbolic communication, because there are independent Each time an organism collects an edible mushroom, its pairs of signal-object associationsbetween theelements of fitness is increased by one point if the correct category of the lexicon and the mushrooms existing in the organisms’ mushroom is identified. Identification is based upon the environment. There are no relationships between the two level of activation of oneoutput unit.Whenatoadstool is signalsindicatingpoisonousandediblemushrooms. collected,the fitnessdecreases byonepoint.Attheendof their lifetime, the fittest 20 organisms are selected and III. EVOLUTIONOFSYMBOLICCOMMUNICATION reproduce 4 offspring each. The organism’s genotype is Some language evolution models have focused on made up of the neural network's connection weights. Ten symbolisationandtheuseofcommunicationsystemsbased percentofeach offspring's connectionweights are mutated on symbolic representations. The use of symbolic randomly. representations implies some form ofsymbol combination, since symbols have double references: one with external A 3-layer feedforward neural network controls the world’sentities,anotherwithsomeoftheexistingsymbols. behaviourof theorganism(Figure5).Intheinputlayer,3 This type of model can be seen as a first approach to the units encode the location of the closest mushroomand18 studyoftheevolutionofsyntax.Italsopermitsasystematic unitsencodetheirbinaryfeatures.Eightinputunitsareused analysisoftheproblemofsymbolacquisition.Forexample, for the 8 communication symbols. The network has 5 comparisons can be made between symbol acquisition in hidden units. In the output layer, 3 units control the animal models (e.g., chimpanzees) and computational organism’s behaviour (movement and identification of models (e.g., artificial neural networks). An example of mushroom category), and 8 units are used to encode the such models is Cangelosi's [2] work on the emergence of mushroom names. These symbolic output units are simplesyntacticrulesbasedonsymbolcombination. organised in two clusters of competitive winner-takes-all units(oneclusterof2units,theotherof6units).Sinceonly one unit per cluster can beactive, each mushroomwillbe A. EvolutionofSymbols:ModelSetup namedusingtwosymbols. In this study, an evolutionary computation methodology is used which is similar to that used in the previous model. During the first 300 generations, organisms evolve the The model's behavioural task is influenced directly by ability to differentiate between the 6 types of mushrooms. Savage-Rumbaugh & Rumbaugh’s [24] ape language Organisms do not communicate and do not use the 8 experiments.Apopulationof80organismsmustperforma symbolic input and output units. Only the closest foragingtaskbycollectingediblemushroomsandavoiding mushroom's location and the 18-bit feature string are toadstools. There are six categories of mushrooms: three availableasinput.Fromgeneration301to400,organisms edible mushrooms (big, medium, and small) and three can communicate by using the 8 linguistic input/output toadstools (big, medium, and small). Once an edible units.Duringthesegenerations,thenew80organismslive mushroom is approached, organisms must identify its size together with their 20 parent organisms. Only the 80 category in order to gain fitness. As toadstools must be offspring will forage and reproduce. The parents serve as avoided, no further classification of their type is required. speakersandteachersfornamingthemushroomcategories. These foraging stimuli resemble those in Savage- This interaction constitutes the process of cultural Rumbaugh&Rumbaugh’s[24]studyonapelanguage.As transmission between the children’s and the parents’ we already mention in Section I, chimpanzees were fed generations. During each action, the parent network through a vending machine that could "give" solid foods receivesthe18-bitfeatureasinputandproducestwooutput and "pour" drinks.Thereforeanimalshadtolearnnotonly symbolsdescribingthemushroom.Thesesymbolsareused thenames(lexigrams)forthesinglefoods/drinksbutalsoa as input to the child's neural network. Ten percent of the lexigramforthedifferenttypesofsolidfoodstobe“given” time the childalsoreceivesthe18-bitstringasinput.This and another lexigram for different types of liquid to be facilitates the evolution of good languages as the “poured.” availabilityofthemushroomfeaturesisrare.Therefore,the parent'slinguisticdescriptionbecomesanimportantsource Organismsliveina2Denvironmentmeasuring100by100 for discriminating between mushroom categories. The cells. At the beginning of each epoch there are 1200 child's network first uses the parent's symbols to decide randomly distributed mushrooms, 200 per category. A what action to perform. Then it uses the parent’s symbols mushroom is characterised by a binary string of 18 for a naming task and an imitation task. The error perceptual features. These features will be used by the backpropagation algorithm is used in both learning tasks, organism's neural network to identify the mushroom type andtheparent'stwo-symbolstringisusedasteachinginput. andappropriateaction.Asetof3binaryfeaturesalwaysset Somenoise(arandomnumberbetween±.5)isaddedtothe to1identifiesthemushroomcategorywhilsttheremaining error between the child's output symbols and the parent's bitsareeither0or1.Therefore,the200mushroomsofeach output symbols to introduce variability in the process of Figure5-Neuralnetworkarchitectureandlearningtasksduringtheparent-childcommunicativeinteraction. cultural transmission. It is important to notice that in the combination. These languages are also compositional next generation, the new offspring will only inherit their because there is a clear between the semantic parents' pre-learning connection weights. Any weight structure and the linguistic clusters.Inthefirstcluster two changes resulting fromthebackpropagationalgorithm will different units are used. One unit is constantly associated notbetransmittedtothenextgeneration. withallmushroomcategoriestobeavoided,theotherwith all mushrooms to be approached. The units in the second B. EvolutionofSymbols:ModelResults cluster are associated with the subcategories referring to The simulation from generation 1 to 300 wasrepeated10 mushroomsize(big,medium,small.) times,usingdifferentinitialrandompopulations(i.e.,neural networks with different random weights). At generation In11ofthe18runs,populationsevolvedgoodlanguages, 300, the foraging task fitness in 9 out of 10 populations i.e. the use of at least four signal/signal-combinations to reached an optimal level.Indeed,analysisofthebehaviour distinguish the four emerged behavioural categories (the of the best organisms shows that all toadstools were whole group of toadstools, and the three categories of avoided and all edible mushrooms were approached and edible mushrooms). These emergent semantic categories correctlyidentified. didnotdistinguishbetweenthesubcategoriesoftoadstools. Thesesharedlanguagesemergedthroughaprocessofauto- The9successfulpopulationswereusedinthesecondstage organisation of the lexicon, due to the interaction between of simulation from generation 301 to 400. In this stage, organisms and the process of cultural transmission. In the communication was permitted and organisms could learn remaining 7 populations the emerged language was poor. how to name mushrooms from their parents. Eighteen Thatis,somemushroomtypeswerelabelledincorrectlydue different simulations were performed (9 populations * 2 to the lack of some signal/signal-combination. Therefore initial random lexicons). The percentages of the different thefitnessremainedverylowsincesomemushroomswere types of evolved languages are shown in Table 1. Single- incorrectlydescribedandcollected. signal languages are characterised by the use of only one linguistic output cluster that differentiates between the Single-signal Signal-comb. Verb-noun semantic categories of mushrooms. Inthefirstcluster, one Goodlang. 9% 27% 64% unitisusedforeachcategory,whilstintheotherclusterthe Imperfectlang. 14% 29% 57% same unit is active for all categories. Signal-combination Table 1:Percentagesofthedifferenttypesof languagesarethosecharacterisedbytheuseofbothclusters languagesin18replications(generation400). to communicate differences between semantic categories. However, in these languages there is no compositionality, i.e. no parallelism exists between the topology of the We are interested in identifying the different types of (i.e. hierarchical structure of the mushroom languages that have emerged. In particular, we want to categories) and that of the units in the two clusters. The focus on signal-combination rules that resemble known verb-noun languages are a special case of signal- syntactical structures, such as the verb-noun rule. Consideringthepopulationsinwhichgoodcommunication and foods and then the researchers checked if apes where evolved, 10 of the 11 languages use combinations of able to generalise the correct type of verb. The results signals. Out of these 10 populations, 3 populations use showed that under certain language training conditions various combinations of two signals, and the remaining 7 animals are able to learn real symbolic associations and use a verb-noun rule. In fact, in the two-unit cluster, each make correct rule generalisations. Chimpanzees correctly linguisticunitissystematicallyassociatedtoonlyoneofthe associatedthenewdrinks' name with the verb"pour,"and high-ordercategoriesedible/poisonous.One“verb”symbol thenewfoods'namewith"give." isalwaysusedforalltoadstools(“avoid”)andtheotherfor allediblemushrooms(“approach”).Theunitsinthe6-word A similar symbol acquisition test was developed for the cluster are used for distinguishing single “nouns” foraging task of the model presented in this section. (mushroom size subcategories) with which the two verbs Organismsarefirsttaughttoassociatetheverb"avoid"with systematicallycouple.Thisisusedtoidentifytheverb-noun thenamesoftwotoadstoolcategoriesand"approach"with rule.Anexampleofsuchalanguageatthegeneration400 the names of two edible mushroom categories. ofasamplesimulationisshowninFigure6. Subsequently,theyaretaughtthe name ofanewtoadstool and the name of a new edible mushroom. No direct feedback for the verb association is given during the learning of these new names. Finally, neural networks are tested to establish whether they learned to use the "verb- noun"ruletoassociatethecorrectverbwiththenewnames. Ten replications of this test were executed. The results showthatin70%ofpopulations(7outof10)thelearned language is actually based on symbolic associations between the mushrooms’ names and the two verbs [2]. It indicates that in this model most of the organisms' neural networks use a symbolic strategy when learning linguistic symbols and the syntactic rules for combining them. However, further and more systematic analysis of the acquisition of predicate-argument rules in this type of neuralnetworkswillbeneededtofullyassessthesyntactic structureshandledbythesenetworks.

IV. COMPUTATIONALAPPROACHESTOTHEEVOLUTIONOF WORDSANDSYNTAX Figure 6: Final language in a sample The previous type of model simulates the emergence of population. Note that the language has a simple forms of syntax, such as two-signal symbol perfect "verb-noun" structure since all combination and verb-noun compositionality. The evolved toadstools (ST, MT, BT, respectively for communication system is based on the use of symbols, Small, Medium and Big Toadstool) use the sinceorganismsare abletogeneralisetheuse oftheverb- outputlinguisticunit"Y"(="avoid"),andall nounruletoeachnewentryinthelexicon.Itusessymbols ediblemushrooms(SE,ME,BE,respectively that are also directly grounded in the organisms' for Small Edible, Medium Edible, and Big environment. However, the complexity of the evolved Ediblemushrooms)usethelinguisticunit"Z" syntax is too simple and quite distant from the level of (="approach").Thenameofeachmushroom complexityofhumanlanguages. category is indicated by the output linguistic units A-F (for differentiating mushroom Language and syntax have often been modelled using subcategoriesbytheirsize). different computational techniques. Among these, neural networks have been used extensively for language simulation [6]. For example, recurrent neural network Thelanguageevolutionmodelunderdiscussionissupposed architectures have been often used for word prediction to be based on symbolic referencing, rather than simple tasks. Networks were trained using complex grammatical object-signal associations. It is necessary to test the sentences,withmanylevelsofrecursion.Thestudyshowed symbolicvalueofsuchevolvedlanguages.Testsforsymbol that neural networkswere able toabstractthegrammatical acquisition have been developed in animal language structure hidden in the sentences. They were also able to studies.Forexample,inthestudybySavage-Rumbaugh& represent, in thelayer ofhidden units,thedifferentclasses Rumbaugh [24] the test consisted of experiments where ofwords,suchasnounsandverbs. chimpanzees learned initially to associate “pour” with the name of two drinks (coke and juice) and "give" with the Some recent computational studies have used evolutionary name of two solid foods (banana and orange). techniques to study the emergence of syntax and words. Subsequently, animals were taught new names of drinks Theysimulatecomplexlanguagesinwhichitispossibleto identify "words", i.e. symbols that belong to specific approach wouldhavetodefineanaprioriseriesofstages grammatical classes, such as verbs, nouns, prepositions, ofsemanticcomplexityuponwhichsyntaxwouldbebiased etc... For example, Kirby [17] studied the emergence of tograduallydevelop.Inasymbolgroundedapproachother compositionality and recursive grammars. It showed how autonomous factors (such as the emergence of different compositionality can emerge without natural selection, stages of behavioural complexity during organism's using a simple mechanism of cultural transmission of adaptation),wouldbefreetoaffect(ornot)theevolutionof language. In [1], populations of recurrent neural networks differentstagesofsyntaxcomplexity. learn context-free grammars. They are used to understand theroleofcriticalperiodsinlanguageacquisition. Futuremodelsoftheevolutionofsyntaxshouldincludethe essential grounding of words into the organisms’ Both neural network and evolutionary computation environment. Thiswillreleasetheresearcherfromthetask approacheshaveproducedinterestingresults.Forexample, of deciding which meanings to input to the system. The neural network models have been useful in understanding simulation approach proposed here, and other approaches the mechanisms of neural processing of language and such asthe robotic modellingoftheevolutionoflanguage syntax and the processes of language acquisition. [26, 27], are clear examples of how symbols can directly Evolutionary models have supported the integration ofthe and autonomously ground their in the organisms' processoflanguageacquisitionwiththatofevolutioninthe environment. However, all words in a lexicon need to be study of the role of cultural transmission and phylogenesis directly grounded. In fact, once a basic set of grounded inlanguageorigin. wordshasemerged,additionalwordscanacquiregrounded meaningsbymechanismsofgroundingtransfer[3]. Current models of syntax acquisition and evolution have somelimitations.Amongthese,oneparticularshortcoming V. CONCLUSION is the fact that these models solely focus on syntax, rather Computational modelling, and in particular evolutionary than on the evolution and acquisition of both syntax and computation, helped to renew the interest for a scientific lexicon. In fact, most of them are based on the referential approachtothestudiesonlanguageoriginandevolution.In system described in Figure 1c. They onlysimulatethe top thelastdecade, many modelshave beendevelopedforthe level of a language system, that of words and word-word simulation of the emergence of language and relationships. The links between words and their semantic communication in evolving population of interacting referents in the external environment are not simulated organisms.Somemodelsstudiedtheemergenceoflexicons directly. Sometimes, an abstract system of semantic and signal-object associations, others have simulated the grounding is used, with the use of other "symbols" to evolutionofsymbolsandsyntaxdirectly. represent semantics (e.g. when a modeller uses a list of wordstodenotesemanticcategories).Usingthisapproach, TherecentlanguageevolutiontheoriesofHarnad[13]and the associations that organisms learn are mainly self- Deacon[9]havefocusedontheroleofsymbolsandsymbol referential symbol-symbol relationships. Theimportance of acquisitionfortheunderstandingoftheoriginsoflanguage. providing a model with a mechanism for grounding Theabilityofcognitivesystems,inparticularinhumans,to symbols, and the way it can affect the type of results that build symbolic representations from simple indexical the model produces, can be shown by analysis of the associations constitutes the main basis for the further simulation presented in sectionIII. Theinitialsetup ofthe development oflinguisticabilities,andinparticularforthe model provided a two-level hierarchy of six low-level evolution of syntax. Simple indexical associations are the mushroom categories and two high-level categories. These basisforanimalcommunicationsystemsthroughtheuseof categories were not explicitly given in input to the signals. Symbolic representations are the basis for organisms, i.e. the organisms did not have the list of communicationusingsymbols,andinparticularfortheuse mushroom names from which to select a specific meaning ofwordsandsyntaxinhumanlanguages. during communication. The making and selection of meanings to communicate depended on the organisms' Thispaperhasproposedtheuseofthedistinctionbetween evolvingbehaviouralskills.Infact,thefinalresultsshowed signals, symbols, and words for the analysis of language that the emerged lexicon was organised around four evolution models. Moreover, it has stressed the need for grounded semantic categories (the whole category of simulating the grounding of symbols and words. poisonousmushrooms,andthethreesubcategoriesofedible Evolutionary computation permits the design of such type mushrooms).Thisshowsthatwhenthesystemisallowedto of languageevolution models. The simulation and analysis ground its own semantics, unexpected sets of semantic ofthesemodelswillhelptounderstandtheroleofsymbols categoriescanemerge.Iforganismshadbeeprovidedwith andsymbolacquisitioninlanguageorigin. a non-grounded, ready-made symbolic representation of meanings, results could havebeen different because ofthe VI. ACKNOWLEDGEMENT biasprovidedbythemodeller'sfixedsemantics.Moreover, anon-groundedapproachwouldhavesignificantlimitations This research was partially supported by an Award for in models that wanted to study the possible existence (or Newly Appointed Lecturers of The Nuffield Foundation not) of sequential stages of syntax complexity in the (NUF-NAL # SCI/180/97/116) and a PhD studentship of evolution of language.A researcherusinganongrounded theUniversityofGenoa. VII. REFERENCES language acquisition processes in four young Pan Troglodytes. Brain andLanguage,6:265-300. [1] Batali J. (1994). Innate biases and critical periods: Combining [25] Steels L. (1997) The synthetic modeling of language origins. evolutionand learning in theacquisitionofsyntax. In R. Brooks&P. EvolutionofCommunication,1(1):1-37. Maes(eds),ArtificialLifeIV,Cambridge,MA:MITPress,160-171. [26] Steels L. & Kaplan F. (1999). learning and semiotic [2] Cangelosi A. (1999). Modeling the evolution of communication: dynamics. In D. Floreano et al. (Eds.), Proceedings of ECAL99 From stimulus associations to grounded symbolic associations. In D. European Conference on Artificial Life, Berlin: Springer-Verlag, Floreano et al.(Eds.), Proceedings of ECAL99 European Conference 679-688. onArtificialLife,Berlin:Springer-Verlag,654-663 [27] Steels L. & Vogt P. (1997). Grounding adaptive language games [3] CangelosiA., GrecoA., & Harnad S. (2000). From robotictoilto in robotic agents. In P. Husband & I. Harvey (eds). Proceedings of symbolic theft: Grounding transfer from entry-level to higher-level the Fourth European Conference on Artificial Life, London: MIT categories.ConnectionScience,12(2),143-162 Press,474-482 [4] Cangelosi A., & Parisi D. (1998). The emergence of a "language" in an evolving population of neural networks. Connection Science, 10(2),83-97 [5] Cheney D.L. & Seyfarth R.M. (1990). How monkeys see the world: Inside the of another specie. Chicago, IL: Chicago UniversityPress. [6] Christiansen M.H. & Chater N. (in press). Connectionist natural languageprocessing:Thestateoftheart.CognitiveScience [7] Clark E. (1993). The lexicon in acquisition. Cambridge, MA: CambridgeUniversityPress. [8] Deacon T.W. (1996). Prefrontal cortex and symbol learning: Why a brain capable of language evolved only once. In B.M. Velichkovsky & D.M. Rumbaugh (eds), Communicating meaning: The evolution and development of language, Mahwah NJ: LEA Publishers,103-138. [9] Deacon T.W. (1997). The Symbolic Species: The coevolution of languageandhumanbrain,London:Penguin. [10] Greenfield P.M. & Savage-Rumbaugh S. (1990). Grammatical combination in Pan paniscus: Process of learning and invention in the evolution and development of language. In S.T. Parker & K.R. Gibson (eds), Language and intelligence in monkeys and apes, CambridgeUniversityPress,540-579. [11] Harnad S. (Ed.) (1987). Categorical Perception: The groundworkofcognition.NewYork:CambridgeUniversityPress. [12] Harnad S. (1990). The Symbol Grounding Problem. Physica D 42:335-346 [13] Harnad S. (1996) The origin of words: A psychophysical hypothesis. In B.M. Velichkovsky & D.M. Rumbaugh (eds), Communicating meaning: The evolution and development of language,MahwahNJ:LEAPublishers [14] Hauser M.D. (1996). The evolution of communication. Cambridge,MA:MITPress. [15] Hurford J. (1998). Review of Terrence Deacon, 1997 The Symbolic Species: The co-evolution of language and the human brain.TheTimesLiterarySupplement,October23rd,1998,34 [16] Hutchins E. & Hazelhurst B. (1995). How to invent a lexicon. The development of shared symbols in interaction, In N.Gilbert e R. Conte (Eds.) Artificial societies: The computer simulation of social life,London:UCLPress. [17] Kirby S. (in press). Syntax without Natural Selection: How compositionality emerges from vocabulary in a population of learners. In C. Knight, M. Studdert-Kennedy & J. Hurford (Eds.) Approaches to the Evolution of Language, Cambridge University Press. [18] Kirby S. (1999).Syntax out of learning: The culturalevolutionof structured communication in a population of induction algorithms. In D. Floreano et al. (Eds.), Proceedings of ECAL99 European ConferenceonArtificialLife,Berlin:Springer-Verlag,694-703. [19] Oliphant M. & Batali J. (1997). Learning and the emergence of coordinated communication. Centre for Research in Language Newsletter,11(1). [20] Parisi D. (1997). An Artificial Life approach to language. Mind andLanguage,59,121-146. [21] Parisi D., Cecconi F. & Nolfi S. (1990). ECONETS: Neural networksthatlearninanenvironment.Network,1:149-168. [22] Peirce C.S. (1978). Collected papers. Vol. II: Element of logic, C.Hartshorne&P.Weiss(Eds.),Cambridge,MA:Belknap. [23] Saunders, G.M., & Pollack, J.B. (1996). The evolution of communication schemes over continuous channels. Proceedings of the SAB'96 Conference on the Simulation of Adaptive Behavior. Cambridge,MA:MITPress. [24] Savage-Rumbaugh S. & Rumbaugh D.M. (1978). Symbolization, language, and chimpanzees: A theoretical reevaluation on initial