An Emergence of Coordinated Communication in Populations of Agents

Vladimir Kvasnicka and Jiri Pospichal
Department of Mathematics, Slovak Technical University, 812 37 Bratislava, Slovakia
[email protected], [email protected]

Abstract
The purpose of this article is to demonstrate that coordinated communication spontaneously emerges in a population composed of agents that are capable of specific cognitive activities. Internal states of agents are characterized by meaning vectors. Simple neural networks composed of one layer of hidden neurons perform the cognitive activities of agents. An elementary communication act consists of the following: (a) two agents are selected, where one of them is declared the speaker and the other the listener; (b) the speaker codes a selected meaning vector onto a sequence of symbols and sends it to the listener as a message; and finally, (c) the listener decodes this message into a meaning vector and adapts his or her neural network such that the differences between speaker and listener meaning vectors are decreased. A Darwinian evolution enlarged by ideas from the Baldwin effect and Dawkins' memes is simulated by a simple version of an evolutionary algorithm without crossover. The agent fitness is determined by the success of the mutual pairwise communications. It is demonstrated that agents in the course of evolution gradually do a better job of decoding received messages (their decoded vectors are closer to the meaning vectors of speakers) and that all agents gradually start to use the same vocabulary for the common communication. Moreover, if agent meaning vectors contain regularities, then these regularities are manifested also in the messages created by agent speakers; that is, similar parts of meaning vectors are coded by similar symbol substrings. This observation is considered a manifestation of the emergence of a grammar system in the common coordinated communication.

Keywords: genetic algorithms, coordinated communication, emergence, agent, Darwinian evolution, Baldwin effect, Dawkins' memes

1 Introduction

Human language [9, 10, 30] makes it possible to express a huge number of quite different meanings by token sequences composed of a small number of simple elements, and to interpret such sequences by the meanings that they contain. A standard meaning of the term “grammar” refers to the systematic regularities between meanings and their representation by token sequences in a language. These structural regularities of a language constitute a basis for expressions of novel meaning combinations. The hearer can accurately interpret the received sequences as involving those familiar structures and relations, even though their specific combinations may have never been used before. This means that a communication system endowed with a grammar can be used to express new unusual meanings related to specific situations. The ability to

© 2000 Massachusetts Institute of Technology. Artificial Life 5: 319–342 (1999)


communicate by a system composed of structural regularities represents an important achievement for a species for which coordinated social activities are vital to survival, and where accurate communication between two individuals of the same species represents a definite selective advantage. When this benefit is recognized, it is natural to explain language communication as the result of Darwinian evolution [21, 22, 28]. The creation of coordinated communication between agents can be interpreted as an emergence of a new phenomenon in a population of agents that are capable of specific pairwise communication accompanied by a learning process. Starting from the general ideas of Darwinian evolutionary theory, a source of the emergence of coordinated communication in a population of agents may be found in the ability of coordinated communication to increase agent fitness; that is, there exists a selection pressure for a spontaneous emergence of coordinated communication. Individuals that are incapable of coordinated communication are strongly handicapped; their evaluated fitness is substantially lower than that of agents that are capable of coordinated communication. The frequency of individuals that are not capable of coordinated communication therefore decreases in the course of evolution. The purpose of this article is to study the hypothesis that coordinated communication together with grammar regularities is the result of an evolutionary process running in a population composed of individuals, or agents. These agents are endowed with an ability to perform specific cognitive activities. By the term coordinated communication we mean an exchange of messages between agents that is unified (with the same semantic contents) for the whole population. For elementary communication acts the term "coordinated" implies that both speaker and listener understand each message in the same way. This requirement is formally expressed by a sequence of elementary steps in the communication act: The speaker's internal state (e.g., corresponding to an internal representation of the environment) is coded into a signal-message (token sequence) received by the listener, who decodes this message into a form of internal state:

selection of the speaker's internal state ⇒ transformation of this internal state into a message represented by a token sequence ⇒ the speaker sends the message to the listener ⇒ the received message is decoded by the listener into a form of internal state.

From this simple communication scheme it immediately follows that agents should be capable of specific cognitive activities that consist of decoding received messages onto internal states (a direct cognitive activity) and of coding internal states onto messages represented by token sequences (the inverse cognitive activity). Moreover, the listener compares the internal states constructed from the received messages with the original internal states of speakers. If these internal states, original and decoded, are different, then the listener modifies the parameters of his or her cognitive device so that the differences between speaker and listener internal states decrease in forthcoming interactions between agents. We presume that the speaker's internal states correspond to some external surrounding reality, which can also be determined by a listener. Therefore we postulate that a listener can find out in some other way the internal state of the speaker. The problem of the emergence of coordinated communication, where the cognitive activities of agents are performed in many different ways, is intensively studied in the current artificial life literature [3, 5–7, 16, 26, 29, 31–33, 35] as well as in that of evolutionary linguistics [21–24, 27]. The present article is based on Batali's idea [5, 6] that neural networks represent agent cognitive devices. We substantially enlarged this


interesting idea so that the Baldwin effect [4, 8, 19, 25, 34] and Dawkins' memes [11] are considered as well. Batali studied the emergence of language only within a single epoch of evolution, whereas in the present article the emergence spans many epochs, allowing reproduction of the best agents and the inheritance and spread of memes across generations. Recently, a similar approach was studied by Steels [33], who explored the emergence of a common vocabulary enhanced by cultural transmission. However, his model included neither neural networks nor learning with the Baldwin effect. Even though his "cultural heritage" was transmitted by a process similar to the communication act itself, it may not be the best model of biological reality, where the meaning of a new word is understood in childhood mainly through situation evaluation enhanced by feelings.

2 Darwinian Evolution, Baldwin Effect, and Dawkins’ Memes

Darwinian evolution is based on the principle of natural selection, according to which only the best-adapted individuals of a population survive. Reproduction of strong individuals produces offspring that will, with high probability, be well adjusted for survival. In 1896 Baldwin [4] proposed a hypothesis: If learning helps survival, then the organisms best able to learn will have the most offspring, thus increasing the frequency of genes responsible for learning. If the environment remains relatively fixed, so that the most profitable information to learn remains constant, selection can lead to a genetic encoding of a trait that originally had to be learned. Baldwin called this mechanism "organic selection," but it was later dubbed the "Baldwin effect." As an extension of the Baldwin effect we consider Dawkins' theory of memes [11]. Memes are ideas (messages) broadcast throughout the whole population; they are composed of useful information on how to increase the fitness of chromosomes. Dawkins assumed that memes might be important for an acceleration of evolution. Many social traits (e.g., altruism) are discussed in the framework of [14, 15] only under strong restrictions and assumptions that are not acceptable to all evolutionary biologists. An application of Dawkins' memes represents a plausible possibility to explain and interpret many features of the social behavior of animals [18]. In the framework of evolutionary algorithms (EAs) [12, 13, 20], the notion of a biological individual is formally replaced by the notion of a chromosome, which usually represents the linearly ordered information content of an individual (its genotype). Then we can talk about a population of chromosomes that reproduce with a probability directly proportional to their strength; an important part of that reproduction is mutation. Mutations introduce into chromosomes new information, which can increase the fitness of chromosomes created by crossover from parental chromosomes. New chromosomes push chromosomes with low fitness out of the population. This basic reproduction cycle is continually repeated. After some time there is a high probability that chromosomes with new properties substantially increasing fitness will emerge in the population and push out older chromosomes without those properties. Evolutionary algorithms [12, 13, 20] are an abstraction of the above-presented basic principles of Darwinian evolutionary theory. Three different levels of EAs may be considered according to the method of fitness calculation:

1. Fitness of chromosomes is determined exactly by their positions on the fitness landscape, that is, it is determined entirely by chromosome compositions; no other effects are included in the fitness calculation. This method of fitness calculation corresponds to standard EAs, where chromosomes are directly mapped on positive real numbers (i.e., fitness) without any intermediate or hidden considerations.


Figure 1. Diagrammatic illustration of a situation in which two chromosomes are placed on a fitness landscape so that they have the same fitness. The second chromosome is situated at a local maximum and therefore it does not have a chance to increase its fitness in its nearest neighborhood. Although the first chromosome has the same fitness as the second, it is situated on the fitness landscape in such a position that in its nearest neighborhood there exist positions with a greater fitness. From the standpoint of classical Darwinian evolutionary theory, when learning processes are ignored, these chromosomes cannot be distinguished and therefore will have the same chance to participate in a reproduction process. On the other hand, when the learning process is included in the evolution process, these chromosomes are distinguished; the first chromosome has a greater fitness and natural selection prefers this chromosome.

2. Fitness of chromosomes is influenced by the nearest neighborhood of their positions on the fitness landscape. This means that chromosomes are capable of learning, during which they search throughout their nearest neighborhood on the fitness landscape. As a result of this process, chromosomes with the same fitness can be distinguished by the presence of higher-fitness positions in their nearest neighborhood (see Figure 1). These chromosomes are more effective in the further evolution process than those that are not able to look for higher-fitness chromosomes in the nearest neighborhood. The role of learning in evolutionary theory is called the Baldwin effect [4, 8, 34]. Its first study using genetic algorithms (GAs) was done by Hinton and Nowlan [19] (cf. [25]). However, a better model of the learning process should also represent the quality of a learning ability by the size of the neighborhood searched around a chromosome. Very complex chromosomes could then theoretically code not only their position on a landscape, but their own learning algorithm as well.

3. Fitness of chromosomes is determined not only by the nearest neighborhood of their positions on the fitness landscape but also by so-called memes, which carry the information that is able to increase the fitness of chromosomes. The memetic part of a chromosome is used by offspring at the beginning of their life period for a preadaptation of their cognitive device. Dawkins introduced the idea of memes in his famous book The Selfish Gene [11]. An incorporation of memes into EAs has been previously studied [25].

All these three different levels of sophistication of EAs may be formally considered as evolutionary steps of Darwinian evolution itself. This view of evolution distinguishes the complexity of the fitness evaluation process. On the first, lowest level, fitness is fully determined by chromosome composition only. The second level corresponds to the previous first level that is enriched by a possibility of learning; that is, chromosomes are capable of local search in their nearest neighborhoods. Finally, the third level is nothing but the second level enlarged by memes; that is, chromosomes are not only capable of learning but also capable of employing memes in the fitness evaluation process. We have to emphasize that in all these three levels the evolution is strictly Darwinian; we


Figure 2. Schematic representation of an elementary asymmetric act of communication between two agents. The agent speaker maps its internal state (represented by a binary meaning vector) onto a signal (represented by a symbol string). The agent listener receives this signal and transforms it into a binary meaning vector. In the process of learning, the agent listener modifies its cognitive device such that the difference between the (slightly distorted) speaker meaning vector and the listener meaning vector is decreased.

do not postulate that traits acquired by learning are inherited by offspring. The learning process is used only for a fitness evaluation of chromosomes, not for their modification.

3 Specification of Agents

Let $P = \{A_1, A_2, \ldots, A_p\}$ be a population of agents that mutually interact in such a way that between two agents runs an asymmetric interaction (i.e., communication) in which the first agent is declared a speaker and the second a listener. We will postulate the following properties of agents (see Figure 2):

1. Each agent is capable of sending signals to another agent, where signals are strings of symbols from an alphabet $B = \{a, b, \ldots\}$.

2. Internal states of agents are described by binary meaning vectors $y = (y_1, y_2, \ldots, y_n) \in \{0,1\}^n$; all allowed meaning vectors form a set $A \subseteq \{0,1\}^n$.

3. "Cognitive devices" or "brains" of agents are modeled by simple neural networks (composed of one hidden layer) that are capable of the following two mutually inverse activities: the agent listener maps received signals onto internal states (a direct cognitive task), and the agent speaker maps internal states onto signals (the inverse cognitive task to the previous direct task).

4. The listener recognizes to some extent the internal states of speakers, and uses these internal states for a learning process. The parameters of the listener's cognitive device are modified so that the difference between the produced meaning vector and the recognized speaker's meaning vector is slightly decreased.

One possible data structure for such an agent is sketched below.
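To make these properties concrete, the following sketch (in Python with NumPy; the language choice and all names, such as Agent and ALPHABET, are ours and not the authors') shows one plausible representation. The two-symbol alphabet and the 8-dimensional meaning vectors anticipate the simulations of Section 8.

```python
import numpy as np

ALPHABET = ["a", "b"]   # symbol set B; the simulations of Section 8 use two symbols
MEANING_DIM = 8         # dimension n of binary meaning vectors (cf. Table 1)

class Agent:
    """One cognitive agent: a small recurrent network plus a meme (Section 6).
    A zero weight encodes an absent connection (see Section 3.1), which lets
    structural mutations add or delete edges by toggling matrix entries."""

    def __init__(self, n_hidden, rng):
        # The blocks of the weight matrix of Figure 3: input->hidden,
        # hidden->hidden (arbitrary, possibly cyclic), and hidden->output.
        self.w_ih = rng.uniform(-1, 1, (n_hidden, len(ALPHABET)))
        self.w_hh = rng.uniform(-1, 1, (n_hidden, n_hidden))
        self.w_ho = rng.uniform(-1, 1, (MEANING_DIM, n_hidden))
        self.theta_h = rng.uniform(-1, 1, n_hidden)      # hidden thresholds
        self.theta_o = rng.uniform(-1, 1, MEANING_DIM)   # output thresholds
        self.meme = {}       # meaning vector (tuple) -> token string (Section 6)
        self.fitness = 0.0

rng = np.random.default_rng(0)
agent = Agent(n_hidden=8, rng=rng)   # a population would hold many such agents
```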

As was mentioned above, each agent is endowed with a cognitive device. Its application to the transformation of a token string $x$ composed of symbols $\{a, b, c, \ldots\}$


onto a binary meaning vector $y \in A \subseteq \{0,1\}^n$ is formally determined by a parametric mapping (realized by a neural network with weight coefficients $w$ and threshold coefficients $\vartheta$) as follows:

$G(w,\vartheta): \{a, b, c, \ldots\}^* \to \{0,1\}^n$   (1a)

$y = G(w,\vartheta; x)$   (1b)

where $\{a, b, c, \ldots\}^*$ is the set composed of all possible strings over the $q$ symbols $a, b, c, \ldots$, that is, $\{a, b, c, \ldots\}^* = \{a, b, \ldots, aa, ab, \ldots, aab, \ldots, baac, \ldots\}$. An inverse transformation $H(w,\vartheta)$ with respect to the mapping (1a–b) maps binary meaning vectors onto symbol strings:

$G^{-1}(w,\vartheta) = H(w,\vartheta): \{0,1\}^n \to \{a, b, c, \ldots\}^*$   (2a)

$x = G^{-1}(w,\vartheta; y) = H(w,\vartheta; y)$   (2b)

A construction of the inverse mapping $H(w,\vartheta)$ from the given mapping $G(w,\vartheta)$ is a nontrivial task that does not have a simple unambiguous solution. We outline the following solution of this problem, initially suggested by Batali [5, 6]. Let us assume that we have a binary vector $y \in \{0,1\}^n$. Our goal is to construct, by applying the mapping $G(w,\vartheta)$, from this binary vector $y$ a string $x \in \{a, b, c, \ldots\}^*$ composed of $l$ tokens:

$x = (x_1 x_2 \ldots x_l) = G^{-1}(w,\vartheta; y)$   (3)

The proposed recurrent construction is composed of the following steps:

Step 0. A binary vector $y \in A \subseteq \{0,1\}^n$ is given.

Step 1. The first token $x_1$ is determined by

$x_1 = \arg\max_{u \in B} \lceil G(w,\vartheta; u) - y \rceil$   (4)

where $0 \le \lceil z \rceil = \sum_{i=1}^{n} \delta(z_i) \le n$ is a norm that expresses the similarity between the mapping $G(w,\vartheta; u)$ and the required output $y$; we have used $\delta(z) = 1$ if $|z| \le 0.5$, and $\delta(z) = 0$ if $|z| > 0.5$. This means that the first token $x_1$ is constructed by minimizing the distance between the required output $y$ and the calculated output represented by $G(w,\vartheta; u)$.

Step $l$. The $l$th token $x_l$ is determined by

$x_l = \arg\max_{u \in B} \lceil G(w,\vartheta; x_1 \ldots x_{l-1} u) - y \rceil$   (5)

For example, the second token $x_2$ is found in such a way that the resulting string $(x_1 x_2)$ gives maximal similarity of $G(w,\vartheta; x_1 x_2)$ and $y$ with respect to its second constituent $x_2$. This recurrent process is stopped if the length $l$ is greater than a prescribed maximal length $l_{max}$ or the similarity norm has achieved its upper bound $n$.
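In code, this construction is a greedy token-by-token search. The sketch below is our own rendering (it reuses the Agent conventions sketched earlier in this section); the function decode stands for the direct mapping $G$ of Equation 1b and is passed in as a callable.

```python
def similarity(output, y):
    # The norm of Equation 4: the number of components of G(w, theta; x) lying
    # within 0.5 of the required output y, i.e., delta(z) = 1 iff |z| <= 0.5.
    return sum(1 for o, t in zip(output, y) if abs(o - t) <= 0.5)

def encode(decode, y, alphabet, l_max, n):
    """Greedy inverse mapping G^{-1} of Equations 3-5: grow the string one token
    at a time, always appending the token that maximizes the similarity norm."""
    x = []
    while len(x) < l_max:
        # Equation 5: choose u maximizing the similarity of
        # G(w, theta; x_1 ... x_{l-1} u) and y.
        best = max(alphabet, key=lambda u: similarity(decode(x + [u]), y))
        x.append(best)
        if similarity(decode(x), y) == n:
            break   # norm reached its upper bound n: the string decodes exactly
    return x
```

Note that the greedy choice is not guaranteed to find a globally optimal string; it merely makes the construction of $G^{-1}$ tractable.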


Figure 3. A decomposition of the matrix of weight coefficients and the vector of threshold coefficients into blocks.

3.1 Specification of Neural Networks

For our forthcoming considerations, the neural networks used are specified as follows:

$N = (S, w, \vartheta)$   (6)

where $w$ and $\vartheta$ are weight and threshold coefficients, and $S = (V, E)$ is an oriented graph composed of a neuron set $V$ and a connection set $E \subseteq V \times V$. These two sets are restricted by the following three conditions:

1. The neuron set V is composed of three disjoint subsets corresponding to input, hidden, and output neurons

$V = V_I \cup V_H \cup V_O$   (7)

2. The connection set E is composed of three disjoint subsets

$E = E_{IH} \cup E_{HH} \cup E_{HO}$, where $E_{IH} \subseteq V_I \times V_H$, $E_{HH} \subseteq V_H \times V_H$, $E_{HO} \subseteq V_H \times V_O$   (8)

3. Connections of $S$ are evaluated by weight coefficients, and hidden and output neurons are evaluated by threshold coefficients, so that a connection $e = (v_i, v_j)$ is evaluated by a weight coefficient $w_{ji}$, and a neuron $v_i$ (hidden or output, not input) is evaluated by a threshold coefficient $\vartheta_i$. In general, we postulate that if the connection $e = (v_i, v_j)$ does not belong to the set $E$, then the corresponding weight coefficient is zero, $w_{ji} = 0$. This gives us a simple formal device for determining the "topology" of the graph $S$: the nonzero weight coefficients indicate which edges (i.e., connections) belong to the set $E$. According to the first two properties, the matrix of weight coefficients and the vector of threshold coefficients are decomposed into blocks; see Figure 3.

A standard three-layer feed-forward neural network is specified by $E_{IH} = V_I \times V_H$, $E_{HH} = \emptyset$, and $E_{HO} = V_H \times V_O$; that is, it is composed of input, hidden, and output neurons; between input and hidden neurons and between hidden and output neurons there exist all possible connections; and no connections among hidden neurons appear. If the subset $E_{HH}$ is nonempty and its connections form at least one oriented


Figure 4. Three illustrative examples of neural networks. The left-hand neural network is a so-called feed-forward neural network: it does not contain connections between hidden neurons, and neurons from the juxtaposed layers (input-hidden and hidden-output) are fully connected.

Figure 5. The present type of neural network can be considered formally as a network composed of a layer of so-called hidden neurons, where arbitrary connections are allowed inside the hidden-neuron layer. Consequently, sequences of connections between hidden neurons may form oriented cycles (strictly forbidden in standard feed-forward networks). Between input and hidden neurons and between hidden and output neurons, only upward connections are allowed. The right-hand graph is a compressed version of the left-hand graph and will be used in our forthcoming considerations for the construction of the so-called unfolded neural network.

cycle, then the corresponding neural network is called recurrent. Illustrative examples of neural networks are displayed in Figure 4. Activities of neurons are formally determined as follows (see Figure 5):

$x = \text{input} \in \{a, b, c, \ldots\}^*$
$u = H(x, u)$   (9)
$y = O(u) \in \mathbb{R}^n$

Here we introduce the convention that if a hidden or output neuron has no input connections, then its activity is set to zero. These coupled nonlinear equations are solved by a simple iterative scheme. Assuming that the input $x$ is represented by a token string of length $t_{max}$, $x = (x^{(1)}, x^{(2)}, \ldots, x^{(t)}, \ldots, x^{(t_{max})})$, we get the following recurrent


Figure 6. Unfolded neural network with respect to the right-hand network displayed in Figure 5. Loosely speaking, while the original network in Figure 5 corresponds to the coupled equations (9), its unfolded version corresponds to an iterative solution of these equations, represented by the recurrent equations (10). Formally, the unfolded network may be considered as a parametric mapping of a token string $x$ onto a real vector $y \in [0,1]^n$ [see Equations (1a,b)].

scheme for the calculation of activities:

$u^{(t)} = H(x^{(t)}, u^{(t-1)}), \quad y^{(t)} = O(u^{(t)}) \qquad (t = 1, 2, \ldots, t_{max})$   (10)

where $u^{(0)}$ are the initial hidden activities; we set $u^{(0)} = 0$. This system of recurrent equations may be diagrammatically interpreted by the so-called unfolded neural network (see Figure 6). The unfolded neural network may be formally considered as a parametric mapping that assigns to the first $t$ tokens of the string $x$ an $n$-dimensional vector $y^{(t)} \in [0,1]^n$:

$y^{(t)} = G(w, \vartheta; x^{(1)}, x^{(2)}, \ldots, x^{(t)})$   (11)
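A sketch of the iterative scheme (10) for the Agent structure of Section 3 follows. The logistic transfer function, the one-hot coding of tokens, and the sign convention for thresholds are our assumptions; the paper does not spell them out.

```python
import numpy as np

TOKEN_INDEX = {"a": 0, "b": 1}   # one-hot positions of the tokens (our choice)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(agent, tokens):
    """Equation 10: u(t) = H(x(t), u(t-1)), y(t) = O(u(t)), with u(0) = 0.
    Returns y(t_max), the meaning vector computed for the whole token string."""
    u = np.zeros(agent.w_hh.shape[0])            # u(0) = 0
    y = np.zeros(agent.w_ho.shape[0])
    for tok in tokens:
        x_t = np.zeros(agent.w_ih.shape[1])
        x_t[TOKEN_INDEX[tok]] = 1.0              # present one token per time step
        u = sigmoid(agent.w_ih @ x_t + agent.w_hh @ u - agent.theta_h)
        y = sigmoid(agent.w_ho @ u - agent.theta_o)
    return y                                      # y(t_max) in [0, 1]^n
```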

An adaptation (learning) process of the unfolded neural network can be performed in the same way as for a feed-forward neural network; that is, we use an analogue of the so-called backpropagation technique [17] for the calculation of the partial derivatives of an objective function with respect to the weight and threshold coefficients. Let us consider a training set

$A_{train} = \left\{ x^{(1)} x^{(2)} \ldots x^{(t_{max})} / y_{req} \right\}$   (12)

An objective function that expresses discrepancies between calculated and required output activities is determined as follows:

$E = \frac{1}{2} \left( y^{(t_{max})} - y_{req} \right)^2$   (13)


An adaptation (learning) of the unfolded neural network is equivalent to a minimization of the objective function E with respect to weight and threshold coefficients

$(w_{opt}, \vartheta_{opt}) = \arg\min_{(w,\vartheta)} E(w, \vartheta)$   (14)

This optimization problem is most frequently solved by the so-called steepest descent gradient method based on successive updating of the weight and threshold coefficients

$w_{k+1} = w_k - \lambda\, \mathrm{grad}_w E(w_k, \vartheta_k)$, $\quad \vartheta_{k+1} = \vartheta_k - \lambda\, \mathrm{grad}_\vartheta E(w_k, \vartheta_k)$   (15)

where $\lambda > 0$ is the so-called learning coefficient (in our computational simulations we set $\lambda = 0.1$). The partial derivatives of the objective function $E$ with respect to the weight and threshold coefficients are calculated by using the backpropagation methodology enlarged for unfolded networks [17] (called backpropagation through time). The neural networks have been specified in a very general way. They are composed of three layers of input, hidden, and output neurons. Moreover, connections between input and hidden and between hidden and output neurons are allowed only of the upward type, whereas connections between hidden neurons may be of arbitrary type. This means that if a sequence of connections between hidden neurons forms an oriented cycle, then the neural network is of the so-called recurrent type. This deliberately loose specification of the neural networks used (ranging from feed-forward to recurrent networks) is advantageously exploited in our forthcoming considerations, where an evolution of cognitive agents (represented by neural networks) is studied such that the population is composed of agents equipped with neural networks of different architectures.
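The one-step adaptation used throughout the paper then amounts to a single update of Equation 15. In the sketch below the gradients are estimated by central finite differences rather than by backpropagation through time, purely to keep the example short; the update rule itself is the same. Skipping zero weights reflects our assumption that learning, like parametric mutation (Section 4), must not create new connections.

```python
import numpy as np

LAMBDA = 0.1   # learning coefficient of Equation 15

def objective(agent, tokens, y_req):
    # Equation 13: squared error between the final output y(t_max) and y_req.
    return 0.5 * np.sum((forward(agent, tokens) - y_req) ** 2)

def one_step_adaptation(agent, tokens, y_req, eps=1e-4):
    """One steepest-descent update (Equation 15) with finite-difference gradients."""
    coeffs = [(agent.w_ih, False), (agent.w_hh, False), (agent.w_ho, False),
              (agent.theta_h, True), (agent.theta_o, True)]
    for W, is_threshold in coeffs:
        flat = W.ravel()                    # a view: writes go through to W
        grad = np.zeros_like(flat)
        for i in range(flat.size):
            if not is_threshold and flat[i] == 0.0:
                continue                    # absent connection: keep topology fixed
            old = flat[i]
            flat[i] = old + eps
            e_plus = objective(agent, tokens, y_req)
            flat[i] = old - eps
            e_minus = objective(agent, tokens, y_req)
            flat[i] = old
            grad[i] = (e_plus - e_minus) / (2 * eps)
        flat -= LAMBDA * grad               # one step of Equation 15
```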

4 A Plasticity of Neural Networks

A plasticity of neural networks is of principal importance for an evolution of our cognitive agents toward the emergence of coordinated communication between them. We distinguish two types of plasticity [1, 2, 29] of neural networks:

1. Parametric plasticity: the weight coefficients $w$ and threshold coefficients $\vartheta$ are modified by a Gaussian noise.

2. Structural plasticity: the architecture specified by the oriented graph $S$ is modified so that a neuron or connection is either added or removed.

First, let us consider the parametric plasticity of the network $N = (S, w, \vartheta)$; its weight and threshold coefficients are modified (mutated) as follows:

$w'_{ij} = w_{ij} + r(0, \sigma)$, $\quad \vartheta'_i = \vartheta_i + r(0, \sigma)$   (16)

where $r(0, \sigma)$ is a random (Gaussian) number with a normal distribution, zero mean, and dispersion $\sigma$ (a small positive number). We require that parametric mutations keep the structural architecture of the neural networks; this means that the above formulae for the mutation of weight coefficients are applied only if the corresponding coefficients are not equal to zero. Formally, the parametric mutation can be represented by a stochastic operator $O_{par\text{-}mut}$ that maps a neural network onto another neural network. Both the original and


the mutated neural networks are specified by the same graph $S$, while the weight and threshold coefficients of the mutated network are formed from the coefficients of the original network by adding Gaussian noise:

$N' = O_{par\text{-}mut}(N)$, where $N = (S, w, \vartheta)$ and $N' = (S, w', \vartheta')$   (17)

Second, the structural plasticity is realized by so-called structural mutations, specified as follows:

$N' = O_{struct\text{-}mut}(N)$, where $N = (S, w, \vartheta)$ and $N' = (S', w', \vartheta')$   (18)

where the stochastic operator $O_{struct\text{-}mut}$ mainly affects the graph $S$, whereas the weight coefficients remain untouched; only the threshold coefficient of a newly created/deleted neuron must also be created/deleted. The offspring graph $S'$ is created from the parent graph $S$ by making one of the following "mutations" (see Figure 7):

(M-1) new hidden neurons are introduced, or
(M-2) isolated hidden neurons are removed, or
(M-3) new oriented edges are introduced, or
(M-4) old oriented edges are removed.

The introduced neurons or edges (types M-1 and M-3) are evaluated by randomly generated coefficients (threshold and weight, respectively) from the interval $(-1, 1)$.
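Both operators can be sketched as follows. For the structural operator we show only the edge-adding mode M-3; modes M-1, M-2, and M-4 follow the same pattern but must also grow or shrink the matrices when hidden neurons are added or removed. The per-coefficient mutation probability of 0.05 anticipates Section 8.

```python
import numpy as np

P_LOC_MUT = 0.05   # per-coefficient mutation probability (Section 8)

def parametric_mutation(agent, sigma, rng):
    """Equation 16: add Gaussian noise r(0, sigma) to coefficients. Zero weights
    encode absent connections and are left untouched, so the graph S is kept."""
    for W in (agent.w_ih, agent.w_hh, agent.w_ho):
        mask = (W != 0) & (rng.random(W.shape) < P_LOC_MUT)
        W[mask] += rng.normal(0.0, sigma, size=int(mask.sum()))
    for theta in (agent.theta_h, agent.theta_o):
        mask = rng.random(theta.shape) < P_LOC_MUT
        theta[mask] += rng.normal(0.0, sigma, size=int(mask.sum()))

def structural_mutation_add_edge(agent, rng):
    """Mutation M-3: turn one zero entry of a randomly chosen weight block into
    a random coefficient from (-1, 1), i.e., introduce a new oriented edge."""
    W = (agent.w_ih, agent.w_hh, agent.w_ho)[rng.integers(3)]
    zeros = np.argwhere(W == 0)
    if len(zeros):
        i, j = zeros[rng.integers(len(zeros))]
        W[i, j] = rng.uniform(-1, 1)
```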

5 Elementary Communication Acts and Fitness Evaluation

Let us study two agents $A_S$ and $A_L$, where the first one is declared a speaker and the other a listener (see Figure 2). Their cognitive devices (represented by neural networks) are denoted $G_S(w_S, \vartheta_S)$ and $G_L(w_L, \vartheta_L)$. Let us assume that the speaker's internal state is described by the binary meaning vector $y \in A \subseteq \{0,1\}^n$; then, applying his neural network $N_S(S_S, w_S, \vartheta_S)$ (represented by the mapping $G_S(w_S, \vartheta_S)$) to the meaning vector, we get a symbol string (i.e., signal) $x = G_S^{-1}(w_S, \vartheta_S; y) \in \{a, b, c, \ldots\}^*$; this signal is sent to the listener $A_L$. On the opposite side of the communication channel, the listener receives the signal $x$, and by applying the cognitive device $G_L(w_L, \vartheta_L)$ the string is decoded into a real vector $y' = G_L(w_L, \vartheta_L; x) \in [0,1]^n$. Let us define a distance between the meaning vectors $y$ and $y'$ by

$d(y, y') = \frac{1}{n} \sum_{i=1}^{n} |y_i - y'_i|$   (19)

For coordinated communication between speaker and listener the above distance should vanish; that is, both meaning vectors $y$ and $y'$ become identical. Let us define a fitness increment as follows:

$\Delta f = \frac{1}{\varepsilon + d(y, y')}$   (20)

where $\varepsilon$ is a small positive constant, which ensures that the denominator is always positive (we put $\varepsilon = 0.01$). The increment is larger if the distance $d(y, y')$ is smaller;


Figure 7. Four different allowed types of mutation of a neural network. The first two types either add (M-1) or remove (M-2) an unconnected hidden neuron, whereas the last two types of mutation either add (M-3) or remove (M-4) a connection.

for example, if $d(y, y') = 0$, then $\Delta f = 1/\varepsilon$. This fitness increment properly reflects the following important condition required of both agents that take part in the given elementary communication act: their cognitive mappings $G(w, \vartheta)$ must be unambiguous. This requirement means that the emerged communication should be coordinated: both sides in the elementary communication act (speaker and listener) should use equivalent cognitive devices that are mutually inverse. Formally, a mapping $G(w, \vartheta)$ is called unambiguous if

$G[w, \vartheta; G^{-1}(w, \vartheta; y)] = y$   (21)

for each binary meaning vector $y \in A$. A pseudo-Pascal implementation of the above description of the fitness calculation is outlined in Algorithm 1 (see Figure 8). This evaluation is composed of repeated calls of elementary communication acts between two agents, one of them declared the speaker and the other the listener. In our simulations the concept of fitness is used not only for the stochastic selection of the best-fitted agents participating in the reproduction process, but also for the construction of the parameters that specify mutations. Let $f_{max}$ be an estimated maximal fitness (i.e., the fitness of all agents at all evolutionary stages is bounded from above by this estimate). Then for each agent selected for reproduction a so-called temperature [29] is introduced as follows:

$T(A) = 1 - \frac{f(A)}{f_{max}}$   (22)


Figure 8. Algorithm 1: pseudo-Pascal implementation of elementary communication acts (see Figure 2). At the beginning of the algorithm all fitness values are set to zero. The fitness evaluation consists of repeated applications of elementary communications for all possible speakers, listeners, and speaker meaning vectors. The speaker maps the meaning vector $y$ onto a message $x$; the listener receives the message $x$ and decodes it into a meaning vector $y'$. At the end of an elementary communication act, the neural network of the listener is adapted so that the required output is closer to the speaker meaning vector $y$. A small positive constant delta prevents division by zero.
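Since the pseudo-Pascal listing of Figure 8 is not reproduced here, the following Python rendering reconstructs it from the description above, reusing the helpers encode, forward, and one_step_adaptation sketched in the previous sections. We assume the fitness increment of Equation 20 is credited to both participants of the communication act.

```python
import numpy as np

EPSILON = 0.01   # the small constant of Equation 20
L_MAX = 10       # maximal message length (Section 8)

def fitness_evaluation(population, meanings):
    """Algorithm 1: every ordered pair (speaker, listener) communicates every
    training meaning vector; the listener then adapts toward the speaker's y."""
    for agent in population:
        agent.fitness = 0.0
    for speaker in population:
        for listener in population:
            if listener is speaker:
                continue
            for y in meanings:
                y = np.asarray(y, dtype=float)
                # The speaker codes y into a message x (inverse task, Eqs. 3-5).
                x = encode(lambda s: forward(speaker, s), y, ALPHABET, L_MAX, len(y))
                y_dec = forward(listener, x)         # listener decodes (direct task)
                d = np.mean(np.abs(y - y_dec))       # distance, Equation 19
                df = 1.0 / (EPSILON + d)             # fitness increment, Equation 20
                speaker.fitness += df
                listener.fitness += df
                one_step_adaptation(listener, x, y)  # learning step of Figure 2
```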

This "temperature" is used for the calculation of the parameter $\sigma$ [see Equation (16)] specifying parametric mutation:

$\sigma = T(A)\, \sigma_{max}$   (23)

where $\sigma_{max}$ is a prescribed maximal value of the mutational dispersion. This construction of the parameter $\sigma$ means that at the beginning of evolution ($f \ll f_{max}$) it approaches the greatest value $\sigma_{max}$. On the other hand, if a selected agent has fitness close to the maximal value $f_{max}$, then the parameter $\sigma$ vanishes. The same approach is applicable also to structural mutations. Let $\Delta_{max}$ be the maximal number of neurons/connections that are allowed to be removed or introduced by one structural mutation. An actual value $0 \le \Delta \le \Delta_{max}$ of this number is determined as follows:

$\Delta = \lceil T(A)\, \Delta_{max} \rceil$   (24)

where $\lceil x \rceil$ is $x$ rounded to the closest integer. This means that the temperature $T(A)$ properly controls the size of mutations irrespective of whether they are of structural or parametric type. At the beginning of evolution, when the fitness of agents is very far from the estimated maximal value $f_{max}$, the mutations have a very intensive character; at the end of evolution, when many agents have already achieved the estimated fitness $f_{max}$, the modifying effect of mutation vanishes.
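In code, Equations 22 through 24 reduce to a few lines. The sketch below assumes f_max, sigma_max, and delta_max (the estimated maximal fitness, the maximal dispersion, and the maximal structural step) are chosen by the experimenter.

```python
def mutation_parameters(agent, f_max, sigma_max, delta_max):
    """Equations 22-24: mutation intensity is scaled by the 'temperature' T(A);
    it is large for poorly fitted agents and fades as fitness approaches f_max."""
    T = 1.0 - agent.fitness / f_max    # temperature, Equation 22
    sigma = T * sigma_max              # dispersion of parametric mutations, Eq. 23
    delta = round(T * delta_max)       # size of structural mutations, Eq. 24
    return sigma, delta
```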

6 Dawkins’ Memes

As was already mentioned in Section 2, Dawkins’ memes represent an important method for accelerating Darwinian evolution, where inheritance of acquired traits is not allowed.


The fitness of a chromosome that is accompanied by a meme may be considerably greater than the fitness of a solo chromosome. Memes are conveyors of "cultural" (extragenetic) information that may be very important for increasing chromosome fitness. In the next part of this section we formulate a very restrictive theory of memes that are transmitted only from parents to children (so-called vertical transmission), while typical Dawkins' memes are transmitted by horizontal transmission between two individuals, which accelerates evolution more progressively. This restriction was introduced to get a model that could be both easily implemented and analyzed. A meme $M$ is assigned to an agent $A$ with a cognitive device represented by the mapping $G(w, \vartheta)$. Let us have a meaning vector $y \in A \subseteq \{0,1\}^n$; applying the inverse mapping $G^{-1}(w, \vartheta)$, the binary vector $y$ is mapped onto a token sequence $x = G^{-1}(w, \vartheta; y)$, and we get a pair $x/y$. A meme $M$ is determined as an ensemble of the pairs string/meaning:

$M = \left\{ x = G^{-1}(w, \vartheta; y)/y;\ y \in A \right\}$   (25)

Let a chromosome of the agent $A$ be represented by the neural network $N = (S, w, \vartheta)$; then a complex chromosome-meme is expressed as an ordered couple of the neural network $N$ and the meme $M$:

$X = (N, M)$   (26)

To complete the specification of the complex chromosome-meme, we have to indicate how its memetic part is utilized in the course of the agent's life-span. At the beginning of the life-span of an agent-offspring $X = (N = (S, w, \vartheta), M)$, an adaptation process of the agent neural network $N = (S, w, \vartheta)$ is carried out for a prescribed number of steps with respect to the meme $M$ (remember that the meme $M$ has the structure of a training set: it is composed of pairs of the form input/required-output). We get an "educated" agent-offspring $X' = (N' = (S, w', \vartheta'), M)$. Loosely speaking, these adaptations can be considered as "kindergartens" where agents at the beginning of their life-span are educated with respect to the memes $M$ inherited from their parents. The main part of an agent's life-span is its participation in random elementary communication acts, where it plays either the role of speaker or of listener. If the agent was the listener, then its cognitive device (initially adapted with respect to its meme $M$) undergoes a one-step adaptation such that the required output state is the speaker meaning vector $y$. This is done in the framework of the fitness evaluation specified in Section 5 (see Figure 8). At the end of an agent's life-span the meme $M$ is updated [see Equation (25)]: by making use of the adapted cognitive device, new input strings $x = G^{-1}(w, \vartheta; y)$ are generated for each $y \in A$. Consequently, after finishing the fitness evaluation process and the updating of its meme, the original agent-offspring has passed through four stages (see Figure 9).
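The two meme operations, the initial "kindergarten" adaptation and the end-of-life update of Equation 25, can be sketched as follows (again reusing the helpers from the previous sections; storing the meme as a dictionary from meaning vectors to token strings is our own representation choice):

```python
import numpy as np

N_ADAPT = 50   # number of kindergarten adaptation steps (Section 8)

def kindergarten(agent):
    """Stage 1 of the life-span: adapt the offspring's network for a prescribed
    number of steps, the inherited meme playing the role of a training set."""
    for _ in range(N_ADAPT):
        for y, x in agent.meme.items():
            one_step_adaptation(agent, list(x), np.asarray(y, dtype=float))

def update_meme(agent, meanings):
    """Stage 3 / Equation 25: store, for every meaning vector y, the string that
    the agent's current (learned) network produces for it."""
    agent.meme = {tuple(y): encode(lambda s: forward(agent, s),
                                   np.asarray(y, dtype=float),
                                   ALPHABET, 10, len(y))
                  for y in meanings}
```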

7 Computational Simulation of Darwinian Evolution (Evolutionary Algorithms)

To simulate Darwinian evolution we use a simple version of an evolutionary algorithm (it may be considered a rudimentary version of the famous genetic algorithm (GA) [13, 20] with the crossover operation removed; in the literature such an EA is called evolutionary programming [12]); see Figure 10. We shall postulate that the population is composed of simple cognitive agents that are endowed with neural networks and are


Figure 9. Four stages of an agent's life-span. In the first stage, at the beginning of the life-span, an agent-offspring is adapted (educated in a kindergarten) for a prescribed number of steps with respect to a meme M inherited from its parent. In the second stage the agent stochastically undergoes elementary communication acts. If it is a listener, then its cognitive device is modified by a one-step adaptation with respect to the speaker meaning vector y. In the third stage, the agent updates its meme using its current learned neural network; the meaning vectors y of the meme remain the same, but the token strings x are updated. In the fourth stage, the agent is ready to take part in a reproduction process, and a memory of its previous life-span is deposited in the updated meme M′. The weight and threshold coefficients modified during its life-span are substituted by their initial values. This meme is transferred during the reproduction process to offspring. In the course of the reproduction process, the offspring neural network is created from the parent neural network so that either its structure or its weight and threshold coefficients (not both simultaneously) are slightly modified (i.e., mutated).

capable of pairwise elementary communication. In our approach the Darwinian evolution involves both of the constituents that are important for its acceleration, in particular the Baldwin effect and Dawkins' memes. A computer simulation of the evolution of cognitive agents and the emergence of coordinated communication between them can be expressed as a sequence of the following steps:

Step 1. To initiate the algorithm, a population of agents is randomly generated. In this stage agents are represented by neural networks only; their chromosome memetic parts are empty.

Step 2. If agents have nonempty chromosome memetic parts, then the agents are adapted for a prescribed number of steps. This adaptation is carried out with respect to the memetic parts, which are composed of parental couples of input (token string) and required output (meaning vector).

Step 3. Each agent's fitness is evaluated. This evaluation is realized so that for the whole population a communication act is carried out for all possible pairs of agents. In each pair one is declared a speaker and the other a listener. The speaker codes a meaning vector into a signal, which is sent to the listener. The listener decodes this signal into a meaning vector and carries out a one-step adaptation of its neural network so that the required output is closer to the speaker meaning vector.

Step 4. Agents update their chromosome memetic parts by making use of their current neural networks; see Figure 9.

Step 5. A new population is created: we successively quasirandomly select the best-fitted agent-parents, and the selected agents are shifted to the new


Figure 10. A diagrammatic outline of the simplified version of the evolutionary algorithm used. A population P is composed of chromosomes that are evaluated by positive real numbers called the fitness. The fitness reflects the quality of a chromosome: better quality means greater fitness. By repeatedly applying a process of reproduction we create from the old population P a new population P′. The process of reproduction is composed of two subprocesses called selection and mutation. The subprocess of selection means that a chromosome is selected from the population P quasirandomly, depending on its fitness (fitter chromosomes have a greater chance to be selected). The second subprocess, mutation, corresponds to a small random change of the chromosome. The created chromosome is transferred to the new population P′; when this population contains the same number of chromosomes as the old population P, then P ← P′ and the whole process is repeated. The algorithm is stopped if some convergence criteria are met, for example, if the population P is very homogeneous, that is, composed of almost identical chromosomes.

population composed of agent-offspring. Since our evolution is Darwinian, offspring inherit from parents only the original weight and threshold coefficients that were not modified by adaptation processes. Moreover, offspring neural networks are created from parent neural networks so that they are either structurally or parametrically mutated (not both).

Step 6. The new population replaces the old population.

Step 7. Steps 2–6 are repeated until the convergence criteria are fulfilled (or the algorithm has run for a prescribed number of epochs); then we go to Step 8.

Step 8. End of algorithm.
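Assembled from the helpers sketched in the previous sections, Steps 1 through 7 become the following loop (our own reconstruction; in particular, the restoration of the parent's pre-learning coefficients before reproduction, described in Figure 9, is indicated only by a comment):

```python
import copy
import numpy as np

def evolve(pop_size, meanings, epochs, f_max, sigma_max, delta_max, rng):
    population = [Agent(n_hidden=int(rng.integers(1, 16)), rng=rng)
                  for _ in range(pop_size)]                       # Step 1
    for _ in range(epochs):                                       # Step 7 loop
        for agent in population:
            if agent.meme:
                kindergarten(agent)                               # Step 2
        fitness_evaluation(population, meanings)                  # Step 3
        for agent in population:
            update_meme(agent, meanings)                          # Step 4
        f = np.array([a.fitness for a in population])
        probs = f / f.sum()                                       # fitness-proportional
        new_population = []
        for _ in range(pop_size):                                 # Step 5
            parent = population[rng.choice(pop_size, p=probs)]
            # The paper substitutes coefficients modified during the parent's
            # life-span by their initial values before copying; omitted here.
            child = copy.deepcopy(parent)
            sigma, _ = mutation_parameters(parent, f_max, sigma_max, delta_max)
            if rng.random() < 0.5:
                parametric_mutation(child, sigma, rng)            # Equation 16
            else:
                structural_mutation_add_edge(child, rng)          # one of M-1..M-4
            new_population.append(child)
        population = new_population                               # Step 6
    return population
```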

The algorithm has a very simple interpretation. In Step 1 a population of cognitive agents represented by neural networks is randomly generated. We postulate that these neural networks have both structural and parametric plasticity, so that within the reproduction process they may be slightly varied. In this initial step agents are not yet featured with memes in which a "cultural heritage" of parents is built in and transferred from one generation to the next. In Step 2 agents are "educated" (if agent memes are not empty, i.e., if agents are not in the first, or initial, epoch). The weight and threshold coefficients specifying the parametric properties of the neural network are modified for a prescribed number of steps by an adaptation process performed with respect to the corresponding meme. Loosely speaking, the inherited meme can be considered as a "cultural heritage" transferred from parents to offspring, and its information content serves as a primer for the offspring's education at the initial stage of its life-span (in a kindergarten). In Step 3 all agents undergo a fitness evaluation process composed of a prescribed number of repetitions of elementary communication acts. It is very important that in the course of this evaluation process agents are not kept fixed: their weight coefficients are modified whenever the agent has been selected as a listener. More specifically, a listener transforms by its neural network a received signal into a meaning vector; then the listener network performs a one-step adaptation so that


Table 1. Set of meaning vectors.

No.   Meaning vector   No.   Meaning vector   No.   Meaning vector   No.   Meaning vector
1*    (11000110)       5     (11001001)       9     (11001100)       13    (11000011)
2     (00110110)       6*    (00111001)       10    (00111100)       14    (00110011)
3     (01010110)       7     (01011001)       11*   (01011100)       15    (01010011)
4     (10100110)       8     (10101001)       12    (10101100)       16*   (10100011)

* Meaning vectors labeled with a star belong to the test set A_test; the remaining meaning vectors belong to the training set A_train.

the required output is closer to the speaker meaning vector. A result of this adaptation is a decrease in the differences between the calculated and required meaning vectors. In the forthcoming Step 4 agents update their memes in such a way that they are recalculated by making use of the current versions of the agent neural networks. In other words, experienced agents at the end of their life-span summarize their knowledge for successors (i.e., offspring) in the form of memes. In Step 5 a reproduction process is applied to the whole population, and a new population composed of offspring is created. The impossibility of directly transferring acquired knowledge from parent to offspring neural networks is partially compensated by the assumption that Dawkins' memes are taken into account. Memes are conveyors of useful information from parents to offspring. In Step 6 the new population replaces the old population. In Step 7 the criteria of convergence are tested. One of the simplest approaches to finishing the algorithm is to prescribe the number of evolutionary epochs.

8 Computational Simulations

The theory just outlined will now be applied to computational simulations of the emergence of coordinated communication. First, we introduce meaning vectors that are structured so that we can expect the emerged coordinated communication to be endowed with structural regularities similar to those in the meaning vectors. This is considered a manifestation of grammar in the emerged coordinated communication. Let the meaning vectors be binary vectors created by the Cartesian product of two subsets of binary vectors (see Table 1):

$A = A_1 \times A_2$

$A_1 = \{(1100), (0011), (0101), (1010)\}$, $\quad A_2 = \{(0110), (1001), (1100), (0011)\}$

This means that each of the 16 eight-dimensional meaning vectors is composed of two parts that originate from the subsets $A_1$ and $A_2$ (see Table 1). Moreover, the set $A$ is divided into disjoint subsets $A_{train}$ and $A_{test}$, $A = A_{train} \cup A_{test}$, where the testing subset $A_{test}$ is composed of the starred meaning vectors from Table 1. The simulation of evolution will be performed with respect to the training set of meaning vectors, and we expect that the emerging coordinated communication is capable of correctly mapping the testing meaning vectors into token strings, using the same grammar as for the training meaning vectors. Our computational simulations were done with the following specifications and numerical values:

1. The population is composed of 30 agents that are represented by neural networks. The evolutionary process is initiated by a random generation of all


agents so that the number of input and output neurons is always kept fixed, but the number of hidden neurons varies and is bounded from above by 15 neurons. Initial values of weight and threshold coefficients are taken from the open interval $(-1, 1)$. Moreover, connections are generated randomly, so that between hidden neurons there may exist feedback connections. The general architecture of the allowed neural networks is fully specified in Section 3.

2. The evaluation of agent fitness is specified by Algorithm 1 (see Figure 8). For all possible pairs (speaker and listener), all meaning vectors from the training set $A_{train}$ are used in elementary communication acts. The meaning vectors are coded by speakers into strings of maximal length 10 composed of the two symbols $\{a, b\}$.

3. The evolutionary algorithm is specified in Section 7, Figure 10. The chosen algorithm is very similar to Fogel's evolutionary programming [12], where crossover between two chromosomes is ignored. The reproduction process is based on a quasirandom selection of agents (the probability of this selection is proportional to agent fitness); the selected agent then undergoes, with equal probability, either a structural or a parametric mutation. The four different modes of structural mutation are displayed in Figure 7. Parametric mutations are performed such that, with a small probability $P_{loc\text{-}mut} = 0.05$, each weight and threshold coefficient is perturbed by the addition of a random number $r(0, \sigma)$ with deviation $\sigma = 0.1$.

4. Each evolutionary epoch is started by an "education" process of agent-offspring with respect to their memes, which are composed of pairs of meaning vector and signal. Agents are adapted for $N_{adapt} = 50$ steps, the meme being considered as a training set. The learning coefficient for the neural network adaptation processes is $\lambda = 0.1$.
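The meaning vector set of Table 1 and its train/test split can be generated directly from the two subsets (a short sketch; the variable names are ours):

```python
import itertools

# Table 1: each 8-dimensional meaning vector concatenates one element of A1
# with one element of A2 (the Cartesian product A = A1 x A2).
A1 = ["1100", "0011", "0101", "1010"]
A2 = ["0110", "1001", "1100", "0011"]
A = [tuple(int(b) for b in p + q) for p, q in itertools.product(A1, A2)]

# The four starred vectors of Table 1 form the test set; the rest are trained on.
A_TEST = {(1, 1, 0, 0, 0, 1, 1, 0), (0, 0, 1, 1, 1, 0, 0, 1),
          (0, 1, 0, 1, 1, 1, 0, 0), (1, 0, 1, 0, 0, 0, 1, 1)}
A_TRAIN = [y for y in A if y not in A_TEST]

assert len(A) == 16 and len(A_TRAIN) == 12
```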

Figures 11 through 13 summarize our numerical results. The plots unambiguously show that a coordinated communication emerges between agents that are capable of pairwise elementary communication acts together with a learning process. This observation means that the cognitive devices of all agents evolved so that they mapped training vectors onto signals (strings of symbols) in the same manner. In our simulation the Baldwin effect represents an important acceleration of Darwinian evolution; it was accounted for simply by allowing a learning process during the life-span of agents and a plasticity of their cognitive devices (neural networks). We distinguish two types of plasticity, structural and parametric; both are included in our EA when reproduction of parents into offspring is performed. The Baldwin effect may be turned off if the learning process at the end of each elementary communication act is removed (e.g., by setting the learning coefficient $\lambda = 0$). The importance of memes in our computational simulations is illustrated in Figure 12. Its plots unambiguously indicate that the inclusion of Dawkins' memes is of primary importance for the emergence of coordinated communication. It may be assumed that a plasticity of agent cognitive devices together with a learning process within elementary communication acts are necessary requirements for the evolutionary emergence of coordinated communication. This may be correct for simpler cognitive tasks rather than for highly developed and structured coordinated communication. A population of agents endowed with relatively complex cognitive devices (represented by recurrent neural networks) can hardly master coordinated communication using simple Darwinian evolution, even when accelerated by the Baldwin effect. This problem can be surmounted only when Dawkins' memes are used. In general, the simplest cognitive tasks can be built into an agent genotype by the Baldwin effect, but this is obviously


Figure 11. Three different diagrams that show numerical results obtained by our calculations. Diagram A plots the mean distance between speaker and listener meaning vectors, calculated over elementary communication acts. We see that this mean distance "monotonously" decreases from the initial value 0.5 to small positive values. This means that coordinated communication spontaneously emerged in the population of agents; that is, listeners correctly decoded received signals into meaning vectors equal to those that the speakers used for the creation of their signals. Diagram B shows that the mean length of signals sent by speakers to listeners "monotonously" decreases in the course of the evolution of the population. In the initial stage of evolution all agents send signals of maximal length; the signals get progressively shorter during evolution; at the end of evolution the length of sent signals reaches a minimum. This observation implies that the cognitive devices used (neural networks) spontaneously tend to create shorter messages during the emergence of coordinated communication. The last diagram, C, shows the fraction of memes that have the same signal parts as the currently best-fitted agent. At the beginning of evolution all agents incorrectly map meaning vectors onto signals of the maximal length composed of the same symbol (e.g., aaa...aa or bbb...bb). This is the main reason why in this initial stage of evolution a relatively high proportion of all memes is equal to the memes of the current best-fitted agent, and therefore the proportion of correct memes on the plot is high. As evolution progresses, this fraction dramatically decreases almost to zero. Then, in the third and final stage of evolution, when coordinated communication evolves between agents and all agents start to produce the same signals for the same meaning vectors, the fraction of matching signals from the memes rapidly increases to a value slightly smaller than one. In other words, we may say that in a population with well-evolved coordinated communication all agents use the same "vocabulary"; that is, all cognitive devices produce the same one-to-one mapping of meaning vectors onto signals.

impossible for more complicated cognitive tasks, such as coordinated communication. Signals produced by the best-fitted agent at the end of evolution are summarized in Table 2. In accordance with our previous discussion (see Diagram A in Figure 11), this assignment of meaning vectors to the corresponding signals is unambiguous and correct for all population agents. In other words, we may say that Table 2 is a vocabulary of the emerged coordinated communication, in which each meaning vector is represented by a string composed of the symbols a and b, and conversely. Moreover, this representation of meaning vectors by symbol strings also manifests an outline of grammar; for example, all meaning vectors starting with the sequences 1100... and 0101... (0011... and 1010...)


Figure 12. Two independent plots of the mean distance for simulations with and without memes. We see that if memes are not considered in our evolutionary simulations of the emergence of coordinated communication, then the mean distance stays fixed at about 0.5; that is, no coordinated communication between agents exists.

Figure 13. The plasticity evolution of cognitive devices (neural networks) is illustrated by three different plots of the fractions of weight coefficients that have nonzero values (i.e., that correspond to connections between the respective neurons). We see that at the beginning of evolution all three proportions start at 0.30 (meaning that only 30% of the possible connections between input-hidden, hidden-hidden, and hidden-output neurons, respectively, were randomly generated). The most dramatic changes appeared for connections between input and hidden neurons: their frequency increased from 30 to 60%. This decisive change may suggest that the plasticity of the neural network architecture is substantial for coordinated communication.

The plasticity of the cognitive devices of agents is illustrated in Figure 13. The neural networks used in our study as cognitive devices are capable of changing their architectures in the course of the reproduction process (i.e., when offspring are created from parents). In general, the neural networks have a fixed number of input and output neurons (these numbers are determined by the dimension of the binary vectors that code string symbols and by the dimension of the meaning vectors, respectively). The numbers of hidden neurons and of connections between input-hidden, hidden-hidden, and hidden-output neurons may vary in the course of population evolution. Mutations of the agent "genotype" during reproduction are of two types, structural and parametric; structural mutation in particular is responsible for the changes of neural network architectures during the evolution of agents. As one can see in Figure 13, neural connections are tuned into positions that are best fitted for the performance of coordinated communication (and also for its emergence).
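A minimal sketch of the two mutation types just described is given below, under the assumption that the genotype is represented as a dense weight matrix with zeros for absent connections; the mutation rates and noise magnitudes are illustrative, not the values used in the paper.

    import numpy as np

    def parametric_mutation(w, rate=0.05, sigma=0.1, rng=None):
        """Perturb a random subset of existing (nonzero) weights by
        small Gaussian noise; the architecture is unchanged."""
        rng = rng or np.random.default_rng()
        mask = (w != 0) & (rng.random(w.shape) < rate)
        return w + mask * rng.normal(scale=sigma, size=w.shape)

    def structural_mutation(w, rate=0.02, rng=None):
        """Add or delete connections: a deleted connection becomes
        zero, a newly added one receives a small random weight."""
        rng = rng or np.random.default_rng()
        flip = rng.random(w.shape) < rate
        added = flip & (w == 0)
        deleted = flip & (w != 0)
        w = np.where(deleted, 0.0, w)
        return np.where(added, rng.normal(scale=0.1, size=w.shape), w)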



Table 2. Meaning vectors and assigned strings produced by the best-fitted agent.

    No.  Meaning vector^a  String     No.  Meaning vector^a  String
    1*   (11000110)        b          9    (11001100)        baaa
    2    (00110110)        ab         10   (00111100)        aa
    3    (01010110)        a          11*  (01011100)        aaaa
    4    (10100110)        bb         12   (10101100)        ba
    5    (11001001)        baa        13   (11000011)        bbb
    6*   (00111001)        aaaa       14   (00110011)        abbb
    7    (01011001)        aaa        15   (01010011)        abb
    8    (10101001)        baaaa      16*  (10100011)        bbbb

^a Meaning vectors labeled with a star belong to the test set Atest; the remaining meaning vectors belong to the training set Atrain.
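The grammar-like regularity of this vocabulary can be verified directly. The following sketch groups the Table 2 entries by the first four bits of each meaning vector and reports the first symbol of each assigned string; within every group, all strings begin with the same symbol.

    # Vocabulary taken verbatim from Table 2.
    VOCABULARY = {
        "11000110": "b",     "11001100": "baaa",
        "00110110": "ab",    "00111100": "aa",
        "01010110": "a",     "01011100": "aaaa",
        "10100110": "bb",    "10101100": "ba",
        "11001001": "baa",   "11000011": "bbb",
        "00111001": "aaaa",  "00110011": "abbb",
        "01011001": "aaa",   "01010011": "abb",
        "10101001": "baaaa", "10100011": "bbbb",
    }

    groups = {}
    for meaning, signal in VOCABULARY.items():
        # Group by the leading 4-bit prefix of the meaning vector.
        groups.setdefault(meaning[:4], set()).add(signal[0])

    for prefix, first_symbols in sorted(groups.items()):
        print(prefix, "->", first_symbols)  # one first symbol per group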

9 Conclusions and Future Plans

If we look within classic linguistics or classic cognitive science for an answer to the question of how human languages (or, in general, coordinated communication in a population of agents endowed with specific cognitive activities) are evolutionarily acquired, we have to expect an answer of a very general character, without detailed mechanisms of emergence. A substantial breakthrough happens in the field of artificial life when we start to consider present-day human beings as the result of a Darwinian evolutionary process, as something that itself has an evolutionary trajectory from the long past to the present. Then we automatically have to accept the idea that human language communication gradually evolved from its simplest forms to the current highly developed coordinated communication formally expressed by generative grammars. Simultaneously with the evolution of language there must also appear the evolution of a cognitive device (the brain) that enables the necessary elementary cognitive activities of mapping meaning states onto linearly structured communication signals and vice versa.

Figure 13 shows in an indirect way the evolution of the neural networks caused by the Baldwin effect, supported by meme transfer. While at the beginning of evolution the weight coefficients with nonzero values are evenly distributed, at the end of evolution the picture is quite different. The results, which are not shown, suggest that at the beginning the agents required a lot of learning even to reach rather poor results; at the end, when learned abilities began to be replaced by very good initial weight coefficients, the learning period did not increase the fitness much. Since neither the Baldwin effect nor memes alone were sufficient for the emergence of coordinated communication, it was very difficult to distinguish between these two influences. We leave the separation of these effects to further work.

We have demonstrated that in populations of agents capable of elementary communication acts there is a spontaneous emergence of coordinated communication. This evolutionary emergence could not be studied by a pure Darwinian evolution without the concepts of the Baldwin effect and Dawkins' memes. Both of these enlargements are important not only for acceleration but also as an evolutionary mechanism of the emergence of complex cognitive tasks. In particular, we have demonstrated that inclusion of the Baldwin effect alone is not sufficient to sustain the emergence of coordinated communication.



According to our observations, coordinated communication is a trait of a very complex nature; it necessarily requires the inclusion of Dawkins' memes together with the Baldwin effect. The design of scenarios that specify detailed mechanisms of the emergence and evolution of different cognitive activities (like the scenario presented above) represents a key contribution of modern computer science to the cognitive sciences and to their transformation into a science in which a hypothesis may be computationally falsified.

The final notes of this concluding section concern our future plans. First, the population of agents will be "geographically" structured; that is, it will be decomposed into disjoint subpopulations, each having some other subpopulations as its neighborhood. We will examine the effect of migration (a sketch of such a structured population appears below). It is expected that different subpopulations will develop different communication "dialects"; that is, two different subpopulations would map the meaning vectors onto symbol strings in different manners. Second, in the present approach the meaning vectors are kept fixed throughout the entire evolution. A postulated progressive increase in the complexity of the meaning vectors should give a more realistic scenario of the evolutionary emergence of coordinated communication. Third, one of the most interesting problems of theoretically oriented artificial neural networks is their modularity. In neural networks information is "compressed" holistically; that is, it cannot be determined which part of the neural network is responsible for a specific subtask. For instance, consider a neural network trained to recognize human faces. By analyzing its structure it is impossible to separate subparts such that we can say a specific subpart is responsible for the classification of eyes or hair. In neuroscience [2] it is a well-established fact that the brain is composed, among other things, of modules that perform specific preprocessing or processing (e.g., cognitive) tasks, and we may ask about the evolutionary origin of these modules. We would like to study a similar problem in our computational simulations. Let us imagine that meaning vectors are composed of different subparts, which we postulate to correspond to relatively well-separated submeanings (e.g., the color of an object). We expect that the agents' cognitive devices (neural networks) will separate into subdevices, each performing a mapping of a submeaning onto a subsignal, with a supervisor module integrating the subsignals into one resulting signal formally related to the original meaning vector. Our present effort is concentrated on the design of neural network architectures that are biologically plausible and capable of such modular activities.
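As a hedged illustration of the first of these plans, the sketch below arranges subpopulations on a ring and moves a few agents to a neighboring subpopulation each generation. The ring topology, the function name migrate, and the migration rate are assumptions of ours, not a specification from the paper.

    import random

    def migrate(subpopulations, n_migrants=2, rng=random):
        """Move n_migrants random agents from each subpopulation to
        the next subpopulation on the ring (its neighborhood)."""
        k = len(subpopulations)
        outgoing = []
        for sub in subpopulations:
            # Leave at least one agent behind in each subpopulation.
            chosen = [sub.pop(rng.randrange(len(sub)))
                      for _ in range(min(n_migrants, len(sub) - 1))]
            outgoing.append(chosen)
        for i, migrants in enumerate(outgoing):
            subpopulations[(i + 1) % k].extend(migrants)
        return subpopulations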

Acknowledgments This work was supported by grants No. 1/4209/97 and No. 1/5229/98 of the Scientific Grant Agency of the Slovak Republic.

References

1. Angeline, P. J., Saunders, G. M., & Pollack, J. B. (1994). An evolutionary algorithm that constructs recurrent neural networks. IEEE Transactions on Neural Networks, 5, 54–65.
2. Arbib, M. A. (Ed.). (1995). Handbook of brain theory and neural networks. Cambridge, MA: MIT Press.
3. Arita, T., & Koyama, Y. (1998). Evolution of linguistic diversity in a simple communication system. In C. Adami, R. Belew, H. Kitano, & C. Taylor (Eds.), Artificial life VI (pp. 9–17). Cambridge, MA: MIT Press.
4. Baldwin, J. M. (1896). A new factor in evolution. American Naturalist, 30, 441–451.
5. Batali, J. (1994). Innate biases and critical periods: Combining evolution and learning in the acquisition of syntax. In R. Brooks & P. Maes (Eds.), Artificial life IV (pp. 160–171). Cambridge, MA: MIT Press.



6. Batali, J. (1998). Computational simulations of the emergence of grammar. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Approaches to the evolution of language: Social and cognitive bases (pp. 405–426). Cambridge, UK: Cambridge University Press.
7. Batali, J. (in press). The negotiation and acquisition of recursive grammars as a result of competition among exemplars. In C. Knight, J. R. Hurford, & M. Studdert-Kennedy (Eds.), The evolutionary emergence of language: Social function and the origins of linguistic form. Cambridge, UK: Cambridge University Press.
8. Belew, R. K., & Mitchell, M. (Eds.). (1996). Adaptive individuals in evolving populations: Models and algorithms. Redwood City, CA: Addison-Wesley.
9. Chomsky, N. (1957). Syntactic structures. The Hague: Mouton & Co.
10. Chomsky, N. (1987). Knowledge of language: Its nature, origin, and use. New York: Praeger.
11. Dawkins, R. (1976). The selfish gene. Oxford: Oxford University Press.
12. Fogel, D. B. (1995). Evolutionary computation: Toward a new philosophy of machine intelligence. New York: IEEE Press.
13. Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.
14. Hamilton, W. D. (1964). The genetical evolution of social behavior I. Journal of Theoretical Biology, 7, 1–16.
15. Hamilton, W. D. (1964). The genetical evolution of social behavior II. Journal of Theoretical Biology, 7, 17–52.
16. Hashimoto, T., & Ikegami, T. (1996). Emergence of net-grammar in communicating agents. BioSystems, 38, 1–14.
17. Haykin, S. (1994). Neural networks: A comprehensive foundation. New York: Macmillan.
18. Heylighen, F. (1992). Evolution, selfishness and cooperation; Selfish memes and the evolution of cooperation. Journal of Ideas, 2, 70–84.
19. Hinton, G. E., & Nowlan, S. J. (1987). How learning can guide evolution. Complex Systems, 1, 495–502.
20. Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press.
21. Hurford, J. R., Studdert-Kennedy, M., & Knight, C. (Eds.). (1998). Approaches to the evolution of language: Social and cognitive bases. Cambridge, UK: Cambridge University Press.
22. Hurford, J. R. (1998). How language and languages evolve. In R. Dunbar, C. Knight, & C. Power (Eds.), The evolution of culture. Edinburgh: Edinburgh University Press.
23. Kirby, S. (1999). Function, selection, and innateness: The emergence of language universals. Oxford: Oxford University Press.
24. Kirby, S. (1998). Fitness and the selective adaptation of language. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Approaches to the evolution of language: Social and cognitive bases (pp. 359–383). Cambridge, UK: Cambridge University Press.
25. Kvasnicka, V., & Pospichal, J. (1998). Simulation of Baldwin effect and Dawkins memes by genetic algorithm. In R. Roy, T. Furuhashi, & P. K. Chawdhry (Eds.), Advances in soft computing—Engineering design and manufacturing (pp. 481–496). London: Springer Verlag.
26. MacLennan, B. J., & Burghardt, G. M. (1994). Synthetic ethology and the evolution of cooperative communication. Adaptive Behavior, 2, 161–187.
27. Oliphant, M. (1997). Formal approaches to innate and learned communication: Laying the foundation of language. Unpublished doctoral dissertation, University of California at San Diego.



28. Pinker, S., & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707–784.
29. Saunders, G. M., & Pollack, J. B. (1994). The evolution of communication in adaptive agents. Technical Report GS-94-EVCOMM. Unpublished.
30. Saussure, F. de (1983). Course in general linguistics (R. Harris, Trans.). London: Duckworth. (Original French edition published 1916)
31. Steels, L. (1997). The synthetic modeling of language origin. Evolution of Communication Journal, 1, 1–34.
32. Steels, L. (1998). Synthesizing the origins of language and meaning using co-evolution, self-organization and level formation. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Approaches to the evolution of language: Social and cognitive bases (pp. 384–404). Cambridge, UK: Cambridge University Press.
33. Steels, L. (1996). Self-organizing vocabularies. In C. G. Langton & K. Shimohara (Eds.), Artificial life V (pp. 177–184). Cambridge, MA: MIT Press.
34. Turney, P., Whitley, D., & Anderson, R. (Eds.). (1996). Evolution, learning, and instinct: 100 years of the Baldwin effect [Special issue]. Evolutionary Computation, 4(3).
35. Werner, G. W., & Dyer, M. G. (1991). Evolution of communication in artificial intelligence. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial life II (pp. 659–687). Redwood City, CA: Addison-Wesley.

