
BEHAVIORAL AND BRAIN SCIENCES (2014) 37, 529–604 doi:10.1017/S0140525X13003099

Brain mechanisms of acoustic communication in humans and nonhuman primates: An evolutionary perspective

Hermann Ackermann, Neurophonetics Group, Centre for Neurology – General Neurology, Hertie Institute for Clinical Brain Research, University of Tuebingen, D-72076 Tuebingen, Germany [email protected] www.hih-tuebingen.de/neurophonetik

Steffen R. Hage, Neurobiology of Vocal Communication Research Group, Werner Reichardt Centre for Integrative Neuroscience, and Institute for Neurobiology, Department of Biology, University of Tuebingen, D-72076 Tuebingen, Germany [email protected] www.vocalcommunication.de

Wolfram Ziegler, Clinical Research Group, City Hospital Munich-Bogenhausen, D-80992 Munich, and Institute of Phonetics and Speech Processing, Ludwig-Maximilians-University, D-80799 Munich, Germany [email protected] www.ekn.mwn.de

Abstract: Any account of “what is special about the human brain” (Passingham 2008) must specify the neural basis of our unique ability to produce speech and delineate how these remarkable motor capabilities could have emerged in our hominin ancestors. Clinical data suggest that the basal ganglia provide a platform for the integration of primate-general mechanisms of acoustic communication with the faculty of articulate speech in humans. Furthermore, neurobiological and paleoanthropological data point to a two-stage model of the phylogenetic evolution of this crucial prerequisite of spoken language: (i) monosynaptic refinement of the projections of motor cortex to the brainstem nuclei that steer laryngeal muscles, presumably, as part of a “phylogenetic trend” associated with increasing during hominin evolution; (ii) subsequent vocal-laryngeal elaboration of cortico-basal ganglia circuitries, driven by human-specific FOXP2 mutations. This concept implies vocal continuity of spoken language evolution at the motor level, elucidating the deep entrenchment of articulate speech into a “nonverbal matrix” (Ingold 1994), which is not accounted for by gestural-origin theories. Moreover, it provides a solution to the question of the adaptive value of the “first word” (Bickerton 2009), since even the earliest and simplest verbal utterances must have increased the versatility of vocal displays afforded by the preceding elaboration of monosynaptic corticobulbar tracts, giving rise to enhanced social cooperation and prestige. At the ontogenetic level, the proposed model assumes age-dependent interactions between the basal ganglia and their cortical targets, similar to vocal in some songbirds. In this view, the emergence of articulate speech builds on the “renaissance” of an ancient organizational principle and, hence, may represent an example of “evolutionary tinkering” (Jacob 1977).

Keywords: articulate speech; basal ganglia; FOXP2; human evolution; speech acquisition; spoken language; striatum; vocal behavior;

© Cambridge University Press 2014 0140-525X/14 $40.00 Downloaded from http:/www.cambridge.org/core. Max-Planck-Institut fuer Psycholinguistik, on 21 Sep 2016 at 13:18:00, subject to the Cambridge Core terms of use, available at http:/www.cambridge.org/core/terms. http://dx.doi.org/10.1017/S0140525X13004007

HERMANN ACKERMANN is Professor of Neurological Rehabilitation at the Centre for Neurology, Hertie Institute for Clinical Brain Research, University of Tuebingen. His research focuses on the cerebral basis of speech production and speech perception, and he is the author or coauthor of more than 120 publications within the domains of neuropsychology, neurolinguistics, and neurophonetics.

STEFFEN R. HAGE is Head of the Neurobiology of Vocal Communication Research Group at the Werner Reichardt Centre for Integrative Neuroscience, University of Tuebingen. He is the author of more than 20 publications within the area of neuroscience, especially […] and neuroethology. His major research interests focus on audio-vocal integration as well as vocal-motor control mechanisms in acoustic communication of mammals, as well as cognitive processes involved in vocal behavior of nonhuman primates.

WOLFRAM ZIEGLER is Head of the Clinical Neuropsychology Research Group at the City Hospital Munich-Bogenhausen and Professor of Neurophonetics at the Ludwig-Maximilians-University of Munich. He is the author or co-author of more than 150 publications in peer-reviewed journals in the area of speech and language disorders.

1. Introduction: Species-unique (verbal) and primate-general (nonverbal) aspects of human vocal behavior

1.1. Nonhuman primates: Speechlessness in the face of extensive vocal repertoires and elaborate oral-motor capabilities

All attempts to teach great apes spoken language have failed – even in our closest cousins, the chimpanzees (Pan troglodytes) and bonobos (Pan paniscus) (Hillix 2007; Wallman 1992), despite the fact that these species have “notoriously mobile lips and tongues, surely transcending the human condition” (Tuttle 2007, p. 21). As an example, the cross-fostered chimpanzee infant Viki mastered less than a handful of words even after extensive training. These utterances were not organized as speech-like vocal tract activities, but rather as orofacial manoeuvres imposed on a (voiceless) expiratory air stream (Hayes 1951, p. 67; see Cohen 2010). By contrast, Viki was able to skillfully imitate manual and even orofacial movement sequences of her caretakers (Hayes & Hayes 1952) and learned, for example, to blow a whistle (Hayes 1951, pp. 77, 89).

Nonhuman primates are, nevertheless, equipped with rich vocal repertoires, related specifically to ongoing intra-group activities or environmental events (Cheney & Seyfarth 1990; 2007). Yet, their calls seem to be linked to different levels of arousal associated with especially urgent functions, such as escaping predators, surviving in fights, keeping contact with the group, and searching for food resources or mating opportunities (Call & Tomasello 2007; Manser et al. 2002; Seyfarth & Cheney 2003b; Tomasello 2008). Several studies point, indeed, at a more elaborate “cognitive load” to the vocalizations of monkeys and apes in terms of subtle audience effects (Wich & de Vries 2006), conceptual-semantic information (Zuberbühler 2000a; Zuberbühler et al. 1999), proto-syntactical call concatenations (Arnold & Zuberbühler 2006; Ouattara et al. 2009), conditionability (Aitken & Wilson 1979; Hage et al. 2013; Sutton et al. 1973; West & Larson 1995), and the capacity to use distinct calls interchangeably under different conditions (Hage et al. 2013). It remains, however, to be determined whether such communicative skills really represent precursors of higher-order cognitive–linguistic operations. In any case, the motor mechanisms of articulate speech appear to lack significant vocal antecedents within the primate lineage. This limitation of the faculty of acoustic communication is “particularly puzzling because [nonhuman primates] appear to have so many concepts that could, in principle, be articulated” (Cheney & Seyfarth 2005, p. 142). As a consequence, the manual and facial gestures rather than the vocal calls of our primate ancestors have been considered the vantage point of language evolution in our species (e.g., Corballis 2002, p. ix; 2003).

Tracing back to the 1960s, vocal tract morphology has been assumed to preclude production of “the full range of human speech sounds” (Lieberman 2006a; 2006b, p. 289) and, thereby, to constrain imitation of spoken language in nonhuman primates (Lieberman 1968; Lieberman et al. 1969). However, this model cannot account for the inability of nonhuman primates to produce even the most simple verbal utterances. The complete lack of verbal acoustic communication rather suggests more crucial cerebral limitations of vocal tract motor control (Boë et al. 2002; Clegg 2012; Fitch 2000a; 2000b). According to a more recent hypothesis, lip smacking – a rhythmic facial expression frequently observed in monkeys – might constitute a precursor of the dynamic organization of speech syllables (Ghazanfar et al. 2012; MacNeilage 1998). As an important evolutionary step, a phonation channel must have been added in order to render lip smacking an audible behavioral pattern (Ghazanfar et al. 2013). Hence, this theory calls for a neurophysiological model of how articulator movements were refined and, finally, integrated with equally refined laryngeal movements to create the complex motor skill underlying the production of speech.

1.2. Dual-pathway models of acoustic communication and the enigma of emotive speech prosody

The calls of nonhuman primates are mediated by a complex network of brainstem components, encompassing a midbrain “trigger structure,” located in the periaqueductal gray (PAG) and adjacent tegmentum, and a pontine vocal pattern generator (Gruber-Dujardin 2010; Hage 2010a; 2010b). In addition to various subcortical limbic areas, the medial wall of the frontal lobes, namely, the cingulate vocalization region and adjacent neocortical areas, also projects to the PAG. This region, presumably, controls higher-order motor aspects of vocalization such as operant call conditioning (e.g., Trachy et al. 1981). By contrast, the acoustic implementation of the sound structure of spoken language is bound to a cerebral circuit including the ventrolateral/insular aspects of the language-dominant frontal lobe and the primary sensorimotor cortex, the basal ganglia, and cerebellar structures in either hemisphere (Ackermann & Riecker 2010a; Ackermann & Ziegler 2010; Ackermann et al. 2010). Given the virtually complete speechlessness of nonhuman primates, the behavioral analogues of acoustic mammalian communication might not be sought within the domain of spoken language, but rather in the nonverbal affective vocalizations of our species such as laughing, crying, or moaning (Owren et al. 2011). Against this background, two separate neuroanatomic channels with different phylogenetic histories appear to participate in human acoustic communication, supporting nonverbal affective vocalizations and articulate speech, respectively (the “dual-pathway model” of human acoustic communication; see Ackermann 2008; Owren et al. 2011; for an earlier formulation, see Myers 1976).

Human vocal expression of motivational states is not restricted to nonverbal affective displays, but deeply invades articulate speech. Thus, a speaker’s arousal-related mood such as anger or joy shapes the tone of spoken language (emotive/affective speech prosody). Along with nonverbal


affective vocalizations, emotive speech prosody has also been considered a behavioral trait homologous to the calls of nonhuman primates (Heilman et al. 2004; Jürgens 1986; 2002b; Jürgens & von Cramon 1982).¹ Moreover, one’s attitude towards a person and one’s appraisal of a topic have a significant impact on the “speech melody” of verbal utterances (attitudinal prosody). Often these implicit aspects of acoustic communication – how we say something – are more relevant to a listener than propositional content, that is, what we say (e.g., Wildgruber et al. 2006). The timbre and intonational contour of a speaker’s voice, the loudness fluctuations and the rhythmic structure of verbal utterances, including the variation of speaking rate and the local distinctness of articulation, represent the most salient acoustic correlates of affective and attitudinal prosody (Scherer 1986; Scherer et al. 2009; Sidtis & Van Lancker Sidtis 2003). Unlike the propositional content of the speech signal – which ultimately maps onto a digital code of discrete phonetic-linguistic categories – the prosodic modulation of verbal utterances conveys graded/analogue information on a speaker’s motivational states and intentional composure (Burling 2005). Most importantly, activity of the same set of vocal tract muscles and a single speech wave simultaneously convey both the propositional and emotional contents of spoken language. Hence, two information sources seated in separate brain networks and creating fundamentally different data structures (analogue versus digital) contribute simultaneously to the formation of the speech signal. Therefore, the two channels must coordinate at some level of the central nervous system. Otherwise these two inputs would distort and corrupt each other. So far, dual-pathway models of human acoustic communication have not specified the functional mechanisms and neuroanatomic pathways that participate in the generation of a speech signal with “intimately intertwined linguistic and expressive cues” (Scherer et al. 2009, p. 446; see also Banse & Scherer 1996, p. 618). This deep entrenchment of articulate speech into a “nonverbal matrix” has been assumed to represent “the weakest point of gestural theories” of language evolution (Ingold 1994, p. 302).

Within the vocal domain, Parkinson’s disease (PD) – a paradigmatic dysfunction of dopamine neurotransmission at the level of the striatal component of the basal ganglia – predominantly gives rise to a disruption of prosodic aspects of verbal utterances. Thus, the “addition of prosodic contour” to articulate speech appears to depend on the integrity of the striatum (Darkins et al. 1988; see Van Lancker Sidtis et al. 2006). Against this background, structural reorganization of the basal ganglia during hominin evolution may have been a pivotal prerequisite for the emergence of spoken language, providing a crucial phylogenetic link – at least at the motor level – between the vocalizations of our primate ancestors, on the one hand, and the volitional motor aspects of articulate speech, on the other.²

Comparative molecular-genetic data corroborate this suggestion: First, certain mutations of the FOXP2 gene in humans give rise to developmental verbal dyspraxia. This disorder of spoken language, presumably, reflects impaired sequencing of orofacial movements in the absence of basic deficits of motor execution such as paresis of vocal tract muscles (Fisher et al. 2003; Fisher & Scharff 2009; Vargha-Khadem et al. 2005). Individuals affected with developmental verbal dyspraxia show a reduced volume of the striatum, the extent of which is correlated with the severity of nonverbal oral and speech motor impairments (Watkins et al. 2002b).³ Second, placement of two hominin-specific FOXP2 mutations into the mouse genome (“humanized Foxp2”) gives rise to distinct morphological changes at the cellular level of the cortico-striatal-thalamic circuits in these rodents (Enard 2011).

However, verbal dyspraxia subsequent to FOXP2 mutations is characterized by a fundamentally different profile of speech motor deficits as compared to Parkinsonian dysarthria. The former resembles a communication disorder which, in adults, reflects damage to fronto-opercular cortex (i.e., inferior frontal/lower precentral gyrus) or the anterior insula of the language-dominant hemisphere (Ackermann & Riecker 2010b; Ziegler 2008).

To resolve this dilemma, we propose that ontogenetic speech acquisition depends on close interactions between the basal ganglia and their cortical targets, whereas mature verbal communication requires far less striatal processing capacity. This hypothesis predicts different speech motor deficits in perinatal dysfunctions of the basal ganglia as compared to the acquired dysarthria of PD patients. More specifically, basal ganglia disorders with an onset prior to speech acquisition should severely disrupt articulate speech rather than predominantly compromise the implementation of speech prosody.

1.3. Organization of this target article

The suggestion that structural refinement of cortico-striatal circuits – driven by human-specific mutations of the FOXP2 gene – represents a pivotal step towards the emergence of spoken language in our hominin ancestors eludes any direct experimental evaluation. Nevertheless, certain inferences on the role of the basal ganglia in speech motor control can be tested against the available clinical and functional-imaging data. As a first step, the neuroanatomical underpinnings of the vocal behavior of nonhuman primates are reviewed in section 2 – as a prerequisite to the subsequent investigation of the hypothesis that in our species this system conveys nonverbal information through affective vocalizations and emotive/attitudinal speech prosody (sect. 3). Based upon clinical and neurobiological data, section 4 then characterizes the differential contribution of the basal ganglia to spoken language at the levels of ontogenetic speech acquisition (sect. 4.2.1) and of mature articulate speech (sect. 4.2.2), and delineates a neurophysiological model of the participation of the striatum in verbal behavior. Finally, these data are put into a paleoanthropological perspective in section 5.

2. Acoustic communication in nonhuman primates: Behavioral variation and cerebral control

2.1. Structural malleability of vocal signals

2.1.1. Ontogenetic emergence of acoustic call morphology. The vocal repertoires of monkeys and apes encompass noise-like and harmonic components (Fig. 1A; De Waal 1988; Goodall 1986; Struhsaker 1967; Winter et al. 1966). Vocal signals of both categories vary considerably across individuals, because age, body size, and stamina influence vocal tract shape and tissue characteristics, for


Figure 1A. Acoustic communication in nonhuman primates: Call structure. A. Spectrograms (left-hand section of each panel) and power spectra (right-hand section in each) of two common rhesus monkey vocalizations, that is, a “coo” (left panel) and a “grunt” (right panel). Gray level of the spectrograms codes for spectral energy. Coo calls (left panel) are characterized by a harmonic structure, encompassing a fundamental frequency (F0, the lowest and darkest band) and several harmonics (H1 to Hn). Measures derived from the F0 contour provide robust criteria for a classification of periodic signals, for example, peak frequency (peakF; Hardus et al. 2009a). Onset F0 seems to be highly predictive for the shape of the intonation contour, indicating the implementation of a “vocal plan” prior to movement initiation (Miller et al. 2009a; 2009b). Grunts (right) represent short and noisy calls whose spectra include more energy in the lower frequency range and a rather flat energy distribution.
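As a purely illustrative aside, the notion of a harmonic call with a fundamental frequency (F0) and overtones can be sketched numerically. The following minimal Python example is not taken from the studies cited in the caption. It synthesizes a toy "coo"-like tone as a sum of three harmonics and estimates F0 from the autocorrelation peak. The function names, the 8 kHz sampling rate, the harmonic amplitudes, and the 200 to 1000 Hz search range are all assumptions chosen for the illustration.

```python
import math


def synth_harmonic(f0_hz, fs_hz, n_samples, amps=(1.0, 0.5, 0.25)):
    """Toy harmonic tone: sinusoids at f0, 2*f0, 3*f0 with the given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0_hz * n / fs_hz)
            for k, a in enumerate(amps))
        for n in range(n_samples)
    ]


def estimate_f0(signal, fs_hz, fmin_hz=200.0, fmax_hz=1000.0):
    """Estimate F0 as fs / lag, where lag maximizes the raw autocorrelation.

    The search is restricted to lags corresponding to fmin_hz..fmax_hz
    (an assumed plausible range, not a value from the article).
    """
    lag_min = int(fs_hz / fmax_hz)
    lag_max = int(fs_hz / fmin_hz)

    def autocorr(lag):
        return sum(signal[n] * signal[n + lag]
                   for n in range(len(signal) - lag))

    best_lag = max(range(lag_min, lag_max + 1), key=autocorr)
    return fs_hz / best_lag


fs = 8000                               # assumed sampling rate in Hz
tone = synth_harmonic(500.0, fs, 400)   # 50 ms tone with F0 = 500 Hz
f0 = estimate_f0(tone, fs)
print(f0)
```

For this noise-free toy tone the estimate coincides with the synthesized F0 of 500 Hz. Real calls are noisy and frequency-modulated, so published analyses rely on more robust pitch-tracking procedures applied frame by frame to obtain an F0 contour.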

example, the distance between the lips and the larynx (Fischer et al. 2002; 2004; Fitch 1997; but see Rendall et al. 2005). However, experiments based on acoustic deprivation of squirrel monkeys (Saimiri sciureus) and cross-fostering of macaques and lesser apes revealed that call structure does not appear to depend in any significant manner on species-typical auditory input (Brockelman & Schilling 1984; Geissmann 1984; Hammerschmidt & Fischer 2008; Owren et al. 1992; 1993; Talmage-Riggs et al. 1972; Winter et al. 1973). Thus, ontogenetic modifications of acoustic structure may simply reflect maturation of the vocal apparatus, including “motor-training” effects (Hammerschmidt & Fischer 2008; Pistorio et al. 2006), or the influence of hormones related to social status (Roush & Snowdon 1994; 1999). In contrast, comprehension and usage of acoustic signals show considerably more malleability than acoustic structure both in juvenile and adult animals (Owren et al. 2011).

2.1.2. Spontaneous adult call plasticity: Convergence on and imitation of species-typical variants of vocal behavior. Despite innate acoustic call structures, the vocalizations of nonhuman primates may display some context-related variability in adulthood. For example, two populations of pygmy marmosets (Cebuella pygmaea) of a different geographic origin displayed convergent shifts of spectral and durational call parameters (Elowson & Snowdon 1994; see further examples in Snowdon & Elowson 1999 and Rukstalis et al. 2003). Humans may also match their speaking styles inadvertently during conversation (“speech accommodation theory”; Burgoon et al. 2010; see Masataka [2008a; 2008b] for an example). Such accommodation effects could provide a basis for the changes in call morphology during social interactions in nonhuman primates (Fischer 2003; Mitani & Brandt 1994; Mitani & Gros-Louis 1998; Sugiura 1998). Subsequent reinforcement processes may give rise to “regional dialects” of primate species (Snowdon 2008). Rarely, even memory-based imitation capabilities have been observed in great apes: Thus, free-living chimpanzees were found to copy the distinctive intonational and rhythmic pattern of the pant hoots of other subjects – even after the animal providing the acoustic template had disappeared from the troop (Boesch & Boesch-Achermann 2000, pp. 234f). Whatever the precise mechanisms of vocal convergence, these phenomena are indicative of the operation of a neuronal feedback loop between auditory perception and vocalization in nonhuman primates (see Brumm et al. 2004).

A male bonobo infant (“Kanzi”) reared in an enriched social environment spontaneously augmented his species-typical repertoire by four “novel” vocalizations (Hopkins & Savage-Rumbaugh 1991). However, these newly acquired signals can be interpreted as scaled variants of a single intonation contour (Fig. 3 in Taglialatela et al. 2003). Since Pan paniscus has, to some degree, a graded rather than discrete call system (Bermejo & Omedes 1999; Clay & Zuberbühler 2009), new behavioral challenges could give rise to a differentiation of the available “vocal space” – indicating a potential to modulate call structures within the range of innate acoustic constraints rather than the ability to learn new vocal signals. An alternative interpretation is that hitherto un-deployed vocalizations were recruited under those conditions (Lemasson & Hausberger 2004; Lemasson et al. 2005).

2.1.3. Volitional initiation of vocal behavior and modulation of acoustic call structure. It has been a matter of debate for decades to what extent nonhuman primates are capable of volitional call initiation and modulation. A variety of behavioral studies seem to indicate both control over the timing of vocal output and the capacity to “decide” which acoustic signal to emit in a given context. First, at least two species of New World primates (tamarins, marmosets) discontinue acoustic communication during


epochs of increased ambient noise in order to avoid signal interferences and, therefore, to increase call detection probability (Egnor et al. 2007; Roy et al. 2011). In addition, callitrichid monkeys obey “conversational rules” and show response selectivity during vocal exchanges (Miller et al. 2009a; 2009b; but see Rukstalis et al. 2003: independent F0 onset change). Such observations were assumed to indicate some degree of volitional control over call production. As an alternative interpretation, these changes in vocal timing or loudness could simply reflect threshold effects of audio-vocal integration mechanisms. Second, several nonhuman primates produce acoustically different alarm vocalizations in response to distinct predator species, suggesting volitional access to call type (e.g., Seyfarth et al. 1980). Again, variation of motivational states could account for these findings. For example, the approach of an aerial predator could represent a much more threatening event than the presence of a snake. To some extent, even dynamic spectro-temporal features resembling the formant transients of the human acoustic speech signal (see sect. 4.1 below) appear to contribute to the differentiation of predator-specific alarm vocalizations (“leopard calls”) in Diana monkeys (Cercopithecus diana) (Riede & Zuberbühler 2003a; 2003b; see Lieberman [1968] for earlier data). Yet, computer models suggest that larynx lowering makes a critical contribution to these changes (Riede et al. 2005; 2006; see critical comments in Lieberman 2006b), thus eliciting in a receiver the impression of a bigger-than-real body size of the sender (Fitch 2000b; Fitch & Reby 2001). Diana monkeys may have learned this manoeuvre as a strategy to mob large predators, a behavior often observed in the wild (Zuberbühler & Jenny 2007).

The question of whether nonhuman primates are able to decouple their vocalizations from accompanying motivational states and to use them in a goal-directed manner has been addressed in several operant-conditioning experiments (Aitken & Wilson 1979; Coudé et al. 2011; Hage et al. 2013; Koda et al. 2007; Sutton et al. 1973; West & Larson 1995). In most of these studies, nonhuman primates learned to utter a vocalization in response to a food reward (e.g., Coudé et al. 2011; Koda et al. 2007). Rather than demonstrating the ability to volitionally vocalize on command, these studies merely confirm, essentially, that nonhuman primates produce adequate, motivationally based behavioral reactions to hedonistic stimuli. A recent study found, however, that rhesus monkeys can be trained to produce different call types in response to arbitrary visual signals and that they are capable of switching between two distinct call types associated with different cues on a trial-to-trial basis (Hage et al. 2013). These observations indicate that the animals are able – within some limits – to volitionally initiate vocalizations and, therefore, are capable of instrumentalizing their vocal utterances in order to accomplish behavioral tasks successfully. Likewise, macaque monkeys may acquire control over loudness and duration of coo calls (Hage et al. 2013; Larson et al. 1973; Sutton et al. 1973; 1981; Trachy et al. 1981). A more recent investigation even reported spontaneous differentiation of coo calls in Japanese macaques with respect to peak and offset of the F0 contour during operant tool-use training (Hihara et al. 2003). Such accomplishments may, however, be explained by the adjustment of respiratory functions and do not conclusively imply operant control over spectro-temporal call structure in nonhuman primates (Janik & Slater 1997; 2000).

2.1.4. Observational acquisition of species-atypical sounds. Few instances of species-atypical vocalizations in nonhuman primates have been reported so far. Allegedly, the bonobo Kanzi, mentioned earlier, spontaneously acquired a few vocalizations resembling spoken words (Savage-Rumbaugh et al. 2004). Yet, systematic perceptual data substantiating these claims are not available. As further anecdotal evidence, Wich et al. (2009) reported that a captive-born female orangutan (Pongo pygmaeus × Pongo abelii) began to produce human-like whistles at an age of about 12 years in the absence of any training. Furthermore, an idiosyncratic pant hoot variant (“Bronx cheer” – resembling a sound called “blowing raspberries”) spread throughout a colony of several tens of captive chimpanzees after it had been introduced by a male joining the colony (Hopkins et al. 2007; Marshall et al. 1999; similar sounds have been observed in wild orangutans: Hardus et al. 2009a; 2009b; van Schaik et al. 2003; 2006). Remarkably, these two acoustic displays, “raspberries” and whistles, do not engage laryngeal sound-production mechanisms, but reflect a linguo-labial trill (“raspberries”) or arise from oral air-stream resonances (whistles). Thus, the species-atypical acoustic signals in nonhuman primates observed to date spare glottal mechanisms of sound generation. Apparently, laryngeal motor activity cannot be decoupled volitionally from species-typical audiovisual displays (Knight 1999).

2.2. Cerebral control of motor aspects of call production

2.2.1. Brainstem mechanisms (PAG and pontine vocal pattern generator). Since operant conditioning of the calls of nonhuman primates is technically challenging (Pierce 1985), analyses of the neurobiological control mechanisms engaged in phonatory functions relied predominantly on electrical brain stimulation. In squirrel monkeys (Saimiri sciureus) – the species studied most extensively so far (Gonzalez-Lima 2010) – vocalizations could be elicited at many cerebral locations, extending from the forebrain to the lower brainstem. This network encompasses a variety of subcortical limbic structures such as the hypothalamus, septum, and amygdala (Fig. 1B; Brown 1915; Jürgens 2002b; Jürgens & Ploog 1970; Smith 1945). In mammals, all components of this highly conserved “communicating brain” (Newman 2003) appear to project to the periaqueductal gray (PAG) of the midbrain and the adjacent mesencephalic tegmentum (Gruber-Dujardin 2010).⁴ Based on the integration of input from motivation-controlling regions, sensory structures, motor areas, and arousal-related systems, the PAG seems to gate the vocal dimension of complex multimodal emotional responses such as fear or aggression. The subsequent coordination of cranial nerve nuclei engaged in the innervation of vocal tract muscles depends on a network of brainstem structures, including, particularly, a vocal pattern generator bound to the ventrolateral pons (Hage 2010a; 2010b; Hage & Jürgens 2006).

2.2.2. Mesiofrontal cortex and higher-order aspects of vocal behavior. Electrical stimulation studies revealed that both New and Old World monkeys possess a “cingulate vocalization region” within the anterior cingulate cortex


Figure 1B. Acoustic communication in nonhuman primates: Cerebral organization. Cerebral “vocalization network” of the squirrel monkey (as a model of the primate-general “communication brain”). The solid lines represent the “vocal brainstem circuit” of the vocalization network and its modulatory cortical input (ACC), the dotted lines the strong connections of sensory cortical regions (AC, VC) and motivation-controlling limbic structures (Ac, Hy, Se, St) to this circuit. Key: ACC = anterior cingulate cortex; AC = auditory cortex; Ac = nucleus accumbens; Hy = hypothalamus; LRF = lateral reticular formation; NRA = nucleus retroambigualis; PAG = periaqueductal gray; PB = brachium pontis; SC = superior colliculus; Se = septum; St = nucleus stria terminalis; VC = visual cortex. (Unpublished figure. See Jürgens 2002b and Hage 2010a; 2010b for further details.)

(ACC), adjacent to the anterior pole of the corpus callosum (Jürgens 2002b; Smith 1945; Vogt & Barbas 1988). Uni- and bilateral ACC ablation in macaques had, however, a minor and inconsistent impact on spontaneously uttered coo calls, but disrupted the vocalizations produced in response to an operant-conditioning task (Sutton et al. 1974; Trachy et al. 1981). Furthermore, damage to preSMA – a cortical area neighboring the ACC in dorsal direction and located rostral to the supplementary motor area (SMA proper) – resulted in significantly prolonged response latencies (Sutton et al. 1985). Comparable lesions in squirrel monkeys diminish the rate of spontaneous isolation peeps, but the acoustic structure of the produced calls remains undistorted (Kirzinger & Jürgens 1982). As a consequence, mesiofrontal cerebral structures appear to predominantly mediate calls driven by an animal’s internal motivational milieu.

2.2.3. Ventrolateral frontal lobe and corticobulbar system. Both squirrel and rhesus monkeys possess a neocortical representation of internal and external laryngeal muscles in the ventrolateral part of premotor cortex, bordering areas associated with orofacial structures, namely, tongue, lips, and jaw (Fig. 1 in Hast et al. 1974; Jürgens 1974; Simonyan & Jürgens 2002; 2005). Furthermore, vocalization-selective neuronal activity may arise at the level of the premotor cortex in macaques that are trained to respond with coo calls to food rewards (Coudé et al. 2011). Interestingly, premotor neural firing appears to occur only when the animals produce vocalizations in a specific learned context of food reward, but not under other conditions. Finally, a cytoarchitectonic homologue to Broca’s area of our species has been found between the lower branch of the arcuate sulcus and the subcentral dimple just above the Sylvian fissure in Old World monkeys (Gil-da-Costa et al. 2006; Petrides & Pandya 2009; Petrides et al. 2005) and chimpanzees (Sherwood et al. 2003). Nevertheless, even bilateral damage to the ventrolateral aspects of the frontal lobes has no significant impact on the vocal behavior of monkeys (P. G. Aitken 1981; Jürgens et al. 1982; Myers 1976; Sutton et al. 1974). Electrical stimulation of these areas in nonhuman primates also failed to elicit overt acoustic responses, apart from a few instances of “slight grunts” obtained from chimpanzees (Bailey et al. 1950, pp. 334f, 355f). Therefore, spontaneous call production, at least, does not critically depend on the integrity of the cortical larynx representation (Ghazanfar & Rendall 2008; Simonyan & Jürgens 2005). Most likely, however, experimental lesions have not included the full extent or even the bulk of the Broca homologue of nonhuman primates as determined by recent cytoarchitectonic studies (Fig. 4 in Aitken 1981; Fig. 1 in Sutton et al. 1974). The role of this area in the control of vocal behavior in monkeys still remains to be clarified. Nonhuman primates appear endowed with a more elaborate cerebral organization of orofacial musculature as compared to the larynx, which, presumably, provides the basis for their relatively advanced orofacial imitation capabilities (Morecraft et al. 2001). As concerns the basal ganglia and the cerebellum, the lesion and stimulation studies available so far do not provide reliable evidence for a participation of these structures in the control of motor aspects of vocal behavior (Kirzinger 1985; Larson et al. 1978; Robinson 1967).

Prosimians and New World monkeys are endowed solely with polysynaptic corticobulbar projections to lower


brainstem motoneurons (Sherwood 2005; Sherwood et al. 2005). By contrast, morphological and neurophysiological studies revealed direct connections of the precentral gyrus of Old World monkeys and chimpanzees to the cranial nerve nuclei engaged in the innervation of orofacial muscles (Jürgens & Alipour 2002; Kuypers 1958b; Morecraft et al. 2001) which, together with the aforementioned more elaborate cortical representation of orofacial structures, may contribute to the enhanced facial-expressive capabilities of anthropoid primates (Sherwood et al. 2005). Most importantly, the direct connections between motor cortex and nucleus (nu.) ambiguus appear restricted, even in chimpanzees, to a few fibers targeting its most rostral component (Kuypers 1958b), subserving the innervation of pharyngeal muscles via the ninth cranial nerve (Butler & Hodos 2005). By contrast, humans exhibit considerably more extensive monosynaptic cortical input to the motoneurons engaged in the innervation of the larynx – though still less dense than the projections to the facial and hypoglossal nuclei (Iwatsubo et al. 1990; Kuypers 1958a). In addition, functional imaging data point to a primary motor representation of human internal laryngeal muscles adjacent to the lips of the homunculus and spatially separated from the frontal larynx region of New and Old World monkeys (Brown et al. 2008; 2009; Bouchard et al. 2013). As a consequence, thus, the monosynaptic elaboration of corticobulbar tracts during hominin evolution might have been associated with a refinement of vocal tract motor control at the cortical level (“Kuypers/Jürgens hypothesis”; Fitch et al. 2010).5

2.3. Summary: Behavioral and neuroanatomic constraints of acoustic communication in nonhuman primates

The cerebral network controlling acoustic call structure in nonhuman primates centers around midbrain PAG (vocalization trigger) and a pontine vocal pattern generator (coordination of the muscles subserving call production). Furthermore, mesiofrontal cortex (ACC/adjacent preSMA) engages in higher-order aspects of vocal behavior such as conditioned responses. These circuits, apparently, do not allow for a decoupling of vocal fold motor activity from species-typical audio-visual displays (Knight 1999). The resulting inability to combine laryngeal and orofacial gestures into novel movement sequences appears to preclude nonhuman primates from mastering even the simplest speech-like utterances, despite extensive vocal repertoires and a high versatility of their lips and tongue. At best, modification of acoustic call structure is restricted to the “variability space” of innate call inventories, bound to motivational or hedonistic triggers, and confined to intonational, durational, and loudness parameters, that is, signal properties homologous to prosodic aspects of human spoken language.

3. Contributions of the primate-general “limbic communicating brain” to human vocal behavior

The dual-pathway model of human acoustic communication predicts the “limbic communication system” of the brain of nonhuman primates to support the production of affective vocalizations such as laughing, crying, and moaning in our species. In addition, this network might engage in the emotive-prosodic modulation of spoken language. More specifically, ACC and/or PAG could provide a platform for the addition of graded, that is, analogue information on a speaker’s motivational states and intentional composure to the speech signal. This suggestion has so far not been thoroughly tested against the available clinical data.

3.1. Brainstem mechanisms of speech production

Ultimately, all cerebral control mechanisms steering vocal tract movements converge on the same set of cranial nerve nuclei. Damage to this final common pathway, therefore, must disrupt both verbal and nonverbal aspects of human acoustic communication. By contrast, clinical observations in patients with bilateral lesions of the fronto-parietal operculum and/or the adjacent white matter point at the existence of separate voluntary and emotional motor systems at the supranuclear level (Groswasser et al. 1988; Mao et al. 1989). However, these data do not further specify the course of the “affective-vocal motor system” and, more specifically, the role of the PAG, a major component of the primate-general “limbic communication system” (Lamendella 1977).

According to the dual-pathway model, the cerebral network supporting affective aspects of acoustic communication in our species must include the PAG, but bypass the corticobulbar tracts engaged in articulate speech. Isolated damage to this midbrain structure, thus, should selectively compromise the vocal expression of emotional/motivational states and spare the sound structure of verbal utterances. Yet, lesion data – though still sparse – are at variance with this suggestion. Acquired midbrain lesions restricted to the PAG completely interrupt both channels of acoustic communication, giving rise to the syndrome of akinetic mutism (Esposito et al. 1999). Moreover, comparative electromyographic (EMG) data obtained from cats and humans also indicate that the sound production circuitry of the PAG is recruited not only for nonverbal affective vocalizations, but also during speaking (Davis et al. 1996; Zhang et al. 1994). Likewise, a more recent positron emission tomography (PET) study revealed significant activation of this midbrain component during talking in a voiced as compared to a whispered speaking mode (Schulz et al. 2005).

Conceivably, the PAG contributes to the recruitment of central pattern generators of the brainstem. Besides the control of stereotyped behavioral activities such as breathing, chewing, swallowing, or yawning, these oscillatory mechanisms might, eventually, be entrained by superordinate functional systems as well (Grillner 1991; Grillner & Wallén 2004). During speech production, such brainstem networks could be instrumental in the regulation of highly adaptive sensorimotor operations during the course of verbal utterances. Examples include the control of inspiratory and expiratory muscle activation patterns in response to continuously changing biomechanical forces and the regulation of vocal fold tension following subtle alterations of subglottal pressure (see, e.g., Lund & Kolta 2006). From this perspective, damage to the PAG would interrupt the recruitment of basic adaptive brainstem mechanisms relevant for speech production and, ultimately, cause mutism. However, the crucial assumption of this explanatory model – spoken language engages phylogenetically older, though eventually reorganized, brainstem circuits – remains to be substantiated (Moore 2004; Schulz et al. 2005; Smith 2010).

3.2. Recruitment of mesiofrontal cortex during verbal communication

3.2.1. Anterior cingulate cortex (ACC). There is some evidence that, similar to subhuman primates, the ACC is a mediator of emotional/motivational acoustic expression in humans as well (see sect. 2.2.2). A clinical example is frontal lobe epilepsy, a syndrome characterized by involuntary and stereotyped bursts of laughter (“gelastic seizures”; Wild et al. 2003) that lack any concomitant adequate emotions (Arroyo et al. 1993; Chassagnon et al. 2003; Iannetti et al. 1997; Iwasa et al. 2002). The cingulate gyrus appears to be the most commonly disrupted site based on lesion surveys of gelastic seizure patients (Kovac et al. 2009). This suggestion was further corroborated by a recent case study in which electrical stimulation of the right-hemisphere ACC rostral to the genu of the corpus callosum elicited uncontrollable, but natural-sounding laughter – in the absence of merriment (Sperli et al. 2006). Conceivably, a homologue of the vocalization center of nonhuman primates bound to rostral ACC may underlie stereotyped motor patterns associated with emotional vocalizations in humans.

Does the ACC participate in speaking as well? Based on an early PET study, “two distinct speech-related regions in the human anterior cingulate cortex” were proposed, the more anterior of which was considered to be homologous to the cingulate vocalization center of nonhuman primates (Paus et al. 1996, p. 213). A recent and more focused functional imaging experiment by Loucks et al. (2007) failed to substantiate this claim. However, this investigation was based on rather artificial phonation tasks involving prolonged and repetitive vowel productions which do not allow for an evaluation of the specific role of the ACC in the mediation of emotional aspects of speaking. In another study, Schulz et al. (2005) required participants to recount a story in a voiced and a whispered speaking mode and demonstrated enhanced hemodynamic activation during the voiced condition in a region homologous to the cingulate vocalization center, but much larger responses emerged in contiguous neocortical areas of medial prefrontal cortex. It remains unclear, however, how the observed activation differences between voiced and whispered utterances should be interpreted, since both of these phonation modes require specific laryngeal muscle activity. One investigation explicitly aimed at a further elucidation of the role of medial prefrontal cortex in motivational aspects of speech production by analyzing the covariation of induced emotive prosody with blood oxygen level dependent (BOLD) signal changes as measured by functional magnetic resonance imaging (fMRI; Barrett et al. 2004). Affect-related pitch variation was found to be associated with supracallosal rather than pregeniculate hemodynamic activation. However, the observed response modulation may have been related to changes in the induced emotional states rather than pitch control. On the whole, the available functional imaging data do not provide conclusive support for the hypothesis that the prosodic modulation of verbal utterances critically depends on the ACC.

The results of lesion studies are similarly inconclusive. Bilateral ACC damage due to cerebrovascular disorders or tumours has been reported to cause a syndrome of akinetic mutism (Brown 1988; for a review, see Ackermann & Ziegler 1995). Early case studies found the behavioral deficits to extend beyond verbal and nonverbal acoustic communication: Apparently vigilant subjects with normal muscle tone and deep tendon reflexes displayed diminished or abolished spontaneous body movements, delayed or absent reactions to external stimuli, and impaired autonomic functions (e.g., Barris & Schuman 1953). By contrast, bilateral surgical resection of the ACC (cingulectomy), performed most often in patients suffering from medically intractable or psychiatric diseases, failed to significantly compromise acoustic communication (Brotis et al. 2009). The complex functional-neuroanatomic architecture of the anterior mesiofrontal cortex hampers, however, any straightforward interpretation of these clinical data. In monkeys, the cingulate sulcus encompasses two or even three distinct “cingulate motor areas” (CMAs), which project to the supplementary motor area (SMA), among other regions (Dum & Strick 2002; Morecraft & van Hoesen 1992; Morecraft et al. 2001). Humans exhibit a similar compartmentalization of the medial wall of the frontal lobes (Fink et al. 1997; Picard & Strick 1996). A closer look at the aforementioned surgical data reveals that bilateral cingulectomy for treatment of psychiatric disorders, as a rule, did not encroach on caudal ACC (Le Beau 1954; Whitty 1955; for a review, see Brotis et al. 2009, p. 276). Thus, tissue removal restricted to rostral ACC components could explain the relatively minor effects of this surgical approach.6 Conceivably, mesiofrontal akinetic mutism reflects bilateral damage to the caudal CMA and/or its efferent projections, rather than dysfunction of a “cingulate vocalization center” bound to rostral ACC. Instead, the anterior mesiofrontal cortex has been assumed to contribute to reward-dependent selection/inhibition of verbal responses in conflict situations rather than to motor aspects of speaking (Calzavara et al. 2007; Paus 2001). This interpretation is compatible with the fact that psychiatric conditions bound to ACC pathology such as obsessive-compulsive disorder or Tourette syndrome cause, among other things, socially inappropriate vocal behavior (Müller-Vahl et al. 2009; Radua et al. 2010; Seeley 2008).

3.2.2. Supplementary motor area (SMA). Damage to the SMA in the language-dominant hemisphere may give rise to diminished spontaneous speech production, characterized by delayed, brief, and dysfluent, but otherwise well-articulated verbal responses without any central-motor disorders of vocal tract muscles or impairments of other language functions such as speech comprehension or reading aloud (“transcortical motor aphasia”; for a review of the earlier literature, see Jonas 1981; 1987; more recent case studies in Ackermann et al. 1996 and Ziegler et al. 1997).7 This constellation may arise from initial mutism via an intermediate stage of silent word mouthing (Rubens 1975) or whispered speaking (Jürgens & von Cramon 1982; Masdeu et al. 1978; Watson et al. 1986). Based on these clinical observations, the SMA, apparently, supports the initiation (“starting mechanism”) and maintenance of vocal tract activities during speech production (Botez & Barbeau 1971; Jonas 1981). Indeed, movement-related potentials preceding self-paced tongue protrusions and vocalizations were recorded over the SMA (Bereitschaftspotential; Ikeda et al. 1992). Calculation of the time course of BOLD signal changes during syllable repetition tasks, preceded by a warning , revealed an


earlier peak of the SMA response relative to primary sensorimotor cortex (Brendel et al. 2010). These data corroborate the suggestion – based on clinical data – of an engagement of the SMA in the preparation and initiation of verbal utterances, that is, pre-articulatory control processes.

3.3. Summary: Role of the primate-general “limbic communication system” in human vocal behavior

In line with the dual-pathway model of human acoustic communication, the ACC seems to participate in the release of stereotyped motor patterns of affective-vocal displays, even in the absence of an adequate emotional state. Whether this mesiofrontal area also contributes to the control of laryngeal muscles during speech production still remains to be established. An adjacent region, the neocortical SMA, appears, however, to participate in the preparation and initiation of articulate speech. Midbrain PAG also supports spoken language and, presumably, helps to recruit ancient brainstem circuitries which have been reorganized to subserve basic adaptive sensorimotor functions bound to verbal behavior.

4. Contribution of the basal ganglia to spoken language: Vocal-affective expression and acquisition of articulate speech

The basal ganglia represent an ensemble of subcortical gray matter structures of a rather conserved connectional architecture across vertebrate taxa, including the striatum (caudate nucleus and putamen), the external and internal segments of the globus pallidus, the subthalamic nucleus, and the substantia nigra (Butler & Hodos 2005; Nieuwenhuys et al. 2008). Clinical and functional imaging data indicate a significant engagement of the striatum both in ontogenetic speech acquisition and subsequent overlearned speech motor control. We propose, however, a fundamentally different role of the basal ganglia at these two developmental stages: The entrainment of articulatory vocal tract motor patterns during childhood versus the emotive-prosodic modulation of verbal utterances in the adult motor system.

4.1. Facets of the faculty of speaking: The recruitment of the larynx as an articulatory organ

The production of spoken language depends upon “more muscle fibers than any other human mechanical performance” (Kent et al. 2000, p. 273), and the responsible neural control mechanisms must steer all components of this complex action system at a high spatial and temporal accuracy. As a basic constituent, the larynx – a highly efficient sound source – generates harmonic signals whose spectral shape can be modified through movements of the mandible, tongue, and lips (Figs. 2A & 2B). Yet, this physical source-filter principle is not exclusively bound to human speech, but characterizes the vocal behavior of other mammals as well (Fitch 2000a). By contrast to the acoustic communication of nonhuman primates, spoken language depends, however, on a highly articulated larynx whose motor activities must be integrated with the gestures of equally articulated supralaryngeal structures into learned complex vocal tract movement patterns (Fig. 2C). For example, virtually all languages of the world differentiate between voiced and voiceless sounds (e.g., /b/ vs. /p/ or /d/ vs. /t/), a distinction which requires fast and precise laryngeal manoeuvres and a close interaction of the larynx – at a time-scale of tens of milliseconds – with the tongue or lips (Hirose 2010; Munhall & Löfqvist 1992; Weismer 1980). During voiced portions, moreover, the melodic line of the speech signal is modulated in a language-specific meaningful way to implement the intonation patterns inherent to a speaker’s native idiom or, in tone languages such as Mandarin, to create different tonal variants of spoken syllables.

Clinical and functional-imaging observations indicate the “motor execution level” of speech production, that is, the adjustment of speed and range of coordinated vocal tract gestures, to depend upon lower primary sensorimotor cortex and its efferent pathways, the cranial nerve nuclei, the thalamus, the cerebellum – and the basal ganglia (Ackermann & Ziegler 2010; Ackermann & Riecker 2010a; Ackermann et al. 2010). More specifically, distributed and overlapping representations of the lips, tongue, jaw, and larynx within the ventral sensorimotor cortex of the dominant hemisphere generate, during speech production, dynamic activation patterns reflecting the gestural organization of spoken syllables (Bouchard et al. 2013). Furthermore, it is assumed that the left anterior peri- and subsylvian cortex houses hierarchically “higher” speech-motor-planning information in the adult brain required to orchestrate the motor execution organs during the production of syllables and words (see Fig. 2C for an illustration; Ziegler 2008; Ziegler et al. 2012). Hence, ontogenetic speech acquisition can be understood as a long-term entrainment of patterned activities of the vocal tract organs and – based upon practice-related plasticity mechanisms – the formation of a speech motor network which subserves this motor skill with ease and precision. In the following sections we argue that the basal ganglia play a key role in this motor-learning process and in the progressive assembly of laryngeal and supralaryngeal gestures into “motor plans” for syllables and words. In the mature system, this “motor knowledge” gets stored within ventrolateral aspects of the left-hemisphere frontal lobe, while the basal ganglia are, by and large, restricted to a fundamentally different role, that is, the mediation of motivational and emotional-affective drive into the speech motor system.

4.2. Developmental shifts in the contribution of the basal ganglia to speech production

4.2.1. The impact of pre- and perinatal striatal dysfunctions on spoken language. Insight into the potential contributions of the basal ganglia to human speech acquisition can be obtained from damage to these nuclei at a prelinguistic age. Distinct mutations of mitochondrial or nuclear DNA may give rise to infantile bilateral striatal necrosis, a constellation largely restricted to this basal ganglia component (Basel-Vanagaite et al. 2006; De Meirleir et al. 1995; Kim et al. 2010; Solano et al. 2003; Thyagarajan et al. 1995). At least two variants, both of them point mutations of the mitochondrial ATPase 6 gene, were associated with impaired speech learning capabilities (De Meirleir et al. 1995: “speech delayed for age”; Thyagarajan et al. 1995, case 1: “no useful language at age 3 years”). As a further clinical paradigm, birth asphyxia may


Figure 2. Vocal tract mechanisms of speech sound production. A. Source-filter theory of speech production (Fant 1970). Modulation of expiratory air flow at the levels of the vocal folds and supralaryngeal structures (pharynx, velum, tongue, and lips) gives rise to most speech sounds across human languages (Ladefoged 2005). In the case of vowels and voiced consonants, the adducted vocal folds generate a laryngeal source signal with a harmonic spectrum U(s), which is then filtered by the resonance characteristics of the supralaryngeal cavities T(s) and the vocal tract radiation function R(s). As a consequence, these sounds encompass distinct patterns of peaks and troughs (formant structure; P(s)) across their spectral energy distribution. B. Consonants are produced by constricting the vocal tract at distinct locations (a), for example, through occlusion of the oral cavity at the alveolar ridge of the upper jaw by the tongue tip for /d/, /t/, or /n/ (insert of left panel: T/B = tip/body of the tongue, U/L = upper/lower lips, J = lower jaw with teeth). Such manoeuvres give rise to distinct up- and downward shifts of formants: Right panels show the formant transients of /da/ as a spectrogram (b) and a schematic display (c); dashed lines indicate formant transients of syllable /ba/ (figures adapted from Kent & Read 2002). C. Schematic display of the gestural architecture of articulate speech, exemplified for the word speaking. Consonant articulation is based on distinct movements of lips, tongue, velum, and vocal folds, phase-locked to more global and slower deformations of the vocal tract (VT) associated with vowel production. Articulatory gestures are assorted into syllabic units, and gesture bundles pertaining to strong and weak syllables are rhythmically patterned to form metrical feet. 
Note that laryngeal activity in terms of glottal opening movements (bottom line) is a crucial part of the gestural patterning of spoken words and must be adjusted to and sequenced with other vocal tract movements in a precise manner (Ziegler 2010).
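
As a compact restatement of panel A, and assuming the usual linear formulation of source-filter theory, the spectrum of a vowel or voiced consonant can be sketched as the product of the three terms named in the caption. The decibel form in the second line is added here for illustration only and is not part of the original figure.

```latex
% Source-filter relation, using the caption's symbols:
%   U(s): laryngeal source spectrum, T(s): vocal tract transfer function,
%   R(s): radiation function, P(s): resulting speech spectrum.
\[
  P(s) = U(s)\, T(s)\, R(s)
\]
% Illustrative assumption: the same relation on a logarithmic (dB) scale.
\[
  20\log_{10}\lvert P(s)\rvert
  = 20\log_{10}\lvert U(s)\rvert
  + 20\log_{10}\lvert T(s)\rvert
  + 20\log_{10}\lvert R(s)\rvert
\]
```

On a decibel scale the three contributions simply add, which is why the resonance peaks of T(s) show up as formants superimposed on the harmonic spectrum of the laryngeal source.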

predominantly impact the basal ganglia and the thalamus (eventually, in addition, the brainstem) under specific conditions such as uterine rupture or umbilical cord prolapse, while the cerebral cortex and the underlying white matter are less affected (Roland et al. 1998). A clinical study found nine children out of a group of 17 subjects with this syndrome completely unable to produce any verbal utterances at the ages of 2 to 9 years (Krägeloh-Mann et al. 2002). Six further patients showed significantly compromised articulatory functions (“dysarthria”). Most importantly, five children had not mastered adequate articulate speech at the ages of 3 to 12 years, though lesions were confined to the putamen and ventro-lateral thalamus, sparing the caudate nucleus and the precentral gyrus.

Data from a severe developmental speech or language disorder of monogenic autosomal-dominant inheritance with full penetrance extending across several generations of a large family provide further evidence of a connection between the basal ganglia and ontogenetic speech acquisition (KE family; Hurst et al. 1990). At first considered a highly selective inability to acquire particular grammatical rules (Gopnik 1990a; for more details, see Taylor 2009), extensive neuropsychological evaluations revealed a broader phenotype of psycholinguistic dysfunctions, including nonverbal aspects of intelligence (Vargha-Khadem & Passingham 1990; Vargha-Khadem et al. 1995; Watkins et al. 2002a). However, the most salient behavioral deficit in the afflicted individuals consists of pronounced abnormalities of speech articulation (“developmental verbal dyspraxia”) that render spoken language “of many of the affected members unintelligible to the naive listener” (Vargha-Khadem et al. 1995, p. 930; see also Fee 1995; Shriberg et al. 1997). Furthermore, the speech disorder was found to compromise voluntary control of nonverbal vocal tract movements (Vargha-Khadem et al. 2005). More specifically, the phenotype includes a significant disruption of simultaneous or sequential sets of motor activities to command, in spite of a preserved motility of single vocal tract organs (Alcock et al. 2000a) and uncompromised reproduction of tones and melodies (Alcock et al. 2000b).


A heterozygous point mutation (G-to-A nucleotide transition) of the FOXP2 gene (located on chromosome 7; coding for a transcription factor) could be detected as the underlying cause of the behavioral disorder (for a review, see Fisher et al. 2003).8 Volumetric analyses of striatal nuclei revealed bilateral volume reduction in the afflicted family members, the extent of which was correlated with oral-motor impairments (Watkins et al. 2002b). Mice and humans share all but three amino acids in the FOXP2 protein, suggesting a high conservation of the respective gene across mammals (Enard et al. 2002; Zhang et al. 2002). Furthermore, two of the three substitutions must have emerged within our hominin ancestors after separation from the chimpanzee lineage. Since primates lacking the human FOXP2 variant cannot even imitate the simplest speech-like utterances, and since disruption of this gene in humans gives rise to severe articulatory deficits, it appears warranted to assume that the human variant of this gene locus represents a necessary prerequisite for the phylogenetic emergence of articulate speech. Most noteworthy, animal experimentation suggests that the human-specific copy of this gene is related to acoustic communication (Enard et al. 2009) and directly influences the dendritic architecture of the neurons embedded into cortico-basal ganglia–thalamo–cortical circuits (Reimers-Kipping et al. 2011, p. 82).

4.2.2. Motor aprosodia in Parkinson’s disease. A loss of midbrain neurons within the substantia nigra pars compacta (SNc) represents the pathophysiological hallmark of Parkinson’s disease (PD; idiopathic Parkinsonian syndrome), one of the most common neurodegenerative disorders (Evatt et al. 2002; Wichmann & DeLong 2007). This degenerative process results in a depletion of the neurotransmitter dopamine at the level of the striatum, rendering PD a model of dopaminergic dysfunction of the basal ganglia, characterized within the motor domain by akinesia (bradykinesia, hypokinesia), rigidity, tremor at rest, and postural instability (Jankovic 2008; Marsden 1982). In advanced stages, functionally relevant morphological changes of striatal projection neurons may emerge (Deutch et al. 2007; see Mallet et al. [2006] for other non-dopaminergic PD pathomechanisms). Recent studies suggest that the disease process develops first in extranigral brainstem regions such as the dorsal motor nucleus of the glossopharyngeal and vagal nerves (Braak et al. 2003). These initial lesions affect the autonomic-vegetative nervous system, but do not encroach on gray matter structures engaged in the control of vocal tract movements such as the nu. ambiguus.

A classical tenet of speech pathology assumes that Parkinsonian speech/voice abnormalities reflect specific motor dysfunctions of vocal tract structures, giving rise to slowed and undershooting articulatory movements (brady-/hypokinesia). From this perspective, the perceived speech abnormalities of Parkinson’s patients have been lumped together into a syndrome termed “hypokinetic dysarthria” (Duffy 2005). Unlike in other cerebral disorders, systematic auditory-perceptual studies and acoustic measurements identified laryngeal signs such as monotonous pitch, reduced loudness, and breathy/harsh voice quality as the most salient abnormalities in PD (Logemann et al. 1978; Ho et al. 1999a; 1999b; Skodda et al. 2009; 2011).9 Imprecise articulation appears, by contrast, to be bound to later stages of the disease. In line with these suggestions, attempts to document impaired orofacial movement execution, especially, hypometric (“undershooting”) gestures during speech production, yielded inconsistent results (Ackermann et al. 1997a). Moreover, a retrospective study based on a large sample of postmortem-confirmed cases found that PD patients predominantly display “hypophonic/monotonous speech,” whereas atypical Parkinsonian disorders (APDs) such as multiple system atrophy or progressive supranuclear palsy result in “imprecise or slurred articulation” (Müller et al. 2001). As a consequence, Müller et al. assume the articulatory deficits of APD to reflect non-dopaminergic dysfunctions of brainstem or cerebellar structures.

Much like early PD, ischemic infarctions restricted to the putamen primarily give rise to hypophonia as the most salient speech (Giroud et al. 1997). In its extreme, a more or less complete loss of prosodic modulation of verbal utterances (“expressive or motor aprosodia”) has been observed following cerebrovascular damage to the basal ganglia (Cohen et al. 1994; Van Lancker Sidtis et al. 2006).10 These specific aspects of speech motor disorders in PD or after striatal infarctions suggest a unique role of the basal ganglia in supporting spoken language production in that the resulting dysarthria might primarily reflect a diminished impact of motivational, affective/emotional, and attitudinal states on the execution of speech movements, leading to diminished motor activity at the laryngeal rather than the supralaryngeal level. Similar to other motor domains, thus, the degree of speech deficits in PD appears sensitive to “the emotional state of the patient” (Jankovic 2008), which, among other things, provides a physiological basis for motivation-related approaches to therapeutic regimens such as the Lee Silverman Voice Treatment (LSVT; Ramig et al. 2004; 2007). This general loss of “motor drive” at the level of the speech motor system and the predominant disruption of emotive speech prosody suggest that the intrusion of emotional/affective tone into the volitional motor mechanisms of speaking depends on a dopaminergic striatal “limbic-motor interface” (Mogenson et al. 1980).

4.3. Dual contribution of the striatum to spoken language: A neurophysiological model

4.3.1. Dopamine-dependent interactions between the limbic and motor loops of the basal ganglia during mature speech production. In mammals, nearly all cortical areas as well as several thalamic nuclei send excitatory, glutamatergic afferents to the striatum. This major input structure of the basal ganglia is assumed to segregate into the caudate-putamen complex, the ventral striatum with the nucleus accumbens as its major constituent, and the striatal elements of the olfactory tubercle (e.g., Voorn et al. 2004). Animal experimentation shows these basal ganglia subcomponents to be embedded into a series of parallel reentrant cortico-subcortico-cortical loops (Fig. 3A; Alexander et al. 1990; DeLong & Wichmann 2007; Nakano 2000). Several frontal zones, including primary motor cortex, SMA, and lateral premotor areas, target the putamen, which then projects back via basal ganglia output nuclei and thalamic relay stations to the respective areas of origin (motor circuit). By contrast, cognitive functions relate primarily to connections of prefrontal cortex with the caudate nucleus, and affective

states to limbic components of the basal ganglia (ventral striatum). Functional imaging data obtained in humans are consistent with such an at least tripartite division of the basal ganglia (Postuma & Dagher 2006) and point to a distinct representation of foot, hand, face, and eye movements within the motor circuit (Gerardin et al. 2003). Furthermore, the second basal ganglia output nucleus, the substantia nigra pars reticulata (SNr), projects to several hindbrain “motor centers,” for example, PAG, giving rise to several phylogenetically old subcortical basal ganglia–brainstem–thalamic circuits (McHaffie et al. 2005). A brainstem loop traversing the PAG could participate in the recruitment of phylogenetically ancient vocal brainstem mechanisms during speech production (see sect. 3.1; Hikosaka 2007).

The suggestion of parallel cortico-basal ganglia–thalamo–cortical circuits does not necessarily imply strict segregation of information flow. To the contrary, connectional links between these networks are assumed to be a basis for integrative data processing (Joel & Weiner 1994; Nambu 2011; Parent & Hazrati 1995). More specifically, antero- and retrograde fiber tracking techniques reveal a cascade of spiraling striato-nigro-striatal circuits, extending from ventromedial (limbic) via central (cognitive-associative) to dorsolateral (motor) components of the striatum (Fig. 3A; e.g., Haber et al. 2000; for reviews, see Haber 2010a; 2010b). This dopamine-dependent “cascading interconnectivity” provides a platform for a cross-talk between the different basal ganglia loops and may, therefore, allow emotional/motivational states to impact behavioral responses, including the affective-prosodic shaping of the sound structure of verbal utterances.

The massive cortico- and thalamostriatal glutamatergic (excitatory) projections to the basal ganglia input structures target the GABAergic (inhibitory) medium-sized spiny projection neurons (MSN) of the striatum. MSNs comprise roughly 95% of all the striatal cellular elements. Upon leaving the striatum, the axons of these neurons connect via either the “direct pathway” or the “indirect pathway” to the output nuclei of the basal ganglia (Fig. 3B; Albin et al. 1989; for a recent review, see Gerfen & Surmeier 2011; for critical comments, see, e.g., Graybiel 2005; Nambu 2008). In addition, several classes of interneurons and dopaminergic projection neurons impact the MSNs. Dopamine has a modulatory effect on the responsiveness of these cells to glutamatergic input, depending on the receptor subtype involved (David et al. 2005; Surmeier et al. 2010a; 2010b). Against this background, MSNs must be considered the most pivotal computational units of the basal ganglia that are “optimized for integrating multiple

Figure 3. Structural and functional compartmentalization of the basal ganglia. A. Schematic illustration of the – at least – tripartite functional subdivision of the cortico-basal ganglia–thalamo–cortical circuitry. Motor, cognitive/associative, and limbic loops are depicted in different gray shades, and the two cross-sections of the striatum (center) delineate the limbic, cognitive/associative, and motor compartments of the basal ganglia input nuclei. Alternating reciprocal (e.g., 1–1) and non-reciprocal loops (e.g., subsequent trajectory 2) form a spiraling cascade of dopaminergic projections interconnecting these parallel reentrant circuits (modified Fig. 2.3.5. from Haber 2010b). B. Within the basal ganglia, the motor loop segregates into at least three pathways: a direct (striatum – SNr/GPi), an indirect (striatum – GPe – SNr/GPi), and a hyperdirect (via STN) circuit (based on Fig. 1 in Nambu 2011 and Fig. 25.1 in Walters & Bergstrom 2010). The direct and indirect medium-sized spiny projection neurons of the striatum (MSN) differ in their patterns of receptor and peptide expression (direct pathway: D1-type dopamine receptors, SP = substance P; indirect pathway: D2, ENK = enkephalin) rather than their somatodendritic architecture. Key: DA = dopamine; GPi/GPe = internal/external segment of globus pallidus; SNr = substantia nigra, pars reticulata; SNc = substantia nigra, pars compacta; VTA = ventral tegmental area; STN = subthalamic nucleus; SC = superior colliculus; PPN = pedunculopontine nucleus; PAG = periaqueductal gray.


distinct inputs” (Kreitzer & Malenka 2008), including dopamine-dependent motivation-related information, conveyed via ventromedial–dorsolateral striatal pathways to those neurons. It is well established that midbrain dopaminergic neurons have a pivotal role within the context of classical/Pavlovian and operant/instrumental conditioning tasks (e.g., Schultz 2006; 2010). More specifically, unexpected benefits in association with a stimulus give rise to stereotypic short-latency/short-duration activity bursts of dopaminergic neurons which inform the brain on novel reward opportunities. Whereas, indeed, such brief responses cannot easily account for the impact of a speaker’s mood such as anger or joy upon spoken language, other behavioral challenges, for example, longer-lasting changes in motivational state such as “appetite, hunger, satiation, behavioral excitation, aggression, mood, fatigue, desperation,” are assumed to give rise to more prolonged striatal dopamine release (Schultz 2007, p. 207). Moreover, the midbrain dopaminergic system is sensitive to the motivational condition of an animal during instrumental conditioning tasks (“motivation to work for a reward”; Satoh et al. 2003).

The dopamine-dependent impact of motivation-related information on MSNs provides a molecular basis for the influence of a speaker’s actual mood and actual emotions on the speech control mechanisms bound to the basal ganglia motor loop. Consequently, depletion of striatal dopamine should deprive vocal behavior from the “energetic activation” (Robbins 2010) arising in the various cortical and subcortical limbic structures of the primate brain (see Fig. 1B). The different basic motivational states of our species – shared with other mammals – are bound to distinct cerebral networks (Panksepp 1998; 2010). For example, the “rage/anger” and “fear/anxiety” systems involve the amygdala, which, in turn, targets the ventromedial striatum. On the other hand, the cortico-striatal motor loop is engaged in the control of movement execution, namely, the specification of velocity and range of orofacial and laryngeal muscles. The basal ganglia have an ideal strategic position to translate the various arousal-related mood states (joy or anger) into their respective acoustic signatures by means of a dopaminergic cascade of spiraling striato-nigro-striatal circuits – via adjustments of vocal tract innervation patterns (“psychobiological push effects of vocal affect expression”; Banse & Scherer 1996; Scherer et al. 2009). In addition, spoken language may convey a speaker’s attitude towards a person or topic (“attitudinal prosody”; Van Lancker Sidtis et al. 2006). Such higher-order communicative functions of speech prosody involve a more extensive appraisal of the context of a conversation and may exploit learned stylistic (ritualized) acoustic models of vocal-expressive behavior (Scherer 1986; Scherer et al. 2009). Besides subcortical limbic structures and orbitofrontal areas, ACC projects to the ventral striatum in monkeys (Haber et al. 1995; Kunishio & Haber 1994; Öngür & Price 2000). Since these mesiofrontal areas are assumed to operate as a platform of motivational-cognitive interactions subserving response evaluation (see above), the connections of ACC with the striatum, conceivably, engage in the implementation of attitudinal aspects of speech prosody (“sociolinguistic/sociocultural pull factors” as opposed to the “psychobiological push effects” referred to above; Banse & Scherer 1996; Scherer et al. 2009). Thus, both the psychobiological push and the sociocultural pull effects, ultimately, may converge on the ventral striatum, which then, presumably, funnels this information into the basal ganglia motor loops.

4.3.2. Integration of laryngeal and supralaryngeal articulatory gestures into speech motor programs during speech acquisition. The basal ganglia are involved in the development of stimulus-response associations, for example, Pavlovian conditioning (Schultz 2006), and the acquisition of stimulus-driven behavioral routines, such as habit formation (Wickens et al. 2007). Furthermore, striatal circuits are known to engage in motor skill refinement, another variant of procedural (nondeclarative) learning.11 For example, the basal ganglia input nuclei contribute to the development of “motor tricks” such as the control of a running wheel or the preservation of balance in rodents (Dang et al. 2006; Willuhn & Steiner 2008; Yin et al. 2009). investigations and clinico-neuropsychological studies suggest that the basal ganglia contribute to motor skill learning in humans as well, though existing data are still ambiguous (e.g., Badgaiyan et al. 2007; Doya 2000; Doyon & Benali 2005; Kawashima et al. 2012; Packard & Knowlton 2002; Wu & Hallett 2005). The clinical observations referred to suggest that bilateral pre-/perinatal damage to the cortico-striatal-thalamic circuits gives rise to severe expressive developmental speech disorders which must be distinguished from the hypokinetic dysarthria syndrome seen in adult-onset basal ganglia disorders. Conceivably, thus, the primary control functions of these nuclei change across different stages of motor skill acquisition. In particular, the basal ganglia may primarily participate in the training phase preceding skill consolidation and automatization: The “engrams” shaping habitual behavior and the “programs” steering skilled movements, thus, may get stored in cortical areas rather than the basal ganglia (for references, see Graybiel 2008; Groenewegen 2003).

Yet, several functional imaging studies of upper-limb movement control failed to document a predominant contribution of the striatum to the early stages of motor sequence learning (Doyon & Benali 2005; Wu et al. 2004) or even revealed enhanced activation of the basal ganglia during overlearned task performance (Ungerleider et al. 2002) and, therefore, do not support this model. As a caveat, these experimental investigations may not provide an appropriate approach to the understanding of the neural basis of speech motor learning. Spoken language represents an outstanding “motor feat” in that its ontogenetic development starts early after or even prior to birth and extends over more than a decade. During this period, the specific movement patterns of an individual’s native idiom are exercised more extensively than any other comparable motor sequences. A case similar to articulate speech can at most be made with educated musicians or athletes who have experienced extensive motor practice from early on over many years. In these subject groups, extended motor learning is known to induce structural adaptations of gray and white matter regions related to the level of motor accomplishments (Bengtsson et al. 2005; Gaser & Schlaug 2003). Such investigations into the mature neuroanatomic network of highly trained “motor experts” have revealed fronto-cortical and cerebellar regions12 to be predominantly moulded by the effects of long-term motor learning with little or no evidence for any lasting
changes at the level of the basal ganglia (e.g., Gaser & Schlaug 2003). Against this background, it might be conjectured that the basal ganglia engage primarily in early stages of speech acquisition but do not house the motor representations that ultimately convey the fast, error-resistant, and highly automated vocal tract movement patterns of adult speech. This may explain why pre-/perinatal dysfunctions of the basal ganglia have a disastrous impact on verbal communication and preclude the acquisition of speech motor skills.

How can the contribution of the basal ganglia to the assembly of vocal tract motor patterns during speech acquisition be delineated in neurophysiological terms? One important facet is that the laryngeal muscles should have gained a larger striatal representation in our species as compared to other primates. Humans are endowed with more extensive corticobulbar fiber systems, including monosynaptic connections, engaged in the control of glottal functions (see sect. 2.2.3 above; Iwatsubo et al. 1990; Kuypers 1958a). Furthermore, functional imaging data point to a significant primary-motor representation of human internal laryngeal muscles, spatially separated from the frontal “larynx region” of New and Old World monkeys (Brown et al. 2008; 2009). In contrast to other primates, therefore, a higher number of corticobulbar fibers target the nu. ambiguus. As a consequence, the laryngeal muscles should have a larger striatal representation in our species since the cortico-striatal fiber tracts consist, to a major extent, of axon collaterals of pyramidal tract neurons projecting to the spinal cord and the cranial nerve nuclei, including the nu. ambiguus (Gerfen & Bolam 2010; Reiner 2010). Apart from the nu. accumbens, electrical stimulation of striatal loci in monkeys, in fact, failed to elicit vocalizations. In the latter case, however, the observed vocalizations reflect, most presumably, evoked changes in the animals’ internal motivational milieu rather than the excitation of motor pathways (Jürgens & Ploog 1970).

A more extensive striatal representation of laryngeal functions can be expected to enhance the coordination of these activities with the movements of supralaryngeal structures. Briefly, the dorsolateral striatum separates into two morphologically identical compartments of MSNs, which vary, however, in neurochemical markers and input/output connectivity (Graybiel 1990; for recent reviews, see Gerfen 2010; Gerfen & Bolam 2010). While the so-called striosomes (patches) are interconnected with limbic structures, the matrisomes (matrix) participate predominantly in sensorimotor functions. This matrix component creates an intricate pattern of divergent/convergent information flow. For example, primary-motor and somatosensory cortical representations of the same body part are connected with the same matrisomes of the ipsilateral putamen (Flaherty & Graybiel 1993). Conversely, the projections of a single cortical primary-motor or somatosensory area to the basal ganglia appear to “diverge to innervate a set of striatal matrisomes which in turn send outputs that reconverge on small, possibly homologous sites” in pallidal structures further downstream (Flaherty & Graybiel 1994, p. 608). Apparently, such a temporary segregation and subsequent re-integration of cortico-striatal input facilitates “lateral interactions” between striatal modules and, thereby, enhances sensorimotor learning processes.

Similar to other body parts, it must be expected that the extensive larynx-related cortico-striatal fiber tracts of our species feed into a complex divergence/convergence network within the basal ganglia as well. These lateral interactions between matrisomes bound to the various vocal tract structures might provide the structural basis supporting the early stages of ontogenetic speech acquisition. More specifically, a larger striatal representation of laryngeal muscles – split up into a multitude of matrisomes – could provide a platform for the tight integration of vocal fold movements into the gestural architecture of vocal tract motor patterns (Fig. 2C).

4.4. Summary: Basal ganglia mechanisms bound to the integration of primate-general and human-specific aspects of acoustic communication

Dopaminergic dysfunctions of the basal ganglia input nuclei in the adult brain predominantly disrupt the embedding of otherwise well-organized speech motor patterns into an adequate emotive- and attitudinal-prosodic context. Based upon these clinical data, we propose that the striatum adds affective-prosodic modulation to the sound structure of verbal utterances. More specifically, the dopamine-dependent cascading interconnectivity between the various basal ganglia loops allows for a cross-talk between the limbic system and mature speech motor control mechanisms. By contrast, bilateral pre-/perinatal damage to the striato-thalamic components of the basal ganglia motor loops may severely impair speech motor integration mechanisms, resulting in compromised spoken language acquisition or even anarthria. We assume that the striatum critically engages in the initial organization of “motor programs” during speech acquisition, whereas the highly automatized control units of mature speech production, that is, the implicit knowledge of “how syllables and words are pronounced,” are stored within anterior left-hemisphere peri-/subsylvian areas.

5. Paleoanthropological perspectives: A two-step phylogenetic/evolutionary scenario of the emergence of articulate speech

In a comparative view, the striatum appears to provide the platform on which a primate-general and, therefore, phylogenetically ancient layer of acoustic communication penetrates the neocortex-based motor system of spoken language production. Given the virtually complete speechlessness of nonhuman primates due to, especially, a limited role of laryngeal/supralaryngeal interactions during call production, structural elaboration of the cortico-basal ganglia–thalamic circuits should have occurred during hominin evolution. Recent molecular-genetic findings provide first specific evidence in support of this notion. More specifically, human-specific FOXP2 copies may have given rise to an elaboration of somatodendritic morphology of basal ganglia loops engaged in the assemblage of vocal tract movement sequences during early stages of articulate speech acquisition. We propose, however, that the assumed FOXP2-driven “vocal-laryngeal elaboration” of the cortico-striatal-thalamic motor loop should have been preceded by a fundamentally different phylogenetic-developmental process, that is, the emergence of monosynaptic corticobulbar tracts engaged in the innervation of the laryngeal muscles.


5.1. Monosynaptic elaboration of the corticobulbar tracts: Enhanced control over tonal and rhythmic characteristics of vocal behavior (Step 1)

In nonhuman primates the larynx functions as an energetically efficient sound source, but shows highly constrained, if any, volitional motor capabilities. Direct projections of the motor cortex to the nu. ambiguus (see sect. 2.2.3) should have endowed this organ in humans with the potential to serve as a more skillful musical organ and an articulator with similar versatility as the lips and the tongue. Presumably, this first evolutionary step toward spoken language emerged independent of the presence of the human-specific FOXP2 transcription factor. Structural morphometric (Belton et al. 2003; Vargha-Khadem et al. 1998; Watkins et al. 1999; 2002b) and functional imaging studies (Liégeois et al. 2003) in affected KE family members demonstrate abnormalities of all components of the cerebral speech motor control system, except the brainstem targets of the corticobulbar tracts (cranial nerve nuclei, pontine gray) and the SMA (Fig. 4 in Vargha-Khadem et al. 2005).13 As an alternative to FOXP2-dependent neural processes, the increase of monosynaptic elaboration of corticobulbar tracts within the primate order (see sect. 2.2.3) might reflect a “phylogenetic trend” (Jürgens & Alipour 2002) associated with brain volume enlargement. Thus, “evolutionary changes in brain size frequently go hand in hand with major changes in both structural and functional details” (Striedter 2005, p. 12). For example, absolute brain volume predicts – via a nonlinear function – the size of various cerebral components, ranging from the medulla to the forebrain (Finlay & Darlington 1995). The three- to four-fold enlargement of absolute brain size in our species relative to australopithecine forms (Falk 2007), therefore, might have driven this refinement of laryngeal control – concomitant with a reorganization of the respective motor maps at the cortical level (Brown et al. 2008; 2009). Whatever the underlying mechanism, the development of monosynaptic projections of the motor strip to nu. ambiguus should have been associated with an enhanced versatility of laryngeal functions.

From the perspective of the lip-smack hypothesis (Ghazanfar et al. 2012), the elaboration of the corticobulbar tracts might have been a major contribution to turn the visual lip-smacking display into an audible signal (see MacNeilage 1998; 2008). Furthermore, this process should have allowed for a refinement of the rather stereotypic acoustic structure of the vocalizations of our early hominin ancestors (Dissanayake 2009, p. 23; Morley 2012, p. 131), for example, the “discretization” of (innate) glissando-like tonal call segments into “separate tonal steps” (Brandt 2009) or the capacity to match and maintain individual pitches (Bannan 2012, p. 309). Such an elaboration of the “musical characteristics” (Mithen 2006, p. 121) of nonverbal vocalizations, for example, contact calls, must have supported mother–child interactions. In order to impact the attention, arousal, or mood of young infants, caregivers often use non-linguistic materials such as “interjections, calls, and imitative sounds”, characterized by “extensive melodic modulations” (Papoušek 2003). Furthermore, monosynaptic corticobulbar projections allow for rapid on/off switching of call segments and, thus, enable synchronization of vocal behavior, first, across individuals (communal chorusing in terms of “wordless vocal exchanges” as a form of “grooming-at-a-distance”; Dunbar 2012) and, second, with other body movements (dance). Such activities support interpersonal emotional bonds (“fellow-feeling”) and promote social cohesion/cooperation (Cross 2001; 2003; Cross & Morley 2009). These accomplishments must have emerged after the separation of the hominin lineage since chimpanzees are unable to converge on a regular during call production (e.g., Geissmann 2000). More specifically, African apes engage in rhythmical behavior like drumming, but, apparently, lack the capacity of a mutual entrainment of such actions into synchronized group displays (Fitch 2012). Thus, monosynaptic elaboration of the corticobulbar tracts might have provided the phylogenetic basis both for the “communicative musicality” of human infants and for communal “wordless vocal exchanges,” preceding both articulate speech and more formal musical activities shaped by culture (Malloch & Trevarthen 2009).14 As a further indication that these achievements are not bound to the presence of the human-specific FOXP2 transcription factor, reproduction of musical tones and tunes was found largely uncompromised in KE family members with articulatory disorders (Alcock et al. 2000b).

The Kuypers/Jürgens hypothesis (Fitch et al. 2010) assumes that the vocal-behavioral limitations of nonhuman primates are rooted in the absence of direct corticobulbar projections to the brainstem motoneurons engaged in the innervation of laryngeal muscles and housed within the nu. ambiguus. Indeed, this model explains the inability of nonhuman primates to produce sound patterns that impose particularly high demands on the coordination of laryngeal and supralaryngeal activities such as the rapid voiced–voiceless alterations characteristic of articulate speech. Yet, this suggestion cannot account for nonhuman primates’ inability to imitate less challenging, fully voiced, speech-like vocalizations such as syllables comprising voiced consonants (see sect. 4.3.2).

5.2. FOXP2-driven vocal elaboration of the basal ganglia motor loop: Enhanced integration of laryngeal and supralaryngeal gestures (Step 2)

As a further prerequisite of spoken language, the vocal folds must serve as an “articulatory organ” that can be “pieced together” with equally versatile orofacial gestures into a tightly integrated meshwork of appropriately timed vocal tract movements. Conceivably, FOXP2-driven morphological changes at the level of the basal ganglia in our hominin ancestors provided the physiological basis for these sensorimotor capabilities to emerge as a second phylogenetic step toward articulate speech. More specifically, enhanced “lateral interactions” between striatal representations of vocal tract muscles based on a divergence/convergence architecture of information flow within the basal ganglia (Flaherty & Graybiel 1994) have the potential to support the linkage of vocal tract movements into language-specific syllabic and metrical patterns. This would represent a major step in sensorimotor verbal learning during ontogenetic speech acquisition. The role of the basal ganglia in this process seems to be confined to the phase where the entrainment and automatization of speech motor patterns takes place, while the persistent motor plans evolving during this process get stored within left-hemisphere peri- or subsylvian cortex. In the mature
speech motor system, the contribution of the striatum to speech production appears predominantly restricted to dopamine-dependent, emotive-prosodic shading of the speech signal as a homologue to the vocalizations of nonhuman primates and a vestige of the ancient communication system.

Paleoanthropological data such as endocast traces of Broca’s area (Holloway et al. 2004, pp. 15ff) or morphological features of the cranial base (Lieberman 2011) provide only indirect and ambiguous evidence on the evolution of spoken language. “Comparing our behavior and brain with those of other extant primates” (Ghazanfar & Miller 2006, p. R879) still represents the most robust approach to the investigation of the “biological mechanisms underlying the evolution of speech” (Ghazanfar & Rendall 2008, p. R457). Recently, however, molecular-genetic studies have shed light on the phylogeny of verbal communication in the hominin lineage and, more specifically, the contribution of the basal ganglia to the evolution of spoken language. Thus, molecular-genetic analyses found the human form of the FOXP2 protein in 43,000-year-old Neanderthal skeletal remains (Rosas et al. 2006) linked to the same haplotype as in our species (Krause et al. 2007).15 Since large-scale analyses of the FOXP2 locus in humans failed to detect any amino acid polymorphisms (Enard et al. 2002), those speech-related mutations must have been the target of strong selection pressures, causing a relatively fast fixation within the human gene pool (“selective sweep”). Assuming modern humans and Neanderthals did not interbreed, positive selection of the relevant FOXP2 mutation(s) should have occurred in our most recent common ancestor (MRCA). Sequence analyses both of nuclear and mitochondrial DNA “locate” the MRCA to the mid-Middle Pleistocene, around 400,000 to 600,000 years ago (Endicott et al. 2010; Green et al. 2010; Hofreiter 2011; Noonan 2010), and these data are compatible with the fossil record (Weaver et al. 2008). As an alternative scenario, gene flow could explain the presence of the human FOXP2 variant in Neanderthal bones (Coop et al. 2008). Under these conditions, a later emergence of the respective hominin mutations has been assumed – around 40,000 years ago (see Stringer 2012, pp. 190ff, for a recent discussion of interbreeding between modern humans and archaic populations, i.e., Neanderthals and Denisovans). A more recent molecular-genetic study, finally, points at a positive selective sweep of a regulatory FOXP2 element – affecting neuronal expression of this gene – within a comparable time domain, that is, during the last 50,000 years (Maricic et al. 2013). In any case, whatever model will prove true, FOXP2-driven speech-related modification of cortico-striatal circuits must have emerged in individuals characterized by a cerebral volume similar to that of extant modern humans (Rightmire 2004; 2007).

Assuming a gradual monosynaptic elaboration of corticobulbar projections in parallel with brain size increase across the hominin lineage (see above), the relatively late reorganization of cortico-basal ganglia loops driven by specific FOXP2 mutations should have occurred on top of a fully developed motoneuronal axis. It is tempting to relate the selective sweep of the hominin FOXP2 mutations to the evolution of speech and language functions (Enard & Pääbo 2004; Zhang et al. 2002). However, the benefits of full-fledged verbal communication cannot have been the driving force of the emergence of articulate speech. “If the first one or three or five protolanguage signs [such as syllable repetitions or simple words] didn’t have a substantial payoff, no one would have bothered to invent any more” (Bickerton 2009, p. 165). The announcement of “displaced” objects such as perished large mammals and the subsequent recruitment of troop members for carcass exploitation has been assumed to provide the necessary “substantial payoff” (Bickerton 2009, pp. 167f). But individuals spending their whole – though often short – lives together in small and intimate troops should have been able to convey such simple messages to a sufficient extent by nonverbal, that is, gestural means (Coward 2010, p. 469).

Rather than semantic-referential functions, the earliest speech-like vocalizations could have served as refined contact calls and, thus, facilitated mother–child interactions (Falk 2004; 2009). Likewise, these vocalizations might have allowed for a vocal elaboration of group activities such as communal dancing or grooming, which consolidate intragroup cohesion and cooperation (Dunbar 1996; Mithen 2006, pp. 208f). In other words, the earliest verbal utterances further expanded and refined the space of versatile vocal displays afforded by the preceding development of monosynaptic corticobulbar projections to the nu. ambiguus. Besides other benefits (see above), these accomplishments should have enhanced a “speaker’s” social prestige. Subsequent gradual “conventionalization” (Milo & Quiatt 1994) of speech-like acoustic signals then could have slowly created opportunities for the conveyance of environmental or social information by simply drawing attention to an actual event or situation (Dessalles 2007, p. 360).

6. A look beyond the primate lineage: Birdsong and human speech

In a broader comparative perspective, the emergence of articulate speech appears to have involved the convergent evolution in our species of rather ancient principles of brain wiring, documented already many years ago in songbirds. The avian “song production network” roughly separates into two circuits, that is, the vocal motor pathway (VMP) and the anterior forebrain pathway (AFP; e.g., Bolhuis et al. 2010; Jarvis 2004a; 2004b). Whereas VMP shares essential organizational principles with human corticobulbar tracts such as monosynaptic projections to the cranial nerve centers steering the peripheral vocal apparatus (Wild 2008; see also Ackermann & Ziegler 2013), there are striking similarities between AFP and the cortico-basal ganglia loops of mammals, including our species (Doupe et al. 2005). In zebra finches, area X – a major AFP component that includes both striatal and pallidal elements – shows, for example, specific interdependencies between FoxP2 level and the accuracy of tutor song imitation (Haesler et al. 2007) or juvenile/adult singing activity (Teramitsu et al. 2010; for an evolutionary perspective on this gene see Scharff & Haesler 2005). Whereas bilateral VMP damage significantly compromises vocal behavior at any stage of an individual’s life history, AFP dysfunctions have, by contrast, a more subtle impact upon mature songs, but severely disrupt vocal learning mechanisms (e.g., Brainard & Doupe 2002). Thus, (i) monosynaptic connections between upper and lower motoneurons engaged in the innervation of the sound source and (ii)


cortico-striatal motor loops supporting vocal-laryngeal functions appear to represent common functional-neuroanatomic prerequisites both of spoken language and birdsong (for a review of the parallels between avian and human acoustic communication, see Doupe & Kuhl 1999; Bolhuis & Everaert 2013; Bolhuis et al. 2010). As a consequence, birdsong can serve as an experimental model for the investigation of the neural control of human speech – though, most presumably, syntactic and semantic aspects of verbal utterances elude such an approach (Beckers et al. 2012; Berwick et al. 2011). The hitherto underestimated role of the basal ganglia in spoken language should help to further elucidate the relationship between birdsong and human speech.

7. Conclusions

During recent years, a salient contribution of subcortical structures, including the basal ganglia, to language evolution has been assumed (Lieberman 2000; 2007). More specifically, FOXP2-driven modification of neural circuits traversing the basal ganglia must be considered a necessary prerequisite for “the emergence of proficient spoken language” (Vargha-Khadem et al. 2005). However, these suggestions do not account for the developmental dynamics of cortico-striatal interactions and the discrepancies between the sequels of basal ganglia lesions in children and adults. Based upon behavioral–clinical and functional imaging data, in this article we have proposed (1) two successive phylogenetic stages of speech acquisition (monosynaptic refinement of corticobulbar tracts and laryngeal elaboration of cortico-striatal motor circuits), and (2) a functional reorganization of the cortico-striatal motor loops engaged in vocal tract control during ontogenetic speech development (Fig. 4).

It goes without saying that the model outlined here addresses only one out of several building blocks of a comprehensive theory of the evolution of spoken language. Most evidently, our approach still fails to account for the co-evolution of the described linguistic motor skills with the auditory skills underlying speech perception, and, as a consequence, the emergence of the auditory-motor network that underlies the phonological processing capacities of our species. Furthermore, we need to better understand how this elaborate auditory-vocal communication apparatus became overarched by the expanding conceptual-semantic and syntactic capabilities of humans. Thus, language evolution must be considered a multicomponent process, and the specific phylogenetic interactions of emergent speech production with these other traits await further elucidation. Presumably, any such phylogenetic account also needs to integrate, among other things, social and motivational contingencies (e.g., Dunbar 1996), “the desire to use the vocal tract to communicate” (Locke 1993, p. 322f), amodal mimetic capacities (Donald 1999), mirror neuron systems (Arbib 2006), and so-called executive functions (Coolidge & Wynn 2009) as relevant driving forces and prerequisites of spoken language evolution (for a comprehensive overview, see Tallerman & Gibson 2012).

Figure 4. Cerebral network supporting the integration of primate-general (gray arrows) and human-specific aspects of acoustic communication (black).
A cascading dopaminergic circuitry (bidirectional arrows) connects the ventromedial-limbic (vm STR) with the dorsolateral-motor components of the striatum (dl STR) and their respective output nuclei, SNr and GPi. We suggest that this circuitry funnels information on a speaker’s actual affective/motivational state into the central motor system, thereby modulating spoken language by an emotive-prosodic “tone,” a homologue of the vocal behavior of nonhuman primates. Unlike what is postulated by dual-pathway models, the two networks appear to be closely intertwined at the level of the basal ganglia and of midbrain/brainstem structures. In our species, the motor cortex, first, has monosynaptic projections to brainstem nu. ambiguus and, second, the basal ganglia motor loop extends to laryngeal functions – based, probably, on the convergent evolution of a wiring schema already extant in songbirds – whereas nonhuman primates seem to lack such a “vocal elaboration” of subcortical-cortical motor circuitry. The dashed lines indicate that the basal ganglia motor loop, apparently, undergoes a dynamic ontogenetic reorganization during spoken language acquisition in that a left-hemisphere cortical storage site of “motor programs” gradually emerges, bearing the major load of vocal tract control after mature speech production has been established. (This figure does not include the cerebellum, a structure also engaged in speech motor control [see Ackermann 2008], but not relevant for the discussion in this article.)
Key: Amygdala etc. = amygdala and other (allocortical/mesolimbic) structures of the limbic system; ACC = anterior cingulate cortex; SMA = supplementary motor area; SMC = sensorimotor cortex; GPi = internal segment of globus pallidus; SNr/SNc = substantia nigra, pars reticulata/pars compacta; PAG = periaqueductal gray; vCPG = vocal central pattern generator.

NOTES

1. Though predominantly depending on glottal source characteristics such as the fluctuations of pitch, loudness, and voice quality, vocal-affective prosodic expression may also be associated with changes in speech breathing patterns, alterations of speaking rate, and the degree to which speech sounds are hyper- or hypoarticulated. Thus, motivational factors have, more or less, an impact on all vocal tract subsystems.

Affective-emotive speech prosody, that is, the expression of arousal-related mood states, has been considered as a behavioral trait homologous to the acoustic signals of nonhuman primates in addition to nonverbal affective vocalizations such as laughter (“push-effects” of affective-emotive prosody; see last paragraph in sect. 4.3.1). By contrast, attitudes like doubt or approval cannot unambiguously be expected in nonhuman primates.


Thus, it is questionable whether attitudinal prosody, that is, appraisal-related “pull-effects,” can be assumed homologous to the vocal behavior of nonhuman primates.

Besides arousal-related motivational/affective states (e.g., joy) or appraisal-based subjective attitudes (e.g., doubt), speech prosody may also convey linguistic information such as word accent (linguistic prosody) or contribute to the implementation of “speech acts” such as verbal intimidation of another subject (Sidtis & Van Lancker Sidtis 2003; Van Lancker Sidtis et al. 2006). Linguistic and pragmatic prosody are outside the scope of this article.

In addition to a propositional message and affective/attitudinal states, the speech signal also conveys speaker-related (“indexical”) information on age, gender, and identity, simply because the size and tissue properties of laryngeal and supralaryngeal structures differ across individuals and change over lifetime (Kreiman & Sidtis 2011).

2. The more recent paleoanthropological literature applies the term hominin – rather than hominid – to the human clade (“family”), that is, the “bush” of all species tracing back to a common ancestor who diverged from the lineage encompassing modern chimpanzees (Lewin & Foley 2004, p. 9).

3. Nucleotide sequences are given in italics, proteins in regular letters; lower- and uppercase serve to distinguish human (FOXP2/FOXP2), murine (Foxp2/Foxp2), and other, for example, avian (FoxP2/FoxP2) variants of the forkhead family of genes (Kaestner et al. 2000).

4. The PAG and the adjacent mesencephalic tegmentum represent a functional-neuroanatomic entity (Holstege 1991). In the subsequent paragraphs, the term “PAG” will always refer to both subcomponents.

5. Monosynaptic projections of (the avian) motor cortex to brainstem nuclei have also been documented in songbirds (for a review see, e.g., Wild 2008), an often neglected prerequisite of vocal learning (see sect. 6).

6. Two cases of a constellation resembling transcortical motor aphasia following ACC infarction have been documented to date (Chang et al. 2007). Diffusion tensor imaging revealed additional disruption of efferent SMA fibers in one patient. Thus, a substantial contribution of premotor mesiofrontal cortex to the observed communication disorders must be considered.

7. Two case studies noted compromised speech prosody after mesiofrontal lesion (Bell et al. 1990; Heilman et al. 2004). In the absence of more detailed neuroanatomic data, such observations are difficult to interpret unambiguously.

8. Further alterations of the FOXP2 gene – such as a nonsense mutation giving rise to truncated protein products – have been found in association with developmental speech dyspraxia (MacDermot et al. 2005).

9. In contrast to other dysarthria variants, PD subjects show, as a rule, normal speaking rates. A subgroup of patients even displays an accelerated tempo (“hastening phenomenon”; e.g., Duffy 2005). This unique, but rarely studied, phenomenon may reflect a release of oscillatory basal ganglia activity (Ackermann et al. 1997b; Riecker et al. 2006).

10. Tracing back to the late 1970s (Ross & Mesulam 1979), a series of case studies assigned motor aprosodia – disrupted implementation of the “affective tone” of spoken language, concomitant with a preserved “ability to ‘feel emotion’ inwardly” and an unimpaired comprehension of other subjects’ vocal expression of motivational states – to a dysfunction of right-hemisphere fronto-opercular cortex and/or anterior insula (e.g., Ross & Monnot 2008). However, the lesions in these cases appear to have encroached on the basal ganglia, including their connections to mesiofrontal cortex (see Cancelliere & Kertesz 1990).

11. In contrast to habit formation, that is, the incremental emergence of stimulus-driven behavioral routines, motor skill learning is characterized by the incremental refinement of movement execution as reflected in reaction time measurements: “Learning how to ride a bicycle is quite different from having the habit of biking every evening after work” (Graybiel 2008, p. 370).

12. As compared to the upper limbs, the specific contribution of the cerebellum to speech motor learning is less clear. Most noteworthy, the few reported cases of congenital cerebellar hypoplasia/aplasia, apparently, lack any significant disorders of spoken language (Ackermann & Ziegler 1992). Acquired dysfunctions of the cerebellum, nevertheless, compromise speech production, giving rise to, among other things, a slowed speaking rate and imprecise consonant articulation (Ackermann 2008; Duffy 2005).

13. These inferences must be considered with some precautions: We can only conclude that the heterozygous(!) constellations observed so far in the KE family (Bolhuis et al. 2010, p. 753) do not significantly disrupt the corticobulbar pathway – unlike other components of the central motor system.

14. Although contemporary traditional societies of a predominantly hunter-gatherer mode of subsistence “are not necessarily like some form of pre-human and should not be used uncritically as models,” the respective ethnographic data, nevertheless, allow limited inferences on the behavioral repertoire of our hominin ancestors (Barnard 2011, p. 15). Thus, extensive communal dancing, often accompanied by rhythmic nonverbal utterances, represents a salient component of many ceremonies associated with important events in the life of an individual (e.g., circumcision rite; Turner 1967, pp. 186ff, 193) or the history of a group (war-/peace-related gatherings; e.g., Rappaport 2000, pp. 173ff). Since the coordination of vocal behavior and body movements may encourage a sense of “unity, harmony, and concord” among a group, social bonding should benefit from a vocal elaboration of ritual forms (Rappaport 1999, pp. 220, 252ff). It must be noted, however, that communal dancing often may include a competitive element aside from social bonding (James 2003, pp. 75f; for examples, see Rappaport 1999, p. 80; 2000, pp. 191ff; Turner 1967, p. 260). Principally, refined musical abilities could have supported to some extent referential communication. Spoken languages may include a broad range of nonverbal signals (Lewis 2009). For example, the Mbendjele people living in the dense equatorial forests of the Congo Basin, a habitat that severely impedes visual orientation, report an encounter with a dangerous animal to other group members by means of meticulous mimicry of the respective auditory scene. These anthropological data support the suggestion that enhanced musicality of nonverbal vocalizations may provide communicative benefits, but do not necessarily imply the notion of a “musical protolanguage” or “musilanguage” (Brown 2000), that is, music-like learned communication systems preceding full-fledged spoken language, a hypothesis tracing back to (1871).

15. Similar to nonhuman primates, limitations of articulate speech due to vocal tract constraints have been attributed to Neanderthals as well, giving rise to a reduced repertoire of speech sounds (for a critical discussion, see Barney et al. 2012; Clegg 2012).

Open Peer Commentary

The sound of one hand clapping: Overdetermination and the pansensory nature of communication

doi:10.1017/S0140525X13003944

Kenneth John Aitken
Psychology Department, Hillside School, Aberdour, Fife KY3 0RH, Scotland, United Kingdom.
[email protected] [email protected]


Abstract: Two substantive issues are relevant to discussions of the evolution of acoustic communication and merit further consideration here. The first is the importance of communicative ontogeny and the impact of the proximal social environment on the early development of communication and language. The second is the emerging evidence for a number of non-linguistic roles of FOXP2 and its orthologs.

Ackermann et al. review evidence that changes to FoxP2 acted as the necessary and specific accelerant for human language development. I will briefly discuss three points relevant to this view: the role of pre-verbal interaction in language acquisition, the range of genes and pathways involved, and lastly the importance of FoxP2 changes in other species.

Communicative ontogeny. The ontogeny of communication stems from a latent genetic potential. This is channelled, constrained, and developed through the neonatal environment (Aitken 2008; Aitken & Trevarthen 1997; Crais et al. 2004; Rowe & Goldin-Meadow 2009). Neonates interact with adults with varied communicative capabilities. Over 100,000 years, newborns have adapted to massive changes in culture and language, while the genetic mechanisms proposed are largely unchanged.

Prehistoric behaviour left us no records. We have to look to contemporary ontogenies to observe differences in development. Signing-for-communication by the congenitally deaf infants of signing deaf parents is precocious, while infants with hearing parents and hearing infants with congenitally deaf parents are often slow in signing (Volterra & Erting 1990). Language and social attunement in hearing infants with hearing parents seem little affected by variations in adult gesture (Kirk et al. 2012). Interactional attunement seems critical to infant development (Lundy 2013).

Ontogeny only partially mirrors phylogeny. The communicative environment guides our latent and flexible potential. Our neonatal capacity to cope with, adapt to, and rapidly learn from our social environment is perhaps the unique human attribute. We are born with the capacity to develop the language of our parents through their environment. We are socially altricial – our larger cortices enable us to immerse ourselves in and learn through our social environment, to engage our caregivers and to ensure that we are cared for and stimulated. This process is largely articulated well before we develop language (Feldman 2007; Oller et al. 2013).

Foetal brain growth approaches the limits imposed by maternal pelvic size. This is at the cost of the neotenous development and relative vulnerability of most other organ systems. Accelerated early postnatal growth could surely achieve this end to support a cognitive-linguistic system with reduced perinatal risk. The abilities necessary to social survival rely on the intergenerational transmission of adaptability. A dyadic preverbal system underpins this process (see Trevarthen & Aitken 2001), but the “second-person neuroscience” required to study its neurobiology is a recent development (Schilbach et al. 2013).

An alternative evolutionary strategy, typified by the social insects, relies on invariance in the social behaviour of its members (see Miller 2010). This can also provide evolutionary success but is less robust in the face of significant environmental change.

FOXP2 – “Human-specific and central to linguistic communication” or a more general key in processing complex information? Forkhead transcription factors are important to a wide range of developmental processes (Carlsson & Mahlapuu 2002; Nudel & Newbury 2013). Interest in FOXP2 came through studying the effects of its mutation on one family pedigree – the KE family (Lai et al. 2001). First referred to as a “developmental verbal dyspraxia” (Hurst et al. 1990), it has also been reported simply as a “dysphasia” (Gopnik 1990b), as a defect in phonology and language-production (Fletcher 1990), and a severe speech disorder affecting all aspects of expressive language (Vargha-Khadem & Passingham 1990). Non-language–related differences have also been reported (Liégeois et al. 2003).

FOXP2 is highly conserved. Only two amino acids differentiate us from orthologs in certain other primates (such as the gorilla and chimpanzee), and three from the orang-utan and mouse. In mice, a defect in FoxP2 impairs both ultrasonic vocalization (Shu et al. 2005) and motor learning (Groszer et al. 2008), and there are sex differences in gene expression (Bowers et al. 2013). In the human, FOXP transcription differences, including FOXP2, are associated with an increased likelihood of autism spectrum disorders (ASD) (see Bowers & Konopka 2012; Mukamel et al. 2011; Toma et al. 2013).

Is FOXP2 the key “language gene”? A group of genetic factors is reported in association with specific language impairments (SLIs). These include FOXP2, CYP19A1, FOXG1, FOXP1, NRXN1, PCDH11X, PCDH11Y, SETBP1, CNTNAP2, ATP2C2, and CMIP (Deriziotis & Fisher 2013; Marseglia et al. 2012; Newbury et al. 2010; Toma et al. 2013). SLIs are commonly reported in association with ASD (Bowers & Konopka 2012; Chien et al. 2013; Szalontai & Csiszar 2013). In addition, CNTNAP2 KIIA0319/TTRAP/THEM2 mutations have been reported in association with reading disorders (see Newbury et al. 2011; Pinel et al. 2012).

Is the specialised role of FOXP2 confined to human communication? Orthologs of the FOXP2 gene are found across many species and affect vocal communication in many. It is highly conserved and seems likely to have an important function or functions preceding its role in language. The notion of “deep homology” of structural genes in somatotopic development is well known, but its relevance to social behaviour has only recently been suggested (Scharff & Petri 2011). Overly strong parallels to animal models are inappropriate (see Lynch 2009); however, knock-in humanized FoxP2 genes in mice have been shown to alter cortico-basal ganglia circuitry (Enard et al. 2009).

FoxP2 is involved in complex non-linguistic systems. It affects birdsong development (Teramitsu et al. 2004), and FoxP2 protein levels alter with the amount of male singing (Miller et al. 2008). In some species of bat, FoxP2 appears to have evolved in parallel with echolocation (Yin et al. 2008). Here, complex vocalization is used to coordinate flight and prey location (see Metzner & Schuller 2010).

FoxP2 has undergone accelerated evolution in echolocating bats (Li et al. 2007), whales, dolphins (Nery et al. 2013), and humans (Ayub et al. 2013). It has a wider range of functions, across a broader phylogenetic range than was previously appreciated in the brain networks for complex auditory processing. In some species this has served communication, while in others its adaptive function seems more related to complex motor guidance.

Targets in the avian and human brain are now clearer, but their genetic effects and neurochemical cascade are complex (Konopka et al. 2009). To date, some 34 FOXP2 transcription targets have been identified in basal ganglia and inferior frontal cortex alone (Spiteri et al. 2007).

FOXP2 is insufficient to account for the development of human language or its neural and neurochemical substrates. It is a proxy marker for the genetic control of complex biological systems we are only beginning to define or understand.

Comparative analyses of speech and language converge on birds

doi:10.1017/S0140525X13003956

Gabriël J. L. Beckers,a Robert C. Berwick,b and Johan J. Bolhuisa
aCognitive Neurobiology and Helmholtz Institute, Departments of Psychology and Biology, Utrecht University, 3584 CH Utrecht, The Netherlands; bDepartment of Electrical and Computer Science and Department of Brain and Cognitive Sciences, Massachusetts Institute of , Cambridge, MA 02139.
[email protected]


http://gbeckers.nl
[email protected]
http://lids.mit.edu/people/faculty/berwick-robert.html
[email protected]
http://www.bio.uu.nl/behaviour/Bolhuis

Abstract: Unlike nonhuman primates, thousands of bird species have articulatory capabilities that equal or surpass those of humans, and they develop their vocalizations through vocal imitation in a way that is very similar to how human infants learn to speak. An understanding of how speech mechanisms have evolved is therefore unlikely to yield key insights into how the human brain is special.

Ackermann et al.’s efforts to understand the evolution of “brain mechanisms of acoustic communication” focus on neuroanatomical adaptations in nonhuman primates that may have enabled the evolution of articulated speech. Unlike these authors, however, we do not think that an understanding of how articulation evolved in terms of common descent from our primate ancestors must be key to an understanding of “how the human brain is special.” Particularly when it comes to speech and language, large-scale patterns of evolutionary convergence provide insights that are at least as important as insights from analyzing recent common descent.

Speech is one possible external interface for human language, and speech-like capabilities per se are not unique to humans or primates but are in fact widespread among species far removed from the primate clade. Ackermann et al. briefly mention songbirds as an experimental model system to study neural control of speech-like behavior, but at least as important is that from a broader comparative view, songbirds also provide important evolutionary insights. Not only do birds have structured, articulated vocalizations, but just like human infants, they acquire these vocalizations through imitation learning, a trait that is rare among mammals and appears to be completely absent in nonhuman primates. In addition, the way in which songbirds learn to sing is very similar to the way that human infants acquire speech. First, in both cases there is a sensitive period during which learning proceeds optimally. Second, developing individuals go through a transitional phase of vocal development, which is called “babbling” in infants and “subsong” in songbirds (Bolhuis & Everaert 2013; Bolhuis et al. 2010). In both species, vocal imitation and learning typically play a large role, though as noted above, in humans, the interface modality can be gestures rather than speech.

Beyond their human-like way of acquiring their vocalizations, many songbird and parrot species also produce highly virtuoso vocalizations, using special adaptations for phonation and articulatory control. Birds have evolved a specialized organ, the syrinx, solely for vocalization, unlike the human larynx. In songbirds, this organ is bipartite, enabling them, for example, to sing with two independent voices at the same time, to use one side for singing and the other side for respiration to avoid running out of breath, or to use one voice for low registers and the other one for high registers. Further, vocal articulation in birds is not restricted to this specialized organ, but also includes fast lingual and oropharyngeal movements that either support voice articulations, or add another layer of complexity on top of it (Beckers 2013; Beckers et al. 2004). In short, there is no question that the vocal capabilities of many species of birds surpass those found in any other clade, including humans.

Vocal virtuosity in birds serves a variety of functions, including the social ones that Ackermann et al. suggest played a role in human speech evolution. Articulatory and vocal imitation capabilities have existed in these large clades for at least 50 million years (Jarvis 2004b), providing ample opportunity for evolutionary tinkering, especially given that birds are very diverse in terms of ecology and behavior. Despite this, none of the many thousands of extant species of vocal learning birds have so far been reported to possess a “special” brain. This comparative result suggests that Ackermann et al. place too much weight on the notion that the evolution of more versatile call-like vocalizations for improved communication in our hominin ancestors played a crucial role in the origin of any traits that may be uniquely human.

Regarding the mechanisms underlying vocal behavior, Ackermann et al. discuss the neural and genetic (FOXP2) parallels between humans and nonhuman primates in some detail. Here too, common descent may not be a reliable guiding principle for comparative research, because changes in FOXP2 are implicated not only in differences between humans and nonhuman primates, but also other mammals (e.g., bats and cetaceans) as well as birds. Songbirds also have a FOXP2 gene that differs very little from the human variant. Moreover, in zebra finches the FOXP2 gene is apparently involved in vocalization and vocal learning, as it is in humans (Bolhuis & Everaert 2013; Bolhuis et al. 2010). Additionally, in comparison with humans, songbirds have analogous (and perhaps homologous) brain structures that are involved in vocal production and auditory perception and memory (Bolhuis et al. 2010).

Arguably, an important reason for the uniqueness of the human brain/mind is our capacity for language per se, rather than articulatory competence (Berwick et al. 2013). Given the already strong parallels between humans and songbirds in terms of auditory-vocal imitation learning, and the often remarkable articulatory skills in many avian species, it is reasonable to ask whether songbirds also possess human-like syntactic abilities (Berwick et al. 2011). Recent claims of such syntactic abilities in songbirds (e.g., Abe & Watanabe 2011) have been shown to be based upon flawed experimental methodologies (Beckers et al. 2012). Nevertheless, we argue that the absence of evidence for human-like combinatorial abilities in songbirds does not as of yet constitute evidence of their absence. Should such syntactic capabilities be present in nonhuman animals, songbirds would prove more likely candidates for comparative evolutionary analysis than apes or monkeys. Taken together with the neurocognitive parallels between birdsong and human speech that we have sketched above (see also Berwick et al. 2011; 2013; Bolhuis et al. 2010), this has important consequences for any evolutionary interpretation of speech and language.

Beyond cry and laugh: Toward a multilevel model of language production

doi:10.1017/S0140525X13003968

Marc H. Bornsteina and Gianluca Espositob,c
aChild and Family Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockledge I, Bethesda, MD 20892-7971, USA; bDepartment of Psychology and Cognitive Sciences, University of Trento, Trento, 38068, Italy; cDivision of Psychology, Nanyang Technological University, 639798, Singapore.
[email protected] [email protected]
http://www.cfr.nichd.nih.gov/index.html
http://polorovereto.unitn.it/∼esposito/

Abstract: Language production is a multilevel phenomenon, and human capacities to communicate vocally progress from early forms, based on projections of motor cortex to brainstem nuclei, to complex elaborations, mediated by high-order cognition and fostered by socially mediated feedback.

Primates appear to be motorically capable of speaking words insofar as they can articulate sounds and have (in some documented instances) actually articulated “words.” For example, rhesus monkeys produce different call types in association with ad hoc visual signals and even switch between call types associated with different signals (Hage & Nieder 2013). So, vocal tract morphology is not the only limitation that accounts for the inability of nonhuman primates to produce even simple verbal utterances.


Contemporary developmental theory and research – language production included – are rooted in systems dynamics of individual-context relations that guide the emergence of behavior and ontogenetic change. Development is associated with dynamic reciprocal relations among structures at multiple levels of organization. Language – in toto, comprehension and expression of phonology, morphology, semantics, syntactics, and pragmatics, at least – has such a multilevel organization, extending as it does from the anatomy of the vocal tract through brain-based motor effectors to interpersonal dynamics and on to cultural experience. By focusing on one level of analysis, Ackermann et al.’s hypothesis misses the essential multilevel and developmental nature of vocal production. Bidirectional influences operate across these multiple levels as biological and cognitive systems are nested within individuals, and individuals are nested within complex social and verbal environments. Accordingly, the developmental systems perspective leads away from a singular explanatory focus on organism or on context to how multiple forces, which span from biological pathways to macrolinguistic influences, collaborate in development.

Taking a cue from advances in developmental science, consider two levels above the Ackermann et al.’s focus on vocal tract morphology that may play vital roles in vocal/verbal production. Primates may be lacking in higher-order cognitive-linguistic operations that subserve communicative skills and in social interaction experiences that play key roles in speech development.

Ackermann et al. focus on ontogenetic speech production in interactions between basal ganglia at one end of the spectrum and their cortical targets at the other. Their main argument titularly focuses on the roots and limiting conditions of vocal/verbal production but seems crucially to omit from consideration comprehension, which almost by law ontogenetically and cognitively precedes production and therefore places a higher-order limitation on production. The case of human children acquiring language tells us that, outside cry, laugh, and mimicry, “context-restricted” and “context-free” expressions of verbal forms follow comprehension of those forms. Production hardly ever occurs without comprehension as a pre-requisite.

Comprehension qua cognition transcends genetic endowment. Ackermann et al. argue that, because primates lacking the (human) FOXP2 variant cannot even imitate simple speech-like utterances, and because the disruption of this gene in humans gives rise to severe articulatory deficits, it appears warranted to assume that the human variant of this gene locus is pre-requisite to the phylogenetic emergence of articulate speech. From a developmental viewpoint, however, it is well to recall that human babies who are also speechless presumably possess the FOXP2 gene. Like primates, older infants possess the requisite genetics and neuroanatomy; what they lack, like primates, are cognition and (see below) requisite experience. Here, multilevel development is uncoupled from neuroanatomy and pathology.

Ackermann et al. assert that vocalizations in nonhuman species reflect ontogenetic modifications of acoustic structure rooted in maturation. However, the restriction to maturation again acknowledges only one level of understanding speechlessness in nonhuman primates. Contemporary interactionist models posit that social factors shape human communicative development and early language learning. Communication begins as the product of bidirectional influences between infants and adults. When 9- to 10-month-old English-learning infants experienced a non-native language (Mandarin) through live interactions with adults, television, or audio-only presentations, only those infants who experienced the language through live interactions learned (Kuhl et al. 2003). Similarly, children learn novel verbs during either live interactions or socially contingent video training over video chat, but not during non-contingent video training (Roseberry et al. 2014). Human children’s caregivers provide feedback that is vital to infant learning. Furthermore, prospective longitudinal study shows that maternal responsiveness to infants predicts when children achieve various language milestones (Bornstein et al. 1999; Tamis-LeMonda et al. 1996). In human beings, experiences make a telling difference. Recall that 100% of meaningful vocalizations (the lexicon) are learned: Children growing up in Boston learn English-sounding vocal patterns, whereas children growing up in Paris learn French-sounding ones.

These assertions are further supported by understanding what happens when caregivers cannot adequately interpret a child’s vocalization and provide adequate feedback. An instructive example occurs when a parent interacts with a child who has a neurological deficit before the deficit is diagnosed, as in the case of children with autism for whom diagnoses are provided after 18–24 months of age. A core deficit of autism occurs in social communication. At least in a subgroup of infants with autism, early vocalizations are atypically produced (Esposito et al. 2013; Sheinkopf et al. 2012), making it challenging for caregivers to interpret (Venuti et al. 2012) and respond to their child in an effective way (Esposito & Venuti 2009).

In summary, Ackermann et al. point to anatomy and neurobiology as rate limiting factors on vocal/verbal production, when it is also the case that language cognition and interactional experience need to be added to neuroanatomical machinery. As language is a multilevel phenomenon, it is good to have one level of the multilevel system better understood, but all levels as well as their interconnectivity need to be analyzed and apprehended. The authors conclude that “birdsong can serve as an experimental model for the investigation of the neural control of human speech” (sect. 6), and this might be the case for the neural control level, but for levels of the complete system above , including syntactic and semantic aspects of verbal utterances, higher-order cognitions and linguistic experience are requisite. The ultimate goal of the effort here is purportedly to appreciate comprehensively the origins, capacities, and motives of human speech.

The stated aim of Ackermann et al. is to propose phylogenetic stages of speech acquisition which they root in “monosynaptic refinement of corticobulbar tracts and laryngeal elaboration of cortico-striatal motor circuits” (sect. 7, para. 1). This approach leaves untouched virtually all of the higher-order components of mental functioning and social language learning that collaborate in the end state of verbal production.

The evolution of coordinated vocalizations before language

doi:10.1017/S0140525X1300397X

Gregory A. Bryant
Department of Communication, Center for Behavior, Evolution, and Culture, University of California, Los Angeles (UCLA), Los Angeles, CA 90095-1563.
[email protected]
http://gabryant.bol.ucla.edu/

Abstract: Ackermann et al. briefly point out the potential significance of coordinated vocal behavior in the dual pathway model of acoustic communication. Rhythmically entrained and articulated pre-linguistic vocal activity in early hominins might have set the evolutionary stage for later refinements that manifest in modern humans as language-based conversational turn-taking, joint music-making, and other behaviors associated with prosociality.

Ackermann et al. present an excellent overview of the neurocognitive architecture underlying primate vocal production, including a proposal for the evolution of articulated speech in humans. Multiple sources of evidence support the dual pathway model of acoustic communication. The evolution of volitional control over vocalizations might critically involve adaptations for rhythmic entrainment (i.e., a coupling of independent oscillators that have some means of energy transfer between them). Entrained vocal and non-vocal behaviors afford a variety of modern abilities such


as turn-taking in conversation and coordinated music-making, in addition to refinements that lead to the production of speech sounds that interface with the language faculty.

Wilson and Wilson (2005) described an oscillator model of conversational turn-taking where syllable production entrainment allows for efficient interlocutor coordination with minimal gap and overlap in talk. The mechanisms underlying this ability might have been present in the hominin line well before language evolved, and could be closely tied to potential early functions of social signaling including rhythmic musical behavior and dance (Bryant 2013; Hagen & Bryant 2003; Hagen & Hammerstein 2009). Research on error correction mechanisms has revealed several design features of such entrainment mechanisms. Repp (2005) proposed distinct neural systems underlying different kinds of error correction in synchronous tapping. Phase-related adjustments involve dorsal processes controlling action, while ventral perception and planning processes underlie period correction adjustments.

Bispham (2006) and Phillips-Silver et al. (2010) have suggested that behavioral entrainment in humans involves the coupling of perception and action incorporating pre-existing elements of motor control and pulse perception. This coupling is plausibly linked to Ackermann et al.’s first phylogenetic stage including laryngeal elaboration and monosynaptic refinement of corticobulbar tracts. In order to implement proper error correction in improvised contexts of vocal synchrony, volitional control over articulators is necessary. While little comparative work has shown such an ability in nonhuman primates, there is some evidence suggesting control over vocal articulators in gelada baboons, with an ability to control, for example, vocal onset times relative to conspecific vocalizations (Richman 1976). And recently, Perlman et al. (2012) have found that Koko the gorilla exercises breath control in her deliberate play with wind instruments. Other evidence of this sort is certainly forthcoming, and will help us develop an accurate account of the evolutionary precursors to speech production in humans.

Laughter provides a window into the phylogeny of human vocal production as well. Laugh-like vocalizations first appeared prior to the last common ancestor (Davila-Ross et al. 2009), and in humans is likely derived from the breathing patterns exhibited during play activity (Provine 2000). Bryant and Aktipis (2014) found that perceptible proportions of inter-voicing intervals (IVIs) differed systematically between spontaneous and volitional human laughter, and altered versions of the laughs were differentially perceived as being human made, and related to the IVI measures. Specifically, slowed spontaneous laughs were indistinguishable from nonhuman animal calls, while slowed volitional laughs were recognizable as being human produced. These data were interpreted as being evidence for perceptual sensitivity to vocalizations originating from different production machinery – a finding consistent with the dual pathway model presented here by Ackermann et al.

Interestingly, laughter seems to play a role in coordinating conversational timing. Manson et al. (2013) have reported that convergence in speech rate was positively associated with how much interlocutors engaged in co-laughter. While the degree of convergence over a 10-minute conversation predicted cooperative play in an unannounced Prisoner’s Dilemma game, the amount of co-laughter did not. The relationship between laughter and speech is not well understood, though evidence suggests that it is integrated to some extent. The placement of laughter in the speech stream follows some linguistic patterns (i.e., a punctuation effect) (Provine 1993), but also manifests itself embedded within words and sentences as well (Bryant 2012). Co-laughter might serve in some capacity to help conversationalists coordinate their talk, and, in early humans, perhaps coordinate other kinds of vocal behavior. Recent work has demonstrated that people can detect in very short co-laughter segments (<2 seconds) whether the co-laughers are acquainted or not (Bryant 2012) suggesting a possible chorusing function.

A surge of recent work is showing how interpersonal synchrony involving entrainment results in cooperative interactions (e.g., Kirschner & Tomasello 2010; Manson et al. 2013; Wiltermuth & Heath 2009), and the effect seems immune to the negative consequences of explicit recognition. That is, when behavior matching is noticed, but does not involve fine temporal coordination, interactants do not respond positively (e.g., Bailenson et al. 2008). Manson et al. (2013) described interpersonal synchrony as a coordination game that does not afford cheating opportunities, unlike mimicry and other behavior matching phenomena where deceptive, manipulative strategies are potentially profitable. Coordinating vocal (and other) behavior provides a means for individuals to assess the fit of others as cooperating partners. Given the extreme cooperative nature of humans relative to other species, mechanisms for such assessment are not surprising, and in fact should be expected.

Taken together, the findings described above point to an important component of human vocal communication that involves the independent and integrated action of emotional vocal production and speech production systems. Selection for articulatory control mechanisms underlying the entrainment of vocal behavior for within- and between-group communicative functions could have set the stage for conversational turn-taking – an ability that incorporated speech. Dual pathway models of acoustic communication should more seriously consider the neurocognitive underpinnings of vocal entrainment abilities and consider these adaptations in the phylogenetic history of human vocal behavior.

Environments organize the verbal brain

doi:10.1017/S0140525X13003981

A. Charles Catania
Department of Psychology, University of Maryland, Baltimore County (UMBC), Baltimore, MD 21250.
[email protected]

Abstract: FOXP2 expression in the evolution of language derives from its role in allowing vocal articulation that is sensitive to its consequences. The discrete verbal discourse it allows must have evolved recently relative to affective features of vocal behavior such as tone of voice. Because all organ systems must have evolved in the service of behavior, attention is given to ways in which environments may have driven brain organization.

Ackermann et al.’s plausible account of how brain evolution may have led to language would be even more persuasive if it also dealt with how evolutionary environments might have driven brain changes that engendered language. The survival and reproduction of organisms within populations depends on their behavior, so I start from the position that the brain, like all organ systems, evolved in the service of behavior (Catania 2008). For example, brain size may have driven articulatory control, but environments where that articulatory control made a difference must also have driven brain size. Elsewhere I address in more detail these and related issues, including interpretations of learning in terms of selection rather than associations and the distinction between language structure and function (e.g., Catania 1990; 2013a; Catania & Cerutti 1986).

The functional distinction between affective language, as in tone of voice, and substantive language, as in vocal discourse, is illustrated by an account of the different reactions of two audiences to a speech by Ronald Reagan (Sacks 1985). Psychotics without affect responded only to the speech content, whereas aphasics responded only to its affect; only those responsive to both dimensions found the speech persuasive. The affective and the discursive systems involve the same vocal apparatus, so they necessarily evolved in coordination, but Sacks’s example demonstrates their separate functionalities. If affective vocal functions are similar to those of other displays usually characterized as emotional, then other functions must have driven evolution of the discursive system: evolution rarely duplicates existing functions.


Emotional behavior substantially predates language, with discursive functions presumably overlaid upon it later. Contented or angry or lustful gorillas do not need new ways to express their emotions.

My candidate for the minimal function of discursive verbal behavior from which all other functions are derived? It is a highly efficient way in which one human can get another to do something (Catania 1995; 2003; 2009). The imperative does not require multiple-word utterances or grammar. Even if nothing else is available, a single utterance functionally equivalent to the command Stop! will benefit the members of any hominid group that creates it. Other functions (e.g., communication, narrative) are derivatives of this fundamental one. For example, prestige matters only when some individuals become more important in telling others what to do; cooperation can sometimes be more effectively induced through verbal instructions than by other means. Telling others what to do leads to telling them what to say, and giving information provides expanded ways to tell others how to do things.

The long period during which our ancestors made tools and mastered fire suggests that some form of hominid language has existed for perhaps millions of years, whereas archaeological findings coupled with inferences from linguistic change and human migration imply a source perhaps as recent as 40 to 50 thousand years ago. The longer time makes sense if we include the evolution of affective and single-utterance precursors with simple imperative functions, accompanied by more sophisticated articulations. Significant anatomical developments included bipedal locomotion, freeing respiration from constraints on the rib cage, and elaborations of vocal signals such as laughter (Provine 2000; 2012). But defining language solely in terms of syntactically organized multi-word utterances targets the more recent provenance. The step from single- to multiword utterances with different words having different functions allows for an explosion of language diversity.

Coherent accounts of language evolution must include three concurrent levels of selection (Catania 2001), each entailing different mechanisms by which environments select surviving variants. First, phylogenetic (Darwinian) contingencies must select requisite physiological attributes (e.g., vocal tract structure, neural organization). Second, ontogenetic contingencies (selection of behavior within individual lifetimes) must maintain those language features acquired by individuals, as when native but not non-native speech sounds survive in a child’s developing vocalizations. Third, cultural or memetic selection (selection of behavior as it passes among individuals) must perpetuate languages across generations as communities pass them on from one to another.

Ackermann et al. seek an account for the co-evolution of articulatory and perceptual skills. Yet if distributions of both skill levels exist within a population, those at the upper ranges of either skill will be selected. As long as selection operates relative to each population mean, the distributions will change together, just as but more benignly than in the arms races of predators and their prey. For example, mothers with superior acuity along some auditory dimension will sometimes bear offspring with superior differentiation along some articulatory dimension; they and their offspring will both be selected, just as more successful predators are selected as their predation selects prey more successful at escape.

Ackermann et al. cite many relevant studies but provide no taxonomy of relevant processes. We read of reinforcement, goal-directed behavior, instrumental conditioning, learning responses to food rewards, acquisition of stimulus-driven behavioral routines, habit formation, training, and even motor tricks. But these are simply alternative vocabularies for labeling behavior changes that occur because behavior is modified by its consequences (Catania 2013a; Schneider 2012). Consequences are as much involved in stimulus-driven behavioral routines, where responses produce different consequences given different stimuli, as in refinement of skills, where differential consequences shape behavior. Existing taxonomies of behavioral processes could put much of this in good order. They are built not upon associations or conditioning but rather upon the selection of behavior by its consequences (e.g., Catania 2013a; 2013b; Madden 2012; Verplanck 2000). The structure of behavior provides crucial clues for what to look for in the brain.

Nevertheless, Ackermann et al. have made a strong case that FOXP2’s expression is prerequisite for the complex vocal articulations of human language. Timing is critical, so these coordinations involve not just tongue, lips, and larynx, but also diaphragm and rib cage. FOXP2’s expression may work by allowing motor patterns that are otherwise highly constrained by anatomies and stimuli to be modified by their consequences. Skinner apparently got it right when he wrote: “The human species took a crucial step forward when its vocal musculature came under operant control in the production of speech sounds. Indeed, it is possible that all the distinctive achievements of the species can be traced to that one genetic change” (Skinner 1986, p. 117). In nonverbal organisms, reinforcers can alter only the rate of vocalizations (e.g., cheeps of chicks; Lane 1961); in children, however, their form is sculpted even by such subtle differential consequences as producing sounds more or less resembling those of caregivers (Risley 1977; Vihman 1996). This is as it should be because, as Ackermann et al. so effectively point out, our vocal articulations are perhaps the most sophisticated of human achievements.

Evolution of affective and linguistic disambiguation under social eavesdropping pressures

doi:10.1017/S0140525X13003993

Kevin B. Clark (a, b)
(a) Research and Development Service, Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, CA 90073; (b) Complex Biological Systems Alliance, North Andover, MA 01845.
[email protected] [email protected]
www.linkedin.com/pub/kevin-clark/58/67/19a

Abstract: Contradicting new dual-pathway models of language evolution, cortico-striatal-thalamic circuitry disambiguate uncertainties in affective prosody and propositional linguistic content of language production and comprehension, predictably setting limits on useful complexity of articulate phonic and/or signed speech. Such limits likely evolved to ensure public information is discriminated by intended communicants and safeguarded against the ecological pressures of social eavesdropping within and across phylogenetic boundaries.

The basal ganglia contribute to acquisition, planning, initiation, and execution of vocal and gestural communication skills in primates, birds, and other animals. Consistent with dual-pathway models of language evolution, Ackermann et al. in the target article now speculate the basal ganglia also integrate and modulate (continuous or analog) affective prosody of vocalizations and gesticulations with little to no influence over (discrete or digital) propositional linguistic content of human phonetic and, presumably, signed speech. The authors cite comparative clinical and basic research findings to support their claim that high-level linguistic processing only occurs in phylogenetically newer brain systems, while omitting the recent small, but credible, neuroimaging literature which contradicts this assertion and implicates human cortico-striatal-thalamic circuitry in disambiguating lexical (Chenery et al. 2008; Copeland 2003), grammatical (Mestres-Missé et al. 2012), and semantic (Ketteler et al. 2008; Marques et al. 2009; Wittforth et al. 2010) uncertainties in perceived language. Failure to assimilate roles of the basal ganglia in both language production and comprehension seriously


weakens the conceptual validity and power of Ackermann et al.’s treatise on selective fitness of advancing animal taxa to evolve increasingly sophisticated dual-pathway communication systems for affective and propositional information exchange.

Evolutionarily older functions of cortico-striatal-thalamic loops to generate and filter variances in affective prosody of non- and/or protolinguistic species-typical/atypical communications, as advocated by Ackermann et al., seem to have eventually and adaptively converged to help perform similar operations on propositional linguistic content, as evidenced in later human language use. Such (lateralized) developments in cortico-striatal-thalamic processing necessarily first enabled language-deficient nonhuman animals to better articulate innate and/or learned primitive communications (e.g., recombinant hierarchical call or song sequences with precise, intricate spectral patterns) and, therefore, to more successfully transmit meanings or labels of both continuously and discretely structured information for receiver understanding (Arnold & Zuberbühler 2006; Berwick et al. 2011; Bolhuis et al. 2010; Doupe et al. 2005; Ouattara et al. 2009; Zuberbühler 2000a; Zuberbühler et al. 1999). Despite lack of direct empirical proof, one can further safely reason that homologous or analogous neuromechanisms for disambiguating communication content arose from ecological forces that continue to drive changes in production, comprehension, and privatization of public vocal and gestural communications ancestral to and descendent from early hominin language innovations.

Capacities of cortico-striatal-thalamic pathways to regulate variability in communication production and comprehension likely coevolved with animal abilities to encrypt and decrypt sensitive public information at risk of corruption or interception from social eavesdroppers. Evolution conserved social eavesdropping across phylogeny, whereby unintended observers breach information security of communicating parties in attempts to gain survival and/or reproductive advantages (Clark 2010; 2013a; 2013b; in press; Dabelsteen 2004; Dall 2005; Danchin et al. 2004; Joint 2006; Peake & McGregor 2004; Seyfarth & Cheney 2010; Stowe et al. 1995). Cortico-striatal-thalamic circuitry, via involvement in automatic and/or volitional processing of affective and propositional content variability, predictably sets limits on useful complexity of naturally communicated information. These constraints determine probabilities that public exchanges may be discriminated by intended observers and safeguarded against social eavesdroppers. When communication complexity processed by phylogenetically or culturally distant unintended observers far subtends upper complexity limits for information processed over superior disambiguation neuromechanisms of intended observers, information content of public messages and replies will remain protected from eavesdropping. Complexity scaling of communication production and comprehension extends along the continuum of signals to protolanguage to language and figures to be an essential evolutionary strategy to secure communications within and across taxonomic boundaries.

One may begin to appreciate evolved neurobiological barriers to social eavesdropping by enlisting examples of dual-pathway systems for birdsong and human speech given by Ackermann et al. The cortico-striatal-thalamic circuitry of birds and humans effect complexity scaling through two broad, related domains of complexity – combinatorial and computational complexity – each having particular significance for communication production and comprehension as well as for other aspects of cognition (Clark 2012). Classical combinatorial complexity differentiates levels of comparative language hierarchies and communication repertoires (Changizi 2001; Chomsky 1956; 1966; McNaughton & Papert 1971), where complexity is proportional to number of discrete information elements, length of composite information sequences, and structure of recursive information patterns. Useful complexity under these conditions is defined by strictly ordered inclusive sets of information capable of being both generated and recognized with certain classical computational models, machines, or grammar rules emulating properties of cortico-striatal-thalamic loops. Three fundamental features of all computational complexity classes may be varied – computational resources (e.g., time, space), problem type to be solved (e.g., optimization or decision problem, language production and comprehension), and computational model to be employed (e.g., deterministic Turing Machine, probabilistic Turing Machine, quantum computer) (Clark 2012). Disparities in classical communication complexities between birds and humans reveal dissociations for each computational feature and, consequently, for communication disambiguation involving affective prosodic or propositional information content (Berwick et al. 2011). As disparities narrow and computational features progressively overlap, threats of eavesdropping on public information should escalate for superior communicants, in this case humans.

More instructive scenarios, and ones that help identify flaws in purely classical complexity approaches toward language evolution, concern competing, closely related animals, such as bird or primate subspecies, with very similar communication complexities. Pressures of social eavesdropping rise when quality and/or quantity of niche resources dwindle and acquired public information facilitates selection and acquisition of preferred life necessities shared by conspecifics. Subspecies communication adaptations, including genetically and/or culturally acquired vocal dialects and behavioral modifications (Dabelsteen 2004; Danchin et al. 2004) processed via cortico-striatal-thalamic pathways, increase degrees of freedom for classical information computation, further privatizing public information readily comprehended by conspecifics. However, when disambiguation demands for processing linguistic variations superposed (or nearing maximal entanglement) with affective prosodic variations grow exponentially with information input size, privatization becomes governed by quantum computational models involving the entropic uncertainty principle for indistinguishable communications content (Clark 2012; in press; Nielsen & Chuang 2000). This principle imposes thresholds above which eavesdroppers with inferior, missing, or over-allocated communication disambiguation neuromechanisms cannot definitely and simultaneously decrypt partite affective and linguistic content of public information. However, intended communicants may violate the principle by enhancing public information security through privy subspecies-specific communication and memory specializations (cf. Bennett et al. 1993; Berta et al. 2010).

Physical mechanisms may be as important as brain mechanisms in evolution of speech

doi:10.1017/S0140525X13004007

Bart de Boer (a) and Marcus Perlman (b)
(a) Artificial Intelligence Lab, Vrije Universiteit Brussel, 1050 Brussels, Belgium; (b) Department of Cognitive and Information Sciences, University of California–Merced, Merced, CA 95343.
[email protected] [email protected]
http://ai.vub.ac.be/members/bart

Abstract: We present two arguments why physical adaptations for vocalization may be as important as neural adaptations. First, fine control over vocalization is not easy for physical reasons, and modern humans may be exceptional. Second, we present an example of a gorilla that shows rudimentary voluntary control over vocalization, indicating that some neural control is already shared with great apes.

Ackermann et al. propose a model of the evolution of neural adaptations related to the production of spoken language. Although we are convinced of the importance of such adaptations, and although the authors themselves state that “the model outlined here addresses only one out of several building blocks” (sect. 7, para. 2), we would nevertheless like to make two reflections on


their article. Our first reflection is on the assumption that independent control over the vocal folds and the upper vocal tract is somehow a given, and our second reflection is on the ability of apes to control vocalization voluntarily.

Following Fitch (2000a), the authors assume that animal vocalizations have a source and a filter. They also appear to assume that source and filter are independent as they are in modern humans, which is not necessarily the case. In many instances, the behavior of the source is in fact strongly coupled to that of the filter (e.g., in woodwind instruments). Source-filter theory was originally formulated in the context of human speech (Fant 1960). However, the fact that independence of source and filter is a good approximation for human speech does not mean it is universally valid. Fletcher (1993) has investigated the theory of vibrating valves and found that the independence of source and filter depends on the precise shape and configuration of the source. In addition, it depends on the ratio of resonance frequencies of the source and the filter. Titze (2008) has adapted the theory to human-like vocal folds, and found that if the frequency at which the vocal folds vibrate is near the resonance frequencies of the vocal tract, strong coupling can occur. Apparently, modern human vocal folds and vocal tracts avoid strong coupling, but it is an open question whether this was the case in our evolutionary ancestors.

The little that we do know about ape vocal anatomy appears to argue against independence of source and filter. One instance of this is the large air sacs present in all great apes (Hewitt et al. 2002), which lower the resonance frequency of the upper vocal tract considerably (de Boer 2008) and would therefore increase coupling (as found in model experiments by Riede et al. 2008). In addition, chimpanzee vocal folds (the only ones about which we have anatomical data) have so-called vocal lips (Demolin & Delvaux 2006; Kelemen 1969), and thus a very different shape from human vocal folds. Although we do not know the function of these vocal lips, this difference between two closely related species underscores the point that we should not just assume similar behavior of their vocalization systems.

In systems where source and filter cannot behave independently, the set of signals that can be produced is necessarily more limited. This consequence is demonstrated in a modeling study showing that when source and filter are closely coupled, vocalization may be more chaotic, and thus it may be more difficult to time the onset of vocalization precisely (de Boer 2012). Given these observations, it may not just be a lack of neural control that makes precise vocalizations difficult for nonhuman primates. It may also be that the anatomy of their vocal folds and their vocal tracts makes it much harder as well.

Our second point of commentary is to note evidence of at least one case in which a nonhuman primate appears to have some voluntary control over her larynx in the performance of learned, species-atypical vocalizations. Koko, a human-reared, female gorilla (Patterson & Linden 1981), has been video-recorded performing numerous instances from a repertoire of play behaviors involving voluntary control over her larynx and supralaryngeal vocal tract in coordination with various gestures and action routines (Perlman et al. 2011). This repertoire includes the production of breathy-voiced sounds and glottal stops in situations that are determined by the particular play routine.

Perlman et al. (2011) describe how Koko exhibits vocal control in her play behavior of “talking” into telephones, when she often directs breathy grunt-like vocalizations into the receiver, which she holds to her mouth (voicing was observed in 42 of 68 exhalations over 11 bouts). That she exercises voluntary control over her larynx in these vocalizations is suggested by the contrast of this behavior to her routine of huffing on the lenses of eyeglasses as if to clean them. As in the real human performance of cleaning eyeglasses, Koko produces, in this case, open-mouthed audible huffs that are distinctly and without exception voiceless (as exhibited in 12 video-recorded bouts involving 25 exhalations). Another dimension of vocal control is demonstrated in her voluntary performance of a mock “cough,” which involves a glottal stop, often in coordination with a gesture in which she covers her mouth with an open hand. In several instances, she produces this behavior on command, demonstrating clear voluntary control over the closure of her glottis.

These behaviors appear to be examples of voluntary control over laryngeal motor activity outside of a species-typical audiovisual display, something that Ackermann et al. say has not been attested yet in great apes. Apparently we should not discount the possibility that apes – and by implication our last common ancestor – have more (rudimentary) abilities to control vocalization voluntarily than is often assumed.

Given that (1) control over vocalization is not just limited by neural factors, but also by purely anatomical and physiological ones, and that (2) a gorilla has been shown to have some rudimentary voluntary control over vocalization, we conclude that in the evolution of speech, anatomical and physiological adaptations to the vocal folds and the vocal tract may have been as important as neural adaptations of their control.

Very young infants’ responses to human and nonhuman primate vocalizations

doi:10.1017/S0140525X13004019

Brock Ferguson, Danielle R. Perszyk, and Sandra R. Waxman
Psychology Department, Northwestern University, Evanston, IL 60208.
[email protected]
[email protected]
[email protected]
http://www.psychology.northwestern.edu/people/faculty/core/profiles/sandra-waxman.html

Abstract: Recent evidence from very young human infants’ responses to human and nonhuman primate vocalizations offers new insights – and brings new questions – to the forefront for those who seek to integrate primate-general and human-specific mechanisms of acoustic communication with theories of language acquisition.

In their target article, Ackermann et al. contribute to a longstanding debate concerning the extent to which the uniquely human propensity for language is the product of species-unique cognitive mechanisms (e.g., Hauser et al. 2002; Penn et al. 2008; Pinker & Bloom 1990). Their comprehensive analysis of neurological and behavioral evidence strengthens the proposal for evolutionary continuity in the mechanisms underlying acoustic communication in human and nonhuman primates. Our goal in this commentary is to amplify their proposal by highlighting recent behavioral evidence from human infants between 3 and 6 months of age. This evidence, which documents how infants respond to vocalizations of humans and nonhuman primates, bears on Ackermann et al.’s formidable challenge to consider the evidence of evolved acoustic communication architecture within the broader faculties of human language.

Recent studies have documented that even in infants too young to speak, listening to human speech supports core cognitive processes, including the formation of object categories (Ferry et al. 2010; Fulkerson & Waxman 2007; Waxman & Gelman 2009). Perhaps more surprisingly, this precocious link between human language and cognition is initially broad enough to include the vocalizations of nonhuman primates. For 3- and 4-month-olds, nonhuman primate vocalizations (from a blue-eyed Madagascar lemur) also promote object categorization, mirroring exactly the effects of human speech. However, by 6 months, lemur vocalizations no longer have this language-like effect: Instead, the link to categorization is tuned specifically to human language (Ferry et al. 2013). These findings reveal that a link between language and object categories, evident as early as 3 months in human infants,

derives from a broader template that initially encompasses vocalizations of human and nonhuman primates, and is rapidly tuned specifically to human vocalizations (see also Vouloumanos et al. 2010).

This striking ontogenetic evidence has strong implications for theories of language acquisition. It also offers insights into Ackermann et al.’s proposal for integrating primate-general and human-specific mechanisms of acoustic communication. We focus here on three. First, the evidence from human infants is consistent with Ackermann et al.’s proposal that, broadly speaking, the faculties that give rise to human language may be related to those predating Homo sapiens (see also Fitch 2011; Stoeger et al. 2012). What remains to be seen is how precisely the relations between homologous neural structures can be specified. For example, one promising investigation might be to ascertain whether infants’ responses to human and nonhuman primate vocalizations engage the neural mechanisms described in the target article.

Second, the evidence from human infants converges with Ackermann et al.’s claim that human language acquisition may be built upon mechanisms that are specialized for acoustic communication. One must, however, consider the necessity of these acoustically-based mechanisms in human language acquisition. Although most humans acquire language in the aural-oral modality, our linguistic capacities are distinctly amodal. The signature of human language is not its perceptual form, but rather its ability to enable its users to express an infinite number of ideas using a discrete number of meaningful elements (Chomsky 1965). Thus, a complete account of the evolution of human language will be one that considers not only the acoustic-spoken modality but also the visual-manual modality in which deaf infants naturally acquire language. One question is whether, given the evidence for evolved neural hardware underpinning acoustic communication, infants acquiring spoken language might have some advantage. Evidence from infants acquiring sign language casts doubt on this possibility (e.g., Goldin-Meadow & Mylander 1983; Newport & Meier 1985; Petitto & Marentette 1991). More recent evidence from our lab underscores infants’ flexibility in identifying language-like signals beyond human speech. If a novel signal (consisting of pure sine-wave tone sequences) is embedded within a social communicative exchange, infants endow the signal with communicative status and its effects mirror those of human speech in a subsequent categorization task (Ferguson & Waxman 2013).

Finally, evidence from infants can mutually constrain and inform developing theories of language evolution, acquisition, and usage. For example, we have recently discovered that unlike nonhuman primate vocalizations, zebra finch birdsong does not promote object categorization in human infants at any age (Perszyk & Waxman 2013). This outcome is consistent with claims that, although birdsong shares some structural features with human language, it lacks the links to meaning that characterize human language and, to a much lesser extent, nonhuman primate vocalizations (e.g., Berwick et al. 2013).

Ackermann et al.’s target article invites researchers across disciplines to engage in the larger enterprise of uncovering the origins of human language. Within this enterprise, the biggest leaps will be made by those who integrate seemingly disparate neurological, behavioral, and developmental evidence to unearth the evolutionary continuities and discontinuities in both modality-specific (e.g., vocalizations) and modality-independent capacities that provide humans alone with the capacity to acquire language.

ACKNOWLEDGMENTS
Portions of this research were supported by a SSHRC Doctoral Fellowship to Brock Ferguson, an NSF Graduate Research Fellowship to Danielle R. Perszyk, and a National Science Foundation grant to Sandra R. Waxman (BCS-0950376).

Functional neuroimaging of human vocalizations and affective speech

doi:10.1017/S0140525X13004020

Sascha Frühholz,ᵃ,ᵇ David Sander,ᵃ,ᵇ and Didier Grandjeanᵃ,ᵇ
ᵃSwiss Center for Affective Sciences, University of Geneva, 1211 Geneva, Switzerland; ᵇDepartment of Psychology, University of Geneva, 1205 Geneva, Switzerland.
[email protected] [email protected] [email protected]
http://www.affective-sciences.org/user/286
http://cms.unige.ch/fapse/EmotionLab/Home.html
http://cms.unige.ch/fapse/neuroemo/

Abstract: Neuroimaging studies have verified the important integrative role of the basal ganglia during affective vocalizations. They, however, also point to additional regions supporting vocal monitoring, auditory–motor feedback processing, and online adjustments of vocal motor responses. For the case of affective vocalizations, we suggest partly extending the model to fully consider the link between primate-general and human-specific neural components.

Ackermann et al. provide a remarkable neural model of human vocalizations linking affective and motor brain systems underlying vocal communication. Recent neuroimaging studies on human affective vocalizations provide additional insights on this close link between the affective and motor component. Although human communication is mostly non-affective, the case of affective expressions provides an ideal paradigm to test the validity of the affective-motor model of human communication proposed by Ackermann et al.

Recent neuroimaging studies have specified the neural mechanisms underlying affective vocalizations (Aziz-Zadeh et al. 2010; Laukka et al. 2011; Wattendorf et al. 2013). These studies confirm the central role of the basal ganglia (BG) in vocalizations (Laukka et al. 2011; Pichon & Kell 2013), as proposed by Ackermann et al., and show the close connection between the ventromedial and dorsolateral striatum during emotional speech (Pichon & Kell 2013). They also support the notion of a close connection of the BG to the cortico-subcortical vocalization network (Laukka et al. 2011; Pichon & Kell 2013) as well as to the limbic system, which adds the emotional component of speech (Laukka et al. 2011; Péron et al. 2013; Wattendorf et al. 2013).

Although these studies support several of the main assumptions by Ackermann et al., they, first, also provide conflicting evidence for the suggested roles of some brain regions, and, second, suggest additional areas to be included in the neural network of vocalizations. Concerning the first point, Ackermann et al. propose, for example, that the anterior cingulate cortex (ACC) has no central role for prosodic vocal modulations, and that the inferior frontal cortex (IFC) is only involved in speech output behavior. Recent studies, however, indicate that the ACC plays a central role in the regulation of vocal behavior (Wattendorf et al. 2013), probably supporting the interaction between cognitive, physiological, and emotional-motivational states (Laukka et al. 2011) and serving as an auditory–motor interface between the perception and production of vocalizations (Aziz-Zadeh et al. 2010); see our Figure 1. Furthermore, the portion of the inferior frontal cortex (IFC) that lies rostral to the premotor cortex and Broca’s area seems also to be involved in processing vocalizations, especially in the recognition and the generation of emotional intonated speech (Aziz-Zadeh et al. 2010; Frühholz & Grandjean 2013). Similar to the ACC, the IFC might thus act as an auditory–motor interface linking the perception and the production of emotional speech. This interface seems critical, because auditory–motor feedback loops are important for online adjustments of vocal behavior based on the forward and backward mapping of performance predictions (Rauschecker & Scott 2009). This is closely related to the second point.


Figure 1 (Frühholz et al.). Suggested extension (black regions and arrows) of Ackermann et al.’s original model (gray regions) beyond the affective (i.e., amygdala) and motor systems. Based on the paradigm of affective vocalizations and emotional speech, we suggest adding the AC and anterior IFC (aIFC), which serve auditory–motor feedback processing and vocal monitoring; the CbII, which serves online micro and macro adjustments of vocal motor output; and the ACC, which appears to be directly involved in controlling vocal output and physiological responses.

Recent neuroimaging evidence also points to two brain structures active during human vocalizations, which are not yet (explicitly) included in the model. As mentioned above, vocalizations strongly depend on auditory feedback for online adjustments and corrections. Accordingly, studies consistently report activity in low- and high-level regions of the auditory cortex (AC) (Aziz-Zadeh et al. 2010; Pichon & Kell 2013), and in the cerebellum (Laukka et al. 2011; Pichon & Kell 2013; Wattendorf et al. 2013). While the AC together with the IFC is thought to serve auditory feedback processing and vocal monitoring, the cerebellum mainly supports online macro- (Pichon & Kell 2013) and micro-adjustments (Wattendorf et al. 2013) of vocal motor behavior.

Concerning the AC feedback-related activity, the online validation of the vocal performance seems critical for vocal expressions. Affective vocalizations for successful social communication depend on a proper vocal production, especially in terms of temporo-dynamic features (Patel et al. 2011). The temporal slow prosodic modulations of emotional speech, in particular, seem to rely on feedback processing in the AC (Aziz-Zadeh et al. 2010; Pichon & Kell 2013). A major part of the slow prosodic modulations is determined by temporal variations of the fundamental frequency, which mainly contribute to the perception of pitch variations. These perceived temporal pitch variations of one’s own vocalizations considerably activate the AC, and, surprisingly, also the cerebellum (Pichon & Kell 2013).

Although the cerebellum was a core element in a former model proposed by Ackermann (2008), in the present article Ackermann et al. note that it is not relevant here. However, given the above-mentioned evidence that the cerebellum is related to slow temporal modulations in affective speech (Pichon & Kell 2013), and given the general observation that non-speech (primate-general) and speech-based affective vocalizations (human-specific) considerably activate the cerebellum (Laukka et al. 2011; Wattendorf et al. 2013), we propose that the cerebellum should be an integral part of a neural model of vocal communication. It seems that for emotional vocalizations, the cerebellum supports the online micro adjustment of ongoing motor responses (Wattendorf et al. 2013) and provides a macro temporal event structure (Kotz & Schwartze 2010) for the temporal dynamics embedded in emotional speech. Both are important ingredients for valid affective vocalizations in terms of vocal motor responses (Patel et al. 2011). Overall, from the perspective of affective vocalizations and emotional speech, neuroimaging evidence supports the neural model of Ackermann et al., but also suggests that the model might be extended to include auditory–motor feedback loops and online adjustment of vocal behavior (Fig. 1). The paradigm of human affective vocalizations thus might be a valid example for a cross-validation of the model proposed by Ackermann et al., because affective vocalizations are an essential ingredient of human communication.

ACKNOWLEDGMENTS
Sascha Frühholz and Didier Grandjean were supported by grants from the Swiss National Science Foundation (105314_146559/1 and 105314_124572/1) and the NCCR in Affective Sciences (51NF40-104897).

Functions of the cortico-basal ganglia circuits for spoken language may extend beyond emotional-affective modulation in adults

doi:10.1017/S0140525X13004032

Takashi Hanakawaᵃ,ᵇ and Chihiro Hosodaᵃ,ᵇ
ᵃDepartment of Advanced Neuroimaging, Integrative Brain Imaging Center, National Center of Neurology and Psychiatry, Kodaira 187-8551, Japan; ᵇPRESTO, Japan Science and Technology Agency, Kawaguchi, Saitama 332-0012, Japan.
[email protected]
http://researchmap.jp/takashihanakawa/
[email protected]
http://researchmap.jp/chihiro/

Abstract: We support Ackermann et al.’s proposal that the cortico-basal ganglia circuits may play essential roles in the evolution of spoken language. Here we discuss further evidence indicating that the cortico-basal ganglia circuits may contribute to various aspects of spoken language including planning, learning, and controlling of speech in adulthood.

Ackermann et al. have proposed a two-stage neural control model underlying phylogenetic and ontogenetic evolutions of spoken language. Neural machinery at one stage depends upon the development of monosynaptic projections from the motor cortex to cranial nerve nuclei in the brainstem and the other one involves functions of the cortico-basal ganglia circuits. We appreciate this proposal because we have been interested in the contribution of the cortico-basal ganglia circuits to language and associated abilities in humans. Here we want to extend the authors’ view, by arguing for potential roles of the cortico-basal ganglia circuits in various aspects of spoken language in adults.

Accumulating evidence indicates that the basal ganglia participate in speech control in humans. However, the roles of the basal ganglia in language control are still unclear. The functionality of the basal ganglia for spoken language perhaps extends beyond the modulation of laryngeal and orofacial movements. We previously showed basal ganglia activity during a cognitive task involving verbal motor imagery, or “inner speech,” in healthy adults (Hanakawa et al. 2002). This basal ganglia activity was accompanied by activity in other speech-related brain regions such as supplementary motor area and frontal opercular regions. Moreover, we reported that performance of this verbal imagery task was impaired in patients with basal ganglia dysfunctions (Parkinson’s disease) in comparison with matched control participants (Sawamoto et al. 2002). A neuroimaging experiment


supported that the impaired performance of the verbal imagery task in Parkinson’s disease was associated with dysfunctions of the basal ganglia, the caudate nucleus in particular (Sawamoto et al. 2007). Considering that motor imagery is closely related to motor planning (Hanakawa et al. 2008), the contribution of the basal ganglia to spoken language likely involves a planning stage of speech.

Of even more importance is to understand the contribution of the basal ganglia to learning of spoken language. Ackermann et al. propose a fundamentally different role of the basal ganglia at ontogenetic stages: acquisition of articulatory motor patterns during childhood versus emotive-prosodic modulation of verbal utterances during adulthood. We want to modify and extend this view, especially with regard to the contrast between childhood and adulthood stages. The neural underpinnings for the native language development are difficult to study experimentally. Therefore, we want to argue for the role of the basal ganglia in speech acquisition in adults, taking the case of second language (L2) learning as an example.

We recently conducted a cohort study in which Japanese university students were enrolled in a 16-week e-learning program to develop their English vocabulary (Hosoda et al. 2013). Although the training program involved various aspects of vocabulary learning, an emphasis was placed upon the training of pronunciation. The students learned 60 words or idioms in each week. An example sentence for each word and idiom was also presented. The participants were encouraged to dictate each word, idiom, and sentence 10 times in reference to “speech templates” provided by the program. By repeating after the speech templates, the participants were to compare their own utterances and the speech templates, and then try to make corrections to their motor programs for pronunciation. Speculatively, this auditory feedback learning should help the trainees achieve adequate spatio-temporal control of laryngeal and orofacial musculature. After 16 weeks, the trainees showed approximately 30% improvement in a test battery of English competence. We performed multidimensional imaging assessment for neuroplastic changes associated with the training. Most notably, probabilistic diffusion tractography demonstrated that connectivity between the inferior frontal gyrus and the caudate nucleus, an input station of the basal ganglia, was enhanced in correlation with the improvement in the trainees’ L2 competence. This study has provided the first evidence that the cortico-basal ganglia circuits are involved in language learning in adults. Furthermore, the learning-induced enhancement of the cortico-basal ganglia connectivity was accompanied by enhanced connectivity between the inferior frontal gyrus and superior temporal/supramarginal gyrus (dorsal pathway), but not between the inferior frontal gyrus and middle temporal gyrus (ventral pathway). The dorsal pathway primarily concerns phonological aspects of language control. Hence, the selective involvement of the dorsal pathway indicated that our training program primarily tapped into phonological aspects of L2 vocabulary.

According to the findings in our study (Hosoda et al. 2013), we suggest a possibility that the basal ganglia may contribute to learning of spoken language even in adults. Speech is acquired through experiences of adequate auditory inputs, which is evident in children with hearing loss (Tye-Murray et al. 1995). In addition, we suspect that reinforcement-type learning (Demirezen 1988) subserved by functions of the cortico-basal ganglia circuits may underlie experience-based shaping of spoken language. To improve speech control, it is reasonable for both child and adult learners to rely on information about the success or failure of their speech production. We speculate that the feedback information could be self-generated in adult learners who are enrolled in an e-learning program or be given by family and community members as praise or approval to children. Feedback information indicating successful speech production can be utilized as a positive reinforcer to strengthen the neural circuits a trainee had just activated. The striatum that receives both contextual information from the cortex and reward signals from dopaminergic neurons occupies the best position for reinforcement learning or reward-based temporal difference learning (Doya 2008). It would be extremely interesting to figure out the learning stage at which genetic predispositions such as FOXP2 play fundamental roles. Other studies in bilinguals have shown that the caudate nucleus is important for monitoring and controlling of the two languages in use (Crinion et al. 2006; Hernandez et al. 2001; Hosoda et al. 2012).

In conclusion, we generally warrant Ackermann et al.’s proposal that the cortico-basal ganglia circuits may play essential roles in evolutions of spoken language. We, however, consider that the cortico-basal ganglia circuits may contribute to various aspects of spoken language including planning, learning, and controlling of speech in both childhood and adulthood.

Does it talk the talk? On the role of basal ganglia in emotive speech processing

doi:10.1017/S0140525X13004044

Uri Hasson,ᵃ Daniel A. Llano,ᵇ Gabriele Miceli,ᵃ and Anthony Steven Dickᶜ
ᵃCenter for Mind/Brain Sciences (CIMeC) and Department of Psychology and Cognitive Science, University of Trento, Mattarello (TN), Italy; ᵇSchool of Molecular and Cellular Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801; ᶜDepartment of Psychology, Florida International University, Miami, FL 33199.
[email protected] [email protected] [email protected] adick@fiu.edu
http://www.hasson.org
http://mcb.illinois.edu/faculty/profile/d-llano/
http://www.unitn.it/en/cimec/11706/gabriele-miceli
http://faculty.fiu.edu/∼adick

Abstract: Ackermann et al.’s phylogenetic account of speech argues that the basal ganglia imbue speech with emotive content. However, a body of work on auditory/emotive processing is inconsistent with attributing this function exclusively to these structures. The account further overlooks the possibility that the emotion-integration function may be at least in part mediated by the cortico-ponto-cerebellar system.

Ackermann et al.’s phylogenetic account of speech development hinges, in part, on premises related to the role of basal ganglia (BG) in adult human speech production. It argues that in adults, BG imbue speech with emotive content. While the model targets an important and neglected issue, we argue that it suffers from two structural weaknesses: First, it does not sufficiently consider studies of the role of BG in auditory and emotive processing such as those showing that BG damage does not disrupt emotive processing in speech. Second, the argument also overlooks the possibility that the role attributed to the BG may be at least in part mediated by a different system – the cortico-ponto-cerebellar system. We believe the authors’ account would be much strengthened if they address these points, which we detail in turn.

Viability of BG as a speech/emotion synthesizer. A principle incorporated in contemporary models of speech production is that production occurs under one or more levels of feedback, where potential production errors are monitored either after utterance production (sensory feedback) or prior to it (via internal models; e.g., Hickok 2012). Ackermann et al. do not couch their account in an existing speech-production model and leave the issue of feedback underspecified. Nonetheless, if the BG were responsible for imbuing speech with emotive content, they would be expected to have the capacity to monitor and correct for related errors, that is, evaluate that the intended emotive tone/prosody was instantiated. However, BG are a weak candidate for such a function. The authors ignore studies indicating (i) that the auditory response in BG is temporally insufficient to provide feedback (Langers & Melcher 2011) and that it has limited functional connectivity

with areas of the temporal cortex mediating language processing (Choi et al. 2012); (ii) that emotive speech processing is mediated mainly by lateral temporal systems while excluding the BG (Kotz et al. 2013; Wildgruber et al. 2006); and, most importantly, (iii) that individuals with BG infarcts are equally sensitive to emotional speech variations as control populations (Paulmann et al. 2008; 2011). These three points argue against the authors’ claim that adding prosody to speech depends on integrity of striatum.

The suggested account relies on two additional premises that are not strongly supported by the literature: The first, that in adults, the BG can afford coding for emotion since adult perisylvian regions code for syllable motor programs, independently of the BG. Empirical support for this point is tenuous at best: Studies using manipulations of syllable frequency have either reported null results (Brendel et al. 2011; Riecker et al. 2008) or documented effects in the anterior insula (Carreiras et al. 2006). The second, that the BG can merge emotional content due to cross talk between cortico-striatal-thalamic circuits. Although there is anatomical evidence for cross-talk across BG circuits in animal models (Haber 2003), the functional significance of these needs to be fleshed out.

On the consideration of alternatives. A BG-oriented account should address questions such as those raised above, and equally importantly argue why the BG is the strongest neurobiological candidate for mediating the function in question. The authors do not make such an argument, which is unfortunate since much of the neurobiological argument made here for BG could be made effectively for other structures, such as the cerebellum.

The involvement of the cerebellum in emotional processing is well established. It is implicated in self-generation of various emotional states (Damasio et al. 2000), with different emotions evoking distinct activity patterns in the structure (Baumann & Mattingley 2012). Damage to the cerebellum affects emotional processing. In animal models, early cerebellar lesions can lead to disrupted emotional processing (Bobee et al. 2000), and in human adults, the Cerebellar Cognitive Affective Syndrome (CCAS; Schmahmann & Sherman 1998) is a recognized clinical entity associated with blunting of affect. CCAS has been attributed to damage to the posterior vermis, which reduces the cerebellar contribution to perisylvian cortical areas via its outflow to the ventral tier thalamic nuclei (Stoodley & Schmahmann 2010).

Arguments used by Ackermann et al. in support of their BG hypothesis could also be applied to the cerebellum. For example, FOXP2 expression is found in the cerebellum as well as the caudate (Lai et al. 2003; Watkins et al. 2002b), and as shown by Ackermann et al. (1992), cerebellar lesions are associated with dysarthria. In addition, activity in the cerebellum, but not BG, discriminates emotive aspects of speech (Kotz et al. 2013). Furthermore, the cerebellum has the capacity for generating an internal forward model of motor-to-auditory predictions of the sort needed to evaluate whether the intended emotive aspect has been communicated (Knolle et al. 2013). While there is no direct examination of this issue for BG, work on motor control suggests that functionally, BG may implement open- rather than closed-loop control of motor actions (Gabrieli et al. 1997).

It is important to point out that these explanations are not mutually exclusive. Cerebellar and BG circuits involved with language converge at the ventral anterior nucleus of the thalamus, which has also been implicated in language, and can serve as a nidus for cortical feedback via cortico-thalamic projections (Crosson 2013). Further, cerebellar outflow can directly influence the BG, and vice versa (Bostan et al. 2013), suggesting that attributing the emotional content of speech to either of these two systems in isolation may not be possible. Given this connectivity, it may be that the cerebellum drives emotion-carrying vocalizations by involving BG, or that the BG trigger emotional behavior that is ultimately modulated by the cerebellum, as would be consistent with a CCAS syndrome. However, data on this issue are lacking.

Summary. Arguing that the BG can imbue speech with emotional content is a significant claim and, as such, requires additional evidence, accompanied by careful consideration of alternative accounts. We hope this commentary will result in more detailed examination of the aforementioned issues.

Differences in auditory timing between human and nonhuman primates

doi:10.1017/S0140525X13004056

Henkjan Honinga and Hugo Merchantb
aAmsterdam Brain and Cognition, Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, The Netherlands; bDepartment of , Instituto de Neurobiología, Universidad Nacional Autónoma de México, Campus Juriquila, Querétaro, México.
[email protected] [email protected]
http://www.mcg.uva.nl/hh/
http://132.248.142.13/personal/merchant/members.html

Abstract: The gradual audiomotor evolution hypothesis is proposed as an alternative interpretation to the auditory timing mechanisms discussed in Ackermann et al.’s article. This hypothesis accommodates the fact that the performance of nonhuman primates is comparable to humans in single-interval tasks (such as interval reproduction, categorization, and interception), but shows differences in multiple-interval tasks (such as entrainment, synchronization, and continuation).

Ackermann et al. propose that the monosynaptic elaboration of the corticobulbar tracts, which played a selective role in the origins of speech, might also have provided the phylogenetic basis for “communicative musicality” (sect. 5.1). The term “musicality” is used here to indicate the cognitive and biological mechanisms that underlie the perception and production of music, as opposed to musical activities that are shaped by culture (Honing & Ploeger 2012; Honing et al., in press b). Perceiving a regular pulse – the beat – in music is considered a fundamental component of musicality: It allows humans to dance and make music together. This skill has been referred to as beat perception and synchronization (Patel 2008), beat induction (Honing 2012), or pulse perception and entrainment (Fitch 2013). Furthermore, it is considered a spontaneously developing (Winkler et al. 2009), music-specific (Patel 2008) and species-specific skill (Fitch 2013).

Interestingly, beat perception and synchronization (BPS) has been observed in humans and a selected group of bird species (Hasegawa et al. 2011; Patel et al. 2009b), but appears to show some but not all the behavioral fingerprints in nonhuman primates (Honing et al. 2012; Zarco et al. 2009; but see Hattori et al. [2013] for some counter-evidence). This observation is in support of the vocal learning (VL) hypothesis (Patel 2008), which suggests that BPS is a by-product of the VL mechanisms that are shared by several bird and mammal species, including humans, but that are only weakly developed, or missing entirely, in nonhuman primates. Nevertheless it has to be noted that, since no evidence of rhythmic entrainment was found in many vocal learners (including dolphins, seals, and songbirds; Schachner et al. 2009), vocal learning may be necessary, but clearly is not sufficient for BPS. Furthermore, recent evidence for BPS in a non-vocal learner (Cook et al. 2013) weakens vocal learning as a pre-condition for rhythmic entrainment.

The absence of synchronized movements to sound (or music) in certain species is no evidence for the absence of beat perception. With behavioral methods that rely on overt motoric responses (e.g., Hattori et al. 2013; Patel et al. 2009b) it is difficult to distinguish between the contribution of perception and action; more direct, electrophysiological measures such as event-related brain potentials (ERPs) allow testing for neural correlates of beat perception (a pre-condition to rhythmic entrainment). To test this, we measured auditory ERPs in rhesus monkeys (Macaca mulatta) using the mismatch negativity (MMN) component as an index of (the violation of) rhythmic expectation (Honing
et al. 2012). Rhythmic expectation was probed by selectively omitting parts of a musical rhythm, randomly inserting gaps at the first position of a musical unit (i.e., the “downbeat”). This oddball paradigm was used previously to probe beat perception in human adults and newborns (Honing et al., in press a; Winkler et al. 2009). The results confirmed the behavioral studies discussed earlier, in that rhesus monkeys are not able to detect the beat in a complex auditory stimulus, although they can detect the start of a rhythmic group (Honing et al. 2012). In fact, a recent paper showed that macaques exhibit changes of gaze and facial expressions when a deviant of a regular rhythmic sequence is presented, supporting the notion that monkeys are sensitive to the structure of simple rhythms (Selezneva et al. 2013).

The question remains of whether more close human relatives, such as the great apes, show a more sophisticated ability for rhythmic entrainment than macaques. While the VL hypothesis predicts that no rhythmic entrainment should be found, a recent study (Hattori et al. 2013) showed that at least one chimpanzee (Pan troglodytes), of the three that took part in the experiment, was capable of spontaneously synchronizing her movements with an auditory rhythm. Interestingly, this chimpanzee entrained her tapping behavior to an isochronous 600-msec interval stimuli metronome, but not to other tempos.

Based on these observations, we propose an alternative view: the gradual audiomotor evolution (GAE) hypothesis (Honing et al. 2012; Merchant & Honing 2014), which directly addresses the similarities and differences that are found between human and nonhuman primates (discussed in section 5.1 of the target article). This hypothesis suggests rhythmic entrainment (or beat-based timing) to be gradually developed in primates, peaking in humans but present only with limited properties in other nonhuman primates; while humans share interval-based timing with all nonhuman primates and related species. Thus, the GAE hypothesis accommodates the fact that the performance of rhesus monkeys is comparable to humans in single-interval tasks (such as interval reproduction, categorization, and interception; Mendez et al. 2011; Merchant et al. 2003), but differs substantively in multiple-interval tasks (such as rhythmic entrainment, synchronization, and continuation; Zarco et al. 2009).

Finally, the GAE and VL hypotheses show the following crucial differences. First, the GAE hypothesis does not claim that the that is engaged in rhythmic entrainment is deeply linked to vocal perception, production, and learning, even if some overlap between the circuits exists. Second, the GAE hypothesis suggests that rhythmic entrainment could have developed through a gradient of anatomofunctional changes on the interval-based mechanism to generate an additional beat-based mechanism, instead of claiming a categorical jump from non-rhythmic/single-interval to rhythmic entrainment/multiple-interval abilities. Third, since the cortico-basal ganglia-thalamic (CBGT) circuit has been involved in beat-based mechanisms in imaging studies (Grahn & Brett 2007; Rao et al. 1997; Teki et al. 2011; Wiener et al. 2010), we suggest that the reverberant flow of audiomotor information that loops across the anterior prefrontal CBGT circuits may be the underpinning of human rhythmic entrainment. Finally, the GAE hypothesis suggests that the integration of sensorimotor information throughout the mCBGT circuit and other brain areas during the perception or execution of single intervals is similar in human and nonhuman primates.

Neanderthals did speak, but FOXP2 doesn’t prove it

doi:10.1017/S0140525X13004068

Sverker Johansson
Dalarna University, Falun, SE-791 88, Sweden.
[email protected]
http://users.du.se/~sja/

Abstract: Ackermann et al. treat both genetic and paleoanthropological data too superficially to support their conclusions. The case of FOXP2 and Neanderthals is a prime example, which I will comment on in some detail; the issues are much more complex than they appear in Ackermann et al.

Ackermann et al. provide some interesting speculations about a possible scenario for the evolution of the brain mechanisms of vocal communication and language. But in the areas that I am familiar with, notably Neanderthal language (Johansson 2013), but also the history of the human language capacity in general (Johansson 2005; 2011), their treatment of the evidence is superficial and simplistic (see sect. 5.2), leading to their drawing conclusions that are insufficiently supported.

The authors’ Section 5 supposedly provides “paleoanthropological perspectives” on their scenario, but contains little reference to paleoanthropological data. Instead it deals mainly with FOXP2, with fossil DNA virtually the only paleo-connection.

When mutations in the gene FOXP2 were found to be associated with specific language impairment (Lai et al. 2001), and it was shown that the gene had changed along the human lineage (Enard et al. 2002), it was heralded as a “language gene.” But intensive research has revealed a more complex story, with FOXP2 controlling synaptic plasticity in the basal ganglia (Lieberman 2009) rather than language per se, and playing a role in vocalizations and vocal learning in a wide variety of species, from bats (Li et al. 2007) to songbirds (Haesler et al. 2004). The changes in FOXP2 in the human lineage quite likely are connected with some aspects of language, but the connection is not nearly as direct as early reports claimed, and as Ackermann et al. apparently assume. While FOXP2 is clearly relevant at some level when modeling the brain mechanisms of language, Ackermann et al. go far beyond the data when they treat speech evolution as “FOXP2-driven” (sect. 5.2).

Likewise, the apparent presence of human FOXP2 in Neanderthals does not in itself prove that Neanderthals spoke (Benítez-Burraco & Longa 2012). They most likely did speak, but that conclusion rests on a complex web of inferences from diverse sources of evidence, with FOXP2 just one minor piece of the puzzle (Dediu & Levinson 2013; Johansson 2013; cf. Barceló-Coblijn & Benítez-Burraco 2013).

It is also imprudent to assume that Neanderthals and modern humans did not interbreed (target article, sect. 5.2), and quite improper to invoke Green et al. (2010) in apparent support of this assumption. The jury is still out on the interbreeding issue (Johansson 2013), but evidence favoring interbreeding is accumulating (Green et al. 2010; Dediu & Levinson 2013; Yotova et al. 2011). Ackermann et al. do consider gene flow as an alternative scenario, but here the time frame is off; an emergence of the FOXP2 mutations 40,000 years ago (sect. 5.2) is not consistent with their presence in all modern human populations, as this postdates our most recent common ancestor (MRCA; Johansson 2011; Macaulay 2005) and is not supported by a proper genetic model either (Diller & Cann 2009).

In their main scenario of no interbreeding, Ackermann et al. have a different time-frame problem; the FOXP2 change is here constrained to be older than 400,000 years, but the fixation rate is not constrained in this case, nor is there any tight upper time limit (cf. Diller & Cann 2009; 2012), so it is improper to conclude that it must have been “a relatively fast fixation” and thus “strong selection pressures” (target article, sect. 5.2).

Ackermann et al. dismiss the possible contribution of anatomical data from fossils in a single sentence (sect. 5.2, para. 2), and while they are correct that endocasts and cranial bases are not highly informative, other relevant anatomical evidence is available, as reviewed in Johansson (2013) and Dediu & Levinson (2013).

Vocal displays as the selective driver of protolanguage evolution (target article, sect. 5.2; cf. Locke & Bogin 2006) are highly unlikely, as they would drive the evolution of something more resembling birdsong than language (Johansson et al. 2006).
The distinct processing systems for music and language in modern humans likewise do not support such a scenario (Dediu & Levinson 2013).

Ackermann et al. mention briefly many different popular works on language evolution (e.g., Bickerton 2009; Mithen 2005; Falk 2004), but they do not engage with them at any depth, just picking some aspect from each that fits into their own scenario, without integration.

In summary, Ackermann et al. accurately identify brain circuitry issues that need to be addressed in the context of language evolution, and they provide an interesting, if speculative, evolutionary scenario for these circuits. But as soon as they step outside the brain and attempt to engage with other types of evidence, or with possible selective scenarios driving language evolution, their treatment is insufficient.

The forgotten role of consonant-like calls in theories of speech evolution

doi:10.1017/S0140525X1300407X

Adriano R. Lameiraa,b
aInstitute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, 1098 XH Amsterdam, the Netherlands; bPongo Foundation, 3421 XN Oudewater, The Netherlands.
[email protected]

Abstract: Ackermann et al. provide an informative neurological road-map to primate call communication. However, the proposed model for speech evolution inadequately integrates comparative primate evidence. Critically, great ape voiceless calls are explicitly rendered unimportant, leaving the proposed model deprived of behavioral feedstock and proximate selective drivers capable of triggering the neurological transformations described by the authors in the primate brain.

Ackermann et al. compile a manual guide to the neurology of acoustic communication in primates and humans that should be read by any student and scholar interested in one of the oldest questions in evolutionary biology – speech evolution. The authors fail, however, to integrate this important information with critical evidence from comparative primate research (a recurring pitfall in neurology-based hypotheses for language evolution; Arbib 2005; Seyfarth 2005) and so the proposed evolutionary model falters on central heuristic pillars.

In agreement with the currently dominant view of speech evolution (Fitch et al. 2010; Janik & Slater 1997), Ackermann et al. place a pronounced, but unwarranted importance on vocal learning, underlined primarily by vocal fold control. Because nonhuman primates, including great apes, are assumed to be incapable of vocal learning (Janik & Slater 1997), the authors logically presume that “motor mechanisms of articulate speech appear to lack significant vocal antecedents within the primate lineage” (sect. 1.1, para. 2). Paradoxically, Ackermann et al. argue then for the existence of vocal continuity at the motor level within the primate lineage and pursue an evolutionary model which addresses speech features that primarily relate to nonhuman primate voiced calls, or “vocalizations,” and vowels.

Interestingly, Ackermann et al. describe and depict in a clear way that vocal fold control is obligatorily involved solely in the production of vowels, while consonants are often voiceless and may be produced via supra-laryngeal articulation alone (with or without simultaneous airflow). The authors recognize that “virtually all languages of the world differentiate between voiced and voiceless sounds” (sect. 4.1, para. 1), and the diagrams provided by the authors illustrate well that supra-laryngeal articulation is versatile, multidimensional, multicomponent, and arguably, in some occasions, at least as complex as vocal fold control, both in motor control and acoustics. This fact is well illustrated, for instance, by the size of the consonant repertoire across all the world’s spoken languages, which is three-fold larger than that of vowels (Maddieson 1984). Any suitable account of speech evolution must thus account for the evolution of both speech building blocks in our lineage.

Like human consonants, some great ape calls do not obligatorily require the control or action of the vocal folds. Great ape voiceless calls, such as clicks, raspberries, smacks, kiss sounds, and whistles, are underlined by voluntary control and maneuvering of supra-laryngeal articulators (i.e., tongue, lips, and jaw) in apparent homology to the articulatory movements of voiceless consonants (Lameira et al. 2013c). These calls rely on social learning for their acquisition and fine sensory-motor feedback for proper production (Hardus et al. 2009b; Lameira et al. 2013a; 2013b; Marshall et al. 1999; Wich et al. 2009; 2012). Apart from some rare cases across different taxa (e.g., storks, deer, macaques), great apes produce multiple voiceless calls. With the exception of humans, it is yet unclear whether any other animal species have explored the acoustic space of their supra-laryngeal vocal tract to an extent similar to great apes. For instance, in some wild orangutan populations, voiceless calls can account for half of the repertoire of an individual who produces more than ten different call types (Hardus et al. 2009b). Unfortunately, Ackermann et al. neglect the importance of the homology in articulation, acoustics, and acquisition between great ape voiceless calls and human voiceless consonants.

Additionally, the authors tentatively suggest that factors like mother–infant interactions, grooming, social prestige, and communal dancing indirectly supported the emergence of vocal learning. Such suggestions find no ground in primate literature. These factors are either shared between most nonhuman primates or are only known in humans, obscuring possible phylogenetic approaches to relevant primate communicative traits. As described in the target article, any significant and unique role the mentioned factors may have played in the earliest stages of speech evolution remains at least ambiguous and vague. Cooperative breeding, for instance, is also left out, though this is a promising factor capable of prompting a shift in the fundamental way ancestral primate individuals may have communicated with each other (Burkart et al. 2007; 2009a; 2009b; Burkart & van Schaik 2010; Isler & van Schaik 2012; van Schaik & Burkart 2010).

In sum, Ackermann et al. present an evolutionary model inferred virtually from neurology alone, lacking concrete and/or realistic primate behaviors and selective drivers that may have prompted the neural transformations described. Great ape voiceless calls provide one such potent behavioral model and resolve the conflicting notions of motor continuity within the primate lineage. Although further research is needed (Lameira et al. 2013c), evidence suggests that a call repertoire composed of innate vocalizations together with a minority of learned voiceless calls represents a shared feature among all great apes (Lameira et al. 2013c), dating back thus to our ape ancestor. Such an extended repertoire would have offered direct communicative benefits for the transmission of more (detailed) information, disclosing an advanced into acoustic communication (Seyfarth & Cheney 2003a; 2008; 2010; Seyfarth et al. 2005) across whatever contexts. Such benefits would have predictively triggered selective pressures towards increased motor control over call production, even though in the absence of vocal fold control. In other words, it is possible that vocal learning did not trigger the emergence of a primate open-ended call repertoire, but represented sequentially a “secondary” evolutionary step (Lameira et al. 2013a). Flexible (e.g., Clay et al. 2011; Koda et al. 2013; Lemasson et al. 2011; Ouattara et al. 2009; Slocombe & Zuberbühler 2007; Townsend et al. 2008) and intentional (Gruber & Zuberbühler 2013; Schel et al. 2013) use of innate vocalizations by nonhuman primates may have then provided
the basis for the expansion of motor control over the vocal folds sufficient to allow individuals to learn to produce new voiced calls.

Overall, great ape voiceless calls beg for a reconsideration of the premises of the model proposed by Ackermann et al. The homology between great ape voiceless calls and human consonants warrants serious consideration of the former in any historical account of speech evolution. Great ape voiceless calls, for instance, also show fascinating features in that they may be produced simultaneously with “musical” instruments (Hardus et al. 2009b; Lameira et al. 2012), and their cultural transmission within separate populations leads to the emergence of functional arbitrariness in primate acoustic communication (Lameira et al. 2013b). These features are probably based on neurological interactions that are yet to be documented and/or investigated, but that pose intriguing possibilities for our comprehension of speech evolution.

Understanding speech evolution will require integrating evidence collected across multiple levels and disciplines (Christiansen & Kirby 2003). Neurological studies and approaches to the question of speech evolution will be of invaluable importance, but there should be a committed effort to “anchor” neurological data to comparative primate research, mimicking the synergies that likely played out between the primate brain and primate communicative behavior in the course of speech evolution.

ACKNOWLEDGMENT
Adriano R. Lameira was financially supported by the Menken Funds of the University of Amsterdam during the preparation of the manuscript.

Early human communication helps in understanding language evolution

doi:10.1017/S0140525X13004081

Daniela Lenti Boero
Department of Social and Human Sciences, University of Valle d’Aosta, 11010 Aosta, Italy.
[email protected]
http://www.univda.it/lentiboerodaniela

Abstract: Building a theory on extant species, as Ackermann et al. do, is a useful contribution to the field of language evolution. Here, I add another living model that might be of interest: human language ontogeny in the first year of life. A better knowledge of this phase might help in understanding two more topics among the “several building blocks of a comprehensive theory of the evolution of spoken language” indicated in their conclusion by Ackermann et al., that is, the foundation of the coevolution of linguistic motor skills with the auditory skills underlying speech perception, and the possible phylogenetic interactions of protospeech production with referential capabilities.

According to Ackermann et al., human language is a multicomponent process whose evolution must have operated at all life stages (Hogan 1988; Locke & Bogin 2006). In the first year of life, human sounds undergo a radical transformation: the substitution of the cry, an analog signal paralleling the dimension of infant’s homeostatic imbalance (Gustafson et al. 2000; Lenti Boero et al. 1998) and similar to mammalian signals by design (Lieberman et al. 1968; 1971), with articulated speech-like sounds and some meaningful words at the end of the first year (de Boysson-Bardie 2001; Lenti Boero & Bottoni 2006; Oller 2000). Thus, in the first few months of life millions of years of language evolution are summarized; and therefore some benchmark might be outlined to make hypotheses about the selective pressures at work.

The how question. A first point is the transformation of the analog signal “cry” into articulated sounds: From the age of 2

complex melodic contours than cries (Ruzza et al. 2003). In the months to follow sounds are produced with the entire vocal apparatus (mouth, lips, nose, and throat) (de Boysson-Bardie 2001; Oller 2000). This points to the appearance of a better control of sound emission due not only to the maturation of the vocal apparatus, but also to a better nervous motor control of phonation. Vocal tract length in neonates is about 6 to 8 cm (Vorperian & Kent 2007) and reaches 8.5 cm at 18 months, that is, 55% of adult size (Vorperian et al. 2009). Animal and human studies suggest that the nervous motor control of infant cry is similar to that of monkeys: It involves the limbic system that initiates the cry, the midbrain structures that configure the response, and the brainstem that is responsible for the mechanics of the cry. The latter integrates the laryngeal and respiratory activity with the activation of the subroutines for fixed vocal patterns pre-programmed as an answer to external stimuli (Jürgens & Ploog 1988; Lenti Boero 2009; Lester & Boukydis 1992); thus a control for articulated sounds should only come from a rearrangement and a maturation of other centers allowing more motor freedom to lips, mandible, and tongue movements (Davis & MacNeilage 2002).

The why question. Cry is an alarm signal with striking characteristics of loudness and long duration. It can communicate individuality, sex of the caller (Cismaresco & Montagner 1990; Rocca & Lenti Boero 2005), and urgency to a recipient (Lenti Boero et al. 2008). Now, imagine a hominid social group endowed with such a communicative tool: A known individual could communicate alarm and urgency to group mates from a distance. This communication might have had a basic referentiality as in other mammalian species (Lenti Boero 1992; Rasa 1986; Seyfarth & Cheney 1980; Zuberbühler 2000b). Why go further? Cries are fixed analog sounds and we know they might be aversive even for mothers (Frodi 1985; Frodi & Senchack 1990; Lenti Boero et al. 2008; Levitzky & Cooper 2000), while articulated sounds are considered music-like and very pleasant to the caregiver (Papoušek & Papoušek 1981). Newborns having a capacity for music-like sounds might have been preferentially selected by parents (Locke 2006), as a pilot experiment suggests (Lenti Boero & Bottoni 2009). Those same infants might have been selected when adults (Hogan 1988) because they were able to use frequency modulated sounds in courtship in a kind of hominids’ ancestral serenade, enabling them to communicate felt emotions (Banse & Scherer 1996).

Auditory-motor coevolution. All communication devices, human language included, imply the coevolution of both receiver and emitter, which is evident in the specialized adult language brain areas: Wernicke’s and Broca’s. During early development we know that infant perception of surrounding sounds, including language, is much more advanced than motor competence: Infants are capable of auditory streaming at 2–5 days old (Winkler et al. 2003), and they discriminate vowel and phonetic sounds from the first month (Clarkson & Berg 1983; Eimas et al. 1971; Mehler et al. 1988; Teinonen et al. 2009), sharing this capacity with many animal species: rhesus macaques, dogs, chinchilla, quails, and parrots (Adams et al. 1987; Bottoni et al. 2009; Dewson 1964; Kluender et al. 1987; Kuhl & Miller 1975; Miller 1977; Morse & Snowdon 1975; Pepperberg 2007). On the melodic and musical side newborn infants recognize musical melodies heard before birth (Kisilevsky et al. 2004). In addition, event-related brain potential (ERP) and magnetoencephalography (MEG) studies show that newborns can form expectation of a musical pitch and that infants detect substitution of musical notes (Tervaniemi & Huotilainen 2003). Eventually, infants shape their cries’ melodic contours upon their native language (Mampe et al. 2009), thus showing a foundation for auditory-motor connection and imitation. Infants’ sense of hearing is “encyclopaedic,” because it is open to all linguistic and music sounds.
This capacity is lost from 5 to 6 months (de Boysson- months, infants start producing very low intensity protophones Bardie 2001), when infants attach their attention to motherese which are mostly, but not all, vowel-like, and have more (Oller 2000), a nonexistent feature at the dawn of language.


Thus, sound imitation by means of protophones might have been concentrated on surrounding sounds, especially those uttered by predator or prey animals, to convey information about their presence and denote them in the acoustic channel. Though many theories point to enhanced sociality (Dunbar 1993), the possibility to refer to an object is still a core point for language evolution (Lenti Boero & Bottoni 2006) and might have been a key factor for the selection for articulated sounds emission.

ACKNOWLEDGMENTS
The research going into the preparation of this commentary was supported in 1995, from 2001 to 2003, and from 2005 to 2007 by grants from MURST (Ministero dell'Università e della Ricerca Scientifica e Tecnologica), and by funding from the University of Valle d'Aosta from 2009 to 2013.

Why we can talk, debate, and change our minds: Neural circuits, basal ganglia operations, and transcriptional factors

doi:10.1017/S0140525X13004093

Philip Lieberman
Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, RI 02912.
[email protected]
www.Brown.edu

Abstract: Ackermann et al. disregard attested knowledge concerning aphasia, Parkinson disease, cortical-to-striatal circuits, basal ganglia, laryngeal phonation, and other matters. Their dual-pathway model cannot account for "what is special about the human brain." Their human cortical-to-laryngeal neural circuit does not exist. Basal ganglia operations, enhanced by mutations on FOXP2, confer human motor-control, linguistic, and cognitive capabilities.

It has been clear for decades that aphasia never occurs without subcortical damage, and can occur absent insult to the cortex (Naeser et al. 1982; Stuss & Benson 1986). The speech production deficits of Parkinson disease and focal lesions to the basal ganglia are qualitatively similar to ones occurring in aphasia (Blumstein 1995; Blumstein et al. 1980; Lieberman et al. 1990; 1992; Pickett et al. 1998; Usui et al. 2004) and are not limited to aberrant laryngeal phonation. Motor control is slow and imprecise, thus degrading speech, walking, and other internally guided motor tasks (Harrington & Haaland 1991; Marsden & Obeso 1994). A suite of cognitive deficits occurs (Flowers & Robertson 1985; Lange et al. 1992), including cognitive inflexibility and impaired comprehension of distinctions in meaning conveyed by syntax (Grossman et al. 1991; Lieberman et al. 1990; 1992; Natsopoulos et al. 1993). Similar, less pronounced, motor and cognitive deficits occur when hypoxic insult degrades the metabolically active basal ganglia (Lieberman et al. 1994; 2005).

These behavioral deficits derive from insult to a network of segregated cortical-to-basal ganglia neural circuits linking areas of motor cortex and prefrontal cortex. Marsden and Obeso (1994), taking into account a comprehensive range of studies, concluded that the basal ganglia act as a neural "switch" in circuits linking them to the motor cortex, activating and linking submovements in internally guided acts such as walking or talking. When circumstances suggest a different motor response, the basal ganglia switch to a different sequence. The basal ganglia perform similar operations during cognitive tasks in circuits that include areas of the prefrontal cortex. fMRI studies confirm their supposition. For example, the ventrolateral prefrontal cortex and the caudate nucleus of the basal ganglia are active when a subject is planning to change how he or she is sorting images on the basis of their shapes, to sorting them by color; or selecting words that rhyme and, instead, shifting to selecting words that have similar meanings. A cortical-to-basal ganglia circuit that includes the putamen and posterior prefrontal cortex is active during the execution of a sorting set-shift. The dorsolateral prefrontal cortex is involved whenever subjects make any decision, apparently monitoring whether the subjects' responses were consistent with the chosen sorting criterion (Monchi et al. 2001; Simard et al. 2011). Other neuroimaging studies, reviewed in Lieberman (2000; 2002; 2006b; 2012; 2013), show that the prefrontal cortex and the basal ganglia are active when subjects have to understand the meaning of a sentence, recall words from memory, subtract numbers, and perform other cognitive tasks. All primates, including humans, appear to have similar cortical-to-basal ganglia circuits (Lehericy et al. 2004).

Ackermann et al. instead place great weight on a hypothetical direct cortical-to-laryngeal neural circuit that bypasses the basal ganglia, accepting a premise advanced in Fitch (2010). The circuit does not exist, being based on flawed attempts to adapt a lethal tracer technique to study humans. The Nauta and Gygax (1954) technique necessitates destroying discrete neural structures in an animal's brain. After some weeks the animal is sacrificed and its brain is impregnated with a silver solution that delineates neuronal structure. Microscopic examination of sectioned brain tissue can then reveal damage to downstream neurons in circuits to the neural structure that was destroyed. Using this technique, Kuypers (1958a) and Iwatsubo et al. (1990) claimed that changes to spinal cord neurons that innervate the larynx revealed a direct cortical-laryngeal circuit. However, the deceased patients studied had massive brain damage that included the basal ganglia and pathways to it. Similar changes to brainstem neurons occurred in patients who had died from non-neurological disease processes (Terao et al. 1997). Jürgens (2002b) concludes his review article on the neural bases of motor control by noting that "motor coordination of learned vocal patterns comes from the motor cortex and basal ganglia" (p. 251). Moreover, in itself, enhanced laryngeal control of phonation would not have yielded the encoding of segmental phonemes that is a unique property of human speech (Liberman et al. 1967).

Ackermann et al. claim that basal ganglia circuits are devoted to learning "digital" linguistic contrasts in the first years of life, then shift to learning emotional prosody. However, no data are presented to support this claim, and developmental studies show that this is not the case. For example, prosodic patterns signaling intent are apparent in the first year of life in infants in a Catalan-speaking environment (Esteve-Gibert & Prieto 2013). Both lexical tones and prosodic patterns emerge in the early years of life for Mandarin-learning infants (Chen & Kent 2009).

As my publications have pointed out, transcriptional factors such as FOXP2 may hold the key to why the human brain enables us to talk, continually create new forms of art, and possess language (Lieberman 2006b; 2009; 2013). The basal ganglia, which initially played a role in motor control, appear to have been modified in the course of evolution. The version of FOXP2 that differs with respect to two amino acids from chimpanzees enhances synaptic plasticity in basal ganglia neurons and in the substantia nigra. It also increases dendritic connectivity. A third mutation on FOXP2 (on intron 8, close to the amino acid substitutions) appears to enhance transcription. This uniquely human mutation occurred when modern humans first appeared in Africa (Maricic et al. 2013). It resulted in a "selective sweep." Selective sweeps on genetic mutations, such as those that confer adult lactose tolerance (Tishkoff et al. 2007), occur when a mutation enhances the survival of progeny. One of the tenets of neurophysiology is that synaptic plasticity is the key to learning anything. Virtually all human knowledge is transmitted through the medium of language, and FOXP2 appears to have played a role in the evolution of human language by enhancing basal ganglia synaptic plasticity and connectivity.

It is puzzling that Ackermann et al., disputing my views on the physiology of speech production, included a direct quotation from


page 289 of my 2006 book, Toward an Evolutionary Biology of Language. In pages 131 to 245 of this book I discuss, in detail, the issues noted above and other points raised by Ackermann et al.

En route to disentangle the impact and neurobiological substrates of early vocalizations: Learning from Rett syndrome

doi:10.1017/S0140525X1300410X

Peter B. Marschik (a), Walter E. Kaufmann (b), Sven Bölte (c), Jeff Sigafoos (d), and Christa Einspieler (a)
(a) Institute of Physiology, Research Unit iDN – Interdisciplinary Developmental Neuroscience, Medical University of Graz Austria, 8010 Graz, Austria; (b) Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA 02115; (c) Center of Neurodevelopmental Disorders (KIND), Department of Women's and Children's Health, Karolinska Institutet, Astrid Lindgren Children's Hospital, Solna 171 76 Stockholm, Sweden; (d) School of Educational Psychology, Victoria University of Wellington, PO Box 600, Wellington 6012, New Zealand.
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
www.medunigraz.at/physiologie/pbmarschik
http://www.iddrc.org/childrens-hospital-boston/index.php/investigators/details/walter_e._kaufmann_md
www.ki.se/kind
http://www.victoria.ac.nz/education/about/staff/ed-psy-ped-staff/jeff-sigafoos
www.medunigraz.at/physiologie/ceinspieler

Abstract: Research on acoustic communication and its underlying neurobiological substrates has led to new insights about the functioning of central pattern generators (CPGs). CPG-related atypicalities may point to brainstem irregularities rather than cortical malfunctions for early vocalizations/babbling. The "vocal pattern generator," together with other CPGs, seems to have great potential in disentangling neurodevelopmental disorders and potentially predict neurological development.

Acoustic communication has become the focus of intensive research aiming to assess, delineate, and interpret the integrity of neural functions within different theoretical frameworks. For example, an increasing number of studies have aimed to document early difficulties in this domain and their potential implications with participants of various developmental disorders, such as autism spectrum disorders or Rett syndrome (RTT). In this research a series of peculiarities have been reported. Publications on delay in acquisition of milestones are increasingly complemented by documentation of qualitative deviances, even at the earliest stages of speech-language acquisition such as cooing and babbling vocalizations (e.g., Marschik et al. 2012; 2013; Paul et al. 2011).

Our species-unique ability to frame our world with words and the required neurobiological underpinnings that enable it have fascinated researchers studying phylogenetic and ontogenetic perspectives of language and communication. Contemplations about the origin, evolution, and development of verbal communicative abilities have led to many assumptions, speculations, theories, and attempts to deliver the one and only plausible explanation. In their article, Ackermann et al. postulate an ontogenetic model that assumes age-dependent interactions between basal ganglia and their cortical targets. We discuss the plausibility of and support for this assumption by reviewing recent findings on early vocalizations in infants with neurodevelopmental disorders, more specifically RTT.

From an ontogenetic perspective, early vocalizations – both nonhuman (e.g., as shown for pygmy marmosets) and human – actively promote the proximity and attention of caregivers, and therefore represent an advantage for the babbling infant (Elowson et al. 1998). But what can we say at this point about the neurobiological substrates of these vocalizations? And, more pragmatically, what does it tell us with regard to progressive neurodevelopmental conditions, such as RTT? Prosodic features of spoken language were reported to be dependent on the integrity of the basal ganglia, especially the striatum (Darkins et al. 1988; Van Lancker Sidtis et al. 2006). From an evolutionary perspective Ackermann et al. argue that a "structural reorganization of the basal ganglia during hominin evolution may have been a pivotal prerequisite for the emergence of spoken language" (sect. 1.2, para. 3). A great body of clinical evidence for the involvement of the basal ganglia in speech-language functions stems from patients with basal ganglia dysfunctions, such as Parkinson's disease or Tourette syndrome. The focus has been on the substantia nigra pars reticulata that exerts inhibitory control of the midbrain periaqueductal gray matter (PAG), a major relay of the descending motor system across vertebrates, and its role in converting emotional and cognitive commands into vocalization (Kittelberger & Bass 2013; Menuet et al. 2011).

The PAG does not directly control the coordinated activity of respiratory movements, and laryngeal and orofacial muscle groups, but rather projects to the closely related brainstem central pattern generators (CPGs; Hikosaka 2007). CPGs are neuronal circuits that can produce rhythmic motor patterns in the absence of oscillatory input. Some CPGs operate continuously (e.g., respiratory movements), whereas others are activated to perform specific behavioral tasks (e.g., locomotion). To provide motor output flexibility, supraspinal projections activate, inhibit, and, most of all, modulate the CPG-activity, as does sensory feedback (Einspieler & Marschik 2012; Grillner et al. 1995). CPGs for vocalization have been studied to a great extent not only in amphibians and avians, but also in mammals such as cats (CPGs located in the nucleus retroambiguus; Zhang et al. 1995) or squirrel monkeys (CPGs in the parvocellular reticular formation around the nucleus ambiguus; Hage & Jürgens 2006). Barlow et al. (2009) have suggested the same mechanism for early human vocalizations/babbling. A rudimentary understanding of the CPG-circuitry for respiration and mouth movements suggests multiple loci in the brainstem, with a significant role for integration among subsystems and the PAG (Barlow & Estep 2006).

The above-mentioned tight interconnection of CPGs (Barlow et al. 2009) becomes functionally evident when observing individuals with RTT, a neurodevelopmental disorder mainly arising from mutations in the X-linked MECP2 gene (Neul et al. 2010). We have speculated that the interconnectivity of CPGs is pictured in RTT by the apparent evolution of early atypical vocalizations, with inspiratory-modulated sound patterns, into oro-motor dyspraxia and breathing irregularities later in childhood (Marschik et al. 2012). We propose the of RTT, a condition with well-documented early atypical vocalizations in both humans and animal models (De Filippis et al. 2010; Marschik et al. 2012; 2013), as a model for elucidating abnormalities and their mechanisms involving the CPGs.

In terms of neurobiological substrates, studies of knock-out mouse models of RTT have revealed reduced striatal dopamine release after stimulation that coincided with motor abnormalities (Gantz et al. 2011). Whether such a nigro-striatal pathway involvement could also be associated with abnormal ultrasonic vocalizations, as demonstrated in the Mecp2-308 mouse model (De Filippis et al. 2010), remains open. Of relevance to the vocalization-generating circuitry is the demonstration of decreased PAG volume and length in yet another RTT mouse model (Mecp2B; Belichenko et al. 2008). Ultimately, the ontogeny of MeCP2 expression in the human brain (Kaufmann et al. 2005) supports an early involvement of brainstem monoaminergic nuclei and related brain regions in the pathogenesis of multiple neurologic deficits, including language.

In conclusion, developmental delays and atypicalities in verbal behaviors and other neurologic functions in RTT support CPG


and, consequently, brainstem involvement. Future human and animal model studies are needed to further elucidate developing brain–behavior interfaces, disentangle specific traits, and help detect affected children at an earlier age. The "vocal pattern generator" together with other CPGs seems to have great potential in disentangling neurodevelopmental disorders and potentially predict neurological development.

Speech as a breakthrough signaling resource in the cognitive evolution of biological complex adaptive systems

doi:10.1017/S0140525X13004111

Tobias A. Mattei
Department of – Brain & Spine Center / InvisionHealth, Buffalo, NY 14221.
[email protected]
http://www.invisionhealth.com/providers/neurosurgery/tobias-mattei-md/

Abstract: In self-adapting dynamical systems, a significant improvement in the signaling flow among agents constitutes one of the most powerful triggering events for the emergence of new complex behaviors. Ackermann and colleagues' comprehensive phylogenetic analysis of the brain structures involved in acoustic communication provides further evidence of the essential role which speech, as a breakthrough signaling resource, has played in the evolutionary development of human cognition viewed from the standpoint of complex adaptive system analysis.

In the target article, Ackermann et al. contend that speech has emerged as a major evolutionary advantage in hominin ancestors as a result of a refinement in the projections from the motor cortex to the brainstem nuclei responsible for the control of laryngeal muscles as well as the further development of vocalization-specific cortico-basal ganglia circuitries driven by certain mutations in the FOXP2 gene which were unique to humans.

Complex adaptive system (CAS) analysis has emerged as a powerful research approach that has been successfully used to study the basic mechanisms underlying the evolution of dynamical systems composed of multiple agents interacting through complex and interdependent networks. As a broad and general theoretical tool, CAS analysis has been employed in a variety of research fields in both biological and social sciences in order to unveil the common general principles responsible for the evolution of apparently unrelated complex systems, such as global macroeconomics (Gintis 2006), the stock market (Mauboussin 2002), geopolitical organizations (Braman 2004), the cyberspace (Phister 2010), natural ecosystems (Levin 1998), the immune system (Grilo et al. 2002), the human brain (Gomez Portillo & Gleiser 2009), and intracellular signaling networks (Schwab & Pienta 1997).

Recently it has been suggested that, taking into account the dynamic nature of grammar and semantics' evolution throughout the centuries, language should be considered a typical example of a complex adaptive system (Ellis 2009). More importantly, the emergence of the cognitive apparatus responsible for the processing of acoustic communication can be regarded as a unique breakthrough within biological complex adaptive systems, as it fostered the development of new signaling networks not only among different individuals, but also within the subsystems operating inside each specific agent (Pinker 2010). By enabling dynamical inter-individual interactions through fast and instantaneous feedback loops, the emergence of speech granted the biological systems harboring such a new cognitive resource an enormous evolutionary advantage not only from the individual standpoint, but also from the perspective of further development of the whole species through social collaboration. Ultimately the combination of such new signaling networks built upon oral communication and language gave rise to more complex and general social elements, which, in turn, began to play a central role in the very interactions among such individuals through new types of information exchange pathways such as advertisement, mass communication vehicles, and, more recently, social media (Fitch 2006).

By exploring the underlying brain structures involved in the phylogenetic emergence of speech, the target article demonstrates how the use of words, as standardized vocal utterances filled with specific meaning (or, as Ackermann et al. call them, phonetic-linguistic categories), represents a unique cognitive resource of the human species. In a simple manner, speech can be understood as a voluntary pattern of vocalization common to a social community which has definite connotations and which is carefully manipulated at each individual communication event for the transferring of specific information, intentions, and abstract ideas. As discussed by the authors, despite the fact that close primates may demonstrate elaborated oral-motor capabilities and possess an extensive vocal repertoire, they fail in producing a pattern of vocal communication that resembles speech. As properly pointed out by Ackermann et al., nonhuman primate oral communication would be much more similar to other nonverbal affective forms of human vocal expressions (such as laughing, crying, or moaning) than to any type of organized and standardized pattern of vocalization that might possibly deserve the status of language. Based on such considerations, Ackermann et al. argue that a unique state of development of the neurophysiological networks responsible for coupling intentional planning and the refined coordination between the several motor elements involved in phonation (such as the tongue, laryngeal, jaw and facial muscles), ultimately enabled the human race to cross the critical edge that separates the crude vocalization patterns observed in other primates from human speech and language. In this sense, it could be said that, from the phylogenetic standpoint of brain evolution, the observed advancements in the neuroanatomical areas responsible for acoustic communication mentioned in the target article (such as the cortico-brainstem connections and the FOXP2 gene-induced new cortico-basal projections) fostered the further development of the primary cortical areas related to language emission (Broca's area, which is localized in the left inferior frontal gyrus of the dominant hemisphere – Brodmann's areas 44 and 45) and language comprehension (Wernicke's area, which is localized in the posterior section of the superior temporal gyrus – Brodmann's area 22) as well as of the white matter connection tracts and the accessory heteromodal association areas involved in the generation and processing of different speech features such as prosody, melody, rhythm, pitch, and syntax (Rauschecker 2012).

Such refinement in the brain networks responsible for the production and processing of speech in conjunction with advances in other brain regions which enabled the emergence of more complex non-pictographic forms of written language (i.e. systems which provided symbolic representations for the phonemes and words that became established in the oral culture throughout early human history) were also a decisive factor for the development of other higher cognitive functions which ended up achieving a uniquely sophisticated status in humans, such as semantic memory, abstraction, future anticipation and planning, and mathematical reasoning (Aboitiz et al. 2006).

In summary, the specific pattern observed in the evolutionary development of speech highlighted in the target article (which involves one breakthrough change leading to the percolation of the whole system and the emergence of new unpredictable attributes) represents a typical feature of complex adaptive systems. In fact, in such types of self-adapting and dynamical systems, it has already been demonstrated that a significant improvement in the signaling flow among agents (such as that proportioned by the development of speech and language) constitutes one of the most powerful triggering events for the emergence of new complex behaviors, very often leading to a complete reformulation


of the boundaries and hierarchical structures within the system (Holland 2012). As an academic masterpiece on the issue, Ackermann and colleagues' comprehensive phylogenetic analysis of the brain structures responsible for speech in humans and nonhuman primates provides further evidence of the essential role that speech and language, as breakthrough signaling resources, have played in the evolutionary development of human cognition viewed from the standpoint of complex adaptive system analysis.

Voluntary and involuntary processes affect the production of verbal and non-verbal signals by the human voice

doi:10.1017/S0140525X13004123

Carolyn McGettigan (a) and Sophie Kerttu Scott (b)
(a) Department of Psychology, Royal Holloway, University of London, Egham TW20 0EX, United Kingdom; (b) Institute of Cognitive Neuroscience, University College London, London WC1N 3AR, United Kingdom.
[email protected]
[email protected]
www.carolynmcgettigan.com
https://sites.google.com/site/speechskscott/

Abstract: We argue that a comprehensive model of human vocal behaviour must address both voluntary and involuntary aspects of articulate speech and non-verbal vocalizations. Within this, plasticity of vocal output should be acknowledged and explained as part of the mature speech production system.

In their account of the neural systems supporting vocal expression in humans, Ackermann et al. suggest that emotional and "attitudinal" aspects of prosody might influence the execution of speech via cross-talk between basal ganglia loops processing emotion, motivation, and speech motor programmes. It is problematic to claim one system for spoken language, plus one or more others for paralinguistic or non-linguistic signals that are then added together to make a finished product of fluent, emotionally inflected speech.

The division between lateral motor cortex and other systems in the production of human vocal signals is not a simple one, and might be better characterized by the degree of voluntary control over the vocal tract rather than according to the type of signals generated. For example, patients who have sustained lateral cortical injuries disrupting the voluntary production of speech can still produce spontaneous and natural-sounding laughter and crying, and swearing (Van Lancker & Cummings 1999). Thus, articulate speech – swear words – can be produced involuntarily. Similarly, non-verbal emotional vocalizations can be produced under voluntary control – social laughter is typically timed to occur at the end of linguistic phrases, during both speaking and signing (Provine & Emmorey 2006). Recent work using functional MRI to explore the neural underpinnings of laughter showed a considerable involvement of lateral sensorimotor systems in the production of laughter under varying amounts of voluntary control (Wattendorf et al. 2013).

In everyday spoken language, voluntary modulation of the way we speak plays an essential role in the intentional expression of mood, intentions and aspirations. Hawkins and Smith (2001) illustrate this with the English phrase, "I do not know," the pragmatic sense of which can vary dramatically depending on how the words are articulated (compare the casual manner of "I dunno" with the suggestion of irritation in "I… do… not… know!"). We recently investigated the neural correlates of voluntary modulations of spoken language by asking participants in an MRI scanner to perform spoken impressions of accents and impersonations of familiar individuals (McGettigan et al. 2013). The peak activations associated with deliberate changes to speaking style (compared with speaking in a "normal voice") were found in the left anterior insula and inferior frontal gyrus. These areas are classically associated with the production of spoken language (Blank et al. 2002; Dronkers 1996), yet in this case the linguistic content of the

Figure 1 (McGettigan & Scott). On live radio, Presenter 1 is amused by the Reporter’s pronunciation of “Jack Toit.” Although she manages to deliver her script, the pitch (F0) of her voice rises sharply as her emotional state constricts the vocal tract and renders her less able to control the source of the vocal signal (Ruch & Ekman 2001).


utterances was kept constant across the different conditions of the experiment. It is difficult to assert that these voluntary aspects of speech production should, or could, be added to speech separately from the “digital” information bound up in the phonemes, syllables, and words of a language. Our recent results suggest that this kind of flexibility is an integral part of the planning and control of speech and voluntary vocal behaviour.

Not all vocal modulations can be added to speech in a controlled manner. Ackermann and colleagues argue that linguistic and emotional prosodic information, which they see as digital and analogue, respectively, are coordinated in the basal ganglia, as “Otherwise these two inputs would distort and corrupt each other” (target article, sect. 1.2, para. 2). It is reductive to draw boundaries between linguistic and paralinguistic aspects of vocal behaviour, particularly when considering the role of linguistic prosody in disambiguation (e.g., the contrast between a question and a statement). Furthermore, it is certainly the case that emotional states do corrupt articulate speech, as is shown when a person tries to produce speech during a fit of laughter, when overcome with grief, or when feeling extremely nervous – here, the voluntary control of vocalization is compromised, and articulate speech is taken over by the physiological effects of emotion on the functions of the vocal tract; see our Figure 1 (cf. Levenson 2003).

Ackermann et al. claim that the basal ganglia might be essential for the acquisition of articulate speech during early childhood, while the behaviours of the mature speech production system are controlled by perisylvian cortical structures. There is evidence that the plasticity of vocal learning reduces in adolescence and adulthood, for example, the marked persistence of first-language pronunciation in adult learners of a second language (Flege et al. 1999a; 1999b). However, speech can change in adulthood – one study showed that vowels in the speech of Queen Elizabeth II have, over several decades, gradually moved closer to the standard British English spoken by her subjects (Harrington et al. 2000). Similarly, there is extensive evidence for the recovery of speech in the adult system after stroke (Blank et al. 2003). It is difficult to estimate the extent to which these gradual changes in speech come about under conscious voluntary control. We continue to learn new information at all levels of the linguistic hierarchy throughout the lifespan, and the extent to which an individual changes their speech, voluntarily or not, can vary over both long and short timescales. With reference to the authors’ proposal, we therefore pose the question: How do relearned and remapped behaviours in the adult speech production system fit within a model where the contributions of the basal ganglia end after childhood language acquisition?

We are encouraged by an approach to modelling human vocal behaviour that incorporates its social, emotional, and linguistic aspects. However, we urge caution in attempts to divide the speech signal into distinct types of information served by specific underlying functional subsystems. We argue that vocal behaviour is better characterized in terms of voluntary versus involuntary control of a complex motor act, regardless of its informational content. Further, given the evidence that vocal behaviour remains plastic and flexible into adulthood, we question the extent to which this plasticity need be mechanistically distinct from childhood language acquisition.

Why vocal production of atypical sounds in apes and its cerebral correlates have a lot to say about the origin of language

doi:10.1017/S0140525X13004135

Adrien Meguerditchian (a), Jared P. Taglialatela (b, c), David A. Leavens (d), and William D. Hopkins (c, e)
(a) Laboratory of Cognitive Psychology, UMR7290, Aix-Marseille University – CNRS, Brain and Language Research Institute, 13331 Marseille, France; (b) Department of Biology and Physics, Kennesaw State University, Kennesaw, GA 30144; (c) Division of Developmental and Cognitive Neuroscience, Yerkes National Primate Research Center, Atlanta, GA 30329; (d) School of Psychology, University of Sussex, Falmer BN1 9QH, England, United Kingdom; (e) Neuroscience Institute and Language Research Center, Georgia State University, Atlanta, GA 30302.
[email protected]
http://gsite.univ-provence.fr/gsite/document.php?pagendx=12531&project=lpc
[email protected]
http://science.kennesaw.edu/∼jtaglial/Taglialatela_-_Ape_Communication_Lab/Home.html
[email protected]
http://www.sussex.ac.uk/profiles/114996
[email protected]
http://neuroscience.gsu.edu/profile/william-hopkins/

Abstract: Ackermann et al. mention the “acquisition of species-atypical sounds” in apes without any discussion. In our commentary, we demonstrate that these atypical sounds in chimpanzees not only include laryngeal sounds, but also have a major significance regarding the origins of language, if we consider looking at their context of use, their social properties, their relations with gestures, their lateralization, and their neurofunctional correlates as well.

Whether apes are able to voluntarily and intentionally control their vocal production remains a topic of intense debate (e.g., Hopkins et al. 2011). In a brief paragraph in their target article (sect. 2.1.4.), Ackermann et al. mention the “observational acquisition of species-atypical sounds” in apes and acknowledge that chimpanzees are able to produce voluntary sounds using the modulation of the air through the lips (“blowing raspberries” or “kiss”). However, the authors also claimed that apes are not able to “engage laryngeal sound-production mechanisms” that can be “decoupled volitionally from species-typical audiovisual displays.” In fact, this latter claim is not accurate.

Hopkins et al. (2007) have indeed described the use of two atypical novel “learned” sounds produced by several chimpanzees among the captive groups from the Yerkes Primate Research Center: Some chimpanzees are not only able to produce non-voiced “raspberries” or “kiss” sounds (involving only the lips with the air of the mouth) but also “extended grunts,” which clearly engage the vocal tract and laryngeal sound-production mechanisms. Hopkins and colleagues showed that these atypical sounds and vocalizations are often produced with pointing gestures and are used exclusively in the presence of both a human and out-of-reach food in order to beg for food, while typical species-specific “food calls” were more frequent in the presence of food alone (Hopkins et al. 2007). Such atypical productions were interpreted as signals used intentionally to capture the attention of the human. Indeed, great apes have been shown to use those acoustic signals – vocal and lip sounds, cage banging or clapping gestures – especially when the recipient is not attentive, whereas visual pointing gestures are preferentially used when the recipient is attentive (e.g., Leavens et al. 2004; 2010; see also in orangutans: Cartmill & Byrne 2007; for a review of the literature, see Hopkins et al. 2011). In other words, the multimodal flexibility of communicative signaling (sounds, vocalizations, and gestures) is a manifestation of the ability of the great apes to adjust the modality of the signal to the attentional state of the recipient, and such an intentional property might be thus a special feature of social cognition that is needed in language processing.

In addition, given the inter-individual variability among chimpanzees concerning the ability to produce or not those novel sounds, it has been interpreted that, as for human speech but in contrast to species-typical vocalizations, those atypical vocal and lip sounds might be socially learned. In fact, it has been reported that chimpanzees raised by biological mothers who were able to produce those sounds were more likely to also be able to do so than chimpanzees raised by humans in a nursery (Taglialatela et al. 2012). Moreover, among the chimpanzees that were not able to produce these atypical vocalizations, a recent study not


only showed that (i) it was possible to explicitly train them to do so using operant conditioning, but also (ii) that those subjects would further use these novel vocalizations in a communicative context for getting the attention of a human (Russell et al. 2013).

Finally, the investigation of lateralization of those atypical sounds and its functional cerebral correlates shows some continuity with the language system. Indeed, most of the language functions involve a left-hemispheric dominance (Knecht et al. 2000). Interestingly, it turns out that these chimpanzee auditory signals, when produced simultaneously with food-begging pointing gestures, induce a stronger right-hand preference than when the gesture is produced alone (Hopkins & Cantero 2003), indicating that the left hemisphere may be more activated when producing both gestures and these atypical vocal and lip sounds simultaneously. Moreover, measures of orofacial asymmetries for vocal production in chimpanzees have shown that species-typical vocalizations – such as food barks or pant-hoot – elicited a left-sided orofacial asymmetry (i.e., right-hemispheric dominance), whereas atypical attention-getting sounds elicited an asymmetry toward the right side of the mouth, indicating that, as for right-handedness for communicative clapping gestures (Meguerditchian et al. 2012), a left-hemispheric dominance might be involved for producing those acoustical signals (Losin et al. 2008). More impressively, brain imaging studies (PET [positron emission tomography]) conducted in three captive individuals have found that communicative signaling for begging food from a human by using either gestures, atypical attention-getting sounds, or both of these modalities simultaneously, activated a homologous region of Broca’s area (IFG) predominantly in the left hemisphere (Taglialatela et al. 2008), a pattern of activation which is enhanced in subjects who used both gestural and vocal signals simultaneously (Taglialatela et al. 2011).

These collective findings support the idea that the atypical orofacial and vocal sounds in chimpanzees are a good illustration of the potential existence of a multimodal intentional system that integrates gestures, orofacial, and atypical vocal sounds into the same lateralized system. This multimodal communicative system not only shares some features of social cognition and social learning with human language, but also seems to be ultimately related to brain specialization for language (Meguerditchian et al. 2011). This theory is consistent with the evidence that in humans, a single integrated communication system in the left cerebral hemisphere might be in charge of both vocal and gestural linguistic communication (e.g., Gentilucci & Dalla Volta 2008). For all of these reasons, and their implications for the precursors of human language and its brain specialization, we believe that Ackermann et al. should better consider these voluntary laryngeal sound-production mechanisms in chimpanzees and the related multimodal communicative system, in their theoretical model.

ACKNOWLEDGMENT
This research was supported by a grant from Agence Nationale de la Recherche ANR-12-PDOC-0014-01 (LangPrimate).

Speech, vocal production learning, and the comparative method

doi:10.1017/S0140525X13004147

Bjorn Merker
Fjälkestadsv. 410-82, SE-29194, Kristianstad, Sweden.
[email protected]

Abstract: The faith that “comparative analysis of the behaviour of modern primates, in conjunction with an accurate phylogenetic tree of relatedness, has the power to chart the early history of human cognitive evolution” (Byrne 2000 p. 543) runs afoul of the fact that no other primate besides humans is capable of vocal production learning. This basic enabling adaptation for articulate speech bears crucially on the reconstruction of language origins.

In their target article Ackermann et al. make a valiant attempt to assemble a comprehensive account of the origin and neural organization of human speech on the basis of arguments confined by and large to comparative . The nature of their topic is ill-suited to such an approach, because at its core lies a behavioral adaptation and corresponding neural mechanism which we share with some species of cetaceans, pinnipeds, and birds, but not with any nonhuman primate. For such a situation, the comparative method offers analogy instead of homology as guiding concept (e.g., the elucidation of body form in cetaceans is better served by turning to distant fishes rather than to far closer relatives among extant mammals).

The capacity in question is the ability to learn to reproduce, by voice, patterns of sound first received by ear. This capacity is of singular biological uselessness except in special cases, one of which happens to be us humans, because every word and phrase we know how to pronounce has become ours by such means. Technically, the capacity is known as vocal production learning (Janik & Slater 1997; 2000), and though the concept does occur in the target article, it is more by way of an afterthought than as a principal pivot of analysis.

Putting vocal production learning at center stage removes the mystery of the “speechlessness” of even our closest primate relatives rightly emphasized by Ackermann and colleagues. Lacking the vocal learning mechanism (Janik & Slater 1997), they naturally cannot do that which inherently is dependent upon it, namely, learn to pronounce words and phrases of rather arbitrary phonemic composition. That vocal learning is, in fact, the crux of the matter is demonstrated by the ease with which numerous species of parrots and other mimics among the birds do what no chimpanzee has ever done: acquire a substantial repertoire of human words and phrases pronounced with a fidelity that fools the human ear (Nottebohm 1976).

The diction of bird mimics tells us that the entire pronunciatory part of the speech equation is a matter of being a vocal learner. Step 1 on the path to speech is accordingly to come into possession of the capacity for vocal learning. This first step, moreover, provides a plausible evolutionary context for the first step invoked by Ackermann et al., namely, the addition of direct (monosynaptic) cortical efference to lower brainstem motor nuclei controlling larynx, pharynx, tongue, and lips.

The species distribution of such direct connections (to which can be added direct cortical innervation of the nucleus retroambiguus for respiratory control) suggests that they evolve specifically for cerebral fine control of respiration and vocalization and not (as the target article assumes) as a general concomitant of brain expansion (Arriaga & Jarvis 2013; Fitch et al. 2010; Iwatsubo et al. 1990; Jürgens 2002a; Kuypers 1958a; 1958b; Merker 2009; Okanoya & Merker 2007; Okanoya et al. 2007; Wild 1993; 1997). As suggested in a previous BBS commentary (Merker 2009), it is even conceivable that the “simple” addition, in ancestral Homo, of a direct primary motor cortex efference to those medullary motor nuclei sufficed to recruit the already present cerebral territories centered on Wernicke’s and Broca’s areas (see Fig. 12.4 of Falk [2007] for putative homologs in Pan and Macaca; see also Neubert et al. 2014) to the practice-based acquisition of complex vocal output matching auditory models, thus making our ancestors vocal learners.

The most common use of vocal production learning in nature is as a means to impress potential mates and rivals by mastery of a complex song tradition (for the evolutionary logic, see Merker [2012] and review by Spencer & MacDougall-Shackleton [2011]). Humans are a singing species (von Humboldt 1836/1971), so the default assumption would be that the vocal learning capacity of our ancestors was exercised for similar purposes. If so, they were maintaining traditions of intergenerationally transmitted and culturally learned vocal lore (song) long before that lore became verbal by being semanticized.


Such a situation brings the “learner bottleneck” principle of iterated (transgenerational) learning into play (Kirby 2002). In the non-human examples the narrow focus of song displays on potential mates and rivals limits the operation of that principle to effecting progressive refinement of the formal properties of songstrings (their purely formal syntax; see Kirby et al. [2008] and references therein). Should, however, circumstances spread singing to the full range of daily and seasonal activities, the same principle would ensure a gradual and progressive differentiation of song repertoire by behavioral context, amounting to an implicit assortative contextual semanticization of the songstrings repertoire as a whole (Merker 2012; Merker & Okanoya 2007).

Therein lies the point of departure for a gapless path to human language, details of which are presented in a recent publication of mine (Merker 2012). At the point at which that path is about to arrive at fully instrumental language, it reaches an impasse. Attempts to use semanticized songstrings in a “displacement” mode of reference (Hockett 1960) would undermine the very contextual basis underpinning the semantic meaning of strings. To overcome that hurdle, I have postulated a naturally (as opposed to sexually) selected enhancement (expansion) of the cerebral storage capacity hosting the vocal learning mechanism, a capacity increase driven by individual self-interest in reaping the benefits of instrumental uses of songstring semantics.

Such an expanded storage capacity would allow string (and gestural) markers for communicative intent to be appended to the songstring repertoire by learning, launching it upon fully instrumental language use. Perhaps the FOXP2 enhancement of cortico-basal ganglia function in the human line provided the required extra storage capacity. As far as is known, it affects relevant neurons at the microscopic and functional level, promoting lengthening of dendrites (potentially increasing synapse numbers) as well as affecting synaptic plasticity by enhancing long term depression (Reimers-Kipping et al. 2011). Since such changes are general for the basal ganglia as a whole (along with associated thalamic and cortical domains), they fit better with a general expansion of storage capacity than with a remodeling of lateral interactions among circuits serving components of articulate speech assumed in the target article.

Phonation takes precedence over articulation in development as well as evolution of language

doi:10.1017/S0140525X13004159

D. Kimbrough Oller (a, b, c)
(a) School of Communication Sciences and Disorders, The University of Memphis, Memphis, TN 38105; (b) Institute for Intelligent Systems, The University of Memphis, Memphis, TN 38152; (c) Konrad Lorenz Institute for Evolution and Cognition Research, A-3400 Klosterneuburg, Austria.
[email protected]
http://umwa.memphis.edu/fcv/viewprofile.php?uuid=koller

Abstract: Early human vocal development is characterized first by emerging control of phonation and later by prosodic and supraglottal articulation. The target article has missed the opportunity to use these facts in the characterization of evolution in language-specific brain mechanisms. Phonation appears to be the initial human-specific brain change for language, and it was presumably a key target of selection in early hominin evolution.

The Ackermann et al. target article offers intriguing suggestions about a dual-pathway approach to evolution of vocal capabilities and language. But the approach could be enhanced with regard to its behavioral assumptions by taking into account key information on vocal development. Without this information, the article misses the opportunity to augment some of its most interesting claims with evidence that could help situate the model in a more evolutionary-developmental (evo-devo) frame (Bertossa 2011).

In particular, the article does not take note of the evidence that early vocal development of humans (especially in the first 3 months) shows emergent phonatory control, rather than emergent articulatory control (Buder et al. 2008; Koopmans-van Beinum & van der Stelt 1986; Oller 1980). Phonatory control does not necessarily imply “prosodic” control (the focus of much of the target article) in the sense that the term prosody is used in literature about mature languages. Prosody is normally meant to denote the capacity to integrate suprasegmental variations across multisyllabic strings, and the human infant in the first months produces no such syllabic strings. Instead, categories of infant vocalization that are recognized in the first few months by parents and laboratory staff cross-culturally include “protophones” (Oller 2000), such as vowel-like “vocants” (sounds produced in the mid-pitch range of the individual infant, with “normal” phonation, the kind that typically occurs in speech), squeals (high-pitched sounds for the infant in question, often in falsetto), and growls (low-pitched or raucous sounds often in creaky voice). These typical protophones occur as phonatory events with little or no articulatory modulation, and to the limited extent that supraglottal modulation occurs, it appears to be disorganized and unpredictable at this stage. On the other hand, the protophones are easily recognized as distinguishable categories because they tend to occur in clusters of the same types (a series of squeals, for example, followed later by a series of vocants) even as early as 3 months (Kwon et al. 2007). The high rate of production of the protophones, along with the fact that they occur both in solitary and social circumstances (Locke 1993; Stark 1980; Yale et al. 1999), suggests endogenous motivation in the infant to explore and seemingly to practice vocalization, as well as to use vocalization to serve social functions. In addition, all the protophones are used by infants in expression of positive, neutral, and negative affective states (as indicated by facial affect), and these expressions are predictably related to responses of caregivers ranging from encouraging interaction in response to positive expressions, to changing the situation (or talking about the need for it) in response to negative expressions (Oller et al. 2013). All these properties of very early vocal development (spontaneous production, the ability to repeat sounds in clusters, vocal social interaction, and the ability to use sounds to express differing emotional states on differing occasions) in the human infant at 3 months are based on phonatory control, and all of them are foundational for language, since every aspect of human language requires flexible control of phonation.

Phonatory control takes naturally logical precedence over supraglottal control in the sequence of development, and that logical precedence is reflected in the facts of development. Phonatory categories appear in development before systematic supraglottal articulated categories such as “canonical syllables,” wherein well-formed syllables (heard as, for example, “dada” or “baba”) are produced through coordination of phonation and systematically repeatable supraglottal articulations (Koopmans-van Beinum & van der Stelt 1986; Oller 1980; Stark 1980). In 40 years of longitudinal research in human infant vocalization, I have never witnessed any infant developing systematic and communicative supraglottal movements and using them in practice-like play or social communication, prior to developing phonatory categories as described above.

These facts offer an opportunity to Ackermann et al. to more thoroughly incorporate developmental patterns into their expectations regarding the “monosynaptic refinement of the projections of motor cortex to the brainstem nuclei that steer laryngeal muscles.” Such refinement, it would appear, must begin to be manifest in the human infant’s brain by 3 months or earlier, although by that age, the behavioral data do not suggest “prosodic” control, but something simpler – control over gross differences in phonatory pattern and pitch. The developmental picture appears


to be sufficiently clear to help illuminate differences between human and nonhuman primate brain organization at maturity, by suggesting notable cross-species similarity of phonatory and articulatory control capabilities across nonhuman primates and humans early in life with much larger differences developing as time passes.

In the near future, it may also be possible for the modeling of Ackermann et al. to be enhanced by quantified direct comparisons among vocal capabilities of humans and nonhuman primates. On the one hand, studies in human infants are rapidly tying down facts regarding vocal rate (volubility) for protophones across ages and across social and non-social circumstances (Franklin et al. 2014; Goldstein et al. 2009; Nathani et al. 2001) as well as substantially improving our understanding of vocal types and their flexibility in human infants (Griebel & Oller 2008; Scheiner et al. 2006; Stark et al. 1993). Similarly, considerable progress is being made on the description of both the amount of vocalization that occurs across age in nonhuman primates and the degree to which these vocalizations are used in differing contexts, the latter representing an attempt to characterize the degree to which the social functions of nonhuman calls may indeed show flexibility (Crockford & Boesch 2003; Laporte & Zuberbühler 2010). With the recent development of a facial affect coding system for chimpanzees (Parr et al. 2008) modeled on the Ekman scheme for human affect (Ekman & Friesen 1978), quantitative comparison of functional flexibility in vocalization across humans and chimpanzees should soon be reached. Such improvements in our quantitative understanding of development, amplified by cross-species comparisons, should fundamentally enhance the modeling of the evolution of language and the brain mechanisms that underlie it.

ACKNOWLEDGMENTS
This work was supported by the National Institutes of Deafness and Other Communication Disorders (DC011027, D. K. Oller), by the Konrad Lorenz Institute for Evolution and Cognition Research, and by the Plough Foundation.

The basal ganglia within a cognitive system in birds and mammals

doi:10.1017/S0140525X13004160

Christopher I. Petkov (a, b) and Erich D. Jarvis (c)
(a) Institute of Neuroscience, Newcastle University, Medical School, Newcastle upon Tyne NE2 4HH, United Kingdom; (b) Centre for Behaviour and Evolution, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom; (c) Howard Hughes Medical Institute and Department of Neurobiology, Duke University, Durham, NC 27710.
[email protected] [email protected]
http://www.staff.ncl.ac.uk/chris.petkov
http://www.jarvislab.net/

Abstract: The primate basal ganglia are fundamental to Ackermann et al.’s proposal. However, primates and rodents are models for human cognitive functions involving basal ganglia circuits, and links between striatal function and vocal communication come from songbirds. We suggest that the proposal is better integrated in cognitive and/or motor theories on spoken language origins and with more analogous nonhuman animal models.

In the target article, Ackermann et al. present an interesting twist on the well-weathered hypothesis of a direct cortico-bulbar tract as a key step in the evolution of spoken language in humans, or song in vocal-learning birds. The authors seek to generate a new hypothesis that the basal ganglia, in particular, are functionally reorganized during human evolution for spoken language and also change in function during ontogeny with the learning of speech. Curiously, however, the basal ganglia, after supporting a language-learning role during child development, are proposed to revert to a seemingly more evolutionarily conserved functional role of supporting “emotive-prosodic” modulation in adult humans. This illustrates how the proposal flexes to encompass most data and risks being empirically untestable. Especially unclear is what similarities or differences are hypothesized to exist between humans and different animal models, where presumably homologous or analogous neurobiological mechanisms can be clarified.

Although we have little doubt that the basal ganglia were an evolutionary substrate for spoken language, one among many others, the current proposal requires considerable strengthening. We make two key suggestions. First, the hypothesis needs to be grounded in, or its key tenets distinguished from, certain cognitive and/or motor theories. Such theories have proposed that specific improvements occurred in vocal-learning systems or motor pathways of humans and some birds, including cortico-striatal-thalamic circuits (Arriaga & Jarvis 2013; Feenders et al. 2008; Fitch et al. 2010; Fitch & Jarvis 2012; Petkov & Jarvis 2012; Wild 1997). Second, we propose that the key tenets of the proposal, if clarified, can be comparatively tested in studies between, for instance, human and nonhuman primates, and songbirds and vocal non-learning birds, and any of these species and rodents (see our Figure 1). Such comparative analyses have already been used in the past to test for the hypothesized differences in the cortico-striatal system between some of these species, and can still be used to comparatively test additional aspects of the current proposal.

One issue is whether and which basal ganglia–dependent differences exist between humans and other nonhuman primates or mammals. There is little direct comparative evidence in the primate literature to suggest that the cortico-striatal-thalamic system is strikingly different in humans relative to nonhuman primates. In fact, as Ackermann et al. note, nonhuman primates and rodents are used as cellular model systems for human basal ganglia–related cognitive function on motor and procedural learning, habit forming, reward and decision-making, and sensory-motor timing relationships (Matell & Meck 2004; Schultz et al. 2000). Presumably, the proposal is that the basal ganglia, as part of a cognitive system, increased in capacity in humans to support language learning (Friederici 2011; Petkov & Jarvis 2012; Petkov & Wilson 2012). In this regard, it is possibly interesting that Artificial Grammar learning tasks, which were developed in the infant learning literature and that tap into rule-based procedural learning, appear to show differences between different species of monkeys (Wilson et al. 2013) and between monkeys and humans (Fitch & Hauser 2004). These observations were predicted by cognitive theories on spoken language origins (Arriaga & Jarvis 2013; Petkov & Jarvis 2012).

Thus, the proposal lacks the strength of the specificity of the direct cortico-bulbar hypothesis, and at the same time suffers from the limitation of overemphasis on a region vital for cognition, whose function is lost without the context of the cortico-striatal-thalamic circuits that are formed in the brains of birds and mammals. As a historical example, the direct cortico-bulbar hypothesis is now seen to be grounded in motor theories of spoken language origins (Petkov & Jarvis 2012). It is very specific that a monosynaptic change allowed learned sensory patterns to be vocally produced. But its strength in specificity was also its Achilles heel, leaving unanswered how humans and other mammals differ in their neurobiological substrates for learned auditory patterns, and which are linked to vocal motor output (via the nucleus ambiguus). Cognitive theories and the current proposal aim to address this shortcoming. Moreover, even the tenet of a presence versus absence of a direct cortico-bulbar tract is being challenged by recent data: Mice appear to have a sparse but still present direct cortico-bulbar projection to the nucleus ambiguus and greater vocal-production-plasticity capabilities than had been thought (Arriaga & Jarvis 2013; Arriaga et al. 2012), features that had been thought to be unique to humans and vocal-learning birds.


Figure 1 (Petkov & Jarvis). Summary diagrams of vocal systems in songbirds, humans, monkeys, and mice. Modified from Arriaga and Jarvis (2013). Cortico-striatal-thalamic loops are schematized from data in humans and songbirds. Yellow dashed lines in macaque monkeys and mice show proposed cortico-striatal-thalamic connections for vocalization that need to be tested.

Notably, the more precise link that the authors are pursuing with regard to the origins of spoken language and basal ganglia function, already has an evolutionary counterpart in vocal-learning and vocal-non-learning birds. The avian striatal vocal nucleus (called Area X in songbirds) sits within a cortico-striatal-thalamic loop, which is important for song learning (Jarvis 2004b; 2006; Jarvis et al. 2000), including covert-skill song learning (Charlesworth et al. 2012). Moreover, Feenders et al. (2008), by comparing the anterior-forebrain pathway in vocal-learning birds to this pathway in vocal-non-learning birds, found evidence to develop a motor theory of vocal-learning origin. This theory proposes that the anterior-forebrain song pathway (including Area X) independently arose multiple times in vocal-learning birds from a set of regions that in vocal-non-learning birds control non-vocal motor actions. The discrete striatal Area X that sits within the cortico-striatal-thalamic vocal-learning loop (Fig. 1) is not present in vocal-non-learning birds. Motor striatal regions outside of Area X, or the comparable forebrain regions in vocal-non-learning birds, are more diffuse and relate to these animals’ non-vocal motor learning abilities. Thus, considerable insights on the cortico-striatal-thalamic system have already been provided by avian models. These are only briefly alluded to but not meaningfully used to inform the current proposal.

In summary, Ackermann et al.’s proposal is an interesting review of the literature with an emphasis on the basal ganglia as an evolutionary substrate for spoken language. However, we found it heavy on conjecture and light on empirical hypotheses, which, as we have suggested, can be strengthened by (1) taking a broader evolutionary perspective that allows integrating data from birds and mammals, and (2) delineating more carefully how the current proposal can be integrated within or distinguished from other theories on spoken language origins.

The sensorimotor and social sides of the architecture of speech

doi:10.1017/S0140525X13004172

Giovanni Pezzulo,^a Laura Barca,^a and Alessandro D’Ausilio^b

^a Institute of Cognitive Sciences and , National Research Council, 00185 Rome, Italy; ^b Robotics, Brain and Cognitive Sciences Department, Italian Institute of Technology, 16163 Genova, Italy.

[email protected] [email protected] [email protected]
https://sites.google.com/site/giovannipezzulo/
https://sites.google.com/site/laurabarcahomepage/
http://www.iit.it/people/robotics-brain-and-cognitive-sciences-mirror-neurons-and-interaction-lab/researcher/alessandro-dausilio.html

Abstract: Speech is a complex skill to master. In addition to sophisticated phono-articulatory abilities, speech acquisition requires neuronal systems configured for vocal learning, with adaptable sensorimotor maps that couple heard speech sounds with motor programs for speech production; imitation and self-imitation mechanisms that can train the sensorimotor maps to reproduce heard speech sounds; and a “pedagogical” learning environment that supports tutor learning.


Besides sophisticated phono-articulatory abilities, the architecture of speech has key computational, neuronal, and social prerequisites that can shed light on its phylogenetic and ontogenetic origins.

As a first important requirement, the architecture of speech has to be configured for vocal learning, with adaptable sensorimotor circuits that couple heard speech sounds with motor programs for speech production. From a computational perspective, mastering speech in naturalistic environments plagued by uncertainty and noise is hard; this fact has long motivated control-theoretic views of speech emphasizing error-correction mechanisms and internal modeling (Guenther & Perkell 2004; Moore 2007). Computational considerations also suggest that speech processing (and learning, see below) might benefit from a close interaction of perception and production systems. For example, production systems might support perceptual processes by predicting and “synthesizing” auditory candidates (as in analysis by synthesis), while perceptual systems might support the self-monitoring and error-correction of vocal production by affording an advance auditory analysis of the produced speech sounds. Neurobiological experiments support this idea by showing that the neuronal mechanisms for speech production and perception are not segregated in the brain; for example, specific motor circuits are recruited for the analysis of speech sound features (D’Ausilio et al. 2012). An organic proposal on the architecture of speech can be formulated within the framework of generative systems, in which perception and action systems share computational (and neuronal) resources and are both guided by a common prediction-error minimization process (Dindo et al. 2011; Friston 2010; Kiebel et al. 2008; Pezzulo 2012a; 2013; Yildiz et al. 2013).

A second important requirement is a learning method powerful enough to train the aforementioned sensorimotor architecture to perceive and (re)produce sounds and speech. This problem has been studied particularly in songbirds that, while not speaking, have sophisticated vocal learning abilities. Most theories assume that songbird learning is a staged process (Brainard & Doupe 2002). An initial period of auditory learning is needed to tune to represent sensory “prototypes” of heard speech sounds (e.g., memorize learned song patterns heard by conspecifics). These prototypes are then used as “reference signals” for imitation learning; by learning to reproduce the stored template, an animal can acquire equivalent vocal sound production skills. In control-theoretic terms, this process uses (auditory and articulatory) feedback error-correction mechanisms to produce a sound (song or speech) that closely matches the stored template (Guenther & Perkell 2004). During the learning process, internal (inverse and forward) models are trained, too, that successively afford skilled song or speech processing.

To speed up learning, learners benefit from using self-imitation, too. Covert rather than overt singing (or speaking) might reproduce frequently heard speech sounds in the same way they are encoded in their sensory maps (note that generative architectures afford this form of learning quite naturally; Hinton 2007). Using both overt and covert processes, animals (including humans) might reproduce their stored prototypes with high fidelity, including the local accents of their communities.

The brain architecture supporting the aforementioned learning processes is incompletely known. Indeed, speech is a computationally challenging skill as it requires sensorimotor circuits to be sensitive enough to discriminate subtle changes in speech sounds, and accurate enough to afford extremely precise control (e.g., of the timing of speech). The brain could finesse these problems by recruiting cortico-subcortical loops (especially those involving the basal ganglia and the cerebellum) especially during learning. The role of these loops is seldom recognized in “cortico-centric” theories of motor skills (including speech), but the evidence indicates that they could play an important role in skill learning and mastery (Ackermann 2008; Caligiore et al. 2013). For example, vocal learning in the swamp sparrow might involve a loop between forebrain neurons that establish auditory-vocal correspondences and striatal structures important for song learning (Prather et al. 2008).

The high-fidelity reproduction of sounds could be key to cultural transmission and the evolutionary value of singing in songbirds (Merker 2012). However, human communities have richer social structures than other animals, which might have favored an open-ended instrumental use of vocal production besides ritualized display. The importance of this skill might have led to a greater investment of parental time in teaching and, we propose, to advanced forms of “tutor learning” (Canevari et al. 2013). Of note, a so-called pedagogical learning environment (Csibra & Gergely 2011) might have afforded specialized teaching strategies that could be uniquely human and that greatly improve on imitation and self-teaching learning methods. One example is “motherese”: Mothers modify their speech when speaking to young children in order to simplify their auditory processing and learning (see Pezzulo et al. 2013). This example suggests that social and interactive aspects of the learning environment are important prerequisites – or at least a useful scaffold – for speech acquisition and cultural transmission.

In sum, speech processing requires a sophisticated neuro-computational architecture in which physiologic, motoric, sensory, and social aspects mutually constrain each other and plausibly co-evolve. In addition to studying genetic determinants, it is important to recognize that speech could have found a suitable “neuronal niche” (Dehaene & Cohen 2007) in existing brain structures (cortical and subcortical) supporting skilled action. For example, speech could have re-used “generative” dynamics of such structures for imitation and self-imitation, and re-deployed existing computational resources for combinatorial processing (Chersi et al. 2014; Fadiga et al. 2009).

In parallel, speech could have found a suitable “socio-cultural niche”: It could have been incubated within the sophisticated interactive and social dynamics of our species. The social context in which human speech is acquired is extremely rich, and human speech learning operates on top of the sophisticated interactive, joint action, mutual emulation, and pedagogical abilities, most of which are unique or at least much more developed in our species (Pickering & Garrod 2013; Sebanz et al. 2006). The demands of sophisticated social interactions might have contributed to transform vocalization from an initially quite limited sensorimotor feat to a powerful, open-ended instrumental tool that permits conveying rich communicative intentions and forming extremely varied cultures (Pezzulo 2012b). In turn, we should not neglect how the intertwined sensorimotor and social sides of speech had a transformative impact on the destiny of our species.

Vocal learning, prosody, and basal ganglia: Don’t underestimate their complexity^1

doi:10.1017/S0140525X13004184

Andrea Ravignani,^a Mauricio Martins,^a,b and W. Tecumseh Fitch^a

^a Department of Cognitive Biology, University of Vienna, A-1090 Vienna, Austria; ^b Language Research Laboratory, Lisbon Faculty of Medicine, 1649-028 Lisbon, Portugal.

[email protected] [email protected] tecumseh.fi[email protected]
http://homepage.univie.ac.at/andrea.ravignani/
www.researchgate.net/profile/Mauricio_Martins4/
http://homepage.univie.ac.at/tecumseh.fitch/

Abstract: Ackermann et al.’s arguments in the target article need sharpening and rethinking at both mechanistic and evolutionary levels. First, the authors’ evolutionary arguments are inconsistent with recent evidence concerning nonhuman animal rhythmic abilities. Second, prosodic intonation conveys much more complex linguistic information


than mere emotional expression. Finally, human adults’ basal ganglia have a considerably wider role in speech modulation than Ackermann et al. surmise.

While Ackermann et al.’s theory is interesting, seems plausible, and may initially appear tempting, it is based on incomplete readings of several literatures. First, it is unclear why some of their arguments should only apply to the specific instances of rhythmic and prosodic control the authors discuss or why they fail to apply in other animal species. Their model assumes that enhancement of in-group cooperation and cohesion was the main driving force for the evolution of speech via the intermediate step where vocal control and rhythm production would serve as chorusing and bonding tools. A key assumption is that speech would produce rhythmic abilities as an evolutionary by-product. This scenario is in line with some empirical observations (for reviews, see Fitch 2012; Geissmann 2000) and previous theoretical frameworks for the origins of music (Hagen & Bryant 2003; Hagen & Hammerstein 2009; Merker 2000; Merker et al. 2009). However, when applied to language, Ackermann et al.’s evolutionary model does not withstand cross-species validation: Many nonhuman animals exhibit rhythmic behaviors while lacking speech. Before primate rhythmic abilities can be compared with humans’ at all, more evidence regarding flexibility in vocalizations’ temporal patterning (Fedurek et al. 2013) and motor synchronization (Hattori et al. 2013) is needed in apes (cf. Ravignani et al. 2013).

Evidence from non-primate species also seems to undermine Ackermann et al.’s model. Two bird species, both vocal learners, have been shown to entrain to steady pulses (Hasegawa et al. 2011; Patel et al. 2009a), supporting Ackermann et al.’s model and Patel’s hypothesis, whereby auditory-motor entrainment skills would be evolutionary by-products of vocal learning abilities (Patel 2006). However, recent evidence suggests that vocal learning and rhythmic abilities might be dissociated. Sea lions, unlike seals, show no evidence of vocal learning (Janik & Slater 1997) but nonetheless can reliably synchronize their movements to a range of musical stimuli at different tempi (Cook et al. 2013). Humans and sea lions are both rhythmically skilled, but only humans evolved vocal learning and speech. Therefore, sea lions constitute outliers inconsistent with the prediction of Ackermann et al.’s model. This species evolved cognitive rhythmic abilities, without evolving speech. Invoking additional evolutionary forces and physiological mechanisms thus appears necessary: How can Ackermann et al.’s model be modified to avoid incorrectly predicting vocal learning in rhythmic-skilled species?

Second, Ackermann et al.’s model assumes that prosodic modulation of speech conveys mainly simple motivational-emotional information, and thus, that prosody and complex speech production had separate evolutionary histories. But evidence showing a tight connection between prosody and complex linguistic functions argues against this “double pathway” theory. Prosodic contour is influenced by syntactic constituent structure, semantic relations, phonological rhythm, pragmatic considerations, as well as by the length, complexity, and predictability of linguistic material (Wagner & Watson 2010). Furthermore, prosodic cues are used in childhood during acquisition of words (Christophe et al. 2008) and grammatical constructions (Männel et al. 2013), and in adulthood for syntactic processing (Christophe et al. 2008; Kjelgaard & Speer 1999; Langus et al. 2012; Wagner 2010) and word recognition (Cutler et al. 1997).

Contra Ackermann et al., such complex linguistic modulation of prosody seems to be a prerequisite for the acquisition and use of language, and this process is likely to be influenced by cognitive mechanisms specially modified in the human lineage. Comparative research on syntax precursors favors this hypothesis: The ability to assemble sequences of sounds into hierarchical patterns might be either human-specific, or very poorly developed in other species (Conway & Christiansen 2001; ten Cate & Okanoya 2012). Hence, developmental and comparative evidence point to a more complex cognitive integration of prosody and speech than allowed by the dual-pathway proposal of Ackermann et al. The challenge for Ackermann et al.’s theory is, therefore, to account for the modulation of prosody by human-specific cognitive functions (e.g., syntax), which are clearly not evolutionary homologues of primate emotional vocalizations controlled by the anterior cingulate cortex.

Finally, Ackermann et al. propose an ontogenetic pathway in which: (1) basal ganglia (BG) are important to generate integrated templates of orofacial and laryngeal movements during childhood, but (2) in adulthood these motor templates, having become well-trained, can be retrieved from cortical areas. Later in ontogeny, BG would mostly subserve the modulation of emotional prosody, and not the coordination of speech production. These claims are not supported by currently available empirical data. For instance, Ackermann et al. cite Parkinson’s Disease (PD) data to support their claims that, in adults, BG lesions only impair emotional prosody. In fact, PD patients with normal cognitive functioning are more impaired in semantic fluency tasks than in phonetic fluency (Henry & Crawford 2004). Additionally, contra Ackermann et al., BG subserve complex syntactic and semantic processing in adults, with empirical findings consistent across PD (Dominey & Inui 2009; Henry & Crawford 2004; Lewis et al. 1998), BG lesion (Kotz et al. 2003; Teichmann et al. 2008; Ullman et al. 1997), and neuroimaging research (Friederici & Kotz 2003). These data suggest that in adults the BG support multiple functions relevant to spoken language, not just simple emotional prosodic modulation.

Furthermore, contrary to the developmental pathway proposed by Ackermann et al., the acquisition of novel syntactic structures in adults depends on the medial temporal cortex, and the retrieval of syntactic templates after thorough learning mostly recruits the BG and perisylvian structures (Ullman 2004). This evidence shows that, contra Ackermann et al., BG are active in the retrieval of over-learnt procedures. Ackermann et al. therefore need to propose alternative explanations to reconcile child and adult data concerning the function of BG.

In conclusion, to make their model robust, Ackermann et al. must modify and refine their evolutionary and mechanistic explanations, and clarify which assumptions are necessary, and which are sufficient, for their explanatory framework to hold. Is their model robust enough to stand up to the clear, strong relationship between prosody and complex linguistic functions? How can Ackermann et al.’s model account for the complex functions of BG in adulthood? If in-group cohesion had to be achieved, why was precise vocal control specifically selected for, rather than general non-vocal rhythmic abilities? These and other questions need to be addressed if Ackermann et al.’s model is to become convincing.

NOTE

1. Andrea Ravignani and Mauricio Martins contributed equally to this commentary as joint first authors.

ACKNOWLEDGMENTS

This work was supported by Fundação para a Ciência e Tecnologia grant SFRH/BD/64206/2009 (to Mauricio Martins) and European Research Council Advanced Grant 230604 SOMACCA (to Andrea Ravignani and W. Tecumseh Fitch).

Perceptual elements in brain mechanisms of acoustic communication in humans and nonhuman primates

doi:10.1017/S0140525X13004196

David H. Reser^a and Marcello Rosa^a,b

^a Department of Physiology, Monash University, Melbourne, VIC 3800, Australia; ^b Australian Research Council Centre of Excellence for Integrative Brain Function, Monash University Node, Melbourne, VIC 3800.


[email protected]
http://www.med.monash.edu.au/physiology/staff/reser.html
[email protected]
http://www.med.monash.edu.au/physiology/staff/rosa.html

Abstract: Ackermann et al. outline a model for elaboration of subcortical motor outputs as a driving force for the development of the apparently unique behaviour of language in humans. They emphasize circuits in the striatum and midbrain, and acknowledge, but do not explore, the importance of the auditory perceptual pathway for evolution of verbal communication. We suggest that understanding the evolution of language will also require understanding of vocalization perception, especially in the auditory cortex.

In all primate species examined so far the auditory cortex of consists of a “core” region, comprising three areas; a complex of surrounding “belt” areas, which are thought to provide more complex types of processing; and a poorly characterised “parabelt” region, which is also likely to consist of multiple areas. Although human homologues of the core, belt, and parabelt regions have been tentatively identified, the system remains best characterized in nonhuman primates. The primary auditory cortex (A1) remains the best understood component of this organization, although recent progress has also illuminated some of the functions of other core regions, including the rostral auditory area (R), the rostrotemporal auditory area (RT), and some belt areas (Bendor & Wang 2008; Petkov et al. 2006; Rajan et al. 2013).

It has been shown that trained animals can resolve temporal features of natural and synthetic human speech (Kuhl & Miller 1975), and that populations of neurons in A1 of awake, untrained monkeys exhibit responses that are consistent with categorical perceptual boundaries. In particular, the voice onset time (VOT) parameter, which distinguishes pairs of spectrally similar phonemes in many languages, elicits a characteristic pattern of excitatory and inhibitory activity in A1 of awake monkeys, which is consistent with activity recorded in the auditory cortex of human subjects undergoing intracortical preoperative epilepsy monitoring (Steinschneider et al. 2005; 2013). Although the categorical nature of the VOT parameter has come under scrutiny (Toscano et al. 2010), it remains clear that the temporal features of human speech can be modelled across species – in short, the basic apparatus employed for processing of speech sound parameters is phylogenetically conserved. Therefore, it is feasible to address questions about the interaction between vocalization production and reception in animal models, and (carefully) extrapolate those results to the process of human speech. A particularly ripe area for investigation of the interactions between speech production and perception is in the realm of affective nonverbal content. It has been suggested that monkey vocalizations are akin to the nonverbal and automatic features of human vocalizations, such as laughter (Ross et al. 2010) and infant–mother interaction vocalizations (Whitham et al. 2007), and some monkey vocalizations have rhythmic similarities to human speech (Bergman 2013).

A key feature of the vocalization production model proposed by Ackermann et al. is developmental change in the role of striatal and cortico-striatal circuits in vocal skill learning. They suggest that in early life, the cortico-striatal circuits are critical for development of motor expertise, which is essential for normal speech production. Although some evidence suggests that there is limited developmental modification of monkey vocalizations (Owren et al. 1993; Winter et al. 1973), monkeys do exhibit maturational improvement of control over call features, a form of vocal “skill” (reviewed in Fedurek & Slocombe 2011), and marmoset calls, in particular, undergo maturational change during progress toward adult communication (Pistorio et al. 2006). The motor programs that result from expert learning of speech in the Ackermann et al. model are ascribed to para- and subsylvian cortical areas, though it is unclear which areas in particular are implicated. This developmental trajectory leads to several testable hypotheses regarding the functional anatomy of auditory cortex, and of the rostral and ventral auditory and auditory associated areas of the temporal lobe in particular.

Anatomical tracing data have demonstrated that the rostral-most core area, RT, together with the belt area, RTL, have consistent monosynaptic connections with limbic areas in the rostral prefrontal, anterior cingulate, and temporal pole cortex, as well as subcortical limbic structures such as the lateral amygdala and the ventral striatum (Reser et al. 2009). Moreover, evidence from electrophysiological and imaging studies in monkeys indicates that areas in the anterior and lateral temporal lobe cortex exhibit response selectivity to sounds of increasing complexity (Kikuchi et al. 2010; Kusmierek et al. 2012), including vocalizations, in preference to environmental sounds (Perrodin et al. 2011). Selective responses to speech, and, to a lesser extent, nonverbal vocalizations including laughter and baby cries, have been obtained in recordings from likely homologous areas in the human anterior superior temporal lobe (Chan et al. 2014). It is reasonable to suspect that the anatomical and functional circuits formed by these auditory areas undergo developmental modification in parallel with the vocal output circuits described by Ackermann et al., given that we and other primates are expert listeners to conspecific vocalizations, in addition to being expert producers. This proposition could be tested in longitudinal studies of sub-adult animals (e.g., via implanted electrode arrays), and by tract tracing experiments involving rostral auditory areas at different developmental stages.

Another issue with the proposed model involves where and how the learned motor programs would be stored and encoded in “para- and subsylvian” cortical areas, and how this information could be accessed by the subcortical centers controlling laryngeal and pharyngeal movements. A notable feature of the connectional anatomy of primate auditory cortex is a paucity of projections to motor cortex. While in macaques parts of the parabelt and adjacent polysensory cortex send connections to putative homologues of Broca’s area, which may be classified as a premotor area, there are few or no monosynaptic projections to the cingulate or subcortical output areas which feature prominently in Ackermann et al.’s proposal. Phonological and motor aspects of speech should be considered jointly, rather than as disparate perceptual and productive components (Ziegler et al. 2012). Cortical microstimulation, as well as polysynaptic tract tracing using modified viruses, makes it feasible to map the connections from the auditory receptive areas to vocalization output pathways. We believe that further studies of descending cortical modulatory areas in the anterior cingulate will likely help understand the early development and evolution of language.

Vocal communication is multi-sensorimotor coordination within and between individuals

doi:10.1017/S0140525X13004202

Daniel Y. Takahashi^a,b and Asif A. Ghazanfar^a,b,c

^a Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544; ^b Department of Psychology, Princeton University, Princeton, NJ 08544; ^c Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544.

[email protected] [email protected]
www.princeton.edu/~asifg

Abstract: Speech is an exquisitely coordinated interaction among effectors both within and between individuals. No account of human communication evolution that ignores its foundational multisensory characteristics and cooperative nature will be satisfactory. Here, we describe two additional capacities – rhythmic audiovisual speech and cooperative communication – and suggest that they may utilize the very same or similar circuits as those proposed for vocal learning.


Both speech and nonhuman primate vocalizations are produced by the coordinated movements of the lungs, larynx (vocal folds), and the supralaryngeal vocal tract (Ghazanfar & Rendall 2008). During vocal production, the shape of the vocal tract can be changed by moving the various effectors of the face (including the lips, jaw, and tongue) into different positions. The different shapes, along with changes in vocal fold tension and respiratory power, are what give rise to different sounding vocalizations. Different vocalizations (including different speech sounds) are produced in part by making different facial expressions. Thus vocalizations are inherently “multisensory” (Ghazanfar 2013).

Given the inextricable link between vocal output and facial expressions, it is perhaps not surprising that nonhuman primates, like humans, readily recognize the correspondence between the visual and auditory components of vocal signals (Ghazanfar & Logothetis 2003; Ghazanfar et al. 2007; Habbershon et al. 2013; Jordan et al. 2005; Sliwa et al. 2011) and use facial motion to more accurately and more quickly detect vocalizations (Chandrasekaran et al. 2011). However, one striking dissimilarity between monkey vocalizations and human speech is that the latter has a unique bi-sensory rhythmicity, in that both the acoustic output and the movements of the mouth share a 3–8 Hz rhythmicity and are tightly correlated (Chandrasekaran et al. 2009; Greenberg et al. 2003). According to one hypothesis, this bimodal speech rhythm evolved through the linking of rhythmic facial expressions to vocal output in ancestral primates to produce the first babbling-like speech output (Ghazanfar & Poeppel 2014; MacNeilage 1998). Lip-smacking, a rhythmic facial expression commonly produced by many primate species, may have been one such ancestral expression. It is used during affiliative and often face-to-face interactions (Ferrari et al. 2009; Van Hooff 1962); it exhibits a 3–8 Hz rhythmicity like speech (Ghazanfar et al. 2010); and the coordination of effectors during its production (Ghazanfar et al. 2012) and its developmental trajectory are similar to speech (Morrill et al. 2012).

Very little is known about the neural mechanisms underlying the production of rhythmic communication signals in human and nonhuman primates. The mandibular movements shared by lip-smacking, vocalizations, and speech all require the coordination of muscles controlling the jaw, face, tongue, and respiration, and their foundational rhythms are likely produced by homologous central pattern generators in the brainstem (Lund & Kolta 2006). These circuits are modulated by feedback from peripheral sensory receptors. The neocortex may be an additional source influencing orofacial movements and their rhythmicity. Indeed, lip-smacking and speech production are both modulated by the neocortex, in accord with social context and communication goals (Bohland & Guenther 2006; Caruana et al. 2011). Thus, one hypothesis for the similarities between lip-smacking and visual speech (i.e., the orofacial component of speech production) is that they are a reflection of the development of neocortical circuits influencing brainstem central pattern generators.

One important neocortical node likely to be involved in this circuit is the insula, a structure that has been a target for selection in the primate lineage (Bauernfeind et al. 2013). The human insula is involved in, among other socio-emotional behaviors, speech production (Ackermann & Riecker 2004; Bohland & Guenther 2006; Dronkers 1996). Consistent with an evolutionary link between lip-smacking and speech, the insula also plays a role in generating monkey lip-smacking (Caruana et al. 2011). It is conceivable that for both monkey lip-smacking and human speech, the development and coordination of effectors related to their shared orofacial rhythm are due to the socially guided development of the insula. However, a neural substrate is needed to link the production of lip-smack-like facial expressions to concomitant vocal output (the laryngeal source) in order to generate that first babbling-like vocal output. This link to laryngeal control remains a mystery. One scenario is the evolution of insular cortical control over the brainstem’s nucleus ambiguus. The fact that gelada baboons produce lip-smacks concurrently with vocal output, generating a babbling-like sound (Bergman 2013), is evidence that a coordination between lip-smacking and vocal output may be easy to evolve.

Human vocal communication is also a coordinated and cooperative exchange of signals between individuals (Hasson et al. 2012). Foundational to all cooperative verbal communicative acts is a more general one: taking turns to speak. Given the universality of turn-taking (Stivers et al. 2009), it is natural to ask how it evolved. Recently, we tested whether marmoset monkeys communicate cooperatively like humans (Takahashi et al. 2013). Among the traits marmosets share with humans are a cooperative breeding strategy and volubility. Cooperative care behaviors scaffold prosocial motivational and cognitive processes not typically seen in other primate species (Burkart et al. 2009a). We capitalized on the fact that marmosets are not only prosocial, but are also highly vocal and readily exchange vocalizations with conspecifics. We observed that they exhibit cooperative vocal communication, taking turns in extended sequences of call exchanges (Takahashi et al. 2013), using conversation rules that are strikingly similar to human rules (Stivers et al. 2009). Such exchanges did not depend upon pair-bonding or kinship with conspecifics and are more sophisticated than simple call-and-responses exhibited by other species. Moreover, our data show that turn-taking in marmosets shares with humans the characteristics of coupled oscillators with self-monitoring as a necessary component (Takahashi et al. 2013) – an example of convergent evolution.

The lack of evidence for such turn-taking (vocal or otherwise) in apes suggests that human cooperative vocal communication could have evolved in a manner very different than what the gestural-origins hypotheses predict (Rizzolatti & Arbib 1998; Tomasello 2008). In this alternative scenario, existing vocal repertoires could begin to be used in a cooperative, turn-taking manner when prosocial behaviors in general emerged. Although the physiological basis of cooperative breeding is unknown (Fernandez-Duque et al. 2009), the “prosociality” that comes with it certainly would require modifications to the organization of social and motivational neuroanatomical circuitry. This must have been an essential step in the evolution of both human and marmoset cooperative vocal communication – one that may, like vocal production learning, also include changes to the cortical-basal ganglia loops as well as changes to socially related motivational circuitry in the hypothalamus and amygdala (Syal & Finlay 2011). These neuroanatomical changes would link vocalizations and response contingency to reward centers during development. Importantly, given the small of marmosets, such changes may not require an enlarged brain.

Speech prosody, reward, and the corticobulbar system: An integrative perspective

doi:10.1017/S0140525X13004214

Carmelo M. Vicario
School of Psychology, Bangor University, Bangor, Gwynedd LL57 2AS, United Kingdom.
[email protected]
[email protected]
http://www.bangor.ac.uk/psychology/people/profiles/carmelo_vicario.php.en

Abstract: Speech prosody is essential for verbal communication. In this commentary I provide an integrative overview, arguing that speech prosody is subserved by the same anatomical and neurochemical mechanisms involved in the processing of reward/affective outcomes.

Speech prosody can be conceived as the affective dimension of verbal communication. The recognition of speech


prosody during social interactions has an adaptive function since it provides information about the speaker’s intentions and emotional states, allowing an appropriate response in different situations (Frith 2009).

Current models of brain organization for prosody propose lateralized representation based on featural (timing vs. pitch) or functional (affective vs. linguistic) characteristics of prosodic material (see Sidtis et al. 2003). However, the role of subcortical structures in prosody is being increasingly described.

From the arguments provided by Ackermann et al. in the target article, one could argue that the corticobulbar tract is a key structure linking the affect-related dimensions of verbal communication and the reward system. For example, the authors discuss the pivotal role of dopamine in speech prosody. Surprisingly, no mention was made of the role of serotonin, another key monoamine of the reward system (Vicario 2013b), whose involvement at the corticobulbar level has been reported in several articles (e.g., Raul 2003).

Here, I expand upon these issues by providing arguments in support of the suggestion that, in humans, speech prosody might have evolved from the basic mechanisms implied in reward/affective-related functions. In particular, I propose an integrative overview which argues that speech prosody is subserved, at least in part, by the same anatomical and neurochemical mechanisms involved in the processing of reward/affective outcomes.

Evidence is provided by the case of patients with Parkinson’s disease (PD), a clinical condition characterized by a dysfunctional neurotransmission of dopaminergic and serotoninergic neural circuits (Bédard et al. 2011). PD patients are affected by a disruption of the prosodic aspect of verbal utterance (Van Lancker Sidtis et al. 2006), but these patients are known also for their swallowing disorder (Potulska et al. 2003). Interestingly, De Letter et al. (2007) have shown that Levodopa induces modifications of prosody in advanced PD. Moreover, it was recently reported that the deep brain stimulation of the subthalamic nucleus (STN), which modulates the activity of both dopaminergic (Lhommée et al. 2012) and serotoninergic (Creed et al. 2012) neurons, improves tongue force (Skodda 2012) and emotional speech (Brück et al. 2011) in PD.

Studies on animal models provide further support to this link between reward, prosody, and the corticobulbar tract. For example, Nuckolls et al. (2012) have shown that nigrostriatal dopaminergic depletion affects tongue force output. This phenomenon is probably mediated by the effect of dopamine depletion on several structures of the neural pathway connecting the tongue muscle with the midbrain, for example, the nucleus tractus solitarius (NTS) (Granata & Woodruff 1982), a key area of the corticobulbar system involved in the regulation of reward-related behaviors such as food intake and swallowing. Interestingly, studies have shown that agonists and antagonists of dopamine spontaneously activate neurons in the NTS (Granata & Woodruff 1982).

Tongue muscle control might also involve serotoninergic mechanisms. For example, there is evidence that clozapine, a serotoninergic antagonist, affects lick frequency in rats (Das & Fowler 1995). Moreover, it has been documented that the NTS is an important site of action for serotonin (Lam et al. 2009).

The study of animal models has also shown an involvement of both of these monoamines in reward-oriented communication. For example, the research by Huang and Hessler (2008) on male songbirds reports a relationship between direct rewarding communications (such as singing used for courtship) and dopaminergic mechanisms. These authors speculate that the ventral tegmentum (VT), a dopaminergic area which mediates rewarding and motivated behaviors (Ghanbarian & Motamedi 2013), might modulate songbirds’ output to the higher stereotypy typical of courtship. Interestingly, a direct connectivity between VT and NTS has recently been shown (Alhadeff et al. 2012), which suggests some influence of VT at the corticobulbar level. Salvante et al. (2010) have also reported that serotonin may modulate communication in birds since it influences the effect of extrinsic social factors on their singing effort.

Finally, the work by Alipour et al. (2002) on New World monkeys, Saguinus fuscicollis, provides the anatomical rationale for the link between corticobulbar pathway, reward, and prosody. In fact, this study documents the existence of a direct connectivity between the motor cortical tongue area and several subcortical regions – such as the anterior cingulate cortex, the insula, the ventral putamen, the caudate nucleus, and the amygdala – involved in the processing of affective and rewarding outcomes.

Further support for the current discussion is provided by neurogenetic investigation. Ackermann et al. discuss the role of the FOXP2 gene in verbal communication. For example, a mutation in this gene has been associated with apraxia of speech (Laffin et al. 2012; see also Vicario 2013a). Moreover, Shriberg et al. (2006) have shown that these alterations may extend to prosody. The impact of FOXP2 on the activity of the striatum (French et al. 2012) suggests the dopaminergic nature (Gale et al. 2013) of the effect played by this gene on verbal communication. However, it is known that FOXP2 may modulate the serotoninergic activity of corticobulbar structures involved in the regulation of rewarding signals. For example, the neurons of the parabrachial nucleus (PB) of rats constitutively express the FOXP2 transcription factor (Miller et al. 2011). PB is another key structure of the reward system; consisting of taste-responsive neurons (Simon et al. 2006), it receives an obligatory relay from the NTS, in rodents (Tokita et al. 2004).

The research described above offers an overview of the relationship between prosody, reward, and the corticobulbar system. In particular, it shows that speech prosody and reward processing may share similar neuroanatomical and neurochemical mechanisms. However, the evidence that both serotoninergic and dopaminergic mechanisms are involved in prosody, though corroborating the initial hypothesis of this commentary, poses the problem of understanding how these monoamines work. A possible insight is provided by two recent articles suggesting a distinct role for dopamine (Fiorillo 2013) and serotonin (Vicario 2013c), respectively, in the processing of reward and aversiveness. By extending this argument to speech prosody, one could speculate that dopamine subserves reward-oriented (e.g., approach) communication, while serotonin subserves punishment-oriented (e.g., threat) communication.

ACKNOWLEDGMENT
I thank Abdul Malik Ibraim and the copy editorial office for their help in checking the English used in this commentary.

Modification of spectral features by nonhuman primates

doi:10.1017/S0140525X13004226

Daniel J. Weiss,(a) Cara F. Hotchkin,(b) and Susan E. Parks(c)
(a) Department of Psychology and Program in Linguistics, Pennsylvania State University, University Park, PA 16802; (b) Naval Facilities Engineering Command, Atlantic, Norfolk, VA 23508; (c) Department of Biology, Syracuse University, Syracuse, NY 13244.
[email protected]
http://weisslab.weebly.com/
[email protected]
[email protected]
http://parkslab.syr.edu/

Abstract: Ackermann et al. discuss the lack of evidence for vocal control in nonhuman primates. We suggest that nonhuman primates may be capable of achieving greater vocal control than previously supposed. In support of this assertion, we discuss new evidence that nonhuman primates are capable of modifying spectral features in their vocalizations.

In discussing the modulation of acoustic call structure, Ackermann et al. question the extent to which nonhuman primates exert operant control over the spectro-temporal features of their


calls. This concern echoes a long-standing notion that nonhuman primates are capable of controlling acoustic parameters that can be modulated by changes in exhalation, such as loudness and duration, but lack control over spectral features that may require more nuanced control over the vocal apparatus (Janik & Slater 1997).

In this connection, we would like to briefly review our work on noise-induced vocal modifications in nonhuman primates. When vocalizing in noisy environments, humans and several other species (e.g., whales, bats, etc.; Hage et al. 2013; Parks et al. 2011; see review in Hotchkin & Parks 2013) involuntarily raise the amplitude of their vocalizations (i.e., the Lombard effect; Lombard 1911; Pick et al. 1989). Associated changes in vocalization duration have also been documented in humans and nonhumans (e.g., Garnier et al. 2010). Some species also modify spectral features by shifting energy to higher harmonics (reviewed in Hotchkin & Parks 2013). However, consistent with Ackermann et al.’s concerns, previous studies with nonhuman primates have suggested that they may be restricted to manipulating the amplitude and temporal aspects of their calls in response to noise in the transmission environment (Egnor & Hauser 2006; Sinnott et al. 1975). To the best of our knowledge, no nonhuman primate has previously demonstrated the ability to modify spectral features of their calls in response to noise.

We have recently discovered this ability in cotton-top tamarins (Saguinus oedipus), a small arboreal New World species known to have an extensive vocal repertoire (Cleveland & Snowdon 1982). A previous study with this species found that they modify the amplitude and duration of their calls in response to noise (Egnor & Hauser 2006), and, using a different method, we have determined they are also capable of modifying spectral components of their calls. Our findings suggest the possibility for greater vocal control in nonhuman primates than previously supposed (e.g., Janik & Slater 1997).

Unlike prior studies with nonhuman primates (Brumm et al. 2004; Egnor & Hauser 2006; Sinnott et al. 1975), our study used playbacks of both broad- and narrow-band white noise at a range of amplitudes to investigate vocal control of two call types, chirps and combination long calls (CLCs; Hotchkin 2012; Hotchkin et al. 2013). We measured a variety of acoustic parameters including spectral tilt (a relative measure of the distribution of energy between the high- and low-frequency components of the call), which is an acoustic parameter that has been studied almost exclusively in human responses to noise (e.g., Lu & Cooke 2009). We found that individuals modified the structure of both call types in response to changes in noise amplitude and bandwidth. In CLCs, whose frequencies were overlapped by the noise, peak fundamental frequency and spectral tilt changed in response to increased noise amplitude and bandwidth (see our Fig. 1). While the noise stimuli did not overlap with the chirp frequencies, this call type also showed changes to frequency content during noise. Noise with frequency components at or slightly below vocalization frequency may result in masking, due to a phenomenon known as the upward spread of masking which has been observed in both humans and nonhuman mammals (Egan & Hake 1950; Nachtigall et al. 2004). Increases in chirp frequency may provide release from masking by low-frequency noise, thereby improving the detectability even when noise frequencies do not overlap the vocalization. In chirps, the peak and maximum components of the fundamental frequency increased as a result of noise level, with no changes to spectral tilt. Other vocal modifications observed included the Lombard effect (i.e., an increase in amplitude) and longer chirp duration.

The focus of the authors is primarily on volitional control and modification of vocalizations in nonhuman primates, and they could therefore dismiss this finding because responses to noise are thought to reflect involuntary processes, as noted above. In fact, we do not argue that the flexibility exhibited in the Lombard effect and conditioning studies necessarily set the stage for the evolution of vocal flexibility as it is manifest in humans (as noted by Owren et al. 2011). However, in order to refine the dual stage hypothesis, it is important to fully describe the range of flexible features available in nonhuman primate vocal communication. Research with cotton-top tamarins alone has demonstrated they are capable of producing long-term changes to the acoustic structure of their calls (i.e., vocal convergence; see Weiss et al. 2001); perceiving changes to the spectral features of the harmonics in their CLCs (Weiss & Hauser 2002); modifying the timing of their calls (Egnor et al. 2007);

Figure 1 (Weiss et al.). Representative CLCs produced during (a) control and (b) treatment trials demonstrating changes to spectral tilt. All whistles from (a) have strong fundamental frequencies and maximum energy in the 2nd harmonic, while in (b) the first whistle has a very faint fundamental frequency at approximately 2 kHz, and peak frequencies for all whistles occur in the 4th harmonics. Reduced energy in the fundamental frequency is also apparent in the second and third whistles. Spectrogram parameters: 1024 point Hamming window, 75% overlap, 11.7 Hz frequency resolution.
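The spectrogram settings in the caption and the notion of spectral tilt can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the commentary: the function names are hypothetical, the sample rate and FFT length are illustrative guesses (neither is reported), the harmonic powers are made-up numbers, and spectral tilt is defined here simply as the decibel ratio of energy at or above a cutoff frequency to energy below it. The authors' actual measurement procedure may well differ.

```python
import math


def hop_size(window_length, overlap_fraction):
    """Number of samples the analysis window advances between frames."""
    if not 0.0 <= overlap_fraction < 1.0:
        raise ValueError("overlap_fraction must be in [0, 1)")
    return round(window_length * (1.0 - overlap_fraction))


def bin_spacing_hz(sample_rate_hz, fft_length):
    """Spacing between adjacent FFT frequency bins, in Hz."""
    return sample_rate_hz / fft_length


def spectral_tilt_db(components, cutoff_hz):
    """Energy at or above cutoff_hz relative to energy below it, in dB.

    components: iterable of (frequency_hz, power) pairs, e.g. harmonic peaks.
    Positive values mean energy is concentrated in the higher frequencies.
    This is one simple definition chosen for illustration only.
    """
    components = list(components)
    low = sum(p for f, p in components if f < cutoff_hz)
    high = sum(p for f, p in components if f >= cutoff_hz)
    if low <= 0 or high <= 0:
        raise ValueError("need positive energy on both sides of the cutoff")
    return 10.0 * math.log10(high / low)


# Caption settings: 1024-point window with 75% overlap.
print(hop_size(1024, 0.75))

# Hypothetical recording settings (assumed, not reported in the commentary).
print(round(bin_spacing_hz(48_000, 4096), 1))

# Made-up harmonic powers for a call with a 2 kHz fundamental.
control = [(2000, 1.0), (4000, 2.0), (6000, 0.5), (8000, 0.25)]
treatment = [(2000, 0.05), (4000, 0.5), (6000, 0.5), (8000, 2.0)]
print(spectral_tilt_db(control, 5000) < spectral_tilt_db(treatment, 5000))
```

Under this definition, a call whose energy shifts from the fundamental and 2nd harmonic toward the 4th harmonic, as described for panel (b), yields a higher tilt value than the control call.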


and also altering the loudness, duration, and spectral components of their calls when transmitting signals in noisy environments (Egnor & Hauser 2006; Egnor et al. 2006; Hotchkin 2012).

Further, it has been suggested that multiple levels of vocal control may be active during Lombard vocalizations and could involve a complex array of neural structures extending beyond brainstem reflexes (Eliades & Wang 2012). Thus, the main implication of our findings for the target article is to indicate that nonhuman primates may possess greater vocal control than has previously been supposed. Developing a more complete understanding of the ways in which nonhuman primates are capable of manipulating their vocalizations, and the supporting neural networks, may ultimately help Ackermann et al. further refine their theory.

Contribution of the basal ganglia to spoken language: Is speech production like the other motor skills?

doi:10.1017/S0140525X13004238

Alexandre Zenon(a) and Etienne Olivier(a,b)
(a) Institute of Neuroscience, School of Medicine, Université catholique de Louvain, UCL, B-1200 Brussels, Belgium; (b) Fondazione Istituto Italiano di Tecnologia (IIT), 16163 Genova, Italy.
[email protected]
[email protected]
www.coactionslab.com

Abstract: Two of the roles assigned to the basal ganglia in spoken language parallel very well their contribution to motor behaviour: (1) their role in sequence processing, resulting in syntax deficits, and (2) their role in movement “vigor,” leading to “hypokinetic dysarthria” or “hypophonia.” This is an additional example of how the motor system has served the emergence of high-level cognitive functions, such as language.

Besides the well-known contribution of the basal ganglia (BG) to motor behaviour, the numerous language deficits reported in patients with Parkinson’s disease (PD), or with other BG lesions, suggest they participate in language production, as Ackermann et al. discuss in the target article. However, most inferences about the contribution of BG to motor control based on deficits observed in patients have proven to be flawed, and, despite decades of investigation, the actual role of BG remains debated. For example, recent studies in humans and monkeys have shown that a lesion of the internal part of the globus pallidus – one of the main BG outputs – leads to rather subtle motor deficits, mostly unrelated to PD motor symptoms (Desmurget & Turner 2008; 2010; Obeso et al. 2009). This suggests that many of the symptoms resulting from BG lesions, including language deficits, are likely to result from the perturbation, by noisy BG signals, of a larger network of cortical and subcortical structures.

Nonetheless, two motor functions have emerged recently as being distinctively imputable to BG. First, it appears almost undisputable that BG contribute to motor sequence learning (Desmurget & Turner 2010; Obeso et al. 2009; Turner & Desmurget 2010). This is in sharp contrast to the idea that BG are involved in the storage and execution of overlearned movements, or “habits” (Graybiel 2008). Interestingly, a recent study has shown that BG are involved in motor chunking (Wymbs et al. 2012), as already suggested by the finding that this process is dopamine-dependent (Boyd et al. 2009; Tremblay et al. 2010; 2009). Chunking is a key mechanism in sequence learning, and it comprises two distinct processes occurring at different stages: a first operation, called “segmentation,” consists of parsing sequences into shorter clusters (Clerget et al. 2012; Sakai et al. 2003; Verwey 2001; Verwey & Eikelboom 2003), and this is followed by “concatenation,” which consists of assembling these short chunks into longer clusters (Sakai et al. 2003; Verwey 1996). Whereas BG would be involved in the concatenation process, the initial segmentation would rely on a left fronto-parietal network, including Broca’s area (Wymbs et al. 2012). The second function currently assigned to BG is the control of movement “vigor” according to motivational factors (Kurniawan et al. 2010; Mazzoni et al. 2007; Turner & Desmurget 2010). Indeed, a loss of dopaminergic neurones in the substantia nigra affects the reward and decision-making processes (Mimura et al. 2006). Bigger rewards typically lead to higher efforts (Berridge 2004; Schmidt et al. 2008; Takikawa et al. 2002) and, when selecting an action among multiple candidates, the choice critically depends on a comparison of the cost/benefit ratio for each option (Niv et al. 2006). The motivational effect that an anticipated reward has on the execution of an action is known as the “incentive salience,” which is then translated into movement “vigor” to optimize the balance between the effort invested and the outcome value. An impairment of this process would lead to an inability to adjust the level of effort invested in movements, explaining the occurrence of bradykinesia in PD patients (Mazzoni et al. 2007).

It is remarkable that two of the main roles assigned to BG in spoken language by Ackermann and colleagues match very well with the contribution of BG to motor behaviour. First, following perinatal BG lesions or in patients with inherited language disorders, one major spoken language impairment is “a significant disruption of simultaneous or sequential sets of motor activities to command, in spite of a preserved motility of single vocal tract organs” (target article, sect. 4.2.1, para. 2) (Alcock et al. 2000a; Watkins et al. 2002a), which is consistent with the role of BG in processing structured sequences. Intriguingly, these patients also present a deficit in grammatical rules acquisition (Alcock et al. 2000a; Gopnik 1990a), as also reported in PD patients (Chan et al. 2013; Pell & Monetta 2008). It is noteworthy that the other brain region critically involved in syntax and sequence processing is Broca’s area (Clerget et al. 2009; 2011; 2013; Fadiga et al. 2009; Tettamanti & Weniger 2006), which is tightly interconnected with BG (Ullman 2006). It is remarkable that the two brain regions involved in chunking (Wymbs et al. 2012) are also those in which a lesion yields a syntax deficit, suggesting that the chunking process might provide the basis for hierarchical processing and represent the common denominator of BG contribution to motor and language production (Kotz et al. 2009).

Second, the “hypokinetic dysarthria” or “hypophonia” reported in PD, and viewed by Ackermann et al. as a consequence of “a diminished impact of motivational, affective/emotional, and attitudinal states on the execution of speech movements, leading to sparse motor activity” and interpreted as a “general loss of ‘motor drive’ at the level of the speech motor system” (sect. 4.2.2, para. 3) fits quite well with the view that BG play a central role in controlling the movement “vigor” according to motivational factors (Turner & Desmurget 2010). According to this view, bradykinesia and hypophonia would not arise from a mere impairment of the motor command execution but from a loss of influence of the motivational drive on the motor output. Ackermann and colleagues go one step further in proposing that, in the case of language, this impairment would not only result in a reduced amplitude of the motor output but also in a decreased motivational and emotional modulation of speech. This could be paralleled, in the context of motor behaviour, by the relative lack of spontaneous facial expression exhibited by PD patients despite their preserved ability to produce posed facial expression (Smith et al. 1996). Even though there are currently very few data supporting this hypothesis, this is an interesting idea that deserves further investigation.

The similarity between the putative functions of BG in motor behaviour and language provides an additional example of how the motor system has served the emergence of high-level cognitive functions, by minimal transformation, from ancestral structures already present in nonhuman primates (Andres et al. 2008; Dehaene & Cohen 2007; Olivier et al. 2007).

Authors’ Response

Phylogenetic reorganization of the basal ganglia: A necessary, but not the only, bridge over a primate Rubicon of acoustic communication

doi:10.1017/S0140525X1400003X

Hermann Ackermann,(a) Steffen R. Hage,(b) and Wolfram Ziegler(c)
(a) Neurophonetics Group, Centre for Neurology – General Neurology, Hertie Institute for Clinical Brain Research, University of Tuebingen, D-72076 Tuebingen, Germany; (b) Neurobiology of Vocal Communication Research Group, Werner Reichardt Centre for Integrative Neuroscience, and Institute for Neurobiology, Department of Biology, University of Tuebingen, D-72076 Tuebingen, Germany; (c) Clinical Neuropsychology Research Group, Municipal Hospital Munich-Bogenhausen, D-80992 Munich, and Institute of Phonetics and Speech Processing, Ludwig-Maximilians-University, D-80799 Munich, Germany.
[email protected]
[email protected]
[email protected]
http://www.hih-tuebingen.de/neurophonetik/
http://www.vocalcommunication.de
http://www.ekn.mwn.de

Abstract: In this response to commentaries, we revisit the two main arguments of our target article. Based on data drawn from a variety of research areas – vocal behavior in nonhuman primates, speech physiology and pathology, neurobiology of basal ganglia functions, motor skill learning, paleoanthropological concepts – the target article, first, suggests a two-stage model of the evolution of the crucial motor prerequisites of spoken language within the hominin lineage: (1) monosynaptic refinement of the projections of motor cortex to brainstem nuclei steering laryngeal muscles, and (2) subsequent “vocal-laryngeal elaboration” of cortico-basal ganglia circuits, driven by human-specific FOXP2 mutations. Second, as concerns the ontogenetic development of verbal communication, age-dependent interactions between the basal ganglia and their cortical targets are assumed to contribute to the time course of the acquisition of articulate speech. Whereas such a phylogenetic reorganization of cortico-striatal circuits must be considered a necessary prerequisite for ontogenetic speech acquisition, the 30 commentaries – addressing the whole range of data sources referred to – point at several further aspects of acoustic communication which have to be added to or integrated with the presented model. For example, the relationships between vocal tract movement sequencing – the focus of the target article – and rhythmical structures of movement organization, the connections between speech motor control and the central-auditory and central-visual systems, the impact of social factors upon the development of vocal behavior (in nonhuman primates and in our species), and the interactions of ontogenetic speech acquisition – based upon FOXP2-driven structural changes at the level of the basal ganglia – with preceding subvocal stages of acoustic communication as well as higher-order (cognitive) dimensions of phonological development. Most importantly, thus, several promising future research directions unfold from these contributions – accessible to clinical studies and functional imaging in our species as well as experimental investigations in nonhuman primates.

R1. Introduction

The 30 commentaries have elaborated upon all aspects of the target article, extending from vocal behavior in nonhuman primates to speech physiology and pathology, the neurobiology of basal ganglia functions, as well as motor skill learning and paleoanthropological concepts. In particular, the following issues have been addressed: (i) the capacities of nonhuman primates to control vocal behavior and to produce species-atypical calls; (ii) the constraints of vocal tract anatomy on vocalizations; (iii) the scope of birdsong as a model of – at least some aspects of – human spoken language; (iv) the relationship of the FOXP2 gene to motor functions – or, more specifically – vocal behavior across mammalian and avian taxa; (v) the contribution of corticobulbar tracts and brainstem central pattern generators – besides and beyond the basal ganglia – to acoustic human communication; (vi) the rhythmic organization and oscillatory underpinnings of behavior; (vii) the impact of auditory and audiovisual information as well as social factors on speech acquisition; (viii) the interactions of motor speech learning with preceding subverbal stages of acoustic communication; (ix) the contribution of cortico-striatal circuitry to “speech learning” in adulthood; (x) the broad range of cognitive basal ganglia functions beyond vocal-emotional expression and motor aspects of language; and, finally, (xi) paleoanthropological aspects of the target article such as the benefits of the initial articulatory efforts of our species and the speaking capabilities of Neanderthals.

We gratefully appreciate all the contributions which have helped us to further specify our argument and have broadened our view on primate acoustic communication – in extant nonhuman cousins, extinct relatives from the genus Homo, and in our own species. In this response, we have organized the various commentaries into four broad subject areas: (a) nonhuman primate vocal behavior (and birdsong), which we discuss in section R2; (b) contributions of the basal ganglia to mature spoken language production/affective-vocal behavior (sect. R3); (c) role of the basal ganglia in ontogenetic speech acquisition (sect. R4); and (d) paleoanthropological perspectives of articulate speech acquisition (sect. R5). In the concluding section, R6, we summarize some of the main points/key questions likely to be entailed in further investigations of the phylogenetic reorganization of the basal ganglia.

R2. Nonhuman primate vocal behavior: An underestimated or an inadequate vantage point for models of spoken language evolution?

R2.1. Volitional control of vocal behavior in nonhuman primates

Based upon a review of the behavioral organization and the neuroanatomic underpinnings of acoustic communication in nonhuman primates, we proposed in the target article that these species lack the capacity “to combine laryngeal and orofacial gestures into novel movement sequences” (sect. 2.3), rendering them virtually unable to generate even the simplest speech-like vocal emissions, that is, acoustic events in the form of one or more syllable-shaped signal pulses. Several commentaries suggest that we might have underestimated the versatility of vocal functions in our primate relatives:

1. For example, commentators de Boer & Perlman report that Koko, a human-reared female gorilla, learned to display some species-atypical vocalizations (“breathy

grunt-like vocalizations” and “mock ‘coughs’”), indicating at least rudimentary voluntary laryngeal control. Comparable observations of species-atypical acoustic events (“extended grunts”) in captive chimpanzees, often as a component of multimodal and intentional display scenarios, are mentioned in the commentary by Meguerditchian, Taglialatela, Leavens, & Hopkins (Meguerditchian et al.).
2. Recent experiments by Weiss, Hotchkin, & Parks (Weiss et al.) found modification of the spectral structure (“spectral tilt”) of the vocalizations of cotton-top tamarins under specific conditions such as a noisy environment.
3. Finally, Lameira points at an eventually salient role of the voiceless calls of great apes in speech evolution, which are “underlined by voluntary control and maneuvering of supra-laryngeal articulators (...) in apparent homology to the articulatory movements of voiceless consonants.”

We readily admit the existence of – though highly limited – volitional control over some aspects of vocal behavior in nonhuman primates. In fact, recent studies by one of us (Hage, and colleagues) show that rhesus monkeys are capable of volitionally initiating vocal output, that is, able to switch between two distinct call types from trial to trial in response to different visual cues in an operant conditioning task (Hage & Nieder 2013; Hage et al. 2013). Furthermore, single-cell recordings identified neurons in the monkey homologue of human Broca’s area – located within the ventrolateral prefrontal cortex – that specifically predict such volitionally triggered calls, suggesting a crucial engagement of the monkey homologue of human Broca’s area in vocal initiation processes, a putative precursor for speech control in the primate lineage (Hage & Nieder 2013).

However, such preadaptations of human vocal tract motor control in our nonhuman relatives do not pose a threat to our model. To the contrary, a complete absence of any precursors would raise the question of how the suggested FOXP2-driven reorganization of cortico-striatal circuits could have gained a foothold in the primate “communicating brain” in the first place. At the laryngeal level, nevertheless, learned species-atypical sounds are restricted to breathy-voiced (de Boer & Perlman) or extended grunts (Meguerditchian et al.). These vocalizations, therefore, lack a property which we consider essential to the communicative efficiency and the generative potential of the sound structures prevailing in all spoken languages, that is, the syllabic patterning of vocal tract movement sequences. This specific compositional principle requires the control of the laryngeal sound source to become part of a meshwork of phonetic gestures which are organized – on the basis of precisely defined phase-relationships – as syllable-shaped gestural scores (e.g., Goldstein et al. 2006; see Figure 2C of the target article).

Besides changes in spectral call features, the experiments by Weiss et al. – referred to in their commentary – gave rise to an increase in vocal amplitude in response to noise (Lombard effect). Under these conditions, modifications of call amplitude and spectral structure, conceivably, are rooted in a common cerebral mechanism and, thus, may represent components of a multifaceted vocal response pattern. Most probably, the Lombard effect – and its associated acoustic sequels – reflects involuntary changes of several call parameters such as amplitude, duration, repetition rate, and spectral composition in response to masking ambient noise rather than volitionally controlled modification of vocal output (e.g., Brumm & Slabbekoorn 2005; Brumm & Zollinger 2011). Recently, Hage and colleagues reported such vocal shifts to show an extremely brief delay and to emerge at a latency of less than a hundred milliseconds after noise onset (Hage et al. 2013). Taking into account that single neurons in the periaqueductal gray (PAG) change their vocalization-related firing rates already around 400 msec prior to call onset (Larson & Kistler 1984), these results indicate that the Lombard effect – and at least one of its acoustic correlates – might be controlled by a neuronal network located within the brainstem rather than by superordinate higher-order brain structures. Furthermore, modifications of the spectral features of vocal output such as those reported by Weiss et al. might be caused by alterations of an animal’s motivational state under different noise conditions. A study in squirrel monkeys has, for example, found an increase in aversion to be correlated with an upward shift of the maximal energy of the power spectrum of some call types (Fichtel et al. 2001). Taken together, changes in call structures do not necessarily point at specific volitional control capabilities, but may be mediated by lower-level brainstem mechanisms.

R2.2. Auditory-motor interactions in nonhuman (and human) primates

Reser & Rosa call attention to the tight relationship between perception and production of species-typical vocal behavior in nonhuman primates. Most importantly, “the basic apparatus employed for processing of speech sound parameters is phylogenetically conserved” and, thus, available to our cousins as well. As a hint towards tight connections between the auditory and the motor domains of human vocal behavior, specific motor circuits have been found to be recruited during the analysis of speech sound features, as described in the commentary by Pezzulo, Barca, & D’Ausilio (Pezzulo et al.). Besides frontal cortex, subcortical structures may contribute to these encoding processes as well (Ackermann & Brendel, in press).

More specifically, speech acquisition represents a variant of “vocal production learning,” that is, the capacity “to reproduce by voice patterns of sound first received by ear,” as Merker writes (italics ours), and, therefore, must be expected to involve tight auditory-motor interconnections. However, the target article focuses on the motor side of vocal production learning, and herein rests, in our view, a major obstacle for speech acquisition in nonhuman primates (see also sect. R4 here). Nevertheless, as alluded to by Reser & Rosa, studies of the connections between central-auditory and central-motor systems in nonhuman primates, including limbic structures, should provide further opportunities for an elucidation of language evolution. As a highly intriguing aspect of the perception-production links within the domain of musicality, Honing & Merchant discuss the differential sensitivity to rhythm and beat in nonhuman primates as a basis for the proposed gradual audiomotor evolution hypothesis (see also the commentary by Ravignani, Martins, & Fitch [Ravignani et al.]).


R2.3. Rhythmical entrainment and interlocutor coordination as speech precursors?

Takahashi & Ghazanfar and Bryant have contributed two elucidating commentaries which suggest a precursor role of rhythmical facial activities and rhythmically entrained vocal and non-vocal behaviors in nonhuman primates for the rhythmical organization of verbal utterances, on the one hand, and for the coordination of interlocutors in human conversation, on the other. This notion conforms to recent phonetic accounts of speaking as a quasi-rhythmically entrained motor activity (e.g., Cummins 2009), interlinked with rhythmical principles engaged in the organization of auditory speech perception (Rothermich et al. 2012). Thus, Peelle and Davis (2012) consider slow oscillatory activity of cortical neuronal assemblies as a physiological basis for the processing of quasi-rhythmical structures in speech comprehension, and Wilson and Wilson (2005) provided an oscillator model of the turn-taking behavior of speakers during conversation. Hence, the rhythmical entrainment approach embarks on a close interlacing of vocal tract motor mechanism with auditory-perceptual processes in speech, and relates it to the cooperative nature of linguistic interactions. Allusions to the rhythmicity of spontaneous and posed laughter and to the role of laughter “in coordinating conversational timing,” as highlighted by Bryant, point at a deeply entrenched rhythmical basis of verbal utterances. Besides brainstem centers and the insula (see comments by Takahashi & Ghazanfar), most importantly, clinical and functional imaging studies in humans suggest the rhythmical organization of verbal vocal behavior to be associated with the basal ganglia (e.g., Ackermann et al. 1997b; Konczak et al. 1997; Riecker et al. 2002). Furthermore, rhythmical entrainment processes during speech production may serve as a target of therapeutic intervention techniques in speech-disordered patients (e.g., Brendel & Ziegler 2008). So far, nevertheless, very little is known about the neural mechanisms underlying the production of rhythmic communication signals in human and nonhuman primates (as Takahashi & Ghazanfar point out in their commentary), and this issue, surely, deserves further investigations.

The commentary by Takahashi & Ghazanfar draws attention to, among other things, experimental work on lip-smacking in nonhuman primates, an emotional social signal whose frequency largely corresponds to the syllabic rhythm of human speech. It is an intriguing idea – and a valuable expansion of the frame/content concept developed by MacNeilage and Davis (2001; see also MacNeilage 1998; 2008) – that the superimposition of a voice signal onto the lip-smacking cycle in gelada baboons has rendered this social signal audible and may, thereby, have paved the way for the evolution of speech as a rhythmical oral-facial-laryngeal activity within auditory-visual displays. From the perspective of the model developed in our target article, however, the notion of two parallel layers of lip-smacking and vocalization behavior still lacks an important ingredient: The phonatory mechanisms generating the voice signal during speaking involve a precisely timed and smooth interaction of laryngeal gestures with the movements of supralaryngeal movements as sketched in Figure 2C of the target article. Considering the “inextricable link between vocal output and facial expressions” mentioned by Takahashi & Ghazanfar, comparative investigations of the neural bases of vocal behavior and non-vocal facial expression are definitely warranted. As noted in the commentary by Meguerditchian et al., the vocal behavior of chimpanzees is associated, depending upon communicative content, with differential orofacial motor asymmetries.

R2.4. Commonalities between birdsong and human spoken language: A more adequate vantage point for scenarios of spoken language evolution?

Apart from a brief final paragraph related to birdsong, the target article focuses on precursors of spoken language within the primate clade, trying to “delineate how these remarkable motor capabilities [underlying speech production] could have emerged in our hominin ancestors” (target article, Abstract). Four commentaries plead for a broader perspective, including, especially, avian vocal behavior (Beckers, Berwick, & Bolhuis [Beckers et al.], Merker, Petkov & Jarvis, Pezzulo et al.). Beckers et al. even raise the concern that – with respect to speech and language – “common descent may not be a reliable guiding principle for comparative research” and, most importantly, that this approach may miss the unique aspects of language per se, “given the already strong parallels between humans and songbirds in terms of auditory-vocal imitation learning, and the often remarkable articulatory skills in many avian species” (see also the first paragraph of the commentary by Merker for a similar argument). It goes without saying that a broader perspective would have provided a more elucidating scenario, and might have helped to define the major constraints acting upon speech evolution mechanisms and to narrow down research questions in primate studies. But all the commonalities between human verbal communication and the acoustic behavior of non-primate mammals or songbirds cannot dispense us from the challenge of clarifying – “in sufficient detail – how highly vocal, but speechless primates have ultimately acquired the unique motor capabilities that enable us to gossip in well-articulated utterances.” As a matter of fact, “there is little direct comparative evidence in the primate literature to suggest that the cortico-striatal-thalamic system is strikingly different in humans relative to nonhuman primates” (Petkov & Jarvis). In our proposal, the differences are restricted to the vocal domain and involve a – within the primate lineage – human-specific vocal elaboration of otherwise primate-general cortico-striatal circuits, allowing for the sequencing of laryngeal and supralaryngeal gestures according to auditory templates (see comments of Zenon & Olivier for a discussion of sequencing as a basic basal ganglia function, see also Lieberman’s commentary).

R3. The basal ganglia in mature speech production and affective-vocal behavior: A major player or a negligible factor?

Based upon behavioral and neurobiological data obtained from nonhuman primates and from our species, we have argued for a crucial role of the basal ganglia during mature speech production in terms of the implementation of emotive prosody, that is, the “affective tone” of verbal utterances. A series of recent functional imaging studies,

indeed, provides further evidence for an engagement of the basal ganglia in affective-vocal behavior, as highlighted in the commentary by Frühholz, Sander, & Grandjean (Frühholz et al.). However, we are by no means suggesting that basal ganglia functions are restricted to “just simple emotional prosodic modulation” – a critical objection brought forward by Ravignani et al. By contrast, we fully acknowledge that “the basal ganglia support multiple functions relevant to spoken language” and that, more specifically, these subcortical structures must be expected to engage in “complex syntactic and semantic processing in adults” (see fifth paragraph of the commentary by Ravignani et al.). Against the background of several parallel but interacting basal ganglia loops, including limbic, motor, and cognitive components (see, e.g., Fig. 3 of the target article), multiple contributions of the basal ganglia to speech and language are not only conceivable, but must even be expected. Thus, we agree that syntactic (Teichmann et al. 2005; Ullman 2001) and semantic (Cardona et al. 2013) processes may hinge upon cortico-striatal circuits (see also our response to Lieberman in subsequent paragraphs).

Furthermore, the target article by no means “assumes that prosodic modulation of speech conveys mainly simple motivational-emotional information” – a concern raised by Ravignani et al. (see Note 1 of the target article). We excluded linguistic prosody from our review because the modulation of prosody by human-specific cognitive functions (e.g., syntax) is, most presumably, a component of the left-hemisphere language system and must be strictly separated – both at the functional and the neuroanatomic level – from emotive prosody (see, e.g., Sidtis & Van Lancker Sidtis 2003). As a consequence, we fully support the suggestion that linguistic prosody is related to “human-specific cognitive functions,” which – in contrast to emotional tone – “are clearly not evolutionary homologues of primate emotional vocalizations” (Ravignani et al.).

The first part of Lieberman’s comments also raises a strong argument for a broad variety of motor, cognitive, and behavioral functions of the basal ganglia, based upon “a network of segregated cortical-to-basal neural circuits linking areas of motor cortex and prefrontal cortex.” The common basic operation across these domains seems to be the task-dependent “switching” between motor and cognitive responses or movements during “internally guided acts.” Section 4.3.1. of the target article pays full credit to this firmly established model. Nevertheless, more recent work shows that interconnections between these loops are also of considerable importance (see Fig. 3 of the target article), especially in order to better understand the striatal interface of emotional/motivational and motor functions as well as the psychomotor aspects of striatal disorders (see, e.g., Jankovic 2008).

While we support the main thrust of Lieberman’s argument, we have some concerns over the clinical data referred to, that is, the contention that the “speech production deficits of Parkinson’s disease and focal lesions to the basal ganglia are qualitatively similar to ones occurring in aphasia.” As regards speech motor impairments in a narrow sense, there is definitely no similarity between Parkinson’s dysarthria, on the one hand, and speech apraxia or phonological impairments after left anterior cortical lesions, on the other. We acknowledge that disorders of the basal ganglia have been observed to give rise – though rather infrequently – to mostly transitory syndromes of an aphasia (but not compromised speech), and the concept of “subcortical aphasia” has been widely acknowledged. Nevertheless, any interpretation of these findings in terms of the relevant functional-neuroanatomic substratum must take into account alternative interpretations. First, left-hemispheric subcortical lesions may give rise to diaschisis effects within the overlying fronto-temporal cortex, that is, hypometabolism – and subsequent dysfunction – of the perisylvian “language zones” (Weiller et al. 1993). Second, more advanced stages of Parkinson’s disease and so-called atypical Parkinsonian syndromes may be associated with damage to cortical areas affecting, eventually, language functions.

A further critical comment put forward by Ravignani et al. also relates to the role of the basal ganglia in higher-order language processing. Based on experiments probing the learning of novel syntactic structures in adults, they claim that these subcortical nuclei engage in the retrieval – rather than the acquisition – of overlearned procedures, implicitly suggesting that a similar relationship should hold for motor speech processes as well. Yet, the short-term encoding of artificial syntactic structures under experimental conditions in adulthood and their subsequent retrieval are not necessarily the same thing as the long-term acquisition of speech motor routines during infancy and childhood, and their retrieval in adults need not depend on the same cerebral network. These suggestions could explain why the findings of novel syntax learning experiments are not compatible with the clinical data obtained from speech-disordered infants and adults cited in our target article (sect. 4.3.2.), which demonstrate that the engagement of the basal ganglia declines – though it does not necessarily cease – across the time course of speech acquisition.

Commentators Hasson, Llano, Miceli, & Dick (Hasson et al.) raise principal concerns over the “viability of BG [basal ganglia] as a speech/emotion synthesizer,” since these subcortical structures lack “the capacity to monitor and correct for related errors, that is, evaluate that the intended emotive tone/prosody was instantiated.” They argue that: (i) The basal ganglia cannot provide the necessary fast auditory feedback; (ii) processing of emotive prosody is mainly bound to lateral-temporal systems of the cortex; and (iii) basal ganglia dysfunctions fail to compromise the perception of “emotional speech variations.” Parenthetically, it is indeed the case that patients with Parkinson’s disease – at least in more advanced stages – may show impaired emotion recognition (see, e.g., Breitenstein et al. 1998). More importantly, however, the basic premise of the argument is – in our view – unwarranted. We by no means want to curtail the relevance of (auditory) feedback within the domain of (speech) motor control, but why must the basal ganglia – in order to implement emotive prosody – be embedded into a “fast” feedback loop? Rather, as suggested by Frühholz et al., the “temporal slow prosodic modulations of emotional speech … seem to rely on feedback processing in the AC [auditory cortex].” But whatever the role of auditory feedback within the area of vocal-emotional processing, the suggestions of Hasson et al. are at variance with a solid tradition of clinical neurology. All Parkinsonian symptoms are, for example, “dependent on the emotional state of the patient” (Jankovic 2008). Based upon, among other things, such


observations, it is widely acknowledged that the basal ganglia operate as a dopamine-dependent interface between the limbic system and various motor areas (see, e.g., Mogenson et al. 1980, referred to in sect. 4.2.2. of the target article). Vocal-affective expression represents just one aspect of this broader spectrum of psychomotor basal ganglia functions (the second part of the commentary by Zenon & Olivier provides a lucent account of these relationships). The projections from the limbic to the motor basal ganglia loop can be considered the neurobiological substratum of psychomotor interactions, and this circuitry represents – contrary to the claims by Hasson et al. – a relatively well-established functional-neuroanatomic model at this time, extending from the level of systems physiology to the level of molecular biology (see sect. 4.3. of the target article).

Besides the structures depicted in Figure 4 of the target article, which centers around the basal ganglia, further cortical and subcortical structures engage in speech motor control or, more generally, contribute to verbal communication – such as the anterior cingulate cortex (briefly referred to in the last part of sect. 4.3.1. of the target article), rostral parts of the inferior frontal gyrus, auditory cortical areas, and the cerebellum (see Frühholz et al.’s commentary). Whereas these regions do not play a significant role in our argument, we, nevertheless, highly appreciate Frühholz et al.’s Figure 1, which incorporates the afore-mentioned structures into Figure 4 of the target article. Interestingly, both Hasson et al.’s and Frühholz et al.’s commentaries proffer the cerebellum – rather than the basal ganglia – as the region most likely to “imbue speech with emotive content” (Hasson et al.’s phrase for the role these authors see us attributing to the BG). A significant contribution of the “small brain” to speech motor control is beyond any dispute (Ackermann 2008), though, in parentheses, the cerebellum does not appear to pertain to the cerebral network underlying acoustic communication in nonhuman primates (e.g., Kirzinger 1985). However, cerebellar disorders do not give rise to a constellation of motor aprosodia, that is, a monotonous and hypophonic voice lacking affective deflections as in Parkinson’s disease (for reviews, see Ackermann & Brendel, in press; Ackermann et al. 2007). Instead, the syndrome of ataxic dysarthria is predominantly characterized by articulatory deficits with irregular distortions of consonants and vowels. The cerebellar cognitive affective syndrome – referred to by Hasson et al. – has been reported, admittedly, to comprise abnormalities of speech prosody in terms of a high-pitched voice of a “whining, childish and hypophonic quality,” emerging, especially, in bilateral or generalized disease processes (Schmahmann & Sherman 1998, p. 564; eight patients out of a total of 20 subjects with cerebellar pathology). Most presumably, these perceived voice abnormalities reflect impaired lower-level, that is, reflex-mediated control of pitch stability in a subgroup of cerebellar patients as documented, for example, by Ackermann and Ziegler (1994) – rather than a compromised ability to “imbue speech with emotive content.”

Vicario points out that the target article does not pay any attention to the role of serotonin, that is, “another key monoamine of the reward system” besides the neurotransmitter dopamine. We highly appreciate this observation. Apart from Parkinson’s disease, major depression may also give rise to a monotonous/hypophonic voice lacking affective deflection (e.g., Alpert et al. 2001; Cohn et al. 2009; Ellgring & Scherer 1996), and this clinical constellation is assumed to be associated with an imbalance of serotonergic (and noradrenergic) neurotransmission – a still central, though not sufficient pathophysiological model (Massart et al. 2012). Vicario speculates that “dopamine subserves reward-oriented (e.g., approach) communication, while serotonin subserves punishment-oriented (e.g., threat) communication.” Conceivably, thus, both dopamine and serotonin depletion might converge upon “motor aprosodia” as a default vocal constellation. In contrast to dopamine, unfortunately, the neurobiological bases of serotonin effects are still by far less elaborated. Any attempt towards an integration of both systems into a common functional-neuroanatomic framework of the control of vocal behavior remains, thus, premature at the moment.

R4. Basal ganglia and ontogenetic speech acquisition: A so far neglected role of cortico-striatal circuits?

Besides adult speech production (see sect. R3.) and phylogenetic language evolution (see sects. R5.1. and R5.2.), the target article proposes a crucial role of the basal ganglia in the ontogenetic development of verbal communication. Several commentaries correctly point at the multilevel and multifaceted organization of an individual’s speech development and, correctly, complain that the target article misses one or another aspect of this more complex picture: For example, (i) “the impact of the proximal social environment” (Aitken) on the ontogenetic emergence of communicative capacities (Aitken and Bornstein & Esposito); (ii) the influence of auditory-perceptual abilities already available to newborns and young infants (auditory streaming, speech sound discrimination, melody processing) upon vocal imitation capacities (Lenti Boero; see also Reser & Rosa for the domain of nonhuman primates); (iii) the role of comprehension “which almost by law ontogenetically and cognitively precedes production” during speech development (Bornstein & Esposito); (iv) the – highly intriguing – influence of listening to the vocalizations of nonhuman primates on cognitive core-capacities such as concept formation in infants during the first months of life (Ferguson, Perszyk, & Waxman [Ferguson et al.]); (v) the “possibility to refer to an object” (Lenti Boero); and (vi) the obvious fact that speech motor plasticity does not – or at least must not – end after childhood (McGettigan & Scott).

At the end of the target article (sect. 7, “Conclusions”), we have briefly mentioned the importance of auditory-motor networks and the social environment within the context of phylogenetic language evolution. We readily acknowledge that these functional interconnections also hold for ontogenetic speech development. However, the target article focuses on a distinct, but crucial, motor aspect of the acquisition of articulate speech, that is, the concatenation of vocal tract movements into coarticulated syllabic sequences; and a more exhaustive account would have been beyond the scope of the review. Nevertheless, two commentaries touch upon the motor level of ontogenetic speech development. Whereas the target article focuses on the emergence of increasingly overlearned sequences of consonant-vowel syllables, the commentaries by Oller

and Lenti Boero further specify the preverbal vocalizations of infants.

Oller points out that “phonatory events” (“protophones”) lacking significant supralaryngeal, that is, articulatory, modification characterize the early stages of human vocal development, especially, the first 3 months of life. These observations indicate the maturation of the laryngeal apparatus to precede the maturation of the cortico-striatal circuits bound to language production. At least in this regard, ontogeny, thus, appears to recapitulate phylogeny. Furthermore, Lenti Boero highlights the “radical transformation” of human vocal behavior during the first year of life, that is, “the substitution of the cry, an analog signal . . . with articulated speech-like sounds.” Whereas infant cries, most presumably, depend upon a primate-general cerebral network, it is, in our view, the cortico-striatal circuitry which then steps in as a prerequisite of speech motor learning.

Our focus on the contribution of cortico-striatal circuits to speech acquisition in childhood by no means excludes a persisting engagement of the basal ganglia in speech motor plasticity mechanisms at a more advanced age. Indeed, as illustrated by McGettigan & Scott in their comment, adaptive adjustments of speaking extend well into adulthood and even senescence – in response to a variety of internal and external conditions such as alterations of peripheral-anatomic structures during aging or ambient dialectal influences causing gradual sound changes in adults. We are not aware of any data supporting the implication of the basal ganglia in such extended speech motor adaptation mechanisms, but a recent functional imaging study found cortico-striatal circuits to be engaged in second language vocabulary learning (Hosoda et al. 2013; see commentary by Hanakawa & Hosoda). Since the experimental design of this study emphasized pronunciation training, the task must, apparently, have challenged the motor aspects of speech production. Though adult second language learning cannot be equated with the adaptive mechanisms influencing adult speech, these data point at least at the possibility of a significant contribution of the basal ganglia to a continuing process of modulation of motor speech mechanisms across adulthood – based, presumably, upon dopaminergic reward signals associated with successful articulatory performance (see also the comments by Vicario, and further discussion below). Hence, our proposal does not assume two distinct computational subsystems of the basal ganglia supporting immature and mature speech motor control, respectively. We rather aimed at presenting a model in which these subcortical nuclei assume two roles, that is, (i) a system supporting speech motor learning mechanisms, and (ii) a pivot between motivational-emotive and volitional mechanisms during speaking, with a gradual decrease of the importance of the former component during the maturation of speech motor control.

Any attempt towards a more comprehensive neurobiological model of human speech production, integrating phylogenetically older (vocal-emotional displays, including affective prosody) and more recent components (construction of syllables and wordforms), must address the contribution of the various central pattern generators of the brainstem to spoken language (see sect. 3.1. and Fig. 4 of the target article). Admittedly, however, the respective discussion of the target article has a still highly preliminary character – because (adult) speech pathology lacks adequate clinical model systems. Marschik, Kaufmann, Bölte, Sigafoos, & Einspieler (Marschik et al.) point at a further approach to the analysis of the operation of the central pattern generators within the speech domain, that is, neurodevelopmental disorders such as Rett syndrome, a highly promising future research area.

R5. Paleoanthropological perspectives of articulate speech acquisition: How did peripheral and cerebral adaptations interact, and does a focus on functional anatomy miss the crucial parts of the story?

R5.1. Corticobulbar-laryngeal and striatal contributions to spoken language evolution: Who takes the lead?

The introductory section of the target article suggests the “inability of nonhuman primates to produce even the most simple verbal utterances” to be due to “more crucial” cerebral limitations of motor control rather than vocal tract anatomy (sect. 1.1, para. 3). Deliberately, this formulation (“more crucial”) does not exclude additional phylogenetic adaptations of the human speech apparatus at a peripheral level, including the shape of the vocal folds – as suggested by de Boer & Perlman. These authors hint at a larger source-filter coupling in apes as compared to human vocal tract anatomy – an observation that seems to reinforce our notion of the human larynx as an independent and coordinate player within the orchestra of speech organs (see Fig. 2C of the target article). Obviously, the strongly coupled source-filter system of apes does not allow for the same versatility of acoustic pattern generation as the (relatively) uncoupled human system. As a consequence, the control of the more independent source and filter mechanisms of the human vocal apparatus – specifically, the coordination of laryngeal and supralaryngeal gestures – must involve the regulation of a greater number of degrees-of-freedom and, therefore, should require enhanced neural control mechanisms. Against this background, the “vocal elaboration” of the cortico-striatal circuitry described in our model nicely meshes with the peripheral vocal tract modifications that may have occurred within the hominin lineage – in line with the comments by de Boer & Perlman.

Lieberman strongly rejects the assumption of a major contribution of monosynaptic corticobulbar connections to the phylogenetic development of articulate speech: He writes, “in itself, enhanced laryngeal control of phonation would not have yielded the encoding of segmental phonemes that is a unique property of human speech.” In stark contrast, Merker deemphasizes the role of the basal ganglia and puts the corticobulbar connections to the front of the stage: He suggests “it is even conceivable that the ‘simple’ addition, in ancestral Homo, of a direct primary motor cortex efference to . . . medullary motor nuclei sufficed to recruit the already present cerebral territories centered on Wernicke’s and Broca’s areas (...) to the practice-based acquisition of complex vocal output” in terms of articulate speech. In this perspective, the role of “FOXP2 enhancement of cortico-basal ganglia function in the human line” is restricted to the provision of “extra storage capacity” (Merker). As convincingly argued for by Lieberman in his commentary (and relevant books), enhanced, FOXP2-driven “basal ganglia synaptic plasticity and connectivity” represents a necessary prerequisite for


vocal learning, including speech acquisition. In accordance with the commentary of Merker, we assume, however, that enhanced cerebral control of the larynx via monosynaptic corticobulbar connections represents a necessary prerequisite of speech production as well, providing, for instance, the basis for the generation of fast, ballistic laryngeal gestures such as those engaged in the production of unvoiced stop consonants (two-stage model of the phylogenetic development of articulate speech; see target article, Abstract).

R5.2. FOXP2-driven striatal reorganization during spoken language evolution

The (second part of the) commentary by Aitken provides a concise review of the multiple linguistic/nonlinguistic targets of FOXP2 (and its nonhuman cognates) across a variety of species as well as the linguistic/nonlinguistic dysfunctions following disruption of this gene locus. It concludes: “FOXP2 is insufficient to account for the development of human language or its neural and neurochemical substrates. It is a proxy marker for the genetic control of complex biological systems we are only beginning to define or understand.” Similarly, Johansson curtails the contribution of this gene to phylogenetic language development: “The changes in FOXP2 in the human lineage quite likely are connected with some aspects of language, but the connection is not nearly as direct as early reports claimed, and as Ackermann et al. apparently assume.”

We fully agree with these statements, which deny an – exclusive and/or exhaustive – contribution of FOXP2 to the evolution of the human language system. Our model proposes only a significant – and necessary – contribution of FOXP2 to the phylogenetic emergence of motor aspects (!) of spoken language (we leave open the question of an engagement in higher-order cognitive dimensions of acoustic communication, see our response to Lieberman above). Against this background, we really – in the words of Johansson (2005, p. 27) – “begin to define or understand the genetic control of the complex biological system” of spoken language at the motor level since a plausible account of the underlying neurophysiological mechanisms and molecular-biological substrata can be envisaged in terms of enhanced “basal ganglia synaptic plasticity and connectivity” (Lieberman).

Admittedly, “the apparent presence of human FOXP2 in Neanderthals does not in itself prove that Neanderthals spoke” (an argument put forward by Johansson) in terms of mastering the syntactic, semantic, and pragmatic level of a full-fledged language system, and the target article does not make such a claim. Yet, there is no reason to assume that Neanderthals were “quiet people” who “lacked completely articulate speech” (Fagan 2010, Ch. 4). We think that Neanderthals – even if they did not attain higher-order linguistic capabilities – had the functional-anatomic prerequisites to enrich their “Hmmmmm” vocalizations (Mithen 2006) by syllabic articulatory gestures – giving rise, presumably, to more salient vocal displays (some kind of elaborated “babbling”). The target article leaves open the question of the origin of the human FOXP2 variant in Neanderthals and does not – cannot – rule out the still controversial topic of interbreeding between these two hominin species. However, this issue is not a crucial aspect of our argument, which rests upon the notion that at least the functionally relevant human FOXP2 mutation arose in a large brain with monosynaptic corticobulbar connections to the distal cranial nerve nuclei at its disposal. Any modifications of the proposed scenario that shift these events into a more recent time window do not compromise our suggestions.

Two commentaries raise concerns over the paleoanthropological scenario put forward in the target article, linking the emergence of articulate speech to a preceding elaboration of nonverbal vocal displays. Ravignani et al. challenge the – alleged – assumption of our model that “enhancement of in-group cooperation and cohesion was the main driving force for the evolution of speech” (their words). And Johansson claims: “Vocal displays as the selective driver of protolanguage evolution (...) are highly unlikely, as they would drive the evolution of something more resembling birdsong than language.” First, FOXP2-driven striatal reorganization in humans does not give rise to “something more resembling birdsong than language” since it took place within a human brain, endowed with a highly differentiated conceptual system even, most presumably, prior to the emergence of language (see, e.g., Hurford 2007). And, furthermore, this development played out in a more elaborate social environment as compared to other species (see commentaries by Catania and Pezzulo et al.).

In our view, second, preverbal vocal displays – whether or not within the context of coordinated group activities – served as a preadaptation for speech acquisition rather than a “selective driver of protolanguage evolution.” More specifically, vocal displays enriched by sequences of syllable-sized articulatory gestures (resembling elaborated “babbling” instead of “Hmmmmm”; see above) could have supported and promoted the initial stages of the phylogenetic trajectory towards spoken language – at a point in time when the benefits of a full-fledged spoken language were not yet available, not even imaginable. Most importantly, this model aims at an answer to the quest for the adaptive benefits of a “first word” as raised by Bickerton (2009; see second last paragraph of sect. 5.2. in the target article). The commentaries by Catania as well as Pezzulo et al. provide lucid and valuable ideas relevant for a further specification of the forces which “might have contributed to transform vocalization from an initially quite limited sensorimotor feat to a powerful, open-ended instrumental tool that permits conveying rich communicative intentions” (Pezzulo et al.). For example, the more sophisticated interactions at the disposal of our species, such as joint attention (Pezzulo et al.) and/or environmental contingencies in the social context of how “one human can get another to do something” (Catania), should have paved the way towards a verbal code of acoustic communication – after a FOXP2-driven vocal reorganization of cortico-striatal circuits provided the sensorimotor prerequisites of spoken language.

R5.3. Extensions of the proposed model of phylogenetic articulate speech development

The new “dual-pathway model” of language evolution presented in the target article is vividly rejected by Clark because it omits “the recent small, but credible, neuroimaging literature which contradicts this assertion and implicates human cortico-striatal-thalamic circuitry in disambiguating lexical (…), grammatical (…), and semantic (…)

uncertainties in perceived language.” Most presumably, the task of disambiguation of verbal utterances rather hinges predominantly on cortical areas (see, e.g., Wittforth et al. 2010). In any case, there is ample clinical and experimental evidence for multiple contributions of the basal ganglia to language perception and production, and the model of multiple cortico-striatal loops (see above) allows these subcortical nuclei to subserve both motor-limbic and cognitive aspects of spoken language. More specifically, elementary basal ganglia operations such as the generation and filtering of signal variances – as assumed by Clark in his commentary (second paragraph) – may be recruited within different domains of behavior (see also the comments by Zenon & Olivier and Lieberman). Interestingly, these comments put the suggestion of a contribution of cortico-striatal circuits to the disambiguation of vocal behavior/verbal information into an evolutionary context: The basal ganglia are assumed to set “limits on useful complexity of naturally communicated information” (Clark) in terms of a trade-off between the (desired) signal recognition by intended observers and (unwanted) social eavesdropping. Although Clark does not further specify the mechanisms of the assumed cortico-striatal “complexity scaling of communication,” assumed to extend “along the continuum of signals to protolanguage to language,” these considerations, nevertheless, touch upon a significant problem of language evolution: Whereas a speaker should take measures to safeguard the signal against social eavesdroppers, a listener must ascertain signal honesty. Increased voluntary control over vocal behavior and the “low costs” of verbal utterances facilitate deception and raise the question of how trust as a prerequisite of human cooperation can emerge and be maintained (e.g., Sterelny 2012, Ch. 5). Rather than the basal ganglia, enhanced mind-reading capabilities and memory storage capacities – associated with neocortical areas – must be considered the relevant tools for the evaluation of the reliability of a signal’s content.

The contribution by Mattei adds an interesting novel aspect to the evolutionary scenario of the target article, which further strengthens – in our view – the suggested proposal: This commentary puts the paleoanthropological inferences of the target article into the perspective of complex adaptive system (CAS) analysis and highlights that the phylogenetic processes driving the emergence of speech production within the hominin lineage – “refinement in the projections from the motor cortex to the brainstem nuclei . . . as well as the further development of vocalization-specific cortico-basal ganglia circuitries” – can be considered a “breakthrough change” of signaling resources triggering the “percolation of the whole system and the emergence of new unpredictable features” (Mattei). As a consequence, relatively small reorganizational processes within the motor system may have supported “the emergence of high-level cognitive functions . . . from ancestral structures already present in nonhuman primates” (as Zenon & Olivier observe).

R6. Summary/conclusions

The target article focuses upon the – often neglected – motor aspects of spoken language evolution and emphasizes the crucial role of a vocal elaboration of cortico-striatal circuits within the hominin lineage – driven, most presumably, by a human-specific variant of the FOXP2 gene. As a consequence, the control of the laryngeal sound source could have become part of a meshwork of phonetic gestures that are molded – via precisely defined phase-relationships – into syllable-shaped motor patterns. Such a phylogenetic reorganization of the basal ganglia must be considered necessary, but does not represent an already sufficient prerequisite for ontogenetic speech acquisition in our species – as demonstrated by the highly appreciated comments to the target article. Furthermore, the various commentaries point at a series of research questions which deserve further consideration and which are accessible to clinical/experimental investigations in our species as well as, at least partially, nonhuman primates. For example:

(a) Basal ganglia: Given a multitude of distinct cortico-striatal circuits, a “variegated” engagement of the basal ganglia in human communication must be taken into account, including, among other things, the modulation of higher-order aspects of speech production – bound, presumably, to the operation of the so-called “cognitive loop” – and the integration of vocal and non-vocal (facial, gestural) aspects of emotional expression. Against the background of well-established analogies between the human or mammalian basal ganglia and the avian “song brain,” the interactions of the cortico-striatal circuits with the central-auditory system both during ontogenetic speech acquisition and mature speech production must be addressed in more detail. Finally, the conceivable interactions between the neurotransmitter serotonin and the “striatal messenger” dopamine during vocal-emotional expression await further elucidation.

(b) Speech motor control mechanism: The relationship between vocal tract movement sequencing – the focus of the target article – and the rhythmic structure of verbal utterances as well as other domains of behavior must be further addressed in a comparative-biological perspective. For example, the influential frame/content model of speech development (MacNeilage 2008) points at the supplementary motor area (SMA) as a crucial component of the cortical network of spoken language production, a mesiofrontal structure tightly interconnected with the basal ganglia.

(c) Ontogenetic speech acquisition: The suggested model of a pivotal role of the basal ganglia during ontogenetic speech/language development must be further substantiated. As an important research perspective within the clinical domain, the articulatory/phonatory deficits due to specific cerebral disorders such as Rett syndrome or isolated damage to the putamen must be further characterized, based upon hypothesis-driven fine-grained perceptual and acoustic evaluation procedures. Furthermore, the notion of a pivotal contribution of the basal ganglia to the ontogenetic acquisition of speech motor skills must be embedded into a broader framework, including the preceding subverbal stages of vocal behavior and higher-order aspects of phonological development.

Unfortunately, the most interesting aspect of spoken language, that is, its emergence in the first place, eludes so far a more direct examination, although molecular-genetic data begin to shed some light on this issue. As exemplified by the commentaries on the target article, this light does not yet unravel a brightly illuminated and, thus, unambiguous scenario. Nevertheless, the FOXP2-story nicely fits into


the context of our current understanding of speech motor control mechanisms and primate vocal behavior. Ultimately, we hope that the suggestions of the target article on phylogenetic and ontogenetic speech acquisition, centered around the basal ganglia, will help to pave the way towards a better understanding of the “end-point” of these developmental trajectories, that is, the cortical organization of mature speech production in relation to, for example, the hemispheric lateralization effects of communicative behavior in our closest cousins.

References

[The letters “a” and “r” before author’s initials stand for target article and response references, respectively]

Abe, K. & Watanabe, D. (2011) Songbirds possess the spontaneous ability to discriminate syntactic rules. Nature Neuroscience 14:1067–74. [GJLB]
Aboitiz, F., García, R. R., Bosman, C. & Brunetti, E. (2006) Cortical memory mechanisms and language origins. Brain and Language 98:40–56. [TAM]
Ackermann, H. (2008) Cerebellar contributions to speech production and speech perception: Psycholinguistic and neurobiological perspectives. Trends in Neurosciences 31(6):265–72. doi: 10.1016/j.tins.2008.02.011. [aHA, SF, GP]
Ackermann, H. & Brendel, B. (in press) Cerebellum. In: The neurobiology of language, ed. G. Hickok & S. L. Small. Elsevier. [rHA]
Ackermann, H., Hertrich, I., Daum, I., Scharf, G. & Spieker, S. (1997a) Kinematic analysis of articulatory movements in central motor disorders. Movement Disorders 12:1019–27. [aHA]
Ackermann, H., Hertrich, I. & Ziegler, W. (2010) Dysarthria. In: The handbook of language and speech disorders, ed. J. S. Damico, N. Müller & M. J. Ball, pp. 362–90. Wiley-Blackwell. [aHA]
Ackermann, H., Hertrich, I., Ziegler, W., Bitzer, M. & Bien, S. (1996) Acquired dysfluencies following infarction of the left mesiofrontal cortex. Aphasiology 10:409–17. [aHA]
Ackermann, H., Konczak, J. & Hertrich, J. (1997b) The temporal control of repetitive articulatory movements in Parkinson’s disease. Brain and Language 56:312–19. [arHA]
Ackermann, H., Mathiak, K. & Riecker, A. (2007) The contribution of the cerebellum to speech production and speech perception: Clinical and functional imaging data. Cerebellum 6:202–13. [rHA]
Ackermann, H. & Riecker, A. (2004) The contribution of the insula to motor aspects of speech production: A review and a hypothesis. Brain and Language 89:320–28. [DYT]
Ackermann, H. & Riecker, A. (2010a) Cerebral control of motor aspects of speech production: Neurophysiological and functional imaging data. In: Speech motor control: New developments in basic and applied research, ed. B. Maassen & P. van Lieshout, pp. 117–34. Oxford University Press. [aHA]
Ackermann, H. & Riecker, A. (2010b) The contribution(s) of the insula to speech production: A review of the clinical and functional imaging literature. Brain Structure and Function 214:419–33. [aHA]
Ackermann, H., Vogel, M., Petersen, D. & Poremba, M. (1992) Speech deficits in ischaemic cerebellar lesions. The Journal of Neuroscience 239(4):223–27. [UH]
Ackermann, H. & Ziegler, W. (1992) Cerebellar dysarthria: A review. Fortschritte der Neurologie und Psychiatrie 60:28–40. (German). [aHA]
Ackermann, H. & Ziegler, W. (1994) Acoustic analysis of vocal instability in cerebellar dysfunctions. Annals of Otology, Rhinology and Laryngology 103:98–104. [rHA]
Ackermann, H. & Ziegler, W. (1995) Akinetic mutism: A review of the literature. Fortschritte der Neurologie und Psychiatrie 63:59–67. (German). [aHA]
Ackermann, H. & Ziegler, W. (2010) Brain mechanisms underlying speech motor control. In: The handbook of phonetic sciences, 2nd edition, ed. W. J. Hardcastle, J. Laver & F. E. Gibbon, pp. 202–50. Wiley-Blackwell. [aHA]
Ackermann, H. & Ziegler, W. (2013) A “birdsong perspective” on human speech production. In: Birdsong, speech, and language: Exploring the evolution of mind and brain, ed. J. J. Bolhuis & M. Everaert, pp. 331–52. MIT Press. [aHA]
Adams, C. L., Molfese, D. L. & Betz, J. C. (1987) Electrophysiological correlates of categorical speech perception for voicing contrasts in dogs. Developmental Neuropsychology 3(3–4):175–89. [DLB]
Aitken, K. J. (2008) Intersubjectivity, affective neuroscience, and the neurobiology of autistic spectrum disorders: A systematic review. Keio Journal of Medicine 57(1):15–36. [KJA]
Aitken, K. J. & Trevarthen, C. (1997) Self/other organization in human psychological development. Development and Psychopathology 9(4):653–77. [KJA]
Aitken, P. G. (1981) Cortical control of conditioned and spontaneous vocal behavior in rhesus monkeys. Brain and Language 13:171–84. [aHA]
Aitken, P. G. & Wilson, W. A., Jr. (1979) Discriminative vocal conditioning in rhesus monkeys: Evidence for volitional control? Brain and Language 8:227–40. [aHA]
Albin, R. L., Young, A. B. & Penney, J. B. (1989) The functional anatomy of basal ganglia disorders. Trends in Neurosciences 12:366–75. [aHA]
Alcock, K. J., Passingham, R. E., Watkins, K. E. & Vargha-Khadem, F. (2000a) Oral dyspraxia in inherited speech and language impairment and acquired dysphasia. Brain and Language 75(1):17–33. doi: 10.1006/brln.2000.2322. [aHA, AZ]
Alcock, K. J., Passingham, R. E., Watkins, K. E. & Vargha-Khadem, F. (2000b) Pitch and timing abilities in inherited speech and language impairment. Brain and Language 75:34–46. [aHA]
Alexander, G. E., Crutcher, M. D. & DeLong, M. R. (1990) Basal ganglia-thalamocortical circuits: Parallel substrates for motor, oculomotor, “prefrontal” and “limbic” functions. In: The prefrontal cortex: Its structure, function and pathology, ed. H. B. M. Uylings, C. G. van Eden, J. P. C. de Bruin, M. A. Corner & M. G. P. Feenstra, pp. 119–46. Elsevier. (Elsevier Book Series on Neuroscience: Progress in Brain Research, vol. 85). [aHA]
Alhadeff, A. L., Rupprecht, L. E. & Hayes, M. R. (2012) GLP-1 neurons in the nucleus of the solitary tract project directly to the ventral tegmental area and nucleus accumbens to control for food intake. Endocrinology 153:647–58. [CMV]
Alipour, M., Chen, Y. & Jürgens, U. (2002) Anterograde projections of the motorcortical tongue area in the saddle-back tamarin (Saguinus fuscicollis). Brain and Behavioral Evolution 60:101–16. [CMV]
Alpert, M., Pouget, E. R. & Silva, R. R. (2001) Reflections of depression in acoustic measures of the patient’s speech. Journal of Affective Disorders 66:59–69. [rHA]
Andres, M., Olivier, E. & Badets, A. (2008) Actions, words, and numbers: A motor contribution to semantic processing? Current Directions in Psychological Science 17(5):313–17. doi: 10.1111/j.1467-8721.2008.00597.x. [AZ]
Arbib, M. A. (2005) From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics. Behavioral and Brain Sciences 28:105–24. [ARL]
Arbib, M. A. (2006) The Mirror System Hypothesis on the linkage of action and language. In: Action to language via the mirror neuron system, ed. M. A. Arbib, pp. 3–47. Cambridge University Press. [aHA]
Arnold, K. & Zuberbühler, K. (2006) Semantic combinations in primate calls. Nature 441(7091):303. [aHA, KBC]
Arriaga, G. & Jarvis, E. D. (2013) Mouse vocal communication system: Are ultrasounds learned or innate? Brain and Language 124(1):96–116. doi: 10.1016/j.bandl.2012.10.002. [BM, CIP]
Arriaga, G., Zhou, E. P. & Jarvis, E. D. (2012) Of mice, birds, and men: The mouse ultrasonic song system has some features similar to humans and song-learning birds. PLOS ONE 7(10):e46610. doi: 10.1371/journal.pone.0046610. [CIP]
Arroyo, S., Lesser, R. P., Gordon, B., Uematsu, S., Hart, J., Schwerdt, P., Andreasson, K. & Fisher, R. S. (1993) Mirth, laughter, and gelastic seizures. Brain 116:757–80. [aHA]
Ayub, Q., Yngvadottir, B., Chen, Y., Xue, Y., Hu, M., Vernes, S. C., Fisher, S. E. & Tyler-Smith, C. (2013) FOXP2 Targets show evidence of positive selection in European populations. American Journal of Human Genetics 92:696–706. [KJA]
Aziz-Zadeh, L., Sheng, T. & Gheytanchi, A. (2010) Common premotor regions for the perception and production of prosody and correlations with empathy and prosodic ability. PLoS ONE 5(1):1–8. [SF]
Badgaiyan, R. D., Fischman, A. J. & Alpert, N. M. (2007) Striatal dopamine release in sequential learning. NeuroImage 38:549–56. [aHA]
Bailenson, J. N., Yee, N., Patel, K. & Beall, A. C. (2008) Detecting digital chameleons. Computers in Human Behavior 24:66–87. [GAB]
Bailey, P., von Bonin, G. & McCulloch, W. S. (1950) The isocortex of the chimpanzee. University of Illinois Press. [aHA]
Bannan, N. (2012) Harmony and its role in human evolution. In: Music, language, and human evolution, ed. N. Bannan, pp. 288–339. Oxford University Press. [aHA]
Banse, R. & Scherer, K. R. (1996) Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology 70(3):614–36. [aHA, DLB]
Barceló-Coblijn, L. & Benítez-Burraco, A. (2013) Disentangling the Neanderthal net: A comment on Johansson (2013). Biolinguistics 7:199–216. [SJ]
Barlow, S. M. & Estep, M. (2006) Central pattern generation and the motor infrastructure for suck, respiration, and speech. Journal of Communication Disorders 39:366–80. [PBM]
Barlow, S. M., Lund, J. P., Estep, M. & Kolta, A. (2009) Central pattern generators for orofacial movements and speech. In: Handbook of mammalian vocalization, ed. S. M. Brudzynski, pp. 351–70. Academic Press. [PBM]


Barnard, A. (2011) Social anthropology and human origins. Cambridge University Press. [aHA]
Barrett, J., Pike, G. B. & Paus, T. (2004) The role of the anterior cingulate cortex in pitch variation during sad affect. European Journal of Neuroscience 19:458–64. [aHA]
Barney, A., Martelli, S., Serrurier, A. & Steele, J. (2012) Articulatory capacity of Neanderthals, a very recent and human-like fossil hominin. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences 367:88–102. [aHA]
Barris, R. W. & Schuman, H. R. (1953) Bilateral anterior cingulate gyrus lesions: Syndrome of the anterior cingulate gyri. Neurology 3:44–52. [aHA]
Basel-Vanagaite, L., Muncher, L., Straussberg, R., Pasmanik-Chor, M., Yahav, M., Rainshtein, L., Walsh, C. A., Magal, N., Taub, E., Drasinover, V., Shalev, H., Attia, R., Rechavi, G., Simon, A. J. & Shohat, M. (2006) Mutated nup62 causes autosomal recessive infantile bilateral striatal necrosis. Annals of Neurology 60:214–22. [aHA]
Bauernfeind, A. L., de Sousa, A. A., Avasthi, T., Dobson, S. D., Raghanti, M. A., Lewandowski, A. H., Zilles, K., Semendeferi, K., Allman, J. M., Craig, A. D., Hof, P. R. & Sherwood, C. C. (2013) A volumetric comparison of the insular cortex and its subregions in primates. Journal of Human Evolution 64:263–79. [DYT]
Baumann, O. & Mattingley, J. B. (2012) Functional topography of primary emotion processing in the human cerebellum. NeuroImage 61(4):805–11. doi: 10.1016/j.neuroimage.2012.03.044. [UH]
Beckers, G. J. L. (2013) Peripheral mechanisms of vocalization in birds: A comparison with human speech. In: Birdsong, speech, and language. Exploring the evolution of mind and brain, ed. J. J. Bolhuis & M. Everaert, pp. 399–422. MIT Press. [GJLB]
Beckers, G. J. L., Bolhuis, J. J., Okanoya, K. & Berwick, R. C. (2012) Birdsong neurolinguistics: Songbird context-free grammar claim is premature. NeuroReport 23:139–45. [aHA, GJLB]
Beckers, G. J. L., Nelson, B. S. & Suthers, R. A. (2004) Vocal tract filtering by lingual articulation in a parrot. Current Biology 14:1592–97. [GJLB]
Bédard, C., Wallman, M. J., Pourcher, E., Gould, P. V., Parent, A. & Parent, M. (2011) Serotonin and dopamine striatal innervation in Parkinson’s disease and Huntington’s chorea. Parkinsonism Related Disorders 17:593–98. [CMV]
Belichenko, N. P., Belichenko, P. V., Li, H. H., Mobley, W. C. & Francke, U. (2008) Comparative study of brain morphology in Mecp2 mutant mouse models of Rett syndrome. Journal of Comparative Neurology 508:184–95. [PBM]
Bell, W. L., Davis, D. L., Morgan-Fisher, A. & Ross, E. D. (1990) Acquired aprosodia in children. Journal of Child Neurology 5:19–26. [aHA]
Belton, E., Salmond, C. H., Watkins, K. E., Vargha-Khadem, F. & Gadian, D. G. (2003) Bilateral brain abnormalities associated with dominantly inherited verbal and orofacial dyspraxia. Human 18:194–200. [aHA]
Bendor, D. & Wang, X. (2008) Neural response properties of primary, rostral, and rostrotemporal core fields in the auditory cortex of marmoset monkeys. Journal of Neurophysiology 100:888–906. [DHR]
Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H. & Ullén, F. (2005) Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience 8:1148–50. [aHA]
Benítez-Burraco, A. & Longa, V. M. (2012) On the inference “Neanderthals had FOXP2=they had complex language.” In: Evolution of language. Proceedings of the 9th International Conference (Evolang9), ed. T. Scott-Phillips, M. Tamariz, E. Cartmill & J. R. Hurford, pp. 50–57. World Scientific. [SJ]
Bennett, C. H., Brassard, G., Crépeau, C., Jozsa, R., Peres, A. & Wootters, W. (1993) Teleporting an unknown quantum state via dual classical and EPR channels. Physical Review Letters 70:1895–99. [KBC]
Bergman, T. J. (2013) Speech-like vocalized lip-smacking in geladas. Current Biology 23(7):R268–69. [DHR, DYT]
Bermejo, M. & Omedes, A. (1999) Preliminary vocal repertoire and vocal communication of wild bonobos (Pan paniscus) at Lilungu (Democratic Republic of Congo). Folia Primatologica 70:328–57. [aHA]
Berridge, K. C. (2004) Motivation concepts in . Physiology and Behavior 81(2):179–209. doi: 10.1016/j.physbeh.2004.02.004. [AZ]
Berta, M., Christandl, M., Colbeck, R., Renes, J. M. & Renner, R. (2010) The uncertainty principle in the presence of quantum memory. Nature Physics 6:659–62. [KBC]
Bertossa, R. C., ed. (2011) Theme issue: Evolutionary developmental biology (evo-devo) and behaviour. Philosophical Transactions of the Royal Society B: Biological Sciences 366(1574):2056–180. doi: 10.1098/rstb.2011.0035. [DKO]
Berwick, R. C., Friederici, A. D., Chomsky, N. & Bolhuis, J. J. (2013) Evolution, brain, and the nature of language. Trends in Cognitive Sciences 17(2):89–98. [GJLB, BF]
Berwick, R. C., Okanoya, K., Beckers, G. J. L. & Bolhuis, J. J. (2011) Songs to syntax: The linguistics of birdsong. Trends in Cognitive Sciences 15(3):113–21. [aHA, GJLB, KBC]
Bickerton, D. (2009) Adam’s tongue: How humans made language, how language made humans. Hill & Wang. [aHA, SJ]
Bispham, J. (2006) Rhythm in music: What is it? Who has it? And why? Music Perception 24:125–34. [GAB]
Blank, S. C., Bird, H., Turkheimer, F. & Wise, R. J. (2003) Speech production after stroke: The role of the right pars opercularis. Annals of Neurology 54(3):310–20. doi: 10.1002/ana.10656. [CM]
Blank, S. C., Scott, S. K., Murphy, K., Warburton, E. & Wise, R. J. (2002) Speech production: Wernicke, Broca and beyond. Brain 125(Pt. 8):1829–38. [CM]
Blumstein, S. E. (1995) The neurobiology of language. In: Speech, language and communication, ed. J. Miller & P. D. Eimas, pp. 339–370. Academic Press. [PL]
Blumstein, S. E., Cooper, W. E., Goodglass, H., Statlender, S. & Gottlieb, J. E. (1980) Production deficits in aphasia: A voice-onset time analysis. Brain and Language 9:153–70. [PL]
Bobee, S., Mariette, E., Tremblay-Leveau, H. & Caston, J. (2000) Effects of early midline cerebellar lesion on cognitive and emotional functions in the rat. Behavioural Brain Research 112(1–2):107–17. [UH]
Boë, L. J., Heim, J. L., Honda, K. & Maeda, S. (2002) The potential Neandertal vowel space was as large as that of modern humans. Journal of Phonetics 30:465–84. [aHA]
Boesch, C. & Boesch-Achermann, H. (2000) The chimpanzees of the Taï Forest: Behavioural ecology and evolution. Oxford University Press. [aHA]
Bohland, J. W. & Guenther, F. H. (2006) An fMRI investigation of syllable sequence production. NeuroImage 2:821–41. [DYT]
Bolhuis, J. J. & Everaert, M., eds. (2013) Birdsong, speech and language. Exploring the evolution of mind and brain. MIT Press. [aHA, GJLB]
Bolhuis, J. J., Okanoya, K. & Scharff, C. (2010) Twitter evolution: Converging mechanisms in birdsong and human speech. Nature Reviews Neuroscience 11(11):747–59. [aHA, GJLB, KBC]
Bornstein, M. H., Tamis-LeMonda, C. S. & Haynes, O. M. (1999) First words in the second year: Continuity, stability, and models of concurrent and predictive correspondence in vocabulary and verbal responsiveness across age and context. Infant Behavior and Development 22:65–85. [MHB]
Bostan, A. C., Dum, R. P. & Strick, P. L. (2013) Cerebellar networks with the cerebral cortex and basal ganglia. Trends in Cognitive Sciences 17(5):241–54. doi: 10.1016/j.tics.2013.03.003. [UH]
Botez, M. I. & Barbeau, A. (1971) Role of subcortical structures, and particularly of the thalamus, in the mechanisms of speech and language: A review. International Journal of Neurology 8:300–20. [aHA]
Bottoni, L., Masin, S. & Lenti Boero, D. (2009) Vowel-like sound structure in an African Grey Parrot (Psittacus erithacus) vocal production. The Open Behavioral Science Journal 3:1–16. [DLB]
Bouchard, K. E., Mesgarani, N., Johnson, K. & Chang, E. F. (2013) Functional organization of human sensorimotor cortex for speech articulation. Nature 495:327–32. doi: 10.1038/nature11911. [aHA]
Bowers, J. M. & Konopka, G. (2012) The role of the FOXP family of transcription factors in ASD. Disease Markers 33(2012):251–60. doi: 10.3233/DMA-2012-0919. [KJA]
Bowers, J. M., Perez-Pouchoulen, M., Edwards, N. S. & McCarthy, M. M. (2013) Foxp2 mediates sex differences in ultrasonic vocalization by rat pups and directs order of maternal retrieval. Journal of Neuroscience 33(8):3276–83. [KJA]
Boyd, L. A., Edwards, J. D., Siengsukon, C. S., Vidoni, E. D., Wessel, B. D. & Linsdell, M. A. (2009) Motor sequence chunking is impaired by basal ganglia stroke. Neurobiology of Learning and Memory 92(1):35–44. doi: 10.1016/j.nlm.2009.02.009. [AZ]
Braak, H., Del Tredici, K., Rüb, U., de Vos, R. A. I., Jansen Steur, E. N. H. & Braak, E. (2003) Staging of brain pathology related to sporadic Parkinson’s disease. Neurobiology of Aging 24:197–211. [aHA]
Brainard, M. S. & Doupe, A. J. (2002) What songbirds teach us about learning. Nature 417:351–58. [aHA, GP]
Braman, S. (2004) The emergent global information policy regime. In: The emergent global information policy regime, ed. S. Braman, pp. 12–37. Palgrave Macmillan. [TAM]
Brandt, P. A. (2009) Music and how we became human – a view from cognitive semiotics: Exploring imaginative hypotheses. In: Communicative musicality: Exploring the basis of human companionship, ed. S. Malloch & C. Trevarthen, pp. 31–44. Oxford University Press. [aHA]
Breitenstein, C., Daum, I. & Ackermann, H. (1998) Emotional processing following cortical and subcortical brain damage: Contribution of the fronto-striatal circuitry. Behavioural Neurology 11:29–42. [rHA]
Brendel, B., Erb, M., Riecker, A., Grodd, W., Ackermann, H. & Ziegler, W. (2011) Do we have a “mental syllabary” in the brain? An fMRI study. Motor Control 15(1):34–51. [UH]
Brendel, B., Hertrich, I., Erb, M., Lindner, A., Riecker, A., Grodd, W. & Ackermann, H. (2010) The contribution of mesiofrontal cortex to the preparation and execution of repetitive syllable productions: An fMRI study. NeuroImage 50:1219–30. [aHA]
Brendel, B. & Ziegler, W. (2008) Effectiveness of metrical pacing in the treatment of apraxia of speech. Aphasiology 22:77–102. [rHA]


Brockelman, W. Y. & Schilling, D. (1984) Inheritance of stereotyped gibbon calls. Nature 312:634–36. [aHA]
Brotis, A. G., Kapsalaki, E. Z., Paterakis, K., Smith, J. R. & Fountas, K. N. (2009) Historic evolution of open cingulectomy and stereotactic cingulotomy in the management of medically intractable psychiatric disorders, pain and drug . Stereotactic and Functional Neurosurgery 87:271–91. [aHA]
Brown, J. W. (1988) Cingulate gyrus and supplementary motor correlates of vocalization in man. In: The physiological control of mammalian vocalization, ed. J. D. Newman, pp. 227–43. Plenum Press. [aHA]
Brown, S. (2000) The “musilanguage” model of music evolution. In: The origins of music, ed. N. L. Wallin, B. Merker & S. Brown, pp. 271–300. MIT Press. [aHA]
Brown, S., Ngan, E. & Liotti, M. (2008) A larynx area in the human motor cortex. Cerebral Cortex 18:837–45. [aHA]
Brown, S., Laird, A. R., Pfordresher, P. Q., Thelen, S. M., Turkeltaub, P. & Liotti, M. (2009) The somatotopy of speech: Phonation and articulation in the human motor cortex. Brain and Cognition 70:31–41. [aHA]
Brown, T. G. (1915) Note on the physiology of the basal ganglia and mid-brain of the anthropoid ape, especially in reference to the act of laughter. Journal of Physiology 49:195–207. [aHA]
Brück, C., Wildgruber, D., Kreifelts, B., Krüger, R. & Wächter, T. (2011) Effects of subthalamic nucleus stimulation on emotional prosody comprehension in Parkinson’s disease. PLOS ONE 6:e19140. [CMV]
Brumm, H. & Slabbekoorn, H. (2005) Acoustic communication in noise. Advances in the Study of Behavior 35:151–209. [rHA]
Brumm, H., Voss, K., Köllmer, I. & Todt, D. (2004) Acoustic communication in noise: Regulation of call characteristics in a New World monkey. Journal of Experimental Biology 207(3):443–48. [aHA, DJW]
Brumm, H. & Zollinger, S. A. (2011) The evolution of the Lombard effect: 100 years of psychoacoustic research. Behaviour 148:1173–98. [rHA]
Bryant, G. A. (2012) Shared laughter in conversation as coalition signaling. Paper presented at the XXI Biennial International Conference on Human , Vienna, Austria, August 13, 2012. [GAB]
Bryant, G. A. (2013) Animal signals and emotion in music: Coordinating affect across groups. Frontiers in Psychology 4(Article 990):1–13. [GAB]
Bryant, G. A. & Aktipis, A. (2014) The animal nature of spontaneous human laughter. Evolution and Human Behavior 35(4):327–35. [GAB]
Buder, E. H., Chorna, L., Oller, D. K. & Robinson, R. (2008) Vibratory regime classification of infant phonation. Journal of Voice 22:553–64. [DKO]
Burgoon, J. K., Floyd, K. & Guerrero, L. K. (2010) Nonverbal communication theories of interaction adaptation. In: The handbook of communication science, 2nd edition, ed. C. R. Berger, M. E. Roloff & D. R. Roskos-Ewoldsen, pp. 93–108. Sage. [aHA]
Burkart, J. M., Fehr, E., Efferson, C. & van Schaik, C. P. (2007) Other-regarding preferences in a non-human primate: Common marmosets provision food altruistically. Proceedings of the National Academy of Sciences USA 104:19762–66. [ARL]
Burkart, J. M., Hrdy, S. B. & van Schaik, C. P. (2009a) Cooperative breeding and human cognitive evolution. Evolutionary Anthropology 18:175–86. [ARL, DYT]
Burkart, J. M., Strasser, A. & Foglia, M. (2009b) Trade-offs between social learning and individual innovativeness in common marmosets, Callithrix jacchus. Animal Behaviour 77:1291–301. [ARL]
Burkart, J. & van Schaik, C. (2010) Cognitive consequences of cooperative breeding in primates? 13:1–19. [ARL]
Burling, R. (2005) The talking ape: How language evolved. Oxford University Press. [aHA]
Butler, A. B. & Hodos, W. (2005) Comparative vertebrate neuroanatomy: Evolution and adaptation, 2nd edition. Wiley. [aHA]
Byrne, R. W. (2000) Evolution of primate cognition. Cognitive Science 24:543–70. [BM]
Caligiore, D., Pezzulo, G., Miall, R. C. & Baldassarre, G. (2013) The contribution of brain sub-cortical loops in the expression and acquisition of action understanding abilities. Neuroscience and Biobehavioral Reviews 37(10):2504–15. [GP]
Call, J. & Tomasello, M., eds. (2007) The gestural communication of apes and monkeys. Erlbaum. [aHA]
Calzavara, R., Mailly, P. & Haber, S. N. (2007) Relationship between the corticostriatal terminals from areas 9 and 46, and those from area 8A, dorsal and rostral premotor cortex and area 24c: An anatomical substrate for cognition to action. European Journal of Neuroscience 26:2005–24. [aHA]
Cancelliere, A. E. B. & Kertesz, A. (1990) Lesion localization in acquired deficits of emotional expression and comprehension. Brain and Cognition 13:133–47. [aHA]
Canevari, C., Badino, L., D’Ausilio, A., Fadiga, L. & Metta, G. (2013) Modeling speech imitation and ecological learning of auditory-motor maps. Frontiers in Psychology 4:364. [GP]
Cardona, J. F., Gershanik, O., Gelormini-Lezama, C., Houck, A. L., Cardona, S., Kargieman, L., Trujillo, N., Arévalo, A., Amoruso, L., Manes, F. & Ibánez, A. (2013) Action-verb processing in Parkinson’s disease: New pathways for motor-language learning. Brain Structure and Function 218:1355–73. [rHA]
Carlsson, P. & Mahlapuu, M. (2002) Forkhead transcription factors: Key players in development and metabolism. Developmental Biology 250(1):1–23. [KJA]
Carreiras, M., Mechelli, A. & Price, C. J. (2006) Effect of word and syllable frequency on activation during lexical decision and reading aloud. Human Brain Mapping 27(12):963–72. doi: 10.1002/hbm.20236. [UH]
Cartmill, E. A. & Byrne, R. W. (2007) Orangutans modify their gestural signalling according to their audience’s comprehension. Current Biology 17:1345–48. [AM]
Caruana, F., Jezzini, A., Sbriscia-Fioretti, B., Rizzolatti, G. & Gallese, V. (2011) Emotional and social behaviors elicited by electrical stimulation of the insula in the macaque monkey. Current Biology 21:195–99. [DYT]
Catania, A. C. (1990) What good is five percent of a language competence? Behavioral and Brain Sciences 13:729–31. [ACC]
Catania, A. C. (1995) Single words, multiple words, and the functions of language. Behavioral and Brain Sciences 18:184–85. [ACC]
Catania, A. C. (2001) Three varieties of selection and their implications for the origins of language. In: Language evolution: Biological, linguistic and philosophical perspectives, ed. G. Györi, pp. 55–71. Peter Lang. [ACC]
Catania, A. C. (2003) Why behavior should matter to linguists. Behavioral and Brain Sciences 26:670–72. [ACC]
Catania, A. C. (2008) Brain and behavior: Which way does the shaping go? Behavioral and Brain Sciences 31:516–17. [ACC]
Catania, A. C. (2009) Language evolution: Two tracks are not enough. Behavioral and Brain Sciences 32:451–52. [ACC]
Catania, A. C. (2013a) Learning, 5th edition. Sloan. [ACC]
Catania, A. C. (2013b) A natural science of behavior. Review of General Psychology 17:133–39. [ACC]
Catania, A. C. & Cerutti, D. (1986) Some nonverbal properties of verbal behavior. In: Analysis and integration of behavioral units, ed. T. Thompson & M. D. Zeiler, pp. 185–211. Erlbaum. [ACC]
Chan, A. M., Dykstra, A. R., Jayaram, V., Leonard, M. K., Travis, K. E., Gygi, B., Baker, J. M., Eskandar, E., Hochberg, L. R., Halgren, E. & Cash, S. S. (2014) Speech-specific tuning of neurons in human superior temporal gyrus. Cerebral Cortex 24(10):2679–93. [DHR]
Chan, S.-H., Ryan, L. & Bever, T. G. (2013) Role of the striatum in language: Syntactic and conceptual sequencing. Brain and Language 125(3):283–94. doi: 10.1016/j.bandl.2011.11.005. [AZ]
Chandrasekaran, C., Lemus, L., Trubanova, A., Gondan, M. & Ghazanfar, A. A. (2011) Monkeys and humans share a common computation for face/voice integration. PLOS Computational Biology 7(9):e1002165. [DYT]
Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A. & Ghazanfar, A. A. (2009) The natural statistics of audiovisual speech. PLOS Computational Biology 5:e1000436. [DYT]
Chang, C.-C., Lee, Y. C., Lui, C.-C. & Lai, S.-L. (2007) Right anterior cingulate cortex infarction and transient speech aspontaneity. Archives of Neurology 64:442–46. [aHA]
Changizi, M. A. (2001) Universal scaling laws for hierarchical complexity in languages, organisms, behaviors and other combinatorial systems. Journal of Theoretical Biology 211(3):277–95. [KBC]
Charlesworth, J. D., Warren, T. L. & Brainard, M. S. (2012) Covert skill learning in a cortical-basal ganglia circuit. Nature 486(7402):251–55. doi: 10.1038/nature11078. [CIP]
Chassagnon, S., Minotti, L., Kremer, S., Verceuil, L., Hoffmann, D., Benabid, A. L. & Kahane, P. (2003) Restricted frontomesial epileptogenic focus generating dyskinetic behavior and laughter. Epilepsia 44:859–63. [aHA]
Chen, L.-M. & Kent, R. D. (2009) Development of prosodic patterns in Mandarin-learning infants. Journal of Child Language 36(1):73–84. [PL]
Chenery, H. J., Angwin, A. J. & Copeland, D. A. (2008) The basal circuits, dopamine, and ambiguous word processing: A neurobiological account of priming studies in Parkinson’s disease. Journal of International Neuropsychology Society 14(3):351–64. [KBC]
Cheney, D. L. & Seyfarth, R. M. (1990) How monkeys see the world: Inside the mind of another species. University of Chicago Press. [aHA]
Cheney, D. L. & Seyfarth, R. M. (2005) Constraints and preadaptations in the earliest stages of language evolution. The Linguistic Review 22:135–59. [aHA]
Cheney, D. L. & Seyfarth, R. M. (2007) Baboon metaphysics: The evolution of a social mind. University of Chicago Press. [aHA]
Chersi, F., Ferro, M., Pezzulo, G. & Pirrelli, V. (2014) Topological self-organization and prediction learning can support both action and lexical chains in the brain. Topics in Cognitive Science 6(3):476–91. [GP]
Chien, W.-H., Gau, S. S.-F., Chen, C.-H., Tsai, W.-C., Wu, Y.-Y., Chen, P.-H., Shang, C.-Y. & Chen, C.-H. (2013) Increased gene expression of FOXP1 in patients with autism spectrum disorders. Molecular Autism 4:23. Available at: http://www.molecularautism.com/content/4/1/23 [KJA]


Choi, E. Y., Yeo, B. T. & Buckner, R. L. (2012) The organization of the human striatum estimated by intrinsic functional connectivity. Journal of Neurophysiology 108(8):2242–63. doi: 10.1152/jn.00270.2012. [UH]
Chomsky, N. (1956) Three models for the description of language. IRE Transactions on Information Theory 2:113–24. [KBC]
Chomsky, N. (1965) Aspects of the theory of syntax. MIT Press. [BF]
Chomsky, N. (1966) Cartesian linguistics: A chapter in the history of rationalist thought. Harper & Row. [KBC]
Christiansen, M. H. & Kirby, S. (2003) Language evolution: Consensus and controversies. Trends in Cognitive Sciences 7:300–307. [ARL]
Christophe, A., Millotte, S., Bernal, S. & Lidz, J. (2008) Bootstrapping lexical and syntactic acquisition. Language and Speech 51(1–2):61–75. doi: 10.1177/00238309080510010501. [AR]
Cismaresco, A. S. & Montagner, H. (1990) Mother’s discrimination of their neonates’ cry in relation to cry acoustics: The first week of life. Early Child Development and Care 65:3–13. [DLB]
Clark, K. B. (2010) On classical and quantum error-correction in ciliate mate selection. Communicative & Integrative Biology 3(4):374–78. [KBC]
Clark, K. B. (2012) A statistical mechanics definition of insight. In: Computational intelligence, ed. A. G. Floares, pp. 139–62. Nova Science. [KBC]
Clark, K. B. (2013a) Ciliates learn to diagnose and correct classical error syndromes in mating strategies. Frontiers in Microbiology 4:229. [KBC]
Clark, K. B. (2013b) The mating judgments of microbes. In: Social learning theory: Phylogenetic considerations across animal, plant, and microbial taxa, ed. K. B. Clark, pp. 173–200. Nova Science. [KBC]
Clark, K. B. (in press) Entropic uncertainty of ciliate behavioral signals limits eavesdropping by mating rivals and predators. Frontiers in Microbiology. [KBC]
Clarkson, M. G. & Berg, W. K. (1983) Cardiac orienting and vowel discrimination in newborns: Crucial stimulus parameters. Child Development 54:162–71. [DLB]
Clay, Z., Pika, S., Gruber, T. & Zuberbühler, K. (2011) Female bonobos use copulation calls as social signals. Biology Letters 7:513–16. [ARL]
Clay, Z. & Zuberbühler, K. (2009) Food-associated calling sequences in bonobos. Animal Behaviour 77:1387–96. [aHA]
Clegg, M. (2012) The evolution of the human vocal tract: Specialized for speech? In: Music, language, and human evolution, ed. N. Bannan, pp. 58–80. Oxford University Press. [aHA]
Clerget, E., Andres, M. & Olivier, E. (2013) Deficit in complex sequence processing after a virtual lesion of left BA45. PLOS ONE 8(6):e63722. doi: 10.1371/journal.pone.0063722. [AZ]
Clerget, E., Badets, A., Duque, J. & Olivier, E. (2011) Role of Broca’s area in motor sequence programming: A cTBS study. NeuroReport 22(18):965–69. doi: 10.1097/WNR.0b013e32834d87cd. [AZ]
Clerget, E., Poncin, W., Fadiga, L. & Olivier, E. (2012) Role of Broca’s area in implicit motor skill learning: Evidence from continuous theta-burst magnetic stimulation. Journal of Cognitive Neuroscience 24(1):80–92. doi: 10.1162/jocn_a_00108. [AZ]
Clerget, E., Winderickx, A., Fadiga, L. & Olivier, E. (2009) Role of Broca’s area in encoding sequential human actions: A virtual lesion study. NeuroReport 20(16):1496–99. doi: 10.1097/WNR.0b013e3283329be8. [AZ]
Cleveland, J. & Snowdon, C. T. (1982) The complex vocal repertoire of the adult cotton-top tamarin (Saguinus oedipus oedipus). Zeitschrift für Tierpsychologie 58(3):231–70. [DJW]
Cohen, J. (2010) Almost chimpanzee: Searching for what makes us human, in rainforests, labs, sanctuaries, and zoos. Henry Holt. [aHA]
Cohen, M. J., Riccio, C. A. & Flannery, A. M. (1994) Expressive aprosodia following stroke to the right basal ganglia: A case report. Neuropsychology 8:242–45. [aHA]
Cohn, J. F., Kruez, T. S., Matthews, I., Yang, Y., Nguyen, M. H., Padilla, M. T., Zhou, F. & De la Torre, F. (2009) Detecting depression from facial actions and vocal prosody. In: Proceedings of the Third International Conference on Affective Computing and Intelligent Interaction (ACII-09), 10–12 September 2009, Amsterdam, The Netherlands, pp. 1–7. IEEE Xplore Digital Library. [rHA]
Conway, C. M. & Christiansen, M. H. (2001) Sequential learning in non-human primates. Trends in Cognitive Sciences 5(12):539–46. [AR]
Cook, P., Rouse, A., Wilson, M. & Reichmuth, C. (2013) A California sea lion (Zalophus californianus) can keep the beat: Motor entrainment to rhythmic auditory stimuli in a non-vocal mimic. Journal of 127(4):412–27. doi: 10.1037/a0032345. [HH, AR]
Coolidge, F. L. & Wynn, T. (2009) The rise of Homo sapiens: The evolution of modern thinking. Wiley-Blackwell. [aHA]
Coop, G., Bullaughey, K., Luca, F. & Przeworski, M. (2008) The timing of selection at the human FOXP2 gene. Molecular Biology and Evolution 25:1257–59. [aHA]
Copeland, D. (2003) The basal ganglia and semantic engagement: Potential insights from semantic priming in individuals with subcortical vascular lesions, Parkinson’s disease, and cortical lesions. Journal of International Neuropsychology Society 9(7):1041–52. [KBC]
Corballis, M. C. (2002) From hand to mouth: The origins of language. Princeton University Press. [aHA]
Corballis, M. C. (2003) From mouth to hand: Gesture, speech, and the evolution of right-handedness. Behavioral and Brain Sciences 26:199–260. [aHA]
Coudé, G., Ferrari, P. F., Rodà, F., Maranesi, M., Borelli, E., Veroni, V., Monti, F., Rozzi, S. & Fogassi, L. (2011) Neurons controlling voluntary vocalization in the macaque ventral premotor cortex. PLOS ONE 6:e26822. [aHA]
Coward, F. (2010) Small worlds, material culture and ancient Near Eastern social networks. In: Social brain, distributed mind, ed. R. Dunbar, C. Gamble & J. Gowlett, pp. 449–79. Oxford University Press. (Proceedings of the British Academy, vol. 158). [aHA]
Crais, E., Douglas, D. & Campbell, C. (2004) The intersection of the development of gestures and intentionality. Journal of Speech, Language and Hearing Research 47:678–94. [KJA]
Creed, M. C., Hamani, C., Bridgman, A., Fletcher, P. J. & Nobrega, J. N. (2012) Contribution of decreased serotonin release to the antidyskinetic effects of deep brain stimulation in a rodent model of tardive dyskinesia: Comparison of the subthalamic and entopeduncular nuclei. Journal of Neuroscience 32:9574–81. [CMV]
Crinion, J., Turner, R., Grogan, A., Hanakawa, T., Noppeney, U., Devlin, J. T., Aso, T., Urayama, S., Fukuyama, H., Stockton, K., Usui, K., Green, D. W. & Price, C. J. (2006) Language control in the bilingual brain. Science 312:1537–40. [TH]
Crockford, C. & Boesch, C. (2003) Context-specific calls in wild chimpanzees, Pan troglodytes verus: Analysis of barks. Animal Behaviour 66:115–25. doi: 10.1006/anbe.2003.2166. [DKO]
Cross, I. (2001) Music, mind and evolution. Psychology of Music 29:95–102. [aHA]
Cross, I. (2003) Music, cognition, culture, and evolution. In: The cognitive neuroscience of music, ed. I. Peretz & R. Zatorre, pp. 42–56. Oxford University Press. [aHA]
Cross, I. & Morley, I. (2009) The evolution of music: Theories, definitions and the nature of the evidence. In: Communicative musicality: Exploring the basis of human companionship, ed. S. Malloch & C. Trevarthen, pp. 61–81. Oxford University Press. [aHA]
Crosson, B. (2013) Thalamic mechanisms in language: A reconsideration based on recent findings and concepts. Brain and Language 126(1):73–88. doi: 10.1016/j.bandl.2012.06.011. [UH]
Csibra, G. & Gergely, G. (2011) Natural pedagogy as evolutionary adaptation. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences 366:1149–57. [GP]
Cummins, F. (2009) Rhythm as entrainment: The case of synchronous speech. Journal of Phonetics 37:16–28. [rHA]
Cutler, A., Dahan, D. & van Donselaar, W. (1997) Prosody in the comprehension of spoken language: A literature review. Language and Speech 40(2):141–201. [AR]
Dabelsteen, T. (2004) Strategies that facilitate or counter eavesdropping on vocal interactions in songbirds. Anais de Academia Brasileira de Ciências 76(2):274–78. [KBC]
Dall, S. R. X. (2005) Information and its use by animals in evolutionary ecology. Trends in Ecological Evolution 20(4):187–93. [KBC]
Damasio, A. R., Grabowski, T. J., Bechara, A., Damasio, H., Ponto, L. L., Parvizi, J. & Hichwa, R. D. (2000) Subcortical and cortical brain activity during the feeling of self-generated emotions. Nature Neuroscience 3(10):1049–56. doi: 10.1038/79871. [UH]
Danchin, E., Giraldeau, L. A., Valone, T. J. & Wagner, R. H. (2004) Public information: From nosy neighbours to cultural evolution. Science 305(5683):487–91. [KBC]
Dang, M. T., Yokoi, F., Yin, H. H., Lovinger, D. M., Wang, Y. & Li, Y. (2006) Disrupted motor learning and long-term synaptic plasticity in mice lacking NMDAR1 in the striatum. Proceedings of the National Academy of Sciences USA 103:15254–59. [aHA]
Darkins, A. W., Fromkin, V. A. & Benson, D. F. (1988) A characterization of the prosodic loss in Parkinson’s disease. Brain and Language 34:315–27. [aHA, PBM]
Darwin, C. (1871) The descent of man, and selection in relation to sex. John Murray [2nd edition 1879 by John Murray, reprint 2004 by Penguin Books]. [aHA]
Das, S. & Fowler, S. C. (1995) Acute and subchronic effects of clozapine on licking in rats: Tolerance to disruptive effects on number of licks, but no tolerance to rhythm slowing. (Berlin) 120:249–55. [CMV]
D’Ausilio, A., Craighero, L. & Fadiga, L. (2012) The contribution of the frontal lobe to the perception of speech. Journal of Neurolinguistics 25:328–35. [GP]
David, H. N., Ansseau, M. & Abraini, J. H. (2005) Dopamine–glutamate reciprocal modulation of release and motor responses in the rat caudate-putamen and nucleus accumbens of “intact” animals. Brain Research. Brain Research Reviews 50:336–60. [aHA]


Davila-Ross, M., Owren, M. & Zimmermann, E. (2009) Reconstructing the evolution of laughter in great apes and humans. Current Biology 19:1106–11. [GAB]
Davis, B. L. & MacNeilage, P. F. (2002) The internal structure of the syllable. In: The evolution of language out of pre-language, ed. T. Givòn & B. F. Malle, pp. 135–54. John Benjamins. [DLB]
Davis, P. J., Zhang, S. P., Winkworth, A. & Bandler, R. (1996) Neural control of vocalization: Respiratory and emotional influences. Journal of Voice 10:23–38. [aHA]
de Boer, B. (2008) The acoustic role of supralaryngeal air sacs. Journal of the Acoustical Society of America 123(5 Pt 2):3732–33. [BdB]
de Boer, B. (2012) Air sacs and vocal fold vibrations: Implications for evolution of speech. Theoria et Historia Scientiarum 9:13–28. [BdB]
de Boysson-Bardie, B. (2001) How language comes to children. MIT Press. [DLB]
Dediu, D. & Levinson, S. C. (2013) On the antiquity of language: The reinterpretation of Neandertal linguistic capacities and its consequences. Frontiers in Psychology 4(397):1–17. doi: 10.3389/fpsyg.2013.00397. [SJ]
De Filippis, B., Ricceri, L. & Laviola, G. (2010) Early postnatal behavioral changes in the Mecp2-308 truncation mouse model of Rett syndrome. Genes, Brain and Behavior 9:213–23. [PBM]
Dehaene, S. & Cohen, L. (2007) Cultural recycling of cortical maps. Neuron 56(2):384–98. doi: 10.1016/j.neuron.2007.10.004. [GP, AZ]
De Letter, M., Santens, P., Estercam, I., Van Maele, G., De Bodt, M., Boon, P. & Van Borsel, J. (2007) Levodopa-induced modifications of prosody and comprehensibility in advanced Parkinson’s disease as perceived by professional listeners. Clinical Linguistic Phonology 21:783–91. [CMV]
DeLong, M. R. & Wichmann, T. (2007) Circuits and circuit disorders of the basal ganglia. Archives of Neurology 64:20–24. [aHA]
De Meirleir, L., Seneca, S., Lissens, W., Schoentjes, E. & Desprechins, B. (1995) Bilateral striatal necrosis with a novel point mutation in the mitochondrial ATPase 6 gene. Pediatric Neurology 13:242–46. [aHA]
Demirezen, M. (1988) Behaviorist theory and language learning. Hacettepe Üniversitesi Eğitim Fakültesi Dergisi 3:135–40. [TH]
Demolin, D. & Delvaux, V. (2006) A comparison of the articulatory parameters involved in the production of sounds of bonobos and modern humans. In: The evolution of language: Proceedings of the 6th International Conference (Evolang6), ed. A. Cangelosi, A. D. M. Smith & K. Smith, pp. 67–74. World Scientific. [BdB]
Deriziotis, P. & Fisher, S. E. (2013) of speech and language disorders: The road ahead. Genome Biology 14:204. Available at: http://genomebiology.com/2013/14/4/204 [KJA]
Desmurget, M. & Turner, R. S. (2008) Testing basal ganglia motor functions through reversible inactivations in the posterior internal globus pallidus. Journal of Neurophysiology 99(3):1057–76. doi: 10.1152/jn.01010.2007. [AZ]
Desmurget, M. & Turner, R. S. (2010) Motor sequences and the basal ganglia: Kinematics, not habits. Journal of Neuroscience 30(22):7685–90. doi: 10.1523/JNEUROSCI.0163-10.2010. [AZ]
Dessalles, J.-L. (2007) Why we talk: The evolutionary origins of language. Oxford University Press. [aHA]
Deutch, A. Y., Colbran, R. J. & Winder, D. J. (2007) Striatal plasticity and medium spiny neuron dendritic remodeling in Parkinsonism. Parkinsonism and Related Disorders 13(Suppl. 3):S251–58. [aHA]
De Waal, F. B. M. (1988) The communicative repertoire of captive bonobos compared to that of chimpanzees. Behaviour 106:183–251. [aHA]
Dewson, J. H. (1964) Speech sound discrimination by cats. Science 144:555–56. [DLB]
Diller, K. C. & Cann, R. L. (2009) Evidence against a genetic-based revolution in language 50,000 years ago. In: The cradle of language, ed. R. Botha & C. Knight, pp. 135–49. Oxford University Press. [SJ]
Doupe, A. J. & Kuhl, P. K. (1999) Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience 22:567–631. [aHA]
Doupe, A. J., Perkel, D. J., Reiner, A. & Stern, E. A. (2005) Birdbrains could teach basal ganglia research a new song. Trends in Neurosciences 28(7):353–63. [aHA, KBC]
Doya, K. (2000) Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology 10:732–39. [aHA]
Doya, K. (2008) Modulators of decision making. Nature Neuroscience 11:410–16. [TH]
Doyon, J. & Benali, H. (2005) Reorganization and plasticity in the adult brain during learning of motor skills. Current Opinion in Neurobiology 15:161–67. [aHA]
Dronkers, N. F. (1996) A new brain region for coordinating speech articulation. Nature 384(6605):159–61. doi: 10.1038/384159a0. [CM, DYT]
Duffy, J. R. (2005) Motor speech disorders: Substrates, differential diagnosis, and management, 2nd edition. Elsevier Mosby. [aHA]
Dum, R. P. & Strick, P. L. (2002) Motor areas in the frontal lobe of the primate. Physiology and Behavior 77:677–82. [aHA]
Dunbar, R. I. M. (1993) Coevolution of neocortical size, group size and language in humans. Behavioral and Brain Sciences 16:681–94. [DLB]
Dunbar, R. I. M. (1996) Grooming, gossip, and the evolution of language. Harvard University Press. [aHA]
Dunbar, R. I. M. (2012) On the evolutionary function of song and dance. In: Music, language, and human evolution, ed. N. Bannan, pp. 201–14. Oxford University Press. [aHA]
Egan, J. P. & Hake, H. W. (1950) On the masking pattern of a simple auditory stimulus. Journal of the Acoustical Society of America 22:622–30. [DJW]
Egnor, S. E. R. & Hauser, M. D. (2006) Noise-induced vocal modulation in cotton-top tamarins (Saguinus oedipus). American Journal of Primatology 68(12):1183–90. [DJW]
Egnor, S. E. R., Iguina, C. G. & Hauser, M. D. (2006) Perturbation of auditory feedback causes systematic perturbation in vocal structure in adult cotton-top tamarins. Journal of Experimental Biology 209(18):3652–63. [DJW]
Egnor, S. E. R., Wickelgren, J. G. & Hauser, M. D. (2007) Tracking silence: Adjusting vocal production to avoid acoustic interference. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology 193(4):477–83. [aHA, DJW]
Eimas, P. D., Siqueland, E. R., Jusczyk, P. & Vigorito, J. (1971) Speech perception in infants. Science 171:303–306. [DLB]
Einspieler, C. & Marschik, P. B. (2012) Central Pattern Generators and their significance for the foetal motor function. Klinische Neurophysiologie 43:16–21. [PBM]
Ekman, P. & Friesen, W. (1978) The facial action coding system. Consulting Psychologists Press. [DKO]
Eliades, S. J. & Wang, X. (2012) Neural correlates of the Lombard effect in primate auditory cortex. The Journal of Neuroscience 32(31):10737–48. [DJW]
Ellgring, H. & Scherer, K. R. (1996) Vocal indicators of mood change in depression. Journal of Nonverbal Behavior 20:83–110. [rHA]
Ellis, N. C. (2009) Language is a complex adaptive system: Position paper. Language Learning 59:1–26. [TAM]
Elowson, A. M. & Snowdon, C. T. (1994) Pygmy marmosets, Cebuella pygmaea, modify vocal structure in response to changed social environment. Animal Behaviour 47:1267–77. [aHA]
Elowson, A. M., Snowdon, C. T. & Lazaro-Perea, C. (1998) “Babbling” and social context in infant monkeys: Parallels to human infants. Trends in Cognitive Sciences 2:31–37. [PBM]
Enard, W. (2011) FOXP2 and the role of cortico-basal ganglia circuits in speech and language evolution. Current Opinion in Neurobiology 21:415–24. [aHA]
Diller, K. C. & Cann, R. L. (2012) Genetic influences on language evolution: An evaluation of the evidence. In: The Oxford Handbook of Language Evolution, ed. M. Tallerman & K. R. Gibson, pp. 168–75.
Enard, W., Gehre, S., Hammerschmidt, K., Hölter, S. M., Blass, T., Somel, M., Brückner, M. K., Schreiweis, C., Winter, C., Sohr, R., Becker, L., Wiebe, V., Nickel, B., Giger, T., Müller, U., Groszer, M., Adler, T., Aguilar, A., Bolle, I., Calzada-Wack, J., Dalke, C., Ehrhardt, N., Favor, J., Fuchs, H., Gailus-Durner,
Oxford University V., Hans, W., Hölzlwimmer, G., Javaheri, A., Kalaydjiev, S., Kallnik, M., Kling, Press. [SJ] E., Kunder, S., Mossbrugger, I., Naton, B., Racz, I., Rathkolb, B., Rozman, J., Dindo, H., Zambuto, D. & Pezzulo, G. (2011) Motor simulation via coupled internal Schrewe, A., Busch, D. H., Graw, J., Ivandic, B., Klingenspor, M., Klopstock, models using sequential Monte Carlo. In: Proceedings of the Twenty-Second T., Ollert, M., Quintanilla-Martinez, L., Schulz, H., Wolf, E., Wurst, W., International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Zimmer, A., Fisher, S. E., Morgenstern, R., Arendt, T., de Angelis, M. H., Catalonia, Spain, 16–22 July 2011, ed. Toby Walsh, pp. 2113–19. AAAI Press/ Fischer, J., Schwarz, J. & Pääbo, S. (2009) A humanized version of Foxp2 affects International Joint Conferences on Artificial Intelligence. [GP] cortico-basal ganglia circuits in mice. Cell 137:961–71. [aHA, KJA] Dissanayake, E. (2009) Root, leaf, blossom, or bole: Concerning the origin and Enard, W. & Pääbo, S. (2004) Comparative primate genomics. Annual Review of adaptive function of music. In: Communicative musicality: Exploring the basis Genomics and Human Genetics 5:351–78. [aHA] of human companionship, ed. S. Malloch & C. Trevarthen, pp. 17–30. Oxford Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S., Wiebe, V., Kitano, T., Monaco, University Press. [aHA] A. P. & Pääbo, S. (2002) Molecular evolution of FOXP2, a gene involved in Dominey, P. F. & Inui, T. (2009) Cortico-striatal function in sentence comprehen- speech and language. Nature 418(6900):869–72. [aHA, SJ] sion: Insights from neurophysiology and modeling. Cortex 45(8):1012–18. doi: Endicott, P., Ho, S. Y. W. & Stringer, C. (2010) Using genetic evidence to evaluate 10.1016/j.cortex.2009.03.007. [AR] four palaeoanthropological hypotheses for the timing of Neanderthal and Donald, M. (1999) Preconditions for the evolution of protolanguages. In: The descent modern human origins. 
Journal of Human Evolution 59:87–95. [aHA] of mind: Psychological perspectives on hominid evolution, ed. M. C. Corballis & Esposito, A., Demeurisse, G., Alberti, B. & Fabbro, F. (1999) Complete mutism S. E. G. Lea, pp. 138–54. Oxford University Press. [aHA] after midbrain periaqueductal gray lesion. NeuroReport 10:681–85. [aHA]


Esposito, G., del Carmen Rostagno, M., Venuti, P., Haltigan, J. D. & Messinger, D. S. (2013) Brief Report: Atypical expression of distress during the separation phase of the strange situation procedure in infant siblings at high risk for ASD. Journal of Autism and Developmental Disorders 44(4):975–80. [MHB]
Esposito, G. & Venuti, P. (2009) Comparative analysis of crying in children with autism, developmental delays, and typical development. Focus on Autism and Other Developmental Disabilities 24(4):240–47. [MHB]
Esteve-Gibert, N. & Prieto, P. (2013) Prosody signals the emergence of intentional communication in the first year of life: Evidence from Catalan-babbling infants. Journal of Child Language 40(5):919–44. [PL]
Evatt, M. L., DeLong, M. R. & Vitek, J. L. (2002) Parkinson’s disease. In: Diseases of the nervous system, vol. 1, 3rd edition, ed. A. K. Asbury, G. M. McKhann, W. I. McDonald & P. J. Goadsby, pp. 477–89. Cambridge University Press. [aHA]
Fadiga, L., Craighero, L. & D’Ausilio, A. (2009) Broca’s area in language, action, and music. Annals of the New York Academy of Sciences 1169(1):448–58. doi: 10.1111/j.1749-6632.2009.04582.x. [GP, AZ]
Fagan, B. (2010) Cro-Magnon: How the Ice Age gave birth to the first modern humans. Bloomsbury. [rHA]
Falk, D. (2004) Prelinguistic evolution in early hominins: Whence motherese? Behavioral and Brain Sciences 27(4):491–503. [aHA, SJ]
Falk, D. (2007) Evolution of the primate brain. In: Handbook of palaeoanthropology, vol. 2: Primate evolution and human origins, ed. W. Henke & I. Tattersall, pp. 1133–62. Springer-Verlag. [aHA, BM]
Falk, D. (2009) Finding our tongues: Mothers, infants and the origins of language. Basic Books. [aHA]
Fant, G. (1960) Acoustic theory of speech production. Mouton. [BdB]
Fant, G. (1970) Acoustic theory of speech production – with calculations based on X-ray studies of Russian articulations, 2nd edition. Mouton. [aHA]
Fedurek, P., Schel, A. M. & Slocombe, K. E. (2013) The acoustic structure of chimpanzee pant-hooting facilitates chorusing. Behavioral Ecology and Sociobiology 67(11):1781–89. [AR]
Fedurek, P. & Slocombe, K. E. (2011) Primate vocal communication: A useful tool for understanding human speech and language evolution? Human Biology 83:153–73. [DHR]
Fee, E. J. (1995) The phonological system of a specifically language-impaired population. Clinical Linguistics and Phonetics 9:189–209. [aHA]
Feenders, G., Liedvogel, M., Rivas, M., Zapka, M., Horita, H., Hara, E. & Jarvis, E. D. (2008) Molecular mapping of movement-associated areas in the avian brain: A motor theory for vocal learning origin. PLOS ONE 3(3):e1768. doi: 10.1371/journal.pone.0001768. [CIP]
Feldman, R. (2007) Parent–infant synchrony: Biological foundations and developmental outcomes. Current Directions in Psychological Science 16(6):340–45. [KJA]
Ferguson, B. & Waxman, S. R. (2013) Communication and categorization: New insights into the relation between speech, labels and concepts for infants. In: Proceedings of the 35th Annual Conference of the Cognitive Science Society, ed. M. Knauff, M. Pauen, N. Sebanz & I. Wachsmuth, pp. 2267–72. Cognitive Science Society. [BF]
Fernandez-Duque, E., Valeggia, C. R. & Mendoza, S. P. (2009) The biology of paternal care in human and nonhuman primates. Annual Review of Anthropology 38:115–30. [DYT]
Ferrari, P. F., Paukner, A., Ionica, C. & Suomi, S. J. (2009) Reciprocal face-to-face communication between rhesus macaque mothers and their newborn infants. Current Biology 19:1768–72. [DYT]
Ferry, A. L., Hespos, S. J. & Waxman, S. R. (2010) Categorization in 3- and 4-month-old infants: An advantage of words over tones. Child Development 81(2):472–79. [BF]
Ferry, A. L., Hespos, S. J. & Waxman, S. R. (2013) Nonhuman primate vocalizations support categorization in very young human infants. Proceedings of the National Academy of Sciences USA 110(38):15231–35. [BF]
Fichtel, C., Hammerschmidt, K. & Jürgens, U. (2001) On the vocal expression of emotion: A multi-parametric analysis of different states of aversion in the squirrel monkey. Behaviour 138:97–116. [rHA]
Fink, G. R., Frackowiak, R. S. J., Pietrzyk, U. & Passingham, R. E. (1997) Multiple nonprimary motor areas in the human cortex. Journal of Neurophysiology 77:2164–74. [aHA]
Finlay, B. L. & Darlington, R. B. (1995) Linked regularities in the development and evolution of mammalian brains. Science 268:1578–84. [aHA]
Fiorillo, C. D. (2013) Two dimensions of value: Dopamine neurons represent reward but not aversiveness. Science 341:546–49. [CMV]
Fischer, J. (2003) Developmental modifications in the vocal behavior of non-human primates. In: Primate audition: Ethology and neurobiology, ed. A. A. Ghazanfar, pp. 109–25. CRC Press. [aHA]
Fischer, J., Hammerschmidt, K., Cheney, D. L. & Seyfarth, R. M. (2002) Acoustic features of male baboon loud calls: Influences of context, age, and individuality. Journal of the Acoustical Society of America 111:1465–74. [aHA]
Fischer, J., Kitchen, D. M., Seyfarth, R. M. & Cheney, D. L. (2004) Baboon loud calls advertise male quality: Acoustic features and their relation to rank, age, and exhaustion. Behavioral Ecology and Sociobiology 56:140–48. [aHA]
Fisher, S. E., Lai, C. S. L. & Monaco, A. P. (2003) Deciphering the genetic basis of speech and language disorders. Annual Review of Neuroscience 26:57–80. [aHA]
Fisher, S. E. & Scharff, C. (2009) FOXP2 as a molecular window into speech and language. Trends in Genetics 25:166–77. [aHA]
Fitch, W. T. (1997) Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. Journal of the Acoustical Society of America 102:1213–22. [aHA]
Fitch, W. T. (2000a) The evolution of speech: A comparative review. Trends in Cognitive Sciences 4(7):258–67. [aHA, BdB]
Fitch, W. T. (2000b) The phonetic potential of nonhuman vocal tracts: Comparative cineradiographic observations of vocalizing animals. Phonetica 57:205–18. [aHA]
Fitch, W. T. (2006) The biology and evolution of music: A comparative perspective. Cognition 100:173–215. [TAM]
Fitch, W. T. (2010) The evolution of language. Cambridge University Press. [PL]
Fitch, W. T. (2011) Speech perception: A language-trained chimpanzee weighs in. Current Biology 21(14):R543–46. [BF]
Fitch, W. T. (2012) The biology and evolution of rhythm: Unraveling a paradox. In: Language and music as cognitive systems, ed. P. Rebuschat, M. Rohrmeier, J. A. Hawkins & I. Cross, pp. 73–95. Oxford University Press. [aHA, AR]
Fitch, W. T. (2013) Rhythmic syntax and rhythmic cognition in humans and animals: A computational and comparative perspective. Frontiers in Systems Neuroscience 7:68. doi: 10.3389/fnsys.2013.00068. [HH]
Fitch, W. T. & Hauser, M. D. (2004) Computational constraints on syntactic processing in a nonhuman primate. Science 303(5656):377–80. doi: 10.1126/science.1089401. [CIP]
Fitch, W. T., Huber, L. & Bugnyar, T. (2010) Social cognition and the evolution of language: Constructing cognitive phylogenies. Neuron 65(6):795–814. doi: 10.1016/j.neuron.2010.03.011. [aHA, ARL, BM, CIP]
Fitch, W. T. & Jarvis, E. D. (2012) Birdsong and other animal models for human speech, song, and vocal learning. In: Language, music and the brain, ed. M. Arbib, pp. 499–540. MIT Press. [CIP]
Fitch, W. T. & Reby, D. (2001) The descended larynx is not uniquely human. Proceedings of the Royal Society, B: Biological Sciences 268:1669–75. [aHA]
Flaherty, A. W. & Graybiel, A. M. (1993) Two input systems for body representations in the primate striatal matrix: Experimental evidence in the squirrel monkey. Journal of Neuroscience 13:1120–37. [aHA]
Flaherty, A. W. & Graybiel, A. M. (1994) Input-output organization of the sensorimotor striatum in the squirrel monkey. Journal of Neuroscience 14:599–610. [aHA]
Flege, J. E., MacKay, I. R. & Meador, D. (1999a) Native Italian speakers’ perception and production of English vowels. Journal of the Acoustical Society of America 106(5):2973–87. [CM]
Flege, J. E., Yeni-Komshian, G. H. & Liu, S. (1999b) Age constraints on Second-Language Acquisition. Journal of Memory and Language 41:78–104. [CM]
Fletcher, N. H. (1993) Autonomous vibration of simple pressure-controlled valves in gas flows. Journal of the Acoustical Society of America 93(4, Pt. 1):2172–80. [BdB]
Fletcher, P. (1990) Speech and language defects. Nature 346(6281):226. [KJA]
Flowers, K. A. & Robertson, C. (1985) The effects of Parkinson’s disease on the ability to maintain a mental set. Journal of Neurology, Neurosurgery, and Psychiatry 48(6):517–29. [PL]
Franklin, B. S., Warlaumont, A. S., Messinger, D. S., Bene, E. R., Iyer, S. N., Lee, C.-C., Lambert, B. & Oller, D. K. (2014) Effects of parental interaction on infant vocalization rate, variability and vocal type. Language Learning and Development 10(3):279–96. doi: 10.1080/15475441.2013.849176. [DKO]
French, C. A., Jin, X., Campbell, T. G., Gerfen, E., Groszer, M., Fisher, S. E. & Costa, R. M. (2012) An aetiological Foxp2 mutation causes aberrant striatal activity and alters plasticity during skill learning. Molecular Psychiatry 17:1077–85. [CMV]
Friederici, A. D. (2011) The brain basis of language processing: From structure to function. Physiological Reviews 91(4):1357–92. doi: 10.1152/physrev.00006.2011. [CIP]
Friederici, A. D. & Kotz, S. A. (2003) The brain basis of syntactic processes: Functional imaging and lesion studies. NeuroImage 20:S8–S17. doi: 10.1016/S1053-8119(03)00522-6. [AR]
Friston, K. (2010) The free-energy principle: A unified brain theory? Nature Reviews Neuroscience 11:127–38. [GP]
Frith, C. (2009) Role of facial expressions in social interactions. Philosophical Transactions of the Royal Society B: Biological Sciences 364:3453–58. [CMV]
Frodi, A. (1985) When empathy fails: Aversive infant crying and child abuse. In: Infant crying: Theoretical and research perspective, ed. B. M. Lester & C. F. Z. Boukydis, pp. 263–78. Plenum Press. [DLB]


Frodi, A. & Senchack, M. (1990) Verbal and behavioral responsiveness to the cries of atypical infants. Child Development 61:76–84. [DLB]
Frühholz, S. & Grandjean, D. (2013) Processing of emotional vocalizations in bilateral inferior frontal cortex. Neuroscience and Biobehavioral Reviews 37(10):2847–55. [SF]
Fulkerson, A. L. & Waxman, S. R. (2007) Words (but not tones) facilitate object categorization: Evidence from 6- and 12-month-olds. Cognition 105(1):218–28. [BF]
Gabrieli, J. D., Stebbins, G. T., Singh, J., Willingham, D. B. & Goetz, C. G. (1997) Intact mirror-tracing and impaired rotary-pursuit skill learning in patients with Huntington’s disease: Evidence for dissociable memory systems in skill learning. Neuropsychology 11(2):272–81. [UH]
Gale, J. T., Lee, K. H., Amirnovin, R., Roberts, D. W., Williams, Z. M., Blaha, C. D. & Eskandar, E. N. (2013) Electrical stimulation-evoked dopamine release in the primate striatum. Stereotactic Functions Neurosurgery 91(6):355–63. doi: 10.1159/000351523. [CMV]
Gantz, S. C., Ford, C. P., Neve, K. A. & Williams, J. T. (2011) Loss of Mecp2 in substantia nigra dopamine neurons compromises the nigrostriatal pathway. The Journal of Neuroscience 31:12629–37. [PBM]
Garnier, M., Henrich, N. & Dubois, D. (2010) Influence of sound immersion and communicative interaction on the Lombard effect. Journal of Speech, Language, and Hearing Research 53:588–608. [DJW]
Gaser, C. & Schlaug, G. (2003) Brain structures differ between musicians and non-musicians. Journal of Neuroscience 23:9240–45. [aHA]
Geissmann, T. (1984) Inheritance of song parameters in the gibbon song, analysed in 2 hybrid gibbons (Hylobates pileatus X H. lar). Folia Primatologica 42:216–35. [aHA]
Geissmann, T. (2000) Gibbon songs and human music from an evolutionary perspective. In: The origins of music, ed. N. L. Wallin, B. Merker & S. Brown, pp. 103–23. MIT Press. [aHA, AR]
Gentilucci, M. & Dalla Volta, R. (2008) Spoken language and arm gestures are controlled by the same motor control system. The Quarterly Journal of Experimental Psychology 61:944–57. [AM]
Gerardin, E., Lehéricy, S., Pochon, J.-B., Tézenas du Montcel, S., Mangin, J.-F., Poupon, F., Agid, Y., Le Bihan, D. & Marsault, C. (2003) Foot, hand, face and eye representation in the human striatum. Cerebral Cortex 13:162–69. [aHA]
Gerfen, C. R. (2010) Functional neuroanatomy of dopamine in the striatum. In: Dopamine handbook, ed. L. L. Iversen, S. D. Iversen, S. B. Dunnett & A. Björklund, pp. 11–21. Oxford University Press. [aHA]
Gerfen, C. R. & Bolam, J. P. (2010) The neuroanatomical organization of the basal ganglia. In: Handbook of basal ganglia structure and function, ed. H. Steiner & K. Y. Tseng, pp. 3–28. Elsevier. [aHA]
Gerfen, C. R. & Surmeier, D. J. (2011) Modulation of striatal projection systems by dopamine. Annual Review of Neuroscience 34:441–66. [aHA]
Ghanbarian, E. & Motamedi, F. (2013) Ventral tegmental area inactivation suppresses the expression of CA1 long term potentiation in anesthetized rat. PLOS ONE 8:e58844. [CMV]
Ghazanfar, A. A. (2013) Multisensory vocal communication in primates and the evolution of rhythmic speech. Behavioral Ecology and Sociobiology 67(9):1441–48. [DYT]
Ghazanfar, A. A., Chandrasekaran, C. & Morrill, R. J. (2010) Dynamic, rhythmic facial expressions and the superior temporal sulcus of macaque monkeys: Implications for the evolution of audiovisual speech. European Journal of Neuroscience 31:1807–17. [DYT]
Ghazanfar, A. A. & Logothetis, N. K. (2003) Facial expressions linked to monkey calls. Nature 423(6943):937–38. [DYT]
Ghazanfar, A. A. & Miller, C. T. (2006) Language evolution: Loquacious monkey brains? Current Biology 16:R879–81. [aHA]
Ghazanfar, A. A., Morrill, R. J. & Kayser, C. (2013) Monkeys are perceptually tuned to facial expressions that exhibit a theta-like speech rhythm. Proceedings of the National Academy of Sciences USA 110:1959–63. [aHA]
Ghazanfar, A. A. & Poeppel, D. (2014) The neurophysiology and evolution of the speech rhythm. In: The cognitive neurosciences V (5th edition), ed. M. S. Gazzaniga & G. R. Mangun, pp. 629–38. MIT Press. [DYT]
Ghazanfar, A. A. & Rendall, D. (2008) Evolution of human vocal production. Current Biology 18(11):R457–60. [aHA, DYT]
Ghazanfar, A. A., Takahashi, D. Y., Mathur, N. & Fitch, W. T. (2012) Cineradiography of monkey lipsmacking reveals the putative origins of speech dynamics. Current Biology 22:1176–82. [aHA, DYT]
Ghazanfar, A. A., Turesson, H. K., Maier, J. X., van Dinther, R., Patterson, R. D. & Logothetis, N. K. (2007) Vocal tract resonances as indexical cues in rhesus monkeys. Current Biology 17:425–30. [DYT]
Gil-da-Costa, R., Martin, A., Lopes, M. A., Muñoz, M., Fritz, J. B. & Braun, A. R. (2006) Species-specific calls activate homologs of Broca’s and Wernicke’s areas in the macaque. Nature Neuroscience 9:1064–70. [aHA]
Gintis, H. (2006) The economy as a complex adaptive system. (Online paper). Available at: http://www.umass.edu/preferen/Class%20Material/Readings%20in%20Market%20Dynamics/Complexity%20Economics.pdf [TAM]
Giroud, M., Lemesle, M., Madinier, G., Billiar, T. & Dumas, R. (1997) Unilateral lenticular infarcts: Radiological and clinical syndromes, aetiology, and prognosis. Journal of Neurology, Neurosurgery, and Psychiatry 63:611–15. [aHA]
Goldin-Meadow, S. & Mylander, C. (1983) Gestural communication in deaf children: Noneffect of parental input on language development. Science, New Series 221(4608):372–74. [BF]
Goldstein, L. M., Byrd, D. & Saltzman, E. (2006) The role of vocal tract gestural action units in understanding the evolution of phonology. In: Action to language via the mirror neuron system, ed. M. A. Arbib, pp. 215–49. Cambridge University Press. [rHA]
Goldstein, M. H., Schwade, J. A. & Bornstein, M. H. (2009) The value of vocalizing: Five-month-old infants associate their own noncry vocalizations with responses from adults. Child Development 80(3):636–44. [DKO]
Gomez Portillo, I. J. & Gleiser, P. M. (2009) An adaptive complex network model for brain functional networks. PLoS ONE 4(9):e6863. [TAM]
Gonzalez-Lima, F. (2010) Responses of limbic, midbrain and brainstem structures to electrically-induced vocalizations. In: Handbook of mammalian vocalization: An integrative neuroscience approach, ed. S. M. Brudzynski, pp. 293–301. Elsevier. [aHA]
Goodall, J. (1986) The chimpanzees of Gombe: Patterns of behavior. Belknap Press/Harvard University Press. [aHA]
Gopnik, M. (1990a) Feature-blind grammar and dysphasia. Nature 344(6268):715. doi: 10.1038/344715a0. [aHA, AZ]
Gopnik, M. (1990b) Genetic basis of grammar defect. Nature 347(6288):26. [KJA]
Grahn, J. A. & Brett, M. (2007) Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience 19(5):893–906. doi: 10.1162/jocn.2007.19.5.893. [HH]
Granata, A. R. & Woodruff, G. N. (1982) Dopaminergic mechanisms in the nucleus tractus solitarius and effects on blood pressure. Brain Research Bulletin 8:483–88. [CMV]
Graybiel, A. M. (1990) Neurotransmitters and neuromodulators in the basal ganglia. Trends in Neurosciences 13:244–54. [aHA]
Graybiel, A. M. (2005) The basal ganglia: Learning new tricks and loving it. Current Opinion in Neurobiology 15:638–44. [aHA]
Graybiel, A. M. (2008) Habits, rituals, and the evaluative brain. Annual Review of Neuroscience 31:359–87. doi: 10.1146/annurev.neuro.29.051605.112851. [aHA, AZ]
Green, R. E., Krause, J., Briggs, A. W., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Li, H., Zhai, W., Fritz, M. H.-Y., Hansen, N. F., Durand, E. Y., Malaspinas, A.-S., Jensen, J. D., Marques-Bonet, T., Alkan, C., Prüfer, K., Meyer, M., Burbano, H. A., Good, J. M., Schultz, R., Aximu-Petri, A., Butthof, A., Höber, B., Höffner, B., Siegemund, M., Weihmann, A., Nusbaum, C., Lander, E. S., Russ, C., Novod, N., Affourtit, J., Egholm, M., Verna, C., Rudan, P., Brajkovic, D., Kucan, E., Gusic, I., Doronichev, V. B., Golovanova, L. V., Lalueza-Fox, C., de la Rasilla, M., Fortea, J., Rosas, A., Schmitz, R. W., Johnson, P. L. F., Eichler, E. E., Falush, D., Birney, E., Mullikin, J. C., Slatkin, M., Nielsen, R., Kelso, J., Lachmann, M., Reich, D. & Pääbo, S. (2010) A draft sequence of the Neandertal genome. Science 328(5979):710–22. [aHA, SJ]
Greenberg, S., Carvey, H., Hitchcock, L. & Chang, S. (2003) Temporal properties of spontaneous speech – a syllable-centric perspective. Journal of Phonetics 31(3–4):465–85. [DYT]
Griebel, U. & Oller, D. K. (2008) Evolutionary forces favoring contextual flexibility. In: Evolution of communicative flexibility: Complexity, creativity and adaptability in human and animal communication, ed. D. K. Oller & U. Griebel, pp. 9–40. MIT Press. [DKO]
Grillner, S. (1991) Recombination of motor pattern generators. Current Biology 1:231–33. [aHA]
Grillner, S., Deliagina, T., Ekeberg, O., el Manira, A., Hill, R. H., Lansner, A., Orlovsky, G. N. & Wallén, P. (1995) Neural networks that co-ordinate locomotion and body orientation in lamprey. Trends in Neurosciences 18:270–79. [PBM]
Grillner, S. & Wallén, P. (2004) Innate versus learned movements – a false dichotomy? In: Brain mechanisms for the integration of posture and movement, ed. S. Mori, D. G. Stuart & M. Wiesendanger, pp. 3–12. (Progress in Brain Research, vol. 143). Elsevier. [aHA]
Grilo, A., Caetano, A. & Rosa, A. (2002) Immune system simulation through a complex adaptive system model. In: Soft Computing and Industry, ed. R. Roy, M. Köppen, S. Ovaska, T. Furuhashi & F. Hoffmann, pp. 675–98. Springer. [TAM]
Groenewegen, H. J. (2003) The basal ganglia and motor control. Neural Plasticity 10:107–20. [aHA]
Grossman, M., Carvell, S., Gollomp, S., Stern, M. B., Vernon, G. & Hurtig, H. I. (1991) Sentence comprehension and praxis deficits in Parkinson’s disease. Neurology 41(10):1620–26. [PL]
Groswasser, Z., Korn, C., Groswasser-Reider, I. & Solzi, P. (1988) Mutism associated with buccofacial apraxia and bihemispheric lesions. Brain and Language 34:157–68. [aHA]


Groszer, M., Keays, D. A., Deacon, R. M. J., de Bono, J. P., Prasad-Mulcare, S., tasks: An integrative neuroimaging approach. Cerebral Cortex 12:1157–70. Gaub, S., Baum, M. G., French, C. A., Nicod, J., Coventry, J. A., Enard, W., [TH] Fray, M., Brown, S. D. M., Nolan, P. M., Pääbo, S., Channon, K. M., Costa, R. Hardus, M. E., Lameira, A. R., Singleton, I., Morrogh-Bernard, H. C., Knott, C. D., M., Ellers, J., Ehret, G., Rawlins, J. N. P. & Fisher, S. E. (2008) Impaired Ancrenaz, M., Utami Atmoko, S. S. & Wich, S. A. (2009a) A description of the synaptic plasticity and motor learning in mice carrying a point mutation impli- orangutan’s vocal and sound repertoire, with a focus on geographic variation. In: cated in human speech deficits. Current Biology 18:354–62. [KJA] Orangutans: Geographic variation in behavioral ecology and conservation, ed. Gruber, T. & Zuberbühler, K. (2013) Vocal recruitment for joint travel in wild S. A. Wich, S. S. Utami Atmoko, T. M. Setia & C. P. van Schaik, pp. 49–64. chimpanzees. PLOS ONE 8:e76073. [ARL] Oxford University Press. [aHA] Gruber-Dujardin, E. (2010) Role of the periaqueductal gray in expressing vocaliza- Hardus, M. E., Lameira, A. R., van Schaik, C. P. & Wich, S. A. (2009b) Tool use in tion. In: Handbook of mammalian vocalization: An integrative neuroscience wild orang-utans modifies sound production: A functionally deceptive innova- approach, ed. S. M. Brudzynski, pp. 313–27. Elsevier. [aHA] tion? Proceedings of the Royal Society B: Biological Sciences 276:3689–94. Guenther, F. H. & Perkell, J. S. (2004) A neural model of speech production and its [aHA, ARL] application to studies of the role of auditory feedback in speech. In: Speech Harrington, D. L. & Haaland, K. Y. (1991) Sequencing in Parkinson’sdisease:Ab- motor control in normal and disordered speech, ed. B. Maassen, R. Kent, H. normalities programming and controlling movement. Brain 114:99–115. [PL] Peters, P. Van Lieshout & W. Hulstijn, pp. 29–49. 
Oxford University Press. [GP]
Gustafson, G. E., Wood, R. M. & Green, J. A. (2000) Can we hear the causes of infants’ crying? In: Crying as a sign, a symptom, and a signal: Clinical, emotional and developmental aspects of infant and toddler crying, ed. R. G. Barr, B. Hopkins & J. A. Green, pp. 8–22. Mac Keith Press/Wiley. [DLB]
Habbershon, H. M., Ahmed, S. Z. & Cohen, Y. E. (2013) Rhesus macaques recognize unique multimodal face-voice relations of familiar individuals and not of unfamiliar ones. Brain, Behavior, and Evolution 81:219–25. [DYT]
Haber, S. N. (2003) The primate basal ganglia: Parallel and integrative networks. Journal of Chemical Neuroanatomy 26(4):317–30. [UH]
Haber, S. N. (2010a) Integrative networks across basal ganglia circuits. In: Handbook of basal ganglia structure and function, ed. H. Steiner & K. Y. Tseng, pp. 409–27. Elsevier. [aHA]
Haber, S. N. (2010b) Convergence of limbic, cognitive, and motor cortico-striatal circuits with dopamine pathways in primate brain. In: Dopamine handbook, ed. L. L. Iversen, S. D. Iversen, S. B. Dunnett & A. Björklund, pp. 38–48. Oxford University Press. [aHA]
Haber, S. N., Fudge, J. L. & McFarland, N. R. (2000) Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. Journal of Neuroscience 20:2369–82. [aHA]
Haber, S. N., Kunishio, K., Mizobuchi, M. & Lynd-Balta, E. (1995) The orbital and medial prefrontal circuit through the primate basal ganglia. Journal of Neuroscience 15:4851–67. [aHA]
Haesler, S., Rochefort, C., Georgi, B., Licznerski, P., Osten, P. & Scharff, C. (2007) Incomplete and inaccurate vocal imitation after knockdown of FoxP2 in songbird basal ganglia nucleus area X. PLoS Biology 5:2885–97. [aHA]
Haesler, S., Wada, K., Nshdejan, A., Morrisey, E. E., Lints, T., Jarvis, E. D. & Scharff, C. (2004) FoxP2 expression in avian vocal learners and non-learners. The Journal of Neuroscience 24(13):3164–75. [SJ]
Hage, S. R. (2010a) Localization of the central pattern generator for vocalization. In: Handbook of mammalian vocalization: An integrative neuroscience approach, ed. S. M. Brudzynski, pp. 329–37. Elsevier. [aHA]
Hage, S. R. (2010b) Neuronal networks involved in the generation of vocalization. In: Handbook of mammalian vocalization: An integrative neuroscience approach, ed. S. M. Brudzynski, pp. 339–49. Elsevier. [aHA]
Hage, S. R., Gavrilov, N. & Nieder, A. (2013) Cognitive control of distinct vocalizations in rhesus monkeys. Journal of Cognitive Neuroscience 25:1692–701. doi:10.1162/jocn_a_00428. [aHA]
Hage, S. R., Jiang, T., Berquist, S. W., Feng, J. & Metzner, W. (2013) Ambient noise induces independent shifts in call frequency and amplitude within the Lombard effect in echolocating bats. Proceedings of the National Academy of Sciences USA 110(10):4063–68. [DJW, rHA]
Hage, S. R. & Jürgens, U. (2006) On the role of the pontine brainstem in vocal pattern generation: A telemetric single-unit recording study in the squirrel monkey. Journal of Neuroscience 26:7105–15. [aHA, PBM]
Hage, S. R. & Nieder, A. (2013) Single neurons in monkey prefrontal cortex encode volitional initiation of vocalizations. Nature Communications 4:2409. [rHA, MHB]
Hagen, E. H. & Bryant, G. A. (2003) Music and dance as a coalition signaling system. Human Nature 14(1):21–51. [GAB, AR]
Hagen, E. H. & Hammerstein, P. (2009) Did Neanderthals and other early humans sing? Seeking the biological roots of music in the territorial advertisements of primates, lions, hyenas, and wolves. Musicae Scientiae 13(Suppl. 2):291–320. [GAB, AR]
Hammerschmidt, K. & Fischer, J. (2008) Constraints in primate vocal production. In: Evolution of communicative flexibility: Complexity, creativity, and adaptability in human and animal communication, ed. D. K. Oller & U. Griebel, pp. 93–119. MIT Press. [aHA]
Hanakawa, T., Dimyan, M. A. & Hallett, M. (2008) Motor planning, imagery, and execution in the distributed motor network: A time-course study with functional MRI. Cerebral Cortex 18:2775–88. [TH]
Hanakawa, T., Honda, M., Sawamoto, N., Okada, T., Yonekura, Y., Fukuyama, H. & Shibasaki, H. (2002) The role of rostral Brodmann area 6 in mental-operation
Harrington, J., Palethorpe, S. & Watson, C. I. (2000) Does the Queen speak the Queen’s English? Elizabeth II’s traditional pronunciation has been influenced by modern trends. Nature 408(6815):927–28. doi: 10.1038/35050160. [CM]
Hasegawa, A., Okanoya, K., Hasegawa, T. & Seki, Y. (2011) Rhythmic synchronization tapping to an audio–visual metronome in budgerigars. Scientific Reports 1(Article 120):1–8. (Online journal). doi:10.1038/srep00120. [HH, AR]
Hasson, U., Ghazanfar, A. A., Galantucci, B., Garrod, S. & Keysers, C. (2012) Brain-to-brain coupling: A mechanism for creating and sharing a social world. Trends in Cognitive Sciences 16(2):114–21. [DYT]
Hast, M. H., Fischer, J. M., Wetzel, A. B. & Thompson, V. E. (1974) Cortical motor representation of the laryngeal muscles in Macaca mulatta. Brain Research 73:229–40. [aHA]
Hattori, Y., Tomonaga, M. & Matsuzawa, T. (2013) Spontaneous synchronized tapping to an auditory rhythm in a chimpanzee. Scientific Reports 3(Article 1566):1–6. (Online journal). doi: 10.1038/srep01566. [HH, AR]
Hauser, M. D., Chomsky, N. & Fitch, W. T. (2002) The faculty of language: What is it, who has it, and how did it evolve? Science 298(5598):1569. [BF]
Hawkins, S. & Smith, R. (2001) Polysp: A polysystemic, phonetically-rich approach to speech understanding. Italian Journal of Linguistics 13:99–188. [CM]
Hayes, C. (1951) The ape in our house. Harper & Brothers. [aHA]
Hayes, K. J. & Hayes, C. (1952) Imitation in a home-raised chimpanzee. Journal of Comparative and Physiological Psychology 45:450–59. [aHA]
Heilman, K. M., Leon, S. A. & Rosenbek, J. C. (2004) Affective aprosodia from a medial frontal stroke. Brain and Language 89:411–16. [aHA]
Henry, J. D. & Crawford, J. R. (2004) Verbal fluency deficits in Parkinson’s disease: A meta-analysis. Journal of the International Neuropsychological Society 10:608–22. [AR]
Hernandez, A. E., Dapretto, M., Mazziotta, J. & Bookheimer, S. (2001) Language switching and language representation in Spanish-English bilinguals: An fMRI study. NeuroImage 14:510–20. [TH]
Hewitt, G. P., MacLarnon, A. & Jones, K. E. (2002) The functions of laryngeal air sacs in primates: A new hypothesis. Folia Primatologica 73:70–94. [BdB]
Hickok, G. (2012) Computational neuroanatomy of speech production. Nature Reviews. Neuroscience 13(2):135–45. doi: 10.1038/nrn3158. [UH]
Hihara, S., Yamada, H., Iriki, A. & Okanoya, K. (2003) Spontaneous vocal differentiation of coo-calls for tools and food in Japanese monkeys. Neuroscience Research 45:383–89. [aHA]
Hikosaka, O. (2007) GABAergic output of the basal ganglia. In: GABA and the basal ganglia: From molecules to systems, ed. J. M. Tepper, E. D. Abercrombie & J. P. Bolam, pp. 209–26. (Progress in Brain Research, vol. 160). Elsevier. [aHA, PBM]
Hillix, W. A. (2007) The past, present, and possible futures of animal language research. In: Primate perspectives on behavior and cognition, ed. D. A. Washburn, pp. 223–34. American Psychological Association. [aHA]
Hinton, G. E. (2007) Learning multiple layers of representation. Trends in Cognitive Sciences 11:428–34. [GP]
Hirose, H. (2010) Investigating the physiology of laryngeal structures. In: The handbook of phonetic sciences, 2nd edition, ed. W. J. Hardcastle, J. Laver & F. E. Gibbon, pp. 130–52. Wiley-Blackwell. [aHA]
Ho, A. K., Bradshaw, J. L., Iansek, R. & Alfredson, R. (1999a) Speech volume regulation in Parkinson’s disease: Effects of implicit cues and explicit instructions. Neuropsychologia 37:1453–60. [aHA]
Ho, A. K., Iansek, R. & Bradshaw, J. L. (1999b) Regulation of Parkinsonian speech volume: The effect of interlocuter distance. Journal of Neurology, Neurosurgery, and Psychiatry 67:199–202. [aHA]
Hockett, C. F. (1960) The origin of speech. Scientific American 203:89–96. [BM]
Hofreiter, M. (2011) Drafting human ancestry: What does the Neanderthal genome tell us about hominid evolution? Commentary on Green et al. (2010). Human Biology 83:1–11. [aHA]
Hogan, J. A. (1988) Cause and function in the development of behavior systems. In: Handbook of Behavioral Neurobiology, vol. 9: Developmental psychobiology and behavioral ecology, ed. E. M. Blass, pp. 63–106. Plenum. [DLB]
Holland, J. H. (2012) Signals and boundaries: Building blocks for complex adaptive systems. MIT Press. [TAM]


Holloway, R. L., Broadfield, D. C. & Yuan, M. S. (2004) The human fossil record: Vol. III. Brain endocasts – the paleoneurological evidence. Wiley. [aHA]
Holstege, G. (1991) Descending pathways from the periaqueductal gray and adjacent areas. In: Midbrain periaqueductal gray matter: Functional, anatomical, and neurochemical organization, ed. A. Depaulis & R. Bandler, pp. 239–65. Plenum Press. (NATO ASI Series, Life Sciences, vol. 213). [aHA]
Honing, H. (2012) Without it no music: Beat induction as a fundamental musical trait. Annals of the New York Academy of Sciences 1252(1):85–91. doi: 10.1111/j.1749-6632.2011.06402.x. [HH]
Honing, H., Bouwer, F. & Háden, G. P. (in press) Perceiving temporal regularity in music: The role of auditory event-related potentials (ERPs) in probing beat perception. In: Neurophysiology of temporal processing, ed. H. Merchant & V. de Lafuente. Springer. [HH]
Honing, H., Merchant, H., Háden, G. P., Prado, L. & Bartolo, R. (2012) Rhesus monkeys (Macaca mulatta) detect rhythmic groups in music, but not the beat. PLoS ONE 7(12):1–10. doi: 10.1371/journal.pone.0051369. [HH]
Honing, H. & Ploeger, A. (2012) Cognition and the evolution of music: Pitfalls and prospects. Topics in Cognitive Science 4:513–24. doi: 10.1111/j.1756-8765.2012.01210.x. [HH]
Honing, H., ten Cate, C., Peretz, I., & Trehub, S. (in press) Without it no music: Cognition, biology, and evolution of musicality. Philosophical Transactions B.
Hopkins, W. D. & Cantero, M. (2003) From hand to mouth in the evolution of language: The influence of vocal behaviour on lateralized hand use in manual gestures by chimpanzees. Developmental Science 6:55–61. [AM]
Hopkins, W. D. & Savage-Rumbaugh, E. S. (1991) Vocal communication as a function of differential rearing experiences in Pan paniscus: A preliminary report. International Journal of Primatology 12:559–83. [aHA]
Hopkins, W. D., Taglialatela, J. P. & Leavens, D. A. (2007) Chimpanzees differentially produce novel vocalizations to capture the attention of a human. Animal Behaviour 73(2):281–86. [aHA, AM]
Hopkins, W. D., Taglialatela, J. P. & Leavens, D. A. (2011) Do chimpanzees have voluntary control of their facial expressions and vocalizations? In: Primate communication and human language: Vocalisation, gestures, imitation and deixis in humans and non-humans, ed. A. Vilain, C. Abry, J. C. Schwartz & J. Vauclair, pp. 71–88. [Advances in Interaction Studies, vol. 1]. John Benjamins. [AM]
Hosoda, C., Hanakawa, T., Nariai, T., Ohno, K. & Honda, M. (2012) Neural mechanisms of language switch. Journal of Neurolinguistics 25:44–61. [TH]
Hosoda, C., Tanaka, K., Nariai, T., Honda, M. & Hanakawa, T. (2013) Dynamic neural network reorganization associated with second language vocabulary acquisition: A multimodal imaging study. The Journal of Neuroscience 33:13663–72. [TH, rHA]
Hotchkin, C. F. (2012) Vocal noise compensation in nonhuman mammals: Modification types and usage patterns. Ph.D. thesis, Department of Ecology, The Pennsylvania State University. [DJW]
Hotchkin, C. F. & Parks, S. (2013) The Lombard effect and other noise-induced vocal modifications: Insight from mammalian communication systems. Biological Reviews 88(4):809–24. [DJW]
Hotchkin, C. F., Parks, S. E. & Weiss, D. J. (2013) Vocal modifications in primates: Effects of noise and behavioral context on vocalization structure. In: Proceedings of Meetings on Acoustics, Vol. 19, No. 1, Article 010061. Acoustical Society of America. [DJW]
Huang, Y. C. & Hessler, N. A. (2008) Social modulation during songbird courtship potentiates midbrain dopaminergic neurons. PLOS ONE 3:e3281. [CMV]
Hurford, J. R. (2007) The origins of meaning: Language in the light of evolution. Oxford University Press. [rHA]
Hurst, J., Baraitser, M., Auger, E., Graham, F. & Norell, S. (1990) An extended family with a dominantly inherited speech disorder. Developmental Medicine and Child Neurology 32:347–55. [aHA, KJA]
Iannetti, P., Spalice, A., Raucci, U., Atzei, G. & Cipriani, C. (1997) Gelastic epilepsy: Video-EEG, MRI and SPECT characteristics. Brain and Development 19:418–21. [aHA]
Ikeda, A., Lüders, H. O., Burgess, R. C. & Shibasaki, H. (1992) Movement-related potentials recorded from supplementary motor area and primary motor area. Brain 115:1017–43. [aHA]
Ingold, T. (1994) Tool-using, toolmaking, and the evolution of language. In: Hominid culture in primate perspective, ed. D. Quiatt & J. Itani, pp. 279–314. University Press of Colorado. [aHA]
Isler, K. & van Schaik, C. P. (2012) How our ancestors broke through the gray ceiling: Comparative evidence for cooperative breeding in early homo. Current Anthropology 53:S453–65. [ARL]
Iwasa, H., Shibata, T., Mine, S., Koseki, K., Yasuda, K., Kasagi, Y., Okada, M., Yabe, H., Kaneko, S. & Nakajima, Y. (2002) Different patterns of dipole source localization in gelastic seizure with or without a sense of mirth. Neuroscience Research 43:23–29. [aHA]
Iwatsubo, T., Kuzuhara, S., Kanemitsu, A., Shimada, H. & Toyokura, Y. (1990) Corticofugal projections to the motor nuclei of the brainstem and spinal cord in humans. Neurology 40(2):309–12. [aHA, PL, BM]
Jacob, F. (1977) Evolution and tinkering. Science 196:1161–66. [aHA]
James, W. (2003) The ceremonial animal: A new portrait of anthropology. Oxford University Press. [aHA]
Janik, V. & Slater, P. J. B. (1997) Vocal learning in mammals. In: Advances in the Study of Behavior, vol. 26, ed. P. J. B. Slater, J. S. Rosenblatt, C. T. Snowdon, & M. Milinski, pp. 59–99. Academic Press. [aHA, ARL, BM, AR, DJW]
Janik, V. M. & Slater, P. J. B. (2000) The different roles of social learning in vocal communication. Animal Behaviour 60:1–11. [aHA, BM]
Jankovic, J. (2008) Parkinson’s disease: Clinical features and diagnosis. Journal of Neurology, Neurosurgery, and Psychiatry 79:368–76. [arHA]
Jarvis, E. D. (2004a) Brains and birdsong. In: Nature’s music: The science of birdsong, ed. P. Marler & H. Slabbekoorn, pp. 226–71. Elsevier. [aHA]
Jarvis, E. D. (2004b) Learned birdsong and the neurobiology of human language. In: Behavioral neurobiology of birdsong, ed. H. P. Zeigler, P. Marler, pp. 749–77. (Annals of the New York Academy of Sciences, vol. 1016). New York Academy of Sciences. [aHA, GJLB, CIP]
Jarvis, E. D. (2006) Selection for and against vocal learning in birds and mammals. Ornithological Science 5:5–14. [CIP]
Jarvis, E. D., Ribeiro, S., da Silva, M. L., Ventura, D., Vielliard, J. & Mello, C. V. (2000) Behaviourally driven gene expression reveals song nuclei in hummingbird brain. Nature 406(6796):628–32. doi: 10.1038/35020570. [CIP]
Joel, D. & Weiner, I. (1994) The organization of the basal ganglia-thalamocortical circuits: Open interconnected rather than closed segregated. Neuroscience 63:363–79. [aHA]
Johansson, S. (2005) Origins of language: Constraints on hypotheses. John Benjamins. [SJ, rHA]
Johansson, S. (2011) Constraining the time when language evolved. Linguistic and Philosophical Investigations 10:45–59. [SJ]
Johansson, S. (2013) The talking Neanderthals: What do fossils, genetics, and archeology say? Biolinguistics 7:35–74. [SJ]
Johansson, S., Zlatev, J. & Gärdenfors, P. (2006) Why don’t chimps talk and humans sing like canaries? Comment on Locke & Bogin’s “Language and life history”. Behavioral and Brain Sciences 29(3):287–88. [SJ]
Joint, I. (2006) Bacterial conversations: Talking, listening and eavesdropping. A NERC Discussion Meeting held at the Royal Society on 7 December 2005. Journal of the Royal Society Interface 3(8):459–63. [KBC]
Jonas, S. (1981) The supplementary motor region and speech emission. Journal of Communication Disorders 14:349–73. [aHA]
Jonas, S. (1987) The supplementary motor region and speech. In: The frontal lobes revisited, ed. E. Perecman, pp. 241–50. Erlbaum. [aHA]
Jordan, K. E., Brannon, E. M., Logothetis, N. K. & Ghazanfar, A. A. (2005) Monkeys match the number of voices they hear with the number of faces they see. Current Biology 15:1034–38. [DYT]
Jürgens, U. (1974) On the elicitability of vocalization from the cortical larynx area. Brain Research 81:564–66. [aHA]
Jürgens, U. (1986) The squirrel monkey as an experimental model in the study of cerebral organization of emotional vocal utterances. European Archives of Psychiatry and Neurological Sciences 236:40–43. [aHA]
Jürgens, U. (2002a) A study of the central control of vocalization using the squirrel monkey. Medical Engineering and Physics 7–8:473–77. [BM]
Jürgens, U. (2002b) Neural pathways underlying vocal control. Neuroscience and Biobehavioral Reviews 26:235–58. [aHA, PL]
Jürgens, U. & Alipour, M. (2002) A comparative study on the cortico-hypoglossal connections in primates, using biotin dextranamine. Neuroscience Letters 328:245–48. [aHA]
Jürgens, U., Kirzinger, A. & von Cramon, D. (1982) The effects of deep-reaching lesions in the cortical face area on phonation: A combined case report and experimental monkey study. Cortex 18:125–39. [aHA]
Jürgens, U. & Ploog, D. (1970) Cerebral representation of vocalization in the squirrel monkey. Experimental Brain Research 10:532–54. [aHA]
Jürgens, U. & Ploog, D. (1988) On the motor control of monkey calls. In: The physiological control of mammalian vocalization, ed. J. D. Newman, pp. 7–19. Plenum Press. [DLB]
Jürgens, U. & von Cramon, D. (1982) On the role of the anterior cingulate cortex in phonation: A case report. Brain and Language 15:234–48. [aHA]
Kaestner, K. H., Knöchel, W. & Martínez, D. E. (2000) Unified nomenclature for the winged helix/forkhead transcription factors. Genes and Development 14:142–46. [aHA]
Kaufmann, W. E., Johnston, M. V. & Blue, M. E. (2005) MeCP2 expression and function during brain development: Implications for Rett syndrome’s pathogenesis and clinical evolution. Brain and Development 27:S77–S87. [PBM]
Kawashima, S., Ueki, Y., Kato, T., Matsukawa, N., Mima, T., Hallett, M., Ito, K. & Ojika, K. (2012) Changes in striatal dopamine release associated with human motor-skill acquisition. PLOS ONE 7:e31728. [aHA]
Kelemen, G. (1969) Anatomy of the larynx and the anatomical basis of vocal performance. In: The chimpanzee, ed. G. H. Bourne, pp. 165–86. Karger. [BdB]


Kent, R. D., Kent, J. F., Weismer, G. & Duffy, J. R. (2000) What dysarthrias can tell us about the neural control of speech. Journal of Phonetics 28:273–302. [aHA]
Kent, R. D. & Read, C. (2002) The acoustic analysis of speech, 2nd edition. Singular/Thomson Learning. [aHA]
Ketteler, D., Kastrau, F., Vohn, R. & Huber, W. (2008) The subcortical role of language processing. High level linguistic features such as ambiguity-resolution and the human brain: An fMRI study. NeuroImage 39(4):2002–2009. [KBC]
Kiebel, S. J., Daunizeau, J. & Friston, K. J. (2008) A hierarchy of time-scales and the brain. PLOS Computational Biology 4:e1000209. [GP]
Kikuchi, Y., Horwitz, B. & Mishkin, M. (2010) Hierarchical auditory processing directed rostrally along the monkey’s supratemporal plane. Journal of Neuroscience 30:13021–30. [DHR]
Kim, I.-S., Ki, C.-S. & Park, K.-J. (2010) Pediatric-onset dystonia associated with bilateral striatal necrosis and G14459A mutation in a Korean family: A case report. Journal of Korean Medical Science 25:180–84. [aHA]
Kirby, S. (2002) Learning, bottlenecks and the evolution of recursive syntax. In: Linguistic evolution through language acquisition: Formal and computational models, ed. T. Briscoe, pp. 173–204. Cambridge University Press. [BM]
Kirby, S., Cornish, H. & Smith, K. (2008) Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences USA 105:10681–86. doi: 10.1073/pnas.0707835105. [BM]
Kirk, E., Howlett, N., Pine, K. J. & Fletcher, B. (C). (2012) To sign or not to sign? The impact of encouraging infants to gesture on infant language and maternal mind-mindedness. Child Development 84(2):574–90. [KJA]
Kirschner, S. & Tomasello, M. (2010) Joint music making promotes prosocial behavior in 4-year-old children. Evolution and Human Behavior 31:354–64. [GAB]
Kirzinger, A. (1985) Cerebellar lesion effects on vocalization of the squirrel monkey. Behavioural Brain Research 16:177–81. [arHA]
Kirzinger, A. & Jürgens, U. (1982) Cortical lesion effects and vocalization in the squirrel monkey. Brain Research 233:299–315. [aHA]
Kisilevsky, B. S., Hains, S. M. J., Jacquet, A.-Y., Granier-Deferre, D. & Lecanuet, J. P. (2004) Maturation of fetal responses to music. Developmental Science 7(5):550–59. [DLB]
Kittelberger, J. M. & Bass, A. H. (2013) Vocal-motor and auditory connectivity of the midbrain periaqueductal gray in a teleost fish. Journal of Comparative Neurology 521:791–812. [PBM]
Kjelgaard, M. M. & Speer, S. R. (1999) Prosodic facilitation and interference in the resolution of temporary syntactic closure ambiguity. Journal of Memory and Language 40:153–94. [AR]
Kluender, K. R., Diehl, R. L. & Killeen, P. R. (1987) Japanese quail can learn phonetic categories. Science 237:1195–97. [DLB]
Knecht, S., Dräger, B., Deppe, M., Bobe, L., Lohmann, H., Flöel, A., Ringelstein, E. B. & Henningsen, H. (2000) Handedness and hemispheric language dominance in healthy humans. Brain 123:2512–18. [AM]
Knight, C. (1999) Sex and language as pretend-play. In: The evolution of culture: An interdisciplinary view, ed. R. Dunbar, C. Knight & C. Power, pp. 228–47. Edinburgh University Press. [aHA]
Knolle, F., Schroger, E. & Kotz, S. A. (2013) Cerebellar contribution to the prediction of self-initiated sounds. Cortex 49(9):2449–61. doi: 10.1016/j.cortex.2012.12.012. [UH]
Koda, H., Lemasson, A., Oyakawa, C., Rizaldi, P. J. & Masataka, N. (2013) Possible role of mother-daughter vocal interactions on the development of species-specific song in gibbons. PLOS ONE 8:e71432. [ARL]
Koda, H., Oyakawa, C., Kato, A. & Masataka, N. (2007) Experimental evidence for the volitional control of vocal production in an immature gibbon. Behaviour 144:681–92. [aHA]
Konczak, J., Ackermann, H., Hertrich, I., Spieker, S. & Dichgans, J. (1997) Control of repetitive lip and finger movements in Parkinson’s disease: Influence of external timing signals and simultaneous execution on motor performance. Movement Disorders 12:665–76. [rHA]
Konopka, G., Bomar, J. M., Winden, K., Coppola, G., Jonsson, Z. O., Gao, F., Peng, S., Preuss, T. M., Wohlschlegel, J. A. & Geschwind, D. H. (2009) Human-specific transcriptional regulation of CNS development genes by FOXP2. Nature 462(7270):213–17. [KJA]
Koopmans-van Beinum, F. J. & van der Stelt, J. M. (1986) Early stages in the development of speech movements. In: Precursors of early speech, ed. B. Lindblom & R. Zetterstrom, pp. 37–50. Stockton Press. [DKO]
Kotz, S. A., Frisch, S., Cramon, S. Y. & Friederici, A. D. (2003) Syntactic language processing: ERP lesion data on the role of the basal ganglia. Journal of the International Neuropsychological Society 9:1053–60. [AR]
Kotz, S. A., Kalberlah, C., Bahlmann, J., Friederici, A. D. & Haynes, J. D. (2013) Predicting vocal emotion expressions from the human brain. Human Brain Mapping 34(8):1971–81. doi: 10.1002/hbm.22041. [UH]
Kotz, S. A. & Schwartze, M. (2010) Cortical speech processing unplugged: A timely subcortico-cortical framework. Trends in Cognitive Sciences 14(9):392–99. [SF]
Kotz, S. A., Schwartze, M. & Schmidt-Kassow, M. (2009) Non-motor basal ganglia functions: A review and proposal for a model of sensory predictability in auditory language perception. Cortex 45(8):982–90. doi: 10.1016/j.cortex.2009.02.010. [AZ]
Kovac, S., Deppe, M., Mohammadi, S., Schiffbauer, H., Schwindt, W., Möddel, G., Dogan, M. & Evers, S. (2009) Gelastic seizures: A case of lateral frontal lobe epilepsy and review of the literature. Epilepsy and Behavior 15:249–53. [aHA]
Krägeloh-Mann, I., Helber, A., Mader, I., Staudt, M., Wolff, M., Groenendaal, F. & DeVries, L. (2002) Bilateral lesions of thalamus and basal ganglia: Origin and outcome. Developmental Medicine and Child Neurology 44:477–84. [aHA]
Krause, J., Lalueza-Fox, C., Orlando, L., Enard, W., Green, R. E., Burbano, H. A., Hublin, J. J., Hänni, C., Fortea, J., De la Rasilla, M., Bertranpetit, J., Rosas, A. & Pääbo, S. (2007) The derived FOXP2 variant of modern humans was shared with Neandertals. Current Biology 17:1–5. [aHA]
Kreiman, J. & Sidtis, D. (2011) Foundations of voice studies: An interdisciplinary approach to voice production and perception. Wiley-Blackwell. [aHA]
Kreitzer, A. C. & Malenka, R. C. (2008) Striatal plasticity and basal ganglia circuit function. Neuron 60:543–54. [aHA]
Kuhl, P. K. & Miller, J. D. (1975) Speech perception by chinchilla: Voiced voiceless distinction in alveolar plosive consonants. Science 190:69–72. [DLB, DHR]
Kuhl, P. K., Tsao, F. M. & Liu, H. M. (2003) Foreign-language experience in infancy: Effects of short-term exposure and social interaction on phonetic learning. Proceedings of the National Academy of Sciences USA 100(15):9096–101. [MHB]
Kunishio, K. & Haber, S. N. (1994) Primate cingulostriatal projection: Limbic striatal versus sensorimotor striatal input. Journal of Comparative Neurology 350:337–56. [aHA]
Kurniawan, I. T., Seymour, B., Talmi, D., Yoshida, W., Chater, N. & Dolan, R. J. (2010) Choosing to make an effort: The role of striatum in signaling physical effort of a chosen action. Journal of Neurophysiology 104(1):313–21. doi: 10.1152/jn.00027.2010. [AZ]
Kusmierek, P., Ortiz, M. & Rauschecker, J. P. (2012) Sound-identity processing in early areas of the auditory ventral stream in the macaque. Journal of Neurophysiology 107:1123–41. [DHR]
Kuypers, H. G. J. M. (1958a) Corticobulbar connection to the pons and lower brain-stem in man. Brain 81:364–88. [aHA, PL, BM]
Kuypers, H. G. J. M. (1958b) Some projections from the peri-central cortex to the pons and lower brain stem in monkey and chimpanzee. Journal of Comparative Neurology 110:221–55. [aHA, BM]
Kwon, K., Oller, D. K. & Buder, E. H. (2007) Evidence of systematic repetition in infant vocalizations. Paper presented at the Annual Meeting of the American Speech-Language-Hearing Association, Boston, MA, November 2007. [DKO]
Ladefoged, P. (2005) Vowels and consonants: An introduction to the sounds of languages, 2nd edition. Blackwell. [aHA]
Laffin, J. J., Raca, G., Jackson, C. A., Strand, E. A., Jakielski, K. J. & Shriberg, L. D. (2012) Novel candidate genes and regions for childhood apraxia of speech identified by array comparative genomic hybridization. Genetic Medicine 14:928–36. [CMV]
Lai, C. S. L., Fisher, S. E., Hurst, J. A., Vargha-Khadem, F. & Monaco, A. P. (2001) A forkhead-domain gene is mutated in a severe speech and language disorder. Nature 413(6855):519–23. [KJA, SJ]
Lai, C. S., Gerrelli, D., Monaco, A. P., Fisher, S. E. & Copp, A. J. (2003) FOXP2 expression during brain development coincides with adult sites of pathology in a severe speech and language disorder. Brain 126(Pt 11):2455–62. doi: 10.1093/brain/awg247. [UH]
Lam, D. D., Zhou, L., Vegge, A., Xiu, P. Y., Christensen, B. T., Osundiji, M. A., Yueh, C. Y., Evans, M. L. & Heisler, L. K. (2009) Distribution and neurochemical characterization of neurons within the nucleus of the solitary tract responsive to serotonin agonist-induced hypophagia. Behavioral Brain Research 196:139–43. [CMV]
Lameira, A. R., Hardus, M. E., Kowalsky, B., de Vries, H., Spruijt, B. M., Sterck, E. H. M., Shumaker, R. W. & Wich, S. A. (2013a) Orangutan (Pongo spp.) whistling and implications for the emergence of an open-ended call repertoire: A replication and extension. Journal of the Acoustical Society of America 134:1–11. [ARL]
Lameira, A. R., Hardus, M. E., Nouwen, K. J. J. M., Topelberg, E., Delgado, R. A., Spruijt, B. M., Sterck, E. H. M., Knott, C. D. & Wich, S. A. (2013b) Population-specific use of the same tool-assisted alarm call between two wild orangutan populations (Pongo pygmaeus wurmbii) indicates functional arbitrariness. PLOS ONE 8:e69749. [ARL]
Lameira, A., Hardus, M., & Wich, S. (2012) Orangutan instrumental gesture-calls: Reconciling acoustic and gestural speech evolution models. Evolutionary Biology 39:415–18. [ARL]
Lameira, A. R., Maddieson, I., & Zuberbühler, K. (2013c) Primate feedstock for the evolution of consonants. Trends in Cognitive Science 18:60–62. [ARL]


Lamendella, J. T. (1977) The limbic system in human communication. In: Studies in neurolinguistics, vol. 3, ed. H. Whitaker & H. A. Whitaker, pp. 157–222. Academic Press (Perspectives in Neurolinguistics and Psycholinguistics Series). [aHA]
Lane, H. (1961) Operant control of vocalizing in the chicken. Journal of the Experimental Analysis of Behavior 4:171–77. [ACC]
Lange, K. W., Robbins, T. W., Marsden, C. D., James, M., Owen, A. & Paul, G. M. (1992) L-dopa withdrawal in Parkinson’s disease selectively impairs cognitive performance in tests sensitive to frontal lobe dysfunction. Psychopharmacology 107:394–404. [PL]
Langers, D. R. & Melcher, J. R. (2011) Hearing without listening: Functional connectivity reveals the engagement of multiple nonauditory networks during basic sound processing. Brain Connect 1(3):233–44. doi: 10.1089/brain.2011.0023. [UH]
Langus, A., Marchetto, E., Bion, R. A. H. & Nespor, M. (2012) Can prosody be used to discover hierarchical structure in continuous speech? Journal of Memory and Language 66(1):285–306. doi: 10.1016/j.jml.2011.09.004. [AR]
Laporte, M. N. C. & Zuberbühler, K. (2010) Vocal greeting behaviour in wild chimpanzee females. Animal Behaviour 80:467–73. [DKO]
Larson, C. R. & Kistler, M. K. (1984) Periaqueductal gray neuronal activity associated with laryngeal EMG and vocalization in the awake monkey. Neuroscience Letters 46:261–66. [rHA]
Larson, C. R., Sutton, D. & Lindeman, R. C. (1978) Cerebellar regulation of phonation in rhesus monkey (Macaca mulatta). Experimental Brain Research 33:1–18. [aHA]
Larson, C. R., Sutton, D., Taylor, E. M. & Lindeman, R. (1973) Sound spectral properties of conditioned vocalization in monkeys. Phonetica 27:100–10. [aHA]
Laukka, P., Åhs, F., Furmark, T. & Fredrikson, M. (2011) Neurofunctional correlates of expressed vocal affect in social phobia. Cognitive, Affective, and Behavioral Neuroscience 11(3):413–25. [SF]
Leavens, D. A., Hostetter, A. B., Wesley, M. J. & Hopkins, W. D. (2004) Tactical use of unimodal and bimodal communication by chimpanzees, Pan troglodytes. Animal Behaviour 67:467–76. [AM]
Leavens, D. A., Russell, J. L. & Hopkins, W. D. (2010) Multimodal communication by captive chimpanzees (Pan troglodytes). Animal Cognition 13:33–40. [AM]
Le Beau, J. (1954) Anterior cingulectomy in man. Journal of Neurosurgery 11:268–76. [aHA]
Lehericy, S., Ducros, M., Van de Moortele, P-F., Francois, C. L., Thivard, L., Poupon, C., Swindale, N., Ugurbil, K. & Kim, D. S. (2004) Diffusion tensor fiber tracking shows distinct corticostriatal circuits in humans. Annals of Neurology 55(4):522–29. [PL]
Lemasson, A. & Hausberger, M. (2004) Patterns of vocal sharing and social dynamics in a captive group of Campbell’s monkeys (Cercopithecus campbelli campbelli). Journal of Comparative Psychology 118:347–59. [aHA]
Lemasson, A., Hausberger, M. & Zuberbühler, K. (2005) Socially meaningful vocal plasticity in adult Campbell’s monkeys (Cercopithecus campbelli). Journal of Comparative Psychology 119:220–29. [aHA]
Lemasson, A., Ouattara, K., Petic, J. E. & Zuberbühler, K. (2011) Social learning of vocal structure in a nonhuman primate? BMC Evolutionary Biology 11:362. doi:10.1186/1471-2148-11-362. [ARL]
Lenti Boero, D. (1992) Alarm calls in marmots: Evidence for semantic communication. Ethology, Ecology, Evolution 4(2):125–38. [DLB]
Lenti Boero, D. (2009) Neurofunctional spectrographic analysis of the cry of brain injured asphyxiated infants: A physioacoustic and clinical study. In: Models and analysis of vocal emissions for biomedical applications, ed. C. Manfredi, pp. 3–6, Università di Firenze-Firenze University Press. [DLB]
Lenti Boero, D. & Bottoni, L. (2006) From crying to words: Unique or multilevel selective pressures? Behavioral and Brain Sciences 29(3):292–93. [DLB]
Lenti Boero, D. & Bottoni, L. (2009) Contrasting early cry and early babbling: Results from a pilot study. In: Book of Abstracts, pp. 25–27. University of Torun. Available at: http://www.protolang.umk.pl/2009/public/files/book_of_abstracts.pdf [DLB]
Lenti Boero, D., Miraglia, S., Ortalda, F., Nuti, G., Bottoni, L. & Lenti, C. (2008) Biomusicological approach in infant cry listening. In: Musical Development and Learning: Proceedings of the Second European Conference on Developmental Psychology of Music, Roehampton University, England, 10–12 September, 2008, ed. A. Daubney, E. Longhi, A. Lamont & D. Hargreaves, pp. 162–65. GK Publishing, Hull. (ISBN: 978-0-9553329-1-3). [DLB]
Lenti Boero, D., Volpe, C., Marcello, A., Bianchi, C. & Lenti, C. (1998) Newborns crying in different situational contexts: Discrete or graded signals? Perceptual and Motor Skills 86:1123–40. [DLB]
Lester, B. M. & Boukydis, C. F. Z. (1992) No language but a cry. In: Nonverbal vocal communication. Comparative and developmental approaches, ed. H. Papoušek, U. Jürgens & M. Papoušek, pp. 145–73. Cambridge University Press. [DLB]
Levenson, R. W. (2003) Blood, sweat, and fears: The autonomic architecture of emotion. Annals of the New York Academy of Sciences 1000:348–66. [CM]
Levin, S. A. (1998) Ecosystems and the biosphere as complex adaptive systems. Ecosystems 1:431–36. [TAM]
Levitzky, S. & Cooper, R. (2000) Infant colic syndrome – maternal fantasies of aggression and infanticide. Clinical Pediatrics 39(7):395–400. [DLB]
Lewin, R. & Foley, R. A. (2004) Principles of human evolution, 2nd edition. Blackwell. [aHA]
Lewis, F. M., Lapointe, L. L., Murdoch, B. E. & Chenery, H. J. (1998) Language impairment in Parkinson’s disease. Aphasiology 12(3):193–206. doi: 10.1080/02687039808249446. [AR]
Lewis, J. (2009) As well as words: Congo Pygmy hunting, mimicry, and play. In: The cradle of language, ed. R. Botha & C. Knight, pp. 236–56. Oxford University Press. [aHA]
Lhommée, E., Klinger, H., Thobois, S., Schmitt, E., Ardouin, C., Bichon, A., Kistner, A., Fraix, V., Xie, J., Aya, K. M., Chabardès, S., Seigneuret, E., Benabid, A. L., Mertens, P., Polo, G., Carnicella, S., Quesada, J. L., Bosson, J. L., Broussolle, E., Pollak, P. & Krack, P. (2012) Subthalamic stimulation in Parkinson’s disease: Restoring the balance of motivated behaviours. Brain 135:1463–77. [CMV]
Li, G., Wang, J., Rossiter, S. J., Jones, G. & Zhang, S. (2007) Accelerated FoxP2 evolution in echolocating bats. PLoS ONE 2(9):e900. doi: 10.1371/journal.pone.0000900. [KJA, SJ]
Liberman, A. M., Cooper, F. S., Shankweiler, D. P. & Studdert-Kennedy, M. (1967) Perception of the speech code. Psychological Review 74:431–61. [PL]
Lieberman, D. E. (2011) The evolution of the human head. Harvard University Press. [aHA]
Lieberman, P. (1968) Primate vocalizations and human linguistic ability. Journal of the Acoustical Society of America 44:1574–84. [aHA]
Lieberman, P. (2000) Human language and our reptilian brain: The subcortical bases of speech, syntax, and thought. Harvard University Press. [aHA, PL]
Lieberman, P. (2002) On the nature and evolution of the neural bases of human language. Yearbook of Physical Anthropology 45:36–62. [PL]
Lieberman, P. (2006a) Limits on tongue deformation: Diana monkey formants and the impossible vocal tract shapes proposed by Riede et al. (2005). Journal of Human Evolution 50:219–21. [aHA]
Lieberman, P. (2006b) Toward an evolutionary biology of language. Harvard University Press. [aHA, PL]
Lieberman, P. (2007) The evolution of human speech: Its anatomical and neural bases. Current Anthropology 48:39–66. [aHA]
Lieberman, P. (2009) FOXP2 and human cognition. Cell 137:800–802. [SJ, PL]
Lieberman, P. (2012) Vocal tract anatomy and the neural bases of talking. Journal of Phonetics 40:608–22. [PL]
Lieberman, P. (2013) The unpredictable species: What makes humans unique. Princeton University Press. [PL]
Lieberman, P., Friedman, J. & Feldman, L. S. (1990) Syntactic deficits in Parkinson’s disease. Journal of Nervous and Mental Disease 178:360–65. [PL]
Lieberman, P., Harris, K. S. & Wolff, P. (1968) Newborn infant cry in relation to nonhuman primate vocalizations. Journal of Acoustic Society of America A 44:365. [DLB]
Lieberman, P., Harris, K. S., Wolff, P. & Russel, L. H. (1971) Newborn infant cry and nonhuman primate vocalization. Journal of Speech and Hearing Research 14:718–27. [DLB]
Lieberman, P., Kako, E. T., Friedman, J., Tajchman, G., Feldman, L. S. & Jiminez, E. B. (1992) Speech production, syntax comprehension, and cognitive deficits in Parkinson’s disease. Brain and Language 43:169–89. [PL]
Lieberman, P., Kanki, B. G., Protopapas, A., Reed, E. & Youngs, J. W. (1994) Cognitive defects at altitude. Nature 372:325. [PL]
Lieberman, P., Klatt, D. H. & Wilson, W. H. (1969) Vocal tract limitations on the vowel repertoires of rhesus monkey and other nonhuman primates. Science 164:1185–87. [aHA]
Lieberman, P., Morey, A., Hochstadt, J., Larson, M. & Mather, S. (2005) Mount Everest: A space-analog for speech monitoring of cognitive deficits and stress. Aviation, Space and Environmental Medicine 76:198–207. [PL]
Liégeois, F., Baldeweg, T., Connelly, A., Gadian, D. G., Mishkin, M. & Vargha-Khadem, F. (2003) Language fMRI abnormalities associated with FOXP2 gene mutation. Nature Neuroscience 6:1230–37. [aHA, KJA]
Locke, J. L. (1993) The child’s path to spoken language, First edition. Harvard University Press. [aHA, DKO]
Locke, J. L. (2006) Parental selection of vocal behavior. Crying, cooing, babbling, and the evolution of language. Human Nature 17(2):155–68. [DLB]
Locke, J. L. & Bogin, B. (2006) Language and life history: A new perspective on the development and evolution of human language. Behavioral and Brain Sciences 29(3):301–11. [SJ, DLB, SJ]
Logemann, J. A., Fisher, H. B., Boshes, B. & Blonsky, E. R. (1978) Frequency and cooccurrence of vocal tract dysfunctions in the speech of a large sample of Parkinson patients. Journal of Speech and Hearing Disorders 43:47–57. [aHA]
Lombard, E. (1911) Le signe de l’elevation de la voix [The sign of the elevation of the voice]. Annales des Maladies de l’Oreille et du Larynx 37:101–19. [DJW]

Losin, E., Russell, J. L., Freeman, H., Meguerditchian, A. & Hopkins, W. D. (2008) Left hemisphere specialization for oro-facial movements of learned vocal signals by captive chimpanzees. PLOS ONE 3(6):e2529. [AM]
Loucks, T. M. J., Poletto, C. J., Simonyan, K., Reynolds, C. L. & Ludlow, C. L. (2007) Human brain activation during phonation and exhalation: Common volitional control for two upper airway functions. NeuroImage 36:131–43. [aHA]
Lu, Y. & Cooke, M. (2009) The contribution of changes in F0 and spectral tilt to increased intelligibility of speech produced in noise. Speech Communication 51(12):1253–62. [DJW]
Lund, J. P. & Kolta, A. (2006) Brainstem circuits that control mastication: Do they have anything to say during speech? Journal of Communication Disorders 39:381–90. [aHA, DYT]
Lundy, B. L. (2013) Paternal and maternal mind-mindedness and preschoolers’ theory of mind: The mediating role of interactional attunement. Social Development 22:58–74. [KJA]
Lynch, V. J. (2009) Use with caution: Developmental systems divergence and potential pitfalls of animal models. Yale Journal of Biology and Medicine 82:53–66. [KJA]
Macaulay, V. (2005) Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308:1034–36. [SJ]
MacDermot, K. D., Bonora, E., Sykes, N., Coupe, A. M., Lai, C. S. L., Vernes, S. C., Vargha-Khadem, F., McKenzie, F., Smith, R. L., Monaco, A. P. & Fisher, S. E. (2005) Identification of FOXP2 truncation as a novel cause of developmental speech and language deficits. American Journal of Human Genetics 76:1074–80. [aHA]
MacNeilage, P. F. (1998) The frame/content theory of evolution of speech production. Behavioral and Brain Sciences 21(4):499–511. [arHA, DYT]
MacNeilage, P. F. (2008) The origin of speech. Oxford University Press. [arHA]
MacNeilage, P. F. & Davis, B. L. (2001) Motor mechanisms in speech ontogeny: Phylogenetic, neurobiological and linguistic implications. Current Opinion in Neurobiology 11:696–700. [rHA]
Madden, G. J., ed. (2012) APA handbook of behavior analysis. American Psychological Association. [ACC]
Maddieson, I. (1984) Patterns of sounds. Cambridge University Press. [ARL]
Mallet, N., Ballion, B., Le Moine, C. & Gonon, F. (2006) Cortical inputs and GABA interneurons imbalance projection neurons in the striatum of parkinsonian rats. Journal of Neuroscience 26:3875–84. [aHA]
Malloch, S. & Trevarthen, C., eds. (2009) Communicative musicality: Exploring the basis of human companionship. Oxford University Press. [aHA]
Mampe, B., Friederici, A. D., Christophe, A. & Wermke, K. (2009) Newborns’ cry melody is shaped by their native language. Current Biology 19:1994–97. [DLB]
Männel, C., Schipke, C. S. & Friederici, A. D. (2013) The role of pause as a prosodic boundary marker: Language ERP studies in German 3- and 6-year-olds. Developmental Cognitive Neuroscience 5:86–94. doi: 10.1016/j.dcn.2013.01.003. [AR]
Manser, M. B., Seyfarth, R. M. & Cheney, D. L. (2002) Suricate alarm calls signal predator class and urgency. Trends in Cognitive Sciences 6:55–57. [aHA]
Manson, J. E., Bryant, G. A., Gervais, M. & Kline, M. (2013) Convergence of speech rate in conversation predicts cooperation. Evolution and Human Behavior 34(6):419–26. [GAB]
Mao, C. C., Coull, B. M., Golper, L. A. & Rau, M. T. (1989) Anterior operculum syndrome. Neurology 39:1169–72. [aHA]
Maricic, T., Günther, V., Georgiev, O., Gehre, S., Curlin, M., Schreiweis, C., Naumann, R., Burbano, H. A., Meyer, M., Laluela-Fox, C., de la Rasilla, M., Rosas, A., Gajovic, S., Kelso, J., Enard, W., Schaffner, W. & Pääbo, S. (2013) A recent evolutionary change affects a regulatory element in the human FOXP2 gene. Molecular Biology and Evolution 30(4):844–52. doi: 10.1093/molbev/mss271. [aHA, PL]
Marques, J. F., Canessa, N. & Cappa, S. (2009) Neural differences in the processing of true and false sentences: Insights into the nature of “truth” in language comprehension. Cortex 45(6):759–68. [KBC]
Marschik, P. B., Kaufmann, W. E., Sigafoos, J., Wolin, T., Zhang, D., Bartl-Pokorny, K. D., Pini, G., Zappella, M., Tager-Flusberg, H., Einspieler, C. & Johnston, M. V. (2013) Changing the perspective on early development of Rett syndrome. Research in Developmental Disabilities 34:1236–39. [PBM]
Marschik, P. B., Pini, G., Bartl-Pokorny, K. D., Duckworth, M., Gugatschka, M., Vollmann, R., Zappella, M. & Einspieler, C. (2012) Early speech-language development in females with Rett syndrome: Focusing on the preserved speech variant. Developmental Medicine and Child Neurology 54:451–56. [PBM]
Marsden, C. D. (1982) The mysterious motor function of the basal ganglia: The Robert Wartenberg Lecture. Neurology 32:514–39. [aHA]
Marsden, C. D. & Obeso, J. A. (1994) The functions of the basal ganglia and the paradox of stereotaxic surgery in Parkinson’s disease. Brain 117:877–97. [PL]
Marseglia, G., Scordo, M. R., Pescucci, C., Nannetti, G., Biagini, E., Scandurra, V., Gerundino, F., Magi, A., Benelli, M. & Torricelli, F. (2012) 372 kb Microdeletion in 18q12.3 causing SETBP1 haploinsufficiency associated with mild mental retardation and expressive speech impairment. European Journal of Medical Genetics 55:216–21. [KJA]
Marshall, A. J., Wrangham, R. W. & Arcadi, A. C. (1999) Does learning affect the structure of vocalizations in chimpanzees? Animal Behaviour 58:825–30. [aHA, ARL]
Masataka, N. (2008a) The gestural theory of and the vocal theory of language origins are not incompatible with one another. In: The origins of language: Unraveling evolutionary forces, ed. N. Masataka, pp. 1–10. Springer. [aHA]
Masataka, N. (2008b) Implication of the human musical faculty for evolution of language. In: The origins of language: Unraveling evolutionary forces, ed. N. Masataka, pp. 133–51. Springer. [aHA]
Masdeu, J. C., Schoene, W. C. & Funkenstein, H. (1978) Aphasia following infarction of the left supplementary motor area: A clinicopathologic study. Neurology 28:1220–23. [aHA]
Massart, R., Mongeau, R. & Lanfumey, L. (2012) Beyond the monoaminergic hypothesis: and epigenetic changes in a transgenic mouse model of depression. Philosophical Transactions of the Royal Society B 367:2485–94. [rHA]
Matell, M. S. & Meck, W. H. (2004) Cortico-striatal circuits and interval timing: Coincidence detection of oscillatory processes. Brain Research: Cognitive Brain Research 21(2):139–70. doi: 10.1016/j.cogbrainres.2004.06.012. [CIP]
Mauboussin, M. J. (2002) Revisiting market efficiency: The stock market as a complex adaptive system. Journal of Applied Corporate Finance 14:47–55. [TAM]
Mazzoni, P., Hristova, A. & Krakauer, J. W. (2007) Why don’t we move faster? Parkinson’s disease, movement vigor, and implicit motivation. The Journal of Neuroscience 27(27):7105–16. doi: 10.1523/JNEUROSCI.0264-07.2007. [AZ]
McGettigan, C., Eisner, F., Agnew, Z. K., Manly, T., Wisbey, D. & Scott, S. K. (2013) T’ain’t what you say, it’s the way that you say it – Left insula and inferior frontal cortex work in interaction with superior temporal regions to control the performance of vocal impersonations. Journal of Cognitive Neuroscience 25(11):1875–86. doi:10.1162/jocn_a_00427. [CM]
McHaffie, J. G., Stanford, T. R., Stein, B. E., Coizet, V. & Redgrave, P. (2005) Subcortical loops through the basal ganglia. Trends in Neurosciences 28:401–407. [aHA]
McNaughton, R. & Papert, S. (1971) Counter-free Automata. MIT Press. [KBC]
Meguerditchian, A., Cochet, H. & Vauclair, J. (2011) From gesture to language: ontogenetic and phylogenetic perspectives on gestural communication and its cerebral lateralization. In: Primate communication and human language: Vocalisation, gestures, imitation and deixis in humans and non-humans, ed. A. Vilain, J. L. Schwartz, C. Abry & J. Vauclair, pp. 89–118. [Advances in Interaction Studies, vol. 1]. John Benjamins. [AM]
Meguerditchian, A., Gardner, M. J., Schapiro, S. J. & Hopkins, W. D. (2012) The sound of one hand clapping: Handedness and perisylvian neural correlates of a communicative gesture in chimpanzees. Proceedings of the Royal Society B: Biological Sciences 279:1959–66. [AM]
Mehler, J., Juskzyc, P., Lamberz, G., Halsted, N., Bertoncini, J. & Amiel-Tison, C. (1988) A precursor of language acquisition in young infants. Cognition 29:143–78. [DLB]
Mendez, J. C., Prado, L., Mendoza, G. & Merchant, H. (2011) Temporal and spatial categorization in human and non-human primates. Frontiers in Integrative Neuroscience 5(50):1–10. doi: 10.3389/fnint.2011.00050. [HH]
Menuet, C., Cazals, Y., Gestreau, C., Borghgraef, P., Gielis, L., Dutschmann, M., Van Leuven, F. & Hilaire, G. (2011) Age-related impairment of ultrasonic vocalization in Tau.P301L mice: Possible implication for progressive language disorders. PLOS ONE 6:e25770. [PBM]
Merchant, H., Battaglia-Mayer, A. & Georgopoulos, A. P. (2003) Interception of real and apparent circularly moving targets: Psychophysics in human subjects and monkeys. Experimental Brain Research 152:106–12. [HH]
Merchant, H. & Honing, H. (2014) Are non-human primates capable of rhythmic entrainment? Evidence for the gradual audiomotor evolution hypothesis. Frontiers in Auditory Cognitive Neuroscience 7:274. doi: 10.3389/fnins.2013.00274. [HH]
Merker, B. (2000) Synchronous chorusing and the origins of music. Musicae Scientiae 3(Suppl. 1):59–73. [AR]
Merker, B. (2009) Returning language to culture by way of biology. Commentary on Evans & Levinson (2009). Behavioral and Brain Sciences 32:460–61. [BM]
Merker, B. (2012) The vocal learning constellation: Imitation, ritual culture, encephalization. In: Music, language and human evolution, ed. N. Bannan, pp. 215–60. Oxford University Press. [BM, GP]
Merker, B., Madison, G. & Eckerdal, P. (2009) On the role and origin of isochrony in human rhythmic entrainment. Cortex 45(1):4–17. [AR]
Merker, B. & Okanoya, K. (2007) The natural history of human language: Bridging the gaps without magic. In: Emergence of communication and language, ed. C. Lyon, L. Nehaniv & A. Cangelosi, pp. 403–20. Springer-Verlag. [BM]

Mestres-Missé, A., Turner, R. & Friederici, A. D. (2012) An anterior–posterior gradient of cognitive control within the dorsomedial striatum. NeuroImage 62(1):41–47. [KBC]
Metzner, W. & Schuller, G. (2010) Vocal control in echolocating bats. In: Handbook of mammalian vocalization: An integrative neuroscience approach, ed. S. M. Brudzynski, Ch. 9.4, pp. 403–15. Academic Press/Elsevier. [KJA]
Miller, C. T., Beck, K., Meade, B. & Wang, X. (2009a) Antiphonal call timing in marmosets is behaviorally significant: Interactive playback experiments. Journal of Comparative Physiology, A: Neuroethology, Sensory, Neural, and Behavioral Physiology 195:783–89. [aHA]
Miller, C. T., Eliades, S. J. & Wang, X. (2009b) Motor planning for vocal production in common marmosets. Animal Behaviour 78:1195–203. [aHA]
Miller, J. D. (1977) Perception of speech sounds in animals: evidence for speech processing by mammalian auditory mechanisms. In: Dahlem Workshop on Recognition of Complex Acoustic Signals. Life Sciences Report, vol. 5, ed. T. Bullock, pp. 49–58. Abakon. [DLB]
Miller, J. E., Spiteri, E., Condro, M. C., Dosumu-Johnson, R. T., Geschwind, D. H. & White, S. A. (2008) Birdsong decreases protein levels of FoxP2, a molecule required for human speech. Journal of Neurophysiology 100:2015–25. [KJA]
Miller, P. (2010) The smart swarm. Avery Books. [KJA]
Miller, R. L., Stein, M. K. & Loewy, A. D. (2011) Serotonergic inputs to FoxP2 neurons of the pre-locus coeruleus and parabrachial nuclei that project to the ventral tegmental area. Neuroscience 193:229–40. [CMV]
Milo, R. G. & Quiatt, D. (1994) Language in the middle and late stone ages: Glottogenesis in anatomically modern homo sapiens. In: Hominid culture in primate perspective, ed. D. Quiatt & J. Itani, pp. 321–39. University Press of Colorado. [aHA]
Mimura, M., Oeda, R. & Kawamura, M. (2006) Impaired decision-making in Parkinson’s disease. Parkinsonism and Related Disorders 12(3):169–75. doi: 10.1016/j.parkreldis.2005.12.003. [AZ]
Mitani, J. C. & Brandt, K. L. (1994) Social factors influence acoustic variability in the long-distance calls of male chimpanzees. Ethology 96:233–52. [aHA]
Mitani, J. C. & Gros-Louis, J. (1998) Chorusing and call convergence in chimpanzees: Tests of three hypotheses. Behaviour 135:1041–64. [aHA]
Mithen, S. J. (2006) The singing Neanderthals: The origins of music, language, mind and body. Harvard University Press. (Original work published in 2005). [arHA, SJ]
Mogenson, G. J., Jones, D. L. & Yim, C. Y. (1980) From motivation to action: Functional interface between the limbic system and the motor system. Progress in Neurobiology 14:69–97. [aHA]
Monchi, O., Petrides, M., Petre, V., Worsley, K. & Dagher, A. (2001) Wisconsin card sorting revisited: Distinct neural circuits participating in different stages of the task identified by event-related functional magnetic resonance imaging. Journal of Neuroscience 21(19):7733–41. [PL]
Moore, C. A. (2004) Physiologic development of speech production. In: Speech motor control in normal and disordered speech, ed. B. Maassen, R. D. Kent, H. F. M. Peters, P. H. H. M. van Lieshout & W. Hulstijn, pp. 191–209. Oxford University Press. [aHA]
Moore, R. K. (2007) PRESENCE: A human-inspired architecture for speech-based human-machine interaction. IEEE Transactions on Computers 56:1176–88. [GP]
Morecraft, R. J. & van Hoesen, G. W. (1992) Cingulate input to the primary and supplementary motor cortices in the Rhesus monkey: Evidence for somatotopy in areas 24c and 23c. Journal of Comparative Neurology 322:471–89. [aHA]
Morecraft, R. J., Louie, J. L., Herrick, J. L. & Stilwell-Morecraft, K. S. (2001) Cortical innervation of the facial nucleus in the non-human primate: A new interpretation of the effects of stroke and related subtotal brain trauma on the muscles of facial expression. Brain 124:176–208. [aHA]
Morley, I. (2012) Hominin physiological evolution and the emergence of musical capacities. In: Music, language, and human evolution, ed. N. Bannan, pp. 109–41. Oxford University Press. [aHA]
Morrill, R. J., Paukner, A., Ferrari, P. F. & Ghazanfar, A. A. (2012) Monkey lipsmacking develops like the human speech rhythm. Developmental Science 15:557–68. [DYT]
Morse, P. A. & Snowdon, C. T. (1975) An investigation of categorical speech discrimination by rhesus monkeys. Perception and Psychophysics 17:9–16. [DLB]
Mukamel, Z., Konopka, G., Wexler, E., Osborn, G. E., Dong, H., Bergman, M. Y., Levitt, P. & Geschwind, D. H. (2011) Regulation of MET by FOXP2, genes implicated in higher cognitive dysfunction and autism risk. Journal of Neuroscience 31(32):11437–42. [KJA]
Müller, J., Wenning, G. K., Verny, M., McKee, A., Chaudhuri, K. R., Jellinger, K., Poewe, W. & Litvan, I. (2001) Progression of dysarthria and dysphagia in postmortem-confirmed Parkinsonian disorders. Archives of Neurology 58:259–64. [aHA]
Müller-Vahl, K. R., Kaufmann, J., Grosskreutz, J., Dengler, R., Emrich, H. M. & Peschel, T. (2009) Prefrontal and anterior cingulate cortex abnormalities in Tourette syndrome: Evidence from voxel-based morphometry and magnetization transfer imaging. BMC Neuroscience 10:47. Available at: www.biomedcentral.com/1471-2202/10/47 [aHA]
Munhall, K. & Löfqvist, A. (1992) Gestural aggregation in speech: Laryngeal gestures. Journal of Phonetics 20:111–26. [aHA]
Myers, R. E. (1976) Comparative neurology of vocalization and speech: Proof of a dichotomy. In: Origins and evolution of language and speech, ed. S. R. Harnad, H. D. Steklis & J. Lancaster, pp. 745–57. (Annals of the New York Academy of Sciences, vol. 280). New York Academy of Sciences. [aHA]
Nachtigall, P., Supin, A. Y., Pawloski, J. & Au, W. W. L. (2004) Temporary threshold shifts after noise exposure in the bottlenose dolphin (Tursiops truncatus) measured using evoked auditory potentials. Marine Mammal Science 20:673–87. [DJW]
Naeser, M. A., Alexander, M. P., Helm-Estabrooks, N., Levine, H. L., Laughlin, S. A. & Geschwind, N. (1982) Aphasia with predominantly subcortical lesion sites: Description of three capsular/putaminal aphasia syndromes. Archives of Neurology 39(1):2–14. [PL]
Nakano, K. (2000) Neural circuits and topographic organization of the basal ganglia and related regions. Brain and Development 22:S5–16. [aHA]
Nambu, A. (2008) Seven problems on the basal ganglia. Current Opinion in Neurobiology 18:595–604. [aHA]
Nambu, A. (2011) Somatotopic organization of the primate basal ganglia. Frontiers in Neuroanatomy 5:26. [aHA]
Nathani, S., Neal, A. R., Olds, H., Brill, J. & Oller, D. K. (2001) Canonical babbling and volubility in infants with moderate to severe hearing impairment. Paper presented at the International Child Phonology Conference, Boston, MA, April 2001. [DKO]
Natsopoulos, D., Grouios, G., Bostantzopoulou, S., Mentenopoulos, G., Katsarou, Z. & Logothetis, J. (1993) Algorithmic and heuristic strategies in comprehension of complement clauses by patients with Parkinson’s disease. Neuropsychologia 31(9):951–64. [PL]
Nauta, W. J. H. & Gygax, P. A. (1954) Silver impregnation of degenerating axons in the central nervous system: A modified technic. Stain Technology 29:91–93. [PL]
Nery, M. F., González, D. J. & Opazo, J. C. (2013) How to make a dolphin: Molecular signature of positive selection in Cetacean Genome. PLoS ONE 8(6):e65491. doi: 10.1371/journal.pone.0065491. [KJA]
Neubert, F.-X., Mars, R. B., Thomas, A. G., Sallet, J. & Rushworth, M. F. S. (2014) Comparison of human ventral frontal cortex areas for cognitive control and language with areas in monkey frontal cortex. Neuron 81:1–14. doi:10.1016/j.neuron.2013.11.012. [BM]
Neul, J. L., Kaufmann, W. E., Glaze, D. G., Christodolou, J., Clarke, A. J., Bahi-Buisson, N., Leonard, H., Bailey, M. E., Schanen, N. C., Zappella, M., Renieri, A., Huppke, P., Percy, A. K. & RettSearch Consortium. (2010) Rett syndrome: Revised diagnostic criteria and nomenclature. Annals of Neurology 68:944–50. [PBM]
Newbury, D. F., Fisher, S. E. & Monaco, A. P. (2010) Recent advances in the genetics of language impairment. Genome Medicine 2:6. [KJA]
Newbury, D. F., Paracchini, S., Scerri, T. S., Winchester, L., Addis, L., Richardson, A. J., Walter, J., Stein, J. F., Talcott, J. B. & Monaco, A. P. (2011) Investigation of dyslexia and SLI risk variants in reading- and language-impaired subjects. Behavioral Genetics 41:90–104. [KJA]
Newman, J. D. (2003) Vocal communication and the triune brain. Physiology and Behavior 79:495–502. [aHA]
Newport, E. L. & Meier, R. P. (1985) The acquisition of American Sign Language. In: The crosslinguistic study of language acquisition, ed. D. Slobin, pp. 881–38. Erlbaum. [BF]
Nielsen, M. A. & Chuang, I. L. (2000) Quantum computation and quantum information. Cambridge University Press. [KBC]
Nieuwenhuys, R., Voogd, J. & van Huijzen, C. (2008) The human central nervous system, 4th edition. Springer. [aHA]
Niv, Y., Joel, D. & Dayan, P. (2006) A normative perspective on motivation. Trends in Cognitive Sciences 10(8):375–81. doi: 10.1016/j.tics.2006.06.010. [AZ]
Noonan, J. P. (2010) Neanderthal genomics and the evolution of modern humans. Genome Research 20:547–53. [aHA]
Nottebohm, F. (1976) Discussion paper. Vocal tract and brain: A search for evolutionary bottlenecks. In: Origins and evolution of language and speech, ed. S. R. Harnad, H. D. Steklis & J. Lancaster, pp. 643–49. Annals of the New York Academy of Sciences, vol. 280. New York Academy of Sciences. [BM]
Nuckolls, A. L., Worley, C., Leto, C., Zhang, H., Morris, J. K. & Stanford, J. A. (2012) Tongue force and tongue motility are differently affected by unilateral vs. bilateral nigrostriatal dopamine depletion in rats. Behavioral Brain Research 234:343–48. [CMV]
Nudel, R. & Newbury, D. F. (2013) FOXP2 advanced review. WIREs Cognitive Science 4(5):547–60. doi: 10.1002/wcs.1247. [KJA]
Obeso, J. A., Jahanshahi, M., Alvarez, L., Macias, R., Pedroso, I., Wilkinson, L., Pavon, N., Day, B., Pinto, S., Rodríguez-Oroz, M. C., Tejeiro, J., Artieda, J., Talelli, P., Swayne, O., Rodríguez, R., Bhatia, K., Rodriguez-Dias, M., Lopez,

G., Guridi, J. & Rothwell, J. C. (2009) What can man do without basal ganglia motor output? The effect of combined unilateral subthalamotomy and pallidotomy in a patient with Parkinson’s disease. Experimental Neurology 220(2):283–92. doi: 10.1016/j.expneurol.2009.08.030. [AZ]
Okanoya, K., Hihara, S., Tokimoto, N., Tobari, Y. & Iriki, A. (2007) Complex vocal behaviour and cortical-medullar projection. Lecture Notes in Computer Science 3609:362–67. [BM]
Okanoya, K. & Merker, B. (2007) Neural substrates for string-context mutual segmentation: A path to human language. In: Emergence of communication and language, ed. C. Lyon, L. Nehaniv & A. Cangelosi, pp. 421–34. Springer-Verlag. [BM]
Olivier, E., Davare, M., Andres, M. & Fadiga, L. (2007) Precision grasping in humans: From motor control to cognition. Current Opinion in Neurobiology 17(6):644–48. doi: 10.1016/j.conb.2008.01.008. [AZ]
Oller, D. K. (1980) The emergence of the sounds of speech in infancy. In: Child phonology, vol. 1: Production, ed. G. Yeni-Komshian, J. Kavanagh & C. Ferguson, pp. 93–112. Academic Press. [DKO]
Oller, D. K. (2000) The emergence of the speech capacity. Erlbaum. [DLB, DKO]
Oller, D. K., Buder, E. H., Ramsdell, H. L., Chorna, L., Warlaumont, A. S. & Bakeman, R. (2013) Functional flexibility of infant vocalization and the emergence of language. In: Proceedings of the National Academy of Sciences USA 110(16):6318–23. doi: 10.1073/pnas.1300337110. [KJA, DKO]
Öngür, D. & Price, J. L. (2000) The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cerebral Cortex 10:206–19. [aHA]
Ouattara, K., Lemasson, A. & Zuberbühler, K. (2009) Campbell’s monkeys concatenate vocalizations into context-specific call sequences. Proceedings of the National Academy of Sciences USA 106(51):22026–31. [aHA, KBC, ARL]
Owren, M. J., Amoss, R. T. & Rendall, D. (2011) Two organizing principles of vocal production: Implications for nonhuman and human primates. American Journal of Primatology 73(6):530–44. [aHA, DJW]
Owren, M. J., Dieter, J. A., Seyfarth, R. M. & Cheney, D. L. (1992) “Food” calls produced by adult female rhesus (Macaca mulatta) and Japanese (M. fuscata) macaques, their normally-raised offspring, and offspring cross-fostered between species. Behaviour 120:218–31. [aHA]
Owren, M. J., Dieter, J. A., Seyfarth, R. M. & Cheney, D. L. (1993) Vocalizations of rhesus (Macaca mulatta) and Japanese (M. fuscata) macaques cross-fostered between species show evidence of only limited modification. Developmental Psychobiology 26:389–406. [aHA, DHR]
Packard, M. G. & Knowlton, B. J. (2002) Learning and memory functions of the basal ganglia. Annual Review of Neuroscience 25:563–93. [aHA]
Panksepp, J. (1998) Affective neuroscience: The foundations of human and animal emotions. Oxford University Press. [aHA]
Panksepp, J. (2010) Emotional causes and consequences of social-affective vocalization. In: Handbook of mammalian vocalization: An integrative neuroscience approach, ed. S. M. Brudzynski, pp. 201–208. Elsevier. [aHA]
Papoušek, M. (2003) Intuitive parenting: A hidden source of musical stimulation in infancy. In: Musical beginnings: Origins and development of musical competence, ed. I. Deliège & J. Sloboda, pp. 88–112. Oxford University Press. [aHA]
Papoušek, M. & Papoušek, H. (1981) Musical elements in the infant’s vocalization: Their significance for communication, cognition, and creativity. In: Advances in infancy research, vol. 1, ed. L. P. Lipsitt, pp. 163–224. Ablex. [DLB]
Parent, A. & Hazrati, L. N. (1995) Functional anatomy of the basal ganglia: I. The cortico-basal ganglia-thalamo-cortical loop. Brain Research Brain Research Reviews 20:91–127. [aHA]
Parks, S. E., Johnson, M., Nowacek, D. P. & Tyack, P. L. (2011) Individual right whales call louder in increased environmental noise. Biology Letters 7:33–35. [DJW]
Parr, L., Waller, B. M. & Heintz, M. (2008) Facial expression categorization by chimpanzees using standardized stimuli. Emotion 8(2):216–31. doi: 10.1037/1528-3542.8.2.216. [DKO]
Passingham, R. (2008) What is special about the human brain? Oxford University Press. [aHA]
Patel, A. D. (2006) Musical rhythm, linguistic rhythm, and human evolution. Music Perception: An Interdisciplinary Journal 24(1):99–104. [AR]
Patel, A. D. (2008) Music, language, and the brain. Oxford University Press. [HH]
Patel, A. D., Iversen, J. R., Bregman, M. R. & Schulz, I. (2009a) Experimental evidence for synchronization to a musical beat in a nonhuman animal. Current Biology 19(10):827–30. [AR]
Patel, A. D., Iversen, J. R., Bregman, M. R. & Schulz, I. (2009b) Studying synchronization to a musical beat in nonhuman animals. Annals of the New York Academy of Sciences 1169:459–69. doi: 10.1111/j.1749-6632.2009.04581.x. [HH]
Patel, S., Scherer, K. R., Bjorkner, E. & Sundberg, J. (2011) Mapping emotions into acoustic space: The role of voice production. Biological Psychology 87(1):93–98. [SF]
Patterson, F. G. & Linden, E. (1981) The education of Koko. Holt, Rinehart, & Winston. [BdB]
Paul, R., Fuerst, Y., Ramsay, G., Chawarska, K. & Klin, A. (2011) Out of the mouths of babes: Vocal production in infant siblings of children with ASD. Journal of Child Psychology and Psychiatry 52:588–98. [PBM]
Paulmann, S., Ott, D. V. & Kotz, S. A. (2011) Emotional speech perception unfolding in time: The role of the basal ganglia. PLOS ONE 6(3):e17694. doi: 10.1371/journal.pone.0017694. [UH]
Paulmann, S., Pell, M. D. & Kotz, S. A. (2008) Functional contributions of the basal ganglia to emotional prosody: Evidence from ERPs. Brain Research 1217:171–78. doi: 10.1016/j.brainres.2008.04.032. [UH]
Paus, T. (2001) Primate anterior cingulate cortex: Where motor control, drive and cognition interface. Nature Reviews Neuroscience 2:417–24. [aHA]
Paus, T., Tomaiuolo, F., Otaky, N., MacDonald, D., Petrides, M., Atlas, J., Morris, R. & Evans, A. C. (1996) Human cingulate and paracingulate sulci: Pattern, variability, asymmetry, and probabilistic map. Cerebral Cortex 6:207–14. [aHA]
Peake, T. M. & McGregor, P. K. (2004) Information and aggression in fishes. Learning and Behavior 32(1):114–21. [KBC]
Peelle, J. E. & Davis, M. H. (2012) Neural oscillations carry speech rhythm through to comprehension. Frontiers in Psychology 3:320. [rHA]
Pell, M. D. & Monetta, L. (2008) How Parkinson’s disease affects non-verbal communication and language processing. Language and Linguistics Compass 2(5):739–59. doi: 10.1111/j.1749-818X.2008.00074.x. [AZ]
Penn, D. C., Holyoak, K. J. & Povinelli, D. J. (2008) Darwin’s mistake: Explaining the discontinuity between human and nonhuman minds. Behavioral and Brain Sciences 31(2):109–78. [BF]
Pepperberg, I. M. (2007) Grey parrots do not always “parrot”: The roles of imitation and phonological awareness in the creation of new labels from existing vocalizations. Language Science 29:1–13. [DLB]
Perlman, M., Patterson, F. G. & Cohn, R. H. (2011) The incorporation of learned breathing-related behavior into the multimodal gestures and rituals of an enculturated gorilla. Paper presented at the New College Oxford Conference on Embodied Language, Oxford, United Kingdom, September 2011. [BdB]
Perlman, M., Patterson, F. G. & Cohn, R. H. (2012) The human-fostered gorilla Koko shows breath control in play with wind instruments. Biolinguistics 6:433–44. [GAB]
Péron, J., Frühholz, S., Verin, M. & Grandjean, D. (2013) Subthalamic nucleus: A key structure for emotional component synchronization in humans. Neuroscience and Biobehavioral Reviews 37(3):358–73. [SF]
Perrodin, C., Kayser, C., Logothetis, N. K. & Petkov, C. I. (2011) Voice cells in the primate temporal lobe. Current Biology 21:1408–15. [DHR]
Perszyk, D. R. & Waxman, S. R. (2013) The role of experience in linking sounds and meaning in language acquisition. Paper presented at the 38th Annual Boston University Conference on Language Development, Boston, MA, November 3, 2013. [BF]
Petitto, L. A. & Marentette, P. F. (1991) Babbling in the manual mode: Evidence for the ontogeny of language. Science 251(5000):1493–96. [BF]
Petkov, C. I. & Jarvis, E. D. (2012) Birds, primates, and spoken language origins: Behavioral phenotypes and neurobiological substrates. Frontiers in Evolutionary Neuroscience 4:12. doi: 10.3389/fnevo.2012.00012. [CIP]
Petkov, C. I., Kayser, C., Augath, M. & Logothetis, N. K. (2006) Functional imaging reveals numerous fields in the monkey auditory cortex. PLOS Biology 4:e215. [DHR]
Petkov, C. I. & Wilson, B. (2012) On the pursuit of the brain network for proto-syntactic learning in non-human primates: Conceptual issues and neurobiological hypotheses. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 367(1598):2077–88. doi: 10.1098/rstb.2012.0073. [CIP]
Petrides, M., Cadoret, G. & Mackey, S. (2005) Orofacial somatomotor responses in the macaque monkey homologue of Broca’s area. Nature 435:1235–38. [aHA]
Petrides, M. & Pandya, D. N. (2009) Distinct parietal and temporal pathways to the homologues of Broca’s area in the monkey. PLoS Biology 7:e1000170. [aHA]
Pezzulo, G. (2012a) An Active Inference view of cognitive control. Frontiers in Psychology 3:478. doi: 10.3389/fpsyg.2012.00478. [GP]
Pezzulo, G. (2012b) The “Interaction Engine”: A common pragmatic competence across linguistic and non-linguistic interactions. IEEE Transactions on Autonomous Mental Development 4:105–23. [GP]
Pezzulo, G. (2013) Studying mirror mechanisms within generative and predictive architectures for joint action. Cortex 49:2968–69. [GP]
Pezzulo, G., Donnarumma, F. & Dindo, H. (2013) Human sensorimotor communication: A theory of signaling in online social interactions. PLOS ONE 8:e79876. [GP]
Phillips-Silver, J., Aktipis, A. & Bryant, G. A. (2010) The ecology of entrainment: Foundations of coordinated rhythmic movement. Music Perception 28(1):3–14. [GAB]
Phister, P. W., Jr. (2010) Cyberspace: The ultimate complex adaptive system. International C2 Journal 4:1–30. [TAM]

Picard, N. & Strick, P. L. (1996) Motor areas of the medial wall: A review of their location and functional activation. Cerebral Cortex 6:342–53. [aHA]
Pichon, S. & Kell, C. A. (2013) Affective and sensorimotor components of emotional prosody generation. The Journal of Neuroscience 33(4):1640–50. [SF]
Pick, H. L., Jr., Siegel, G. M., Fox, P. W., Garber, S. R. & Kearney, J. K. (1989) Inhibiting the Lombard effect. Journal of the Acoustical Society of America 85:94. [DJW]
Pickering, M. J. & Garrod, S. (2013) An integrated theory of language production and comprehension. Behavioral and Brain Sciences 36(4):329–47. [GP]
Pickett, E. R., Kuniholm, E., Protopapas, A., Friedman, J. & Lieberman, P. (1998) Selective speech motor, syntax and cognitive deficits associated with bilateral damage to the putamen and the head of the caudate nucleus: A case study. Neuropsychologia 36:173–88. [PL]
Pierce, J. D., Jr. (1985) A review of attempts to condition operantly alloprimate vocalizations. Primates 26:202–13. [aHA]
Pinel, P., Fauchereau, F., Moreno, A., Barbot, A., Lathrop, M., Zelenika, D., Le Bihan, D., Poline, J.-B., Bourgeron, T. & Dehaene, S. (2012) Genetic variants of FOXP2 and KIAA0319/TTRAP/THEM2 locus are associated with altered brain activation in distinct language-related regions. Journal of Neuroscience 32(3):817–25. [KJA]
Pinker, S. (2010) Colloquium paper: The cognitive niche: Coevolution of intelligence, sociality, and language. Proceedings of the National Academy of Sciences USA 107:8993–99. doi: 10.1073/pnas.0914630107. [TAM]
Pinker, S. & Bloom, P. (1990) Natural language and . Behavioral and Brain Sciences 13:707–84. [BF]
Pistorio, A. L., Vintch, B. & Wang, X. (2006) Acoustic analysis of vocal development in a New World primate, the common marmoset (Callithrix jacchus). Journal of the Acoustical Society of America 120:1655–70. [aHA, DHR]
Postuma, R. B. & Dagher, A. (2006) Basal ganglia functional connectivity based on a meta-analysis of 126 positron emission tomography and functional magnetic resonance imaging publications. Cerebral Cortex 16:1508–21. [aHA]
Potulska, A., Friedman, A., Królicki, L. & Spychala, A. (2003) Swallowing disorders in Parkinson’s disease. Parkinsonism Related Disorders 9:349–453. [CMV]
Prather, J. F., Peters, S., Nowicki, S. & Mooney, R. (2008) Precise auditory–vocal mirroring in neurons for learned vocal communication. Nature 451:305–10. [GP]
Provine, R. R. (1993) Laughter punctuates speech: Linguistic, social and gender contexts of laughter. Ethology 95:291–98. [GAB]
Provine, R. R. (2000) Laughter: A scientific investigation. Viking/Penguin Press. [GAB, ACC]
Provine, R. R. (2012) Curious behavior: Yawning, laughing, hiccupping, and beyond. Belknap/Harvard University Press. [ACC]
Provine, R. R. & Emmorey, K. (2006) Laughter among deaf signers. Journal of Deaf Studies and Deaf Education 11(4):403–409. doi: 10.1093/deafed/enl008. [CM]
Radua, J., van den Heuvel, O. A., Surguladze, S. & Mataix-Cols, D. (2010) Meta-analytical comparison of voxel-based morphometry studies in obsessive-compulsive disorder vs other anxiety disorders. Archives of General Psychiatry 67:701–11. [aHA]
Rajan, R., Dubaj, V., Reser, D. H. & Rosa, M. G. (2013) Auditory cortex of the marmoset monkey – complex responses to tones and vocalizations under opiate anaesthesia in core and belt areas. European Journal of Neuroscience 37:924–41. [DHR]
Ramig, L. O., Fox, C. & Sapir, S. (2004) Parkinson’s disease: Speech and voice disorders and their treatment with the Lee Silverman Voice Treatment. Seminars in Speech and Language 25:169–80. [aHA]
Ramig, L. O., Fox, C. & Sapir, S. (2007) Speech disorders in Parkinson’s disease and the effects of pharmacological, surgical and speech treatment with emphasis on Lee Silverman Voice Treatment (LSVT®). In: Parkinson’s disease and related disorders, Part 1, ed. W. C. Koller & E. Melamed, pp. 385–99. (Handbook of Clinical Neurology, vol. 83, 3rd series). Elsevier Press. [aHA]
Rao, S. M., Harrington, D. L., Haaland, K. Y., Bobholz, J. A., Cox, R. W. & Binder, J. R. (1997) Distributed neural systems underlying the timing of movements. Journal of Neuroscience 17:5528–35. [HH]
Rappaport, R. A. (1999) Ritual and religion in the making of humanity. Cambridge University Press. [aHA]
Rappaport, R. A. (2000) Pigs for the ancestors: Ritual in the ecology of a New Guinea people, 2nd edition. Waveland Press. [aHA]
Rasa, O. A. E. (1986) Coordinated vigilance in dwarf mongoose family groups: The “watchman’s song” hypothesis and the costs of guarding. Ethology 71(4):340–44. [DLB]
Raul, L. (2003) Serotonin2 receptors in the nucleus tractus solitarius: Characterization and role in the baroreceptor reflex arc. Cell Molecular Neurobiology 23:709–26. [CMV]
Rauschecker, J. P. (2012) Ventral and dorsal streams in the evolution of speech and language. Frontiers in Evolutionary Neuroscience 4:7. doi: 10.3389/fnevo.2012.00007. [TAM]
Rauschecker, J. P. & Scott, S. K. (2009) Maps and streams in the auditory cortex: Nonhuman primates illuminate human speech processing. Nature Neuroscience 12(6):718–24. [SF]
Ravignani, A., Olivera, V. M., Gingras, B., Hofer, R., Hernández, C. R., Sonnweber, R.-S. & Fitch, W. T. (2013) Primate drum kit: A system for studying acoustic pattern production by non-human primates using acceleration and strain sensors. Sensors 13(8):9790–820. [AR]
Reimers-Kipping, S., Hevers, W., Pääbo, S. & Enard, W. (2011) Humanized Foxp2 specifically affects cortico-basal ganglia circuits. Neuroscience 175:75–84. doi: 10.1016/j.neuroscience.2010.11.042. [aHA, BM]
Reiner, A. (2010) Organization of corticostriatal projection neuron types. In: Handbook of basal ganglia structure and function, ed. H. Steiner & K. Y. Tseng, pp. 323–39. Elsevier. [aHA]
Rendall, D., Kollias, S., Ney, C. & Lloyd, P. (2005) Pitch (F0) and formant profiles of human vowels and vowel-like baboon grunts: The role of vocalizer body size and voice-acoustic allometry. Journal of the Acoustical Society of America 117:944–55. [aHA]
Repp, B. H. (2005) Sensorimotor synchronization: A review of the tapping literature. Psychonomic Bulletin and Review 12:969–92. [GAB]
Reser, D. H., Burman, K. J., Richardson, K. E., Spitzer, M. W. & Rosa, M. G. (2009) Connections of the marmoset rostrotemporal auditory area: Express pathways for analysis of affective content in hearing. European Journal of Neuroscience 30:578–92. [DHR]
Richman, B. (1976) Some vocal distinctive features used by gelada monkeys. Journal of the Acoustical Society of America 60:718–24. [GAB]
Riecker, A., Brendel, B., Ziegler, W., Erb, M. & Ackermann, H. (2008) The influence of syllable onset complexity and syllable frequency on speech motor control. Brain and Language 107(2):102–13. doi: 10.1016/j.bandl.2008.01.008. [UH]
Riecker, A., Kassubek, J., Gröschel, K., Grodd, W. & Ackermann, H. (2006) The cerebral control of speech tempo: Opposite relationship between speaking rate and BOLD signal changes at striatal and cerebellar structures. NeuroImage 29:46–53. [aHA]
Riecker, A., Wildgruber, D., Dogil, G., Grodd, W. & Ackermann, H. (2002) Hemispheric lateralization effects of rhythm implementation during syllable repetitions: An fMRI study. NeuroImage 16:169–76. [rHA]
Riede, T., Bronson, E., Hatzikirou, H. & Zuberbühler, K. (2005) Vocal production mechanisms in a non-human primate: Morphological data and a model. Journal of Human Evolution 48:85–96. [aHA]
Riede, T., Bronson, E., Hatzikirou, H. & Zuberbühler, K. (2006) Multiple discontinuities in nonhuman vocal tracts: A response to Lieberman (2006). Journal of Human Evolution 50:222–25. [aHA]
Riede, T., Tokuda, I. T., Munger, J. B. & Thomson, S. L. (2008) Mammalian laryngeal air sacs add variability to the vocal tract impedance: Physical and computational modeling. Journal of the Acoustical Society of America 124(1):634–47. [BdB]
Riede, T. & Zuberbühler, K. (2003a) Pulse register phonation in Diana monkey alarm calls. Journal of the Acoustical Society of America 113:2919–26. [aHA]
Riede, T. & Zuberbühler, K. (2003b) The relationship between acoustic structure and semantic information in Diana monkey alarm vocalization. Journal of the Acoustical Society of America 114:1132–42. [aHA]
Rightmire, G. P. (2004) Brain size and encephalization in early to mid-Pleistocene Homo. American Journal of Physical Anthropology 124:109–23. [aHA]
Rightmire, G. P. (2007) Later middle Pleistocene Homo. In: Handbook of paleoanthropology, vol. 3: Phylogeny of hominids, ed. W. Henke & I. Tattersall, pp. 1695–715. Springer. [aHA]
Risley, T. R. (1977) The development and maintenance of language: An operant model. In: New developments in behavioral research, ed. B. C. Etzel, J. M. LeBlanc & D. M. Baer, pp. 81–101. Erlbaum. [ACC]
Rizzolatti, G. & Arbib, M. A. (1998) Language within our grasp. Trends in Neurosciences 21:188–94. [DYT]
Robbins, T. W. (2010) From behavior to cognition: Functions of mesostriatal, mesolimbic, and mesocortical dopamine systems. In: Dopamine handbook, ed. L. L. Iversen, S. D. Iversen, S. B. Dunnett & A. Björklund, pp. 203–14. Oxford University Press. [aHA]
Robinson, B. W. (1967) Vocalization evoked from forebrain in Macaca mulatta. Physiology and Behavior 2:345–54. [aHA]
Rocca, F. & Lenti Boero, D. (2005) Sex differences in human infant cry: A comparative view. In: Abstracts of the XXIX International Ethological Conference, ed. R. Sàndor, p. 131. Késult a Codex Print Nyondàbam, Budapest. [DLB]
Roland, E. H., Poskitt, K., Rodriguez, E., Lupton, B. A. & Hill, A. (1998) Perinatal hypoxic-ischemic thalamic injury: Clinical features and neuroimagery. Annals of Neurology 44:161–66. [aHA]
Rosas, A., Martínez-Maza, C., Bastir, M., García-Tabernero, A., Lalueza-Fox, C., Huguet, R., Ortiz, J. E., Julià, R., Soler, V., de Torres, T., Martínez, E., Canaveras, J. C., Sánchez-Moral, S., Cuezva, S., Lario, J., Santamaría, D., de la Rasilla, M. & Fortea, J. (2006) Paleobiology and comparative morphology of a late Neandertal sample from El Sidrón, Asturias, Spain. Proceedings of the National Academy of Sciences USA 103:19266–71. [aHA]
Roseberry, S., Hirsh-Pasek, K. & Golinkoff, R. M. (2014) Skype me! Socially contingent interactions help toddlers learn language. Child Development 85(3):956–70. [MHB]

Ross, E. D. & Mesulam, M.-M. (1979) Dominant language functions of the right hemisphere? Archives of Neurology 36:144–48. [aHA]
Ross, E. D. & Monnot, M. (2008) Neurology of affective prosody and its functional-anatomic organization in right hemisphere. Brain and Language 104:51–74. [aHA]
Ross, M. D., Owren, M. J. & Zimmermann, E. (2010) The evolution of laughter in great apes and humans. Communicative and Integrative Biology 3:191–94. [DHR]
Rothermich, K., Schmidt-Kassow, M. & Kotz, S. A. (2012) Rhythm’s gonna get you: Regular meter facilitates semantic sentence processing. Neuropsychologia 50:232–44. [rHA]
Roush, R. S. & Snowdon, C. T. (1994) Ontogeny of food-associated calls in cotton-top tamarins. Animal Behaviour 47:263–73. [aHA]
Roush, R. S. & Snowdon, C. T. (1999) The effects of social status on food-associated calling behaviour in captive cotton-top tamarins. Animal Behaviour 58:1299–305. [aHA]
Rowe, M. & Goldin-Meadow, S. (2009) Early gesture selectively predicts later language learning. Developmental Science 12(1):182–87. [KJA]
Roy, S., Miller, C. T., Gottsch, D. & Wang, X. (2011) Vocal control by the common marmoset in the presence of interfering noise. Journal of Experimental Biology 214:3619–29. [aHA]
Rubens, A. B. (1975) Aphasia with infarction in the territory of the anterior cerebral artery. Cortex 11:239–50. [aHA]
Ruch, W. & Ekman, P. (2001) The expressive pattern of laughter. In: Emotion, qualia, and consciousness, ed. A. W. Kaszniak, pp. 426–43. World Scientific. [CM]
Rukstalis, M., Fite, J. E. & French, J. A. (2003) Social change affects vocal structure in a callitrichid primate (Callithrix kuhlii). Ethology 109:327–40. [aHA]
Russell, J. L., McIntyre, J. M., Hopkins, W. D. & Taglialatela, J. P. (2013) Vocal learning of a communicative signal in captive chimpanzees, Pan troglodytes. Brain and Language 127(3):520–25. [AM]
Ruzza, B., Rocca, F., Lenti Boero, D. & Lenti, C. (2003) Investigating the musical qualities of early infant sounds. In: The neurosciences and music, ed. G. Avanzini, C. Faienza, D. Minciacchi, L. Lopez & L. Majno, pp. 527–29. New York Academy of Sciences. [DLB]
Sacks, O. (1985) The president’s speech. In: The man who mistook his wife for a hat and other clinical tales, by O. Sacks, pp. 80–85. Simon & Schuster. [ACC]
Sakai, K., Kitaguchi, K. & Hikosaka, O. (2003) Chunking during human visuomotor sequence learning. Experimental Brain Research 152(2):229–42. doi: 10.1007/s00221-003-1548-8. [AZ]
Salvante, K. G., Racke, D. M., Campbell, C. R. & Sockman, K. W. (2010) Plasticity in singing effort and its relationship with monoamine metabolism in the songbird telencephalon. Developmental Neurobiology 70:41–57. [CMV]
Satoh, T., Nakai, S., Sato, T. & Kimura, M. (2003) Correlated coding of motivation and outcome of decision by dopamine neurons. Journal of Neuroscience 23:9913–23. [aHA]
Savage-Rumbaugh, S., Fields, W. M. & Spircu, T. (2004) The emergence of knapping and vocal expression embedded in a Pan/Homo culture. Biology and Philosophy 19:541–75. [aHA]
Sawamoto, N., Honda, M., Hanakawa, T., Aso, T., Inoue, M., Toyoda, H., Ishizu, K., Fukuyama, H. & Shibasaki, H. (2007) Cognitive slowing in Parkinson disease is accompanied by hypofunctioning of the striatum. Neurology 68:1062–68. [TH]
Sawamoto, N., Honda, M., Hanakawa, T., Fukuyama, H. & Shibasaki, H. (2002) Cognitive slowing in Parkinson’s disease: A behavioral evaluation independent of motor slowing. The Journal of Neuroscience 22:5198–203. [TH]
Schachner, A., Brady, T. F., Pepperberg, I. M. & Hauser, M. D. (2009) Spontaneous motor entrainment to music in multiple vocal mimicking species. Current Biology 19(10):831–36. doi: 10.1016/j.cub.2009.03.061. [HH]
Scharff, C. & Haesler, S. (2005) An evolutionary perspective on FoxP2: Strictly for the birds? Current Opinion in Neurobiology 15:694–703. [aHA]
Scharff, C. & Petri, J. (2011) Evo-devo, deep homology and FoxP2: Implications for the evolution of speech and language. Philosophical Transactions of the Royal Society, Series B 366:2124–40. [KJA]
Scheiner, E., Hammerschmidt, K., Jürgens, U. & Zwirner, P. (2006) Vocal expression of emotions in normally hearing and hearing-impaired infants. Journal of Voice 20(4):585–604. [DKO]
Schel, A. M., Townsend, S. W., Machanda, Z., Zuberbühler, K. & Slocombe, K. E. (2013) Chimpanzee alarm call production meets key criteria for intentionality. PLOS ONE 8:e76674. [ARL]
Scherer, K. R. (1986) Vocal affect expression: A review and a model for future research. Psychological Bulletin 99:143–65. [aHA]
Scherer, K. R., Johnstone, T. & Klasmeyer, G. (2009) Vocal expression of emotion. In: Handbook of affective sciences, ed. R. J. Davidson, K. R. Scherer & H. Hill Goldsmith, pp. 433–56. Oxford University Press. [aHA]
Schilbach, L., Timmermans, B., Reddy, V., Costall, A., Bente, G., Schlicht, T. & Vogeley, K. (2013) Toward a second-person neuroscience. Behavioral and Brain Sciences 36(4):393–462. [KJA]
Schmahmann, J. D. & Sherman, J. C. (1998) The cerebellar cognitive affective syndrome. Brain 121(Pt. 4):561–79. [UH, rHA]
Schmidt, L., Forgeot d’Arc, B., Lafargue, G., Galanaud, D., Czernecki, V., Grabli, D., Schübach, M., Hartmann, A., Lévy, R., Dubois, B. & Pessiglione, M. (2008) Disconnecting force from money: Effects of basal ganglia damage on incentive motivation. Brain: A Journal of Neurology 131(Pt. 5):1303–10. doi: 10.1093/brain/awn045. [AZ]
Schneider, S. M. (2012) The science of consequences. Prometheus. [ACC]
Schultz, W. (2006) Behavioral theories and the neurophysiology of reward. Annual Review of Psychology 57:87–115. [aHA]
Schultz, W. (2007) Behavioral dopamine signals. Trends in Neurosciences 30:203–10. [aHA]
Schultz, W. (2010) Dopamine signals for reward value and risk: Basic and recent data. Behavioral and Brain Functions 6:24. [aHA]
Schultz, W., Tremblay, L. & Hollerman, J. R. (2000) Reward processing in primate orbitofrontal cortex and basal ganglia. Cerebral Cortex 10(3):272–84. [CIP]
Schulz, G. M., Varga, M., Jeffires, K., Ludlow, C. L. & Braun, A. R. (2005) Functional neuroanatomy of human vocalization: An H₂¹⁵O PET study. Cerebral Cortex 15:1835–47. [aHA]
Schwab, E. D. & Pienta, K. J. (1997) Modeling signal transduction in normal and cancer cells using complex adaptive systems. Medical Hypotheses 48:111–23. [TAM]
Sebanz, N., Bekkering, H. & Knoblich, G. (2006) Joint action: Bodies and minds moving together. Trends in Cognitive Sciences 10:70–76. [GP]
Seeley, W. W. (2008) Selective functional, regional, and neuronal vulnerability in frontotemporal dementia. Current Opinion in Neurology 21:701–707. [aHA]
Selezneva, E., Deike, S., Knyazeva, S., Scheich, H., Brechmann, A. & Brosch, M. (2013) Rhythm sensitivity in macaque monkeys. Frontiers in Systems Neuroscience 7:49. doi: 10.3389/fnsys.2013.00049. [HH]
Seyfarth, R. M. (2005) Continuities in vocal communication argue against a gestural origin of language. Behavioral and Brain Sciences 28:144–45. [ARL]
Seyfarth, R. M. & Cheney, D. L. (1980) Monkey responses to three different alarm calls: Evidence of predator classification and semantic communication. Science 210:801–803. [DLB]
Seyfarth, R. M. & Cheney, D. L. (2003a) Meaning and emotion in animal vocalizations. Annals of the New York Academy of Sciences 1000:32–55. [ARL]
Seyfarth, R. M. & Cheney, D. L. (2003b) Signalers and receivers in animal communication. Annual Review of Psychology 54:145–73. [aHA]
Seyfarth, R. M. & Cheney, D. L. (2008) Primate social knowledge and the origins of language. Mind and Society 7:129–42. [ARL]
Seyfarth, R. M. & Cheney, D. L. (2010) Production, usage, and comprehension in animal vocalizations. Brain and Language 115(1):92–100. [KBC, ARL]
Seyfarth, R. M., Cheney, D. L. & Bergman, T. J. (2005) Primate social cognition and the origins of language. Trends in Cognitive Sciences 9:264–66. [ARL]
Seyfarth, R. M., Cheney, D. L. & Marler, P. (1980) Vervet monkey alarm calls: Semantic communication in a free-ranging primate. Animal Behaviour 28:1070–94. [aHA]
Sheinkopf, S. J., Iverson, J. M., Rinaldi, M. L. & Lester, B. M. (2012) Atypical cry acoustics in 6-month-old infants at risk for autism spectrum disorder. Autism Research 5(5):331–39. [MHB]
Sherwood, C. C. (2005) Comparative anatomy of the facial motor nucleus in mammals, with an analysis of neuron numbers in primates. The Anatomical Record, Part A: Discoveries in Molecular, Cellular, and Evolutionary Biology 287(1):1067–79. [aHA]
Sherwood, C. C., Broadfield, D. C., Holloway, R. L., Gannon, P. J. & Hof, P. R. (2003) Variability of Broca’s area homologue in African great apes: Implications for language evolution. The Anatomical Record, Part A: Discoveries in Molecular, Cellular, and Evolutionary Biology 271:276–85. [aHA]
Sherwood, C. C., Hof, P. R., Holloway, R. L., Semendeferi, K., Gannon, P. J., Frahm, H. D. & Zilles, K. (2005) Evolution of the brainstem orofacial motor system in primates: A comparative study of trigeminal, facial, and hypoglossal nuclei. Journal of Human Evolution 48:45–84. [aHA]
Shriberg, L. D., Aram, D. M. & Kwiatkowski, J. (1997) Developmental apraxia of speech: I. Descriptive and theoretical perspectives. Journal of Speech, Language, and Hearing Research 40:273–85. [aHA]
Shriberg, L. D., Ballard, K. J., Tomblin, J. B., Duffy, J. R., Odell, K. H. & Williams, C. A. (2006) Speech, prosody, and voice characteristics of a mother and daughter with a 7;13 translocation affecting FOXP2. Journal of Speech Language Hearing Research 49:500–25. [CMV]
Shu, W. G., Cho, J. Y., Jiang, Y. H., Zhang, M. H., Weisz, D., Elder, G. A., Schmeidler, J., De Gasperi, R., Gama Sosa, M. A., Rabidou, D., Santucci, A. C., Perl, D., Morrisey, E. & Buxbaum, J. D. (2005) Altered ultrasonic vocalization in mice with a disruption in the FoxP2 gene. Proceedings of the National Academy of Science USA 102:9643–48. [KJA]

Sidtis, J. J. & Van Lancker Sidtis, D. (2003) A neurobehavioral approach to dysprosody. Seminars in Speech and Language 24:93–105. [arHA, CMV]
Simard, F., Joanette, Y., Petrides, M., Jubault, T., Madjar, C. & Monchi, O. (2011) Fronto-striatal contributions to lexical set-shifting. Cerebral Cortex 21(5):1084–93. [PL]
Simon, S. A., de Araujo, I. E., Gutierrez, R. & Nicolelis, M. A. (2006) The neural mechanisms of gustation: A distributed processing code. Nature Reviews Neuroscience 7:890–901. [CMV]
Simonyan, K. & Jürgens, U. (2002) Cortico-cortical projections of the motorcortical larynx area in the rhesus monkey. Brain Research 949:23–31. [aHA]
Simonyan, K. & Jürgens, U. (2005) Afferent subcortical connections into the motor cortical larynx area in the rhesus monkey. Neuroscience 130:119–31. [aHA]
Sinnott, J. M., Stebbins, W. C. & Moody, D. B. (1975) Regulation of voice amplitude by the monkey. Journal of the Acoustical Society of America 58:412–14. [DJW]
Skinner, B. F. (1986) The evolution of verbal behavior. Journal of the Experimental Analysis of Behavior 45:115–22. [ACC]
Skodda, S. (2012) Effect of deep brain stimulation on speech performance in Parkinson’s Disease. Parkinson’s Disease 2012(Article No. 850596):1–10. Available at: http://dx.doi.org/10.1155/2012/850596 [CMV]
Skodda, S., Grönheit, W. & Schlegel, U. (2011) Intonation and speech rate in Parkinson’s disease: General and dynamic aspects and responsiveness to levodopa admission. Journal of Voice 25:199–205. [aHA]
Skodda, S., Rinsche, H. & Schlegel, U. (2009) Progression of dysprosody in Parkinson’s disease over time – a longitudinal study. Movement Disorders 24:716–22. [aHA]
Sliwa, J., Duhamel, J. R., Pascalis, O. & Wirth, S. (2011) Spontaneous voice-face identity matching by rhesus monkeys for familiar conspecifics and humans. Proceedings of the National Academy of Sciences USA 108:1735–40. [DYT]
Slocombe, K. E. & Zuberbühler, K. (2007) Chimpanzees modify recruitment screams as a function of audience composition. Proceedings of the National Academy of Sciences USA 104:17228–33. [ARL]
Smith, A. (2010) Development of neural control of orofacial movements for speech. In: The handbook of phonetic sciences, 2nd edition, ed. W. J. Hardcastle, J. Laver & F. E. Gibbon, pp. 251–96. Wiley-Blackwell. [aHA]
Smith, M. C., Smith, M. K. & Ellgring, H. (1996) Spontaneous and posed facial expression in Parkinson’s disease. Journal of the International Neuropsychological Society 2(5):383–91. [AZ]
Smith, W. K. (1945) The functional significance of the rostral cingulate cortex as revealed by its responses to electrical excitation. Journal of Neurophysiology 8:241–55. [aHA]
Snowdon, C. T. (2008) Contextually flexible communication in nonhuman primates. In: Evolution of communicative flexibility: Complexity, creativity, and adaptability in human and animal communication, ed. D. K. Oller & U. Griebel, pp. 71–91. MIT Press. [aHA]
Snowdon, C. T. & Elowson, A. M. (1999) Pygmy marmosets modify call structure when paired. Ethology 105:893–908. [aHA]
Solano, A., Roig, M., Vives-Bauza, C., Hernandez-Peña, J., Garcia-Arumi, E., Playan, A., Lopez-Perez, M. J., Andreu, A. L. & Montoya, J. (2003) Bilateral striatal necrosis associated with a novel mutation in the mitochondrial ND6 gene. Annals of Neurology 54:527–30. [aHA]
Spencer, K. A. & MacDougall-Shackleton, S. A. (2011) Indicators of development as sexually selected traits: The developmental stress hypothesis in context. Behavioral Ecology 22:1–9. doi: 10.1093/beheco/arq068. [BM]
Sperli, F., Spinelli, L., Pollo, C. & Seeck, M. (2006) Contralateral smile and laughter, but no mirth, induced by electrical stimulation of the cingulate cortex. Epilepsia 47:440–43. [aHA]
Spiteri, E., Konopka, G., Coppola, G., Bomar, J., Oldham, M., Ou, J., Vernes, S. C., Fisher, S. E., Ren, B. & Geschwind, D. H. (2007) Identification of the transcriptional targets of FOXP2, a gene linked to speech and language, in developing human brain. American Journal of Human Genetics 8:1144–57. [KJA]
Stark, R. E. (1980) Stages of speech development in the first year of life. In: Child phonology, vol. 1: Production, ed. G. Yeni-Komshian, J. Kavanagh & C. Ferguson, pp. 73–90. Academic Press. [DKO]
Stark, R. E., Bernstein, L. E. & Demorest, M. E. (1993) Vocal communication in the first 18 months of life. Journal of Speech and Hearing Research 36:548–58. [DKO]
Steinschneider, M., Nourski, K. V. & Fishman, Y. I. (2013) Representation of speech in human auditory cortex: Is it special? Hearing Research 305:57–73. [DHR]
Steinschneider, M., Volkov, I. O., Fishman, Y. I., Oya, H., Arezzo, J. C. & Howard, M. A., 3rd. (2005) Intracortical responses in human and monkey primary auditory cortex support a temporal processing mechanism for encoding of the voice onset time phonetic parameter. Cerebral Cortex 15(2):170–86. [DHR]
Sterelny, K. (2012) The evolved apprentice: How evolution made humans unique. MIT Press. [rHA]
Stivers, T., Enfield, N. J., Brown, P., Englert, C., Hayashi, M., Heinemann, T., Hoymann, G., Rossano, F., de Ruiter, J. P., Yoon, K. E. & Levinson, S. C. (2009) Universals and cultural variation in turn-taking in conversation. Proceedings of the National Academy of Sciences USA 106:10587–92. [DYT]
Stoeger, A. S., Mietchen, D., Oh, S., de Silva, S. & Herbst, C. T. (2012) An Asian elephant imitates human speech. Current Biology 22(22):2144–48. [BF]
Stoodley, C. J. & Schmahmann, J. D. (2010) Evidence for topographic organization in the cerebellum of motor control versus cognitive and affective processing. Cortex 46(7):831–44. doi: 10.1016/j.cortex.2009.11.008. [UH]
Stowe, M. K., Turlings, T. C., Loughrin, J. H., Lewis, W. J. & Tumlinson, J. H. (1995) The chemistry of eavesdropping, alarm, and deceit. Proceedings of the National Academy of Science USA 92(1):23–28. [KBC]
Striedter, G. F. (2005) Principles of brain evolution. Sinauer. [aHA]
Stringer, C. (2012) Lone survivors: How we came to be the only humans on earth. Times Books, Henry Holt. [aHA]
Struhsaker, T. T. (1967) Auditory communication among vervet monkeys (Cercopithecus aethiops). In: Social communication among primates, ed. S. A. Altmann, pp. 281–324. University of Chicago Press. [aHA]
Stuss, D. T. & Benson, D. F. (1986) The frontal lobes. Raven. [PL]
Sugiura, H. (1998) Matching of acoustic features during the vocal exchange of coo calls by Japanese macaques. Animal Behaviour 55:673–87. [aHA]
Surmeier, D. J., Day, M., Gertler, T., Chan, S. & Shen, W. (2010a) D1 and D2 dopamine receptor modulation of glutamatergic signaling in striatal medium spiny neurons. In: Handbook of basal ganglia structure and function, ed. H. Steiner & K. Y. Tseng, pp. 113–32. Elsevier. [aHA]
Surmeier, D. J., Day, M., Gertler, T. S., Chan, C. S. & Shen, W. (2010b) Dopaminergic modulation of striatal glutamatergic signaling in health and Parkinson’s disease. In: Dopamine handbook, ed. L. L. Iversen, S. D. Iversen, S. B. Dunnett & A. Björklund, pp. 349–67. Oxford University Press. [aHA]
Sutton, D., Larson, C. & Lindeman, R. C. (1974) Neocortical and limbic lesion effects on primate phonation. Brain Research 71:61–75. [aHA]
Sutton, D., Larson, C., Taylor, E. M. & Lindeman, R. C. (1973) Vocalization in rhesus monkeys: Conditionability. Brain Research 52:225–31. [aHA]
Sutton, D., Trachy, R. E. & Lindeman, R. C. (1981) Vocal and nonvocal discriminative performance in monkeys. Brain and Language 14:93–105. [aHA]
Sutton, D., Trachy, R. E. & Lindeman, R. C. (1985) Discriminative phonation in macaques: Effects of anterior mesial cortex damage. Experimental Brain Research 59:410–13. [aHA]
Syal, S. & Finlay, B. L. (2011) Thinking outside the cortex: Social motivation in the evolution and development of language. Developmental Science 14:417–30. [DYT]
Szalontai, A. & Csiszar, K. (2013) Genetic insights into the functional elements of language. Human Genetics 132:959–86. [KJA]
Taglialatela, J. P., Reamer, L., Schapiro, S. J. & Hopkins, W. D. (2012) Social learning of a communicative signal in captive chimpanzees. Biology Letters 8:498–50. [AM]
Taglialatela, J. P., Russell, J. L., Schaeffer, J. A. & Hopkins, W. D. (2008) Communicative signaling activates Broca’s homolog in chimpanzees. Current Biology 18(5):343–48. [AM]
Taglialatela, J. P., Russell, J. L., Schaeffer, J. A. & Hopkins, W. D. (2011) Chimpanzee vocal signaling points to a multimodal origin of human language. PLOS ONE 6:e18852. [AM]
Taglialatela, J. P., Savage-Rumbaugh, S. & Baker, L. A. (2003) Vocal production by a language-competent Pan paniscus. International Journal of Primatology 24:1–17. [aHA]
Takahashi, D. Y., Narayanan, D. Z. & Ghazanfar, A. A. (2013) Coupled oscillator dynamics of vocal turn-taking in monkeys. Current Biology 23:2162–68. [DYT]
Takikawa, Y., Kawagoe, R., Itoh, H., Nakahara, H. & Hikosaka, O. (2002) Modulation of saccadic eye movements by predicted reward outcome. Experimental Brain Research 142(2):284–91. doi: 10.1007/s00221-001-0928-1. [AZ]
Tallerman, M. & Gibson, K. R. (2012) The Oxford handbook of language evolution. Oxford University Press. (Oxford Handbooks in Linguistics Series). [aHA]
Talmage-Riggs, G., Winter, P., Ploog, D. & Mayer, W. (1972) Effect of deafening on the vocal behavior of the squirrel monkey (Saimiri sciureus). Folia Primatologica 17:404–20. [aHA]
Tamis-LeMonda, C. S., Bornstein, M. H., Baumwell, L. & Damast, A. M. (1996) Responsive parenting in the second year: Specific influences on children’s language and play. Early Development and Parenting 5:173–83. [MHB]
Taylor, J. (2009) Not a chimp: The hunt to find the genes that make us human. Oxford University Press. [aHA]
Teichmann, M., Dupoux, E., Kouider, S., Brugières, P., Boissé, M. F., Baudic, S., Cesaro, P., Peschanski, M. & Bachoud-Lévi, A. C. (2005) The role of the striatum in rule application: The model of Huntington’s disease at early stage. Brain 128:1155–67. [rHA]
Teichmann, M., Gaura, V., Demonet, J. F., Supiot, F., Delliaux, M., Verny, C., Renou, P., Remy, P. & Bachoud-Levi, A. C. (2008) Language processing within the striatum: Evidence from a PET correlation study in Huntington’s disease. Brain 131(4):1046–56. doi: 10.1093/brain/awn036. [AR]

Teinonen, T., Fellman, V., Näätänen, R., Alku, P. & Huotilainen, M. (2009) Statistical language learning in neonates revealed by event-related brain potentials. BMC Neuroscience 10:21. doi: 10.1186/1471-2202-10-21. [DLB]
Teki, S., Grube, M., Kumar, S. & Griffiths, T. D. (2011) Distinct neural substrates of duration-based and beat-based auditory timing. The Journal of Neuroscience 31(10):3805–12. doi: 10.1523/jneurosci.5561-10.2011. [HH]
ten Cate, C. & Okanoya, K. (2012) Revisiting the syntactic abilities of non-human animals: Natural vocalizations and artificial grammar learning. Philosophical Transactions of the Royal Society, B: Biological Sciences 367(1598):1984–94. [AR]
Teramitsu, I., Kudo, L. C., London, S. E., Geschwind, D. H. & White, S. A. (2004) Parallel FoxP1 and FoxP2 expression in human and songbird brain predicts functional interaction. Journal of Neuroscience 24:3152–63. [KJA]
Teramitsu, I., Poopatanapong, A., Torrisi, S. & White, S. A. (2010) Striatal FoxP2 is actively regulated during songbird sensorimotor learning. PLoS ONE 5:e8548. [aHA]
Terao, S., Li, M., Hashizume, Y., Osano, Y., Mitsuma, T. & Sobue, G. (1997) Upper motor neuron lesions in stroke patients do not induce anterograde transneuronal degeneration in spinal anterior horn cells. Stroke 28(12):2553–56. [PL]
Tervaniemi, M. & Huotilainen, M. (2003) The promises of change-related brain potentials in cognitive neuroscience of music. In: The neurosciences and music, ed. G. Avanzini, C. Faienza, D. Minciacchi, L. Lopez & L. Majno, pp. 29–39. New York Academy of Sciences. [DLB]
Tettamanti, M. & Weniger, D. (2006) Broca’s area: A supramodal hierarchical processor? Cortex 42(4):491–94. [AZ]
Thyagarajan, D., Shanske, S., Vazquez-Memije, M., DeVivo, D. & DiMauro, S. (1995) A novel mitochondrial ATPase 6 point mutation in familial bilateral striatal necrosis. Annals of Neurology 38:468–72. [aHA]
Tishkoff, S. A., Reed, F., Ranciaro, A., Voight, B. F., Babbitt, C. C., Silverman, J. S., Powell, K., Mortensen, H. M., Hirbo, J. B., Osman, M., Ibrahim, M., Omar, S. A., Lema, G., Nyambo, T. B., Ghori, J., Bumpstead, S., Pritchard, J. K., Wray, G. A. & Deloukas, P. (2007) Convergent adaptation of human lactase persistence in Africa and Europe. Nature Genetics 39(1):31–40. [PL]
Titze, I. R. (2008) Nonlinear source–filter coupling in phonation: Theory. Journal of the Acoustical Society of America 123(5):2733–49. [BdB]
Tokita, K., Karadi, Z., Shimura, T. & Yamamoto, T. (2004) Centrifugal inputs modulate taste aversion learning associated parabrachial neuronal activities. Journal of Neurophysiology 92:265–79. [CMV]
Toma, C., Hervásk, A., Torrico, B., Balmañnak, N., Salgado, M., Maristany, M., Vilella, E., Martínez-Leal, R., Planelles, M. I., Cuscóc, I., del Campo, M., Pérez-Jurado, L. A., Caballero-Andaluz, R., de Diego-Otero, Y., Pérez-Costillas, L., Ramos-Quiroga, J. A., Ribasés, M., Bayés, M. & Cormand, B. (2013) Analysis of two language-related genes in autism: A case–control association study of FOXP2 and CNTNAP2. 23(2):82–85. doi: 10.1097/YPG.0b013e32835d6fc6. [KJA]
Tomasello, M. (2008) Origins of human communication. MIT Press. [aHA, DYT]
Toscano, J. C., McMurray, B., Dennhardt, J. & Luck, S. J. (2010) Continuous perception and graded categorization: Electrophysiological evidence for a linear relationship between the acoustic signal and perceptual encoding of speech. Psychological Science 21:1532–40. [DHR]
Townsend, S. W., Deschner, T. & Zuberbühler, K. (2008) Female chimpanzees use copulation calls flexibly to prevent social competition. PLOS ONE 3:e2431. [ARL]
Trachy, R. E., Sutton, D. & Lindeman, R. C. (1981) Primate phonation: Anterior cingulate lesion effects on response rate and acoustical structure. American Journal of Primatology 1:43–55. [aHA]
Tremblay, P.-L., Bedard, M.-A., Langlois, D., Blanchet, P. J., Lemay, M. & Parent, M. (2010) Movement chunking during sequence learning is a dopamine-dependant process: A study conducted in Parkinson’s disease. Experimental Brain Research 205(3):375–85. doi: 10.1007/s00221-010-2372-6. [AZ]
Tremblay, P.-L., Bedard, M.-A., Levesque, M., Chebli, M., Parent, M., Courtemanche, R. & Blanchet, P. J. (2009) Motor sequence learning in primate: Role of the D2 receptor in movement chunking during consolidation. Behavioural Brain Research 198(1):231–39. doi: 10.1016/j.bbr.2008.11.002. [AZ]
Trevarthen, C. & Aitken, K. J. (2001) Infant intersubjectivity: Research, theory, and clinical applications. (Annual Research Review.) Journal of Child Psychology and Psychiatry 42(1):3–48. [KJA]
Turner, R. S. & Desmurget, M. (2010) Basal ganglia contributions to motor control: A vigorous tutor. Current Opinion in Neurobiology 20(6):704–16. doi: 10.1016/j.conb.2010.08.022. [AZ]
Turner, V. (1967) The forest of symbols: Aspects of Ndembu ritual. Cornell University Press. [aHA]
Tuttle, R. H. (2007) Apes, intelligent science, and conservation. In: Primate perspectives on behavior and cognition, ed. D. A. Washburn, pp. 17–28. American Psychological Association. [aHA]
Tye-Murray, N., Spencer, L. & Woodworth, G. G. (1995) Acquisition of speech by children who have prolonged cochlear implant experience. Journal of Speech and Hearing Research 38(2):327–37. [TH]
Ullman, M. T. (2001) A neurocognitive perspective on language: The declarative/procedural model. Nature Reviews Neuroscience 2:717–26. [rHA]
Ullman, M. T. (2004) Contributions of memory circuits to language: The declarative/procedural model. Cognition 92(1–2):231–70. [AR]
Ullman, M. T. (2006) Is Broca’s area part of a basal ganglia thalamocortical circuit? Cortex 42(4):480–85. doi: 10.1016/S0010-9452(08)70382-4. [AZ]
Ullman, M. T., Corkin, S., Coppola, M., Hickok, G., Growdon, J. H., Koroshetz, W. J. & Pinker, S. (1997) A neural dissociation within language: Evidence that the mental dictionary is part of declarative memory, and that grammatical rules are processed by the procedural system. Journal of Cognitive Neuroscience 9(2):266–76. [AR]
Ungerleider, L. G., Doyon, J. & Karni, A. (2002) Imaging brain plasticity during motor skill learning. Neurobiology of Learning and Memory 78:553–64. [aHA]
Usui, C., Inoue, Y., Kimura, M., Kirino, E., Nagaoka, S., Abe, M., Nagata, T. & Arai, H. (2004) Irreversible subcortical dementia following high altitude illness. High Altitude Medicine and Biology 5(1):77–81. [PL]
Van Hooff, J. A. R. A. M. (1962) Facial expressions of higher primates. Symposium of the Zoological Society, London 8:97–125. [DYT]
Van Lancker, D. & Cummings, J. L. (1999) Expletives: Neurolinguistic and neurobehavioral perspectives on swearing. Brain Research Reviews 31:83–104. [CM]
Van Lancker Sidtis, D., Pachana, N., Cummings, J. L. & Sidtis, J. J. (2006) Dysprosodic speech following basal ganglia insult: Toward a conceptual framework for the study of the cerebral representation of prosody. Brain and Language 97:135–53. [aHA, PBM, CMV]
van Schaik, C. P., Ancrenaz, M., Borgen, G., Galdikas, B., Knott, C. D., Singleton, I., Suzuki, A., Utami, S. S. & Merrill, M. (2003) Orangutan cultures and the evolution of material culture. Science 299:102–105. [aHA]
van Schaik, C. P. & Burkart, J. (2010) Mind the gap: Cooperative breeding and the evolution of our unique features. In: Mind gap, ed. P. M. Kappeler & J. Silk, pp. 477–96. Springer. [ARL]
van Schaik, C. P., van Noordwijk, M. A. & Wich, S. A. (2006) Innovation in wild Bornean orangutans (Pongo pygmaeus wurmbii). Behaviour 143:839–76. [aHA]
Vargha-Khadem, F., Gadian, D. G., Copp, A. & Mishkin, M. (2005) FOXP2 and the neuroanatomy of speech and language. Nature Reviews Neuroscience 6:131–38. [aHA]
Vargha-Khadem, F. & Passingham, R. (1990) Speech and language defects. Nature 346(6281):226. [aHA, KJA]
Vargha-Khadem, F., Watkins, K. E., Alcock, K. J., Fletcher, P. & Passingham, R. E. (1995) Praxic and nonverbal cognitive deficits in a large family with a genetically transmitted speech and language disorder. Proceedings of the National Academy of Sciences USA 92:930–33. [aHA]
Vargha-Khadem, F., Watkins, K. E., Price, C. J., Ashburner, J., Alcock, K. J., Connelly, A., Frackowiak, R. S. J., Friston, K. J., Pembrey, M. E., Mishkin, M., Gadian, D. G. & Passingham, R. E. (1998) Neural basis of an inherited speech and language disorder. Proceedings of the National Academy of Sciences USA 95:12695–700. [aHA]
Venuti, P., Caria, A., Esposito, G., De Pisapia, N., Bornstein, M. H. & de Falco, S. (2012) Differential brain responses to cries of infants with autistic disorder and typical development: An fMRI study. Research in Developmental Disabilities 33(6):2255–64. [MHB]
Verplanck, W. S. (2000) Glossary/thesaurus of behavioral terms (CD-ROM). CMS Software. [ACC]
Verwey, W. B. (1996) Buffer loading and chunking in sequential keypressing. Journal of Experimental Psychology: Human Perception and Performance 22(3):544–62. [AZ]
Verwey, W. B. (2001) Concatenating familiar movement sequences: The versatile cognitive processor. Acta Psychologica 106:69–95. [AZ]
Verwey, W. B. & Eikelboom, T. (2003) Evidence for lasting sequence segmentation in the discrete sequence-production task. Journal of Motor Behavior 35(2):171–81. doi: 10.1080/00222890309602131. [AZ]
Vicario, C. M. (2013a) FOXP2 gene and language development: The molecular substrate of the gestural-origin theory of speech? Frontiers in Behavioral Neuroscience 7:99. [CMV]
Vicario, C. M. (2013b) Inborn mechanisms of food preference and avoidance: The role of polymorphisms in neuromodulatory systems. Frontiers in Molecular Neuroscience 6:16. [CMV]
Vicario, C. M. (2013c) Uncovering the of reward and aversiveness. Frontiers in .6:41. [CMV]
Vihman, M. M. (1996) Phonological development: The origins of language in the child. Blackwell. [ACC]
Vogt, B. A. & Barbas, H. (1988) Structure and connections of the cingulate vocalization region in the rhesus monkey. In: The physiological control of mammalian vocalization, ed. J. D. Newman, pp. 203–25. Plenum Press. [aHA]
Volterra, V. & Erting, C. J., eds. (1990) From gesture to language in hearing and deaf children. Springer Series in Language and Communication 27:97–106. [KJA]


von Humboldt, W. (1836/1971) Über die Verschiedenheit des Menschlichen Sprachbaues und ihren Einfluss auf die geistige Entwicklung des Menschengeschlechts [Linguistic variability and intellectual development], trans. G. C. Buck & F. Raven. Royal Academy of Sciences/University of Miami Press. (Original work published in 1836; English translation from the German by Buck & Raven in 1971). [BM]
Voorn, P., Vanderschuren, L. J. M. J., Groenewegen, H. J., Robbins, T. W. & Pennartz, C. M. A. (2004) Putting a spin on the dorsal–ventral divide of the striatum. Trends in Neurosciences 27:468–74. [aHA]
Vorperian, H. K. & Kent, R. D. (2007) Vowel acoustic space development in children: A synthesis of acoustic and anatomic data. Journal of Speech Language and Hearing Research 50(6):1510–45. [DLB]
Vorperian, H. K., Wang, S., Chung, M. K., Schimek, E. M., Durtschi, R. B., Kent, R. D., Ziegert, A. J. & Gentry, L. R. (2009) Anatomic development of the oral and pharyngeal portions of the vocal tract: An imaging study. Journal of Acoustic Society of America 125:1666–78. [DLB]
Vouloumanos, A., Hauser, M. D., Werker, J. F. & Martin, A. (2010) The tuning of human neonates’ preference for speech. Child Development 81(2):517–27. [BF]
Wagner, M. (2010) Prosody and recursion in coordinate structures and beyond. Natural Language and Linguistic Theory 28(1):183–237. doi: 10.1007/s11049-009-9086-0. [AR]
Wagner, M. & Watson, D. G. (2010) Experimental and theoretical advances in prosody: A review. Language and Cognitive Processes 25(7–9):905–45. doi: 10.1080/01690961003589492. [AR]
Wallman, J. (1992) Aping language. Cambridge University Press. [aHA]
Walters, J. R. & Bergstrom, D. A. (2010) Synchronous activity in basal ganglia circuits. In: Handbook of basal ganglia structure and function, ed. H. Steiner & K. Y. Tseng, pp. 429–43. Elsevier. [aHA]
Watkins, K. E., Dronkers, N. F. & Vargha-Khadem, F. (2002a) Behavioural analysis of an inherited speech and language disorder: Comparison with acquired aphasia. Brain: A Journal of Neurology 125(Pt. 3):452–64. [aHA, AZ]
Watkins, K. E., Gadian, D. G. & Vargha-Khadem, F. (1999) Functional and structural brain abnormalities associated with a genetic disorder of speech and language. American Journal of Human Genetics 65:1215–21. [aHA]
Watkins, K. E., Vargha-Khadem, F., Ashburner, J., Passingham, R. E., Connelly, A., Friston, K. J., Frackowiak, R. S., Mishkin, M. & Gadian, D. G. (2002b) MRI analysis of an inherited speech and language disorder: Structural brain abnormalities. Brain 125(Pt. 3):465–78. [aHA, UH]
Watson, R. T., Fleet, W. S., Gonzalez-Rothi, L. & Heilman, K. M. (1986) Apraxia and the supplementary motor area. Archives of Neurology 43:787–92. [aHA]
Wattendorf, E., Westermann, B., Fiedler, K., Kaza, E., Lotze, M. & Celio, M. R. (2013) Exploration of the neural correlates of ticklish laughter by functional magnetic resonance imaging. Cerebral Cortex 23(6):1280–89. doi: 10.1093/cercor/bhs094. [SF, CM]
Waxman, S. R. & Gelman, S. A. (2009) Early word-learning entails reference, not merely associations. Trends in Cognitive Sciences 13(6):258–63. [BF]
Weaver, T. D., Roseman, C. C. & Stringer, C. B. (2008) Close correspondence between quantitative- and molecular-genetic divergence times for Neandertals and modern humans. Proceedings of the National Academy of Sciences USA 105:4645–49. [aHA]
Weiller, C., Willmes, K., Reiche, W., Thron, A., Isensee, C., Buell, U. & Ringelstein, E. B. (1993) The case of aphasia or neglect after striatocapsular infarction. Brain 116:1509–25. [rHA]
Weismer, G. (1980) Control of the voicing distinction for intervocalic stops and fricatives: Some data and theoretical considerations. Journal of Phonetics 8:427–38. [aHA]
Weiss, D. J. & Hauser, M. D. (2002) Perception of harmonics in the combination long call of cotton-top tamarins (Saguinus oedipus). Animal Behaviour 64:415–26. [DJW]
Weiss, D. J., Garibaldi, B. T. & Hauser, M. D. (2001) The production and perception of long calls by cotton-top tamarins (Saguinus oedipus): Acoustic analyses and playback experiments. Journal of Comparative Psychology 15(3):258–71. [DJW]
West, R. A. & Larson, C. R. (1995) Neurons of the anterior mesial cortex related to faciovocal activity in the awake monkey. Journal of Neurophysiology 74:1856–69. [aHA]
Whitham, J., Gerald, M. & Maestripieri, D. (2007) Intended receivers and functional significance of grunt and gurney vocalizations in free-ranging rhesus macaques. Ethology 113:862–74. [DHR]
Whitty, C. W. M. (1955) Effects of anterior cingulectomy in man. Proceedings of the Royal Society of Medicine 48:463–69. [aHA]
Wich, S. A. & de Vries, H. (2006) Male monkeys remember which group members have given alarm calls. Proceedings of the Royal Society of London, Series B: Biological Sciences 273:735–40. [aHA]
Wich, S. A., Krützen, M., Lameira, A. R., Nater, A., Arora, N., Bastian, M. L., Meulman, E., Morrogh-Bernard, H. C., Atmoko, S. S. U., Pamungkas, J., Perwitasari-Farajallah, D., Hardus, M. E., van Noordwijk, M. & van Schaik, C. P. (2012) Call cultures in orang-utans? PLOS ONE 7:e36180. [ARL]
Wich, S. A., Swartz, K., Hardus, M. E., Lameira, A. R., Stromberg, E. & Shumaker, R. (2009) A case of spontaneous acquisition of a human sound by an orangutan. Primates 50:56–64. [aHA, ARL]
Wichmann, T. & DeLong, M. R. (2007) Epidemiology of Parkinson’s disease. In: Parkinson’s disease and related disorders, Part 1, ed. W. C. Koller & E. Melamed, pp. 3–18. (Handbook of Clinical Neurology, vol. 83, 3rd series). Elsevier Press. [aHA]
Wickens, J. R., Horvitz, J. C., Costa, R. M. & Killcross, S. (2007) Dopaminergic mechanisms in actions and habits. Journal of Neuroscience 27:8181–83. [aHA]
Wiener, M., Turkeltaub, P. & Coslett, H. H. (2010) The image of time: A voxel-wise meta-analysis. NeuroImage 49:1728–40. [HH]
Wild, B., Rodden, F. A., Grodd, W. & Ruch, W. (2003) Neural correlates of laughter and humour. Brain 126:2121–38. [aHA]
Wild, J. M. (1993) Descending projections of the songbird nucleus robustus, archistriatalis. Journal of Comparative Neurology 338:225–41. [BM]
Wild, J. M. (1997) Neural pathways for the control of birdsong production. Journal of Neurobiology 33:653–70. [BM, CIP]
Wild, J. M. (2008) Birdsong: Anatomical foundations and central mechanisms of sensorimotor integration. In: Neuroscience of birdsong, ed. H. P. Zeigler & P. Marler, pp. 136–51. Cambridge University Press. [aHA]
Wildgruber, D., Ackermann, H., Kreifelts, B. & Ethofer, T. (2006) Cerebral processing of linguistic and emotional prosody: fMRI studies. In: Understanding emotions, ed. S. Anders, G. Ende, M. Junghofer, J. Kissler & D. Wildgruber, pp. 249–68. (Series: Progress in Brain Research, vol. 156). Elsevier. [aHA, UH]
Willuhn, I. & Steiner, H. (2008) Motor-skill learning in a novel running-wheel task is dependent on D1 dopamine receptors in the striatum. Neuroscience 153:249–58. [aHA]
Wilson, B., Slater, H., Kikuchi, Y., Milne, A. E., Marslen-Wilson, W., Smith, K. & Petkov, C. I. (2013) Auditory artificial-grammar learning in macaque and marmoset monkeys. Journal of Neuroscience 33(48):18825–35. Open Access publication. PMC3841451. [CIP]
Wilson, M. & Wilson, T. P. (2005) An oscillator model of the timing of turn-taking. Psychonomic Bulletin and Review 12:957–68. [GAB, rHA]
Wiltermuth, S. S. & Heath, C. (2009) Synchrony and cooperation. Psychological Science 20:1–5. [GAB]
Winkler, I., Háden, G. P., Ladinig, O., Sziller, I. & Honing, H. (2009) Newborn infants detect the beat in music. Proceedings of the National Academy of Sciences USA 106(7):2468–71. doi: 10.1073/pnas.0809035106. [HH]
Winkler, I., Kushnerenko, E., Horvath, J., Ceponiené, R., Fellman, V., Huotilainen, M., Näätänen, R. & Sussman, E. (2003) Newborn infants can organize the auditory world. Proceedings of the National Academy of Sciences USA 100(2):11812–15. [DLB]
Winter, P., Handley, P., Ploog, D. & Schott, D. (1973) Ontogeny of squirrel monkey calls under normal conditions and under acoustic isolation. Behaviour 47:230–39. [aHA, DHR]
Winter, P., Ploog, D. & Latta, J. (1966) Vocal repertoire of the squirrel monkey (Saimiri sciureus), its analysis and significance. Experimental Brain Research 1:359–84. [aHA]
Wittforth, M., Schröder, C., Schardt, D. M., Dengler, R., Heinze, H. J. & Kotz, S. A. (2010) On emotional conflict: Interference resolution of happy and angry prosody reveals valence-specific effects. Cerebral Cortex 20(2):383–92. [KBC, rHA]
Wu, T. & Hallett, M. (2005) A functional MRI study of automatic movements in patients with Parkinson’s disease. Brain 128:2250–59. [aHA]
Wu, T., Kansaku, K. & Hallett, M. (2004) How self-initiated memorized movements become automatic: A functional MRI study. Journal of Neurophysiology 91:1690–98. [aHA]
Wymbs, N. F., Bassett, D. S., Mucha, P. J., Porter, M. A. & Grafton, S. T. (2012) Differential recruitment of the sensorimotor putamen and frontoparietal cortex during motor chunking in humans. Neuron 74(5):936–46. doi: 10.1016/j.neuron.2012.03.038. [AZ]
Yale, M. E., Messinger, D. S., Cobo-Lewis, A. B., Oller, D. K. & Eilers, R. E. (1999) An event-based analysis of the coordination of early infant vocalizations and facial actions. Developmental Psychology 35(2):505–13. [DKO]
Yildiz, I. B., von Kriegstein, K. & Kiebel, S. J. (2013) From birdsong to human speech recognition: Bayesian inference on a hierarchy of nonlinear dynamical systems. PLOS Computational Biology 9:e1003219. [GP]
Yin, H. H., Mulcare, S. P., Hilário, M. R. F., Clouse, E., Holloway, T., Davis, M. I., Hansson, A. C., Lovinger, D. M. & Costa, R. M. (2009) Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nature Neuroscience 12:333–41. [aHA]
Yin, J., Ma, J., Zhang, S. & Metzner, W. (2008) FoxP2 expression in the brain of echolocating and non-echolocating bats and its possible role in vocalization. Paper presented at the 38th Annual Meeting of the Society for Neuroscience, Washington, DC, 2008. Program No. 796.13. [KJA]
Yotova, V., Lefebvre, J.-F., Moreau, C., Gbeha, E., Hovhannesyan, K., Bourgeois, S., Bédarida, S., Azevedo, L., Amorim, A., Sarkisian, T., Avogbe, P., Chabi, N., Dicko, M. H., Kou’Santa Amouzou, E. S., Sanni, A., Roberts-Thomson, J., Boettcher, B., Scott, R. J. & Labuda, D. (2011) An X-linked haplotype of Neandertal origin is present among all non-African populations. Molecular Biology and Evolution 28:1957–62. [SJ]
Zarco, W., Merchant, H., Prado, L. & Mendez, J. C. (2009) Subsecond timing in primates: Comparison of interval production between human subjects and rhesus monkeys. Journal of Neurophysiology 102(6):3191–202. doi: 10.1152/jn.00066.2009. [HH]
Zhang, J., Webb, D. M. & Podlaha, O. (2002) Accelerated protein evolution and origins of human-specific features: FOXP2 as an example. Genetics 162:1825–35. [aHA]
Zhang, S. P., Bandler, R. & Davis, P. J. (1995) Brain stem integration of vocalization: Role of the nucleus retroambigualis. Journal of Neurophysiology 74:2500–12. [PBM]
Zhang, S. P., Davis, P. J., Bandler, R. & Carrive, P. (1994) Brain stem integration of vocalization: Role of the midbrain periaqueductal gray. Journal of Neurophysiology 72:1337–56. [aHA]
Ziegler, W. (2008) Apraxia of speech. In: Neuropsychology and behavioral neurology, ed. G. Goldenberg & B. L. Miller, pp. 269–85. (Handbook of clinical neurology, vol. 88, 3rd series). Elsevier Press. [aHA]
Ziegler, W. (2010) Apraxic failure and the hierarchical structure of speech motor plans: A nonlinear probabilistic model. In: Assessment of motor speech disorders, ed. A. Lowit & R. D. Kent, pp. 305–23. Plural Publishing. [aHA]
Ziegler, W., Aichert, I. & Staiger, A. (2012) Apraxia of speech: Concepts and controversies. Journal of Speech, Language, and Hearing Research 55:S1485–501. [aHA, DHR]
Ziegler, W., Kilian, B. & Deger, K. (1997) The role of the left mesial frontal cortex in fluent speech: Evidence from a case of left supplementary motor area hemorrhage. Neuropsychologia 35:1197–208. [aHA]
Zuberbühler, K. (2000a) Causal cognition in a nonhuman primate: Field playback experiments with Diana monkeys. Cognition 76(3):195–207. [aHA, KBC]
Zuberbühler, K. (2000b) Referential labelling in Diana monkeys. Animal Behaviour 59(5):917–27. [DLB]
Zuberbühler, K., Cheney, D. L. & Seyfarth, R. M. (1999) Conceptual semantics in a nonhuman primate. Journal of Comparative Psychology 113:33–42. [aHA, KBC]
Zuberbühler, K. & Jenny, D. (2007) Interaction between leopard and monkeys. In: Monkeys of the Taï Forest: An African primate community, ed. W. S. McGraw, K. Zuberbühler & R. Noe, pp. 133–54. Cambridge University Press. [aHA]
