Facial Expressions and the Evolution of the Speech Rhythm

Asif A. Ghazanfar and Daniel Y. Takahashi

Abstract

In primates, different vocalizations are produced, at least in part, by making different facial expressions. Not surprisingly, humans, apes, and monkeys all recognize the correspondence between vocalizations and the facial postures associated with them. However, one major dissimilarity between monkey vocalizations and human speech is that, in the latter, the acoustic output and associated movements of the mouth are both rhythmic (in the 3- to 8-Hz range) and tightly correlated, whereas monkey vocalizations have a similar acoustic rhythmicity but lack the concomitant rhythmic facial motion. This raises the question of how we evolved from a presumptive ancestral acoustic-only vocal rhythm to the one that is audiovisual with improved perceptual sensitivity. According to one hypothesis, this bisensory speech rhythm evolved through the rhythmic facial expressions of ancestral primates. If this hypothesis has any validity, we expect that extant nonhuman primates produce at least some facial expressions with a speech-like rhythm in the 3- to 8-Hz frequency range. Lip smacking, an affiliative signal observed in many genera of primates, satisfies this criterion. We review a series of studies using developmental, x-ray cineradiographic, EMG, and perceptual approaches with macaque monkeys producing lip smacks to further investigate this hypothesis. We then explore its putative neural basis and remark on important differences between lip smacking and speech production. Overall, the data support the hypothesis that lip smacking may have been an ancestral expression that was linked to vocal output to produce the original rhythmic audiovisual speech-like utterances in the human lineage.
INTRODUCTION

Both speech and nonhuman primate vocalizations are produced by the coordinated movements of the lungs, larynx (vocal folds), and the supralaryngeal vocal tract (Ghazanfar & Rendall, 2008; Fitch & Hauser, 1995). The vocal tract consists of the pharynx, mouth, and nasal cavity through which a column of air is produced. The shape of this column determines its resonance properties and thus determines which frequency bands of the sound produced at the laryngeal source are emphasized or suppressed. During vocal production, the shape of the vocal tract can be changed by moving the various effectors of the face (including the lips, jaw, and tongue) into different positions. The different shapes, along with changes in vocal fold tension and respiratory power, are what give rise to different-sounding vocalizations. To put it simply: Different vocalizations (including different speech sounds) are produced in part by making different facial expressions.

Vocal tract motion not only changes the acoustics of vocalizations by changing their resonance frequencies but also results in predictable deformations of the face around the mouth and in other facial regions (Yehia, Kuratate, & Vatikiotis-Bateson, 2002; Yehia, Rubin, & Vatikiotis-Bateson, 1998; Hauser & Ybarra, 1994; Hauser, Evans, & Marler, 1993). Different macaque monkey (Macaca spp.) vocalizations are produced with unique lip configurations and mandibular positions, and the motion of such articulators influences the acoustics of the signal (Hauser & Ybarra, 1994; Hauser et al., 1993). For example, coo calls, like /u/ in speech, are produced with the lips protruded, whereas screams, like the /i/ in speech, are produced with the lips retracted (Figure 1). Facial motion cues used by humans for speech reading are present during primate vocal production as well. The fact that different vocalizations are produced through different facial expressions and are therefore inherently "multisensory" is typically ignored by theories regarding the evolution of speech/language that focus solely on laryngeal control by the neocortex (or the lack thereof; Arbib, 2005; Jarvis, 2004).

Naturally, any vertebrate organism (from fishes and frogs to birds and dogs) that produces vocalizations will have a simple, concomitant visible motion in the area of the mouth. However, in the primate lineage, both the number and diversity of muscles innervating the face (Burrows, Waller, & Parr, 2009; Huber, 1930a, 1930b) and the amount of neural control related to facial movement (Sherwood, 2005; Sherwood et al., 2005; Sherwood, Holloway, Erwin, & Hof, 2004; Sherwood, Holloway, Erwin, Schleicher, et al., 2004) increased over the course of evolution relative to other taxa. This increase in the number of muscles allowed the production of a greater diversity of facial and vocal expressions in primates (Andrew, 1962).

The inextricable link between vocal output and facial expressions allows many nonhuman primates to recognize the correspondence between the visual and auditory components of vocal signals. Macaque monkeys (Macaca mulatta), capuchins (Cebus apella), and chimpanzees (Pan troglodytes) all recognize auditory–visual correspondences between their various vocalizations (Evans, Howell, & Westergaard, 2005; Izumi & Kojima, 2004; Parr, 2004; Ghazanfar & Logothetis, 2003). For example, macaque monkeys, without training or reward, match individual identity and expression types across modalities (Sliwa, Duhamel, Pascalis, & Wirth, 2011; Ghazanfar & Logothetis, 2003), segregate competing voices in noisy conditions using vision (Jordan, Brannon, Logothetis, & Ghazanfar, 2005), and use formant frequencies to estimate the body size of conspecifics (Ghazanfar et al., 2007). More recently, monkeys trained to detect vocalizations in noise demonstrated that seeing concomitant facial motion sped up their RTs in a manner identical to that of humans detecting speech sounds (Chandrasekaran, Lemus, Trubanova, Gondan, & Ghazanfar, 2011).

There are also some very important differences in how humans versus nonhuman primates produce their utterances (Ghazanfar & Rendall, 2008), and these differences further enhance human multisensory communication above and beyond what monkeys can do. One universal feature of speech—typically lacking in monkey vocalizations—is its bisensory rhythm. That is, when humans speak, both the acoustic output and the movements of the mouth are highly rhythmic and tightly correlated with each other (Chandrasekaran, Trubanova, Stillittano, Caplier, & Ghazanfar, 2009).

Figure 1. Different facial expressions are produced concomitantly with different vocalizations. Rhesus monkey coo and scream calls. Video frames extracted at the midpoint of the expressions with their corresponding spectrograms. X axis depicts time in seconds; y axis depicts frequency in kHz.

Princeton University. © 2014 Massachusetts Institute of Technology. Journal of Cognitive Neuroscience, 26:6, pp. 1196–1207. doi:10.1162/jocn_a_00575.

THE HUMAN SPEECH RHYTHM VERSUS MONKEY VOCALIZATIONS

Across all languages studied to date, speech typically exhibits a 3- to 8-Hz rhythm that is, for the most part, related to the rate of syllable production (Chandrasekaran et al., 2009; Greenberg, Carvey, Hitchcock, & Chang, 2003; Crystal & House, 1982; Malecot, Johonson, & Kizziar, 1972; Figure 2A). Both mouth motion and the acoustic envelope of speech are rhythmic. This 3- to 8-Hz rhythm is critical to speech perception. Disrupting the acoustic component of this rhythm significantly reduces intelligibility (Elliot & Theunissen, 2009; Ghitza & Greenberg, 2009; Smith, Delgutte, & Oxenham, 2002; Saberi & Perrott, 1999; Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995; Drullman, Festen, & Plomp, 1994), as does disrupting the visual components arising from facial movements (Vitkovitch & Barber, 1996). Thus, the speech rhythm parses the signal into basic units from which information on a finer (faster) temporal scale can be extracted (Ghitza, 2011). Given the importance of this rhythm in speech and its underlying neurophysiology (Ghazanfar & Poeppel, in press; Giraud & Poeppel, 2012), understanding how speech evolved requires investigating the origins of its bisensory rhythmic structure.

Oddly enough, macaque monkey vocalizations have a similar acoustic rhythmicity but without the concomitant and temporally correlated rhythmic facial motion (Figure 2B). Macaque vocalizations are typically produced with a single ballistic facial motion. Modulation spectra analyses of the acoustic rhythmicity of macaque monkey vocalizations longer than 400 msec reveal that their rhythmicity is strikingly similar to that of the acoustic envelope for speech (Figure 3A). Both signals fall within the 3- to 8-Hz range (see also Cohen, Theunissen, Russ, & Gill, 2007, for shared low-frequency components of macaque monkey calls and speech). Moreover, examination of a single call category (Figure 3B, top) or an exemplar (Figure 3B, bottom) shows that this rhythmicity is not the result of averaging across call-type categories or averaging within a single call category.

Thus, one key evolutionary question is: How did we evolve from a presumptive ancestral unisensory, acoustic-only vocal rhythm (Figure 2B) to the one that is
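The envelope rhythm discussed above can be measured with a simple modulation-spectrum computation: extract the amplitude envelope of the sound, then examine the spectrum of that envelope. The sketch below is an illustrative reconstruction in Python, not the authors' actual analysis code, and it uses a synthetic amplitude-modulated tone in place of a recorded call or utterance.

```python
import numpy as np
from scipy.signal import hilbert

def modulation_spectrum(sound, fs):
    """Spectrum of the amplitude envelope: the envelope is the magnitude
    of the analytic signal (Hilbert transform); its FFT shows at which
    slow rates (e.g., 3-8 Hz for speech) the sound waxes and wanes."""
    envelope = np.abs(hilbert(sound))
    envelope = envelope - envelope.mean()  # drop DC so the 0-Hz bin doesn't dominate
    spectrum = np.abs(np.fft.rfft(envelope))
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    return freqs, spectrum

# Synthetic 1-s "utterance": a 1-kHz carrier amplitude-modulated at 5 Hz,
# i.e., a syllable-like rhythm inside the 3- to 8-Hz range.
fs = 16000
t = np.arange(0, 1.0, 1.0 / fs)
sound = (1.0 + 0.8 * np.sin(2 * np.pi * 5 * t)) * np.sin(2 * np.pi * 1000 * t)

freqs, spectrum = modulation_spectrum(sound, fs)
peak_hz = freqs[np.argmax(spectrum)]
print(f"dominant modulation rate: {peak_hz:.1f} Hz")  # 5.0 Hz, within 3-8 Hz
```

On real recordings one would band-limit and window the envelope first; the point here is only that a roughly 5-Hz mouth-opening rhythm shows up directly as a 5-Hz peak in the envelope spectrum.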

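The Introduction's point that the shape of the air column sets which frequency bands are emphasized can be illustrated with the standard first approximation from acoustics: a uniform tube closed at the glottis and open at the lips resonates at odd quarter-wavelength frequencies, F_n = (2n - 1)c / 4L. This textbook model, and the roughly 17.5-cm tract length used below, are illustrative conventions and not figures from the article.

```python
def tube_formants(length_m, n_formants=3, c=350.0):
    """Resonances (Hz) of a uniform tube closed at one end (glottis)
    and open at the other (lips): F_n = (2n - 1) * c / (4 * L).
    c is taken as ~350 m/s, the speed of sound in warm, humid air."""
    return [(2 * n - 1) * c / (4 * length_m) for n in range(1, n_formants + 1)]

# A conventional adult vocal tract length of ~17.5 cm yields the classic
# neutral-vowel (schwa-like) formant pattern.
formants = tube_formants(0.175)
print([round(f) for f in formants])  # → [500, 1500, 2500]
```

Protruding the lips (as in a coo or /u/) effectively lengthens the tube and lowers these resonances, while retracting them (as in a scream or /i/) does the opposite, which is the sense in which articulator motion "influences the acoustics of the signal."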