Reading Fluent Speech from Talking Faces: Typical Brain Networks and Individual Differences

Deborah A. Hall, Clayton Fussell, and A. Quentin Summerfield
MRC Institute of Hearing Research, Nottingham, UK

© 2005 Massachusetts Institute of Technology. Journal of Cognitive Neuroscience 17:6, pp. 939–953

Abstract

Listeners are able to extract important linguistic information by viewing the talker's face, a process known as "speechreading." Previous studies of speechreading presented small closed sets of simple words, and their results indicate that visual speech processing engages a wide network of brain regions in the temporal, frontal, and parietal lobes that are likely to underlie multiple stages of the receptive language system. The present study further explored this network in a large group of subjects by presenting naturally spoken sentences which tap the richer complexities of visual speech processing. Four different baselines (blank screen, static face, nonlinguistic facial gurning, and auditory speech) enabled us to determine the hierarchy of neural processing involved in speechreading and to test the claim that visual input reliably accesses sound-based representations in the auditory cortex. In contrast to passively viewing a blank screen, the static-face condition evoked activation bilaterally across the border of the fusiform gyrus and cerebellum, and in the medial superior frontal gyrus and left precentral gyrus (p < .05, whole-brain corrected). With the static face as baseline, the gurning face evoked bilateral activation in the motion-sensitive region of the occipital cortex, whereas visual speech additionally engaged the middle temporal gyrus, inferior and middle frontal gyri, and the inferior parietal lobe, particularly in the left hemisphere. These latter regions are implicated in lexical stages of spoken language processing. Although auditory speech generated extensive bilateral activation across both superior and middle temporal gyri, the group-averaged pattern of speechreading activation failed to include any auditory regions along the superior temporal gyrus, suggesting that fluent visual speech does not always involve sound-based coding of the visual input. An important finding from the individual subject analyses was that activation in the superior temporal gyrus did reach significance (p < .001, small-volume corrected) for a subset of the group. Moreover, the extent of the left-sided superior temporal gyrus activity was strongly correlated with speechreading performance. Skilled speechreading was also associated with activations and deactivations in other brain regions, suggesting that individual differences reflect the efficiency of a circuit linking sensory, perceptual, memory, cognitive, and linguistic processes rather than the operation of a single component process.

INTRODUCTION

Speechreading is the ability to understand a talker by viewing the movements of their lips, teeth, tongue, and jaw. Visual speech therefore conveys specific linguistic information that is separate from the analysis of facial features and of nonmeaningful movements of the face (Figure 1). In isolation, speechreading is rarely perfect because some phonetic distinctions between consonants do not have visible articulatory correlates and fluent speech contains coarticulation between words. Nevertheless, linguistic facial movements exert an automatic influence on the perception of heard speech, famously demonstrated by the fusion illusion in which, for example, the concurrent presentation of a heard syllable [ba] and a seen utterance [ga] generates the subjective impression of hearing [da] (McGurk & MacDonald, 1976). In normal listening situations, visual information facilitates perceptual accuracy by providing supplementary information about the spatial location of the talker and about the segmental elements of the speech acoustic waveform, as well as complementary information about consonant place of articulation that is most susceptible to acoustic distortion (Summerfield, 1987). Thus, linguistic facial movements are particularly informative for speech perception when the auditory signal is degraded by noise (Sumby & Pollack, 1954), reverberation, or the distortions introduced by a sensorineural hearing impairment (Jeffers & Barley, 1971). Macleod and Summerfield (1987, 1990) demonstrated that the comprehension benefit at poor acoustic signal-to-noise ratios from seeing the talker's face was closely related to the success with which those subjects were able to perform a comparable silent speechreading test (r = .9, p < .01). Therefore, silent speechreading and audiovisual speech perception are likely to share common processes of visual analysis. As a consequence, tests of speechreading have often been used to probe the patterns of brain activation that support audiovisual speech processing. The results of these neuroimaging studies are reviewed in the following section.
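The r reported here is presumably the Pearson product-moment correlation computed over subjects; on that reading, with x_i denoting subject i's audiovisual comprehension benefit and y_i the same subject's silent speechreading score (this variable assignment is a gloss on the cited result, not notation from the original),

    r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}

so r = .9 implies that roughly 81% (r squared) of the between-subject variance in audiovisual benefit was shared with silent speechreading performance.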
Figure 1. Freeze-frame clips of one of the talkers (AQS) used in the present study. Both the gurning face and visual speech clips were extracted from a 3.9-sec recording of dynamic facial articulations.

Neural Underpinnings of Speechreading Simple Word Lists

The neural basis of speechreading has been measured by presenting subjects with simple word lists such as the numbers 1 to 10 or other closed sets of high-frequency words. One of the first studies to elucidate some of the brain areas specifically involved in speechreading presented five normally hearing individuals with auditory speech, visual speech, pseudo-speech, closed-mouth gurning, static face, and silent baseline conditions (Calvert, Bullmore, et al., 1997). Relative to the static face, visual speech engaged the extrastriate visual cortex implicated in the detection of visual motion and the superior temporal gyrus traditionally viewed as the unimodal auditory cortex. Parts of the auditory cortex were claimed to be involved in encoding at a prelexical, phonological stage of visual speech processing because they were also engaged by pseudo-speech, but not by the gurning face. At the time of publication, fMRI methodology was at an early stage of development: The imaging view was limited to a portion of the brain, the activation maps were those of a simple composite group average, and the functional localization was approximate. Nevertheless, the finding that visual speech elicits superior temporal gyrus activation has been replicated many times since (Calvert & Campbell, 2003; Bernstein et al., 2002; MacSweeney et al., 2002; Campbell et al., 2001; Calvert, Campbell, & Brammer, 2000), and even in the absence of scanner acoustic noise (Paulesu et al., 2003; MacSweeney et al., 2000). Whole-brain scanning has revealed that visual speech generates robust activation in the inferior frontal gyrus (BA 44 and 45), occipito-temporal junction (BA 37), middle temporal gyrus (BA 21), and superior temporal sulcus, but generally with greater activation on the left than the right side. These regions are implicated in traditional speech-processing circuitry and can be engaged even when syllables, not words or sentences, are presented (Calvert & Campbell, 2003). Speechreading activation typically also includes the precentral gyrus (BA 6) and parietal lobe (BA 7 and 40) in both hemispheres. It is proposed that the processes underlying perception and action might share a common representational framework such that premotor, parietal, and inferior frontal brain regions could support the perception of meaningful actions (Decety & Grèzes, 1999). Indeed, observing speech-related lip movements has been shown to enhance the excitability of the primary motor units underlying speech production, particularly those in the left hemisphere (Watkins, Strafella, & Paus, 2003).

Individual Differences in Speechreading Ability

Closed sets of high-frequency monosyllabic or bisyllabic words are relatively easy to speechread, and so previous neuroimaging studies tend to report performance levels that are close to ceiling. However, for fluent speech, large individual differences in speechreading ability exist in both normally hearing and hearing-impaired populations. Fluent visual speech contains many ambiguities because some phonetic distinctions do not have visible articulatory correlates and there is coarticulation across word boundaries. Speechreading ability can be measured by scoring the percentage of content words correctly reported in short sentences (Macleod & Summerfield, 1990). Scores on such tests often range from less than 10% correct to over 70% correct among any group screened to have normal vision and similar hearing status and age. For 20 normally hearing adults, Macleod and Summerfield (1987) reported a spread in performance across the bottom 50% range of scores on a subset of the British-English sentences developed by Bamford and Wilson (1979). For 68 deaf children aged 9 to 19 years, Heider and Heider (1940) reported a spread of scores from 11% to 93% on a story-like sentence comprehension test.
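To make the scoring procedure concrete, here is a minimal Python sketch of percentage-correct scoring (a reader's illustration, not code or scoring rules from the studies cited above; the exact-match criterion and the example word lists are assumptions):

    def score_speechreading(targets, responses):
        """Percent of content words correctly reported across sentences.

        targets   -- per-sentence lists of the content words that are scored
        responses -- the subject's verbatim report for each sentence
        """
        total = 0
        correct = 0
        for content_words, response in zip(targets, responses):
            reported = set(response.lower().split())
            total += len(content_words)
            # Credit each scored content word that appears in the report.
            correct += sum(word.lower() in reported for word in content_words)
        return 100.0 * correct / total if total else 0.0

    # Two sentences with three scored content words each.
    targets = [["clown", "funny", "face"], ["boy", "ran", "home"]]
    responses = ["the clown had a funny hat", "a boy ran away"]
    print(score_speechreading(targets, responses))  # 66.66... (4 of 6 words)

Under such a scheme, partial reports still earn credit word by word, which is what yields the graded percentage scores described above.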
Because the intersubject variability in speechreading is much greater than in auditory speech perception (Macleod & Summerfield,