
Perception & Psychophysics 2000, 62 (2), 233-252

Speech perception without hearing

LYNNE E. BERNSTEIN House Ear Institute, Los Angeles, California

MARILYN E. DEMOREST University of Maryland, Baltimore County, Catonsville, Maryland and PAULA E. TUCKER House Ear Institute, Los Angeles, California

In this study of visual phonetic perception without accompanying auditory speech stimuli, adults with normal hearing (NH; n = 96) and with severely to profoundly impaired hearing (IH; n = 72) identified consonant-vowel (CV) nonsense syllables and words in isolation and in sentences. The measures of phonetic perception were the proportion of phonemes correct and the proportion of transmitted feature information for CVs, the proportion of phonemes correct for words, and the proportion of phonemes correct and the amount of substitution entropy for sentences. The results demonstrated greater sensitivity to phonetic information in the IH group. Transmitted feature information was related to isolated word scores for the IH group, but not for the NH group. Phoneme errors in sentences were more systematic in the IH than in the NH group. Individual differences in phonetic perception for CVs were more highly associated with word and sentence performance for the IH than for the NH group. The results suggest that the necessity to perceive speech without hearing can be associated with enhanced visual phonetic perception in some individuals.

Viewing a talker is generally thought to afford extremely impoverished phonetic information, even though visual speech stimuli are known to influence heard speech (see, e.g., Dekle, Fowler, & Funnell, 1992; Green & Kuhl, 1989; MacLeod & Summerfield, 1990; McGurk & MacDonald, 1976; Middleweerd & Plomp, 1987; Rosenblum & Fowler, 1991; Sekiyama & Tohkura, 1991; Sumby & Pollack, 1954; Summerfield, 1987; Summerfield & McGrath, 1984). The literature on lipreading (speechreading) reports that visual speech stimuli are so impoverished that segmental distinctiveness is highly reduced (Fisher, 1968; Kuhl & Meltzoff, 1988; Massaro, 1998; Owens & Blazek, 1985), that accuracy for lipreading sentences rarely exceeds 10%-30% words correct (Ronnberg, 1995; Ronnberg, Samuelsson, & Lyxell, 1998), that relying on visual speech out of necessity, as a consequence of deafness, does not result in any substantial experientially based improvement in visual speech perception (Clouser, 1977; Conrad, 1977; Green, 1998; Lyxell & Ronnberg, 1991a, 1991b; Massaro, 1987; Mogford, 1987; Ronnberg, 1995; Ronnberg, Ohngren, & Nilsson, 1982; Summerfield, 1987; cf. Pelson & Prather, 1974; Tillberg, Ronnberg, Svard, & Ahlner, 1996), and that, to achieve even moderate accuracy, lipreaders must rely on top-down psycholinguistic processes, nonlinguistic contexts, and strategic processes, such as guessing (Jeffers & Barley, 1971; Lyxell & Ronnberg, 1987; Ronnberg, 1995).

The present study questions this characterization: in particular, whether high levels of visual phonetic perception are impossible and whether examples of enhanced lipreading accuracy are not, in fact, associated with hearing impairment. Our skepticism about the characterizations in the literature arose in the course of studies on vibrotactile devices to aid lipreading, in which adults with congenital or early-onset profound hearing impairment were employed (Bernstein, Coulter, O'Connell, Eberhardt, & Demorest, 1993; Bernstein, Demorest, Coulter, & O'Connell, 1991). In formal testing and informal communication, these individuals demonstrated unexpectedly accurate lipreading, without the vibrotactile devices. In our research on lipreading in hearing adults (Demorest & Bernstein, 1992; Demorest, Bernstein, & DeHaven, 1996), the best lipreaders never outperformed the proficient deaf lipreaders in our vibrotactile aid studies. Scores for the most accurate deaf¹ participants in the unaided conditions were approximately 80% words correct in sentences (Bernstein et al., 1993; Bernstein et al., 1991).

Author note: This study was funded by a grant from the NIH National Institute on Deafness and Other Communication Disorders (DC02107). The authors thank Edward T. Auer, Jr., Lawrence Rosenblum, and Robin Waldstein for their comments on the manuscript and Brian Chaney for his help with the transmitted information analyses. Collection of the data for this study took place while the first author was a Senior Research Scientist at the Gallaudet Research Institute, Gallaudet University, Washington, DC. Correspondence concerning this article should be addressed to L. E. Bernstein, Department of Communication Neuroscience, House Ear Institute, 2100 West Third Street, Los Angeles, California 90057 (e-mail: [email protected]). Accepted by previous editor, Myron L. Braunstein.

Copyright 2000 Psychonomic Society, Inc.

The Impoverished Visual Phonetic Speech Stimulus

In a review of audiovisual speech perception, Kuhl and Meltzoff (1988) commented, "A relatively small proportion of the information in speech is visually available" (p. 240). Phoneme identification rates for nonsense syllable stimuli, depending on phonetic context, have been reported to be below 50% correct (e.g., rates range between 21% and 43% in Auer, Bernstein, Waldstein, & Tucker, 1997, and between 19% and 46% in Owens & Blazek, 1985; see, also, Benguerel & Pichora-Fuller, 1982; Fisher, 1968; Lesner & Kricos, 1981; Walden, Erdman, Montgomery, Schwartz, & Prosek, 1981; Wozniak & Jackson, 1979). The percentage for the correct identification of vowels in /h/V/g/ stimuli (where V = vowel) by adults with normal hearing, in a study by Montgomery and Jackson (1983), varied between 42% and 59% (see, also, Montgomery, Walden, & Prosek, 1987). However, Auer et al. (1997) reported 75% correct vowel identification for 19 vowels, including r-colored vowels, across four phonetic contexts.

The term viseme was coined to refer to mutually confused phonemes that are deemed to form a single perceptual unit (Fisher, 1968; Massaro, 1998). When visemes are defined via cluster analyses by within-cluster response rates of 75% or greater, as was done by Walden, Prosek, Montgomery, Scherr, and Jones (1977), consonants have been found to fall into approximately six viseme clusters (i.e., /f v/, /θ ð/, /w r/, /p b m/, /ʃ ʒ tʃ dʒ/, /t d s z j k n g l/). Whether phonemes within viseme groups are discriminable or not has not been examined as a general question. Nevertheless, that the viseme is a perceptual unit has been accepted in the literature. For example, Massaro (1998) asserts, "Because of the limited data available in visible speech as compared to audible speech, many phonemes are virtually indistinguishable by sight ... and so are expected to be easily confused. To eliminate these nonserious confusions from consideration, we group visually indistinguishable [emphasis added] phonemes into categories called visemes" (p. 394).

The viseme is a plausible perceptual unit, from the perspective of conventional accounts of articulatory phonetics: Many of the speech-related activities of the vocal tract are occluded from view by the lips, cheeks, and neck. Vocal fold vibration, for example, which is responsible for the fundamental frequency and which contributes to phonological distinctions between voiced and unvoiced consonants (e.g., /g/ versus /k/, /b/ versus /p/, and /z/ versus /s/; Lisker & Abramson, 1964), is completely invisible.² The type of vocal tract closure made by the tongue, which contributes to phonological manner distinctions (Catford, 1977), is only partially visible. For example, the acoustic distinction between /s/ and /t/ is due, in part, to the continuous maintenance of an air channel in the former case and the achievement of a brief closure in the latter, both at approximately the same tongue position in the vocal tract. It is difficult to see whether the tongue is completely blocking the air channel or closely approximating a blockage. Place of articulation of the tongue is also only partially visible. For example, tongue positions responsible for distinctions between consonants with dental versus velar constrictions (e.g., /t/ versus /k/) are difficult to view. Also, the state of the velum, which is responsible for nasality, is completely invisible. Thus, a major articulatory contributor to distinctions such as /m/ versus /b p/ is invisible in syllable-initial position (cf. Scheinberg, 1988).

The classical articulatory phonetic characterization does not, however, preclude other possible sources of visual phonetic information, such as the kinematics of the jaw, the cheeks, and the mouth. Indeed, Vatikiotis-Bateson, Munhall, Kasahara, Garcia, and Yehia (1996) and Yehia, Rubin, and Vatikiotis-Bateson (1997) have shown surprisingly high quantitative associations between external facial motions and speech acoustics. They reported, for example, that 77% of the variance in acoustic line spectrum parameters could be accounted for with orofacial movement data from the lips, jaw, and cheek. Consistent with the potential for observing visual phonetic detail, Bernstein, Iverson, and Auer (1997) showed that words predicted on the basis of viseme analyses to be visually indiscriminable were identified, on average, at approximately 75% correct (chance = 50%) in a two-interval forced-choice procedure. Some participants scored 100% correct. Thus, it is likely that the classical articulatory account of optical phonetic effects is not adequate and that other sources of phonetic information are potentially available to lipreaders.

Low Accuracy for Visual Perception of Words

There are many reports of inaccurate lipreading of connected speech (e.g., Breeuwer & Plomp, 1986; Demorest & Bernstein, 1992; Rabinowitz, Eddington, Delhorne, & Cuneo, 1992; Ronnberg, 1995; Ronnberg et al., 1998). Breeuwer and Plomp measured lipreading in a group of Dutch adults with normal hearing, using short sentences. The percentage of correct syllables was 10.6% on the first presentation of the sentences and 16.7% on the second. Demorest and Bernstein (1992) reported on 104 college students with normal hearing, who attempted to lipread words in sentences. Overall, the students identified 20.8% of the words. However, individual scores ranged between 0% and 45% words correct. Rabinowitz et al. (1992) studied lipreading in 20 adults with profound hearing impairments acquired after language acquisition. These adults had all received cochlear implants (devices that deliver direct electrical stimulation to the auditory nerves) and were tested on identification of words in sentences by vision alone in one of the conditions of the study. Two sets of materials were administered, differing in sentence length, vocabulary, and semantic properties. Performance was 18% words correct for the more difficult materials, with a range of scores from 0% to 45%, and a mean of 44% for the easier materials, with a range of 0% to 75%. Although the means cited here support the generalization that lipreading is highly inaccurate, the ranges suggest that large, potentially functionally significant individual differences can occur.

Individual Differences

Individual differences in lipreading are not generally thought to be due to explicit training, although in the classical lipreading literature (Jeffers & Barley, 1971), it was assumed that training on speech patterns was required in order to master lipreading. That is, training was considered necessary to express a certain inborn potential. But because some people are apparently unable to benefit from training over years of experience (see, e.g., Heider & Heider, 1940), it is said that good lipreaders are born and not made.

That lipreading is an inborn capability is supported by reports that people whose hearing impairments are severe to profound are no more accurate in perceiving speech by eye than are people with normal hearing (Conrad, 1977; Lyxell & Ronnberg, 1989; Ronnberg, 1995; Summerfield, 1991). Perceivers with normal hearing are thought to be more accurate than perceivers with hearing impairments: "That auditory experience of speech enhances visual speech recognition is shown by the fact that the hearing are more competent at lip-reading [sic] than the deaf" (Mogford, 1987, p. 191). Given that visible speech influences hearing perceivers (see, e.g., Dekle et al., 1992; Green & Kuhl, 1989; MacLeod & Summerfield, 1990; Massaro, 1987; McGurk & MacDonald, 1976; Middleweerd & Plomp, 1987; Rosenblum & Fowler, 1991; Sekiyama & Tohkura, 1991; Sumby & Pollack, 1954; Summerfield, 1991; Summerfield & McGrath, 1984), if it afforded adequate phonetic information for speech perception, one might expect to observe its adequacy and, possibly, even its enhancement of performance among those deaf people forced to rely on it (J. L. Miller, 1991; Summerfield, 1991). Recently, Ronnberg (1995) reviewed the literature concerning this question and concluded that there was no evidence for enhanced performance in relation to auditory experience. Our experience and observations suggested that evidence could be obtained for enhancement associated with hearing impairment.

The Present Study

The emphasis in the present study was on phonetic perception. This contrasts with recent studies of lipreading by Ronnberg and colleagues (e.g., Lyxell & Ronnberg, 1987, 1992; Ronnberg, 1995), who sought to explain individual differences in lipreaders in terms of top-down psycholinguistic and strategic processes. According to their view, the visual stimulus affords so little phonetic information that performance can only be optimized at a later stage than perceptual processing (Ronnberg, 1995). The present study shows that individual and group differences do occur for visual phonetic perception.

Two sizable groups of participants were recruited, one group with normal hearing and the other with severe to profound hearing impairments. The majority of the students with hearing impairment at Gallaudet University (where part of the study was undertaken) had experienced their hearing impairments at birth or at a very early age. Hearing impairment at a young age is typically associated with lower English language and reading proficiency (Schildroth & Karchmer, 1986). Thus, if our observations showed enhanced visual speech perception among these young adults, relative to others with normal hearing, it was not likely that it was due to enhanced higher level psycholinguistic capabilities. Furthermore, evidence has been presented supporting visual perceptual specialization for manual communication, demonstrated in the deaf population having American Sign Language as a first language (e.g., Neville, 1995). Perceptual enhancement for visible speech might be predicted under similar conditions of auditory deprivation.

The present study was designed to sample lipreading across three levels of linguistic complexity in materials: consonant-vowel (CV) nonsense syllables, isolated monosyllabic words, and sentences. The experimental methods were closed-set identification of phonemes in CV nonsense syllables and open-set identification of isolated words and sentences. To our knowledge, there is not another large data base with information about visual speech perception at each of these levels across groups of adults who differed in terms of their perceptual experience (impaired vs. normal hearing). Basic descriptive adequacy seemed to us an important, yet apparently missing, basis for current investigations of speech perception involving visual as well as audiovisual conditions.

There are many methods available to study phonetic perception. The approach taken here was to obtain identification judgments and then apply various analytic tools to the obtained data. The identification of phonemes in CV syllables was considered primarily a phonetic perception task, although the task is susceptible to postperceptual response biases (see, e.g., Walden, Montgomery, Prosek, & Schwartz, 1980) and there is evidence that phoneme identification engages lexical processes (e.g., Newman, Sawusch, & Luce, 1997). The CV identification responses were scored in terms of proportion of phonemes correct and proportion of transmitted phonological feature information.

The information analyses employed sequential transmitted information (TI) analysis (SINFA software; Wang, 1976; Wang & Bilger, 1973), which removes the correlations among response patterns associated with features evaluated in the order of their importance. Importance is initially determined by calculating feature information nonsequentially and comparing estimates across features. Feature specifications were from Chomsky and Halle (1968; see the Appendix). In the now-classical literature on features in speech perception (see Wang & Bilger, 1973, for a review), a primary question was the psychological reality of particular feature sets. Feature analysis was not intended here to affirm the psychological reality of features, but to quantify structure in the stimulus-response confusion matrices in terms of structural relationships among phonemes (Keating, 1988). Typically, there is not a one-to-one relationship between phonetic attributes and phonological features. But feature structure and its quantification provide evidence that phonetic cues are present in the stimuli, and this is particularly useful in evaluating responses across entire confusion matrices, particularly when responses are errorful.
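To make the TI measure concrete, the following is a minimal sketch of the nonsequential first step of such an analysis: the proportion of transmitted information for a single phonological feature, computed from a pooled stimulus-response confusion matrix. This is our illustration, not the SINFA program itself; the function names and the toy counts are hypothetical.

```python
import math
from collections import defaultdict

def feature_ti_proportion(confusions, feature_value):
    """Proportion of transmitted information for one feature.

    confusions: dict mapping (stimulus_phoneme, response_phoneme) -> count.
    feature_value: dict mapping phoneme -> 0/1 value of the feature
                   (e.g., a Chomsky-Halle [round] specification).
    Returns T(X;Y) / H(X), where X and Y are the stimulus and response
    phonemes recoded as feature values.
    """
    joint = defaultdict(float)
    total = sum(confusions.values())
    for (stim, resp), n in confusions.items():
        joint[(feature_value[stim], feature_value[resp])] += n / total

    px = defaultdict(float)  # marginal over stimulus feature values
    py = defaultdict(float)  # marginal over response feature values
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p

    # Mutual (transmitted) information in bits.
    ti = sum(p * math.log2(p / (px[x] * py[y]))
             for (x, y), p in joint.items() if p > 0)
    # Stimulus feature entropy in bits.
    hx = -sum(p * math.log2(p) for p in px.values() if p > 0)
    return ti / hx if hx > 0 else 0.0

# Toy example: /p/ and /w/ differ on [round]; responses partly confused.
counts = {('p', 'p'): 8, ('p', 'w'): 2, ('w', 'w'): 7, ('w', 'p'): 3}
is_round = {'p': 0, 'w': 1}
print(round(feature_ti_proportion(counts, is_round), 3))
```

The sequential step then re-estimates each remaining feature's TI conditional on the features already extracted, which is what removes the correlated response structure described above.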

Phonetic perception per se is rarely studied with word or sentence stimuli. The scoring of responses in terms of phonemes requires a methodology by which to align stimulus and response strings that can have insertions, deletions, and substitutions. For the scoring here, we employed sequence comparison software (Bernstein, Demorest, & Eberhardt, 1994) in order to align phonemically transcribed stimuli with their respective transcribed responses. The proportions of phonemes correct in isolated words and in sentences were derived from the alignment process, as was phoneme substitution uncertainty (SU), measured in bits (for a set of the sentence stimuli). The uncertainty measure employed only the phoneme error substitutions. Although lexical and other higher level processes necessarily affect the phonetic perception measures obtained with sentences, we hypothesized that error patterns could potentially provide insight into phonetic perception in sentential contexts.
METHOD

Participants

Participants with impaired hearing. Participants with impaired hearing (IH) were screened for the following characteristics: (1) an age of between 18 and 45 years; (2) enrollment at Gallaudet University; (3) sensorineural hearing impairments greater than a 60-dB HL average in the better ear across the frequencies 500, 1000, and 2000 Hz; (4) no self-report of disability other than hearing impairment, and university records reporting no impairments other than hearing impairment; (5) the self-reported use of spoken English as the primary language of the participant's family; (6) a self-report of English (including a manually coded form) as the participant's native language; (7) education in a mainstream and/or oral program for 8 or more years; and (8) vision of at least 20/30 in each eye, as determined with a standard Snellen chart.

The selection criteria were developed in order to recruit participants for whom lipreading was a socially important and well-practiced skill and to exclude participants whose native language was American Sign Language or another manual communication system other than English. Criterion (7) excluded participants who had been educated in residential schools, in which manual communication is frequently the primary mode of communication. English reading and writing abilities were estimated from the scores on the Gallaudet University English Placement Test (EPT). No normative statistics were available for this test, but it was the only a priori assessment common to every student in the population we sampled. The participants were not screened for age of onset of hearing impairment, because there was no a priori reason to do so.

Each participant completed an extensive questionnaire that addressed family and personal history of educational background, hearing impairment, and language use. Recent audiological evaluations were obtained for each participant. Students received free audiometric services, and if records were inadequate, the participant obtained a new evaluation.

A total of 157 individuals responded to advertisements for the experiment. One hundred twenty-four were considered potential participants, after the initial contact. Of these, 84 met the criteria outlined above, and 80 were accepted for testing. Eight IH participants were eventually excluded from the sample owing to technical problems during testing or the lack of an EPT score on file. The age range of the remaining 72 participants was 18-41 years, and 25 participants were male. The participants were paid $10 for approximately 2 h of testing.

Figure 1 is a scatterplot of the hearing levels, in better pure tone average (dB HL) across the two ears, and age of hearing impairment onset for 68 IH participants. Four participants not shown in the figure, because they became hearing impaired much later, were 7 years (115-dB HL better pure tone average), 9 years (112-dB HL better pure tone average), 17 years (140-dB HL better pure tone average), and 27 years (140-dB HL better pure tone average) of age. The majority of the participants (71%) had profound hearing impairments (90-dB HL or greater, bilaterally). Forty-five of the participants (62.5%) had hearing impairment by 6 months of age. Twenty-three (31.9%) had onsets of between 7 and 48 months. The majority experienced hearing impairment at birth (65%) or subsequently, during the period of normal language acquisition earlier than 36 months (23%). The reported causes of hearing impairment were unknown (30), meningitis (11), maternal rubella (11), other (6), genetic (5), premature birth (4), high fever (3), scarlet fever (1), and diabetic pregnancy (1). The EPT results and the questionnaire responses for these participants are reported in Bernstein et al. (1998).

Participants with normal hearing. One hundred and four participants with normal hearing (NH) were recruited at the University of Maryland, Baltimore County. They reported that hearing and vision were normal; English was identified as a first language. During the course of the experiment, data from 8 participants were excluded owing to equipment malfunction, failure to meet selection criteria, or unwillingness to complete the entire protocol. The participants ranged in age between 18 and 45 years. None of the participants reported any training in lipreading. Twenty-eight were male. They received course credit and/or $10 for their participation.

Materials

All the stimulus materials were high-quality videodisc recordings (Bernstein & Eberhardt, 1986a, 1986b) of a male and a female talker who spoke General American English. Talkers were recorded so that their faces filled the screen area. (See Demorest & Bernstein, 1992, for a description of the recordings.)

Nonsense syllables. CV nonsense syllables, spoken by both the male and female talkers, were composed of two tokens of each of 22 initial consonants combined with the vowel /a/, plus two tokens of the vowel /a/ alone. The consonants were /p b m f v ʃ tʃ w r t θ ð d s z k g n l h dʒ ʒ/.

Words. Monosyllabic words, spoken by the male talker only, were from the clinical version (Kreul, Nixon, Kryter, Bell, & Lamb, 1968) of the Modified Rhyme Test (MRT; House, Williams, Hecker, & Kryter, 1965). The MRT comprises 50 six-word ensembles. Two words were selected from each of the ensembles, to yield 100 different words.

Sentences. Fifty sentences from the Bernstein and Eberhardt (1986a) recordings of the lists of CID Everyday Sentences (Davis & Silverman, 1970) were selected, 25 spoken by each of the talkers. An additional 25 for each talker were selected from Corpus III and Corpus IV of Bernstein and Eberhardt (1986b), a large sentence database created for the study of visual speech perception. These are referred to here as B-E sentences. Previous findings with these sentence materials (Demorest & Bernstein, 1992) indicated that visual perception of the female talker's speech is generally less accurate than of the male talker's.

Figure 1. Distributions of the hearing levels, in better pure-tone average (dB HL) across the two ears, and age of hearing impairment onset for the 68 participants with impaired hearing.

In order to raise sentence scores above floor level, a set of the female talker's most intelligible sentences was selected on the basis of previous mean results. No subject saw the same sentence spoken by both talkers.

Procedures

General procedures. The availability of larger numbers of qualified NH participants afforded the possibility of studying possible practice effects within analyses of generalizability (Demorest et al., 1996). The NH participants were tested with two sets of materials, with the intertest interval ranging from 5 to 12 days. The same procedure was followed on each day, and set order was counterbalanced across participants. The materials selected for the present study made use of one set of materials only, and comparisons across IH and NH groups therefore included data from the 2nd day of testing for half of the NH participants. On the basis of the testing of the NH participants, there was no reason to suspect that results would vary significantly were both sets or only the alternate set presented.

Testing procedures were essentially the same at the University of Maryland, Baltimore County, and at Gallaudet University. The participant was seated at a small table in a darkened, sound-attenuating room. A videodisc player (Sony Lasermax LDP 1550) was controlled by a personal computer (PC). The PC was used to instruct the participant prior to testing, to present stimuli, and to record responses. A small lamp illuminated the computer keyboard. The stimuli were presented on a 19-in. high-resolution color monitor (Sony Trinitron PVM 1910), placed at a distance of 2 m from the participant. The rate of stimulus presentation was controlled by the participant, who pressed a key to see the first stimulus and pressed the return key following each subsequent stimulus and response. After a brief pause, the first frame of the stimulus was presented for 2 sec; then the remaining frames were played in real time. The final frame remained on the screen until the participant's response was completed. The monitor was darkened briefly between stimuli. The three types of materials were presented in counterbalanced order. Test sessions took from 1.5 to 2 h to complete.

Nonsense syllables. The participants were tested with two 92-item lists of the CV syllables, one for each talker. Each list consisted of two repetitions of the 44 CV tokens and two repetitions of the 2 /a/ tokens. Item order was pseudorandomized at presentation time, and talker order was counterbalanced across participants. Selected keys of the PC keyboard were labeled with 23 one- or two-character phonemic codes that the participant pressed to respond. The participants could not edit their responses. Prior to testing, phonemic codes were explained to the participant, and clarification was provided, if necessary. The instructions for these and the other materials were given verbally and in print for NH participants. For IH participants, verbal instructions were given in simultaneous (sign and speech) communication mode.³ A cue card that paired each response code with an example word was visible for reference throughout the test session. The participants received a 5-item practice list.

Words. The 100 words were tested in pseudorandomized order. Prior to testing, the participants were informed that each word comprised a single syllable and that each word would be presented once. The participants were instructed to type what they thought the talker had said and were given as long as necessary to respond. Editing of the response was permitted, as well as blank responses. The participants received a five-item practice list.

Sentences. The participants viewed 25 CID sentences and 25 B-E sentences for each talker. Talker order and the order of B-E versus CID sentences were counterbalanced across NH participants. The participants were told that they would see a series of unrelated sentences and were instructed to type exactly what they thought the talker had said. Partial responses, including word fragments, were encouraged, but the participants were instructed to keep all portions of their response in chronological order. They were also instructed to use Standard English spelling and to use contractions only when they thought that contractions were what the talker had said. The participants received a three-sentence practice set. The participants could delete and retype responses if they felt it necessary to do so. Following each response, the participants gave a confidence rating on a scale ranging from 0 to 7.

These ratings were collected to study in detail the relationship between subjective performance ratings and objective performance. Demorest and Bernstein (1997) showed that the two participant groups differed significantly in their subjective ratings (subjective ratings were higher among deaf participants), but an analysis of validity coefficients for the ratings showed that both groups made valid estimates of their own lipreading performance. All of those analyses included the nonresponse data. An investigation of individual differences in using subjective ratings showed that 93.0% of the validity coefficients for hearing participants and 96.4% of the validity coefficients for deaf participants were significant. Therefore, nearly all the participants made valid ratings of their own performance. These results suggest that group performance differences reported below cannot simply be attributed to response criterion differences between the two groups.

Measures

Nonsense syllables. The proportion of phonemes correct in CV syllables was the proportion of correct responses across the entire stimulus set for each of the talkers. The proportion of transmitted phonological feature information was the independent proportion of TI for the features in the Appendix, as obtained with SINFA software (Wang, 1976; Wang & Bilger, 1973).

Words. Performance on each word was scored twice. The total number of entire words correct was tallied, and a proportion of words correct was obtained. The number of phonemes correct in each word was also counted, and a proportion of phonemes correct per word was obtained. The means of these proportions were obtained for each participant.

Several steps were performed to obtain these measures. First, an experimenter checked the response files for words that were misspelled or that had obvious typographical errors (e.g., rian for rain). Corrections were made only when there was no ambiguity concerning the intended response. Because the next step was to submit response files to a computer program that counted words correct and that was sensitive to homophone distinctions, responses that contained the incorrect homophone were edited (e.g., piece was replaced with peace).

The response files were also phonemically transcribed (using a text-to-speech system) and then submitted to a sequence comparator program, which performed a phoneme-to-phoneme alignment of the stimulus with the response phoneme strings. A description and validation experiment on the sequence comparator and the overall method were described in detail in Bernstein et al. (1994). Sequence comparison techniques take into account the possibility that element-by-element alignment of two strings can require substitutions, insertions, and deletions (Sankoff & Kruskal, 1983). That is, highly similar but not identical elements can be aligned with each other, and one string may have more or fewer elements than the other. A distance metric and minimization algorithm were used to obtain the least total estimated visual perceptual distance between the stimulus and the response phoneme strings. The estimated distances between phonemes were obtained via multidimensional scaling of phoneme confusion matrices obtained in visible speech nonsense syllable identification experiments (see Bernstein et al., 1994).⁴ The output of the sequence comparator included several different measures. The measure employed here was simply the number of correct phonemes per word divided by the number of phonemes in the stimulus word. Word scores were averaged across the stimulus set for each participant.

Sentences. The sentence response data were processed in a manner analogous to that for the isolated words: Files were checked for obvious spelling errors, and a computer program counted words correct per sentence. The response files were transcribed and submitted to the sequence comparator, to determine the number of phonemes correct per sentence. The phonemes correct scores per sentence were normalized by dividing each by the number of phonemes in the stimulus sentence. In addition, the phoneme-to-phoneme alignments obtained for the CID sentences were submitted to further analysis. For example, in the following alignment for the stimulus "proof read your final results" and the response "blue fish are funny":

Stimulus:  p r u f r i d j u r f aɪ n l r ɪ z ʌ l t s
Response:  b l u f - ɪ ʃ ɑ - r f ʌ  n i - - - - - - -

there are five correct phonemes and six incorrect phoneme substitutions. Extraction of incorrect substitutions was performed for each of the stimulus phonemes in the CID sentences for the male and female talkers and the two participant groups. An information measure, SU in bits, was calculated for each of the stimulus phonemes as

SU = -Σ_k p_k log₂ p_k,

where p_k is the proportion of responses in category k, and k is an index of summation that represents each possible substitution error.
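The scoring pipeline just described can be sketched as follows, under stated assumptions: a dynamic-programming alignment that minimizes total phoneme-to-phoneme distance, followed by the SU computation over tallied substitution errors. This is our illustrative stand-in, not the published comparator; the comparator used distances estimated by multidimensional scaling of visual confusion matrices, whereas the uniform default distance, the function names, and the toy counts here are hypothetical.

```python
import math
from collections import Counter

def align(stim, resp, dist, indel=1.0):
    """Needleman-Wunsch-style alignment minimizing total distance.

    stim, resp: lists of phoneme symbols.
    dist: dict mapping (stim_phoneme, resp_phoneme) -> distance;
          identical phonemes cost 0. indel is the insertion/deletion cost.
    Returns a list of (stim_phoneme_or_None, resp_phoneme_or_None) pairs.
    """
    d = lambda a, b: 0.0 if a == b else dist.get((a, b), 1.0)
    n, m = len(stim), len(resp)
    # cost[i][j]: least cost of aligning stim[:i] with resp[:j]
    cost = [[i * indel if j == 0 else j * indel if i == 0 else 0.0
             for j in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i-1][j-1] + d(stim[i-1], resp[j-1]),
                             cost[i-1][j] + indel,   # deletion
                             cost[i][j-1] + indel)   # insertion
    # Trace back from the final cell to recover the alignment pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i-1][j-1] + d(stim[i-1], resp[j-1]):
            pairs.append((stim[i-1], resp[j-1])); i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i-1][j] + indel:
            pairs.append((stim[i-1], None)); i -= 1
        else:
            pairs.append((None, resp[j-1])); j -= 1
    return pairs[::-1]

def substitution_entropy(substitutions):
    """SU = -sum_k p_k log2 p_k over substitution responses to one phoneme."""
    total = sum(substitutions.values())
    return -sum((n / total) * math.log2(n / total)
                for n in substitutions.values() if n > 0)

# Tally substitutions for stimulus /p/ across many aligned sentence pairs.
subs_for_p = Counter({'b': 40, 'm': 25, 'f': 20, 'w': 15})  # hypothetical counts
print(round(substitution_entropy(subs_for_p), 2))  # about 1.90 bits
```

With distances reflecting visual similarity, the minimization prefers aligning a stimulus phoneme with a visually similar response phoneme rather than charging an insertion plus a deletion, which is what makes the recovered substitutions interpretable as perceptual confusions.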
RESULTS AND DISCUSSION

Group and Talker Effects

An analysis of variance (ANOVA) was performed on the measures, proportion of phonemes correct in nonsense syllables, proportion of isolated words correct, mean proportion of phonemes correct in words, mean proportion of words correct in sentences, and mean proportion of phonemes correct in sentences, employing a group (IH vs. NH) × talker (male vs. female) design. The only exception to this design was that there was not a talker factor for the measures involving isolated words. Because data from the NH participants had been obtained on 2 days with counterbalanced order (see Demorest et al., 1996), preliminary analyses were conducted, and it was verified that mean performance was not significantly different for NH subgroups who viewed materials on Day 1 versus Day 2.

The means and standard deviations for each of the obtained measures for the two participant groups are given in Table 1. An ANOVA was conducted on untransformed and log-transformed scores to stabilize variances, and only for the group × talker interactions obtained with sentence stimuli were the results different. For simplicity, all the results are reported for the untransformed scores. An ANOVA on each measure showed that the group differences in the table are all significant (p < .0005, except for nonsense syllables, p = .008). The IH group consistently obtained significantly higher mean scores than did the NH group [proportion of nonsense syllables correct, F(1,166) = 7.28; proportion of isolated words correct, F(1,166) = 51.33; mean proportion of phonemes correct in isolated words, F(1,166) = 58.34; mean proportion of words correct in CID sentences, F(1,166) = 62.02; mean proportion of phonemes correct in CID sentences, F(1,166) = 67.37; mean proportion of words correct in B-E sentences, F(1,166) = 71.55; and mean proportion of phonemes correct in B-E sentences, F(1,166) = 76.56]. These results show that performance for the IH group was enhanced, relative to that of the NH group, for all the types of materials and the phoneme- and word-scoring methods.

Table 1
Means, Standard Deviations, and Upper Quartile Ranges for Phoneme and Word Scores Obtained from the Impaired Hearing and Normal Hearing Groups

                                      Phoneme Scores                    Word Scores
Group             Talker    Mean   (SD)   Fourth Quartile Range*   Mean   (SD)   Fourth Quartile Range*
Nonsense Syllables
Impaired hearing  Female    .31    .06    .35-.44                  NA     NA     NA
                  Male      .34    .08    .39-.50                  NA     NA     NA
Normal hearing    Female    .29    .07    .33-.41                  NA     NA     NA
                  Male      .32    .06    .37-.46                  NA     NA     NA
Isolated Words
Impaired hearing  Male      .52    .11    .59-.73                  .19    .09    .25-.42
Normal hearing    Male      .41    .09    .47-.58                  .11    .06    .15-.24
CID Sentences
Impaired hearing  Female    .58    .20    .73-.88                  .52    .20    .68-.85
                  Male      .47    .18    .61-.79                  .40    .18    .52-.74
Normal hearing    Female    .37    .15    .44-.69                  .31    .15    .42-.75
                  Male      .27    .14    .36-.64                  .21    .13    .29-.49
B-E Sentences
Impaired hearing  Female    .43    .16    .56-.75                  .35    .15    .48-.66
                  Male      .53    .17    .66-.83                  .47    .17    .61-.80
Normal hearing    Female    .26    .12    .27-.41                  .21    .10    .26-.41
                  Male      .33    .12    .42-.64                  .28    .11    .36-.57
*Fourth quartile ranges do not include outliers.

All talker differences were significant (p < .0005). The group × talker interactions were not significant for phoneme identification and were inconsistently significant for sentences. When the transformed scores for a particular measure produced significance, the untransformed ones did not, and vice versa. The small, although significant, interactions are not considered important. The talker difference showed the female talker to be the more difficult for the nonsense syllables and the B-E sentences. The result was reversed for the CID sentences. These results can be explained straightforwardly. As was mentioned earlier, selection of the CID sentences attempted to raise scores for the female talker and resulted in a reversal of difficulty for these materials.

Figure 2. Boxplots of the proportion of nonsense syllables correct for impaired hearing and normal hearing participants with both the male and the female talkers.

Estimates of Achievable Upper Extremes on Visual Speech Perception Accuracy

Because lipreading is known to vary among individuals to an extent that has functional significance for speech communication, estimates of mean performance, although useful for group comparisons, fail to provide insights into the range of potentially functionally significant individual differences within and across populations. Of particular interest here were the upper ranges of performance: No one doubts that some individuals are poor at lipreading. The question of interest was the upper limit on visual speech perception accuracy. The upper quartiles of the response distributions were chosen to estimate achievable extremes of visual perception accuracy.⁵ The upper quartiles had the advantage of including relatively large samples (approximately 18 for the IH group and 24 for the NH group). Boxplots were found to be an excellent method for examining performance distributions and isolating the upper quartiles. Figures 2-9 show boxplots for each of the measures in the present study. In each figure, boxes represent the interquartile range, and the median is marked with a horizontal line (SPSS, 1996). Horizontal lines (at the ends of the whiskers) mark the most extreme values not considered outliers. Thus, approximately 25% of the observations fall into the range defined by each whisker. Outlier scores, that is, scores more than 1.5 times the interquartile range from the ends of the box, are labeled with circles. None of the ranges discussed below includes outliers, consistent with a conservative approach to examining the upper extremes of performance.
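A small sketch of the fourth-quartile computation reported in Table 1, assuming the boxplot outlier rule just described; the function name, the crude quartile estimates, and the score list are our hypothetical illustrations:

```python
def fourth_quartile_range(scores):
    """Upper-quartile range excluding outliers, per the boxplot rule:
    values more than 1.5 * IQR above the third quartile are outliers."""
    s = sorted(scores)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]             # simple quartile estimates
    upper_fence = q3 + 1.5 * (q3 - q1)
    top = [x for x in s if q3 <= x <= upper_fence]  # fourth quartile, no outliers
    return (min(top), max(top))

scores = [.21, .25, .28, .31, .33, .35, .38, .41, .44, .50, .92]  # .92 is an outlier
print(fourth_quartile_range(scores))  # (0.44, 0.5): the .92 score is excluded
```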

The boxplots of proportion of nonsense syllables correct in Figure 2 show that, although there was a significant mean group difference, the interquartile range across both talkers and both participant groups covered a relatively narrow range (.24-.35). Similarly, the upper quartiles for both groups covered a narrow range (.35-.50 for the IH group and .33-.46 for the NH group), demonstrating that, even for the most proficient participants, phoneme identification in nonsense syllables is only moderately accurate. This result is consistent with the literature.

The boxplots of proportion of isolated words correct in Figure 3 show that scores were very low in both groups. It was anticipated that identification accuracy for these words would be low: Each word had at least five alternate potential response words in English, which varied either in terms of the initial consonant or consonant cluster or in terms of the final consonant or consonant cluster (see House et al., 1965). Given the high likelihood for misperception, the upper quartile performance (25%-42% correct) of the IH group can be considered remarkable. Also, the lower limit of the upper quartile range for the IH group coincides with the highest score observed for the NH group.

Figure 3. Boxplots of the proportion of isolated words correct for impaired hearing and normal hearing participants with the male talker.

The boxplots of mean proportion of phonemes correct in isolated words in Figure 4 show that the upper quartile for the IH group was .59-.73 and for the NH group was .47-.58. Again, the upper quartile of the IH group was above that of the NH group. These scores suggest a higher level of accuracy than did proportion of words correct (see Figure 3). Frequently, the participants were only partially correct in their word identifications, leading to higher scores measured in phonemes.

Figure 4. Boxplots of the proportion of phonemes correct in isolated words for impaired hearing and normal hearing participants with the male talker.

The boxplots of mean proportion of words correct in CID sentences are shown in Figure 5. This figure shows the discontinuity in upper quartiles for the two groups with the male talker, but not with the female talker. The upper quartile of the NH group overlaps the range of the third quartile and part of the fourth quartile of the IH group: A small subset of the NH group was as accurate as the most accurate participants in the IH group. When scored in terms of the proportion of phonemes correct, however, the upper quartile scores for the two groups are once again distinct (Figure 6).

The patterns of results are similar in Figures 7 and 8 for proportion of words correct and mean proportion of phonemes correct, respectively, for the B-E sentences. Generally, upper quartile ranges for both talkers and both scoring methods are distinct across the IH and NH groups.

All of the boxplots show that among individuals in the IH group were some who were as highly inaccurate at perceiving speech visually as were individuals in the NH group. Thus, severe to profound hearing impairment is not a sufficient condition for even moderately accurate visual speech perception. On the other hand, the distributions of scores, with virtually no overlap between upper quartiles for the two groups, suggest that conditions associated with auditory deprivation are favorable to enhanced visual phonetic perception.

Relationships Among Measures

Having demonstrated reliable group differences in the phoneme and word measures and individual differences within groups, the results were further analyzed for evidence that phonetic perception is a factor in individual differences for lipreading words and sentences. Figure 2 shows that the magnitude of the significant differences in CV identification between groups was small. Nevertheless, small increments in obtained phonetic information could result in large increments in word accuracy (Auer & Bernstein, 1997). If individuals' perceptual accuracy for phonetic information is highly implicated in individual differences in identifying isolated words and words in sentences, correlations between CV identification and the measures involving words should be consistently substantial in magnitude.

Figure 5. Boxplots of the mean proportion of words correct in CID sentences for impaired hearing and normal hearing participants with the male and female talkers.

Table 2 shows the Pearson correlations between the word scores and the CV identification scores. Isolated word scores were the proportion of words correct. Sentence scores were the mean proportion of words correct per sentence. Table 2 shows that there was a stronger association between visual phonetic perception measured with CV identification and word identification in the IH than in the NH group and that the NH participants identified the female talker's CVs in a manner unrelated to their identification of words. The sample correlations for the IH group were always higher than the corresponding ones for the NH group,⁶ and the sample correlations with the male talker's CV identifications were always higher than those with the female talker's.

Figure 6. Boxplots of the proportion of phonemes correct in CID sentences for impaired hearing and normal hearing participants with the male and female talkers.

Figure 7. Boxplots of the mean proportion of words correct in B-E sentences for impaired hearing and normal hearing participants with the male and female talkers.

The magnitudes of the correlations of CV identification scores from the male talker and the word scores were all moderately high (.472-.644) for the IH group. Twenty-nine percent of the variance in identifying words in the male talker's sentences is accounted for by the CV identification scores. Given that sentences afforded the possibility of guessing words, which could theoretically diminish the association between phonetic perception scores and word scores, the consistent magnitude of the correlations supports a role for phonetic perception in individual differences among the IH participants at the sentence level.

The importance of phonetic perception in accounting for individual differences is also supported when the correlations are carried out with the phoneme scoring for isolated words and sentences.

Figure 8. Boxplots of the proportion of phonemes correct in B-E sentences for impaired hearing and normal hearing participants with the male and female talkers.

Table 2
Correlations Between Proportion of Phonemes Correct in Nonsense Syllables and Proportion of Isolated Words Correct and Mean Proportion of Words Correct in Sentences for the Impaired Hearing (IH) and Normal Hearing (NH) Groups

                         Isolated Words    CID Sentences        B-E Sentences
Groups                   Male Talker       Female    Male       Female    Male
                                           Talker    Talker     Talker    Talker
Nonsense Syllables Spoken by Female Talker
  IH                     .346†             .301*     .310†      .274*     .317†
  NH                     .152              .111      .092       .137      .124
Nonsense Syllables Spoken by Male Talker
  IH                     .644†             .537†     .540†      .472†     .539†
  NH                     .389†             .426†     .430†      .378†     .404†
*p = .05. †p = .01.

Table 3 shows the Pearson correlations between the proportion of phonemes correct scores for words and sentences and the CV identification scores. Isolated word scores were the mean proportion of phonemes correct in words. Sentence scores were the mean proportion of phonemes correct in sentences. Table 3 shows that sample correlations for the IH group were invariably higher than for the NH group. The correlations for scores involving the male talker's CV identifications were higher than those involving the female talker's. The magnitude of correlations with the male talker's CV identifications was similar to those shown previously in Table 2. The similarity of the correlations across materials for a given talker and group suggests that the contribution of phonetic perception is approximately equal for both isolated word and sentence perception.

Table 3
Correlations Between Proportion of Phonemes Correct in Nonsense Syllables, Mean Proportion of Phonemes Correct in Isolated Words, and Mean Proportion of Phonemes Correct in Sentences for the Impaired Hearing (IH) and Normal Hearing (NH) Groups

                         Isolated Words    CID Sentences        B-E Sentences
Groups                   Male Talker       Female    Male       Female    Male
                                           Talker    Talker     Talker    Talker
Nonsense Syllables Spoken by Female Talker
  IH                     .276*             .305†     .311†      .258*     .334†
  NH                     .022              .094†     .093       .032      .106
Nonsense Syllables Spoken by Male Talker
  IH                     .657†             .518†     .537†      .458†     .539†
  NH                     .404†             .422†     .430†      .235*     .410†
Isolated Words Spoken by Male Talker
  IH                                       .836†     .843†      .812†     .843†
  NH                                       .777†     .784†      .664†     .799†
*p = .05. †p = .01.

Table 3 also shows correlations between mean proportion of phonemes correct in isolated words and the phoneme scores for sentences. The proportion variance accounted for by these correlations ranged between 44% and 71% across the two participant groups. These results suggest that a large portion of the individual differences among lipreaders can be attributed to processes no later than word recognition.

The patterns of correlation between phoneme identification and phonemes correct in words and sentences are consistent with the hypothesis that variation in visual phonetic perception is a substantial portion of the explanation for individual differences in lipreading. The relatively large numbers of participants in the upper quartiles, discussed above in combination with the consistent correlational analyses, ensure that the observed high scores are not attributable merely to measurement error but, rather, to genuine occurrences of highly accurate visual speech perception.

Perception of phonological features in nonsense syllables. An analysis of the proportion of TI for phonological features (see the Appendix) was used to quantify in greater detail CV syllable identification. Feature TI was interpreted as a measure of visual phonetic cues perceived: Evidence for phonological structure implies that phonetic cues have been perceived. Feature analyses presuppose that subphonemic information is perceptible. This presupposition is in opposition to the notion of the viseme, which is considered to be an undifferentiated perceptual unit comprising several phonemes (e.g., Massaro, 1998; Massaro, Cohen, & Gesi, 1993).

Several factors were taken into account in obtaining the TI measures. Because conditional analysis of TI by SINFA sequentially removes correlated feature structure, measures are sensitive to the order of feature evaluation. In the present study, the sparse confusion matrices (only two responses per token per talker) obtained from each participant precluded individualized analyses of feature importance. Therefore, data across groups but within talkers were pooled to obtain the relative importance (unconditional TI) of the features. Then, the conditional TI analysis was performed on data pooled only within quartiles and talkers for each participant group. Quartile assignment for each participant was based on mean proportion of phonemes correct in isolated words (see Figure 4). If phonetic perception is related to word identification, as was suggested by the correlations presented earlier, the TI measures for each of the features should be systematically related to the participants' quartile membership.
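For readers who want the sequential step spelled out, the sketch below extends the earlier TI example into a greedy SINFA-style analysis. It is a simplified reconstruction of the idea in Wang and Bilger (1973), not the original program, and the partitioning scheme is one reasonable reading of "removing correlated feature structure": at each step the feature with the greatest conditional TI is selected, and later estimates are computed within partitions fixed by the already-selected features.

```python
import math
from collections import defaultdict

def ti_bits(confusions, feat):
    """Transmitted information (bits) between stimulus and response
    values of one binary feature, over a confusion-count dict."""
    total = sum(confusions.values())
    joint, px, py = defaultdict(float), defaultdict(float), defaultdict(float)
    for (s, r), n in confusions.items():
        joint[(feat[s], feat[r])] += n / total
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def conditional_ti(confusions, feat, selected):
    """TI for feat within partitions fixed by the stimulus and response
    values of already-selected features, weighted by partition size."""
    parts = defaultdict(lambda: defaultdict(int))
    for (s, r), n in confusions.items():
        key = tuple((g[s], g[r]) for g in selected)
        parts[key][(s, r)] += n
    total = sum(confusions.values())
    return sum((sum(part.values()) / total) * ti_bits(part, feat)
               for part in parts.values())

def sinfa(confusions, features):
    """Greedy sequential information analysis: at each step, select the
    feature with the greatest TI conditional on those already selected."""
    order, remaining = [], dict(features)
    while remaining:
        given = [features[name] for name, _ in order]
        scores = {name: conditional_ti(confusions, f, given)
                  for name, f in remaining.items()}
        best = max(scores, key=scores.get)
        order.append((best, scores[best]))
        del remaining[best]
    return order

# Toy data: /p/, /b/, /f/ confusions pooled over trials (hypothetical counts).
conf = {('p','p'):5, ('p','b'):4, ('p','f'):1,
        ('b','b'):5, ('b','p'):4, ('b','f'):1,
        ('f','f'):8, ('f','p'):1, ('f','b'):1}
features = {'continuant': {'p':0, 'b':0, 'f':1},
            'voice':      {'p':0, 'b':1, 'f':0}}
print(sinfa(conf, features))
```

In the toy data, continuant (which separates /f/ from /p b/) carries most of the information, and voice, evaluated conditionally, adds almost nothing, mirroring the observation that voicing is essentially invisible.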
Figure 9 shows the results of the conditional proportion of TI analyses for each talker and quartile within group. Features are ordered from left to right in the figures according to the feature importance for the female talker. The first four features for the male talker were analyzed in the order of round, strident, coronal, and anterior but are presented here in the same order as that for the female, to facilitate direct comparison across figures. In Figure 9, the bars for each feature are ordered from first to fourth quartile, from left to right.

Figure 9 does not show the voice, continuant, high, low, back, and duration features, which were estimated to be zero.

Figures 9A-9B. Proportion of transmitted information (TI) for phonological features across quartiles. Quartiles for each feature are arranged from first to fourth sequentially from left to right. Figures 9A and 9C show the impaired hearing quartiles for the female and the male talkers, respectively. Figures 9B and 9D show the normal hearing quartiles for the female and the male talker, respectively. Abbreviations: rnd, round; cor, coronal; ant, anterior; str, strident; fric, fricative; cons, consonantal; nas, nasal; voc, vocalic; voic, voicing; cont, continuant.

The features in the feature set are not perfectly independent, so conditional analysis would be expected to result in a low proportion of TI for some of the features. The maximum possible stimulus information for the 23 CV syllables is 4.52 bits. Were the features independent, there would be a total of 10.40 bits of stimulus information (see Wang & Bilger, 1973). Thus, this feature system contains 5.88 bits of internal redundancy. To illustrate, continuant and fricative are very similar in distribution (see the Appendix), and fricative is transmitted at a generally moderate level. On the other hand, continuant distinguishes the fricatives /ʒ ʃ/ from the affricates /dʒ tʃ/, and this distinction was not perceived. High, low, and back, with zero proportion of TI, take on value only for /w a k g h ʒ ʃ dʒ tʃ/. But round is the most successfully transmitted feature and has positive value only for /w/. Therefore, failure to transmit a particular feature does not necessarily predict low intelligibility for particular phonemes. (See Keating, 1988, for a complete examination of phonological features.)
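The arithmetic behind the redundancy figures given above, assuming the 23 syllables are equiprobable, works out as follows:

```latex
H_{\mathrm{stim}} = \log_2 23 \approx 4.52\ \text{bits}, \qquad
\sum_{f} H(f) = 10.40\ \text{bits}, \qquad
\text{redundancy} = 10.40 - 4.52 = 5.88\ \text{bits}.
```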

Figures 9C-9D. Proportion of transmitted information (TI) for phonological features across quartiles. Quartiles for each feature are arranged from first to fourth sequentially from left to right. Figures 9A and 9C show the impaired hearing quartiles for the female and the male talkers, respectively. Figures 9B and 9D show the normal hearing quartiles for the female and the male talker, respectively. Abbreviations: rnd, round; cor, coronal; ant, anterior; str, strident; fric, fricative; cons, consonantal; nas, nasal; voc, vocalic; voic, voicing; cont, continuant.

only for /w/. Therefore, failure to transmit a particular Tl was overall higher for the IH group and for the male feature does not necessarily predict low intelligibility for talker and that evidence for systematically differential Tl particular phonemes. (See Keating, 1988, for a complete as a function of quartile assignment was obtained only examination ofphonological features.) for the IH group. Figure 9A (female talker, IH group) The main point ofthe TI analysis was to obtain addi­ shows a generally orderly relationship across the IH tional evidence that performance on word identification quartiles for the coronal, anterior, strident, and nasal fea­ is related to phonetic perception. The figure shows that tures and somewhat less so for the round, fricative, and 246 BERNSTEIN, DEMOREST, AND TUCKER consonantal features. Figure 9C (male talker, IH group) tual constraint. The more information obtained, the more shows a wider difference between the first and the fourth systematic are the phoneme errors. These results are con­ IH quartiles than in Figure 9A but several points ofover­ sistent with the correlational and the TI results. They ex­ lap between the second and the third quartiles. Particu­ tend those results by showing that group differences in larly large differences in proportion of TI between the phonetic perception can be detected directly in the errors first and the fourth quartiles were obtained for the coro­ in sentence responses. nal, anterior, strident, fricative, consonantal, and vocalic features. Nasal values are orderly with regard to quartile, GENERAL DISCUSSION but across a narrower range. Round was approximately equal across quartiles, as were voice and continuant. The present study supports several general conclu­ Because the majority of the CV syllable identifica­ sions. First, it shows that visual speech perception per­ tions were errors, the TI analysis shows that more pho­ formance can be considerably more accurate (see Table I) netic information was recovered from the stimuli by the than commonly cited estimates, such as 30% or fewer better IH lipreaders, even when they were not absolutely words correct in sentences. Second, severe to profound correct. The contrast in pattern ofresults across groups hearing impairment can be associated with enhanced vi­ suggests that individual differences in phonetic percep­ sual phonetic perception. All the evidence supports the tion are more important for word identification in IH existence of a subgroup of IH individuals who outper­ than in NH participants. This conclusion is consistent with form adults with normal hearing. Third, relatively small the correlational analyses, which also showed a higher differences in phonetic perception, as measured with CV level of association between phonetic perception and syllable identification, are predictive of individual dif­ identification ofwords in isolation and in sentences. ferences in isolated word and isolated sentence perfor­ Phoneme substitution uncertainty in sentences. mance levels. Fourth, performance patterns vary across Given that phonetic perception drives word recognition, IH and NH groups. In particular, correlations among mea­ it should be possible to observe effects of phonetic percep­ sures are higher in the IH group, differences in feature tion not only in correct responses, but also in phoneme transmission scores differentiate across quartiles in the errors in sentences. 
Phoneme substitution uncertainty in sentences. Given that phonetic perception drives word recognition, it should be possible to observe effects of phonetic perception not only in correct responses, but also in phoneme errors in sentences. Alternatively, higher level top-down processes might so strongly affect responses to sentence stimuli that phonetic perception effects might be undetectable. This issue was investigated by using the phoneme-to-phoneme alignments obtained for the CID sentences. Every instance of a stimulus phoneme aligned with a different phoneme (no correct responses were employed) in the response was extracted from the entire data set from the CID sentence alignments and tallied in a confusion matrix. A total of 93,455 individual phoneme-to-phoneme alignments were tallied. Matrices (four total) were compiled within groups and talkers. Phoneme SU, in bits, the entropy for individual phonemes, was calculated for each of the phonemes in each of the matrices.

Figure 10 shows phoneme SU for the female (Figure 10A) and the male (Figure 10B) talkers' CID sentences across the two participant groups. The phonetic symbols on the horizontal axes include each of the notations employed in the sentence transcriptions. Because the sentences were different across talkers, the inventory of phonetic symbols is somewhat different. The figures show that SU, entropy, was uniformly higher for the NH than for the IH group and higher for the female talker than for the male. An ANOVA showed that both talker and participant group were significant factors [F(1,160) = 5.312, p = .022, and F(1,160) = 156.213, p < .001, respectively].

These results show that the substitution errors committed by the NH group were less systematic than were those of the IH group. This result is predicted, if errors in the sentence responses were driven more by phonetic perception for the IH group than for the NH group. That is, we theorize that lower entropy is due to greater perceptual constraint. The more information obtained, the more systematic are the phoneme errors. These results are consistent with the correlational and the TI results. They extend those results by showing that group differences in phonetic perception can be detected directly in the errors in sentence responses.
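The SU computation just described can be sketched directly: for each stimulus phoneme, tally its substitution responses and take the entropy of that distribution. The code below is our illustration of that definition, assuming the alignments are available as stimulus-response phoneme pairs; the variable names and toy pairs are invented.

import math
from collections import defaultdict

def substitution_uncertainty(alignments):
    # alignments: iterable of (stimulus phoneme, response phoneme) pairs
    # from the sentence alignments; identical pairs are excluded, as in
    # the analysis described above.
    counts = defaultdict(lambda: defaultdict(int))
    for stim, resp in alignments:
        if stim != resp:                    # substitutions only
            counts[stim][resp] += 1
    su = {}
    for stim, row in counts.items():
        total = sum(row.values())
        su[stim] = -sum((n / total) * math.log2(n / total)
                        for n in row.values())
    return su

# /t/ substituted almost always by /d/ (systematic, low SU) versus /s/
# scattered over three responses (less systematic, higher SU).
pairs = ([('t', 'd')] * 9 + [('t', 'n')] +
         [('s', 'z')] * 4 + [('s', 'f')] * 3 + [('s', 'th')] * 3)
print(substitution_uncertainty(pairs))      # SU(t) ~ .47, SU(s) ~ 1.57

Lower SU thus indexes more systematic (more perceptually constrained) substitution behavior, which is the sense in which the IH group's errors were more systematic.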
GENERAL DISCUSSION

The present study supports several general conclusions. First, it shows that visual speech perception performance can be considerably more accurate (see Table 1) than commonly cited estimates, such as 30% or fewer words correct in sentences. Second, severe to profound hearing impairment can be associated with enhanced visual phonetic perception. All the evidence supports the existence of a subgroup of IH individuals who outperform adults with normal hearing. Third, relatively small differences in phonetic perception, as measured with CV syllable identification, are predictive of individual differences in isolated word and isolated sentence performance levels. Fourth, performance patterns vary across IH and NH groups. In particular, correlations among measures are higher in the IH group, differences in feature transmission scores differentiate across quartiles in the IH group but not in the NH group, and phoneme SU is lower in the IH group. The implications of these results are discussed below. First, however, we relate the performance levels obtained here to ones from auditory speech perception. This is to provide a more common referent (auditory) for what these performance levels might functionally accomplish. Whether performance levels equated across visual and auditory tasks of the type here accurately predict various other types of functional equivalence (e.g., ease/speed of comprehension) remains to be discovered.

Equating visual and auditory speech performance levels. The entire range of IH upper quartile scores (across all sentence sets) was .48-.85 mean proportion of words correct (see Table 1). This range can be approximately equated to the performance curve reported in G. A. Miller, Heise, and Lichten (1951) for five-word sentences presented in noise. Performance at the lower end of the range (.48) is approximately equivalent to auditory speech perception in a signal-to-noise (S/N) condition of -4 dB. Performance at the upper end of the range (.85) is approximately equivalent to performance in a +5-dB S/N condition. That is, visual speech perception in highly skilled IH lipreaders is similar in accuracy to auditory speech perception under somewhat difficult to somewhat favorable listening conditions. The entire range of NH upper quartile scores for lipreading sentences (across all the sentence sets) was .27-.69 mean proportion of words correct (see Table 1). The lower end of the range (.27) is approximately equivalent to performance in a -8-dB S/N condition. The upper end of the range (.69) is approximately equivalent to performance in a 0-dB S/N condition. That is, visual speech perception in the most skilled NH lipreaders is similar in accuracy to auditory speech perception under difficult to somewhat difficult listening conditions. This characterization suggests that, for the IH upper quartile, lipreading can be adequate for fluent communication. However, for the majority of IH and NH participants, it suggests that lipreading alone renders communication difficult at best.

Figure 10. Substitution uncertainty (SU), in bits, for CID sentences across talkers and groups. Panel A shows the female talker, and Panel B the male talker, with separate functions for the IH and NH groups. Phonetic notation on the horizontal axes includes all the symbols required to transcribe the sentences for each of the talkers; SU is plotted on the vertical axes from 0 to 3.0 bits.

Impaired hearing and enhanced visual speech perception in the present study. One strong assertion in the literature is that hearing impairment is not associated with enhanced visual speech perception. For example, Summerfield (1991) states, "The distribution of lipreading skills in both the normal-hearing and hearing-impaired populations is broad. Indeed, it is to be lamented that excellence in lipreading is not related to the need to perform well, because in formal laboratory tests ..., the best totally deaf and hearing-impaired subjects often perform only as well as the best subjects with normal hearing" (p. 123). According to Ronnberg (1995), "visual speechreading [sic] skill is not critically related to auditory impairment per se" (p. 264). (See also Mogford, 1987.) The present study provides consistent evidence for enhanced lipreading in a subset of IH participants.

In subsequent studies that we have conducted with deaf and hearing students at California State University, Northridge, at the National Center on Deafness, we have continued to find that the most accurate visual speech perception is demonstrated by the deaf students. One possible explanation for the discrepancy between our findings and the view cited above is that we recruited many participants whose severe to profound hearing impairments were acquired congenitally or very early, whereas generalizations about lipreading in IH individuals are typically based on studies of individuals with postlingual impairments and/or with less significant impairments. If enhanced visual speech perception depends on early visual perceptual experience and auditory deprivation, lipreading enhancements in postlingually deafened adults would not be predicted.

As was suggested above, it is likely that the present results are due to having recruited a large number of participants with severe to profound congenital or early-onset hearing impairments. If superior visual speech perception is related to severe to profound hearing impairment at an early age, prevalence of profound hearing impairment must influence the frequency of observing skilled lipreading. On the basis of 1984 statistics (Department of Commerce, 1984, cited in Hotchkiss, 1989), the percentage of the United States population 15 years of age and older who were unable to hear what is said in normal conversation was only 0.3% (481,000). Further limiting the opportunity to observe skilled lipreading is the use of American Sign Language as a first and primary language by a part of this population: Lipreading accuracy has been shown to be negatively correlated with sign language usage (Bernstein, Demorest, & Tucker, 1998; Geers & Moog, 1989; Moores & Sweet, 1991).

Not only is it asserted in the literature that enhanced lipreading is not associated with impaired hearing, it is also asserted that acquisition of language with normal hearing is a condition for achieving the highest levels of lipreading accuracy (Mogford, 1987). This possibility was examined with information on the present IH participants. The audiological records available for the IH group who scored in the upper quartiles on all of the measures were examined for evidence that hearing is not a prerequisite for outstanding lipreading. Four such participants had records that indicated profound, congenital impairments. Each had audiometric pure tone thresholds of 100-dB HL or greater that were probably attributable, at least in part, to vibrotactile sensation at low frequencies, and not to auditory sensation (Boothroyd & Cawkwell, 1970; Nober, 1967). Audiometric evaluations stated that these 4 participants could not identify words by listening under conditions of amplification. Absolute assurance that they had never had functionally useful hearing for identifying words is impossible. With that caveat, it appears that the upper quartile levels of lipreading accuracy were achieved by individuals with no auditory speech perception experience. The existence of these individuals suggests that visual speech perception (possibly with vibrotactile stimulation as well) can be functionally equivalent to auditory speech perception for acquiring perception of a spoken language. Future studies are needed to evaluate all aspects of spoken language processing in individuals who perform at high levels of language comprehension and production, yet have no or a minimal history of auditory speech perception. These individuals afford an opportunity to dissociate speech perception from perceptual modality in accounting for speech perception.

Phonetic Perception and Word Recognition

A principal aim of the present study was to describe visual phonetic perception across IH and NH participants. With word and sentence stimuli, it is ensured that the phonetic perception measures (phonemes correct and SU) are not independent of lexical and higher level psycholinguistic effects. Even CV syllable identification has been shown to be affected by lexical processes (Newman et al., 1997) and phonological knowledge (Walden et al., 1980). A question is whether the "contamination" of the phonetic perception measures by nonphonetic psycholinguistic effects diminishes the strength of the conclusion that enhanced phonetic perception occurred in the IH group. We think not.

The possibility that enhanced lipreading of sentences and words in the present study was actually attributable to enhanced higher level psycholinguistic or strategic processes might be argued on the grounds that the IH group had a better or more extensive knowledge of language to deploy during visual speech perception than did those in the NH group. However, in a recent study of similarly selected IH and NH groups, we found that the IH participants as a group are less familiar with words (Waldstein, Auer, Tucker, & Bernstein, 1996) and, in a word-age-of-acquisition task, judge words to have been learned later (Auer, Waldstein, Tucker, & Bernstein, 1996). Mean reading comprehension scores were lower, as were mean Peabody Picture Vocabulary Test (Dunn & Dunn, 1981) scores. Therefore, it seems unlikely that enhanced lipreading of sentences in the present IH group versus the NH group was due to enhanced experience or knowledge of language, although within the present IH group, there is an association between language and lipreading (Bernstein et al., 1998).

Taken together, (1) the statistically significant differences between groups on phonetic perception measures, consistently favoring the IH group, (2) the higher correlations between phonetic perception measures and word measures for the IH group, (3) the lower SU for phonemes in sentences for the IH group, (4) the systematic relationships between feature transmission scores and word scores for the IH group, and (5) the high likelihood that most of the IH participants had less language experience than the NH participants all argue for genuine differences in phonetic perception across groups. Future studies, particularly ones involving measurements of optical phonetic stimulus properties and perceptual sensitivity, are needed to determine what accounts for enhanced phonetic perception.

In addition to evidence for phonetic perception differences across groups, there is strong evidence that the more successful IH lipreaders were more successful at lipreading sentences owing to their ability to recognize words. Table 3 shows correlations between mean proportion of phonemes correct in words and in sentences, which were higher for the IH group. Between 66% and 71% of the variance in sentence scores can be accounted for by the isolated word scores for the IH group, and between 44% and 64% of the variance for the NH group. Conklin (1917) and Utley (1946) both reported word versus sentence lipreading scores that correlated approximately .70. Lyxell, Ronnberg, Andersson, and Linderoth (1993) reported a similar correlation of .83.

Given that both phoneme identification and word identification were associated with sentence performance scores, we examined the predictability of CID and B-E sentence performance in terms of multiple regression analyses involving isolated word identification scores versus phoneme identification scores. Four analyses were conducted: proportions of words correct and of phonemes correct for CID sentences and for B-E sentences. The data were pooled across talkers. Potential predictors included group (IH vs. NH), phoneme identification in nonsense syllables, and isolated word identification (in the comparable metric of words correct or phonemes correct), and the interactions of group with each of the remaining predictors. The results of all four analyses were the same: Only group and isolated word performance were significant predictors of sentence performance. Phoneme identification in syllable performance did not contribute significantly to the prediction of sentence performance, once word performance was taken into account. Standardized partial regression coefficients (beta) for each two-variable regression equation (word score and group) were similar. Betas for the group variable ranged between .12 and .16. Betas for the isolated word variable ranged between .80 and .82. The multiple correlation coefficients, R, ranged between .88 and .90. These results do not mean, however, that phonetic perception is not important in explaining individual differences in lipreading sentences. They mean that the word identification measures account for the same variance as do the syllable identification scores and for additional variance not accounted for by the syllable scores.
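For readers who want the mechanics of the two-variable equation just described, the sketch below computes standardized betas and the multiple correlation R. It is our illustration only (the study's analyses were run in SPSS); the simulated scores and sample size are stand-ins, not the study's data.

import numpy as np

def standardized_betas(y, predictors):
    # Returns standardized partial regression coefficients (betas) and
    # the multiple correlation R for an OLS regression. Because every
    # variable is z-scored first, no intercept term is needed and the
    # coefficients are directly comparable across predictors.
    z = lambda v: (v - v.mean()) / v.std(ddof=1)
    zy = z(np.asarray(y, dtype=float))
    zx = np.column_stack([z(np.asarray(p, dtype=float)) for p in predictors])
    betas, *_ = np.linalg.lstsq(zx, zy, rcond=None)
    r = np.corrcoef(zx @ betas, zy)[0, 1]   # multiple correlation, R
    return betas, r

# Simulated stand-in data: a 0/1 group code and sentence scores driven
# mostly by isolated word scores, loosely mimicking the reported pattern.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, 168).astype(float)
word = 0.3 + 0.2 * group + 0.1 * rng.standard_normal(168)
sentence = 0.8 * word + 0.05 * group + 0.05 * rng.standard_normal(168)
betas, r = standardized_betas(sentence, [word, group])
print(betas.round(2), round(r, 2))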
Given the importance of word recognition in accounting for individual differences in lipreading, a question is how top-down psycholinguistic processes and knowledge versus bottom-up phonetic processing contribute to the relative accuracy of lipread word recognition across individuals. This question cannot be discussed adequately without outlining the current arguments in the literature concerning the various architectures hypothesized to account for spoken word recognition (e.g., Luce & Pisoni, 1998; Marslen-Wilson, 1990; McClelland & Elman, 1986). This exercise would far exceed the present space limitations. Even if there were space to lay out the various arguments for and against hypothesized architectures, it would need to be admitted that little work on individual differences has been done in the area of spoken word recognition (cf. Lewellen, Goldinger, Pisoni, & Greene, 1993).

In the literature on lipreading, however, the question of the relative importance of bottom-up versus top-down processing in accounting for individual differences has not been posed in terms of word recognition architectures but, rather, as a question concerning the relative importance of phonetic versus higher level psycholinguistic (sentential semantic or syntactic; Jeffers & Barley, 1971; Ronnberg, 1995; Rosen & Corcoran, 1982) or expectancy-based processing (e.g., Lyxell & Ronnberg, 1987). A similar but more extensive debate has taken place in the literature on reading (Perfetti, 1994; Stanovich, 1980) and has been raised by studies of comprehension of synthetic speech produced by rule (Duffy & Pisoni, 1992). For both modalities of language comprehension, strong evidence supports models that put a premium on bottom-up word recognition processing for fast, automatic, efficient, and accurate comprehension. The critical contact between lower level perceptual processing and stored knowledge is thought to occur during lexical processing, which makes lexical semantic and syntactic properties available to higher levels of linguistic processing (Tyler, Cobb, & Graham, 1992).

Stanovich (1980) presented the case that poorer readers are the ones who rely on semantic and syntactic expectancy-based processing, owing to their inadequate word identification skills, whereas more expert readers have fast and automatic word recognition and rely less on higher level sources of information. Stanovich suggested that "the ability to recognize words automatically is important because it frees capacity for higher-level integrative and semantic processing" (p. 60), and he argued against the notion that the more fluent readers rely more on context. Temporally prior processes of word recognition compete for resources with later comprehension processes (syntactic, semantic, discourse level). Schwantes (1981) showed that children, when they read, rely more on context than do adults, and he attributed this difference to the children's less effective word-decoding processes. Recently, Perfetti (1994) summarized the reading research on context and concluded that no studies show a greater context effect for skilled readers. Fluent reading involves automatic processes that, under normal circumstances, operate on the bottom-up stimulus. It is not the case, according to Perfetti, that fluent readers do not use context but that context aids in postidentification processes. Duffy and Pisoni (1992) share similar views in their review of perception and comprehension of speech synthesized by rule. They argue that bottom-up phonetic perception and word recognition account for performance differences associated with different quality phonetic information. Differential use of context does not account for intelligibility differences across types of synthetic speech.

If the rule is that fluent language comprehension is driven by the accuracy, automaticity, and efficiency of word recognition, rather than by the deployment of top-down processes, it is predictable that the better lipreaders will evidence both enhanced phonetic perception and enhanced isolated word identification. Studies of lipreading are needed within the framework of detailed language comprehension processing models, with particular attention to word recognition. It would be surprising to learn that visual speech perception departs from the general form that language comprehension has been found to follow for reading and auditory speech perception.

REFERENCES

AUER, E. T., JR., & BERNSTEIN, L. E. (1997). Speechreading and the structure of the lexicon: Computationally modeling the effects of reduced phonetic distinctiveness on lexical uniqueness. Journal of the Acoustical Society of America, 102, 3704-3710.
AUER, E. T., JR., BERNSTEIN, L. E., WALDSTEIN, R. S., & TUCKER, P. E. (1997). Effects of phonetic variation and the structure of the lexicon on the uniqueness of words. In C. Benoit & R. Campbell (Eds.), Proceedings of the ESCA/ESCOP Workshop on Audio-Visual Speech Processing (pp. 21-24). Rhodes, Greece, September 26-27.
AUER, E. T., JR., WALDSTEIN, R. S., TUCKER, P. E., & BERNSTEIN, L. E. (1996). Relationships between word knowledge and visual speech perception: I. Subjective estimates of word age-of-acquisition. Journal of the Acoustical Society of America, 100, 2569.
BENGUEREL, A. P., & PICHORA-FULLER, M. K. (1982). Coarticulation effects in lipreading. Journal of Speech & Hearing Research, 25, 600-607.
BERNSTEIN, L. E., COULTER, D. C., O'CONNELL, M. P., EBERHARDT, S. P., & DEMOREST, M. E. (1993). Vibrotactile and haptic speech codes. In A. Risberg, S. Felicetti, G. Plant, & K.-E. Spens (Eds.), Proceedings of the Second International Conference on Tactile Aids, Hearing Aids, and Cochlear Implants (pp. 57-70). Stockholm, June 7-11, 1992.
BERNSTEIN, L. E., DEMOREST, M. E., COULTER, D. C., & O'CONNELL, M. P. (1991). Lipreading sentences with vibrotactile vocoders: Performance of normal-hearing and hearing-impaired subjects. Journal of the Acoustical Society of America, 90, 2971-2984.
BERNSTEIN, L. E., DEMOREST, M. E., & EBERHARDT, S. P. (1994). A computational approach to analyzing sentential speech perception: Phoneme-to-phoneme stimulus-response alignment. Journal of the Acoustical Society of America, 95, 3617-3622.
BERNSTEIN, L. E., DEMOREST, M. E., & TUCKER, P. E. (1998). What makes a good speechreader? First you have to find one. In R. Campbell, B. Dodd, & D. Burnham (Eds.), Hearing by eye: II. The psychology of speechreading and auditory-visual speech (pp. 211-228). East Sussex, U.K.: Psychology Press.
BERNSTEIN, L. E., & EBERHARDT, S. P. (1986a). Johns Hopkins Lipreading Corpus I-II: Disc 1 [Videodisc]. Baltimore: Johns Hopkins University.
BERNSTEIN, L. E., & EBERHARDT, S. P. (1986b). Johns Hopkins Lipreading Corpus III-IV: Disc 2 [Videodisc]. Baltimore: Johns Hopkins University.
BERNSTEIN, L. E., IVERSON, P., & AUER, E. T., JR. (1997). Elucidating the complex relationships between phonetic perception and word recognition in audiovisual speech perception. In C. Benoit & R. Campbell (Eds.), Proceedings of the ESCA/ESCOP Workshop on Audio-Visual Speech Processing (pp. 89-92). Rhodes, Greece, September 26-27.
BOOTHROYD, A., & CAWKWELL, S. (1970). Vibrotactile thresholds in pure tone audiometry. Acta Otolaryngologica, 69, 381-387.
BREEUWER, M., & PLOMP, R. (1986). Speechreading supplemented with auditorily presented speech parameters. Journal of the Acoustical Society of America, 79, 481-499.
CATFORD, J. C. (1977). Fundamental problems in phonetics. Bloomington: Indiana University Press.
CHOMSKY, N., & HALLE, M. (1968). The sound pattern of English. New York: Harper & Row.
CLOUSER, R. A. (1977). Relative phoneme visibility and lipreading performance. Volta Review, 79, 27-34.
CONKLIN, E. S. (1917). A method for the determination of relative skill in lip-reading. Volta Review, 19, 216-219.
CONRAD, R. (1977). Lipreading by deaf and hearing children. British Journal of Educational Psychology, 47, 60-65.
DAVIS, H., & SILVERMAN, S. R. (EDS.) (1970). Hearing and deafness. New York: Holt, Rinehart & Winston.
DEKLE, D. J., FOWLER, C. A., & FUNNELL, M. G. (1992). Audiovisual integration in perception of real words. Perception & Psychophysics, 51, 355-362.
DEMOREST, M. E., & BERNSTEIN, L. E. (1992). Sources of variability in speechreading sentences: A generalizability analysis. Journal of Speech & Hearing Research, 35, 876-891.
DEMOREST, M. E., & BERNSTEIN, L. E. (1997). Relationships between subjective ratings and objective measures of performance in speechreading sentences. Journal of Speech, Language, & Hearing Research, 40, 900-911.
DEMOREST, M. E., BERNSTEIN, L. E., & DEHAVEN, G. P. (1996). Generalizability of speechreading performance on nonsense syllables, words, and sentences: Subjects with normal hearing. Journal of Speech & Hearing Research, 39, 697-713.
DUFFY, S. A., & PISONI, D. B. (1992). Comprehension of synthetic speech produced by rule: A review and theoretical interpretation. Language & Speech, 35, 351-389.
DUNN, L. M., & DUNN, L. M. (1981). Peabody Picture Vocabulary Test-Revised. Circle Pines, MN: American Guidance Service.
FEINGOLD, A. (1995). The additive effects of differences in central tendency and variability are important in comparisons between groups. American Psychologist, 50, 5-13.
FISHER, C. G. (1968). Confusions among visually perceived consonants. Journal of Speech & Hearing Research, 11, 796-804.
GEERS, A., & MOOG, J. (1989). Factors predictive of the development of literacy in profoundly hearing-impaired adolescents. Volta Review, 91, 69-86.
GREEN, K. P. (1998). The use of auditory and visual information during phonetic processing: Implications for theories of speech perception. In R. Campbell, B. Dodd, & D. Burnham (Eds.), Hearing by eye: II. Advances in the psychology of speechreading and auditory-visual speech (pp. 3-25). East Sussex, U.K.: Psychology Press.
GREEN, K. P., & KUHL, P. K. (1989). The role of visual information in the processing of place and manner features in speech perception. Perception & Psychophysics, 45, 34-42.
HEIDER, F., & HEIDER, G. M. (1940). An experimental investigation of lipreading. Psychological Monographs, 52 (Whole No. 232), 124-153.
HOTCHKISS, D. (1989). Demographic aspects of hearing impairment: Questions and answers. (Available from Center for Assessment and Demographic Studies, Gallaudet Research Institute, 800 Florida Avenue, N.E., Washington, DC 20002)
HOUSE, A. S., WILLIAMS, C. E., HECKER, M. H., & KRYTER, K. D. (1965). Articulation-testing methods: Consonantal differentiation with a closed-response set. Journal of the Acoustical Society of America, 37, 158-166.
JEFFERS, J., & BARLEY, M. (1971). Speechreading (lipreading). Springfield, IL: Charles C. Thomas.
KEATING, P. A. (1988). A survey of phonological features. Bloomington: Indiana University Linguistics Club.
KREUL, E. J., NIXON, J. C., KRYTER, K. D., BELL, D. W., & LAMB, J. S. (1968). A proposed clinical test of speech discrimination. Journal of Speech & Hearing Research, 11, 536-552.
KUHL, P. K., & MELTZOFF, A. N. (1988). Speech as an intermodal object of perception. In A. Yonas (Ed.), Perceptual development in infancy (Vol. 20, pp. 235-266). Hillsdale, NJ: Erlbaum.
LESNER, S. A., & KRICOS, P. B. (1981). Visual vowel and diphthong perception across speakers. Journal of the Academy of Rehabilitative Audiology, 14, 252-258.
LEWELLEN, M. J., GOLDINGER, S. D., PISONI, D. B., & GREENE, B. G. (1993). Lexical familiarity and processing efficiency: Individual differences in naming, lexical decision, and semantic categorization. Journal of Experimental Psychology: General, 122, 316-330.
LISKER, L., & ABRAMSON, A. S. (1964). A cross-language study of voicing in initial stops: Acoustical measurements. Word, 20, 384-422.
LUCE, P. A., & PISONI, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear & Hearing, 19, 1-36.
LYXELL, B., & RONNBERG, J. (1987). Guessing and speechreading. British Journal of Audiology, 21, 13-20.
LYXELL, B., & RONNBERG, J. (1989). Information-processing skill and speech-reading. British Journal of Audiology, 23, 339-347.
LYXELL, B., & RONNBERG, J. (1991a). Visual speech processing: Word-decoding and word-discrimination related to sentence-based speechreading and hearing-impairment. Scandinavian Journal of Psychology, 32, 9-17.
LYXELL, B., & RONNBERG, J. (1991b). Word discrimination and chronological age related to sentence-based speechreading skill. British Journal of Audiology, 25, 3-10.
LYXELL, B., & RONNBERG, J. (1992). The relationship between verbal ability and sentence-based speechreading. Scandinavian Audiology, 21, 67-72.
LYXELL, B., RONNBERG, J., ANDERSSON, J., & LINDEROTH, E. (1993). Vibrotactile support: Initial effects on visual speech perception. Scandinavian Audiology, 22, 179-183.
MACLEOD, A., & SUMMERFIELD, Q. (1990). A procedure for measuring auditory and audiovisual speech-reception thresholds for sentences in noise: Rationale, evaluation, and recommendations for use. British Journal of Audiology, 24, 29-43.
MARSLEN-WILSON, W. (1990). Activation, competition, and frequency in lexical access. In G. T. M. Altmann (Ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives (pp. 148-172). Cambridge, MA: MIT Press.
MASSARO, D. W. (1987). Speech perception by ear and eye: A paradigm for psychological inquiry. Hillsdale, NJ: Erlbaum.
MASSARO, D. W. (1998). Perceiving talking faces: From speech perception to a behavioral principle. Cambridge, MA: MIT Press, Bradford Books.
MASSARO, D. W., COHEN, M. M., & GESI, A. T. (1993). Long-term training, transfer, and retention in learning to lipread. Perception & Psychophysics, 53, 549-562.
MCCLELLAND, J. L., & ELMAN, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1-86.
MCGURK, H., & MACDONALD, J. (1976). Hearing lips and seeing voices: A new illusion. Nature, 264, 746-748.
MIDDLEWEERD, M. J., & PLOMP, R. (1987). The effect of speechreading on the speech-reception threshold of sentences in noise. Journal of the Acoustical Society of America, 82, 2145-2147.
MILLER, G. A., HEISE, G. A., & LICHTEN, W. (1951). The intelligibility of speech as a function of the context of the test materials. Journal of Experimental Psychology, 41, 329-335.
MILLER, J. L. (1991). Comment: Bimodal speech perception and the motor theory. In I. G. Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the motor theory of speech perception (pp. 139-143). Hillsdale, NJ: Erlbaum.
MOGFORD, K. (1987). Lip-reading in the prelingually deaf. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 191-211). Hillsdale, NJ: Erlbaum.
MONTGOMERY, A. A., & JACKSON, P. L. (1983). Physical characteristics of the lips underlying vowel lipreading performance. Journal of the Acoustical Society of America, 73, 2134-2144.
MONTGOMERY, A. A., WALDEN, B. E., & PROSEK, R. A. (1987). Effects of consonantal context on vowel lipreading. Journal of Speech & Hearing Research, 30, 50-59.
MOORES, D. F., & SWEET, C. (1991). Factors predictive of school achievement. In D. F. Moores & K. P. Meadow-Orlans (Eds.), Educational and developmental aspects of deafness (pp. 154-201). Washington, DC: Gallaudet University Press.
NEVILLE, H. J. (1995). Developmental specificity in neurocognitive development in humans. In M. Gazzaniga (Ed.), The cognitive neurosciences (pp. 219-231). Cambridge, MA: MIT Press.
NEWMAN, R. S., SAWUSCH, J. R., & LUCE, P. A. (1997). Lexical neighborhood effects in phonetic processing. Journal of Experimental Psychology: Human Perception & Performance, 23, 873-889.
NOBER, E. H. (1967). Vibrotactile sensitivity of deaf children to high intensity sound. Laryngoscope, 78, 2128-2146.
OWENS, E., & BLAZEK, B. (1985). Visemes observed by hearing-impaired and normal-hearing adult viewers. Journal of Speech & Hearing Research, 28, 381-393.
PELSON, R. O., & PRATHER, W. F. (1974). Effects of visual message-related cues, age, and hearing impairment on speechreading performance. Journal of Speech & Hearing Research, 17, 518-525.
PERFETTI, C. A. (1994). Psycholinguistics and reading ability. In M. A. Gernsbacher (Ed.), Handbook of psycholinguistics (pp. 849-894). New York: Academic Press.
RABINOWITZ, W. M., EDDINGTON, D. K., DELHORNE, L. A., & CUNEO, P. A. (1992). Relations among different measures of speech reception in subjects using a cochlear implant. Journal of the Acoustical Society of America, 92, 1869-1881.
RAPHAEL, L. J. (1972). Preceding vowel duration as a cue to the perception of the voicing characteristic of word-final consonants in American English. Journal of the Acoustical Society of America, 51, 1296-1303.
RONNBERG, J. (1995). Perceptual compensation in the deaf and blind: Myth or reality? In R. A. Dixon & L. Backman (Eds.), Compensating for psychological deficits and declines (pp. 251-274). Mahwah, NJ: Erlbaum.
RONNBERG, J., OHNGREN, G., & NILSSON, L.-G. (1982). Hearing deficiency, speechreading and memory functions. Scandinavian Audiology, 11, 261-268.
RONNBERG, J., SAMUELSSON, S., & LYXELL, B. (1998). Conceptual constraints in sentence-based lipreading in the hearing-impaired. In R. Campbell, B. Dodd, & D. Burnham (Eds.), Hearing by eye: II. The psychology of speechreading and auditory-visual speech (pp. 143-153). East Sussex, U.K.: Psychology Press.
ROSEN, S., & CORCORAN, T. (1982). A video-recorded test of lipreading for British English. British Journal of Audiology, 16, 245-254.
ROSENBLUM, L. D., & FOWLER, C. A. (1991). Audiovisual investigation of the loudness-effort effect for speech and nonspeech events. Journal of Experimental Psychology: Human Perception & Performance, 17, 976-985.
SANKOFF, D., & KRUSKAL, J. B. (EDS.) (1983). Time warps, string edits, and macromolecules: The theory and practice of sequence comparison. Reading, MA: Addison-Wesley.
SCHEINBERG, J. C. S. (1988). An analysis of /p/, /b/, and /m/ in the speechreading signal. Unpublished doctoral dissertation, The City University of New York, New York.
SCHILDROTH, A. N., & KARCHMER, M. A. (1986). Deaf children in America. San Diego: College-Hill.
SCHWANTES, F. M. (1981). Locus of the context effect in children's word recognition. Child Development, 52, 895-903.
SEKIYAMA, K., & TOHKURA, Y. (1991). McGurk effect in non-English listeners: Visual effects for Japanese subjects hearing Japanese syllables of high auditory intelligibility. Journal of the Acoustical Society of America, 90, 1797-1805.
SPSS FOR WINDOWS 7.0 [Computer software]. (1996). Chicago: SPSS Inc.
STANOVICH, K. E. (1980). Toward an interactive-compensatory model of individual differences in the development of reading fluency. Reading Research Quarterly, 16, 32-71.
SUMBY, W. H., & POLLACK, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212-215.
SUMMERFIELD, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 3-52). London: Erlbaum.
SUMMERFIELD, Q. (1991). Visual perception of phonetic gestures. In I. G. Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the motor theory of speech perception (pp. 117-137). Hillsdale, NJ: Erlbaum.
SUMMERFIELD, Q., & MCGRATH, M. (1984). Detection and resolution of audio-visual incompatibility in the perception of vowels. Quarterly Journal of Experimental Psychology, 36A, 51-74.
TILLBERG, I., RONNBERG, J., SVARD, I., & AHLNER, B. (1996). Audiovisual speechreading in a group of hearing aid users: The effects of onset age, handicap age, and degree of hearing loss. Scandinavian Audiology, 25, 267-272.
TYLER, L. K., COBB, H., & GRAHAM, N. (EDS.) (1992). Spoken language comprehension: An experimental approach to disordered and normal processing. Cambridge, MA: MIT Press.
UTLEY, J. (1946). A test of lip reading ability. Journal of Speech & Hearing Disorders, 11, 109-116.
VATIKIOTIS-BATESON, E., MUNHALL, K. G., KASAHARA, Y., GARCIA, F., & YEHIA, H. (1996). Characterizing audiovisual information during speech. In H. T. Bunnell & W. Idsardi (Eds.), Proceedings ICSLP 96: Fourth International Conference on Spoken Language Processing (pp. 1485-1488). New Castle, DE: Citation Delaware.
WALDEN, B. E., ERDMAN, S. A., MONTGOMERY, A. A., SCHWARTZ, D. M., & PROSEK, R. A. (1981). Some effects of training on speech recognition by hearing-impaired adults. Journal of Speech & Hearing Research, 24, 207-216.
WALDEN, B. E., MONTGOMERY, A. A., PROSEK, R. A., & SCHWARTZ, D. M. (1980). Consonant similarity judgments by normal and hearing-impaired listeners. Journal of Speech & Hearing Research, 23, 162-184.
WALDEN, B. E., PROSEK, R. A., MONTGOMERY, A. A., SCHERR, C. K., & JONES, C. J. (1977). Effects of training on the visual recognition of consonants. Journal of Speech & Hearing Research, 20, 130-145.
WALDSTEIN, R. S., AUER, E. T., JR., TUCKER, P. E., & BERNSTEIN, L. E. (1996). Relationships between word knowledge and visual speech perception: II. Subjective ratings of word familiarity. Journal of the Acoustical Society of America, 100, 2569.
WANG, M. D. (1976). SINFA: Multivariate uncertainty analysis for confusion matrices. Behavior Research Methods & Instrumentation, 8, 471-472.
WANG, M. D., & BILGER, R. C. (1973). Consonant confusions in noise: A study of perceptual features. Journal of the Acoustical Society of America, 54, 1248-1266.
WOZNIAK, V. D., & JACKSON, P. L. (1979). Visual vowel and diphthong perception from two horizontal viewing angles. Journal of Speech & Hearing Research, 22, 355-365.
YEHIA, H., RUBIN, P., & VATIKIOTIS-BATESON, E. (1997). Quantitative association of orofacial and vocal-tract shapes. In C. Benoit & R. Campbell (Eds.), Proceedings of the ESCA/ESCOP Workshop on Audio-Visual Speech Processing (pp. 41-44). Rhodes, Greece, September 26-27.

NOTES

1. The term deaf is employed here to refer to individuals whose hearing impairments are at least 90-dB HL bilaterally.
2. However, vowel duration might provide a cue to postvocalic voicing distinctions (Raphael, 1972).
3. The experimenter (P.E.T.) is a certified sign language interpreter with many years of experience communicating via sign alone and via simultaneous communication.
4. It would have been possible to employ the CV identification data from the present study to derive the distance measures for the phoneme-to-phoneme sequence comparison, perhaps by employing distance measures tuned for each of the participant groups. However, given that the goal was to compare groups, independently estimated distance measures seemed less problematic.
5. An alternative approach would have been to calculate the upper tail ratios for each of the test measures. This measure, as outlined by Feingold (1995), gives insight into the differences between groups that may arise when variances differ, independent of means. In the present study, for example, if the IH and NH groups' scores are combined into a single distribution, it can be shown that 93% of the scores in the upper 16% of the total combined population (i.e., one standard deviation above the mean) were from the IH group. The graphical approach was taken in this article because of its straightforward intuitive appeal.
6. Fisher's Z was used to statistically test the difference between correlations. Only three were significantly different. On the other hand, all sample correlations are different in the same direction across Tables 2 and 3. By the binomial test, the probability of this happening by chance is extremely small, if all population correlations are equal for both groups.
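The statistics in Notes 5 and 6 can be made concrete with a short sketch. The functions below are our illustrations of the standard procedures named in the notes; the correlations, sample sizes, and scores used in the demonstration are invented, not values from the study.

import math
import numpy as np

def fisher_z(r1, n1, r2, n2):
    # Test of the difference between two independent correlations via
    # Fisher's r-to-z transformation (Note 6); compare the returned z
    # with 1.96 for two-tailed p < .05.
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se

def upper_tail_share(ih, nh):
    # Share of scores more than 1 SD above the combined mean that come
    # from the IH group (Note 5; cf. Feingold, 1995).
    ih, nh = np.asarray(ih, float), np.asarray(nh, float)
    combined = np.concatenate([ih, nh])
    cut = combined.mean() + combined.std(ddof=1)
    return (ih > cut).sum() / (combined > cut).sum()

# Invented example: r = .84 with n = 72 versus r = .66 with n = 96.
print(round(fisher_z(0.84, 72, 0.66, 96), 2))   # ~2.69, p < .05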

APPENDIX
Phonological Features From Chomsky and Halle (1968)

Phoneme  rnd  cor  ant  str  fric  cons  nas  voc  voi  cont  high  low  back  dur
p         0    0    1    0    0     1     0    0    0    0     0     0    0     0
t         0    1    1    0    0     1     0    0    0    0     0     0    0     0
k         0    0    0    0    0     1     0    0    0    0     1     0    1     0
b         0    0    1    0    0     1     0    0    1    0     0     0    0     0
d         0    1    1    0    0     1     0    0    1    0     0     0    0     0
g         0    0    0    0    0     1     0    0    1    0     1     0    1     0
f         0    0    1    1    1     1     0    0    0    1     0     0    0     0
θ         0    1    1    0    1     1     0    0    0    1     0     0    0     0
s         0    1    1    1    1     1     0    0    0    1     0     0    0     1
ʃ         0    1    0    1    1     1     0    0    0    1     1     0    0     1
v         0    0    1    1    1     1     0    0    1    1     0     0    0     0
ð         0    1    1    0    1     1     0    0    1    1     0     0    0     0
z         0    1    1    1    1     1     0    0    1    1     0     0    0     1
ʒ         0    1    0    1    1     1     0    0    1    1     1     0    0     1
tʃ        0    1    0    1    1     1     0    0    0    0     1     0    0     0
dʒ        0    1    0    1    1     1     0    0    1    0     1     0    0     0
m         0    0    1    0    0     1     1    0    1    0     0     0    0     0
n         0    1    1    0    0     1     1    0    1    0     0     0    0     0
r         0    1    0    0    0     1     0    1    1    1     0     0    0     0
l         0    1    1    0    0     1     0    1    1    1     0     0    0     0
w         1    0    0    0    0     0     0    0    1    1     1     0    1     0
h         0    0    0    0    1     0     0    0    0    1     0     1    0     0
a         0    0    0    0    0     0     0    1    1    1     0     1    1     1

Note: rnd, round; cor, coronal; ant, anterior; str, strident; fric, fricative; cons, consonantal; nas, nasal; voc, vocalic; voi, voicing; cont, continuant; dur, duration.

(Manuscript received September 11, 1997; revision accepted for publication October 26, 1998.)