Human Perception and Performance
Total Page:16
File Type:pdf, Size:1020Kb
Journal of Experimental Psychology: Human Perception and Performance Top-Down Influences of Written Text on Perceived Clarity of Degraded Speech Ediz Sohoglu, Jonathan E. Peelle, Robert P. Carlyon, and Matthew H. Davis Online First Publication, June 10, 2013. doi: 10.1037/a0033206 CITATION Sohoglu, E., Peelle, J. E., Carlyon, R. P., & Davis, M. H. (2013, June 10). Top-Down Influences of Written Text on Perceived Clarity of Degraded Speech. Journal of Experimental Psychology: Human Perception and Performance. Advance online publication. doi: 10.1037/a0033206 Journal of Experimental Psychology: © 2013 the Author(s) Human Perception and Performance 0096-1523/13/$12.00 DOI: 10.1037/a0033206 2013, Vol. 39, No. 4, 000 Top-Down Influences of Written Text on Perceived Clarity of Degraded Speech Ediz Sohoglu Jonathan E. Peelle MRC Cognition and Brain Sciences Unit, Cambridge, UK MRC Cognition and Brain Sciences Unit, Cambridge, UK and Washington University in St. Louis Robert P. Carlyon and Matthew H. Davis MRC Cognition and Brain Sciences Unit, Cambridge, UK An unresolved question is how the reported clarity of degraded speech is enhanced when listeners have prior knowledge of speech content. One account of this phenomenon proposes top-down modulation of early acoustic processing by higher-level linguistic knowledge. Alternative, strictly bottom-up accounts argue that acoustic information and higher-level knowledge are combined at a late decision stage without modulating early acoustic processing. Here we tested top-down and bottom-up accounts using written text to manipulate listeners’ knowledge of speech content. The effect of written text on the reported clarity of noise-vocoded speech was most pronounced when text was presented before (rather than after) speech (Experiment 1). Fine-grained manipulation of the onset asynchrony between text and speech revealed that this effect declined when text was presented more than 120 ms after speech onset (Experiment 2). Finally, the influence of written text was found to arise from phonological (rather than lexical) correspondence between text and speech (Experiment 3). These results suggest that prior knowledge effects are time-limited by the duration of auditory echoic memory for degraded speech, consistent with top-down modulation of early acoustic processing by linguistic knowledge. Keywords: prior knowledge, predictive coding, top-down, vocoded speech, echoic memory An enduring puzzle is how we understand speech despite sen- movements, which typically precede arriving speech signals by ϳ150 sory information that is often ambiguous or degraded. Whether ms (Chandrasekaran, Trubanova, Stillittano, Caplier, & Ghazanfar, listening to a speaker with a foreign accent or in a noisy room, we 2009), improve speech intelligibility in noise (Ma, Zhou, Ross, Foxe, recognize spoken language with accuracy that outperforms exist- & Parra, 2009; Ross, Saint-Amour, Leavitt, Javitt, & Foxe, 2007; ing computer recognition systems. One explanation for this con- Sumby, 1954). Another strong source of prior knowledge is the siderable feat is that listeners are highly adept at exploiting prior linguistic context in which an utterance is spoken. Listeners are knowledge of the environment to aid speech perception. quicker to identify phonemes located at the ends of words rather than Prior knowledge from a variety of sources facilitates speech per- nonwords (Frauenfelder, Segui, & Dijkstra, 1990). For sentences ception in everyday listening. Previous studies have shown that lip presented in noise, word report is more accurate when the sentences are syntactically and semantically constrained (Boothroyd & Nittrouer, 1988; Kalikow, 1977; Miller & Isard, 1963). Although the influence of prior knowledge on speech perception Ediz Sohoglu, MRC Cognition and Brain Sciences Unit, Cambridge, is widely acknowledged, there is a longstanding debate about the UK; Jonathan E. Peelle, MRC Cognition and Brain Sciences Unit and underlying mechanism. Much of this controversy has centered on This document is copyrighted by the American Psychological Association or one of its alliedDepartment publishers. of Otolaryngology, Washington University in St. Louis; Rob- one particular effect of prior knowledge: the influence of lexical This article is intended solely for the personal use ofert the individual user and P. is not to Carlyon be disseminated broadly. and Matthew H. Davis, MRC Cognition and Brain Sciences context on phonological judgments in phonetic categorization and Unit. This research was supported by the Medical Research Council phoneme monitoring tasks (e.g., Frauenfelder et al., 1990; Ganong, (MC_US_A060_0038). 1980; Warren, 1970). One explanation for this phenomenon is that This article has been published under the terms of the Creative Com- it reflects top-down modulation of phonological processing by mons Attribution License (http://creativecommons.org/licenses/by/3.0/), higher-level lexical knowledge (McClelland & Elman, 1986; Mc- which permits unrestricted use, distribution, and reproduction in any me- Clelland, Mirman, & Holt, 2006). There are alternative accounts, dium, provided the original author and source are credited. Copyright for however, that do not invoke top-down processing. According to this article is retained by the author(s). Author(s) grant(s) the American these strictly bottom-up accounts, lexical information is combined Psychological Association the exclusive right to publish the article and with phonological information only at a late decision stage where identify itself as the original publisher. Correspondence concerning this article should be addressed to Ediz the phonological judgment is formed (Massaro, 1989; Norris, Sohoglu or Matthew H. Davis, MRC Cognition and Brain Sciences Unit, McQueen, & Cutler, 2000). 15 Chaucer Road, Cambridge CB2 7EF, UK. E-mail: e.sohoglu@gmail In the current study, we used a novel experimental paradigm to .com; [email protected] assess top-down and bottom-up accounts of prior knowledge ef- 1 2 SOHOGLU, PEELLE, CARLYON, AND DAVIS fects on speech perception. Listeners’ prior knowledge was ma- not only have implications for understanding the cognitive pro- nipulated by presenting written text before acoustically degraded cesses subserving speech perception in normal hearing individuals, spoken words. Previous studies have shown that this produces a but also in the hearing impaired. striking effect on the reported clarity of speech (Sohoglu, Peelle, When written text is presented before vocoded speech, listeners Carlyon, & Davis, 2012; Wild, Davis, & Johnsrude, 2012; see also report that the amount of acoustic degradation is reduced (Wild et Mitterer & McQueen, 2009) that some authors have interpreted as al., 2012; see also Goldinger, Kleider, & Shelley, 1999; Jacoby, arising from a decision process (Frost, Repp, & Katz, 1988). This Allan, Collins, & Larwill, 1988). This suggests that written text is because written text was found to modulate signal detection bias rather modifies listeners’ judgments about the low-level acoustic charac- than perceptual sensitivity. However, modeling work has shown teristics of speech. In analogy to the top-down and bottom-up that signal detection theory cannot distinguish between bottom-up explanations of lexical effects on phonological judgments, there and top-down accounts of speech perception (Norris, 1995; Norris are two mechanisms that could enable prior knowledge from et al., 2000). Hence, an effect of written text on signal detection written text to modulate listeners’ judgments about the perceived bias could arise at a late decision stage in a bottom-up fashion or clarity of vocoded speech. One possibility is that abstract (lexical at an early sensory level in a top-down manner. In the current or phonological) knowledge obtained from text has the effect of study, we used written text to manipulate both when higher-level modulating early acoustic processing, giving rise to enhanced knowledge becomes available to listeners and the degree of cor- perceptual clarity (top-down account, see Figure 1a). Alterna- respondence between prior knowledge and speech input. We will tively, information from written and spoken sources could be argue that these manipulations more accurately distinguish be- combined at a late decision stage, where the clarity judgment is tween top-down and bottom-up accounts. formed, without modulating early acoustic processing (bottom-up In the experiments described below, speech was degraded using account, see Figure 1b). We now describe the three experiments a noise-vocoding procedure, which removes its temporal and spec- we have conducted to test these competing accounts of written text tral fine structure while preserving low-frequency temporal infor- effects. mation (Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995). Vocoded speech has been a popular stimulus with which to study Experiment 1 speech perception because the amount of sensory detail (both spectral and temporal) can be carefully controlled to explore the Experiment 1 introduces the paradigm that we used to assess the low-level acoustic factors contributing to speech intelligibility impact of prior knowledge from written text on the perception of (e.g., Deeks & Carlyon, 2004; Loizou, Dorman, & Tu, 1999; vocoded speech. Listeners were presented with vocoded spoken Roberts, Summers, & Bailey, 2011; Rosen, Faulkner, & Wilkin- words that varied in the amount of sensory detail, and were asked son, 1999; Whitmal, Poissant, Freyman, & Helfer,