J Neurophysiol 107: 78–89, 2012. First published October 5, 2011; doi:10.1152/jn.00297.2011.

Neural coding of continuous speech in auditory cortex during monaural and dichotic listening

Nai Ding1 and Jonathan Z. Simon1,2
1Department of Electrical and Computer Engineering and 2Department of Biology, University of Maryland, College Park, Maryland

Submitted 1 April 2011; accepted in final form 28 September 2011

Address for reprint requests and other correspondence: J. Z. Simon, Univ. of Maryland, College Park, MD 20742 (e-mail: [email protected]).

Ding N, Simon JZ. Neural coding of continuous speech in auditory cortex during monaural and dichotic listening. J Neurophysiol 107: 78–89, 2012. First published October 5, 2011; doi:10.1152/jn.00297.2011.—The cortical representation of the acoustic features of continuous speech is the foundation of speech perception. In this study, noninvasive magnetoencephalography (MEG) recordings are obtained from human subjects actively listening to spoken narratives, in both simple and cocktail party-like auditory scenes. By modeling how acoustic features of speech are encoded in ongoing MEG activity as a spectrotemporal response function, we demonstrate that the slow temporal modulations of speech in a broad spectral region are represented bilaterally in auditory cortex by a phase-locked temporal code. For speech presented monaurally to either ear, this phase-locked response is always more faithful in the right hemisphere, but with a shorter latency in the hemisphere contralateral to the stimulated ear. When different spoken narratives are presented to each ear simultaneously (dichotic listening), the resulting cortical neural activity precisely encodes the acoustic features of both of the spoken narratives, but slightly weakened and delayed compared with the monaural response. Critically, the early sensory response to the attended speech is considerably stronger than that to the unattended speech, demonstrating top-down attentional gain control. This attentional gain is substantial even during the subjects' very first exposure to the speech mixture and therefore largely independent of knowledge of the speech content. Together, these findings characterize how the spectrotemporal features of speech are encoded in human auditory cortex and establish a single-trial-based paradigm to study the neural basis underlying the cocktail party phenomenon.

speech segregation; attention; spectrotemporal response function; magnetoencephalography

SPOKEN LANGUAGE IS THE DOMINANT form of human communication, and human listeners are superb at tracking and understanding speech even in the presence of interfering speakers (Bronkhorst 2000; Cherry 1953). The critical acoustic features of speech are distributed across several distinct spectral and temporal scales. The slow temporal modulations and coarse spectral modulations reflect the rhythm of speech and contain syllabic and phrasal level segmentation information (Greenberg 1999) and are particularly important for speech intelligibility (Shannon et al. 1995). The neural tracking of slow temporal modulations of speech (e.g., 1–10 Hz) in human auditory cortex can be studied noninvasively using magnetoencephalography (MEG) and electroencephalography (EEG). The low-frequency, large-scale synchronized neural activity recorded by MEG/EEG has been demonstrated to be synchronized by the speech stimulus (Luo and Poeppel 2007) and is phase-locked to the speech envelope, i.e., the slow modulations summed over a broad spectral region (Abrams et al. 2008; Ahissar et al. 2001; Aiken and Picton 2008; Lalor and Foxe 2010; Luo and Poeppel 2007). Temporal locking to features of speech has also been supported by intracranial recordings from human core auditory cortex (Nourski et al. 2009). The temporal features of speech contribute significantly to speech intelligibility, as do key spectrotemporal features in speech such as upward and downward formant transitions. The neural coding of spectrotemporal modulations in natural soundtracks has been studied invasively in human auditory cortex using intracranial extracellular recordings (Bitterman et al. 2008), where the spectrotemporal tuning of individual neurons was found to be generally complex and sometimes very fine in frequency. At a neural network level, the blood oxygen level-dependent (BOLD) activity measured by functional magnetic resonance imaging (fMRI) also shows complex spectrotemporal tuning and possesses no obvious spatial map (Schönwiesner and Zatorre 2009). Which spectrotemporal features of speech are encoded in the large-scale synchronized neural activity measurable by MEG and EEG, however, remain unknown and are the focus of the current study.
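The speech envelope referred to above can be computed in several ways; the following is a minimal sketch of one common approach (broadband Hilbert envelope, downsampled and band-limited to the 1–10 Hz range mentioned in the text), not the exact preprocessing used in this study. The file name and sampling parameters are illustrative.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, butter, filtfilt, resample_poly

# Load a speech waveform (the file name is illustrative).
fs, x = wavfile.read("narrative.wav")
x = x.astype(float)
if x.ndim > 1:            # collapse stereo to mono
    x = x.mean(axis=1)

# Broadband envelope via the analytic signal (Hilbert transform).
envelope = np.abs(hilbert(x))

# Downsample the envelope to a rate typical for MEG analysis (200 Hz here).
fs_env = 200
envelope = resample_poly(envelope, up=fs_env, down=fs)

# Keep only the slow temporal modulations (~1-10 Hz) that the MEG/EEG
# activity discussed above phase-locks to.
b, a = butter(4, [1.0, 10.0], btype="bandpass", fs=fs_env)
slow_modulations = filtfilt(b, a, envelope)
```

The zero-phase filtering (filtfilt) avoids introducing a latency into the envelope itself, which matters when response latencies are later estimated from the neural data.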
When investigating the neural coding of speech, there are several key issues that deserve special consideration. One arises from the diversity of speech: language is a productive system permitting the generation of novel sentences. In everyday life, human listeners constantly decode spoken messages they have never heard. In most neurophysiological studies of speech processing, however, small sets of sentences are repeated tens or hundreds of times (although see Lalor and Foxe 2010). This is primarily due to methodological constraints: neurophysiological recordings, especially noninvasive recordings, are quite variable, and so integrating over trials is necessary to obtain a valid estimate of the neural response. An often neglected cost of repeated stimuli, however, is that the listener has obtained complete knowledge of the entire stimulus speech after only a few repetitions. Without the demands of speech comprehension, the encoding of this repeated speech might be quite different from the neural coding of novel speech under natural listening conditions. It is pressing, therefore, to develop experimental paradigms that do not require repeating stimuli many times, to study how speech is encoded in a more ecologically realistic manner.

Second, speech communication is remarkably robust against interference. When competing speech signals are present, human listeners can actively maintain attention on a particular speech target and comprehend it. The superior temporal gyrus has been identified as a region heavily involved in processing concurrent speech signals (Scott et al. 2009). Recent EEG results have shown that human auditory cortex can selectively amplify the low-frequency neural correlates of the speech signal being attended to (Kerlin et al. 2010). This attentional modulation of low-frequency neural activity has been suggested as a general mechanism for sensory information selection (Schroeder and Lakatos 2009). Because speech comprehension is a complex hierarchical process involving multiple brain regions, it is unclear whether the attentional effect seen in the auditory cortex directly modulates feedforward auditory processing or reflects only feedback from language areas, or even motor areas (Hickok and Poeppel 2007). One approach to test whether feedforward processing is involved in speech segregation is to investigate the latency of the attentional effect. If the attentional modulation of the MEG/EEG response has a relatively short latency, e.g., 100 ms, then it is more consistent with a modulation of early feedforward auditory processing than with feedback from higher-level language or motor areas.

A further question is how the cortical encoding of speech depends on the ear of entry, given the asymmetries between the left and right auditory pathways. Moreover, previous studies have only demonstrated that speech is encoded in MEG/EEG activity with sufficient fidelity to discriminate among two or three sentences (Kerlin et al. 2010; Luo and Poeppel 2007). With a long-duration, discourse-level stimulus, we can test the limit of this fidelity by quantifying the maximum number of speech stimuli that can be discriminated based on MEG responses.
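One simple way to quantify such discriminability is a correlation-based classifier: each single-trial response is assigned to the candidate stimulus representation (e.g., its slow envelope or a model-predicted response) with which it correlates most strongly. The sketch below illustrates that idea under those assumptions; it is not the exact decoding procedure of this study, and the array names and shapes are illustrative.

```python
import numpy as np

def classify_by_correlation(responses, templates):
    """Assign each single-trial response to the best-matching template.

    responses : (n_trials, n_times) single-trial MEG responses
    templates : (n_stimuli, n_times) candidate stimulus representations,
                e.g., slow speech envelopes or predicted responses
    Returns the index of the best-matching template for every trial.
    """
    picks = []
    for r in responses:
        # Pearson correlation between this trial and every template.
        corrs = [np.corrcoef(r, t)[0, 1] for t in templates]
        picks.append(int(np.argmax(corrs)))
    return np.array(picks)

# Illustrative usage with synthetic data: trial i was evoked by stimulus i,
# so a correct classification is picks[i] == i.
rng = np.random.default_rng(0)
templates = rng.standard_normal((10, 2000))
responses = templates + 2.0 * rng.standard_normal((10, 2000))  # noisy copies
picks = classify_by_correlation(responses, templates)
print("accuracy:", np.mean(picks == np.arange(10)))
```

Increasing the number of candidate templates until accuracy falls to chance gives one operational estimate of how many stimuli the neural response can distinguish.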
Inspired by research on single-unit neurophysiology (deCharms et al. 1998; Depireux et al. 2001), the analysis of MEG activity was performed using the spectrotemporal response function (STRF), which can reveal neural coding mechanisms by analyzing the relationship between ongoing neural activity and the corresponding continuous stimuli (Fig. 1). The properties of network-level cortical activity, which plays an important ...
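In the STRF framework named here, the ongoing response is modeled as a linear convolution of the stimulus spectrogram with a spectrotemporal filter, r(t) ≈ Σ_f Σ_τ STRF(f, τ) · S(f, t − τ). The sketch below estimates such a filter by ridge regression on time-lagged spectrogram features; this is a generic estimator chosen only for illustration (the paper describes its own estimation procedure), and all variable names are illustrative.

```python
import numpy as np

def estimate_strf(spectrogram, response, n_lags, lam=1.0):
    """Ridge-regression estimate of a linear STRF.

    spectrogram : (n_freqs, n_times) stimulus spectrogram S(f, t)
    response    : (n_times,) ongoing neural response r(t)
    n_lags      : number of time lags (in samples) spanned by the STRF
    lam         : ridge regularization strength
    Returns an (n_freqs, n_lags) filter such that
        r(t) ~ sum_f sum_tau STRF[f, tau] * S[f, t - tau]
    """
    n_freqs, n_times = spectrogram.shape
    # Lagged design matrix: X[t, f*n_lags + tau] = S[f, t - tau].
    X = np.zeros((n_times, n_freqs * n_lags))
    for tau in range(n_lags):
        X[tau:, tau::n_lags] = spectrogram[:, : n_times - tau].T
    # Ridge solution: (X'X + lam*I)^-1 X'r
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ response)
    return w.reshape(n_freqs, n_lags)

def predict_response(spectrogram, strf):
    """Predicted response: convolve the spectrogram with the STRF."""
    n_times = spectrogram.shape[1]
    pred = np.zeros(n_times)
    for f in range(strf.shape[0]):
        pred += np.convolve(spectrogram[f], strf[f])[:n_times]
    return pred
```

The quality of the fit can then be assessed by correlating the predicted and measured responses on held-out data, which is also the natural input to the correlation-based discrimination analysis sketched earlier.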