
Neural Encoding of Attended Continuous Speech under Different Types of Interference

Andrea Olguin, Tristan A. Bekinschtein, and Mirjana Bozic

Abstract ■ We examined how attention modulates the neural encoding of continuous speech under different types of interference. In an EEG experiment, participants attended to a narrative in English while ignoring a competing stream in the other ear. Four different types of interference were presented to the unattended ear: a different English narrative, a narrative in a language unknown to the listener (Spanish), a well-matched nonlinguistic acoustic interference (Musical Rain), and no interference. Neural encoding of attended and unattended signals was assessed by calculating cross-correlations between their respective envelopes and the EEG recordings. Findings revealed more robust neural encoding for the attended envelopes compared with the ignored ones. Critically, however, the type of the interfering stream significantly modulated this process: the fully intelligible distractor (English) caused the strongest encoding of both attended and unattended streams and the latest dissociation between them, whereas the nonintelligible distractors caused weaker encoding and early dissociation between attended and unattended streams. The results were consistent over the time course of the spoken narrative. These findings suggest that attended and unattended information can be differentiated at different depths of processing analysis, with the locus of selective attention determined by the nature of the competing stream. They provide strong support for flexible accounts of auditory selective attention. ■

University of Cambridge

© 2018 Massachusetts Institute of Technology Journal of Cognitive Neuroscience 30:11, pp. 1606–1619 doi:10.1162/jocn_a_01303
Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/jocn_a_01303 by guest on 29 September 2021

INTRODUCTION

Directing attention to a single speaker in a multitalker environment is an everyday occurrence that we manage with relative ease. This phenomenon is commonly termed the “cocktail party” effect (Cherry, 1953). A large body of research has sought to assess how the underlying attentional mechanisms operate and how much of the nonattended signal is perceived in such situations, producing mixed results. Here, we address these questions by investigating the neural encoding of continuous attended speech under different types of linguistic and nonlinguistic interference.

Selective Attention

Selective attention is the ability to sustain focus on task-relevant stimuli in the presence of distractors. It has long been recognized as an essential cognitive capacity (e.g., James, 1890) because our brains are continuously flooded with information but limited in what they can process. Nevertheless, listeners are often distracted by irrelevant stimuli, which raises questions about the locus and mechanisms of attentional allocation in the presence of competing streams of information. Historically, two major views guided research on auditory selective attention: the “early selection” and the “late selection” approaches. The early selection theory (Broadbent, 1958) argued that, because of our limited processing capacity, attended and unattended information is differentiated early in perceptual processing. More specifically, sensory features can guide attentional selection early on, thus determining what will subsequently be processed for meaning. The late selection approach (Duncan, 1980; Deutsch & Deutsch, 1963) proposed that selective attention cannot affect the perceptual analysis of the stimuli and that both attended and unattended inputs are processed equivalently by the perceptual system. In this view, selective attention acts only later in the process, after the input has undergone semantic encoding and analysis. Subsequent theories argued that unattended information might be attenuated rather than completely filtered out, allowing unattended information with low identification thresholds (as determined by its semantic features) to reach awareness (Treisman, 1969).

Johnston and Heinz (1978) suggested that selective attention is a multimode flexible system, in which attended and unattended information can be differentiated at different depths of processing analysis. They also argued that selective attention itself requires processing capacity (cf. Kahneman, 1973), with later selection requiring more processing capacity and effort. On this account, efficient selection can be achieved early based on sensory differences between attended and unattended streams; however, in the absence of effective sensory cues, semantic features will drive the differentiation later in the process, using more capacity. A more recent account of attention allocation in speech comprehension (Bronkhorst, 2015) also argues that attentional selection can be triggered at different processing depths. Attention triggered early on is based on basic signal properties (level, fundamental frequency) and enables fast selection, whereas attention at later processing stages is based on complex information, such as syntactic and semantic information, and is used for slow selection.

Experimental Evidence

A substantial body of research has sought to assess whether auditory attention selects information early on, based on the physical characteristics of the stimulus, or after the input has been processed up to a semantic level. The results are mixed, with some studies showing that both attended and unattended information can be processed up to the semantic level (Bentin, Kutas, & Hillyard, 1995; Wood & Cowan, 1995; Eich, 1984) and others finding no evidence for semantic processing of the unattended stream (Wood, Stadler, & Cowan, 1997; Newstead & Dennis, 1979). The inconsistency has been attributed to inadequate control of attentional shifts to the unattended ear (Dupoux, Kouider, & Mehler, 2003; Holender, 1986), prompting the claim that listeners cannot semantically process information that is genuinely unattended. Yet, further studies demonstrated that unattended information can be processed in the absence of attention shifts to the irrelevant channel (Rivenez, Guillaume, Bourgeon, & Darwin, 2008) and that it can, under certain conditions, be processed up to the semantic and syntactic levels (Aydelott, Jamaluddin, & Nixon Pearce, 2015; Pulvermüller, Shtyrov, Hasting, & Carlyon, 2008). This conclusion is also consistent with the argument that the auditory system, although able to selectively focus processing on the relevant stream, has surplus capacity to process auditory information from other streams, regardless of the perceptual load in the attended stream (Murphy, Fraenkel, & Dalton, 2013). However, it has also been argued that the nature of the measurement can determine whether processing of the unattended message is observed (Rivenez et al., 2008), with studies using explicit measures (e.g., word recall) more likely to find that the unattended message was not processed.

Thus, although the existing evidence seems to suggest that unattended auditory information can be processed at different depths of analysis, this view is still controversial and is associated with the effects of task-dependent variables. It is also unclear how the nature of the competing streams interacts with this process. The current study addresses these questions in a task-free natural listening paradigm by tracking the neural encoding of continuous attended and nonattended speech under different types of linguistic and nonlinguistic interference.

Neural Encoding of Attended and Unattended Streams

The temporal envelope of speech is strongly represented in the brain, with several studies showing a significant correlation between speech envelopes and cortical activity (Lalor & Foxe, 2010; Abrams, Nicol, Zecker, & Kraus, 2008; Aiken & Picton, 2008). These correlations appear to result from phase locking or synchronization between neural activity and the slow amplitude modulations of the speech envelope, which are mainly present in the theta frequency band (3–7 Hz) and correspond to the syllabic rate of speech (Doelling, Arnal, Ghitza, & Poeppel, 2014; Giraud & Poeppel, 2012; Drullman, Festen, & Plomp, 1994). Phase locking has also been observed for noise-vocoded speech (i.e., stimuli in which the slow amplitude fluctuations are preserved but spectral details are reduced) but is stronger for intelligible stimuli (Ding, Chatterjee, & Simon, 2013; Peelle, Gross, & Davis, 2013).

Selective attention has been shown to have a robust influence on these synchronizations. In “cocktail party” paradigms, neural activity preferentially tracks the temporal envelope of the attended talker and appears to be out of phase with the ignored speech stream (Rimmele, Zion Golumbic, Schröger, & Poeppel, 2015; Hambrook & Tata, 2014; Horton, Srinivasan, & D’Zmura, 2014; Horton, D’Zmura, & Srinivasan, 2013; Ding & Simon, 2012a; Zion Golumbic, Poeppel, & Schroeder, 2012; Kerlin, Shahin, & Miller, 2010). This phenomenon has been referred to as the “selective entrainment hypothesis” (Zion Golumbic et al., 2013; Giraud & Poeppel, 2012; Schroeder & Lakatos, 2010; Lakatos, Karmos, Mehta, Ulbert, & Schroeder, 2008), which suggests that attention causes low-frequency neural oscillations to entrain to the temporal envelope of the attended speech stream. For instance, Rimmele et al. (2015) used magnetoencephalography and a cocktail party paradigm to reveal stronger attentional encoding for natural speech compared with noise-vocoded speech. They suggested that attentional enhancement of speech tracking depends on the presence of fine structure in the stimulus. In another study, Hambrook and Tata (2014) presented two simultaneous audiobook clips while EEG was being recorded. Attentional selection increased the EEG signals that were synchronized with the attended stream, but not with the ignored one. Similarly, Horton et al. (2013) asked participants to attend to one of two competing speech streams. The stimuli consisted of random sentences concatenated together for 22 sec. By calculating the cross-correlations between the speech envelopes and the EEG channels, they found evidence of entrainment to the attended stream’s low-frequency modulations and much weaker phase locking to the unattended envelope. The results of Horton et al. (2013) further suggested that the system selects the attended speech stream by amplifying the neural activity synchronized to its envelope. This mechanism reflects an enhancement of the attended stream
and possibly also entrainment-based suppression of the unattended one.

Current Study

The current study uses the well-established phenomenon of speech envelope encoding as an index of attentional processing. To test whether the locus of selective attention interacts with the nature of the competing streams, we created a cocktail party paradigm in which participants were instructed to attend to one speaker while ignoring a competing stream. We then manipulated the type of the competing stream to create interference at perceptual or linguistic levels and recorded EEG signals in four different conditions. In the first condition, listeners attended to a narrative in English presented in either the left or the right ear while actively ignoring another, distracting English story presented in the unattended ear (English–English condition). In the second condition, listeners attended to a narrative in English while ignoring a narrative in Spanish, a language unknown to the volunteers (English–Spanish condition). In the third condition, the interfering stream was Musical Rain (MuR), a nonlinguistic baseline that is closely matched to the acoustic properties of speech but does not trigger a speech percept (English–MuR condition). Finally, the fourth condition was the “Single Talker” condition, in which participants attended to narratives presented in either the left or right ear, with no interference presented in the other ear.

We hypothesized that attention would increase speech tracking in all conditions relative to the nonattended signal in the other ear, as shown in previous studies (Horton et al., 2013; Ding & Simon, 2012a, 2012b). However, we also predicted that the nature of the competing stream would modulate this process, in line with accounts of a flexible locus of selective attention (Bronkhorst, 2015; Johnston & Heinz, 1978). On this view, the nonlinguistic acoustic noise (MuR) should be dissociated from the attended narrative early on, based on their low-level differences (speech vs. nonspeech), producing the least interference and requiring the least processing capacity. In contrast, the fully intelligible and meaningful distractor in the English–English condition should produce the greatest interference, requiring the use of higher-level semantic and syntactic features to dissociate the two competing streams. This would trigger late selection and engage more processing capacity (Johnston & Heinz, 1978). Attentional amplification of the entrained neural activity (cf. Horton et al., 2013) should, therefore, be observed more strongly in the late selection condition (English–English) than in the early selection condition (English–MuR), with the English–Spanish condition positioned in between. The final condition, Single Talker, provides a test case for cortical entrainment to speech that is not modulated by the processing demands of divided attention.

The final goal of our study was to track how attention affects neural encoding over time. Although a substantial body of literature has investigated the role of various neural systems in sustaining attention and its intensity over time (e.g., Malhotra, Coulthard, & Husain, 2009; Manly et al., 2003), there have been few attempts to integrate this work with the literature on neural encoding of attended speech. We addressed this issue by comparing the strength of neural encoding of the attended speech envelopes at the beginning, the middle, and the end of the narratives, across the four different interference conditions.

METHODS

Participants

Twenty-five healthy volunteers were recruited from the University of Cambridge. They were right-handed, monolingual native speakers of British English with no history of problems. Three participants were excluded from data analyses because of technical problems and excess noise; thus, 22 participants contributed to the present study (10 men, mean age = 21.5 years). All participants were provided with detailed information regarding the purpose of the study and gave written consent. The study was approved by the Cambridge Psychology research ethics committee.

Stimuli and Procedure

The experiment consisted of three conditions in which the attended speech was paired with interference (English–English, English–Spanish, and English–MuR) and one condition in which participants attended to an English narrative without any interference (Single Talker; Table 1). The stimuli were 10 simple children’s narratives, such as “The Happy Prince” (eight in English and two in Spanish), and two matched MuR sets that acted as a nonlinguistic acoustic baseline. The stories were obtained from YouTube channels and websites and transcribed into 120 sentences each. Two native British English female speakers recorded four stories each, and one native Spanish female speaker recorded the Spanish narratives. Gender was kept constant to reduce segregation strategies based on talker gender in the dichotic listening paradigm (Brungart & Simpson, 2007). Sentences ranged from 2.5 to 3.1 sec in length, and some sentences were slightly modified by a narrator (e.g., by adding adverbs or adjectives, as in “The swallow was very sad”) to meet the 3-sec-per-sentence criterion. All sentences were normalized to have equivalent root mean square sound amplitude. To produce MuR (following the procedure introduced by Uppenkamp, Johnsrude, Norris, Marslen-Wilson, & Patterson, 2006), we extracted temporal envelopes from the recorded English stimuli and filled them with jittered fragments of synthesized speech. As such, MuR segments
preserve the duration, the temporal envelope, and the energy levels of the original speech stimuli; despite these similarities, however, the absence of continuous formants means that MuR does not elicit a speech percept (Bozic, Tyler, Ives, Randall, & Marslen-Wilson, 2010; Uppenkamp et al., 2006). MuR was generated using custom-made scripts in MATLAB (The Mathworks Inc., 2010).

From each story, the first 60 sentences (first half) and the second 60 sentences (second half) were strung together (with a 300-msec silence gap between sentences) to create two blocks of approximately 3.2 min (192 sec) each. In each condition, participants attended to two stories (i.e., four blocks of 60 sentences each; 240 sentences in total) swapped between their left and right ears (Figure 1A). While they were actively attending to one channel, a competing stream was simultaneously presented in the other ear. Participants always attended to an English story, presented to either the left or the right ear, and ignored the other channel. None of the attended stimuli were repeated during the experiment (i.e., each sentence was attended to only once), but to keep the properties of the attended and the interfering speech equal, the same stories appeared in both capacities: once as attended and once as unattended (with presentation order, story segment, and attention demands counterbalanced across left and right ears; Figure 1A). Participants always heard the “Single Talker” condition first to familiarize themselves with the experimental setup and the demands of attending to the left or right. The order of the remaining three conditions and of the stories within each condition was randomized across participants. Overall, participants attended to 960 sentences across the four conditions (4 × 240). Because the Single Talker condition had no interference, the total number of unattended trials was 720 (3 × 240).

During the experiment, participants sat in a comfortable chair in a sound-attenuated room and were asked to fix their gaze on the printout of a cross placed 150 cm in front of them at eye level while the narratives were being presented (Figure 1B). The stimuli were delivered through insert earphones (3M; E-A-RTONE 3a) at a mean intensity of 65 dB SPL and were presented using MATLAB and functions from the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997).

Table 1. Experimental Conditions

    Condition          Attended   Unattended
1   English–English    English    A different, competing English narrative
2   English–Spanish    English    A language unknown to participants (Spanish)
3   English–MuR        English    Nonlinguistic acoustic set (MuR)
4   Single Talker      English    N/A (no interference); single narrative presented in English

Type of attended and unattended streams per condition.

Behavioral Measures

To ensure that participants were paying attention to the desired channel, they were informed that they would be
asked questions about the attended story. They completed 10 true/false questions after each block, resulting in a total of 160 responses per participant.

Figure 1. (A) Structure of an example condition (English–English). Participants attended to the “Happy Prince” story in the first two blocks. Part 1 of the story was presented in the left ear and Part 2 in the right. The story “Five Peas” was the distractor stream in the first two blocks. In Blocks 3 and 4, participants attended to the “Five Peas” story, and the distractor stream was the “Happy Prince” story that had previously been attended to. Presentation order (e.g., attend to the left/right) was randomized across blocks. Presentation side was counterbalanced (i.e., if Part 1 was presented to the left channel, then Part 2 was presented to the right channel). (B) Sequence of a block. Participants were instructed to attend to a channel before the start of the block. They were asked to fixate on a cross placed 150 cm in front of them. The stimuli were presented between 3 and 10 sec after the verbal instruction. After the stimuli finished, participants were asked to complete 10 true/false questions about the story they had just attended to.

Data Collection and Preprocessing

EEG was recorded using a 128-channel Ag/AgCl electrode net (Electrical Geodesics, Inc.), with EEG data recorded for 92 of the 128 channels. The 36 excluded channels are located in the outer layers of the net (the neck area); they pick up considerably more muscle noise and were therefore not of interest in this study. Voltages were recorded at a sampling rate of 500 Hz, and all net impedances were kept below 100 Ω. Data were filtered between 1 and 100 Hz and down-sampled to 250 Hz. All data were preprocessed and analyzed in MATLAB with the EEGLAB toolbox (Delorme & Makeig, 2004). Data were epoched at the sentence level (2 sec), with a 200-msec prestimulus window. This resulted in 960 attended and 720 unattended epochs per participant. Artifact rejection was carried out per epoch, with bad epochs removed and bad channels interpolated. The Infomax independent component analysis algorithm implemented in EEGLAB was then used to isolate independent components for artifact correction. The resulting components were visually inspected to detect artifacts such as eye blinks and other nonbrain activity; these were rejected according to their topography, time course, and spectral traits, yielding clean, artifact-free data. Finally, data were re-referenced to the average of all channels.

Speech Envelopes

The temporal envelope of the speech was calculated for all attended and unattended stories and the MuR sets. Speech envelopes were calculated using the Mel frequency cepstral coefficients. EEG data were down-sampled to 100 Hz to match the speech envelopes. The acoustic properties of the envelopes (their mean frequency components and the distribution of autocorrelation peaks) were closely matched across the three types of interference, ensuring the validity of comparisons between them using the cross-correlation approach described below.

Data Analysis

We characterized the relationship between the acoustic envelopes and EEG channels by calculating Pearson’s correlation r between these two measures as a function of lag. As used previously (e.g., Horton et al., 2013; Aiken & Picton, 2008), this approach reveals EEG activity that represents acoustic envelopes: If an EEG channel is synchronized with an envelope at a certain latency, it will show a nonzero cross-correlation at a lag equal to that latency. The cross-correlation function (Bendat & Piersol, 1986) assumes a linear relationship between neural activity and the speech envelope, and for discrete functions f and g, it is defined as

$$(f \star g)[n] = \sum_{m=-\infty}^{\infty} \frac{f[m]\, g[m+n]}{\sigma_f\, \sigma_g}$$

where $\sigma_f$ and $\sigma_g$ are the standard deviations of f and g. This correlation was calculated for each 10-msec lag in the range from −200 msec before to 600 msec after sentence onset, which covers the range of effects observed in the literature (e.g., Baltzell et al., 2016). All EEG channels were cross-correlated with the attended, unattended, and control
speech envelopes for every sentence. Following Horton et al. (2013), control envelopes were those of different (i.e., nonmatching) sentences, so the correlation values depicted in the control cross-correlations (Figure 2) are due to chance. Control cross-correlation functions were then collapsed across channels and time to form an estimated Gaussian null distribution, which was used to determine the 95% confidence interval. The null hypothesis states that there is no correlation between the EEG channels and the control envelope at a particular latency. Therefore, attended and unattended cross-correlation values that fell below the 2.5th percentile or exceeded the 97.5th percentile were considered significantly different from zero (p < .05, before correction for multiple comparisons).

Figure 2. (A) Control distribution. Control cross-correlations were collapsed over channels and time to form a null distribution. The significance thresholds were set at the 95% confidence interval (thresholds at the 97.5th and 2.5th percentiles). (B) Control cross-correlations. Cross-correlations at different lags between EEG channels and control envelopes (Mel frequency cepstral coefficient). The black lines represent the significance thresholds set at the 97.5th and 2.5th percentiles (uncorrected for multiple comparisons), as obtained from the null distribution shown in A.

We first computed average cross-correlation functions for all attended and all nonattended trials separately by collapsing the correlation coefficients across conditions and subjects at each time lag. We then computed the average attended and nonattended cross-correlation functions for each condition separately. The cross-correlation functions for all attended and all nonattended trials were not directly compared because of differences in the overall numbers of attended and unattended trials (960 vs. 720). Differences between attended and unattended cross-correlations in each condition were evaluated using pairwise t tests, with control for multiple comparisons achieved using nonparametric cluster-based permutations (Maris & Oostenveld, 2007) as implemented in the ft_timelockstatistics function in FieldTrip. To this end, pairs of experimental conditions were compared in 10-msec steps for each electrode in the −200 to 600 msec time window. All samples whose t value exceeded the threshold corresponding to p < .05 (two-tailed test) were selected and clustered on the basis of temporal and spatial adjacency. To correct for multiple comparisons, this calculation used the Monte Carlo randomization test, in which trials were randomly partitioned from a combined set of two conditions and placed into two subsets. This procedure was repeated 1000 times to create a histogram of t values and calculate the proportion of random partitions that yielded larger t values than the observed ones. The experimental conditions were considered significantly different if this proportion (the p value) was less than .05. We also report T values for the obtained clusters of significant differences (representing summed t values across all contributing electrodes) and Cohen’s d at the peak.

To assess whether there were any reliable differences between the attended cross-correlation functions across conditions, the attended functions for each electrode in the −200 to 600 msec time window were compared in 10-msec steps in a one-way repeated-measures ANOVA, using a nonparametric permutation approach (1000 permutations) as implemented in the statcond function in EEGLAB (Delorme, 2006). Control for multiple comparisons was achieved using the false discovery rate (FDR, p < .05; Benjamini & Yekutieli, 2001) as implemented in the fdr_bh function, allowing us to determine the time points where the attended time series were reliably different from each other. These ANOVAs were followed by post hoc pairwise t tests as described above. The same approach was used to assess differences between the unattended cross-correlation functions across conditions.

Attention over Time

To evaluate whether encoding of the attended and unattended speech envelopes changed as the narrative progressed, we assessed the neural encoding of sentences corresponding to the first, middle, and final third of the narrative (labeled “Beginning,” “Middle,” and “End”). We divided each block (60 sentences) into three equal parts of 20 sentences (Beginning: 1–20; Middle: 21–40; End: 41–60) and then collapsed across all “Beginning,” “Middle,” and “End” items per condition. This way, we ended up with 80 sentences per group in each condition (e.g., Condition 1 = 1a, 1b, 1c, where a = beginning, b = middle, c = end), which were compared for attended and unattended cross-correlation functions using the same approach as above.

RESULTS

Behavior

Participants completed the comprehension task with a mean accuracy of 94.3% (SD = 3.8%), indicating that the target speaker was attended to as instructed. A one-way repeated-measures ANOVA showed a reliable difference in the number of correct responses across conditions, F(3, 63) = 7.750, p < .001. Post hoc results indicated that participants performed reliably better in the English–English (96.5%, p < .001) and English–Spanish conditions (95.1%, p = .016) than in the English–MuR condition (92.2%). There were no reliable differences between the English–English and English–Spanish conditions, which also did not differ from performance in the Single Talker condition (93.3%).

Average Attended and Unattended Cross-correlations and Their Topographies

Continuous EEG data were recorded from participants listening to narrated stories in English in four different listening conditions (English, Spanish, or MuR as interference, or Single Talker). Average cross-correlations for attended and unattended speech envelopes (averaged across participants and conditions) are depicted in Figure 3. The attended cross-correlation functions (Figure 3A) show robust neural encoding of the attended speech envelope across conditions, with major clustering of peaks at approximate lags of 120 and 320 msec and a less prominent one at around 500 msec post-onset. It is also necessary to note that we observed some
correlations between the EEG signal and the attended envelopes before the sentence onset (0 msec). Similar results have been reported in the previous literature (Thwaites et al., 2015; Horton et al., 2014) and are likely to reflect the periodic nature of the speech signal. Matching the acoustic properties of envelopes across conditions, as described above, ensured that this does not affect the validity of our subsequent planned comparisons between them. The averaged cross-correlation function for unattended speech (Figure 3B) shows that few EEG channels cross the significance threshold, indicating that attention had a major impact on encoding of the speech envelopes. Moreover, the shape of the unattended cross-correlation function differs from the attended one, suggesting that it is not an attenuated or suppressed version of the same pattern.

Scalp topographies for average attended cross-correlations (Figure 3D) are plotted for latency ranges of 90–150 msec, 290–350 msec, and 490–550 msec, based on the observed concentration of peaks at those time points. As Figure 3D shows, the topography of the earlier effects is more central, whereas the topography of later effects has a more frontal distribution.

Figure 3. (A) Average cross-correlations for all attended sentences from −200 to +600 msec post-onset. (B) Average cross-correlations for all unattended sentences from −200 to +600 msec post-onset, clearly showing attenuated encoding compared with the attended sentences. (C) Cross-channel average of the absolute values of the attended cross-correlation function. Prominent peaks are seen at latencies of around 120 and 320 msec, and a less prominent one at around 500 msec post-onset. (D) Scalp topographies for cross-correlation values at all electrode positions averaged over latency ranges 90–150 msec, 290–350 msec, and 490–550 msec, corresponding to the peaks observed in C. Warm colors represent positive correlations, whereas cool colors represent negative correlations. The topography of the earlier effects is more central, whereas the topography of later effects has a more frontal distribution.

Attended and Unattended Cross-correlations across Interference Conditions

Auditory cortical responses showed entrainment to the attended speech envelope in each condition individually, consistent with previous studies (Ding & Simon, 2012a; Lalor & Foxe, 2010; Aiken & Picton, 2008). Attended cross-correlation functions were also significantly greater than unattended correlations in all conditions, with cluster-based permutation t tests showing both positive and negative differences in each of the three interference conditions (Table 2 and Figure 4). As discussed in the literature (e.g., Kong, Mullangi, & Ding, 2014), the polarity of the cross-correlation can reflect either the direction of the neural current or whether the neural source responds to a power increase or decrease in the envelope. Thus, a negative cross-correlation (or cross-correlation difference) may indicate sources that produce a negative voltage on the scalp following a power increase in the envelope, or sources that produce a positive voltage but track a power decrease in the envelope.

For the English–English condition, the difference between attended and unattended cross-correlations only

1612 Journal of Cognitive Neuroscience Volume 30, Number 11

Table 2. Pairwise t Tests between Attended and Unattended Cross-correlations in Each Condition

                          Positive Cluster                           Negative Cluster
                          Onset   Peak                               Onset   Peak
Attended vs. Unattended   (msec)  (msec)  p     T         Cohen’s d  (msec)  (msec)  p     T         Cohen’s d
English–English           130     190     .001  2444.1    1.3        150     510     .001  −2675.8   0.6
English–Spanish           0       N/A     ns    N/A       N/A        0       200     .009  −1395.5   1.4
English–MuR               0       120     .001  3573.4    2.2        0       100     .001  −6061.8   1.6

T = sum of all t values within the cluster; Cohen’s d = effect size at cluster peak.
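Table 2 summarizes each cluster by its onset, peak, summed t (T), and Cohen’s d at the peak. The sketch below shows one conventional way such numbers arise: a cluster-based permutation test on paired attended-minus-unattended time courses, in the spirit of Maris and Oostenveld (2007). It is a simplified assumption rather than the authors' code: it clusters over time points of a single channel only (the reported analysis also spans electrodes), permutes by flipping the sign of each subject's whole difference curve, uses a threshold of |t| > 2.09 (roughly the two-tailed .05 critical value for 19 df), and reports Cohen’s d as the paired-samples d_z. All function names are hypothetical.

```python
# Illustrative sketch only: single-channel, time-only clustering; names are hypothetical.
import math
import random


def paired_t(d):
    """One-sample t statistic of paired differences d against zero."""
    n = len(d)
    m = sum(d) / n
    var = sum((x - m) ** 2 for x in d) / (n - 1)
    return m / math.sqrt(var / n)


def find_clusters(tvals, thresh):
    """Runs of adjacent samples with t > thresh (or t < -thresh).

    Returns (start, stop, T) tuples; T is the sum of t within the run.
    """
    found, start, run_sign = [], None, 0
    for i, t in enumerate(list(tvals) + [0.0]):  # sentinel closes a final run
        sign = (t > thresh) - (t < -thresh)
        if start is not None and sign != run_sign:
            found.append((start, i, sum(tvals[start:i])))
            start = None
        if start is None and sign != 0:
            start, run_sign = i, sign
    return found


def cluster_permutation_test(att, unatt, thresh=2.09, n_perm=500, seed=0):
    """att/unatt: per-subject cross-correlation time courses (equal lengths).

    Returns (start, stop, T, p, d_z at cluster peak) per observed cluster.
    """
    n_sub, n_time = len(att), len(att[0])
    diffs = [[a - u for a, u in zip(att[s], unatt[s])] for s in range(n_sub)]

    def t_course(rows):
        return [paired_t([row[k] for row in rows]) for k in range(n_time)]

    t_obs = t_course(diffs)
    rng = random.Random(seed)
    null_max = []  # largest |T| found in each permutation
    for _ in range(n_perm):
        # Under H0, labels are exchangeable within a subject, so each
        # subject's whole difference curve may flip sign.
        flipped = []
        for row in diffs:
            sign = rng.choice((1, -1))
            flipped.append([sign * x for x in row])
        masses = [abs(T) for _, _, T in find_clusters(t_course(flipped), thresh)]
        null_max.append(max(masses, default=0.0))

    results = []
    for start, stop, T in find_clusters(t_obs, thresh):
        p = (1 + sum(m >= abs(T) for m in null_max)) / (n_perm + 1)
        peak = max(range(start, stop), key=lambda k: abs(t_obs[k]))
        d_z = abs(t_obs[peak]) / math.sqrt(n_sub)  # paired-samples effect size
        results.append((start, stop, T, p, d_z))
    return results


# Synthetic demo: 20 subjects, 60 samples, true effect at samples 20-34
data_rng = random.Random(1)
att = [[data_rng.gauss(0, 1) + (3.0 if 20 <= k < 35 else 0.0) for k in range(60)]
       for _ in range(20)]
unatt = [[data_rng.gauss(0, 1) for _ in range(60)] for _ in range(20)]
results = cluster_permutation_test(att, unatt)
for start, stop, T, p, d_z in results:
    print(f"samples {start}-{stop - 1}: T = {T:.1f}, p = {p:.3f}, d = {d_z:.2f}")
```

Because the null distribution is built from the maximum cluster mass per permutation, isolated supra-threshold noise samples form small clusters that do not survive, while the embedded effect does.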

Figure 4. (A) Attended cross-correlation functions in each condition from −200 to +600 msec post-onset. Topographies represent significant electrodes at three peak latencies (100, 300, and 500 msec). (B) Unattended cross-correlation functions in each condition and topographies of significant electrodes at three peak latencies (100, 300, and 500 msec). (C) Results for cluster-based permutation t tests between attended and unattended cross-correlation functions in each condition, representing the topographies of the differences and the timing and maxima of the significant clusters. Horizontal blue line represents the timing of significant differences between conditions. (D) Plots of cross-correlation values per participant in the peak electrode for the attended versus unattended comparison in each condition. Means are shown in black horizontal lines.

emerged from 130 msec onwards, with the cluster of negative differences (i.e., the attended stream triggering stronger negative cross-correlations than the unattended stream) peaking as late as 510 msec over right frontal regions. In contrast, for both the English–Spanish and English–MuR conditions, the encoding of attended and unattended envelopes differed significantly from the onset, with peaks at 160 and 200 msec over posterior central and right frontal regions in the English–Spanish condition and at 100 and 120 msec in the English–MuR condition, for positive and negative effects, respectively. This suggests that the type of interference affected how early listeners could differentiate the attended from the unattended stream.

To directly test these apparent latency distinctions between conditions, we extracted cross-correlation difference values for each condition across the 90–150 msec, 290–350 msec, and 490–550 msec time windows (corresponding to the timings observed in Figure 3C) over the posterior central and right frontal areas that consistently emerged as relevant for attentional encoding (Figures 3D and 4C) and submitted them to a repeated-measures ANOVA. Results showed that the three conditions triggered significantly different effects over time across posterior central electrodes (Time × Condition interaction, F(4, 138) = 18.29, p < .001, μ = .35), with the English–MuR condition triggering the strongest differentiation between the attended and unattended streams early on but showing effects comparable to the other two conditions by the latest time window (Figure 5A). In the right frontal areas, conditions showed significantly different effects only in the 490–550 msec time window, F(2, 48) = 5.52, p = .007, μ = .20, with the English–English condition triggering the strongest differentiation between attended and unattended streams; however, this effect was not modulated by time (Time × Condition interaction, F < 1; Figure 5B). These results confirm that the type of distractor significantly modulates attentional encoding: nonlinguistic interference (MuR) is dissociated from the attended signal earlier than linguistic interference, and a fully intelligible distractor (English) triggers strong dissociation between the two streams only at later time points.

Comparisons of Attended Cross-correlations across Conditions

To directly assess differences between the encoding of the attended stream under different types of interference, we submitted all attended cross-correlations (including the no-interference Single Talker condition) to a one-way repeated-measures ANOVA. Consistent with the analyses reported above, the results (FDR corrected for multiple comparisons) showed robust differences across conditions, emerging both early (0–300 msec) and at later time points (around 500 msec post-onset). To reveal specific patterns of differences between conditions, this ANOVA was followed up with post hoc t tests. These showed that conditions with linguistic interference (English–English and English–Spanish) triggered significantly greater encoding of the attended stream than the English–MuR condition (Table 3). These differences were significant from the very onset, with positive clusters over central regions peaking at 210 msec for the English–English versus English–MuR comparison and at 260 msec for the English–Spanish versus English–MuR comparison. Comparisons between English–English and English–Spanish showed that they were encoded equivalently up to 320 msec; from 320 to 600 msec, the encoding of attended speech in the

Figure 5. (A) Pattern of attended versus unattended cross-correlation differences over time across posterior central electrodes (top left insert), showing strong early dissociation triggered by nonlinguistic interference (MuR). (B) Pattern of attended versus unattended cross-correlation differences over time across right frontal electrodes (top right insert), showing stronger late dissociation triggered by intelligible linguistic interference (English).
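The latency analysis averages each condition’s attended-minus-unattended difference within the 90–150, 290–350, and 490–550 msec windows and compares conditions with repeated-measures ANOVAs. As a simplified sketch, assuming per-subject difference curves sampled on a known lag grid, the code below computes window means and a one-way repeated-measures ANOVA across conditions for a single window; the Time × Condition interaction reported in the text would need a two-way partition of the sums of squares, which is not shown. The returned effect size is partial eta squared; that the effect-size symbol printed alongside F denotes this quantity is an assumption. Function names are hypothetical.

```python
# Illustrative sketch only: one-way repeated-measures ANOVA; names are hypothetical.
WINDOWS_MS = [(90, 150), (290, 350), (490, 550)]


def window_mean(curve, lags_ms, lo, hi):
    """Mean of a cross-correlation (difference) curve over lags lo..hi msec."""
    vals = [v for v, lag in zip(curve, lags_ms) if lo <= lag <= hi]
    return sum(vals) / len(vals)


def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data[s][c] holds one value per subject s and condition c.
    Returns (F, df_condition, df_error, partial_eta_squared).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj  # condition x subject residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    F = (ss_cond / df_cond) / (ss_error / df_error)
    return F, df_cond, df_error, ss_cond / (ss_cond + ss_error)


# Toy example: 3 subjects x 3 conditions (values are illustrative only)
F, df1, df2, eta_p2 = rm_anova_oneway([[1, 2, 4], [2, 4, 5], [3, 3, 6]])
print(f"F({df1}, {df2}) = {F:.2f}, partial eta^2 = {eta_p2:.2f}")

lags = list(range(-200, 610, 10))
curve = [lag / 1000 for lag in lags]  # toy curve rising linearly with lag
print([round(window_mean(curve, lags, lo, hi), 3) for lo, hi in WINDOWS_MS])
```

With three conditions the error degrees of freedom are 2(n − 1), so the F(2, 48) reported for the right frontal window corresponds to 25 participants.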

Table 3. Pairwise t Tests between Attended Cross-correlation Functions across Conditions

                                      Positive Cluster                           Negative Cluster
                                      Onset   Peak                               Onset   Peak
                                      (msec)  (msec)  p     T         Cohen’s d  (msec)  (msec)  p     T         Cohen’s d
Attended vs. Attended in the Three Interference Conditions
English–English vs. English–Spanish   N/A     N/A     ns    N/A       N/A        320     400     .011  −1118.03  0.9
English–English vs. English–MuR       0       210     .002  2551.74   1.0        N/A     N/A     ns    N/A       N/A
English–Spanish vs. English–MuR       0       260     .001  2266.61   0.6        N/A     N/A     ns    N/A       N/A
Single Talker Comparisons
Single Talker vs. English–English     0       260     .004  1449.25   1.1        0       240     .001  −2365.86  1.1
Single Talker vs. English–Spanish     0       350     .001  2643.88   0.8        0       280     .001  −3478.33  1.4
Single Talker vs. English–MuR         100     200     .024  1114.18   0.9        60      160     .012  −1493.71  1.1

T = sum of all t values within the cluster; Cohen’s d = effect size at cluster peak.

English–English condition was significantly greater than in the English–Spanish condition. These results extend the findings reported above, revealing that increasing intelligibility of the interfering stream (English > Spanish > MuR) triggers stronger encoding of the attended speech. Combined with the timing of these effects reported earlier, they suggest that nonintelligible competitors cause earlier interference but weaker encoding of the attended stream, whereas a fully intelligible competitor causes late interference and the strongest encoding of the attended stream.

Finally, the Single Talker (no interference) condition showed stronger envelope encoding of the attended speech than any of the interference conditions (Table 3). Compared with the linguistic interference conditions (English–English and English–Spanish), these differences peaked between 250 and 350 msec; in the comparison with the English–MuR condition, they peaked slightly earlier (160–200 msec). These results further emphasize the differential effects of linguistic versus nonlinguistic distractors. More informatively, however, they also lend support to the hypothesis that selective attention itself requires processing capacity (Johnston & Heinz, 1978; Kahneman, 1973), such that the presence of any interference reduces the capacity available for encoding the attended stream, compared with the no-interference condition.

Comparisons between Unattended Cross-correlations across Conditions

We next compared cross-correlation functions between the EEG data and the unattended envelopes in the English–English, English–Spanish, and English–MuR conditions using the same procedure as above. The repeated-measures ANOVA revealed no significant differences across the three conditions. Similarly, post hoc pairwise comparisons showed no significant differences between the unattended English–English and English–Spanish conditions, indicating comparable encoding of both types of linguistic interference. However, both unattended linguistic distractors were encoded significantly more strongly than the unattended envelope in the English–MuR condition (Table 4), suggesting that linguistic interference was analyzed to a larger extent than nonlinguistic interference.
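The attended-stream analyses were FDR corrected for multiple comparisons, and the unattended streams were analyzed with the same procedure. The text does not state which FDR procedure was used; because Benjamini and Yekutieli (2001) appears in the reference list, the sketch below assumes their step-up procedure, which remains valid under arbitrary dependence between tests (such as neighboring time points). The function name `fdr_by` is hypothetical.

```python
# Illustrative sketch only: assumes the Benjamini–Yekutieli variant of FDR.
def fdr_by(pvals, q=0.05):
    """Benjamini–Yekutieli step-up FDR procedure.

    Returns a list of booleans (True = reject H0), in the input order.
    """
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # correction for dependence
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0  # largest rank whose p value falls under its threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / (m * c_m):
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject


print(fdr_by([0.001, 0.01, 0.03, 0.2]))
```

With these four p values the BY thresholds are about .006, .012, .018, and .024, so only the first two hypotheses are rejected; the Benjamini–Hochberg procedure, which omits the harmonic factor c_m, would also reject p = .03.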

Table 4. Pairwise t Tests between Unattended Cross-correlation Functions across Conditions

                                      Positive Cluster                           Negative Cluster
                                      Onset   Peak                               Onset   Peak
Unattended vs. Unattended             (msec)  (msec)  p     T         Cohen’s d  (msec)  (msec)  p     T         Cohen’s d
English–English vs. English–Spanish   N/A     N/A     ns    N/A       N/A        N/A     N/A     ns    N/A       N/A
English–English vs. English–MuR       40      130     .006  1459.71   0.7        30      390     .001  −2965.37  0.6
English–Spanish vs. English–MuR       40      390     .003  2417.06   1.0        30      330     .002  −2145.87  0.6

T = sum of all t values within the cluster; Cohen’s d = effect size at cluster peak.
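The Attention over Time analysis divides each 60-sentence block into Beginning (Sentences 1–20), Middle (Sentences 21–40), and End (Sentences 41–60) and sums the items in each part per condition. A minimal sketch of that bookkeeping follows, assuming each item is one sentence’s cross-correlation vector (the text does not specify the exact quantity summed); `segment_sums` is a hypothetical name.

```python
# Illustrative sketch only: assumes one cross-correlation vector per sentence.
def segment_sums(per_sentence):
    """Sum per-sentence vectors element-wise within the Beginning, Middle,
    and End thirds of a block (block length must be divisible by 3)."""
    n = len(per_sentence)
    if n % 3:
        raise ValueError("number of sentences must be divisible by 3")
    size = n // 3
    sums = {}
    for j, label in enumerate(("Beginning", "Middle", "End")):
        chunk = per_sentence[j * size:(j + 1) * size]
        sums[label] = [sum(column) for column in zip(*chunk)]
    return sums


# Toy block: sentence i (1..60) contributes the vector [i, 2 * i]
block = [[i, 2 * i] for i in range(1, 61)]
print(segment_sums(block))
```

Computed per subject and condition, these three sums are what a subsequent across-subject comparison of the beginning, middle, and end of each narrative would operate on.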

Attention over Time

The continuous nature of our stimuli also allowed us to test whether the effects of attention on neural encoding remain constant over time. To this end, in each condition we divided each block (60 sentences) into three equal parts (Beginning: Sentences 1–20; Middle: Sentences 21–40; End: Sentences 41–60) and then summed across all “Beginning,” “Middle,” and “End” items per condition. We then assessed the differences between the “beginning,” “middle,” and “end” of each narrative across subjects. There were no significant differences in the strength of neural encoding over time in any condition (all ps > .05) for either attended or unattended streams, indicating that the encoding of speech was constant throughout the entire narrative for both attended and unattended cross-correlation functions.

DISCUSSION

This study aimed to understand how attention modulates the neural encoding of speech in the presence of different types of interference. We created a cocktail party paradigm in which participants attended to one speaker while ignoring a competing stream in the other ear. In all conditions, participants attended to a narrative in English. The competing streams varied from fully intelligible English narratives to linguistic interference in a language unknown to the listeners (Spanish) and nonlinguistic noise (MuR). The results showed that attention affected the neural tracking of speech, with attended streams consistently more encoded than the unattended ones. Critically, however, the characteristics of the interfering stream significantly modulated this process, with increasing intelligibility of the distractor causing stronger encoding of both attended and unattended streams and later dissociation between them.

Theoretical accounts of selective attention have put forward a range of views on the locus and the mechanisms of dissociation between interfering auditory streams (Bronkhorst, 2015; Duncan, 1980; Johnston & Heinz, 1978; Broadbent, 1958). Experimental evidence has been mixed, with some authors emphasizing the influence of task-dependent variables on the results obtained (Rivenez et al., 2008). Our study used a natural listening paradigm and the well-established phenomenon of neural encoding of the speech envelope as an index of processing of both attended and unattended streams. The speech envelope is known to offer key acoustic information concerning the syllabic rate of speech and is critical for (Greenberg, Carvey, Hitchcock, & Chang, 2003; Rosen, 1992).

Consistent with the existing literature (Rimmele et al., 2015; Horton et al., 2013; Zion Golumbic et al., 2013; Ding & Simon, 2012a, 2012b; Horton, D’Zmura, & Srinivasan, 2011), our results demonstrated that attention strongly modulated the neural encoding of the spoken signal: Robust neural tracking of attended speech was observed across all conditions, with significantly weaker encoding for all types of unattended streams. Furthermore, the most prominent cross-correlation peaks in the attended signal appeared at around 100 msec post-onset, followed by peaks around 300 and 500 msec post-onset. The first two peaks have comparable topographies and clearly emerge from one large cluster of significant correlations; their latencies and topographies (which are noticeably more central and bilateral than those of the 500 msec effects) suggest that they might reflect the N1/P2 and N2 components identified in the auditory evoked potential (AEP) literature (Folstein & Van Petten, 2008; Picton & Hillyard, 1974), which have been linked to aspects of sensory encoding of the stimulus. However, it is important to note that the latencies derived from the cross-correlation functions are not necessarily equivalent to those reported in the AEP literature (AEPs represent voltage potentials, whereas our data reflect correlation values between EEG channels and the speech envelope at different latencies); hence, any interpretation in this context is necessarily tentative. Both the early and late cross-correlation effects for attended speech observed here have been reported previously and shown to be enhanced by attention (e.g., Kong et al., 2014; Power, Foxe, Forde, Reilly, & Lalor, 2012). Also comparable to the previous literature are our findings of significantly reduced or absent clusters of cross-correlation peaks for unattended streams (Ding & Simon, 2012a; Power et al., 2012). Yet, our main question concerned the possible influence of the type of interfering signal on the processes of selective attention, which is what we turn to next.

Intelligibility of Interfering Speech Modulates the Encoding of the Attended Stream

To test how the mechanisms for dissociating between the competing streams interact with the nature of the interfering signal, we manipulated the type of interfering stream, ranging from fully intelligible English narratives to linguistic interference in a language unknown to the listeners (Spanish) and nonlinguistic noise (MuR). We predicted that the fully intelligible and meaningful distractor in the English–English condition should produce the greatest interference, requiring the use of higher-level lexicosemantic features to dissociate between the two competing streams. The results confirmed that the type of interference significantly modulates how attended speech is encoded in the brain. Although all three interference conditions showed robust differences between the encoding of attended and unattended streams, the onset of these differences was markedly dissimilar across conditions. In the two conditions where the interfering stream was not meaningful to our listeners, either because it was nonlinguistic (MuR) or because it was in a language they did not understand (Spanish), the difference in the encoding of the attended and unattended streams

emerged right from the onset. In contrast, intelligible interfering speech (English) was encoded comparably to the attended stream for up to 130 msec after the onset of the competing sentences, with differences between them emerging only after that latency and peaking at 510 msec in the right frontal areas. Direct comparisons of the differences in encoding across conditions (Figure 5) further supported this conclusion, showing that nonlinguistic interference (MuR) triggered strong early dissociation between the attended and unattended streams across posterior central areas, whereas intelligible linguistic interference (English) triggered stronger late dissociation across more frontal areas. This clearly indicates that nonlinguistic interference, which can be distinguished from the attended stream on the basis of lower-level features (speech vs. nonspeech), can be easily dissociated from the onset, resulting in an immediate enhancement of the attended signal. By contrast, attended and intelligible interfering streams are encoded comparably early on, such that the encoding of the attended stream is enhanced only after both streams have been processed beyond their sensory properties. These results can be clearly interpreted within the framework of flexible accounts of selective attention (Bronkhorst, 2015; Johnston & Heinz, 1978), in which selection between the two streams can be achieved earlier when the distractor is nonintelligible and does not require lexical information to dissociate it from the attended speech. The absence of such lower-level cues in the English–English condition requires the use of more complex lexicosemantic information, causing delayed selection and later enhancement in the encoding of the attended stream.

Another hypothesis of the flexible selective attention accounts is that the use of higher-level semantic and syntactic information to dissociate between the two streams requires more processing capacity. If correct, this predicts that the strength of encoding of the attended stream should vary as a function of the intelligibility of the distractor, with fully intelligible distractors triggering the strongest encoding. In line with proposals about attentional enhancement (e.g., Horton et al., 2013), this would imply that, to maintain full speech comprehension when processing capacity is divided between two streams, the neurocognitive system might amplify the neural activity synchronized to the attended envelope most strongly in the presence of fully intelligible distractors (e.g., English–English), compared with “easier” conditions (MuR/Spanish). We explored this hypothesis by directly comparing the strength of encoding of the attended streams across the three interference conditions. Results showed that, as the intelligibility of the distractor increased, the strength of encoding of the attended stream also increased, such that attended cross-correlation functions were strongest in the English–English condition and weakest in the English–MuR condition. Notably, however, the strength of encoding in the English–English condition differed from the English–Spanish condition only from 320 msec onwards, when lexicosemantic information would have become available to dissociate between competing linguistic streams (cf. Marslen-Wilson, 1973, 1978). In contrast, both linguistic interference conditions triggered greater attentional encoding than the English–MuR condition from the very onset, arguably reflecting earlier access to lower-level phonological information (i.e., the spectral and temporal properties of formant frequencies needed to dissociate speech from nonspeech; Uppenkamp et al., 2006) and easier dissociation of MuR from the attended linguistic stream. This distribution of effects over time fits well with results from the literature on spoken word recognition (e.g., Davis & Johnsrude, 2003; Moss, McCormick, & Tyler, 1997; Frauenfelder & Tyler, 1987; Marslen-Wilson, 1978), which show hierarchical processing of spoken words from the initial acoustic analyses to the derivation of their lexical and semantic properties at later processing stages. In the context of attentional effects, these results follow directly from the hypothesis that attention consumes processing capacity (Johnston & Heinz, 1978) such that, to avoid overloading it, stimuli are processed only to the minimum level required to carry out a task.

Complementary results also emerged from the Single Talker condition, in which participants attended to speech without any interference. As discussed, this condition provides a test case for cortical entrainment to speech that is not modulated by the processing demands of divided attention, where all attentional resources can be fully allocated to the instructed task (cf. Kahneman, 1973). Data revealed that attended speech in the Single Talker condition was significantly more strongly encoded than in any of the interference conditions, with differences peaking between 250 and 350 msec relative to the linguistic interference conditions (English–English and English–Spanish) and between 160 and 200 msec relative to nonlinguistic interference. The topographies of these comparisons also show a clear distinction between linguistic and nonlinguistic interference, further supporting the hypothesis that attentional mechanisms flexibly adapt to the differing demands of linguistic and nonlinguistic distractors.

We next turn to the unattended cross-correlations across conditions and the comparisons between them. If auditory selective attention is a flexible mechanism, in which attended and unattended information can be differentiated at different processing depths depending on the type of distractor, then we would expect some differences in encoding across the three interference conditions. Specifically, the distractor that can be dissociated from the attended speech stream earlier and more easily because of its lower-level differences (MuR) would be expected to be encoded less strongly than the linguistic distractors (English and Spanish). This is exactly the pattern we observed, with both unattended linguistic

distractors encoded significantly more strongly than the unattended envelope in the English–MuR condition. The differences are, however, subtle and emerge only in post hoc pairwise comparisons between conditions, possibly reflecting entrainment-based suppression of all unattended streams (Horton et al., 2013).

Encoding of Attended Speech Remains Constant over Time

To our knowledge, this is the first study to use continuous natural speech to test whether the effects of attention on neural encoding remain constant over time. To do this, we compared the encoding in the beginning, the middle, and the end of the narrative (each narrative being 3 min long) across all conditions. No significant differences emerged in any condition for either attended or unattended cross-correlation functions, indicating that the neural encoding of speech remained constant over time.

Conclusion

Our results demonstrate that top–down attention significantly modulates the neural encoding of attended speech in the presence of interference. The characteristics of the interfering stream significantly modulate this process, with increasing intelligibility of the distractor causing stronger encoding of both attended and unattended streams and later dissociation between them. These effects remain constant over the course of a narrative. The results offer strong support for flexible accounts of selective attention.

Reprint requests should be sent to Andrea Olguin, Department of Psychology, University of Cambridge, CB2 3EB, Cambridge, United Kingdom, or via e-mail: [email protected].

REFERENCES

Abrams, D. A., Nicol, T., Zecker, S., & Kraus, N. (2008). Right-hemisphere auditory cortex is dominant for coding syllable patterns in speech. Journal of Neuroscience, 28, 3958–3965.
Aiken, S. J., & Picton, T. W. (2008). Human cortical responses to the speech envelope. Ear and Hearing, 29, 139–157.
Aydelott, J., Jamaluddin, Z., & Nixon Pearce, S. (2015). Semantic processing of unattended speech in dichotic listening. Journal of the Acoustical Society of America, 138, 964–975.
Baltzell, L. S., Horton, C., Shen, Y., Richards, V. M., D’Zmura, M., & Srinivasan, R. (2016). Attention selectively modulates cortical entrainment in different regions of the speech spectrum. Brain Research, 1644, 203–212.
Bendat, J. S., & Piersol, A. G. (1986). Random data: Analysis and measurement procedures. New York: Wiley.
Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29, 1165–1188.
Bentin, S., Kutas, M., & Hillyard, S. A. (1995). Semantic processing and memory for attended and unattended words in dichotic listening: Behavioral and electrophysiological evidence. Journal of Experimental Psychology: Human Perception and Performance, 21, 54–67.
Bozic, M., Tyler, L. K., Ives, D. T., Randall, B., & Marslen-Wilson, W. D. (2010). Bihemispheric foundations for human speech comprehension. Proceedings of the National Academy of Sciences, U.S.A., 107, 17439–17444.
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.
Broadbent, D. (1958). Perception and communication. London: Pergamon Press.
Bronkhorst, A. W. (2015). The cocktail-party problem revisited: Early processing and selection of multi-talker speech. Attention, Perception, & Psychophysics, 77, 1465–1487.
Brungart, D. S., & Simpson, B. D. (2007). Effect of target-masker similarity on across-ear interference in a dichotic cocktail party listening task. Journal of the Acoustical Society of America, 122, 1724–1734.
Cherry, E. C. (1953). Some experiments on the recognition of speech with one and two ears. Journal of the Acoustical Society of America, 25, 975–979.
Davis, M. H., & Johnsrude, I. S. (2003). Hierarchical processing in spoken language comprehension. Journal of Neuroscience, 23, 3423–3431.
Delorme, A. (2006). Statistical methods. In J. Webster (Ed.), Encyclopedia of medical devices and instrumentation (pp. 240–264). Hoboken: Wiley Interscience.
Delorme, A., & Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134, 9–21.
Deutsch, J., & Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80–90.
Ding, N., Chatterjee, M., & Simon, J. Z. (2013). Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure. Neuroimage, 88C, 41–46.
Ding, N., & Simon, J. Z. (2012a). Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences, U.S.A., 109, 11854–11859.
Ding, N., & Simon, J. Z. (2012b). Neural coding of continuous speech in auditory cortex during monaural and dichotic listening. Journal of Neurophysiology, 107, 78–89.
Doelling, K. B., Arnal, L. H., Ghitza, O., & Poeppel, D. (2014). Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing. Neuroimage, 85, 761–768.
Drullman, R., Festen, J. M., & Plomp, R. (1994). Effect of reducing slow temporal modulations on speech reception. Journal of the Acoustical Society of America, 95, 1053.
Duncan, J. (1980). The locus of interference in the perception of simultaneous stimuli. Psychological Review, 87, 272–300.
Dupoux, E., Kouider, S., & Mehler, J. (2003). Lexical access without attention? Explorations using dichotic priming. Journal of Experimental Psychology: Human Perception and Performance, 29, 172–184.
Eich, E. (1984). Memory for unattended events: Remembering with and without awareness. Memory & Cognition, 12, 105–111.
Folstein, J. R., & Van Petten, C. (2008). Influence of cognitive control and mismatch on the N2 component of the ERP: A review. Psychophysiology, 45, 152–170.
Frauenfelder, U., & Tyler, L. K. (1987). The process of spoken word recognition: An introduction. Cognition, 25, 1–20.
Giraud, A., & Poeppel, D. (2012). Speech perception from a neurophysiological perspective. In The human auditory cortex. Springer Handbook of Auditory Research (Vol. 43, pp. 225–260). New York: Springer.

Greenberg, S., Carvey, H., Hitchcock, L., & Chang, S. (2003). Temporal properties of spontaneous speech—A syllable-centric perspective. Journal of Phonetics, 31, 465–485.
Hambrook, D. A., & Tata, M. S. (2014). Theta-band phase tracking in the two-talker problem. Brain and Language, 135, 52–56.
Holender, D. (1986). Semantic activation without conscious identification in dichotic listening, parafoveal vision, and visual masking: A survey and appraisal. Behavioral and Brain Sciences, 9, 1–23.
Horton, C., D’Zmura, M., & Srinivasan, R. (2011). EEG reveals divergent paths for speech envelopes during selective attention. International Journal of Bioelectromagnetism, 13, 217–222.
Horton, C., D’Zmura, M., & Srinivasan, R. (2013). Suppression of competing speech through entrainment of cortical oscillations. Journal of Neurophysiology, 109, 3082–3093.
Horton, C., Srinivasan, R., & D’Zmura, M. (2014). Envelope responses in single-trial EEG indicate attended speaker in a cocktail party. Journal of Neural Engineering, 141, 520–529.
James, W. (1890). The principles of psychology. New York: Henry Holt and Company.
Johnston, W. A., & Heinz, S. P. (1978). Flexibility and capacity demands of attention. Journal of Experimental Psychology: General, 107, 420–435.
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Kerlin, J. R., Shahin, A. J., & Miller, L. M. (2010). Attentional gain control of ongoing cortical speech representations in a “cocktail party.” Journal of Neuroscience, 30, 620–628.
Kong, Y. Y., Mullangi, A., & Ding, N. (2014). Differential modulation of auditory responses to attended and unattended speech in different listening conditions. Hearing Research, 316, 73–81.
Lakatos, P., Karmos, G., Mehta, A. D., Ulbert, I., & Schroeder, C. E. (2008). Entrainment of neuronal oscillations as a mechanism of attentional selection. Science, 320, 23–25.
Lalor, E. C., & Foxe, J. J. (2010). Neural responses to uninterrupted natural speech can be extracted with precise temporal resolution. European Journal of Neuroscience, 31, 189–193.
Malhotra, P., Coulthard, E. J., & Husain, M. (2009). Role of right posterior parietal cortex in maintaining attention to spatial locations over time. Brain, 132, 645–660.
Manly, T., Owen, A. M., McAvinue, L., Datta, A., Lewis, G. H., Scott, S. K., et al. (2003). Enhancing the sensitivity of a sustained attention task to frontal damage: Convergent clinical and functional imaging evidence. Neurocase, 9, 340–349.
Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164, 177–190.
Marslen-Wilson, W. D. (1973). Linguistic structure and speech shadowing at very short latencies. Nature, 244, 522.
Marslen-Wilson, W. D. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology, 10, 29–63.
Moss, H. E., McCormick, S., & Tyler, L. K. (1997). The time course of activation of semantic information during spoken word recognition. Language and Cognitive Processes, 12, 695–731.
Murphy, S., Fraenkel, N., & Dalton, P. (2013). Perceptual load does not modulate auditory distractor processing. Cognition, 129, 345–355.
Newstead, S., & Dennis, I. (1979). Lexical and grammatical processing of unshadowed messages: A re-examination of the Mackay effect. Quarterly Journal of Experimental Psychology, 31, 477–488.
Peelle, J. E., Gross, J., & Davis, M. H. (2013). Phase-locked responses to speech in human auditory cortex are enhanced during comprehension. Cerebral Cortex, 23, 1378–1387.
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442.
Picton, T. W., & Hillyard, S. A. (1974). Human auditory evoked potentials. II. Effects of attention. Electroencephalography and Clinical Neurophysiology, 36, 191–200.
Power, A. J., Foxe, J. J., Forde, E. J., Reilly, R. B., & Lalor, E. C. (2012). At what time is the cocktail party? A late locus of selective attention to natural speech. European Journal of Neuroscience, 35, 1497–1503.
Pulvermüller, F., Shtyrov, Y., Hasting, A., & Carlyon, R. P. (2008). Syntax as a reflex: Neurophysiological evidence for early automaticity of grammatical processing. Brain and Language, 104, 244–253.
Rimmele, J. M., Zion Golumbic, E., Schröger, E., & Poeppel, D. (2015). The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene. Cortex, 68, 144–154.
Rivenez, M., Guillaume, A., Bourgeon, L., & Darwin, C. J. (2008). Effect of voice characteristics on the attended and unattended processing of two concurrent messages. European Journal of Cognitive Psychology, 20, 967–993.
Rosen, S. (1992). Temporal information in speech: Acoustic, auditory and linguistic aspects. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 336, 367–373.
Schroeder, C. E., & Lakatos, P. (2010). Low-frequency neuronal oscillations as instruments of sensory selection. Trends in Neuroscience, 32, 9–18.
Thwaites, A., Nimmo-Smith, I., Fonteneau, E., Patterson, R. D., Buttery, P., & Marslen-Wilson, W. D. (2015). Tracking cortical entrainment in neural activity: Auditory processes in human temporal cortex. Frontiers in Computational Neuroscience, 9, 1–13.
Treisman, A. M. (1969). Strategies and models of selective attention. Psychological Review, 76, 282–299.
Uppenkamp, S., Johnsrude, I. S., Norris, D., Marslen-Wilson, W., & Patterson, R. D. (2006). Locating the initial stages of speech-sound processing in human temporal cortex. Neuroimage, 31, 1284–1296.
Wood, N., & Cowan, N. (1995). The cocktail party phenomenon revisited: How frequent are attention shifts to one’s name in an irrelevant auditory channel? Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 255–260.
Wood, N. L., Stadler, M. A., & Cowan, N. (1997). Is there implicit memory without attention? A reexamination of task demands in Eich’s (1984) procedure. Memory & Cognition, 25, 772–779.
Zion Golumbic, E. M., Ding, N., Bickel, S., Lakatos, P., Schevon, C. A., McKhann, G. M., et al. (2013). Mechanisms underlying selective neuronal tracking of attended speech at a ‘cocktail party’. Neuron, 77, 980–991.
Zion Golumbic, E. M., Poeppel, D., & Schroeder, C. E. (2012). Temporal context in speech processing and attentional stream selection: A behavioral and neural perspective. Brain and Language, 122, 151–161.
