Speech Categorization Is Better Described by Induced Rather Than Evoked Neural Activity

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.20.347526; this version posted October 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Title: Speech categorization is better described by induced rather than evoked neural activity Md Sultan Mahmud1,2,a, Mohammed Yeasin1,2, and Gavin M. Bidelman2,3,4 1 Department of Electrical and Computer Engineering, University of Memphis, Memphis, TN 38152, USA 2Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA 3School of Communication Sciences & Disorders, University of Memphis, Memphis, TN 38152, USA 4University of Tennessee Health Sciences Center, Department of Anatomy and Neurobiology, Memphis, TN 38103, USA a Email: [email protected] , ORCID: 0000-0002-7361-122X 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.20.347526; this version posted October 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. ABSTRACT Categorical perception (CP) describes how the human brain categorizes speech despite inherent acoustic variability. We examined neural correlates of CP in both evoked and induced EEG activity to evaluate which mode best describes the process of speech categorization. Using source reconstructed EEG, we used band-specific evoked and induced neural activity to build parameter optimized support vector machine (SVMs) model to assess how well listeners’ speech categorization could be decoded via whole-brain and hemisphere-specific responses. We found whole-brain evoked β-band activity decoded prototypical from ambiguous speech sounds with ~70% accuracy. However, induced γ-band oscillations showed better decoding of speech categories with ~95% accuracy compared to evoked β-band activity (~70% accuracy). Induced high frequency (γ-band) oscillations dominated CP decoding in the left hemisphere, whereas lower frequency (θ-band) dominated decoding in the right hemisphere. Moreover, feature selection identified 14 brain regions carrying induced activity and 22 regions of evoked activity that were most salient in describing category-level speech representations. Among the areas and neural regimes explored, we found that induced γ-band modulations were most strongly associated with listeners’ behavioral CP. Our data suggest that the category-level organization of speech is dominated by relatively high frequency induced brain rhythms. Keywords: Categorical perception; time-frequency analysis; induced oscillations; gamma-band activity, machine learning; support vector machine (SVM). 2 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.20.347526; this version posted October 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 1 I. INTRODUCTION 2 The human brain classifies diverse acoustic information into smaller, more meaningful 3 groupings (Bidelman and Walker, 2017), a process known as categorical perception (CP). CP 4 plays a critical role in auditory perception, speech acquisition, and language processing. Brain 5 imaging studies have shown that neural responses elicited by prototypical speech sounds (i.e., 6 those heard with a strong phonetic category) differentially engage Heschl’s (HG) and inferior 7 frontal (IFG) gyrus compared to ambiguous speech (Bidelman et al., 2013; Bidelman and Lee, 8 2015; Bidelman and Walker, 2017). Previous studies also demonstrate that the N1 and P2 waves 9 of the event-related potentials (ERPs ) are highly sensitive to speech perception and correlate 10 with CP (Alain, 2007; Bidelman et al., 2013; Mankel et al., 2020). These studies demonstrate 11 that evoked activity in the time domain provides a neural correlate of speech categorization. 12 However, ERP studies do not reveal how induced brain activity (so-called neural oscillations) 13 might contribute to this process. 14 The electroencephalogram (EEG) can be divided into evoked (i.e., phase-locked) and 15 induced (i.e., non-phase locked) responses that vary in a frequency-specific manner (Shahin et 16 al., 2009). Evoked responses are largely related to the stimulus, whereas induced responses are 17 additionally linked to different perceptual and cognitive processes that emerge during task 18 engagement. These later brain oscillations (neural rhythms) play an important role in perceptual 19 and cognitive processes and reflect different aspects of speech perception. For example, low 20 frequency [e.g., θ (4-8 Hz) ] bands are associated with syllable segmentation (Luo and Poeppel, 21 2012) whereas α (9-13 Hz) band has been linked with attention (Klimesch, 2012) and speech 22 intelligibility (Dimitrijevic et al., 2017). Several studies report listeners’ speech categorization 23 efficiency varies in accordance with their underlying induced and evoked neural activity 2 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.20.347526; this version posted October 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 24 (Bidelman et al., 2013; Bidelman and Alain, 2015; Bidelman and Lee, 2015). For instance, 25 Bidelman assessed correlations between ongoing neural activity (e.g., induced activity) and the 26 slopes of listeners’ identification functions, reflecting the strength of their CP (Bidelman, 2017). 27 Listeners were slower and varied in their classification of more category-ambiguous speech 28 sounds, which covaried with increases in induced γ activity (Bidelman, 2017). Changes in β (14- 29 30 Hz) power are also strongly associated with listeners’ speech identification skills (Bidelman, 30 2015). The β frequency band is linked with auditory template matching (Shahin et al., 2009) 31 between stimuli and internalized representations kept in memory (Bashivan et al., 2014), whereas 32 the higher γ frequency range (>30 Hz) is associated with auditory object construction (Tallon- 33 Baudry and Bertrand, 1999) and local network synchronization (Giraud and Poeppel, 2012; 34 Haenschel et al., 2000; Si et al., 2017). 35 Studies also demonstrate hemispheric asymmetries in neural oscillations. During syllable 36 processing, there is a dominance of γ frequency activity in LH and θ frequency activity in RH 37 (Giraud et al., 2007; Morillon et al., 2012). Other studies show that during speech perception and 38 production, lower frequency bands (3-6 Hz) better correlate with behavioral reaction times than 39 higher frequencies (20-50 Hz) (Yellamsetty and Bidelman, 2018). Moreover, induced γ-band 40 correlates with speech discrimination and perceptual computations during acoustic encoding (Ou 41 and Law, 2018), further suggesting it reflects a neural representation of speech above and beyond 42 evoked activity alone. 43 Still, given the high dimensionality of EEG data, it remains unclear which frequency 44 bands, brain regions, and “modes” of neural function (i.e., evoked vs. induced signaling) are 45 most conducive to describing the neurobiology of speech categorization. To this end, the recent 46 application of machine learning (ML) to neuroscience data might prove useful in identifying the 3 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.20.347526; this version posted October 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 47 most salient features of brain activity that predict human behaviors. ML is an entirely data-driven 48 approach that “decodes” neural data with minimal assumptions on the nature of exact 49 representation or where those representations emerge. Germane to the current study, ML has 50 been successfully applied to decode the speed of listeners’ speech identification (Al-Fahad et al., 51 2020) and related receptive language brain networks (Mahmud et al., 2020) from multichannel 52 EEGs. 53 Departing from previous hypothesis-driven studies (Bidelman, 2017; Bidelman and 54 Alain, 2015; Bidelman and Walker, 2017), the present work used a comprehensive data-driven 55 approach (i.e., stability selection and SVM classifiers) to investigate the neural mechanisms of 56 speech categorization using whole-brain electrophysiological data. Our goals were to evaluate 57 which neural regime [i.e., evoked (phase-synchronized ERP) vs. induced oscillations], frequency 58 bands, and brain regions are most associated with CP using whole-brain activity via a data-driven 59 approach. Based on prior work, we hypothesized that evoked and induced brain responses would 60 both differentiate the degree to which speech sounds carry category-level information (i.e., 61 prototypical vs. ambiguous sounds from an acoustic-phonetic continuum). However, we 62 predicted induced activity would best distinguish category-level speech representations, 63 suggesting a dominance of endogenous brain rhythms in describing the neural underpinnings of 64 CP. 65 II. MATERIALS & METHODS 66 A. Participants 67 Forty-eight young (male: 15, female: 33; aged 18 to 33 years) participated in the study 68 (Bidelman et al., 2020; Bidelman and Walker, 2017; Mankel et al., 2020). All participants had 69 normal hearing sensitivity (i.e., <25 dB HL between 500-2000 Hz) and no previous history of 4 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.20.347526;

Load more