bioRxiv preprint doi: https://doi.org/10.1101/106617; this version posted February 7, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Neural Encoding of Auditory Features during Music Perception and Imagery: Insight into the Brain of a Piano Player

Running title: Auditory Features and Music Imagery

Stephanie Martin1,2,8, Christian Mikutta2,3,4,8, Matthew K. Leonard5, Dylan Hungate5, Stefan Koelsch6, Edward F. Chang5, José del R. Millán1, Robert T. Knight2,7, Brian N. Pasley2,9

1 Defitech Chair in Brain-Machine Interface, Center for Neuroprosthetics, Ecole Polytechnique Fédérale de Lausanne, Switzerland
2 Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA
3 Translational Research Center and Division of Clinical Research Support, Psychiatric Services University of Bern (UPD), University Hospital of Psychiatry, Bern, Switzerland
4 Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Switzerland
5 Department of Neurological Surgery, Department of Physiology, and Center for Integrative Neuroscience, University of California, San Francisco, CA 94143, USA
6 Languages of Emotion, Freie Universität Berlin, Germany
7 Department of Psychology, University of California, Berkeley, CA 94720, USA
8 Co-first author
9 Lead Contact

Corresponding Author: Brian Pasley, University of California, Berkeley, Helen Wills Neuroscience Institute, 210 Barker Hall, Berkeley, CA 94720, USA, [email protected]

Abstract

It remains unclear how the human cortex represents spectrotemporal sound features during auditory imagery, and how this representation compares to auditory perception. To assess this, we recorded electrocorticographic signals from an epileptic patient with proficient music ability in two conditions. First, the participant played two piano pieces on an electronic piano with the sound volume of the digital keyboard turned on. Second, the participant replayed the same piano pieces without auditory feedback and was asked to imagine hearing the music in his mind. In both conditions, the sound output of the keyboard was recorded, thus allowing precise time-locking between the neural activity and the spectrotemporal content of the music imagery. For both conditions, we built encoding models to predict high gamma neural activity (70–150 Hz) from the spectrogram representation of the recorded sound. We found robust similarities between perception and imagery in the frequency and temporal tuning properties of auditory areas.

Keywords: electrocorticography, music perception and imagery, spectrotemporal receptive fields, frequency tuning, encoding models.

Abbreviations

ECoG: electrocorticography
HG: high gamma
IFG: inferior frontal gyrus
MTG: middle temporal gyrus
Post-CG: post-central gyrus
Pre-CG: pre-central gyrus
SMG: supramarginal gyrus
STG: superior temporal gyrus
STRF: spectrotemporal receptive field

Introduction

Auditory imagery is defined here as the mental representation of sound perception in the absence of external auditory stimulation. The experience of auditory imagery is common, such as when a song runs continually through someone’s mind. At an advanced level, professional musicians are able to imagine a piece of music by looking at the sheet music [1]. Behavioral studies have shown that structural and temporal properties of auditory features (see [2] for a complete review), such as pitch [3], timbre [4,5], loudness [6] and rhythm [7], are preserved during auditory imagery. However, it is unclear how these auditory features are represented in the brain during imagery. Experimental investigation is difficult due to the lack of an observable stimulus or behavioral markers during auditory imagery. Using a novel experimental paradigm to synchronize auditory imagery events to neural activity, we investigated the neural representation of spectrotemporal auditory features during auditory imagery in an epileptic patient with proficient music abilities.

Previous studies have identified anatomical regions active during auditory imagery [8], and how they compare to actual auditory perception. For instance, lesion [9] and brain imaging studies [4,10–14] have confirmed the involvement of bilateral temporal lobe regions during auditory imagery (see [15] for a review). Brain areas consistently activated with fMRI during auditory imagery include the secondary auditory cortex [10,12,16], the frontal cortex [16,17], the sylvian parietal temporal area [18], ventrolateral and dorsolateral cortices [19] and the supplementary motor area [17,20–24]. Anatomical regions active during auditory imagery have been compared to those active during actual auditory perception to understand the interactions between externally and internally driven cortical processes. Several studies showed that auditory imagery has substantial, but not complete, overlap in brain areas with music perception [8] – e.g., the secondary auditory cortex is consistently activated during both music imagery and perception, while the primary auditory areas appear to be activated solely during auditory perception [4,10,25,26].

These studies have helped to unravel the anatomical brain areas involved in auditory perception and imagery; however, there is a lack of evidence for the representation of specific acoustic features in the human cortex during auditory imagery. It remains a challenge to investigate neural processing during an internal subjective experience like music imagery, due to the difficulty of time-locking brain activity to a measurable stimulus. To address this issue, we recorded electrocorticographic (ECoG) neural signals from a proficient piano player in a novel task design that permitted robust alignment of the spectrotemporal content of the intended music imagery to neural activity – thus allowing us to investigate specific auditory features during auditory imagery. In the first condition, the participant played an electronic piano with the sound output turned on: the sound was played out loud through speakers at a comfortable volume that allowed auditory feedback (perception condition). In the second condition, the participant played the electronic piano with the speakers turned off, and instead imagined the corresponding music in his mind (imagery condition). In both conditions, the digitized sound output of the MIDI-compatible sound module was recorded. This provided a measurable record of the content and timing of the participant’s music imagery when the speakers of the keyboard were turned off and he did not hear the music. This task design allowed precise temporal alignment between the recorded neural activity and spectrogram representations of music perception and imagery – providing a unique opportunity to apply receptive field modeling techniques to quantitatively study neural encoding during auditory imagery.

A well-established role of the early auditory system is to decompose complex sounds into their component frequencies [27–29], giving rise to tonotopic maps in the auditory cortex (see [30] for a review). Auditory perception has been extensively studied in animal models and humans using spectrotemporal receptive field (STRF) analysis [27,31–34], which identifies the time-frequency stimulus features encoded by a neuron or population of neurons. STRFs are consistently observed during auditory perception tasks, but the existence of STRFs during auditory imagery is unclear due to the experimental challenges associated with synchronizing neural activity and the imagined stimulus. To characterize and compare spectrotemporal tuning properties during auditory imagery and perception, we fitted separate encoding models on data collected in the perception and imagery conditions. Here, an encoding model describes the linear mapping between a given auditory stimulus representation and its corresponding brain response. For instance, encoding models have revealed the neural tuning properties of various speech features, such as acoustic, phonetic and semantic representations [33,35–38].

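As an illustration of such a linear encoding model, the sketch below fits an STRF by ridge regression from a time-lagged spectrogram to a single electrode's high-gamma envelope, using synthetic data. The lag window, regularization strength, and all variable names are assumptions for illustration, not the authors' exact fitting procedure:

```python
import numpy as np

def lag_features(spec, n_lags):
    """Build a lagged design matrix: each row holds the spectrogram
    over the previous n_lags time bins (zero-padded at the start)."""
    n_t, n_f = spec.shape
    X = np.zeros((n_t, n_lags * n_f))
    for lag in range(n_lags):
        X[lag:, lag * n_f:(lag + 1) * n_f] = spec[:n_t - lag]
    return X

def fit_strf(spec, hg, n_lags=10, alpha=1.0):
    """Ridge-regression STRF mapping time-lagged spectrogram features to
    one electrode's high-gamma envelope; returns (n_lags, n_freq) weights."""
    X = lag_features(spec, n_lags)
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ hg)
    return w.reshape(n_lags, spec.shape[1])

# Toy demo: an electrode driven by frequency bin 3 at a 2-bin delay.
rng = np.random.default_rng(0)
spec = rng.random((500, 8))
hg = np.roll(spec[:, 3], 2) + 0.01 * rng.standard_normal(500)
strf = fit_strf(spec, hg, n_lags=5, alpha=0.1)
peak_lag, peak_freq = np.unravel_index(np.abs(strf).argmax(), strf.shape)
print(peak_lag, peak_freq)  # 2 3: largest weight at lag 2, frequency bin 3
```

The recovered weight matrix plays the role of the STRF: its excitatory and inhibitory subregions indicate which time-frequency stimulus features drive the electrode.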
In this study, the neural representation of music perception and imagery was quantified by STRFs that predict high gamma (HG; 70–150 Hz) neural activity. High gamma activity correlates with the spiking activity of the underlying neuronal ensemble [39–41] and reliably tracks speech and music features in auditory and motor cortex [33,42–45]. Results demonstrated the presence of robust spectrotemporal receptive fields during auditory imagery, with extensive overlap in frequency tuning and cortical location compared to receptive fields measured during auditory perception. These results provide a quantitative characterization of the shared neural representation underlying auditory perception and the subjective experience of auditory imagery.

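High-gamma power of the kind modeled here is commonly estimated by band-pass filtering the ECoG signal and taking the amplitude of the analytic (Hilbert) signal. The sketch below is a generic illustration of that idea; the filter type and order are assumptions, not the authors' preprocessing pipeline:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def high_gamma_envelope(ecog, fs, band=(70.0, 150.0)):
    """Band-pass each channel in the high-gamma range and take the
    analytic-signal amplitude as a proxy for population activity."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ecog, axis=-1)
    return np.abs(hilbert(filtered, axis=-1))

# Toy demo: a 100 Hz burst in the second half of the trace should yield
# a larger high-gamma envelope there than during the silent first half.
fs = 1000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 100 * t) * (t > 0.5)
env = high_gamma_envelope(sig[np.newaxis, :], fs)[0]
print(env[750] > env[250])  # True: the envelope tracks the burst
```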
Results

High gamma neural encoding during auditory perception and imagery
In this study, ECoG recordings were obtained using subdural electrode arrays implanted in a patient undergoing neurosurgical procedures for epilepsy. The participant was a proficient piano player (started playing at age 7, with 10 years of music education and 5 hours of training per week). Prior studies have shown that music training is associated with improved auditory imagery ability, such as pitch and temporal acuity [46–48]. We took advantage of this rare clinical case to investigate the neural correlates of music perception and imagery. In the first task, the participant played two music pieces on an electronic piano with the speakers of the digital keyboard turned on (perception condition; Fig 1A). In the second task, the participant played the same two piano pieces, but the volume of the speaker system was turned off to establish a silent room. The participant was asked to imagine hearing the corresponding music in his mind as he played the piano (imagery condition; Fig 1B). In both conditions, the sound produced by the MIDI-compatible sound module was digitally recorded in synchrony with the ECoG signal. The recorded sound allowed us to synchronize the auditory spectrotemporal patterns of the imagined music with the neural activity, even though the speaker system was turned off and no audible sounds were present in the room.

Fig 1. Experimental task design.
(A) The participant played an electronic piano with the sound of the digital keyboard turned on (perception condition). (B) In the second condition, the participant played the piano with the sound turned off and instead imagined the corresponding music in his mind (imagery condition). In both conditions, the digitized sound output of the MIDI-compatible sound module was recorded in synchrony with the neural signals (even when the participant did not hear any sound in the imagery condition). The models take as input a spectrogram consisting of time-varying spectral power across a range of acoustic frequencies (200–7,000 Hz, bottom left) and output time-varying neural signals. To assess the encoding accuracy, the predicted neural signal (light lines) is compared to the original neural signal (dark lines).

The participant performed two famous music pieces: Chopin’s Prelude in C minor, Op. 28, No. 20 and Bach’s Prelude in C major, BWV 846. Example auditory spectrograms from Chopin’s Prelude, determined from the participant's key presses on the electronic piano, are shown in Fig 2B for both perception and imagery conditions. To evaluate how consistently the participant performed across the perception and imagery tasks, we computed the realigned Euclidean distance [49] between the spectrograms of the same music piece played across conditions (within-stimulus distance). We compared the within-stimulus distance with the realigned Euclidean distance between the spectrograms of the different music pieces (between-stimulus distance). The between-stimulus distance was 251.6% larger than the within-stimulus distance (p<10⁻³; randomization test), indicating that the spectrograms of the same music piece played across conditions were more similar than the spectrograms of different music pieces, and thus that the participant performed the task with relative consistency and specificity across the two conditions. To compare spectrotemporal auditory representations during the music perception and imagery tasks, we fit separate encoding models in each condition. We used these models to quantify specific anatomical and neural tuning differences between auditory perception and imagery.

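A realigned distance of this kind can be sketched by first warping one spectrogram onto the other with dynamic time warping and then averaging the frame-wise Euclidean cost along the optimal path. This is a minimal illustration of the idea on synthetic data; the exact realignment procedure of [49] may differ:

```python
import numpy as np

def realigned_distance(A, B):
    """Euclidean distance between two spectrograms (time x freq) after
    dynamic-time-warping realignment, normalized by path length."""
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j],
                                               D[i, j - 1],
                                               D[i - 1, j - 1])
    return D[n, m] / (n + m)

# Toy demo: the same piece played twice as slowly (within-stimulus) should
# be closer than an unrelated piece (between-stimulus).
rng = np.random.default_rng(1)
piece1 = rng.random((40, 16))
within = realigned_distance(piece1, np.repeat(piece1, 2, axis=0))
between = realigned_distance(piece1, rng.random((40, 16)))
print(within < between)  # True: warping absorbs the tempo change
```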
Fig 2. Encoding accuracy.
(A) Electrode locations overlaid on a cortical surface reconstruction of the participant’s cerebrum. (B) Overlay of the spectrogram contours for the perception (blue) and imagery (orange) conditions (10% of maximum energy from the spectrograms). (C) Actual and predicted high gamma band power (70–150 Hz) induced by the music perception and imagery segments in (B). The top electrode shows very similar predictive power across conditions, whereas the bottom electrode differs markedly between perception and imagery. Recordings are from two different STG sites, highlighted in pink in (A). (D) Encoding accuracy plotted on the cortical surface reconstruction of the participant’s cerebrum (map thresholded at p<0.05; FDR correction). (E) Encoding accuracy of significant electrodes of the perception model as a function of the imagery model. Electrode-specific encoding accuracy is correlated between the perception and imagery models (r=0.65; p<10⁻⁴; randomization test). (F) Encoding accuracy as a function of anatomical location (pre-central gyrus (pre-CG), post-central gyrus (post-CG), supramarginal gyrus (SMG), middle temporal gyrus (MTG) and superior temporal gyrus (STG)).

For both the perception and imagery conditions, the observed and predicted high gamma neural responses are illustrated for two individual electrodes in the temporal lobe (Fig 2C), together with the corresponding music spectrograms (Fig 2B). The predicted neural response for the electrode shown in the upper panel of Fig 2C was significantly correlated with its corresponding measured neural response in both the perception (r=0.41; p<10⁻⁷; one-sample z-test; FDR correction) and imagery (r=0.42; p<10⁻⁴; one-sample z-test; FDR correction) conditions. The predicted neural response for the lower panel electrode was correlated with the actual neural response only in the perception condition (r=0.23; p<0.005; one-sample z-test; FDR correction) but not in the imagery condition (r=-0.02; p>0.5; one-sample z-test; FDR correction). The difference between conditions was significant for the electrode in the lower panel (p<0.05; two-sample t-test), but not for the electrode in the upper panel (p>0.5; two-sample t-test). This suggests that there is a strong continuous relationship between time-varying imagined sound features and STG neural activity, but that this relationship depends on cortical location.

To further investigate anatomical similarities and differences between the perception and imagery conditions, we plotted the anatomical layout of the prediction accuracy of individual electrodes. In both conditions, the sites with the highest prediction accuracy were located in the superior and middle temporal gyri, pre- and post-central gyri, and supramarginal gyrus (Fig 2D; heat map thresholded at p<0.05; one-sample z-test; FDR correction), consistent with previous ECoG results [33]. Of the 256 electrodes recorded, 210 were fitted in the encoding model, while the remaining 46 electrodes were removed due to excessive noise. Among the fitted electrodes, 35 and 15 were significant in the perception and imagery conditions, respectively (p<0.05; one-sample z-test; FDR correction), of which 9 electrodes were significant in both conditions. The anatomical locations of the electrodes with significant encoding accuracy are depicted in Fig 3 and Fig S1. To compare encoding accuracy across conditions, we performed additional analyses on the electrodes that had significant encoding accuracy in at least one condition (41 electrodes; unless otherwise stated). The prediction accuracy of individual electrodes was correlated between perception and imagery (Fig 2E; 41 electrodes; r=0.65; p<10⁻⁴; randomization test). Because both the perception and imagery models are based on the same auditory stimulus representation, this correlated prediction accuracy provides strong evidence for a shared neural representation of sound based on spectrotemporal features.

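Per-electrode encoding accuracy and an FDR threshold of the kind used above can be sketched as follows. This is a generic illustration using Pearson correlation and Benjamini-Hochberg correction; the paper's significance testing also involved a one-sample z-test, which is omitted here:

```python
import numpy as np
from scipy import stats

def encoding_accuracy(pred, actual):
    """Pearson r between predicted and measured high-gamma, per electrode."""
    return np.array([stats.pearsonr(p, a)[0] for p, a in zip(pred, actual)])

def fdr_significant(pvals, q=0.05):
    """Benjamini-Hochberg procedure: boolean mask of discoveries at level q."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    thresh = q * np.arange(1, len(p) + 1) / len(p)
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(len(p), bool)
    mask[order[:k]] = True
    return mask

# Toy demo: a well-predicted and an anti-correlated "electrode" ...
pred = np.array([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
actual = np.array([[1.1, 2.2, 2.9, 4.3], [4.0, 3.0, 2.0, 1.0]])
r = encoding_accuracy(pred, actual)  # ~[0.99, -1.0]
# ... and FDR: strong p-values survive, weak ones do not.
mask = fdr_significant([0.001, 0.002, 0.8, 0.9], q=0.05)
print(mask)  # [ True  True False False]
```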
To assess how the brain areas encoding auditory features varied across experimental conditions, we analyzed the significant electrodes in the gyri highlighted in Fig 2A (pre-central gyrus (pre-CG), post-central gyrus (post-CG), supramarginal gyrus (SMG), middle temporal gyrus (MTG) and superior temporal gyrus (STG)) using the Wilcoxon signed-rank test (p>0.05; one-sample Kolmogorov-Smirnov test; Fig 2F). Encoding accuracy in the MTG and STG was higher for the perception (MTG: M=0.16, STG: M=0.13) than for the imagery (MTG: M=0.11, STG: M=0.08) condition (p<0.05; Wilcoxon signed-rank test; Bonferroni correction). Encoding accuracy in the pre-CG, post-CG and SMG did not differ between the perception (pre-CG: M=0.17; post-CG: M=0.15; SMG: M=0.12) and imagery (pre-CG: M=0.14; post-CG: M=0.13; SMG: M=0.12) conditions (p>0.5; Wilcoxon signed-rank test; Bonferroni correction). The advantage of the perception over the imagery model was thus specific to the temporal lobe, which may reflect underlying differences in spectrotemporal encoding mechanisms or, alternatively, a greater sensitivity to discrepancies between the actual content of imagery and the recorded sound stimulus used in the model.

Spectrotemporal tuning during auditory perception and imagery

Auditory imagery and perception are distinct subjective experiences, yet both are characterized by a sense of sound. How does the auditory system encode sound during music perception and imagery? Examples of standard STRFs are shown in Fig 3 for temporal electrodes (see Fig S2 for all STRFs). These STRFs highlight neural stimulus preferences, as shown by their excitatory (warm colors) and inhibitory (cold colors) subregions. Fig 4A shows the latency (s) for the perception and imagery conditions, defined as the temporal coordinate of the maximum deviation in the STRF. The peak latency was significantly correlated between the two conditions (r=0.43; p<0.005; randomization test), suggesting similar response timing during perception and imagery. We then analyzed frequency tuning curves estimated from the STRFs (see Materials and Methods for details). Examples of tuning curves for both the perception and imagery encoding models are shown for the electrodes indicated by the black outlines on the anatomical brain (Fig 4B). Across conditions, the majority of individual electrodes exhibited a complex frequency tuning profile. For each electrode, the similarity between the tuning curves of the perception and imagery models was quantified using Pearson’s correlation coefficient. The anatomical distribution of tuning curve similarity is plotted in Fig 4B, with the correlation at individual sites ranging from r=-0.3 to 0.6. The effect of anatomical location (pre-CG, post-CG, SMG, MTG and STG) on tuning curve similarity was not significant (Chi-square=3.59; p>0.1; Kruskal-Wallis test). The similarity in tuning curve shape between auditory imagery and perception suggests a shared auditory representation, but there is no evidence that this similarity depended on gyral area.
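A tuning curve of the kind compared above can be sketched by averaging the STRF gain across time lags and correlating the resulting curves between conditions. This is an illustrative sketch on synthetic STRFs; the shapes and noise levels are assumptions:

```python
import numpy as np

def tuning_curve(strf):
    """Frequency tuning curve: average STRF gain across time lags,
    where strf has shape (n_lags, n_freq)."""
    return strf.mean(axis=0)

def tuning_similarity(strf_perception, strf_imagery):
    """Pearson correlation between perception and imagery tuning curves."""
    a = tuning_curve(strf_perception)
    b = tuning_curve(strf_imagery)
    return np.corrcoef(a, b)[0, 1]

# Toy demo: two noisy STRFs tuned to the same frequency band (a Gaussian
# bump peaking at frequency bin 10) should yield highly similar curves.
rng = np.random.default_rng(2)
base = np.exp(-0.5 * ((np.arange(32) - 10) / 3.0) ** 2)
strf_p = base[None, :] + 0.05 * rng.standard_normal((8, 32))
strf_i = base[None, :] + 0.05 * rng.standard_normal((8, 32))
r = tuning_similarity(strf_p, strf_i)
print(round(r, 2))  # close to 1
```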

Fig 3. Spectrotemporal receptive fields (STRFs).
Examples of standard STRFs for the perception (left panel) and imagery (right panel) models (warm colors indicate where the neuronal ensemble is excited; cold colors indicate where it is inhibited). In the lower left corner, electrode locations are overlaid on a cortical surface reconstruction of the participant’s cerebrum. Electrodes whose STRFs are shown are outlined in black. Grey electrodes were removed from the analysis due to excessive noise (see Materials and Methods).


Fig 4. Auditory tuning.
(A) Latency peaks – estimated from the STRFs – were significantly correlated between the perception and imagery conditions (r=0.43; p<0.005; randomization test). (B) Examples of tuning curves for both the perception and imagery encoding models, defined as the average gain of the STRF as a function of acoustic frequency, for the electrodes indicated by the black outlines on the anatomical brain (left panel). Correlation coefficients between the perception and imagery conditions are plotted for significant electrodes on the cortical surface reconstruction of the participant’s cerebrum. Grey electrodes were removed from the analysis due to excessive noise. The bottom panel is a histogram of electrode correlation coefficients between the perception and imagery tuning curves. (C) Proportion of predictive electrode sites (N=41) with peak tuning at each frequency. Tuning peaks were identified as significant parameters in the acoustic frequency tuning curves (z>3.1; p<0.001), separated by more than one third of an octave.

Different electrodes are sensitive to different acoustic frequencies important for music processing. We next quantitatively assessed how the frequency tuning of predictive electrodes (N=41) varied across the two conditions. First, to evaluate how the acoustic spectrum was covered at the population level, we quantified the proportion of significant electrodes with a tuning peak at each acoustic frequency (Fig 4C). Tuning peaks were identified as significant parameters in the acoustic frequency tuning curves (z>3.1; p<0.001; separated by more than one third of an octave). The proportion of electrodes with tuning peaks was significantly larger for the perception (mean = 0.19) than for the imagery (mean = 0.14) condition (Fig 4C; p<0.05; Wilcoxon signed-rank test). Despite this, both conditions exhibited reliable frequency selectivity, as nearly the full range of the acoustic frequency spectrum was encoded: the fraction of acoustic frequencies covered by peaks of predictive electrodes was 0.91 for the perception and 0.89 for the imagery condition.

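Peak picking with an octave-separation constraint, as described above, can be sketched as follows: keep z-scored tuning values above threshold, take candidates from largest to smallest, and discard any candidate within a third of an octave of an already-accepted peak. The thresholds match those reported; the greedy selection order is an assumption:

```python
import numpy as np

def tuning_peaks(zscores, freqs, z_thresh=3.1, min_sep_octaves=1 / 3):
    """Indices of significant tuning peaks (z > z_thresh), keeping only
    peaks separated by more than min_sep_octaves, largest first."""
    candidates = sorted(np.nonzero(zscores > z_thresh)[0],
                        key=lambda i: -zscores[i])
    peaks = []
    for i in candidates:
        if all(abs(np.log2(freqs[i] / freqs[j])) > min_sep_octaves
               for j in peaks):
            peaks.append(i)
    return sorted(peaks)

# Toy demo: two significant bins 0.07 octaves apart collapse to one peak,
# while a significant bin two octaves away survives as a second peak.
freqs = np.array([200.0, 210.0, 400.0, 800.0])
z = np.array([4.0, 3.5, 1.0, 5.0])
print(tuning_peaks(z, freqs))  # [0, 3]
```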
Reconstruction of auditory features during music perception and imagery

To evaluate the ability to identify piano notes from brain activity, we reconstructed the same auditory spectrogram representation used in the encoding models. The overall reconstruction accuracy was higher than zero in both conditions (Fig 5A; p<0.001; randomization test), but did not differ between conditions (p>0.05; two-sample t-test). As a function of acoustic frequency, mean accuracy ranged from r=0 to 0.45 (Fig 5A).

Fig 5. Reconstruction accuracy.
(A) Right panel: overall reconstruction accuracy of the spectrogram representation for both the perception (blue) and imagery (orange) conditions. Error bars denote resampling SEM. Left panel: reconstruction accuracy as a function of acoustic frequency. The shaded region denotes SEM over the resamples. (B) Examples of original and reconstructed segments for the perception (left) and imagery (right) models. (C) Distribution of identification ranks for all reconstructed spectrogram notes. The median identification rank is 0.65 and 0.63 for the perception and imagery decoding models, respectively, significantly higher than the 0.50 chance level (p<0.001; randomization test). Left panel: receiver operating characteristic (ROC) plot of identification performance for the perception (blue curve) and imagery (orange curve) models. The diagonal black line indicates no predictive power.

We further assessed reconstruction accuracy by evaluating the ability to identify isolated piano notes from the test-set auditory spectrogram reconstructions. Examples of original and reconstructed segments are depicted in Fig 5B for the perception (left) and imagery (right) models. For the identification, we extracted 0.5-second segments at piano note onsets from the original and reconstructed auditory spectrograms. Note onsets were defined with the MIRtoolbox [50]. Then, we computed the correlation coefficient between a target reconstructed spectrogram and each original spectrogram in the candidate set. Finally, we sorted the coefficients and computed the identification rank as the percentile rank of the correct spectrogram. This metric reflects how well the target reconstruction matched the correct original spectrogram out of all candidates. The median identification rank of individual piano notes was significantly higher than chance for both conditions (Fig 5C; median identification rank: perception = 0.72 and imagery = 0.69; p<0.001; randomization test). Similarly, the area under the curve (AUC) of identification performance for the perception (blue curve) and imagery (orange curve) models was well above chance level (the diagonal black dashed line indicates no predictive power; p<0.001; randomization test).
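The identification-rank metric described above can be sketched directly: correlate a reconstructed note segment with every candidate original segment and report the fraction of incorrect candidates it outranks (0.5 = chance). This is an illustrative sketch on synthetic note spectrograms, not the authors' exact implementation:

```python
import numpy as np

def identification_rank(recon, candidates, target_idx):
    """Fraction of incorrect candidates outranked by the correct one,
    scored by correlation between reconstruction and each candidate."""
    flat = recon.ravel()
    corrs = np.array([np.corrcoef(flat, c.ravel())[0, 1]
                      for c in candidates])
    others = np.delete(corrs, target_idx)
    return float(np.mean(corrs[target_idx] > others))

# Toy demo: a mildly noisy copy of note 0 should rank note 0 at the top
# of a candidate set of 20 random note spectrograms.
rng = np.random.default_rng(3)
notes = [rng.random((10, 16)) for _ in range(20)]
recon = notes[0] + 0.1 * rng.standard_normal((10, 16))
rank = identification_rank(recon, notes, 0)
print(rank)  # 1.0: the correct note beats all 19 other candidates
```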
Cross-condition analysis
Another way to evaluate the degree of overlap between the perception and imagery conditions
is to apply the decoding model built on the perception condition to imagery neural
data, and vice versa. This approach is based on the hypothesis that both tasks share neural
mechanisms, and is useful when one of the models cannot be built directly because of the lack
of observable measures. The technique has been successfully applied in various fields, such
as vision [51–53] and speech [54]. Decoding performance was 50% higher when the model
was trained and tested within the imagery condition (r = 0.28) than when the perception
model was applied to imagined data (r = 0.19) (Fig 6). This
highlights the importance of having a model that is specific to each condition.

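Cross-condition decoding amounts to fitting a linear decoder on one condition's data and evaluating its predictions on the other's. A minimal sketch (the ridge decoder, function name and α are assumptions, not the authors' exact model):

```python
import numpy as np
from sklearn.linear_model import Ridge

def cross_condition_r(X_train, y_train, X_other, y_other):
    """Train a linear decoder on one condition (e.g. perception) and
    report the Pearson correlation of its predictions on the other
    condition's held-out data (e.g. imagery)."""
    model = Ridge(alpha=1.0).fit(X_train, y_train)
    pred = model.predict(X_other)
    return np.corrcoef(pred, y_other)[0, 1]
```

If the two conditions share the underlying mapping, the cross-condition correlation stays high; a drop relative to within-condition decoding quantifies how condition-specific the model is.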

Fig 6. Cross-condition analysis.
Reconstruction accuracy when the decoding model was built on the perception condition and
applied to the imagery neural data, and vice versa. Decoding performance was 50% higher
when the model was trained and tested within the imagery condition (r = 0.28) than when the
perception model was applied to imagined data (r = 0.19).

Control analysis for sensorimotor confounds
In this study, the participant played the piano in two different conditions (music perception and
imagery). In addition to the auditory percept, arm, hand and finger movements related to the
active piano task could have introduced confounds into the decoding process. We
controlled for possible motor confounds in three ways. First, the electrode grid did
not cover hand or finger sensorimotor brain areas (Fig 7A). This reduces the likelihood that the
decoding model relied on hand-related motor activity. Second, example STRFs for two
neighboring electrodes show different weight patterns across conditions (Fig 7B). For
instance, the weights of the electrode depicted in the left example of Fig 7B are correlated
between conditions (r = 0.48), whereas the weights in the right example are not
(r = 0.04). Such differences across conditions cannot be explained by motor confounds, because
finger movement was similar in both tasks. Third, brain areas that significantly encoded
music perception and imagery overlapped with auditory sensory areas (Fig S1 and Fig S3), as
revealed by the encoding accuracy and STRFs during passive listening to TIMIT sentences
(no movements). These findings provide evidence against motor confounds, and suggest that
the brain responses were induced by auditory percepts rather than by the motor movements
associated with pressing piano keys. Finally, we built two additional decoding models, using
1) only temporal lobe electrodes and 2) only auditory-responsive electrodes (Fig 7C; see
Materials and Methods for details). Both models showed significant reconstruction accuracy
(p < 0.001; randomization test) and median identification rank (p < 0.001; randomization test).
This suggests that even after removing all electrodes not clearly related to auditory
processing, the decoding model still performs well.


Fig 7. Control analysis for motor confounds.
(A) Electrode locations overlaid on a cortical surface reconstruction of the participant's
cerebrum. The view from the top of the brain shows that hand motor areas were not covered by the
grid. Brain areas associated with the face, upper limbs and lower limbs are depicted in green, blue and
pink, respectively. (B) Different STRFs for the two neighboring electrodes highlighted in (A), for
both the perception and imagery encoding models. (C) Overall reconstruction accuracy (left
panel) and median identification rank (right panel) when using all electrodes, only temporal
lobe electrodes, and only auditory-responsive electrodes (see Materials and Methods for details).

Discussion
Music imagery studies are daunting because of their subjective nature and the absence of
verifiable, observable measures. Our task design allowed precise time-locking between the recorded
neural activity and the spectrotemporal features of the imagined music, and provided a unique
opportunity to quantitatively study neural encoding during auditory imagery and to compare
tuning properties with those of auditory perception. We provide the first evidence of spectrotemporal
receptive fields and auditory feature encoding in the brain during music imagery, together
with comparative measures of actual music perception encoding. We observed that
neuronal ensembles were tuned to acoustic frequencies during imagined music, suggesting
that spectral organization occurs in the absence of an actual perceived sound. Supporting
evidence has shown that restored speech – when a speech instance is replaced by noise, but
the listener perceives a specific speech sound – is grounded in acoustic representations in the
superior temporal gyrus [55,56]. In addition, the results showed substantial, but not complete,
overlap in neural properties – i.e., spectral and temporal tuning properties, and brain areas –
during music perception and imagery. These findings are in agreement with visual studies
showing that visual imagery and perception involve many, but not all, overlapping brain areas
[57,58]. We also showed that auditory features could be reconstructed from the neural activity of
imagined music. Because both the perception and imagery models are based on the same
auditory stimulus representation, the correlated prediction accuracy provides strong evidence
for a shared neural representation of sound based on spectrotemporal features. This confirms
that the brain encodes spectrotemporal properties of sounds, as previously shown by
behavioral and brain lesion studies (see [2] for a review).

Methodological issues in investigating imagery are numerous, including the lack of evidence
that the desired mental task was actually performed. The current task design did not allow verifying
how the mental task was performed, yet the behavioral index of key presses on the piano
indicated the precise timing and frequency content of the intended imagined sound. In addition,
we recorded a skilled piano player, and it has been suggested that participants with musical
training exhibit better pitch and temporal acuity during auditory imagery than
participants with little or no musical training [47,59]. Furthermore, tonotopic maps located in
the STG are enlarged in trained musicians [60]. Thus, recording a trained piano player
suggests improved auditory imagery ability (see also [7,9,14]) and reduced issues related to
spectral and temporal errors.

Finally, the electrode grid was located over the left hemisphere of the participant. This raises
the question of lateralization in the brain response to music perception and imagery. Studies
have shown the importance of both hemispheres for auditory perception and imagery
[4,9,10,12–14], although brain patterns tend to shift toward the right hemisphere (see [15] for
a review). In our task, the grid was located over the left hemisphere and allowed significant
encoding and decoding accuracy in the high gamma frequency range. This is consistent with
the notion that auditory processing of music is also evident in the left hemisphere.

Materials and Methods
Participant and data acquisition
Electrocorticographic (ECoG) recordings were obtained using subdural electrode arrays
implanted in one patient undergoing neurosurgical treatment for refractory epilepsy. The
participant gave his written informed consent prior to surgery and experimental testing. The
experimental protocol was approved by the University of California, San Francisco and
Berkeley Institutional Review Boards and Committees on Human Research. Inter-electrode
spacing (center-to-center) was 4 mm. Grid placement (Fig 2A) was defined solely on the basis
of clinical requirements. Localization and co-registration of the electrodes was performed using the
structural MRI. Multi-electrode ECoG data were amplified and digitally recorded at a
sampling rate of 3,052 Hz. ECoG signals were re-referenced to a common average after
removal of electrodes with epileptic artifacts or excessive noise (including broadband
electromagnetic noise from hospital equipment or poor contact with the cortical surface). In
addition to the ECoG signals, the audio output of the piano was recorded along with the
multi-electrode ECoG data.

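Common average re-referencing, as described above, subtracts the mean of the clean channels from each channel. A minimal sketch (function name and array layout are assumptions):

```python
import numpy as np

def common_average_reference(ecog, bad_channels):
    """Re-reference ECoG data (channels x time) to the common average,
    excluding rejected channels from both the average and the output."""
    good = np.setdiff1d(np.arange(ecog.shape[0]), bad_channels)
    car = ecog[good].mean(axis=0)   # common average over clean channels
    return ecog[good] - car         # subtract it from every clean channel
```

After this step the mean across retained channels is zero at every time point, which suppresses signal components shared by all electrodes (e.g. broadband line noise).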
Experimental paradigm
The recording session included two conditions. In the first condition, the participant played
an electronic piano with the sound turned on. That is, the music was played out loud through
the speakers of the digital keyboard in the hospital room (volume at a comfortable and natural
sound level; Fig 1A; perception condition). In the second condition, the participant played
the piano with the speakers turned off and instead imagined hearing the corresponding music
in his mind (Fig 1B; imagery condition). Fig 1B illustrates that in both conditions, the
digitized sound output of the MIDI sound module was recorded in synchrony with the ECoG
data (even when the speakers were turned off and the participant did not hear the music). The
two music pieces were Chopin's Prelude in C minor (Op. 28, No. 20) and Bach's Prelude in C
major (BWV 846), respectively.

Feature extraction
We extracted the ECoG signal in the high gamma frequency band using eight bandpass filters
(Hamming-window non-causal filters of order 20, with logarithmically increasing center frequencies
(70–150 Hz) and semi-logarithmically increasing bandwidths), and extracted the envelope
using the Hilbert transform. Prior to model fitting, the power was averaged across these eight
bands, down-sampled to 100 Hz and z-scored.

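The per-band filtering and envelope extraction can be sketched with SciPy. This is an illustrative approximation only: the exact band edges and bandwidth rule are assumptions, not the authors' pipeline (firwin defaults to a Hamming window, and filtfilt gives the non-causal, zero-phase filtering described in the text).

```python
import numpy as np
from scipy.signal import firwin, filtfilt, hilbert, resample

def high_gamma_envelope(ecog, fs, fs_out=100, n_bands=8):
    """Average high-gamma (70-150 Hz) analytic amplitude of one channel,
    down-sampled and z-scored. Band definitions are illustrative."""
    # Logarithmically spaced center frequencies across 70-150 Hz
    centers = np.logspace(np.log10(70), np.log10(150), n_bands)
    bandwidths = centers / 4.0             # assumed semi-log bandwidths
    envs = []
    for fc, bw in zip(centers, bandwidths):
        # order-20 FIR band-pass (Hamming window), applied forward-backward
        b = firwin(21, [fc - bw / 2, fc + bw / 2], pass_zero=False, fs=fs)
        band = filtfilt(b, [1.0], ecog)
        envs.append(np.abs(hilbert(band)))  # Hilbert envelope
    env = np.mean(envs, axis=0)             # average across the 8 bands
    env = resample(env, int(len(env) * fs_out / fs))  # down-sample
    return (env - env.mean()) / env.std()             # z-score
```

In the actual recordings `fs` would be 3,052 Hz; the same steps would be applied per electrode.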
Auditory spectrogram representation
The auditory spectrogram representation was a time-varying representation of the amplitude
envelope at acoustic frequencies logarithmically spaced between 200 and 7,000 Hz. This
representation was calculated by affine wavelet transforms of the sound waveform, using
auditory filter banks that mimic neural processing in the human auditory periphery [31]. To
compute these acoustic representations, we used the NSL MATLAB toolbox
(http://www.isr.umd.edu/Labs/NSL/Software.htm).

Encoding model
The neural encoding model, based on the spectrotemporal receptive field (STRF) [34],
describes the linear mapping between the music stimulus and the high gamma neural response
at individual electrodes. The encoding model was estimated as follows:

r̂(t, n) = Σ_τ Σ_f h(τ, f, n) · S(t − τ, f)

where r̂(t, n) is the predicted high gamma neural activity at time t and electrode n, and S(t − τ, f) is
the spectrogram representation at time (t − τ) and acoustic frequency f. Finally, h(τ, f, n) is the
linear transformation matrix that depends on the time lag τ, the frequency f and the electrode n;
h represents the spectrotemporal receptive field of each electrode. Neural tuning properties
for a variety of stimulus parameters in different sensory systems have been assessed using
STRFs [61]. We used ridge regression to fit the encoding model [62], with a 10-fold cross-
validation resampling procedure and no overlap between training and test partitions within
each resample. We performed a grid search on the training set to define the penalty coefficient
λ and the learning rate η, using a nested cross-validation loop. We standardized the
parameter estimates to yield the final model.
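In practice, the STRF fit amounts to ridge regression on a time-lagged copy of the spectrogram. A minimal sketch with scikit-learn (the lag count, lag-matrix construction and α below are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge

def lag_matrix(S, n_lags):
    """Stack time-lagged copies of the spectrogram S (time x freq)
    into a design matrix X of shape (time, n_lags * freq)."""
    T, F = S.shape
    X = np.zeros((T, n_lags * F))
    for lag in range(n_lags):
        X[lag:, lag * F:(lag + 1) * F] = S[:T - lag]
    return X

def fit_strf(S, r, n_lags=20, alpha=1.0):
    """Ridge STRF: predict the high-gamma response r (time,) of one
    electrode from lagged spectrogram features. Returns the weights
    reshaped to (n_lags, n_freq), i.e. the STRF h(tau, f)."""
    X = lag_matrix(S, n_lags)
    model = Ridge(alpha=alpha).fit(X, r)
    return model.coef_.reshape(n_lags, S.shape[1])
```

The penalty α would be chosen by nested cross-validated grid search on the training folds, as described in the text.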

Decoding model
The decoding model linearly mapped the neural activity to the music representation as a
weighted sum of activity at each electrode, as follows:

ŝ(t, f) = Σ_τ Σ_n g(τ, f, n) · R(t − τ, n)

where ŝ(t, f) is the predicted music representation at time t and frequency f, and R(t − τ, n) is the
high gamma (HG) neural response of electrode n at time (t − τ), with the time lag τ ranging between
-100 ms and 400 ms. Finally, g(τ, f, n) is the linear transformation matrix that depends on the time lag τ,
the frequency f, and the electrode n. Both the neural response and the music representation were
synchronized, down-sampled to 100 Hz, and standardized to zero mean and unit standard
deviation prior to model fitting. To fit the model parameters, we used gradient descent with early
stopping regularization. We used a 10-fold cross-validation resampling scheme, and 20% of
the training data were used as a validation set to determine the early stopping criterion. Finally,
model prediction accuracy was evaluated on the independent test set, and the parameter
estimates were standardized to yield the final model.
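The gradient-descent-with-early-stopping scheme can be sketched with scikit-learn's SGDRegressor, whose `early_stopping` option holds out a validation fraction exactly as described in the text. This is a hypothetical sketch of the fitting strategy, not the authors' implementation; a single acoustic-frequency band is shown, and `R_lagged` stands for the matrix of time-lagged neural responses across electrodes.

```python
from sklearn.linear_model import SGDRegressor

def fit_decoder(R_lagged, s_band):
    """Fit one acoustic-frequency band of the decoder with stochastic
    gradient descent, stopping when performance on a 20% validation
    split stops improving (early stopping regularization)."""
    model = SGDRegressor(
        early_stopping=True,       # stop on validation-score plateau
        validation_fraction=0.2,   # 20% of training data held out
        n_iter_no_change=5,
        max_iter=1000,
        random_state=0,
    )
    model.fit(R_lagged, s_band)
    return model
```

The full decoder would repeat this fit for every acoustic frequency bin of the spectrogram, within each cross-validation fold.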

Evaluation
Prediction accuracy was quantified using the correlation coefficient (Pearson's r) between the
predicted and actual HG signal on the independent test data. Overall encoding
accuracy was reported as the mean correlation over folds. The z-test was applied to all
reported mean r values. Electrodes were defined as significant if the p-value was smaller than
the significance threshold of α = 0.05 (95th percentile; FDR correction). To define auditory
sensory areas, we built an encoding model on data recorded while the participant listened
passively to speech sentences from the TIMIT corpus [63] for 10 min. Electrodes with
significant encoding accuracy are highlighted in Fig S1.
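The per-electrode significance test combines a Pearson correlation with a false discovery rate correction. A minimal sketch using a hand-rolled Benjamini-Hochberg step (function names are assumptions; the authors' exact statistical pipeline may differ):

```python
import numpy as np
from scipy.stats import pearsonr

def electrode_correlations(pred, actual):
    """Pearson r and p-value per electrode; pred/actual are (time, electrodes)."""
    results = [pearsonr(pred[:, e], actual[:, e])
               for e in range(pred.shape[1])]
    r = np.array([res[0] for res in results])
    p = np.array([res[1] for res in results])
    return r, p

def fdr_mask(p, alpha=0.05):
    """Benjamini-Hochberg FDR: boolean mask of significant tests."""
    p = np.asarray(p)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m    # BH step-up thresholds
    below = p[order] <= thresh
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.where(below)[0])          # largest rank passing BH
        mask[order[:k + 1]] = True
    return mask
```

Electrodes where `fdr_mask(p)` is True at α = 0.05 would be the ones highlighted on the cortical surface maps.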

To further investigate the neural encoding of spectrotemporal acoustic features during music
perception and music imagery, we analyzed all electrodes that were significant in at least
one condition (unless otherwise stated). Tuning curves were estimated from the spectrotemporal
receptive fields by first setting all inhibitory weights to zero, then averaging across the time
dimension and converting to standardized z-scores. Tuning peaks were identified as
significant peaks in the acoustic frequency tuning curves (z > 3.1; p < 0.001)
separated by more than one third of an octave.
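The tuning-curve and peak-selection steps above can be sketched as follows. The greedy peak selection is one plausible reading of the one-third-octave separation rule; function names and the selection strategy are assumptions.

```python
import numpy as np

def tuning_curve(strf):
    """Frequency tuning curve from an STRF (lags x freq): zero the
    inhibitory (negative) weights, average over the time dimension,
    and convert to z-scores."""
    exc = np.clip(strf, 0, None)       # keep excitatory weights only
    curve = exc.mean(axis=0)           # average across time lags
    return (curve - curve.mean()) / curve.std()

def tuning_peaks(curve, freqs, z_thresh=3.1, min_sep_octaves=1/3):
    """Indices of significant tuning peaks (z > 3.1), greedily keeping
    peaks separated by more than a third of an octave."""
    order = np.argsort(curve)[::-1]    # strongest bins first
    peaks = []
    for i in order:
        if curve[i] <= z_thresh:
            break                      # remaining bins are sub-threshold
        if all(abs(np.log2(freqs[i] / freqs[j])) > min_sep_octaves
               for j in peaks):
            peaks.append(int(i))
    return sorted(peaks)
```

With `freqs` log-spaced between 200 and 7,000 Hz (as in the auditory spectrogram), the octave distance between bins i and j is simply |log2(freqs[i]/freqs[j])|.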

Decoding accuracy was assessed by calculating the correlation coefficient (Pearson's r)
between the reconstructed and the original music spectrogram representation using test set
data. Overall reconstruction accuracy was computed by averaging over acoustic frequencies
and resamples, and the standard error of the mean (SEM) was computed by taking the standard
deviation across resamples. Finally, to further assess reconstruction accuracy, we
evaluated the ability to identify isolated piano notes from the test set auditory spectrogram
reconstructions, using a similar approach to that in [33,54].

Acknowledgments
This work was supported by the Zeno-Karl Schindler Foundation, NINDS Grant R3721135,
NIH 5K99DC012804 and the Nielsen Corporation.

References
1. Meister IG, Krings T, Foltys H, Boroojerdi B, Müller M, Töpper R, et al. Playing piano in the mind--an fMRI study on music imagery and performance in pianists. Brain Res Cogn Brain Res. 2004;19: 219–228. doi:10.1016/j.cogbrainres.2003.12.005
2. Hubbard TL. Auditory imagery: Empirical findings. Psychol Bull. 2010;136: 302–329. doi:10.1037/a0018436
3. Halpern AR. Memory for the absolute pitch of familiar songs. Mem Cognit. 1989;17: 572–581.
4. Halpern AR, Zatorre RJ, Bouffard M, Johnson JA. Behavioral and neural correlates of perceived and imagined musical timbre. Neuropsychologia. 2004;42: 1281–1292. doi:10.1016/j.neuropsychologia.2003.12.017
5. Pitt MA, Crowder RG. The role of spectral and dynamic cues in imagery for musical timbre. J Exp Psychol Hum Percept Perform. 1992;18: 728–738.
6. Intons-Peterson M. Components of auditory imagery. In: Reisberg D, editor. Auditory imagery. Hillsdale, NJ: L. Erlbaum Associates; 1992. pp. 45–72.
7. Halpern AR. Mental scanning in auditory imagery for songs. J Exp Psychol Learn Mem Cogn. 1988;14: 434–443.
8. Kosslyn SM, Ganis G, Thompson WL. Neural foundations of imagery. Nat Rev Neurosci. 2001;2: 635–642. doi:10.1038/35090055
9. Zatorre RJ, Halpern AR. Effect of unilateral temporal-lobe excision on perception and imagery of songs. Neuropsychologia. 1993;31: 221–232.
10. Griffiths TD. Human complex sound analysis. Clin Sci (Lond). 1999;96: 231–234.
11. Halpern AR. When That Tune Runs Through Your Head: A PET Investigation of Auditory Imagery for Familiar Melodies. Cereb Cortex. 1999;9: 697–704. doi:10.1093/cercor/9.7.697
12. Kraemer DJM, Macrae CN, Green AE, Kelley WM. Musical imagery: Sound of silence activates auditory cortex. Nature. 2005;434: 158. doi:10.1038/434158a
13. Rauschecker JP. Cortical plasticity and music. Ann N Y Acad Sci. 2001;930: 330–336.
14. Zatorre RJ, Halpern AR, Perry DW, Meyer E, Evans AC. Hearing in the Mind's Ear: A PET Investigation of Musical Imagery and Perception. J Cogn Neurosci. 1996;8: 29–46. doi:10.1162/jocn.1996.8.1.29
15. Zatorre RJ, Halpern AR. Mental Concerts: Musical Imagery and Auditory Cortex. Neuron. 2005;47: 9–12. doi:10.1016/j.neuron.2005.06.013
16. Zatorre RJ, Halpern AR, Bouffard M. Mental Reversal of Imagined Melodies: A Role for the Posterior Parietal Cortex. J Cogn Neurosci. 2009;22: 775–789. doi:10.1162/jocn.2009.21239
17. Halpern AR, Zatorre RJ. When That Tune Runs Through Your Head: A PET Investigation of Auditory Imagery for Familiar Melodies. Cereb Cortex. 1999;9: 697–704. doi:10.1093/cercor/9.7.697
18. Hickok G, Buchsbaum B, Humphries C, Muftuler T. Auditory-motor interaction revealed by fMRI: speech, music, and working memory in area Spt. J Cogn Neurosci. 2003;15: 673–682. doi:10.1162/089892903322307393
19. Meyer M, Elmer S, Baumann S, Jancke L. Short-term plasticity in the auditory system: differential neural responses to perception and imagery of speech and music. Restor Neurol Neurosci. 2007;25: 411–431.
20. Halpern AR. Cerebral substrates of musical imagery. Ann N Y Acad Sci. 2001;930: 179–192.
21. Mikumo M. Motor Encoding Strategy for Pitches of Melodies. Music Percept. 1994;12: 175–197. doi:10.2307/40285650
22. Petsche H, von Stein A, Filz O. EEG aspects of mentally playing an instrument. Brain Res Cogn Brain Res. 1996;3: 115–123.
23. Brodsky W, Henik A, Rubinstein B-S, Zorman M. Auditory imagery from musical notation in expert musicians. Percept Psychophys. 2003;65: 602–612.
24. Schürmann M, Raij T, Fujiki N, Hari R. Mind's ear in a musician: where and when in the brain. NeuroImage. 2002;16: 434–440. doi:10.1006/nimg.2002.1098
25. Bunzeck N, Wuestenberg T, Lutz K, Heinze H-J, Jancke L. Scanning silence: mental imagery of complex sounds. NeuroImage. 2005;26: 1119–1127. doi:10.1016/j.neuroimage.2005.03.013
26. Yoo SS, Lee CU, Choi BG. Human brain mapping of auditory imagery: event-related functional MRI study. Neuroreport. 2001;12: 3045–3049.
27. Aertsen AMHJ, Olders JHJ, Johannesma PIM. Spectro-temporal receptive fields of auditory neurons in the grassfrog: III. Analysis of the stimulus-event relation for natural stimuli. Biol Cybern. 1981;39: 195–209. doi:10.1007/BF00342772
28. Eggermont JJ, Aertsen AMHJ, Johannesma PIM. Quantitative characterisation procedure for auditory neurons based on the spectro-temporal receptive field. Hear Res. 1983;10: 167–190. doi:10.1016/0378-5955(83)90052-7
29. Tian B. Processing of Frequency-Modulated Sounds in the Lateral Auditory Belt Cortex of the Rhesus Monkey. J Neurophysiol. 2004;92: 2993–3013. doi:10.1152/jn.00472.2003
30. Saenz M, Langers DRM. Tonotopic mapping of human auditory cortex. Hear Res. 2014;307: 42–52. doi:10.1016/j.heares.2013.07.016
31. Chi T, Ru P, Shamma SA. Multiresolution spectrotemporal analysis of complex sounds. J Acoust Soc Am. 2005;118: 887. doi:10.1121/1.1945807
32. Clopton BM, Backoff PM. Spectrotemporal receptive fields of neurons in cochlear nucleus of guinea pig. Hear Res. 1991;52: 329–344.
33. Pasley BN, David SV, Mesgarani N, Flinker A, Shamma SA, Crone NE, et al. Reconstructing Speech from Human Auditory Cortex. PLoS Biol. 2012;10: e1001251. doi:10.1371/journal.pbio.1001251
34. Theunissen FE, Sen K, Doupe AJ. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J Neurosci. 2000;20: 2315–2331.
35. Lotte F, Brumberg JS, Brunner P, Gunduz A, Ritaccio AL, Guan C, et al. Electrocorticographic representations of segmental features in continuous speech. Front Hum Neurosci. 2015;9. doi:10.3389/fnhum.2015.00097
36. Mesgarani N, Cheung C, Johnson K, Chang EF. Phonetic Feature Encoding in Human Superior Temporal Gyrus. Science. 2014;343: 1006–1010. doi:10.1126/science.1245994
37. Tankus A, Fried I, Shoham S. Structured neuronal encoding and decoding of human speech features. Nat Commun. 2012;3: 1015. doi:10.1038/ncomms1995
38. Huth AG, de Heer WA, Griffiths TL, Theunissen FE, Gallant JL. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature. 2016;532: 453–458. doi:10.1038/nature17637
39. Lachaux J-P, Axmacher N, Mormann F, Halgren E, Crone NE. High-frequency neural activity and human cognition: past, present and possible future of intracranial EEG research. Prog Neurobiol. 2012;98: 279–301. doi:10.1016/j.pneurobio.2012.06.008
40. Miller KJ, Leuthardt EC, Schalk G, Rao RPN, Anderson NR, Moran DW, et al. Spectral changes in cortical surface potentials during motor movement. J Neurosci. 2007;27: 2424–2432. doi:10.1523/JNEUROSCI.3886-06.2007
41. Boonstra TW, Houweling S, Muskulus M. Does Asynchronous Neuronal Activity Average out on a Macroscopic Scale? J Neurosci. 2009;29: 8871–8874. doi:10.1523/JNEUROSCI.2020-09.2009
42. Towle VL, Yoon H-A, Castelle M, Edgar JC, Biassou NM, Frim DM, et al. ECoG gamma activity during a language task: differentiating expressive and receptive speech areas. Brain. 2008;131: 2013–2027. doi:10.1093/brain/awn147
43. Llorens A, Trébuchon A, Liégeois-Chauvel C, Alario F-X. Intra-Cranial Recordings of Brain Activity During Language Production. Front Psychol. 2011; doi:10.3389/fpsyg.2011.00375
44. Crone NE, Boatman D, Gordon B, Hao L. Induced electrocorticographic gamma activity during auditory perception. Brazier Award-winning article, 2001. Clin Neurophysiol. 2001;112: 565–582.
45. Sturm I, Blankertz B, Potes C, Schalk G, Curio G. ECoG high gamma activity reveals distinct cortical representations of lyrics passages, harmonic and timbre-related changes in a rock song. Front Hum Neurosci. 2014;8. doi:10.3389/fnhum.2014.00798
46. Aleman A, Nieuwenstein MR, Böcker KB, de Haan EH. Music training and mental imagery ability. Neuropsychologia. 2000;38: 1664–1668. doi:10.1016/S0028-3932(00)00079-8
47. Janata P, Paroo K. Acuity of auditory images in pitch and time. Percept Psychophys. 2006;68: 829–844.
48. Lotze M, Scheler G, Tan H-RM, Braun C, Birbaumer N. The musician's brain: functional imaging of amateurs and professionals during performance and imagery. NeuroImage. 2003;20: 1817–1829.
49. Ellis D. Dynamic time warping (DTW) in Matlab [Internet]. 2003. Available: http://www.ee.columbia.edu/~dpwe/resources/matlab/dtw/
50. Lartillot O, Toiviainen P, Eerola T. A Matlab Toolbox for Music Information Retrieval. In: Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R, editors. Data Analysis, Machine Learning and Applications. Berlin, Heidelberg: Springer; 2008. pp. 261–268. Available: http://link.springer.com/10.1007/978-3-540-78246-9_31
51. Haynes J-D, Rees G. Predicting the orientation of invisible stimuli from activity in human primary visual cortex. Nat Neurosci. 2005;8: 686–691. doi:10.1038/nn1445
52. Horikawa T, Tamaki M, Miyawaki Y, Kamitani Y. Neural Decoding of Visual Imagery During Sleep. Science. 2013;340: 639–642. doi:10.1126/science.1234330
53. Reddy L, Tsuchiya N, Serre T. Reading the mind's eye: Decoding category information during mental imagery. NeuroImage. 2010;50: 818–825. doi:10.1016/j.neuroimage.2009.11.084
54. Martin S, Brunner P, Holdgraf C, Heinze H-J, Crone NE, Rieger J, et al. Decoding spectrotemporal features of overt and covert speech from the human cortex. Front Neuroeng. 2014; doi:10.3389/fneng.2014.00014
55. Leonard MK, Baud MO, Sjerps MJ, Chang EF. Perceptual restoration of masked speech in human cortex. Nat Commun. 2016; doi:10.1038/ncomms13619
56. Holdgraf CR, de Heer W, Pasley B, Rieger J, Crone N, Lin JJ, et al. Rapid tuning shifts in human auditory cortex enhance speech intelligibility. Nat Commun. 2016;7: 13654. doi:10.1038/ncomms13654
57. Kosslyn SM, Thompson WL. Shared mechanisms in visual imagery and visual perception: Insights from cognitive neuroscience. In: Gazzaniga MS, editor. The New Cognitive Neurosciences. 2nd ed. Cambridge, MA: MIT Press; 2000.
58. Kosslyn SM, Thompson WL. When is early visual cortex activated during visual mental imagery? Psychol Bull. 2003;129: 723–746. doi:10.1037/0033-2909.129.5.723
59. Herholz SC, Lappe C, Knief A, Pantev C. Neural basis of music imagery and the effect of musical expertise. Eur J Neurosci. 2008;28: 2352–2360. doi:10.1111/j.1460-9568.2008.06515.x
60. Pantev C, Oostenveld R, Engelien A, Ross B, Roberts LE, Hoke M. Increased auditory cortical representation in musicians. Nature. 1998;392: 811–814. doi:10.1038/33918
61. Wu MC-K, David SV, Gallant JL. Complete functional characterization of sensory neurons by system identification. Annu Rev Neurosci. 2006;29: 477–505. doi:10.1146/annurev.neuro.29.051605.113024
62. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12: 2825–2830.
63. Garofolo JS. TIMIT acoustic-phonetic continuous speech corpus. Linguistic Data Consortium; 1993.

Supporting Information

Figure S1. Anatomical distribution of significant electrodes. Electrodes with significant encoding accuracy overlaid on a cortical surface reconstruction of the participant's cerebrum. To define auditory sensory areas (pink), we built an encoding model on data recorded while the participant listened passively to speech sentences from the TIMIT corpus [63]. Electrodes with significant encoding accuracy (p < 0.05; FDR correction) are highlighted.
Figure S2. Spectrotemporal receptive fields. STRFs for the perception (top panel) and imagery (bottom panel) models. Grey electrodes were removed from the analysis due to excessive noise.
Figure S3. Neural encoding during passive listening. Standard STRFs for the passive listening model, built on data recorded while the participant listened passively to speech sentences from the TIMIT corpus [63]. Grey electrodes were removed from the analysis due to excessive noise. Encoding accuracy is plotted on the cortical surface reconstruction of the participant's cerebrum (map thresholded at p < 0.05; FDR correction).