
The role of modulations in hearing

Monty A. Escabí
Electrical & Computer Engineering, Biomedical Engineering, Psychology, UConn Storrs
Hearing is an Extremely Versatile Sense!
Overview
Natural Sounds and Acoustics
• Acoustic properties of natural sounds
• Temporal and spectral modulation content

• Perception of temporal and spectral cues

• The role of modulations in speech production, vocalizations, and natural sounds

Physiology

• Encoding of modulations in the auditory nerve
• Encoding of modulations in the brainstem, midbrain, and cortex

• Encoding of speech and vocalizations in the inferior colliculus and auditory cortex
Ecological principles of hearing

i. "Natural acoustic environments" guided the development of the auditory system over millions of years of evolution.
ii. The auditory system evolved so that it optimally encodes natural sounds.
iii. To understand how the auditory system functions, one must also understand the acoustic structure of biologically and behaviorally relevant inputs (sounds).
What is a natural sound?
i. Natural sounds are often species dependent
   i. Humans: speech
   ii. Other mammals: vocalized communication sounds
   iii. Sounds emitted by predators
   iv. Navigation (e.g., bats, whales, dolphins)
ii. Context dependent
   i. Mating sounds
   ii. Survival sounds (e.g., running water, predators)
   iii. Communication sounds (species specific)
iii. Background sounds
   i. Undesirable sounds (e.g., running water, rustling leaves, wind) - these usually "mask" a desirable and biologically meaningful sound.
Fourier Signal Analysis (1807)

Any signal can be constructed by a sum of sinusoids!

Jean-Baptiste Joseph Fourier (1768-1830)
Fourier Synthesis - Square Wave

[Figure: Fourier synthesis of a square wave - successive sums of odd-harmonic sinusoids approach the square wave.]
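The square-wave build-up shown above can be reproduced numerically. The sketch below sums odd harmonics with 1/k amplitudes (the Fourier series of a square wave); the sample rate, fundamental frequency, and number of terms are arbitrary values chosen for illustration.

```python
import numpy as np

def square_wave_partial_sum(t, f0, n_terms):
    """Sum the first n_terms odd harmonics of f0 (Fourier series of a square wave)."""
    x = np.zeros_like(t)
    for i in range(n_terms):
        k = 2 * i + 1                              # odd harmonic number: 1, 3, 5, ...
        x += (4.0 / (np.pi * k)) * np.sin(2 * np.pi * k * f0 * t)
    return x

fs = 16000                                         # assumed sample rate (Hz)
t = np.arange(0, 0.01, 1.0 / fs)                   # 10 ms of time
partial_sums = {n: square_wave_partial_sum(t, f0=200.0, n_terms=n) for n in (1, 2, 3, 20)}
# With more terms the sum converges toward a 200 Hz square wave.
```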

Signal Decomposition by the Auditory System (1863)

The Auditory System Functions Like a Spectrum Analyzer

Helmholtz (1821-1897)
The cochlea performs a frequency decomposition of the sound

[Figure: uncoiled cochlea schematic - high frequencies at the base, low frequencies at the apex. Adapted from Tonndorf (1960)]
Some Basic Auditory Percepts

• Loudness – a subjective sensation that allows you to order a sound on the basis of its physical power (intensity).

• Pitch - a subjective sensation by which a listener can order sounds on a scale on the basis of their physical frequency.
• Timbre - the quality of a sound that distinguishes it from other sounds of identical pitch and loudness.
  • In music, for instance, timbre allows you to distinguish an oboe from a trumpet.
  • Timbre is often associated with the spectrum of a sound.
Size principle - pitch is inversely related to the size of the resonator. Musical instruments come in families: instruments with different sizes, but the same shape and construction (violin, viola, cello), sound similar. (CNBH, PDN, University of Cambridge)

The "family" sound is the message; the "size" is the scaling.
Signal Frequency

[Figure: pure tones at f = 1 kHz (period τ = 1 msec), 2 kHz, and 4 kHz, plotted against time.]
Temporal Modulations Are Prominent Features in Natural Sounds
Temporal Auditory Percepts
• Periodicity Pitch - pitch percept resulting from the temporal modulations of a sound (50-1000 Hz).

• Residue pitch or pitch of the missing fundamental – Perceived pitch of a harmonic signal (e.g., 400, 600, 800, 1000 Hz components) that is missing the fundamental frequency (200 Hz).

• Rhythms – Perception of slow sound modulations below ~20 Hz.

• Timbre - timbre is not strictly a spectral percept, as is typically assumed. Temporal cues can also change the perceived timbre of a sound, and binaural cues can alter the perceived timbre as well.
Temporal Modulation

RED = carrier signal YELLOW = modulation envelope

The above signal is a sinusoidally amplitude modulated tone (SAM tone). It is expressed as:
x(t) = [1/2 + (1/2)·cos(2π·fm·t)] · cos(2π·fc·t)
fm = modulation frequency (Hz); fc = carrier frequency (Hz)
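A SAM tone of this form is straightforward to synthesize. The sketch below follows the equation above; the sample rate and the particular fm and fc values are assumed examples.

```python
import numpy as np

fs = 44100                                  # assumed sample rate (Hz)
t = np.arange(0, 1.0, 1.0 / fs)             # 1 s of time
fm, fc = 10.0, 1000.0                       # modulation and carrier frequencies (Hz)

envelope = 0.5 + 0.5 * np.cos(2 * np.pi * fm * t)      # [1/2 + 1/2*cos(2*pi*fm*t)]
sam_tone = envelope * np.cos(2 * np.pi * fc * t)       # x(t) = envelope * carrier
# 'envelope' is the (yellow) modulation envelope; cos(2*pi*fc*t) is the (red) carrier.
```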

Temporal Amplitude Modulation (Rhythm & "Roughness" Range)
[Figure: SAM tones, x(t) = A(t)·sin(2π·fc·t + φ) with A(t) = 1 + sin(2π·Fm·t + θ), for Fm = 5, 10, 20, and 40 Hz, plotted against time.]
Temporal Amplitude Modulation (Pitch Range)

[Figure: SAM tones, x(t) = A(t)·sin(2π·fc·t + φ), for Fm = 50, 100, 200, and 400 Hz, plotted against time.]
Pitch of the missing fundamental

[Figure: line spectrum of a harmonic tone complex with components at f0, 2f0, ..., 10f0.]
A harmonic tone complex has a perceived pitch at frequency f0.
Pitch of the missing fundamental

[Figure: the same harmonic complex with the f0 component removed.]
If you remove the fundamental component, f0, the pitch is still present.
Pitch of the missing fundamental:

Is pitch a temporal or a spectral percept?
[Figure: harmonic complex with the fundamental removed - components at 400, 800, 1200, and 1600 Hz; 200 msec shown.]
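A missing-fundamental stimulus like the one sketched above can be generated as a sum of harmonics. The sketch below uses the 200 Hz example from earlier in the section (components at 400, 600, 800, and 1000 Hz); duration and sample rate are assumed.

```python
import numpy as np

fs = 44100                                   # assumed sample rate (Hz)
t = np.arange(0, 0.2, 1.0 / fs)              # 200 ms
f0 = 200.0                                   # fundamental frequency (Hz)

# Harmonics 2-5 only: 400, 600, 800, 1000 Hz; no energy at f0 itself.
missing_f0 = sum(np.sin(2 * np.pi * k * f0 * t) for k in (2, 3, 4, 5))
with_f0 = missing_f0 + np.sin(2 * np.pi * f0 * t)
# Both stimuli evoke a pitch near 200 Hz, even though 'missing_f0'
# contains no spectral component at the fundamental.
```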

Existence Region for Prominent Temporal Percepts
[Reproduced page from Joris, Schreiner & Rees (2004), Physiol Rev 84. Their Fig. 2 maps where AM percepts exist as a function of modulation frequency (fm) and carrier frequency (fc): fluctuation/rhythm is strongest near 4 Hz and disappears around 20 Hz (the envelope of fluent speech also varies at about 4 Hz); roughness starts near 15 Hz, is strongest near 70 Hz, and disappears below 300 Hz; residue pitch arises for harmonic complexes within a bounded fm/fc region even when the lower harmonics are removed; and interaural time differences in the envelope are detectable in a region overlapping the residue-pitch region. The accompanying text contrasts two models of AM detection, a single lowpass "envelope detector" (bandpass filter, rectifier, lowpass filter) versus a modulation filterbank of channels tuned to different modulation frequencies supported by modulation masking and modulation detection interference (MDI) experiments, and notes that residue pitch can be carried purely temporally when no resolved components are present, although resolved components increase pitch salience.]
Timbre is not just strictly a spectral percept

- Sounds have identical periodicity
- Sounds have identical spectrum
- Sounds are time reversed
- Identical pitch/rhythm but different timbre

- Strong tonal percept

- Weak percussive percept

- Weak tonal percept

- Strong percussive percept

20 msec. Patterson & Irino 1998
Timbre is not just strictly a spectral percept
- Sounds have identical periodicity
- Sounds have identical spectrum
- Sounds are time reversed
- Identical pitch/rhythm but different timbre

Ramped Sinusoid

Ramped Noise

Damped Sinusoid

Damped Noise
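The ramped and damped stimuli named above can be built by applying an exponentially decaying envelope to a short carrier segment and then time-reversing it; reversal leaves the magnitude spectrum unchanged. The carrier frequency, decay constant, and segment length below are assumed illustrative values, not Patterson & Irino's exact parameters.

```python
import numpy as np

fs = 44100                                   # assumed sample rate (Hz)
t = np.arange(0, 0.02, 1.0 / fs)             # one 20 ms segment
fc = 1000.0                                  # carrier frequency (Hz), assumed
tau = 0.004                                  # envelope decay constant (s), assumed

damped_sin = np.exp(-t / tau) * np.sin(2 * np.pi * fc * t)   # abrupt onset, decaying tail
ramped_sin = damped_sin[::-1]                                # time-reversed: same magnitude spectrum

rng = np.random.default_rng(0)
damped_noise = np.exp(-t / tau) * rng.standard_normal(t.size)
ramped_noise = damped_noise[::-1]

# Repeating a segment gives periodic stimuli with identical periodicity and
# long-term spectra, yet the damped and ramped versions differ in timbre.
damped_train = np.tile(damped_sin, 25)
ramped_train = np.tile(ramped_sin, 25)
```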

20 msec. Patterson & Irino 1998
"The Speech Chain" (Harry Hearing, Larry Lynx)
Larynx Anatomy
Vocal folds (top view)
Vocal folds are a nonlinear free-air oscillator.

1) Vocal folds are not a motor driven oscillator.

2) They are essentially “flapping in the wind”.

3) Produce a quasi-periodic excitation.
Vocal Folds Produce a Quasi-Periodic Excitation Pattern

Glottal Pulses

[Figure: glottal pulse train; 100 msec shown; f0 = 100 Hz.]
Phonation During Human Speech

[Videos: vocal fold vibration at actual speed and with high-speed capture.]
Source-Filter Model for Speech Production

[Block diagram: lungs (airstream) → vibrating vocal folds (glottal pulses, the source) → vocal tract & articulators (the filter) → speech.]

Vocal Tract Resonances
Peaks in the speech spectrum that are created by the vocal tract resonances are called formants (F1, F2, F3).
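The source-filter idea can be sketched in a few lines of code: a quasi-periodic glottal pulse train (the source) is passed through a cascade of resonators placed at formant frequencies (the filter). The formant frequencies and bandwidths below are rough textbook-style values for a vowel like /a/ and are assumptions, not values from this lecture.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000                                   # assumed sample rate (Hz)
f0 = 100.0                                   # glottal pulse rate (Hz)
t = np.arange(0, 0.5, 1.0 / fs)

# Source: impulse train as a crude stand-in for quasi-periodic glottal pulses.
source = np.zeros_like(t)
source[::int(fs / f0)] = 1.0

# Filter: cascade of second-order resonators at assumed formant frequencies (F1, F2, F3).
speech = source
for fk, bw in [(730.0, 90.0), (1090.0, 110.0), (2440.0, 160.0)]:   # (center Hz, bandwidth Hz)
    r = np.exp(-np.pi * bw / fs)
    b = [1.0 - r]                                        # simple gain normalization
    a = [1.0, -2.0 * r * np.cos(2 * np.pi * fk / fs), r * r]
    speech = lfilter(b, a, speech)
# Changing (F1, F2, F3) changes the vowel; changing f0 changes the pitch.
```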

The vocal tract shaping creates spectral modulations.
Postural adjustments of the vocal tract and articulators (lips, tongue, soft palate) change the resonant properties of the vocal tract and oral cavity, and therefore the formant frequencies. This results in distinct formant patterns for different vowel sounds: the relationship between the first and second formant frequencies is distinct for different vowels.
Speech Production: Key Points
1) Vocal folds are the primary excitation source.

a) They increase sound intensity (compare with whispering).
b) They partly determine voice quality and pitch (e.g., male versus female voice).

2) Vocal tract shapes the spectrum of the speech sound and produces spectral cues in the form of formant frequencies.
Acoustic structure in animal communication signals is similar across many species

As with speech, many animal vocalizations contain:

1) Periodic Excitation

2) Slowly varying modulation / envelope
Speech and music have an ~1/f modulation spectrum
[Figure: modulation power, 10·log10(power), versus log10(fm) for speech and music. Voss & Clark, Nature 1975]
Natural sounds have an ~1/f modulation spectrum

[Figure: modulation power spectra (log10 power versus log10 modulation frequency) for several sound ensembles; each is approximately a straight line with negative slope, i.e., ~1/f. Voss & Clark; Attias & Schreiner 1998]
Natural sounds have an ~1/f modulation spectrum

1) The 1/f spectrum is defined by: S(f) = C · f^(−α)
2) Note that if α = 1 and C = 1, then: S(f) = f^(−1) = 1/f
3) Furthermore, note that in dB (taking C = 1):
S_dB(f) = 20·log10(S(f)) = 20·log10(f^(−α)) = −α·20·log10(f)
Therefore a plot of −α·20·log10(f) versus log10(f) is a straight line with negative slope (−20α per decade).
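Under this convention (S_dB = 20·log10(S)), the exponent α can be estimated from a measured modulation spectrum by a straight-line fit in log-log coordinates. A minimal sketch; the synthetic test spectrum is an assumption used only to check the fit:

```python
import numpy as np

def estimate_alpha(fm, S):
    """Fit S(f) = C * f**(-alpha): regress 20*log10(S) on log10(fm); slope = -20*alpha."""
    slope, _ = np.polyfit(np.log10(fm), 20.0 * np.log10(S), 1)
    return -slope / 20.0

# Synthetic check with an exact 1/f spectrum (alpha = 1, C = 1).
fm = np.logspace(-1, 2, 200)        # modulation frequencies from 0.1 to 100 Hz
S = 1.0 / fm
print(estimate_alpha(fm, S))        # prints ~1.0
```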

The cochlea decomposes a sound into its spectral and temporal components
The speech spectrum changes dynamically with time
Songbird vocalization
Static Ripple Sounds

Timbre = Perceptual quality often related to spectral shape.

[Figure: two static ripple spectra, with spectral densities of 1 cycle/octave and 0.5 cycle/octave, plotted as octave frequency (0-4 octaves; spectral period ΔF) versus time (0-1 sec).]

How much spectral and temporal resolution is necessary for sound recognition?
• Cochlea - very high resolution, ~30,000 hair cells
• Speech recognition - low frequency resolution, ~4 channels (R. Shannon et al., Science 1995); high temporal resolution
• Music perception - high frequency resolution, >32 channels; lower temporal resolution
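The channel manipulation behind these findings can be approximated with a noise vocoder in the spirit of Shannon et al. (1995): split the signal into a few frequency bands, keep only each band's slow envelope, and use the envelopes to modulate band-limited noise. This is a rough sketch rather than the published processing chain; the band edges, filter order, and envelope method are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=4, f_lo=100.0, f_hi=8000.0):
    """Crude noise vocoder: per-band envelopes imposed on band-limited noise."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    noise = np.random.default_rng(0).standard_normal(len(x))
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        band_env = np.abs(hilbert(sosfiltfilt(sos, x)))     # envelope of the analysis band
        out += band_env * sosfiltfilt(sos, noise)           # modulate band-limited noise
    return out

# With n_channels around 4, speech remains recognizable; music requires many more channels.
```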

What's the Neuronal Basis for this Dichotomy?
Cochlear Implant Simulation

[Audio demos] Speech: 2, 4, 8, 16, and 32 channels, plus the original. Music: 4, 8, 16, and 32 channels, plus the original.
Moving Ripple Sounds Contain Time and Frequency Modulations

[Figure: a moving (dynamic) ripple plotted as octave frequency (0-4 octaves; spectral period ΔF) versus time (0-1 sec; temporal period Δt).]
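A moving ripple is defined by a sinusoidal spectrotemporal envelope over log frequency and time. The sketch below builds only that envelope, which would then be imposed on a bank of log-spaced carriers; the parameter names and default values are illustrative assumptions.

```python
import numpy as np

def moving_ripple_envelope(duration=1.0, n_octaves=4.0, ripple_density=0.5,
                           ripple_velocity=4.0, n_freq=128, n_time=1000):
    """Spectrotemporal envelope 1 + sin(2*pi*(w*t + Omega*x)) of a moving ripple.
    ripple_density (Omega): cycles/octave; ripple_velocity (w): Hz."""
    t = np.linspace(0.0, duration, n_time)            # time (s)
    x = np.linspace(0.0, n_octaves, n_freq)           # octaves above the base frequency
    T, X = np.meshgrid(t, x)                          # (n_freq, n_time) grids
    return 1.0 + np.sin(2.0 * np.pi * (ripple_velocity * T + ripple_density * X))

envelope = moving_ripple_envelope()                   # values range from 0 to 2
```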

Spectro-Temporal Ripples serve as a building block for spectral and temporal modulations
Fourier Analysis - sinusoids are the basic building block

[Figure: sum of sinusoids building a square wave, as before.]
Spectro-Temporal Ripples serve as a building block for spectral and temporal modulations: speech and other complex sounds can be decomposed into ripples.
Natural Sounds Exhibit a Tradeoff Between Spectral and Temporal Modulations

Rodriguez et al 2010
How is spectral and temporal information encoded in the central auditory pathway?

[Diagram of the ascending auditory pathway: CN, MSO/LSO, NLL, IC, MGB, AC.]
The Cochlea
Cochlear Decomposition
Inner hair cells transduce the acoustic signal and send their output to the DCN.

M. Lenoir et al.
Hair cell nonlinearity rectifies the incoming sound.

Hair cell responds only to positive deflections (towards the kinocilium).
What does the hair cell rectification buy us?

1) Creates distortion products.
2) Distortion products allow the hair cell to demodulate the incoming sound (i.e., remove the carrier information and preserve the modulation).
3) Point (2) is especially important at high frequencies because auditory nerve fibers cannot phase-lock to the carrier.
What are the advantages of NOT representing the sound frequency in the auditory nerve temporal firing pattern?
1) Much of the content-carrying information is conveyed by the modulations.
2) Frequency is represented by the "place" on the cochlea. It would be "redundant" to represent it in the temporal firing pattern of auditory nerve fibers.
3) Specialized mechanisms would be required to phase-lock at high frequencies (e.g., barn owl). For most mammals, phase-locking < 1000 Hz.
4) It would require high metabolic demands.

[Block diagram: sound x(t) → tuning filter (mechanical) → rectifying nonlinearity g(x) → lowpass filter (hair cell synapse and membrane, ~1000 Hz cutoff frequency) → to CNS.]

Consider a single sinusoid input:

x(t) = cos(ωc·t), where ωc = 2π·fc
Let's apply a simple rectifying (squaring) nonlinearity:
y(t) = x(t)² = cos²(ωc·t)
To simplify, apply the trigonometric identity:
cos²(θ) = 1/2 + (1/2)·cos(2θ)

Hair cell rectification: Math perspective (single sinusoid)
Therefore the final output is:
y(t) = 1/2 + (1/2)·cos(2·ωc·t)
Key points:
1) The frequency of the input is ωc.
2) The output does NOT resemble the input.
3) The output contains two NEW frequencies: 2ωc and 0 (DC)!
Let's consider what happens when the input consists of the sum of TWO sinusoids.
Hair cell rectification: Math perspective (two sinusoids)

Consider a sum of two sinusoid inputs:

x(t) = cos(ω1·t) + cos(ω2·t)
As before, let's apply the rectifying nonlinearity:
y(t) = x(t)² = [cos(ω1·t) + cos(ω2·t)]²
     = cos²(ω1·t) + cos²(ω2·t) + 2·cos(ω1·t)·cos(ω2·t)
       (term A)     (term B)      (term C)

€ Hair cell rectification: Math perspective (two sinusoids)

As for the single tone example:

Term A: cos²(ω1·t) = 1/2 + (1/2)·cos(2·ω1·t)
Term B: cos²(ω2·t) = 1/2 + (1/2)·cos(2·ω2·t)
How about term C?
Hair cell rectification: Math perspective (two sinusoids)

Term C produces an Interaction/Distortion Product:

Term C: 2·cos(ω1·t)·cos(ω2·t)
To simplify, apply the trigonometric identity:
cos(α)·cos(β) = (1/2)·cos(α + β) + (1/2)·cos(α − β)

Term C simplifies to:

cos((ω1 + ω2)·t) + cos((ω2 − ω1)·t)

And the total output is:

y(t) = A + B + C
     = 1 + (1/2)·cos(2·ω1·t) + (1/2)·cos(2·ω2·t) + cos((ω1 + ω2)·t) + cos((ω2 − ω1)·t)
Hair cell rectification: Math perspective (two sinusoids)

Key points:

1) The input contains two frequencies: ω1 and ω2.
2) The output does NOT resemble the input.
3) The output contains five NEW frequencies:

0, 2ω1, 2ω2, ω2-ω1 and ω1+ω2!

4) The terms containing ω2-ω1 and ω1+ω2 are referred to as interaction products.

5) Note that ω2 − ω1 is the frequency of the modulation!
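These new frequencies can be verified numerically: square a two-tone signal and inspect its spectrum. The tone frequencies and detection threshold below are arbitrary example values.

```python
import numpy as np

fs = 16000                                   # assumed sample rate (Hz)
t = np.arange(0, 1.0, 1.0 / fs)              # exactly 1 s, so FFT bins land on integer Hz
f1, f2 = 1000.0, 1100.0
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

y = x ** 2                                   # squaring nonlinearity (stand-in for rectification)
amps = np.abs(np.fft.rfft(y)) / len(y)
freqs = np.fft.rfftfreq(len(y), 1.0 / fs)
print(freqs[amps > 0.1])                     # [0, 100, 2000, 2100, 2200] Hz:
                                             # DC, f2-f1, 2*f1, f1+f2, 2*f2
```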

Hair cell rectification and modulation extraction: Frequency domain perspective
[Block diagram and spectrum: a SAM tone passes through the hair cell tuning filter, the rectifying nonlinearity g(x), and the synaptic lowpass filter; distortion products appear at 0 (DC), fm, 2fm, fc − fm, fc, fc + fm, 2fc − fm, 2fc, and 2fc + fm.]
How does the hair cell process differ for LOW and HIGH frequency auditory nerve fibers?
High Frequency Auditory Nerve Fiber: the tuning filter and the lowpass filter do NOT overlap, so the output contains only the modulation signal.

[Figure: frequency-domain spectrum for a high-frequency fiber.]
1 kHz Auditory Nerve Fiber: the tuning filter and the lowpass filter overlap, so the output contains both the modulation and the carrier signals.
[Figure: frequency-domain spectrum for a 1 kHz fiber.]
Hair cell rectification and modulation extraction (high frequency fiber): Time domain perspective

SAM Input

Hair cell nonlinearity

Rectified

Membrane/synapse lowpass filter
Demodulated envelope
Hair cell rectification and modulation extraction (low frequency fiber): Time domain perspective

SAM Input

Hair cell nonlinearity

Rectified

Membrane/synapse lowpass filter
Hair cell output
Hair cell rectification and modulation extraction: Time domain perspective

Key points:
1) The input contains the carrier and the modulation envelope.
2) The rectification and lowpass filtering process removes the carrier for high frequency fibers.
3) The output of the hair cell approximates the envelope of the modulated signal for high frequency fibers.
4) The output of LOW frequency hair cells contains both modulation and carrier information.
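The whole high-frequency demodulation chain (rectify, then lowpass) can be sketched on a synthetic SAM tone. The ~1 kHz cutoff comes from the slides; the filter order, carrier, and modulation frequencies below are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 40000                                            # assumed sample rate (Hz)
t = np.arange(0, 0.5, 1.0 / fs)
fm, fc = 40.0, 4000.0                                 # modulation and carrier frequencies (Hz)
sam = (0.5 + 0.5 * np.cos(2 * np.pi * fm * t)) * np.cos(2 * np.pi * fc * t)

rectified = np.maximum(sam, 0.0)                      # half-wave rectification (hair cell nonlinearity)
sos = butter(2, 1000.0, btype='lowpass', fs=fs, output='sos')
hair_cell_output = sosfiltfilt(sos, rectified)        # membrane/synapse lowpass (~1 kHz cutoff)
# Because fc >> 1 kHz, the output follows the 40 Hz envelope and the carrier is removed;
# repeating this with fc = 1 kHz leaves residual carrier in the output.
```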

How are temporal modulations encoded in the brainstem and midbrain?
[Figure: SAM tones, x(t) = A(t)·sin(2π·fc·t + φ), for Fm = 5, 10, 20, and 40 Hz, plotted against time.]
Peripheral and central auditory neurons can phase-lock to the sound envelope
Example dot-rastergram for SAM Noise

[Dot rasters for a 1.3 kHz unit: 10 trials at each modulation frequency (5, 7, 10, and 14 Hz); 200 msec shown.]
Cycle Histogram

A cycle histogram is formed by folding (summing, Σ) the spike times over the modulation period.
Modulation Transfer Function (MTF)
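A temporal MTF is commonly summarized by the vector strength (synchronization) of spikes to the modulation envelope, computed from cycle histograms like the one above. A minimal sketch with a synthetic check; the spike times are made up for illustration.

```python
import numpy as np

def vector_strength(spike_times, fm):
    """Synchronization of spike times (s) to a modulator at fm (Hz): 1 = perfect locking, 0 = none."""
    phases = 2.0 * np.pi * fm * np.asarray(spike_times)
    return float(np.hypot(np.cos(phases).mean(), np.sin(phases).mean()))

rng = np.random.default_rng(0)
locked = np.arange(0, 1, 0.1) + 0.005 * rng.standard_normal(10)   # spikes near one envelope phase
print(vector_strength(locked, 10.0))                    # close to 1
print(vector_strength(rng.uniform(0, 1, 1000), 10.0))   # close to 0 (random spikes)
# Plotting vector strength against modulation frequency gives the temporal MTF.
```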

[Reproduced figures and text from Joris and Yin (1992), J. Acoust. Soc. Am. 91: auditory nerve responses to SAM tones. Period histograms, synchronization (R) and gain, and cumulative phase are shown for example fibers; phase-frequency plots are linear (slope ≈ 2.07 ms in the example). Superimposed and normalized MTFs of fibers with CFs from 0.38 to 36 kHz have a stereotyped lowpass shape: shallow low-fm slopes that rarely exceed 1 dB/oct and steep high-fm slopes (about −12.4 ± 4.6 dB/oct between the 3 dB and 10 dB cutoffs, with asymptotic slopes comparable to fifth- or sixth-order lowpass filters). MTF shape did not depend on modulation depth, and increasing SPL decreased synchronization at all fm without affecting the low-fm slope.]
AN fibers exhibit lowpass AM sensitivity
Auditory Nerve

Joris and Yin 1992
Modulation sensitivity in the brainstem (CN, MNTB, LSO) is predominantly lowpass
[Reproduced figures and text from Joris and Yin (1998), J. Neurophysiol.: modulation transfer functions and average-rate functions for LSO cells and their afferents (spherical bushy cells, globular bushy cells, MNTB cells). The afferents phase-lock strongly to the envelope over a wide range of modulation frequencies with little change in average rate, whereas LSO firing rates drop at high modulation frequencies, especially for contralateral modulation. Phase-frequency plots are linear, with delays increasing along the circuit: AN 1.55 ms, SBC 2.50 ms, GBC 2.32 ms, MNTB 3.16 ms, LSO ipsi 4.75 ms, LSO contra 5.32 ms.]

AM Sensitivity in the inferior colliculus is Bandpass
[Reproduced figure from Langner and Schreiner (1988), J. Neurophysiol. ("Periodicity coding in the ICC"): synchronized PSTHs of three single units and one multi-unit (IC759) in response to SAM tones at ten modulation frequencies (roughly 20-350 Hz), with the carrier set to each unit's CF. Responses are strongest near a best modulation frequency (BMF, marked by arrows), i.e., bandpass MTFs. Single-unit MTFs usually closely paralleled the multi-unit background response, and double-peaked MTFs were only very infrequently observed.]
Temporal Modulation Responses are Tuned in the Inferior Colliculus but not in the auditory nerve

Auditory Nerve Inferior Colliculus

Joris and Yin 1992; Langner and Schreiner 1988
Modulation upper cutoff frequencies decrease systematically along the ascending auditory pathway

Joris, Schreiner & Rees 2004

[Fig. 9 from Joris, Schreiner & Rees (2004): overview of rate-MTF and temporal-MTF properties (best modulation frequencies and upper cutoff frequencies) at different anatomical levels, compiled from published data across species. Both tBMFs and upper cutoffs decrease from the periphery toward the cortex. In primary auditory cortex, many neurons show bandpass tMTFs with following rates far lower than in the IC, and tBMFs are independent of CF, so AM information within each spectral band can be decomposed by different neurons into different AM ranges.]

Temporal Modulation Coding

Key points:
1) The auditory nerve has lowpass modulation selectivity with cutoffs near 1000 Hz.
2) Modulation cutoff frequencies are progressively reduced along the ascending auditory pathway.
3) Tuning changes from lowpass to bandpass in the IC.
What about frequency selectivity? How does it change along the auditory pathway?

[Diagram of the ascending auditory pathway: CN, MSO/LSO, NLL, IC, MGB, AC.]

Frequency Selectivity in the Auditory Nerve
- Auditory nerve fiber (ANF) tuning is relatively homogeneous.
- ANFs are sharply tuned at the best frequency.
- ANFs exhibit low thresholds.
Sumner and Palmer 2012
[Reproduced figures and text from Sumner and Palmer (2012), Eur. J. Neurosci. 36: ferret auditory nerve frequency response areas; minimum thresholds versus CF, which follow the behavioural audiogram; and tuning width versus CF, expressed as Q10 (broad at low CF, narrowing up to 7-8 kHz, then roughly constant) or as ERBs, which grow approximately as a power function of CF. Auditory nerve fibres also adapt: a high onset firing rate decays to a lower steady-state rate within a few tens of milliseconds.]
ª 2012 The Authors. European Journal of Neuroscience ª 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd European Journal of Neuroscience, 36, 2428–2439 Frequency Selectivity in the cochlear nucleus

- Substantially more heterogeneous than auditory nerve

- Frequency response areas exhibit inhibition 154 R. RAMACHANDRAN, K. A. DAVIS, AND B. J. MAY

discharges (n 5 134) and units that responded with a single onset spike at all frequencies and levels (n 5 12). Responses of onset units are not considered further in this report.

Frequency response maps Units with sustained responses were divided into three groups based on the patterns of excitation and inhibition re- vealed in their frequency response maps (Table 1). Represen- tative data for each unit type are shown in Fig. 2. In these plots, excitatory areas (■) are defined as stimulus conditions that elicited responses $1 SD above spontaneous activity; simi- larly, inhibitory areas ( ) indicate tone-driven rates $1 SD below spontaneous activity. Type V units (top) have a V- 154 R. RAMACHANDRAN, K. A. DAVIS, AND B. J. MAY shaped excitatory area that widens about unit BF (vertical line) with increasing sound levels. These units do not show inhibi- discharges (n 5 134) and units that responded with a single onset spike at all frequencies and levels (n 5 12). Responses of tory responses to pure tones. Type I units (middle) generally onset units are not considered further in this report. have an I-shaped excitatory area that maintains its sharp tuning at higher levels; this level tolerant excitatory area is flanked on Frequency response maps both sides by wide inhibitory areas. Some (predominantly low-BF) type I units show less pronounced inhibitory effects at Units with sustained responses were divided into three lower frequencies and thus exhibit more V-shaped excitatory groups based on the patterns of excitation and inhibition re- Frequencyareas; nonetheless, these Selectivity units are easily distinguishable from in the IC. vealed in their frequency response maps (Table 1). Represen- type V units based on the presence of strong above-BF inhi- tative data for each unit type are shown in Fig. 2. In these plots, bition. Type O units (bottom) are characterized by an O-shaped ■ excitatory areas ( ) are defined as stimulus conditions that island of excitation around BF threshold that gives way to elicited responses $1 SD above spontaneous activity; simi- inhibition at higher sound levels. Type O units may exhibit larly, inhibitory areas ( ) indicate tone-driven rates 154$1 SD R. RAMACHANDRAN, K. A. DAVIS, AND B. J. MAY below spontaneous activity. Type V units (top) have a V- additional excitatory areas, but the frequency location of these shaped excitatory area that widens about unit BF (verticaldischarges line) (n 5 134) and units that responded with a singleresponses is highly variable between units. Type V units were with increasing sound levels. These units do not showonset inhibi- spike at all frequencies and levels (n 5 12). Responsesthe of least abundant unit type in our sample (16/134); type O tory responses to pure tones. Type I units (middle) generallyonset units are not considered further in this report. units were the most prevalent (71/134). have an I-shaped excitatory area that maintains its sharp tuning Figure 3 shows the distribution of BFs for the three response at higher levels; this level tolerant excitatory area is flanked on , Frequency response maps types. The BFs of type V units were always low ( 3 kHz), both sides by wide inhibitory areas. Some (predominantly whereas the BFs of type I and type O units spanned most of the low-BF) type I units show less pronounced inhibitory effectsUnits at with sustained responses were divided intocat’s three range of audible frequencies. As in previous studies lower frequencies and thus exhibit more V-shaped excitatorygroups based on the patterns of excitation and inhibition(Aitkin re- et al. 
1975; Merzenich and Reid 1974), BFs increased areas; nonetheless, these units are easily distinguishablevealed from in their frequency response maps (Table 1). Represen-as the electrode advanced from the dorsal to ventral limits of type V units based on the presence of strong above-BFtative inhi- data for each unit type are shown in Fig. 2. In thesethe plots, ICC. Consequently, the unit types were not distributed bition. Type O units (bottom) are characterized by an O-shaped excitatory areas (■) are defined as stimulus conditionsuniformly that across the dorsoventral axis of the ICC. Type V island of excitation around BF threshold that gives way to elicited responses $1 SD above spontaneous activity;units simi- typically were recorded during the initial dorsal progres- inhibition at higher sound levels. Type O units maylarly, exhibit inhibitory areas ( ) indicate tone-driven rates $1 SD additional excitatory areas, but the frequency location of these sion of the electrode track, and type I units were recorded more below spontaneous activity. Type V units (top) have a V- responses is highly variable between units. Type V units were ventrally. Type O units dominated our sample and could be the least abundant unit type in our sample (16/134);shaped type O excitatory area that widens about unit BF (verticalfound line) throughout the course of most tracks. with increasing sound levels. These units do not show inhibi- units were the most prevalent (71/134). ICC unit types show differences in the range of frequencies FIG. 2. Frequency response maps for a type V unit (unit 4.01, exp. 96/09/12), Figure 3 shows the distribution of BFs for the three responsetory responses to pure tones. Type I units (middle) generallythat evoke excitatory responses. Frequency tuning was as- type I unit (unit 1.11, exp. 96/09/20), and type O unit (unit 2.01, exp. 96/09/20). types. The BFs of type V units were always low (,have3 kHz), an I-shaped excitatory area that maintains its sharp tuning Stimulus-driven rates are plotted against frequency at multiple sound levels (nu- merical labels on right). —, average spontaneous rates. ■ (u), excitatory (inhibi- whereas the BFs of type I and type O units spanned mostat higherof the levels; this level tolerant excitatory area is flankedTABLE on 1. Response properties of ICC units with sustained tory) response areas. Sound pressure levels (SPL) in all plots are given in dB cat’s range of audible frequencies. As in previousboth studies sides by wide inhibitory areas. Some (predominantlydischarge rates low-BF) type I units show less pronounced inhibitory effects at attenuation; absolute SPL varies with the acoustic calibration, but 0 dB attenuation (Aitkin et al. 1975; Merzenich and Reid 1974), BFs increased - Like the CN, IC frequency selectivityis near is 100 dBheterogeneous re 20 mPa for tones. Vertical lines indicate BFs. as the electrode advanced from the dorsal to ventral limitslower of frequencies and thus exhibit more V-shaped excitatory Type V Type I Type O the ICC. Consequently, the unit types were not distributedareas; nonetheless, these- units are easilyIC distinguishablealso exhibits from inhibitoryUnits responsesUnits Units sessed by calculating each unit’s Q values (BF divided by type V units based on the presence of strong above-BF inhi- uniformly across the dorsoventral axis of the ICC. Type V bandwidth of the excitatory area in the frequency response units typically were recorded during the initial dorsalbition. 
progres- Type O units (bottom) are characterized by an O-shapedNumber of units 16 47 71 sion of the electrode track, and type I units were recordedisland more of excitation around BF threshold that gives waySpontaneous to rate, spikes/s 1.25 9.8 11 map). At 10 dB above threshold (Fig. 4A), Q10 values increase ventrally. Type O units dominated our sample and couldinhibition be at higher sound levels. Type O units may exhibitBF-tone threshold re ANF, dB 12 4 1.5 with BF until ;10 kHz, after which they remain relatively Max rate for BF tone, spikes/s 58 101 34 unchanged. Most data points fall within the range of values found throughout the course of most tracks. additional excitatory areas, but the frequency location ofMax these rate for noise, spikes/s 72 63Ramachandran 40 et al 1999 ICC unit types show differences in the range of frequenciesresponses is highlyFIG. 2. variable Frequency betweenresponse maps units. for a Type type V V unit units (unitNoise were4.01, exp. threshold 96/09/12), re tone threshold, dB 3.7 4.1 2.2 recorded from auditory-nerve fibers (ANFs) in our laboratory that evoke excitatory responses. Frequency tuningthe was least as- abundanttype I unit unit (unit 1.11, typeexp. in 96/09/20), our sample and type (16/134); O unit (unit typeNorm 2.01, Oexp. slope 96/09/20). for tone, /dB 20.0012 20.0081 20.052 (Calhoun et al. 1997; Miller et al. 1997); therefore, low-level Stimulus-driven rates are plotted against frequency at multipleNorm sound slope levels for noise, (nu- /dB 0.0008 0.0019 20.032 units weremerical the most labels prevalent on right). —, (71/134). average spontaneous rates. ■ (u), excitatory (inhibi- frequency tuning in the ICC appears to be determined by TABLE 1. Response properties of ICC units with sustained Figure 3 shows the distribution of BFs for the three response tory) response areas. Sound pressure levels (SPL) in all plotsTable are entriesgiven in are dB median values. ANF, auditory nerve fiber. Noise thresholds peripheral processes. At 40 dB above threshold (Fig. 4B), type discharge rates types. Theattenuation; BFs of type absolute V SPL units varies were with always the acoustic low calibration, (,3 kHz), but 0 dB attenuation V units continue to follow the tuning properties of ANFs (data is near 100 dB re 20 mPa for tones. Vertical lines indicatewere BFs. computed over a bandwidth 10 dB above the best frequency (BF)-tone Type V Type I whereasType O the BFs of type I and type O units spanned most ofthreshold the for each unit. taken from Liberman 1978), while type O units show no cat’s range of audible frequencies. As in previous studies Units Units Units sessed by calculating each unit’s Q values (BF divided by (Aitkin et al. 1975; Merzenich and Reid 1974), BFs increased bandwidth of the excitatory area in the frequency response Number of units 16 47as 71 the electrode advanced from the dorsal to ventral limits of Spontaneous rate, spikes/s 1.25 9.8 11 map). At 10 dB above threshold (Fig. 4A), Q values increase the ICC. Consequently, the unit types were not distributed10 BF-tone threshold re ANF, dB 12 4 1.5 with BF until ;10 kHz, after which they remain relatively uniformly across the dorsoventral axis of the ICC. Type V Max rate for BF tone, spikes/s 58 101 34 unchanged. 
Most data points fall within the range of values Max rate for noise, spikes/s 72 63units 40 typically were recorded during the initial dorsal progres- recorded from auditory-nerve fibers (ANFs) in our laboratory Noise threshold re tone threshold, dB 3.7 4.1sion 2.2 of the electrode track, and type I units were recorded more Norm slope for tone, /dB 20.0012 20.0081 20.052 (Calhoun et al. 1997; Miller et al. 1997); therefore, low-level ventrally. Type O units dominated our sample and could be Norm slope for noise, /dB 0.0008 0.0019 20.032 frequency tuning in the ICC appears to be determined by found throughout the course of most tracks. Table entries are median values. ANF, auditory nerve fiber. Noise thresholds peripheral processes. At 40 dB above threshold (Fig. 4B), type ICC unitV types units show continue differences to follow in thetuning range of properties frequencies of ANFsFIG (data. 2. Frequency response maps for a type V unit (unit 4.01, exp. 96/09/12), were computed over a bandwidth 10 dB above the best frequencythat (BF)-tone evoke excitatory responses. Frequency tuning was as- type I unit (unit 1.11, exp. 96/09/20), and type O unit (unit 2.01, exp. 96/09/20). threshold for each unit. taken from Liberman 1978), while type O units showStimulus-driven no rates are plotted against frequency at multiple sound levels (nu- merical labels on right). —, average spontaneous rates. ■ (u), excitatory (inhibi- TABLE 1. Response properties of ICC units with sustained tory) response areas. Sound pressure levels (SPL) in all plots are given in dB discharge rates attenuation; absolute SPL varies with the acoustic calibration, but 0 dB attenuation is near 100 dB re 20 mPa for tones. Vertical lines indicate BFs. Type V Type I Type O Units Units Units sessed by calculating each unit’s Q values (BF divided by Number of units 16 47 71 bandwidth of the excitatory area in the frequency response Spontaneous rate, spikes/s 1.25 9.8 11 map). At 10 dB above threshold (Fig. 4A), Q10 values increase BF-tone threshold re ANF, dB 12 4 1.5 with BF until ;10 kHz, after which they remain relatively Max rate for BF tone, spikes/s 58 101 34 Max rate for noise, spikes/s 72 63 40 unchanged. Most data points fall within the range of values Noise threshold re tone threshold, dB 3.7 4.1 2.2 recorded from auditory-nerve fibers (ANFs) in our laboratory Norm slope for tone, /dB 20.0012 20.0081 20.052 (Calhoun et al. 1997; Miller et al. 1997); therefore, low-level Norm slope for noise, /dB 0.0008 0.0019 20.032 frequency tuning in the ICC appears to be determined by Table entries are median values. ANF, auditory nerve fiber. Noise thresholds peripheral processes. At 40 dB above threshold (Fig. 4B), type were computed over a bandwidth 10 dB above the best frequency (BF)-tone V units continue to follow the tuning properties of ANFs (data threshold for each unit. taken from Liberman 1978), while type O units show no How are spectral and temporal sound cues encoded in the IC Laminar Organization in the IC!

Morest & Oliver 1984
Organization of the IC

How are acoustic attributes organized within the IC?
1) Frequency organization

2)Spectral modulation preferences.

3)Temporal modulation preferences.

[IC schematic with dorsal-ventral and medial-lateral axes labeled and the frequency gradient indicated.]
Frequency Organization

[Anatomical views: the IC relative to auditory cortex and cerebellum, with dorsal-ventral, medial-lateral, and caudal-rostral axes labeled.]
Best Frequency Increases with Penetration Depth
Frequency organization within the IC volume

[Maps of best frequency (octaves) across the medial-lateral and dorsal-ventral extent of the IC.]
Discrete ~1/3 octave jumps in BF are observed as a function of penetration depth

Schreiner & Langner 1997
1/3 octave jumps extend along the laminar axis

This finding is consistent with the hypothesis that the anatomical laminae provide the substrate for the frequency resolution of the IC.
Organization of the IC

Frequency organization; circular organization for spectral resolution and temporal modulations (Schreiner and Langner 1988).
[IC schematic with dorsal-ventral and medial-lateral axes labeled.]
Traditional Approach for Measuring Neural Sensitivity: the Frequency Response Area
1) Play a sound.
2) Measure the firing rate.

This approach assumes that firing rate is the key response variable. It completely ignores phase-locking and the temporal pattern of the response.
Alternative approach

1) Play a persistent complex sound.

The sound should contain a high degree of complexity so that many sound features are covered.

2) Let the neuron tell you what acoustic features it likes!
Example persistent sounds
Measuring Neuronal Sensitivity - the "Spectro-Temporal Receptive Field (STRF)"

Neuronal Response
STRF - two alternative interpretations:
1) Sound point of view - the STRF can be viewed as the "average" or "optimal" sound that tends to activate the neuron (the sounds that produce action potentials).
2) Neuron point of view - the STRF can alternately be viewed as the functional integration performed by the neuron.
   a) Red indicates excitation whereas blue indicates inhibition.
   b) The duration of the STRF tells you about the integration time.
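The "sound point of view" is often implemented as a spike-triggered average of the stimulus spectrogram, which gives a simple STRF estimate (adequate for uncorrelated stimuli; correlated sounds need an additional correction). The array shapes and bin sizes below are assumptions.

```python
import numpy as np

def strf_sta(spectrogram, spike_counts, n_lags):
    """Spike-triggered average of a spectrogram.
    spectrogram: (n_freq, n_time) stimulus; spike_counts: (n_time,) spikes per time bin;
    returns (n_freq, n_lags) with columns ordered from n_lags bins before a spike up to the spike."""
    n_freq, n_time = spectrogram.shape
    sta = np.zeros((n_freq, n_lags))
    for ti in range(n_lags, n_time):
        if spike_counts[ti] > 0:
            sta += spike_counts[ti] * spectrogram[:, ti - n_lags:ti]
    n_spikes = spike_counts[n_lags:].sum()
    return sta / max(n_spikes, 1)
```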

Spectrotemporal Receptive Field (STRF)
Time and frequency resolution can be measured from the STRF:
Time resolution = STRF average duration = Δt
Frequency resolution = STRF average bandwidth = Δf
Latency and best frequency (BF) can be defined from the excitatory peak.

Modulation preferences depend on the excitatory/inhibitory relationship within the STRF.

STRF Preference

Spectral: on-off

Temporal: on

Modulation Preference

Spectral MTF: Bandpass

Temporal MTF: Lowpass

STRF Preference

Spectral: on

Temporal: on-off

Modulation Preference

Spectral MTF: Lowpass

Temporal MTF: Bandpass
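The spectral and temporal modulation transfer functions summarized above can be obtained from the two-dimensional Fourier transform of the STRF (its ripple transfer function). The sketch below shows that computation under simple assumptions (magnitude spectrum, collapsed by taking the maximum along the other modulation axis); it is an illustration, not the exact analysis of the cited papers.

```python
import numpy as np

def strf_mtfs(strf, dt, df):
    """Temporal and spectral modulation transfer functions from an STRF.

    strf : (n_freq, n_lags) array; dt in s, df in octaves.
    Returns (temporal_mod_freqs_Hz, tMTF, spectral_mod_freqs_cyc_per_oct, sMTF).
    """
    rtf = np.abs(np.fft.fft2(strf))                  # ripple transfer function magnitude
    tmf = np.fft.fftfreq(strf.shape[1], d=dt)        # temporal modulation frequencies (Hz)
    smf = np.fft.fftfreq(strf.shape[0], d=df)        # spectral modulation frequencies (cyc/oct)
    t_mtf = rtf.max(axis=0)                          # collapse over spectral modulation
    s_mtf = rtf.max(axis=1)                          # collapse over temporal modulation
    keep_t, keep_s = tmf >= 0, smf >= 0              # keep the non-negative quadrant
    return tmf[keep_t], t_mtf[keep_t], smf[keep_s], s_mtf[keep_s]

# A unit is "lowpass" if its MTF peaks at (or near) zero modulation frequency,
# and "bandpass" if the peak sits at a clearly non-zero modulation frequency.
```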

Example STRFs (Miller et al 2002; Escabi et al 2002)

IC Units Exhibit a Spectrotemporal Resolution Tradeoff

[Scatter plot: spectral resolution vs. temporal resolution of IC units, showing a tradeoff between high spectral and high temporal resolution] (Qiu et al 2003; Rodriguez 2010)

Tradeoff resembles the modulation spectrum of natural sounds
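For comparison with the neural tradeoff, the modulation spectrum of a natural sound can be computed directly from its spectrogram with a 2D Fourier transform. This is a generic sketch under that assumption (octave-spaced frequency axis, mean-subtracted spectrogram); it is not the specific analysis pipeline used in the cited work.

```python
import numpy as np

def modulation_spectrum(spectrogram, dt, df):
    """Modulation power spectrum of a sound from its (log-)spectrogram.

    spectrogram : (n_freq, n_time) array with the frequency axis in equal octave steps
    dt : time-bin width (s); df : frequency-bin width (octaves)
    Returns (temporal_mod_freqs_Hz, spectral_mod_freqs_cyc_per_oct, power).
    """
    x = spectrogram - spectrogram.mean()                     # remove the DC component
    mps = np.abs(np.fft.fftshift(np.fft.fft2(x))) ** 2       # modulation power
    tmf = np.fft.fftshift(np.fft.fftfreq(spectrogram.shape[1], d=dt))
    smf = np.fft.fftshift(np.fft.fftfreq(spectrogram.shape[0], d=df))
    return tmf, smf, mps
```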

Rodriguez et al 2010

The ICC exhibits a tradeoff between temporal and spectral modulation preferences; the tradeoff is not evident in the thalamus (MGBv) or in cortex (AI) (Escabi et al 2002; Miller et al 2002).

Tonotopic Gradient is Evident in the IC With The Electrode Array

Modulation tradeoff is organized along the frequency dimension.

[Bar plots: temporal modulation frequency and spectral modulation frequency vs. best-frequency group (2-4, 4-8, 8-16, 16-32 kHz); asterisks mark significant differences, N.S. = not significant]

• Low-frequency neurons are fast (prefer high temporal modulation frequencies); higher-frequency neurons are slow (prefer low temporal modulation frequencies).

• For spectral preferences, low-frequency neurons have coarse spectral resolution, while high-frequency neurons have high spectral resolution.

How are spectrotemporal preferences organized within and across the IC lamina?

Constant BF

[Recording schematic: IC, CTX, and cerebellum, with dorsal-ventral, medial-lateral, and caudal-rostral axes]

Temporal preferences are organized within a frequency lamina (G. Langner et al., Hearing Research 168 (2002) 110-130):

The latency was around 4 ms, which may correspond to the total conduction time from the auditory nerve to the output of the IC unit in addition to an acoustic delay of about 0.1 ms. Since for each BF there is not only a distribution of latencies but also of BMFs, one might expect a systematic relationship between onset latency and BMF. For units in which BF, rate-BMF, and sync-BMF could all be measured and rate- and sync-BMF were in close register, a two-dimensional regression analysis (ANOVA) between latency (L) and the periods 1/BMF and 1/BF revealed the following relationship:

L = (7.24 ± 0.52) ms + (0.15 ± 0.03) × 1/BMF + (1.6 ± 0.34) × 1/BF,

with BMF and BF measured in kHz (so that 1/BMF and 1/BF are in ms).
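As a quick check on the units in this regression (an illustration added here, not from Langner et al.), note that with BMF and BF in kHz the reciprocals are in milliseconds, so a predicted latency is easy to compute for a hypothetical unit:

```python
# Predicted onset latency (ms) from the regression above, ignoring the
# confidence intervals on the coefficients. BMF and BF are in kHz.
def predicted_latency_ms(bmf_khz, bf_khz):
    return 7.24 + 0.15 * (1.0 / bmf_khz) + 1.6 * (1.0 / bf_khz)

# Hypothetical unit with BF = 1 kHz and BMF = 100 Hz (0.1 kHz):
# 7.24 + 0.15 * 10 + 1.6 * 1 = 10.34 ms
print(predicted_latency_ms(0.1, 1.0))
```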

Fig. 9 (Langner et al 2002). Periodicity and latency map in an awake chinchilla. (a) BMFs of 52 multiple units with bandpass MTFs in the 6-kHz frequency band of the ICC. The fundamental plane covers nearly the whole frequency band, while the vertical axis represents more than five octaves of BMFs, with low BMFs medial and high BMFs lateral; the inset shows the distribution of data points, and the color bar indicates BMF in Hz. (b) Latency map for the same and additional units (83 in total), with long latencies medial and short latencies lateral; the color bar indicates latency in ms. Note the different viewing angles in (a) and (b).



Laminar Organization (11-16 kHz): Spectrotemporal Resolution
[Maps across a lamina (medial-lateral × caudal-rostral): STRF, latency, spectral modulation frequency, and temporal modulation frequency]

Periodicity and Tonotopy in the Primate IC

Baumann et al 2011

Periodicity and Tonotopy are approximately orthogonal in the Primate IC

Baumann et al 2011

Langner et al.: Pitch map in cat AI
[Optical imaging of intrinsic signals in cat auditory cortex (Dinse et al 2009): (A) single-condition maps for pure-tone bursts of 0.8-12.8 kHz; (B) single-condition maps for harmonic sounds with fundamental frequencies between 25 and 400 Hz; (C) frequency composite map; (D) periodicity composite map. Increasing the tone frequency shifts the activated area from caudal to rostral, confirming the tonotopic gradient along the rostro-caudal axis, while increasing the fundamental frequency shifts the activated area from dorsal to ventral, so periodicity is mapped along a ventro-dorsal gradient approximately orthogonal to the frequency representation in AI.]

Modulation preferences are also organized in auditory cortex

What does brain activity tell us about natural sounds?

[Mesgarani, David, Fritz & Shamma, J Neurophysiol 102 (2009), Fig. 1: optimal prior vs. flat prior stimulus reconstruction. Optimal prior reconstruction (G) is the optimal linear mapping from a population of neuronal responses back to the sound spectrogram; it can recover not only the features that the neurons encode explicitly but also features that are correlated with them. Flat prior reconstruction (F) inverts the neurons' spectro-temporal receptive fields and therefore uses only the explicitly encoded information.]

"Forward" Model of the Brain: Can we predict brain activity given a sound?
Stimulus → Mathematical Model / Computer Algorithm → Predicted responses (compared against the recorded neural responses).
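As a concrete illustration of the forward (encoding) direction, the sketch below predicts a firing-rate waveform by linearly filtering a stimulus spectrogram with a measured STRF. This is a minimal stand-in for the encoding models used in the work cited above (which add regularization and output nonlinearities); the names, lag convention, and rectification step are assumptions for illustration.

```python
import numpy as np

def predict_response(strf, spectrogram):
    """Forward prediction: linearly filter the spectrogram with an STRF.

    strf        : (n_freq, n_lags) array, lag axis ordered earliest -> latest
                  (the convention used in the STA sketch earlier)
    spectrogram : (n_freq, n_time) array on the same frequency axis as the STRF
    Returns a predicted rate, half-wave rectified as a crude stand-in for a
    non-negative firing rate.
    """
    n_freq, n_lags = strf.shape
    n_time = spectrogram.shape[1]
    r = np.zeros(n_time)
    for u in range(n_lags):                      # u = number of bins before time t
        w = strf[:, n_lags - 1 - u]              # STRF column at that lag
        r[u:] += w @ spectrogram[:, :n_time - u]
    return np.maximum(r, 0.0)
```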

Taking the "Brain's Point of View"

1) Unlike the experimenter, the brain does not care about models that can predict neural responses

2) Instead, the brain infers information about relevant environmental sounds from neural responses.

["Forward" Model of the Brain, shown again: Can we predict brain activity given a sound? (same schematic and Mesgarani et al. 2009 figure as above)]

"Inverse" Model of the Brain: Can we predict a sound given brain activity?
Neural responses → Mathematical Model / Computer Algorithm → Reconstructed speech (stimulus reconstruction; Mesgarani et al., J Neurophysiol 2009).
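To make the inverse direction concrete, here is a minimal sketch of linear stimulus reconstruction in the spirit of the "optimal prior" approach: a ridge-regularized least-squares mapping from a lagged population response matrix back to the spectrogram. The synthetic data, variable names, lag structure, and regularization are all assumptions for illustration, not the exact procedure of Mesgarani et al. (2009).

```python
import numpy as np

def fit_reconstruction_filter(responses, spectrogram, n_lags, ridge=1e-3):
    """Ridge-regularized linear mapping from neural responses back to the spectrogram.

    responses   : (n_neurons, n_time) firing rates
    spectrogram : (n_freq, n_time) stimulus spectrogram
    n_lags      : response bins (current and future) used per stimulus bin,
                  since responses lag the stimulus
    Returns (G, R) with G of shape (n_freq, n_neurons * n_lags) and spectrogram ~= G @ R.
    """
    n_neurons, n_time = responses.shape
    R = np.zeros((n_neurons * n_lags, n_time))
    for lag in range(n_lags):
        # column t of block `lag` holds the responses at time t + lag
        R[lag * n_neurons:(lag + 1) * n_neurons, :n_time - lag] = responses[:, lag:]
    RRt = R @ R.T
    reg = ridge * np.trace(RRt) / RRt.shape[0]           # scale-aware ridge term
    G = spectrogram @ R.T @ np.linalg.inv(RRt + reg * np.eye(RRt.shape[0]))
    return G, R

# Synthetic usage example: responses are noisy random projections of the spectrogram
rng = np.random.default_rng(0)
spec = rng.standard_normal((32, 2000))                   # 32 freq channels x 2000 time bins
resp = np.maximum(rng.standard_normal((20, 32)) @ spec
                  + 0.1 * rng.standard_normal((20, 2000)), 0)
G, R = fit_reconstruction_filter(resp, spec, n_lags=5)
spec_hat = G @ R                                         # reconstructed spectrogram
r = np.corrcoef(spec.ravel(), spec_hat.ravel())[0, 1]    # accuracy as Pearson's r
print(f"reconstruction correlation: {r:.2f}")
```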


Frequency B C the points in 3D by having only their projections0.25 (neural response) and 0123 e ϭ A ϩ ϩ (7) knowledge of the STRFs (flat prior method, F). However, becauseTime (sec) N T Predicted responses H 8 Neural responses 8 therecell c033a-b2 is correlation between z and the other two dimensions (in this The limit of reconstruction error for arbitrarily large N and T was example, all the points belong to a plane z ϭ x ϩ200 y), having access to (KHz) (KHz) 100 taken to be A. To make unbiased measurements of parameter values Frequency (Hz) Reconstructing Speech From Frequency Y this prior knowledge in addition to the STRFs0 and neural respon-

0.25 Firing rate 0.25 X 0 1 2 3 0 0.07 0.14 0 1 2 for Eq.3 7, we0 used 0.07 a procedure 0.14 in which independent subsets of the ses enablesTime the (sec) correct reconstruction of the points in 3D (optimalTimeAuditory (sec) Cortex Responses 8 entire available8 data set were used to measure reconstruction error for prior,cell c034a-d2G). different values of N and T. 200 (KHz) 100 (KHz) Frequency Frequency (Hz) 0.25 0 0.25 0 1 2 3 0 0.07 0.14 Firing rate 0 1 2 3 0 0.07 0.14 TimeStimulus (sec) ABTime (sec) Reconstructed speech Prediction Reconstruction Z

8 8 8 8 cell j019e-1 Flat Prior Model Optimal Prior Model

200 (KHz) (KHz) (KHz) (KHz)

100 Frequency Frequency (Hz) Frequency Frequency 0.25 0 0.25 0.25 0.25 Firing rate 0123 0 1 2 3 0 0.07 0.14 0 1 Time (sec)2 3 0 0.07 0.14 0123 Time (sec) Time (sec) Time (sec) Predicted responses Time lag (sec) Time lag (sec) H 8 Neural responses 8 FIG. 1.cell Optimal c033a-b2 prior vs. flat prior reconstruction. A: optimal prior reconstruction (G) is the optimal linear mapping from a population of neuronal responses 200 back to the sound spectrogram (right). Using(KHz) optimal prior reconstruction, one can reconstruct the spectrogram of a sound, not only features that are explicitly (KHz) 100 Frequency (Hz) Frequency Y coded by neurons but also features that are correlated with them.0 Flat prior reconstruction (F) is the best linear mapping of responses to stimulus spectrogram

0.25 Firing rate 0.25 X 0 1 2 3 0 0.07 0.14 0 1 2 3 0 0.07 0.14 when only the informationTime (sec) explicitly encoded by neurons (i.e., their spectro-temporalTime (sec) receptive fields) is known. B: forMesgarani this simple 3-dimensional et al system, 2009 only 8 8 x and y valuescell c034a-d2 are explicitly encoded in the neural response. Optimal prior reconstruction can reproduce the entire stimulus, using the knowledge that z values are correlated with x and y (in this example, z ϭ x ϩ y). However,200 flat prior reconstruction does not recover information about the data on the z-axis. (KHz) 100 (KHz) Frequency Frequency (Hz) 0.25 0 0.25 0 1 2 3 0 0.07 0.14 Firing rate 0 1 2 3 0 0.07 0.14 J Neurophysiol ¥ VOL 102 ¥TimeDECEMBER (sec) 2009 ¥ www.jn.org Time (sec) Reconstructed speech Prediction Reconstruction

8 8 8 cell j019e-1

200 (KHz) (KHz) (KHz) 100 Frequency (Hz) Frequency Frequency 0.25 0 0.25 0.25 0 1 2 3 Firing rate 0 1 2 3 0 0.07 0.14 0 0.07 0.14 0123Time (sec) Time (sec) Time lag (sec) Time (sec) Time lag (sec)

FIG. 1. Optimal prior vs. flat prior reconstruction. A: optimal prior reconstruction (G) is the optimal linear mapping from a population of neuronal responses back to the sound spectrogram (right). Using optimal prior reconstruction, one can reconstruct the spectrogram of a sound, not only features that are explicitly coded by neurons but also features that are correlated with them. Flat prior reconstruction (F) is the best linear mapping of responses to stimulus spectrogram when only the information explicitly encoded by neurons (i.e., their spectro-temporal receptive fields) is known. B: for this simple 3-dimensional system, only x and y values are explicitly encoded in the neural response. Optimal prior reconstruction can reproduce the entire stimulus, using the knowledge that z values are correlated with x and y (in this example, z ϭ x ϩ y). However, flat prior reconstruction does not recover information about the data on the z-axis.
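To make the decoding idea concrete, here is a minimal sketch of an optimal-prior-style reconstruction: a regularized linear regression from time-lagged population responses to spectrogram frames, scored on held-out data with Pearson's r and mean squared error as described above. The function names, lag count, and ridge penalty are illustrative assumptions, not the published implementation.

```python
import numpy as np

def lagged_features(responses, n_lags):
    """Stack time-lagged copies of each unit's response.
    responses: (T, N) array of binned firing rates."""
    T, N = responses.shape
    X = np.zeros((T, N * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * N:(lag + 1) * N] = responses[:T - lag]
    return X

def fit_reconstruction_filter(responses, spectrogram, n_lags=10, ridge=1.0):
    """Linear mapping from lagged population responses to spectrogram frames.
    spectrogram: (T, F) array with frequency channels as columns."""
    X = lagged_features(responses, n_lags)
    # Ridge-regularized normal equations: G = (X'X + aI)^-1 X'S
    G = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]),
                        X.T @ spectrogram)
    return G

def reconstruct(responses, G, n_lags=10):
    """Apply a fitted reconstruction filter to new responses."""
    return lagged_features(responses, n_lags) @ G

# Hypothetical use on a held-out validation set:
# G = fit_reconstruction_filter(train_resp, train_spec)
# S_hat = reconstruct(test_resp, G)
# r = np.corrcoef(S_hat.ravel(), test_spec.ravel())[0, 1]  # Pearson's r
# mse = np.mean((S_hat - test_spec) ** 2)                   # mean squared error
```

A flat-prior variant would instead invert the units' STRFs without exploiting stimulus correlations; the ridge term here only stabilizes the matrix inversion and is not a substitute for that distinction.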


Reconstructed Bird Songs from Neural Responses in MLd

[Figure: Population decoding of a song spectrogram from 189 MLd neurons with varying degrees of prior information about song statistics (Figs. 7 and 8). MAP estimates were computed with an uncorrelated prior, a prior with only temporal correlations, a prior with only spectral correlations, and a prior with joint spectrotemporal correlations. Reconstruction SNR (where SNR = 1 corresponds to estimating the spectrogram by its mean) and coherence grow with the number of neurons and are highest for the joint spectrotemporal prior; prior knowledge of stimulus correlations primarily aids reconstruction of low temporal modulations and ripple densities. With few neurons the MAP estimate and the optimal linear estimator (OLE) perform similarly, but as more neurons are added the MAP estimator outperforms the OLE because it is not restricted to be a linear function of the spike responses.]

Ramirez et al 2011
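As a quick illustration of the SNR axis in these plots: an SNR of 1 corresponds to predicting the spectrogram by a single number, its mean. One definition consistent with that description (my reading, not necessarily the paper's exact formula) is the spectrogram's variance about its mean divided by the mean squared reconstruction error, sketched below.

```python
import numpy as np

def reconstruction_snr(spec, spec_hat):
    """SNR of a spectrogram reconstruction.
    Predicting every time-frequency bin by the global mean makes the
    error equal to the spectrogram's variance, giving SNR = 1; better
    reconstructions give SNR > 1."""
    mse = np.mean((spec - spec_hat) ** 2)
    return np.var(spec) / mse

# Hypothetical use: average the per-song SNR over songs and plot it
# against the number of MLd neurons used for decoding.
```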

Auditory Cortex Responses to 20 English Consonants in the Rat

[Figure: Spectrograms of each speech sound grouped by manner of articulation (stops, fricatives, affricates, nasals, glides, liquids) and place of articulation (lips, roof, back), with unvoiced initial consonants underlined; speech sounds were shifted one octave higher to accommodate the rat hearing range (Fig. 1). Neurograms depict the onset responses (-5 to 40 ms) of rat A1 neurons to the 20 consonants: multiunit PSTHs from 445 recording sites in 11 anesthetized rats, averaged over 20 repeats and ordered by characteristic frequency (Fig. 2).]

Engineer et al 2008
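A neurogram like the one in this figure is simply a stack of peristimulus time histograms (PSTHs), one per recording site, binned at 1 ms over the onset window and ordered by characteristic frequency. A minimal sketch of that construction follows; the window and ordering follow the figure description, while the function names and data layout are assumptions.

```python
import numpy as np

def psth(trial_spike_times, t_start=-0.005, t_stop=0.040, bin_ms=1.0):
    """Trial-averaged firing rate (spikes/s) in fixed bins.
    trial_spike_times: list of arrays of spike times (s), one per repeat."""
    edges = np.arange(t_start, t_stop + 1e-9, bin_ms / 1000.0)
    counts = np.zeros(len(edges) - 1)
    for spikes in trial_spike_times:
        counts += np.histogram(spikes, bins=edges)[0]
    return counts / len(trial_spike_times) / (bin_ms / 1000.0)

def neurogram(site_spikes, char_freqs, **psth_kwargs):
    """Stack site PSTHs ordered by characteristic frequency (low to high).
    site_spikes: one list of per-repeat spike-time arrays per site."""
    order = np.argsort(char_freqs)
    return np.vstack([psth(site_spikes[i], **psth_kwargs) for i in order])
```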

Consonants can be accurately discriminated from A1 responses in rats

Neural Discrimination
1) Discrimination is poor for firing rates
2) Discrimination is very good if one considers spike timing

[Figure: Nearest-neighbor classifier predictions of consonant discrimination ability (Figs. 6 and 7). Neural discrimination of each consonant pair was computed from the 40-ms onset response with 1-ms bins, from the mean onset firing rate over 40 ms, from the full 700-ms response with 1-ms bins, and from the mean rate over the full 700 ms; pairs tested behaviorally are outlined. Discrimination based on the onset activity patterns of individual multiunit sites correlated best with behavioral discrimination, whereas single units and rate-based codes did not correlate with behavior.]

Engineer et al 2008

Can speech sounds be discriminated using A1 responses?

[Figure: Spectrograms of the 20 consonant sounds and the corresponding rat A1 onset neurograms (Figs. 1 and 2).]

Engineer et al 2008

Can A1 responses predict discrimination behavior?

[Figure: Spectrograms of the 20 consonant sounds and the corresponding rat A1 onset neurograms (Figs. 1 and 2).]

Engineer et al 2008


A1 discrimination performance predicts behavioral discrimination by rats

[Figure: Predictions of consonant discrimination ability from onset response similarity, and behavioral results (Figs. 3-5). The Euclidean distance between the neurograms evoked by each consonant pair was computed using spike timing (1-ms bins) or the average firing rate over a single 40-ms bin, normalized by the most distinct pair. Rats successfully discriminated 9 of the 11 consonant pairs tested in a go/no-go task. Both the neurogram distances and trial-by-trial nearest-neighbor classification predicted behavioral performance when 1-ms bins were used (R2 = 0.75 and R2 = 0.66, respectively), but not when temporal information was removed by using a single 40-ms rate bin (R2 = 0.046 and R2 = 0.14). The correlation was equally strong in awake rats (R2 = 0.63).]

Engineer et al 2008
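The R2 values quoted on this slide are squared Pearson correlations, computed across the 11 consonant-pair tasks, between a neural measure (normalized neurogram distance or single-trial classifier accuracy) and the rats' percent correct. A minimal sketch of that comparison, with placeholder variable names rather than the published data:

```python
import numpy as np

def normalized_pair_distances(neurograms, pairs):
    """Euclidean distance between the onset neurograms (site x time-bin
    matrices) of each consonant pair, normalized by the most distinct pair."""
    d = np.array([np.linalg.norm(neurograms[a] - neurograms[b])
                  for a, b in pairs])
    return d / d.max()

def r_squared(neural_measure, behavior_pct_correct):
    """Squared Pearson correlation between a per-pair neural measure and
    behavioral percent correct on the same tasks."""
    r = np.corrcoef(neural_measure, behavior_pct_correct)[0, 1]
    return r ** 2

# Hypothetical use: with 1-ms bins the neural measure should track behavior
# (high R2), while collapsing each neurogram to a single 40-ms rate bin
# should not.
```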

Summary

1) Natural sounds contain spectral and temporal cues essential for sound recognition

2) The cochlea extracts spectral and temporal modulations from sounds

3) Spectral and temporal modulation cues relevant for natural sound analysis are selectively enhanced along the ascending auditory pathway

4) Neural population responses in IC and A1 contain sufficient information for discrimination of speech and vocalization sounds