Biol. Cybern. 47, 149-163 (1983)
Biological Cybernetics © Springer-Verlag 1983

Spatial Cross-Correlation
A Proposed Mechanism for Acoustic Pitch Perception

Gerald E. Loeb*, Mark W. White, and Michael M. Merzenich

Coleman Laboratory, Department of Otolaryngology, University of California at San Francisco, San Francisco, California, USA
* Laboratory of Neural Control, IRP, National Institute of Neurological and Communicative Disorders and Stroke, Bethesda, Maryland, USA

Abstract. We propose in this paper a new class of model processes for the extraction of spectral information from the neural representation of acoustic signals in mammals. We are concerned particularly with mechanisms for detecting the phase-locked activity of auditory neurons in response to frequencies and intensities of sound associated with speech perception. Recent psychophysical tests on deaf human subjects implanted with intracochlear stimulating electrodes as an auditory prosthesis have produced results which are in conflict with the predictions of the classical place-pitch and periodicity-pitch theories. In our model, the detection of synchronicity between two phase-locked signals derived from sources spaced a finite distance apart on the basilar membrane can be used to extract spectral information from the spatiotemporal pattern of basilar membrane motion. Computer simulations of this process suggest an optimal spacing of about 0.3-0.4 of the wavelength of the frequency to be detected. This interval is consistent with a number of psychophysical, neurophysiological, and anatomical observations, including the results of high resolution frequency-mapping of the anteroventral cochlear nucleus which are presented here. One particular version of this model, invoking the binaurally sensitive cells of the medial superior olive as the critical detecting elements, has properties which are useful in accounting for certain complex binaural psychophysical observations.

Introduction

It has recently become possible to directly test theories of the neural mechanisms underlying pitch perception by eliminating the complex transduction processes of the middle and inner ear and directly activating the sensory neurons in a localized and controlled manner. Profoundly deaf subjects have been chronically implanted with intracochlear electrode arrays in an attempt to bypass their defective hair cells, which normally transduce mechanical vibrations traveling along the basilar membrane into a temporospatial pattern of discharges in the distributed auditory nerve fiber population (Simmons, 1966; Michelson, 1971; Clark et al., 1977; Tonndorf, 1977; Eddington et al., 1978; Merzenich et al., 1979; Tong et al., 1979; Michelson and Schindler, 1981). As we shall discuss here, perceptions of sound evoked by electrical stimulation in these patients cannot be reconciled with currently dominant theories of how the nervous system normally extracts pitch information from patterns of neural activity generated by acoustic input. Although this is a new and unfamiliar source of psychophysical data, it represents a powerful tool whose underlying physical processes are beginning to be better understood (see Merzenich, 1973; Ranck, 1975; Marks, 1977; Pollen et al., 1977; Kiang et al., 1979; Black et al., 1981).

We have been exploring the possibility that patients' pitch sensations arise from changes in the synchronicity of firing among the neurons reaching threshold over fairly extended regions of the auditory nerve array. A neuronal model of normal acoustic pitch perception can be built on such a concept which is simple and which can account for psychophysical effects that are difficult to explain by invoking either "place-pitch" or "periodicity-pitch" models (see below). This spatiotemporal model is consistent with the described neuroanatomy and physiology of brainstem auditory nuclei. This latter point is particularly important, as current place- and periodicity-pitch theories are abstract constructs of signal theory which address complex psychophysical problems (e.g. see Wightman, 1973) but ignore or even contradict anatomical and physiological descriptions of the auditory nervous system. A previous attempt to invoke spatial crosscorrelation to account for sharpness of tuning in cochlear afferents called for a complex synaptic interaction between inner and outer hair cell afferents which now seems unlikely (Nieder, 1971a, b).

In this paper, the kinds of spectral information normally present in the auditory nerve array and the previously proposed neural models for extracting this information are reviewed. Using new data on the activity patterns induced in the auditory system by intracochlear electrical stimulation, the perceptions reported by auditory prosthesis patients are contrasted with the predictions of these models. A new class of spectral detectors which extract "synchronicity-pitch" by a process of spatial crosscorrelation is then described. Some psychophysical and neuroanatomical predictions derived from modeling this process and data consistent with these predictions are reviewed. Finally, the utility of this theory in accounting for complex psychophysical phenomena is briefly discussed. Preliminary discussions of some aspects of this theory and some data have been presented elsewhere (White, 1980; White et al., 1980, 1981; Loeb et al., 1980, 1981; Merzenich et al., 1980).

Previous Models

The psychophysics and neurophysiology of pitch perception suggest that there are three distinct bands of human sound perception, covering the range 20-20,000 Hz (Fig. 1). Each appears to involve different information extraction strategies in the auditory system. We here use the term "pitch" to denote any perceptual experience for which a subject could determine a matching sinusoidal tone. We are primarily concerned with mechanisms by which the spectral frequencies present are converted to an internal neuronal representation which is approximately isorepresentational. We also assume that the transduction and pitch detection mechanisms of most mammals including man are comparable in their general form.

[Figure 1, three panels. Top, "Mechanisms for pitch perception": loudness vs. frequency, with the tone fusion, neural firing, phase locking and human hearing limits marked. Middle, "Single neuron": tuning curve vs. acoustic stimulus frequency, with C.F. marked. Bottom, "Mean auditory nerve activity" (pps) vs. basilar membrane position (mm), from the base (20,000 Hz), with the traveling wave direction indicated.]
Fig. 1. Human auditory perceptual range (top) divided into three regions likely to be subserved by different pitch perception mechanisms. Pure place pitch (Helmholtzian) is adequate only for high frequencies (sharply tuned) and low amplitudes (below neural saturation). Pure rate pitch is available only for frequencies below the limits of sustained neural firing rates (less than 500 pps). The central region, subserving most of the speech perception requirements, utilizes the phase-locking of neural discharge which is increasingly stable at higher sound intensities. The typical tuning curve of single auditory nerve fibers and AVCN cells (middle) demonstrates very little frequency selectivity at intensities typical of normal speech. This, combined with the tendency of neurons to saturate at their maximal firing rates, accounts for the flat profile of neural activity in the auditory nerve for high amplitude, single frequency stimuli (bottom). Note that the basilar membrane tuning of decreasing frequency with increasing distance from the cochlear base has been indicated by reversing the frequency axis in the bottom figure, pivoting it around the characteristic frequency of the middle figure. In this and all other figures, basilar membrane position is given in millimeters from the basal entry point of the traveling wave, rather than as the more conventional but reversed cochlear place (distance from the apex).

Place Pitch. The highest frequencies, about 5000 to 20,000 Hz, are probably detected and discriminated as a result of mechanically tuned resonance along the basilar membrane of the cochlea, somewhat as proposed by Helmholtz (1863). Sound energy from a tone is coupled into the basilar membrane at the base of the spiral and travels apically through progressively lower frequencies of resonance until it reaches the region tuned to its frequency. There the amplitude of vibrations reaches a maximum, and there local hair cells and auditory nerve fibers are excited, giving rise to the "place pitch" (Békésy and Rosenblith, 1951). The traveling wave is then very rapidly damped by a process which is still poorly understood (Wilson and Johnstone, 1972). The psychophysics of pitch perception in this band are consistent with the tuning curves of auditory nerve fibers (Moore, 1973). We here use the term "place pitch" to indicate the class of pitch extraction theories based solely on a spatial gradient of neural activity, ignoring temporal cues. Auditory nerve fibers have sharp enough tuning curves and broad enough dynamic range to account for the relatively modest frequency discrimination limens achievable at

back onto the dorsal cochlear nucleus (Adams and Warr, 1976; see Brugge and Geisler, 1978).]

It is probable that the nervous system makes use of phase-locking of discharges generated by these mid-range frequencies (Anderson et al., 1971). For acoustic stimuli from about 500 to 5000 Hz, no single neuron can fire fast enough to follow each stimulus cycle, but rather each tends to fire on randomly changing subharmonics of the incoming frequency.
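The abstract proposes detecting synchronicity between two phase-locked signals taken from points a finite distance apart on the basilar membrane. The passage above adds that mid-range fibers fire on randomly changing subharmonics of the stimulus. The Python toy below puts these two ideas together. It is a minimal sketch under assumptions that are not from the paper and is not the authors' simulation:

- Each hypothetical fiber fires on any given cycle with a fixed probability. At 1000 Hz with p_fire = 0.2 this gives about 200 spikes/s, near a fixed phase with Gaussian jitter.
- The traveling wave is reduced to a pure phase delay of delta_x/lambda cycles between the two input sites. Amplitude changes are ignored.
- The detector counts spikes from one input that fall within ±100 µs of a spike from the other.

All function names and parameter values (phase_locked_spikes, coincidences, synchrony_vs_spacing, p_fire, jitter_s, window_s) are hypothetical.

```python
import numpy as np


def phase_locked_spikes(freq_hz, phase_cycles, duration_s, p_fire, jitter_s, rng):
    """Spike times of one hypothetical phase-locked fiber (toy model).

    On each stimulus cycle the fiber fires with probability p_fire, so it
    skips a random number of cycles (fires on randomly changing subharmonics),
    but when it fires it does so near a fixed stimulus phase.
    phase_cycles is that phase expressed as a fraction of one period.
    """
    period = 1.0 / freq_hz
    n_cycles = int(round(duration_s * freq_hz))
    cycle_starts = np.arange(n_cycles) * period
    fires = rng.random(n_cycles) < p_fire             # random cycle skipping
    times = cycle_starts[fires] + phase_cycles * period
    times = times + rng.normal(0.0, jitter_s, times.size)  # timing jitter
    return np.sort(times)


def coincidences(a, b, window_s):
    """Number of spikes in a that have a spike in b within +/- window_s."""
    if a.size == 0 or b.size == 0:
        return 0
    idx = np.searchsorted(b, a)                       # b must be sorted
    prev_b = b[np.clip(idx - 1, 0, b.size - 1)]
    next_b = b[np.clip(idx, 0, b.size - 1)]
    nearest = np.minimum(np.abs(a - prev_b), np.abs(next_b - a))
    return int(np.count_nonzero(nearest <= window_s))


def synchrony_vs_spacing(spacings, freq_hz=1000.0, duration_s=2.0,
                         p_fire=0.2, jitter_s=50e-6, window_s=100e-6, seed=0):
    """Coincidence counts for two inputs whose spacing is given as a
    fraction of the local traveling-wave wavelength (delta_x / lambda).

    Toy assumption: between the two input sites the wave delays the
    stimulus phase by delta_x / lambda cycles; amplitude is ignored.
    """
    rng = np.random.default_rng(seed)
    counts = []
    for u in spacings:
        a = phase_locked_spikes(freq_hz, 0.0, duration_s, p_fire, jitter_s, rng)
        b = phase_locked_spikes(freq_hz, u, duration_s, p_fire, jitter_s, rng)
        counts.append(coincidences(a, b, window_s))
    return counts


if __name__ == "__main__":
    spacings = [0.0, 0.125, 0.25, 0.375, 0.5, 0.75, 1.0]
    for u, n in zip(spacings, synchrony_vs_spacing(spacings)):
        print(f"delta_x/lambda = {u:5.3f}  coincidences = {n}")
```

In this toy, coincidences are high at delta_x/lambda = 0 and 1.0 and fall to near zero around 0.5. A detector with a fixed input spacing therefore responds differently to different local wavelengths, and so to different frequencies. The toy also shows a phase ambiguity, because spacings that differ by a whole wavelength look identical. It does not reproduce the authors' simulations or their 0.3-0.4 wavelength optimum.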
