Effects of in and speech suggest multiple pitch mechanisms

Malinda J. McPherson (1) & Josh H. McDermott (1,2)
Lab: mcdermottlab.mit.edu

1) Harvard Program in Speech and Hearing Bioscience and Technology
2) MIT, Department of Brain and Cognitive Sciences

Introduction

- Pitch is classically defined as the percept of fundamental frequency (F0).
- Traditional view: F0 is extracted, then used to compare musical notes, perceive speech prosody, etc.
- Many standard pitch-related tasks are assumed to rely on F0, but other features (e.g. lowest or most prominent harmonic, spectrum) could give cues.

[Figure: schematic of a two-note trial over time, showing that a listener could track F0 or could track the fine spectral pattern.]

Questions:
- Does pitch perception depend on estimating F0?
- Is there a single mechanism for pitch perception?

Approach:
- Performed a battery of tests thought to depend on pitch
- With harmonic and inharmonic stimuli
- Assumption: Inharmonic stimuli should impair F0-specific mechanisms

Methods and Materials

Synthetic Complex Tones:
- Each harmonic jittered by U(-.5, .5).
- Fixed bandpass filter to remove coarse spectral cues.
- Noise to mask distortion products.
- 'Inharmonic': same jitter pattern for entire trial.
- 'Inharmonic - Changing': different jitter patterns for each note.

[Figure: spectra of a sample 300 Hz harmonic tone and a sample 400 Hz inharmonic tone. Axes: Magnitude (dB) vs. Frequency (Hz), 0 to 8000 Hz.]

Speech Synthesis:
- Speech was manipulated using STRAIGHT (McDermott et al. 2012).
- Speech was high-pass filtered; low-pass noise added to mask DPs.
- Harmonics jittered using U(-.5, .5).

[Figure: spectrogram of a sample inharmonic sentence. Axes: Frequency (Hz) vs. Time (s).]

In lab experiments: n=15 tested in a soundproof booth (mean age=32.4, s.d.=15.2, 9 musicians, mean=8.8 years).

Mechanical Turk Testing: MTurk subjects passed a headphone quality check (Woods et al.) before completing the task.

Results

Basic Discrimination
Task: Was the second note higher or lower than the first note?
[Figure: Percent Correct vs. F0 Difference for Harmonic, Inharmonic, and Inharmonic - Changing conditions.]
- Chance performance for Inharmonic-Changing conditions suggests that participants are tracking shifts in fine spectral pattern during Inharmonic conditions.

Contour Discrimination
Task: Are the two melodies the same or different? Based on Dowling and Fujitani, 1971.
[Figure: ROC Area for Harmonic, Inharmonic, and Inharm.-Changing conditions.]

Speech Contour Perception
Task: Which speech excerpt is different from the second one, the first or last?
Speech contours were modified by adding random frequency modulations between 1 and 2 Hz.
[Figure: Percent Correct vs. FM Depth (% of F0) for Harmonic and Inharmonic conditions.]

Mandarin Tone Perception
Task: What word did you hear? Example: "Wùlǐ". Correct response: wu4li3. Incorrect response: wu2li3.
Only fluent Mandarin speakers were tested. Testing occurred on Mturk.
[Figure: Percent Correct for Harmonic, Inharmonic, and Whisper conditions. The speech panels in this row are marked n = 12.]

Some tasks are not affected by inharmonicity. Patterns of up/down (contour) can be derived without extracting F0, likely from shifts in spectral pattern.

Interval Pattern Discrimination
Task: Are the two patterns the same or different?
Contours were the same within every trial; intervals were altered by 1 semitone for 'different' trials.
[Figure: ROC Area for Harmonic, Inharmonic, and Inharm.-Changing conditions.]

Sour Note Detection
Task: Does the melody contain a mistake? (Out of key note.)
[Figure: ROC Area for Harmonic and Inharmonic conditions. The panels in this row are marked n = 10.]

Interval Size Comparison
Task: Which interval is wider? Based on Burns & Ward, 1978.
[Figure: Percent Correct vs. Change in Interval (In Semitones) for Harmonic and Inharmonic conditions.]

Open Set Famous Melody Recognition
Task: What song is this? (OR, how do you recognize this song?) Example response: 'Somewhere over the rainbow? The wizard of Oz'.
[Figure: Percent Correct for Harmonic, Inharmonic, and Rhythm Only conditions; n=120.]

Some tasks are strongly affected by inharmonicity. F0 based pitch estimation appears to be necessary for interval and tonality perception.

Open Set Celebrity Voice Recognition
Stimulus: Four seconds of speech by a well known individual (ex. Barack Obama, Betty White, James Earl Jones, etc.).
Task: Whose voice is this? (OR, how do you recognize this voice?) Example response: 'He plays Professor Snape in the Harry Potter movies'.
[Figure: Percent Correct for Harmonic, Inharmonic, and Whispered conditions; n=207.]

Voice Discrimination
Task: Which speaker only spoke once (first or last)?
[Figure: Percent Correct by condition. Condition labels: Whisper, Regular Pitch, Pitch Shifted +31/4 ST, Pitch Variance.]

F0 and absolute pitch aid in speaker recognition.

Conclusions

Results suggest that "pitch" is mediated by multiple mechanisms:
- One mechanism which tracks shifts in fine spectral pattern and does not rely on F0
  - Subjects can track shifts in spectral pattern regardless of whether harmonic or inharmonic.
- A second mechanism that extracts F0
  - Needed to determine interval relationships between musical notes.
  - Suggests distinct means for extracting contour vs. interval.
  - F0 also critical for voice qualities
  - Driven in part by absolute pitch

The standard psychoacoustic assessment task (basic discrimination) does not appear to require the textbook notion of pitch, but many real-world pitch tasks indeed seem to involve estimating the F0.

Open Questions

- Why would there be multiple mechanisms for processing pitch?
- If there are multiple pitch mechanisms, how are they realized in the brain?

References

Burns, E. M., & Ward, W. D. (1978).
Dowling, W. J., & Fujitani, D. S. (1971).
McDermott, J. H., Ellis, D. P. W., & Kawahara, H. (2012).
Micheyl & Oxenham (2010).
Woods, Siegel, Traer, & McDermott (2016, PsyArxiv).

Funding

This work was supported by a McDonnell Scholar award to JHM, NIH/NIDCD training grant to MJM and NSF Graduate Research Fellowship under grant No. DGE1144152 to MJM. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
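
Illustrative sketch of the stimulus conditions

The three tone conditions named under Methods and Materials (Harmonic, 'Inharmonic', 'Inharmonic - Changing') can be sketched in a few lines of Python. This is a minimal illustration, not the authors' stimulus code. It assumes the U(-.5, .5) jitter is expressed as a fraction of F0 and added to each harmonic number; the function names, the number of harmonics, the sample rate, and the equal-amplitude sine summation are all hypothetical choices. The bandpass filter and masking noise described on the poster are omitted.

```python
import math
import random


def jitter_pattern(n_harmonics, rng):
    """Draw one jitter value per harmonic from U(-0.5, 0.5)."""
    return [rng.uniform(-0.5, 0.5) for _ in range(n_harmonics)]


def component_frequencies(f0, jitter):
    """Return component frequencies in Hz for one note.

    Assumption: harmonic k is moved from k * f0 to (k + jitter[k-1]) * f0,
    i.e. the jitter is a fraction of F0.
    """
    return [(k + j) * f0 for k, j in enumerate(jitter, start=1)]


def make_trial(f0s, condition, n_harmonics=10, seed=0):
    """Return one list of component frequencies per note in the trial."""
    rng = random.Random(seed)
    if condition == "harmonic":
        # No jitter: components sit at exact integer multiples of F0.
        patterns = [[0.0] * n_harmonics for _ in f0s]
    elif condition == "inharmonic":
        # Same jitter pattern for the entire trial.
        shared = jitter_pattern(n_harmonics, rng)
        patterns = [shared for _ in f0s]
    elif condition == "inharmonic_changing":
        # A different jitter pattern for each note.
        patterns = [jitter_pattern(n_harmonics, rng) for _ in f0s]
    else:
        raise ValueError(f"unknown condition: {condition}")
    return [component_frequencies(f0, p) for f0, p in zip(f0s, patterns)]


def synthesize(freqs, duration=0.05, sample_rate=16000):
    """Sum equal-amplitude sinusoids, scaled so the peak cannot exceed 1."""
    n_samples = round(duration * sample_rate)
    return [
        sum(math.sin(2 * math.pi * f * t / sample_rate) for f in freqs) / len(freqs)
        for t in range(n_samples)
    ]


if __name__ == "__main__":
    for cond in ("harmonic", "inharmonic", "inharmonic_changing"):
        notes = make_trial([300.0, 400.0], cond, n_harmonics=4, seed=1)
        print(cond, [[round(f, 1) for f in note] for note in notes])
```

In the 'inharmonic' condition the ratio of each component to F0 is identical across notes, so the whole spectral pattern shifts coherently when F0 changes. In the 'inharmonic_changing' condition those ratios differ from note to note, which removes that cue.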