Quantifying the Consonance of Complex Tones with Missing Fundamentals
QUANTIFYING THE CONSONANCE OF COMPLEX TONES WITH MISSING FUNDAMENTALS

A thesis submitted to the Department of Electrical Engineering and the Committee on Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree of Engineer

Song Hui Chon
June 2008

© Copyright by Song Hui Chon 2008. All Rights Reserved.

Approved for the department: Julius O. Smith, III (Advisor)

Approved for the Stanford University Committee on Graduate Studies.

Abstract

Consonance is one of the more fundamental perceptual criteria in music and sound, and one on which Western music theory is built. It is closely related to musical intervals, explaining why certain intervals sound better than others.

Recently, the concept of tonal consonance has been distinguished from that of musical consonance. Tonal consonance concerns the consonance, or "pleasantness," of tones sounded together, while musical consonance concerns intervals. Tonal consonance appears to be a superset of musical consonance, in that musically consonant intervals are also tonally consonant.

The "missing fundamental" (or "virtual pitch") is an interesting psychoacoustical phenomenon: when we hear a set of tones whose frequencies are integer multiples of a fundamental that is absent from the sound, we identify it as a complex tone whose pitch is the missing fundamental frequency. This phenomenon makes it possible to produce decent bass from miniature speakers. The question, then, is why we hear something that is not there.

This thesis deals with the intersection of tonal consonance and the missing fundamental. While trying to explain data from a psychoacoustical experiment, I stumbled onto the concept of tonal consonance. This work builds on that earlier work, with the addition of missing fundamental analysis. The work covered in this thesis finds that the consonance of most sound stimuli remained largely constant regardless of the loudness level at which they were presented.
It also supports a previous conclusion that each type of stimulus has its own intrinsic consonance value. A new finding here is that the consonance values of the stimuli considered appear to be grouped by the size of their bandwidth.

Acknowledgments

I would first like to thank my advisor, Julius O. Smith, for his constant support and guidance. It was his advice that first prompted me to embark on this exciting journey of applying engineering techniques to music.

I would also like to thank the CCRMA community, including the faculty there: Jonathan Berger, Chris Chafe, Jonathan Abel, Marina Bosi, Malcolm Slaney, and David Berners. I had the honor of being the teaching assistant for Drs. Abel, Berners, and Bosi, to whom I owe much of my knowledge of audio compression and effects. They have been teachers, mentors, and friends for the past four years of my life.

My friends and colleagues at CCRMA and in Electrical Engineering have been there for me whenever I needed guidance: Woon Seung Yeo, Kyogu Lee, Sook Young Won, Juhan Nam, Minjong Kim, Greg Sell, Gautham Mysore, Ed Berdahl, and David Yeh. They taught me, helped me, and discussed ideas with me, and that eventually led me here.

I would especially like to thank Daniel Steele, with whom I worked on a research project and published a paper. In the research we did together, I found the topic of consonance quantification and its underlying problems, which eventually became this thesis.

I owe a big thank-you to my church community at Calvary Chapel Mountain View, including Inma and David Robinson, Lisa Erickson, Regina and Kirill Buryak, Bridget Ingham, and others. They have been unvaryingly understanding, supportive, and loving during my many ups and downs over the last few years, with prayer and encouragement. My life at Stanford would have been much more challenging without their support.

My special thanks go to Doctor Ik Keun Hwang of Chonbuk National University in Korea.
Without his medical care, I would not be here now, studying what I am curious about. I am eternally indebted to him for his thorough care and the miracle he produced.

The biggest thanks go to my family for their love and support. They have shown continuous and unconditional love throughout my entire life. I feel very blessed to belong to my family. It was my parents' love and encouragement that enabled me to pursue a degree at Stanford. My caring sister was always there for me with open arms. My brother, who is also studying psychoacoustics at Seoul National University in Korea, was a friend and colleague with whom I had numerous research conversations.

Last, but certainly not least, I thank God for my past four years at Stanford. He brought me here and opened the doors for me to study music and engineering together. My life often took an unexpected turn, but no matter what I was going through, God was always faithful to His Word. I hope that this thesis is a testimony to His glory.

Contents

Abstract
Acknowledgments
1 Introduction
  1.1 Introduction and Motivation
  1.2 Consonance
    1.2.1 Tonal Consonance
  1.3 Missing Fundamental
2 Quantifying Consonance
  2.1 Introduction
  2.2 The Proposed Algorithm
    2.2.1 How YIN Works
  2.3 Experiment
    2.3.1 A Simple Counterexample for Plomp and Levelt
3 Annoyance Perception Experiment
  3.1 Stimuli
  3.2 Procedure
  3.3 The Matter of Weighting
  3.4 Results
4 Results and Discussion
  4.1 QTC Values of Annoyance Stimuli
  4.2 Correlation Coefficients
  4.3 Fundamental Estimation using YIN
    4.3.1 Comparison of Estimated Fundamentals with Human Perception
5 Conclusions and Future Work
A A Perceptual Study of Sound Annoyance
B Sound Annoyance as a Function of Consonance
Bibliography

List of Tables

3.1 Theoretically calculated perceived loudness values using dB(A), dB(B), dB(C), and dB(ITU-R 468)
4.1 The QTC values calculated with the proposed algorithm
4.2 The correlation coefficients of QTC values and loudness levels, per stimulus type
4.3 The correlation coefficients of QTC values and annoyance order, per loudness level

List of Figures

2.1 The original QTC algorithm (from Chon et al. [5])
2.2 Illustration of consonant and dissonant parts in a pair of pure tones (from Chon et al. [5])
2.3 The modified QTC algorithm (from [6])
2.4 Perceived consonance and dissonance of a pair of pure tones as defined by Plomp and Levelt in [17] (reproduced from [6])
2.5 The proposed QTC algorithm
2.6 Magnitude spectrum of a) the complex tone, b) the pure tone and c) the mixture
2.7 Spectrogram of a) the complex tone, b) the pure tone and c) the mixture
2.8 Time-domain waveforms of a) the complex tone, b) the pure tone and c) the mixture
3.1 The magnitude spectrum of the six stimuli
3.2 Weighting standards: A-, B-, C-weighting and ITU-R 468
3.3 Weighted decibel levels of six stimuli using A- (solid red line), B- (blue dash-dotted line), C- (black dotted line) and ITU-R 468 (green dashed line) weighting
3.4 The result from the annoyance perception experiment [20]
4.1 QTC values of twenty-four stimuli
4.2 Magnitude spectrum of the mixture from the Simple Test in section 2.3.1 (from figure 2.6)
4.3 Magnitude spectrum of the mixture from the Simple Test in section 2.3.1 and the estimated fundamentals using YIN
4.4 Fundamental estimation procedure in the Peaks and Fundamentals block in figure 2.5

Chapter 1

Introduction

1.1 Introduction and Motivation

Consonance is one of the oldest concepts in music and sound. It is an intuitive concept, yet one that produces ambiguities when a formal definition is attempted. During my annoyance perception studies [6][20], I became curious about this concept and began to wonder whether I could quantify the consonance of sound stimuli, and whether that quantity could in turn explain the annoyance experiment data. The attempt was relatively successful: it explained most cases.

While studying papers on consonance, I realized that the literature was sparse, and I began to wonder whether more questions should be asked in this field, given the significant advances in both psychoacoustics and computational processing power. It seemed time to push further. I decided to pursue the issue of consonance quantification in the case of missing fundamentals, described in section 1.3.

My last paper [6] used the classic algorithm of Plomp and Levelt [17] to quantify the dissonance between two tones; it regards two pure tones (i.e., sinusoids) as completely consonant when they are more than 1.2 times a critical bandwidth apart. That is the right judgment when only two sinusoids are considered. But in real life most audio stimuli have complex harmonic structures, so Plomp and Levelt's theory may not hold up well.
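For readers who want a concrete handle on the Plomp–Levelt curve mentioned above, the following is a minimal Python sketch of it. It uses Sethares' widely cited parameterization of the curve (the constants 3.5, 5.75, and the critical-bandwidth scaling below come from that parameterization, not from this thesis), so it is an illustration of the shape of the curve, not the QTC algorithm proposed here.

```python
import math

def pl_dissonance(f1, f2, a1=1.0, a2=1.0):
    """Approximate Plomp-Levelt dissonance of two sinusoids
    (f1, f2 in Hz; a1, a2 amplitudes), after Sethares' parameterization."""
    fmin = min(f1, f2)
    # Scale the frequency separation so the curve tracks the critical
    # bandwidth, which widens with frequency.
    s = 0.24 / (0.0207 * fmin + 18.96)
    x = s * abs(f2 - f1)
    # Difference of exponentials: zero at unison, peaks near a quarter of
    # a critical band, and decays toward zero past about one critical band.
    return a1 * a2 * (math.exp(-3.5 * x) - math.exp(-5.75 * x))

# A minor second near A4 is far more dissonant than an octave of pure tones:
rough = pl_dissonance(440.0, 466.2)   # ~minor second
smooth = pl_dissonance(440.0, 880.0)  # octave
```

As the thesis notes, this model drives the dissonance of two sinusoids to essentially zero once they are more than roughly 1.2 critical bandwidths apart, which is exactly the behavior the counterexample in section 2.3.1 probes.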
In fact, a simple listening test reveals a perceived beating between a pure tone and a complex tone with a missing fundamental, when the missing fundamental and the pure tone lie within the same critical bandwidth while the lowest existing harmonic and the pure tone are more than a critical bandwidth apart. Plotting the time-domain waveforms in MATLAB shows a physical modulation whose rate corresponds to the rate of the beats. This suggests not only a perceptual phenomenon that can be explained by nonlinear processing in the ear, but also a physical phenomenon of beating even though the fundamental is not there; in psychoacoustics, that beating is interpreted as dissonance.
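That observation is easy to reproduce outside MATLAB. The Python sketch below builds such a mixture; the specific numbers (missing fundamental f0 = 200 Hz, harmonics 3–5, probe tone at 210 Hz) are my own illustrative choices, not the thesis's stimulus set. The spectrum confirms the fundamental is physically absent, while the analytic envelope of the mixture still fluctuates, carrying the slow beat against the "tone that is not there."

```python
import numpy as np

fs = 8000                 # sample rate (Hz)
t = np.arange(fs) / fs    # one second of time
f0 = 200.0                # missing fundamental (Hz)

# Complex tone: harmonics 3, 4, 5 of f0 only -- no energy at f0 itself.
# Built from complex exponentials so the analytic envelope is just abs().
z_complex = sum(np.exp(2j * np.pi * k * f0 * t) for k in (3, 4, 5))

# Pure tone at 210 Hz: within a critical band of the missing 200 Hz
# fundamental, but more than a critical band below the 600 Hz harmonic.
z_mix = z_complex + np.exp(2j * np.pi * 210.0 * t)

x = z_mix.real                      # the audible waveform
spectrum = np.abs(np.fft.rfft(x))   # 1 Hz bin spacing for a 1 s signal

# The fundamental is physically absent from the spectrum...
print(spectrum[200])   # ~0: nothing at 200 Hz
print(spectrum[600])   # large: lowest existing harmonic
# ...yet the envelope of the mixture is clearly modulated; the text above
# attributes the audible ~10 Hz beat to the gap between the 210 Hz tone
# and the missing 200 Hz fundamental.
envelope = np.abs(z_mix)
print(envelope.std())
```

Plotting `x` and `envelope` over a few hundred milliseconds shows the modulation directly, mirroring the waveform figures in chapter 2.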