Audio Processing and Loudness Estimation Algorithms with iOS Simulations

Total Pages: 16

File Type: PDF, Size: 1020 KB

Audio Processing and Loudness Estimation Algorithms with iOS Simulations

by Girish Kalyanasundaram

A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science. Approved September 2013 by the Graduate Supervisory Committee: Andreas Spanias, Chair; Cihan Tepedelenlioglu; Visar Berisha. ARIZONA STATE UNIVERSITY, December 2013.

ABSTRACT

The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been examined, and improved algorithms have been proposed to overcome limitations of these methods. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have also been implemented. A software implementation of loudness estimation on iOS devices is also reported in this thesis.

In addition to the loudness estimation algorithms and software, this thesis project also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated as part of the award-winning educational iOS app "iJDSP". These functions are described in detail in this thesis. Several enhancements in the architecture of the application have also been introduced to provide the supporting framework for speech/audio processing. Frame-by-frame processing and visualization functionalities have been developed to facilitate speech/audio processing. In addition, facilities for easy sound recording, processing and audio rendering have also been developed to provide students, practitioners and researchers with an enriched DSP simulation tool. Simulations and assessments have also been developed for use in classes and in the training of practitioners and students.

DEDICATION

To my parents.

ACKNOWLEDGMENTS

This thesis would not have been possible without the support of a number of people. It is my pleasure to start by heartily thanking my advisor, Prof. Andreas Spanias, for providing me with the wonderful opportunity of being a Research Assistant under his supervision and pursuing a thesis for my Master's degree. I thank him for his support and for providing me with all the resources for fuelling a fruitful Master's program. I also thank Prof. Cihan Tepedelenlioglu and Dr. Visar Berisha for agreeing to be part of my defense committee. In particular, I would like to thank Dr. Tepedelenlioglu for his support during the first six months of my assistantship on the PV array project. My interactions with him were great learning experiences during the initial phases of my assistantship. I also thank Paceco Corporation for providing the financial support during the initial stages of my assistantship.
And I thank Henry Braun and Venkatachalam Krishnan for all their support during this project. Dr. Harish Krishnamoorthi played a particularly important role by providing me with all the required materials and software during the initial stage of my work on auditory models. The lengthy discussions with him provided me with crucial insight for working towards the initial breakthroughs in the research. I would like to specially thank Dr. Mahesh Banavar, Dr. Jayaraman Thiagarajan and Dr. Karthikeyan Ramamurthy for all those stimulating discussions, timely help, suggestions and continued motivation, which were vital for every quantum of my progress. I also thank Deepta Rajan, Sai Zhang, Shuang Hu, Suhas Ranganath, Mohit Shah, Steven Sandoval, Brandon Mechtley, Dr. Alex Fink, Bob Santucci, Xue Zhang, Prasanna Sattigeri and Huan Song for all their support.

The best is always reserved for the last. I express my deepest and warmest thanks to my parents for placing their faith in me. Their contributions are beyond the scope of this thesis. I thank all friends and relatives for their support. And I finally thank the grace of the divine for everything.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER
1 INTRODUCTION
  1.1 Human Auditory Models Based Processing and Loudness Estimation
  1.2 Computation Pruning for Efficient Loudness Estimation
  1.3 Automatic Control of Perceptual Loudness
  1.4 Interactivity in Speech/Audio DSP Education
  1.5 Contributions
  1.6 Organization of Thesis
2 PSYCHOACOUSTICS AND THE HUMAN AUDITORY SYSTEM
  2.1 The Human Auditory System
    The Outer Ear
    The Middle Ear
    The Inner Ear
  2.2 Psychoacoustics
    The Absolute Threshold of Hearing
    Auditory Masking
    Critical Bands
3 HUMAN AUDITORY MODELS AND LOUDNESS ESTIMATION
  3.1 Loudness Level and the Equal Loudness Contours (ELC)
  3.2 Steven's Law and the 'Sone' Scale for Loudness
  3.3 Loudness Estimation from Neural Excitations
  3.4 The Moore and Glasberg Model for Loudness Estimation
    Outer and Middle Ear Transformation
    The Auditory Filters: Computing the Excitation Pattern
    Specific Loudness Pattern
    Total Loudness
    Short-term and Long-term Loudness
  3.5 Moore and Glasberg Model: Algorithm Complexity
4 EFFICIENT LOUDNESS ESTIMATION - COMPUTATION PRUNING TECHNIQUES
  4.1 Estimating the Excitation Pattern from the Intensity Pattern
    Interpolative Detector Pruning
    Exploiting Region of Support of Auditory Filters in Computation Pruning
  4.2 Simulation and Results
    Experimental Setup
    Performance of Proposed Detector Pruning
5 LOUDNESS CONTROL
  5.1 Background
  5.2 Loudness Based Gain Adaptation
  5.3 Preserving the Tonal Balance
  5.4 Loudness Control System Setup
6 SPEECH/AUDIO PROCESSING FUNCTIONALITY IN IJDSP
  6.1 The iJDSP Architecture: Enhancements
    Framework for Blocks with Long Signal Processing Capabilities
    Frame-by-Frame Processing and Visualization
    Planned DSP Simulations
  6.2 Developed DSP Blocks
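Section 3.2 of the outline refers to Stevens' law and the "sone" scale. For orientation, here is a minimal sketch of the standard phon-to-sone mapping (1 sone at 40 phons, with loudness doubling for every 10-phon increase); this is the textbook relation, not code from the thesis:

```python
def phons_to_sones(loudness_level_phons: float) -> float:
    """Standard mapping from loudness level (phons) to loudness (sones):
    40 phons corresponds to 1 sone, and perceived loudness doubles
    for every 10-phon increase above that."""
    return 2.0 ** ((loudness_level_phons - 40.0) / 10.0)

# A 60-phon sound is perceived as roughly four times as loud as a 40-phon one.
print(phons_to_sones(40.0))  # 1.0
print(phons_to_sones(60.0))  # 4.0
```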
Recommended publications
  • An Assessment of Sound Annoyance As a Function of Consonance
    An Assessment of Sound Annoyance as a Function of Consonance
    Song Hui Chon, CCRMA / Electrical Engineering, Stanford University, Stanford, CA 94305. [email protected]

    Abstract: In this paper I study the annoyance generated by synthetic auditory stimuli as a function of consonance. The concept of consonance is one of the building blocks for Western music theory, and recently tonal consonance (to distinguish from musical consonance) is getting attention from the psychoacoustics community as a perception parameter. This paper presents an analysis of the experimental result from a previous study, using tonal consonance as a factor. The result shows that there is a direct correlation between annoyance and consonance and that perceptual annoyance, for the given manner of stimulus presentation, increases as a stimulus becomes more consonant at a given loudness level.

    1 Introduction

    1.1 Addressing Annoyance

    Annoyance is one of the most common feelings people experience every day in modern society. It is a highly subjective sensation; however, this paper presumes that there is a commonality behind each person's annoyance perception. It is generally understood that one aspect of annoyance is a direct correlation with spectral power [1], although the degree of correlation is under investigation [2][3]. Previous studies have assumed that noise level is a strict gauge for determining annoyance [4], but it is reasonable to suggest that the influence of sound level might be more subtle; annoyance is, after all, a subjective and highly personal characteristic. One study reveals that subjects perceive annoyance differently based on their ability to influence it [5]. Another demonstrates that age plays a significant role in annoyance judgments [6], though conflicting results indicate more complex issues.
  • Relating Optical Speech to Speech Acoustics and Visual Speech Perception
    UNIVERSITY OF CALIFORNIA, Los Angeles

    Relating Optical Speech to Speech Acoustics and Visual Speech Perception

    A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering by Jintao Jiang, 2003. © Copyright by Jintao Jiang 2003. The dissertation of Jintao Jiang is approved: Kung Yao, Lieven Vandenberghe, Patricia A. Keating, Lynne E. Bernstein, Abeer Alwan (Committee Chair). University of California, Los Angeles, 2003.

    Table of Contents

    Chapter 1. Introduction
      1.1. Audio-Visual Speech Processing
      1.2. How to Examine the Relationship between Data Sets
      1.3. The Relationship between Articulatory Movements and Speech Acoustics
      1.4. The Relationship between Visual Speech Perception and Physical Measures
      1.5. Outline of This Dissertation
    Chapter 2. Data Collection and Pre-Processing
      2.1. Introduction
      2.2. Background
      2.3. Recording a Database for the Correlation Analysis
  • Improving Opus Low Bit Rate Quality with Neural Speech Synthesis
    Improving Opus Low Bit Rate Quality with Neural Speech Synthesis
    Jan Skoglund (Google, San Francisco, CA, USA), Jean-Marc Valin (Amazon, Palo Alto, CA, USA)
    [email protected], [email protected]

    Abstract: The voice mode of the Opus audio coder can compress wideband speech at bit rates ranging from 6 kb/s to 40 kb/s. However, Opus is at its core a waveform matching coder, and as the rate drops below 10 kb/s, quality degrades quickly. As the rate reduces even further, parametric coders tend to perform better than waveform coders. In this paper we propose a backward-compatible way of improving low bit rate Opus quality by re-synthesizing speech from the decoded parameters. We compare two different neural generative models, WaveNet and LPCNet. WaveNet is a powerful, high-complexity, and high-latency architecture that is not feasible for a practical system, yet provides a best known achievable quality with generative models.

    …learned representation set [11]. A typical WaveNet configuration requires a very high algorithmic complexity, in the order of hundreds of GFLOPS, along with a high memory usage to hold the millions of model parameters. Combined with the high latency, in the hundreds of milliseconds, this renders WaveNet impractical for a real-time implementation. Replacing the dilated convolutional networks with recurrent networks improved memory efficiency in SampleRNN [12], which was shown to be useful for speech coding in [13]. WaveRNN [14] also demonstrated possibilities for synthesizing at lower complexities compared to WaveNet. Even lower complexity and real-time operation was recently reported using LPCNet [15].
  • Lecture 16: Linear Prediction-Based Representations
    LECTURE 16: LINEAR PREDICTION-BASED REPRESENTATIONS

    Objectives:
    • Introduce analysis/synthesis systems
    • Discuss error analysis and gain-matching
    • Relate linear prediction coefficients to other spectral representations
    • Introduce reflection and prediction coefficient recursions
    • Lattice/ladder filter implementations

    Topics: System View: Analysis/Synthesis; The Error Signal; Gain Matching; Transformations: Levinson-Durbin; Reflection Coefficients; Transformations; The Burg Method; Summary: Signal Modeling; Typical Front End.

    There is a classic textbook on this subject: J.D. Markel and A.H. Gray, Linear Prediction of Speech, Springer-Verlag, New York, New York, USA, ISBN: 0-13-007444-6, 1976. This lecture also includes material from two other textbooks: J. Deller et al., Discrete-Time Processing of Speech Signals, MacMillan Publishing Co., ISBN: 0-7803-5386-2, 2000; and L.R. Rabiner and B.W. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Upper Saddle River, New Jersey, USA, ISBN: 0-13-015157-2, 1993.

    On-line resources: AJR: Linear Prediction; LPC10E; MAD LPC; Filter Design.

    From ECE 8463: FUNDAMENTALS OF SPEECH RECOGNITION, Professor Joseph Picone, Department of Electrical and Computer Engineering, Mississippi State University; email: [email protected]; phone/fax: 601-325-3149; office: 413 Simrall; URL: http://www.isip.msstate.edu/resources/courses/ece_8463. Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP).
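Since the lecture's transformations center on the Levinson-Durbin recursion and reflection coefficients, a compact numpy sketch of that standard recursion may help (my own illustration, not course code; function and variable names are arbitrary):

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: solve the LPC normal equations from the
    autocorrelation sequence r[0..order]. Returns the prediction-error
    filter a = [1, a1, ..., ap], the reflection coefficients, and the
    final prediction error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    error = r[0]
    reflection = np.zeros(order)
    for i in range(1, order + 1):
        # Partial correlation (reflection) coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / error
        reflection[i - 1] = k
        # Order-update of the predictor coefficients.
        a[1:i + 1] += k * a[i - 1::-1][:i]
        error *= (1.0 - k * k)
    return a, reflection, error

# Example: 4th-order LPC from the autocorrelation of a short test signal.
x = np.sin(0.3 * np.arange(64)) + 0.01 * np.random.randn(64)
r = np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(5)])
a, k, err = levinson_durbin(r, 4)
print(a, k, err)
```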
  • Source-Channel Coding for CELP Speech Coders
    Source-Channel Coding for CELP Speech Coders

    J.A. Asenstorfer, B.Ma.Sc. (Hons), B.Sc. (Hons)
    University of Adelaide, Department of Electrical and Electronic Engineering
    Thesis submitted for the Degree of Doctor of Philosophy. September 26, 1994.

    Contents

    1 Speech Coders for Noisy Channels
      1.1 Introduction
      1.2 Thesis Goals and Historical Perspective
    2 Linear Predictive Coders Background
      2.1 Introduction
      2.2 Linear Predictive Models
      2.3 Autocorrelation of Speech
      2.4 Generic Code Excited Linear Prediction
    3 Pitch Estimation Based on Chaos Theory
      3.1 Introduction
        3.1.1 A Novel Approach to Pitch Estimation
      3.2 The Non-Linear Dynamical Model
      3.3 Determining the Delay
      3.4 Speech Database
      3.5 Initial Findings
        3.5.1 Poincaré Sections
        3.5.2 Poincaré Section in Time
      3.6 Speech Classification
        3.6.1 The Zero Set
        3.6.2 The Devil's Staircase
        3.6.3 Orbit Direction Change Indicator
      3.7 Voicing Estimator
      3.8 Pitch Tracking
      3.9 The Pitch Estimation Algorithm
      3.10 Results from the Pitch Estimator
      3.11 Waveform Substitution
        3.11.1 Replication of the Last Pitch Period
        3.11.2 Unvoiced Substitution
        3.11.3 Splicing
        3.11.4 Memory Considerations for CELP Coders
      3.12 Conclusion
    4 Vector Quantiser and the Missing Data Problem
      4.1 Introduction
      4.2 Why Use Vector Quantisation?
        4.2.1 Generation of a Vector Quantiser Using a Training Set
        4.2.2 Some Useful Optimal Vector Quantisation Results
      4.3 Heuristic Argument
  • An Introduction to Music Theory, Revision A
    An Introduction to Music Theory, Revision A. By Tom Irvine. Email: [email protected]. July 4, 2002.

    Historical Background

    Pythagoras of Samos was a Greek philosopher and mathematician who lived from approximately 560 to 480 BC. Pythagoras and his followers believed that all relations could be reduced to numerical relations. This conclusion stemmed from observations in music, mathematics, and astronomy. Pythagoras studied the sound produced by vibrating strings. He subjected two strings to equal tension. He then divided one string exactly in half. When he plucked each string, he discovered that the shorter string produced a pitch which was one octave higher than the longer string. A one-octave separation occurs when the higher frequency is twice the lower frequency. German scientist Hermann Helmholtz (1821-1894) made further contributions to music theory. Helmholtz wrote "On the Sensations of Tone" to establish the scientific basis of musical theory.

    Natural Frequencies of Strings

    A note played on a string has a fundamental frequency, which is its lowest natural frequency. The note also has overtones at consecutive integer multiples of its fundamental frequency. Plucking a string thus excites a number of tones.

    Ratios

    The theories of Pythagoras and Helmholtz depend on the frequency ratios shown in Table 1.

    Table 1. Standard Frequency Ratios

      Ratio   Name
      1:1     Unison
      1:2     Octave
      1:3     Twelfth
      2:3     Fifth
      3:4     Fourth
      4:5     Major Third
      3:5     Major Sixth
      5:6     Minor Third
      5:8     Minor Sixth

    These ratios apply both to a fundamental frequency and its overtones, as well as to the relationship between separate keys.
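To make the ratios concrete, a small sketch (my own illustration, not from the article) that applies Table 1 to a 440 Hz fundamental; each ratio is written lower:higher, so the upper pitch is the fundamental times (second term / first term):

```python
# Frequencies of the intervals in Table 1 above a 440 Hz fundamental.
ratios = {
    "Unison": (1, 1), "Octave": (1, 2), "Twelfth": (1, 3),
    "Fifth": (2, 3), "Fourth": (3, 4), "Major Third": (4, 5),
    "Major Sixth": (3, 5), "Minor Third": (5, 6), "Minor Sixth": (5, 8),
}

fundamental_hz = 440.0
for name, (low, high) in ratios.items():
    print(f"{name:12s} {fundamental_hz * high / low:8.2f} Hz")
# e.g. the Fifth above 440 Hz is 660 Hz, and the Octave is 880 Hz.
```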
  • Samia Dawood Shakir
    MODELLING TALKING HUMAN FACES

    Thesis submitted to Cardiff University in candidature for the degree of Doctor of Philosophy. Samia Dawood Shakir, School of Engineering, Cardiff University, 2019.

    DECLARATION: This work has not been submitted in substance for any other degree or award at this or any other university or place of learning, nor is being submitted concurrently in candidature for any degree or other award.

    STATEMENT 1: This thesis is being submitted in partial fulfillment of the requirements for the degree of PhD.

    STATEMENT 2: This thesis is the result of my own independent work/investigation, except where otherwise stated, and the thesis has not been edited by a third party beyond what is permitted by Cardiff University's Policy on the Use of Third Party Editors by Research Degree Students. Other sources are acknowledged by explicit references. The views expressed are my own.

    STATEMENT 3: I hereby give consent for my thesis, if accepted, to be available online in the University's Open Access repository and for inter-library loan, and for the title and summary to be made available to outside organisations.

    ABSTRACT: This thesis investigates a number of new approaches for visual speech synthesis using data-driven methods to implement a talking face. The main contributions in this thesis are the following. The accuracy of the shared Gaussian process latent variable model (SGPLVM), built using the active appearance model (AAM) and relative spectral transform-perceptual linear prediction (RASTA-PLP) features, is improved by employing a more accurate AAM.
  • End-To-End Video-To-Speech Synthesis Using Generative Adversarial Networks
    End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks
    Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis (Member, IEEE), Björn W. Schuller (Fellow, IEEE), Maja Pantic (Fellow, IEEE)

    Video-to-speech is the process of reconstructing the audio speech from a video of a spoken utterance. Previous approaches to this task have relied on a two-step process where an intermediate representation is inferred from the video, and is then decoded into waveform audio using a vocoder or a waveform reconstruction algorithm. In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm. Our model consists of an encoder-decoder architecture that receives raw video as input and generates speech, which is then fed to a waveform critic and a power critic. The use of an adversarial loss based on these two critics enables the direct synthesis of raw audio waveform and ensures its realism. In addition, the use of our three comparative losses helps establish direct correspondence between the generated audio and the input video. We show that this model is able to reconstruct speech with remarkable realism for constrained datasets such as GRID, and that it is the first end-to-end model to produce intelligible speech for LRW (Lip Reading in the Wild), featuring hundreds of speakers recorded entirely 'in the wild'. We evaluate the generated samples in two different scenarios – seen and unseen speakers – using four objective metrics which measure the quality and intelligibility of artificial speech.
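As a rough illustration of the two-critic setup described above, a minimal PyTorch sketch; this is not the authors' model, and every layer size, name, and hyperparameter below is invented for the example:

```python
import torch
import torch.nn as nn

class VideoToSpeech(nn.Module):
    """Toy encoder-decoder: video frames in, raw waveform out."""
    def __init__(self, samples_per_frame=640):  # e.g. 16 kHz audio at 25 fps video
        super().__init__()
        # Per-frame visual encoder (e.g. grayscale mouth crops).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.temporal = nn.GRU(64, 128, batch_first=True)
        self.decoder = nn.Linear(128, samples_per_frame)  # frame -> audio chunk

    def forward(self, video):                  # video: (B, T, 1, H, W)
        b, t = video.shape[:2]
        z = self.encoder(video.flatten(0, 1)).view(b, t, -1)
        h, _ = self.temporal(z)
        return torch.tanh(self.decoder(h)).reshape(b, -1)   # (B, T*samples)

waveform_critic = nn.Sequential(   # scores realism of the raw waveform
    nn.Conv1d(1, 16, 31, stride=4, padding=15), nn.LeakyReLU(0.2),
    nn.Conv1d(16, 1, 31, stride=4, padding=15),
)

def power_critic_features(wav):
    """Log-magnitude spectrogram that a 'power critic' would score."""
    spec = torch.stft(wav, n_fft=512, hop_length=128, return_complex=True)
    return torch.log1p(spec.abs())

model = VideoToSpeech()
video = torch.randn(2, 10, 1, 64, 64)          # 2 clips, 10 frames each
fake_wav = model(video)                        # (2, 6400)
adv_loss = -waveform_critic(fake_wav.unsqueeze(1)).mean()  # generator's adversarial term
```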
  • The Perception of Pure and Mistuned Musical Fifths and Major Thirds: Thresholds for Discrimination, Beats, and Identification
    Perception & Psychophysics, 1982, 32(4), 297-313

    The perception of pure and mistuned musical fifths and major thirds: Thresholds for discrimination, beats, and identification
    JOOS VOS, Institute for Perception TNO, Soesterberg, The Netherlands

    In Experiment 1, the discriminability of pure and mistuned musical intervals consisting of simultaneously presented complex tones was investigated. Because of the interference of nearby harmonics, two features of beats were varied independently: (1) beat frequency, and (2) the depth of the level variation. Discrimination thresholds (DTs) were expressed as differences in level (ΔL) between the two tones. DTs were determined for musical fifths and major thirds, at tone durations of 250, 500, and 1,000 msec, and for beat frequencies within a range of .5 to 32 Hz. The results showed that DTs were higher (smaller values of ΔL) for major thirds than for fifths, were highest for the lowest beat frequencies, and decreased with increasing tone duration. Interaction of tone duration and beat frequency showed that DTs were higher for short tones than for sustained tones only when the mistuning was not too large. It was concluded that, at higher beat frequencies, DTs could be based more on the perception of interval width than on the perception of beats or roughness. Experiments 2 and 3 were designed to ascertain to what extent this was true. In Experiment 2, beat thresholds (BTs) for a large number of different beat frequencies were determined. In Experiment 3, DTs, BTs, and thresholds for the identification of the direction of mistuning (ITs) were determined. For mistuned fifths and major thirds, sensitivity to beats was about the same.
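The beats studied here arise from interference between nearby harmonics of the two complex tones; for a fifth, the 3rd harmonic of the lower tone nearly coincides with the 2nd harmonic of the upper tone. A small sketch (my own illustration, with made-up frequencies) computing the resulting beat frequency:

```python
# Beat frequency between the nearly coinciding harmonics of a mistuned fifth.
# For a pure 3:2 fifth, 3*f_low == 2*f_high and the beat frequency is zero.
f_low = 220.0                      # lower tone fundamental (Hz)
mistuning_cents = 10.0             # detune the upper tone by 10 cents

f_high_pure = f_low * 3 / 2                       # 330 Hz, perfect fifth
f_high = f_high_pure * 2 ** (mistuning_cents / 1200)

beat_hz = abs(3 * f_low - 2 * f_high)             # interfering harmonics
print(f"beat frequency = {beat_hz:.2f} Hz")       # about 3.8 Hz for 10 cents
```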
  • Linear Predictive Modelling of Speech - Constraints and Line Spectrum Pair Decomposition
    Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Espoo 2004, Report 71

    LINEAR PREDICTIVE MODELLING OF SPEECH - CONSTRAINTS AND LINE SPECTRUM PAIR DECOMPOSITION
    Tom Bäckström

    Dissertation for the degree of Doctor of Science in Technology to be presented with due permission for public examination and debate in Auditorium S4, Department of Electrical and Communications Engineering, Helsinki University of Technology, Espoo, Finland, on the 5th of March, 2004, at 12 o'clock noon.

    Helsinki University of Technology, Department of Electrical and Communications Engineering, Laboratory of Acoustics and Audio Signal Processing (Teknillinen korkeakoulu, Sähkö- ja tietoliikennetekniikan osasto, Akustiikan ja äänenkäsittelytekniikan laboratorio). P.O. Box 3000, FIN-02015 HUT. Tel. +358 9 4511, Fax +358 9 460 224, E-mail [email protected]. ISBN 951-22-6946-5, ISSN 1456-6303. Otamedia Oy, Espoo, Finland 2004. "Let the floor be the limit!"

    ABSTRACT OF DOCTORAL DISSERTATION, HELSINKI UNIVERSITY OF TECHNOLOGY (P.O. Box 1000, FIN-02015 HUT, http://www.hut.fi)
    Author: Tom Bäckström
    Name of the dissertation: Linear predictive modelling for speech - Constraints and Line Spectrum Pair Decomposition
    Date of manuscript: 6.2.2004
    Date of the dissertation: 5.3.2004
    Type: Article dissertation (summary + original articles)
    Department: Electrical and Communications Engineering
    Laboratory: Laboratory of Acoustics and Audio Signal Processing
    Field of research: Speech processing
    Opponent(s): Associate Professor Peter Händel
    Supervisor: Professor Paavo Alku

    Abstract: In an exploration of the spectral modelling of speech, this thesis presents theory and applications of constrained linear predictive (LP) models.
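Line spectrum pair (LSP) decomposition, the subject of this dissertation, maps an order-p LP polynomial A(z) to a symmetric and an antisymmetric polynomial, P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1), whose roots interlace on the unit circle when A(z) is stable. A minimal numpy sketch of this standard textbook construction (not the author's code):

```python
import numpy as np

def lsp_decompose(a):
    """Line spectrum pair decomposition of an LP polynomial.

    a : LP coefficients [1, a1, ..., ap] of A(z).
    Returns the positive LSP angles (radians) of
      P(z) = A(z) + z^-(p+1) A(z^-1)   (symmetric)
      Q(z) = A(z) - z^-(p+1) A(z^-1)   (antisymmetric)
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])    # extend to degree p+1
    p_poly = a_ext + a_ext[::-1]          # reversal gives z^-(p+1) A(z^-1)
    q_poly = a_ext - a_ext[::-1]
    # For stable A(z), all roots of P and Q lie on the unit circle.
    p_angles = np.angle(np.roots(p_poly))
    q_angles = np.angle(np.roots(q_poly))
    return np.sort(p_angles[p_angles > 0]), np.sort(q_angles[q_angles > 0])

# Example with a stable 2nd-order predictor.
lsf_p, lsf_q = lsp_decompose([1.0, -0.9, 0.4])
print(lsf_p, lsf_q)   # interlaced line spectral frequencies
```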
  • A Biological Rationale for Musical Consonance
    PERSPECTIVE: A biological rationale for musical consonance
    Daniel L. Bowling (Department of Cognitive Biology, University of Vienna, 1090 Vienna, Austria) and Dale Purves (Duke Institute for Brain Sciences, Duke University, Durham, NC 27708)
    Edited by Solomon H. Snyder, Johns Hopkins University School of Medicine, Baltimore, MD, and approved June 25, 2015 (received for review March 25, 2015)

    The basis of musical consonance has been debated for centuries without resolution. Three interpretations have been considered: (i) that consonance derives from the mathematical simplicity of small integer ratios; (ii) that consonance derives from the physical absence of interference between harmonic spectra; and (iii) that consonance derives from the advantages of recognizing biological vocalization and human vocalization in particular. Whereas the mathematical and physical explanations are at odds with the evidence that has now accumulated, biology provides a plausible explanation for this central issue in music and audition.

    Keywords: consonance | biology | music | audition | vocalization

    Why we humans hear some tone combinations as relatively attractive (consonance) and others as less attractive (dissonance) has been debated for over 2,000 years (1–4). These perceptual differences form the basis of melody when tones are played sequen-…

    …perfect fifth (3:2), and the perfect fourth (4:3), ratios that all had spiritual and cosmological significance in Pythagorean philosophy (9, 10). The mathematical range of Pythagorean consonance was extended in the Renaissance…

    …revolution in the 17th century, which introduced a physical understanding of musical tones. The science of sound attracted many scholars of that era, including Vincenzo and Galileo Galilei, René Descartes, and later Daniel Bernoulli and Leonhard Euler.
  • Beethoven's Fifth 'Sine'-Phony: the Science of Harmony and Discord
    Contemporary Physics, Vol. 48, No. 5, September-October 2007, 291-295

    Beethoven's Fifth 'Sine'-phony: the science of harmony and discord
    TOM MELIA, Exeter College, Oxford OX1 3DP, UK
    (Received 23 October 2007; in final form 18 December 2007)

    Can science reveal the secrets of music? This article addresses the question with a short introduction to Helmholtz's theory of musical harmony. It begins by discussing what happens when tones are played at the same time, which introduces us to the idea of beats. Next, the difference between a pure tone and a note played on an instrument is explained, which provides the insight needed to form Helmholtz's theory. This allows us to explain why a perfect fifth sounds consonant yet a diminished fifth is dissonant.

    When sitting through a performance of Beethoven's Fifth Symphony, we are interpreting the complex sound wave which propagates through the air from the orchestra to our ears. We say that we are 'listening to music' and, be it classical, rock or rap, music is considered a form of art. But a sound wave is something we study in physics. So just how much science is there behind the music? Can we use physics to explain why Ludwig's harmony is harmonious? And…

    …an octave. Thus, if something vibrates at 523.3 Hz we would hear the note C an octave above middle C (this note is denoted C^, the note two octaves up C^^ and so on; see figure 1). So what happens when we hear more than one pure tone at the same time? The two sound waves combine, obeying the Principle of Superposition, and the resultant wave is as in figure 2.
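The superposition of two pure tones mentioned here is easy to reproduce numerically: adding cos(2πf₁t) and cos(2πf₂t) gives a carrier at the mean frequency whose amplitude swells at the difference frequency, which is what we hear as beats. A short sketch (my own illustration):

```python
import numpy as np

# Superpose two nearby pure tones and observe the beat frequency.
sr = 8000                                    # sample rate (Hz)
t = np.arange(2 * sr) / sr                   # 2 seconds of samples
f1, f2 = 440.0, 444.0                        # two nearby pure tones
wave = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

# Equivalent form: 2*cos(2*pi*(f1+f2)/2 * t) * cos(2*pi*(f2-f1)/2 * t),
# i.e. a 442 Hz carrier whose envelope pulses |f2 - f1| = 4 times per second.
envelope = np.abs(2 * np.cos(2 * np.pi * (f2 - f1) / 2 * t))
print("beats per second:", f2 - f1)
```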