Noise'' Versus the Harmonic Series for Musical Instrument Recognition
Total Page:16
File Type:pdf, Size:1020Kb
The Significance of the Non-Harmonic “Noise” Versus the Harmonic Series for Musical Instrument Recognition Arie Livshin, Xavier Rodet To cite this version: Arie Livshin, Xavier Rodet. The Significance of the Non-Harmonic “Noise” Versus the Harmonic Series for Musical Instrument Recognition. International Symposium on Music Information Retrieval (ISMIR’06), 2006, NA, France. pp.1-1. hal-01161372 HAL Id: hal-01161372 https://hal.archives-ouvertes.fr/hal-01161372 Submitted on 8 Jun 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. The Significance of the Non-Harmonic “Noise” Versus the Harmonic Series for Musical Instrument Recognition Arie Livshin Xavier Rodet IRCAM Centre Pompidou 1 place Stravinsky, Paris 75004, FRANCE arie.livshin,[email protected] Abstract the energy levels of the different harmonics is widely Sound produced by Musical instruments with definite considered as the main characteristic of pitched pitch consists of the Harmonic Series and the non- instruments’ timbre (e.g. [1]), if we subtract this harmonic Residual. It is common to treat the Harmonic Harmonic Series from the original sound there is a non- Series as the main characteristic of the timbre of pitched harmonic Residual left. This Residual is far from being musical instruments. But does the Harmonic Series 'white noise'; it is heavily filtered by the nature of the indeed contain the complete information required for instrument itself as well as the playing technique, and discriminating among different musical instruments? may contain inharmonic sinusoidal partials as well as Could the non-harmonic Residual, the “noise”, be used all non-sinusoidal ‘noise’, such as the breathing sounds in by itself for instrument recognition? The paper begins by the flute or the scraping noises in the guitar. performing musical instrument recognition with an Does the Harmonic Series indeed encapsulate all the extensive sound collection using a large set of feature distinguishing information of the sounds of pitched descriptors, achieving a high instrument recognition rate. musical instruments? If so, about the same instrument Next, using Additive Analysis/Synthesis, each sound recognition rate should be achieved by using only the sample is resynthesized using solely its Harmonic Series. Harmonic Series as by using all the information in the These “Harmonic” samples are then subtracted from the signal, with the same feature descriptor set used for original samples to retrieve the non-harmonic Residuals. classification. This is a practical question for the field of Instrument recognition is performed on the resynthesized instrument recognition; when performing instrument and the “Residual” sound sets. The paper shows that the recognition in multi-instrumental, polyphonic music, it is Harmonic Series by itself is indeed enough for achieving difficult as well as computationally expensive to perform a high instrument recognition rate; however, the non- full source separation [2] and restore the original sounds harmonic Residuals by themselves can also be used for out of the polyphonic mixture in order to recognize each distinguishing among musical instruments, although with source separately. On the other hand, estimating the lesser success. Using feature selection, the best 10 feature Harmonic Series of the different notes in the mixture is a descriptors for instrument recognition out of our relatively easier task [3]. For example, in [4] Harmonic extensive feature set are presented for the Original, Series estimation is used for performing “Source Harmonic and Residual sound sets. Reduction”, reducing the volume of all instruments except one and then recognizing it. In [1], instrument Keywords: instrument recognition, musical instruments, recognition is performed using only features based on the residual, noise, harmonic series, pitch Harmonic Series, estimated from pre-given notes. There 1. Introduction is also research attempting to perform instrument recognition by recognizing directly instrument mixtures, Musical instruments with definite pitch (“pitched instead of trying to separate them into individual instruments”) are usually based on a periodic oscillator instruments, see for example [5]. such as a string or a column of air with non-linear Another interesting question comes from the opposite excitation. In consequence, their sound is mostly direction: is the non-harmonic Residual, the “noise” a composed of a Harmonic Series of sinusoidal partials, i.e. musical instrument produces, so distinct as to allow frequencies which are integer multiples of the distinguishing between different instrument types, e.g. fundamental frequency (f0). While the relation between can we actually distinguish between different wind instruments just by the sound of their airflow hiss? Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies In order to answer these questions, the paper explores are not made or distributed for profit or commercial advantage and that how instrument recognition rates using signals copies bear this notice and the full citation on the first page. resynthesized solely from the Harmonic Series of the © 2006 University of Victoria sound, and signals containing solely the non-harmonic Residuals, compare with the recognition rates when using an “instrument Instance”. The total number of instrument the complete signals. In order to perform this comparison Instances in the sound set is 77. as directly as possible, the first step is to achieve a high All the sounds are sampled in 44 KHz, 16 bit, mono. instrument recognition rate. This is accomplished here by computing an extensive set of feature descriptors on a 3. Harmonic Sounds and Residuals large and diverse set of pitched musical instrument sound Additive analysis/synthesis is based on Fourier's theorem, samples, reducing the feature dimensions with Linear which states that any physical function that varies Discriminant Analysis (LDA) and then classifying the periodically with time with a frequency f can be sounds with K-nearest neighbours (KNN). expressed as a superposition of sinusoidal components of Next, the Harmonic Series of each sample in the sound frequencies: f, 2f, 3f, 4f, etc. Additive synthesis applies set is estimated, including the f0s, harmonic partials and this theorem to the synthesis of sound [6]. For a review of corresponding energy levels, and using Additive supplementary Additive Synthesis techniques see [7]. Synthesis all the signals are resynthesized using only their In order to separate the sound samples into their Harmonic Series, thus creating synthesized ‘images’ of harmonic and non-harmonic components, the samples are the original signals which lack any non-harmonic analyzed and then selectively resynthesized using the information. These resynthesized sounds will be referred Additive analysis/synthesis program, “Additive” [8], to in the paper as “Harmonic” signals, while the original which considers also inharmonic deviations of the partials sounds from the sound set will be called, the “Original” (e.g. found in the piano sounds). Very precise Additive signals. As the phase information of the Original signals analysis was performed by supplying the Additive is kept in the Harmonic signals, by subtracting the program with specifically tailored parameters for each Harmonic signals from the Original signals we remain sound sample using its note name and octave, known in with the non-harmonic, “noisy”, part of the signals, advance, for estimating its f0. For example, the Additive referred to shortly as the “Residuals”. analysis/synthesis window size was set to 4*(1/f0), FFT After that, the same set of feature descriptors is size to 4*nextpow21(sampleRate * windowSize), etc. original resynthesized residual computed on each sample group: the Original, Harmonic 0.15 0.15 0.15 and Residual Signals. These three groups are then divided separately into training and test sets and instrument 0.1 0.1 0.1 recognition is performed on each group independently. 0.05 0.05 0.05 The instrument recognition results are presented and 0 0 0 compared in Section 7. Using the Correlation-based Feature Selection (CFS) -0.05 -0.05 -0.05 algorithm with a greedy stepwise forward search method, -0.1 -0.1 -0.1 the 10 most important feature descriptors for each of the 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 three groups of samples are estimated and presented. 4 4 4 x 10 x 10 x 10 2. Original Sound Set Figure 1. Left to right: original Clarinet sample (A3), the sample resynthesized from the Harmonic Series, the The sound set consists of 5006 samples of single notes Residual (subtraction). of 10 “musical instruments”: bassoon, clarinet, flute, trombone, trumpet, contrabass, contrabass pizzicato, Figure 1 shows an example of an Original clarinet violin, violin pizzicato and piano. As the violin and bass sample (the note A3), the sound we resynthesized from pizzicato sounds are very different from the bowed the Harmonic Series of the Original,