Music Perception
Degradation of musical signal quality after suppression of frequency channels

M. Paquier (1,2), J.C. Béra (1), C. Berger-Vachon (2)

(1) Laboratoire de Mécanique des Fluides et d’Acoustique, CNRS UMR 5509, Ecole Centrale de Lyon, 36 av. Guy de Collongue, BP 163, 69131 Ecully Cedex, France [email protected]
(2) Laboratoire “Neurosciences et Systèmes Sensoriels”, CNRS UMR 5020, 50 avenue Tony Garnier, 69366 Lyon Cedex 07, France

This study focused on the sound quality of musical sounds reconstructed after frequency channel suppression. The musical signals used were single-instrument sequences. Most instrument families were studied. The processing began with a time-frequency analysis by sliding-window FFT. Then, only the most energetic frequency channels were kept (width was a parameter), and ten subjects were asked about the quality of these degraded sounds. 40 channels with a 40 Hz width were sufficient to conserve a correct quality for these musical sources, and asymptotic results were obtained from 80 channels with a 40 Hz width.

INTRODUCTION

One purpose of computational auditory scene analysis is the separation of simultaneous acoustic sources. Building such a system into automatic recognition algorithms or hearing aids could resolve the cocktail party problem (performance is quite good when only one source is present, but as soon as a competing source or a noise interferes, intelligibility is degraded). The method classically used to separate sources is to assign frequency channels to each of them. Channels common to several sources then raise a problem: deleting them entirely leads to a loss of useful information, while sharing their energy between several sources is arbitrary and introduces noise [1]. Speech spectra and musical instrument spectra are very redundant: Warren et al. [2] showed that keeping a few narrow spectral bands (1/20 octave wide) was sufficient to preserve correct intelligibility. Loizou et al. [3] divided speech spectra into n channels, then reconstructed a signal by adding n sine waves at the central frequencies of the channels. They found recognition rates above 90% for n=5 bands, and asymptotic results were obtained from n=8.

In this work, we kept only the most energetic frequency channels (width was a parameter) of single musical instrument sequences, and then asked subjects about the quality of these degraded sounds. We thus studied the sound quality of different musical sources as a function of the number of kept channels and of their width.

MATERIAL AND METHOD

We studied the degradation of the quality of ten instruments: violin, guitar, piano, clarinet, oboe, flute, trumpet, accordion, glockenspiel, and bongos. Instruments were recorded using a 44100 Hz sampling frequency and 16-bit quantization. The ten sounds were simple 5-second melodies, played by professional musicians, without chords, vibrato, or tremolo.

Every sequence was analysed with a sliding-window FFT, with a 50% overlap and a Hamming window. The duration of the temporal window Lwin, which determines the frequency precision Fu, was a variable parameter (4 values):
- Lwin = 50 ms ⇒ Fu = 20 Hz
- Lwin = 25 ms ⇒ Fu = 40 Hz
- Lwin = 12.5 ms ⇒ Fu = 80 Hz
- Lwin = 6.25 ms ⇒ Fu = 160 Hz

To be sure to observe the influence of the frequency precision Fu and not that of the inversely proportional temporal precision, we down-sampled the time axis of the time-frequency representations, so as to always refresh the spectrum every 25 milliseconds.

Then we kept only the n most energetic channels. The energy of the remaining channels was set to zero. The parameter n could take six different values: n = 5, 10, 20, 40, 80, 160.

Note that the case Fu = 160 Hz and n = 160 means keeping a bandwidth of 160 × 160 = 25600 Hz, which is higher than fe/2 (half the sampling frequency, i.e. 22050 Hz). This case was replaced by simply keeping all the frequencies (reference case).

Experimental protocol

A five-minute pre-test allowed subjects to get to know the original sounds and the overall range of the degradations applied to them. All subjects were interested in music. Six were non-professional musicians at different levels, and four were professional musicians. Obviously we did not want to restrict this study of sound quality to musicians, but only persons who know the normal timbre of an instrument are capable of assessing its possible degradation (even though the original sounds were played at the beginning of the test).

Then the main test began: the subjects listened to a sequence, more or less frequency-impoverished, played by an instrument whose name was displayed on a computer screen. They then had to assess the quality (by clicking on the screen) on the following scale: "not recognised" (the heard instrumental timbre was so degraded that it did not correspond to the instrument displayed on the screen), "very degraded", "degraded", "correct", "very good", "perfect".

Ten sequences were played for six different numbers of kept channels and four different widths for these channels, so a test was composed of 240 randomised sounds (10 × 6 × 4). To homogenize assessments, the test was repeated three times. The whole session thus comprised 720 sounds and lasted about two hours.

RESULTS AND DISCUSSION

Figure I indicates the assessments given by all the subjects for all ten instruments. The variable parameters were the number of kept channels and the width of these channels.

Logically, the more channels were kept, the better the sound quality was rated. Also, the wider these channels were, the more information they kept, and the better the sound quality was. The musician/non-musician factor had no effect, and the assessment differences between instruments were globally weak (not shown here).

A fundamental finding of the experiment is the small number of channels necessary to keep the quality of instrumental sources: 40 channels with a 40 Hz width were sufficient to reach the assessment “correct”, and asymptotic results were obtained from 80 channels with a 40 Hz width. This observation can be particularly useful for the separation of simultaneous sources: if only a few channels are sufficient to correctly describe a source, channels common to several sources can be removed entirely rather than having their energy shared.

Another observation is the relatively small importance of channel width relative to channel number. Keeping 2N channels of width L thus seems more profitable than keeping N channels of width 2L. Improving the quality by increasing channel width was seen mainly when the number of channels was between 10 and 40. When fewer channels were kept, the quality was very poor, whatever the channel width. On the other hand, the step "very good" was obtained as soon as 80 channels were kept, whatever their width. The relatively small importance of the width of channels relative to their number confirms the theory that keeping a narrow band around the central frequency of each frequency peak of a sound is sufficient for its correct description.
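The channel suppression described under MATERIAL AND METHOD can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the n most energetic channels are selected independently in each analysis frame and that the signal is resynthesised by overlap-add, neither of which is stated in the text, and it omits the down-sampling of the time axis to a 25 ms refresh. The function names `channel_width_hz` and `keep_strongest_channels` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def channel_width_hz(win_dur):
    """Frequency precision Fu (Hz) of an FFT over a window of win_dur seconds."""
    return 1.0 / win_dur


def keep_strongest_channels(signal, fs, win_dur, n_keep):
    """Keep the n_keep most energetic FFT channels of each frame, zero the others.

    Assumptions (not from the paper): per-frame selection and
    overlap-add resynthesis normalised by the summed window.
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    signal = np.asarray(signal, dtype=float)
    win_len = int(round(win_dur * fs))
    hop = win_len // 2                       # 50 % overlap
    window = np.hamming(win_len)             # Hamming analysis window
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len] * window
        spectrum = np.fft.rfft(frame)
        if n_keep < len(spectrum):
            energy = np.abs(spectrum) ** 2
            # Indices of every channel except the n_keep most energetic ones.
            drop = np.argsort(energy)[:-n_keep]
            spectrum[drop] = 0.0
        out[start:start + win_len] += np.fft.irfft(spectrum, n=win_len)
        norm[start:start + win_len] += window
    covered = norm > 1e-8                    # samples reached by at least one frame
    out[covered] /= norm[covered]
    return out
```

For instance, `keep_strongest_channels(x, 44100, 0.025, 40)` would correspond to the condition of 40 channels with a 40 Hz width for a recording `x`, under the assumptions above. Samples at the very end of the signal that no complete frame reaches are left at zero.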
Finally, let us note that although the kept channels were mainly distributed over frequencies lower than 8000 Hz, some isolated high-frequency channels were sometimes kept, and it is possible that these channels, absent in classic systems using reduced bandwidth (such as the telephone), are important.

Figure I. Assessment (1 = "not recognised", 2 = "very degraded", 3 = "degraded", 4 = "correct", 5 = "very good", 6 = "perfect") as a function of the number of kept channels (5, 10, 20, 40, 80, 160), for channel widths of 20 Hz, 40 Hz, 80 Hz and 160 Hz.

Note that the range was not completely used by the subjects: assessments rarely exceeded the step "very good".

REFERENCES

1. Cooke and G.J. Brown, “Computational auditory scene analysis: Exploiting principle of perceived continuity”, Speech Communication, 13, 391-399 (1993).
2. R.M. Warren, K.R. Riener, J.A. Bashford Jr, B.S. Brubaker, “Spectral redundancy: intelligibility of sentences heard through narrow spectral slits”, Percept. Psychophys., 57(2), 175-182 (1995).
3. P.C. Loizou, M. Dorman, Z. Tu, “On the number of channels needed to understand speech”, J. Acoust. Soc. Am., 106(4), 2097-2103

Aesthetic Evaluation for Allowable Answers of "Given Bass" tasks in the Theory of Harmony

M. Miura (a), M. Yamada (b), M. Obana (c) and M. Yanagida (a)

(a) Faculty of Engineering, Doshisha University, Kyo-tanabe, Kyoto, Japan
(b) Dept. of Musicology, Osaka Univ. of Arts, Kanan-cho, Osaka, Japan
(c) Dept. of Music, Takarazuka Univ. of Art and Design, Takarazuka, Hyogo, Japan

Evaluation of allowable solutions for "given bass" tasks is an appropriate subject for investigating what musical aesthetics is, because its search space is finite: the number of allowable solutions for a "given bass" sequence is limited, though it might be large. The authors have developed a "Basse Donnée" System (BDS), which can generate all allowable solutions for any given bass sequence without violating the inhibition rules in the theory of harmony, within triads.
If there are tendencies common among professional musicians, or even among novice students of composition courses, that will imply the existence of aesthetic evaluation criteria besides the inhibition rules in the theory of harmony. A series of comparative aesthetic evaluation tests was carried out on several complete sets of allowable solutions obtained by BDS for given bass sequences, with professionals and students of composition courses as subjects. The results indicate that the common focal point of evaluation is the soprano line. Based on the results of the aesthetic evaluation tests, a "Music Aesthetics Evaluation System" is realized for predicting aesthetic scores for allowable solutions of given bass sequences.