
International Journal of Engineering Research and Technology. ISSN 0974-3154, Volume 12, Number 3 (2019), pp. 378-384 © International Research Publication House. http://www.irphouse.com

A Study on the Principles of Human Whistling and its Analysis by Using the Parameters of Speech Production Models

Sang-Hyu Yoon1, Kwang-Bock You2, and Myung-Jin Bae3*
1,2,3 Soong-sil University, Department of Information and Telecommunication Engineering, Seoul, 06978, Korea
ORCID iD: 0000-0002-7585-0400
*Corresponding author

Abstract: In this paper, the production principles of human whistling are studied and analyzed using the parameters of speech production models. The speech production models used in this paper are the vocal tract model and the source synthesis filter model. In order to ensure the validity of this study, musical scales are used as the experimental data, since their results can be predicted. Four types of musical scale data were used for comparison and analysis: the first is an artificial musical scale composed of sinusoidal signals, the second is a musical scale created by human whistling, the third is generated by the human voice, and the last is a musical scale played on a real piano. The whistling, voice, and piano data were recorded in a noisy environment.

It is shown that the fundamental frequency of human whistling has almost the same properties as the resonant frequency of the Helmholtz resonator. The analysis of the vocal tract speech production model, composed of tubes with non-uniform cross-sectional areas, suggests that this model can also serve as a production model of human whistling. The formant frequency of the speech signal is described by means of this modelling, and the relationship among the fundamental frequency of human whistling, the formant frequency of human speech, and the resonant frequency of the Helmholtz resonator is explained. The source synthesis filter model, which models speech production with impulse and random signals, and the Linear Prediction Coding (LPC) equations, which are the solutions of this model, are also studied. Using the four types of data mentioned above, we performed computer simulations for these models. In addition, applications using not only the Helmholtz resonator but also the properties of the fundamental frequency of human whistling are presented.

Keywords: Vocal Tract Model, Fundamental Frequency, Helmholtz Resonator, Formant Frequency, Resonant Frequency
1. INTRODUCTION

Human vocal organs can produce a considerable number of sounds, such as whistling, singing, and voice. People in the small town of Kuskoy in northern Turkey use whistling like their mother tongue. The ancient Greek historian Herodotus described humans who were "talking like bats" in Ethiopia. It is known that people in the Canary Islands have spoken a whistling language for at least 600 years [1]. It is also known that whistling is widely used for long-distance communication in the northwestern areas of Africa, the northern part of Laos, and the Amazon region of Brazil. Recently, in Japan, whistling contests have been held regularly, and the winners of such contests teach musical whistling in places like schools [2].

Whistling is a series of processes that produce a constant flow of air from the lungs, collect it at the lips, and control it using the tongue. While these processes are going on, the tongue, lips, teeth, or fingers change the flow of air and operate the mouth as a kind of resonating device to obtain the desired sound. The whistle analyzed in this paper is the labial whistle, which produces sound through a small circular opening formed by both lips and works on both exhalation and inhalation. These whistles have some tonal content, but they are clearly signals with a single frequency, or a single tone. Therefore, the fundamental frequency of human whistling is likely to be the same as the resonant frequency of the Helmholtz resonator [2], [3], [4].

Speech signal production models such as the vocal tract model and the source filter model have been studied consistently for more than 40 years, but there are relatively few studies on the principle of whistling production. In this paper, not only the speech signal and the whistle signal but also a piano signal is analyzed for the purpose of comparison. For comparison, the waveform of each signal, its frequency components, and its spectrum are obtained.

2. HUMAN WHISTLING, HELMHOLTZ RESONATOR, AND SPEECH PRODUCTION MODELS

2.1 The Principles of Human Whistling Production and Helmholtz Resonator

Unlike voice and song, whistling does not involve the larynx, or voice box, which is located at the center of the neck and carries out the functions of breathing and vocalization. Whistling is accomplished by a constant flow of air, produced from the lungs, passing across the mouth cavity to the outside in an almost periodic inhaled and exhaled airflow at the lips [4].

It is said that the frequency of the whistle is determined by the resonance of the Helmholtz resonator formed in the mouth cavity.

In fact, the fundamental frequency of the whistle is very close to the resonant frequency of the Helmholtz resonator, and experiments to demonstrate this have been carried out using the vocal tract model [2], [3]. However, some whistles are similar to the very high-pitched sounds produced by wind instruments, which use the response frequency of an air column [5].

Helmholtz resonance is a resonance phenomenon that occurs in the cavity or chamber of a bottle when air is injected through the bottleneck. In this resonance, the loudest sound is produced at or near the resonant frequency, which is determined by the length of the neck and the volume of the cavity of the Helmholtz resonator. Musical instruments such as the guitar are major examples of its use, and it is often used to increase the low-frequency response of loudspeakers [6], [7], [8].

$$ 2\pi F_h = c\sqrt{\frac{A}{VL}} \qquad (1) $$

Using equation (1), the Helmholtz resonance frequency F_h is calculated. Here, A, V, and L are the cross-sectional area of the neck, the volume of the cavity, and the length of the neck of the Helmholtz resonator, respectively, and c is the speed of sound in air. It can be noticed from this equation that, during whistling, the neck of the resonator is formed by the lips and its geometric shape hardly changes, while the volume of the cavity is changed by the movement of the tongue, so that the whistle can produce different tones [9].

Figure 1 below shows a typical frequency response of a Helmholtz resonator. As can be seen, the Helmholtz resonant frequency appears at 68 Hz.

Fig 1. Typical frequency response for Helmholtz Resonator
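Returning to equation (1), the short Python sketch below evaluates the Helmholtz resonance frequency for an assumed lip opening and mouth cavity; the numerical values of A, V, and L are placeholders chosen only for illustration, not measurements from this study.

```python
import math

def helmholtz_frequency(area_m2, volume_m3, neck_length_m, c=343.0):
    """Resonant frequency F_h from equation (1): 2*pi*F_h = c*sqrt(A / (V*L))."""
    return (c / (2.0 * math.pi)) * math.sqrt(area_m2 / (volume_m3 * neck_length_m))

# Illustrative (assumed) dimensions of a lip opening and mouth cavity:
A = 5.0e-5   # neck (lip opening) cross-sectional area, m^2  -- assumed value
V = 6.0e-5   # cavity (mouth) volume, m^3                    -- assumed value
L = 5.0e-3   # neck (lip) length, m                          -- assumed value

print(f"Estimated Helmholtz resonance: {helmholtz_frequency(A, V, L):.1f} Hz")
```

Under equation (1), shrinking the cavity volume V (for example, by raising the tongue) raises the predicted frequency, which matches the observation above that tongue movement changes the whistled tone.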
2.2 Speech Signal Production Principles and Formant Frequencies

The source synthesis filter model and the vocal tract model are typical examples of models of speech signal production. In fact, the vocal tract between the larynx and the pharynx is bent almost at a right angle. Nevertheless, the vocal tract model represents it as a connection of multiple tubes with non-uniform cross-sectional areas. Such non-uniform cross-sectional area modeling allows precise mathematical simulation of sound transmission in the vocal tract. When a signal is transmitted through these tubes, its frequency spectrum is shaped by the characteristics of the tubes. This phenomenon is very similar to the resonance effect that appears in the pipe of a wind instrument or an organ. These resonance frequencies are called formant frequencies in speech signal analysis [5]. Formant frequencies depend on the dimensions and shape of the vocal tract, such as its length and form. This means that the characteristics of the voice signal passing through the tubes vary with the lengths of the tubes and the cross-sectional areas of the vocal tract. The vocal tract can be analyzed with a lossless tube model, built by connecting lossless acoustic tubes that are closed at one end. The reflection coefficient at the junction between the kth and (k+1)th tubes can then be expressed by equation (2):

$$ r_k = \frac{A_{k+1}-A_k}{A_{k+1}+A_k} \qquad (2) $$

The transfer function of these connected tubes is expressed by equation (3):

$$ V(z) = \frac{\tfrac{1}{2}(1+r_G)\prod_{k=1}^{N}(1+r_k)\,z^{-N/2}}{1-\sum_{k=1}^{N}\alpha_k z^{-k}} \qquad (3) $$

In these equations, A_k, r_G, and α_k are the cross-sectional area of the kth tube, the reflection coefficient at the glottis, and the prediction coefficients, respectively [10], [11], [12].
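As a rough illustration of equations (2) and (3), the sketch below converts an assumed set of tube cross-sectional areas into reflection coefficients and inspects the magnitude response of the resulting all-pole filter. The area values, glottal reflection coefficient, and sampling rate are invented for illustration; the mapping from reflection coefficients to the denominator coefficients uses the standard step-up recursion, whose sign convention varies between references.

```python
import numpy as np
from scipy.signal import freqz, find_peaks

def reflection_coefficients(areas):
    """Equation (2): r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) at each tube junction."""
    areas = np.asarray(areas, dtype=float)
    return (areas[1:] - areas[:-1]) / (areas[1:] + areas[:-1])

def prediction_polynomial(refl):
    """Step-up recursion mapping reflection coefficients to the all-pole
    denominator of equation (3).  Sign conventions differ between references;
    this follows one common choice."""
    a = np.array([1.0])
    for k in refl:
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    return a

# Assumed (illustrative) cross-sectional areas of the concatenated tubes, cm^2.
areas = [2.6, 1.6, 0.65, 1.0, 2.0, 3.2, 4.0, 5.0, 5.0, 3.3]
r_G = 0.7                                   # assumed glottal reflection coefficient
r = reflection_coefficients(areas)
a = prediction_polynomial(r)

# Numerator gain of equation (3); the pure delay z^(-N/2) is omitted because
# it does not affect the magnitude response.
b = [0.5 * (1.0 + r_G) * np.prod(1.0 + r)]

fs = 8000.0                                 # assumed sampling rate, Hz
w, h = freqz(b=b, a=a, worN=2048, fs=fs)
peaks, _ = find_peaks(20 * np.log10(np.abs(h) + 1e-12))
print("Formant-like resonances (Hz):", np.round(w[peaks]).astype(int))
```

The peaks of this magnitude response play the role of the formant frequencies discussed above; changing the assumed area profile moves them, just as changing the vocal tract shape does.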

The source synthesis filter model is a method to implement speech signal production by using two different signal models: the voiced signal (modeled with an impulse signal) and the unvoiced signal (modeled with a random signal). It can be expressed as a time-varying filter that characterizes the glottal, vocal tract, and lip effects that contribute to the production of the speech signal. The transfer function of this time-varying filter is shown in equation (4):

$$ H(z) = \frac{S(z)}{X(z)} = \frac{G}{1+\sum_{k=1}^{p}\alpha_k z^{-k}} \qquad (4) $$

where X(z), S(z), and α_k are the input signal, the output of the filter, and the prediction coefficients, respectively. Equation (4) can be converted into the time domain as follows:

$$ s(n) = G\,x(n) - \sum_{k=1}^{p}\alpha_k\, s(n-k) \qquad (5) $$

Equation (5), which is the difference equation of Linear Prediction Coding (LPC), shows that the current output s(n) of the synthesis filter is determined by a linear combination of the current input Gx(n) and the past output samples.

Therefore, this model becomes a problem of finding the prediction parameters of this linear combination for a given speech signal. The ways to find these parameters are the autocorrelation function and the AMDF (Average Magnitude Difference Function) in the time domain, and the cepstrum method in the frequency domain. It can be said that speaker and language are not important factors in these analyses. Nevertheless, it is more important to apply the appropriate parameters to the model using the key features of the speaker or language.
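As a hedged illustration of how the prediction coefficients α_k of equation (5) can be obtained from the autocorrelation function, the sketch below applies the Levinson-Durbin recursion to one analysis frame and then runs the synthesis recursion of equation (5) on a noise excitation. The frame length, model order, and test signal are arbitrary choices for illustration and are not the settings used in this paper.

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Estimate prediction coefficients from the autocorrelation sequence
    using the Levinson-Durbin recursion (autocorrelation method of LPC)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + np.dot(a[1:m], r[m - 1:0:-1])) / err
        a[1:m] = a[1:m] + k * a[m - 1:0:-1]
        a[m] = k
        err *= (1.0 - k * k)
    return a, err        # A(z) = 1 + a[1] z^-1 + ...; err is the residual energy

def synthesize(a, excitation, gain=1.0):
    """Equation (5): s(n) = G x(n) - sum_k a_k s(n - k)."""
    p = len(a) - 1
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        past = sum(a[k] * s[n - k] for k in range(1, min(p, n) + 1))
        s[n] = gain * excitation[n] - past
    return s

# Toy example: analyze one frame of a synthetic tone and resynthesize from noise.
fs = 8000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 330 * t) + 0.1 * np.random.randn(len(t))   # assumed test tone
a, err = lpc_autocorrelation(frame * np.hamming(len(frame)), order=10)
s = synthesize(a, np.random.randn(len(frame)), gain=np.sqrt(max(err, 1e-12)))
```

Because the autocorrelation method guarantees a stable all-pole filter, the resynthesized signal inherits the spectral envelope, and hence the formant structure, of the analyzed frame.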

3. SIMULATION AND RESULTS

Fig 2. Time-Domain waveforms of C4 (Do) and E4 (Mi)

A musical note has its own pitch, which is determined by the perceived fundamental frequency of its constituent sinusoidal signals. A combination of different frequencies can exist in a musical note, and these affect the timbre of the note. The musical scale is tied to a frequency reference in which A4 is set to 440 Hz, and it defines the frequency relationship of one semitone as the twelfth root of two. Therefore, since A4 is 440 Hz, one semitone higher, A#4, should be 440 × 2^(1/12) = 466.16 Hz. In this way, based upon the semitone relationship between musical tones, we can calculate the frequency, in Hz, of any note on the musical scale. Figures 2 and 3 show the waveforms and frequency components of C4 (261.63 Hz) and E4 (329.63 Hz), the artificial musical scale tones composed of sinusoidal signals. These are compared with the human whistling, piano, and voice data recorded in a noisy environment.
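The semitone relationship described above can be expressed directly in code. The short sketch below computes equal-tempered note frequencies from their distance in semitones from A4 = 440 Hz; the note names and offsets are standard, but the helper itself is only an illustration.

```python
A4 = 440.0  # reference pitch, Hz

def note_frequency(semitones_from_a4):
    """Equal temperament: each semitone scales the frequency by 2**(1/12)."""
    return A4 * 2.0 ** (semitones_from_a4 / 12.0)

# A#4 is one semitone above A4; C4 is nine semitones below; E4 is five below.
for name, offset in [("A#4", 1), ("C4", -9), ("E4", -5)]:
    print(f"{name}: {note_frequency(offset):.2f} Hz")
```

This reproduces the values quoted above: 466.16 Hz for A#4, 261.63 Hz for C4, and 329.63 Hz for E4.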

Fig 3. The Fourier Transform of "C4 (Do) and E4 (Mi)"


The data used in this paper are the "C (Do)" and "E (Mi)" notes of the musical scale, and they are compared and analyzed with the corresponding signals produced by whistling, piano, and voice. First, these signals are preprocessed. The first step of the preprocessing is normalization, which exploits the mean and variance of the input signal. In the second step, the high-pass filter (HPF) of equation (6) is used to remove the DC component of the signal [12]:

$$ H_{hpf}(z) = \frac{0.946 - 1.892\,z^{-1} + 0.946\,z^{-2}}{1 - 1.889033\,z^{-1} + 0.89487\,z^{-2}} \qquad (6) $$

Figure 4 shows the time-domain waveforms after passing through the HPF of equation (6). Figures 5, 6, and 7 show the time-domain waveforms of the "C (Do)" data and the results of their Fourier transforms, while Figures 8, 9, and 10 show the time-domain waveforms (top) and the corresponding spectrograms (bottom) for the musical scale "E (Mi)". These signals, of course, went through the same preprocessing steps.
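A minimal sketch of this preprocessing chain, assuming a synthetic test tone and a 16 kHz sampling rate, is given below: zero-mean, unit-variance normalization followed by the high-pass filter of equation (6) applied with scipy.signal.lfilter.

```python
import numpy as np
from scipy.signal import lfilter

# Coefficients of the high-pass filter in equation (6).
b = [0.946, -1.892, 0.946]
a = [1.0, -1.889033, 0.89487]

def preprocess(x):
    """Normalize to zero mean / unit variance, then remove the DC component
    with the HPF of equation (6)."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)
    return lfilter(b, a, x)

# Assumed example: a 261.63 Hz tone with a DC offset, sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = 0.5 + np.sin(2 * np.pi * 261.63 * t)
y = preprocess(x)
print(f"mean before: {x.mean():.3f}, mean after: {y.mean():.5f}")
```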

Fig 4. Time-Domain waveforms after HPF

Fig 5. The waveform and its Fourier Transform of piano's "Do (C)"


Fig 6. The waveform and its Fourier Transform of Human Whistling "Do (C)"

Fig 7. The waveform and its Fourier Transform of Voicing "Do (C)"

Fig 8. The waveform of piano "Mi (E)" and corresponding spectrogram


Fig 9. The waveform of Human Whistling "Mi (E)" and corresponding spectrogram

Fig 10. The waveform of Voicing "Mi (E)" and corresponding spectrogram

4. CONCLUSION

In Figure 5, the Fourier transform results show a dominant frequency at 522.9 Hz, which matches the frequency (523 Hz) of "C (Do)" in the fifth octave in the table of frequencies for the musical scales and octaves. This result confirms the validity of the analysis programs used in this paper. Figure 6 shows that the fundamental frequency of "C (Do)" in human whistling forms at approximately 400 Hz. "C (Do)" of the human voice, shown in Figure 7, displays at least three formant frequencies, like general speech signals.

Comparing the spectra of "E (Mi)" in Figures 8, 9, and 10 shows that the signal energy of the piano is concentrated near the inherent frequencies of the musical scale, as expected. The energy of the voicing of "E (Mi)", as shown in Figure 10, gathers in at least three places, forming the formant frequencies of the signal. The spectrum of the whistling has its greatest amount of energy in the vicinity of 1330 Hz.

Using the frequencies investigated in this paper (resonant frequency, fundamental frequency, and formant frequency), it may be possible to invent a novel musical instrument useful for everyday life [13], [14]. In addition, by using the Helmholtz frequency, new musical scales, which might be used in a musical synthesizer, can be created. If a very loud sound can be generated by Helmholtz resonance, a convenient fire extinguisher might also be invented.
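The dominant-frequency check described above can be reproduced with a few lines of NumPy. In the sketch below, the largest FFT magnitude peak of a test tone is located; the 523.25 Hz sinusoid and the sampling rate are assumed stand-ins for the actual recordings analyzed in this paper.

```python
import numpy as np

def dominant_frequency(x, fs):
    """Return the frequency (Hz) of the largest magnitude peak in the FFT of x."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

fs = 16000
t = np.arange(2 * fs) / fs                    # two seconds of signal
tone = np.sin(2 * np.pi * 523.25 * t)         # assumed C5 test tone
print(f"Dominant frequency: {dominant_frequency(tone, fs):.1f} Hz")
```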


This possibility is supported by the spectrum of the whistle, in which the energy is aggregated in one frequency band. Looking at the frequency characteristics of the whistling, there is a very sharp notch at the peak. By using this property, it seems possible to create a good noise-cancelling filter.

REFERENCES

[1] https://www.newyorker.com/tech/elements/the-whistled-language-of-northern-turkey, August 17, 2015.
[2] T. Shigetomi and M. Mori, "Principles of sound resonance in human whistling using physical models of human vocal tract", Acoust. Sci. & Tech., Vol. 37, No. 2, 2016.
[3] R.G. Busnel, Whistled Languages, Springer-Verlag, Berlin Heidelberg, 1976.
[4] O. Keller, Pitch Detection in Human Whistling Sounds on an Embedded System, Project Report, Hochschule Karlsruhe Technik und Wirtschaft University, 2013.
[5] S. Adachi, "Principles of Sound Production in Wind Instruments", Acoust. Sci. & Tech., Vol. 25, No. 6, 2004.
[6] https://en.wikipedia.org/wiki/Helmholtz_resonance
[7] T. Arai, "Education system in acoustics of speech production using physical models of the human vocal tract", Acoust. Sci. & Tech., Vol. 28, No. 3, 2007.
[8] M. Nilsson, J. Bartunek, J. Nordberg, and I. Claesson, "Human Whistle Detection and Frequency Estimation", CISP, IEEE Computer Society, 2008.
[9] M. Han, Sound Reduction by a Helmholtz Resonator, Master of Science Thesis, Lehigh University, 2018.
[10] L.R. Rabiner and R.W. Schafer, Digital Speech Processing, Prentice Hall, Upper Saddle River, NJ, 2011.
[11] I.V. McLoughlin, Speech and Audio Processing, Cambridge University Press, New York, 2016.
[12] http://newt.phys.unsw.edu.au/jw/voice.html
[13] G.E. Poliner, D. Ellis, A. Ehmann, E. Gomez, S. Streich, and B. Ong, "Melody Transcription From Music Audio: Approaches and Evaluation", IEEE Trans. on Audio, Speech, and Language Processing, Vol. 15, No. 4, 2007, pp. 1247-1256.
[14] https://en.wikipedia.org/wiki/Whistle_register
[15] K.-B. You and M.-J. Bae, "A study of the variations of energy of the speech signal when english speakers speak Korean", Information (Japan), Vol. 20, No. 7, 2017, pp. 5035-5040.
[16] B.-Y. Kim and M.-J. Bae, "A study on the sound characteristics analysis of drum-sound rock", Journal of Engineering and Applied Sciences, Vol. 13, No. 12, 2018, pp. 4414-4418.
[17] S.-B. Park, M.-S. Kim, and M.-J. Bae, "A study on the high-pitched vocal voice of Lee, Nan-Young", Information (Japan), Vol. 20, No. 6, 2017, pp. 4063-4070.
[18] B.-Y. Kim, M.-S. Kim, and M.-J. Bae, "Study on the Sound Transmission Characteristics of Rescue Signal 'Ya-Ho'", International Journal of Engineering Research and Technology, Vol. 12, No. 2, 2019, pp. 222-226.
[19] I.-S. Ahn, B.-Y. Kim, K.-B. You, and M.-J. Bae, "A Study on the Characteristics of an EEG Based on a Singing Bowl's Sound Frequency", Studies in Computational Intelligence, Vol. 789, 2019, pp. 233-243.
[20] K.-B. You and M.-J. Bae, "Study on energy changes during Korean vocalization of english natives", Information (Japan), Vol. 19, No. 9A, September 2016, pp. 3913-3918.
