Tuvan Throat Singing and Harmonics


Michael J Ruiz (1) and David Wilken (2)

(1) Department of Physics, University of North Carolina at Asheville, Asheville, NC 28804, United States of America
(2) MusicWorks Asheville, Asheville, NC 28804, United States of America

E-mail: [email protected] and [email protected]

Phys. Educ. 53 (2018) 035011 (6pp), iopscience.org/ped, doi: 10.1088/1361-6552/aaa921, © 2018 IOP Publishing Ltd

Abstract

Tuvan throat singing, also called overtone singing, provides for an exotic demonstration of the physics of harmonics as well as introducing an Asian musical aesthetic. A low fundamental is sung and the singer skillfully alters the resonances of the vocal system to enhance an overtone (harmonic above the fundamental). The result is that the listener hears two pitches simultaneously. Harmonics such as H8, H9, H10, and H12 form part of a pentatonic scale and are commonly selected for melody tones by Tuvan singers. A real-time spectrogram is provided in a video (Ruiz M J 2018 Video: Tuvan Throat Singing and Harmonics http://mjtruiz.com/ped/tuva/) so that Tuvan harmonics can be visualized as they are heard.

Where is Tuva?

Richard Feynman, who shared the 1965 Nobel Prize in physics for his work on quantum electrodynamics, answered in a British documentary [1]: ‘just outside of Outer Mongolia, in the middle of Central Asia, in the depths of Russia, far away from anything’. Feynman’s interest in Tuva (also called Tannu Tuva) dated back to when, as a child, his father told him about interesting stamps from this captivating land. Later in life, he attempted to go to Tuva [2] with his friend Ralph Leighton (son of Caltech physicist Robert B Leighton [footnote 3]), but Feynman passed away before finally getting permission to travel there.

Footnote 3: Leighton senior is coauthor of The Feynman Lectures on Physics (Richard P Feynman, Robert B Leighton, and Matthew Sands), first published in 1964.

Not only is the location far out, but Tuva is also a very important region for throat singing. In throat singing, at least one overtone is emphasized with the fundamental pitch. The composer and later ethnomusicologist A N Aksenov wrote:

Throat singing is known not only to the Tuvins, but also to several neighboring peoples (Mongols, Oirats, Khakass, Gorno-Altais and Bashkirs). However, among the Tuvins it has been preserved in the most developed and widespread form, … [3].

Tibetan monks also have a tradition of throat singing with an extremely low fundamental pitch (first harmonic), close to 75 Hz [4]. In contrast, the typical Tuvan fundamental is near 150 Hz, an octave higher, as demonstrated in this paper.

Throat singing is also called overtone singing. Individual overtones (harmonics above the fundamental) are enhanced in this type of singing, which is a beautiful demonstration of the laws of physics. Students can visualize Fourier’s theorem, which states that any periodic tone with frequency f can be constructed by adding sine waves with frequencies f, 2f, 3f, and so on, the harmonics.

Some students may already be familiar with Tuvan throat singing from the popular television show ‘The Big Bang Theory’, where the fictional character Sheldon Cooper demonstrates the art in an episode. Students will find throat singing especially fascinating if the teacher begins by asking them if it is possible for the human voice to sing two or three different notes at the same time. The skill employed in throat singing allows the singer to enhance harmonics above the fundamental, which creates the illusion that one is singing a musical interval (two notes) or even a chord (three or more notes). The practice for singing a tune is to emphasize one overtone so that by hearing the fundamental and the amplified higher harmonic, one perceives two simultaneously sounding notes. The melody is carried by the overtone and the fundamental serves as a constant pedal bass tone.

Coauthor David Wilken first learned how to sing the base pedal-tone pitch with a particular sort of constriction that closes the ventricular folds, sometimes called the ‘false folds’. These false vocal folds are ‘paired tissues that occur directly above the true folds’ [5]. This practice helps the oral cavity resonate more and brings out the overtones. Then he learned to employ a two-cavity technique by creating a second resonance chamber by raising his tongue up to the roof of the mouth. The cavity configuration is varied by adjusting the position of the tongue against the roof of the mouth. In this manner higher harmonics can be selected for emphasis. A single harmonic has an ethereal whistle quality characteristic of sine waves with midrange to high frequencies perceived by humans. Melodies with tones that fit the pitches available in the overtone series can then be sung. Since your students are likely to ask about the mechanism by which throat singing works, we include a more detailed description in the following section.

The mechanism of throat singing

The Tuvan style of throat singing, known generically as khoomei, begins with applying a special type of constriction of the vocal folds in order to help isolate the harmonics inside the oral cavity. It can be learned by softly singing a low pitch using the lowest register of the voice (the vocal fry). Maintaining the vocal folds in the same position, add a bit of volume and tone to the voice. This throat constriction closes the vocal folds somewhat and helps to create the resonance chamber inside the mouth.

To produce the harmonics, the tongue must raise and lower inside the mouth in a manner similar to whistling. The tip of the tongue is placed behind the upper teeth, as if saying the letter ‘L’ and the sides of the tongue are brought against the molars. By raising and lowering the middle of the tongue while the tip and sides are in this position, the harmonics are isolated and melodies can be sung. It takes some practice and experimentation for the singer to learn the precise position of the tongue in order to isolate the specific partials and bring out the harmonics.

When initially learning to sing in this way the singer’s throat can itch, causing a coughing fit. Singers in training must be careful not to sing too loud and when the throat itches, the singer should stop, drink some water, and take a break to avoid damaging the vocal folds. The above description is informational and not intended to be instructional. As with any advanced vocal technique, it is best to learn with a professional voice teacher.

There are other variants to throat singing among the indigenous populations of Siberia (Tuvans), Mongolia, and Tibet. In all cases, the physics of resonance is important. The two resonances described earlier lead to the listener’s perception of two distinct tones. Levin and Edgerton describe the lower pedal tone as ‘a low, sustained fundamental pitch, similar to the drone of a bagpipe’ [5]. They describe the whistle-like overtone as a sound in ‘a series of flutelike harmonics, which resonate high above the drone and may be musically stylized to represent such sounds as the whistle of a bird, the syncopated rhythms of a mountain stream or the lilt of a cantering horse’ [5]. Later we will show why these higher tones are typically between three and four octaves above the fundamental. To read more about the different styles of throat singing with accompanying diagrams of the vocal system, the reader is encouraged to consult the excellent references [5, 6].

Formants

Before analyzing the harmonics of throat singing, first consider speech and usual singing. The resonances in the vocal system, unique to each individual due to variation in biological structure, enhance some frequency regions of the speech or song. These enhanced regions are called formants or formant regions. Throat singing is a skillful control of these enhancements.

Figure 1. Coauthor Michael Ruiz (with no voice training) singing the vowels A, E, I, O, and U. Note the harmonics and the brighter areas. These enhanced brighter regions are called formants or formant regions.

See figure 1 for a spectrogram of coauthor Michael Ruiz singing the vowels A, E, I, O, and U with no voice training. The raw spectrogram for figure 1 came from the PC desktop software called ‘Spectrogram 16’, developed by Richard Horne years ago. A spectrogram is a plot of sine frequencies against time. Note the presence of many harmonics with various degrees of brightness. The stronger the presence of the harmonic, the more pronounced the harmonic appears in the […] O, and 2nd and 3rd for the U. Therefore, strong overtones are present in normal singing. In Tuvan singing, the strength of individual overtones can be expertly controlled by the singer. In this light, Tuvan throat singing is not so strange after all. All of us naturally have enhanced harmonics in our voice and song. Mark van Tongeren emphasizes this connection in his book Overtone Singing [6].

The throat singer ‘Michael Vetter has said that when we speak we produce sequences of chords. These are not the triads that our beloved great composers used, of course.
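
The harmonic arithmetic behind the statements above (melody harmonics H8, H9, H10, and H12, and overtones lying three to four octaves above the fundamental) can be checked with a short calculation. The following Python sketch is an illustration only and is not taken from the paper: it assumes a fundamental of exactly 150 Hz (the text says only "near 150 Hz"), the function name harmonic_table is hypothetical, and H16 is included solely to mark the four-octave point.

```python
import math

# Assumed fundamental in Hz; the text says the Tuvan fundamental is "near 150 Hz".
F0_HZ = 150.0


def harmonic_table(f0, harmonics):
    """Return (n, frequency, octaves above f0, cents above the octave below) per harmonic."""
    rows = []
    for n in harmonics:
        freq = n * f0                      # harmonic n has frequency n * f0
        octaves = math.log2(n)             # each doubling of frequency is one octave
        # Interval above the nearest octave of the fundamental at or below this
        # harmonic, in cents (1200 cents per octave).
        cents = 1200.0 * (octaves - math.floor(octaves))
        rows.append((n, freq, octaves, cents))
    return rows


if __name__ == "__main__":
    for n, freq, octaves, cents in harmonic_table(F0_HZ, [8, 9, 10, 12, 16]):
        print(f"H{n:<2d} {freq:7.1f} Hz  {octaves:5.3f} octaves above f0  {cents:6.1f} cents")
```

Under these assumptions H8 falls exactly three octaves above the fundamental and H16 exactly four, which brackets the melody harmonics. Measured from H8, the harmonics H9, H10, and H12 lie about 204, 386, and 702 cents higher, close to the second, third, and fifth degrees of a major scale, which is consistent with the abstract's remark that these harmonics form part of a pentatonic scale.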