10. Pitch and Voicing Determination of Speech with an Extension Toward Music Signals


W. J. Hess

This chapter reviews selected methods for pitch determination of speech and music signals. As both these signals are time variant we first define what is subsumed under the term pitch. Then we subdivide pitch determination algorithms (PDAs) into short-term analysis algorithms, which apply some spectral transform and derive pitch from a frequency or lag domain representation, and time-domain algorithms, which analyze the signal directly and apply structural analysis or determine individual periods from the first partial or compute the instant of glottal closure in speech. In the 1970s, when many of these algorithms were developed, the main application in speech technology was the vocoder, whereas nowadays prosody recognition in speech understanding systems and high-accuracy pitch period determination for speech synthesis corpora are emphasized. In musical acoustics, pitch determination is applied in melody recognition or automatic musical transcription, where we also have the problem that several pitches can exist simultaneously.

10.1 Pitch in Time-Variant Quasiperiodic Acoustic Signals
     10.1.1 Basic Definitions
     10.1.2 Why is the Problem Difficult?
     10.1.3 Categorizing the Methods
10.2 Short-Term Analysis PDAs
     10.2.1 Correlation and Distance Function
     10.2.2 Cepstrum and Other Double-Transform Methods
     10.2.3 Frequency-Domain Methods: Harmonic Analysis
     10.2.4 Active Modeling
     10.2.5 Least Squares and Other Statistical Methods
     10.2.6 Concluding Remarks
10.3 Selected Time-Domain Methods
     10.3.1 Temporal Structure Investigation
     10.3.2 Fundamental Harmonic Processing
     10.3.3 Temporal Structure Simplification
     10.3.4 Cascaded Solutions
10.4 A Short Look into Voicing Determination
     10.4.1 Simultaneous Pitch and Voicing Determination
     10.4.2 Pattern-Recognition VDAs
10.5 Evaluation and Postprocessing
     10.5.1 Developing Reference PDAs with Instrumental Help
     10.5.2 Error Analysis
     10.5.3 Evaluation of PDAs and VDAs – Some Results
     10.5.4 Postprocessing and Pitch Tracking
10.6 Applications in Speech and Music
10.7 Some New Challenges and Developments
     10.7.1 Detecting the Instant of Glottal Closure
     10.7.2 Multiple Pitch Determination
     10.7.3 Instantaneousness Versus Reliability
10.8 Concluding Remarks
References

Pitch and voicing determination of speech signals are the two subproblems of voice source analysis. In voiced speech, the vocal cords vibrate in a quasiperiodic way. Speech segments with voiceless excitation are generated by turbulent air flow at a constriction or by the release of a closure in the vocal tract. The parameters we have to determine are the manner of excitation, i.e., the presence of a voiced excitation and the presence of a voiceless excitation, a problem we will refer to as voicing determination, and, for the segments of the speech signal in which a voiced excitation is present, the rate of vocal cord vibration, which is usually referred to as pitch determination or fundamental frequency determination in the literature.
Unlike the analysis of vocal-tract parameters, where a number of independent and equivalent representations are possible, there is no alternative to the parameters pitch and voicing, and the quality of a synthesized signal critically depends on their reliable and accurate determination. This chapter presents a selection of the methods applied in pitch and voicing determination. The emphasis, however, is on pitch determination.

Over the last two decades, the task of fundamental frequency determination has become increasingly popular in musical acoustics as well. In the beginning the methodology was largely imported from the speech community, but then the musical acoustics community developed algorithms and applications of its own, which in turn became increasingly interesting to the speech communication area. Hence it appears justified to include the aspect of fundamental frequency determination of music signals and to present some of the methods and specific problems of this area. One specific problem is multipitch determination from polyphonic signals, a problem that might also occur in speech when we have to separate two or more simultaneously speaking voices.

Pitch determination has a rather long history which goes back even beyond the times of vocoding. Literally hundreds of pitch determination algorithms (PDAs) have been developed. The most important developments leading to today's state of the art were made in the 1960s and 1970s; most of the methods that are briefly reviewed in this chapter were extensively discussed during this period [10.1]. Since then, least-squares and other statistical methods, particularly in connection with sinusoidal models [10.2], have entered the domain. A number of known methods were improved and refined, and other solutions, whose computational effort appeared prohibitive at the time they were first developed, were revived. With the widespread use of databases containing large amounts of labeled and processed speech data, it has nowadays also become possible to evaluate the performance of the algorithms thoroughly.

The bibliography in [10.1], dating from 1983, includes about 2000 entries. To give a complete overview of the more-recent developments, at least another 1000 bibliographic entries would have to be added. It goes without saying that this is not possible here given the space limitations. So we will necessarily have to present a selection, and many important contributions cannot be described.

The remainder of this chapter is organized as follows. In Sect. 10.1 the problems of pitch and voicing determination are described, definitions of what is subsumed under the term pitch are given, and the various PDAs are grossly categorized. Sections 10.2 and 10.3 give a more-detailed description of selected PDAs. Section 10.4 briefly reviews a selection of voicing determination algorithms (VDAs); Sect. 10.5 deals with questions of error analysis and evaluation. Selected applications are discussed in Sect. 10.6, and Sect. 10.7 finally presents a couple of new developments, such as determining the instant of glottal closure or processing signals that contain more than one pitch, such as polyphonic music.

10.1 Pitch in Time-Variant Quasiperiodic Acoustic Signals

10.1.1 Basic Definitions

Pitch, i.e., the fundamental frequency F0 and the fundamental period T0 of a (quasi)periodic signal, can be measured in many ways. If a signal is completely stationary and periodic, all these strategies – provided they operate correctly – lead to identical results. Since both speech and musical signals, however, are nonstationary and time variant, aspects of each strategy such as the starting point of the measurement, the length of the measuring interval, the way of averaging (if any), or the operating domain (time, frequency, lag, etc.) start to influence the results and may lead to estimates that differ from algorithm to algorithm even if all these results are correct and accurate. Before entering a discussion of individual methods and applications, we must therefore have a look at the parameter pitch and provide a clear definition of what should be measured and what is actually measured.

A word on terminology first. There are three points of view for looking at such a problem of acoustic signal processing [10.3]: the production, the signal-processing, and the perception points of view. For pitch determination of speech, the production point of view is obviously oriented toward phonation in the human larynx; we will thus have to start from a time-domain representation of the waveform as a train of laryngeal pulses. If a pitch determination algorithm (PDA) works in a speech-production oriented way, it measures individual laryngeal excitation cycles or, if some averaging is performed, the rate of vocal-fold vibration. The signal-processing point of view, which can be applied to any acoustic signal, means that (quasi)periodicity or at least cyclic behavior is observed, and that the task is to extract those features that best represent this periodicity. The pertinent terms are fundamental frequency and fundamental period. If individual cycles are determined, we may (somewhat inconsistently) […]

Fig. 10.1a,b Time-domain definitions of T0: (a) speech signal (a couple of periods), with T0 marked according to Defs. (2) and (3); (b) glottal waveform (reconstructed), with T0 marked according to Def. (1). For further detail, see the text.

2. T0 is defined as the elapsed time between two successive laryngeal pulses. Measurement starts at an arbitrary point within an excitation cycle. The choice of this point depends on the individual method, but for a given PDA it is always located at the same position within the cycle.
   Time-domain PDAs usually follow this definition. The reference point can be a significant extreme, a certain zero crossing, etc. The signal is tracked period by period in a synchronous way, yielding individual pitch periods. This principle can be applied to both speech and music signals. In speech it may even be possible to derive the point of glottal closure from the reference point when the signal is undistorted.

3. T0 is defined as the elapsed time between two successive excitation cycles. Measurement starts at an arbitrary instant which is fixed according to external conditions, and ends when a complete cycle has elapsed. This is an incremental definition.
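The two algorithm families distinguished above can be illustrated with a minimal sketch (our own illustration on a synthetic three-harmonic signal; function names and parameter values are assumptions, not taken from this chapter): a short-term analysis PDA that picks the strongest peak of the autocorrelation in the lag domain, and a time-domain PDA in the sense of Def. (2) that tracks a peak reference point period by period, yielding individual periods T0.

```python
import numpy as np

# Synthetic quasiperiodic test signal: three in-phase harmonics of 120 Hz.
fs = 16000
f0_true = 120.0
t = np.arange(int(0.1 * fs)) / fs          # 100 ms of signal
x = (np.sin(2 * np.pi * f0_true * t)
     + 0.5 * np.sin(2 * np.pi * 2 * f0_true * t)
     + 0.3 * np.sin(2 * np.pi * 3 * f0_true * t))

def autocorr_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Short-term analysis PDA: pick the strongest autocorrelation peak
    (lag-domain representation) inside the admissible period range and
    convert that lag into an average fundamental frequency."""
    frame = frame - frame.mean()
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_lo = int(fs / f0_max)              # shortest admissible T0 in samples
    lag_hi = int(fs / f0_min)              # longest admissible T0 in samples
    lag = lag_lo + int(np.argmax(r[lag_lo:lag_hi]))
    return fs / lag

def track_periods(x, fs, t0_est):
    """Time-domain PDA in the sense of Def. (2): take the main positive
    peak of each cycle as the reference point and return the elapsed time
    between successive reference points, i.e., individual periods T0."""
    lag = int(round(fs * t0_est))          # coarse period from a short-term PDA
    ref = int(np.argmax(x[:lag]))          # reference point in the first cycle
    periods = []
    while ref + lag + lag // 4 < len(x):
        # Search for the corresponding peak one coarse period ahead (+/- 25%).
        lo, hi = ref + lag - lag // 4, ref + lag + lag // 4
        nxt = lo + int(np.argmax(x[lo:hi]))
        periods.append((nxt - ref) / fs)   # one individual period in seconds
        ref = nxt
    return periods

f0_est = autocorr_f0(x[:800], fs)          # 50 ms frame -> average F0, ~120 Hz
periods = track_periods(x, fs, 1.0 / f0_est)
print(round(f0_est, 1))                    # close to 120 Hz
print(round(1.0 / np.median(periods), 1))  # close to 120 Hz
```

On clean quasiperiodic input both estimates agree; the difference is exactly the one drawn above: the short-term method returns one averaged value per frame, whereas the Def. (2) tracker yields the length of every individual period, at the price of needing a reasonable coarse period estimate to start from.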