"Speech Pattern Recognition for Speech to Text Conversion", PhD Thesis, Saurashtra University

Total Pages: 16

File Type: PDF, Size: 1020 KB

Saurashtra University
Re-accredited Grade 'B' by NAAC (CGPA 2.93)

Kumbharana, Chandresh K., 2007, "Speech Pattern Recognition for Speech to Text Conversion", PhD thesis, Saurashtra University. http://etheses.saurashtrauniversity.edu/id/eprint/337

Copyright and moral rights for this thesis are retained by the author. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the author. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the author. When referring to this work, full bibliographic details, including the author, title, awarding institution and date of the thesis, must be given.

Saurashtra University Theses Service, http://etheses.saurashtrauniversity.edu, [email protected]. © The Author.

Speech Pattern Recognition for Speech to Text Conversion

A thesis submitted to Saurashtra University in fulfillment of the requirement for the degree of Doctor of Philosophy (Computer Science). By CK Kumbharana. Guided by Dr. N. N. Jani, Prof. & Head, Department of Computer Science, Saurashtra University, Rajkot. Department of Computer Science, Saurashtra University, Rajkot. November 2007.

Certificate by Guide

This is to certify that the contents of the thesis entitled "Speech Pattern Recognition for Speech to Text Conversion" are the original research work of Mr. Kumbharana Chandresh Karsanbhai, carried out under my supervision and guidance. I further certify that the work has not been submitted, either partially or fully, to any other university or institute for the award of any degree.

Place: Rajkot. Date: 30-11-2007. Dr. N. N. Jani, Prof. & Head, Department of Computer Science, Saurashtra University, Rajkot.

Candidate's Statement

I hereby declare that the work incorporated in the present thesis is original and has not been submitted to any university or institution for the award of a degree. I further declare that the results presented in the thesis, and the considerations made therein, contribute in general to the advancement of knowledge in education, and in particular to "Speech Pattern Recognition for Speech to Text Conversion".

Date: 30-11-2007. (CK Kumbharana), Candidate.

Acknowledgement

I extend my sincere thanks to Dr. NN Jani, Professor and Head, Department of Computer Science, who permitted me to work under his guidance. I take this opportunity to convey my deepest gratitude to him for his valuable advice, comforting encouragement, keen interest, constructive criticism, scholarly guidance and wholehearted support.

I appreciate the assistance of my friend Prof. Suneelchandra Ramanth Dwivedi, Associate Professor, Department of Computer Science, Bhavnagar University, Bhavnagar, and thank him for his help and enthusiastic support during the work.

I express my special thanks to Dr. Anil D. Ambasana, Associate Professor, Department of Education, Saurashtra University, Rajkot, for his constant encouragement, elderly advice and the moral support he provided during the work. His mentorship has been a rewarding experience, which I will treasure my whole life.

I am highly obliged to five persons, Dr. Anil Ambasana, Mr. Prashant Ambasana, Heena Kumbharana, Mansi Kumbharana and Kirti Makadia, who gave their valuable time and voices for the recording of the Gujarati characters that were finally used as the master database for my research work.

I have no words to express my sincere thanks to all the teaching and non-teaching staff of my department, the Department of Computer Science, Saurashtra University, Rajkot. Nor can I forget all the current as well as past students of MCA, who were always eagerly praying to God for the completion of my work as per my ambition and desire; I am afraid the list of their names would be a very long one. I am grateful for their encouragement and convictions, and again thank them all with due regard.

I must express my indebtedness to my father Karasanbhai and mother Dhirajben, my wife Heena, and my loving daughter Ku. Mansi. Without their kind and silent support this work could not have been completed. Above all, I offer my heartiest regard to God for giving me the strength and inspiration to work.

-CK Kumbharana

List of Figures

2.1 Simple speech production model (p. 23)
2.2 Schematic view of human speech production mechanism (p. 24)
2.3 Mechanism for creating speech (p. 25)
2.4 Speech coding and decoding process (p. 27)
2.5 Speech analysis synthesis system (p. 27)
2.6 Synthesis models applied in speech coding (p. 27)
2.7 Block diagram of text-to-speech synthesis system (p. 29)
2.8 Speech enhancement by spectral magnitude estimation (p. 31)
2.9 Basic structure of speaker recognition system (p. 33)
2.10 Wave (p. 40)
2.11 Wave (p. 40)
2.12 Sine waves of various frequencies (p. 41)
2.13 Wavelength (p. 44)
2.14 Two tones of different amplitude (frequency and phase constant) (p. 47)
2.15 The mel scale, showing the relationship between pitch (in mels) and frequency (in Hertz) (p. 48)
2.16 Zero crossing (p. 52)
2.17 The basic/enhanced SSP pipeline (p. 57)
3.1 Speech processing mechanism (p. 94)
3.2 Speech recognition system (p. 95)
3.3 Model for speech file preparation (p. 96)
3.4 Speech detection process (p. 96)
3.5 Speech recognition system setup (p. 97)
4.1 Layout of S in Glyph (p. 123)
4.2 Layout of F in Glyph (p. 124)
4.3 Layout of l in Glyph (p. 124)
4.4 Layout of L in Glyph (p. 124)
4.5 Layout of ] in Glyph (p. 124)
4.6 Layout of } in Glyph (p. 125)
4.7 Layout of [ in Glyph (p. 125)
4.8 Layout of { in Glyph (p. 125)
4.9 Layout of M in Glyph (p. 125)
4.10 Layout of Á in Glyph (p. 126)
4.11 Layout of \ in Glyph (p. 126)
4.12 Layout of o in Glyph (p. 126)
4.13 Main screen for editor ckksedit (p. 130)
4.14 Title bar (p. 130)
4.15 Menu bar (p. 131)
4.16 File menu (p. 131)
4.17 Edit menu (p. 133)
4.18 Format menu (p. 134)
4.19 Toolbar 1 (p. 135)
4.20 Toolbar 2 (p. 136)
4.21 Editing region (p. 137)
4.22 Status bar (p. 138)
4.23 Accelerator (IDR_MENU1) (p. 155)
4.24 Dialog IDD_CKKPLAY (p. 155)
4.25 Icon (IDR_MENU1) (p. 156)
4.26 Cepstrum analysis steps for pitch detection (p. 168)
4.27 Conceptual diagram illustrating vector quantization codebook formation (p. 169)
4.28 Block diagram of the MFCC processor (p. 173)
4.29 Original & extracted wave signal of S for male speaker (p. 174)
4.30 Mel-frequency cepstral coefficient of S for male speaker (p. 175)
4.31 Detailed FFT magnitude of S for male speaker (p. 175)
4.32 Log10 mel-scale filter bank output of S for male speaker (p. 176)
4.33 Original & extracted wave signal of S for female speaker (p. 177)
4.34 Mel-frequency cepstral coefficient of S for female speaker (p. 178)
4.35 Detailed FFT magnitude of S for female speaker (p. 178)
4.36 Log10 mel-scale filter bank output of S for female speaker (p. 179)
4.37 Original & extracted wave signal of R for male speaker (p. 180)
4.38 Mel-frequency cepstral coefficient of R for male speaker (p. 181)
4.39 Detailed FFT magnitude of R for male speaker (p. 181)
4.40 Log10 mel-scale filter bank output of R for male speaker (p. 182)
4.41 Original & extracted wave signal of R for female speaker (p. 183)
4.42 Mel-frequency cepstral coefficient of R for female speaker (p. 184)
4.43 Detailed FFT magnitude of R for female speaker (p. 184)
4.44 Log10 mel-scale filter bank output of R for female speaker (p. 185)
4.45 Original & extracted wave signal of Z for male speaker (p. 186)
4.46 Mel-frequency cepstral coefficient of Z for male speaker (p. 187)
4.47 Detailed FFT magnitude of Z for male speaker (p. 187)
4.48 Log10 mel-scale filter bank output of Z for male speaker (p. 188)
4.49 Original & extracted wave signal of Z for female speaker (p. 189)
4.50 Mel-frequency cepstral coefficient of Z for female speaker (p. 190)
4.51 Detailed FFT magnitude of Z for female speaker (p. 190)
4.52 Log10 mel-scale filter bank output of Z for female speaker (p. 191)
4.53 Original & extracted wave signal of T for male speaker (p. 192)
4.54 Mel-frequency cepstral coefficient of T for male speaker (p. 193)
4.55 Detailed FFT magnitude of T for male speaker (p. 193)
4.56 Log10 mel-scale filter bank output of T for male speaker (p. 194)
4.57 Original & extracted wave signal of T for female speaker (p. 195)
4.58 Mel-frequency cepstral coefficient of T for female speaker (p. 196)
4.59 Detailed FFT magnitude of T for female speaker (p. 196)
4.60 Log10 mel-scale filter bank output of T for female speaker (p. 197)
4.61 Original & extracted wave signal of D for male speaker (p. 198)
4.62 Mel-frequency cepstral coefficient of D for male speaker (p. 199)
4.63 Detailed FFT magnitude of D for male speaker (p. 199)
4.64 Log10 mel-scale filter bank output of D for male speaker (p. 200)
4.65 Original & extracted wave signal of D for female speaker (p. 201)
4.66 Mel-frequency cepstral coefficient of D for female speaker (p. 202)
4.67 Detailed FFT magnitude of D for female speaker (p. 202)
4.68 Log10 mel-scale filter bank output of D for female speaker (p. 203)

List of Tables

1.1 Comparative characteristics of Dragon NaturallySpeaking Professional and IBM ViaVoice (p. 12)
4.1 Font measurement (p. 122)
4.2 Character measurement (p. 123)
4.3 File menu popup options …
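The figures listed above trace the same analysis chain for every recorded character and speaker: extracted waveform, detailed FFT magnitude, log10 mel-scale filter bank output, and mel-frequency cepstral coefficients (via the MFCC processor of figure 4.28). A minimal NumPy sketch of that chain is given below; the frame length, hop size, filter count and coefficient count are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=512, hop=160, n_filters=26, n_coeffs=13):
    """Frame -> window -> |FFT| -> log10 mel filter bank -> DCT (one row per frame)."""
    signal = np.asarray(signal, dtype=float)
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    sig = np.pad(sig, (0, max(0, n_fft - len(sig))))             # ensure one full frame
    n_frames = 1 + (len(sig) - n_fft) // hop
    frames = np.stack([sig[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)
    mag = np.abs(np.fft.rfft(frames, n_fft))                     # detailed FFT magnitude
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):                            # triangular mel filters
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log10(np.maximum(mag @ fbank.T, 1e-10))          # log10 filter bank output
    return dct(logmel, type=2, axis=1, norm="ortho")[:, :n_coeffs]  # cepstral coefficients
```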
Recommended publications
  • Commercial Tools in Speech Synthesis Technology
    International Journal of Research in Engineering, Science and Management, Volume 2, Issue 12, December 2019, p. 320. www.ijresm.com | ISSN (Online): 2581-5792

    Commercial Tools in Speech Synthesis Technology. D. Nagaraju (Associate Professor, Dept. of Computer Science, Audisankara College of Engg. and Technology, Gudur, India), R. J. Ramasree (Professor, Dept. of Computer Science, Rastriya Sanskrit VidyaPeet, Tirupati, India), K. Kishore, K. Vamsi Krishna, R. Sujana (UG Students, Dept. of Computer Science, Audisankara College of Engg. and Technology, Gudur, India).

    Abstract: This study paper is planned around a new emotional speech system for Telugu (ESST). The main objective of this paper is to map the situation of today's speech synthesis technology and to focus on potential methods for the future. Literature and articles in the area are usually focused on a single method, a single synthesizer, or a very limited range of the technology. In this paper the whole speech synthesis area, with as many methods, techniques, applications, and products as possible, is under investigation. Unfortunately, this leads to a situation where in some cases very detailed information may not be given here, but may be found in the given references. The text-to-speech procedure consists of two main phases: text analysis, where the input text is transcribed into a phonetic or some other linguistic representation, and the generation of speech waveforms from this phonetic and prosodic information. These two phases are usually called high- and low-level synthesis. The input text might be, for example, data from a word processor, standard ASCII from e-mail, a mobile text message, or scanned text from a newspaper. The character string is then preprocessed and analyzed into a phonetic representation, which is usually a string of phonemes with some additional information for correct intonation, duration, and stress. Speech sound is finally generated with the low-level synthesizer from this information.
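    As a toy illustration of the two-phase split described in the excerpt, the sketch below separates high-level synthesis (text analysis into phonemes plus prosody marks) from low-level synthesis (waveform generation). The grapheme-to-phoneme table and the placeholder back end are invented for illustration, not taken from the paper.

```python
# Hypothetical letter-to-phoneme table; a real front end would normalize
# the text and consult a pronunciation lexicon instead.
TOY_G2P = {"a": "AH", "e": "EH", "i": "IY", "o": "OW", "u": "UW",
           "t": "T", "s": "S", "n": "N"}

def high_level_synthesis(text: str) -> list[str]:
    """Text analysis: characters -> phonemes with pause marks for prosody."""
    phonemes = []
    for word in text.lower().split():
        # Unknown letters fall back to silence in this toy mapping.
        phonemes += [TOY_G2P.get(ch, "SIL") for ch in word]
        phonemes.append("PAUSE")  # word boundary drives intonation/duration
    return phonemes

def low_level_synthesis(phonemes: list[str]) -> bytes:
    """Waveform generation stub: a real back end would render each phoneme."""
    return b"".join(p.encode() for p in phonemes)  # placeholder "audio"

audio = low_level_synthesis(high_level_synthesis("notes on tape"))
```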
  • Models of Speech Synthesis, Rolf Carlson, Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden
    Proc. Natl. Acad. Sci. USA, Vol. 92, pp. 9932-9937, October 1995. Colloquium Paper: this paper was presented at a colloquium entitled "Human-Machine Communication by Voice," organized by Lawrence R. Rabiner, held by the National Academy of Sciences at The Arnold and Mabel Beckman Center in Irvine, CA, February 8-9, 1993.

    Models of Speech Synthesis. Rolf Carlson, Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden.

    ABSTRACT: The term "speech synthesis" has been used for diverse technical approaches. In this paper, some of the approaches used to generate synthetic speech in a text-to-speech system are reviewed, and some of the basic motivations for choosing one method over another are discussed. It is important to keep in mind, however, that speech synthesis models are needed not just for speech generation but to help us understand how speech is created, or even how articulation can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages are discussed as special challenges facing the speech synthesis community.

    … need large amounts of speech data. Models working close to the waveform are now typically making use of increased unit sizes while still modeling prosody by rule. In the middle of the scale, "formant synthesis" is moving toward the articulatory models by looking for "higher-level parameters" or to larger prestored units. Articulatory synthesis, hampered by lack of data, still has some way to go but is yielding improved quality, due mostly to advanced analysis-synthesis techniques.

    Flexibility and Technical Dimensions
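    The excerpt contrasts waveform-concatenation, formant, and articulatory models. As a concrete anchor for the middle of that scale, here is a minimal formant-synthesis sketch: an impulse-train source filtered through a cascade of second-order digital resonators, one per formant. The formant frequencies and bandwidths are illustrative values, not parameters from the paper.

```python
import numpy as np

def resonator(x, freq, bw, sr):
    """Two-pole digital resonator: y[n] = x[n] + 2r cos(th) y[n-1] - r^2 y[n-2]."""
    r = np.exp(-np.pi * bw / sr)
    th = 2.0 * np.pi * freq / sr
    a1, a2 = 2.0 * r * np.cos(th), -r * r
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] + a1 * y[n - 1] + a2 * y[n - 2]  # y[-1], y[-2] start at 0
    return y

def synth_vowel(f0=110.0, formants=((700, 80), (1220, 90), (2600, 120)),
                dur=0.5, sr=16000):
    """Impulse-train glottal source passed through a cascade of resonators."""
    n = int(dur * sr)
    src = np.zeros(n)
    src[::int(sr / f0)] = 1.0          # crude glottal pulse train at F0
    out = src
    for freq, bw in formants:          # cascade: one resonator per formant
        out = resonator(out, freq, bw, sr)
    return out / np.max(np.abs(out))   # normalize

wave = synth_vowel()  # roughly [a]-like formant pattern (values are illustrative)
```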
  • Annotating Multimodal Data of Singing and Speaking, Coralie Vincent
    Annotating Multimodal Data of Singing and Speaking. Coralie Vincent. In: Frank A. Russo, Beatriz Ilari, Annabel J. Cohen (eds.), The Routledge Companion to Interdisciplinary Studies in Singing, Volume I: Development, Routledge, 2020, 9781315163734. HAL Id: hal-02869809, https://hal.archives-ouvertes.fr/hal-02869809, submitted on 18 Jan 2021. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

    Annotating multimodal data of singing and speaking. Coralie Vincent, Formal Structures of Language Lab, UPL, CNRS-Paris 8 University, France.

    Introduction: With the advent of complex multimodal corpora, the use of annotation software is now an almost mandatory step in the process of quantitative data analysis. Despite the fact that annotating multimodal data in a widely interoperable and sustainable format is challenging, within the last two decades considerable progress has been made through the development in academia of free and often open-source software allowing annotation of video and audio resources. The present chapter provides a brief update on the most recent advances made in the field of (free of charge) annotation software, focusing on three multimodal annotation tools: ANVIL (Kipp, 2012), EXMARaLDA (Schmidt & Wörner, 2014) and ELAN (Wittenburg, Brugman, Russel, Klassmann, & Sloetjes, 2006).
  • Text-To-Speech Synthesis System for Marathi Language Using Concatenation Technique
    Text-To-Speech Synthesis System for Marathi Language Using Concatenation Technique. Submitted to Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, INDIA, for the award of Doctor of Philosophy, by Mr. Sangramsing Nathsing Kayte. Supervised by Dr. Bharti W. Gawali, Professor, Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad. November 2018.

    Abstract: A speech synthesis system is a computer-based system that should be able to read any text aloud in a particular language or in multiple languages. This is also called Text-to-Speech synthesis, or TTS for short. Communication plays an important role in everyone's life. Usually communication refers to speaking, writing, or sending a message to another person. Speech is one of the most important modes of human communication, and there have been a great number of efforts to incorporate speech into communication between humans and computers. Demand for technologies in speech processing, such as speech recognition, dialogue processing, natural language processing, and speech synthesis, is increasing. These technologies are useful for human-to-human applications, like spoken language translation systems, and for human-to-machine communication, like control interfaces for handicapped persons. TTS is one of the key technologies in speech processing. A TTS system converts ordinary orthographic text into an acoustic signal that is indistinguishable from human speech. For developing a natural human-machine interface, the TTS system can be used as a way for a computer to communicate back, like a human voice. TTS can be a voice for those people who cannot speak, and people wishing to learn a new language can use a TTS system to learn the pronunciation.
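    Since the thesis names concatenation as its synthesis technique, a minimal sketch of waveform concatenation may help: pre-recorded units are looked up in an inventory and joined with a short cross-fade. The unit inventory, unit naming and sample rate below are assumptions for illustration; real systems use carefully segmented diphone or syllable units and smooth pitch and energy at the joins.

```python
import numpy as np

UNIT_DB: dict[str, np.ndarray] = {}   # hypothetical unit inventory

def add_unit(name: str, samples: np.ndarray) -> None:
    """Register a recorded unit (assumed to share one sample rate)."""
    UNIT_DB[name] = samples.astype(np.float64)

def concatenate(units: list[str], sr: int = 16000, xfade_ms: float = 10.0) -> np.ndarray:
    """Join recorded units with a short linear cross-fade to hide the seams.
    Assumes every unit is longer than the cross-fade window."""
    xf = int(sr * xfade_ms / 1000)
    out = UNIT_DB[units[0]].copy()
    for name in units[1:]:
        nxt = UNIT_DB[name]
        ramp = np.linspace(0.0, 1.0, xf)
        out[-xf:] = out[-xf:] * (1.0 - ramp) + nxt[:xf] * ramp   # cross-fade region
        out = np.concatenate([out, nxt[xf:]])
    return out
```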
  • Wormed Voice Workshop Presentation
    Wormed Voice Workshop Presentation micro_research December 27, 2017 1 some worm poetry and songs: The WORM was for a long time desirous to speake, but the rule and order of the Court enjoyned him silence, but now strutting and swelling, and impatient, of further delay, he broke out thus... [Michael Maier] He worshipped the worm and prayed to the wormy grave. Serpent Lucifer, how do you do? Of your worms and your snakes I'd be one or two; For in this dear planet of wool and of leather `Tis pleasant to need neither shirt, sleeve, nor shoe, 2 And have arm, leg, and belly together. Then aches your head, or are you lazy? Sing, `Round your neck your belly wrap, Tail-a-top, and make your cap Any bee and daisy. Two pigs' feet, two mens' feet, and two of a hen; Devil-winged; dragon-bellied; grave- jawed, because grass Is a beard that's soon shaved, and grows seldom again worm writing the the the the,eeeronencoug,en sthistit, d.).dupi w m,tinsprsool itav f bometaisp- pav wheaigelic..)a?? orerdi mise we ich'roo bish ftroo htothuloul mespowouklain- duteavshi wn,jis, sownol hof." m,tisorora angsthyedust,es, fofald,junss ownoug brad,)fr m fr,aA?a????ck;A?stelav aly, al is.'rady'lfrdil owoncorara wns t.) sh'r, oof ofr,a? ar,a???????a? fu mo towess,eethen hrtolly-l,."tigolav ict,a???!ol, w..'m,elyelil,tstreamas..n gotaillas.tansstheatsea f mb ispot inici t.) owar.**1 wnshigigholoothtith orsir.tsotic.'m, sotamimoledug imootrdeavet..t,) sh s,tranciror."wn sieee h asinied.tiear wspilotor,) bla av.nicord,ier.dy'et.*tite m.)..*d, hrouceto hie, ig il m, bsomoug,.t.'l,t, olitel bs,.nt,.dotr tat,)aa? htotitedont,j alesil, starar,ja taie ass.nishiceroouldseal fotitoonckysil, m oitispl o anteeeaicowousomirot.
  • A Tool to Support Speech and Non-Speech Audio Feedback Generation in Audio Interfaces
    A Tool to Support Speech and Non-Speech Audio Feedback Generation in Audio Interfaces. Lisa J. Stifelman, Speech Research Group, MIT Media Laboratory, 20 Ames Street, Cambridge, MA 02139. Tel: 1-617-253-8026. E-mail: lisa@media.mit.edu

    ABSTRACT: Development of new auditory interfaces requires the integration of text-to-speech synthesis, digitized audio, and non-speech audio output. This paper describes a tool for specifying speech and non-speech audio feedback and its use in the development of a speech interface, Conversational VoiceNotes. Auditory feedback is specified as a context-free grammar, where the basic elements in the grammar can be either words or non-speech sounds. The feedback specification method described here provides the ability to vary the feedback based on the current state of the system, and is flexible enough to allow different feedback for different input modalities (e.g., speech, mouse, buttons). The declarative specification is easily modifiable, supporting an iterative design process.

    … is also needed for specifying speech and non-speech audio feedback (i.e., auditory icons [9] or earcons [4]). Natural language generation research has tended to focus on producing coherent multisentential text [14], and detailed multisentential explanations and descriptions [22, 23], rather than the kind of terse interactive dialogue needed for today's speech systems. In addition, sophisticated language generation tools are not generally accessible by interface designers and developers. The goal of the work described here was to simplify the feedback generation component of developing audio user interfaces and allow rapid iteration of designs. This paper describes a tool for specifying speech and non-speech audio feedback and its use in the development of a speech interface, Conversational VoiceNotes.

    KEYWORDS: …
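    The abstract's key idea, feedback specified as a context-free grammar whose terminals are words or non-speech sounds and whose productions can depend on system state, can be sketched in a few lines. The grammar, state names and sound file names below are invented placeholders, not the actual Conversational VoiceNotes specification.

```python
import random

# Terminals are "say:<word>" (speech) or "play:<file>" (auditory icon/earcon).
# Productions may be a plain list of alternatives, or a dict keyed by the
# current system state (here, which input modality was used).
GRAMMAR = {
    "FEEDBACK": [["CONFIRM", "POSITION"]],
    "CONFIRM":  {"speech_input": [["say:recorded"]],        # spoken echo
                 "button_input": [["play:click.wav"]]},     # earcon for buttons
    "POSITION": [["say:note", "SLOT"], ["play:end_of_list.wav"]],
    "SLOT":     [["say:three"], ["say:seven"]],
}

def expand(symbol: str, state: str) -> list[str]:
    """Recursively expand a nonterminal into a flat list of audio actions."""
    if symbol.startswith(("say:", "play:")):
        return [symbol]                    # terminal: emit as-is
    rules = GRAMMAR[symbol]
    if isinstance(rules, dict):            # production conditioned on state
        rules = rules[state]
    production = random.choice(rules)      # pick one alternative
    return [atom for sym in production for atom in expand(sym, state)]

print(expand("FEEDBACK", "button_input"))
# e.g. ['play:click.wav', 'say:note', 'say:seven']
```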
  • Voice Synthesizer Application Android
    Voice synthesizer application for Android. Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts ordinary language text into speech; other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech. Synthesized speech can be created by concatenating fragments of recorded speech that are stored in a database. Systems differ in the size of the stored speech units: a system that stores phones or diphones provides the greatest range of outputs, but may lack clarity; for specific domains, storing whole words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other characteristics of the human voice to create a fully synthetic voice output. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly. A clear text-to-speech program allows people with visual impairments or reading disabilities to listen to written words on their home computer. Many computer operating systems have included speech synthesizers since the early 1990s. A typical TTS example: an automatic announcement system in which a synthetic voice announces an arriving train in Sweden.
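    A toy illustration of the unit-size tradeoff described above: diphone units span the transition between adjacent phones, so the joins fall in the acoustically stable middles of sounds rather than at their boundaries. The phone names are illustrative.

```python
def phones_to_diphones(phones: list[str]) -> list[str]:
    """['h','e','l','ou'] -> ['sil-h','h-e','e-l','l-ou','ou-sil'].
    Each diphone runs from the middle of one phone to the middle of the next."""
    padded = ["sil"] + phones + ["sil"]          # silence at both ends
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

print(phones_to_diphones(["h", "e", "l", "ou"]))
```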
  • Towards Expressive Speech Synthesis in English on a Robotic Platform
    Towards Expressive Speech Synthesis in English on a Robotic Platform. Sigrid Roehling, Bruce MacDonald, Catherine Watson, Department of Electrical and Computer Engineering, University of Auckland, New Zealand. s.roehling, b.macdonald, [email protected]

    Abstract: Affect influences speech, not only in the words we choose, but in the way we say them. This paper reviews the research on vocal correlates in the expression of affect and examines the ability of currently available major text-to-speech (TTS) systems to synthesize expressive speech for an emotional robot guide. Speech features discussed include pitch, duration, loudness, spectral structure, and voice quality. TTS systems are examined as to their ability to control the features needed for synthesizing expressive speech: pitch, duration, loudness, and voice quality. The OpenMARY system is recommended since it provides the highest amount of control over speech production as well as the ability to work with a sophisticated intonation model. OpenMARY is being actively developed, is supported on our current Linux platform, and provides timing information for talking heads such as our current robot face.

    1. Introduction: Affect influences speech, not only in the words we choose, but in the way we say them. These vocal nonverbal cues are important in human speech as they communicate information about the speaker's state or attitude more efficiently than the verbal content (Eide, Aaron, Bakis, Hamza, Picheny, and Pitrelli 2004). … Unless explicitly stated otherwise, the research is concerned with the English language.

    2.1. Pitch: Pitch contour seems to be one of the clearest indicators …
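    The feature set the paper reviews (pitch, duration, loudness, voice quality) suggests a simple control layer: map an affect label to global prosody modifiers and apply them to each phone's targets. The sketch below is a generic illustration with invented values; it is not the paper's method and not the OpenMARY API.

```python
from dataclasses import dataclass

@dataclass
class Prosody:
    pitch_shift: float = 1.0     # multiplier on baseline F0
    pitch_range: float = 1.0     # multiplier on F0 excursion
    rate: float = 1.0            # multiplier on speaking rate (1/duration)
    loudness_db: float = 0.0     # gain relative to neutral

# Illustrative affect-to-prosody rules (values invented for the sketch).
AFFECT_RULES = {
    "neutral": Prosody(),
    "happy":   Prosody(pitch_shift=1.15, pitch_range=1.4, rate=1.10, loudness_db=2.0),
    "sad":     Prosody(pitch_shift=0.90, pitch_range=0.6, rate=0.85, loudness_db=-3.0),
    "angry":   Prosody(pitch_shift=1.05, pitch_range=1.5, rate=1.15, loudness_db=4.0),
}

def apply_affect(f0_hz: float, dur_s: float, affect: str) -> tuple[float, float]:
    """Scale a phone's target F0 and duration by the affect's modifiers."""
    p = AFFECT_RULES[affect]
    return f0_hz * p.pitch_shift, dur_s / p.rate
```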
  • Praat Tutorial
    Computer Speech Processing: An Introduction to Using the Praat Program. Teaching material, print version. Created: 2013/12/31. Last modified: 2014/09/15. István Szekrényes, Department of General and Applied Linguistics, University of Debrecen.

    Contents: 1. Introduction (3); 2. Access, installation and control (4); 3. Graphical user interface (5); 3.1. Praat Objects window (5); 3.2. Praat Picture window (7); 3.3. Editor windows (8); 3.4. Praat Info window (9); 4. Functions (10); 4.1. Acoustic speech analysis (10); 4.2. Annotation (11); 4.3. Learning algorithms (11); 4.4. Creating graphs and figures (12); 4.5. Speech synthesis (12); 4.6. Building experiments (13); 4.7. Speech manipulation (13); 4.8. Statistical analysis (14); 5. Praat scripting (15); 5.1. Praat as a programming language (15); 5.2. Running scripts (15); 5.3. Using commands (16); 5.4. Handling objects (17); 5.5. Language elements and their use (18); 5.6. Control flow (19); 6. References (22).

    1. Introduction: These notes were prepared primarily for use in the practical courses of the Computational Linguistics track of the Digital Humanities master's (MA) program, but they can also serve as a useful aid for any researcher working in the field of computer speech processing who is only now, or so far only superficially, becoming acquainted with the Praat program. The notes do not aim to give a comprehensive picture of the general methods and basic concepts of computer speech processing; rather, presupposing familiarity with the most essential of them, they are intended as a guide in a practical, application-specific direction. Besides a brief discussion of the Praat program's functions and a presentation of its user interface, the application's remarkably creative programming facilities are also covered. In the HTML version of these notes, the table of contents, the cross-references and the note numbers function as hyperlinks for easier navigation.
  • Using Praat for Linguistic Research
    Using Praat for Linguistic Research. Dr. Will Styler. Document Version: 1.8.3. Last Update: March 12, 2021. Important! This document will be continually updated and improved. Download the latest version at: http://savethevowels.org/praat. Using Praat for Linguistic Research by Will Styler is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. For more information on the specifics of this license, go to http://creativecommons.org/licenses/by-sa/3.0/.

    Contents:
    1 Version History (4)
    2 Introduction (7); 2.1 Versions (7); 2.2 Other Resources (7); 2.3 Getting and Installing Praat (8); 2.4 Using this guide (8)
    3 About Praat (8); 3.1 Praat Windows (8)
    4 Recording Sounds (10); 4.1 Mono vs. Stereo Recording (10)
    5 Opening and Saving Files (11); 5.1 Opening Files (11); 5.1.1 Working with longer sound files (11); 5.2 Playing Files (12); 5.3 Saving Files (12)
    6 Phonetic Measurement and Analysis in Praat (13); 6.1 Working with Praat Waveforms and Spectrograms (13); 6.1.1 Pulling out a smaller section of the file for analysis (14); 6.2 Adjusting the Spectrogram settings (15); 6.2.1 Narrowband vs. Broadband Spectrograms (16); 6.3 Measuring Duration (16); 6.3.1 Measuring Voice Onset Time (VOT) (16); 6.4 Examining and measuring F0/Pitch (17); 6.4.1 Measuring F0 from a single cycle (17); 6.4.2 Viewing Pitch via a narrowband spectrogram (17); 6.4.3 Using Praat's Pitch Tracking …
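    For the measurement tasks in sections 6.3 and 6.4 of the guide (duration, F0), the same operations can be scripted from Python through the praat-parselmouth wrapper around Praat, assuming that package is available; the file name below is a placeholder.

```python
import numpy as np
import parselmouth  # pip install praat-parselmouth (assumed available)

# Load a recording and report its duration and mean F0 over voiced frames.
snd = parselmouth.Sound("vowel.wav")                # placeholder path
print(f"duration: {snd.duration:.3f} s")

pitch = snd.to_pitch()                              # Praat's default pitch tracking
f0 = pitch.selected_array["frequency"]              # one value per analysis frame
voiced = f0[f0 > 0]                                 # unvoiced frames come back as 0
print(f"mean F0 over voiced frames: {voiced.mean():.1f} Hz")
```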
  • A Unit Selection Text-to-Speech-and-Singing Synthesis Framework from Neutral Speech: Proof of Concept
    Adding expressiveness to unit selection speech synthesis and to numerical voice production. Marc Freixes Guerreiro, Universitat Ramon Llull. http://hdl.handle.net/10803/672066

    NOTICE: Access to the contents of this doctoral thesis and its use must respect the rights of the author. It can be used for reference or private study, as well as in research and teaching activities and materials, in the terms established by art. 32 of the Consolidated Text of the Intellectual Property Act (RDL 1/1996). Any other use requires the prior and express authorization of the author. In any case, when using its contents the name and surname of the author and the title of the doctoral thesis must be clearly indicated. Its reproduction or other forms of exploitation for profit are not authorized, nor is its public communication from a site other than the TDX service, nor the presentation of its contents in a window or frame foreign to TDX (framing). This reservation of rights covers both the contents of the thesis and its abstracts and indexes.
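    The thesis builds on unit selection synthesis, which chooses, for each target position, the candidate unit minimizing a summed target cost (mismatch with the wanted unit) and join cost (discontinuity with the chosen neighbor), typically via a Viterbi search over the candidate lattice. A minimal sketch under those standard assumptions, with the cost functions left abstract, is shown below; nothing here is taken from the thesis itself.

```python
import numpy as np

def select_units(targets, candidates, target_cost, join_cost):
    """Viterbi search over the candidate lattice.
    targets: list of target specs; candidates[i]: units usable at position i;
    target_cost(t, u) and join_cost(prev_u, u) are user-supplied floats."""
    n = len(targets)
    cost = [np.array([target_cost(targets[0], u) for u in candidates[0]])]
    back = [None]
    for i in range(1, n):
        tc = np.array([target_cost(targets[i], u) for u in candidates[i]])
        jc = np.array([[join_cost(p, u) for u in candidates[i]]
                       for p in candidates[i - 1]])
        # total[j, k]: best path cost ending with unit k at i via unit j at i-1
        total = cost[-1][:, None] + jc + tc[None, :]
        back.append(total.argmin(axis=0))
        cost.append(total.min(axis=0))
    # Trace back the cheapest sequence of candidate indices.
    path = [int(cost[-1].argmin())]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i][path[-1]]))
    return list(reversed(path))
```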
  • A. Acoustic Theory and Modeling of the Vocal Tract
    A. Acoustic Theory and Modeling of the Vocal Tract, by H.W. Strube, Drittes Physikalisches Institut, Universität Göttingen.

    A.1 Introduction: This appendix is intended for those readers who want to inform themselves about the mathematical treatment of the vocal-tract acoustics and about its modeling in the time and frequency domains. Apart from providing a fundamental understanding, this is required for all applications and investigations concerned with the relationship between geometric and acoustic properties of the vocal tract, such as articulatory synthesis, determination of the tract shape from acoustic quantities, inverse filtering, etc. Historically, the formants of speech were conjectured to be resonances of cavities in the vocal tract. In the case of a narrow constriction at or near the lips, such as for the vowel [u], the volume of the tract can be considered a Helmholtz resonator (the glottis is assumed almost closed). However, this can only explain the first formant. Also, the constriction, if any, is usually situated farther back. Then the tract may be roughly approximated as a cascade of two resonators, accounting for two formants. But all these approximations by discrete cavities proved unrealistic. Thus researchers have now adopted a more reasonable description of the vocal tract as a nonuniform acoustical transmission line. This can explain an infinite number of resonances, of which, however, only the first 2 to 4 are of phonetic importance. Depending on the kind of sound, the tube system has different topology:
    • for vowel-like sounds, pharynx and mouth form one tube;
    • for nasalized vowels, the tube is branched, with transmission from pharynx through mouth and nose;
    • for nasal consonants, transmission is through pharynx and nose, with the closed mouth tract as a "shunt" line.
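    The two idealized models the introduction mentions can be written down explicitly; these are textbook acoustics formulas, not equations quoted from the appendix itself (c is the speed of sound, roughly 350 m/s in warm vocal-tract air).

```latex
% Helmholtz resonator: cavity of volume V behind a neck of area A and
% length l gives a single low resonance (e.g. the first formant of [u]):
\[ f_H = \frac{c}{2\pi}\sqrt{\frac{A}{V\,l}} \]

% Uniform tube of length L, closed at the glottis and open at the lips
% (quarter-wave resonator): resonances at odd multiples of c/4L:
\[ F_n = \frac{(2n-1)\,c}{4L}, \qquad n = 1, 2, 3, \dots \]

% For L = 17.5 cm and c = 350 m/s this gives F1 = 500 Hz, F2 = 1500 Hz,
% F3 = 2500 Hz, the classic neutral-vowel formant pattern.
```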