Lecture # 07 (Phonetics)

Phonetics

- Phonetics (Introduction)
  - Text-to-speech and speech recognition
  - Phones
- Speech sounds and phonetic transcription
  - International Phonetic Alphabet (IPA)
  - ARPAbet
- Articulatory phonetics
  - The vocal organs
  - Consonants: place of articulation
  - Consonants: manner of articulation
  - Vowels
  - Syllables
- Syllables
  - Onset, nucleus
  - Coda, rime
  - Consonants and vowels
- Phonological categories and pronunciation variation
  - Phonetic features
  - Predicting phonetic variation
  - Factors influencing phonetic variation

@Copyrights: Natural Language Processing (NLP) Organized by Dr. Ahmad Jalal (http://portals.au.edu.pk/imc/)

1. Phonetics (Introduction)

A motivating question: “What is the proper pronunciation of the word ‘special’?”
- Text-to-speech conversion means converting strings of text words into acoustic waveforms.
- The debate over “phonics” methods of teaching reading to children seems at first glance like a purely modern educational debate.

Phonetics is the study of linguistic sounds:
- how they are produced by the articulators of the human vocal tract,
- how they are realized acoustically,
- and how this acoustic realization can be digitized and processed.

A key element of both speech recognition and text-to-speech systems is how words are pronounced in terms of individual speech units called phones.

2. Speech Sounds and Phonetic Transcription

The study of the pronunciation of words is part of the field of phonetics. We model the pronunciation of a word as a string of symbols that represent phones or segments.
- A phone is a speech sound.
- Phones are represented with phonetic symbols that bear some resemblance to letters.

We use two different alphabets for describing phones:
(1) The International Phonetic Alphabet (IPA) is an evolving standard originally developed by the International Phonetic Association in 1888 with the goal of transcribing the sounds of all human languages.
(2) The ARPAbet is another phonetic alphabet, designed specifically for American English, that uses ASCII symbols. ARPAbet symbols are often used in applications where non-ASCII fonts are inconvenient, such as on-line pronunciation dictionaries.

Many IPA and ARPAbet symbols match the Roman letters used in English spelling. For example, the ARPAbet phone [p] represents the consonant sound at the beginning of platypus and puma.

The mapping between the letters of English spelling and phones is relatively opaque: a single letter can represent very different sounds in different contexts. For example:
- the letter c corresponds to the phone [k] in cougar [k uw g axr],
- but to the phone [s] in cell [s eh l].
- Besides appearing as c and k, the phone [k] can appear as part of x (fox [f aa k s]).

3. Articulatory Phonetics

Articulatory phonetics studies how phones are produced as the various organs in the mouth, throat, and nose modify the airflow from the lungs.

3.1 The Vocal Organs

Sounds made with the vocal folds together and vibrating are called voiced.
- Voiced sounds include [b], [d], [g], [v], [z].

Sounds made without this vocal fold vibration are called unvoiced or voiceless.
- Unvoiced sounds include [p], [t], [k], [f], [s].

Phones are divided into two classes: consonants and vowels. Both kinds of sounds are formed by the motion of air through the mouth, throat, or nose.
(1) Consonants are made by restricting or blocking the airflow in some way, and may be voiced or unvoiced.
- e.g., [p], [b], [t], [d], [k], [g], [f], [v], [s], [z], [r], [l], etc. are consonants.
(2) Vowels have less obstruction, are usually voiced, and are generally louder and longer-lasting than consonants.
- e.g., [aa], [ae], [ao], [ih], [aw], [ow], [uw], etc. are vowels.

3.2 Consonants: Place of Articulation

Consonants are made by restricting the airflow in some way; the point of maximum restriction is called the place of articulation. Places of articulation are often used in automatic speech recognition as a useful way of grouping phones into equivalence classes:

(a) Labial: consonants whose main restriction is formed by the two lips coming together.
    e.g., [p] as in possum, [b] as in bear, and [m] as in marmot.
(b) Dental: sounds made by placing the tongue against the teeth.
    e.g., the [th] of thing or the [dh] of though.
(c) Alveolar: the alveolar ridge is the portion of the roof of the mouth just behind the upper teeth; alveolar sounds are made with the tongue against this ridge.
    e.g., the phones [s], [z], [t], and [d].
(d) Palatal: the roof of the mouth (the palate) rises sharply from the back of the alveolar ridge; these sounds are made with the tongue against this rising region.
    e.g., [sh] (shrimp), [ch] (china), [zh] (Asian), and [jh] (jar).
(e) Velar: the velum is a movable muscular flap at the very back of the roof of the mouth; velar sounds are made by pressing the back of the tongue against it.
    e.g., [k] (cuckoo), [g] (goose), and [ng] (kingfisher).

3.3 Vowels

(a) Front vowels: vowels in which the tongue is raised toward the front.
    e.g., [ih] and [eh] are front vowels.
(b) Mid vowels or low vowels: vowels with mid or low values of maximum tongue height, respectively.
    e.g., [ae].
(c) Round vowels: another dimension for vowels is the shape of the lips.
- Some vowels are pronounced with the lips rounded (the same lip shape used for whistling).
    e.g., [uw], [ao], and [ow].

3.4 Syllables

Consonants and vowels combine to make a syllable. A syllable is a vowel-like (or sonorant) sound together with some of the surrounding consonants. For example:
- the word dog has one syllable, [d aa g],
- while the word catnip has two syllables, [k ae t] and [n ih p].

Terms:
(a) Onset: the optional initial consonant or set of consonants.
    e.g., in the word dog, [d] is the onset.
    If the onset has more than one consonant (as in the word strike [s t r ay k]), we say it has a complex onset.
(b) Nucleus: the vowel at the core of a syllable.
    e.g., in the word dog, [aa] is the nucleus.
(c) Coda: the optional consonant or sequence of consonants following the nucleus.
    e.g., in the word dog, [g] is the coda.
(d) Rime or rhyme: the nucleus plus the coda.
    e.g., in the word dog, the rime is [aa g] (nucleus [aa] plus coda [g]).

Class Participation

Draw the hierarchical syllable structure of each of the following words:
Structures, Grammars, Computational, Consonants, NLP, Unstressed, Optional, Prominent, Lexicons, Surrounding, Although, Reduction, Diphthong, Probabilistic, Knowledge, Important, Somewhat, Secondary, Dimension, Neutral, Discussion.

4. Phonological Categories and Pronunciation Variation

If each word were pronounced with a fixed string of phones in all contexts and by all speakers, the speech recognition and speech synthesis tasks would be really easy. But the realization of words and phones varies massively depending on many factors. The figure shows wide variation in the pronunciation of the words “because” and “about” in the hand-transcribed Switchboard corpus of American English telephone conversations.

How can we model and predict this extensive variation?
- One useful tool is the assumption that what is represented in the speaker's mind are abstract categories rather than phones in all their phonetic detail.

For example, consider the different pronunciations of [t]:

CASE 1: Aspirated t. The [t] of tunafish is aspirated.
    - Aspiration is a period of voicelessness after a stop closure and before the onset of voicing of the following vowel.
CASE 2: Unaspirated t. By contrast, a [t] following an initial [s] is unaspirated.
    - The [t] in starfish ([s t aa r f ih sh]) has no period of voicelessness after the [t] closure.
CASE 3: Glottal stop. Vocal fold vibration either stops or becomes irregular, with a low rate and a sudden drop in intensity.
    e.g., in the word kitten, the [t] is realized as a glottal stop.
CASE 4: Glottal stop t, word-finally.
    e.g., in the word bat, the final [t] can be realized as a glottal stop.
CASE 5: Tap. Occurs between two vowels.
    e.g., in the word butter, the [t] lies between two vowels and is realized as a tap.
CASE 6: Unreleased t. The first stop of a cluster has no audible release; before a consonant there is no release burst.
    e.g., in the word fruitcake, the [t] has no release burst. Another example of an unreleased first stop is the cluster in doctor.
CASE 7: Dental t. The tongue touches the back of the teeth.
    e.g., in the word eighth, the [t] becomes dental before the dental [th].
CASE 8: Deleted t. Sometimes occurs word-finally.
    e.g., past.

One major factor influencing variation is speaking rate: the faster the speaker talks, the more the sounds are shortened, reduced, and generally run together.
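
The context-dependent realizations of [t] listed in the cases above can be sketched as a small set of ordered context rules. The Python sketch below is illustrative only: the function name t_variant, the label strings, the vowel set, and the order in which the rules are tried are all assumptions made for this example, not part of the lecture. It looks only at the neighbouring phones within one word; a realistic predictor would also need stress, speaking rate, and other factors.

```python
# Toy rules for the realization of /t/, following the cases listed in this lecture.
# ASSUMPTIONS (not from the lecture): the vowel set below, the label strings,
# and the order in which the rules are tried.

# ARPAbet vowel symbols used by this sketch (an assumed, non-exhaustive set).
VOWELS = {
    "aa", "ae", "ah", "ao", "aw", "ax", "axr", "ay", "eh", "er",
    "ey", "ih", "iy", "ow", "oy", "uh", "uw",
}


def t_variant(phones, i):
    """Return a label for the [t] at index i of one word's ARPAbet phone list."""
    if phones[i] != "t":
        raise ValueError("phones[i] must be the phone 't'")
    prev = phones[i - 1] if i > 0 else None
    nxt = phones[i + 1] if i + 1 < len(phones) else None

    if nxt in ("th", "dh"):                  # CASE 7: before a dental consonant
        return "dental"
    if prev == "s" and nxt in VOWELS:        # CASE 2: after [s], as in starfish
        return "unaspirated"
    if prev is None:                         # CASE 1: word-initial, as in tunafish
        return "aspirated"
    if prev in VOWELS and nxt == "en":       # CASE 3: before syllabic n, as in kitten
        return "glottal stop"
    if prev in VOWELS and nxt in VOWELS:     # CASE 5: between vowels (stress ignored here)
        return "tap"
    if nxt is None:                          # CASES 4 and 8: word-final, several outcomes
        return "word-final (glottal stop, unreleased, or deleted)"
    return "unreleased"                      # CASE 6: before another consonant


if __name__ == "__main__":
    examples = {
        "tunafish": ("t uw n ah f ih sh", 0),
        "starfish": ("s t aa r f ih sh", 1),
        "kitten": ("k ih t en", 2),
        "butter": ("b ah t axr", 2),
        "fruitcake": ("f r uw t k ey k", 3),
        "eighth": ("ey t th", 1),
        "bat": ("b ae t", 2),
    }
    for word, (transcription, index) in examples.items():
        print(word, "->", t_variant(transcription.split(), index))
```

The rules are tried in a fixed order so that the more specific contexts (before a dental, after [s]) win over the general ones. Because stress is ignored, the tap rule will over-apply to words where the following vowel is stressed, which is one reason this should be read as a sketch rather than a model.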
