The Phonetic Analysis of Speech Corpora
Jonathan Harrington
Institute of Phonetics and Speech Processing
Ludwig-Maximilians University of Munich
Germany

Wiley-Blackwell

Contents

Relationship between International and Machine Readable Phonetic Alphabet (Australian English)
Relationship between International and Machine Readable Phonetic Alphabet (German)
Downloadable speech databases used in this book
Preface
Notes on downloading software

Chapter 1 Using speech corpora in phonetics research
1.0 The place of corpora in the phonetic analysis of speech
1.1 Existing speech corpora for phonetic analysis
1.2 Designing your own corpus
1.2.1 Speakers
1.2.2 Materials
1.2.3 Some further issues in experimental design
1.2.4 Speaking style
1.2.5 Recording setup
1.2.6 Annotation
1.2.7 Some conventions for naming files
1.3 Summary and structure of the book

Chapter 2 Some tools for building and querying labelled speech databases
2.0 Overview
2.1 Getting started with existing speech databases
2.2 Interface between Praat and Emu
2.3 Interface to R
2.4 Creating a new speech database: from Praat to Emu to R
2.5 A first look at the template file
2.6 Summary
2.7 Questions

Chapter 3 Applying routines for speech signal processing
3.0 Introduction
3.1 Calculating, displaying, and correcting formants
3.2 Reading the formants into R
3.3 Summary
3.4 Questions
3.5 Answers

Chapter 4 Querying annotation structures
4.1 The Emu Query Tool, segment tiers and event tiers
4.2 Extending the range of queries: annotations from the same tier
4.3 Inter-tier links and queries
4.4 Entering structured annotations with Emu
4.5 Conversion of a structured annotation to a Praat TextGrid
4.6 Graphical user interface to the Emu query language
4.7 Re-querying segment lists
4.8 Building annotation structures semi-automatically with Emu-Tcl
4.9 Branching paths
4.10 Summary
4.11 Questions
4.12 Answers

Chapter 5 An introduction to speech data analysis in R: a study of an EMA database
5.1 EMA recordings and the ema5 database
5.2 Handling segment lists and vectors in Emu-R
5.3 An analysis of voice onset time
5.4 Inter-gestural coordination and ensemble plots
5.4.1 Extracting trackdata objects
5.4.2 Movement plots from single segments
5.4.3 Ensemble plots
5.5 Intragestural analysis
5.5.1 Manipulation of trackdata objects
5.5.2 Differencing and velocity
5.5.3 Critically damped movement, magnitude, and peak velocity
5.6 Summary
5.7 Questions
5.8 Answers

Chapter 6 Analysis of formants and formant transitions
6.1 Vowel ellipses in the F2 × F1 plane
6.2 Outliers
6.3 Vowel targets
6.4 Vowel normalisation
6.5 Euclidean distances
6.5.1 Vowel space expansion
6.5.2 Relative distance between vowel categories
6.6 Vowel undershoot and formant smoothing
6.7 F2 locus, place of articulation and variability
6.8 Questions
6.9 Answers

Chapter 7 Electropalatography
7.1 Palatography and electropalatography
7.2 An overview of electropalatography in Emu-R
7.3 EPG data reduced objects
7.3.1 Contact profiles
7.3.2 Contact distribution indices
7.4 Analysis of EPG data
7.4.1 Consonant overlap
7.4.2 VC coarticulation in German dorsal fricatives
7.5 Summary
7.6 Questions
7.7 Answers

Chapter 8 Spectral analysis
8.1 Background to spectral analysis
8.1.1 The sinusoid
8.1.2 Fourier analysis and Fourier synthesis
8.1.3 Amplitude spectrum
8.1.4 Sampling frequency
8.1.5 dB-Spectrum
8.1.6 Hamming and Hann(ing) windows
8.1.7 Time and frequency resolution
8.1.8 Preemphasis
8.1.9 Handling spectral data in Emu-R
8.2 Spectral average, sum, ratio, difference, slope
8.3 Spectral moments
8.4 The discrete cosine transformation
8.4.1 Calculating DCT-coefficients in Emu-R
8.4.2 DCT-coefficients of a spectrum
8.4.3 DCT-coefficients and trajectory shape
8.4.4 Mel- and Bark-scaled DCT (cepstral) coefficients
8.5 Questions
8.6 Answers

Chapter 9 Classification
9.1 Probability and Bayes' theorem
9.2 Classification: continuous data
9.2.1 The binomial and normal distributions
9.3 Calculating conditional probabilities
9.4 Calculating posterior probabilities
9.5 Two parameters: the bivariate normal distribution and ellipses
9.6 Classification in two dimensions
9.7 Classifications in higher dimensional spaces
9.8 Classifications in time
9.8.1 Parameterising dynamic spectral information
9.9 Support vector machines
9.10 Summary
9.11 Questions
9.12 Answers

References

Relationship between Machine Readable (MRPA) and International Phonetic Alphabet (IPA) for Australian English.

| Class | MRPA | IPA | Example |
|---|---|---|---|
| Tense vowels | i: | i: | heed |
| | u: | ʉ: | who'd |
| | o: | ɔ: | hoard |
| | a: | ɐ: | hard |
| | @: | ɜ: | heard |
| Lax vowels | I | ɪ | hid |
| | U | ʊ | hood |
| | E | ɛ | head |
| | O | ɔ | hod |
| | V | ɐ | bud |
| | A | æ | had |
| Diphthongs | I@ | ɪə | here |
| | E@ | eə | there |
| | U@ | ʉə | tour |
| | ei | æɪ | hay |
| | ai | ɐɪ | high |
| | au | æʉ | how |
| | oi | ɔɪ | boy |
| | ou | ɔʉ | hoe |
| Schwa | @ | ə | the |
| Consonants | p | p | pie |
| | b | b | buy |
| | t | t | tie |
| | d | d | die |
| | k | k | cut |
| | g | g | go |
| | tS | ʧ | church |
| | dZ | ʤ | judge |
| | H | ʰ | (aspiration/stop release) |
| | m | m | my |
| | n | n | no |
| | N | ŋ | sing |
| | f | f | fan |
| | v | v | van |
| | T | θ | think |
| | D | ð | the |
| | s | s | see |
| | z | z | zoo |
| | S | ʃ | shoe |
| | Z | ʒ | beige |
| | h | h | he |
| | r | ɻ | road |
| | w | w | we |
| | l | l | long |
| | j | j | yes |

Relationship between Machine Readable (MRPA) and International Phonetic Alphabet (IPA) for German.

The MRPA for German follows SAMPA (Wells, 1997), the Speech Assessment Methods Phonetic Alphabet.
| Class | MRPA | IPA | Example |
|---|---|---|---|
| Tense vowels and diphthongs | 2: | ø: | Söhne |
| | 2:6 | øɐ | stört |
| | a: | a: | Strafe, Lahm |
| | a:6 | a:ɐ | Haar |
| | e: | e: | geht |
| | E: | ɛ: | Mädchen |
| | E:6 | ɛ:ɐ | fährt |
| | e:6 | e:ɐ | werden |
| | i: | i: | Liebe |
| | i:6 | i:ɐ | Bier |
| | o: | o: | Sohn |
| | o:6 | o:ɐ | vor |
| | u: | u: | tun |
| | u:6 | u:ɐ | Uhr |
| | y: | y: | kühl |
| | y:6 | y:ɐ | natürlich |
| | aI | aɪ | mein |
| | aU | aʊ | Haus |
| | OY | ɔʏ | Beute |
| Lax vowels and diphthongs | U | ʊ | Mund |
| | 9 | œ | zwölf |
| | a | a | nass |
| | a6 | aɐ | Mark |
| | E | ɛ | Mensch |
| | E6 | ɛɐ | Lärm |
| | I | ɪ | finden |
| | I6 | ɪɐ | wirklich |
| | O | ɔ | kommt |
| | O6 | ɔɐ | dort |
| | U6 | ʊɐ | durch |
| | Y | ʏ | Glück |
| | Y6 | ʏɐ | würde |
| | 6 | ɐ | Vater |
| Consonants | p | p | Panne |
| | b | b | Baum |
| | t | t | Tanne |
| | d | d | Daumen |
| | k | k | kahl |
| | g | g | Gaumen |
| | pf | pf | Pfeffer |
| | ts | ʦ | Zahn |
| | tS | ʧ | Cello |
| | dZ | ʤ | Job |
| | Q | ʔ | (glottal stop) |
| | h | ʰ | (aspiration) |
| | m | m | Miene |
| | n | n | nehmen |
| | N | ŋ | lang |
| | f | f | friedlich |
| | v | v | weg |
| | s | s | lassen |
| | z | z | lesen |
| | S | ʃ | schauen |
| | Z | ʒ | Genie |
| | C | ç | riechen |
| | x | x | Buch, lachen |
| | h | h | hoch |
| | r | r, ʁ | Regen |
| | l | l | lang |
| | j | j | jemand |

Downloadable speech databases used in this book

(n = number of utterances; S = speakers, M = male, F = female)

| Database name | Description | Language/dialect | n | S | Signal files | Annotations | Source |
|---|---|---|---|---|---|---|---|
| aetobi | A fragment of the AE-TOBI database: read and spontaneous speech | American English | 17 | various | Audio | Word, tonal, break | Beckman et al (2005); Pitrelli et al (1994); Silverman et al (1992) |
| ae | Read sentences | Australian English | 7 | 1M | Audio, spectra, formants | Prosodic, phonetic, tonal | Millar et al (1997); Millar et al (1994) |
| andosl | Read sentences | Australian English | 200 | 2M | Audio, formants | Same as ae | Millar et al (1997); Millar et al (1994) |
| ema5 (ema) | Read sentences | Standard German | 20 | 1F | Audio, EMA | Word, phonetic, tongue-tip, tongue-body | Bombien et al (2007) |
| epgassim | Isolated words | Australian English | 60 | 1F | Audio, EPG | Word, phonetic | Stephenson & Harrington (2002); Stephenson (2003) |
| epgcoutts | Read speech | Australian English | 2 | 1F | Audio, EPG | Word | Passage from Hewlett & Shockey (1992) |
| epgdorsal | Isolated words | German | 45 | 1M | Audio, EPG, formants | Word, phonetic | Ambrazaitis & John (2004) |
| epgpolish | Read sentences | Polish | 40 | 1M | Audio, EPG | Word, phonetic | Guzik & Harrington (2007) |
| first | 5 utterances from gerplosives | | | | | | |
| gerplosives | Isolated words in carrier sentence | German | 72 | 1M | Audio, spectra | Phonetic | Unpublished |
| gt | Continuous speech | German | 9 | various | Audio, f0 | Word, break, tone | Utterances from various sources |
| isolated | Isolated word production | Australian English | 218 | 1M | Audio, formants, bandwidths | Phonetic | As ae above |
| kielread | Read sentences | German | 200 | 1M, 1F | Audio, formants | Phonetic | Simpson (1998); Simpson et al (1997) |
| mora | Read | Japanese | 1 | 1F | Audio | Phonetic | Unpublished |
| second | Two speakers from gerplosives | | | | | | |
| stops | Isolated words in carrier sentence | German | 470 | 3M, 4F | Audio, formants | Phonetic | Unpublished |
| timetable | Timetable enquiries | German | 5 | 1M | Audio | Phonetic | As kielread |

Preface

In undergraduate courses that include phonetics, students typically acquire both skills in ear-training and an understanding of the acoustic, physiological, and perceptual characteristics of speech sounds. But there is usually less opportunity to test this knowledge on sizeable quantities of speech data. This is partly because putting together a database large enough to address non-trivial questions in phonetics is very time-consuming. In the last ten years, this problem has been offset somewhat by the rapid growth of national and international speech corpora, driven principally by the needs of speech technology. But there is usually still a big gap between two things. One is the phonetic knowledge acquired in class. The other is applying that knowledge to available speech corpora to solve different kinds of theoretical problems. The difficulty lies not just in getting the right data out of the corpus. It also lies in deciding which graphical and quantitative techniques are available and appropriate for the problem to be solved.