Accent Features and Idiodictionaries


PhD Dissertation

Accent Features and Idiodictionaries: On Improving Accuracy for Accented Speakers in ASR

Michael Tjalve
Department of Phonetics and Linguistics
University College London
March 2007

Declaration

I, Michael Tjalve, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.

Copyright

The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without the prior written consent of the author.

2007 Michael Tjalve

ABSTRACT

One of the most widespread approaches to dealing with the problem of accent variation in ASR has been to choose the most appropriate pronunciation dictionary for the speaker from a predefined set of dictionaries. This approach is weak in two ways: first, accent types are more numerous and more variable than can be captured in a few dictionaries, even if the knowledge were available to create them; second, accents vary in the composition and phonotactics of the phone inventory, not just in which phones are used in which word. In this work, we identify not the speaker's accent, but accent features which allow us to predict by rule their likely pronunciation of all words in the dictionary. Any given speaker is associated with a set of accent features, but it is not a requirement that those features constitute a known accent. We show that by building a pronunciation dictionary for an individual, an idiodictionary, recognition accuracy can be improved over a system using standard accent dictionaries. The idiodictionary approach could be further enhanced by extending the set of phone models to improve the modelling of phone inventory and variation across accents.
However, an extended phoneme set is difficult to build since it requires specially labelled training material, where the labelling is sensitive to the speaker's accent. An alternative is to borrow phone models of a suitable quality from other languages. In this work, we show that this phonetic fusion of languages can improve the recognition accuracy of the speech of an unknown accent. This work has practical application in the construction of speech recognition systems that adapt to speakers' accents. Since it demonstrates the advantages of treating speakers as individuals rather than just as members of a group, the work also has potential implications for how accents are studied in phonetic research generally.

ACKNOWLEDGEMENTS

Many people have contributed to this PhD in many different ways. First and foremost, I would like to thank my research supervisor Dr. Mark Huckvale for academic guidance and for many exciting discussions. I have enjoyed how he has challenged me and my ideas. It has been a privilege to work closely with one of the true experts in the field of accent variation in speech technology.

I would like to thank Infinitive Speech Systems and Visteon Corporation for sponsoring my research. The speech data, phonetic data and the speech recogniser used in the majority of the experiments described on the pages below belong to Infinitive Speech Systems, and without this agreement, it would not have been possible to complete the research within the timeframe I had planned.

I would like to thank Dr. Maha Kadirkamanathan, without whom this PhD would not even have begun. He has been my industrial mentor within speech technology and he is one of the few persons I have met on the commercial side of the speech community who truly understands the potential of academia and the industry working together.

I would also like to thank Darren, Omri, Mark, Paul, Bernard and Gary at Infinitive Speech Systems for many good discussions, ideas and help.

I am grateful to Dr. Roger Moore for giving me assistance with the Accents of the British Isles corpus. Also thanks to Simon King and the Centre for Speech Technology Research at University of Edinburgh for making the Unisyn dictionary available.

Thanks to Mads Torgersen for very valuable proofreading and good suggestions.

I would like to thank my mother, Merete Tjalve, for always believing in me and for taking an important part in making me who I am.

I would like to thank my father, Eskild Tjalve, for inspiring scientific thinking and for showing the value of thinking outside of the box.

Finally, I would like to thank my amazing life companion - my wife Rocío - and our wonderful children, Isabella and Oliver, for bearing up with me spending time on this project of mine. It has been a fascinating journey but it has also been a long one and I could not have done it without your support and understanding. Thank you for always being there. Jeg elsker jer! (I love you all!)

TABLE OF CONTENTS

Abstract
Acknowledgements
Table of Contents
1 Introduction
  1.1 Aims and overview of thesis
  1.2 Scope of research
  1.3 General notes about the experiments
    1.3.1 The speech data
    1.3.2 The ASR engine
    1.3.3 The pronunciation dictionary
2 Accent Variation and ASR
  2.1 Introductory remarks
  2.2 Accent variation
  2.3 The mechanics of an ASR engine
    2.3.1 The acoustic signal
    2.3.2 The front-end
    2.3.3 The back-end
      2.3.3.1 The acoustic models
      2.3.3.2 The pronunciation dictionary
      2.3.3.3 The grammar
    2.3.4 The response
  2.4 Why accent variation is a problem to ASR engines
  2.5 Summary and discussion
3 Phonetics and Phonology in ASR
  3.1 Introductory remarks
  3.2 Phonetic and phonological variation
  3.3 Phonetic and phonological representation
    3.3.1 The phoneme set
    3.3.2 The acoustic models
    3.3.3 The pronunciation dictionary
  3.4 Summary and discussion
4 Accent Variation Modelling
  4.1 Introductory remarks
  4.2 The acoustic models
    4.2.1 Training of the acoustic models
      4.2.1.1 Details of the experiment
      4.2.1.2 Findings
    4.2.2 Speaker adaptation of the acoustic models
      4.2.2.1 Details of the experiment
      4.2.2.2 Findings
  4.3 The pronunciation dictionary
    4.3.1 The canonical dictionary
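
The abstract describes predicting a speaker's pronunciations by rule from a set of accent features and collecting the results in an idiodictionary. The following Python sketch illustrates that general idea only. The phone symbols, the two rules (`th_fronting`, `non_rhotic`), the three-word canonical dictionary and the function `build_idiodictionary` are all hypothetical and chosen for illustration; they are not the thesis's actual rule set, phone inventory or implementation.

```python
from typing import Callable, Dict, List

Phones = List[str]
Rule = Callable[[Phones], Phones]

# Hypothetical vowel symbols, used only to decide whether /r/ is prevocalic.
VOWELS = {"aa", "ae", "ah", "ao", "eh", "er", "ih", "iy", "uh", "uw"}


def th_fronting(phones: Phones) -> Phones:
    """Illustrative rule: realise the voiceless dental fricative as /f/."""
    return ["f" if p == "th" else p for p in phones]


def non_rhotic(phones: Phones) -> Phones:
    """Illustrative rule: drop /r/ unless the next phone is a vowel."""
    out: Phones = []
    for i, p in enumerate(phones):
        next_is_vowel = i + 1 < len(phones) and phones[i + 1] in VOWELS
        if p == "r" and not next_is_vowel:
            continue
        out.append(p)
    return out


# Assumed mapping from accent-feature names to rewrite rules.
ACCENT_FEATURES: Dict[str, Rule] = {
    "th_fronting": th_fronting,
    "non_rhotic": non_rhotic,
}


def build_idiodictionary(
    canonical: Dict[str, Phones], features: List[str]
) -> Dict[str, Phones]:
    """Apply a speaker's accent-feature rules, in order, to every entry."""
    idiodict: Dict[str, Phones] = {}
    for word, phones in canonical.items():
        result = list(phones)  # copy so the canonical entry is untouched
        for name in features:
            result = ACCENT_FEATURES[name](result)
        idiodict[word] = result
    return idiodict


# Toy canonical dictionary (assumed transcriptions).
canonical = {
    "three": ["th", "r", "iy"],
    "car": ["k", "aa", "r"],
    "north": ["n", "ao", "r", "th"],
}

# A speaker described by two features; they need not form a named accent.
idiodict = build_idiodictionary(canonical, ["th_fronting", "non_rhotic"])
for word, phones in idiodict.items():
    print(word, " ".join(phones))
```

Because the speaker is described by a list of features, not by an accent label, any combination of rules can be applied, which mirrors the abstract's point that the features need not constitute a known accent.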