Pronunciation Modeling in Speech Synthesis

Total Page:16

File Type:pdf, Size:1020Kb

Pronunciation Modeling in Speech Synthesis University of Pennsylvania ScholarlyCommons IRCS Technical Reports Series Institute for Research in Cognitive Science February 1998 Pronunciation Modeling In Speech Synthesis Corey Andrew Miller University of Pennsylvania Follow this and additional works at: https://repository.upenn.edu/ircs_reports Miller, Corey Andrew, "Pronunciation Modeling In Speech Synthesis" (1998). IRCS Technical Reports Series. 55. https://repository.upenn.edu/ircs_reports/55 University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-98-09. This paper is posted at ScholarlyCommons. https://repository.upenn.edu/ircs_reports/55 For more information, please contact [email protected]. Pronunciation Modeling In Speech Synthesis Abstract This dissertation investigates the area of pronunciation modeling in speech synthesis. By pronunciation modeling, we mean architectures and principles for generating high-quality human-like pronunciations. The term pronunciation modeling has previously been applied in the context of speech recognition (e.g. Byrne et al. 1997). In that context, it describes theories and procedures for handling the pronunciation variation that naturally occurs across speakers. In contrast, our work is in the domain of text-to-speech synthesis, which, as we will show, requires modeling the pronunciation variation of an individual whose speech the synthesizer is attempting to model. We will explain our methodology for learning and reproducing pronunciation variation on an individual basis, and show how most crucial features of such variation can be easily generated using the architecture we describe. Throughout the course of this exposition, we highlight contributions to linguistic theory that such a thorough analysis of individual variation provides. We describe the postlexical module of an English text-to-speech synthesizer. This module is responsible for transforming underlying lexical pronunciations from a lexical database into contextually appropriate surface postlexical pronunciations. This transformation is achieved by machine learning of a corpus of hand-labeled postlexical pronunciations that have been aligned with lexical pronunciations. The machine learning is conducted by a neural network, whose architecture and data encoding we describe. A thorough analysis of the performance of the postlexical module is offered, with attention to the relative success of the neural network at learning a wide range of postlexical phenomena. We examine the extent to which a symbolic approach to allophony is warranted, and provide an acoustic analysis that attempts to provide an answer to this question. Assessments of the success of currently existing theories of phonetics, phonology and their interface are offered, based on the experience of generating a complete postlexical phonology of English for use in synthetic speech. Comments University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-98-09. This thesis or dissertation is available at ScholarlyCommons: https://repository.upenn.edu/ircs_reports/55 PRONUNCIATION MODELING IN SPEECH SYNTHESIS Corey Andrew Miller A DISSERTATION in Linguistics Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy 1998 _____________________________ Mark Liberman Supervisor of Dissertation _____________________________ George Cardona Graduate Group Chairperson © COPYRIGHT Corey Andrew Miller 1998 DEDICATION To Jonathan Connett. iii ACKNOWLEDGMENTS I am very pleased to have had the encouragement and support of a committee of three linguists for whom I have the greatest respect and admiration: Mark Liberman, William Labov and Eugene Buckley. Each of them made my transition back to Penn pleasant after what seemed like a long absence. It was a great pleasure to have Mark Randolph both as an external reader and as a colleague at Motorola. Mark’s work at MIT a decade ago has served as an inspiration to me. Orhan Karaali made this dissertation possible in this millennium. As my manager for over two years at Motorola, Orhan insisted on making my dissertation a priority at work. Harry Bliss provided his voice to this project and our whole group is very grateful for his patience and cooperation. My colleagues at Motorola listened to my ideas and provided technical and theoretical assistance at every turn: Noel Massey, Jerry Corrigan, William Thompson, Andrew Mackie, Erica Zeinfeld, Otto Schnurr, Lea Adams, Michael Murdock, Joseph Goldberg and Lynette Melnar. My entire management was supportive in this effort: Ira Gerson, Kevin Kloker and James Mikulski. D. J. Stockley made the pursuit of intellectual property a positive learning experience. My colleagues at my former workplace, Franklin Electronic Publishers, provided a great environment as I wet my feet in the industrial world: Will Dowling, David Justice and Mike Wolff. I thank Matt Lennig and Kristin Precoda for introducing me to speech technology and software development at BNR. My linguist friends and colleagues at Penn and others schools provided support and encouragement while I lived in Philadelphia and afterwards: Christine Zeller, Tom Veatch, Peter Slomanson, Fabiola Varela-García, Mary O’Malley, Bill Reynolds, Hadass Sheffer, Stephanie Strassel, Nadia iv Biassou, Lisa Lavoie, Hikyoung Lee, Alex Dimitriadis, and Jason Eisner. I thank Brian Sietsema, Pat Keating, Sean Fulop and Howard Nusbaum for providing valuable advice and information. My mother always believed that this dissertation would become a reality, and to her I owe inexpressible gratitude for providing encouragement at the more difficult moments of my graduate career. The thought of the rest of my family including my father, Stephanie, Ken, Jordan, Avery, Lee, Elihu, and my late grandparents has always kept me going. Thank you Jonathan for paying the bills, arranging exotic vacations and, most of all, hanging in there. Finally, I would like to thank the Summer Institute of Linguistics for their excellent phonetic fonts which are freely distributed on the World Wide Web. v ABSTRACT PRONUNCIATION MODELING IN SPEECH SYNTHESIS Corey Andrew Miller Mark Liberman This dissertation investigates the area of pronunciation modeling in speech synthesis. By pronunciation modeling, we mean architectures and principles for generating high-quality human-like pronunciations. The term pronunciation modeling has previously been applied in the context of speech recognition (e.g. Byrne et al. 1997). In that context, it describes theories and procedures for handling the pronunciation variation that naturally occurs across speakers. In contrast, our work is in the domain of text-to-speech synthesis, which, as we will show, requires modeling the pronunciation variation of an individual whose speech the synthesizer is attempting to model. We will explain our methodology for learning and reproducing pronunciation variation on an individual basis, and show how most crucial features of such variation can be easily generated using the architecture we describe. Throughout the course of this exposition, we highlight contributions to linguistic theory that such a thorough analysis of individual variation provides. We describe the postlexical module of an English text-to-speech synthesizer. This module is responsible for transforming underlying lexical pronunciations from a lexical database into contextually appropriate surface postlexical pronunciations. This transformation is achieved by machine learning of a corpus of hand-labeled postlexical pronunciations that have been aligned with lexical pronunciations. The machine learning is conducted by a vi neural network, whose architecture and data encoding we describe. A thorough analysis of the performance of the postlexical module is offered, with attention to the relative success of the neural network at learning a wide range of postlexical phenomena. We examine the extent to which a symbolic approach to allophony is warranted, and provide an acoustic analysis that attempts to provide an answer to this question. Assessments of the success of currently existing theories of phonetics, phonology and their interface are offered, based on the experience of generating a complete postlexical phonology of English for use in synthetic speech. vii TABLE OF CONTENTS Chapter 1. Introduction .....................................................................................................1 1.1. What is speech synthesis pronunciation modeling? ................................................1 1.2. Computational phonology and speech technology..................................................2 1.2.1. Linguistic aspects of speech synthesis.............................................................4 1.2.2. Review of prior work in postlexical modeling..............................................14 1.2.3. Review of neural networks in phonology......................................................25 1.3. General phonological issues..................................................................................30 1.3.1. Phonetics-phonology interface ......................................................................30 1.3.2. Lexical phonology and the interface between syntax, morphology and phonology..................................................................................................................35 1.4. Sociolinguistics/Variation .....................................................................................43
Recommended publications
  • Gregory Ward
    Gregory Ward Department of Linguistics Tel.: (847) 491-8055 Northwestern University Fax: (847) 491-3770 2016 Sheridan Road, Suite 21 Email: [email protected] Evanston, IL 60208-4090 Internet: http://www.gregoryward.org Education 1985 Ph.D. in Linguistics, University of Pennsylvania. Thesis: The Semantics and Pragmatics of Preposing. Advisor: Ellen F. Prince. 1978 B.A. in Linguistics (with Honors) and Comparative Literature, University of California, Berkeley. Graduated with Great Distinction in General Scholarship from the College of Letters & Sciences. Phi Beta Kappa. Appointments Current: 1997 — Professor, Department of Linguistics, Northwestern University. 2014 — Professor, by courtesy, Department of Philosophy, Northwestern University (Affiliated Faculty since 2007). 2014 — Member of the Advisory Board, Program in Gender & Sexuality Studies, Northwestern University. Other: 2015 Faculty, LSA Linguistic Institute, University of Chicago, Chicago. 2013 Faculty, LSA Linguistic Institute, University of Michigan, Ann Arbor. 2011 Faculty, LSA Linguistic Institute, University of Colorado, Boulder. 2011 Distinguished Visiting Professor, University of Vigo, Spain. June. 2010 Distinguished Visiting Professor, University of Osnabrück, Germany. July. 2010 Visiting Professor, Department of Cognitive and Linguistic Sciences, Brown University. May. 2009 Faculty, LSA Linguistic Institute, University of California, Berkeley. 2008 Distinguished Visiting Professor, University of Santiago de Compostela, Spain. June. 2007 Faculty, LSA Linguistic Institute, Stanford University. 2003 Faculty, LSA Linguistic Institute, Michigan State University. 1999 – 04 Chair, Department of Linguistics, Northwestern University. 1997 Faculty, LSA Linguistic Institute, Cornell University. Gregory Ward, CV 2 1996 Visiting Professor, UFR Angellier (Department of English), Université Charles de Gaulle – Lille 3. 1993 Faculty, LSA Linguistic Institute, The Ohio State University. 1991 – 97 Associate Professor, Department of Linguistics, Northwestern University.
    [Show full text]
  • Proceedings of the Human Language Technology Conference of The
    HLT-NAACL 2006 Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics Proceedings of the Main Conference Robert C. Moore, General Chair Jeff Bilmes, Jennifer Chu-Carroll and Mark Sanderson Program Committee Chairs June 4-9, 2006 New York, New York, USA Published by the Association for Computational Linguistics http://www.aclweb.org Production and Manufacturing by Omnipress Inc. 2600 Anderson Street Madison, WI 53704 c 2006 The Association for Computational Linguistics Order copies of this and other ACL proceedings from: Association for Computational Linguistics (ACL) 209 N. Eighth Street Stroudsburg, PA 18360 USA Tel: +1-570-476-8006 Fax: +1-570-476-0860 [email protected] ii Preface from the General Chair This year marks the third time that the conference on Human Language Technology has combined with the North American chapter meeting of the Association for Computational Linguistics. The roster of accepted papers reveals an eclectic mix of topics in natural-language processing, speech processing, and information retrieval. A gratifying number of the papers are difficult to classify because they span more than one of these three major areas of human language technology. For example, the boundary between natural-language processing and information retrieval is hard to draw in the papers that focus on the World Wide Web as a corpus; moreover, several of these include speech-related aspects as well. The crazy thing about putting on a conference like this is that you start out with a group of people who have never done it before, and by the time they really figure out what they are doing, the conference is over and you replace them with another group of people who have never done it before! To do a good job as general chair, however, there is only one really important thing to learn: pick really good people to do all the other jobs, sit back, and let them do all the work.
    [Show full text]
  • 31St Annual Meeting of the Association for Computational Linguistics
    31st Annual Meeting of the Association for Computational Linguistics Proceedings of the Conference 22-26 June 1993 Ohio State University Columbus, Ohio, USA Published by the Association for Computational Linguistics © 1993, Association for Computational Linguistics Order copies of this and other ACL proceedings from: Donald E. Walker (ACL) Bellcore 445 South Street, MRE 2A379 Morristown, NJ 07960, USA PREFACE This volume contains the papers prepared for the 31 st Annual Meeting of the Association for Compu- tational Linguistics, held 22-26 June 1993 at The Ohio State University in Columbus, Ohio. The cluster of papers in the final section stems from the student session, featured at the meeting for the 3rd successive year and testifying to the vigor of this emerging tradition. The number and quality of submitted papers was again gratifying, and all authors deserve our collec- tive plaudits for the efforts they invested despite the well-known risks of submitting to a highly selective conference. It was their efforts that once again ensured a Meeting (and Proceedings) reflecting the highest standards in computational linguistics, offering a tour of some of the most significant recent ad- vances and most lively research frontiers. Special thanks go to our invited speakers, Wolfgang Wahlster, Geoff Nunberg and Barbara Partee, for contributing their insights and panache to the conference; to Philip Cohen for concocting and coordinat- ing a varied and relevant tutorial program, and to Helen Gigley, Steve Small, Alexis Manaster Ramer, Wlodek Zadrozny, Kent Wittenburg, David D. Lewis, and Elizabeth D. Liddy for offering the tutorials; to Linda Suri and Sandra Carberry for very effectively co-organizing the program for the student sessions (see the separate preface for the student session papers); to Terry Patten for directing local arrangements, and to Robert Kasper for helping with local arrangements and coordinating exhibits and demonstrations.
    [Show full text]
  • Julia Hirschberg
    JULIA HIRSCHBERG September 12, 2021 Columbia University Department of Computer Science 1214 Amsterdam Avenue, M/C 0401 450 CS Building New York, NY 10027 email: [email protected] phone: (212) 853-8464 FAX: (212) 853-8440 http://www.cs.columbia.edu/ julia Biographical Sketch Julia Hirschberg has been a professor in the Department of Computer Science at Columbia University since 2002, serving as Chair from July 2012 through June 2018. She was appointed Percy K. and Vida L. W. Hudson Professor of Computer Science in 2013. She received her PhD in Computer Science from the University of Pennsylvania. She worked at Bell Laboratories and AT&T Laboratories { Research from 1985-2003 as a Member of Technical Staff, Department Head, and Technology Leader, creating the Human-Computer Interface Research Department at Bell Labs. She has done research on prosody, dis- course structure, conversational implicature, text-to-speech synthesis, speech search, spoken dialogue systems, emotional speech, deceptive speech, charismatic speech and speech summarization. She served as editor-in-chief of Computational Linguistics from 1993-2003 and was an editor-in-chief of Speech Communication from 2003-2006 and is now on the Editorial Board. She served on the Executive Board of the Association for Computational Linguistics (ACL) from 1993-2003, has served on the Permanent Council of International Conference on Spoken Language Processing (ICSLP) since 1996, the IEEE Speech and Language Processing Technical Committee (SLTC) from 2011{, the executive board of the
    [Show full text]
  • Gregory Ward
    Gregory Ward Department of Linguistics Ph.: (847) 491-8055 Northwestern University Fax: (847) 491-3770 2016 Sheridan Road, Suite 21 e-mail: [email protected] Evanston, IL 60208-4090 www: www.ling.northwestern.edu/~ward Education 1985 Ph.D. in Linguistics, University of Pennsylvania. Thesis: The Semantics and Pragmatics of Preposing. Advisor: Ellen F. Prince. 1978 B.A. in Linguistics (with Honors) and Comparative Literature, University of California, Berkeley. Graduated with Great Distinction in General Scholarship from the College of Letters & Sciences. Phi Beta Kappa. Positions Held 2004 – 05 Fellow, Center for Advanced Study in the Behavioral Sciences. 1997 – Professor, Department of Linguistics, Northwestern University. 1999 – 04 Chair, Department of Linguistics, Northwestern University. 1991 – 97 Associate Professor, Department of Linguistics, Northwestern University. 1986 – 91 Assistant Professor, Department of Linguistics, Northwestern University. 1985 – 86 Lecturer, Department of Linguistics, San Diego State University. Other Professional Experience 2007 Visiting Professor, LSA Linguistic Institute, Stanford University. 2003 Visiting Professor, LSA Linguistic Institute, Michigan State University. 1997 Visiting Professor, LSA Linguistic Institute, Cornell University. 1996 Visiting Professor, UFR Angellier (Department of English), Université Charles de Gaulle – Lille 3. 1993 Visiting Professor, LSA Linguistic Institute, The Ohio State University. 1986 – 97 Consultant, AT&T Labs – Research (formerly AT&T Bell Laboratories); Murray
    [Show full text]
  • Julia Hirschberg
    JULIA HIRSCHBERG February 8, 2017 Columbia University Department of Computer Science 1214 Amsterdam Avenue, M/C 0401 450 CS Building New York, NY 10027 email: [email protected] phone: (212) 939-7114 FAX: (212) 666-0140 http://www.cs.columbia.edu/ julia Biographical Sketch Julia Hirschberg has been a professor in the Department of Computer Science at Columbia University since 2002, serving as Chair since 2012. She was appointed Percy K. and Vida L. W. Hudson Professor of Computer Science in 2013. She received her PhD in Computer Science from the University of Penn- sylvania. She worked at Bell Laboratories and AT&T Laboratories { Research from 1985-2003 as a Member of Technical Staff, Department Head, and Technology Leader, creating the Human-Computer Interface Research Department at Bell Labs. She has done research on prosody, discourse structure, conversational implicature, text-to-speech synthesis, speech search, spoken dialogue systems, emotional speech, deceptive speech, and speech summarization. She served as editor-in-chief of Computational Linguistics from 1993-2003 and was an editor-in-chief of Speech Communication from 2003-2006 and is now on the Editorial Board. She was on the Executive Board of the Association for Computational Linguistics (ACL) from 1993-2003, has been on the Permanent Council of International Conference on Spoken Language Processing (ICSLP) since 1996, the Board of Directors of the Computing Research Association (CRA) from 2013-14, and on the board of the International Speech Communication Associ- ation (ISCA) from 1999-2007 (as President 2005-2007; Advisory Council 2007{, chair of Distinguished Lecturers Selection Committee, 2010{).
    [Show full text]
  • 1 Mona T. Diab
    Mona T. Diab, PhD Associate Professor Department of Computer Science School of Engineering and Applied Science George Washington University [email protected] http://www.seas.gwu.edu/~mtdiab Office: +1(202) 994.8109 RESEARCH FOCUS & INTERESTS Computational lexical semantics, multilingual processing, computational sociolinguistics, computational pragmatics, social media analytics, health analytics, low resource language processing, resource building, applied machine learning techniques, text analytics, information extraction, sentiment and emotion analysis, Arabic computational linguistics. PROFESSIONAL EXPERIENCE 01.2013-present Associate Professor, Department of Computer Science, The George Washington University, Washington DC, USA 01.2013-present Director, GW NLP Lab (CARE4Lang), The George Washington University, Washington DC, USA (~20 active members) 06.2005-present Co-Director, Computational Approaches for Arabic Dialect Modeling (CADIM) Group, Columbia University, The George Washington University, NYU-Abu Dhabi (~10 active members) 09.2009-12.2012 Research Scientist (Principal Investigator), Center for Computational Learning Systems (CCLS), Columbia University, New York NY, USA 09.2009-12.2012 Adjunct Associate Professor, Department of Computer Science, Columbia University, New York NY, USA 09.2007-08.2009 Adjunct Assistant Professor, Department of Computer Science, Columbia University, New York NY, USA 02.2005-08.2009 Associate Research Scientist (Principal Investigator), Center for Computational Learning Systems (CCLS), Columbia University,
    [Show full text]
  • Kathleen Mckeown
    Kathleen McKeown Department of Computer Science Columbia University New York, NY. 10027 U.S.A. Phone: 212-939-7118 Fax: 212-666-0140 email: [email protected] website: http://www.cs.columbia.edu/ kathy Current position Henry and Gertrude Rothschild Professor of Computer Science, Columbia University, New York Research Interests Computational Linguistics/Natural-Language Processing: Text Summarization; Language Genera- tion; Social Media Analysis; Open-ended question answering; Sentiment Analysis Education 1982 Ph.D., Computer and Information Science, University of Pennsylvania 1979 M.S., Computer and Information Science, University of Pennsylvania 1976 A.B., Comparative Literature, Brown University Appointments held 2012-2018 Founding Director, Columbia Data Science Institute 2011-2012 Vice Dean for Research, School of Engineering and Applied Science 2005-present Henry and Gertrude Rothschild Professor of Computer Science, Columbia University 1997-present Professor, Columbia University 7/2003-12/2003 Acting Chair, Department of Computer Science, Columbia University 1997-2002 Chair, Department of Computer Science, Columbia University 1987-1997 Associate Professor, Columbia University 1982-1987 Assistant Professor, Columbia University Honors & awards 2016 Keynote Speaker, International Semantic Web Conference, Kobe, Japan 20162014 Grace Hopper Distinguished Lecture, University of Pennsylvania Philadelphia, Pa., Nov. 2016. Distin- guished Lecturer, University of Edinburgh, Launch of the Centre for Doctoral Training in Data Science, Edinburgh,
    [Show full text]
  • Proceedings of the First Workshop on Computational Approaches to Code
    EMNLP 2014 First Workshop on Computational Approaches to Code Switching Proceedings of the Workshop October 25, 2014 Doha, Qatar Production and Manufacturing by Taberg Media Group AB Box 94, 562 02 Taberg Sweden c 2014 The Association for Computational Linguistics Order copies of this and other ACL proceedings from: Association for Computational Linguistics (ACL) 209 N. Eighth Street Stroudsburg, PA 18360 USA ISBN 978-1-937284-96-1 Tel: +1-570-476-8006 Fax: +1-570-476-0860 [email protected] ii Introduction Code-switching (CS) is the phenomenon by which multilingual speakers switch back and forth between their common languages in written or spoken communication. CS is pervasive in informal text communications such as news groups, tweets, blogs, and other social media of multilingual communities. Such genres are increasingly being studied as rich sources of social, commercial and political information. Apart from the informal genre challenge associated with such data within a single language processing scenario, the CS phenomenon adds another significant layer of complexity to the processing of the data. Efficiently and robustly processing CS data presents a new frontier for our NLP algorithms on all levels. The goal of this workshop is to bring together researchers interested in exploring these new frontiers, discussing state of the art research in CS, and identifying the next steps in this fascinating research area. The workshop program includes exciting papers discussing new approaches for CS data and the development of linguistic resources needed to process and study CS. We received a total of 17 regular workshop submissions of which we accepted eight for publication (47% acceptance rate), five of them as workshop talks and three as posters.
    [Show full text]