Automatic Derivation of Grammar Rules Used for Speech Recognition in Dialogue Systems
Masaryk University
Faculty of Informatics

Automatic Derivation of Grammar Rules Used for Speech Recognition in Dialogue Systems

Master’s Thesis

Bc. Klára Kufová

Brno, Spring 2018

This is where a copy of the official signed thesis assignment and a copy of the Statement of an Author is located in the printed version of the document.

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Bc. Klára Kufová

Advisor: Mgr. Luděk Bártek, Ph.D.

Acknowledgements

First and foremost, I would like to acknowledge the thesis advisor, Mgr. Luděk Bártek, Ph.D., of the Faculty of Informatics at Masaryk University, for his valuable ideas, constructive advice, and the time dedicated to our consultations.

My sincere thanks also go to both my current and former colleagues at Red Hat, especially to Mgr. Milan Navrátil and Ing. Radovan Synek, who helped me tremendously during both the implementation and writing of this thesis.

Last but not least, I would like to express my sincere gratitude to my parents, grandparents, my brother, and to my fiancé, who have supported and encouraged me throughout the course of my studies.

Abstract

The thesis deals with the process of building an adaptive dialogue system capable of learning new grammar rules for automatic speech recognition from past conversations with real users. The subsequent ability to automatically remove unused rules from a grammar is proposed and implemented as well. Such a dialogue system is domain-independent, efficiently extensible, and overcomes the most significant drawbacks of the grammar-based language models utilized within speech recognizers.
The described theoretical principles are demonstrated on the created conversational agent, which is prepared to be deployed in production as a virtual shopping assistant in an online fashion boutique. Apart from the system’s implementation details, the thesis provides a comprehensive overview of the area of dialogue systems in the field of artificial intelligence and natural language processing.

Keywords

dialogue system, grammar rules, speech recognition, Sphinx4, speech synthesis, MaryTTS, language modelling, chat bot, virtual personal assistant, Coco, natural language processing, speech understanding, speech generation, corrective dialogue, grammar expansion, grammar reduction

Contents

Introduction
1 State of the Art
  1.1 Contemporary Dialogue Systems
    1.1.1 Virtual Personal Assistants
    1.1.2 Business and Commerce
    1.1.3 Education and Healthcare
  1.2 Current Fields of Research
    1.2.1 Natural Language Understanding
    1.2.2 Dialogue Management
    1.2.3 Natural Language Generation
2 Building a Dialogue System
  2.1 Introducing Coco
    2.1.1 Problem Domain
    2.1.2 Deployment
  2.2 Voice Dialogue Standards
    2.2.1 VoiceXML
    2.2.2 Aspect Prophecy
    2.2.3 VoxML
  2.3 Input and Output Speech
    2.3.1 Speech Recognition
    2.3.2 Speech Synthesis
  2.4 Speech Understanding and Generation
    2.4.1 Speech Understanding
    2.4.2 Speech Generation
3 Adaptive Dialogue Systems
  3.1 Automatic Derivation of Grammar Rules
    3.1.1 Detecting Out-Of-Grammar Utterances
    3.1.2 Corrective Dialogue
    3.1.3 Grammar Expansion
    3.1.4 Dialogue Continuation
  3.2 Automatic Reduction of Grammar Rules
    3.2.1 Removing Old Rules
    3.2.2 Removing Unused Rules
4 Future Work
Conclusion
Bibliography
A Running Coco
  A.1 Software Distribution
  A.2 Execution Instructions
    A.2.1 Linux-Based Operating Systems
    A.2.2 Microsoft Windows
B Contributing to Coco
  B.1 Developing Coco
  B.2 Building Coco
C Example Dialogue with Coco

List of Figures

1.1 An architecture of a dialogue system.
2.1 The architecture of a VoiceXML application.
2.2 The architecture of the Sphinx4 speech recognition system.
2.3 An example search graph generated by the Linguist module.
2.4 The architecture of the MaryTTS speech synthesizer.
3.1 The schema of a corrective dialogue.

Introduction

“Simplicity is the keynote of all true elegance.” – Coco Chanel

The idea of a real conversation with a machine has tempted researchers in artificial intelligence and natural language processing from the very beginning. Starting in 1950, when the British journal Mind published the influential article Computing Machinery and Intelligence [1] written by Alan Mathison Turing, the area of natural language processing began to emerge, and soon after, machines were able not only to talk but also to recognize and understand human speech. The simple, naively operating dialogue systems of the 1960s evolved into the complex, sophisticated conversational agents of the new millennium. With the assistance of a dialogue system, making a restaurant reservation, booking a plane ticket, or recognizing a new song is a matter of seconds, while learning a new language or improving one’s mental health may be a matter of days.

Classified by their area of usage, both earlier and contemporary dialogue systems are introduced in chapter 1. The chapter does not only mention existing conversational agents, however; it also describes the currently prominent research areas.
The second half of the first chapter is therefore divided into three sections, which correspond to the fundamental components of a dialogue system: a natural language understanding unit, a dialogue manager, and a natural language generation unit.

The most important part of this thesis is Coco, a personal shopping assistant introduced at the beginning of chapter 2. The Coco dialogue system is named after Gabrielle Bonheur Chanel, nicknamed Coco, the founder of the world-famous Parisian haute couture fashion house. The principles and techniques associated with implementing a conversational agent are demonstrated on this dialogue system. Apart from characterizing the system’s problem domain, deployment, and personas, the chapter also introduces the most influential voice dialogue standards and thoroughly describes the speech recognition and speech synthesis libraries, and the related methods, used within the system. The natural language understanding and generation logic implemented in Coco is covered as well.

The Coco dialogue system was created to demonstrate the ability of an artificial computer system to learn new grammar rules for speech recognition based on past conversations with real users. The automatic derivation of grammar rules, together with the approaches employed for their automatic reduction, is explained in detail in chapter 3. Chapter 4, the final chapter, then briefly lists the system’s weak points and desirable enhancements, which may be included in a future release of Coco to provide a more advanced, state-of-the-art dialogue interface.

The aim of the thesis is not only to define and build an adaptive dialogue system, but also to provide a brief yet comprehensive overview of the part of artificial intelligence and natural language processing that concerns conversational agents. And as stated at the beginning, it is crucial to achieve the goal as simply as possible.
1 State of the Art

At the time of writing, the research and implementation of computer dialogue systems, which vary in many respects, has been a significant and fairly appealing part of the natural language processing area of computer science for over fifty years. From ELIZA¹ and PARRY², simple and intuitive early chat bots that did not in fact apply many artificial intelligence approaches (although they are considered early artificial intelligence programs), research has moved noticeably forward to complex and sophisticated systems that can address many current problems.

1.1 Contemporary Dialogue Systems

Dialogue systems can be classified along many different dimensions. One of the most common is initiative: a dialogue interface can have system initiative, user initiative, or mixed initiative. With the current notable progress in multimodal human-computer interaction, however, it is becoming more common to categorize dialogue systems by modality: a dialogue interface can be multimodal or controlled by written text, spoken word, or a graphical user interface. For the purposes of this thesis, a classification based on areas of usage is discussed, although each of the mentioned dialogue systems may belong to several, occasionally overlapping, categories.

1.1.1 Virtual Personal Assistants

Virtual personal assistants (hereinafter also referred to as VPAs) are possibly the most widely known type of dialogue system, mainly due to their built-in availability on personal electronic devices. Current virtual personal assistants are built to perform a large number of different tasks, which makes interaction with the personal device easier and faster and relieves users of having to accomplish their goals manually.

1. ELIZA was created between 1964 and 1966 by Joseph Weizenbaum at the Massachusetts Institute of Technology and is believed to be one of the very first dialogue systems ever implemented. The program simulated a conversation with a psychotherapist, and its fundamental logic was based on the detection of critical words in the user’s text input [2].
2. PARRY, created by Kenneth Mark Colby as a reaction to ELIZA, was designed to act as a paranoid patient suffering from schizophrenia. PARRY employed more advanced approaches than ELIZA and was even subjected to the famous Turing test [3].
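
The keyword detection mentioned in footnote 1 can be illustrated with a minimal sketch. It is not Weizenbaum’s original script: the keywords, their ranks, and the canned responses below are hypothetical, and the real ELIZA additionally used decomposition and reassembly rules to transform the user’s sentence. The sketch only shows the core idea of scanning the input for the highest-ranked critical word and falling back to a neutral prompt when none is found.

```python
import re

# Hypothetical keyword table: keyword -> (rank, canned response).
# Neither the keywords nor the responses come from the original ELIZA script.
KEYWORDS = {
    "mother": (3, "Tell me more about your family."),
    "dream": (2, "What does that dream suggest to you?"),
    "sad": (1, "I am sorry to hear that you are sad."),
}

# Neutral reply used when no keyword is detected (also an assumption).
FALLBACK = "Please go on."


def respond(user_input: str) -> str:
    """Return the response of the highest-ranked keyword found in the input."""
    # Split into lowercase words so that case and punctuation do not hide a keyword.
    words = re.findall(r"[a-z']+", user_input.lower())
    matches = [KEYWORDS[word] for word in words if word in KEYWORDS]
    if not matches:
        return FALLBACK
    # Several keywords may occur in one sentence; the highest rank wins.
    _, response = max(matches, key=lambda match: match[0])
    return response
```

Calling `respond` with a sentence that contains both "dream" and "mother" selects the response attached to "mother", because that keyword carries the higher rank in this hypothetical table.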