Domain-Sensitive Topic Management in a Modular Conversational Agent Framework

A thesis submitted for the degree of Doctor of Philosophy

Daniel Macías-Galindo, B.Eng., M.Sc.
School of Computer Science and Information Technology,
College of Science, Engineering, and Health,
RMIT University.
February 2014

Declaration

I certify that except where due acknowledgement has been made, the work is that of the author alone; the work has not been submitted previously, in whole or in part, to qualify for any other academic award; the content of the thesis is the result of work which has been carried out since the official commencement date of the approved research program; any editorial work, paid or unpaid, carried out by a third party is acknowledged; and, ethics procedures and guidelines have been followed.

Daniel Macías-Galindo
School of Computer Science and Information Technology
RMIT University
February 2014

Acknowledgments

First and foremost, this dissertation would not have been possible without the continuous involvement and support of my supervisors, Dr Lawrence Cavedon, Dr John Thangarajah, and Dr Wilson Wong. Their guidance, from the initial stages of my Candidature until the late stages of my writing, has helped me grow as a more confident person and scholar; I also consider them my friends. I must also thank Prof Michael Winikoff and Prof Lin Padgham for accepting my application for a Candidature, giving me the possibility of making RMIT University and the School of Computer Science and Information Technology (CS&IT) the place where I pursued my Doctorate degree.

For the technical support with respect to the Intelligent Interactive Toy, I must acknowledge the support received from Realthing Entertainment Pty Ltd. under the Australian Research Council Linkage grant number LP0882013.
I also thank the staff involved with the Toy project who helped me with technical issues at some stage of my candidature: Andrew Hodgson, Dr Ralph Rönquisst, Dr Carole Adam, Dr Patrick Ye, and Murray Henderson. My thanks also go to the staff and students from the Intelligent Systems Discipline at RMIT University who were, to a lesser or greater extent, involved in the development of the Toy. Special thanks go to Aidan Martin, who helped design the GUI for collecting judges described in Chapters 4, 5 and 6.

For the financial support required in this venture, I acknowledge the support from CONACYT, my main sponsor for both the Masters and PhD Candidature. Funding for conducting the experiments described in this thesis was obtained from my supervisors. Financial support for the presentation of the papers produced in this research was mostly obtained from the School of Graduate Research at RMIT University.

I want to acknowledge my Undergraduate and Masters supervisors, Dr Darnes Vilarino and Dr Fabiola Lopez, for encouraging me to pursue a Doctorate degree.

My gratitude also goes to the people who helped make my journey more enjoyable. To my office mates Dhirendra, Lavindra, Nitin, Dave, Stephen, Mahmud, Yoosef, and Borhan. To the research staff from the Intelligent Systems Group and the School of CS&IT for your valuable suggestions and comments. To the close friends whom I met through these years: Patricia, Mardi, Naimah, Vidura, Dhiah, Ayman, Ermyas, Rodrigo, Peter and Jessica, for sparing part of your time outside uni. To the administrative staff of the School of CS&IT for being so helpful when administrative processes were required. To the School of Statistics and Geospatial Science, for providing support for Statistical Consultancy, and to the Writing Circle under the supervision of Jennifer Anderson for helping me improve my writing. And also to everyone in the SAPI team at Sensis Pty Ltd, where I currently continue my endeavors.
I acknowledge and want to remember forever the unconditional support from my partner, Lorena, who has been with me all this time, even before we thought about living in Melbourne. Also to my parents Rafael and Guadalupe, my brother Federico and my family back in Mexico, who have been in my mind all this time and supported me from the distance. Special thanks to my friends from high school: Carlos, Emilio, Erwin, Hector, Ivan, Javier, Manuel, Pepe, Rafael, Raul S, Raul M, and Rodrigo; thanks to my friends from uni: Alejandra, Chelo, Gesuri, Karina R, and Karina S. And to the anonymous reviewers who have been involved in the reading of this dissertation and the papers that were written from parts of it.

This work is dedicated to all of you.

Credits

Portions of the material in this thesis have previously appeared in the following publications:

  • D. Macias-Galindo, L. Cavedon, J. Thangarajah, and W. Wong. Effects of domain on measures of semantic relatedness. Journal of the American Society for Information Science and Technology, To appear (ERA Rank: A*)
  • D. Macias-Galindo, W. Wong, L. Cavedon, and J. Thangarajah. Coherent topic transition in a conversational agent. In Proceedings of INTERSPEECH, pages 1–4, Portland, OR, USA, 2012 (ERA Rank: A)
  • D. Macias-Galindo, W. Wong, L. Cavedon, and J. Thangarajah. Using a lexical dictionary and a folksonomy to automatically construct domain ontologies. In Proceedings of the Australasian Joint Conference on Artificial Intelligence (AI), pages 638–647, Perth, WA, Australia, 2011b (ERA Rank: B)
  • D. Macias-Galindo, L. Cavedon, and J. Thangarajah. Building Modular Knowledge Bases for Conversational Agents. In IJCAI Workshop on Knowledge Representation and Reasoning for Practical Dialogue Systems (KRPDS), pages 16–23, Barcelona, Spain, 2011a (ERA Rank: Unranked)

This work was supported by the Consejo Nacional de Ciencia y Tecnología (CONACYT), scholarship number 201228.
The thesis was written in Texlipse under Windows 7 and Ubuntu, and typeset using the LaTeX 2ε document preparation system. All trademarks are the property of their respective owners.

Note

Unless otherwise stated, all fractional results have been rounded to the displayed number of decimal figures.

Contents

Abstract
1 Introduction
  1.1 Context
  1.2 Challenges in Open-ended Conversational Agents
  1.3 Contributions
  1.4 Publications
  1.5 Thesis Structure
2 Background and Related Work
  2.1 Conversational Agents
    2.1.1 The Intelligent Interactive Toy
      2.1.1.1 Architecture
      2.1.1.2 Input and Output Processing in the Toy
      2.1.1.3 Topic Management
    2.1.2 Dialogue Coherence in the Toy
  2.2 Knowledge Representation in Conversational Agents
    2.2.1 Knowledge Structures
      2.2.1.1 Ontologies
      2.2.1.2 Taxonomies
      2.2.1.3 Folksonomies
    2.2.2 Classification of Ontologies
    2.2.3 Construction of Ontologies
      2.2.3.1 Expert-constructed Ontologies
      2.2.3.2 Community-driven Ontologies
      2.2.3.3 Automatic Ontology Construction
  2.3 Text and Dialogue Coherence
    2.3.1 Text Coherence
    2.3.2 Analysis of Dialogue Coherence
  2.4 Semantic Relatedness
    2.4.1 Measuring Semantic Relatedness or Similarity
      2.4.1.1 Taxonomy-based Measures
      2.4.1.2 Folksonomy-based Measures
      2.4.1.3 Web-based Measures
    2.4.2 Datasets for Semantic Analysis
      2.4.2.1 Datasets for Analysing Semantic Similarity
      2.4.2.2 Datasets for Analysing Semantic Relatedness
  2.5 Influence of Domain on Semantic Analysis Tasks
    2.5.1 Domain in Semantic Analysis
    2.5.2 Domain Information for Word Sense Disambiguation
    2.5.3 Domain-specific Semantic Relatedness
  2.6 Summary
3 Semantic Relatedness and Coherence in Dialogue: A Pilot Experiment
  3.1 Motivation
  3.2 Conversational Fragments Used in this Chapter
  3.3 Hypothesis of the Experiment
    3.3.1 Interaction Sequence
    3.3.2 Nearest-Context Approach [Gandhe and Traum, 2007]
    3.3.3 Proposed Approach using Semantic Relatedness
  3.4 Comparing Approaches for Conversational Fragment Selection
    3.4.1 Experimental Setup
    3.4.2 Constructing Sample Conversations
    3.4.3 Results and Discussion
    3.4.4 Failure Analysis
  3.5 Summary
4 M-OntoBUILD: Constructing Domain-specific Ontologies
  4.1 Modular Ontologies
  4.2 The Architecture of Modular Ontologies
  4.3 Stages of M-OntoBUILD
    4.3.1 Stage 1. Definition of the Primary Domain Concept
    4.3.2 Stage 2. Automatic Extraction of Domain-related Concepts
    4.3.3 Stage 3. Hierarchy Construction
    4.3.4 Stage 4. Connecting Multiple M-Ontos
  4.4 Example Extracted
  4.5 M-OntoBUILD as a Java Tool
  4.6 Evaluating the M-OntoBUILD Process
    4.6.1 Design of the Domain Appropriateness experiment
    4.6.2 Data Collection
    4.6.3 Evaluation Metrics
    4.6.4 Participants' Inter-agreement
    4.6.5 Results
    4.6.6 Error Analysis
  4.7 Summary
5 A Framework for Evaluating Domain-based Semantic Relatedness
  5.1 An Overview of Semantic Relatedness
  5.2 Assessing Term Relatedness
  5.3 Constructing a Domain-aware Dataset for Evaluating Semantic Relatedness
    5.3.1 Steps in the Construction of the Dataset
    5.3.2 Contrasting Properties of the Dataset with Other Testbeds
  5.4 Behaviour of Semantic Relatedness in Concepts from the Same Domain
    5.4.1 Design of the Same-Domain Exploration
    5.4.2 Relatedness Measures compared in the Exploration
  5.5 Results
    5.5.1 Statistical Tests
    5.5.2 Agreement between Assessors
    5.5.3 Distributions of Pairs by Type
