<<

IntelliBot: A Domain-specific Chatbot for the Insurance Industry

MOHAMMAD NURUZZAMAN

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

UNSW Canberra at the Australian Defence Force Academy (ADFA), School of Business

20 October 2020

ORIGINALITY STATEMENT

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institute, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.’

Signed

Date

To my beloved parents

Acknowledgement

Writing a thesis is a great process through which to review not only my academic work but also the journey I took as a PhD student. I have spent four lovely years at UNSW Canberra in the Australian Defence Force Academy (ADFA). Throughout my journey in graduate school, I have been fortunate to come across so many brilliant researchers and genuine friends. It is the people I met who shaped who I am today. This thesis would not have been possible without them. My gratitude goes out to all of them.

Above all, I would like to express my deepest gratitude to my PhD supervisor, Assoc. Prof. Omar Khadeer Hussain, who has been a fantastic advisor throughout this journey. He encouraged me to take on the challenging field of natural language processing and deep neural networks. He has been the most supportive of my work, providing me with excellent guidance and support in both academic and professional matters, and encouragement in times of need. In addition to this, he has also been very patient and understanding, more than anyone I have known. Prof Omar has made my journey an enjoyable one, provided lots of useful feedback and corrected my work extremely promptly, for which I have the utmost gratitude. He has also provided me with great opportunities to participate in a lot of interesting research projects. This thesis would not exist without his constant support.

Importantly, I would like to thank my parents Al-Hajj Sultan Ahammad and Sahana Sultana for their unconditional love throughout my life and for providing me with the soil in which to grow. Without their support I may not have found myself at UNSW, nor had the courage to engage in this journey and see it through. My academic career owes a great deal to my parents, who supported my pursuit of my dreams wholeheartedly, even when that meant a distance of thousands of miles for many years. Furthermore, I thank my younger brother Kamruzzaman Poran, two sisters Aspiya Sultana Shekha and Sultana Momotaz Sima, and two brothers-in-law Md Mahmud Reza and Salam Prodhan. I would not be at the stage I am today without their unflagging support and the great sacrifices they made for me along the way. Words are not enough to express their encouragement and love in my life. Thank you very much for always being there for me. I am very grateful to have such a supportive family.

I am also grateful to Assoc Prof Farookh Hussain and Dr Morteza Saberi, who have always had confidence in me and encouraged me to pursue a higher standard, and for their generous help in addressing research issues and revising papers. I thank Gianin Zogg for giving me valuable career advice and inspiration. To Samuel Sun and Ramesh Thiagalingam, who helped me solve issues in the project and with industry collaboration; their insightful guidance and sense of responsibility motivated me to grow as a professional. To my colleagues Peter Scott and Greg Creighton for their support with hypothesis explanation, insightful technical conversations and participation in fruitful discussions. They have continuously demonstrated how to discover interesting problems, how to develop ideas and how to ruthlessly question a piece of work, especially one's own. To my friends and teammates Gautham Ravi and Yifan Zhao, from whom I learned a lot while working with them. Thank you, buddies, for your hard work and for making our time worthwhile.

And my deepest thanks to Saleh Ibne Rosul, and to Ashraf Chowdhury, his beloved wife Mahzabin Akhter and their adorable little daughter Ophelia Chowdhury; he has made it feel like a big family to me and cared for me more than for himself. Special appreciation goes to Farida Yesmin Shetu, with whom I always share my good news and frustrations. I could not have completed this journey or achieved any accomplishment without their unconditional love.

I would like to acknowledge the financial support from UNSW, which provided an international postgraduate award, tuition fees, a research stipend and student health coverage. Life at UNSW Canberra at ADFA would have been much more difficult without the members of the administrative and technical staff. I would like to thank all the staff in the School of Business and the School of Engineering and Information Technology at UNSW. I would also like to take this opportunity to thank my committee members Prof Michael O'Donnell, Prof Satish Chand, Dr Fiona Buick, Prof Elizabeth Chang, Jessica Campbell and Elvira Berra for their tremendous support and insightful comments, which made this thesis deeper and more coherent.

Special thanks to my UNSW friends Sukanto Kumer Shill, Abdul Khaleque, Hang Thanh Bui, Ahmad Jorban Al-mahasneh, Tasneem Rahman, Ahasanul Haque, Xiao Zhang, Md Alamgir Hossain, Tanmoy Das Gupta, Mohiuddin Khan, Sohel Ahmed, Anwar Us Saadat, Forhad Zaman, Mousa Hadipour, Wenxin Chen and Jo Ji.

Also, I would like to thank all of my school friends Salahuddin Ahmed, Istiaque Ahmed, Saiful Islam, Mydul Hossain Khan, Kaniz Fatema, Mahbub Alam, Alauddin, Mohammad Razzakul Haider, Bashir Ahmed, Noor Mohammad, Mohammad Shazzad Hossain Bhuiyan, Isa Ahmmed Saleh, Shahin Ahmed, Golam Mostofa, Monower Hossen, Tanvir Hasan, Hedayet Ullah, Nazmul Islam, Shamim Hossain, Ruma Pervin, Zahid Hasan, Tahamina Alich, Shahana Akter Shelly, Ayesha Siddiqua, Mahmud Hasan, Nazmul Haque, Homayun Khan, Mohasin Ali, Mohammad Al Amin, Husne Ara, Asmaul Husna, Shamima Akter, Jesmine Akter, Rustom Ali, Shohidul Shazib, Nilima Ibrahim, Abu Ohab, Abbas Uddin, Rokonuzzaman, Azman Khan, Nasreen, Mahamuda, Bappi, Belal Hossain, Abu Bakar Siddique, Arifur Rahman and Sir Afzal Hossain, who left precious memories for me. I thank you for steering me to a better self.

Last but not least, I thank all my friends and family members, especially SalBadiul Alam, Kamal, Abul Khair Mojumdar, Jahangir Alam, Mohammed Reaz, Al-Mamun, Alamgir Alam, Ahmad Abdul Majid, Walid Abouaghreb, Rajib Hasan, Aziz, Anawarul, Mahadi Miraz, Shamimul Azim, Taslima Akter Tasu, Faria Zaman, Nurjahan Akter Shanta, Michelle Williams, Nafis Iqbal, Zakir Hossain, Vinay Kumar Adepu, Fatin Nurul Ghazali, Sharmin Afroz, Tania Habib, Prof Hatim Mohammad Tahir, Prof Azham Hussain, Zhamri Che Ani, Mohammad Amir, Dr. Husbullah Omar, Prof Wan Rozaini Sheik Osman, Zaini Mostafa, Muhammad Aiman Mazlan, Dr Shifa Mahmod, Dr Azman Yasin, Peter Chong, Chang Fu Tong, Sing Choong Lau, Lee Teik Hui, Kevin Yap, Imam Hossain, Freed Jawad, Aumio Chowdhury, Akhlaqur Rahman, Ishrar Tabenda Hasan, Saeed Nezamy, Yogi Babria, Bahar Torkaman, Arif Yusof, Dina Rayhan, Ritesh Singh, Ajita Shah, Santosh Mainali, Sheikh Salam, Mesbahur Rahman Topu, Surya Maharjan, Ram Sharma, Reejo Augusti, Samir Khan, Tenzin, Baysaa Baatar, Farid Ahmed, Ang Ling Weay and Zannatul Mawa Dihan. I thank you for steering me to a better self.

Abstract

Communication is an indispensable aspect of the success of any business. Due to the increase in digital innovation, Internet-based services such as chatbots now play a vital role in maintaining communication between businesses and their users. However, a traditional chatbot's dialogue capability is quite inflexible, as it can answer the user only if there is a pattern match between its predefined set of question-answer pairs and the user's query. This may leave customers unhappy, and research has shown that 91% of unhappy customers tend not to engage with the business again. To address this, chatbots need to have meaningful dialogue abilities rather than merely providing a yes, a no or a short response. The major challenge in building a better AI model is ensuring it has a domain-specific conversational capability to engage with the user while presenting meaningful responses and semantically correct information.

The existing literature has explored the capabilities of advanced techniques such as deep bidirectional recurrent neural networks (DBRNN) to enable chatbots to engage in human-like conversation and generate responses. However, while an enormous amount of research has been done to bring this idea to realization, no significant outcome in the area of engaging with users while generating a response has been achieved to date. To address this problem, in this thesis, an innovative framework architecture, named IntelliBot, is designed and developed. IntelliBot is a chatbot designed to facilitate a high degree of engagement with the user, using a seq2seq model when generating its responses. Additionally, it has the ability to answer not only simple questions but also complex user queries with semantically correct, meaningful responses, and it solves user queries specifically in the insurance domain. To meet these challenges, IntelliBot generates a response in four distinct ways, namely through a template-based strategy, a knowledge-based strategy, an Internet retrieval strategy and a generative-based strategy. An AI selection process is adopted which sequentially determines which strategy fits best according to the specifics of the user's question.

To demonstrate the effectiveness of IntelliBot in generating a superior response, its outputs are evaluated against those of three publicly available chatbots. These results were then evaluated by experts to determine their accuracy with respect to the questions asked. Metrics such as Cohen's kappa and the F1 score were computed to benchmark the results of each chatbot. These scores demonstrated that IntelliBot outperformed the existing chatbots, overcame their shortcomings and had better conversational capabilities, which led it to give the highest number of complete, meaningful and semantically correct answers to the questions asked in a service industry setting.

List of Publications arising from this thesis

Refereed Journal Articles:
1. Nuruzzaman, M. and Hussain, OK. (2020). "IntelliBot: A Dialogue-based chatbot for the insurance industry," Knowledge-Based Systems, vol. 196, p. 105810, 21/05/2020. doi: https://doi.org/10.1016/j.knosys.2020.105810
2. Nuruzzaman, M., Hussain, OK., and Hussain, FK. (2020). "Design and Training of Response Generation Strategies for Domain-oriented Chatbot with Grammar Error Check", ID: TEIS-2020-0204, submitted to Enterprise Information Systems Journal.

Refereed Conference Articles:
1. Nuruzzaman, M. and Hussain, OK. (2019). "Identifying facts for chatbots via sequence labelling using Recurrent Neural Networks". Proceedings of the ACM Turing Celebration Conference, May 17–19, 2019. ACM, Article No. 93. doi: https://dl.acm.org/doi/10.1145/3321408.3322626
2. Nuruzzaman, M. and Hussain, OK. (2018). "A Survey on Chatbot Implementation in Customer Service Industry through Deep Neural Networks". 2018 IEEE 15th International Conference on e-Business Engineering (ICEBE), 12-14 Oct 2018. IEEE Computer Society. doi: https://ieeexplore.ieee.org/document/8592630

News Mentions:
1. Anonymous, "Engineering - Knowledge Engineering; Findings on Knowledge Engineering Detailed by Investigators at UNSW", NewsRx, United States, Atlanta, p. 143, 25th May 2020, retrieved on 15/06/2020, available at: https://login.wwwproxy1.library.unsw.edu.au/login?qurl=https%3A%2F%2Fsearch.proquest.com%2Fdocview%2F2406951496%3Faccountid%3D12763

TABLE OF CONTENTS

Abstract
List of Publications arising from this thesis
TABLE OF CONTENTS
List of Tables
List of Figures
List of Abbreviations
CHAPTER 1
1.1 Chatbots and Their Evolution to Answer a Customer's Queries
1.2 Shortcomings of Existing Chatbots to Answer Customer Questions in a Service Industry
1.2.1 The motivation of choosing the insurance industry as the area of application
1.3 Objectives of the Thesis
1.4 Research Questions to Achieve the Research Objectives
1.5 Contributions of the Thesis
1.6 Significance of the Thesis
1.7 Scope of the Thesis
1.8 Structure of the Thesis
CHAPTER 2
2.1 Overview
2.2 Taxonomy of Chatbots
2.2.1 Goal-based chatbot
2.2.2 Knowledge-based chatbot
2.2.3 Service-based chatbot
2.2.4 Response generated-based chatbot
2.3 Analysis of Models Used to Generate a Response that Mimics a Human Brain
2.3.1 Template-based Model
2.3.2 Retrieval-based Model
2.3.3 Search Engine Model
2.3.4 Generative Model
2.4 Workings of the Existing Chatbots in the Literature
2.4.1 Elizabot
2.4.2 Alicebot
2.4.3 Elizabeth bot
2.4.4 Mitsuku
2.4.5 Cleverbot
2.4.6 Chatfuel
2.4.7 ChatScript
2.4.8 IBM
2.4.9 LUIS
2.4.10 Google
2.4.11 Amazon Lex
2.5 Techniques Used in Existing Dialogue-based Chatbots to Build and Generate a Response
2.5.1 Rule-based approach
2.5.2 TF-IDF approach
2.5.3 End-to-End approach
2.5.4 RNN approach with seq2seq mechanism
2.5.5 RNN approach with memory network
2.6 Critical Evaluation of the Literature
2.7 Conclusion
CHAPTER 3
3.1 Introduction
3.2 Key Terms

3.3 Existing Gaps in Domain-oriented Dialogue-based Chatbots Which Aim to Engage with Customers in the Service Industry
3.3.1 Drawback 1: Use of templates to map questions and answers to respond to user questions
3.3.2 Drawback 2: Inability to respond to a user's complex queries
3.3.3 Drawback 3: Deciding which strategy to select according to the question asked to generate a meaningful and domain-specific response
3.3.4 Drawback 4: Unable to engage users in a meaningful conversation
3.3.5 Drawback 5: Unable to identify errors in user questions
3.3.6 Drawback 6: Unable to learn continuously from a user-bot conversation
3.4 Research Problem Addressed in this Thesis
3.5 Adopted Research Methodology to Solve the Thesis Problem
3.5.1 Theoretical study
3.5.2 Addressing the problem
3.5.3 Solution design
3.5.4 Experiment
3.6 Conclusion
CHAPTER 4
4.1 Introduction
4.2 Key Terms
4.3 Requirements of a Domain-specific Chatbot
4.4 Methodological Approach for Designing and Building a Domain-specific Chatbot
4.4.1 Identify components
4.4.2 Design conceptual framework
4.4.3 Develop and train AI model
4.4.4 Experiment and validation
4.5 Proposed Conceptual Model of IntelliBot's Response Generation Component
4.5.1 User emulator
4.5.2 Input Processing Unit (IPU)
4.5.3 Neural Dialogue Manager (NDM)
4.5.3.1 Language Understanding Unit (LUU)
4.5.3.2 Strategy Selection Unit (SSU)
4.5.3.3 Context Tracking Unit (CTU)
4.5.3.4 Response Generator Unit (RGU)
4.5.3.5 Response Analyser Unit (RAU)
4.6 Conclusion
CHAPTER 5
5.1 Introduction
5.2 Key Terminology
5.3 Strategy Selection Unit's Workflow to Generate a Response to the User's Query
5.4 Design and Working of the Template-based Strategy
5.4.1 Objective
5.4.2 Summary of the working of the template-based strategy
5.4.3 Detailed process of generating a response
5.4.3.1 User question resulting in a direct match with the defined templates
5.4.3.2 User question resulting in an induced match with the defined templates
5.4.4 Limitation of template-based strategy
5.5 Design and Working of the Knowledge-based Strategy
5.5.1 Objective
5.5.2 Summary of the working of the knowledge-based strategy
5.5.3 Detailed Process of generating a response
5.5.4 Limitation of knowledge-based strategy
5.6 Design of the Internet Retrieval (IR) Strategy

5.6.1 Objective
5.6.2 Summary of working of Internet retrieval strategy
5.6.3 Detailed process of generating a response
5.6.3.1 Question analyser
5.6.3.2 Answer analyser
5.6.4 Limitation of Internet retrieval strategy
5.7 Design of Generative-based Strategy
5.7.1 Objective
5.7.2 Summary of the working of the generative-based strategy
5.7.3 Detailed process of generating a response
5.7.4 Limitation of generative-based strategy
5.8 Conclusion
CHAPTER 6
6.1 Introduction
6.2 Key Terms
6.3 Natural Language Processing (NLP) Tasks Performed in the LUU of IntelliBot
6.3.1 Lowercase conversion
6.3.2 Tokenization
6.3.3 Abbreviation determination
6.3.3.1 Abbreviation recognizer
6.3.3.2 Abbreviation extractor
6.3.3.3 Definition finder
6.3.3.4 Abbreviation matcher
6.3.4 POS tagging using HMM
6.3.5 Grammar check and correction
6.3.5.1 Classification of grammatical errors
6.3.5.2 Process in GEC to detect and correct errors
6.3.5.2.1 Text classification to detect errors
6.3.5.2.2 Text transformation to correct errors
6.3.6 Removing stopwords
6.3.7 Lemmatization
6.3.8 Entity extraction
6.3.9 Punctuation removal
6.4 Computing of a Possible Answer with the User Question
6.4.1 Detail of determining the semantic similarity at the word level
6.4.1.1 Identifying words and POS tagging
6.4.1.2 Find word sense disambiguation
6.4.1.3 Calculate the shortest path between two synsets
6.4.1.4 Hierarchical distribution of words
6.4.1.5 Measuring the similarity between the two vectors
6.4.2 Detail of semantic similarity at the sentence level
6.5 Process of sentence scoring at RAU
6.6 Conclusion
CHAPTER 7
7.1 Introduction
7.2 Key Terms
7.3 Process of collecting the data required for each response generation strategy
7.4 Process of training the generative-based strategy for response generation of IntelliBot
1) Prepare Data
2) Extract features
3) Design of neural networks for training
4) Setup training environment
5) Training the RNN to generate a response

7.5 Data preparation
7.5.1 Data cleansing
7.5.2 Removal of duplicate data
7.6 Feature Engineering to Extract Features and Use them for Training at DBRNN
7.6.1 Extracting features required to train IntelliBot
7.6.1.1 Character-level layer
7.6.1.2 Highway layer
7.6.1.3 Word-level layer
7.6.1.4 CRF layer
7.7 Design Neural Networks
7.7.1 Input standardization
7.7.2 Determine neuron and neural network layers
7.7.3 Determine the activation function for each layer
7.7.4 Identify values of weights initialization
7.7.5 Adding bias
7.7.6 Word embeddings
7.7.7 Batch normalization
7.8 Training environment
7.8.1 Phase 1: Training on the Cornell dialogue dataset
7.8.2 Phase 2: Training on insurance domain dataset
7.8.3 Phase 3: Training on particular words
7.9 Training of IntelliBot using DBRNN
7.9.1 Forward propagation
7.9.1.1 Input layer
7.9.1.2 Hidden layer
7.9.1.3 Output layer
7.9.2 Backward propagation
7.9.2.1 Calculate the total error in the output layer
7.9.2.2 Check whether error is minimized (Iterate until converged)
7.9.2.3 Update parameters
7.9.3 Stochastic gradient descent (SGD)
7.9.4 Attention mechanism in training
7.9.4.1 Global attention model
7.9.4.2 Local attention model
7.10 Conclusion
CHAPTER 8
8.1 Overview
8.2 Process of Evaluating IntelliBot's Output Against the Requirements and the Outputs of the Other Chatbots
8.3 Tools and Techniques Used to Develop the IntelliBot Prototype
8.4 Different Categories of Questions for Chatbot Evaluation
8.5 High-level Overview of the Three Existing Chatbots Used in the Experiment for Comparison with IntelliBot
8.6 Output of RootyAI, ChatterBot, DeepQA and IntelliBot on the Considered Questions
8.7 Evaluate Engagement with the User in Relation to the Responses Generated by the Chatbots
8.7.1 Expert judgment
8.7.2 Measuring cohen's kappa co-efficient to ensure agreement between the experts
8.8 Demonstrating IntelliBot's Ability to Correct Grammatical Errors in the Questions before Generating a Meaningful Response
8.9 Exploratory Test (ET)
8.9.1 Strategy of conducting an exploratory test
8.10 Conclusion
CHAPTER 9

9.1 Recapitulation of the Thesis
9.2 Contributions of the Thesis
9.2.1 Contribution 1: Develops a modular-based framework for generating appropriate responses to user queries
9.2.2 Contribution 2: Develops different response generation strategies that can answer a user's question according to its complexity
9.2.3 Contribution 3: Develops the detailed working of the different sub-components of IntelliBot that assist it to process and understand the user's input
9.2.4 Contribution 4: Develops an approach to collect insurance domain-specific data required to train IntelliBot
9.2.5 Contribution 5: Compares and validates the outputs of IntelliBot with three existing chatbots to demonstrate IntelliBot's accuracy and superiority in engaging with the users while answering their questions
9.3 Future Work Arising from this Thesis
9.3.1 Future improvement for the chatbot to be domain independent
9.3.2 Future improvement in response generation techniques
9.3.3 Future improvement in domain-oriented dataset
9.3.4 Future improvement in the neural network model
9.3.5 Future improvement in evaluation approach
9.3.6 Future improvement in correcting grammatical errors and identifying abbreviations
9.3.7 Future improvement in unsupervised and self-learning capability
9.3.8 Future improvement in speech chatbots
REFERENCES
APPENDIX A
APPENDIX B
APPENDIX C

List of Tables

Table 2.1 Template-based models – description with issues and impacts
Table 2.2 Retrieval-based models – description with issues and impacts
Table 2.3 Search engine models – description with issues and impacts
Table 2.4 Generative-based models – description with issues and impacts
Table 2.5 Features and drawbacks of existing chatbots
Table 2.6 Summary of the TF-IDF approaches to generate a response with their description and issues
Table 2.7 Summary of the end-to-end approaches to generate a response with their description and issues
Table 2.8 Summary of the RNN approaches with seq2seq to generate a response with their description and issues
Table 2.9 Summary of RNN approaches with memory networks to generate a response with their description and issues
Table 5.1 Template-based pattern matching
Table 5.2 Corresponding question types and event elements
Table 5.3 POS and entity dependency relationship of the user question
Table 5.4 Example of similar meaning (senses) of a word
Table 6.1 Example of lowercase conversion
Table 6.2 List of abbreviations in full form
Table 6.3 List of abbreviations
Table 6.4 Abbreviation categorization
Table 6.5 Confidence score of responses
Table 6.6 Synsets of words
Table 6.7 Similarity between answer relevant to the question
Table 7.1 Statistics for the insurance domain-specific QA dataset
Table 7.2 Statistics for the Cornell movie corpus
Table 7.3 Sample raw data from Cornell movie corpus
Table 7.4 Sample raw data from Cornell movie corpus
Table 7.5 Statistics for the vocabulary dataset used to build IntelliBot
Table 7.6 List of features used in experiments
Table 7.7 Example of lexical feature extraction
Table 7.8 List of tokens to fill the input sequence
Table 7.9 Filling the input sequence in bucket size of (5,10)
Table 7.10 List of activation functions of neural networks
Table 7.11 Importance of appropriate weight initialization
Table 7.12 Training system's specification
Table 7.13 Summary of training parameters' specification
Table 7.14 Vector representation of x
Table 8.1 List of hardware used in developing IntelliBot
Table 8.2 List of software used in developing IntelliBot
Table 8.3 List of library packages installed in the Python environment
Table 8.4 Questions in the greetings category
Table 8.5 Questions in the asking for assistance category
Table 8.6 Questions in the asking for time & date category

Table 8.7 Questions in the general category
Table 8.8 Questions in the arithmetic problem-solving category
Table 8.9 Questions in the domain-specific category
Table 8.10 Questions in ending the chat session category
Table 8.11 User questions and the response received from each chatbot
Table 8.12 Confusion matrix from the results of each chatbot
Table 8.13 Precision, Recall, and F1 Score
Table 8.14 Example of rating used by an expert to score the answer of each chatbot
Table 8.15 Statistics of the general conversation rating
Table 8.16 Statistics of the domain-specific conversation rating
Table 8.17 Expert's agreement
Table 8.18 Cohen kappa co-efficient value for each chatbot
Table 8.19 Confusion matrix from the results of each chatbot
Table 8.20 Precision, Recall, and F1 Score
Table 8.21 Error responses from the three existing chatbots and IntelliBot
Table 8.22 Generating meaningful responses and engaging the user in conversation
Table 8.23 Detecting and correcting grammatical errors based on user confirmation
Table 8.24 Multiple strategy selection for generating a response
Table 8.25 Validate the IntelliBot prototype

List of Figures

Fig. 1.1 Adoption of chatbots in different industries
Fig. 2.1 Taxonomy of chatbot classification according to the requirements
Fig. 2.2 Classification of response generated-based models
Fig. 3.1 Research methodology adopted in this thesis to solve the research problem
Fig. 4.1 Methodological approach
Fig. 4.2 Components required for building a chatbot application
Fig. 4.3 Components required in a response-generating chatbot application
Fig. 4.4 Conceptual framework of IntelliBot
Fig. 4.5 Mobile and web Interface of IntelliBot
Fig. 4.6 Neural Dialogue Manager (NDM) of IntelliBot
Fig. 4.7 Selection policy of AI conversational strategies
Fig. 4.8 High-level workflow of template-based strategy
Fig. 4.9 High-level workflow of knowledge-based strategy
Fig. 4.10 High-level workflow of Internet retrieval strategy
Fig. 4.11 High-level workflow of generative-based strategy
Fig. 5.1 Conversational strategy selection in SSU
Fig. 5.2 Design of the template-based strategy
Fig. 5.3 Basic building block of AIML code snippet
Fig. 5.4 Recursion of AIML code snippet
Fig. 5.5 Memorizing previous conversation of AIML code snippet
Fig. 5.6 Design of the knowledge-based strategy
Fig. 5.7 Semantic graph and entity dependency of user question
Fig. 5.8 Code snapshot of KB query formation
Fig. 5.9 Code snapshot of percentage of matching words
Fig. 5.10 Design of Internet retrieval strategy
Fig. 5.11 Semantic graph and entity dependency of the question
Fig. 5.12 Code snippet of web crawling
Fig. 5.13 process from the web using a web crawler
Fig. 5.14 Traverse child node to obtain expected question and answer
Fig. 5.15 HTML code snapshot
Fig. 5.16 Design of the generative-based strategy
Fig. 5.17 Architecture of the DBRNN seq2seq model
Fig. 5.18 Visual representation of input to output
Fig. 6.1 NLP tasks performed in the LUU of IntelliBot
Fig. 6.2 Code snippet of lowercase conversion
Fig. 6.3 Code snippet of tokenization
Fig. 6.4 Workflow of abbreviation recognition and extraction
Fig. 6.5 Part-of-speech tagging into a sentence
Fig. 6.6 Classification of grammatical errors
Fig. 6.7 Process of grammar checking
Fig. 6.8 Working of the text classification & error detection phase
Fig. 6.9 Working of the text classification & error correction phase
Fig. 6.10 Code snippet of stopwords
Fig. 6.11 Code snippet of lemmatization

Fig. 6.12 Process of entity extraction
Fig. 6.13 POS tagging for both sentences
Fig. 6.14 Entity recognition for both sentences
Fig. 6.15 Coreference resolution
Fig. 6.16 POS tagging with entity dependency relationship
Fig. 6.17 Named entity recognition
Fig. 6.18 Code snippet of removing punctuation
Fig. 6.19 Various senses of a word
Fig. 6.20 Semantic similarity determined at the sentence and word levels in the four response generation strategies
Fig. 6.21 Semantic similarity at the word level
Fig. 6.22 Hierarchical structure graph (subset of wordNet)
Fig. 6.23 Hierarchical distribution of words
Fig. 6.24 Semantic similarity at the sentence level
Fig. 6.25 Two sets with Jaccard similarity 7/13
Fig. 7.1 Data collection procedures for the four strategies
Fig. 7.2 Process of training generative-based strategy (RNN) for response generation
Fig. 7.3 Data cleansing process flow for the Cornell movie dialogue corpus
Fig. 7.4 Code snippet of data cleansing and saved data
Fig. 7.5 Cornell dialogue dataset
Fig. 7.6 Histogram distribution of the Cornell dataset
Fig. 7.7 Exploratory data analysis of the Cornell movie dialogue dataset
Fig. 7.8 Designing neural networks of IntelliBot
Fig. 7.9 Single neuron connection
Fig. 7.10 Architecture of neural networks
Fig. 7.11 Activation function in a neuron
Fig. 7.12 Code snippet of appropriate weight initialization
Fig. 7.13 Parameter initialization with appropriate values
Fig. 7.14 Representation of bias in the layer
Fig. 7.15 Effect of bias neuron
Fig. 7.16 Effect of bias neuron
Fig. 7.17 Vector representation (on left) and cosine distances of university (on right)
Fig. 7.18 Window and process for computing P(w_{t+j} | c_t)
Fig. 7.19 Window and process for computing P(w_{t+j} | c_t)
Fig. 7.20 Code Snippet of CBoW model
Fig. 7.21 Training phases of IntelliBot
Fig. 7.22 Training Process of IntelliBot
Fig. 7.23 Process of forward propagation
Fig. 7.24 Hidden vector for the word "how"
Fig. 7.25 Hidden vector for the word "are"
Fig. 7.26 Hidden vector for the word "you"
Fig. 7.27 Final output
Fig. 7.28 Final output from forward propagation
Fig. 7.29 Example of the wrong prediction produced by RNN
Fig. 7.30 Visualization of the effect of the loss function
Fig. 7.31 Gradient flow
Fig. 7.32 Process of backward propagation

Fig. 8.1 Steps in chatbot evaluation
Fig. 8.2 The working of IntelliBot on a desktop application
Fig. 8.3 The working of IntelliBot on a mobile device
Fig. 8.4 Strategy selection ratio used by IntelliBot to give an answer to the user's questions
Fig. 8.5 F1 Scores of the four chatbots in all question categories
Fig. 8.6 Scores of the four chatbots categorised according to domain-specific and conversational questions
Fig. 8.7 Chatbot evaluation steps
Fig. 8.8 F1 Scores of the four chatbots when there is an error in the question
Fig. 8.9 GUIs showing how IntelliBot corrects errors in questions before generating a meaningful response
Fig. 8.10 Steps of exploratory test

List of Abbreviations

AHRE Attentive Hierarchical Recurrent Encoder
AI Artificial Intelligence
AIML Artificial Intelligence Markup Language
ALICE Artificial Linguistic Internet Computer Entity
ANN Artificial Neural Networks
ASR Automatic Speech Recognition
ASCII American Standard Code for Information Interchange
AMEX American Express
AP Average Perceptron
AWS Amazon Web Services
BLEU Bilingual Evaluation Understudy
BoW Bag-of-Words
BRNN Bidirectional Recurrent Neural Networks
CNN Convolutional Neural Networks
COVID-19 Coronavirus Disease 2019
CPU Central Processing Unit
CRF Conditional Random Field
CRM Customer Relationship Management
CSR Customer Service Representative
CTU Context Tracking Unit
CBoW Continuous Bag-of-Words
CUDA Compute Unified Device Architecture
DL Deep Learning
DNN Deep Neural Networks
DBRNN Deep Bidirectional Recurrent Neural Networks
DOM Document Object Model
DST Dialogue State Tracker
ESIM Enhanced Sequential Inference Model
FAQ Frequently Asked Question
EOS End of Sentence
ET Exploratory Test
GEC Grammar Error Checker
GPU Graphical Processing Unit
GRU Question Generator Unit
GUI Graphical User Interface
HMM Hidden Markov Model
HTTP HyperText Transfer Protocol
HTML HyperText Markup Language
IDF Inverse Document Frequency
IR Internet Retrieval

IBM International Business Machines
IDE Integrated Development Environment
IPU Input Processing Unit
KB Knowledge-based
KBDB Knowledge-based
LSTM Long Short-Term Memory
LUU Language Understanding Unit
LUIS Language Understanding Intelligent Service
ML Machine Learning
MLE Maximum Likelihood Estimation
MEMM Maximum Entropy Markov Model
MIT Massachusetts Institute of Technology
MSE Mean Squared Error
NDM Neural Dialogue Manager
NER Named Entity Recognition
NLP Natural Language Processing
NLU Natural Language Understanding
NLTK Natural Language Toolkit
NN Neural Networks
NMT Neural Machine Translation
OOV Out-of-Vocabulary
PCFG Probabilistic Context-free Grammar
PDS Product Disclosure Statement
POS Part-of-speech
QA Question Answer
RNN Recurrent Neural Networks
RGC Response Generation Component
RGU Response Generation Unit
RAU Response Analyser Unit
SGD Stochastic Gradient Descent
SMF Sequential Matching Framework
SMT Statistical Machine Translation
SSU Strategy Selection Unit
SP Shortest Path
SVM Support Vector Machine
TF Term Frequency
TF-IDF Term Frequency and Inverse Document Frequency
TTS Text to Speech
UIMA Unstructured Information Management Architecture
UNSW University of New South Wales
WSD Word Sense Disambiguation
XML Extensible Markup Language

CHAPTER 1

“Everything we hear is an opinion, not a fact. Everything we see is a perspective, not the truth.” — Marcus Aurelius

INTRODUCTION

Nearly 75% of customers have experienced poor customer service [1-3]. The generation of meaningful, long and informative responses is a challenging task.

1 Parts of this chapter have been published in [8] and [20].

1.1 Chatbots and Their Evolution to Answer a Customer's Queries

For a customer-focused service industry organisation, being connected with its customers and answering their queries is an essential factor for success. Due to the rise of digital innovation, Internet-based communication services now play a vital role in how an organization maintains communication with its users. The importance of and need for this medium have been proven during the current unprecedented times of the novel coronavirus (COVID-19), where social distancing is a mandatory requirement. In such times, customer-focused organizations from the retail, business and education industries need to come up with innovative measures that enable them to remain in contact with their customers while adhering to the new requirements of social distancing. Researchers in AI have been developing one such measure to achieve this, namely the chatbot.

A chatbot is conversational software that is designed to emulate the communication capabilities of a human being and to interact automatically with a user. It represents a new, modern form of customer assistance powered by artificial intelligence via a chat interface. Chatbots are based on AI techniques that understand natural language, identify meaning and emotion, and are designed to give meaningful responses. To businesses, they provide an improved way of connecting with customers and increasing customer satisfaction. To customers, they provide a better and more convenient way of having their questions answered without waiting on the phone or sending emails. Chatbots can reduce the number of customer calls, the average handling time and the cost of customer care. Turing first conceptualised chatbots in the 1950s [1] by asking "Can machines think?". Since then, the combined fields of Natural Language Processing (NLP) and Machine Learning (ML) have been used to develop and realise chatbots. In 1966, Weizenbaum [2] developed the first chatbot, named "ELIZA", which was able to identify the keywords of a given input sentence and pattern-match them against a set of predefined rules to generate responses. Since then, significant progress in the development of intelligent chatbots has been made. Hence, as shown in Figure 1.1, it is not surprising to see the widespread adoption of chatbots in many different areas of business in which humans communicate to obtain answers to their queries.
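As a rough illustration of this keyword-and-pattern-matching style of response generation, a minimal ELIZA-like sketch in Python might look like the following. The rules and wording are invented for illustration only and are not ELIZA's actual script.

import re

# A few illustrative rules in the spirit of ELIZA: a keyword/pattern mapped to a canned reply.
RULES = [
    (re.compile(r"\bI am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bI need (.+)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bhello\b", re.I), "Hello. What is on your mind today?"),
]

def eliza_reply(user_input):
    # Return the reply of the first rule whose pattern matches the input.
    for pattern, template in RULES:
        match = pattern.search(user_input)
        if match:
            return template.format(*match.groups())
    return "Please tell me more."  # fallback when no rule matches

print(eliza_reply("I am worried about my claim"))
# -> How long have you been worried about my claim?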

Fig. 1.1 Adoption of chatbots in different industries (Past, Present and Future use)

The chatbots that have been applied in the different areas of business can be categorised according to their style of working [3] as follows:

• Question answering bots are knowledge-based chatbots that answer users' queries by analysing the underlying information collected from various sources like Wiki, DailyMail [4], Allen AI science and Quiz Bowl [5, 6]. Examples of areas in which such chatbots have been applied are the Wall Street Journal, CNN, and E-commerce.

• Task-oriented bots are goal-based chatbots that assist in achieving a certain task or attempt to solve a specific problem [7] such as a flight booking, hotel reservation etc. Examples of areas in which such chatbots have been applied are flight centres, hotel bookings and restaurant ordering and booking management.

• Social bots are chatbots that communicate with other users and make recommendations to them [8], for example, Microsoft Xiaoice [9], Replika [10], Google [11], [12], [13] and [14]. Even though social bots can communicate autonomously, they can only answer simple questions and have a low degree of engagement with the user. These chatbots are used to do very specific and basic tasks and have drawbacks such as being unable to answer complex queries and having a low degree of engagement with users.

• Service bots are chatbots which have been developed and are used by a business to answer users' queries with a specific goal or focus on the completion of certain tasks requested by their customers. Such chatbots are domain-specific and may use a combination of QA and task-oriented approaches to generate a response. A key requirement for service-based chatbots is that they should not only answer the user's question but also engage in a conversation with the user. So, such chatbots need to be designed not only with handcrafted rules that answer simple and predefined questions but also with the ability to answer complex user queries.

The focus of this thesis is on service-based chatbots which are used in businesses to answer user queries in an automated manner. However, the existing service-based chatbots have shortcomings which are discussed in the next section.

1.2 Shortcomings of Existing Chatbots to Answer Customer Questions in a Service Industry

Even though chatbots are able to communicate autonomously, they are only able to answer simple and predefined questions and have a low degree of engagement with the user. This was starting to create issues in the service industry, as customers not only wanted their questions answered, they also wanted to be engaged in a conversation in a similar way as when speaking to a customer service representative (CSR). According to a Drift report [15], 75% of customers experience problems with traditional online communication channels when dealing with a business. The report [15] also indicated that this leads to further flow-on effects, as 91% of unhappy customers will not engage with the business again [16]. This results in customer dissatisfaction, and a negative experience will be conveyed to other customers, thereby adversely impacting the business [17]. While chatbots exist in the service industry, their current drawback is that they do not engage with users and thus do not provide an experience similar to the one customers have when dealing with a CSR.

• This is emphasised in a report which states that emotionless chatbots are taking over the handling of customer queries in companies such as Pizza Express, Lufthansa and Uber, which is bad news from the perspective of customer service [18]. Chatbots are referred to as emotionless due to their inability to understand difficult user questions and their failure to detect user emotions and respond appropriately. Thus, domain-specific chatbots, as opposed to social bots, need to have the capability to capture particular characteristics from the users' questions before generating an appropriate response. While modern social bots such as Google's , , Alexa, Samsung's , , and Echo [19] utilize modern architectures, retrieval processes and advanced ML techniques, they do not perform well on domain-specific topics and hence cannot be applied to specific domains.

• Furthermore, the majority of existing chatbots only answer the user's query but do not 'engage' in a conversation with them while doing so [20]. To explain the difference, let us consider the question 'What day is today?'. Two possible answers are 'Tuesday' or 'Today is Tuesday, 31st December 2019. Your next appointment is in 13 minutes'. Both responses answer the question; however, the second one is more detailed and engages with the user more than the first one. Another way for chatbots to engage with users is to show empathy in their responses rather than merely using their conditional response library [20]. For example, in response to the user's query, 'I am not feeling well' or 'I am sad', a chatbot using its conditional response library would simply say 'How can I help you?' in response to both questions. But a human would reply 'How can I help you? Do you need medical help?' and 'I am sorry to hear that. Why are you sad?' respectively. The human response shows the presence of empathy and therefore relates more to the user. This is a feature which chatbots should be able to replicate in their responses. The literature finds that customer support chatbots should not respond in a way that is too serious and transactional as this will not inspire continued use [20]. So, a service-based chatbot needs to keep customers engaged and have dialogue abilities rather than merely providing a yes, a no or a short response (a toy sketch contrasting these two styles of response follows this list).
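To make the contrast concrete, the following toy sketch compares a bare conditional-response lookup with a library enriched with empathy. The keyword rules and the wording of the replies are illustrative assumptions only and are not IntelliBot's actual response library.

# Hypothetical keyword-triggered response libraries (illustrative only).
CANNED = {
    "sad": "How can I help you?",
    "not feeling well": "How can I help you?",
}
EMPATHETIC = {
    "sad": "I am sorry to hear that. Why are you sad?",
    "not feeling well": "How can I help you? Do you need medical help?",
}

def respond(user_input, library):
    # Return the first reply whose keyword appears in the (lower-cased) input.
    text = user_input.lower()
    for keyword, reply in library.items():
        if keyword in text:
            return reply
    return "Could you tell me a bit more?"  # fallback when nothing matches

print(respond("I am sad", CANNED))       # -> How can I help you?
print(respond("I am sad", EMPATHETIC))   # -> I am sorry to hear that. Why are you sad?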

For domain-specific chatbots in the service industry, having such an ability to engage with users in domain-specific terminology is a key requirement for answering user queries effectively. To achieve this, domain-specific chatbots need conversational capabilities to engage with the user and the ability to understand users' questions thoroughly before providing a semantically correct, meaningful response [21]. The objective of this thesis is to address these drawbacks in service-based chatbots by designing and implementing an AI chatbot application for the insurance industry. The motivation for choosing the insurance industry sector as the area of application is explained in the next sub-section.

1.2.1 The motivation of choosing the insurance industry as the area of application

Customer satisfaction with a company's services is often seen as the key to success and long-term competitiveness for a company. Insurance products such as credit card insurance are attracting a lot of attention at present, as customers all over the world use them frequently. Credit card insurance is a competitive market, so from a card provider's perspective, a strong marketing strategy and the provision of the right customer support are vital [22]. A credit card's inclusions are confusing and complex, and in a world dominated by cashless payments, consumers are using credit cards at an ever-growing rate. Most credit cards offer their consumers some form of embedded complimentary insurance products. Consumers are often not aware of these products, and the type of language which is used to explain them makes it difficult for consumers to understand the inclusions and benefits. For example, the majority of cards and accounts include complimentary travel insurance; however, customers are often not aware of the details regarding what the cover includes, whether the cover includes family or travelling companions, how the cover is activated and who to call when they need help or need to make a claim. In addition, insurance personnel require reference materials, policies and procedures to answer such questions. It is challenging for customers to obtain the information they need as they have to sift through large documents to find the answer. As a result, the best way to get help quickly is to talk to technical support or sales support teams, even for answers to FAQs or basic "how-to" questions. This overloads call centres, resulting in long wait times, as it takes a long time to process a single request. As a result, the customer experience is poor and customers become dissatisfied, which reduces the throughput and business performance drastically. Research shows that approximately 75% of customers have experienced poor customer service [23-25].

Having chatbot functionality integrated into a technology platform that allows the entire credit card insurance ecosystem to be modelled with artificial intelligence (AI), so as to simulate scenarios of different economic, market and individual conditions, is thus needed. Hence, there is an ever-increasing demand for improved AI capabilities so chatbots can interact with customers in relation to advice on benefits, insurance coverage and claims processes. Another advantage of chatbots is that they remove human factors and provide a 24-hour service. This enables the customer to obtain advice on the most appropriate course of action and receive information on the benefits embedded in a credit card, the level of coverage and the insurance claims process at any time, without needing the involvement of a CSR or waiting in a queue. This will allow customers to learn about credit card insurance coverage, and they will have peace of mind, knowing that they have independent experts looking after them. Furthermore, the card provider's revenue will increase, costs will be saved and customer satisfaction will improve.

1.3 Objectives of the Thesis

The objective of the thesis is to develop a domain-specific, dialogue-based, response-generating and user-oriented chatbot that can assist an insurance business to respond to users' questions. The proposed system is termed IntelliBot, which stands for Intelligent Strategy-based Dialogue Chatbot System, and is an AI-based chatbot application system that is able to automate the entire business process by generating a response to the user's question. For this, the chatbot needs to understand user inputs, and thus it should have natural language processing (NLP) ability in order to generate an appropriate response to the user's questions using deep neural networks, while at the same time ensuring that customers are kept engaged.

1.4 Research Questions to Achieve the Research Objectives

To accomplish the objective of the thesis, the following research questions need to be explored:

i. How does a chatbot work, and what components are required to build an advanced AI chatbot application system to answer user queries in the insurance industry? To answer this question, the first step is to study how existing chatbot applications in the service industry work and to identify their drawbacks and the features required to build an AI chatbot. This question will be answered by studying the literature in the area of deep learning, specifically deep neural networks (DNN) and bidirectional recurrent neural networks (BRNN).

ii. How to design and develop various response generation strategies to generate an appropriate response to user queries? User questions will be of different levels of complexity. Some questions may be standard and repetitive, while other questions may be complex and may require the chatbot application to synthesize knowledge from the underlying information. So, to achieve that, different response generation strategies need to be chosen according to the complexity of the question to be answered, which not only generate a semantically correct and meaningful response, but also keep the user engaged. In finding a solution to this question, this thesis develops four response generation strategies to answer the user's question. Each strategy is different in the way it generates responses. These strategies are studied and developed under this research question.
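As a rough illustration of the kind of sequential selection this implies, the sketch below tries each strategy in turn and falls back to the next one when no answer is produced. The function names, the ordering and the fallback rule are assumptions made for the sketch and are not the thesis's exact selection policy, which is detailed in Chapter 5.

def answer(question, template_match, kb_lookup, web_search, generate):
    # Try the four response generation strategies in sequence (illustrative only).
    response = template_match(question)          # 1. template-based pattern match
    if response is None:
        response = kb_lookup(question)           # 2. knowledge-based lookup
    if response is None:
        response = web_search(question)          # 3. Internet retrieval
    if response is None:
        response = generate(question)            # 4. generative seq2seq fallback
    return response

# Toy usage with stand-in strategies:
reply = answer(
    "What does my card's travel insurance cover?",
    template_match=lambda q: None,
    kb_lookup=lambda q: "Your card includes complimentary travel insurance.",
    web_search=lambda q: None,
    generate=lambda q: "Sorry, could you rephrase that?",
)
print(reply)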

iii. How to develop and train a deep bidirectional recurrent neural network (DBRNN) model so that it can understand human natural language and generate appropriate responses? The ability of a chatbot to generate a response to a user's question which is semantically and grammatically correct and that is not predefined in a template is one of the main goals of this thesis. To answer this question, an AI chatbot application system needs to be designed and trained, as it is not possible to pre-define a template for every possible question that a user can ask. So, a deep bidirectional recurrent neural network (DBRNN) model that can understand the user's question and generate an appropriate response needs to be investigated to answer this research question.
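For orientation, a minimal encoder-decoder of this general kind can be sketched in PyTorch as follows. This is an illustrative sketch only: it uses GRU cells, omits the attention mechanism and all training code, and is not the actual DBRNN configuration developed in Chapter 7; the vocabulary size and layer dimensions are arbitrary assumptions.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    # Bidirectional-encoder sequence-to-sequence sketch (illustrative only).
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional encoder reads the question forwards and backwards.
        self.encoder = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        # Unidirectional decoder generates the answer one token at a time.
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.bridge = nn.Linear(2 * hid_dim, hid_dim)  # merge the two encoder directions
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, question_ids, answer_ids):
        _, enc_h = self.encoder(self.embed(question_ids))      # enc_h: (2, batch, hid)
        h0 = torch.tanh(self.bridge(torch.cat([enc_h[0], enc_h[1]], dim=-1))).unsqueeze(0)
        dec_out, _ = self.decoder(self.embed(answer_ids), h0)
        return self.out(dec_out)                               # per-step vocabulary logits

model = Seq2Seq(vocab_size=8000)
question = torch.randint(0, 8000, (2, 12))   # a toy batch of two 12-token questions
answer_in = torch.randint(0, 8000, (2, 10))  # the corresponding (shifted) answer tokens
logits = model(question, answer_in)          # shape: (2, 10, 8000)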

iv. How to evaluate whether the developed AI model gives an acceptable response, and how to validate this? This question evaluates the developed AI chatbot application system to determine its accuracy in responding to the questions it is asked. The quality of the generated response needs to be assessed against different factors, such as keeping the user engaged and answering a question with a response that is grammatically and semantically correct. Furthermore, the generated responses need to be evaluated against those of other existing chatbots to show their superiority, using the F1 score and Cohen's kappa metrics.
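As a small illustration of how these two metrics can be computed, the snippet below uses scikit-learn on made-up ratings; the numbers are invented for the example, and scikit-learn is assumed here only as a convenient toolkit rather than the exact evaluation code used in Chapter 8.

from sklearn.metrics import cohen_kappa_score, f1_score

# Hypothetical per-question verdicts: 1 = acceptable answer, 0 = not acceptable.
expert_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]    # ratings from the first expert
expert_2 = [1, 1, 0, 0, 0, 1, 1, 0, 1, 1]    # ratings from the second expert
reference = [1, 1, 0, 1, 0, 1, 1, 1, 1, 1]   # ground-truth labels for the same questions

kappa = cohen_kappa_score(expert_1, expert_2)  # inter-rater agreement between the experts
f1 = f1_score(reference, expert_1)             # F1 of one chatbot judged against the reference

print(f"Cohen's kappa: {kappa:.2f}, F1 score: {f1:.2f}")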

1.5 Contributions of the Thesis

This thesis contributes to the literature on AI-based chatbots in the service industry, specifically in the insurance industry, as follows:

• it introduces how the four response generation strategies can be used to generate a response and explains how IntelliBot selects each strategy to generate a response.
• it proposes a scalable and flexible conceptual framework for IntelliBot which can converse with the user in a meaningful way in the insurance domain and keep them engaged.
• it designs a data pipeline for processing data and forming appropriate QA pairs from the Cornell movie dialogue dataset and the insurance domain-specific dataset. The QA pairs are then used to create the training and testing datasets for IntelliBot to generate a response when there is no pre-defined template for a question asked by a user.
• it develops a prototype of IntelliBot and compares its performance with existing chatbots to show its superiority in different aspects, such as generating a grammatically correct response which engages the user.
• it creates an insurance domain-related QA dataset that can be used for future experiments.

1.6 Significance of the Thesis

The significance of the thesis is that it builds a neural dialogue manager (NDM) that incorporates an AI chatbot interface (IntelliBot). The developed chatbot is significant for the following two reasons:

• it has the ability to be used in customer service sectors to answer customers' queries in a timely manner.
• it provides training datasets that can be used for further studies or extensive experiments to verify the effectiveness of the system's performance.

1.7 Scope of the Thesis

To solve the problem in this thesis, the scope of the research is limited to the following factors:

• Identify components required to build a domain-specific chatbot application to answer user queries.
• Design a methodological approach for building IntelliBot's framework.
• Design four strategy selection units which can generate responses to user queries according to their complexity.
• Collect domain-specific data from the product disclosure statement (PDS) and basic conversational data from the Cornell movie dialogue corpus.
• Incorporate NLP tasks and grammar checks into the IntelliBot framework.
• Train IntelliBot using a DBRNN in a seq2seq model with an attention mechanism to generate responses for questions for which there is no defined template.
• Evaluate IntelliBot's generated responses to user queries against three publicly available chatbots.

1.8 Structure of the Thesis

The remainder of this thesis is structured as follows:

• Chapter 2 presents an extensive literature study in the area of neural networks. It describes an overview of various types of chatbots, evaluates whether they are suitable for user-bot conversation in the industry domain and summarises their drawbacks. It also discusses the techniques behind existing QA applications that are relevant to the thesis's problem.
• Chapter 3 explains in detail the problem which this thesis addresses. It also identifies the various research questions that need to be addressed to achieve the thesis's objective. The research methodology adopted in this thesis is also explained.
• Chapter 4 presents an overview of IntelliBot's proposed framework. It details the requirements which are needed for a domain-specific chatbot to answer user questions. It then presents a methodological approach for building IntelliBot's framework; the development of a prototype is described in the following chapters.
• Chapter 5 explains the four strategies, namely the template-based strategy, knowledge-based strategy, Internet retrieval strategy and generative-based strategy, which IntelliBot uses to generate a response. It also explains how IntelliBot selects each strategy to generate a response and describes the working process of each strategy in detail.
• Chapter 6 defines the key terms required to understand the working of IntelliBot's Language Understanding Unit (LUU) and explains the various NLP tasks performed by IntelliBot. The process of measuring semantic similarity at the word level and sentence level is explained, which is used to check whether the generated response matches the user question.
• Chapter 7 details the process of how the domain-specific data, which in the context of this thesis is insurance-related, is collected and curated from various sources. This chapter also explains the design of the DBRNN and the process of training the DBRNN, which is necessary for the generative-based response generation strategy.
• Chapter 8 explains how the performance of IntelliBot is validated and compared against other chatbots in the literature. The other chatbots are discussed and the quality of the generated responses of each chatbot is measured using the F1 score and Cohen's kappa. The experimental results show that IntelliBot outperforms the three other chatbots in relation to the different factors required for a domain-specific chatbot.
• Chapter 9 concludes the thesis by providing an overview of the proposed solution to address the problem discussed in this thesis. It also introduces areas for future work arising from this thesis.

CHAPTER 2

“Exploratory research is really like working in a fog. You don’t know where you’re going. You’re just groping. Then people learn about it afterwards and think how straightforward it was.”—Discoverer of DNA.

LITERATURE REVIEW

2.1 Overview

This chapter conducts a systematic review of the existing chatbots and identifies their shortcomings with respect to the requirements of a domain-oriented chatbot to answer customers' queries in the service industry. Parts of this chapter have been published in [8] and [20]. The chapter is structured as follows. Section 2.2 illustrates a taxonomy of chatbots according to how they are classified into different groups. Section 2.3 analyses the models used to generate a chatbot response that mimics a human brain. Section 2.4 describes the existing chatbots and their drawbacks in relation to their application in service industries. Section 2.5 explains the different working techniques used in dialogue-based chatbots. Section 2.6 concludes the chapter with a discussion of the existing gaps in the working techniques of dialogue-based chatbots when answering customer questions in the service industry.

2.2 Taxonomy of Chatbots

In line with their growth, chatbots have been applied in various industry sectors [26] and have been classified into different groups [27]. For example, the study in [26] analyses the 'purpose' of chatbots and classifies them into the four categories of service, commercial, entertainment and advisory. Service chatbots such as Eliza [2] and Alice [28] provide services to customers. For example, a logistics firm uses a chatbot to respond to customers' questions about deliveries and to provide copies of dispatch documents through an online channel rather than emails or phone calls. Commercial chatbots such as Pandorabots and Alexa [29] streamline purchases for customers. For example, they assist customers by answering their questions and help them to place orders. Entertainment chatbots keep customers engaged by discussing sport, the customer's favourite movies or other events. They also offer customers the option of placing a bet, provide details on upcoming events and give information on ticket deals. Chatbots such as Siri [30] can be classified under the entertainment category although they also fall into other categories. Advisory chatbots such as Alexa [29] provide suggestions, give recommendations on services and offer support and advice. Other researchers, such as [27], grouped chatbots according to their purpose, classifying them into task-oriented and non-task-oriented chatbots. Task-oriented chatbots help to complete certain tasks through short conversations with customers. For example, applications such as Siri and Alexa [30] can provide customers with travel directions, find restaurants and help them to make phone calls or send texts. On the other hand, non-task-oriented chatbots do not perform a task but may converse with customers to answer their questions.

Another recent study classified chatbots as either question answering, task-oriented or social bots [3] according to the scope expected from them. Question answering bots are knowledge-based chatbots that answer users' queries by analysing the underlying information collected from various sources such as Wiki, DailyMail [4], Allen AI Science and Quiz Bowl [5, 6]. Task-oriented bots are goal-based chatbots that assist in performing a certain task or attempt to solve a specific problem such as a flight booking or hotel reservation [7]. Social bots communicate with other users and make recommendations to them [8], for example Microsoft Xiaoice [9] and Replika [10]. When a chatbot is used by a business to answer users' queries with a specific goal or to focus on the completion of certain tasks requested by its customers, it is referred to as a service-based chatbot. Such chatbots are domain-specific and may use one of the aforementioned classifications to generate a response. A key requirement for service-based chatbots is to engage with customers, as figures show that 91% of unhappy customers will not engage again with the business [16]. To keep customers engaged, the chatbot needs to have dialogue abilities rather than just providing a yes or no as a response.

Chatbots with dialogue generation ability are classified as either goal-based, knowledge-based, service-based or response-generated-based, as shown in Fig. 2.1. A brief explanation of the models which they use is given in the next sub-sections.

Fig. 2.1 Taxonomy of chatbot classification according to the requirements

2.2.1 Goal-based chatbot

Goal-based chatbots have a primary goal or aim to complete specific tasks. They are designed to have short conversations to obtain the required information from the user to complete a task. For example, a company may deploy a chatbot on its website to answer clients' questions. For such a chatbot to work, three main capabilities or requirements are needed, namely activity-based, conversational and informative capabilities. Activity-based bots are able to perform a particular task, for example, making a flight booking or hotel reservation as required by the user. The conversational capability provides bots with the ability to talk to the user and continue the conversation based on the user's questions. The informative capability provides bots with the ability to collect information from different knowledge sources. Some examples of chatbots that have these capabilities to respond to a goal are Alexa, Siri, Mitsuku, Xiaoice and FAQ bots [30].

2.2.2 Knowledge-based chatbot

Knowledge-based chatbots are able to collect information from underlying data sources or online documents that are either in an open domain or a closed domain. The capabilities required for such a chatbot to work are the ability to access a collection of data sources or online documents, extract the information containing the answer and generate a response. Open-domain data sources are publicly available and cover general topics; examples are Allen AI Science and Quiz Bowl [5, 6]. On the other hand, a closed-domain data source focusses on a specific knowledge domain and all the information necessary to answer the question is provided in the dataset itself, as in Daily Mail [4], MCTest and bAbI [4].

2.2.3 Service-based chatbot

Service-based chatbots provide either a personal or commercial service to the customer. They are classified under the sub-categories of personal, social or agent-based bots. The capabilities required for such chatbots to work are the ability to access the required knowledge from the relevant sources and achieve the required goal, for example, making a flight booking, hotel reservation or restaurant booking. Personal service-based bots require the capability to mimic users' activities, such as managing a user's calendar, storing opinions and setting reminders. Agent-based bots require the capability to communicate with other bots to accomplish a task. An example of such a bot is the integration of Alexa [29] and Cortana [4].

2.2.4 Response generated-based chatbot

Response generated-based chatbots are dynamic models that mimic the working of the human brain while generating responses. In other words, such models decide on what actions to perform as their response to a question asked by the user. The capabilities required for such chatbots to work are the ability to take input from the user, understand it, develop a response and communicate the response to the user as a conversation. Generating a response requires the chatbot to use advanced artificial intelligence techniques. Examples of chatbots with this ability are Microsoft Tay, Apple Siri, Google bot, etc. These models are complex as they develop a response from scratch based on techniques such as machine learning, deep learning, NLP and recurrent neural networks.

The emphasis of this thesis is on response generated-based chatbots. The objective of this chapter is to study the existing response generation approaches and determine their effectiveness in generating semantically correct responses that engage the user in the form of a dialogue. In the next section, this study explains the techniques that are used in the existing literature to generate a response that mimics the working of the human brain.

2.3 Analysis of Models Used to Generate a Response that Mimics a Human Brain

This section discusses the techniques used in the four main models that enable chatbots to generate a response that mimics a human, namely the template-based, generative, retrieval-based and search engine models, as shown in Fig. 2.2, and identifies their drawbacks.

Fig. 2.2 Classification of response generated-based models

2.3.1 Template-based Model

This model has pre-defined questions and answers. It matches the user's question against a pre-defined collection of rules and question templates and, in the case of a match, displays the corresponding answer to the user. Such chatbots work by determining patterns using rules and are commonly used in the entertainment industry. To bootstrap the interaction, an initiator model is used that acts as the conversation starter [31]. Examples of these types of chatbots are Alicebot [28], Elizabot [2], ChatScript [32] and Storybot [31]. The template-based model uses the Artificial Intelligence Markup Language (AIML) [28], ChatScript [33], RiveScript [34] and Rasa [35] to structure the responses. Some applications of this model are Storybot [31] and BoW Movies [36], which respond with a story and with the movie title, actors' names, etc., respectively. However, the common drawback with this type of model is that the user receives a response only if there is a high level of pattern similarity between the input question and the stored templates, and sometimes the response is inappropriate. Additionally, template-based models are difficult to maintain, time-consuming to build and weak in terms of pattern matching [37].
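To make the matching step concrete, the sketch below shows a minimal rule table in the spirit of AIML-style template matching. The patterns, wildcards and response templates are assumptions chosen for illustration and are not taken from any published bot's rule set.

```python
import re

# Illustrative pattern-response rules in the spirit of AIML-style template matching
# (hypothetical patterns, not any published bot's rule set).
RULES = [
    (r"\bmy name is (.+)", "Nice to meet you, {0}."),
    (r"\bwhat is (a|an) (.+?)\??$", "I am afraid I do not have a definition for {1} yet."),
    (r".*", "Can you tell me more about that?"),  # catch-all fallback when no template matches
]

def respond(user_input: str) -> str:
    text = user_input.strip().lower()
    for pattern, template in RULES:
        match = re.search(pattern, text)
        if match:
            # Echo the captured wildcard text back into the response template.
            return template.format(*match.groups())
    return ""

print(respond("My name is Sarah"))        # Nice to meet you, sarah.
print(respond("What is an excess?"))      # I am afraid I do not have a definition for excess yet.
```

As the catch-all rule illustrates, such a bot always produces some reply, but the reply is only meaningful when the user's wording happens to fit one of the hand-written patterns.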

Table 2.1 Template-based models – description with issues and impacts

AIML. Description: an XML-based approach used to specify rule-based chatbot content, designed for simplicity. Design technique: mark-up language based on XML. Issues/limitations: does not split the input and combine the result. Impacts: this will not enable the chatbot to generate a meaningful response and it cannot answer complex queries.

Pattern Matching. Description: an algorithmic task that finds pre-defined sequences of tokens to match expressions or to detect patterns. Design technique: AIML category pattern matching. Issues/limitations: it needs a great deal of effort from the subject matter expert. Impacts: it is not possible to write rules for every possible scenario of questions a user may ask.

Initiator Model. Description: acts as a conversation starter; the system takes the initiative by asking questions. Design technique: fact generator model with pattern matching. Issues/limitations: inappropriate responses. Impacts: this will not enable the chatbot to generate a meaningful response.

Storybot. Description: a model which tells stories; it outputs a short fictional story at the request of the user. Design technique: pattern matching. Issues/limitations: it responds only when the user asks for a story. Impacts: it does not engage the user in conversation.

BoW Movies. Description: answers questions in the movie domain; it provides the movie title, actors' names and a description from IMDB. Design technique: template-based with string matching. Issues/limitations: specific to the movie domain only. Impacts: it can only answer questions in the movie domain and its generated responses are template-based.

2.3.2 Retrieval-based Model

This model is more advanced than the template-based model in that it not only finds matches between the user's question and the predefined questions, it also considers the intent of the conversation. To understand the intent of the conversation, these models use techniques such as recurrent neural networks, a logistic regression classifier and sequence-to-sequence models. Based on these, the underlying database is searched for an answer. To select a response, information is retrieved from the knowledge base, such as previous conversational history, logs, PDS and insurance domain-specific terms. It applies a complex semantic query formation technique to obtain information and matches this using an ensemble of machine learning classifiers. As is the case with the template-based model, this model constructs a response through keywords, identified facts and semantic query formation for the corresponding questions. An example where this type of chatbot has been applied is the BoW escape plan [38]. Using a logistic regression classifier, the chatbot engages with the user and provides responses on 35 different topics. The work in [39] develops a dual encoder model that uses recurrent neural networks and sequence encoders to generate a response. The work in [6] uses the sequence-to-sequence approach with Gaussian latent variables and a logistic regression model to generate responses from Reddit data. The work in [40] uses a bag-of-words model to select the response from the underlying dataset that has the highest cosine similarity. So, while these models generate a response, it may be inappropriate, and they need a large amount of data to be reasonably functional.
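As an illustration of the selection step described for the bag-of-words approach in [40], the following minimal sketch retrieves the stored answer whose question has the highest cosine similarity to the user input. The tiny question-answer store is an assumed example, not data from the cited work.

```python
import math
from collections import Counter

# A tiny illustrative question-answer store (assumed data, not from the cited work).
QA_STORE = {
    "what does my policy cover": "Your policy covers fire, theft and storm damage.",
    "how do i lodge a claim": "You can lodge a claim online or by calling the claims hotline.",
}

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words term-frequency vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(user_question: str) -> str:
    query_vec = Counter(user_question.lower().split())
    best_question = max(QA_STORE, key=lambda q: cosine(query_vec, Counter(q.split())))
    return QA_STORE[best_question]

print(retrieve("How do I lodge an insurance claim?"))
```

Because the score is purely lexical, a question phrased with different vocabulary can still be matched to an inappropriate stored answer, which reflects the drawback noted above.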

Table 2.2 Retrieval-based models – description with issues and impacts

End-to-End Method. Description: domain-specific; its aim is to complete certain jobs and it is able to communicate with an existing database. Design technique: recurrent neural networks. Issues/limitations: the retrieval operation is non-differentiable and the result does not convey uncertainty about semantics. Impacts: this approach has difficulties in extracting a user's natural language questions and the model cannot learn from a conversation.

BoW EscapePlan. Description: capable of handling user involvement and keeping the user in a conversation even if the model is not able to provide a meaningful response; it returns responses from a set of 35 predefined topics. Design technique: logistic regression classifier. Issues/limitations: dependent on the 35 predefined topics and the data it is trained on; the performance of the model is poor. Impacts: it will not enable the chatbot to generate a meaningful response and engage a user in a conversation.

VHRED Model. Description: a logistic regression model that uses Reddit data to generate responses. Design technique: seq2seq with Gaussian latent variables. Issues/limitations: inappropriate responses and poor performance. Impacts: it cannot identify errors and does not generate a meaningful response.

Dual Encoder Model. Description: uses two sequence encoders with a single LSTM recurrent layer for a response. Design technique: recurrent neural networks. Issues/limitations: inappropriate responses and poor performance. Impacts: it cannot identify errors and does not enable the chatbot to generate a meaningful response.

Bag-of-Words Model. Description: based on BoW models, Word2Vec embeddings and GloVe word vectors; it retrieves the response with the highest cosine similarity. Design technique: bag-of-words model. Issues/limitations: information duplication issues; it requires a large amount of data. Impacts: it cannot identify errors, it is difficult to train the model and it does not generate a meaningful response.

2.3.3 Search Engine Model

This model generates a response by crawling the web or using search engine results with a deep classifier model. Using web crawling or search engines is not as simple as searching a knowledge base. The model first needs to identify semantic annotations or metadata from the semantic layer of the web. Then, it needs to apply DOM parsing and extract only the required and relevant data that contains an answer. The search engine model uses approaches such as a deep classifier or an LSTM classifier to generate a response from the search results. Techniques such as deep brain and deep learning are used to identify the possible answers before selecting the one to be shown to the user. While this model can retrieve many possible answers, the challenge is to choose the most suitable response among them. Examples of such systems are Indri, Microsoft Bing and Lucene [3].
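The extraction-and-ranking step can be sketched as follows: parse the page's DOM, collect candidate passages and rank them against the user query. The HTML snippet and the simple term-overlap score below are illustrative assumptions standing in for the crawled pages and deep classifiers used in the cited systems.

```python
from bs4 import BeautifulSoup

# A small illustrative page; in practice the HTML would come from a crawler or search engine results.
HTML = """
<html><body>
  <p>Comprehensive car insurance covers damage to your own vehicle and to other vehicles.</p>
  <p>Contact our help desk for claims lodged after business hours.</p>
  <p>Third party insurance only covers damage you cause to other people's property.</p>
</body></html>
"""

def rank_passages(query: str, html_doc: str, top_k: int = 1) -> list:
    soup = BeautifulSoup(html_doc, "html.parser")
    passages = [p.get_text(strip=True) for p in soup.find_all("p")]
    query_terms = set(query.lower().split())
    # Score each passage by query-term overlap (a simple stand-in for a deep classifier).
    scored = sorted(passages,
                    key=lambda p: len(query_terms & set(p.lower().split())),
                    reverse=True)
    return scored[:top_k]

print(rank_passages("what does comprehensive car insurance cover", HTML))
```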

Table 2.3 Search engine models – description with issues and impacts

Deep Classifier Model. Description: searches the web with user queries and responds from a set of search engine results. Design technique: Deep Brain, deep learning. Issues/limitations: many response results for one user query. Impacts: it is not able to provide the one best matching response from the list of responses and will not generate a meaningful response.

LSTM Classifier. Description: uses an LSTM cell at the encoder and decoder to obtain a response. Design technique: binary classification. Issues/limitations: finite-length sequences. Impacts: it cannot answer complex queries or engage a user in a meaningful conversation.

2.3.4 Generative Model

Generative-based models generate new answers in response to users' questions. They do not depend on pre-defined questions and answers; rather, they use neural network models or deep learning techniques [41] such as ANN, RNN, CNN and DeepQ to develop a dialogue with the user on the fly. The generative model uses a knowledge synthesis approach to generate answers and engage the users in a form of dialogue. These models generate responses by translating from the inputs to the outputs. Various chatbots that use this model have been developed. The work in [42] developed a Question Generator Unit (GRU) that generates follow-up questions to be presented to the user using a word-by-word vector. The work in [43] proposed the seq2seq model, which is a feedforward fully connected neural network, to generate a response. The Deep-Q-Network works on the end-to-end decoder approach and uses an iterative decoding strategy to obtain an output sequence with maximal probability [44]. The research in [45, 46] proposed the Markov chain model that builds the responses which have the highest probability using a stochastic model and the Markov process. The work in [27] proposed the pipeline method, which is a task-oriented method using neural networks and deep learning. These models, however, have drawbacks as they need a large amount of training data, long training times and significant human input to train them correctly so that they give a satisfactory performance.
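A minimal encoder-decoder sketch of the kind of seq2seq model these generative approaches build on is shown below in Keras. The vocabulary size, embedding and hidden dimensions are assumed values, and the sketch deliberately omits the attention mechanism and DBRNN structure used later in this thesis.

```python
from tensorflow.keras import layers, Model

vocab_size, embed_dim, hidden_dim = 8000, 128, 256  # assumed hyperparameters

# Encoder: embeds the input question tokens and summarises them in the final LSTM states.
enc_inputs = layers.Input(shape=(None,), name="encoder_tokens")
enc_embedded = layers.Embedding(vocab_size, embed_dim)(enc_inputs)
_, state_h, state_c = layers.LSTM(hidden_dim, return_state=True)(enc_embedded)

# Decoder: generates the response token by token, conditioned on the encoder states.
dec_inputs = layers.Input(shape=(None,), name="decoder_tokens")
dec_embedded = layers.Embedding(vocab_size, embed_dim)(dec_inputs)
dec_outputs, _, _ = layers.LSTM(hidden_dim, return_sequences=True, return_state=True)(
    dec_embedded, initial_state=[state_h, state_c])
next_token_probs = layers.Dense(vocab_size, activation="softmax")(dec_outputs)

model = Model([enc_inputs, dec_inputs], next_token_probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

Even for a toy model like this, useful responses only emerge after training on a large parallel corpus of question-response pairs, which is exactly the data and training-time burden noted above.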

Table 2.4 Generative-based models – description with issues and impacts

GRU Question Generator. Description: generates follow-up questions word-by-word; the model is used for short questions. Design technique: word-by-word vector. Issues/limitations: the generation procedure works only when a question mark is detected. Impacts: it will not be able to engage the user in continuous conversation.

seq2seq Model. Description: a feedforward neural network whose responses are generated during the conversation. Design technique: deep learning and RNN. Issues/limitations: it is hard to train, and it takes a long training time and a large dataset. Impacts: it is not easy to train the chatbot and it requires a huge dataset on a specific domain; meaningful and continuous conversation is still questionable.

Deep Q Network. Description: an end-to-end decoder using an iterative decoding strategy; it aims to obtain an output sequence with maximal probability. Design technique: deep neural networks. Issues/limitations: requires many iterations to obtain satisfactory performance and requires a large dataset. Impacts: it is difficult to train the chatbot and it requires a huge dataset on a specific domain; meaningful and continuous conversation is still questionable.

Pipeline Method. Description: a task-oriented dialogue system method. Design technique: neural networks, deep learning. Issues/limitations: information is omitted and there are duplication issues; the process requires significant human effort. Impacts: it will not enable the chatbot to generate a meaningful response and it does not engage with the user.

Markov Chain Model. Description: a statistical model that generates responses based on the Markov chain model; the idea is a probability of occurrence for each word in the dataset. Design technique: stochastic model, Markov process. Issues/limitations: finite-length sequences. Impacts: it is difficult to train the chatbot and it requires a huge dataset on a specific domain; meaningful and continuous conversation is still questionable.

2.4 Workings of the Existing Chatbots in the Literature

In this section, we discuss the existing chatbots in the literature and determine whether the various response generating chatbots have the aforementioned drawbacks.

2.4.1 Elizabot

Elizabot is one of the earliest and best-known chatbots. It was developed in an MIT lab in 1966 [2] and was intended to demonstrate natural language conversation between humans and machines to provide Rogerian psychotherapy. Rogerian psychotherapy primarily encourages the patient to talk more rather than engaging in a discussion. Elizabot's responses are personal questions that are meant to encourage the patient to continue the conversation. It uses rule-based techniques and a script to respond to patients' questions with keyword matching from a set of templates and context identification. The model detects the appropriate template and selects the corresponding responses. If there are multiple templates, a template is selected randomly and the model runs it through a set of reflections to better format the string for a response. Elizabot was able to convince some people and assist in the treatment of patients suffering from psychological issues. Nonetheless, Elizabot could not provide anything comparable to therapy with a human therapist. The drawback of Elizabot is its failure to keep a conversation going. Furthermore, Elizabot is incapable of learning new information or discovering context, and it lacks logical reasoning capabilities [47].
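The keyword-template-and-reflection procedure described above can be illustrated with the short sketch below. The templates and reflection map are assumptions chosen for illustration and are not Weizenbaum's original script.

```python
import random
import re

# Illustrative reflection map and keyword templates (assumed, not Weizenbaum's original script).
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you", "your": "my", "you": "I"}

TEMPLATES = {
    r"i feel (.*)": ["Why do you feel {0}?", "How long have you felt {0}?"],
    r"i need (.*)": ["Why do you need {0}?"],
}

def reflect(fragment: str) -> str:
    # Swap first- and second-person words so the fragment reads naturally in the reply.
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())

def eliza_respond(user_input: str) -> str:
    text = user_input.lower().strip(".!? ")
    for pattern, responses in TEMPLATES.items():
        match = re.match(pattern, text)
        if match:
            # Pick one of the matching templates at random and reflect the captured fragment.
            return random.choice(responses).format(reflect(match.group(1)))
    return "Please tell me more."

print(eliza_respond("I feel anxious about my claim."))
```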

2.4.2 Alicebot

The Artificial Linguistic Internet Computer Entity, also referred to as ALICE, was inspired by [2] and developed in [28]. Alicebot is based on an updated version of Eliza's pattern architecture. However, Alicebot is purely based on pattern matching and the depth-first search technique for the user's input. It is a form of XML dialect that encodes rules for questions and answers. It uses a set of Artificial Intelligence Markup Language (AIML) templates to produce responses given the dialogue history and user utterance [48]. First, AIML receives the user sentence as input and this is stored in what is known as a category. Each category comprises a response template and a set of conditions that give meaning to the template, known as context. Then the model pre-processes it and matches it against the nodes of the decision tree. When the user input is matched, the chatbot responds or executes an action. The AIML templates repeat the user's input utterance using recursive techniques, but these are not always meaningful responses. Therefore, string-based rules are required to determine if the response is correct or meaningful.

The drawback of Alicebot is the difficulty it has in modelling personalities such as traits, attitudes, moods, emotions and physical states [49]. The botmaster must integrate personality elements within the AIML. However, this is not a straightforward task. Alicebot is also incapable of generating appropriate responses, has no reasoning capabilities and is unable to generate human-like responses. A large number of QA pairs is required to build such a chatbot, which may be difficult to maintain and time-consuming, hence making it unfeasible. Alicebot does not have intelligence features such as natural language understanding (NLU) and grammatical analysis to structure a sentence. In addition, if the same input is repeated during the conversation, Alicebot gives the same answer most of the time.

2.4.3 Elizabeth bot

Elizabeth bot is a version of Weizenbaum's ELIZA application which was developed in [50]. However, various selection, substitution and phrase storage mechanisms have enhanced its potential adaptability and flexibility. Elizabeth bot uses four steps to generate a response. First is command script processing, where each line has a single character which represents a command notation rather than a keyword message; for example, the character 'W' for the welcome text, 'S' for the sending text, 'N' for no match, etc. It can also be indexed using a user special code. Second is the input transformation rules, in which the input is mapped to predefined keywords to obtain a compatible form. Third is the output transformation rules, in which personal pronouns are changed to form an appropriate response. Fourth is the matching of keyword patterns. Elizabeth bot tries to give a different answer by selecting different responses for the same question [51]. The nature of some rules in Elizabeth bot may cause iteration, which is solved by applying the rule only once.

The drawback of Elizabeth bot is that it does not provide a way to partition or split the user's input sentence and then combine the results. Due to Elizabeth bot's structure, it would be difficult to do the splitting. Furthermore, complications occur because some rules are written in uppercase and others in lowercase, which may cause errors and result in the generation of unsuitable answers. However, Elizabeth bot has the ability to give the derivation structure for a sentence using grammatical analysis, keyword extraction and pattern matching.

2.4.4 Mitsuku

Mitsuku is the most widely used standalone human-like chatbot developed using AIML [52]. It was designed for general typed conversation based on rules written in AIML [53] and can be integrated into bot networks such as Twitter, Telegram, Firebase and Twilio to serve as a personality layer. Mitsuku uses NLP with heuristic patterns and is hosted on Pandorabots. Bot modules abstract a lot of the work that goes into creating a robust chatbot system. In order to integrate its module, some AIML categories need to be included to route inputs from users. Whenever Mitsuku fails to find a better match for an input, it automatically redirects to the default category. Mitsuku can hold a long conversation, it learns from the conversation and it remembers personal details about the user (age, location, gender, etc.). Its features include the ability to reason with specific objects. For example, if someone says "Can you eat a house?", Mitsuku will look up the properties for "house", find that the value of "made_from" is set to "brick" and reply "No", as a house is not edible. Mitsuku is a multilingual bot and uses supervised machine learning. As it learns something new, the data is sent to a human manager for verification; only verified data can be further incorporated and used by the app. However, Mitsuku is not effective without a large dataset and fails to provide dialogue management components.
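The object-property reasoning described above can be illustrated with a small sketch. The property table and rule are assumed for illustration and are not Mitsuku's internal knowledge base.

```python
# Illustrative object-property table (assumed values, not Mitsuku's internal knowledge base).
PROPERTIES = {
    "house": {"made_from": "brick", "edible": False},
    "apple": {"made_from": "fruit", "edible": True},
}

def can_you_eat(thing: str) -> str:
    props = PROPERTIES.get(thing.lower())
    if props is None:
        return f"I do not know what a {thing} is."
    if props["edible"]:
        return f"Yes, a {thing} is edible."
    # Use the stored property to justify the negative answer, as described for Mitsuku.
    return f"No, a {thing} is made from {props['made_from']} and is not edible."

print(can_you_eat("house"))
```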

2.4.5 Cleverbot

Cleverbot is one of the most popular entertainment chatbots and implements rule-based AI techniques to communicate with humans [54]. It was developed in [55] to collect a large amount of data from conversational exchanges with people online through crowdsourcing. Unlike other chatterbots, Cleverbot's responses are not pre-programmed. Instead, it simulates natural conversation by learning from user input and relying on feedback in order to interact. When the user inputs a sentence, Cleverbot finds all the keywords or phrases matching the input. After searching through its saved conversations, it responds to the input by finding how a user has responded to that input when it was asked. Cleverbot is unique in that it 'learns' what users have said to it in previously saved conversations and uses this knowledge to determine how to respond to new conversations [56]. To enhance the realism of the conversation, the bot has its own human avatar that shows emotions. The underlying technology in Cleverbot not only processes verbal and textual interactions but also facial expressions and movements to create a more authentic conversation. The drawback of Cleverbot is its unpredictable responses and its tendency to suddenly change the subject and respond without context. It is also unable to continue a long conversation, it is not accurate in language translation and it may not be suitable for children due to mature themes, profanity or references to alcohol or tobacco.

2.4.6 Chatfuel

Chatfuel, developed in [57], provides a user-friendly drag-and-drop interface to construct a rule-based chatbot. It includes an artificial intelligence module to train the bot to map input sentences to outputs. It allows response prompts and integration with third-party services and CRM systems. With analytics capabilities, users can collect and view valuable information on chatbot performance and subscriptions quickly and effectively. Users can dictate the conversational rules via the Chatfuel dashboard to ensure the chatbot understands and answers user requests efficiently. It also allows a JSON integration to accommodate custom logic in the bot. The most attractive point of the service is that it is simple to build a rule-based bot, which is suitable for small businesses. The drawback of Chatfuel is that it is quite inflexible in terms of conversation flow and it does not support knowledge-based and multi-language features. Additionally, its NLP is limited and difficult to set up, and its documentation is poor. However, it is capable of understanding the user's intent.

2.4.7 ChatScript

ChatScript is a scripting-based commercial chatbot developed in [32]. It uses pattern matching techniques similar to AIML and is a combination of an NLP and dialogue management system, including some control scripts. A script is essentially an ordered collection of rules. A rule consists of a type, label, pattern and output. Rules are bundled into collections called topics, keyed by keywords that allow the engine to automatically search the topic for relevant rules based on the user input. Unlike AIML, which finds the best pattern match for an input, ChatScript first finds the best topic match and then executes a rule contained in that topic. ChatScript is well suited for stand-alone applications such as information kiosks and help desks. Although it has excellent documentation, it is difficult to implement. The drawback of ChatScript is that it is difficult to learn and there are no hosting services. It is also difficult to embed in a web page [56].

2.4.8 IBM Watson

Watson is a rule-based AI chatbot developed by IBM's DeepQA project [58]. It is designed for information retrieval and is a question-answering system that integrates with NLP and hierarchical ML methods. Watson uses a broad range of mechanisms to identify and assign feature values such as names, dates, geographic locations or other entities to the generated response. The machine learning system then learns how to combine the values of these features into a final score for each response. Based on this score, it ranks all possible answers and selects one as its top answer. Watson incorporates a variety of technologies including Hadoop and Apache Unstructured Information Management Architecture (UIMA) framework to examine phrase structure and grammar of the question to better gauge what is being asked.

Watson uses cognitive computing technology as its underlying structure, which is able to perform complex analytics on unstructured data and handle enormous quantities of data. As the application gains experience with more input, it can find enough patterns to make accurate predictions. In addition to its advantages, Watson has some major drawbacks: it does not process structured data directly, it has no relational databases, it incurs a higher maintenance cost, it is targeted towards bigger organizations, and it takes a longer time and more effort to train Watson to use its full potential.

2.4.9 Microsoft LUIS

Language Understanding Information Service (LUIS) is a domain-specific AI engine developed by Microsoft [59]. It is built using NLP and information extraction and uses prebuilt domain entity models and context. LUIS performs NLP against big data to find the intent of a sentence. It performs well in retrieving conversational data, interpreting it, and extracting user intents and entities. The model starts with a list of general user intentions such as "Book Flight" or "Contact Help Desk". Once the intentions are identified, the user supplies example phrases, called utterances, for the intents. Then, the utterances are labelled with the specific details the user wants LUIS to pull out of the utterance. After the model is trained, it is able to process user input. LUIS receives the user input via an HTTP endpoint and conveys a set of relevant intentions.

LUIS is integrated with various prebuilt applications and tools, such as a calendar for organising appointments, word lookup and knowledge collection from the web, email for communication, music, devices, etc. The LUIS model is easily deployable and integrates seamlessly with the Azure Bot Service. The major drawback of LUIS is that it requires an Azure subscription.

2.4.10 Google Dialogflow

Dialogflow, formerly known as Api.ai, was developed by Google [60] and is a part of the Google Cloud Platform. It allows app developers to enable their users to interact with interfaces through voice and text exchanges powered by machine learning and natural language processing technologies. This lets developers focus on other integral parts of app creation rather than on delineating in-depth grammar rules. Dialogflow recognizes the intent and context of what the user says, then matches the user input to specific intents and uses entities to extract relevant data. Finally, it allows the conversational interface to provide responses. The drawbacks of Dialogflow are that there is no handheld device version and it does not have an interactive user interface.

2.4.11 Amazon Lex

Amazon Lex is a service developed by Amazon [29] for building conversational capability into applications using deep learning technologies. It provides deep learning functionality and NLU to build flexible user-bot conversational interfaces, which increases user engagement. Amazon Lex integrates with AWS Lambda so that the user can easily trigger functions to execute back-end business logic for data retrieval and updates. The drawback of Amazon Lex is that it is not multilingual; currently, it only supports English. Unlike Watson, Lex's integration processes are complex. Furthermore, the preparation of the dataset and the mapping of the entities are difficult.

Table 2.5 summarises the workings of the existing chatbots in terms of the category in which they fall and their functionality and technique specifications, and it also lists their drawbacks which prevent them from generating a meaningful response and engaging the user in a dialogue. In the next section, a discussion of the technical approaches used in the generative model to build and generate a response is presented.

Table 2.5 Features and drawbacks of existing chatbots

Eliza [2]. Category: service-based. Extract: no. Semantic: no. Intent classification: no. Entity classification: no. Sentence structure: no ability to structure a sentence. Searching: basic. Input/output: basic pattern matching with templates to generate a response. Technique: template-based. Drawback: no logical reasoning capabilities, inappropriate responses.

Alice [28]. Category: goal-based. Extract: yes. Semantic: no. Intent classification: yes. Entity classification: yes. Sentence structure: no structuring ability; stores a huge corpus of text. Searching: depth-first search. Input/output: pattern matching to represent input and output. Technique: recursive techniques. Drawback: no grammatical analysis to structure sentences.

Elizabeth [50]. Category: goal-based. Extract: yes. Semantic: no. Intent classification: no. Entity classification: no. Sentence structure: derivation structure of a sentence using grammatical analysis. Searching: first keyword pattern match. Input/output: command line script as input and output. Technique: iterative rules and transformation rules to generate responses. Drawback: does not split the input and combine the result.

Mitsuku [52]. Category: service-based. Extract: yes. Semantic: yes. Intent classification: yes. Entity classification: yes. Sentence structure: yes. Searching: searches the value of categories and properties. Input/output: AIML categories to route input from the user. Technique: NLP with heuristic patterns, supervised ML. Drawback: fails to provide dialogue management components.

LUIS [59]. Category: knowledge-based. Extract: yes. Semantic: yes. Intent classification: yes. Entity classification: yes. Sentence structure: uses grammatical analysis. Searching: finds the intent from the input and responds with extracted intentions. Input/output: identifies valuable information from the user conversation. Technique: NLU with prebuilt domains, active learning. Drawback: requires an Azure subscription.

Dialogflow [60]. Category: response-based. Extract: yes. Semantic: yes. Intent classification: yes. Entity classification: yes. Sentence structure: ability to structure a sentence. Searching: keyword search. Input/output: matches the input to specific intents and uses entities to extract relevant data. Technique: NLP, ML. Drawback: no interactive UI and does not support handheld devices.

Amazon Lex [29]. Category: response-based. Extract: yes. Semantic: yes. Intent classification: no. Entity classification: yes. Sentence structure: ability to structure a sentence. Searching: keyword search. Input/output: matches keywords for input and response. Technique: NLU, AWS Lambda. Drawback: not multilingual; mapping utterances and entities is very difficult.

Chatfuel [57]. Category: service-based. Extract: no. Semantic: no. Intent classification: yes. Entity classification: no. Sentence structure: yes. Searching: keyword search. Input/output: maps input sentences to output. Technique: rule-based. Drawback: inflexible conversation flows.

Cleverbot [55]. Category: service-based. Extract: yes. Semantic: no. Intent classification: no. Entity classification: yes. Sentence structure: ability to structure a sentence. Searching: searches keywords through its saved conversations. Input/output: matches keywords for the input and responds based on previous chats. Technique: rule-based. Drawback: unpredictable responses without context.

ChatScript [32]. Category: goal-based. Extract: yes. Semantic: no. Intent classification: yes. Entity classification: yes. Sentence structure: no structuring ability. Searching: finds a topic and executes a rule contained in that topic. Input/output: pattern matching. Technique: script-based. Drawback: difficult to learn and to embed in a web page.

Watson [58]. Category: knowledge-based. Extract: yes. Semantic: yes. Intent classification: yes. Entity classification: yes. Sentence structure: phrase and grammar structure analysis. Searching: keyword search. Input/output: identifies feature values to generate responses based on the score. Technique: rule-based NLP, UIMA. Drawback: does not process structured data; no relational databases.

2.5 Techniques Used in Existing Dialogue-based Chatbots to Build and Generate a Response

Information extraction and user intention identification are central research topics in NLP, and several models have been presented by researchers in the last few years. Deep neural network models are a recent development in deep learning and have shown potential for building self-learning chatbots. There have been several related attempts to address the seq2seq model's problems with deep learning approaches such as RNN, DNN and CNN [8]. This section summarises the previous studies and identifies the gaps. This study takes the systematic literature review approach [61] to conduct the review process. The next sub-sections present a summary of each technique, and a comparison is conducted to identify the gaps from the perspective of meeting the requirements of response-generating chatbots.

2.5.1 Rule-based approach

In earlier days, researchers focused on conversational systems that were built using simple, predefined templates. This approach does not require any training. However, it requires a great deal of expert effort to produce the handcrafted rules or templates [62, 63]. The authors also found that it is expensive to construct rule-based systems and that the discussion can easily go beyond their scope. Thus, researchers and industry started to pay more attention to data-driven methods such as retrieval-based and generated response-based methods.

2.5.2 TF-IDF approach

TF-IDF determines the significance of a word in a document depending on the number of times it appears in it. It has two components: Term Frequency (TF) and Inverse Document Frequency (IDF) [64]. The importance of a word is determined according to its TF and IDF values. Wang et al. [65] proposed a two-step retrieval technique to find appropriate responses from a massive data repository using this approach. The retrieval process consists of extracting the user's input, response matching and ranking the responses according to their TF-IDF values. While such models generate responses, they use the bag-of-words (BoW) model, which does not capture text position, semantics or co-occurrences across distinct articles. Additionally, the frequency of each word needs to be normalized in terms of its occurrence throughout the collection. Cerezo et al. [66] implemented a chatbot for expert recommendation tasks to help developers find the right person to contact. The proposed chatbot is based on NLP for sentence classification and key concept identification using the TF-IDF algorithm. They conducted a preliminary evaluation in two steps. First, three participants were asked to complete a specific task through interaction with the chatbot. In the second step, a semi-structured interview was conducted and they were asked to recognize the emotions they felt while interacting. Although the chatbot gave users the answers they were expecting, its response was just an answer to their questions as opposed to engaging them in a meaningful conversation.
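A minimal sketch of TF-IDF-based response retrieval in the spirit of the two-step process in [65] is shown below, using scikit-learn. The candidate responses and the example question are assumed for illustration and are not the cited work's data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative candidate responses (assumed data, not the corpus used in [65]).
candidates = [
    "A product disclosure statement describes the features and exclusions of a policy.",
    "You can renew your policy online before the expiry date.",
    "Claims are usually processed within ten business days.",
]

vectorizer = TfidfVectorizer()
candidate_matrix = vectorizer.fit_transform(candidates)   # TF-IDF weights for every candidate

def best_response(question: str) -> str:
    question_vec = vectorizer.transform([question])
    scores = cosine_similarity(question_vec, candidate_matrix)[0]  # similarity to each candidate
    return candidates[scores.argmax()]

print(best_response("How quickly are claims processed?"))
```

Because the score ignores word order and meaning, a question that shares few surface words with the intended answer can be matched incorrectly, which is the limitation the BoW criticism above points to.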

Mondal et al. [67] proposed a chatbot to assist Q&A in an educational domain. It uses an ensemble learning method and builds the application in the form of a Telegram bot. The authors pre-processed the crawled data to convert it into a structured form. Then, using NLTK, they extracted the corresponding features from a dataset of 1500 questions. The model was trained using a random forest approach and learns from a subset of features to answer the corresponding questions. However, it only answers user queries and fails to start or engage the user in a conversation. Table 2.6 presents a summary of the TF-IDF approaches to generate a response.

Table 2.6 Summary of the TF-IDF approaches to generate a response with their description and issues

TF-IDF with BoW [65]. Description: a retrieval-based conversational system using TF-IDF. How it generates a response: formulates the TF-IDF score of each word to generate an appropriate response. Issues/drawbacks: responds to matching questions only; does not generate a meaningful response.

TF-IDF with NLP [66]. Description: a chatbot to help developers find the right person to contact. How it generates a response: uses NLP for sentence classification and key concept identification, using the TF-IDF algorithm. Issues/drawbacks: answers users' questions only; trained on a basic conversational dataset only.

TF-IDF with random forest [67]. Description: a chatbot to assist Q&A in an educational domain in the form of a Telegram bot. How it generates a response: converts crawled data to a structured form and extracts features that assist in responding to the corresponding question. Issues/drawbacks: answers users' questions only and cannot engage the user in a meaningful conversation.


2.5.3 End-to-End approach

The end-to-end approach uses a single neural network in which all the NLP processing steps to generate a response are carried out. Williams et al. [68] developed a task-oriented chatbot using such an approach to carry out tasks such as booking movie tickets. The authors trained the model through supervised learning techniques. The NLU unit automatically classifies user queries with domain-specific intents and fills several slots to create a semantic frame. LSTM was used as an approach for slot filling and to determine the user's intent simultaneously [68, 69]. The Deep-Q-Network approach is applied during training on the labelled dataset to fine-tune the chat engine. This approach is similar to that in [5], in which the author conducted extensive experiments and performed a quantitative analysis which showed that errors at the slot level have a higher effect on the output than errors at the intent level. The drawbacks of this approach are that each epoch needs to be trained individually, which presents several challenges and makes the performance of the entire system less robust. Gu et al. [70] proposed an enhanced sequential inference model (ESIM) with an end-to-end approach where, given a partial conversation, the model selects the correct next utterance. ESIM has four features, namely a new word representation method, an attentive hierarchical recurrent encoder (AHRE), multi-dimensional pooling and a modification layer for response selection. The drawback is that it requires a vast amount of labelled training data. Table 2.7 presents a summary of the end-to-end approaches to generate a response.
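A minimal sketch of the joint intent detection and slot filling idea in [68, 69] is given below as a Keras BiLSTM tagger. The vocabulary size, label counts and dimensions are assumed values, and no claim is made that this matches the cited architecture; it only illustrates how one network can emit a slot label per token and an intent label per utterance.

```python
from tensorflow.keras import layers, Model

vocab_size, num_slot_labels, num_intents = 5000, 7, 10   # assumed sizes
embed_dim, hidden_dim = 64, 128

tokens = layers.Input(shape=(None,), name="tokens")
embedded = layers.Embedding(vocab_size, embed_dim, mask_zero=True)(tokens)

# Shared BiLSTM over the utterance; its per-token outputs feed the slot tagger.
encoded = layers.Bidirectional(layers.LSTM(hidden_dim, return_sequences=True))(embedded)
slot_probs = layers.TimeDistributed(
    layers.Dense(num_slot_labels, activation="softmax"), name="slots")(encoded)

# A second recurrent pass summarises the utterance for intent classification.
utterance_vec = layers.Bidirectional(layers.LSTM(hidden_dim))(encoded)
intent_probs = layers.Dense(num_intents, activation="softmax", name="intent")(utterance_vec)

model = Model(tokens, [slot_probs, intent_probs])
model.compile(optimizer="adam",
              loss={"slots": "sparse_categorical_crossentropy",
                    "intent": "sparse_categorical_crossentropy"})
model.summary()
```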

Table 2.7 Summary of the end-to-end approaches to generate a response with their description and issues

Deep-Q-Network [68]. Description: a task-oriented bot with end-to-end methods. How it generates a response: the NLU unit automatically classifies the user's query and uses LSTM for slot filling. Issues/drawbacks: no conversational capabilities.

ESIM Model [70]. Description: a response selection task conversational system. How it generates a response: given a partial conversation, it selects the next utterance from a set of possible candidates. Issues/drawbacks: does not have conversational capabilities; trained on a basic conversational dataset only.

2.5.4 RNN approach with seq2seq mechanism

The seq2seq mechanism revolutionised the process of translation by making use of deep learning. Seq2seq takes as input a chain of words that are in a sequence and generates their corresponding outputs. In this approach, each word is converted to its target sequence without considering its grammar or the sentence structure. It has two main components, an encoder and a decoder, which encode the input and decode the output, respectively. While decoding, the decoder also considers the previous and next inputs apart from the current one. It does this by using neural networks such as an RNN, DNN or CNN [8, 71].

Gasic et al. [72] built an agenda-based, goal-oriented chatbot for booking movie tickets. They proposed a Gaussian process-based technique to learn, as well as implement, strategies that can be used with a small amount of information to adapt to the use case being addressed. The authors used a domain ontology and executed the dialogue system at two levels, the NLU level and the semantic level. The NLU component determines the intent of the user. The semantic level defines user-bot interactions in semantic frames. Additionally, the user-bot interaction consists of two types of slots, namely inform and request slots. Inform slots are user-known values such as movie_name (avatar), number_of_persons (4) and day (Friday), and request slots are the values the user requests, such as the location name (city), theatre name and movie time. One shortcoming of the approach is that it requires an end-to-end supervised approach to add new knowledge to the neural networks. Wu et al. [73] developed a response selection approach in a retrieval-based chatbot. The proposed approach determines the response candidate to be given by determining the context of the conversation and its important parts and by modelling the relationships among the utterances in that context. A sequential matching framework (SMF) is proposed to achieve these tasks. In the first stage, each word is transformed to a vector according to the context in which it is being studied. In the next step, the hidden states of the RNN are used to determine the relationships between utterances according to their vectors. However, one of the drawbacks of their model is that a large amount of labelled data is needed to train the matching model.

Language generation techniques are another way to build a conversational system. Liu et al. [74] created RubyStar, a human-like dialogue system which combines distinct strategies for generating responses. The authors integrate both rule-based and deep learning techniques to decode speech to text. The NLU performs pre-processing, including topic detection, intent analysis and entity linking, after which the response generation strategy layer and neural networks (NNs) handle the user input. After going through the NN, the input stream flows into the response generator, which eliminates incoherent or questionable answers using a content filter. A ranking method is used when there is more than one valid answer. The selected answer is passed to Amazon Alexa text to speech (TTS) and this is given to the user. Their results showed that a character-level RNN is an efficient overall response generation model. However, the model's performance can be improved by replacing the current topic embedding, sentiment embedding and engagement embedding with other types of conversational topics. Cho et al. [42] proposed an NN-based encoder that encodes a chain of words into a fixed-length vector and a decoder that decodes it into another sequence. The encoder and decoder are trained jointly to improve the model's accuracy and map the input sequence to an output. The proposed approach was shown to enhance BLEU scores, a metric similar to the F1 score. Gu et al. [75] addressed a significant issue in seq2seq learning by proposing a copying mechanism. The authors' proposed approach predicts the output sequence directly from the input by carrying information to the next stage without any nonlinear transformation. A similar method is proposed by Srivastava et al. [76] for training deep neural networks. However, a shortcoming of such approaches is that they cannot predict outputs outside of the set of the input sequence. This approach was enhanced by Gu et al. [75] by selectively replicating input segments in the outputs. This is helpful in those cases where people are likely to repeat entity names. However, the challenge in seq2seq comes while copying. The authors addressed this by proposing a new model with an encoder-decoder structure called COPYNET. The proposed approach integrates a common word generation technique in the decoder with the copying mechanism, which can select segments of the input sequence and place them in the output sequence.

Serban et al. [31] proposed a deep reinforcement learning chatbot called MILABOT which is able to interact with humans through speech and text. MILABOT comprises NLP and a neural network-based retrieval model, including QA templates. The user-bot interaction provides responses using reinforcement learning from crowdsourced data. Lee et al. [77] state that a conventional seq2seq model discovers sequences more accurately when the input sequences are conditioned without taking into account the output sequences. To demonstrate this, the authors' model scales the sentiment of the chatbot by training it with Tensorflow (proposed by [78]) using a Twitter chat corpus. In the training stage, the sequence input to the encoder and the seq2seq model maximizes the probability of generating an accurate response. Two evaluation metrics are used, namely sentiment coherence and the sentiment classifier score. Sentiment coherence gives a score regarding whether the output is meaningful or not, and the sentiment classifier score measures how positive the output sentence is. Wu et al. [79] proposed an attention-based RNN in which the responses are enhanced based on user inputs. First, the RNN influences the input sequence and the generated response is weighted by a pre-trained LDA model. This is used to form topic vectors that are linear combinations of the topic words, which then, through an attention mechanism, refine the given inputs and responses with the topic vectors. Despite the success of the seq2seq model, there has not been much focus on dealing with a chatbot's speech recognition errors in end-to-end dialogue systems. Chen et al. [80] investigated the problem of converting speech to text. The study uses a DNN to determine the probability of error in spoken text summarization and applies a CRF model. This model has dual encoders (two RNNs) with different parameters, one ASR gate and one decoder. The encoder encodes the input, the ASR gate forwards vectors to the decoder, and the decoder generates the output. Additionally, it makes the hidden states similar to the decoder's for it to predict the dialogue text. The model demonstrates that it generates similar responses from the given input; however, the output has errors in it. Stroh and Mathur [81] used a seq2seq model with GloVe word vectors to answer questions. The authors used a cross-entropy error on the decoder output to train the RNN on the bAbI English dataset. The proposed approach performed well on questions that required either a yes or no answer; however, it failed for longer response generation. Table 2.8 presents a summary of the seq2seq approaches to generate a response.


Table 2.8 Summary of the RNN approaches with seq2seq to generate a response with their description and issues

Gaussian Process [72]. Description: a goal-oriented bot in the movie booking domain. How it generates a response: the user-bot interaction consists of two slots, inform and request, to generate a response. Issues/drawbacks: does not engage the user in a long conversation; cannot provide a meaningful response.

Sequential matching framework [73]. Description: a retrieval-based bot with RNN. How it generates a response: generates a response by determining the context of the conversation and its important parts and by modelling the relationships among the utterances in that context. Issues/drawbacks: trained on a basic conversational dataset only.

Language generation techniques [74]. Description: a non-task-oriented social bot integrating rule-based and deep learning techniques. How it generates a response: mimics human-like conversation by using a combination of response generation strategies. Issues/drawbacks: trained on a basic conversational dataset only.

Copying mechanism [75]. Description: predicts the output sequence directly from the input using seq2seq. How it generates a response: generates the output according to the sequence of the inputs. Issues/drawbacks: little conversational capability; cannot predict or generate a meaningful response; trained on a basic conversational dataset only.

Neural network-based retrieval model [31]. Description: a deep reinforcement learning chatbot with NLP. How it generates a response: able to interact with a human through speech and text. Issues/drawbacks: cannot provide a meaningful response; trained on a basic conversational dataset only.

Conventional seq2seq model [77]. Description: a model to scale the sentiment of the chatbot using seq2seq. How it generates a response: sentiment coherence and sentiment classifier scores. Issues/drawbacks: focused on sentiment rather than conversation; trained on a basic conversational dataset only.

Attention-based RNN [79]. Description: user inputs and the responses are enhanced. How it generates a response: high-level responses with rich content. Issues/drawbacks: trained on a basic conversational dataset only.

End-to-end dialogue system [80]. Description: can detect speech errors and try to recover them using a CRF model with DNN. How it generates a response: speech-to-text conversion. Issues/drawbacks: trained on a basic conversational dataset only.

seq2seq model with GloVe word vectors [81]. Description: built with Tensorflow GRU and separate representations for the query, e.g. 'Q' for a question, 'GO' for start. How it generates a response: answers well on tasks with yes/no questions. Issues/drawbacks: does not have conversational capability or generate a meaningful response; trained on the bAbI dataset only.

2.5.5 RNN approach with memory network

The RNN approach with a memory network is termed long short-term memory (LSTM) [82]. The LSTM cell overcomes the short-term memory characteristic of the RNN, which otherwise has a short attention span. Instead, it can remember patterns between words for a longer duration of time and use this to accurately determine the next word in a sequence. It develops the context of the word by taking inputs and determines what should be the next output in the sequence. Researchers have proposed methods by which the LSTM model is able to deal with a longer sequence of inputs and process them to produce an accurate output.

Bahdanau et al. [83] developed an approach that combines the attention mechanism with a DNN and applied it to neural machine translation (NMT). To address the LSTM's drawback when translating long sentences, Sutskever et al. [84] developed a multilayered LSTM built on a limited vocabulary. One LSTM maps the input to a fixed-dimension vector whereas another LSTM decodes the target sequences from the vector. Shao et al. [43] proposed an approach that focusses on generating the correct output by addressing the shortcoming of the seq2seq model, which struggles to generate long responses. The authors proposed a glimpse model with a stochastic beam-search decoding technique. The glimpse model scales the ability to train on bigger datasets (2.3B conversational messages) that were then used to generate responses from a diversified range using MAP-decoding. The work in [86] improves the alignment of the generated outputs to the inputs by proposing an attention-based seq2seq mechanism. Yin et al. [85] proposed DeepProbe, which understands the input question before translating it to a simpler query form. A recommendation model is then used to ascertain the best match to the query. Table 2.9 presents a summary of the RNN approaches with memory networks to generate a response.
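A minimal numpy sketch of additive (Bahdanau-style) attention, the mechanism referred to above, is given below. The shapes and randomly initialised weights are illustrative assumptions; in a trained model the weight matrices would be learned jointly with the encoder and decoder.

```python
import numpy as np

def softmax(x):
    exps = np.exp(x - x.max())
    return exps / exps.sum()

def additive_attention(decoder_state, encoder_states, W_dec, W_enc, v):
    # score_i = v . tanh(W_dec @ s + W_enc @ h_i) for each encoder hidden state h_i
    scores = np.array([v @ np.tanh(W_dec @ decoder_state + W_enc @ h) for h in encoder_states])
    weights = softmax(scores)                                    # attention distribution
    context = (weights[:, None] * encoder_states).sum(axis=0)    # weighted sum of encoder states
    return context, weights

hidden = 4
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, hidden))   # six encoder hidden states (toy values)
decoder_state = rng.normal(size=hidden)         # current decoder state
W_dec = rng.normal(size=(hidden, hidden))
W_enc = rng.normal(size=(hidden, hidden))
v = rng.normal(size=hidden)

context, attention_weights = additive_attention(decoder_state, encoder_states, W_dec, W_enc, v)
print(attention_weights.round(3))
print(context.round(3))
```

The context vector lets the decoder focus on the most relevant input positions at each decoding step, which is what improves the alignment of long generated responses to their inputs.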


Table 2.9 Summary of RNN approaches with memory networks to generate a response with their description and issues

NMT with the attention mechanism [84]. Description: one layer of LSTM maps the input while the other layer decodes the output. How it generates a response: adopts a multilayered LSTM to improve the attention span. Issues/drawbacks: little conversational capability; limited vocabulary; trained on a basic conversational dataset only.

seq2seq with glimpse model [43]. Description: attempts to generate the correct output by addressing the shortcomings of the seq2seq model. How it generates a response: the glimpse model scales the ability to train on bigger datasets, which are then used to generate responses from a diversified range by using MAP-decoding. Issues/drawbacks: trained on a basic conversational dataset only.

seq2seq with the attention mechanism [86]. Description: uses the seq2seq model with an attention-based mechanism to generate outputs. How it generates a response: uses the top hidden vector at the decoder side to generate an accurate response. Issues/drawbacks: trained on a basic conversational dataset only.

DeepProbe [85]. Description: uses an attention-based seq2seq RNN to generate output. How it generates a response: translates the input question to a simpler query form, which is then used to determine the best match to the query. Issues/drawbacks: little conversational capability; trained on a basic conversational dataset only.

2.6 Critical Evaluation of the Literature

The aim of this section is to discuss current gaps in the existing techniques used by a chatbot to generate a response to a user's questions. These techniques are rule-based, TF-IDF, end-to-end, RNN, DNN and CNN. Although many approaches, techniques and tools are available, a key challenge is to move towards engaging the user in a meaningful conversation specific to a domain. Furthermore, apart from just responding to the questions being asked, a chatbot should include additional features and capabilities such as spelling correction, identifying errors in a user's question, asking for confirmation to resolve the identified errors, sentence structure correction, abbreviation checks, continuous learning from user-bot conversations, and retrieving up-to-date information from the web and saving it for further use. As summarised in Table 2.5, existing chatbots do not address these requirements and hence there is a need to address them in a chatbot that can answer domain-specific user questions.

Section 2.5 presents a summary of the approaches and techniques used in existing dialogue-based chatbots to build and generate a response. The findings from the summary show that the majority of legacy chatbots use a rule-based approach which lacks any capability to engage in a discussion with the users. They respond to a user question only if it matches predefined rules or a set of questions in the template. The challenge is that it is impossible to write rules for every possible question a user may ask. Furthermore, creating such rules requires a great deal of effort from a subject matter expert. Another issue is that, as the questions are predefined, this approach cannot engage the user in a long, domain-specific conversation if the user asks a variant of a question. To overcome these rule-based limitations, data-driven methods such as TF-IDF, generative-based and RNN approaches have been introduced in the literature. These approaches consider the importance of a word in a document instead of using rules or QA templates. However, as discussed earlier, the responses generated by TF-IDF were not appropriate to the question the user asked. This approach also fails to generate a meaningful response and does not have the capability to engage a user in a conversation. RNN or generative-based approaches have also been used in the literature. However, as discussed in Table 2.8 and Table 2.9, they are not trained on a domain-specific dataset; rather, they are trained only on a basic conversational dataset with a limited vocabulary. Furthermore, most of the generated responses were not meaningful and failed to engage the user in a conversation.

As mentioned in Chapter 1, the focus of this thesis is to engage the user in a meaningful conversation by generating appropriate responses in the insurance domain. To achieve this aim, IntelliBot should understand the question the user is asking before selecting a suitable strategy for generating appropriate responses. Among the multiple generated responses, IntelliBot should then choose the best possible answer before conveying it to the user. To the best of the author's knowledge, and as shown by the summary of the existing approaches in the previous sections, while there have been a number of attempts in this area, no prior framework exists which addresses all the discussed shortcomings in one framework. Hence, this is a gap in the literature that needs to be addressed. In Chapter 3, this gap is formally defined as the problem to be addressed in this thesis.


2.7 Conclusion

This chapter discusses the previous studies in the literature, describes the use and drawbacks of the existing chatbots and classifies them according to the different categories. It also explains the different techniques used in dialogue-based chatbots, and then summarizes the issues within the existing approaches from the perspective of a dialogue-based chatbot. The identified issues will be formally defined as the problem to be addressed in this thesis.


CHAPTER 3

“Research is to see what everybody else has seen, and to think what nobody else has thought.”—Unknown.

PROBLEM DEFINITION

3.1 Introduction

The literature study in Chapter 2 presented the drawbacks of the existing chatbots and the different techniques used in dialogue-based chatbots. In this chapter, the drawbacks which have been identified are defined as the problem addressed in this thesis. This chapter is organized as follows: Section 3.2 defines the key terms used in this chapter. Section 3.3 explains the shortcomings of the existing chatbots in generating meaningful responses. Section 3.4 breaks down the gaps in the literature in terms of the research issues to be addressed to solve them. Section 3.5 discusses the research methodology that is followed in this thesis to solve the research problem. Section 3.6 concludes this chapter.

(Parts of this chapter have been published in [20].)

3.2 Key Terms

Dialogue-based System is an AI software system that is intended to converse with humans in a meaningful way. It addresses features of human-to-human dialogue and aims to integrate them into a dialogue system for human-machine interaction. It is also referred to as a chatbot.

Natural Language Processing (NLP) is an AI method that enables an intelligent system to understand human natural language. The objective of NLP is to read, decipher, understand and make sense of human languages in a manner that is valuable. It performs tasks such as translation, grammar checking and topic classification.

Domain-specific means specialized to a particular application domain or a specific context in a problem domain.

Knowledge-based Database (KBDB) is a database system that uses database concepts and models to store and retrieve knowledge. It typically links and integrates all available knowledge sources, including explicit and inexplicit knowledge. The objective of a KBDB is to make the most relevant knowledge available at the optimal time to enable appropriate decision-making.

3.3 Existing Gaps in Domain-oriented Dialogue-based Chatbots Which Aim to Engage with Customers in the Service Industry

As discussed in Chapters 1 and 2, although there are various domain-oriented dialogue-based chatbots in the service industry, the literature highlights their shortcomings: they are not able to respond to users' complex queries, nor can they engage users in a long and meaningful conversation. Addressing these drawbacks requires developing a chatbot that can converse naturally with customers in a way which is close to indistinguishable from a human. The following sub-sections explain the drawbacks that form the problem addressed in this thesis.

3.3.1 Drawback 1: Use of templates to map questions and answers to respond to user questions

Legacy chatbots use a series of predefined rules to map pairs of questions and answers. This is done in anticipation of what the user will ask, and pattern matching techniques are then used to check whether the user's question matches the predefined rules or questions. Devising these rules requires a great deal of effort from a subject matter expert, and it is impossible to write rules for every possible question a user may ask. Although the rule-based technique is relatively straightforward, its conversation flow is inflexible and it is not efficient in answering questions. The obstacles of rule-based techniques are that they cannot learn on their own, nor can they generate responses for questions that are not defined in the template. Rather, they only provide answers that are defined in the templates. Additionally, this technique leads to conflict when more than one rule satisfies its conditions for a given question.

3.3.2 Drawback 2: Inability to respond to a user’s complex queries

Chatbots are revolutionizing the way organizations interact with their customers. However, as mentioned in Section 3.3.1, existing chatbots can only handle simple queries which are predefined in the template and fail to manage complex queries. Thus, it is crucial for organizations to develop chatbots that address this gap and create a positive image with the customer by keeping them engaged. The existing chatbots in the literature make little use of advanced technologies such as NLP frameworks to analyse user questions or create responses. As they rely only on keywords, they cannot understand the facts and context of the conversation, which results in them communicating with all users in the same way. Hence, these chatbots are unable to build a query to find appropriate responses to a user's complex questions or to reduce the time taken to answer the user.

3.3.3 Drawback 3: Deciding which strategy to select according to the question asked to generate a meaningful and domain-specific response

The existing chatbots in the literature use either a template-based, knowledge-based or neural network-based strategy for general user-bot conversation. The working style and complexity of each response generation strategy is different: a template-based strategy can be used to answer simple questions, while neural networks can be used to answer complex questions. So, on the one hand, combining the strategies to generate a response can make the chatbot more efficient than using an individual strategy; on the other hand, there is a need to instil a decision-making ability in the chatbot so it can decide which strategy to use to generate a meaningful response. Furthermore, most existing chatbots do not take domain-oriented QA into consideration and thus are not suitable for answering domain-specific questions.

3.3.4 Drawback 4: Unable to engage users in a meaningful conversation

As previously discussed, the aim of domain-oriented chatbots in the service industry should be to engage the users in long and meaningful conversations. Existing approaches in the literature use techniques such as TF-IDF, which is based on a frequency distribution and uses a bag-of-words with a limited document size of up to 50k words to generate a response [66]. This impacts the conversational ability of the chatbot to generate meaningful answers. Additionally, existing approaches do not capture text position, semantics and co-occurrences in different articles [65]. Other approaches, such as TF-IDF with NLP and random forest, overcame these problems and showed significant improvement in feature extraction and sentence classification [66]. However, even though they can answer a user's simple queries, these chatbots failed to engage the user in conversation. End-to-end approaches were able to partially overcome this problem, but training the model required a vast amount of labelled data. Moreover, they were built on a single neural network and required a lot of time to generate an output. To engage users in meaningful conversation, the chatbot needs to ask the user relevant questions and provide suggestions and recommendations. In order to do this, the chatbot needs to consider the contextual information, topics and previous conversational data for every query to generate a meaningful response. Existing approaches do not enable chatbots to do this for all conversations, and thus they are not able to engage users in a meaningful conversation.

3.3.5 Drawback 5: Unable to identify errors in user questions

Each language has a different sentence structure, and thus the structure of text, punctuation and the use of spaces differ between them. A chatbot needs to be able to understand these differences to make sense of a question according to its context. Furthermore, when dealing with text-based chatbots, users may use shorthand or make grammatical mistakes when writing their questions. If the chatbot is not able to understand or correct these, it will not understand the user's question and will not be able to generate an appropriate response. All types of errors, such as spelling errors, syntax errors, punctuation errors, semantic errors and non-word errors, need to be addressed so that the chatbot can understand the meaning of the user's question. To correct such errors, the chatbot should first identify them, then notify the user of the mistake and recommend a correction before generating a response. Existing chatbots do not consider such an approach and thus are unable to generate meaningful responses if the user has made an error.


3.3.6 Drawback 6: Unable to learn continuously from a user-bot conversation

As users interact with the chatbot, new patterns or information can be observed from their conversations. By examining the conversations, chatbots can learn new information and store it in a KBDB to generate future solutions. However, existing chatbots do not analyse and extract patterns from user-bot conversations and fail to incorporate the new information into the conversational flow. For example, if a chatbot knows how to answer a question like "how do I add another user?", it can automatically recognize "where do I add another user?" as having the same meaning. Similar phrasings can automatically be added to its knowledge bank so that future questions of the second form can be answered with the same response as the first question. By doing so, chatbots learn and automatically improve the quality of the support they offer to their users. Existing chatbots do not have this ability.

3.4 Research Problem Addressed in this Thesis

To solve the aforementioned drawbacks in dialogue-based, domain-oriented applications, the problem to be addressed in this thesis is defined as follows:

Develop and validate IntelliBot with a modular-based framework by which domain-oriented chatbots can engage with users in natural language and address their questions related to the insurance domain. In generating a response, IntelliBot should have multiple response generation strategies which it can use to address questions of different levels of complexity. Furthermore, IntelliBot should identify and address any grammatical errors or shorthand text which users may use so that an appropriate response to the user's question is generated.

To address the aforementioned problem, the following sub-problems have been identified that need to be addressed:

Sub-problem (1) – Develop IntelliBot’s conceptual model so it can engage with the user and address their queries related to the insurance domain.

The objective of this sub-problem is to develop the conceptual model for IntelliBot with all the sub-components that enable it to perform tasks that range from understanding the user's input to generating an appropriate response. The required sub-components should have NLP capabilities to understand the user's question, context and intent, and then accordingly generate a response. In Chapter 4, the solution to this sub-problem is proposed.

Sub-problem (2) – Develop different response generation strategies that IntelliBot can use to answer a user’s question according to its complexity and the process to train them.

The aim of this sub-problem is to design IntelliBot’s conceptual architecture by building four response generation strategies, namely the template-based strategy, knowledge-based strategy, internet retrieval strategy and generative-based strategy. These strategies are responsible for generating meaningful responses and engaging the user in human-like conversation. In Chapter 4, the high-level design of these four strategies is proposed and in Chapter 5, the process by which responses are generated through each is explained in detail.

Sub-problem (3) – Develop the detailed working of the different sub-components of IntelliBot that assist it to process and understand the user’s input along with correcting the grammatical errors in the user input and the chatbot’s generated output.

As previously discussed, IntelliBot's architecture is modular and has five response generation components, namely the input processing unit, language understanding unit, strategy selection unit, response generation unit and response analyser unit. The purpose of this sub-problem is to develop the working of these five sub-components which enable IntelliBot to respond to a user's query. The Language Understanding Unit (LUU) component understands a user question by breaking it into smaller pieces using various techniques such as tokenization, abbreviation checking, POS tagging, grammar checking, stop word removal, lemmatization, entity extraction and punctuation removal. The working of each of them is detailed in Chapter 6. Users may use shorthand or make grammatical mistakes in their questions. The aim of the grammar error correction component is to correct user questions so that IntelliBot can understand them and generate an appropriate response. Chapter 6 explains the six types of errors, namely structure errors, syntax errors, punctuation errors, semantic errors, spelling errors and non-word errors, which IntelliBot considers when generating a response.

Sub-problem (4) – Develop an approach to collect insurance domain-specific data required to train IntelliBot.

The response generation strategies in sub-problem 2 require domain-specific data for them to train IntelliBot. The goal of this sub-problem is to collect insurance domain-specific data on which training will be conducted so that IntelliBot understands domain-specific terms and keywords and is able to generate an appropriate response using the response generation strategies. A data collection strategy is developed to collect data from various sources such as the knowledge database, the ANZ and Commonwealth Bank websites and the Cornell movie dialogue corpus. As these data are not directly suitable for training IntelliBot, a data preparation technique also needs to be developed. This process is explained in detail in Chapter 7.

Sub-problem (5) – Compare and validate the outputs of IntelliBot with existing chatbots from the literature to demonstrate IntelliBot’s accuracy and superiority in engaging with the users while answering their questions.

The objective of this sub-problem is to evaluate IntelliBot's generated responses to user queries against three publicly available chatbots. To do this, all responses will be recorded and then evaluated by experts to determine their accuracy in relation to the questions asked. The accuracy is measured by the F1 score and Cohen's kappa metrics. Chapter 8 presents the adopted approach for the comparison and validation of IntelliBot's responses in detail.

3.5 Adopted Research Methodology to Solve the Thesis Problem

This thesis addresses the aforementioned research issues and proposes a flexible solution: a modular framework for a domain-oriented chatbot which understands natural language and generates responses. To ensure that the research is conducted systematically, is built on well-tested techniques and tools, and is aligned with data science and machine learning standards, a systematic research approach is followed. Research approaches can be grouped into two categories: 1) the social science approach and 2) the science and engineering approach.


The social science approach observes and analyses human behaviour using empirical methods of research, and a set of hypotheses is formulated based on the observations. The aim of this approach is to accept or reject the hypotheses [87]. A hypothesis is an educated guess regarding what the researchers expect to find. Social science research follows either a quantitative, qualitative or mixed approach [88]. Quantitative research is explorative; it focuses on numerical and unchanging data that can be used to classify features, predict future results and construct statistical models in an attempt to explain what is observed [89]. The qualitative approach, on the other hand, is descriptive and concerns phenomena which can be observed but not measured. The results of qualitative research can vary according to the skills of the observer, although different experts tend to interpret them in a similar manner. While the social science research approach does not develop new technology, it thoroughly evaluates different aspects of existing methods [90].

The science and engineering approach analyses data, makes a prediction and then validates it against observational data [91]. The aim of this approach is to devise scientific theories to explain phenomena and to develop solutions that address the identified problems. Science and engineering research is conducted using either a qualitative or a mixed approach [92]. It is typically experimental and is dependent on architectures, techniques, tools, methods, concepts and the collection of observational data. Its goal is to make something work, which suits the problem to be addressed in this thesis. Thus, this thesis adopts the science and engineering approach, as shown in Figure 3.1. The research approach adopted in this thesis is divided into four phases: theoretical study, addressing the problem, solution design and validation. A brief description of each phase is presented in the next sub-sections.


Fig. 3.1 Research methodology adopted in this thesis to solve the research problem

3.5.1 Theoretical study

This is the initial phase of the research, in which broad study is required in the different areas of AI, namely ML, DNN and RNN, to identify the gaps in the existing research. This includes developing a taxonomy of chatbots and reviewing past and current trends before identifying the drawbacks of existing chatbots. The aim of this phase is to obtain a good foundation of knowledge; to this end, Chapter 2 of this thesis reviewed the previous literature from journal and conference articles related to these areas.

3.5.2 Addressing the problem

Based on the literature review from the previous phase, the purpose of this step is to define the problem the thesis aims to solve and understand the goal of the problem. This was addressed in this chapter of the thesis.

3.5.3 Solution design

In this phase, the aim is to design new solutions and concepts to solve the problem defined in the thesis. In Chapter 4, this thesis proposes the framework for IntelliBot, which is a dialogue-based chatbot that generates meaningful responses to engage the users in continuous conversation. The different components of IntelliBot that are required to achieve this goal are defined in Chapter 4. In Chapter 5, the four response generation strategies which are developed to engage the users are explained. Chapter 6 details the spelling correction and other components required to generate a response. Chapter 7 explains the process of training the deep bidirectional recurrent neural network (DBRNN).

3.5.4 Experiment

The purpose of this phase is to test the accuracy and performance of the IntelliBot framework in generating a response in the insurance domain. The working of IntelliBot is compared with the output of three publicly available chatbots in Chapter 8. The quality of the generated responses is measured using F1 scores and Cohen’s kappa metrics.

3.6 Conclusion

This chapter explains the research problem that is addressed in this thesis. It discussed the shortcomings in the area of dialogue systems, and their inability to generate meaningful responses to domain-specific QAs. The different research issues that need to be addressed to solve the research questions were then presented. The details of the research methodology adopted in this thesis were presented. The next chapter provides a solution overview of the proposed IntelliBot.


CHAPTER 4

“Basic research is what I am doing when I don’t know what I am doing” —Rocket scientist

SOLUTION OVERVIEW

4.1 Introduction

As discussed in Chapter 1, this thesis proposes IntelliBot, a domain-specific chatbot for the insurance industry. The aim of this chapter is to describe the design of IntelliBot's architecture as a modular and scalable framework for continuous natural language conversation that solves user queries specifically in the insurance domain. The proposed architecture facilitates building IntelliBot on various response generation strategies, including a seq2seq model in a deep bidirectional recurrent neural network (DBRNN) with self-learning capabilities. These strategies assist IntelliBot to generate meaningful responses and engage with the user in human-like conversation.

(Parts of this chapter have been published in [20].)

The structure of this chapter is as follows. Section 4.2 defines the key terms needed to introduce IntelliBot. Section 4.3 defines the basic requirements that IntelliBot, as a domain- specific chatbot designed to answer users’ questions, should meet. Section 4.4 categorises the process of building IntelliBot’s framework using a methodological approach with four different steps. Section 4.5 presents the design of IntelliBot’s various response generation components. Section 4.6 concludes the chapter.

4.2 Key Terms

DBRNN (Deep Bidirectional Recurrent Neural Network) consists of two hidden layers running in opposite directions to a single output, allowing the network to receive information from both the previous and the next states. For example, to predict a missing word in a sequence, it looks at both the left and the right context.

RGC (Response Generation Component) is the component responsible for understanding the user's question and generating responses. It uses various response generation models and different NLP techniques to understand what the user is asking and construct responses to it.

Entity is anything having an existence.

DOM parsing uses the Document Object Model to extract information from a tree-like structure such as HTML or XML.

4.3 Requirements of a Domain-specific Chatbot

A key requirement of service-based chatbots is to engage with customers and answer their queries correctly [93]. These queries can span a wide spectrum of the particular service domain. This is important, as research shows that 91% of unhappy customers will not engage with the business again [16]. To keep customers engaged, the chatbot needs to have meaningful dialogue abilities rather than merely providing a yes/no or short response. Dialogue abilities enable the chatbot to converse with the user according to the terminology of the domain. Taking these requirements into consideration, this thesis defines a domain-specific chatbot as follows:

Definition: A domain-specific chatbot for the service industry is one which has the conversational capability to engage with the user while answering their queries. In doing so, it should be trained on a domain-specific dataset so that it can present meaningful responses that contain the semantically correct information.

The important terms in the above definition are underlined to stress their importance in meeting expectations. The meaning of each term is explained in the following.

• Requirement 1 (R1) Conversational capability. Conversations are the core requirement of a chatbot. However, to keep users engaged, rather than merely providing a short yes or no answer, the chatbot should generate accurate and meaningful responses by identifying the topic, intent, entities and context [94].


For example, in relation to the questions "What is the cash advance rate of the credit card?", "What is the interest-free period of my credit card?" and "What is the annual fee of a credit card?", the user is expressing an intent to enquire about their credit card rates and fees. Before answering, the chatbot should ask "which credit card are you referring to?", as the user did not specify this. By identifying the intent and entity, the chatbot can engage the user in a long conversation. Furthermore, the length of the chatbot's response to the query is also a factor in keeping the user engaged and happy. Research has shown that a short answer often leaves the user dissatisfied [95]. For example, in response to the user's statements 'I am not feeling well' and 'I am sad', a chatbot using its conditional response library would simply reply 'How can I help you?' to both. A human being, however, would reply 'How can I help you? Do you need medical help?' and 'I am sorry to hear that. Why are you sad?' respectively. The human response shows empathy, which chatbots should be able to replicate so that their responses relate more to the user. This is justified in [96], which states that chatbots that assist in customer support should not appear too serious and transactional, as they then do not inspire continued use. So, chatbots need to keep customers engaged and have conversational abilities rather than just providing a yes/no or short response.

• Requirement 2 (R2) Present semantically correct information. Semantics helps to bring the correct meaning to a word according to the context of a sentence. For instance, the word "create" can mean build, make, construct or compose. When the context of the sentence is database creation, the word "build" is more appropriate than "compose". In an insurance scenario, for the question "when will the policy cover end?", the word "policy" refers to an insurance plan, not to a strategy or approach as it might in other domains. Thus, a chatbot should have a vocabulary such that it is able to understand and generate responses that are syntactically, pragmatically and semantically correct according to the context of the information presented [21]. A short WordNet-based sketch illustrating such word senses is given after this list of requirements.

• Requirement 3 (R3) Present meaningful responses. When generating a response, the chatbot should not only give a response which is semantically correct but should also provide enough detail for easy understanding [97]. For example, if the user says, "My gold credit card should have arrived two days ago, but it has not arrived yet", the chatbot should give a meaningful response such as "Let me check on the delivery status from the carrier. Give me just a moment", rather than the standard response "the standard shipping time is 3-5 business days". Similarly, for the question "Why are cash advances not permitted on my AMEX card?", a meaningful response from the chatbot would be "You're holding a corporate card. As per the corporate card policy, you're not allowed cash advances", rather than just saying "cash advances are not allowed on AMEX".

• Requirement 4 (R4) Trained on a domain-specific dataset. To build a domain-oriented chatbot, it needs to be trained on a domain-specific dataset rather than just a simple dialogue dataset [94] such as the Cornell movie dialogue or Twitter dataset. This is important for the chatbot to understand domain-specific terms, information and workflow. Training on a simple dataset, such as the Cornell movie dialogue dataset, will enable the chatbot to engage with the user, but only in basic conversation; it will not be able to respond to domain-related queries.
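To make Requirement R2 concrete, the short sketch below (illustrative only, not the thesis implementation) uses NLTK's WordNet interface, also mentioned in Section 5.2, to list the candidate senses of the word "policy"; a domain-specific chatbot has to select the insurance-related sense from among these. The function name candidate_senses is a hypothetical placeholder, and the WordNet corpus is assumed to have been downloaded via nltk.download("wordnet").

```python
# Illustrative only: looking up the candidate senses of an ambiguous word with
# NLTK's WordNet interface (not the thesis implementation).
from nltk.corpus import wordnet

def candidate_senses(word):
    """Return (synset name, definition) pairs for every sense WordNet knows."""
    return [(s.name(), s.definition()) for s in wordnet.synsets(word)]

for name, definition in candidate_senses("policy"):
    print(name, "->", definition)
# A domain-specific chatbot must pick the insurance-related sense of "policy"
# (a contract of insurance) rather than the "plan of action" sense.
```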

This thesis proposes a framework for IntelliBot so that these requirements are achieved. In the next section, the methodological approach by which machine learning engineers, data scientists and chatbot designers can design a chatbot with these requirements is explained.

4.4 Methodological Approach for Designing and Building a Domain-specific Chatbot

To achieve the aforementioned requirements (R1-R4), various design processes, classified into different tasks, need to be implemented. To make IntelliBot practical and implementable, the process of building IntelliBot in this thesis is divided into four tasks, namely identify components, design conceptual framework, develop & train AI model, and experiment & validation, as shown in Fig. 4.1. The following sub-sections discuss the objective to be achieved in each task.

Fig. 4.1 Methodological approach

4.4.1 Identify components

The objective of this task is to identify the different components required to build IntelliBot. These components are as follows:

• Interface component. The interface component links the chatbot with the user through an app or webpage. As shown in Fig. 4.2, this component is responsible for capturing information from the user, checking input validation and forwarding this information to the response generation unit. Finally, it conveys the response to the user.

Fig. 4.2 Components required for building a chatbot application

• Response generation component (RGC). This component is responsible for understanding the user’s question captured by the interface component and generates responses to be given to the user. This component uses various response generation models and different NLP techniques to understand what the user is asking and develop a response to it.

• Data layer component. As shown in Fig. 4.2, this component is responsible for holding both the generic and the domain-specific information required to answer user queries. Various types of data, for example the user profile, questions and answers categorized into different topics, user-bot conversation history and domain-specific knowledge such as credit card insurance, are required for IntelliBot to work effectively. This component is connected to external knowledge or data sources to produce more meaningful answers. In this thesis, a document-oriented database (MongoDB) and a relational database (MySQL) are used to store the required information.

• Integration layer component. This component is responsible for linking the chatbot to existing systems, platforms and databases so that it can access and retrieve the required information, for example from workforce management systems or a third-party service provider. Integration with existing components should be plug & play so that it minimizes development effort and improves system performance and productivity. Another integration component is an authentication layer that manages system security, identifies an individual user and protects their sensitive information, which is verified by an authentication process. It enables resources to be either granted or denied to the user.

As this thesis focuses on developing those components which enable a chatbot to respond to a user query, it concentrates on the Response Generation Component (RGC). In other words, it is assumed that the chatbot has the required interface and access to all information sources and integrates the different systems needed to obtain the data. From this perspective, Fig. 4.3 illustrates the Neural Dialogue Manager (NDM) of IntelliBot. The NDM, which sits inside the RGC, has six different modules to generate responses and engage a user, namely the input processing unit, language understanding unit, strategy selection unit, response generation unit, response analyser unit and context tracking unit. The objective of each unit is explained briefly as follows:

Fig. 4.3 Components required in a response-generating chatbot application

• Input Processing Unit (IPU): This unit processes the user's input, which can be either in text or voice (speech) form, using NLP techniques. In this thesis, we limit ourselves to considering only the user's text as input. The goal of this unit is to clean the user's input text by applying various techniques such as lowercasing, stopword removal and word segmentation for better knowledge discovery before sending it to the next unit. Further details of the working of the IPU are given in Section 4.5.2.

• Language Understanding Unit (LUU): This unit understands the user's question by taking the word segments from the IPU as its input. Various techniques, such as tokenisation, abbreviation checking, POS tagging, grammar checking, named entity recognition, context identification and query classification, are needed to understand the user's question. Further detail of the working of the LUU is given in Section 4.5.3.1.

• Strategy Selection Unit (SSU): This unit is responsible for deciding which conversational strategy to select in order to generate a response and engage users in continuous conversation. Four strategies have been developed for IntelliBot to use, namely the template-based strategy, knowledge-based strategy, internet retrieval strategy and generative-based strategy. An AI selection process which sequentially determines which strategy best fits the specifics of the user's question is adopted. Further detail of the working of the SSU is given in Section 4.5.3.2.

• Context Tracking Unit (CTU): This unit is used in different stages of IntelliBot’s working. It is used to undertake tasks such as determining the intent of the user’s questions, forming the user’s query, performing system actions and handling errors. Further detail of CTU’s working is explained in section 4.5.3.3.

• Response Generation Unit (RGU): This unit generates responses to the user’s query by accessing the required data from multiple data sources according to the selected response generation strategy. As the working of each strategy is different, appropriate components are needed to generate a meaningful response. Further detail of the working of RGU is explained in section 4.5.3.4.

• Response Analyser Unit (RAU): This unit analyses the response generated from the RGU to ensure that it answers the user’s question. Techniques such as response filtering, grammar checking and answer scoring are needed to achieve the objective of this unit. Further detail of the working of RAU is explained in section 4.5.3.5.

4.4.2 Design conceptual framework

The objective of this task is to design in detail the conceptual model of the different units of the RGC introduced above. The design process should follow software engineering principles, AI methodology and the latest development tools that will assist in meeting the objectives of the chatbot. The design of each unit will result in various independent modules that need to be integrated to provide all the necessary services. The details of the various components that comprise the different units of the NDM are explained in Section 4.5.3.

4.4.3 Develop and train AI model

The aim of this task is to build and train IntelliBot so that it generates the most appropriate answer to the users’ queries. The chatbot needs to be firstly trained in conversing in the English language and then trained in using domain-specific information. IntelliBot was trained to converse in English using the Cornell movie corpus dataset and it was trained to answer insurance-related information using the insurance QA dataset. This training was performed using the Tensorflow seq2seq model with an attention mechanism in DBRNN. Chapters 6 and 7 discuss in detail the process of training IntelliBot.
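The thesis trains IntelliBot with a TensorFlow seq2seq model with an attention mechanism in a DBRNN, as detailed in Chapters 6 and 7. The sketch below is only a minimal illustration of the bidirectional-encoder/decoder idea in Keras, with attention omitted for brevity; the vocabulary size, embedding dimension and layer widths are illustrative assumptions, not the configuration used in the thesis.

```python
# Minimal sketch (not the thesis configuration) of a bidirectional LSTM encoder
# and an LSTM decoder for question-to-answer sequence generation.
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size, embed_dim, units = 8000, 128, 256   # illustrative values only

# Encoder: bidirectional LSTM over the tokenised, integer-encoded question.
enc_inputs = layers.Input(shape=(None,), name="question_tokens")
enc_emb = layers.Embedding(vocab_size, embed_dim)(enc_inputs)
enc_out, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(units, return_sequences=True, return_state=True))(enc_emb)
state_h = layers.Concatenate()([fh, bh])        # merge forward/backward states
state_c = layers.Concatenate()([fc, bc])

# Decoder: LSTM initialised with the encoder states, predicting the next token.
dec_inputs = layers.Input(shape=(None,), name="answer_tokens")
dec_emb = layers.Embedding(vocab_size, embed_dim)(dec_inputs)
dec_out, _, _ = layers.LSTM(2 * units, return_sequences=True,
                            return_state=True)(dec_emb,
                                               initial_state=[state_h, state_c])
logits = layers.Dense(vocab_size, activation="softmax")(dec_out)

model = Model([enc_inputs, dec_inputs], logits)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```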

4.4.4 Experiment and validation

The purpose of this phase is to assess the effectiveness of IntelliBot in an actual working environment. To do this, this thesis conducts two empirical sets of evaluation of IntelliBot's output and compares this with the outputs of various publicly available chatbots. Two experts examined the responses of the chatbots and rated them. Metrics such as the F1 score and Cohen's kappa coefficient are used to measure the efficiency and effectiveness of each chatbot. Chapter 8 discusses the process of conducting the experiments and the validation in more detail.

4.5 Proposed Conceptual Model of IntelliBot’s Response Generation Component

Fig. 4.4 shows the high-level conceptual framework of IntelliBot’s RGC. As seen from the figure, there are multiple units that need to work together for the RGC to generate a response. A brief explanation of each unit is presented in the next sub-sections.


Fig. 4.4 Conceptual framework of IntelliBot

4.5.1 User emulator

The user emulator uses an interactive user interface to connect the user and the chatbot. It receives the user's input (question) in natural language and displays the response (output) to the user's query. On the input side, it forwards the user's query to the IPU and LUU via the authentication layer. Both a web browser and a mobile app can act as the user emulator, as shown in Fig. 4.5. Technologies such as AngularJS, HTML5, Bootstrap and the Ionic framework are used to build the user emulator. A RESTful API is used to exchange messages between the user and the AI model.
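As an illustration of the message exchange only, the Flask sketch below shows one way such a RESTful endpoint could look; the route name, JSON fields and the generate_response function are hypothetical placeholders, not the thesis API.

```python
# Minimal sketch of a RESTful message endpoint (Flask); route name, JSON fields
# and generate_response() are hypothetical placeholders.
from flask import Flask, request, jsonify

app = Flask(__name__)

def generate_response(question: str) -> str:
    # Placeholder for the full NDM pipeline (IPU -> LUU -> SSU -> RGU -> RAU).
    return "This is where IntelliBot's generated answer would be returned."

@app.route("/api/message", methods=["POST"])
def message():
    payload = request.get_json(force=True)
    question = payload.get("text", "")
    return jsonify({"reply": generate_response(question)})

if __name__ == "__main__":
    app.run(port=5000)
```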


Fig. 4.5 Mobile and web Interface of IntelliBot

4.5.2 Input Processing Unit (IPU)

The Input Processing Unit (IPU) takes the user's input before queuing and pre-processing it. During pre-processing, the objective of the IPU is to validate the user's input to ensure that it does not violate the predefined rules. Tasks such as removing extra whitespace, extra lines and non-ASCII characters are undertaken in this process. Furthermore, all characters are converted to lowercase and numbers are converted to their word equivalents for better knowledge discovery. For example, if the user presses 'enter' without entering any text, the input fails validation; input that passes the validation rules is forwarded to the LUU.
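A minimal sketch of this kind of pre-processing is shown below. It is illustrative only: the validation rule and the number-to-word conversion are assumptions, and num2words is a third-party package assumed to be installed.

```python
# Illustrative IPU-style cleaning: whitespace/non-ASCII removal, lowercasing
# and number-to-word conversion. Not the thesis implementation.
import re
from num2words import num2words  # third-party package, assumed available

def clean_input(text: str) -> str:
    text = text.strip()
    if not text:                                      # e.g. 'enter' with no text
        raise ValueError("empty input fails validation")
    text = text.encode("ascii", "ignore").decode()    # drop non-ASCII characters
    text = re.sub(r"\s+", " ", text)                  # collapse extra whitespace/lines
    text = text.lower()
    # convert standalone numbers to their word equivalents, e.g. "2" -> "two"
    text = re.sub(r"\b\d+\b", lambda m: num2words(int(m.group())), text)
    return text

print(clean_input("  My 2 credit cards  \n have NOT arrived "))
# -> "my two credit cards have not arrived"
```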

4.5.3 Neural Dialogue Manager (NDM)

As discussed in Section 4.4.1, the NDM is the core of IntelliBot's RGC and is responsible for end-to-end input processing and response generation. As shown in Fig. 4.3, the NDM has five units, namely the Language Understanding Unit (LUU), Strategy Selection Unit (SSU), Response Generator Unit (RGU), Response Analyser Unit (RAU) and Context Tracking Unit (CTU). Fig. 4.6 shows the working of these units in more detail. A brief description of the working of each unit is given next.

Fig. 4.6 Neural Dialogue Manager (NDM) of IntelliBot

4.5.3.1 Language Understanding Unit (LUU)

The Language Understanding Unit (LUU) is responsible for various tasks that aim to understand the meaning of the user's question. It is therefore the glue between the user's input and the other units of the NDM. It receives the user's query from the IPU and parses it into a semantic frame: it automatically classifies the user's query with intents and domain-specific terms and fills slots to form the semantic frame. The objective is to obtain the conditional probability of the user's word sequence [98]. The following tasks are performed by the LUU:

• Tokenization: Tokenization is the technique of chopping a sequence of input text into words, symbols, phrases or other text elements, each of which is known as a token. For example, if the user input is "what is the monthly premium?", there will be six tokens after the tokenization process: "what", "is", "the", "monthly", "premium", "?". This is a mandatory step before any NLP process such as parsing, POS tagging, entity extraction, grammar checking and lemmatization. A combined sketch of several of these LUU steps is given after this list.


• Abbreviation: this is a shorthand form of a word or phrase, used as a symbol for the full form [99] which comprises the initial letters of a collection of words. It should not be considered a spelling error. For example, in a sentence such as “Contact you ASAP”, the abbreviation ‘ASAP’ is not a dictionary word so the chatbot should identify “ASAP” as an abbreviation whose full form is “As soon as possible”. Correct recognition of abbreviations and their full forms is very significant for understanding a user’s query, the user’s context and correcting grammar.

• POS tagging: part-of-speech (POS) tagging explains how a word is used in a sentence. It is the process of assigning a part-of-speech marker, such as noun, verb, adjective, adverb or preposition, to each word in the user's input sequence. In the sentence "book the flight", book is a verb, but in the sentence "give me the book", book is a noun. POS tagging is a disambiguation task where the goal is to find the proper tag for the given word. It is a very important step to understand a sentence, extract relationships and find grammatical or lexical patterns in it.

• Grammar Check: Grammar checking is a task of analysing grammar rules, sentence structure and spelling mistakes that are quite commonly made by users in text-based input. Text containing grammatical errors could lead to the generation of incorrect responses. Therefore, it is essential to be able to identify and correct these grammatical errors. Grammar checking enables the automatic detection and correction of any faulty, unconventional or controversial usage in the underlying grammar.

• Lemmatization: this is a morphological analysis of words. It aims to remove inflectional endings only and transforms the word back to its common base form or dictionary form of a word, called a lemma. For example, the base form of the word “studies” is study. The base form of the word “boys” is boy and the base form of the word ‘running’ is run.

• Entity extraction: this is an information processing step that identifies and extracts named entities and classifies them under various predefined classes such as PERSON, ORGANIZATION, DATE, LOCATION etc. Entity extraction techniques automatically pull proper nouns from text and determine their common entity tags such as person, location, organization and event. For example, in a sentence such as "Nuruzzaman studies at UNSW", entity extraction identifies "Nuruzzaman" as a person and "UNSW" as an organization.

• Punctuation removal: punctuation and stopwords are not necessary for the AI model to predict and generate a response. Punctuation marks are special characters such as ? @ # % & * ! and so on. Stopwords are words such as am, is, was, and, the, an, a, he etc. For example, the input "Hello!!, how? are you?" is processed as "Hello how are you" after punctuation removal.

The working of the aforementioned tasks of LUU is explained in detail in Chapter 7.
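The following sketch strings several of these steps together on a sample question. It is illustrative only, using NLTK rather than the exact toolchain described in Chapter 7, and the required NLTK resources (punkt, averaged_perceptron_tagger, stopwords, wordnet) are assumed to have been downloaded.

```python
# Illustrative LUU-style processing with NLTK: tokenization, POS tagging,
# stopword/punctuation removal and lemmatization. Not the thesis implementation.
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

question = "What is the monthly premium?"

tokens = nltk.word_tokenize(question)   # ['What', 'is', 'the', 'monthly', 'premium', '?']
tagged = nltk.pos_tag(tokens)           # [('What', 'WP'), ('is', 'VBZ'), ...]

stop = set(stopwords.words("english"))
content = [t for t in tokens
           if t.lower() not in stop and t not in string.punctuation]

lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t.lower()) for t in content]

print(tagged)
print(lemmas)                           # e.g. ['monthly', 'premium']
```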

4.5.3.2 Strategy Selection Unit (SSU)

The central part of the NDM is the Strategy Selection Unit (SSU). After the IPU and LUU complete the NLP processing, the SSU identifies and selects the best strategy to generate a response. As shown in Fig. 4.7, four possible strategies are proposed in this thesis for IntelliBot to choose from when generating a response: template-based, knowledge-based (KB), internet retrieval (IR) and generative-based. Each strategy has a different data structure, matching technique and process to generate responses [100]. The NDM selects the most appropriate strategy to generate responses that are not only semantically correct but also meaningful for the user's queries throughout the conversation. An AI selection process is adopted which sequentially determines which strategy best fits the selection criteria. The process by which IntelliBot selects a strategy to generate a response is briefly explained as follows; a consolidated sketch of the selection policy is given after the four strategy descriptions.


Fig. 4.7 Selection policy of AI conversational strategies

Template-based Strategy: This strategy is a collection of predefined rules and is given first priority in answering the user's question. It encodes human knowledge in the form of templates. As shown in Fig. 4.8, after the LUU performs the grammar check on the user input, pattern matching is used to determine whether the user's question matches a template. If it does, the response associated with the matched template is IntelliBot's answer to the user. In other words, the template-based strategy matches the entity of the question identified by the LUU against the AIML rules, as shown in Fig. 4.8. If the user input is recognized, the template is retrieved and the RAU presents the response to the user. If there is no match between the entity of the user's question and what is defined in the template, IntelliBot assesses whether the knowledge-based strategy can be chosen. The working of the template-based strategy is explained in detail in Chapter 5.

Fig. 4.8 High-level workflow of template-based strategy
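The thesis uses AIML rules for this strategy; as a language-neutral illustration only, the sketch below mimics the same pattern-matching idea with regular expressions. The patterns and canned responses are invented examples, not the thesis templates.

```python
# Illustrative template matching with regular expressions, mimicking an
# AIML-style rule lookup. Patterns and canned responses are invented examples.
import re

TEMPLATES = [
    (r"\bhello\b|\bhi\b",     "Hello! How can I help you with your insurance today?"),
    (r"monthly premium",      "Your monthly premium depends on the policy you hold. "
                              "Which policy are you referring to?"),
    (r"cancel (my )?policy",  "I can help with cancellations. May I have your policy number?"),
]

def template_response(question: str):
    """Return the canned response of the first matching template, or None."""
    q = question.lower()
    for pattern, response in TEMPLATES:
        if re.search(pattern, q):
            return response
    return None   # no match: IntelliBot falls through to the knowledge-based strategy

print(template_response("What is my monthly premium?"))
```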


Knowledge-based Strategy: The knowledge-based strategy searches the existing KB database (KBDB) to answer the user's question. As shown in Fig. 4.9, the query engine forms a query to model the user's scenario and determine the facts necessary to generate a response; if those facts are in the KBDB, it accumulates them into a structure and conveys it to the RGU. The RGU determines the semantic similarity of the selected results and, if they match above a certain threshold, passes them to the RAU. The RAU passes the top-scoring answer to the user as the output. If this strategy is not able to match the facts of the question with those stored in the KBDB, the IR strategy is assessed to ascertain whether it can be used to generate a response to the user's question. The working of the knowledge-based strategy is explained in detail in Chapter 5.

Fig. 4.9 High-level workflow of knowledge-based strategy
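As a sketch only, the query engine's lookup could be expressed against a MongoDB-backed KBDB as follows; the database, collection and field names are hypothetical assumptions and do not reflect the thesis schema.

```python
# Illustrative KBDB lookup with pymongo; database/collection/field names are
# hypothetical and do not reflect the thesis schema.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
kbdb = client["intellibot"]["knowledge"]

def kb_facts(topic, entities):
    """Retrieve candidate facts whose topic and entities match the user's query."""
    return list(kbdb.find({"topic": topic, "entities": {"$all": entities}}))

candidates = kb_facts("credit_card_insurance", ["annual fee"])
# The RGU would then score each candidate's semantic similarity to the question
# and the RAU would return the top-scoring answer if it passes the threshold.
```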

Internet-retrieval Strategy: This strategy searches for a possible answer on the Internet or an intranet. As shown in Fig. 4.10, a query is formed with the entity of the question, and the results retrieved from the Internet are stored as text. This text may be huge in volume, as it is crawled from the Internet, and may also include many errors such as spelling mistakes, HTML tags, special characters etc. So, an additional step of DOM parsing and content segmentation is needed before the text is processed by the RGU. The RGU determines the semantic similarity of the selected results with the question and passes them to the RAU, which determines whether they match above a certain threshold. If they do, the result is passed to the user as output. If not, the generative-based strategy is assessed to ascertain whether it can be chosen. The working of the IR strategy is explained in detail in Chapter 5.


Fig. 4.10 High-level workflow of Internet retrieval strategy
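A minimal sketch of the retrieval and DOM-parsing step is shown below. It is illustrative only: the URL is a placeholder, and requests/BeautifulSoup stand in for whatever crawler and parser the thesis actually uses.

```python
# Illustrative internet-retrieval step: fetch a page, parse its DOM with
# BeautifulSoup and keep only the visible text. The URL is a placeholder.
import requests
from bs4 import BeautifulSoup

def fetch_page_text(url: str) -> str:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # drop scripts, styles and other non-content nodes before extracting text
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())

text = fetch_page_text("https://example.com/credit-card-insurance")
# The text would then be segmented and passed to the RGU, which keeps only the
# segments whose semantic similarity to the question exceeds the threshold.
```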

Generative-based Strategy: As shown in Fig. 4.11, the generative-based strategy is based on neural machine translation (NMT) techniques [101]. It "translates" the user's input sentences into an output sentence. This strategy is able to bring up entities from the input sentences and give the impression that the user is speaking to a human, which makes IntelliBot smarter and more advanced than other existing chatbots. However, it requires a complex design and implementation which are comparatively difficult to build. To generate an output, the DBRNN uses the seq2seq model, which trains the AI model by focusing on key elements of the sentence and considering previous input words as an extra piece of information. By doing so, it develops the ability to accurately predict the next word. The generated message is passed to the RAU, which, after filtering, determines whether it matches the question above a certain threshold. If it does, it is passed on to the user as output. The working of the generative-based strategy is explained in detail in Chapter 5 and the process of training is explained in Chapter 7.

Fig. 4.11 High-level workflow of generative-based strategy
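Taken together, the four strategies form a sequential fallback. The sketch below is only a schematic restatement of the selection policy in Fig. 4.7; the function names are hypothetical placeholders, and each strategy function is assumed to return None when it cannot produce an acceptable answer.

```python
# Schematic restatement of the selection policy in Fig. 4.7: try each strategy
# in priority order and fall through when it cannot answer.
from typing import Callable, Optional

Strategy = Callable[[str], Optional[str]]

def select_response(question: str,
                    template_strategy: Strategy,
                    kb_strategy: Strategy,
                    ir_strategy: Strategy,
                    generative_strategy: Strategy) -> str:
    for strategy in (template_strategy, kb_strategy, ir_strategy, generative_strategy):
        answer = strategy(question)
        if answer is not None:          # strategy produced a response above threshold
            return answer
    return "Sorry, I could not find an answer to that."  # safety fallback
```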


4.5.3.3 Context Tracking Unit (CTU)

Irrespective of which strategy is selected for generating responses, the entire conversation is stored in a database for further analysis. This is important as the conversation history may need to be accessed at regular periods of time or the chatbot may need to remind itself of the user's question. The Context Tracking Unit (CTU) keeps track of the user's history, stores it in the database and accesses it when needed. It is also responsible for analysing the intent and identifying the theme or area of the user's conversation. The CTU comprises four sub-components, namely Context Discovery, Dialogue State Tracker, Policy Learner and Error Controller. The following sub-sections describe the need for and role which each component plays in IntelliBot's working.

Context Discovery

The context discovery component is used in every response generation strategy. It is responsible for identifying the context of the user’s query which includes topic detection and intent analysis. Topic detection identifies the subject or area of the domain in the user-bot conversation. Intent analysis identifies the intent. In cases where the user’s question needs to be linked with his previously asked information, the context discovery component retrieves the most recent conversational history of the user from the ChatLog, relevant to the current context. This history is then tokenized, and informative keywords are extracted to determine the context. Techniques such as Stanford CoreNLP [102] are used to determine the topics and intents. When detecting the current context, it takes the previously identified context into consideration.

Dialogue State Tracker (DST)

This component constantly monitors and updates the status of the conversation. This is required in the KB strategy, in which users' inputs are formed as questions and the knowledge from the database is used to respond to them. In this case, the user may ask a question that relates to a question which they asked some time ago. For IntelliBot to answer such a question effectively, it needs to link the user's current question with the previous related question. The Dialogue State Tracker (DST) component of the CTU is responsible for doing this. The DST constantly updates the state of the conversation and builds a robust and reliable representation of its current state. It keeps track of the user inputs, query results and system actions. As IntelliBot focuses on the semantic level, a rule-based state tracker via supervised learning is used [103]. The DST performs the following three major functions:

i. A symbolic query or semantic frame is formed to interact with the database to obtain appropriate results.
ii. The DST updates its state based on the user's dialogue action and the results obtained from the database.
iii. The DST prepares the state representation for the policy learner unit.
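As an illustration of these three functions only, a dialogue state can be kept as a simple dictionary that is updated after every turn; the field names, actions and example values below are invented, not the thesis design.

```python
# Illustrative dialogue state tracker: a dictionary updated on every turn.
# Field names, actions and example values are invented, not the thesis design.
def new_state():
    return {"intent": None, "entities": {}, "history": [], "last_db_results": []}

def update_state(state, user_action, db_results):
    """Update the tracked state from the user's dialogue action and DB results."""
    state["intent"] = user_action.get("intent", state["intent"])
    state["entities"].update(user_action.get("entities", {}))
    state["history"].append(user_action)
    state["last_db_results"] = db_results
    return state            # this representation is what the policy learner consumes

state = new_state()
state = update_state(state,
                     {"intent": "insurance_claim", "entities": {"policy_no": "12345"}},
                     db_results=[])
```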

Policy Learner

The policy learner is responsible for selecting the best next action based on the available results in the database, the retrieved information, the dialogue history, etc. IntelliBot is designed to respond to the user's inputs in a way that achieves the user's goal in a minimal number of dialogue turns. Based on the current dialogue state, the policy learner module generates the next available action. For example, in the insurance scenario, if the dialogue state is "insurance claim", the "insurance_claim" action is executed and IntelliBot retrieves the relevant information from the database. This could be trained using the DBRNN, which simultaneously learns the feature representation and the dialogue policy.

Error Controller

This component is used in the processing of IntelliBot's four conversational strategies. It is responsible for identifying errors, both in the processing of IntelliBot and in the text inputted by the human, and correcting them.

• In relation to the text inputted by the human, natural language understanding does not operate without errors. When IntelliBot detects an error in the user's question, whether grammatical or conceptual, the error controller is used to decide whether to ask the user for confirmation of the corrected meaning. It is important for such errors to be corrected, as without this IntelliBot may either generate an incorrect response or not generate any response at all. Chapter 7 explains in detail the working of IntelliBot's grammar checking component, which uses the error controller of the CTU.


• In the processing of IntelliBot, errors are those tasks that cannot be performed due to programming or logical errors which require software engineers to fix the bugs. In the presence of errors, the error controller of CTU is used to correct them.

4.5.3.4 Response Generator Unit (RGU)

Depending on the strategy chosen, the RGU acts as the decoder of the IntelliBot framework.

• It is possible that the RGU may generate more than one response if the knowledge-based, internet retrieval or generative-based strategy is used. In such cases, the RGU needs to select which response is the most suitable one to answer the user's question. For this purpose, the RGU computes the word and sentence similarity of each response with the question to determine the semantic similarity of the responses to the user's query (a small illustrative sketch follows at the end of this sub-section).

• In the generative-based strategy, the RGU generates the probability of the next word occurring in the sequence using the DBRNN with an attention mechanism. This represents the output in natural language.

Chapter 5 explains how RGU generates a response and Chapter 6 explains the process of determining the semantic similarity of a generated response with the user’s question.
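The thesis computes word- and sentence-level similarity as described in Chapter 6; as a sketch only, TF-IDF cosine similarity stands in here as one simple proxy for ranking candidate responses against the question. The candidate strings are invented examples.

```python
# Illustrative ranking of candidate responses by TF-IDF cosine similarity to
# the question; a simple stand-in for the similarity measures of Chapter 6.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_responses(question, candidates):
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform([question] + candidates)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)

ranked = rank_responses(
    "What is the annual fee of the gold credit card?",
    ["The annual fee depends on the card you hold.",
     "Cash advances are not permitted on corporate cards."])
print(ranked)   # highest-scoring candidate first
```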

4.5.3.5 Response Analyser Unit (RAU)

The Response Analyser Unit (RAU) is the glue between the system and the user. This unit takes responses from the RGU and forwards them to the user. Before doing this, it performs filtering checks to eliminate questionable responses. If there is more than one valid response from the RGU, the RAU applies a scoring process to rank the answers and select the best response, or to merge them into one, before forwarding it to the user. The score assigned by the RAU to rank the answers reflects each answer's relevance to the corresponding question. If the similarity passes the pre-defined threshold, the answer is presented to the user. In this way, IntelliBot provides a balance between accuracy and flexibility in the evaluation process. Chapter 6 explains the working of this unit in detail.
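A minimal sketch of this filter, score and threshold flow is given below; the threshold value, the filter rule and the example responses are illustrative assumptions, and the actual scoring is the one described in Chapter 6.

```python
# Illustrative RAU step: filter questionable responses, then return the
# top-scoring answer only if it passes a pre-defined threshold.
def analyse_responses(scored_responses, threshold=0.5):
    """scored_responses: list of (response_text, relevance_score) pairs."""
    filtered = [(text, score) for text, score in scored_responses
                if text and text.strip()]          # drop empty/blank responses
    if not filtered:
        return None
    best_text, best_score = max(filtered, key=lambda pair: pair[1])
    return best_text if best_score >= threshold else None

answer = analyse_responses([("", 0.9),
                            ("The annual fee depends on the card you hold.", 0.72)])
print(answer)
```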


4.6 Conclusion

In this chapter, the conceptual architectural model of IntelliBot's RGC was introduced. The architecture was designed based on the requirements of a domain-specific chatbot, such as conversational capability, presenting semantically correct information, presenting meaningful responses and training on a domain-specific dataset. The proposed architecture facilitates building IntelliBot on various response generation strategies so that it can engage with customers and answer their queries correctly. A methodological approach is applied to designing and building a robust framework for IntelliBot that enables it to keep users engaged and respond to their queries.


CHAPTER 5

“Software architecture is the set of design decisions which, if made incorrectly, may cause your project to fail.” — Eoin Woods

DESIGN MULTI-STRATEGY SELECTION AND RESPONSE GENERATION

5.1 Introduction

As discussed in the previous chapter, the NDM of IntelliBot has four different strategies to generate an appropriate response to the user's question. These strategies are template-based, knowledge-based, Internet retrieval-based and generative-based. The NDM must select the strategy that meets the requirements mentioned in Section 4.2 and increases the user's involvement. As each strategy has a different data structure, matching technique and working process for generating a response [100], our focus in this chapter is to explain the working of each strategy in detail. Specifically, we focus on the techniques used in each strategy to generate a response. These techniques are used in the SSU of the NDM, as shown in Fig. 4.4. (Parts of this chapter have been published in [20].)

The structure of the chapter is as follows: Section 5.2 introduces the key terms needed to explain the working of the SSU. Section 5.3 briefly explains how IntelliBot selects a strategy to generate a response. Sections 5.4-5.7 explain the working of each strategy in detail. Specifically, Section 5.4 illustrates how predefined rules are used to generate a response via the template-based strategy. Section 5.5 demonstrates the process of using query formation through events and entities required to generate a response in the knowledge-based strategy. Section 5.6 explains the process of extracting information from selected websites to generate a response through the Internet-retrieval strategy. Section 5.7 presents the working of the bidirectional recurrent neural networks used to generate a response in the generative-based strategy. Finally, Section 5.8 concludes the chapter.

5.2 Key Terminology

Context is the particular setting or situation in which the content occurs. The meaning of a sentence is always context dependent.
Event is an occurrence happening at a determinable place and time.
Token is a segmented word in a sentence.
WordNet is a large lexical dictionary of English developed and hosted at Princeton University, available as part of the NLTK corpus. It can be used to find the meanings of words, synonyms, antonyms and more. Approximately 117,000 synsets are found in WordNet.
Web Crawler is an application or set of instructions that analyses web pages in a systematic and automated manner to categorise information on the basis of user demand.
DOM parsing uses the Document Object Model to extract information from a tree-like structure such as HTML or XML.
LSTM stands for long short-term memory, a special kind of RNN architecture that extends the memory of an RNN. It is designed to remember information for long periods of time.
Vector is a set of weights. Several vectors are used in this thesis, namely the word vector, thought vector, embedding vector, hidden state vector and bias vector.
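As an example of the WordNet lookups referred to above, the short Python sketch below uses the NLTK interface to retrieve synsets, definitions, synonyms and antonyms; the words chosen are arbitrary, and the snippet assumes the WordNet corpus has been downloaded via nltk.download('wordnet').

from nltk.corpus import wordnet as wn

# Print the synsets (sets of synonyms) of "premium" with their definitions.
for synset in wn.synsets("premium"):
    print(synset.name(), "-", synset.definition())

# Collect synonyms and antonyms of "cheap" from the lemmas of its synsets.
synonyms, antonyms = set(), set()
for synset in wn.synsets("cheap"):
    for lemma in synset.lemmas():
        synonyms.add(lemma.name())
        antonyms.update(a.name() for a in lemma.antonyms())
print(sorted(synonyms))
print(sorted(antonyms))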

5.3 Strategy Selection Unit’s Workflow to Generate a Response to the User’s Query

As mentioned in Section 4.5.3.2, IntelliBot has four unique response generation strategies. The quality of the response generated by each strategy, along with the way it is generated, differs from strategy to strategy. Thus, the NDM has the challenging task of selecting, in response to a user's question, a strategy that not only generates semantically correct and meaningful responses but also keeps the user engaged throughout. In doing so, the NDM may, within a single conversation consisting of many questions, use different strategies for different questions. In other words, depending on the question asked, the NDM may choose a different strategy to respond to it, irrespective of which strategy was used to answer the previous question from the same user in the same conversation. The schematic representation of the selection process in the SSU is shown in Fig. 5.1, and the sequential process the NDM follows to determine which strategy to select is as follows:

Fig. 5.1 Conversational strategy selection in SSU

• The template-based strategy is the first strategy IntelliBot assesses to determine whether it can be used to generate a response. This strategy has pre-defined patterns that check whether the structure of the user's question matches the predefined rules in the template. If they match, the template is used to generate a response. Otherwise, the suitability of the next strategy is assessed.

• The knowledge-based strategy is the second strategy IntelliBot assesses to determine whether it can be used to generate a response. This strategy identifies the contexts and facts in the user's question and matches them with the information stored in the underlying databases, the user-bot conversation history and any new knowledge learned during the conversation. If they match, the corresponding answers to the questions are presented to the user. Otherwise, the suitability of the next strategy is assessed.


• The Internet-retrieval strategy is the third strategy IntelliBot assesses to determine whether it can be used to generate a response. Its objective is to provide more complete and up-to-date information by identifying the question type, event elements and entities and extracting data from preselected websites. The Internet-retrieval strategy is used when the KB does not have the knowledge the user is asking for (a minimal extraction sketch is given after this list). If the Internet also does not have the required information to generate a response, the suitability of the next strategy is assessed.

• The fourth strategy IntelliBot can use to generate a response is the generative-based strategy, which uses a deep bidirectional RNN with the seq2seq model to produce a conversational output. Its objective is to map previous inputs to subsequent words, predicting the response word by word using the DBRNN.
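The extraction sketch referred to in the third point above is shown here: it fetches a preselected page and pulls out its paragraph text through DOM parsing. The URL is purely hypothetical, and the requests and BeautifulSoup libraries stand in for whatever crawling and parsing components IntelliBot actually uses.

import requests
from bs4 import BeautifulSoup

# Hypothetical preselected website; in practice, the pages chosen for the domain are used.
URL = "https://www.example.com/insurance/faq"

def extract_paragraphs(url: str) -> list:
    # Fetch the page and parse its DOM tree, then keep the text of every <p> node.
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p") if p.get_text(strip=True)]

for paragraph in extract_paragraphs(URL)[:5]:
    print(paragraph)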

The working of each strategy is explained in detail in the next sections.
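Before turning to the individual strategies, the following sketch summarises the cascading selection of Fig. 5.1 as code. The strategy functions are hypothetical stubs: each returns a response string when it can answer and None when it cannot, so the next strategy in the sequence is tried; only the names and toy behaviours are assumed here.

from typing import Optional

def template_based(question: str) -> Optional[str]:
    # Toy rule standing in for the template-based strategy of Section 5.4.
    return "My name is AI Chatbot." if "your name" in question.lower() else None

def knowledge_based(question: str) -> Optional[str]:
    return None      # placeholder: would query the knowledge base (Section 5.5)

def internet_retrieval(question: str) -> Optional[str]:
    return None      # placeholder: would extract data from preselected websites (Section 5.6)

def generative_based(question: str) -> str:
    return "Let me think about that..."   # placeholder for the DBRNN model (Section 5.7)

def select_and_respond(question: str) -> str:
    # Try the strategies in the order of Fig. 5.1; fall back to the generative one.
    for strategy in (template_based, knowledge_based, internet_retrieval):
        response = strategy(question)
        if response is not None:
            return response
    return generative_based(question)

print(select_and_respond("What is your name?"))       # answered by the template-based stub
print(select_and_respond("Is hail damage covered?"))  # falls through to the generative stub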

5.4 Design and Working of the Template-based Strategy

5.4.1 Objective

Depending on the specifics of the user’s question, the objective of the template-based strategy is to generate responses using a pattern-matching technique that matches predefined rules in the template.

5.4.2 Summary of the working of the template-based strategy

The template-based strategy is a collection of predefined question-answer pairs with set rules in the form of templates. It uses a pattern-matching algorithm that identifies the structure of the sentence together with the entities in the user's input. If these match, the output of the pre-defined rule is presented to the user in response. Such a strategy is also termed rule-based, where the rule refers to the formed pattern.

In the pattern-matching process, a user's input passes through the Input Processing Unit (IPU) to the Language Understanding Unit (LUU), as shown in Fig. 5.2. The LUU then performs tokenization, abbreviation handling and grammar checks, and removes punctuation from the user input. Upon selection of the template-based strategy by the Strategy Selection Unit (SSU), the user input sequence is converted to uppercase and passed to the Neural Dialogue Manager (NDM) for pattern fitting. Pattern-fitting normalization determines whether the user input can be found in a predefined template by applying AIML rules. If a template is found, the variables in the template message are set as necessary and the corresponding result is conveyed to the user. The answer is not filtered or checked for grammar correction, as is done by the other response generation strategies, because the answer defined in the template is written by an expert and is assumed to have been checked for correctness.

Fig. 5.2 Design of the template-based strategy

For example, Table 5.1 shows some commonly occurring user questions in the form of patterns, where [ ∗ ] is the pattern-matching variable. In a case where the pattern matches, the response column shows the answer to be given to the user’s question.

Table 5.1 Template-based pattern matching

User Query            Pattern            Corresponding response
Who are you?          WHO ∗ YOU          I am an AI Chatbot for your assistance
Who is Einstein?      WHO IS ∗           Albert Einstein was a German physicist.
What is your name?    WHAT IS YOUR ∗     My name is AI Chatbot.
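To make the matching in Table 5.1 concrete, the sketch below normalises the user input and tests it against the patterns with the wildcard rewritten as a regular expression. It is a plain-Python illustration of the idea rather than the AIML engine IntelliBot uses.

import re

# Patterns and responses from Table 5.1; '*' plays the role of the wildcard '∗'.
TEMPLATES = [
    ("WHO * YOU", "I am an AI Chatbot for your assistance"),
    ("WHO IS *", "Albert Einstein was a German physicist."),
    ("WHAT IS YOUR *", "My name is AI Chatbot."),
]

def to_regex(pattern: str) -> re.Pattern:
    # Escape the literal words and let the wildcard match one or more words.
    parts = [r".+" if word == "*" else re.escape(word) for word in pattern.split()]
    return re.compile(r"^" + r"\s+".join(parts) + r"$")

def template_response(user_input: str):
    normalised = re.sub(r"[^\w\s]", "", user_input).upper().strip()  # remove punctuation, uppercase
    for pattern, answer in TEMPLATES:
        if to_regex(pattern).match(normalised):
            return answer
    return None   # no template matched; the next strategy would be assessed

print(template_response("Who are you?"))        # I am an AI Chatbot for your assistance
print(template_response("What is your name?"))  # My name is AI Chatbot.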

5.4.3 Detailed process of generating a response

IntelliBot uses templates to answer basic questions such as "What is the date today?" and "What is your name?". Patterns for such questions are created using AIML [104], a mark-up language based on an XML dialect used for specifying patterns and rules. AIML has 47 case-sensitive tags for designing the rules of patterns within the template-based strategy to respond to natural language conversations. However, three mandatory tags are required to build a block. These are: <category>, which consists of two more tags: <pattern> and <template>.
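A minimal AIML block illustrating these three tags is shown below, loaded here through the python-aiml package; that package and the temporary-file handling are assumptions made for the example and do not imply the interpreter IntelliBot itself uses.

import os
import tempfile
import aiml   # the "python-aiml" package

# One rule: <category> wraps the rule, <pattern> is the matched input,
# <template> is the reply returned to the user.
AIML_RULES = """<?xml version="1.0" encoding="UTF-8"?>
<aiml version="1.0.1">
  <category>
    <pattern>WHO ARE YOU</pattern>
    <template>I am an AI Chatbot for your assistance</template>
  </category>
</aiml>
"""

with tempfile.NamedTemporaryFile("w", suffix=".aiml", delete=False) as f:
    f.write(AIML_RULES)
    path = f.name

kernel = aiml.Kernel()
kernel.learn(path)                        # load the rule into the interpreter
print(kernel.respond("Who are you?"))     # I am an AI Chatbot for your assistance
os.remove(path)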