Superimposition of Natural Language Conversations over Software Enabled Services
Shayan Zamanirad
A thesis in fulfilment of the requirements for the degree of Doctor of Philosophy
School of Computer Science and Engineering, Faculty of Engineering
January 2020

THESIS/DISSERTATION SHEET
DECLARATION RELATING TO DISPOSITION OF PROJECT THESIS/DISSERTATION

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.
Date of completion of requirements for Award ......

ORIGINALITY STATEMENT
‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’
Signed ......
Date ......

COPYRIGHT STATEMENT
‘I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstract International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.'
Signed ......
Date ......
AUTHENTICITY STATEMENT
‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.’
Signed ......
Date ......

INCLUSION OF PUBLICATIONS STATEMENT
UNSW is supportive of candidates publishing their research results during their candidature as detailed in the UNSW Thesis Examination Procedure.
Publications can be used in their thesis in lieu of a Chapter if:
• The student contributed greater than 50% of the content in the publication and is the “primary author”, i.e. the student was responsible primarily for the planning, execution and preparation of the work for publication.
• The student has approval to include the publication in their thesis in lieu of a Chapter from their supervisor and Postgraduate Coordinator.
• The publication is not subject to any obligations or contractual agreements with a third party that would constrain its inclusion in the thesis.
Please indicate whether this thesis contains published material or not.
☐ This thesis contains no publications, either published or submitted for publication (if this box is checked, you may delete all the material on page 2)
☒ Some of the work described in this thesis has been published and it has been documented in the relevant Chapters with acknowledgement (if this box is checked, you may delete all the material on page 2)
☐ This thesis has publications (either published or submitted for publication) incorporated into it in lieu of a chapter and the details are presented below
CANDIDATE’S DECLARATION
I declare that:
• I have complied with the Thesis Examination Procedure
• where I have used a publication in lieu of a Chapter, the listed publication(s) below meet(s) the requirements to be included in the thesis.
Name Signature Date (dd/mm/yy)
Postgraduate Coordinator’s Declaration (to be filled in where publications are used in lieu of Chapters)
I declare that:
• the information below is accurate
• where listed publication(s) have been used in lieu of Chapter(s), their use complies with the Thesis Examination Procedure
• the minimum requirements for the format of the thesis have been met.
PGC’s Name PGC’s Signature Date (dd/mm/yy)
For each publication incorporated into the thesis in lieu of a Chapter, provide all of the requested details and signatures required.
Details of publication #1:
Full title: “Dynamic Event Type Recognition and Tagging for Data-driven Insights in Law-Enforcement”
Authors: Shayan Zamanirad; Boualem Benatallah; Moshe Chai Barukh; Carlos Rodriguez; Reza Nouri
Journal or book name: Springer Computing Journal
Volume/page numbers: NA
Date accepted/published:
Status: ☐ Published ☒ Accepted and In press ☐ In progress (submitted)
The Candidate’s Contribution to the Work
I am the first author of the paper. I am the primary contributor to the design of the proposed concepts and techniques, and to their implementation and experimentation. I acknowledged my collaborators in the Acknowledgment section.
Location of the work in the thesis and/or how the work is incorporated in the thesis:
Chapter 3: Event Embeddings - Data-driven Insights in Law-Enforcement
Primary Supervisor’s Declaration
I declare that:
• the information above is accurate
• this has been discussed with the PGC and it is agreed that this publication can be included in this thesis in lieu of a Chapter
• all of the co-authors of the publication have reviewed the above information and have agreed to its veracity by signing a ‘Co-Author Authorisation’ form.
Supervisor’s name Supervisor’s signature Date (dd/mm/yy)
Dedication
To my mother, who keeps showering me with her love and kindness. Your love and prayers kept me moving forward.
To the rock of my life, my father, who kept asking me when my research would end even though I kept giving him the wrong answers. I will try my best to make you proud.
To my only brother, who is my only best friend, my hero. God bless you for all you did for me.
You are my life, a diamond in my life, my happiness, my strength; you are everything to me... I love you, I love you very much, forever... My Dianne.

Acknowledgements
I would like to thank all the people who supported me during this major step in my life: my family, friends and colleagues who helped me complete this thesis.
• I would like to thank my sponsor, Data to Decisions Cooperative Research Centre, for giving me the opportunity to be involved in this endeavour.
• I would like to thank my supervisor, Scientia Professor Boualem Benatallah. You have been such an inspiration and guide during these years. Without you, I would never have been able to do this work. Your energy and knowledge have always amazed me. Thank you for honing my skills to become a better researcher, to be tough in the face of failure, and to be impactful and ethical in all my work. I really appreciated and enjoyed the opportunity to work with you every day during these years.
• Thanks to Professor Fabio Casati for his helpful guidance and comments on my work. I really appreciated the opportunity to work with you. It was always useful to have your expert eye on my work.
• I am thankful to everyone in the Service-Oriented Computing (SOC) group at UNSW, especially Dr Carlos Rodriguez and Dr Moshe Chai Barukh. Thanks to Dr Seyed Mehdi Reza Beheshti for your invaluable help. Special thanks to my former colleague Reza Nouri for all the fun and tough moments we had together during the projects. I gained an enormous amount of knowledge from you.
• Special thanks to MohammadAli for all his helpful insights and suggestions. I will never forget our tea times in the kitchen where most of our ideas emerged and came to life.
• My brother, Mojtaba, thanks for the good laughs and fun times. I am glad that I had you in my Ph.D. journey.
• My older sister, Maisie, thank you for everything. Your comments and advice were always helpful in achieving this milestone.
• Special thanks to John and Olga, my brother and sister from Colombia. I hope to keep your friendship for the rest of our lives. John, you are amazing. I will never forget the conference trips we enjoyed together.
• To my aunties, Roghi and Latifeh, my uncle Kia, and my grandmother Esi: thank you for your prayers and for being present in spirit all this time.
• Special thanks to FlatIcon0. Most of the icons used in this thesis are from FlatIcon.
0https://www.flaticon.com
Page ii of 194

Abstract
Digital assistants and their instantiations in the form of messaging or chat bots, software robots and virtual assistants have become the quintessential engine for understanding user needs, expressed in natural language, and for fulfilling such needs by invoking the appropriate back-end software services. Continuous improvements in Natural Language Processing (NLP), Artificial Intelligence (AI), messaging interfaces and devices allow natural language-based interactions between users and a deluge of software enabled services, including interactions with “data sources”, “applications”, “resources” and “physical assets” (e.g., sensors). Increasingly, organisations leverage digital assistants to increase productivity and automate business processes in various application domains, including office tasks, travel, healthcare and e-government services.
Nonetheless, despite early adoption, digital assistant technologies are still in their preliminary stages of development, with several unsolved theoretical and technical challenges stemming from the lack of effective support for a wide range of possibly ambiguous user intents and for leveraging the large and growing number of services. More specifically, the lack of latent knowledge representing the different types of software services, and the lack of support for complex interactions between users and services, inhibit the design and engineering of effective and efficient techniques that harness the full potential of natural interactions between users and software enabled services.
This thesis advances the fundamental and practical understanding of natural language based conversations between users, resources, services and devices. In this thesis we build upon advances in NLP and entity recognition and devise novel concepts and techniques to address important shortcomings in natural-language based conversational systems. Inspired by word embeddings, their extensions and impacts, we develop novel vector space models and techniques to capture, represent and reason about rich latent knowledge about user intents, semi-structured and textual artefacts (e.g., emails), data structures (e.g., attributes in an indexing schema over data sources) and API elements (e.g., API methods) to support potentially ambiguous natural language user requests and tasks. We develop extended state-machine based models to capture conversation patterns among users and services. We provide validation and evaluation of the proposed models and techniques.
Contents
Acknowledgements i
Abstract iii
Contents v
List of Figures xii
List of Tables xviii
1 Introduction 1
1.1 Background, Motivations and Aims ...... 1
1.2 Research Issues ...... 4
1.2.1 From Raw Unstructured Information Items to Semantically Annotated In- formation Items ...... 4
1.2.2 Schema-less and natural language access to heterogeneous information sources ...... 5
1.2.3 User Intents and APIs integration ...... 7
1.3 Contributions ...... 8
1.4 Thesis structure ...... 12
2 Background and State of the Art 13
2.1 Intent Recognition ...... 15
2.1.1 Rule based Techniques ...... 16
2.1.2 Traditional Classification-based Techniques ...... 19
2.1.3 Deep learning-based Techniques ...... 21
2.2 Dialogue Management ...... 22
2.2.1 Flow Based Models ...... 23
2.2.2 Deterministic State Machines based Models ...... 25
2.2.3 Probabilistic State Machines based Models ...... 30
2.2.4 Memory based Learning Models ...... 31
2.3 Natural Language Generation ...... 33
2.4 Term Embeddings ...... 35
2.4.1 Word Embedding ...... 35
2.4.2 Sentence Embedding ...... 37
2.4.3 Document Embedding ...... 40
2.4.4 Domain Specific Embeddings ...... 41
3 Event Embeddings 42
3.1 Introduction ...... 43
3.2 Related work ...... 45
3.3 Evidence Collection & Analysis ...... 47
3.4 Dynamic Event Type Recognition and Tagging ...... 49
3.4.1 Training Data Generation ...... 51
3.4.1.1 Datasets ...... 52
3.4.1.2 Training Data for Event Type Recognition ...... 53
3.4.2 Event Type Recognition ...... 54
3.4.2.1 Event-Type Vector Encoding ...... 55
3.4.2.2 Tuning Event-Type Vectors ...... 55
3.4.2.3 Event-Type Recognizer ...... 57
3.4.2.4 Event Information Extraction ...... 58
3.4.2.5 Event Recognizer REST APIs ...... 59
3.4.3 Insights and Discovery ...... 60
3.5 Experiments ...... 61
3.5.1 Event-Type Recognition ...... 62
3.5.1.1 Conventional Validation ...... 62
3.5.1.2 K-Fold Cross Validation ...... 63
3.5.1.3 Human Validation ...... 63
3.5.1.4 Effect of Word Embedding Model ...... 64
3.6 Discussion ...... 66
3.7 Concluding Remarks ...... 67
4 Attribute Embedding based Indexing of Heterogeneous Information Sources 69
4.1 Introduction ...... 70
4.2 Related work ...... 72
4.3 Security Vulnerability Information Model ...... 77
4.4 Collecting, Enriching and Indexing Security Vulnerability Information ...... 79
4.4.1 Security Vulnerability Information Collection, Adaptation and Enrichment 80
4.4.2 Security Vulnerability Information Indexing ...... 81
4.5 Security Vulnerability Information Embedding ...... 83
4.5.1 Training the Vector Space Model (VSM) ...... 84
4.5.2 Attribute Embedding ...... 85
4.5.3 Tuning Attribute Embedding ...... 86
4.5.4 Attribute Value Embedding ...... 87
4.5.5 Attribute/Value Recognition REST APIs ...... 88
4.6 Security Vulnerability Information Querying with NL Support ...... 89
4.7 Evaluation ...... 93
4.8 Conclusion and Future Work ...... 97
5 API Elements Embeddings 98
5.1 Introduction ...... 98
5.2 Related Work ...... 101
5.2.1 Summary ...... 111
5.3 Approach overview ...... 111
5.4 API Knowledge Graph (API-KG) ...... 115
5.5 Deriving API vectors ...... 116
5.5.1 Training the Vector Space Model (VSM) ...... 117
5.5.2 Populating the Knowledge Graph ...... 118
5.5.2.1 API Description Embedding ...... 119
5.5.2.2 API Method Embedding ...... 120
5.5.2.3 API Parameter Embedding ...... 121
5.5.3 API Parameter Enrichment - Acquiring Mentions ...... 123
5.6 Building bots using API-KG ...... 125
5.6.1 API-KG REST APIs ...... 125
5.6.2 Bot development scenario ...... 126
5.7 Experiments ...... 131
5.7.1 Bot Development using only API-KG ...... 131
5.7.1.1 Results - Benefits of using API-KG ...... 132
5.7.1.2 Results - Issues of using API-KG ...... 133
5.7.2 Bot Development using Bot-Builder ...... 134
5.7.2.1 Results - Searching with API-KG vs others ...... 134
5.7.2.2 Results - Bot Development using Bot Builder ...... 135
5.7.3 Bot Development by Non-developers ...... 137
5.7.4 Effect of Vectorization Techniques ...... 138
5.8 Conclusions and Limitations ...... 140
6 Multi-Turn and Multi-Intent User Chatbot Conversations 142
6.1 Introduction ...... 142
6.2 Human-Chatbot Conversations ...... 144
6.3 Conversation State Machines ...... 146
6.3.1 Transitions between States ...... 148
6.4 Generating State Machines ...... 150
6.4.1 Generating Intent States from Bot Specification ...... 150
6.4.2 Generating Transitions between States ...... 150
6.5 Conversation Manager Service ...... 153
6.5.1 Conversation Manager Architecture ...... 154
6.5.2 State Machine Generator ...... 154
6.5.3 Dialog Act Recogniser ...... 154
6.5.4 Slot Memory Service ...... 155
6.5.5 User-Chatbot Conversation Scenarios ...... 155
6.6 Extended Bot Builder - Conversations Support ...... 157
6.6.1 The Bot Builder Architecture ...... 158
6.6.2 Defining Bot Specification ...... 158
6.7 Validation ...... 159
6.7.1 Chatbot Development by Developers ...... 159
6.8 Conclusions and Future Work ...... 161
7 Conclusion 162
7.1 Summary of the Research Issues ...... 162
7.2 Summary of the Research Outcomes ...... 163
7.3 Future Research Directions ...... 164
BIBLIOGRAPHY 165
List of Figures
1.1 Research Approach ...... 12
2.1 Relevant APIs to the search query ...... 14
2.2 An utterance belongs to an intent and contains entities ...... 16
2.3 Defined pattern/response pairs for Greeting intent ...... 16
2.4 Defined rules to return corresponding results from database [8] ...... 17
2.5 Rules written by AIML (Left) and Rivescript (Right) scripting languages ..... 18
2.6 An excerpt of training dataset for machine learning based intent recognition model 20
2.7 Extracted features to define rules for entity extraction (e.g. meeting duration) from email content [51] ...... 21
2.8 Given a process model, the approach generates a set of rules to deploy a flow-based chatbot [153] ...... 24
2.9 Quick reply is an instant input (answer) from user ...... 24
2.10 A Carousel contains list of items with more details ...... 25
2.11 Building a flow-based chatbot to book doctor appointments by using the Chatfuel platform ...... 26
2.12 Dialog management by exploiting a defined finite state machine [112] ...... 27
2.13 An excerpt of conversation between user and Ava [109] ...... 27
2.14 User interaction with Daisy to explore available features and supported models [140] 28
2.15 Template code for Pearson Correlation command includes sample utterances and follow-up questions for required arguments, all defined by developer [69] . . . . 29
2.16 Iris state machine that supports command composition - call methods recursively, and sequencing - referencing previous command results [69] ...... 30
2.17 Devy’s finite state machine for handling workflows [192] ...... 31
2.18 An over-informative answer from the user to a question asked by a state-based chatbot [169] ...... 32
2.19 A sequence model (e.g. LSTM) considered as a black box with a source sequence (user utterance) and a target sequence (chatbot response) ...... 33
2.20 Two-dimensional projection of vector space model that represents countries and their capital cities [176] ...... 35
2.21 Skip-gram model vs CBOW model - w(t) is the target word and w(t−2)...w(t+2) are context words [176] ...... 36
2.22 Subword-level information in FastText - given the word “accomodation”, a misspelling of accommodation, we still get the closest words ...... 38
2.23 A sequence to sequence machine translation neural network - encoder and decoder are connected together through a hidden state which represents the input sentence [232] ...... 39
2.24 Two-layer DAN - it first converts the given sentence to a vector by averaging all its words, then feeds a feed-forward DNN to generate a new sentence [107] ...... 40
3.1 Illustration of an investigator’s workspace...... 48
3.2 Our framework for dynamic recognition and tagging of event-types for insights and discovery ...... 50
3.3 Detailed architecture for Training Data Generation, Event Type Recognition, and Insight and Discovery...... 51
3.4 Excerpt of one of the case files in our case dataset...... 52
3.5 Excerpt of the gold standard data showing two sentences and their corresponding event-types...... 52
3.6 Event Type Vector Encoding Using Seed n-grams ...... 54
3.7 Tuning Event Type Vector by using training dataset ...... 56
3.8 Recognition of Event Types from Evidence Logs ...... 58
3.9 Swagger documentation for Event Recognition Service ...... 59
3.10 Case Walls for Law-Enforcement ...... 61
3.11 Event Type Recognizer - Precision/Recall/F-Score for 0-50% training set (testing set is fixed on 50% of gold dataset) ...... 62
4.1 User can add (A) additional entities (rows) and (B) additional columns from list of suggestions [292] ...... 75
4.2 Example of a CI similarity query - find similar customers based on their purchased items [25] ...... 76
4.3 Converting tables into sentences to create a corpora to train embedding model [25] 77
4.4 (a) Security Vulnerability Information Model [54]. (b) Architecture for collecting, enriching, indexing and querying security vulnerability information. The bottom part of the architecture operates offline, while the upper part does it online. . . . . 79
4.5 Index-ready JSON representation of security vulnerability information model (introduced in Figure 4.4(a)): (a) JSON schema of information model (shaded attributes correspond to enrichments), (b) Example of a single document containing vulnerability information, (c) JSON schema for storing attribute mentions, (d) example of two attributes and their possible mentions ...... 82
4.6 Pipeline for constructing attribute- and value-embeddings ...... 84
4.7 Attribute Vector Encoding Using Extracted Words ...... 86
4.8 Tuning Attribute Vector by using mentions ...... 87
4.9 Generating Value Vector by using indexed values ...... 88
4.10 Swagger documentation for Attribute/Value Embedding Service ...... 89
4.11 Steps for NL to ElasticSearch’s DSL query translation...... 90
4.12 Dependency tree indicates the attachments between tokens (words) in an NL query 91
4.13 Translating NL queries into ElasticSearch’s DSL query...... 92
5.1 An answer in StackOverflow which adds more details to the API documentation [255] ...... 104
5.2 A partial code completion - (Hx) are empty places which are fulfilled by SLANG [212] ...... 106
5.3 An example of API sequences and annotation for a Java method IOUtils.copyLarge [83] ...... 107
5.4 An example of Stack Overflow post used to extract the keyword-API mapping: (a) question, (b) accepted answer [208] ...... 110
5.5 Typical Bot development process (Left) vs Bot development process using API- KG (Right) ...... 112
5.6 Approach overview ...... 113
5.7 Finding relevant APIs for the given goal ...... 114
5.8 API Knowledge Graph (Yelp API) ...... 116
5.9 Generating an API description vector: (i) Keyword extraction, (ii) Keyword ex- tension, (iii) Generating the final vector by averaging the vectors of keywords . . 119
5.10 Crowdsourcing task to provide three paraphrases per annotated utterance ..... 121
5.11 Generating API method embedding - (i) API owner provides an initial utterance that describes best the interaction with method, (ii) initial utterance is then para- phrased by crowd workers to collect more utterances, (iii) collected paraphrases are then used to generate a method embedding ...... 122
5.12 Generating an API parameter vector: (i) Value extraction, (ii) Value extension, (iii) Generating the final vector by averaging the vectors of values ...... 123
5.13 Choosing a target synset: (i) Retrieve synsets from BabelNet, (ii) Extract key tokens from the names of synsets, (iii) Generate a vector per synset by averaging the vectors of tokens, (iv) Choose the synset with the closest vector to the vector of the parameter ...... 124
5.14 Swagger documentation for API-KG ...... 126
5.15 Relevant APIs to the search query ...... 127
5.16 Building chatbot using Bot Builder ...... 128
5.17 Relevant API Methods to seed utterances ...... 129
5.18 Utterances associated to an API Method ...... 130
6.1 Types of human-chatbot conversations - from less to more natural ...... 145
6.2 User changes the intent to know about her calendar schedule ...... 146
6.3 Transition between intent-states based on user intent - current intent-state is de- noted by blue color, “new intent” transition is highlighted in orange ...... 147
6.4 Transition to nested slot-value state - current nested slot-value state is denoted by red color ...... 148
6.5 Transition to nested slot-intent state - user’s answer is a request to another intent, state machine moves from “location” nested slot-value state to a nested slot-intent state (“GetUserDetails”) to obtain the value for the missing slot ...... 149
6.6 Conversation Manager Architecture ...... 153
6.7 Bot Builder Architecture - Automated Chatbot Development ...... 157
List of Tables
2.1 An illustrative example of a user utterance ...... 23
3.1 Examples of Evidence Sources ...... 47
3.2 Event Type Recognizer - Precision(P)/ Recall(R)/ F-Score(F)/ Average(Avg) for 5-Fold Cross-Validation ...... 64
3.3 Event Type Recognizer - Precision(P) and Recall(R) while using different embed- ding models ...... 66
4.1 Sample of questions asked when diagnosing security vulnerabilities (extracted from [233]). Terms that are relevant for the security vulnerability domain are underlined...... 78
4.2 Examples of adapted questions. The questions in bold font are the original ques- tions (Q) from [233], while questions in regular font are examples of adapted questions (AQ). We used a total of 65 variants of these AQs for the evaluation. . . 94
4.3 Evaluation results. We report average values for |Rel|. We also report average values for R-Precision when using no embedding, GoogleNews embedding, Wikipedia embedding and Security embedding. We use the metric P@10 for questions with large |Rel| [226]. Entries marked with N/A mean that the approach was not able to return any results whatsoever ...... 95
5.1 Examples of API Methods, their associated utterance, and possible paraphrase ...... 120
5.2 Comparison between API method vectorization techniques for given natural lan- guage search queries ...... 140
6.1 Examples of Dialog Acts in a conversation between user and chatbot...... 152
Chapter 1
Introduction
1.1 Background, Motivations and Aims
Software enabled services are central to the operation of digital processes [14, 28]. They are pervasive in core processes that streamline the delivery of structured services such as HR, procurement, payroll and banking (e.g., loans). They are also pervasive in support processes that enhance the efficiency and effectiveness of indirect activities and streamline the management of day-to-day tasks (e.g., sending emails, scheduling meetings, recording notes, managing tasks). Continuous improvements in connectivity, user interfaces and software platforms allow access to a deluge of software enabled services, including data services, Internet of Things (IoT) services, document management services, cloud resource services, task management services and platform services. With the advent of widely available software enabled technologies, coupled with intensifying global competition and fluid business and social requirements, organizations are rapidly shifting to the digitization of their processes. Accordingly, organizations have embraced the radical changes necessary for increased productivity and effectiveness. Capabilities arising from advances in digital transformation technologies have enabled organizations to increase productivity, embrace automation, and extend business to locations far beyond their normal operations. At all levels, software-service-enabled digital transformation is now firmly recognised as a strategic priority for modern organizations [138, 197]. It is also firmly recognised that the online service-enabled economy, also called the digital economy, is a central priority for economic development. As economies undergo significant structural change, digital strategies and innovation must provide industries across the spectrum with the tools to create a competitive edge and build more value into their services [19, 138, 52].
Clearly, advances in online service technologies have already transformed the Internet into a global workplace, social forum, and collaboration and business platform, allowing services to be delivered via the Internet from any location to any other. Services integration has also matured in recent years [18, 3]. More specifically, while Software as a Service (SaaS) transformed the enterprise software industry over the last decades by reducing the cost of software-enabled operations and improving the automation of processes, the enabling engine for scalable integration and automation is the Application Programming Interface (API). APIs allow programmatic access to heterogeneous and autonomous data sources and applications via standard protocols and languages. They are at the heart of integrating services and streamlining process automation. In a nutshell, once an online service has reached a threshold of popularity, organisations are competitively compelled to implement APIs in order to allow third-party developers to write auxiliary or satellite ‘apps’ that add new uses to the original service, enrich its features and accessibility, enhance its agility and accelerate overall development and integration. As mentioned before, APIs essentially unlock application, data source and device silos through standardized interaction protocols and access interfaces [190, 138, 102]. They are the glue of online services and their interactions. They are fundamental to the Web; social media already depend heavily on APIs, as do cloud and enterprise services (e.g., document management tools, databases, platforms, applications, appliances, IoT and sensors) [3, 152].
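To make the notion of programmatic access via APIs concrete, the sketch below composes a REST query URL and parses a JSON response for a hypothetical weather service. The endpoint, parameter names and response fields are illustrative assumptions, not any real API.

```python
import json
from urllib.parse import urlencode

# Hypothetical REST endpoint (illustrative only, not a real service).
BASE_URL = "https://api.example.com/v1/weather"

def build_request_url(city: str, units: str = "metric") -> str:
    """Compose a REST query URL from typed parameters."""
    return f"{BASE_URL}?{urlencode({'q': city, 'units': units})}"

def parse_response(payload: str) -> dict:
    """Extract the fields a client application cares about from a JSON response."""
    data = json.loads(payload)
    return {"city": data["name"], "temp_c": data["main"]["temp"]}

print(build_request_url("Sydney"))
# A response body an HTTP client might receive from such a service:
print(parse_response('{"name": "Sydney", "main": {"temp": 22.5}}'))
```

The same pattern of standardized request construction plus machine-readable responses is what lets third-party applications, and later conversational bots, programmatically reuse a service.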
However, while software enabled services and APIs have enabled organisations to increase efficiency and streamline services integration and process automation, new usability, productivity and effectiveness challenges have also emerged. Human users involved in process delivery typically access services by separately using a variety of (cloud-based) software tools and interfaces such as apps, Web sites, and productivity, task management and collaboration interfaces. For instance, users may access, query, integrate and analyze data using common user productivity services (e.g., spreadsheets, database interfaces, and SaaS applications and tools such as CRM, social media, email and collaboration tools) over underlying data services. Characteristic examples of this paradigm are numerous across government, enterprise and the consumer arena. Consequently, processes are in general hidden and highly unstructured, i.e., there is no visibility or traceability over end-to-end process interactions. In addition, even sophisticated end users regularly resort to low-efficiency manual methods to draw information from one service and use it elsewhere to support their tasks. This problem worsens as the variety of available services and the flexibility of available tools increase. On the other end of the spectrum, structured processes, known as automated business processes, are managed using dedicated enterprise software systems called workflow management systems [257]. The main drawbacks of structured processes stem from their inherent development and maintenance costs; they thus fail to cater for the much-needed agility of today’s dynamic environments.
Users should be empowered to benefit from the power of SaaS and APIs when performing their day-to-day activities in digitally enabled processes. However, a commonly overlooked limitation of SaaS technologies is that they do not make available services accessible to human users in a natural manner. For instance, Web browsers and applications allow users to access underlying information and services through clicking, scrolling and form filling on visual interfaces. At the same time, conversational Artificial Intelligence (AI) and its instantiations in the form of messaging or chat bots, task-oriented conversational bots, software robots, and digital or virtual assistants have emerged as a new paradigm for naturally accessing services and performing tasks through natural language (text or voice) conversations with software services and humans. Conversational AI based services enable the understanding of user needs, expressed in natural language, and the fulfilment of such needs by invoking the appropriate back-end services. We will use the term conversational bots to refer to all the different instantiations of conversational AI based services.
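As a minimal sketch of this paradigm, the following toy bot loop maps an utterance to an intent and then dispatches to a back-end handler. The intent names, keyword sets and handler strings are invented for illustration; production systems use trained classifiers or embedding models rather than keyword overlap.

```python
# Illustrative intents and keyword sets (an assumption for this sketch).
INTENTS = {
    "get_weather": {"weather", "forecast", "temperature", "rain"},
    "book_meeting": {"schedule", "meeting", "appointment", "book"},
}

def recognize_intent(utterance: str) -> str:
    """Pick the intent whose keyword set overlaps most with the utterance."""
    tokens = set(utterance.lower().split())
    scores = {name: len(tokens & kws) for name, kws in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def fulfil(intent: str) -> str:
    """In a deployed bot this would invoke the corresponding back-end API."""
    handlers = {
        "get_weather": "calling weather service",
        "book_meeting": "calling calendar service",
        "unknown": "asking user to rephrase",
    }
    return handlers[intent]

print(recognize_intent("will it rain tomorrow"))       # get_weather
print(fulfil(recognize_intent("book a meeting at 3")))  # calling calendar service
```

The split between intent recognition and fulfilment mirrors the understand-then-invoke structure described above, even though the recognition step here is deliberately naive.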
Applications such as Apple Siri, Google Assistant, Amazon Alexa, Baidu and Microsoft Cortana have enabled an increasingly large number of conversational bots in different application domains such as marketing, health and customer support. At present, efforts in conversational bots focus on making software technologies human-inclusive: more than 100 million Alexa devices and almost 1 billion Google Assistant devices have collectively been installed in our homes1. Similarly, organizations are also embracing conversational bots as “side-by-side digital co-workers” [80, 75]. Forrester analysts predict this will be the case for more than 40% of all companies by the end of 2019 [116].
Conversational AI is part of a larger and major transformation in software enabled services, namely AI enabled process augmentation, including the augmentation of human work and AI-powered digital assistants [66, 156, 192]. From a service engineering perspective, the integration of bots and software enabled services has not kept pace with our ability to deploy individual devices and services. Despite advances in various research areas on the one hand, and the greater availability and
1https://www.cnet.com/news/google-assistant-expands-to-a-billion-devices-and-80-countries/
malleability of data and services on the other, the process still breaks down when we attempt to put it all together. For instance, current bot development techniques rely on human understanding of different APIs and extensive manual programming to produce bots that interact with enterprise services. This is clearly unrealistic in large-scale and evolving environments. The ubiquity of conversational bots will have little value if they cannot easily integrate and reuse concomitant capabilities across a large number of evolving and heterogeneous information sources, databases, resources, devices and applications. This is a very ambitious objective, and the investigation of related research issues requires meaningful integration of concepts and techniques from machine learning, knowledge representation and extraction, API engineering, and natural language conversations between users and back-end services.
In this thesis, we contribute novel abstractions and techniques focused on re-conceptualizing the integration of existing natural-language based conversational systems and back-end services to better leverage available software-enabled capabilities across a large number of evolving and heterogeneous information sources, databases, resources, devices and applications. The abstractions and techniques seek to enable and, indeed, semi-automate the augmentation of services with the latent knowledge and interaction models that are essential to reason about potentially ambiguous user tasks and semi-structured artefacts (e.g., emails, PDF files), and to support natural language interactions between users, bots and back-end services (e.g., integrated data services and APIs).
1.2 Research Issues
1.2.1 From Raw Unstructured Information Items to Semantically Annotated Information Items
Most knowledge-driven processes involve accessing and understanding a large number of items, including documents (e.g., PDF and Word files, spreadsheets), conversations (e.g., emails, tweets, social media posts, text messages, interactions in collaboration platforms like Yammer and Slack) and other information (e.g., notes), to extract information, generate insights and make decisions.
This is the case, for instance, in law enforcement investigation processes. Investigation cases may last for years. Investigators have to collect, annotate and organize troves of information (e.g.,
witness statements, forensic reports and telephone intercepts) to identify evidence. Knowledge-driven processes are highly cognitive, both with respect to collecting and analysing information and to inferring dependencies between pieces of information to eventually produce insights (e.g., evidence and evidence items). Despite advances in various research areas, most cognitive tasks are performed manually; this is no doubt tedious, error prone and highly inefficient [12, 207]. In investigations, for instance, this may lead to offences not being identified due to limited manual processing power. Search and query systems are not accurate in identifying specific evidence elements (e.g., events such as phone calls, bank transfers, travel movements). It is very time consuming for investigators to keep track of relevant events (facts about things that happened in reality) and identify possible offences from raw evidence items and logs. Overall, traditional search, enterprise search and query techniques merely scratch the surface of the knowledge-driven process analysis problem, as their applicability and effectiveness are very limited, and consequently their usefulness in real-life situations such as investigations is negligible.
The challenge is devising scalable and effective entity-based enrichment, exploration and manipulation techniques that will unlock knowledge-driven processes. For instance, to enable investigators to understand large-scale investigation items, there is a need to extract relevant entities from information items, find similarities among them, classify them into groups, and more. Effective cognitive support would not only help extract events, but also analyse and attach semantics to evidence items and answer natural language questions to identify information that is relevant to a given event, entity, offence, etc. Users should be able to identify evidence items that mention events. There should also be assistance in reconstructing chains of events, identifying the parties involved, and understanding their temporal dynamics, among other aspects. Without such techniques, knowledge-driven investigations will be tedious, as they would be like querying a database in which relations among tables are not modelled. Such a development represents a significant undertaking, not only in terms of the effort and time required, but also in complexity.
1.2.2 Schema-less and Natural Language Access to Heterogeneous Information Sources
As mentioned before, the ubiquity of conversational bots will have little value if they cannot easily integrate and reuse concomitant capabilities across a large number of evolving and heterogeneous information sources, databases, resources, devices and applications. Daily knowledge-driven processes are conducted through ad-hoc processes that require access to information stored in various data sources and services. No single information source can fully support the requirements of a given process in terms of information inquiries. For instance, cyber security analysts and professionals need integrated access to information to become aware and informed about security vulnerabilities. They find information from several sources (e.g., vulnerability databases, security bulletins and advisories, social media) to identify newly discovered vulnerabilities, learn about existing vulnerabilities, identify new exploits and vulnerable software packages, link information, etc. [294].
However, as in several other domains, while much of the security vulnerability information may be available from both private and public information sources, such information is in many cases scattered across different, heterogeneous and complex information silos (e.g., vulnerability databases, security bulletins and advisories, social media) [295]. Conducting knowledge-driven processes over complex information silos is time consuming, frustrating, error prone, repetitive, and often bloated with unnecessary work. Even when sophisticated indexing techniques such as Elasticsearch [81] are used to provide an integrated view over various data sources, the interfaces used to access and integrate information are not appropriate for domain analysts [115]. Query interfaces used to access information through integrated and indexed sources are in general keyword search, SQL-like or Domain Specific Language (DSL) based [219]. Keyword search techniques are known for their limited accuracy [115, 219]. The other techniques presuppose technical expertise comparable to that of professional users, including employing different low-level APIs to access various data sources, together with procedural data flow constructs [91, 50, 92].
While existing techniques in data management, information retrieval and indexing have produced promising results that are certainly useful, more advanced techniques are needed that cater for the understanding of ad-hoc and knowledge-driven processes, interact with users in natural language, improve productivity, and effectively support users' daily tasks. More specifically, existing indexing and integration techniques lack latent knowledge about index and information source attributes. This knowledge is essential to reason about potentially ambiguous user intents and effectively map them to queries over indexed data sources.
1.2.3 User Intents and API Integration
While enriching unstructured information items with metadata such as events, and supporting natural language queries over integrated information sources, are important requirements for the effective integration of bots and software enabled services, modern organisations also use a large number of applications such as task and document management (e.g., Trello, Jira, Google Docs, Dropbox) and collaboration apps (e.g., Slack, Yammer). In addition, APIs can be used to provide uniform access to these apps and facilitate their integration. As mentioned before, APIs are the engine of online services integration, which is essential for streamlining both knowledge-driven processes (e.g., managing documents through Google Docs or Dropbox APIs) and structured processes (e.g., triggering workflows using a workflow management system's APIs).
APIs enable an actionable framework across many different applications, data sources and devices. It is estimated that there are already 50,000 APIs, and that this number will grow rapidly over the next few years [73]. This growth will come from APIs for linking cloud resources and applications as well as APIs for appliances, mobile devices, sensors and vehicles. We believe that the seamless integration of bots and APIs will chart an effective new paradigm to make conversational services both user and process-centred. A main hard challenge in achieving this new paradigm is linking low-level API abstractions (e.g., API calls) to high-level bot abstractions (e.g., user intents, user utterances) [289].
In existing bot and API integration techniques, developers leverage NLP and machine learning capabilities to recognise user intents [165, 271, 227]. However, the burden of integrating user intents with APIs is shifted to developers, who must bolt intents onto existing low-level APIs. This leads to an inflexible and costly environment which adds considerable complexity, demands extensive programming effort, requires extensive bot training and perpetuates closed cloud solutions. Effective integration of natural language conversations and APIs instead requires rich API abstractions to reason about potentially ambiguous user intents and effectively integrate intents and APIs. We need more dynamic and knowledge-driven techniques that provide high-level, latent-knowledge based reasoning about API elements (i.e., descriptions, methods and parameters) and automated support for matching intents and APIs. Furthermore, a user intent may be complex, and its realisation may require complex conversations between users, bots and APIs. Consequently, designing effective integration of natural language conversations and API-enabled services remains
a deeply challenging problem.
1.3 Contributions
We build upon advances in NLP, machine learning, information indexing, knowledge graphs, and dialog and conversation modelling techniques. We contribute innovative concepts and techniques to scale the integration of natural language based and software enabled services [28]. The proposed concepts and techniques resolve important gaps in the integration of natural language based conversations and software enabled services. They enable new efficiencies to bridge these gaps, including: (i) the enrichment of unstructured information items with entities and event types supporting their semantic understanding; (ii) the semantic augmentation of indexing attributes of integrated information sources to effectively support multi-entity mentions and ambiguous user queries over heterogeneous data sources; (iii) latent-knowledge based middleware techniques and services to support effective integration of user intents and APIs; (iv) hierarchical state machine based models to represent and reason about complex interactions between users, bots and APIs. We investigate and develop software architectures, prototypes, evaluation studies and applications to assess the proposed models and techniques.
From Raw Information Items to Semantic Information Items - Enrichment of unstructured information items with entities and event types.
This study was conducted in the context of knowledge-driven law enforcement investigations1. The study presented formidable challenges that are relevant not only to this domain but to most knowledge-driven processes where semantic understanding of raw unstructured information items (e.g., emails, PDF files, communication messages, social media posts) is required. Our objective is to enable investigators to understand, organize and query the large amount of unstructured information items collected and generated during investigation processes. We build upon advances in NLP (e.g., extracting information from unstructured information items) [91, 48], word embeddings [176] and knowledge-based enrichment (e.g., extracting entity mentions from knowledge graphs) [185] to enable the recognition of events from investigation case-related information (e.g., collected evidence items). We encode event types as vectors in a vector space model
1Data to Decisions CRC, Data Curation Foundry project
based on the distributional semantics of sentences in evidence items. The proposed approach confers robust event recognition because it caters for the automated identification and enrichment of variations and word mentions across various information items. Event and event-type vector based similarity and matching techniques are then used to identify when sentences in evidence items relate to a particular event type.
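As an illustration of this idea, the toy Python fragment below recognises an event type by comparing an averaged word-vector representation of a sentence against event-type vectors via cosine similarity. The vectors, vocabulary, event types and threshold are all invented for the example and stand in for real pre-trained embeddings; this is a minimal sketch, not the thesis' actual implementation.

```python
import math

# Toy word vectors standing in for pre-trained embeddings (e.g., word2vec);
# in practice these are loaded from a model trained on a large corpus.
WORD_VECS = {
    "called":   [0.9, 0.1, 0.0],
    "phoned":   [0.8, 0.2, 0.0],
    "paid":     [0.1, 0.9, 0.1],
    "transfer": [0.0, 0.8, 0.2],
}

def sentence_vector(sentence):
    """Average the vectors of known words (the sentence's distributional semantics)."""
    vecs = [WORD_VECS[w] for w in sentence.lower().split() if w in WORD_VECS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Event-type vectors built from representative sentences for each event type.
EVENT_TYPES = {
    "PhoneCall":    sentence_vector("called phoned"),
    "BankTransfer": sentence_vector("paid transfer"),
}

def recognise_event(sentence, threshold=0.7):
    """Return the best-matching event type if its similarity exceeds the threshold."""
    vec = sentence_vector(sentence)
    scores = {ev: cosine(vec, v) for ev, v in EVENT_TYPES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

print(recognise_event("The suspect phoned the witness"))  # "PhoneCall"
```

Because matching operates on vectors rather than exact words, a sentence mentioning "phoned" still matches an event type whose seed sentences used "called", which is the robustness to lexical variation described above.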
Semantic Augmentation of indexing attributes in integrated information sources.
This study was conducted in the context of knowledge-driven security vulnerability investigations2 [219]. The study presented formidable challenges that are relevant not only to this domain but to most knowledge-driven processes, where integrated data services must be endowed with capabilities to reason about potentially ambiguous natural language information search and linkage queries and to effectively map them to structured queries over an integrated index of a large number of heterogeneous information sources. Our objective is to improve the accuracy of the data retrieval operations involved in security vulnerability information search and linkage (e.g., identifying newly discovered vulnerabilities, learning about existing vulnerabilities, identifying new exploits and vulnerable software packages) from various information sources (e.g., vulnerability databases, security bulletins and advisories, social media) [115]. The proposed techniques allow security analysts and professionals to leverage available information sources to gain insights into potential or existing security vulnerabilities and improve awareness and security assurance strategies in general.
We build upon advances in NLP (e.g., natural language queries) [48], word embeddings [176, 174] and knowledge-based enrichment (e.g., WordNet, BabelNet and ConceptNet) [178, 186, 235] to enable the augmentation of indexing attributes with semantics that is essential to support multi-entity mentions and ambiguous user queries over heterogeneous data sources. We propose a novel attribute-based embedding indexing mechanism over vulnerability information data sources that leverages knowledge graph-based enrichment. We devise query mapping techniques that are able to translate natural language queries into Elasticsearch queries. These techniques allow retrieving and correlating vulnerability information from large and heterogeneous information sources. They do not require users to have precise knowledge of index or information source schemas.
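The query-mapping idea can be illustrated roughly as follows: terms extracted from a user query are linked to index attributes through embedding similarity, and the matched attributes parameterise an Elasticsearch-style bool/match query. The embeddings, attribute names and pre-extracted term/value constraints below are hypothetical simplifications of the approach described above.

```python
import math

# Toy embeddings for index attributes and query terms; in the proposed approach
# these would come from word embeddings enriched with knowledge-graph synonyms.
EMB = {
    "vendor":   [0.9, 0.1],
    "company":  [0.85, 0.2],
    "product":  [0.1, 0.9],
    "software": [0.15, 0.85],
}

INDEX_ATTRIBUTES = ["vendor", "product"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def map_term_to_attribute(term):
    """Pick the index attribute whose embedding is closest to the query term."""
    return max(INDEX_ATTRIBUTES, key=lambda attr: cosine(EMB[term], EMB[attr]))

def nl_to_es_query(constraints):
    """Translate {term: value} constraints into an Elasticsearch-style bool query."""
    return {"query": {"bool": {"must": [
        {"match": {map_term_to_attribute(term): value}}
        for term, value in constraints.items()
    ]}}}

# Constraints extracted from a query such as
# "vulnerabilities reported by company Microsoft in software Windows":
print(nl_to_es_query({"company": "Microsoft", "software": "Windows"}))
```

Note that the user says "company" and "software" while the index uses "vendor" and "product"; the embedding-based mapping bridges this vocabulary gap, which is exactly why users need no precise knowledge of the index schema.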
2Data to Decisions CRC, Data Curation Foundry project
Latent-knowledge based middleware techniques and services to support effective integration of user intents and APIs.
Effectively integrating conversational services with the information and functionality that is accessible through APIs will allow these services to take full advantage of available software-enabled technologies and keep up with the rapid increase of data and process enabled opportunities. We therefore set forth the semi-automated integration of user intents in conversational services with a potentially large and evolving set of APIs as a key feature to achieve the above objective [289]. This will allow a new generation of conversational services in which ad-hoc process tools (e.g., apps) and structured process management systems (e.g., workflow management systems) are augmented with conversational digital assistance. This type of assistance will have bot-like natural language conversational interfaces, with various layers of integration intelligence as its core components. It will enable a framework where conversational services and APIs work in tandem to unify disparate process and work tools, giving life to new services that allow processes to benefit from the power of NLP, conversational AI and APIs.
We build upon advances in NLP (e.g., user utterance understanding, entity recognition) [48], word embeddings [176, 174] and knowledge-based enrichment (e.g., extraction of entity mentions from knowledge graphs) to enable the augmentation of API elements (e.g., descriptions, methods and invocation parameters) with semantics that is essential to support the effective mapping of high-level user intent abstractions (e.g., user goals, user utterances) to low-level API abstractions (e.g., API calls). We propose to represent API elements as vectors in an extended vector space model. We combine crowdsourcing, knowledge graph and word embedding techniques to build and enrich API element embeddings. We propose knowledge-powered middleware techniques and services to interact with APIs based on how users would call the method in natural language.
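A highly simplified sketch of matching a user utterance to an API method follows. It uses plain word overlap between the utterance and API method descriptions in place of the enriched API element embeddings proposed above, and the method names and descriptions are invented, not taken from a real API specification.

```python
# Hypothetical API catalogue: method name -> short natural language description.
API_METHODS = {
    "calendar.events.insert": "create add new event meeting appointment",
    "drive.files.list":       "list show files documents folder",
}

def tokens(text):
    return text.lower().split()

def score(utterance, description):
    """Word-overlap score; a stand-in for similarity over API element embeddings."""
    return len(set(tokens(utterance)) & set(tokens(description)))

def match_api(utterance):
    """Return the API method whose description best matches the utterance."""
    return max(API_METHODS, key=lambda m: score(utterance, API_METHODS[m]))

print(match_api("add a meeting to my calendar"))  # calendar.events.insert
```

In the thesis' setting, the descriptions would be replaced by vectors enriched via crowdsourcing and knowledge graphs, so that "set up a call with the team" could still reach `calendar.events.insert` without sharing any literal word with its description.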
Hierarchical state machine based models to represent and reason about complex interactions between users, bots and APIs.
A key distinguishing feature of conversational services is dialog patterns, i.e., the interaction styles needed to fulfil user intents (e.g., a question from the bot to the user to resolve the value of a missing intent parameter, an invocation of an API by the bot to resolve the value of a missing parameter, a question from the bot to the user to confirm an inferred intent value or make a choice among several options, or the extraction of an intent parameter value from the history of user and bot interactions)
[99]. Interactions involve user utterances, conversation management acts and actions (e.g., API calls, natural language response generation). Instead of relying on low-level scripting mechanisms or provider-specific rule engines, we argue that models and languages for describing natural language conversations between users, bots and APIs should be endowed with intuitive and automation-friendly constructs that can be used to specify a range of dialog patterns. We propose the concept of conversation state machines as an abstraction to represent and reason about dialog patterns. Conversation state machines represent multi-turn and multi-intent conversations, where states represent intents, their parameters and the actions that realise them. Transitions between states are triggered when certain conditions are satisfied (e.g., detection of a new intent, detection of a missing intent parameter). Transitions automatically trigger actions to perform the desired intent fulfilment operations. We propose the automated generation of run-time nested conversation state machines that are used to deploy and control conversations with respect to user intents.
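The following minimal sketch illustrates the state-machine idea: a state holds an intent and its parameters, a missing-parameter condition triggers a question to the user, and a completed parameter set triggers the fulfilment action. The BookTable intent, slot names and action are illustrative only, not the thesis' model.

```python
# Minimal sketch of a conversation state machine: states hold an intent and its
# parameters; transitions fire when a condition (e.g., a missing parameter) holds.

class ConversationStateMachine:
    def __init__(self, intent, required_params, action):
        self.intent = intent
        self.required = required_params
        self.action = action          # fulfilment operation (e.g., an API call)
        self.params = {}

    def missing(self):
        return [p for p in self.required if p not in self.params]

    def step(self, user_slots):
        """One conversation turn: absorb new slot values, then ask or fulfil."""
        self.params.update(user_slots)
        gaps = self.missing()
        if gaps:                           # transition: missing parameter detected
            return f"What is the {gaps[0]}?"
        return self.action(self.params)    # transition: intent complete

book = ConversationStateMachine(
    "BookTable", ["restaurant", "time"],
    action=lambda p: f"Booked {p['restaurant']} at {p['time']}",
)
print(book.step({"restaurant": "Da Mario"}))   # asks for the missing "time"
print(book.step({"time": "7pm"}))              # fulfils the intent
```

Nesting such machines, as proposed above, would let a new intent detected mid-conversation push a child machine onto the current one and return control once the sub-dialog completes.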
1.4 Thesis structure
The thesis structure follows the research approach presented in Figure 1.1. In each chapter we present the work related to the study reported in that chapter to motivate the study and illustrate its relevance. We provide essential background knowledge and terminology in Chapter 2. Chapter 3 proposes and presents techniques to automate event identification and enrichment in raw unstructured information items. Chapter 4 proposes and presents an attribute-based embedding indexing mechanism to support flexible natural language queries over structured vulnerability information stored in data services. Chapter 5 proposes and presents latent knowledge-driven techniques to facilitate the interaction with APIs via intent-based conversations (e.g., task-oriented conversational bots). Chapter 6 proposes and presents an extended state-machine based model and techniques to support multi-turn, multi-intent natural language conversations between users and services.
Figure 1.1: Research Approach. (Chapter 2: Review of background and state of the art approaches; Chapter 3: Enrichment of unstructured information items with entities and event types; Chapter 4: Semantic augmentation of indexing attributes in integrated information sources; Chapter 5: Latent-knowledge based middleware techniques and services to support effective integration of user intents and APIs; Chapter 6: Hierarchical state machine based models to represent and reason about complex interactions between users, bots and APIs.)
Chapter 2
Background and State of the Art
In this chapter, we discuss background on dialog systems and on distributed representations of words and their contexts (i.e., word embeddings and their extensions). We introduce the main concepts and techniques that are relevant to the contributions we present in the following chapters. As mentioned in the introduction, our work investigates language understanding, extended term embeddings and conversational services.
In Section 2.1, we discuss user intent recognition techniques. Section 2.2 discusses existing approaches for dialog management. In Section 2.3, we overview techniques to generate natural language responses. Finally, in Section 2.4, we discuss the main term embedding models.
Part I - Dialog Systems
Dialog systems are computer programs that provide natural language conversations between users and software systems. The input of such systems is natural language utterances (text or voice). The system generates an appropriate response (in the form of text or voice) back to the user. Dialog systems are generally categorized into two classes [39]: (i) non-task oriented, and (ii) task oriented.
Figure 2.1: The main components of a task-oriented dialog system: Natural Language Understanding (NLU), Dialogue Management (DM) and Natural Language Generation (NLG)

Non-task oriented dialog systems focus on open domain conversations with users (i.e., conversations with no predefined goal). As reported in [39], around 80% of conversations in an online shopping scenario are chit-chat messages. Examples of this type of dialog system include:
the DBpedia chatbot [10], which answers faceted questions sourced from DBpedia, and Cleverbot1 and Mitsuku2, which handle open-domain conversations. Non-task oriented dialog systems hardly keep track of conversation states and are therefore not designed to perform specific user tasks (e.g., travel booking, task management) [253]. In general, three main approaches have been proposed to build non-task oriented dialog systems:
• Providing question and answer pairs in the form of handcrafted rules, which suffer from scalability and flexibility issues in large-scale cases [229, 224].
• Exploiting generative sequence-to-sequence models to build an entire phrase word by word, conditioned on a user utterance [262, 77].
• Retrieval-based methods, which learn to select responses from external repositories (e.g. knowledge graphs [280, 97, 179]).
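As a rough illustration of the third, retrieval-based approach, the fragment below selects a stored response by lexical overlap with the user utterance. The repository and overlap score are invented for the example; real systems learn matching models over large response corpora or knowledge graphs.

```python
# Toy retrieval-based response selection: score candidate responses from a
# repository by lexical overlap between the utterance and each stored key.

REPOSITORY = [
    ("what is your name", "I'm a demo bot."),
    ("tell me a joke", "Why did the chicken cross the road?"),
    ("how is the weather", "Sunny, as far as I can tell."),
]

def overlap(a, b):
    """Number of words two texts share."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def select_response(utterance):
    """Pick the stored response whose key best overlaps the utterance."""
    key, response = max(REPOSITORY, key=lambda kr: overlap(utterance, kr[0]))
    return response

print(select_response("please tell me a joke"))
```

Unlike the generative approach, a retrieval-based system can only ever return responses that already exist in its repository, which keeps replies fluent but limits coverage.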
Task-oriented dialog systems, or simply chatbots3, on the other hand, allow users to accomplish a goal (e.g., maintain schedules [51], organise projects [253]) using information provided by users during conversations. Chatbots perform tasks [39] by utilising several specialised components. These components, together with their interactions, are shown in Figure 2.1:
• Language Understanding Component: the Natural Language Understanding (NLU) component parses user inputs into a structured format that can be used by chatbots. We will discuss the functions of this component in Section 2.1.
1https://www.cleverbot.com/ 2https://www.pandorabots.com/mitsuku/ 3Since the focus of this dissertation is on task-oriented dialog systems, the term “chatbots” refers to this type of dialog systems for simplicity.
• Dialogue Management Component: this is the core component of a chatbot [39]. It manages the conversation flow, checks user inputs in each turn, and chooses the next actions based on the conversation history [39, 67]. In Section 2.2, we will detail existing techniques and approaches for this component.
• Natural Language Generation (NLG) Component: generating human-like responses as a result of actions performed by chatbots (e.g., invoking an API, querying a database) is the responsibility of this component [228]. Later, in Section 2.3, we will discuss NLG techniques in more detail.
2.1 Intent Recognition
In order to be able to interact with users, it is essential for chatbots to understand user intentions, a task known as intent recognition [120, 205, 86].
An intent refers to users’ purposes, which a chatbot should be able to respond to [205, 39]. For example, if a user says “Show me some Italian restaurants near UNSW”, the user wants to know about available restaurants in a specific area. Intents are specified using short names such as “FindRestaurant” or “BookTable”. Typically bot developers define a number of intents that the bot will be able to handle.
Defining an intent requires feeding chatbots with a set of sample user utterances. An utterance refers to anything that a user says whilst conversing with a chatbot [39, 296]. As an example, when a user says “Show me some Italian restaurants near UNSW”, the entire sentence is the utterance. Often there may be several utterances that say the same thing. For example, there are various ways to ask about “finding a restaurant” (e.g. “is there any Italian restaurant near UNSW?”, “Can you help me to find a italian resto around UNSW?”). Based on the user's utterance, a chatbot can recognize the intent. Thus, providing more, and more diverse, utterances can help in the creation of more robust chatbots [281].
User utterances may carry important information that is necessary for a chatbot to understand in order to be able to serve the correct answer [77, 39]. Such information is known as entities or slots. These entities have types (e.g., date, time). In the previous example, when the user says “Show me some Italian restaurants near UNSW”, the term UNSW is an entity of type location and it indicates that the user is referring to a specific place. Figure 2.2 shows an example that illustrates the relationship between intents, utterances and entities. In the following sections, we will describe different approaches to recognize user intents.

Figure 2.2: An utterance belongs to an intent and contains entities (utterance: “Show me some Italian restaurants near UNSW”; intent: findRestaurant; entities: “Italian”, “UNSW”)
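To make the intent/utterance/entity relationship concrete, the following toy fragment extracts typed entities (slots) from the example utterance using a small hand-made gazetteer. The gazetteer and entity types are invented for illustration; production NLU components instead use trained sequence taggers.

```python
# Illustrative slot extraction: map known surface forms to typed entities.

GAZETTEER = {
    "unsw":    ("location", "UNSW"),
    "italian": ("cuisine", "Italian"),
}

def extract_entities(utterance):
    """Return {entity_type: value} for gazetteer terms found in the utterance."""
    found = {}
    for token in utterance.lower().replace("?", "").split():
        if token in GAZETTEER:
            etype, value = GAZETTEER[token]
            found[etype] = value
    return found

print(extract_entities("Show me some Italian restaurants near UNSW"))
# {'cuisine': 'Italian', 'location': 'UNSW'}
```

Together with an intent label such as findRestaurant, these slots give the dialogue manager everything it needs to act on the utterance.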
2.1.1 Rule based Techniques
A traditional approach to mapping user utterances into intents is to adopt hand-crafted rules [224]. In this approach, bot developers define intents by encoding a set of intent recognition rules. Such rules are in the form of pattern/response pairs (shown in Figure 2.3). They are similar to if-statements in programming languages [268].
+ pattern: hi bot response: Hi human!
+ pattern: my name is
+ pattern: how are you response: I am good, how about you?
Figure 2.3: Defined pattern/response pairs for Greeting intent
For a given user utterance, a chatbot recognizes its intent by matching all rules against the utterance. It does this by checking the order of words in the utterance. For example, if a user says “My name is John”, since the first three words “My name is” match the second pattern in Figure 2.3, the chatbot concludes that the user intent is Greeting and that the answer should be “Nice to meet you, John.”, considering that the remainder of the utterance after the matched pattern is taken as the user's name.
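This matching behaviour can be sketched as follows. The treatment of the trailing wildcard (a "*" capturing the user's name) is an assumption about how such pattern/response rules are typically written, not taken literally from Figure 2.3, and the exact response strings are illustrative.

```python
# Minimal pattern/response matcher in the spirit of Figure 2.3: a trailing "*"
# wildcard captures the rest of the utterance (here, the user's name).

RULES = [
    ("hi bot", "Hi human!"),
    ("my name is *", "Nice to meet you, {0}!"),
    ("how are you", "I am good, how about you?"),
]

def respond(utterance):
    text = utterance.lower().rstrip(".!?")
    for pattern, template in RULES:
        if pattern.endswith("*"):
            prefix = pattern[:-1].strip()
            if text.startswith(prefix):
                captured = utterance[len(prefix):].strip(" .!?")
                return template.format(captured.title())
        elif text == pattern:
            return template
    return "Sorry, I did not understand."

print(respond("My name is John"))  # Nice to meet you, John!
```

The brittleness is visible even in this sketch: "John is my name" matches nothing, which is exactly the scalability problem of rule-based intent recognition discussed below.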
Figure 2.4: Defined rules to return corresponding results from a database [8]
The authors of [8] proposed a pattern-matching system that uses rules (shown in Figure 2.4) to map user natural language utterances into SQL queries. In [10], the authors propose a rule-based chatbot based on DBpedia to improve community discussions and interactions in the DBpedia community. At the core of the DBpedia chatbot, there is a request router that identifies the type of user question (intention). There are three different types of questions: (i) questions about DBpedia, e.g., “Is DBpedia down right now?”; (ii) fact questions, e.g., “What is the Capital of France?”; (iii) jokes or casual conversations, e.g., “Hi, How are you?”, “What's up?”. In order to answer these questions, the authors used the DBpedia mailing list as a source of question/answer pairs and built patterns (rules). Whenever a user question matches one of those patterns, the chatbot returns the associated answer from the previously extracted question/answer pairs. To answer factual questions, the chatbot benefits from WDAqua's QANARY question answering system1, WolframAlpha and OpenStreetMap. Finally, for the banter, they simply use a predefined set of responses for each question. TQBot [173] is a virtual tutor which is empowered with rules that guide students to find answers (course contents) related to their natural language questions (e.g., “give me some information about mammals”). Similarly, Charlie [172] is an online learning platform with the ability to communicate with students using controlled natural language (e.g., to take quizzes or check marks).
A number of languages have been proposed to specify intent recognition rules [224]. The Artificial Intelligence Markup Language (AIML), a derivative of XML, is a widely used language in this context [268]. ELIZA [272], PARRY [47] and ALICE [268] are the first generation of chatbots that leverage this language2. AIML consists of units called categories and topics. Categories are blocks of rules, each consisting of (i) a pattern, which defines the user input (e.g., “Hi bot”), and (ii) a template, which indicates the response of the chatbot (e.g., “Hi human”) to the user's input. Topics, on the other hand, are collections of categories. Figure 2.5 (left) shows a chatbot which is able to answer simple user utterances using pre-defined AIML rules. The main drawback of AIML is that it is too verbose and its underlying pattern matching is quite primitive and generic; thus it requires
1https://github.com/WDAqua/Qanary 2Pandorabots is an online platform to build rule-based chatbots using AIML.
a lot of rules to perform even simple tasks [169].