ANEESAH: A Novel Methodology and Algorithms for Sustained Dialogues and Query Refinement in Natural Language Interfaces to Databases

Khurim Shabaz

A thesis submitted in partial fulfilment of the requirements of the Manchester Metropolitan University for the degree of Doctor of Philosophy

School of Computing, Mathematics and Digital Technology
the Manchester Metropolitan University
June 2017

Abstract

This thesis presents the research undertaken to develop a novel approach towards the development of a text-based Conversational Natural Language Interface to Databases, known as ANEESAH. Natural Language Interfaces to Databases (NLIDBs) are computer applications that let end users query a database in natural language, removing the need to commission a skilled programmer. The aim of the proposed research is to investigate the use of a Natural Language Interface to Database (NLIDB) capable of conversing with users to automate the query formulation process for database information retrieval.

Historical challenges and limitations have prevented the wider use of NLIDB applications in real-life environments. The challenges relevant to the scope of the proposed research include the absence of flexible conversation between NLIDB applications and users, of automated database query building from multiple dialogues, and of the flexibility to sustain dialogues for information refinement. The areas of research explored include NLIDBs, conversational agents (CAs), natural language processing (NLP) techniques, artificial intelligence (AI), knowledge engineering, and relational databases.

Current NLIDBs do not have the conversational abilities to sustain dialogues, especially with regard to the information required for dynamic query formulation. A novel approach, ANEESAH, is introduced to deal with these challenges. ANEESAH was developed to allow users to communicate using natural language to retrieve information from a relational database.
ANEESAH can interact with users conversationally and sustain dialogues to automate the query formulation and information refinement process. The research and development of ANEESAH steered the engineering of several novel NLIDB components, such as a CA-implemented NLIDB framework, a rule-based CA that combines pattern matching and sentence similarity techniques, and algorithms to engage users in conversation and support sustained dialogues for information refinement. Additional components of the proposed framework include a novel SQL query engine for the dynamic formulation of queries to extract database information and to perform "querying the query" operations that support information refinement.

Furthermore, a generic evaluation methodology combining subjective and objective measures was introduced to evaluate the implemented conversational NLIDB framework. Empirical end user evaluation was also used to validate the components of the implemented framework. The evaluation results demonstrated that ANEESAH produced the desired database information for users over a set of test scenarios. The evaluation results also revealed that the proposed framework components can overcome the challenges of sustaining dialogues, information refinement and "querying the query" operations.

CHAPTER 1 - INTRODUCTION
  1.1 INTRODUCTION
  1.2 BACKGROUND
  1.3 RESEARCH AIM
  1.4 RESEARCH QUESTIONS
  1.5 RESEARCH HYPOTHESIS
  1.6 RESEARCH OBJECTIVES
  1.7 CONTRIBUTIONS
  1.8 THESIS OUTLINE
CHAPTER 2 - STATE OF THE ART
  2.1 INTRODUCTION
  2.2 NATURAL LANGUAGE INTERFACES TO DATABASES
    2.2.1 NLIDB Development Challenges and Limitations
      2.2.1.1 Syntax-based Approach
      2.2.1.2 Semantic Grammar Approach
      2.2.1.3 Pattern Matching Approach
    2.2.2 Datasets used for NLIDBs Evaluation
    2.2.3 Current known weaknesses in the field of NLIDBs
      2.2.3.1 Linguistic Coverage
      2.2.3.2 Domain Coverage Failure
      2.2.3.3 Users Assumption of System’s Intelligence
      2.2.3.4 Interface Problems
      2.2.3.5 Configuration and Maintenance
    2.2.4 Challenges for NLIDBs
    2.2.5 Existing Methods of Evaluating NLIDB
  2.3 CONVERSATIONAL AGENTS
    2.3.1 Pattern-matching Text-based CAs
    2.3.2 Background
    2.3.3 Review of Challenges for CAs
    2.3.4 Existing Methods of Evaluating CA
    2.3.5 Formulation of Evaluation Metrics
  2.4 EXISTING CONVERSATION ENABLED NLIDB SYSTEMS
  2.5 CONCLUSION
  2.6 CHAPTER HIGHLIGHTS
CHAPTER 3 - A METHODOLOGY FOR DEVELOPING A CONVERSATIONAL NATURAL LANGUAGE INTERFACE TO DATABASE (NLIDB)
  3.1 INTRODUCTION
  3.2 ANEESAH CONVERSATIONAL NLIDB
    3.2.1 Phase 1: Components Development
      3.2.1.1 Adopt a NLIDB Building Approach
      3.2.1.2 Selection of a Domain Database
      3.2.1.3 Analyse Real Life Information and Query Requirements
      3.2.1.4 Determine Conversation Scope and Structure
      3.2.1.5 Develop Knowledge Base Structure
      3.2.1.6 Devise Methodology for ANEESAH’s Evaluation
    3.2.2 Phase 2: Conversation Scripting and Query Formulation
      3.2.2.1 Selection of a scripting methodology