
A Virtual Assistant for Web Dashboards
Analytics Bot

Anisa Shahidian

Thesis to obtain the Master of Science Degree in Information Systems and Computer Engineering

Supervisors: Prof. Maria Luísa Torres Ribeiro Marques da Silva Coheur
Dr. Angela Bairos Pimentel

Examination Committee
Chairperson: Prof. Ana Maria Severino de Almeida e Paiva
Supervisor: Prof. Maria Luísa Torres Ribeiro Marques da Silva Coheur
Member of the Committee: Prof. Ricardo Daniel Santos Faro Marques Ribeiro

October 2018

Acknowledgments

A huge thank you to Professora Luísa. You always remove any lingering worry that I may have, and I truly could not have chosen a better advisor. I do not have words to describe how helpful you are. Thank you.

Thank you Angela (and José), for answering each and every question about the thesis environment, and always ensuring I had all the tools needed for this project.

I would like to thank my parents, who took care of me all these years, and have provided me with all sorts of experiences. I hope I have made you proud. I would also like to thank you for giving me two sisters. Nika and Sama, you have taught me how to be a friend and an older sister, and I will always be here for you.

Thank you Diogo. For everything. You have helped me grow and given me the encouragement and support to be myself. Life by your side is all and more than I could have hoped for.

I would like to thank Mariana, Chico, Henrique, Pacheco, Zé, Luis, Bruno and Rui. Thank you for all the good times we have had. Having you as my friends has made it all the harder to leave. Also to Mariana, I am glad I said "hi" enough times for you to respond back. Thank you for always being silly with me.

Thank you to Alex and Inês. We have had many memorable late night philosophical talks. I hope your lives will be filled with happiness. Inês, thank you for surviving RRR with me. We've had good times.

Thank you Teresa, Rui, Zita, avó João, tia Lurdes and tio João. You have welcomed me very dearly, and that warms my heart.

Abstract

This thesis describes the research and implementation of a Natural Language Interface to a Database about Rollover Opportunities, responsible for answering questions at the BNP Paribas Global Client Analytics Group. We study several Natural Language Interfaces to Databases and other Natural Language Processing systems, taking into consideration the project at hand and the limitations of the problem this thesis deals with. After that, we describe the architecture for the implementation. The environment in which the questions are asked, named entity recognition, the processing of the user's question, and the two approaches implemented are also detailed. We have implemented a semantic grammar, which constitutes sys1. An implementation with the SEMPRE toolkit and its learning component constitutes sys2. Both implementations are detailed, as are the aspects that needed special attention, such as the use of date expressions in the user's question. Our results show that knowledge of the domain is crucial in a rule-based implementation, as it is not flexible. We also observe that our learning implementation, sys2, achieves better results than sys1.

Keywords

Information; Natural Language Interface to Database; Natural Language Processing; Interface; Rule Based Grammar; Named Entity Recognition; Database; Table; Query

Resumo

This thesis reports the research and implementation of a Natural Language Interface to a Database about Rollover Opportunities, responsible for answering questions at BNP Paribas through requests to a database. We study several Natural Language Interfaces to Databases and other Natural Language Processing systems, taking into consideration the project and the limitations of the problem this thesis addresses. We then describe the architecture for the implementation. The environment in which the questions are asked, the named entity recognition, the processing of the user's questions, and the two approaches are also described. We implemented a semantic grammar, which constitutes sys1. An approach using the SEMPRE tool and its learning mechanism constitutes sys2. Both implementations are detailed, as are the aspects that needed special attention, such as the use of temporal expressions in the user's question. Our results show that knowledge of the domain is crucial in a rule-based implementation, as it is not flexible. We also observed that our learning-based implementation, sys2, obtains better results than sys1.

Palavras Chave

Information; Natural Language Interface to Databases; Natural Language Processing; Interface; Rule-Based Grammar; Named Entity Recognition; Database; Table; Query

Contents

1 Introduction
  1.1 Motivation
  1.2 Goal and Requirements
  1.3 Contributions
  1.4 Document Structure
2 Related Work
  2.1 Introduction
  2.2 Historical Background
  2.3 Main Components
  2.4 Data Management
  2.5 Question Analysis
  2.6 Query Construction
  2.7 Answering Step
  2.8 Learning
  2.9 Summary
3 Pipeline
  3.1 Introduction
  3.2 Domain Characterization
    3.2.1 Table
    3.2.2 Questions
  3.3 Proposal Overview
    3.3.1 Data Management
    3.3.2 Question Analysis
  3.4 Sys1
  3.5 Sys2
    3.5.1 Grammar
    3.5.2 Learning
  3.6 Comparing Approaches
  3.7 Summary
4 Evaluation
  4.1 Introduction
  4.2 Experimental Setup
    4.2.1 Corpora
    4.2.2 Evaluation Measure
  4.3 Development Tests
  4.4 User Tests
  4.5 What if we change the domain?
  4.6 Summary
5 Contributions, Conclusions and Future Work
  5.1 Contributions
  5.2 Conclusions
  5.3 Future Work
A Table Fields
  A.1 Measure List for "Rollover Opportunities" database
  A.2 Attribute List for "Rollover Opportunities" database
B Dashboard definitions
C sys1 Grammar
D sys2 Grammar
E sys2 Example File

List of Figures

2.1 First results shown when searching for Karlie Kloss. This figure contains biographic information, social media and news appearances. The movies she has acted in, and her direct social media links, also appear below these results. Last accessed May 2018.
2.2 General overview of a Natural Language Interface to Database (NLIDB) system.
3.1 Screenshot of the dashboard in use. Client names, codes and dates have been removed.
3.2 General Architecture provided in Chapter 2.
3.3 Architecture implemented.
3.4 Tree for word matching given the rules provided in Listings 3.9, 3.10 and 3.11.

List of Tables

3.1 Comparing the pros and the cons of implemented approaches.

Listings

2.1 Example of a Syntax based grammar. Adapted from [1].
2.2 Example of a Semantic based grammar. Adapted from [1].
3.1 Example of questions that can be asked by the user. The Measures and Attributes present in the questions are underlined.
3.2 Structure of some questions that can be asked by the user.
3.3 Example of questions that can be asked by the user. These questions show the cases of top values and excluding a certain value.
3.4 Query to obtain the values from the table.
3.5 Synonyms for words used in the user questions.
3.6 Temporal rules.
3.7 Main Components of the Lexer of the grammar for sys1.
3.8 Part of the yacc of the grammar for sys1.
3.9 Measure declaration rules defined in SEMPRE toolkit.
3.10 General rules defined in SEMPRE toolkit.
3.11 SEMPRE Root rule.
3.12 Example of SEMPRE learning file.
4.1