Enhancing Community Interactions with Data-Driven Chatbots – The DBpedia Chatbot

Ram G Athreya, Arizona State University, [email protected]
Axel-Cyrille Ngonga Ngomo, Data Science Group, Paderborn University, Germany, [email protected]
Ricardo Usbeck, Data Science Group, Paderborn University, Germany, [email protected]

ABSTRACT
In this demo, we introduce the DBPEDIA CHATBOT, a knowledge-graph-driven chatbot designed to optimize community interaction. The bot was designed for integration into community software to facilitate the answering of recurrent questions. Four main challenges were addressed when building the chatbot, namely (1) understanding user queries, (2) fetching relevant information based on the queries, (3) tailoring the responses based on the standards of each output platform (i.e., Web, Slack, Facebook Messenger) as well as (4) developing subsequent user interactions with the DBPEDIA CHATBOT. With this demo, we will showcase our solutions to these four challenges.

KEYWORDS
DBpedia, chatbot, knowledge base, question answering

ACM Reference Format:
Ram G Athreya, Axel-Cyrille Ngonga Ngomo, and Ricardo Usbeck. 2018. Enhancing Community Interactions with Data-Driven Chatbots – The DBpedia Chatbot. In WWW '18 Companion: The 2018 Web Conference Companion, April 23–27, 2018, Lyon, France. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3184558.3186964

This paper is published under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution. © 2018 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC BY 4.0 License. ACM ISBN 978-1-4503-5640-4/18/04. https://doi.org/10.1145/3184558.3186964

1 INTRODUCTION
Growing domain-specific communities (e.g., the DBpedia [6] community, http://wiki.dbpedia.org/about/dbpedia-community-0) are often characterized by new users asking previously answered questions upon joining the community. Mailing lists, forums (e.g., Facebook, Twitter) or the public Slack channel and FAQs are examples of the types of measures that are often put in place to mitigate this common problem. However, these solutions are static and fail to interact with new community members. We address this interaction issue for the DBpedia community by presenting the DBPEDIA CHATBOT, which leverages existing communication sources (between users) as well as DBpedia itself to enhance community interactions.

The DBPEDIA CHATBOT is capable of responding to users via (1) simple short text messages or (2) more elaborate interactive messages based on the underlying knowledge graph. Users can communicate with the DBPEDIA CHATBOT through (3) text but also through (4) interactions such as clicking on buttons or links as well as giving feedback. Furthermore, we focused on designing the bot so that it can be connected to a multitude of platforms to increase outreach to the community. The core of the DBPEDIA CHATBOT is based on the Spring framework (https://spring.io/) and can run on a single-core machine or be distributed across a cluster. Incoming user feedback is stored in CouchDB in an anonymous way for further research.

In this demo, we will 1) describe and showcase the idea of technology federation through a chatbot to facilitate more efficient communication in a community via examples run on laptops, 2) ask users for their feedback to enhance the community value, and 3) show preliminary evaluations of the DBPEDIA CHATBOT.

The contributions of this demo are as follows:
(1) We implemented a baseline approach to intent classification for technology federation,
(2) We implemented a rule-based approach to answer questions related to the DBpedia community,
(3) We combined several tools for answering factual questions,
(4) We expose the research work being done in DBpedia as product features. For example:
• Genesis [5]: We use the APIs from the Genesis project to show similar and related information for a particular entity.
• QANARY [4]: We use WDAqua's approach to answer factual questions that are posed to the DBPEDIA CHATBOT.

Our implementation is open source; a screencast as well as the source code can be found at https://github.com/dbpedia/chatbot. A live demo can be found at http://chat.dbpedia.org.

2 RELATED WORK
One of the early chatbots was ELIZA [11], which was designed around a pattern matching approach. ELIZA was intended to sound like different persons, e.g., like a psychotherapist. Later, the ELIZA-inspired chatbot ALICE [10] was introduced. ALICE was implemented on top of a scripting language called AIML (http://www.alicebot.org/aiml.html), which, next to RiveScript (https://www.rivescript.com), is currently among the most used languages for chatbots. More recent chatbots, such as Rose, combine chat scripts with question understanding to better mimic domain experts [1]. Recent approaches have also considered tackling the combination of RDF and chatbots, e.g., [2, 3, 9]. Due to the lack of space, we refer the interested reader to [1, 7] for further references.

3 APPROACH
Our architecture is based on a modular approach to account for different types of user queries and responses, as can be seen in Figure 1.

Figure 1: Architecture of DBPEDIA CHATBOT.

In the following sections, we explain our approach in detail. We will also demonstrate all possible interactions at the demo booth.

Query Lifecycle. We devised a modular pipeline to interweave novel knowledge base technologies, in this case DBpedia-related ones, into our approach. The process by which the DBPEDIA CHATBOT handles incoming requests can be divided into four steps:
(1) Incoming Request: For each platform, a webhook handles the incoming request and forwards it to the routing module.
(2) Request Routing: Incoming requests are routed based on their respective type. Pure text requests are handled by the Text Handler while parameterized requests are handled by the Template Handler.
• Pure Text Requests: A pure text request is basically a text message from the user. We use RiveScript to identify the intent of the message and classify it into the following types (see the paragraph on Intent Classification below): DBpedia Questions (rule-based community questions, service checks, language chapters, etc.), Factual Questions (including location questions) and Banter.
• Parameterized Requests: These are triggered when users click on links in information already presented (e.g., a "Learn More" button shown with information about a particular resource).
(3) Generate Response: The response from the handlers is converted into a format that is suitable for each platform.
(4) Send Response: Finally, the response is sent back to the respective platform. Additionally, for the Web interface, the client-side code that handles the responses is written using standard front-end technologies such as HTML, CSS and JavaScript.

Intent Classification. Intent classification is the first step in processing natural language queries, since there are several kinds of questions that the DBPEDIA CHATBOT needs to handle. All natural language questions are passed to RiveScript, which was developed on top of the underlying communication protocols, for intent classification. Based on contextual rules defined in RiveScript, the incoming request is classified into the aforementioned categories and passed on to the respective request handlers. Algorithm 1 shows the mechanics of the intent classification process and Figure 2 shows the overall workflow of how a query is internally processed by the chatbot. The different handlers and the APIs that support them are explained later.

Algorithm 1 Intent Classification Pseudocode
1: Consider user query Q
2: Initialize RiveScript variables
3: Sanitize common misspellings in Q using JLanguageTool
4: Apply substitutions on Q (e.g., "I'm" => "I am")
5: Load all RiveScript rules into memory
6: Sort rules based on priority, rule structure, position of wildcards, etc.
7: for all rule in rules do
8:    if Q matches rule then
9:       return rule.classification, rule.args
10:   end if
11: end for
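To make the pseudocode concrete, the following is a minimal sketch of the intent classification step in Python, assuming the open-source rivescript package and a local brain/ directory with triggers analogous to ours. The directory name, the JSON reply convention and the helper names are illustrative assumptions, not the bot's actual implementation (the production bot is built on the Spring framework); the sketch only illustrates the classification logic.

    # Minimal sketch of RiveScript-based intent classification.
    # Assumptions: the `rivescript` Python package is installed and a local
    # brain/ directory contains triggers whose replies encode the intent as
    # a small JSON object (directory name and reply format are illustrative).
    import json
    from rivescript import RiveScript

    bot = RiveScript(utf8=True)
    bot.load_directory("brain")   # load all *.rive rule files
    bot.sort_replies()            # sort triggers by specificity and wildcards

    def classify(user_id, query):
        """Classify a user query into an intent dictionary."""
        reply = bot.reply(user_id, query)
        try:
            # e.g. {"type": "template", "params": "dbpedia-contribute"}
            return json.loads(reply)
        except ValueError:
            # Anything that is not a structured reply is treated as banter.
            return {"type": "banter", "text": reply}

    if __name__ == "__main__":
        print(classify("demo-user", "How can I contribute to DBpedia?"))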
Rule-Based Community Questions. The DBpedia Handler is tailored towards conversations around the DBpedia community and is based on existing useful background conversations. Here, we explored DBpedia's mailing lists to answer DBpedia-related questions via a rule-based approach.

Data Sources. The official mailing lists of DBpedia were chosen as the primary data source for this task. This includes the DBpedia Discussion (https://sourceforge.net/p/dbpedia/mailman/dbpedia-discussion) and DBpedia Developers (https://sourceforge.net/p/dbpedia/mailman/dbpedia-developers) mailing lists. Conversational threads from the mailing lists were extracted to find interesting question-answer pairs that could be used for creating conversational scenarios for the chatbot. This dataset was augmented with some conversations from Slack that were handpicked in an ad-hoc manner.

Data Cleanup Tasks. The mailing list dump (mbox file; see https://github.com/dbpedia/dbpedia-chatbot-data) was taken as input and pre-processed to remove undesired messages. The result of the pre-processing was stored in a JSON file with the original message subject as key and all associated messages stored as an array for further processing. Several threads were excluded or sanitized in the final dataset according to the following criteria:
• All messages that are requests for comments, calls for papers, announcements, etc. were removed.
• Messages that do not have question words in their subject or body were removed. Question words considered were: What, When, Why, Which, Who, How, Whose, Whom.
• Words such as "reply", "fwd", etc. were stripped from the subject.
• Reply sections were removed to reduce redundancy.
• Unnecessary HTML tags, whitespace, new lines, etc. were removed.

TF-IDF Vectorization. The remaining messages were vectorized for use in the later clustering step (a converted CSV file can be found at https://raw.githubusercontent.com/dbpedia/dbpedia-chatbot-data/master/data/dbpedia-discussion-archive-dataset.csv). The subject of each message was tokenized and stemmed using the Porter Stemmer within the Pandas framework (https://pandas.pydata.org/). This stemmed output was used as input to a TF-IDF vectorizer to convert the text input into an array containing the frequencies of each term in every message. The total number of extracted features was 135.

K-Means Clustering. The TF-IDF vectors were passed as input to a K-Means algorithm to cluster interesting topics using cosine similarity, which formed the baseline for our rule-based conversation module. Some of the major categories that were identified and clustered through the algorithm are: 'About DBpedia', 'DBpedia Lookup' (http://wiki.dbpedia.org/projects/dbpedia-lookup), 'DBpedia Datasets Download/Dump', 'DBpedia Releases' and 'DBpedia Extraction Framework'.

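The clustering pipeline just described (Porter stemming of subjects, TF-IDF vectorization, K-Means over cosine similarity) can be approximated with standard Python tooling. The sketch below is a reconstruction under stated assumptions rather than the authors' exact script: it uses NLTK, scikit-learn and pandas, the CSV column name and the number of clusters are illustrative choices, and cosine similarity is approximated by L2-normalizing the TF-IDF vectors before running Euclidean K-Means.

    # Sketch of the mailing-list clustering step.
    # Assumptions: nltk, scikit-learn and pandas are installed; the published
    # CSV (see the URL above) has a text column named "subject" (the column
    # name and the number of clusters are illustrative).
    import pandas as pd
    from nltk.stem import PorterStemmer
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.preprocessing import normalize

    stemmer = PorterStemmer()

    def stem(text):
        # Naive whitespace tokenization followed by Porter stemming.
        return " ".join(stemmer.stem(tok) for tok in str(text).lower().split())

    df = pd.read_csv("dbpedia-discussion-archive-dataset.csv")
    subjects = df["subject"].fillna("").map(stem)

    # TF-IDF matrix over the stemmed subjects.
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(subjects)

    # L2-normalize so that Euclidean K-Means approximates cosine similarity.
    tfidf = normalize(tfidf)

    # Five clusters, matching the five major categories reported above.
    kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
    df["cluster"] = kmeans.fit_predict(tfidf)

    # Inspect the highest-weighted terms per cluster to label the topics.
    terms = vectorizer.get_feature_names_out()
    for c, center in enumerate(kmeans.cluster_centers_):
        top = center.argsort()[::-1][:5]
        print(c, [terms[i] for i in top])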
Rule-Based Conversation. The topics and threads that were derived from the previous steps were converted into RiveScript rules in the form of patterns called triggers. These triggers are similar in structure to regular expressions but have more sophisticated and expressive features such as variable output, redirections between rules, dialogue blocks for continued conversation, etc. A typical (simple) rule has a structure akin to the one shown in (1). The first line specifies the input pattern that needs to be matched. The second line specifies the response from RiveScript, which is then further processed by the aforementioned request handlers. This particular rule can be used to answer queries such as "How can I contribute to DBpedia?", "Get involved with DBpedia.", etc.

    + [how] [*] (contribute|contributing|involved) [*] dbpedia [*]
    - {"type": "template", "params": "dbpedia-contribute"}          (1)

Figure 2: Query Fulfillment Workflow.

Query Fulfillment. Based on the intent detected in the previous step, an appropriate request handler is triggered. Within the context of DBpedia-related questions, the DBpedia Handler generates a response, which is usually a templated answer based on the scenario. However, for certain queries such as "Is DBpedia down right now?", additional steps are carried out by the DBPEDIA CHATBOT to check whether the DBpedia website and SPARQL endpoint are working, and the result of this check is returned to the user as the response.

Natural Language Questions. To answer natural language or factual questions, the bot currently uses a combination of WDAqua's QANARY [4] question answering system (based on the DBpedia version 2016-10) and WolframAlpha (http://products.wolframalpha.com/api/). The responses from QANARY are typically DBpedia URIs or RDF literals (https://www.w3.org/TR/rdf11-primer/) with proper semantic annotation, while responses from WolframAlpha typically need to be linked to DBpedia. The chatbot uses DBpedia Spotlight [8] and the DBpedia Lookup service for this purpose. The results are cascaded based on priority levels, with a higher priority given to RDF literals since they are typically a precise answer to a question. A lower priority is assigned to DBpedia URIs if both RDF literals and URIs are generated as candidate results for a query. Finally, the responses from both systems are amalgamated using a UNION strategy and the results are presented to the user. Additionally, for location-related questions the DBPEDIA CHATBOT employs OpenStreetMap (https://www.openstreetmap.org/about) data to acquire relevant geographic information.
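The priority cascade and UNION strategy just described can be summarized in a few lines. The data structures below are illustrative assumptions (the bot's internal representation is not documented here), but the ranking logic mirrors the description: literals outrank URIs, and the results of both systems are unioned and de-duplicated.

    # Sketch of the priority-based UNION of candidate answers.
    # Assumptions: upstream components (QANARY, WolframAlpha plus
    # Spotlight/Lookup) already produced flat candidate lists; the
    # Candidate class is an illustrative stand-in.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Candidate:
        value: str   # an RDF literal or a DBpedia URI
        kind: str    # "literal" or "uri"
        source: str  # "qanary" or "wolframalpha"

    def merge(qanary_results, wolfram_results):
        """Union both result lists, ranking RDF literals above DBpedia URIs."""
        priority = {"literal": 0, "uri": 1}
        pool = sorted(list(qanary_results) + list(wolfram_results),
                      key=lambda c: priority.get(c.kind, 2))
        seen, merged = set(), []
        for cand in pool:
            if cand.value not in seen:   # de-duplicate, keep highest priority
                seen.add(cand.value)
                merged.append(cand)
        return merged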

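Likewise, the "Is DBpedia down right now?" scenario from the Query Fulfillment step above reduces to two availability probes. The sketch below uses the publicly known website and SPARQL endpoint URLs, while the timeouts, the ASK query and the wording of the reply are illustrative choices rather than the bot's exact behaviour.

    # Sketch of the "Is DBpedia down right now?" service check.
    # Assumptions: plain HTTP probes are sufficient; timeouts, the ASK query
    # and the reply strings are illustrative.
    import requests

    def dbpedia_status():
        checks = {}
        try:
            checks["website"] = requests.get("https://www.dbpedia.org",
                                             timeout=5).ok
        except requests.RequestException:
            checks["website"] = False
        try:
            resp = requests.get(
                "https://dbpedia.org/sparql",
                params={"query": "ASK { ?s ?p ?o }",
                        "format": "application/sparql-results+json"},
                timeout=5,
            )
            checks["sparql endpoint"] = resp.ok and resp.json().get("boolean", False)
        except (requests.RequestException, ValueError):
            checks["sparql endpoint"] = False

        if all(checks.values()):
            return "Both the DBpedia website and the SPARQL endpoint are up."
        down = ", ".join(name for name, ok in checks.items() if not ok)
        return "The following DBpedia services appear to be down: " + down + "."

    if __name__ == "__main__":
        print(dbpedia_status())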
Knowledge Cards. The DBPEDIA CHATBOT enriches RDF URIs from the responses for users and displays important attributes about the respective entity. These knowledge cards are designed similar to the infoboxes shown in Wikipedia. This feature is based on the underlying knowledge graph and a list of all classes in the ontology namespace (in our case, http://dbpedia.org/ontology/). Note that this feature is agnostic to the particular RDF knowledge base. For a given class, we found the total number of occurrences of that class in the knowledge graph. Then, we extracted all rdfs:domain properties for the respective class and calculated the number of distinct occurrences of each individual property in the knowledge graph. We derived a relevance score r_{p,c} ∈ [0, 1] for each property p of a given class c as r_{p,c} = N_p / N_c, where N_p is the number of distinct occurrences of the property and N_c is the number of distinct occurrences of the class. For a given entity, we 1) extracted all of its classes, 2) extracted its available properties, and 3) determined the top properties for each class and verified whether they existed for the given entity. If they did, we shortlisted those properties and displayed the top n properties, ranked by their relevance score, to the user.

Follow-Up Interactions. To enhance the user experience, whenever the user makes a query to the system, helpful follow-up interactions are provided in the form of buttons or links, which the user can tap or click to continue the conversation with the chatbot. Such a feature allows the user to delve deeper into a particular topic or result. For example, if the user searches for Barack Obama, the DBPEDIA CHATBOT can also show related entities such as the Democratic Party or similar people such as Bush or Clinton.
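The relevance score r_{p,c} = N_p / N_c introduced under Knowledge Cards above can be computed directly against the SPARQL endpoint. The sketch below is one possible reading of the description (it counts distinct instances of the class, and distinct instances of the class that carry the property); the exact queries used by the DBPEDIA CHATBOT may differ, and the example class and property are chosen for illustration only.

    # Sketch of the knowledge-card relevance score r_{p,c} = N_p / N_c.
    # Assumption: N_c and N_p are obtained via SPARQL COUNT queries; this is
    # one plausible interpretation of the counting, not the bot's exact query.
    import requests

    ENDPOINT = "https://dbpedia.org/sparql"

    def sparql_count(query):
        resp = requests.get(ENDPOINT,
                            params={"query": query,
                                    "format": "application/sparql-results+json"},
                            timeout=30)
        resp.raise_for_status()
        return int(resp.json()["results"]["bindings"][0]["n"]["value"])

    def relevance(prop, cls):
        # N_c: distinct instances of the class.
        n_c = sparql_count(
            "SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s a <%s> }" % cls)
        # N_p: distinct instances of the class that carry the property.
        n_p = sparql_count(
            "SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s a <%s> ; <%s> ?o }"
            % (cls, prop))
        return n_p / n_c if n_c else 0.0

    if __name__ == "__main__":
        print(relevance("http://dbpedia.org/ontology/birthPlace",
                        "http://dbpedia.org/ontology/Person"))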

Banter or Casual Conversation. Banter messages are typically casual in nature and have very little bearing on the core functionality of a chatbot. Still, we provide a basic set of responses for such queries. Some examples of banter are messages such as "Hi", "What is your name", etc. As a fallback in cases where no rule applies, ELIZA [11] is used as a filler.

Deployment to Platforms. Our approach is accessible from different platforms to increase its usability and to allow as many users as possible to access it. In particular, the DBPEDIA CHATBOT can be accessed via a web interface (optimized for both desktop and mobile), Slack and Facebook Messenger (which we hope to integrate soon into DBpedia's official page). Screenshots of all platforms can be seen in Figure 3.

Figure 3: Screenshot of three different platform views (left to right): Web, Slack, Facebook Messenger.

4 CONCLUSION
Since its launch, i.e., between July 2017 and December 2017, the DBPEDIA CHATBOT has been used by over 1,400 users with an average conversation length of 16 messages. These statistics suggest that there is genuine interest among users in interacting with the DBPEDIA CHATBOT. We hence plan to extend the DBPEDIA CHATBOT in the future to overcome its current limitations, e.g., we will include other approaches such as the DBpedia relationship finder (http://www.visualdataweb.org/relfinder.php) and more data sources (e.g., Stack Overflow), and we will evaluate feedback on such interactions, e.g., we will investigate the types of questions asked by users compared with the top ones in the DBpedia mailing lists.

Acknowledgements. This work has been supported by the BMVI projects LIMBO (project no. 19F2029C) and OPAL (project no. 19F2028A) as well as by the German Federal Ministry of Education and Research (BMBF) within 'KMU-innovativ: Forschung für die zivile Sicherheit', in particular 'Forschung für die zivile Sicherheit' and the project SOLIDE (no. 13N14456).

Authors: Ram G Athreya is a graduate student at Arizona State University and a Google Summer of Code '17 participant. Prof. Axel-Cyrille Ngonga Ngomo has been a Google Summer of Code mentor since 2015. Dr. Ricardo Usbeck has been a Google Summer of Code mentor since 2016.

REFERENCES
[1] Sameera A Abdul-Kader and John Woods. 2015. Survey on chatbot design techniques in speech conversation systems. Int. J. Adv. Comput. Sci. Appl. (2015).
[2] H. Al-Zubaide and A. Issa. 2011. OntBot: Ontology based chatbot. In International Symposium on Innovations in Information and Communications Technology.
[3] A. Augello, G. Pilato, G. Vassallo, and S. Gaglio. 2009. A Semantic Layer on Semi-Structured Data Sources for Intuitive Chatbots. In 2009 International Conference on Complex, Intelligent and Software Intensive Systems. 760–765.
[4] Andreas Both, Dennis Diefenbach, Kuldeep Singh, Saeedeh Shekarpour, Didier Cherix, and Christoph Lange. 2016. Qanary - A Methodology for Vocabulary-Driven Open Question Answering Systems. In ESWC. 625–641.
[5] Timofey Ermilov, Diego Moussallem, Ricardo Usbeck, and Axel-Cyrille Ngonga Ngomo. 2017. GENESIS: a generic RDF data access interface. In Proceedings of the International Conference on Web Intelligence. 125–131.
[6] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick Van Kleef, Sören Auer, et al. 2015. DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6, 2 (2015), 167–195.
[7] Michael McTear, Zoraida Callejas, and David Griol. 2016. Conversational Interfaces: Past and Present. In The Conversational Interface. Springer, 51–72.
[8] Pablo N. Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. 2011. DBpedia Spotlight: shedding light on the web of documents. In Proceedings of the 7th International Conference on Semantic Systems, I-SEMANTICS. 1–8.
[9] Mon-Tin Tzou, Chun-Hung Lu, Chin-Chien Wang, Cheng-Wei Lee, and Wen-Lian Hsu. [n. d.]. Extending knowledge of AIML by using RDF. ([n. d.]).
[10] Richard S Wallace. 2009. The anatomy of ALICE. In Parsing the Turing Test. Springer, 181–210.
[11] Joseph Weizenbaum. 1966. ELIZA—a computer program for the study of natural language communication between man and machine. Commun. ACM (1966).
