Recommending Best Answer in a Collaborative Question Answering System
Mrs Lin Chen
Submitted in fulfilment of the requirements for the degree of Master of Information Technology (Research)
School of Information Technology, Faculty of Science & Technology, Queensland University of Technology, 2009
© 2009 Lin Chen
Keywords
Authority, Collaborative Social Network, Content Analysis, Link Analysis, Natural Language Processing, Non-Content Analysis, Online Question Answering Portal, Prestige, Question Answering System, Recommending Best Answer, Social Network Analysis, Yahoo! Answers.
Abstract
The World Wide Web has become a medium for people to share information. People use Web-based collaborative tools such as question answering (QA) portals, blogs/forums, email and instant messaging to acquire information and to form online-based communities. In an online QA portal, a user asks a question and other users can provide answers based on their knowledge, with the question usually being answered by many users. It can become overwhelming and/or time/resource consuming for a user to read all of the answers provided for a given question. Thus, there exists a need for a mechanism to rank the provided answers so users can focus on only reading good quality answers. The majority of online QA systems use user feedback to rank users’ answers and the user who asked the question can decide on the best answer. Other users who didn’t participate in answering the question can also vote to determine the best answer. However, ranking the best answer via this collaborative method is time consuming and requires an ongoing continuous involvement of users to provide the needed feedback.
The objective of this research is to discover a way to recommend the best answer as part of a ranked list of answers for a posted question automatically, without the need for user feedback. The proposed approach combines both a non-content-based reputation method and a content-based method to solve the problem of recommending the best answer to the user who posted the question.
The non-content method assigns a score to each user which reflects the user’s reputation level in using the QA portal system. Each user is assigned two types of non-content-based reputation scores: a local reputation score and a global reputation score. The local reputation score plays an important role in deciding the reputation level of a user for the category in which the question is asked. The global reputation score indicates the prestige of a user across all of the categories in the QA system.
Due to the possibility of user cheating, such as awarding the best answer to a friend regardless of the answer quality, a content-based method for determining the quality of a given answer is proposed, alongside the non-content-based reputation method. Answers for a question from different users are compared with an ideal (or expert) answer using traditional Information Retrieval and Natural Language Processing techniques. Each answer provided for a question is assigned a content score according to how well it matched the ideal answer.
To evaluate the performance of the proposed methods, each recommended best answer is compared with the best answer determined by one of the most popular link analysis methods, Hyperlink-Induced Topic Search (HITS). The proposed methods are able to yield high accuracy, as shown by correlation scores: Kendall correlation and Spearman correlation. The reputation method outperforms the HITS method in terms of recommending the best answer.
The inclusion of the reputation score with the content score improves the overall performance, which is measured through the use of Top-n match scores.
Table of Contents
Keywords
Abstract
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Statement of Original Authorship
Acknowledgments
CHAPTER 1: INTRODUCTION
  1.1 Background
  1.2 Related Works
  1.3 Research objectives
  1.4 Research Contribution
  1.5 Thesis Organisation
  1.6 Published Paper
CHAPTER 2: BACKGROUND & LITERATURE REVIEW
  2.1 Online Social Network
  2.2 Social Network Analysis Methods for Yahoo! Answers
  2.3 Approaches to Identify Answer Quality
    2.3.1 Content Based Approach – Natural Language Processing and Information Retrieval
      2.3.1.1 Information Retrieval
      2.3.1.2 Natural Language Processing
      2.3.1.3 Content Based Question Answering System Architecture
      2.3.1.4 Current status of Question Answering Systems
    2.3.2 Reputation Based Approaches
    2.3.3 Link Analysis
      2.3.3.1 PageRank Algorithm
      2.3.3.2 Hyperlink-Induced Topic Search Algorithm
    2.3.4 Statistical Approach
  2.4 Conclusion
CHAPTER 3: ANALYSIS OF YAHOO! ANSWERS
  3.1 Yahoo! Answers Mechanism
  3.2 Graph Representation
  3.3 The Bow Tie Structure Analysis
  3.4 Degree Centrality
  3.5 Question Quality & Answer Quality
  3.6 A Hierarchical Classification Structure for Placing Questions
  3.7 Conclusion
CHAPTER 4: METHODOLOGY
  4.1 Overview of Methodology