Addressing the Cold Start Problem in the Wikipedia Recommender System Through Content-Based Filtering
Mihai Mihăilă

Kongens Lyngby 2011
IMM-MSC-2011-75

Technical University of Denmark
Informatics and Mathematical Modelling
Building 321, DK-2800 Kongens Lyngby, Denmark
Phone +45 45253351, Fax +45 45882673
[email protected]
www.imm.dtu.dk

IMM-PHD: ISSN 0909-3192

Abstract

Wikipedia is a free online encyclopaedia that can be edited by anyone. Despite its large number of contributors and articles, neither the qualifications of the authors nor the quality of their contributions can be easily assessed. The Wikipedia Recommender System (WRS) is a collaborative filtering system that determines the quality of Wikipedia articles based on user ratings. As a collaborative filtering system, WRS is affected by the cold start problem, which occurs when there is not enough data for the system to function correctly: new users of WRS do not receive article ratings until they start interacting with the system and have a trust profile.

This project addresses the problem by using WikiTrust article ratings. WikiTrust is a system that calculates the rating of an article by computing a trust value for each of its words. A word's trust level is proportional to the trust of its author, and authors gain trust by making changes that last over multiple reviews. The goal is to use the WikiTrust article reputation when calculating the WRS trust level in those cases where WRS does not have enough information to produce a good estimate of the article quality. While the main purpose is to address the cold start problem, incorporating WikiTrust reputation into the WRS trust calculation could also increase the overall accuracy of WRS trust levels.
The project investigates the advantages and disadvantages of integrating the two systems, WRS and WikiTrust, and attempts to determine the best formula for using the latter to improve the performance of the current system.

Preface

This thesis was prepared at the Department of Informatics and Mathematical Modelling at the Technical University of Denmark, in partial fulfilment of the requirements for acquiring the M.Sc. degree in engineering. The project was completed in the period from March 7th, 2011 to September 23rd, 2011 under the supervision of Associate Professor Christian Damsgaard Jensen.

The project design was presented during the poster session of the 5th IFIP WG 11.11 International Conference on Trust Management. One of the results of this project, the WRS Google Chrome Extension, has been published in the Chrome Web Store and is currently available for free download and use [1].

Lyngby, September 2011
Mihai Mihăilă
s091368

[1] Wikipedia Recommender System - Chrome Web Store: https://chrome.google.com/webstore/detail/dlbpjdiahnhhokdbanadnhgjfoiojdmb

Contents

ABSTRACT
PREFACE
CONTENTS
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
    1.1. INTRODUCTION
    1.2. WIKIPEDIA RECOMMENDER SYSTEM
    1.3. OBJECTIVES
    1.4. STRUCTURE
    1.5. DEFINITION OF TERMS
2. STATE OF THE ART
    2.1. TRUST MODEL
    2.2. CLASSIFICATION
    2.3. CURRENT ARCHITECTURE
    2.4. WRS EVOLUTION
    2.5. SUMMARY
3. ANALYSIS
    3.1. WIKITRUST
    3.2. NECESSARY SYSTEM ARCHITECTURE CHANGES
    3.3. SUMMARY
4. DESIGN
    4.1. SERVER SERVICES
    4.2. DATABASE
    4.3. BROWSER EXTENSION
    4.4. SECURITY
    4.5. SUMMARY
5. IMPLEMENTATION
    5.1. SERVER SERVICES
    5.2. DATABASE
    5.3. BROWSER EXTENSION
    5.4. SUMMARY
6. EVALUATION
    6.1. CONTRIBUTIONS
    6.2. CENTRALIZED VERSUS DECENTRALIZED
    6.3. PERFORMANCE
    6.4. COLD START PROBLEM
    6.5. UI & OTHER IMPROVEMENTS
    6.6. SUMMARY
7. CONCLUSION
    7.1. WIKIPEDIA RECOMMENDER SYSTEM
    7.2. FUTURE WORK
    7.3. FUTURE RESEARCH
    7.4. SUMMARY OF CONCLUSIONS
8. APPENDIX
    8.1. ACCEPTING THE SELF-SIGNED GLASSFISH CERTIFICATE
    8.2. EXAMPLES OF WIKITRUST RATINGS
    8.3. CODE
BIBLIOGRAPHY

List of Tables

Table 1: Entity mapping in wrs.web.dal.tables
Table 2: WikiTrust rating distribution for featured articles
Table 3: WikiTrust rating distribution for poor quality articles
Table 4: WikiTrust rating distribution for random articles
Table 5: Experiments summary

List of Figures

Figure 1: Trust evolution function
Figure 2: Trust relationship between user A and B
Figure 3: Scone Proxy WRS Architecture
Figure 4: WRS Feedback interface
Figure 5: WikiTrust rating example
Figure 6: Fixing cold start using WikiTrust
Figure 7: Updating the WikiTrust rating