User Profiling Based on Folksonomy Information in Web 2.0 for Personalised Recommender Systems

Huizhi Liang

Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

Faculty of Science and Technology
Queensland University of Technology
April 2011

To my dear father Weiwang Liang and mother Tengying Liang

Keywords

User Profiling, Recommender Systems, Folksonomy, Tags, Taxonomy, Personalisation, Web 2.0

Abstract

Information overload has become a serious issue for web users. Personalisation can provide effective solutions to this problem, and recommender systems are one popular personalisation tool that helps users deal with it. As the basis of personalisation, the accuracy and efficiency of web user profiling greatly affect the performance of recommender systems and other personalisation systems.

In Web 2.0, newly available user information offers new ways to profile users. Folksonomy, or tag information, is a typical kind of Web 2.0 information. Folksonomy reflects users' topic interests and opinions, and it has become another important source of user information for profiling users and making recommendations. However, since tags are arbitrary words given by users, folksonomy contains a lot of noise such as tag synonyms, semantic ambiguities and personal tags. Such noise makes it difficult to profile users accurately or to make quality recommendations.

This thesis investigates the distinctive features and multiple relationships of folksonomy and explores novel approaches to solve the tag quality problem and profile users accurately. Harvesting the wisdom of crowds and experts, three new user profiling approaches are proposed: a folksonomy based user profiling approach, a taxonomy based user profiling approach, and a hybrid user profiling approach based on folksonomy and taxonomy. The proposed user profiling approaches are applied to recommender systems to improve their performance.
Based on the generated user profiles, user based and item based collaborative filtering approaches, combined with content filtering methods, are proposed to make recommendations.

The proposed new user profiling and recommendation approaches have been evaluated through extensive experiments. The effectiveness evaluation experiments were conducted on two real world datasets collected from the Amazon.com and CiteULike websites. The experimental results demonstrate that the proposed user profiling and recommendation approaches outperform related state-of-the-art approaches. In addition, this thesis proposes a parallel, scalable user profiling implementation approach based on advanced cloud computing techniques such as Hadoop, MapReduce and Cascading. The scalability evaluation experiments were conducted on a large-scale dataset collected from the Del.icio.us website.

This thesis contributes to the effective use of the wisdom of crowds and experts to help users solve information overload issues by providing more accurate, effective and efficient user profiling and recommendation approaches. It also contributes to better use of the taxonomy information given by experts and the folksonomy information contributed by users in Web 2.0.

Table of Contents

Keywords
Abstract
Table of Contents
List of Figures
List of Tables
Statement of Original Authorship

1 INTRODUCTION
    1.1 Overview
    1.2 Research Problem and Objectives
        1.2.1 Research Problem
        1.2.2 Research Objectives
    1.3 Research Significance and Contributions
    1.4 Research Methodology
    1.5 Thesis Outline

2 LITERATURE REVIEW
    2.1 User Profiling
        2.1.1 Web Personalisation
        2.1.2 User Profiling Approaches
            2.1.2.1 User Information Collection
            2.1.2.2 User Profile Representation
        2.1.3 User Profiling in Web 2.0
            2.1.3.1 User Profiling Based on Tags
            2.1.3.2 User Profiling Based on Other Web 2.0 User Information
            2.1.3.3 Hybrid User Profiling Based on Tags and Other Information
    2.2 Recommender Systems
        2.2.1 Recommendation Tasks and Evaluation Approaches
        2.2.2 Recommendation Approaches
            2.2.2.1 Content Based Filtering
            2.2.2.2 Collaborative Filtering Approaches
            2.2.2.3 Hybrid Approaches
        2.2.3 Recommender Systems Based on Taxonomy
        2.2.4 Recommender Systems Based on Folksonomy
    2.3 Chapter Summary

3 USER PROFILING BASED ON FOLKSONOMY
    3.1 Notations
    3.2 The Relationship Modelling of Folksonomy
    3.3 Tag Representation Based on Folksonomy
        3.3.1 The Two Conditional Probabilities
        3.3.2 The Relevance of Two Tags in Terms of an Individual User
    3.4 Item Representation Based on Folksonomy
    3.5 User Profile Generation Based on Folksonomy
    3.6 A Framework of User Profiling Based on Folksonomy
    3.7 Chapter Summary

4 USER PROFILING BASED ON TAXONOMY
    4.1 Notations
    4.2 Taxonomy Based User Profiling
        4.2.1 Item Representation Based on Taxonomy
        4.2.2 Tag Representation Based on Taxonomy
        4.2.3 User Representation Based on Taxonomy
