Relevance Models for Collaborative Filtering
Total Page:16
File Type:pdf, Size:1020Kb
Relevance Models for Collaborative Filtering Relevance Models for Collaborative Filtering Proefschrift ter verkrijging van de graad van doctor aan de Technische Universiteit Delft, op gezag van de Rector Magnificus prof.dr.ir. J. T. Fokkema, voorzitter van het College voor Promoties, in het openbaar te verdedigen op maandag 7 april 2008 om 12:30 uur door Jun Wang MSc in Computer Science from National University of Singapore, Singapore, Bachelor in Electrical Engineering from Southeast University, China, geboren te Jiangsu, China. Dit proefschrift is goedgekeurd door de promotor: Prof.dr.ir. M.J.T. Reinders Toegevoegd promotor: Dr.ir. A.P. de Vries Samenstelling promotiecommissie: Rector Magnificus, voorzitter Prof.dr.ir. M.J.T. Reinders, Technische Universiteit Delft, promotor Dr.ir. A.P. de Vries, CWI, toegevoegd promotor Prof.dr.ir. G. Jongbloed, Technische Universiteit Delft Prof.dr. S.E. Robertson, Microsoft Research, Cambridge, UK Prof.dr.ir. M. van Steen, Vrije Universiteit Prof.dr. B. Berendt, Katholieke Universiteit Leuven, Belgium Dr.ir. D. Hiemstra, Universiteit Twente Advanced School for Computing and Imaging This work was carried out in the ASCI graduate school; ASCI dissertation series number 158. The research reported in this thesis was financed by the CACTUS and I-Share projects. ISBN 978-90-9022932-4 Copyright c 2007 by Jun Wang All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the copyright owner. The Master said, “When I walk along with two others, they may serve me as my teachers. I will select their good qualities and follow them, their bad qualities and avoid them.” The Lunyu: BooK VII Shu’er Confucius, 551 BCE - 479 BCE to my family for making it possible Summary Collaborative filtering is the common technique of predicting the interests of a user by collecting preference information from many users. Although it is generally regarded as a key information retrieval technique, its relation to the existing information retrieval theory is unclear. This thesis shows how the de- velopment of collaborative filtering can gain many benefits from information retrieval theories and models. It brings the notion of relevance into collabora- tive filtering and develops several relevance models for collaborative filtering. Besides dealing with user profiles that are obtained by explicitly asking users to rate information items, the relevance models can also cope with the situa- tions where user profiles are implicitly supplied by observing user interactions with a system. Experimental results complement the theoretical insights with improved recommendation accuracy for both item relevance ranking and user rating prediction. Furthermore, the approaches are more than just analogy: our derivations of the unified relevance model show that popular user-based and item-based approaches represent only a partial view of the problem, whereas a unified view that brings these partial views together gives better insights into their relative importance and how retrieval can benefit from their combination. vii Samenvatting Collaborative filtering is een bekende techniek om de interesses van een ge- bruiker te voorspellen aan de hand van gegevens over de voorkeuren van een grote groep gebruikers. De relatie tussen collaborative filtering en de algemeen geaccepteerde theorie voor information retrieval is echter grotendeels onbek- end. Dit proefschrift toont aan dat information retrieval modellen een grote bijdrage kunnen leveren aan de ontwikkeling van collaborative filtering. Het introduceert het begrip relevantie in collaborative filtering, en ontwikkelt ver- volgens een reeks modellen voor dit relevantiebegrip. Naast gebruikersprofielen gebaseerd op expliciete voorkeursinformatie, verkregen door gebruikers te vra- gen objecten te beoordelen, kunnen deze modellen ook gebruik maken van im- pliciete indicatoren van relevantie op basis van waarnemingen over de interactie van gebruikers met objecten. Behalve theoretische inzichten, wijzen de exper- imentele resultaten uit dat deze modellen de gebruikersinteresse nauwkeuriger voorspellen, alsmede relevantere objecten aanbevelen. Bovendien blijkt uit de afleidingen in het proefschrift dat collaborative filtering met information re- trieval theorie niet slechts een mooie metafoor is. De reeds bekende gebruikers- en objectbenaderingen van collaborative filtering blijken slechts een deel van het probleem te beschrijven, terwijl een verenigde kijk op beide deelbenaderingen een beter inzicht geeft in hun relatieve belang en hoe retrieval kan profiteren van hun combinatie. ix Contents Summary vii Samenvatting ix 1 Introduction 1 1.1 Scope ................................. 3 1.1.1 Collaborative Filtering Scenarios . 4 1.1.2 Putting Relevance into Collaborative Filtering . 4 1.1.3 Issues in Collaborative Filtering . 6 1.2 ThesisOutline ............................ 7 1.3 MainContributions.......................... 8 I Relevance Models 11 2 Language Modelling Approaches 13 2.1 Introduction.............................. 14 2.2 Background .............................. 14 2.2.1 Rating-based Collaborative Filtering . 14 2.2.2 Log-based Collaborative Filtering . 15 2.3 A User-Item Relevance Model . 17 xi 2.3.1 Item-based Generation . 18 Probability Estimation and Smoothing . 19 Linear Interpolation Smoothing . 19 2.3.2 User-based Generation . 20 2.3.3 Discussions .......................... 21 InverseItemFrequency . 21 TwoRepresentations. 22 2.4 Experiments.............................. 22 2.5 Conclusions .............................. 25 Commentary on Chapter 2 27 3 Probabilistic Relevance Ranking 31 3.1 Introduction.............................. 32 3.2 RelatedWork............................. 33 3.2.1 Rating Prediction . 33 3.2.2 ItemRanking......................... 34 3.3 A Probabilistic Relevance Ranking Framework . 35 3.3.1 Item-Based Relevance Model . 36 3.3.1.1 Probability Estimation . 38 3.3.2 User-Based Relevance Model . 43 3.3.3 Discussions .......................... 44 3.4 Experiments.............................. 44 3.4.1 DataSets ........................... 44 3.4.2 ExperimentProtocols . 45 3.4.3 Performance ......................... 46 3.4.4 Parameter Estimation . 49 3.5 Conclusions .............................. 50 3.A The Okapi BM25 Document Ranking Score . 53 xii Commentary on Chapter 3 55 4 Personalized Collaborative Tagging 57 4.1 Introduction.............................. 58 4.2 RelatedWork............................. 60 4.3 Personalization Models . 61 4.3.1 IndexingPhase ........................ 63 4.3.1.1 Collaborative Indexing Model . 63 4.3.2 ExploratorySearchPhase . 65 4.3.2.1 Collaborative Browsing Model . 65 4.3.2.2 Collaborative Item Search Model . 66 4.3.3 Discussions .......................... 67 4.4 Experiments.............................. 68 4.4.1 Data Set Preparation . 68 4.4.2 Evaluation Protocols . 69 4.4.3 Performance of Personalization Models . 71 4.4.4 Representation of User Profiles . 72 4.4.5 Impact of Parameters . 74 4.5 Conclusions .............................. 76 4.A ProbabilityEstimation. 77 Commentary on Chapter 4 79 II Unified Models 81 5 Similarity Fusion 83 5.1 Introduction.............................. 84 5.2 RelatedWork............................. 85 5.3 Background .............................. 86 5.3.1 User-based Collaborative Filtering . 86 xiii 5.3.2 Item-based Collaborative Filtering . 88 5.4 SimilarityFusion ........................... 89 5.4.1 IndividualPredictors. 89 5.4.2 Probabilistic Fusion Framework . 90 5.4.3 Probability Estimation . 92 5.4.4 Discussions .......................... 93 5.5 EmpiricalEvaluation. 94 5.5.1 ExperimentalSetup . 94 5.5.2 IndividualPredictors. 96 5.5.3 Impact of Parameters . 96 5.5.4 DataSparsity......................... 97 5.5.5 Comparison to Other Methods . 99 5.6 Conclusions ..............................100 5.A Normalization. .. .. .. .. .. .. .. .. .100 5.B A Unified Weighting Function . 101 6 Unified Relevance Models 103 6.1 Introduction..............................104 6.2 RelatedWork.............................106 6.2.1 Collaborative Filtering . 106 6.2.2 Probabilistic Models for Information Retrieval . 107 6.3 Background ..............................109 6.3.1 User-based Collaborative Filtering . 110 6.3.2 Item-based Collaborative Filtering . 111 6.3.3 Combining User-based and Item-based Approaches . 112 6.4 Probabilistic Relevance Prediction Models . 113 6.4.1 Three Different Relevance Models . 114 6.4.1.1 User-based Relevance Model . 115 6.4.1.2 Item-based Relevance Model . 116 xiv 6.4.1.3 Unified Relevance Model . 116 6.4.2 Probability Estimation . 117 6.4.2.1 Density Estimation for Rating Models . 117 6.4.2.2 Density Estimation for Preference Models . 118 6.4.3 Rating Predictions . 120 6.4.4 Cross-validated EM algorithm . 122 6.4.5 A Generalised Distance Measure . 122 6.4.6 Discussions ..........................125 6.4.7 Computational Complexity . 127 6.4.7.1 Offline Computation . 128 6.4.7.2 Online Computation . 129 6.5 Experiments..............................130 6.5.1 DataSets ...........................130 6.5.2 Evaluation Protocols . 130 6.5.3 Results ............................131 6.5.3.1 Parameter Estimation . 131 6.5.3.2 Sparsity. .. .. .. .. .. .. ..134 6.5.3.3 Comparison to other approaches . 140 6.6 ConclusionsandFutureWork . .142 6.A Cross-validated EM algorithm . 144 Commentary on