Intelligent, Item Based Stereotype Recommender System

Total Page:16

File Type:pdf, Size:1020Kb

Intelligent, Item Based Stereotype Recommender System Intelligent, Item-Based Stereotype Recommender System NOURAH ABDULMOHSEN AL-ROSSAIS Ph.D. Thesis Computer Science Feb 2021 To my beloved father, my role model and inspiration who has sadly passed away during my study ... You will always motivate me to achieve more in life ... Abstract Recommender systems (RS) have become key components driving the success of e-commerce, and other platforms where revenue and customer satisfaction is dependent on the user’s ability to discover desirable items in large catalogues. As the number of users and items on a platform grows, the computational complexity, the vastness of the data, and the sparsity problem constitute important challenges for any recommendation algorithm. In addition, the most widely studied filtering-based RS, while effective in providing suggestions for established users and items, are known for their poor performance for the new user and new item (cold start) problems. Stereotypical modelling of users and items is a promising approach to solving these problems. A stereotype represents an aggregation of the characteristics of the items or users which can be used to create general user or item classes. This work propose a set of methodologies for the automatic generation of stereotypes during the cold-starts. The novelty of the proposed approach rests on the findings that stereotypes built independently of the user-to-item ratings improve both recommendation metrics and computational performance during cold-start phases. The resulting RS can be used with any machine learning algorithm as a solver, and the improved performance gains due to rate-agnostic stereotypes are orthogonal to the gains obtained using more sophisticated solvers. Recommender Systems using the primitive metadata features (baseline systems) as well as factorisation-based systems are used as benchmarks for state-of-the-art methodologies to assess the results of the proposed approach under a wide range of recommendation quality metrics. The results demonstrate how such generic groupings of the metadata features, when performed in a manner that is unaware and independent of the user’s community preferences, may greatly reduce the dimension of the recommendation model, and provide a framework that improves the quality of recommendations in the cold start. 3 Contents List of Tables . 10 List of Figures . 14 Acknowledgements . 15 Declaration . 16 Publications . 17 1 Introduction 18 1.1 Background and Rationals . 18 1.2 Problem Statement . 20 1.3 Research Questions and Hypothesis . 22 1.4 Structure of the Thesis . 22 2 Literature Review 24 2.1 Recommender System in e-commerce . 24 2.2 Popular Recommender Systems . 26 2.3 Existing Recommender System Approaches . 27 2.3.1 Collaborative Filtering (CF) . 27 2.3.2 Content-Based Filtering (CBF) . 29 2.3.3 Demographic . 30 2.3.4 Hybrid . 30 2.3.5 Pairwise Preference Learning . 31 2.4 Cold-Start Problem for Recommender Systems . 31 2.5 Stereotype-Based Modeling . 33 2.5.1 Stereotype - Definition and Evolution . 33 2.5.2 Stereotypes in Recommender Systems . 37 2.5.3 Approaches for Stereotypical Groups . 39 2.6 Evaluation of Recommender Systems . 41 2.7 Summary . 42 3 Problem Analysis 44 3.1 Research Methodology . 45 3.2 MovieLens 1 Million Dataset . 46 3.3 Clustering Algorithms . 48 4 CONTENTS 5 3.4 Experimental Evaluation . 49 3.5 Summary . 68 4 Automatic Construction of Item-Based Stereotypes 70 4.1 Stereotypes for Complex Categorical Features . 70 4.2 Stereotypes for Numerical Features . 74 4.3 Stereotype Creation Experiment . 80 4.3.1 Results with Complex Categorical Features . 80 4.3.2 Results with Numerical Features . 96 4.4 Summary . 102 5 Preliminary Evaluation of Stereotypes 103 5.1 Homogeneity of Training and Test Datasets . 104 5.2 Stereotypes Evaluation . 106 5.2.1 A Hard Test . 106 5.2.2 A Soft Test . 114 5.2.3 Predictive Power of Stereotypes . 122 5.3 Summary . 124 6 Stereotype-Based Recommendation Performance 125 6.1 Experimental Evaluation . 126 6.1.1 Cold Start Assessment of Item Consumption . 127 6.1.2 Cold Start Assessment of Item Rating . 133 6.1.3 Cold Start Assessment of Recommendations Driven by Stereotypes versus SVD-Based RS (with metadata) . 139 6.2 Summary . 149 7 Validation of the Stereotype-Driven Methodology 151 7.1 Amazon Dataset . 152 7.2 Constructing Item-Based Stereotypes . 153 7.2.1 Stereotypes for Complex Categorical Features . 154 7.2.2 Stereotypes for the Numerical Features . 159 7.3 Evaluation of Stereotypes . 163 7.3.1 Homogeneity of Training and Test Datasets . 163 7.3.2 A Hard Test . 166 7.3.3 A Soft Test . 171 7.3.4 Predictive Power of Stereotypes . 174 7.4 Recommendation Performance . 176 7.4.1 Cold Start Assessment of Item Rating . 177 7.4.2 Cold Start Assessment of Recommendations Driven by Stereotypes versus SVD-Based RS (with metadata) . 184 7.5 Summary . 191 8 Conclusion and Future Work 193 8.1 Conclusion . 193 8.2 Contribution to knowledge . 196 8.3 Limitations and Future Work . 199 References 201 List of Tables 3.1 Combined ML 1 Million/IMDb movie and user features ................. 47 3.2 Movie features transformed to a numerical feature vector for Analysis 1 and Analysis 1.B 49 3.3 Top 7 ranked feature directions (or subspace of directions) according to maximum sep- n;a aration as measured by δj for an increasing number of clusters (n) and algorithm (a) equals to k-means ................................... 59 3.4 Top 7 ranked feature directions (or subspace of directions) according to maximum sep- n;a aration as measured by δj for an increasing number of clusters (n) and algorithm (a) equals to GMM .................................... 59 3.5 Top 7 ranked feature directions (or subspace of directions) according to maximum separ- n;a ation as measured by δj for DBSCAN ........................ 59 3.6 Top 7 ranked feature directions (or subspace of directions) according to maximum sep- n;a aration as measured by δj for an increasing number of clusters (n) and algorithm (a) equals to k-means for Analysis 1.B ........................... 64 3.7 Top 7 ranked feature directions (or subspace of directions) according to maximum sep- n;a aration as measured by δj for an increasing number of clusters (n) and algorithm (a) equals to gaussian mixture model (GMM) for Analysis 1.B ............... 64 3.8 Top 7 ranked feature directions (or subspace of directions) according to maximum separ- n;a ation as measured by δj for DBSCAN for Analysis 1B ................ 64 3.9 Top 7 ranked feature directions (or subspace of directions) according to maximum sep- n;a aration as measured by δj for an increasing number of clusters (n) and algorithm (a) equals to k-means for Analysis 1.C .......................... 67 3.10 Top 7 ranked feature directions (or subspace of directions) according to maximum sep- n;a aration as measured by δj for an increasing number of clusters (n) and algorithm (a) equals to GMM for Analysis 1.C ........................... 68 3.11 Top 7 ranked feature directions (or subspace of directions) according to maximum separ- n;a ation as measured by δj for DBSCAN for Analysis 1.C ................ 68 4.1 Stereotypes automatically generated using algorithm 1 for the feature: genre and keywords 92 4.2 k-modes resulting centroids composition for 5 clusters and the genre feature, with two alternative methodologies for the initialisation of the position of the centroids ...... 94 4.3 k-modes resulting centroids composition for 10 clusters and the genre feature, with two alternative methodologies for the initialisation of the position of the centroids ...... 94 6 LIST OF TABLES 7 4.4 Centroid composition identified by k-modes for the feature keywords with Huang initial- isation methodology and k= 20 ............................ 95 4.5 Filtered modes discovered and ranked via the persistence algorithm for feature: log(budget+1) 97 4.6 Filtered modes discovered and ranked via the persistence algorithm for feature: log(revenue+1) 97 4.7 Filtered modes discovered and ranked via the persistence algorithm for feature: director popularity ....................................... 97 4.8 Filter modes discovered and ranked via the persistence algorithm for feature: country distance ........................................ 97 4.9 Filtered modes discovered and ranked via the persistence algorithm for feature: cast pop- ularity ........................................ 97 4.10 Filter modes discovered and ranked via the persistence algorithm for feature: language popularity ....................................... 97 4.11 Filter modes discovered and ranked via the persistence algorithm for feature: cast gender bias .......................................... 98 4.12 Filtered modes discovered and ranked via the persistence algorithm for feature: popularity of movie ....................................... 98 4.13 Filtered modes discovered and ranked via the persistence algorithm for feature: production company popularity .................................. 98 4.14 Filtered modes discovered and ranked via the persistence algorithm for feature: release time of the year .................................... 98 4.15 Filtered modes discovered and ranked via the persistence algorithm for feature: vote average 98 4.16 Filtered modes discovered and ranked via the persistence algorithm for feature: release year 98 4.17 Filtered modes discovered and ranked via the persistence algorithm for feature: runtime . 99 4.18 Filtered modes discovered and ranked via the persistence algorithm for feature: log(vote count) ........................................ 99 4.19 Classification
Recommended publications
  • Applications Received During October, 2017
    Applications received during October, 2017 The publication is a list of applications received during the month of October, 2017. The said publication may be treated as a notice to every person who claims or has any interest in the subject matter of Copyright or disputes the rights of the applicant to it under Rule 70 (9) of Copyright Rules, 2013. Objections, if any, may be made in writing to copyright office by post or e-mail within 30 days of the publication. Even after issue of this notice, if no work/documents are received within 30 days, it would be assumed that the applicant has no work / document to submit and as such, the application would be treated as abandoned, without further notice, with a liberty to apply afresh. S.No. Diary No. Date Title of Work Category Applicant 1 13989/2017-CO/A 01/10/2017 Rani ki Vav Artistic Harshi Shah 1. Lavangi Mirchi, 2. Mann mhantay Wow 3.Mitranon Kaalchit 2 13990/2017-CO/L 01/10/2017 Baat 4.Thamb zara Literary/ Dramatic Jyoti Rane 3 13991/2017-CO/L 01/10/2017 Activity Journal 2 Literary/ Dramatic Dissect EduSolutions Pvt Ltd 4 13992/2017-CO/L 01/10/2017 Activity Journal 3 Literary/ Dramatic Dissect EduSolutions Pvt Ltd Septinity The Science of the Godhead The 5 13993/2017-CO/L 01/10/2017 Creed Literary/ Dramatic Bishop Dr. K. P. V. Abraham 6 13994/2017-CO/L 01/10/2017 Beyond the Sins Literary/ Dramatic Rahul Nakra 7 13995/2017-CO/L 01/10/2017 Activity Journal 4 Literary/ Dramatic Dissect EduSolutions Pvt Ltd 8 13996/2017-CO/L 01/10/2017 Activity Journal 5 Literary/ Dramatic Dissect EduSolutions
    [Show full text]
  • Youth and Status in Tamil Nadu, India
    University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Summer 2010 Youth and Status in Tamil Nadu, India Constantine V. Nakassis University of Pennsylvania, [email protected] Follow this and additional works at: https://repository.upenn.edu/edissertations Part of the Film and Media Studies Commons, Linguistic Anthropology Commons, Social and Cultural Anthropology Commons, and the South and Southeast Asian Languages and Societies Commons Recommended Citation Nakassis, Constantine V., "Youth and Status in Tamil Nadu, India" (2010). Publicly Accessible Penn Dissertations. 227. https://repository.upenn.edu/edissertations/227 This paper is posted at ScholarlyCommons. https://repository.upenn.edu/edissertations/227 For more information, please contact [email protected]. Youth and Status in Tamil Nadu, India Abstract This sociocultural anthropological study looks at youth culture in Tamil Nadu, India, focusing on college- age youth in Madurai and Chennai. The dissertation first shows how outhy experience their position in the larger Tamil society as “being outside of.” This exteriority is manifest in youth concepts of status and gender, the signs and activities which express such status and gender, and the social spaces in which such signs and activities are played out. In particular, the dissertation focuses on how the youth peer group is dually shaped as an exterior space of youth status negotiation—as exterior to adult norms of authority (and thus a space of status-raising qua transgression) and as exterior to norms of hierarchical ranking (and thus an egalitarian space of status-leveling, intimacy, and reciprocity). It is this tension between status-raising and -lowering which the dissertation shows to be crucially at play in how youth engage with and deploy various status-ful signs.
    [Show full text]
  • Film As Text and the Exploration of Reading Practices in Sanctioned Institutional Abuse
    Behind Closed Doors: Film as Text and the Exploration of Reading Practices in Sanctioned Institutional Abuse Item Type text; Electronic Dissertation Authors di Filippo, JoAnn Publisher The University of Arizona. Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. Download date 28/09/2021 17:49:44 Link to Item http://hdl.handle.net/10150/306146 1 BEHIND CLOSED DOORS: FILM AS TEXT AND THE EXPLORATION OF READING PRACTICES IN SANCTIONED INSTITUTIONAL ABUSE by JoAnn di Filippo _________________________ Copyright @ JoAnn di Filippo 2013 A Dissertation Submitted to the Faculty of the COMPARATIVE CULTURAL AND LITERARY STUDIES PROGRAM In Partial Fulfillment of the Requirements For the Degree of DOCTOR OF PHILOSOPHY In the Graduate College THE UNIVERSITY OF ARIZONA 2013 2 THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE As members of the Dissertation Committee, we certify that we have read the dissertation prepared by JoAnn di Filippo entitled BEHIND CLOSED DOORS: FILM AS TEXT AND THE EXPLORATION OF READING PRACTICES IN SANCTIONED INSTITUTIONAL ABUSE and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy. _______________________________________________________________________ Date: June 14, 2013 Barbara Babcock _______________________________________________________________________ Date: June 14, 2013 James Greenberg _______________________________________________________________________ Date: June 14, 2013 Mary Beth Haralovich Final approval and acceptance of this dissertation is contingent upon the candidate’s submission of the final copies of the dissertation to the Graduate College.
    [Show full text]
  • 40 Page Feb2010.Qxd
    VOL. 7 NO. 04 february 2010 www.civilsocietyonline.com rs 50 bbiigg rriissee ooff rreeggiioonnaall Paresh Mokashi and other directors cciinneemmaa rewrite the rules street to police reforms in doldrums school with Pages 6-7 literacy india football with a sixth sense Pages 10-11 Children who once worked as ragpickers and domestic help join green hunt targets activist formal schools and Pages 12-13 dream big Pages 14-15 for a glowing skin Page 36 politics is not jUst aboUt politicians Understand development Understand politics right plaCe for politiCs ConTenTS reaD u s. We reaD yo u. Local and global egional cinema is making waves in the hands of a new breed of directors who are fiercely independent and rooted in their local rrealities. They aren’t attracting nationwide audiences as yet, but that perhaps has more to do with the way films are marketed and distributed than their own formidable talents. it is a question of time before the demand for such films grows and in these early offerings is perhaps the first truly original indian cinema in a long while. it is a measure of how much attention these films are capable of attracting that Harishchandrachi Factory by Paresh Mokashi was an indian entry at the oscars. COVER STORY is regional cinema representative of a larger trend in the country? are people demanding a better hearing for their local problems? are new tech - big rise of regional cinema nologies helping them to express themselves nationally and globally? The Young regional language directors are making films which films not only speak of local realities, but they also show flowering of tal - are indigenous, original and global in scope.
    [Show full text]
  • Film As Text and the Exploration of Reading Practices in Sanctioned Institutional Abuse
    Behind Closed Doors: Film as Text and the Exploration of Reading Practices in Sanctioned Institutional Abuse Item Type text; Electronic Dissertation Authors di Filippo, JoAnn Publisher The University of Arizona. Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. Download date 30/09/2021 08:46:19 Link to Item http://hdl.handle.net/10150/306146 1 BEHIND CLOSED DOORS: FILM AS TEXT AND THE EXPLORATION OF READING PRACTICES IN SANCTIONED INSTITUTIONAL ABUSE by JoAnn di Filippo _________________________ Copyright @ JoAnn di Filippo 2013 A Dissertation Submitted to the Faculty of the COMPARATIVE CULTURAL AND LITERARY STUDIES PROGRAM In Partial Fulfillment of the Requirements For the Degree of DOCTOR OF PHILOSOPHY In the Graduate College THE UNIVERSITY OF ARIZONA 2013 2 THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE As members of the Dissertation Committee, we certify that we have read the dissertation prepared by JoAnn di Filippo entitled BEHIND CLOSED DOORS: FILM AS TEXT AND THE EXPLORATION OF READING PRACTICES IN SANCTIONED INSTITUTIONAL ABUSE and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy. _______________________________________________________________________ Date: June 14, 2013 Barbara Babcock _______________________________________________________________________ Date: June 14, 2013 James Greenberg _______________________________________________________________________ Date: June 14, 2013 Mary Beth Haralovich Final approval and acceptance of this dissertation is contingent upon the candidate’s submission of the final copies of the dissertation to the Graduate College.
    [Show full text]