User Modeling and Personalization in the Microblogging Sphere
Qi Gao

Dissertation

for the purpose of obtaining the degree of doctor at the Technische Universiteit Delft, by the authority of the Rector Magnificus prof.ir. K.C.A.M. Luyben, chairman of the Board for Doctorates, to be defended publicly on Monday 28 October 2013 at 15:00

by

Qi GAO
Bachelor of Engineering in Automation, Tongji University,
born in Jiashan, Zhejiang, China.

This dissertation has been approved by the promotor:
Prof.dr.ir. G.J.P.M. Houben

Composition of the doctoral committee:
Rector Magnificus, chairman
Prof.dr.ir. G.J.P.M. Houben, Technische Universiteit Delft, promotor
Prof.dr. P. Brusilovsky, University of Pittsburgh
Prof.dr. P.M.E. De Bra, Technische Universiteit Eindhoven
Prof.dr. V.G. Dimitrova, University of Leeds
Prof.dr. A. Hanjalic, Technische Universiteit Delft
Dr. F. Abel, XING AG
Prof.dr.ir. D.H.J. Epema, Technische Universiteit Delft (reserve member)

SIKS Dissertation Series No. 2013-33
The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Research School for Information and Knowledge Systems.

Published and distributed by: Qi Gao
E-mail: [email protected]
ISBN: 978-94-6186-227-3
Keywords: user modeling, personalization, recommender systems, semantic web, social web, microblog, Twitter, Sina Weibo

Copyright © 2013 by Qi Gao

All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission of the author.

Cover image: Amayzun, “Shell Macro” via Flickr, Creative Commons Attribution.
Printed and bound in The Netherlands by CPI Wörmann Print Service.

Acknowledgments

First and foremost I would like to thank my promotor Prof.
Geert-Jan Houben, who gave me the opportunity to carry out this PhD in the Web Information Systems (WIS) group. I appreciate the freedom that he has given me in trying new ideas and making my own choices along the way. I want to express my sincere gratitude for his extensive guidance and continuous support of my PhD work.

I am extremely indebted to my advisor Dr. Fabian Abel, without whom this thesis would not have been possible. It has been a great pleasure to work with him. Fabian, thank you for the support, the funding, and the inspiring discussions, and for continuing to advise me even after you moved to Hamburg.

Besides my promotor and advisor, I would like to thank the rest of my thesis committee: Prof. Peter Brusilovsky, Prof. Paul De Bra, Prof. Vania Dimitrova, Prof. Dick Epema, and Prof. Alan Hanjalic, for the time they spent on this thesis and their insightful feedback.

I am grateful for having worked with the WIS group and other colleagues. I appreciate the help and friendship of Stefano Bocconi, Alessandro Bozzon, Ilknur Çelik, Claudia Hauff, Laura Hollink, Damir Juric, Erwin Leonardi, and Richard Stronkman. I would like to thank my officemate, Jan Hidders, for translating the propositions that accompany this thesis. I also enjoyed interacting and collaborating with other PhD students: Samur Araújo, Engin Bozdag, Beibei Hu, Jasper Oosterman, Yue Shi, Ke Tao, and Jie Yang. My many thanks go to Rina Abbriata, Ilse Oonk, Franca Post, and Esther van Rooijen for their help and assistance with administrative issues over the past four years. I also want to thank Paulo Anita, Munire van der Kruyk, and Stephen van der Laan for their excellent ICT support.

My special thanks go to my former advisors Prof. Junwei Yan and Dr. Min Liu at Tongji University. I also take this opportunity to thank Prof. Yong Yu, Dr. Haofen Wang, and many other friends at Shanghai Jiaotong University. Their support made my research visit to Shanghai productive and joyful.
I have had a great time in Delft with many good friends. Thank you all and take care!

Last but certainly not least, I would like to thank my parents, who have always encouraged me to explore and find my own way. My gratitude to them is beyond words. My utmost gratitude goes to my wife Qin Zhou for her unconditional support and love throughout these years.

Qi Gao
October 2013
Delft

Contents

Acknowledgments
1 Introduction
  1.1 Thesis Outline
  1.2 Origin of Chapters
2 Background
  2.1 User Modeling
    2.1.1 Overview
    2.1.2 User Profiling in the Social Web
    2.1.3 User Modeling for the Social Semantic Web
  2.2 Recommender Systems
    2.2.1 Overview
    2.2.2 Collaborative Filtering Recommender Systems
    2.2.3 Content-based Recommender Systems
  2.3 Research Challenges tackled in this Thesis
3 Microblogging-based User Modeling Framework
  3.1 Introduction
  3.2 TweetUM - Tweet-based User Modeling Framework
    3.2.1 Topic Modeling
    3.2.2 Enrichment
    3.2.3 Temporal Constraints
    3.2.4 Weighting Schemes
  3.3 GeniUS - Generic User Modeling Library for the Social Semantic Web
    3.3.1 Architecture of GeniUS
    3.3.2 Domain-specific User Profile Construction Using GeniUS
  3.4 Discussion
4 Semantic Enrichment for Microblogging-based User Modeling
  4.1 Introduction
  4.2 Exploitation of Linkage for Microblogging-based User Modeling
    4.2.1 Linkage Discovery Strategies
    4.2.2 Evaluation of Linkage Discovery
    4.2.3 Analyzing User Profile Construction based on Linkage Discovery
  4.3 Exploitation of Emotion for Microblogging-based User Modeling
    4.3.1 Emotions in Microposts
    4.3.2 Emotion Classification Strategies
    4.3.3 Evaluation of Emotion Classification
    4.3.4 Analyzing Emotion-based User Profiles
  4.4 Discussion
5 Microblogging-based User Modeling for Culture-aware Analytics
  5.1 Introduction
  5.2 Analysis of Users’ Microblogging Behavior on Sina Weibo and Twitter
    5.2.1 Methodology
    5.2.2 Analysis of Access Behavior
    5.2.3 Syntactic Content Analysis
    5.2.4 Semantic Content Analysis
    5.2.5 Sentiment Analysis
    5.2.6 Analysis of Temporal Behavior
    5.2.7 Interpretation of Findings
  5.3 Analysis of Information Propagation on Sina Weibo and Twitter
    5.3.1 Research Questions
    5.3.2 Reposting Frequency
    5.3.3 Reposting Speed
    5.3.4 Broadness of User Interests
    5.3.5 Syntactical Characteristics of propagated messages
    5.3.6 Sentiment Characteristics of propagated messages
    5.3.7 Interpretation of Findings
  5.4 Discussion
6 Microblogging-based User Modeling for Personalized Recommendations
  6.1 Introduction
  6.2 Analyzing User Modeling on Twitter for Personalized News Recommendation
    6.2.1 Analysis of Twitter-based User Profiles
    6.2.2 Exploitation of User Profiles for Personalized News Recommendations
    6.2.3 Synopsis
  6.3 Interweaving Trend and User Modeling on Twitter for Personalized News Recommendation
    6.3.1 Trend Modeling on Twitter
    6.3.2 Temporal Analysis of User and Trend Profiles on Twitter
    6.3.3 Evaluation of Trend and User Modeling for Recommending News Articles
    6.3.4 Synopsis
  6.4 Analyzing Temporal Dynamic on Twitter for Personalization
    6.4.1 Evolution of User Interests in Trending Topics
    6.4.2 Time-sensitive User Modeling for Personalized Recommendations
    6.4.3 Synopsis
  6.5 Domain-specific User Modeling on Twitter for Personalized Recommendations
    6.5.1 Analysis of Domain-Specific User Profile Construction
    6.5.2 Evaluation of Domain-Specific User Profile Construction for Recommendation System
    6.5.3 Synopsis
  6.6 Discussion
7 Conclusion
  7.1 Summary of Contributions
  7.2 Future Work
Bibliography
List of Figures
List of Tables
Summary
Samenvatting (Summary in Dutch)
Curriculum Vitae

Chapter 1
Introduction

Over the last few years, microblogging has become a popular mechanism for information sharing and communication on the Web. For example, Twitter, the most prominent microblogging service, serves more than 500 million users who post over 340 million short messages every day¹, sharing their thoughts and everyday activities with the public. On microblogging platforms, users can post messages, which are limited to a certain maximum length (e.g., 140 characters on Twitter), as well as repost messages of other users. In addition, users can follow other users to receive the latest posts published by those users. Microblogging services such as Twitter also provide APIs that allow third parties to access microblogging data and develop external applications such as systems for event detection [168, 181], opinion mining [40], or personalized recommendations [44, 85].

As microblogging services have gained immense popularity around the world, more and more people post real-time messages via different devices to discuss a variety of topics. Given the plethora of digital traces that people leave on microblogging platforms, researchers have started exploiting microblogging activities to understand users’ information needs and model users’ preferences [28, 96]. Some research initiatives focus on inferring specific attributes of a user from microblogging data, such as the user’s location [153], political orientation [78], or influential power [42]. However, there are interesting research questions regarding user modeling based on microblogging activities that have not been studied yet. How can we learn the semantics of microblogging activities and infer users’ interests from those activities?
How can we construct user profiles based on microblogging data to support different applications such as personalized recommender sys-

¹ http://techcrunch.com/2012/07/30/analyst-twitter-passed-500m-users-in-june-2012-140m-of-them-in-us-jakarta-biggest-tweeting-city/