Recommendation in Social Bookmarking Communities

Graph-Based Recommendation in Broad Folksonomies

submitted by Diplom-Ingenieur Robert Wetzker

Dissertation approved by Fakultät IV (Electrical Engineering and Computer Science) of the Technische Universität Berlin for the award of the academic degree Doktor der Ingenieurwissenschaften (Dr.-Ing.)

Doctoral committee:
Chair: Prof. Dr. Möller
Reviewer: Prof. Dr. Albayrak
Reviewer: Prof. Dr. Bauckhage

Date of the scientific defense: 14 May 2010

Berlin 2010
D 83

Abstract

The user-centric annotation of resources by freely chosen words (tags) has become the predominant form of content categorization of the Web 2.0 age. It laid the foundation for the success of now-famous services such as Delicious, Flickr, LibraryThing, or Last.fm. Tags have proven to be a powerful alternative to existing top-down categorization techniques, such as taxonomies or predefined dictionaries, which lack flexibility and are expensive to create and maintain. Instead, tagging allows users to choose the labels that match their real needs, tastes, or language, which reduces the required cognitive effort. Services that center on the collaborative tagging of resources are called folksonomies. These communities have become an invaluable source for information retrieval, since they bundle the interests of thousands or even millions of users, magnifying the underlying domain from a user-centric perspective. Furthermore, the existence of tags allows for the unified modeling of content independent of its type or format and without the need for expensive content retrieval, processing, storage, or indexing. Of special interest from an information retrieval perspective are the tags that are generated in "broad folksonomies", where many users reference and tag the same objects. This collaborative tagging produces high-quality content descriptors and characteristic tag spectra with high potential for better content models.

This dissertation aims to develop novel algorithms for the personalized recommendation of objects within folksonomies. Recommender systems support users in the discovery of valuable content out of an overwhelming set of choices and already increase the perceived value of many web applications. However, the tripartite structure of folksonomies that results from the interaction of content, tags, and users goes beyond the capability of common recommendation techniques. The algorithms presented in this dissertation are instead especially tailored to the needs of folksonomies. They incorporate tagging and usage patterns in parallel in order to derive more precise and robust recommendation models. Exhaustive tests for different settings and various folksonomies show that the developed solutions produce more accurate recommendations than the current state of the art.

Summary (Zusammenfassung)

The user-centric classification of resources by assigning freely chosen words (tags) has become an integral part of modern web applications. While controlled classification schemes such as taxonomies or ontologies often prove too rigid, costly, and insufficiently dynamic from the user's perspective, tags offer a high degree of flexibility and require little cognitive effort during the classification process. The user-friendliness and effectiveness of this kind of uncontrolled content classification explain the success and today's wide adoption of tagging communities such as Delicious, Flickr, LibraryThing, or Last.fm, which are also referred to as folksonomies. Folksonomies bundle the interests of thousands or millions of users and thereby become unique sources of information. Moreover, the existence of tags allows a unified modeling of content independent of its type or format and in many cases avoids costly processing, storage, and indexing of the content itself. Of particular interest from a data processing perspective are so-called "broad folksonomies", in which the same content can be referenced and annotated by multiple users, which leads to descriptive and characteristic accumulations of words.

This dissertation designs and evaluates algorithms for the personalized recommendation of content in folksonomies. Recommender systems support users in discovering interesting content among a confusingly large set of existing options. Although extensive research on recommender systems already exists, these approaches cannot be transferred to the tripartite folksonomy structure of content, users, and tags without loss of information. The algorithms developed in this thesis, in contrast, take this particular graph structure into account and benefit from the additional semantic information provided by tags. Extensive tests in different scenarios and for several folksonomies show that the developed solutions deliver better recommendations than existing approaches.

Acknowledgments

This dissertation would not have been possible without the constant encouragement and support of many inspiring people. I was fortunate to have two experienced supervisors during my time at the DAI-Labor of the Technische Universität Berlin. Professor Sahin Albayrak convinced me to follow the academic career path and provided the guidance as well as the freedom that were essential for this dissertation. Professor Christian Bauckhage introduced me to the world of research and has been an enduring source of inspiration and assistance ever since.

Furthermore, I have been lucky to work with an amazing group of researchers during the past four years. I am deeply thankful to: Tansu Alpcan, Leonhard Hennig, Alexander Korth, Florian Metze, Nicolas Neubauer, Till Plumbaum, Alan Said, Winfried Umbrath, and Carsten Zimmermann. I would especially like to thank Tansu Alpcan, Florian Metze, and Christian Bauckhage, who introduced me to the topic of folksonomies and provided the opportunity for extensive study in this direction. I thank Winfried Umbrath and Alan Said for all the fruitful discussions and countless MATLAB nights. I also want to thank Jérôme Kunegis, who constantly challenged the problems and solutions we discussed. Trying to convince him of my ideas always contested previous beliefs and led to new insights. I further thank all other members of the competence center on information retrieval and machine learning (CC-IRML) as well as the entire team of the DAI-Labor for their collaboration and support during the last years.

Finally, I would like to express my deep gratitude to the many people who proofread this dissertation: Leonhard Hennig, Alan Said, Andreas Hotho, Alice Motanga, Carsten Zimmermann, Stefan Fricke, and Jérôme Kunegis. Their valuable comments, questions, and ideas have helped to improve the quality and coherence of this dissertation significantly.

Last but not least I thank my family. For everything.

Publications

Many of the materials and concepts in this thesis have appeared in previous publications by the author.

• Robert Wetzker, Carsten Zimmermann, Christian Bauckhage, Sahin Albayrak, I tag, You tag: Translating Tags for Advanced User Models, In: 3rd ACM Int. Conf. on Web Search and Data Mining (WSDM), 2010
• Alan Said, Robert Wetzker, Winfried Umbrath, Leonhard Hennig, A hybrid PLSA approach for warmer cold start in folksonomy recommendation, In: 3rd ACM Conf. on Recommender Systems, Workshop on Recommender Systems and the Social Web, 2009
• Robert Wetzker, Alan Said, Carsten Zimmermann, Understanding the user: Personomy translation for tag recommendation, In: Eur. Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Discovery Challenge, 2009
• Nicolas Neubauer, Robert Wetzker, Klaus Obermayer, Tag Spam Creates Large Non-Giant Connected Components, In: 18th Int. World Wide Web Conference, 5th Int. Workshop on Adversarial Information Retrieval on the Web (AIRWEB), 2009
• Robert Wetzker, Winfried Umbrath, Alan Said, A hybrid approach to item recommendation in folksonomies (Best Technical Paper), In: 2nd ACM Int. Conf. on Web Search and Data Mining (WSDM), Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR), 2009
• Robert Wetzker, Carsten Zimmermann, Christian Bauckhage, Detecting Trends in Social Bookmarking Systems: A del.icio.us Endeavor, In: Int. Journal of Data Warehousing and Mining (to appear)
• Christian Bauckhage, Tansu Alpcan, Robert Wetzker, Winfried Umbrath, Image Retrieval and Web 2.0: Where Can We Go From Here?, In: 15th IEEE Int. Conf. on Image Processing (ICIP), 2008
• Robert Wetzker, Till Plumbaum, Alexander Korth, Christian Bauckhage, Tansu Alpcan, Florian Metze, Detecting Trends in Social Bookmarking Systems using a Probabilistic Generative Model and Smoothing, In: Int. Conf. on Pattern Recognition (ICPR), 2008
• Robert Wetzker, Carsten Zimmermann, Christian Bauckhage, Analyzing Social Bookmarking Systems: A del.icio.us Cookbook, In: Eur. Conf. on Artificial Intelligence (ECAI), 2008

Contents

Introduction
  Motivation
  Contribution and structure
  Related publications
  Out of scope
I The Nature of Folksonomies
1 An introduction to folksonomies
  1.1 The power of tags
  1.2 Broad and narrow folksonomies
  1.3 Folksonomies & IR
  1.4 Modeling folksonomies
  1.5 Datasets
2 A delicious endeavor
  2.1 Crawling Delicious
  2.2
