Technische Universität München
Fakultät für Informatik
Forschungs- und Lehreinheit XI
Angewandte Informatik / Kooperative Systeme
LIMITS AND CHANCES OF SOCIAL INFORMATION RETRIEVAL
Christoph Fuchs
Vollständiger Abdruck der von der Fakultät für Informatik der Technischen Universität München zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) genehmigten Dissertation.
Vorsitzende(r): Univ.-Prof. Dr. Dr. h.c. Javier Esparza
Prüfer der Dissertation:
1. Priv.-Doz. Dr. Georg Groh
2. Univ.-Prof. Dr. Helmut Krcmar
Die Dissertation wurde am 30.03.2016 bei der Technischen Universität München eingereicht und durch die Fakultät für Informatik am 05.08.2016 angenommen.
ABSTRACT
The prevailing approaches for web search are mainly driven by content similarities and disregard social relationships between the information seeker and the information provider. Furthermore, only explicitly published information is considered. Although several social search approaches exist, only a small subset interprets social search as querying other people's information spaces.

Following concepts like homophily from the social sciences, the objective of this thesis is to assess the potential of social information retrieval approaches to satisfy information needs. To this end, a specific but also highly customizable social information retrieval concept is developed, prototypically implemented, and evaluated in various usage scenarios. The results make it possible to identify limits of and success factors for social information retrieval systems.

By conducting a survey with 112 participants, we show that using one's social network is a valid method to satisfy information needs, but that privacy is considered a potential threat by information seekers (an additional survey also confirmed the results for information providers, n = 608). The analysis of two large social networking datasets from Twitter and Facebook indicates that content from socially close people is perceived as more important by the information seeker than content from other people, affirming social information retrieval as a promising method to satisfy information needs.

As part of the thesis, a social information retrieval concept is developed that is concrete and specific enough to be implemented prototypically, but also sufficiently flexible and parameterizable to cover a broad range of social information retrieval scenarios. The distributed character of the system leads to smaller document collections, which allow the application of semantically richer modeling approaches like latent topic models or explicit concept representations.
Using these prototypes, various aspects of the social information retrieval workflow are evaluated using (1) datasets covering socially relevant information (scientific abstracts as expertise profiles, social question & answer platforms) and (2) data obtained from a real-world social information retrieval experiment conducted with the developed prototypes and 121 participants over the course of three weeks. The social information retrieval experiment consists of a manual mode relying on human intelligence to route questions and answer them (considered the hypothetical upper bound w.r.t. quality), an automatic mode (routing and content identification performed by the system), and a specific use case (social product search).

The results confirm that an adjusted interaction pattern successfully mitigates the participants' reluctance to share information. The findings indicate that social closeness is positively correlated with the reply's degree of relevance. Based on the collected data, serendipitous effects cannot be linked to social closeness, but appear to co-occur with high degrees of content knowledge similarity. The outcome of the social product search experiment suggests that socially close people are interested in the same products with a higher probability than socially distant people. This can be interpreted as confirmation that social networks can support buying decisions.
Overall, the results indicate that social information retrieval is a promising enhancement of existing tools for information gathering, especially for information needs that benefit from personal judgment.
ZUSAMMENFASSUNG
Vorherrschende Verfahren zur Informationssuche im Web greifen vorwiegend auf inhaltliche Kriterien zurück und ignorieren weitgehend die soziale Beziehung zwischen informationssuchendem und -bereitstellendem Nutzer. Darüber hinaus werden ausschließlich explizit publizierte Informationen berücksichtigt. Obwohl einige Social-Search-Ansätze existieren, interpretiert nur ein kleiner Teil davon "Social Search" als direkte Abfrage der Informationsräume anderer Benutzer.

Dem aus den Sozialwissenschaften entlehnten Homophilie-Begriff folgend, ist das Ziel dieser Arbeit, das Potential von Social-Information-Retrieval-Ansätzen zur Erfüllung von Informationsbedürfnissen zu bewerten. Hierzu wird ein ausreichend spezifisches, aber dennoch hinreichend allgemeines Konzept eines Social-Information-Retrieval-Systems entwickelt, als Prototyp implementiert und in zahlreichen Anwendungsfällen evaluiert, um die Grenzen und Erfolgsfaktoren für Social-Information-Retrieval-Systeme zu identifizieren.

Basierend auf einer Umfrage unter 112 Teilnehmern zeigen wir, dass soziale Netzwerke eine ernstzunehmende Methode sind, Informationsbedürfnisse zu erfüllen, aber die Verletzung der Privatheit von den informationssuchenden Nutzern als potentielle Gefahr gesehen wird (eine zusätzliche Umfrage bestätigt die Ergebnisse auch für die Anbieter der Informationen, n = 608). Die Ergebnisse der Analyse zweier Datensätze aus dem Social-Networking-Bereich (Twitter, Facebook) deuten darauf hin, dass Inhalte von sozial nahestehenden Personen von informationssuchenden Benutzern als bedeutsamer wahrgenommen werden, was grundsätzlich Social Information Retrieval als vielversprechenden Ansatz bekräftigt.
Im weiteren Verlauf der Arbeit wird ein Konzept für ein Social-Information-Retrieval-System entwickelt, das einerseits ausreichend spezifisch und konkret ist, um als Prototyp implementiert zu werden, andererseits aber auch flexibel genug ist, um eine Vielzahl möglicher Anwendungsfälle und Implementierungsvarianten abzubilden. Die verteilte Struktur des Systems und die damit einhergehende geringere Größe der einzelnen Informationsräume erlaubt die Verwendung semantisch reicher Modellierungen wie Latent Topic Models oder die Rückführung auf explizite Konzeptrepräsentationen.

Mit Hilfe der Prototypen werden verschiedene Aspekte des Social-Information-Retrieval-Ablaufs evaluiert. Hierzu wird auf existierende Datensätze (wissenschaftliche Abstracts als Expertise-Profile, soziale Q&A-Seiten) und empirisch erhobene Daten aus einem Social-Information-Retrieval-Experiment über drei Wochen mit 121 Teilnehmern zurückgegriffen. Das Social-Information-Retrieval-Experiment besteht aus einem manuellen Modus, der bei Routing-Entscheidungen und der Beantwortung von Fragen ausschließlich auf menschliche Intelligenz zurückgreift (um eine hypothetische, obere Qualitätsgrenze zu simulieren), einem automatischen Modus, in dem Routing von Fragen und Identifikation relevanter Inhalte durch das System durchgeführt werden, und einem konkreten Anwendungsfall (Social Product Search).

Die Ergebnisse bestätigen, dass ein angepasstes Interaktionsmuster die Bereitschaft erhöht, Informationen zu teilen. Darüber hinaus weisen die Resultate darauf hin, dass soziale Nähe zwischen informationssuchendem und informationsbereitstellendem Nutzer positiv mit der Relevanz des Ergebnisses korreliert. Serendipity-Effekte können anhand der gesammelten Daten nicht durch soziale Nähe erklärt werden, sondern scheinen auf Ähnlichkeiten des vorhandenen Wissens zwischen beiden Parteien zurückzuführen zu sein. Das Ergebnis des Social-Product-Search-Experiments bekräftigt, dass sozial nahestehende Personen mit höherer Wahrscheinlichkeit Interesse an den gleichen Produkten haben als sozial weiter entfernte Personen. Diese Erkenntnis kann als Bestätigung der Eignung sozialer Netzwerke zur Unterstützung von Kaufentscheidungen interpretiert werden.

Insgesamt lässt sich festhalten, dass Social Information Retrieval eine vielversprechende Erweiterung existierender Werkzeuge zur Sammlung von Informationen darstellt, besonders für Informationsbedürfnisse, die von persönlichen Einschätzungen profitieren.
PUBLICATIONS
Fuchs, C. and Groh, G. (2015a). An Attempt to Evaluate Chances and Limitations of Social Information Retrieval. In Proceedings of the 5th International Conference on Advanced Collaborative Networks, Systems and Applications, COLLA ’15, St. Julians, Malta. IARIA.
Fuchs, C. and Groh, G. (2015b). Appropriateness of Search Engines, Social Networks, and Directly Approaching Friends to Satisfy Information Needs. In Proceedings of the 5th Workshop on Social Network Analysis in Applications, SNAA ’15, Paris, France.
Fuchs, C. and Groh, G. (2016a). Data Sharing Sensitivity, Relevance, and Social Traits in Social Information Retrieval. International Journal of Social Computing and Cyber-Physical Systems, submitted, currently in review.
Fuchs, C. and Groh, G. (2016b). Routing of Queries in Social Information Retrieval using Latent and Explicit Semantic Cues. In Proceedings of the Third Network Intelligence Conference, ENIC ’16, Wroclaw, Poland.
Fuchs, C., Hauffa, J., and Groh, G. (2015). Does Friendship Matter? An Analysis of Social Ties and Content Relevance in Twitter and Facebook. In Proceedings of the Service Summit Workshop and Service Summit 2015. Karlsruhe Institute of Technology.
Fuchs, C., Nayyar, A., Nussbaumer, R., and Groh, G. (2016a). Estimating the Dissemination of Social and Mobile Search in Categories of Information Needs Using Websites as Proxies. arXiv:1607.02062 [cs.CY]. https://arxiv.org/abs/1607.02062 (retrieved 2016-11-20).
Fuchs, C., Voigt, C., Baldizan, O., and Groh, G. (2016b). Explicit and Latent Topic Representations of Information Spaces in Social Information Retrieval. In Proceedings of the Third Network Intelligence Conference, ENIC ’16, Wroclaw, Poland.
Some ideas, results, and text fragments of this thesis have already appeared in the publications listed above. To improve the readability of the thesis, direct quotations from these publications have not been marked in the respective sections. The table in Appendix B shows in detail which parts have already appeared in which publication linked to the thesis.
CONTENTS
1 Introduction
I Foundations
2 Information Retrieval
  2.1 Information Retrieval Process
  2.2 Information Needs
    2.2.1 Models of Information Needs
    2.2.2 Classifications of Information Needs
  2.3 Information Retrieval Models
    2.3.1 Exact Match Models
    2.3.2 Vector Space Models
    2.3.3 Probabilistic Models
    2.3.4 Evaluation of Information Retrieval Systems
    2.3.5 Relevance and Serendipity
    2.3.6 Architectural Classes of Information Retrieval Systems
  2.4 Context
    2.4.1 General
    2.4.2 Social Context
    2.4.3 Mobile Context
    2.4.4 Temporal Context
    2.4.5 Personalized Search
  2.5 Social Search
    2.5.1 Search for Experts
    2.5.2 Search in Social Media
    2.5.3 Search in Peer-to-Peer Systems
  2.6 Privacy
    2.6.1 Privacy-Preserving Data Collection
    2.6.2 Privacy-Preserving Set Operations
3 Socio-Psychological and Market Aspects of Information Sharing
  3.1 Information Markets
  3.2 Information Foraging Techniques
4 Fundamentals of Semantic Web
5 Statistics & Tools
  5.1 Correlation Coefficients
    5.1.1 Pearson’s Correlation Coefficient
    5.1.2 Spearman’s Correlation Coefficient
  5.2 Regression
    5.2.1 Linear Regression
    5.2.2 Logistic Regression
    5.2.3 Ordinal Logistic Regression
    5.2.4 Random Effects Models
    5.2.5 LOESS Regression
  5.3 Statistical Tests and Methods
    5.3.1 Analysis of Variance (ANOVA)
    5.3.2 Wilcoxon Rank Sum Test
    5.3.3 Kruskal-Wallis Test
    5.3.4 Durbin-Watson Test
    5.3.5 Testing for Normality
    5.3.6 Testing for Heteroscedasticity
  5.4 Topic Models
    5.4.1 Latent Dirichlet Allocation (LDA)
    5.4.2 Similarity Measures
  5.5 Explicit Semantic Analysis (ESA)
II Concept for a Social Information Retrieval System
6 Architecture
7 Query Definition and Routing
  7.1 Finding the Right Information Provider
    7.1.1 Social Network
    7.1.2 Knowledge Advertisement
    7.1.3 Market Model for Social Capital
  7.2 Social Interaction
    7.2.1 Query Modes
    7.2.2 Privacy Settings
8 Information Space
  8.1 Representing Information
    8.1.1 Structured Information
    8.1.2 Unstructured Information
  8.2 Extracting Information
    8.2.1 Structured Information
    8.2.2 Unstructured Information
  8.3 Privacy
9 Core Services and Use Cases
  9.1 Statistics on Friends’ Activities
  9.2 Expertise Identification
  9.3 Expertise Gap Identification
  9.4 Bookkeeping System / Market Approach
III Empirical Studies
10 Exp. 1: Eligibility of Friends’ Information Spaces for IR
  10.1 Synopsis
  10.2 Motivation
  10.3 Research Questions
  10.4 Dataset
  10.5 Evaluation Methods
    10.5.1 Correlation of Relevance Judgments and Depth of Social Relationships
    10.5.2 Relation of Willingness to Help and Social Closeness
  10.6 Results
    10.6.1 Correlation of Relevance Judgments and Social Connections
    10.6.2 Relation of Willingness to Help and Social Closeness
  10.7 Limitations
11 Exp. 2: Information Seekers’ Willingness to Use SMQA
  11.1 Synopsis
  11.2 Motivation
  11.3 Research Questions
  11.4 Study Setup
  11.5 Participants
  11.6 Results
    11.6.1 Preference to Involve Others to Satisfy Information Needs
    11.6.2 Expected Response Quality
    11.6.3 Forwarding of Requests
  11.7 Discussion and Limitations
    11.7.1 Preference to Involve Others to Satisfy Information Needs
    11.7.2 Expected Response Quality
    11.7.3 Forwarding of Requests
12 Exp. 3: Classification of Information Needs for Social IR
  12.1 Synopsis
  12.2 Motivation
  12.3 Research Questions
  12.4 Approach
  12.5 Dataset
    12.5.1 Dimensions to Classify Information Needs
    12.5.2 Dimensions to Classify Specialized Websites
    12.5.3 Data Collection
  12.6 Results
    12.6.1 Correlations
    12.6.2 Explaining Sociality
    12.6.3 Summary
  12.7 Limitations
13 Exp. 4: Routing of Information Needs
  13.1 Synopsis
  13.2 Motivation
  13.3 Research Question
  13.4 Dataset
  13.5 Approach
  13.6 Results
  13.7 Limitations
14 Exp. 5: IR Using Topic Models
  14.1 Synopsis
  14.2 Motivation
  14.3 Research Questions
  14.4 Dataset
  14.5 Approach
    14.5.1 Preparation
    14.5.2 Using Similarity in the Latent Topic Space for IR
    14.5.3 Similarity in Latent Topic Space as Quality Measure for Answers
    14.5.4 Text Length as Quality Measure for Answers
  14.6 Results
    14.6.1 Using Latent Topic Space for IR
    14.6.2 Similarity in Latent Topic Space as Quality Measure for Answers
    14.6.3 Text Length as Quality Measure for Answers
  14.7 Summary of Results
  14.8 Limitations
15 Exp. 6: IR Using Explicit Semantic Analysis (ESA)
  15.1 Synopsis
  15.2 Motivation
  15.3 Research Question
  15.4 Dataset
  15.5 Approach
  15.6 Results
  15.7 Limitations
16 (Main) Exp. 7: Social Information Retrieval Experiment
  16.1 Synopsis
  16.2 Motivation
  16.3 Research Questions
  16.4 Participants
  16.5 Approach
    16.5.1 Preparation
    16.5.2 Exp. 7a: Social IR in Manual Mode
    16.5.3 Exp. 7b: Automated Social IR Using Topic Models
    16.5.4 Exp. 7c: Social Product Search Experiment
  16.6 Results
    16.6.1 Exp. 7a: Social IR in Manual Mode
    16.6.2 Exp. 7b: Automated Social IR Using Topic Models
    16.6.3 Exp. 7c: Social Product Search Experiment
  16.7 Limitations
IV Summary of Results and Discussion of Implications
17 Summary of Results
  17.1 How Do Social Context and Interaction Archetype Influence Data Sharing?
  17.2 Relevance and Serendipity of Results
    17.2.1 How Relevant Are Information Items Taken From Non-Public Information Spaces of Socially Close People?
    17.2.2 Does Social Closeness Imply a Valuable Contribution to Retrieving Information From the Unconscious Information Need?
  17.3 Which Social Concepts Influence the Users’ Routing Decisions?
  17.4 Which Categories of Information Needs Could Benefit From Social IR?
18 Implications for Social Information Retrieval Systems
19 Conclusion
A Appendix
  A.1 Experiment 3
  A.2 Experiment 7
    A.2.1 Variables
    A.2.2 Technical Architecture
    A.2.3 Additional Results
B Prior Publications
Bibliography

LIST OF FIGURES
Figure 1  Research approach following the design science methodology
Figure 2  Conceptual structure of the thesis
Figure 3  Information retrieval process (Göker, 2009)
Figure 4  Schematic structure of the user agent
Figure 5  Degree of overrepresentation of groups within the retweet set (Experiment 1)
Figure 6  Routing decisions of participants by type of information need (Experiment 2)
Figure 7  Participants’ preferences to satisfy information needs (Experiment 2)
Figure 8  Study approach (Experiment 3)
Figure 9  Difference of precision and recall values for ESA-Link and LDA-AVG on the Cranfield collection (Experiment 4)
Figure 10 Performance of different routing strategies on the Cranfield collection (Experiment 4)
Figure 11 Word clouds for the German dataset, covering the topics “grammatical cases” and “pronunciation” (Experiment 5) (Voigt, 2015)
Figure 12 Precision-recall curves for LDA-based IR approaches and TF/TF-IDF for Stackexchange communities Beer, German, and History (Experiment 5, based on (Voigt, 2015))
Figure 13 Precision-recall curves for LDA-based IR approaches and TF/TF-IDF for Stackexchange communities Islam and Travel (Experiment 5, based on (Voigt, 2015))
Figure 14 Average distance between ranking defined by up- and down-votes and similarity measures for questions and related answers (Experiment 5, (Voigt, 2015))
Figure 15 Deviation of the results considering only questions with at least four answers (Experiment 5, (Voigt, 2015))
Figure 16 Distance of rankings based on text length and community judgments (up- and down-votes); deviation of results when using only questions with at least four answers (Experiment 5, (Voigt, 2015))
Figure 17 IR performance of ESA, LDA, and TF-IDF on the Stackexchange corpus (Experiment 6)
Figure 18 Ignored/answered queries (Experiment 7a)
Figure 19 Residuals of simple linear model to explain privacy (Table 16, Experiment 7a)
Figure 20 Residuals of linear model with random effects to explain privacy (Table 17, Experiment 7a)
Figure 21 Privacy degrees of information items (hypothetically) shared in various interaction settings (Experiment 7a)
Figure 22 Sharing preferences for information items with different degrees of privacy (Experiment 7a)
Figure 23 Information providers’ willingness to share an information item when being explicitly asked for it (Experiment 7a)
Figure 24 Linear model to explain serendipity with social attributes (Experiment 7a)
Figure 25 Social edge attributes for selected and not selected social contacts (Experiment 7a)
Figure 26 Density of Fiske’s elementary forms of sociality (Experiment 7a)
Figure 27 Rating differences for Fiske’s elementary forms of sociality (Experiment 7a)
Figure 28 Frequency of reasons for routing a query to a specific information provider (Experiment 7a)
Figure 29 Number of unique queries, split by content category (Experiment 7a)
Figure 30 Satisfaction, relevance, personalization, and unexpectedness of queries, split by content category (Experiment 7a)
Figure 31 Number of unique queries, split by type category (Experiment 7a)
Figure 32 Satisfaction, relevance, personalization, and unexpectedness of queries, split by type category (Experiment 7a)
Figure 33 Distribution of unique queries, split by content and type category (manual mode)
Figure 34 Relevance of replied URLs for information needs, split by strategy to select information providers (Experiment 7b)
Figure 35 Correlation of social attributes and similarity of viewed products on Amazon (Experiment 7c)
Figure 36 Correlation of social attributes and similarity of bought products on Amazon (Experiment 7c)
Figure 37 Usefulness and degree of unexpectedness of “social items”, split by strategy (Experiment 7c)
Figure 38 Usefulness and degree of unexpectedness for items from Amazon, social strategies, and “faked” social recommendations (Experiment 7c)
Figure 39 Distribution of serendipity, split by social strategy and result item origin (Experiment 7c)
Figure 40 Screenshot of SNExtractor tool (Experiment 7)
Figure 41 Screenshot of SNTranslator tool (Experiment 7)

LIST OF TABLES
Table 1  Classification of information needs by Spink et al. (Spink et al., 2002)
Table 2  Classification of search terms by Church et al. (Church et al., 2007)
Table 3  Confusion Matrix
Table 4  Variables in market model explaining social capital flows
Table 5  R1, T1, R2, and T2, averaged over all users (Experiment 1)
Table 6  Information needs assembled by Oeldorf-Hirsch et al. (Oeldorf-Hirsch et al., 2014) (Experiment 2)
Table 7  Correlation between dimensions and content categories of information needs (Experiment 3, Spearman’s rho)
Table 8  Correlation between dimensions and content categories of information needs (Experiment 3, Pearson’s r)
Table 9  Linear regression model to explain degree of sociality (Experiment 3)
Table 10 Linear regression model to explain degree of sociality using content categories (Experiment 3)
Table 11 Features of the selected datasets for Experiment 5
Table 12 Variables describing the relationship between participants and their social network imported from Facebook (Experiment 7)
Table 13 Variables in Experiment 7b (automatic mode)
Table 14 Variables in Experiment 7c (product search scenario)
Table 15 Correlation between degree of privacy for the shared information item and the social attributes of the relationship (Experiment 7a)
Table 16 Properties of linear regression model to explain privacy (Experiment 7a)
Table 17 Properties of linear regression model with random effects to explain privacy (Experiment 7a)
Table 18 Logistic regression models to explain privacy_high variable (Experiment 7a)
Table 19 Properties of ordinal logistic regression model to explain privacy (Experiment 7a)
Table 20 Summary: Impact of social attributes on the information provider’s privacy judgment (Experiment 7a)
Table 21 Answer frequency for different categories of sharing alternatives in Experiment 7a
Table 22 Correlation of relevance and social attributes in Experiment 7a
Table 23 Linear regression model with random effects to explain relevance (Experiment 7a)
Table 24 Logistic regression model to explain relevance (Experiment 7a)
Table 25 Logistic regression model with random effects to explain relevance (Experiment 7a)
Table 26 Summary: Impact of social attributes on relevance (Experiment 7a)
Table 27 Correlation of serendipity and social attributes (Experiment 7a)
Table 28 Linear model with random effects to explain serendipity with social attributes (Experiment 7a)
Table 29 Summary: Impact of social attributes on serendipity (Experiment 7a)
Table 30 Logistic regression model to explain why a social contact was chosen as an information provider (Experiment 7a)
Table 31 Logistic regression model with random effects to explain when a social contact was chosen as an information provider (Experiment 7a)
Table 32 Reasons for routing a query to its respective information provider (Experiment 7a)
Table 33 Content categories of questions asked by the participants (Experiment 7a)
Table 34 Coefficients and significance measures of the logistic regression models to estimate low/high manifestation of the routing strategies based on social attributes using relevance (Experiment 7b)
Table 35 Average degree of satisfaction, unexpectedness, and serendipity for each information provider assignment strategy (Experiment 7b)
Table 36 Websites used in Exp. 3, based on Alexa’s categories and top sites
Table 37 Variables gathered during preparation phase of Experiment 7 describing each participant
Table 38 Variables describing the manual query approach in Experiment 7a
Table 39 Additional variables describing the manual query process when queries get forwarded in Experiment 7a
Table 40 Technical components for the social information retrieval experiment (Experiment 7)
Table 41 Linear model to explain privacy judgment (Experiment 7a)
Table 42 Linear model to explain log(privacy+1) (Experiment 7a)
Table 43 Logistic regression model to explain whether information providers reply to requests (Experiment 7a)
Table 44 Logistic regression model to explain whether information providers reply to requests (Experiment 7a)
Table 45 Mapping table for quotations from prior publications
1 INTRODUCTION
Search is one of the most important applications on the world wide web. With more than 140 billion searches per month in 2012 [1], search engines like Google, Baidu, Yahoo!, or Bing satisfy the information needs of the majority of the internet population by identifying relevant documents matching a search query. Today's prevailing search engines rely on a global index enriched with approaches for personalization (Micarelli et al., 2007; Ghorab et al., 2013; Hannak et al., 2013) or location-awareness (Sohn et al., 2008; Church et al., 2012), conceptualization (Dong et al., 2008) (like the Google Knowledge Graph [2]), and link structure within the network (Brin and Page, 1999; Kleinberg, 1999). To find a relevant answer to an information need, the answer must have been recorded in a document, which must have been published on the web and indexed by the respective search engine. The underlying principle can be referred to as the library paradigm: information is codified and made available to others, who can search for it in a database (Horowitz and Kamvar, 2010; i Mansilla and de la Rosa i Esteva, 2013). The system works well: at no point in history has humankind had access to more information than today. Facilitated by the rapid development of technical possibilities and the need for effective tools to manage the increasing number of published documents, search approaches based on the library concept have profoundly shaped the world wide web and the way we consume information.

Before writing was developed, the dominant form of information retrieval followed the village paradigm (Horowitz and Kamvar, 2010; i Mansilla and de la Rosa i Esteva, 2013).
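The library paradigm (codify, publish, index, search) can be illustrated with a minimal inverted index; the function names and the toy corpus below are invented for illustration and are not part of the thesis's implementation:

```python
from collections import defaultdict

def build_index(documents):
    # Map each term to the set of document ids containing it; under the
    # library paradigm, only documents added here are findable at all.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    # Exact-match retrieval: return ids of documents containing every query term.
    term_sets = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

docs = {1: "social information retrieval",
        2: "web search engines",
        3: "social web search"}
idx = build_index(docs)
print(sorted(search(idx, "social search")))  # -> [3]
```

Real engines add ranking (e.g., TF-IDF weighting or link analysis) on top of such an index; the point here is only that unpublished, private information never enters the index, which is exactly the gap social information retrieval addresses.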
Information seeking was based on oral communication with the community: the information seeker first attempted to answer her question using her own knowledge. If this approach did not succeed, she formulated a question and tried to identify candidates who could provide a satisfactory answer. After asking the question and receiving a reply (which could involve a forwarding step to a better suited candidate), the information seeker evaluated the response and either continued the search process (e.g., with an adapted version of the question and/or a different information provider) or learned from the response and possibly rewarded those who helped to answer the question.

With the rise of social media, the barrier to publishing one's own content was lowered, and social relationships became explicitly modeled in online social networking platforms. Online networking platforms like Facebook display content shared by socially close friends and follow the approach of recommending content to their users. A recent analysis [3] shows that Facebook drives more traffic to news sites than Google, the market-leading search engine provider. These findings can be interpreted as an indication of the success of social recommendation techniques. While social recommendation (Ricci et al., 2015, p. 511-543) is not the same as information retrieval (Manning et al., 2008), it shows a
[1] http://www.internetlivestats.com/google-search-statistics/ (retrieved 2015-07-26)
[2] https://googleblog.blogspot.de/2012/05/introducing-knowledge-graph-things-not.html (retrieved 2016-03-01)
[3] http://fortune.com/2015/08/18/facebook-google/ (retrieved 2016-01-15)
lot of parallels (de Vries, 2015) and therefore supports the idea that social concepts could also positively influence information retrieval.

Even though conceptual parts borrowed from the village paradigm are becoming more popular (like trust or personalization in social media, or question & answer websites like Quora [4]), modern, automated search approaches based on the village paradigm are still in their infancy (i Mansilla and de la Rosa i Esteva, 2013). The present thesis investigates the limits and chances of social information retrieval using a prototypical implementation of a social information retrieval system. The system is designed to canonically reflect the village paradigm to stress the social element of the information retrieval process and to allow a generalization of the results. The strengths and weaknesses of the search system are explored by conducting empirical experiments with a network of (social) agents who try to solve information needs by asking each other.

In the presented approach, users (automatically and manually) maintain a private information space and can query other users for information. The concept covers the integral steps of the search process for information seekers and information providers: (1) identifying suitable candidates to query, (2) sending queries to the information provider(s), (3) finding relevant answers in the individual private information space (performed by the information provider), and (4) replying to questions. For each of these steps, different technical implementations based on latent topic models, explicit concept representations, and privacy-preserving data collection mechanisms are evaluated against each other. Such an ecosystem cannot be treated as an isolated technical issue: to fully unfold its capabilities, a social search system must be designed carefully to meet human preferences in terms of interaction and information sharing.
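As a rough illustration of steps (1) to (4), the query loop can be sketched as follows. All names and data are hypothetical, and plain word overlap stands in for the latent topic models and explicit concept representations actually evaluated in the thesis:

```python
def rank_candidates(profiles, query):
    # Step 1: score social contacts by overlap between the query terms and
    # their (advertised) expertise profile; return promising candidates first.
    q = set(query.lower().split())
    scores = {person: len(q & set(profile.lower().split()))
              for person, profile in profiles.items()}
    return sorted((p for p, s in scores.items() if s > 0), key=lambda p: -scores[p])

def answer_query(information_space, query):
    # Steps 3 and 4: the information provider searches her private information
    # space and replies with the best-matching item (or None if it is empty).
    q = set(query.lower().split())
    return max(information_space,
               key=lambda item: len(q & set(item.lower().split())),
               default=None)

# Step 2: the seeker sends the query to the ranked candidates and collects replies.
profiles = {"alice": "hiking travel photography", "bob": "databases compilers"}
spaces = {"alice": ["trail map for the Alps", "travel checklist"], "bob": ["SQL notes"]}
query = "travel tips"
replies = {p: answer_query(spaces[p], query) for p in rank_candidates(profiles, query)}
print(replies)  # -> {'alice': 'travel checklist'}
```

In the actual concept, step 1 additionally weighs social attributes of the seeker-provider relationship, and step 4 is subject to the provider's privacy settings; this sketch only fixes the shape of the workflow.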
To explore the relevant social parameters of such a system, several user studies have been conducted with partial prototypes of the system. Using this setup, the thesis investigates the following research questions:
1. How do social context and interaction archetypes influence users’ data sharing sensitivity in view of social information retrieval approaches?
2. Relevance and Serendipity of Results
a) How relevant are information items taken from non-public information spaces of socially close people when satisfying information needs?
b) Does social context imply a valuable contribution to retrieving information from the unconscious information need (serendipitous information)?
3. Which social concepts influence the users’ routing decisions?
4. Which categories of information needs could benefit from social information retrieval?
A distributed social information retrieval system requires users who are willing to share information. Research Question 1 investigates how social context and the way the interaction is organized and supported by some system influence the users’ willingness to share information. The first part of Research Question 2 explores how
4 http://www.quora.com (retrieved 2016-01-15)
far information held by socially close people can be seen as a relevant source of information to satisfy one's own information need. The second part of Research Question 2 elaborates on the idea that socially close participants might possess information that fosters serendipity (i.e., satisfies the unconscious information need). Serendipity is defined as a “lucky accident”: finding “interesting and inspiring information” without explicitly looking for it or expecting to find it (Dörk et al., 2011; Dörk et al., 2012). In Research Question 3, the social attributes that are important to determine which person to query are analyzed. Research Question 4 tries to identify which types of information needs would benefit most from a distributed social information retrieval approach. Adjusting and topically or functionally limiting the approach to maximize the benefits for the narrowed-down information needs could increase the usability and acceptance of the overall solution. The thesis follows a design science approach (Hevner et al., 2004). Accordingly, the objective is to design and evaluate artifacts to increase the understanding of the problem setting and improve the proposed solution. Hevner et al. listed seven research guidelines, which shall be briefly set in relation to this work in the next paragraphs.
design as artifact Research following the design science principle “must produce a viable artifact in the form of a construct, a model, a method, or an instantiation” (Hevner et al., 2004). As part of this thesis, several components of the proposed social information retrieval system are implemented prototypically to better understand the problem domain and to gain additional knowledge about the users’ perception and preferences. Examples include the prototypes used in Experiment 7 (Chapter 16, Section A.2.2) and the routing and indexing methods presented in Experiments 4, 5, and 6 (Chapter 13, Chapter 14, Chapter 15).
problem relevance The design science methodology aims “to develop technology-based solutions to important and relevant business problems” (Hevner et al., 2004). Search for information is a relevant problem. Experiments 1 and 2 (Chapter 10, Chapter 11) and previous research (i Mansilla and de la Rosa i Esteva, 2013) suggest that social means are useful sources for information and therefore a valid and relevant approach to improve information retrieval for certain types of information needs.
design evaluation The design science research guidelines require that “utility, quality, and efficacy of a design artifact must be rigorously demonstrated via well-executed evaluation methods” (Hevner et al., 2004). The results of the various experiments evaluating the concept are presented in each experiment’s chapter. A summary of the results for all research questions is given in Chapter 17.
research contributions According to (Hevner et al., 2004), “design-science research must provide clear and verifiable contributions in the area of the design artifact, design foundations, and/or design methodologies”. The contributions of the thesis are summarized in Part IV.
research rigor Hevner et al. require that design science “(...) research relies upon the application of rigorous methods in both the construction and evaluation
Figure 1: Research approach following the design science methodology
of the design artifact” (Hevner et al., 2004). The design artifacts have been created using and further developing state-of-the-art techniques taken from the information retrieval domain, including relatively new concepts and approaches like topic models based on LDA (Section 5.4) and explicit semantic representations based on ESA (Section 5.5). The performance of the artifacts has been evaluated with state-of-the-art tools from statistics (Section 5.2.1), including various types of regression models (linear regression, logistic regression, ordinal logistic regression) and random effects models.
design as search process According to Hevner et al., the “search for an effec- tive artifact requires utilizing available means to reach desired ends while satisfying laws in the problem environment”. Hevner interprets design science as “inherently iterative”, with design being a search process “to discover an effective solution to a problem”. The approach used in this thesis follows an iterative process.
communication of results Hevner et al. require that “design-science research must be presented effectively both to technology-oriented (...) and management-oriented audiences”. The present work can be seen as documentation of the results; some of the outcomes have already been communicated before in multiple publications listed on page vii. The thesis is structured in a way that audiences with different levels of expertise and interest in technological details can follow the overall argumentation.
Figure 1 illustrates the research approach for the present thesis: in the beginning, the problem was motivated, using Experiments 1, 2, and 3 and additional literature as proof of relevance. Based on the insights derived from the literature and the first experiments, an initial concept has been developed. The concept was evaluated with empirical experiments, each covering one or multiple parts of the social information retrieval concept. The experiments were designed to increase the understanding of the problem domain and to adjust the proposed concept. An overview of the thesis’ conceptual structure is shown in Figure 2. Part I briefly summarizes related work from the information retrieval domain, describes models for information exchange, and briefly gives some technical background on semantic web and the statistical methods used. In Part II, the proposed concept for social information retrieval is introduced. For each of the major steps in the information seeking process, a detailed concept for at least one technical solution is presented. Chapter 6 gives an overview of the complete system and the architecture. Chapter 7 describes
Figure 2: Conceptual structure of the thesis
the process to identify a suitable information provider, relying on the social network topology, the advertised knowledge, and a social capital market model based on previous interactions. Chapter 8 discusses multiple possibilities to organize private information spaces (which are evaluated in the empirical part of the thesis). Chapter 9 describes four exemplary scenarios where social information retrieval would be beneficial. Part III gives an overview of the empirical studies that have been conducted in the context of this thesis. While Experiments 4, 5, 6, 7b, and 7c (Chapter 13, Chapter 14, Chapter 15, Chapter 16) cover questions about the technical implementation of the system, Experiments 1, 2, 3, and 7 (Chapter 10, Chapter 11, Chapter 12, Chapter 16) explore the users’ social preferences in using such a system. Part IV summarizes the results and gives an overview of the implications for social information retrieval systems. Finally, Chapter 19 concludes with an outlook on future research topics in this area.
Part I
FOUNDATIONS
In the following chapters, applied tools and related work are briefly summarized. Chapter 2 gives an introduction to Information Retrieval, covering the definition of the information retrieval process (Section 2.1), various classifications of information needs (Section 2.2), and an overview of information retrieval models (Section 2.3), context-sensitive approaches (Section 2.4), social search (Section 2.5), and privacy (Section 2.6). Chapter 3 briefly describes existing attempts to model information retrieval using market structures and foraging theories. Chapter 4 explains required concepts taken from the semantic web area, while Chapter 5 details the statistical tools and methods used in the thesis.
2
INFORMATION RETRIEVAL
Manning et al. define the term Information Retrieval as “finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers)” (Manning et al., 2008, p. 1). In the following, the information retrieval process is explained (Section 2.1), a short overview of different types of information needs is given (Section 2.2), established approaches for information retrieval models are described (Section 2.3), approaches considering various forms of context in information retrieval are discussed (Section 2.4), a brief summary of social search approaches is presented (Section 2.5), and finally two examples for privacy-preserving algorithms related to the problem domain are given (Section 2.6).
2.1 information retrieval process
According to (Hiemstra, 2009), the information retrieval process has to cover three basic steps (Figure 3):

1. Express the user’s information need,
2. express the content of the documents in the document collection, and
3. match both representations.

The user translates the information need she is aware of (“conscious information need”, cf. Section 2.2) into a query, i.e., a representation of the information need that can be matched against the representation of the documents in the document collection (more abstractly also referred to as information items in an information space). During the indexing phase, all documents in the collection are translated into a representation that facilitates matching with the user’s query. The matching step estimates
Figure 3: Information retrieval process (Göker, 2009)
whether a certain information item is “relevant” to the user’s query, based on the criteria used as a relevance metric by the matching approach. The final result is a set of relevant documents (in most cases, a sorted list, depending on the specific system) where the user can choose which documents to examine further. It is likely that the user decides to adjust the information need or its representation after reviewing the result set. In certain setups, e.g. in “personalized search” (cf. Section 2.4.5), the user’s implicit or explicit feedback influences the matching procedure.
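The three steps above can be sketched in a few lines of Python. The toy corpus, the whitespace tokenization, and the overlap-count relevance metric are all illustrative simplifications, not part of any concrete system discussed in this thesis:

```python
# A toy sketch of the three-step retrieval process described above; the corpus,
# tokenization, and overlap-count "relevance" metric are illustrative only.

def index(documents):
    """Step 2: represent each document as a set of lowercase terms."""
    return {doc_id: set(text.lower().split()) for doc_id, text in documents.items()}

def express_query(information_need):
    """Step 1: express the conscious information need as a query representation."""
    return set(information_need.lower().split())

def match(query_terms, indexed):
    """Step 3: rank documents by the number of shared terms (a crude relevance metric)."""
    scores = {doc_id: len(query_terms & terms) for doc_id, terms in indexed.items()}
    return sorted((d for d in scores if scores[d] > 0),
                  key=lambda d: scores[d], reverse=True)

docs = {"d1": "University of Munich informatics",
        "d2": "cooking recipes for students"}
ranking = match(express_query("informatics university"), index(docs))
```

The sorted, filtered result list corresponds to the set of relevant documents returned to the user, who may then reformulate the query as described above.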
2.2 information needs
2.2.1 Models of Information Needs
Mizzaro (Mizzaro, 1998) describes the transformation process from the original information need to an actual query as follows: The user’s (objective) real information need (RIN) constitutes the objective problem the user has to solve. The user perceives the RIN as perceived information need (PIN), which is an implicit mental model of the RIN within the user’s mind (i.e., a representation of a problem the user thinks she wants to solve). RIN and PIN are not necessarily equal. Using the PIN as a basis, the user expresses the information need as a request, i.e., in human language. In a final step, the user translates the expression to the actual query, i.e., the input for the search system. Each of the four representations of the information need (RIN, PIN, expression, query) can describe a different set of relevant information items. This model is able to explain serendipitous results (Section 2.3.5), where information items are not necessarily considered relevant by the chosen relevance metric or the user herself, although they are relevant according to the RIN. Each transformation step on the way from the RIN to the actual query could change the set of relevant information items: it is possible that RIN and PIN do not fully overlap, e.g., because of the user’s limited knowledge of the content domain. In addition, users might struggle to translate the PIN to words and the words to a meaningful query.
2.2.2 Classifications of Information Needs
To distinguish different types of information needs from a content perspective, several attempts have been made to define a category system for information needs. An example for an early classification schema is given in (Spink et al., 2002), where the authors examined how human information needs and search behaviors have evolved along with web content using a dataset from the Excite search engine in 1997, 1999, and 2001 (see Table 1).
Sex or Pornography; Education or Humanities; Government; People, Places or Things; Commerce, Travel, Employment or Economy; Computers or Internet; Entertainment or Recreation; Health or Sciences; Performing or Fine Arts; Society, Culture, Ethnicity or Religion; Unknown or Other
Table 1: Classification of information needs by Spink et al. (Spink et al., 2002)
Kamvar and Baluja (Kamvar and Baluja, 2006) examined the state of web search on mobile phones in 2006 and analyzed 1,000,000 visits on Google’s mobile search site, taking into account different hardware platforms (computers, keypad cell phones) to explore differences between the respective use cases. Two years later, the authors did a similar study (Kamvar et al., 2009) and compared the search patterns for searches conducted from computers, iPhones, and conventional mobile phones. The resulting categories are quite similar to the ones in the older study; however, the usage patterns changed: while in the previous study, the authors could clearly identify different patterns for search sessions conducted on a normal computer or a mobile phone, those differences were reduced for users with more powerful smartphone devices in the second study. In (Church et al., 2007), the authors compared mobile and stationary internet usage using a large collection of more than 30 million mobile internet requests generated by more than 600,000 European mobile subscribers over a 24-hour period in 2005. The obtained categories are listed in Table 2. Afterwards, the authors investigated information needs of mobile internet users by conducting a diary experiment (Church and Smyth, 2009) with 20 participants. The participants were asked to document every information need that they recognized in a diary, along with additional attributes like date, time, and location. For the analysis, information needs were clustered by topics (based on the initial schema obtained in (Church et al., 2007)), by location context (away from desk, commuting, home, on-the-go, travelling abroad, work/college), and by goal (informational, geographical, and personal information management).
Adult; Multimedia; Email, Messaging & Chat; Search & Finding Things; Entertainment; Games; Unknown/Unclassified; Socializing & Dating; Shopping & eCommerce; Mobile Applications, Websites & Technologies; Sport; Auto; News & Weather; Local Services; Information; Employment
Table 2: Classification of search terms by (Church et al., 2007).
Sohn et al. (Sohn et al., 2008) investigated how mobile information needs are addressed. The term “mobile” was defined as being away from home or work. In a diary study with 20 people, the authors created a broad categorization of information needs based on the participants’ diaries and their feedback. Information needs were addressed using the web (30%), calling someone who has the information (23%), calling someone who acts as a proxy to the information (16%), using external applications like Google Maps (10%), asking someone face to face (7%), referring to a prepared printout (7%), going to the location (5%), and other means (2%). Other studies like (Morris et al., 2010b) cover Social Media Question Asking (SMQA, cf. Section 2.5.2) and analyzed which types of questions are asked to members of one’s own social network. Morris et al. (Morris et al., 2010b) conducted a survey with 624 participants about asking and answering questions on social network platforms like Facebook and Twitter. The authors identified the following topic areas using an affinity diagramming technique ((Beyer and Holtzblatt, 1998), cited by (Morris et al., 2010b)): Technology, Entertainment, Professional, Restaurants, Shopping, Home & Family, Places, Current Events, Ethics & Philosophy, and Miscellaneous. Dearman et al. (Dearman et al., 2008) conducted a diary study with 20 participants to explore and analyze which information people needed and which they decided to share. The authors “(...) instructed participants to record into their diary all information they need for a task or to satisfy a curiosity, and any information they acquire throughout their everyday experiences that they would like to share with others”. Their findings suggest that information needs are often “situated and contextualized”; their participants stated that contacts linked via weak ties or common contexts would be ideal candidates to provide help. These findings confirm Granovetter’s theory about the strength of weak ties (Granovetter, 1973). A more in-depth perspective on different positions on tie strength is given towards the end of Section 2.5.1. For a majority of questions, the participants stated that they would be willing to share the received information with others.
2.3 information retrieval models
In (Hiemstra, 2009) and (Manning et al., 2008), the respective authors provide a comprehensive overview of different information retrieval models. In the following, the main characteristics of each archetype are briefly summarized.
2.3.1 Exact Match Models
boolean model In the boolean model, the query can be seen as a boolean expression that must be true for all documents that are part of the result set. A query like “university AND Munich AND informatics” would select all documents in the document collection that are indexed with all three words. A document indexed with “university”, “Munich”, and “computer science” would not be in the result set. The boolean model does not provide any support for ranking relevant documents because all documents fulfill the query requirement to the same extent.
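A minimal sketch of this conjunctive query evaluation over an inverted index; the posting sets mirror the example from the text and all identifiers are illustrative:

```python
# Sketch of the boolean model example from the text: the query
# "university AND Munich AND informatics" matches only documents indexed
# with all three terms. The posting sets below are illustrative.

inverted_index = {
    "university":  {"d1", "d2"},
    "munich":      {"d1", "d2"},
    "informatics": {"d1"},
    "computer":    {"d2"},
    "science":     {"d2"},
}

def boolean_and(terms, index):
    """Intersect the posting sets: a document qualifies only if it contains
    every query term; all hits are equally 'relevant' (no ranking)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

hits = boolean_and(["university", "munich", "informatics"], inverted_index)
# d2 (indexed with "computer science" instead of "informatics") is excluded.
```

Note that the result is an unordered set, which reflects the model's inability to rank documents.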
region models A region is a sequence of consecutive words within a document, defined by its start and end position. A region model works like a standard boolean model but considers text regions as the default unit (instead of complete documents like the boolean model). In addition to the expressions taken from the boolean model, region models use at least the operators CONTAINING and CONTAINED_BY. Similar to the boolean model, relevance is a pure binary measure – ranking of results is not possible. To overcome this constraint, extensions have been proposed (e.g. in (Mihajlovic, 2006)).
2.3.2 Vector Space Models
Vector space models transform documents and queries to the same high-dimensional Euclidean vector space and calculate the similarity between the query and all available documents. Each term is represented by a dimension within the vector space. The
degree of similarity between a document vector and a query vector is interpreted as a measure for “relevance”. Frequently used similarity functions include the cosine value between the vectors, the dot product (which would also consider the magnitude of the vector), or Jaccard similarity. Examples for prominent vector space models are TF-IDF (term frequency, inverse document frequency) and its enhancement, TW-IDF (term weight, inverse document frequency) (Rousseau and Vazirgiannis, 2013). In TF-IDF, each document (or query) is expressed as vector ~v with
~v_i = TF_i · IDF_i = TF_i · log(N_D / f_i)   (1)

with TF_i representing the Term Frequency (how often the specific term related to dimension i occurs in the document) and IDF_i denoting the Inverse Document Frequency, i.e. a measure that is calculated based on the number of documents in the collection (N_D) and the number of documents that contain the respective term for dimension i (f_i). The intuition is that terms that appear in many documents are not helpful to distinguish the documents among each other. TF-IDF does not reflect the position of the term inside the document (bag-of-words assumption, (Manning et al., 2008)). In TW-IDF, the relations between words in the documents are modeled using an unweighted directed graph. Terms are represented as vertices and edges represent co-occurrences of terms within a fixed-size sliding window. The direction of the edges represents the order of the terms. With the help of this graph, meaningful term weights are extracted and replace traditional term frequencies.
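Equation 1 and the cosine similarity mentioned above can be sketched as follows; the toy documents and the plain natural logarithm are illustrative choices (real systems typically add smoothing and weighting variants):

```python
import math

# Sketch of TF-IDF weighting (Equation 1) and cosine similarity for sparse
# term-weight vectors. Documents and the unsmoothed logarithm are illustrative.

def tfidf_vectors(docs):
    """Map each document to a sparse vector {term: TF * log(N_D / f)}."""
    n_docs = len(docs)
    doc_freq = {}                               # f: documents containing the term
    for text in docs.values():
        for term in set(text.split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    vectors = {}
    for doc_id, text in docs.items():
        tf = {}
        for term in text.split():
            tf[term] = tf.get(term, 0) + 1
        vectors[doc_id] = {t: f * math.log(n_docs / doc_freq[t])
                           for t, f in tf.items()}
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if a norm is zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

docs = {"d1": "social search social", "d2": "social media", "d3": "web graphs"}
vectors = tfidf_vectors(docs)
```

A query would be transformed into the same vector space and compared against each document vector, with the cosine value serving as the relevance score.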
2.3.3 Probabilistic Models
probabilistic indexing models In Maron and Kuhns’ indexing model (Maron and Kuhns, 1960), the indexer assigns each index term t a probability P(t|d) given a document d. By doing this, d is not linked to t in a binary manner (yes/no) but using a more expressive probability measure. Each document is assigned to a set of index terms, weighted by their respective value of P(t|d). If a user wants to search for documents relevant to a specific search term, she is interested in the documents with a high value of P(d|t). Using Bayes’ rule, P(d|t) can be rewritten as P(t|d)P(d)/P(t). Assuming that P(t) is a constant, documents can be ranked by P(t|d)P(d). Here, P(d) is the a-priori relevance of document d and could be defined based on usage statistics (i.e., the more often a specific document is used, the more important it seems to be). In addition, an estimate for P(t|d) could be extracted by storing the search terms that have been used to retrieve d in the first place.
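A brief sketch of this ranking rule; the probability values below are invented for illustration, standing in for the indexer's assignments and for usage statistics:

```python
# Sketch of ranking by P(t|d) * P(d), proportional to P(d|t) by Bayes' rule.
# All probability values are illustrative; in the model they would come from
# the indexer (P(t|d)) and from usage statistics (P(d)).

p_term_given_doc = {            # P(t|d), assigned by the indexer
    ("retrieval", "d1"): 0.8,
    ("retrieval", "d2"): 0.3,
}
p_doc = {"d1": 0.1, "d2": 0.6}  # a-priori relevance P(d), e.g. from usage counts

def rank_for_term(term, doc_ids):
    """Order documents by P(t|d) * P(d), descending."""
    scored = [(p_term_given_doc.get((term, d), 0.0) * p_doc[d], d)
              for d in doc_ids]
    return [d for _, d in sorted(scored, reverse=True)]

ranking = rank_for_term("retrieval", ["d1", "d2"])
# d2 is ranked first despite its lower P(t|d), because its prior P(d) is higher.
```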
probabilistic retrieval models The probability ranking principle, as defined in (van Rijsbergen, 1979), states that “If a reference retrieval system’s response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data.” The degree of relevance of a document d for a query q can be modeled using a random variable R_{d,q}, which is either 1 (document d is relevant for query q) or 0 (otherwise). Using a probabilistic approach, ordering of results is done with P(R_{d,q} = 1|d, q) (abbreviated as P(R = 1|d, q) in the following). This formula can be estimated, e.g. by using the Binary Independence Model (BIM) (Manning et al., 2008): Documents and queries are represented using binary term incidence vectors, i.e. document d is represented by vector ~x = (x_1, ..., x_M) where x_t = 1 if term t is part of document d and x_t = 0 if it is not. It is possible that multiple documents share the same vector ~x (the transformation is not injective). Dependencies among terms are not considered. To model P(R|d, q) using the BIM, the incidence vectors are used to represent the query and the document: P(R|~x, ~q). Using Bayes’ rule, the following equations hold:
P(R = 1|~x, ~q) = P(~x|R = 1, ~q) P(R = 1|~q) / P(~x|~q)   (2)

P(R = 0|~x, ~q) = P(~x|R = 0, ~q) P(R = 0|~q) / P(~x|~q)   (3)
P(~x|R = 1, ~q) reflects the probability that a relevant document for query q has the representation ~x (and vice versa for P(~x|R = 0, ~q)). P(R = 1|~q) and P(R = 0|~q) reflect the prior probabilities for retrieving a relevant / not relevant document for a query q. Because a document is either relevant or not relevant for a query, the following formula holds as well: P(R = 1|~q) + P(R = 0|~q) = 1. For a detailed description of how the probabilities can be estimated to allow ranking, please refer to (Manning et al., 2008, p. 224).
bayesian network models Information Retrieval using Bayesian Networks has been proposed by Turtle and Croft in (Turtle and Croft, 1990; Turtle and Croft, 1991). The main idea is to use directed graphs to model dependencies between variables. Turtle and Croft used such networks to represent information needs and documents. One part of the model is the pre-computed network of the document collection, representing the mapping between documents, terms, and concepts (derived from a thesaurus). The network representing the query needs to be computed each time a query is received and is attached to the document network. It maps from the query terms, to subexpressions of the query, to the user’s information need (Manning et al., 2008).
language models Language models are based on the idea that a document d is relevant for a query q if a probabilistic language model M_d built for the document d is likely to generate query q. Documents are therefore ranked in decreasing order of the probability that their individual model generated the query, P(q|M_d). A comprehensive introduction to language models can be found in (Manning et al., 2008, p. 237).
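A minimal query-likelihood sketch under the unigram assumption; the Laplace (+1) smoothing used here is one common but illustrative choice to avoid zero probabilities, and the corpus is invented:

```python
# Query-likelihood sketch: each document gets a unigram model M_d, and
# documents are ranked by P(q|M_d). Laplace (+1) smoothing is an illustrative
# choice to avoid zero probabilities for unseen terms.

def unigram_model(text, vocabulary):
    """Smoothed P(t|M_d) for every vocabulary term."""
    tokens = text.split()
    total = len(tokens) + len(vocabulary)     # +1 pseudo-count per vocab term
    return {t: (tokens.count(t) + 1) / total for t in vocabulary}

def query_likelihood(query, model):
    """P(q|M_d) under the term-independence assumption (product of P(t|M_d))."""
    p = 1.0
    for term in query.split():
        p *= model.get(term, 0.0)
    return p

docs = {"d1": "social information retrieval", "d2": "cooking with friends"}
vocabulary = set(" ".join(docs.values()).split())
models = {d: unigram_model(text, vocabulary) for d, text in docs.items()}
ranking = sorted(docs, reverse=True,
                 key=lambda d: query_likelihood("information retrieval", models[d]))
```

In practice, the product of probabilities is usually computed in log space to avoid numerical underflow on longer queries.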
2.3.4 Evaluation of Information Retrieval Systems
A popular way to evaluate the performance of an information retrieval system on an unranked retrieval set is to assess precision and recall (Manning et al., 2008, p. 154). The elementary prerequisite is to have a dataset and a set of queries for the dataset. For each query, the (objectively) relevant documents in the dataset must be known in advance. The performance of the information retrieval system can then be measured by the ratio of (objectively) relevant documents among the documents that are considered relevant by the information retrieval system (precision) and the ratio of documents correctly considered relevant by the information retrieval system among the (objectively) relevant documents in the collection (recall).
2.3.4.1 Precision

The subset of documents that are (objectively) relevant to a query and are also considered as relevant by the information retrieval system is referred to as True Positives. The subset of documents that are identified as relevant by the information retrieval system is referred to as Presented Elements. Precision is defined as

precision = true positives / presented elements   (4)

and can be interpreted as a measure of the information retrieval system’s ability to identify the right documents as relevant.
2.3.4.2 Recall

The recall value quantifies which portion of the (objectively) relevant documents is detected by the information retrieval system. It is defined as

recall = true positives / objectively relevant documents   (5)

and can be combined with the precision value explained above to calculate the widely used F1 score, defined as

F1 = 2 · (precision · recall) / (precision + recall).   (6)

To apply different weights for precision and recall, the formula can be generalized. A comprehensive comparison of various metrics can be found in (Powers, 2011).
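Equations 4–6 can be computed directly from the retrieved set and the set of objectively relevant documents; the document identifiers in this small sketch are illustrative:

```python
# Sketch of Equations 4-6: "retrieved" is what the system returns,
# "relevant" is the objective ground truth for the query.

def precision_recall_f1(retrieved, relevant):
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(retrieved={"d1", "d2", "d3", "d4"},
                               relevant={"d1", "d2", "d5"})
# Two true positives: precision = 2/4, recall = 2/3, F1 = 4/7.
```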
2.3.4.3 Confusion Matrix

Precision and recall can be inferred from the more general confusion matrix. Given a labeled dataset and assuming a function that predicts the relevance of a document to a query, it is possible to fill Table 3 with the respective counts for each category. The columns indicate the actual state, while the rows indicate the result of the predicting function. In the general case (i.e., with more than two classes), the error values (False Positive/False Negative) can be listed as general error counts (e.g., E_{i,j} referring to the number of times where the predicted outcome belongs to row i, while the actual value belongs to column j; with i ≠ j). As defined above, precision can be seen as the share of true positives among all items that were predicted to be relevant, while recall denotes the share of true positives among all actually relevant items (Murphy, 2012, p. 181).
                             Actual
                      True              False
Predicted   True      True Positive     False Positive
            False     False Negative    True Negative
Table 3: Confusion Matrix
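Filling Table 3 from paired predicted and actual relevance labels can be sketched as follows; the two label lists are illustrative:

```python
# Sketch of filling the binary confusion matrix of Table 3 from paired
# predicted/actual relevance labels (the label lists are illustrative).

def confusion_matrix(predicted, actual):
    """Return (TP, FP, FN, TN) counts for binary relevance labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    return tp, fp, fn, tn

predicted = [True, True, False, False, True]
actual    = [True, False, False, True, True]
tp, fp, fn, tn = confusion_matrix(predicted, actual)
# From these counts: precision = tp / (tp + fp), recall = tp / (tp + fn).
```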
2.3.5 Relevance and Serendipity
Precision and recall, presented in the last section, are easy-to-use tools to measure the performance of an information retrieval system when using a binary scale. However, relevance can be seen as a concept with multiple layers: Following Mizzaro’s logic (Mizzaro, 1998) (cf. Section 2.2.1) and Groh et al.’s extension (Groh et al., 2013), where information needs can be conscious and unconscious, there is a subjective level and an objective level of relevance. Given the same real information need and two different derivations (i.e., perceived information needs), a document might be perceived as relevant by one person, but not by the other. A document might be useful for a specific person to solve a problem from the real information need given her level of knowledge, but not for another. In addition, can a document be considered relevant if it does not explicitly answer the query, but triggers a chain of thoughts in the reader’s brain which leads to an answer for the information need? A comprehensive discussion of relevance can be found in (Saracevic, 2007a; Saracevic, 2007b). The author reviews relevance from different angles and various levels of abstraction (contextual, situational, affective, cognitive, query, interface, engineering, processing, content), concluding that relevance is a timeless concept with several manifestations. With Serendipity, Dörk et al. describe the concept of “lucky accidents” which can – following the logic of (Mizzaro, 1998) and (Groh et al., 2013) – answer unconscious information needs (Dörk et al., 2011; Dörk et al., 2012). To formalize serendipitous information seeking behavior, Workman et al. identified the following basic properties (Workman et al., 2014):
• Serendipitous knowledge discovery (SKD) is an iterative process,
• SKD often involves change or clarification of initial information interests, which may involve integrating new topics,
• SKD is grounded in the user’s prior knowledge, and
• information organization and presentation have fundamental roles.
Thudt et al. (Thudt et al., 2012) investigated how serendipitous book discoveries could be supported using information visualization and concluded that the reception of serendipity can be influenced by the following factors:
• Certain personality traits (e.g., curiosity)
• Observational skills
• Open-mindedness (serendipitous discoveries require “receptiveness to unexpected information”; it manifests “itself in curiosity, questioning previous assumptions, or (...) looking at information from various perspectives”)
• Knowledge: Serendipity requires that a person is able to draw connections between seemingly unconnected information. Without prior knowledge, serendipity is not possible.
• Perseverance: The more time and effort one invests in a topic, the more knowledge is obtained, which in turn can foster the identification of serendipity.
• Environmental factors: Apart from the personal factors already mentioned, environmental factors might also impact the ability to discover and recognize serendipity.
• Influence of people and systems: In many cases, information is already organized by others before it is consumed. This classification can lead to serendipitous discoveries due to the fact that relations are made explicit.
Thudt et al.’s perspective on perseverance can be seen critically in comparison with other definitions of serendipity: they argue that obtaining more knowledge about a topic increases the chances for serendipity to happen. Following the idea of serendipity as satisfying an unconscious information need (Groh et al., 2013; Mizzaro, 1998; Dörk et al., 2011), increasing the understanding of a problem domain with additional knowledge can be seen as a shift from unconscious to conscious information needs. Schedl et al. (Schedl et al., 2012) developed a model for serendipitous music retrieval and stated that serendipity requires similarity and dissimilarity at the same time. Bordino et al. (Bordino et al., 2013) used two datasets obtained from Wikipedia and Yahoo! Answers to investigate what makes a result serendipitous and referred to relevance and unexpectedness as components of serendipity. McCay-Peet and Toms (McCay-Peet and Toms, 2011) summarized several studies on serendipity. Groh et al. (Groh et al., 2013) defined serendipity as “an information that is surprising to the user and has a small chance that the user might have discovered it autonomously”.
2.3.6 Architectural Classes of Information Retrieval Systems
Traditional search engines for the web operate on publicly available documents, using a centralized global index. Distributed web search engines like (Christen, 2015) also operate on publicly available documents, but rely on a distributed global index. Conceptually similar to traditional search engines is the concept of federated search (Callan, 2000; Govaerts et al., 2011; Lu, 2007; Arguello, 2011), where multiple collections are used to satisfy an information need. Such systems are often used as meta search engines (which forward the query to multiple other search engines and aggregate the results afterwards) or as search engines that aggregate information from various sources (e.g., normal web search, image search, search in social media, etc.). A major challenge in federated search is the selection of the appropriate collection to forward the query to (Shokouhi, 2007). Baillie et al. (Baillie et al., 2011) used topic
models to support the collection selection algorithm. Comprehensive descriptions of other existing approaches can be found in (Si and Callan, 2003; Si et al., 2002); a comprehensive description of all steps required in federated search is given in (Shokouhi and Si, 2011). Federated search approaches are based on local indexes (of each collection) and are characterized by their architecture: it foresees a clear split of roles (information seeker, mediator, collections/repositories), whereas in a peer-to-peer approach those roles are either non-existent (mediator) or not defined in such a clear way (information seeker/provider) (Tigelaar et al., 2012). Approaches like the ones presented in (Kontominas et al., 2013; Raftopoulou et al., 2013) use a distributed set of documents stored in individual information spaces on each client. Clients maintain a semantic index and a friend index (both stored locally) to find resources in the network. Franchi et al.’s approach called Blogracy (Franchi et al., 2013) consists of a peer-to-peer architecture and a local index. Documents are linked to an ontology and individual (explicit) access policies are considered. Mari et al.’s RAIS (Mari et al., 2006) consists of a distributed technical platform with local indexes (but a central global routing directory, the Directory Facilitator) to search for and subscribe to information on other agents. Like (Kontominas et al., 2013), the concept shows parallels to the one presented in this thesis but also mainly focuses on technical challenges. With DIAMS (Chen et al., 2000), Chen et al. introduced an agent-based system with local indexes that allows finding and retrieving resources from other agents’ information spaces. Global directories stored on Matchmaker Agents are used to support the routing process of the query.
Information spaces are organized dynamically; a category in the information space “is both a storage for documents and an index for search and communication” (Chen et al., 2000). Categories can be nested and contain categories from other personal agents’ information spaces. To index information, TF-IDF is used (cf. Section 2.3.2). The approach also includes a special feature for the search process to mitigate the problem of semantic communication gaps: “When a user query is directed to an external agent, the user’s agent sends not only the query information, but also sends with the query its own query matches, i.e., its information contents related to that query” (Chen et al., 2000). An approach presented by Xu and Croft (Xu and Croft, 1999) uses clustering and language models to support distributed information retrieval. The authors compared various approaches to reorganize documents on collections (distribute documents to collections by topic; organize the documents by topic in each collection; and use relations between topic and collection as an indexing function only, without reorganizing the collection by topics). The authors focused on the information’s content and – in comparison to the present work – did not include any user-related theories in their concept. Tigelaar et al. assembled a comprehensive overview of other approaches for distributed information retrieval and remaining challenges (Tigelaar et al., 2012).
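The TF-IDF weighting used for indexing (cf. Section 2.3.2) can be sketched in a few lines. The function below is a generic illustration with a toy corpus, not the DIAMS implementation; it uses raw term frequency and the standard idf = log(N / df) variant.

```python
import math
from collections import Counter

def tf_idf_index(docs):
    """Build a TF-IDF weight vector per document.

    docs: list of token lists. Uses raw term frequency and
    idf = log(N / df); terms occurring in every document get weight 0.
    """
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    index = []
    for doc in docs:
        tf = Counter(doc)               # term frequency within the document
        index.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return index

docs = [["social", "search", "index"],
        ["social", "network"],
        ["distributed", "index", "search"]]
index = tf_idf_index(docs)
# "network" occurs in only one of the three documents, so it receives
# the highest idf and thus the highest weight in document 1
```

In a distributed setting like DIAMS, each agent would compute such weights over its local collection only, which keeps the document frequencies local as well.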
2.4 context
2.4.1 General
Groh (Groh, 2011, p. 28) portrays the evolution of the term “context” in the domain of context-aware applications and concludes that the definition given by Dey (Dey, 2001) “may still be considered as a working consensus”:
“Context is any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered rel- evant to the interaction between a user and an application, including the user and applications themselves.” (Dey, 2001)
The respective situation of an entity can be described in several ways. According to (Chen and Kotz, 2005), Schilit et al. (Schilit et al., 1994) identify three different basic categories of context elements with regard to mobile, ubiquitous, and wearable computing:
• Computing context, including network connectivity, communication attributes (costs, bandwidth, latency), or available resources (e.g. display)
• User context, including a user profile, location, or people nearby
• Physical context, including light, noise levels, traffic conditions, and temperature (Chen and Kotz, 2005)
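The three categories above could be captured in a simple context record; the field names below are illustrative assumptions, not taken from any of the cited systems.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Context:
    # Computing context: connectivity, communication attributes, resources
    network: str = "offline"
    bandwidth_kbps: float = 0.0
    display: str = "unknown"
    # User context: profile, location, people nearby
    user_profile: str = ""
    location: str = "unknown"
    people_nearby: List[str] = field(default_factory=list)
    # Physical context: environmental attributes
    light_lux: float = 0.0
    noise_db: float = 0.0
    temperature_c: float = 0.0

ctx = Context(location="office", people_nearby=["alice", "bob"])
```

A context-aware search system could attach such a record to each query and use its fields as ranking features.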
Chen and Kotz (Chen and Kotz, 2005) propose to add time (e.g., “time of a day, week, month, and season of the year”) as an additional dimension of context. Like most other activities, searching for information is influenced by the embedding context. The relation to time is mentioned e.g. in (Mizzaro, 1998), while the authors of (Han et al., 2013) use an ontology-based approach to classify different context types (time, space (= spatial context), content) to improve the performance of a search system. The authors of (Chen et al., 2005) propose an ontology to reflect context in pervasive applications, e.g., “information about a location, its environmental attributes (e.g., noise level, light intensity, temperature, and motion) and the people, devices, objects, and software agents that it contains. Context may also include system capabilities, services offered and sought, the activities and tasks in which people and computing entities are engaged, and their situational roles, beliefs, and intentions”. A similar approach is presented in (Wang et al., 2004). In the following, related work for the most tangible context types (social, spatial, temporal, personal) is briefly summarized.
2.4.2 Social Context
Groh (Groh, 2011, p. 33) distinguishes short-term, medium-term, and long-term social context for a user. Short-term social context reflects the current social situation a user is in, including the set of persons the user interacts with, a spatial reference where the situation takes place, and – as mentioned for social situations in (Groh et al., 2011b) –
a content description and a temporal reference. For higher-level representations, the emotional state could also be considered. Typically, short-term social contexts have a validity of seconds to a few hours. Medium-term social context “refers to time intervals from several hours to several days and encompasses derived characterizations of longer lasting social behaviors, rhythms, cohesions or events such as a visit to a friend in a foreign city, a vacation trip with others, or general interaction patterns with other people” (Groh, 2011, p. 33). Long-term social context refers to time-intervals that cover a much longer period, ranging from weeks to months or even years. Those relationships are typically explicitly stated as “friendships” on social networking platforms (Groh, 2011). Social Correlation Theory is one of the most important social theories and suggests that people’s behavior depends on their individual social groups and the interactions with others (Tang et al., 2014). The theory consists of three main components:
• Homophily refers to the observation that the probability of contact and interaction is higher for people who are similar (Watts et al., 2002). This explains why people with overlapping interests or other shared attributes connect to each other easily. By implication, it also provides the logical foundation for why socially close people could hold relevant information in their information spaces. A prominent saying that depicts the concept of homophily is “birds of a feather flock together” (Tang et al., 2014).
• Influence describes the idea that users within a social environment tend to follow their socially close neighbors, who usually show similar social behaviors.
• Confounding suggests that social interactions are also influenced by external factors defined by the environment that embeds the interactions.
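A minimal check of the homophily component (are connected users more similar than unconnected ones?) could look as follows; the interest profiles, names, and edges are toy assumptions for illustration only.

```python
from itertools import combinations

def mean_overlap(pairs, interests):
    """Average Jaccard similarity of interest sets over the given pairs."""
    sims = []
    for a, b in pairs:
        ia, ib = interests[a], interests[b]
        sims.append(len(ia & ib) / len(ia | ib))
    return sum(sims) / len(sims)

interests = {"a": {"music", "ir"}, "b": {"music", "ir"},
             "c": {"soccer"}, "d": {"soccer", "music"}}
edges = [("a", "b"), ("c", "d")]            # who is connected to whom
all_pairs = list(combinations(interests, 2))
non_edges = [p for p in all_pairs if p not in edges]

# Under homophily, connected pairs should be more similar on average
print(mean_overlap(edges, interests) > mean_overlap(non_edges, interests))
# prints True for this toy data
```

Studies like (Tang et al., 2014) apply essentially this comparison, with richer similarity measures, to large Twitter and Facebook datasets.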
The authors of (Tang et al., 2014) confirm that the homophily principle is valid for online social networks using datasets obtained from Twitter and Facebook. Despite building a set of overlapping clusters of similar people, social networks also allow information transmission: in the 1960s, Stanley Milgram tried to answer the question how likely it is that two random individuals know each other. Later, he refined the question to how many intermediate individuals are needed to connect two randomly chosen persons. This question led to the Small World Experiment (Milgram, 1967), where random participants were chosen from Boston, Massachusetts and Omaha, Nebraska. The participants were asked to direct a letter to a target person in Boston by forwarding it to a friend who might be closer to the target person. The friends should do the same until the letter arrives at the target person in Boston (Tang et al., 2014; Watts et al., 2002). The result showed that the average path length of the successfully delivered letters was about six. This led to the Six Degrees of Separation idea, stating that on average, each person is separated from every other person in the world by about six acquaintances. The experiment did not only show that it is possible to route information efficiently in social networks, but also highlighted that it is possible to do so with local knowledge only. Since then, several publications have followed, aiming to understand and improve the routing mechanisms and providing a strong basis for social referral as a concept (Kleinberg, 2006a; Kleinberg, 2000; Kleinberg, 2006b). Although the small world experiment has
fostered a remarkable body of new research topics for sociological and behavioral studies, it is worth mentioning that only 20% of the referral chains were successful. An additional example related to information retrieval is Granovetter’s concept of weak ties (Granovetter, 1973), stating that more distant social relations can help to access information that is more valuable than information received from strong ties – one of the reasons could be that socially close contacts often have access to the same sources of information as the information seeker herself. With the growing importance of social networks, the interest in empirical examples of social networks increased. In addition to explicit representations derived from online social network platforms like Facebook, Twitter, or VK.com, several studies used sensors to implicitly infer social relations. Examples include measuring physical proximity using Bluetooth (Eagle and Pentland, 2006; Madan et al., 2011), radio and infrared sensors (Groh et al., 2010; Olguin et al., 2009; Madan et al., 2011), audio recording (Groh et al., 2011c), or GPS and cell tower logs (Aharony et al., 2011). In addition, social interaction can be assessed directly via call logs (Pan et al., 2011; Madan et al., 2011; Aharony et al., 2011) or contact lists (Aharony et al., 2011). Alan P. Fiske suggested in (Fiske, 1992) that human social life can be explained by combining four psychological forms, namely communal sharing (CS), authority ranking (AR), equality matching (EM), and market pricing (MP). Following this approach, information sharing could be considered a social act, allowing to express the underlying motivation as a combination of Fiske’s forms. In CS, people treat members of their specific group as equivalents. People within the group behave altruistically towards each other and are sometimes linked by kinship. In AR, people are ordered linearly according to some social hierarchical dimension.
People with higher ranks typically have privileges, prestige, and prerogatives, which people with lower ranks do not have. EM describes a relation between two people who try to keep the balance of their relationship even. This is the standard behavior among people who meet regularly and follow a tit-for-tat strategy or some other reciprocal granting of favors. In contrast, in MP relationships all relevant features are reduced to a lower-dimensional value or utility metric (e.g., price) that is used to compare different factors. This is the default relationship for people who only meet once and do not plan any further encounters.
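The average path length reported for the Small World Experiment earlier in this section can be measured on an explicit social graph with a breadth-first search; the sketch below uses a hypothetical toy network, not any of the cited datasets.

```python
from collections import deque

def avg_path_length(adj):
    """Mean shortest-path length over all connected, ordered node pairs,
    computed with one breadth-first search per source node."""
    total, count = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for node, d in dist.items():
            if node != src:
                total += d
                count += 1
    return total / count

# A small ring network 0-1-2-3-4-5-0 with one "shortcut" edge 0-3
adj = {0: [1, 5, 3], 1: [0, 2], 2: [1, 3],
       3: [2, 4, 0], 4: [3, 5], 5: [4, 0]}
print(avg_path_length(adj))  # about 1.67 for this toy graph
```

Shortcut edges of this kind are what keeps average path lengths small in small-world graphs, even when most edges are local.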
2.4.3 Mobile Context
In addition to social context, the user’s mobile context also influences the information seeker’s behavior. Mobile context has been defined in various ways: (Church and Oliver, 2011) refer to it as “being on-the-move”, while (Sohn et al., 2008) defined it as being away from home or work (i.e., being away from places where the users spend most of their time). Other studies like (Kamvar and Baluja, 2006; Kamvar et al., 2009) only rely on the fact that a mobile interface was used. For the present thesis, we stick to the generalized definition given by Sohn et al.: a user is in a mobile context if she is away from places where she spends most of her time. In comparison with non-mobile settings, mobile contexts often offer limited degrees of freedom to accomplish tasks (e.g., due to resource restrictions like screen size (Kamvar et al., 2009)).
Church and Oliver analyzed the mobile information needs of 18 participants in a 4-week online diary study (Church and Oliver, 2011). Their results suggest that contexts like location, time, activity, and social interactions have a stronger influence on the information need in mobile settings than in stationary settings. In addition, they found that “mobile search is used in more random situations and in particular is dictated by user interactions and conversations, thus highlighting a trend towards social mobile search”. Church and Oliver use the term Social Search in a different way than it is understood in this thesis (for details on different definitions please refer to Section 2.5). Kamvar and Baluja analyzed search patterns on Google’s mobile search interface (Kamvar and Baluja, 2006) from 2005. The authors concluded that the diversity of queries is lower than on the desktop search engine, despite the fact that other attributes like words or characters per query stay the same. In addition, they noticed that searching on mobile devices takes much more time than searching on normal computers. These results need to be interpreted in the context of their origin – it is quite likely that mobile devices have improved significantly since 2005. In 2009, Kamvar et al. published another study (Kamvar et al., 2009), analyzing search logs from 2008. By then, the first Apple iPhone was available and was analyzed separately by the authors. Queries issued from normal computers or iPhones showed a higher diversity than queries from other mobile devices. Especially when comparing the results with the previous study, the hypothesis that the diversity gap between normal computers and high-end smartphones like the iPhone is shrinking can be confirmed.
The authors were surprised that they could not show an increase of location-dependent queries on the iPhone in comparison to the queries issued from a stationary computer and supposed that additional applications on the iPhone running outside of the browser (e.g., Google Maps) could have caused this anomaly. Sohn et al. (Sohn et al., 2008) conducted a diary study of mobile information needs and concluded that 75% of the mobile information needs were prompted by contextual factors like time (when the need occurred), current activity, current location, and conversations taking place with others.
2.4.4 Temporal Context
Kramár and Bilevic investigated the temporal context of information needs using diverse hierarchical clustering (Kramár and Bilevic, 2014). The authors distinguished short-term interests (lasting a couple of hours to several months) and long-term interests (which could last for many years). The authors mentioned two main issues caused by the dynamics of interests: bursts occur when users gain a new interest and consume as much information as possible to build a solid knowledge foundation for that interest, while drifting describes the situation when users change their interest several times during a short period of time (Kramár and Bilevic, 2014). Their approach proposes to extract keywords from visited websites and to include additional metadata (like duration of the visit and navigation route) in the model.
2.4.5 Personalized Search
Personalization of web search describes an approach to improve the performance of an information retrieval system by adjusting the parameters of the search process individually for each information seeker. To this end, an individual user model is generated to learn each user’s preferences. This model can be leveraged to create a personalized ranking of the results, to allow a personalized expansion of queries, or even to conduct personalized indexing of documents. In (Micarelli et al., 2007), a comprehensive overview of different approaches is given. The authors listed current context (open documents, emails, web pages), search history (past queries, browsed websites, selected results), rich user models (user feedback on results, past queries), collaborative approaches (past queries, selected results, user ratings), result clustering (selected clusters in taxonomies), and hypertext data (selected websites, queries) as potential sources to build individual user models. In (Steichen et al., 2012), the authors compared personalized information retrieval models with adaptive hypermedia techniques. While personalized retrieval models employ a user profile as a simplified “persona” of the user, adaptive hypermedia tries to follow a multi-faceted approach consisting of many dimensions, including user goals or prior knowledge. Ghorab et al. (Ghorab et al., 2013) also provided an extensive review of different approaches to personalize information retrieval systems, including personalization for groups (like in (Teevan et al., 2009)), multilingual systems, and social information. Recurring themes cover re-ranking the results of traditional search engines, filtering results from traditional search engines, or using a scoring function within the normal IR mechanism. While the first two classes work on top of a normal IR system, the third class incorporates the personalization component in its core ranking function.
Examples for re-ranking results include (Vallet et al., 2010) and (Noll and Meinel, 2007), where tags from the social bookmarking service Delicious1 are used to annotate and re-rank URLs in the result list based on the similarity of the individual user profile (built from the tags the user has used) and the respective URL. (Micarelli and Sciarrone, 2004) explains an approach where results are filtered (i.e., removed from the result list) in accordance with the user profile (which is based on semantic networks and stereotypes to model the users’ information needs). An example which includes the personalization mechanism in the IR core functionality is the approach described in (Agichtein et al., 2006), which builds relevance models considering implicit feedback from search logs. (Haveliwala, 2002) generalizes the idea of PageRank (Brin and Page, 1999) and calculates individual PageRank scores for different topics (based on a given topic hierarchy). Qiu and Cho (Qiu and Cho, 2006) extend this approach and include individual user preferences in the PageRank calculation. Hannak et al. (Hannak et al., 2013) developed a methodology to measure the degree of personalization in web searches by running a controlled experiment (sending predefined queries to Google from various user accounts that have small but fixed differences in their attributes and behaviors). The authors concluded that 11.7% of the search results show differences due to personalization. Shapira and Zabar proposed an approach (Shapira and Zabar, 2011) that merges recommendation and personalized information retrieval. The authors argued that one essential reason why
1 https://delicious.com/ (retrieved 2016-02-25)
search engines do not provide only relevant results is the fact that users provide insufficient queries (2-3 words only). Therefore, search engines try to infer additional information about the user and her information need. In addition to implicit cues like search history or contextual indicators such as location, methods from the recommender systems domain that measure the similarity between users and/or queries have proven to be useful to personalize search engines. Their approach uses social tie strength as a predictor for the relevance of documents and outperforms the baseline defined by Apache Lucene2. Other approaches rely on the user’s social network for personalization and could therefore already be seen as a personalized variant of social search, presented in the next section. In (Lu and Li, 2011), the authors used judgments from friends of the information seeker to predict her preference for returned photos. Gou et al. used the social network of the information seeker as additional input to a ranking algorithm based on TF-IDF (Gou et al., 2010) and demonstrated with a dataset extracted from YouTube that this improves the performance. The authors assumed that each searcher has one primary interest when searching (Gou et al., 2010, p. 317).
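Tag-based re-ranking in the spirit of (Noll and Meinel, 2007) can be sketched as follows; the overlap scoring, URLs, and tag sets below are simplifications chosen for illustration, not the authors' exact method.

```python
def rerank(results, user_tags):
    """Re-rank a result list by the overlap between a user's tag profile
    and the tags attached to each URL. Ties keep the original (search
    engine) order because Python's sort is stable."""
    def score(item):
        url, tags = item
        return len(user_tags & tags)
    return sorted(results, key=score, reverse=True)

user_tags = {"python", "ir", "search"}          # the user's tag profile
results = [("example.org/cooking", {"food"}),
           ("example.org/lucene", {"search", "ir"}),
           ("example.org/numpy", {"python"})]
print([url for url, _ in rerank(results, user_tags)])
# the Lucene page matches two profile tags and moves to the top
```

Real systems replace the raw overlap with a normalized similarity (e.g., cosine over tag frequency vectors), but the pipeline shape is the same: the original result list stays intact and only its order changes.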
2.5 social search
In (McDonnell and Shiri, 2011) the authors provided a classification of social search in the web domain. Following their logic, “social search” can be categorized along the following dimensions:
• Collaboration: synchronous vs. asynchronous,
• Collaboration: implicit vs. explicit,
• Search target: finding people vs. finding resources,
• Search results: sense-making vs. content selection,
• Finding: search vs. discovery
McDonnell and Shiri (McDonnell and Shiri, 2011) also categorized a comprehensive set of published social search approaches according to the dimensions above. In the following, the dimensions will be explained briefly.
collaboration Collaboration among users can take place in several ways. An important factor for differentiation is whether the users need to act synchronously or not. Examples for synchronous collaboration are “joint search” and “coordinated search” as described by Morris in (Morris, 2007): in the “joint search” approach, a small group of two to four users uses a single computer to discuss and perform the web search jointly, while in “coordinated search” every user works on her own computer, but the team is sitting at adjacent tables so that communication is still possible (comparing results, competing, looking at the screens of others, discussing). Motivated by the identified need for appropriate support of this kind of user behavior, Morris and Horvitz developed a search client, SearchTogether, to foster collaboration among
2 https://lucene.apache.org/core/ (retrieved 2015-11-20)
users while performing web searches (Morris and Horvitz, 2007). The tool allows users to form groups and enables them to see the queries used by others and the websites that have been identified as relevant. In addition, members of a group can share their search history and publish and discuss their search ideas and strategies. Social search approaches with asynchronous collaboration do not require the users to interact in real time. Examples include “using data from social media systems to improve web search” like social tagging or bookmarking. Prominent examples are the social bookmarking service Delicious3 or Google’s +1 feature4. Apart from the interaction type, collaboration can also be categorized by whether it takes place implicitly or explicitly. Implicit collaboration happens when the involved users do not know that they are helping each other. This can occur when the activities (e.g., bookmarking a website) of one user are used to support other users (e.g., by adjusting the web search ranking). Examples for this category include TC-SocialRank (Gulli et al., 2009), which considers the importance of users in their social community when evaluating the importance of the shared resources, or (Schmidt et al., 2009), where websites are enhanced with the tags for these websites taken from a social bookmarking service. An example for explicit collaboration is HeyStaks (Smyth et al., 2012). Users can create (and share) staks for search topics. The staks get filled while the user is searching. Search histories are saved anonymously in staks. When a stak member conducts a new search, the information in the stak is used to enrich the web search results, based on recent web searches performed by other stak members. By design, approaches involving synchronous collaboration are in most cases also examples of explicit collaboration.
search target Some scholars characterize “social search” as finding people, e.g. Evans et al. define it as “the way individuals make use of peers and other available social resources during search tasks” (Evans et al., 2009, p. 3378). This approach shows parallels to the problems discussed in the expert search literature (cf. Section 2.5.1). According to McDonnell and Shiri, “most perspectives on social search focus on the user’s desire to find information sources as the primary goal: the products of social collaboration are merely a means to obtaining the desired information” (McDonnell and Shiri, 2011, p. 12).
search results In (McDonnell and Shiri, 2011), the authors differentiate between sense-making and content selection. While a search system tries to provide the most relevant links, the users have to decide whether a link is worth following (and reading) or whether to continue screening other websites. To support the users’ decision, search systems typically provide information about the search results (similar to the concept of information scent proposed by Chi et al. (Chi et al., 2001)). This dimension differentiates whether the search system “relies on social media to either select content (i.e. provide relevancy ranking) or to help the user make sense of the search results” (McDonnell and Shiri, 2011, p. 14). Using social media to assess the relevancy of a result item could be achieved using social media
3 https://delicious.com (retrieved 2015-09-16) 4 http://www.google.com/intl/en/+/learnmore/+1/ (retrieved 2015-09-16)
elements for personalization (Section 2.4.5), while sense-making could be supported by including social media content to improve the representation of the result items.
finding Typical approaches to social search address the classical information retrieval model of search: users have a mental representation of their information need, transform it into a query, and send the query to the search system, which returns a list of pointers (URLs) and summaries for each item. The ordering of result items can be performed in various ways, e.g. based on social media content or previous search sessions. Some approaches to social search do not follow the classical “search” concept but focus on the discovery of new information items. An example of the usefulness of this approach is shown in an analysis conducted by Heymann et al. (Heymann et al., 2008), which shows that 25% of the URLs posted to the social bookmarking service “delicious”5 were new sites, not listed in a search engine. Leveraging other users’ behavior to improve online navigation is referred to as “social navigation”. Other studies also confirm that social navigation can help to satisfy the users’ information needs (Vuorikari and Koper, 2009; Millen et al., 2007). In (Tang et al., 2012), the authors’ results suggest that friends are a better source for book recommendations than people with similar reading preferences or recommendations solely based on authorship. Based on a dataset created in a study in the educational domain, Hsiao et al. reported that social navigation helps weaker students to identify relevant information (Hsiao et al., 2013).
2.5.1 Search for Experts
Expertise and knowledge in an organization can be a huge competitive advantage (Neef et al., 1998). Therefore, finding experts is a heavily discussed topic in the knowledge management literature and has led to “expert finders” (Yimam, 1996) or “expertise-locator systems” (Becerra-Fernandez, 2006), a special class of search engines dedicated to finding experts within a group or organization for a given topic. These systems have in common that a user wants to satisfy a certain “expertise need”. The response of such a system is a set of experts who have a certain expertise with regard to the topic the user was searching for. Many expertise-location systems have been proposed; examples include “Who knows” (Streeter and Lochbaum, 1988), which identifies expertise using latent semantic indexing of project reports, or Balog and de Rijke’s approach, which relies on the company’s intranet to generate expertise profiles (Balog and de Rijke, 2007). Other approaches use authored documents (Serdyukov and Hiemstra, 2008) or browsing histories (Li and Chang, 2007) to build expertise profiles for users. Zhang et al. (Zhang et al., 2011) relied on the user’s search behavior to identify the user’s domain knowledge: in an experiment, they recognized that the number of documents saved, the average length of a query, and the average ranking position of documents that were opened are the three variables that predict domain knowledge best. The relation between background knowledge and search has also been discussed in (Duggan and Payne, 2008; Eickhoff et al., 2014). Foner proposed Yenta (Foner, 1997), a system that identifies experts by searching their email archives in a distributed way. Zhang and Ackerman (Zhang and Ackerman, 2005)
5 http://www.delicious.com (retrieved 2016-01-17)
evaluated different strategies using the Enron email corpus (Klimt and Yang, 2004). One of their findings is the confirmation that weak ties play an important role in expertise identification. A prominent example of systems that rely on social referral is Kautz et al.’s ReferralWeb (Kautz et al., 1997a). Borgatti and Cross (Borgatti and Cross, 2003) proposed a conceptual framework to model the probability that a person will be considered as an information provider in a social information retrieval situation. Their model consists of four components: “(1) knowing what that person knows; (2) valuing what that person knows; (3) being able to gain timely access to that person’s thinking; and (4) perceiving that seeking information from that person would not be too costly” (Borgatti and Cross, 2003). Other examples integrate social aspects into their approach; e.g., Smirnova and Balog (Smirnova and Balog, 2011) did not try to find the best available expert from a content perspective. Their solution considers user-oriented factors like the time to contact the expert, the expected value of the knowledge gained after contacting the expert, physical distance and distance in the social graph, hierarchy, and previous collaboration. Their evaluation using a university expert search system, expert judgments, and a purely content-based algorithm as a baseline revealed that physical distance and previous collaboration (co-authorship) perform better than other approaches. With SocLaKE, Kukla et al. leverage the underlying social network between the expertise seeker and the expert: it is more likely that an expert reacts upon a request when the request comes from a person who is socially close to the expert. Therefore, Kukla et al.’s system does not recommend the actual expert but socially close friends/colleagues of the expert (Kukla et al., 2012). Joung et al.
conducted a comprehensive comparison of different strategies to find suitable experts in a social network using a large dataset extracted from an online community (Joung et al., 2013). A strategy could involve multiple intermediate steps between the expertise seeker and the expert. The authors defined a strategy as the process of identifying the right friend to forward the request to. The three main strategy clusters that have been evaluated are profile-based (PB) strategies, structural-based (SB) strategies, and hybrid approaches. Profile-based strategies work solely based on the profile of potential candidates, considering information like occupation, interest, location, gender, or marital status. SB strategies include best connection (SB-BC), Hamming distance (SB-HD), or tie strength (SB-WT/SB-ST). Best connection selects the friend with the largest number of friends as the recipient, Hamming distance selects the one with the largest number of uncommon friends, and strategies based on tie strength select either the friend with the weakest (SB-WT) or the strongest tie (SB-ST). Tie strength is calculated based on the number of friends the expertise seeker and the potential candidate have in common. Hybrid strategies include a combination of SB and PB, executed sequentially. The authors decided not to consider knowledge (as it might change often and fast over time). Also, profile and query similarity are only measured using term similarity (i.e., no modeling of higher-level concepts has been used). In addition, the authors reported that profiles have not been maintained very well. On their test dataset, the hybrid strategy SB-PB outperforms the other strategies.
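The structural strategies can be illustrated with a small sketch. The friendship graph and all names below are illustrative assumptions; tie strength is computed as the number of common friends, as described in the text:

```python
# Toy sketch of the structural strategies SB-ST/SB-WT (hypothetical data).

def tie_strength(friends, a, b):
    """Number of friends a and b have in common."""
    return len(friends[a] & friends[b])

def pick(friends, seeker, strongest=True):
    """SB-ST picks the friend with the strongest tie, SB-WT the weakest."""
    candidates = friends[seeker]
    key = lambda c: tie_strength(friends, seeker, c)
    return max(candidates, key=key) if strongest else min(candidates, key=key)

friends = {
    "s": {"a", "b"},
    "a": {"s", "b", "x"},   # shares friend b with s -> tie strength 1
    "b": {"s"},             # shares no friends with s -> tie strength 0
    "x": {"a"},
}
print(pick(friends, "s", strongest=True))   # "a"
print(pick(friends, "s", strongest=False))  # "b"
```

SB-BC would analogously pick `max(candidates, key=lambda c: len(friends[c]))`.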
Lappas et al. summarized different approaches for expert finding in social networks (Lappas et al., 2011). They distinguished between approaches without graph elements and approaches relying on graphs to allow propagation of expertise. While solutions in the first group are normally based on information retrieval models (which model the probability P(x|Q) that a candidate x is an expert on topic Q using a generative model, e.g. P(Q|x) · P(x)), the latter approaches are based on a social network, where individual expertise scores are boosted by social closeness. Examples for algorithms based on the social network structure are PageRank (Brin and Page, 1999) and HITS (Kleinberg, 1999). In addition, the authors describe approaches to form expert teams based on various social metrics. Balog et al. also provided a broad overview of different challenges and approaches to expertise retrieval (Balog et al., 2012). Regarding the importance of tie strength in social networks for information retrieval (Granovetter, 1973), various studies came to different conclusions: while e.g. Levin and Cross (Levin and Cross, 2004), Constant et al. (Constant et al., 1996), Granovetter (Granovetter, 1983; Granovetter, 1973), Zhang and Ackerman (Zhang and Ackerman, 2005), and Rogers (Rogers, 1983) identified weak ties as more relevant for information search in a social network, other studies suggested that strong ties are more important for information exchange (Panovich et al., 2012; Hansen, 1999; Uzzi, 1997; Szulanski, 1996).
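The generative ranking P(Q|x) · P(x) can be sketched as follows. The unigram likelihood with add-one smoothing, the uniform prior, and the toy documents are illustrative assumptions, not taken from the cited works:

```python
# Toy candidate model: rank candidate x by P(Q|x) * P(x), where P(Q|x) is a
# smoothed unigram language model over x's authored documents (assumption).

def score(query_terms, candidate_docs, prior=1.0):
    """Approximate P(Q|x) * P(x) for one candidate."""
    words = " ".join(candidate_docs).lower().split()
    n = len(words)
    p_q = 1.0
    for t in query_terms:
        # add-one smoothing so unseen terms do not zero out the product
        p_q *= (words.count(t) + 1) / (n + 1)
    return prior * p_q

docs_a = ["kernel scheduling and kernel memory"]
docs_b = ["gardening tips and recipes"]
print(score(["kernel"], docs_a) > score(["kernel"], docs_b))  # True
```

A graph-based variant would then boost these content scores by social closeness, e.g. via PageRank over the co-authorship network.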
2.5.2 Search in Social Media
Searching social media content is different from searching general document collections (especially when considering the context of the present thesis), as it already explicitly covers social elements like relationships, trust, or reputation. To define social media, Kietzmann et al. proposed a framework with seven building blocks, namely identity, conversations, sharing, presence, relationships, reputation, and groups (Kietzmann et al., 2011). According to the authors, not all parts of the framework have to be present in a given social media scenario; they merely offer a way to discuss different archetypes on a more abstract level. The first building block, identity, describes how users reveal their identities in a social media scenario. This could be done by creating profiles, including personal information like name, age, gender, occupation, location, etc., but can also happen by “self-disclosure” of subjective information like thoughts or feelings. With the increasing number of social media applications, large social networking platforms like Facebook or Google Plus have been extended to provide identity services for other applications. Protocols like OAuth6 allow to integrate various services without unveiling the user’s password to a third-party service. The conversations building block describes to which extent users are communicating with each other in the social media scenario. Means and purpose of communication differ among services: while messages published on micro-blogging services like Twitter are of limited length, are centered around real-time status updates, and do not require answers, blog posts on the other end of the spectrum are better suited for in-depth, often lengthy discussions (and less about staying in touch). Sharing defines how users “exchange, distribute, and receive content” (Kietzmann
6 https://tools.ietf.org/html/rfc5849 (retrieved 2016-01-16)
et al., 2011). Next to conversations, sharing is the glue that connects the acting users in the social graph. A social media service needs to exploit common interests among the users to foster sharing activities and thereby strengthen social ties. Presence describes to which extent users know about the current location of others (both in the virtual and the real world) and whether the other users are available, often indicated using a status flag. Presence attributes are crucial when users want to interact in real time. Relationships model the social graph, and are made explicit using social relations like friendship, collaboration, or other forms of common attributes. Reputation describes “to which [extent] the user can identify the standing of others” (Kietzmann et al., 2011). Reputation can be seen as an indicator for trust and quality. The Groups building block describes to which extent it is possible for the users to organize themselves into groups and sub-groups. With the increasing number of users on social media platforms and the growing number of relations, platforms like Facebook or Twitter have introduced features to cluster friends into groups to simplify the management of social relationships. Groups also act as the surrogate for offline clubs, where members have certain additional rights (e.g., consume group content and share content with the group). In (Zhu et al., 2013), the authors described an approach to build a topic hierarchy of user-generated content for a specific root topic and assign user-generated content from various sources to the hierarchy. In addition, it is possible to update the hierarchy once new content is available. The hierarchy could be used for (explorative) browsing or search (i.e., find the respective node and continue to browse its neighbors). Nagpal et al. presented a tool called SLANT, which uses the information in personal email and twitter feeds to augment regular web search with social content (Nagpal et al., 2012).
Carmel et al. (Carmel et al., 2009) compared several social networks (familiarity network, similarity network, topic-based network) to personalize the ranking when searching social media. Their findings suggest that the best results are obtained when relying on the combined network defined by social interactions with people who did similar things (e.g., co-posts in certain blogs, i.e. the similarity network) and the similarity of content profiles based on individual interest profiles. Agichtein et al. proposed an approach to identify high-quality content in social media, using a dataset from a Q&A website as an example (Agichtein et al., 2008). Bhattacharyya and Wu (Bhattacharyya and Wu, 2014) introduced InfoSearch, a search engine on top of the online social networking platform Facebook, to search the content that has been shared by others.
2.5.3 Search in Peer-to-Peer Systems
Following the village paradigm (i Mansilla and de la Rosa i Esteva, 2013) and interpreting “social search” as asking others for help, the architectural similarities to peer-to-peer systems are obvious: peer-to-peer systems are systems in which single nodes are connected to a network and communicate directly with the other members of the network. All nodes in the network are equally privileged; each node can offer and consume services provided by other peers. The network can be organized into logical layers, so-called overlay networks, to perform specific tasks (Tigelaar et al., 2012, p. 9:2). Tigelaar et al. provided a broad overview of peer-to-peer information retrieval
that also covers applications, challenges, tasks, and architectural archetypes (Tigelaar et al., 2012). The authors clustered peer-to-peer information retrieval systems into two categories: • Peer-to-peer information retrieval systems with internal document references, where the documents are stored on the nodes and need to be retrieved first, and
• Systems with external document references, where the actual documents are stored outside of the peer-to-peer system, e.g. a peer-to-peer based web search system. Tigelaar et al. (Tigelaar et al., 2012, p. 9:22) gave a comprehensive overview of existing scientific and non-scientific implementations of peer-to-peer information retrieval systems. In the context of this thesis, the proposed concept can be seen as a peer-to-peer system with internal and external documents (cf. Part II) that considers individual social contexts.
2.6 privacy
In the information retrieval process, the act of sharing an information need and the act of replying to it unveils information about the respective senders of the information and the information need. As our later findings suggest, defining the audience for both communication forms is a critical acceptance factor (cf. Chapter 11, Chapter 16). This chapter introduces techniques to collect and operate on data in a privacy-preserving way.
2.6.1 Privacy-Preserving Data Collection
One of the use cases for the proposed social information retrieval system includes the collection of data from various people within one’s social network to generate statistics, e.g. based on the popularity of the item (cf. Chapter 9). To prevent the information seeker from systematically reconstructing parts of other users’ information spaces by probing (Ipeirotis and Gravano, 2002), and to prevent a user from reconciling the mapping between an information item and the respective information provider, it is important to offer the possibility to reply anonymously. Simple solutions include a trusted third party that collects the information from the replying users and forwards the results to the information seeker, at the same time removing the links to the original sender of each information item. Combining this approach with asymmetric encryption mechanisms like RSA (Rivest et al., 1978) prevents the trusted third party from reading the information. In (Fung et al., 2010), the authors distinguished between record owners (the entities that have the desired information), data publishers (the entities that collect the information from the record owners), and data recipients (the entities that consume the information). In less complex cases, the data publisher is considered a trustworthy third party, enforcing anonymity on the forwarded, aggregated information. The more complex scenario, which is the suitable one for the presented use case, involves an untrusted data publisher. Yang et al. presented an approach where a miner can collect data from a large group of people without being able to link the data to the individual respondent
(Yang et al., 2005). The protocol requires the group to be divided into subgroups; each subgroup sends a randomized permutation of the data to the miner, using ElGamal encryption (ElGamal, 1985) and a joint decryption technique. The approach only achieves the desired result if the message itself does not contain any information that might disclose the original author.
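The simple trusted-third-party scheme described above can be reduced to one step: the collector strips sender identities and shuffles the replies before forwarding them. The sketch below only illustrates this anonymization step with hypothetical names; a real deployment would additionally encrypt the payloads, as discussed:

```python
import random

def anonymize(replies):
    """Strip sender identities from (sender_id, item) pairs and shuffle.

    Shuffling destroys the ordering, which could otherwise reveal who
    contributed which item (e.g., if replies arrive in a known order).
    """
    items = [item for _, item in replies]
    random.shuffle(items)
    return items

replies = [("alice", "item-a"), ("bob", "item-b"), ("carol", "item-c")]
print(sorted(anonymize(replies)))  # senders removed; only items survive
```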
2.6.2 Privacy-Preserving Set Operations
While the information provider’s need for privacy can be addressed using the techniques mentioned in the previous section, this section covers the information seeker’s privacy requirements. As later studies show (cf. Chapter 11), information seekers are aware of the fact that issuing a query to their social network discloses information about themselves. Therefore, they have an interest in reducing the group of potential information providers who receive the query – apart from disguising the author of the query (e.g., using Chaum’s cascade of mixes (Chaum, 1981)), a valid option would be to only ask those contacts who already have information items that would be relevant for the query. This approach would theoretically ensure that the query is only revealed to people who already have information about the content domain covered by the query in their own information space, for whom it thus would not pose such a privacy risk. Consequently, one could not learn the query’s content without having content of the same type in one’s individual information space. However, the two scenarios “keep information need as private as possible” and “send information need only to people who have relevant information” are not exactly the same. In a scenario where an information seeker has a privacy-sensitive information need like “tinea pedis”, it might be less infringing to reveal this information need to people who have the same problem (following the logic of self-help groups: if one is there, in general one cannot blame the others for being there). For the sake of the information providers’ privacy needs, publishing an in-depth description of the information spaces is not considered a valid option. A solution that would fulfill both parties’ privacy needs is a privacy-preserving set intersection: The information seeker has a query, represented as a feature vector ~q.
The information providers have a representation of their individual information spaces (indexed using a feature space that can be transformed to or is the same as the information seeker’s feature space). Treating each dimension of the joint feature space as an item in a list if it is positive in the respective vector representation would allow to use privacy-preserving intersection protocols as proposed in (Kissner and Song, 2005). Their technique relies on a homomorphic cryptosystem and on polynomials to represent multisets (a multiset S = \{S_j\}_{1 \leq j \leq k} is represented as the polynomial f(x) = \prod_{1 \leq j \leq k}(x - S_j)) and operations on multisets. A brief example illustrates the process: assume that an information seeker A wants to send a query ~q to a potential information provider B. The query vector ~q is defined in an n-dimensional vector space, where each dimension relates to an abstract topic description. A builds a list LA of those topic identifiers that represent ~q following a defined algorithm (e.g., the name of each dimension that has a positive value in ~q). B also has a representation of her information space as a topic vector ~i. B likewise converts ~i to a list LB, following a similar process to the one A already executed. Both lists are
not yet shared, i.e., A only has access to LA and B can only see LB. In the next step, both users follow the privacy-preserving protocol proposed in (Kissner and Song, 2005) to create the intersection of the lists (without disclosing their individual lists to the respective other party). If the intersection suggests that B might have relevant information for A, A can send a revised version of ~q named ~q′ to B, in which all topic dimensions that are not covered in B’s information space are removed. Following this approach ensures that B does not receive any part of A’s query that is not already part of B’s information space and would therefore constitute a potential privacy risk for A. As already mentioned above, the described protocol is not perfect for several reasons:
• Maximizing the privacy of an information need (and therefore avoiding unnecessary sharing) is not the same as sharing the information need only with people who have relevant information. While it reduces the risk of consequences (e.g., being susceptible to blackmail) caused by the disclosure, it is no guarantee.
• It is possible to probe the information spaces of others by sending faked queries with content that could socially hurt other people. The system would identify people who have such information in their private information spaces (and therefore would be vulnerable). If probing is done on a large scale, it is easy to deny the seriousness of one’s own request.
• Being able to deny the seriousness of one’s own request or stating that it was issued with a different sentiment in mind could have the same effect as probing (and even foster such activities), rendering the protocol useless.
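Setting the cryptography aside, the pruning step at the heart of the protocol can be sketched as follows. A plain set intersection stands in for the Kissner-Song privacy-preserving intersection, and all topic names and vectors are illustrative:

```python
# Sketch of the query-pruning flow. In the real protocol, the set
# intersection below would be computed with the Kissner-Song
# privacy-preserving protocol, so neither party sees the other's full list.

def topics_of(vec, names):
    """Topic identifiers with positive weight in a feature vector."""
    return {n for n, w in zip(names, vec) if w > 0}

def prune_query(q_vec, provider_topics, names):
    """Zero out query dimensions the provider does not cover (~q -> ~q')."""
    shared = topics_of(q_vec, names) & provider_topics  # <- PPSI in reality
    return [w if n in shared else 0.0 for n, w in zip(names, q_vec)]

names = ["t1", "t2", "t3", "t4"]
q = [0.5, 0.0, 0.8, 0.3]     # seeker A's query vector ~q
i_b = [0.2, 0.9, 0.4, 0.0]   # provider B's information-space vector ~i
print(prune_query(q, topics_of(i_b, names), names))  # [0.5, 0.0, 0.8, 0.0]
```

Only the dimensions t1 and t3, which B's information space already covers, survive in ~q′; the sensitive dimension t4 is never revealed to B.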
SOCIO-PSYCHOLOGICAL AND MARKET ASPECTS OF INFORMATION SHARING
Exchanging information in a social context is far more than the technical processes highlighted in the previous chapter. Human information exchange or the gathering of information has been modeled using different analogies taken from economics (cf. Section 3.1) or biology (cf. Section 3.2) to explain patterns of human behavior. In the following, a short overview of relevant work in this area is given.
3.1 information markets
Presuming that all individuals act rationally to maximize their payoffs, sharing information in a distributed social information retrieval system can also be considered as an exchange of goods in an abstract information market. The market’s currency is not necessarily restricted to money and can also include information, social support or appreciation, or the endorsement to accomplish a higher-level goal. Groh and Birnkammerer proposed a market model for information in (Groh and Birnkammerer, 2011; Birnkammerer, 2010). The basic concept of their model includes transactions T_i^{A→C}, where an information item i is transferred from an information broker A to a consumer C. A can be the creator of i, but this is not a formal requirement. C pays the price P(T_i^{A→C}) to consume i, whereas A receives the price and the reward R(T_i^{A→C}). While the price P(T_i^{A→C}) reflects the explicit compensation that C gives to A, the reward R(T_i^{A→C}) contains all indirect benefits that A strives to gain when C consumes i (e.g., social status, trust, awareness, etc.). Based on these ideas, Groh and Birnkammerer also derived a logic for variable and fixed costs: the costs for an information transaction can be expressed as fixed costs K_f(A, i) of user A to generate or to buy i, and as variable costs k_v(T_i^{A→C}), which contain variable costs for the transaction (k_v^T(T_i^{A→C}), e.g., the time or resources needed to transfer or store the message) and variable costs caused by the loss of privacy (k_v^P(T_i^{A→C})) that accompanies the transaction. Azzopardi et al. proposed an economic model for information retrieval (Azzopardi, 2011; Azzopardi et al., 2013; Azzopardi, 2014). Unlike Groh and Birnkammerer’s model, Azzopardi et al. focus on conventional search engines and analyze the information seeker’s cost to process the results and to pose queries. Azzopardi et al.’s model therefore only covers the consumer side of the information market.
One of Azzopardi et al.’s findings is that with increasing costs for queries, users tend to issue fewer queries and examine more documents per query (Azzopardi et al., 2013). Other research published in (Bertino and Matei, 2015) covers the perspective of the information provider and thus provides insights into reputation, trust, credibility, and different forms of contribution in social media scenarios.
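Groh and Birnkammerer's cost decomposition can be turned into a small numeric sketch. All values below are made-up illustrative numbers, not taken from the original model:

```python
# Toy sketch of the broker's net benefit in one transaction T_i^{A->C}:
# payoff = P + R - K_f - (k_v^T + k_v^P), with all quantities hypothetical.

def broker_payoff(price, reward, fixed_cost, var_cost_transfer, var_cost_privacy):
    """Net benefit for broker A of transferring item i to consumer C."""
    return price + reward - fixed_cost - (var_cost_transfer + var_cost_privacy)

# A shares an item: explicit price 2.0, indirect social reward 1.5,
# creation cost 1.0, transfer cost 0.2, privacy cost 0.8
p = broker_payoff(2.0, 1.5, 1.0, 0.2, 0.8)
print(p)  # 1.5 -> under the rationality assumption, A shares only if positive
```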
Adler and Kwon (Adler and Kwon, 2002) provide a comprehensive overview of definitions for “social capital” from a sociological perspective and distinguish between market relations, hierarchical relations, and social relations. As already mentioned in Section 2.4.2, Fiske proposed a framework to explain social relations with four psychological models (Fiske, 1992): communal sharing (CS), authority ranking (AR), equality matching (EM), and market pricing (MP). Communal sharing describes a social interaction mode where all members of a group treat each other as equivalent. Authority ranking linearly orders people according to their (social) positions. In equality matching, “people keep track of the imbalances among them” (Fiske, 1992) and in market pricing, interactions are reduced to a one-dimensional value.
3.2 information foraging techniques
Pirolli and Card (Pirolli and Card, 1995; Pirolli, 2009) presented an information foraging model that tries to explain human behavior when satisfying information needs. The theory behind the model “derives from optimal foraging theory in biology and anthropology, which analyzes the adaptive value of food-foraging strategies. Information foraging theory analyzes trade-offs in the value of information gained against the costs of performing activity in human-computer interaction tasks” (Pirolli and Card, 1995). The value of the information is not solely related to the information itself, but can only be assessed in the context of the embedding task. Pirolli stated that low-cost information foraging behaviors (i.e., taking the obvious results) are associated with the core zones of the content domain, while high-cost information foraging behaviors (i.e., those requiring more effort) are associated with peripheral zones (Pirolli, 2009). Furthermore, Pirolli supported Granovetter’s idea of the positive effects of weak ties (cf. Section 2.5.1) and concluded that “[homogeneity] of opinion, viewpoint, and information resources among a group of information foragers is likely to produce redundancy in what they find and how they interpret those findings. We might expect that groups of cooperative information foragers will be more effective if constituted by individuals with some degree of diversity” (Pirolli, 2009). Following Pirolli’s theory, information seekers evaluate the utility of the information that has already been processed and continue to search if the expected additional gain is higher than the costs of the search process.
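The stopping rule at the end of Pirolli's theory (continue while the expected additional gain exceeds the cost of the next search step) can be sketched as follows; the gain values and the step cost are illustrative assumptions:

```python
# Toy sketch of the foraging stopping rule: consume results in order and
# stop as soon as the expected marginal gain no longer exceeds the cost.

def forage(gains, step_cost):
    """gains: expected gain of each next result, in the order encountered."""
    consumed = []
    for g in gains:
        if g <= step_cost:
            break  # expected additional gain no longer justifies the effort
        consumed.append(g)
    return consumed

print(forage([5.0, 3.0, 1.5, 0.5], step_cost=1.0))  # [5.0, 3.0, 1.5]
```

Low-cost foraging in the core zone corresponds to the early, high-gain items; the peripheral zone is reached only if the step cost is low relative to the remaining gains.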
FUNDAMENTALS OF SEMANTIC WEB
In Chapter 2, information retrieval was introduced. While information retrieval deals with finding information about a subject, data retrieval “aims at retrieving all objects which satisfy clearly defined conditions such as those in a regular expression or in a relational algebra expression” (Baeza-Yates and Ribeiro-Neto, 1999, p. 1). Information retrieval techniques typically operate on unstructured data (like natural-language document collections without an explicit semantic structure), whereas data retrieval approaches need a well-defined structure in the data. Data retrieval systems operate on higher-order predicate logic and can be implemented using database systems. In information retrieval, relevance is a continuous measure, while it is binary (or ordinal) in a data retrieval scenario. For a data retrieval system, “a single erroneous object among a thousand retrieved objects means total failure[,] (...) [while] for an information retrieval system, (...) the retrieved objects might be inaccurate and small errors are likely to go unnoticed” (Baeza-Yates and Ribeiro-Neto, 1999, p. 2). Semantic web is interpreted as a technology bundle to explicitly represent and semantically annotate data. Semantic web technologies thereby make it possible to apply data retrieval approaches. One of the examples implemented in the present thesis (Social Product Search, cf. Section 16.6.3) operates in the data retrieval domain and therefore partially relies on semantic web concepts. According to (World Wide Web Consortium, 2013), the “Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. (...) It is based on the Resource Description Framework (RDF)”. The term was coined in 2001 by Berners-Lee et al. (Berners-Lee et al., 2001) and describes the idea that data is linked to a semantic interpretation that is understood by computers.
It therefore builds upon the achievements of previous research in the (symbolic) Artificial Intelligence (AI) domain (Shadbolt et al., 2006) (and forms a counterpart to Machine Learning (Murphy, 2012; Bishop, 2006), where explicit symbols are omitted). The basic idea is that data is annotated with additional meta-information, which is explicitly defined in metadata models like RDF1, OWL2, or XML3. These technologies make it possible to explicitly model relationships between items and therefore – in theory – to conduct conceptual reasoning. One requirement to efficiently use semantic web technologies is the existence of explicit semantic annotations (linked to a taxonomy or ontology). In the domain of products, which is relevant in some of the use cases of the concept developed in this thesis, several possible taxonomies exist. The most established system to categorize products is the one maintained by GS14. GS1 is a non-profit organization and controls the allocation of Global Trade Item Numbers (GTIN) as
1 http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/ (retrieved 2015-11-20) 2 http://www.w3.org/TR/owl-ref/ (retrieved 2015-11-20) 3 http://www.w3.org/TR/2006/REC-xml11-20060816/ (retrieved 2015-11-20) 4 http://www.gs1.org/ (retrieved 2015-11-20)
35 36 fundamentalsofsemanticweb
a unique identifier for products. GS1 works cross-industry and is the dominant solution for retailers5. GS1 provides a comprehensive category schema for products called Global Product Classification (GPC)6. A free service to query this database is available at http://opengtindb.org/7. As a pragmatic low-level approach, Amazon’s ASIN system can be used8. Each ASIN consists of ten letters and/or numbers and can be seen as a unique identifier for a product listed in Amazon’s catalogue (ASIN is the abbreviation for Amazon Standard Identification Number). Using the Product Advertising API of Amazon Web Services9, it is possible to retrieve additional meta-information for each product, including the respective category in Amazon’s product category system. Furthermore, it is possible to retrieve the Amazon website for an ASIN using the URL http://www.amazon.com/dp/
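The URL pattern can be captured in a small helper. The example ASIN below is the one given in the footnote; the function name and validation are illustrative:

```python
def amazon_url(asin):
    """Build the Amazon product page URL for a ten-character ASIN."""
    if len(asin) != 10 or not asin.isalnum():
        raise ValueError("an ASIN consists of exactly ten letters and/or numbers")
    return "http://www.amazon.com/dp/" + asin

print(amazon_url("B00005N5PF"))  # http://www.amazon.com/dp/B00005N5PF
```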
5 https://en.wikipedia.org/wiki/Global_Trade_Item_Number (retrieved 2015-11-20) 6 http://www.gs1.org/how-gpc-works (retrieved 2015-11-20) 7 (retrieved 2015-11-20) 8 http://www.amazon.com/gp/seller/asin-upc-isbn-info.html (retrieved 2015-11-20) 9 http://docs.aws.amazon.com/AWSECommerceService/latest/GSG/Welcome.html, http://aws.amazon.com/ (retrieved 2015-11-20) 10 http://www.amazon.com/dp/B00005N5PF (retrieved 2015-11-20)
STATISTICS & TOOLS
In the following sections, the statistical tools and methods used for the experiments’ evaluation are introduced briefly. For a more in-depth description of the topics, please refer to the cited literature.
5.1 correlation coefficients
5.1.1 Pearson’s Correlation Coefficient
Pearson’s correlation coefficient (also referred to as Pearson’s r) is defined for two sample datasets {x1, ..., xn} and {y1, ..., yn} as
r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \quad (7)

where \bar{x} = \frac{1}{n}\sum_i x_i and \bar{y} = \frac{1}{n}\sum_i y_i. The result r can be interpreted as a measure quantifying the degree and the direction of a linear correlation between the two samples. It ranges from −1 (strong negative linear correlation) to 1 (strong positive linear correlation). If r = 0, the two datasets do not correlate linearly (Runkler, 2015). Pearson’s r requires x and y to be on an interval scale and approximately normally distributed. Furthermore, x and y must be linearly correlated. The significance level of r can be measured using a t-test with the following test statistic
t = \frac{r}{\sqrt{1 - r^2}} \cdot \sqrt{n - 2} = r \cdot \sqrt{\frac{n - 2}{1 - r^2}} \quad (8)
with r denoting Pearson’s r and n being the number of pairs in both datasets. If both samples follow a nearly normal distribution, t follows a t-distribution with (n − 2) degrees of freedom. If the absolute value of t is greater than the respective critical value of the t-distribution in a two-tailed test for a given significance level and (n − 2) degrees of freedom, the result is statistically significant at this level.
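Equations (7) and (8) can be implemented directly; the sample data below is illustrative:

```python
import math

def pearson_r(x, y):
    """Pearson's r as in Eq. (7)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mx) ** 2 for xi in x)) * \
          math.sqrt(sum((yi - my) ** 2 for yi in y))
    return num / den

def t_statistic(r, n):
    """Test statistic from Eq. (8); n is the number of pairs."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(round(r, 4))                        # 0.8
print(round(t_statistic(r, len(x)), 4))   # compare against t-distribution, df = 3
```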
5.1.2 Spearman’s Correlation Coefficient
Spearman’s correlation coefficient (also referred to as Spearman’s rho) is a non-linear measure of correlation between two samples {x1, ..., xn} and {y1, ..., yn}. A completely positive (rho = 1) or negative (rho = −1) correlation exists when each variable is a perfect monotonic function of the other. The n values of each of the two datasets ({x1, ..., xn}, {y1, ..., yn}) are converted to ranks ({r_{x,1}, ..., r_{x,n}}, {r_{y,1}, ..., r_{y,n}}) and rho is calculated as follows:
37 38 statistics & tools
rho = 1 - \frac{6 \sum_i (r_{x,i} - r_{y,i})^2}{n(n^2 - 1)} \quad (9)

The significance of rho can be calculated using

t = rho \cdot \sqrt{\frac{n - 2}{1 - rho^2}} \quad (10)

which also follows a t-distribution with n − 2 degrees of freedom. Compared to Pearson’s r, Spearman’s rho is less sensitive to outliers, because all values are converted to ranks first (thereby limiting outliers to the value of their ranks, while Pearson’s r considers the full value and is therefore more prone to outliers) (Myers et al., 2010, p. 485).
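Equation (9) can be implemented as follows, assuming no ties in the data so that simple ranking suffices (with ties, averaged ranks would be needed); the sample data is illustrative:

```python
def ranks(values):
    """1-based rank positions; assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def spearman_rho(x, y):
    """Spearman's rho as in Eq. (9)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

x = [10, 20, 30, 40]
y = [1, 3, 2, 4]    # one swapped pair in the ranking
print(spearman_rho(x, y))  # 0.8
```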
5.2 regression
5.2.1 Linear Regression
concept Given a continuous response variable (also referred to as dependent variable) y and a vector of predictor variables (also referred to as independent variables or explanatory variables) ~x, linear regression aims to express the relationship using a linear function
y = \beta_0 + \beta_1 x_1 + ... + \beta_{k+1} x_{k+1} + \epsilon \quad (11)
\beta_0 denotes the intercept and \beta_{i>0} is the estimated coefficient of the explanatory variable x_i. \epsilon quantifies the prediction error and is called the residual (it is individual for each measurement and normally distributed). The model parameters \beta_i (with i ∈ {0, ..., k + 1}) are estimated by minimizing the sum of squared errors over all measured and predicted values of y (i.e., \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 with y_i being the actual and \hat{y}_i being the predicted value of the i-th measurement). According to (Gelman and Hill, 2007), the least squares estimate equals the maximum likelihood estimate “if the errors are independent with equal variance and normally distributed” (Gelman and Hill, 2007, p. 40). In Machine Learning, the concept of “traditional” linear regression is extended by using different basis functions (Bishop, 2006, p. 138). The more general version of linear regression can therefore be defined as
y(x) = \beta_0 + \sum_{i=1}^{k+1} \beta_i \phi_i(x) + \epsilon \quad (12)

with k + 1 being the total number of parameters in the model and \phi_i(x) denoting the basis functions. The basis functions can be nonlinear, allowing y(x) “to be a nonlinear function of the input vector x”. The model is still considered to be a linear model, as it is linear in \beta_i (Bishop, 2006, p. 139). In standard linear regression (as shown in Equation 11 above), the general basis function \phi_i(x) is defined as \phi_i(x) = x_i. Furthermore, machine learning literature commonly refers to the coefficients as w instead of \beta (Murphy, 2012, p. 19).
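For the single-predictor case of Equation (11), the least squares estimates have a well-known closed form, sketched below; the toy data is chosen so that the fit is exact:

```python
def fit_simple_ols(x, y):
    """Least squares estimates (b0, b1) for y = b0 + b1*x + eps."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # b1 minimizes the sum of squared residuals; b0 follows from the means
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
         sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx
    return b0, b1

x = [0, 1, 2, 3]
y = [1, 3, 5, 7]   # exactly y = 1 + 2x, so the residuals are zero
print(fit_simple_ols(x, y))  # (1.0, 2.0)
```

A basis-function model as in Equation (12) would simply replace x by the transformed features \phi_i(x) and solve the resulting (multivariate) least squares problem.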
requirements Linear regression is based on a set of assumptions with regard to the data, which are discussed controversially in the literature. (Osborne and Waters, 2002) and (Gelman and Hill, 2007, p. 45) provide a comprehensive and recent treatment. Following Gelman and Hill’s argumentation, a regression model should meet the following criteria (in decreasing order of importance):
• Validity: The model data should fit to the research question, i.e. the response variable should map to the phenomenon of interest and the explanatory variables should cover all relevant predictors.
• Additivity and linearity: The response variable should be a linear function of the explanatory variables. If additivity is violated, transforming the variables might be useful (e.g., y = a · b · c can be transformed to log(y) = log(a) + log(b) + log(c)). Other commonly used functions to create linear dependencies or stabilize variance include exp, 1/y, or √y (Crawley, 2007, p. 205).
• Independence of errors: Residuals should be independent of each other.
• Equal variance of errors: The variance of the residuals should be constant (i.e., homoscedastic). If the variance of the errors is heteroscedastic, “estimation is more efficiently performed using weighted least squares, where each point is weighted inversely proportional to its variance (...). In most cases, however, this issue is minor. Unequal variance does not affect the most important aspect of a regression model” (Gelman and Hill, 2007, p. 45).
• Normality of errors: Unlike many older textbooks, Gelman and Hill do not recom- mend to check the regression residuals for normality (Gelman and Hill, 2007, p. 45).
interpretation Conventional statistical software tools like R1 provide additional information for linear models: for the intercept β0 and each coefficient βj, an estimate is calculated along with the expected standard error. The estimated coefficient divided by the standard error yields the t-value, which is then used to calculate the p-value based on the t-distribution. The p-value can be interpreted as the probability that this t-value or a larger one is obtained by chance alone (Schäfer, 2011; Crawley, 2007).
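The chain from estimate to standard error to t-value can be sketched for simple linear regression with one explanatory variable (the helper name and the data are illustrative assumptions); the p-value would then be looked up in the t-distribution with n − 2 degrees of freedom:

```python
def t_value_slope(x, y):
    # Fit y = b0 + b1*x by least squares and return the t-value of b1,
    # i.e., the estimated slope divided by its standard error.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = (rss / (n - 2) / sxx) ** 0.5  # standard error of the slope
    return b1 / se_b1

# Noisy measurements around y = 2x: the slope is clearly significant,
# so the t-value is large and the p-value close to zero.
t = t_value_slope([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```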
evaluation A metric often used to assess the quality of a linear model with multiple explanatory variables is R2 (often also referred to as multiple R2). It is defined as Var(ŷ)/Var(y), with Var(·) being the variance, ŷ representing the fitted values of the regression model, and y referring to the actual values of the response variable. R2 can be interpreted as the amount of variance that is explained by the regression model. If the regression model only contains a single variable, it is written in lowercase (r2). A more reliable measure to evaluate the quality of a model is adjusted R2, defined as
\[
\text{adj. } R^2 = R^2 - (1 - R^2)\,\frac{p}{n - p - 1}
\tag{13}
\]
1 The R Project for Statistical Computing, https://www.r-project.org/ (retrieved 2015-11-23)
with p denoting the number of independent variables and n the number of observations. For model selection, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are commonly used metrics to compare two different models (cf. (Burnham and Anderson, 2004) for an in-depth comparison).
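The two measures can be sketched as follows (helper names and data are illustrative; the variance normalization cancels in the ratio, so population variance suffices):

```python
def variance(v):
    # population variance; the 1/n factor cancels in the R^2 ratio
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

def r_squared(y, y_hat):
    # R^2 = Var(yhat) / Var(y): share of variance explained by the model
    return variance(y_hat) / variance(y)

def adjusted_r_squared(y, y_hat, p):
    # penalizes the p explanatory variables; n is the number of observations
    n = len(y)
    r2 = r_squared(y, y_hat)
    return r2 - (1 - r2) * p / (n - p - 1)

# Fitted values of a one-variable model (p = 1) for four observations.
y = [1, 2, 2, 4]
y_hat = [0.9, 1.8, 2.7, 3.6]  # least squares fit of y on x = [1, 2, 3, 4]
r2 = r_squared(y, y_hat)
adj = adjusted_r_squared(y, y_hat, p=1)
```

As expected, the adjusted value is smaller than the plain R², since it compensates for the explanatory variable that was used.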
5.2.2 Logistic Regression
concept Logistic regression models are based on the logistic function exp(x)/(1 + exp(x)) and predict a binary response variable. Therefore, the error distribution is binomial (Crawley, 2007, p. 514). For a binary response variable Yi ∈ {0, 1} and a k-dimensional explanatory variable X, the model is defined as (cf. (Wasserman, 2005, p. 223))
\[
p_i = P(Y_i = 1 \mid X = x) = \frac{\exp\!\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{i,j}\right)}{1 + \exp\!\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{i,j}\right)}
\tag{14}
\]
and can be expressed as
\[
\operatorname{logit}(p_i) = \beta_0 + \sum_{j=1}^{k} \beta_j x_{i,j}
\tag{15}
\]
with
\[
\operatorname{logit}(p) = \log \frac{p}{1 - p}.
\tag{16}
\]
A basic concept in logistic regression is the odds ratio. The odds for a certain event Yi = 1 are defined as
\[
\frac{P(Y_i = 1 \mid X = x)}{1 - P(Y_i = 1 \mid X = x)} = \frac{p_i}{1 - p_i}.
\tag{17}
\]
To obtain a linear relationship in the model, the odds are transformed using log(·), leading to logits (as defined above). Logistic regression models estimate regression weights for each explanatory variable to calculate the respective logits.
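The relationship between probabilities, odds, and logits from Equations 14–16 can be sketched as follows (the function names are chosen for illustration only):

```python
import math

def logistic(x):
    # Equation 14 for a single linear predictor value x: maps the
    # real line to a probability in (0, 1).
    return math.exp(x) / (1.0 + math.exp(x))

def logit(p):
    # Equation 16: the log of the odds p / (1 - p); inverse of logistic.
    return math.log(p / (1.0 - p))

# The two functions are inverses of each other:
p = logistic(1.3)   # probability implied by linear predictor 1.3
eta = logit(p)      # recovers the linear predictor 1.3
```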
interpretation The direct interpretation of the estimated coefficients βi in Equation 15 is difficult. Therefore, the values are first transformed back to normal odds: as in simple linear regression, β0 is the intercept and can be interpreted as the predicted log of the odds for pi if all explanatory variables are 0 (i.e., if ∀j : xj = 0, then pi/(1 − pi) = exp(β0)). A similar logic applies to the coefficients: keeping all other explanatory variables constant, a one-unit increase of the explanatory variable xi,j leads to an estimated change of the odds for pi to (pi/(1 − pi)) · exp(βj). A more in-depth introduction to logistic regression is given in (Wasserman, 2005, p. 223), (Gelman and Hill, 2007, p. 79), or (Crawley, 2007, p. 569). In the machine learning context, logistic regression is interpreted as a (binary) classifier (Murphy, 2012, p. 21).
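This interpretation can be checked numerically: with hypothetical coefficients β0 and β1 (the values below are illustrative assumptions), a one-unit increase of the explanatory variable multiplies the odds by exp(β1), independently of the starting value:

```python
import math

b0, b1 = -1.0, 0.5  # hypothetical estimated coefficients

def odds(x):
    # odds p / (1 - p) implied by the model for explanatory value x
    return math.exp(b0 + b1 * x)

# A one-unit increase of x changes the odds by the factor exp(b1).
ratio = odds(3.0) / odds(2.0)   # equals exp(0.5)
```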
evaluation The performance of a logistic regression model can be evaluated by comparing it with the null model (the model assigning the same probability to each yi, based on the proportion in the measured data). The metric to compare the errors in both models is deviance (cf. (Gelman and Hill, 2007, p. 100) for more details):
• When an explanatory variable with random noise is added to the model, de- viance should increase by 1.
• When an informative explanatory variable is added to the model, deviance should decrease by more than 1.
Formally, deviance is defined as −2(L(µ, y) − L(ν, y)), with L denoting the log-likelihood function, y being the observed data, and µ and ν being the predicted data of two models (Agresti, 2002, p. 118). One of the two models is often the saturated model, i.e., the model that fits the observed data exactly because it contains a parameter for each measurement. Another commonly used approach is to compare a model to the null model (see above). As for the simple linear regression model, several R2 measures are defined for logistic regression models to calculate a “goodness of fit” value. Long lists the commonly used McFadden pseudo-R2 metric and its adjusted version, which compensates for models with many explanatory variables (Long, 1997, p. 104).
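Deviance relative to the null model and the McFadden pseudo-R² can be sketched as follows (the observations and fitted probabilities below are illustrative assumptions):

```python
import math

def log_likelihood(y, p):
    # Bernoulli log-likelihood of 0/1 observations y under probabilities p
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

y = [1, 0, 1, 1]
p_model = [0.9, 0.2, 0.8, 0.7]        # hypothetical fitted probabilities
p_null = [sum(y) / len(y)] * len(y)   # null model: overall proportion 0.75

# Deviance reduction of the fitted model relative to the null model;
# positive because the fitted model explains the data better.
deviance = -2 * (log_likelihood(y, p_null) - log_likelihood(y, p_model))

# McFadden pseudo-R^2: 1 - L(model) / L(null)
mcfadden = 1 - log_likelihood(y, p_model) / log_likelihood(y, p_null)
```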
5.2.3 Ordinal Logistic Regression
concept Ordinal logistic regression can be interpreted as a generalized form of logistic regression, where the response variable is on an ordinal scale. In this thesis, cumulative link models (also referred to as proportional odds models) are used. An introduction for the specific implementation can be found in (Christensen, 2015b; Christensen, 2015a). As already defined above, the logistic regression model can be written as
\[
\log \frac{p_i}{1 - p_i} = \log \frac{P(Y_i = 1 \mid X)}{1 - P(Y_i = 1 \mid X)} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{i,j}
\tag{18}
\]
and is based on the odds that the event P(Yi = 1|X) occurs (with Yi ∈ {0, 1} being a binary variable). For an ordinal response variable, the event of interest is modeled as reaching a specific score on the ordinal scale or less. Imagine an ordinal response variable Y which consists of four levels: (1) poor, (2) fair, (3) good, and (4) excellent. The odds of the events follow the form θj = P(Yi ≤ j|X)/P(Yi > j|X) and are defined as
• θ1 = P(Yi = 1|X)/P(Yi > 1|X),
• θ2 = P(Yi ≤ 2|X)/P(Yi > 2|X), and
• θ3 = P(Yi ≤ 3|X)/P(Yi > 3|X).
θ4 does not exist in the example introduced above because P(Yi ≤ 4|X) = 1 and P(Yi > 4|X) = 0. For each independent variable, the ordinal logistic regression model is