Analyzing and Improving Diversification, Privacy, and Information Management on the Web

ANALYZING AND IMPROVING DIVERSIFICATION, PRIVACY, AND INFORMATION MANAGEMENT ON THE WEB Von der FakultätfürElektrotechnik und Informatik der Gottfried Wilhelm Leibniz UniversitätHannover zur Erlangung des akademischen Grades DOKTOR DER NATURWISSENSCHAFTEN Dr. rer. nat. genehmigte Dissertation von Dipl.-Math. Kaweh Djafari Naini geboren am 11. Mai 1984, in Teheran, Iran Hannover, Deutschland, 2019 Referent: Prof. Dr. techn. Wolfgang Nejdl Korreferent: Assoc. Prof. Dr. Ismail Sengor Altingovde Tag der Promotion: 13.11.2018 ABSTRACT Today, the World Wide Web has become the main source and medium for people to access, share, and manage information. Since user expectations towards all three types of functionalities are high and information volumes are growing very fast, modern web applications are exposed to new challenges by supporting the users in their daily and long- term interactions on the web. In this thesis, we contribute to the following core challenges related to the aforementioned functionalities. Diversification for improving information access - in Web search engines the user can access information by submitting a query that returns a set of search results. Web search queries often contain only a few terms, and can be ambiguous, which is a core issue for retrieval systems. For instance, modern search engines extract a large amount of additional features for building a sophisticated ranking model. Further, recent studies on web search results diversification show that retrieval effectiveness for ambiguous queries can be consid- erably improved by diversifying the search results. In this thesis, we present two approaches for improving retrieval effectiveness and efficiency. First, we present an efficient and scalable algorithm for web search results diversification for large-scale retrieval systems. Second, we present an approach for feature selection in learning-to-rank. Privacy issues and communication practices through information sharing - social networks allow the user to share information to a wider audience or communicate within specific groups. Understanding the users' motivation and behavior in social networks is crucial for supporting the users' needs, e.g. by suggesting relevant resources or creating new services. In recent years, the increasing amount of personal information shared in social networks has exposed users to risks of endangering their privacy. Popular social networks often allow the user to manually control the privacy settings of social content before it is shared. However, existing functionalities for privacy settings are often restricted and very time consuming for the user. In this thesis, we present an approach for predicting privacy settings of the user. Furthermore, we present an in-depth study of social and professional networks for identifying communication practices for different types of users with different skills and expertise. Personalized and long-term information management for social content - the information flood in social media makes it is nearly impossible for users to manually manage their social media posts over several years. Approaches for summarizing and aggregating of social media postings face the challenge to identify information from the past that is still relevant in the future, i.e., for reminiscence or inclusion into a summary. In this thesis, we conduct user evaluation studies to better capture the users' expectation towards information retention. Next, we extract various of features from social media posts, profile and network of the users. Finally, we build general and personalized ranking models for retention, and present a set of seed features which perform best of identifying memorable posts. The approaches in this thesis are compared to existing baselines and state of the art approaches from related work. Keywords: web search results diversification, scalability and efficiency in web search, letor, feature selection, privacy prediction, social network analysis, social media summary ZUSAMMENFASSUNG Heutzutage ist das World Wide Web die wichtigste Quelle zur Informationsbeschaffung, zum Informationsaustausch und der Verwaltung von Informationen. Da die Erwartungen der Nutzer bezüglich der drei vorgenannten Funktionalitätenhoch sind und das Informationsvol- umen sehr schnell wächst, sind moderne Webanwendungen stets neuen Herausforderungen ausgesetzt, um die Nutzer bei ihren täglichen und langfristigen Interaktionen im Web un- terstützenzu können.In dieser Dissertation tragen wir zu den folgenden zentralen Heraus- forderungen bei, die sich auf die oben genannte Funktionalitätenbeziehen. Diversifizierung zur Verbesserung des Informationszugangs - In Web-Suchmaschinen kann der Nutzer auf Informationen zugreifen, indem er eine Abfrage sendet, die eine Reihe von Suchergebnissen zurückliefert. Web-Suchanfragen enthalten oft nur wenige Begriffe und könnenmehrdeutig sein, was fürRetrieval-Systeme ein Kernproblem darstellt. Zum Beispiel extrahieren moderne Suchmaschinen eine große Menge zusätzlicher Merkmale, um ein ausgeklügeltesRanking-Modell zu erstellen. Darüber hinaus zeigen neuere Studien zur Diversifizierung von Websuchergebnissen, dass die Retrieval-Effektivitätfürmehrdeutige Abfragen durch Diversifizierung der Suchergebnisse erheblich verbessert werden kann. In dieser Arbeit präsentieren wir zwei Ansätzezur Verbesserung der Retrieval-Effektivitätund -Effizienz. Zunächst stellen wir einen effizienten und skalierbaren Algorithmus fürdie Diver- sifizierung von Web-Suchergebnissen fürgroße Retrieval-Systeme vor. Zweitens präsentieren wir einen Ansatz fürdie Merkmalauswahl im Learning-to-Rank. Datenschutzprobleme und Kommunikationspraktiken durch Informationsaustausch - Soziale Netzwerke ermöglichen dem Nutzer, Informationen an ein breites Publikum weit- erzugeben oder innerhalb bestimmter Gruppen zu kommunizieren. Um die Nutzer un- terstützenzu können,ist es erforderlich, ihre Motivation und ihr Verhalten in sozialen Netzwerken zu verstehen, indem z.B. relevante Ressourcen vorgeschlagen oder neue Dienste angeboten werden. Indem die Nutzer in den letzten Jahren zunehmend persönliche Informa- tionen in den sozialen Netzwerken teilen, setzen sie sich dem Risiko aus, ihre Privatsphäre zu gefährden.Beliebte soziale Netzwerke ermöglichen es dem Nutzer häufig,die Datenschutze- instellungen von sozialen Inhalten vor der Freigabe manuell zu steuern. Die zum Schutz der Privatsphärevorhandenen Funktionen sind jedoch oft eingeschränktund fürden Nutzer sehr zeitaufwendig. In dieser Arbeit präsentieren wir einen Ansatz zur Vorhersage von Datenschutzeinstellungen des Nutzers. Darüber hinaus stellen wir eine eingehende Studie über soziale und berufliche Netzwerke zur Identifizierung von Kommunikationspraktiken für verschiedene Arten von Nutzern mit unterschiedlichen Fähigkeiten und Kenntnissen vor. Personalisiertes und langfristiges Informationsmanagement fürsoziale Inhalte - Die In- formationsflut in den sozialen Medien macht es den Nutzern nahezu unmöglich, ihre Social- Media-Beiträgeüber mehrere Jahre hinweg manuell zu verwalten. Lösungsansätzezum Sammeln von Social-Media-Beiträgestehen vor der Herausforderung, Informationen aus der Vergangenheit zu identifizieren, die in der Zukunft fürden Nutzer denkwürdigsind und fürdie Erstellung von Zusammenfassungen in Frage kommen. In dieser Arbeit führen wir Nutzerbewertungsstudien durch, um die Erwartungen der Nutzer an die Information- serhaltung besser zu erfassen. Als nächstes extrahieren wir verschiedene Merkmale aus Social-Media-Beiträgensowie aus Profilen und Netzwerken der Nutzer. Schließlich erstellen wir allgemeine und personalisierte Ranking-Modelle für die Aufbewahrung von Beiträgen. Zusätzlich stellen wir eine Reihe von Kernfunktionen vor, die am besten geeignet sind, denkwürdigeBeiträgezu identifizieren. Die Ansätzein dieser Arbeit werden mit bestehenden Baselines und State-of-the-Art Ansätzenaus verwandten Arbeiten verglichen. Schlagwörter: Diversifizierung von Web Suchergebnissen, Skalierbarkeit und Effizienz in der Websuche, letor, Merkmalsauswahl, Privatsphäre Vorhersage, soziale Netzwerkanalyse, Zusammenfassung in sozialen Medien FOREWORD The contributions presented in this thesis have previously appeared in several conference and journal papers as well as one book chapter published in the course of this PhD program: The contributions in Chapter3 are published in: • Kaweh Djafari Naini, Ismail Sengor Altingovde, and Wolf Siberski. Scal- able and efficient web search result diversification. ACM Transactions on the Web, TWEB, 10(3):15:1-15:30, August 2016. The contributions in Chapter4 are published in: • Kaweh Djafari Naini and Ismail Sengor Altingovde. Exploiting result diversification methods for feature selection in learning to rank. In Proceedings of the 36th European Conference on Information Retrieval, ECIR'14, pages 455-461, 2014. The contributions in Chapter5 are published in: • Kaweh Djafari Naini, Ismail Sengor Altingovde, Ricardo Kawase, Eelco Herder, and Claudia Niederée. Analyzing and Predicting Privacy Set- tings in the Social Web. In Proceedings of the 23rd International Con- ference on User Modeling, Adaptation and Personalization, UMAP'15, pages 104-117, Dublin, Ireland, June 29 - July 3, 2015. The contributions in Chapter6 are published in: • Sergiu Chelaru, Eelco Herder, Kaweh Djafari Naini, and Patrick Siehn- del. Recognizing skill networks and their specific communication and connection practices. In Proceedings of the 25th ACM Conference on Hypertext and Social

Analyzing and Improving Diversification, Privacy, and Information Management on the Web

Runtime Repair and Enhancement of Mobile App Accessibility

International Journal of Software and Web Sciences (IJSWS)

Google Benefit from News Content

Journal of International Media & Entertainment

V.6, N.1, Jan./Abr. 2018 § Parágrafo 1

Google and the Algorithmic Infomediation of News

The Best Defense: Threats to Journalists’ Safety Demand Fresh Approach

Toward an Interdisciplinary Framework for Research and Policy Making PREMS 162317

Diversifying Web Search Results

Googles Employee and Technology Focus

Estratègies D'èxit Per a Una Empresa En Línea

Bibliography