Probabilistic Models for Context in Social Media: Novel Approaches and Inference Schemes

Christoph Carl Kling
Institute for Web Science and Technologies
University of Koblenz-Landau
[email protected]
November 2016

Dissertation approved by the doctoral committee of Department 4: Computer Science of the University of Koblenz-Landau for the award of the academic degree Doktor der Naturwissenschaften (Dr. rer. nat.).

PhD thesis at the University of Koblenz-Landau

Date of the doctoral defense: 16 November 2016
Chair of the doctoral committee: Prof. Dr. Ralf Lämmel
Reviewer: Prof. Dr. Steffen Staab
Reviewer: Prof. Dr. Markus Strohmaier
Reviewer: Prof. Dr. Lars Schmidt-Thieme

Abstract

This thesis presents novel approaches for integrating context information into probabilistic models. Data from social media is typically associated with metadata, which includes context information such as timestamps, geographical coordinates or links to user profiles. Previous studies showed the benefits of using such context information in probabilistic models, e.g. improved predictive performance. In practice, however, probabilistic models which account for context information still play a minor role in data analysis. There are multiple reasons for this: existing probabilistic models are often complex, they are difficult to implement, implementations are not publicly available, or parameter estimation is computationally too expensive for large datasets. Additionally, existing models are typically created for a specific type of content and context and lack the flexibility to be applied to other data. This thesis addresses these problems by introducing a general approach for modelling multiple, arbitrary context variables in probabilistic models and by providing efficient inference schemes and implementations.

In the first half of this thesis, the importance of context and the potential of context information for probabilistic modelling are shown theoretically and in practical examples.