Creation, Enrichment and Application of Knowledge Graphs
CREATION, ENRICHMENT AND APPLICATION OF KNOWLEDGE GRAPHS

Dissertation approved by the Faculty of Electrical Engineering and Computer Science of Gottfried Wilhelm Leibniz Universität Hannover for the award of the degree Doktor der Naturwissenschaften (Dr. rer. nat.)

by M. Sc. Simon Gottschalk, born on 22 April 1992 in Hannover, Germany

Hannover, Germany, 2021

Referee: Prof. Dr. techn. Wolfgang Nejdl
Co-referee: Prof. Dr. Elena Demidova
Co-referee: Prof. Dr. Sören Auer
Date of the doctoral examination: 29 April 2021

ABSTRACT

The world is in constant change, and so is the knowledge about it. Knowledge-based systems – for example, online encyclopedias, search engines and virtual assistants – are thus faced with the constant challenge of collecting this knowledge and, beyond that, understanding it and making it accessible to their users. Only if a knowledge-based system is capable of this understanding – that is, of more than just reading a collection of words and numbers without grasping their semantics – can it recognise relevant information and make it understandable to its users. The dynamics of the world play a unique role in this context: events of various kinds, relevant to different communities, are shaping the world, with examples ranging from the coronavirus pandemic to the matches of a local football team. Vital questions arise when dealing with such events: How to decide which events are relevant, and for whom? How to model these events so that knowledge-based systems can understand them? How is the acquired knowledge returned to the users of these systems? A well-established concept for making knowledge understandable to knowledge-based systems is the knowledge graph, which contains facts about entities (persons, objects, locations, ...) in the form of a graph, represents relationships between these entities and makes the facts understandable by means of ontologies.
This thesis considers knowledge graphs from three different perspectives:

(i) Creation of knowledge graphs: Even though the Web offers a multitude of sources that provide knowledge about the events in the world, the creation of an event-centric knowledge graph requires recognition of such knowledge, its integration across sources and its representation.

(ii) Knowledge graph enrichment: Knowledge of the world seems to be infinite, and it seems impossible to grasp it entirely at any time. Therefore, methods that autonomously infer new knowledge and enrich the knowledge graphs are of particular interest.

(iii) Knowledge graph interaction: Even having all knowledge of the world available does not have any value in itself; rather, it needs to be made accessible to humans. Based on knowledge graphs, systems can provide their knowledge to their users without demanding any conceptual understanding of knowledge graphs from them. For this to succeed, means for interaction with the knowledge are required that hide the knowledge graph below the surface.

In concrete terms, I present EventKG, a knowledge graph that represents the happenings in the world in 15 languages, as well as Tab2KG, a method for understanding tabular data and transforming it into a knowledge graph. For the enrichment of knowledge graphs without any background knowledge, I propose HapPenIng, which infers missing events from the descriptions of related events. I demonstrate means for interaction with knowledge graphs using the example of two web-based systems (EventKG+TL and EventKG+BT) that enable users to explore the happenings in the world as well as the most relevant events in the lives of well-known personalities.

Key words: knowledge graphs, events, Semantic Web, creation of knowledge graphs, enrichment of knowledge graphs, interaction with knowledge graphs

ZUSAMMENFASSUNG

The world is in constant flux, and with it the knowledge about the world.
Knowledge-based systems – be they online encyclopedias, search engines or voice assistants – thus face the constant challenge of collecting this knowledge and, beyond that, understanding it in order to make it available to people. Only if a knowledge-based system is capable of this understanding – that is, capable of more than drawing on an unsorted collection of words and numbers without recognising their meaning – can it identify relevant information and make it understandable to its users. The dynamics of the world play a special role here: the world is shaped by events of the most varied kinds that are relevant to a wide range of population groups; examples range from the coronavirus pandemic to the matches of local football clubs. Yet important questions arise: How is it decided whether, and for whom, such events are relevant? How should these events be modelled so that knowledge-based systems can understand them? How is the acquired knowledge returned to the users of these systems? A proven concept for making knowledge understandable to knowledge-based systems is the knowledge graph, which collects facts about entities (persons, objects, places, ...) in the form of a graph, represents relationships between these entities and, beyond that, makes them understandable by means of ontologies.

This thesis considers knowledge graphs from three perspectives that build on one another:

(i) Creation of knowledge graphs: Even though the Internet offers a multitude of sources that hold knowledge about events in the world, creating an event-centric knowledge graph requires recognising this knowledge, interlinking it and representing it.
(ii) Enrichment of knowledge graphs: Knowledge about the world seems virtually infinite, and so it seems impossible ever to grasp it completely. Of interest, therefore, are methods that autonomously extend the existing knowledge.

(iii) Interaction with knowledge graphs: Even holding all the knowledge of the world has no value in itself; rather, this knowledge must be made available to people. Based on knowledge graphs, knowledge-based systems can present their knowledge to users without requiring from them a conceptual understanding of knowledge graphs. For this to succeed, means of interacting with the offered knowledge are needed that hide the underlying knowledge graph beneath the surface.

In concrete terms, I present EventKG, a knowledge graph that represents events in the world and makes them available in 15 languages, as well as Tab2KG, a method for understanding data contained in tables by means of background knowledge and transforming it into knowledge graphs. For the enrichment of knowledge graphs without further background knowledge, I introduce HapPenIng, which infers missing events from the available descriptions of similar events. I demonstrate possibilities for interacting with knowledge graphs by means of two web-based systems (EventKG+TL and EventKG+BT) that allow users to easily explore happenings in the world as well as the most important events in the lives of well-known personalities.

Key words: knowledge graphs, events, Semantic Web, creation of knowledge graphs, enrichment of knowledge graphs, interaction with knowledge graphs

ACKNOWLEDGMENTS

First of all, I would like to thank Prof. Dr. techn. Wolfgang Nejdl, who has supervised my path as a PhD student from the first day on and thus made it possible for me to finally write up this thesis. Second, I would like to thank Prof. Dr.
Elena Demidova, who accompanied me closely over the past years at the L3S, which in the end has led not only to a number of joint publications but also to countless inspiring discussions. Third, my thanks go to Prof. Dr. Sören Auer for his time and effort in examining this thesis.

With more than five years at the L3S Research Center, I have seen many colleagues come and go. With many of them, I not only had intense scientific discussions but also shared great moments outside the research institutions, and it is difficult to mention all of them here. Without a doubt, I have spent the most time with my office mates Nicolas and Markus. This time includes all our conversations whenever we needed to have a break and look up from our screens. The office right next to ours was occupied by Renato and Tarcísio most of the time. They had started at L3S shortly before me and thus accompanied and enriched a large part of my time as a PhD student. Apart from these two offices on the 15th floor of the L3S, there have been the "KBS guys" Besnik, Bill, Christoph and Ujwal, as well as many others, including but not limited to Asmelash, Damianos, Jaspreet, Jurek, Lijun, Mary, Ran and Sergej. More recently, I started collaborations with Sara, Tin and Fakhar. I would also like to thank the colleagues working in the administrative and technical departments, such as Dimitar, Miroslav and others. Outside of the L3S, at conferences, during research visits and via project meetings, I have met many great researchers, including Paolo and several others. Even if we only met for a few days, we will always have some common memories to share.

Finally, I would like to thank my family, in particular my parents, sister and girlfriend, as well as my friends. They have accompanied my path as a PhD student, endured my non-availability when it came close to submission deadlines, and never stopped providing me with valuable pieces of