Dissertation

SIMULATING HUMAN ASSOCIATIONS WITH LINKED DATA
End-to-End Learning of Graph Patterns with an Evolutionary Algorithm

Thesis approved by the Department of Computer Science of the TU Kaiserslautern for the award of the Doctoral Degree Doctor of Natural Sciences (Dr. rer. nat.) to Jörn Hees

Date of the viva: 2018-04-09
Dean: Prof. Dr. Stefan Deßloch
Reviewers: Prof. Dr. Prof. h.c. Andreas Dengel; Prof. Dr. Heiko Paulheim (University of Mannheim)

D 386

Jörn Hees: Simulating Human Associations with Linked Data – End-to-End Learning of Graph Patterns with an Evolutionary Algorithm
Supervisors: Prof. Dr. Prof. h.c. Andreas Dengel; Prof. Dr. Heiko Paulheim (University of Mannheim)
Supplemental material: https://w3id.org/associations or http://purl.org/associations
Contact information: http://joernhees.de

ABSTRACT

In recent years, enormous progress has been made in the field of Artificial Intelligence (AI). Especially the introduction of Deep Learning and end-to-end learning, the availability of large datasets and the necessary computational power in form of specialised hardware allowed researchers to build systems with previously unseen performance in areas such as computer vision, machine translation and machine gaming. In parallel, the Semantic Web and its Linked Data movement have published many interlinked RDF datasets, forming the world’s largest, decentralised and publicly available knowledge base.

Despite these scientific successes, all current systems are still narrow AI systems. Each of them is specialised to a specific task and cannot easily be adapted to all other human intelligence tasks, as would be necessary for Artificial General Intelligence (AGI). Furthermore, most of the currently developed systems are not able to learn by making use of freely available knowledge such as provided by the Semantic Web. Autonomous incorporation of new knowledge is however one of the pre-conditions for human-like problem solving.

This work provides a small step towards teaching machines such human-like reasoning on freely available knowledge from the Semantic Web. We investigate how human associations, one of the building blocks of our thinking, can be simulated with Linked Data. The two main results of these investigations are a ground truth dataset of semantic associations and a machine learning algorithm that is able to identify patterns for them in huge knowledge bases.

The ground truth dataset of semantic associations consists of DBpedia entities that are known to be strongly associated by humans. The dataset is published as RDF and can be used for future research.

The developed machine learning algorithm is an evolutionary algorithm that can learn SPARQL queries from a given SPARQL endpoint based on a given list of exemplary source-target entity pairs. The algorithm operates in an end-to-end learning fashion, extracting features in form of graph patterns without the need for human intervention. The learned patterns form a feature space adapted to the given list of examples and can be used to predict target candidates from the SPARQL endpoint for new source nodes. On our semantic association ground truth dataset, our evolutionary graph pattern learner reaches a Recall@10 of > 63 % and an MRR (& MAP) > 43 %, outperforming all baselines. With an achieved Recall@1 of > 34 % it even reaches average human top response prediction performance. We also demonstrate how the graph pattern learner can be applied to other interesting areas without modification.

ACKNOWLEDGMENTS

This PhD thesis would not have been possible without the support of countless people.

First, I would like to thank Prof. Andreas Dengel for the opportunity to conduct my research. Without his ongoing support, supervision, feedback, the freedom to investigate different approaches, and gentle nudges in the right direction, this thesis would not have been possible.
Further, I would like to thank Prof. Heiko Paulheim for becoming my external supervisor towards the end of this thesis. Despite the short time, his deep insights, invaluable feedback, fruitful discussions and many great ideas vastly improved this thesis. Finally, I would like to thank Prof. Karsten Berns, my early mentor in the PhD program, for his initial guidance and feedback on my research and later for becoming the head of my PhD commission and providing valuable external feedback.

I would also like to thank the DFKI, my colleagues and students, starting with my office mates Ralf, Bahaa, Benjamin, Damian, Joachim and Rouven. Besides being my first real office mate in DFKI and introducing me to fancy eye-tracking research, the many discussions with Ralf led to the first conceptual ideas and research questions for this thesis, such as how to rank triples by association strengths, and how to collect such information with GWAPs. Later, Bahaa gave me valuable insights into the world of semantic editing and Benjamin into ontology based information extraction, leading to me being involved in the NEXUS project and generating many ideas on how to automatically disambiguate named entities in the very short association strings that I am dealing with. Next, Damian briefly shared an office with me, allowed me to shape the MOM and DeFuseNN projects with him, took me onto the SVL adventure with him, and later in the MADM group always had an open ear for me, tons of advice and allowed me to do my research by connecting Linked Data with Multimedia Analysis and Data Mining.
Then Joachim not only let me tap into his vast knowledge about computer graphics, deep learning, machine learning in general and mad coding and optimization skills, but also deserves my gratitude for keeping me happy with never-ending humour, keeping me focused, being one of the hardest, but always constructive critics, keeping the bar high and developing a gazillion ideas with me. Last but not least, Rouven, one of my first interns, then HiWi for many years and part-time office mate, not only helped me to test out many crazy ideas and develop the many systems and interactive visualisations for this work, but also never gave up on overcoming even the weirdest browser, JavaScript and CSS challenges. All of you have become much more than just colleagues to me and I enjoyed every second of creativity with you guys in the room. Your feedback, ideas and support were invaluable to me and made this work what it is.

Next, I would like to thank the many bachelor and master students whom I had the honour to supervise in seminars, projects and theses. You gave me the chance to look left and right, and to widen my scope much further than I could’ve done without your support. Exceptional thanks here go to Tim for investigating how similarities between Wikipedia topics can be used to predict access statistics and Khamis for developing the Wikipedia Knowledge Test game with me.

Further, I would like to thank the many other members of the former MADM, KM and current SDS research groups, starting with the Semantic Web and Linked Data people. The works of Ludger, Michael, Ansgar, Heiko, Benjamin, Manuel, Björn, Sven, Malte, Gunnar and Leo originally inspired me to join the DFKI in Kaiserslautern. During my time, this area was strengthened by Tristan, Mike and later Markus, Sven and Christian.
Thank you all for always taking the time for the many fruitful discussions that not only challenged me and deepened my knowledge, but also helped me to develop many of the ideas behind this thesis. Special thanks to Benjamin, Malte, Leo and Gunnar for igniting the Linked Data flame in me, and to Gunnar for letting me glimpse into his huge machine learning and SemWeb toolbox, encouraging my use of bash pipelines and Unix tools to juggle massive amounts of data, and last but not least for pulling me into the RDFLib project.

Next, I’d like to thank the people from and around the NEXUS project, so Benjamin, Martin, Heinz, Stephan, Darko and Reuschi. You allowed me to generate and test my many ideas about ranking Linked Data facts in a very creative and fruitful environment. Special thanks to Martin and Heinz for embedding my ideas into the ALOE system and the never ending supply of “Heit schunn Gelacht?” and cat jump fail videos, and many thanks to Reuschi for later allowing me to reuse the Wikipedia indices for the article similarity baselines.

I would also like to thank the former and current MADM group, with Tom, Jane, Armin, Adrian, Joost, Markus, Matthias, Kofi, Christian, Damian, Marco and later Sebastian, Federico, Benjamin, Patrick, Tushar and Philipp. You all made me feel at home, always had time for discussions and never were shy to give feedback and comments leading to an endless stream of ideas. Special thanks to Armin and TRB who initially took me on as CBR HiWi, then later to TRB for being my first mentor within the DFKI, supervising my master thesis and thereby paving the way for my PhD topic.
Many thanks also to Adrian for his long-term mentoring, showing me how to throw papers over the door saddle, being the sickest hacker (only challenged by Gunnar, Mike and Joachim), gently nudging me in the right direction, tons of very fruitful discussions, getting me unstuck with ideas and comments that were obvious to him, and all the feedback and proofreading towards the end of this thesis. Tons of thanks also to Markus and later Christian and Damian for maintaining and gradually extending the compute infrastructure for everyone and all their feedback, especially on the fusion part of this thesis. Last but not least, special thanks to all the people involved in the MOM and DeFuseNN projects for the many discussions, ideas and in general the nice atmosphere and fruitful environment.