Kernel Methods for Knowledge Structures
Total Page:16
File Type:pdf, Size:1020Kb
Kernel Methods for Knowledge Structures Zur Erlangung des akademischen Grades eines Doktors der Wirtschaftswissenschaften (Dr. rer. pol.) von der Fakultät für Wirtschaftswissenschaften der Universität Karlsruhe (TH) genehmigte DISSERTATION von Dipl.-Inform.-Wirt. Stephan Bloehdorn Tag der mündlichen Prüfung: 10. Juli 2008 Referent: Prof. Dr. Rudi Studer Koreferent: Prof. Dr. Dr. Lars Schmidt-Thieme 2008 Karlsruhe Abstract Kernel methods constitute a new and popular field of research in the area of machine learning. Kernel-based machine learning algorithms abandon the explicit represen- tation of data items in the vector space in which the sought-after patterns are to be detected. Instead, they implicitly mimic the geometry of the feature space by means of the kernel function, a similarity function which maintains a geometric interpreta- tion as the inner product of two vectors. Knowledge structures and ontologies allow to formally model domain knowledge which can constitute valuable complementary information for pattern discovery. For kernel-based machine learning algorithms, a good way to make such prior knowledge about the problem domain available to a machine learning technique is to incorporate it into the kernel function. This thesis studies the design of such kernel functions. First, this thesis provides a theoretical analysis of popular similarity functions for entities in taxonomic knowledge structures in terms of their suitability as kernel func- tions. It shows that, in a general setting, many taxonomic similarity functions can not be guaranteed to yield valid kernel functions and discusses the alternatives. Secondly, the thesis addresses the design of expressive kernel functions for text mining applications. A first group of kernel functions, Semantic Smoothing Kernels (SSKs) retain the Vector Space Models (VSMs) representation of textual data as vec- tors of term weights but employ linguistic background knowledge resources to bias the original inner product in such a way that cross-term similarities adequately con- tribute to the kernel result. A second group of kernel functions, Semantic Syntactic Tree Kernels (SSTKs) replace the VSM representation by a representation based on parse trees. In contrast to other kernel functions based on this representation, they are again capable of a more adequate handling of semantic similarities between dif- ferent terms. Both types of kernel functions are then extensively evaluated in practi- cal experiments. Thirdly, the thesis provides a framework for kernel functions on instance data in ontologies. These data instances are formally described by means of an ontologi- cal structure. The framework allows to flexibly model kernel functions that probe instances for selected ontological properties. Again, the thesis also shows the appli- cation of the proposed kernel functions in practical experiments. i ii Acknowledgements Foremost, I would like to thank Prof. Dr. Rudi Studer, my advisor, for supporting me and my research during the last years. I have significantly benefited from his long experience and from his way of leading his team which has resulted in an outstanding research environment. I am also extremely grateful for his commitment to timely review this thesis despite his busy schedule. I would also like to thank Prof. Dr. Dr. Lars Schmidt-Thieme from University of Hildesheim who, without hesitation, accepted the request to serve on the exami- nation committee as second reviewer and provided detailed and valuable technical feedback. Many thanks to Prof. Dr. Detlef Seese and Prof. Dr. Oliver Stein who served on the examination committee as examiner and chairman, respectively. I am grateful to my friends and colleagues at LS3/WIM for the excellent and lively working atmosphere within the team. Special thanks to York Sure for mentoring my project and thesis work. Many thanks also to Philipp Cimiano and Peter Haase for reviewing parts of this thesis as well as to Max Völkel and Denny Vrandeˇci´cfor the fun discussions on dissertation progress during the Sushi lunchtime meetings designed for this purpose. Similarly, I am grateful to all external collaborators and project partners. In this context, special thanks to Prof. Dr. Alessandro Moschitti from University of Trento. Various parts of this thesis have grown out of collaboration with him and I have often benefited from his experience and opinion. Last but not least, I would like to thank my family and friends. My parents Brigitte and Jörg as well as my brother Johannes have at all times supported my way in every respect. Mostly, however, I am grateful to Melanie for her constant encouragement and, well, for liking me. Karlsruhe, August 2008 Stephan Bloehdorn iii iv Contents Abstract i Acknowledgements iii Notational Conventions xi 1 Introduction 1 1.1 Background .................................. 1 1.2 Research Questions and Contributions ................... 3 1.3 Outline ..................................... 5 1.4 Bibliographic Notes .............................. 7 I Preliminaries 9 2 Machine Learning and Kernel Methods 11 2.1 Mathematical Foundations .......................... 12 2.1.1 Vector Spaces, Inner Products and Hilbert Spaces ........ 12 2.1.2 Fundamental Concepts from Matrix Algebra ........... 17 2.2 Machine Learning as Pattern Analysis ................... 19 2.2.1 Classes and Properties of Machine Learning Problems ..... 20 2.2.2 Supervised Learning and Statistical Learning Theory ...... 22 2.2.3 Unsupervised Learning ....................... 26 2.3 Kernel-Based Learning Algorithms ..................... 27 2.3.1 Introducing Kernel-Based Learning: The Kernel Perceptron .. 27 2.3.2 Support Vector Machines ...................... 33 2.3.3 Kernel-Based Nearest Neighbour .................. 39 2.3.4 Kernel K-Means ............................ 40 2.4 Characterization of Kernels ......................... 41 2.4.1 Positive Semi-Definite Kernels ................... 42 2.4.2 Kernel Construction ......................... 45 2.4.3 Kernel Design ............................. 48 2.5 Performance Assessment ........................... 48 2.5.1 Binary Classification ......................... 49 v Contents 2.5.2 Multiclass Classification ....................... 51 2.5.3 Measuring Significance: the Paired T-Test ............. 54 2.6 Summary .................................... 55 3 Knowledge Structures 57 3.1 Background .................................. 58 3.1.1 Motivation ............................... 58 3.1.2 Ontologies and Knowledge Structures ............... 59 3.1.3 Semantic Web Standards ....................... 60 3.2 Formalizing Knowledge Structures ..................... 61 3.2.1 Elementary Knowledge Structures ................. 61 3.2.2 Ontologies and Formal Semantics ................. 63 3.2.3 Advanced Ontology Constructs .................. 66 3.3 Summary .................................... 68 II Kernels for Entities in Taxonomic Structures 69 4 Kernel Functions for Entities in Taxonomic Structures 71 4.1 Defining Similarity in Taxonomic Structures ................ 72 4.1.1 Similarity Functions ......................... 72 4.1.2 Structural Similarity in Taxonomic Structures ........... 74 4.1.3 Basic Notions on Taxonomic Structures .............. 75 4.2 Analysis of Set-Based Similarity Functions ................ 76 4.3 Analysis of Taxonomic Similarity Functions ................ 79 4.3.1 Path-Based Taxonomic Similarity .................. 79 4.3.2 Taxonomic Similarity by Wu&Palmer ............... 81 4.3.3 Information Content Based Similarity by Resnik ......... 82 4.3.4 Information Content Based Similarity by Lin ........... 84 4.3.5 Taxonomic Overlap and Related Similarity Functions ...... 85 4.4 Related Work ................................. 86 4.5 Summary and Discussion .......................... 87 III Kernels for Semantic Smoothing in Text Mining 89 5 Designing Kernel Functions for Semantic Smoothing 91 5.1 Text Mining and the Vector Space Model .................. 92 5.1.1 Text Mining Tasks .......................... 92 5.1.2 Fundamentals of the Vector Space Model ............. 93 5.1.3 Deficiencies of the Vector Space Model — Semantic Aspects .. 96 vi Contents 5.1.4 Deficiencies of the Vector Space Model — Syntactic Aspects .. 99 5.2 Semantic Smoothing Kernels ........................ 100 5.2.1 A General Model for Semantic Smoothing Kernels ........ 101 5.2.2 Interpretation of Semantic Smoothing Kernels .......... 103 5.2.3 Complexity and Optimization ................... 106 5.2.4 Extensions ............................... 108 5.3 Semantic Syntactic Tree Kernels ....................... 109 5.3.1 Context: Tree Kernels for Parse Trees ................ 110 5.3.2 Feature Space Design for Semantic Syntactic Tree Kernels ... 116 5.3.3 Efficient Computation of Semantic Syntactic Tree Kernels .... 120 5.3.4 Complexity .............................. 121 5.3.5 Extensions ............................... 121 5.4 Choice of Smoothing Parameters ...................... 123 5.4.1 Investigating Positive Semi-Definiteness Case-by-Case ..... 123 5.4.2 Fixing Indefinite Kernels ....................... 123 5.4.3 Matrix Squaring for Arbitrary Measures .............. 124 5.4.4 Using Similarity Functions based on Taxonomic Overlap .... 124 5.5 Related Work ................................. 124 5.5.1 Semantic Smoothing and Query Expansion ............ 125 5.5.2 Syntactic Representations ...................... 127 5.5.3 Other Kernel Paradigms for Text Data ............... 128 5.6 Summary and Discussion .......................... 128 6 Experiments