Extracting Semantic Concept Relations from Wikipedia

Patrick Arnold
Inst. of Computer Science, Leipzig University
Augustusplatz 10
04109 Leipzig, Germany

Erhard Rahm
Inst. of Computer Science, Leipzig University
Augustusplatz 10
04109 Leipzig, Germany

[email protected] [email protected]

ABSTRACT
Background knowledge as provided by repositories such as WordNet is of critical importance for linking or mapping ontologies and related tasks. Since current repositories are quite limited in their scope and currentness, we investigate how to automatically build up improved repositories by extracting semantic relations (e.g., is-a and part-of relations) from Wikipedia articles. Our approach uses a comprehensive set of semantic patterns, finite state machines and NLP-techniques to process Wikipedia definitions and to identify semantic relations between concepts. Our approach is able to extract multiple relations from a single Wikipedia article. An evaluation for different domains shows the high quality and effectiveness of the proposed approach.

high complexity of language. For example, the concept pair (car, automobile) is semantically matching but has no lexical similarity, while there is the opposite situation for the pair (table, stable). Hence, background knowledge sources such as synonym tables and dictionaries are frequently used and vital for ontology matching.

The dependency on background knowledge is even higher for semantic ontology matching where the goal is to identify not only pairs of equivalent ontology concepts, but all related concepts together with their semantic relation type, such as is-a or part-of. Determining semantic relations obviously results in more expressive mappings that are an important prerequisite for advanced mapping tasks such as ontology