SCHEMA MATCHING AND MAPPING-BASED DATA INTEGRATION

Ph.D. Thesis by Do Hong Hai, August 2005

Interdisciplinary Center for Bioinformatics and Department of Computer Science, University of Leipzig, Germany

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGEMENTS

ABSTRACT

PART I. INTRODUCTION
CHAPTER 1. SCHEMA MATCHING
  1.1 Motivation
  1.2 Applications of Schema Matching
  1.3 Semantic Heterogeneity
CHAPTER 2. PROBLEM DEFINITION
  2.1 Schemas
  2.2 Input Information
  2.3 Output Information
  2.4 Architecture for Generic Schema Matching
CHAPTER 3. OPEN ISSUES AND CONTRIBUTIONS
  3.1 State of the Art and Open Issues
  3.2 Contributions
  3.3 Outline

PART II. SCHEMA MATCHING APPROACHES
CHAPTER 4. APPROACH CLASSIFICATION
  4.1 Classification Criteria
  4.2 Schema-based Matching
  4.3 Instance-based Matching
  4.4 Reuse-oriented Matching
  4.5 Combination Approaches
  4.6 Match Cardinality
  4.7 Summary
CHAPTER 5. THE COMA AND COMA++ APPROACHES
  5.1 System Architecture
  5.2 Match Processing
  5.3 Schema Import and Representation
  5.4 Matcher Library
  5.5 Mapping Representation and Manipulation
  5.6 Implementation and Use
  5.7 Summary
CHAPTER 6. REUSE OF PREVIOUS MATCH RESULTS
  6.1 Reuse Strategies
  6.2 Schema Similarity
  6.3 The MatchCompose Operation
  6.4 The Reuse Matcher
  6.5 Summary
CHAPTER 7. MATCHER CONSTRUCTION
  7.1 CombineMatcher
  7.2 Determination of Elements/Constituents
  7.3 Combination of Similarity Values
  7.4 Configuration of Combined Matchers
  7.5 Summary
CHAPTER 8. MATCH STRATEGIES
  8.1 Refinement of Match Results
  8.2 Context-dependent Match Strategies
  8.3 Fragment-based Match Strategies
  8.4 Summary
CHAPTER 9. OTHER PROTOTYPES AND COMPARISON
  9.1 Schema-based Prototypes
  9.2 Instance-based Prototypes
  9.3 Ontology Matching Prototypes
  9.4 Prototype Comparison
  9.5 Summary

PART III. SCHEMA MATCHING EVALUATION
CHAPTER 10. EVALUATION CRITERIA
  10.1 Input: Test Problems and Auxiliary Information
  10.2 Output: Match Result
  10.3 Quality Measures

  10.4 Methodology: What Effort is Measured and How
  10.5 Runtime Performance
  10.6 Summary
CHAPTER 11. COMA++ EVALUATION: SCHEMA MATCHING
  11.1 Test Schemas and Series
  11.2 Experiment Design
  11.3 Combination Strategies
  11.4 Match Strategies and Matchers
  11.5 Reuse Strategies
  11.6 Summary
CHAPTER 12. COMA++ EVALUATION: ONTOLOGY MATCHING
  12.1 Test Ontologies and Series
  12.2 Experiment Design
  12.3 Quality and Execution Time
  12.4 Summary
CHAPTER 13. OTHER EVALUATIONS AND COMPARISON
  13.1 Individual Evaluations
  13.2 Comparative Evaluations
  13.3 Evaluation Comparison
  13.4 Contest-based Comparison
  13.5 Summary

PART IV. MAPPING-BASED DATA INTEGRATION
CHAPTER 14. DATA INTEGRATION IN BIOINFORMATICS
  14.1 Molecular-biological Data
  14.2 Integration Requirements
  14.3 State of the Art
CHAPTER 15. THE GENMAPPER APPROACH
  15.1 Overview
  15.2 The Generic Annotation Model
  15.3 Data Import
  15.4 View Generation
  15.5 Implementation and Use
  15.6 Summary
CHAPTER 16. THE HYBRID INTEGRATION APPROACH
  16.1 Overview
  16.2 Mapping Management
  16.3 Query Processing
  16.4 Summary

PART V. CLOSING
CHAPTER 17. CONCLUSIONS

  17.1 Contributions
  17.2 Future Directions
CHAPTER 18. APPENDIX
  18.1 New Features in Coma++
  18.2 GUI Design in Coma++
  18.3 Mapping Operators in Coma++

BIBLIOGRAPHY

LIST OF FIGURES

Figure 1.1 Schema matching for data integration
Figure 2.1 Sample schemas for purchase order
Figure 2.2 High-level architecture for generic match implementation
Figure 4.1 Classification of match approaches
Figure 5.1 Coma++ system architecture
Figure 5.2 External and internal schema representation
Figure 5.3 Match processing in Coma++
Figure 5.4 Match processing example
Figure 5.5 Unification of alternative designs
Figure 5.6 Distribution of Coma++ source code (in lines of code and percents)
Figure 5.7 Configuration of matchers
Figure 5.8 Schemas loaded from the Repository for matching
Figure 5.9 Match results in the Mapping Pool
Figure 6.1 Graph of schemas and mappings for reuse
Figure 6.2 The MatchCompose operation
Figure 6.3 MatchCompose with undesirable m:n matches
Figure 6.4 Schema-level reuse in the Reuse matcher
Figure 7.1 Pseudo-code of CombineMatcher
Figure 7.2 Combination of match results
Figure 7.3 Computing combined similarity in the Name matcher
Figure 8.1 CombineMatcher extended with the Refine method
Figure 8.2 A simple match strategy
Figure 8.3 Combining node and path matching in FilteredContext
Figure 8.4 Fragment-based match approach
Figure 8.5 Implementation of the fragment-based match algorithm
Figure 10.1 Schema examples for a simple match task
Figure 10.2 Comparing real and automatically derived correspondences
Figure 10.3 Fmeasure and Overall as functions of Precision and Recall
Figure 11.1 Shared elements and match cardinalities in the Large series
Figure 11.2 Best average quality and experiment distribution w.r.t. Fmeasure
Figure 11.3 Experiment distribution and quality for aggregation
Figure 11.4 Experiment distribution and quality for direction
Figure 11.5 Experiment distribution and quality for selection
Figure 11.6 Experiment distribution and quality for combined similarity
Figure 11.7 Quality and execution times of context-based match strategies
Figure 11.8 Quality and execution times of fragment-based match strategies
Figure 11.9 Quality variation of AllContext and FilteredContext
Figure 11.10 Matcher occurrence in 10 best matcher combinations
Figure 11.11 Quality variation of Best, Default, All, and NamePath
Figure 11.12 Graph of previous mappings and examples for mapping paths
Figure 11.13 Quality for composition strategies
Figure 11.14 Quality for mapping path lengths
Figure 11.15 Quality without and with a pivot schema
Figure 11.16 Quality for top-K mapping paths
Figure 11.17 Comparison of no-reuse and reuse matchers
Figure 11.18 Execution times of the Reuse matcher
Figure 12.1 Best task quality
Figure 12.2 Quality variation of matcher combinations
Figure 12.3 Quality of Best, Default, All, Name without/with Comment added
Figure 12.4 Quality and execution time for test series
Figure 13.1 Match quality of Lsd [34]
Figure 13.2 Match quality of SimilarityFlooding [93]
Figure 13.3 Corpus-based (Augment) vs. Glue (Direct) and Mkb (Pivot) [92]
Figure 13.4 Quality of single prototypes for contest tasks
Figure 13.5 Average quality of single prototypes for test series
Figure 14.1 Sample annotations for gene locus 353 from LocusLink
Figure 14.2 Iterative analysis scenarios
Figure 15.1 Genmapper architecture for annotation integration
Figure 15.2 An annotation view for LocusLink genes
Figure 15.3 The Generic Annotation Model
Figure 15.4 Data parsing and import
Figure 15.5 The algorithm for GenerateView
Figure 15.6 Annotation query for Locuslink genes
Figure 16.1 The hybrid integration approach and its components
Figure 16.2 Schemas of the Mapping and ADM
Figure 16.3 Query formulation on the automatically generated web interface
Figure 16.4 Steps for creating the query plan
Figure 18.1 Repository functionalities
Figure 18.2 Workspace functionalities
Figure 18.3 Match processing
Figure 18.4 Configuration of matchers and match strategies
Figure 18.5 Schema preprocessing
Figure 18.6 Manual reuse

Figure 18.7 Step-by-step fragment-based matching
Figure 18.8 Manual quality evaluation

LIST OF TABLES

Table 4.1 Local match cardinalities
Table 5.1 Implemented matchers in the Matcher Library
Table 5.2 Operations on mappings
Table 7.1 Examples for types of constituents/related elements
Table 7.2 Construction of combined matchers
Table 9.1 Characteristics of representative schema matching prototypes (1)
Table 9.2 Characteristics of representative schema matching prototypes (2)
Table 11.1 Characteristics of the test schemas
Table 11.2 Statistics of the test series
Table 11.3 Test configuration for combination strategies
Table 11.4 Occurrence of combination strategies in the best experiments
Table 11.5 Test configuration for match strategies
Table 11.6 Test configuration for reuse
Table 12.1 Characteristics of the test ontologies and match tasks
Table 12.2 Statistics of the test series
Table 12.3 Test configuration for ontology matching
Table 13.1 Summary of selected evaluations
Table 14.1 Data integration approaches and systems in bioinformatics
Table 15.1 Definitions and examples for simple mapping operations
Table 18.1 New features of Coma++ compared to Coma

To my parents and my daughter

ACKNOWLEDGEMENTS

First of all, I want to express my deepest gratitude to my advisor, Prof. Dr. Erhard Rahm, for his continuous and caring help, advice, and encouragement during the six years of my Ph.D. studies at the University of Leipzig. I thank him for being very patient with my questions and my progress, for the countless lessons on writing and presenting technical materials and on doing research in general, and also for the English lessons, even though he is not a language teacher. I wish to thank my second and third reviewers, Dr. Philip A. Bernstein at Microsoft Research, Redmond, USA, and Prof. Dr. AnHai Doan at the University of Illinois at Urbana-Champaign, USA, for their willingness to review my thesis and for their time and effort in doing so. Dr. Bernstein also provided very helpful comments on several of my papers. I would not have been able to finish my thesis without financial support from different scholarships and grants, especially grant BIZ 6/1-1 of the DFG (Deutsche Forschungsgemeinschaft - German Research Foundation). Hence, I want to thank Prof. Dr. Erhard Rahm as well as Dr. Hans Binder and Prof. Dr. Markus Löffler at the Interdisciplinary Center for Bioinformatics, University of Leipzig, for their effort to organize and ensure my continuous funding so that I could fully focus on my work. I am very grateful to all of my colleagues and friends in our group at the Department of Computer Science and at the Interdisciplinary Center for Bioinformatics, University of Leipzig, for many helpful discussions, the pleasant working environment, and the many beautiful memories of our annual “workshop-vacations” in Zingst. I also want to thank Thomas Stöhr for his support in several industry projects we did together in the first two years of my studies. I owe very special thanks to my long-term office-mate Toralf Kirsten for always being there to comment on and support my work. I thank Sabine Massmann for her valuable contributions to the COMA++ system, especially in developing the graphical user interface characterized in Section 18.2. Special thanks go to the members of the Max Planck Institute for Evolutionary Anthropology, Leipzig, especially Dr. Philipp Khaitovich and Björn Mützel, for helpful “biological” comments on my implementation of the GENMAPPER system presented in Chapter 15. Furthermore, I thank Dr. Thure Etzold and Ceara Rea at Lion BioScience, Cambridge, UK, for providing us with the SRS tool and supporting our development of the hybrid data integration approach presented in Chapter 16. Last but not least, I thank Christine Körner for the implementation of the approach as part of her diploma thesis.

Leipzig, August 2005
Do Hong Hai

ABSTRACT

Schema matching aims at identifying semantic correspondences between elements of two schemas, e.g., database schemas, ontologies, and XML message formats. It is needed in many database applications, such as integration of web data sources, data warehouse loading, and XML message mapping. In today's systems, schema matching is performed manually, a time-consuming, tedious, and error-prone process, which becomes increasingly impractical with a growing number of schemas and data sources to be dealt with. To reduce the amount of manual effort as much as possible, approaches to semi-automatically determine element correspondences are required. We start by surveying the existing approaches and prototypes for schema matching and explain their common features and applicability using a previously proposed classification. We further identify the major criteria that influence the effectiveness of a match approach. We use these criteria to compare the evaluations of various recent prototypes and discuss the issues that need to be addressed in future evaluations. Besides helping us to develop and test our own system, the surveys of match approaches and of evaluations aim at guiding future implementations, so that they can be documented better, their results be more reproducible, and a comparison between different systems and approaches be easier. Based on the insights about the state of the art, we have developed COMA (Combining Matchers) and further extended it to COMA++, both representing generic and customizable systems for semi-automatic schema matching. In particular, COMA++ offers a platform for the flexible combination of different match algorithms. It provides a large spectrum of individual matchers, including a novel approach reusing results from previous match operations, and various mechanisms to combine and refine matcher results. Based on this flexible infrastructure, match processing is supported as a workflow, allowing a match task to be divided and solved successively in multiple stages. In particular, we implement specific workflows (i.e., strategies) for context-dependent matching of schemas with shared elements and fragment-based matching of very large schemas. With the flexibility to customize matchers and match strategies, COMA++ also represents a platform for comparative evaluation of match approaches. In fact, we performed comprehensive evaluations using real-world schemas found on the web and ontologies from a published contest. In particular, the E-business message standards involved in our evaluations are among the largest and most complex test schemas compared to those used in previous evaluations. COMA++ has shown high quality and fast execution times for both the schemas and the ontologies, proving the practicability of our generic solution for different domains. Especially, the quality of COMA++ in the ontology alignment contest is comparable to that of the best performing participants. From the systematic evaluation, we obtain important insights into the performance of different match strategies and the impact of many factors, such as schema size, the choice of matchers and combination strategies, and the reuse of previous match results. We believe that our insights can be of valuable help for the development and evaluation of further match algorithms. Building on the same idea of reusing previous match results, we have developed GENMAPPER (Generic Mapper), a new approach for integrating heterogeneous web data sources.
It utilizes existing mappings between sources, i.e., correspondences between their objects at the instance level. We focus on the bioinformatics domain with hundreds of publicly accessible, highly cross-referenced web data sources managing annotations and correspondences for various types of molecular-biological objects, such as genes and proteins. GENMAPPER explicitly captures existing relationships between objects to drive data integration and combine annotation knowledge from different sources. A generic schema is used to uniformly represent object data and correspondences, making it easy to integrate new data sources and to update existing ones. To serve specific analysis needs, powerful operators are provided to derive tailored views from the generic data representation. GENMAPPER has been successfully used for large-scale functional profiling of genes and proteins.


PART I INTRODUCTION

The main theme of the dissertation is the study of schema matching, the task of identifying semantic correspondences between structures, such as database schemas, XML message formats, and ontologies. Solving such match problems is of key importance to service interoperability and data integration in numerous application domains, such as data warehousing, mediating over web sources, and E-business. This introductory part consists of three chapters. In Chapter 1, we motivate the need for schema matching and discuss why it is a difficult task. In Chapter 2, we define the schema matching problem by elaborating on the notion of schemas and the input and output of a match operation, and discuss the architecture for a generic match system. Finally, we discuss in Chapter 3 the open issues in the state of the art of schema matching and give an overview of the main contributions of the dissertation.


CHAPTER 1 SCHEMA MATCHING

We start this chapter by introducing schema matching and illustrating it with a simple example. Section 1.2 then motivates the importance and pervasiveness of schema matching by showing that it is a fundamental step in several database applications. Section 1.3 discusses how the semantic heterogeneity of schemas makes schema matching a very hard problem, not only for manual, but also for automatic solution approaches.

1.1 Motivation
A schema is a structure of metadata describing how data, i.e., instances, can be stored, accessed, and interpreted by users and applications. Besides technical aspects related to the management of data, e.g., field formats and data types, schemas also address to some extent semantic aspects concerning the contents and meanings of data, such as allowable values, cardinality, and integrity and referential constraints. So far, many schema languages have been developed for different application domains. Examples include the Structured Query Language (SQL) for relational schemas, the Document Type Definition (DTD) and XML Schema Definition (XSD) for XML document schemas, and the Web Ontology Language (OWL) for ontologies. Although exhibiting varying capabilities and expressiveness, they all contribute to the pervasive use of schemas in current data management and processing applications. Schemas support declarative access to and manipulation of data. Thus they represent the prime interface for establishing interoperability between tools that depend on shared data. In fact, many applications, such as data warehousing, mediating between websites, data mining, and peer data management, integrate data from multiple sources to support comprehensive query and analysis capabilities. This process, generally termed data integration, aims at providing a uniform and consistent view, the so-called global schema, over a set of autonomous and heterogeneous data sources, so that data residing in different sources can be accessed as if it were in a single one. In practice, data integration is often done incrementally by starting with a simple global schema and adding new data sources when needed. The integration of a new data source into an existing global schema can be performed in two steps, a matching step and a data transformation step. In the first step, the source schema is compared against the global schema to identify their similar and distinct elements. While the distinct elements and their instances can be taken over from the data source, the correspondences between the similar elements are needed in the second step to generate queries for transforming their instances from the source schema into the global schema.

Figure 1.1 Schema matching for data integration.
A) Schema matching: the source schema S (table Client: Id, First, Last, Home, Phone) is matched against the global schema GS (table Customer: CID, Name, Address), and GS is extended with the unmatched element Phone.
B) Data transformation:
    INSERT INTO GS(CID, Name, Address, Phone)
    SELECT Id, Concat(First, Last), Home, Phone FROM S

Figure 1.1 illustrates the two steps of data integration using a simple scenario, in which we want to integrate a new data source, S, into an existing global schema, GS. Both S and GS consist of one single table to store customer data, Client and Customer, respectively. As shown in Figure 1.1a (left), the comparison of S and GS unveils a number of correspondences between elements of the two schemas, such as Client↔Customer, Id↔CID, Home↔Address, First↔Name, and Last↔Name. In particular, both the First and Last columns in S map to the Name column in GS. An explanation for this 2:1 match relationship is that the former contain the first and last name of a customer and together constitute the full name represented by the latter. Furthermore, Phone in S is a unique element as it does not have a match counterpart in GS. While retaining the current elements, we may extend the global schema to cover new elements from the source schema, e.g., Phone, as shown in Figure 1.1a (right). Based on the identified correspondences, an SQL query can be generated to transform Client instances in S into Customer instances in GS, as shown in Figure 1.1b. In particular, the values of the Id, Home, and Phone columns in S directly populate the CID, Address, and Phone columns, respectively, in GS, while the values of First and Last in S are concatenated to populate the Name column in the global schema. Identifying semantic correspondences between two schemas has been commonly referred to as schema matching [119, 36, 70]. In the example above, it is the key task to enable schema integration and data transformation to obtain an integrated database. In general, we observe a wide range of applications depending on semantic correspondences between schemas in order to ensure interoperability and support data exchange, such as database design, data warehousing, data mining, mediating between websites, message transformation in E-business, web site creation and management, application evolution, component-based development, etc. The pervasive nature of schema matching is further underlined by the large body of research work done in the corresponding domains, which have coined further terms to denote the schema matching task, such as ontology matching [35], ontology alignment [107], mapping discovery [108], and attribute matching [84].
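To make the connection between the two steps more concrete, the following sketch (in Python, with purely illustrative names such as Correspondence and build_transform_query that do not appear in the thesis) shows how the correspondences of Figure 1.1 could be represented and how a transformation query could be derived from them.

    # A minimal, hypothetical sketch: correspondences identified in the matching
    # step drive the generation of the transformation query of Figure 1.1b.
    from dataclasses import dataclass

    @dataclass
    class Correspondence:
        source: list[str]    # element(s) of the source schema S
        target: list[str]    # element(s) of the global schema GS
        expression: str      # how the source values populate the target

    mapping = [
        Correspondence(["Id"],            ["CID"],     "Id"),
        Correspondence(["First", "Last"], ["Name"],    "Concat(First, Last)"),  # 2:1 match
        Correspondence(["Home"],          ["Address"], "Home"),
        Correspondence(["Phone"],         ["Phone"],   "Phone"),
    ]

    def build_transform_query(mapping, source="S", target="GS"):
        target_cols = ", ".join(t for c in mapping for t in c.target)
        select_exprs = ", ".join(c.expression for c in mapping)
        return f"INSERT INTO {target}({target_cols}) SELECT {select_exprs} FROM {source}"

    print(build_transform_query(mapping))
    # INSERT INTO GS(CID, Name, Address, Phone) SELECT Id, Concat(First, Last), Home, Phone FROM S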

In current implementations, however, schema matching is still largely performed manually by domain experts, at best supported by some graphical point-and-click interface. Obviously, manually specifying schema correspondences is a time-consuming, tedious, and error-prone process. In web-based applications and services, such a manual approach is a major limitation due to the rapidly increasing number of data sources, XML message and document schemas, and web service interfaces to be dealt with. Moreover, as systems become able to handle more complex databases and applications, their schemas become larger, further increasing the search space to be examined as well as the number of correspondences to be identified. Hence, approaches for automating the schema matching task as much as possible are badly needed to speed up the development and to simplify the maintenance and use of such applications.

1.2 Applications of Schema Matching
To motivate the importance of schema matching, we now look at some database applications and illustrate why semantic correspondences are required.

Schema and Data Integration
Schema integration represents one of the most important motivations for schema matching work. It has been the focus of database research since the early 80s [4, 79, 122, 128], first in the context of database design and later increasingly in data integration applications. The main objective is the construction of a unified schema from a set of independently developed schemas, the local schemas. Such a schema, also called the global schema, represents a semantically consistent view on a particular domain. In the context of Artificial Intelligence or the Semantic Web, schema integration corresponds to the problem of merging independently developed ontologies into a single one to construct an integrated knowledge base [107, 70]. Since the schemas and ontologies are independently developed, they typically exhibit different structure and terminology. The integration process requires interschema relationships, so that similar elements can be unified and dissimilar ones be merged under a coherent, integrated schema or view. The approaches developed so far mostly focus on resolving conflicts between two schemas, such as naming and structural conflicts, detected from given interschema relationships [4]. However, such correspondences, also known as attribute equivalences [79], object relations [122], or correspondence assertions [128], are typically assumed to be manually provided by users. Therefore, schema matching can support schema integration with the identification and characterization of such interschema relationships. In addition to the schema integration task, instance data needs to be transformed from the local schemas to the global one. This is done either in advance to materialize a so-called data warehouse [22] for all analysis purposes or on-the-fly using a so-called mediator [136] for each query. In both cases, transforming data between two sources requires correspondences between their schemas for specifying transformation rules. As for schema integration, the correspondences required for data transformation can also be identified with the help of schema matching.

E-Business
With the Internet as a pervasive messaging medium, trading partners have more and more started to handle their business online. This comprises a variety of transactions, such as exchanging product information, placing purchase orders, and confirming and paying orders, which are carried out by exchanging electronic documents, or messages, between the business partners. Typically, each partner comes with proprietary message formats developed for its own use. Between the partners, message formats may differ in their syntax, such as EDI (Electronic Data Interchange) structures, XML, or custom data structures. They may also use different standard message schemas, such as Xcbl and OpenTrans (1), which are now available due to diverse standardization efforts. To enable systems to exchange messages with each other, application developers need to transform messages from one format to another. This has led to a new motivation for schema matching, namely to support message translation. An important task in message translation is to establish translation rules at the message schema level. As in schema integration, there may be naming and structural conflicts, as message schemas often use different names, different data types, different ranges of values, and different groupings of fields. However, we typically have to deal here with a much higher number of schemas than in schema integration. Today, application developers have to handcraft how message formats are related in translation programs or scripts. This manual approach is a major bottleneck and unsuitable for dynamic environments like electronic marketplaces. Semi-automatic schema matching would reduce the amount of manual work by generating a draft mapping between two message schemas with the most plausible correspondences, which an application designer can subsequently validate and modify as needed.

Semantic Web
The fast growth of the Web makes it increasingly difficult to locate, organize, and integrate information of interest. Due to the enormous quantity of data, human users have no other choice but to depend on the support of computers to perform these tasks, i.e., to automate them as much as possible. However, most of the Web's content today is designed for humans to read, not for machines to interpret and manipulate automatically. The main idea of the Semantic Web is to enrich the current contents of Web pages with semantic descriptions, which can be parsed and understood by computer programs, like mediators and search agents, for automated reasoning [10, 65]. The enrichment is done by annotating the Web contents with ontologies, which, in Web terms, are documents formally defining the meaning of and relations between concepts. That is, data on the Web is “tagged” with concepts of an ontology to indicate its meaning and relationships to other data. However, different websites are unlikely to be annotated using the same ontology. Hence, querying and integrating data from multiple sources on the Semantic Web requires establishing semantic correspondences between concepts of their ontologies, so that the data can be unified and consistently integrated. This is essentially a match problem, like finding corresponding elements between different databases in order to integrate their contents. Given the decentralized nature of the Semantic Web, there is an increasing number of independently developed ontologies, making the task of manually identifying and encoding such correspondences in mediators or agents infeasible. Hence, semi-automatic support in ontology matching is crucial to the success of the Semantic Web.

1. OpenTrans: www.opentrans.org, Xcbl: www.xcbl.org

Model Management
The development of data integration tools and other metadata-intensive applications is very expensive. Supporting database systems or repositories typically require a graph-based or object-oriented representation of metadata with navigational access operations in a one-object-at-a-time manner. The recently proposed approach of model management represents a promising alternative to reduce the development effort for such complex metadata applications [12]. In particular, model management aims at a uniform management of models, such as database schemas, ontologies, web service specifications, and website maps, and of mappings between them, such as SQL views and XQuery transformations. Moreover, it specifies a set of high-level operators, such as Match (determination of correspondences between models), Merge (merging of models based on their correspondences), and Compose (composition of mappings), to manipulate models and mappings as first-class objects. The main goal is not only to automate the operations on models and mappings as much as possible, but also to provide generic implementations for them, so that they can be employed for a large variety of application domains and are independent of the languages of models and mappings. Match is a central operator in model management, as demonstrated in various application scenarios [14, 13]. In particular, it produces the model mappings, which are required or manipulated by other operators. Techniques from semi-automatic schema matching [119] can be employed to implement such a Match operator. A first prototype described in [94] shows the feasibility of the model management approach and demonstrates how schema matching is integrated and employed together with other operators within script-based programs to solve data manipulation tasks.
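The operator-centric style of model management can be illustrated by the following sketch; the signatures are simplified assumptions for the purpose of illustration and are not taken from any particular model management prototype.

    # Illustrative signatures only: models and mappings as first-class objects
    # manipulated by generic operators such as Match, Merge, and Compose.
    class Model: ...      # e.g., a database schema, ontology, or message format
    class Mapping: ...    # a set of correspondences between two models

    def match(m1: Model, m2: Model) -> Mapping:
        """Determine correspondences between two models (the Match operator)."""
        raise NotImplementedError

    def merge(m1: Model, m2: Model, corr: Mapping) -> Model:
        """Merge two models based on their correspondences (the Merge operator)."""
        raise NotImplementedError

    def compose(map12: Mapping, map23: Mapping) -> Mapping:
        """Compose a mapping m1->m2 with a mapping m2->m3 into a mapping m1->m3."""
        raise NotImplementedError

    # Script-style usage, e.g., integrating a source schema s into a global schema g:
    #   corr = match(s, g)
    #   g_new = merge(g, s, corr)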

1.3 Semantic Heterogeneity
Although schema matching is addressed by many techniques and algorithms in different fields, as discussed in the last section, it remains a hard task and is still far from being fully automated. A number of reasons contribute to the complexity of the problem:

Sources for Semantic Information
Identifying corresponding elements between two schemas requires a thorough analysis of the semantics of the schemas, in other words, reverse engineering the real-world perception of the schema creators. This process, if not impossible, is very slow due to the many heterogeneous information sources that have to be considered, such as the schemas and databases, their documentation, and their creators or daily users, in order to obtain an exact understanding of the semantics of single schema elements. Extracting semantic information from users and documentation, which in fact can be very accurate and helpful sources, is a cumbersome process as it cannot be done in an automatic or standardized way. For example, Clifton et al. report in [23] on hours of ‘human time’ spent making phone calls and writing letters, and weeks of ‘wait time’, in order to get access to the desired information in the match process. Hence, we typically have to resort to the information available in schemas and instances. Although it is possible to develop automatic methods to process and analyze schema and instance data, there is always some likelihood of wrong or missing match predictions depending on the quality of the given information, as discussed next.

Schema and Data Heterogeneity
Schemas offer different kinds of “clues” which can be exploited to obtain a semantic understanding of the schema elements. Examples of such clues include element names, data types, allowable values, schema structures and groupings of elements, integrity constraints, etc. However, these kinds of information may vary between schemas depending on the expressiveness of the schema language employed. Even when available, the information is often unreliable and incomplete. Both issues arise mostly from the fact that schemas are typically developed independently by different people with different perceptions of the real world and for different purposes. Examples of such metadata-level conflicts are the following:
• The same names do not necessarily indicate the same semantics, and different names may in turn be used to represent the same real-world concept.
• Element names may be encrypted or abbreviated so that they are only comprehensible to their creators.
• Integrity constraints may be hardwired in the programs accessing the data, and not declaratively specified at the schema level.
• Elements may be modeled at different levels of detail: address information is divided into street, zip, and city in one schema, and captured using one single field in another.
Besides the metadata specified in schemas, instance data can also provide insights into the contents and meaning of schema elements. However, this information may also vary from database to database and contain inconsistencies [118], making the comparison of instances of different databases difficult. Examples of such instance-level problems are:
• Different values are employed to encode the same piece of information, e.g., ’F’ and ’Female’ for gender information.
• The same values are stored with different interpretations, for example, using different measurement units like Dollar vs. Euro, different quantity units like thousands vs. millions, or different string formats (as often observed for address data).
• Instance data may contain errors, such as misspellings, missing values, transposed or wrong values, duplicate records, etc.
Both metadata- and instance-level conflicts can mislead schema matching, because they may cause missing or incorrect similarity between schema elements. For the user manually performing the matching task or verifying a match result, the conflicts lead to additional time and effort required to correctly understand the semantics of the schema elements. For automatic match approaches, the conflicts typically reduce the result quality if not properly resolved using corresponding schema transformation or data cleaning techniques.

Requirements for the Match Result
Besides the heterogeneity of schemas, the intuitions behind matching also pose additional challenges to schema matching. In particular, if two elements are predicted to correspond to each other, we expect that there are no better matching elements. This however requires comparing each element in one schema with all elements in the other schema, resulting in quadratic complexity, which can be very high for large schemas and infeasible for a high number of schemas to be considered. For example, [84] reports on a project at a telecommunication company, which tried to integrate 40 databases with a total of 27,000 attributes, for each of which about four hours were required on average for documenting and finding the match candidate(s), i.e., roughly 108,000 person-hours in total. Similarly to the subjectivity of the creator in schemas, there is also subjectivity in the results of schema matching. In particular, different users may conceive different elements as matching, and likewise, different automatic match techniques may predict different correspondences. However, at the application level, where the identified correspondences are converted to program code, no ambiguity is allowed and a match result with all correct correspondences is required. Hence, application developers are still needed at the end to verify and possibly correct the match correspondences obtained using different approaches. Sometimes, a whole committee is required to approve the match result [23].


CHAPTER 2 PROBLEM DEFINITION

The problem of schema matching can be formulated as follows: “Given two schemas, S1 and S2, find the most plausible correspondences between the elements of S1 and S2, exploiting all available information, such as the schemas themselves, instance data, and auxiliary sources.” It should be solved, as far as possible, by means of automatic techniques. An imperfect result can then be further examined by the user to obtain the exact correspondences between the input schemas. The main goal is to reduce the amount of manual effort as much as possible and to avoid having to solve the match problem manually from scratch. In the next section we introduce the notion of schemas and illustrate it for several schema languages. We discuss in Section 2.2 the different kinds of input information which can be exploited to detect the similarity of schema elements. Section 2.3 discusses the semantics of the match result. Finally, in Section 2.4, we describe an architecture for generic schema matching.

2.1 Schemas
In a match operation, the input schemas specify the elements to be matched. Therefore, it is instructive to examine some typical schemas and their elements. Depending on the application domain, schemas may be available in many different formats and languages, such as SQL, UML, DTD, XSD, and OWL. Figure 2.1 shows sample schemas for a purchase order in SQL, XSD, and OWL, respectively. We introduce and illustrate the notation of these common schema languages along with the examples shown in the figure.
• SQL allows one to define schemas for relational databases and to query and manipulate data stored under such a schema. A relational schema comprises a set of tables, e.g., PO and ShipTo, representing different real-world entities. Each table in turn consists of a set of columns, e.g., shipNo, street, city, and zip for ShipTo. A column in one table may be specified as a foreign key pointing to a column in another table to capture referential constraints between different entities, e.g., from PO.shipNo to ShipTo.shipNo. Entity instances are stored as records of column values within the corresponding table.
• XSD is increasingly utilized to describe the structure of XML documents for data exchange over the Web. The main components of an XSD schema are elements (e.g., PO and ShipTo), attributes, and types (e.g., POType and Address). The latter can be either complex, for specifying nested sub-elements, or simple, for specifying atomic data types, such as string, for an element or attribute. In an instance document of an XSD schema, only elements and attributes can be instantiated, i.e., have values, while their types constrain the appearance of the values.

Figure 2.1 Sample schemas for purchase order: A) SQL, B) XSD, C) OWL. The SQL schema of panel A:
    CREATE TABLE PO ( poNo INT, shipNo INT, PRIMARY KEY (poNo), FOREIGN KEY (shipNo) REFERENCES ShipTo(shipNo) );
    CREATE TABLE ShipTo ( shipNo INT, street VARCHAR(200), city VARCHAR(200), zip VARCHAR(20), PRIMARY KEY (shipNo) );

• OWL is commonly used for specifying ontologies on the Semantic Web. Ontologies aim at conceptualizing the knowledge of a domain and support a semantically richer representation of the real world than database or document schemas. In particular, OWL provides XML-based constructs to define classes (e.g., PO, Organization, and Agent) and their relationships (e.g., between super- and subclasses like Agent and Organization), as well as properties (e.g., hasAddress of Agent and Street of Address) and their value ranges. Similar to types in XSD, the value range of a property may be an atomic data type (e.g., string for Street) or a pre-defined class (e.g., Address for hasAddress). OWL classes may have instances, which are stored within the same XML document.
For the sake of generality, we define a schema simply as a set of schema elements connected by some structure. For example, from a relational schema, we can extract the tables and columns as schema elements, and the containment relationships between tables and columns as well as the referential constraints expressed by foreign keys between tables as schema structure. In an XSD schema, schema elements include XML elements and attributes, while schema structure consists of the containment relationships between elements and sub-elements as specified by complex types. From an OWL ontology, we obtain classes and properties as schema elements, while relationships among classes and containment relationships between classes and their properties constitute the schema structure.
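To illustrate this generic view of a schema as elements plus structure, the following sketch shows one possible internal representation and how the relational schema of Figure 2.1 could be imported into it. The class and field names are illustrative assumptions, not the data model actually used by the prototype described later.

    # A minimal, hypothetical internal schema representation: elements plus structure.
    from dataclasses import dataclass, field

    @dataclass
    class SchemaElement:
        name: str
        kind: str                      # e.g., "table", "column", "class", "property"
        data_type: str | None = None

    @dataclass
    class Schema:
        elements: dict[str, SchemaElement] = field(default_factory=dict)
        # structure: (source element, target element, relation), e.g., containment or reference
        structure: list[tuple[str, str, str]] = field(default_factory=list)

        def add(self, elem: SchemaElement, parent: str | None = None, relation: str = "contains"):
            self.elements[elem.name] = elem
            if parent is not None:
                self.structure.append((parent, elem.name, relation))

    # Importing (part of) the relational schema of Figure 2.1a:
    s = Schema()
    s.add(SchemaElement("ShipTo", kind="table"))
    s.add(SchemaElement("shipNo", kind="column", data_type="int"), parent="ShipTo")
    for col in ("street", "city", "zip"):
        s.add(SchemaElement(col, kind="column", data_type="varchar"), parent="ShipTo")
    s.add(SchemaElement("PO", kind="table"))
    s.add(SchemaElement("poNo", kind="column", data_type="int"), parent="PO")
    s.structure.append(("PO", "ShipTo", "references"))   # foreign key PO.shipNo -> ShipTo.shipNo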

2.2 Input Information
In order to solve a given match problem, we can exploit any kind of available information that may help to characterize the semantics of schema elements and to detect their similarity. We roughly distinguish between schema information, instance data, and auxiliary information:
• Schema information: The input schemas already provide various kinds of information, such as element names, descriptions, data types, schema structure and other relationships between elements, etc., which can be examined to characterize and compare the semantics of schema elements.

• Instance data: In many applications, such as data integration and transformation, instance data is available for the schemas to be matched and can also be exploited to characterize the content and semantics of schema elements.
• Auxiliary information: This category comprises all other kinds of information which can be exploited to detect similarity between schema elements. For example, we can look up semantic relationships like synonymy and hypernymy between element names in existing (general-purpose or domain-specific) dictionaries and thesauri.
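As a small illustration of how schema information (element names) and auxiliary information (a synonym list) can be combined, consider the following sketch of a simple name matcher. The synonym pairs and function names are hypothetical and serve only to illustrate the idea.

    # Sketch of a name matcher: exact match or synonym lookup yields full similarity,
    # otherwise a string similarity in [0, 1] is computed. Illustrative names only.
    from difflib import SequenceMatcher

    SYNONYMS = {("client", "customer"), ("home", "address")}   # hypothetical auxiliary source

    def name_similarity(n1: str, n2: str) -> float:
        a, b = n1.lower(), n2.lower()
        if a == b or (a, b) in SYNONYMS or (b, a) in SYNONYMS:
            return 1.0
        return SequenceMatcher(None, a, b).ratio()

    print(name_similarity("Client", "Customer"))   # 1.0, via the synonym list
    print(name_similarity("Id", "CID"))            # 0.8, via string similarity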

2.3 Output Information

Given two input schemas S1 and S2, the match operation returns as output a mapping between them, also called the match result. We define a mapping to be a set of mapping elements, or correspondences, each of which specifies that certain elements of S1 are mapped to, or correspond to, certain elements of S2. Each correspondence may have a mapping expression, which specifies how the S1 and S2 elements are related to each other. We discuss the main aspects of mapping expressions, namely semantics, directionality, and invertibility, in the following:
• Semantics: The mapping expression may use simple relations over scalars (e.g., identity), terminological relationships (e.g., synonymy, hypernymy, is-a, part-of), set-oriented relationships (e.g., equivalence, overlapping, subsumption), or functions (e.g., string concatenation or arithmetic functions). Functions are mathematically the most precise form of semantic correspondences, as they exactly specify how to transform instances of S1 elements into those of S2 elements. For the specification of mapping expressions, any expression language, such as SQL or XQuery, can be used.
• Directionality: Mapping expressions may be directional or undirectional. In particular, those indicating equality relations, such as identity between scalars, synonymy between terms, equivalence of sets, etc., are undirectional, while others are mostly directional, that is, they differentiate between source (or domain) and target (or range) elements. A mapping is undirectional if all mapping expressions of its correspondences are undirectional, otherwise it is directional.
• Invertibility: Mapping expressions may be invertible or not. By intuition, undirectional expressions are the inverse of themselves and thus invertible. Directional ones, however, may or may not be invertible. While expressions specifying 1:1 correspondences are often invertible, complex ones involving sets of schema elements, such as arithmetic functions, are mostly not. A mapping is invertible if all mapping expressions of its correspondences are invertible, otherwise it is uninvertible.
Most techniques for automatic schema matching are based on heuristics that are not easily captured in a precise mathematical way. The main goal is to produce a mapping that approximates the understanding of what users consider to be a good match. The match result mostly consists of corresponding schema elements but does not exactly specify how the elements are related to each other, i.e., it comes without mapping expressions. Identifying corresponding elements, i.e., schema matching, is considered the first step in creating a semantic mapping between two schemas. The second step, also called query discovery [95], is to enrich the identified correspondences with real mapping expressions so that the mapping can be employed to translate instances of the source schema into those of the target schema.

2.3. O UTPUT INFORMATION 13 Like most previous work, we focus on the first step, i.e., to obtain correspondences with- out mapping expressions. Therefore, we represent a mapping as a similarity relation over the cross-product of the input schemas. Each pair of matching elements is captured in a correspondence attached with a similarity value to indicate the plausibility of the corre- spondence. In this simple representation, the match result can always be considered undirectional and invertible, allowing for easy manipulation and combination of map- pings. Such a similarity-based mapping can be further enhanced in the second step either manually by the user or semi-automatically [95, 28, 141] to include mapping expressions for the correspondences.

2.4 Architecture for Generic Schema Matching
Because of the pervasiveness of schema matching, we envision a generic solution which is suitable for different schema languages and application domains. Such a solution has a high potential to be widely adopted, thereby reducing the effort otherwise required to develop many specific solutions. To achieve this goal, we need a flexible and customizable architecture, which can be easily extended and adapted to support a new schema language and/or application domain. Driven by this objective, Figure 2.2 shows the overall architecture that we have kept in mind while developing our solution.

Figure 2.2 High-level architecture for generic match implementation

[Figure 2.2 depicts client applications and tools (schema integration/database design, E-business message mapping, data warehousing/ETL, Semantic Web/ontology mapping) on top of schema and mapping import/export components (for Relational/SQL, DTD, XSD, OWL schemas and SQL/XQuery mappings), which feed the internal schema and mapping representations used by the generic match implementation, backed by global libraries of schemas, mappings, and auxiliary sources.]

The clients of generic schema matching are applications and tools from different domains, such as schema integration, E-business, data warehousing, and the Semantic Web. When needed, a client invokes the generic match implementation to automatically determine correspondences between the relevant schemas. The implementation of schema matching may access existing libraries of schemas, mappings, and other auxiliary information, such as dictionaries and thesauri, to help find correspondences. Such libraries may also be generated and maintained by the match implementation itself, for example for reuse purposes. We assume that the generic match solution uses a uniform internal representation for the schemas to be matched. This significantly reduces the complexity of implementing the match algorithms, as they do not have to deal with the large number of different (heterogeneous) schema formats.

Tools that are tightly integrated with the framework can work directly on the internal representation. For other tools, import/export programs are needed to translate schemas between their native representation (such as Relational/SQL, DTD, XSD, or OWL) and the internal representation. It is important to preserve all features which can help characterize the semantics of schema elements. As different schema languages typically exhibit varying modeling capabilities, we need to decide on a set of generic features to be included in the internal representation and map the language-specific features to them during schema import. Like the internal representation for schemas, the generic match solution also requires an internal representation for mappings. Assuming that all match algorithms operate on the internal schema representation, it is easy to come up with an internal representation for the generated match results. However, as external tools are unlikely to be able to understand this format, it is necessary to provide export programs for mappings, for example, to output the match results as SQL or XQuery queries, or in a standard format [47]. In general, it is not possible to fully automatically determine all correspondences between two schemas due to their semantic heterogeneity. The match implementation should therefore only detect match candidates, which the user can accept, reject, or change. Furthermore, the user should be able to specify correspondences for elements for which the system was unable to find match candidates. This poses high requirements on the capabilities of the user interface, which should effectively assist the user in performing such manual tasks.
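The import/export layer can be sketched as a pair of small interfaces; the names below (SchemaImporter, MappingExporter) are assumptions made for illustration and do not refer to actual components of the system described later.

    # Hypothetical interface sketch for the import/export layer around the
    # uniform internal representation.
    from abc import ABC, abstractmethod

    class SchemaImporter(ABC):
        @abstractmethod
        def load(self, source: str) -> "Schema":
            """Translate a native schema (SQL DDL, DTD, XSD, OWL, ...) into the
            internal representation, preserving semantically relevant features."""

    class MappingExporter(ABC):
        @abstractmethod
        def export(self, mapping: "Mapping") -> str:
            """Translate an internal match result into an external format,
            e.g., SQL or XQuery statements, or a standard mapping format."""

    # Concrete XsdImporter, OwlImporter, etc. classes would subclass SchemaImporter and
    # map language-specific features onto the generic features of the internal model.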


CHAPTER 3 OPEN ISSUES AND CONTRIBUTIONS

Despite many efforts in research and industry, schema matching is still largely performed manually by domain experts in current implementations. This is because no satisfactory semi-automatic solution yet exists that is able to scale with the proliferation and complexity of data-sharing applications. This chapter starts by discussing the open issues in the current state of the art in semi-automatic schema matching. Section 3.2 then presents the main contributions of the thesis by introducing the solutions proposed to address the identified issues. Finally, a roadmap for the rest of the thesis is given in Section 3.3.

3.1 State of the Art and Open Issues
The need for schema matching in numerous applications and the inherent difficulty of the task have led to the development of many techniques and prototypes to semi-automatically solve the match problem. They either address the problem for specific applications, such as [7, 8, 9, 16, 20, 23, 34, 35, 40, 42, 49, 53, 54, 62, 71, 84, 97, 107, 112, 134, 143], or in a more generic way for different applications and schema languages [15, 88, 93, 96]. Some recent surveys of match and related approaches are given in [119, 70, 31, 36]. The large amount of research work indicates the high potential of techniques and algorithms which can be exploited for schema matching. However, we can observe a number of issues which have not been addressed, or not sufficiently addressed, in previous work and thus require further investigation:

Expressiveness of Modern Schema Languages
Modern schema languages, e.g., W3C XSD and the new object-relational SQL versions (SQL:1999, SQL:2003), support many advanced modeling capabilities, such as user-defined types and classes, aggregation and generalization, component reuse, and distributed schemas and namespaces, leading to significant complications for schema matching [120]. Often, such design styles can be used alternatively to model the same or similar real-world concepts, leading to different, yet semantically similar, structures in the schemas. Thus, matching schemas that take advantage of such powerful languages becomes a challenging task, depending essentially on the detection and unification of the alternative modeling styles. On the other hand, current match systems focus only on structurally simple schemas w.r.t. nesting levels, data types, constraints, and shared schema components. In particular, early schema languages such as DTD and SQL:1992 are mostly required as the input format. The traditional database notion of a schema is typically assumed, where all instances can be described by a single monolithic schema. Likewise, most approaches assume tree-like schemas and ignore shared elements, such as the same complex types or sub-structures used at multiple places to capture the same kind of information (e.g., address data). Such shared elements may appear many times in a schema with context-dependent semantics, which need to be differentiated for a correct matching. The treatment of shared elements still requires further work in order to avoid an explosion of the search space.

Dealing with Large Schemas
Real-world schemas are constantly growing in both size and complexity in order to cope with the requirements for representing and managing data in the corresponding applications. For example, the standard schemas for E-business messages developed by OpenTrans and Xcbl contain several independent parts, or subschemas, for individual transaction/message types, each of which in turn consists of up to thousands of elements in order to be able to capture every detail of the messages. Furthermore, the schemas often use shared elements to avoid unnecessarily diverse specifications and to keep schema complexity low for easier maintenance. In the end, the match operation needs to examine a huge search space to find plausible correspondences; a major challenge, which requires very efficient approaches. On the other hand, we observe that current match approaches are typically applied to test schemas for which they could automatically determine most correspondences. As surveyed in [31], most test schemas were of small size, with 50-100 elements. Unfortunately, the effectiveness of the automatic match techniques studied so far typically decreases for larger schemas [29]. In particular, it is likely that large portions of one or both input schemas have no matching counterparts. Thus, matching complete input schemas may lead not only to long execution times, but also to poor quality due to the large search space. Moreover, it is difficult to present the match result to a human engineer in a way that she can easily validate and correct it. A more piecemeal approach, e.g., based on the divide-and-conquer philosophy, may be preferable in such cases, promising both better user control and better match performance.

Combination of Match Algorithms To achieve high match accuracy for a large variety of schemas, considering a single criterion (e.g., name matching) is unlikely to be successful. As a consequence, it is necessary to combine and utilize multiple techniques at the same time. For this purpose, previous prototypes have followed either a so-called hybrid or a composite combination of match approaches. So far the hybrid approach is most common, where multiple criteria or properties (e.g., name and data type) are considered within a single algorithm. Typically, these criteria are fixed and utilized in a specific way, for example, concerning the order in which they are evaluated, making it difficult to extend and improve the overall algorithm. By contrast, a composite match approach combines the results of several independently executed match algorithms, which can in turn be hybrid or composite. This allows for high flexibility, as there is the potential for selecting the match algorithms to be executed based on the match task at hand. Moreover, there are different possibilities for combining the individual match results. We know of only a few systems following such a composite approach, in particular, [34, 35, 42]. They are limited to match techniques based on machine learning and employ a specific combination of match results.

While promising to improve flexibility and quality over the hybrid approach, the composite combination still requires further work in order to fully exploit its potential and to examine how to best combine different matchers.

Schema Matching Evaluation For identifying a solution for a particular match problem, it is important to understand which of the available techniques performs best, i.e., can most effectively reduce the manual work required for the match task at hand. The only way to approach this goal is to demonstrate the quality and practicability of the developed match algorithms in real-world scenarios, or better, to conduct a systematic study using a range of schema matching tasks. Evaluation thus represents an important task in developing a match solution and has also been seriously considered in most previous work. Unfortunately, the system evaluations reported in the literature so far were done using diverse methodologies, metrics, and data, making it difficult to assess the effectiveness of each single system, not to mention to compare their effectiveness. Furthermore, the systems are usually not publicly available, making it virtually impossible to apply them to a common test problem or benchmark in order to obtain a direct quantitative comparison. Hence, it is necessary to establish a common framework for future evaluations, so that they can be documented better, their results be more reproducible, and a comparison between different systems and approaches be easier. This requires a systematic analysis of the factors influencing the quality and performance of a match approach.

Reuse of Match Results and Data Integration Reuse aims at exploiting different kinds of auxiliary information, such as (domain-specific or general-purpose) dictionaries, thesauri, ontologies, etc., to solve a match task. This can especially help in cases where schema elements cannot be compared merely using metadata in the schemas or available instance data. Current match prototypes mostly utilize a simple form of reuse at the level of single schema elements by looking up element correspondences in tables [88, 112, 42] or by using user-specified correspondences to train machine-learning algorithms on instances of a schema element [34, 35]. A further generic approach is to reuse entire previously identified match results [119]. In fact, we observe that new schemas to be matched are often very similar to previously matched schemas. Reusing the existing match results can thus result in significant savings of manual effort. However, the potential of this approach has not yet been studied in current schema matching work. The idea of reusing previous match results can be further generalized to cover mappings between different kinds of objects, which may be at both the metadata and instance level. This is motivated, on the one side, by the fact that applications often employ generic schemas and store heterogeneous information, possibly mixing both metadata and instance data, in a few generic tables [11, 1]. On the other side, we observe that semantic correspondences between objects of different types are available in many domains, such as bioinformatics [51, 45, 50] and peer-to-peer data management [72]. Such correspondences represent valuable domain knowledge and can be reused to inter-relate objects of interest and to integrate object information from different sources.

Graphical User Interface for Match Given that no fully automatic solution is possible, a user-friendly interface is essential for the practicability and effectiveness of a match system. On the one side, the match process should be performed interactively, so that the domain knowledge of the user can be actively incorporated in identifying corresponding schema elements.

On the other side, the effort required for interactions, such as the configuration of the match operation and the verification and correction of automatically derived match results, should be reduced to a minimum so that it remains affordable compared to manually solving the match task from scratch. Unfortunately, most prototypes developed so far focus on particular research aspects and offer no or only a rudimentary user interface. The only system that we know of providing a comprehensive graphical user interface is CLIO [66, 105, 114, 61], a commercial tool developed at IBM. However, CLIO focuses on the mapping discovery task to obtain queries for transforming instances between two schemas. Hence, many GUI capabilities have not yet been studied, such as customizing the match operation, visualizing and dealing with large schemas and match results, and manipulating and evaluating match results.

3.2 Contributions Focusing on the open problems discussed above, the dissertation makes a number of contributions, which can be grouped into the following four areas:

Surveys of Match Approaches and Evaluations To obtain a better overview of the current state of the art in schema matching, we survey the existing approaches and their evaluations. • Survey of schema matching algorithms and prototypes: There is a lot of previous work on schema matching done in different fields, such as schema translation and integration, knowledge representation, machine learning, and information retrieval. We adopt the taxonomy proposed in [119] and perform a new survey of existing schema matching approaches. In particular, we differentiate and discuss schema- and instance-level, element- and structure-level, language- and constraint-based, and reuse-oriented match approaches. As for the combination of multiple matchers, hybrid and composite approaches are possible. Match approaches may further be distinguished according to the cardinalities of their results. According to the taxonomy, we review various match prototypes published in the literature. We characterize them in some detail and compare them with our own development. • Survey of schema matching evaluations: Evaluation aims at proving the practicability of a match system for real-world circumstances. We identify and discuss the major criteria that influence the effectiveness of a schema matching approach, such as the chosen test problems, the design of the experiments, the representation of match results, the metrics used to quantify the match quality and the amount of saved manual effort, and the overall execution performance. We use these criteria to review the evaluations of various state-of-the-art systems and motivate the importance of a common framework to make the comparison between different systems and approaches easier. Besides helping us to implement and test our own system, our insights on the match approaches and their evaluations can be of valuable help for the development of future schema matching systems.

Generic, Customizable, and Scalable Schema Matching Systems The main contribution of the thesis consists in the development of two new generic and customizable schema matching systems, COMA (Combining Matchers) and its successor,

COMA++. In particular, COMA++ extends COMA by a number of significant improvements and offers a comprehensive infrastructure to solve large real-world match problems. • Architecture design: COMA++ pioneers an open multi-component architecture for schema matching, offering high flexibility for extension and adaptation. It consists of five components: the Repository to persistently store match-related data, the Schema and Mapping Pools to manage schemas and mappings in memory, the Match Customizer to construct and configure matchers and match strategies, and the Execution Engine to execute match operations. Each component in turn provides an extensible library of methods for processing its data. COMA++ comes with a comprehensive graphical user interface, which supports interactive and iterative match processing with many ways for the user to provide feedback. • Composite matcher combination: Combining individual matchers has so far only been studied in the context of machine learning approaches focusing on instance-level matchers and using a specific combination of match results. With COMA++, by contrast, we support a wide spectrum of matchers not confined to a particular technique like machine learning, as well as the customizable combination and refinement of their results. New match algorithms can easily be added or constructed by combining existing ones. The implementation of matchers has been highly optimized in order to achieve fast execution times for large match problems. Match processing is supported as workflows of multiple match steps, which can be configured individually. While supporting a default configuration set to the best strategies identified in our evaluations, COMA++ also allows tailoring match strategies to the match problem at hand by manually selecting the matchers and the strategies for combining them. • Novel match approaches: COMA++ includes several new approaches, in particular for context-dependent, fragment-based, and reuse-oriented matching. Shared elements have become a popular modeling mechanism to reduce schema complexity and impose standard specifications. However, they are largely ignored in previous schema matching work. Context-dependent matching differentiates between different contexts of a shared element and tries to find match candidates for individual contexts. Fragment-based matching aims at an efficient approach for dealing with very large schemas. In particular, we decompose a large match problem into smaller problems by matching at the level of schema fragments. Finally, we propose a further match approach based on the reuse of previously obtained match results. It is motivated by the observation that many schemas to be matched are very similar to previously matched schemas. Reusing the previous match results may thus result in significant savings of manual effort.

Comprehensive Evaluations of Different Match Approaches Due to the flexibility to configure matchers and match strategies, COMA and COMA++ can also be used to comparatively evaluate the effectiveness of different match approaches. In fact, we conduct several comprehensive evaluations on COMA and COMA++, which show high quality and acceptable execution time for complex real-world match problems in different domains. In particular, the involved E-business message standards represent the largest test schemas compared to previous evaluations.

2. COMA++ [2, 33, 120] extends the COMA prototype first published in [29] and also subsumes all functionalities offered by the latter.

Using the test cases from a published ontology alignment contest, we achieve on average very high quality, which is comparable to that of the best performing participants in the contest. By systematically testing the supported match strategies with all relevant configurations, we are able to identify the best strategies with high and stable quality across different match tasks for the default match operation, thereby limiting the tuning effort for later match operations. Our evaluations yield many insights on the quality and performance of different match approaches and on the impact of many factors, such as schema size and the choice of matchers and combination strategies. We believe that our evaluation insights can be of valuable help for the development and evaluation of further match algorithms.

Mapping-based Approaches for Data Integration Traditional data integration approaches, such as mediation or data warehousing, rely on the notion of a domain-dependent global schema to provide a unified and consistent view of the underlying data sources. Unfortunately, the manual effort to create such a schema and to keep it up-to-date is substantial. Furthermore, adding new data sources is a time- and effort-intensive task, making it difficult to scale to many sources or to use such systems for ad-hoc (explorative) integration needs. Further extending the idea of reusing match results, we have developed GENMAPPER (Generic Mapper), a new mapping-based approach for data integration, and applied it to the challenging field of bioinformatics with hundreds of highly cross-referenced web data sources managing annotation data for various types of molecular-biological objects, such as genes and proteins. GENMAPPER uses a generic data model to uniformly represent different kinds of annotations physically integrated from different data sources. Existing correspondences between objects, i.e., cross-references, represent valuable domain knowledge. Therefore, they are explicitly captured and utilized to drive data integration and combine annotation knowledge from different sources. To serve specific analysis needs, powerful operators are provided to derive tailored annotation views from the generic data representation. In an extended version, GENMAPPER is coupled with a mediator to combine the advantages of both materialized and virtual integration. While frequent and intensive join processing to inter-relate objects from different sources is performed on the data materialized in GENMAPPER, the mediator retrieves, on a demand-driven basis, up-to-date annotations from relevant sources for objects of interest. GENMAPPER is fully functional and has been successfully used for large-scale functional profiling of genes and proteins.

3.3 Outline In this first part of the thesis, Part I, we introduced the problem of schema matching and motivated the need for semi-automatic support to solve the task. We then discussed the open issues in the current state of the art and gave an overview of the main contributions of the dissertation. The rest of the thesis is organized in the following four parts: Part II focuses on the design and implementation of our schema matching systems COMA/COMA++. We first present the taxonomy of schema matching algorithms, which are relevant candidates to be included in a composite match system like ours. We then elaborate on the techniques concerning the architecture design and match processing of COMA/COMA++, and the novel match approaches that we have developed and employed to make our system generic, customizable, and scalable. Finally, we review various schema matching prototypes published in the literature and compare them with ours.

Part III deals with the issues of evaluating a schema matching system. We first introduce the major criteria influencing the effectiveness of a match approach. We then describe the evaluation of COMA/COMA++ and discuss the obtained results and insights. Finally, we survey the state of the art in schema matching evaluation and compare our evaluation with several other studies. Part IV presents our mapping-based approach for data integration and its application in the bioinformatics domain. First we summarize the challenges and current solutions in integrating molecular-biological data. We then describe in detail our GENMAPPER system and its extension to a hybrid data integration system. Part V concludes the thesis by summarizing the main contributions made and discussing relevant directions for future research. Parts of the thesis have been published in refereed conferences and journals. In particular, the survey of schema matching prototypes and evaluations is presented in [31]. The COMA and COMA++ systems and their evaluations are described in [29, 120, 2]. The GENMAPPER system and its extension are described in [32, 77, 76].


PART II SCHEMA MATCHING APPROACHES

The need for schema matching in numerous applications and the inherent difficulty of the task have led to the development of many algorithms to semi-automatically solve the match problem. While the reuse of existing algorithms promises to reduce the development cost for new match systems, it is still largely unclear how to best combine different approaches. Therefore, we first conduct a survey of the available approaches to identify their strengths and applicability. We then develop the COMA schema matching system (Combining Matchers) as a platform to combine different match algorithms in a flexible way [29]. We further extend COMA to a more powerful system, COMA++ [2, 33, 120]. While taking over the composite approach of COMA to combine different match algorithms, COMA++ implements significant improvements, in particular a graphical user interface, flexible construction and configuration of matchers and match strategies, uniform management and manipulation of schemas and mappings, new approaches for context-dependent, fragment-based and reuse-oriented matching, and various performance optimizations to deal with large schemas. This part, spanning Chapters 4 to 9, describes the techniques and algorithms implemented in COMA/COMA++ in detail. Chapter 4 gives an overview of existing schema matching approaches, which can be employed to build a composite match system like COMA/COMA++. Chapter 5 describes the overall match processing in COMA/COMA++ and the function of the single system components. Chapter 6 describes our novel match approach based on the reuse of existing match results to solve a new and similar match task. Chapter 7 presents our generic framework to combine the results of individually executed matchers. Chapter 8 extends the combination framework to cover the iterative refinement of match results and discusses the construction of the new context-dependent and fragment-based match strategies. Finally, Chapter 9 reviews relevant prototypes published in the literature and compares the most representative ones with our own prototype COMA++.


CHAPTER 4 APPROACH CLASSIFICATION

A lot of work has been done in different fields, such as schema translation and integration, knowledge representation, machine learning, and information retrieval, aiming at automating the schema matching task as much as possible. The main goal of this chapter is to survey these approaches and to explain their common features and applicability. Beyond the development of our own system, we expect the survey to be of general help for designers of new approaches as well as for users who need to select from a library of approaches. In the next section, we briefly describe the criteria adopted from the taxonomy presented in [119] to classify the approaches for automatic schema matching. The classification is then used in the subsequent sections to discuss previously proposed techniques. In particular, we summarize the approaches exploiting schema-level information, instance data, and auxiliary information in Sections 4.2, 4.3, and 4.4, respectively. The approaches to combine multiple match algorithms are presented in Section 4.5. Section 4.6 focuses on the cardinality of the result produced by a match approach. Section 4.7 concludes the chapter.

4.1 Classification Criteria Match approaches may differ from each other in three areas, i.e., the input information, the way the input information is processed, and the characteristics of the output match result. As shown in Figure 4.1, we consider the following largely orthogonal criteria to distinguish between schema matching approaches: • Schema vs. instance: Match approaches can consider schema-level information, i.e., metadata, such as element names, data types, and structural properties, or also instance data, i.e., data contents. • Element vs. structure: The match operation can compare and match individual schema elements, such as attributes, or combinations of elements that appear together in a structure. • Language vs. constraint: A matcher can use a linguistic approach (e.g., comparing names and textual descriptions of schema elements) or a constraint-based approach (e.g., considering constraints defined on elements, such as data types, uniqueness, keys, etc.).

[Figure 4.1 Classification of match approaches: a tree distinguishing schema-based, instance-based, and reuse-oriented matching. Schema- and instance-based matching are subdivided into element- and structure-level approaches, each using linguistic techniques (e.g., names, descriptions, keywords, word frequencies) or constraint-based techniques (e.g., types, keys, value patterns and ranges, parents, children, leaves). Reuse-oriented matching exploits schemas (e.g., namespaces, schema libraries) and mappings (e.g., thesauri, previous match results). Further criteria: hybrid vs. composite combination and match cardinality.]

• No-reuse vs. reuse: Matchers may not only rely on the input schemas and instance data but also reuse information from auxiliary sources, such as dictionaries, global schemas, previous matching decisions, and user input. • Hybrid vs. composite: For better applicability and accuracy, a matcher may in turn be a combination of several individual approaches. This can be done either in a fixed way within a hybrid matcher or by combining the match results produced by the single approaches within a composite matcher. • Match cardinality: In a match result, one or more elements of the first schema may be related with one or more elements of the second one, resulting in different cardinalities (e.g., 1:1, n:1, 1:n, n:m). Such match relationships may in turn be represented as a single correspondence or as multiple correspondences. Most of the criteria consider input information. We do not explicitly distinguish the match approaches according to the techniques employed to process the input information, such as string matching or machine learning, as these typically depend on the type and characteristics of the data to be processed. Likewise, we do not distinguish the match approaches according to the types of schemas, such as relational, XML, or object-oriented, and the internal representation of the schemas, such as lists or directed graphs, because the algorithms depend mostly on the kind of information they exploit, not on its representation.

4.2 Schema-based Matching Schema-based approaches only consider schema information. Depending on the expressiveness of the employed schema language, the available information includes different properties of schema elements, such as name, description, data type, constraints, etc., and relationships between them, such as referential constraints, is-a/part-of, or containment relationships. We first describe linguistic and constraint-based matchers, the common element-level approaches, which compare properties of the schema elements to determine their correspondences.

We then discuss structure-level approaches, which exploit element relationships to consider multiple elements at the same time.

Linguistic Approaches Linguistic approaches exploit text-based properties of schema elements, such as name and description. In particular, element names are the most basic constituents of schemas and represent the first information source for assessing element similarity. Name similarity can be estimated syntactically by comparing the name strings or semantically by comparing their meanings. Syntactical name matching computes the similarity between two names solely based on comparing the name strings. The most straightforward method is to determine equality of names, i.e., exact string matching. For XML schemas defined on the same namespace, this method is sufficient to identify the matching elements, as a namespace ensures that equal names carry the same unique semantics. In other cases, so-called approximate string matching algorithms are more flexible as they can also identify similarity between a name and its different abbreviations, e.g., Customer - Cust. Such algorithms have been developed and employed in other fields such as text retrieval and automatic spelling correction [59], record linkage [138], and sequence alignment in molecular biology [42]. Several of them, in particular the EditDistance, N-Gram, and SoundEx algorithms, have already been adopted for schema matching, for example, in [29, 92, 88]: • EditDistance (Levenshtein): Using dynamic programming techniques, string similarity is computed from the number of edit operations (insertions, deletions, substitutions of single characters) necessary to transform one string into another one. • N-Gram: Strings are compared according to their sets of n-grams, i.e., sequences of n characters, leading to different variants of this matcher, e.g., Digram (2), Trigram (3). For example, Cust and Customer are similar as their trigram sets, {cus, ust} and {cus, ust, sto, tom, ome, mer}, respectively, share the two trigrams cus and ust. • SoundEx: This method computes the phonetic similarity between names from their corresponding SoundEx codes. This can be useful in testing whether two names that have been spelled differently are likely to be the same. For example, the words Licence, License, and Licensing map to the same SoundEx code and are thus similar. Semantic name matching, on the other side, estimates the similarity between names based on their terminological relationships, such as synonymy, hypernymy and hyponymy. For example, the words Car and Automobile are synonyms and thus similar. This approach requires the use of auxiliary sources, such as dictionaries, thesauri, ontologies, or user-provided synonym tables, in which the semantic relationships are captured. General single- or multi-language dictionaries, such as WordNet (http://wordnet.princeton.edu/), can be exploited for this purpose [42, 53, 54]. However, element names usually do not offer sufficient context information to disambiguate their meaning. For example, the synonymy between Plant and Tree does not hold for schemas where Plant is used to indicate a factory. Using a general language dictionary may thus result in many wrong correspondences due to the polysemy of words. Therefore, current match systems, like [29, 34, 88, 112], often resort to domain- or enterprise-specific dictionaries containing unique names, synonyms, abbreviations, etc., for a rather restricted domain. Although constructing such dictionaries may incur substantial manual effort, it is worth the investment for supporting many match tasks in the same domain or for dealing with schemas with relatively flat structure, where dictionaries provide the most valuable matching hints.


Often, schemas come along with descriptions in natural language to document the intended semantics of schema elements. Such verbose comments can also be evaluated linguistically to determine element similarity. For example, descriptions may be preprocessed and parsed using natural language processing techniques to obtain a set of the most characteristic words for similarity comparison. Long descriptions may also be regarded as "documents" so that information retrieval techniques [132] can be employed to compute document similarity and identify matching descriptions, as done, for example, in the DELTA prototype [23]. Considering descriptions as context information for element names, word sense disambiguation techniques [69] can be employed to identify the intended sense of single names and to find those names with the same sense. Likewise, we can also consider other text data given in the schemas, i.e., natural-language words in element names, instance data, etc., as context information to perform semantic disambiguation for single schema elements. In real-world schemas, we observe several, largely orthogonal issues, which may affect the quality of name and description matching and require additional techniques to deal with: • Multi-word names: Element names are often composed of multiple words, such as DeliveryAddress, DevAddr, and AddrOfDel. Directly comparing such names is unlikely to yield correct similarities due to the order and abbreviation of the words in the names. Hence, different preprocessing techniques, such as tokenization (DeliveryAddress → {delivery, address}), removal of stopwords (such as of and the), and stemming (delivery → deliver), can be applied to obtain a set of the most characteristic words for comparison [88, 29] (see the sketch after this list). • Cryptic names: Words often do not appear in their full form in element names. Simple abbreviations, such as Address → Addr, are very frequent and can mostly be detected using approximate string matching techniques. However, if element names are strongly encoded, such as using the initials of words in acronyms, like PO for PurchaseOrder, name matching mostly fails to predict any similarity or predicts some random, incorrect similarity. In such cases, abbreviation dictionaries are required in order to expand such acronyms. • Homonyms: Name matching may lead to false matches due to homonymy. In particular, elements with the same or highly similar names may represent completely different concepts. For example, Plant can mean either a tree or a factory. Thus, to differentiate between elements with such names, it is necessary to consider additional information or matchers to disambiguate them.
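To make the interplay of name preprocessing and approximate string matching concrete, the following minimal sketch combines camel-case tokenization with a Dice-coefficient trigram similarity. It is only an illustration of the general technique; the function names and the exact similarity formula are our own choices, not the implementation used by COMA/COMA++ or any of the cited systems.

```python
import re

def tokenize(name):
    """Split a schema element name into lowercase word tokens,
    e.g. 'DeliveryAddress' -> ['delivery', 'address']."""
    parts = re.findall(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+', name)
    return [p.lower() for p in parts]

def ngrams(s, n=3):
    """Return the set of n-grams of a string, e.g. 'cust' -> {'cus', 'ust'}."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def trigram_sim(a, b):
    """Dice coefficient over the trigram sets of two (lowercased) strings."""
    ga, gb = ngrams(a.lower()), ngrams(b.lower())
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

def name_sim(n1, n2):
    """Average best-match trigram similarity between the token sets of two names."""
    t1, t2 = tokenize(n1), tokenize(n2)
    if not t1 or not t2:
        return 0.0
    best = [max(trigram_sim(a, b) for b in t2) for a in t1]
    return sum(best) / len(best)

# 'Cust' and 'Customer' share the trigrams 'cus' and 'ust'
print(trigram_sim('Cust', 'Customer'))        # 2*2/(2+6) = 0.5
print(name_sim('DeliveryAddress', 'AddrOfDel'))
```

In a real matcher, stopword removal and stemming would be applied to the tokens before comparison, and the token-level similarities could also be looked up in a synonym or abbreviation dictionary instead of being computed purely syntactically.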

Constraint-based Approaches Schemas often contain constraints to declare data types, allowable values, value ranges, uniqueness, optionality, cardinality, etc., for single elements. If such information is available in both input schemas, it can be exploited by a matcher to determine element similarity [79]. For example, similarity can be based on the equivalence of data types, key characteristics (unique, primary), etc. Due to the unique semantics of such characteristics, a common approach is to provide a compatibility table for automatically looking up the similarity between different occurrences of a constraint. For example, a compatibility table for data types can cover similar data types used in different schema languages, such as string and varchar.

On the other side, it should also be tolerant against type conflicts, such as between number and string, in order not to miss potential match candidates. This can be done by assigning different similarity values to different degrees of compatibility. Constraint-based matching alone often leads to imperfect n:m correspondences, as there are typically several elements in a schema with compatible constraints. For example, considering only data types, all elements with the same or similar data types would be regarded as similar. Nevertheless, constraints still represent a helpful means to restrict the search space for match candidates and can be combined with other approaches to improve accuracy.
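A data type compatibility table of this kind can be captured as a simple symmetric lookup structure. The following sketch is hypothetical: the concrete type pairs and similarity values are illustrative and would have to be tuned per schema language.

```python
# Hypothetical data type compatibility table: 1.0 for equivalent types,
# lower values for convertible but conflicting types, 0.0 otherwise.
TYPE_COMPAT = {
    ('string', 'varchar'):  1.0,
    ('string', 'char'):     0.9,
    ('integer', 'decimal'): 0.8,
    ('integer', 'string'):  0.2,   # tolerate type conflicts with a low score
    ('date', 'string'):     0.3,
}

def type_sim(t1, t2):
    """Look up the similarity of two data types; the table is treated as symmetric."""
    if t1 == t2:
        return 1.0
    return TYPE_COMPAT.get((t1, t2), TYPE_COMPAT.get((t2, t1), 0.0))

print(type_sim('varchar', 'string'))   # 1.0
print(type_sim('integer', 'string'))   # 0.2
```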

Structure-level Approaches Structure-level approaches exploit relationships between elements and match combinations of elements that appear together in a structure. Depending on the modeling capabilities of the schema language, different types of relationships may be available, such as containment relationships, is-a/part-of relationships, or referential constraints. Usually, schema elements and their relationships are represented in a (directed or undirected) graph so that different kinds of structurally related elements can be identified for matching. We observe two different ways of exploiting schema structure, namely, to determine element neighborhoods for similarity computation and to restrict the match scope in iterative matching, which are discussed in the following. The common use of schema structure is to consider the neighborhood of elements to estimate their similarity. Many schema structures are hierarchical, based on some form of containment relationship. To estimate the similarity between two elements, we can compare different kinds of their neighbor elements, such as the parents, children, or the leaves subsumed by them, as done in [88, 29, 81, 107]. In an undirected graph representation of schemas, any adjacent elements in the graph can be considered as neighbor elements [93, 112]. For example, we may use a distance-based metric to identify such neighbors: distance 1 includes all elements directly linked to the given element, while any distance n>1 comprises elements reachable by traversing through n-1 intermediate neighbors. The similarities between the neighbor elements can first be computed using an element-level matcher, e.g., based on name similarity, and then aggregated to a single value, for example, by taking the average [29]. A more sophisticated approach is followed by some prototypes like DIKE [112] and SIMILARITYFLOODING [93], which perform a recursive propagation of the precomputed neighborhood similarities using fixpoint computation. When the fixpoint is reached, the similarity values have stabilized and are taken as the structural similarity of the schema elements. Another use of schema structure is to restrict the match scope in iterative matching. This can be done by traversing the schemas either top-down or bottom-up. In particular, the similar elements identified at one level determine the scope for the next level, which then considers either their descendants (in the top-down approach) or ascendants (in the bottom-up approach). A top-down algorithm is usually less expensive than a bottom-up one, because matches at a high level of the schema structure restrict the choices for matching finer-grained structure only to those combinations with matching ancestors. However, it can be misled if top-level schema structures are very different, even if finer-grained elements match well. Top-down matching is supported, among others, in our COMA++ system as the so-called fragment-based approach (see Section 8.3), and in the work of [40], both aiming at coping with very large schemas or ontologies. By contrast, a bottom-up algorithm compares all combinations of fine-grained elements, and therefore finds matches at this level even if intermediate and higher-level structures differ considerably.

In general, structural match approaches may be misled by structural conflicts, which occur due to different detail levels of the information represented by schema elements. For example, looking at the children would yield no similarity between a detailed structure for address information, Address with three sub-elements Street, Zip, and City, and another element Address capturing all the information in a single string. This motivates the need for combining multiple approaches to be more robust against structural conflicts. For example, we can consider several kinds of neighbor elements at the same time. If both Address elements above exhibit highly similar parent or sibling elements, they may still be predicted as similar.
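The neighborhood-based idea described above, i.e., computing element-level similarities of the children and aggregating them, e.g., by averaging, can be illustrated as follows. This is a minimal sketch with a toy element-level matcher; real systems would plug in name, type, or fixpoint-based similarities instead, and could consider parents, siblings, or leaves in the same way.

```python
def children_sim(children1, children2, elem_sim):
    """Structural similarity of two inner elements: average of the best
    element-level similarity each child finds among the children on the
    other side (computed in both directions)."""
    if not children1 or not children2:
        return 0.0
    best1 = [max(elem_sim(c1, c2) for c2 in children2) for c1 in children1]
    best2 = [max(elem_sim(c1, c2) for c1 in children1) for c2 in children2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))

# Toy element-level matcher: 1.0 on equal names, 0.0 otherwise.
exact = lambda a, b: 1.0 if a.lower() == b.lower() else 0.0

addr1 = ['Street', 'Zip', 'City']
addr2 = ['Street', 'PostalCode', 'City', 'Country']
print(children_sim(addr1, addr2, exact))  # (1+0+1 + 1+0+1+0) / 7 ≈ 0.57
```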

4.3 Instance-based Matching Instance-based matching examines instance data to determine corresponding schema elements. It represents the alternative of choice when useful schema information is limited, as is often the case for semi-structured data. In the extreme case where no schema is given, a schema can be constructed (to some extent) from instance data using schema extraction techniques, such as [56, 135]. Even when substantial schema information is available, considering instances can complement schema-based approaches with additional insights on the semantics and contents of schema elements. On the other side, the applicability of instance-based approaches depends on the availability of representative instance data. Furthermore, match accuracy depends strongly on the quality of the provided instance data, possibly incurring additional effort for data cleaning. The amount of data to be analyzed in instance-based approaches is usually larger than that in schema-based approaches. Therefore, execution time becomes a more critical issue, especially for interactive use. Typically, elements at the finest level of granularity in a schema, such as columns in relational schemas or attributes in XML schemas, are considered, as these directly accommodate instance values in a database or instance document. We use hereafter the term attribute to indicate such atomic schema elements. While most of the approaches discussed for schema-based matching can also be employed here to characterize instance data, several have been specifically developed for instance-based matching. In the following, we discuss the approaches for instance-based matching differentiated between the element and structure level. At the element level, instances of a single atomic element are considered, while the structure level examines instances of multiple attributes at the same time.

Element-level Approaches For text-based attributes, information retrieval techniques can be applied to obtain linguistic characteristics, such as keywords and themes, based on the relative frequencies of words and word combinations in the attribute instances. For example, using word frequencies we can distinguish between department names and employee names in a company database. By looking for occurrences of special words and abbreviations, we may also be able to recognize instances of geographical names and addresses. Like name matching, the similarity between the extracted themes and keywords can be obtained by using approximate string matching techniques or by looking them up in a lexical dictionary.

For numerical and string attributes, we can determine and compare constraint-based characteristics of their instances, such as data length, data type, numerical value ranges and averages, value distribution (standard deviation), key and uniqueness constraints, frequencies of characters, etc. For example, this information can help recognize instances of phone numbers and zip codes (e.g., based on character frequencies), dates (e.g., based on punctuation patterns), and money-related entries (e.g., based on currency symbols). On the other side, these characteristics represent the real metadata, which may deviate from the specification in the schema. For instance, an attribute declared of type string may in reality be used to store number values. Hence, the real metadata can enhance the schema-based approaches to obtain more accurate match predictions. Besides such heuristic comparisons, constraint-based characteristics can be employed in more sophisticated approaches to determine the instance similarity of attributes. In particular, several algorithms have been proposed with techniques borrowed from the Artificial Intelligence (e.g., machine learning, neural networks) and Information Retrieval (e.g., document clustering, text retrieval) fields to evaluate instance data. The approaches proposed in [34, 35, 7, 8, 42, 141] employ machine learning techniques to first learn the instance characteristics of matching or non-matching attributes and then use them to determine if a new attribute has instances with similar characteristics or not. [137] describes various distance functions to compare attribute instances with different value distributions, such as linear and nominal. [83, 84, 85] utilize a neural network to cluster similar attributes, whose instances are uniformly characterized using a feature vector of constraint-based criteria. The Whirl approach [25] employs the vector space model from Information Retrieval to characterize attribute instances based on the statistics of their common terms and to determine attribute similarity.
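As an illustration of such constraint-based instance characteristics, the following sketch derives a small, hypothetical feature vector from a sample of attribute values. The chosen features (length, digit ratio, distinctness, numeric statistics) are only examples of the statistics mentioned above; a real matcher would compare such vectors, e.g., with a distance function or feed them into a learner or clustering algorithm.

```python
import statistics

def instance_features(values):
    """Derive simple constraint-based characteristics from a sample of
    instance values of one attribute (hypothetical feature set)."""
    strings = [str(v) for v in values]
    numeric = [float(v) for v in values
               if str(v).replace('.', '', 1).lstrip('-').isdigit()]
    return {
        'avg_length':     sum(len(s) for s in strings) / len(strings),
        'numeric_ratio':  len(numeric) / len(values),
        'distinct_ratio': len(set(strings)) / len(values),  # hints at uniqueness/keys
        'value_mean':     statistics.mean(numeric) if numeric else None,
        'digit_ratio':    sum(c.isdigit() for s in strings for c in s)
                          / max(1, sum(len(s) for s in strings)),
    }

# Zip codes vs. free-text city names yield clearly different feature vectors
print(instance_features(['04103', '04109', '10115']))
print(instance_features(['Leipzig', 'Berlin', 'Leipzig']))
```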

Structure-level Approaches Instead of single attributes, structure-level matching considers instances of multiple attributes at the same time and requires characterizing the content of these attribute sets. Obviously, the main problem is the explosion of the number of possible attribute combinations, for which the instances would have to be evaluated. If a schema structure is given, it can be used to identify promising combinations of attributes, such as the columns of single tables in a relational schema. If no schema structure is available and the input schemas are simple collections of attributes, attribute combinations may be selected according to the similarity between their properties, such as equality or compatibility of data types. To characterize and compare instances of multiple attributes, the same linguistic and constraint-based techniques discussed for element-level matching can be employed. In schemas with a structure, such as XML schemas, inner elements, i.e., non-leaves, can be matched by comparing (concatenated) instance values of their lowest-level descendants, i.e., attributes, as done in the LSD system [34] using machine learning techniques. For schemas without structure, [71] proposes a different approach based on the correlation of attributes in a schema. For each input schema, it computes the "mutual information" for all pair-wise combinations of attributes and constructs a corresponding weighted graph. A graph matching algorithm is then applied to find the matching node pairs in the graphs. In particular, the mapping should optimize a chosen distance metric, such as the Euclidean distance, defined on the mutual information of intra-schema attribute pairs. More sophisticated structure-level approaches are proposed in [28] and [141], both aiming at identifying complex n:m correspondences between attributes. The IMAP system described in [28] examines attributes of a particular data type, such as numeric, nominal, and date attributes, to find promising attribute combinations.

It employs machine learning techniques to compare the instances of such attribute combinations and predicts correspondences involving arithmetic functions, such as roomprice = roomrate*(1+taxrate), or string concatenation, such as name = Concat(firstname, lastname). The approach proposed in [141] employs machine learning techniques to first map the attributes of the two input schemas uniformly to the concepts of a domain ontology. Using the relationships between the concepts in the ontology, i.e., is-a and part-of, it identifies correspondences between the attributes, such as union of instances, e.g., phone = Union(day_phone, evening_phone), or composition of instances, e.g., address = Composition(street, zip, city).

4.4 Reuse-oriented Matching Reuse-oriented matching comprises all forms of using auxiliary information, i.e., in addition to schema and instance information given in the input schemas/databases, to enhance the match process. We differentiate between schema-based and mapping-based approaches, which support the reuse of common schema components and of previously determined correspondences, respectively. Like schema- and instance-based matching, both kinds of reuse may in turn consider single elements or structures, i.e., combinations of elements.

Schema-based Reuse To reduce schema heterogeneity, commonly used names may be defined and maintained in a global vocabulary or namespace. To refer to the same real-world concepts, these names can be used to declare elements (e.g., by specifying their names, data types, value ranges, etc.) in different schemas. Namespaces are already supported by several XML-based schema languages, including XSD, but can also be stored in specific dictionaries. A more general approach is to reuse not only globally defined names but also entire schema fragments, including features like data types, keys, and constraints. This is especially rewarding for frequently used entities, such as address, customer, employee, purchase order, and invoice, which can be defined and maintained in a global schema library. Schema editors can access these libraries to encourage the reuse of predefined names and schema fragments. In matching schemas using the same namespaces or schema libraries, similar elements can be quickly identified using exact name matching techniques. On the other side, it is unlikely that different organizations can agree on such standardized names and structures, e.g., sharing the same vocabulary or namespace, to construct their schemas. Therefore, we mostly have to deal with different, but semantically similar, names and structures modeling the same real-world concepts. The idea, currently followed by [92] and [62], is to identify names and structures in previously matched schemas and to capture their characteristics, so that a similar occurrence can be detected in new schemas. In particular, [92] applies machine learning techniques to a corpus of schemas to cluster similar elements and to derive characteristics concerning the co-occurrence and ordering of their neighbor elements. Such characteristics are then used as constraints to identify similar elements in new schemas. [62] considers multiple small, highly overlapping schemas from a limited domain at the same time. Assuming the same semantics for attributes with equal names, i.e., from the same vocabulary, it develops a statistical model based on the co-occurrence of the attributes in single schemas to predict synonym attributes between multiple schemas.

Mapping-based Reuse Many of the discussed match approaches already implement some kind of mapping-based reuse by exploiting previously determined similarity relationships or correspondences encoded in different kinds of auxiliary sources, such as dictionaries, thesauri, domain-specific synonym tables, and user-provided matches or mismatches. For example, schema-based approaches look up such auxiliary sources and check if element names are synonyms or data types are compatible to decide about element similarity [29, 34, 88]. Instance-based approaches, e.g., [34, 35, 7, 8, 42, 141], employ user-specified correspondences to learn instance characteristics for single schema elements. Both cases represent a simple form of reuse utilizing confirmed correspondences at the level of single schema elements. A more general approach is to combine semantic correspondences with element relationships provided by the input schemas, such as the schema structure, which in fact also expresses some degree of relatedness, i.e., similarity, between schema elements. As a result, multiple correspondences are reused at the same time for a combination of elements. One technique is the exploitation of known correspondences between element neighborhoods to compute element similarity. For example, to obtain the similarity between two elements, the DIKE prototype [112] searches for semantic relationships between their neighbor elements and aggregates their similarities using an average function. Another technique is to transitively combine existing similarity relationships, which may be both semantic and structure relationships. In the MOMIS prototype [9], for example, the input schemas are first connected with each other according to an initial set of synonymy and hypernymy relationships between their elements. These semantic relationships are assigned a higher similarity than the containment relationships of elements in the input schemas. To compute the similarity between two elements a and d, MOMIS determines a path between them, such as a↔b↔c↔d, and multiplies the similarities of the relationships constituting the path, i.e., a↔b, b↔c, and c↔d, each of which may be a synonymy, hypernymy, or containment relationship. Combining existing similarity relationships to derive new ones can help increase the reuse potential of an auxiliary source. For example, within a taxonomy, we may reuse the given semantic relationships to infer the similarity of concepts that are not directly linked. Although the obtained relationships are typically of lower plausibility, they allow us to compute a similarity between schema elements that are not similar to the same, but to different concepts in the taxonomy. Identifying new similarity relationships from a set of given ones is a match problem of the auxiliary source, e.g., a taxonomy, against itself. A common approach to compute new similarity relationships is to apply different distance metrics on the paths between the concepts. A simple metric is the length of the shortest path. For more accurate similarity, we can take into account the semantics, e.g., is-a, part-of, synonymy, or hypernymy, of the edges on the paths. By assigning different similarity values to the relationships, the similarity between two concepts can be computed by multiplying the similarities of the relationships on the path connecting them with each other [9, 19].
More sophisticated distance metrics, which have been developed in the context of word sense disambiguation [123, 117, 130] to estimate the relatedness of terms in a taxonomy (e.g., WordNet), can also be employed for this purpose.
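A minimal sketch of this path-based reuse follows. The relationship types and their similarity weights are hypothetical; the sketch only illustrates the multiplication of edge similarities along a path connecting two elements, as done, e.g., in MOMIS.

```python
# Hypothetical similarity weights per relationship type: semantic edges
# (synonymy, hypernymy) count more than structural containment edges.
REL_SIM = {'synonym': 1.0, 'hypernym': 0.8, 'contains': 0.6}

def path_sim(path):
    """Similarity of two elements connected by a path of typed relationships,
    computed by multiplying the similarities of the edges on the path."""
    sim = 1.0
    for rel in path:
        sim *= REL_SIM[rel]
    return sim

# a <-synonym-> b <-contains-> c <-hypernym-> d
print(path_sim(['synonym', 'contains', 'hypernym']))   # 1.0 * 0.6 * 0.8 = 0.48
```

Longer paths thus yield lower similarities, reflecting the lower plausibility of transitively derived relationships.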

4.5 Combination Approaches The applicability of a match approach depends on the kind of information it exploits. A matcher utilizing a single approach is unlikely to achieve good match quality for a large variety of schemas and domains. Hence, current schema matching systems are mostly built on the combination of multiple approaches. This can be done in two different ways: hybrid and composite. A hybrid matcher integrates multiple approaches in a fixed way, while a composite matcher combines the results of independently executed matchers, which may in turn be hybrid or composite matchers.

Hybrid Combination In a hybrid matcher, the single match criteria or information sources are hard-wired, mostly checked in a particular order according to some specific heuristics. For example, elements with data type similarity are first identified, before they are checked for name similarity. With such heuristics known in advance, it is easy to optimize the execution within the corresponding hybrid matcher. As a result, a hybrid matcher can offer better time performance than the separate execution of multiple matchers by reducing the number of passes to be performed over the input schemas. Furthermore, effectiveness may be improved because poor match candidates qualifying for only one of several criteria can be singled out early. Most previous prototypes, for example, [88, 93, 84, 97, 9], follow this approach to combine different algorithms in a specific match solution. It allows developing and implementing an optimized solution that explicitly exploits certain characteristics, for example, those typically supported by the language or the domain of the schemas. On the other side, it is difficult to extend and improve a hybrid matcher, for example, in order to support a new schema type, new match criteria, or new information sources. While hybrid approaches are unlikely to be successful for a large variety of match tasks and domains, they can be combined within a composite matcher, discussed below, to achieve high flexibility and adaptability.

Composite Combination A composite matcher combines the results of independently executed matchers, which may in turn be composite or hybrid matchers. Depending on the application domain or schema language, it allows us to select from a repertoire of matchers for combination, offering significant flexibility compared to the hybrid approach. For example, different matchers can be employed to match structured or semi-structured schemas. Moreover, there are different possibilities for combining the individual match results. For example, one could use machine learning to combine the independent matchers, as done in the LSD system [34] to combine instance-based matchers, and in [42] to combine both schema- and instance-based approaches. Both the selection of the matchers and the strategy for combining them can also be set to defaults for automatic execution. As similarity computation or match prediction is done in the individual matchers, the composite approach leaves little room for optimizing its execution performance. For instance, if name similarity is utilized by several hybrid matchers, it is difficult to share the computed name similarities between them when they are combined within a composite matcher. Nevertheless, due to its flexibility and extensibility, the composite approach represents the method of choice for implementing a generic match solution to support different application domains and schema types. We observe that only a few prototypes currently follow this approach to combine match algorithms, among others [34, 35, 42] and our systems COMA and COMA++ developed in this thesis.

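The core of a composite matcher is the aggregation of the similarity values produced by the independently executed matchers. The following sketch is a simplified illustration using a (weighted) average over similarity matrices represented as dictionaries; it is not COMA's actual implementation, and other aggregation functions (e.g., max or min) are possible in the same way.

```python
def combine(sim_matrices, weights=None):
    """Composite combination: aggregate the similarity values produced by
    independently executed matchers into one combined matrix, here by a
    (weighted) average over all matchers."""
    weights = weights or [1.0] * len(sim_matrices)
    combined = {}
    for matrix, w in zip(sim_matrices, weights):
        for pair, sim in matrix.items():
            combined[pair] = combined.get(pair, 0.0) + w * sim
    total = sum(weights)
    return {pair: s / total for pair, s in combined.items()}

# Results of two independently executed matchers over the same element pairs
name_matcher = {('Price', 'Cost'): 0.4, ('Price', 'Comment'): 0.1}
type_matcher = {('Price', 'Cost'): 1.0, ('Price', 'Comment'): 0.0}
print(combine([name_matcher, type_matcher]))
# {('Price', 'Cost'): 0.7, ('Price', 'Comment'): 0.05}
```

A subsequent selection step, e.g., picking for each element the counterpart with the highest combined similarity above a threshold, would then turn the combined matrix into the final match result.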

4.6 Match Cardinality

The result of a match operation between two schemas S1 and S2 is a set of correspondences between their elements. An S1 (or S2) element can participate in zero, one, or many correspondences of the match result. Moreover, within an individual correspondence, one or more S1 elements can match one or more S2 elements. Thus, we have the usual relationship cardinalities, namely 1:1 and the set-oriented cases 1:n, n:1, and m:n, between matching elements, both with respect to different correspondences (global cardinality) and with respect to an individual correspondence (local cardinality). Element-level matching is typically restricted to local cardinalities of 1:1, n:1, and 1:n. On the other side, n:m correspondences require considering the structural embedding of the schema elements, and thus structure-level matching.

Table 4.1 Local match cardinalities

Cardinality | S1 Element | S2 Element | Mapping Expression
1:1 | Price | Cost | Price = Cost
n:1 | FirstName, LastName | Name | Concat(FirstName, LastName) = Name
1:n | Name | FirstName, LastName | Split(Name) = {FirstName, LastName}
m:n | P.PersName, P.DeptNo, D.DeptNo, D.DeptName | A.Person, A.Department | SELECT P.PersName, D.DeptName FROM P, D WHERE P.DeptNo=D.DeptNo = {A.Person, A.Department}

Table 4.1 illustrates the four local cardinality cases using four individual correspondences. The first row shows an equivalence correspondence between Price and Cost of S1 and S2, respectively. When matching multiple S1 (or S2) elements at a time, we need some expression to specify how these elements and their instances are related. For example, the correspondences shown in the second and third rows specify that FirstName and LastName are concatenated to form Name (n:1 cardinality), or conversely, are extracted from Name (1:n cardinality). An example of an m:n correspondence is shown in the fourth row, which uses an SQL expression combining attributes from two source tables P and D in S1, for personnel and department data, respectively, to populate the target table A in S2. The global cardinality cases with respect to the complete match result, i.e., across all correspondences, are largely orthogonal to the cases for individual correspondences. If the source and target elements of the correspondences shown in Table 4.1 do not have any further match candidates, the global cardinality is the same as the local cardinality of the correspondences. Otherwise, the global cardinality may differ from the local cardinality. On the other side, a correspondence with a set-oriented global cardinality, i.e., 1:n, n:1, m:n, may be captured as a single correspondence of the same local cardinality, but also as multiple correspondences of 1:1 local cardinality, such as {FirstName part-of Name, LastName part-of Name} instead of the concatenation function for the second example in Table 4.1. Note that in addition to the match cardinalities at the schema level, there may be different match cardinalities at the instance level. For the first three correspondences in Table 4.1, one S1 instance is matched with one S2 instance (1:1 instance-level match). The example in the fourth row corresponds to an n:1 instance-level correspondence, which combines two instances, one of P and one of D in S1, into one of A in S2.

The example in the fourth row corresponds to an n:1 instance-level correspondence, which combines two instances, one of P and one of D in S1, into one of A in S2. An example of n:m instance-level matching is the association of sale instances of S1 with different aggregate sale instances, such as per month, quarter, etc., of S2. As the examples show, mapping expressions, especially in the set-oriented cases n:1 and n:m, are typically directional and not always invertible. Due to their semantic diversity, it is difficult, if not impossible, to automatically derive all correct mapping expressions for a given match problem. Therefore, most previous match approaches only try to automate the identification of the most plausible correspondences, while leaving open the task of specifying the exact mapping expressions. Typically, one element in one schema is mapped to the element in the other schema with the highest similarity, resulting in global 1:1 or n:1 mappings. With only similarity values attached, correspondences are assumed to be undirectional and invertible, making it easy to manipulate and combine such mappings. So far, we observe only a few efforts, namely [66], [28], and [141], trying to automatically identify global n:m mappings with mapping expressions.

4.7 Summary

We adopted a previously proposed taxonomy to comprehensively survey existing automatic approaches for schema matching. In particular, we differentiated and discussed schema- and instance-level, element- and structure-level, language- and constraint-based, no-reuse and reuse-oriented matchers, and hybrid and composite approaches for combining matchers. With examples from the literature, we discussed the applicability and also pointed to the issues of the different methods. We hope that our survey is useful to programmers who need to implement a match algorithm as well as to researchers looking to develop more effective and comprehensive schema matching systems.


CHAPTER 5 THE COMA AND COMA++ APPROACHES

To investigate the effectiveness of composite match approaches more comprehensively, we first developed the COMA system (Combining Matchers) for combining match algorithms in a flexible way [29]. COMA provides a large spectrum of individual matchers and supports different ways to combine the results of matcher executions. COMA also includes a novel matcher based on a special compose operation for reusing previous match results. The extended version of COMA, COMA++ [2, 33, 120], retains the flexible approach of COMA to combine match algorithms. Furthermore, it implements significant improvements and offers a comprehensive infrastructure to solve large real-world match problems:
• COMA++ internally uses a generic data model to uniformly represent schemas written in different languages, including the powerful W3C XSD and OWL standards. Furthermore, it is able to deal with large schemas, which may be distributed over many documents and namespaces. Schemas and match results (mappings) are maintained in a repository, which supports a variety of high-level operators on these constructs, e.g., to compose, merge, or compare different mappings.
• COMA++ extends COMA with a generic framework to construct more powerful matchers by combining existing ones and to refine previously determined match results. The implementation of matchers has been highly optimized in order to achieve fast execution times for large match problems. Using the matchers, we construct match strategies as workflows of match processing to divide and successively solve complex match tasks in multiple stages.
• COMA++ provides several new matchers and match strategies, including a new scalable match approach to identify context-dependent correspondences between schemas with shared elements and a fragment-based match approach to decompose a large match problem into smaller problems. Furthermore, the reuse match approach implemented in COMA has been further enhanced to fully exploit the potential of previously determined match results.
• Due to the flexibility to customize matchers and match strategies, COMA++ can be used as a platform for systematically and comparatively evaluating different match algorithms and strategies. In fact, to prove the practicability of our approach, comprehensive evaluations based on real-world schemas and ontologies have been performed for COMA++, showing high quality and acceptable execution times of our approach.
• Given that no fully automatic solution is possible, COMA++ offers a comprehensive graphical user interface allowing the match process to be interactively influenced in many ways, in particular before match to configure a match strategy, during match for iterative refinement, as well as after match to manipulate obtained match results.
Before elaborating on the advanced issues concerning the reuse of previous match results and the construction of matchers and match strategies, we first give an overview of COMA/COMA++ in this chapter. After a brief introduction of the architecture in Section 5.1, we illustrate the overall match process in Section 5.2. Section 5.3 focuses on the import of external schemas and the unification of schema conflicts due to alternative modeling styles. Section 5.4 describes the matchers currently implemented in the Matcher Library. Section 5.5 discusses the representation of match results and the high-level operations for manipulating them. Section 5.6 characterizes the technical implementation and use of the COMA++ prototype.
Section 5.7 briefly summarizes the chapter.

5.1 System Architecture

Figure 5.1 shows the architecture of our COMA++ schema matching system. It consists of five components: the Repository to persistently store match-related data, the Schema and Mapping Pools to manage schemas and mappings in memory, the Match Customizer to configure matchers and match strategies, and the Execution Engine to execute match operations. All components are managed and used through a comprehensive graphical user interface (GUI).

Figure 5.1 COMA++ system architecture

[Figure 5.1 depicts the graphical user interface on top of the Execution Engine (element identification, matcher execution, similarity combination), the Schema Pool, the Match Customizer (Matcher Library and match strategies), and the Mapping Pool, all backed by the Repository with its generic relational schema for objects, sources, and their relationships. External schemas and ontologies are imported; mappings can be exported.]

The repository centrally stores various types of data related to match processing, in particular a) imported schemas, b) produced mappings, c) auxiliary information such as domain-specific taxonomies and synonym tables, and d) the definition and configuration of matchers and match strategies. We use a generic data model implemented in a relational DBMS (MySQL4) to uniformly store the different kinds of schemas as well as the mappings between them.

Schemas are uniformly represented by directed acyclic graphs as the internal format for matching. The Schema Pool provides different functions to import external schemas, to load and save them from/to the repository, and to preprocess them for matching. Currently, we support W3C XML Schema Definition (XSD), Web Ontology Language (OWL), XML Data Reduced (XDR), and relational schemas imported via the Open DataBase Connectivity (ODBC) interface. From the Schema Pool, two arbitrary schemas can be selected to start a match operation. Generated mappings are maintained by the Mapping Pool, which, like the Schema Pool, also offers different functions for further manipulation of the mappings.
The match operation is performed in the Execution Engine according to a match strategy configured in the Match Customizer. As indicated in Figure 5.1, it is based on iterating three steps: element identification to determine the relevant schema elements for matching, matcher execution applying multiple matchers to compute the element similarities, and similarity combination to combine matcher-specific similarities and derive a mapping with the best correspondences between the elements. The obtained mapping can in turn be used as input in the next iteration for further refinement. Each iteration can be individually configured using the various alternatives supported by the Match Customizer, in particular the type of elements to be considered, the matchers for similarity computation, and the strategies for similarity combination, all of which will be discussed in detail in Chapter 7.
Using this infrastructure, match processing is supported as a workflow of several match steps. For large schemas, we implement specific workflows (i.e., strategies) for context-dependent and fragment-based matching. We shortly introduce these match strategies, which will be discussed in detail in Chapter 8:
• Context-dependent matching: Shared schema elements exhibit multiple contexts, such as an address for delivery and invoice, which should be differentiated for a correct matching. In addition to a simple NoContext strategy, i.e., no consideration of element contexts, we support two context-sensitive strategies, AllContext and FilteredContext. AllContext identifies and matches all contexts by considering for a shared element all paths (sequences of nodes) from the root to the element. Unfortunately, this strategy turns out to be expensive and impractical for large schemas with many shared elements due to an explosion of the search space. Therefore, we devised the FilteredContext strategy, which performs matching in two steps and restricts context evaluation to the most similar nodes.
• Fragment-based matching: In a match task with large schemas, it is likely that large portions of one or both input schemas have no matching counterparts. We thus propose fragment-based schema matching, i.e., a divide-and-conquer strategy which decomposes schemas into several smaller fragments and only matches the fragment pairs with a high similarity. In addition to user-selected fragments, we currently support three static fragment types, Schema, Subschema, and Shared, considering the complete schema, single subschemas (e.g., message formats in an XML schema), and shared subgraphs, respectively.

4. www.mysql.com

Figure 5.2 External and internal schema representation

CREATE TABLE ShipTo (
  poNo INT,
  shipToStreet VARCHAR(200),
  shipToCity VARCHAR(200),
  shipToZip VARCHAR(20),
  PRIMARY KEY (poNo)
);
A) A relational schema and an XML schema

[Graph representation: in S1, ShipTo has the children shipToStreet, shipToCity, and shipToZip; in S2, DeliverTo and BillTo share the element Address with the children Street, City, and Zip.]

Sample mapping:
S1 Element | S2 Element | Sim
ShipTo.shipToCity | DeliverTo.Address.City | 0.7
ShipTo.shipToStreet | DeliverTo.Address.Street | 0.7

B) Graph representation and sample mapping

5.2 Match Processing

Schemas are imported from external sources, e.g., relational databases or XML documents, into an internal format on which all match algorithms operate. A schema consists of a set of elements, such as relational tables and columns or XML elements and attributes. In COMA++, we represent schemas by directed acyclic graphs for matching. Schema elements are represented by graph nodes connected by directed links of different types, e.g., for containment, is-a/part-of, and referential relationships. Figure 5.2 shows examples of a relational schema, S1, and an XML schema, S2, for purchase orders (PO), their internal graph representation, and a sample match result between them.
Figure 5.3 illustrates match processing in COMA++ with the alternative methods supported for each step. Depending on the specified match strategy, match processing takes place in one or multiple iterations (match iterations) of three steps: the determination of the relevant schema elements for matching, the execution of different matchers, and the combination of the individual match results. In interactive mode, the user can interact with COMA++ in each iteration to select the matchers and the strategies to combine their match results, and to accept or reject the match candidates obtained after an iteration. The interactive approach is useful to test and compare different match approaches for specific schemas and to continuously refine and improve the match result. In automatic mode, the match operation employs a default match strategy with predefined settings for the single steps. This mode is especially useful for applications that already know their most suitable match strategy or that implement their own user interaction interface.
Along with the example shown in Figure 5.4, we now discuss the steps of a match iteration in more detail. After being converted to the internal graph format, the schemas are traversed to determine all schema elements for which the match algorithms calculate the similarity values. Besides nodes as a common element type, COMA++ also supports matching between paths, which are sequences of nodes following the containment links from the schema root to the corresponding nodes. The main motivation for considering paths is to differentiate the contexts of shared elements in a schema.

Figure 5.3 Match processing in COMA++

[Figure 5.3 shows the match strategy as a pipeline from the Schema Pool through element identification (element types: nodes, paths, fragments), matcher execution (Matcher Library, e.g., Name, Leaves, NamePath), and similarity combination (combination library: aggregation, direction, selection, combined similarity), producing a similarity cube and finally a mapping in the Mapping Pool. Schema manipulation (import, load, preprocess) and mapping manipulation (edit, diff, intersect, merge, MatchCompose, compare) complement the pipeline.]

Such an element, e.g., Address in S2 in Figure 5.2b, results in multiple paths, i.e., DeliverTo.Address and BillTo.Address, for which we can independently determine match candidates. Finally, fragments, i.e., subgraphs of (large) schemas, can be identified from the input schemas for fragment-based matching.
A main step during a match iteration is the execution of multiple independent matchers chosen from the Matcher Library. Currently, more than 15 matchers, falling into two classes, hybrid and combined matchers, are supported. They exploit different kinds of schema information, such as names, data types, and structural properties, and auxiliary information, such as synonym dictionaries and previous match results. Each matcher determines an intermediate match result consisting of a similarity value between 0 (strong dissimilarity) and 1 (strong similarity) for each combination of S1 and S2 schema elements. The result of the matcher execution phase with k matchers, m S1 elements, and n S2 elements is a k*m*n cube of similarity values. Step 2 of Figure 5.4 shows a sample extract from the similarity cube computed for the path elements of the purchase order schemas in Figure 5.2 using the two matchers Name and NamePath.
The final phase in a match iteration is to derive a combined match result from the individual matcher results stored in the similarity cube. This is achieved in three steps, Steps 3 to 5 in Figure 5.4: aggregation of matcher-specific results, ranking of elements according to their similarity, and selection of match candidates. First, for each combination of schema elements, the matcher-specific similarity values are aggregated into a single similarity value, e.g., by taking the average (Average), leading to a similarity matrix as shown in Step 3 of Figure 5.4. Second, we apply a direction strategy, e.g., SmallLarge, which ranks the elements of the smaller schema S1 for each element of the larger schema S2, leading to the ranking shown in Step 4 of Figure 5.4. Finally, we apply a selection strategy to choose the most plausible correspondences, e.g., by selecting the top N candidates from the ranking or those candidates with a similarity exceeding a certain threshold. As shown in Step 5 of Figure 5.4, by selecting the single best element (Max1) we determine ShipTo.shipToCity as the match candidate of DeliverTo.Address.City with a similarity value, or plausibility score, of 0.7.
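To make the combination phase more concrete, the following minimal Java sketch illustrates Average aggregation followed by Max1 selection for a directional match. All class and method names are hypothetical illustrations rather than the actual COMA++ API, and the real system supports further aggregation, direction, and selection strategies.

import java.util.*;

// Minimal sketch of the similarity-combination phase (hypothetical names, not the COMA++ API).
public class CombinationSketch {

    // Average aggregation: matcher-specific results (s1 -> (s2 -> sim)) are averaged per element pair.
    public static Map<String, Map<String, Double>> aggregateAverage(
            List<Map<String, Map<String, Double>>> matcherResults) {
        Map<String, Map<String, Double>> matrix = new HashMap<>();
        for (Map<String, Map<String, Double>> result : matcherResults) {
            for (var e1 : result.entrySet()) {
                for (var e2 : e1.getValue().entrySet()) {
                    matrix.computeIfAbsent(e1.getKey(), k -> new HashMap<>())
                          .merge(e2.getKey(), e2.getValue() / matcherResults.size(), Double::sum);
                }
            }
        }
        return matrix;
    }

    // Max1 selection: for each S2 element, keep the single best S1 candidate (directional match).
    public static Map<String, Map.Entry<String, Double>> selectMax1(
            Map<String, Map<String, Double>> matrix) {
        Map<String, Map.Entry<String, Double>> best = new HashMap<>();
        for (var e1 : matrix.entrySet()) {
            for (var e2 : e1.getValue().entrySet()) {
                var current = best.get(e2.getKey());
                if (current == null || e2.getValue() > current.getValue()) {
                    best.put(e2.getKey(), Map.entry(e1.getKey(), e2.getValue()));
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two matcher-specific results (Name, NamePath) for the purchase-order example.
        Map<String, Map<String, Double>> name = Map.of(
            "ShipTo.shipToCity", Map.of("DeliverTo.Address.City", 0.6),
            "ShipTo.shipToStreet", Map.of("DeliverTo.Address.City", 0.5));
        Map<String, Map<String, Double>> namePath = Map.of(
            "ShipTo.shipToCity", Map.of("DeliverTo.Address.City", 0.8),
            "ShipTo.shipToStreet", Map.of("DeliverTo.Address.City", 0.7));

        var matrix = aggregateAverage(List.of(name, namePath));
        // Expected output: DeliverTo.Address.City -> ShipTo.shipToCity (approx. 0.7)
        selectMax1(matrix).forEach((s2, cand) ->
            System.out.println(s2 + " -> " + cand.getKey() + " (" + cand.getValue() + ")"));
    }
}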

Figure 5.4 Match processing example

1. Element identification (Paths): S1 paths {ShipTo.shipToCity, ShipTo.shipToStreet, ...}; S2 paths {DeliverTo.Address.City, ...}
2. Matcher execution (Name, NamePath) produces the similarity cube:
   ShipTo.shipToCity ↔ DeliverTo.Address.City: Name 0.6, NamePath 0.8
   ShipTo.shipToStreet ↔ DeliverTo.Address.City: Name 0.5, NamePath 0.7
3. Aggregation (Average) produces the similarity matrix:
   ShipTo.shipToCity ↔ DeliverTo.Address.City: 0.7
   ShipTo.shipToStreet ↔ DeliverTo.Address.City: 0.6
4. Direction (SmallLarge): rank the S1 elements for each element of the larger schema S2.
5. Selection (Max1), match result: ShipTo.shipToCity ↔ DeliverTo.Address.City, 0.7

COMA++ supports the determination of directional and undirectional match results. The former case is illustrated in Figure 5.4, in which all match candidates are identified with respect to one of the schemas, say S2. Hence, the system only tries to find match candidates for S2 elements, while accepting that S1 elements remain unmatched. This approach has been followed by most previous studies and is motivated by the fact that many applications require such a directional match (e.g., to integrate a new data source with schema S1 into a data warehouse or mediator with global schema S2). If the target schema S2 is small compared to S1, the match problem is substantially simplified. In the case of an undirectional match, match candidates are determined for both input schemas. Moreover, an S1 element s1 is only accepted as a match candidate for an S2 element s2 if s2 is also a match candidate of s1. For instance, in the above example we would accept ShipTo.shipToCity as the match candidate of DeliverTo.Address.City only if there is no better S2 match candidate for ShipTo.shipToCity than DeliverTo.Address.City.

5.3 Schema Import and Representation

Current match systems focus not only on small but also on structurally simple schemas based on XML DTDs and traditional database schema languages. However, many web and XML-based applications require powerful and flexible schema support with advanced modeling capabilities, such as user-defined types, reuse of components, and distributed schemas and namespaces, which are now available in several modern schema languages, such as W3C XSD and OWL. As a result, matching schemas that take advantage of such improvements becomes much more challenging than matching DTDs or simple relational (e.g., SQL-92) schemas [120]. In this section we discuss the problems and our solution to import external schemas into our internal graph representation. While we choose XSD, one of the most powerful schema languages, for illustration purposes, our approach is also applicable to other languages with similar modeling capabilities.

Alternative Schema Designs

For flexibility reasons, modern schema languages often support alternative approaches to model the same or similar real-world concepts. This, however, leads to different, yet semantically similar, structures to be dealt with in schema matching. In particular, we can observe the following common design conflicts:

1 Schema designs - Monolithic vs. distributed schemas. The conventional way to construct a schema is to put all components in a single schema document, which is quite handy for simple applications. To better deal with large schemas, especially for web applications, XSD, OWL, and other XML schema languages [80] allow a (large) schema to be distributed over multiple documents and namespaces. For example, each XSD document can be assigned to a so-called target namespace, and XSD provides different directives (include, redefine, and import) to incorporate component definitions from one document into another. Furthermore, there are many options for organizing distributed schemas and namespaces [140]. A distributed XSD schema is often a collection of several smaller schemas or subschemas sharing common types and elements. For instance, each message format in an E-business schema is a subschema that can - and should - be matched separately.
2 Type designs - Composition vs. derivation. Several languages, including XSD and the object-relational SQL extensions SQL:1999 and SQL:2003, support a versatile system of user-defined types for element and attribute declarations. New types can be constructed using either the composition or the derivation approach. In the former case, a new type is built of elements/attributes of existing types, while in the latter case, a new type is derived from a base type and automatically inherits all its components. Typically, composition and derivation can be applied recursively so that arbitrarily nested type hierarchies are possible.
3 Reuse designs - Inlined vs. shared components. The basic approach of element and attribute declaration is to anonymously specify types inline, resulting in tree-like schemas. To avoid redundant or unnecessarily diverse specifications, previously defined components can be reused at different places, i.e., shared, resulting in a graph structure. A simple approach is element reuse, supported by DTD and XSD. In particular, XSD allows global components, i.e., direct children of the schema element of an XSD document, to be referenced in other type definitions. The more versatile approach is type reuse, supported among others by XSD, SQL:1999, and SQL:2003. In particular, types can be referenced within element or attribute declarations as well as (recursively) within other type definitions. While all three approaches, no-reuse/inline, element reuse, and type reuse, may be mixed within a schema, three design philosophies, each focusing on one of the approaches, have been proposed for XSD [140]. The high flexibility of type reuse makes it a well-suited approach for large business applications.

Design Unification

The success of the match operation essentially depends on the detection and unification of the alternative designs discussed above during schema import. Our approach is to decide on a particular design and transform the other designs into it. For this purpose, we first parse all documents of a schema and capture different kinds of component metadata, among others name, namespace, type, and typespace (the namespace of the type), which help determine the (type and reuse) designs employed. While each component is uniquely identified by its name and namespace, we assume hereafter that the attributes name and type are already namespace-qualified for better readability. As illustrated in Figure 5.5, the schema transformation process encompasses the following four steps:
1 Unifying schema designs. For ease of handling, a distributed schema is transformed into a monolithic one by parsing all schema documents and capturing the relevant components and their cross-references. In the example shown in Figure 5.5a, two XSD documents are parsed and their components are stored together in a single graph. Using name and namespace information, we are able to identify the (complex) type PartyType from document PartyType.xsd and associate it with the elements Supplier and Buyer declared in document po.xsd.

Figure 5.5 Unification of alternative designs: A) distributed to monolithic schema (po.xsd declares Supplier: PartyType and Buyer: PartyType; PartyType.xsd defines PartyType with Name: string); B) type derivation to composition; C) type reuse to element reuse; D) inlined to reuse declarations (the inline Name: string children of Supplier and Buyer within OrderParty are collapsed into a shared element).

2 Unifying type designs. As composition is the common way of type construction, derivation is transformed into composition by propagating the components of the base type to the derived type. As shown in Figure 5.5b, this is done in the graph representation by linking the base type, PartyType, with the derived type, Supplier. The components of PartyType, e.g., Name, thus also become descendants of Supplier.
3 Unifying reuse designs. We aim at a schema graph of instantiable components, e.g., elements and attributes, as only these components appear in instance documents or databases. Therefore, we transform type reuse into element reuse. As shown in Figure 5.5c, we proceed with the result of Step 2 and eliminate the nodes representing types, such as PartyType. The element Name in the type PartyType now becomes a direct descendant of Supplier. At the leaf level we have components of atomic types, such as string and int.
4 Reducing inline declarations. To further compact the schema, we identify identical components defined inline at multiple places and collapse them into a single shared one.

This is done by a fast search operation for components with identical metadata, e.g., name and data type. As illustrated in Figure 5.5d, two different Name elements as children of Supplier and Buyer are transformed into a single shared one.
The result of the import process is a connected graph of instantiable components. By retaining all reuse declarations and identifying further reuse opportunities, we are able to keep the complexity of imported schemas as low as possible. For example, for the largest schema of our evaluation, the E-business standard XcblOrder (see Chapter 11), the number of nodes and paths is reduced from about 1,500 and 40,000 to 800 and 26,000, respectively. Furthermore, the import process generally performs very fast. For XcblOrder, parsing and transforming more than 100 schema documents takes only around 40 seconds on our test machine. This delay only occurs once, when the schema is saved in the repository for later match operations.
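As an illustration of the last unification step, the following Java sketch collapses inline-declared components that share the same metadata key (here simply name and data type). It is a simplified illustration with hypothetical names; the actual import code considers further metadata and fully preserves the graph links of the merged nodes.

import java.util.*;

// Simplified sketch of Step 4 (reducing inline declarations); all names are illustrative only.
public class InlineCollapseSketch {

    record Component(String name, String type) { }          // metadata key used for deduplication
    record Node(int id, Component comp, List<Integer> parents) { }

    // Collapse nodes with identical (name, type) into a single shared node,
    // merging the containment (parent) links of the duplicates.
    static List<Node> collapse(List<Node> nodes) {
        Map<Component, Node> shared = new LinkedHashMap<>();
        for (Node n : nodes) {
            shared.merge(n.comp(), n, (kept, dup) -> {
                kept.parents().addAll(dup.parents());        // re-attach the duplicate's parent links
                return kept;
            });
        }
        return new ArrayList<>(shared.values());
    }

    public static void main(String[] args) {
        // Two inline 'Name: string' declarations under Supplier (id 1) and Buyer (id 2).
        List<Node> nodes = new ArrayList<>(List.of(
            new Node(10, new Component("Name", "string"), new ArrayList<>(List.of(1))),
            new Node(11, new Component("Name", "string"), new ArrayList<>(List.of(2)))));
        // After collapsing, a single shared Name node remains with parents [1, 2].
        collapse(nodes).forEach(n -> System.out.println(n.comp().name() + " parents=" + n.parents()));
    }
}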

5.4 Matcher Library

Table 5.1 gives an overview of the matchers we have implemented and tested so far. We characterize the kinds of schema and auxiliary information they exploit. In the following, we first describe the hybrid matchers, followed by the combined matchers.5 The more complex hybrid matcher, Reuse, which is based on the reuse of previous match results, is discussed in Chapter 6.

Hybrid Matchers

Element names represent an important source for assessing the similarity between schema elements. This can be done syntactically by comparing the name strings or semantically by comparing their meanings. Approximate string matching techniques [59] have already been employed in other fields, such as record linkage [138] and data cleaning [118], to detect duplicate database records concerning the same real-world entity, i.e., matching at the instance level. In COMA/COMA++, we implemented four hybrid matchers based on approximate string matching techniques (a small illustrative sketch follows the matcher list below):
• Affix: This matcher looks for common affixes, i.e., both prefixes and suffixes, between two name strings.
• N-Gram: Strings are compared according to their sets of n-grams, i.e., sequences of n characters, leading to different variants of this matcher, e.g., Digram (2-grams) and Trigram (3-grams).
• EditDistance: String similarity is computed from the number of edit operations necessary to transform one string into the other (the Levenshtein metric [59]).
• SoundEx: This matcher computes the phonetic similarity between names from their corresponding soundex codes.
Further hybrid matchers are the simple reuse-oriented matchers Synonym, Taxonomy, and Type, exploiting an auxiliary source for similarity:

5. We use the more accurate terms hybrid and combined matchers instead of simple and hybrid matchers, respectively, introduced in the first publication of COMA [29]. In particular, we want to emphasize, first, the flexibility to integrate new match algorithms into our Matcher Library, which may already consider multiple criteria and properties, i.e., hybrid, and second, the flexibility to specify and change the default matchers of a combined matcher as opposed to traditional hybrid matchers.

• Synonym: This matcher estimates the similarity between element names by looking up terminological relationships in an external dictionary. Currently, it simply uses relationship-specific similarity values, e.g., 1.0 for a synonymy and 0.8 for a hypernymy relationship.
• Taxonomy: This matcher extends the reuse approach of Synonym and exploits a domain-specific taxonomy as auxiliary source. To estimate the similarity between element names, their matching concepts are first looked up in the taxonomy. The similarity of the concepts is computed using some distance metric applied to the path connecting the concepts in the taxonomy and returned as the similarity between the element names.6
• Type: This matcher uses a synonym table specifying the degree of compatibility between a set of predefined generic data types, to which the data types of schema elements are mapped in order to determine their similarity.
Finally, we provide a hybrid matcher exploiting structural statistics of graph nodes:
• Statistics: This matcher uses the Euclidean distance function to compute the similarity between structural statistics, which are determined for single schema elements using a feature vector uniformly capturing the number of children, parents, leaves, etc.
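To give an impression of how such a string matcher can be realized, the following Java sketch computes a Dice-coefficient similarity over trigram sets, which is one common way to implement an n-gram matcher. The concrete formula and normalization used in COMA++ may differ, and the class name is purely illustrative.

import java.util.*;

// Illustrative trigram matcher using the Dice coefficient; not the exact COMA++ implementation.
public class TrigramSketch {

    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        String lower = s.toLowerCase();
        for (int i = 0; i + n <= lower.length(); i++) {
            grams.add(lower.substring(i, i + n));
        }
        return grams;
    }

    // Dice similarity: 2 * |common grams| / (|grams1| + |grams2|), a value in [0, 1].
    static double similarity(String a, String b) {
        Set<String> ga = ngrams(a, 3), gb = ngrams(b, 3);
        if (ga.isEmpty() || gb.isEmpty()) return 0.0;
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return 2.0 * common.size() / (ga.size() + gb.size());
    }

    public static void main(String[] args) {
        System.out.println(similarity("shipToCity", "City"));      // some trigram overlap (cit, ity)
        System.out.println(similarity("shipToCity", "ZipCode"));   // no trigram overlap
    }
}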

Table 5.1 Implemented matchers in the Matcher Library

Technique | Matcher Name | Schema Info | Auxiliary Info | Default Matchers
Hybrid Matchers
String matching | Affix | Element names | - | -
 | NGram | Element names | - | -
 | SoundEx | Element names | - | -
 | EditDistance | Element names | - | -
Reuse-oriented | Synonym | Element names | External dictionaries | -
 | Taxonomy | Element names | Domain taxonomy | -
 | Type | Data types | Compatibility table | -
 | Reuse | - | Previous match results | -
Structure | Statistics | Structural statistics | - | -
Combined Matchers
Element-level | Name | Element names | - | Synonym, Trigram
 | Comment | Element comments | - | Synonym, Trigram
 | NameType | Names + Data types | - | Name, Type
Structure-level | NameStat | Names + Statistics | - | Name, Statistics
 | NamePath | Names + Paths | - | Name
 | Children | Child elements | - | NameType
 | Leaves | Leaf elements | - | NameType
 | Parents | Parent elements | - | Leaves
 | Siblings | Sibling elements | - | Leaves

6. This matcher is under development as we are still experimenting with different distance metrics and methods to combine the schema-to-taxonomy and intra-taxonomy similarities. Therefore, we omit it in the rest of the thesis.

Combined Matchers

The combined matchers utilize multiple other matchers, which can themselves be hybrid or combined matchers, to obtain more accurate element similarities. The approach for combining the results of the constituent matchers within a combined matcher follows the same principles used for combining the matcher results in the final phase of a match iteration. The combined matchers are in turn used to construct more complex match strategies (see Chapter 8). We currently support three element-level combined matchers, Name, Comment, and NameType, and six structural combined matchers, NameStat, NamePath, Children, Leaves, Parents, and Siblings. All approaches rely to different degrees on similarities derived from element names, for which combinations of the hybrid matchers discussed above can be utilized (e.g., Synonym and Trigram). We describe the main idea of the single combined matchers in the following (a small sketch of the name preprocessing follows the list below), while the details of how matchers are combined within a combined matcher are explained in Section 7.3.
• Name: This matcher only considers element names but is a combined approach because it combines different hybrid matchers. It performs some preprocessing steps, in particular a tokenization to derive the set of components (tokens) of a name, e.g., POShipTo → {PO, Ship, To}. Moreover, it expands abbreviations and acronyms, e.g., PO → {Purchase, Order}. Name then applies multiple string matchers, such as Synonym and Trigram, to the token sets of the names and combines the obtained token similarities to derive similarity values between element names (see Section 7.3).
• Comment: This matcher is identical to the Name matcher, except that it compares element comments. Like names, comments are tokenized and known abbreviations are expanded to their full form. The token sets are compared using multiple string matchers, e.g., Synonym and Trigram. The token similarities are then combined to obtain the similarity between element comments. As this kind of schema information is not always available in schemas, the applicability of Comment is limited.
• NameType: This element matcher combines the Name and Type matchers, i.e., it matches elements based on a combination of their name and data type similarity. It works best for elements at the finest level of granularity, such as attributes in XML and columns in relational schemas, as they typically offer both name and data type information.
• NameStat: This matcher determines the similarity between elements based on their name and structural similarity, which are computed by the Name and Statistics matcher, respectively. It aims at differentiating inner elements, which lack data types but provide rich structural information.
• NamePath: Given paths as inputs, this matcher uses Name to compute name similarities between the nodes on the paths and derives a combined value to indicate the path similarity. In particular, considering the complete name path of an element provides additional tokens for name matching, which may improve match accuracy. For instance, this can be helpful to find match candidates at different schema levels, e.g., ShipTo.street and ShipTo.Address.street. On the other hand, it makes it possible to distinguish between different contexts of the same element, such as ShipTo.Street and BillTo.Street.
• Children: This structural matcher is used in combination with a leaf-level matcher. It follows a bottom-up approach to propagate element similarity:
the similarity between elements is derived from the similarities between their children, i.e., direct descendants, which can be both inner and leaf elements. The similarity between inner elements is recursively computed from the similarities between their respective children. The similarity between leaf elements is obtained from the leaf-level matcher, for which NameType is used as the default.
• Leaves: This matcher also performs bottom-up similarity propagation and is used in combination with a leaf-level matcher, for which NameType is set as the default. Unlike Children, Leaves only considers the leaf elements to estimate the similarity between two inner elements. This strategy aims at more stable similarities in cases of structural conflicts, i.e., when similar elements are modeled at different levels of detail. In Figure 5.2, for example, the elements shipToStreet, shipToCity, etc., are children of ShipTo in S1, while in S2, their matching elements are not children of DeliverTo, but of Address. Children will therefore only find a correspondence between ShipTo and Address, while Leaves can also identify a correspondence between ShipTo and DeliverTo.
• Parents: This structural matcher propagates element similarity in a top-down manner and derives the similarity between elements from the similarities between their parents, which are computed by default by the Leaves matcher. This allows the similarity between two inner elements to influence that of their descendants.
• Siblings: This matcher complements the bottom-up and top-down approaches of Children, Leaves, and Parents by allowing similarity to be propagated from neighboring elements at the same level. Similar to Parents, it employs by default the Leaves matcher to compute the similarity between the sibling elements determined for the input elements.
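The name preprocessing performed by the Name matcher can be sketched as follows in Java. The token splitting at case boundaries and the tiny abbreviation table are illustrative assumptions; COMA++'s actual tokenizer and abbreviation dictionary are more elaborate, and token similarities would be computed by the configured string matchers rather than by exact equality.

import java.util.*;

// Illustrative sketch of the Name matcher's preprocessing (tokenization + abbreviation expansion).
public class NameTokenizerSketch {

    static final Map<String, List<String>> ABBREVIATIONS =
        Map.of("po", List.of("purchase", "order"));   // tiny illustrative dictionary

    // POShipTo -> [po, ship, to] -> [purchase, order, ship, to]
    static List<String> tokenize(String name) {
        List<String> tokens = new ArrayList<>();
        for (String raw : name.split("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|[_\\-\\s]+")) {
            String token = raw.toLowerCase();
            tokens.addAll(ABBREVIATIONS.getOrDefault(token, List.of(token)));
        }
        return tokens;
    }

    // Very crude token-set similarity (shared tokens vs. set sizes), for illustration only.
    static double nameSimilarity(String n1, String n2) {
        Set<String> t1 = new HashSet<>(tokenize(n1)), t2 = new HashSet<>(tokenize(n2));
        Set<String> common = new HashSet<>(t1);
        common.retainAll(t2);
        return 2.0 * common.size() / (t1.size() + t2.size());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("POShipTo"));                    // [purchase, order, ship, to]
        System.out.println(nameSimilarity("POShipTo", "ShipTo"));    // shared tokens: ship, to
    }
}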

5.5 Mapping Representation and Manipulation

The result obtained from a match operation is a mapping specifying the matching elements between two input schemas.7 As already indicated in Figure 5.2b, we consistently capture each pair of matching elements in a single correspondence, i.e., with 1:1 local cardinality, together with a similarity value between 0 (strong dissimilarity) and 1 (strong similarity) indicating the plausibility of the correspondence. However, one element may occur in multiple correspondences, so that n:m match relationships (global cardinality) are also possible, as, for example, firstName↔Name and lastName↔Name. The uniform representation of match results as sets of 1:1 correspondences makes it easier to combine match algorithms with different or varying result cardinalities.
All generated mappings are maintained in memory by the Mapping Pool. Like the Schema Pool, the Mapping Pool provides several functions to perform common management tasks on mappings, such as loading a mapping from the repository into memory, persistently saving a mapping from memory to the repository, and importing and exporting selected mappings. Moreover, inspired by the model management approach [12, 14, 94], we have implemented in the Mapping Pool various operators for automatic mapping manipulation. Table 5.2 shows an overview of these operators, which are briefly described in the following:

7. We use ↔ to denote a (similarity-based) mapping between two schemas, e.g., m: S1↔S2 or simply S1↔S2, or a single correspondence between two elements, e.g., c: s1↔s2 or simply s1↔s2 with s1∈S1 and s2∈S2.

Table 5.2 Operations on mappings

Operation: Description/Output
Transpose(m: S1↔S2): {s2↔s1 | s1↔s2 ∈ m}
Domain(m: S1↔S2): {s1 ∈ S1 | ∃s2 ∈ S2: s1↔s2 ∈ m}
InvertDomain(m: S1↔S2): S1 \ Domain(m: S1↔S2)
RestrictDomain(m: S1↔S2, S): {s1↔s2 | s1↔s2 ∈ m ∧ s1 ∈ S}
SchemaMerge(m: S1↔S2): S1 ∪ InvertRange(m: S1↔S2)
MappingMerge(m1: S1↔S2, m2: S1↔S2): {s1↔s2 | s1↔s2 ∈ m1 ∨ s1↔s2 ∈ m2}
Diff(m1: S1↔S2, m2: S1↔S2): {s1↔s2 | s1↔s2 ∈ m1 ∧ s1↔s2 ∉ m2}
Intersect(m1: S1↔S2, m2: S1↔S2): {s1↔s2 | s1↔s2 ∈ m1 ∧ s1↔s2 ∈ m2}
MatchCompose(m1: S1↔S, m2: S↔S2): {s1↔s2 | ∃s ∈ S: s1↔s ∈ m1 ∧ s↔s2 ∈ m2}
Compare(m1: S1↔S2, m2: S1↔S2): Quality of m1 with respect to m2 containing the correct correspondences

• Transpose: This operator swaps the source and target elements in all correspondences of the input mapping. This inversion is possible as the correspondences do not carry mapping expressions, but only similarity values, which are assumed to be undirectional.
• Domain and InvertDomain: Given a mapping, Domain returns the source elements involved in one or more correspondences. InvertDomain, in contrast, returns the source elements that are not involved in any correspondence of the mapping.
• RestrictDomain: This operator takes a mapping and a set of elements as input and returns those correspondences of the mapping whose source element is contained in the element set.
• SchemaMerge: This operator takes a mapping as input and merges the element sets of the input schemas. In particular, it takes the element set of the source schema and adds the non-matching elements of the target schema. The non-matching elements to be added are identified using InvertRange, the counterpart of InvertDomain, which returns the target elements not involved in the mapping, i.e., not matched.
• MappingMerge: This operator takes as input two mappings involving the same source and target schemas and produces a new mapping with the unique set of correspondences contained in either mapping. Two correspondences are regarded as equal if both their source and their target elements are the same; otherwise, the correspondences are considered distinct.
• Intersect: This operator takes as input two mappings involving the same source and target schemas and produces a new mapping containing the correspondences contained in both input mappings. It is based on the same equality notion for correspondences as MappingMerge.
• Diff: This operator takes as input two mappings involving the same source and target schemas and produces a new mapping containing those correspondences contained in the first mapping but not in the second one. It also employs the equality notion for correspondences introduced for the MappingMerge operator.
• MatchCompose: Assuming the transitivity of similarity relationships, this operator performs a join-like operation on two or more mappings, such as S1↔S2 and S2↔S3, successively sharing a common schema, to derive a new mapping between S1 and S3. It represents the main mechanism for reusing previously identified match results to solve a new match task and will be discussed in detail in Chapter 6.

• Compare: This operator assesses the quality of a test mapping against an expected/real mapping according to different quality measures. With the expected mapping containing all correspondences that should be found, this operator can be used to evaluate the results of automatic match operations. The quality measures for mapping comparison will be discussed in detail in Chapter 10.
Note that Domain, InvertDomain, RestrictDomain, and SchemaMerge are defined with respect to the source schema of the input mapping. Their counterparts concerning the target schema, such as Range, InvertRange, and RestrictRange, can easily be defined with the help of the Transpose operator, e.g., Range(m) = Domain(Transpose(m)). Except for Compare, all operators yield either a set of schema elements (Domain, InvertDomain, SchemaMerge) or a set of correspondences (RestrictDomain, MappingMerge, Intersect, Diff, and MatchCompose). While the correspondences are simply stored in a new mapping, we support transforming a set of schema elements into a new schema, so that it can be further matched and manipulated like other schemas. This is done by preserving the structural relationships between the elements and their ancestors to make the new schema structurally consistent with the input schemas (see Section 18.3). The new schemas and mappings generated by the operators are automatically added to the Schema and Mapping Pool, respectively, for visualization in the GUI and further manipulation. Besides supporting user interaction, all mapping operations can also be utilized in match strategies to define workflows of match processing; a small sketch of some of these operators follows below.
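For illustration, a few of the simpler operators can be sketched in Java as follows, representing a mapping as a set of correspondences with similarity values. The record and method names are illustrative only and do not reflect the actual COMA++ classes; equality of correspondences here deliberately ignores the similarity value, as described for MappingMerge.

import java.util.*;
import java.util.stream.Collectors;

// Illustrative sketch of a few mapping operators; not the actual COMA++ classes.
public class MappingOpsSketch {

    // A correspondence between a source and a target element with a plausibility score.
    record Corr(String source, String target, double sim) {
        // Two correspondences are considered equal if source and target match, regardless of sim.
        String key() { return source + "\u2194" + target; }
    }

    // Transpose: swap source and target elements of every correspondence.
    static Set<Corr> transpose(Set<Corr> m) {
        return m.stream().map(c -> new Corr(c.target(), c.source(), c.sim()))
                .collect(Collectors.toSet());
    }

    // Domain: all source elements involved in at least one correspondence.
    static Set<String> domain(Set<Corr> m) {
        return m.stream().map(Corr::source).collect(Collectors.toSet());
    }

    // Intersect: correspondences contained in both mappings (ignoring similarity values).
    static Set<Corr> intersect(Set<Corr> m1, Set<Corr> m2) {
        Set<String> keys2 = m2.stream().map(Corr::key).collect(Collectors.toSet());
        return m1.stream().filter(c -> keys2.contains(c.key())).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<Corr> m1 = Set.of(new Corr("ShipTo.shipToCity", "DeliverTo.Address.City", 0.7),
                              new Corr("ShipTo.shipToStreet", "DeliverTo.Address.Street", 0.7));
        Set<Corr> m2 = Set.of(new Corr("ShipTo.shipToCity", "DeliverTo.Address.City", 0.9));
        System.out.println(domain(m1));          // source elements of m1
        System.out.println(intersect(m1, m2));   // the shipToCity correspondence (sim taken from m1)
        System.out.println(transpose(m2));       // swapped source/target
    }
}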

5.6 Implementation and Use

In the following, we characterize the source code of COMA++ to give an impression of the required implementation effort. We then illustrate some representative features of the graphical user interface, which is described in detail in Appendix 18.2.

Figure 5.6 Distribution of COMA++ source code (in lines of code and percent)

[Figure 5.6 breaks down the source code by package: GUI 21,294 lines (33%), Match 13,550 (22%), Evaluation 7,310 (12%), DB 6,070 (10%), Manager 5,798 (9%), Parse 4,832 (8%), Graph 2,217 (4%), Util 985 (2%).]

Implementation

COMA++ (as well as COMA) has been implemented entirely in Java. Figure 5.6 gives an overview of the distribution of the source code of COMA++, which currently consists of more than 62,000 lines of code and is around 3 megabytes in size. The size of each package (in lines of code and percent of the total) gives an impression of the effort required to implement the corresponding components.

Figure 5.7 Configuration of matchers

The GUI package, the largest part of COMA++, implements the graphical user interface and constitutes, with 21K lines, 33% of the code. The Match package accommodates the hybrid and combined matchers. As the second-largest part, it occupies, with 13K lines, 22% of the code. The Evaluation package covers, with 7K lines, 12% of the code, providing optimized routines for the automatic execution of many match operations on large schemas and routines for the quality evaluation of the obtained results. The DB package contains data import and access routines on the generic repository schema and occupies, with 6K lines, 10% of the code. The Manager package comprises the Schema and Mapping Pools and the Execution Engine, making up, with 5.8K lines, 9% of the code. The parsers are of relatively small size; all parsers together constitute only 4.8K lines or 8% of the code. Our internal schema representation is based on the free OpenJGraph Java library8, to which we have added a considerable amount of our own work (more than 2K lines of code).

Graphical User Interface

Given that no fully automatic solution is possible, a user-friendly interface is essential for the practicability and effectiveness of a match system. The graphical user interface of COMA++, implemented in Java Swing, provides the user with many ways to interactively influence the match process, i.e., before match to configure the matchers and the match strategy to be employed, during match for iterative refinement of the proposed correspondences, as well as after match to manipulate the obtained match results. Figure 5.7 shows the matchers currently available in the Matcher Library. The user can examine and make changes to the configuration of a matcher, such as its default constituent matchers and combination strategies. Furthermore, the GUI supports different functions for the management of matchers, in particular to interactively create a new combined matcher by combining existing ones, to delete an existing matcher, or to reset all matcher configurations to the system defaults. Like matchers, implemented match strategies can also be individually configured with different matchers.
Figure 5.8 shows the main window of COMA++ containing a source and a target schema loaded from the Repository for matching. While schemas are internally represented as

8. openjgraph.sourceforge.net

Figure 5.8 Schemas loaded from the Repository for matching

graphs, they are displayed as trees to distinguish between the contexts of shared elements. In automatic mode, a selected match strategy, such as AllContext or FilteredContext, is executed in a single pass using the default configuration. COMA++ also supports step-by-step execution for fragment-based and reuse-oriented strategies, allowing the user to verify and make changes to the outcome before proceeding with the next match iteration.

Figure 5.9 Match results in the Mapping Pool

Figure 5.9 shows the result of a match operation with the identified correspondences drawn as lines between tree nodes. The user can traverse the source or target schema and examine the correspondences for single schema elements. To provide feedback, the user can switch to the edit mode and remove false correspondences or add missing ones by simply clicking on the relevant schema elements shown in the GUI. Manually added correspondences are automatically assigned the highest similarity of 1.0.

Operations for automatic mapping manipulation are available as push buttons or from the menu. The result, either a new schema or a new mapping, is automatically added to the Schema or Mapping Pool, respectively, and shown in the GUI.

5.7 Summary

This chapter gave an overview of COMA/COMA++. External schemas are preprocessed to resolve conflicts due to alternative modeling styles and are uniformly represented as directed graphs for matching. We follow the composite approach to utilize multiple individual matchers and combine their results. The obtained match result can be further manipulated using different operations or manually corrected by the user. The modular architecture allows us to easily extend the system and adapt it to different application domains or match problems. For example, it is relatively easy to add new parsers to the Schema Pool to support additional schema types, new match algorithms to the Matcher Library, or new operations for mapping manipulation to the Mapping Pool.


CHAPTER 6 REUSE OF PREVIOUS MATCH RESULTS

Reuse-oriented matching exploits auxiliary information in addition to the input schemas to improve the effectiveness of the match operation. As a main contribution, we have developed a new match approach that aims at reusing previously obtained match results. It is motivated by our observation that schemas to be matched are often very similar to each other and to previously matched schemas. For example, in E-commerce, substructures often repeat within different message formats (e.g., address fields and name fields). Likewise, in schema evolution, different versions of a schema may substantially overlap and only exhibit small changes. Reusing previous match results is thus very promising, as it may result in significant savings of the (computational and manual) effort which is otherwise required to compare such substructures and to verify the match results obtained for them.
The use of auxiliary information, such as dictionaries, thesauri, and synonym tables, already represents such a reuse-oriented approach, utilizing confirmed correspondences at the level of individual schema elements (names or data types). Our goal is to generalize this idea of element-level reuse based on dictionaries and thesauri and to reuse multiple match correspondences at a time at the level of schema fragments or entire schemas. This generalized approach has been implemented as a standalone matcher, Reuse, in the Matcher Library and can be invoked like other matchers. In particular, it uses a special compose operation, MatchCompose, which performs a join-like operation on a mapping path consisting of two or more mappings, such as A↔B, B↔C, and C↔D, successively sharing a common schema, to derive a new mapping between schemas A and D.
In the next section, we summarize the different opportunities for reusing existing match results to solve a new match problem. In addition to only exploiting existing match results, we also consider strategies involving simple match tasks, which can be quickly solved and combined with existing match results. The identification of such match tasks is discussed in Section 6.2. In Section 6.3, we discuss the MatchCompose operation, followed by the implementation of the Reuse matcher in Section 6.4. The chapter ends with a short summary in Section 6.5.

6.1 Reuse Strategies

To solve a given match problem by reusing previous match results, we assume that the similarity relation between elements is transitive. In particular, if a is known to be similar to b and b to c, we can infer a new similarity relationship, namely that a is also similar to c. Accordingly, our approach is to identify and evaluate those mappings which form a so-called mapping path connecting the input schemas with each other, possibly via one or multiple intermediary schemas providing common elements for transitive similarity relationships. In cases where the transitivity property does not hold, wrong correspondences may be determined. However, as we will see later, this effect can be reduced to a large extent by considering multiple mapping paths or by combining the result obtained by reuse with that of other matchers.

Figure 6.1 Graph of schemas and mappings for reuse
[Panel A) shows a graph of direct mappings between the schemas S1, S2, S2', and S3 with the mappings m1 to m4; panel B) shows the use of a pivot schema PS, to which S1, S2, and S3 are already matched, while S4 is only matched to S2. The legend distinguishes available mappings from schemas not yet matched.]

For easy identification of reuse opportunities, we maintain all schemas and mappings in a graph whose nodes represent the schemas and whose edges represent the mappings between them. Relevant mapping paths can be identified using common graph search algorithms, e.g., for shortest paths, paths of a particular length, or paths involving a particular node (schema). Figure 6.1a shows an example scenario with three schemas, S1, S2, S3, and four match results, m1, m2, m3, m4. While m1 and m2 are mappings between S1 and S2 and between S2 and S3, respectively, m3 and m4 are two alternative mappings between S1 and S3 (e.g., obtained using different algorithms). Based on the concept of mapping paths, we differentiate between the following opportunities for reuse:
1 Direct mappings. This is the ideal case for reuse, in which one or multiple mappings are already available for the given match problem. Such mappings represent the shortest possible mapping paths, which do not involve any intermediary schemas. Considering the scenario in Figure 6.1a, if S1 and S3 are now to be matched, we can return both m3 and m4 as possible results. The user can manually examine the two mappings, select one, or derive her own mapping by selecting from the correspondences provided by both mappings. Another way is to automatically combine m3 and m4 into a single mapping as the match result to be returned. This makes it easier for the user to verify and/or derive her own mapping.
2 Complete mapping paths. In this case, we search for all mapping paths of different lengths, 2, 3, etc. (i.e., the number of involved mappings), connecting the input schemas with each other. Such mapping paths are denoted as complete, as they consist only of existing mappings. For example, the mappings {m1: S1↔S2, m2: S2↔S3} constitute such a mapping path, connecting S1 with S3 through S2 as an intermediary schema. Depending on the availability of mappings, multiple mapping paths may be possible between two schemas and represent different opportunities for reuse. In our example, to derive a new mapping between S1 and S2, we can consider two different mapping paths, {m3, m2} and {m4, m2}.
3 Incomplete mapping paths. To obtain more reuse opportunities, we loosen the assumption about the availability of candidate mappings for reuse. In fact, in many cases, we do not have all required mappings to build complete mapping paths for a

specific match problem. Hence, we also consider incomplete mapping paths involving mappings which are not available yet, but may be computed with less effort than directly matching the input schemas. Currently, we focus on finding incomplete mapping paths with at most one open match task, as the effort required for two or more match tasks likely exceeds the effort required for the original one. In Figure 6.1a, for instance, S1 was already matched to S2, which is an older version of S2'. We can thus combine this existing mapping, m1, with the match result for the task S2↔S2' to solve our task S1↔S2'. This assumes that it is easier to newly match S2↔S2' than S1↔S2', because only a few S2 components are expected to have changed compared to S2'.
If more schemas and mappings are available, for example, as generated and collected over time in a repository, the potential for reuse increases with more possible mapping paths. However, these paths are also often longer, involving multiple intermediary schemas, each of which in turn adds to the likelihood of false/missing correspondences and causes execution-time overhead. In order to avoid long mapping paths, we support the utilization of a so-called pivot schema, e.g., a standard schema or ontology of a domain, acting as the central schema against which all new schemas are first matched. Matching any two schemas can then be performed efficiently by reusing the mappings between them and the pivot schema, resulting in mapping paths with a maximal length of 2. This approach is mainly motivated by the use of global schemas in data integration approaches, such as data warehousing and mediation, in which the schemas of new data sources to be integrated are uniformly matched against and merged into the global schema.
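A breadth-first search over the schema/mapping graph is one straightforward way to find the shortest mapping path between two schemas, as sketched below in Java. The representation and names are illustrative assumptions; COMA++ additionally supports paths of a given length, paths through a particular (pivot) schema, and incomplete paths.

import java.util.*;

// Illustrative BFS for the shortest mapping path between two schemas; names are hypothetical.
public class MappingPathSketch {

    // adjacency: schema -> neighboring schemas reachable via an existing mapping
    static List<String> shortestMappingPath(Map<String, Set<String>> graph,
                                            String source, String target) {
        Map<String, String> predecessor = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>(List.of(source));
        predecessor.put(source, null);
        while (!queue.isEmpty()) {
            String schema = queue.poll();
            if (schema.equals(target)) {
                List<String> path = new LinkedList<>();
                for (String s = target; s != null; s = predecessor.get(s)) path.add(0, s);
                return path;                                   // sequence of schemas on the path
            }
            for (String next : graph.getOrDefault(schema, Set.of())) {
                if (!predecessor.containsKey(next)) {
                    predecessor.put(next, schema);
                    queue.add(next);
                }
            }
        }
        return List.of();                                      // no mapping path available
    }

    public static void main(String[] args) {
        // S1-PS, S2-PS, S3-PS exist; S4 is only matched to S2 (pivot-schema scenario of Figure 6.1b).
        Map<String, Set<String>> graph = Map.of(
            "S1", Set.of("PS"), "S2", Set.of("PS", "S4"), "S3", Set.of("PS"),
            "PS", Set.of("S1", "S2", "S3"), "S4", Set.of("S2"));
        System.out.println(shortestMappingPath(graph, "S1", "S4"));  // [S1, PS, S2, S4]
    }
}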

Figure 6.1b shows an example graph with three schemas, S1, S2, S3, all previously matched to a selected pivot schema PS, and an additional schema S4 not yet matched to PS, but to S2. In fact, matching new schemas against PS can be done offline, so that the schemas are connected step by step with the pivot schema. To match any two schemas among S1, S2, and S3, we can use the mapping path connecting them via the pivot schema, such as {S1↔PS, PS↔S2} for the match task S1↔S2. Note that the pivot schema represents one possible intermediary schema to find the shortest mapping paths for a given match problem. In cases where not all schemas are matched to the pivot schema yet, we may still need to consider longer mapping paths despite traversing via the pivot schema. For example, solving the match task S1↔S4 requires evaluating the mapping path {S1↔PS, PS↔S2, S2↔S4} of length 3. However, the result can then be approved by the user and stored as a direct mapping between S1 and S4.

6.2 Schema Similarity

Evaluating an incomplete mapping path first requires solving the open match tasks, so that their results can be combined with the other mappings on the path. In other words, after solving the open match tasks, an incomplete mapping path is transformed into a complete one and can be evaluated in the same way. The main issue of this reuse strategy thus consists in the identification of such "light-weight" match tasks for a given match problem S1↔S2. For this purpose, we employ different heuristics to determine a schema, say S, which is similar to at least one of the input schemas, say S1. Due to the similarity between the schemas, the match operation between S and S1 promises to yield a good result, which can be combined with other existing mappings to produce a mapping between S1 and S2.
The similarity between two schemas can be determined by comparing schema metadata, such as name, version, namespace, and previously computed statistical data. In particular,

we uniformly capture statistical information on schema structure, such as size (number of all elements), depth (average/maximal path length), number of roots, inner and leaf elements, etc., for all schemas using a feature vector. While string-based metadata, e.g., name, version, and namespace, can be compared using string matchers, such as Name, statistical data is matched using the Statistics matcher based on the Euclidean distance. Furthermore, multiple properties may be examined at the same time and the predicted similarities aggregated to a combined schema similarity.

Another way to obtain the schema similarity is to compare schema elements. From a similarity matrix or match result computed for the element sets of two schemas, we can derive a single similarity value to indicate the schema similarity using a strategy for combined similarity in our Combination Scheme (see Section 7.3). This approach is apparently more expensive than only comparing schema metadata, as the similarities between schema elements have to be computed first. However, once computed, this information can be saved in the repository and reused in later match operations. On the other hand, the approach exhibits two advantages. First, it typically yields a more accurate schema similarity and is therefore more reliable, because the real contents of the schemas, i.e., their elements, are compared, instead of a few metadata fields, which may be encoded differently or not available for all schemas. Second, it opens the door to various strategies employing different matchers to compute element similarities or to reuse previous match results. As also shown in our evaluation later (see Section 11.5), schema similarity computed from element similarities represents a very accurate measure to rank and identify high-quality mappings and mapping paths for reuse.
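As an illustration of the metadata-based variant, the following Python sketch compares two statistics feature vectors with the Euclidean distance and combines the result with a name similarity. The 1/(1+d) normalization and the equal weights are assumptions of this sketch; they are not prescribed by the Statistics matcher.

import math

def statistics_similarity(stats1, stats2):
    # stats1/stats2: feature vectors of structural statistics, e.g.,
    # (size, max depth, avg depth, #roots, #inner elements, #leaves).
    # Mapping the Euclidean distance into [0, 1] via 1/(1+d) is an assumption;
    # scaling the raw counts beforehand is advisable.
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(stats1, stats2)))
    return 1.0 / (1.0 + d)

def metadata_schema_similarity(name_sim, stats_sim, w_name=0.5, w_stats=0.5):
    # Aggregate a string-based similarity of the schema names (e.g., from the
    # Name matcher) with the statistics similarity; equal weights are assumed.
    return w_name * name_sim + w_stats * stats_sim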

6.3 The MatchCompose Operation

The MatchCompose operation represents the main tool of our reuse approach to derive new mappings from existing ones. It evaluates a mapping path consisting of two or more mappings, such as A↔B, B↔C, and C↔D, successively sharing a common schema, to derive a new mapping between A and D. We illustrate the idea of transitively combining existing similarity relationships to predict new ones with the example of a simple mapping path of length 2. In particular, given two match results, m1: S1↔S2 and m2: S2↔S3, sharing a common schema S2, MatchCompose derives a new match result, m3: S1↔S3, between S1 and S3.

In the context of Information Retrieval, transitive similarity estimations have been applied to derive the similarity of words based on terminological relationships, such as synonymy and hypernymy [19, 117]. A common approach to determine the transitive similarity is to multiply the individual similarity values [9]. This approach, however, may lead to rapidly degrading similarity values. For instance, given contactFirstName ↔ Name with similarity 0.5 and Name ↔ firstName with similarity 0.7, the similarity between contactFirstName and firstName would become 0.5*0.7 = 0.35, which is unlikely to reflect the similarity we would intuitively expect for the two names. We thus prefer the alternatives known from combining the results of different matchers, such as Average, Min, and Max, for calculating transitive similarities.

Given two correspondences c1: s1↔s2 in m1 and c2: s2↔s3 in m2 with the similarity values sim12 and sim23, respectively, the similarity of s1 and s3, sim13, can be computed by composing sim12 and sim23 using one of the following strategies:

1 Max: This strategy optimistically returns the maximal similarity value of sim12 and sim23. For our example above, it returns 0.7 as the similarity between contactFirstName and firstName.

2 Average: This strategy returns the average value of sim12 and sim23. It aims at compensating (possibly contradicting) similarity values offered by different mappings and yields a similarity value of 0.6 in the last example.
3 Min: As opposed to Max, this strategy pessimistically chooses the lowest similarity value of sim12 and sim23. For the example, it would assign 0.5 as the transitive similarity between the two elements.

Figure 6.2 The MatchCompose operation

a) MatchCompose example: m3: S1↔S3 is derived by composing m1: S1↔S2 and m2: S2↔S3.
b) Relational representation for MatchCompose (composition strategy: Average):
   m1 (S1, S2, sim12): firstName↔FName 0.8, lastName↔LName 0.7
   m2 (S2, S3, sim23): FName↔Name 0.6, LName↔Name 0.6
   m3 = MatchCompose(m1, m2) (S1, S3, sim13): firstName↔Name 0.7, lastName↔Name 0.65
   (S1 and S3 additionally contain Email, which has no counterpart in S2.)

Figure 6.2a illustrates the approach for the mapping m3: S1↔S3 derived from composing two match results m1: S1↔S2 and m2: S2↔S3. To efficiently calculate the MatchCompose result, m3, we use a relational representation for the input match results. Figure 6.2b shows the tables representing m1, m2 and m3. In these tables, each tuple specifies a 1:1 correspondence between elements of the respective schemas together with their similarity. MatchCompose then corresponds to the natural join between the two input tables. In the general case of processing a mapping path {m1: S1↔S2, m2: S2↔S3, ..., mk: Sk↔Sk+1} of an arbitrary length k>2, MatchCompose successively joins the result of composing two mappings with the next mapping on the mapping path.
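The join-based composition can be sketched in a few lines of Python; the function names and the tuple-based mapping representation are illustrative assumptions, not the actual COMA++ data structures.

def match_compose(m1, m2, compose=lambda a, b: (a + b) / 2):
    # m1, m2: match results as lists of (source, target, similarity) tuples
    # sharing a common middle schema; the composition corresponds to a natural
    # join on the shared elements. 'compose' is the composition strategy
    # (Average by default; min and max work analogously).
    composed = {}
    for s1, s2, sim12 in m1:
        for t2, s3, sim23 in m2:
            if s2 == t2:
                key = (s1, s3)
                sim = compose(sim12, sim23)
                # keep the highest composed similarity if a pair is derivable
                # via several shared elements (an assumption of this sketch)
                composed[key] = max(composed.get(key, 0.0), sim)
    return [(a, b, sim) for (a, b), sim in composed.items()]

def compose_path(mappings, compose=lambda a, b: (a + b) / 2):
    # Successively join the mappings m1, ..., mk of a path of length k >= 2.
    result = mappings[0]
    for m in mappings[1:]:
        result = match_compose(result, m, compose)
    return result

# Example of Figure 6.2 with the Average strategy:
m1 = [("firstName", "FName", 0.8), ("lastName", "LName", 0.7)]
m2 = [("FName", "Name", 0.6), ("LName", "Name", 0.6)]
# match_compose(m1, m2) yields firstName<->Name 0.7 and lastName<->Name 0.65
# (up to floating-point rounding); Email has no counterpart in S2 and is lost.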

Figure 6.3 MatchCompose with undesirable m:n matches

(S1 contains ShipTo.Contact and BillTo.Contact, S2 contains Contact, S3 contains DeliverTo.Contact and InvoiceTo.Contact; m1: S1↔S2 and m2: S2↔S3 both map via the shared element Contact.)

The example in Figure 6.2 also shows that MatchCompose may miss some correspondences, e.g., between Email of S1 and S3, due to the absence of a match counterpart in S2. Furthermore, MatchCompose may return undesirable correspondences when one element of the intermediate schema is related to several elements of the other schemas to be matched. In Figure 6.3, the composition of the two mappings returns all possible correspondences, i.e., ShipTo.Contact is matched to both DeliverTo.Contact and InvoiceTo.Contact, while only the former match is likely to be correct. Both issues, missing correspondences and undesirable m:n correspondences, may occur in each join step performed by MatchCompose and thus reduce the match quality in the case of long mapping paths. The choice of a pivot schema may also introduce similar problems. In particular, a too small pivot schema usually does not provide all candidates for composition, like S2

in Figure 6.2, leading to missing correspondences. On the other hand, a too large pivot schema may in turn support too many possible transitive relationships, leading to false correspondences. However, the negative effects concerning both the composition of transitive similarities and the size of the pivot schema can generally be limited, as shown in our evaluation in Chapter 11, by combining multiple MatchCompose results or by combining MatchCompose results with the results of other, non-reuse matchers.

6.4 The Reuse Matcher

Figure 6.4 illustrates the schema-level reuse approach implemented in the Reuse matcher. All schemas and previous match results are persistently stored in the repository and can be exploited for reuse. In memory, a graph of existing schemas and mappings is maintained as described in Section 6.1 for easy identification of reuse opportunities. Given two schemas S1 and S2 to match, Reuse identifies all opportunities in each category: direct mappings, complete mapping paths, and incomplete mapping paths. If a pivot schema is specified, only those mapping paths involving the pivot schema will be considered. The example in Figure 6.4 shows one direct mapping, two complete mapping paths, and two incomplete mapping paths for the match problem S1↔S2.

Figure 6.4 Schema-level reuse in the Reuse matcher

(The figure shows the Reuse matcher searching the repository for direct mappings, complete mapping paths, and incomplete mapping paths via intermediary schemas, e.g., S1↔Sj, Sj↔S2 or Sk↔S1, S2↔Sk; MatchCompose is applied to each identified path, the per-path results form a similarity cube, and aggregation along the path dimension yields a single similarity matrix.)

In the interactive mode, the user may choose one of the proposed reuse possibilities, such as the direct mapping in the example above, or combine multiple ones to derive a single match result. In the automatic mode, Reuse ranks the identified possibilities according to several criteria, such as the average schema similarity in the path or the expected computational effort expressed by the path length and the size of the involved schemas and mappings, and uses a predefined number of the best mapping paths. If some incomplete mapping paths are to be evaluated, the default match operation is first applied to solve the open match tasks. With all match results available, MatchCompose is applied for each mapping path to produce an S1↔S2 mapping, which represents a possible result for the given match task. Similar to the match results returned by multiple matchers, they constitute a similarity cube with three dimensions: S1 elements, S2 elements, and the mapping paths used to derive the single mappings, as shown in Figure 6.4. The similarity cube is then aggregated along the mapping path dimension using an aggregation strategy (e.g., Average - see Section 7.3) to obtain a similarity matrix, which is in turn stored together with the

results of other matchers in the similarity cube of a match iteration for aggregation and selection as described in Section 5.2.

For a flexible specification of reuse strategies, the Reuse matcher supports four configuration parameters: Mapping path length, Top-k mapping paths, Pivot schema, and Composition/Aggregation strategy. While the first three are combined using the logical AND operator to filter the mapping paths to be considered, the last one influences how similarity values are composed and aggregated:

• Mapping path length: This parameter can be used to specify the exact or maximal length of the mapping paths to be identified. If not specified, all mapping paths are taken into consideration.
• Top-k mapping paths: If specified, this parameter determines the number of the best mapping paths to be selected from the ranking of all mapping paths. Otherwise, all mapping paths are used.
• Pivot schema: If a pivot schema is given, all identified mapping paths are filtered so that they involve it as an intermediary schema. Otherwise, all mapping paths are taken into consideration.
• Composition/Aggregation strategy: This parameter, either Min, Max, or Average, sets the strategy to compose transitive similarities in the MatchCompose operation and to aggregate the MatchCompose results stored in the temporary similarity cube. That is, for simplicity we use the same strategy for both composition and aggregation of similarity values.

Despite the high level of reuse in the Reuse matcher (match results at the level of entire schemas), we believe that there is a high probability of finding the necessary match results for MatchCompose in an environment where many schemas are managed and matched to each other. In particular, with a central repository, all computed match results can be saved, increasing the potential for reuse over time. Moreover, schemas from the same application domain usually contain many similar elements, which are typical of this domain, so that their mappings provide good candidates for reuse.
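A simplified sketch of the automatic ranking might look as follows; scoring a path by the average schema similarity of adjacent schemas and preferring shorter paths on ties is a stand-in for the criteria listed above, and the names are not COMA++ API.

def rank_mapping_paths(paths, schema_sim, top_k=None):
    # paths: candidate mapping paths as schema sequences [S1, ..., S2];
    # schema_sim(a, b): schema similarity of two adjacent schemas on a path.
    def score(path):
        sims = [schema_sim(a, b) for a, b in zip(path, path[1:])]
        # higher average similarity first, shorter paths preferred on ties
        return (sum(sims) / len(sims), -len(path))
    ranked = sorted(paths, key=score, reverse=True)
    return ranked if top_k is None else ranked[:top_k]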

6.5 Summary

In this chapter, we presented a new match approach based on the reuse of previously obtained match results. It assumes the transitivity of similarity relationships between previously identified matching elements. In particular, our approach determines relevant mapping paths consisting of two or more mappings successively sharing a common schema, and performs a join-like operation on each such mapping path to derive a new mapping. The results of multiple mapping paths can be aggregated and combined using the same combination scheme as the results obtained using multiple matchers. The determination of mapping paths may also consider a selected pivot schema and "light-weight" match tasks, which can be quickly computed for combination with the existing mappings. The entire reuse approach is implemented as a standalone matcher and can be invoked like, and combined with, other matchers in the Matcher Library.


CHAPTER 7 MATCHER CONSTRUCTION

Starting with a set of hybrid matchers based on similarity functions such as string matching, dictionary lookups, or the reuse of previous match results, we support constructing more powerful combined matchers from existing matchers, i.e., both hybrid and combined matchers. While COMA used separate implementations of combined matchers, COMA++ uniformly defines them using a generic and customizable implementation, the so-called CombineMatcher. In the following, we first present CombineMatcher (Section 7.1) and possible configuration strategies (Sections 7.2 and 7.3). We then discuss the default configuration of the combined matchers currently defined in the Matcher Library (Section 7.4). Section 7.5 briefly summarizes the chapter.

7.1 CombineMatcher

The main idea of CombineMatcher is to combine the similarities predicted by multiple matchers to determine correspondences between schema elements. Figure 7.1 shows the pseudo-code of CombineMatcher, which supports two methods, Match and Sim (Line 2 and 18, respectively). Given two elements as input, Match determines the correspondences between their related elements (e.g., ascendants or descendants) or constituents (e.g., name tokens), while Sim derives a single value from the similarities between the related elements to indicate the similarity of the input elements. As shown in Line 1, a combined matcher needs to be configured with 1) a strategy for determining the objects to match (oType), 2) a set of default matchers to compute the similarity between the identified objects (defMatchers), and 3) a combination scheme (agg, dir, sel, and combSim), consisting of strategies for aggregation, direction, selection, and combined similarity, to combine the matcher-specific similarities.

Figure 7.1 Pseudo-code of CombineMatcher

1 CombineMatcher(oType, defMatchers, agg, dir, sel, combSim)

2 Match(s1, s2) {
3   //Step 1: Determine elements/constituents to match
4   s1objs = DetermineRelated(s1, oType)
5   s2objs = DetermineRelated(s2, oType)
6   //Step 2: Matcher execution
7   allocate simCube[defMatchers][s1objs][s2objs]
8   for each m in defMatchers
9     for each o1 in s1objs
10      for each o2 in s2objs
11        simCube[m][o1][o2] = m.Sim(o1, o2)
12  //Step 3,4,5: Similarity combination
13  matchResult = Selection(
14    Direction(
15      Aggregation(simCube, agg), dir), sel)
16  return matchResult
17 }

18 Sim(s1, s2) {
19  matchResult = Match(s1, s2)
20  //Step 6: Compute combined similarity
21  sim = CombinedSim(matchResult, combSim)
22  return sim
23 }

As indicated in Figure 7.1, the Match method performs five steps. In the first step, element/constituent identification, the elements or constituents related with the input elements s1 and s2, respectively, are identified according to the oType parameter (Line 4 and 5). In the second step, matcher execution, the default matchers are applied, by calling their Sim methods (Line 11), to compute the similarity for each element pair, resulting in a similarity cube. The next three steps, aggregation, direction, and selection, combine the matcher-specific similarity values in the similarity cube and determine the most plausible element correspondences using the given combination strategies agg, dir, and sel (Line 13-15). In particular, for each element pair, the matcher-specific similarities are first aggregated to a single value, e.g., by taking their average. The direction then determines one schema, whose elements are ranked w.r.t. each element of the other schema according to the aggregated similarities. Finally, the best elements, e.g., the single best one or all those with a similarity above a given threshold, are selected from the ranking as match candidates and returned as the match result.

The Sim method proceeds with the match result returned by Match and derives a single value to indicate the similarity between the input elements s1 and s2. In particular, it applies the specified strategy combSim to the identified correspondences (Line 21). One possible approach for combSim is to compute the average of the similarities exhibited by all correspondences. Note that Figure 7.1 shows the Sim method of combined matchers; hybrid matchers also provide a Sim method, whose result is computed directly and not from the result of another match operation.

When a high number of elements is to be considered, we observe long execution times due to the mutual calls of the Match and Sim methods between the combined matchers and their default matchers.9 As a consequence, we have technically optimized CombineMatcher in different ways. First, we allow both methods to process and compare two sets of elements at a time, i.e., set-valued s1 and s2 input parameters. In particular, by determining the related elements/constituents for all relevant elements at once, we are able to use the (much smaller) set of unique elements/constituents for similarity computation. Second, we introduced a central cache of similarity values to avoid repeated computation in individual matchers. These optimizations allow us to solve very large match tasks in acceptable time, as reported in Chapter 11.

9. Using the unoptimized algorithm, i.e., as implemented in the COMA prototype [29], we were not able to solve the largest match task of our evaluation, with a complexity of about 2,500 x 26,000 paths (see Section 11.1), within an entire day, even using only one matcher (Name).
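The second optimization, the central similarity cache, essentially amounts to memoizing pairwise similarity computations so that different combined matchers requesting the same pair do not recompute them. A minimal sketch, assuming matcher objects with a name attribute and a sim method (both illustrative, not the COMA++ interface):

class SimilarityCache:
    # Memoizes Sim(o1, o2) per matcher; repeated requests from different
    # combined matchers are answered from the cache without recomputation.
    def __init__(self):
        self._cache = {}

    def sim(self, matcher, o1, o2):
        key = (matcher.name, o1, o2)
        if key not in self._cache:
            self._cache[key] = matcher.sim(o1, o2)
        return self._cache[key]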

7.2 Determination of Elements/Constituents

This initial step determines the objects to be compared in the Match method. CombineMatcher can be configured to perform matching at different granularities, in particular for entire schemas (by considering all descendants of the root elements), restricted neighborhoods of elements, and constituents of individual elements to compute element similarity. Table 7.1 illustrates the most common types of related elements and constituents, which are briefly described in the following:

• Constituents: Constituents are derived directly from the properties of the input elements, such as name tokens (NameTokens) and data types (Type), but also comments, previously determined structural statistics, instance characteristics, etc.
• Elements: Related elements of the input elements are identified using the structural relationships given in the schema. In particular, common types of neighbor elements include Children, Leaves, Parents, Siblings, Ascendants, and Descendants of the input elements. The Self object type allows passing the input elements directly to the default matchers. Finally, AscPaths and DescPaths determine the paths from the schema root to the input elements and the paths from the input elements to all their descendants, respectively.

Table 7.1 Examples for types of constituents/related elements (the example schema graph has root 1 with children 2 and 3, node 3 has child 4, and node 4 has the leaves 5, 6, and 7; the input element is node 3)

Type          Object type     Result set
Constituents  NameTokens      {tokenize(3.name)}
              CommentTokens   {tokenize(3.comment)}
              Type            {3.type}
Elements      Self            {3}
              Children        {4}
              Leaves          {5, 6, 7}
              Parents         {1}
              Siblings        {2}
              Ascendants      {1, 3}
              Descendants     {3, 4, 5, 6, 7}
              AscPaths        {1.3}
              DescPaths       {3, 3.4, 3.4.5, 3.4.6, 3.4.7}
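For illustration, the following Python sketch computes some of the object types of Table 7.1 on the example graph; the graph encoding and the function names are assumptions of this sketch.

# Example schema graph of Table 7.1 as parent -> children edges.
children = {1: [2, 3], 2: [], 3: [4], 4: [5, 6, 7], 5: [], 6: [], 7: []}
parents = {c: p for p, cs in children.items() for c in cs}

def descendants(n):
    result = [n]
    for c in children[n]:
        result += descendants(c)
    return result

def leaves(n):
    return [d for d in descendants(n) if not children[d]]

def asc_path(n):
    return (asc_path(parents[n]) + [n]) if n in parents else [n]

def desc_paths(n):
    paths = [[n]]
    for c in children[n]:
        paths += [[n] + p for p in desc_paths(c)]
    return paths

# descendants(3) -> [3, 4, 5, 6, 7], leaves(3) -> [5, 6, 7],
# asc_path(3) -> [1, 3], desc_paths(3) -> [[3], [3, 4], [3, 4, 5], [3, 4, 6], [3, 4, 7]]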

7.3 Combination of Similarity Values

Combination of similarity values is used in two main cases: within the implementation of our combined matchers to combine the results of the default matchers, and in the final step of a match process (or iteration) to combine the results of independent matchers to obtain a complete match result. Both cases are implemented by a series of aggregation, ranking, and selection operations on the similarity cube containing the similarity values calculated by a set of matchers M (Figure 7.2). To determine the complete match result for two input schemas, three main steps are needed; Step 4 is optional:

1 Aggregation of matcher-specific results. In the first step, similarity values computed by multiple matchers are aggregated to a single similarity value for each pair of schema elements. With m S1 elements and n S2 elements, we obtain an m*n matrix of aggregated similarity values.

2 Direction for ranking match candidates. This step determines one schema, from which the elements are then ranked w.r.t. each element from the other schema according to the aggregated similarities.
3 Selection of match candidates. To determine the best match candidate(s), this step applies a filter strategy to determine the most plausible correspondences from the ranking. The result of this step is a match result with 0, 1, or more match candidates per schema element. In the case of an undirectional match, the match candidates for both schemas are determined and combined.
4 Computation of combined similarity. The match result from the previous step can be aggregated into a single similarity value for the two schemas, called schema similarity. It depends on the chosen matchers and their combination strategy.

Figure 7.2 Combination of match results

(The figure shows the processing pipeline from the similarity cube over the similarity matrix to the match results and the combined similarity, with the supported strategies per step: 1. Aggregation of matcher-specific results: Max, Min, Average, Weighted; 2. Match direction: SmallLarge, LargeSmall, Both; 3. Selection of match candidates: Threshold, MaxN, MaxDelta, Threshold+MaxN, Threshold+MaxDelta; 4. Computation of combined similarity: Average, Dice.)

These four steps are also needed for combined matchers to combine the similarity values of their default matchers. However, in this case, these steps are applied to similarity values for the related objects (neighbors, constituents) of schema elements. For instance, a name matcher determines the similarity of names from the similarities of the name tokens, and a structural matcher can derive the similarity of inner nodes from the similarity values of their children or leaves. As a result, the sets S1 and S2 in Figure 7.2, for which similarity values are processed, now refer to these related objects of schema elements. For combined matchers, these similarity values can be determined by different matchers, resulting in a similarity cube which has to be aggregated. Now, the fourth step is no longer optional but required to derive a single similarity value, the element similarity, for a pair of schema objects (names, inner nodes) by combining the similarity values of the match candidates determined in Step 3.

To sum up, we use Steps 1-3 for combining similarity values to obtain the complete match result. For combined matchers we need the additional Step 4. In the following we present the approaches for these steps that COMA and COMA++ currently support; additional approaches can easily be added.

Aggregation of Matcher-specific Results

With sim(s1, s2, m) we denote the similarity between s1 and s2 computed by the matcher m in M. For every pair s1 and s2, one of the following strategies can be used to aggregate their matcher-specific similarity values:

1 Max: This strategy returns the maximal similarity value of any matcher. It is optimistic, in particular in the case of contradicting similarity values. Furthermore, matchers can maximally complement each other.

MaxSim(s_1, s_2) = \max_{m \in M} sim(s_1, s_2, m)

2 Weighted: This strategy determines a weighted sum of the similarity values of the individual matchers and needs relative weights, which should correspond to the expected importance of the matchers.

WeightedSim(s_1, s_2) = \sum_{m \in M} w_m \cdot sim(s_1, s_2, m), \quad \text{with} \sum_{m \in M} w_m = 1

3 Average: This strategy represents a special case of Weighted and returns the average similarity over all matchers, i.e., it considers them equally important.

AverageSim(s_1, s_2) = \frac{1}{|M|} \sum_{m \in M} sim(s_1, s_2, m)

4 Min: This strategy chooses the lowest similarity value of any matcher. As opposed to Max, it is pessimistic.

MinSim(s_1, s_2) = \min_{m \in M} sim(s_1, s_2, m)
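As a small illustration, the four aggregation strategies over a dictionary of matcher-specific similarities could be sketched as follows (illustrative code, not the COMA++ implementation):

def aggregate(sims, strategy="Average", weights=None):
    # sims: matcher name -> similarity for one element pair;
    # weights: matcher name -> relative weight (only used by "Weighted").
    values = list(sims.values())
    if strategy == "Max":
        return max(values)
    if strategy == "Min":
        return min(values)
    if strategy == "Average":
        return sum(values) / len(values)
    if strategy == "Weighted":
        return sum(weights[m] * s for m, s in sims.items())
    raise ValueError("unknown aggregation strategy: " + strategy)

# aggregate({"Trigram": 0.0, "Synonym": 1.0}, "Max") -> 1.0
# aggregate({"Type": 0.5, "Name": 0.8}, "Weighted", {"Type": 0.3, "Name": 0.7}) -> 0.71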

Direction for Ranking Match Candidates

As discussed in Section 5.2, COMA and COMA++ support the determination of directional and undirectional match results. To select match candidates for one element from one schema, all elements from the other schema are ranked in descending order of their similarity value. In particular, given two schemas S1 and S2 with |S2| ≤ |S1|, match candidate ranking can be performed in the following directions:

1 LargeSmall: In this directional approach, we match the larger schema S1 against the smaller target S2, i.e., elements from S1 are ranked with respect to each S2 element.
2 SmallLarge: As opposed to LargeSmall, match candidate selection is performed based on ranking S2 elements for each S1 element.
3 Both: This strategy considers the results from both match directions, LargeSmall and SmallLarge. Furthermore, an S1 and an S2 element are only accepted as a matching pair if they are identified as such in both directions.

Selection of Match Candidates

Given a ranking, for example, of all S1 elements for a particular S2 element, one of the following strategies can be used for selecting the match candidates:

1 MaxN: The n S1 elements with maximal similarity are selected as match candidates. n=1, i.e., Max1, represents the natural choice for 1:1 correspondences. Generally, n>1 is useful in interactive mode to allow the user to select among several match candidates.

2 MaxDelta: The S1 element with the maximal similarity Max is determined as match candidate, plus all S1 elements with a similarity differing at most by a tolerance value d, which can be specified either as an absolute or a relative value. In particular, the tolerance range is defined as [Max-d, Max] and [Max-Max*d, Max] for the absolute and

relative case, respectively. The idea is to return multiple match candidates when there are several S1 elements with the same or almost the same similarity value.
3 Threshold: All S1 elements showing a similarity exceeding a given threshold value t are selected.

A single approach may return imprecise match candidates. While Threshold may return too many match candidates, MaxN and MaxDelta may return match candidates with too little similarity. Thus, we support considering several criteria at the same time, in particular MaxN or MaxDelta in combination with Threshold using a low threshold value, e.g., 0.5.
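A combined selection step along these lines might be sketched as follows; treating the criteria as optional filters that are ANDed together is an assumption of this sketch.

def select(ranking, max_n=None, max_delta=None, threshold=None, relative=True):
    # ranking: list of (element, similarity) pairs for one target element.
    ranking = sorted(ranking, key=lambda pair: pair[1], reverse=True)
    selected = ranking
    if threshold is not None:
        selected = [(e, s) for e, s in selected if s > threshold]
    if max_n is not None:
        selected = selected[:max_n]
    if max_delta is not None and selected:
        best = selected[0][1]
        bound = best - (best * max_delta if relative else max_delta)
        selected = [(e, s) for e, s in selected if s >= bound]
    return selected

# select([("a", 0.90), ("b", 0.88), ("c", 0.40)], max_delta=0.05, threshold=0.5)
# -> [("a", 0.90), ("b", 0.88)]   (relative tolerance range [0.855, 0.90])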

Computation of Combined Similarity

As discussed above, combined matchers require an additional step to obtain a combined similarity value for sets of element constituents or related elements. For this purpose, we support two strategies, namely Average and Dice. They work on the output of Step 3 using a particular selection strategy mc(s1, S2)10, such as Max1, that simply returns the match candidate in S2 with the highest similarity for an S1 element s1. Assuming at most one match candidate per S1 and S2 element, we determine the combined similarity as follows:

1 Average: The average similarity is determined by dividing the sum of the similarity values of all match candidates of both sets S1 and S2 by the total number of set elements, |S1|+|S2|.

AverageSim(S_1, S_2) = \frac{\sum_{s_1 \in S_1 \wedge mc(s_1, S_2) \neq \emptyset} sim(s_1, mc(s_1, S_2)) + \sum_{s_2 \in S_2 \wedge mc(s_2, S_1) \neq \emptyset} sim(s_2, mc(s_2, S_1))}{|S_1| + |S_2|}

2 Dice: This strategy is based on the Dice coefficient [21] and returns the ratio of the number of elements that can be matched over the total number of set elements.

DiceSim(S_1, S_2) = \frac{|\{s_1 \mid s_1 \in S_1 \wedge mc(s_1, S_2) \neq \emptyset\}| + |\{s_2 \mid s_2 \in S_2 \wedge mc(s_2, S_1) \neq \emptyset\}|}{|S_1| + |S_2|}

Unlike in Average, the individual similarity values do not influence the overall similarity of the sets in Dice. Hence, Dice is more optimistic than Average and typically returns higher similarity values. With all element similarities set to 1.0, as, for example, in manually derived match results, both strategies return the same similarity. We will illustrate the two strategies together with the other steps of the combination scheme in the next section using the combined matcher Name.
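Both strategies follow directly from the formulas; the dictionary-based representation of the best match candidates below is an assumption for the sketch.

def average_sim(n1, n2, mc1, mc2):
    # n1, n2: set sizes |S1|, |S2|; mc1: S1 element -> (best S2 candidate, sim);
    # mc2: S2 element -> (best S1 candidate, sim); unmatched elements are absent.
    total = sum(s for _, s in mc1.values()) + sum(s for _, s in mc2.values())
    return total / (n1 + n2)

def dice_sim(n1, n2, mc1, mc2):
    return (len(mc1) + len(mc2)) / (n1 + n2)

# Token sets of the Name matcher example in Section 7.4: |S1| = 2, |S2| = 4.
mc1 = {"deliver": ("ship", 1.0), "to": ("to", 1.0)}
mc2 = {"ship": ("deliver", 1.0), "to": ("to", 1.0), "order": ("deliver", 0.25)}
# average_sim(2, 4, mc1, mc2) -> 0.708... (reported as 0.70)
# dice_sim(2, 4, mc1, mc2)    -> 0.833... (reported as 0.83)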

7.4 Configuration of Combined Matchers

Table 7.2 shows the default configuration, consisting of the element/constituent type, the default matchers, and a tuple of four combination strategies, for the single combined matchers. In particular, they exploit different kinds of constituents or neighborhoods of individual elements to compute element similarity. While the Name and Comment matchers cover all four steps of our combination scheme, other approaches either require only the first step (NameType, NameStat) or the last three steps (Children, Leaves, Parents, Siblings). To illustrate the use of the combination scheme in a combined matcher, we explain the Name matcher in more detail, while only briefly discussing the configuration of the other matchers.

10. mc: Match Candidate


Table 7.2 Construction of combined matchers

Combined matcher  Elements/Constituents  Default matchers   Default combination strategy (Agg, Dir, Sel, CombSim)
Name              NameTokens             Trigram, Synonym   Max, Both, Max1, Avg
Comment           CommentTokens          Trigram, Synonym   Max, Both, Max1, Avg
NameType          Self                   Type, Name         Weighted(0.3, 0.7), -, -, -
NameStat          Self                   Name, Statistics
NamePath          Ascendants             Name               -, Both, Max1, Avg
Children          Children               NameType
Leaves            Leaves                 NameType
Parents           Parents                Leaves
Siblings          Siblings               Leaves

The Name matcher computes element similarities by combining the similarity values for the names' token sets. Figure 7.3 exemplifies the process for the two element names DeliverTo and POShipTo. After tokenizing the names and expanding abbreviations, we obtain the token sets {deliver, to} and {purchase, order, ship, to}, which are then compared using the default matchers Trigram and Synonym. In Step 1, we use Max for aggregating the matcher-specific similarity values from the similarity cube, motivated by the fact that tokens are typically similar according to only some default matchers. For example, string matchers, such as Trigram, find no similarity between ship and deliver, while a semantic matcher such as Synonym can detect the synonymy and assign a high similarity value. Both is employed in Step 2 to obtain undirectional similarity values. Step 3 applies Max1, resulting in two sets of (directional) token correspondences. In Step 4, the similarity between the token sets can be computed either using Average, as in the default setting, or Dice, resulting in 0.70 and 0.83, respectively, which represent the name similarity of DeliverTo and POShipTo.

Figure 7.3 Computing combined similarity in the Name matcher

(a: Similarity cube for the token sets {deliver, to} and {purchase, order, ship, to}, e.g., Trigram(deliver, order) = 0.25, Synonym(deliver, order) = 0.0, Trigram(deliver, ship) = 0.0, Synonym(deliver, ship) = 1.0. b: Similarity matrix after Max aggregation, e.g., deliver↔order 0.25, deliver↔ship 1.0, to↔order 0.2, to↔to 1.0. c: Match results after Both and Max1; DeliverTo→POShipTo: deliver↔ship 1.0, to↔to 1.0; POShipTo→DeliverTo: ship↔deliver 1.0, to↔to 1.0, order↔deliver 0.25. d: Combined similarity; Average: ((1.0+1.0)+(1.0+1.0+0.25))/(2+4) = 0.70; Dice: (2+3)/(2+4) = 0.83.)

NameType and NameStat combine Name with Type and Statistics, respectively. The Self object type is used to delegate the input elements to the default matchers, each of which produces a single similarity value for each pair of schema elements. For Step 1, we use the Weighted strategy; the remaining steps are not necessary, as they do not affect the aggregated similarity. In both NameType and NameStat, the name similarity is assigned a larger weight, 0.7, to emphasize its importance over the type and statistics similarity, both with a weight of 0.3. This permits matching elements with similar names but different data types or different structures. When elements exhibit about the same name similarity, candidates with higher data type or structural compatibility are preferred.

NamePath, Children, Leaves, Parents, and Siblings are similar to each other in that they compare two sets of structurally related elements of the two input elements. In particular, NamePath considers the nodes on the paths, i.e., including the ascendants, of the input elements, while Children, Leaves, Parents, and Siblings consider their children, leaves, parents, and siblings, respectively. Because only one matcher is specified for computing the similarity between the related elements, i.e., Name in NamePath, NameType in Children and Leaves, and Leaves in Parents and Siblings, we do not need to perform aggregation (Step 1), but directly obtain a similarity matrix. Steps 2-4 are then applied to the similarity matrix to obtain a combined similarity as the similarity between the input elements.

7.5 Summary

In this chapter, we described a generic approach to combine the results of individually executed matchers. In particular, each matcher produces a similarity matrix for two sets of elements. By uniformly storing such matcher-specific results in a similarity cube, we can apply a variety of methods to aggregate the similarity values, to rank the match candidates, to select the best correspondences, and eventually to derive a combined similarity. The combination scheme can be applied in different situations, in particular, matching the elements of two schemas to determine their correspondences, or matching the neighbors/constituents of two single elements to compute their similarity. Likewise, the notion of the combined similarity derived from a similarity cube can be used to indicate not only element similarity, but also the similarity of schema fragments and entire schemas. In particular, it is employed in the reuse-oriented match approach to determine similar schemas and in the fragment-based match approach to determine similar schema fragments, as discussed in the next chapter.


CHAPTER 8 MATCH STRATEGIES

Besides the combination of individual matcher results, the ability to refine a previously identified match result is a further prerequisite for building workflows of match processing. We thus extended CombineMatcher in such a way that it can utilize a combination of matchers to derive a new mapping from a preliminary one. Using this mechanism, we were able to define scalable strategies, in particular for context-dependent and fragment-based matching, each comprising multiple steps of successive mapping combination and refinement. In the following, we first describe our approach to mapping refinement (Section 8.1) and then discuss the supported match strategies (Sections 8.2 and 8.3). Finally, Section 8.4 gives a short summary of the chapter.

8.1 Refinement of Match Results

Refinement focuses on matching between elements that were previously identified as potential match candidates. Such elements may be selected by the user or determined automatically by a previous match step. Figure 8.1 shows the pseudo-code of the extended CombineMatcher. In addition to the Match and Sim methods, it offers a new method, Refine (Line 4), which takes as input an existing mapping, prevMapping, and produces a new mapping, newMapping, using the given configuration, i.e., the type of related elements, the default matchers, and the strategies for similarity combination.

Figure 8.1 CombineMatcher extended with the Refine method

1 CombineMatcher(oType, defMatchers, agg, dir, sel, combSim)

2 Match(s1, s2) { … }

3 Sim(s1, s2) { … }

4 Refine(prevMapping) {
5   //Allocate match result
6   newMapping = {}
7   //Group related correspondences
8   { gc1, …, gck } = GroupCorrespondences(prevMapping)
9   //Iterate over all grouped correspondences
10  for each gc in { gc1, …, gck } {
11    s = Domain(gc)
12    t = Range(gc)
13    map = Match(s, t)
14    newMapping = MappingMerge(newMapping, map)
15  }
16  return newMapping
17 }

The Refine operation does not match two complete input schemas but only considers the elements represented in the input mapping. To further improve performance, we preprocess the 1:1 correspondences in the input mapping. In particular, we group the original correspondences in prevMapping into a set of distinct grouped correspondences gc1, ..., gck (Line 8) to avoid repeatedly processing elements involved in multiple correspondences. Each grouped correspondence starts from a 1:1 correspondence s1↔s2 and recursively includes all further correspondences for either s1 or s2 from prevMapping, so that n:m relationships are represented. For instance, Address↔ShipAddr and Address↔BillAddr would be grouped together. Each such grouped correspondence is refined individually.

Refine first determines the unique source and target elements, s and t, of a grouped correspondence using the Domain and Range operations, respectively (Line 11 and 12). For the example above, s only contains Address, while t is the set {ShipAddr, BillAddr}. The Match method is then applied to match s and t (Line 13). The result, map, contains the correspondences between the related elements of s and t as specified by the configuration parameter oType. They are merged into newMapping using the MappingMerge operation (Line 14). Both s and t are unique over all grouped correspondences, so that map usually adds new correspondences to newMapping. However, MappingMerge is also able to detect and ignore duplicate correspondences, i.e., between the same source and target elements. After all grouped correspondences have been processed, newMapping is returned as the final result. Note that the employed functions Domain, Range, MappingMerge, and GroupCorrespondences are general mapping operators implemented in the Mapping Pool (see Section 5.5).

Figure 8.2 illustrates the use of refinement in a simple match strategy, SimpleStrategy. A match strategy supports the same interface with the three methods Match, Sim, and Refine as CombineMatcher. While Sim and Refine are the same as in CombineMatcher, Match executes the intended workflow with each step supported by a previously defined CombineMatcher or match strategy instance. In SimpleStrategy, two steps are performed, the first one to generate a preliminary mapping using the given preMatching strategy (Line 4) and the second one to refine it using the given refMatching strategy (Line 6). With the same interface methods, CombineMatcher instances and match strategies can be used interchangeably, making it easy to construct new match strategies from existing ones.

Figure 8.2 A simple match strategy

1 SimpleStrategy(preMatching, refMatching)

2 Match(s1, s2) {
3   //Generate preliminary result
4   preMapping = preMatching.Match(s1, s2)
5   //Refine preliminary result
6   refMapping = refMatching.Refine(preMapping)
7   return refMapping
8 }

9 Sim(s1, s2) { … } //As in CombineMatcher

10 Refine(mapping) { … } //As in CombineMatcher
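The grouping of correspondences in Line 8 of Figure 8.1 essentially computes connected groups of source and target elements over the 1:1 correspondences. A minimal sketch, with illustrative data structures rather than the Mapping Pool implementation:

def group_correspondences(mapping):
    # mapping: list of (source, target, sim) correspondences. Two correspondences
    # belong to the same group if they share a source or a target element,
    # applied transitively, so that each group represents an n:m relationship.
    groups = []
    for s, t, sim in mapping:
        hits = [g for g in groups if s in g["sources"] or t in g["targets"]]
        merged = {"sources": {s}, "targets": {t}, "corrs": [(s, t, sim)]}
        for g in hits:
            merged["sources"] |= g["sources"]
            merged["targets"] |= g["targets"]
            merged["corrs"] += g["corrs"]
            groups.remove(g)
        groups.append(merged)
    return groups

# group_correspondences([("Address", "ShipAddr", 0.8), ("Address", "BillAddr", 0.7)])
# -> one group with Domain {Address} and Range {ShipAddr, BillAddr}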

8.2 Context-dependent Match Strategies

Element reuse represents a powerful means to reduce schema complexity and to impose standard specifications in different schemas. As a result, the same element may appear in many different contexts, for example, DeliverTo.Address and BillTo.Address of S2 in Figure 5.2, which should be differentiated for a correct matching. In addition to a simple NoContext strategy, which yields context-independent correspondences, we support two context-sensitive match strategies, AllContext and the new FilteredContext strategy. In the following, we describe these strategies and then compare their complexity.

NoContext

Previous match approaches typically assume tree-like schemas without shared elements. In such cases, node-level matching, which we denote as the NoContext strategy, is sufficient due to the unique context of all elements. However, if one input schema contains shared elements, this approach typically returns many false matches. For example, a correspondence between the nodes shipToStreet of S1 and Street of S2 in Figure 5.2 would indicate that the single context of shipToStreet, ShipTo.shipToStreet, matches both contexts of Street, DeliverTo.Address.Street and BillTo.Address.Street. However, only the former likely represents the correct match candidate. In our system, NoContext can be implemented using CombineMatcher as follows:

NoContext = new CombineMatcher(Descendants, {Name, Leaves}, ...)
matchResult = NoContext.Match(S1.roots, S2.roots)

In particular, it is configured with the object type Descendants, which determines all nodes subsumed by the input nodes as elements for matching (see Section 7.2). One or several combined matchers, such as Name and Leaves (excluding NamePath, as this would perform exactly like Name for nodes), can be specified as the default matchers for computing node similarity. To match two schemas, NoContext is applied by calling its Match method with the schema roots as input. Note that all roots are considered, as a schema may contain multiple subschemas, each resulting in a different root in the schema graph.

AllContext

To our knowledge, only COMA [29], CUPID [88], PROTOPLASM [15], and [87] currently address the problem of context-dependent matching. Although based on different implementations, they follow the same approach of resolving all shared elements and matching between all their unique contexts, which we denote as the AllContext strategy. In particular, COMA differentiates all paths from the schema root to a shared element, thereby capturing all its possible contexts. The other prototypes, i.e., CUPID, PROTOPLASM, and [87], maintain multiple copies of shared elements in a (voluminous) tree-like schema representation and determine node correspondences. Both implementations consider the same number of components (paths or copied nodes), which may be very high in large schemas with many shared elements, as observed in our evaluation (see Chapter 11). To keep the input schemas unchanged, we support the path-based implementation of AllContext, which can be specified as a CombineMatcher instance as follows:

AllContext = new CombineMatcher(DescPaths, {NamePath, Leaves}, ...)
matchResult = AllContext.Match(S1.roots, S2.roots)

In particular, AllContext is configured with the object type DescPaths, which returns the paths from the input nodes to all their descendants (see Section 7.2). To accurately compute the similarity between paths, we use a combination of NamePath with other combined matchers as the default matchers.

Like other match strategies, AllContext is utilized to match two schemas by calling the Match method with the schema roots as input.

FilteredContext

Intuitively, if two nodes are not similar, they are unlikely to yield matching paths, i.e., contexts. Furthermore, in schemas with many shared elements, the number of paths is typically much higher than the number of nodes, as each shared node results in multiple paths. Based on these observations, we have developed the FilteredContext strategy, aiming at a more efficient approach for context-dependent matching. It performs two matching steps: node matching to filter the similar nodes between two schemas, and path matching to match the paths, i.e., contexts, of those similar nodes. The pseudo-code of FilteredContext is as follows:

NodeMatching = new CombineMatcher(Descendants, {Name, Leaves}, ...)
PathMatching = new CombineMatcher(AscPaths, {NamePath}, ...)
FilteredContext = new SimpleStrategy(NodeMatching, PathMatching)
matchResult = FilteredContext.Match(S1.roots, S2.roots)
            = PathMatching.Refine( NodeMatching.Match(S1.roots, S2.roots) )

Both steps, NodeMatching and PathMatching, are sequentially executed within a SimpleStrategy instance as shown in Figure 8.2. NodeMatching utilizes the object type Descendants to identify the nodes for matching, while PathMatching determines all paths from the schema root to a given node using the object type AscPaths. Each step is also configured with different default matchers. Currently, we use NamePath in PathMatching due to its context-sensitivity, while NodeMatching involves, as in NoContext, a combination of the other combined matchers.

Figure 8.3 Combining node and path matching in FilteredContext

(a: Node matching with NodeMatching.Match(S1, S2), e.g., the node shipToStreet of S1 is matched to the node Street of S2. b: Path matching with PathMatching.Refine(nodeResult), which compares only the paths of the matching nodes, e.g., ShipTo.shipToStreet with DeliverTo.Address.Street.)

In the example of Figure 8.3, using the schemas from Figure 5.2, NodeMatching predicts among others a correspondence between shipToStreet and Street, which is unique and processed as a single grouped correspondence by the Refine method of PathMatching. Here, the single path of shipToStreet in S1 needs only to be compared with the two paths of Street in S2, but not with all paths as performed by AllContext. We obtain the correct correspondence between ShipTo.shipToStreet and DeliverTo.Address.Street, which is added to the final result. As we can also see in the example, it is not meaningful to compare the paths of dissimilar nodes, such as shipToStreet and City.

Complexity

The complexity of NoContext and AllContext is determined by the number of nodes and paths, respectively, in the input schemas. The main effort required by FilteredContext is for the first phase to match the nodes, as the second phase only considers (a few) paths of small groups of similar nodes at a time. Thus, its complexity is comparable to that of NoContext. If both input schemas do not exhibit any shared elements, all three strategies yield the same complexity. In most cases, however, FilteredContext achieves a significant reduction of complexity compared to AllContext, as the number of nodes is usually much lower than the number of paths in schemas with shared elements (see Section 11.1). FilteredContext (and also NoContext) further benefits from our principle of retaining all reuse declarations and identifying further shared elements during schema import, keeping the number of nodes to be compared as low as possible.

8.3 Fragment-based Match Strategies

For a match task with large schemas it is likely that large portions of one or both input schemas have no matching counterparts. The standard approach of trying to match the complete input schemas will often lead not only to performance problems (long execution times), but also to poor match quality with many false match candidates. Furthermore, it is very difficult to present the match result to a human engineer in a way that she can easily validate and correct it.

We thus have developed the fragment-based match approach, a divide-and-conquer strategy which decomposes a large match problem into smaller sub-problems by matching at the level of schema fragments. By reducing the size of the problem we not only aim at better performance but also at improved match quality compared to schema-level matching. Moreover, the fragment approach can be used for interactive match processing. For instance, the user may manually select a fragment of interest, for which matches from the second schema are determined automatically. The fragment result may then be controlled and corrected by the user before proceeding with another fragment. In the following, we first explain the formation of schema fragments and then describe our fragment-based match approach. Finally, we discuss the complexity of the approach for different fragment granularities.

Fragment Formation

By fragment we denote a rooted sub-graph down to the leaf level in the schema graph. In general, fragments should have little or no overlap to avoid repeated similarity computations and overlapping match results. Besides user-selected fragments for interactive use, we currently support three strategies for automatic fragment identification, Schema, Subschema, and Shared, which are shortly motivated in the following. (Note that, like the object types discussed in Section 7.2, new fragment types can easily be added.)

• Schema: The complete schema is considered as one fragment. Matching complete schemas is thus supported as a special case of fragment-based matching.
• Subschema: Subschemas represent parts of a schema which can be separately instantiated, such as XML message formats or relational table definitions. Match results for such fragments are thus often needed, e.g., for data transformations. Each subschema is identified by a schema root as the fragment root. Matching a subschema (e.g., one message format) is obviously much simpler than matching the complete schema (e.g.,

all formats) at once. In fact, the user may only be interested in a particular message format.
• Shared: Each shared fragment is identified by a node with multiple parents, which is the root of the fragment. Due to their similar usage in schemas, such fragments exhibit a high potential for match candidates. Furthermore, their match results may be reused many times, thereby improving performance.

The Fragment-based Match Approach

Based on the same refinement idea as the FilteredContext strategy, our fragment-based approach also encompasses two matching steps, which are illustrated in Figure 8.4:

1 Identify similar fragments. The goal of this step is to identify fragments of the two schemas that are sufficiently similar to be worth matching in more detail. This aims at reducing match overhead by not trying to find correspondences for a fragment of one schema in irrelevant fragments of the second schema. According to the specified fragment type, the corresponding fragments are identified from the input schemas. In the example of Figure 8.4, we obtain the fragment sets {F11, F12, F13} and {F21, F22} for S1 and S2, respectively. As a fragment is uniquely identified by its root, the similarity between fragments can be determined by comparing their roots and contexts, i.e., the paths from the schema roots to the fragment roots.
2 Match similar fragments. This step performs a refinement of the result from the first step. In particular, the similar fragments identified are fully matched to obtain the correspondences between their elements. Each group of similar fragments represents an individual match problem, which is solved independently. For example, F11 and F12 need only to be matched against their similar fragments F21 and F22, respectively, thereby reducing match complexity. The match results for the single groups of similar fragments are then merged to one mapping, which is returned as the final result.

Figure 8.4 Fragment-based match approach

(a: Identify similar fragments, e.g., F11↔F21 and F12↔F22 among the fragment sets {F11, F12, F13} of S1 and {F21, F22} of S2. b: Match similar fragments, e.g., Match(F11, F21) and Match(F12, F22), and merge the fragment results into the complete result.)

Each step can be configured with an individual match strategy. For simplicity, we currently use either NoContext, AllContext, or FilteredContext for both identifying similar fragments and matching them. To illustrate this, Figure 8.5 shows the pseudo-code for the two steps, IdentSimFrags and MatchSimFrags. As fragments represent individual (smaller) schemas, the match strategies can be employed "as is" for MatchSimFrags. For IdentSimFrags, however, they need to be re-configured to consider the fragment type. In particular, AllContext compares the fragment contexts, i.e., the paths from the schema roots to the fragment roots, as specified by the FragPaths object type. NoContext and FilteredContext, on the other hand, consider the fragment root nodes identified by the FragRoots object type. Both FragRoots and FragPaths are defined specifically for each fragment type. For example, for Shared fragments, FragRoots returns all shared nodes in a schema, while FragPaths determines all paths from the root to the shared nodes of a schema.

Like other match strategies, the fragment-based match strategy, FragmentMatch, is utilized to match two schemas by calling the Match method with the schema roots as input.

Figure 8.5 Implementation of the fragment-based match algorithm

//Definition of IdentSimFrags and MatchSimFrags
//Using NoContext
IdentSimFrags = new CombineMatcher(FragRoots, {Name, Leaves}, ...)
MatchSimFrags = NoContext
//Using AllContext
IdentSimFrags = new CombineMatcher(FragPaths, {NamePath, Leaves}, ...)
MatchSimFrags = AllContext
//Using FilteredContext
RootNodeMatching = new CombineMatcher(FragRoots, {Name, Leaves}, ...)
RootPathMatching = new CombineMatcher(AscPaths, {NamePath}, ...)
IdentSimFrags = new SimpleStrategy(RootNodeMatching, RootPathMatching)
MatchSimFrags = FilteredContext

//Usage
FragmentMatch = new SimpleStrategy(IdentSimFrags, MatchSimFrags)
matchResult = FragmentMatch.Match(S1.roots, S2.roots)
            = MatchSimFrags.Refine( IdentSimFrags.Match(S1.roots, S2.roots) )

Complexity

The complexity of the fragment-based match approach consists of the complexity required by each phase. For Phase 1, it is defined by the number of fragment roots or fragment contexts, depending on the employed match strategy, i.e., NoContext, FilteredContext, or AllContext. Likewise, the complexity of Phase 2 is defined by the number of elements, i.e., nodes or paths, in the corresponding fragments. There is a trade-off between the two phases. Large fragments, such as those of type Schema and Subschema, lead to long execution times in Phase 2, while Phase 1 is fast as it only compares a few fragments. Conversely, small fragments, such as Shared, lead to a fast execution of Phase 2, but, due to the high number of fragments, Phase 1 typically requires a longer execution time.

8.4 Summary

This chapter presented our approach of using a matcher combination to refine a previously determined match result. Similar to the combination of individual matcher results described in the previous chapter, the refinement method is implemented in a generic way, so that new match strategies can be constructed as workflows of multiple steps of mapping combination and refinement. Using this infrastructure, we implemented specific workflows for context-dependent matching. In particular, we support two context-sensitive match strategies, AllContext and FilteredContext, in addition to a simple NoContext strategy, which yields context-independent correspondences. The match strategies can in turn be used within a new fragment-based approach, which decomposes a large match problem into smaller sub-problems by matching at the level of schema fragments. Based on the refinement mechanism, the FilteredContext and fragment-based match strategies represent efficient alternatives for dealing with very large schemas.


CHAPTER 9 OTHER PROTOTYPES AND COMPARISON

We review the match prototypes previously published in the literature using the criteria from Chapter 4. The survey in [119] already considered several prototypes, in particular CUPID [88], DELTA [6, 23], ONION [98, 99], SEMINT [83, 84, 85], LSD [34], DIKE [112], SIMILARITYFLOODING (SF) [93], TRANSCM [97], and MOMIS [9]. Since then, numerous new approaches have been published, such as [7, 8, 15, 16, 28, 29, 35, 39, 42, 62, 64, 71, 81, 87, 91, 92, 105, 131, 134, 141]. Furthermore, ontology matching, which has now become an important application domain with various relevant approaches, e.g., [40, 41, 49, 53, 54, 68, 100, 107, 109], was not considered in that survey. While not claiming to perform a complete survey of previous work, we try to consider as many prototypes as possible, especially those making significant contributions or closely related to our approach. In Sections 9.1 and 9.2, we focus on schema-based and instance-based systems, respectively. Section 9.3 presents several prototypes specifically developed for ontology matching. Section 9.4 compares some selected prototypes with our own development, COMA/COMA++. Section 9.5 summarizes the chapter.

9.1 Schema-based Prototypes
These approaches exploit the metadata available in schemas, such as names, comments, data types, and schema structures, to estimate element similarity. In addition to COMA and COMA++, this category includes CUPID [88], DELTA [23], DIKE [112], MOMIS [9], SIMILARITYFLOODING (SF) [93], PROTOPLASM [15], TRANSCM [97], XCLUST [81], WISE-INTEGRATOR [64], and the works of [39, 62, 63, 87, 131]. To give an impression of the evolution of the techniques over time, we discuss the prototypes in the chronological order of their first appearance in the literature.

DELTA (1995)
DELTA [6, 23] is one of the earliest efforts towards semi-automatic schema matching. It uses a simple approach to identify attribute correspondences between relational schemas. For each attribute, it concatenates all available metadata, such as name, description, and data type, into a simple text string, which is presented as a document to a full-text information retrieval (IR) tool. The IR tool can interpret such a document as a query to search for similar documents. The attributes represented by the identified documents are ranked using a weighted similarity of the search terms. The result list can be manually searched by the user for the correct match candidates. The approach is easy to implement, but depends largely on the availability and expressiveness of attribute descriptions.

TRANSCM (1998)
TRANSCM [97] uses schema matching to semi-automatically derive rules to transform instances between different schemas. Input schemas (DTD or OODB) are transformed to labeled graphs. The match process is performed node by node in a top-down manner, starting at the schema roots, and presumes a high degree of similarity between the schemas. TRANSCM supports a multitude of rules, i.e., matchers, to find correspondences between schema nodes, although it is sufficient for a match to be predicted by a single rule (i.e., Max aggregation). Each rule may in turn combine multiple match criteria, e.g., name similarity and the number of descendants. The rules are assigned distinct priorities and checked in a fixed order. If no or multiple target elements are found as possible match candidates for a source element, user interaction is needed, e.g., to specify a new rule or to select the correct match candidate.

DIKE (1998)
DIKE [110, 111, 112] represents a hybrid approach to automatically determine semantic relationships (synonymy, hypernymy, homonymy) between elements of Entity-Relationship (ER) schemas. The prototype depends on an initial set of synonymy, homonymy, and inclusion relationships, which can either be provided by a domain expert or looked up in a dictionary. The main algorithm is a structural matcher, which performs a pairwise comparison of elements in the input schemas. The number of edges on the shortest path between elements is used as a distance metric to identify the "related" elements at a particular distance from a given element. The similarity of two elements is derived from that of their related elements, each of which is assigned a weight that is inversely proportional to the distance. Based on fix-point computation, the newly computed element similarities are recursively fed back into the algorithm to infer further relationships. Finally, elements with a similarity exceeding a given threshold are regarded as matching.

MOMIS (1999)
MOMIS [9, 20] employs schema matching to integrate multiple source schemas into a virtual global schema for mediation purposes. The match algorithm operates on schemas of a specific object-oriented language. It looks up synonymy and hypernymy/hyponymy relationships between element names in external dictionaries (WordNet) and derives relatedness relationships between elements from the given schema structure. The relationships are assigned different affinity values to indicate different degrees of similarity. Additional name similarities are first computed as the product of the affinities of the relationships forming a path between two elements, thereby considering all available synonymy, hypernymy, and relatedness relationships. In the next step, the structural similarity of two elements is computed as the fraction of the neighbor elements showing a name similarity exceeding a threshold over all neighbor elements (i.e., the Dice strategy). For each pair of elements, the name and structural similarity are aggregated to a global similarity using a weighted sum. According to the global similarities, similar elements are clustered using a hierarchical clustering algorithm, leading to m:n local and global match cardinality.
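One possible reading of this combination (our paraphrase with hypothetical weights $w_{name}$, $w_{struct}$ and threshold $t$, not MOMIS's exact notation) is:

$$sim_{struct}(e_1, e_2) = \frac{|N_t(e_1)| + |N_t(e_2)|}{|N(e_1)| + |N(e_2)|}, \qquad sim_{global}(e_1, e_2) = w_{name} \cdot sim_{name}(e_1, e_2) + w_{struct} \cdot sim_{struct}(e_1, e_2)$$

where $N(e)$ denotes the neighbors of $e$ and $N_t(e)$ those neighbors whose name similarity to some neighbor of the other element exceeds the threshold $t$.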

CUPID (2001)
CUPID [88] represents a sophisticated hybrid match approach combining a name matcher with a structural one. It is intended to be generic and has been applied to XML and relational schemas.

To deal with shared elements, input schemas are converted to trees, in which additional nodes are added to resolve the multiple relationships between a shared node and its parent nodes. The name matcher exploits auxiliary sources for synonyms and abbreviations to obtain linguistic similarity between element names. The structural matcher then determines the structural similarity of nodes from the name and data type similarity of their leaves. For each pair of nodes, their linguistic and structural similarity are aggregated to a weighted similarity using a weighted sum. If the weighted similarity exceeds a threshold, the structural similarity of the leaf pairs is increased; otherwise, it is decreased. The process iterates over all node pairs of the input schemas in a post-order tree traversal to make sure that changes in the structural similarity of the leaves are taken into account at the upper levels of the schema structure. Finally, CUPID directionally selects for each target element the source element with the highest weighted similarity exceeding a given threshold as the match candidate, resulting in element-level correspondences of 1:1 local and possibly m:n global cardinality.

SIMILARITYFLOODING (2002)
SIMILARITYFLOODING (SF) [93] converts schemas (relational, RDF, XML) into labeled graphs and uses fix-point computation to determine correspondences of 1:1 local and m:n global cardinality between corresponding nodes of the graphs. The algorithm has been employed in a hybrid combination with a simple name matcher, which suggests an initial element-level mapping to be fed to the structural SF matcher. Unlike other schema-based match approaches, SF does not exploit terminological relationships in an external dictionary, but relies entirely on string similarity between element names. In the last step, different filters based on the stable marriage principle are supported to select a subset of the result produced by the structural matcher as the final match result. In a modular architecture, the components of SF, such as schema converters, the name and structural matchers, and filters, are available as high-level operators and can be flexibly combined within a script for a tailored match operation.
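The flooding step can be sketched roughly as follows (a simplified rendering of the fix-point idea, not the exact formulation in [93]): the similarity $\sigma$ of a node pair $(x, y)$ is repeatedly incremented by the weighted similarities of its neighbor pairs in the pairwise connectivity graph and then normalized:

$$\sigma^{i+1}(x, y) = \operatorname{normalize}\Big( \sigma^{i}(x, y) + \sum_{(x', y') \in N(x, y)} w\big((x', y'), (x, y)\big) \cdot \sigma^{i}(x', y') \Big)$$

The iteration stops when the similarity vector changes by less than a chosen threshold, after which the filters select the final correspondences.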

XCLUST (2002)
Given a large DTD collection, XCLUST [81] aims at identifying and clustering similar DTDs for integration. A hybrid approach is employed to find the matching elements of the DTDs. Given two elements, their basic similarity is a weighted sum of their name and cardinality similarity, looked up in WordNet and a type compatibility table, respectively. For each element pair, three structural similarities are computed as the average of the basic similarities of the ascendants (i.e., the elements on the path from the schema root to an element), the leaves, and the direct children, respectively. The structural similarities are in turn aggregated to a global similarity using a weighted sum. Elements with a global similarity exceeding a particular threshold are matched, and the average similarity of the matching elements is considered the similarity between the input DTDs. Such DTD or schema similarity is a special case of the combined similarity that can be computed using different strategies in our Combination Scheme (see Section 7.3).

WISE-INTEGRATOR (2003)
WISE-INTEGRATOR [64] uses schema matching to identify similar attributes of web search forms so that they can be unified under an integrated interface. The integrated interface is built incrementally by successively matching it against a local interface and adding those local attributes for which no match candidate exists in the integrated interface. WISE-INTEGRATOR employs various heuristics to compute attribute similarity, including exact and approximate string matching, dictionary lookup for semantic name similarity, and specific rules for the compatibility of data types, value scales/units, value distributions

(e.g., averages), and default values, all exploiting information from the interface specifications. For each pair of attributes, the similarities predicted by the single criteria are simply summed to obtain a global weight. The elements showing the highest global weight exceeding a threshold are considered matching, and one of them is selected as the global attribute to be used in the integrated interface. Otherwise, the local attributes are considered distinct and added as new attributes to the integrated interface.

He and Chang (2003)
Like WISE-INTEGRATOR, He et al. focus in [62, 63] on the problem of obtaining an integrated interface for a set of web search forms. They observe that the aggregate vocabulary of schemas in a (restricted) domain, such as books, tends to converge to a small number of unique concepts, like author, subject, title, and ISBN, although different interfaces may use different names for the same concept. Based on four assumptions (independence of attributes, non-overlapping semantics, uniqueness within an interface, and the same semantics for the same names), they propose a statistical approach to identify and cluster synonym attributes by analyzing the co-occurrence of attributes in different interfaces.

PROTOPLASM (2004)
PROTOPLASM [15] aims at an industrial-strength schema matching solution by providing a flexible and customizable platform for combining different match algorithms. External schemas (SQL or XSD) are converted into a labeled graph representation, which is based on the W3C Document Object Model (DOM). Currently, CUPID and SIMILARITYFLOODING are integrated as the base matchers to compute element similarity. Similarity matrices represent the main mechanism for combining the results of different match algorithms. Like our combination scheme, PROTOPLASM supports various operators for computing, aggregating, and filtering similarity matrices. Using a script language, PROTOPLASM allows the workflow of the operators in a match operation to be flexibly defined and customized. Although implemented differently, COMA and COMA++ are similar to PROTOPLASM in that they support both independent and pipelined execution of matchers. In particular, the dependency hierarchy of matchers in COMA and COMA++ specifies that the matchers at a higher level aggregate component similarities previously computed by multiple individual matchers at the lower level.
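To illustrate the general idea of operators over similarity matrices, the following is a generic sketch (our own illustration in Python, not PROTOPLASM's or COMA's actual API) of aggregating two matcher results and filtering the combined matrix:

```python
import numpy as np

def aggregate(matrices, how="avg"):
    """Aggregate several matcher-specific similarity matrices into one."""
    stack = np.stack(matrices)
    if how == "avg":
        return stack.mean(axis=0)
    if how == "max":
        return stack.max(axis=0)
    if how == "min":
        return stack.min(axis=0)
    raise ValueError(f"unknown aggregation: {how}")

def filter_threshold(sim, threshold=0.5):
    """Keep only element pairs whose similarity exceeds an absolute threshold."""
    return [(i, j, sim[i, j])
            for i in range(sim.shape[0])
            for j in range(sim.shape[1])
            if sim[i, j] > threshold]

# Example: combine a name matcher and a structure matcher for a 2x3 problem.
name_sim = np.array([[0.9, 0.1, 0.2], [0.3, 0.8, 0.1]])
struct_sim = np.array([[0.7, 0.2, 0.4], [0.1, 0.9, 0.3]])
combined = aggregate([name_sim, struct_sim], how="avg")
print(filter_threshold(combined, 0.5))  # correspondences above the 0.5 cutoff
```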

Dragut and Lawrence (2004)
The work described in [39] by Dragut and Lawrence is based on a public source-code distribution of COMA [29]. The authors introduce a domain ontology, against which all schemas are first matched. Two selection strategies, Max and noMax, are supported, which select for each schema element “up to one” match candidate and “a variable number” of match candidates, respectively. According to this description, the strategies correspond to MaxN(1) and MaxDelta, respectively, which are already implemented in COMA. However, the authors leave open the details of how their strategies were implemented and how they differ from those available in COMA. To obtain a mapping between two schemas, the MatchCompose operation is employed to combine the mappings between the schemas and the domain ontology. Reusing match results involving a domain ontology as the pivot schema is one of several reuse strategies supported by COMA++.

Lu et al. (2005)
In [87], Lu et al. describe a hybrid approach to match XML schemas (XSD) as part of their effort to search for similar schemas on the web.

External schemas are imported into trees, with shared elements duplicated to resolve their multiple contexts. Correspondences are determined based on the name, data type, and structure similarity of nodes. Name similarity is computed either by a lookup in WordNet or using the EditDistance function. Data type similarity is derived from a manually constructed compatibility table. Unfortunately, the authors do not describe the structural matcher, the main component of their approach, but refer to an unpublished article. Moreover, it remains unclear how the different kinds of similarity are combined and used to identify element correspondences.

Tu and Yu (2005)
Also based on a source-code distribution of COMA [29], Tu and Yu developed an approach to automatically compute a weighting scheme for the aggregation of matcher-specific similarities [131]. In particular, each matcher is associated with a specific so-called credibility predictor, which uses machine learning to compute an individual weight for each similarity value computed by the matcher. For each element pair, the matcher-specific similarities are multiplied with their respective weights and divided by the sum of the weights. Each credibility predictor utilizes a feature vector to specifically characterize the input information exploited by the corresponding matcher, such as the length of element names and the number of words in the names for the Name matcher. Currently, credibility predictors are available only for three matchers: Type, Name, and Leaves.
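The weighted aggregation itself is straightforward; the following minimal sketch (our own illustration, assuming the per-pair weights have already been produced by the credibility predictors) shows how it could be applied:

```python
def weighted_aggregate(similarities, weights):
    """Aggregate matcher-specific similarities for one element pair.

    similarities: list of similarity values, one per matcher
    weights:      credibility weights for the same matchers
    """
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    # Multiply each similarity with its weight and divide by the weight sum.
    return sum(s * w for s, w in zip(similarities, weights)) / total_weight

# Example: Name, Type, and Leaves matchers with pair-specific credibility weights.
print(weighted_aggregate([0.9, 0.4, 0.7], [0.8, 0.2, 0.5]))  # ~0.77
```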

9.2 Instance-based Prototypes
Typically, techniques from machine learning (neural networks, Naive Bayes), information retrieval (Whirl [25], mutual information and entropy), and statistics (e.g., feature vectors of character frequencies, numerical averages) are employed to characterize and compare patterns of attribute instances. Prototypes of this category include AUTOPLEX [7], AUTOMATCH [8], CLIO [66, 105, 114, 61], DUMAS [16], LSD [34], IMAP [28], SEMINT [83, 84, 85], the corpus-based approach [91, 92], and the works presented in [42, 71, 141, 134]. In the following, we describe the approaches in chronological order.

SEMINT (1994)
SEMINT [83, 84, 85] represents a hybrid approach exploiting both schema and instance information to identify corresponding attributes between relational schemas. Schema-level constraints, such as data types and key constraints, are derived from the DBMS catalog. Instance data is analyzed to obtain further metadata, such as actual value distributions, numerical averages, etc. For each attribute, SEMINT determines a signature consisting of values in the interval [0,1] for all involved matching criteria. The signatures are used first to cluster similar attributes from the first schema and then to find the best matching cluster for attributes from the second schema. The clustering and classification process is performed using neural networks with automatic training, thereby limiting pre-match effort. The match result consists of clusters of similar attributes from the input schemas, leading to m:n local and global match cardinality.

AUTOPLEX (2001) and AUTOMATCH (2002)
AUTOPLEX [7] and its enhancement AUTOMATCH [8] represent single-strategy schema matching approaches based on machine learning.

In particular, a Naive Bayesian learner exploits instance characteristics to match attributes from a relational source schema to a previously constructed global schema. For each source attribute, both the match and the mismatch probability with respect to every global attribute are determined. These probabilities are normalized to sum to 1, and the match probability is returned as the similarity between the source and the global attribute. The correspondences are filtered to maximize the sum of their similarities under the condition that no two correspondences share a common element (maximal weighted matching in bipartite graphs). The match result consists of attribute correspondences of 1:1 local and global cardinality.

CLIO (2001)
CLIO [66, 105, 114, 61], developed at IBM, aims at assisting the user in creating mappings for data transformation between two sources. Like COMA++, it provides a comprehensive graphical user interface. Currently, XML schemas (XSD) and relational schemas are supported and internally represented using a nested relational model. The matching engine employs a hybrid approach combining an approximate string matcher for name matching and a Naive Bayes learning algorithm for instance matching. Using the GUI, the user may verify, remove, and add correspondences between the input schemas, or specify mapping expressions for the correspondences. Based on the correspondences, CLIO can generate queries (SQL, XQuery, or XSLT) to transform instances of the source schema into the target schema. In particular, it selects enough correspondences to cover a maximal set of elements of the target schema and uses reasoning techniques to suggest join clauses to tie together the corresponding elements in the source schema. This query generation functionality distinguishes CLIO from other prototypes, which typically focus on identifying correspondences according to element similarity.

LSD (2001), Corpus-based Matching (2003), and IMAP (2004)
LSD [34] uses a composite approach to combining different matchers. It depends on a domain-specific global schema, against which new data sources are matched. Machine-learning techniques are utilized both for the individual matchers and for the automatic combination of match results. In addition to a name matcher, LSD uses several instance-level matchers, such as Whirl and Naive Bayes, which discover during the learning phase characteristic instance patterns and matching rules for single elements of the target schema. These patterns and rules are then applied to match other data sources to the global schema. The predictions of the individual matchers are combined by a so-called meta-learner, which weights the predictions from a matcher according to the accuracy it showed during the training phase. LSD also exploits constraints manually specified by the user to capture typical match and mismatch information of the domain. The match result consists of element-level correspondences with 1:1 local and n:1 global cardinality.
Based on LSD, Madhavan et al. developed an approach called MKB (Mapping Knowledge Base) that exploits a corpus of existing schemas and instances to match new schemas [91]. It was further enhanced in [92]. The input schemas are first matched against the corpus and augmented with the information of similar elements found in the corpus, before they are matched against each other. Both matching the input schemas against the corpus and matching the augmented schemas are performed using machine learning-based matchers as in LSD. In addition, the approach performs an intra-corpus match process to cluster similar elements in the corpus. Structural statistics of the clusters, such as the co-occurrence/neighborhood and ordering of elements, are used as constraints for match candidate selection.
Dhamankar et al. extend LSD to the IMAP system, which aims at finding both 1:1 and complex m:n correspondences between relational schemas [28].

IMAP supports a new set of machine learning-based matchers, each of which focuses on discovering a specific type of complex matches, such as arithmetic functions like roomprice = roomrate*(1+taxrate), or string concatenations like name = Concat(firstname, lastname). Like LSD, IMAP makes extensive use of domain constraints to filter automatically identified correspondences. Compared to other prototypes, IMAP contains a unique component to generate an explanation of the evidence based on which a correspondence was predicted or not.

Xu and Embley (2001)
In [42], Embley et al. describe a composite match approach, which supports several instance-level matchers based on machine learning and a name matcher requiring an external dictionary (WordNet). The element similarities computed by the individual matchers are combined using an average function. The approach is further improved in [141] in two ways. First, domain-specific ontologies are used to determine more specific matches. In particular, the instances of source and target elements are compared against the concepts in the domain ontology. The relationships among the concepts in the ontology help to discover complex matches, such as unions (e.g., phone = Union(day_phone, evening_phone)) and compositions (e.g., address = Composition(street, city, state)). Second, a structure matcher is added to derive element similarity from the similarities between the neighbor elements previously computed using the name or instance matchers.

Kang and Naughton (2003)
Kang and Naughton describe in [71] a hybrid approach exploiting mutual information to match attributes of relational schemas. Within each input schema, mutual information is computed from the attribute values for all attribute pairs and represented in a weighted graph. Assuming schemas of the same size, the approach searches for a full mapping (covering the complete schemas) of (local and global) 1:1 correspondences, which yields the smallest distance on mutual information between the input schemas according to a given distance function, such as the Euclidean metric.
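Stated with the standard definitions (our rendering, not the notation of [71]), the mutual information of two attributes $X$ and $Y$ within one schema and the objective minimized over candidate mappings $m$ can be written as

$$I(X;Y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}, \qquad dist(m) = \sqrt{ \sum_{X, Y} \big( I_1(X; Y) - I_2(m(X); m(Y)) \big)^2 }$$

where $I_1$ and $I_2$ denote the mutual-information graphs of the two schemas, and the full 1:1 mapping $m$ with the smallest $dist(m)$ is returned.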

Wang et al. (2004)
Like [64] and [62, 63], Wang et al. also focus in [134] on matching web search forms, but by looking at instance data. The approach depends on a previously constructed domain-specific global schema as well as representative instances. Using query probing, it exhaustively submits attribute values of the known instances to query each attribute of the websites to be matched. Based on the number of re-appearances of each submitted value in the query results, mutual information and vector similarity based on the vector space model from Information Retrieval are computed for all pairwise combinations of source and target attributes. From the matrix of mutual information or vector similarity, the attribute pairs showing the highest values are considered matching.

DUMAS (2005)
Bilke and Naumann describe in [16] the DUMAS approach to match attributes of relational schemas by analyzing previously determined duplicate tuples. Regarding a tuple as a document and a table as a document base, DUMAS employs the Whirl algorithm, utilizing the TFIDF measure from Information Retrieval, to compute tuple similarity and to identify similar tuples, i.e., candidate duplicates. Given a pair of duplicates, DUMAS performs a pairwise comparison of attribute values to obtain attribute similarities. For this purpose, it utilizes a variant of the TFIDF measure, which also considers similar terms (detected by the EditDistance function) instead of only equal terms.

The matrix of attribute similarities is treated as a bipartite weighted matching problem, and the matching maximizing the sum of the similarities is selected as the match result. If multiple matchings are possible, the algorithm iterates back to the first step to find more duplicates, which hopefully help to identify a unique match result.
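The selection step corresponds to the classical assignment problem; a minimal sketch using SciPy (our own illustration, not DUMAS's implementation) could look like this:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_one_to_one_matching(sim):
    """Select the 1:1 attribute matching that maximizes total similarity.

    sim: 2D array of attribute similarities (rows: schema 1, cols: schema 2).
    Returns a list of (row, col, similarity) correspondences.
    """
    # linear_sum_assignment minimizes cost, so negate the similarities.
    rows, cols = linear_sum_assignment(-sim)
    return [(r, c, sim[r, c]) for r, c in zip(rows, cols)]

sim = np.array([[0.9, 0.2, 0.1],
                [0.4, 0.8, 0.3]])
print(best_one_to_one_matching(sim))  # [(0, 0, 0.9), (1, 1, 0.8)]
```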

9.3 Ontology Matching Prototypes

We discuss several prototypes specifically developed for ontology matching, i.e., GLUE [35], OLA [49], ONION [99, 98, 100], PROMPT [107, 109], QOM [40, 41], SCM [68], S-MATCH [53, 54, 3], and the work of [101]. Several of them, in particular OLA, PROMPT, QOM, and SCM, participated in the EON Ontology Alignment Contest in 2004 [44, 48]. Except for GLUE and SCM, which consider class instances, the prototypes are schema-based and only utilize metadata, such as class/concept names and structure in the ontologies.

ONION (2000)
The ONION (Ontology Composition) system [99] focuses on managing and manipulating mappings between ontologies for the interoperation of knowledge bases. It first employed SKAT (Semantic Knowledge Articulation Tool) [98] to identify correspondences between ontologies. SKAT follows a rule-based approach with rules specified in first-order logic to express match and mismatch relationships. SKAT also supports methods to derive new correspondences, such as name matching and simple structural matching techniques based on is-a hierarchies. SKAT was later superseded by ARTGEN (Articulation Generator) [100], which provides two linguistic matchers exploiting terminological relationships in WordNet and frequency statistics of keywords in a corpus, respectively. A structure matcher is supported to search for further correspondences in the neighborhood of the matching elements suggested by the linguistic matchers.

PROMPT (2000)
PROMPT [107, 109] was originally developed as a tool for ontology merging, which guides the user through the process with suggestions about classes and properties to be merged. Such suggestions are generated by two modules, ANCHORPROMPT and PROMPTDIFF, following different match approaches. ANCHORPROMPT converts ontologies into directed labeled graphs. Provided with a set of correspondences between so-called anchor nodes, ANCHORPROMPT identifies the paths between the anchors within the single ontologies and compares the classes on these paths with each other. The similarity between classes is increased if they appear at the same position on the paths. Finally, classes with a similarity exceeding the median of all computed similarities are considered matching. PROMPTDIFF focuses on matching different versions of an ontology. It uses a multitude of matchers based on different heuristics, such as type and name equality, similar siblings, and similar name suffixes and prefixes. A global result table is maintained containing all combinations of source and target elements of the same category, e.g., class or property. Each matcher scans through the table, picks those element pairs that are not yet matched, and makes predictions for them if possible. As such, it is sufficient for two elements to be matched according to one of the matchers (i.e., Max aggregation). The execution of the matchers is iterated in a fix-point computation manner until no changes in the result table, i.e., no more new correspondences, are observed.

GLUE (2002)
GLUE [35] extends the previous schema matching system LSD [34] to perform matching between taxonomies. As the input taxonomies come with different sets of instances, GLUE first performs a classification process to associate the instances of classes in the source taxonomy with the classes of the target taxonomy and vice versa. To do this, GLUE employs the composite approach of LSD, utilizing several machine learning-based matchers and a meta-learner to combine their predictions. The similarity between two classes is then computed from the numbers of the identified common and distinct instances of the classes. Finally, a so-called relaxation labeling technique is applied to the similarity matrix of classes to search for a mapping configuration that best satisfies the given domain constraints specifying match and mismatch rules.

OLA (2003)
OLA (OWL-Lite Alignment) [49] was specifically developed to match ontologies written in the OWL-Lite dialect. OLA performs a pairwise comparison of source and target elements of the same category, such as class, property, instance, and data type. In particular, the similarity between two elements of a particular category, e.g., class, is a weighted sum of the similarities of all relevant features of this category, such as names, super-/subclasses, and properties. While data type similarity has to be pre-specified, OLA uses synonym relationships in WordNet and a string matching function based on common substrings to compute name similarity. As there is a recursive relationship between elements and their related features/elements in the comparison, OLA employs fix-point computation to successively compute and propagate similarity from the features/elements at the lowest level of dependency, i.e., those computed without looking at related elements, to their neighborhood.

QOM (2004)
QOM (Quick Ontology Mapping) [40, 41] focuses on matching ontologies in the RDF(S) format. The similarity between elements of the same category is computed by comparing the names of their related elements, such as the super-/subclasses, instances, and properties of a class, using the EditDistance algorithm. The similarities computed for the different kinds of related elements are aggregated using a weighted sum. As in COMA, correspondences are identified as element pairs whose aggregated similarity exceeds a threshold from both directions of the source and target ontologies. To cope with very large ontologies, QOM supports an iterative match approach to successively reduce the search space for match candidates. In particular, the candidate correspondences to be examined next are identified from the neighborhood of previously matched elements according to name similarity or ontology structure.

SCM (2004)
SCM (Semantic Category Matching) [68] performs a statistical analysis of instance data to find correspondences between the classes and properties of two ontologies. For each element, SCM determines a feature vector containing occurrence statistics of all keywords found in the instances. It then performs a pairwise comparison of the feature vectors, restricted to the keywords contained in both vectors. Elements with similar feature vectors are further examined by a structural matcher. This matcher successively chooses an element pair as reference and weights the other ones according to their location consistency, e.g., whether both elements are respective descendants of the reference elements. Finally, correspondences are selected according to their average structural consistency weights.

S-MATCH (2004)
S-MATCH [53, 54, 3] represents a logic-based approach to identify semantic relationships (equivalence, more or less general, overlapping, mismatch) between the concepts of taxonomies. In the first step, relations between nodes are derived from their name similarity, either by lookups in auxiliary sources, such as WordNet, domain ontologies, and thesauri, or using string matching algorithms. In the second step, S-MATCH tries to determine relations between paths, i.e., sequences of nodes from the root to a node. Each pair of paths is successively checked against the different semantic relationships to find the best suitable one as the match result. On the one side, an axiom is constructed as the conjunction of all relations known from the first step between the nodes of the two paths. On the other side, the semantic relationship to be checked is represented as a propositional formula, in which each path appears as the conjunction of its nodes. The match problem, i.e., deciding whether a particular relationship holds between two paths, is translated into a boolean satisfiability problem by testing whether the propositional formula can be derived from the axiom. To solve it, S-MATCH exploits established techniques from the SAT field (SAT solvers).

Mork and Bernstein (2004)
Mork and Bernstein [101] adapt the existing prototypes CUPID and SIMILARITYFLOODING to perform matching on very large medical ontologies. Their approach merges the results of three phases, lexical, structural, and hierarchical matching (i.e., Max aggregation). Using CUPID, lexical matching identifies concepts with similar names, thereby exploiting external dictionaries for synonym and word usage information. Structural matching is based on a variant of SIMILARITYFLOODING and searches for concepts with similar neighbors. Finally, hierarchical matching identifies concepts with similar descendants. To reduce the match complexity for large ontologies, structural matching focuses only on matching relationships, while hierarchical matching only considers direct children and grandchildren.

9.4 Prototype Comparison

We compare our own system COMA++ with eight state-of-the-art match prototypes: two other schema-based systems, CUPID and SF, five instance-based systems, CLIO, LSD, GLUE, IMAP, and SEMINT, and the ontology matching tool PROMPT. LSD, GLUE, and IMAP are similar in many aspects and are thus grouped together. In addition to the classification criteria from Section 4.1, we also take further aspects into account in order to characterize an entire system and not only its match approach. Table 9.1 focuses on the overall architecture and the representation of schemas and mappings, while Table 9.2 captures the input information, the match algorithms, and the execution of the match process. While we have to restrict our study to a few candidates for space reasons, additional prototypes can be examined, for example by their own authors, using our criteria.

Architecture
COMA++, CUPID, SF, and CLIO support multiple schema languages and aim at solving the match problem for different applications. On the other side, LSD, GLUE, IMAP, SEMINT, and PROMPT, like the large number of other match prototypes, such as [7, 8, 16, 39, 41, 42, 49, 53, 54, 62, 64, 68, 71, 81, 87, 91, 92, 100, 131, 134, 141], focus on a specific application domain and only consider a particular schema type, such as DTDs, relational schemas with instance data, web search interfaces, or ontologies.

Currently, only COMA++ and PROMPT among the research prototypes and the commercial system CLIO support a comprehensive graphical user interface.

Table 9.1 Characteristics of representative schema matching prototypes (1)

Criteria | COMA/COMA++ | CUPID | SF | CLIO | LSD, GLUE, IMAP | SEMINT | PROMPT
--- Architecture ---
Generic | yes | yes | yes | yes | no | no | no
GUI | yes | no | no | yes | no | no | yes
Approach | composite | hybrid | hybrid | hybrid | composite | hybrid | composite
Extensibility | matchers, combinations | - | - | - | learners | - | matchers
--- Schema Representation ---
Schema | XSD, XDR, SQL, OWL | XDR, SQL | SQL, RDF | SQL, XSD | DTD, Taxonomies, SQL | SQL | Ontology
Internal rep. | directed graph | tree | RDF graph | directed graph | attribute-based | attribute-based | directed graph
Elements | nodes, paths | nodes | nodes | nodes | attributes | attributes | nodes
--- Result Representation ---
Local/global card. | 1:1/m:n | 1:1/m:n | 1:1/1:1 | 1:1/1:1 | 1:1/n:1, IMAP: m:n/m:n | m:n/m:n | 1:1/m:n
Directionality | undirectional | source-target | undirectional | source-target | source-target | undirectional | source-target
Mapping expression | - | - | - | query | IMAP: functions | - | merge suggestions, version change

Most prototypes combine multiple criteria and properties in a hybrid algorithm, making them difficult to extend and improve. COMA/COMA++, GLUE, IMAP, LSD, PROTOPLASM, and PROMPT are among the few tools following the composite approach to combine independently executed matchers. Accordingly, the extensibility of the composite prototypes consists in adding new matchers or learners and new methods for combining their results. Only COMA++ supports a flexible infrastructure for constructing new matchers and match strategies from existing ones.

Schema Representation
Generic match systems typically use directed graphs as the internal schema representation and provide corresponding parsers to uniformly transform external schemas into the internal representation. So far, COMA++ supports the most comprehensive set of parsers, including parsers for XSD, XDR, relational schemas via ODBC, and OWL. To our knowledge, COMA++ is the first tool to support distributed (XML) schemas and namespaces. On the other side, non-generic, especially instance-based, approaches only use a simple, attribute-based representation of schemas, such as LSD, GLUE, IMAP, and SEMINT, but also AUTOMATCH, AUTOPLEX, DUMAS, WISE-INTEGRATOR, and [42, 62, 68, 71, 134]. Mostly, tree-like schemas are assumed, in which all elements have unique contexts, allowing matching at the node level. To our knowledge, only COMA/COMA++, CUPID, PROTOPLASM, and [87] currently support context-dependent matching to deal with graph-like schemas containing shared elements. While CUPID, PROTOPLASM, and [87] add multiple copies of entire shared substructures to a schema to resolve all contexts of shared elements, COMA and COMA++ consider paths, i.e., sequences of nodes from the schema root to a particular node. The match approach followed by COMA, CUPID, PROTOPLASM, and [87] is to match all contexts, resulting in high complexity for schemas with many shared elements. The new FilteredContext strategy of COMA++ was shown to perform much faster in cases of shared elements, while offering the quality of the expensive approach of matching all contexts.

Mapping Representation
Typically, the match result is represented as a set of correspondences of 1:1 local cardinality. Exceptions include IMAP, which returns complex matches between combinations of elements, and SEMINT, MOMIS, and [62], which return clusters of similar elements, both resulting in m:n local cardinality. Mostly, m:n global cardinality is supported, allowing one element to occur in multiple correspondences of the same mapping. However, some approaches are restricted to 1:1 global cardinality, such as SF as well as AUTOPLEX, AUTOMATCH, and DUMAS, allowing an element in one schema to map to at most one element in the other schema. Depending on the approach to select match candidates, the match result may be directional, distinguishing between the source and target schema, as in CUPID, CLIO, LSD, GLUE, IMAP, and PROMPT, or undirectional, as in COMA/COMA++, SF, and SEMINT (see the next subsection).
The majority of match prototypes only return a similarity value for a correspondence to indicate its plausibility. Beyond that, DIKE and S-MATCH try to identify semantic relationships, such as synonymy and hypernymy/hyponymy, between schema elements. Based on element similarity, PROMPT generates as the match result suggestions on how to merge corresponding elements or how they have changed between different ontology versions (PROMPTDIFF). CLIO, IMAP, and [141] address the query discovery problem and also suggest mapping expressions for the correspondences. In particular, CLIO includes a query inference engine to identify join paths in the schemas and to generate query mappings to transform instances from the source into the target schema. IMAP and [141] try to find complex matches, such as arithmetic functions or string concatenations, between combinations of schema elements.

Input Information and Match Algorithms
COMA/COMA++, CUPID, SF, and PROMPT only use schema information for computing element similarity. The most commonly exploited element properties are names and data types. Besides synonym lookup in external dictionaries, approximate string matching is often employed to deal with cases where no semantic similarity can be derived. Type similarity is mostly looked up in a compatibility table. Structure represents the next important source of match information, typically used to propagate element similarity to the neighborhood. For this purpose, the systems utilize the containment relationships available in most schema languages, but also referential constraints in relational schemas (COMA++, CUPID), is-a hierarchies (COMA++, PROMPT), and RDF predicates (SF).
On the other side, CLIO, LSD/GLUE/IMAP, and SEMINT largely depend on instance information to find attribute correspondences, although some of them also exploit schema information, such as element names (CLIO, LSD, GLUE, IMAP) and element hierarchy (LSD). Typically, machine learning techniques, such as Naive Bayes, are used to characterize attribute instances (CLIO, LSD, GLUE, and IMAP). LSD also uses a Whirl-based learner for matching attribute names. SEMINT employs neural networks to compare the real metadata derived from attribute instances.

Table 9.2 Characteristics of representative schema matching prototypes (2)

Criteria | COMA/COMA++ | CUPID | SF | CLIO | LSD, GLUE, IMAP | SEMINT | PROMPT
--- Input Information ---
Element properties | name, comment, type | name, type | name | name | name | type, length, value distribution, ... | name
Structure | containment, referential, is-a | containment, referential | RDF predicate | - | LSD: containment | - | is-a, containment, links
Instance | - | - | - | attribute values | attribute values | (to derive metadata) | -
Auxiliary | synonyms | synonyms | - | - | synonyms, constraints | - | -
--- Match Algorithms ---
Linguistic | dictionary lookup, string matching | dictionary lookup, string matching | string matching | string matching | Whirl | - | name equality, affix
Constraint | type lookup | type lookup | - | - | - | feature vector | -
Structure | parents, children, leaves, ... | leaves | neighbors | - | - | - | super-/subclasses, siblings
Instance | - | - | - | Naive Bayes | Whirl, Naive Bayes, lookup | neural network | -
Reuse | previous mappings, synonyms | synonyms | - | - | constraints, synonyms, dictionaries | - | -
Combination | aggregation (min, max, avg) | - | - | - | meta-learning | - | merge (max)
Match selection | directional or undirectional cutoff | directional cutoff | undirectional bipartite | directional cutoff | directional cutoff | undirectional clustering | undirectional cutoff
--- Match Execution ---
Pre-match effort | synonym specification | synonym specification | - | - | training, constraints, synonyms | - | -
Customization | matcher combination | thresholds | script | thresholds | learner combination | - | -
Matcher execution | single pass, iterative | single pass | fix point | single pass | single pass | single pass | fix point
Scalability support | iterative refinement, cache | - | - | - | - | - | -
Post-match support | mapping edit, high-level operations | - | - | mapping edit | IMAP: evidence explanation | - | -

We observe various forms of reuse, including domain-specific synonyms or constraints for match and mismatch rules (e.g., in COMA/COMA++, CUPID, DIKE, GLUE, LSD, IMAP, XCLUST), lexical dictionaries like WordNet for semantic relationships (e.g., in MOMIS, S-MATCH, OLA, and [42]), vocabularies for instance classification (e.g., county name recognition in LSD), and schema corpora for additional match information (e.g., in [91, 92]). Instance-based approaches reuse manually specified correspondences for training the learners (e.g., in LSD, GLUE, IMAP, but also AUTOPLEX, AUTOMATCH, and [131]). On the other hand, COMA and COMA++ have generalized the reuse of correspondences for single elements to the level of schema fragments and entire schemas.

Using a special compose operation, previous (fragment- or schema-level) match results sharing a common schema can be joined to derive a new mapping.
Various techniques have been employed to combine the results of individual matchers. COMA/COMA++, PROMPT, and several other prototypes, including MOMIS, XCLUST, TRANSCM, and WISE-INTEGRATOR, combine matcher-specific similarities using a static aggregation method (min, max, weighted, average, or sum). The machine learning-based prototypes LSD and GLUE use a meta-learner to automatically derive a weighting scheme for similarity aggregation. Except for COMA/COMA++, which supports different alternatives for this purpose, the other prototypes are typically restricted to one particular method. Likewise, there are different methods to select the match candidates/correspondences. Many prototypes, like COMA/COMA++, CUPID, and LSD, perform a directional selection for single elements in one schema by ranking the elements of the other schema according to their similarity. An absolute or relative cutoff is applied to decide between matching and non-matching candidates, resulting in m:n global cardinality. COMA/COMA++ can also consider both directions to obtain undirectional match results. Another approach, followed for example by PROMPT, is to perform a global (undirectional) selection, i.e., not by ranking and selecting elements for single elements, but by regarding any pair of elements satisfying the cutoff as matching. A few systems, such as SF and DUMAS, represent a similarity matrix as a bipartite graph and identify the maximal weighted matching in the graph, leading to undirectional match results of 1:1 global cardinality. Finally, some systems, like SEMINT, MOMIS, and [62], use clustering techniques for this purpose, leading to undirectional match results of m:n local and global cardinality.
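To make the difference between directional and undirectional selection concrete, here is a simplified sketch (our own illustration, using a plain absolute threshold rather than any system's exact MaxN/Delta variants):

```python
def directional_selection(sim, threshold=0.5):
    """For each source element, keep its best-ranked target above the threshold."""
    result = set()
    for source, candidates in sim.items():
        target, value = max(candidates.items(), key=lambda kv: kv[1])
        if value >= threshold:
            result.add((source, target))
    return result

def undirectional_selection(sim, threshold=0.5):
    """Keep only pairs selected from both directions (source->target and target->source)."""
    forward = directional_selection(sim, threshold)
    # Build the reverse similarity dictionary: target -> {source: value}.
    reverse = {}
    for source, candidates in sim.items():
        for target, value in candidates.items():
            reverse.setdefault(target, {})[source] = value
    backward = {(s, t) for (t, s) in directional_selection(reverse, threshold)}
    return forward & backward

sim = {"Contact": {"ContactPers": 0.8, "Address": 0.3},
       "Phone": {"ContactPers": 0.6, "Phone": 0.9}}
print(undirectional_selection(sim))  # both pairs are confirmed from both directions
```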

Match Execution
We first examine the pre-match effort generally needed by the individual prototypes. Depending on the reuse capabilities, manual effort is necessary for the specification of synonyms (COMA/COMA++, CUPID, LSD) and domain constraints (LSD, GLUE, IMAP). Furthermore, the machine learning-based prototypes (LSD, GLUE, IMAP) depend on the effort to train the individual learners. These efforts are not needed by the other systems, which do not utilize auxiliary information or learning techniques (SF, CLIO, SEMINT, PROMPT).
While a default configuration is desirable, the flexibility to customize the match operation is crucial to deal with heterogeneous domains and match problems. Composite prototypes typically allow selecting different matchers (COMA/COMA++, LSD, GLUE) and a strategy for their combination (COMA/COMA++), while hybrid approaches are more limited in this respect and only allow setting the relevant weights and thresholds (CUPID, CLIO). For SF, different filters can be chosen for match candidate selection. With a pre-specified configuration, match prototypes typically execute their matchers in a single pass. COMA++ also supports an iterative execution of matchers for successive refinement. On the other side, several prototypes, such as SF, DIKE, QOM, OLA, and PROMPTDIFF, apply fix-point computation to automatically iterate matcher execution.
One important issue for match execution is the scalability to large schemas, which is unfortunately ignored by most previous work. Besides our work, matching large schemas is currently only addressed in QOM, PROTOPLASM, and [101]. In particular, QOM employs simple techniques, such as random picking and sorting element names, to identify candidate upper-level elements, which are worth descending into and matching in detail. PROTOPLASM proposes to apply a hash-join-like match approach and to cache intermediate match results to improve execution time. [101] restricts the element neighborhood to the direct children and grandchildren while computing the structural similarity of elements.

On the other hand, COMA++ supports iterative refinement: the relevant elements can be filtered first before matching them and their neighborhood in detail. Based on this principle, our FilteredContext and fragment-based strategies were shown to achieve high quality and fast execution times for large schemas.
Finally, we examine the support for post-processing of the obtained match results. We observe only few prototypes addressing this issue. The GUI of COMA++ visualizes the match result and allows the user to scan through the correspondences and to verify and correct them. Furthermore, various high-level operators are supported for the automatic manipulation and combination of mappings. Similar to COMA++, the graphical user interface of CLIO also allows the user to add missing and remove wrong correspondences in a match result. IMAP offers an interesting component to generate explanations concerning the presence/absence and ranking of a predicted correspondence.

9.5 Summary
In this chapter, we surveyed schema matching prototypes published in the recent literature. The majority of approaches was developed with a specific problem instance in mind, making them difficult to adopt for other applications. Schema-based and instance-based approaches still form separate worlds, although a combination of both promises to be robust and to achieve good results for many schemas. Furthermore, most prototypes do not provide a GUI, making it impossible to study the effectiveness of such a component and the role of user feedback in the match process.
On the other hand, the large number of proposed techniques and algorithms indicates a high potential of the solution space. Therefore, like our work in COMA and COMA++, we think that it is necessary to develop more powerful and flexible platforms for combining different match algorithms, especially from both the automatic and manual as well as the schema- and instance-based paradigms. While such a solution is promising for treating schema matching as an independent problem, it also poses challenges for the integration with the consuming applications, such as data integration and data transformation. This integration needs to be addressed in future work to practically demonstrate the usefulness of schema matching.


PART III SCHEMA MATCHING EVALUATION

To prove the practicability of a schema matching system, it is necessary to conduct an evaluation simulating the real-world conditions under which the system will be employed. Unfortunately, the evaluations of previous systems were mostly done using diverse methodologies, metrics, and data, making it difficult to assess the effectiveness of each single system. Therefore, a common framework is needed for future evaluations, so that they can be better designed and documented, and so that a comparison between different systems and approaches becomes easier. This part, spanning Chapter 10 to Chapter 13, describes our effort towards this goal.
Chapter 10 introduces the evaluation framework with the major criteria that influence the effectiveness of a schema matching approach. Chapters 11 and 12 present the results obtained from the evaluation of our COMA/COMA++ system on E-business schemas of our own selection and on bibliographical ontologies specified by an ontology alignment contest. In Chapter 13, we use our criteria and the information available in the literature to review previous evaluations and compare them with our own evaluation. Based on the observed strengths and weaknesses, we discuss the problems that future system implementations and evaluations should address.


CHAPTER 10 EVALUATION CRITERIA

For identifying a solution for a match problem at hand, it is important to understand which of the available match techniques performs best, i.e., can most effectively reduce the manual work required for the match task. In previous work, to show the effectiveness of a system, the authors have usually demonstrated its application to some real-world scenarios or conducted a study using a range of schema matching tasks. Unfortunately, these system evaluations were done using diverse methodologies, metrics, and data, making it difficult to assess the effectiveness of each single system, not to mention to compare their effectiveness. Furthermore, the systems are usually not publicly available, making it virtually impossible to apply them to a common test problem or benchmark in order to obtain a direct quantitative comparison.
As a consequence, a uniform framework is necessary so that future evaluations can be documented better, their results become more reproducible, and a comparison between different systems and approaches becomes easier. To establish such an evaluation framework, we identify and discuss the major criteria determining the effectiveness of a schema matching approach. In particular, we consider criteria from the following important areas:
• Input: What kind of input data has been used (schema information, data instances, dictionaries, etc.)? The simpler the test problems are and the more auxiliary information is used, the more likely the systems can achieve better effectiveness. However, the dependence on auxiliary information may also lead to increased preparation effort.
• Output: What information has been included in the match result (mappings between attributes or whole tables, nodes or paths, etc.)? What is the correct result? The less information the systems provide as output, the lower the probability of making errors, but the higher the post-processing effort may be.
• Quality measures: What metrics have been chosen to quantify the accuracy and completeness of the match result? Because the evaluations usually use different metrics, it is necessary to understand their behavior, i.e., how optimistic or pessimistic their quality estimation is.
• Effort: How much saving of manual effort is obtained and how is this quantified? What kind of manual effort has been measured, for example, pre-match effort (training of learners, dictionary preparation, etc.) and post-match effort (correction and improvement of the match output)?
• Runtime performance: What are the CPU, memory, and time requirements, and how much do these depend on the complexity of the input information (i.e., schemas/instance data and auxiliary information)? These criteria determine the practicability of a system in supporting interactive matching and its ability to scale to large match problems.
In the five subsequent sections we elaborate on the criteria of each area in more detail, before Section 10.6 concludes the chapter.

10.1 Input: Test Problems and Auxiliary Information
To document the complexity of the test problems, we consider the following information about the test schemas:
• Schema language: Different schema languages (relational, XML schemas, etc.) can exhibit different features to be exploited by match algorithms. However, relying on language-specific features will cause the algorithms to be confined to the particular schema type. In current evaluations, we have observed only homogeneous match tasks, i.e., matching between schemas of the same type.
• Number of schemas and match tasks: With a high number of different match tasks, it is more likely to achieve a realistic match behavior. Furthermore, the way the match tasks are defined can also influence the problem complexity, e.g., matching independent schemas with each other vs. matching source schemas to a single global schema.
• Schema information: Most important is the number of schema elements for which match candidates are to be determined. The bigger the input schemas are, the greater the search space for match candidates will be, which often leads to lower match quality. Furthermore, matchers exploiting specific features will perform better and possibly outperform other matchers when such information is present or given in better quality and quantity.
• Schema similarity: Intuitively, a match task with schemas of the same size becomes "harder" if the similarity between them drops, i.e., fewer correspondences exist in the same search space. Here we refer to schema similarity simply as the ratio between the number of matching elements (identified in the manually constructed match result) and the number of all elements from both input schemas (the Dice strategy discussed in Section 7.3); see the sketch after this list.
• Auxiliary information: Examples are dictionaries or thesauri, or the constraints that apply to certain match tasks (e.g., each source element must match at least one target element). The availability of such information can greatly improve the result quality.
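For illustration, the following minimal sketch computes this Dice-style schema similarity from a manually determined match result (our own reading of the measure, counting the matching elements of both schemas):

```python
def schema_similarity(schema1_elements, schema2_elements, real_matches):
    """Dice-style schema similarity: matching elements over all elements.

    schema1_elements, schema2_elements: sets of element identifiers
    real_matches: set of (element_of_schema1, element_of_schema2) pairs
    """
    matched1 = {s for s, _ in real_matches}
    matched2 = {t for _, t in real_matches}
    return (len(matched1) + len(matched2)) / (len(schema1_elements) + len(schema2_elements))

s1 = {"Contact", "Name", "Address", "Phone"}
s2 = {"DeliverTo", "BillTo", "ContactPers", "Name", "Address", "Phone"}
matches = {("Name", "Name"), ("Address", "Address"), ("Phone", "Phone")}
print(schema_similarity(s1, s2, matches))  # 0.6
```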

10.2 Output: Match Result
The output of a match system is a mapping indicating which elements of the input schemas correspond to each other, i.e., match. To assess and compare the output quality of different match systems, we need a uniform representation of the correspondences. Currently, most match prototypes determine correspondences between schema elements (element-level matches [119]) and use similarity values between 0 (strong dissimilarity) and 1 (strong similarity) to indicate the plausibility of the correspondences. However, the quality and quantity of the correspondences in a match result still depend on several orthogonal aspects:

• Element representation: Schema matching systems typically use a graph model for the internal representation of schemas. Hence, schema elements may be represented either by nodes or by paths in the schema graphs, which also impacts the representation of correspondences. Figure 10.1 shows a simple match problem with two small (purchase order) schemas in directed graph representation; a sample correspondence between nodes would be Contact↔ContactPers. However, shared elements, such as ContactPers in S2, exhibit different contexts, i.e., DeliverTo and BillTo, which should be considered independently. Thus, some systems return matches between paths, e.g., Contact↔DeliverTo.ContactPers. Considering paths possibly leads to more elements for which match candidates can be individually determined, and thus possibly to more correspondences. Furthermore, the paths implicitly include valuable join information that can be utilized for generating mapping expressions.
• Cardinality: An element from one schema can participate in zero, one, or several correspondences (global cardinality of 1:1, 1:n/n:1, or n:m). Moreover, within a correspondence, one or more elements of the first schema may be matched with one or more elements of the second schema (local cardinality of 1:1, 1:n/n:1, or n:m) (see Section 4.6). For example, in Figure 10.1, Contact of S1 may be matched to both DeliverTo.ContactPers and BillTo.ContactPers of S2. Grouping these two match relationships within a single correspondence results in 1:n local cardinality. Representing them as two separate correspondences leads to 1:n global and 1:1 local cardinality. Most match approaches are restricted to 1:1 local cardinality by selecting for each schema element the most similar element from the other schema as the match candidate.

Figure 10.1 Schema examples for a simple match task

[Figure 10.1 shows the two purchase order schemas as graphs of schema nodes connected by containment links: S1 contains Contact with Name, Address, and Phone; S2 contains DeliverTo and BillTo, both of which contain the shared node ContactPers with Name, Address, and Phone.]

10.3 Quality Measures

To provide a basis for evaluating the quality of automatic match strategies, the match task first has to be manually solved. The obtained real match result can be used as the "gold standard" to assess the quality of the result automatically determined by the match system. Comparing the automatically derived correspondences with the real correspondences results in the sets shown in Figure 10.2, which can be used to define quality measures for schema matching. In particular, the set of automatically derived correspondences comprises B, the true positives or correct correspondences, and C, the false positives or false correspondences. False negatives, A, are correspondences needed but not automatically identified. Finally, true negatives, D, are false correspondences, which have also been correctly discarded by the match operation. Intuitively, both false negatives and false positives reduce the match quality. The union of all sets A, B, C, D represents the cross-product of the element sets of the two input schemas. Based on the cardinality of these sets, two common measures, Precision and Recall, which originate from the Information Retrieval field, can be computed:

Figure 10.2 Comparing real and automatically derived correspondences

[Figure 10.2 depicts the real and suggested correspondences as overlapping sets: A = false negatives, B = true positives, C = false positives, D = true negatives.]

• Precision reflects the share of real correspondences among all found ones:

    Precision = B / (B + C)

• Recall specifies the share of real correspondences that is found:

    Recall = B / (A + B)

In the ideal case, when no false negatives and false positives are returned, we have Precision = Recall = 1. However, neither Precision nor Recall alone can accurately assess the match quality. In particular, Recall can easily be maximized at the expense of a poor Precision by returning as many correspondences as possible, for example, the cross product of the two input schemas. On the other hand, a high Precision can be achieved at the expense of a poor Recall by returning only few but correct correspondences. Hence, it is necessary to consider both measures or a combined measure. Several combined measures have been proposed so far, in particular:
• Fmeasure(α) also stems from the Information Retrieval field [132]:

    Fmeasure(α) = B / ((1 − α)·A + B + α·C) = (Precision · Recall) / ((1 − α)·Precision + α·Recall)

The intuition behind this parametrized measure (0≤α≤1) is to allow different relative importance to be attached to Precision and Recall. In particular, Fmeasure(α) converges to Precision when α converges to 1, i.e., no importance is attached to Recall, and converges to Recall when α converges to 0, i.e., no importance is attached to Precision. When Precision and Recall are considered equally important, i.e., α=0.5, we obtain the following combined measure:
• Fmeasure represents the harmonic mean of Precision and Recall and is the most common variant of Fmeasure(α) in Information Retrieval:

    Fmeasure = 2·B / ((A + B) + (B + C)) = 2·(Precision · Recall) / (Precision + Recall)

Currently, it is used among others in [42, 92, 126] for estimating match quality.

• Overall has been introduced in [93] under the name Accuracy and is also used in [29]:

    Overall = 1 − (A + C)/(A + B) = (B − C)/(A + B) = Recall · (2 − 1/Precision)

Unlike Fmeasure(α), Overall was developed specifically in the schema matching context and embodies the idea of quantifying the post-match effort needed for adding false negatives and removing false positives. To compare the behavior of Fmeasure and Overall, Figure 10.3 shows them as functions of Precision and Recall, respectively. Fmeasure is clearly much more optimistic than Overall: for the same Precision and Recall values, Fmeasure is much higher than Overall. On the other hand, Overall is more sensitive to Precision than to Recall. In particular, Overall quickly degrades to 0 when Precision decreases to 0.5. If the number of false positives exceeds the number of true positives, i.e., Precision < 0.5, Overall can take arbitrary negative values, while the minimum value of the other measures is 0. Both Fmeasure and Overall reach their highest value 1.0 with Precision = Recall = 1.0. In all other cases, while the value of Fmeasure lies within the range determined by Precision and Recall, Overall is smaller than both Precision and Recall.
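As an illustration, the following is a minimal sketch of how these measures could be computed from a manually derived (real) match result and an automatically suggested one, both given as sets of correspondences; the function name and the toy data are ours, not part of COMA++.

    def match_quality(real: set, suggested: set, alpha: float = 0.5):
        """Compute Precision, Recall, Fmeasure(alpha), and Overall."""
        b = len(real & suggested)        # true positives
        c = len(suggested - real)        # false positives
        a = len(real - suggested)        # false negatives
        precision = b / (b + c) if b + c else 0.0
        recall = b / (a + b) if a + b else 0.0
        # Parametrized Fmeasure; alpha = 0.5 gives the harmonic mean.
        denom = (1 - alpha) * precision + alpha * recall
        fmeasure = (precision * recall) / denom if denom else 0.0
        # Overall becomes negative when Precision < 0.5.
        overall = recall * (2 - 1 / precision) if precision else 0.0
        return precision, recall, fmeasure, overall

    real = {f"c{i}" for i in range(8)}                             # 8 real correspondences
    suggested = {f"c{i}" for i in range(6)} | {"w1", "w2", "w3", "w4"}   # 6 correct, 4 false
    print(match_quality(real, suggested))                          # (0.6, 0.75, ~0.67, 0.25)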

Figure 10.3 Fmeasure and Overall as functions of Precision and Recall

10.4 Methodology: What Effort is Measured and How

Given that the main purpose of automatic schema matching is to reduce the amount of manual work, quantifying the user effort still needed is a major requirement. However, this is difficult because of the many subjective aspects involved and thus remains a largely unsolved problem. To assess the manual effort, one should consider both the pre-match effort required before an automatic matcher can run and the post-match effort to add the false negatives to and remove the false positives from the final match result. In fact, extensive manual effort may wipe out a large fraction of the labor savings obtained through the automatic matcher and therefore needs to be specified precisely.

Pre-match effort includes:
• Training of the machine learning-based matchers
• Configuration of the various parameters of the match algorithms, e.g., setting different threshold and weight values
• Specification of auxiliary information, such as domain synonyms and constraints
In all evaluations so far, the pre-match effort has not been taken into account for determining the quality of a match system or approach.
The simple measures Recall and Precision only partially consider the post-match effort. In particular, while 1−Recall gives an estimate for the effort to add false negatives, 1−Precision can be regarded as an estimate for the effort to remove false positives. In contrast, the combined measures Fmeasure(α) and Overall take both kinds of effort into account. Overall assumes equal effort to remove false positives and to identify false negatives, although the latter may require manual searching in the input schemas. On the other hand, the parameterization of Fmeasure(α) already allows individual cost weighting schemes to be applied. However, determining that a match is correct requires extra work not considered in either Overall or Fmeasure(α).
Unfortunately, the effort associated with such manual pre-match and post-match operations varies heavily with the background knowledge and cognitive abilities of users, their familiarity with tools, and the usability of tools (e.g., available GUI features such as zooming, highlighting the most likely matches by thick lines, graying out the unlikely ones, etc.), making it difficult to capture the cost in a general way.
Finally, the specification of the real match result depends on the individual user perception of correct and false correspondences as well as on the application context. Hence, the match quality can differ from user to user and from application to application for the same input schemas. This effect can be limited to some extent by consulting different users to obtain multiple subjective real match results [93].
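To make the difference between the two effort estimates concrete, the sketch below weights the two kinds of post-match effort separately, in the spirit of the cost weighting that Fmeasure(α) permits; the weights and the function itself are illustrative assumptions, not a measure defined in this thesis.

    def weighted_post_match_effort(a, b, c, w_add=2.0, w_remove=1.0):
        """Estimate post-match effort from false negatives (a), true positives (b),
        and false positives (c). Adding a missed correspondence (which requires
        searching the schemas) is assumed here to cost w_add, removing a wrong
        one w_remove."""
        recall = b / (a + b) if a + b else 0.0
        precision = b / (b + c) if b + c else 0.0
        # Normalized estimates: 1-Recall for additions, 1-Precision for removals.
        return w_add * (1 - recall) + w_remove * (1 - precision)

    print(weighted_post_match_effort(a=2, b=6, c=4))   # 2*0.25 + 1*0.4 = 0.9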

10.5 Runtime Performance

As no fully automatic solution is possible, automatic schema matching should effectively assist the user in interactively and iteratively solving the match problem. Besides the accuracy of match predictions and the capabilities of the user interface, the acceptance and practicability of a match system are further determined by its runtime performance, which can largely be characterized by the CPU and memory requirements and, most importantly, the observed execution times. Unfortunately, previous evaluations mostly ignore these aspects and often assume that matching is an offline process.
Typically, the performance of a match approach depends on the size and complexity of the input information to be processed, i.e., the number of schema elements, the amount of instance data, and auxiliary sources, such as synonym dictionaries. Match algorithms are computationally intensive, usually requiring quadratic complexity to compare all elements of the input schemas with each other. Hence, it is important to study the behavior of a match approach for problems of different sizes and to estimate its scalability for large schemas. This requires a careful design of the test problems and detailed documentation of the test environment.
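The quadratic element-pair comparison that dominates the cost of most matchers can be sketched as follows; the similarity function is only a placeholder, and the structure is a simplification of what a real matcher library would do.

    from difflib import SequenceMatcher

    def name_similarity(e1: str, e2: str) -> float:
        # Placeholder string similarity standing in for a real matcher.
        return SequenceMatcher(None, e1.lower(), e2.lower()).ratio()

    def similarity_matrix(source_elems, target_elems, sim=name_similarity):
        """O(|S1| * |S2|) pairwise comparison of all source and target elements."""
        return {(s, t): sim(s, t) for s in source_elems for t in target_elems}

    # For the Large series (2,500 x 26,228 paths) this amounts to roughly 65.6 million
    # comparisons, which is why scalability has to be evaluated explicitly.
    matrix = similarity_matrix(["Contact", "PO"], ["ContactPers", "PurchaseOrder"])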

10.6 Summary

In this chapter, we introduced and discussed the major criteria influencing the effectiveness of a schema matching approach, in particular, the chosen test problems, the representation of the match results, the design of the experiments, the metrics used to quantify the match quality and the amount of saved manual effort, and the runtime performance. We hope that our criteria provide a useful framework for conducting and describing future evaluations, so that their results become more reproducible and a comparison between different systems becomes easier.


CHAPTER 11 COMA++ EVALUATION: SCHEMA MATCHING

Following the criteria described in the last chapter, we performed several comprehensive evaluations of COMA++ on complex real-world schemas, including large E-business standards. The main goal was to systematically investigate the impact of different combination strategies, i.e., for aggregation, direction, selection, and computation of combined similarity, on match quality, and to compare the quality and execution time of different match strategies, i.e., for context-dependent and fragment-based matching, and of different matchers, i.e., single matchers and their combinations, with and without reuse. In this chapter, we first describe the test schemas and experiment design, and then discuss our findings and insights obtained in the single evaluations.

11.1 Test Schemas and Series

For our evaluations we used five XDR schemas for purchase orders (PO), CIDX, Excel, Noris, Paragon, and Apertum, originally taken from www.biztalk.org, and two XSD E-business standards, OpenTrans (www.opentrans.org) and XcblOrder (www.xcbl.org). For short, we refer to them as 1, 2, ..., 7, respectively. Table 11.1 summarizes the characteristics of the test schemas. In all schemas, the number of paths differs from the number of nodes, indicating the use of shared elements. The high ratio of shared elements leads to a high number of paths to be considered. Unlike the PO schemas, the E-business standards contain several subschemas, indicated by the number of schema roots. They also exhibit a more deeply nested structure, resulting in longer paths.

Table 11.1 Characteristics of the test schemas

No  Name       Type  Nodes/Paths    Root/Inner/Leaf/Shared Nodes  Max/Avg Path Len
1   CIDX       XDR   27 / 34        1 / 7 / 20 / 7                4 / 2.9
2   Excel      XDR   32 / 48        1 / 9 / 23 / 11               4 / 3.5
3   Noris      XDR   46 / 65        1 / 8 / 38 / 18               4 / 3.2
4   Paragon    XDR   59 / 77        1 / 11 / 48 / 13              6 / 3.6
5   Apertum    XDR   74 / 136       1 / 22 / 52 / 24              5 / 3.6
6   OpenTrans  XSD   195 / 2,500    8 / 85 / 110 / 129            11 / 7.0
7   XcblOrder  XSD   843 / 26,228   10 / 382 / 461 / 702          18 / 8.8

In order to examine the impact of schema size, we organized the schemas in three series of different complexity, Small, Medium, and Large, as shown in Table 11.2. The average problem size of each series is indicated by the average number of source and target nodes and paths. The Small series matches the PO schemas against each other, resulting in ten tasks, which are the same tasks as in the first evaluation of COMA [29]. The Medium series matches the PO schemas against OpenTrans, respectively, resulting in five tasks. The Large series consists of one match task involving the two large schemas OpenTrans and XcblOrder.
For each task, we manually derived the real match result by specifying all relevant correspondences between paths, i.e., element contexts, of the input schemas. Table 11.2 also shows the average number of required correspondences, the average number and ratio of global 1:1 correspondences, and the average schema similarity in each series. We observe for all series that only around half of the correspondences are unique 1:1 match relationships; the remaining ones constitute m:n global correspondences. Schema similarity was computed by applying the Dice strategy (the ratio of the matching paths over all paths in both input schemas - see Section 7.3) to the real match results. Due to the large size of OpenTrans and XcblOrder, schema similarity is very small in their series. In the Large series, for example, only 331 correspondences are required from a huge search space, i.e., the cross-product, of 2,500 x 26,228 paths.
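A minimal sketch of how such a Dice-style schema similarity could be derived from a match result is given below; it assumes that "matching paths" are the distinct source and target paths participating in at least one correspondence, which is our reading of the Dice strategy of Section 7.3.

    def dice_schema_similarity(correspondences, source_paths, target_paths):
        """Ratio of matched paths over all paths in both input schemas."""
        matched_src = {s for (s, t) in correspondences}
        matched_tgt = {t for (s, t) in correspondences}
        matched = len(matched_src) + len(matched_tgt)
        total = len(source_paths) + len(target_paths)
        return matched / total if total else 0.0

    # Toy example: 3 of 4 source paths and 3 of 6 target paths are matched -> 0.6
    corr = {("a", "x"), ("b", "y"), ("c", "z")}
    print(dice_schema_similarity(corr, ["a", "b", "c", "d"],
                                 ["x", "y", "z", "u", "v", "w"]))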

Table 11.2 Statistics of the test series

Series  Tasks  Avg Source    Avg Target     Avg      Avg Global    Avg
               Nodes/Paths   Nodes/Paths    Corresp  1:1 Corresp   Schema Sim
Small   10     36 / 49       60 / 95        48       25 / 55%      0.57
Medium  5      48 / 72       195 / 2,500    55       36 / 68%      0.04
Large   1      195 / 2,500   843 / 26,228   331      159 / 48%     0.02

To illustrate the hardness of matching large schemas with shared elements, we now take a closer look at the largest match task, OpenTrans-XcblOrder. Figure 11.1a and b show the distribution of nodes w.r.t. the number of their paths in the two schemas. The number of unique nodes (i.e., having only one path) is 66 in OpenTrans and 141 in XcblOrder. Comparing with Table 11.1, we can see that most nodes in both schemas have multiple contexts. Furthermore, there are several "generic" elements (e.g., Type in OpenTrans and Ident in XcblOrder), which are widely shared and thus yield very high numbers of paths (up to 160 in OpenTrans and 700 in XcblOrder). Figure 11.1c shows the global cardinalities of the real match result between the two schemas. Only around half of the correspondences (159 out of 331) also represent unique 1:1 match relationships. The remaining ones encode global m:n match relationships, especially 1:2. In such cases, the approach of selecting only the best match candidate (Max1), as often employed by previous work, would miss correspondences. Furthermore, the large number of paths essentially aggravates the scalability problem of the AllContext match strategy.

11.2 Experiment Design

To determine the effectiveness of the match operation, we used COMA/COMA++ only in automatic mode in our evaluations, i.e., we did not consider possible improvements by user feedback or manual refinements. The result returned by the automatic match operation was directly compared against the corresponding manually determined match result to compute four quality measures: Precision, Recall, Fmeasure, and Overall.¹¹ Because of the unequal importance of Precision and Recall in Overall (see Section 10.3), we mostly use Fmeasure to rank the observed match quality in our discussion.

Figure 11.1 Shared elements and match cardinalities in the Large series

[Panels a and b plot the number of nodes against their number of paths (contexts) in OpenTrans and XcblOrder, respectively; panel c shows the counts of the global cardinalities (1:1, 1:2, 2:1, etc.) in the OpenTrans-XcblOrder match result.]

Still, when appropriate, we show the values of all four measures for a comparison.
Each series defined in Section 11.1 was executed a number of times; in each execution we applied a different configuration, i.e., a match strategy, a fragment type, a matcher combination, and a tuple of strategies for similarity combination, to solve the match tasks of the series. Hereafter, we denote each execution of a series as an experiment, comprising the match operations using a particular configuration and the match results obtained for the tasks in the series. The quality measures were first determined for the single match tasks and then averaged over all tasks in the experiment. When talking about the match quality of an experiment, we refer to the average values of the measures, i.e., average Precision, average Fmeasure, etc.
Due to the flexibility of COMA/COMA++ in configuring match strategies and matchers, an exhaustive evaluation investigating the effectiveness of all possible configurations is virtually impossible. Furthermore, it is difficult to investigate all match parameters at the same time. Therefore, we performed the following three evaluations, successively investigating different aspects of match processing in COMA/COMA++:
1 Evaluation of combination strategies: This evaluation aimed at identifying the best strategies for similarity combination, i.e., aggregation, direction, selection, and computation of combined similarity, in order to specify them as the default strategies for combining matchers.
2 Evaluation of match strategies and matchers: This evaluation compared the context-dependent match strategies, NoContext, AllContext, and FilteredContext, applied to different fragment types, i.e., Schema, Subschema, and Shared.
3 Evaluation of reuse strategies: This evaluation compared the quality of various reuse possibilities supported by the Reuse matcher and its combination with the no-reuse matchers.
For semantic name matching in the Synonym matcher, we constructed a synonym file with 22 trivial abbreviations, such as no and num for number, and 40 domain-specific synonyms, such as (ship, deliver) and (bill, invoice). This auxiliary information was used uniformly in all three evaluations. All time measurements were performed using Sun Java 1.4.2 libraries on a Linux machine equipped with a 2.4 GHz Intel Xeon processor and 1 GB RAM.

11. Recall that the correspondences in our match results capture single pairs of matching elements, i.e., local 1:1 cardinality. A global 1:n/n:1 match will be represented and counted as n correspondences.

The three evaluations are discussed in Sections 11.3, 11.4, and 11.5, respectively. For each evaluation, we begin with a more detailed description of the test methodology before discussing our findings. Finally, Section 11.6 summarizes the most important results obtained from all evaluations of COMA++.

11.3 Combination Strategies

Methodology
To determine the best combination strategies, we performed a systematic evaluation for all relevant matchers and combination strategies (see Table 11.3). Due to the high number of possible configurations, we considered in this evaluation only the Small series and the AllContext match strategy, which consists of a single iteration of similarity combination and hence suits this purpose best. Furthermore, we only tested the more powerful combined matchers and their combinations, as the hybrid matchers do not perform well due to their limited focus. As element comments are not available in most test schemas, we did not consider the Comment matcher in this evaluation. We observed that NamePath outperforms all other matchers due to its ability to distinguish between different contexts of shared elements [29]. Therefore, we only report the results for the most promising matcher combinations involving NamePath and 7 other combined matchers, i.e., Name, NameType, NameStat, Children, Leaves, Parents, and Siblings, resulting in 127 combinations in addition to the single NamePath matcher.¹²

Table 11.3 Test configuration for combination strategies

Matcher             Aggregation  Direction   Selection                    Combined Sim
1 single: NamePath  -            LargeSmall  MaxN(1-4)                    Average
127 combinations    Max          SmallLarge  Delta(0.001-0.01)            Dice
                    Average      Both        Thr(0.3-1.0)
                    Min                      Thr(0.5)+MaxN(1-4)
                                             Thr(0.5)+Delta(0.001-0.01)
Sum: 128            3            3           28                           2

Each matcher/matcher combination was exhaustively tested with all possible combinations of the strategies for aggregation, direction, selection, and computation of combined similarity, whereby the aggregation strategies are not relevant for a single matcher. The Weighted aggregation strategy was not considered because we did not want to make any assumption about the importance of the individual matchers. For each selection strategy, we chose a generous parameter range in which the best result is to be expected. In particular, we tested MaxN with up to 4 candidates, MaxDelta (or short Delta) with 6 relative Delta values between 0.001 and 0.01, and Threshold (or short Thr) with 8 threshold values between 0.3 and 1.0. Furthermore, we tested the combinations of MaxN and MaxDelta, respectively, with Threshold using a low threshold of 0.5. To test the strategies for computing combined similarity, we set the configuration of the combined matchers uniformly to either the Average or the Dice strategy.
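To illustrate the selection step, the following sketch applies MaxN, MaxDelta, Threshold, and their combination to the candidate similarities of a single schema element; it is a simplified reading of these strategies (in particular of the relative Delta tolerance), not the COMA++ implementation.

    def select_candidates(sims, strategy="delta", n=1, delta=0.008, threshold=0.5):
        """sims: dict mapping candidate element -> similarity value."""
        ranked = sorted(sims.items(), key=lambda kv: kv[1], reverse=True)
        if strategy == "maxn":          # MaxN: the n best candidates
            return ranked[:n]
        if strategy == "delta":         # MaxDelta: all within a relative tolerance of the best
            best = ranked[0][1]
            return [(e, s) for e, s in ranked if s >= best - delta * best]
        if strategy == "threshold":     # Threshold: all above a fixed similarity
            return [(e, s) for e, s in ranked if s >= threshold]
        if strategy == "thr+delta":     # Combination of Threshold and MaxDelta
            best = ranked[0][1]
            return [(e, s) for e, s in ranked
                    if s >= threshold and s >= best - delta * best]
        raise ValueError(strategy)

    sims = {"DeliverTo.ContactPers": 0.82, "BillTo.ContactPers": 0.815, "InvoiceTo": 0.44}
    print(select_candidates(sims, "thr+delta"))   # keeps both ContactPers paths, drops InvoiceTo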

12. The previous evaluation of COMA [29] only considered pair-wise combinations of matchers and the combination of all matchers (All). Here, we re-performed the evaluation on COMA++ and also tested with combinations of 3, 4, ..., 8 matchers.

In the rest of this section, we first describe our general observations and then discuss the quality of the strategies for the single combination steps, i.e., aggregation, direction, selection, and computation of combined similarity, respectively, as well as the quality of their combined use.

Figure 11.2 Best average quality and experiment distribution w.r.t. Fmeasure

[Panel a shows the maximum average Precision, Recall, Fmeasure, and Overall; panel b shows how many of the 64,176 experiments fall into each Fmeasure range from 0.0 to 1.0.]

General Observations
Figure 11.2a shows the best average quality for the 10 match tasks of the Small series. For the PO schemas in the Small series, we achieve high accuracy with a best average Precision, Recall, Fmeasure, and Overall of 0.94, 0.84, 0.89, and 0.79, respectively. Compared to the best average quality observed in the earlier evaluation of COMA for the no-reuse matcher combinations [29], Precision 0.95, Recall 0.78, Fmeasure 0.85, and Overall 0.73, we see an improvement of Recall, Fmeasure, and Overall by 7%, 4%, and 6%, respectively. This is possible due to the new matchers NameStat, Parents, and Siblings added to COMA++.
Figure 11.2b shows the distribution of the 64,176 performed experiments with respect to different Fmeasure ranges. In particular, the x-axis covers the entire value range of Fmeasure, [0-1], which is divided into successive ranges of constant size [0.0-0.1), [0.1-0.2), ..., [0.8-0.9), and [0.9-1.0). The y-axis shows the number of experiments with average Fmeasure within a particular range. We observe that around two thirds of the experiments exhibit a low or moderate quality with average Fmeasure below 0.5. On the other hand, no experiment achieves an average Fmeasure greater than 0.9.
Due to the exhaustive evaluation, the alternative strategies for a combination step, such as Min, Max, and Average for aggregation, are each involved in the same number of experiments. We make use of this observation and compare the strategies according to the distribution of their experiments over the Fmeasure ranges. For this purpose, we determine for each strategy the share of its experiments in each Fmeasure range, i.e., the ratio of the experiments involving it over all experiments in the range. The distribution of the experiments indicates the quality of the respective strategy: a strategy of high quality should show a high presence, i.e., a high experiment share, in the higher Fmeasure ranges.
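This comparison by experiment share can be sketched as follows; the data layout (strategy and average Fmeasure per experiment) is an assumption made for illustration.

    from collections import Counter

    def experiment_shares(experiments, strategy_of, fmeasure_of, bins=10):
        """For each Fmeasure range, compute each strategy's share of the experiments in it."""
        per_bin = Counter()
        per_bin_strategy = Counter()
        for e in experiments:
            b = min(int(fmeasure_of(e) * bins), bins - 1)   # bin index 0..bins-1
            per_bin[b] += 1
            per_bin_strategy[(b, strategy_of(e))] += 1
        return {(b, s): n / per_bin[b] for (b, s), n in per_bin_strategy.items()}

    # Usage: experiments as (strategy, average Fmeasure) tuples
    exps = [("Max", 0.45), ("Average", 0.85), ("Min", 0.82), ("Average", 0.88)]
    shares = experiment_shares(exps, strategy_of=lambda e: e[0], fmeasure_of=lambda e: e[1])
    print(shares[(8, "Average")])   # Average's share in range [0.8-0.9): 2 of 3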

Aggregation
Figure 11.3 shows the distribution of the experiments and the best average Fmeasure and Overall for each aggregation strategy. Although their shares in the figure appear disproportionate, each strategy is involved in 21,336 experiments. We observe that Max is only represented in Fmeasure ranges below 0.6 and achieves its best average quality with Fmeasure 0.55 and Overall 0.15.

Figure 11.3 Experiment distribution and quality for aggregation

[Panel a shows the experiment share of Max, Min, and Average per Fmeasure range (21,336 experiments per strategy); panel b shows their best average Fmeasure and Overall.]

This is because of, first, the optimistic nature of Max, and second, the inaccuracy of the combined matchers, which, except for NamePath, do not consider element contexts. In such cases, the pessimistic strategy Min and the compensating strategy Average are more stable. While both can reach the highest Fmeasure range beyond 0.8, the experiments with Average dominate this range, accounting for 90% of all experiments encountered in the range. Furthermore, with Average we can also achieve a better quality (Fmeasure 0.89, Overall 0.79 - the best average quality for the Small series) than with Min (Fmeasure 0.86, Overall 0.75). Thus, while Max is only suitable for combining accurate matchers, Min and Average are also able to cope with inaccurate ones.
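For completeness, a minimal sketch of how the matcher-specific similarities for one element pair could be aggregated with Max, Min, and Average; this is a simplified reading of the aggregation step, not the COMA++ code.

    def aggregate(matcher_sims, strategy="average"):
        """matcher_sims: list of similarity values, one per matcher, for one element pair."""
        if strategy == "max":
            return max(matcher_sims)       # optimistic: trusts the best matcher
        if strategy == "min":
            return min(matcher_sims)       # pessimistic: trusts the worst matcher
        if strategy == "average":
            return sum(matcher_sims) / len(matcher_sims)   # compensating
        raise ValueError(strategy)

    print(aggregate([0.9, 0.4, 0.7], "average"))   # 0.666...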

Figure 11.4 Experiment distribution and quality for direction

[Panel a shows the experiment share of SmallLarge, LargeSmall, and Both per Fmeasure range (21,392 experiments per strategy); panel b shows their best average Fmeasure and Overall.]

Direction
Figure 11.4 shows the distribution of the experiments and the best average quality for the direction strategies, each of which yields 21,392 experiments. SmallLarge is only represented in Fmeasure ranges below 0.7, while LargeSmall and Both can reach an average Fmeasure beyond 0.8. The highest range of Fmeasure>0.8 is dominated by Both, which accounts for about 90% of the experiments encountered in the range. This shows the impact of the size difference between the source and the target schema on the quality of directional matching. SmallLarge tries to find match candidates from the smaller source schema for the larger target schema, resulting in a larger match result with apparently greater inaccuracy than for LargeSmall.

On the other hand, Both can produce much better results than both LargeSmall and SmallLarge and is less sensitive to the size of the input schemas. Both also yields the best average quality for the Small series with Fmeasure 0.89 and Overall 0.79, which are much higher than those of the directional approaches.

Figure 11.5 Experiment distribution and quality for selection

[Panel a shows the experiment share of the best variants of Threshold, MaxN, MaxDelta, and their combinations with Thr(0.5) per Fmeasure range (2,292 experiments per strategy); panel b shows their best average Fmeasure and Overall.]

Selection
From the many tested selection strategies, we first determined the most effective parameter settings for each approach. Figure 11.5 shows the distribution of the experiments and the best average Fmeasure and Overall for these selection strategies. Each strategy is involved in 2,292 experiments. The Threshold approach shows the worst result, indicating the difficulty of choosing an appropriate threshold for many match tasks. The average Fmeasure values for its best variant, Thr(0.8), are always below 0.7 with a maximum of only 0.63, which corresponds to a much lower Overall value of 0.37. The remaining strategies, i.e., MaxN(1), Thr(0.5)+MaxN(1), Delta(0.008), and Thr(0.5)+Delta(0.008), can produce an average Fmeasure greater than 0.8. In particular, the combination with Thr(0.5) slightly improves the quality of the single strategies MaxN(1) and Delta(0.008). Due to the high number of 1:n matches, MaxN(1) and Thr(0.5)+MaxN(1) miss some correspondences and show slightly worse quality than Delta(0.008) and Thr(0.5)+Delta(0.008). In particular, Thr(0.5)+Delta(0.008) also yields the best average quality in this evaluation with Fmeasure 0.89 and Overall 0.79.

Computation of Combined Similarity
We generally observe some degradation of match quality using Dice, compared to Average, for computing combined similarity in the combined matchers. As we can see in Figure 11.6, Dice and Average each yield 32,088 experiments and are quite competitive with each other. However, in the highest Fmeasure range (>0.8), Average is represented in 60% of the experiments and thus dominates Dice. With Average, we also obtain the best quality of average Fmeasure 0.89 and average Overall 0.79, which is slightly better than the best average quality achieved using Dice. Thus, we choose Average as the default strategy for computing combined similarity in our combined matchers.

Figure 11.6 Experiment distribution and quality for combined similarity

[Panel a shows the experiment share of Dice and Average per Fmeasure range (32,088 experiments per strategy); panel b shows their best average Fmeasure and Overall.]

Combined Use of Strategies
So far, the results indicate that the best single strategies for aggregation, direction, selection, and computation of combined similarity are Average, Both, Threshold(0.5)+Delta(0.008) or Delta(0.008), and Average, respectively. To verify whether the combined use of these strategies also yields the best match results, we identify the best experiment (with the highest average Fmeasure) for each matcher/matcher combination and examine the employed combination strategies. Table 11.4 shows the occurrence of the combination strategies in the 128 experiments for the single matcher NamePath and the 127 matcher combinations. In particular, we can make the following observations:

Table 11.4 Occurrence of combination strategies in the best experiments

Step          Strategy                 Occurrence  Sum
Aggregation   Average                  124         127
              Min                      3
              Max                      0
Direction     Both                     128         128
              LargeSmall               0
              SmallLarge               0
Selection     Thr(0.5)+Delta(0.01)     44          128
              Thr(0.5)+Delta(0.008)    28
              Thr(0.5)+Delta(0.006)    22
              Delta(0.008)             11
              Delta(0.01)              10
              Others (4 strategies)    13
Combined Sim  Average                  117         128
              Dice                     11

• 124 of the 127 matcher combinations achieve their best quality with Average for aggregation. Min is observed for only 3 matcher combinations. (Recall that aggregation is not needed for single matchers.)
• All 128 matchers/matcher combinations employ Both as the direction in their best configuration.

• Most matchers/matcher combinations achieve their best quality using the combined strategy of Threshold and MaxDelta for selection. The best Delta values are 0.01, 0.008, and 0.006, observed for 44, 28, and 22 matcher combinations, respectively.
• Most of the 128 matchers/matcher combinations use Average for computing combined similarity. Dice yields the best quality for only 11 matchers/matcher combinations.
Average, Both, and Average are clearly the best strategies for aggregation, direction, and computation of combined similarity, respectively, and are thus specified as the default strategies for the corresponding steps. While the combination Threshold(0.5)+MaxDelta is identified as the best strategy for selection, the choice of an appropriate Delta value remains an open issue. Therefore, we also considered this in the subsequent evaluations to obtain better insights on this parameter. In particular, we found later on that the quality of a Delta value largely depends on the match strategy employed, the size of the input schemas, and the number of matchers involved.

11.4 Match Strategies and Matchers

Methodology
The main goal of this evaluation was to compare the effectiveness and time performance of the different context-dependent match strategies, and to investigate the impact of other parameters on match quality, such as fragment types, matchers, and schema size. We performed test experiments with all three series, Small, Medium, and Large. Based on the insights from the first evaluation, we focused on the parameters most likely to yield the best quality:
• We applied the match strategies NoContext, AllContext, and FilteredContext to the fragment types Schema, Subschema, and Shared, respectively. For short, we denote them hereafter as NoC+Schema, AllC+Schema, FiltC+Schema, etc. Each was tested with the same matcher combinations and combination strategies as follows.
• As pointed out in the previous evaluation, considering hierarchical names as implemented in NamePath is a prerequisite for achieving high quality in context-dependent matching. Like in the last evaluation, we only considered NamePath as a single matcher and 127 different combinations of NamePath with the 7 other combined matchers (excluding the Comment matcher), resulting in 128 alternatives altogether.
• We used the best combination strategies identified in the previous evaluation as presets to combine the matchers, in particular, Average for aggregation, Both for direction, and Average for computing combined similarity. For selection, we used the combined strategy Threshold+MaxDelta. In particular, MaxDelta is able to dynamically vary the number of match candidates for each element according to their similarities and thus can best deal with m:n match cardinalities for schemas with many shared elements, such as OpenTrans and XcblOrder. Like in the last evaluation, we varied the relative tolerance range Delta over the 6 data points 0.001, 0.002, 0.004, 0.006, 0.008, and 0.01, while fixing the Threshold value to 0.5.

Performance of Context-dependent Match Strategies
We first compare the quality of the context-dependent match strategies across the different test series. Figure 11.7a shows the best average Fmeasure of NoContext¹³, AllContext, and FilteredContext applied to match complete schemas (i.e., with fragment type Schema). In general, match quality decreases with increasing schema size.

Table 11.5 Test configuration for match strategies

Series  Strategy         Fragment   Matcher                        Similarity Combination
Small   NoContext        Schema     1 single: NamePath             Aggregation: Average
Medium  AllContext       Subschema  127 combinations of NamePath   Direction: Both
Large   FilteredContext  Shared                                    Selection: Threshold(0.5)+MaxDelta(0.001-0.01)
                                                                   Combined Similarity: Average
Sum: 3  3                3          128                            6

Compared to the other strategies, NoContext shows the worst quality in all series, as shared elements in the input schemas lead to many wrong correspondences. With the increasing ratio of shared elements in the input schemas from the Small series to the Large series, the average Fmeasure of NoContext decreases from 0.6 to 0 due to degrading precision. On the other hand, AllContext and FilteredContext are very competitive with each other. Although performing slightly worse than AllContext in the Small series, FilteredContext achieves almost the same quality as AllContext in the Medium and Large series. The best average Fmeasure achieved is 0.89, 0.68, and 0.66 for the Small, Medium, and Large series, respectively, which is quite promising, especially considering the complexity of the match tasks.

Figure 11.7 Quality and execution times of context-based match strategies

[Panel a shows the best average Fmeasure of NoC+Schema, AllC+Schema, and FiltC+Schema for the Small, Medium, and Large series; panel b shows the execution times in seconds of NoContext, AllContext, and FilteredContext with the NamePath and All matcher configurations.]

Besides match quality, execution time is another critical factor for the practicability of a match strategy. We thus estimate the time range required by each strategy by testing it with two extreme matcher combinations: the least expensive approach using only the single matcher NamePath, and the most expensive approach utilizing all 8 combined matchers, hereafter All. Figure 11.7b shows the average execution times of the match strategies observed in the single series. In general, execution time increases with schema size and the number of involved matchers. In the Small and Medium series, all three strategies perform similarly; however, they exhibit largely differing execution times in the Large series. In particular, AllContext using NamePath and All requires about 7 and 10 minutes, respectively, to solve the OpenTrans-XcblOrder match task,

13. The manually derived match results, i.e., the gold standard, contain correspondences between paths. Therefore, to determine the quality of NoContext results, node correspondences are transformed to path correspondences by specifying all path pairs of two matching nodes as correspondences. This also truthfully represents the context-independent nature of NoContext.

while the other strategies, even with All, require only around 50 seconds for the same match task. Furthermore, we generally observe similar execution times for NoContext and FilteredContext, indicating the high efficiency of FilteredContext in obtaining context-dependent match results.

Performance of Fragment-based Match Strategies
We now examine the performance of the context-dependent match strategies applied to different fragment types. Due to the low quality of NoContext, we do not consider it further but only report the results for the remaining strategies, AllContext and FilteredContext. Figure 11.8a shows the best average Fmeasure of the match strategies for the single series. In the Small series, matching complete schemas using AllC+Schema and FiltC+Schema yields the same quality as matching subschemas using AllC+Subschema and FiltC+Subschema, respectively, as the PO schemas do not have subschemas. However, in the Medium and Large series, both AllContext and FilteredContext yield slightly better quality with Subschema than with Schema due to the reduced search space in the former case. Using Shared fragments, as in AllC+Shared and FiltC+Shared, generally performs worse than the other fragment strategies due to incomplete schema coverage. However, it is still promising for schemas with a high ratio of shared elements like OpenTrans and XcblOrder.

Figure 11.8 Quality and execution times of fragment-based match strategies

[Panel a shows the best average Fmeasure of AllContext and FilteredContext with the fragment types Schema, Subschema, and Shared for the three series; panel b shows the corresponding execution times for the Large series with the NamePath and All matcher configurations.]

Figure 11.8b shows the execution times of the different fragment-based match strategies for the Large series. AllC+Shared does not improve execution time over AllC+Schema despite the reduced fragment size. This is because of the high number of shared elements, i.e., fragments, which requires matching almost the complete schemas in order to identify similar fragments and their matching contexts. Only when combined with Subschema is AllContext feasible for large schemas. In particular, AllC+Subschema using NamePath and All requires only around 40 and 100 seconds, respectively, for the OpenTrans-XcblOrder match task. On the other hand, FilteredContext generally shows very good time performance and thus represents the better match strategy for large schemas. In particular, the three fragment types yield similar execution times of less than 5 seconds using NamePath and around 40 seconds using All for the same match task.

Impact of Matchers and Delta
To investigate the impact of the employed matcher combination and Delta value, we determine the value range, i.e., the minimum, maximum, and average, of the average Fmeasure achieved by all 128 matchers/matcher combinations for each Delta value in the single series. Due to the similar behavior across fragment types, we only show the results for AllContext and FilteredContext with Schema. In Figure 11.9, each diagram illustrates one match strategy in one series, with the x-axis representing the Delta values and the y-axis the Fmeasure values. The vertical line at a Delta value represents the value range of the average Fmeasure achieved by the 128 matchers/matcher combinations with the corresponding Delta value, while the curve in a diagram connects the averages of the average Fmeasure across the different Delta values.

Figure 11.9 Quality variation of AllContext and FilteredContext

[Six diagrams, one per combination of match strategy (AllContext, FilteredContext) and series (Small, Medium, Large), plot the range and average of the average Fmeasure over the Delta values 0.001-0.01.]

We first take a look at AllContext (top row in Figure 11.9). In the Small series, the quality variation at each Delta value is relatively small, indicating robustness against the choice of matchers. The average values are close to the maximum, indicating the high quality of most matcher combinations. The best quality is achieved with Delta values of 0.006 or 0.008. With increasing schema size, the variation range at each Delta value increases significantly and the quality degrades with increasing Delta. For the Medium and Large series, the best quality is only achieved with the smallest Delta, 0.001. This is explained by the fact that a very high number of candidate correspondences is found in the same similarity space [0, 1]. The average Fmeasure value for the smallest Delta is closest to the maximum, indicating that most matcher combinations still perform well with this Delta value.
As for the FilteredContext strategy (bottom row in Figure 11.9), we notice a large variation range of average Fmeasure in all series. However, the average values are mostly close to the maximum, indicating that most matcher combinations achieve good quality and only a few are outliers with bad quality. Furthermore, FilteredContext is more robust than AllContext against the choice of Delta, thereby limiting the tuning effort. In particular, the averages of the average Fmeasure remain almost constant in all series despite increasing Delta.

Choice of Matchers Given a library of individual matchers, one important question is the choice of a matcher combination for a given match task. We approach this by first analyzing the statistics of

the best matcher combinations. We rank all 128 matchers/matcher combinations in ascending order according to their best average Fmeasure in each series. Focusing on the 10 best matcher combinations, we determine the average number of involved matchers and the occurrence of the single matchers, i.e., the number of matcher combinations involving a particular matcher. As NamePath is contained in all matcher combinations, we omit it from this analysis.

Figure 11.10 Matcher occurrence in 10 best matcher combinations

[Panel a shows the average number of matchers in the 10 best combinations per series; panels b and c show, for AllContext and FilteredContext respectively, how often each matcher (Name, NameType, NameStat, Children, Leaves, Parents, Siblings) occurs in the 10 best combinations of the Small, Medium, and Large series and on average.]

Due to the similar behavior across fragment types, we only show the results for AllContext and FilteredContext with Schema in Figure 11.10. We observe in Figure 11.10a that with increasing schema size, fewer matchers should be combined. AllContext is more sensitive than FilteredContext to this aspect. In particular, the 10 best matcher combinations of AllContext involve around 4-5 matchers in the Small series and around 3 matchers in the Medium and Large series. In contrast, those of FilteredContext generally involve around 4 matchers in all series.
Figure 11.10b and c show the occurrence of the matchers for AllContext and FilteredContext, respectively. On the x-axis, the matchers are sorted according to their average occurrence (Average) over all three series. For AllContext, we observe a strong fluctuation of matcher occurrence w.r.t. schema size. In particular, only Leaves shows a constantly high occurrence in all series (>5, i.e., involved in at least 5 of the top-10 combinations). The behavior of the matchers becomes more predictable in FilteredContext. In particular, Name, NameType, Leaves, and Parents are the best matchers, showing constantly high occurrence (>5) in all series. Siblings and Children are unstable, with occurrences varying strongly between the series. On the other hand, sorting the matchers according to their average occurrence over all three series yields the ranking shown on the x-axis of Figure 11.10b and c. Therefore, in addition to NamePath, we take the three best matchers, Parents, Name, and Leaves, to build our default matcher combination (hereafter Default).
In the next step, we analyze the behavior of the four most important matcher combinations: a) Best, the best matcher combination showing the highest average Fmeasure, b) Default, our default matcher combination, c) All, the combination of all 8 combined matchers, and d) the single NamePath matcher. We determine the value range, i.e., the minimum, maximum, and average, of the average Fmeasure achieved by the corresponding matcher combinations over the entire Delta range [0.001-0.01] for the single series. Figure 11.11 shows the observed variation of the average Fmeasure (left) and the best average quality with all four measures (right) for AllContext (top) and FilteredContext (bottom).

Figure 11.11 Quality variation of Best, Default, All, and NamePath

[For AllContext (top) and FilteredContext (bottom), the left diagrams show the variation of the average Fmeasure of Best, Default, All, and NamePath over the Delta range per series; the right diagrams show their best average Precision, Recall, Fmeasure, and Overall.]

With respect to quality variation, we observe that AllContext shows a stable behavior, i.e., low variation, either for small schemas or for large schemas with only few matchers involved. In particular, NamePath generally exhibits very little variation, while All shows the strongest variation, especially in the Medium and Large series. On the other hand, FilteredContext generally shows a stable behavior against different Delta values. In particular, the variation of the average Fmeasure is almost negligible (+/-0.03) for all matcher combinations in all series. For both match strategies and all series, we observe that the quality behavior of Default is very close to that of Best, indicating the high quality and stability of our default matcher combination.
Regarding the best average quality achieved, we observe that all matcher combinations outperform NamePath, motivating the use of multiple matchers instead of a single matcher. The best quality of Default is close to that of the Best combination. Although All also exhibits very good quality, its use with AllContext is not recommended for large schemas due to the high execution time. We achieve high average Precision (>0.9) and average Recall (>0.8) in the Small series. The quality degrades in the Medium and Large series to average Precision and Recall of around 0.7 and 0.5, respectively. We can also see that Overall is very pessimistic compared to Fmeasure, returning a much lower value for the same Precision and Recall.

11.5 Reuse Strategies

Methodology
The main goal of this evaluation was to determine and compare the effectiveness of different reuse strategies with each other and with that of the no-reuse match strategies. As

no mapping path can be constructed with the manually derived match results for the match task OpenTrans↔XcblOrder, we only tested with the Small and Medium series. For these, 15 different mappings are available involving the five PO schemas and OpenTrans. Figure 11.12 shows the graph of the schemas and mappings (a) and illustrates different mapping paths for a sample match task (b). In particular, to solve a particular task, we can exploit different mapping paths built from the match results of the remaining 14 tasks. In addition to the manually derived match results, we used the Best matcher combination (see the last section) to generate the automatic match results with the highest average quality for the Small and Medium series, respectively. By testing with these results, we aimed at estimating the quality of reuse strategies that use incomplete mapping paths.

Figure 11.12 Graph of previous mappings and examples for mapping paths

[Panel a depicts the graph of previous mappings connecting the schemas CIDX, Excel, Noris, Paragon, Apertum, and OpenTrans. Panel b lists example mapping paths for the match task CIDX↔OpenTrans:
Length 2: CIDX ↔ Excel ↔ OpenTrans; CIDX ↔ Noris ↔ OpenTrans
Length 3: CIDX ↔ Excel ↔ Noris ↔ OpenTrans; CIDX ↔ Excel ↔ Paragon ↔ OpenTrans
Length 4: CIDX ↔ Excel ↔ Noris ↔ Paragon ↔ OpenTrans; CIDX ↔ Excel ↔ Noris ↔ Apertum ↔ OpenTrans]

We performed test experiments for the Small and Medium series, respectively, using the Reuse matcher (described in Section 6.4) and its combination with other combined matchers. As summarized in Table 11.6, the quality of the different reuse strategies was systematically determined and compared using the following configurations:
• The Reuse matcher was tested with both automatically and manually derived match results, indicated by the two variants ReuseA and ReuseM, respectively. In addition to the standalone reuse, we also utilized the Reuse matcher as an additional matcher to each of the 128 matchers/matcher combinations tested in the last evaluation. That is, a matcher combination in this evaluation contains NamePath and either ReuseA or ReuseM. The matchers were combined using the AllContext match strategy for complete schemas, i.e., AllC+Schema, which was configured with the default combination strategies, Average, Both, Threshold+MaxDelta, and Average, for aggregation, direction, selection, and combined similarity, respectively. As for selection, we again fixed the Threshold value to 0.5 and varied the relative tolerance range Delta between 0.001 and 0.01 as done in the previous evaluations.
• An important task was to identify the best strategies for the Reuse matcher to compose transitive similarities and to aggregate the similarities obtained from different mapping paths. Similarly to evaluating the aggregation step in our combination scheme, we tested and compared the Reuse matcher configured with three strategies for similarity composition and aggregation, Average, Max, and Min, respectively.
• The negative effects of MatchCompose, i.e., missing and undesirable m:n correspondences, are expected to be aggravated with increasing length of the mapping paths. We studied this issue by testing the Reuse matcher with mapping paths of different

lengths, in particular 2, 3, and 4, respectively, as illustrated in Figure 11.12b. With 6 schemas and 15 (either automatically or manually derived) mappings between them, we obtain for each match task 4 different mapping paths of length 2, 12 of length 3, and 24 of length 4.
• The choice of a pivot schema is a further factor impacting reuse quality. Hence, in addition to the basic scenario without a pivot schema, we also experimented with three schemas as the pivot schema, namely CIDX, Apertum, and OpenTrans, which are the smallest, middle-sized, and largest schema, respectively, among the 6 schemas. All mapping paths were filtered to involve the selected pivot schema, resulting in 1, 6, and 18 mapping paths of length 2, 3, and 4, respectively, for a match task.
• Often, many mapping paths are available (up to 24 in our scenario) and can all be combined to solve a single match task. The high number of mapping paths, besides causing performance overhead, may also affect match quality. To systematically study this issue, we first ranked for each match task the mapping paths in both descending and ascending order of their average schema similarity. Depending on the number of available mapping paths, we then successively tested with different selections of k mapping paths, with k ranging from 1 up to 24, the maximal number of mapping paths available for a match task. In particular, only the top k mapping paths in the ranking are considered by the Reuse matcher to derive a match result.

Table 11.6 Test configuration for reuse

Series  Matcher                         Composition  Path Len  Pivot Schema       Top-K Paths
Small   2x single: ReuseA and ReuseM    Average      2         No pivot           Asc: 1-24
Medium  2x 128 combinations with        Max          3         Small: CIDX        Desc: 1-24
        NamePath and ReuseA or ReuseM   Min          4         Medium: Apertum
                                                               Large: OpenTrans
Sum: 2  2 x 129                         3            3         4                  2 x 24

Composition of Transitive Similarities
We first examine the impact of the different strategies for composing transitive similarities on the quality of the Reuse matcher. Figure 11.13 shows the best average quality obtained using the variants of the Reuse matcher, ReuseA and ReuseM, with different composition strategies for the Small and Medium series, respectively. No pivot schemas or top-k mapping paths are applied; instead, all available mapping paths are considered and combined to solve each match task. The results are similar for the different lengths of mapping paths. We therefore only discuss our findings for length 2.
Both reusing automatically and manually derived match results, i.e., ReuseA and ReuseM, perform best with Max, which yields the highest Fmeasure value compared to Average and Min. While the Average strategy is most successful for combining matcher-specific similarities, i.e., unfiltered match results (see Section 11.3), the Max strategy with its optimistic nature is most effective for high-quality match results containing only correct or most likely correct correspondences. Max can identify most of the required correspondences, leading to the highest Recall values compared to the other strategies. However, Max also usually predicts more false correspondences, leading to the lowest Precision values among the strategies. Min is pessimistic and typically returns only few, but mostly correct correspondences, leading to high Precision but low Recall. Average represents the compensating strategy between the two extremes Max and Min, yielding reasonable values for both Precision and Recall.

Figure 11.13 Quality for composition strategies

[Panels a and b show Precision, Recall, Fmeasure, and Overall of ReuseA and ReuseM with the composition strategies Average, Max, and Min for the Small and Medium series (path length 2).]

Using manually derived match results, Average even delivers a better average Overall than Max due to the high Precision achieved. However, because of the generally high average Fmeasure of Max, we set it as the default strategy for composing transitive similarities in the MatchCompose operation.
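The following is a minimal sketch of how transitive similarities could be composed along a mapping path; the composition function mirrors the Average/Max/Min strategies discussed above, but the exact MatchCompose semantics of Section 6.3 may differ in its details.

    def compose(mapping_ab, mapping_bc, compose_fn=max):
        """Derive A<->C correspondences from A<->B and B<->C mappings.
        Mappings are dicts: (a, b) -> similarity. compose_fn combines the two
        similarities along the path (e.g., max, min, or an average)."""
        result = {}
        for (a, b1), s1 in mapping_ab.items():
            for (b2, c), s2 in mapping_bc.items():
                if b1 == b2:
                    s = compose_fn(s1, s2)
                    # Aggregate if (a, c) is reachable via several intermediate elements.
                    result[(a, c)] = max(result.get((a, c), 0.0), s)
        return result

    ab = {("Contact", "ContactPerson"): 0.9}
    bc = {("ContactPerson", "DeliverTo.ContactPers"): 0.8,
          ("ContactPerson", "BillTo.ContactPers"): 0.8}
    print(compose(ab, bc, compose_fn=min))
    # {('Contact', 'DeliverTo.ContactPers'): 0.8, ('Contact', 'BillTo.ContactPers'): 0.8}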

Figure 11.14 Quality for mapping path lengths

[Panels a and b show Precision, Recall, Fmeasure, and Overall of ReuseA and ReuseM for mapping path lengths 2, 3, and 4 in the Small and Medium series (Max composition).]

Length of Mapping Paths
Figure 11.14 shows the best average quality obtained using both ReuseA and ReuseM for each mapping path length in the single test series. We do not consider pivot schemas or top-k selection of mapping paths here, but combine all mapping paths available for each match task. While we present the results only for the Max composition strategy, they also apply to the Average and Min strategies. In general, match quality degrades linearly with increasing length of the mapping paths. This confirms our expectation about the issues of combining transitive similarities as discussed for the MatchCompose operation in Section 6.3. The degradation is sensitive to the complexity of the match tasks. In particular, in the Medium series, increasing the path length by 1 loses about 10% of Precision and Recall, while the decrease is only about 4% in the Small series.

Pivot Schema
In this step, we compare the quality obtained without a pivot schema and with each of the three pivot schemas CIDX, Apertum, and OpenTrans. Without a pivot schema, 4, 12, and 24 mapping paths of length 2, 3, and 4, respectively, are available for each match task. A pivot schema reduces the number of relevant mapping paths of length 2, 3, and 4 to 1, 6, and 18, respectively. Figure 11.15 shows the best average quality obtained for the single series using four different configurations of the Reuse matcher: without a pivot schema and with each of the three pivot schemas. While we again report for the Max strategy, the results also apply to the other composition strategies. The match quality varies with the size of the pivot schema. In both the Small and Medium series and with both ReuseA and ReuseM, the smallest schema, CIDX, and the largest one, OpenTrans, yield a lower quality (Fmeasure) than Apertum, the schema of average size. CIDX offers too few candidates for transitive combination (lower Recall than Apertum), while OpenTrans often offers too many (lower Precision than Apertum). We observe that Apertum mostly yields the same high quality as the case without a pivot schema, i.e., when all available mapping paths are taken into account. This motivates the utilization of such an average-sized pivot schema in order to limit the number and length of mapping paths to be considered. Figure 11.15 only shows the results for mapping paths of length 2, i.e., one mapping path involving the pivot schema for each match task, which represents the typical scenario with a pivot schema. With greater path lengths, i.e., 3 and 4, we obtain more mapping paths, i.e., 6 and 18, respectively, also involving the pivot schema. In these cases, we observe only marginal variation of average match quality between the pivot schemas and the absence of a pivot schema. This confirms our expectation that the negative effects of the pivot schema size can be limited by considering multiple mapping paths, i.e., multiple MatchCompose results.

Figure 11.15 Quality without and with a pivot schema
[Figure: panels A) Small, Max, Path Len 2 and B) Medium, Max, Path Len 2; bars show Precision, Recall, Fmeasure, and Overall for ReuseA and ReuseM with no pivot, CIDX, Apertum, and OpenTrans.]

Top-K Mapping Paths
Given a mapping path, we compute the schema similarity for the single mappings in the path using the Average strategy for computing combined similarity (see Section 7.3). In particular, we prefer Average over the Dice strategy, as the latter does not take the similarity values of element correspondences into account and also performed worse in our first evaluation (see Section 11.3). All mapping paths identified for a match task are ranked in both descending and ascending order according to the average of the schema similarities computed from the mappings of each path. Figure 11.16a and b show the best average Fmeasure values returned by ReuseA and ReuseM for both the Small and Medium series using the descending and ascending ranking, respectively. As in the previous analyses, the results presented here for the Max strategy also apply to the other composition strategies. Without a pivot schema, we obtain 4, 12, and 24 mapping paths of length 2, 3, and 4, respectively, for each match task. Therefore, we vary k accordingly from 1 to 4, 12, and 24, respectively, resulting in the corresponding data points of the curves representing the single path lengths in each diagram.

We observe that a single mapping path in general does not yield the best quality. The quality increases and stabilizes as additional mapping paths are considered. This shows that the problems of missing or false correspondences due to transitive similarities can be greatly compensated by considering multiple MatchCompose results. However, this behavior also depends on the ranking of the mapping paths. With descending sort, the best mapping paths, i.e., those with the most similar schemas, are processed first. The quality reaches its maximum after a few mapping paths (4 or 6). Considering additional mapping paths results in only minor changes, typically degradation, in quality, which then remains constant once more than 12 mapping paths are used. With ascending sort, on the other hand, the mapping paths with the lowest average schema similarity are considered first. They yield fewer correspondences for reuse than the other mapping paths, leading to lower quality than with the descending ranking. While the quality increases with additional mapping paths, it typically requires the entire range of mapping paths, i.e., up to 24, to reach the maximum. This again underlines the importance of mapping paths with high average schema similarity, which in this ranking are only found at the end.

Figure 11.16 Quality for top-K mapping paths
[Figure: average Fmeasure over Top K for ReuseA and ReuseM with the Max strategy and path lengths 2, 3, and 4, for the Small and Medium series; A) descending and B) ascending sort of mapping paths.]
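The following sketch shows how mapping paths might be ranked by average schema similarity and reduced to the top-k before composition; the helper names and the path representation are assumptions made for illustration, not the COMA++ code.

```python
def rank_mapping_paths(paths, schema_sim, descending=True):
    """Rank mapping paths by the average schema similarity of their mappings.
    `paths` is a list of mapping paths, each a list of schema pairs, e.g.
    [("CIDX", "Excel"), ("Excel", "Noris")]; `schema_sim` maps a schema pair
    to its precomputed schema similarity (e.g. the Average strategy value)."""
    def avg_sim(path):
        return sum(schema_sim[pair] for pair in path) / len(path)
    return sorted(paths, key=avg_sim, reverse=descending)

def top_k_paths(paths, schema_sim, k, descending=True):
    """Select the k best (or worst, for ascending sort) mapping paths."""
    return rank_mapping_paths(paths, schema_sim, descending)[:k]

# Toy example with three length-2 paths between a source S and target T.
schema_sim = {("S", "A"): 0.8, ("A", "T"): 0.7,
              ("S", "B"): 0.5, ("B", "T"): 0.6,
              ("S", "C"): 0.9, ("C", "T"): 0.85}
paths = [[("S", "A"), ("A", "T")],
         [("S", "B"), ("B", "T")],
         [("S", "C"), ("C", "T")]]
print(top_k_paths(paths, schema_sim, k=2))  # the paths via C and A, in that order
```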

Comparison No-reuse vs. Reuse
We now discuss the quality of different combinations of reuse and no-reuse matchers. First, we compare NamePath, the best combined matcher, and the Reuse matcher to see how they perform as individual matchers. We then examine matcher combinations with and without the Reuse matcher to see how it improves the quality of the no-reuse matchers. The results are shown in Figure 11.17a and b for the single matchers and the matcher combinations, respectively. For the Reuse matcher, we consider the common configurations, namely utilizing automatically and manually derived match results, i.e., ReuseA and ReuseM, with the composition strategies Average and Max, respectively. Furthermore, we only consider mapping paths of length 2, which showed the best quality compared to the other path lengths. Recall that the matchers (Reuse and no-reuse) are combined within the AllContext match strategy (AllC+Schema) using the best combination strategies as identified in the last evaluation.

Figure 11.17 Comparison of no-reuse and reuse matchers

[Figure: A) Quality (Precision, Recall, Fmeasure, Overall) of the single matchers NamePath, ReuseA, and ReuseM with Avg and Max, for the Small and Medium series, Path Len 2.]

[Figure: B) Average Fmeasure of the no-reuse and reuse matcher combinations (Best, Default, All, NamePath, each without and with ReuseA/ReuseM added) for the Small and Medium series, Path Len 2.]

We first look at the quality of the single matchers in Figure 11.17a. In general, we observe a significant quality improvement, by more than 10%, of Reuse over NamePath. In the Small series, ReuseA and ReuseM achieve the best average Fmeasure of 0.82 and 0.88, respectively, both outperforming NamePath with its best average Fmeasure of only 0.68. Likewise, in the Medium series, the best average Fmeasure for ReuseA, ReuseM, and NamePath is 0.66, 0.78, and 0.55, respectively. In all tested configurations, i.e., with different path lengths, pivot schemas, and numbers of mapping paths, both ReuseA and ReuseM show a high stability with respect to Delta values in both the Small and Medium series. In particular, the entire Delta range 0.001-0.01 causes only a negligible variation of less than +/-1% in average Fmeasure, which is thus not shown in the diagrams. NamePath, on the other hand, exhibits a slightly higher variation with respect to Delta, +/-1-3%, which further depends on the size of the input schemas (see Section 11.4).

To study the effect of the Reuse matcher in combination with other matchers, we refer to the insights from the last evaluation (Section 11.4) and consider the four most important no-reuse matcher combinations, namely a) Best, the best matcher combination showing the highest average Fmeasure in a series, b) Default, our default matcher combination involving NamePath, Parents, Name, and Leaves, c) All, the combination of the 8 combined matchers, and d) the single NamePath matcher. In particular, we compare their quality without and with the Reuse matcher added. Figure 11.17b shows the variation of average Fmeasure with respect to Delta for the single matcher combinations. We observe that Reuse significantly improves the quality of the matcher combinations and also increases their stability against Delta. Both effects can best be observed for the larger match tasks in the Medium series. In particular, ReuseA and ReuseM increase the quality of the matcher combinations by about 10% and 20%, respectively. Furthermore, the quality variation is reduced from about +/-20% to less than +/-5%, most notably for the All matcher combination. The combination of Reuse with NamePath is very effective and performs close to the corresponding Best and Default combinations. Comparing Figure 11.17a and b, we can further see that the combination with no-reuse matchers also improves the quality of the single Reuse matcher, typically by 2-10% of average Fmeasure.

The superiority of the reuse approaches over the no-reuse ones indicates that both the automatically and manually derived match results provide many candidates for reuse. As expected, the manually derived match results allow for a substantial improvement over the reuse of automatically determined match results, which can be observed by comparing the quality of ReuseM and ReuseA in all diagrams. However, the maximal average Recall observed for the reuse approaches in our evaluation is 0.93, indicating that previous match results, even manually approved, cannot always provide all reuse candidates required for a new match task. This motivates the combination of reuse and no-reuse strategies to further improve match quality.

Figure 11.18 Execution times of the Reuse matcher

[Figure: average execution time in seconds over Top K mapping paths for ReuseA and ReuseM with path lengths 2, 3, and 4; A) Small series, B) Medium series.]

Execution Time
Figure 11.18 shows the average execution times of ReuseA and ReuseM according to the number (top-k) and length (2, 3, and 4) of the considered mapping paths for the Small and Medium series, respectively. We observe that ReuseA and ReuseM show the same execution time in both the Small and Medium series. This is because the Reuse matcher does not depend on the complexity of the input schemas, but on that of the match results, which, on average, exhibit a relatively small number of correspondences in both series (see Section 11.2). In general, execution time does not vary between different path lengths, indicating the fast execution of MatchCompose. Although the elapsed time increases linearly with the number of mapping paths to be considered, a maximum of 3 seconds for processing 24 mapping paths of length 4 still makes Reuse the fastest matcher compared to the no-reuse ones. Thus, combined with other matchers, Reuse causes only a negligible overhead to the overall execution time of the match operation.

11.6 Summary
We systematically investigated the quality of the context-dependent match strategies using different fragment types, reuse and no-reuse matchers, and combination strategies. In particular, we substantially broadened the range of aspects and parameters examined compared to the first evaluation of COMA [29]. With a small amount of manual effort to prepare a few domain-specific synonyms and abbreviations, we were able to achieve high quality in all test series. The best average Fmeasure achieved using the approaches was around 0.9 for small match tasks, and between 0.6 and 0.7 for the large match tasks of the Medium and Large series. Using the new match strategy FilteredContext, we can also solve very large match tasks in acceptable time.

We were able to determine an effective default combination strategy for aggregating matcher-specific results and selecting match candidates. Average proved to be the aggregation method of choice as it could best compensate the shortcomings of individual matchers. Our undirectional approach Both supports very good Precision and thus usually produced better match results than directional approaches. For match tasks with many m:n matches (e.g., due to shared elements), the most accurate predictions can be achieved by selecting match candidates showing the (approximately) highest similarity exceeding an average threshold (Threshold+MaxDelta). To compute combined similarity between sets of elements, the pessimistic strategy Average, which takes element similarities into account, performs better than the optimistic strategy Dice, which only considers the ratio of similar elements. The stable behavior of the default combination strategies indicates that they can be used for many match tasks, thereby limiting the tuning effort.

Context-dependent matching is required for schemas with shared components, but also represents a big challenge in the case of large schemas. The NoContext strategy yields unacceptable quality for our test schemas and is therefore only feasible for schemas without shared components. For small schemas, AllContext shows a slightly better quality than FilteredContext. Both, however, achieve almost the same quality in the large match tasks. AllContext, when applied to complete schemas or shared fragments, is very expensive. Furthermore, it is sensitive to Delta, especially in large schemas, showing quickly degrading quality with increasing Delta. FilteredContext, on the other hand, performs very fast in general. It is robust against the choice of Delta, thereby limiting tuning effort. Hence, while AllContext is the strategy of choice for small schemas, FilteredContext suits large schemas best. Fragment-based matching, especially at the subschema level, also represents an effective strategy for dealing with large schemas.

We observe that the composite approach to combining individual matchers is very effective. Although single matchers may be imprecise, their combination can substantially improve the match quality. In contrast to single matchers, matcher combinations simultaneously analyze schema elements under different aspects, resulting in more stable and accurate similarity for heterogeneous schemas. As shown by our evaluation, combinations of 4 or 5 matchers are likely to yield high quality for many match tasks. NamePath proved to be a powerful means in context-dependent matching. Due to its presence, all match strategies exhibit a relatively low quality variation against the choice of other matchers to be combined. We were able to identify a default matcher combination of NamePath, Parents, Name, and Leaves, showing high and stable quality across the test series. The All matcher combination achieves good quality with both the AllContext and FilteredContext strategies and also represents a good candidate for the default matcher combination. However, it may result in high execution times with AllContext in large schemas.
Despite its simplicity, our reuse approach proved to be very successful. Unlike the no-reuse matchers, both quality and execution time of the Reuse matcher are not sensitive to schema size. Besides being the fastest matcher, Reuse achieves better quality than the individual no-reuse matchers. Matcher combinations involving Reuse also achieve the best quality in our evaluation and significantly improve the quality of the no-reuse ones. This improvement is higher in large schemas, apparently due to the limitations of the no-reuse approaches in such schemas. With a systematic evaluation, we were able to identify the best configuration for Reuse. The optimistic strategy Max performs best in transitively composing high-quality match results containing only correct or most likely correct correspondences. Due to the degradation of quality with increasing path length, the shortest mapping paths should be used. A pivot schema of average size can best compensate for missing or wrong match candidates. Schema similarity represents an accurate measure for identifying candidate mappings and mapping paths for reuse. In general, the problems of wrong or missing matches due to transitive combination can be limited to a large extent by combining multiple MatchCompose results, i.e., considering multiple mapping paths, or by combining them with the results of no-reuse matchers.

With the flexibility to construct and configure matchers and match strategies, we were able to quickly implement and test different match algorithms in various settings. This in turn allows us to identify the strategies and configurations with high and stable quality for our default match operation. Finally, the limited quality observed in the largest match tasks indicates the need for further investigation. While match quality is essentially influenced by the complexity of the input schemas and the degree of their similarity, we see potential for improvement by adding further matchers, e.g., those exploiting instance-level data and large-scale reuse of dictionaries and standard ontologies.


CHAPTER 12 COMA++ EVALUATION: ONTOLOGY MATCHING

To investigate the quality of our generic approach for ontology matching, we performed a comprehensive evaluation of COMA++ using the test cases specified by the EON Ontology Alignment Contest [44, 48]. The contest was organized as part of the 3rd Evaluation of Ontology-based Tools (EON) Workshop held in conjunction with the 4th International Semantic Web Conference (ISWC) in 2004. As the first effort of its kind in the schema/ontology matching field, the contest specified a complete test base for ontology matching, including a number of ontologies to be matched and the expected results for quality assessment. Although only in a first version, the proposed test cases, involving both synthetic and real-world scenarios, constitute an interesting benchmark for comparatively evaluating ontology matching tools.

In this chapter, we first characterize the ontologies to give an impression of the complexity of the matching tasks defined by the contest (Section 12.1). Section 12.2 describes the design of our experiments, specified according to the rules of the contest. Section 12.3 discusses our observed results concerning quality and performance for the single match problems. Section 12.4 gives a short summary of the chapter. A comparison of our system with other prototypes taking part in the contest is given in Section 13.4 of the next chapter.

12.1 Test Ontologies and Series
Table 12.1 summarizes the characteristics of the 20 test ontologies and match tasks of the contest. Among the ontologies, five are real-world ontologies, BibTex, Food, MIT, UMBC, and INRIA, which were independently developed by different organizations and are available on the web. The remaining ones are generated from the BibTex ontology by systematically modifying particular ontology features. Except for the Food ontology (102), which is from a completely different domain, all ontologies are from the Bibliography domain and represent different classifications of bibliographical references, such as books, journals, articles, etc. The ontologies are of medium size, with around 35 classes and 60 properties. Class instances are also included in several ontologies. However, COMA++ does not yet have instance-based matchers, so we omit instance statistics in Table 12.1.

Table 12.1 Characteristics of the test ontologies and match tasks (all ontologies are in OWL-DL format)

Task No | Target Ontology | Description | #Classes/Properties | #Corresp to BibTex | Onto Sim to BibTex
101 | BibTex | Reference ontology | 35 / 59 | 91 | 0.97
102 | Food | Unrelated ontology | 64 / 4 | 0 | 0
103 | Variations of the reference ontology BibTex | Language generalization | 35 / 59 | 91 | 0.97
104 | | Language restriction | 35 / 59 | 91 | 0.97
201 | | Random names | 36 / 59 | 91 | 0.96
202 | | Random names, no comment | 49 / 59 | 91 | 0.90
204 | | Name conventions | 35 / 59 | 91 | 0.97
205 | | Name synonyms | 36 / 58 | 91 | 0.96
206 | | French translation | 37 / 58 | 91 | 0.96
221 | | No class hierarchy | 35 / 59 | 91 | 0.97
222 | | Reduced hierarchy | 31 / 59 | 91 | 0.96
223 | | Expanded hierarchy | 70 / 59 | 91 | 0.82
224 | | No instances | 35 / 59 | 91 | 0.97
225 | | No restrictions | 35 / 59 | 33 | 0.97
228 | | No properties | 35 / 0 | 75 | 0.51
230 | | Flattening | 27 / 52 | 61 | 0.86
301 | MIT | Real-world ontologies | 15 / 40 | 48 | 0.75
302 | UMBC | | 15 / 30 | 47 | 0.60
303 | Karlsruhe | | 45 / 69 | 76 | 0.41
304 | INRIA | | 39 / 49 | 91 | 0.82

The test ontologies are available in the OWL-DL format at the contest website [44]. We imported them into our internal graph representation using an OWL parser of our own development. Classes and class properties, the main elements of an ontology, are represented as nodes in the graph. Relationships between super- and subclasses, between super- and sub-properties, and containment relationships between classes and properties yield structural relationships connecting the nodes with each other. We observe that classes and class properties are unique in the ontologies, and there are no shared elements in the sense that one element is a building block of other ones. This is a major difference from the test schemas in our previous evaluations, which are to a large extent built of shared elements and thus require context-dependent matching.
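As an illustration of this kind of import, the sketch below loads an OWL ontology with the rdflib library and derives a node set and structural edges from classes, properties, subclass/subproperty relationships, and property domains; it is only a simplified stand-in under those assumptions, not the parser actually developed for COMA++.

```python
from rdflib import Graph
from rdflib.namespace import RDF, RDFS, OWL

def import_owl_as_graph(path):
    """Load an OWL file and return (nodes, edges) of a simple directed graph:
    classes and properties become nodes; subclass, subproperty, and
    domain (class-contains-property) relationships become edges."""
    g = Graph()
    g.parse(path, format="xml")  # OWL-DL test files are typically RDF/XML

    classes = set(g.subjects(RDF.type, OWL.Class))
    properties = set(g.subjects(RDF.type, OWL.ObjectProperty)) | \
                 set(g.subjects(RDF.type, OWL.DatatypeProperty))
    nodes = classes | properties

    edges = []
    for child, parent in g.subject_objects(RDFS.subClassOf):
        edges.append((parent, child, "subclass"))
    for child, parent in g.subject_objects(RDFS.subPropertyOf):
        edges.append((parent, child, "subproperty"))
    for prop, cls in g.subject_objects(RDFS.domain):
        edges.append((cls, prop, "hasProperty"))
    return nodes, edges

# Example (hypothetical file name):
# nodes, edges = import_owl_as_graph("101-bibtex.owl")
# print(len(nodes), "nodes,", len(edges), "edges")
```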

Table 12.2 Statistics of the test series

Series | Tasks | Avg Source Nodes | Avg Target Nodes | Avg Corresp | Avg Global 1:1 Corresp | Avg Ontology Sim
1xx | 3 | 94 | 94 | 91 | 91 / 100.00% | 0.97
2xx | 12 | 94 | 91.75 | 84.83 | 83.83 / 98.86% | 0.90
3xx | 4 | 94 | 77.75 | 58 | 47.25 / 79.97% | 0.65
123 | 19 | 94 | 89.16 | 80.16 | 77.26 / 95.06% | 0.86

The ontologies are organized in 20 match tasks, which are in turn grouped into three disjoint series of tasks, aiming at characterizing the behavior of a match method with regard to different ontology features. All ontologies are to be matched against the reference ontology BibTex. Note that several tasks, in particular 203, 226, 227, 229, and 231, were proposed by the contest but are not yet specified with test ontologies and real results, so that no experiments with these tasks are possible. We describe the single series in more detail in the following:

• 1xx: This series includes the tests 101 to 104, performing simple tasks, i.e., comparing the reference ontology with itself (101), with another, irrelevant ontology (102), or with the same ontology in its restriction to OWL-Lite syntax (103, 104).
• 2xx: This series comprises the tests 201 to 230. In these tests, the reference ontology BibTex is matched against a perturbed version of itself. In particular, some features of the initial ontology are systematically discarded or modified while leaving the remainder untouched. As indicated in Table 12.2, the considered features include names, comments, hierarchy, instances, relations, etc.
• 3xx: This series matches the reference ontology against four real-world ontologies of bibliographic references found on the web, in particular from the Massachusetts Institute of Technology (301), the University of Maryland (302), Karlsruhe University (303), and INRIA (304).
• 123: In this series, we simply consider all 20 tests defined in the single series above to estimate the average quality over the whole contest.

All the tasks were manually solved by the contest organizers and the obtained real mappings are made available on the contest website [44]. They specify the correspondences to be identified between classes and properties of the test ontologies, i.e., between nodes in our graph representation. A few of the real results contain correspondences with a semantic relation (e.g., inclusion), which are regarded as similarity correspondences in our evaluation. Table 12.1 also shows the number of required correspondences and the ontology similarity for each match task. Table 12.2 characterizes the series by showing the number of involved match tasks, the average number of source and target nodes indicating the search space, the average number of required correspondences, the average number and ratio of global 1:1 correspondences, and the average ontology similarity. The real result for task 102, which matches two completely unrelated ontologies, does not contain any correspondences. Therefore, we omit it from this table.

In contrast to the match tasks in our previous evaluations, we observe that most of the required correspondences are unique 1:1 match relationships; only a few in the 3xx series are involved in m:n match relationships. Like schema similarity, ontology similarity was computed by applying the Dice strategy (the ratio of the matching nodes over all nodes in both input ontologies, see Section 7.3) on the real mappings. We observe a high similarity, >0.90, between the ontologies in the 1xx and 2xx series, indicating that most elements have a matching candidate. (The ontology similarity shown in Table 12.1 for task 101, matching BibTex against itself, is not 1.0 as one might expect because the real mapping does not cover all elements of the ontology but excludes some foreign classes imported from other ontologies.) This is explained by the fact that the target ontology is a slightly modified version of the source ontology. Ontology similarity drops to 0.65 for the 3xx series involving the real-world ontologies. Over the entire contest, we have to deal with match problems of a moderate size of 94*90 nodes and a high similarity of the input ontologies, 0.86.
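The following formulation is one plausible reading of this Dice-based similarity (the exact definition is given in Section 7.3); the symbols are introduced here only for illustration:

\[
  \mathit{OntoSim}(S,T) \;=\; \frac{|S_m| + |T_m|}{|S| + |T|},
\]

where $S$ and $T$ are the node sets of the two ontologies and $S_m \subseteq S$, $T_m \subseteq T$ are the nodes participating in at least one correspondence of the real mapping. For task 101 (BibTex against itself), 91 of the 94 nodes on each side are covered, giving $(91+91)/(94+94) \approx 0.97$, consistent with Table 12.1.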

12.2 Experiment Design

Like in our previous evaluations, we used COMA++ only in automatic mode in this evaluation, i.e., we did not consider possible improvements by user feedback or manual refinements. The result returned by the automatic match operation was directly compared against the provided expected match result to compute four quality measures: Precision, Recall, Fmeasure, and Overall. For each series, i.e., 1xx, 2xx, and 3xx, a number of experiments were performed, in each of which a different configuration of the match operation was applied to solve the match tasks of the series. The quality measures were first determined for the single tasks and then averaged over all tasks in the experiment, i.e., average Precision, average Fmeasure, etc. To compare and rank the quality of different configurations, we mostly use the combined measure Fmeasure. Note that the real result for task 102 does not contain any correspondences, making it impossible to compute quality measures for this task. Therefore, we omit it from the quality evaluation and only discuss its result verbally.
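For reference, a small sketch of how these four measures can be computed from a returned and an expected correspondence set, assuming the usual set-based definitions (see Section 10.3); the data here is a toy example, not a contest result.

```python
def match_quality(returned, expected):
    """Compute Precision, Recall, Fmeasure, and Overall for a match result.
    `returned` and `expected` are sets of correspondences, e.g. (src, tgt) pairs."""
    returned, expected = set(returned), set(expected)
    true_pos = len(returned & expected)          # correctly identified correspondences
    precision = true_pos / len(returned) if returned else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    fmeasure = (2 * precision * recall / (precision + recall)
                if precision + recall > 0 else 0.0)
    # Overall penalizes false positives; it becomes negative if Precision < 0.5.
    overall = recall * (2 - 1 / precision) if precision > 0 else 0.0
    return precision, recall, fmeasure, overall

# Toy example: 9 of the 10 returned correspondences are correct; 10 were expected.
expected = {("Book", "Book"), ("author", "creator")} | {(f"c{i}", f"c{i}") for i in range(8)}
returned = {("Book", "Book"), ("author", "editor")} | {(f"c{i}", f"c{i}") for i in range(8)}
print(match_quality(returned, expected))  # approximately (0.9, 0.9, 0.9, 0.8)
```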

Table 12.3 Test configuration for ontology matching

Series | Strategy | Matcher | Similarity Combination
1xx, 2xx, 3xx | NoContext+Schema | 8 single (NamePath excluded); 247 combinations | Average, Both, MaxN(1), Average
Sum: 3 | 1 | 255 | 1

We performed test experiments with all three series, 1xx, 2xx, and 3xx, while the average quality for the 123 series was derived from the quality computed for all match tasks. Based on the insights from the previous evaluations, we focused on the following parameters, which are most likely to yield the best quality:
• We applied the NoContext match strategy to match complete ontologies, i.e., NoContext+Schema. The ontologies do not exhibit shared elements like the schemas in our previous evaluations and we only need to derive node correspondences, making context-dependent strategies unnecessary. Furthermore, they are of small size, so that it is not worthwhile to evaluate fragment-based strategies.
• We tested with 8 combined matchers, Name, NameType, NameStat, Comment, Children, Leaves, Parents, and Siblings, and all possible combinations of them. NamePath was not considered as we do not have to match paths but nodes, for which NamePath performs exactly like Name. Altogether, we have 8 single matchers and 247 different combinations of 2, 3, ..., up to 8 matchers, resulting in 255 alternatives.
• For similarity combination, we employed the default strategies identified in the previous evaluations, in particular Average, Both, and Average for aggregation, direction, and computing combined similarity, respectively. As all elements, i.e., classes and properties, are unique in an ontology and mostly 1:1 matches are to be identified, we tested with the selection approach MaxN(1) (or Max1 for short) to select the single best match candidate for each element (see the sketch below).

Several prototypes participating in the contest did not consider auxiliary information. Hence, to obtain the most objective results for comparison, we also omitted any kind of auxiliary information, like synonyms or abbreviations, from our experiments. That is, the quality reported here is based only on comparing the names, data types, comments, and structures provided by the ontologies. Like in our previous evaluations, all time measurements were performed on a Linux machine equipped with a 2.4 GHz Intel Xeon processor and 1 GB RAM.
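The sketch below illustrates the MaxN(1) idea of keeping only the single best-scoring target candidate per source element; the flat similarity layout and names are simplifying assumptions, not the COMA++ code.

```python
def max1_selection(similarities):
    """Select, for each source element, the single target candidate with the
    highest combined similarity. `similarities` maps (src, tgt) -> similarity."""
    best = {}
    for (src, tgt), sim in similarities.items():
        if src not in best or sim > best[src][1]:
            best[src] = (tgt, sim)
    return {(src, tgt): sim for src, (tgt, sim) in best.items()}

sims = {("author", "creator"): 0.7, ("author", "editor"): 0.4,
        ("Book", "Book"): 1.0, ("Book", "Booklet"): 0.8}
print(max1_selection(sims))
# {('author', 'creator'): 0.7, ('Book', 'Book'): 1.0}
```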

12.3 Quality and Execution Time
In the following, we first discuss the best quality observed for the single tasks as required by the contest. We then examine how the quality varies between different combinations of matchers. Finally, we discuss the average quality and execution time for the single series.

Task Quality
Figure 12.1 shows the best quality (with the highest Fmeasure value) observed for all 19 tasks of the contest. In 10 tasks, 101, 103, 104, 221, 222, 223, 224, 225, 228, and 230, the target ontology retains many classes and properties of the reference ontology. As name similarity is considered by many of our matchers, we can achieve absolute or nearly absolute quality in such cases. The quality decreases in the remaining tasks, which involve name diversity tests or real-world ontologies. In the following, we discuss the quality for the single match tasks.

Figure 12.1 Best task quality

[Figure: best Precision, Recall, Fmeasure, and Overall per match task (101-304).]

• 101: It is trivial to identify all required correspondences. However, the real result does not include foreign classes imported from other ontologies, such as Organization and Person from FOAF (http://xmlns.com/foaf/0.1/), leading to some wrong correspondences in our match result.
• 102: The input ontologies, BibTex and Food, are from completely different domains. The matchers compute very low similarity between their classes and properties. However, the Max1 strategy still returns some correspondences for this task, which can easily be discarded by applying a low Threshold value, such as 0.5.
• 103 and 104: These tasks test for restrictions enforced by OWL-Lite syntax. Unavailable constructs are replaced by more general available ones or discarded, leading to small changes in the structure of the reference ontology. Like for task 101, we achieve absolute Recall, while some correspondences for foreign classes not considered in the real result are also returned.
• 201 and 202: In task 201, all class and property names of BibTex were replaced by random strings. Still, our matchers can exploit data type information, structure, and comments, which remain the same between the input ontologies. While missing 25% of the required correspondences, we could achieve almost absolute Precision. In task 202, all comments were suppressed in addition to scrambling the names. Being left with data type and structure information, we only achieve a moderate quality with a best Fmeasure of 0.55 (Precision 0.67 and Recall 0.47).
• 204: This task tests for different naming conventions, such as uppercasing, using underscores or dashes, etc. Our matchers are robust enough to detect all such changes. In particular, we achieve the same, almost absolute, quality as for matching the same ontologies in task 101.
• 205: A large portion of class and property names was replaced by their synonyms. Furthermore, all comments were suppressed. Without knowledge of synonyms (see Section 12.2, Experiment Design), our matchers only examine data types and ontology structure. The quality is comparable to that of task 201.
• 206: BibTex was matched against its French translation, in which all text fields, such as names and comments, are affected. Apparently, many words are similar between the two languages. We achieve high quality with 0.97 Precision and 0.82 Recall, which is slightly higher than that of task 205.
• 221, 222, and 223: The class hierarchy of BibTex was perturbed by removing all super-subclass relationships (221), removing a large portion of such relationships (222), and adding numerous intermediate classes (223). In all cases, the structure between classes and their properties is still preserved. Furthermore, classes and properties retain their names and comments, leading to a high quality for these tasks.
• 224: All instances were suppressed from BibTex to obtain the target ontology. As our matchers do not exploit instance data, the quality remains the same as for task 101.
• 225 and 228: BibTex was perturbed by removing locally declared properties (restrictions) of a class (225) or all class properties (228). However, as class names still remain the same in both cases, we can achieve absolute quality as for task 101.
• 230: Some classes in BibTex were replaced by their components in the class structure (e.g., class date by its year, month, and day attributes) to obtain the target ontology. This change confuses some of our structural matchers, leading to a small decrease of 3% in Fmeasure compared to task 101.
• 301, 302, 303, and 304: In these tasks, matching BibTex against four other real-world ontologies, we obtain very promising quality with Fmeasure ranging from 0.76 to 0.96. Furthermore, we observe that match quality correlates well with ontology similarity in real-world match tasks. In particular, the best quality is achieved for task 304 with the highest ontology similarity, while the worst quality is observed for tasks 302 and 303 involving highly different ontologies.

Impact of Matchers
We now examine how the choice of matchers affects the match quality. Like in our previous evaluations, we determine and analyze the value range, i.e., the minimum, maximum, and average, of the quality achieved by all 255 tested matchers and matcher combinations. Figure 12.2 shows the value range of average Precision, average Recall, average Fmeasure, and average Overall for the single series. Despite the wide variation range, the average values are in general close to the maximum, indicating that most matcher combinations achieve good quality and only a few are outliers with bad quality. An example of such an outlier is the single matcher Parents, which returns high similarity for all children of two matching elements.

Figure 12.2 Quality variation of matcher combinations
[Figure: minimum, maximum, and average of Precision, Recall, Fmeasure, and Overall over all matcher combinations for the 1xx, 2xx, 3xx, and 123 series.]

Next, we compare the quality of three important matcher combinations, Default, our default combination, All, the most expensive combination, and Name, the least expensive combination, with Best, the matcher combination achieving the best average Fmeasure in a series. As NamePath is equivalent to Name in matching nodes (instead of paths), the Default combination involves Name, Leaves, and Parents, while the All combination originally consists of the 7 no-reuse combined matchers (without NamePath, see Section 11.4). As we also tested with the Comment matcher in this evaluation, we determine the quality for each matcher combination first without and then with Comment added. Figure 12.3 shows the average Fmeasure achieved by Best, Default, All, and Name, each without and with the Comment matcher added, in the four series.

Figure 12.3 Quality of Best, Default, All, Name without/with Comment added

[Figure: average Fmeasure of Best, Default, All, and Name, each without and with Comment, for the 1xx, 2xx, 3xx, and 123 series.]

We found that Default and All in general perform close to Best. In the 1xx and 2xx series, the single Name matcher shows the worst quality among the four alternatives because of many homonyms in BibTex (e.g., class Publisher and property publisher of class Reference) and many tasks with name variations in the 2xx series. However, in the 3xx series, Name achieves a quality comparable to Default and All, indicating that considering additional information, such as data types and structure, does not further improve match quality. This is due to two reasons. First, class properties in these real-world ontologies are mostly of type string. Second, the ontologies are structurally quite different from BibTex, which is indicated by the lower ontology similarity compared to the other series (see Table 12.1).

We observe that adding the Comment matcher to the corresponding matcher combinations generally improves their quality in the 1xx and 2xx series. This is because the comments, if not suppressed, remain the same between BibTex and its variations. However, the improvement yielded by Comment is only around 5-6%, indicating that we can still achieve high quality for these series without considering comments. On the other hand, Comment is not very helpful for the 3xx series, for which some matcher combinations, such as Best and Default, perform even better without Comment. After a closer examination, we found that only the MIT and INRIA ontologies provide comments on classes and properties, while UMBC and Karlsruhe do not.

Average Performance
Figure 12.4a shows the best average quality observed for the single series. In particular, we achieve almost absolute quality for the simple tasks of the 1xx series, in which the reference ontology BibTex is changed only slightly by altering or removing OWL-DL-specific constructs to conform to the OWL-Lite syntax. In the 2xx series, BibTex is matched against a systematically perturbed version of itself. For these match tasks, the best average Fmeasure observed is 0.90. The 3xx series contains the most challenging tasks of the contest, matching real-world ontologies. Although match quality decreases further compared to the first two series, we still achieve high quality with an average Fmeasure of 0.80. Over all 19 tasks (i.e., the 123 series), COMA++ achieves the best average quality with an average Precision of 0.93, average Recall of 0.85, average Fmeasure of 0.88, and average Overall of 0.79, which is very promising considering the diversity of the match tasks.

Figure 12.4 Quality and execution time for test series

[Figure: A) best average quality (Precision, Recall, Fmeasure, Overall) and B) average execution time in seconds for the Name and All matcher configurations, each for the 1xx, 2xx, 3xx, and 123 series.]

Figure 12.4b shows the average execution time for the single series. Like in our previous evaluations, we also measured the execution time for using only one matcher, Name, and for using the combination of all 8 combined matchers, All. As shown in Table 12.2, the average size of the target ontology decreases slightly from the 1xx, to the 2xx, and further to the 3xx series. Accordingly, we also observe some reduction in execution time. Overall, the NoContext match strategy performs very fast, requiring at most 3 seconds on our test machine for the most expensive configuration, i.e., utilizing all 8 matchers to solve the largest match tasks of the 1xx series.

12.4 Summary

We performed a comprehensive evaluation of COMA++ for matching ontologies written in OWL. The test ontologies and real mappings were taken from the EON Ontology Alignment Contest. Although we could not submit our results to the contest, which took place in November 2004, we strictly followed the rules as published on the contest website to ensure the best comparability with other participants. We did not perform any specific optimization or tuning for ontology matching. The only effort required was for developing the OWL parser. Furthermore, we excluded the use of auxiliary information, such as synonyms and abbreviations, to obtain the most objective results.

Considering the problem size and the similarity of the test schemas/ontologies, the match tasks of the contest are comparable to those of the Small series in our previous evaluations (see Section 11.1). We also observe similar behavior in quality and execution time between the two cases. COMA++ showed high quality for most tasks of the contest, largely using the default configuration identified in the previous evaluations. In particular, we obtained almost absolute quality for the 1xx series and an average Fmeasure of about 0.9 and 0.8 for the more complex 2xx and 3xx series, respectively. The best average Fmeasure over all 19 match tasks was 0.88, which is comparable to that of the best contest participants (see Section 13.4). We also measured the execution time required for all match tasks, which was at most 3 seconds for the largest match task with all matchers involved. The high quality and fast execution time observed again prove the feasibility of our generic solution for different application domains.

In general, we can observe some technical differences between matching common schemas and matching ontologies. While ontologies are represented in the same directed graph representation as schemas, we do not need to pay much attention to shared elements in ontologies. This is because classes and properties in an ontology are by nature unique and their relationships (e.g., super-subclass) do not constrain the instantiation of a class in the way containment relationships between an element and its subelements do in a schema. Hence, context-dependent matching is not necessary for ontologies. With the absence of shared elements, the likelihood of global m:n correspondences is also reduced. Mostly, 1:1 correspondences are required, which can be effectively identified with the Max1 selection strategy, as shown by our evaluation.


CHAPTER 13 OTHER EVALUATIONS AND COMPARISON

As demonstrated in the last two chapters, schema matching evaluation needs to be carefully designed and may take a long time to accomplish due to many complex tasks, e.g., determining the real match results, systematically executing test experiments, and analyzing and presenting quality. Many evaluations have been published in the literature so far. While most of them were done for an individual prototype, such as [7, 8, 16, 23, 29, 34, 35, 42, 62, 71, 84, 88, 134], we also observe several efforts to perform a comparative evaluation comparing the own approach with that of others [3, 39, 53, 54, 87, 88, 92, 131]. Unfortunately, the evaluations were mostly conducted in diverse ways, making it difficult to assess the effectiveness of each single system and to compare their effectiveness. Even in comparative evaluations, the results still depend much on the subjectivity of the authors in selecting the match tasks, configuring the single prototypes, and designing a test methodology. So far, the only effort to uniformly compare multiple systems on a benchmark basis has been the Ontology Alignment Contest organized in 2004 at the 3rd Evaluation of Ontology-based Tools (EON) Workshop [44, 48].

To obtain a better overview of the current state of the art in evaluating schema matching approaches, we review in this chapter various schema matching evaluations published in the literature. In the next section, we focus on individual evaluations conducted for single prototypes, while comparative evaluations are discussed in Section 13.2. In Section 13.3, we compare several representative evaluations and our evaluations done for COMA++. In Section 13.4, we discuss the evaluations done for the EON Ontology Alignment Contest. Unlike comparative evaluations, which are performed by one author for several prototypes, the contest requires the authors to perform individual evaluations on a uniform test base. In this case, we can largely omit the details on the execution of the single evaluations and focus more on comparing their results. Finally, Section 13.5 summarizes the chapter and points to the issues to be addressed in future evaluations.

13.1 Individual Evaluations
We discuss the evaluations of several recent schema matching prototypes, in particular AUTOPLEX [7], AUTOMATCH [8], LSD [34], GLUE [35], IMAP [28], SEMINT [83, 84, 85], and SIMILARITYFLOODING (SF) [93]. We have encountered a number of systems which either have not been evaluated, such as CLIO [66, 105, 114, 61], DIKE [112], MOMIS [9, 20], ONION [98, 99], and TRANSCM [97], or whose evaluations have not been described in sufficient detail, such as DELTA [23] and the work of [42]. Those systems are not considered in our study.

AUTOPLEX and AUTOMATCH
Both prototypes depend on a domain global schema, against which source schemas are matched. In both evaluations, the global schemas were rather small, containing 15 and 9 attributes, respectively [7, 8]. No information about the characteristics of the involved source schemas was given. First, the source schemas were matched manually to the global schema, resulting in 21 and 22 mappings in the AUTOPLEX and AUTOMATCH evaluations, respectively. These mappings were divided into three portions of approximately equal content. The test was then carried out in three runs, each using two portions for learning and the remaining portion for matching. Neither evaluation documented the test machine or reported the execution times required for the match tasks.

The AUTOPLEX evaluation used the quality measures Precision and Recall, while for AUTOMATCH, Fmeasure was employed. However, the measures were not determined for single experiments but for the entire evaluation: the false/true negatives and positives were counted over all match tasks. For AUTOPLEX, they were reported separately for table and column matches. We recompute the measures to consider all matches and obtain a Precision of 0.84 and a Recall of 0.82, corresponding to an Fmeasure of 0.82 and an Overall of 0.66. Furthermore, the numbers of false/true negatives and positives were rather small despite counting over multiple tasks, leading to the conclusion that the source schemas must be very small. For AUTOMATCH, the impact of different methods for sampling training data on match quality was studied. The highest Fmeasure reported was 0.72, so that the corresponding Overall must be worse.

LSD, GLUE, and IMAP
LSD [34] was tested on 4 domains, in each of which 5 data sources were matched to a manually constructed global schema, resulting in 20 match tasks altogether. To match a particular source, 3 other sources from the same domain were used for training. The source schemas were rather small (14-48 elements), while the largest global schema had 66 attributes. GLUE was evaluated on 3 domains, in each of which two website taxonomies were matched in two different directions, i.e., A→B and B→A [35]. The taxonomies were relatively large, containing up to 300 elements. IMAP was tested with 4 match tasks from 4 domains; the schema sizes range from 19 to 44 elements. All three systems rely on pre-match effort, on the one hand to train the learners, and on the other hand to specify domain constraints and synonyms. None of the evaluations reported on the execution times required by the systems.

For all systems, the quality of different learner combinations was determined to find the best configuration. For LSD, the impact of the amount of available instance data on match quality was studied. IMAP was evaluated for identifying 1:1 and m:n correspondences, respectively, and using different methods for match candidate selection (Top-1 and Top-3). Match quality was estimated using a single measure, called match accuracy, defined as the percentage of the matchable source attributes that are matched correctly. It corresponds to Recall in our definition because a single correspondence is returned for each source element. Furthermore, we observe that at most a Precision equal to the presented Recall can be achieved for single match tasks, namely if all source elements are matchable. Based on this observation, it is possible to estimate the highest possible Fmeasure (=Recall) and Overall (=2*Recall-1) for the single evaluations.
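The bound used here follows directly from the definitions: with at most one returned candidate per source element, Precision can at best equal the reported Recall, and in that limiting case the other measures simplify as shown below (a short check of the estimate, not additional data from the cited evaluations).

\[
  P = R = r \;\Rightarrow\;
  \mathit{Fmeasure} = \frac{2PR}{P+R} = \frac{2r^2}{2r} = r,
  \qquad
  \mathit{Overall} = R\left(2 - \frac{1}{P}\right) = r\left(2 - \frac{1}{r}\right) = 2r - 1 .
\]

For example, a reported match accuracy of $r = 0.8$ yields at best $\mathit{Fmeasure} = 0.8$ and $\mathit{Overall} = 0.6$.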

Figure 13.1 Match quality of LSD [34]

Figure 13.1 shows the quality of different learner combinations in LSD. Usually, the best quality was achieved when all learners were involved. Interestingly, we observe that the three systems exhibited quite similar quality. On average (over all domains), LSD and GLUE each achieved a Recall of ~0.8, which is similar to IMAP when employed to identify 1:1 correspondences. This corresponds to an Overall of about 0.6. As for m:n matches, IMAP showed some degradation in quality with an average Recall of only around 0.6, which was however improved to ~0.8 with the Top-3 strategy. This is due to the optimistic way of counting correct correspondences when computing the quality measure: the 3 best match candidates are returned for each schema element and count as one correct correspondence if the correct candidate is among the three.

SIMILARITYFLOODING
The SIMILARITYFLOODING (SF) evaluation [93] used 9 match tasks defined from 18 schemas (XML and relational) taken from different application domains. The schemas were small, with the number of elements ranging from 5 to 22, while showing a relatively high similarity to each other (0.75 on average). Seven users were asked to perform the manual match process in order to obtain subjective match results. For each match task, the results returned by the system were compared against all subjective results to estimate the automatic match quality, for which the Overall measure was used. Other experiments were also conducted to compare the effectiveness of different filters and formulas for fix-point computation, and to measure the impact of randomizing the similarities in the initial mapping on match accuracy. The best configuration was identified and used in SF. Figure 13.2 shows the Overall values achieved in the single match tasks according to the match results suggested by the single users. The average Overall quality over all match tasks and all users is around 0.6. Like for the previous systems, no execution time was reported.

Figure 13.2 Match quality of SIMILARITYFLOODING [93]

SEMINT
A preliminary test consisting of 3 experiments was described in [83]. The test schemas were small, mostly with fewer than 10 attributes. However, the quality achieved in these experiments was only presented later in [84, 85]. In these small tasks, SEMINT performed very well and achieved very high Precision (0.9, 1.0, 1.0) and Recall (1.0). A second evaluation, involving two other match tasks, was described in [84, 85]. In the bigger match task with schemas of up to 260 attributes, SEMINT surprisingly performed very well (Precision ~0.8, Recall ~0.9). But in the smaller task with schemas containing only around 40 elements, the quality dropped drastically (Precision 0.20, Recall 0.38). On average over the 5 experiments, SEMINT achieved a Precision of 0.78 and a Recall of 0.86. Using the Precision and Recall values presented for each experiment, we can also compute the average Fmeasure, 0.81, and Overall, 0.48. On the other hand, it is necessary to take into account that this match quality was determined from match results of attribute clusters, each of which possibly contains multiple 1:1 correspondences. In addition to the match tasks, further tests were performed to measure the sensitivity of the single match criteria employed by SEMINT [84]. The results allowed the identification of a minimal subset of match criteria which could still retain the overall effectiveness.

Besides quality, the runtime performance of the system was also reported. In particular, execution time was negligible for the small match tasks in the first evaluation [83]. However, in the larger match tasks of the second evaluation [84, 85], a large amount of time (several hours) was required for training the neural network alone. This indicates a scalability problem of instance-based approaches with respect to the schema size and the amount of instance data to be processed.
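As a check of the averages quoted above for SEMINT, one can recompute Fmeasure and Overall per experiment from the reported (approximate) Precision/Recall pairs; treating those values as exact is an assumption made here only for illustration.

\begin{align*}
  &\text{Per experiment } (P, R)\colon\ (0.9, 1.0),\ (1.0, 1.0),\ (1.0, 1.0),\ (0.8, 0.9),\ (0.2, 0.38)\\
  &\mathit{Fmeasure} = \tfrac{2PR}{P+R}\colon\ 0.95,\ 1.0,\ 1.0,\ 0.85,\ 0.26
   \;\Rightarrow\; \text{average} \approx 0.81\\
  &\mathit{Overall} = R\,(2 - \tfrac{1}{P})\colon\ 0.89,\ 1.0,\ 1.0,\ 0.68,\ -1.14
   \;\Rightarrow\; \text{average} \approx 0.48
\end{align*}

The last experiment illustrates how Overall becomes negative when Precision drops below 0.5, dragging the average well below the average Fmeasure.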

13.2 Comparative Evaluations

Comparative evaluations include the CUPID evaluation [88] and the evaluation by Madhavan et al. in [92]. Furthermore, we could identify several works, in particular the S-MATCH evaluations [3, 53, 54] and three other evaluations [131, 87, 39], comparing their own approaches with our first COMA prototype [29], which is now significantly outperformed by COMA++ in terms of both quality and execution time (see Chapter 11). Although aiming at a uniform comparison of several approaches, comparative evaluations still depend very much on the subjectivity of the conducting authors. Furthermore, the lack of detailed knowledge about the tuning capabilities of others' tools may lead to suboptimal results for them. These effects can be reduced to a large extent, as done by the EON Ontology Alignment Contest [44, 48], by requiring the tool authors themselves to uniformly perform the evaluation on an independently developed test base.

CUPID vs. DIKE and MOMIS
The CUPID evaluation represents a pioneer of this kind. In [88], the authors compared the quality of CUPID with two previous prototypes, DIKE and MOMIS, which had not been evaluated before. Some pre-match effort was needed to specify domain synonyms and abbreviations in the required format for each system. First, the systems were tested with some canonical match tasks considering very small schema fragments. Second, the systems were tested with 2 real-world XML schemas for purchase orders, which is also the smallest match task in the first evaluation of COMA [29] and in the COMA++ evaluation (see Section 11.1). The authors then compared the systems by looking for the correspondences which could or could not be identified by a particular system. CUPID was able to identify all necessary correspondences for this match task and thus showed a better quality than the other systems. In the entire evaluation, no quality measures were computed and no execution times of the systems were reported.

Corpus-based Matching vs. GLUE and MKB
In [92], the corpus-based match approach was compared with two other algorithms, GLUE [35] and MKB [91], all developed by the same authors. In particular, MKB is a preliminary version of the corpus-based approach, while GLUE performs matching directly between the input schemas rather than indirectly via a corpus. The three algorithms were uniformly tested on 4 different domains, each of which consists of a high number of schemas (26-34). However, the schemas were rather small (7-41 elements). For all algorithms, pre-match effort was required to specify domain constraints (match and mismatch rules) and to train the learners. The evaluation employed three quality measures, Precision, Recall, and Fmeasure, and did not consider execution time. The corpus-based approach was shown to outperform the other two (see Figure 13.3). However, it is unclear whether a systematic evaluation was performed to obtain the best quality for each algorithm and how different configuration parameters (learner combinations, strategies for selecting similar elements, constraint usage) influence the quality of the single algorithms. Like LSD, GLUE, and IMAP, the corpus-based approach achieved on average Precision and Recall of around 0.8. The authors further evaluated the corpus-based algorithm combined with the structure matcher of SIMILARITYFLOODING. While it significantly improves the quality of the standalone SF algorithm, the combination of the two approaches performs worse than the standalone corpus-based algorithm.

Figure 13.3 Corpus-based (Augment) vs. GLUE (Direct) and MKB (Pivot) [92]

S-MATCH vs. COMA, CUPID, and SIMILARITYFLOODING
S-MATCH was compared with COMA, CUPID, and SF in three different evaluations, which are described in [53], [54], and [3], respectively. The first evaluation [53] considered not only quality but also execution time of each prototype. However, it was based on only three simple match tasks with average schema sizes of 5, 10, and 30 elements, respectively. While S-MATCH depends on oracles, i.e., auxiliary sources, it was unclear which ones were used and whether the same information was also provided for the other prototypes. Match quality was measured using four measures, Precision, Recall, Fmeasure and Overall. S-MATCH was reported to achieve on average a slightly better quality than COMA, which in turn outperformed the remaining prototypes. However, while S-MATCH was tested with different system configurations (e.g., directions for match candidate selection), it was unclear which configurations were tested for the other prototypes. As for execution time, S-MATCH performed significantly slower than the other systems. After some performance optimizations of S-MATCH, the second evaluation [54] focused only on execution time and completely ignored quality aspects. For taxonomies with hundreds of nodes, S-MATCH was shown to perform faster than COMA but still slower than SF. The third evaluation [3] compared S-MATCH with COMA, however only in terms of Recall, by applying them to very large taxonomies with several hundreds of thousands of nodes. Using the default matcher combination, COMA showed a significantly better Recall than S-MATCH, which was then optimized with some heuristics directly derived from its missed matches to achieve a better Recall than COMA. Unfortunately, no Precision values were reported and no time studies were conducted.

Other Approaches vs. COMA
The evaluation in [131] compared the credibility-based approach for aggregating matcher-specific similarities with the average and meta-learning methods, which are supported by COMA and LSD, respectively. The test schemas and real results were taken from the COMA evaluation (i.e., the Small series - see Section 11.1). The evaluation performed only one measurement using a default system configuration, making it impossible to assess the overall quality behavior of an approach. Considering the small problem size, only insignificant quality differences between the approaches (<2% of average Fmeasure) were observed. Only a subset of the COMA matchers was involved, leading to a much lower quality, Fmeasure ~0.7 and Overall ~0.4, than reported in [29] for the no-reuse case, namely Fmeasure 0.85 and Overall 0.73.

In [87], Lu et al. compared their approach with the tree edit distance matching algorithm of Zhang et al. [143] and with COMA, using the PO schemas and real results taken from the COMA evaluation (i.e., the Small series). While clearly outperforming the tree edit distance algorithm, their algorithm was reported to achieve a small improvement of 2% over the average Overall of 0.73 reported in [29] for the no-reuse matchers of COMA. However, it was unclear whether a systematic evaluation was performed and which configuration of their algorithm achieved this quality. In addition to match quality, a time comparison was also performed, but only between their own algorithm and the tree edit distance algorithm. However, the matching time of their own algorithm was measured without considering the expensive preparation phase (import, name/node similarity computation). It is unclear whether such preparation could also benefit the tree edit distance algorithm.

In addition to the PO schemas taken from the COMA evaluation, Dragut and Lawrence utilized a manually constructed global ontology in their evaluation [39]. The quality of their reuse approach (automatically matching against the global ontology and composing the match results) was compared with that of using COMA to match the schemas directly, but not with the reuse approach of COMA [29]. It was unclear which configuration (matchers, combination strategies) was used for their implementation and for COMA, respectively, in the comparison. While some schema-to-ontology mappings (obtained using the noMax strategy) exhibited even negative Overall values, it is unclear how the composition of the mappings could yield high positive Overall values in all tasks. No average quality was reported for an overall comparison of the tested approaches.

13.3 Evaluation Comparison
To illustrate the heterogeneity of current schema matching evaluations, Table 13.1 summarizes several of the discussed evaluations together with the evaluation of COMA/COMA++ described in Chapter 11. It is impossible to qualitatively compare the effectiveness of the systems due to the large differences between their evaluations. Thus, we only try to emphasize the weaknesses and strengths observed for the single evaluations according to the criteria described in Chapter 10.

Table 13.1 Summary of selected evaluations

Test schemas and tasks:
Schema types - AUTOPLEX & AUTOMATCH: relational; COMA/COMA++: XSD, XDR; CUPID: XDR, SQL; LSD, GLUE, IMAP: DTD, HTML, SQL; SEMINT: relational; SF: XML, relational
#Schemas / match tasks - AUTOPLEX & AUTOMATCH: 15/21 & 15/22; COMA/COMA++: 7/16 in 3 test series; CUPID: 2/1; LSD, GLUE, IMAP: 24/20, 3/6, 4/4; SEMINT: 10/5; SF: 18/9
Min/Max/Avg schema size - AUTOPLEX & AUTOMATCH: -; COMA/COMA++: 34/26,000/-; CUPID: 40/54/47; LSD, GLUE, IMAP: 14/66/-, 34/333/143, 19/44/34; SEMINT: 6/260/57; SF: 5/22/12
Min/Max/Avg schema similarity - COMA/COMA++: 0.02/0.57/-; SF: 0.46/0.94/0.75; others: -

Match result representation:
Matches - all systems: element-level correspondences with similarity values in the range [0,1]
Element representation - AUTOPLEX & AUTOMATCH: node (attribute); COMA/COMA++: path; CUPID: path; LSD, GLUE, IMAP: node; SEMINT: node (attribute); SF: node
Local/global cardinality - AUTOPLEX & AUTOMATCH: 1:1/1:1; COMA/COMA++: 1:1/n:m; CUPID: 1:1/n:1; LSD, GLUE, IMAP: 1:1/n:1 (IMAP: m:n/m:n); SEMINT: n:m/n:m (attribute clusters); SF: 1:1/1:1

Quality, performance, and test methodology:
Quality measures - AUTOPLEX & AUTOMATCH: Precision, Recall & Fmeasure; COMA/COMA++: Precision, Recall, Fmeasure, Overall; CUPID: none; LSD, GLUE, IMAP: Recall; SEMINT: Precision, Recall; SF: Overall
Execution time - COMA/COMA++: yes; SEMINT: yes; others: -
Subjectivity - SF: 7 users; all other evaluations: 1 user
Pre-match effort - AUTOPLEX & AUTOMATCH: training; COMA/COMA++: specifying domain synonyms; CUPID: specifying domain synonyms; LSD, GLUE, IMAP: training, specifying domain synonyms and constraints; SEMINT: none; SF: none
Parameter impact on match quality - AUTOPLEX & AUTOMATCH: methods for sampling instance data (AUTOMATCH); COMA/COMA++: strategies, matchers, combination, reuse, schema characteristics; CUPID: none; LSD, GLUE, IMAP: learner combinations (LSD: amount of data listings); SEMINT: constraints (discriminators); SF: filters, fixpoint formulas, randomizing initial similarities

Evaluation highlights:
COMA/COMA++: big schemas, systematic evaluation of quality and time; CUPID: comparative evaluation of 3 systems; LSD, GLUE, IMAP: IMAP evaluates quality for both 1:1 and m:n matches; SEMINT: no pre-match effort; SF: user subjectivity considered, no pre-match effort

The test problems came from very different domains and were of very different complexity. AUTOPLEX, AUTOMATCH and SF, like the vast majority of recent prototypes, were only tested with simple match tasks involving small schemas and few correspondences to be identified. On the other side, a few systems showed high quality also for more complex real-world schemas (e.g., COMA/COMA++, LSD, GLUE, IMAP, SEMINT). In particular, COMA++ was tested with the largest schemas with up to 26,000 elements. Some evaluations, such as those of AUTOPLEX and AUTOMATCH, completely lack a description of their test schemas. The CUPID evaluation managed to evaluate multiple systems on uniform test problems. Unlike the other systems, AUTOPLEX, AUTOMATCH and LSD perform matching against a previously constructed global schema.

All systems return correspondences at the element level with similarity values in the range [0, 1]. Those confined to instance-level matching, such as AUTOPLEX, AUTOMATCH, and SEMINT, can only deliver correspondences at the finest level of granularity (attributes). Except for SEMINT and IMAP, all systems return correspondences of 1:1 local cardinality, providing a common basis for determining match quality. In particular, SEMINT computes clusters of similar attributes and IMAP determines functions between combinations of schema elements, both resulting in correspondences of m:n local cardinality.

Only the SF evaluation took into account the subjectivity of the user perception of the required match correspondences. The evaluation of IMAP assesses the quality of the system for identifying 1:1 and m:n correspondences, respectively. Unlike the other approaches, SEMINT and SF do not require any manual pre-match effort. While the majority of evaluations only report the quality obtained using a default system configuration, COMA/COMA++, LSD, GLUE, IMAP, SEMINT and SF were systematically tested with different configurations, aiming at studying the impact and behavior of single input parameters. Except for COMA++ and SEMINT, the evaluations only focused on the quality of match results and ignored runtime performance. So far, the evaluation of COMA++ is the most comprehensive, systematically investigating both quality and execution time for all relevant system parameters with respect to different schema sizes.

Usually, the quality measures were computed for single match experiments. Exceptions are CUPID, with no quality measures computed, and AUTOPLEX and AUTOMATCH, with quality measures mixing the match results of several experiments in a way that does not allow us to reproduce the quality for individual match tasks. While the other evaluations use different sets of quality measures, we determined the quality for COMA/COMA++ using four different measures, including combined ones. However, the values of the quality measures reported in the single evaluations cannot be used to directly compare the effectiveness of the systems because of the great heterogeneity in the other evaluation criteria. Still, we can observe that the LSD/GLUE approach is promising for utilizing instance data, while COMA/COMA++ proves successful by exploiting only schema information.
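
To make the quality measures used throughout this comparison concrete, the following is a minimal Python sketch (not taken from any of the evaluated systems) that computes Precision, Recall, Fmeasure, and Overall from a proposed and a real set of correspondences, following the usual definitions (cf. Chapter 10); the correspondences in the example are purely hypothetical.

# Sketch: match quality measures computed from correspondence sets.
def match_quality(proposed, real):
    true_positives = proposed & real            # correctly proposed correspondences
    false_positives = proposed - real           # wrongly proposed correspondences
    false_negatives = real - proposed           # missed correspondences
    precision = len(true_positives) / len(proposed) if proposed else 0.0
    recall = len(true_positives) / len(real) if real else 0.0
    fmeasure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Overall additionally penalizes the post-match effort of removing false
    # positives and adding missed matches; it can become negative.
    overall = 1 - (len(false_positives) + len(false_negatives)) / len(real) if real else 0.0
    return {"Precision": precision, "Recall": recall,
            "Fmeasure": fmeasure, "Overall": overall}

# Hypothetical example: 3 of 4 proposed correspondences are correct, 1 real one is missed.
proposed = {("ShipTo.City", "Address.City"), ("ShipTo.Street", "Address.Street"),
            ("ShipTo.Zip", "Address.Zip"), ("BillTo.Name", "Address.City")}
real = {("ShipTo.City", "Address.City"), ("ShipTo.Street", "Address.Street"),
        ("ShipTo.Zip", "Address.Zip"), ("BillTo.Name", "Contact.Name")}
print(match_quality(proposed, real))   # Precision 0.75, Recall 0.75, Fmeasure 0.75, Overall 0.5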

13.4 Contest-based Comparison
The EON Ontology Alignment Contest [44, 48] represents the single effort so far in which several match systems were evaluated on a common test base and the results were compared to identify the strengths and weaknesses of each system. As already discussed in Section 12.1, the contest defined a set of match tasks, involving both synthetic and real-world scenarios, and at the same time provided the real results as the gold standard. Although not all of our proposed evaluation criteria were considered (e.g., specification of allowable auxiliary information, reporting of additional manual effort and of execution performance - see Chapter 10), the contest already represents a remarkable step towards a benchmark for schema matching. Four ontology matching prototypes, OLA, the subsystem PROMPTDIFF of PROMPT, QOM, and SCM, took part in the contest. The evaluation results for all systems were published in the workshop proceedings and are also available in a unified and cleaned form for download from the contest website [44]. Unlike the individual evaluations, whose heterogeneity only allows us to compare the way the evaluations were conducted, the uniform test environment enforced by the contest makes it possible to perform a direct comparison of the quality of the systems. Therefore, we downloaded the published results of the four systems from the contest website and compared them with the results of COMA++ as presented in Chapter 12. For better readability, we show in the following only the values of the combined measure Fmeasure.

Figure 13.4 Quality of single prototypes for contest tasks

[Bar chart omitted: Fmeasure (0-1) per contest match task 101-304 for QOM, OLA, SCM, PROMPTDIFF, and COMA++]

Figure 13.4 shows the best Fmeasure values reported by the participants for their systems as well as those we obtained with COMA++ for the single match tasks. Note that for QOM, no results were available for tasks 101, 103, 104, 202, 221, 222, 225 and 228. From the figure, we can distinguish between the simple and the hard tasks for the match systems. In particular, tasks 101-104, 204, and 221-230 are rather simple, as most systems achieve high or perfect quality for them. The hard tasks include 201, 202, 205, and 206, in which class and property names in the target ontology are replaced by random strings, by synonyms, and by names in a foreign language, and 301, 302, 303, and 304, which match between real-world ontologies.

In general, COMA++, PROMPTDIFF, and SCM outperform QOM and OLA. In the simple tasks, COMA++ and PROMPTDIFF yield mostly equal quality, while SCM exhibits slightly worse quality in several of these tasks. As for the hard tasks, we observe varying behavior among the top candidates PROMPTDIFF, SCM, and COMA++. In particular, SCM performs particularly well in tasks 201 and 202, showing better quality than both COMA++ and PROMPTDIFF. This is because SCM exploits instances (which are the same in the source and target ontologies of these tasks), while COMA++ and PROMPTDIFF do not. On the other side, COMA++ outperforms PROMPTDIFF in both tasks, apparently due to the utilization of comments and ontology structure. In the 205 and 206 tasks, COMA++ outperforms both SCM and PROMPTDIFF. Exploiting instances in turn helps SCM to outperform PROMPTDIFF in these tasks. In the 3xx series, matching between real-world ontologies, COMA++ generally outperforms SCM. This indicates the weakness of SCM, which depends very much on the common terms between the vocabularies of the input ontologies. COMA++ significantly outperforms PROMPTDIFF in task 301, while showing slightly worse quality in the remaining tasks.

Figure 13.5 Average quality of single prototypes for test series

[Bar chart omitted: average Fmeasure per test series (1xx, 2xx, 3xx) for QOM, OLA, SCM, PROMPTDIFF, and COMA++]

Figure 13.5 shows the average quality achieved by the systems for the single series. For QOM, no average quality could be computed for the 1xx series, while in the remaining series its average quality only considers the tasks for which a result is available. SCM achieves high quality in the 2xx series due to the high overlap between the vocabularies of the input ontologies, as they are derived from the same initial ontology BibTex. However, such an overlap is difficult to achieve for independently constructed ontologies, as indicated by the significantly decreasing quality of SCM in the 3xx series. PROMPTDIFF, although originally developed for comparing ontology versions, surprisingly has some difficulty with the synthetic tests of the 2xx series, while performing well for the more diverse real-world ontologies in the 3xx series. COMA++ exhibits high robustness for both the synthetic and the real-world tests. In particular, its average quality is comparable to that of SCM in the 2xx series and to that of PROMPTDIFF in the 3xx series.

13.5 Summary
So far, most evaluations were conducted individually for single prototypes, typically by the authors of the prototypes themselves. A few authors also tried to compare their own prototype with others. However, these evaluations in general depend very much on the subjectivity of the authors in selecting the match tasks and designing the test methodology. As a result, the evaluations differ from each other in so many ways that it is impossible to directly compare their results. The EON Ontology Alignment Contest represents the first effort so far to provide a uniform test base for a comparative evaluation of different match systems. In particular, the authors were asked to perform the same tests on their own system, allowing them to tune and optimize it with the best knowledge.

Although the considered match problems were mostly simple, we observe that many techniques have proved to be quite powerful, such as exploiting element and structure properties (CUPID, SF, COMA/COMA++, PROMPT) and utilizing instance data, e.g., by Bayesian and WHIRL learners (LSD/GLUE) or neural networks (SEMINT). Moreover, the combined use of several approaches within composite match systems proved to be very successful (COMA/COMA++, LSD/GLUE/IMAP, PROMPT). As shown by the high quality of COMA++ for both real-world schemas and ontologies, generic schema matching is feasible for different schema languages and domains. On the other side, there are still unexploited opportunities, e.g., in the use of large-scale dictionaries and standard taxonomies and in an increased reuse of previous match results (COMA/COMA++). Future systems should integrate those techniques within a composite framework to achieve maximal flexibility.

To allow an objective interpretation and easy comparison of match quality between different systems and approaches, future evaluations should be conceived and documented more carefully, if possible, including the criteria that we identified in this work. Furthermore, the following issues concerning input and output factors should be considered:
• Input factors - test schemas and system parameters: All evaluations have shown that both match quality and execution time degrade with bigger schemas. Hence, future systems should be evaluated with schemas of more realistic size, e.g., several hundreds of elements. As done in the EON Ontology Alignment Contest and in the work of [126], systematic variation of a schema to automatically obtain synthetic schemas and the real mappings between them and the original schema can help design test cases with desired characteristics for focused evaluation.
Besides the characteristics of the test schemas, the various input parameters of each system can also influence the match quality in different ways. However, their impact has rarely been investigated in a comprehensive way, thus potentially missing opportunities for improvement and tuning. Consequently, previous evaluations typically reported only some peak values w.r.t. some quality measure, so that the overall match quality for a wider range of configurations remained open. A systematic evaluation of all relevant parameters can benefit from an automatic approach as proposed by the recent ETUNER system [126], which systematically tests different configurations of a match system to identify the best one on a synthetic workload of schemas and mappings obtained by systematically perturbing an initial schema.
• Output factors - match results and quality measures: Instead of determining only one match candidate per schema element, future systems could suggest multiple, i.e., top-k, match candidates for each schema element. This can make it easier for the user to determine the final match result in cases where the first candidate is not correct. In this sense, a top-k match prediction may already be counted as correct if the required match candidate is among the proposed choices.
Previous studies used a variety of different quality measures with limited expressiveness, thus preventing a qualitative comparison between systems. To improve the situation and to consider precision, recall, and the degree of post-match effort, we recommend the use of combined measures such as Fmeasure in future evaluations. However, further user studies are required to quantify the different effort needed for finding missing matches, removing false positives, and verifying the correct results. As this depends very much on the convenience supported by a tool, the capabilities of the user interface should also be considered. Another limitation of current quality measures is that they do not consider the pre-match effort and the hardness of match problems.
Ultimately, a benchmark, as demonstrated by the EON Ontology Alignment Contest for ontology matching, seems very helpful to better compare the effectiveness of different match systems by clearly defining all input and output factors for a uniform evaluation. We would like to see such an effort for schema matching as well. Because of the extreme degree of heterogeneity of real-world applications, a benchmark should not strive for general applicability but focus on a specific application domain, e.g., a certain type of E-business.
Alternatively, a benchmark can focus on determining the effectiveness of match systems with respect to schema types, such as SQL and XSD, or with respect to specific match capabilities, such as name, structural, instance-based and reuse-oriented matching. In addition to the test schemas and the expected results, a benchmark should also specify the use of all auxiliary information in a precise way, since otherwise any hard-to-detect correspondences could be built into a synonym table to facilitate matching.


PART IV MAPPING-BASED DATA INTEGRATION

Traditional data integration approaches rely on the notion of a global schema to provide a unified and consistent view of the underlying data sources. This approach has been especially successful for data warehouses, but is also used in mediators for virtual integration of data sources. Unfortunately, the manual effort to create such a schema and to keep it up-to-date is substantial. Furthermore, adding new data sources is a time- and effort-intensive task, making it difficult to scale to many sources or to use such systems for ad-hoc (explorative) integration needs.

We present a new approach for integrating heterogeneous web data sources. It is based on mappings between sources and utilizes correspondences between their objects, i.e., at the instance level. In a first step, we focus on the bioinformatics domain, where hundreds of web sources are publicly accessible, providing large amounts of data on various molecular-biological objects, such as genes, proteins, and metabolic pathways. The sources are highly cross-referenced by means of web-links, capturing different kinds of relationships between the objects. Such semantic correspondences can help navigate between the sources to retrieve and combine information from multiple sources for objects of interest.

This part consists of three chapters, Chapter 14 to Chapter 16. Based on the characteristics of the domain, Chapter 14 discusses the major requirements for data integration and reviews the state of the art of current approaches. Chapter 15 describes the implementation of our integration approach, the GENMAPPER system (Generic Mapper), which physically integrates heterogeneous annotation data in a flexible way and supports large-scale analysis on the integrated data. Chapter 16 presents an extension of GENMAPPER, which is coupled with a mediator to combine the advantages of both the materialized and virtual integration techniques.


CHAPTER 14 DATA INTEGRATION IN BIOINFORMATICS

New advances in the life sciences, e.g., molecular biology, biodiversity, drug discovery and medical research, increasingly depend on bioinformatics methods to manage and analyze vast amounts of highly diverse data. The volume of data is increasing at an unprecedented pace, fueled by world-wide research activities producing publicly available data and by new technologies, e.g., high-throughput devices such as microarrays. Thus, data mining and analysis require comprehensive integration of heterogeneous data, which is typically distributed across many data sources on the web and often structured only to a limited extent. Despite new interoperability technologies, such as XML and web services, data integration is a highly difficult and still largely manual task, especially due to the high degree of semantic heterogeneity and varying data quality as well as specific application requirements. This chapter introduces the data integration problem in the bioinformatics field. In the next section, we summarize the characteristics of molecular-biological data. We then discuss the major requirements for data integration in Section 14.2 and give an overview of the existing solutions in Section 14.3.

14.1 Molecular-biological Data
The knowledge about molecular-biological objects, such as genes, proteins, and intra- and inter-cellular pathways, is typically encoded by a large variety of data commonly called annotations. Such annotations are continuously collected, curated, and made available in numerous public web-based data sources. A recent survey lists more than 500 such sources [51]. Furthermore, an increasing number of ontologies is maintained, mostly in the form of standardized vocabularies and hierarchical taxonomies [5, 52]. Typically, objects in one source are annotated by information in other sources and ontologies in the form of cross-references (web-links) [116, 18, 103]. A few sources focus on sequence-based objects and uniformly map them onto the genome of a particular species for the visual comparison and correlation of co-located objects [17, 74, 38].

To better understand the problem that we have to deal with, it is instructive to examine some typical annotation data that is available to the biologist when gathering information about a molecular-biological object of interest. Figure 14.1a shows sample annotations for a human gene uniquely identified by accession number 353 in the public source LOCUSLINK [116]16. The entry comprises various descriptions for the particular gene (also called a locus). Besides source-specific attributes, such as Locus Type, Product, and Alternate Symbols, a large part of the descriptions is constituted by web-links referring to inter-related objects in other sources. As shown in the figure, the locus is associated with a variety of objects from other sources, e.g., OMIM [60] for disease information, and vocabularies and taxonomies, such as HUGO [133] for standard gene symbols, ENZYME [5] for enzyme classification, and GENEONTOLOGY (GO) [52] for standardized gene functions. The referenced objects are in turn identified by their source-specific accessions, for example, gene symbol APRT in HUGO, disease 102600 in OMIM, enzyme 2.4.2.7 in ENZYME, and gene function GO:0003999 in GO.

Figure 14.1 Sample annotations for gene locus 353 from LOCUSLINK

[Figure omitted: A) Gene annotations for locus 353; B) Source interconnectivity between LOCUSLINK, HUGO, OMIM, ENZYME, and GO]

The correspondences are accumulated by manual user input, e.g., from the scientific literature, or established by means of automatic analysis algorithms, such as sequence comparison. Furthermore, they are continuously verified, corrected, and updated by biological researchers, often on a daily basis. Due to numerous such efforts, we can observe a high connectivity between the data sources, as illustrated in Figure 14.1b for those integrated in our GENMAPPER system. In the figure, each source is represented as a node, and a link between two sources indicates the existence of cross-references between their objects. The set of correspondences between the objects of two sources constitutes a mapping between the sources. Inter-relating objects by mappings allows combining the annotation knowledge from multiple sources for analysis. For example, the analysis of LOCUSLINK genes can be enriched with the annotations of the referenced OMIM, GO, and ENZYME objects.

Many applications, such as functional gene profiling, gene expression analysis, and protein analysis, require molecular-biological objects and their annotations to be integrated from different sources and made accessible for queries and data mining. This integration task is a major problem in bioinformatics since annotation data is highly diverse and only structured to some extent. Moreover, the number and contents of relevant sources are continuously expanding [78]. The use of web-links or the display of related objects on the genome represent first integration approaches, which are very useful for interactive navigation. However, they do not support automated analysis tasks for a large number of objects at the same time. Such a set-oriented analysis capability is especially needed for high-throughput analysis, such as at a genome-wide scale. While more advanced integration approaches are needed, it is important to preserve and utilize the semantic knowledge about relationships between objects.

16. LOCUSLINK has recently been incorporated into ENTREZGENE, available at http://www.ncbi.nih.gov/entrez/query.fcgi?db=gene.

14.2 Integration Requirements
To motivate the need for data integration, Figure 14.2 shows two common analysis scenarios in the bioinformatics domain, namely expression and annotation analysis [30]. Expression analysis detects and compares gene and protein activity under different circumstances, such as in normal and diseased tissues. The main goal is to identify groups of genes or proteins showing consistently similar or different expression patterns. For example, genes which are highly active in tumor cells but not in normal cells could be responsible for the uncontrolled proliferation of the tumor cells. Analyzing the annotations of those genes can reveal the similarities and differences in their currently known functions and help infer new gene functions. On the other side, searching in annotation data allows one to identify genes or proteins with similar functions or properties in general. These gene/protein groups can be used as input for expression analysis to gain insights into their expression behavior, which can help develop effective gene therapies.

Figure 14.2 Iterative analysis scenarios

[Diagram omitted: iterative loop between expression analysis (identification of relevant genes/proteins using expression data: experiment conditions, expression measurements, ...) and annotation analysis (identification of relevant genes/proteins using annotation data: functions, diseases, enzymes, chromosomal locations, ...), connected via gene/protein groups]

The usage of annotation data in these scenarios leads to the following requirements for the integration task.
• Flexibility: Public data sources are constantly extended and modified at the instance level (new or obsolete entries) and also undergo changes at the schema level (new or obsolete attributes). Hence, it is important to access current data. The high number of relevant sources presupposes a flexible solution to easily "plug in" a new source and to update an existing one when needed.
• Inter-source mappings: Depending on the research focus of the user, different kinds of annotations may be required for different types of objects. This presupposes the ability to flexibly associate annotations with objects from different sources. For instance, it should be possible to determine functions, e.g., expressed in GO terms, for genes in LOCUSLINK, UNIGENE, NETAFFX, etc. As such annotations may not be available yet, we often need to traverse the cross-references across multiple intermediary sources to identify the required object correspondences. In addition, filters, such as exact and pattern matching, and their combinations are necessary to identify interesting objects.
• Data quality: Annotations from different sources may vary largely in data quality, e.g., due to different update frequencies and algorithms to calculate object correspondences. To achieve high user acceptance, it is necessary to document how the data has been integrated, e.g., from which source and using which path of cross-references, so that the user can judge its quality.
• Performance: Query performance is obviously of key importance for user acceptance in interactive analysis. Therefore, physical integration using a data warehouse is recommended for large amounts of data. Mediator-based query processing should also be optimized, especially the execution of resource-intensive join operations to relate objects from different public sources. Hence, advanced techniques, such as indexing and caching, should be applied to improve query time.

14.3 State of the Art
Due to the challenges posed by the diversity and large amounts of data, the bioinformatics domain has recently become a major focus of the data integration research field. So far, several approaches have been developed and published (see [78, 67, 129] for different surveys). Previous solutions can be characterized by the approach used to integrate metadata and instance data. For both sub-problems, two alternatives are possible, giving four combinations, as shown in Table 14.1 together with sample implementations and the supported number of sources (as an indicator for scalability). At the metadata level, we differentiate whether or not an application-/domain-specific, consistent global schema is provided for the user to formulate queries. At the instance level, we differentiate between materialized and virtual integration. Systems may also follow a hybrid approach combining both techniques at one level.

Table 14.1 Data integration approaches and systems in bioinformatics
Materialized / Application-specific global schema: Data warehouse - Prototypes: IGD, GIMS (3)
Materialized / No application-specific global schema: Generic warehouse - Prototypes: COLUMBA (7), GENMAPPER (60)
Materialized+Virtual / Application-specific global schema: Hybrid (Mediated warehouse) - Prototype: DATAFOUNDRY (4)
Materialized+Virtual / No application-specific global schema: Generic-hybrid - Prototype: GENMAPPER extension (5)
Virtual / Application-specific global schema: Mediator/Federation - Prototypes: TAMBIS (5)
Virtual / No application-specific global schema: Generic mediator - Prototypes: DISCOVERYLINK, KLEISLI (60), SRS (700)

Traditional data integration approaches include data warehousing and federation (or mediation). The former approach physically stores the integrated data in a central database or data warehouse, which can offer high performance for data-intensive analysis tasks. The latter approach uses a mediator to perform data access at run time and provides the most current data. Both are built on the notion of an application- or domain-specific global schema to consistently represent and access the integrated data. However, construction of the global schema (schema integration) is highly difficult and not scalable for molecular-biological data sources due to their heterogeneity and because many sources have no fixed structure [67, 27]. Frequent changes in the structure and contents of the sources would require a continuous evolution of the global schema and the corresponding routines to transform and clean the data.

Data warehouses of molecular-biological data include IGD [124] and GIMS [113], representing early ambitious projects to collect and integrate all molecular-biological data available for a particular organism, C. elegans (IGD) and Saccharomyces cerevisiae (GIMS), in a single database. The most representative example of the federation approach is TAMBIS [55], which uses an existing domain ontology as the global schema, to which the source schemas are semantically mapped.

On the other side, DATAFOUNDRY [26] follows a hybrid integration approach (denoted as hybrid in Table 14.1) by extending a mediator with a data warehouse to store aggregated data and copies of the most frequently accessed source data. However, like traditional data warehouses or mediators, a global schema is required to present a coherent view of the integrated data and is used for user queries. Due to the high effort required for constructing the global schema and for transforming and importing source data, these approaches do not scale well and can only deal with a limited number of sources to support specific analysis tasks.

To achieve more flexibility, other systems do not pursue a (laborious) semantic integration of all data sources by constructing an application-specific global schema. Instead, they use a simple generic, i.e., application-independent, data model, into which data from new data sources can be easily transformed and added. Biological sources mostly have a simple entry-attribute-based structure, making a generic representation feasible. According to the integration at the instance level, we divide the approaches without a consistent application-specific global schema into, in our terms, so-called generic warehouses and generic mediators.

Generic mediators include DISCOVERYLINK [57]17, KLEISLI [24, 139] and SRS (Sequence Retrieval System) [46, 142]. Their schema is simply the union of the local schemas, which are transformed into a uniform format, such as relational (DISCOVERYLINK), nested relational (KLEISLI), or attribute-based (SRS). Currently, KLEISLI offers interfaces to more than 60 public sources [24] and SRS provides wrappers to about 700 data sources [46]. Typically, complete copies of the data sources are maintained locally and periodically updated for availability and performance reasons. As the price for flexibility, DISCOVERYLINK and KLEISLI leave the task of semantic integration to the responsibility of the user. In particular, the user has to explicitly specify join conditions in queries to relate objects/data from different sources with each other. SRS, on the other side, addresses this problem by capturing and utilizing existing object cross-references, i.e., correspondences at the instance level. SRS maintains indices on the correspondences and thus can achieve high query performance. In particular, to determine correspondences between objects from two given sources, SRS determines the shortest path between the two sources and performs a join-like operation by traversing the object correspondences along the path.

On the other hand, we currently observe only few prototypes following the generic warehousing approach. A recent example of a generic warehouse is COLUMBA [125], which physically integrates protein annotations from several sources into a local relational database. Source data is imported in its original schema to reduce the effort required for schema integration and data import as much as possible. For each source, the main integration work consists of establishing a mapping table containing all correspondences between its objects and the objects of a previously selected central source, the Protein Data Bank (PDB). In this star-like organization, objects from two arbitrary sources can be efficiently related with each other by joining through the central source.

As indicated in Table 14.1, our system GENMAPPER [32] also belongs to the family of generic warehouses.
It uses a generic data model to uniformly represent objects and their annotations physically imported from different data sources. Existing correspondences between objects are explicitly captured and utilized to drive data integration and combine annotation knowledge from different sources.

17. DISCOVERYLINK is now distributed as a commercial product under the name IBM Websphere Information Integrator.

To serve specific analysis needs, powerful operators are provided to derive tailored annotation views from the generic data representation. We further extended GENMAPPER by coupling it with a generic mediator to combine the advantages of both the materialized and the virtual integration [76, 77]. This approach is denoted as the generic-hybrid approach in Table 14.1. In particular, while GENMAPPER exploits existing mappings to inter-relate the objects of interest, SRS is used to retrieve up-to-date source-specific annotations for the corresponding objects on a demand-driven basis. GENMAPPER and its hybrid extension will be described in detail in the two subsequent chapters.


CHAPTER 15 THE GENMAPPER APPROACH

GENMAPPER (Generic Mapper) represents a new approach to flexibly integrate heterogeneous data sources for large-scale analysis that preserves and utilizes the semantic knowledge represented in cross-references between the sources. The key aspects of our approach are the following:
• GENMAPPER physically integrates all data in a central database to support flexible, high-performance analysis across data from many sources.
• In contrast to previous data warehouse approaches, we do not employ an application-specific global schema (e.g., a star or snowflake schema). Instead, we use a generic schema called GAM (Generic Annotation Model) to uniformly represent object and annotation data from different data sources, including ontologies. This makes it much easier to integrate new data sources and perform the corresponding data transformations, thereby improving scalability to a large number of sources. Moreover, it is robust against changes in the external sources, thereby supporting easy update and maintenance.
• We store existing mappings between sources and correspondences between objects and annotations (cross-references), and exploit them to combine annotation knowledge from different sources.
• To support specific analysis needs and queries, we derive tailored annotation views from the generic data representation. This task is supported by a new approach utilizing a set of high-level operators, e.g., to combine mappings. Results of such operators that are of general interest, e.g., new mappings derived from existing mappings, can be materialized in the central database.
The separation of the generic data representation and the provision of application-specific views permits GENMAPPER and its (imported and derived) data to be used for a large variety of applications. In the next section, we give an overview of our data integration approach implemented in GENMAPPER. Section 15.2 presents the generic data model GAM. Sections 15.3 and 15.4 discuss the data import phase and the generation of annotation views, respectively. Section 15.5 describes additional aspects of the technical implementation as well as an application scenario of GENMAPPER. A brief summary of the chapter is given in Section 15.6.

15.1 Overview

Figure 15.1 shows an overview of the GENMAPPER integration approach. Integration of source data is performed in two phases: Data import and View generation. In the first phase, source data is downloaded, parsed and imported into a central relational database following the generic GAM representation. This representation is used for objects and their annotations originating from different sources, such as public sources and taxonomies, as well as for the different kinds of relationships.

Figure 15.1 GENMAPPER architecture for annotation integration
[Diagram omitted: data sources (e.g., LOCUSLINK, UNIGENE, NETAFFX, GO) are parsed and imported into the GAM-based annotation management database (tables OBJECT, SOURCE, OBJECT_REL, SOURCE_REL); operators such as Map, Compose, and GenerateView derive annotation views, e.g., Map(Affx, Unigene) and Map(Unigene, GO), for applications]

Since directly accessing the GAM representation may result in complex queries, applications and users are typically provided with annotation views tailored to their analysis needs. Figure 15.2 shows an example of such an annotation view for some LOCUSLINK genes. In such a view, GENMAPPER can combine information and annotations from different sources for an arbitrary number of objects. Both the objects (the loci from LOCUSLINK in the example) and the kinds of annotations (e.g., ENZYME protein classification, GO gene functions, HUGO standard gene symbols, OMIM diseases, PUBMED publications) can be chosen arbitrarily. Such annotation views are very helpful for comparing and inferring the functions of the objects, e.g., if they have been detected to show some correlated behavior in experimental processes.

Figure 15.2 An annotation view for LOCUSLINK genes

In general, an annotation view is a structured (e.g., tabular) representation of annotations for the objects of a particular source. Annotation views are queryable to support high-volume analysis. A view consists of several attributes which are derived from one source or from different sources.

The choice of attributes is not fixed as in the underlying sources but can be tailored to application needs. Enabling such a flexible generation of annotation views requires the combination of both objects and annotations, i.e., relationships between objects. This is supported by the uniform representation of data from different sources in our approach.

The annotation views can be flexibly constructed by means of various high-level functions which can operate on entire sources and mappings or on subsets of them. Key operators include Compose and GenerateView; they are specifically defined on the GAM data model. They also represent the means to integrate GENMAPPER with external applications to provide automatic analysis pipelines with annotation data.

15.2 The Generic Annotation Model
Generic data models aim at uniformly representing different data and metadata for easy extensibility, evolution, and efficient storage. Typically, metadata and data are stored together in triples of object-attribute-value (also coined as Entity-Attribute-Value (EAV) [104]). A molecular-biological example of such a triple is (APRT, Name, adenine phosphoribosyltransferase). This approach has been used in repository systems to maintain database schemas from different data models [11], in E-commerce to manage electronic catalogs [1], in the medical domain to manage sparse patient data [104], and in the Semantic Web context to describe and exchange metadata (RDF18).

Figure 15.3 The Generic Annotation Model

[ER diagram omitted: tables OBJECT (Object Id, Source Id, Accession, Text, Number), SOURCE (Source Id, Name, Structure: Flat/Network, Content: Gene/Protein/Other), OBJECT_REL (Object Rel Id, Source Rel Id, Object1 Id, Object2 Id, Evidence), and SOURCE_REL (Source Rel Id, Source1 Id, Source2 Id, Type: Fact/Similarity/Contains/Is-a/Subsumed/Composed); underlined = primary key, italic = foreign key]

In GENMAPPER, we follow the same idea to achieve a generic representation for molecular-biological annotation data by using a generic data model called GAM (Generic Annotation Model). Figure 15.3 shows the core elements of GAM in a relational format. In particular, we have enriched the EAV representation with several specific properties. First, to avoid the mixing of metadata and data in EAV triples and to facilitate data integration from many sources, we explicitly provide two levels of abstraction, Source and Object. A source may be any predefined set of objects, e.g., a public collection of genes, an ontology, or a database schema.

18. Resource Description Framework: http://www.w3.org/RDF/

Second, we allow relationships of different semantics and cardinality to be defined at both the source and the object level, which are captured in the tables Source_Rel and Object_Rel, respectively. Both intra- and inter-source relationships are possible. A relationship at the source level (a mapping) typically consists of many relationships at the object level (correspondences or associations).

We roughly differentiate between gene-oriented, protein-oriented, and other sources according to their content. A source whose objects are organized in a particular structure, such as a taxonomy or a database schema, is indicated as a Network source. Typically, each object has a unique source-specific identifier or accession, which is often accompanied by a textual component, for example to represent the name of the object. Alternatively, an object may also have a numeric representation.

In Source_Rel, we distinguish three types of relationships between and within sources. Structural and annotation relationships are imported from external data sources and represent the internal structure of a source or semantic correspondences between sources, respectively. In addition, GENMAPPER supports the calculation and storage of derived relationships to increase the annotation knowledge and to support frequent queries. We discuss the single types of relationships in the following; a minimal relational sketch of the GAM tables is given after the list.
• Annotation relationships. Annotations are determined using different computational or manual methods and are typically specified by cross-references between sources. These relationships represent the most important and also the largest amount of data to be managed. Currently we group them into Fact and Similarity mappings. The former indicate relationships which can be taken as facts, for example the position of a gene on the genome, while the latter contain computed relationships, e.g., determined by sequence comparisons and alignments (homology) between instances or by an attribute matching algorithm. In Object_Rel, an evidence value can be captured to indicate the computed plausibility of the correspondence between any two objects.
• Structural relationships. Source structure is captured by Contains and IS_A relationships. Contains denotes containment relationships between a source and its partitions, such as between GO and its sub-taxonomies BIOLOGICALPROCESS, MOLECULARFUNCTION and CELLULARCOMPONENT, while IS_A is the typical semantic relationship found between terms within a taxonomy like BIOLOGICALPROCESS or ENZYME.
• Derived relationships. Two forms of derived relationships, Composed and Subsumed, are supported. Composed relationships combine cross-references across several sources to determine annotations that are not directly available. For example, the new mapping UNIGENE↔GO can be derived by combining the two existing mappings UNIGENE↔LOCUSLINK and LOCUSLINK↔GO. Subsumed relationships are automatically derived from the IS_A structure of a source and contain the associations of a term in a taxonomy to all subsumed terms in the term hierarchy. This is motivated by the fact that if a gene is annotated with a particular GO term, it is often necessary to consider the subsumed terms for more detailed gene functions.
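
To make the GAM representation more concrete, the following is a minimal sketch of the four core tables of Figure 15.3 as SQL DDL issued from Python/SQLite; the column names follow the figure, while the concrete data types (and any constraints beyond the keys) are assumptions rather than GENMAPPER's actual schema.

# Minimal sketch of the GAM core tables (column names from Figure 15.3, types assumed).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (
    source_id   INTEGER PRIMARY KEY,
    name        TEXT,    -- e.g. 'LocusLink', 'GO', 'Enzyme'
    structure   TEXT,    -- 'Flat' or 'Network'
    content     TEXT     -- 'Gene', 'Protein' or 'Other'
);
CREATE TABLE object (
    object_id   INTEGER PRIMARY KEY,
    source_id   INTEGER REFERENCES source(source_id),
    accession   TEXT,    -- source-specific identifier, e.g. '353'
    text        TEXT,    -- textual component, e.g. the object name
    number      REAL     -- optional numeric representation
);
CREATE TABLE source_rel (    -- a mapping between two sources
    source_rel_id INTEGER PRIMARY KEY,
    source1_id    INTEGER REFERENCES source(source_id),
    source2_id    INTEGER REFERENCES source(source_id),
    type          TEXT     -- Fact, Similarity, Contains, Is-a, Subsumed or Composed
);
CREATE TABLE object_rel (    -- the correspondences belonging to a mapping
    object_rel_id INTEGER PRIMARY KEY,
    source_rel_id INTEGER REFERENCES source_rel(source_rel_id),
    object1_id    INTEGER REFERENCES object(object_id),
    object2_id    INTEGER REFERENCES object(object_id),
    evidence      REAL     -- computed plausibility of the correspondence
);
""")
conn.close()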

15.3 Data Import
As illustrated in Figure 15.4, the integration of a new data source into the GAM data model is performed in two steps, Parse and Import. For all sources, the output of the Parse step is uniformly stored in a simple EAV format, as illustrated by the example shown in Figure 15.4b for the locus 353 from Figure 14.1. It represents a straightforward way to capture annotations as provided on the web pages of public data sources, and therefore makes the construction of parsers very simple.

Figure 15.4 Data parsing and import

[Diagram omitted: A) public sources (HUGO, OMIM, ENZYME, GO, MGD, ...) are parsed into B) an EAV-like format, e.g., (353, Hugo, APRT, adenine phosphoribosyltransferase), (353, Enzyme, 2.4.2.7), (353, GO, GO:0009116, nucleoside metabolism), which is then imported into C) the GAM tables Source, SourceRel, Object, and ObjectRel]

The Import step transforms and integrates data from the EAV format into the GAM format (Figure 15.4c). To prevent already existing sources, objects, mappings and correspondences from being inserted again, we perform a duplicate elimination at the source and object level. At the object level we compare object accessions, and at the source level we examine source names and audit information, such as the date and release of a source. Integrating new data requires relating the provided associations with existing data. For example, if GO has already been integrated into GAM, re-importing LOCUSLINK only requires relating the new LOCUSLINK objects with the existing GO terms.
The functional split between the Parse and Import steps helps us to keep the integration effort low. Parse represents a small portion of source-specific code to be implemented, while Import realizes a generic EAV-to-GAM transformation and migration module and only needs to be implemented once. This makes the integration of a new source relatively easy, mainly consisting of the effort to write a new parser.
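
As an illustration of the generic Import step, the following Python sketch loads parsed EAV rows (in the format of Figure 15.4b) into a simplified in-memory GAM-like store, performing duplicate elimination at the source and object level; all function and variable names are hypothetical and only indicate the principle, not GENMAPPER's actual implementation.

# Sketch of the EAV-to-GAM import with duplicate elimination (illustrative only).
eav_rows = [   # output of a source-specific parser, cf. Figure 15.4b (locus 353)
    ("353", "Hugo",   "APRT",       "adenine phosphoribosyltransferase"),
    ("353", "Enzyme", "2.4.2.7",    ""),
    ("353", "GO",     "GO:0009116", "nucleoside metabolism"),
]

gam = {"sources": {}, "objects": {}, "mappings": {}}   # simplified GAM store

def get_or_create_source(name):
    # duplicate elimination at the source level: reuse an already registered source
    return gam["sources"].setdefault(name, {"name": name})

def get_or_create_object(source, accession, text=""):
    # duplicate elimination at the object level: compare (source, accession)
    return gam["objects"].setdefault((source, accession),
                                     {"accession": accession, "text": text})

def import_eav(source_name, rows):
    get_or_create_source(source_name)
    for entity_acc, target_source, target_acc, text in rows:
        get_or_create_source(target_source)
        get_or_create_object(source_name, entity_acc)
        get_or_create_object(target_source, target_acc, text)
        # record the correspondence in the mapping source_name <-> target_source
        gam["mappings"].setdefault((source_name, target_source), set()).add(
            (entity_acc, target_acc))

import_eav("LocusLink", eav_rows)
print(gam["mappings"][("LocusLink", "GO")])   # {('353', 'GO:0009116')}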

15.4 View Generation
To explore the relationships between molecular-biological objects, scientists often have to ask queries of the form "Given a set of LOCUSLINK genes, identify those that are located at some given cytogenetic positions (LOCATION), annotated with some given GO functions, but not associated with some given OMIM diseases". Such queries exhibit the following properties:
• A query involves one or more mappings between objects of a single source, e.g., LOCUSLINK, and one or more targets providing the annotations of interest, e.g., LOCATION, GO and OMIM. Both the source and the targets can be confined to subsets of relevant objects.
• The mappings can be used to evaluate logical conditions between objects, i.e., whether they have or do not have some associated annotations. The mappings can be combined using the logical operators AND or OR and individually negated using the logical operator NOT.

GENMAPPER supports the specification and processing of such queries by means of tailored annotation views, which can be flexibly constructed using a set of high-level GAM-based operators. In the following, we briefly present some simple operations, such as Map, Range, and Domain (see Table 15.1), and discuss the most important operations to determine annotation views, Compose and GenerateView, in more detail. Note that the operations are described declaratively and leave room for optimizations in the implementation.

Simple Operations
The Map operation takes as input a source S to be annotated and a target T providing annotations. It searches the database for an existing mapping between S and T and returns the object associations contained in the mapping. Domain and Range identify the source and the target objects, respectively, involved in a mapping. RestrictDomain and RestrictRange return a subset of a mapping covering a given set of objects from the source and from the target, respectively.

Table 15.1 Definitions and examples for simple mapping operations
Map(S, T): identify object correspondences between S and T. Example: map = Map(S, T) = {s1↔t1, s2↔t2}
Domain(map): SELECT DISTINCT S FROM map. Example: Domain(map) = {s1, s2}
Range(map): SELECT DISTINCT T FROM map. Example: Range(map) = {t1, t2}
RestrictDomain(map, s): SELECT * FROM map WHERE S in s. Example: RestrictDomain(map, {s1}) = {s1↔t1}
RestrictRange(map, t): SELECT * FROM map WHERE T in t. Example: RestrictRange(map, {t2}) = {s2↔t2}

Compose
Like the MatchCompose operation for metadata-level mappings (see Section 6.3), Compose is based on the same intuition, the transitivity of associations, to derive new mappings from existing ones. For example, if a locus l in LOCUSLINK is annotated with some GO terms, so are the UNIGENE entries associated with locus l. Compose takes as input a so-called mapping path consisting of two or more mappings connecting two sources with each other, for which a direct mapping is required. For example, it can use a relational join operation to combine map1: S1↔S2 and map2: S2↔S3, which share the common source S2, and produce as output a mapping between S1 and S3.
Compose represents a simple but very effective way to derive new useful mappings. The operation can be used to derive new annotations which are not directly available in the existing sources and their cross-references. However, Compose may lead to wrong correspondences when the transitivity assumption does not hold. This effect can be restricted by allowing Compose to be performed with explicit user confirmation on the involved mapping path. The use of mappings containing correspondences of reduced evidence is a promising subject for future research.
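
The following is a small Python sketch of the Compose idea: two mappings, each represented simply as a set of (accession, accession) pairs, are joined on the objects of their shared intermediate source. The UNIGENE accession in the example is hypothetical, while the LOCUSLINK↔GO correspondence is the one from Figure 15.4.

# Sketch of Compose: join map1 (S1<->S2) and map2 (S2<->S3) on the shared source S2.
def compose(map1, map2):
    by_middle = {}
    for left, middle in map1:
        by_middle.setdefault(middle, []).append(left)
    return {(left, right)
            for middle, right in map2
            for left in by_middle.get(middle, [])}

# Example: UniGene<->LocusLink composed with LocusLink<->GO yields UniGene<->GO.
unigene_locuslink = {("Hs.28914", "353")}        # hypothetical UniGene accession
locuslink_go = {("353", "GO:0009116")}           # correspondence from Figure 15.4
print(compose(unigene_locuslink, locuslink_go))  # {('Hs.28914', 'GO:0009116')}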

GenerateView
This operation assumes a source S to be annotated and a set of targets T1, ..., Tm providing the required annotations. The relevant source and target objects are given in the corresponding subsets s and t1, ..., tm, respectively, each of which may also cover all existing objects of a source. Finally, the GenerateView operation requires a method for combining the mappings (AND or OR) and a list of targets for which the obtained mappings are to be negated. The result of such a query is a view of m+1 attributes, S, T1, ..., and Tm, containing tuples of related objects from the corresponding sources. In particular, GenerateView implements the pseudo-code shown in Figure 15.5 to build the required annotation view V.

Figure 15.5 The algorithm for GenerateView

GenerateView(S, s, T1, t1, ..., Tm, tm, [AND|OR], {negated})
  V = s                                    //Start with all given source objects
  For i = 1..m {                           //Iterate over all targets
    Determine mapping Mi: S↔Ti             //Using either Map or Compose
    mi = RestrictDomain(Mi, s)             //Consider the given source objects
    mi = RestrictRange(mi, ti)             //Consider the given target objects
    If negated[Ti] {                       //The mapping is specified as negated
      sî = s \ Domain(mi)                  //Source objs not involved in the sub-mapping
      mî = RestrictDomain(Mi, sî)          //Find correspondences for these objects
      mi = mî RightOuterJoin sî on S       //Preserve objs without correspondences
    }
    V = V InnerJoin/LeftOuterJoin mi on S  //AND: inner, OR: left outer
  }

GenerateView performs a script of multiple steps based on the other operators to execute mappings and manipulate their results. V is first set to the given set s of relevant source objects. For each target Ti, a mapping Mi between S and Ti is to be determined. It may already exist in the database or, in many cases, may not yet be available. In the former case, the required mapping is directly retrieved using the Map operation. In the latter case, we try to derive such a mapping from the existing ones using the Compose operation. A subset mi is then extracted from Mi to cover only the relevant source objects s and target objects ti. If necessary, the negation of mi is built from the subset sî of s containing the objects not involved in mi. Finally, V is incrementally extended by performing a left outer join (OR) or inner join (AND) operation with the sub-mapping mi.

15.5 Implementation and Use

GENMAPPER is fully operational and currently integrates more than 60 public sources, including sources for gene annotations, such as LOCUSLINK [116] and UNIGENE [127], and for protein annotations, such as INTERPRO [103] and SWISSPROT [18]. Furthermore, it includes various divisions of NETAFFX [86], a vendor-provided data source with annotations for the genes whose expression is measured in microarray experiments. GENMAPPER supports both interactive use via a web-based user interface and integration into automatic analysis pipelines using its high-level operations. In the following we present the basic functionality of the interactive user interface and discuss the use of GENMAPPER in a large-scale analysis application.

Interactive Query Interface
The interactive interface of GENMAPPER allows the user to pose queries and retrieve annotations for a set of given objects from a particular source. First, the relevant source can be selected from the list of currently imported sources. The accessions of the objects of interest can be uploaded from a file or manually copied and pasted. If no accessions are specified, the entire source is considered. For example, Figure 15.6a shows in the text input field several LOCUSLINK identifiers for which the user might want to look for annotations. The source for the objects is set correspondingly to LOCUSLINK.
In the next step, the user can specify all targets of interest from the available sources. To construct the annotation view shown in Figure 15.2, the user selects the targets as shown in Figure 15.6b. GENMAPPER internally manages a graph of all available sources and mappings. Using a shortest path algorithm, it can automatically determine a mapping path to traverse from the source to any specified target; a sketch of such a path search is shown below. The user can also search the graph for specific paths, for example, paths with a particular intermediate source. With a high degree of inter-connectivity between the sources, many paths may be possible. Hence, GENMAPPER also allows the user to manually build and save a path customized for specific analysis requirements.
When the relevant paths have been selected or manually constructed, the user can specify the target accessions of interest, the method for combining the mappings, and the negation of single mappings as shown in Figure 15.6c. GENMAPPER then applies the GenerateView operation to construct the annotation view. Interesting accessions among the retrieved ones can be selected to start a new query. Alternatively, the user can retrieve the names and other information of the corresponding objects, as illustrated for the GO gene functions in Figure 15.6d. All results can be saved and downloaded for further analysis in external tools.
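The automatic path search mentioned above can be illustrated with a simple breadth-first search over the graph of sources and mappings; the following Python sketch uses an illustrative graph representation and is not GENMAPPER's actual code.

from collections import deque

def shortest_mapping_path(mappings, source, target):
    # mappings: pairs of sources for which a direct mapping exists
    graph = {}
    for a, b in mappings:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    # Breadth-first search returns a path with a minimal number of hops
    queue, visited = deque([[source]]), {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in graph.get(path[-1], ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Example: relate UNIGENE entries to ENSEMBL genes via intermediate sources
print(shortest_mapping_path(
    [("UniGene", "LocusLink"), ("LocusLink", "NetAffx"), ("NetAffx", "Ensembl")],
    "UniGene", "Ensembl"))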

Figure 15.6 Annotation query for LOCUSLINK genes

(Figure panels: A) Source objects, B) Target selection, e.g., gene functions, enzyme class, disease, standard gene symbol, publications, C) Mapping paths, D) Object information)

Large-scale Automatic Gene Functional Profiling
In an ongoing cooperation project aiming at a comparative analysis between humans and their closest relatives, chimpanzees [43], GENMAPPER has been successfully integrated into an automated analysis pipeline to perform complex and large-scale functional profiling of genes.

Gene expression measurements have been performed using AFFYMETRIX microarray technology. From a total of approximately 40,000 genes, the expression of around 20,000 genes was detected, of which around 2,500 show a significantly different expression pattern between the species and thus represent candidates for further examination [75, 102]. Functional profiling of the differentially expressed genes was based on the analysis of the annotations about their known functions as specified by GENEONTOLOGY (GO) terms. In particular, the genes are classified according to the GO function taxonomy in order to identify the functions which are conserved or have changed between humans and chimpanzees.
Using the mappings provided by GENMAPPER, the proprietary genes of AFFYMETRIX microarrays were mapped to the generally accepted gene representations UNIGENE and LOCUSLINK, for which GO annotations were in turn derived from the mappings provided by LOCUSLINK. Furthermore, using the structure information of the sources, i.e., IS_A and Subsumed relationships, a comprehensive statistical analysis over the entire GO taxonomy was possible to determine significant genes. The adopted analysis methodology is also applicable to other taxonomies, e.g., ENZYME, to gain additional insights.

15.6 Summary

We presented the GENMAPPER system for flexible integration of heterogeneous annota- tion data. GENMAPPER uses a generic schema to uniformly represent annotations from different sources. Existing correspondences between objects are explicitly captured to drive data integration and combine annotation knowledge from different sources to enhance analysis tasks. From the generic representation, tailored annotation views are derived to serve specific analysis needs and queries. Such views are flexibly constructed using a set of powerful high-level operators, e.g., to combine annotations imported from different sources. GENMAPPER is fully operational, integrates data from many sources and is currently used by biologists for large-scale functional profiling of genes.


CHAPTER 16 THE HYBRID INTEGRATION APPROACH

GENMAPPER represents a flexible approach to capture and exploit object correspondences for annotation analysis. However, the employed generic data model GAM focuses specifically on mapping data and cannot handle unstructured data, such as comment fields, or data with complex structures, such as the geometric data of protein folding structures and genomic sequences, which are typically provided as source-specific attributes besides the weblinks encoding object correspondences. To overcome this limitation and support access to both kinds of data, we have extended GENMAPPER to a hybrid system by integrating it with a mediator for virtual integration of source-specific annotation data. The key aspects of our new approach are the following:
• We combine materialized and virtual data integration to exploit their advantages in a new hybrid approach. On the one hand, the materialization offers high performance for join queries to inter-relate large numbers of objects from different sources. On the other hand, up-to-date source-specific annotation data can be retrieved for analysis when needed.
• Mapping data is explicitly captured from the data sources and stored in a separate database, the so-called Mapping Database, backed by GENMAPPER. This separation allows us to determine different join paths between two sources to relate their objects with each other and to pre-compute them for good query performance.
• Data sources are uniformly integrated and accessed through SRS, a widely accepted commercial mediator tool, which offers wrapper interfaces to a large number of molecular-biological sources, including flat files and relational databases. Hence, we avoid the re-implementation of import functions and can easily add sources supported by SRS.
The next section gives an overview of our hybrid integration approach. Section 16.2 focuses on the management and exploitation of mappings. Section 16.3 describes the query processing mechanism. Finally, Section 16.4 concludes this chapter.

16.1 Overview
Figure 16.1 shows the architecture of our hybrid integration approach. It comprises four components, which are introduced in the following.

Figure 16.1 The hybrid integration approach and its components

(The figure shows the web browser, the Query Mediator, the ADM Database, the SRS server, and the Mapping Database/GENMAPPER on top of the sources GENEONTOLOGY, ENSEMBL, and LOCUSLINK. Query processing steps: 1. Retrieve source metadata, 2. GUI generation, 3. Query specification: filters, joins, 4. Creation of SRS query, 5. SRS call, 6. SRS query processing, 7. Result streams (XML), 8. Transformation of result streams, 9. Display of results.)

• The commercial mediator tool SRS is used to query and retrieve annotations from the relevant public sources. Currently, several sources offering gene annotations, including GENEONTOLOGY (GO) [52], LOCUSLINK [116], ENSEMBL [17], UNIGENE [127], and NETAFFX [86], are integrated to support gene expression analysis.
• GENMAPPER is used to pre-compute alternative mappings between the data sources using different join paths. These mappings are stored as views in a sub-division of GENMAPPER, the Mapping Database. In particular, for each source, a mapping table is maintained storing all correspondences between the source and a pre-selected central source. This star-like schema supports efficient join operations through the central source to inter-relate objects of different sources.
• The Query Mediator is our new development, offering a uniform interface to exploit both the mapping data in GENMAPPER and the annotation data in SRS. In particular, it captures and transforms user-specified queries into SRS-specific queries, which are then forwarded to SRS for execution. Finally, the Query Mediator combines the results delivered by SRS, performs necessary transformations, and visualizes them on the web interface.
• The ADM Database serves administration purposes and stores metadata about the integrated sources, such as their names and attributes, and information about the available mappings, e.g., mapping names and the join paths used to compute them. We utilize this metadata to automatically generate the web interface for query formulation.
In the remainder of this section, we briefly describe the overall interaction between the components in two main processes, the integration of data sources and the processing of user queries. The main components, in particular the Mapping Database, the ADM Database, and the Query Mediator, are discussed in detail in the subsequent sections.

Source Integration
The comprehensive wrapper library provided by SRS supports numerous data sources available in the bioinformatics domain and allows us to easily add new sources. In particular, we use these wrappers to integrate the flatfile-based source LOCUSLINK and two relational databases, ENSEMBL and GO. To achieve good performance for interactive queries, we maintain local copies of these data sources for integration in SRS. The ADM Database holds metadata about the sources, especially the names of the sources and their attributes.
In our approach, data sources are organized in a star-like schema supporting efficient join queries. For each object type, one of the sources is chosen as the central source, to which mappings from all other sources of this type are pre-computed. For example, LOCUSLINK is a reference data source for gene annotations. Its identifier is linked in many other sources and often used for citations in scientific publications. Hence, we choose LOCUSLINK as the central gene source in our current implementation to support gene expression analysis. The Mapping Database consists of the mappings from LOCUSLINK to all other sources, e.g., UNIGENE, ENSEMBL, NETAFFX and GENEONTOLOGY, which are pre-computed and provided by GENMAPPER. To link a source with the central source, alternative mappings can be obtained using different join paths. The mappings are then registered in the ADM Database together with the paths employed to compute them (see Section 16.2).

Query Processing
Figure 16.1 also shows the general workflow of query processing, which is discussed in detail in Section 16.3. The workflow starts with querying metadata about the available sources, attributes, and mappings from the ADM Database (Step 1). Using this metadata, the web interface is automatically generated (Step 2). The user can then formulate the query by selecting the data sources and relevant attributes and by specifying filter conditions and join paths (Step 3). The Query Mediator interprets the user query and generates a query plan, which consists of one or multiple SRS-specific queries (Step 4). The query plan is passed to the SRS server for execution (Steps 5 and 6). While subqueries for selection and projection are performed within the corresponding sources, SRS uses GENMAPPER to perform join operations. The query result is then returned as one or multiple XML streams (Step 7). The Query Mediator parses the streams to extract the relevant data (Step 8), which is then prepared in different formats for display in the web browser or for download (Step 9).

16.2 Mapping Management
The Mapping Database manages the actual mapping data, i.e., the correspondences, while the ADM Database stores the metadata on the mappings and the involved sources for GUI generation and query processing. In the following we describe the two databases in more detail.

The Mapping Database
Previous integration systems, such as SRS, determine corresponding objects between two sources using a multi-way join operation along the shortest, automatically determined path connecting them with each other. This approach leads to several problems. First, the shortest path may not always be the best one for inter-relating objects of two sources. Other (possibly longer) paths may deliver better data, e.g., if the involved sources are updated more frequently than those in the shortest path. Second, the composition of many mappings can lead to performance problems, even for the shortest paths, if they are to be evaluated at runtime. One solution to improve query time is to pre-compute all possible paths to obtain direct mappings. However, this would lead to an enormous number of mappings and object correspondences (complexity O(n²) with n sources), which is impossible to manage and update. We address these problems on the one hand by supporting several alternative paths, which can be selected by the users according to their preferences or analysis needs. On the other hand, instead of pre-computing join paths between all sources, we identify a central source and pre-compute only the join paths between the remaining sources and the central source, through which the join operations are performed at runtime.

Figure 16.2 Schemas of the Mapping and ADM databases

(a) Schema of the Mapping Database: a Center table (Center_Id, Center_Accession) and one mapping table per source, e.g., Center_LocusLink, Center_UniGene, Center_Ensembl, Center_GeneOntology, Center_NetAffx, each holding Center_Id, the source-specific accession, and Path_Id. b) Schema of the ADM Database: tables Source (Db_Id, Db_Name), Path (Db_Id, Path_Id, Path_Name), and Attribut (Db_Id, Attribut_Id, Attribut_Name, SRS_Name).)

Like in COLUMBA [125], the data sources are connected in a star-like (multidimensional) schema in our approach. However, in contrast to COLUMBA, we maintain the mappings in a separate database for optimized join processing and support alternative mappings between a source and the central source. Figure 16.2a shows the schema of the Mapping Database. There is a center table for the central source and a mapping table for each additional data source. All objects of the central source are uniquely identified by the key Center_Id. These ids are used as foreign keys in the mapping tables to represent the object relationships at the instance level. Note that a mapping table is used to maintain all mappings of different paths between the respective source and the central source. Each path is identified by a Path_Id identifier referring to the specific path which has been used to pre-compute the mappings. Every supported path is described in the ADM Database, including metadata such as its name and the involved sources.
For example, assume we want to relate genes from UNIGENE with some annotations from ENSEMBL. Unfortunately, UNIGENE and ENSEMBL do not maintain direct mappings to each other. Hence, it is necessary to relate their objects through common objects in other sources. By analyzing the mappings between the sources, we could identify UNIGENE↔LOCUSLINK↔NETAFFX↔ENSEMBL as a possible join path. Without pre-computed mappings, three mappings, each between two neighbor sources in the path, would have to be retrieved and successively joined. In our implementation, the mapping table Center_UniGene provides a direct mapping LOCUSLINK↔UNIGENE. The mapping table Center_Ensembl contains the mapping LOCUSLINK↔ENSEMBL, which has been pre-computed using the path LOCUSLINK↔NETAFFX↔ENSEMBL. Hence, we only need to join these two mappings (a corresponding join is sketched at the end of this subsection).
The number of mappings to be pre-computed and materialized in the Mapping Database is linear in the number of sources to be integrated. The support for alternative join paths does not affect this linear complexity (k*n mappings with n sources and k alternative mappings on average per source). New annotation sources can easily be added by creating new mapping tables to hold the corresponding mapping data. This does not affect the runtime complexity because the join operations within the Mapping Database never involve more than two mappings (source1↔center↔source2). Mapping tables for sources that are no longer required can be removed. Storing the mapping data for each source in a separate mapping table simplifies the data update task. In particular, a mapping can easily be updated by deleting it and inserting the new one. The local copies of annotation sources can be independently replaced in SRS by a new version.
The prerequisite for integrating a new source is that there is at least one mapping path between it and the central source or that such a path can be constructed by joining existing paths. Therefore, the selection of the central source plays an important role in this integration approach. Quality criteria, such as update frequency and acceptance by the users, should be considered. Furthermore, if the source already provides direct mappings to many other sources, these mappings can be taken to quickly construct the mapping database. For example, LOCUSLINK and SWISSPROT represent reference sources for gene and protein annotations, respectively, and maintain a large number of mappings to other (smaller) sources. Hence, they are good candidates for the central source to integrate annotations for gene and protein analysis.
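The join through the central source sketched above can be illustrated with two mapping tables that both reference Center_Id; the following Python/SQLite sketch uses the table and column names of Figure 16.2, while the sample accessions and Path_Id values are made up for illustration.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Center_UniGene (Center_Id INT, UniGene_Accession TEXT, Path_Id INT);
    CREATE TABLE Center_Ensembl (Center_Id INT, Ensembl_Accession TEXT, Path_Id INT);
    INSERT INTO Center_UniGene VALUES (1, 'Hs.1', 3), (2, 'Hs.2', 3);
    INSERT INTO Center_Ensembl VALUES (1, 'ENSG01', 7), (2, 'ENSG02', 7);
""")
rows = con.execute("""
    SELECT u.UniGene_Accession, e.Ensembl_Accession
    FROM Center_UniGene u JOIN Center_Ensembl e ON u.Center_Id = e.Center_Id
    WHERE u.Path_Id = 3 AND e.Path_Id = 7  -- pick one pre-computed path per mapping
""").fetchall()
print(rows)  # [('Hs.1', 'ENSG01'), ('Hs.2', 'ENSG02')]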

The ADM Database
Figure 16.2b shows a portion of the ADM Database schema holding metadata about the integrated data sources and the mappings between them. The Source table records a unique source identifier (Db_Id) and the source names. The available attributes of a source are stored in the table Attribute, which also contains their SRS-specific names used to translate the user query into an SRS-specific query. All join paths for which a mapping is materialized in the Mapping Database are stored in the Path table. The path name concatenates the names of all sources participating in the join path. Hence, the user can easily differentiate between alternative mappings and identify the one suited for her needs. Currently, we import this data partly manually and partly automatically by means of specific database scripts, which extract metadata from the corresponding sources. This metadata is used by the Query Mediator to automatically generate web interfaces for query specification, which is discussed in the next section.

16.3 Query Processing

Query Types
Our Query Mediator supports two kinds of queries, projection and selection queries, according to the specific requirements of expression and annotation analysis, respectively:
• Projection queries support expression analysis and return a uniform view with user-specified annotation attributes for a given gene group. In a query, the attributes may stem from different sources.
• The goal of selection queries is to identify sets of genes showing some common properties. This can be done by applying filter conditions on the corresponding annotation attributes. The gene sets can then be used in expression analysis to compare their expression behavior.
These two query types differ from each other in their input and output data. Projection queries need a gene group as input, while selection queries produce a gene group as output. However, they are processed in the same way by associating genes with annotation attributes from the selected sources.

Query Formulation
The query web interface is generated automatically using the source-specific metadata stored in the ADM Database. The user formulates queries on the web interface by selecting relevant attributes (projection queries) and specifying filter conditions (selection queries). Figure 16.3 shows an example of a selection query to identify all genes which are located on chromosome 4 and are associated with the biological process cell migration.

Figure 16.3 Query formulation on the automatically generated web interface


A query may involve attributes (1) stemming from different sources (2). For each attribute, a filter condition (3) can be specified, allowing for exact or pattern-matching queries. Furthermore, the user has to specify the mapping connecting the source of the selected attribute to the central source by selecting a join path (4). Multiple conditions can be added and combined using the logical operators OR, AND, and NOT, whereby OR has the lowest and NOT the highest priority in the query evaluation process. Finally, according to the type of genes to be returned, a mapping between the central source and the target source is to be selected (5). While SRS only supports filtering on attributes of the source from which the data is to be retrieved, our implementation supports the combination of attributes from different sources within a selection query. Moreover, our implementation makes it possible to combine attributes of different sources (projection) within the same query, which is currently also not directly supported by SRS.

Generation of Query Plans
From the user specifications on the web interface, the Query Mediator generates an SRS-specific query for later execution by the SRS server. This process is performed in three steps: 1) Block formation to split the queries into blocks according to the logical operators, 2) Grouping of source-specific attributes to determine and group subqueries on attributes belonging to the same source, which are to be executed together, and 3) Assembling the SRS query to generate the final query in SRS-specific syntax and terms. Figure 16.4 illustrates these steps using the example query from Figure 16.3. We discuss the individual steps in the following:
1 Block formation: First, the filter conditions of a selection query are divided by the logical operator OR into single blocks. Each block contains either one or multiple filter conditions connected with each other by the AND operator. Our example query in Figure 16.3 does not contain the OR operator. Hence, only one block is constructed (see Figure 16.4, Step 1), holding all three filter conditions.

Figure 16.4 Steps for creating the query plan

Step 1: Block formation
  Block  Path                                 Source        Attribute   Filter value
  1      Ensembl>NetAffx(Set U95)>LocusLink   Ensembl       Chromosome  4
  1      GeneOntology>LocusLink               GeneOntology  Category    biological_process
  1      GeneOntology>LocusLink               GeneOntology  Process     *cell migration

Step 2: Grouping of source-specific attributes
  Block  Group  Path                                 Source        Attribute   Filter value
  1      a      Ensembl>NetAffx(Set U95)>LocusLink   Ensembl       Chromosome  4
  1      b      GeneOntology>LocusLink               GeneOntology  Category    biological_process
  1      b      GeneOntology>LocusLink               GeneOntology  Process     *cell migration

Step 3: SRS query assembling
  1 getz -vf "accession" "([Mapping-pid:5]
  2   < (Center < ([Mapping-pid:2]<([EnsemblGene-cnm:4]))
  3   < ([Mapping-pid:1]<([GoTerm-typ: biological_process] & [GoTerm-tna:*cell migration]))))

This step is not necessary for projection queries, which do not require filter conditions and build a view for all specified attributes.
2 Grouping of source-specific attributes: Within each block obtained from Step 1, the attributes and filter conditions are grouped according to their data source and the mappings to the central source. Each group of attributes and filter conditions concerning the same source and mapping is evaluated together in a subquery. Figure 16.4, Step 2 shows the two identified groups a and b for the attributes Category and Process of the source GeneOntology and the attribute Chromosome of the source Ensembl, respectively.
3 Assembling the SRS query: The source and attribute names are replaced by SRS-internal names, which have previously been captured and stored in the ADM Database. The names of the selected mappings, i.e., the paths, are substituted by their identifiers in the Mapping Database. For example, Figure 16.4, Step 3 shows in Line 3 the second and third filter conditions specified on the web interface. The source GeneOntology and the attributes Category and Process are replaced by the internal names GoTerm, typ, and tna, respectively. SRS is then invoked by calling its interpreter getz (Line 1).
From the SRS-specific query in Figure 16.4, Step 3, we can see that the objects of EnsemblGene and GoTerm are first identified by applying the corresponding filters (Lines 2 and 3) and then uniformly mapped to the central identifier Center_Id (Line 2) using the mappings with ids 1 and 2, respectively. The resulting central identifiers are in turn mapped to the target source NETAFFX using the mapping with id 5 (Line 1). The result of the query is a set of NETAFFX accessions indicating the corresponding genes. The first two steps amount to a simple partitioning of the filter conditions, as sketched below.
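As a minimal illustration of the first two steps, the following Python sketch partitions a list of filter conditions at OR operators and then groups each block by source and mapping path; the data structures are illustrative and not the Query Mediator's actual code.

from itertools import groupby

# One condition per tuple: (path, source, attribute, filter value); "OR" tokens separate blocks
conditions = [
    ("Ensembl>NetAffx(Set U95)>LocusLink", "Ensembl", "Chromosome", "4"),
    ("GeneOntology>LocusLink", "GeneOntology", "Category", "biological_process"),
    ("GeneOntology>LocusLink", "GeneOntology", "Process", "*cell migration"),
]

def build_blocks(tokens):
    # Step 1: split the condition list at OR operators into AND-connected blocks
    blocks, current = [], []
    for t in tokens:
        if t == "OR":
            blocks.append(current)
            current = []
        else:
            current.append(t)
    blocks.append(current)
    return blocks

def group_by_source_and_path(block):
    # Step 2: conditions on the same source and mapping path form one subquery
    key = lambda c: (c[1], c[0])
    return [list(g) for _, g in groupby(sorted(block, key=key), key=key)]

for block in build_blocks(conditions):   # the example contains no OR, hence one block
    for group in group_by_source_and_path(block):
        print(group)                     # prints group a (Ensembl) and group b (GeneOntology)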

Query Execution and Result Transformation
Depending on the complexity of the user query specified on the web interface, one or multiple SRS-specific queries are generated and executed. For each such query (e.g., the one shown in Figure 16.4, Step 3), SRS returns the result as an XML stream. The stream is then parsed by the Query Mediator to extract the relevant data. The Query Mediator then assembles the extracted data of all streams into an internal data structure for later visualization or export. It is also able to perform compensation routines for functions that are not yet supported in some DBMS, such as intersection in MySQL, and have not been considered in SRS. A gene group resulting from a selection query can be used as input for a projection query to obtain other annotations for the genes of interest. Conversely, from the result of a projection query, the user can identify the relevant genes and save them as a new gene group for further queries. The exchange of gene groups between queries allows us to perform successive refinement of an initially large set of genes.
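The compensation of a missing set operation can be as simple as intersecting the accession sets extracted from the individual result streams; the following Python sketch assumes an illustrative XML layout for the streams and is not the Query Mediator's actual code.

import xml.etree.ElementTree as ET

def accessions(xml_stream, tag="accession"):
    # Extract all accession values from one SRS result stream
    return {el.text for el in ET.fromstring(xml_stream).iter(tag)}

def compensate_intersection(xml_streams):
    # Intersect the gene sets of several result streams within the mediator
    sets = [accessions(s) for s in xml_streams]
    result = sets[0]
    for s in sets[1:]:
        result &= s
    return result

stream1 = "<result><accession>38210_at</accession><accession>40193_at</accession></result>"
stream2 = "<result><accession>40193_at</accession></result>"
print(compensate_intersection([stream1, stream2]))  # {'40193_at'}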

16.4 Summary
We presented a hybrid approach for the integration of public molecular-biological data sources to support expression analysis of genes and proteins. Mapping data is explicitly captured and materialized for efficient analysis of object correspondences. Up-to-date annotation data is virtually integrated using a mediator and retrieved on demand according to the analysis needs. Both materialized and virtually integrated data are accessed transparently through a query mediator, which exploits existing mappings between the integrated sources to inter-relate objects and aggregate their information from different sources. The mappings are pre-computed to involve a common central source, through which join operations can be performed efficiently at run time. The use of a powerful commercial product, SRS of LION BIOSCIENCE, as the mediator and of a generic schema to store the pre-computed mappings allows us to easily integrate new sources or update existing ones.


PART V CLOSING


CHAPTER 17 CONCLUSIONS

This chapter concludes the thesis by summarizing the main contributions and discussing relevant directions for future research.

17.1 Contributions
Schema matching is a key task for enabling interoperability and data integration in various domains. To reduce the manual effort as much as possible, semi-automatic approaches are needed to effectively assist the user in solving the match problem. This dissertation has advanced the state of the art of schema matching by designing and implementing new solutions for schema matching and data integration. In particular, it made the following three main contributions:

Surveys of Schema Matching Approaches and Evaluations
We adopted a previously proposed taxonomy to perform a new survey of existing schema matching approaches. According to the taxonomy, we differentiated and discussed schema- and instance-level, element- and structure-level, language- and constraint-based, no-reuse and reuse-oriented, and hybrid and composite matchers. We further proposed a catalog of criteria for performing and documenting schema matching evaluations. In particular, we identified and discussed the various aspects that contribute to the quality of a match approach, such as the chosen test problems, the design of the experiments, the representation of match results, the metrics used to quantify the match quality and the amount of saved manual effort, and the overall execution performance. We then used the classification and evaluation criteria to comprehensively review previous prototypes and evaluations and thereby identified important issues to be addressed in the future. Besides supporting the development and evaluation of our own approaches, our insights into the state of the art can help in designing and implementing better schema matching systems.

Generic Schema Matching Systems COMA and COMA++
With COMA and COMA++, we developed two new generic and customizable schema matching systems. In particular, COMA++ is based on an open multi-component architecture, which offers high flexibility for extension and adaptation. COMA++ provides a comprehensive library of individual matchers and supports different mechanisms to combine and refine their results. New matchers can easily be added or constructed by combining existing ones. Based on this infrastructure, COMA++ implements several novel approaches for context-dependent, fragment-based, and reuse-oriented matching, which are also able to scale to very large match problems.
The flexibility to customize the match operation allowed us to systematically evaluate the different match approaches supported by COMA/COMA++. Our evaluations have shown that our system can achieve high quality and fast execution times for large real-world problems in different domains. We were able to identify the best parameters with high and stable quality across different match tasks for our default match operation, thereby limiting tuning effort. We obtained many insights on the quality and execution time of the different match strategies and on the impact of many factors, such as schema size and the choice of matchers and combination strategies. We believe that our evaluation results can be of valuable help for the development and evaluation of further match algorithms.

Mapping-based Data Integration System GENMAPPER
We developed a new mapping-based approach for data integration and successfully applied it to the challenging field of bioinformatics with hundreds of highly cross-referenced data sources. Our GENMAPPER system uses a generic data model to uniformly represent objects and their annotations physically imported from different data sources. Existing object correspondences are explicitly captured and utilized to combine information from different sources for objects of interest. From the generic data representation, powerful operators are supported to derive tailored views for specific analysis needs. With GENMAPPER, we have shown that exploiting mappings at the instance level is crucial to achieve high robustness and scalability for data integration in a dynamic and heterogeneous environment.
We further extended GENMAPPER to a hybrid integration system by coupling it with SRS, a powerful commercial mediator, to combine the advantages of both materialized and virtual integration. Both GENMAPPER and SRS are based on a generic data representation, allowing for easy integration of new data sources and updates of existing ones. While join queries to inter-relate objects from different sources can be efficiently answered using the mappings and object correspondences materialized in GENMAPPER, the mediator retrieves and aggregates up-to-date data from multiple sources for objects of interest on a demand-driven basis.

17.2 Future Directions
Although the dissertation has made a number of contributions to improve the state of the art in schema matching and data integration, it also raises several opportunities for improvement and further issues to be addressed in future work. We review some of them in the following.

Ontology Matching
We incorporated ontology support into COMA++ by providing a corresponding parser to import OWL ontologies into our internal graph representation. Furthermore, we evaluated our matchers and match strategies on several ontologies of medium size as specified by the EON Ontology Alignment Contest 2004. The obtained results were very promising. In fact, without any specific adaptation or optimization for ontologies, we achieved on average very high quality, comparable to that of the best performing participants of the contest. However, ontologies provide a much richer semantic description of data than common database schemas or document formats, leading to more opportunities to be exploited for matching. Therefore, we want to further improve our system by adding new matchers that address specific features of ontologies, especially the different kinds of semantic relationships among classes and properties.

Fragment-based Matching
The fragment-based approach represents an effective way to cope with large schemas. Currently, however, only a few static fragment types are supported. In future work, we want to develop and test more intelligent strategies for decomposing schemas into fragments and for combining fragment-level match results. As observed in our evaluation, the degree of schema coverage, i.e., to what extent a schema is decomposed, and the degree of overlap between the obtained fragments are important factors influencing the quality and execution time of a fragment-based strategy. They need to be investigated further, so that a schema decomposition strategy can be dynamically determined for individual schemas with different structural characteristics.

Reuse-oriented Matching
Despite its simplicity, our approach to reuse previous match results proved to be very effective. It is largely insensitive to the size of the input schemas and can thus scale to very large schemas. Motivated by these advantages, we want to extend it to a generic reuse approach, which can support different kinds of auxiliary information and be employed for different reuse granularities, i.e., single element correspondences, or fragment- and schema-level match results. Like the relationships between objects of different sources, the relationships between objects within a single source, such as a schema, taxonomy, ontology, or synonym table, can also be captured as a mapping of the source against itself. Using mappings to uniformly represent different kinds of relationships, our MatchCompose operation can be applied to combine them and to derive new, useful mappings. This generalization of mapping-based reuse opens the door to further opportunities, which need to be investigated and evaluated in the future.

Instance-based Matching
Currently, COMA++ only supports schema-level and reuse-oriented matchers. Previous work has shown the effectiveness of machine learning approaches exploiting instance data to identify attribute correspondences. Hence, one important extension of COMA++ is to incorporate such matchers into the Matcher Library and to study their combination with the existing matchers. This raises several issues to be investigated. Unlike the existing matchers, machine learning algorithms typically require an additional manual phase to specify correspondences for training before the match processing can take place. Furthermore, the existence of different types of matchers, i.e., schema-, reuse-, and instance-based, may require new strategies for combining them. Currently, our combination strategies focus on uniform similarity values in the range [0, 1]. Thus, we also want to experiment with more advanced strategies, such as fix-point propagation of similarities [93], stable marriage [93, 16], and meta-learning [34, 35].

Model Management
A further direction for future work is the extension of our schema matching prototype to a general-purpose metadata management system, such as for model management [12, 14]. COMA++ is built on a generic design and supports a comprehensive infrastructure for the uniform management and manipulation of schemas and mappings. While mapping creation and manipulation is already covered by a large range of operators, such as Match, Domain, Range, MappingMerge, MatchCompose, etc., schema manipulation currently focuses on preprocessing purposes to support the match process. The SchemaMerge operator provides a simple implementation for merging two previously matched schemas and would benefit from more sophisticated algorithms, like those described in [115] and [94]. Likewise, other operators could be complemented with additional algorithms, for example, for mapping composition as proposed in [90].

Data Integration
GENMAPPER exploits instance-level mappings to inter-relate and aggregate data from different sources. The approach is applicable not only to bioinformatics but also to other domains, e.g., peer-to-peer database management [58] and personal information management [37], in which instance-level mappings are widely available or can be obtained with little effort. However, the major limitation of GENMAPPER is that it does not sufficiently consider the semantics of the mappings, but expects the users to know what the mappings mean in order to specify queries. To address this issue, we are currently working on a more comprehensive solution for data integration, the IFUICE system [121], which supports a semantic model of domain-specific concepts on top of the physical sources and mappings. User queries are formulated on the semantic model and then translated into workflows of multiple steps to execute, filter, and combine mappings between sources. Mapping manipulation is supported by a set of high-level operations similar to the operators already implemented in COMA++ and GENMAPPER. Finally, we want to apply our approach to the data integration problem for different applications within the bioinformatics domain as well as beyond.


CHAPTER 18 APPENDIX

This chapter provides supplemental information for the thesis. Section 18.1 summarizes the new features of COMA++ compared to COMA. Section 18.2 describes the design of the GUI of COMA++. Finally, Section 18.3 discusses the implementation of several mapping operators in COMA++, especially those producing a new schema as the result.

18.1 New Features in COMA++

COMA++ (2005) is a completely new implementation based on the ideas and results of COMA (2002). Table 18.1 gives an overview of the new features of COMA++ compared to COMA, organized according to the individual system components. We also indicate whether a feature of COMA has been taken over or improved in COMA++.

Table 18.1 New features of COMA++ compared to COMA

Repository
  Schema import: New parsers for XSD, XDR, OWL, SQL; support for distributed schemas/namespaces (XSD, OWL, XDR); exploitation of referential constraints for schema structure in the SQL/ODBC parser
  Schema export: Proprietary tree-oriented text format
  Schema management: Load into Schema Pool, delete from Repository
  Mapping import: Proprietary text format and OWL/RDF format [47]
  Mapping export: Proprietary text format and OWL/RDF format [47]
  Mapping management: Load into Mapping Pool, delete from Repository
  Auxiliary info import: Proprietary text format for synonyms and abbreviations
  Auxiliary info management: Insert and delete synonyms and abbreviations
  Matchers, dependency: Display of the matcher dependency hierarchy
  Matchers, management: Recreate with system-default configuration, delete matchers

Workspace: Schema Pool and Mapping Pool
  Schema Pool management: Select to display as source or as target, delete from Schema Pool, save to Repository
  Schema Pool preprocessing: Transform distributed to monolithic schema; transform type subclassing to composition; transform type reuse to element reuse; transform inlined to shared elements
  Mapping Pool management: Select to display, delete from Workspace, save to Repository
  Mapping Pool edit: GUI-based (point-and-click) edit function
  Mapping Pool automatic operators: Domain, Range, InvertDomain, InvertRange, RestrictDomain, RestrictRange, SchemaMerge, MappingMerge, Intersect, Diff, MatchCompose, Compare, Edit

Match Customizer
  Matcher Library, matcher construction: Generic implementation for matcher combination and mapping refinement (CombineMatcher)
  Hybrid matchers: Matchers taken from COMA: Affix, NGram, SoundEx, EditDistance, Synonym, Type; new matchers: Statistics, Taxonomy
  Combined matchers: Matchers taken from COMA, however re-defined using CombineMatcher: Name, NameType, NamePath, Children, Leaves; new matchers: Comment, NameStat, Parents, Siblings
  Reuse matcher: Mapping paths of different lengths (COMA: only length 2); pivot schema; incomplete mapping paths
  Combination scheme: Strategies for aggregation, direction, selection, and combined similarity taken from COMA
  Match strategies, context-independent: NoContext
  Match strategies, context-dependent: AllContext taken from COMA, re-implemented and optimized; new match strategies: FilteredContext, fragment-based matching
  Customization/configuration, combined matcher: Specification of default matchers and strategies for aggregation, direction, selection, and combined similarity
  Customization/configuration, match strategies: Specification of node matchers for NoContext; path matchers for AllContext; node and context matchers for FilteredContext; fragment type and match strategy for fragment-based matching

Execution Engine
  Matcher execution, automatic: One or multiple iterations for successive refinement according to the definition of the match strategies
  Matcher execution, step-by-step/manual: Fragment-based matching: select fragment types/match strategy, identify similar fragments, match fragments; reuse-oriented matching: select reuse possibilities, generate and combine reuse results
  Performance optimization, cache: Similarity cache for single matchers
  Performance optimization, bulk processing: Similarity computation for element sets (instead of single elements) in CombineMatcher

18.2 GUI Design in COMA++
In the following, we describe how the main components of COMA++ and their functionalities can be used through the graphical user interface.

Repository
The Repository (Figure 18.1) persistently manages schemas, mappings, auxiliary information (synonyms and abbreviations), and matcher configurations. Management functions for schemas and mappings (Import, Export, Delete) and for auxiliary information (Edit, Delete) are accessible from the Repository menu. Schemas and mappings are shown in two separate lists in the Repository tab, which also provides push buttons to load a schema into the Workspace and display it in either the source or target window. A mapping is loaded together with its source and target schema if they are not yet available in the Schema Pool (Workspace).

Figure 18.1 Repository functionalities

Figure 18.2 Workspace functionalities


Workspace: Schema and Mapping Pool
The Workspace (Figure 18.2) comprises both the Schema Pool and the Mapping Pool, managing schemas and mappings in memory. Like in the Repository, schemas and mappings are shown in two separate lists for easy access. Schemas can be chosen to be displayed in the source or target window, while a selected mapping is displayed together with its source and target schema. The Source and Target menus contain functions to load a schema from the Repository, to empty the source or target window, and to fold or unfold the displayed schema tree. The Workspace tab provides push buttons for selecting a schema or mapping to display, saving a schema or mapping to the Repository, deleting a schema or mapping from the Workspace, as well as for the various mapping operators, i.e., Domain, InvertDomain, Range, InvertRange, SchemaMerge, MappingMerge, Intersect, Diff, Compare, and Edit. The mapping operators are also accessible from the Mapping menu.

Figure 18.3 Match processing

Configure Match & Stop, Step-by-step matchers & • Configure/Execute match strategy Fragment & Reuse strategies • Pre-process schemas • Step-by-step reuse and fragment matching • Configuration/Management of matchers

Match Processing and Customization
Match processing functionalities comprise the configuration of matchers and match strategies, the preprocessing of input schemas, and the step-by-step execution of particular match strategies (Figure 18.3). All matchers and match strategies are pre-configured for immediate use. However, the user can also change and save the configuration of individual matchers and match strategies for future match operations (Figure 18.4). Further management functions for matchers are available from the Match menu, including creating a new matcher, deleting an existing matcher, displaying the matcher dependency graph, and resetting all matcher configurations to system defaults. Depending on the schema type, different methods can be chosen to preprocess input schemas for a match operation. Currently, the step-by-step execution mode is supported for fragment-based and reuse-oriented matching. Using the special Edit function, the user can add new correspondences to or delete irrelevant ones from a mapping.

Schema Preprocessing
Implementing the tasks discussed in Section 5.3, four steps of schema preprocessing are supported. Figure 18.5 illustrates the changes performed by each step using a sample schema. In particular, we obtain four different states to represent input schemas in a match operation:
• Loaded: This state represents the unconnected structure as parsed and imported from external schemas (independent substructures for single complex types/classes).
• Resolved: This state is the result of the namespace-based resolution of types and classes and the transformation of type derivation into type composition.

Figure 18.4 Configuration of matchers and match strategies

(Panels: dependency graph of matchers, configuration of a combined matcher, configuration of match strategies)

• Reduced: Only instantiable elements are retained and type reuse is transformed into element reuse.
• Simplified: This state is obtained by transforming similar elements that are declared inline at multiple places into a single shared element.

Figure 18.5 Schema preprocessing

(The figure shows a sample schema in the four states Loaded, Resolved, Reduced, and Simplified, including the transformation of inlined to shared elements.)

Depending on the schema type, each step may be individually enabled or disabled during the preprocessing of a schema. For example, Reduced is not necessary for ontologies, as they do not have complex types. Preprocessed schemas stored in the Simplified format do not need to be preprocessed again in future match operations. The changes between two states of a schema are captured as a mapping. The user can select different states to translate a displayed match result between them. The translation is implemented using the MatchCompose operation, combining the match result with the mappings between two states of a schema.

Manual Reuse-oriented Matching
The user can first verify and change the default settings of the Reuse matcher concerning the pivot schema, the top-k mapping paths to be considered, the length of mapping paths, the strategy to compose/aggregate similarity values, and the match strategy to solve open match tasks in incomplete mapping paths (Figure 18.6). According to this configuration, COMA++ first displays the k best reuse possibilities for the given match problem, from which the user can select one or multiple mapping paths to derive a new mapping. If only one mapping path is chosen, it is possible to switch to a detailed view showing all correspondences linking schema elements from the source schema across the intermediary schemas to the target schema.

Figure 18.6 Manual reuse

(Panels: configuration of the Reuse strategy, identified mapping paths, detailed view with intermediary schema)

Manual Fragment-based Matching
As shown in Figure 18.7, the user can first verify and change the default settings of the fragment-based match strategy concerning the fragment type, the match strategy for identifying and matching similar fragments, and the matchers to be employed by the match strategy. The relevant fragments are highlighted in the source and target tree, respectively. After the first match step, the similar fragments are indicated by means of correspondences between their roots. The user can verify this mapping and execute the second match step to obtain the detailed match result between the elements of the similar fragments.

Figure 18.7 Step-by-step fragment-based matching

(The figure shows the two steps: identifying similar fragments, with the relevant and similar fragments highlighted, and matching similar fragments, yielding the final and detailed match results.)

Manual Quality Evaluation
If intended mappings are available as a gold standard, COMA++ supports the automatic execution and quality evaluation of a large number of match operations with different configurations to identify the best configuration. In addition, it is also possible to perform this process manually for a limited number of mappings. On the one hand, mappings in the Mapping Pool can be compared with each other using the mapping operators Intersect and Diff to identify their common and differing correspondences, respectively. On the other hand, obtained match results can be compared against an intended mapping using the Compare operator, which computes and displays the values of the four quality measures Precision, Recall, Fmeasure, and Overall (Figure 18.8).

Figure 18.8 Manual quality evaluation

18.3 Mapping Operators in COMA++

Given two schemas A and B and two mappings between them, m1: A↔B and m2: A↔B, two correspondences c1: a1↔b1 in m1 and c2: a2↔b2 in m2 are considered equal if their domain and range objects are equal, i.e., a1=a2 and b1=b2. Based on this notion, the mapping operators MappingMerge, Diff, and Intersect return a set of correspondences identified from the common and distinct correspondences of two input mappings. Domain, InvertDomain, and SchemaMerge return a set of schema elements, i.e., either nodes or paths, and are implemented by comparing the set of all schema elements of a schema with that of the matching ones. While this set-based representation is useful for further processing, it is not suitable as input for the match operation or for display on the GUI. Hence, we also support extended versions of these operators, in particular DomainGraph, InvertDomainGraph, and SchemaMergeGraph, which return a new schema with structure, if possible, instead of a simple set of schema elements. To implement these operators, we retain the structural relationships between the schema elements and their ascendants, making the new schema structurally consistent with the original one. We discuss the single operators in the following:

DomainGraph
The operator supports match results either of nodes or of paths. Given a match result of nodes, it identifies the matched nodes in the source schema and their ascendants up to the schema root level. The domain graph is obtained by retaining all relationships among these nodes and ascendants. Given a mapping between paths, DomainGraph simply builds the graph from the nodes and relationships of the matched paths of the source schema.

DomainGraph(MatchResult map) {
  if (map is a node result) {
    srcGraph = get source graph of map
    domGraph = create a copy of srcGraph
    matchedNodes = Domain(map)
    ascendNodes = get ascendants of matchedNodes from srcGraph
    remove nodes not in matchedNodes or ascendNodes from domGraph
  } else { //map is a path result
    srcPaths = Domain(map)
    domGraph = build a graph with nodes and links given in srcPaths
  }
  return domGraph
}

InvertDomainGraph
Like DomainGraph, this operator also differentiates between match results of nodes and of paths. Given a mapping between nodes, it retains the structural relationships among the non-matched nodes in the source schema and their ascendants to obtain the inverted domain graph. Given a mapping between paths, it identifies the paths in the source schema that are not matched and builds the graph from those paths.

InvertDomainGraph(MatchResult map) {
  if (map is a node result) {
    srcGraph = get source graph of map
    invGraph = create a copy of srcGraph
    diffNodes = InvertDomain(map)
    ascendNodes = get ascendants of diffNodes from srcGraph
    remove nodes not in diffNodes and ascendNodes from invGraph
  } else { //map is a path result
    diffPaths = InvertDomain(map)
    invGraph = build a graph with nodes and links given in diffPaths
  }
  return invGraph
}

SchemaMergeGraph
Merging previously matched schemas has been addressed in some recent work, such as [115, 94]. Dealing with the same problem, our SchemaMergeGraph operator aims at a simple automatic solution without detecting and resolving structural conflicts. Like an automatically obtained match result, the result of SchemaMergeGraph should be further examined and possibly corrected by the user. A match result of paths is first transformed into one of nodes by discarding all ascendants in the paths and retaining only the nodes at the deepest level. Note that similar paths imply similar nodes at the deepest level. The nodes of the target schema that do not have a match candidate are then successively added to the source schema. Within the source schema, we add structural relationships between them and the matching nodes of their parents, between them and the matching nodes of their children, and retain all relationships among them.

SchemaMergeGraph(MatchResult map) {
  srcGraph = get source graph of map
  tgtGraph = get target graph of map

  //Transform map to a node result if map is a path result
  //by discarding ascendants in paths, e.g., a.b.c<->d.e => c<->e
  if (map is a path result) {
    map = transform map to node result
  }

  mergeGraph = create a new copy of srcGraph
  diffNodes = InvertRange(map)               //distinct nodes from the target schema
  for each node n in diffNodes {
    //Add node to the result graph
    add n to mergeGraph

    //Add edges between n and its matching parents in srcGraph
    parents = get parents of n in tgtGraph
    matchingParents = Domain(RestrictRange(map, parents))
    add edge between n and each of matchingParents to mergeGraph

    //Add edges between n and its matching children in srcGraph
    children = get children of n in tgtGraph
    matchingChildren = Domain(RestrictRange(map, children))
    add edge between n and each of matchingChildren to mergeGraph

    //Add edges from n to its children in diffNodes
    add edge between n and each of children which is also in diffNodes
  }
  return mergeGraph
}

BIBLIOGRAPHY

1 Agrawal, R., A. Somani, Y. Xu: Storage and Querying of E-Commerce Data. Proc. 27. Intl. Conf. Very Large Databases (VLDB), 2001
2 Aumüller, D., H.H. Do, S. Massmann, E. Rahm: Schema and Ontology Matching with COMA++ (Software Demonstration). Proc. 24. ACM SIGMOD Intl. Conf. Management of Data, 2005
3 Avesani, P., F. Giunchiglia, M. Yatskevich: A Large Scale Taxonomy Mapping Evaluation. Proc. 4. Intl. Semantic Web Conf. (ISWC), 2005
4 Batini, C., M. Lenzerini, S. Navathe: A Comparative Analysis of Methodologies for Database Schema Integration. ACM Computing Surveys, 18(4), 1986
5 Bairoch, A.: The ENZYME Database in 2000. Nucleic Acids Research 28, 304-305, 2000. http://www.expasy.org/enzyme
6 Benkley, S., J. Fandozzi, E. Housman, G. Woodhouse: Data Element Tool-based Analysis (DELTA). MITRE Technical Report MTR'95 B147, 1995
7 Berlin, J., A. Motro: AUTOPLEX - Automated Discovery of Content for Virtual Databases. Proc. 9. Intl. Conf. Cooperative Information Systems (CoopIS), 108-122, 2001
8 Berlin, J., A. Motro: Database Schema Matching Using Machine Learning with Feature Selection. Proc. 14. Intl. Conf. Advanced Information Systems Engineering (CAiSE), 2002
9 Bergamaschi, S., S. Castano, M. Vincini, D. Beneventano: Semantic Integration of Heterogeneous Information Sources. Data and Knowledge Engineering, 36(3), 215-249, 2001
10 Berners-Lee, T., J. Hendler, O. Lassila: The Semantic Web - A New Form of Web Content That Is Meaningful to Computers Will Unleash a Revolution of New Possibilities. Scientific American, May 2001
11 Bernstein, P., B. Harry, P. Sanders, D. Shutt, J. Zander: The Microsoft Repository. Proc. 23. Intl. Conf. Very Large Databases (VLDB), 1997
12 Bernstein, P.A., A. Halevy, R.A. Pottinger: A Vision for Management of Complex Models. ACM SIGMOD Record 29(4), 55-63, 2000
13 Bernstein, P.A., E. Rahm: Data Warehouse Scenarios for Model Management. Proc. 19. Intl. Conf. Conceptual Modeling (ER), LNCS, Springer, 1-15, 2000
14 Bernstein, P.A.: Applying Model Management to Classical Meta Data Problems. Proc. 1. Intl. Conf. Innovative Data Systems Research (CIDR), 2003
15 Bernstein, P.A., S. Melnik, M. Petropoulos, C. Quix: Industrial-strength Schema Matching. ACM SIGMOD Record 33(4), 2004
16 Bilke, A., F. Naumann: Schema Matching Using Duplicates. Proc. 21. Intl. Conf. Data Engineering (ICDE), 2005
17 Birney, E. et al.: An Overview of ENSEMBL. Genome Research, 14: 925-928, 2004. http://www.ensembl.org/
18 Boeckmann, B. et al.: The SWISSPROT Protein Knowledgebase and Its Supplement TrEMBL in 2003. Nucleic Acids Research, 31: 365-370, 2003
19 Bright, M.W., A. Hurson, S. Pakzad: Automated Resolution of Semantic Heterogeneity in Multidatabase. ACM Trans. on Database Systems, 19(2), 1994
20 Castano, S., V. De Antonellis: A Schema Analysis and Reconciliation Tool Environment. Proc. Intl. Database Engineering and Applications Symposium (IDEAS), 53-62, 1999
21 Castano, S., V. De Antonellis, M.G. Fugini, B. Pernici: Conceptual Schema Analysis - Techniques and Applications. ACM Trans. on Database Systems 23(3), 286-333, 1998
22 Chaudhuri, S., U. Dayal: An Overview of Data Warehousing and OLAP Technology. ACM SIGMOD Record 26(1), 65-74, 1997
23 Clifton, C., E. Housman, A. Rosenthal: Experience with a Combined Approach to Attribute-Matching Across Heterogeneous Databases. Proc. IFIP 2.6 Working Conf. Database Semantics, 1996
24 Chen, J., S.Y. Chung, L. Wong: The KLEISLI Query System as a Backbone for Bioinformatics Data Integration and Analysis. In [78]: 147-187
25 Cohen, W.: Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity. Proc. ACM SIGMOD Intl. Conf. Management of Data, 201-212, 1998
26 Critchlow, T., K. Fidelis, M. Ganesh, R. Musick, T. Slezak: DATAFOUNDRY - Information Management for Scientific Data. IEEE Trans. on Information Management in Biomedicine 4(1), 52-57, 2000
27 Davidson, S.B., C. Overton, P. Buneman: Challenges in Integrating Biological Data Sources. Journal of Computational Biology 2, 1995
28 Dhamankar, R., Y. Lee, A. Doan, A. Halevy, P. Domingos: IMAP - Discovering Complex Semantic Matches between Database Schemas. Proc. ACM SIGMOD Intl. Conf. Management of Data, 2004

29 Do, H.H., E. Rahm: COMA - A System for Flexible Combination of Schema Matching Approaches. Proc. Intl. Conf. Very Large Databases (VLDB), 2002
30 Do, H.H., T. Kirsten, E. Rahm: Comparative Evaluation of Microarray-based Gene Expression Databases. Proc. 10. Conf. Databases for Business, Technology and Web (BTW), 2003
31 Do, H.H., S. Melnik, E. Rahm: Comparison of Schema Matching Evaluations. Proc. 2. Intl. Workshop Web and Databases, LNCS 2593, Springer Verlag, 2003
32 Do, H.H., E. Rahm: Flexible Integration of Molecular-biological Annotation Data - The GENMAPPER Approach. Proc. Intl. Conf. Extending Database Technology (EDBT), 2004
33 Do, H.H., E. Rahm: Matching Large Schemas: Approaches and Evaluation (Submitted). Technical Report, Department of Computer Science, University of Leipzig, 2005
34 Doan, A.H., P. Domingos, A. Halevy: Reconciling Schemas of Disparate Data Sources - A Machine-Learning Approach. Proc. ACM SIGMOD Intl. Conf. Management of Data, 2001
35 Doan, A.H., J. Madhavan, P. Domingos, A. Halevy: Learning to Map between Ontologies on the Semantic Web. Proc. 11. Intl. World Wide Web Conf. (WWW), 2002
36 Doan, A.H., A. Halevy: Semantic Integration Research in the Database Community: A Brief Survey. AI Magazine, Special Issue on Semantic Integration, 2005
37 Dong, X., A. Halevy: A Platform for Personal Information Management and Integration. Proc. 2. Intl. Conf. Innovative Data Systems Research (CIDR), 119-130, 2005
38 Dowell, R.D., R.M. Jokerst, A. Day, S.R. Eddy, L. Stein: The Distributed Annotation System. BMC Bioinformatics 2(7), 2001
39 Dragut, E., R. Lawrence: Composing Mappings Between Schemas Using a Reference Ontology. Proc. Intl. Conf. Ontologies, Databases, and Applications of Semantics (ODBASE), 2004
40 Ehrig, M., S. Staab: QOM - Quick Ontology Mapping. Proc. 3. Intl. Semantic Web Conf. (ISWC), 2004
41 Ehrig, M., Y. Sure: Ontology Alignment - Karlsruhe. Proc. 3. Intl. Workshop Evaluation of Ontology-based Tools (EON), 2004
42 Embley, D.W., D. Jackmann, L. Xu: Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration. Proc. 1. Intl. Workshop Information Integration on the Web (WIIW), 2001
43 Enard, W., et al.: Intra- and Inter-specific Variation in Primate Gene Expression Patterns. Science 296, 2002
44 EON Ontology Alignment Contest: http://co4.inrialpes.fr/align/Contest/ and http://co4.inrialpes.fr/align/Contest/results/

45 Etzold, T., A. Ulyanov, P. Argos: SRS - Information Retrieval System for Molecular Biology Data Banks. Methods in Enzymology 266, 1996
46 Etzold, T., H. Harris, S. Beaulah: SRS - An Integration Platform for Databanks and Analysis Tools in Bioinformatics. In [78]: 109-145
47 Euzenat, J.: An API for Ontology Alignment. Proc. 3. Intl. Semantic Web Conf. (ISWC), 2004
48 Euzenat, J.: Introduction to the EON Experiment. Proc. 3. Intl. Workshop Evaluation of Ontology-based Tools (EON), 2004
49 Euzenat, J., D. Loup, M. Touzani, P. Valtchev: Ontology Alignment with OLA. Proc. 3. Intl. Workshop Evaluation of Ontology-based Tools (EON), 2004
50 Fujibuchi, W., et al.: DBGET/LINKDB: An Integrated Database Retrieval System. Proc. Pacific Symposium on Biocomputing (PSB), 1997. http://www.genome.jp/dbget
51 Galperin, M.Y.: The Molecular Biology Database Collection - 2004 Update. Nucleic Acids Research 32, Database issue, 2004
52 Gene Ontology Consortium: The Gene Ontology (GO) Database and Informatics Resource. Nucleic Acids Research 32, 2004. http://www.geneontology.org
53 Giunchiglia, F., P. Shvaiko, M. Yatskevich: S-Match: An Algorithm and an Implementation of Semantic Matching. Proc. 1. European Semantic Web Symposium (ESWS), 2004
54 Giunchiglia, F., M. Yatskevich, E. Giunchiglia: Efficient Semantic Matching. Proc. 2. European Semantic Web Conf. (ESWC), 2005
55 Goble, C., R. Stevens, G. Ng, S. Bechhofer, N.W. Paton, P.G. Baker, M. Peim, A. Brass: Transparent Access to Multiple Bioinformatics Information Sources. IBM System Journal 40(2), 2001
56 Goldman, R., J. Widom: DATAGUIDES - Enabling Query Formulation and Optimization in Semistructured Databases. Proc. 23. Intl. Conf. Very Large Databases (VLDB), 1997
57 Haas, L., et al.: DISCOVERYLINK - A System for Integrated Access to Life Sciences Data Sources. IBM System Journal 40(2), 2001
58 Halevy, A., Z. Ives, J. Madhavan, P. Mork, D. Suciu, I. Tatarinov: The PIAZZA Peer Data Management System. IEEE Trans. Knowledge and Data Engineering 16(7), 787-798, 2004
59 Hall, P., G. Dowling: Approximate String Matching. ACM Computing Surveys 12(4), 381-402, 1980
60 Hamosh, A.: Online Mendelian Inheritance in Man (OMIM) - A Knowledgebase of Human Genes and Genetic Disorders. Nucleic Acids Research 30, 2002. http://www.ncbi.nlm.nih.gov/

61 Haas, L.M., M.A. Hernández, H. Ho, L. Popa, M. Roth: Clio Grows Up: From Research Prototype to Industrial Tool. Proc. ACM SIGMOD Intl. Conf. Management of Data, 805-810, 2005
62 He, B., K.C.C. Chang: Statistical Schema Matching Across Web Query Interfaces. Proc. ACM SIGMOD Intl. Conf. Management of Data, 2003
63 He, B., K.C.C. Chang, J. Han: Discovering Complex Matchings across Web Query Interfaces - A Correlation Mining Approach. Proc. ACM SIGKDD Intl. Conf. Knowledge Discovery and Data Mining, 2004
64 He, H., W. Meng, C. Yu, Z. Wu: WISE-INTEGRATOR - An Automatic Integrator of Web Search Interfaces for E-Commerce. Proc. Intl. Conf. Very Large Databases (VLDB), 2003
65 Heflin, J., J. Hendler: A Portrait of the Semantic Web in Action. IEEE Intelligent Systems 16(2), 2001
66 Hernández, M.A., R.J. Miller, L.M. Haas: CLIO - A Semi-Automatic Tool For Schema Mapping (Software Demonstration). Proc. ACM SIGMOD Intl. Conf. Management of Data, 2001
67 Hernandez, T., S. Kambhampati: Integration of Biological Sources - Current Systems and Challenges Ahead. ACM SIGMOD Record 33(3), 2004
68 Hoshiai, T., Y. Yamane, D. Nakamura, H. Tsuda: A Semantic Category Matching Approach to Ontology Alignment. Proc. 3. Intl. Workshop Evaluation of Ontology-based Tools (EON), 2004
69 Ide, N., J. Veronis: Word Sense Disambiguation - State of the Art. Computational Linguistics 24(1), 1998
70 Kalfoglou, Y., M. Schorlemmer: Ontology Mapping - The State of The Art. Knowledge Engineering Review 18(1), 2003
71 Kang, J., J. Naughton: On Schema Matching with Opaque Column Names and Data Values. Proc. ACM SIGMOD Intl. Conf. Management of Data, 2003
72 Kementsietsidis, A., M. Arenas, R.J. Miller: Mapping Data in Peer-to-Peer Systems: Semantics and Algorithmic Issues. Proc. ACM SIGMOD Intl. Conf. Management of Data, 2003
73 Kemp, G., N. Angelopoulos, P. Gray: A Schema-based Approach to Building Bioinformatics Database Federation. Proc. IEEE Intl. Conf. Bioinformatics and Bioengineering (BIBE), 2000
74 Kent, W.J., C.W. Sugnet, T.S. Furey, K.M. Roskin, T.H. Pringle, A.M. Zahler, D. Haussler: The Human Genome Browser at UCSC. Genome Research 12, 996-1006, 2002
75 Khaitovich, P., B. Mützel, X. She, M. Lachmann, I. Hellmann, J. Dietzsch, S. Steigele, H.H. Do, G. Weiss, W. Enard, F. Heissig, T. Arendt, K. Nieselt-Struwe, E. Eichler, S. Pääbo: Regional Patterns of Gene Expression in Human and Chimpanzee Brains. Genome Research 14(8), 2004

76 Kirsten, T., H.H. Do, C. Körner, E. Rahm: Hybrid Integration of Molecular-biological Annotation Data. Proc. 2. Intl. Workshop Data Integration in the Life Sciences (DILS), San Diego, USA, 2005
77 Körner, Ch., T. Kirsten, H.H. Do, E. Rahm: Hybride Integration von molekularbiologischen Annotationsdaten. Proc. 11. Conf. Database Systems for Business, Technology and Web (BTW), 2005
78 Lacroix, Z., T. Critchlow (Eds.): Bioinformatics - Managing Scientific Data. Morgan Kaufmann, 2003
79 Larson, J., N. Shamkant, R. Elmasri: A Theory of Attribute Equivalence in Databases with Application to Schema Integration. IEEE Trans. on Software Engineering 15(4), 449-463, 1989
80 Lee, D., W. Chu: Comparative Analysis of Six XML Schema Languages. ACM SIGMOD Record 29(3), 2000
81 Lee, M.L., L.H. Yang, W. Hsu, X. Yang: XCLUST - Clustering XML Schemas for Effective Integration. Proc. Intl. Conf. Information and Knowledge Management (CIKM), 2002
82 Leser, U., F. Naumann: (Almost) Hands-Off Information Integration for The Life Sciences. Proc. 2. Intl. Conf. Innovative Data Systems Research (CIDR), 2005
83 Li, W.S., C. Clifton: Semantic Integration in Heterogeneous Databases Using Neural Networks. Proc. Intl. Conf. Very Large Databases (VLDB), 1994
84 Li, W.S., C. Clifton: SEMINT - A Tool for Identifying Attribute Correspondences in Heterogeneous Databases Using Neural Network. Data and Knowledge Engineering 33(1), 49-84, 2000
85 Li, W.S., C. Clifton, S.Y. Liu: Database Integration Using Neural Networks: Implementation and Experiences. Knowledge and Information Systems 2(1), 2000
86 Liu, G., et al.: NETAFFX - AFFYMETRIX Probesets and Annotations. Nucleic Acids Research 31(1), 2003
87 Lu, J., J. Wang, S. Wang: An Experiment on the Matching and Reuse of XML Schemas. Proc. Intl. Conf. Web Engineering (ICWE), LNCS 3579, 2005
88 Madhavan, J., P.A. Bernstein, E. Rahm: Generic Schema Matching with CUPID. Proc. Intl. Conf. Very Large Databases (VLDB), 2001
89 Madhavan, J., P.A. Bernstein, P. Domingos, A. Halevy: Representing and Reasoning about Mappings between Domain Models. Proc. AAAI/IAAI, 80-86, 2002
90 Madhavan, J., A. Halevy: Composing Mappings Among Data Sources. Proc. Intl. Conf. Very Large Databases (VLDB), 572-583, 2003
91 Madhavan, J., P.A. Bernstein, K. Chen, A. Halevy, P. Shenoy: Corpus-based Schema Matching. Proc. Workshop Information Integration on the Web at IJCAI, 2003
92 Madhavan, J., P.A. Bernstein, A.H. Doan, A. Halevy: Corpus-based Schema Matching. Proc. Intl. Conf. Data Engineering (ICDE), 2005

93 Melnik, S., H. Garcia-Molina, E. Rahm: Similarity Flooding - A Versatile Graph Matching Algorithm. Proc. Intl. Conf. Data Engineering (ICDE), 2002
94 Melnik, S., E. Rahm, P.A. Bernstein: RONDO - A Programming Platform for Generic Model Management. Proc. ACM SIGMOD Intl. Conf. Management of Data, 193-204, 2003
95 Miller, R.J., L. Haas, M.A. Hernández: Schema Mapping as Query Discovery. Proc. Intl. Conf. Very Large Databases (VLDB), 77-88, 2000
96 Miller, R.J., et al.: The CLIO Project - Managing Heterogeneity. ACM SIGMOD Record 30(1), 78-83, 2001
97 Milo, T., S. Zohar: Using Schema Matching to Simplify Heterogeneous Data Translation. Proc. Intl. Conf. Very Large Databases (VLDB), 122-133, 1998
98 Mitra, P., G. Wiederhold, J. Jannink: Semi-automatic Integration of Knowledge Sources. Proc. 2. Intl. Conf. Information Fusion, 1999
99 Mitra, P., M. Kersten, G. Wiederhold: Graph-Oriented Model for Articulation of Ontology Interdependencies. Proc. 7. Intl. Conf. Extending Database Technology (EDBT), 2000
100 Mitra, P., G. Wiederhold: Resolving Terminological Heterogeneity in Ontologies. Proc. Workshop Ontologies and Semantic Interoperability at the 15. European Conf. Artificial Intelligence (ECAI), 2002
101 Mork, P., P.A. Bernstein: Adapting a Generic Match Algorithm to Align Ontologies of Human Anatomy. Proc. Intl. Conf. Data Engineering (ICDE), 2004
102 Muetzel, B., H.H. Do, K. Prüfer, G. Weiss, M. Lachmann, P. Khaitovich, I. Hellmann, E. Rahm, S. Pääbo, W. Enard: FUNC: A New Tool for Examining the Significance of Functional Profiling Results (Submitted). Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 2005
103 Mulder, N.J., et al.: The INTERPRO Database 2003 Brings Increased Coverage and New Features. Nucleic Acids Research 31, 315-318, 2003. http://www.ebi.ac.uk/interpro
104 Nadkarni, P., et al.: Organization of Heterogeneous Scientific Data Using the EAV/CR Representation. Journal of American Medical Informatics Association 6(6), 1999
105 Naumann, F., C.T. Ho, X. Tian, L. Haas, N. Megiddo: Attribute Classification Using Feature Analysis (Poster). Proc. Intl. Conf. Data Engineering (ICDE), 2002
106 Navarro, G.: A Guided Tour to Approximate String Matching. ACM Computing Surveys 33(1), 31-88, 2001
107 Noy, N.F., M.A. Musen: The PROMPT Suite: Interactive Tools For Ontology Merging And Mapping. International Journal of Human-Computer Studies 59(6), 983-1024, 2003

108 Noy, N.F.: Semantic Integration - A Survey Of Ontology-Based Approaches. ACM SIGMOD Record 33(4), 2004
109 Noy, N.F., M.A. Musen: Using PROMPT Ontology-Comparison Tools in the EON Ontology Alignment Contest. Proc. 3. Intl. Workshop Evaluation of Ontology-based Tools (EON), 2004
110 Palopoli, L., D. Sacca, D. Ursino: Semi-automatic, Semantic Discovery of Properties from Database Schemes. Proc. Intl. Database Engineering and Applications Symposium (IDEAS), 1998
111 Palopoli, L., D. Sacca, G. Terracina, D. Ursino: A Unified Graph-based Framework for Deriving Nominal Interscheme Properties, Type Conflicts and Object Cluster Similarities. Proc. 4. Intl. Conf. Cooperative Information Systems (CoopIS), 1999
112 Palopoli, L., G. Terracina, D. Ursino: The System DIKE - Towards the Semi-Automatic Synthesis of Cooperative Information Systems and Data Warehouses. Proc. ADBIS-DASFAA, 108-117, 2000
113 Paton, N., et al.: Conceptual Modeling of Genomic Information. Bioinformatics 16(6), 2000
114 Popa, L., M. Hernández, Y. Velegrakis, R. Miller: Mapping XML and Relational Schemas with CLIO (Demonstration). Proc. Intl. Conf. Data Engineering (ICDE), 2002
115 Pottinger, R., P.A. Bernstein: Merging Models Based on Given Correspondences. Proc. Intl. Conf. Very Large Databases (VLDB), 826-873, 2003
116 Pruitt, K.D., D.R. Maglott: REFSEQ and LOCUSLINK - NCBI Gene-centered Resources. Nucleic Acids Research 29(1), 2001. http://www.ncbi.nlm.nih.gov/projects/LocusLink/
117 Rada, R., et al.: Development and Application of a Metric on Semantic Nets. IEEE Trans. Systems, Man, and Cybernetics 19(1), 1989
118 Rahm, E., H.H. Do: Data Cleaning - Problems and Current Approaches. IEEE Bulletin on Data Engineering 23(4), 2000
119 Rahm, E., P.A. Bernstein: A Survey of Approaches to Automatic Schema Matching. VLDB Journal 10(4), 2001
120 Rahm, E., H.H. Do, S. Massmann: Matching Large XML Schemas. ACM SIGMOD Record 33(4), 2004
121 Rahm, E., A. Thor, D. Aumueller, H.H. Do, N. Golovin, T. Kirsten: IFUICE - Information Fusion utilizing Instance Correspondences and Peer Mappings. Proc. Intl. Workshop Web Databases (WebDB), 2005
122 Reddy, M.P., B.E. Prasad, P.G. Reddy, A. Gupta: A Methodology for Integration of Heterogeneous Databases. IEEE Trans. on Knowledge and Data Engineering 5(6), 1995

123 Resnik, P.: Semantic Similarity in a Taxonomy - An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research (JAIR) 11, 95-130, 1999
124 Ritter, O.: The Integrated Genomic Database (IGD). In Suhai, S. (Ed.): Computational Methods in Genome Research. Plenum Press, 1994
125 Rother, K., H. Müller, S. Trissl, I. Koch, T. Steinke, R. Preissner, C. Frömmel, U. Leser: COLUMBA - Multidimensional Data Integration of Protein Annotations. Proc. 1. Intl. Workshop Data Integration in the Life Sciences (DILS), 2004
126 Sayyadian, M., Y. Lee, A. Doan, A. Rosenthal: ETUNER - Tuning Schema Matching Software Using Synthetic Scenarios. Proc. Intl. Conf. Very Large Databases (VLDB), 2005
127 Schuler, G.D., et al.: A Gene Map of The Human Genome. Science 274, 540-546, 1996. http://www.ncbi.nlm.nih.gov/
128 Spaccapietra, S., C. Parent: Conflicts and Correspondence Assertions in Interoperable Databases. ACM SIGMOD Record 20(4), 49-54, 1991
129 Stein, L.: Integrating Biological Databases. Nature Reviews Genetics 4(5), 337-345, 2003
130 Sussna, M.: Word Sense Disambiguation for Free-text Indexing Using a Massive Semantic Network. Proc. 2. Intl. Conf. Information and Knowledge Management (CIKM), 67-74, 1993
131 Tu, K.W., Y. Yu: CMC - Combining Multiple Schema-Matching Strategies Based on Credibility Prediction. Proc. 10. Intl. Conf. Database Systems for Advanced Applications (DASFAA), 2005
132 Van Rijsbergen, C.J.: Information Retrieval. 2nd Edition, Butterworths, London, 1979
133 Wain, H.M., et al.: GENEW - The Human Gene Nomenclature Database, 2004 Updates. Nucleic Acids Research 32, 2004. http://www.gene.ucl.ac.uk/
134 Wang, J., J. Wen, F. Lochovsky, W. Ma: Instance-based Schema Matching for Web Databases by Domain-specific Query Probing. Proc. Intl. Conf. Very Large Databases (VLDB), 2004
135 Wang, Q., J. Yu, K. Wong: Approximate Graph Schema Extraction for Semi-structured Data. Proc. Intl. Conf. Extending Database Technology, LNCS 1777, 2000
136 Wiederhold, G.: Mediators in the Architecture of Future Information Systems. IEEE Computer 25(3), 38-49, 1992
137 Wilson, D., T. Martinez: Improved Heterogeneous Distance Functions. Journal of Artificial Intelligence Research 6, 1-34, 1997
138 Winkler, W.E.: Advanced Methods for Record Linking. Section on Survey Research Methods (American Statistical Association), 1994

139 Wong, L.: KLEISLI, Its Exchange Format, Supporting Tools, and an Application in Protein Interaction Extraction. Proc. IEEE Intl. Conf. Bioinformatics and Bioengineering (BIBE), 2000
140 XML Schema Best Practices: http://www.xfront.com/BestPracticesHomepage.html
141 Xu, L., D. Embley: Discovering Direct and Indirect Matches for Schema Elements. Proc. 8. Intl. Conf. Database Systems for Advanced Applications (DASFAA), 2003
142 Zdobnov, E.M., et al.: The EBI SRS Server - Recent Developments. Bioinformatics 18, 368-373, 2002
143 Zhang, K., D. Shasha: Approximate Tree Pattern Matching. In: Apostolico, A., Galil, Z. (Eds.): Pattern Matching in Strings, Trees, and Arrays. Oxford University Press, Oxford, 341-371, 1997
