Requirements, Design and Applications to Semantic Integration and Knowledge Discovery
Total Page:16
File Type:pdf, Size:1020Kb
Practical Ontologies: Requirements, Design and Applications to Semantic Integration and Knowledge Discovery by Daniela Rosu A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of Computer Science University of Toronto © Copyright by Daniela Rosu 2013 Practical Ontologies: Requirements, Design and Applications to Semantic Integration and Knowledge Discovery Daniela Rosu Doctor of Philosophy Graduate Department of Computer Science University of Toronto 2013 Abstract Due to their role in describing the semantics of information, knowledge representations, from formal ontologies to informal representations such as folksonomies, are becoming increasingly important in facilitating the exchange of information, as well as the semantic integration of information systems and knowledge discovery in a large number of areas, from e-commerce to bioinformatics. In this thesis we present several studies related to the development and application of knowledge representations, in the form of practical ontologies. In Chapter 2, we examine representational challenges and requirements for describing practical domain knowledge. We present representational requirements we collected by surveying current and potential ontology users, discuss current approaches to codifying knowledge and give formal representation solutions to some of the issues we identified. In Chapter 3 we introduce a practical ontology for representing data exchanges, discuss its relationship with existing standards and its role in facilitating the interoperability between information producing and consuming services. ii We also consider the problem of assessing similarity between ontological concepts and propose three novel measures of similarity, detailed in Chapter 4. Two of our proposals estimate semantic similarity between concepts in the same ontology, while the third measures similarity between concepts belonging to different ontologies. In Chapters 5 and 6 we discuss applications of our similarity measures. Specifically, in Chapter 5 we present a method for discovering mappings between different bio-ontologies and discuss the results of a quality assessment performed by two expert biologists on the top mappings identified by our method. In Chapter 6 we detail an integrative method for discovering gene associations: the Gene Association Predictor (GAP) and present the results of assessing the predictions made by GAP using a multi-pronged approach: (1) overlap with datasets of known, experimentally derived, gene associations, (2) comparison with other prediction services, and (3) manual curation of scientific literature. iii Acknowledgments Completing this thesis has been a long and arduous process and it is my pleasure to thank the many people who helped me along the way. Foremost, I would like to thank my supervisor, Dr. Igor Jurišica, who has been a steady influence during my PhD. Igor’s ability to balance research and personal pursuits inspired me; his advice was consistently timely and useful, his support and guidance invaluable, and his patience with me … endless. I have learned a lot from him and I am deeply grateful for his help in realizing my scholarly dreams. I owe an immense debt of gratitude to Dr. Michael Grüninger for his generous advice and constant encouragement. His passion for ontology engineering and patient teaching inspired me to want to learn more about representing knowledge and do research in this field. I benefited enormously from his suggestions, lectures, and the Torontology meetings he organizes. I am also very grateful to Dr. Ryan Lilien and Dr. Kelly Lyons who gave me detailed and constructive feedback on my thesis work and a welcome diversity of perspectives. Their encouragement and suggestions have been tremendously helpful. Dr. Periklis Andritsos, Dr. Lawrence Hunter, and Dr. Eric Yu graciously agreed to be my external examiners and their feedback was invaluable. This work could have not been completed without the financial support I received from the Centre for Advanced Studies at the IBM Toronto Lab and I am indebted to Joanna Ng and Alex Lau for their assistance and collaboration during my IBM fellowship. To all members of the Jurišica group: thank you for making me look forward to coming to the lab everyday. Chiara, Fiona, Mahima, Dan, and Kevin: many thanks for answering with patience my (many) biology related questions. Kristen and Max: thank you for our many long late night discussions and for the patience with which you read and commented on the early drafts of my thesis. Cristiana and Sara: thanks for being awesome officemates and such dear friends, and for always being there to listen, console (most of the time), but also celebrate with me. I could have not iv done it without you! Marc: I will always remember our many lively discussions. Thanks for constantly encouraging me to look forward to a life after grad school. Christian: thank you for always making me smile (and for listening with unwavering patience to my random ramblings). A big thank you to my many CS friends (Yannis, Periklis, Mihaela, Matei, Anastasia, Sam, Tassos, Sotirios, Adrian, Natasha, Stavros, Themis, Fanis, Cosmin, Antonina, Apostolis, George 2, Kleoni, Stratis, Grace, Rui, Abraham) who shared with me over the years the ups and downs of grad school and who also taught me how to celebrate, … really celebrate! This thesis would have not been completed without the love and support of my friends outside school, who kept me sane and constantly reminded me that there is life outside the lab. Raluca, Valentina, Charin, Rani,Yuki: thank you for being there for me! Răzvan: I am very sorry you had to bear the brunt of the frustrations I have felt during these long years in grad school. We shared many happy moments, too, and I have learned a lot from you. I hope you can also share in the joy of the (late) successes. Lastly, and most importantly, I would not have chosen this path if not for my parents, who instilled within me a passion for science and research. To them I dedicate this thesis. MulŃumesc pentru tot! v Table of Contents Acknowledgments.......................................................................................................................... iv Table of Contents.......................................................................................................................... vii List of Tables ................................................................................................................................. ix List of Figures................................................................................................................................. x List of Appendices ........................................................................................................................ xii 1 Introduction................................................................................................................................ 1 1.1 What is “knowledge”? ........................................................................................................ 1 1.2 What is a knowledge representation ? ................................................................................ 3 1.3 What is a (practical) ontology? ......................................................................................... 11 1.4 Applications of ontologies ................................................................................................ 15 1.5 Research contributions of this thesis................................................................................. 17 1.6 Thesis Outline ................................................................................................................... 24 2 Knowledge Representation Requirements for Practical Ontologies........................................ 25 2.1 Requirements for practical ontology frameworks – users’ perspective............................ 26 2.2 Practical representations of knowledge ............................................................................ 36 3 From Information Models to Knowledge Models – Insights into Practical Ontology Design: towards a Core Business Data Interchange Ontology ................................................ 82 3.1 Related work ..................................................................................................................... 85 3.2 Ontology engineering strategy.......................................................................................... 86 3.3 A short overview of the Process Specification Language ................................................ 90 3.4 A short overview of the Open Applications Group Integration Specification (OAGIS).. 93 3.5 Overview of the Core Business Data Interchange Ontology............................................ 98 3.6 Discussion and future work ............................................................................................ 111 4 Measuring the Similarity between Ontological Concepts...................................................... 117 4.1 Feature-based similarity measures.................................................................................. 119 vi 4.2 Edge-based similarity measures...................................................................................... 120 4.3 Mutual information-based similarity measures .............................................................. 121 4.4 A new mutual information-based measure for the Gene Ontology ...............................