Knowledge Acquisition Framework from Unstructured Biomedical Knowledge Sources Demeke Asres Ayele
Total Page:16
File Type:pdf, Size:1020Kb
Knowledge Acquisition Framework from Unstructured Biomedical Knowledge Sources Demeke Asres Ayele To cite this version: Demeke Asres Ayele. Knowledge Acquisition Framework from Unstructured Biomedical Knowledge Sources. Information Retrieval [cs.IR]. Université d’Addis Abeba, 2016. English. tel-02087577 HAL Id: tel-02087577 https://tel.archives-ouvertes.fr/tel-02087577 Submitted on 18 Oct 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. ADDIS ABABA UNIVERSITY SCHOOL OF GRADUATE STUDIES KNOWLEDGE ACQUISITION FRAMEWORK FROM UNSTRUCTURED BIOMEDICAL KNOWLEDGE SOURCES DEMEKE ASRES AYELE A THESIS SUBMITTED TO IT DOCTORAL PROGRAM ADDIS ABABA UNIVERSITY PRESENTED IN FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN INFORMATION TECHNOLOGY (LANGUAGE TECHNOLOGY) Jury: Prof. Luciano Serafìni External Examiner Dr. Solonron Tefera Internal Examiner Dr. Jean-Pierre Chevallet Supervisor Dr. Getnet Mitikie Co-Supervisor Dr. Milion Mcshesha Co-Supervisor Dr. Dida Midekso Chairman ADDIS ABABA, ETHIOPIA 9 August 2016 Knowledge Acquisition Framework from Unstructured Biomedical Knowledge Sources DEMEKE ASRES AYELE Addis Ababa University Oct 2016 Addis Ababa University School of Graduate Studies This is to certify that the thesis prepared by Demeke Asres Ayele, entitled Knowledge Acquisition Framework from Biomedical Knowledge Sources and submitted in fulfillment of the requirement for the Degree of Doctor of Philosophy in Information Technology (Language Technology) complies with the regulations of the university and meets the accepted standards with respect to originality and quality. Signed By Examining Committee: External Examiner: _____________________________Signature: __________ Date: ________ Internal Examiner: _____________________________Signature: __________ Date: ________ Principal Advisor: _____________________________Signature: __________ Date: ________ Co-Advisor: _____________________________Signature: __________ Date: ________ Co-Advisor: _____________________________Signature: __________ Date: ________ Chair of Department or Graduate Program Coordinator Abstract In biomedicine, the explosion of textual knowledge sources has introduced formidable challenges for knowledge-aware information systems. Traditional knowledge acquisition methods have been proved costly, resource intensive and time consuming. Automation of large scale knowledge acquisition systems requires narrowing down the semantic gap between biomedical texts and structured representations. In this context, this study proposes a knowledge acquisition framework from biomedical texts. This contributes towards reducing efforts, time and cost incurred to minimaize ontology acquisition bottlenecks. The proposed framework approximates, models, structures and ontologizes implicit knowledge buried in biomedical texts. In the framework, the semantic disambiguator approximates biomedical artefacts from biomedical texts. The conceptual disambiguator models and structures the biomedical knowledge abstracted from the domain texts. Ontologization presents an explicit interpretation of biomedical artefacts and conceptualizations. The components of the framework are instantiated with scientific and clinical text documents and produced about four million concepts and seven million associations. This set of artefacts is structured into the lower ontological knowledge structure where the upper ontology structure is reused from existing ones. The conceptual structure is represented with graph formalism. The formal interpretation is based on OWL DL language primitives and constructs, which generates a set of OWL DL axioms. The set of OWL DL axioms is referred as the OWL ontology ( K o ). The extent of approximation and quality of structural design are evaluated using criteria-based methods. A set of metrics is used to measure each criterion and showed encouraging results. Correctness measurements for concept entity are 70% for accuracy, 82% for completeness, 68% for conciseness and 100% for consistency. Quality measurement showed complex ontology structure with metrics values of 986,448 for vocabulary size, 18.73 for connectivity density, 145,246 for tree impurity and 226, 698 for graph entropy. The ontology schema potential metrics values are also 0.80 for relationship richness, 3 for attribute richness and 13,253 for inheritance richness. Ontology clarity showed an average readability, which is 3 attributes on average. The proposed framework has limitations to address the acquisition of individuals and entity attributes, losing cardinality information in the acquisition of the ontological knowledge. These lead to limitations on the formal interpretation of biomedical semantics, which in turn lead to deploy only existential restriction based interpretations. Thus, a way forward has been recommended to enhance semantic disambiguation and ontologization of the proposed framework so that they enable to accommodate the acquisition of cardinality and attribute information. Keywords: Semantic Disambiguation, Conceptual Disambiguation, Ontologization, Knowledge Acquisition Framework, Biomedical Knowledge Source, Ontological Knowledge i Dedication To my parents ii Acknowledgements First of all I would like to thank the almighty God, who helped me to succeed in this dissertation work with His power and love. He was with me through the ups and downs and supported me to achieve this success, and I gave the glory to Him. The Bible says “But we have this treasures in earthen vessels, that the Excellency of the power may be of God, and not off us” (II CORINTHIANS 4: 7). Nothing is impossible for God! I also thank Addis Ababa University, IT Doctoral Program, for all financial, academic and technical support in pursuing my study. I will also never forget the MRIM team of LIG lab, Grenoble, in supporting me to develop good technical experiences through my supervisor, Jean-Pierre Chevallet. My gratitude also goes to my supervisors. I really thank my principal supervisor, Jean-Pierre Chevallet (Assoc. Prof.), for his supervision, dedicated help and advice through out this dissertation work. He has been a true inspiration throughout my research period. He thought me how to think critically and confidently in the research I was doing. He was encouraged me to publish and work hard without losing patience. My gratitude also goes to my co-supervisors, Dr Million and Prof. Getnet. My deep gratitude goes to Dr Million for his unlimited professional guidance and patience, from the planning of the research to its write up. I also appreciate Dr Million for his dedicated help on the academic and administrative matters throughout the research work. My thankfulness also goes to Professor Getnet for his brotherly advice and concern towards the biomedical understandings of each part of the research work. I also like to appreciate Prof. Getnet on his concern on the over all research accomplishment and finalization of the dissertation. My deepest gratitude also goes to the IT Doctoral Program community. It was very helping, positive, cooperative, which has given me moral, strength and strong commitment towards successful completion of my dissertation. It is experienced in establishing research environments, particularly encouraging effortful students towards successful completion of their studies. iii Table of Contents Table of Contents .................................................................................................................... iv List of Figures...........................................................................................................................ix List of Tables .............................................................................................................................x Chapter One Introduction - 1 - 1.1 Background.................................................................................................................. - 1 - 1.2 Statement of the Problem ............................................................................................. - 5 - 1.3 Objective of the Study.................................................................................................. - 8 - 1.3.1 General Objective .............................................................................................. - 8 - 1.3.2 Specific Objectives ............................................................................................ - 8 - 1.4 Scope and Limitation ................................................................................................... - 9 - 1.5 Significance ............................................................................................................... - 12 - 1.6 Contribution............................................................................................................... - 13 - 1.7 Methodology.............................................................................................................. - 14 - 1.7.1 Research Design .............................................................................................