Using Structural and Semantic Methodologies to Enhance Biomedical Terminologies Zhe He New Jersey Institute of Technology
New Jersey Institute of Technology
Digital Commons @ NJIT

Dissertations
Theses and Dissertations
Fall 2013

Using structural and semantic methodologies to enhance biomedical terminologies
Zhe He
New Jersey Institute of Technology

Part of the Computer Sciences Commons

Recommended Citation
He, Zhe, "Using structural and semantic methodologies to enhance biomedical terminologies" (2013). Dissertations. 144.
https://digitalcommons.njit.edu/dissertations/144

This Dissertation is brought to you for free and open access by the Theses and Dissertations at Digital Commons @ NJIT. It has been accepted for inclusion in Dissertations by an authorized administrator of Digital Commons @ NJIT.

Copyright Warning & Restrictions

The copyright law of the United States (Title 17, United States Code) governs the making of photocopies or other reproductions of copyrighted material. Under certain conditions specified in the law, libraries and archives are authorized to furnish a photocopy or other reproduction. One of these specified conditions is that the photocopy or reproduction is not to be “used for any purpose other than private study, scholarship, or research.” If a user makes a request for, or later uses, a photocopy or reproduction for purposes in excess of “fair use,” that user may be liable for copyright infringement. This institution reserves the right to refuse to accept a copying order if, in its judgment, fulfillment of the order would involve violation of copyright law.
Please Note: The author retains the copyright while the New Jersey Institute of Technology reserves the right to distribute this thesis or dissertation.

The Van Houten library has removed some of the personal information and all signatures from the approval page and biographical sketches of theses and dissertations in order to protect the identity of NJIT graduates and faculty.

ABSTRACT

USING STRUCTURAL AND SEMANTIC METHODOLOGIES TO ENHANCE BIOMEDICAL TERMINOLOGIES

by
Zhe He

Biomedical terminologies and ontologies underlie various Health Information Systems (HISs), Electronic Health Record (EHR) Systems, Health Information Exchanges (HIEs) and health administrative systems. Moreover, the proliferation of interdisciplinary research efforts in the biomedical field is fueling the need to overcome terminological barriers when integrating knowledge from different fields into a unified research project. Therefore, well-developed and well-maintained terminologies are in high demand. Most of the biomedical terminologies are large and complex, which makes it impossible for human experts to manually detect and correct all errors and inconsistencies. Automated and semi-automated Quality Assurance methodologies that focus on areas that are more likely to contain errors and inconsistencies are therefore important.

In this dissertation, structural and semantic methodologies are used to enhance biomedical terminologies. The dissertation work is divided into three major parts. The first part consists of structural auditing techniques for the Semantic Network of the Unified Medical Language System (UMLS), which serves as a vocabulary knowledge base for biomedical research in various applications. Research techniques are presented on how to automatically identify and prevent erroneous semantic type assignments to concepts.
The Web-based adviseEditor system is introduced to help UMLS editors make correct multiple semantic type assignments to concepts. It is made available to the National Library of Medicine for future use in maintaining the UMLS.

The second part of this dissertation is on how to enhance the conceptual content of SNOMED CT by methods of semantic harmonization. By 2015, SNOMED will become the standard terminology for EHR encoding of diagnoses and problem lists. In order to enrich the semantics and coverage of SNOMED CT for clinical and research applications, the problem of semantic harmonization between SNOMED CT and six reference terminologies is approached by 1) comparing the vertical density of SNOMED CT with the reference terminologies to find potential concepts for export and import; and 2) categorizing the relationships between structurally congruent concepts from pairs of terminologies, with SNOMED CT being one terminology in the pair. Six kinds of configurations are observed, e.g., alternative classifications and suggested synonyms. For each configuration, a corresponding solution is presented for enhancing one or both of the terminologies.

The third part applies Quality Assurance techniques based on “Abstraction Networks” to biomedical ontologies in BioPortal. The National Center for Biomedical Ontology provides BioPortal as a repository of over 350 biomedical ontologies covering a wide range of domains. It is extremely difficult to design a new Quality Assurance methodology for each ontology in BioPortal. Fortunately, groups of ontologies in BioPortal share common structural features. Thus, they can be grouped into families based on combinations of these features. A uniform Quality Assurance methodology design for each family will achieve improved efficiency, which is critical given the limited Quality Assurance resources available to most ontology curators.
In this dissertation, a family-based framework covering 186 BioPortal ontologies and accompanying Quality Assurance methods based on abstraction networks are presented to tackle this problem.

USING STRUCTURAL AND SEMANTIC METHODOLOGIES TO ENHANCE BIOMEDICAL TERMINOLOGIES

by
Zhe He

A Dissertation
Submitted to the Faculty of
New Jersey Institute of Technology
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy in Computer Science

Department of Computer Science

January 2014

Copyright © 2014 by Zhe He
ALL RIGHTS RESERVED

APPROVAL PAGE

USING STRUCTURAL AND SEMANTIC METHODOLOGIES TO ENHANCE BIOMEDICAL TERMINOLOGIES

Zhe He

Dr. James Geller, Dissertation Co-Advisor (Date)
Professor and Chair, Department of Computer Science, NJIT

Dr. Yehoshua Perl, Dissertation Co-Advisor (Date)
Professor, Department of Computer Science, NJIT

Dr. Mei Liu, Committee Member (Date)
Assistant Professor, Department of Computer Science, NJIT

Dr. Michael Halper, Committee Member (Date)
Professor and Program Director, Information Technology Program, NJIT

Dr. Gai Elhanan, Committee Member (Date)
Chief Medical Information Officer, Halfpenny Technologies, Inc.

Dr. Chunhua Weng, Committee Member (Date)
Assistant Professor, Department of Biomedical Informatics, Columbia University

BIOGRAPHICAL SKETCH

Author: Zhe He
Degree: Doctor of Philosophy
Date: January 2014

Undergraduate and Graduate Education:
• Doctor of Philosophy in Computer Science, New Jersey Institute of Technology, Newark, NJ, 2014
• Master of Science in Computer Science, Columbia University in the City of New York, New York, NY, 2009
• Bachelor of Engineering in Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing, P. R. China, 2007

Major: Computer Science

Publications:

Zhe He, Yehoshua Perl, Gai Elhanan and James Geller. “Auditing the Extents of Top Semantic Types of the UMLS Semantic Network,” to be submitted for journal publication.
Zhe He, Charles Paul Morrey, Yehoshua Perl and James Geller. “Sculpting the UMLS Refined Semantic Network,” to be submitted for journal publication.

Zhe He, Christopher Ochs, James Geller and Yehoshua Perl. “A Structural Meta-Ontology for Family-Based Quality Assurance Framework for Biomedical Ontologies in BioPortal,” to be submitted for journal publication.

Sivaram Arabandi, Christopher Ochs, Zhe He, Yehoshua Perl and James Geller. “Quality Assurance for the Sleep Domain Ontology Using Abstraction Networks,” to be submitted for journal publication.

James Geller, Zhe He and Gai Elhanan. “Categorizing the Relationship between Structurally Congruent Concepts from Pairs of Terminologies,” submitted for publication.

Zhe He, Christopher Ochs, Ankur Agrawal, Yehoshua Perl, Dimitris Zeginis, Konstantinos Tarabanis, Gai Elhanan, Michael Halper, Natasha Noy and James Geller. “A Family-based Framework for Supporting Quality Assurance of Biomedical Ontologies in BioPortal,” Proceedings of AMIA 2013 Annual Symposium, November 16-20, Washington, D.C.: 581-90.

Zhe He, Christopher Ochs, Larisa Soldatova, Yehoshua Perl, Sivaram Arabandi and James Geller. “Auditing Excess Reuse of a Top Level Ontology for the Drug Discovery Investigation Ontology,” Proceedings of 2013 International Workshop on Vaccine and Drug Ontology Studies. July 7, 2013, Montreal, QC, Canada.

James Geller, Christopher Ochs, Zhe He and Yehoshua Perl. “A Structural Meta-ontology for the BioPortal Ontologies,” Proceedings of Bio-ontologies 2013. July 20, 2013, Berlin, Germany.

Christopher Ochs, Zhe He, Yehoshua Perl, Sivaram Arabandi and James Geller. “Refining the Granularity of Abstraction Networks for the Sleep Domain Ontology,” Proceedings of the 4th International Conference on Biomedical Ontology. July 8-9, 2013, Montreal, QC, Canada: 84-9.

Ankur Agrawal, Zhe He, Duo Wei, Michael Halper, Yehoshua Perl and Gai Elhanan. “The Readiness of SNOMED Problem List Concepts for Meaningful Use of EHRs,” Artificial Intelligence in Medicine, 2013. 58(2): 73-80.

James Geller, Zhe He, Yehoshua Perl, C. Paul Morrey and Julia Xu. “Rule-Based Support