Linking Electronic Health Records with the Biomedical Literature

LINKING ELECTRONIC HEALTH RECORDS WITH THE BIOMEDICAL LITERATURE A THESIS SUBMITTED TO THE UNIVERSITY OF MANCHESTER FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN THE FACULTY OF SCIENCE AND ENGINEERING 2016 By Xiao Fu School of Computer Science Contents List of Tables ............................................................................................................... 5 List of Figures ............................................................................................................. 7 Abstract ..................................................................................................................... 10 Declaration ................................................................................................................ 11 Copyright ................................................................................................................... 12 Acknowledgements ................................................................................................... 14 Abbreviations ............................................................................................................ 15 1 Introduction ........................................................................................................... 19 1.1 Motivation ........................................................................................................ 19 1.2 Research hypotheses and objectives ................................................................ 20 1.3 Overview of contributions ................................................................................ 22 2 Phenotype extraction.............................................................................................24 2.1 Literature review .............................................................................................. 24 2.1.1 Pre-processing ............................................................................................ 25 2.1.2 Post-processing .......................................................................................... 27 2.1.3 Phenotype extraction in the clinical domain .............................................. 28 2.1.4 Phenotype extraction in the biomedical domain ........................................ 29 2 2.1.5 Evidence-based medicine........................................................................... 30 2.2 Dataset .............................................................................................................. 32 2.3 Methodology .................................................................................................... 33 2.3.1 Baseline ...................................................................................................... 33 2.3.2 Truecasing .................................................................................................. 38 2.3.3 Abbreviation disambiguation ..................................................................... 42 2.3.4 Distributional similarity ............................................................................. 44 2.3.5 Hybrid methods .......................................................................................... 46 2.4 Results and discussion ...................................................................................... 47 2.4.1 Results ........................................................................................................ 50 2.4.2 Discussion .................................................................................................. 56 2.5 Summary .......................................................................................................... 57 3 Corpus development .............................................................................................. 59 3.1 Literature review .............................................................................................. 60 3.2 Document collection ........................................................................................ 63 3.2.1 Clinical records .......................................................................................... 63 3.2.2 Biomedical literature .................................................................................. 70 3.3 Annotation scheme and procedure ................................................................... 76 3.3.1 Annotation scheme .................................................................................... 76 3.3.2 Text mining-assisted annotation ................................................................ 78 3.4 Results and discussion ...................................................................................... 87 3.4.1 Evaluation in the clinical domain .............................................................. 87 3.4.2 Evaluation in the biomedical domain ........................................................ 88 3.5 Summary .......................................................................................................... 96 4 Phenotype normalisation ...................................................................................... 98 4.1 Literature review ............................................................................................ 101 4.1.1 Concept normalisation ............................................................................. 101 4.1.2 Domain adaptation ................................................................................... 103 4.2 Methodology .................................................................................................. 105 4.2.1 Compositional synonym generation ........................................................ 106 4.2.2 Normalisation of syntactic structures....................................................... 115 4.2.3 Normalisation of acronyms and abbreviations ........................................ 123 4.2.4 Normalisation of plural and singular ....................................................... 124 4.2.5 Individual and hybrid methods ................................................................ 126 4.3 Results and discussion .................................................................................... 126 4.3.1 Variants generation .................................................................................. 126 4.3.2 Evaluation ................................................................................................ 128 4.4 Discussion ...................................................................................................... 137 4.5 Summary ........................................................................................................ 144 5 Conclusion ............................................................................................................ 145 5.1 Evaluation of research objectives ................................................................... 145 5.2 Future work .................................................................................................... 150 References................................................................................................................ 152 Appendix A List of COPD phenotypes used to retrieve articles from the PubMed OpenAccess subset .................................................................................. 172 Appendix B Finer-grained COPD phenotypes .................................................... 189 B.1 Problem .......................................................................................................... 189 B.2 Treatment ....................................................................................................... 191 B.3 Test ................................................................................................................ 191 Word Count: 38,210 4 List of Tables Table 2.1: Examples from a pair of report and annotation files. ............................... 33 Tabl Trauma series demonstrated no evidence of a pelvic fracture. ..................................................................................................... 37 Table 2.3: Proximity relationships used to calculate DS. .......................................... 44 Table 2.4: Definition of TP, FP, and FN. .................................................................. 47 Table 2.5: Exact and relaxed matching results for concept extraction. ..................... 51 Table 2.6: Exact and relaxed matching results for each concept type ((a) Problem; (b) Treatment; (c) Test) ................................................................................................... 53 Table 2.7: Number of unrecognised abbreviations. ................................................... 57 Table 3.1: Number of MIMIC clinical records of each disease and its relationship with COPD ................................................................................................................ 65 Table 3.2: Distribution of 1000 MIMIC II clinical records ....................................... 69 Table 3.3: Description of collected clinical records of COPD patients..................... 71 Table 3.4: PMC journals relevant to COPD. ............................................................. 72 Table 3.5: 30 top-ranked biomedical articles. ........................................................... 74 Table 3.6: Annotation scheme for capturing COPD phenotypes (*: Semantic types adapted from the PhenoCHF scheme) ....................................................................... 77 Table 3.7: Examples

Linking Electronic Health Records with the Biomedical Literature

Role and Regulation of the P53-Homolog P73 in the Transformation of Normal Human Fibroblasts

Biological Functions of CDK5 and Potential CDK5 Targeted Clinical Treatments

Genomic Structure of the Human Caldesmon Gene

Proteins Regulating Cyclin Dependent Kinases Cdk4 and Cdk5

View of the Foregoing There Are a Nephropathy [18]

Loss of LMOD1 Impairs Smooth Muscle Cytocontractility and Causes

Erectile Dysfunction and Cancer: Current Perspective Renu Madan, Chinna Babu Dracham, Divya Khosla, Shikha Goyal, Arun Kumar Yadav

Gene Amplifications at Chromosome 7 of the Human Gastric Cancer Genome

Update on Endometrial Stromal Tumours of the Uterus

Differential Gene Regulation in Fibroblasts in Co-Culture with Keratinocytes and Head and Neck SCC Cells

Caldesmon Effects on the Actin Cytoskeleton and Cell Adhesion in Cultured HTM Cells

Network-Based Analysis of Oligodendrogliomas Predicts Novel