Mining Patient Journeys from Healthcare Narratives


A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences

2014

By Azad Dehghan
School of Computer Science

Contents

Abstract
Declaration
Copyright
Acknowledgements
1 Introduction
    1.1 Hypotheses and research questions
    1.2 Aim and objectives
    1.3 Contributions
    1.4 Thesis structure
2 Background
    2.1 Text Mining
        2.1.1 Natural language processing
        2.1.2 Information extraction
        2.1.3 Named entity recognition
        2.1.4 Temporal information extraction
        2.1.5 Temporal entity extraction
        2.1.6 Temporal relation extraction
    2.2 Clinical Background
        2.2.1 Clinical text
        2.2.2 Computer aided standardisation of clinical care
        2.2.3 Health-related quality of life
    2.3 Summary
3 Extraction of Health-related Concepts
    3.1 Event Extraction
        3.1.1 Methods
        3.1.2 Data
        3.1.3 Experiments, results, and discussion
    3.2 Health-related Quality of Life Extraction
        3.2.1 HrQoL Schema
        3.2.2 Data
        3.2.3 Methods
        3.2.4 Experiments, results, and discussion
    3.3 Summary
4 Temporal Information Extraction
    4.1 Temporal Entity Extraction
        4.1.1 Methods
        4.1.2 Data
        4.1.3 Experiments, results, and discussion
    4.2 Temporal Relation Extraction
        4.2.1 Methods
        4.2.2 Data
        4.2.3 Experiments, results, and discussion
    4.3 Summary
5 Extracting Patient Journeys: a Case Study
    5.1 Introduction: Childhood Central Nervous System Tumours
    5.2 Introduction: Case Study
        5.2.1 Data
    5.3 Comparative Analysis of Narratives
        5.3.1 Aggregated analysis of narratives
        5.3.2 Individual case analysis of narratives
        5.3.3 Discussion
    5.4 Extracting Patient Journeys
        5.4.1 Methods
        5.4.2 Evaluation
        5.4.3 Individual patient journeys
        5.4.4 Aggregated patient journeys
        5.4.5 Discussion
    5.5 Visualising Patient Timelines
    5.6 Summary
6 Conclusion
    6.1 Contributions
    6.2 Limitations
    6.3 Future work
    6.4 Summary
A Background
    A.1 A sample clinical note
    A.2 Token level sequence label modelling
    A.3 Transitive closure
    A.4 HUI-2 classification system
B Extraction of Health-related Concepts
    B.1 NER annotation examples
    B.2 HrQoL NER
C Tools and Resources
    C.1 NLP pre-processing
    C.2 Implementation
D HrQoL Annotation Guideline
    D.1 Introduction
        D.1.1 HrQoL schema description
        D.1.2 HrQoL annotation
    D.2 HrQoL schema
    D.3 Ambiguous cases
        D.3.1 Indirect references
    D.4 What (not)? to annotate
        D.4.1 What not to annotate
        D.4.2 What to annotate
    D.5 More examples
E Extracting Patient Journeys: a Case Study
    E.1 Comparative Analysis of Narratives
    E.2 Extracting Patient Journeys

Word Count: 61,885

List of Tables

2.1 An example of lexical normalisation
2.2 Common negation phrases in clinical data
2.3 Evaluation variables matrix
2.4 Common rule-based languages
2.5 Top systems in the 2010 i2b2 event extraction task
2.6 Top systems in the 2012 i2b2 event extraction task
2.7 Common data-driven features used for clinical event extraction
2.8 TIMEX3 representation schema
2.9 TLINK representation schema
2.10 TempEval-2 TERN results
2.11 TempEval-3 TERN results
2.12 2012 i2b2 TERN: methods and resources
2.13 2012 i2b2 TERN results
2.14 Common data-driven features used for clinical TER
2.15 TempEval-3: TLINK identification and classification task
2.16 TempEval-3: TLINK end-to-end task
2.17 TempEval-3: approaches for TLINK identification
2.18 TempEval-3: approaches for TLINK classification
2.19 2012 i2b2: TLINK identification and classification task
2.20 2012 i2b2: TLINK end-to-end task
2.21 Common TLINK classification features
2.22 HrQoL instruments and corresponding classification dimensions
3.1 Definition of EVENT categories
3.2 Feature template: clinical EVENTs
3.3 Label fixer heuristic
3.4 EVENT annotated datasets
3.5 i2b2-TRC: EVENT IAA
3.6 i2b2-CARC: EVENT IAA
3.7 EVENT label distribution
3.8 EVENT extraction validation test results
3.9 EVENT extraction results on the held-out test set
3.10 The clinical NER performance
3.11 EVENT recognition: feature impact analysis
3.12 EVENT recognition: dataset impact
3.13 EVENT recognition: post-processing impact analysis
3.14 Performance of the official 2012 i2b2 submission
3.15 Example of HrQoL concepts
3.16 HrQoL dataset label distribution
3.17 HrQoL dataset IAA
3.18 HrQoL dictionary summary
3.19 HrQoL NER results on the development set
3.20 HrQoL NER results on the held-out test set
3.21 The HrQoL NER performance
3.22 The HrQoL NER impact analysis
4.1 Feature template: clinical TER
4.2 TIMEX3 label distribution
4.3 i2b2-TRC: TIMEX3 IAA
4.4 TER validation results
4.5 TER evaluation on the held-out test set
4.6 TE normalisation results
4.7 TLINK patterns
4.8 TLINK extraction features
4.9 TLINK label distribution
4.10 i2b2-TRC: TLINK IAA
4.11 TLINK development set results
4.12 TLINK results on the held-out test set
4.13 TLINK component based evaluation
4.14 TLINK end-to-end results on the held-out test set
5.1 Adopted case study specific types
5.2 Case study corpus: clinical narratives profile
5.3 Case study corpus: patient narratives profile
5.4 Top occurring concept in clinical and patient narratives
5.5 Proportion of traditional clinical concepts in patient narratives
5.6 Proportion of HrQoL concepts in clinical narratives
5.7 A semantic analysis of patient narratives
5.8 A semantic analysis of clinical narratives
5.9 An example list of concepts and their PathCluster confidence
5.10 Test case A: a tabular view of high-level processes
5.11 Test case B: a tabular view of high-level processes
5.12 Test case C: a tabular view of high-level processes
5.13 Aggregated patient pathway: a tabular view of high-level processes
A.1 Transitive relations
B.1 Contextual reasoner exclusion cues
B.2 Boundary adjustment: adjectives
D.1 Example 1 annotations
D.2 Example 2 annotations
D.3 Example 3 annotations
D.4 Example 4 annotations
D.5 Example 5 annotations
D.6 Example 6 annotations