Download Thesis

Total Page:16

File Type:pdf, Size:1020Kb

Download Thesis This electronic thesis or dissertation has been downloaded from the King’s Research Portal at https://kclpure.kcl.ac.uk/portal/ Big data approaches to investigating Child Mental Health disorder outcomes Downs, Jonathan Muir Awarding institution: King's College London The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without proper acknowledgement. END USER LICENCE AGREEMENT Unless another licence is stated on the immediately following page this work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence. https://creativecommons.org/licenses/by-nc-nd/4.0/ You are free to copy, distribute and transmit the work Under the following conditions: Attribution: You must attribute the work in the manner specified by the author (but not in any way that suggests that they endorse you or your use of the work). Non Commercial: You may not use this work for commercial purposes. No Derivative Works - You may not alter, transform, or build upon this work. Any of these conditions can be waived if you receive permission from the author. Your fair dealings and other rights are in no way affected by the above. Take down policy If you believe that this document breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Download date: 07. Oct. 2021 BIG DATA APPROACHES TO INVESTIGATING CHILD MENTAL HEALTH DISORDER OUTCOMES JOHNNY DOWNS Thesis submitted for the degree of Doctor of Philosophy September 2017 Department of Psychological Medicine Institute of Psychiatry, Psychology & Neuroscience King’s College London ABSTRACT Background: In the UK, administrative data resources continue to expand across publically funded youth-orientated health, education and social services. Despite attempts to capture these data in structured formats, which are more accessible for analysis, most health information is stored as free text entry in electronic records. Big data techniques which combine large scale data linkage and automatic information extraction from free text, using Natural Language Processing (NLP), have considerable potential for enhancing the depth of information available from routinely collected public service data. There are a very limited number of published studies which have applied these big data techniques to answer questions relevant to child and adolescent psychiatry. Methods: This thesis examined original and clinically relevant research questions using data from routinely collected electronic health records, enriched by NLP and linkages to external data sources. Five related studies were performed all using data obtained from the SLaM BRC Case Record Information Search (CRIS) extracted using a NLP approaches, with two studies using external linkages with routinely collected national electronic datasets (NHS Hospital Episode Statistics and DfE National Pupil Database, NPD). Results: Using these data resources, I provide empirical support for the hypothesis that neurodevelopmental comorbidities increase children and adolescents’ risk for potentially more harmful treatments, greater treatment complexity and worse clinical outcomes. The NLP methods employed overcame limitations of structured data extraction, providing better assessment of a diverse range of symptom types, severity and related impairments, including suicidal risk, negative symptoms, antipsychotic treatment failure, and self-harm. External data linkages with the NPD enabled population level analyses by nesting clinical samples within their source population. NPD linkage also permitted the inclusion of education performance data, which were not routinely available within electronic health records. Conclusion: The thesis illustrates how the legal, governance and technical challenges were surmountable to enable linkage between NHS and Department for Education public service data. Also, it demonstrated that NLP and data linkages of electronic health records, have a clear role in clinical epidemiological studies of child and adolescent mental health. These tools, combined with the continued digitisation of public service activity, can unlock huge and detailed data resources for population-based analyses. However, current approaches have deficiencies, including limitations in accuracy, construct validity, and restrictions in the data available, providing challenges for future research. 2 ACKNOWLEDGEMENTS I have been extremely fortunate to have Dr Richard Hayes, Professor Tamsin Ford and Professor Matthew Hotopf as supervisors. It was Tamsin and Matthew’s generosity and guidance, which kick-started this research, turning a set of disparate ideas into a MRC funded research project. Richard was the catalyst to finally undertaking the PhD, and I am very grateful for all his support over the course of the last 4 years. He has been a tremendous mentor, balancing my impetuous nature, and kindly reinforcing what I need to do to move from a research idea, to a grant application, and onto published articles. Richard, Tamsin and Matthew continue to inspire me, and fuel my post-doctoral aspirations for child and adolescent mental health research. This PhD has been the most enjoyable period of my working life, which I believe is due to them, and the research environments they have created. I would also like to thank Professor Robert Stewart and Matthew Broadbent who have had a considerable influence on the work within this thesis. Over the course of my PhD, they have developed the BRC Nucleus into a lively, multi-disciplinary workspace; it has provided me with rich opportunities to build valuable friendships and collaborations. Of these, a special mention must go to Dr Laura Pina, who was wonderful to work with, and whose drive and enthusiasm for understanding what might improve lives of children with early onset psychosis, was infectious. Thank you to Professor Ruth Gilbert, who has kept me alert to some of the methodological flaws commonly hidden within data linkage research, but also introduced me to key researchers in the child health and education policy arena, and generously supported me as an early career researcher. Thank you to Richard White, the National Pupil Database Team, and Dr Murat Soncul for helping me drive through the innovative linkage work. I am very grateful to Professor Emily Simonoff, Dr James MacCabe and Dr Rashmi Patel for all their contributions to the research contained in this thesis, and for all their help in shaping my future research ideas. Similarly, I would like to thank Dr Kate Polling, Dr Sumithra Velupillai, Dr Rina Dutta, and Dr George Gkotsis for their important contributions to the thesis and being great colleagues. Thanks to Dr Omer Moghraby for supporting me in keeping my hand in clinically, and his enthusiasm for applying research into clinical practice. I am incredibly grateful for all the practical and moral support that Megan Pritchard, Leo Koeser, Clare Taylor, Craig Colling, Anna Kolliakou, David Chandran, Ryan Little, Jyoti Jyoti, Debbie Cummings, Amelia Jewell and Hitesh Shetty have given me whilst I’ve been at the BRC. Poppy, thanks for keeping it real. None of this work would have been possible without your love. 3 TABLE OF CONTENTS ABSTRACT 2 ACKNOWLEDGEMENTS 3 LIST OF TABLES 10 LIST OF FIGURES 13 LIST OF ABBREVIATIONS 15 PUBLICATIONS 16 CONTRIBUTION STATEMENT 19 ETHICS STATEMENT 20 CHAPTER 1. INTRODUCTION 21 1.1 What are Big Data? 22 1.2 Current challenges for Child & Adolescent Mental Health Epidemiology 23 1.2.1 Issues with conventional epidemiological study approaches: randomised control trials 23 1.2.2 Issues with conventional epidemiological study approaches: cross-sectional surveys 24 1.2.3 Issues with conventional epidemiological study approaches: prospective cohorts 25 1.3 Why do we need ‘Big data’ for child and adolescent mental health research? 26 1.4 Psychiatric epidemiology and Big Data 27 1.4.1 Data linkage of clinical and social care data 27 1.4.2 Natural language processing and the Electronic Health Records 30 1.4.3 Applying Natural Language Processing to Electronic Health Records 32 1.4.4 Natural language processing and its application in child and adolescent psychiatric epidemiology 36 1.4.5 Examples of other Big data methodologies 45 1.5 Aims and structure of this thesis. 45 CHAPTER 2. CLINICAL PREDICTORS OF ANTIPSYCHOTIC USE IN CHILDREN AND ADOLESCENTS WITH AUTISM SPECTRUM DISORDERS: A HISTORICAL OPEN COHORT STUDY USING ELECTRONIC HEALTH RECORDS 49 2.1 Summary 50 2.2 Introduction 51 2.3 Methods 52 2.3.1 Study Setting 52 The Clinical Record Interactive Search (CRIS) system 53 2.3.2 Study sample 54 4 2.3.3 Measurements 55 Outcome: antipsychotic use 55 Exposure: psychiatric comorbidity & intellectual disability 56 Covariates: 56 2.3.4 Analysis 57 2.4 Results 58 2.4.1 Demographic and clinical characteristics of the sample 58 2.4.2 Authentication of co-morbid diagnoses against the SDQ 60 2.4.3 Socio-demographic and clinical factors and their associations with antipsychotic treatment 61 2.4.4 Sensitivity analysis 64 2.5 Discussion 65 2.5.1 Strengths 66 2.5.2 Limitations 66 2.5.3 Conclusions 67 CHAPTER 3. DETECTION OF SUICIDALITY IN ADOLESCENTS WITH AUTISM SPECTRUM DISORDERS: DEVELOPING A NATURAL LANGUAGE PROCESSING APPROACH FOR USE IN ELECTRONIC HEALTH RECORDS 69 3.1 Summary 70 3.2 Introduction 71 3.3 Materials and Methods 74 3.3.1 Data resources 74 3.3.2 Overall workflow 75 3.3 Results 78 3.3.1 Distribution of SR annotation within the random selection of test and training set documents 78 3.3.2 Adaptions to the NLP tool following test and training 78 3.3.3 Performance NLP tool on SR test and training set documents 79 3.3.4 Manual review of NLP and gold-standard discrepancies 80 3.4 Discussion 83 3.4.1 Strengths 84 3.4.2 Limitations 85 3.4.3 Conclusion 85 5 CHAPTER 4. THE ASSOCIATION BETWEEN CO-MORBID AUTISM SPECTRUM DISORDERS AND ANTIPSYCHOTIC TREATMENT FAILURE IN EARLY-ONSET PSYCHOSIS: A HISTORICAL COHORT STUDY USING ELECTRONIC HEALTH RECORDS.
Recommended publications
  • D2.1 State of the Art and Requirements Analysis V1 Project Deliverable
    Collective Wisdom Driving Public Health Policies Del. no. – D2.1 State of the Art and Requirements Analysis v1 Project Deliverable This project has received funding from the European Union’s Horizon 2020 Programme (H2020-SC1-2016-CNECT) under Grant Agreement No. 727560 D2.1 State of the Art and Requirements 29/09/2017 Analysis v1 D2.1 State of the Art and Requirements Analysis v1 Work Package: WP2 Due Date: 01/09/2017 Submission Date: 29/09/2017 Start Date of Project: 01/03/2017 Duration of Project: 36 Months Partner Responsible of Deliverable: SIEMENS Version: 1.0 Final Draft Ready for internal Review Status: Task Leader Accepted WP leader accepted Project Coordinator accepted Dimosthenis Kyriazis (UPRC), Ilias Maglogiannis (UPRC), Christos Xenakis (UPRC), Argyro Mavrogiorgou (UPRC), Athanasios Kiourtis (UPRC), George Peppas (UPRC), Carlos Cavero (ATOS), Santiago Aso (ATOS), Antonio De Nigro (ENG), Francesco Torelli (ENG), Domenico Martino (ENG), George Moldovan (SIEMENS), Kosmin- Septimiu Nechifor (SIEMENS), Salvador Tortajada Velert (HULAFE), Sokratis Nifakos (KI), Tanja Tomson (KI), Jan Janssen (DFKI), Serge Autexier (DFKI), Petrina Smyrli (BIO), Andreas Menychtas (BIO), Author name(s): Christos Panagopoulos (BIO), Chris Orton (ICE), Usman Wajid (ICE), Thanos Kosmidis (CRA), Ricardo Jimenez-Peris (LXS), Patricio Martinez (LXS), Marta Patino-Martinez (UPM), Rafael Fernandez (UPM), Michael Boniface (IT-INN), Vegard Engen (IT- INN), Daniel Burns (IT-INN), Mitja Lustrek (JSI), Maroje Soric (ULJ), Patrick Weber (EFMI), John Mantas (EFMI),
    [Show full text]
  • Chapter 11 AC Group's 2007 Annual Report the Digital Medical Office Of
    Chapter 11 AC Group’s 2007 Annual Report The Digital Medical Office of the Future Performance A. Methodology The majority of previous EMR evaluations have been limited to self-reported functionality. Although high rankings in this arena often indicate a superior product, the reviewers are aware that in some cases this correlation does not always hold. There may be some highly ranked products offering the full range of functionality that from the end user’s point of view may have features, organization or display that are limiting. The converse may also occur where a product that achieves a lower ranking because it offers less that full functionality nonetheless offers highly innovative features that would be advantageous for all end users. In short, although scores from self-reported functionality are extremely useful, they do not capture rich qualitative information that could significantly influence the practitioner’s decision of which system to choose. The purpose of this document is to help a physician evaluate a vendor’s solution. The document is divided into separate product demonstrations. If the practice is interested in one fully integrated system, then have the vendor complete and interact with this entire document. If the practice is only interested in a Document Image Management solution, complete sections B and D. If the practice is only interested in a comprehensive EMR/EHR application, then complete sections B and E. Speed is essential Time the execution of the tasks and record how long they take. You may be surprised at the significant difference in the results. Speed is extremely important during physician documentation.
    [Show full text]
  • BENTO: a Visual Platform for Building Clinical NLP Pipelines Based on Codalab
    BENTO: A Visual Platform for Building Clinical NLP Pipelines Based on CodaLab Yonghao Jin, Fei Li and Hong Yu Department of Computer Science, University of Massachusetts Lowell, MA, USA Abstract CodaLab1 is an open-source web-based plat- form for collaborative computational research. Although CodaLab has gained popularity in the research community, its interface has lim- ited support for creating reusable tools that can be easily applied to new datasets and composed into pipelines. In clinical domain, natural language processing (NLP) on med- ical notes generally involves multiple steps, like tokenization, named entity recognition, etc. Since these steps require different tools Figure 1: The architecture of BENTO. The BENTO which are usually scattered in different publi- back end stores the description files of various tools cations, it is not easy for researchers to use (e.g., pre-trained NLP models), processes static con- them to process their own datasets. In this pa- tents of the application and handles compilation of the per, we present BENTO, a workflow manage- user-defined pipelines. The CodaLab back end stores ment platform with a graphic user interface the datasets (bundles) and executes computational jobs. (GUI) that is built on top of CodaLab, to fa- The two back end servers are brought behind a single cilitate the process of building clinical NLP domain name using a reverse proxy server. pipelines. BENTO comes with a number of clinical NLP tools that have been pre-trained using medical notes and expert annotations this problem, researchers have developed CodaLab and can be readily used for various clinical as an open-source platform for researchers and soft- NLP tasks.
    [Show full text]
  • Healthcare It Industry Update │ Q3 2016
    HEALTHCARE IT INDUSTRY UPDATE │ Q3 2016 www.harriswilliams.com Investment banking services are provided by Harris Williams LLC, a registered broker-dealer and member of FINRA and SIPC, and Harris Williams & Co. Ltd, which is authorised and regulated by the Financial Conduct Authority. Harris Williams & Co. is a trade name under which Harris Williams LLC and Harris Williams & Co. Ltd conduct business. 0 HEALTHCARE IT INDUSTRY UPDATE | Q3 2016 HEALTHCARE IT PRACTICE OVERVIEW CONTENTS INTRODUCTION . WHAT WE’RE READING Harris Williams & Co. is pleased to present our review of Q3 Healthcare IT . KEY HEALTHCARE MACRO activity. This report provides commentary and analysis on current capital market INDICATORS trends and merger and acquisition dynamics within the Healthcare IT industry. PUBLIC MARKETS . PUBLIC COMPARABLES We hope you find this edition helpful and encourage you to contact us directly if . M&A ACTIVITY you would like to discuss our perspective on current industry trends and M&A . PRIVATE PLACEMENT ACTIVITY opportunities or our relevant industry experience. IPO ACTIVITY OUR PRACTICE CONTACTS Sam Hendler Harris Williams & Co. is a leading advisor to the Healthcare IT industry. Our global Managing Director practice includes professionals across the firm from Harris Williams & Co.’s [email protected] Technology, Media & Telecom Group and Healthcare & Life Sciences Group. +1 (617) 654-2117 Jeff Bistrong HW&Co. Healthcare IT Taxonomy Managing Director [email protected] +1 (617) 654-2102 . Patient-Facing Solutions Mike Wilkins . Care Delivery: Managing Director Operational Efficiency [email protected] +1 (415) 271-3411 . Care Delivery: Clinical / Acute Thierry Monjauze Managing Director . Care Delivery: [email protected] Clinical / Non-Acute +44 20 7518 8901 CARE DELIVERY EMPLOYERS ORGANIZATIONS PATIENTS .
    [Show full text]
  • Lessons from The
    Ideas from the NHS Dr. Francis Wong, MBBS, MBA, MPH www.franciswong.com September 2018 Purpose Background: The NHS (National Health System) is the largest single-payer system in the world and was ranked #1 in the Commonwealth Fund’s Health Care System Performance Rankings in 20171 (US placed #11). It achieved particularly high scores in Care Process (prevention, safe care, coordination & patient engagement), Access (timeliness & affordability) and Equity. The NHS is also one of the world’s largest employers with around 1.5m staff nationwide. In 2014, the UK spent 9.8% of GDP on healthcare2. Of course, it is not without shortcomings, but I am immensely proud of the NHS and stand behind its founding principles. As a physician who has worked in both UK and US, I believe there is much that the countries can learn from each other. My goals for this document are: • Outline several examples of successful, promising and even controversial innovations in the UK’s National Health Service with an emphasis on primary care. • Briefly describe their impact on care delivery in the UK. • Hypothesize how these innovations might be applied to a US employer-based model. 1 Commonwealth Fund: Mirror, Mirror 2017 2 NHS spends about EU average as percentage of GDP on health Primary Care as Gatekeeper Overview Impact Relevance In England, there are approximately 340M GP Primary care as a gatekeeper to specialists is The gatekeeping role of primary care in the UK consultations a year with an average of 6 visits not a novel concept in the US and is a defining originated over 100 years ago, prior to the per year.
    [Show full text]
  • Deliverable 2.4 Development of Supervised Categorization Models
    Project Acronym:ICTUSnet Project code: SOE2/P1/E0623 E2.4 DEVELOPMENT OF SUPERVISED CATEGORIZATION MODELS, TOPIC MODELLING AND EXTRACTION OF CLINICAL INFORMATION COMPUTING. Due date: 30/04/2020 WP2. WPActual Development submission date: and30/ 04integration/2021 of 'Machine Responsible partner: BSC Version: Learning' algorithms04 Status: Final Dissemination level: Public / Consortium Project funded by the Interreg Sudoe Programme through the European Regional Development Fund (ERDF) Project Acronym: ICTUSnet Project code: SOE2/P1/E0623 Deliverable description: This deliverable describes the methodology used in WP2 to develop the SUPERVISED CATEGORIZATION MODELS, TOPIC MODELLING AND EXTRACTION OF CLINICAL INFORMATION using deep learning techniques. The results obtained by the deep learning models are remarkable, reaching 91% F1 on average. Revision history Version Date Comments Partner 01 12/2020 First version BSC 02 02/2021 Second version BSC 03 04/2021 Final version BSC Authors Name Partner Marta Villegas Montserrat BSC Aitor González Agirre BSC Joan Llop BSC Siamak Barzegar BSC Contributors Name Partner ICTUSnet: E2.4Development of supervised categorization models, topic modelling and extraction of clinical information via cognitive computing. 30/04/2021 02 Page 2 of 65 Project Acronym: ICTUSnet Project code: SOE2/P1/E0623 ABBREVIATIONS AND ACRONYMS HUSE Hospital Universitario Son Espases XML Extensible Markup Language HER Electronic Health Record TTR Type Token Ratio BRAT Brat Rapid Annotation Tool F1 F1 score IAA Inter-Annotator Agreement NER Named Entity Recognition NERC Named Entity Recognition and Classification ICTUSnet: E2.4Development of supervised categorization models, topic modelling and extraction of clinical information via cognitive computing. 30/04/2021 02 Page 3 of 65 Project Acronym: ICTUSnet Project code: SOE2/P1/E0623 TABLE OF CONTENTS INRODUCTION .............................................................................................................
    [Show full text]
  • NHS Data: Maximising Its Impact on the Health and Wealth of the United Kingdom Saira Ghafur, Gianluca Fontana, Jack Halligan James O’Shaughnessy & Ara Darzi Contents
    NHS data: Maximising its impact on the health and wealth of the United Kingdom Saira Ghafur, Gianluca Fontana, Jack Halligan James O’Shaughnessy & Ara Darzi Contents 02 ACKNOWLEDGEMENTS 04 FOREWORD 05 EXECUTIVE SUMMARY 08 INTRODUCTION: MAXIMISING THE IMPACT OF NHS DATA 12 PUBLIC OPINION AND ENGAGEMENT 16 DATA GOVERNANCE AND LEGAL FRAMEWORKS 20 DATA QUALITY AND INFRASTRUCTURE 24 CAPABILITIES 26 INVESTMENT SUGGESTED CITATION Ghafur S, Fontana G, Halligan J, O’Shaughnessy J, Darzi A. NHS data: Maximising its impact 28 VALUE SHARING on the health and wealth of the United Kingdom. Imperial College London (2020) doi: 10.25561/76409 34 REFERENCES Acknowledgements We would like to thank the following people who contributed to this document through interviews/ attendance at a round table and have agreed to be acknowledged: NAME ORGANISATION Dr. Natalie Banner * Understanding Patient Data Professor Sir John Bell * The Academy of Medical Sciences Kate Cheema British Heart Foundation Professor Diane Coyle University of Cambridge Douglas de Jager * Human.ai Rachel Dunscombe * NHS Digital Academy Dr. Andrew Elder Albion Capital Lord Valerian Freyberg * House of Lords John Godfrey * Legal & General Joanne Hackett * Genomics England Dr. Hugh Harvey * Hardian Health Eleonora Harwich * Reform A total of 26 one-to-one interviews were held with individuals with a strong Geoff Heyes Mind interest in this topic. Interviewees included representatives from government, Dr. Dominic King Google Health the NHS, academia, industry (technology and life sciences), research Dr. Jack Kreindler * Centre for Health & Human Performance institutions, charities and data privacy organisations. We have not consulted the public or healthcare professionals for the purposes of this paper, as we Michael MacDonnell * Google Health chose to focus on experts in the data policy and governance space.
    [Show full text]
  • Digital Health in the UK: an Industry Study for the Office of Life Sciences
    Digital Health in the UK An industry study for the Office of Life Sciences September 2015 Contents Foreword 1 Executive summary 2 Part 1. Market introduction 6 Part 2. Telehealthcare 10 Part 3. mHealth 21 Part 4. Data Analytics 32 Part 5. Digitised health systems 39 Part 6. Conclusions on the digital health UK competitive position 45 Monitor Deloitte team 51 References 52 Foreword Welcome to the Monitor Deloitte report on Digital Health. This is one of a series of reports reflecting work commissioned by Office of Life Sciences in March 2015 on key healthcare and life science industry segments in the United Kingdom. Digital Health is an emerging industry arising from the intersection of healthcare services, information technology and mobile technology. Digital health innovations are only just starting to be more widely accepted as necessary for the future of efficient healthcare service delivery. As we address the behaviour, social, legal and technical challenges, over time, digital health advances have the potential to help increase access, decrease healthcare system costs and improve health outcomes. This report analyses trends in the digital health industry based on discussions with stakeholders, literature reviews and our work across the sector. It focuses on the United Kingdom but in the context of the global market and draws on examples from other countries. The report considers the challenges to growth, barriers to adoption, shifting dynamics and how the emergent industry is developing. The intention is to provoke discussion and offer readers an overview of the industry challenges and dynamics in the United Kingdom. We welcome your feedback and comments.
    [Show full text]
  • Code Smell Prediction Employing Machine Learning Meets Emerging Java Language Constructs"
    Appendix to the paper "Code smell prediction employing machine learning meets emerging Java language constructs" Hanna Grodzicka, Michał Kawa, Zofia Łakomiak, Arkadiusz Ziobrowski, Lech Madeyski (B) The Appendix includes two tables containing the dataset used in the paper "Code smell prediction employing machine learning meets emerging Java lan- guage constructs". The first table contains information about 792 projects selected for R package reproducer [Madeyski and Kitchenham(2019)]. Projects were the base dataset for cre- ating the dataset used in the study (Table I). The second table contains information about 281 projects filtered by Java version from build tool Maven (Table II) which were directly used in the paper. TABLE I: Base projects used to create the new dataset # Orgasation Project name GitHub link Commit hash Build tool Java version 1 adobe aem-core-wcm- www.github.com/adobe/ 1d1f1d70844c9e07cd694f028e87f85d926aba94 other or lack of unknown components aem-core-wcm-components 2 adobe S3Mock www.github.com/adobe/ 5aa299c2b6d0f0fd00f8d03fda560502270afb82 MAVEN 8 S3Mock 3 alexa alexa-skills- www.github.com/alexa/ bf1e9ccc50d1f3f8408f887f70197ee288fd4bd9 MAVEN 8 kit-sdk-for- alexa-skills-kit-sdk- java for-java 4 alibaba ARouter www.github.com/alibaba/ 93b328569bbdbf75e4aa87f0ecf48c69600591b2 GRADLE unknown ARouter 5 alibaba atlas www.github.com/alibaba/ e8c7b3f1ff14b2a1df64321c6992b796cae7d732 GRADLE unknown atlas 6 alibaba canal www.github.com/alibaba/ 08167c95c767fd3c9879584c0230820a8476a7a7 MAVEN 7 canal 7 alibaba cobar www.github.com/alibaba/
    [Show full text]
  • Proceedings of the Workshop on NLP for Medicine and Biology Associated with RANLP 2013, Pages 1–8, Hissar, Bulgaria, 13 September 2013
    NLP for Medicine and Biology 2013 Proceedings of the Workshop on NLP for Medicine and Biology associated with The 9th International Conference on Recent Advances in Natural Language Processing (RANLP 2013) 13 September, 2013 Hissar, Bulgaria WORKSHOP ON NLP FOR MEDICINE AND BIOLOGY ASSOCIATED WITH THE INTERNATIONAL CONFERENCE RECENT ADVANCES IN NATURAL LANGUAGE PROCESSING’2013 PROCEEDINGS Hissar, Bulgaria 13 September 2013 ISBN 978-954-452-024-3 Designed and Printed by INCOMA Ltd. Shoumen, BULGARIA ii Preface Biomedical NLP deals with the processing of healthcare-related text—clinical documents created by physicians and other healthcare providers at the point of care, scientific publications in the areas of biology and medicine, and consumer healthcare text such as social media blogs. Recent years have seen dramatic changes in the types and amount of data available to researchers in this field. Where most research on publications in the past has dealt with the abstracts of journal articles, we now have access to the full texts of journal articles via PubMedCentral. Where research on clinical documents has been hampered by a lack of availability of data, we now have access to large bodies of data through the auspices of the Cincinnati Children’s Hospital NLP Challenge, the i2b2 shared tasks (www.i2b2.org), the TREC Electronic Medical Records track, the US-funded Strategic Health Advanced Research Projects Area 4 (www.sharpn.org) and the Shared Annotated Resources (ShARe; https://sites.google.com/site/shareclefehealth/taskdescription; www.clinicalnlpannotations.org) project. Meanwhile, the number of abstracts in PubMed continues to grow exponentially. Text in the form of blogs created by patients discussing various healthcare topics has emerged as another data source, with a new perspective on healthrelated issues.
    [Show full text]
  • Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning
    JMIR MEDICAL INFORMATICS Pfaff et al Viewpoint Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning Emily R Pfaff1, MS; Miles Crosskey2, PhD; Kenneth Morton2, PhD; Ashok Krishnamurthy3, PhD 1North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States 2CoVar Applied Technologies, Durham, NC, United States 3Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States Corresponding Author: Emily R Pfaff, MS North Carolina Translational and Clinical Sciences Institute University of North Carolina at Chapel Hill 160 N Medical Drive Chapel Hill, NC, United States Phone: 1 919 843 4712 Email: [email protected] Abstract Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient's medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop an open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning±based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier.
    [Show full text]
  • Domain Independent Natural Language Processing—A Case Study for Hospital Readmission with COPD
    2014 IEEE 14th International Conference on Bioinformatics and Bioengineering Domain Independent Natural Language Processing – A Case Study for Hospital Readmission with COPD Ankur Agarwal, Ph.D. Sirish Malpura Florida Atlantic University Florida Atlantic University Boca Raton, Florida Boca Raton, Florida [email protected] [email protected] Ravi S Behara, Ph.D. Vivek Tyagi Florida Atlantic University Florida Atlantic University Boca Raton, Florida Boca Raton, Florida [email protected] [email protected] Abstract—Natural language processing is a field of computer science, which focuses on interactions between computers and phenomena based on actual examples of these phenomena human (natural) languages. The human languages are provided by the text input without adding significant linguistic ambiguous unlike Computer languages, which make its analysis or world knowledge. Machine learning based NLP solutions and processing difficult. Most of the data present these days is in use this approach. unstructured form (such as: Accident reports, Patient discharge summary, Criminal records etc), which makes it hard for Connectionist Approach: Connectionist approach computers to understand for further use and analysis. This develops generalized models from examples of linguistic unstructured text needs to be converted into structured form by phenomena. However, connectionist models combine clearly defining the sentence boundaries, word boundaries and statistical learning with various theories of representation – context dependent character boundaries for further analysis. This paper proposes a component-based domain-independent thus the connectionist representations allow transformation, text analysis system for processing of the natural language known inference, and manipulation of logic formulae. [1] as Domain-independent Natural Language Processing System (DINLP). Further the paper discusses the system capability and Connectionist NLP approach is newer compared to its application in the area of bioinformatics through the case symbolic and statistical approaches.
    [Show full text]