©2008 IMIA and Schattauer GmbH

Extracting Information from Textual Documents in the Electronic Health Record: A Review of Recent Research

S. M. Meystre (1), G. K. Savova (2), K. C. Kipper-Schuler (2), J. F. Hurdle (1)
(1) Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, Utah, USA
(2) Biomedical Informatics Research, Mayo Clinic College of Medicine, Rochester, Minnesota, USA

Summary

Objectives: We examine recent published research on the extraction of information from textual documents in the Electronic Health Record (EHR).
Methods: Literature review of the research published after 1995, based on PubMed, conference proceedings, and the ACM Digital Library, as well as on relevant publications referenced in papers already included.
Results: 174 publications were selected and are discussed in this review in terms of methods used, pre-processing of textual documents, contextual feature detection and analysis, extraction of information in general, extraction of codes and of information for decision support and enrichment of the EHR, information extraction for surveillance, research, automated terminology management, and data mining, and de-identification of clinical text.
Conclusions: Performance of information extraction systems with clinical text has improved since the last systematic review in 1995, but they are still rarely applied outside of the laboratory they have been developed in. Competitive challenges for information extraction from clinical text, along with the availability of annotated clinical text corpora and further improvements in system performance, are important factors to stimulate advances in this field and to increase the acceptance and usage of these systems in concrete clinical and biomedical research contexts.

Keywords: Electronic health record, natural language processing, information extraction, text mining, state-of-the-art review

Geissbuhler A, Kulikowski C, editors. IMIA Yearbook of Medical Informatics 2008. Methods Inf Med 2008; 47 Suppl 1:128-44.

Introduction

In the biomedical domain, the rapid adoption of Electronic Health Records (EHR), with the parallel growth of narrative data in electronic form, along with the need for improved quality of care and reduced medical errors, are strong incentives for the development of Natural Language Processing (NLP; sometimes called Medical Language Processing in this domain). Much of the available clinical data are in narrative form as a result of transcription of dictations, direct entry by providers, or use of speech recognition applications. This free-text form is convenient for expressing concepts and events, but is difficult to use for searching, summarization, decision support, or statistical analysis. To reduce errors and improve quality control, coded data are required; this is where NLP, and more precisely Information Extraction (IE), is needed, as explained below.

IE typically requires some "pre-processing" such as spell checking, document structure analysis, sentence splitting, tokenization, word sense disambiguation, part-of-speech tagging, and some form of parsing. Contextual features like negation, temporality, and event subject identification are crucial for accurate interpretation of the extracted information. Several different techniques can be used to extract information, from simple pattern matching to complete processing methods based on symbolic information and rules or on statistical methods and machine learning. The extracted information can then be linked to concepts in standard terminologies and used for coding. The information can also be used for decision support and to enrich the EHR itself. Biosurveillance, biomedical research, data mining, and automatic terminology management can also benefit from information extraction. Finally, automatic de-identification of textual documents also uses the extraction of personal information before its removal or replacement. We review all these uses of information extraction in this paper.

This review focuses on research about information extraction from narrative documents stored in the EHR and published after 1995, with an emphasis on recent publications. Previous research on this topic is described in a review by Spyns [1]. Research on information extraction from the biomedical literature is not discussed in this paper, but is well described in reviews by Cohen et al. [2] and by Zweigenbaum et al. [3].

What Is Information Extraction?

IE involves extracting predefined types of information from text [4]. In contrast, information retrieval (IR) is focused on finding documents and has some very popular examples such as the Google [5] or PubMed [6] search engines.

IMIA Yearbook of Medical Informatics 2008

Downloaded from imia.schattauer.de on 2012-02-03 | IP: 173.79.253.196. For personal or educational use only. No other uses without permission. All rights reserved.

IR returns documents whereas IE returns information or facts. IE is a specialized sub-domain of Natural Language Processing. As cited in the Encyclopedia of Artificial Intelligence, "Natural Language Processing is the formulation and investigation of computationally effective mechanisms for communication through natural language." [7] NLP research focuses on building computational models for understanding natural language. "Natural language" is used to describe any language used by human beings, to distinguish it from programming languages and data representation languages used by computers and described as "artificial." Some important domains of research are closely related to information extraction (and sometimes confused with it); these are explained below.

Named Entity Recognition (NER) is a sub-field of information extraction and refers to the task of recognizing expressions denoting entities (i.e., Named Entities), such as diseases, drugs, or people's names, in free-text documents [8]. Some entities can be identified solely through surface structure patterns (e.g., Social Security Numbers: XXX-XX-XXXX), but most of them require rules like [TITLE][PERSON] (for "Mr. Doe") or [LOCATION], [LOCATION] (for "Salt Lake City, Utah"). Rule-based NER systems can be very effective, but require some manual effort. Machine learning approaches can successfully extract named entities but require large annotated training corpora. Advantages of machine learning approaches are that they do not require human intuition and can be retrained without reprogramming for any domain.

Text mining uses information extraction and is defined by Hearst [9] as the process of discovering and extracting knowledge from unstructured text. Text mining typically comprises two or three steps: information retrieval (to gather relevant texts; this step is not always necessary), information extraction (to extract specific types of information from texts of interest), and data mining (to find associations among the extracted pieces of information).

Clinical versus Biomedical Text

Much of what has been written on the biomedical uses of NLP can be broken down into two categories: work that focuses on biomedical text and work that focuses on clinical text. For our purposes here, we define biomedical text as the kind of text that appears in books, articles, literature abstracts, posters, and so forth. Clinical texts, on the other hand, are texts written by clinicians in the clinical setting. These texts describe patients, their pathologies, their personal, social, and medical histories, findings made during interviews or procedures, and so forth. Indeed, the term "clinical text" covers the entire gamut of narratives appearing in the patient record. These can be surprisingly short (e.g., a chief complaint) or quite long (a medical student history and physical). There is an important class of texts that arise in the clinical research setting and are rarely described in the literature. Some of these resemble biomedical texts (e.g., internal research reports) while others resemble classic clinical texts (e.g., patient notes made during a clinical trial). Since these narratives are rarely made available outside the corporate setting that generated them, formal studies of them are sparse. While we do not address these texts further here, we note that there is no a priori reason to think that techniques tailored to either biomedical or clinical texts would not be useful (perhaps with modification) in the realm of clinical research narratives.

What makes clinical text different from biomedical text, and why does it pose a special challenge to NLP? First, some clinical texts are ungrammatical and composed of short, telegraphic phrases. Other texts, including discharge summaries and consult reports such as radiographic readings, are often dictated and composed deliberately for clear communication, while texts like progress notes are written mainly for documentation purposes. Second, clinical narratives are replete with shorthand (abbreviations, acronyms, and local dialectal shorthand phrases). These shorthand lexical units are often overloaded (i.e., the same set of letters has multiple renderings); Liu et al. estimate that acronyms are overloaded about 33% of the time and are often highly ambiguous even in context [10]. Third, misspellings abound in clinical texts, especially in notes written without rich-text or spelling support. For example, the US Veterans Administration's (VA) EHR system is the largest in the world, but offers essentially only simple text support. It is not uncommon in the VA corpus to find abbreviations or acronyms that are themselves misspelled. Fourth, clinical narratives can contain any characters that can be typed or pasted. A common example in the VA corpus is long, pasted sets of laboratory values or vital signs. Such embedded non-text strings complicate otherwise straightforward NLP tasks like sentence segmentation, since they are usually filled with periods. Fifth, in an attempt to bring some structure and consistency to otherwise unstructured clinical narratives, templates and pseudo-tables (e.g., plain text made to look tabular by the use of white space) are common. Implicit templates, like the normative structures for a history-and-physical or a discharge summary that are commonly used across care settings, can be quite useful to NLP. Explicit templates, though, are pre-formatted, highly idiosyncratic, and institution-specific, with fields to be filled in by the user.
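The surface-pattern and rule-based NER approaches described above can be sketched in a few lines of code. The patterns below are illustrative stand-ins for the [TITLE][PERSON] and [LOCATION], [LOCATION] rules mentioned in the text, not the rules of any system cited here, and the tiny state list is a deliberate simplification:

```python
import re

# Surface-structure pattern: US Social Security Numbers (XXX-XX-XXXX).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Contextual rules in the spirit of [TITLE][PERSON] and
# [LOCATION], [LOCATION]: a title followed by a capitalized word,
# and a "City, State" pair (toy state list for illustration only).
TITLE_PERSON = re.compile(r"\b(Mr|Mrs|Ms|Dr)\.\s+([A-Z][a-z]+)")
CITY_STATE = re.compile(
    r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*),\s+(Utah|Minnesota|New York)\b")

def extract_entities(text):
    """Return (entity_type, matched_text) pairs found by the rules."""
    entities = [("SSN", m.group(0)) for m in SSN.finditer(text)]
    entities += [("PERSON", m.group(2)) for m in TITLE_PERSON.finditer(text)]
    entities += [("LOCATION", m.group(0)) for m in CITY_STATE.finditer(text)]
    return entities

note = "Mr. Doe, SSN 123-45-6789, was seen in Salt Lake City, Utah."
print(extract_entities(note))
```

The manual effort the text refers to is visible even here: every new title, state, or entity type requires another hand-written rule, which is exactly the generalizability problem that motivates the machine learning approaches discussed next.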


All of these issues complicate NLP on clinical text, making it especially challenging. In spite of the challenges, excellent research on extracting information from clinical text has been described in the literature. We review this work below.

A Short History of Information Extraction in the Biomedical Domain

IE has been developed mostly outside of the biomedical domain, in the Message Understanding Conferences (MUC) organized between 1987 and 1998 and sponsored by the U.S. government. The MUC conferences fostered much of the work in the IE domain and consisted of competitive evaluations of systems developed for the extraction of specific information such as named entities (people, organizations, locations), events, and relations (e.g., employee_of, location_of, manufacture_of). The MUC evaluation methods have been widely adopted and adapted to other domains.

In the biomedical domain, IE was initially evaluated with complete NLP systems (i.e., large systems featuring all functions required to fully analyze free text). The first of these large-scale projects was the Linguistic String Project - Medical Language Processor (LSP-MLP) [11], at New York University, enabling extraction and summarization of signs/symptoms and drug information, and identification of possible medication side effects. Inspired by this work, Friedman et al. [12] developed MedLEE (the Medical Language Extraction and Encoding system). This system is currently in production at the New York Presbyterian Hospital and at Columbia University. MedLEE is mainly semantically driven and is used to extract information from clinical narrative reports, to participate in an automated decision-support system, and to allow natural language queries. MedLEE was the first biomedical NLP system to be applied at an institution different from the one where it was developed. This resulted in a small drop in performance; however, after some adjustments, MedLEE performed as well as in the original institution [13]. SPRUS (Special Purpose Radiology Understanding System) [14] was the first NLP application developed by the Medical Informatics group at the University of Utah (Salt Lake City), and was only semantically driven. Later came SymText (Symbolic Text processor) [15], with syntactic and probabilistic semantic analysis; SymText relied on Bayesian networks for semantic analysis. The U.S. National Library of Medicine has developed a set of NLP applications called the SPECIALIST system [16] as part of the Unified Medical Language System (UMLS®) project [17]. It includes the SPECIALIST Lexicon, the Semantic Network, and the UMLS Metathesaurus®. The NLM also developed several applications that use the UMLS, such as the Lexical Tools and MetaMap [18], with many other applications described in the following sections.

The examples of complete NLP systems cited above required significant resources to develop and implement. Considering this issue, several authors progressively experimented with simpler systems focused on specific IE tasks and on a limited number of different types of information to extract. These more focused systems demonstrated good performance and now constitute the majority of the systems used for IE. This review includes all systems used for IE, complete or more focused.

Methods

This paper presents a review of recent work in information extraction from textual clinical documents in the EHR. As mentioned previously, we only included research published after 1995, with a focus on recent publications. We selected relevant publications from bibliographic queries in PubMed (for "information extraction"; "text mining" without "information retrieval"; "natural language processing" and "record" without "literature" or "Medline"; "medical language processing"; and "natural language understanding"), conference proceedings, and the ACM Digital Library (for "information extraction" with "medical" or "medicine" or "biomedical" or "clinical", without "literature" or "Medline"). We also added relevant publications referenced in papers that were already included.

State of the Art in Information Extraction from the EHR

Methods Used for Information Extraction

A variety of methods have been employed in the general and biomedical literature domains to extract facts from free text and fill out template slots. McNaught et al. [19] describe a detailed review of IE techniques in the biomedical domain; however, their review does not include the clinical field. Here, we adopt their classification scheme with references to the clinical subdomain. A typical IE system consists of a combination of the following components described by Hobbs [20,21]: tokenizer, sentence boundary detector, part-of-speech tagger, morphological analyzer, shallow parser, deep parser (optional), gazetteer, named entity recognizer, discourse module, template extractor, and template combiner.
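The cascade of components described by Hobbs can be pictured as a pipeline in which each stage consumes the previous stage's output. The sketch below is a toy illustration under that assumption: every function name and the one-line gazetteer are hypothetical stand-ins for full modules, not part of any system cited here:

```python
import re

# Toy stand-ins for a few of Hobbs' components, wired as a cascade.
def split_sentences(text):
    # Sentence boundary detector: split after terminal punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Tokenizer: words and punctuation as separate tokens.
    return re.findall(r"\w+|[^\w\s]", sentence)

def recognize_entities(tokens):
    # Named entity recognizer backed by a two-entry toy gazetteer.
    gazetteer = {"aspirin": "DRUG", "diabetes": "DISEASE"}
    return [(t, gazetteer[t.lower()]) for t in tokens if t.lower() in gazetteer]

def fill_template(entities):
    # Template extractor: one slot per entity type.
    template = {"DRUG": None, "DISEASE": None}
    for text, etype in entities:
        template[etype] = text
    return template

text = "His mother has diabetes. He was given aspirin."
templates = [fill_template(recognize_entities(tokenize(s)))
             for s in split_sentences(text)]
print(templates)
```

The cascade structure also illustrates the dependency noted below: a wrong sentence split or token propagates into every later stage, so template quality is bounded by the quality of the low-level components.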


The performance of the higher-level components (discourse module, template extractor, and template combiner) is highly dependent on the performance of the lower-level components. The state of the art of the lower-level components is discussed in a following section. Higher-level components, e.g., templates, require careful modeling of relevant attributes; with any template change, the IE system needs to be rerun to populate the modified template.

One approach to IE is pattern matching, which exploits basic patterns over a variety of structures - text strings, part-of-speech tags, semantic pairs, and dictionary entries [22]. The main disadvantage of pattern-matching approaches is their lack of generalizability, which limits their extension to new domains. Another set of approaches is the use of shallow and full syntactic parsing. However, non-robust parser performance is an outstanding issue, since medical/clinical language has different characteristics than general English. The difference between general and medical English has led to the development of sublanguage-driven approaches, which formulate and exploit a sublanguage's particular set of constraints [23-27]. The disadvantage of sublanguage approaches lies in their poor transferability to new domains. Ontology-driven IE aims at using an ontology to guide the free-text processing [28]. Syntactic and semantic parsing approaches combine the two in one processing step. Machine learning techniques have demonstrated remarkable results in the general domain and hold promise for clinical IE, but they require large annotated corpora for training, which are both expensive and time-consuming to generate.

The general and biomedical IE communities have pushed the field towards the development of sophisticated methods for deeper, comprehensive extractions from text. Clinical- and medical-domain IE has lagged behind, mainly because of limited access to shareable clinical data (e.g., constraints that protect patient confidentiality). A major challenge is the creation of a large and vibrant community around shared data, tasks, annotation guidelines, annotations, and evaluation techniques. So far, there have been three clinical shared-task competitive evaluations on clinical texts:
(1) Automatic assignment of ICD-9-CM codes to clinical free text [29]. The shared task involved mapping ICD-9-CM codes to radiology reports. Pestian et al. [30] describe the task, its organization, and results.
(2) De-identification of discharge summaries within the i2b2 [31] initiative held in November 2006. The task is described in Uzuner et al. [32]. Top systems achieved F-measure results in the high 90's. More details are provided in the "De-identification of clinical text" section below.
(3) Patient smoking status discovery from discharge summaries within the i2b2 initiative held in November 2006 [33]. The participating systems applied a variety of techniques to assign the final patient smoking status, with the top micro-averaged F-measure results in the 80's. More details are provided in the "Extraction of information in general" section below.

"Pre-processing" of Textual Documents

The vast amount of medical and clinical data available is only useful inasmuch as the information contained in it can be properly extracted and understood. Much work has been done recently in developing and adapting natural language tools for cleaning and processing these data for the subsequent tasks of information extraction and text and data mining.

One useful pre-processing task is spell-checking. Ruch et al. [34] note that the incidence of misspellings in medical records is around 10%, which is significantly higher than the misspelling incidence for other types of texts. Ruch et al. use morpho-syntactic disambiguation tools in addition to a classical spell-checker to rank and select the best candidate for word correction. Tolentino et al. [35] created a UMLS-based spelling error correction tool. Their method performs spelling correction by detecting errors and suggesting corrections against a dictionary. They use the UMLS SPECIALIST Lexicon as the primary source of dictionary terms and WordNet [36,37] as a secondary source.

Tomanek et al. [38] examine the question of whether sentence- and token-splitting tools trained on general annotated corpora are adequate for medical texts. They compiled and annotated a corpus (Julie) according to a schema developed for sentence and token splitting, which then served as a training set for a machine learning algorithm using Conditional Random Fields. The results indicate that for sentence splitting the training corpus is not very critical; for tokenization, however, performance is significantly improved when training on a domain-specific corpus.

Word Sense Disambiguation (WSD) is the process of determining which sense of a word (from a set of candidates) is being used in a particular context. WSD is a crucial task for applications that aim to extract information from text. Weeber et al. [39], at the National Library of Medicine, derived a corpus of MEDLINE abstracts and manually sense-tagged 5,000 instances of 50 ambiguous words using the UMLS as the sense inventory. Liu et al. [10] present a very good background section on general English WSD, biomedical WSD, and supervised approaches to the task. They avoid the use of manually annotated sense-tagged data by using a two-step unsupervised approach: they automatically derive a sense-tagged corpus from MEDLINE abstracts using the knowledge in the UMLS, and use the derived sense-tagged corpus as a training set for a classifier for ambiguous words.
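The detect-and-suggest pattern behind dictionary-based spelling correction can be sketched briefly. This is not the UMLS-based tool of Tolentino et al.: the ten-word dictionary is a hypothetical stand-in for the SPECIALIST Lexicon and WordNet sources, and the similarity ranking uses Python's standard library rather than their method:

```python
from difflib import get_close_matches

# Toy dictionary standing in for the UMLS SPECIALIST Lexicon and
# WordNet sources; real tools use far larger word lists.
DICTIONARY = {"patient", "denies", "chest", "pain", "fracture", "tibia",
              "diabetes", "history", "discharge", "summary"}

def correct(text):
    """Flag out-of-dictionary tokens and substitute the closest entry."""
    corrected = []
    for token in text.lower().split():
        if token in DICTIONARY:
            corrected.append(token)
        else:
            # Rank candidates by string similarity; keep the best one.
            candidates = get_close_matches(token, DICTIONARY, n=1, cutoff=0.7)
            corrected.append(candidates[0] if candidates else token)
    return " ".join(corrected)

print(correct("patient denies chest pian"))  # "pian" -> "pain"
```

A ranking step of this kind is where the morpho-syntactic disambiguation of Ruch et al. would plug in: when several dictionary entries are similar to the misspelled token, context can break the tie that raw string similarity cannot.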


Liu et al. [40] implement four machine learning algorithms and use three datasets in a study showing that supervised WSD is suitable when there is enough sense-tagged data. All supervised WSD classifiers performed with a precision of less than 80% for biomedical terms, while most classifiers achieved around 90% for general English. There was no single combination of feature representation, window size, or algorithm that performed best for all ambiguous words. Xu et al. [41] also investigated the effects of sample size, sense distribution, and degree of difficulty on the performance of WSD classifiers. Pakhomov et al. [42] focus on WSD in the clinical domain and experiment with abbreviation and acronym disambiguation by applying a combination of supervised and unsupervised methods. Coden et al. [43] use a supervised method to train a classifier for the top 50 ambiguities from a clinical corpus compiled from Mayo Clinic notes.

Part-of-speech tag assignment has a major impact on the natural language tasks that follow. According to Campbell et al. [44], an error rate of 4% in part-of-speech tag assignment can translate to a 10% error rate at the sentence level. Part-of-speech taggers for general English achieve very high scores on this task. Coden et al. [45] suggest two ways to adapt a part-of-speech tagger (POS-tagger) trained on general English texts to the clinical language: by adding a 500-word domain-specific lexicon, and by creating manual annotations on domain-specific documents and adding these documents to the English corpus. The addition of the annotated documents increased the tagger's performance by 6% to 10%, whereas the addition of the lexicon increased its performance by about 2%. The authors noted, however, that the creation of the lexicon required much less effort than the manual annotations. Liu et al. [46] developed a manually annotated corpus of pathology reports and a domain-specific lexicon to evaluate the performance of a maximum-entropy POS-tagger trained on general English. The POS-tagger re-trained with the annotated corpus performed better than with the lexicon addition. The study also showed that more than 30% of the words in the pathology reports were unknown to the tagger trained on general English. The addition of an 800-word domain-specific lexicon yielded a performance increase of 5%, and selecting sentences that contained the most frequent unknown words proved to be most helpful. Hahn et al. [47] investigated the use of a rule-based POS-tagger (the Brill tagger) and a statistical tagger (TnT) on clinical data. The statistical tagger TnT, trained on general texts, performed close to the state of the art in the medical domain. They claim that the model (statistical vs. rule-based) is more important than the sublanguage. Nonetheless, the statistical tagger improved its performance substantially when trained on medical data.

Parsers generate a constituent tree that provides a syntactic representation of the sentence structure with its dependencies. Medical language is especially challenging because of its ungrammatical and fragmented constructions. Campbell et al. [48] argue in favor of dependency grammars (DG) for biomedical text exactly because of the ungrammaticality of many sentences. In DG, each word has only one attachment, and a tree structure with the dependencies is the sentence representation. The authors applied a Transformation-Based Learning algorithm to learn a dependency grammar for medical texts. Clegg et al. [49] present a method for evaluating parsers' performance using an intermediate representation based on dependency graphs. They evaluated the Bikel, Collins, Stanford, Charniak, and Charniak-Lease parsers and mapped the constituent parse trees to dependency graphs. The Bikel and Charniak-Lease parsers performed well on parsing sentences from the Genia Treebank (also mapped to dependency graphs). Pyysalo et al. [50] investigated the adaptation of a Link Grammar parser to the biomedical language with a focus on unknown words.

Contextual Feature Detection and Analysis

When extracting information from narrative text documents, the context of the extracted concepts plays a critical role. Important contextual information includes negation (e.g., "denies any chest pain"), temporality (e.g., "...fracture of the tibia 2 years ago..."), and event subject identification (e.g., "his mother has diabetes").

NLP systems such as the LSP [11] or MedLEE [12] include negation analysis in their processing, but research focused explicitly on negation detection started only a few years ago with NegExpander [51], a program detecting negation terms and then expanding ("NegExpanding") the related concepts. This program had a precision of 93% and was used by a mammography reports classification algorithm. More recently, a negation detection algorithm called NegEx was developed using regular expressions [52] and achieved 94.5% specificity and 77.8% sensitivity. Several systems later implemented NegEx, such as the system developed by Mitchell et al. [53] to extract information from pathology reports in the Shared Pathology Informatics Network (SPIN). When evaluating negation detection alone, they measured a precision of 77% and a recall of 83%.


In the process of developing NLP tools for the i2b2 (Informatics for Integrating Biology and the Bedside) project, Goryachev et al. [54] compared NegEx, NegExpander, and two classification-based algorithms, and measured the best performance with NegEx (94.5% sensitivity and 94.3% specificity). A more complex system, called Negfinder [55], also used concepts indexed with the UMLS and regular expressions, but added a parser using a LALR (Look-Ahead Left-Recursive) grammar to identify negations, and achieved 97.7% specificity and 95.3% sensitivity when analyzing surgical notes and discharge summaries. A system extracting SNOMED-CT concepts from History and Physical Examination reports at the Mayo Clinic implemented a negation detection algorithm based on an ontology for negation; its authors measured a 97.2% sensitivity and a 98.8% specificity [56]. The most recently published negation detection algorithm used a hybrid approach based on regular expressions and grammatical parsing [57]. Negation terms were detected using regular expressions to achieve high sensitivity, and the part-of-speech parse tree was then traversed to locate negated phrases with high specificity. When evaluating negation detection on radiology reports, a 92.6% sensitivity and a 99.8% specificity were measured.

Temporality analysis in clinical narrative text can be significantly more complex than negation analysis, and has been investigated by Zhou, Hripcsak, and colleagues, starting with a proposed model for temporal information based on a simple temporal constraint satisfaction problem [58]. Discharge summaries were analyzed for temporal structures, and a temporal constraint structure for historical events was developed and then applied to other discharge summaries. The temporal constraint structure successfully modeled 97% of the temporal expressions [59]. The authors then proposed a system for automated temporal information extraction based on a temporal tagger, an NLP system (MedLEE), some post-processing based on medical and linguistic knowledge to treat implicit temporal information and uncertainty, and the simple temporal constraint satisfaction problem for temporal reasoning [60]. This system, called TimeText, has recently been evaluated with discharge summaries [61]. TimeText detected clinically important temporal relations with 93.2% recall and 96.9% correctness. It also answered clinically plausible temporal queries with 83.7% accuracy. Harkema et al. developed temporal analysis in the context of the CLEF (Clinical eScience Framework) IE component [62]. The extracted information is used to build the patient chronicle, an overview of the significant events in the patient's medical history. Events extracted from narrative reports are associated with structured data from the EHR. The system still includes some manual steps, but the authors are working on a fully automatic system. Focusing on discharge summaries, Bramsen et al. analyzed temporal segments (i.e., fragments of text that do not exhibit abrupt changes in temporal focus), a coarser level of analysis than Zhou et al., and their ordering, to characterize the temporal flow of discourse [63]. The authors use machine learning techniques for automatic temporal segmentation and segment ordering. For temporal segmentation, they use lexical, topical, positional, and syntactic features, and measured 78% recall and 89% precision. The best results for segment ordering were obtained with an Integer Linear Programming framework (84.3% accuracy) [64].

Finally, algorithms combining the analysis of the subject of the text (e.g., the patient) and other contextual features have recently been developed and evaluated. As a first step towards automated extraction of contextual features, Chu et al. [65] manually annotated four contextual features for 56 clinical conditions detected in emergency department reports. These features - Validity (valid/invalid), Certainty (absolute, high, moderate, low), Directionality (affirmed, negated, resolved), and Temporality (recent, during visit, historical) - were then evaluated in terms of their contribution to the classification of the detected conditions as acute, chronic, or resolved. Directionality (i.e., negation) was the most important contextual feature. Chapman et al. [66] propose an algorithm for contextual feature identification. This algorithm, called ConText, is an extension of NegEx cited above. ConText determines the values of three contextual features: Negation (negated, affirmed), Temporality (historical, recent, hypothetical), and Experiencer (patient, other). Like NegEx, this algorithm uses regular expressions to detect trigger terms, pseudo-trigger terms, and scope-termination terms, and then attributes the detected context to concepts between the trigger terms and the end of the sentence or a scope-termination term. The evaluation of ConText used an NLP-assisted review methodology described by Meystre et al. [67] and measured 97% recall and precision for negation, 50% recall and 100% precision for experiencer, and 67.4% to 82.5% recall and 74.2% to 94.3% precision for temporality (when assigning historical or hypothetical values).

Some conclusions that can be drawn from this research are that separate algorithms (i.e., specialized in contextual feature analysis) are easier to implement, and one of the best performing negation detection algorithms - NegEx - is a good example of this. Most of these algorithms are based on lexical information, even if some algorithms add part-of-speech information, like ChartIndex cited below.
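The trigger-and-scope mechanism shared by NegEx and ConText can be sketched as follows. This is a heavily simplified illustration, not the published algorithms: the trigger lists are tiny hypothetical samples, and pseudo-triggers and scope-termination terms are omitted:

```python
import re

# Simplified trigger terms in the spirit of NegEx/ConText; the real
# lexicons are far larger and include pseudo-triggers and
# scope-termination terms.
TRIGGERS = {
    r"\b(denies|no|without)\b": ("Negation", "negated"),
    r"\b(history of|\d+ years ago)\b": ("Temporality", "historical"),
    r"\b(mother|father|family)\b": ("Experiencer", "other"),
}

def annotate(sentence, concept):
    """Assign contextual feature values to a concept found in a sentence.

    A trigger modifies concepts between itself and the end of the
    sentence (scope termination is omitted in this sketch).
    """
    context = {"Negation": "affirmed", "Temporality": "recent",
               "Experiencer": "patient"}
    pos = sentence.lower().find(concept.lower())
    for pattern, (feature, value) in TRIGGERS.items():
        m = re.search(pattern, sentence.lower())
        if m and m.start() < pos:  # trigger precedes the concept
            context[feature] = value
    return context

print(annotate("The patient denies any chest pain", "chest pain"))
print(annotate("His mother has diabetes", "diabetes"))
```

The sketch makes the lexical nature of these algorithms concrete: everything hinges on the trigger vocabulary and a purely positional notion of scope, which is why adding scope-termination terms (and, in some systems, part-of-speech information) improves precision.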

IMIA Yearbook of Medical Informatics 2008

Extraction of Information in General

In this section, we present published research involving extraction of information from textual documents in the EHR, and emphasize the different methods used and the types of documents analyzed. The applications of IE for specific purposes such as coding, surveillance, terminology management, or research are described in the subsequent sections.

Useful IE has been attempted with basic pattern matching techniques such as regular expressions. Dictionaries of variable size were also typically used. Long [68] extracted diagnoses from discharge summaries using regular expressions and a punctuation marks dictionary, as well as the UMLS Metathesaurus [69]. To extract blood pressure and antihypertensive treatment intensification information, Turchin [70] also used regular expressions. The REgenstrief eXtraction tool [71] uses pattern matching to extract some diagnoses and radiological findings related to congestive heart failure. Finally, a module extracting UMLS concepts and based on pattern matching was developed by Bashyam [72]. It was faster than MMTx [73] and was proposed as a new option in MMTx.

Systems based on full or partial parsing were based on morpho-semantems (i.e., elementary meaningful units that compose words, such as the prefix oto- or the suffix -itis) [74]; on lexical and/or syntactic information, such as the LifeCode® system [75] or the Dutch Medical Language Processor developed by Spyns et al. [76]; or on semantic information, like the application developed for the ChartIndex project to convert documents to the HL7 CDA [77] format and extract UMLS concepts [78].

Approaches combining syntactic and semantic analysis constitute the majority of the systems. A famous system that has been adapted and used for several different tasks is MedLEE [79]. Besides being progressively extended to most of the documents present in the EHR [24] and tested for its transferability to another institution [13], it has been used to detect findings evocative of breast cancer [80], to analyze modifications to data entry templates [81], and even combined with machine translation to detect abnormal findings and devices in Portuguese radiology reports [82]. MetaMap and its Java™ version called MMTx (MetaMap Transfer) were also often used to extract information from clinical documents, even if they were developed for MEDLINE abstracts and lack negation detection. Some examples are Schadow et al. [83], who used it to extract information from pathology reports; Chung et al. [84], who used it with echocardiography reports; and Meystre et al. [67], who used it to extract medical problems. SymText [15] and its successor, MPLUS [85], make extensive use of semantic networks for semantic analysis. These networks are implemented as Bayesian networks (also called belief networks), trained to infer probabilistic relationships between extracted terms and their meaning. They have been used to extract interpretations of lung scans [86], to detect pneumonia [87], and to detect mentions of central venous catheters [88]. Other systems combining syntactic and semantic analysis have recently been developed and evaluated. The Pittsburgh SPIN information extraction system [27] was a project of the Shared Pathology Informatics Network (SPIN) [89] based on GATE (General Architecture for Text Engineering) [90] and evaluated to extract specific information from pathology reports. A very similar application, caTIES (cancer Text Information Extraction System) [23], was later developed by the same team as a caBIG-compliant [91] application. It is based on the NCI Enterprise Vocabulary System [92] instead of the UMLS utilized by the SPIN system. HITEx (Health Information Text Extraction) was also based on GATE and was developed to extract diagnoses and smoking status [93]. Finally, the KnowledgeMap Concept Identifier (KMCI) was adapted to extract UMLS concepts from echocardiography reports [94] and to detect QT interval prolongations [95].

Recent systems are almost always based on some machine learning methods, for limited tasks or for most of their functions. An example is a system developed by Taira et al. [96] that used Maximum Entropy classifiers for parsing and semantic analysis, and later also a vector space model to extract UMLS concepts [97]. Another example is the semantic category classifier developed by Sibanda et al. [98]. It employs support vector machines to attribute semantic categories to each word in discharge summaries.

Systems developed to extract information from textual documents in the EHR have mostly focused on chest radiography reports [13,71,75,82,88,93,96,99,100]. They have also been developed to analyze other types of radiology reports [72,78,80,85,86,97], echocardiogram reports [84,94,95], and other types of documents that have more diversity and larger vocabularies, such as discharge summaries [68,93,98], pathology reports [26,27,83], and other notes [70,81,101]. Some systems have been developed to analyze several different types of documents [15,24,67], and the effort required to port an NLP application from chest radiography reports to other radiology reports, discharge summaries, and pathology reports is well described by Friedman [24,102]. The largest efforts to develop and evaluate information extraction from clinical text have been achieved in the context of the i2b2 smoking status identification challenge in 2006 and the Medical NLP challenge [30] in 2007, described in the next section.
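To make the pattern-matching approaches described above concrete, a single regular expression for blood pressure readings might look like the following sketch. The pattern and the example note are invented for illustration, not taken from any of the cited systems.

```python
import re

# Hypothetical pattern; the cited systems use far richer expression sets.
BP_PATTERN = re.compile(
    r"\b(?:BP|blood pressure)[:\s]+(\d{2,3})\s*/\s*(\d{2,3})\b",
    re.IGNORECASE,
)

def extract_blood_pressures(note):
    """Return (systolic, diastolic) integer pairs found in a free-text note."""
    return [(int(s), int(d)) for s, d in BP_PATTERN.findall(note)]

note = "Exam: BP 142/88, up from blood pressure: 128/76 at last visit."
print(extract_blood_pressures(note))  # [(142, 88), (128, 76)]
```

Extractors like this are fast and transparent but brittle: variant phrasings, negated mentions, and historical values each require additional patterns or the contextual analysis discussed earlier.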


For the i2b2 "smoking challenge", a corpus of 502 de-identified and "re-identified" (with realistic surrogates) discharge summaries was first created by Uzuner et al. [33]. Eleven teams participated. Their task was to use discharge summaries to classify each patient as a smoker, current smoker, past smoker, non-smoker, or unknown. The best performing system, developed by Clark et al. [103], first filtered out documents with unknown smoking status, and then used SVMs (Support Vector Machines) to classify the smoking status. They also added 1200 documents to improve the training of their system. The overall accuracy of their system reached 93.6%. Some other well-performing systems are described in Cohen [104], Heinz et al. [105], Savova et al. [106], and Wicentowski and Sydes [107].

Extracting Codes from Clinical Text

A popular approach in the literature over the last several years has been to use NLP to extract codes mapped to controlled sources from text. The most common theme was to extract codes dealing with diagnoses, such as International Classification of Diseases (ICD) versions 9 and 10 codes. In addition to a focus on systematic coding schemes like ICD-9, institutions often also have local coding schemes they wish to extract.

2007 was a particularly interesting year for this because it was the year of the Medical NLP challenge, a shared task exercise that provided a moderately large test/training corpus of radiological reports and their corresponding human-coded ICD-9-CM codes. The project is described in Pestian et al. [30] and was very well conceived, especially the evaluation metrics. Ultimately, 44 different research teams participated, an astounding number. We found two papers that reported techniques and results. Aronson et al. [108] leveraged several existing technologies (e.g., NLM's Medical Text Indexer, a support vector machine classifier, a k-NN classifier, etc.) and arranged them in a stack-like architecture to evaluate their fused performance. They placed 11th in the challenge with a mean F-measure considerably higher than the average score for all participants (F-measure = 0.85; the best score was 0.89; the mean score was 0.77). Crammer et al. [109] also described a multi-component coding system; it used machine learning, a rule-based system, and an automatic coding system based on human coding policies. They judged these to be loosely orthogonal, so they combined the results in a cascade that gave priority to the human coding policy approach. They placed fourth in the challenge, and in this paper describe the same technology's performance against a local corpus of radiology reports.

ICD-10, the newer ICD standard, is more popular overseas than in the US, so it is not surprising that the literature describing automatic extraction of these codes comes mainly from Europe and Japan. Baud et al. [110] detail an interesting overview of the problems inherent in the task of ICD-10 encoding. And in a vein similar to the ICD-9 approaches above, Aramaki et al. [111] use a multi-component approach with three different extraction algorithms, followed by a polling technique at the end to determine the winner. A consistent theme with all these recent NLP-based code extractors for ICDs is the use of multiple, parallel components followed by some sort of adjudication module.

The past decade has seen the ascendancy of a remarkable general-purpose information extraction tool for clinical texts. As noted above, it is called MedLEE, and its use as a code extractor is well summarized in Friedman et al. [25]. MedLEE has seen use in code extraction in many contexts. Friedman herself describes an automated pneumonia severity score coding system using it [112]. Elkins et al. [113] describe an adaptation of its use for neuroradiology standard concept extraction; Kukafka et al. [114] used it to code to a standard for health and health-related states, the International Classification of Functioning, Disability, and Health (ICF; also a WHO standard). Lussier et al. [115] have applied MedLEE to extract SNOMED codes. SNOMED was also the driver for work done by Hasman et al. [116]. They have exploited SNOMED coding in clinical text NLP, primarily to assist pathologists during the coding process.

In addition to extracting codes that conform to a standard coding scheme like ICD-9/10 or SNOMED, there is considerable interest in extracting codes from text that conform to a local institutional standard like a problem list. Pakhomov et al. [117] and Haug et al. [118] describe examples of problem-list extraction at the Mayo Clinic and Intermountain Healthcare, respectively, two mature centers for clinical informatics. The Pakhomov system uses a multi-pass, certainty-based approach, while Haug's efforts use a Bayesian belief network technology. That team's work built on the work presented in Gundersen et al. [119], which makes a convincing case for the superiority of just-in-time automated coding over static, pre-coded systems.

Extracting Information to Enrich the EHR and for Decision Support

The past dozen years have seen an increase of interest in using NLP for enriching the content and utility of the EHR, especially to support computerized decision-making.


We have categorized this work into four broad groups, but the boundaries between them are fluid, as is often the case in NLP.

In contrast to work done in the early 1990s, recent work on the automatic structuring of documents using NLP has been on the wane. Kashyap et al. [120] used a commercial product called A-Life© to automatically structure standard admission notes such as the history and physical. They reached the same conclusion common to similar efforts in the past, namely that NLP technology is not yet ready to completely structure these texts. They argue that, given the volume of admission data, even partial support is a worthy goal. The VA CPRS system that was noted in the Introduction provides an interesting and large-scale platform for research; Lovis et al. [121] used a handcrafted parser to assist in the structuring of computerized provider order entry fields. While the parser itself is limited to use within CPRS, the study is important because it was the first to show that NLP could be used successfully within the CPRS environment. See also the section below on research uses of NLP, which describes a note structuring system in use at the Mayo Clinic.

As clinical text systems have grown in popularity, a problem has revealed itself: the sheer number of notes from so many diverse disciplines being integrated into one spot makes navigation through them all quite difficult. Two interesting papers reported on the use of NLP to make navigation easier through visualization of notes. Cimino et al. [122] recently described work that successfully abstracted and summarized medication data in an effort to improve patient safety. Liu and Friedman [123] demonstrated that a tool they call CliniViewer, built using MedLEE and an XML engine, can be used to summarize and navigate clinical text. As clinical text modules in the EHR become more popular, it is likely that we will see an increase in research in this area.

Another way to enrich the value of the EHR using information extraction is case finding. In this setting the goal is to find patients that match certain criteria based on either text alone or text in conjunction with other EHR data. Day et al. [124] used the MPLUS NLP system to classify trauma patients, and the system did well enough that it is in daily use at a Level 1 trauma center. Mendonça's team [125] used MedLEE to identify pneumonia in newborns with a very reasonable F-measure. Community-acquired pneumonia (CAP) is a very common problem in healthcare today, and it has been the focus of several NLP efforts. Fiszman et al. [126] showed how SymText could be used to find cases of CAP by comparing clinical notes to the CAP clinical guidelines. Using a similar technical approach, Aronsky et al. combined the same NLP system with a Bayesian network to identify general pneumonia. Finally, again using MedLEE, Jain et al. [127] demonstrated a very impressive F-measure in finding cases of tuberculosis in radiographic reports.

Beyond the three categories above, the use of NLP to enrich the EHR and to support decision-making is quite diverse. Representative of that work are examples such as Meystre and Haug using MMTx, combined with Chapman's NegEx algorithm, to enrich the problem list [67,128,129]. In 2005, Hazelhurst et al. [130] described their MediClass NLP system, an interesting combination of knowledge-based and NLP-based techniques. They demonstrate its utility in the automatic discovery of vaccination reactions from clinical notes and in assessing adherence to tobacco cessation guidelines [131,132].

In 1996, Johnson and Friedman [133] noted a caution: the performance of any NLP system is constrained by the quality of the human-composed text. They showed that even the most basic of information, demographics, is often inconsistently entered by humans. They compared the demographic data in discharge summaries, as extracted by an early prototype of MedLEE, to the original data input by humans at admission. The NLP system performed quite well at extracting the demographics, while the demographics input by humans were quite inconsistent. As clinical text repositories grow, they note, the repositories will increasingly be filled with conflicting data, posing a challenge to any NLP system.

Information Extraction for Surveillance

One of the great benefits of computing in general is the ability of a computer to do mundane, repetitive tasks where humans have a hard time maintaining vigilance. Surveillance based on clinical texts is precisely such a task, at the same time both very important and profoundly tedious. Adverse event surveillance based on clinical texts is a good example. Penz et al. [134] used MedLEE to test the feasibility of mounting surveillance for adverse events related to central venous catheters, using surgical operation reports from the VA's CPRS. Their specificity was about 0.80 and their sensitivity was about 0.72. Error analysis showed that errors were due to the difficulties of processing raw clinical text using a standard parser (coupled with inadequate provider documentation). Interestingly, they found that the corresponding administrative data for detected catheter placements (e.g., ICD-9 codes) only captured about 11% of the use of these devices, showing that the text was a far better place to look for catheter placement information than billing data.
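The sensitivity and specificity figures quoted in this section come from comparing a system's flags against a human-reviewed reference standard. A toy keyword screen and the corresponding computation might look like this (the keyword list, notes, and labels are invented):

```python
# Invented keyword list and notes; real surveillance systems pair such
# screens with negation and context handling to control false positives.
ADVERSE_EVENT_KEYWORDS = {"pneumothorax", "hematoma", "line infection"}

def flag_note(note):
    text = note.lower()
    return any(keyword in text for keyword in ADVERSE_EVENT_KEYWORDS)

def sensitivity_specificity(predicted, reference):
    tp = sum(p and r for p, r in zip(predicted, reference))
    tn = sum(not p and not r for p, r in zip(predicted, reference))
    fn = sum(not p and r for p, r in zip(predicted, reference))
    fp = sum(p and not r for p, r in zip(predicted, reference))
    return tp / (tp + fn), tn / (tn + fp)

notes = [
    "Small apical pneumothorax after line placement.",
    "Catheter placed without complication.",
    "Site clean and dry; lines intact.",
]
reference = [True, False, False]
predicted = [flag_note(n) for n in notes]
print(sensitivity_specificity(predicted, reference))  # (1.0, 1.0)
```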


Melton and Hripcsak [135] used MedLEE to mount surveillance for a broad range of adverse events. While the sensitivity of their technique was low at 0.28, the specificity was quite high at 0.99. Cao et al. [136] used a straightforward keyword-based NLP approach for the surveillance of adverse drug events, but found only modest positive predictive value. Both the Melton and Cao studies used discharge summaries, which are fairly clean clinical documents.

Syndromic surveillance has become a popular area of research, especially with growing concerns about national security and pandemic issues. Chapman et al. [137] reported on a system using MPLUS to conduct biosurveillance of chief complaint text fragments. The system's performance was good enough that it was used at the Winter Olympic Games in 2002. Pneumonia outbreaks are an important clinical surveillance issue, so Haas et al. [138] extracted information from neonatal chest x-ray reports using MedLEE. The positive predictive value of the system was 0.79, but the negative predictive value was greater than 0.99. The NLM's MetaMap tool was used by Chapman et al. [139] for the biosurveillance of general respiratory findings in the emergency department. The results were moderately low, with an F-measure in the mid-60s. Their error analysis allowed the research team to identify areas that would improve MetaMap's performance, and these are very likely to be applicable to any concept extractor using emergency department clinical text (e.g., temporal discrimination, anatomic location discrimination, finding-disease pair discrimination, and contextual inference). In another study, Chapman [140] showed that surveillance for fever, as a biosurveillance indicator, could be readily accomplished using keywords and a probabilistic algorithm called CoCo to infer fever from chief complaints, as well as keyword searching in dictated ED histories and physicals.

There are several efforts underway within the VA system to use NLP in quality surveillance. A representative study of this work is that by Brown et al. [141]. They used pattern-matching techniques to extract information from an electronic quality (eQuality) assessment form used within the VA system. They reported a sensitivity of 0.87 and a specificity of 0.71, and they note that human performance on sensitivity was only 4 to 6% better.

Information Extraction Supporting Research

Under the stewardship of the NIH Roadmap project called the Clinical and Translational Science Award (CTSA) process, translational research is booming, along with translational informatics research. The first CTSA awards were made in 2007, and we anticipate that research-oriented NLP studies will soon be appearing. For now, the application of NLP to information extraction from clinical texts to support research is a comparatively small body of work.

By far, the most common use of NLP in this context is in subject recruitment, where textual data are used to identify patients who may benefit from being in a study. Pakhomov et al. adapted the text analysis system created at the Mayo Clinic for the structuring of semi-structured notes for use in identifying patients with angina [142,143] and heart failure [144]. In both domains, the NLP system improved ICD-9-based subject searching. Once the texts of interest were structured, they used keyword searches on the now-mapped conceptual entities to identify the patients of interest. His two papers that appeared in 2007 are especially interesting because they appeared in medical journals, not informatics journals. This reflects a growing acceptance of informatics research in the mainstream medical literature. Medical journals often require a more rigorous clinical evaluation of informatics tools such as Pakhomov's, and it is refreshing to see informatics tools compared in a rigorous statistical way to quantitative and qualitative health services research techniques. Xu et al. [26], using MedLEE, extracted subject eligibility data from surgical pathology reports. These reports often present structural processing barriers for MedLEE, so the team designed a preprocessor that was tailored to emphasize eligibility data.

An interesting use of statistical NLP to support research is presented by Niu et al. [145]. They used classic n-gram techniques coupled with machine learning and negation detection to try to discern the "polarity" of sentences in the journal Clinical Evidence, which summarizes recent findings in the clinical literature. In this sense, the polarity refers to whether the outcome was "positive," "negative," "neutral," or "no outcome reported." Their average F-measure for each was in the high 80s, with the best performance on positive outcomes. This approach could be used to assist clinicians in automatic question answering or to locate studies that are pertinent to their research.

De-identification of Clinical Text

In the United States, the HIPAA (Health Insurance Portability and Accountability Act, codified as 45 CFR §160 and 164) protects the confidentiality of patient data, and the Common Rule (codified as 45 CFR §46) protects the confidentiality of research subjects. The European Union Data Protection Directive provides similar confidentiality protection.


These laws typically require the informed consent of the patient and the approval of the Institutional Review Board (IRB) to use data for research purposes, but these requirements are waived if data are de-identified. Anonymization and de-identification are often used interchangeably, but de-identification only means that explicit identifiers are hidden or removed, whereas anonymization implies that the data cannot be linked to identify the patient (i.e., de-identified is often far from anonymous). Scrubbing is also sometimes used as a synonym of de-identification. For a narrative text document to be considered de-identified, the HIPAA "Safe Harbor" technique requires 18 data elements (called PHI: Protected Health Information) to be removed, such as names, telephone numbers, addresses, dates, and identifying numbers. Dorr et al. [146] evaluated the time cost of manually de-identifying narrative text notes (an average of 87.2 ± 61 seconds per note), and concluded that it was time-consuming and difficult to exclude all PHI required by HIPAA.

Already well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the EHR. Sweeney developed the Scrub system [147] to hide personally identifying information (names, contact information, identifying numbers, age, etc.). Each specific entity was detected by a specific algorithm using a list of all possible values (e.g., an algorithm detected first names and used a list of all commonly known first names). This system found 99-100% of identifying information. Ruch et al. [148] adapted a system built for disambiguation, the MEDTAG system, to detect and replace all instances of titles and names. They used the MEDTAG lexicon to tag semantic types, along with manually written disambiguation rules. The system was evaluated with mostly French surgery reports, laboratory results, and discharge summaries, and successfully removed about 99% of the identifiers.

To detect proper names only, two different approaches have been reported. Taira et al. [149] trained a system with a corpus of annotated reports from pediatric patients. A lexical analyzer attributed syntactic and semantic tags to each token, and obvious non-patient names (drug names, institutions, devices, etc.) were removed. A maximum entropy model was then used to determine the probability that a token can take the PATIENT role. With a decision threshold of 0.55, a 99.2% precision and a 93.9% recall were measured. Thomas et al. [150] used the property of names to usually occur in pairs or to be preceded or followed by affixes (e.g., Dr, MD) to detect and replace them in the narrative section of pathology reports. With a list of clinical and common-usage words, and a list of proper names, they correctly identified 98.7% of the proper names.

The Concept-Match scrubbing algorithm was developed by Berman [151] and took a radical approach to de-identifying pathology reports: all phrases that could be matched with UMLS concepts were replaced by the corresponding code (CUI) and another synonym mapping to the same code, and all other words (except stop words) were replaced by asterisks. The algorithm was fast but was not formally evaluated. Fielstein et al. [152] evaluated an algorithm using regular expressions and a city list to remove PHI as defined by HIPAA (except photographic images), and achieved a 92% sensitivity and a 99.9% specificity. The De-Id system was developed to remove all PHI from narrative clinical reports [153]. It used rules and dictionaries that were incrementally improved to finally miss identifiers in only 3.4% of the reports. Unlike all other systems described, De-Id keeps an encrypted linkage file tying the de-identified document to the suppressed identifiers.

Beckwith et al. [154] developed an open source system, called HMS Scrubber, removing PHI from pathology reports. This system first removed all identifying information from the header of the reports that was also found in the body of the report. It then used 50 regular expressions to detect and remove dates, addresses, accession numbers, and names cited with markers such as Dr, MD, PhD, etc. Finally, it used two freely available lists of names (90,000 unique first and last names from the 1990 US census) and of locations (16,000 unique cities, towns, etc., from the US Census Bureau). When evaluated, this system removed 98.3% of the PHI present in 1800 pathology reports from the SPIN (Shared Pathology Informatics Network).

The largest effort to develop and evaluate automated de-identification has been achieved in the context of the i2b2 de-identification challenge in 2006. Uzuner et al. [32] first created a corpus of 889 de-identified and "re-identified" (with realistic surrogates) discharge summaries. Identifying information was first tagged using statistical Named Entity Recognition techniques. This system was based on SVMs using local context (mostly lexical features and part-of-speech) and a few dictionaries (names, locations, hospitals, and months). It was compared to other systems and achieved the best performance, with 95% recall and 97.5% precision [155]. A manual verification of the de-identified documents was then executed, followed by the replacement of this information with realistic surrogates and the addition of some ambiguity and randomly generated surrogate PHI. This corpus, with tagged PHI, was then made available to the seven teams who participated in the challenge.
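A much-reduced sketch of the regular-expression-plus-dictionary strategy used by systems such as HMS Scrubber is shown below. The three patterns, the replacement tags, and the two-entry name list are illustrative inventions; the real system used 50 expressions and census-derived lists with tens of thousands of entries.

```python
import re

# Invented patterns and name list; real systems use far larger resources.
PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b(?:Dr|MD|PhD)\.?\s+[A-Z][a-z]+\b"), "[NAME]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
]
NAME_LIST = {"smith", "garcia"}  # stand-in for the census name lists

def scrub(text):
    """Replace pattern matches, then dictionary-matched names, with tags."""
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)
    return " ".join(
        "[NAME]" if word.strip(".,;").lower() in NAME_LIST else word
        for word in text.split()
    )

report = "Seen by Dr. Jones on 3/14/2007; call Smith at 555-867-5309."
print(scrub(report))  # Seen by [NAME] on [DATE]; call [NAME] at [PHONE].
```

Replacing PHI with category tags, rather than simply deleting it, preserves some readability of the scrubbed report.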


was made available for training, and then the remaining 1/4 was used for testing. The systems developed and submitted for testing by the teams had to remove names of patients, doctors, hospitals, and locations, as well as identification numbers, dates, phone numbers, and ages above 90. The best systems were developed by Wellner et al. [156] and by Szarvas et al. [157]. The system developed by Wellner et al. was based on Carafe, a toolkit implementing Conditional Random Fields developed at the MITRE Corporation (Bedford, MA). This system tagged each token as part of a PHI phrase or not, and also included some regular expressions to detect phone numbers, zip codes, addresses, etc., and a lexicon of US state names, months, and English words. It reached 97.5% recall and 99.22% precision (F-measure of 98.35%). The system developed by Szarvas et al. used local context, regular expressions (for ages, dates, identification numbers, and phone numbers), and dictionaries (first names, US locations, names of countries, and names of diseases). They then used decision tree algorithms (C4.5 and Boosting) to classify each word as PHI or non-PHI. Their system reached 96.4% recall and 98.9% precision (F-measure of 97.6%). In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are difficult to generalize. Methods based on machine learning tend to perform better but require annotated corpora for training.

Automatic Terminology Management

Terminologies, lists of vetted terms for a given domain, and ontologies, the relational organization of the vetted terms, are critical for a number of clinical domain applications - concept-based information retrieval, decision-support systems, and autocoding among many - to ensure system interoperability. The traditional method for building them relies on experts to identify the terms and create the hierarchy, a process which is time-consuming and which requires the collaborative effort of domain specialists. Here, we focus on summarizing the field as applied to the clinical domain. For a comprehensive review of the topic as related to the entire field of biomedicine, its methods and terminological resources, consult [158]. They outline the general steps for automatic terminology management: (1) automatic term recognition, (2) term variants augmentation, (3) automatic term structuring.

The most recent advances in automatic terminology management in the clinical domain are represented by systems that employ a combination of NLP techniques for term discovery and lexico-syntactic patterns for semantic relation discovery, along with visualization tools. The investigations of Baneyx et al. [159,160] focus on building an ontology of pulmonary diseases. Zhou et al. [161] experiment with surgical pathology reports, while the work of Charlet et al. [162] is in the surgical intensive care domain. Kolesa and Preckova [163] tackle an additional complexity - that of a semi-automated, NLP-based localization of international biomedical ontologies, in their case a Czech drug ontology seeded with terms discovered from drug information leaflets. All of them successfully demonstrate the use of NLP and IE techniques in the full-circle process of terminology discovery and ontology building.

A number of other efforts describe approaches to the subtasks in the process of automatic terminology management. Hersh et al. [164] describe one of the first investigations combining NLP techniques for the task of candidate term discovery and terminology expansion, which they test on all EHR narrative reports at the Oregon Health Sciences University and the Portland Veterans Administration Medical Center through February 1995. Harris et al. [165] and Savova et al. [166] investigate a method for term candidate discovery for the domain of patient functioning, disability and health, and later apply lexico-syntactic patterns and latent semantic analysis to induce structure for the candidate terms [167]. Do Amaral et al. [168] use radiology reports to apply NLP techniques to abstract the reports' general framework and discover the reports' semantic template. Friedman et al. [169] describe their controlled vocabulary development tool, which displays candidate terms along with usage statistics obtained from a corpus, their compositional structure, and suggested ontology mappings.

A number of vocabulary servers are available for the biomedical domain to support terminology management - the UMLS knowledge source server [170], LexGrid [171], Metaphrase [172], and the Medical Entities Dictionary (MED) [173]. All of them are Web-based interfaces that take as input a user-specified term and return ontological mappings.

Clinical Text Corpora and their Annotation

The use of automatic information extraction and retrieval tools depends heavily on the quality of the annotated corpora available for their training and testing. Currently, much work is being done on developing guidelines for corpus annotation, identifying relevant features to annotate, and on the characterization of what makes a particular corpus usable.
Chapman et al. [174] present an annotation schema to manually annotate clinical conditions. The schema was developed based on 40 emergency department reports and tested on 20 such reports. The two authors acted as annotators
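The regular-expression and dictionary layers that both challenge systems combined with their classifiers can be illustrated with a small sketch. All patterns, the name list, and the sample note below are invented for the example; neither team's actual rules are shown, and a real system would put a statistical tagger on top of features like these.

```python
import re

# Toy PHI patterns in the spirit of the de-identification challenge systems:
# regular expressions for numeric PHI plus a small name dictionary.
PHI_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "ID": re.compile(r"\bMRN:?\s*\d{6,}\b"),
    "AGE_OVER_89": re.compile(r"\b(9\d|1[0-4]\d)[- ]?year[- ]?old\b"),
}
NAME_DICTIONARY = {"smith", "johnson", "garcia"}  # illustrative only

def deidentify(text: str) -> str:
    """Replace matched PHI with category placeholders."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    # Dictionary pass: replace tokens found in the name list.
    tokens = [
        "[NAME]" if tok.strip(".,").lower() in NAME_DICTIONARY else tok
        for tok in text.split()
    ]
    return " ".join(tokens)

note = "Pt. Smith, a 92-year-old male, seen 3/14/2007. MRN: 1234567. Call 555-123-4567."
print(deidentify(note))
```

In a hybrid system such as those described above, matches like these would typically be supplied as features to the learner (CRF or decision tree) rather than applied as final decisions.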

Downloaded from imia.schattauer.de on 2012-02-03 | IP: 173.79.253.196 IMIA Yearbook of Medical Informatics 2008 For personal or educational use only. No other uses without permission. All rights reserved. 140 Meystre et al.

and achieved a high agreement and an F-measure of 93%. They point out that there are no standard guidelines determining which words to include in the annotation of clinical texts; thus their proposal focuses on which semantic categories and words are important to include in such annotations. They suggest that a similar methodology can be used to develop principled guidelines for other clinical text annotation. In a follow-up investigation, Chapman et al. [175] examined the improvement in agreement among annotators after they were trained with the annotation schema. For this investigation, three physicians and three lay people served as annotators; the authors concluded that physicians presented a higher agreement after training on the schema than when applying a baseline one-hour training, and moreover that lay people performed almost as well as physicians when trained on the schema. These results suggest that good annotation guidelines are essential to good annotation quality, especially when the annotators are not domain experts.
Cohen et al. [176] examine six available corpora with respect to their design characteristics to determine which features may be responsible for their high or low usage rates by external systems. Their conclusion is that semantic annotation, standard formats for annotation and distribution, and high-quality annotation of structural and linguistic characteristics are relevant features and good predictors of usage. Cohen et al. [177] analyze corpus design characteristics in further detail and suggest that good documentation, balanced representation, the ability to recover the original text, and data on inter-annotator agreement are the main characteristics promoting high-level use of a corpus.
Wilbur et al. [178] discuss what properties make a text useful for data-mining applications. They identified five qualitative dimensions - focus, polarity, certainty, evidence, and directionality - and developed guidelines on how to annotate sentence fragments along these five dimensions. The guidelines were developed over a one-year period through multiple iterations of testing and revision. Agreement among 12 annotators on 101 sentences from biomedical periodicals is reported at between 70% and 80%. This methodology and these guidelines are being used to annotate a large corpus of 10,000 sentences to serve as a training corpus for automated classifiers. An interesting point is that the difficulty of the annotation varies considerably depending on the dimension being annotated, with rating of the evidence being one of the most challenging tasks.
Liu et al. [27] study and review the types of error made by a system that automatically extracts information from pathology reports. The information extracted was compared to a manually annotated gold standard. The authors classified the errors into 1) system errors and 2) semantic disagreement between the report and the annotation. This second point shows that even when gold standard annotations are available they may still be difficult to interpret, and automatic extraction may be more valid for some variables than for others.

Clinical Text Mining

Ananiadou and McNaught [8] and Hirschman and Blaschke [179] provide an extensive overview of the state-of-the-art of text mining and its challenges in biomedicine. In our review here, we focus on text mining in the clinical domain. We adhere to the widely-accepted definition of text mining by Hearst [9], also used in Ananiadou and McNaught [8] - the discovery and extraction of new knowledge from unstructured data - and contrast it with data mining, which finds patterns in structured data, and with information extraction, which extracts known facts from text and presents them in a structured form. The inspiration for text mining comes from the pioneering work of Swanson [180], in which he brilliantly demonstrates that chaining facts from disparate literature sources could lead to the generation of new scientific hypotheses.
Biomedical text mining has been primarily explored in relation to the literature, the main reasons being the confidentiality provisions that govern patient clinical records and the limited number of investigators with access to such data. Clinical text mining has been investigated for finding association patterns. Chen et al. [181] employ text mining and statistical techniques to identify disease-drug associations in the biomedical literature and discharge summaries, and conclude that there are distinct patterns in drug usage as reported in the literature and as recorded in the patient record. Cao et al. [182] explore the automatic calibration of the statistic value and apply it to the discovery of disease-finding associations. In another study, Cao et al. [183] show that statistical methods are successful in finding strong disease-finding relations; their use case was knowledge base construction for patient problem list generation. Rindflesch et al. [184] use statistical methods to construct a database of drug-disorder co-occurrences from a large collection of clinical notes from the Mayo Clinic.
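The association-mining step common to these studies can be sketched generically: reduce each note to its extracted concepts, count document-level co-occurrences, and score concept pairs with an association statistic. The sketch below uses pointwise mutual information as a stand-in for the calibrated statistics of the cited studies; the documents and concepts are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations

# Each "document" is the set of concepts an IE system extracted from one note
# (invented data for illustration).
docs = [
    {"diabetes", "metformin"},
    {"diabetes", "metformin", "hypertension"},
    {"hypertension", "lisinopril"},
    {"diabetes", "insulin"},
    {"hypertension", "lisinopril", "metformin"},
]

def pmi(pair_count: int, count_a: int, count_b: int, n_docs: int) -> float:
    """Pointwise mutual information of two concepts over document counts."""
    p_ab = pair_count / n_docs
    p_a, p_b = count_a / n_docs, count_b / n_docs
    return math.log2(p_ab / (p_a * p_b))

# Count single-concept and pairwise document co-occurrences.
singles = Counter(c for d in docs for c in d)
pairs = Counter(frozenset(p) for d in docs for p in combinations(sorted(d), 2))

for pair, k in pairs.most_common(3):
    a, b = sorted(pair)
    print(f"{a} + {b}: {k} docs, PMI={pmi(k, singles[a], singles[b], len(docs)):.2f}")
```

At realistic corpus sizes, a significance test (e.g. chi-square with a correction for multiple comparisons) would normally replace or accompany the raw score before a pair is accepted as an association.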


Conclusions and Future Challenges

In this paper, we reviewed the advances of information extraction from free-text EHR documents. IE is still a relatively new field of research in the biomedical domain, and the extraction of information from clinical text is even newer. Compared to the IE tasks of the Message Understanding Conferences, results in clinical text IE were often mixed. Reasons proposed for this difference are that more experience is needed, annotated text corpora are rare and small, and clinical text is simply harder to analyze than biomedical literature, or even newswire text. During the last several years, performance has gradually improved, exceeding 90% sensitivity and specificity in several cases. Systems are now mostly statistically based, and therefore require annotated corpora for training. Creating annotated clinical text corpora is one of the main challenges for the future of this field: the effort required to develop annotated corpora is significant, and patient data confidentiality issues hamper access to data.
An issue that we observed in several publications is the quality of the evaluation of the systems. The study design might be prone to biases, and the reference standards used might have limited value, especially when created by only one reviewer. Robust evaluation practices in this domain are well described in Hripcsak et al. [100].
The potential uses of information extraction from clinical text are numerous and far-reaching. Current applications, however, are rarely deployed outside the laboratories in which they were developed, mostly because of scalability and generalizability issues. In the same way that the MUCs fostered the development of information extraction in the general domain, similar competitive challenges for information extraction from clinical text will undoubtedly stimulate advances in the field reviewed here. Organizing these competitive challenges is another challenge for the future. Some domains of research, like discourse analysis and temporality analysis, have not yet been investigated thoroughly and pose additional challenges that could also contribute to performance improvements. Improvements in system performance will subsequently enhance the acceptance and usage of IE in concrete clinical and biomedical research contexts.

Acknowledgments
We warmly thank Wendy W. Chapman for her help in reviewing this paper.

References
1. Spyns P. Natural language processing in medicine: an overview. Methods Inf Med 1996 Dec;35(4-5):285-301.
2. Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinform 2005 Mar;6(1):57-71.
3. Zweigenbaum P, Demner-Fushman D, Yu H, Cohen KB. Frontiers of biomedical text mining: current progress. Brief Bioinform 2007:358-75.
4. DeJong GF. An overview of the FRUMP system. In: Lehnert WG, Ringle MH, editors. Strategies for Natural Language Processing. Hillsdale, NJ: Lawrence Erlbaum; 1982. p. 149-76.
5. Google. [cited 01/10/2008]; Available from: http://www.google.com
6. PubMed. [cited 01/10/2008]; Available from: http://www.ncbi.nlm.nih.gov/sites/entrez/
7. Carbonell JG, Hayes PJ. Natural language understanding. In: Shapiro SC, editor. Encyclopedia of Artificial Intelligence: Wiley; 1992. p. 660-77.
8. Ananiadou S, McNaught J. Text Mining for Biology and Biomedicine: Artech House; 2006.
9. Hearst MA. Untangling text data mining. Proc 37th Annual Meeting of the Association for Computational Linguistics; College Park, MD; 1999. p. 3-10.
10. Liu H, Lussier YA, Friedman C. Disambiguating ambiguous biomedical terms in biomedical narrative text: an unsupervised method. J Biomed Inform 2001:249-61.
11. Sager N, Friedman C, Chi E. The analysis and processing of clinical narrative. In: Salamon R, Blum B, Jørgensen M, editors. Medinfo 86; Amsterdam: Elsevier; 1986. p. 1101-5.
12. Friedman C, Johnson SB, Forman B, Starren J. Architectural requirements for a multipurpose natural language processor in the clinical environment. Proc Annu Symp Comput Appl Med Care 1995:347-51.
13. Hripcsak G, Kuperman GJ, Friedman C. Extracting findings from narrative reports: software transferability and sources of physician disagreement. Methods Inf Med 1998:1-7.
14. Haug PJ, Ranum DL, Frederick PR. Computerized extraction of coded findings from free-text radiologic reports. Work in progress. Radiology 1990 Feb;174(2):543-8.
15. Haug PJ, Koehler S, Lau LM, Wang P, Rocha R, Huff SM. Experience with a mixed semantic/syntactic parser. Proc Annu Symp Comput Appl Med Care 1995:284-8.
16. McCray AT, Sponsler JL, Brylawski B, Browne AC. The role of lexical knowledge in biomedical text understanding. Proc SCAMC 87; IEEE; 1987. p. 103-7.
17. Lindberg C. The Unified Medical Language System (UMLS) of the National Library of Medicine. Journal (American Medical Record Association) 1990 May;61(5):40-2.
18. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp 2001:17-21.
19. McNaught J, Black WJ. Information extraction: the task. In: Ananiadou S, McNaught J, editors. Text Mining for Biology and Biomedicine: Artech House Books; 2006. p. 143-76.
20. Hobbs JR. The generic information extraction system. Proc MUC-5; Baltimore, MD: Morgan Kaufmann; 1993. p. 87-92.
21. Hobbs JR. Information extraction from biomedical text. J Biomed Inform 2002 Aug;35(4):260-4.
22. Pakhomov S, Buntrock J, Duffy PH. High throughput modularized NLP system for clinical text. Proc 43rd Annual Meeting of the Association for Computational Linguistics; 2005; Ann Arbor, MI.
23. cancer Text Information Extraction System (caTIES) website. [cited 01/10/2008]; Available from: https://cabig.nci.nih.gov/tools/caties
24. Friedman C. A broad-coverage natural language processing system. Proc AMIA Symp 2000:270-4.
25. Friedman C, Shagina L, Lussier Y, Hripcsak G. Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc 2004:392-402.
26. Xu H, Anderson K, Grann VR, Friedman C. Facilitating cancer research using natural language processing of pathology reports. Medinfo 2004:565-72.
27. Liu K, Mitchell KJ, Chapman WW, Crowley RS. Automating tissue bank annotation from pathology reports - comparison to a gold standard expert annotation set. AMIA Annu Symp Proc 2005:460-4.
28. Hahn U, Romacker M, Schulz S. Creating knowledge repositories from biomedical reports: the MEDSYNDIKATE text mining system. Pac Symp Biocomput 2002:338-49.
29. International Challenge: Classifying Clinical Free Text Using Natural Language Processing. [cited 01/10/2008]; Available from: http://www.computationalmedicine.org/challenge/index.php
30. Pestian JP, Brew C, Matykiewicz P, Hovermale DJ, Johnson N, Cohen KB, et al. A shared task involving multi-label classification of clinical free text. BioNLP 2007: Biological, translational, and clinical language processing; Prague, CZ; 2007.
31. i2b2 (Informatics for Integrating Biology and the Bedside) website. [cited 01/10/2008]; Available from: https://www.i2b2.org/
32. Uzuner O, Luo Y, Szolovits P. Evaluating the state-of-the-art in automatic de-identification. J Am Med Inform Assoc 2007:550-63.
33. Uzuner O, Goldstein I, Luo Y, Kohane I. Identifying patient smoking status from medical discharge records. J Am Med Inform Assoc 2008 Jan-Feb;15(1):14-24. Epub 2007 Oct 18.
34. Ruch P, Baud R, Geissbuhler A. Using lexical disambiguation and named-entity recognition to improve spelling correction in the electronic patient record. Artif Intell Med 2003:169-84.
35. Tolentino HD, Matters MD, Walop W, Law B, Tong W, Liu F, et al. A UMLS-based spell checker for natural language processing in vaccine safety. BMC Med Inform Decis Mak 2007:3.


36. Miller G. WordNet: a dictionary browser. Proc of the First International Conference on Information and Data; 1985; Ontario, Canada.
37. Fellbaum C. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press; 1998.
38. Tomanek K, Wermter J, Hahn U. A reappraisal of sentence and token splitting for life sciences documents. Medinfo 2007:524-8.
39. Weeber M, Mork JG, Aronson AR. Developing a test collection for biomedical word sense disambiguation. Proc AMIA Symp 2001:746-50.
40. Liu H, Teller V, Friedman C. A multi-aspect comparison study of supervised word sense disambiguation. J Am Med Inform Assoc 2004:320-31.
41. Xu H, Markatou M, Dimova R, Liu H, Friedman C. Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues. BMC Bioinformatics 2006:334.
42. Pakhomov S, Pedersen T, Chute CG. Abbreviation and acronym disambiguation in clinical discourse. AMIA Annu Symp Proc 2005:589-93.
43. Coden A, Savova G, Buntrock J, Sominsky I, Ogren PV, Chute CG, et al. Text analysis integration into a medical information retrieval system: challenges related to word sense disambiguation. Medinfo 2007; Brisbane, Australia.
44. Campbell DA, Johnson SB. Comparing syntactic complexity in medical and non-medical corpora. Proc AMIA Symp 2001:90-4.
45. Coden AR, Pakhomov SV, Ando RK, Duffy PH, Chute CG. Domain-specific language models and lexicons for tagging. J Biomed Inform 2005:422-30.
46. Liu K, Chapman W, Hwa R, Crowley RS. Heuristic sample selection to minimize reference standard training set for a part-of-speech tagger. J Am Med Inform Assoc 2007:641-50.
47. Hahn U, Wermter J. High-performance tagging on medical texts. 20th International Conference on Computational Linguistics; Geneva, Switzerland; 2004.
48. Campbell DA, Johnson SB. A transformational-based learner for dependency grammars in discharge summaries. Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain; Philadelphia, PA; 2002.
49. Clegg AB, Shepherd AJ. Benchmarking natural-language parsers for biological applications using dependency graphs. BMC Bioinformatics 2007:24.
50. Pyysalo S, Salakoski T, Aubin S, Nazarenko A. Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches. BMC Bioinformatics 2006:S2.
51. Aronow DB, Fangfang F, Croft WB. Ad hoc classification of radiology reports. J Am Med Inform Assoc 1999:393-411.
52. Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 2001:301-10.
53. Mitchell KJ, Becich MJ, Berman JJ, Chapman WW, Gilbertson J, Gupta D, et al. Implementation and evaluation of a negation tagger in a pipeline-based system for information extraction from pathology reports. Medinfo 2004:663-7.
54. Goryachev S, Sordo M, Zeng QT, Ngo L. Implementation and evaluation of four different methods of negation detection. DSG technical report.
55. Mutalik PG, Deshpande A, Nadkarni PM. Use of general-purpose negation detection to augment concept indexing of medical documents: a quantitative study using the UMLS. J Am Med Inform Assoc 2001:598-609.
56. Elkin PL, Brown SH, Bauer BA, Husser CS, Carruth W, Bergstrom LR, et al. A controlled trial of automated classification of negation from clinical notes. BMC Med Inform Decis Mak 2005:13.
57. Huang Y, Lowe HJ. A novel hybrid approach to automated negation detection in clinical radiology reports. J Am Med Inform Assoc 2007:304-11.
58. Hripcsak G, Zhou L, Parsons S, Das AK, Johnson SB. Modeling electronic discharge summaries as a simple temporal constraint satisfaction problem. J Am Med Inform Assoc 2005 Jan-Feb;12(1):55-63.
59. Zhou L, Melton GB, Parsons S, Hripcsak G. A temporal constraint structure for extracting temporal information from clinical narrative. J Biomed Inform 2006:424-39.
60. Zhou L, Friedman C, Parsons S, Hripcsak G. System architecture for temporal information extraction, representation and reasoning in clinical narrative reports. AMIA Annu Symp Proc 2005:869-73.
61. Zhou L, Parsons S, Hripcsak G. The evaluation of a temporal reasoning system in processing clinical discharge summaries. J Am Med Inform Assoc 2007.
62. Harkema H, Setzer A, Gaizauskas R, Hepple M. Mining and modelling temporal clinical data. Proceedings of the UK e-Science All Hands Meeting 2005:507-14.
63. Bramsen P, Deshpande P, Lee YK, Barzilay R. Inducing temporal graphs. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006); Sydney, Australia; 2006:189-98.
64. Bramsen P, Deshpande P, Lee YK, Barzilay R. Finding temporal order in discharge summaries. AMIA Annu Symp Proc 2006:81-5.
65. Chu D, Dowling JN, Chapman WW. Evaluating the effectiveness of four contextual features in classifying annotated clinical conditions in emergency department reports. AMIA Annu Symp Proc 2006:141-5.
66. Chapman W, Chu D, Dowling JN. ConText: an algorithm for identifying contextual features from clinical text. BioNLP 2007: Biological, translational, and clinical language processing; Prague, CZ; 2007.
67. Meystre S, Haug PJ. Natural language processing to extract medical problems from electronic clinical documents: performance evaluation. J Biomed Inform 2006:589-99.
68. Long W. Extracting diagnoses from discharge summaries. AMIA Annu Symp Proc 2005:470-4.
69. McCray AT, Aronson AR, Browne AC, Rindflesch TC, Razi A, Srinivasan S. UMLS knowledge for biomedical language processing. Bull Med Libr Assoc 1993 Apr;81(2):184-94.
70. Turchin A, Kolatkar NS, Grant RW, Makhni EC, Pendergrass ML, Einbinder JS. Using regular expressions to abstract blood pressure and treatment intensification information from the text of physician notes. J Am Med Inform Assoc 2006:691-5.
71. Friedlin J, McDonald CJ. A natural language processing system to extract and code concepts relating to congestive heart failure from chest radiology reports. AMIA Annu Symp Proc 2006:269-73.
72. Bashyam V, Divita G, Bennett DB, Browne AC, Taira RK. A normalized lexical lookup approach to identifying UMLS concepts in free text. Medinfo 2007:545-9.
73. Divita G, Tse T, Roth L. Failure analysis of MetaMap Transfer (MMTx). Medinfo 2004;11(Pt 2):763-7.
74. Baud RH, Lovis C, Rassinoux AM, Scherrer JR. Morpho-semantic parsing of medical expressions. Proc AMIA Symp 1998:760-4.
75. Mamlin BW, Heinze DT, McDonald CJ. Automated extraction and normalization of findings from cancer-related free-text radiology reports. AMIA Annu Symp Proc 2003:420-4.
76. Spyns P, De Moor G. A Dutch medical language processor. Int J Biomed Comput 1996:181-205.
77. Dolin RH, Alschuler L, Beebe C, Biron PV, Boyer SL, Essin D, et al. The HL7 Clinical Document Architecture. J Am Med Inform Assoc 2001 Nov-Dec;8(6):552-69.
78. Huang Y, Lowe HJ, Klein D, Cucina RJ. Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS SPECIALIST lexicon. J Am Med Inform Assoc 2005:275-85.
79. Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB. A general natural-language text processor for clinical radiology. J Am Med Inform Assoc 1994 Mar-Apr;1(2):161-74.
80. Jain NL, Friedman C. Identification of findings suspicious for breast cancer based on natural language processing of mammogram reports. Proc AMIA Annu Fall Symp 1997:829-33.
81. Wilcox AB, Narus SP, Bowes WA, 3rd. Using natural language processing to analyze physician modifications to data entry templates. Proc AMIA Symp 2002:899-903.
82. Castilla AC, Furuie SS, Mendonca EA. Multilingual information retrieval in thoracic radiology: feasibility study. Medinfo 2007:387-91.
83. Schadow G, McDonald CJ. Extracting structured information from free text pathology reports. AMIA Annu Symp Proc 2003:584-8.
84. Chung J, Murphy S. Concept-value pair extraction from semi-structured clinical narrative: a case study using echocardiogram reports. AMIA Annu Symp Proc 2005:131-5.
85. Christensen L, Haug P, Fiszman M. MPLUS: a probabilistic medical language understanding system. BioNLP 2002.
86. Fiszman M, Haug PJ, Frederick PR. Automatic extraction of PIOPED interpretations from ventilation/perfusion lung scan reports. Proc AMIA Symp 1998:860-4.
87. Fiszman M, Chapman WW, Aronsky D, Evans RS, Haug PJ. Automatic detection of acute bacterial pneumonia from chest X-ray reports. J Am Med Inform Assoc 2000:593-604.
88. Trick WE, Chapman WW, Wisniewski MF, Peterson BJ, Solomon SL, Weinstein RA. Electronic interpretation of chest radiograph reports to detect central venous catheters. Infect Control Hosp Epidemiol 2003:950-4.
89. Shared Pathology Informatics Network (SPIN) website. [cited 01/10/2008]; Available from: http://spin.nci.nih.gov/


90. General Architecture for Text Engineering (GATE) website. [cited 01/10/2008]; Available from: http://gate.ac.uk/
91. Fenstermacher D, Street C, McSherry T, Nayak V, Overby C, Feldman M. The Cancer Biomedical Informatics Grid (caBIG™). Conf Proc IEEE Eng Med Biol Soc 2005;1:743-6.
92. NCI Enterprise Vocabulary Services (EVS) website. [cited 01/10/2008]; Available from: http://evs.nci.nih.gov/
93. Zeng QT, Goryachev S, Weiss S, Sordo M, Murphy SN, Lazarus R. Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system. BMC Med Inform Decis Mak 2006:30.
94. Denny JC, Spickard A, 3rd, Miller RA, Schildcrout J, Darbar D, Rosenbloom ST, et al. Identifying UMLS concepts from ECG impressions using KnowledgeMap. AMIA Annu Symp Proc 2005:196-200.
95. Denny JC, Peterson JF. Identifying QT prolongation from ECG impressions using natural language processing and negation detection. Medinfo 2007:1283-8.
96. Taira RK, Soderland SG. A statistical natural language processor for medical reports. Proc AMIA Symp 1999:970-4.
97. Bashyam V, Taira RK. Indexing anatomical phrases in neuro-radiology reports to the UMLS 2005AA. AMIA Annu Symp Proc 2005:26-30.
98. Sibanda T, He T, Szolovits P, Uzuner O. Syntactically-informed semantic category recognition in discharge summaries. AMIA Annu Symp Proc 2006:714-8.
99. Friedman C, Hripcsak G, Shablinsky I. An evaluation of natural language processing methodologies. Proc AMIA Symp 1998:855-9.
100. Hripcsak G, Kuperman GJ, Friedman C, Heitjan DF. A reliability study for evaluating information extraction from radiology reports. J Am Med Inform Assoc 1999:143-50.
101. Friedlin J, McDonald CJ. Using a natural language processing system to extract and code family history data from admission reports. AMIA Annu Symp Proc 2006:925.
102. Friedman C. Towards a comprehensive medical language processing system: methods and issues. Proc AMIA Annu Fall Symp 1997:595-9.
103. Clark C, Good K, Jezierny L, Macpherson M, Wilson B, Chajewska U. Identifying smokers with a medical extraction system. J Am Med Inform Assoc 2008 Jan-Feb;15(1):36-9. Epub 2007 Oct 18.
104. Cohen AM. Five-way smoking status classification using text hot-spot identification and error-correcting output codes. J Am Med Inform Assoc 2008 Jan-Feb;15(1):32-5. Epub 2007 Oct 18.
105. Heinze DT, Morsch ML, Potter BC, Sheffer RE, Jr. Medical i2b2 NLP Smoking Challenge: the A-Life system architecture and methodology. J Am Med Inform Assoc 2008 Jan-Feb;15(1):40-3. Epub 2007 Oct 18.
106. Savova GK, Ogren PV, Duffy PH, Buntrock JD, Chute CG. Mayo Clinic NLP system for patient smoking status identification. J Am Med Inform Assoc 2008 Jan-Feb;15(1):25-8. Epub 2007 Oct 18.
107. Wicentowski R, Sydes MR. Using implicit information to identify smoking status in smoke-blind medical discharge summaries. J Am Med Inform Assoc 2008 Jan-Feb;15(1):29-31. Epub 2007 Oct 18.
108. Aronson AR, Bodenreider O, Demner-Fushman D, Fung KW, Lee VK, Mork JG, et al. From indexing the biomedical literature to coding clinical text: experience with MTI and machine learning approaches. BioNLP 2007: Biological, translational, and clinical language processing; Prague, CZ; 2007:105-12.
109. Crammer K, Dredze M, Ganchev K, Talukdar PP, Carroll S. Automatic code assignment to medical text. BioNLP 2007: Biological, translational, and clinical language processing; Prague, CZ; 2007:129-36.
110. Baud R. A natural language based search engine for ICD10 diagnosis encoding. Med Arh 2004:79-80.
111. Aramaki E, Imai T, Kajino M, Miyo K, Ohe K. Statistical selector of the best multiple ICD-coding method. Medinfo 2007;12(Pt 1):645-9.
112. Friedman C, Knirsch C, Shagina L, Hripcsak G. Automating a severity score guideline for community-acquired pneumonia employing medical language processing of discharge summaries. Proc AMIA Symp 1999:256-60.
113. Elkins JS, Friedman C, Boden-Albala B, Sacco RL, Hripcsak G. Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review. Comput Biomed Res 2000:1-10.
114. Kukafka R, Bales ME, Burkhardt A, Friedman C. Human and automated coding of rehabilitation discharge summaries according to the International Classification of Functioning, Disability, and Health. J Am Med Inform Assoc 2006:508-15.
115. Lussier YA, Shagina L, Friedman C. Automating SNOMED coding using medical language understanding: a feasibility study. Proc AMIA Symp 2001:418-22.
116. Hasman A, de Bruijn LM, Arends JW. Evaluation of a method that supports pathology report coding. Methods Inf Med 2001;40(4):293-7.
117. Pakhomov SV, Buntrock JD, Chute CG. Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques. J Am Med Inform Assoc 2006:516-25.
118. Haug PJ, Christensen L, Gundersen M, Clemons B, Koehler S, Bauer K. A natural language parsing system for encoding admitting diagnoses. Proc AMIA Annu Fall Symp 1997:814-8.
119. Gundersen ML, Haug PJ, Pryor TA, van Bree R, Koehler S, Bauer K, et al. Development and evaluation of a computerized admission diagnoses encoding system. Comput Biomed Res 1996:351-72.
120. Kashyap V, Turchin A, Morin L, Chang F, Li Q, Hongsermeier T. Creation of structured documentation templates using natural language processing techniques. AMIA Annu Symp Proc 2006:977.
121. Lovis C, Payne TH. Extending the VA CPRS electronic patient record order entry system using natural language processing techniques. Proc AMIA Symp 2000:517-21.
122. Cimino JJ, Bright TJ, Li J. Medication reconciliation using natural language processing and controlled terminologies. Medinfo 2007:679-83.
123. Liu H, Friedman C. CliniViewer: a tool for viewing electronic medical records based on natural language processing and XML. Medinfo 2004:639-43.
124. Day S, Christensen LM, Dalto J, Haug P. Identification of trauma patients at a level 1 trauma center utilizing natural language processing. J Trauma Nurs 2007:79-83.
125. Mendonca EA, Haas J, Shagina L, Larson E, Friedman C. Extracting information on pneumonia in infants using natural language processing of radiology reports. J Biomed Inform 2005:314-21.
126. Fiszman M, Haug PJ. Using medical language processing to support real-time evaluation of pneumonia guidelines. Proc AMIA Symp 2000:235-9.
127. Jain NL, Knirsch CA, Friedman C, Hripcsak G. Identification of suspected tuberculosis patients based on natural language processing of chest radiograph reports. Proc AMIA Annu Fall Symp 1996:542-6.
128. Meystre S, Haug PJ. Automation of a problem list using natural language processing. BMC Med Inform Decis Mak 2005:30.
129. Meystre SM, Haug PJ. Comparing natural language processing tools to extract medical problems from narrative text. AMIA Annu Symp Proc 2005:525-9.
130. Hazlehurst B, Frost HR, Sittig DF, Stevens VJ. MediClass: a system for detecting and classifying encounter-based clinical events in any electronic medical record. J Am Med Inform Assoc 2005:517-29.
131. Hazlehurst B, Mullooly J, Naleway A, Crane B. Detecting possible vaccination reactions in clinical notes. AMIA Annu Symp Proc 2005:306-10.
132. Hazlehurst B, Sittig DF, Stevens VJ, Smith KS, Hollis JF, Vogt TM, et al. Natural language processing in the electronic medical record: assessing clinician adherence to tobacco treatment guidelines. Am J Prev Med 2005:434-9.
133. Johnson SB, Friedman C. Integrating data from natural language processing into a clinical information system. Proc AMIA Annu Fall Symp 1996:537-41.
134. Penz JF, Wilcox AB, Hurdle JF. Automated identification of adverse events related to central venous catheters. J Biomed Inform 2007:174-82.
135. Melton GB, Hripcsak G. Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc 2005:448-57.
136. Cao H, Stetson P, Hripcsak G. Assessing explicit error reporting in the narrative electronic medical record using keyword searching. J Biomed Inform 2003:99-105.
137. Chapman WW, Christensen LM, Wagner MM, Haug PJ, Ivanov O, Dowling JN, et al. Classifying free-text triage chief complaints into syndromic categories with natural language processing. Artif Intell Med 2005:31-40.
138. Haas JP, Mendonca EA, Ross B, Friedman C, Larson E. Use of computerized surveillance to detect nosocomial pneumonia in neonatal intensive care unit patients. Am J Infect Control 2005:439-43.
139. Chapman WW, Fiszman M, Dowling JN, Chapman BE, Rindflesch TC. Identifying respiratory findings in emergency department reports for biosurveillance using MetaMap. Medinfo 2004:487-91.

Downloaded from imia.schattauer.de on 2012-02-03 | IP: 173.79.253.196 IMIA Yearbook of Medical Informatics 2008 For personal or educational use only. No other uses without permission. All rights reserved. 144 Meystre et al.

ratory findings in emergency department reports gan A, Peshkin L, et al. Rapidly retargetable ap- tient problems in healthcare enterprises. Methods for biosurveillance using MetaMap. Medinfo proaches to de-identification in medical records. Inf Med 1998 Nov;37(4-5):373-83. 2004:487-91. J Am Med Inform Assoc 2007:564-73. 173. Cimino JJ, Clayton PD, Hripcsak G, Johnson 140. Chapman WW, Dowling JN, Wagner MM. Fever 157. Szarvas G, Farkas R, Busa-Fekete R. State-of- SB. Knowledge-based approaches to the mainte- detection from free-text clinical records for the-art anonymization of medical records using an nance of a large controlled medical terminology. J biosurveillance. J Biomed Inform 2004:120-7. iterative machine learning framework. J Am Med Am Med Inform Assoc 1994 Jan-Feb;1(1):35-50. 141. Brown SH, Speroff T, Fielstein EM, Bauer Inform Assoc 2007:574-80. 174. Chapman WW, Dowling JN. Inductive creation BA, Wahner-Roedler DL, Greevy R, et al. 158. Ananiadou S, Nenadic G. Automatic Terminol- of an annotation schema for manually indexing eQuality: electronic quality assessment from ogy Management in Biomedicine. In: Ananiadou clinical conditions from emergency department narrative clinical reports. Mayo Clinic pro- S, McNaught J, eds. Text Mining for Biology and reports. J Biomed Inform 2006:196-208. ceedings 2006:1472-81. Biomedicine: Artech House Books 2006:67-98. 175. Chapman WW, Dowling JN, Hripcsak G. Evalua- 142. Pakhomov S, Weston SA, Jacobsen SJ, Chute 159. Baneyx A, Charlet J, Jaulent MC. Methodology tion of training with an annotation schema for manual CG, Meverden R, Roger VL. Electronic medical to build medical ontology from textual resources. annotation of clinical conditions from emergency records for clinical research: application to the AMIA Annu Symp Proc 2006:21-5. department reports. Int J Med Inform 2007. identification of heart failure. Am J Manag Care 160. Baneyx A, Charlet J, Jaulent MC. Building an 176. Cohen KB, Fox L, Ogren PV, Hunter L. 
Empiri- 2007:281-8. ontology of pulmonary diseases with natural lan- cal data on corpus design and usage in biomedical 143. Pakhomov SS, Hemingway H, Weston SA, guage processing tools using textual corpora. Int natural language processing. AMIA Annu Symp Jacobsen SJ, Rodeheffer R, Roger VL. Epidemi- J Med Inform 2007:208-15. Proc 2005:156-60. ology of angina pectoris: role of natural language 161. Zhou L, Tao Y, Cimino JJ, Chen ES, Liu H, 177. Cohen KB, Fox L, Ogren PV, Hunter L. Corpus processing of the medical record. Am Heart J Lussier YA, et al. Terminology model discovery design for biomedical natural language process- 2007:666-73. using natural language processing and visualiza- ing. AC-ISMB Workshop on Linking Biologi- 144. Pakhomov SV, Buntrock J, Chute CG. Prospec- tion techniques. J Biomed Inform 2006:626-36. cal Literature, Ontologies and Databases; 2005: tive recruitment of patients with congestive heart 162. Charlet J, Bachimont B, Jaulent MC. Building Association for Computational Linguistics; failure using an ad-hoc binary classifier. J Biomed medical ontologies by terminology extraction from 2005. p. 38-45. Inform 2005:145-53. texts: an experiment for the intensive care units. 178. Wilbur WJ, Rzhetsky A, Shatkay H. New direc- 145. Niu Y, Zhu X, Li J, Hirst G. Analysis of polarity Comput Biol Med 2006:857-70. tions in biomedical text annotation: definitions, information in medical text. AMIA Annu Symp 163. Kolesa P, Preckova P. Tools for Czech biomedical guidelines and corpus construction. BMC Proc 2005:570-4. ontologies creation. Stud Health Technol Inform Bioinformatics 2006:356. 146. Dorr DA, Phillips WF, Phansalkar S, Sims SA, 2006:775-80. 179. Hirschman L, Blaschke C. Evaluation of Text Hurdle JF. Assessing the difficulty and time cost 164. Hersh WR, Campbell EH, Evans DA, Brownlow Mining in Biology. In: Ananiadou S, McNaught of de-identification in clinical narratives. Meth- ND. Empirical, automated vocabulary discovery J, editors. 
Text Mining for Biology and Biomedi- ods Inf Med 2006:246-52. using large text corpora and advanced natural lan- cine: Artech House Books 2006:67-98. 147. Sweeney L. Replacing personally-identifying in- guage processing tools. Proc AMIA Annu Fall 180. Swanson DR. Two medical literatures that are formation in medical records, the Scrub system. Symp 1996:159-63. logically but not bibliographically connected. Proc AMIA Annu Fall Symp 1996:333-7. 165. Harris MR, Savova GK, Johnson TM, Chute CG. JASIS 1987;38(4):228-33. 148. Ruch P, Baud RH, Rassinoux AM, Bouillon P, A term extraction tool for expanding content in 181. Chen ES, Hripcsak G, Xu H, Markatou M, Fried- Robert G. Medical document anonymization with a the domain of functioning, disability, and health: man C. Automated Acquisition of Disease-Drug semantic lexicon. Proc AMIA Symp 2000:729-33. proof of concept. J Biomed Inform 2003 Aug- Knowledge from Biomedical and Clinical Docu- 149. Taira RK, Bui AA, Kangarloo H. Identification of Oct;36(4-5):250-9. ments: An Initial Study. J Am Med Inform Assoc patient name references within medical documents 166. Savova GK, Harris M, Johnson T, Pakhomov SV, 2007. using semantic selectional restrictions. Proc AMIA Chute CG. A data-driven approach for extracting 182. Cao H, Hripcsak G, Markatou M. A statistical Symp 2002:757-61. "the most specific term" for ontology develop- methodology for analyzing co-occurrence data from 150. Thomas SM, Mamlin B, Schadow G, McDonald ment. AMIA Annu Symp Proc 2003:579-83. a large sample. J Biomed Inform 2007 Jun;40 C. A successful technique for removing names in 167. Savova G, Becker D, Harris M, Chute CG. Com- (3):343-52. pathology reports using an augmented search and bining Rule-Based Methods and Latent Semantic 183. Cao H, Markatou M, Melton GB, Chiang MF, replace method. Proc AMIA Symp 2002:777-81. Analysis for Ontology Structure Construction. Hripcsak G. Mining a clinical data warehouse to 151. Berman JJ. 
Concept-match medical data scrub- Medinfo; 2004; San Francisco, CA; 2004. p. 1848. discover disease-finding associations using co- bing. How pathology text can be used in research. 168. do Amaral MB, Roberts A, Rector AL. NLP tech- occurrence statistics. AMIA Annu Symp Proc Archives of pathology & laboratory medicine niques associated with the OpenGALEN ontol- 2005:106-10. 2003:680-6. ogy for semi-automatic textual extraction of medi- 184. Rindflesch TC, Pakhomov SV, Fiszman M, 152. Fielstein EM, Brown SH, Speroff T. Algorithmic cal knowledge: abstracting and mapping equiva- Kilicoglu H, Sanchez VR. Medical facts to sup- De-identification of VA Medical Exam Text for lent linguistic and logical constructs. Proc AMIA port inferencing in natural language processing. HIPAA Privacy Compliance: Preliminary Find- Symp 2000:76-80. AMIA Annu Symp Proc 2005:634-8. ings. Medinfo 2004:1590. 169. Friedman C, Liu H, Shagina L. A vocabulary 153. Gupta D, Saul M, Gilbertson J. Evaluation of a development and visualization tool based on natu- deidentification (De-Id) software engine to share ral language processing and the mining of textual Correspondence to: pathology reports and clinical documents for re- patient reports. J Biomed Inform 2003:189-201. search. Am J Clin Pathol 2004:176-86. 170. UMLS Knowledge Source Server (UMLSKS). Stéphane M. Meystre 154. Beckwith BA, Mahaadevan R, Balis UJ, Kuo F. [cited 01/10/2008]; Available from: http:// University of Utah Development and evaluation of an open source umlsks.nlm.nih.gov Department of Biomedical Informatics software tool for deidentification of pathology 171. The Lexical Grid (LexGrid). [cited 01/10/2008]; 26 South 2000 East, HSEB Suite 5700 reports. BMC Med Inform Decis Mak 2006:12. Available from: http://informatics.mayo.edu/ Salt Lake City, UT 84112-5750 155. Sibanda T, Uzuner O. Role of Local Context in LexGrid/index.php?page= USA Automatic Deidenti?cation of Ungrammatical, 172. 
Tuttle MS, Olson NE, Keck KD, Cole WG, Erlbaum Tel: +1 801 581 4080 Fragmented Text. ACL conference 2006. MS, Sherertz DD, et al. Metaphrase: an aid to the Fax: +1 801 581 4297 156. Wellner B, Huyck M, Mardis S, Aberdeen J, Mor- clinical conceptualization and formalization of pa- E-mail: [email protected]
