Comprehend Medical: a Named Entity Recognition and Relationship Extraction Web Service

Parminder Bhatia, Busra Celikkaya, Mohammed Khalilia, Selvan Senthivel
Amazon, Seattle, Washington, USA
[email protected], [email protected], [email protected], [email protected]

Abstract—Comprehend Medical is a stateless and Health Insurance Portability and Accountability Act (HIPAA) eligible Named Entity Recognition (NER) and Relationship Extraction (RE) service launched under Amazon Web Services (AWS) and trained using state-of-the-art deep learning models. Contrary to many existing open source tools, Comprehend Medical is scalable and does not require a steep learning curve, dependencies, pipeline configurations, or installations. Currently, Comprehend Medical performs NER in five medical categories: Anatomy, Medical Condition, Medications, Protected Health Information (PHI) and Treatment, Test and Procedure (TTP). Additionally, the service provides relationship extraction for the detected entities as well as contextual information, such as negation and temporality, in the form of traits. Comprehend Medical provides two Application Programming Interfaces (API): 1) the NERe API, which returns all the extracted named entities, their traits and the relationships between them, and 2) the PHId API, which returns just the protected health information contained in the text. Furthermore, Comprehend Medical is accessible through the AWS Console and the Java and Python Software Development Kits (SDK), making it easy for both non-developers and developers to use.

Index Terms—Neural Networks, Multi-task Learning, Natural Language Processing, Clinical NLP, Named Entity Recognition, Relationship Extraction

I. INTRODUCTION

Electronic Health Records (EHR) contain a wealth of patient data, ranging from diagnoses, problems, treatments and medications to imaging and clinical narratives such as discharge summaries and progress reports. Structured data are important for billing, quality and outcomes. On the other hand, narrative text is more expressive, more engaging and captures the patient's story more accurately. Narrative notes may also contain information about level of concern and uncertainty to others who are reviewing the note. Studies have shown that narrative notes contain more naturalistic prose, are more reliable for identifying patients with a given disease and are more understandable to healthcare providers reviewing those notes [1]–[5]. Therefore, to have a clear perspective on a patient's condition, narrative text should be analyzed. However, manual analysis of massive numbers of narrative notes is time consuming, labor intensive and prone to errors.
Many clinical Natural Language Processing (NLP) tools and systems have been published to help make sense of this valuable narrative text. For instance, the clinical Text Analysis and Knowledge Extraction System (cTAKES) [6] is an open-source NLP package based on the Unstructured Information Management Architecture (UIMA) framework [7] and the OpenNLP [8] natural language processing toolkit. cTAKES uses dictionary look-up, and each mention is mapped to a Unified Medical Language System (UMLS) concept [9]. MetaMap [10] is another open-source tool that aims at mapping mentions in biomedical text to UMLS concepts using dictionary lookup. MetaMap Lite [11] adds negation detection based on either ConText [12] or NegEx [13].

The Clinical Language Annotation, Modeling, and Processing (CLAMP) toolkit [14] is one of the most recent clinical NLP systems. CLAMP is motivated by the fact that existing clinical NLP systems need customization and must be tailored to one's task. For NER, CLAMP takes two approaches: a machine learning approach using Conditional Random Fields (CRF) [15], and a dictionary-based approach, which maps mentions to standardized ontologies. CLAMP also provides assertion and negation detection based on machine learning or on rule-based NegEx.

Many of the existing NLP systems rely on ConText [12] and NegEx [13] to detect assertions such as negation. ConText extracts three contextual features for medical conditions: negation, historical or hypothetical, and experienced by someone other than the patient. ConText is an extension of NegEx, which is based on regular expressions.

Most of the NLP systems discussed above link mentions to UMLS. They are built from configurable pipelined components and rely on dictionary look-up for NER and on regular expressions for assertion detection.

Recently, neural network models have been proposed to overcome some of the limitations of rule-based techniques. Feedforward and bidirectional Long Short Term Memory (BiLSTM) networks for generic negation scope detection were proposed in [16]. In [17], gated recurrent units (GRUs) are used to represent clinical relations and their context, along with an attention mechanism; given a text annotated with relations, the model classifies the presence and period of those relations. However, this approach is not end-to-end, as it does not predict the relations themselves. Additionally, these models generally require a large annotated corpus to achieve good performance, but clinical data is scarce.

Kernel-based approaches are also very common, especially in the 2010 i2b2/VA task of predicting assertions. The state-of-the-art in that challenge applied support vector machines (SVM) to assertion prediction as a separate step after entity extraction [18]. They train classifiers to predict the assertion of each concept word, and a separate classifier to predict the assertion of the whole entity. The Augmented Bag of Words Kernel (ABoW), which generates features based on NegEx rules along with bag-of-words features, was proposed in [19], and a CRF-based approach for classification of cues and scope detection was proposed in [20]. These machine learning based approaches often suffer in generalizability.

Once named entities are extracted, it is important to identify the relationships between them. Several end-to-end models have been proposed that jointly learn named entity recognition and relationship extraction [21]–[23]. Generally, relationship extraction models consist of an encoder followed by a relationship classification unit [24]–[26]. The encoder provides context-aware vector representations for both target entities, which are then merged or concatenated before being passed to the relation classification unit, where a two-layered neural network or multi-layered perceptron classifies the pair into different relation types.
Despite the existence of many clinical NLP systems, automatic information extraction from narrative clinical text has not achieved enough traction yet [27]. As reported by [27], there is a significant gap between clinical studies using Electronic Health Record (EHR) data and studies using clinical information extraction. Reasons for this gap can be attributed to the limited expertise of NLP experts in the clinical domain, the limited availability of clinical data sets due to the HIPAA privacy rules, and the poor portability and generalizability of clinical NLP systems. Rule-based NLP systems require handcrafted rules, while machine learning-based NLP systems require annotated datasets.

To narrow the clinical NLP adoption gap and to address some of the limitations of existing NLP systems, we present Comprehend Medical, a web service for clinical named entity recognition and relationship extraction. Our contributions are as follows:

• Named entity recognition, relationship extraction and trait detection service encapsulated in one easy to use API.
• Web service that uses a deep learning multi-task [28] approach trained on labeled training data and requires no configuration or customization.
• Trait (negation, sign, symptom and diagnosis) detection for medical condition and negation detection for medication.

The rest of the paper is organized as follows: section II presents the methods, section III describes the datasets and experimental settings, section IV contains the results for the NER and RE models, section V discusses the implementation details, section VI gives an overview of the supported entities, traits and relationships, section VII presents some of the use cases and we conclude in section VIII.

II. METHODS

In this section we briefly introduce the architectures for named entity recognition and trait detection proposed in [29] and for relation extraction using explicit context conditioning proposed in [30].

A. Named Entity Recognition Architecture

A sequence tagging problem such as NER can be formulated as maximizing the conditional probability distribution over tags y given an input sequence x and model parameters θ:

  P(y | x, θ) = ∏_{t=1}^{T} P(y_t | x_t, y_{1:t−1}, θ)    (1)

where T is the length of the sequence and y_{1:t−1} are the tags of the previous words. The architecture we use as a foundation is that of [31], [32]. The model consists of three main components: (i) a character encoder, (ii) a word encoder, and (iii) a decoder/tagger.

1) Encoders: Given an input sequence x ∈ N^T whose coordinates indicate the words in the input vocabulary, we first encode the character-level representation of each word. For each x_t, the corresponding sequence c^(t) ∈ R^{L×e_c} of character embeddings is fed into an encoder, where L is the length of the word and e_c is the size of the character embedding. The character encoder employs two LSTM units, run forward and backward, which produce the hidden representations h_fwd^(t)_{1:l} and h_bwd^(t)_{1:l}, respectively, where l is the last timestep of both sequences. We concatenate the last timestep of each of these as the final encoded representation of x_t at the character level, h_c^(t) = [h_fwd,l^(t) || h_bwd,l^(t)].

The output of the character encoder is concatenated with a pre-trained word embedding, m_t = [h_c^(t) || emb_word(x_t)], which is used as the input to the word-level encoder.

Using learned character embeddings alongside word embeddings has been shown to be useful for learning word-level morphology, as well as for mitigating the loss of representation for out-of-vocabulary words. Similar to the character encoder, we use a BiLSTM to encode the sequence at the word level. The word encoder does not lose resolution, meaning the output at each timestep is the concatenated output of both word LSTMs, h_t = [h_fwd,t || h_bwd,t].

2) Decoder and Tagger: Finally, the concatenated output of the word encoder is used as input to the decoder, along with the label embedding of the previous timestep. During training we use teacher forcing [33] to provide the gold standard label as part of the input.

  o_t = LSTM(o_{t−1}, [h_t || ŷ_{t−1}])    (2)

  ŷ_t = Softmax(W o_t + b^s)    (3)

where W ∈ R^{d×n}, d is the number of hidden units in the decoder LSTM, and n is the number of tags. The model is trained in an end-to-end fashion using a standard cross-entropy objective.
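To make the tagger architecture concrete, the sketch below shows one possible implementation of the character encoder, word encoder and decoder described above. It is written in PyTorch purely for illustration (the production model is built in MXNet, see Section III-B); the class and variable names are ours, and the dimensions follow the settings reported in Section III-B.

# Illustrative sketch (not the authors' MXNet code) of the hierarchical
# encoder-decoder tagger: a character BiLSTM, a word BiLSTM, and an LSTM
# decoder that consumes the previous tag embedding (teacher forcing).
import torch
import torch.nn as nn

class HierarchicalTagger(nn.Module):
    def __init__(self, char_vocab, word_vocab, n_tags,
                 char_dim=25, word_dim=100, tag_dim=50,
                 char_hidden=50, word_hidden=100, dec_hidden=50):
        super().__init__()
        self.dec_hidden = dec_hidden
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.word_emb = nn.Embedding(word_vocab, word_dim)   # initialized from GloVe in practice
        self.tag_emb = nn.Embedding(n_tags, tag_dim)
        self.char_rnn = nn.LSTM(char_dim, char_hidden, bidirectional=True, batch_first=True)
        self.word_rnn = nn.LSTM(word_dim + 2 * char_hidden, word_hidden,
                                bidirectional=True, batch_first=True)
        self.decoder = nn.LSTMCell(2 * word_hidden + tag_dim, dec_hidden)
        self.out = nn.Linear(dec_hidden, n_tags)

    def forward(self, chars, words, gold_tags):
        # chars: (T, L) character ids per word; words: (T,) word ids; gold_tags: (T,)
        T = words.size(0)
        _, (h_n, _) = self.char_rnn(self.char_emb(chars))      # h_n: (2, T, char_hidden)
        char_repr = torch.cat([h_n[0], h_n[1]], dim=-1)        # h_c^(t) = [fwd || bwd]
        m = torch.cat([char_repr, self.word_emb(words)], dim=-1)
        h, _ = self.word_rnn(m.unsqueeze(0))                   # (1, T, 2*word_hidden)
        h = h.squeeze(0)
        state = (torch.zeros(1, self.dec_hidden), torch.zeros(1, self.dec_hidden))
        prev_tag = torch.zeros(1, dtype=torch.long)            # stand-in for a <GO> tag id
        logits = []
        for t in range(T):
            inp = torch.cat([h[t], self.tag_emb(prev_tag).squeeze(0)], dim=-1).unsqueeze(0)
            state = self.decoder(inp, state)
            logits.append(self.out(state[0]))
            prev_tag = gold_tags[t:t + 1]                      # teacher forcing
        return torch.cat(logits)                               # (T, n_tags), fed to cross-entropy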
3) Named Entity Recognition Decoder Model: Our decoder model provides more context to trait detection by adding an additional input: the softmax output from entity extraction. We refer to this architecture as the Conditional Softmax Decoder, shown in Fig. 1 [29]. The model thus learns about the input as well as the label distribution predicted by entity extraction. As an example, we use negation only for the problem entity in the i2b2 dataset. Providing the entity prediction distribution helps the negation model make better predictions: the negation model learns that if the prediction probability is not inclined towards the problem entity, it should not predict negation, irrespective of the word representation.

  ŷ_t^{Entity}, SoftOut_t^{Entity} = Softmax(W^{Ent} o_t + b^{Ent})    (4)

  ŷ_t^{Neg} = Softmax(W^{Neg} [o_t, SoftOut_t^{Entity}] + b^{Neg})    (5)

where SoftOut_t^{Entity} is the softmax output of the entity at time step t. Readers are referred to [29] for a more detailed discussion of the conditional softmax decoder model.

Fig. 1. Conditional softmax decoder model
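A minimal sketch of the conditional softmax decoder head in Eqs. (4)-(5) is shown below, continuing the PyTorch illustration above. The trait classifier receives both the decoder output o_t and the entity softmax distribution, which is what allows it to suppress negation for tokens that are unlikely to be problem entities. Layer names and sizes are ours, not the paper's.

# Illustrative sketch of the conditional softmax decoder head (Eqs. 4-5),
# reusing the decoder state o_t from the tagger sketch above.
import torch
import torch.nn as nn

class ConditionalSoftmaxHead(nn.Module):
    def __init__(self, dec_hidden=50, n_entity_tags=13, n_trait_tags=3):
        super().__init__()
        self.entity_out = nn.Linear(dec_hidden, n_entity_tags)                 # Eq. (4)
        self.trait_out = nn.Linear(dec_hidden + n_entity_tags, n_trait_tags)   # Eq. (5)

    def forward(self, o_t):
        # o_t: (batch, dec_hidden) decoder output at timestep t
        entity_probs = torch.softmax(self.entity_out(o_t), dim=-1)    # SoftOut_t^Entity
        trait_logits = self.trait_out(torch.cat([o_t, entity_probs], dim=-1))
        return entity_probs, torch.softmax(trait_logits, dim=-1)      # entity and trait predictions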

B. Relationship Extraction Architecture

The extracted entities are not very meaningful by themselves, especially in the healthcare domain. For instance, it is important to know whether a procedure was performed bilaterally, or on the left or right side; knowing the correct location results in more accurate and reliable billing and reimbursement. Hence, it is important to identify the relationships among the extracted clinical entities.

The RE model architecture is described in [30], but we reiterate some of the important details here. Relationships are defined between two entities, which we refer to as the head and tail entity. To extract such relationships we proposed relation extraction using explicit context conditioning, where the two target entities (head and tail) can be explicitly connected via a context token, also known as second-order relations. Similar to Bi-affine Relation Attention Networks (BRAN) [24], we first compute the representations for both the head, e_i^head, and tail, e_i^tail, entities, which are then passed through two multi-layer perceptrons (MLP-1) to obtain first-order relation scores, score^(1)(p_head, p_tail), as shown in Fig. 2. We also pass e_i^head and e_i^tail through MLP-2 to obtain second-order relation scores, score^(2)(p_head, p_tail), where p_head and p_tail are the indices of the head and tail entities. The motivation for adding MLP-2 was the need for representations focused on establishing relations with context tokens, as opposed to first-order relations. At the end, the final score for the relation between two entities is given as a weighted sum of the first- and second-order scores.

Fig. 2. Relationship extraction model
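The sketch below illustrates, in simplified form, how first- and second-order relation scores can be combined: one set of head/tail projections scores the pair directly (the MLP-1 path), a second set scores the head against every context token and each context token against the tail, and the two orders are mixed with a weight tuned on the dev set. This is our own simplified rendering of the idea, not the exact network of [30].

# Simplified illustration of weighted first- plus second-order relation scoring.
import torch
import torch.nn as nn

class RelationScorer(nn.Module):
    def __init__(self, enc_dim=256, rel_dim=128, n_relations=7, second_order_weight=0.5):
        super().__init__()
        self.head_mlp1 = nn.Linear(enc_dim, rel_dim)   # MLP-1 projections (first order)
        self.tail_mlp1 = nn.Linear(enc_dim, rel_dim)
        self.head_mlp2 = nn.Linear(enc_dim, rel_dim)   # MLP-2 projections (second order)
        self.tail_mlp2 = nn.Linear(enc_dim, rel_dim)
        self.bilinear1 = nn.Bilinear(rel_dim, rel_dim, n_relations)
        self.bilinear2 = nn.Bilinear(rel_dim, rel_dim, n_relations)
        self.alpha = second_order_weight                # weight tuned on the dev set

    def forward(self, enc, head_idx, tail_idx):
        # enc: (T, enc_dim) contextual token representations; head_idx/tail_idx: token indices
        e_head, e_tail = enc[head_idx], enc[tail_idx]
        # First-order score between head and tail directly (MLP-1 path).
        s1 = self.bilinear1(self.head_mlp1(e_head), self.tail_mlp1(e_tail))
        # Second-order score: connect head and tail through every context token and pool,
        # so a relation can be mediated by an intermediate mention.
        h2 = self.head_mlp2(e_head).expand(enc.size(0), -1)
        t2 = self.tail_mlp2(e_tail).expand(enc.size(0), -1)
        s_head_ctx = self.bilinear2(h2, self.tail_mlp2(enc))   # head -> context token
        s_ctx_tail = self.bilinear2(self.head_mlp2(enc), t2)   # context token -> tail
        s2 = torch.logsumexp(s_head_ctx + s_ctx_tail, dim=0)   # pool over context tokens
        return s1 + self.alpha * s2                            # weighted sum of both orders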

III. EXPERIMENTS

A. Dataset

We evaluated our model on two datasets. The first is the 2010 i2b2/VA challenge dataset for "test, treatment, problem" (TTP) entity extraction and assertion detection, herein referred to as i2b2. Unfortunately, only part of this dataset was made public after the challenge, so we cannot directly compare with the NegEx and ABoW results. We followed the original data split from [34] of 170 notes for training and 256 for testing. The second dataset is proprietary and consists of 4,200 de-identified clinical notes with medical conditions, herein referred to as DCN.

The i2b2 dataset contains six predefined relation types, including TrCP (Treatment Causes Problem), TrIP (Treatment Improves Problem), TrWP (Treatment Worsens Problem), and one negative relation. The DCN dataset contains seven predefined relationship types, such as "with dosage" and "every", and one negative relation. A summary of the datasets is presented in Table I.

TABLE I
OVERVIEW OF THE I2B2 AND DCN DATASETS

                  i2b2      DCN
Notes              426      4,200
Tokens            416K      1.5M
Entity Tags         13      37
Relations        3,653      270,000
Relation Types       6      7

B. NER Model Settings

Word, character and tag embeddings have 100, 25, and 50 dimensions, respectively. Word embeddings are initialized using GloVe, while character and tag embeddings are learned. The character and word encoders have 50 and 100 hidden units, respectively, while the decoder LSTM has a hidden size of 50. Dropout is used after every LSTM, as well as on the word embedding input. We use Adam as the optimizer. Our model is built using MXNet. Hyperparameters are tuned using Bayesian optimization [35].
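For reference, the reported NER settings can be collected into a single configuration, sketched below. The key names are ours, and the values are taken from the paragraph above; in the paper these hyperparameters are tuned with Bayesian optimization rather than fixed by hand.

# Hypothetical configuration dict summarizing the reported NER hyperparameters.
NER_CONFIG = {
    "word_emb_dim": 100,        # initialized from GloVe
    "char_emb_dim": 25,         # learned
    "tag_emb_dim": 50,          # learned
    "char_encoder_hidden": 50,
    "word_encoder_hidden": 100,
    "decoder_hidden": 50,
    "dropout": "after every LSTM and on the word-embedding input",
    "optimizer": "Adam",
    "framework": "MXNet",
}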
C. RE Model Settings

Our final network has two encoder layers, with 8 attention heads in each multi-head attention sublayer and 256 filters for the convolution layers in the position-wise feedforward sublayer. We used dropout with probability 0.3 after the embedding layer, the head/tail MLPs and the output of each encoder sublayer. We also used word dropout with probability 0.15 before the embedding layer.

IV. RESULTS

A. NER and Trait Detection Results

We report the results for NER and negation detection on both the i2b2 and DCN datasets in Table II. We observe that our proposed conditional softmax decoder approach outperforms the best model [34] on the i2b2 challenge.

We compare our models for negation detection against NegEx [13] and ABoW [19], which has the best results for the negation detection task on the i2b2 dataset. The conditional softmax decoder model outperforms both NegEx and ABoW (Table II). The low performance of NegEx and ABoW is mainly attributed to the fact that they use ontology lookup to index findings and regular expression search for negation within a fixed scope. A similar trend was observed on the medical condition (DCN) dataset (Table II). An important observation is the low F1 score of NegEx, which can primarily be attributed to abbreviations and misspellings in clinical notes that cannot be handled well by rule-based systems.

TABLE II
TEST SET PERFORMANCE WITH MULTI-TASK ON THE I2B2 AND DCN DATASETS

Named Entity
Data   Model                  Precision   Recall   F1
i2b2   LSTM:CRF [34]          0.844       0.834    0.839
       Conditional Decoder    0.854       0.858    0.855
DCN    LSTM:CRF [34]          0.82        0.84     0.83
       Conditional Decoder    0.878       0.872    0.874

Negation
i2b2   NegEx [13]             0.896       0.799    0.845
       ABoW Kernel [19]       0.899       0.900    0.900
       Conditional Decoder    0.919       0.891    0.905
DCN    NegEx [13]             0.403       0.932    0.563
       Conditional Decoder    0.928       0.874    0.899

We also evaluated the conditional softmax decoder in low-resource settings, where we used a sample of our training data. We observed that the conditional decoder is more robust in low-resource settings than other approaches, as we reported in [29].

B. RE Results

To show the benefits of using second-order relations, we compared our model's performance to BRAN. The two models differ in the weighted addition of second-order relation scores. We tuned this weight parameter on the dev set and observed an improvement in macro-F1 score from 0.712 to 0.734 on the DCN data and from 0.395 to 0.407 on the i2b2 data. For further comparison, a recently published model called the Hybrid Deep Learning Approach (HDLA) [36] reported a macro-F1 score of 0.388 on the same i2b2 dataset. It should be mentioned that HDLA used syntactic parsers for feature extraction, whereas we do not use any such external tools.

Table III summarizes the performance of our relationship model using second-order relations (+SOR) compared to BRAN and HDLA. We refer the readers to [30] for a more detailed analysis of our relationship extraction model.

TABLE III
TEST SET PERFORMANCE OF RELATION EXTRACTION ON THE I2B2 AND DCN DATASETS

Data   Model       Precision   Recall   F1
i2b2   HDLA [36]   0.378       0.422    0.388
       BRAN [24]   0.396       0.403    0.395
       +SOR        0.424       0.419    0.407
DCN    BRAN [24]   0.614       0.85     0.712
       +SOR        0.643       0.879    0.734

V. IMPLEMENTATION

Comprehend Medical APIs run in Amazon's proven, high-availability data centers, with the service stack replicated across three facilities in each AWS region to provide fault tolerance in the event of a server failure or Availability Zone outage. Additionally, Comprehend Medical ensures that system artifacts are encrypted in transit and that user data is passed through and not stored in any part of the system.

Comprehend Medical is available through a Graphical User Interface (GUI) within the AWS console and can be accessed using the Java and Python SDKs. Comprehend Medical offers two APIs: 1) the NERe API, which returns all the extracted named entities, their traits and the relationships between them, and 2) the PHId API, which returns just the protected health information contained in the text. Developers can easily integrate Comprehend Medical into their data processing pipelines, as shown in Fig. 4.

The only input needed by Comprehend Medical is the text to be analyzed. No configuration, customization or other parameters are needed, making Comprehend Medical easy to use by anyone who has access to AWS. Comprehend Medical outputs the results in JavaScript Object Notation (JSON), which contains the named entities, begin offsets, end offsets, traits, confidence scores and the relationships between the entities. Using the GUI (Fig. 3), users can quickly visualize their results.

Fig. 3. Rendering of entities, traits and relations by Comprehend Medical UI

Fig. 4. Integrating Comprehend Medical into data processing pipeline
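As an illustration of the integration path described above, the snippet below calls the service through the AWS Python SDK (boto3), whose DetectEntities and DetectPHI operations correspond to the NERe and PHId APIs. The example text is invented, and the response fields printed are illustrative of the JSON structure described above; consult the service documentation for the authoritative response schema.

# Minimal usage sketch with the AWS Python SDK (boto3), assuming standard AWS
# credentials are configured in the environment.
import boto3

client = boto3.client("comprehendmedical", region_name="us-west-2")

note = "Patient denies chest pain. Started on aspirin 81 mg daily."

# NERe API: all entities, traits and relationships.
entities = client.detect_entities(Text=note)["Entities"]
for e in entities:
    print(e["Text"], e["Category"], e["Type"], e["Score"],
          [t["Name"] for t in e.get("Traits", [])])

# PHId API: only protected health information.
phi = client.detect_phi(Text=note)["Entities"]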

VI. ENTITIES, TRAITS AND RELATIONSHIPS

A. Entities

Named entity mentions found in narrative notes are tagged with the entity types listed in Table IV. The entities are divided into five categories: Anatomy, Medical Condition, Medication, PHI and TTP. Comprehend Medical is HIPAA eligible and therefore supports the HIPAA identifiers. Some of those identifiers are grouped under a single entity type. For instance, Contact Point covers phone and fax numbers, and ID covers social security number, medical record number, account number, certificate or license number and vehicle or device number. An example input text is shown in Fig. 3.

TABLE IV
ENTITIES EXTRACTED BY COMPREHEND MEDICAL

Category            Entity
Anatomy             Direction, System Organ Site
Medical Condition   Dx Name, Acuity
Medication          Brand Name, Generic Name, Dosage, Duration, Frequency, Form, Route or Mode, Strength, Rate
PHI                 Age, Date, Name, Contact Point, Email, URL, Identifier, Address, Profession
TTP                 Test Name, Test Value, Test Unit, Procedure Name, Treatment Name

B. Traits

Comprehend Medical covers four traits, listed in Table V. Negation asserts the presence or absence of a Dx Name and whether or not the individual is taking the medication. Dx Name has three additional traits: Diagnosis, Sign and Symptom. Diagnosis identifies an illness or a disease. A Sign is objective evidence of disease, a phenomenon detected by a physician or a nurse. A Symptom is subjective evidence of disease, a phenomenon observed by the individual affected by the disease. An example of traits is shown in Fig. 3.

TABLE V
TRAITS EXTRACTED BY COMPREHEND MEDICAL

Trait       Entity
Negation    Brand/Generic Name, Dx Name
Diagnosis   Dx Name
Sign        Dx Name
Symptom     Dx Name

C. Relationships

A relationship is defined between a pair of entities in the Medication and TTP categories (Table VI). One of the entities in a relationship is the head entity while the other is the tail entity. In Medication, Generic and Brand Name are head entities, which can have relationships to tail entities such as Strength or Dosage. An example of relations is shown in Fig. 3.

TABLE VI
RELATIONSHIPS EXTRACTED BY COMPREHEND MEDICAL

Head Entity          Tail Entity
Brand/Generic Name   Dosage, Duration, Frequency, Form, Route or Mode, Strength
Test Name            Test Value, Test Unit
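To make the entity, trait and relationship structure concrete, the snippet below shows what a single extracted head entity could look like in the NERe JSON output, written as a Python dict. The values are invented, and the field names are our illustrative assumption about the response layout (related tail entities attached as attributes of the head entity, traits attached as a list), not an authoritative schema.

# Hypothetical example of one medication head entity with a related tail entity.
example_entity = {
    "Text": "aspirin",
    "Category": "MEDICATION",
    "Type": "GENERIC_NAME",           # head entity
    "BeginOffset": 38,
    "EndOffset": 45,
    "Score": 0.99,
    "Traits": [],                      # e.g. NEGATION would appear here
    "Attributes": [                    # tail entities related to the head entity
        {"Type": "STRENGTH", "Text": "81 mg", "Score": 0.98, "RelationshipScore": 0.97},
        {"Type": "FREQUENCY", "Text": "daily", "Score": 0.97, "RelationshipScore": 0.96},
    ],
}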

VII. USE CASES

Comprehend Medical reduces the cost, time and effort of processing large amounts of unstructured medical text with high accuracy, making it possible to pursue use cases such as clinical trial management, clinical decision support and revenue cycle management.

A. Clinical Trial Management

It can take about 10–15 years for a treatment to be developed from discovery to registration with the Food and Drug Administration (FDA). During that time, research organizations can spend six years on clinical trials. Despite the number of years it takes to design those clinical trials, 90% of all clinical trials fail to enroll patients within the targeted time and are forced to extend the enrollment period, 75% of trials fail to enroll the targeted number of patients and 27% fail to enroll any subjects [37].

Life sciences and clinical research organizations can speed up and optimize the process of recruiting patients into a clinical trial, as extractions from unstructured text and medical records can expedite the matching process. For instance, indexing patients based on medication, medical condition and treatments can help quickly identify the right participants for a lifesaving clinical trial.

Fred Hutchinson Cancer Research Center (FHCRC) utilized Comprehend Medical in their clinical trial management. FHCRC was spending 1.5 hours to annotate a single patient note and about 2.5 hours on manual chart abstraction per patient, and could process charts for about three patients per day. By using Comprehend Medical, FHCRC was able to annotate 9,642 patient notes per hour.

B. Patient and Population Health Analytics

Population health focuses on the discovery of factors and conditions for the health of a population over time. It aims at identifying patterns of occurrence and knowledge discovery in order to develop policies and actions that improve the health of a group or population [38].

Examples of population health analytics include patient stratification, readmission prediction and mortality measurement. Automatically unlocking important information from narrative text is invaluable to organizations participating in value-based healthcare and population health. Structured medical records do not fully identify patients with a medical history of diabetes, which results in an underestimation of disease prevalence [39]. The inability to identify patient cohorts from structured data represents a problem for the development of population health and clinical management systems. It also negatively affects the accuracy of identifying high-risk and high-cost patients [40]. Ref. [41] identified three areas that may have an impact on readmission but are poorly documented in the EMR system, hence the need for NLP-based solutions to extract such information. Also, some symptoms and illness characteristics that are necessary to develop reliable predictors are missing in the coded billing data [42]. Ref. [43] performed mortality prediction and reported a 2% increase in the Area Under the Curve when using features from both structured data and concepts extracted from narrative notes. Ref. [44] found that the predictive power of suicide risk factors found in EMR systems becomes asymptotic, leading them to incorporate analysis of clinical notes to predict the risk of suicide.

As seen from the examples above, NLP-based approaches can assist in identifying concepts that are incorrectly codified or missing in the EMR system. Population health platforms can expand their risk analytics to leverage unstructured clinical data for the prediction of high-risk patients and for epidemiologic studies on disease outbreaks.

C. Revenue Cycle Management

In healthcare, Revenue Cycle Management (RCM) is the process of collecting revenue and tracking claims from healthcare providers, including hospitals, outpatient clinics, nursing homes, dentist clinics and physician groups [45].

The RCM process has been inefficient, as most healthcare systems use rule-based approaches and manual audits of documents for billing and coding purposes [46]. Rule-based systems are time consuming, expensive to maintain, and require attention and frequent human intervention. Due to these ineffective processes, data coded at the point of care, which is the source for claims data, can contain errors and inconsistencies.

Coding is the process of encoding the details of patient encounters into standardized terminology [38]. A study by [47] found 48 errors in 38 of 106 finished consultant episodes in urology, and 71% of these errors were caused by inaccurate coding. Ref. [48] measured the consistency of coded data and found that some of these errors were significant enough to change the diagnostic related group.

RCM companies can use Comprehend Medical to enhance existing workflows around computer assisted coding and to validate codes submitted by providers. In addition, claim audits, which often require finding text evidence for submitted claims and are done manually, could be done more accurately and faster.

D. Pharmacovigilance

The aim of pharmacovigilance is to monitor, detect and prevent adverse drug events (ADE) of medical drugs. An early system used for pharmacovigilance is the spontaneous reporting system (SRS), which provided safety information on drugs [49]. However, SRS databases are incomplete, inaccurate and contain biased reporting [49], [50]. A newer generation of databases was created that contains clinical information for large patient populations, such as the Intensive Medicines Monitoring Program (IMMP) and the General Practice Research Database (GPRD). Such databases include data from structured fields and forms, but only a small amount of detail is stored in the structured fields. Researchers then started to look into EHR data for pharmacovigilance. However, most of the valuable information in patient records is contained in the unstructured text.

Using NLP to extract information from narrative text has shown improvement in ADE detection and pharmacovigilance [51]. Refs. [50], [52] also reported that ADEs are underreported in EHR systems, and they used NLP techniques to enhance ADE detection.

VIII. CONCLUSION

Studies have shown that narrative notes are more expressive, more engaging and capture the patient's story more accurately than structured EHR data. They also contain more naturalistic prose, are more reliable for identifying patients with a given disease and are more understandable to healthcare providers reviewing those notes, which urges the need for a more accurate, intuitive and easy to use NLP system. In this paper we presented Comprehend Medical, a HIPAA eligible Amazon Web Service for medical language entity recognition and relationship extraction. Comprehend Medical supports several entity types divided into five different categories (Anatomy, Medical Condition, Medication, Protected Health Information, and Treatment, Test and Procedure) and four traits (Negation, Diagnosis, Sign, Symptom). Comprehend Medical uses state-of-the-art deep learning models and provides two APIs, the NERe and PHId APIs. Comprehend Medical also comes with four different interfaces (CLI, Java SDK, Python SDK and GUI) and, contrary to many other existing clinical NLP systems, it does not require dependencies, configuration or customization of pipelined components.

REFERENCES

[1] S. T. Rosenbloom, J. C. Denny, H. Xu, N. Lorenzi, W. W. Stead, and K. B. Johnson, "Data from clinical notes: a perspective on the tension between structure and flexible documentation," Journal of the American Medical Informatics Association, vol. 18, no. 2, pp. 181–186, Mar. 2011.
[2] K. M. Fox, M. Reuland, W. G. Hawkes, J. R. Hebel, J. Hudson, S. I. Zimmerman, J. Kenzora, and J. Magaziner, "Accuracy of medical records in hip fracture," Journal of the American Geriatrics Society, vol. 46, no. 6, pp. 745–750, Jun. 1998.
[3] K. A. Marill, E. S. Gauharou, B. K. Nelson, M. A. Peterson, R. L. Curtis, and M. R. Gonzalez, "Prospective, randomized trial of template-assisted versus undirected written recording of physician records in the emergency department," Annals of Emergency Medicine, vol. 33, no. 5, pp. 500–509, May 1999.
[4] A. M. van Ginneken, "The physician's flexible narrative," Methods of Information in Medicine, vol. 35, no. 2, pp. 98–100, Jun. 1996.
[5] A. J. Cawsey, B. L. Webber, and R. B. Jones, "Natural language generation in health care," Journal of the American Medical Informatics Association, vol. 4, no. 6, pp. 473–482, 1997.
[6] G. K. Savova, J. J. Masanz, P. V. Ogren, J. Zheng, S. Sohn, K. C. Kipper-Schuler, and C. G. Chute, "Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications," Journal of the American Medical Informatics Association, vol. 17, no. 5, pp. 507–513, 2010.
[7] D. Ferrucci and A. Lally, "UIMA: an architectural approach to unstructured information processing in the corporate research environment," Natural Language Engineering, vol. 10, no. 3-4, pp. 327–348, Sep. 2004.
[8] J. Baldridge, "The Apache OpenNLP Project," URL: https://opennlp.apache.org/, 2005.
[9] O. Bodenreider, "The Unified Medical Language System (UMLS): integrating biomedical terminology," Nucleic Acids Research, vol. 32, no. 90001, pp. D267–D270, Jan. 2004.
[10] A. R. Aronson and F.-M. Lang, "An overview of MetaMap: historical perspective and recent advances," Journal of the American Medical Informatics Association, vol. 17, no. 3, pp. 229–236, 2010.
[11] D. Demner-Fushman, W. J. Rogers, and A. R. Aronson, "MetaMap Lite: an evaluation of a new Java implementation of MetaMap," Journal of the American Medical Informatics Association, vol. 24, no. 4, p. ocw177, Jan. 2017.
[12] H. Harkema, J. N. Dowling, and T. Thornblade, "ConText: An algorithm for determining negation, experiencer, and temporal status from clinical reports," Journal of Biomedical Informatics, vol. 42, no. 5, pp. 839–851, Oct. 2009.
[13] W. W. Chapman, W. Bridewell, P. Hanbury, G. F. Cooper, and B. G. Buchanan, "A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries," Journal of Biomedical Informatics, vol. 34, no. 5, pp. 301–310, Oct. 2001.
[14] E. Soysal, J. Wang, M. Jiang, Y. Wu, S. Pakhomov, H. Liu, and H. Xu, "CLAMP: a toolkit for efficiently building customized clinical natural language processing pipelines," Journal of the American Medical Informatics Association, vol. 25, no. 3, pp. 331–336, Mar. 2018.
[15] J. Lafferty, A. McCallum, and F. C. N. Pereira, "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data," in Proceedings of the 18th International Conference on Machine Learning, 2001, pp. 282–289.
[16] F. Fancellu, A. Lopez, and B. Webber, "Neural networks for negation scope detection," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016, pp. 495–504.
[17] L. Rumeng, N. Jagannatha Abhyuday, and Y. Hong, "A hybrid neural network model for joint prediction of presence and period assertions of medical events in clinical notes," in AMIA Annual Symposium Proceedings, vol. 2017, 2017, p. 1149.
[18] B. de Bruijn, C. Cherry, S. Kiritchenko, J. Martin, and X. Zhu, "Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010," Journal of the American Medical Informatics Association, vol. 18, no. 5, pp. 557–562, 2011.
[19] C. Shivade, M.-C. de Marneffe, E. Fosler-Lussier, and A. M. Lai, "Extending NegEx with kernel methods for negation detection in clinical text," in Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015), 2015, pp. 41–46.
[20] K. Cheng, T. Baldwin, and K. Verspoor, "Automatic negation and speculation detection in veterinary clinical text," in Proceedings of the Australasian Language Technology Association Workshop 2017, 2017, pp. 70–78.
[21] M. Miwa and M. Bansal, "End-to-end relation extraction using LSTMs on sequences and tree structures," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016, pp. 1105–1116.
[22] S. Zheng, F. Wang, H. Bao, Y. Hao, P. Zhou, and B. Xu, "Joint extraction of entities and relations based on a novel tagging scheme," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, Jul. 2017, pp. 1227–1236.
[23] H. Adel and H. Schütze, "Global normalization of convolutional neural networks for joint entity and relation classification," in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, Sep. 2017, pp. 1723–1729.
[24] P. Verga, E. Strubell, and A. McCallum, "Simultaneously self-attending to all mentions for full-abstract biological relation extraction," in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, Jun. 2018, pp. 872–884.
[25] F. Christopoulou, M. Miwa, and S. Ananiadou, "A walk-based model on entity graphs for relation extraction," in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia, Jul. 2018, pp. 81–88.
[26] Y. Su, H. Liu, S. Yavuz, I. Gur, H. Sun, and X. Yan, "Global relation embedding for relation extraction," in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, Jun. 2018, pp. 820–830.
[27] Y. Wang, L. Wang, M. Rastegar-Mojarad, S. Moon, F. Shen, N. Afzal, S. Liu, Y. Zeng, S. Mehrabi, S. Sohn, and H. Liu, "Clinical information extraction applications: A literature review," Journal of Biomedical Informatics, vol. 77, pp. 34–49, Jan. 2018.
[28] P. Bhatia, K. Arumae, and E. B. Celikkaya, "Dynamic transfer learning for named entity recognition," in International Workshop on Health Intelligence. Springer, 2019, pp. 69–81.
[29] P. Bhatia, B. Celikkaya, and M. Khalilia, "Joint Entity Extraction and Assertion Detection for Clinical Text," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019, pp. 954–959.
[30] G. Singh and P. Bhatia, "Relation Extraction using Explicit Context Conditioning," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minnesota, USA, 2019, pp. 1442–1447.
[31] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, "Neural architectures for named entity recognition," in Proceedings of NAACL-HLT, 2016, pp. 260–270.
[32] Z. Yang, R. Salakhutdinov, and W. Cohen, "Multi-task cross-lingual sequence tagging from scratch," arXiv preprint arXiv:1603.06270, 2016.
[33] R. J. Williams and D. Zipser, "A Learning Algorithm for Continually Running Fully Recurrent Neural Networks," Neural Computation, vol. 1, no. 2, pp. 270–280, Jun. 1989.
[34] R. Chalapathy, E. Z. Borzeshi, and M. Piccardi, "Bidirectional LSTM-CRF for Clinical Concept Extraction," in Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP), Osaka, Japan, 2016, pp. 7–12.
[35] J. Snoek, H. Larochelle, and R. P. Adams, "Practical Bayesian optimization of machine learning algorithms," in Advances in Neural Information Processing Systems, 2012, pp. 2951–2959.
[36] V. R. Chikka and K. Karlapalem, "A hybrid deep learning approach for medical relation extraction," arXiv preprint arXiv:1806.11189, 2018.
[37] R. B. Giffin, Y. Lebovitz, R. A. English, and others, Transforming Clinical Research in the United States: Challenges and Opportunities: Workshop Summary. National Academies Press, 2010.
[38] K. Giannangelo and S. Fenton, "EHR's effect on the revenue cycle management coding function," Journal of Healthcare Information Management, vol. 22, no. 1, pp. 26–30, 2008.
[39] L. Zheng, Y. Wang, S. Hao, A. Y. Shin, B. Jin, A. D. Ngo, M. S. Jackson-Browne, D. J. Feller, T. Fu, K. Zhang, X. Zhou, C. Zhu, D. Dai, Y. Yu, G. Zheng, Y.-M. Li, D. B. McElhinney, D. S. Culver, S. T. Alfreds, F. Stearns, K. G. Sylvester, E. Widen, and X. B. Ling, "Web-based Real-Time Case Finding for the Population Health Management of Patients With Diabetes Mellitus: A Prospective Validation of the Natural Language Processing-Based Algorithm With Statewide Electronic Medical Records," JMIR Medical Informatics, vol. 4, no. 4, p. e37, Nov. 2016.
[40] D. W. Bates, S. Saria, L. Ohno-Machado, A. Shah, and G. Escobar, "Big Data In Health Care: Using Analytics To Identify And Manage High-Risk And High-Cost Patients," Health Affairs, vol. 33, no. 7, pp. 1123–1131, Jul. 2014.
[41] J. L. Greenwald, P. R. Cronin, V. Carballo, G. Danaei, and G. Choy, "A Novel Model for Predicting Rehospitalization Risk Incorporating Physical Function, Cognitive Status, and Psychosocial Support Using Natural Language Processing," Medical Care, vol. 55, no. 3, pp. 261–266, Mar. 2017.
[42] A. Rumshisky, M. Ghassemi, T. Naumann, P. Szolovits, V. M. Castro, T. H. McCoy, and R. H. Perlis, "Predicting early psychiatric readmission with natural language processing of narrative discharge summaries," Translational Psychiatry, vol. 6, no. 10, pp. e921–e921, Oct. 2016.
[43] M. Jin, M. T. Bahadori, A. Colak, P. Bhatia, B. Celikkaya, R. Bhakta, S. Senthivel, M. Khalilia, D. Navarro, B. Zhang, T. Doman, A. Ravi, M. Liger, and T. Kass-Hout, "Improving Hospital Mortality Prediction with Medical Named Entities and Multimodal Learning," Neural Information Processing Systems Workshop on Machine Learning for Health, 2018.
[44] C. Poulin, B. Shiner, P. Thompson, L. Vepstas, Y. Young-Xu, B. Goertzel, B. Watts, L. Flashman, and T. McAllister, "Predicting the Risk of Suicide by Analyzing the Text of Clinical Notes," PLoS ONE, vol. 9, no. 1, p. e85733, Jan. 2014.
[45] V. Mindel and L. Mathiassen, "Contextualist inquiry into IT-enabled hospital revenue cycle management: bridging research and practice," Journal of the Association for Information Systems, vol. 16, no. 12, p. 1016, 2015.
[46] P. Schouten, "Big data in health care: solving provider revenue leakage with advanced analytics," Healthcare Financial Management, vol. 67, no. 2, pp. 40–43, Feb. 2013.
[47] A. Ballaro, S. Oliver, and M. Emberton, "Do we do what they say we do? Coding errors in urology," BJU International, vol. 85, no. 4, pp. 389–391, Mar. 2000.
[48] D. P. Lorence and I. A. Ibrahim, "Benchmarking variation in coding accuracy across the United States," Journal of Health Care Finance, vol. 29, no. 4, pp. 29–42, 2003.
[49] X. Wang, G. Hripcsak, M. Markatou, and C. Friedman, "Active Computerized Pharmacovigilance Using Natural Language Processing, Statistics, and Electronic Health Records: A Feasibility Study," Journal of the American Medical Informatics Association, vol. 16, no. 3, pp. 328–337, May 2009.
[50] A. Henriksson, M. Kvist, H. Dalianis, and M. Duneld, "Identifying adverse drug event information in clinical notes with distributional semantic representations of context," Journal of Biomedical Informatics, vol. 57, pp. 333–349, Oct. 2015.
[51] Y. Luo, W. K. Thompson, T. M. Herr, Z. Zeng, M. A. Berendsen, S. R. Jonnalagadda, M. B. Carson, and J. Starren, "Natural Language Processing for EHR-Based Pharmacovigilance: A Structured Review," Drug Safety, vol. 40, no. 11, pp. 1075–1089, Nov. 2017.
[52] N. Shang, H. Xu, T. C. Rindflesch, and T. Cohen, "Identifying plausible adverse drug reactions using knowledge extracted from the literature," Journal of Biomedical Informatics, vol. 52, pp. 293–310, Dec. 2014.