Structured prediction models for RNN based sequence labeling in clinical text

Abhyuday N Jagannatha (1), Hong Yu (1, 2)
(1) University of Massachusetts, MA, USA
(2) Bedford VAMC and CHOIR, MA, USA
[email protected], [email protected]

Abstract

Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of relevant entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experiment with Conditional Random Field (CRF) based structured learning models with Recurrent Neural Networks. We extend the previously studied CRF-LSTM model with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methods[1] for structured prediction in order to improve the exact phrase detection of clinical entities.

1 Introduction

Patient data collected by hospitals falls into two categories, structured data and unstructured natural language texts. It has been shown that natural text clinical documents such as discharge summaries, progress notes, etc. are rich sources of medically relevant information like adverse drug events, medication prescriptions, diagnosis information etc. Information extracted from these natural text documents can be useful for a multitude of purposes ranging from drug efficacy analysis to adverse effect surveillance.

A widely used method for Information Extraction from natural text documents involves treating the text as a sequence of tokens. This format allows sequence labeling algorithms to label the relevant information that should be extracted. Several sequence labeling algorithms such as Conditional Random Fields (CRFs), Hidden Markov Models (HMMs), and Neural Networks have been used for information extraction from unstructured text. CRFs and HMMs are probabilistic graphical models that have a rich history of Natural Language Processing (NLP) related applications. These methods try to jointly infer the most likely label sequence for a given sentence.

Recently, Recurrent (RNN) or Convolutional Neural Network (CNN) models have increasingly been used for various NLP related tasks. These Neural Networks by themselves, however, do not treat sequence labeling as a structured prediction problem. Different Neural Network models use different methods to synthesize a context vector for each word. This context vector contains information about the current word and its neighboring content. In the case of CNNs, the neighbors comprise words in the same filter size window, while in Bidirectional-RNNs (Bi-RNN) they contain the entire sentence.

Graphical models and Neural Networks have their own strengths and weaknesses. While graphical models predict the entire label sequence jointly, they usually rely on special hand crafted features to provide good results.

[1] Code is available at https://github.com/abhyudaynj/LSTM-CRF-models


Neural Networks (especially Recurrent Neural Networks), on the other hand, have been shown to be extremely good at identifying patterns from noisy text data, but they still predict each word label in isolation and not as a part of a sequence. In simpler terms, RNNs benefit from recognizing patterns in the surrounding input features, while structured learning models like CRFs benefit from the knowledge about neighboring label predictions. Recent work on Named Entity Recognition by Huang et al. (2015) and others has combined the benefits of Neural Networks (NN) with CRFs by modeling the unary potential functions of a CRF as NN models. They model the pairwise potentials as a parameter matrix [A] where the entry A_{i,j} corresponds to the transition probability from label i to label j. Incorporating CRF inference in Neural Network models helps in labeling exact boundaries of various named entities by enforcing pairwise constraints.

This work focuses on labeling clinical events (medication, indication, and adverse drug events) and event related attributes (medication dosage, route, etc.) in unstructured clinical notes from Electronic Health Records. Later on in Section 4, we explicitly define the clinical events and attributes that we evaluate on. In the interest of brevity, for the rest of the paper, we use the broad term "Clinical Entities" to refer to all medically relevant information that we are interested in labeling.

Detecting entities in clinical documents such as Electronic Health Record notes composed by hospital staff presents a somewhat different set of challenges than similar sequence labeling applications in the open domain. This difference is partly due to the critical nature of the medical domain, and partly due to the nature of clinical texts and entities therein. Firstly, in the medical domain, extraction of the exact clinical phrase is extremely important. The names of clinical entities often follow polynomial nomenclature. Disease names such as Uveal melanoma or hairy cell leukemia need to be identified exactly, since partial names (hairy cell or melanoma) might have significantly different meanings. Additionally, important clinical entities can be relatively rare events in Electronic Health Records. For example, mentions of Adverse Drug Events occur once every six hundred words in our corpus. CRF inference with the NN models cited previously does improve exact phrase labeling. However, better ways of modeling the pairwise potential functions of CRFs might lead to improvements in labeling rare entities and detecting exact phrase boundaries.

Another important challenge in this domain is the need to model long term label dependencies. For example, in the sentence "the patient exhibited A secondary to B", the label for A is strongly related to the label prediction of B. A can either be labeled as an adverse drug reaction or a symptom if B is a Medication or Diagnosis respectively. Traditional linear chain CRF approaches that only enforce local pairwise constraints might not be able to model these dependencies. It can be argued that RNNs may implicitly model label dependencies through patterns in input features of neighboring words. While this is true, explicitly modeling the long term label dependencies can be expected to perform better.

In this work, we explore various methods of structured learning using RNN based feature extractors. We use LSTM as our RNN model. Specifically, we model the CRF pairwise potentials using Neural Networks. We also model an approximate version of skip-chain CRF to capture the aforementioned long term label dependencies. We compare the proposed models with two baselines. The first baseline is a standard Bi-LSTM model with softmax output. The second baseline is a CRF model using handcrafted feature vectors. We show that our frameworks improve the performance when compared to the baselines or previously used CRF-LSTM models. To the best of our knowledge, this is the only work focused on usage and analysis of RNN based structured learning techniques on extraction of clinical entities from EHR notes.

2 Related Work

As mentioned in the previous sections, both Neural Networks and Conditional Random Fields have been widely used for sequence labeling tasks in NLP. Especially, CRFs (Lafferty et al., 2001) have a long history of being used for various sequence labeling tasks in general and named entity recognition in particular. Some early notable works include McCallum et al. (2003), Sarawagi et al. (2004) and Sha et al. (2003). Hammerton et al. (2003) and Chiu et al. (2015) used Long Short Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) for named entity recognition.

Several recent works on both image and text based domains have used structured inference to improve the performance of Neural Network based models. In NLP, Collobert et al. (2011) used Convolutional Neural Networks to model the unary potentials. Specifically for Recurrent Neural Networks, Lample et al. (2016) and Huang et al. (2015) used LSTMs to model the unary potentials of a CRF.

In biomedical named entity recognition, several approaches use a biological corpus annotated with entities such as protein or gene names. Settles (2004) used Conditional Random Fields to extract occurrences of protein, DNA and similar biological entity classes. Li et al. (2015) recently used LSTM for named entity recognition of protein/gene names from the BioCreative corpus. Gurulingappa et al. (2010) evaluated various existing biomedical dictionaries on extraction of adverse effects and diseases from a corpus of Medline abstracts.

This work uses a real world clinical corpus of Electronic Health Records annotated with various clinical entities. Jagannatha et al. (2016) recently showed that RNN based models outperform CRF models on the task of medical event detection in clinical documents. Other works using a real world clinical corpus include Rochefort et al. (2015), who worked on narrative radiology reports. They used an SVM-based classifier with a bag of words feature vector to predict deep vein thrombosis and pulmonary embolism. Miotto et al. (2016) used a denoising autoencoder to build an unsupervised representation of Electronic Health Records which could be used for predictive modeling of a patient's health.

3 Methods

We evaluate the performance of three different Bi-LSTM based structured prediction models described in sections 3.2, 3.3 and 3.4. We compare this performance with two baseline methods, the Bi-LSTM (3.1) and CRF (3.5) models.

3.1 Bi-LSTM (baseline)

This model is a standard bidirectional LSTM neural network with a word embedding input layer and a softmax output layer. The raw natural language input sentence is processed with a regular expression tokenizer into a sequence of tokens x = [x_t]_1^T. The token sequence is fed into the embedding layer, which produces a dense vector representation of words. The word vectors are then fed into a bidirectional RNN layer. This bidirectional RNN along with the embedding layer is the main machinery responsible for learning a good feature representation of the data. The output of the bidirectional RNN produces a feature vector sequence ω(x) = [ω(x)]_1^T with the same length as the input sequence x. In this baseline model, we do not use any structured inference. Therefore this model alone can be used to predict the label sequence, by scaling and normalizing [ω(x)]_1^T. This is done by using a softmax output layer, which scales the output for a label l, where l ∈ {1, 2, ..., L}, as follows:

P(\tilde{y}_t = j \mid x) = \frac{\exp(\omega(x)_t W_j)}{\sum_{l=1}^{L} \exp(\omega(x)_t W_l)}    (1)

The entire model is trained end-to-end using a categorical cross-entropy loss.

3.2 Bi-LSTM CRF

This model is adapted from the Bi-LSTM CRF model described in Huang et al. (2015). It combines the bidirectional RNN layer [ω(x)]_1^T described above with linear chain CRF inference. For a general linear chain CRF, the probability of a label sequence ỹ for a given sentence x can be written as:

P(\tilde{y} \mid x) = \frac{1}{Z} \prod_{t=1}^{N} \exp\{\phi(\tilde{y}_t) + \psi(\tilde{y}_t, \tilde{y}_{t+1})\}    (2)

where φ(y_t) is the unary potential for the label position t and ψ(y_t, y_{t+1}) is the pairwise potential between the positions t and t+1. Similar to Huang et al. (2015), the outputs of the bidirectional RNN layer ω(x) are used to model the unary potentials of a linear chain CRF. In particular, the NN based unary potential φ_nn(y_t) is obtained by passing ω(x)_t through a standard feed-forward tanh layer. The binary potentials or transition scores are modeled as a matrix [A]_{L×L}. Here L equals the number of possible labels including the Outside label. Each element A_{i,j} represents the transition score from label i to label j.
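To make this scoring concrete before the normalized probability is written out in equation 3 below, here is a minimal numpy sketch of how a candidate label sequence is scored from the unary potentials and the transition matrix; the values are random placeholders rather than trained Bi-LSTM outputs, and the helper name is ours, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)
T, L = 6, 5                       # sentence length, number of labels (incl. Outside)

phi = rng.normal(size=(T, L))     # unary potentials phi_nn(y_t), placeholders for Bi-LSTM outputs
A = rng.normal(size=(L, L))       # transition matrix: A[i, j] scores a transition from label i to j

def sequence_score(y):
    """Unnormalized log-score of a candidate label sequence y of length T."""
    unary = phi[np.arange(T), y].sum()      # sum of unary potentials along the sequence
    pairwise = A[y[:-1], y[1:]].sum()       # sum of transition scores between adjacent labels
    return unary + pairwise

y = rng.integers(0, L, size=T)
print(sequence_score(y))
```

Equation 3 below exponentiates exactly this quantity and normalizes it by the partition function Z.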

The probability for a given sequence ỹ can then be calculated as:

P(\tilde{y} \mid x; \theta) = \frac{1}{Z} \prod_{t=1}^{T} \exp\{\phi_{nn}(\tilde{y}_t) + A_{\tilde{y}_t, \tilde{y}_{t+1}}\}    (3)

The network is trained end-to-end by minimizing the negative log-likelihood of the ground truth label sequence ŷ for a sentence x as follows:

\mathcal{L}(x, \hat{y}; \theta) = -\sum_{t}\sum_{y_t} \delta(y_t = \hat{y}_t) \log P(y_t \mid x; \theta)    (4)

The negative log likelihood of a given label sequence for an input sequence is calculated by sum-product message passing. Sum-product message passing is an efficient method for exact inference in Markov chains or trees.

Labels        Num. of Instances    Avg word length ± std
ADE           1807                 1.68 ± 1.22
Indication    3724                 2.20 ± 1.79
Other SSD     40984                2.12 ± 1.88
Severity      3628                 1.27 ± 0.62
Drugname      17008                1.21 ± 0.60
Duration      926                  2.01 ± 0.74
Dosage        5978                 2.09 ± 0.82
Route         2862                 1.20 ± 0.47
Frequency     5050                 2.44 ± 1.70

Table 1: Annotation statistics for the corpus.

3.3 Bi-LSTM CRF with pairwise modeling

In the previous section, the pairwise potential is calculated through a transition probability matrix [A] irrespective of the current context or word. For reasons mentioned in section 1, this might not be an effective strategy. Some clinical entities are relatively rare. Therefore a transition from an Outside label to a clinical label might not be effectively modeled by a fixed parameter matrix. In this method, the pairwise potentials are modeled through a non-linear Neural Network which is dependent on the current word and context. Specifically, the pairwise potential ψ(y_t, y_{t+1}) in equation 2 is computed by using a one dimensional CNN with 1-D filter size 2 and tanh non-linearity. At every label position t, it takes [ω(x)_t; ω(x)_{t+1}] as input and produces an L × L pairwise potential output ψ_nn(y_t, y_{t+1}). This CNN layer effectively acts as a non-linear feed-forward neuron layer, which is repeatedly applied on consecutive pairs of label positions. It uses the output of the bidirectional LSTM layer at positions t and t+1 to prepare the pairwise potential scores.

The unary potential calculation is kept the same as in Bi-LSTM-CRF. Substituting the neural network based pairwise potential ψ_nn(y_t, y_{t+1}) into equation 2, we can reformulate the probability of the label sequence ỹ given the word sequence x as:

P(\tilde{y} \mid x; \theta) = \frac{1}{Z} \prod_{t=1}^{N} \exp\{\phi_{nn}(\tilde{y}_t) + \psi_{nn}(\tilde{y}_t, \tilde{y}_{t+1})\}    (5)

The neural network is trained end-to-end with the objective of minimizing the negative log likelihood in equation 4. The negative log-likelihood scores are obtained by sum-product message passing.

3.4 Approximate Skip-chain CRF

Skip chain models are modifications to linear chain CRFs that allow long term label dependencies through the use of skip edges. These are basically edges between label positions that are not adjacent to each other. Due to these skip edges, the skip chain CRF model (Sutton and McCallum, 2006) explicitly models dependencies between labels which might be more than one position apart. The joint inference over these dependencies is taken into account while decoding the best label sequence. However, unlike the two models explained in the preceding sections, the skip-chain CRF contains loops between label variables. As a result we cannot use the sum-product message passing method to calculate the negative log-likelihood. The loopy structure of the graph in a skip chain CRF renders exact message passing inference intractable. Approximate solutions for these models include loopy belief propagation (BP), which requires multiple iterations of message passing.

However, an approach like loopy BP is prohibitively expensive in our model with large Neural Net based potential functions. The reason for this is that each gradient descent iteration for a combined RNN-CRF model requires a fresh calculation of the marginals. In one approach to mitigate this, Lin et al. (2015) directly model the messages in the message passing inference of a 2-D grid CRF for image segmentation. This bypasses the need for modeling the potential function, as well as calculating the approximate messages on the graph using loopy BP.
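For reference, the exact sum-product computation that is tractable for the two linear-chain models above (equations 3-5) but not for the skip-chain model can be sketched as follows; this is a minimal numpy illustration with toy potentials, not the authors' implementation, and a fixed transition matrix A would simply replace the position-dependent psi below.

```python
import numpy as np

rng = np.random.default_rng(1)
T, L = 6, 5

phi = rng.normal(size=(T, L))         # unary potentials phi_nn(y_t)
psi = rng.normal(size=(T - 1, L, L))  # pairwise potentials; a fixed matrix A would be repeated here

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def log_partition(phi, psi):
    """Forward (sum-product) pass over the linear chain: returns log Z."""
    alpha = phi[0]
    for t in range(1, len(phi)):
        # alpha[j] = phi_t(j) + logsumexp_i( alpha[i] + psi_{t-1}(i, j) )
        alpha = phi[t] + logsumexp(alpha[:, None] + psi[t - 1], axis=0)
    return logsumexp(alpha, axis=0)

def neg_log_likelihood(y, phi, psi):
    """Negative log-likelihood of the gold label sequence y, as in equation 4."""
    idx = np.arange(len(y))
    gold_score = phi[idx, y].sum() + psi[idx[:-1], y[:-1], y[1:]].sum()
    return log_partition(phi, psi) - gold_score

y_gold = rng.integers(0, L, size=T)
print(neg_log_likelihood(y_gold, phi, psi))
```

The approximation described next sidesteps this kind of exact pass by estimating the required quantities directly.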

Approximate CRF message passing inference: Lin et al. (2015) directly estimate the factor-to-variable message using a Neural Network that uses input image features. Their underlying reasoning is that the factor-to-variable message from factor F to label variable y_t for any iteration of loopy BP can be approximated as a function of all the input variables and previous messages that are a part of that factor. They only model one iteration of loopy BP, and empirically show that it leads to an appreciable increase in performance. This allows them to model the messages as a function of only the input variables, since the messages for the first iteration of message passing are computed using the potential functions alone.

We follow a similar approach for the calculation of variable marginals in our skip chain model. However, instead of estimating individual factor-to-variable messages, we exploit the sequence structure in our problem and estimate groups of factor-to-variable messages. For any label node y_t, the first group contains factors that involve nodes which occur before y_t in the sentence (from the left). The second group of factor-to-variable messages corresponds to factors involving nodes occurring later in the sentence. We use recurrent computational units like LSTM to estimate the sum of log factor-to-variable messages within a group. Essentially, we use bidirectional recurrent computation to estimate all the incoming factors from left and right separately.

To formulate this, let us assume for now that we are using skip edges to connect the current node t to m preceding and m following nodes. Each edge, skip or otherwise, is denoted by a factor which contains the binary potential of the edge and the unary potential of the connected node. As mentioned earlier, we will divide the factors associated with node t into two sets, F_L(t) and F_R(t). Here F_L(t) contains all factors formed between the variables from the group {y_{t−m}, ..., y_{t−1}} and y_t. So we can formulate the combined message from factors in F_L(t) as

\beta_L(y_t) = \sum_{F \in F_L(t)} \beta_{F \to t}(y_t)    (6)

The combined message from factors in F_R(t), which contains variables from y_{t+1} to y_{t+m}, can be formulated as:

\beta_R(y_t) = \sum_{F \in F_R(t)} \beta_{F \to t}(y_t)    (7)

We also need the unary potential of the label variable y_t to compose its marginal. The unary potentials of each variable from {y_{t−m}, ..., y_{t−1}} and {y_{t+1}, ..., y_{t+m}} should already be included in their respective factors. The log of the unnormalized marginal P̄(y_t | x) for the variable y_t can therefore be calculated by

\log \bar{P}(y_t \mid x) = \beta_R(y_t) + \beta_L(y_t) + \phi(y_t)    (8)

Similar to Lin et al. (2015), in the interest of limited network complexity, we use only one message passing iteration. In our setup, this means that a variable-to-factor message from a neighboring variable y_i to the current variable y_t contains only the unary potential of y_i and the binary potential between y_i and y_t. As a consequence of this, we can see that β_L(y_t) can be written as:

\beta_L(y_t) = \sum_{i=t-m}^{t-1} \log \sum_{y_i} \exp\{\psi(y_t, y_i) + \phi(y_i)\}    (9)

Similarly, we can formulate a function for β_R(y_t):

\beta_R(y_t) = \sum_{i=t+1}^{t+m} \log \sum_{y_i} \exp\{\psi(y_t, y_i) + \phi(y_i)\}    (10)

Modeling the messages using RNN: As mentioned previously in equation 8, we only need to estimate β_L(y_t), β_R(y_t) and φ(y_t) to calculate the marginal of the variable y_t. We can use the φ_nn(y_t) framework introduced in section 3.2 to estimate the unary potential for y_t. We use different directions of a bidirectional LSTM to estimate β_R(y_t) and β_L(y_t). This eliminates the need to explicitly model and learn pairwise potentials for variables that are not immediate neighbors. The input to this layer at position t is [φ_nn(y_t); ψ_nn(y_t, y_{t+1})] (composed of the potential functions described in section 3.3). This can be viewed as an LSTM layer aggregating beliefs about y_t from the unary and binary potentials of [y]_1^{t−1} to approximate the sum of messages from the left side, β_L(y_t).
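To make equations 8-10 concrete before the text continues with the right-to-left direction, here is a minimal numpy sketch of the explicit message sums that the bidirectional LSTM is trained to approximate; the potentials are toy values rather than the learned φ_nn and ψ_nn, and m is the skip-edge radius.

```python
import numpy as np

rng = np.random.default_rng(2)
T, L, m = 8, 5, 2                     # sentence length, label count, skip-edge radius

phi = rng.normal(size=(T, L))         # unary potentials phi(y_i)
psi = rng.normal(size=(T, T, L, L))   # toy pairwise potentials psi(y_t, y_i) for every pair (t, i)

def logsumexp(a, axis):
    mx = a.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(a - mx).sum(axis=axis, keepdims=True))).squeeze(axis)

def beta_left(t):
    """Equation 9: summed log messages from the m nodes preceding position t."""
    total = np.zeros(L)
    for i in range(max(0, t - m), t):
        total += logsumexp(psi[t, i] + phi[i][None, :], axis=1)   # marginalize over y_i
    return total

def beta_right(t):
    """Equation 10: summed log messages from the m nodes following position t."""
    total = np.zeros(L)
    for i in range(t + 1, min(T, t + m + 1)):
        total += logsumexp(psi[t, i] + phi[i][None, :], axis=1)
    return total

t = 4
log_unnormalized_marginal = beta_right(t) + beta_left(t) + phi[t]   # equation 8
print(log_unnormalized_marginal)
```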

Similarly, β_R(y_t) can be approximated from the LSTM aggregating information from the opposite direction. Formally, β_L(y_t) is approximated as a function of neural network based unary and binary potentials as follows:

\beta_L(y_t) \approx f\big([\phi_{nn}(y_i); \psi_{nn}(y_i, y_{i+1})]_1^{t-1}\big)    (11)

Using LSTM as the choice for recurrent computation here is advantageous, because LSTMs are able to learn long term dependencies. In our framework, this allows them to learn to prioritize more relevant potential functions from the sequence [φ_nn(y_i); ψ_nn(y_i, y_{i+1})]_1^{t−1}. Another advantage of this method is that we can approximate skip edges between all preceding and following nodes, instead of modeling just the m surrounding ones. This is because LSTM states are maintained throughout the sentence.

The partition function for y_t can be easily obtained by using logsumexp over all label entries of the unnormalized log marginal shown in equation 8 as follows:

Z_t = \sum_{y_t} \exp[\beta_R(y_t) + \beta_L(y_t) + \phi(y_t)]    (12)

Here the partition function Z_t is different for different positions t. Due to our approximations, it is not guaranteed that the partition functions calculated from different marginals of the same sentence are equal. The normalized marginal can now be calculated by normalizing log P̄(y_t | x) in equation 8 using Z_t. The model is optimized using a cross entropy loss between the true marginal and the predicted marginal. The loss for a sentence x with a ground truth label sequence ŷ is provided in equation 13.

\mathcal{L}(x, \hat{y}; \theta) = -\sum_{t}\sum_{y_t} \delta(y_t = \hat{y}_t)\big(\beta_R(y_t; \theta) + \beta_L(y_t; \theta) + \phi(y_t; \theta) - \log Z_t(\theta)\big)    (13)

3.5 CRF (baseline)

We use the linear chain CRF, which is a widely used model in extraction of clinical named entities. As mentioned previously, Conditional Random Fields explicitly model dependencies between output variables conditioned on a given input sequence.

The main inputs to the CRF in this model are not RNN outputs, but word inputs and their corresponding word vector representations. We add additional sentence features consisting of four vectors. Two of them are bag of words representations of the sentence sections before and after the word respectively. The remaining two vectors are dense vector representations of the same sentence sections. The dense vectors are calculated by taking the mean of all individual word vectors in the sentence section. We add these features to explicitly mimic information provided by the bidirectional chains of the LSTM models.

Models                        Strict Evaluation (Exact Match)     Relaxed Evaluation (Word based)
                              Recall    Precision   F-score       Recall    Precision   F-score
CRF                           0.7385    0.8060      0.7708        0.7889    0.8040      0.7964
Bi-LSTM                       0.8101    0.7845      0.7971        0.8402    0.8720      0.8558
Bi-LSTM CRF                   0.7890    0.8066      0.7977        0.8068    0.8839      0.8436
Bi-LSTM CRF-pair              0.8073    0.8266      0.8169        0.8245    0.8527      0.8384
Approximate Skip-Chain CRF    0.8364    0.8062      0.8210        0.8614    0.8651      0.8632

Table 2: Cross validated micro-average of Precision, Recall and F-score for all clinical tags.

4 Dataset

We use an annotated corpus of 1154 English Electronic Health Records from cancer patients. Each note was annotated[2] by two annotators who label clinical entities into several categories. These categories can be broadly divided into two groups, Clinical Events and Attributes. Clinical events include any specific event that causes or might contribute to a change in a patient's medical status. Attributes are phrases that describe certain important properties about the events.

[2] The annotation guidelines can be found at https://github.com/abhyudaynj/LSTM-CRF-models/blob/master/annotation.md

Figure 1: Plots of Recall, Precision and F-score for RNN based methods. The metrics with prefix Strict are using phrase based evaluation. Relaxed metrics use word based evaluation. Bar-plots are in order with Bi-LSTM on top and Approx-skip-chain-CRF at the bottom.

Clinical Event categories in this corpus are Adverse Drug Event (ADE), Drugname, Indication and Other Sign Symptom and Diseases (Other SSD). ADE, Indication and Other SSD are events having a common vocabulary of Signs, Symptoms and Diseases (SSD). They can be differentiated based on the context that they are used in. A certain SSD should be labeled as ADE if it can be manually identified as a side effect of a drug based on the evidence in the clinical note. It is an Indication if it is an affliction that a doctor is actively treating with a medication. Any other SSD that does not fall into the above two categories (e.g. an SSD in the patient's history) is labeled as Other SSD. The Drugname event labels any medication or procedure that a physician prescribes.

The attribute categories contain the following properties: Severity, Route, Frequency, Duration and Dosage. Severity is an attribute of the SSD event types, used to label the severity of a disease or symptom. Route, Frequency, Duration and Dosage are attributes of Drugname. They are used to label the medication method, frequency of dosage, duration of dosage, and the dosage quantity respectively. The annotation statistics of the corpus are provided in Table 1.

5 Experiments

Each document is split into separate sentences and the sentences are tokenized into individual word and special character tokens. The models operate on the tokenized sentences. In order to accelerate the training procedure, all LSTM models use batch-wise training with a batch of 64 sentences. In order to do this, we restricted the sentence length to 50 tokens. All sentences longer than 50 tokens were split into shorter samples, and shorter sentences were pre-padded with masks. The CRF baseline model (3.5) does not use batch training and so the sentences were used unaltered.
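A minimal sketch of the sentence-length handling just described (splitting long sentences into chunks of at most 50 tokens and pre-padding shorter ones); the mask token and helper names are illustrative, not taken from the released code.

```python
def split_and_prepad(tokens, max_len=50, pad="<MASK>"):
    """Split a token list into chunks of at most max_len tokens and pre-pad each chunk."""
    chunks = [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)] or [[]]
    return [[pad] * (max_len - len(c)) + c for c in chunks]

def batches(sentences, batch_size=64):
    """Yield fixed-size batches of fixed-length, pre-padded token sequences."""
    padded = [chunk for s in sentences for chunk in split_and_prepad(s)]
    for i in range(0, len(padded), batch_size):
        yield padded[i:i + batch_size]

sample = ["the patient exhibited rash secondary to penicillin".split()]
for batch in batches(sample):
    print(len(batch), len(batch[0]))   # 1 sentence in the batch, padded to 50 tokens
```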

The first layer for all LSTM models was a 200 dimensional word embedding layer. In order to improve performance, we initialized the embedding layer values in these models with a skip-gram word embedding (Mikolov et al., 2013). The skip-gram embedding was calculated using a combined corpus of PubMed open access articles, English Wikipedia and an unlabeled corpus of around a hundred thousand Electronic Health Records. The EHRs used in the annotated corpus are not in this unlabeled EHR corpus. This embedding is also used to provide word vector representations to the CRF baseline model.

The bidirectional LSTM layer which outputs ω(x) contains LSTM neurons with a hidden size ranging from 200 to 250. This hidden size is kept variable in order to control for the number of trainable parameters between different LSTM based models. This helps ensure that the improved performance in these models is only because of the modified model structure, and not an increase in trainable parameters. The hidden size is varied in such a way that the number of trainable parameters is close to 3.55 million. Therefore, the Approx skip chain CRF has a hidden layer size of 200, while the standard Bi-LSTM model has a hidden layer size of 250. Since the ω(x) layer is bidirectional, this effectively means that the Bi-LSTM model has a 500 dimensional hidden layer, while the Approx skip chain CRF model has a 400 dimensional hidden layer.

We use dropout (Srivastava et al., 2014) with a probability of 0.50 in all LSTM models in order to improve regularization performance. We also use batch norm (Ioffe and Szegedy, 2015) between layers wherever possible in order to accelerate training. All RNN models are trained in an end-to-end fashion using Adagrad (Duchi et al., 2011) with momentum. The CRF model was trained using L-BFGS with L2 regularization. We use Begin Inside Outside (BIO) label modifiers for models that use a CRF objective.

We use ten-fold cross validation for our results. The documents are divided into training and test documents. From each training set fold, 20% of the sentences form the validation set which is used for model evaluation during training and for early stopping.

We report the word based and exact phrase match based micro-averaged recall, precision and F-score. Exact phrase match based evaluation is calculated on a per phrase basis, and considers a phrase as positively labeled only if the phrase exactly matches the true boundary and label of the reference phrase. The word based evaluation metric is calculated on labels of individual words. A word's predicted label is considered correct if it matches the reference label, irrespective of whether the remaining words in its phrase are labeled correctly. Word based evaluation is a more relaxed metric than phrase based evaluation.

6 Results

The micro-averaged Precision, Recall and F-score for all five models are shown in Table 2. We report both strict (exact match) and relaxed (word based) evaluation results. As shown in Table 2, the best performance is obtained by the Skip-Chain CRF (0.8210 for strict and 0.8632 for relaxed evaluation). All LSTM based models outperform the CRF baseline. Bi-LSTM-CRF and Bi-LSTM-CRF-pair models using exact CRF inference improve the precision of strict evaluation by 2 to 5 percentage points. Bi-LSTM CRF-pair achieved the highest precision for exact-match. However, the recall (both strict and relaxed) for exact CRF-LSTM models is less than Bi-LSTM. This reduction in recall is much less in the Bi-LSTM-pair model. In relaxed evaluation, only the Skip Chain model has a better F-score than the baseline LSTM. Overall, Bi-LSTM-CRF-pair and Approx-Skip-Chain models lead to performance improvements. However, the standard Bi-LSTM-CRF model does not provide an appreciable increase over the baseline.

Figure 1 shows the breakdown of performance for each RNN model with respect to individual clinical entity labels. The CRF baseline model performance is not shown in Figure 1, because its performance is consistently lower than the Bi-LSTM-CRF model across all label categories. We use a pairwise t-test on the strict evaluation F-score for each fold in cross validation to calculate the statistical significance of our scores. The improvement in F-score for Bi-LSTM-CRF-pair and Approx-Skip Chain as compared to the Bi-LSTM baseline is statistically significant (p < 0.01). The difference between Bi-LSTM-CRF and the Bi-LSTM baseline does not appear to be statistically significant (p > 0.05). However, the improvements over the CRF baseline for all LSTM models are statistically significant.
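Since the strict/relaxed distinction in Table 2 drives much of the discussion that follows, here is a minimal sketch of how the two views can be computed from BIO-tagged sequences; this is a hypothetical helper for illustration, not the authors' evaluation script, and it omits the micro-averaging across folds.

```python
def phrases(tags):
    """Extract (start, end, label) spans from a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags + ["O"]):              # sentinel flushes the last span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
    return set(spans)

def prf(tp, n_pred, n_gold):
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def strict_scores(gold, pred):
    """Strict evaluation: a phrase counts only if boundary and label both match exactly."""
    g, p = phrases(gold), phrases(pred)
    return prf(len(g & p), len(p), len(g))

def relaxed_scores(gold, pred):
    """Relaxed evaluation: every non-Outside token is scored independently."""
    tp = sum(1 for gt, pt in zip(gold, pred) if gt != "O" and pt != "O" and gt[2:] == pt[2:])
    return prf(tp, sum(t != "O" for t in pred), sum(t != "O" for t in gold))

gold = ["O", "B-ADE", "I-ADE", "O", "B-Drugname"]
pred = ["O", "B-ADE", "O", "O", "B-Drugname"]
print(strict_scores(gold, pred))    # exact-match view
print(relaxed_scores(gold, pred))   # word-level view
```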

7 Discussion

Overall, the Approx-Skip-Chain CRF model achieved better F-scores than CRF, Bi-LSTM and Bi-LSTM-CRF in both strict and relaxed evaluations. The results of strict evaluation, as shown in Figure 1, are our main focus of discussion due to their importance in the clinical domain. As expected, the two exact inference-based CRF-LSTM models (Bi-LSTM-CRF and Bi-LSTM-CRF-pair) show the highest precision for all labels. Approx-Skip-Chain CRF's precision is lower (due to approximate inference) but it still mostly outperforms Bi-LSTM. The recall for Skip Chain CRF is almost equal to or better than all other models due to its robustness in modeling dependencies between distant labels. The variations in recall contribute to the major differences in F-scores. These variations can be due to several factors including the rarity of that label in the dataset, the complexity of phrases of a particular label, etc.

We believe the exact CRF-LSTM models described here require more training samples than the baseline Bi-LSTM to achieve a comparable recall for labels that are complex or "difficult to detect". For example, as shown in Table 1, we can divide the labels into frequent (Other SSD, Indication, Severity, Drugname, Dosage, and Frequency) and rare or sparse (Duration, ADE, Route). We can make a broad generalization that exact CRF models (especially Bi-LSTM-CRF) have somewhat lower recall for rare labels. This is true for most labels except for Route, Indication, and Severity. The CRF models have very close recall (0.780, 0.782) to the baseline Bi-LSTM (0.803) for Route even though its number of incidences is lower (2,862 incidences) than Indication (3,724 incidences) and Severity (3,628 incidences), both of which have lower recall even though their incidences are much higher.

The complexity of each label can explain the aforementioned phenomenon. Route, for instance, frequently contains unique phrases such as "by mouth" or "p.o.," and is therefore easier to detect. In contrast, Indication is ambiguous. Its vocabulary is close to two other labels: ADE (1,807 incidences) and the most populous Other SSD (40,984 incidences). As a consequence, it is harder to separate the three labels. Models need to learn cues from the surrounding context, which is more difficult and requires more samples. This is why the recall for Indication is lower for CRF-LSTM models, even though its number of incidences is higher than Route. To further support our explanation, our results show that the exact CRF-LSTM models mislabeled around 40% of Indication words as Other SSD, as opposed to just 20% in the case of the Bi-LSTM baseline. The label Severity is a similar case. It contains non-label-specific phrases such as "not terribly", "very rare" and "small area," which may explain why almost 35% of Severity words are mislabeled as Outside by the Bi-LSTM-CRF as opposed to around 20% by the baseline.

It is worthwhile to note that among exact CRF-LSTM models, the recall for Bi-LSTM-CRF-pair is much better than Bi-LSTM-CRF even for sparse labels. This validates our initial hypothesis that Neural Net based pairwise modeling may lead to better detection of rare labels.

8 Conclusion

We have shown that modeling pairwise potentials and using an approximate version of Skip-chain inference increase the performance of LSTM-CRF models. We also show that these models perform much better than baseline LSTM and CRF models. These results suggest that structured prediction models are good directions for improving the exact phrase extraction of clinical entities.

Acknowledgments

We thank the UMassMed annotation team: Elaine Freund, Wiesong Liu, Steve Belknap, Nadya Frid, Alex Granillo, Heather Keating, and Victoria Wang for creating the gold standard evaluation set used in this work. We also thank the anonymous reviewers for their comments and suggestions.

This work was supported in part by the grant HL125089 from the National Institutes of Health (NIH). We also acknowledge the support from the United States Department of Veterans Affairs (VA) through Award 1I01HX001457. This work was also supported in part by the Center for Intelligent Information Retrieval. The contents of this paper do not represent the views of CIIR, NIH, VA or the United States Government.

References

Jason PC Chiu and Eric Nichols. 2015. Named entity recognition with bidirectional lstm-cnns. arXiv preprint arXiv:1511.08308.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537.

John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12:2121–2159.

Harsha Gurulingappa, Roman Klinger, Martin Hofmann-Apitius, and Juliane Fluck. 2010. An empirical evaluation of resources for the identification of diseases and adverse effects in biomedical literature. In 2nd Workshop on Building and evaluating resources for biomedical text mining (7th edition of the Language Resources and Evaluation Conference).

James Hammerton. 2003. Named entity recognition with long short-term memory. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4, pages 172–175. Association for Computational Linguistics.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8):1735–1780.

Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional lstm-crf models for sequence tagging. arXiv preprint arXiv:1508.01991.

Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.

Abhyuday Jagannatha and Hong Yu. 2016. Bidirectional rnn for medical event detection in electronic health records. In Proceedings of NAACL-HLT, pages 473–482.

John Lafferty, Andrew McCallum, and Fernando CN Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data.

Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.

Lishuang Li, Liuke Jin, Zhenchao Jiang, Dingxin Song, and Degen Huang. 2015. Biomedical named entity recognition based on extended recurrent neural networks. In Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on, pages 649–652. IEEE.

Guosheng Lin, Chunhua Shen, Ian Reid, and Anton van den Hengel. 2015. Deeply learning the messages in message passing inference. In Advances in Neural Information Processing Systems, pages 361–369.

Andrew McCallum and Wei Li. 2003. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4, pages 188–191. Association for Computational Linguistics.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.

Riccardo Miotto, Li Li, Brian A Kidd, and Joel T Dudley. 2016. Deep patient: An unsupervised representation to predict the future of patients from the electronic health records. Scientific reports, 6:26094.

Christian M Rochefort, Aman D Verma, Tewodros Eguale, Todd C Lee, and David L Buckeridge. 2015. A novel method of adverse event detection can accurately identify venous thromboembolisms (vtes) from narrative electronic health record data. Journal of the American Medical Informatics Association, 22(1):155–165.

Sunita Sarawagi and William W Cohen. 2004. Semi-markov conditional random fields for information extraction. In Advances in neural information processing systems, pages 1185–1192.

Burr Settles. 2004. Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pages 104–107. Association for Computational Linguistics.

Fei Sha and Fernando Pereira. 2003. Shallow parsing with conditional random fields. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, pages 134–141. Association for Computational Linguistics.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.

Charles Sutton and Andrew McCallum. 2006. An introduction to conditional random fields for relational learning. Introduction to statistical relational learning, pages 93–128.
