Neural Text Simplification of Clinical Letters with a Domain Specific

Neural Text Simplification of Clinical Letters with a Domain Specific Phrase Table Matthew Shardlow Raheel Nawaz Department of Computing Department of Operations, Technology, and Mathematics Events and Hospitality Management Manchester Metropolitan University Manchester Metropolitan University [email protected] [email protected] Abstract of Medical Royal Colleges, 2018). Inspired by this document, we took data from publicly avail- Clinical letters are infamously impenetrable for the lay patient. This work uses neural able datasets of clinical letters (Section3), used text simplification methods to automatically state of the art Neural Text Simplification software improve the understandability of clinical let- to improve the understandability of these docu- ters for patients. We take existing neural text ments (Section4) analysed the results and iden- simplification software and augment it with tified errors (Section5), built a parallel vocabu- a new phrase table that links complex medi- lary of complex and simple terms (Section6), inte- cal terminology to simpler vocabulary by min- grated this into the simplification system and eval- ing SNOMED-CT. In an evaluation task using crowdsourcing, we show that the results uated this with human judges, showing an overall of our new system are ranked easier to under- improvement (Section7). stand (average rank 1.93) than using the original system (2.34) without our phrase table. We 2 Related Work also show improvement against baselines in- cluding the original text (2.79) and using the The idea of simplifying texts through machine phrase table without the neural text simplifica- translation has been around some time (Wubben tion software (2.94). Our methods can easily et al., 2012; Xu et al., 2016), however with re- be transferred outside of the clinical domain cent advances in machine translation leveraging by using domain-appropriate resources to pro- deep learning (Wu et al., 2016), text simplifica- vide effective neural text simplification for any tion using neural networks (Wang et al., 2016; domain without the need for costly annotation. Nisioi et al., 2017; Sulem et al., 2018) has be- 1 Introduction come a realistic prospect. The Neural Text Sim- plification (NTS) system (Nisioi et al., 2017) uses Text Simplification is the process of automatically the freely available OpenNMT (Klein et al., 2017) improving the understandability of a text for an software package1 which provides sequence to se- end user. In this paper, we use text simplification quence learning between a source and target lan- methods to improve the understandability of clin- guage. In the simplification paradigm, the source ical letters. Clinical letters are written by doctors language is difficult to understand language and and typically contain complex medical language the target language is an easier version of that lan- that is beyond the scope of the lay reader. A pa- guage (in our case both English, although other tient may see these if they are addressed directly, languages can be simplified using the same ar- or via online electronic health records. If a patient chitecture). The authors of the NTS system pro- does not understand the text that they are reading, vide models trained on parallel data from English this may cause them to be confused about their di- Wikipedia and Simple English Wikipedia which agnosis, prognosis and clinical findings. Recently, can be used to simplify source documents in En- the UK Academy of Medical Royal Colleges in- glish. NTS provides lexical simplifications at troduced the “Please Write to Me” Campaign, the level of both single lexemes and multiword which encouraged clinicians to write directly to expressions in addition to syntactic simplifica- patients, avoid latin-phrases and acronyms, ditch tions such as paraphrasing or removing redundant redundant words and generally write in a man- ner that is accessible to a non-expert (Academy 1http://opennmt.net/ 380 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 380–389 Florence, Italy, July 28 - August 2, 2019. c 2019 Association for Computational Linguistics grammatical structures. Neural Machine Transla- i2b2 MIMIC Total tion is not perfect and may sometimes result in er- Records 149 150 299 rors. A recent study found that one specific area of Words 80,273 699,798 780,071 concern was lexical cohesion (Voita et al., 2019), Avg. Words 538.7 4665.3 2,608.9 which would affect the readability and hence sim- plicity of a resulting text. Table 1: Corpus statistics Phrase tables for simplification have also been applied in the context of paraphrasing systems filtered these for all records containing more than where paraphrases are identified manually (Hoard 10 tokens. This gave us 149 records to work with. et al., 1992) or learnt from corpora (Yatskar et al., We concatenated all the information from each 2010; Grabar et al., 2014; Hasan et al., 2016) and record into one file and did no further preprocess- stored in a phrase table for later application to a ing of this data as it was already tokenised and nor- text. A paraphrase consists of a complex phrase malised sufficiently. paired with one or more simplifications of that phrase. These are context specific and must be 3.2 MIMIC applied at the appropriate places to avoid seman- In addition to i2b2, we also downloaded data from tic errors that lead to loss of meaning (Shardlow, MIMIC-III v1.4 (Johnson et al., 2016) (referred to 2014). herein as MIMIC). MIMIC provides over 58,000 The clinical/medical domain recieves much at- hospital records, with detailed clinical informa- tention for NLP (Shardlow et al., 2018; Yunus tion regarding a patient’s care. One key differ- et al., 2019; Jahangir et al., 2017; Nawaz et al., ence between MIMIC and i2b2 was that each 2012) and is well suited to the task of text simplifi- of MIMIC’s records contained multiple discrete cation as there is a need for experts (i.e., clinicians) statements separated by time. We separated these to communicate with non-experts (i.e., patients) in sub-records, and selected the 150 with the largest a language commonly understood by both. Previ- number of tokens. This ensured that we had se- ous efforts to address this issue via text simplifi- lected a varied sample from across the documents cation have focussed on (a) public health informa- that were available to us. We did not use all the tion (Kloehn et al., 2018), where significant inves- data available to us due to the time constraints of tigations have been undertaken to understand what (a) running the software and (b) performing the makes language difficult for a patient and (b) the analysis on the resulting documents. We prepro- simplification of medical texts in the Swedish lan- cessed this data using the tokenisation algorithm guage (Abrahamsson et al., 2014), which presents distributed with OpenNMT. its own unique set of challenges for text simplification due to compound words. 4 Neural Text Simplification 3 Data Collection We used the publicly available NTS system (Ni- sioi et al., 2017). This package is freely avail- To assess the impact of simplification on patient able via GitHub2. We chose to use this rather understanding, we obtained 2 datasets represent- than reimplementing our own system as it allows ing clinical texts that may be viewed by a patient. us to better compare our work to the current state We selected data from the i2b2 shared task, as well of the art and makes it easier for others to repro- as data from MIMIC. A brief description of each duce our work. We have not included details of the dataset, along with the preprocessing we applied specific algorithm that underlies the OpenNMT is below. We selected 149 records from i2b2 and framework, as this is not the focus of our paper 150 from MIMIC. Corpus statistics are given in and is reported on in depth in the original paper, Table1. where we would direct readers. Briefly, their system uses an Encoder-Decoder LSTM layer with 3.1 i2b2 500 hidden units, dropout and attention. Original The i2b2 2006 Deidentification and Smoking words are substituted when an out of vocabulary Challenge (Uzuner et al., 2007) consists of 889 word is detected, as this is appropriate in mono- unannotated, de-identified discharge summaries. 2https://github.com/senisioi/ We selected the test-set of 220 patient records and NeuralTextSimplification/ 381 lingual machine translation. The simplification i2b2 MIMIC model that underpins the NTS software is trained Flesch Pre 8.70 6.40 using aligned English Wikipedia and Simple En- Kincaid Post 6.46 4.84 glish Wikipedia data. This model is distributed as P-Value < 0:001 < 0:001 part of the software. Gunning Pre 14.53 12.69 We ran the NTS software on each of our 299 Fox Post 12.35 7.36 records to generate a new simplified version of P-Value < 0:001 < 0:001 each original record. We used the standard param- Coleman Pre 10.60 10.12 eters given with the NTS software as follows: Liau Post 9.04 5.90 P-Value < 0:001 < 0:001 Beam Size = 5: This parameter controls the beam search that is used to select a final sentence. Table 2: The results of calculating 3 readability indices A beam size of 1 would indicate greedy on the texts before and after simplification. We show search. a significant reduction in the metrics in each case indi- cating that the texts after simplification are suitable for n-best = 4: This causes the 4 best translations to a lower reading grade level. be output, although in practice, we only selected the best possible translation in each takes into account the ratio of words to sen- case.

Load more