Hierarchical Neural Architectures for Classifying Cancer Pathology Reports
Total Page:16
File Type:pdf, Size:1020Kb
University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange Doctoral Dissertations Graduate School 12-2019 Hierarchical Neural Architectures for Classifying Cancer Pathology Reports Shang Gao University of Tennessee, [email protected] Follow this and additional works at: https://trace.tennessee.edu/utk_graddiss Recommended Citation Gao, Shang, "Hierarchical Neural Architectures for Classifying Cancer Pathology Reports. " PhD diss., University of Tennessee, 2019. https://trace.tennessee.edu/utk_graddiss/5722 This Dissertation is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council: I am submitting herewith a dissertation written by Shang Gao entitled "Hierarchical Neural Architectures for Classifying Cancer Pathology Reports." I have examined the final electronic copy of this dissertation for form and content and recommend that it be accepted in partial fulfillment of the equirr ements for the degree of Doctor of Philosophy, with a major in Computer Science. Georgia Tourassi, Major Professor We have read this dissertation and recommend its acceptance: Arvind Ramanathan, Hairong Qi, Russell Zaretzki Accepted for the Council: Dixie L. Thompson Vice Provost and Dean of the Graduate School (Original signatures are on file with official studentecor r ds.) Hierarchical Neural Architectures for Classifying Cancer Pathology Reports A Dissertation Presented for the Doctor of Philosophy Degree The University of Tennessee, Knoxville Shang Gao December 2019 c by Shang Gao, 2019 All Rights Reserved. ii This work is dedicated to Nala, my landlord's cat, who thinks she is the queen of the universe. iii Acknowledgments This work was made possible thanks to the help of my esteemed advisors, Arvind Ramanathan and Georgia Tourassi, and my committee members, Hairong Qi and Russell Zaretzki. I would also like to thank the members of the Biomedical Sciences, Engineering, and Computing group at Oak Ridge National Laboratory for their help with this work { Mohammed Alawad, John Qiu, Jacob Hinkle, Noah Schaefferkoetter, Hong-Jun Yoon, and Blair Christian. Lastly, I would like to thank my wife Dasha Herrmannova for her support and for proofreading this work. This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DEAC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC5206NA25396, and Oak Ridge National Laboratory under Contract DE- AC05-00OR22725 This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. iv Abstract Electronic health records (EHRs) are the primary method for documenting and storing patient outcomes in modern healthcare; data mining and machine learning approaches utilize the information stored in EHRs to assist in clinical decision support and other critical healthcare applications. Important information in EHRs is often stored in the form of unstructured clinical text. Unfortunately, the state-of-the-art methods used to automatically extract useful information from unstructured clinical text lags significantly behind the state- of-the-art methods used in the general natural language processing (NLP) community for other tasks such as machine translation, question answering, and sentiment analysis. In this work, we attempt to bridge this gap by applying and developing hierarchical neural approaches to classify key data elements in cancer pathology reports, such as cancer site, histology, grade, and behavior. We (1) show that a hierarchical attention network (HAN), which has strong performance on classifying general text such as Yelp reviews and news snippets, achieves better classification accuracy and macro F-score on identifying cancer site and grade than previous state-of-the-art approaches, (2) develop a novel hierarchical self-attention network (HiSAN) which not only achieves better accuracy and macro F-score in cancer pathology pathology report classification than the HAN but also trains over 10x faster, and (3) introduce a hierarchical framework for incorporating case-level context when classifying cancer pathology reports and show that it gives a significant boost in accuracy and macro F-score. v Table of Contents 1 Introduction1 1.1 State of the Art..................................4 1.1.1 General NLP...............................4 1.1.2 Clinical NLP...............................6 1.1.3 Classifying Cancer Pathology Reports..................7 1.2 Research Objectives and Approach....................... 10 1.2.1 Hierarchical Attention Networks..................... 12 1.2.2 Hierarchical Self-Attention Networks.................. 14 1.2.3 Capturing Case-Level Context...................... 16 1.3 Dissertation Outline............................... 18 2 Hierarchical Attention Networks for Information Extraction from Cancer Pathology Reports 20 2.1 Abstract...................................... 21 2.2 Objective..................................... 22 2.3 Background.................................... 23 2.4 Materials and Methods.............................. 26 2.4.1 Dataset Description and Preprocessing................. 26 2.4.2 Word Embeddings............................ 28 2.4.3 Hierarchical Attention Network..................... 28 2.4.4 Hyperparameter Optimization...................... 31 2.4.5 Unsupervised Pretraining Step...................... 31 2.4.6 Experimental Setup and Evaluation Metrics.............. 32 vi 2.4.7 Model Baseline Comparisons....................... 33 2.5 Experimental Results............................... 35 2.5.1 Primary Site Classification........................ 35 2.5.2 Histological Grade Classification..................... 35 2.5.3 Model Visualizations........................... 37 2.5.4 Error Analysis............................... 37 2.6 Discussion and Conclusions........................... 40 3 Classifying Cancer Pathology Reports with Hierarchical Self-Attention Networks 45 3.1 Abstract...................................... 46 3.2 Introduction.................................... 47 3.3 Background.................................... 48 3.4 Materials and Methods.............................. 51 3.4.1 Dataset Details.............................. 51 3.4.2 Hierarchical Self-Attention Networks.................. 53 3.5 Results....................................... 58 3.5.1 Baselines, Hyperparameters, and Setup Details............. 59 3.5.2 Evaluation Metrics............................ 60 3.5.3 Experimental Results........................... 62 3.6 Discussion..................................... 66 3.6.1 Error Analysis............................... 66 3.6.2 Visualizing Document Embeddings................... 68 3.6.3 Visualizing Word Importance...................... 68 3.7 Conclusion..................................... 72 4 Using Case-Level Context to Classify Cancer Pathology Reports 74 4.1 Abstract...................................... 75 4.2 Introduction.................................... 75 4.3 Background.................................... 77 4.4 Materials and Methods.............................. 78 vii 4.4.1 Problem Description........................... 78 4.4.2 Capturing Case-Level Context...................... 79 4.4.3 Dataset.................................. 86 4.4.4 Baseline Models.............................. 87 4.5 Experiments.................................... 88 4.5.1 Setup Details............................... 88 4.5.2 Evaluation Metrics............................ 89 4.5.3 Results................................... 90 4.6 Discussion..................................... 93 4.7 Conclusion..................................... 98 5 Conclusion 100 5.1 Summary of Results............................... 100 5.2 Impact and Implications............................. 103 5.3 Transfer Learning................................. 105 5.4 Applications to General Text.......................... 109 5.5 A Note on ELMo and BERT........................... 113 5.6 Additional Future Work............................. 116 Bibliography 118 Appendices 137 A Chapter2 Supporting Information....................... 138 A.1 Preprocessing Pipeline.......................... 138 A.2 Preprocessing Procedures........................ 138 A.3 Corpus Histogram............................ 139 A.4 LSTMs and GRUs............................ 140 A.5 Attention Mechanism........................... 141 A.6 Hyperparameter Optimization Setup.................. 142 A.7 TF-IDF Weighted Word Embeddings.................. 143 A.8 Bootstrapping Procedures for Metrics.................. 143 viii A.9 Doc2Vec Embeddings........................... 144 A.10 CNN Embeddings............................. 145 A.11 Most Important Words for Each Classification Task.......... 145 A.12 Model Prediction Time.......................... 146 A.13 Confusion Matrices............................ 146