Text Mining of Mutations and Their Impact from Biomedical Literature

TEXT MINING OF MUTATIONS AND THEIR IMPACT FROM BIOMEDICAL LITERATURE

by

A. S. M. Ashique Mahmood

A dissertation submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science

Fall 2018

© 2018 A. S. M. Ashique Mahmood
All Rights Reserved

Approved: Kathleen F. McCoy, Ph.D.
Chair of the Department of Computer and Information Sciences

Approved: Babatunde A. Ogunnaike, Ph.D.
Dean of the College of Engineering

Approved: Douglas J. Doren, Ph.D.
Interim Vice Provost for Graduate and Professional Education

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Vijay K. Shanker, Ph.D.
Professor in charge of dissertation

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Cathy H. Wu, Ph.D.
Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Li Liao, Ph.D.
Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Peter McGarvey, Ph.D.
Member of dissertation committee

ACKNOWLEDGEMENTS

This research work is the outcome of years-long dedication and patience. But it would not have been possible without the support of many people around me.

First of all, I express my gratitude towards my advisor and mentor, Prof. Vijay K. Shanker. Throughout the research journey, his continuous advisement, mentoring and encouragement played an integral role in shaping this dissertation. In particular, Prof. Shanker taught me how to think critically about a research problem, how to write research papers effectively and how to present research in front of peers. These skills have helped me in my research in many ways and I believe I will continue to benefit from them in my future career. I am truly grateful for all he has done for me.

I thank my dissertation committee members: Prof. Cathy Wu, Prof. Li Liao and Dr. Peter McGarvey. Despite their busy schedules, they were kind enough to serve on my dissertation committee and helped me with their suggestions and insights regarding the applicability of my research work. I am grateful for their invaluable time and attention towards this dissertation.

I have spent many wonderful years in the BioTM lab. I have come across wonderful minds in the BioTM lab, who also played roles in shaping my research. I am thankful to Oana (Catalina Tudor) for mentoring me when I first joined the lab. She helped me get into the NLP research world. I fondly remember former and present members of the BioTM lab: Gang Li, Yifan Peng, Samir Gupta, Ruoyao Ding, Jia Ren and Peng Su. We spent a lot of time together, be it for "research" or leisurely activities. We had fun together hacking on some cool new tool as well as watching UCL matches. Thank you guys!

Since I moved to the USA, I have been lucky to have wonderful friends who were always there for me. It would take pages if I started listing why they are special to me. Instead, I just express my heartfelt gratitude to Farzana Khair, Musawir Chowdhury, Shermin Ashraf, Saif Tahsin, Samara Saif, Tareque Aziz, Firdous Saleheen, Purujit Saha, Laura Moum, Sonia Jahan, Fazle Rob, Mahfuzur Khan, Zannatun Noor, Rifat Lutful, Shafique Ahmed, Mithub Deb and Dabojani Das for their kind friendship. They all made me feel at home, while away from home.

I thank my family and relatives for their unconditional love and support that shaped my entire life. My parents, Shaheen Sultana and Mahbubul Hoq, have always believed in me and encouraged me at every step of my academic journey. I cannot thank my parents enough for this. In particular, without the love, care and sacrifices of my mother, I would not be the person that I am today. Thank you mom!

And last but not least, I thank my wife Nancy (Tanjima Ferdous). We got married while I was a PhD student, and since then she has supported my PhD journey through unconditional love, sacrifices, encouragement and patience. She is the best partner and companion I could wish for. I love her and I am grateful for all she has done for me.

In addition, I would like to thank my department (CIS, UDEL) for this wonderful opportunity of graduate education as well as for the financial support at the beginning. I also thank the funding agencies who continued to fund the research projects that I was involved with. I am grateful to our research collaborators at Georgetown University, George Washington University, Delaware Biotechnology Institute (DBI) and University of Delaware for the countless fruitful discussions, from which I learned a lot.

In a nutshell, I am grateful to each and every one who supported my journey, in one way or another. To everyone I mentioned and forgot to mention, thank you.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

Chapter

1 INTRODUCTION
  1.1 Motivation
  1.2 Thesis contributions
    1.2.1 Mutation detection
    1.2.2 Mutation-disease association
    1.2.3 Impact of genomic anomalies on drug responses
    1.2.4 Mutation impact on PPI
  1.3 Outline of the dissertation

2 MUTATION DETECTION
  2.1 Introduction
  2.2 Related works
  2.3 Approach
    2.3.1 Mutation detection
    2.3.2 Genotype/Allele detection
    2.3.3 Mutation-gene association
  2.4 Evaluation
    2.4.1 Evaluation setup
    2.4.2 Evaluation metrics
  2.5 Results and discussion
    2.5.1 Results on mutation detection
    2.5.2 Results on mutation-gene association
  2.6 Conclusion

3 MUTATION-DISEASE ASSOCIATION
  3.1 Introduction
  3.2 Related works
  3.3 Approach
    3.3.1 General relation extraction system
    3.3.2 CAIR relations
    3.3.3 MF relations
    3.3.4 Statistical relations
    3.3.5 Co-occurrence in title/conclusion
    3.3.6 Extracting specific information
      3.3.6.1 Extracting mutations
      3.3.6.2 Extracting diseases
      3.3.6.3 Patient Context (PC) sentence
    3.3.7 Extracting additional information
      3.3.7.1 Rhetorical zones
      3.3.7.2 Patient related information
  3.4 System implementation
  3.5 Evaluation
    3.5.1 Evaluation setup
    3.5.2 Evaluation metrics
  3.6 Results and discussion
    3.6.1 Results on annotated datasets
    3.6.2 Full-scale processing
  3.7 Conclusion

4 IMPACT OF GENOMIC ANOMALIES ON DRUG RESPONSES
  4.1 Introduction
  4.2 Related works
  4.3 Approach
    4.3.1 Different information types
      4.3.1.1 Association
      4.3.1.2 Comparison
      4.3.1.3 Biomarker
      4.3.1.4 Sensitization
    4.3.2 Syntactic processing
    4.3.3 Entity recognition
    4.3.4 Typing of phrases
    4.3.5 Pattern matching
    4.3.6 Extracting specific information
      4.3.6.1 Extracting drugs
      4.3.6.2 Extracting diseases
    4.3.7 Extracting additional information
  4.4 System implementation
  4.5 Evaluation
    4.5.1 Evaluation setup
    4.5.2 Evaluation metrics
  4.6 Results and discussion
    4.6.1 Results on annotated datasets
  4.7 Conclusion

5 MUTATION IMPACT ON PROTEIN-PROTEIN INTERACTIONS
  5.1 Introduction
  5.2 Related works
  5.3 Approach
    5.3.1 Extraction of PPI relation
