Procedurally Rhetorical Verb-Centric Frame Semantics as a Knowledge Representation for Argumentation Analysis of Biochemistry Articles

by

Mohammed Alliheedi

A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in Computer Science

Waterloo, Ontario, Canada, 2019

© Mohammed Alliheedi 2019

Examining Committee Membership

External Examiner: Vlado Keselj, Professor, Faculty of Computer Science, Dalhousie University
Supervisor(s): Robert E. Mercer, Professor, Dept. of Computer Science, The University of Western Ontario
               Robin Cohen, Professor, School of Computer Science, University of Waterloo
Internal Member: Jesse Hoey, Associate Professor, School of Computer Science, University of Waterloo
Internal-External Member: Randy Harris, Professor, Dept. of English Language and Literature, University of Waterloo
Other Member(s): Charles Clarke, Professor, School of Computer Science, University of Waterloo

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Abstract

The central focus of this thesis is rhetorical moves in biochemistry articles. Kanoksilapatham has provided a descriptive theory of rhetorical moves that extends Swales' CARS model to the complete biochemistry article. The thesis begins the construction of a computational model of this descriptive theory. Attention is placed on the Methods section of the articles. We hypothesize that because authors' argumentation closely follows their experimental procedure, procedural verbs may be the guide to understanding the rhetorical moves. Our work proposes an extension to the normal (i.e., VerbNet) semantic roles especially tuned to this domain.
A major contribution is a corpus of Methods sections that have been marked up for rhetorical moves and semantic roles. The writing style of this genre tends to occasionally omit semantic roles, so another important contribution is a prototype ontology that provides experimental procedure knowledge for the biochemistry domain. Our computational model employs machine learning to build its models for the semantic roles and rhetorical moves, validated against a gold standard reflecting the annotation of these texts by human experts. We provide significant insights into how to derive these annotations, and as such contribute as well to the general challenge of producing markups in the domain of biomedical science documents, where specialized knowledge is required.

Acknowledgements

This thesis would not have been possible foremost without the assistance and support of Allah (God) and then the help and support of many individuals in so many ways. At the top of the list of those I wish to credit are my supervisors, Robert (Bob) E. Mercer and Robin Cohen, and my former supervisor, Chrysanne DiMarco. Their knowledge and friendship were invaluable for the completion of this thesis. I would like to express my deepest appreciation to them for all of the things I have learned from them during the years I spent under their supervision in the PhD program. My supervisors have always been available and open to discuss and provide advice on different issues, either on my research or outside the research area. I will never forget not only their positive attitude during the PhD years, but also their understanding and support when most needed. It was a genuine honor to work with all of them.

I also would like to express my gratitude and admiration to the members of my committee, Vlado Keselj, Jesse Hoey, Randy Harris, and Charles Clarke, for their guidance and encouragement throughout this process.
I am especially grateful to the many people who contributed to this work in a variety of ways.

My appreciation and thanks extend also to my parents: Abdulrahman Alliheedi and Johra Alliheedi. My Mother (Johra) was the person who most encouraged and supported me to pursue a higher education. She was consistently available to offer a variety of support and advice. She will remain the greatest teacher in my life, from whom I learned a lot. I also cannot find enough words to express gratitude to my mother and my father for their presence in my life. I thank them for all of the support and patience they have shown since my first year in Canada to the present day. This thesis is dedicated to my parents especially my mother for her endless patience, her unconditional love, and encouragement. Her courage, support and confidence will always inspire me.

I am forever indebted to my family: to my wife, Suad, for her understanding, endless patience, and encouragement, support and confidence, and her unconditional love when it was most required; and to my children for being here at the right time, for their unconditional love, and for allowing me to have the time that I would have otherwise spent with them. I also thank Suad for being part of my daily life during my PhD years. Her presence has added valuable things to my life. Without her support and encouragement, completion of this thesis would not have been possible. She shared in all of the experiences I had in my PhD program day and night. I thank Suad for all of her support and encouragement. I am really blessed to have a wife like her. I wish also to express gratitude to my brother, Thamer, and sisters, Tahani and Nawal, for their usual supportive contacts and encouragement during my years in Waterloo. Their kind and uplifting words were a meaningful source of empowerment.

I would like also to thank all my friends for the great friendship we established during the PhD studies.
We spent countless hours discussing different issues in research and academia in general, from which I benefited greatly. My thanks extend to all my colleagues for all of the enjoyable times we spent together.

Finally, I am also extremely grateful to Al Baha University and the Saudi Bureau for funding my PhD.

Dedication

This is dedicated to my family.

Table of Contents

List of Tables
List of Figures

1 Introduction
  1.1 Background
  1.2 The Problem Statement
  1.3 An Overview of Our Design Process
    1.3.1 Support for Our Approach within the NLP Community
  1.4 Contributions

2 Background
  2.1 Argumentation Theory
    2.1.1 What is Argumentation
    2.1.2 Classical Models of Argumentation
    2.1.3 Rhetorical Approaches to Argumentation
  2.2 Computational Argumentation
    2.2.1 Approaches for Recognizing Argumentation Schemes
    2.2.2 Approaches for Detecting Argumentation
  2.3 Remarks

3 Rhetorical Moves for Biochemistry Articles
  3.1 Narrowing our Focus to the Methods Section of Biochemistry Texts
  3.2 Gaining Insights into a Set of Rhetorical Moves to Model
    3.2.1 Manual Tagging of Rhetorical Moves in a Corpus
  3.3 Reflection on Our Proposed Set of Rhetorical Moves

4 Semantic Roles
  4.1 Experimental Procedure-oriented Writing
    4.1.1 Procedural Verbs
    4.1.2 Semantic Roles
  4.2 Frame Semantics
  4.3 Semantic Role Labelling
    4.3.1 Pre-processing Step for Our Model Learning
    4.3.2 Model Learning
    4.3.3 Results and Discussion
  4.4 Our Developed SRL System
    4.4.1 Pre-processing Step for Our SRL Prediction
  4.5 Remarks
  4.6 The Derivation of Our Set of Semantic Roles

5 Annotation
  5.1 Analysis of Experimental Procedures
    5.1.1 Implicit Knowledge
    5.1.2 General vs. Procedural Verbs
    5.1.3 Sequence of Events in Procedure-oriented Writing
  5.2 Data Set
  5.3 Annotation Guidelines
    5.3.1 Annotation Scheme for Experimental Events
    5.3.2 Annotation for Rhetorical Moves
    5.3.3 Annotation for Semantic Roles
    5.3.4 Human Input and Annotation Procedures
  5.4 Inter-annotator Agreement
    5.4.1 Identification of Semantic Roles
    5.4.2 Identification of Rhetorical Moves
  5.5 Remarks

6 Ontology
  6.1 Background Information
  6.2 Related Work
  6.3 Procedure-oriented Ontology
    6.3.1 Classes and Properties
    6.3.2 Relations
  6.4 Case Study
    6.4.1 Ontology Queries using SPARQL
  6.5 Remarks

7 Rhetorical Moves Revisited and the System as a Whole
  7.1 Rhetorical Moves in Biochemistry Articles
  7.2 The Overall Structure of Our Framework
    7.2.1 Preliminary Validation of the Rhetorical Moves
  7.3