Exploring the Data Work Organization of the Gene Ontology Shuheng Wu
Florida State University Libraries
Electronic Theses, Treatises and Dissertations
The Graduate School
2014

Exploring the Data Work Organization of the Gene Ontology
Shuheng Wu

FLORIDA STATE UNIVERSITY
COLLEGE OF COMMUNICATION AND INFORMATION

EXPLORING THE DATA WORK ORGANIZATION OF THE GENE ONTOLOGY

By SHUHENG WU

A Dissertation submitted to the School of Information in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Degree Awarded: Fall Semester, 2014

Shuheng Wu defended this dissertation on October 24, 2014. The members of the supervisory committee were:

Besiki Stvilia, Professor Directing Dissertation
Henry W. Bass, University Representative
Corinne L. Jörgensen, Committee Member
Michelle M. Kazmer, Committee Member

The Graduate School has verified and approved the above-named committee members, and certifies that the dissertation has been approved in accordance with university requirements.

I dedicate this dissertation to my beloved mother, Peiqiong Ou, and to my father, my husband, and all those who have supported and helped me. It is all of you who have helped me grow and become who I am today.

ACKNOWLEDGMENTS

My deepest appreciation is owed to my major professor, Dr. Besiki Stvilia, and his family. Without his advice and help, I cannot imagine how I could have finished my doctoral coursework. Thanks go to him for introducing Activity Theory to me and for developing a theoretical framework that I can use in my current and future studies. Through working with him, I learned about the beauty of theory and the power of methodology, which will benefit my future research and career. Thanks again for his support, hard work, and wisdom. Thanks also to Dr. Hank Bass. Without his advice and help, I could never have finished my dissertation. I would also like to thank my other committee members, Dr. Corinne Jörgensen and Dr. Michelle Kazmer.
Their dedication to and passion for research in LIS set a role model for me as a researcher. Special thanks go to Dr. Gary Burnett, who is always willing to give me advice on writing and research. I would also like to express my thanks to Dr. Kathleen Burnett for her generous support of the international students of the school.

I cannot help thanking Deborah Paul, Dr. Paula Mabee, and Dr. Greg Riccardi for connecting me to the insiders of the Gene Ontology community. Thanks are also due to the people of the Gene Ontology community and the other interviewees, who were willing to spend time answering my endless questions and emails. I really appreciate having been able to study such an amazing community. I would also like to thank Naiqian Zhan for helping me recruit participants and for being with me through the good and bad times.

Thanks must go to my cohort member Adam Worrall, who helped review tons of my papers and transcriptions throughout my doctoral program. One of my best times in the program was discussing research with you. I would also like to thank my other cohort members Jung Hoon Baeg, Aisha Johnson, Sheila Baker, and Janice Newsum. Special thanks must go to Nicole Alemanne, Aaron Elkins, Yong Jeong Yi, and Melinda Whetstone, who are always willing to give me help and advice. I would also like to thank my wonderful friends in the program: Min Sook Park, Jongwook Lee, Blake Robinson, Hengyi Fu, Biyang Yu, and all the other doctoral students.

My deepest love goes to my husband, parents, and mother-in-law. Thanks to Xiang Wang for helping with my data collection and participant recruitment. Thanks to my father for inspiring me to pursue the doctoral degree. Lastly and most importantly, I would like to express my thanks, love, and respect to my dearest mother. It is you who gave me the power to finish my dissertation. I am very proud to be your daughter. I hope I make you proud and happy.
TABLE OF CONTENTS

List of Tables
List of Figures
Abstract

1. INTRODUCTION
  1.1 Problem Statement
    1.1.1 Scientific Data Curation
    1.1.2 Bio-ontologies
    1.1.3 The Gene Ontology
    1.1.4 Research Purpose and Significance
  1.2 Research Questions
  1.3 Theoretical Frameworks
    1.3.1 Activity Theory
    1.3.2 Stvilia’s Information Quality Assessment Framework
  1.4 Research Design
  1.5 Conclusion

2. LITERATURE REVIEW
  2.1 Knowledge Organization Systems
    2.1.1 Term Lists
    2.1.2 Classifications and Categories
    2.1.3 Relationship Lists
    2.1.4 Folksonomies
    2.1.5 Structure of KO Systems
    2.1.6 Comparison between Ontologies and Other KO Systems
    2.1.7 Knowledge Organization in Scientific Data Management
  2.2 Ontology Development
    2.2.1 Ontology Development Tools
    2.2.2 Ontology Development Methodologies
  2.3 Bio-ontologies
  2.4 The Gene Ontology
    2.4.1 The GO Term Record
    2.4.2 The GO Structure
    2.4.3 The Development and Maintenance of the GO
  2.5 Data Quality
    2.5.1 Data Quality Assessment Models
    2.5.2 Scientific Data Quality Problems
  2.6 Activity Theory
    2.6.1 The Origin and Development of Activity Theory
    2.6.2 Principles of Activity Theory
    2.6.3 Previous Applications
    2.6.4 Strengths and Limitations
  2.7 Stvilia’s Information Quality Assessment Framework
    2.7.1 Concepts
    2.7.2 Components and Relationships among the Components
    2.7.3 How to Use
    2.7.4 Previous Applications
    2.7.5 Strengths and Limitations