Sequence and Functional Analysis of Schistosoma
Total Page:16
File Type:pdf, Size:1020Kb
MINING FOR CONSERVED MOTIFS AND SIGNIFICANT FUNCTIONS IN S. MANSONI CERCARIAL SECRETIONS Amy L. Schmidbauer Submitted to the faculty of the School of Informatics in partial fulfillment of the requirements for the degree Master of Science in Bioinformatics, Indiana University December 30, 2006 ii Accepted by the Faculty of Indiana University, in partial fulfillment of the requirements for the degree of Master of Science in Bioinformatics. (Committee Chair’s signature)_______________________ Sean D. Mooney, Ph.D., Chair Master’s Thesis Committee (Second member’s signature)________________________ Xiaoman Shawn Li, Ph.D. (Third member’s signature)__________ _______________ William J. Sullivan, Ph.D. ii © 2006 Amy L. Schmidbauer All Rights Reserved iii ACKNOWLEDGMENTS This project would not have been possible without the guidance and support of many people including faculty, advisors, colleagues, friends and family. I am extremely grateful to my advisor, Dr. Sean Mooney, for welcoming me into his laboratory as a graduate student, and for providing, not only computing resources, but continued support, suggestions, and guidance as, what started out as an independent study project, grew into what became this thesis. I extend my sincere appreciation to Dr. Giselle Knudsen, an honorary member of my thesis committee, for the original project inspiration, for her enthusiastic encouragement, insight, and direction throughout the project, and for her thoughtful review of this thesis. I would like also like to extend a heartfelt thank you to Dr. William Sullivan and Dr. Xiaoman Li for their willingness to lend their time to reviewing my thesis and for the insightful feedback they provided. For statistical expertise and support I would like to extend my deepest appreciation to Dr. Eunseog Eun. Thank you also to Dr. Snehasis Mukhopadhyay for advice provided in the early stages of this work. For assistance with various software installations, I would like to thank Brandon Peters and Dustin Reynolds, and for assistance early in the project with BioPerl, I would like to thank Jessica Dantzer. It is my pleasure to thank my employer, Dow AgroSciences LLC, Indianapolis, IN, for providing computing resources and funding the majority of my graduate work. My sincere gratitude is extended to Dr. Neil Kirby of Dow AgroSciences for his flexibility and support throughout my graduate studies. Also, I would like to thank my colleagues Dr. Ignacio Larrinua, for consulting on various issues, and Mindi Dippold, for her assistance with Perl. Finally, I am very grateful to my friends and family, and especially Terry, for their support and encouragement throughout this journey. iv ABSTRACT Amy L. Schmidbauer MINING FOR CONSERVED MOTIFS AND SIGNIFICANT FUNCTIONS IN S. MANSONI CERCARIAL SECRETIONS Schistosomiasis is a disease caused by a parasitic flatworm of the genus Schistosoma. It infects an estimated 200 million people and 165 million head of livestock worldwide. There is medical interest in characterizing the parasitic proteins that interact with the human host for either the development of vaccines or the identification of drug targets. The cercarial secretome and adult tegument sub-proteome of S. mansoni have both been recently published (Knudsen, Medzihradszky et al. 2005) (van Balkom, van Gestel et al. 2005). As secretome proteins are secreted extracellularly, and tegument sub-proteome proteins are anchored in the cellular membrane, we hypothesize that both sets of proteins employ similar secretion machinery and mechanisms. Motivated by the discovery in the malarian parasite, Plasmodium falciparum, of conserved sequence motifs that are required for export downstream of N-terminal signal sequences (Hiller, Bhattacharjee et al. 2004), S. mansoni secretory and tegumental proteins were analyzed for conserved motifs using recursive iterations of MEME and MAST. To compliment the conserved motif analysis, an automated workflow to process InterProScan functional domain and GO annotation data, that employs statistical methods for determining significant functions, was developed. A conserved motif, enriched in the mechanically-induced vesicle secretion proteins, was elucidated and insight was gained into both the functions of proteins found to contain the motif, as well as the effects of different cercarial secretion induction methods. A hypothesis of the secretion model empoloyed by the invading parasite was generated. v TABLE OF CONTENTS I. INTRODUCTION................................................................................. 1 A. Introduction of Subject .................................................................... 1 B. Importance of Subject ...................................................................... 1 C. Knowledge Gap................................................................................. 2 II. BACKGROUND ................................................................................... 6 A. Related Research............................................................................... 6 B. Current Understanding.................................................................. 12 C. Research Question .......................................................................... 26 D. Intended Research Project............................................................. 26 III. METHODS .......................................................................................... 27 A. Materials: Software and Hardware..............................................27 B. Samples and Subjects ..................................................................... 28 C. Procedures and Interventions........................................................ 30 D. Statistical Analysis .......................................................................... 32 E. Expected Results............................................................................. 32 F. Alternate Plans................................................................................ 33 IV. RESULTS............................................................................................. 36 A. Introduction ..................................................................................... 36 B. Important Highlights...................................................................... 36 C. Specific Findings ............................................................................. 37 D. Summary........................................................................................ 112 V. CONCLUSION.................................................................................. 112 A. Overview of Significant Findings................................................ 112 B. Consideration of Findings in Context of Current Knowledge 114 C. Theoretical Implications of the Findings ...................................115 VI. DISCUSSION ....................................................................................116 A. Limitations of the Study............................................................... 116 B. Recommendations for Further Research...................................116 APPENDICES................................................................................................ 118 Appendix A: Summary of Sample Data Sets ......................................118 Appendix B: Control Data Set..............................................................126 Appendix C: Sequence Homology Within Sample Sets.....................132 Appendix D: Sequence Homology Between Sample Sets ..................136 Appendix E: ProP Propeptide prediction results............................... 140 Appendix F: Gene Ontology Annotation............................................. 152 Appendix G: Protein Domain Annotation ..........................................160 Appendix H: GO Entropy Calculations ..............................................171 Appendix I: Protein Domain Entropy Calculations ..........................177 Appendix J: Perl And Matlab Scripts .................................................182 Appendix K: Permission To Use Figures ............................................193 REFERENCES .............................................................................................. 198 CURRICULUM VITAE vi LIST OF FIGURES Figure 1. Praziquantel (left) and Oxamniquine (right) ..................................................................4 Figure 2. Life cycle of schistosomes.................................................................................................14 Figure 3. Scanning electron micrograph of a S. mansoni cerariae..............................................15 Figure 4. Cross section of a S. mansoni cercaria ...........................................................................16 Figure 5. Anterior organ of a S. mansoni cercaria........................................................................17 Figure 6. Components of the S. mansoni tegument.......................................................................18 Figure 7. Workflow #1 ......................................................................................................................34 Figure 8. Workflow #2 ......................................................................................................................35 Figure 9. Comparison of 3 Sample Sets..........................................................................................38 Figure 10. S. mansoni amino acid frequencies...............................................................................49