Structure-Based Subfamily Classification of Homeodomains

by Jennifer Ming-Jiun Tsai

A thesis submitted in conformity with the requirements for the degree of Master of Science
Graduate Department of Molecular Genetics
University of Toronto

© Copyright by Jennifer Ming-Jiun Tsai 2008

Abstract

Eukaryotic DNA-binding proteins mediate many important steps in embryonic development and gene regulation. A better understanding of these proteins should therefore help build a more complete picture of gene regulation. In this study, a structure-based subfamily classification of the homeodomain family of DNA-binding proteins was undertaken to determine whether sub-groupings of a protein family could be identified that correspond to differences in specific function. Subfamily-determining residues were then identified in order to gain insight into functional differences through analysis of the residue properties. Subfamilies appear to have different specific DNA binding properties, according to DNA profiles obtained from TRANSFAC [1] and other sources in the literature. Subfamily-specific residues appear to be frequently associated with the protein-DNA interface and may influence DNA binding via interactions with the DNA phosphate backbone; these residues form a conserved profile uniquely identifying each subfamily.

Acknowledgements

First and foremost, I would like to thank my husband Christopher, my parents David and Virginia, and my sister Margaret for their unfailing love and support, which has enabled me to maintain my focus on my studies and research.

I would especially like to thank my supervisor, Dr. Shoshana Wodak, for her guidance and support during my graduate studies, and for developing my knowledge of what science is all about. The opportunities to attend one of the best-known bioinformatics conferences, to give poster presentations and lectures, and to act as a teaching assistant, along with constructive criticism, advice, and ideas, all contributed to my personal development as a scientist and researcher.

I am most grateful to Dr. Boris Steipe for recognizing the potential in a young undergraduate student and nurturing my interest in science, especially computational biology, and for his enthusiastic support and constructive criticism as a member of my supervisory committee.

I would like to thank Dr. John Parkinson for his enlightening contributions during joint meetings held with his laboratory members, for his constructive suggestions and support as a member of my supervisory committee, and for his generous access to the PartiGeneDB EST database.

I would also like to thank Dr. Jack Greenblatt for his insightful contributions and support as a member of my supervisory committee, and for his willingness to share his knowledge about DNA-binding proteins.

I would like to thank all my colleagues, friends, and mentors who have challenged and supported me along the way, and who have made my academic journey at the University of Toronto such a rewarding one. Special thanks go to my fellow lab colleagues past and present at the Centre for Computational Biology at the Hospital for Sick Children, especially Miguel Santos for his assistance with the structural analysis; my fellow students Gerald Quon, Torben Broemstrup, and Jim Vlasblom; and the members of the Parkinson lab, especially Chris Sanford and David He, for insightful and entertaining discussions over coffee breaks and inter-lab lunches, and for creating a friendly and supportive working environment.
Table of Contents

Structure-Based Subfamily Classification of Homeodomains
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Appendices
Chapter 1 – Introduction
  1.1 Importance of Studying Eukaryotic DNA-Binding Proteins
    1.1.1 Regulation of Gene Expression: From DNA to RNA to Protein
    1.1.2 DNA Binding Proteins: An Overview
    1.1.3 Homeodomains: A DNA Binding Domain Family
  1.2 Sequence Conservation and Inferences
  1.3 Principles of Protein Classification
    1.3.1 Importance of Sequence Alignment
    1.3.2 Phylogenetic Representation Methods
    1.3.3 Protein Family and Subfamily Classifications
  1.4 Current Strategies for Determining Functional Residues in Proteins
  1.5 A Requirement for More Structural Insight: A Case Study
Chapter 2 – Experimental Protocol
  2.1 Outline of the Protocol
  2.2 Curation of Protein Structures
  2.3 Curation of Protein Sequences
  2.4 Structure-Based Sequence Alignment
    2.4.1 MALECON Multiple Structure Alignment
    2.4.2 ClustalX Sequence-to-Profile Alignment
  2.5 Identifying the Subfamilies and Subfamily-Determining Residues
    2.5.1 Bête Subfamily Classification
    2.5.2 SDPpred: Subfamily Determining Residue Identification
  2.6 Validation of Bête Neighbour-Joined Tree
  2.7 Validation of Subfamily Integrity
  2.8 Structural Analysis of Subfamily-Determining Residues
  2.9 Subfamily DNA Binding Profile Analysis
Chapter 3 – Results
  3.1 Subfamily Classification
  3.2 Subfamily Determining Residues
  3.3 Validation of Results
    3.3.1 Quality of Obtained Subfamilies
    3.3.2 Comparison of Bête Neighbour-Joined Tree with PHYLIP
  3.4 Analysis of Protein-DNA Interaction: Physical Characteristics of Subfamily-Determining Residues
    3.4.1 Contribution to the Protein-DNA Interface
    3.4.2 Interactions Made by Interface Residues
  3.5 Potential role of specificity determining residues in modulating protein-DNA interaction
  3.6 Subfamily Cognate DNA Sequences
Chapter 4 – Discussion
Chapter 5 – Conclusion
Appendices

List of Tables

Table 1: Homeodomains and availability of protein structures in the PDB. Subfamily assignments are given (as applicable) according to the InterPro homeodomain subfamily classification illustrated in Figure 18.
