The Function and Evolution of C2H2 Zinc Finger Proteins and Transposons
by Laura Francesca Campitelli

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Molecular Genetics
University of Toronto

© Copyright by Laura Francesca Campitelli 2020

Abstract

Transcription factors (TFs) confer specificity to transcriptional regulation by binding specific DNA sequences and ultimately affecting the ability of RNA polymerase to transcribe a locus. The C2H2 zinc finger proteins (C2H2 ZFPs) are a TF class with the unique ability to diversify their DNA-binding specificities in a short evolutionary time. C2H2 ZFPs comprise the largest class of TFs in Mammalian genomes, including nearly half of all Human TFs (747/1,639). Positive selection on the DNA-binding specificities of C2H2 ZFPs is explained by an evolutionary arms race with endogenous retroelements (EREs; copy-and-paste transposable elements), where the C2H2 ZFPs containing a KRAB repressor domain (KZFPs; 344/747 Human C2H2 ZFPs) are thought to diversify to bind new EREs and repress deleterious transposition events. However, evidence of the gain and loss of KZFP binding sites on the ERE sequence is sparse due to poor resolution of ERE sequence evolution, despite the recent publication of binding preferences for 242/344 Human KZFPs. The goal of my doctoral work has been to characterize the Human C2H2 ZFPs, with specific interest in their evolutionary history, functional diversity, and coevolution with LINE EREs. I contributed to the expert curation of the full complement of 1,639 Human TFs and used the results to quantitatively compare their evolutionary history and tissue specificities to all other Human TFs.
I analyzed protein-protein interaction (PPI) data for 118 DNA-binding C2H2 ZFPs and found extremely diverse interactions with nuclear factors despite paradoxically few dedicated PPI domains, revealing a new and unexplained dimension of functional diversity in addition to DNA-binding specificity diversity. Finally, I pioneered a computational technique to reconstruct extinct LINE L1 sequences and showed that they can be used to anchor the integration of KZFP genomic binding data and binding specificities for a complete picture of dynamic KZFP-ERE sequence specificity relationships. Together, my results paint a detailed picture of the diverse functionality and rapid evolution of Human C2H2 ZFPs, contribute to ongoing theorization on KZFP-ERE coevolution, and provide parallel datasets to power future investigations.

Acknowledgements

I am thankful to have had the guidance and mentorship of my excellent supervisor, Dr. Tim Hughes. He has taught me to follow the data, work systematically, explain clearly, and question everything. The entirety of this work results from opportunities enabled and supported by Tim. I hope to carry throughout my life the example he has set as a leader and scientific thinker.

I’d like to thank my thesis committee members, Drs. Jack Greenblatt and Michael Wilson. Their informed perspectives have challenged and shaped this work and motivated my growth as a scientist. Several additional professors have contributed directly or indirectly to my doctoral studies, including Drs. Mathieu Blanchette, Anne-Claude Gingras, Mikko Taipale, and Quaid Morris.

I am grateful to the C2H2 ZFP experts who gave me the special opportunity to learn through collaboration with knowledgeable and kind mentors early in my graduate studies: Drs. Frank Schmitges, Ernest Radovani, Hamed Najafabadi, and Marjan Barazandeh. There are many senior researchers whose patience, kindness, and exceeding scientific talent played critical roles, including Drs. Edyta Marcon, Mandy Lam, and Ally Yang. Jeff Liu and Dr. Mihai Albu deserve special thanks for joining me on winding journeys through poorly documented bioinformatics software. I’m especially appreciative of Drs. Rozita Razavi and Debashish Ray for many years of teaching and encouragement.

I’m grateful for all the students I’ve had the opportunity to work with over the years. Dr. Samuel Lambert, my only senior lab mate and first example of a great grad student. Kaitlin Laverty has taught me a lot about data science, is always willing to thought partner on scientific ideas, and has made an immeasurable impact through her friendship. My experience in graduate school was made complete by personal and professional connections with all of the students in the Hughes lab, as well as classmates including Samantha Ing Esteves, Lauren Tracey, and Nader Alerasool.

I am extremely thankful to the family and friends who have supported me unconditionally throughout my education and expanded my horizons in a stage when the world can sometimes feel very small – my parents Rita and Peter Campitelli, my grandmothers Franca Campitelli and Maria Celenza, my friend Marilena Danelon, and especially my fiancé Nicholas Fleming.

Finally, I’d like to dedicate this thesis to my friend Dr. Benjamin Grys, who has been there from beginning to end – from my first committee meeting practice talk, to beachside cocktails the world over, to my mid-pandemic online defence seminar. Ben has been my most patient mentor and my closest friend in this experience, and I couldn’t have done it without him.
Table of Contents

ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER 1
INTRODUCTION
  1.1 CHAPTER OUTLINE
  1.2 TRANSCRIPTION FACTORS
    1.2.1 DEFINING A TRANSCRIPTION FACTOR
    1.2.2 HUMAN DNA-BINDING DOMAINS (DBDS)
    1.2.3 IDENTIFYING TFS
    1.2.4 TRANSCRIPTIONAL REGULATION BY TFS
    1.2.5 SUMMARY
  1.3 C2H2 ZINC FINGER PROTEINS
    1.3.1 C2H2 ZFP DNA-BINDING
    1.3.2 C2H2 ZFP PPIS
    1.3.3 EVOLUTION OF C2H2 ZFPS
  1.4 TES AND TRANSCRIPTIONAL REGULATION
    1.4.1 ANNOTATION AND CLASSIFICATION OF ERES
    1.4.2 RECONSTRUCTING ANCESTRAL TES
    1.4.3 SUMMARY
  1.5 CHAPTER SUMMARY AND THESIS RATIONALE

CHAPTER 2
THE FUNCTION AND EVOLUTION OF HUMAN TRANSCRIPTION FACTORS
  2.1 INTRODUCTION
  2.2 METHODS
    2.2.1 HUMAN TF LIST CURATION
    2.2.2 HUMAN TF PARALOG EVOLUTION
    2.2.3 TF EXPRESSION PROFILES IN HUMAN TISSUES
    2.2.4 PBM CONSTRUCT DESIGN FOR MONODACTYL ZFPS
    2.2.5 PBM EXPERIMENTS
  2.3 RESULTS AND DISCUSSION
    2.3.1 THE HUMAN TFS
    2.3.2 PARALOG EVOLUTION OF HUMAN TFS
    2.3.3 EXPRESSION PROFILES OF THE HUMAN TFS
    2.3.4 SEQUENCE-SPECIFIC DNA BINDING BY MONODACTYL ZFPS
  2.4 SUMMARY

CHAPTER 3
PROTEIN-PROTEIN INTERACTIONS OF HUMAN C2H2 ZINC FINGER PROTEINS
  3.1 INTRODUCTION
  3.2 METHODS
    3.2.1 MOLECULAR COMPARISON BETWEEN INVESTIGATED C2H2 ZFPS AND ALL HUMAN C2H2 ZFPS
    3.2.2 AP-MS EXPERIMENTAL METHODS
    3.2.3 STATISTICAL ANALYSIS OF AP-MS DATA
    3.2.4 FUNCTIONAL ANNOTATION OF AP-MS PREYS
  3.3 RESULTS
    3.3.1 MOLECULAR COMPARISON BETWEEN INVESTIGATED C2H2 ZFPS AND ALL HUMAN C2H2 ZFPS
    3.3.2 C2H2 ZFPS HAVE UNIQUE PPI PROFILES
    3.3.3 EFFECTOR DOMAIN SUBCLASSES RECRUIT EXPECTED INTERACTION PARTNERS, AND ADDITIONAL AND ALTERNATIVE PPIS ARE PERVASIVE WITHIN EACH GROUP
    3.3.4 C2H2 ZFPS INTERACT WITH TRANSCRIPTION-RELATED NUCLEAR FACTORS
  3.4 DISCUSSION
  3.5 SUMMARY

CHAPTER 4
RECONSTRUCTING THE EVOLUTIONARY HISTORY OF LINE L1S TO INTERPRET KZFP-ERE COEVOLUTION
  4.1 INTRODUCTION
  4.2 METHODS
    4.2.1 ANCESTRAL RECONSTRUCTED GENOMES
    4.2.2 TE ANNOTATION
    4.2.3 FULL-LENGTH PROGENITOR SEQUENCE RECONSTRUCTION
    4.2.4 ORF REFINEMENT
    4.2.5 COMPOSITE PROGENITOR SEQUENCE RECONSTRUCTION
    4.2.6 COMPOSITE RECONSTRUCTED PROGENITOR SEQUENCE VALIDATION
  4.3 RESULTS AND DISCUSSION
    4.3.1 REPEATMASKER ANNOTATIONS IN ANCESTRAL GENOMES CORRESPOND TO L1 SUBFAMILY RELATIVE AGES AND SPECIES DISTRIBUTIONS
    4.3.2 REPEATMASKER HITS IN ANCESTRAL GENOMES ARE SIMILAR IN LENGTH TO THOSE IN HG38, AND LESS DIVERGENT FROM CONSENSUS MODELS
    4.3.3 FULL-LENGTH ANCESTRAL SEQUENCE RECONSTRUCTION
    4.3.4 TARGETED ORF RECONSTRUCTION
    4.3.5 COMPOSITE RECONSTRUCTED PROGENITOR SEQUENCES CAPTURE EXPECTED PHYLOGENETIC RELATIONSHIPS AND SEQUENCE COMPONENTS
    4.3.6 COMPOSITE RECONSTRUCTED PROGENITOR SEQUENCES ANCHOR INTEGRATION OF IN VIVO KZFP BINDING EVIDENCE AND IN SILICO-PREDICTED KZFP BINDING PREFERENCES
  4.4 SUMMARY

CHAPTER 5
DISCUSSION
  5.1 CHAPTER OUTLINE
  5.2 PERSPECTIVES AND FUTURE DIRECTIONS
    5.2.1 THE DNA-BINDING PREFERENCES OF EVERY HUMAN TF
    5.2.2 FUNCTIONAL DIVERSIFICATION OF C2H2 ZFPS: DNA-BINDING, PPIS, AND EXPRESSION PATTERNS
    5.2.3 IDENTIFYING FAST-EVOLVING PPI ELEMENTS OF THE C2H2 ZFPS
    5.2.4 THE EVOLUTIONARY ARC OF THE KZFPS
    5.2.5 ROLES OF C2H2 ZFPS OUTSIDE THE NUCLEUS
    5.2.6 FUNCTIONAL AND EVOLUTIONARY ASSESSMENT OF RECONSTRUCTED L1 PROGENITOR SEQUENCES
    5.2.7 INTERPRETING PREDICTED KZFP BINDING SITES ON RECONSTRUCTED L1 PROGENITOR SEQUENCES
    5.2.8 THE FUTURE OF TE RECONSTRUCTION
  5.3 CLOSING REMARKS

REFERENCES
COPYRIGHT ACKNOWLEDGEMENTS

List of Tables

Table 3.1 Molecular features of the 118 C2H2 ZFPs investigated compared to all Human C2H2 ZFPs

List of Figures

Figure 1.1 Schematic of a prototypical TF
Figure 1.2 Examples of DBD structures and counts of Human TFs by their DBD type
Figure 1.3 C2H2 ZFP binding