Open Duartedissertationfinal.Pdf
Total Page:16
File Type:pdf, Size:1020Kb
The Pennsylvania State University The Graduate School Department of Biology THE IMPACT OF GENE DUPLICATION ON MOLECULAR EVOLUTION IN FLOWERING PLANTS A Dissertation in Biology by Jill Marie Duarte 2008 Jill Marie Duarte Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy August 2008 The dissertation of Jill Marie Duarte was reviewed and approved* by the following: Claude dePamphilis Professor of Biology Dissertation Advisor Chair of Committee Naomi Altman Associate Professor of Statistics Hong Ma Professor of Biology Kateryna Makova Assistant Professor of Biology Webb Miller Professor of Biology and Computer Science and Engineering Douglas Cavener Professor of Biology Head of the Department of Biology *Signatures are on file in the Graduate School ii ABSTRACT Gene duplication has played an important role in the evolution of life on Earth, and it is particularly important for the evolution of flowering plants, given the prevalence of segmental and whole genome duplication in this lineage. This dissertation explores the link between gene duplication and the evolution of flowering plants by: 1) examining the role of expression divergence in the evolution of regulatory genes; 2) estimating the makeup of the ancestral angiosperm genome; 3) identifying and characterizing shared single copy genes in sequence plant genomes; and 4) exploring the phylogenetic utility of single copy nuclear orthologs and their relative performance compared to organellar and rRNA DNA. We show that expression divergence is a common feature of duplicated regulatory genes in Arabidopsis, but that the divergence patterns are not clearly classified by current theories concerning the evolution of duplicated genes. Results indicate that the ancestral angiosperm genome included a rich diversity of developmental genes that have contributed to angiosperm diversity through subsequent duplication and diversification. We have also identified a large number of shared single copy genes and have shown their phylogenetic utility in a number of different systems and using different methodologies. Results from these studies illustrate the added complexity in the evolution of flowering plants that is the result of gene duplication and the need to revise current theoretical frameworks and methodologies in order to account for this complexity. iii TABLE OF CONTENTS LIST OF FIGURES…………………………………………………………………… vi LIST OF TABLES……………………………………………………………………. vii ACKNOWLEDGEMENTS……………………………………………………………viii Chapter 1 The Impact of Gene Duplication on Molecular Evolution……………………1 References……………………………………………………………………….17 Chapter 2 Expression Pattern Shifts Following Duplication Indicative of Subfunctionalization and Neofunctionalization in Regulatory Genes of Arabidopsis…..25 Preface……………………………………………………………………………25 Abstract…………………………………………………………………………..27 Introduction……………………………………………………………………....28 Methods…………………………………………………………………………..33 Results……………………………………………………………………….…...37 Discussion………………………………………………………………….…….43 Acknowledgments………………………………………………………….…….52 Additional Files…………………………………………………………….…….52 References…………………………………………………………………….….53 Chapter 3 Utility of Amborella trichopoda and Nuphar advena ESTs for comparative sequence analysis…………………………………………………………………..…… 57 Preface……………………………………………………………………..……..57 Abstract……………………………………………………………………..……59 Introduction…………………………………………………………………..…..60 Methods……………………………………………………………………….….62 Results…………………………………………………………………………....65 Discussion………………………………………………………………………..70 Acknowledgments………………………………………………………………..74 Additional Files…………………………………………………………………..74 References………………………………………………………………………..75 Chapter 4 Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels.………..85 Preface……………………………………………………………………………85 Abstract…………………………………………………………………………..87 Background………………………………………………………………………88 Results……………………………………………………………………………93 Discussion………………………………………………………………………104 Methods…………………………………………………………………………112 iv Acknowledgments………………………………………………………………117 Additional Files…………………………………………………………………118 References………………………………………………………………………119 Chapter 5 A Marriage of High-throughput Identification of Phylogenetic Markers with High-throughput Sequencing: Next-generation Phylogenomics using 454 Transcriptomes …………………………………………………………………………………………..125 Preface…………………………………………………………………………..125 Abstract…………………………………………………………………………127 Introduction……………………………………………………………………..128 Methods…………………………………………………………………………132 Results…………………………………………………………………………..135 Discussion………………………………………………………………………143 Acknowledgments………………………………………………………………144 Additional Files…………………………………………………………………145 References………………………………………………………………………146 Concluding Remarks……………………………………………………………………150 References………………………………………………………………………154 v LIST OF FIGURES Figure 2-1. Fate of duplicated genes and the effect of each fate on expression patterns ………………………………………………………………………………………...…29 Figure 2-2. Range of ANOVA results…………………………………………………...38 Figure 2-3. Inferring ancestral expression patterns in MIKCC gene family in Arabidopsis………………………………………………………………………………41 Figure 2-4. Complete linkage clustering and mapping of complementary expression patterns supports independence of regulatory modules…………………………………50 Figure 3-1. Characterization of the Amborella and Nuphar ESTs and insights into the ancestral angiosperm……………………………………………………………………..67 Figure 3-2. Using single copy nuclear genes for organismal phylogeny…………………69 Figure 4-1. Shared single copy genes in various combinations of angiosperm genomes…94 Figure 4-2. Overrepresentation and underrepresentation of shared single copy genes in select GO categories……………………………………………………………………...97 Figure 4-3. Single copy nuclear genes improve phylogenetic resolution in the Brassicaceae………………………………………………………………………………99 Figure 4-4. Angiosperm phylogeny using ESTs for 13 shared single copy genes………102 Figure 5-1. Slim GO Annotation of the Brassica oleracea leaf transcriptome………….136 Figure 5-2. Slim GO Annotation of the Medicago truncatula leaf transcriptome……….137 Figure 5-3. Sequence coverage of orthologous single copy genes in Brassica and Medicago…………………………………………………………………………….….139 Figure 5-4. 6-taxon phylogenetic trees from different datasets and phylogenetic methodologies…………………………………………………………………….…..…142 vi LIST OF TABLES Table 3-1. Reproductive development genes present in the Amborella and Nuphar Unigenes………………………………………………………………………………….81 Table 4-1. Shared single copy nuclear genes are present throughout plant lineages………95 Table 4-2. Amplification of shared single copy nuclear genes in Brassicaceae…………...98 Table 4-3. Shared single copy nuclear genes are a rich source of phylogenetic information…………………………………………………………………………...…100 Table 5-1. Coverage of orthologous single copy genes in the Brassica and Medicago 454 transcriptome sequencing………………………………………………………………..140 Table 5-2. Summary of alignments used for phylogenetic analysis………………….….141 vii ACKNOWLEDGMENTS I have been privilieged over the past five years to work with an outstanding group of colleagues who have taught me more about botany, phylogenetics, genomics, statistics, molecular evolution, etc than they will ever know. I would like to first acknowledge my indebtedness to my thesis advisor, Claude dePamphilis, and the rest of my thesis committee, Naomi Altman, Hong Ma, Kateryna Makova and Webb Miller, for their continued support, insight and advice. I would also like to acknowledge all the members of the Biology Department at Penn State who have provided a welcoming academic community to call home for the past 5 years. An invaluable resource and sounding board for my research has been the Institute for Molecular Evolutionary Genetics and the IMEG seminar series each semester. In making me a better educator and guiding me through the trials and tribulations of teaching undergraduate students, I would like to thank all the individuals involved in the BIOL220W for the past four years, especially Dianne Burpee, Jim Minesky and Carla Hass. I would also like to thank the members of the dePamphilis lab, extant and extinct, especially Lena Landherr, Kerr Wall, Paula Ralph, Nadira Naznin, Yan Zhang, Barbara Bliss, Yuannian Jiao, Joel McNeal, Kevin Beckmann, Ali Barakat, Laura Zahn, and Jim Leebens-Mack. I would also like to thank our collaborators from FGP, including, but not limited to Pam and Doug Soltis, Matyas Buzgo, Vic Albert, and Peter Endress. J. Chris Pires and Patrick Edger from the University of Missouri- Columbia are to be thanked for their collaboration and intense discussions on the relative importance of selection in the evolution of duplicated genes. I would also like to acknowledge Susan “Doc” Behel for her role in first introducing me to the wonders of biology and the research community and Charles Delwiche for his role as my viii undergraduate research advisor and providing me a solid foundation in phylogenetics, genomics, and molecular evolution. There are many more people that deserve my thanks and appreciation for their involvement in my education. Without the financial contributions from the following sources, my research and graduate school experience would have been severely lacking in monies: a Braddock fellowship, the Mohnkern scholarship and teaching assistantships from the Department of Biology; a University fellowship from The Pennsylvania State University; travel grants from the Department of Biology, the MORPH