The Evolution of Gene and Genome Duplication in Soybean
Total Page:16
File Type:pdf, Size:1020Kb
THE EVOLUTION OF GENE AND GENOME DUPLICATION IN SOYBEAN by BRIAN D. NADON (Under the Direction of Scott A. Jackson) ABSTRACT Duplication of DNA is one of the prime drivers of diversification, speciation, and adaptation for life on earth. Plants are highly tolerant of polyploidy or whole-genome duplication (WGD) - indeed, all characterized plant genomes show traces of a history of polyploidy. Duplicated genes often diverge in their function post-duplication, and can assume subsets of their old functions, take on new functions, or be deleted altogether. Studying how these processes have affected crop plants can illuminate their evolution and show how they might be improved through breeding in the future. Legumes, and especially soybean (Glycine max L.), offer a valuable system to study this. For this work, first, the soybean genome was aligned to itself, which showed that gene pairs from the most recent duplication event in soybean have maintained similar expression profiles within tissues and have maintained their methylation status more consistently than older duplicate pairs. Next, an algorithm (TetrAssign) was developed to reconstruct and phase ancient soybean subgenomes post-WGD, and comparison of these reconstructions with maize showed that soybean’s ancient subgenome sets were less divergent in their gene deletion, expression, and methylation profiles than maize’s. Then, a set of gene families (orthogroups) for soybean and several other sequenced legume genomes was analyzed to reveal that rates of stochastic gene duplication were low, while gene deletion (death) rates were higher but variable among the legume branches, and furthermore found that the Glycine-specific duplication event had a much higher retention of gene duplicates post-WGD than the Faboideae duplication. Lastly, resequencing of elite and wild (Glycine soja) soybean accessions determined that while duplicated genes dominate the gene set of soybean, orphan genes and dispensable genes are overrepresented among genes most strongly selected for during the domestication of soybean. The results of this work indicate that soybean’s genome is unusually duplicated for a diploidized paleotetraploid, that its subgenomes 8-13 My ago were less diverged than was initially thought, and that the evolution of duplicate genes is ongoing in soybean and has probably impacted the transformation of soybean from a wild, vine-like plant into the dependable economic powerhouse it is today. INDEX WORDS: soybean; legume; genome; evolution; polyploidy; epigenomics; domestication; pangenome THE EVOLUTION OF GENE AND GENOME DUPLICATION IN SOYBEAN by BRIAN D. NADON B.S., Arizona State University, 2013 A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY ATHENS, GEORGIA 2019 © 2019 Brian D. Nadon All Rights Reserved THE EVOLUTION OF GENE AND GENOME DUPLICATION IN SOYBEAN by BRIAN D. NADON Major Professor: Scott Jackson Committee: Zenglu Li Peggy Ozias-Akins Robert Schmitz James Leebens-Mack Electronic Version Approved: Suzanne Barbour Dean of the Graduate School The University of Georgia May 2019 ACKNOWLEDGEMENTS I owe many thanks to many people for my time at the University of Georgia, only a few of which I can list here. I owe my gratitude to Chunming Xu for being my serendipitous advisor on all things bioinformatics, and for just so happening to be working on the exact same hypothesis as I was, sitting directly behind me for months, unbeknownst to both of us, which led to the most fruitful collaborations I had during my studies. I owe deep thanks to Wayne Parrott for being like a second advisor to me, pushing me to be my best when I needed it the most. My thanks to my committee members for continually keeping me on the right track, and for their valuable input in my development as a scientist. Last, and most importantly, my deepest gratitude goes to all my family and friends who supported me through the most challenging four and a half years of my life. iv TABLE OF CONTENTS Page ACKNOWLEDGEMENTS ........................................................................................................... iv CHAPTER 1 THE POLYPLOID ORIGINS OF CROP GENOMES AND THEIR IMPLICATIONS: A CASE STUDY IN LEGUMES (INTRODUCTION AND LITERATURE REVIEW) .............................................................................................1 2 THE EVOLUTION AND DIVERGENCE OF WHOLE-GENOME AND SEGMENTAL DUPLICATIONS IN SOYBEAN ......................................................44 3 RECONSTRUCTION AND COMPARISON OF ANCIENT PALEOPOLYPLOID SUBGENOMES: CONTRASTING EVOLUTIONARY HISTORIES FOR SOYBEAN AND MAIZE ...........................................................................................83 4 A HISTORY OF GENE DUPLICATION AND DELETION IN LEGUMES REVEALED BY PHYLOGENETIC ANALYSIS OF GENE FAMILIES ..............120 5 DOMESTICATION SWEEPS IN DUPLICATED AND DISPENSABLE GENES IN SOYBEAN .................................................................................................................161 v 6 CONCLUSIONS .......................................................................................................202 REFERENCES ............................................................................................................................208 vi CHAPTER 1 THE POLYPLOID ORIGINS OF CROP GENOMES AND THEIR IMPLICATIONS: A CASE STUDY IN LEGUMES1 (INTRODUCTION AND LITERATURE REVIEW) Abstract: Gene duplication and polyploidy are some of the most important, yet underappreciated, evolutionary forces that have shaped all flowering plants on earth, and the crop plants that enable human economic activity are prime exemplars of duplication in action. Polyploidy involves an immediate doubling, tripling, or more of a genome, often followed by drastic chromosomal reorganization or a reduction back to diploidy and has occurred many times in the history of most characterized plant genomes. Understanding how duplication shapes plant genomes is critical for understanding plant genetics and breeding. Of particular importance are legumes, one of the largest plant families on earth, often noted for their nitrogen fixation abilities and high nutritional value due to their protein content. Among these Papilionoid (Faboideae) legume crops are alfalfa, soybean, peanut, and common bean. All of these have experienced polyploidy events somewhere in their history, some ancient (60 My or more) and some very recent (e.g. ~10,000 years ago in peanut). The modes by which these polyploidies arose, whether from divergent genomes coming together (allopolyploidy), or identical or similar genomes duplicating (autopolyploidy), can affect their evolution, domestication, and improvement considerably, whether by generating new functional diversity or driving speciation. Appreciating the indelible mark polyploidy and duplication leave on these legume genomes will 1 Modified from submission to Advances in Agronomy. 1 enable a better understanding of the molecular biology, breeding, and agronomy of these critical crops. 2 BACKGROUND: THE SIGNIFICANCE OF POLYPLOIDY AND DUPLICATION Polyploidy: an overview The landscape of modern genetics and genomics, with the explosion of discoveries enabled by continuous advances in field studies, laboratory techniques, sequencing technologies, and the exponential growth of computing power, offers a staggering wealth of data and insights yet unseen in biology. While the breadth and depth of information offered by these modern techniques has allowed for new discoveries in genetics, these advances are framed, contextualized, and enabled by the fundamentals of cell biology and cytology established in large part centuries ago. Beginning perhaps as far back as the 17th century, when Robert Hooke recorded his observations of the cells of a cork tree, the importance of the smallest physically observable components of an organism was understood. With the discovery of the nucleus in the early 19th century by Robert Brown (Oliver, 1913), and the first observations of the chromosomes within that nucleus later in the same century, the foundations of molecular genetics were laid. Simultaneously, Gregor Mendel, now a household name, quietly carried out his essential work in inheritance in pea plants. The rediscovery of Mendel’s work in the early 20th century, and the synthesis of cellular and nuclear biology with Mendel’s laws of inheritance led T.H. Morgan and his colleagues to deduce that chromosomes, one of those essential microscopic cellular components, carried genes, and that genes were the fundamental unit of inheritance in living organisms (Morgan et al., 1922). Thus, it was clear early on that the small bundles of DNA in the nucleus, or chromosomes, and their behavior were essential to understanding biology. Abnormalities in the expected behavior of chromosomes were noted early in the history of cytology. While it was generally understood that most sexually reproducing organisms that 3 were studied were of a diploid nature, having two copies of their chromosome set (one from each parent), oddities with chromosome counts and pairing were observed that suggested that this was not always the case. Of interest is polyploidy, the state of having more than 2 copies of a chromosome set in a sexually reproducing organism. Perhaps the first example of the discovery of polyploidy was in 1907, when Lutz noted that some mutants of her Oenothera lamarckiana plants had enlarged cells with approximately