Analysis of the Maize (Zea Mays L) Genome Using Molecular, Genetic and Computational Approaches Yan Fu Iowa State University

Iowa State University Capstones, Theses and Retrospective Theses and Dissertations Dissertations 2005 Analysis of the maize (Zea mays L) genome using molecular, genetic and computational approaches Yan Fu Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/rtd Part of the Genetics Commons Recommended Citation Fu, Yan, "Analysis of the maize (Zea mays L) genome using molecular, genetic and computational approaches " (2005). Retrospective Theses and Dissertations. 1326. https://lib.dr.iastate.edu/rtd/1326 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Analysis of the maize (Zea mays. L) genome using molecular, genetic and computational approaches by Yan Fu A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Genetics Program of Study Committee: Patrick S. Schnable, Major Professor Daniel A. Ashlock Volker Brendel Allen W. Miller Basil J. Nikolau Iowa State University Ames, Iowa 2005 UMI Number: 3217340 INFORMATION TO USERS The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleed-through, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion. UMI UMI Microform 3217340 Copyright 2006 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company 300 North Zeeb Road P.O. Box 1346 Ann Arbor, Ml 48106-1346 ii Graduate College Iowa State Univeristy This is to certify that the doctoral dissertation of Van Fu has met the dissertation requirements of Iowa State University Signature was redacted for privacy. Major Professor Signature was redacted for privacy. r the Major Program iii TABLE OF CONTENTS ACKNOWLEDGEMENTS v CHAPTER 1. GENERAL INTRODUCTION 1 Introduction 1 Dissertation Organization 2 References 3 CHAPTER 2. RESCUING OPEN READING FRAMES 6 Abstract 6 Introduction 6 Results 8 Discussion 11 Materials and Methods 14 Acknowledgements 16 References 16 Figure Legends 19 CHAPTER 3. TYPES AND FREQUENCIES OF SEQUENCING ERRORS IN METHYL-FILTERED AND HIGH COT MAIZE GENOME SURVEY SEQUENCES .26 Abstract 26 Introduction 27 Results and Discussion 28 Recommendations 33 Materials and Methods 33 Acknowledgements 35 References 35 Figure Legends 37 Supplementary data 42 CHAPTER 4. QUALITY ASSESSMENT OF MAIZE ASSEMBLED GENOMIC ISLANDS (MAGIS) AND LARGE-SCALE EXPERIMENTAL VERIFICATION OF PREDICTED GENES 48 Abstract 48 Introduction 49 Materials and Methods 50 Results and Discussion 53 Conclusions 62 Acknowledgements 63 References 63 Figure Legends 65 Supporting Materials 73 iv CHAPTER 5. GENETIC DISSECTION OF THE GENOMES OF MAIZE INTERMATED RECOMBINANT INBRED LINES 80 Abstract 80 Introduction 81 Materials and Methods 82 Results 86 Discussion 93 Acknowledgements 96 References 96 Figure Legends 99 Supplementary Data 111 CHAPTER 6. GENERAL CONCLUSIONS 116 Summary and discussion 116 Perspective 117 References 117 APPENDICES 118 V ACKNOWLEDGEMENTS I would like to express my gratitude to those who have supported me throughout my Ph.D. training program. I especially thank Dr. Patrick Schnable, my major professor, for his guidance and support in my graduate research and his devotion and dedication to science that have profound influence on my career developing, and Shengmei Chen, my wife, for her unconditional love and inspirational encouragement. I also thank my Program of Study committee members (Drs. Daniel Ashlock, Volker Brendel, Allen Miller, and Basil Nikolau) for their serving on my committee and helpful comments on my thesis. I truly appreciate the joy from working with my colleagues (Dr. Feng Liu, Dr. Xiangqin Cui, Dr. Hong Yao, Scott Emrich, and Jun Cao) and many advices from Dr. An-Ping Hsia and Debbie Chen. Finally, I would like to send my gratitude to my parents in China, Chibin Fu and Shuhua Miao, for their love and support, and my son here in Ames, Yannis Boyi Fu, for the joy and inspiration that he brings to me. 1 CHAPTER 1. GENERAL INTRODUCTION Introduction Maize (Zea mays L.) is one of the most important crops in the world and the best- studied biological model for cereals. The size of the maize nuclear genome, a segmental allotetraploid[1,2], is approximately 2,500 million bases (Mb)[3], Although it has been examined extensively at the cytogenetic level, the initial glimpse of the maize genome at molecular level, which was obtained by sequencing a 280-kb maize bacterial artificial clone (BAC) containing Adh1-F gene and u22 genes, suggested that maize genes are organized into islands separated by nested retrotransposon lseas'[4,5]. Subsequent maize BAC sequencing data suggest that retrotransposons and other repeats account for 50%~80% of the maize genome[6,7]. In addition, the observations that only 5% of random shotgun reads[8] and -7.5% of BAC end reads[9] correspond to putative maize non-transposon coding regions provide a more global view of maize genome organization. Because of the highly repetitive nature of the maize genome makes, the initial efforts focused mainly on the non-repetitive gene-rich regions [6,10,11]. Sequencing random cDNA clones to generate Expressed Sequence Tags (ESTs) can also quickly recover coding sequences. The successful normalization or subtraction of cDNA libraries[12] can significantly improve the rate of gene discovery. However, in nearly all 'finished' whole genome sequencing projects, it is difficult to recover more than 60% of genes using only EST sequencing^]. High-Cot (HC) sequencing[13,14] was used to enrich for low-copy sequences by normalization (i.e. removing highly or moderately repetitive DNA). The Methylation Filtration (MF) method was developed to selectively sequence unmethylated gene-rich regions by filtering out methylated regions, which are often composed of retrotranspons[15-17], MF and HC strategies have proven effective in selectively recovering maize genes not captured by EST projects[13,18]. Significantly, these two strategies appear to be complementary for sampling the maize gene space[13,19,20]. RescueMu is another alternative method 2 for maize gene discovery[21], which is based on the observation that maize Mutator transposons preferentially insert into low-copy genie regions[22]. The analysis of RescueMu flanking sequences, however, showed a high level of redundancy[23] that could be resulted from the presence of hotspots for RescueMu insertion [24]. A high-quality integrated maize physical-genetic map is essential for various studies[10,11]. The latest integrated map still contains ~300 unanchored BAC fingerprint contigs (FPCs; http://www.genome.arizona.edu/fpc/maize). Efficiently placing genie sequences such as assembled ESTs and gene-rich GSSs [25] onto the maize genetic map using maize intermated B73 x M017 recombinant inbred lines will not only enhance the quality of the current integrated map but also help anchor "orphan" FPCs for the upcoming maize genome sequencing project. Dissertation Organization This dissertation contains four journal papers (Chapter 2~5) and a general conclusion (Chapter 6). All these papers were written under Dr. Schnable's extensive guidance. Chapter 2 is a manuscript in preparation for submission to Genome Research, where we describe the development and testing of a new ORF selecting vector for enriching maize coding sequences. In this manuscript, Yan Fu developed the vector, established ORF-Rescue procedures, sequenced all clones, analyzed the data, and wrote the manuscript. Dr. Feng Liu was involved in the design of the three-frame Hiss-tag (3xHis6) and in the early development of the pHIS6 vector. Chapter 3 is a paper published in Plant Physiology, where we reported the types and frequencies of sequencing errors in methyl-filtered and high Cot maize genome survey sequences (GSSs). Yan Fu performed most of the computational analyses and was the major contributor to the writing of this paper. Dr. An-Ping Hsia provided trace files for some lab genes. Ling Guo assisted with data analysis. 3 Chapter 4 is a paper published in the Proceedings of the National Academy of Sciences USA, where we reported the release of an improved assembly of maize assembled genomic islands (MAGIs), a quality assessment of this assembly, the annotation of individual MAGIs using computational approaches, and the large-scale experimental validation of predicted genes without prior evidence of transcription. Yan Fu prepared the GSS data collection, validated MAGI structures, displayed MAGI annotations, and performed the large-scale experimental verification of the transcription of predicted genes. Scott Emrich, the co-first author, produced the GSS assembly, determined evidence of transcription using bioinformatic approaches, and performed the gene content analyses. Both Yan Fu and Scott Emrich wrote the paper. Ling Guo,

Analysis of the Maize (Zea Mays L) Genome Using Molecular, Genetic and Computational Approaches Yan Fu Iowa State University

Comparative Genomics of Arabidopsis and Maize: Prospects and Comment Limitations Volker Brendel*, Stefan Kurtz† and Virginia Walbot‡

Bioinformatics: a Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D

Quality Assessment of Maize Assembled Genomic Islands (Magis) and Large-Scale Experimental Verification of Predicted Genes

An Active DNA Transposon Family in Rice

Biological Sequence Database: NCBI

5, and J. Chris Pires

Genome Survey of Misgurnus Anguillicaudatus to Identify Genomic Information, Simple Sequence Repeat (SSR) Markers and Mitochondrial Genome

SSR-HRM) Analysis for Genetic Relationship of Luffa Genotypes

Identification and Characterization of Rearrangements in the Vervet Monkey Genome

The Nuclear Genome of Brachypodium Distachyon: Analysis of BAC End Sequences

Rice Transposable Elements: a Survey of 73,000 Sequence-Tagged-Connectors

Distribution of Genes and Repetitive Elements in the Diabrotica Virgifera Virgifera Genome Estimated Using BAC Sequencing Brad S