Model SNP Development for Complex Genomes Based on Hexaploid Oat

Model SNP Development for Complex Genomes Based on Hexaploid Oat

Oliver et al. BMC Genomics 2011, 12:77 http://www.biomedcentral.com/1471-2164/12/77 RESEARCHARTICLE Open Access Model SNP development for complex genomes based on hexaploid oat using high-throughput 454 sequencing technology Rebekah E Oliver1, Gerard R Lazo2, Joseph D Lutz3, Marc J Rubenfield4,13, Nicholas A Tinker5, Joseph M Anderson6, Nicole H Wisniewski Morehead1, Dinesh Adhikary7, Eric N Jellen7, P Jeffrey Maughan7, Gina L Brown Guedira8, Shiaoman Chao9, Aaron D Beattie10, Martin L Carson11, Howard W Rines12, Donald E Obert1, J Michael Bonman1, Eric W Jackson1* Abstract Background: Genetic markers are pivotal to modern genomics research; however, discovery and genotyping of molecular markers in oat has been hindered by the size and complexity of the genome, and by a scarcity of sequence data. The purpose of this study was to generate oat expressed sequence tag (EST) information, develop a bioinformatics pipeline for SNP discovery, and establish a method for rapid, cost-effective, and straightforward genotyping of SNP markers in complex polyploid genomes such as oat. Results: Based on cDNA libraries of four cultivated oat genotypes, approximately 127,000 contigs were assembled from approximately one million Roche 454 sequence reads. Contigs were filtered through a novel bioinformatics pipeline to eliminate ambiguous polymorphism caused by subgenome homology, and 96 in silico SNPs were selected from 9,448 candidate loci for validation using high-resolution melting (HRM) analysis. Of these, 52 (54%) were polymorphic between parents of the Ogle1040 × TAM O-301 (OT) mapping population, with 48 segregating as single Mendelian loci, and 44 being placed on the existing OT linkage map. Ogle and TAM amplicons from 12 primers were sequenced for SNP validation, revealing complex polymorphism in seven amplicons but general sequence conservation within SNP loci. Whole-amplicon interrogation with HRM revealed insertions, deletions, and heterozygotes in secondary oat germplasm pools, generating multiple alleles at some primer targets. To validate marker utility, 36 SNP assays were used to evaluate the genetic diversity of 34 diverse oat genotypes. Dendrogram clusters corresponded generally to known genome composition and genetic ancestry. Conclusions: The high-throughput SNP discovery pipeline presented here is a rapid and effective method for identification of polymorphic SNP alleles in the oat genome. The current-generation HRM system is a simple and highly-informative platform for SNP genotyping. These techniques provide a model for SNP discovery and genotyping in other species with complex and poorly-characterized genomes. Background the most abundant type of DNA variation currently Genetic markers accurately distributed throughout a used as genetic markers [1]. SNP assays directly interro- genome are pivotal to the success of association map- gate the sequence variation, reducing genotyping errors ping, marker-assisted breeding (MAB), map-based clon- compared to assays based on size discrimination or ing, and studies relating to genome structure and hybridization. SNP assays are also amenable to high- function. Single nucleotide polymorphisms (SNPs) are throughput technologies, making them an excellent tool for use in modern genomics research. Advances in sequencing technology have enhanced gen- * Correspondence: [email protected] ome-wide SNP discovery, and SNP platforms have been 1USDA-ARS, Small Grains and Potato Germplasm Research Unit, Aberdeen, ID, USA developed for a number of diploid species. These Full list of author information is available at the end of the article © 2011 Oliver et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Oliver et al. BMC Genomics 2011, 12:77 Page 2 of 15 http://www.biomedcentral.com/1471-2164/12/77 resources have enabled comprehensive genetic analyses in and validated manually with Phrap. The hybrid program barley [2,3], rice [4], maize [5], and soybean [6,7], includ- increased the speed and accuracy of SNP calls. ing studies on diversity and population structure, com- Effective SNP discovery in complex genomes would parative genomics, and QTL identification [3,5,8-11]. require additional analysis to consider duplicate loci and As with SNP discovery methods, SNP genotyping to identify and eliminate pseudo-SNPs produced by mis- technologies have proliferated in recent years, with avail- assembly of paralogous and homoeologous sequences able platforms utilizing mass spectroscopy [12], direct inherent to polyploid genomes. For example, cultivated sequencing, fluorescence detection, and microchip wheat, including common wheat (Triticum aestivum, hybridization [13]. Each technology has advantages and 2n = 6x = 42) and durum wheat (T. turgidum,2n=4x ® limitations. For example, the widely utilized TaqMan = 28), has been the subject of intense genetic investiga- assay has a high sample success rate, with excellent tion, but SNP discovery and assay development have not repeatability and cluster separation [14] but requires kept pace with other species of similar importance (e.g., fluorescence-labeled probes, making the assay cost pro- rice). A recent report by Akhunov et al. [22] demon- hibitive for large numbers of assays. The Illumina Gold- strated the feasibility of using the Illumina GoldenGate enGate assay [15,16] is also widely used, but limited to assay for SNP development in both tetraploid and hexa- allele differentiation, and is cost-effective only for a ploid wheat genomes, but also underlined the complica- large number of SNPs run in a single parallel assay. tions presented by multiple targets in polyploid A practical and more flexible alternative may be avail- genomes. Although gene duplication is frequent even in able using amplicon melting analysis in conjunction diploid genomes [23,24], redundant loci are far more with real-time PCR [17]. This method uses unlabeled prevalent within polyploid genomes, complicating the primers and interrogates the entire amplicon, providing discrimination of haplotypes and allele ratios [25,26]. an efficient SNP genotyping system in terms of reagent Complications may also arise if polymorphisms are dis- cost, throughput, and data production. covered through transcriptome analysis, since the tran- Although direct detection of sequence variation is scriptome only represents copies of expressed alleles. robust and accurate, the requirement of sequence infor- For example, Adams et al. [27] showed that homoeolo- mation for SNP discovery has been an obstacle in com- gous genes in cotton produced various levels of gene plex uncharacterized genomes with limited funding, silencing, with expression biased in different genes and including cultivated oat (Avena sativa L.). Current- tissues. If not accounted for, these homoeologous gene generation Roche 454 pyrosequencing now allows sets could distort and prevent marker development in cost-effective de novo sequencing and assembly with regions with subgenome homologies. excellent depth and coverage, low error rates, and rapid Genome assembly will be especially complicated in output [18,19]. Novaes et al. [20] sequenced transcrip- cultivated oat, where partial homoeology, numerous tomes from six tissue types of seven Eucalyptus grandis chromosomal rearrangements, and cis- and trans- (2n = 2x = 22) families using 454 technology. The sequence duplication exist within the genome. Oat sequencing effort generated 148 Mbp of expressed research is also handicapped by a lack of DNA sequence sequence that assembled into 2,392 contigs. Comparison information. GenBank holds only a few thousand ESTs, of individual reads with a consensus assembly detected most of which are from a limited number of genotypes 23,742 putative SNPs. These results demonstrate that and tissues [28]. current-generation transcriptome sequencing can over- Several attempts have been made to develop good mar- come obstacles in crops with sparse sequencing ker resources in oat. Microsatellite technology has pro- resources; however, additional work is needed to apply duced several hundred oat markers [29-34]; however, the these high-throughput SNP-discovery approaches to number of robust markers available for routine labora- polyploid plant genomes. tory use is limited [35]. Recently, a high-throughput mar- Utilization of next-generation sequencing for SNP dis- ker array based on DArT technology was developed for covery requires significant data management and analysis cultivated oat [36]. These markers have quickly become of sequence information. A system for handling data com- the marker of choice for the oat community; however, plexity has been developed for pine [21], and could serve efficient and repeatable use of this technology has thus as a model for SNP mining in other species. The method far been limited to a single service provider, and the mar- runs on a Unix/Linux platform written in Perl, using ker produces a dominant polymorphism which does not Phred for base calling, and Phrap and ProbconsRNA for allow discrimination of both alleles at a locus. The lack of sequence alignment. Individual alignments are converted numerous, easily assayed, and co-dominant markers to FASTA files and aligned to an overall consensus. SNPs remains a major barrier in oat genetics research. Whole- are called

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    15 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us