Triticeae Resources in Ensembl Plants

Special Online Collection - Database Paper
Editor-in-Chief's Choice

Dan M. Bolser, Arnaud Kerhornou, Brandon Walts and Paul Kersey*

Ensembl Genomes, EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
*Corresponding author: E-mail, [email protected]; Fax, +44 223 494 468.
(Received September 22, 2014; Accepted November 12, 2014)

Recent developments in DNA sequencing have enabled the large and complex genomes of many crop species to be determined for the first time, even those previously intractable due to their polyploid nature. Indeed, over the course of the last 2 years, the genome sequences of several commercially important cereals, notably barley and bread wheat, have become available, as well as those of related wild species. While still incomplete, comparison with other, more completely assembled species suggests that coverage of genic regions is likely to be high. Ensembl Plants (http://plants.ensembl.org) is an integrative resource organizing, analyzing and visualizing genome-scale information for important crop and model plants. Available data include reference genome sequence, variant loci, gene models and functional annotation. For variant loci, individual and population genotypes, linkage information and, where available, phenotypic information are shown. Comparative analyses are performed on DNA and protein sequence alignments. The resulting genome alignments and gene trees, representing the implied evolutionary history of the gene family, are made available for visualization and analysis. Driven by the case of bread wheat, specific extensions to the analysis pipelines and web interface have recently been developed to support polyploid genomes. Data in Ensembl Plants is accessible through a genome browser incorporating various specialist interfaces for different data types, and through a variety of additional methods for programmatic access and data mining. These interfaces are consistent with those offered through the Ensembl interface for the genomes of non-plant species, including those of plant pathogens, pests and pollinators, facilitating the study of the plant in its environment.

Keywords: Comparative genomics • Functional genomics • Genetic variation • Genome browser • Transcriptomics • Triticeae.

Abbreviations: API, Application Programming Interface; EST, expressed sequence tag; FTP, File Transfer Protocol; GO, gene ontology; IEA, Inferred by Electronic Annotation; MIPS, Helmholtz Zentrum München; POPSEQ, POPulation SEQuencing; QTL, quantitative trait locus; REST, REpresentational State Transfer; RNA-Seq, RNA sequencing; SNP, single nucleotide polymorphism; SQL, structured query language; TREP, Triticeae Repeat Sequence Database.

Plant Cell Physiol. 56(1): e3(1–11) (2015) doi:10.1093/pcp/pcu183, Advance Access publication on 27 November 2014, available FREE online at www.pcp.oxfordjournals.org
© The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

The first cereal genome sequenced was rice in 2002 (Goff et al. 2002, Yu et al. 2002). More recently, progress has accelerated with the publication of the genome sequence of maize in 2009 (Schnable et al. 2009), barley in 2012 (International Barley Sequencing Consortium 2012), progenitors of the bread wheat A and D genomes in 2013 (Jia et al. 2013, Ling et al. 2013) and the draft bread wheat genome itself in 2014 (Brenchley et al. 2012, International Wheat Genome Sequencing Consortium 2014). These four cereals, barley, maize, rice and wheat, together account for 30% of global food production or 2.4 out of 3.8 billion tonnes annually.

It is important to note that the current reference genome assemblies vary considerably in their contiguity and in the detail of available functional annotation. The Triticeae genomes were all sequenced primarily using short read sequencing (mainly from the Illumina platform), and the completion of these assemblies remains a scientific challenge, due to their large size and repetitive nature. The improvement of sequencing technologies, particularly those capable of capturing long-range information, will be necessary to achieve this goal. However, even in their existing condition, these resources are already sufficiently complete to be usefully represented through data analysis and visualization platforms designed for genomes with finished assemblies, such as Ensembl Plants.

Ensembl Plants (http://plants.ensembl.org) offers integrative access to a wide range of genome-scale data from plant species (Kersey et al. 2014), using the Ensembl software infrastructure (Flicek et al. 2014). Currently, the site includes data from 38 plant genomes, from algae to flowering plants. Genomes are selected for inclusion in the resource based on the availability of the complete genome sequence, their importance as model organisms (e.g. Arabidopsis thaliana, Brachypodium distachyon), their importance in agriculture (e.g. potato, tomato, various cereals and Brassicaceae) or because of their interest as evolutionary reference points (e.g. the basal angiosperm, Amborella trichopoda, the aquatic alga Chlamydomonas reinhardtii, the moss Physcomitrella patens and the vascular non-seed spikemoss Selaginella moellendorffii). In total, the resource contains the genomes of 19 true grasses, Musa acuminata (banana), 12 dicots and six other species that provide evolutionary context for the plant lineage.
All species in the resource have data for genome sequence, annotations of protein-coding and non-coding genes, and gene-centric comparative analysis. Additional data types within the resource include gene expression, sequence polymorphism and whole-genome alignments, which are selectively available for different species. In this sense, Ensembl Plants is similar to comparable, species-, clade- or data type-specific resources such as WheatGenome.info (Lai et al. 2012), HapRice (Yonemaru et al. 2014) or ATTED-II (Obayashi et al. 2014).

Ensembl Plants is released 4–5 times a year, in synchrony with releases of other genomes (from animals, fungi, protists and bacteria) in the Ensembl system. The provision of common interfaces allows access to genomic data from across the tree of life in a consistent manner, including data from plant pathogens, pests and pollinators.

Database

The Ensembl genome browser

Interactive access to Ensembl Plants is provided through an advanced genome browser. The browser allows users to visualize a graphical representation of a completely assembled chromosome sequence or a contiguous sequence assembly comprising only a small portion of a molecule at various levels of resolution. Functionally interesting 'features' are depicted on the sequence with defined locations. Features include conceptual annotations such as genes and variant loci, sequence patterns such as repeats, and experimental data such as sequence features mapped onto the genome, which often provide direct support for the annotations (Fig. 1). Functional information is provided through import of manual annotation from the UniProt Knowledgebase (Uniprot Consortium 2014), imputation from protein sequence using the classification tool InterProScan (Jones et al. 2014), or by projection from orthologs [...]

[...] databases optimized to support the efficient performance of common gene- and variant-centric queries, and can be accessed through their own web-based and programmatic interfaces.

Triticeae genomes in Ensembl Plants

Four Triticeae genomes are currently hosted in Ensembl Plants (Table 1): Hordeum vulgare (barley), Triticum aestivum (bread wheat, also known as common wheat) and the genomes of two of bread wheat's diploid progenitors: Triticum urartu (the A-genome progenitor) and Aegilops tauschii (the D-genome progenitor). In addition, a further three wheat transcriptomes were included by alignment (described below).

Barley is the world's fourth most important cereal crop and an important model for ecological adaptation, having been cultivated in all temperate regions from the Arctic Circle to the tropics. It was one of the first domesticated cereal grains, originating in the Fertile Crescent of south-west Asia/north-east Africa >10,000 years ago (Harlan and Zohary 1966). With a haploid genome size of approximately 5.3 Gbp in seven chromosomes, the barley genome is among the largest yet sequenced. However, as a diploid, it is a natural model for the genetics and genomics of the polyploid members of the Triticeae tribe, including wheat and rye.

The current barley genome assembly (cv. Morex) was produced by the International Barley Genome Sequencing Consortium (2012). The assembly is highly fragmented, but comparison with related grass species suggested that coverage of the gene space was good. The assembly was dubbed a 'gene-ome', a near complete gene set integrated into a chromosome-scale assembly using physical and genetic information. Sequence contigs that could not be assigned chromosomal positions in this way were binned by homology to low-coverage shotgun sequence of flow-sorted chromosome arms (Muñoz-Amatriaín et al. 2011). This method integrated 22% of the total assembled