Genome Sequencing and Analysis of the Model Grass Brachypodium distachyon

UC Davis Previously Published Works
Permalink: https://escholarship.org/uc/item/0pd789wt
Journal: Nature, 463(7282)
ISSN: 0028-0836, 1476-4687
Authors: Vogel, John P; Garvin, David F; Mockler, Todd C; et al.
Publication Date: 2010-02-11
DOI: 10.1038/nature08747
Peer reviewed

The International Brachypodium Initiative*

Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

Grasses provide the bulk of human nutrition, and highly productive grasses are promising sources of sustainable energy [1]. The grass family (Poaceae) comprises over 600 genera and more than 10,000 species that dominate many ecological and agricultural systems [2,3]. So far, genomic efforts have largely focused on two economically important grass subfamilies, the Ehrhartoideae (rice) and the Panicoideae (maize, sorghum, sugarcane and millets). The rice [4] and sorghum [5] genome sequences and a detailed physical map of maize [6] showed extensive conservation of gene order [5,7] and both ancient and relatively recent polyploidization.

Most cool season cereal, forage and turf grasses belong to the Pooideae subfamily, which is also the largest grass subfamily. The genomes of many pooids are characterized by daunting size and complexity. For example, the bread wheat genome is approximately 17,000 megabases (Mb) and contains three independent genomes [8]. This has prohibited genome-scale comparisons spanning the three most economically important grass subfamilies.

Brachypodium, a member of the Pooideae subfamily, is a wild annual grass endemic to the Mediterranean and Middle East [9] that has promise as a model system. This has led to the development of highly efficient transformation [10,11], germplasm collections [12–14], genetic markers [14], a genetic linkage map [15], bacterial artificial chromosome (BAC) libraries [16,17], physical maps [18] (M.F., unpublished observations), mutant collections (http://brachypodium.pw.usda.gov, http://www.brachytag.org), microarrays and databases (http://www.brachybase.org, http://www.phytozome.net, http://www.modelcrop.org, http://mips.helmholtz-muenchen.de/plant/index.jsp) that are facilitating the use of Brachypodium by the research community. The genome sequence described here will allow Brachypodium to act as a powerful functional genomics resource for the grasses. It is also an important advance in grass structural genomics, permitting, for the first time, whole-genome comparisons between members of the three most economically important grass subfamilies.

Genome sequence assembly and annotation

The diploid inbred line Bd21 (ref. 19) was sequenced using whole-genome shotgun sequencing (Supplementary Table 1). The ten largest scaffolds contained 99.6% of all sequenced nucleotides (Supplementary Table 2). Comparison of these ten scaffolds with a genetic map (Supplementary Fig. 1) detected two false joins and created a further seven joins to produce five pseudomolecules that spanned 272 Mb (Supplementary Table 3), within the range measured by flow cytometry [20,21]. The assembly was confirmed by cytogenetic analysis (Supplementary Fig. 2) and alignment with two physical maps and sequenced BACs (Supplementary Data). More than 98% of expressed sequence tags (ESTs) mapped to the sequence assembly, consistent with a near-complete genome (Supplementary Table 4 and Supplementary Fig. 3). Compared to other grasses, the Brachypodium genome is very compact, with retrotransposons concentrated at the centromeres and syntenic breakpoints (Fig. 1). DNA transposons and derivatives are broadly distributed and primarily associated with gene-rich regions.

We analysed small RNA populations from inflorescence tissues with deep Illumina sequencing, and mapped them onto the genome sequence (Fig. 2a, Supplementary Fig. 4 and Supplementary Table 5). Small RNA reads were most dense in regions of high repeat density, similar to the distribution reported in Arabidopsis [22]. We identified 413 and 198 21- and 24-nucleotide phased short interfering RNA (siRNA) loci, respectively. Using the same algorithm, the only phased loci identified in Arabidopsis were five of the eight trans-acting siRNA loci, and none was 24-nucleotide phased. The biological functions of these clusters of Brachypodium phased siRNAs, which account for a significant number of small RNAs that map outside repeat regions, are not known at present.

A total of 25,532 protein-coding gene loci was predicted in the v1.0 annotation (Supplementary Information and Supplementary Table 6). This is in the same range as rice (RAP2, 28,236) [23] and sorghum (v1.4, 27,640) [5], suggesting similar gene numbers across a broad diversity of grasses. Gene models were evaluated using ~10.2 gigabases (Gb) of Illumina RNA-seq data (Supplementary Fig. 5) [24]. Overall, 92.7% of predicted coding sequences (CDS) were supported by Illumina data (Fig. 2b), demonstrating the high accuracy of the Brachypodium gene predictions. These gene models are available from several databases (such as http://www.brachybase.org, http://www.phytozome.net, http://www.modelcrop.org and http://mips.org).

Between 77 and 84% of gene families (defined according to Supplementary Fig. 6) are shared among the three grass subfamilies represented by Brachypodium, rice and sorghum, reflecting a relatively

*A list of participants and their affiliations appears at the end of the paper.

©2010 Macmillan Publishers Limited. All rights reserved

[Figure 1: for each chromosome (Chr. 1 to Chr. 5), graphs of retrotransposons, genes (introns), genes (CDS exons), DNA transposons and satellite tandem arrays, with heat-map tracks for STA, cLTRs, sLTRs, DNA-TEs, MITEs and CDS.]

Figure 1 | Chromosomal distribution of the main Brachypodium genome features. The abundance and distribution of the following genome elements are shown: complete LTR retroelements (cLTRs); solo-LTRs (sLTRs); potentially autonomous DNA transposons that are not miniature inverted-repeat transposable elements (MITEs) (DNA-TEs); MITEs; gene exons (CDS); gene introns and satellite tandem arrays (STA). Graphs are from 0 to 100 per cent base-pair (%bp) coverage of the respective window. The heat map tracks have different ranges and different

[Figure 2: panel a, tracks along chromosomes 1 to 5: total small RNA reads/loci; repeat-normalized 21-nt reads; repeat-normalized 24-nt reads; phased small RNA loci; repeat-normalized RNA-seq reads. Panel b, coverage over feature length (axis 0.1 to 0.9) for 5′ UTR, CDS, SJS, 3′ UTR, exons, introns and cDNAs. Panel c, Venn diagram: rice (Ehrhartoideae), 16,235 families, 20,559 genes in families; sorghum (Panicoideae), 17,608 families, 25,816 genes in families; Brachypodium (Pooideae, with wheat/barley), 16,215 families, 20,562 genes in families; region counts 1,479, 495, 860, 13,580, 681, 1,689 and 265.]

Figure 2 | Transcript and gene identification and distribution among three grass subfamilies. a, Genome-wide distribution of small RNA loci and transcripts in the Brachypodium genome. Brachypodium chromosomes (1–5) are shown at the top. Total small RNA reads (black lines) and total small RNA loci (red lines) are shown on the top panel. Histograms plot 21-nucleotide (nt) (blue) or 24-nucleotide (red) small RNA reads normalized for repeated matches to the genome. The phased loci histograms plot the position and phase-score of 21-nucleotide (blue) and 24-nucleotide (red) phased small RNA loci. Repeat-normalized RNA-seq read histograms plot the abundance of reads matching RNA transcripts (green), normalized for ambiguous matches to the genome. b, Transcript coverage over gene features. Perfect match 32-base oligonucleotide Illumina reads were mapped to the Brachypodium v1.0 annotation features using HashMatch (http://mocklerlab-tools.cgrb.oregonstate.edu/). Plots of Illumina coverage were calculated as the percentage of bases along the length of the sequence feature supported by Illumina reads for the indicated gene model features. The bottom and top of the box represent the 25th and 75th quartiles, respectively. The white line is the median and the red diamonds denote the mean. SJS, splice junction site. c, Venn diagram showing the distribution of shared gene families between representatives of Ehrhartoideae (rice RAP2), Panicoideae (sorghum v1.4) and Pooideae (Brachypodium v1.0, and Triticum aestivum and Hordeum vulgare TCs (transcript consensus)/EST sequences). Paralogous gene families were collapsed in these data sets.
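The coverage measure in the Figure 2b caption (the percentage of bases along a feature that are supported by Illumina reads) can be illustrated with a small sketch. This is not the HashMatch tool or the authors' pipeline; the function name `coverage_percent`, the half-open coordinate convention and the example intervals are all assumptions made for illustration.

```python
def coverage_percent(feature_start, feature_end, read_intervals):
    """Percentage of bases in the half-open feature [feature_start, feature_end)
    that are covered by at least one read.

    read_intervals is an iterable of half-open (start, end) read alignments.
    Hypothetical helper for illustration only.
    """
    length = feature_end - feature_start
    if length <= 0:
        raise ValueError("feature must have positive length")

    # Keep only reads that overlap the feature, clipped to its boundaries.
    clipped = sorted(
        (max(start, feature_start), min(end, feature_end))
        for start, end in read_intervals
        if end > feature_start and start < feature_end
    )

    # Merge overlapping intervals so each covered base is counted once.
    covered = 0
    current_start = current_end = None
    for start, end in clipped:
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start

    return 100.0 * covered / length


# Three hypothetical 32-base reads against a 100-base feature at 100..200.
reads = [(90, 122), (110, 142), (180, 212)]
print(coverage_percent(100, 200, reads))
```

In this toy case the first two reads merge into one covered stretch of 42 bases and the third contributes 20 bases after clipping, so 62 of the 100 feature bases are supported.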
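The statement that between 77 and 84% of gene families are shared can be checked against the counts that survive from the Figure 2c Venn diagram. The sketch below assumes a particular assignment of the seven region counts to Venn regions. That assignment is not given in the text; it is inferred here because it is the one under which the regions sum exactly to the three reported per-species family totals. All variable and function names are hypothetical.

```python
# Counts as they appear in the Figure 2c residue. Which Venn region each count
# belongs to is an ASSUMPTION, chosen so that the regions add up to the
# reported per-species totals.
core = 13_580  # assumed: families shared by all three species
pairwise = {
    frozenset({"rice", "sorghum"}): 1_479,
    frozenset({"rice", "brachypodium"}): 681,
    frozenset({"sorghum", "brachypodium"}): 1_689,
}
unique = {"rice": 495, "sorghum": 860, "brachypodium": 265}

# Family totals reported in the figure for each species.
reported_totals = {"rice": 16_235, "sorghum": 17_608, "brachypodium": 16_215}


def families_total(species):
    """Sum every Venn region that includes the given species."""
    in_pairs = sum(n for pair, n in pairwise.items() if species in pair)
    return core + unique[species] + in_pairs


def shared_percent(species):
    """Share of a species' gene families that fall in the assumed core."""
    return 100.0 * core / reported_totals[species]


for species, total in reported_totals.items():
    # The assumed region assignment reproduces each reported total exactly.
    assert families_total(species) == total
    print(f"{species}: {shared_percent(species):.1f}% of families in the core")
```

Under this reading the core accounts for roughly 77% of sorghum families and just under 84% of rice and Brachypodium families, which is consistent with the range quoted in the text.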
