TRPGR: Sequencing the Barley Gene-Space


Chan et al., Barley sequencing

TRPGR: Sequencing the barley gene-space

PI: Agnes P. Chan (The Institute for Genomic Research, Rockville, MD). Co-PIs: Timothy J. Close (UC, Riverside, CA); Stefano Lonardi (UC, Riverside, CA); Gary J. Muehlbauer (UMN, St. Paul, MN); Roger Wise (USDA-ARS/ISU, Ames, IA). Senior Personnel: Pablo D. Rabinowicz (The Institute for Genomic Research, Rockville, MD). Service providers: Jeffrey Bennetzen (UGA, Athens, GA); Ming Cheng Luo (UC, Davis, CA).

PROJECT DESCRIPTION

RELEVANCE AND JUSTIFICATION

The grain crops in the Triticeae tribe, barley (Hordeum vulgare L.), common wheat (Triticum aestivum), and rye (Secale cereale), are cultivated on 66 million acres in the United States with an average annual value of $8 billion (USDA-National Agricultural Statistics Service, http://www.usda.gov/nass/pubs/agr05/05_ch1.PDF). Barley is one of the major grains used in the food, feed, and beer industries throughout the world. Because barley is a true diploid, it is a natural model for genetics and genomics for the Triticeae tribe. Highly collaborative national and international efforts have produced a substantial body of genetic and genomic resources in the past several years, including extensive structured populations and genetic maps, >460,000 expressed sequence tags (ESTs), the community-designed Affymetrix 22K Barley1 GeneChip along with >130 Gb of contributed expression data from >100 treatments, a bacterial artificial chromosome (BAC) library from the US covering 6 genome equivalents, a second BAC library from a European/Australian effort, and a physical map representing gene-containing contigs.

Use this:

Mission statement: The objective of the IBSC is to physically map and sequence the barley gene space. The near-term need is the identification of the remainder of the ~50,000 genes, including the 5’ and 3’ regulatory regions; the longer-term goal is an ordered physical map linked to the genetic map to accelerate crop improvement.

------

News:

The International Barley Sequencing Consortium (IBSC), including the US, Germany, UK, Finland, Australia, and Japan (http://barleygenome.org), was formalized at the 18th annual International Triticeae Mapping Initiative (ITMI) workshop (Victor Harbor, Australia; August 27 – 31, 2006). Recent initiatives of the consortium include:

1. 5,000 full-length (FL) barley cDNAs from cv. Haruna Nijo (K. Sato, Okayama University, Japan; nearly complete). Another 30,000 FL cDNAs from the same cDNA pool will be produced at Tsukuba in the next three years.

2. A new integrated whole-genome barley physical map, funded by the Leibniz Society, has been initiated at the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) in Gatersleben, Germany, and the Australian Centre for Plant Functional Genomics (ACPFG). 350,000 HICF fingerprints of cv. Morex will be integrated with a minimal tiling path from the NSF project (PI Close, Award #0321756). In addition, a pilot physical map of barley chromosome 3 (syntenic with rice chromosome 1) has been initiated for cv. Haruna Nijo by K. Sato from Okayama University. This will also link with French, UK, US, and Australian efforts to physically map and sequence chromosomes 3A, 3B, and 3D from wheat.

3. A proposal to generate 800,000 BAC-end sequences from the Morex physical map and anchor them to the genetic map has been submitted for EU funding by the Scottish Crop Research Institute (SCRI), the University of Udine, and IPK. BAC-end sequencing is to be done at the Arizona Genomics Institute (AGI) in the US.

4. 3,000 single nucleotide polymorphisms (SNPs) on two Illumina 1,536-SNP OPAs for high-throughput mapping and genotyping from the USDA-CSREES Barley CAP project (PI Muehlbauer, Award #2005-05128). These will be combined with the SCRI efforts to reach a total of 5,000 SNPs to quickly leverage the genomic efforts into plant breeding.

Sequencing of grass genomes:

In addition to the existing rice genome resources and the ongoing maize genome sequencing effort, genomic resources for other grass species are also being developed, allowing comparative analysis within the grass family. The Joint Genome Institute (JGI) of the US Department of Energy (DOE) will generate ESTs and an 8X whole genome shotgun (WGS) draft of the 500 Mbp genome of Brachypodium distachyon, a member of the Brachypodieae tribe, which, like the Triticeae, belongs to the Pooideae subfamily (http://www.jgi.doe.gov/sequencing/why/CSP2007/brachypodium.html). Wheat and Brachypodium conserved orthologous sequences (COS), using rice as the core genome, are being developed at the John Innes Centre (UK). JGI will also carry out a large-scale EST project on switchgrass (Panicum virgatum), a member of the Panicoideae subfamily, which also includes maize (http://www.jgi.doe.gov/sequencing/why/CSP2007/switchgrass.html). Thus, sequencing the barley gene space will provide an excellent genomic resource not only for the Triticeae tribe but also for the grass family (Poaceae) in general.

------

In the NSF-funded project led by T. Close, clustered EST sequences were used to design overgos that were hybridized against the barley Morex BAC library to select clones containing expressed sequences. Typically, EST sequencing identifies only up to 60% of the genes in a given genome because the transcription of many genes is highly regulated (Barbazuk, 2005). Genomic-based gene-targeted approaches, on the other hand, result in a more comprehensive gene representation than cDNA-based approaches, reaching 90 to 95% of the total gene content (Barbazuk, 2005; Bedell, 2005; Martienssen, 2004; Rabinowicz, 2003). Thus, we propose to apply two gene-enrichment (GE) methods, methylation filtration (MF) and high C0t (HC) selection, to capture the remaining genes that are currently not represented by EST contigs. Experience from maize GE sequencing at TIGR has already demonstrated that the clustering and assembly of GE reads yields many complete gene sequences, including upstream and downstream regulatory sequences and introns. Preliminary results show that MF is highly effective when applied to barley and can enrich for gene sequences up to 18-fold compared to a random sequence sample from WGS sequencing [5], making it an extremely efficient gene discovery tool for the barley genome. Application of HC to other cereal genomes, including maize and wheat, has shown that HC and MF result in comparable levels of gene enrichment. In addition, analysis from the maize GE sequencing project demonstrated that the MF and HC methods targeted distinct but overlapping genic regions and are therefore complementary approaches for capturing the gene space from large genomes with a high repeat content, resulting in one of the most successful gene discovery efforts in grass genomics. Thus, this barley GE sequencing initiative is the logical next step in the US commitment to the international effort to physically map and sequence the barley “gene space”.
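To make the comparison between EST-based and gene-targeted approaches concrete, the percentages quoted above can be converted into rough gene counts. The sketch below assumes the ~50,000-gene estimate from the mission statement and treats the quoted percentages as simple fractions of that total; the function and variable names are illustrative and are not part of the proposed analysis.

```python
# Rough gene-count arithmetic from the percentages quoted in the text.
# TOTAL_GENES is the ~50,000 estimate from the mission statement; it is an
# approximation, not a measured value.
TOTAL_GENES = 50_000


def genes_captured(fraction, total=TOTAL_GENES):
    """Approximate number of genes represented at a given fraction of the total."""
    return round(fraction * total)


est_upper = genes_captured(0.60)  # EST sequencing: up to 60% of genes
ge_low = genes_captured(0.90)     # gene-targeted genomic approaches: 90%
ge_high = genes_captured(0.95)    # ... to 95%

print(f"ESTs (upper bound): ~{est_upper} genes")
print(f"Gene-targeted approaches: ~{ge_low} to ~{ge_high} genes")
print(f"Difference between the two: ~{ge_low - est_upper} to ~{ge_high - est_upper} genes")
```

Under these assumptions, the difference between the two approaches is on the order of 15,000 or more genes, which is the set that the MF and HC reads are intended to reach.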

Another important preliminary step towards sequencing the large genome of barley is to obtain a glimpse of the genome structure and how it compares to other related sequenced genomes. As barley is expected to have a low gene density (approximately 1 gene every 100 kbp), contiguous sequences in the megabase size range are necessary to perform colinearity analyses that involve several genes per region. Sequencing of BAC contigs in maize, for example, has provided a further understanding of the structure and evolution of this large genome and of how the genome and genes expanded and contracted relative to the rice genome (Bruggmann et al., 2006, Genome Research, in press).
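The need for megabase-scale contigs follows directly from the expected gene density. The short sketch below works through that arithmetic, assuming a uniform average of 1 gene per 100 kbp as quoted above; the 120 kbp BAC insert size and the contig lengths are hypothetical values chosen only for illustration.

```python
# Back-of-the-envelope gene counts per contiguous sequence, assuming the
# ~1 gene per 100 kbp average density quoted in the text. Real gene density
# varies along a chromosome, so these are expectations, not predictions.
KBP_PER_GENE = 100


def expected_genes(length_kbp, kbp_per_gene=KBP_PER_GENE):
    """Expected number of genes in a sequence of the given length (in kbp)."""
    return length_kbp / kbp_per_gene


# The 120 kbp BAC insert size is a hypothetical value for illustration.
examples = {
    "single BAC (hypothetical 120 kbp insert)": 120,
    "1 Mbp BAC contig": 1_000,
    "3 Mbp BAC contig": 3_000,
}

for label, length_kbp in examples.items():
    print(f"{label}: ~{expected_genes(length_kbp):.1f} genes expected")
```

At this density a single BAC would be expected to carry only about one gene, whereas a contig of one megabase or more would carry around ten or more, enough to compare gene order across a region.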

The proposed project will integrate with and complement the existing barley ESTs to give researchers access to the entire barley gene set. GE sequences will be related to gene expression via in silico alignment with the existing Barley1 gene sequences. Furthermore, GE sequences not represented by ESTs will constitute a source of new overgo probes to identify additional gene-harboring BACs for integration into the barley physical map. In the long term, these gene-rich BAC contigs will be the foundation for eventually sequencing the barley genome, and novel genic GE sequences will be an invaluable resource for designing Barley2, the next-generation 61,000-probe-set GeneChip. The GE sequences will be linked to genetically mapped markers by alignment to all SNP loci generated from the USDA-NRI CAP project (Muehlbauer, Close, and Wise) as well as current and future collaborating international efforts, thus translating genomics to plant breeding.

The barley GE and BAC contig sequences generated in this project will provide the research community with a substantial number of complete genes, which will be annotated with state-of-the-art tools that have been successfully applied to other plant genome sequencing projects (e.g. Arabidopsis, rice, and maize). The sequences will constitute a comprehensive catalogue of barley genes, which will be used for genome-wide and cross-species comparative studies. The project will provide the broader grass and plant research communities with the first extensive gene-space sequencing of a member of the Triticeae tribe. The BAC contig sequences will represent the first megabase-size contiguous draft sequences from barley. They will provide the first glimpse into the architectural landscape of the barley genome, and possible insight into other members of the Triticeae tribe, synergizing with other cereal genomes such as rice, maize, and sorghum for comparative genomics studies.

Our multi-disciplinary team will actively participate in undergraduate and graduate education, providing opportunities for advanced training in genomics and bioinformatics, particularly for under-represented groups. Thus, these activities will promote research, education, and the dissemination of our results to a broad audience, while developing a new generation of agricultural scientists.

Deliverables:

1. Gene-enriched (GE) assemblies. The MF/HC sequencing approaches will capture gene sequences not represented by ESTs, including 5' and 3' flanking regulatory regions, and will provide the community with novel gene sequences in a timely manner. GE assemblies not represented in EST contigs will be used for overgo design.

2. Improved physical map by inclusion of both EST- and GE-derived gene-containing BACs. This will enable integration of the US physical map with the new European/Australian physical map.

3. BAC contigs with assembled draft sequences. This will provide a case study for genome architecture and evolution.
