An Integrated Transcriptome Atlas of the Crop Model Glycine Max, and Its Use in Comparative Analyses in Plants
The Plant Journal (2010) 63, 86–99
doi: 10.1111/j.1365-313X.2010.04222.x

Marc Libault1,*, Andrew Farmer2, Trupti Joshi3, Kaori Takahashi1, Raymond J. Langley2, Levi D. Franklin3, Ji He4, Dong Xu3, Gregory May2 and Gary Stacey1

1 Division of Plant Sciences, National Center for Soybean Biotechnology, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
2 National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA
3 Computer Science Department, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
4 Plant Biology Division, Samuel Roberts Noble Foundation, Ardmore, OK 73401, USA

Received 18 January 2010; revised 25 March 2010; accepted 31 March 2010; published online 14 May 2010.
*For correspondence (fax +573 884 9676; e-mail [email protected]).

SUMMARY

Soybean (Glycine max L.) is a major crop providing an important source of protein and oil, which can also be converted into biodiesel. A major milestone in soybean research was the recent sequencing of its genome. The sequence predicts 69 145 putative soybean genes, with 46 430 predicted with high confidence. In order to examine the expression of these genes, we utilized the Illumina Solexa platform to sequence cDNA derived from 14 conditions (tissues). The result is a searchable soybean gene expression atlas accessible through a browser (http://digbio.missouri.edu/soybean_atlas). The data provide experimental support for the transcription of 55 616 annotated genes and also demonstrate that 13 529 annotated soybean genes are putative pseudogenes, and 1736 currently unannotated sequences are transcribed.
An analysis of this atlas reveals strong differences in gene expression patterns between different tissues, especially between root and aerial organs, but also reveals similarities between gene expression in other tissues, such as flower and leaf organs. In order to demonstrate the full utility of the atlas, we investigated the expression patterns of genes implicated in nodulation, and also transcription factors, using both the Solexa sequence data and large-scale qRT-PCR. The availability of the soybean gene expression atlas allowed a comparison with gene expression documented in the two model legume species, Medicago truncatula and Lotus japonicus, as well as data available for Arabidopsis thaliana, facilitating both basic and applied aspects of soybean research.

Keywords: soybean, gene expression atlas, comparative genomic, transcription factors, nodulation.

© 2010 The Authors. Journal compilation © 2010 Blackwell Publishing Ltd

INTRODUCTION

After grasses, legumes are the most economically important plant family based on their consumption in human and animal nutrition. In addition, the use of legumes in biofuel production will further increase the economic impact of this plant family. These characteristics justify a substantial effort by the research community to better understand legume biology. An attribute of most legumes is the development of a symbiotic interaction with soil bacteria (rhizobia) that fix and assimilate atmospheric dinitrogen (atmN2). This symbiosis is based on the chemical recognition of diffusible signals by both partners, which determines the specificity of the interaction (Oldroyd and Downie, 2008). For example, the recognition of the lipo-chitin Nod factor, produced by rhizobia, by the root hair cells of the compatible host leads to plant morphological and biochemical changes (e.g. root hair cell curling, cortical cell division, induction of Nod factor-responsive plant genes and calcium spiking in root hair cells). These changes are the first signs of the development of a new plant organ, the nodule, where the bacteria differentiate into bacteroids and reduce atmN2. In exchange, the plant provides a steady supply of carbon to the bacteroids.

As part of the effort to better understand legume biology, the genome sequences of three legume species are now complete, or nearly complete: that is, Lotus japonicus (Lotus; http://www.kazusa.or.jp/lotus), Glycine max (soybean; http://www.phytozome.net/soybean) and Medicago truncatula (Medicago; http://www.medicago.org/genome). Schmutz et al. (2010) recently described the complete soybean genome sequence. In each case, a large number of genes were predicted. The availability of these genome sequences now enables a variety of functional genomic methods to characterize these genes and their related functions. For example, large-scale cDNA sequencing technologies [e.g. 454 Life Sciences (Margulies et al., 2005) and Illumina Solexa platforms (Bennett et al., 2005)] provide a means to accurately profile gene expression (e.g. Libault et al., 2010). In the past, gene expression atlases were established in Arabidopsis thaliana (Schmid et al., 2005), Oryza sativa (Nobuta et al., 2007; Jiao et al., 2009), M. truncatula (Benedito et al., 2008) and L. japonicus (Hogslund et al., 2009) by using massively parallel signature sequencing and array-hybridization technologies.

In this study, the high-throughput Illumina Solexa sequencing platform was used to develop a gene expression atlas of the soybean genome. cDNAs derived from a total of nine different soybean tissues were sequenced. Included in the soybean gene atlas are five additional data sets, described by Libault et al. (2010), for a combined total of 14 different conditions (tissues). This provides an unprecedented coverage of the transcriptome, including documentation of expression from annotated pseudogenes and unannotated genes, and also provides accurate quantification of low-abundance transcripts (Cheung et al., 2006; Weber et al., 2007; Libault et al., 2010). To demonstrate the utility of the soybean gene expression atlas, we focused specifically on expression in root hair cells, as well as on meristem-specific genes and expression of transcription factor (TF) genes. The results from the soybean gene expression atlas were also compared with previously published expression data from A. thaliana, M. truncatula and L. japonicus. For example, the comparison to the well-annotated A. thaliana genome identified putative soybean genes involved in the determination of floral organs and the maintenance of the shoot apical meristem (SAM). The availability of the soybean gene expression atlas should facilitate additional studies on the basic biology of soybean, while also supporting applied research to improve soybean agronomic performance.

RESULTS AND DISCUSSION

Sequence-based transcriptome atlas of soybean: an overview

We used the Illumina Solexa sequencing platform to quantify the expression of soybean genes (i.e. the number of sequence reads/million reads aligned) in nine different conditions: root hair cells isolated 84 and 120 h after sowing (HAS), root tip, root, mature nodules, leaves, SAM, flower and green pods. Our choice to include root hair cells isolated at two different time points in this analysis was motivated by the changes in their transcriptome during development (Libault et al., 2010).

Between 4.18 and 6.84 million reads of around 36 bp were generated for each of the nine conditions. Among them, 45.8–82.6% of the reads aligned with five or fewer loci on the soybean genome (Table 1). Such variation resulted from the high and low numbers of unaligned and repetitive reads (i.e. from matches with more than five loci) in pod (54.2% of the total reads) and flower samples (17.4% of the total reads), respectively. We classified the sequence reads aligned with five or fewer loci on the soybean genome into two different groups based on the number of matches identified against the soybean genome [i.e. non-unique reads (from two to five loci) and unique reads (only one soybean locus); Table 1]. To ensure accuracy in the quantification of expression in the different tissues tested, only the sequence reads matching uniquely against the soybean genome were used.

Table 1 Distribution of Illumina-Solexa 36-bp reads according to their alignment against the Glycine max (soybean) genome

Sample                  G. max unique   G. max non-unique   Unaligned and highly repetitive   Total reads
                                        (2–5 matches)       reads (>5 matches)
Root tip                3 235 689       850 750             1 068 142                         5 154 581
Root                    3 790 433       884 257             1 432 754                         6 107 444
84-HAS root hairs       2 828 246       719 626             2 063 637                         5 611 509
120-HAS root hairs      4 086 965       1 052 457           1 698 787                         6 838 209
Nodule                  3 401 083       936 037             1 999 389                         6 336 509
Leaves                  2 813 916       1 202 914           1 279 012                         5 295 842
Shoot apical meristem   3 947 566       1 041 894           1 488 700                         6 478 160
Flower                  3 372 444       902 730             901 116                           5 176 290
Green pods              1 462 809       453 340             2 268 639                         4 184 788

HAS, h after sowing.

A total of 51 529 annotated soybean genes (74.5% of the 69 145 putative, annotated soybean genes) were found to be expressed in at least one condition (Table S1). Included in the present analysis are five additional data sets described by Libault et al. (2010), i.e. root hairs harvested 12, 24 and 48 h after Bradyrhizobium japonicum inoculation (HAI); 24-HAI mock-inoculated root hairs; and 48-HAI inoculated stripped roots (Table S2), resulting in the documentation of expression for a total of 52 947 annotated genes. No gene expression in any of the 14 conditions was detected for 16 198 annotated genes, suggesting that these genes were not expressed, were expressed at a level below our detection limit or were expressed only under highly restricted conditions (Table

each of the 7127 regions found to have gene expression. Using FGENESH, we predicted putative protein-coding genes for 6059 of the 7127 loci (85%).
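
The read-count arithmetic behind Table 1 can be reproduced with a short script. The sketch below is only an illustration, not the pipeline used in the study: the counts are copied from Table 1, the function names are hypothetical, and the choice of the uniquely aligned reads as the denominator for the reads-per-million normalization is an assumption, as the text does not specify which aligned-read total was used.

```python
# Read counts copied from Table 1, per sample:
# (unique, non-unique with 2-5 matches, unaligned or highly repetitive with >5 matches)
TABLE1 = {
    "Root tip": (3_235_689, 850_750, 1_068_142),
    "Root": (3_790_433, 884_257, 1_432_754),
    "84-HAS root hairs": (2_828_246, 719_626, 2_063_637),
    "120-HAS root hairs": (4_086_965, 1_052_457, 1_698_787),
    "Nodule": (3_401_083, 936_037, 1_999_389),
    "Leaves": (2_813_916, 1_202_914, 1_279_012),
    "Shoot apical meristem": (3_947_566, 1_041_894, 1_488_700),
    "Flower": (3_372_444, 902_730, 901_116),
    "Green pods": (1_462_809, 453_340, 2_268_639),
}


def aligned_fraction(unique, non_unique, unaligned_or_repetitive):
    """Fraction of all reads that matched between one and five loci."""
    total = unique + non_unique + unaligned_or_repetitive
    return (unique + non_unique) / total


def reads_per_million(gene_reads, aligned_reads):
    """Normalize a per-gene read count to reads per million aligned reads.

    Assumption: `aligned_reads` is the number of uniquely aligned reads
    for the sample; the source text does not state the exact denominator.
    """
    return gene_reads * 1_000_000 / aligned_reads


fractions = {sample: aligned_fraction(*counts) for sample, counts in TABLE1.items()}
lowest = min(fractions, key=fractions.get)
highest = max(fractions, key=fractions.get)

for sample, fraction in fractions.items():
    print(f"{sample}: {fraction * 100:.1f}% of reads aligned to five or fewer loci")
print(f"Lowest: {lowest}; highest: {highest}")

# Hypothetical example: a gene with 50 uniquely aligned reads in the root tip sample.
example_rpm = reads_per_million(50, TABLE1["Root tip"][0])
print(f"Example normalized expression: {example_rpm:.2f} reads per million")
```

Run on the Table 1 counts, the lowest and highest aligned fractions come from the green pod and flower samples, which matches the 45.8–82.6% range reported in the text.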