The Plant Journal (2010) 63, 86–99
doi: 10.1111/j.1365-313X.2010.04222.x

An integrated transcriptome atlas of the crop model Glycine max, and its use in comparative analyses in plants

Marc Libault1,*, Andrew Farmer2, Trupti Joshi3, Kaori Takahashi1, Raymond J. Langley2, Levi D. Franklin3, Ji He4, Dong Xu3, Gregory May2 and Gary Stacey1

1Division of Plant Sciences, National Center for Biotechnology, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA,
2National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA,
3Computer Science Department, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA, and
4Plant Biology Division, Samuel Roberts Noble Foundation, Ardmore, OK 73401, USA

Received 18 January 2010; revised 25 March 2010; accepted 31 March 2010; published online 14 May 2010. *For correspondence (fax +573 884 9676; e-mail [email protected]).

SUMMARY

Soybean (Glycine max L.) is a major crop providing an important source of protein and oil, which can also be converted into biodiesel. A major milestone in soybean research was the recent sequencing of its genome. The sequence predicts 69 145 putative soybean genes, with 46 430 predicted with high confidence. In order to examine the expression of these genes, we utilized the Illumina Solexa platform to sequence cDNA derived from 14 conditions (tissues). The result is a searchable soybean expression atlas accessible through a browser (http://digbio.missouri.edu/soybean_atlas). The data provide experimental support for the transcription of 55 616 annotated genes and also demonstrate that 13 529 annotated soybean genes are putative pseudogenes, and 1736 currently unannotated sequences are transcribed. An analysis of this atlas reveals strong differences in gene expression patterns between different tissues, especially between root and aerial organs, but also reveals similarities between gene expression in other tissues, such as flower and leaf organs. In order to demonstrate the full utility of the atlas, we investigated the expression patterns of genes implicated in nodulation, and also transcription factors, using both the Solexa sequence data and large-scale qRT-PCR. The availability of the soybean gene expression atlas allowed a comparison with gene expression documented in the two model legume species, Medicago truncatula and Lotus japonicus, as well as data available for Arabidopsis thaliana, facilitating both basic and applied aspects of soybean research.

Keywords: soybean, gene expression atlas, comparative genomic, transcription factors, nodulation.

© 2010 The Authors. Journal compilation © 2010 Blackwell Publishing Ltd

INTRODUCTION

After grasses, legumes are the most economically important plant family based on their consumption in human and animal nutrition. In addition, the use of legumes in biofuel production will further increase the economic impact of this plant family. These characteristics justify a substantial effort by the research community to better understand legume biology. An attribute of most legumes is the development of a symbiotic interaction with soil bacteria (rhizobia) that fix and assimilate atmospheric dinitrogen (atmN2). This symbiosis is based on the chemical recognition of diffusible signals by both partners, which determines the specificity of the interaction (Oldroyd and Downie, 2008). For example, the recognition of the lipo-chitin Nod factor, produced by rhizobia, by the root hair cells of the compatible host leads to plant morphological and biochemical changes (e.g. root hair cell curling, cortical cell division, induction of Nod factor-responsive plant genes and calcium spiking in root hair cells). These changes are the first signs of the development of a new plant organ, the nodule, where the bacteria differentiate into bacteroids and reduce atmN2. In exchange, the plant provides a steady supply of carbon to the bacteroids.

As part of the effort to better understand legume biology, the genome sequences of three legume species are now complete, or nearly complete: that is, Lotus japonicus (Lotus; http://www.kazusa.or.jp/lotus), Glycine max (soybean; http://www.phytozome.net/soybean) and Medicago truncatula (Medicago; http://www.medicago.org/genome). Schmutz et al. (2010) recently described the complete soybean genome sequence. In each case, a large number of genes were predicted. The availability of these genome sequences now enables a variety of functional genomic methods to characterize these genes and their related functions. For example, large-scale cDNA sequencing technologies [e.g. 454 Life Sciences (Margulies et al., 2005) and Illumina Solexa platforms (Bennett et al., 2005)] provide a means to accurately profile gene expression (e.g. Libault et al., 2010). In the past, gene expression atlases were established in Arabidopsis thaliana (Schmid et al., 2005), Oryza sativa (Nobuta et al., 2007; Jiao et al., 2009), M. truncatula (Benedito et al., 2008) and L. japonicus (Hogslund et al., 2009) by using massively parallel signature sequencing and array-hybridization technologies.

In this study, the high-throughput Illumina Solexa sequencing platform was used to develop a gene expression atlas of the soybean genome. cDNAs derived from a total of nine different soybean tissues were sequenced. Included in the soybean gene atlas are five additional data sets, described by Libault et al. (2010), for a combined total of 14 different conditions (tissues). This provides an unprecedented coverage of the transcriptome, including documentation of expression from annotated pseudogenes and unannotated genes, and also provides accurate quantification of low-abundance transcripts (Cheung et al., 2006; Weber et al., 2007; Libault et al., 2010). To demonstrate the utility of the soybean gene expression atlas, we focused specifically on expression in root hair cells, as well as on meristem-specific genes and expression of transcription factor (TF) genes. The results from the soybean gene expression atlas were also compared with previously published expression data from A. thaliana, M. truncatula and L. japonicus. For example, the comparison to the well-annotated A. thaliana genome identified putative soybean genes involved in the determination of floral organs and the maintenance of the shoot apical meristem (SAM). The availability of the soybean gene expression atlas should facilitate additional studies on the basic biology of soybean, while also supporting applied research to improve soybean agronomic performance.

RESULTS AND DISCUSSION

Sequence-based transcriptome atlas of soybean: an overview

We used the Illumina Solexa sequencing platform to quantify the expression of soybean genes (i.e. the number of sequence reads/million reads aligned) in nine different conditions: root hair cells isolated 84 and 120 h after sowing (HAS), root tip, root, mature nodules, leaves, SAM, flower and green pods. Our choice to include root hair cells isolated at two different time points in this analysis was motivated by the changes in their transcriptome during development (Libault et al., 2010). Between 4.18 and 6.84 million reads of around 36 bp were generated for each of the nine conditions. Among them, 45.8–82.6% of the reads aligned with five or fewer loci on the soybean genome (Table 1). Such variation resulted from the high and low numbers of unaligned and repetitive reads (i.e. from matches with more than five loci) in pod (54.2% of the total reads) and flower samples (17.4% of the total reads), respectively. We classified the sequence reads aligned with five or fewer loci on the soybean genome into two different groups based on the number of matches identified against the soybean genome [i.e. non-unique reads (from two to five loci) and unique reads (only one soybean locus); Table 1]. To ensure accuracy in the quantification of expression in the different tissues tested, only the sequence reads matching uniquely against the soybean genome were used. A total of 51 529 annotated soybean genes (74.5% of the 69 145 putative, annotated soybean genes) were found to be expressed in at least one condition (Table S1). Included in the present analysis are five additional data sets described by Libault et al. (2010), i.e. root hairs harvested 12, 24 and 48 h after Bradyrhizobium japonicum inoculation (HAI), 24-HAI mock-inoculated root hairs and 48-HAI inoculated stripped roots (Table S2), resulting in the documentation of expression for a total of 52 947 annotated genes. No gene expression in any of the 14 conditions was detected for 16 198 annotated genes,

Table 1 Distribution of Illumina-Solexa 36-bp reads according to their alignment against the Glycine max (soybean) genome

Sample                  G. max unique   G. max non-unique   Unaligned and highly repetitive   Total reads
                                        (2–5 matches)       reads (>5 matches)
Root tip                3 235 689       850 750             1 068 142                         5 154 581
Root                    3 790 433       884 257             1 432 754                         6 107 444
84-HAS root hairs       2 828 246       719 626             2 063 637                         5 611 509
120-HAS root hairs      4 086 965       1 052 457           1 698 787                         6 838 209
Nodule                  3 401 083       936 037             1 999 389                         6 336 509
Leaves                  2 813 916       1 202 914           1 279 012                         5 295 842
Shoot apical meristem   3 947 566       1 041 894           1 488 700                         6 478 160
Flower                  3 372 444       902 730             901 116                           5 176 290
Green pods              1 462 809       453 340             2 268 639                         4 184 788

HAS, h after sowing.
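The alignment percentages quoted in the text can be recomputed from Table 1. The following is a minimal sketch, not the authors' pipeline: the read counts are copied from Table 1, while the function names and the choice of uniquely aligned reads as the denominator for the reads-per-million value are assumptions made for illustration.

```python
# Read counts copied from Table 1, per sample:
# (unique, non-unique with 2-5 matches, unaligned or >5 matches)
TABLE1 = {
    "Root tip": (3235689, 850750, 1068142),
    "Root": (3790433, 884257, 1432754),
    "84-HAS root hairs": (2828246, 719626, 2063637),
    "120-HAS root hairs": (4086965, 1052457, 1698787),
    "Nodule": (3401083, 936037, 1999389),
    "Leaves": (2813916, 1202914, 1279012),
    "Shoot apical meristem": (3947566, 1041894, 1488700),
    "Flower": (3372444, 902730, 901116),
    "Green pods": (1462809, 453340, 2268639),
}


def aligned_percent(unique, non_unique, unaligned):
    """Percentage of all reads that aligned to five or fewer loci."""
    total = unique + non_unique + unaligned
    return 100.0 * (unique + non_unique) / total


def reads_per_million(gene_reads, uniquely_aligned_reads):
    """Hypothetical helper: reads for one gene per million aligned reads.

    Assumption: the denominator is the number of uniquely aligned reads
    in the sample; the paper does not spell out the exact denominator.
    """
    return gene_reads * 1_000_000 / uniquely_aligned_reads


# Aligned share per sample, rounded to one decimal place as in the text.
ALIGNED = {s: round(aligned_percent(*c), 1) for s, c in TABLE1.items()}
```

Under these assumptions, the lowest and highest values in `ALIGNED` correspond to green pods and flower, matching the 45.8–82.6% range given in the text.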

suggesting that these genes were not expressed, were expressed at a level below our detection limit or were expressed only under highly restricted conditions (Table S2). The data also show expression from 7314 different soybean loci currently lacking any gene annotation (Table S2). Considering only the nine conditions sequenced as part of the current study, the data demonstrate expression from 7174 currently unannotated regions (Table S1). A number of root hair genes were found to be specifically expressed upon inoculation with B. japonicum, as documented by Libault et al. (2010).

The soybean genome annotation, as described by Schmutz et al. (2010), refers to 46 430 soybean genes predicted with high confidence, with the remaining genes predicted with low confidence. We compared our gene list for which no detectable expression was found across 14 conditions with the list of low-confidence genes. From the list of 16 198 putative genes lacking expression, 12 673 (78.2%) were predicted with low confidence in the current soybean genome annotation (Table S3). The presence of an expressed sequence tag (EST) or full-length cDNA sequence led to the annotation of the remaining 3525 genes with high confidence (Table S3). Having reviewed the conditions in which these 3525 transcripts were detected, we conclude these genes were expressed under highly restricted conditions, such as at very specific stages of organ development or in specific response to abiotic stress, such as drought stress. Therefore, it is likely that most of the 12 673 low-confidence genes, which lack expression, are pseudogenes.

Soybean is an allotetraploid that has undergone at least two rounds of whole genome duplication, with the most recent having occurred approximately 13 Mya (Schlueter et al., 2004, 2007; Gill et al., 2009). In a previous study, we demonstrated cases in which the homeologous gene pairs showed significant divergence in their expression (Libault et al., 2010). In order to examine this on a whole genome basis, we established syntenic relationships between 19 533 annotated genes (28.2% of the annotated soybean genes) to establish their homeology (Table S4). Among the 12 673 predicted pseudogenes, we identified homeologs expressed at some level in all conditions tested for only 61 (<1%; Table S5). Such results are consistent with current theories of gene evolution, where, after whole genome duplication, gene fates include silencing or neofunctionalization of one of the two copies (Adams, 2007).

A number of sequence reads matched against the 7314 loci currently lacking gene annotation (Table S2). The majority of these loci (7127) were found in regions assembled as part of the chromosome pseudomolecules, whereas the remainder (187) were located on currently unanchored scaffolds. In a previous study, we demonstrated the use of high-throughput cDNA sequencing to improve the current soybean genome annotation (Libault et al., 2010). Therefore, we mined 20 kbp of the genomic DNA sequence around each of the 7127 regions found to have gene expression. Using FGENESH, we predicted putative protein-coding genes for 6059 of the 7127 loci (85%). Among them, 4323 of the gene predictions overlapped existing annotated genes, resulting in the 5′ or 3′ expansion of the currently annotated cDNA sequences (Table S6). The remaining 1736 genes predicted by FGENESH did not overlap currently annotated genes, suggesting the existence of new protein-coding genes. We used Interproscan (Zdobnov and Apweiler, 2001) software to identify the signature domains of the encoded proteins: 542 and 1194 genes encode proteins with and without conserved domains, respectively (Table S7). Altogether, our analysis suggested that 57 352 soybean genes are transcribed (i.e. 55 616 out of the 69 145 putative genes in the current, published soybean genome annotation; the remaining 13 529 are putative pseudogenes, plus 1736 newly annotated genes).

Tissue-specific gene expression

Benedito et al. (2008) noticed large differences in the transcriptome between one M. truncatula organ compared with another, based on a number of DNA microarray hybridizations. Similarly, Schmid et al. (2005) and Aceituno et al. (2008) concluded that the A. thaliana transcriptome strongly varied from one organ to another. These studies suggest that the identity of specific plant organs is derived from the respective transcriptome. In soybean, across the nine tissues tested in the current study, the number of annotated/unannotated sequences transcribed was similar from one tissue to another (min. 52.4% in pod; max. 61.2% in the SAM; Table 2). Altogether, these percentages were slightly lower than those reported in M. truncatula (55–63%; Benedito et al., 2008) and A. thaliana tissues (55–67%; Schmid et al., 2005). Such differences might be a direct consequence of the non-negligible number of putative pseudogenes mentioned above, and might also reflect the residual background or cross-hybridization existing when using array hybridization technology. A similar number of soybean genes were expressed in a single cell type (root hair) and in multicellular organs (e.g. 45 717, 40 034, 43 377 and 46 173 soybean genes were expressed in flower, pod, 84- and 120-HAS root hair cells, respectively; Table 2). Jiao et al. (2009) previously reported that transcripts undetectable in cDNA derived from shoot, root or germinated seeds could be detected if mRNA was sampled from a single cell type from this organ. Therefore, we hypothesize that the heterogeneous population of differentiated cells composing a soybean organ results in a larger diversity of expressed sequences, but also in the poor detection of low-abundance transcripts. In contrast, cDNA derived from the single cell root hairs allows for the detection of low-abundance transcripts, because of a lack of dilution from other tissues, and the homogeneity of the tissue sampled. Apparently, these opposing factors result in approximately the same number


Table 2 Distribution of expressed and not expressed annotated and unannotated sequences across nine Glycine max (soybean) tissues

                        Number of expressed sequences                 Number of silenced sequences (i.e. no transcript detected)
Tissue                  Annotated             Unannotated             Annotated             Unannotated
                        sequences (%)         sequences (%)           sequences (%)         sequences (%)
Root hair 84 HAS        38 645 (50.54)        4732 (6.19)             30 500 (39.89)        2582 (3.38)
Root hair 120 HAS       40 849 (53.43)        5324 (6.97)             28 296 (37.01)        1990 (2.60)
Root tip                36 882 (48.24)        4624 (6.05)             32 263 (42.20)        2690 (3.52)
Root                    40 576 (53.07)        5126 (6.71)             28 569 (37.37)        2188 (2.86)
Nodule                  36 369 (47.57)        4438 (5.81)             32 776 (42.87)        2876 (3.76)
Leaf                    37 600 (49.18)        4518 (5.91)             31 545 (41.26)        2796 (3.66)
Shoot apical meristem   41 415 (54.17)        5341 (6.99)             27 730 (36.27)        1973 (2.58)
Flower                  40 863 (53.44)        4854 (6.35)             28 282 (36.99)        2460 (3.22)
Pod                     36 325 (47.51)        3709 (4.85)             32 820 (42.92)        3605 (4.72)
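The per-tissue totals cited in the text (for example, 45 717 expressed sequences in flower) are the sums of the annotated and unannotated expressed columns of Table 2. The sketch below recomputes them; the counts are copied from Table 2, the denominator of 76 459 loci (69 145 annotated genes plus 7314 unannotated loci) is inferred from the row sums, and the function name is hypothetical.

```python
# Expressed sequence counts copied from Table 2: (annotated, unannotated)
TABLE2_EXPRESSED = {
    "Root hair 84 HAS": (38645, 4732),
    "Root hair 120 HAS": (40849, 5324),
    "Root tip": (36882, 4624),
    "Root": (40576, 5126),
    "Nodule": (36369, 4438),
    "Leaf": (37600, 4518),
    "Shoot apical meristem": (41415, 5341),
    "Flower": (40863, 4854),
    "Pod": (36325, 3709),
}

# Assumption: percentages are relative to all annotated genes plus all
# unannotated transcribed loci (each Table 2 row sums to this value).
TOTAL_LOCI = 69145 + 7314


def expressed_summary(annotated, unannotated, total=TOTAL_LOCI):
    """Return (number of expressed sequences, percentage of all loci)."""
    count = annotated + unannotated
    return count, round(100.0 * count / total, 1)


SUMMARY = {t: expressed_summary(*c) for t, c in TABLE2_EXPRESSED.items()}
```

With these inputs, `SUMMARY["Pod"]` and `SUMMARY["Shoot apical meristem"]` give the minimum and maximum percentages quoted in the text.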

of transcripts sequenced from a single cell type and multicellular organ samples.

To better establish the identity of the different soybean tissues, we generated a heat map based on the correlation between their transcriptomes (Figure 1a). Based on this map, the nine organs can be divided into three different groups: (i) root tip, root and root hairs; (ii) SAM, pod, flower and leaf; and (iii) nodule. The lack of correlation between root-related tissues and aerial organs was previously reported by Benedito et al. (2008) in M. truncatula. These results are likely to reflect the divergence in function between the root and aerial portions of the plant. Consistent with this notion, other tissues show significant overlap in their transcriptomes. For example, gene expression in the soybean pod and SAM was strongly correlated (Figure 1a). The transcription profile can also reflect development. For example, the flower and leaf transcriptomes were closely correlated. In 1790, Goethe hypothesized that floral organs were modified leaves (Coen, 2001). Indeed, four MADS-box TF genes named SEPALLATA1–4 (SEP1, SEP2, SEP3 and SEP4, previously named AGL2, AGL4, AGL9 and AGL3) were characterized for their role in the acquisition of floral organ identity, as sep mutants develop leaf-like organs instead of flowers (Honma and Goto, 2001; Pelaz et al., 2001; Ditta et al., 2004). These results suggest that organ-specific gene expression could be the result of the action of relatively few regulatory genes.

The soybean nodule transcriptome showed little correlation with other organs, with the exception of mature roots. It is interesting to note that the soybean root hair transcriptome was not strongly correlated with that of the whole root, nor with any of the other soybean tissues analyzed (Figure 1a). This is likely to reflect the specialization of this single cell type, but also the tissue dilution that occurred by sampling the other organs, especially the roots.

In a previous study, Aceituno et al. (2008) showed that the Arabidopsis organ transcriptomes were not strongly affected in response to environmental changes. Therefore, the unique transcriptomic patterns exhibited by the various soybean organs are likely to reflect their unique identity, and are not the result of specific environmental conditions. Therefore, in order to better understand soybean organ development, we analyzed the soybean gene expression atlas to identify those genes that were ubiquitously expressed across the nine tissues, and those showing a very high level of tissue-specific expression. The results of this analysis showed that 58 703 soybean genome loci, including both annotated and unannotated regions, were expressed in at least one of the nine soybean tissues. Roughly half of these genes (28 374) were transcribed ubiquitously (Table S8). In theory, organ identity could depend on both the level of expression of ubiquitously expressed genes and the organ-specific expression of selected genes. To address this issue, we first compared the overall expression levels of the 28 374 ubiquitous genes between the nine conditions (Figure 1b). As shown in Figure 1, this analysis revealed significant differences in the absolute expression levels of the 28 374 ubiquitously expressed genes. These data also leave the impression that few, if any, soybean genes are stably expressed in the various soybean tissues. In order to examine this directly, we included the additional five conditions from the publication by Libault et al. (2010) to define genes constitutively expressed by the following criteria: (i) the gene was expressed in all 14 conditions tested; (ii) the fold change in the relative expression levels was not higher than three between conditions where genes were the most and the least expressed. These criteria identified 2532 putative constitutive genes (Figure S1; Table S9). Among these, PFAM, KOG or PANTHER conserved domains were identified for 2187 genes, leading to the identification of 140 TF genes [2.5% of the 5671 predicted TF genes in the soybean genome; Schmutz et al., 2010; Libault et al., 2009a; PFAM, KOG and PANTHER domain predictions are available from ftp://ftp.jgi-psf.org/pub/JGI_data/Glycine_max/Glyma1/Glyma1_domains]. Such a relatively low number is a direct reflection of the specific role of TF genes in the determination of plant organ identity.

Figure 1. Comparison of the transcriptomes of various Glycine max (soybean) tissues.
Ward hierarchical clustering of log2 transformed gene distribution in nine diverse soybean organs [root hair cells isolated 84 and 120 h after sowing, root tip, root, mature nodules, leaves, shoot apical meristem (SAM), flower and green pods], based on Pearson correlation coefficients. The entire soybean tissue transcriptome (a) or the 28 374 annotated soybean genes identified to be expressed in all nine tissues (b) were used to generate two distinct maps. The color scale indicates the degree of correlation (green, low correlation; red, strong correlation). The heat map was generated using JMP GENOMICS 4.0.

We also sought to identify soybean transcripts expressed solely in one soybean organ. These genes were classified into four groups depending on their tissue specificity: preferentially (≥3- and <10-fold changes between the expression levels of the most highly expressed and second most highly expressed genes), specifically (≥10- and <100-fold change), very specifically (≥100- and <1000-fold change) and exclusively identified in one tissue (≥1000-fold change). These criteria identified 5313, 1374, 147 and nine genes that were preferentially, specifically, highly specifically and exclusively expressed in one tissue, respectively (Figure 2a; Table S10). Benedito et al. (2008) reported that M. truncatula seeds and nodules possessed the largest number of tissue-specific genes. Hogslund et al. (2009) found that L. japonicus flowers exhibited the highest degree of tissue-specific gene expression. In soybean, the largest numbers of tissue-specific genes were identified in nodules and flowers (1465 and 1145 genes, respectively; ≥3-fold change; Figure 2b). Using more stringent parameters, soybean nodule, flower and pod were the organs that were strongly enriched in highly tissue-specific genes (61, 54 and 29 genes, respectively; ≥100-fold change; Figure 2b). Given the lack of correlation in overall gene expression between the nodule transcriptome and the other tissues sampled (Figure 1), it was not surprising to identify this tissue among those showing the highest level of organ-specific gene expression. In contrast, it would appear that the correlation in the overall level of gene expression between flowers and leaves (Figure 1) hides a significant level of flower-specific gene expression (1145 flower-specific genes; ≥3-fold change). These genes are clearly strong candidates for determining the specific functional components of the flower. The overall soybean transcriptome was also mapped relative to the position of the respective genes in the assembled soybean genome. As an aid to visualization of these data, we established a color-code map for each chromosome, and for each tissue, to reflect the overall gene expression level (Figure 3). These data, as well as the data from the earlier Libault et al. (2010) study, can best be viewed as part of the soybean genome browser available at http://digbio.missouri.edu/soybean_atlas. Visualizing the data in this way rapidly demonstrates that most of the protein-coding genes and also the most strongly expressed genes are located on the chromosome arms, whereas expression from the less gene-dense pericentromeric regions is much reduced.

Root hair and meristem-specific soybean genes

Root hairs are single cell extensions of the root epidermis, and play a key role in water and nutrient uptake. However, in legumes, they play a secondary role as the primary site for rhizobial infection, leading to the development of nitrogen-fixing nodules. Root hairs also exhibit polar cell expansion. In a previous study, we identified around 2000 soybean genes regulated in root hair cells in response to B. japonicum infection (Libault et al., 2010). In order to extend our understanding of the soybean root hair cell, we also sought to identify genes that were specifically expressed in root hairs. Using the same criteria outlined above, we identified 451 soybean sequences that were preferentially expressed in root hairs, including 69 and three root hair-specific and highly specific genes, respectively (Table S11). Using PFAM, KOG and PANTHER domain predictions, we predicted the functions of 304 of the 451 annotated genes. Some gene families are clearly over-represented in this list of root hair-specific genes. For example, cellulase (three genes, 1%), pectinesterase (four genes, 1.3%), peroxidase (eight genes, 2.6%) and extensin genes (four genes, 1.3%) were gene families preferentially

Figure 2. Gene expression specificity across nine Glycine max (soybean) tissues.
(a) All soybean transcripts (dashed grey line), unannotated transcripts (black line) and transcription factor transcripts (grey line) were classified into four groups according to their tissue specificity: preferentially (≥3- and <10-fold changes between the expression levels of the most highly expressed and second most highly expressed genes), specifically (≥10- and <100-fold changes), very specifically (≥100- and <1000-fold changes) and exclusively identified in one tissue (≥1000-fold change).
(b) Distribution of the number of overall soybean transcripts in the nine different soybean tissues tested according to their level of specificity (3-, 10-, 100- and 1000-fold change cut-off).
Axes: gene number plotted against tissue specificity (fold-change cut-off).
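The four specificity classes defined in the Figure 2 legend can be expressed as a small classifier that compares the highest expression value with the second highest. This is a hedged sketch: the function name, the input layout and the class labels are illustrative, and treating a second-highest value of zero as an unbounded fold change (hence "exclusive") is an assumption, since the text does not say how zero values were handled.

```python
def classify_specificity(expression):
    """Classify one gene by tissue specificity.

    expression: mapping of tissue name -> normalized expression value
                (hypothetical layout; at least two tissues required).
    Returns (most highly expressing tissue, class label).
    """
    if len(expression) < 2:
        raise ValueError("need expression values for at least two tissues")
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    (top_tissue, top), (_, second) = ranked[0], ranked[1]
    if top <= 0:
        return None, "not expressed"
    # Assumption: no expression in any other tissue counts as exclusive.
    fold = float("inf") if second <= 0 else top / second
    if fold >= 1000:
        label = "exclusive"
    elif fold >= 100:
        label = "very specific"
    elif fold >= 10:
        label = "specific"
    elif fold >= 3:
        label = "preferential"
    else:
        label = "not tissue-specific"
    return top_tissue, label
```

A gene with 50 reads per million in nodule and 10 in the next tissue would be labeled preferential (5-fold), whereas 500 versus 4 would be very specific (125-fold).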

expressed in root hairs (χ² < 1 × e−50). These families represented only 0.06% (28 genes), 0.3% (144 genes), 0.4% (205 genes) and 0.03% (16 genes) of the 47 724 soybean annotated genes for which predicted functions were established. It is likely that the expression of these gene families reflects the polar growth of the root hair cells, where continuous cell wall expansion is required, and where reactive oxygen species are essential (Baumberger et al., 2001, 2003; Bucher et al., 2002; Carol and Dolan, 2006).

Shoot apical and root meristems are the locations of the intense cell division required for plant growth. We combined the transcriptomes of these two meristematic tissues to identify 28 soybean genes that were preferentially expressed in the soybean meristematic zones (Table S11). Among these, 18 genes encode proteins with conserved domains, including three encoding a predicted kinesin, a regulator of cytokinesis (Müller et al., 2006). In addition, eight transcriptional and translational regulators (e.g. bHLH, SBP, Zf-HD TFs; RNA polymerase subunit, PIWI and ribosomal protein) were also preferentially expressed in soybean meristematic zones, suggesting strong transcriptional and translational activities, which are probably also involved in maintaining the high cell division rate and in controlling cell determination, differentiation and elongation.

Figure 3. Color code maps of gene expression across the 20 Glycine max (soybean) chromosomes.
For each chromosome, gene expression (i.e. number of sequence reads per million reads aligned: <0.5, yellow; 0.5–2, orange; 2–5, light green; 5–10, green; 10–25, greenish brown; 25–50, brown; 50–100, brownish red; >100, red) is indicated for nine different tissues (from top to bottom: root hairs 84 h after sowing, root hairs 120 h after sowing, nodule, root, root tip, shoot apical meristem, leaf, flower and pod). The final color strip at the bottom of each chromosome represents gene density (i.e. number of genes per 100 kbp; 0–15 or higher, shown on a black-white scale). These maps were generated by using the comparative map and trait viewer (CMTV) software.

Expression pattern of soybean nodulation-related genes

A unique feature of legumes, including soybean, is their formation of a novel root organ, the nodule, in response to rhizobial infection. Previously, Schmutz et al. (2010) annotated approximately 100 soybean genes as those predicted to play a role in nodulation, based on an extensive review of the nodulation literature. Among these 100 putative nodulation-related soybean genes, 14 were regulated during root

hair cell infection by B. japonicum (Libault et al., 2010). An examination of the soybean gene expression atlas showed that only one, Glyma13g12440 (a putative GmN56 gene; Schmutz et al., 2010), of the 100 soybean nodulation-related genes (Table S12) was not expressed in any of the nine tissues sampled. In a previous study of soybean nodulation, Kouchi and Hata (1995) clearly identified a transcript for GmN56. Consequently, we looked at the expression of Glyma13g12490 and Glyma13g12500, two homeologous genes to Glyma13g12440 (Schmutz et al., 2010). Both genes were expressed and to a significantly higher level in nodules (Figure S2; Table S12). Therefore, it is likely that the GmN56 EST identified by Kouchi and Hata (1995) arose from either Glyma13g12490 or Glyma13g12500, and not from Glyma13g12440. Of the remaining 99 putative nodulation-related genes, 70 genes were not expressed preferentially in nodules (≤3-fold change between nodule and the eight remaining tissues), including those encoding the putative Nod factor receptors (NFR1a-b and NFR5a-b), and TFs known to regulate root hair cell infection (e.g. NSP1 and NSP2) (Table S12). The induction of the expression of these genes during root hair infection by B. japonicum (Libault et al., 2010), but not in mature nodules, is in agreement with their early role during legume infection (Catoira et al., 2000; Amor et al., 2003; Madsen et al., 2003; Oldroyd and Long, 2003; Radutoiu et al., 2003; Kalo et al., 2005; Smit et al., 2005; Heckmann et al., 2006; Murakami et al., 2006). The remaining 29 genes were preferentially expressed in nodules (≥3-fold change; Figure S2; Table S12). Among these, 16 and seven genes were specifically (≥10- and <100-fold changes) and very specifically (≥100- and <1000-fold changes) expressed in the nodules (Figure S2; Table S12). Homeologous pairs of NIN (Glyma04g00210 and Glyma06g00240), NIN2 (Glyma12g05390 and Glyma11g13390) and CYCLOPS genes (Glyma01g35260 and Glyma09g34690) were expressed specifically in soybean nodules (Figure S2; Table S12). The role of NIN in L. japonicus nodule development was previously noted by Schauser et al. (1999), whereas CYCLOPS function during L. japonicus nodule development was not clearly established (Yano et al., 2008). In addition, consistent with their initial characterization, 23 encoded nodulins were also expressed specifically in nodules.

Recently, Haney and Long (2009) identified seven flotillin-like genes in M. truncatula, which are gene homologs of the soybean nodulin GmNod53b (Winzer et al., 1999). Two of the M. truncatula flotillin genes were induced at 24 HAI with Sinorhizobium meliloti. Utilizing the GmNod53b sequence, we identified only two homeologous flotillin genes in soybean (Glyma06g06930 and Glyma04g06830; e-value < e−20). However, their expression patterns across the nine tissues were very different. For example, Glyma04g06830 expression was not detected in any tissues, with the exception of nodule tissue, where its transcript was barely detected. Glyma06g06930 was strongly and primarily expressed in nodules, but also in root hair cells uninoculated by B. japonicum. In addition, Glyma06g06930 expression was induced in soybean root hairs at 12 (3.7-fold change), but not at 24 and 48 HAI, with B. japonicum (Table S2). These data suggest that the flotillin encoded by Glyma06g06930 is likely to be orthologous to the genes shown by Haney and Long (2009) to be crucial to root hair infection by S. meliloti.

Expression patterns of soybean transcription factor genes

The TF genes are of clear interest because they control plant responses to the environment, as well as developmental pathways (for a review, see Libault et al., 2009a). For example, our earlier study (Libault et al., 2010) identified a number of soybean TF genes in which expression responded to B. japonicum inoculation. Soybean genes homologous to MtHAP2.1, MtERN and LjNIN genes, genes controlling M. truncatula and L. japonicus nodule development (Schauser et al., 1999; Combier et al., 2006; Middleton et al., 2007), were clearly identified based on syntenic relationships and their nodule-specific expression (Libault et al., 2009a,b).

The soybean gene expression atlas was mined to identify TF genes exhibiting tissue-specific expression. This analysis identified 624 TF genes that were expressed preferentially in one soybean tissue compared with the eight others, including 114, five and one TF genes, specifically, very specifically and exclusively expressed in one tissue, respectively (Figure 2a; Table S13).

Examination of this list of 120 TF genes specifically expressed in at least one tissue (≥10-fold change) identified a significant number of C2H2 (Zn) and NIN-like TF genes expressed preferentially in nodules (Figure 4). As described above, the role of NIN-like genes in legume nodulation is well established. However, to date, there is no functional demonstration of a role for C2H2 (Zn) TF genes during legume nodulation. Our data suggest that this should be examined more closely. Members of the Homeodomain TF family were restricted to the SAM, whereas members of the LIM, MADS and NAC TF families were preferentially expressed in flowers, suggesting a specific role for these TF gene families in the normal development of these tissues (Figure 4). In A. thaliana, a large number of MADS TF genes, such as SEP1, SEP2, SEP3, SEP4, APETALA1 (AP1), APETALA3 (AP3), PISTILLATA (PI) and AGAMOUS (AG), are key regulators of flower development (for a review, see Robles and Pelaz, 2005). Arabidopsis thaliana Homeodomain TF genes, such as WUSCHEL (WUS) and SHOOTMERISTEMLESS (STM), are important in the formation and maintenance of the SAM (Barton and Poethig, 1993; Endrizzi et al., 1996; Laux et al., 1996; Mayer et al., 1998). Consequently, we hypothesized that some of the soybean Homeodomain and MADS TF genes expressed specifically in the SAM and flower may be orthologs to WUS and STM, and to SEP1, SEP2, SEP3, SEP4, AP1, AP3, PI and AG, respectively. In

order to establish this orthology, we looked for syntenic relationships between these gene families in the A. thaliana and G. max genomes. With the exception of SEP3 and PI genes, we identified soybean orthologs of the flower and SAM-related Arabidopsis genes (Figure S3). In most cases, the recent duplication of the soybean genome logically led to the identification of two putative orthologs. More surprisingly, the Glyma18g50900 gene was identified as the potential ortholog of SEP1 and SEP2, whereas the region encoding Glyma02g13420 was orthologous to both SEP4 and AP1. Such a surprising result suggested the gene pairs SEP1/SEP2 and SEP4/AP1 probably diverged from common gene ancestors before the divergence between soybean and Arabidopsis. To provide further evidence of the orthology between the soybean MADS and Homeodomain genes and WUS, STM, SEP1, SEP2, SEP4, AP1, AP3 and AG, we mined the Arabidopsis gene expression data (Hruz et al., 2008) to compare the expression profiles of the genes in both organisms. Similarly to A. thaliana, a significant number of soybean genes putatively involved in flower development were strongly but not exclusively expressed in flowers (Figure 5). Among them, four MADS genes (Glyma01g08150, Glyma02g13420, Glyma04g02980 and Glyma06g02990), orthologs to AtAP1, AtSEP4 and AtAP3, were identified as specifically expressed in flowers (Figure S3; Table S13). The function of the remaining four soybean MADS genes and seven Homeodomain genes expressed specifically in flower and SAM needs to be investigated. Altogether, this analysis clearly demonstrates the usefulness of combining genome and transcriptome comparisons to identify genes playing critical developmental roles in soybean.

[Figure 4 image: pie chart of transcription factor families (Zf-HD, WRKY, ABI3/VP1, SBP, SRS, AP2-EREBP, TPR, AS2, AUX-IAA-ARF, NIN-like, NAC, bHLH, C2C2 (Zn) CO-like, C2C2 (Zn) Dof, BZIP, C2C2 (Zn) YABBY, MYB/HD-like, C2H2 (Zn), CAMTA, CCAAT, MYB, MADS, Homeodomain, DHHC (Zn), LIM, HOMEOBOX, GRAS), with sub-pies labeled by tissue (nodule, flower, SAM, pod) and gene count.]

Figure 4. Distribution of Glycine max (soybean) transcription factor genes expressed specifically in one soybean tissue, based on their family membership. The sub-pies highlight the distribution of specific transcription factor gene families in the different tissues, based on the specificity of their expression.

Taking advantage of this analysis, and to validate the accurate measurement of soybean gene expression by Illumina Solexa technology, we compared the Illumina Solexa data set with transcriptomic analyses performed on 11 soybean tissues using the previously published quantitative RT-PCR primer set library, designed against more than
1000 soybean regulatory genes, including 652 TF genes (Libault et al., 2009b). In virtually all cases, the qRT-PCR results validated the measurements made by Illumina Solexa sequencing. Full details are provided in Appendix S1.

[Figure 5 image: expression heat maps. Panel (a), Arabidopsis thaliana; rows: callus, cell culture/primary cell, sperm cell, seedling, cotyledons, hypocotyl, radicle, imbibed seed, inflorescence, flower, silique, seed, stem, node, shoot apex, cauline leaf, rosette, juvenile leaf, adult leaf, petiole, senescent leaf, hypocotyl, leaf primordia, stem, root, lateral root, root hair zone, root tip, elongation zone, endodermis, endodermis + cortex, epid. atrichoblasts, lateral root cap, stele. Panel (b), soybean; rows: root tip, root hair 84HAS, root hair 120HAS, root, nodule, SAM, leaf, flower, pod; color scale from 0 to 9 in steps of 0.9.]

Figure 5. Gene expression patterns of Arabidopsis genes involved in the formation and maintenance of the shoot apical meristem (SAM) and the determination of flower organs (a), and their putative orthologs in Glycine max (soybean) (b). Genevestigator (Hruz et al., 2008) and the soybean gene atlas were mined to establish the expression pattern of the Arabidopsis and soybean genes, respectively.

Comparison of the M. truncatula, L. japonicus and G. max transcriptome

Glycine max, M. truncatula and L. japonicus probably diverged around 40 Mya, reflecting the extensive microsynteny that exists between their genomes (Choi et al., 2004; Cannon et al., 2006; Young and Udvardi, 2009). This relationship provides opportunities to transfer genetic knowledge between these three species. However, such comparisons also need to allow for divergence in the expression patterns of orthologous genes during legume evolution, especially given the more recent whole genome duplication in soybean (Schlueter et al., 2004, 2007), and the silencing of homeologous genes (Libault et al., 2009a,b; present study). Consequently, the use of orthology to deduce common function among the three legume species will not only require the establishment of a syntenic relationship, but also the demonstration of similar gene expression patterns. This is further evidence for the utility of gene expression atlases for these three species.

The majority of gene expression data available for M. truncatula and L. japonicus come from a variety of Affymetrix microarray experiments. Therefore, as a first step to compare gene expression from these two species with that of soybean, we sought to identify the orthologous genes present on the M. truncatula and L. japonicus Affymetrix arrays, and their counterparts in soybean. To simplify this analysis, we focused on the 147 annotated soybean genes expressed very specifically in only one
tissue (≥100-fold change; Table S10). Subsequently, we mined the M. truncatula and L. japonicus expression data for the corresponding orthologs by referencing the respective gene expression atlases (Benedito et al., 2008; Hogslund et al., 2009). This approach allowed the direct comparison of 40 soybean genes in five tissues (nodule, root, leaf, flower and pods) with the corresponding M. truncatula orthologs in the same five tissues, and the L. japonicus orthologs in four tissues (nodule, root, leaf and flower). This comparison showed that 18 soybean genes share similar tissue specificity with their putative orthologs in M. truncatula and L. japonicus (Table S14). This number may simply reflect the difficulty of establishing true orthology, or may reflect subfunctionalization or neofunctionalization of the remaining 22 soybean putative orthologs. To better establish orthology, we analyzed microsynteny between the G. max, M. truncatula and L. japonicus loci encoding the various putative orthologs. Significant microsynteny was found between three G. max and M. truncatula and eight G. max and L. japonicus gene regions (Figure S4; Table S14). For example, microsynteny was found between the Glyma01g44660 soybean gene region and the corresponding regions in both M. truncatula (Medtr5g006680) and L. japonicus (CM0591.50.nd). These three genes were expressed specifically in flowers.

Interestingly, during our analysis we also highlighted synteny between legume genes not identified during the initial screen (Figure S4). For instance, in addition to apparent orthology to Glyma07g16290, LjCM0147.870.nc was also orthologous to Glyma18g40360, a soybean gene preferentially expressed in the nodules, based on the soybean gene atlas (Figure S4; Table S2). These three genes are predicted to encode C2H2 (Zn) TFs, consistent with the previously mentioned abundant expression of this family of TF genes in nodule tissue. Microsynteny was found between genes in G. max and M. truncatula, which have very different expression patterns. For example, Glyma09g41200, Glyma18g44670 and Glyma18g44680 are soybean genes expressed specifically in flowers, and lie on a region of the soybean genome microsyntenic to Medtr7g080300, which also appears microsyntenic to the soybean loci encoding Glyma01g32750 and Glyma01g32760, two soybean genes expressed in a variety of organs (Figure S4; Table S2). This example suggests the subfunctionalization of Glyma01g32750 and Glyma01g32760 after the divergence of G. max and M. truncatula. As Glyma18g44670–Glyma18g44680 and Glyma01g32750–Glyma01g32760 probably arose by tandem duplication, we assume that the subfunctionalization of Glyma01g32750 and Glyma01g32760 occurred after the duplication of the soybean genome, but before their tandem duplication. The above example further illustrates the value of genome and transcriptome comparisons that allow interesting conclusions concerning the orthology of specific genes, and their evolutionary history. Space prevents us from presenting a variety of additional examples. At this point, the annotation of the G. max, M. truncatula and L. japonicus genomes clearly needs improvement. We predict that the full integration of the syntenic and transcriptome analysis of these three genomes will ultimately lead to the systematic identification of legume orthologs. At that point, it will be possible to rapidly transfer genetic and functional knowledge derived in one species to the others.

EXPERIMENTAL PROCEDURES

Bacterial cultures

Bradyrhizobium japonicum USDA110 was grown at 30°C for 3 days in HM medium (Cole and Elkan, 1973), supplemented with yeast extract (0.025%), D-arabinose (0.1%) and chloramphenicol (0.004%). Before plant inoculation, B. japonicum cells were pelleted (2000 g for 10 min), washed and diluted with sterile water to OD600 = 0.1.

Plant culture

All tissues described below were isolated from soybean G. max (L.) Merr. cultivar 'Williams 82' plants. For each tissue, three independent biological replicates were performed on a different set of plants to ensure the reproducibility of the plant tissues analyzed (i.e. seeds were sowed three times on different days, and tissues were harvested as described below).

Soybean seeds were surface sterilized according to the method described by Wan et al. (2005), and were sowed on nitrogen-free B&D agar medium (Broughton and Dilworth, 1971). Untreated root hair cells and stripped roots used for qRT-PCR were isolated from 3-day-old seedlings, as described by Wan et al. (2005). A similar protocol was used to isolate 84- and 120-HAS root hairs (Libault et al., 2010; 84- and 120-HAS root hairs were mock-inoculated root hairs isolated 12 and 48 h after being sprayed with water).

Other tissues were isolated as described below. The 3-day-old seedlings were germinated between moist Whatman filter paper. Root tips were harvested on these seedlings. To produce other tissues, germinated seedlings were transferred to the glasshouse under long-day conditions (16-h day/8-h night) at 27°C on Promix Bx soil (Premier Horticulture, http://www.premierhort.com). Fourteen-day-old SAM (V2 stage), 18-day-old trifoliate leaves, stem and roots (V2 stage), flowers (R2 stage), and seeds and pods (R6 stage) were harvested. Nodules were harvested 32 days after the inoculation of 1 ml of B. japonicum suspension (OD600 = 0.1) on transferred 3-day-old seedlings.

RNA extraction, DNase treatments, and reverse transcription

Total RNA was isolated using Trizol Reagent (Invitrogen, http://www.invitrogen.com) according to the manufacturer's instructions, followed by a chloroform extraction to improve its purity. Total RNA samples were treated and reverse-transcribed differently depending on the technology used to quantify cDNA levels.

qRT-PCR. The qRT-PCR reactions including the different controls were performed as described by Libault et al. (2009b).

Solexa sequencing. For each condition, similar quantities of total RNA isolated from three independent biological replicates were pooled together. After first- and second-strand cDNA synthesis, the
cDNAs were end repaired prior to ligation of Solexa adaptors. The products were sequenced on a Solexa platform.

Quantitative PCR reaction conditions and data analysis

The qRT-PCR reactions were performed as described by Libault et al. (2009b). The specificity of primer sets was confirmed by analyzing the dissociation curve profile of each qRT-PCR amplicon, and the efficiency of primers (Peff) was quantified using LinRegPCR (Ramakers et al., 2003). Cons6, encoding an F-box protein (Libault et al., 2008), was used to normalize the expression levels of putative soybean regulatory genes. The cycle threshold (Ct) value of the reference gene was subtracted from the Ct values of the test gene analyzed (ΔCt). The expression level (E) of each gene was calculated according to the equation: $E = P_{\mathrm{eff}}^{-\Delta C_t}$. The average of the expression levels between three different replicates was calculated.

Solexa read alignment, statistical analysis and data representation

Illumina Genome Analyzer II image data were base-called and quality filtered using the default filtering parameters of the Illumina GA Pipeline GERALD stage (Illumina, Inc., http://www.illumina.com). Alignments of passing 36-mer reads to all contigs of the Glyma1 8x Soybean Genome assembly (Soybean Genome Project, http://www.jgi.doe.gov) were performed using GSNAP (Wu and Nacu, 2010), an alignment program derived from GMAP (Wu and Watanabe, 2005), with optimizations for aligning short transcript reads from next-generation sequencers to genomic reference sequences. Alignments were processed using the Alpheus pipeline (Miller et al., 2008), keeping only alignments that had at least 34 out of 36 identities, and had no more than five equivalent best hits. Read counts used in expression analyses were based on the subset of uniquely aligned reads that also overlapped the genomic spans of the Glyma1 gene predictions. Read counts for a given sample were normalized by using values for a gene's uniquely aligned read counts per million reads uniquely aligning within that sample.

The raw and normalized Solexa data are available on http://digbio.missouri.edu/soybean_atlas, whereas the entire set of Solexa sequences used in our studies can be downloaded from the NCBI SRA browser (accession number SRA012188.1; http://www.ncbi.nlm.nih.gov/Traces/sra).

The color code maps of the soybean transcriptome across the 20 chromosomes were generated by using the comparative map and trait viewer (CMTV) software (Sawkins et al., 2004).

Synteny analysis

To establish microsynteny between G. max and A. thaliana, amino acid sequences of the A. thaliana candidate genes and at least the 20 genes surrounding them were blasted against soybean genome sequences. Using P < e-20 as a cut-off, BLAST results and gene annotation were analyzed manually to establish microsynteny.

To compare the gene expression of orthologous genes between G. max, M. truncatula and L. japonicus, we first mapped the Medicago and Lotus Affymetrix probe sets against their respective genomes based on NCBI BLASTN searches. Only probe sets with at least nine matching probes, sited at least 22-bp up- or downstream of a 4000-bp region, were considered for further analysis. The BLAST of the predicted soybean transcripts against the Medicago Mt v3.0 (http://www.medicago.org/genome) and Lotus pseudogenomes (http://www.kazusa.or.jp/lotus) associated with the mapping of the Medicago and Lotus Affymetrix probe sets led to a direct comparison of the expression of the soybean, Medicago and Lotus genes. When genes shared a similar tissue specificity, we highlighted their orthology by establishing a microsynteny relationship between them using the same methodology described above. Graphics showing microsynteny relationships were generated by using CMTV (Sawkins et al., 2004).

ACKNOWLEDGEMENTS

We thank Melanie Mormile, Sandra Thibivilliers and Charlie P. Jones for their critical reading of the manuscript. We also thank Chia Rou Yeo for technical assistance and Shaoxing Wang for providing some total RNA samples. We are also grateful to the Medicago Genome Sequence Consortium (MGSC) for providing M. truncatula genomic sequences. This work was funded by a grant from the National Science Foundation (Plant Genome Program, #DBI-0421620). TJ, LDF and DX were supported by United Soybean Board grant #8236.

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article:

Figure S1. Expression levels of putative soybean (Glycine max) constitutive genes in 14 different conditions (y-axis) compared with the average of their expression levels across the 14 conditions (x-axis).
Figure S2. A total of 29 soybean (Glycine max) nodulation-related genes were expressed preferentially in mature nodules.
Figure S3. Syntenic relationship between Glycine max and Arabidopsis thaliana genes involved in flower organ determination and maintenance of the shoot apical meristem.
Figure S4. Syntenic relationship between Glycine max (soybean), Medicago truncatula and Lotus japonicus genes surrounding soybean-nodule- and flower-specific genes.
Figure S5. Comparison of the transcriptomes of 1016 soybean regulatory genes by qRT-PCR tissues.
Table S1. Gene expression pattern of predicted and unannotated Glycine max (soybean) genes in nine different tissues.
Table S2. Gene expression pattern of predicted and unannotated Glycine max (soybean) genes in nine different tissues, and in root hair and stripped roots in response to Bradyrhizobium japonicum.
Table S3. Confidence in gene prediction according to Schmutz et al. (2010) of 16 198 Glycine max (soybean) genes not expressed in soybean tissues, and in the early steps of nodulation.
Table S4. Gene expression of Glycine max (soybean) homeologous genes.
Table S5. Gene expression of Glycine max (soybean) homeologous genes relative to putative pseudogenes.
Table S6. Unannotated sequence reads that overlap Glycine max (soybean) annotated genes leading to an improvement of the soybean gene annotation.
Table S7. Identification of the signature domains of the 1736 proteins encoded by the putative new Glycine max (soybean) genes.
Table S8. Expression levels of Glycine max (soybean) sequences identified to be ubiquitously expressed across the nine soybean tissues tested.
Table S9. Gene expression and function of putative Glycine max (soybean) constitutive genes across 14 different conditions.
Table S10. Identification of Glycine max (soybean) transcripts preferentially (≥3- and <10-fold changes between the expression levels of the most highly expressed and second most highly expressed genes; yellow), specifically (≥10- and <100-fold changes; orange), very specifically (≥100- and <1000-fold changes; red) and exclusively (≥1000-fold change; purple) identified in one of the nine tissues tested.
Table S11. Identification of Glycine max (soybean) transcripts preferentially (≥3- and <10-fold change; yellow), specifically (≥10- and <100-fold change; orange) and very specifically (≥100- and <1000-fold change; red) identified in soybean root hair cells and meristems.
Table S12. Relative gene expression levels of putative Glycine max (soybean) nodulation-related genes in nine different tissues, including mature nodules.
Table S13. Identification of Glycine max (soybean) transcription factor genes preferentially (≥3- and <10-fold changes; yellow), specifically (≥10- and <100-fold changes; orange), very specifically (≥100- and <1000-fold changes; red) and exclusively (>1000-fold change; purple) expressed in one out of the nine tissues tested.
Table S14. Gene expression pattern between Glycine max (soybean), Medicago truncatula and Lotus japonicus orthologous genes.
Table S15. Gene expression of 1016 Glycine max (soybean) regulatory genes in 11 different soybean tissues.
Table S16. Identification of tissue-specific Glycine max (soybean) regulatory genes, based on qRT-PCR experiments.
Appendix S1. Large-scale qRT-PCR of Glycine max (soybean) transcription factor genes.

Please note: As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer-reviewed and may be re-organized for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

REFERENCES

Aceituno, F.F., Moseyko, N., Rhee, S.Y. and Gutierrez, R.A. (2008) The rules of gene expression in plants: organ identity and gene body methylation are key factors for regulation of gene expression in Arabidopsis thaliana. BMC Genomics, 9, 438.
Adams, K.L. (2007) Evolution of duplicate gene expression in polyploid and hybrid plants. J. Hered. 98, 136–141.
Amor, B.B., Shaw, S.L., Oldroyd, G.E., Maillet, F., Penmetsa, R.V., Cook, D., Long, S.R., Denarie, J. and Gough, C. (2003) The NFP locus of Medicago truncatula controls an early step of Nod factor signal transduction upstream of a rapid calcium flux and root hair deformation. Plant J. 34, 495–506.
Barton, M.K. and Poethig, R.S. (1993) Formation of the shoot apical meristem in Arabidopsis thaliana: an analysis of development in the wild type and in the shoot meristemless mutant. Development (Cambridge, England), 119, 823–831.
Baumberger, N., Ringli, C. and Keller, B. (2001) The chimeric leucine-rich repeat/extensin cell wall protein LRX1 is required for root hair morphogenesis in Arabidopsis thaliana. Genes Dev. 15, 1128–1139.
Baumberger, N., Doesseger, B., Guyot, R. et al. (2003) Whole-genome comparison of leucine-rich repeat extensins in Arabidopsis and rice. A conserved family of cell wall proteins form a vegetative and a reproductive clade. Plant Physiol. 131, 1313–1326.
Benedito, V.A., Torres-Jerez, I., Murray, J.D. et al. (2008) A gene expression atlas of the model legume Medicago truncatula. Plant J. 55, 504–513.
Bennett, S.T., Barnes, C., Cox, A., Davies, L. and Brown, C. (2005) Toward the 1,000 dollars human genome. Pharmacogenomics, 6, 373–382.
Broughton, W.J. and Dilworth, M.J. (1971) Control of leghaemoglobin synthesis in snake beans. Biochem. J. 125, 1075–1080.
Bucher, M., Brunner, S., Zimmermann, P., Zardi, G.I., Amrhein, N., Willmitzer, L. and Riesmeier, J.W. (2002) The expression of an extensin-like protein correlates with cellular tip growth in tomato. Plant Physiol. 128, 911–923.
Cannon, S.B., Sterck, L., Rombauts, S. et al. (2006) Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc. Natl Acad. Sci. USA, 103, 14959–14964.
Carol, R.J. and Dolan, L. (2006) The role of reactive oxygen species in cell growth: lessons from root hairs. J. Exp. Bot. 57, 1829–1834.
Catoira, R., Galera, C., de Billy, F., Penmetsa, R.V., Journet, E.P., Maillet, F., Rosenberg, C., Cook, D., Gough, C. and Denarie, J. (2000) Four genes of Medicago truncatula controlling components of a nod factor transduction pathway. Plant Cell, 12, 1647–1666.
Cheung, F., Haas, B.J., Goldberg, S.M., May, G.D., Xiao, Y. and Town, C.D. (2006) Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genomics, 7, 272.
Choi, H.K., Mun, J.H., Kim, D.J. et al. (2004) Estimating genome conservation between crop and model legume species. Proc. Natl Acad. Sci. USA, 101, 15289–15294.
Coen, E. (2001) Goethe and the ABC model of flower development. C. R. Acad. Sci. III, 324, 523–530.
Cole, M.A. and Elkan, G.H. (1973) Transmissible resistance to penicillin G, neomycin, and chloramphenicol in Rhizobium japonicum. Antimicrob. Agents Chemother. 4, 248–253.
Combier, J.P., Frugier, F., de Billy, F. et al. (2006) MtHAP2-1 is a key transcriptional regulator of symbiotic nodule development regulated by microRNA169 in Medicago truncatula. Genes Dev. 20, 3084–3088.
Ditta, G., Pinyopich, A., Robles, P., Pelaz, S. and Yanofsky, M.F. (2004) The SEP4 gene of Arabidopsis thaliana functions in floral organ and meristem identity. Curr. Biol. 14, 1935–1940.
Endrizzi, K., Moussian, B., Haecker, A., Levin, J.Z. and Laux, T. (1996) The SHOOT MERISTEMLESS gene is required for maintenance of undifferentiated cells in Arabidopsis shoot and floral meristems and acts at a different regulatory level than the meristem genes WUSCHEL and ZWILLE. Plant J. 10, 967–979.
Gill, N., Findley, S., Walling, J.G., Hans, C., Ma, J., Doyle, J., Stacey, G. and Jackson, S.A. (2009) Molecular and chromosomal evidence for allopolyploidy in soybean. Plant Physiol. 151, 1167–1174.
Haney, C.H. and Long, S.R. (2009) Plant flotillins are required for infection by nitrogen-fixing bacteria. Proc. Natl Acad. Sci. USA, 107, 478–483.
Heckmann, A.B., Lombardo, F., Miwa, H., Perry, J.A., Bunnewell, S., Parniske, M., Wang, T.L. and Downie, J.A. (2006) Lotus japonicus nodulation requires two GRAS domain regulators, one of which is functionally conserved in a non-legume. Plant Physiol. 142, 1739–1750.
Hogslund, N., Radutoiu, S., Krusell, L. et al. (2009) Dissection of symbiosis and organ development by integrated transcriptome analysis of Lotus japonicus mutant and wild-type plants. PLoS ONE, 4, e6556.
Honma, T. and Goto, K. (2001) Complexes of MADS-box proteins are sufficient to convert leaves into floral organs. Nature, 409, 525–529.
Hruz, T., Laule, O., Szabo, G., Wessendorp, F., Bleuler, S., Oertle, L., Widmayer, P., Gruissem, W. and Zimmermann, P. (2008) Genevestigator V3: a reference expression database for the meta-analysis of transcriptomes. Adv. Bioinformatics, 420747.
Jiao, Y., Tausta, S.L., Gandotra, N. et al. (2009) A transcriptome atlas of rice cell types uncovers cellular, functional and developmental hierarchies. Nat. Genet. 41, 258–263.
Kalo, P., Gleason, C., Edwards, A. et al. (2005) Nodulation signaling in legumes requires NSP2, a member of the GRAS family of transcriptional regulators. Science, 308, 1786–1789.
Kouchi, H. and Hata, S. (1995) GmN56, a novel nodule-specific cDNA from soybean root nodules encodes a protein homologous to isopropylmalate synthase and homocitrate synthase. Mol. Plant Microbe Interact. 8, 172–176.
Laux, T., Mayer, K.F., Berger, J. and Jurgens, G. (1996) The WUSCHEL gene is required for shoot and floral meristem integrity in Arabidopsis. Development, 122, 87–96.
Libault, M., Thibivilliers, S., Bilgin, D.D., Radwan, O., Benitez, M., Clough, S.J. and Stacey, G. (2008) Identification of four soybean reference genes for gene expression normalization. The Plant Genome, 1, 44–54.
Libault, M., Joshi, T., Benedito, V.A., Xu, D., Udvardi, M.K. and Stacey, G. (2009a) Legume transcription factor genes: what makes legumes so special? Plant Physiol. 151, 991–1001.
Libault, M., Joshi, T., Takahashi, K. et al. (2009b) Large-scale analysis of putative soybean regulatory gene expression identifies a myb gene involved in soybean nodule development. Plant Physiol. 151, 1207–1220.
Libault, M., Farmer, A., Brechenmacher, L. et al. (2010) Complete transcriptome of soybean root hair cell, a single cell model, and its alteration in response to Bradyrhizobium japonicum infection. Plant Physiol. 152, 541–552.
Madsen, E.B., Madsen, L.H., Radutoiu, S. et al. (2003) A receptor kinase gene of the LysM type is involved in legume perception of rhizobial signals. Nature, 425, 637–640.
Margulies, M., Egholm, M., Altman, W.E. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437, 376–380.
Mayer, K.F., Schoof, H., Haecker, A., Lenhard, M., Jurgens, G. and Laux, T. (1998) Role of WUSCHEL in regulating stem cell fate in the Arabidopsis shoot meristem. Cell, 95, 805–815.
Middleton, P.H., Jakab, J., Penmetsa, R.V. et al. (2007) An ERF transcription factor in Medicago truncatula that is essential for Nod factor signal transduction. Plant Cell, 19, 1221–1234.
Miller, N.A., Kingsmore, S.F., Farmer, A.D. et al. (2008) Management of high-throughput DNA sequencing projects: Alpheus. J. Comput. Sci. Syst. Biol. 1, 132–148.
Müller, S., Han, S. and Smith, L.G. (2006) Two kinesins are involved in the spatial control of cytokinesis in Arabidopsis thaliana. Curr. Biol. 16, 888–894.
Murakami, Y., Miwa, H., Imaizumi-Anraku, H., Kouchi, H., Downie, J.A., Kawaguchi, M. and Kawasaki, S. (2006) Positional cloning identifies Lotus japonicus NSP2, a putative transcription factor of the GRAS family, required for NIN and ENOD40 gene expression in nodule initiation. DNA Res. 13, 255–265.
Nobuta, K., Venu, R.C., Lu, C. et al. (2007) An expression atlas of rice mRNAs and small RNAs. Nat. Biotechnol. 25, 473–477.
Oldroyd, G.E. and Downie, J.A. (2008) Coordinating nodule morphogenesis with rhizobial infection in legumes. Annu. Rev. Plant Biol. 59, 519–546.
Oldroyd, G.E. and Long, S.R. (2003) Identification and characterization of nodulation-signaling pathway 2, a gene of Medicago truncatula involved in Nod factor signaling. Plant Physiol. 131, 1027–1032.
Pelaz, S., Tapia-Lopez, R., Alvarez-Buylla, E.R. and Yanofsky, M.F. (2001) Conversion of leaves into petals in Arabidopsis. Curr. Biol. 11, 182–184.
Radutoiu, S., Madsen, L.H., Madsen, E.B. et al. (2003) Plant recognition of symbiotic bacteria requires two LysM receptor-like kinases. Nature, 425, 585–592.
Ramakers, C., Ruijter, J.M., Deprez, R.H. and Moorman, A.F. (2003) Assumption-free analysis of quantitative real-time polymerase chain reaction (PCR) data. Neurosci. Lett. 339, 62–66.
Robles, P. and Pelaz, S. (2005) Flower and fruit development in Arabidopsis thaliana. Int. J. Dev. Biol. 49, 633–643.
Sawkins, M.C., Farmer, A.D., Hoisington, D., Sullivan, J., Tolopko, A., Jiang, Z. and Ribaut, J.M. (2004) Comparative map and trait viewer (CMTV): an integrated bioinformatic tool to construct consensus maps and compare QTL and functional genomics data across genomes and experiments. Plant Mol. Biol. 56, 465–480.
Schauser, L., Roussis, A., Stiller, J. and Stougaard, J. (1999) A plant regulator controlling development of symbiotic root nodules. Nature, 402, 191–195.
Schlueter, J.A., Dixon, P., Granger, C., Grant, D., Clark, L., Doyle, J.J. and Shoemaker, R.C. (2004) Mining EST databases to resolve evolutionary events in major crop species. Genome, 47, 868–876.
Schlueter, J.A., Lin, J.Y., Schlueter, S.D. et al. (2007) Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing. BMC Genomics, 8, 330.
Schmid, M., Davison, T.S., Henz, S.R., Pape, U.J., Demar, M., Vingron, M., Scholkopf, B., Weigel, D. and Lohmann, J.U. (2005) A gene expression map of Arabidopsis thaliana development. Nat. Genet. 37, 501–506.
Schmutz, J., Cannon, S.B., Schlueter, J. et al. (2010) Genome sequence of the palaeopolyploid soybean. Nature, 463, 178–183.
Smit, P., Raedts, J., Portyanko, V., Debelle, F., Gough, C., Bisseling, T. and Geurts, R. (2005) NSP1 of the GRAS protein family is essential for rhizobial Nod factor-induced transcription. Science, 308, 1789–1791.
Wan, J., Torres, M., Ganapathy, A., Thelen, J., DaGue, B.B., Mooney, B., Xu, D. and Stacey, G. (2005) Proteomic analysis of soybean root hairs after infection by Bradyrhizobium japonicum. Mol. Plant Microbe Interact. 18, 458–467.
Weber, A.P., Weber, K.L., Carr, K., Wilkerson, C. and Ohlrogge, J.B. (2007) Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol. 144, 32–42.
Winzer, T., Bairl, A., Linder, M., Linder, D., Werner, D. and Muller, P. (1999) A novel 53-kDa nodulin of the symbiosome membrane of soybean nodules, controlled by Bradyrhizobium japonicum. Mol. Plant Microbe Interact. 12, 218–226.
Wu, T.D. and Nacu, S. (2010) Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics, 26, 873–881.
Wu, T.D. and Watanabe, C.K. (2005) GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics, 21, 1859–1875.
Yano, K., Yoshida, S., Muller, J. et al. (2008) CYCLOPS, a mediator of symbiotic intracellular accommodation. Proc. Natl Acad. Sci. USA, 105, 20540–20545.
Young, N.D. and Udvardi, M. (2009) Translating Medicago truncatula genomics to crop legumes. Curr. Opin. Plant Biol. 12, 193–201.
Zdobnov, E.M. and Apweiler, R. (2001) InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics, 17, 847–848.
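
The expression-level calculation described under "Quantitative PCR reaction conditions and data analysis" can be illustrated with a minimal sketch. It assumes that Peff is the per-cycle amplification efficiency reported by LinRegPCR (a value close to 2 for an efficient primer pair) and that each replicate supplies one Ct value for the test gene and one for the reference gene. The function names and the example Ct values are hypothetical and are not taken from the study.

```python
from statistics import mean


def expression_level(primer_efficiency, ct_test, ct_reference):
    """Return E = Peff ** (-dCt), where dCt = Ct(test) - Ct(reference).

    primer_efficiency is assumed to be the per-cycle amplification
    efficiency (for example 2.0 for perfect doubling).
    """
    delta_ct = ct_test - ct_reference
    return primer_efficiency ** (-delta_ct)


def mean_expression(primer_efficiency, replicates):
    """Average E over replicates given as (ct_test, ct_reference) pairs."""
    return mean(
        expression_level(primer_efficiency, ct_test, ct_reference)
        for ct_test, ct_reference in replicates
    )


# Hypothetical Ct values for three biological replicates.
example_replicates = [(25.0, 22.0), (24.0, 22.0), (26.0, 22.0)]
example_mean = mean_expression(2.0, example_replicates)
```

Averaging is done on E rather than on the Ct values, which follows the order of operations given in the text (expression level first, then the mean across three replicates).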

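
The read filtering and normalization described under "Solexa read alignment, statistical analysis and data representation", together with the fold-change categories defined for Table S10, can be sketched as follows. This is an illustration under stated assumptions, not the Alpheus pipeline: each alignment is represented by a hypothetical tuple (gene identifier or None, number of identities out of 36, number of equivalent best hits), a read is treated as uniquely aligned when it has exactly one best hit, and a gene with zero expression in its second-ranked tissue is treated as exclusive. The last two choices are assumptions made for this sketch.

```python
from collections import Counter

MIN_IDENTITIES = 34   # at least 34 out of 36 identities
MAX_BEST_HITS = 5     # no more than five equivalent best hits


def normalized_counts(alignments):
    """Return uniquely aligned read counts per gene, per million unique reads.

    alignments: iterable of (gene_id_or_None, identities, n_best_hits).
    gene_id is None when the read does not overlap a gene prediction.
    """
    counts = Counter()
    total_unique = 0
    for gene_id, identities, n_best_hits in alignments:
        if identities < MIN_IDENTITIES or n_best_hits > MAX_BEST_HITS:
            continue  # fails the alignment filter
        if n_best_hits != 1:
            continue  # assumption: unique means exactly one best hit
        total_unique += 1
        if gene_id is not None:
            counts[gene_id] += 1
    if total_unique == 0:
        return {}
    return {gene: n * 1_000_000 / total_unique for gene, n in counts.items()}


def classify_specificity(expression_by_tissue):
    """Classify by fold change between the top two tissues (Table S10 bins).

    Expects expression values for at least two tissues.
    """
    ranked = sorted(expression_by_tissue.values(), reverse=True)
    top, second = ranked[0], ranked[1]
    if top == 0:
        return "not expressed"
    fold = float("inf") if second == 0 else top / second
    if fold >= 1000:
        return "exclusive"
    if fold >= 100:
        return "very specific"
    if fold >= 10:
        return "specific"
    if fold >= 3:
        return "preferential"
    return "not tissue-specific"


# Hypothetical alignments: four pass the filter and are unique.
example_alignments = [
    ("GeneA", 36, 1), ("GeneA", 35, 1), ("GeneB", 34, 1),
    (None, 36, 1), ("GeneA", 33, 1), ("GeneB", 36, 3),
]
example_rpm = normalized_counts(example_alignments)
```

The denominator counts every uniquely aligned read in the sample, including reads that fall outside gene predictions, which is one reading of "per million reads uniquely aligning within that sample".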