<<

McClean et al. BMC Genomics 2010, 11:184 http://www.biomedcentral.com/1471-2164/11/184

RESEARCH ARTICLE Open Access Synteny mapping between common and reveals extensive blocks of shared loci Phillip E McClean1,2*, Sujan Mamidi1, Melody McConnell1, Shireen Chikara1, Rian Lee1,2

Abstract Background: Understanding syntentic relationship between two species is critical to assessing the potential for comparative genomic analysis. Common bean ( vulgaris L.) and soybean (Glycine max L.), the two most important members of the Phaseoleae , appear to have a diploid and polyploidy recent past, respectively. Determining the syntentic relationship between these two species will allow researchers to leverage not only genomic resources but also genetic data for important agronomic traits to improve both of these species. Results: Genetically-positioned transcript loci of common bean were mapped relative to the recent soybean 1.01 pseudochromosome assembly. In nearly every case, each common bean locus mapped to two loci in soybean, a result consistent with the duplicate polyploidy history of soybean. Blocks of synteny averaging 32 cM in common bean and 4.9 Mb in soybean were observed for all 11 common bean linkage groups, and these blocks mapped to all 20 soybean pseudochromosomes. The median physical-to-genetic distance ratio in common bean (based on soybean physical distances) was ~120 kb/cM. ~15,000 common bean sequences (primarily EST contigs and EST singletons) were electronically positioned onto the common bean map using the shared syntentic blocks as references points. Conclusion: The collected evidence from this mapping strongly supports the duplicate history of soybean. It further provides evidence that the soybean genome was fractionated and reassembled at some point following the duplication event. These well mapped syntentic relationships between common bean and soybean will enable researchers to target specific genomic regions to discover genes or loci that affect phenotypic expression in both species.

Background that aid the cloning of a gene in one species based on Comparative genetics and genomics leverages knowledge its position in a reference species. from companion species and attempts to define impor- Synteny has been studied quite extensively in , tant evolutionary relationships. Important to under- beginning with the early macro comparisons using RFLP standing these relationships is the physical and genetic markers. Although these analyses used a limited number synteny between any two species. Physical synteny can of markers, they revealed major syntenic themes that be used to explain the cytogenetic events that a genome have informed much of the research in the field. Closely has undergone as it evolved along a lineage into its cur- related species, such as tomato and potato [1,2], have rent structural form. Associated with these genomic highly conserved marker order that is only disturbed by events is the evolutionary repositioning of genes respon- clearly defined events such as paracentric inversions. At sible for phenotypes shared between the two species. greater evolutionary distances, as evidenced by tomato This repositioning places genes in different genomic and pepper [3], inter- and intra-chromosomal transloca- contexts that could alter the degree and timing of their tions have redistributed loci such that only short synten- phenotypic effects. Understanding the overall position- tic blocks are observed while the chromosome number ing of genes in two related species can suggest strategies remains unchanged. Finally, a one-to-two mapping of loci, as seen with sorghum and [4], revealed the * Correspondence: [email protected] effects of polyploidy on synteny. These types of studies, 1Genomics and Bioinformatics Program, North Dakota State University, Fargo, involving multiple species and using collections of ND 58105, USA

© 2010 McClean et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. McClean et al. BMC Genomics 2010, 11:184 Page 2 of 10 http://www.biomedcentral.com/1471-2164/11/184

shared probes, have revealed major evolutionary rela- observed between Arabidopsis and B. oleracea,twospe- tionships within lineages such as the grasses [5] and cies that diverged ~20 MYA [11]. The length of the legumes [6]. For the grasses, using 30 segments of the shared block appears to depend upon the divergence rice genome as a reference, the genomes of ten other time. The rice/wheat synteny appears to vary, with some species could be reconstructed [5]. regions showing extensive microcolinearity [12,13] while More recently, Brassicaceae evolution was studied in other regions exhibit a history of cytological events that depth with a high density set of markers. Using the gen- has reduced the extent of local syntenty [14,15]. Maize ome of the amphidiploid B. napus as a reference, the and sorghum are recently related evolutionarily through duplication history of its two parents, B. rapa and B. acommonancestorthatgaverisetosorghumandthe oleracea, revealed earlier rounds of duplication shared two progeneitors of maize [16]. The species diverged between these two diploid genomes and suggested that only 12 MYA [17], yet the synteny between sorghum a progenitor with fewer chromosomes was responsible and maize appears to be less extensive than that for the lineage [7]. Finally, a comparison of a dense between sorghum and rice that diverged ~40 MYA genetic map of B. napus, consisting of > 1000 loci [18,19]. orthologous to Arabidopsis genes, and the genomic Here we investigate the syntenic relationship between sequence of Arabidopsis, revealed that 90% of the two important legumes, common bean (Phaseolus vul- B. napus genome could be reconstructed from 21 ortho- garis L.) and soybean (Glycine max). These two species logous blocks of Arabidopsis genome while a number of are the two most important economically important the Arabidopsis segments were duplicated in the legumes, soybean for its many human and animal B. napus genome [8]. usages, and common bean as an important nutritional With the wide-spread availability of whole genome, crop for many economically poorer countries [20]. BAC, and EST sequencing, and new approaches to high These two species are members of the Phaseo- density mapping, further aspects of comparative evolu- leae (Figure 1), a clade within the economically impor- tion were revealed. Early comparisons of the genomic tant Papilionoideae legumes. This clade diverged from sequence data from Arabidopsis and tomato, two species the IRLC clade, the other economically important Papi- whose divergent evolution began 94 MYA at the time of lionoideae clade, 54.3 MYA [21,22]. the Asterid/Rosid divergence [9], revealed a pattern of Common bean and soybean diverged 19 MYA [21,22]. segmental duplication followed by local gene loss [10]. Determining those genomic events that followed the A similar pattern, but not to the same degree, was divergence is important as investigators attempt to

Phaseolus vulgaris (n=11; 0.60 pg)

4.9 Mya Vigna unguiculata (n=11; 0.60 pg) 5.6 Mya Dolichos (n=11; 0.38 pg) 18.0 Mya Glycine max (n=20; 1.13 pg) 23.0 Mya Erythrina sousae (n=21; 1.55 pg) 25.5 Mya Cajunus cajan (n=11; 0.88 pg) Figure 1 Phylogeny of economically important Phaseoleae legumes. The phylogeny is condensed from that presented in Stefanovic et al. (2009). The nodal dates are the minimum date of the range based on 32.1 million years ago (MYA) date of “A” node representing the divergence of the Phaseoleae legumes from Apios americana. The haploid chromosome number and C-value [in piocgrams (pg)] are from the C-Value Database http://data.kew.org/cvalues/; verified May 5, 2009). McClean et al. BMC Genomics 2010, 11:184 Page 3 of 10 http://www.biomedcentral.com/1471-2164/11/184

leverage the recently released soybean genome sequence genomes. Although these results are compelling, they [23] as a tool for research within the clade. RFLP map- only provided a limited description of the genomic rela- ping [24] and EST Ks analysis [25] provide strong evi- tionships between these two species. With a richer array dence that soybean under went a major duplication of sequenced-based resources, including an increasing event dated at 11 MYA [[25], Mamidi S, Lee RK, Terp- number of ESTs in common bean and an extensive strea J, Lavin M, Schlueter JA, Shoemaker RC, McClean draft sequence of the soybean genome, it is now possible PE: Whole genome duplications in the evolutionary his- to investigate the syntentic relationships between these tory of legumes, submitted]. Most likely this was an two species at a greater depth. autopolyploid event [22,26]. By contrast, molecular mar- ker data [27] show that common bean is a diploid while Common bean and soybean orthologous loci EST Ks analysis suggests that its genome has only Our first analysis of common bean and soybean syn- undergone localized segmental duplications. Given their teny was based on 300 gene-based loci [McConnell M, close relationship within the Phaseoleae, common bean Chikara S, Mamidi S, Rossi M, Lee R, McClean PE. is considered a diploid model for soybean [20]. Low- A gene-based linkage map of common bean (Phaseolus density RFLP mapping [28] found that the two species vulgaris), submitted.] that were genetically mapped share a high degree of sequence homology but synteny using the community-wide P. vulgaris BAT93 × Jalo is only found only over shorts blocks of the genomes. EEP558 mapping population and 59 gene-containing This is in contrast to the long stretches of synteny RFLP probes [30,31]. These loci were compared to the shared between common bean and (Vigna firstpublicreleaseofthesoybean genome, consisting radiate), another member of the Phaseoleae clade. of 20 pseudochromosomes, using the blastn algorithm. These two species diverged between 4.9 and 8.0 MYA We first limited the hits to those with an E-value < 1 [21,22]. ×10-10. This decision was motivated by our decision To better understand the structural relationships to compare the two genomes by uncovering those soy- between the common bean and soybean genome, we bean loci with some degree of orthology to the com- compared the organization of an extensive gene-based mon bean query sequences. A total of 1065 hits were map of common bean with the complete sequence of detected that met this criterion, and the median E- the soybean genome. We discovered overwhelming evi- valueforthesehitswas2.0×10-61. These hits were to dence of a one-to-two relationship between common loci on all 20 soybean pseudochromosomes. bean and soybean sequences. In addition, we were able Since another goal of this analysis was to determine to trace many of the gene rich regions of all soybean the genomic relationships between common bean and chromosomes back to specific regions of the common soybean genomes, we next limited the analysis to the bean genetic map. Evidence is also provided that, rela- best two soybean hits for each common bean sequence. tive to common bean, soybean is segmentally rear- This decision was based on the assumption that soybean ranged. Using this genetic/physical synteny, we have underwent a whole genome duplication event since it also been able to electronically map an additional 20,000 divergence from common bean, an assumption sup- common bean EST contigs and singletons relative to ported by a number of lines of evidence [24,25]. There- soybean. This result is a major first step in understand- fore, absent a major reduction in newly duplicated ing the evolution of the soybean genome relative to a soybean genes, there should be a one-to-two locus cor- diploid species within the Phaseoleae clade while provid- respondence between common bean and soybean. ing a framework for the comparative genetics and geno- Under this criterion, a total of 720 hits were observed, mics of these two species. and the median E-value was 3.5 × 10-91.Again,hitsto loci on all 20 pseudochromosomes were observed. Results and Discussion A common bean centric display of the synteny is found A few limited studies have investigated the relationship in Figure 2. between the common bean and soybean genomes. Bou- tin et al. [28] utilized shared RFLP markers and discov- Conserved common bean and soybean genomic blocks ered a number of shared common bean/soybean A conserved syntenic block was defined as one that syntentic markers blocks. In addition, individual markers shared three loci between common bean and soybean showed a more complex pattern of syntenty between the and covered a minimum of 4 cM of the common bean two species suggesting a pattern of fragmentation in the genetic map. A total of 55 syntenic blocks were evolutionary history of soybean relative to common observed between the two species. On average, each bean. Subsequently, Lee et al. [29] built upon these block consisted of 7 loci. From a common bean genetic results and were the first to show a clear one-to-two perspective, the mean size of each syntenic block was relationship between the common bean and soybean 32 cM (with a median of 29 cM). 75% of the blocks McClean et al. BMC Genomics 2010, 11:184 Page 4 of 10 http://www.biomedcentral.com/1471-2164/11/184

Pv1 Pv2 Pv3 Pv4 Pv5 52.8 0 D1080 6.0 0 g1795 41.0 0 g1686 4.2 0 DROI16a 0 g2467 0.7 12 g564 6 D1619

11 15 DDiap_1 0 DFR 1

11.6 15 13 g797 27 D1151 7 DROF10b 8 16 g1188 50.1 7 g1645 16 NRAMP 19 DROD20b 14 32 g2274 5.6 14 g1375 47.1 17 13 g2145 14.1 24 g2596 12 9.2 39 g2108 39.0 20 g1092 51.3 29 g511 2.8 19 g2303 36.1 32 g2410 2 47 g1830 19 40 g1968 33.0 30 g2266 57 g1805 27 Bng71 42 g34 45 g1410 45 D1595 65 D1377 13 43 g174

13 44 g1086 3.2 14.3 36 D1298 24.9 51 g822 54.7 64 g1515 73 g964 45 D1301 0.8 15 27.5 80 g1520 63 g483 47 g1333 54 g934 69 ChS 11 83 DROF10a 50 g1689 58 D1327 1 76 DROS3c 86 g417,g1656 2.9 72 DRbcs 55 g2308 36.7 81 DROG19b 61 D1251 65 g2647 7.8 77 g2685 7.3 66 87 g1801 90 Z_AM10S 43.7 g1883 0.7 86 g2490 53.7 93 g1556 1.5 79 G_AP3 69 g1664 37.3 96 g2476 72 Bng162 102 g1737 33.5 95 g457 7.7 79 102 g1148 101 g125 79 G_U14S DROC18a 8 2 108 g1771 5 42.1 104 g1925 115 g849 17 110 g1959 41.8 106 g2371 79 G_AP7 19 123 g1784 113 g724 32.3 126 g2113 4.7 107 DROF7c 84 g128 3 117 g1176 108 g117 2.1 119 g2409 104 D1325 119 g1954 133 g893 5 125 g686 123 Bng224 132 g2162 136 D0166 5 36.6 8 10.7 4.0 129 g1808 147 g1952 143 g1247 143 g2341 37.0 148 g1132 155 g2504 11.1 150 g665 18.7 149 g968 151 D1287 4.4 156 g2213

44.7 166 g1404 47.2 153 D1367 2

189 Bng171a 41.2 161 g1194 1.2 164 g2558,g2218 16 27.6 199 Bng122 2.6 164 g2577 5.1 171 g664 27.8

166 g680 8 36.3 202 g1224 5 g774 203 g499 169170 g693 206 gCV54201 173 g2427 18 11 207 TGA1.1 39.9 181 g2127 5.0 210 g1367 198 S2 214 g2562 208 DROS3b 218 g683 225 g523 39.0 232 g2132 0.2 Pv6 Pv7 Pv8 Pv9 0 DROJ9c 0 g503 0 D1831 0 g1588 62.2 4.6 6 g544 8972 7 9 8.1 28 41.4 50.4 10 CHI_INT3 15 g2543 4 44 9 D1096_2 14 g2538 12 g791 8.4 20 g2311 13 IRT3 23 g2553 14.6 21 g2298 34.2 22 g89 2.4 17 g2178 3.4

31 g2551 27 DROF1b 23.5 5.6 24 g1884 6

8 24 g2393 33 D0096 30 DBng199 29 g2498 18 34 AGT1 10 10 33 g1615 22.5 29 g1879 32 g993 36 g1119 7.3 34 g1107 6.8 37 g1639 36 g501 20 40 gTC1436 40 g129 38 g732 43 g739 42.6 37 WagG_W17 45.9 44 g792 9.9 44 g139 44 Bng060 38 g2567 46 Bng102 44.7 14 18 47 g471 45 Phs 44 DJ1kscar 47 Bng228 50 g1192 4.8 46 D1861 49 g708 6 54 D1086 54 g1233 54 g696,g2462 52 g2516 59 DROC11b 19 48.6 61 g2531 55 g1206 61 g2329 64 g1378 40.1 58 g131 58 g1126 66 g1361 49.5 44.0 66 g487 2.7 9 69 g1852 61 g549 66 g2510 16.2 76 g1748 11.2 70 Bng204 0.4 5.2 65 g1031 41.1 89 g1671 82 g1159 73 g415 23.5 45.5 69 g1990 32.3 99 g861 4.4 74 g2416 13 108 DP2062 15 13 78 P_U3 17.8 113 g2192 93 g1175 73 g776,C_AP2S

114 Bng94 10 103 g2459 15.5 75 g1539 Pv10 Pv11 117 g1998 6.2 120 g2208 112 g1065 12.7 0 g2521 0 g811 2.3 7.2 116 g1357 20 25.6 g785 77 10 8.7 3 122 g1818 121 g1853 30.1 77 g440,Bng205 39.0 3 DROE9a 7 g2273 34.7 130 g1280 35.4 11

77 g1758 12 8 135 130 g1380 5 BIP_J17S 13 g1731 14.3 g1757 23.7 80 g2514 9 g1383 16 g1932 14.8 136 g1190 3.8 29.6 133 g290 6.6 20 g1438 12 86 g140 143 g1174 32.5 28 g2331 25 g2285 146 g1947 46.0 109 g1713 16 17.4 31 4.3 33 g1994 1.8 g835 146 Bng104 31.8 118 g2264 4.9 33 DRON9b 44 L_L4S 0.5 34 g1168 25.4 125 g1181 47 g2600 15.2 134 A1CAO 52 g2560 35 DROJ9a 50.9 159 DROS3a 0.8 55 g2268 8 36 D1630 58 g1341 38 DROG5b 62 g2260 54 g2135 183 g2316 1. 2 66 g1661 17.3 61 DH20sT 70 DROF1a 67 g1983 73 g1215 Figure 2 Syntentic relationships of soybean relative to the common bean genetic map. A common bean genetic map anchors corresponding syntenic regions of the soybean 1.01 genome build. The location (in megabase pairs) of each soybean fragment straddling the common bean linkage is noted at the beginning and end of the homology. were > 20 cM in length. Conversely, the mean physical DNA has a much lower gene density than euchromatic distance relative to soybean was 4.9 Mb (with a median regions at the ends of the chromosomes. Finally, these of 2.6 Mb). 63% of the blocks were > 2 Mb in length. regions have a much lower recombination rate than the By comparing the location of these blocks, it is very euchromatic region. All of these features appear to be clear that nearly all segments of the common bean gen- common for eudicot genomes. ome mapped to two segments of the soybean genome. As Figure 3 shows, 35 of the 40 of the soybean chro- 69.3% of the genetic distance of the common bean map mosome arms exhibit a signature of conserved synteny mapped to some region of the soybean genome. Much between the two species. Much of the synteny that we of the uncovered regions in common bean fell in larges observed here was to orthologous loci in the euchro- gaps on the linkage map. Conversely, these blocks con- matic regions of the soybean genome. Only a few soy- sisted of 271 Mb or 27.9% of the soybean genome bean chromosomes, specifically 10, 12, 14, 17, 18, and sequence. To better understand the difference in gen- 20, contain extensive blocks of common bean loci that ome coverage, we needed to integrate additional soy- map to soybean pericentromeric DNA. If we exclude bean genomic information into our analysis. the pericentromeric DNA from our calculations, of the As displayed graphically at Soybase http://soybase.org 271 Mb of the soybean that were syntentic to common and Phyotzome http://www.phytozome.org/soybean, and bean, 200 Mb mapped to the euchromatic arms. There- captured in Figure 3, much of the soybean genome con- fore, using this limited set of common bean loci, we sists of pericentromeric DNA. Several features of this were able to determine the ancestry of 42.7% of the class of DNA are important for this analysis here. First, gene rich euchromatic region of soybean. Further, 30 of this DNA is centered around the centromeres and the 33 duplicated blocks of soybean genome anchored encompasses 56% of the soybean genome. Further, this to the common bean genetic map were also defined as McClean et al. BMC Genomics 2010, 11:184 Page 5 of 10 http://www.biomedcentral.com/1471-2164/11/184

Gm1 Gm2 Gm3 Gm4 Gm5 Gm6 Gm7 Gm8 Gm9 Gm10 0 66 161 82 13 119 17 181 6 69 156 (Pv3) 9 0 3 129 33 74 171 (Pv3) 126 7 73 24 9 2 112 15 (Pv8) 34 38 143 20 (Pv8) 9 10 116 10 44 148 39 60 3 0 16 73 75 5 24 69 135 (Pv6) 8 73 136 (Pv6) 10 47 20 66 8 29 24 Mb 30 133 126 95 7 143 121 86 2 148 181 40 89 161 14 1 37 166 9 6 44 8 66 109 44 69 8 7 50 29 159 10 2 0 87 64 Gm11 Gm12 Gm13 Gm14 Gm15 Gm16 Gm17 Gm18 Gm19 Gm20 0 64 159 66 44 232 0 93 0 5 33 1 29 8 44 3 86 199 4 2 0 11 14 118 136 96 4 0 6 77 3 77 1 6 3 10 69 11 50 31 150 20 73 130 34 7 51 116 164 0 7 30 171 121 Mb 99 3 40 6 130 21 1 202 5 32 65 232 24 40 1 0 64 5 54 66 1 7

24 166 50 5 16

8 60 77 Figure 3 Syntentic relationships of common bean relative to pseudochromosomes defined in build 1.01 of the soybean genome. Painted on each soybean pseudochromosome are syntenic common bean fragments. The genetic location (in cM) of the fragment are noted. The extent to which each fragment was electronically extended (see text for procedure) beyond the genetic mapping distance is noted as the distance beyond the genetic boundaries of the syntenic common bean fragment. The gray bar represents the heterochromatic region, and the black dot is the location of the centromere. This information was collected from the Soybean Genome Browser at SoyBase.org http://soybase. org/gbrowse/cgi-bin/gbrowse/gmax1.01/. soybean-to-soybean duplicates based on dot blot analysis Next, a comparison of the physical to genetic distances of the full genome sequence of soybean [[23]; see Figure between duplicate blocks from the soybean genome that S5 there]. were syntentic to the same common bean genetic block We next compared the ratio of the physical distance was made. The average difference between these block to genetic distance in common bean. To calculate the was 33,571 bp/cM, while the median was 18,056 bp/cM. physical distance per cM, we used the physical distance The range of difference spread from 76 to 203,002 bp/cM. in soybean relative to the genetic distance in common This largest difference was between duplicates on soybean bean. This same approach was used when the A. thali- chromosomes 8 and 18 that were syntenic to a Pv6 block ana and B. napus genomes were compared [8]. The bounded by markers g2553 and g139 in the interval from physical distance per cM was calculated for a total of 23-47 cM. The ratio for these two blocks (Gm8 = 119,914 245 comparisons of neighboring loci. The average dis- bp/cM; Gm198 = 351,548 bp/cM) was much greater than tance was 290,441 bp/cM. The median ratio was the genome-wide average. These two soybean blocks ter- 119,405 bp/cM, and 42% of the comparisons were less minate in the low recombination region, and the Gm18 than 100,000 bp/cM. These later values were very simi- blocks goes further into the pericentromeric region. This lar to those observed when A. thaliana and B. napus probably accounts for the differences in the physical to were compared. genetic distance ratio. McClean et al. BMC Genomics 2010, 11:184 Page 6 of 10 http://www.biomedcentral.com/1471-2164/11/184

Electronic mapping of common bean sequences sequence loci within the interval, and the amplification Given the extensive synteny observed between common products from BAT93 and Jalo EEP558 were sequenced. bean and soybean, we considered the possibility of map- Of these we detected 7 polymorphic loci. CAPs markers pingothercommonbeansequencesrelativetothe were developed, and the loci were mapped on the observed duplicate syntentic blocks. The underlying BAT93 × Jalo EEP558 community mapping population. concept is that duplicate soybean blocks could serve as All 7 of the loci mapped within the interval that was a reference that would point to the most likely genetic selected for study. Five of the seven mapped exactly as position for any given common bean sequence. This predicted. The positions of the remaining two loci were concept is analogous to the binning of ESTs in wheat inverted relative to the positions of the two soybean [32] except rather relying on the physical deletion land- orthologs. This result strongly suggests that the electro- marks it relies upon the comparative genomic structure nic mapping approach provides extensive physical and between two species within the same evolutionary line- genetic position information not previously available for age. These electronically mapped loci might also 1) common bean. reveal additional details about the degree of synteny and These results allowed us to systematically extend the the chromosomal history of these two species, and 2) distance of the common bean syntenic block (Figure 3). point to duplication blocks in the soybean genome itself. The principle we applied is to search for a common To test this concept, we collected all available com- bean genetic block shared by two soybean regions and mon bean sequences for a blastn analysis. This analysis extend the borders in either or both directions until consisted of 11,043 EST contigs and 9,847 EST single- sequence colinearity between the two soybean blocks is tons that were constructed here using the procedure broken. For example, the Pv 9 block from 44-89 cM is described by Childs et al. [33]; 85,102 repeat masked shared with both Gm4, from positions 41.1 to 45.9 Mb, BES [34]; and 694 CDSs from common bean. Several and Gm6, from positions 11.9 and 16.2 Mbp. By apply- criteria were used before a sequence was included in ing the principle described above, we see an unbroken our set of electronically mapped sequences. First, the block in the same orientation from 36.0-49.0 Mbp on sequence must have two hits against different segments Gm4 and from position 8.6-19.5 Mbp on Gm6. By of the soybean genome. Secondly, the e-value for all hits applying this concept, we were able to extend the degree must be less than 1 × 10-30. Next, the length of the of synteny relative to this one Pv9 block from 9.1 Mbp match of the query to the soybean must be 150 nt or to 23.1 Mbp. We reexamined the synteny between the greater. And finally, to ensure that the homology was two species and applied the same principle we used against two different sequences, the distance between with the Pv9 to the entire genome. Using this newly the 3’ end of one sequence and the 5’ beginning of is derived information, we can account for an additional neighboring sequence must be greater than 50 nucleo- 187 Mbp of syntentic regions between common bean tides. Using these criteria an additional 15,091 common and soybean. All of our approaches here allow us to bean sequences were electronically mapped. This con- trace the ancestry of 456 Mbp of the soybean genome sisted of 549 gene-based sequences, 2,548 BES, 4,522 relative to the diploid common bean genome. EST singletons, and 7,472 EST contigs. Of these, 316 were previously mapped genetically. This gives a total of Duplication history of soybean 14,775 newly mapped loci. The median distance Given that soybean has undergone a major duplication between any two sequences, based on their soybean event in its history, it is reasonable to expect that many orthologous location, was 11,068 nucleotides. common bean sequences should map to two locations In addition to its usefulness for studying common in soybean. Our extensive Blastn analysis using all avail- bean and soybean synteny, this large collection of elec- able common bean sequences clearly showed this to be tronically mapped common bean sequences greatly the case. Given that common bean is a diploid relative increases the marker set available for common bean of soybean, the common bean sequences can then be genetics. For this to be the case, it was essential to used as a reference point that links two duplicate determine if the electronic map position of each of regions in soybean. This approach does have limitations, these sequences corresponds to its predicted genetic although not serious. First, any sequence or sequence position. To test this concept, we focused on a 43 cM block unique to the soybean lineage will not have a region of Pv7 between markers g2298 and g1378. We common bean sequence signal, and any duplication selected this region because although QTL for common associated with those sequences will not be uncovered. bacterial blight [35] and white mold resistance [36] are Also, given that 83% of the sequences we were able to known to map to this location, useful markers have not map electronically were gene-based, and that most yet been developed for marker assisted selection. Pri- genes are found in the euchromatic regions near the mers were designed to 15 EST contig or singleton ends of eukaryotic chromosomes, the strongest evidence McClean et al. BMC Genomics 2010, 11:184 Page 7 of 10 http://www.biomedcentral.com/1471-2164/11/184

regarding the duplication history will come from those blocks. These blocks are interrupted by a duplicate gene-rich regions. block from Gm7. Two of the blocks on both Gm5 and To investigate the duplication history of each soybean Gm8 are syntenic to common bean Pv2. Furthermore, chromosome relative to the remainder of the soybean of the two chromosomes, Gm8 shares duplications with chromosome set, we first ordered all of the common more Gm chromosomes which in turn trace back to five bean sequences relative to their soybean ortholog on a different common bean linkage groups. chromosome-by-chromosome basis. That ordering was Using both the Pv/Gm syntentic data as well the cross-referenced to the duplicate with the lowest E-value duplications found in Gm using the large data set from on a second soybean chromosome. These defined blocks Pv, several conclusions can be drawn regarding the of loci that mapped to the same duplicate chromosome. chromosomal history of soybean. First, the one-to-two It was required that each block consist of ten common mapping of Pv to Gm sequences provides further com- bean loci, and that the loci must be in consecutive order. pelling evidence that a major duplication event is part of Figure 4 summarizes the duplicated block data for the history of the soybean genome [25]. The modular chromosomes Gm5 and Gm8 in graphical form. These nature of the both the Pv/Gm synteny and soybean two were chosen because they are representative of the duplications suggest that either coincident with the other chromosomes, and because they share synteny to duplication, or shortly after, the duplicated chromo- the same regions on common bean Pv2. The most somes were fractionated or new chromosomes were apparent observation is the modular nature of each reassembled. A dramatic example is found by comparing chromosome with regards to their duplicate blocks. the Gm15 duplicates to Gm8. Here we see three dupli- Gm5 shares duplication blocks with three other soybean cate blocks from the end of Gm15 organized non-con- chromosomes (Gm8, Gm17, Gm19) while Gm8 shares tiguously. These are right next to a block from the duplication blocks with Gm5, Gm7, Gm12, Gm15, and beginning of the same chromosome. A chromosome- Gm18. Furthermore, the modular blocks from the dupli- wide analysis suggests that at least 19 blocks rearranged cate chromosome are not contiguous. For example, to form Gm8. Because of a lack of significant sequences beginning at position 0 on Gm8, there is about 12 Mbp homologous much of the pericentromeric region, we of sequences duplicated on Gm5 in three non-contigous could not determine the modular nature of that region.

Pv3 Pv2 Pv2 Pv2 51 - 41 84-112 65-60 27-47

8.4 - 6.8 10.2 - 13.1 Gm 5 15.9 - 13.2 11.7 - 23.7 9.4 - 1.3 6.5 - 6.2 0.8 - 0.1 0.2 - 0.6 0.7 - 1.2 6.6 - 6.8 10.2 - 9.5

13 91 136 20 39 19 24 65 535 57 394

Pv2 Pv2 Pv2 Pv5 Pv10 Pv8 Pv6 47-27 84-112 65-60 15-24 62-80 29-25 124-102

41.9 - 38.1 30.6 - 37.3 59.1 - 60.7 Gm 8 24.6 - 6.1 0.3 - 0.1 2.7 - 1.6 0.1 - 0.2 4.4 - 3.6 0.5 - 1.6 0.3 - 0.5 38.1- 46.3 37.4 - 38.1 36.0 - 35.1 49.8 - 50.4 49.7 - 49.1 48.9 - 46.9 32.6 - 32.5 62.2 - 61.6 37.4 - 37.5 57 394 74 466 24 10 315862 16 104 13 95 56 14 89 245 50 36

0 10 20 30 40 50

5 7 8 12 15 17 18 19

Gm chromosome code Figure 4 Duplicated blocks along soybean chromosomes Gm5 and Gm8 based on reference ordering of common bean sequences. Each chromosome consists of a series of duplicate blocks that are color coded. White blocks have no reference common bean sequence. The physical position (in Mbp) of the duplicate block on its home chromosome is given. The position is given in either direct or inverted order based on its position relative to the reference chromosome. Above some blocks is the genetic location of the corresponding syntentic common bean blocks. Below each block is the number of common bean sequences used to establish the block. The thin bars represent the location of the pericentromeric regions of the chromosomomes. The physical distance scale at the bottom is in Mbp. McClean et al. BMC Genomics 2010, 11:184 Page 8 of 10 http://www.biomedcentral.com/1471-2164/11/184

Conclusions blastn analyses. First, 300 EST or singletons, originally We have used the first public release of the soybean developed by Ramirez et al. [40], using a set of 21,026 genome to evaluate the syntentic relationship between EST, and 15 other sequences, all mapped onto the 11 this important economic species and common bean, common bean linkage groups at a LOD value of 2 another member of the Phaseoleae legumes. It appears [McConnell M, Chikara S, Mamidi S, Rossi M, Lee R, that extensive regions of synteny exist between these McClean PE. A gene-based linkage map of common two species. These relationships further suggest that the bean (Phaseolus vulgaris), submitted.], were used as a soybean genome has undergone a whole genome dupli- query. Subsequently, blastn analyses were performed cation based on the fact that nearly all of the common using the new set of ESTs (11,043) and singletons bean sequences that map to a single location, have two (9,847) as one query, and the 85,102 BES as a second copies in the soybean genome. Furthermore, soybean query. A final query consisted of 694 complete P. vul- appears to have undergone extensive chromosome garis coding sequences (CDS) downloaded from NCBI breakage and rearrangement. This conclusion is based http://www.ncbi.nlm.nih.gov/ on August 8, 2008. on the observation that most soybean chromosomes Because we were looking for significant levels of orthol- consist of fragments from multiple common bean chro- ogy between common bean and soybean sequences, we mosomes. Whether these rearrangements occurred prior initially limited all blastn analysis to hits with E-values to, coincident with, or following the duplication event is less than 1 × 10-10 and overlaps of at least 150 nt. The unclear at this time. From an applied perspective, these results reported here are based on the first high scoring results suggest that a comparative genomics approach to pair. See Additional File 1 for all of the common bean gene discovery is feasible for these two evolutionarily sequences that met these criteria. related species. High density mapping of linkage group Pv7 Methods Primers were developed for 15 contig or singletons that EST contiging and BAC-end sequence processing electronically mapped between markers g2298 and The 83,448 Phaseolus vulgaris sequences in the National g1378 on linkage group Pv7. A 3’-primer was designed Center for Biotechnology Information (NCBI) EST data- toasequencewithin150ntoftheputativestop,and base available on August 1, 2008 were downloaded from the corresponding 5’ primer was located about 500 nt http://www.ncbi.nlm.nih.gov/sites/entrez?db=nucest. upstream of the 3’-primer site. Subsequent PCR amplifi- EST contigs and singletons were defined using the pro- cation and sequencing followed the procedures cedures described in Childs et al. [33]. The procedure described in McClean et al. [41]. Based on sequence was performed using the following stand-alone software: polymorphism, either CAPs or SNP diagnostic markers SeqClean (http://www.tigr.org/tdb/tgi/software; default were developed. These markers were used to score poly- parameters), Megablast [[37]; http://www.ncbi.nlm.nih. morphisms among members of the community-wide gov/blast/megablast.shtml; default parameters], and BAT93 × Jalo EEP558 RI population. These marker loci CAP3 [[38]; default parameters]. All 89,017 BAC-end were then mapped using the MAPMAKER software sequences (BES) were downloaded from the NCBI GSS [42]. The final order was verified using the “ripple” com- database http://www.ncbi.nlm.nih.gov/sites/entrez? mand with a LOD value of 3.0. db=nucgss on August 1, 2008. We limited our analysis to those 85,102 BES representing sequences from both List of abbreviations ends of 42,551 sBACs. Prior to the blastn analyis, these BAC: bacterial artificial chromosome; BES: BAC end BES were analyzed using RepeatMasker (http://www. sequence; bp: base pairs; CAPs: cleaved amplified poly- repeatmasker.org; default parameters, v. 3.15) with the morphic sequence; CDS: coding sequence; cM: centi- repeats database obtained from Plant Repeat morgans; EST: expressed sequence tags; Gm: Glycine Database [[39]; http://plantrepeats.plantbiology.msu.edu/ max;LOD:logarithmof(base10)ofodds;Mb:mega- ; file: TIGR_Fabaceae_Repeats.v2_0_0.fsa.txt]. A fasta file bases; MYA: million years ago; RFLP: restriction frag- containing the singletons and EST contigs is available ment length polymorphism. upon request from the corresponding author. Additional file 1: Electronically mapped common bean loci based Blastn analysis on synteny to the soybean genome sequence. List of P. vulgaris loci which met the following selection criteria for electronic mapping: e-value The 20 pseudochromosomes from the version 1.01 less than 1 × 10-30, hits to two soybean chromosomes, query length release of the soybean assembly were downloaded from greater than 149 nt, and soybean position for the end of one soybean ftp://ftp.jgi-psf.org/pub/JGI_data/Glycine_max/Glyma1/ locus and the beginning of the next locus was less than 50 nt. assembly/sequences/ and used as a database for all McClean et al. BMC Genomics 2010, 11:184 Page 9 of 10 http://www.biomedcentral.com/1471-2164/11/184

Acknowledgements 13. Yan L, Loukoianov A, Tranquilli G, Helguera M, Fahima T, Dubcovsky J: This research was funded in part by the Department of Positional cloning of wheat vernalization gene VRN1. Proc Natl Acad Sci Agriculture Cooperative State Research, Education and Extension Service USA 2003, 100:6263-6268. National Research Initiative as part of their Plant Genome program. The 14. Lu HJ, Faris JD: Macro- and micro-colinearity between the genomic grant was awarded to PEM, and the award number is 2005-00757. Partial region of wheat chromosome 5B containing the Tsn1 gene and the rice support for SM was made possible by NIH grant number P20 RR016741 genome. Funct Integr Genomics 2006, 6:90-103. from the INBRE program of the National Center for Research Resources. 15. Sorrells ME, La Rota M, Bermudez-Kandianis CE, Greene RA, Kantety R, These soybean sequence data were produced by the US Department of Munkvold JD, Miftahudin , Mahmoud A, Ma X, Gustafson PJ, Qi LL, Energy Joint Genome Institute http://www.jgi.doe.gov/ in collaboration with Echalier B, Gill BS, Matthews DE, Lazo GR, Chao S, Anderson OD, Edwards H, the user community. Linkiewicz AM, Dubcovsky J, Akhunov ED, Dvorak J, Zhang D, Nguyen HT, Peng J, Lapitan NLV, Gonzalez-Hernandez JL, Anderson JA, Hossain K, Author details Kalavacharla V, Kianian SK, Choi DW, Close TJ, Dilbirligi M, Gill KS, Steber C, 1Genomics and Bioinformatics Program, North Dakota State University, Fargo, Walker-Simmons MK, McGuire PE, Qualset CO: Comparative DNA sequence ND 58105, USA. 2Department of Plant Sciences, Loftsgard Hall, North Dakota analysis of wheat and rice genomes. Genome Res 2003, 13:1818-1827. State University, Fargo, ND 58105, USA. 16. Wei F, Coe E, Nelson W, Bharti AK, Engler F, Butler E, Kim H-R, Goicoechea JL, Chen M, Lee S, Buks G, Sanchez-Villeda H, Schroeder S, Authors’ contributions Fang Z, McMullen M, Davis G, Bowers JE, Paterson AH, Schaeffer M, PEM conceived the project and was the principal author of the manuscript. Gardiner J, Cone K, Messing J, Soderlund C, Wing RA: Physical and genetic MM performed all of the molecular marker analysis for common bean and structure of the maize genome reflects its complex evolutionary history. analyzed the initial syntenty between the two species. RL and SM jointly PLoS Genetics 2007, 3:1254-1263. developed the common bean contigs, and performed all of the 17. Swigonova Z, Lai J, Ma J, Ramakrishna W, Llaca V, Bennetzen , Messing J: bioinformatic analysis of blast analyses between the common bean Close split of sorghum and maize genome progenitors. Genome Res sequences and the soybean 1.01 build. SC remapped the common bean 2004, 14:1916-1923. molecular segregation data as new data became available. MM, RL, SM, and 18. Bowers JE, Abbey C, Anderson S, Chang C, Draye X, Hoppe AH, Jessup R, SC each reviewed the manuscript. Lemke C, Lennington J, Li Z, Lin Y-R, Liu S-C, Luo L, Marler B, Ming R, Mitchell SE, Quang D, Reischmann K, Schulze SR, Skinner DN, Wang Y-W, Received: 26 August 2009 Accepted: 18 March 2010 Kresovich S, Schrerz KF, Paterson AH: A high-density genetic Published: 18 March 2010 recombination map of sequence-tagged sites for sorghum, as a framework for comparative structural and evolutionary genomics of References tropical and grasses. Genetics 2003, 165:367-386. 1. Bonierbale MW, Plaisted RL, Tanksley SD: RFLP maps based on a common 19. Paterson AH, Bowers JE, Chapman BA: Ancient polyploidization predating set of clones reveal modes of chromosomal evolution in potato and divergence of the cereals, and its consequences for comparative tomato. Genetics 1988, 120:1095-1103. genomics. Proc Natl Acad Sci USA 2004, 101:9903-9908. 2. Tanksley SD, Ganal MW, Prince JP, de-Vicente MC, Bonierbale MW, Broun P, 20. McClean PE, Gepts P, Jackson S, Lavin M: Phaseolus vulgaris : A diploid Fulton TM, Giovannoni J, Grandillo S, Martin GM, Messeguer R, Miller JC, model for soybean. Genetics and Genomics of Soybean New York: Springer, Miller L, Paterson AH, Pineda O, Roder MS, Wing RA, Wu W, Young ND: New YorkStacey G 2008, 55-78. High density molecular linkage maps of the tomato and potato 21. Lavin M, Herendeen PS, Wojjciechowski MF: Evolutionary rate analysis of genomes. Genetics 1992, 132:1141-1160. Leguminosae implicates a rapid diversification of lineages during the 3. Tanksley SD, Bernatzky R, Lapitan NL, Prince JP: Conservation of gene tertiary. Syst Biol 2005, 54:575-594. ć repertoire but not gene order in pepper and tomato. Proc Natl Acad Sci 22. Stefanovi S, Pfeil BE, Palmer JD, Doyle JJ: Relationships among USA 1988, 85:6419-6423. phaseoloid legumes based on sequences from eight chloroplast regions. 4. Whitkus R, Doebley J, Lee M: Comparative genome mapping of sorghum Systematic Botany 2009, 34:115-128. and maize. Genetics 1992, 132:1119-1130. 23. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, 5. Gale M, Devos K: Comparative genetics in the grasses. Proc Natl Acad Sci Song Q, Thelen JJ, Cheng J, Xu D, Hellsten U, May GD, Yu Y, Sakurai T, USA 1993, 95:1971-1974. Umezawa T, Bhattacharyya MK, Sandhu D, Valliyodan B, Lindquist E, Peto M, 6. Choi H-K, Mun J-H, Kim K-J, Zhu H, Baek J-M, Mudge J, Roe B, Ellis N, Grant D, Shu S, Goodstein D, Barry K, Futrell-Griggs M, Abernathy B, Du J, Doyle J, Kiss GB, Young ND, Cook DR: Estimating genome conservation Tian Z, Zhu L, Gill N, et al: Genome sequence of the paleopolyploid between crop and model legume species. Proc Natl Acad Sci USA 2004, soybean. Nature 2010, 463:178-183. 101:15289-15294. 24. Shoemaker RC, Polzin K, Labate J, Specht J, Brummer EC, Olson T, Young N, 7. Parkin IAP, Sharpe AG, Lydiate DJ: Patterns of genome duplication within Concibido V, Wilcox J, Tamulonis JP, Kochert G, Boerma R: Genome the Brassica napus genome. Genome 2003, 46:291-303. duplication in soybean (Glycine subgenus soja). Genetics 1996, 8. Parkin IAP, Gulden SM, Sharpe AG, Lukens L, Trick M, Osborn TC, Lydiate DJ: 144:329-338. Segmental structure of the Brassica napus genome based on 25. Schlueter JA, Dixon P, granger C, Grant D, Clark L, Doyle JJ, Shoemaker RC: comparative analysis with Arabidopsis thaliana. Genetics 2005, Mining the EST databases to determine evolutionary events in the 171:765-781. legumes and grasses. Genome 2004, 47:868-876. 9. Crepet WL, Nixon KC, Gandolfo MA: Fossil evidence and phylogeny: the 26. Straub SCK, Pfeil BE, Doyle JJ: Testing the polyploidy past of soybean age of major angiosperm clades based on mesofossil and macrofossil using a low-copy nuclear gene - Is Glycine (Fabaceae: Papilionoideae) an evidence from Cretaceous deposits. Amer J Bot 2004, 91:1666-1682. auto- or allopolyploid?. Mol Phyol Evol 2006, 39:580-584. 10. Ku M-M, Vision T, Liu J, Tanksley SD: Comparing sequenced segments of 27. Freyre R, Skroch P, Geffroy V, Adam-Blondon A-F, Shirmohamadali A, the tomato and Arabidopsis genomes: Large-scale duplication followed Johnson W, Llaca V, Nodari R, Pereira P, Tsai S-M, Tohme J, Dron M, by selective gene loss creates a network of synteny. Proc Natl Acad Sci Nienhuis J, Vallejos C, Gepts P: Towards an integrated linkage map of USA 2000, 97:9121-9126. common bean. 4. Development of a core map and alignment of RFLP 11. Yang YW, Lai KN, Tai PY, Li WH: Rates of nucleotide substitution in maps. Theor Appl Genet 1998, 97:847-856. angiosperm mitochondrial DNA sequences and dates of divergence 28. Boutin SR, Young ND, Olson TC, Yu ZH, Vallejos CE, Shoemaker RC: between Brassica and other angiosperm lineages. J Mol Evol 1999, Genome conservation among three legume genera detected with DNA 48:597-604. markers. Genome 1995, 38:928-937. 12. Schnurbusch T, Collins NC, Eastwood RF, Sutton T, Jefferies SP, Langridge P: 29. Lee JM, Grant D, Vallejos CE, Shoemaker RC: Genome organization in ‘ ’ Fine mapping and targeted SNP survey using rice-wheat gene dicots. II. Arabidopsis as a bridging species to resolve evolution events colinearity in the region of the Bo1 boron toxicity tolerance locus of among legumes. Theor Appl Genet 2001, 103:765-773. bread wheat. Theor Appl Genet 2007, 115:451-461. 30. Murray J, Larsen J, Michaels TE, Schaafsma A, Vallejos CE, Pauls KP: Identification of putative genes in bean Phaseolus vulgaris) genomic McClean et al. BMC Genomics 2010, 11:184 Page 10 of 10 http://www.biomedcentral.com/1471-2164/11/184

(Bng) RFLP clones and their conversion to STSs. Genome 2002, 45:1013-1024. 31. Vallejos CE, Sakiyama NS, Chase CD: A molecular marker-based linkage map of Phaseolus vulgaris L. Genetics 1992, 131:733-740. 32. Qi LL, Echalier B, Chao S, Lazo GR, Butler GE, Anderson OD, Akhunov ED, Dvorák J, Linkiewicz AM, Ratnasiri A, Dubcovsky J, Bermudez-Kandianis CE, Greene RA, Kantety R, La Rota CM, Munkvold JD, Sorrells SF, Sorrells ME, Dilbirligi M, Sidhu D, Erayman M, Randhawa HS, Sandhu D, Bondareva SN, Gill KS, Mahmoud AA, Ma XF, Miftahudin , Gustafson JP, Conley EJ, Nduati V, Gonzalez-Hernandez JL, Anderson JA, Peng JH, Lapitan NL, Hossain KG, Kalavacharla V, Kianian SF, Pathan MS, Zhang DS, Nguyen HT, Choi DW, Fenton RD, Close TJ, McGuire PE, Qualset CO, Gill BS: A chromosome bin map of 16,000 expressed sequence tag loci and distribution of genes among the three genomes of polyploid wheat. Genetics 2004, 168:701-712. 33. Childs KL, Hamilton JP, Zhu W, Ly E, Cheung F, Wu H, Rabionwicz PD, Town CD, Buell CR, Chan AP: The TIGR plant transcript assemblies database. Nucleic Acids Res 2007, 35:D846-D851. 34. Schlueter JA, Goichochea JL, Collura K, Gill N, Lin J-Y, Yu Y, Kudrna D, Zuccolo A, Vallejos CE, Muñoz-Torres M, Blair M, Thome J, Tomkins J, McClean P, Wing R, Jackson SA: BAC-end sequence and a draft physical map of the common bean (Phaseolus vulgaris L.) genome. Tropical Plant Biology 2008, 1:40-48. 35. Miklas PM, Johnson WC, Delorme R, Gepts P: QTL conditioning physiological resistance and avoidance to white mold in dry bean. Crop Sci 2001, 41:309-315. 36. Nodari RO, Tsai SM, Guzman P, Gilbertson RL, Gepts P: Towards an integrated linkage map of common bean. III. Mapping genetic factors controlling host- interactions. Genetics 1993, 134:341-350. 37. Zhang Z, Schwartz S, Wagner L, Miller W: A greedy algorithm for aligning DNA sequences. J Comput Biol 2000, 7:203-214. 38. Huang X, Madan A: CAP3: a DNA sequence assembly program. Genome Res 1999, 9:868-877. 39. Ouyang S, Buell CR: The TIGR Plant Repeat Databases: a collective resource for the identification of repetitive sequences in plants. Nucleic Acids Res 2004, 32:D360-D363. 40. Ramierz M, Graham MA, Blanco-López L, Silvente S, Medrano-Soto A, Blair MW, Hernandez G, Vance CP, Lara M: Sequencing and analysis of common bean ESTs. Building a foundation of functional genomics. Plant Physiol 2005, 137:1211-1227. 41. McClean PE, Lee RK, Miklas PN: Sequence diversity analysis of dihydroflavonol 4-reductase intron 1 in common bean. Genome 2004, 47:266-280. 42. Lander ES, Green P, Abrahamson J, Barlow A, Daly MJ, Lincoln SE, Newburg L: MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1987, 1:174-181.

doi:10.1186/1471-2164-11-184 Cite this article as: McClean et al.: Synteny mapping between common bean and soybean reveals extensive blocks of shared loci. BMC Genomics 2010 11:184.

Submit your next manuscript to BioMed Central and take full advantage of:

• Convenient online submission • Thorough peer review • No space constraints or color figure charges • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit