Progress, challenges and the future of crop genomes
Todd P Michael¹ and Robert VanBuren²

The availability of plant reference genomes has ushered in a new era of crop genomics. More than 100 plant genomes have been sequenced since 2000, 63% of which are crop species. These genome sequences provide insight into architecture, evolution and novel aspects of crop genomes such as the retention of key agronomic traits after whole genome duplication events. Some crops have very large, polyploid, repeat-rich genomes, which require innovative strategies for sequencing, assembly and analysis. Even low quality reference genomes have the potential to improve crop germplasm through genome-wide molecular markers, which decrease expensive phenotyping and breeding cycles. The next stage of plant genomics will require draft genome refinement, building resources for crop wild relatives, resequencing broad diversity panels, and plant ENCODE projects to better understand the complexities of these highly diverse genomes.

Addresses
¹ Ibis Biosciences, Carlsbad, CA, United States
² Donald Danforth Plant Science Center, St. Louis, MO, United States

Corresponding author: VanBuren, Robert ([email protected])

Current Opinion in Plant Biology 2015, 24:71–81
This review comes from a themed issue on Genome studies and molecular genetics
Edited by Insuk Lee and Todd C Mockler
For a complete overview see the Issue and the Editorial
Available online 19th February 2015
http://dx.doi.org/10.1016/j.pbi.2015.02.002
1369-5266/© 2015 Elsevier Ltd. All rights reserved.

Introduction
After the release of the Arabidopsis genome in 2000 [1] and the advent of Next Generation Sequencing (NGS) technology in 2005, the number of sequenced plant genomes has rapidly increased to more than 100 ([2], List of sequenced plant genomes; URL: https://genomevolution.org/wiki/index.php/Sequenced_plant_genomes). Nearly two-thirds (63%) of the sequenced plant genomes are from crops, while model, non-model and crop wild relatives make up the remainder; three-fourths (76%) of the sequenced plant genomes are from eudicots and one-fifth (19%) are from monocots. Few genomes from non-flowering plants have been published thus far, with only three from the Gymnospermae, one from the Bryophyta and one from the Lycopodiophyta (Figure 1, Table 1).

The high throughput and low cost of NGS technologies made it possible to sequence crops with lower economic value or large genomes and have paved the way for establishing new model species. The complexity and size of some crop genomes made traditional Sanger sequencing cost prohibitive. The wheat genome, for instance, is hexaploid, 90% repetitive, and 17 gigabases (Gb) in size, and the sugarcane genome ranges in ploidy up to decaploid, and its 12 Gb is 80% repetitive. Although sequencing capacity and computational power are increasing exponentially, numerous challenges still remain, and both novel methodologies and legacy techniques are important to crack these impossible genomes.

Model plant genomes such as Arabidopsis [1], Brachypodium distachyon [3], Physcomitrella patens (moss [4]) and Setaria italica [5,6] serve as an engine for research, while others like Oryza sativa (rice [7,8]), Populus trichocarpa (poplar [9]), Zea mays (maize [10]), Glycine max (soybean [11]), Solanum lycopersicum (tomato [12]), and Pinus taeda (loblolly pine [13]) serve a dual purpose, not just as crops but also as functional models. Together these genomes have provided the foundation for an era of molecular genomics research that has enabled functional definition of many key genes and pathways.

Non-model and non-crop plant genomes provide important clues to plant genome architecture and the evolution of flowering plants. Although it was thought that plants have a ‘one-way ticket to genome obesity’ as a result of the retention of proliferating transposable elements (TEs) [14], the smallest plant genomes [15], Utricularia gibba (bladderwort) and Genlisea aurea (corkscrew), provided evidence that almost all intragenic space and repeat sequence can be purged [16,17]. In addition, the aquatic, highly morphologically reduced, non-grass monocot Spirodela polyrhiza (greater duckweed) has a genome similar in size to Arabidopsis yet functions with 28% fewer genes (19,623) [18]. The genomes of Selaginella moellendorffii (spikemoss [19]) and Amborella trichopoda [20] provide evolutionary links for vascular plants and angiosperms, respectively, yielding key insights into the trajectory of plant-specific gene families and the radiation of flowering plants.

In this review we focus primarily on the most recently sequenced specialty and row crop genomes with an emphasis on challenges and limitations of current genome sequencing techniques. This segues into downstream work aimed at linking the genome to the biology, and concludes with the future of plant genomics.
Figure 1
[Phylogenetic tree of sequenced plant genomes, running from Chlamydomonas and Volvox through land plants, vascular plants, seed plants and flowering plants to monocots and eudicots (basal eudicots, rosids I and II, asterids). Key: sequencing technology (Sanger only; Sanger + 454/Illumina; 454 + Illumina; Illumina only), whole genome duplication, whole genome triplication, polyploid crop species.]
Major challenges in crop genome sequencing projects
Genome assembly tools, which were generally designed and tested for non-plant species [21], are ill suited for handling the issues of genome size, repeat content, paralogy, and heterozygosity that are common in plant genomes. The throughput of NGS technologies has made it economical to sequence most crop genomes, but resolving plant genome complexity with 100–200 bp reads is still a challenge. Most recent mammalian genomes are assembled into chromosome-scale regions [22], but most draft plant genomes remain in thousands of highly fragmented contigs or hundreds of scaffolds with numerous embedded gaps. Even the Arabidopsis genome, which is arguably the best-assembled plant genome, is still in 102 contigs with a total gap length of at least 185,644 bp (TAIR 10 [23]).

Genome size and repeat content, which are often highly correlated, present a major problem for plant genome assembly. Genome size in plants varies by more than three orders of magnitude, from 61 Mb (Genlisea tuberosa) to over 150,000 Mb (Paris japonica) (reviewed in [24]). NGS platforms can now generate enough raw data to sequence large genomes, but assembling so much data is a major computational problem. The loblolly pine genome is the largest genome assembled to date (22 Gb) and used a preprocessed, condensed set of ‘super-reads’ to reduce the computational resources needed for assembly [13].

Repeats are a major problem in genome assembly, and resolving repeat structures requires sequencing read lengths that exceed the 10–20 kb repeats commonly found in plant genomes. Class I ‘copy and paste’ long terminal repeat (LTR) retrotransposons are the most prevalent repeats in plant genomes and their proliferation results in genome bloating. Estimating the average repeat lengths in the genome is crucial for picking read lengths, sequencing technology, mate-pair library sizes and coverage. Much of the structural variation (SV) between cultivars within plant genomes is due to the movement of LTRs, and reference-based re-sequencing projects often miss or inaccurately predict SVs. De novo assembly of three divergent rice strains uncovered several megabases (Mb) of novel sequences in each strain, with many contigs containing expressed genes [25].

Issues of paralogy complicate genome assembly and result in incomplete, highly fragmented assemblies. Polyploidy is common among crop species and plants have large, multi-gene families with highly similar paralogs. Outcrossing species like grape, clonally propagated crops like apple, and long-lived trees like Eucalyptus tend to have high levels of within-genome heterozygosity. Paralogous regions and heterozygous sites create ‘bubbles’ during genome assembly, where two or more regions that are highly similar assemble together, and the adjacent dissimilar regions assemble separately but eventually merge again (Figure 2a). Assembly issues stemming from polyploidy or heterozygosity can be overcome by using diploid progenitor species (‘robusta’ coffee [26] and wheat [27]), closely related wild diploid species (woodland strawberry [28]), haploid/monoploid lines (citrus [29,30], banana [31] or peach [32]), or a bacterial artificial chromosome (BAC) by BAC sequencing approach (maize [10] or pear [33]) (Figure 2).

Organelle DNA contamination can be a major problem in genome sequencing projects. Plant cells can have over 100 chloroplasts with up to 10,000 plastid DNA copies per cell [34], and organelle-derived reads can constitute 5–20% of the total sequences in a whole genome sequencing (WGS) project. Modified DNA extraction protocols optimized for nuclei isolation are typically used, which can reduce organelle contamination several fold [35]. Organelle contamination can be tested before library construction using a simple qPCR protocol [35]. Plant nuclear genomes also contain numerous organelle-derived sequences that can be nearly identical to the organelle genomes themselves; accurately sequencing through these regions requires read lengths that can span the insertion junction sites.

Overcoming the challenges of sequencing plant genomes requires advances in sequencing technology. Longer read lengths provided by third generation single molecule sequencers like Pacific Biosciences (PacBio) offer the possibility of overcoming the complexities of plant genomes. The average read length of PacBio reads using P6C4 chemistry is over 15 kb, with an average coverage of 30× required for a quality of Q50. Circular consensus sequencing (CCS) from PacBio uses short sequences (1–5 kb) that are circularized and sequenced multiple times to produce very high quality sequences, which can be used to distinguish highly similar repeat or paralogous structures. CCS is also useful for building high confidence, allele-specific, full-length transcripts with well annotated alternative splicing sites [36]. Given high enough read depth, all but the longest repeats (such as telomeres and centromeres) can be resolved, potentially ushering in a new era of platinum quality genome assemblies. A preliminary de novo assembly of Arabidopsis
(Figure 1 Legend) Distribution and characteristics of sequenced plant genomes. Phylogeny was adapted from https://genomevolution.org/wiki/index.php/Sequenced_plant_genomes and only representative crop, model and evolutionarily significant genomes are shown. Branch colors represent sequencing technologies used: Sanger only (brown), Sanger + 454/Illumina (blue), 454 + Illumina (red) and Illumina only (gold). Green and blue circles represent whole genome duplications and triplications respectively. Branch length does not correlate with divergence
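
The relationship between genome size, read length and coverage discussed in the section on sequencing challenges reduces to simple arithmetic. The following is a minimal sketch under stated assumptions: the function names are hypothetical, sampling is treated as ideal and uniform (coverage = reads × read length / genome size), and the pairing of 150 bp and 15 kb reads with 30× depth is illustrative rather than taken from any of the sequencing projects cited here.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Number of reads required to reach a target average depth.

    Assumes idealized, uniform sampling of the genome (an assumption;
    real libraries are biased and repeats are not uniquely placeable).
    """
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def expected_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth obtained from n_reads of a given length."""
    return n_reads * read_length_bp / genome_size_bp


# Genome sizes quoted in this review; read lengths and depth are illustrative.
genomes = {
    "Arabidopsis (125 Mb)": 125_000_000,
    "bread wheat (17 Gb)": 17_000_000_000,
}

for name, size_bp in genomes.items():
    short_reads = reads_needed(size_bp, read_length_bp=150, coverage=30)
    long_reads = reads_needed(size_bp, read_length_bp=15_000, coverage=30)
    print(f"{name}: {short_reads:,} reads of 150 bp or {long_reads:,} reads of 15 kb for 30x")
```

The sketch only counts bases. It says nothing about whether reads of a given length can span a repeat, which is the separate constraint the text raises for 10–20 kb repeats.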
Table 1
Published sequenced plant genomes. Over 100 plant genomes have been sequenced and published since 2000. The statistics for each
genome are taken from the publication, despite several model plants having significant updates to genome assemblies and gene counts.
NA, data not available in publication. Mb, megabases; kb, kilobases
Common name  Scientific name  Year  Phyla  Type  Size (Mb)  Gene (#)  Repeat (%)  Scaffold N50 (kb)  PMID
Arabidopsis Arabidopsis thaliana 2000 Dicot Model 125 25,498 14 NA 11130711
Rice Oryza sativa 2002 Monocot Crop 430 59,855 26 12 11935017
Rice Oryza sativa 2002 Monocot Crop 420 29,961 NA NA 11935018
Rice Oryza sativa 2005 Monocot Crop 403 37,544 26 NA 16100779
Black cottonwood Populus trichocarpa 2006 Dicot Crop 485 45,555 42 3100 16973872
Grape Vitis vinifera 2007 Dicot Crop 475 30,434 41 2065 17721507
Moss Physcomitrella patens 2008 Bryophyte Model 510 35,938 16 1320 18079367
Grape Vitis vinifera 2007 Dicot Crop 505 29,585 27 1330 18094749
Papaya Carica papaya 2008 Dicot Crop 372 28,629 43 1000 18432245
Lotus Lotus japonicus 2008 Dicot Non-model 472 30,799 56 NA 18511435
Sorghum Sorghum bicolor 2009 Monocot Crop 818 34,496 62 62,400 19189423
Cucumber Cucumis sativus 2009 Dicot Crop 367 26,682 24 1140 19881527
Corn Zea mays 2009 Monocot Crop 2300 32,540 85 76 19965430
Soybean Glycine max 2010 Dicot Crop 1115 46,430 57 47,800 20075913
Brachypodium Brachypodium distachyon 2010 Monocot Model 272 25,532 21 59,300 20148030
Castor bean Ricinus communis 2010 Dicot Crop 320 31,237 50 561 20729833
Apple Malus × domestica 2010 Dicot Crop 742 57,386 67 1542 20802477
Jatropha Jatropha curcas 2010 Dicot Crop 380 40,929 36 NA 21149391
Cocoa Theobroma cacao 2011 Dicot Crop 430 28,798 24 473 21186351
Strawberry Fragaria vesca 2011 Dicot Crop 240 34,809 23 1300 21186353
Lyrata Arabidopsis lyrata 2011 Dicot Model 207 32,670 30 24,500 21478890
Spikemoss Selaginella moellendorffii 2011 Lyco Non-model 110 22,285 38 1700 21551031
Date palm Phoenix dactylifera 2011 Monocot Crop 658 28,890 40 30 21623354
Potato Solanum tuberosum 2011 Dicot Crop 844 39,031 62 1318 21743474
Thellungiella Thellungiella parvula 2011 Dicot Model 140 30,419 8 5290 21822265
Cucumber Cucumis sativus 2011 Dicot Crop 367 26,587 NA 319 21829493
Chinese cabbage Brassica rapa 2011 Dicot Crop 485 41,174 40 1971 21873998
Hemp Cannabis sativa 2011 Dicot Crop 820 30,074 NA 16 22014239
Pigeon pea Cajanus cajan 2012 Dicot Crop 833 48,680 52 516 22057054
Medicago Medicago truncatula 2011 Dicot Model 454 62,388 31 1270 22089132
Setaria Setaria italica 2012 Monocot Model 490 38,801 46 1007 22580950
Setaria Setaria italica 2012 Monocot Model 510 35,471 40 47,300 22580951
Tomato Solanum lycopersicum 2012 Dicot Crop 900 34,727 63 16,467 22660326
Melon Cucumis melo 2012 Dicot Crop 450 27,427 NA 4680 22753475
Flax Linum usitatissimum 2012 Dicot Crop 373 43,484 24 132 22757964
Banana Musa acuminata malaccensis 2012 Monocot Crop 523 36,542 44 1311 22801500
Tobacco Nicotiana benthamiana 2012 Dicot Crop 3000 NA NA 89 22876960
Cotton D Gossypium raimondii 2012 Dicot Crop 880 40,976 60 2284 22922876
Neem Azadirachta indica 2012 Dicot Crop 364 20,169 13 452 22958331
Barley Hordeum vulgare 2012 Monocot Crop 5100 30,400 84 NA 23075845
Pear Pyrus bretschneideri 2013 Dicot Crop 527 42,812 53 541 23149293
Dwarf birch Betula nana 2012 Dicot Non-model 448 NA NA 19 23167599
Sweet orange Citrus sinensis 2013 Dicot Crop 367 29,445 21 1690 23179022
Watermelon Citrullus lanatus 2012 Dicot Crop 425 23,440 45 2380 23179023
Wheat Triticum aestivum 2012 Monocot Crop 17,000 94,000 80 NA 23192148
Cotton D Gossypium raimondii 2012 Dicot Crop 880 37,505 61 18,800 23257886
Chinese plum Prunus mume 2012 Dicot Crop 280 31,390 45 578 23271652
Chickpea Cicer arietinum 2013 Dicot Crop 738 28,269 49 39,990 23354103
Rubber tree Hevea brasiliensis 2013 Dicot Crop 2150 68,955 72 3 23375136
Moso bamboo Phyllostachys heterocycla 2013 Monocot Non-model 2075 31,987 59 329 23435089
Rice relative Oryza brachyantha 2013 Monocot Wild-relative 300 32,038 29 1013 23481403
Eutrema salsugineum Eutrema salsugineum 2013 Dicot Non-model 243 26,351 51 13,400 23518688
Peach Prunus persica 2013 Dicot Crop 265 27,852 37 27,400 23525075
Wheat DD Aegilops tauschii 2013 Monocot Crop 4360 43,150 66 58 23535592
Wheat AA Triticum urartu 2013 Monocot Crop 4940 34,879 67 64 23535596
Sacred lotus Nelumbo nucifera 2013 Dicot Non-model 929 26,685 57 3400 23663246
Bladderwort Utricularia gibba 2013 Dicot Non-model 77 28,500 3 95 23665961
Norway spruce Picea abies 2013 Gymnosperm Crop 19,600 28,354 70 NA 23698360
White spruce Picea glauca 2013 Gymnosperm Crop 20,000 NA NA 20 23698863
C. grandiflora C. grandiflora 2013 Dicot Non-model 200 NA NA 98 23749190
Neslia paniculata Neslia paniculata 2013 Dicot Non-model NA NA NA 62 23749190
Capsella Capsella rubella 2013 Dicot Non-model 219 26,521 NA 15,100 23749190
Tobacco Nicotiana sylvestris 2013 Dicot Crop 2636 38,940 72 80 23773524
Tobacco Nicotiana tomentosiformis 2013 Dicot Crop 2360 38,648 75 83 23773524
Brassicaceae Leavenworthia alabamica 2013 Dicot Non-model 316 30,343 27 70 23817568
Brassicaceae Sisymbrium irio 2013 Dicot Non-model 262 28,917 38 135 23817568
Brassicaceae Aethionema arabicum 2013 Dicot Non-model 240 23,167 37 118 23817568
Genlisea Genlisea aurea 2013 Dicot Non-model 64 17,755 NA 6 23855885
Oil palm Elaeis guineensis 2013 Monocot Crop 1824 34,802 18 1045 23883927
Mulberry Morus notabilis 2013 Dicot Non-model 357 29,338 47 390 24048436
Kiwifruit Actinidia chinensis 2013 Dicot Crop 758 39,040 36 646 24136039
Poplar, wild Populus euphratica 2013 Dicot Wild-relative 593 34,279 44 482 24256998
Amborella Amborella trichopoda 2013 Dicot Non-model 748 27,313 57 4900 24357323
Greater duckweed Spirodela polyrhiza 2013 Monocot Crop 158 19,623 13 4924 24548928
Pepper Capsicum annuum 2014 Dicot Crop 3260 35,336 80 1226 24591624
Pepper, wild Capsicum annuum 2014 Dicot Wild-relative 3070 34,476 81 445 24591625
Loblolly pine Pinus taeda 2014 Gymnosperm Crop 23,200 50,172 82 67 24647006
Camelina Camelina sativa 2014 Dicot Crop 785 89,418 28 2160 24759634
Tobacco Nicotiana tabacum 2014 Dicot Crop 4600 91,870 73 345 24807620
Tobacco Nicotiana tabacum 2014 Dicot Crop 4410 81,404 79 385 24807620
Tobacco Nicotiana tabacum 2014 Dicot Crop 4570 93,650 73 350 24807620
Tobacco Nicotiana otophora 2014 Dicot Crop 2700 NA NA 27 24807620
Cotton A Gossypium arboreum 2014 Dicot Crop 1724 41,330 69 666 24836287
Brassica Brassica oleracea 2014 Dicot Crop 630 45,758 38 1457 24852848
Wild radish Raphanus raphanistrum 2014 Dicot Crop 515 38,174 NA 10 24876251
Common bean Phaseolus vulgaris 2014 Dicot Crop 587 27,197 45 50,000 24908249
Sweet orange Citrus sinensis 2014 Dicot Crop 367 25,379 31 250 24908277
Clementine Citrus clementina 2014 Dicot Crop 367 24,533 45 6800 24908277
Eucalyptus Eucalyptus grandis 2014 Dicot Crop 640 36,376 50 53,900 24919147
Willow Salix suchowensis 2014 Dicot Non-model 425 26,599 40 924 24980958
Soybean, wild Glycine max 2014 Dicot Wild-relative 1165 52,395 43 401 25004933
Tef Eragrostis tef 2014 Monocot Crop 772 38,000 14 66 25007843
Wheat Triticum aestivum 2014 Monocot Crop 17,000 124,201 NA NA 25035500
Rice relative Oryza glaberrima 2014 Monocot Wild-relative 316 33,164 34 217 25064006
Tomato, wild Solanum pennellii 2014 Dicot Wild-relative 1200 32,273 82 1741 25064008
Canola Brassica napus 2014 Dicot Crop 1130 101,040 35 763 25146293
Coffee Coffea canephora 2014 Dicot Crop 710 25,574 50 1260 25190796
Soybean, wild Glycine soja 2014 Dicot Wild-relative 981 55,061 NA 18 25218520
Soybean, wild Glycine soja 2014 Dicot Wild-relative 1001 54,256 NA 57 25218520
Soybean, wild Glycine soja 2014 Dicot Wild-relative 1054 56,542 NA 17 25218520
Soybean, wild Glycine soja 2014 Dicot Wild-relative 1118 57,631 NA 49 25218520
Soybean, wild Glycine soja 2014 Dicot Wild-relative 956 55,901 NA 65 25218520
Soybean, wild Glycine soja 2014 Dicot Wild-relative 993 54,805 NA 52 25218520
Soybean, wild Glycine soja 2014 Dicot Wild-relative 889 54,797 NA 45 25218520
Eggplant Solanum melongena 2014 Dicot Crop 1126 85,446 70 64 25233906
Cassava Manihot esculenta 2014 Dicot Wild-relative 742 34,483 37 67 25300236
Cassava Manihot esculenta 2014 Dicot Wild-relative 742 38,845 26 27 25300236
Jujube Ziziphus jujuba 2014 Dicot Crop 444 32,808 47 301 25350882
Blueberry Vaccinium corymbosum 2014 Dicot Crop 600 60,000 NA 145 NA
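
Table 1 reports a scaffold N50 for each assembly. As a reminder of what that statistic measures, here is a minimal sketch of the usual definition: the length L such that scaffolds of length L or longer contain at least half of the assembled bases. The function name and the example lengths are hypothetical and are not drawn from any genome in the table.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches half of the assembly size; the length of
    the sequence that crosses that point is the N50.
    """
    if not lengths:
        raise ValueError("n50() requires at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in kb (illustrative only).
scaffolds_kb = [80, 70, 50, 40, 30, 20]
print(n50(scaffolds_kb))  # 80 + 70 = 150 >= 145, so the N50 is 70
```

Because N50 depends only on the length distribution, it says nothing about correctness: misjoined scaffolds raise N50 just as genuine contiguity does.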
thaliana accession Ler-0 using P4C3 chemistry produced a contig N50 of 6.36 Mb, similar to the quality of the TAIR10 release (PacBio website; URL: http://blog.pacificbiosciences.com/2013/08/new-data-release-arabidopsis-assembly.html).

Figure 2
[Schematic with four panels (a) to (d): whole genome shotgun reads from a highly heterozygous or polyploid species; a double haploid or diploid species; long third generation PacBio reads, with short PacBio or WGS reads for error correction; and a minimum tiling path of BACs.]
Strategies for sequencing complex crop genomes. A and B represent subgenomes in a polyploid or homologous chromosomes in a highly heterozygous species, grey lines connect nearly identical regions and light blue and red represent diverse regions. (a) A typical WGS strategy yields a highly fragmented genome as similar regions assemble together and diverse regions assemble separately, creating ‘bubbles’ in the assembly graph. (b) A double haploid (for heterozygous species) or diploid relative/progenitor (for polyploid species) can be used to simplify the assembly. Reads from only one haplotype (in this case A) assemble without ambiguities. (c) Long reads from third generation single molecule sequencers like PacBio can be used to assemble both haplotypes separately, resulting in two complete subgenomes. Short PacBio or Illumina WGS reads are used to correct the long reads prior to assembly. (d) A more traditional and expensive BAC by BAC approach can be used, where BACs from a minimum tiling path are sequenced separately and then stitched together to create a chimeric assembly of both haplotypes. WGS reads can be mapped to the chimera to sequence the second haplotype.

NGS technologies kick-start specialty crops genomics
Specialty crops, which include most fruits and vegetables, nut trees, and beverage crops, have had limited genomic resources until recently because of the high cost of early Sanger sequencing and their low production values compared to row crops. Papaya is an exception to this, as it was the fifth plant genome and first specialty crop to be sequenced, with roughly 3× coverage of Sanger reads and BAC end sequences for scaffolding. The papaya draft genome showed the stability of transgenic insertions and served as a powerful model for early comparative genomics work because it lacks a lineage-specific whole genome duplication (WGD) [37]. The introduction of NGS technologies facilitated sequencing several small specialty crop genomes including cucumber [38], apple [39], strawberry [28], cacao [40], date palm [41], and watermelon [42]. Most of the recently published crop genomes are specialty crops (see Table 1). Here we focus on specialty crops with the most economic value, and row crops with novel sequencing strategies that can be applied to other species.

Tomato seed is worth its weight in gold. It is the leading vegetable crop with a rich and diverse breeding program, and serves as a model system for fruit development. The high quality Sanger-based genome uncovered a whole genome triplication event which facilitated neo-functionalization of genes related to fruit quality and development [43]. Comprehensive resequencing of 360 diverse tomato accessions showed that two independent sets of quantitative trait loci (QTLs) are responsible for the 100-fold increase in fruit size during tomato domestication [44]. In addition to tomato and potato, other members of the Solanaceae have also been recently sequenced, including a domesticated pepper, a wild relative and 20 resequenced pepper accessions [45] and the eggplant genome [46].

The citrus complex, which is an admixture of hybrids that includes oranges, grapefruit, lemons and limes, is the highest value fruit crop across the world. Reference citrus genomes for sweet orange [30] and Clementine mandarin [29] were sequenced using double-haploid lines to eliminate the within-genome heterozygosity that results from clonal propagation and interspecific hybridization. Resequencing efforts showed cultivated citrus are derived from two progenitor species and sweet orange has a complex pedigree; one parent is a shared ancestor of mandarin and the other is likely a pummelo with introgressions of wild mandarin [29,30].

Banana is a key starch staple in Africa and Asia with consumption up to 400 kg/person/year, and is the second most cultivated tropical fruit behind the citrus complex. Banana was the first non-grass monocot with a published high contiguity reference genome. Most global banana production stems from somaclones derived from a single triploid line ‘Cavendish’, and genomic resources are essential for improved disease resistance and yield. A double-haploid banana was used to overcome issues
associated with triploidy [31]. Banana has three WGD events independent of the grasses, but surprisingly few NBS-LRR disease resistance genes (89), which may contribute to its disease susceptibility [31].

Over 2.25 billion cups of coffee are consumed each day, making coffee the world’s leading beverage crop. Most coffee comes from Coffea arabica, a highly heterozygous, outcrossing allotetraploid. The coffee reference genome was generated using a double haploid accession of C. canephora, one of the diploid parents of C. arabica and the source of ‘robusta’ coffee [26]. Coffee has tandem duplications of N-methyltransferases (NMTs) that contribute to caffeine production. Comparisons of NMTs from tea and cacao suggest caffeine biosynthesis has polyphyletic origins and evolved at least twice.

Sub-genome assisted sequencing of complex polyploid row crops
42% of human energy supply comes from cereal row crops such as rice, wheat, and maize [47], while other row crops such as cotton, soybean and canola play major roles in nutrition and clothing. The first row crop genomes, rice, maize and sorghum, were sequenced using Sanger WGS and BAC by BAC approaches [7,8,10,48]. Other cereals have been sequenced recently, including barley [49], the model C3 grass B. distachyon [3] and the model C4 grass S. italica [5]. Row crops have a propensity for polyploidy, which likely contributes to their improved nutritional content and high yields. Polyploidy also confers emergent properties like seed oil accumulation in canola, spinnable fibers in cotton, and grain composition in wheat. The presence of multiple subgenomes complicates genome assembly of polyploids, but a powerful approach of sequencing the likely subgenome diploid contributor has accelerated our understanding of these complex genomes.

Brassica napus (canola or rapeseed), the third largest source of vegetable oil, is an allotetraploid of B. rapa (turnips and napa cabbage) and B. oleracea (cabbage, broccoli, cauliflower, kale and other cruciferous vegetables), which occurred 7500–12,500 years ago. Most of the 20,000 Illumina and 454 based scaffolds in B. napus were assigned to either the A or C subgenome using 454 reads from each progenitor parent, providing for unprecedented comparisons of homeologous regions in a polyploid species. Despite the young age of rapeseed, around 100 genes have been lost from each subgenome and subtle changes in epigenetic regulation, homeologous exchange and gene expression divergence have occurred. Repeated whole genome duplications in rapeseed have created a 72-fold duplication of the ancestral flowering plant genome, and unique expansions in oil biosynthesis genes and loss of glucosinolate genes were observed [50].

Bread wheat (Triticum aestivum) is a staple food for 30% of the world’s population, but until recently genomic resources were limited because of its large (17 Gb), hexaploid (2n = 6x = 42), repeat rich (87%) genome [51]. To overcome issues with genome complexity, flow cytometry was used with aneuploid wheat lines to isolate and sequence each chromosome arm separately, with each arm representing 1.3–3.3% of the genome. The final assembly is highly fragmented, spanning 10.2 Gb with 124,201 genes distributed unevenly across the three subgenomes. The subgenomes of wheat have limited gene loss or rearrangements, contrasting with the dynamic shuffling and loss in the much younger polyploidy events in B. napus. This suggests plasticity in the events post WGD; not all polyploidy events are alike. There is no global genome dominance among the wheat subgenomes, but there is cell and stage dependent dominance, including gene families related to baking quality [52].

Tetraploid cotton (AADD, Gossypium hirsutum) has higher fiber production and quality than diploid cotton (G. barbadense), an emergent property with major QTLs in the D subgenome from G. raimondii, which has no spinnable fibers [53]. Diploid cotton (G. raimondii) has an abrupt 5–6 fold ploidy increase after splitting from the cacao lineage, rivaled only by the Brassicaceae. Tetraploid cotton has numerous non-reciprocal DNA exchanges between the A and D subgenomes, and coordinated gene expression changes including nuclear mitochondrial genes involved in electron transport, which likely contribute to fiber production [54].

Tree crop genomes
Trees are long-lived perennials that are valued for timber, fuel and other products. Extraordinary progress has been made on economically important tree crops starting with the publication of P. trichocarpa (poplar [9]) in 2006, which made it the third published plant genome. Another fast growing and economically important tree, Eucalyptus grandis, was published more recently [55]. The Eucalyptus genome revealed an expansion of terpene synthesis genes associated with defense as well as the largest number of tandem repeats of any sequenced genome to date. Three gymnosperm genomes, Pinus taeda (loblolly pine [13]), Picea glauca (white spruce [56]), and Picea abies (Norway spruce [57]), also have been sequenced recently. In addition to being the first three gymnosperm genomes, they are also the largest genomes sequenced to date. Since the generation time in most trees is long compared to row crops, draft genomes enable technologies like genomic selection (GS), which models superior genotypes using genome-wide markers and limited phenotyping, reducing time-consuming phenotypic selection and breeding cycles [58].

Orphan crops
Orphan crops have limited improvement from their wild relatives, unrecognized nutritional value, disease susceptibility, poor shelf life and growth constraints. However,
orphan crops like pigeon pea (Cajanus cajan), cassava and tef (Eragrostis tef) are major staples in underdeveloped regions and genomic resources are essential to boost production. The draft genome of pigeon pea provided over 300,000 SSR markers for plant breeding and screening of the 13,632 accessions maintained in the ICRISAT genebank [59]. Pigeon pea has a large repertoire of universal drought response proteins, which may contribute to its drought tolerance. Draft genomes for both wild and cultivated cassava are available as well as a repertoire of SNPs
Figure 3
[Figure panels: (a) abiotic stress treatments at 100%, 75%, 50% and 25% soil capacity and no water, with photosynthesis measurements and electrolyte leakage assays; (b), (c) ENCODE analyses of nucleosomes, hypersensitive sites, TFs, CH3 marks, long-range regulatory elements, cis-regulatory elements (promoters, TF binding sites), genes, exons and transcripts; (d) data tracks for H3K4me3 ChIP-seq (methylation), ChIP-seq (protein/histone binding), DNase-seq (hypersensitive sites), mRNAseq (transcript abundance) and DNAseq (resequencing; SNP, deletion).]
The next frontier of plant genomics: Overview of the Brachypodium plant ENCODE project. (a) Plants (in this case Brachypodium) are subjected to various abiotic stresses and responses are measured using high-throughput phenotyping and physiology. Material from stressed and control plants is used to generate the ENCODE datasets in panel d. (b) Gene co-regulation network produced from ENCODE data. (c) Genetic elements that are targeted in an ENCODE project. (d) Example datasets from an ENCODE project. Peaks reflect a high depth of Illumina reads corresponding to highly methylated regions (H3K4me3 ChIP-seq), regions with TF or histone binding (ChIP-seq), open chromatin regions (DNase-seq), transcribed regions (mRNAseq) and polymorphisms/InDels between different lines (DNAseq). Taken together, ENCODE datasets can paint a near complete picture of epigenetic regulation under a given stress.
for breeding. Cassava genes involved in starch accumulation, photosynthesis and abiotic stresses have been positively selected during domestication and genes involved in toxic cyanogenic glucoside formation have been purged [60]. A draft genome of tef identified novel SSR loci for marker assisted breeding and provides a framework for identifying genes related to abiotic stresses and nutrition [61].

Conclusion: beyond the assembled genome and the future of crop genomics
Though reference genomes are now available for many crops, only diploid/haploid references are available for polyploid crops like potato, coffee, strawberry and banana, and most highly heterozygous genomes have only one sequenced haplotype. Some of the elements contributing to agronomic traits, like gene duplications, genome rearrangements and repeat integrations, are cultivar specific and cannot be found using resequencing strategies or reduced complexity references. High quality references of each subgenome (in polyploids) and each haplotype (in heterozygous crops), as well as multiple references per crop species, are needed to survey true variation.

Many crop genomes, especially those of vegetables, have gone through a very tight breeding bottleneck to arrive at our table. The impact is that a great deal of diversity is lost in current breeding germplasm, leading to slowed improvement and potential for loss of disease resistance. There is a pressing need to develop genomic resources for crop wild relatives so that they can be used in breeding, allele identification and introgression [62]. Draft genomes from wild relatives of tomato (Solanum pennellii), soybean (Glycine soja) [63–65] and cassava [60] are currently available, but more wild species are needed for crop improvement programs.

An assembled reference genome sequence is simply a foundation; the true challenge is to identify the features of the genome that describe the biology. Although every cell has essentially the same DNA sequence, epigenetic decorations and gene expression vary greatly by cell based on the environment, developmental stage and tissue type. The next phase of crop genomics will be to completely elucidate these biologically active states of DNA, as has been done for other model systems such as human, mouse, Drosophila and C. elegans, in a plant ENCODE (Encyclopedia of DNA Elements) project [66]. Such studies can take on several forms, such as has been done for other model systems where integrated maps of DNA methylation, small RNA, histone modification and transcript abundance are measured across multiple tissues and conditions (Figure 3). Brachypodium has the first formal plant ENCODE project funded by DOE/USDA, where not only will detailed molecular maps of epigenetic modifications and expression be generated, but phenotyping will also be leveraged to understand the networks active under drought conditions (Figure 3, DOE award abstract; URL: http://genomicscience.energy.gov/research/DOEUSDA/abstracts/2014mockler_abstract.shtml). Once we have detailed genome maps, the ability to edit specific sequences, such as has been done in rice and wheat [67], will enable the next generation of crop domestication.

The amount of available genomic data for crop plants is staggering, and thousands of Gb of plant sequences are deposited in NCBI and other public databases monthly. As a community, we are about to have resources we could only dream of. How will we use them to meet the challenge of feeding 9 billion people by 2050? New tools for analyzing these high-throughput datasets are desperately needed, and training of young scientists should shift toward computer science and engineering in order to prosper in the changing face of biological research. How will we build the plant genomicists of the future who only know science with big data, whole genome analysis and full information access?

Acknowledgements
This work was supported in part by funding from the National Science Foundation (DBI-1401572) to R.V., and DARPA (HR0011-13-C-0103) to T.P.M.

References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
of special interest
of outstanding interest

1. Initiative AG: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408:796.
2. Michael TP, Jackson S: The first 50 plant genomes. Plant Genome 2013:6.
3. Vogel JP, Garvin DF, Mockler TC, Schmutz J, Rokhsar D, Bevan MW, Barry K, Lucas S, Harmon-Smith M, Lail K: Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 2010, 463:763-768.
4. Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud P-F, Lindquist EA, Kamisugi Y: The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 2008, 319:64-69.
5. Bennetzen JL, Schmutz J, Wang H, Percifield R, Hawkins J, Pontaroli AC, Estep M, Feng L, Vaughn JN, Grimwood J: Reference genome sequence of the model plant Setaria. Nat Biotechnol 2012, 30:555-561.
6. Zhang G, Liu X, Quan Z, Cheng S, Xu X, Pan S, Xie M, Zeng P, Yue Z, Wang W: Genome sequence of foxtail millet (Setaria italica) provides insights into grass evolution and biofuel potential. Nat Biotechnol 2012, 30:549-554.
7. Goff SA, Ricke D, Lan T-H, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H: A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 2002, 296:92-100.
8. Yu J, Hu S, Wang J, Wong GK-S, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X: A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 2002, 296:79-92.
9. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A: The
genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313:1596-1604.
10. Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA: The B73 maize genome: complexity, diversity, and dynamics. Science 2009, 326:1112-1115.
11. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J: Genome sequence of the palaeopolyploid soybean. Nature 2010, 463:178-183.
12. Consortium TG: The tomato genome sequence provides insights into fleshy fruit evolution. Nature 2012, 485:635-641.
13. Zimin A, Stevens KA, Crepeau MW, Holtz-Morris A, Koriabine M, Marçais G, Puiu D, Roberts M, Wegrzyn JL, de Jong PJ: Sequencing and assembly of the 22-Gb loblolly pine genome. Genetics 2014, 196:875-890.
    The authors use a novel reduced complexity sequencing strategy to assemble the 22 Gb loblolly pine genome, the largest genome sequenced to date.
14. Bennetzen JL, Kellogg EA: Do plants have a one-way ticket to genomic obesity? Plant Cell 1997, 9:1509.
15. Fleischmann A, Michael TP, Rivadavia F, Sousa A, Wang W, Temsch EM, Greilhuber J, Müller KF, Heubl G: Evolution of genome size and chromosome number in the carnivorous plant genus Genlisea (Lentibulariaceae), with a new estimate of the minimum genome size in angiosperms. Annals Bot 2014. mcu189.
16. Leushkin EV, Sutormin RA, Nabieva ER, Penin AA, Kondrashov AS, Logacheva MD: The miniature genome of a carnivorous plant Genlisea aurea contains a low number of genes and short non-coding sequences. BMC Genomics 2013, 14:476.
17. Ibarra-Laclette E, Lyons E, Hernández-Guzmán G, Pérez-Torres CA, Carretero-Paulet L, Chang T-H, Lan T, Welch AJ, Juárez MJA, Simpson J: Architecture and evolution of a minute plant genome. Nature 2013, 498:94-98.
    The compact bladderwort genome provides evidence that almost all intragenic space and repeat sequences can be purged.
18. Wang W, Haberer G, Gundlach H, Gläßer C, Nussbaumer T, Luo M, Lomsadze A, Borodovsky M, Kerstetter R, Shanklin J: The Spirodela polyrhiza genome reveals insights into its neotenous reduction fast growth and aquatic lifestyle. Nat Commun 2014:5.
19. Banks JA, Nishiyama T, Hasebe M, Bowman JL, Gribskov M, Albert VA, Aono N, Aoyama T, Ambrose BA, Ashton NW: The Selaginella genome identifies genetic changes associated with the evolution of vascular plants. Science 2011, 332:960-963.
20. Albert VA, Barbazuk WB, Der JP, Leebens-Mack J, Ma H, Palmer JD, Rounsley S, Sankoff D, Schuster SC, Soltis DE: The Amborella genome and the evolution of flowering plants. Science 2013, 342:1241089.
    Amborella is the most basal flowering plant and serves as a powerful reference for comparative genomics. The Amborella genome provided evidence for an ancient WGD that predated all flowering plants.
21. Metzker ML: Sequencing technologies - the next generation. Nat Rev Genet 2009, 11:31-46.
22. Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S: High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci 2011, 108:1513-1518.
23. Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M: The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res 2012, 40:D1202-D1210.
24. Michael TP: Plant genome size variation: bloating and purging DNA. Briefings Funct Genomics 2014. elu005.
25. Schatz MC, Maron LG, Stein JC, Wences AH, Gurtowski J, Biggers E, Lee H, Kramer M, Antoniou E, Ghiban E: Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica. Genome Biol 2014, 15:506.
    The authors produce de novo sequences of three diverse rice strains and uncover several Mb of sequences unique to each strain that could not have been identified using a standard resequencing approach.
26. Denoeud F, Carretero-Paulet L, Dereeper A, Droc G, Guyot R, Pietrella M, Zheng C, Alberti A, Anthony F, Aprea G: The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. Science 2014, 345:1181-1184.
27. Ling H-Q, Zhao S, Liu D, Wang J, Sun H, Zhang C, Fan H, Li D, Dong L, Tao Y: Draft genome of the wheat A-genome progenitor Triticum urartu. Nature 2013, 496:87-90.
28. Shulaev V, Sargent DJ, Crowhurst RN, Mockler TC, Folkerts O, Delcher AL, Jaiswal P, Mockaitis K, Liston A, Mane SP: The genome of woodland strawberry (Fragaria vesca). Nat Genet 2011, 43:109-116.
29. Wu GA, Prochnik S, Jenkins J, Salse J, Hellsten U, Murat F, Perrier X, Ruiz M, Scalabrin S, Terol J: Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication. Nat Biotechnol 2014, 32:656-662.
30. Xu Q, Chen L-L, Ruan X, Chen D, Zhu A, Chen C, Bertrand D, Jiao W-B, Hao B-H, Lyon MP: The draft genome of sweet orange (Citrus sinensis). Nat Genetics 2013, 45:59-66.
31. D’Hont A, Denoeud F, Aury J-M, Baurens F-C, Carreel F, Garsmeur O, Noel B, Bocs S, Droc G, Rouard M: The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 2012, 488:213-217.
32. Verde I, Abbott AG, Scalabrin S, Jung S, Shu S, Marroni F, Zhebentyayeva T, Dettori MT, Grimwood J, Cattonaro F: The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genetics 2013, 45:487-494.
33. Wu J, Wang Z, Shi Z, Zhang S, Ming R, Zhu S, Khan MA, Tao S, Korban SS, Wang H: The genome of the pear (Pyrus bretschneideri Rehd.). Genome Res 2013, 23:396-408.
34. Shaver JM, Oldenburg DJ, Bendich AJ: Changes in chloroplast DNA during development in tobacco, Medicago truncatula, pea, and maize. Planta 2006, 224:72-82.
35. Lutz KA, Wang W, Zdepski A, Michael TP: Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing. BMC Biotechnol 2011, 11:54.
36. Tilgner H, Grubert F, Sharon D, Snyder MP: Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc Natl Acad Sci 2014, 111:9869-9874.
    The authors use single-molecule long-reads to assemble a haplotype specific transcriptome rich with gene isoforms which has broad applications for crop genomics.
37. Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, Senin P, Wang W, Ly BV, Lewis KL: The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 2008, 452:991-996.
38. Huang S, Li R, Zhang Z, Li L, Gu X, Fan W, Lucas WJ, Wang X, Xie B, Ni P: The genome of the cucumber, Cucumis sativus L. Nat Genetics 2009, 41:1275-1281.
39. Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, Fontana P, Bhatnagar SK, Troggio M, Pruss D: The genome of the domesticated apple (Malus × domestica Borkh.). Nat Genetics 2010, 42:833-839.
40. Argout X, Salse J, Aury J-M, Guiltinan MJ, Droc G, Gouzy J, Allegre M, Chaparro C, Legavre T, Maximova SN: The genome of Theobroma cacao. Nat Genetics 2011, 43:101-108.
41. Al-Dous EK, George B, Al-Mahmoud ME, Al-Jaber MY, Wang H, Salameh YM, Al-Azwani EK, Chaluvadi S, Pontaroli AC, DeBarry J: De novo genome sequencing and comparative genomics of date palm (Phoenix dactylifera). Nat Biotechnol 2011, 29:521-527.
42. Guo S, Zhang J, Sun H, Salse J, Lucas WJ, Zhang H, Zheng Y, Mao L, Ren Y, Wang Z: The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nat Genetics 2013, 45:51-58.
43. Consortium PGS: Genome sequence and analysis of the tuber crop potato. Nature 2011, 475:189-195.
44. Lin T, Zhu G, Zhang J, Xu X, Yu Q, Zheng Z, Zhang Z, Lun Y, Li S, Wang X: Genomic analyses provide insights into the history of tomato breeding. Nat Genetics 2014, 46:1220-1226.
    The authors resequence 360 wild and cultivated tomato accessions and uncover two independent sets of QTLs that increased fruit size 100x compared to wild tomatoes.
45. Qin C, Yu C, Shen Y, Fang X, Chen L, Min J, Cheng J, Zhao S, Xu M, Luo Y: Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. Proc Natl Acad Sci 2014, 111:5135-5140.
46. Hirakawa H, Shirasawa K, Miyatake K, Nunome T, Negoro S, Ohyama A, Yamaguchi H, Sato S, Isobe S, Tabata S: Draft genome sequence of eggplant (Solanum melongena L.): the representative solanum species indigenous to the old world. DNA Res 2014. dsu027.
47. Elert E: Rice by the numbers: a good grain. Nature 2014, 514:S50-S51.
48. Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A: The Sorghum bicolor genome and the diversification of grasses. Nature 2009, 457:551-556.
49. Consortium IBGS: A physical, genetic and functional sequence assembly of the barley genome. Nature 2012, 491:711-716.
50. Chalhoub B, Denoeud F, Liu S, Parkin IA, Tang H, Wang X, Chiquet J, Belcram H, Tong C, Samans B: Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science 2014, 345:950-953.
    The authors used sequences from the diploid parental species to separate scaffolds by subgenome in the allopolyploid Brassica napus genome, providing an unprecedented look at the early events of polyploidy.
51. Mayer KF, Rogers J, Doležel J, Pozniak C, Eversole K, Feuillet C, Gill B, Friebe B, Lukaszewski AJ, Sourdille P: A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 2014, 345:1251788.
    The authors use flow cytometry to separate individual chromosomes for sequencing the hexaploid wheat genome unveiling emergent properties related to grain composition.
52. Pfeifer M, Kugler KG, Sandve SR, Zhan B, Rudi H, Hvidsten TR, Mayer KF, Olsen O-A: Genome interplay in the grain transcriptome of hexaploid bread wheat. Science 2014, 345:1250091.
53. Jiang C-X, Wright RJ, El-Zik KM, Paterson AH: Polyploid formation created unique avenues for response to selection in Gossypium (cotton). Proc Natl Acad Sci 1998, 95:4419-4424.
54. Paterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin D, Llewellyn D, Showmaker KC, Shu S, Udall J: Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 2012, 492:423-427.
55. Myburg AA, Grattapaglia D, Tuskan GA, Hellsten U, Hayes RD, Grimwood J, Jenkins J, Lindquist E, Tice H, Bauer D: The genome of Eucalyptus grandis. Nature 2014, 510:356-362.
56. Birol I, Raymond A, Jackman SD, Pleasance S, Coope R, Taylor GA, Saint Yuen MM, Keeling CI, Brand D, Vandervalk BP: Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics 2013. btt178.
57. Nystedt B, Street NR, Wetterbom A, Zuccolo A, Lin Y-C, Scofield DG, Vezzi F, Delhomme N, Giacomello S, Alexeyenko A: The Norway spruce genome sequence and conifer genome evolution. Nature 2013, 497:579-584.
58. Resende MF, Muñoz P, Resende MD, Garrick DJ, Fernando RL, Davis JM, Jokela EJ, Martin TA, Peter GF, Kirst M: Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda L.). Genetics 2012, 190:1503-1510.
59. Varshney RK, Chen W, Li Y, Bharti AK, Saxena RK, Schlueter JA, Donoghue MT, Azam S, Fan G, Whaley AM: Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat Biotechnol 2012, 30:83-89.
60. Wang W, Feng B, Xiao J, Xia Z, Zhou X, Li P, Zhang W, Wang Y, Møller BL, Zhang P: Cassava genome from a wild ancestor to cultivated varieties. Nat Commun 2014:5.
61. Cannarozzi G, Plaza-Wüthrich S, Esfeld K, Larti S, Wilson YS, Girma D, de Castro E, Chanyalew S, Blösch R, Farinelli L: Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef). BMC Genomics 2014, 15:581.
62. McCouch S, Baute GJ, Bradeen J, Bramel P, Bretting PK, Buckler E, Burke JM, Charest D, Cloutier S, Cole G: Agriculture: feeding the future. Nature 2013, 499:23-24.
63. Kim MY, Lee S, Van K, Kim T-H, Jeong S-C, Choi I-Y, Kim D-S, Lee Y-S, Park D, Ma J: Whole-genome sequencing and intensive analysis of the undomesticated soybean (Glycine soja Sieb. and Zucc.) genome. Proc Natl Acad Sci 2010, 107:22032-22037.
64. Li Y, Zhao S, Ma J, Li D, Yan L, Li J, Qi X-t, Guo X-s, Zhang L, He W-m: Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing. BMC Genomics 2013, 14:579.
65. Li Y-h, Zhou G, Ma J, Jiang W, Jin L-g, Zhang Z, Guo Y, Zhang J, Sui Y, Zheng L: De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nat Biotechnol 2014, 32:1045-1052.
66. Lane AK, Niederhuth CE, Ji L, Schmitz RJ: pENCODE: a plant encyclopedia of DNA elements. Annu Rev Genet 2014:48.
    This review provides a detailed explanation of the methods and utility of ENCODE projects and the future of crop genomics.
67. Shan Q, Wang Y, Li J, Gao C: Genome editing in rice and wheat using the CRISPR/Cas system. Nat Protocols 2014, 9:2395-2410.