Available online at www.sciencedirect.com

ScienceDirect

Progress, challenges and the future of crop genomes

Todd P Michael¹ and Robert VanBuren²

The availability of reference genomes has ushered in a new era of crop genomics. More than 100 plant genomes have been sequenced since 2000, 63% of which are crop species. These genome sequences provide insight into architecture, evolution and novel aspects of crop genomes such as the retention of key agronomic traits after whole genome duplication events. Some crops have very large, polyploid, repeat-rich genomes, which require innovative strategies for sequencing, assembly and analysis. Even low quality reference genomes have the potential to improve crop germplasm through genome-wide molecular markers, which decrease expensive phenotyping and breeding cycles. The next stage of plant genomics will require draft genome refinement, building resources for crop wild relatives, resequencing broad diversity panels, and plant ENCODE projects to better understand the complexities of these highly diverse genomes.

Addresses

¹ Ibis Biosciences, Carlsbad, CA, United States

² Donald Danforth Plant Science Center, St. Louis, MO, United States

Corresponding author: VanBuren, Robert ([email protected])

Current Opinion in Plant Biology 2015, 24:71–81

This review comes from a themed issue on Genome studies and molecular genetics

Edited by Insuk Lee and Todd C Mockler

For a complete overview see the Issue and the Editorial

Available online 19th February 2015

http://dx.doi.org/10.1016/j.pbi.2015.02.002

1369-5266/© 2015 Elsevier Ltd. All rights reserved.

Introduction

After the release of the Arabidopsis genome in 2000 [1] and the advent of Next Generation Sequencing (NGS) technology in 2005, the number of sequenced plant genomes has rapidly increased to more than 100 ([2], List of sequenced plant genomes; URL: https://genomevolution.org/wiki/index.php/Sequenced_plant_genomes). Nearly two-thirds (63%) of the sequenced plant genomes are from crops, while model, non-model and crop wild relatives make up the remainder; three-fourths (76%) of the sequenced plant genomes are from dicots and one-fifth (19%) are from monocots. Few genomes from non-flowering plants have been published thus far, with only three from the Gymnospermae, one from the Bryophyta and one from the Lycopodiophyta (Figure 1, Table 1).

The high throughput and low cost of NGS technologies have made it possible to sequence crops with lower economic value or large genomes and have paved the way for establishing new model species. The complexity and size of some crop genomes made traditional Sanger sequencing cost prohibitive. The wheat genome, for instance, is hexaploid, 90% repetitive, and 17 gigabases (Gb), and the sugarcane genome ranges in ploidy up to decaploid, and its 12 Gb is 80% repetitive. Although sequencing capacity and computational power are increasing exponentially, numerous challenges still remain, and both novel methodologies and legacy techniques are important to crack these 'impossible' genomes.

Model plant genomes such as Arabidopsis [1], Brachypodium distachyon [3], Physcomitrella patens (moss [4]) and Setaria italica [5,6] serve as an engine for research, while others like Oryza sativa (rice [7,8]), Populus trichocarpa (poplar [9]), Zea mays (maize [10]), Glycine max (soybean [11]), Solanum lycopersicum (tomato [12]), and Pinus taeda (loblolly pine [13]) serve a dual purpose, not just as crops but as functional models. Together these genomes have provided the foundation for an era of molecular genomics research that has enabled functional definition of many key genes and pathways.

Non-model and non-crop plant genomes provide important clues to plant genome architecture and the evolution of flowering plants. Although it was thought that plants have a 'one-way ticket to genome obesity' as a result of the retention of proliferating transposable elements (TEs) [14], the smallest plant genomes [15], Utricularia gibba (bladderwort) and Genlisea aurea (corkscrew), provided evidence that almost all intergenic space and repeat sequence can be purged [16,17]. In addition, the aquatic, highly morphologically reduced, non-grass monocot Spirodela polyrhiza (greater duckweed) has a genome similar in size to Arabidopsis yet functions with 28% fewer genes (19,623) [18]. The genomes of Selaginella moellendorffii (spikemoss [19]) and Amborella trichopoda [20] provide the evolutionary links for vascular plants and angiosperms, respectively, yielding key insights into the trajectory of plant-specific gene families and the radiation of flowering plants.

In this review we focus primarily on the most recently sequenced specialty and row crop genomes with an emphasis on challenges and limitations of current genome sequencing techniques. This segues into downstream work aimed at linking the genome to the biology, and concludes with the future of plant genomics.
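To give a sense of the scale implied by the genome sizes quoted in this introduction, the sketch below estimates how many short reads are needed to reach a target depth, using the standard relation coverage = (number of reads × read length) / genome size. The genome sizes are those stated in the text and Table 1; the 150 bp read length, the 30× target depth and the function names are illustrative assumptions, not values or methods taken from any of the cited projects.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Smallest read count N such that N * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def expected_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average sequencing depth delivered by n_reads reads of a fixed length."""
    return n_reads * read_length_bp / genome_size_bp


# Genome sizes quoted in the review (Arabidopsis from Table 1).
GENOME_SIZES_BP = {
    "Arabidopsis": 125_000_000,
    "sugarcane": 12_000_000_000,
    "wheat": 17_000_000_000,
}

if __name__ == "__main__":
    READ_LENGTH_BP = 150   # assumed short-read length
    TARGET_COVERAGE = 30   # assumed target depth
    for name, size in GENOME_SIZES_BP.items():
        n = reads_needed(size, READ_LENGTH_BP, TARGET_COVERAGE)
        print(f"{name}: {n:,} reads for {TARGET_COVERAGE}x at {READ_LENGTH_BP} bp")
```

The calculation ignores repeat content and ploidy, which the review identifies as the harder problems; it only illustrates why raw data volume grows linearly with genome size.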


Figure 1

[Figure: phylogeny of representative sequenced plant genomes. Legend: sequencing technology (Sanger only; Sanger + 454/Illumina; 454 + Illumina; Illumina only), whole genome duplication, whole genome triplication, polyploid crop species. Labeled clades: Core eudicots, Rosids, Rosids I, Rosids II, Basal Eudicots, Monocots, Flowering Plants, Seed Plants, Vascular Plants, Land Plants. Taxa shown: Kiwi, Blueberry, Coffee, Eggplant, Tomato, Potato, Pepper, Utricularia, Monkey Flower, Sugar Beet, Grape, Soybean, Common Bean, Pigeon Pea, Medicago, Apple, Pear, Strawberry, Peach, Watermelon, Cucumber, Poplar, Willow, Cassava, Rubber, Jatropha, Castor Bean, Eucalyptus, Orange, Cotton, Cacao, Papaya, Arabidopsis thaliana, Arabidopsis lyrata, Camelina, Brassica rapa, Brassica oleracea, Brassica napus, Sacred lotus, Wheat, Barley, Brachypodium, Rice, Bamboo, Tef, Setaria, Maize, Sorghum, Banana, Oil Palm, Date Palm, Duckweed, Amborella, Loblolly Pine, Norway Spruce, Selaginella, Physcomitrella, Chlamydomonas, Volvox.]


Major challenges in crop genome sequencing projects

Genome assembly tools, which were generally designed and tested for non-plant species [21], are ill suited for handling the issues of genome size, repeat content, paralogy, and heterozygosity that are common in plant genomes. The throughput of NGS technologies has made it economical to sequence most crop genomes, but resolving plant genome complexity with 100–200 bp reads is still a challenge. Most recent mammalian genomes are assembled into chromosome scale regions [22], but most draft plant genomes remain in thousands of highly fragmented contigs or hundreds of scaffolds with numerous embedded gaps. Even the Arabidopsis genome, which is arguably the best-assembled plant genome, is still in 102 contigs with a total gap length of at least 185,644 bp (TAIR 10 [23]).

Genome size and repeat content, which are often highly correlated, present a major problem for plant genome assembly. Genome size in plants varies by more than three orders of magnitude, from 61 Mb (Genlisea tuberosa) to over 150,000 Mb (Paris japonica) (reviewed in [24]). NGS platforms can now generate enough raw data to sequence large genomes but assembling so much data is a major computational problem. The loblolly pine genome is the largest genome assembled to date (22 Gb) and used a preprocessed, condensed set of 'super-reads' to reduce the computational resources needed for assembly [13].

Repeats are a major problem in genome assembly, and resolving repeat structures requires sequencing read lengths that exceed the 10–20 kb repeats commonly found in plant genomes. Class I 'copy and paste' long terminal repeat (LTR) retrotransposons are the most prevalent repeats in plant genomes and their proliferation results in genome bloating. Estimating the average repeat lengths in the genome is crucial for picking read lengths, sequencing technology, mate pair library sizes and coverage. Much of the structural variation (SV) between cultivars within plant genomes is due to the movement of LTRs, and reference based re-sequencing projects often miss or inaccurately predict SVs. De novo assembly of three divergent rice strains uncovered several megabases (Mb) of novel sequences in each strain, with many contigs containing expressed genes [25].

Issues of paralogy complicate genome assembly and result in incomplete, highly fragmented assemblies. Polyploidy is common among crop species and plants have large, multi-gene families with highly similar paralogs. Outcrossing species like grape, clonally propagated crops like apple, and long-lived trees like Eucalyptus tend to have high levels of within genome heterozygosity. Paralogous regions and heterozygous sites create 'bubbles' during genome assembly where two or more regions that are highly similar assemble together, and the adjacent dissimilar regions assemble separately but eventually merge again (Figure 2a). Assembly issues stemming from polyploidy or heterozygosity can be overcome by using diploid progenitor species ('robusta' coffee [26] and wheat [27]), closely related wild diploid species (woodland strawberry [28]), haploid/monoploid lines (citrus [29,30], banana [31] or peach [32]), or a bacterial artificial chromosome (BAC) by BAC sequencing approach (maize [10] or pear [33]) (Figure 2).

Organelle DNA contamination can be a major problem in genome sequencing projects. Plant cells can have over 100 chloroplasts with up to 10,000 plastid DNA copies per cell [34] and organelle derived reads can constitute 5–20% of the total sequences in a whole genome sequencing (WGS) project. Modified DNA extraction protocols optimized for nuclei isolation are typically used, which can reduce organelle contamination several fold [35]. Organelle contamination can be tested before library construction using a simple qPCR protocol [35]. Plant nuclear genomes also contain numerous organelle derived sequences that can be nearly identical to the organelle genomes themselves; accurately sequencing through these regions requires read lengths that can span the insertion junction sites.

Overcoming the challenges of sequencing plant genomes requires advances in sequencing technology. Longer read lengths provided by third generation single molecule sequencers like Pacific Biosciences (PacBio) offer the possibility of overcoming the complexities of plant genomes. The average read length of PacBio reads using P6C4 chemistry is over 15 kb with an average coverage of 30× required for a quality of Q50. Circularized consensus sequencing (CCS) from PacBio uses short sequences (1–5 kb) that are circularized and sequenced multiple times to produce very high quality sequences which can be used to distinguish highly similar repeat or paralogous structures. CCS is also useful for building high confidence, allele specific, full length transcripts with well annotated alternative splicing sites [36]. Given high enough read depth, all but the longest repeats (such as telomeres and centromeres) can be resolved, potentially ushering in a new era of platinum quality genome assemblies. A preliminary de novo assembly of Arabidopsis

(Figure 1 Legend) Distribution and characteristics of sequenced plant genomes. Phylogeny was adapted from: https://genomevolution.org/wiki/index.php/Sequenced_plant_genomes and only representative crop, model and evolutionarily significant genomes are shown. Branch colors represent sequencing technologies used: Sanger only (brown), Sanger + 454/Illumina (blue), 454 + Illumina (red) and Illumina only (gold). Green and blue circles represent whole genome duplications and triplications respectively. Branch length does not correlate with divergence


Table 1

Published sequenced plant genomes. Over 100 plant genomes have been sequenced and published since 2000. The statistics for each

genome are taken from the publication, despite several model plants having significant updates to genome assemblies and gene counts.

NA, data not available in publication. Mb, megabases; kb, kilobases

Common name Scientific name Year Phyla Type Size (Mb) Gene (#) Repeat (%) Scaffold N50 (kb) PMID

Arabidopsis Arabidopsis thaliana 2000 Dicot Model 125 25,498 14 NA 11130711

Rice Oryza sativa 2002 Monocot Crop 430 59,855 26 12 11935017

Rice Oryza sativa 2002 Monocot Crop 420 29,961 NA NA 11935018

Rice Oryza sativa 2005 Monocot Crop 403 37,544 26 NA 16100779

Black Cottonwood Populus trichocarpa 2006 Dicot Crop 485 45,555 42 3100 16973872

Grape Vitis vinifera 2007 Dicot Crop 475 30,434 41 2065 17721507

Moss Physcomitrella patens 2008 Bryophyte Model 510 35,938 16 1320 18079367

Grape Vitis vinifera 2007 Dicot Crop 505 29,585 27 1330 18094749

Papaya Carica papaya 2008 Dicot Crop 372 28,629 43 1000 18432245

Lotus Lotus japonicus 2008 Dicot Non-model 472 30,799 56 NA 18511435

Sorghum Sorghum bicolor 2009 Monocot Crop 818 34,496 62 62,400 19189423

Cucumber Cucumis sativus 2009 Dicot Crop 367 26,682 24 1140 19881527

Corn Zea mays 2009 Monocot Crop 2300 32,540 85 76 19965430

Soybean Glycine max 2010 Dicot Crop 1115 46,430 57 47,800 20075913

Brachypodium Brachypodium distachyon 2010 Monocot Model 272 25,532 21 59,300 20148030

Castor bean Ricinus communis 2010 Dicot Crop 320 31,237 50 561 20729833

Apple Malus domestica 2010 Dicot Crop 742 57,386 67 1542 20802477


Jatropha Jatropha curcas 2010 Dicot Crop 380 40,929 36 NA 21149391

Cocoa Theobroma cacao 2011 Dicot Crop 430 28,798 24 473 21186351

Strawberry Fragaria vesca 2011 Dicot Crop 240 34,809 23 1300 21186353

Lyrata Arabidopsis lyrata 2011 Dicot Model 207 32,670 30 24,500 21478890

Spikemoss Selaginella moellendorffii 2011 Lyco Non-model 110 22,285 38 1700 21551031

Date palm Phoenix dactylifera 2011 Monocot Crop 658 28,890 40 30 21623354

Potato Solanum tuberosum 2011 Dicot Crop 844 39,031 62 1318 21743474

Thellungiella Thellungiella parvula 2011 Dicot Model 140 30,419 8 5290 21822265

Cucumber Cucumis sativus 2011 Dicot Crop 367 26,587 NA 319 21829493

Chinese cabbage Brassica rapa 2011 Dicot Crop 485 41,174 40 1971 21873998

Hemp Cannabis sativa 2011 Dicot Crop 820 30,074 NA 16 22014239

Pigeon pea Cajanus cajan 2012 Dicot Crop 833 48,680 52 516 22057054

Medicago Medicago truncatula 2011 Dicot Model 454 62,388 31 1270 22089132

Setaria Setaria italica 2012 Monocot Model 490 38,801 46 1007 22580950

Setaria Setaria italica 2012 Monocot Model 510 35,471 40 47,300 22580951

Tomato Solanum lycopersicum 2012 Dicot Crop 900 34,727 63 16,467 22660326

Melon Cucumis melo 2012 Dicot Crop 450 27,427 NA 4680 22753475

Flax Linum usitatissimum 2012 Dicot Crop 373 43,484 24 132 22757964

Banana Musa acuminata malaccensis 2012 Monocot Crop 523 36,542 44 1311 22801500

Tobacco Nicotiana benthamiana 2012 Dicot Crop 3000 NA NA 89 22876960

Cotton D Gossypium raimondii 2012 Dicot Crop 880 40,976 60 2284 22922876

Neem Azadirachta indica 2012 Dicot Crop 364 20,169 13 452 22958331

Barley Hordeum vulgare 2012 Monocot Crop 5100 30,400 84 NA 23075845

Pear Pyrus bretschneideri 2013 Dicot Crop 527 42,812 53 541 23149293

Dwarf birch Betula nana 2012 Dicot Non-model 448 NA NA 19 23167599

Sweet orange Citrus sinensis 2013 Dicot Crop 367 29,445 21 1690 23179022

Watermelon Citrullus lanatus 2012 Dicot Crop 425 23,440 45 2380 23179023

Wheat Triticum aestivum 2012 Monocot Crop 17,000 94,000 80 NA 23192148

Cotton D Gossypium raimondii 2012 Dicot Crop 880 37,505 61 18,800 23257886

Chinese plum Prunus mume 2012 Dicot Crop 280 31,390 45 578 23271652

Chickpea Cicer arietinum 2013 Dicot Crop 738 28,269 49 39,990 23354103

Rubber tree Hevea brasiliensis 2013 Dicot Crop 2150 68,955 72 3 23375136

Moso bamboo Phyllostachys heterocycla 2013 Monocot Non-model 2075 31,987 59 329 23435089

Rice relative Oryza brachyantha 2013 Monocot Wild-relative 300 32,038 29 1013 23481403

Eutrema Eutrema salsugineum 2013 Dicot Non-model 243 26,351 51 13,400 23518688

Peach Prunus persica 2013 Dicot Crop 265 27,852 37 27,400 23525075

Wheat DD Aegilops tauschii 2013 Monocot Crop 4360 43,150 66 58 23535592


Wheat AA Triticum urartu 2013 Monocot Crop 4940 34,879 67 64 23535596

Sacred lotus Nelumbo nucifera 2013 Dicot Non-model 929 26,685 57 3400 23663246

Bladderwort Utricularia gibba 2013 Dicot Non-model 77 28,500 3 95 23665961

Norway spruce Picea abies 2013 Gymnosperm Crop 19,600 28,354 70 NA 23698360

White spruce Picea glauca 2013 Gymnosperm Crop 20,000 NA NA 20 23698863

C. grandiflora C. grandiflora 2013 Dicot Non-model 200 NA NA 98 23749190

Neslia paniculata Neslia paniculata 2013 Dicot Non-model NA NA NA 62 23749190

Capsella Capsella rubella 2013 Dicot Non-model 219 26,521 NA 15,100 23749190

Tobacco 2013 Dicot Crop 2636 38,940 72 80 23773524

Tobacco Nicotiana tomentosiformis 2013 Dicot Crop 2360 38,648 75 83 23773524

Brassicaceae Leavenworthia alabamica 2013 Dicot Non-model 316 30,343 27 70 23817568

Brassicaceae Sisymbrium irio 2013 Dicot Non-model 262 28,917 38 135 23817568

Brassicaceae Aethionema arabicum 2013 Dicot Non-model 240 23,167 37 118 23817568

Genlisea Genlisea aurea 2013 Dicot Non-model 64 17,755 NA 6 23855885

Oil palm Elaeis guineensis 2013 Monocot Crop 1824 34,802 18 1045 23883927

Mulberry Morus notabilis 2013 Dicot Non-model 357 29,338 47 390 24048436

Kiwifruit Actinidia chinensis 2013 Dicot Crop 758 39,040 36 646 24136039

Poplar, wild Populus euphratica 2013 Dicot Wild-relative 593 34,279 44 482 24256998

Amborella Amborella trichopoda 2013 Dicot Non-model 748 27,313 57 4900 24357323

Greater duckweed Spirodela polyrhiza 2013 Monocot Crop 158 19,623 13 4924 24548928

Pepper Capsicum annuum 2014 Dicot Crop 3260 35,336 80 1226 24591624

Pepper, wild Capsicum annuum 2014 Dicot Wild-relative 3070 34,476 81 445 24591625

Loblolly pine Pinus taeda 2014 Gymnosperm Crop 23,200 50,172 82 67 24647006

Camelina Camelina sativa 2014 Dicot Crop 785 89,418 28 2160 24759634

Tobacco 2014 Dicot Crop 4600 91,870 73 345 24807620

Tobacco Nicotiana tabacum 2014 Dicot Crop 4410 81,404 79 385 24807620

Tobacco Nicotiana tabacum 2014 Dicot Crop 4570 93,650 73 350 24807620

Tobacco Nicotiana otophora 2014 Dicot Crop 2700 NA NA 27 24807620

Cotton A Gossypium arboreum 2014 Dicot Crop 1724 41,330 69 666 24836287

Brassica Brassica oleracea 2014 Dicot Crop 630 45,758 38 1457 24852848

Wild radish Raphanus raphanistrum 2014 Dicot Crop 515 38,174 NA 10 24876251

Common bean Phaseolus vulgaris 2014 Dicot Crop 587 27,197 45 50,000 24908249

Sweet orange Citrus sinensis 2014 Dicot Crop 367 25,379 31 250 24908277

Clementine Citrus clementina 2014 Dicot Crop 367 24,533 45 6800 24908277

Eucalyptus Eucalyptus grandis 2014 Dicot Crop 640 36,376 50 53,900 24919147

Willow Salix suchowensis 2014 Dicot Non-model 425 26,599 40 924 24980958

Soybean, wild Glycine max 2014 Dicot Wild-relative 1165 52,395 43 401 25004933

Tef Eragrostis tef 2014 Monocot Crop 772 38,000 14 66 25007843

Wheat Triticum aestivum 2014 Monocot Crop 17,000 124,201 NA NA 25035500

Rice relative Oryza glaberrima 2014 Monocot Wild-relative 316 33,164 34 217 25064006

Tomato, wild Solanum pennellii 2014 Dicot Wild-relative 1200 32,273 82 1741 25064008

Canola Brassica napus 2014 Dicot Crop 1130 101,040 35 763 25146293

Coffee Coffea canephora 2014 Dicot Crop 710 25,574 50 1260 25190796

Soybean, wild Glycine soja 2014 Dicot Wild-relative 981 55,061 NA 18 25218520

Soybean, wild Glycine soja 2014 Dicot Wild-relative 1001 54,256 NA 57 25218520

Soybean, wild Glycine soja 2014 Dicot Wild-relative 1054 56,542 NA 17 25218520

Soybean, wild Glycine soja 2014 Dicot Wild-relative 1118 57,631 NA 49 25218520

Soybean, wild Glycine soja 2014 Dicot Wild-relative 956 55,901 NA 65 25218520

Soybean, wild Glycine soja 2014 Dicot Wild-relative 993 54,805 NA 52 25218520

Soybean, wild Glycine soja 2014 Dicot Wild-relative 889 54,797 NA 45 25218520

Eggplant Solanum melongena 2014 Dicot Crop 1126 85,446 70 64 25233906

Cassava Manihot esculenta 2014 Dicot Wild-relative 742 34,483 37 67 25300236

Cassava Manihot esculenta 2014 Dicot Wild-relative 742 38,845 26 27 25300236

Jujube Ziziphus jujuba 2014 Dicot Crop 444 32,808 47 301 25350882

Blueberry Vaccinium corymbosum 2014 Dicot Crop 600 60,000 NA 145 NA
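Table 1 reports a scaffold N50 for each assembly, and the text quotes a contig N50 for the PacBio Arabidopsis assembly. As a reminder of what this statistic measures, the following is a minimal sketch of the usual definition: sort the sequence lengths from longest to shortest and return the length at which the running total first reaches half of the assembly size. The function name and the toy scaffold lengths are hypothetical and do not come from any genome in the table.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


if __name__ == "__main__":
    toy_scaffolds_kb = [8, 8, 4, 3, 3, 2, 2, 2]  # hypothetical lengths in kb
    print("N50 (kb):", n50(toy_scaffolds_kb))
```

Because N50 depends only on the length distribution, a high value says nothing about correctness; a misassembled chimeric scaffold raises N50 just as a correct one does.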

www.sciencedirect.com Current Opinion in Plant Biology 2015, 24:71–81

76 Genome studies and molecular genetics

Figure 2

[Figure panels (a) to (d). Panel labels: highly heterozygotic or polyploid species; whole genome shotgun reads; double haploid or diploid species; minimum tiling path of BACs; long third generation PacBio reads; use short PacBio or WGS reads for error correction.]

Strategies for sequencing complex crop genomes. A and B represent subgenomes in a polyploid or homologous chromosomes in a highly heterozygous species; grey lines connect nearly identical regions and light blue and red represent diverse regions. (a) A typical WGS strategy yields a highly fragmented genome as similar regions assemble together and diverse regions assemble separately, creating 'bubbles' in the assembly graph. (b) A double haploid (for heterozygous species) or diploid relative/progenitor (for polyploid species) can be used to simplify the assembly. Reads from only one haplotype (in this case A) assemble without ambiguities. (c) Long reads from third generation single molecule sequencers like PacBio can be used to assemble both haplotypes separately, resulting in two complete subgenomes. Short PacBio or Illumina WGS reads are used to correct the long reads prior to assembly. (d) A more traditional and expensive BAC by BAC approach can be used where BACs from a minimum tiling path are sequenced separately and then stitched together to create a chimeric assembly of both haplotypes. WGS reads can be mapped to the chimera to sequence the second haplotype.

thaliana accession Ler-0 using P4C3 chemistry produced a contig N50 of 6.36 Mb, similar to the quality of the TAIR10 release (PacBio website; URL: http://blog.pacificbiosciences.com/2013/08/new-data-release-arabidopsis-assembly.html).

NGS technologies kick-start specialty crops genomics

Specialty crops, which include most fruits and vegetables, nut trees, and beverage crops, have had limited genomic resources until recently because of the high cost of early Sanger sequencing and their low production values compared to row crops. Papaya is an exception to this, as it was the fifth plant genome and first specialty crop to be sequenced, with roughly 3× coverage of Sanger reads and BAC end sequences for scaffolding. The papaya draft genome showed the stability of transgenic insertions and served as a powerful model for early comparative genomics work because it lacks a lineage specific whole genome duplication (WGD) [37]. The introduction of NGS technologies facilitated sequencing several small specialty crop genomes including cucumber [38], apple [39], strawberry [28], cacao [40], date palm [41], and watermelon [42]. Most of the recently published crop genomes are specialty crops (see Table 1). Here we focus on specialty crops with the most economic value, and row crops with novel sequencing strategies that can be applied to other species.

Tomato seed is worth its weight in gold. It is the leading vegetable crop with a rich and diverse breeding program, and serves as a model system for fruit development. The high quality Sanger based genome uncovered a whole genome triplication event which facilitated neo-functionalization of genes related to fruit quality and development [43]. Comprehensive resequencing of 360 diverse tomato accessions showed that two independent sets of quantitative trait loci (QTLs) are responsible for the 100-fold increase in fruit size during tomato domestication [44]. In addition to tomato and potato, other members of the Solanaceae have also been recently sequenced, including a domesticated pepper, a wild relative and 20 resequenced pepper accessions [45] and the eggplant genome [46].

The citrus complex, which is an admixture of hybrids that includes oranges, grapefruit, lemons and limes, is the highest value fruit crop across the world. Reference citrus genomes for sweet orange [30] and Clementine mandarin [29] were sequenced using double-haploid lines to eliminate the within genome heterozygosity that results from clonal propagation and interspecific hybridization. Resequencing efforts showed cultivated citrus are derived from two progenitor species and sweet orange has a complex pedigree; one parent is a shared ancestor of mandarin and the other is likely a pummelo with introgressions of wild mandarin [29,30].

Banana is a key starch staple in Africa and Asia with consumption up to 400 kg/person/year, and is the second most cultivated tropical fruit behind the citrus complex. Banana was the first non-grass monocot with a published high contiguity reference genome. Most global banana production stems from somaclones derived from a single triploid line 'Cavendish', and genomic resources are essential for improved disease resistance and yield. A double-haploid banana was used to overcome issues


associated with triploidy [31]. Banana has three WGD events independent of the grasses, but surprisingly few NBS-LRR disease resistance genes (89), which may contribute to its disease susceptibility [31].

Over 2.25 billion cups of coffee are consumed each day, making coffee the world's leading beverage crop. Most coffee comes from Coffea arabica, a highly heterozygous, outcrossing allotetraploid. The coffee reference genome was generated using a double haploid accession of C. canephora, one of the diploid parents of C. arabica and the source of 'robusta' coffee [26]. Coffee has tandem duplications of N-methyltransferases (NMTs) that contribute to caffeine production. Comparisons of NMTs from tea and cacao suggest caffeine biosynthesis has polyphyletic origins and evolved at least twice.

Sub-genome assisted sequencing of complex polyploid row crops

42% of human energy supply comes from cereal row crops such as rice, wheat, and maize [47], while other row crops such as cotton, soybean and canola play major roles in nutrition and clothing. The first row crop genomes, rice, maize and sorghum, were sequenced using Sanger WGS and BAC by BAC approaches [7,8,10,48]. Other cereals have been sequenced recently, including barley [49], the model C3 grass B. distachyon [3] and the model C4 grass S. italica [5]. Row crops have a propensity for polyploidy, which likely contributes to their improved nutritional content and high yields. Polyploidy also confers emergent properties like seed oil accumulation in canola, spinnable fibers in cotton, and grain composition in wheat. The presence of multiple subgenomes complicates genome assembly of polyploids, but a powerful approach of sequencing the likely subgenome diploid contributor has accelerated our understanding of these complex genomes.

Brassica napus (canola or rapeseed), the third largest source of vegetable oil, is an allotetraploid of B. rapa (turnips and napa cabbage) and B. oleracea (cabbage, broccoli, cauliflower, kale and other cruciferous vegetables), formed by a hybridization that occurred 7500–12,500 years ago. Most of the 20,000 Illumina and 454 based scaffolds in B. napus were assigned to either the A or C subgenome using 454 reads from each progenitor parent, providing for unprecedented comparisons of homeologous regions in a polyploid species. Despite the young age of rapeseed, around 100 genes have been lost from each subgenome and subtle changes in epigenetic regulation, homeologous exchange and gene expression divergence have occurred. Repeated whole genome duplications in rapeseed have created a 72 fold duplication of the ancestral flowering plant genome, and unique expansions in oil biosynthesis genes and loss of glucosinolate genes were observed [50].

Bread wheat (Triticum aestivum) is a staple food for 30% of the world's population but until recently, genomic resources were limited because of its large (17 Gb), hexaploid (2n = 6x = 42), repeat rich (87%) genome [51]. To overcome issues with genome complexity, flow cytometry was used with aneuploid wheat lines to isolate and sequence each chromosome arm separately, with each arm representing 1.3–3.3% of the genome. The final assembly is highly fragmented, spanning 10.2 Gb with 124,201 genes distributed unevenly across the three subgenomes. The subgenomes of wheat have limited gene loss or rearrangements, contrasting with the dynamic shuffling and loss in the much younger polyploidy events in B. napus. This suggests plasticity in the events post WGD; not all polyploidy events are alike. There is no global genome dominance among the wheat subgenomes, but there is cell and stage dependent dominance, including gene families related to baking quality [52].

Tetraploid cotton (AADD, Gossypium hirsutum) has higher fiber and quality production than diploid cotton (G. barbadense), an emergent property with major QTLs in the D subgenome from G. raimondii, which has no spinnable fibers [53]. Diploid cotton (G. raimondii) has an abrupt 5–6 fold ploidy increase after splitting from the cacao lineage, rivaled only by the Brassicaceae. Tetraploid cotton has numerous non-reciprocal DNA exchanges between the A and D subgenomes, and coordinated gene expression changes including nuclear mitochondrial genes involved in electron transport, which likely contribute to fiber production [54].

Tree crop genomes

Trees are long-lived perennials that are valued for timber, fuel and other products. Extraordinary progress has been made on economically important tree crops starting with the publication of P. trichocarpa (poplar [9]) in 2006, which made it the third published plant genome. Another fast growing and economically important tree, Eucalyptus grandis, was published more recently [55]. The Eucalyptus genome revealed an expansion of terpene synthesis genes associated with defense as well as the largest number of tandem repeats of any sequenced genome to date. Three gymnosperm genomes, Pinus taeda (loblolly pine [13]), Picea glauca (white spruce [56]), and Picea abies (Norway spruce [57]), also have been sequenced recently. In addition to being the first three gymnosperm genomes, they are also the largest genomes sequenced to date. Since the generation time in most trees is long compared to row crops, draft genomes enable technologies like genomic selection (GS), which models superior genotypes using genome-wide markers and limited phenotyping, reducing time-consuming phenotypic selection and breeding cycles [58].

Orphan crops

Orphan crops have limited improvement from their wild relatives, unrecognized nutritional value, disease susceptibility, poor shelf life and growth constraints. However,


orphan crops like pigeon pea (Cajanus cajan), cassava and tef (Eragrostis tef) are major staples in underdeveloped regions and genomic resources are essential to boost production. The draft genome of pigeon pea provided over 300,000 SSR markers for plant breeding and screening of the 13,632 accessions maintained in the ICRISAT genebank [59]. Pigeon pea has a large repertoire of universal drought response proteins, which may contribute to its drought tolerance. Draft genomes for both wild and cultivated cassava are available as well as a repertoire of SNPs

Figure 3

[Figure 3 image not reproduced. Recoverable panel labels: (a) Abiotic Stress: 100%, 75%, 50% and 25% Soil Capacity, No Water; Photosynthesis Measurements; Electrolyte Leakage Assays. (b), (c) ENCODE Analyses: Nucleosome, Hypersensitivity Sites, TFs, CH3, Gene, Long-range regulatory elements, Cis-regulatory elements (Promoters, TF binding sites), Exon, Transcript. (d) H3K4me3 ChIP-seq (Methylation); ChIP-seq (Protein/Histone Binding); DNase-seq (Hypersensitive Sites); mRNAseq (Transcript Abundance); DNAseq (Resequencing): SNP, Deletion.]

The next frontier of plant genomics: overview of the Brachypodium plant ENCODE project. (a) Plants (in this case Brachypodium) are subjected to various abiotic stresses and responses are measured using high-throughput phenotyping and physiology. Material from stressed and control plants is used to generate the ENCODE datasets in panel d. (b) Gene co-regulation network produced from ENCODE data. (c) Genetic elements that are targeted in an ENCODE project. (d) Example datasets from an ENCODE project. Peaks reflect a high depth of Illumina reads corresponding to highly methylated regions (H3K4me3 ChIP-seq), regions with TF or histone binding (ChIP-seq), open chromatin regions (DNase-seq), transcribed regions (mRNAseq) and polymorphisms/InDels between different lines (DNAseq). Taken together, ENCODE datasets can paint a near complete picture of epigenetic regulation under a given stress.


for breeding. Cassava genes involved in starch accumulation, photosynthesis and abiotic stresses have been positively selected during domestication and genes involved in toxic cyanogenic glucoside formation have been purged [60]. A draft genome of tef identified novel SSR loci for marker assisted breeding and provides a framework for identifying genes related to abiotic stresses and nutrition [61].

Conclusion: beyond the assembled genome and the future of crop genomics

Though reference genomes are now available for many crops, only diploid/haploid references are available for polyploid crops like potato, coffee, strawberry and banana, and most highly heterozygous genomes have only one sequenced haplotype. Some of the elements contributing to agronomic traits, like gene duplications, genome rearrangements and repeat integrations, are cultivar specific and cannot be found using resequencing strategies or reduced complexity references. High quality references of each subgenome (in polyploids) and each haplotype (in heterozygous crops), as well as multiple references per crop species, are needed to survey true variation.

Many crop genomes, especially those of vegetables, have gone through a very tight breeding bottleneck to arrive at our table. The impact is that a great deal of diversity is lost in current breeding germplasm, leading to slowed improvement and potential for loss of disease resistance. There is a pressing need to develop genomic resources for crop wild relatives so that they can be used in breeding, allele identification and introgression [62]. Draft genomes from wild relatives of tomato (Solanum pennellii), soybean (Glycine soja) [63–65] and cassava [60] are currently available, but more wild species are needed for crop improvement programs.

An assembled reference genome sequence is simply a foundation; the true challenge is to identify the features of the genome that describe the biology. Although every cell has essentially the same DNA sequence, epigenetic decorations and gene expression vary greatly by cell based on the environment, developmental stage and tissue type. The next phase of crop genomics will be to completely elucidate these biologically active states of DNA, as has been done for other model systems such as human, mouse, Drosophila and C. elegans, in plant ENCODE (Encyclopedia of DNA Elements) projects [66]. Such studies can take on several forms, such as has been done for other model systems where integrated maps of DNA methylation, small RNA, histone modification and transcript abundance are measured across multiple tissues and conditions (Figure 3). Brachypodium has the first formal plant ENCODE project funded by DOE/USDA where not only detailed molecular maps of epigenetic modifications and expression will be generated, but phenotyping will be leveraged to understand the networks active under drought conditions (Figure 3, DOE award abstract; URL: http://genomicscience.energy.gov/research/DOEUSDA/abstracts/2014mockler_abstract.shtml). Once we have detailed genome maps, the ability to edit specific sequences, such as has been done in rice and wheat [67], will enable the next generation of crop domestication.

The amount of available genomic data for crop plants is staggering, and thousands of Gb of plant sequences are deposited in NCBI and other public databases monthly. As a community, we are about to have resources we could only dream of. How will we use them to meet the challenge of feeding 9 billion people by 2050? New tools for analyzing these high-throughput datasets are desperately needed, and training of young scientists should shift toward computer science and engineering in order to prosper in the changing face of biological research. How will we build the plant genomicists of the future who only know science with big data, whole genome analysis and full information access?

Acknowledgements

This work was supported in part by funding from the National Science Foundation (DBI-1401572) to R.V., and DARPA (HR0011-13-C-0103) to T.P.M.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:
of special interest
of outstanding interest

1. Initiative AG: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408:796.
2. Michael TP, Jackson S: The first 50 plant genomes. Plant Genome 2013:6.
3. Vogel JP, Garvin DF, Mockler TC, Schmutz J, Rokhsar D, Bevan MW, Barry K, Lucas S, Harmon-Smith M, Lail K: Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 2010, 463:763-768.
4. Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud P-F, Lindquist EA, Kamisugi Y: The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 2008, 319:64-69.
5. Bennetzen JL, Schmutz J, Wang H, Percifield R, Hawkins J, Pontaroli AC, Estep M, Feng L, Vaughn JN, Grimwood J: Reference genome sequence of the model plant Setaria. Nat Biotechnol 2012, 30:555-561.
6. Zhang G, Liu X, Quan Z, Cheng S, Xu X, Pan S, Xie M, Zeng P, Yue Z, Wang W: Genome sequence of foxtail millet (Setaria italica) provides insights into grass evolution and biofuel potential. Nat Biotechnol 2012, 30:549-554.
7. Goff SA, Ricke D, Lan T-H, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H: A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 2002, 296:92-100.
8. Yu J, Hu S, Wang J, Wong GK-S, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X: A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 2002, 296:79-92.
9. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313:1596-1604.
10. Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA: The B73 maize genome: complexity, diversity, and dynamics. Science 2009, 326:1112-1115.
11. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J: Genome sequence of the palaeopolyploid soybean. Nature 2010, 463:178-183.
12. Consortium TG: The tomato genome sequence provides insights into fleshy fruit evolution. Nature 2012, 485:635-641.
13. Zimin A, Stevens KA, Crepeau MW, Holtz-Morris A, Koriabine M, Marçais G, Puiu D, Roberts M, Wegrzyn JL, de Jong PJ: Sequencing and assembly of the 22-Gb loblolly pine genome. Genetics 2014, 196:875-890.
    The authors use a novel reduced complexity sequencing strategy to assemble the 22 Gb loblolly pine genome, the largest genome sequenced to date.
14. Bennetzen JL, Kellogg EA: Do plants have a one-way ticket to genomic obesity? Plant Cell 1997, 9:1509.
15. Fleischmann A, Michael TP, Rivadavia F, Sousa A, Wang W, Temsch EM, Greilhuber J, Müller KF, Heubl G: Evolution of genome size and chromosome number in the carnivorous plant genus Genlisea (Lentibulariaceae), with a new estimate of the minimum genome size in angiosperms. Annals Bot 2014. mcu189.
16. Leushkin EV, Sutormin RA, Nabieva ER, Penin AA, Kondrashov AS, Logacheva MD: The miniature genome of a carnivorous plant Genlisea aurea contains a low number of genes and short non-coding sequences. BMC Genomics 2013, 14:476.
17. Ibarra-Laclette E, Lyons E, Hernández-Guzmán G, Pérez-Torres CA, Carretero-Paulet L, Chang T-H, Lan T, Welch AJ, Juárez MJA, Simpson J: Architecture and evolution of a minute plant genome. Nature 2013, 498:94-98.
    The compact bladderwort genome provides evidence that almost all intragenic space and repeat sequences can be purged.
18. Wang W, Haberer G, Gundlach H, Gläßer C, Nussbaumer T, Luo M, Lomsadze A, Borodovsky M, Kerstetter R, Shanklin J: The Spirodela polyrhiza genome reveals insights into its neotenous reduction fast growth and aquatic lifestyle. Nat Commun 2014:5.
19. Banks JA, Nishiyama T, Hasebe M, Bowman JL, Gribskov M, Albert VA, Aono N, Aoyama T, Ambrose BA, Ashton NW: The Selaginella genome identifies genetic changes associated with the evolution of vascular plants. Science 2011, 332:960-963.
20. Albert VA, Barbazuk WB, Der JP, Leebens-Mack J, Ma H, Palmer JD, Rounsley S, Sankoff D, Schuster SC, Soltis DE: The Amborella genome and the evolution of flowering plants. Science 2013, 342:1241089.
    Amborella is the most basal flowering plant and serves as a powerful reference for comparative genomics. The Amborella genome provided evidence for an ancient WGD that predated all flowering plants.
21. Metzker ML: Sequencing technologies — the next generation. Nat Rev Genet 2009, 11:31-46.
22. Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S: High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci 2011, 108:1513-1518.
23. Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M: The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res 2012, 40:D1202-D1210.
24. Michael TP: Plant genome size variation: bloating and purging DNA. Briefings Funct Genomics 2014. elu005.
25. Schatz MC, Maron LG, Stein JC, Wences AH, Gurtowski J, Biggers E, Lee H, Kramer M, Antoniou E, Ghiban E: Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica. Genome Biol 2014, 15:506.
    The authors produce de novo sequences of three diverse rice strains and uncover several Mb of sequences unique to each strain that could not have been identified using a standard resequencing approach.
26. Denoeud F, Carretero-Paulet L, Dereeper A, Droc G, Guyot R, Pietrella M, Zheng C, Alberti A, Anthony F, Aprea G: The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. Science 2014, 345:1181-1184.
27. Ling H-Q, Zhao S, Liu D, Wang J, Sun H, Zhang C, Fan H, Li D, Dong L, Tao Y: Draft genome of the wheat A-genome progenitor Triticum urartu. Nature 2013, 496:87-90.
28. Shulaev V, Sargent DJ, Crowhurst RN, Mockler TC, Folkerts O, Delcher AL, Jaiswal P, Mockaitis K, Liston A, Mane SP: The genome of woodland strawberry (Fragaria vesca). Nat Genet 2011, 43:109-116.
29. Wu GA, Prochnik S, Jenkins J, Salse J, Hellsten U, Murat F, Perrier X, Ruiz M, Scalabrin S, Terol J: Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication. Nat Biotechnol 2014, 32:656-662.
30. Xu Q, Chen L-L, Ruan X, Chen D, Zhu A, Chen C, Bertrand D, Jiao W-B, Hao B-H, Lyon MP: The draft genome of sweet orange (Citrus sinensis). Nat Genet 2013, 45:59-66.
31. D’Hont A, Denoeud F, Aury J-M, Baurens F-C, Carreel F, Garsmeur O, Noel B, Bocs S, Droc G, Rouard M: The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 2012, 488:213-217.
32. Verde I, Abbott AG, Scalabrin S, Jung S, Shu S, Marroni F, Zhebentyayeva T, Dettori MT, Grimwood J, Cattonaro F: The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genet 2013, 45:487-494.
33. Wu J, Wang Z, Shi Z, Zhang S, Ming R, Zhu S, Khan MA, Tao S, Korban SS, Wang H: The genome of the pear (Pyrus bretschneideri Rehd.). Genome Res 2013, 23:396-408.
34. Shaver JM, Oldenburg DJ, Bendich AJ: Changes in chloroplast DNA during development in tobacco, Medicago truncatula, pea, and maize. Planta 2006, 224:72-82.
35. Lutz KA, Wang W, Zdepski A, Michael TP: Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing. BMC Biotechnol 2011, 11:54.
36. Tilgner H, Grubert F, Sharon D, Snyder MP: Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc Natl Acad Sci 2014, 111:9869-9874.
    The authors use single-molecule long-reads to assemble a haplotype specific transcriptome rich with gene isoforms which has broad applications for crop genomics.
37. Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, Senin P, Wang W, Ly BV, Lewis KL: The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 2008, 452:991-996.
38. Huang S, Li R, Zhang Z, Li L, Gu X, Fan W, Lucas WJ, Wang X, Xie B, Ni P: The genome of the cucumber, Cucumis sativus L. Nat Genet 2009, 41:1275-1281.
39. Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, Fontana P, Bhatnagar SK, Troggio M, Pruss D: The genome of the domesticated apple (Malus × domestica Borkh.). Nat Genet 2010, 42:833-839.
40. Argout X, Salse J, Aury J-M, Guiltinan MJ, Droc G, Gouzy J, Allegre M, Chaparro C, Legavre T, Maximova SN: The genome of Theobroma cacao. Nat Genet 2011, 43:101-108.
41. Al-Dous EK, George B, Al-Mahmoud ME, Al-Jaber MY, Wang H, Salameh YM, Al-Azwani EK, Chaluvadi S, Pontaroli AC, DeBarry J: De novo genome sequencing and comparative genomics of date palm (Phoenix dactylifera). Nat Biotechnol 2011, 29:521-527.


42. Guo S, Zhang J, Sun H, Salse J, Lucas WJ, Zhang H, Zheng Y, Mao L, Ren Y, Wang Z: The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nat Genet 2013, 45:51-58.
43. Consortium PGS: Genome sequence and analysis of the tuber crop potato. Nature 2011, 475:189-195.
44. Lin T, Zhu G, Zhang J, Xu X, Yu Q, Zheng Z, Zhang Z, Lun Y, Li S, Wang X: Genomic analyses provide insights into the history of tomato breeding. Nat Genet 2014, 46:1220-1226.
    The authors resequence 360 wild and cultivated tomato accessions and uncover two independent sets of QTLs that increased fruit size 100x compared to wild tomatoes.
45. Qin C, Yu C, Shen Y, Fang X, Chen L, Min J, Cheng J, Zhao S, Xu M, Luo Y: Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. Proc Natl Acad Sci 2014, 111:5135-5140.
46. Hirakawa H, Shirasawa K, Miyatake K, Nunome T, Negoro S, Ohyama A, Yamaguchi H, Sato S, Isobe S, Tabata S: Draft genome sequence of eggplant (Solanum melongena L.): the representative solanum species indigenous to the old world. DNA Res 2014. dsu027.
47. Elert E: Rice by the numbers: a good grain. Nature 2014, 514:S50-S51.
48. Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A: The Sorghum bicolor genome and the diversification of grasses. Nature 2009, 457:551-556.
49. Consortium IBGS: A physical, genetic and functional sequence assembly of the barley genome. Nature 2012, 491:711-716.
50. Chalhoub B, Denoeud F, Liu S, Parkin IA, Tang H, Wang X, Chiquet J, Belcram H, Tong C, Samans B: Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science 2014, 345:950-953.
    The authors used sequences from the diploid parental species to separate scaffolds by subgenome in the allopolyploid Brassica napus genome, providing an unprecedented look at the early events of polyploidy.
51. Mayer KF, Rogers J, Doležel J, Pozniak C, Eversole K, Feuillet C, Gill B, Friebe B, Lukaszewski AJ, Sourdille P: A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 2014, 345:1251788.
    The authors use flow cytometry to separate individual chromosomes for sequencing the hexaploid wheat genome unveiling emergent properties related to grain composition.
52. Pfeifer M, Kugler KG, Sandve SR, Zhan B, Rudi H, Hvidsten TR, Mayer KF, Olsen O-A: Genome interplay in the grain transcriptome of hexaploid bread wheat. Science 2014, 345:1250091.
53. Jiang C-X, Wright RJ, El-Zik KM, Paterson AH: Polyploid formation created unique avenues for response to selection in Gossypium (cotton). Proc Natl Acad Sci 1998, 95:4419-4424.
54. Paterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin D, Llewellyn D, Showmaker KC, Shu S, Udall J: Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 2012, 492:423-427.
55. Myburg AA, Grattapaglia D, Tuskan GA, Hellsten U, Hayes RD, Grimwood J, Jenkins J, Lindquist E, Tice H, Bauer D: The genome of Eucalyptus grandis. Nature 2014, 510:356-362.
56. Birol I, Raymond A, Jackman SD, Pleasance S, Coope R, Taylor GA, Saint Yuen MM, Keeling CI, Brand D, Vandervalk BP: Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics 2013. btt178.
57. Nystedt B, Street NR, Wetterbom A, Zuccolo A, Lin Y-C, Scofield DG, Vezzi F, Delhomme N, Giacomello S, Alexeyenko A: The Norway spruce genome sequence and conifer genome evolution. Nature 2013, 497:579-584.
58. Resende MF, Muñoz P, Resende MD, Garrick DJ, Fernando RL, Davis JM, Jokela EJ, Martin TA, Peter GF, Kirst M: Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda L.). Genetics 2012, 190:1503-1510.
59. Varshney RK, Chen W, Li Y, Bharti AK, Saxena RK, Schlueter JA, Donoghue MT, Azam S, Fan G, Whaley AM: Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat Biotechnol 2012, 30:83-89.
60. Wang W, Feng B, Xiao J, Xia Z, Zhou X, Li P, Zhang W, Wang Y, Møller BL, Zhang P: Cassava genome from a wild ancestor to cultivated varieties. Nat Commun 2014:5.
61. Cannarozzi G, Plaza-Wüthrich S, Esfeld K, Larti S, Wilson YS, Girma D, de Castro E, Chanyalew S, Blösch R, Farinelli L: Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef). BMC Genomics 2014, 15:581.
62. McCouch S, Baute GJ, Bradeen J, Bramel P, Bretting PK, Buckler E, Burke JM, Charest D, Cloutier S, Cole G: Agriculture: feeding the future. Nature 2013, 499:23-24.
63. Kim MY, Lee S, Van K, Kim T-H, Jeong S-C, Choi I-Y, Kim D-S, Lee Y-S, Park D, Ma J: Whole-genome sequencing and intensive analysis of the undomesticated soybean (Glycine soja Sieb. and Zucc.) genome. Proc Natl Acad Sci 2010, 107:22032-22037.
64. Li Y, Zhao S, Ma J, Li D, Yan L, Li J, Qi X-t, Guo X-s, Zhang L, He W-m: Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing. BMC Genomics 2013, 14:579.
65. Li Y-h, Zhou G, Ma J, Jiang W, Jin L-g, Zhang Z, Guo Y, Zhang J, Sui Y, Zheng L: De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nat Biotechnol 2014, 32:1045-1052.
66. Lane AK, Niederhuth CE, Ji L, Schmitz RJ: pENCODE: a plant encyclopedia of DNA elements. Annu Rev Genet 2014:48.
    This review provides a detailed explanation of the methods and utility of ENCODE projects and the future of crop genomics.
67. Shan Q, Wang Y, Li J, Gao C: Genome editing in rice and wheat using the CRISPR/Cas system. Nat Protocols 2014, 9:2395-2410.
