SAJB-00962; No of Pages 9 South African Journal of Botany xxx (2013) xxx–xxx


South African Journal of Botany


The promise of genomics for a “next generation” of advances in higher-level legume molecular systematics

Jeff J. Doyle ⁎

Department of Plant , Cornell University, 412 Mann Library Building, Ithaca, NY 14853, USA

Article history: Received 3 April 2013. Received in revised form 20 June 2013. Accepted 22 June 2013. Available online xxxx. Edited by JS Boatwright.

Keywords: Leguminosae; Legumes; Phylogeny; Next-generation sequencing

Abstract

The years since the first International Legume Conference in 1978 have seen a veritable revolution in molecular phylogenetics. The first two volumes of Advances in Legume Systematics series, which were based on that conference, contained no information on DNA-level variation. The next volume in the series, in 1987, had a chapter on the potential of DNA approaches and results of some early studies that used restriction enzyme mapping. The 1990s saw the application of chloroplast gene sequencing to family-level phylogenetic problems that culminated in the studies from the first decade of the present century that have provided the working phylogenetic hypotheses for the family. The first full legume nuclear genome sequences have appeared more recently, and, fueled by the advent of “next-generation” sequencing in the late 2000s, there is now a flood of genomic and transcriptomic data that is again revolutionizing biology in the way that the molecular revolution did previously. How to take advantage of this opportunity is a key question for legume systematists, particularly its own “next generation”. The mere availability of massive amounts of data does not guarantee that all phylogenetic problems in legumes will be resolved once and for all; “megadata” is not a panacea, and the apparent rapid radiation of the family and its constituent clades represents a serious technical challenge. Moreover, major analytical controversies are simmering, notably between proponents of concatenation – the conventional way to analyze genome-scale data – and those who favor a coalescent-based tree approach. Assuming that a stable, well-resolved phylogeny based on the nuclear genome is eventually produced, it will be useful not only for classification, but also for addressing questions involving homoplasy, such as the potential multiple origins of nodulation in the family. The rich legacy of the International Legume Conferences, embodied in the Advances in Legume Systematics, is that the increasing refinement of phylogenetic hypotheses serves the full range of comparative studies on the myriad facets of this huge, diverse, and fascinating family. © 2013 SAAB. Published by Elsevier B.V. All rights reserved.

⁎ Tel.: +1 1 607 255 7972. E-mail address: [email protected].

1. Introduction

This paper is dedicated to Dr. Roger N. Polhill, on the occasion of his 75th birthday (November 21, 2012). It was Roger Polhill, more than anyone else, who is responsible for the International Legume Conferences (ILC), having been the principal organizer of ILC1, at the Royal Botanic Gardens, Kew, in July, 1978, when he was Curator of Legumes. He was also the editor, with Peter H. Raven, of the first two volumes of Advances in Legume Systematics (Polhill and Raven, 1981a,b), of which this set of papers comprises the twelfth number.

The paper is based on the plenary address I was honored to present at ILC6, in Johannesburg, South Africa, in January of 2013. In preparation for that talk, I wrote to Roger, and asked him some questions about how ILC1 and Advances came to be. With his characteristic humility and generosity, he replied with a very informative message that downplayed his role and credited others with much of the inspiration and work that made ILC1 so successful. In particular, he credited Dr. Boris Krukoff, who had begun his botanical career as a penniless White Russian immigrant to the United States in the 1920s, was a botanical explorer interested in economic plants, amassed a fortune, and became a botanical philanthropist, funding fellowships at botanical gardens such as Kew, New York, Leiden, and Missouri. According to Roger:

“Krukoff made up a seed collection of samples running into thousands… In 1976, on the recommendation of Peter Raven, he was willing to present this collection to Kew on the understanding that it would be used to assess the distribution of biologically significant compounds and predict what further genera and species should be screened. Professor Patrick Brenan was Director of Kew at that time and, from his own distinguished systematic work on legumes, agreed that the Legume Section I headed at that time should help co-ordinate the expected objectives that was to include a conference within a reasonable time frame. As chemical data began to accumulate it was clear that there was some sort of reciprocal pattern in the distribution of alkaloids and amino acids, and some individual compounds, such as canavanine, were

0254-6299/$ – see front matter © 2013 SAAB. Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.sajb.2013.06.012

Please cite this article as: Doyle, J.J., The promise of genomics for a “next generation” of advances in higher-level legume molecular systematics, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.012

showing systematically significant distributions. One morning it came to me that we botanists couldn't hold up our part of the plan unless we set out a clear framework of generic relationships to test the chemical data. The rest, as they say, is history. I just wrote to everyone I could think of and asked if they would help – and we had a fantastic response bar a few mishaps – and managed to accumulate quite a lot of data from other disciplines to put into the melting pot.”

The conference that was organized, with much help from Peter Raven, was, of course, ILC1. What struck me from Roger's message in addition to his humility, and as I considered both the contributions that have comprised the Advances 1–11 and the topics of talks at ILC6, is how diverse the definition of “systematics” has been in this community from the very beginning. The primary product of ILC1 for most legume researchers is Advances in Legume Systematics, Part 1 (Polhill and Raven, 1981a), which was, until replaced by Legumes of the World (Lewis et al., 2005), the “bible” of legume systematics – the definitive classification and the source of hypotheses to be tested with newly emerging molecular methods. Yet, as Roger notes, that there was an ILC1 (and hence, an Advances, Part 1) in the first place can be traced back to a desire to predict where interesting chemistry might be found in the family. Thus, phylogeny and classification were in service to other areas of biology, rather than specific ends in themselves. This seems very much in keeping with G. G. Simpson's (1961) definition of systematics as “the scientific study of the kinds and diversity of organisms and of any and all relationships among them.”

In this paper, however, I will for the most part ignore the many fascinating areas of legume biology covered in Advances, Part 2 (Polhill and Raven, 1981b) and focus on the continuing goal of elucidating higher-level relationships within the family, generally using genera as the units of study. Specifically, I will deal with DNA data, which since ILC1 have become the primary means of hypothesizing relationships and reconstructing phylogenies. For my conference presentation, I was asked to talk about the ongoing technological revolution in genomics, and what massive amounts of sequence data can and cannot do to advance our understanding of higher-level legume relationships. Accordingly, I will focus primarily on the low copy genes that make up the bulk of coding sequences in the nuclear genome, but which have as yet made relatively little impact on legume phylogenetics, and will discuss how “next generation” technologies might–or might not–finally allow the potential of these genes to be realized. I will discuss other sequences (e.g., chloroplast DNA, ribosomal genes) relatively briefly and not in proportion to their important contributions to our current state of phylogenetic knowledge in the legumes.

2. The early days of legume molecular systematics: The RFLP era

It is now difficult to conceive of a time when there was no plant molecular systematics. Yet consider the career of one of the foremost pioneers of plant molecular systematics, Jeffrey Palmer. Although he published on the structure of legume chloroplast genomes in 1981 (Palmer and Thompson, 1981), his earliest major molecular phylogenetic paper was a 1982 study using chloroplast DNA restriction fragment length polymorphisms (cpDNA RFLPs) to hypothesize relationships among tomato (Solanum lycopersicon) and its relatives (Palmer and Zamir, 1982). Palmer's first cpDNA phylogeny paper involving legumes was on Pisum in 1985 (Palmer et al., 1985). That year I published my first legume molecular systematics paper (Doyle and Beachy, 1985) using the same RFLP approach but targeting the nuclear ribosomal gene cistron (nrDNA), whose use was promoted by one of my mentors, the pioneering plant molecular systematist, Elizabeth Zimmer. However, it was Palmer and a long line of postdoctoral fellows and students whose work popularized chloroplast DNA as the primary source of molecular characters in the pre-sequencing period, due to its high copy number and conservative evolutionary pattern.

RFLP mapping was technically demanding, and was limited by issues of homology primarily to lower taxonomic levels, though efforts were made to infer relationships among legume genera (Lavin and Doyle, 1991; Doyle and Doyle, 1993; Bruneau et al., 1995). Thus, in the pre-sequencing era the major contribution of cpDNA was from major structural mutations to the typically very conservative chloroplast genome. The chloroplast genome is very stable in size and gene order across angiosperms, but two novel chloroplast genome structures were identified in a subset of legumes: 1) Vicia faba (Koller and Delius, 1980) and Pisum (Palmer and Thompson, 1981, 1982) were found to lack the large inverted repeat (IR) that includes the 16S and 23S ribosomal RNA genes (Palmer et al., 1987); and 2) Pisum was found to have a 50 kilobase (kb) inversion in the large single copy region (Palmer and Thompson, 1981, 1982). It remained to screen large numbers of legume species and genera for the presence of these mutations. Screening for the first of these mutations resulted in the recognition of an “Inverted Repeat Lacking Clade” (IRLC) within Polhill and Raven's (1981a) “Temperate Herbaceous Group” of papilionoid legumes that included the “galegoid” tribes but not Loteae or Coronilleae (Palmer et al., 1987; Lavin et al., 1990; Liston, 1995). The 50 kb inversion was found to characterize most but not all papilionoids (Doyle et al., 1996a). The naturalness of the groups identified by these mutations has withstood the tests of chloroplast sequence phylogenies (e.g., Wojciechowski et al., 2004; Cardoso et al., 2012) and they remain useful markers for major papilionoid groups. Other chloroplast structural mutations were found that characterized smaller groups of legumes, for example a 78 kb inversion unique to most Phaseolineae (Bruneau et al., 1990) and various gene and intron losses (e.g., Doyle et al., 1995; Bailey et al., 1997; Lai et al., 1997; Jansen et al., 2008).

3. The PCR/sequencing revolution and chloroplast genes

DNA sequencing had been available for many years (e.g., Sanger et al., 1977) but had found little use in plant molecular systematic studies because of the difficulty of obtaining sufficient amounts of genes for sequencing. This changed with the use of thermostable polymerases that permitted the automation of targeted sequence amplification: the polymerase chain reaction (PCR; Saiki et al., 1985). Now, in theory, the only limitation was on designing oligonucleotide primers to amplify sequences of interest. Here, once again, the evolutionarily conservative chloroplast genome was the target of choice. In particular, the chloroplast-encoded large subunit of ribulose bisphosphate carboxylase (rbcL) became the focus of studies in many plant families, leading to the development of “universal” primers, and, in 1993, the publication in Annals of the Missouri Botanical Garden of a phylogenetic hypothesis for nearly 500 angiosperm species representing a broad range of families (Chase et al., 1993). This paper ushered in the era of unparalleled progress in higher-level plant molecular phylogenetics.

The Chase et al. (1993) paper included only three legume species, from three genera: Bauhinia, Albizia, and Medicago. However, even before the publication of this seminal work, Michael Wink and his group in Heidelberg were exploring the utility of rbcL for legume systematics. In a series of papers (Käss and Wink, 1992, 1995, 1996, 1997a,b) they produced phylogenetic hypotheses for the family, concentrating particularly on Lupinus and its genistoid relatives. In parallel and beginning later, another series of family-wide phylogenetic hypotheses were generated focusing on phaseoloid legumes (Doyle, 1995; Doyle et al., 1997; Kajita et al., 2001).

Additional chloroplast genes were explored for phylogenetic utility at different taxonomic levels. An early work by Taberlet et al. (1991) identified non-coding regions useful among taxa with low levels of divergence, and this work was continued in the “tortoise and hare” papers (Small et al., 1998; Shaw et al., 2005, 2007). Chloroplast sequences more variable than rbcL but not too variable to align with confidence

across genera were also identified, one of which, matK, was used by Wojciechowski and colleagues in 2004 to produce what remains the definitive phylogenetic hypothesis for papilionoid legumes. It is this phylogeny that was used in the seminal Lavin et al. (2005) paper that provided the first objectively derived divergence time estimates within the legumes and discussed the biogeography and diversification of the family; this hypothesis is also the basis for the phylogenetic classification of Legumes of the World (Lewis et al., 2005). More recently, Bruneau et al. (2008) published a comprehensive chloroplast gene phylogeny focusing on “caesalpinioid” and mimosoid legumes that became the definitive study for these groups. Other studies have used sequences of one or more chloroplast regions–coding or non-coding–to produce hypotheses of legume relationships, for example within major clades of papilionoids (e.g., Lavin et al., 2001; Stefanovic et al., 2009), but the Wojciechowski et al. (2004) and Bruneau et al. (2008) studies are the most comprehensive to date using chloroplast DNA.

4. The quest for nuclear genes

As noted above, cpDNA eclipsed its early rival in plant molecular systematics, nuclear ribosomal genes. PCR leveled the playing field, however, and when Baldwin (1992) showed that “universal” amplification primers from fungi could be used to amplify angiosperm sequences for the nuclear ribosomal gene internal transcribed spacers (ITS), another revolution began. The nrDNA ITS became the workhorse for molecular phylogenetic studies, and by 2003, Alvarez and Wendel noted that over 30% of all plant phylogenies were based exclusively on the nrDNA ITS. Because it can be very difficult to align as sequence divergence increases, it has most often been used at low taxonomic levels, though in legumes it has been used successfully, often in concert with chloroplast genes, for infratribal studies, for example in Amorpheae (McMahon and Hufford, 2004), Indigofereae (Schrire et al., 2009), and in robinioid legumes (Lavin et al., 2003). The ITS has not had as much impact on higher-level systematics, though it is a target of attempts to create supermatrices or supertrees even today (Legume Phylogeny Working Group, 2013). Despite its undisputed success and popularity, the ITS region is fraught with technical and theoretical problems, leading Alvarez and Wendel (2003) to state, “we recommend that ITS no longer be routinely utilized for phylogenetic analysis, opting instead for using several or more different single-copy nuclear loci”. This, however, has been more easily said than done, and the ITS remains the nuclear genome region of choice for most plant systematists.

As early as Advances in Legume Systematics, Part 3 (Stirton, 1987), low copy nuclear (lcn) genes were discussed as a potential source of characters for legume systematics (Doyle, 1987), and legume storage protein genes were among the earliest plant genes sequenced (e.g., Sun et al., 1981) and used in comparative sequence analyses (Schuler et al., 1983; Doyle et al., 1986). With their virtually unlimited numbers–for example, the Medicago nuclear genome is estimated to have 47,845 genes (Young et al., 2011)–biparental mode of evolution and ability to recombine (both in contrast to cpDNA), lower levels of concerted evolution than ribosomal genes, and a mixture of conserved exons and variable introns, lcn genes present a seemingly obvious solution to the problem of obtaining numerous independent estimates of phylogenetic relationships at a wide range of taxonomic levels (e.g., Doyle and Doyle, 1999; Sang, 2002). Promisingly, some of the earliest plant phylogenetic studies to use lcn genes were in legumes, most importantly at relatively high taxonomic levels as in the use of phytochrome genes for studying relationships among millettioid genera (Lavin et al., 1998), but also at the species level within Glycine (Doyle et al., 1996b).

However, identifying lcn genes that are useful for a new group of taxa is a major challenge. In contrast to nrDNA ITS or chloroplast genes, primers are not nearly as “universal” because even conserved exons are composed of codons with silent sites that can vary even among closely related species. Moreover, given the high frequency of in plants (Wood et al., 2009) and rampant birth and death of lcn genes (Nei and Rooney, 2005; Lynch and Conery, 2003), a set of primers that amplifies a single copy of a gene in one taxon may amplify several duplicates (paralogues) even in a close relative. Because only orthologous genes can be used to construct accurate species trees (e.g., Doyle, 1992), this is a serious problem.

Therefore, much effort has been expended in identifying sets of orthologous genes, for example the conserved orthologue sets (COS) in asterids (Wu et al., 2006). In legumes, Doug Cook at the University of California, Davis, led a broad-based effort that identified 167 putatively orthologous genes across seven crop and model legume species, and used these genes to show conservation of synteny across the papilionoid subfamily (Choi et al., 2004). Scherson et al. (2005) developed four of these loci for studies at lower taxonomic levels, using Astragalus as a case study, and McMahon (2005) used one of these genes for phylogenetic studies in Amorpheae. Over 100 of these genes were tested subsequently on over 90 legume genera representing all three subfamilies (Choi et al., 2006). Alignment was difficult, and it was clear that paralogy remained a serious problem in many taxa, but it was possible to construct a phylogeny for using a concatenated alignment of eight genes for 40 diverse papilionoid species that gave results consistent with chloroplast phylogenies, and thus suggested that these genes were orthologous in that set of taxa (Choi et al., 2006).

There remain relatively few examples of lcn genes being used for higher level plant phylogenetics, though recently Manzanilla and Bruneau (2012) used the lcn gene, sucrose synthase (originally developed by Choi et al., 2004) along with chloroplast sequences, to reconstruct phylogenies of caesalpinioid legumes. Discussions of the utility of lcn genes more often have focused on lower taxonomic levels, probably because the need for multiple independent markers has received more attention in species-level studies. In legumes, a few examples are the use of CYCLOIDEA within Lupinus (Hughes and Eastwood, 2006), and CNGC5, β-cop-like, and Ga3ox1 in Medicago (Maureira-Butler et al., 2008; Steele et al., 2010). Efforts continue to assist systematists identify candidate lcn genes for such studies, for example taking advantage of the large number of expressed sequence tags–short sequences from the ends of complementary DNA (cDNA, produced from messenger RNAs)–deposited in GenBank to facilitate the construction of PCR primers (e.g., Ilut et al., 2012). The use of ESTs on a phylogenomic scale to address higher taxonomic level questions was explored (Sanderson and McMahon, 2007), and a supermatrix for over 2000 legume taxa was analyzed, providing results consistent with prevailing views of legume phylogeny (McMahon and Sanderson, 2006). However, even at low taxonomic levels there has been considerable sentiment that lcn genes require much effort to be used successfully (e.g., Hughes et al., 2006), and that they have not lived up to their potential (Feliner and Rossello, 2007). Indeed, Feliner and Rossello (2007) concluded that the nrDNA ITS remained the best choice for a nuclear gene – as their title put it, “better the devil we know”.

5. The next generation: breakthroughs in sequencing technology

The last decades have witnessed a tremendous growth in the number of DNA sequences deposited in databases such as GenBank. A graph at www.ncbi.nlm.nih.gov/genbank/genbankstats-2008/ shows that the GenBank database grew from around 2 million sequences in 1998 to 80 million by 2007. Up to 2007, nearly all DNA sequences were produced using the Sanger DNA sequencing technology (Sanger et al., 1977). But how DNA was sequenced had changed over the years, from large format gels using radioactively labeled samples, and X-ray films that needed to be read individually, to robotically controlled automated capillary sequencing with fluorescent dye labeled samples producing electropherograms generating automated sequence files. This not only streamlined the sequencing process, but led to constantly decreasing costs per base. As illustrated graphically at

www.genome.gov/sequencingcosts (Wetterstrand, 2013), the cost of sequencing dropped from nearly $10,000 US/megabase (one million bases) in 2001 to a few hundred US dollars per raw megabase in 2007. The curve closely followed Moore's law (Moore, 1965), named for the founder of Intel, Gordon Moore, who in the 1960s predicted that the number of transistors that could fit on an integrated circuit would double every two years for the foreseeable future. This relationship has proven to be an accurate predictor for many areas of technology, such as those that underlie improvements to DNA sequencing during the period in question.

In 2008, however, the per base cost of sequencing departed precipitously from the Moore's Law curve, and by the end of 2012 had fallen to around 10 cents (US) per megabase. This dramatic change was due to the advent of “next-generation” (next-gen; NGS) high-throughput sequencing using technologies radically different from Sanger sequencing (e.g., Liu et al., 2012). These technologies also differ from one another, but all have in common the ability to generate massive amounts of sequence information in a short period of time, by bringing together nanotechnology, advanced imaging methods, and high-powered computing. Instead of producing a single sequence from a population of PCR products, next-gen methods such as those used by Illumina (Bentley et al., 2008) or 454 (Margulies et al., 2005) provide a short (<500 base) sequence “read” from one or both ends of each of the millions of individual molecules in a library of DNA sequences constructed, for example, from total genomic DNA or from complementary DNA (cDNA, produced from the total mRNA) of a tissue sample. Reads are then assembled into contiguous sequences (“contigs”), either de novo or using a reference genome sequence. Low costs per sequenced base are achieved by the large volume of sequences produced per lane (Illumina) or picotiter plate (454). Thus, using the cost per megabase given in Wetterstrand (2013) as a rough guide, the cost of sequencing 1100 Mb–the genome size of Glycine max–would be around $110 US. Multiply that number by around 700, and the cost of generating the number of base pairs representing a sequence for each legume genus would be under $100,000 US.

Of course, this amount of money would not be close to enough for the task of generating a usable genomic sequence for each genus. Assembling an accurate genome sequence requires multiple reads at each position in the genome, and this depth of coverage is even more critical for short read whole genome shotgun sequencing than for the older methods such as those used to generate the original draft human genome sequence. And there are other costs, as well. The cost of constructing a library for sequencing varies widely, but can be tens to hundreds of dollars depending on the method. The overall costs of generating the actual sequence data, including library costs are dwarfed by the time and effort required to analyze the data – for example, a 2009 Nature Methods Supplement commentary titled “Next Generation Gap” warned that “the coveted $1000 genome may come with a $20,000 analysis price tag” (McPherson, 2009).

The $1000 genome is very likely to be a reality in the near future. As delegates met at ILC6, the Archon Genomics X-Prize (http://genomics.xprize.org/competition-details/prize-overview) was being advertised, offering $10 million US to the team that could “…submit 100 human genome sequences in 30 days or less at a maximum cost of $1000 USD per genome sequence, attain an accuracy score of no more than one error per 1,000,000 bases, present each genome as 98% complete, and provide accurate haplotype phasing…”. The human genome, with a haploid size of around 3000 Mb, is larger than the mean value of 2129.5 Mb for the 722 legume species in the RBG Kew Angiosperm C-value database (as of March, 2013: http://data.kew.org/cvalues/; range: 298 Mb for Leucaena macrophylla to 26,797 Mb for tetraploid V. faba), which includes both diploid and polyploid species and a high representation of the large genome vicioid genera such as Lathyrus (47 species, mean 7493.72 Mb/C) and Vicia (81 species, mean 4956.56 Mb/C). At a price of $1000 US per sample, a genome sequence from one species of each legume genus would cost under $1,000,000 US – well within the range of a US NSF Plant Genome Research Program five year award.

But is it necessary to obtain fully assembled genome sequences? For phylogeny reconstruction, the answer is most definitely “no”. The large fraction of the typical plant genome that is composed of moderately to extremely highly repeated sequences is of little value for conventional phylogeny reconstruction. Instead, it is the set of ca. 50,000 lcn genes (e.g., 47,845 in Medicago truncatula: Young et al., 2011) that is the most attractive target for such studies. To obtain usable contigs of such genes by genomic sequencing, relatively deep coverage is required. Shallow coverage, from “genome skim” data, is useful for abundant sequence classes in the DNA used for library construction. Thus, genome skims produce sequences of the nrDNA family, and also–because organellar DNA is almost always a major contaminant in nuclear DNA preparations–nearly complete sequences of the chloroplast and mitochondrial genomes (Straub et al., 2012). Whole chloroplast genome sequences are playing an increasingly prominent role in angiosperm phylogeny (e.g., Wu and Ge, 2012), and would certainly be useful for legume systematics.

There are several methods that produce data preferentially from lcn genes suitable for constructing gene trees, the two most important of which are transcriptome sequencing and sequence capture. The transcriptome is the total pool of transcribed RNA in a cell or tissue; cDNA libraries made from mRNA are a major target of sequencing projects, and next-gen sequencing of a transcriptome provides data on the transcribed regions of all genes expressed in that cell or tissue. Illumina sequencing provides such deep coverage that reads for nearly all genes in the genome can be obtained from the transcriptome of a tissue or organ with multiple cell types. In a study targeting the leaf transcriptomes of several perennial Glycine species for studying photosynthesis in allopolyploids (Ilut et al., 2012), we obtained reads for 88% of the over 65,000 gene models in the soybean genome (Schmutz et al., 2010). However, this does not mean that we could construct gene trees from over 50,000 genes, even for the small number of species in our study. Transcriptome read counts reflect the abundance of RNAs in the tissue, and this is strongly biased toward genes involved in the primary function of the tissue – photosynthesis in the case of our leaf transcriptomes. Read counts ranged from one to over 100,000, and the top 5% of the genes accounted for over 65% of the reads (Ilut et al., 2012). The practical result of this is that relatively few genes will have coverage across their entire length – the ideal situation for phylogenetic analysis. Because mRNAs lack introns, transcriptomic data are most useful at higher taxonomic levels.

The “1000 plants” (1KP; http://www.onekp.com/index.html) project is sequencing transcriptomes from (as the name indicates) 1000 taxa across the plant kingdom, including around 20 taxonomically diverse legume genera. Currently, a group led by Jim Leebens-Mack (U. of Georgia, USA) and Steven Cannon (Iowa State U., USA) is conducting a phylogenomic study on these taxa using hundreds of gene trees to test hypotheses concerning the timing of a polyploidy event known from most papilionoid legumes but lacking in caesalpinioids (Cannon et al., 2010; Doyle, 2012). Determining whether this event occurred in the common ancestor of all papilionoids or somewhat later is of interest in understanding the possible connection of polyploidy with diversification of the subfamily, and in understanding the possible connection between polyploidy and the origin of nodulation in papilionoids (Doyle, 2011; Young et al., 2011).

Next-gen sequencing can also be used in several ways to study lcn genes. Heterozygosity, whether in a “pure” species or a diploid or allopolyploid , is a serious problem for conventional sequencing, because a single sequence is obtained for the entire pool of PCR amplification products, leading to ambiguity and, when insertions or deletions occur between alleles, unreadable sequences. Next-gen methods report individual reads, so it is possible to determine the sequences of all alleles at a heterozygous locus. However, to obtain millions of reads from one locus of a single individual would be prohibitively expensive. A

solution is to amplify several genes of interest from each of many individuals, pool the amplicons by individual, construct libraries for each individual in which all of the sequences bear an identifying sequence tag, then run several individuals in the same Illumina lane (or 454 picotiter plate). Informatic methods are then used to separate sequences by locus and individual. We used this approach to study variation at several genes in 72 accessions representing diploid members of the Medicago sativa complex, but it was not without problems, notably in vitro recombination when target sequences were re-amplified to obtain sufficient amounts of gene products (Sakiroglu et al., 2012).

A more common way of targeting many genes at once is sequence capture (Cronn et al., 2012; Grover et al., 2012) in which a custom-made microarray chip is spotted with hundreds to thousands of “bait” sequences, each from one or more regions of genes of interest. Total DNA is hybridized to the chip, and sequences homologous to the bait sequences are preferentially recovered for next-gen sequencing. The feasibility of sequence capture for phylogenetic studies at a range of taxonomic levels has been illustrated for vertebrates (Faircloth et al., 2012; Lemmon and Lemmon, 2012).

There are many useful reviews of next-gen methods (e.g., Metzker, 2010). A set of very useful papers (some of which are cited in this section) was produced recently as part of a symposium on the use of next-gen methods in plant biology, with an overview and excellent introduction provided by the organizers (Egan et al., 2012). The take-home message is that for many years generating sequence data was a limiting factor in phylogenetic studies, but that this is now no longer the case, and the flood of sequence data will only increase in volume. Already, “next-generation” methods are being called “second-generation” because third-generation methods (Schadt et al., 2011) are providing longer reads at still lower costs. With ever-improving technology it seems likely that even studies that require dense sampling of species to address biogeographic questions (e.g., Simon et al., 2011), or necessitate extensive sampling of individuals for populational analyses may, in the future, turn to genome-wide sequencing methods. Already, methods that permit low cost sampling across entire genomes, such as genotyping by sequencing (GBS; Elshire et al., 2011), which combines restriction enzyme digestion with next-generation sequencing, are being used to survey variation in many individuals, for example 840 individuals in a study of switchgrass (Panicum virgatum; Lu et al., 2013).

6. “Megadata” is (are?) not a panacea

… may not have the same topology, let alone the same branch lengths, as the tree relating the organisms themselves – the “species tree”. The gene tree–species tree problem has been discussed at length (e.g., Doyle, 1992; Maddison, 1997); in a recent paper, Edwards (2009) reviewed the phenomena commonly incriminated in the gene tree–species tree problem: deep coalescence (lineage sorting), mistaken orthology, and hybridization/introgression – and added a fourth, branch length heterogeneity.

Large numbers of genes or whole nuclear genomes can be useful for phylogeny reconstruction for two rather different reasons. First, there is obviously a positive correlation between the number of nucleotides sequenced and the number of characters available for phylogeny reconstruction. Taking advantage of this, a common method in phylogenomics is to string together (concatenate) many genes or DNA segments in a single large supermatrix that can then be analyzed to produce a “genome tree”. This should, and generally does, lead to convergence on a single, highly resolved “genome tree” with high support values for relationships (e.g., Rokas et al., 2003). A recent plant example is Lee et al. (2011), in which over 20,000 protein-coding genes were used to produce a tree for over 100 genera of land plants that was fully resolved, with bootstrap support greater than 95% for many nodes. This approach does not lead to “ending incongruence” (Gee, 2003) but merely hides it. The “noise” of incongruent data remains, but it is swamped out by the overwhelming signal from thousands of nucleotides that, it is hoped, are tracking the true pattern of .

The second use for large numbers of nuclear genes or gene regions is to extract information from each individual historically independent segment of DNA – a “gene” in the sense used in coalescent theory. The legume chloroplast genome is a single “coalescent gene” because in nature chloroplast genomes with different histories are not known to recombine (e.g., Doyle, 1992). In the nuclear genome, as Rosenberg and Nordborg (2002) put it, due to recombination “unlinked or loosely linked loci can often be viewed as independent replicates of the evolutionary process” each of which can yield information on the complex history of species. Many genes may track the overall history of the populations sampled, but others will instead track introgressions from other species. Two genes can suggest different species relationships due to incomplete lineage sorting of ancestral polymorphisms; most functional genes will show evidence of purifying selection, but others will be under other forms of selection, resulting in very different branch lengths and potentially different topologies, as well.
The Edwards (2009) paper mentioned above is titled, provocatively, Even beyond the basic informatics issues involved in making sense “Is a new and general theory of molecular systematics emerging?” In it, of next-gen data, the sheer amount of data does not overcome the dif- he makes the case that a new “species tree paradigm” should replace ficulties in using such data for addressing phylogenetic questions. the current “concatenation paradigm” because concatenation merges This does not make systematics any different from other fields of bi- the historical “replicates” (Rosenberg and Nordborg, 2002) into a single ology being transformed by the availability of massive amounts of se- estimate, in which the richness of the many stories told by individual quence data. Journals such as Nature are full of warnings that genes is lost. Moreover, he points to studies showing that this single es- researchers not be seduced by data volume. For example, an article timate can be positively misleading (Edwards et al., 2007; Kubatko and on how cancer research is “missing the mark” ( Buchen, 2011) quoted Degnan, 2007). Although the gene tree is the unit of most current spe- a biostatistician, Lisa McShane, as saying, “Sometimes the glamour of cies tree methods employing coalescent theory (e.g., *beast: Heled the technology or the sheer volume of omics data seem to make in- and Drummond, 2010), Edwards (2009) notes that models underlying vestigators forget basic scientific principles.” And in a Nature article the new paradigm “go so far as to integrate out gene trees as nuisance on “megadata” (Vance, 2012), John Quackenbush, an informatician parameters”. If gene trees are not necessary, why construct them at involved in the Human Genome Project, said, “… you can be a little all? As noted above, gene tree construction is difficult from short read naïve in thinking that having a lot of data will suddenly solve all of next-gen data. 
Moreover, the coalescent gene can be as small as a single your problems. Big data is not a panacea”. Indeed, massive amounts nucleotide position given sufficient recombination. Accordingly, Bryant of data can be massively misleading, strongly supporting wrong con- et al. (2012) have developed a promising method that uses SNPs direct- clusions; to quote the noted American baseball philosopher, Yogi ly to reconstruct species trees under the coalescent, bypassing the need Berra, “We're lost, but we're making good time”. for gene trees constructed from contiguous stretches of sequence. Early Systematists have been here before, of course, many times in the attempts to use this method on genome-scale data from Medicago were history of the field, most recently with uncritical acceptance of molec- unsuccessful due to its large computational requirements (Yoder et al., ular phylogenetics. It remains important to remind ourselves con- in press); our work with a comparable number of taxa in Glycine has stantly that a tree showing the relationships of sequences sampled been more successful from a technical standpoint, though our results from a group of organisms–a “gene tree” (Pamilo and Nei, 1988)– suggest that combining large numbers of SNPs (or genes, in parallel

gene tree-based approaches) does not eliminate problems with paralogy (Bombarely and Doyle, unpublished data).

How the concatenation vs. species tree debate turns out remains to be seen, but it is clear that the mere availability of virtually unlimited data does not mean that all phylogenetic problems will be resolved. Reconstructing relationships among legumes will remain a serious challenge, given the rapid radiation of the family and resulting short branches at many critical places in the phylogeny (Lavin et al., 2005; Legume Phylogeny Working Group, 2013).

7. Beyond phylogeny: Roger Polhill's legacy

Regardless of which phylogenetic approach ultimately wins out (and what new approach eventually supplants both), neither does much more, by itself, than provide a phylogenetic hypothesis. A phylogenetic hypothesis is interesting in its own right, but, more importantly, it provides a framework for addressing other biological questions. As Dobzhansky (1973) famously said, “Nothing in biology makes sense except in the light of evolution”, to which a corollary has been added by systematists (e.g., http://systbio.org/teachevolution.html) that “nothing in evolution makes sense except in the light of phylogeny”. So let us assume that, in the next decade or so, a definitive phylogeny of legumes is produced that remains stable in the face of subsequent methodological advances in phylogeny reconstruction. This will provide a solid foundation for an improved classification of the family, and for addressing such issues as conservation and biogeography (e.g., Simon et al., 2011; Saerkinen et al., 2012) as well as patterns of species diversification (e.g., Drummond et al., 2012).

A well-supported phylogeny is also fundamental to distinguishing homology from homoplasy. Understanding homology underlies much of biology, from categorizing relationships among genes (e.g., paralogy, orthology, homoeology), to addressing questions of the origins of morphological structures (Patterson, 1988). Similarly, instances of incongruence with phylogenetic pattern are also of great interest. In 1996, Michael Sanderson, one of the foremost thinkers in systematic biology in the last decades, who has made significant contributions to legume phylogeny, co-edited (with Larry Hufford) a book titled Homoplasy: The Recurrence of Similarity in Evolution (Sanderson and Hufford, 1996). The thesis of the book is that although homoplasy is often treated as a problem, obscuring phylogenetic relationships, such incongruence is of fundamental interest and importance in evolution. The study of homoplasy, according to Wake (1996), writing in the introduction to the book, “is a venerable area of inquiry that was largely shunted aside during the past 25 years as interest focused on phylogenetics per se”. Phylogenetic incongruence is a “window into genome history and molecular evolution” (Wendel and Doyle, 1998), and for Wake et al. (2011) homoplasy can “present opportunities to discover the foundations of morphological traits”. One of the principal examples they discuss is the evolution of eyes, emphasizing that the “deep homologies” underlying the parallel or convergent origin of eyes are homoplasies.

The discussion of eyes by Wake et al. (2011) spans much of the animal kingdom, but of course does not extend to plants. However, the concept of “deep homology” is relevant to the origin and evolution of any evolutionarily novel structure, such as nodules, the structures that house nitrogen-fixing bacterial symbionts in legumes and a handful of other rosid taxa (reviewed by Doyle, 2011). The homologies of legume nodules have long been questioned, as reflected in the title of a paper by Hirsch and Larue (1997): “Is the legume nodule a modified root or stem or an organ sui generis?”. More recently, Markmann and Parniske (2009) have asked, “How novel are nodules?”. Based on current phylogenies, it is most parsimonious to hypothesize that nodulation arose independently numerous times, including several times within legumes (Sprent, 2007; Doyle, 2011). Testing homologies of nodules throughout the family has two prerequisites: 1) a well-resolved, strongly-supported phylogeny; and 2) accurate information on the ability of phylogenetically critical species to nodulate.

Much remains to be done to meet both of these requirements in legumes. Resolving relationships among early-diverging papilionoid lineages (Cardoso et al., 2012), and in the sister groups to the core mimosoids, is central to meeting the first requirement. The availability of genome-scale data should facilitate this, although given the likelihood that legume radiation was rapid, this may be very difficult. Genomic data, coupled with the availability of transcriptome atlases from nodules and other organs (e.g., Libault et al., 2010a,b; Severin et al., 2010), also will enable phylogenomic tests of nodule origins (Doyle, 1994), and should lead to testing the hypothesized connections between nodulation and polyploidy in papilionoids (Young et al., 2011).

Clearly, as Roger Polhill realized in the 1970s, and as the twelve volumes of the Advances series amply document, there are a multitude of questions for which an understanding of relationships in the family is essential. The size, diversity, ecological roles, and economic importance of the Leguminosae (e.g., Doyle and Luckow, 2003; Lewis et al., 2005) make addressing these questions of great significance not only for legume systematists, but for the planet.

I would like to close this tribute to Roger Polhill with a last quote from his message to me: “The world has changed much in the subsequent thirty years, and seems to be speeding up, but I have really relished the ‘advances’ the community has made, and the spirit in which it has done so.” We legume researchers owe a debt of gratitude to Roger for his seminal contributions to legume systematics and biology, which have led us, now, to Advances, Part 12.

Acknowledgments

I am grateful to the organizers of ILC6, particularly Ben-Erik van Wyk, for the invitation to speak at the conference and to write this paper, and to two reviewers for their helpful comments. I thank Jane Leclere Doyle for the instrumental role she has played in the many legume projects for which she has generated data, trained students and postdocs, and kept the lab running. I am also grateful to the many lab members and visitors who contributed their knowledge of, and enthusiasm for, legumes, and to Roger Polhill for his early encouragement to apply molecular phylogenetic approaches to the family. The National Science Foundation has supported our legume systematic work for nearly 30 years, currently through grants 0822258 and 0939423.

References

Alvarez, I., Wendel, J.F., 2003. Ribosomal ITS sequences and plant phylogenetic inference. Molecular Phylogenetics and Evolution 29, 417–434.
Bailey, C.D., Doyle, J.J., Kajita, T., Nemoto, T., Ohashi, H., 1997. The chloroplast rpl2 intron and ORF184 as phylogenetic markers in the legume tribe Desmodieae. Systematic Botany 22, 133–138.
Baldwin, B.G., 1992. Phylogenetic utility of the internal transcribed spacers of nuclear ribosomal DNA in plants: an example from the Compositae. Molecular Phylogenetics and Evolution 1, 3–16.
Bentley, D.R., Balasubramanian, S., Swerdlow, H.P., Smith, G.P., Milton, J., Brown, C.G., Hall, K.P., Evers, D.J., Barnes, C.L., Bignell, H.R., Boutell, J.M., Bryant, J., Carter, R.J., Cheetham, R.K., Cox, A.J., Ellis, D.J., Flatbush, M.R., Gormley, N.A., Humphray, S.J., Irving, L.J., Karbelashvili, M.S., Kirk, S.M., Li, H., Liu, X., Maisinger, K.S., Murray, L.J., Obradovic, B., Ost, T., Parkinson, M.L., Pratt, M.R., Rasolonjatovo, I.M.J., Reed, M.T., Rigatti, R., Rodighiero, C., Ross, M.T., Sabot, A., Sankar, S.V., Scally, A., Schroth, G.P., Smith, M.E., Smith, V.P., Spiridou, A., Torrance, P.E., Tzonev, S.S., Vermaas, E.H., Walter, K., Wu, X., Zhang, L., Alam, M.D., Anastasi, C., Aniebo, I.C., Bailey, D.M.D., Bancarz, I.R., Banerjee, S., Barbour, S.G., Baybayan, P.A., Benoit, V.A., Benson, K.F., Bevis, C., Black, P.J., Boodhun, A., Brennan, J.S., Bridgham, J.A., Brown, R.C., Brown, A.A., Buermann, D.H., Bundu, A.A., Burrows, J.C., Carter, N.P., Castillo, N., Catenazzi, M.C.E., Chang, S., Cooley, R.N., Crake, N.R., Dada, O.O., Diakoumakos, K.D., Dominguez-Fernandez, B., Earnshaw, D.J., Egbujor, U.C., Elmore, D.W., Etchin, S.S., Ewan, M.R., Fedurco, M., Fraser, L.J., Fajardo, K.V.F., Furey, W.S., George, D., Gietzen, K.J., Goddard, C.P., Golda, G.S., Granieri, P.A., Green, D.E., Gustafson, D.L., Hansen, N.F., Harnish, K., Haudenschild, C.D., Heyer, N.I., Hims, M.M., Ho, J.T., Horgan, A.M., Hoschler, K., Hurwitz, S., Ivanov, D.V., Johnson, M.Q., James, T., Jones, T.A.H., Kang, G., Kerelska, T.H., Kersey, A.D., Khrebtukova, I., Kindwall, A.P., Kingsbury, Z., Kokko-Gonzales, P.I., Kumar, A., Laurent, M.A., Lawley, C.T., Lee, S.E., Lee, X., Liao, A.K., Loch, J.A., Lok, M., Luo, S., Mammen, R.M., Martin, J.W., McCauley, P.G., McNitt, P., Mehta, P., Moon, K.W.,


Mullens, J.W., Newington, T., Ning, Z., Ng, B.L., Novo, S.M., O'Neill, M.J., Osborne, M.A., Osnowski, A., Ostadan, O., Paraschos, L.L., Pickering, L., Pike, A.C., Pike, A.C., Pinkard, D.C., Pliskin, D.P., Podhasky, J., Quijano, V.J., Raczy, C., Rae, V.H., Rawlings, S.R., Rodriguez, A.C., Roe, P.M., Rogers, J., Bacigalupo, M.C.R., Romanov, N., Romieu, A., Roth, R.K., Rourke, N.J., Ruediger, S.T., Rusman, E., Sanches-Kuiper, R.M., Schenker, M.R., Seoane, J.M., Shaw, R.J., Shiver, M.K., Short, S.W., Sizto, N.L., Sluis, J.P., Smith, M.A., Sohna, J.E.S., Spence, E.J., Stevens, K., Sutton, N., Szajkowski, L., Tregidgo, C.L., Turcatti, G., VandeVondele, S., Verhovsky, Y., Virk, S.M., Wakelin, S., Walcott, G.C., Wang, J., Worsley, G.J., Yan, J., Yau, L., Zuerlein, M., Rogers, J., Mullikin, J.C., Hurles, M.E., McCooke, N.J., West, J.S., Oaks, F.L., Lundberg, P.L., Klenerman, D., Durbin, R., Smith, A.J., 2008. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59.
Bruneau, A., Doyle, J.J., Palmer, J.D., 1990. A chloroplast DNA inversion as a subtribal character in the Phaseoleae (Leguminosae). Systematic Botany 15, 378–386.
Bruneau, A., Doyle, J.L., Doyle, J.J., 1995. Phylogenetic evidence in Phaseoleae: evidence from chloroplast restriction site characters. In: Crisp, M.D., Doyle, J.J. (Eds.), Advances in Legume Systematics, Part 7: Phylogeny. Royal Botanic Gardens, Kew, pp. 309–330.
Bruneau, A., Mercure, M., Lewis, G.P., Herendeen, P.S., 2008. Phylogenetic patterns and diversification in the caesalpinioid legumes. Botany-Botanique 86, 697–718.
Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N.A., RoyChoudhury, A., 2012. Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution 29, 917–932.
Buchen, 2011. Missing the mark. Nature 471, 428–432.
Cannon, S.B., Ilut, D., Farmer, A.D., Maki, S.L., May, G.D., Singer, S.R., Doyle, J.J., 2010. Polyploidy did not predate the evolution of nodulation in all legumes. PLoS One 5, e11630.
Cardoso, D., de Queiroz, L.P., Pennington, R.T., de Lima, H.C., Fonty, E., Wojciechowski, M.F., Lavin, M., 2012. Revisiting the phylogeny of papilionoid legumes: new insights from comprehensively sampled early-branching lineages. American Journal of Botany 99, 1991–2013.
Chase, M.W., Soltis, D.E., Olmstead, R.G., Morgan, D., Les, D.H., Mishler, B.D., Duvall, M.R., Price, R.A., Hills, H.G., Qiu, Y., 1993. Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden 80, 528–580.
Choi, H., Mun, J., Kim, D., Zhu, H., Baek, J., Mudge, J., Roe, B., Ellis, N., Doyle, J., Kiss, G.B., Young, N.D., Cook, D.R., 2004. Estimating genome conservation between crop and model legume species. Proceedings of the National Academy of Sciences of the United States of America 101, 15289–15294.
Choi, H., Luckow, M.A., Doyle, J., Cook, D.R., 2006. Development of nuclear gene-derived molecular markers linked to legume genetic maps. Molecular Genetics and Genomics 276, 56–70.
Cronn, R., Knaus, B.J., Liston, A., Maughan, P.J., Parks, M., Syring, J.V., Udall, J., 2012. Targeted enrichment strategies for next-generation plant biology. American Journal of Botany 99, 291–311.
Dobzhansky, T., 1973. Nothing in biology makes sense except in the light of evolution. The American Biology Teacher 35, 125–129.
Doyle, J.J., 1987. Variation at the DNA level: uses and potential in legume systematics. In: Stirton, C. (Ed.), Advances in Legume Systematics, Part 3. Royal Botanic Gardens, Kew, pp. 1–30.
Doyle, J.J., 1992. Gene trees and species trees: molecular systematics as one-character taxonomy. Systematic Botany 17, 144–163.
Doyle, J.J., 1994. Phylogeny of the legume family: an approach to understanding the origins of nodulation. Annual Review of Ecology and Systematics 25, 325–349.
Doyle, J.J., 1995. DNA data and legume phylogeny: a progress report. In: Crisp, M.D., Doyle, J.J. (Eds.), Advances in Legume Systematics, Part 7: Phylogeny. Royal Botanic Gardens, Kew, pp. 11–30.
Doyle, J.J., 2011. Phylogenetic perspectives on the origins of nodulation. Molecular Plant-Microbe Interactions 24, 1289–1295.
Doyle, J.J., 2012. Polyploidy in legumes. In: Soltis, D.E., Soltis, P.S. (Eds.), Polyploidy and Genome Evolution. Springer, New York, pp. 147–180.
Doyle, J.J., Beachy, R.N., 1985. Ribosomal gene variation in soybean Glycine max and its relatives. Theoretical and Applied Genetics 70, 369–376.
Doyle, J.J., Doyle, J.L., 1993. Chloroplast DNA phylogeny of the papilionoid legume tribe Phaseoleae. Systematic Botany 18, 309–327.
Doyle, J.J., Doyle, J.L., 1999. Nuclear protein-coding genes in phylogeny reconstruction and homology assessment: some examples from Leguminosae. In: Hollingsworth, P., Bateman, R., Gornall, R. (Eds.), Molecular Systematics and Plant Evolution. Taylor and Francis, London, pp. 229–254.
Doyle, J.J., Luckow, M.A., 2003. The rest of the iceberg. Legume diversity and evolution in a phylogenetic context. Plant Physiology (Rockville) 131, 900–910.
Doyle, J.J., Schuler, M.A., Godette, W.D., Zenger, V., Beachy, R.N., Slightom, J.L., 1986. The glycosylated seed storage proteins of Glycine max and Phaseolus vulgaris structural homologies of genes and proteins. Journal of Biological Chemistry 261, 9228–9238.
Doyle, J.J., Doyle, J.L., Palmer, J.D., 1995. Multiple independent losses of two genes and one intron from legume chloroplast genomes. Systematic Botany 20, 272–294.
Doyle, J.J., Doyle, J.L., Ballenger, J.A., Palmer, J.D., 1996a. The distribution and phylogenetic significance of a 50-kb chloroplast DNA inversion in the flowering plant family Leguminosae. Molecular Phylogenetics and Evolution 5, 429–438.
Doyle, J.J., Kanazin, V., Shoemaker, R.C., 1996b. Phylogenetic utility of histone H3 intron sequences in the perennial relatives of soybean (Glycine: Leguminosae). Molecular Phylogenetics and Evolution 6, 438–447.
Doyle, J.J., Doyle, J.L., Ballenger, J.A., Dickson, E.E., Kajita, T., Ohashi, H., 1997. A phylogeny of the chloroplast gene rbcL in the Leguminosae: taxonomic correlations and insights into the evolution of nodulation. American Journal of Botany 84, 541–554.
Drummond, C.S., Eastwood, R.J., Miotto, S.T.S., Hughes, C.E., 2012. Multiple continental radiations and correlates of diversification in Lupinus (Leguminosae): testing for key innovation with incomplete taxon sampling. Systematic Biology 61, 443–460.
Edwards, S.V., 2009. Is a new and general theory of molecular systematics emerging? Evolution 63, 1–19.
Edwards, S.V., Liu, L., Pearl, D.K., 2007. High-resolution species trees without concatenation. Proceedings of the National Academy of Sciences of the United States of America 104, 5936–5941.
Egan, A.N., Schlueter, J., Spooner, D.M., 2012. Applications of next-generation sequencing in plant biology. American Journal of Botany 99, 175–185.
Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S., Mitchell, S.E., 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6, e19379.
Faircloth, B.C., McCormack, J.E., Crawford, N.G., Harvey, M.G., Brumfield, R.T., Glenn, T.C., 2012. Ultraconserved elements anchor thousands of genetic markers spanning multiple evolutionary timescales. Systematic Biology 61, 717–726.
Feliner, G.N., Rossello, J.A., 2007. Better the devil you know? Guidelines for insightful utilization of nrDNA ITS in species-level evolutionary studies in plants. Molecular Phylogenetics and Evolution 44, 911–919.
Gee, H., 2003. Ending incongruence. Nature 425, 782.
Grover, C.E., Salmon, A., Wendel, J.F., 2012. Targeted sequence capture as a powerful tool for evolutionary analysis. American Journal of Botany 99, 312–319.
Heled, J., Drummond, A.J., 2010. Bayesian inference of species trees from multilocus data. Molecular Biology and Evolution 27, 570–580.
Hirsch, A.M., Larue, T.A., 1997. Is the legume nodule a modified root or stem or an organ sui generis? Critical Reviews in Plant Sciences 16, 361–392.
Hughes, C., Eastwood, R., 2006. Island radiation on a continental scale: exceptional rates of plant diversification after uplift of the Andes. Proceedings of the National Academy of Sciences of the United States of America 103, 10334–10339.
Hughes, C.E., Eastwood, R.J., Bailey, C.D., 2006. From famine to feast? Selecting nuclear DNA sequence loci for plant species-level phylogeny reconstruction. Philosophical Transactions of the Royal Society of London B, Biological Sciences 361, 211–225.
Ilut, D.C., Coate, J.E., Luciano, A.K., Owens, T.G., May, G.D., Farmer, A., Doyle, J.J., 2012. A comparative transcriptomic study of an allotetraploid and its diploid progenitors illustrates the unique advantages and challenges of RNA-seq in plant species. American Journal of Botany 99, 383–396.
Jansen, R.K., Wojciechowski, M.F., Sanniyasi, E., Lee, S., Daniell, H., 2008. Complete plastid genome sequence of the chickpea (Cicer arietinum) and the phylogenetic distribution of rps12 and clpP intron losses among legumes (Leguminosae). Molecular Phylogenetics and Evolution 48, 1204–1217.
Kajita, T., Ohashi, H., Tateishi, Y., Bailey, C.D., Doyle, J.J., 2001. rbcL and legume phylogeny, with particular reference to Phaseoleae, Millettieae, and allies. Systematic Botany 26, 515–536.
Käss, E., Wink, M., 1992. rbcL sequences from lupins and other legume species. Molecular Evolution Newsletter 2, 21–26.
Käss, E., Wink, M., 1995. Molecular phylogeny of the Papilionoideae (family Leguminosae): rbcL gene sequences versus chemical taxonomy. Botanica Acta 108, 149–162.
Käss, E., Wink, M., 1996. Molecular evolution of the Leguminosae: phylogeny of the three subfamilies based on rbcL-sequences. Biochemical Systematics and Ecology 24, 365–378.
Käss, E., Wink, M., 1997a. Molecular phylogeny and phylogeography of Lupinus (Leguminosae) inferred from nucleotide sequences of the rbcL gene and ITS 1 + 2 regions of rDNA. Plant Systematics and Evolution 208, 139–167.
Käss, E., Wink, M., 1997b. Phylogenetic relationships in the Papilionoideae (family Leguminosae) based on nucleotide sequences of cpDNA (rbcL) and ncDNA (ITS 1 and 2). Molecular Phylogenetics and Evolution 8, 65–88.
Koller, B., Delius, H., 1980. Vicia faba chloroplast DNA has only one set of ribosomal RNA genes as shown by partial denaturation mapping and R-loop analysis. Molecular and General Genetics 178, 261–270.
Kubatko, L.S., Degnan, J.H., 2007. Inconsistency of phylogenetic estimates from concatenated data under coalescence. Systematic Biology 56, 17–24.
Lai, M., Sceppa, J., Ballenger, J.A., Doyle, J.J., Wunderlin, R.P., 1997. for the presence of the rpL2 intron in chloroplast genomes of Bauhinia (Leguminosae). Systematic Botany 22, 519–528.
Lavin, M., Doyle, J.J., 1991. Tribal relationships of Sphinctospermum (Leguminosae): Integration of traditional and chloroplast DNA data. Systematic Botany 16, 162–172.
Lavin, M., Doyle, J.J., Palmer, J.D., 1990. Evolutionary significance of the loss of the chloroplast-DNA inverted repeat in the Leguminosae subfamily Papilionoideae. Evolution 44, 390–402.
Lavin, M., Eshbaugh, E., Hu, J., Mathews, S., Sharrock, R.A., 1998. Monophyletic subgroups of the tribe Millettieae (Leguminosae) as revealed by phytochrome nucleotide sequence data. American Journal of Botany 85, 412–433.
Lavin, M., Pennington, R.T., Klitgaard, B.B., Sprent, J.I., de Lima, H.C., Gasson, P.E., 2001. The dalbergioid legumes (Fabaceae): delimitation of a pantropical monophyletic clade. American Journal of Botany 88, 503–533.
Lavin, M., Wojciechowski, M.F., Gasson, P., Hughes, C., Wheeler, E., 2003. Phylogeny of robinioid legumes (Fabaceae) revisited: Coursetia and Gliricidia recircumscribed, and a biogeographical appraisal of the endemics. Systematic Botany 28, 387–409.
Lavin, M., Herendeen, P.S., Wojciechowski, M.F., 2005. Evolutionary rates analysis of Leguminosae implicates a rapid diversification of lineages during the Tertiary. Systematic Biology 54, 575–594.
Lee, E.K., Cibrian-Jaramillo, A., Kolokotronis, S., Katari, M.S., Stamatakis, A., Ott, M., Chiu, J.C., Little, D.P., Stevenson, D.W., McCombie, W.R., Martienssen, R.A., Coruzzi, G.,


DeSalle, R., 2011. A functional phylogenomic view of the seed plants. PLoS Genetics 7, e1002411.
Legume Phylogeny Working Group, 2013. Legume phylogeny and classification in the 21st century: progress, prospects and lessons for other species-rich clades. Taxon 62, 217–248.
Lemmon, A.R., Lemmon, E.M., 2012. High-throughput identification of informative nuclear loci for shallow-scale phylogenetics and phylogeography. Systematic Biology 61, 745–761.
Lewis, G.P., Schrire, B.D., Mackinder, B., Lock, M. (Eds.), 2005. Legumes of the World. Royal Botanic Gardens, Kew.
Libault, M., Farmer, A., Brechenmacher, L., Drnevich, J., Langley, R.J., Bilgin, D.D., Radwan, O., Neece, D.J., Clough, S.J., May, G.D., Stacey, G., 2010a. Complete transcriptome of the soybean root hair cell, a single-cell model, and its alteration in response to Bradyrhizobium japonicum infection. Plant Physiology (Rockville) 152, 541–552.
Libault, M., Farmer, A., Joshi, T., Takahashi, K., Langley, R.J., Franklin, L.D., He, J., Xu, D., May, G., Stacey, G., 2010b. An integrated transcriptome atlas of the crop model Glycine max, and its use in comparative analyses in plants. The Plant Journal 63, 86–99.
Liston, A., 1995. Use of the polymerase chain reaction to survey for the loss of the inverted repeat in the legume chloroplast genome. In: Crisp, M.D., Doyle, J.J. (Eds.), Advances in Legume Systematics, Part 7: Phylogeny. Royal Botanic Gardens, Kew, pp. 31–40.
Liu, L., Li, Y., Li, S., Hu, N., He, Y., Pong, R., Lin, D., Lu, L., Law, M., 2012. Comparison of next-generation sequencing systems. Journal of Biomedicine & Biotechnology 2012, 1–11.
Lu, F., Lipka, A.E., Glaubitz, J., Elshire, R., Cherney, J.H., Casler, M.D., Buckler, E.S., Costich, D.E., 2013. Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based SNP discovery protocol. PLoS Genetics 9, e1003215.
Lynch, M., Conery, J.S., 2003. The origins of genome complexity. Science 302, 1401–1404.
Maddison, W.P., 1997. Gene trees in species trees. Systematic Biology 46, 523–536.
Manzanilla, V., Bruneau, A., 2012. Phylogeny reconstruction in the Caesalpinieae grade (Leguminosae) based on duplicated copies of the sucrose synthase gene and plastid markers. Molecular Phylogenetics and Evolution 65, 149–162.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L.I., Jarvie, T.P., Jirage, K.B., Kim, J., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F., Rothberg, J.M., 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380.
Markmann, K., Parniske, M., 2009. Evolution of root endosymbiosis with bacteria: how novel are nodules? Trends in Plant Science 14, 77–86.
Maureira-Butler, I.J., Pfeil, B.E., Muangprom, A., Osborn, T.C., Doyle, J.J., 2008. The reticulate history of Medicago (Fabaceae). Systematic Biology 57, 466–482.
McMahon, M.M., 2005. Phylogenetic relationships and floral evolution in the papilionoid legume clade Amorpheae. Brittonia 57, 397–411.
McMahon, M., Hufford, L., 2004. Phylogeny of Amorpheae (Fabaceae: Papilionoideae). American Journal of Botany 91, 1219–1230.
McMahon, M.M., Sanderson, M.J., 2006. Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes. Systematic Biology 55, 818–836.
McPherson, J.D., 2009. Next generation gap. Nature Methods Supplement 6, S1–S5.
Saerkinen, T., Pennington, R.T., Lavin, M., Simon, M.F., Hughes, C.E., 2012. Evolutionary islands in the Andes: persistence and isolation explain high endemism in Andean dry tropical forests. Journal of Biogeography 39, 884–900.
Saiki, R.K., Scharf, S., Faloona, F., Mullis, K.B., Horn, G.T., Erlich, H.A., Arnheim, N., 1985. Enzymatic amplification of beta globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230, 1350–1354.
Sakiroglu, M., Sherman-Broyles, S., Story, A., Moore, K.J., Doyle, J.J., Charles Brummer, E., 2012. Patterns of linkage disequilibrium and association mapping in diploid alfalfa (M. sativa L.). Theoretical and Applied Genetics 125, 577–590.
Sanderson, M.J., Hufford, L. (Eds.), 1996. Homoplasy: The Recurrence of Similarity in Evolution. Academic Press, San Diego, California.
Sanderson, M.J., McMahon, M.M., 2007. Inferring angiosperm phylogeny from EST data with widespread gene duplication. BMC Evolutionary Biology 7, S3.
Sang, T., 2002. Utility of low-copy nuclear gene sequences in plant phylogenetics. Critical Reviews in Biochemistry and Molecular Biology 37, 121–147.
Sanger, F., Nicklen, S., Coulson, A.R., 1977. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences 74, 5463–5467.
Schadt, E.E., Turner, S., Kasarskis, A., 2011. A window into third generation sequencing. Human Molecular Genetics 20, 853.
Scherson, R.A., Choi, H., Cook, D.R., Sanderson, M.J., 2005. Phylogenetics of New World Astragalus: screening of novel nuclear loci for the reconstruction of phylogenies at low taxonomic levels. Brittonia 57, 354–366.
Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., Hyten, D.L., Song, Q., Thelen, J.J., Cheng, J., Xu, D., Hellsten, U., May, G.D., Yu, Y., Sakurai, T., Umezawa, T., Bhattacharyya, M.K., Sandhu, D., Valliyodan, B., Lindquist, E., Peto, M., Grant, D., Shu, S., Goodstein, D., Barry, K., Futrell-Griggs, M., Du, J., Tian, Z., Zhu, L., Gill, N., Joshi, T., Libault, M., Sethuraman, A., Zhang, X.C., Shinozaki, K., Nguyen, H.T., Wing, R.A., Cregan, P., Specht, J., Grimwood, J., Rokhsar, D., Stacey, G., Shoemaker, R.C., Jackson, S.A., 2010. Genome sequence of the paleopolyploid soybean. Nature 463, 178–183.
Schrire, B.D., Lavin, M., Barker, N.P., Forest, F., 2009. Phylogeny of the tribe Indigofereae (Leguminosae–Papilionoideae): geographically structured more in succulent-rich and temperate settings than in grass-rich environments. American Journal of Botany 96, 816–852.
Schuler, M.A., Doyle, J.A., Beachy, R.N., 1983. Nucleotide homologies between the glycosylated seed storage proteins of Glycine max and Phaseolus vulgaris. Plant Molecular Biology 2, 119–128.
Severin, A., Woody, J., Bolon, Y., Joseph, B., Diers, B., Farmer, A., Muehlbauer, G., Nelson, R., Grant, D., Specht, J., Graham, M., Cannon, S., May, G., Vance, C., Shoemaker, R., 2010. RNA-Seq Atlas of Glycine max: a guide to the soybean transcriptome. BMC Plant Biology 10, 160.
Shaw, J., Lickey, E.B., Beck, J.T., Farmer, S.B., Liu, W., Miller, J., Siripun, K.C., Winder, C.T., Schilling, E.E., Small, R.L., 2005. The tortoise and the hare II: relative utility of 21 noncoding chloroplast DNA sequences for phylogenetic analysis. American Journal of Botany 92, 142–166.
Shaw, J., Lickey, E.B., Schilling, E.E., Small, R.L., 2007. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the hare III. American Journal of Botany 94, 275–288.
Simon, M.F., Grether, R., de Queiroz, L.P., Saerkinen, T.E., Dutra, V.F., Hughes, C.E., 2011. The evolutionary history of Mimosa (Leguminosae): toward a phylogeny of the sensitive plants. American Journal of Botany 98, 1201–1221.
Simpson, G.G., 1961. Principles of Animal Taxonomy. Columbia University Press, New York.
Small, R.L., Ryburn, J.A., Cronn, R.C., Seelanan, T., Wendel, J.F., 1998.
The tortoise and the Metzker, M.L., 2010. Sequencing technologies — the next generation. Nature Reviews hare: choosing between noncoding plastome and nuclear Adh sequences for Genetics 11, 31–46. phylogeny reconstruction in a recently diverged plant group. American Journal of Moore, Gordon E., 1965. Cramming more components onto integrated circuits. Elec- Botany 85, 1301–1315. tronics Magazine 19 (1965), 114–117 (April). Sprent, J.I., 2007. Evolving ideas of legume evolution and diversity: a taxonomic per- Nei, M., Rooney, A.P., 2005. Concerted and birth-and-death evolution of multigene spective on the occurrence of nodulation. New Phytologist 174, 11–25. families. Annual Review of Genetics 39, 121–152. Steele, K.P., Ickert-Bond, S.M., Zarre, S., Wojciechowski, M.F., 2010. Phylogeny and charac- Palmer, J.D., Thompson, W.F., 1981. Rearrangements in the chloroplast genomes of ter evolution in Medicago (Leguminosae): evidence from analyses of plastid trnk/ mung bean and pea. Proceedings of the National Academy of Sciences of the United matK and nuclear GA3ox1 sequences. American Journal of Botany 97, 1142–1155. States of America 78, 5533–5537. Stefanovic, S., Pfeil, B.E., Palmer, J.D., Doyle, J.J., 2009. Relationships among phaseoloid Palmer, J.D., Thompson, W.F., 1982. Chloroplast DNA rearrangements are more fre- legumes based on sequences from eight chloroplast regions. Systematic Botany quent when a large inverted repeat sequence is lost. Cell 29, 537–550. 34, 115–128. Palmer, J.D., Zamir, D., 1982. Chloroplast DNA evolution and phylogenetic relationships Stirton, C.H. (Ed.), 1987. Advances in Legume Systematics Part 3. Royal Botanic Gardens, in Lycopersicon. Proceedings of the National Academy of Sciences of the United Kew. States of America 79, 5006–5010. Straub, S.C.K., Parks, M., Weitemier, K., Fishbein, M., Cronn, R.C., Liston, A., 2012. Navi- Palmer, J.D., Jorgensen, R.A., Thompson, W.F., 1985. 
Chloroplast DNA variation and evolu- gating the tip of the genomic iceberg: next-generation sequencing for plant sys- tion in Pisum: patterns of change and phylogenetic analysis. Genetics 109, 195–214. tematics. American Journal of Botany 99, 349–364. Palmer, J.D., Osorio, B., Aldrich, J., Thompson, W.F., 1987. Chloroplast DNA evolution Sun, S.M., Slightom, J.L., Hall, T.C., 1981. Intervening sequences in a plant gene: compar- among legumes: loss of a large inverted repeat occurred prior to other sequence ison of the partial sequence of complementary DNA and genomic DNA of French rearrangements. Current Genetics 11, 275–286. bean (Phaseolus vulgaris) phaseolin. Nature (London) 289, 37–41. Pamilo, P., Nei, M., 1988. Relationships between gene trees and species trees. Molecular Taberlet, P., Gielly, L., Pautou, G., Bouvet, J., 1991. Universal primers for amplification of Biology and Evolution 5, 568–583. three non-coding regions of chloroplast DNA. Plant Molecular Biology 17, 1105–1110. Patterson, C., 1988. Homology in classical and molecular biology. Molecular Biology Vance, E., 2012. Megadata: the odd couple. Nature 491, S52–S54. and Evolution 5, 603–625. Wake, D., 1996. Introduction. In: Sanderson, M.J., Hufford, L. (Eds.), Homoplasy: Polhill, R.M., Raven, P.H. (Eds.), 1981a. Advances in Legume Systematics Part 1. Royal The Recurrence of Similarity in Evolution. Academic Press, San Diego, California, Botanic Gardens, Kew. pp. xvii–xxv. Polhill, R.M., Raven, P.H. (Eds.), 1981b. Advances in Legume Systematics Part 2. Royal Wake, D.B., Wake, M.H., Specht, C.D., 2011. Homoplasy: from detecting pattern to de- Botanic Gardens, Kew. termining process and mechanism of evolution. Science 331, 1032–1035. Rokas, A., Williams, B.L., King, N., Carroll, S.B., 2003. Genome-scale approaches to Wendel, J.F., Doyle, J.J., 1998. Phylogenetic incongruence: window into genome history resolving incongruence in molecular phylogenies. Nature 425, 798–804. 
and molecular evolution, In: Soltis, D.E., Soltis, P.S., Doyle, J.J. (Eds.), Molecular Sys- Rosenberg, N.A., Nordborg, M., 2002. Genealogical trees, coalescent theory and the tematics of Plants, 2nd ed. Kluwer Academic Publishers, Norwell, Massachusetts, analysis of genetic polymorphisms. Nature Reviews Genetics 3, 380–390. pp. 265–296.


Wetterstrand, K.A., 2013. DNA sequencing costs: data from the NHGRI Genome Sequencing Program (GSP). Available at: www.genome.gov/sequencingcosts (Accessed March, 2013).
Wojciechowski, M.F., Lavin, M., Sanderson, M.J., 2004. A phylogeny of legumes (Leguminosae) based on analyses of the plastid matK gene resolves many well-supported subclades within the family. American Journal of Botany 91, 1846–1862.
Wood, T.E., Takebayashi, N., Barker, M.S., Mayrose, I., Greenspoon, P.B., Rieseberg, L.H., 2009. The frequency of polyploid speciation in vascular plants. Proceedings of the National Academy of Sciences of the United States of America 106, 13875–13879.
Wu, Z., Ge, S., 2012. The phylogeny of the BEP clade in grasses revisited: evidence from the whole-genome sequences of chloroplasts. Molecular Phylogenetics and Evolution 62, 573–578.
Wu, F., Mueller, L.A., Crouzillat, D., Petiard, V., Tanksley, S.D., 2006. Combining bioinformatics and phylogenetics to identify large sets of single-copy orthologous genes (COSII) for comparative, evolutionary and systematic studies: a test case in the euasterid plant clade. Genetics 174, 1407–1420.
Yoder, J.B., Briskine, R., Mudge, J., Farmer, A., Paape, T., Steele, K., Weiblen, G.D., Bharti, A.K., Zhou, P., May, G.D., Young, N.D., Tiffin, P., 2013. Phylogenetic signal variation in the genomes of Medicago (Fabaceae). Systematic Biology (in press).
Young, N.D., Debelle, F., Oldroyd, G.E.D., Geurts, R., Cannon, S.B., Udvardi, M.K., Benedito, V.A., Mayer, K.F.X., Gouzy, J., Schoof, H., Van de Peer, Y., Proost, S., Cook, D.R., Meyers, B.C., Spannagl, M., Cheung, F., De Mita, S., Krishnakumar, V., Gundlach, H., Zhou, S., Mudge, J., Bharti, A.K., Murray, J.D., Naoumkina, M.A., Rosen, B., Silverstein, K.A.T., Tang, H., Rombauts, S., Zhao, P.X., Zhou, P., Barbe, V., Bardou, P., Bechner, M., Bellec, A., Berger, A., Berges, H., Bidwell, S., Bisseling, T., Choisne, N., Couloux, A., Denny, R., Deshpande, S., Dai, X., Doyle, J.J., Dudez, A., Farmer, A.D., Fouteau, S., Franken, C., Gibelin, C., Gish, J., Goldstein, S., Gonzalez, A.J., Green, P.J., Hallab, A., Hartog, M., Hua, A., Humphray, S.J., Jeong, D., Jing, Y., Joecker, A., Kenton, S.M., Kim, D., Klee, K., Lai, H., Lang, C., Lin, S., Macmil, S.L., Magdelenat, G., Matthews, L., McCorrison, J., Monaghan, E.L., Mun, J., Najar, F.Z., Nicholson, C., Noirot, C., O'Bleness, M., Paule, C.R., Poulain, J., Prion, F., Qin, B., Qu, C., Retzel, E.F., Riddle, C., Sallet, E., Samain, S., Samson, N., Sanders, I., Saurat, O., Scarpelli, C., Schiex, T., Segurens, B., Severin, A.J., Sherrier, D.J., Shi, R., Sims, S., Singer, S.R., Sinharoy, S., Sterck, L., Viollet, A., Wang, B., Wang, K., Wang, M., Wang, X., Warfsmann, J., Weissenbach, J., White, D.D., White, J.D., Wiley, G.B., Wincker, P., Xing, Y., Yang, L., Yao, Z., Ying, F., Zhai, J., Zhou, L., Zuber, A., Denarie, J., Dixon, R.A., May, G.D., Schwartz, D.C., Rogers, J., Quetier, F., Town, C.D., Roe, B.A., 2011. The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480, 520–524.
