ABSTRACT

PHYLOGENOMIC STUDY OF SELECTED SPECIES WITHIN THE GENUS : MUTATION RATE ANALYSIS OF COMPLETE CHLOROPLAST GENOMES

Lauren Orton, M.S. Department of Biological Sciences Northern Illinois University, 2015 Melvin R. Duvall, Director

This project examines the relationships within the genus Zea using complete chloroplast genomes (plastomes). Zea mays is one of the most widely cultivated crop species in the world.

Billions of dollars have been spent in the commercial agriculture sector to study and improve Z. mays.

While Z. mays has been well studied, the congeneric species have yet to be as thoroughly examined. For this study complete plastomes were sequenced in four species (,

Zea perennis, , and Zea mays subsp. huehuetenangensis) by Sanger or next- generation methods. An analysis of the microstructural mutations, such as inversions, insertion or deletion mutations (indels) and determination of their frequencies were performed for the complete plastomes.

It was determined that 197 indels and 10 inversions occurred across the examined plastomes. The most common mutational mechanism was discovered to be the tandem repeat from slipped strand mispairing events. Mutation rates were calculated to determine a precise rate over time. The mutations rates for the genus fell within the range of 0.00126 to 0.02830

microstructural mutation events per year. These rates are highly variable, corresponding to the close and complex relationships within the genus.

Phylogenomic analyses were also conducted to examine the differences between species within Zea. In many cases, much of the previous work examining Zea mitochondrial and nuclear data was confirmed with identical tree topologies. Divergence dates for specific nodes relative to

Zea were calculated to fall between 8,700 calendar years before present for the subspecies included in this study and 1,024 calendar years before present for the perennial species included in this study.

NORTHERN ILLINOIS UNIVERSITY DE KALB, ILLINOIS

AUGUST 2015

PHYLOGENOMIC STUDY OF SELECTED SPECIES WITHIN THE GENUS Zea: MUTATION RATE ANALYSIS OF COMPLETE CHLOROPLAST GENOMES

BY

LAUREN M. ORTON ©2015 Lauren M. Orton

A THESIS SUBMITTED TO THE GRADUATE SCHOOL

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE

MASTER OF SCIENCE

DEPARTMENT OF BIOLOGICAL SCIENCES

Thesis Director: Melvin R. Duvall

ACKNOWLEDGEMENTS

This project would not have been possible with out the following individuals and organizations for their help and support. First, I would like to thank Dr. Melvin Duvall, Professor and thesis advisor, for his guidance and instruction on this research project. I thank the Iowa

State University Seed Science Center under the direction of Mr. Mike Stahr, and the Small Seed

Department Head, Ms. Kim North, for helping in the procurement of seeds for this project.

Along with Mr. Stahr and Ms. North, I would like to extend my gratitude to Mr. Mark Millard of the USDA/ARS Introduction Station at Iowa State University, for providing the seed accessions used within this study. I also thank Distinguished Professor, Dr. Richard King and

Assistant Professor, Dr. Wesley Swingley, both faculty members of Northern Illinois University and Graduate Committee Members for this thesis project. I would also like to thank Mr. William

P. Wysocki and Mr. Sean V. Burke for their assistance. Additionally, portions of this project would not have been possible without the financial support of the National Science Foundation

(NSF grant DEB1120761 to MRD). Finally, I thank the Northern Illinois University College of

Liberal Arts and Sciences and the Graduate School.

DEDICATION

This thesis is dedicated in memory of my grandparents, George and Anna Hellmuth, who always said, “An education is the only thing that another man cannot take away from you.”

I also dedicate this thesis to Adam Orton, Robert & Sherry Hellmuth, Lindsey Hellmuth,

Jeffery & Mary Orton, Rachel Orton, Bob & Joan Muir, Vicky Muir, and in memory of Orville

& Ardella Orton, John & Ina Asplund.

TABLE OF CONTENTS

Page LIST OF TABLES…………………………………………………………… vi

LIST OF FIGURES….………………………………………………………. vii

LIST OF ABBREVIATIONS………………………………………………… viii

PREFACE………………………………………………………………….. x

Chapter

1. PHYLOGENOMIC STUDY OF SELECTED SPECIES WITHIN THE GENUS ZEA: MUTATION RATE ANALYSIS OF COMPLETE CHLOROPLAST GENOMES

Introduction……………………………………………… 1

Methods…………………………………………..…...... 5

DNA Extraction and Next Generation Sequencing…………………………………….... 7

Sanger Sequencing…………………………...... 7

Plastome Assembly and Annotation………….... 9

Plastome Comparisons………………………..... 11

Microstructural Mutation Analysis…………...... 11

v

Page

Mutation Rates and Divergence Times……….... 12

Results...... 16

Phylogenetics: Sequence Analysis...... 16

Phylogenetics: Microstructural Mutation Analysis.... 19

Phylogenetics: Combined Sequence and Microstructural Mutation Analysis...... 22

Divergence Time Estimation Analysis...... 22

Mutation Rates...... 24

Zea perennis Insertion...... 25

Discussion...... 27

Phylogenomic and Mutation Rate Analysis...... 27

Divergence Analysis (BEAST)...... 29

Zea perennis “Insertion”...... 30

Conclusion...... 31

LITERATURE CITED...... 32

LIST OF TABLES

Table Page

1 Plastome characteristics of selected species...………………………..… 6

2 List of species-specific primers designed for Zea diploperennis………. 8

3 GenBank accessions of species for which plastomes were previously determined…………………………………………...... 10

4 Results of BEAST analysis divergence times..………………………… 15

5 ML, MP, BI analysis results for sequence, microstructural mutation (MME), and combined data sets...... 18

6 Rare genomic changes (RGCs) characterized across the species included in this study...... ,,,,,...... 21 7 Sequence alignment of the 83 bp insertion/deletion first identified by Doebley et al., 1987, as an insertion in the Zea perennis plastome...... ………… 26

LIST OF FIGURES

Figure Page

1 Maximum likelihood best tree showing bootstrap support values of MP analyses………………………...... 17

2 Diagram examining the relationship between the number of occurrences and lengths of MME.……………...... ….…...... 20

3 Tree result of BEAST analysis with node ages shown...... 23

LIST OF ABBREVIATIONS

AA Amino Acid

ACRE Anchored Conserved Region Extension

BEAST Bayesian Evolutionary Analysis Sampling Trees

BLAST Basic Local Alignment Search Tool bp Base pair bs Bootstrap Support value

CDS Coding Sequence

CIPRES Cyber Infrastructure for Phylogenetic RESearch

GPWGI (II) Grass Phylogeny Working Group I (II)

IR Inverted Repeat

LSC Long Single Copy

MCMC Markov Chain Monte Carlo

ML Maximum Likelihood

MME Microstructural Mutation Event

MP Maximum Parsimony

NGS Next Generation Sequencing

NS Nucleotide Sequence

PAUP * Phylogenetic Analysis Using Parsimony * and other methods

RGC Rare Genomic Change

ix

SSC Short Single Copy

SSM Slipped-strand mispairing

XSEDE eXtreme Science and Engineering Discovery Environment

PREFACE

Maize (Zea mays subsp. mays) is a common, globally consumed and cultivated crop. In

2014 alone, it was estimated by the USDA that nearly 988.70 million metric tons of corn () was produced. Along with wheat (Triticum aestivum) and rice (Oryza sativa), maize accounts for approximately 94% of all cereal grains consumed worldwide (Ranum et al., 2014). Many scholars believe that maize was domesticated some, 7,000-10,000 years ago from wild grasses or teosintes. This domestication event was a pivotal moment in humankind’s shift from hunter- gatherer to agriculturalist within a couple of thousand years (Ranum et al., 2014).

Zea mays subsp. mays (Andropogoneae; ; ), the only cultivated crop in the genus, has been well studied for its importance in agriculture and human nutrition.

However, less is known about the other closely related taxa within the genus Zea. The genus is comprised of five species (Zea luxurians, Zea perennis, Zea diploperennis, Zea nicaraguaensis, and Zea mays). Zea mays is further comprised of four subspecies (subsp. huehuetenangensis, parviglumis, mexicana, and mays) (Doebley, 2003). The non-crop species and subspecies within the genus are referred to as the teosintes, as their morphologies more closely resemble wild grasses.

Historical geographic distributions of the genus Zea include areas stretching from

Northern Mexico into Central America, including the countries of Nicaragua and Guatemala

(Hufford et al., 2012). The average range of temperatures for this region fall between 15-28°C, and average precipitation totals range from 200 mm to upwards of 2,000 mm (Hufford et al.,

xi

2012). However, despite being historically located in hot, humid and moderately temperate habitats, maize has readily proven that it can easily colonize the globe, given that corn fields are present in countries like China and India, across portions of Europe and North, Central, and

South Americas (USDA). Despite the different global climates (some of which may be more suited to C3 photosynthetic ) this colonization of maize (a C4 plant) has been made possible by modern agriculture practices, including the use of herbicides and tight row cropping to choke out competitive C3 photosynthetic species.

Given the importance of maize, it is no surprise that researchers have a great interest in its genetics. Zea mays subsp. mays has been well studied, and complete genomes have been sequenced for nuclear, mitochondrial, and chloroplast compartments. Sequencing the nuclear genome has allowed for greater insight into the genetics pertinent to agriculture (Schnable et al.,

2009). The mitochondrial genome studies elaborated on the relationships within the genus

(Darracq et al.,2012). The sequencing of the chloroplast genome increased our understanding of the plastome’s structure and gene content (Maier et al., 1995). These studies have provided a solid foundation from which to compare many of the closely related species of the genus Zea for this study.

With the advent of high throughput NextGen sequencing, the ability to sequence and analyze multiple species and draw conclusions from useful, inferential nucleotide sequence (NS) data has become a necessary tool to look into the divergence and phylogenetic relationships within the genus Zea. Phylogenomic relationships between taxa provide a greater understanding of evolutionary diversity. Understanding species divergence provides a unique opportunity to examine the genetic history of organisms and determine what potential evolutionary pressures

xii led to differences between species and taxa. Mutation analysis, in particular, provides unique opportunities to discover species and taxon-specific genetic differences resulting in greater phylogenomic diversity. The opportunity to re-examine the current phylogeny of Zea and closely related species of Andropogoneae using complete plastomes will allow for the confirmation of infrageneric relationships.

Mutations, or deviations from the original genetic sequence, can arise from various events during DNA replication. Notably, this study will examine mutational occurrences from slipped strand mispairing events, recombination in stem loop structures, and multi-base substitutions. Alignments of multiple sequences provide evidence of clear and easily identifiable mutation events within plastomes. The accumulation of derived mutations over time produces a historical record of DNA changes that results in the divergence of new characteristics and new taxa.

Chapter 1- PHYLOGENOMIC STUDY OF SELECTED SPECIES WITHIN THE GENUS

Zea: MUTATION RATE ANALYSIS OF COMPLETE CHLOROPLAST

GENOMES

Introduction

The genus Zea is well known for its cultivar Zea mays subsp. mays, as it has become one of the most abundant crops in production today. Current phylogenies of Zea, which include both mitochondrial (Darracq et al., 2011) and nuclear data (Hufford et al.,2012; Ross-Ibarra et al.,

2009), describe a common topology wherein Zea nicaraguensis is sister to Zea luxurians, and this clade is sister to the remaining taxa in the genus. Zea diploperennis is sister to Zea perennis, and this clade is sister to Zea mays. The subspecies of Zea mays are monophyletic (Hufford et al.,2012; Ross-Ibarra et al., 2009).

Although the species within the genus are closely related, the diversity between them is clearly evident in their morphologies. There are evident physical differences between maize and the teosinte species within the genus Zea. Although the tassel, or male inflorescence, is somewhat consistent across the species, the seeds, however, are very different from cultivated maize. Teosinte seeds are angular in shape, resembling a trapezoid or triangle, arranged onto a slender “ear”. According to Matsuoka et al., (2002), the teosinte species were critical to the evolution of maize and have indeed contributed to the genetics of Zea mays subsp. mays.

2

Matsuoka hypothesized, through the use of microsatellite genotyping for multiple loci, that it was a single domestication event which led to the rise of maize as an abundant agricultural crop species.

In agriculture today, human selection and modification have played a large part with regard to the genetics of Zea mays subsp. mays, including its genetic modifications for pest resistance, drought tolerance, and increased yield (Murray et al., 2014). Research and development, within both the private and public sector in agricultural biotechnology companies and organizations, has grown at a rapid pace. It was estimated by Fuglie et al., (2012), that between the years of 1994-2007 spending on research and development (as well as many other agricultural inputs), was increasing at approximately 4.3 percent per annum. Billions of dollars have been spent in commercial corn genetics, making it one of the most costly modified crop species on the market today (Murray et al., 2014).

In 2012, the Grass Phylogeny Working Group II established a notable and well accepted phylogeny for C4 photosynthetic origins encompassing the genus Zea. The genus belongs to the

Andropogoneae tribe of the Panicoideae family which is included in the PACMAD clade

(GPWGII, 2012). One of the great attributes of the genus Zea is its ability to photosynthesize via the C4 pathway. Unlike C3 photosynthetic species, C4 species utilize PEP carboxylase as the initial carbon fixing reaction. PEP carboxylase has a much lower affinity for O2 reducing the amount of wasteful photorespiration.

Undoubtedly, for the hot, dry to moderately sub-tropical climate of Central America, this has proved to be a useful metabolic tool. Unlike Oryza (rice), Triticum (wheat), or many other C3 photosynthetic crop species, this more efficient form of photosynthesis has allowed Zea to

3 become a staple crop not just within the confines of Central America, but it has rapidly become an agricultural global necessity.

Mutations, which can arise from numerous events and sequencing errors, are an excellent means to observe species divergence and phylogenetic diversity among taxa. These mutations also offer possible explanations for the development of such genetic diversity. With the advancements made to bioinformatics software in recent years, mutations are now easily identifiable in genome alignments (Saarela et al., 2015).

Microstructural mutations, a specific type of small or relatively common mutations generally consisting of 50 bp or less (for the purposes of this study), are easily observable. Large mutations of greater than 50 bp are classified as rare genomic changes (RGC), which likely occur with low frequency. However, not all large mutations will be classified as RGCs. For a mutation to fall under the category of an RGC, it must be relatively large, rare (low frequency), either autapomorphic or shared only by closely related species, representative of an intergenomic transfer (ex: mitochondrial DNA transferred to the chloroplast), and/or results in pseudogenization (Jones et al., 2014). Two or more of these specific characteristics are generally required to classify a mutation as a RGC.

The specific mutational mechanism resulting in microstructural mutations is also an identifying attribute. Microstructural mutations such as tandem repeats, are a product of slip- strand mispairing events. Inversions caused by stem-loop structures are identified by reverse- complemented stems flanking the loop region within the strand (Kim et al., 2005).

This is a phylogenomic study of Zea chloroplast genomes, somewhat analogous to the published study of mitochondrial genomes in Zea (Darraq et al., 2010). This study will bridge a

4 gap between the well-studied chloroplast genome of Zea mays subsp. mays and those of the other closely related taxa in the genus by providing new insights into phylogenomic relationships based on complete chloroplast data (Maier et al., 1995).

Four hypotheses were tested. First, the frequency of inversions and insertion/deletion mutations will decrease as the length of the mutation (in base pairs), increases. Second, inversions will regularly occur with less frequency than insertion/deletions of the same length.

Third, extremely large microstructural mutations will likely be occasionally evident as rare genomic changes, and the sequence evidence for the mutational mechanism responsible for the microstructural mutation (i.e. tandem repeats, stem-loop structures) will be observable in the microstructural mutations due to the close similarity between the Zea plastomes.

Finally, in 1987, Doebley, et al. discovered an 83 bp insertion within the Zea perennis chloroplast using restriction enzymes. At the time, the 83 bp insertion was compared solely to the Zea mays subsp. mays chloroplast. Given the broader variety of species examined in this study, it is hypothesized that the insertion will in fact be found in other species of Zea or the outgroup species.

5

Methods

Initially, Zea diploperennis, Zea perennis, Zea luxurians, Zea nicaraguensis, and all four subspecies of Zea mays (heuhuetenagensis, mexicana, parviglumis and mays) were obtained as accessions of seeds from the USDA/ARS Plant Introduction Station at Iowa State University.

Nuclear and mitochondrial genomic relationships within the genus Zea were considered prior to species selection. Specifically, the phylogenic data of 1000 nuclear SNPs (single nucleotide polymorphisms) from data obtained by Hufford et al (2012) allowed for a selection of the four species within the genus Zea examined in this study.

Four species were ultimately selected to represent the major subgroups in Zea for complete chloroplast genome sequencing (Doebley, 2003). The selected species included: Zea diploperennis, Zea perennis, Zea luxurians, and Zea mays subsp. huehuetenangensis (Table 1).

Table 1

Plastome characteristics and sequencing information

Species Accession GenBank Length SSC LSC IR No. Reads Library Sequencing Mean No. Number ID bp Prep. Method Coverage Scaffolded Method Contigs

Zea diploperennis PI 462368 KR873421 140,748 12,554 82,640 22,777 N/A N/A Sanger N/A N/A

Zea luxurians PI 44193 KR873424 140,710 12,657 82,699 22,677 10,917,029 NextEra Single-end 433.869 5

Zea perennis Ames KR873423 140,790 12,546 82,700 22,772 12,417,323 NextEra Single-end 218.866 4 21874 Zea mays subsp. PI 441934 KR873422 140,453 12,527 82,394 22,766 17,015,170 NextEra Single-end 563.809 6 huehuetenangensis

Table 1. Plastome characteristics of selected species. 6 6

7

Five seeds from each of these species were soaked in plain water for 48 hours prior to planting in soilless grow mixes. DNA was extracted from approximately ten-day old leaf shoots using the DNeasy Plant Minikit (Qiagen Inc., Valencia, CA, USA).

DNA Extraction and Next Generation Sequencing

Zea mays subsp. huehuetenangensis, Zea luxurians and Zea perennis, were chosen for

Illumina sequencing by synthesis (NGS). Initially, fresh leaf tissues were manually homogenized in liquid nitrogen. Then, the DNA was extracted using the Qiagen DNeasy Plant Mini Kit protocol.

The quantities of total genomic DNA in the extracts were verified by Qubit (Invitrogen,

ThermoFisher Scientific, Wilmington, DE, USA) and were then diluted to 2.5 ng/µl in 20 µl sterile water. Single read DNA libraries were prepared for sequencing using the NextEra Prep

Kit (Illumina Inc., San Diego, CA, USA; Caruccio, N., 2011). The purification step used the

DNA Clean and Concentrator kit (Zymo Research, Irvine, CA, USA). Iowa State University’s

DNA Facility, Ames, IA, USA, sequenced the libraries on the HiSeq 2000 instrument.

Sanger Sequencing

The Dhingra and Folta (2005) ASAP 01 PCR method of Sanger sequencing was utilized in the sequencing of Zea diploperennis to obtain a Sanger-sequenced reference. Grass-specific primers (Leseberg and Duvall, 2009) were chosen for overlapping regions of approximately

2,400 bases in lenth. One copy of the IR was sequenced as well as the four IR boundaries.

8

For PCR reactions that did not produce an expected product, alternative methods were then explored. These included the use of Fidelitaq polymerase (Affymetrix, Santa Clara, CA, USA), as well as the development of species-specific primers (Table 2) from the flanking regions previously assembled in Geneious Pro 7.0.1 (Biomatters Ltd., Auckland, New Zealand).

Table 2

Species specific primers

Primer Name Sequence

zdip 119aF CCCCTGCCTTACCACTTGGGC

zdip 120aF GGGAAACTCCAGGTCCGAACAGC zdip 120aR GCTCACTCTTCATCAATCCCTACGG zdip 121aR CTTCTCAAGTTCTTCCGCCAAGCC

zdip 121bR GGTCTAATTCCGAGCGGTAG

zdip 167bF GCGGGATTATTCGTGACTGCG zdip 168aR CCTGCTAAAGCCCCAAACCATAGAG zdip 168bR GGGCCTATTCGAGAACGCCTACG

Table 2. Species-specific primers designed for Zea diploperennis.

9

Gel electrophoresis was subsequently employed for verification of PCR product. Single PCR products of the expected size were then prepared for sequencing with the Wizard SV Gel and

PCR Clean-up System (Promega Corp., Madison, WI, USA). Nucleotide concentration was then verified with the Nanodrop 1000 (ThermoFisher Scientific, Wilmington, DE, USA) and those products were then diluted to 25 ng/20 μl for DNA sequencing at Macrogen Inc. (Seoul, South

Korea).

Plastome Assembly and Annotation

For Sanger sequenced data, the electropherogram sequence data received from Macrogen were downloaded into the Geneious Pro software program. Overlappng regions were identified and sequences were then assembled until the entire genome had been completed and specific regions identified and annotated.

For NGS data, DynamicTrim v2.1 of the SolexaQA software suite (Cox, et al., 2010) was used to perform initial quality trimming on the single 99 bp reads using default settings.

LengthSort v2.1, also from the SolexaQA software suite, was used to remove any sequences with a length less than 25 bp.

Using exclusively de novo methods, the NextEra sequenced plastomes were assembled.

Following methods developed by Wysocki et al. (2014), the Velvet software suite (Zerbino &

Birney, 2008) was iteratively run. The ACRE (anchored conserved region extension) method

(Wysocki et al., 2014) scaffolded the remaining contigs. Gaps between large contigs were then manually resolved locating regions of overlap until the plastome was complete. IR boundaries

10 were located by using the BLAST search tool to align the entire plastome to itself and locate the strands marked plus/minus (Burke et al., 2012).

The completed plastome assembly was then annotated by aligning to Zea mays subsp. mays (Table 3) and annotations sharing a minimum of 82% similarity were transferred from the reference to the newly assembled plastome (Wysocki et al., 2014). The annotated plastomes were banked in the GenBank database (Table 1).

Table 3

Previously determined plastomes

Out-group/ comparison species GenBank Accession Number

Saccharum officinarum AP006714

Sorghum bicolor NC008602

Coix lacryma-jobi FJ261955

Zea mays subsp. mays NC001666

Table 3. GenBank accessions of species for which plastomes were previously determined.

11

Plastome Comparisons

Given that Zea mays subsp. mays has been well studied with regard to its chloroplast genome structure and content, it makes an exceptional comparison for other species in this study

(Maier et al., 1995). Moreover, complete plastomes of five congeneric species of Zea provide a unique opportunity to compare, contrast, and identify microstructural mutations (namely: indels caused by SSM among tandem repeats, and inversions caused by recombination in stem-forming regions of stem-loops). Three out-group species were selected based on previously acquired phylogenomic data. Saccharum officinarium, Sorghum bicolor, and Coix lacryma-jobi were selected for out-group comparisons given their phylogenetic relationships to the genus Zea

(Table 3).

Completed and verified genomes were aligned using the ClustalW (Larkin, et al., 2007) alignment tool in Geneious Pro. Microstructural mutations were identified manually, following the methods of Leseberg and Duvall (2009), and a binary file was constructed to differentiate between the ancestral (0) and the derived microstructural mutation (1) condition.

Microstructural Mutation Analysis

Aligned genomes were exported into Nexus or Phylip text format (Felsenstein, J., 2005).

A jModel Test was first performed with the Akaike Information Criterion using the aligned data to determine the appropriate model to specify in phylogenetic analyses (Akaike, 1974; Posada et al., 2003). The General Time Reversible (GTR) substitution model inclusive of invariant sites

(+I) and gamma distribution (+G) was among 88 best fit models within the 100% confidence interval. The GTR+I+G model was selected as the appropriate model for the phylogenetic

12 analyses conducted in this study. The jModel Test model results were then performed to create phylogenetic topologies. A completed binary file differentiating between the ancestral and the derived microstructural mutation was also analyzed for significance in branch node comparisons.

Maximum likelihood (ML), maximum parsimony (MP), Bayesian Inference (BI), and

BEAST analyses were performed for sequence only, binary microstructural mutation only, and sequence plus binary microstructural mutation data sets (where possible).

ML analysis was calculated with the RAxML-HPC2 v8.1.11 program on XSEDE

(Stamatakis, 2014) through the CIPRES science gateway (Miller, et al., 2010). A ML best tree was produced and a consensus bootstrap tree was constructed from 1000 replicates using the

Consense tool from the Phylip software suite available on CIPRES (Felsenstein, 2005).

Bayesian Inference analyses were completed on XSEDE via the CIPRES science gateway using the MrBayes v3.2.2 program (Ronquist, et al., 2012). The Markov Chain Monte

Carlo (MCMC) analysis was run at 10 million generations. The data type was set to “restriction” for the binary portion of the data (Ronquist et al., 2003). This parameter allowed for the sequence data and binary microstructural mutation data to be partitioned within the nexus file itself.

MP branch and bound analyses were calculated via PAUP*v4.0b10 (Swofford, 2003).

Subsequent bootstrap analyses were also performed with 1,000 bootstrap replicates.

Mutation Rates and Divergence Times

Divergence times were estimated in the study group using an uncorrelated relaxed clock method. This method requires the use of dated fossils for calibration of specific nodes.

13

There are very few fossils of panicoid grasses (Iles et al., 2015; Vicentini et al., 2008); and there are even fewer relevant fossils representative of the Andropogoneae tribe that are appropriate to use as calibration points in divergence estimations. Based on the recently published data by Iles et al. (2015) the closest fossils are representative of the pooids, and not a suitable point of calibration for the species used in this study. Both the panicoid fossils from

Setaria and Dichanthelium were considered based on the report by Vicentini et al. (2008); however, as they are both genera of Paniceae, both were inappropriate for use in this study. The fossil dates obtained by Piperno et al, (2009) suggest a later phytolith fossil date of 8,700 calendar years before present for the genus Zea. The samples themselves were not radiocarbon dated, but rather dated by stratigraphy. A companion paper (by Ranere & Piperno et al., 2009) radiocarbon dated the millstone tools used to grind starch grains of Zea. The fossil estimates of

Pohl et al., (2007) offer a more conservative date of 7,300 calendar years before present, and give a radiocarbon date rather than a stratigraphy based date for Zea phytoliths.

Along with Zea fossils, Sorghum and Saccharum fossils were also difficult to locate in peer reviewed publishing. No fossils for Saccharum were found, however an anthropological study conducted by Dahlberg, et al., (1996) at a site in the Western Desert of southern Egypt discovered carbonized fossils of Sorghum seeds, which were subsequently radiocarbon dated to

8000 calendar years before present.

An evolutionary (BEAST) analysis was performed using the BEAST v2.1.2 software

(Bouckaert, et al., 2014). Parameters were set in the accompanying BEAUti program. A GTR

+I+G substitution model was used in conjunction with a relaxed log-normal clock. A chain length of 110 million was specified for the total number of generations to be run. Two nodes

14 were calibrated with fossils and given a log-normal distribution. The crown node of two Zea subspecies was calibrated to 8700 years as obtained from the maize starch grain data of Piperno et al., (2009). The crown node of Sorghum bicolor and Saccharum officinarum was calibrated to

8,000 years, based on anthropological radiocarbon dated carbonized Sorghum seeds discovered in Egypt (Smith et al., 2000; Dahlberg et al., 1996).

The remaining nodes were not calibrated and thus, were not given a specific distribution.

Six taxonomic groups, as they appear in all phylogenetic topologies, were constrained to be monophyletic (Table 4). Mean and variance of the log-normal distributions was calculated using

ParameterSolver v3.0 (Cook et al., 2013).

Tree files produced from the BEAST analysis were annotated using TreeAnnotater v2.1.2

(Rambaut, et al., 2014) and viewed with the FigTree v1.4.2 program (Rambaut, et al., 2014). The

TreeAnnotator burn-in value was set to 20 percent. Relevant node ages and HPD ranges are shown for the BEAST analysis (Table 4).

15

Table 4

BEAST analysis

BEAST Calibrations Crown Stem Zea Crown Stem Zea Crown Stem Analysis Zea Saccharum luxurians Zea Coix subsp. & perennials Sorghum

Zea subsp.: Mean 8,700 10,637.74 7,999.99 1,994.62 1,024.41 12,924.94 8,700 cal. age Ybp.

Saccharum 95% 8,698.11 8,758.85 7,998.04 766.22 714.96 11,975.45 & Sorghum: HPD 8,000 cal. Lower Ybp.

95% 8,701.96 12,475.12 8,001.90 5,106.96 1,390.95 13,884.46 HPD Upper

Table 4. Results of BEAST analysis divergence times in thousands of calendar years before present.

In determining the differing numbers of mutations between specific pairs of species and their respective nodes on the phylogenetic trees, the number of mutations was divided by the total node ages to determine the estimated mutation rate over time. An average of one mutation per “X” years was also determined by the reciprocal of the mutation rate to provide an example illustrating the rates.

16

Results

This study sequenced 562,701 bases for the four plastomes. Completed plastomes were annotated for the four selected taxa in Zea and GenBank accession numbers are reported (Table

1). Overall, 215 microstructural mutations and rare genomic changes were observed in a

ClustalW alignment of the eight andropogonoid species, counted separately from base substitution mutations, and ranging from 2 - 432 bp. 86.39% of mutations occurred in the intergenic spacers and only 13.61% were detected in coding sequences.

The mutational mechanisms (i.e. slipped strand mispairing among tandem repeats; recombination in stem-forming regions of stem-loops) were also identified. Ambiguous mutations that could not be classified as ancestral or derived were discarded since little could be inferred without a standard ancestral condition for comparison. Ultimately, after discarding ambiguous mutations, a total of 207 microstructural mutations and rare genomic changes were analyzed.

Phylogenetics: Sequence Analysis

Phylogenetic analyses of complete plastome sequence data (with gaps removed having a total length of: 115,574), produced MP, and BI trees (not shown) with topologies identical to the

ML tree (Fig. 1). ML bootstrap values at all nodes were 100% reflecting the quantity of phylogenetic information in completely analyzed plastomes. In these trees, the two subspecies of

Z. mays were monophyletic and sister to a clade of the other Zea species, in which Z. perennis was sister to Z. diploperennis. The latter clade was in turn sister to Z. luxurians. In the outgroup

17 clade, Sorghum bicolor was sister to Saccharum officinarum, and this clade was sister to Coix lacryma-jobi (Fig. 1).

....

91/90

Figure 1. Maximum likelihood best tree showing bootstrap support values of MP analyses. Branch lengths are proportional to the substitution rate along the branch. (Sequence/MME).

18

MP branch and bound analysis of sequence data using the PAUP test produced one tree of 2798 steps. The ensemble consistency index (CI) was 0.9736; the CI (excluding uninformative characters) was 0.9085, and the retention index (RI) equaled 0.9408. The bootstrap analysis supported all nodes at values equal to 100%, except for that uniting the Zea perennis and Zea diploperennis, which was 91% (Table 5).

Table 5

ML, MP, BI data

Sequence MME Combined ML Best tree, -lnL -177976.187 -698.907 -177975.327 Mean Bootstrap 100% 62.7% 99.4% Std. Dev. 0 28.10 1.30

MP No. MP trees 1 1 1 Length (steps) 2798 228 3036 CI (excl. U.C.) 0.9085 0.8091 0.8947 RI 0.9408 0.8571 0.9306 Mean Bootstrap 98.2 94.8 94.2

BI Mean pp 1 0.8980 1 Std. Dev. 0 0.1872 0

Monophyletic groups Section Zea X X X Section Luxuriantes X X X Saccharum & Sorghum X X X

Table 5. ML, MP, BI analysis results for sequence, microstructural mutation (MME), and combined data sets. U.C. = “uninformative characters.”

19

A BI analysis resulted in posterior probability values of one for all species, including outgroups, and an identical topology to the ML and MP tree results.

Phylogenetics: Microstructural Mutation Analysis

The binary microstructural mutation events (MME) data file of 207 characters denoted as

0 for ancestral, and 1 for derived was analyzed using ML, MP, and BI methods. The ML bootstrap consensus resulted in moderate to strong support values for two nodes. A value of 98% united Coix lacryma-jobi to Sorghum bicolor and Saccharum officinarum. A value of 71% supported the monophyly of the two subspecies of Zea mays. Other bootstrap values fell below

50% support.

The MP analysis using PAUP resulted in one tree with a length of 228 steps. The ensemble CI was 0.9079, CI (excluding uninformative characters) was 0.8091, and the RI was

0.8571. Bootstrap values were reported as 98% between Coix lacryma-jobi, Sorghum bicolor and

Saccharum officinarum, 95% between Zea luxurians, Zea perennis, and Zea diploperennis and outgroups, 90% between Zea luxurians and Zea perennis, and Zea diploperennis, 91% between

Zea mays subsp. huehuetenangensis and Zea mays subsp. mays and outgroups.

A BI analysis resulted in posterior probabilities averaging 0.9660, as noted in Table 5, and identical topologies to both the ML and MP analyses. The reported posterior probability values between specific branches of the BI tree were as follows: Sorghum bicolor and

Saccharum officinarum (0.8813), Zea luxurians, Zea perennis and Zea diploperennis (0.5169),

20

Zea perennis and Zea diploperennis (0.5097), Zea mays subsps. mays and subsp. huehuetenangensis (0.7767).

Included in the microstructural mutation analysis, was a characterization of the frequency and occurences of different types of microstructural mutations (insertions/deletions and inversions). The frequency of indels was higher than the frequency of inversions of the same length. Out of 207 microstructural mutations, 10 inversions were recorded in contrast to 197 indels. Both inversions and indels showed a general trend of decreasing frequency with increased length (p<0.01) (Fig. 2).

Occurrence vs. Length of Microstructural Mutations 500 40

450 35 400 30 350 300 25 250 20

length 200 15 150 10 100 5

50 of number occurrences 0 0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 microstructural mutations

length Number of occurrences

Figure 2. Diagram examining the relationship between the number of occurrences and lengths of MME.

21

Inversions of two bp in length occurred in three of the recorded cases. However, an inversion of

16 bp in length occurred in one of the recorded cases.

Indel mutations also exhibited a similar pattern, with the highest frequency occurring between 2 – 20 bp, and a total range of 2 – 432 bp. The highest frequency of indel length was found in 37 of 197 indels at a length of 5 bp.

A total of eight mutations were considered to be rare genomic changes. Each of these was determined to be unusually large, low in frequency and either autapomorphic or shared only by closely related species (Table 6).

Table 6 Characterization of RGCs

Taxa Type of MME Length (bp) Location

Zea ingroup Deletion 73 rps16-psbK IGS

Z. perennis Insertion 432 rps16-psbK IGS

Z. perennis Deletion 165 rps16-psbK IGS

Zea subspecies Deletion 83 psbC-psbZ IGS

Zea subspecies Deletion 83 psbZ-psbM IGS

S. bicolor Deletion 61 atpI-atpH IGS

C. lacryma-jobi Deletion 126 atpH-atpF IGS

Z. diploperennis Insertion 140 psaJ

Table 6. Rare genomic changes (RGCs) characterized across the species included in this study. The taxa, MME type, length, and location in the genome are noted. (IGS=Intergenic Space)

22

Phylogenetics: Combined Sequence and Microstructural Mutation Analysis

A ML analysis resulted in a best tree with bootstrap values of 100% for all nodes, except for a value of 97% between sister groups Zea perennis + Zea diploperennis, and Zea luxurians.

A MP analysis resulted in one tree with a length of 3,036 steps, and an ensemble CI equal to 0.9681, CI (excluding uninformative characters) equal to 0.8947, and a RI equal to 0.9306.

The MP tree had bootstrap values of 100% for each branch, exclusive of the crown node for Zea perennis and Zea diploperennis, which was 71%.

A BI analysis produced one tree with average probability estimates of 1.0 for each branch

(Table 5) and topologies identical to both the ML and MP tree results.

Divergence Time Estimation Analysis

A divergence estimation log normal relaxed clock analysis using BEAST was performed.

Fossil ages were used to calibrate specifc nodes. Six monophyletic groups corresponding to those in the ML tree were constrained to be monophyletic. The crown node of two Zea subspecies was calibrated to 8,700 years as obtained from the maize starch grain data of Piperno, et al., (2009).

The crown node of Sorghum bicolor and Saccharum officinarum was calibrated to 8,000 years based on anthropological radiocarbon dated Sorghum seeds discovered in Egypt (Smith, et al.,

2000).

The BEAST analysis produced the following approximate node ages (including the fossil calibrated nodes): Zea mays subsp. mays and Zea mays subsp. huehuetenangensis: 8,700 cal.

YPB, Zea luxurians: 2,000 cal. YBP, Zea perennis and Zea diploperennis: 1,000 cal. YBP, Zea

23 species stem node: 11,000 cal. YBP, Coix lacryma-jobi: 13,000 cal. YBP, Sorghum bicolor and

Saccharum officinarum: 8,000 cal. YBP (Fig. 3; Table 4).

Figure 3. Tree result of BEAST analysis with node ages shown. Fossil calibrations were used for Zea subspecies: 8700 ybp, and for Sorghum bicolor: 8000 ybp. For 95% HPD ranges see Table 4.

24

Mutation Rates

Tallies of mutational differences between the following taxon pairs were: Zea mays subsp. mays and Zea mays subsp. huehuetenangensis: 11 MME, Zea luxurians and Zea perennis: eight MME, Zea luxurians and Zea diploperennis: 32 MME, Zea perennis and Zea diploperennis: 29 MME, Coix lacryma-jobi and Saccharum: 85 MME, Coix lacryma-jobi and

Sorghum bicolor: 90 MME, Sorghum bicolor and Saccharum officinarum: 51 MME.

By calculating the MME rates as described above, this resulted in the following mutation rates over time: Zea mays subsp. mays and Zea mays subsp. huehuetenangensis: 0.00126 mutations per year, equivalent to an average of 1 MME approximately every 800 years; Zea luxurians and Zea perennis: 0.00401 equivalent to an average of 1 MME approximately every

250 years; Zea luxurians and Zea diploperennis: 0.01604, equivalent to an average of 1 MME approximately every 60 years; Zea perennis and Zea diploperennis: 0.02830, equivalent to an average of 1 MME approximately every 35 years; Zea species (all mutations): 0.0075, equivalent to an average of 1 MME approximately every 130 years; Coix lacryma- jobi stem node (all out- group mutations): 0.0174 equivalent to an average of 1 MME approximately every 60 years;

Coix lacryma-jobi and Sorghum bicolor: 0.00696, equivalent to an average of 1MME approximately every 85 years; Coix lacryma-jobi and Saccharum officinarum: 0.00657, equivalent to an average of 1 MME approximately every 90 years; Sorghum bicolor and

Saccharum officinarum: 0.00638, equivalent to an average of 1 MME approximately every 160 years.

25

Zea perennis Insertion

Within the Zea perennis plastome between bases 12,549 and 12,632 (between CDS: psbC and psbZ) , there exists an 83 bp insertion, first identified by Doebley et al. (1987) with the use of restriction enzymes. Upon further scrutiny of this specific 83 bp sequence, it was determined that Zea perennis in fact, shares this common 83 bp sequence with all species included in this study, excluding: Zea mays subsp. mays and Zea mays subsp. huehuetenangensis, which appear to have this sequence deleted (Table 7).

Table 7

Zea perennis insertion/deletion

Species Sequence

Saccharum GTATACTGTATACACGGATACAGAATCCGCTATATCCGTTTGTGAAATAAAGGCTAAATCCCCTCCCCTCAACTCCATATCCAAATAAAAACG

officinarum

Coix lacryma-jobi GTATACTGTATACACGGATACAGAATCCGCTATATCCGTTTGTGAAATAAAGGCTAAATCCCCTCCCCTCAACTCCATATCCAAATAAAAACG

Sorghum bicolor GTATACTGTATACACGGATACAGAATCCGCTATATCCGTTTGTGAAATAAAGGCTAAATCCCCTCCCCTCAACTCCATATCCAAATAAAAACG

Zea luxurians GTATACTGTATACACGGATACAGAATCCGCTATATCCGTTTGTGAAATAAAGGCTAAATCCCCTCCCCTCAACTCCATATCCAAATAAAAACG

Zea perennis GTATACTGTATACACGGATACAGAATCCGCTATATCCGTTTGTGAAATAAAGGCTAAATCCCCTCCCCTCAACTCCATATCCAAATAAAAACG

Zea mays subsp. GTATA------AAGCG

huehuetenangensis

Zea mays subsp.

mays GTATA------AAGCG

Zea diploperennis GTATACTGTATACACGGATACAGAATCCGCTATATCCGTTTGTGAAATAAAGGCTAAATCCCCTCCCCTCAACTCCATATCCAAATAAAAACG

Table 7. Sequence alignment of the 83 bp insertion/deletion first identified by Doebley et al., 1987 as an insertion in the Zea perennis plastome. As shown, the 83 bp sequence is identified as a deletion in both Zea subspecies.

26 27

Discussion

In this study the ML, MP, and BI analyses confirmed previous phylogenies based on nuclear and mitochondrial data, as the topologies were identical for the species included within this study. Divergence estimation defined times of divergence with a molecular clock analysis that used the phylogenomic information in complete plastomes. These times could then be used to precisely determine the rates of microstructural mutations that were exhaustively catalogued across an alignment of the complete plastid chromosome.

Phylogenomic and Mutation Rate Analysis

The frequency of inversions was determined to be less than the frequency of indel mutations as predicted in previous studies (Leseberg and Duvall, 2009; Saarela et al., 2015). In the microstructural mutation and rare genomic change data set, a total of 197 indel mutations occurred among the plastomes. However, only 10 inversions were recorded. The most frequently observed mutational mechanism was slipped-strand mispairing in which a segment of sequence was excised or repeated due to slippage between tandem repeats.

Indels were hypothesized to occur with greater frequency when the event involved fewer nucleotide bases because the amount of energy required to cause slippage over fewer nucleotides should be less than that required for slippage over more nucleotides (Wu et al., 1991). However, the trend describing the frequency of indels and inversions did not perfectly correlate to the length of the mutation. Instead, small mutations between 2-4 bp in length showed an upward trend, peaking at five bp, and only then gradually decreasing as length increased, rather than having the greatest frequency present in the smallest length mutation followed by a gradual 28 decrease as length increased (Fig. 2). A possible explanation for this trend is that a small mutational repair mechanism may more easily repair the shortest classes of indels. However, the mechanism may be less effective with mutations greater than four bp.

The eight individual mutations classified as RGCs occurred only once per genome, and were not present across multiple species (excluding the Zea subspecies). However, both Zea perennis and both subspecies mays and huehuetenangensis contained multiple RGCs. With regard to the subspecies, this may have been accelerated as a result of artificial selection during domestication. More research is needed for a definitive answer to these findings by including complete plastome sequence from Zea mays subsp. mexicana, subsp. parviglumis, and Zea nicaraguensis.

The overall microstructural mutation rate for the genus ranges from 0.00126 to 0.02830.

The reported range shows that the highest MME rate is nearly 22 times greater than the lowest

MME rate. Nucleotide substitution rates have been estimated for grasses and other plants.

Relatively few studies have estimated rates of indel mutation in grass species. Previous studies using complete chloroplast data reported mutation ranges per site per year in intergenic regions of varying lengths as 0.8 +/- 0.04 X 10-9 (Yamane et al., 2006). Converting the mutation rates obtained in this study to per site per year, the rates range from 1.09 X 10-8 to 2.4 X 10-7. These values are much larger than the previous data of Yamane et al., (2006), however this study utilizes the complete plastome rather than only intergenic regions (as done in the Yamane study).

Therefore, this study provides a precise measure of MME occuring within both the CDS and intergenic space for the complete plastomes. It must also be noted that the divergence estimates used by Yamane et al., (2006) placed the divergence of Saccharum and maize at roughtly 7.6

29 million years ago based on extrapolations of rbcL substution rates per synonymous site per year.

In contrast, this study utilizes the node ages obtained from the BEAST divergence analysis calibrated on reliable fossils.

Among the examined species there is unexpected and greatly varied magnitude of MME rates. The Zea subspecies (mays and huehuetenangensis) exhibited the lowest overall mutation rates, while Zea diploperennis exhibited the highest rate. Since the traits of Zea diploperennis are not artifically selected, this is an unexpected result. Given the highly variable rates of mutation observed in this study, in-depth sampling is shown to be necessary to fully characterize rates of

MMEs among taxa.

In comparison to the outgroup taxa, Zea has, on average, a greater MME rate (0.02480

MME per year for Zea; 0.00664 MME per year for outgroup taxa). The range of MME rates for the genus is also greater than that of the outgroup taxa included in this study, ranging from

0.00126 to 0.02380 MME per year in Zea contrasted with that of the outgroup, which ranges from 0.00638 to 0.00696 MME per year.

Divergence Analysis (BEAST)

Based upon the accuracy of fossil calibrated divergence time estimates, the speciation events within the genus are relatively recent in terms of evolutionary time, having taken place within the last 1,000-11,000 years. This corresponds to the widely accepted domestication timeline of Zea mays, although it more firmly establishes an absolute history for these events estimated from complete plastome data and reliable archeological and fossil evidence.

30

While the Zea subspeceis‘ divergence can be linked to its independent domestication, the divergence of the other taxa likely faced very different selection pressures. Historically, the perennial species were geographically distributed across Mexico and Central America in diverse locations. Zea diploperennis, Zea perennis, and Zea luxurians are found in areas averaging 858-

1946 m in elevation (Hufford et al. 2012). The area is part of the Sierra Madre Occidental and

Sierra Madre Del Sur mountain ranges. Mountainous regions are known for isolating species and imposing selective variation (Hedberg et al., 1969). Another potential explanation for this divergence can be explained by climate driven variation of the early Holocene which ushered in a warming and drying trend (Ellwood et al., 2006). These factors may have led to the derived conditions seen in this analysis.

Zea perennis “Insertion"

Upon analysis of the Zea perennis “insertion," it became clear that the original event, identified by Doebley et al. (1987) is an ancestral condition rather than a derived insertion.

Furthermore, Zea mays subsp. mays and Zea mays subsp. huehuetenangensis have a deletion of this 83 bp sequence. This specific sequence was previously misclassified as an insertion within the Zea perennis plastome based on the incomplete RFLP evidence. With the multiple Zea species examined in this study, as well as the outgroup species, evidence supports the reclassification of this specific 83 bp sequence as a deletion (rather than an insertion within the

Zea perennis plastome) in the two Zea subspecies present in this study (Table 7).

31

Conclusion

This study has examined the divergence of selected species within the genus Zea based on sequence and microstructural mutation data obtained from complete chloroplast genomes.

The phylogenomic topologies here reflect those of analogous studies and express identical phylogenies across these multiple analyses. As the genus is relatively recent and rapid in its divergence, the precisely measured microstructural mutation rates are greatly varying among the species examined in this study. Undoubtedly, the domestication of maize has contributed partially to the rapid diversification and speciation of genus Zea. However, factors such as climate shifts during the Holocene and diverse geographic distributions may also have contributed to much of the divergence seen within the genus.

LITERATURE CITED

Akaike H. 1974. A new look at the statistical model identification. Institute of Electrical and Electronics Engineers Transactions on Automatic Control 19:716–723.

Bouckaert, R., Heled, J., Kcompbiol.org/article/info%253adoi%252f10.1371%252fjournal.pcbi.1003A., & Drummond, A. J. (2014). BEAST 2: A Software Platform for Bayesian Evolutionary Analysis. PLoS Computational Biology, 10(4), e1003537. doi:10.1371/journal.pcbi.1003537

Burke, S. V., Grennan C. P., and Duvall M. R. "Plastome sequences of two New World bamboos—Arundinaria gigantea and Cryptochloa strictiflora (Poaceae)—extend phylogenomic understanding of Bambusoideae." American journal of botany 99.12 (2012): 1951-1961.

Caruccio, Nicholas. "Preparation of next-generation sequencing libraries using Nextera™ technology: simultaneous DNA fragmentation and adaptor tagging by in vitro transposition." High-Throughput Next Generation Sequencing. Humana Press, 2011. 241-255.

Cook J., Wathen K. J., and Nguyen H. (2013) Parametersolver v3.0 Available from https://biostatistics.mdanderson.org/SoftwareDownload/SingleSoftware.aspx?Software_I d=6

Cox, M. et al., "SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data." BMC Bioinformatics (2010) 11:485.

Dahlberg, J. A., and K. Wasylikowa. "Image and statistical analyses of early sorghum remains (8000 BP) from the Nabta Playa archaeological site in the Western Desert, southern Egypt." Vegetation History and Archaeobotany 5.4 (1996): 293-299.

Darracq, Aude, Jean-Stéphane Varré, and Pascal Touzet. "A scenario of mitochondrial genome evolution in maize based on rearrangement events." BMC genomics 11.1 (2010): 233.

Dhingra, A. and K. Folta. ASAP: amplification, sequencing and annotation of plastomes. BMC Genomics (2005) 6: 176.

Doebley, J., et al., "Insertion/deletion mutations in the Zea chloroplast genome." Current Genetics. (1987) 11: 617-624.

33

Doebley, J., Laboratory of Genetics. University of Wisconsin-Madison. (2003) http://teosinte.wisc.edu/taxonomy.html

Ellwood, Brooks B., and Wulf A. Gose. "Heinrich H1 and 8200 yr BP climate events recorded in Hall's Cave, Texas." Geology 34.9 (2006): 753-756.

Felsenstein J: PHYLIP (phylogeny inference package). Distributed by the author. Department of Genome Sciences, University of Washington, Seattle; 2005

Fuglie, Keith, et al. "Private Industry Investing Heavily, and Globally, in Research To Improve Agricultural Productivity." Amber Waves 2 (2012).

Grass Phylogeny Working Group II (2012), New grass phylogeny resolves deep evolutionary relationships and discovers C4 origins. New Phytologist, 193: 304–312. doi: 10.1111/j.1469-8137.2011.03972.x

Hedberg, Olov. "Evolution and speciation in a tropical high mountain flora." Biological Journal of the Linnean Society 1.1‐2 (1969): 135-148.

Hufford, Matthew B., et al. "Teosinte as a model system for population and ecological genomics." Trends in Genetics 28.12 (2012): 606-615.

Iles, William JD, et al. "Monocot fossils suitable for molecular dating analyses."Botanical Journal of the Linnean Society (2015).

Jones, Samuel S., Sean V. Burke, and Melvin R. Duvall. "Phylogenomics, molecular evolution, and estimated ages of lineages from the deep phylogeny of Poaceae." Plant systematics and evolution 300.6 (2014): 1421-1436.

Kim, K.J., Lee, H.L., “Widespread occurrence of small inversions in the chloroplast genomes of land plants.” Molecules and Cells. (2005). 19:1. Pp104-113.

Larkin, Mark A., et al. "Clustal W and Clustal X version 2.0." Bioinformatics 23.21 (2007): 2947-2948.

Leseberg, Charles H., and Melvin R. Duvall. "The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals." Journal of molecular evolution 69.4 (2009): 311-318.

Maier, Rainer M., et al. "Complete sequence of the maize chloroplast genome: gene content, hotspots of divergence and fine tuning of genetic information by transcript editing." Journal of molecular biology 251.5 (1995): 614-628.

34

Matsuoka, Yoshihiro, et al. "A single domestication for maize shown by multilocus microsatellite genotyping." Proceedings of the National Academy of Sciences 99.9 (2002): 6080-6084.

Miller, M.A., Pfeiffer, W., and Schwartz, T. (2010) "Creating the CIPRES Science Gateway for inference of large phylogenetic trees" in Proceedings of the Gateway Computing Environments Workshop (GCE), 14 Nov. 2010, New Orleans, LA pp 1 - 8. Murray, S.C., Jessup, R.W. (2014) “Breeding and Genetics of Perenial Maize: Progress, Opportunities and Challenges.” http://www.landinstitute.org/wp- content/uploads/2014/11/PF_FAO14_ch08.pdf.

Piperno, Dolores R., et al. "Starch grain and phytolith evidence for early ninth millennium BP maize from the Central Balsas River Valley, Mexico."Proceedings of the National Academy of Sciences 106.13 (2009): 5019-5024.

Pohl, Mary ED, et al. "Microfossil evidence for pre-Columbian maize dispersals in the neotropics from San Andres, Tabasco, Mexico." Proceedings of the National Academy of Sciences 104.16 (2007): 6870-6875.

Posada D. jModelTest: Phylogenetic Model Averaging. Molecular Biology and Evolution. Guindon S and Gascuel O (2003). A simple, fast and accurate method to estimate large phylogenies by maximum-likelihood". Systematic Biology 52: 696-704.

Rambaut A, et al. (2014) FigTree v1.4.2, Available from http://tree.bio.ed.ac.uk/software/figtree/

Ranere, Anthony J., et al. "The cultural and chronological context of early Holocene maize and squash domestication in the Central Balsas River Valley, Mexico." Proceedings of the National Academy of Sciences 106.13 (2009): 5014-5018.

Ranum, Peter, Juan Pablo Peña‐Rosas, and Maria Nieves Garcia‐Casal. "Global maize production, utilization, and consumption." Annals of the New York Academy of Sciences 1312.1 (2014): 105-112.

Ronquist, F. and J. P. Huelsenbeck. 2003. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572-1574.

Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D. L., Darling, A., Höhna, S., ... & Huelsenbeck, J. P. (2012). MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic biology, 61(3), 539-542.

Ross-Ibarra, Jeffrey, Maud Tenaillon, and Brandon S. Gaut. "Historical divergence and gene flow in the genus Zea." Genetics 181.4 (2009): 1399-1413.

35

Saarela, Jeffery M., William P. Wysocki, Craig F. Barrett, Robert J. Soreng, Jerrold I. Davis, Lynn G. Clark, Scot A. Kelchner, J. Chris Pires, Patrick P. Edger, Dustin R. Mayfield, and M. R. Duvall. Plastid phylogenomics of the cool-season grass subfamily: Clarification of relationships among early-diverging tribes. AoB plants (2015): plv046

Schnable, Patrick S., et al. "The B73 maize genome: complexity, diversity, and dynamics." science 326.5956 (2009): 1112-1115.

Smith, C. Wayne, and Richard A. Frederiksen, eds. Sorghum: Origin, history, technology, and production. Vol. 2. John Wiley & Sons, 2000.

Stamatakis, A. (2014) RAxML Version 8: A tool for Phylogenetic Analysis and Post- Analysis of Large Phylogenies. Bioinformatics 10.1093/bioinformatics/btu033 http://bioinformatics.oxfordjournals.org/content/early/2014/01/21/bioinformatics.btu 033.abstract

Swofford, D. L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.

USDA. Foreign Agricultural Service. http://apps.fas.usda.gov/psdonline/psdreport.aspx?hidReportRetrievalName=BVS&hidRe portRetrievalID=884&hidReportRetrievalTemplateID=1

Vicentini, Alberto, et al. "The age of the grasses and clusters of origins of C4 photosynthesis." Global Change Biology 14.12 (2008): 2963-2977.

Wu, Dan Y., et al. "The effect of temperature and oligonucleotide primer length on the specificity and efficiency of amplification by the polymerase chain reaction." DNA and cell biology 10.3 (1991): 233-238.

Wysocki WP, Clark LG, Kelchner SA, Burke SV, Pires JC, Edger PP, Mayfield DR, Triplett JK, Columbus JT, Ingram AL, Duvall MR. “A multi-step comparison of short-read full plastome sequence assembly methods in grasses.” Taxon 2014.

Yamane, Kyoko, Kentaro Yano, and Taihachi Kawahara. "Pattern and rate of indel evolution inferred from whole chloroplast intergenic regions in sugarcane, maize and rice." DNA research 13.5 (2006): 197-204.

Zerbino, D., Birney, E. "Velvet algorithims for de novo short read assembly using de Bruijn graphs." Genome Research (2008) 18: 821-829.