This is the pre-peer reviewed version of the following article:

Sancho R, Cantalapiedra CP, López-Alvarez D, Gordon SP, Vogel JP, Catalán P, Contreras-Moreira B (2017) Pan-plastome and phylogenomics of Brachypodium: flowering time signatures, introgression and recombination in recently diverged ecotypes. New Phytologist, doi: 10.1111/nph.14926

which has been published in final form at http://onlinelibrary.wiley.com/doi/10.1111/nph.14926/abstract

This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.

Comparative genomics and phylogenomics of Brachypodium plastomes: flowering time signatures, introgression and recombination in recently diverged ecotypes

Journal: New Phytologist

Manuscript ID Draft

Manuscript Type: MS - Regular Manuscript

Date Submitted by the Author: n/a

Complete List of Authors:
Sancho, Ruben; University of Zaragoza, Department of Agricultural and Environmental Sciences
Cantalapiedra, Carlos; Estacion Experimental de Aula Dei, Departamento de Genética y Mejora
López-Álvarez, Diana; Universidad de Zaragoza Escuela Politecnica Superior de Huesca, Department of Agricultural and Environmental Sciences
Gordon, Sean; DOE Joint Genome Institute, Functional Genomics
Vogel, John; USDA, Western Regional Research Center
Catalán, Pilar; University of Zaragoza, Department of Agriculture
Contreras-Moreira, Bruno; Estacion Experimental de Aula Dei, Genetics

Key Words: Brachypodium distachyon – B. stacei – B. hybridum, comparative cpDNA genomics, grass phylogenomics, intraspecific genealogy, nested dating analysis, plastid introgression and recombination

Manuscript submitted to New Phytologist for review Page 1 of 42

Title:

Comparative genomics and phylogenomics of Brachypodium plastomes: flowering time signatures, introgression and recombination in recently diverged ecotypes

Authors: Rubén Sancho 1,2, Carlos P. Cantalapiedra 3, Diana López-Alvarez 1, Sean P. Gordon 4, John P. Vogel 4,7, Pilar Catalán 1,2,5*, Bruno Contreras-Moreira 2,3,6*

Affiliations:

1 Department of Agricultural and Environmental Sciences, High Polytechnic School of Huesca, University of Zaragoza, Huesca, Spain

2 Grupo de Bioquímica, Biofísica y Biología Computacional (BIFI, UNIZAR), Unidad Asociada al CSIC

3 Department of Genetics and Plant Breeding, Estación Experimental de Aula Dei-Consejo Superior de Investigaciones Científicas, Zaragoza, Spain

4 DOE Joint Genome Institute, Walnut Creek, CA, USA

5 Department of Biology, Tomsk State University, Tomsk, Russia

6 Fundación ARAID, Zaragoza, Spain

7 Department of Plant and Microbial Biology, University of California, Berkeley, CA

* Corresponding authors (both authors contributed equally as corresponding authors):

Pilar Catalán. High Polytechnic School of Huesca, University of Zaragoza. Ctra. Cuarte km 1, 22071 Huesca (Spain). Phone: +34974232465; fax: +34974239302. Email: [email protected]

Bruno Contreras-Moreira. Estación Experimental de Aula Dei / CSIC, Av. Montañana 1.005, 50059 Zaragoza (Spain). Phone: +34976716089. Email: [email protected]

Total word count (excluding summary, references and legends): 5896

No. of figures: 5 (Figs. 1, 2, 3a, b, 4 in colour)

Summary: 200

No. of Tables: 1 (Table 1a, b)

Introduction: 579

No. of Supporting Information files: 14 (Tables S1, S2, S3, S4, S5, S6a, b, c, d, e, S7, S8, S9, S10; Figs. S1, S2a, b, c, S3a, b, S4a, b, S5a, b, S6a, b (Figs. S1, S2a, b, S3a, b, S4a, b, S6a, b in colour), Methods S1)

Materials and Methods: 1268

Results: 1844

Discussion: 2205

Acknowledgements: 116

Abbreviation Definition

BEP Bambusoideae, Ehrhartoideae and Pooideae clade

BI Bayesian inference

bp base pairs

BS bootstrap support

CDS coding sequence


cpDNA chloroplast DNA

d average number of nucleotide differences

DeltaK (ΔK) second order rate of change of the log probability of data between successive K values for a particular K

DF delayed flowering

EDF extremely delayed flowering

EDF+ extremely delayed flowering (plus other flowering class types) clade

ENA European Nucleotide Archive

ERF extremely rapid flowering

ESS effective sample size

Fis inbreeding coefficient

G gamma

GTR generalised time-reversible model

h haplotype

Hd haplotype diversity index

HPD highest posterior density (interval)

I proportion of invariant sites

IBD isolation by distance

IDF intermediate delayed flowering

IR inverted repeat

IRF intermediate rapid flowering

K number of potential genomic groups

kbp kilo base pairs

LSC long single copy

Ma million years ago

Mbp mega base pairs

MCC maximum clade credibility

MCMC Markov chain Monte Carlo

ML maximum likelihood


MP maximum parsimony

MP mate-pair

N or Ns missing data

ncDNA non-coding DNA

NSyn non-synonymous mutations

NV no vernalization

PACMAD Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae and Danthonioideae clade

PCR polymerase chain reaction

PE paired-end

PPS posterior probability support

RADseq restriction site associated DNA markers

RF Rapid Flowering

S number of segregating sites

Syn synonymous mutations

shm number of shared mutations

SNP single nucleotide polymorphism

SSC short single copy

S+ Spanish (plus other geographically close ecotypes) group

S+T+ Spanish (plus other geographically close ecotypes) and Turkish (plus other geographically close ecotypes) clade

T+ Turkish (plus other geographically close ecotypes) group

ucld uncorrelated lognormal distribution

WGS whole genome sequencing

wV weeks of vernalization

x chromosome base number


SUMMARY

● Few pan-genomic studies have been conducted in plants, and none of them have focused on the intraspecific diversity and evolution of their chloroplast genomes.

● We address this issue in Brachypodium distachyon, a model system for monocots, and its close relatives B. stacei and B. hybridum, for which a large genomic data set has been compiled. We analyze inter- and intraspecific cpDNA comparative genomics and phylogenomic relationships within a family-wide framework.

● Major structural rearrangements were detected between the B. distachyon and B. stacei/B. hybridum plastomes. Two main lineages, an Extremely Delayed Flowering (EDF+) clade and a Spanish-Turkish (S+T+) clade, plus nine chloroplast capture and two cpDNA introgression and micro-recombination events, were detected within B. distachyon. Early Oligocene (30.9 Mya) and Late Miocene (10.1 Mya) divergence times were inferred for the respective stem and crown nodes of Brachypodium and a very recent Mid-Pleistocene (0.9 Mya) time for the B. distachyon split.

● Flowering time is a main factor driving rapid intraspecific divergence in B. distachyon, though it is counterbalanced by repeated introgression between previously isolated lineages. Swapping of plastomes among its three different genomic groups (EDF+, S+, T+) likely resulted from random backcrossing followed by stabilization through selection pressure.

Key words: Brachypodium distachyon – B. stacei – B. hybridum, comparative cpDNA genomics, grass phylogenomics, intraspecific genealogy, nested dating analysis, plastid introgression and recombination.


INTRODUCTION

Chloroplast DNA (cpDNA) has been widely used in inter- and intraspecific phylogenetic analyses in multiple species and populations of plants (Waters et al., 2012; Ma et al., 2014; Middleton et al., 2014; Wysocki et al., 2015). Phylogenetic dating of monocots and eudicots has also been based on cpDNA (Chaw et al., 2004). Comparative genomics of whole chloroplast genomes (plastomes) has provided a way to detect and investigate genetic variation across the seed plants (Jansen & Ruhlman, 2012). The proliferation of Whole Genome Sequencing (WGS), which typically includes a substantial amount of chloroplast sequence, has provided large data sets that can be used to assemble and analyze plastomes (Nock et al., 2011).

Brachypodium is a small genus in the family Poaceae that contains approximately 20 species (17 perennial and 3 annual) distributed worldwide (Schippmann, 1991; Catalán & Olmstead, 2000; Catalán et al., 2012, 2016a). The three annuals include two diploids [B. distachyon (2n=2x=10; x=5) and B. stacei (2n=2x=20; x=10)] and their derived allotetraploid [B. hybridum (2n=4x=30; x=5+10)]. Until recently, all three species were described as variants of B. distachyon (Catalán et al., 2012). All three species have a large, overlapping distribution in their native circum-Mediterranean region (Catalán et al., 2012, 2016b; López-Alvarez et al., 2012; López-Alvarez et al., 2015) and B. hybridum has naturalized extensively around the world. The evolutionary relationship between Brachypodium and other grasses has been thoroughly studied (Catalán et al., 1997; Catalán & Olmstead, 2000; Döring et al., 2007). Most recent phylogenetic analyses place Brachypodium in an intermediate position within the Pooideae clade (Minaya et al., 2015; Soreng et al., 2015; Catalán et al., 2016a). By contrast, only a few studies of intraspecific variation have been conducted in the genus Brachypodium, primarily focusing on B. distachyon (e.g., Filiz et al., 2009; Vogel et al., 2009; Mur et al., 2011; Tyler et al., 2016).

B. distachyon has been established as a model plant for temperate and biofuel grasses (Vogel et al., 2010; Mur et al., 2011; Catalán et al., 2014; Vogel, 2016). Additionally, the B. distachyon complex has been proposed as a model system for grass polyploid speciation (Catalán et al., 2014; Gordon et al., 2016; Dinh Thi et al., 2016). Nuclear and chloroplast genomes of the Bd21 ecotype of B. distachyon have been sequenced, assembled and annotated. The nuclear genome encompasses 272 Mbp (Vogel et al., 2010) and contains 31,694 protein-coding loci. The current chloroplast genome (NC_011032.1) is 135,199 bp long and encodes 133 genes (Bortiri et al., 2008).

In parallel with the creation of the nuclear pan-genome of B. distachyon from 53 diverse lines (Gordon SP et al. under review, citation will be added), and the genome sequencing of its close congeners B. stacei and B. hybridum (unpublished, early access available through Phytozome), we isolated cpDNA sequences from WGS paired-end reads to assemble the corresponding plastomes. Our aim was to compile a large comparative-genomics plastome data set and investigate the evolutionary relationships of the annual Brachypodium species and accessions within the grass phylogenetic framework. The specific objectives of this study were to: (1) assemble, annotate and compare the 57 plastomes of B. distachyon, B. stacei and B. hybridum; (2) reconstruct and date the divergences within the Brachypodium lineages and a family-wide plastome phylogeny; (3) infer the genealogical relationships within the studied accessions of B. distachyon and compare them with the nuclear genome genealogy; and (4) investigate the potential existence of plastid introgression and recombination in B. distachyon ecotypes known to hold nuclear introgressions.

MATERIALS AND METHODS

Plant materials

Brachypodium distachyon, B. stacei and B. hybridum ecotypes used in this work are inbred lines derived from our own collections (Vogel et al., 2009; Mur et al., 2011; Catalán et al., 2012) and from the National Plant Germplasm System (NPGS) and Brachyomics collections (USDA and ABER lines; Vogel et al., 2006; Garvin, 2007; Garvin et al., 2008). Most ecotypes were originally collected in Spain, Turkey and Iraq (Table S1, Fig. 1) (Vogel & Hill, 2008; Filiz et al., 2009; Mur et al., 2011). Available plastome data from the main grass lineages were retrieved from GenBank (Table S2).

Chloroplast DNA automated assembly, annotation and validation

Illumina paired-end (PE) and mate-pair (MP) libraries from 53 B. distachyon, 1 B. stacei and 3 B. hybridum accessions were produced from high molecular weight nuclear genomic DNA, isolated as described previously (Peterson et al., 2000), and randomly sheared into fragments of the desired size, mostly 250 bp, 4 kbp and 10 kbp (Gordon et al., 2014). Read length was 73–100 bp in most cases.

We developed a pipeline, available at https://github.com/eead-csic-compbio/chloroplast_assembly_protocol, for the assembly and annotation of plastid genomes (Methods S1, Table S3, Fig. S1). Briefly, chloroplast reads were extracted from WGS data using DUK (http://duk.sourceforge.net), followed by quality control and error correction with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc), Trimmomatic (Bolger et al., 2014) and Musket (Liu et al., 2013). The filtered reads were then assembled with Velvet (Zerbino, 2010), SSpace (Boetzer et al., 2011) and GapFiller (Boetzer and Pirovano, 2012; Nadalin et al., 2012).
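The read-extraction step can be pictured with a minimal k-mer matching sketch in Python. This is not DUK's implementation or interface: the function names, the k-mer size and the hit threshold are illustrative assumptions, and reverse-complement matching is included because reads may derive from either strand.

```python
# Hypothetical sketch of reference-based read filtering by shared k-mers.
# Not the DUK algorithm; names and defaults are assumptions for illustration.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-mers present in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def extract_plastid_reads(reads, reference, k=21, min_hits=1):
    """Keep reads sharing at least min_hits distinct k-mers with the reference."""
    ref_index = kmers(reference, k) | kmers(revcomp(reference), k)
    kept = []
    for read in reads:
        hits = sum(1 for km in kmers(read, k) if km in ref_index)
        if hits >= min_hits:
            kept.append(read)
    return kept
```

In such a scheme a larger k makes the filter more specific, while a smaller k tolerates more sequencing errors at the cost of retaining more nuclear reads.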

This pipeline can be used to perform both de novo and reference-guided assemblies. The reference-guided assemblies were optimal, so we used the Bd21 chloroplast sequence as a guide to create 57 plastome assemblies. Assembly errors were corrected with SEQuel (Ronen et al., 2012), and by visual inspection of read mappings using IGV (Thorvaldsdóttir et al., 2013).

Gene annotation was performed exhaustively for a single line of each species, and then transferred with custom scripts to the remaining chloroplast assemblies. The cpDNA genomes were compared with Organellar Genome DRAW (Lohse et al., 2013) and Circos (Krzywinski et al., 2009). Junctions between IR-LSC, LSC-IR, IR-SSC and SSC-IR regions as well as main structural variations were confirmed by PCR amplification and Sanger sequencing (Table S4). The annotated plastomes of B. distachyon, B. stacei and B. hybridum ecotypes were deposited at ENA (European Nucleotide Archive) with accession numbers LT222229-30 and LT558582-LT558636.

Intra-specific genealogy, haplotypic network, and genomic diversity and structure analyses

Plastomes from the 53 B. distachyon accessions (Table S1) were aligned using MAFFT (Katoh & Standley, 2013), and poorly aligned regions removed with trimAl (Capella-Gutiérrez et al., 2009). The second inverted repeat region (IRb), arguably the lowest quality segment of all plastomes, was excluded from subsequent phylogenetic analysis (Nock et al., 2011; Middleton et al., 2014; Saarela et al., 2015). Alignments were revised and manually curated using Geneious (Kearse et al., 2012).


Maximum-Likelihood (ML) and Bayesian inference (BI) phylogenomic analyses were performed with RAxML v.8.1.17 (Stamatakis, 2014) and MrBayes v.3.2.4 (Ronquist et al., 2011; Ronquist and Huelsenbeck, 2003), respectively. The GTR+I+G substitution model, selected by JModelTest based on the Akaike Information Criterion (Guindon & Gascuel, 2003; Darriba et al., 2012), was imposed in the searches. In the ML search we computed 20 starting trees from 20 distinct randomized Maximum Parsimony (MP) trees and 1000 bootstrap replicates. In the BI search, two sets of four chains were run for 2 million generations, sampling trees and parameters every 100th generation. A 50% majority-rule consensus tree was computed discarding the first 25% of saved trees as 'burn-in'. All trees were midpoint rooted.

Haplotypic network analysis was conducted with the 53 B. distachyon plastome alignment after removing IRb, indels and columns with missing data (Ns). Statistical parsimony analysis was performed with TCS v1.21 (Clement et al., 2000), setting a maximum connection of 1000 steps. Haplotype polymorphism and genetic diversity statistics of the plastome data set, such as the number of segregating sites (S) and haplotypes (h), the haplotype diversity index (Hd), the number of shared mutations (shm) and the average number of nucleotide differences (d) among the three intraspecific genetic groups retrieved from the phylogenomic analysis (see Results), were calculated with DnaSP v.5 (Librado & Rozas, 2009).
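As an illustration of the diversity statistics named above, the following Python sketch computes S, h and Hd from an aligned set of sequences. It is not DnaSP's code: the function name is hypothetical, sequences are assumed to be upper-case and of equal length, and columns containing gaps or Ns are dropped first, mirroring the filtering described for the haplotype analysis. Hd follows the standard definition Hd = n/(n-1) * (1 - sum of squared haplotype frequencies).

```python
from collections import Counter


def diversity_stats(seqs):
    """Return (S, h, Hd) for a list of aligned, upper-case DNA strings.

    Columns containing anything other than A, C, G or T are excluded.
    Requires at least two sequences.
    """
    n = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # Keep only fully resolved alignment columns.
    keep = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    cleaned = ["".join(s[i] for i in keep) for s in seqs]

    # S: number of columns with more than one nucleotide state.
    seg_sites = sum(
        1 for i in range(len(keep)) if len({s[i] for s in cleaned}) > 1
    )

    # h: number of distinct haplotypes; Hd: haplotype diversity.
    counts = Counter(cleaned)
    h = len(counts)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    return seg_sites, h, hd
```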

Bayesian genomic clustering analysis was performed to infer the structure of the data, using a B. distachyon cpDNA data matrix of 298 mapped polymorphic positions, and to assign accessions' plastomes to the inferred groups using Structure v.2.3.4 (Pritchard et al., 2000). The program was run for a number of potential genomic groups (K) from 1 to 6, imposing ancestral admixture and correlated allele frequencies priors. Ten independent runs with 100,000 burn-in steps, followed by 1,000,000 generations, were computed for each K. The number of genetic clusters was estimated using Structure Harvester (Earl & vonHoldt, 2012), which identifies the optimal K based both on the posterior probability of the data for a given K and on ΔK (Evanno et al., 2005). The potential existence of inter-plastome recombination in two introgressed ecotypes (see Results) was further assessed through visual inspection of the mapped polymorphic alignments and through the recombination detection methods implemented in RDP4 v.4.56 (RDP, GENECONV, BootScan, MaxChi, Chimaera, SiScan, LARD, 3SEQ; Martin et al., 2015) and in OrgConv (Hao, 2010), using default settings in all cases.
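The ΔK criterion can be sketched in a few lines of Python. This is not Structure Harvester's code: the function name is hypothetical, and ΔK is computed here from the per-K mean log probabilities across replicate runs divided by their standard deviation, which is one common reading of Evanno et al. (2005). ΔK is undefined for the smallest and largest K tested.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for each interior K.

    lnp_by_k maps K to a list of ln P(D) values from replicate runs
    (at least two runs per K). Returns a dict K -> Delta K.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    result = {}
    for k in ks:
        if k - 1 in means and k + 1 in means:
            sd = stdev(lnp_by_k[k])
            if sd > 0:  # skip K values whose replicate runs are identical
                second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
                result[k] = abs(second_diff) / sd
    return result


# Toy values for illustration only; these are not results from the study.
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -802.0],
    3: [-790.0, -794.0],
    4: [-785.0, -795.0],
}
toy_delta_k = delta_k(toy_runs)
best_k = max(toy_delta_k, key=toy_delta_k.get)
```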

Phylogenetic and molecular dating analyses

A grass plastome alignment was built including all B. distachyon, one B. stacei and one B. hybridum ecotypes (55 accessions; Table S1) plus the plastomes of 90 grasses (Table S2). ML analysis was performed with RAxML following the same steps indicated above. Pairwise Tamura-Nei (TN) raw genetic distances and pairwise TN patristic (RAxML tree) distances were computed between all pairs of grass entries using MEGA (Kumar et al., 2016) and Geneious (Kearse et al., 2012), respectively.

Divergence time estimations of the Brachypodium lineages were calculated within a family-wide dated phylogeny using a Bayesian nested dating partitioned approach (Pokorny et al., 2011; Mairal et al., 2015) in BEAST v1.8.2 (Drummond et al., 2012). Because there are no known fossil records of Brachypodium, a high-level, more inclusive grass data set (93 samples = 90 grass species + 1 B. distachyon + 1 B. stacei + 1 B. hybridum accessions, 110,370 bp length, 22,489 polymorphic positions) was used to estimate divergence times within the B. distachyon ingroup (53 samples, 110,370 bp length, 415 polymorphic positions). The grass tree was rooted with the ancestral species Anomochloa marantoidea. The estimated ages were drawn from deep-time calibrations imposed in the Poaceae partition and were used to constrain the molecular clock rate of the linked B. distachyon population-level data set and to calibrate the divergence time of its crown node. We estimated divergence times among the Poaceae lineages imposing GTR+G+I, lognormal relaxed clock and Yule tree models, a broad uniform distribution prior for ucld.mean (lower = 1.0E-6; upper = 0.1) and a default exponential prior for ucld.stdev. Calibrations were drawn from fossil-rich dating analyses of the grass family (Vicentini et al., 2008; Bouchenak-Khelladi et al., 2010; Christin et al., 2014) and were imposed as secondary age constraints for the crown nodes of Poaceae (90 ± 1.0 Mya) and of the BEP+PACMAD clade (55 ± 0.5 Mya), assuming normal distributions. For the intra-specific B. distachyon data set we imposed a coalescent constant-size tree model. We ran 1,000,000,000 MCMC generations in BEAST with a sampling frequency of 1,000 generations after a burn-in period of 1%. The adequacy of parameters was checked using Tracer v1.6 (http://beast.bio.ed.ac.uk/Tracer), noting ESS values > 200. Maximum clade credibility (MCC) trees were computed for the Poaceae and for the B. distachyon data sets after discarding 1% of the respective saved trees as burn-in.
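The arithmetic implied by the settings quoted above can be checked with a short Python sketch. It is a bookkeeping aid, not part of BEAST: the function names are hypothetical, the initial chain state is ignored when counting saved samples, and the ± values of the calibrations are assumed to be standard deviations of the normal priors, which the text does not state explicitly.

```python
from statistics import NormalDist


def mcmc_samples(generations, sample_every, burnin_percent):
    """Return (saved, discarded, retained) sample counts for one chain."""
    saved = generations // sample_every
    discarded = saved * burnin_percent // 100
    return saved, discarded, saved - discarded


def prior_interval(mean, sd, mass=0.95):
    """Central interval of a normal calibration prior holding `mass`."""
    dist = NormalDist(mean, sd)
    tail = (1 - mass) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Settings quoted in the text: 1e9 generations, sampled every 1,000, 1% burn-in.
saved, discarded, retained = mcmc_samples(1_000_000_000, 1_000, 1)

# Secondary calibrations, under the stated assumption that +/- is one SD.
poaceae_crown = prior_interval(90, 1.0)
bep_pacmad_crown = prior_interval(55, 0.5)
```

Under these assumptions each chain saves one million states, of which ten thousand are discarded, and the 95% prior intervals are roughly 88.0 to 92.0 Mya and 54.0 to 56.0 Mya.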


RESULTS

Structure, gene content and sequence in B. distachyon, B. stacei and B. hybridum cpDNAs

Reference-guided assemblies were obtained for 57 plastomes. Forty-one assemblies contained ≤ 10 contigs, with an average longest contig length of 84 kbp and 176x depth coverage (Table S5). After scaffolding, 45 assemblies had ≤ 4 scaffolds with a mean plastome length of 124.5 kbp. Missing data ranged from 0 to 6%, with most plastomes (38) showing ≤ 0.1%. Most of the missing sequence was located in the IRb region, which is difficult to assemble because of its redundancy. The resulting Brachypodium plastomes were highly conserved in terms of synteny and gene number. Chromosome lengths varied from 134,991 to 135,214 bp in B. distachyon, and between 136,326 and 136,330 bp in B. stacei and B. hybridum (Table S5). The reference accession B. distachyon Bd21 (NC_011032.1; Bortiri et al., 2008; 2010 – direct submission) and our B. distachyon Bd21 control (Bd21C, assembled and annotated in the current study) showed some differences (10 SNPs and 19 insertions; Table S6a). Our newly assembled Bd21 plastome has better supporting evidence than the NC_011032.1 assembly, as most mutations detected in our assembly have high read depth coverage and were also found in a large number of plastomes of the other studied B. distachyon accessions (see Table S6a). While most of these polymorphisms lie in intergenic regions, some were located in protein-coding genes such as psbA (1 synonymous (Syn) mutation), psbK (1 non-synonymous (NSyn) mutation), rpoC2 (1 Syn and 1 NSyn) and psaA (1 Syn), and also in one copy of the 16S rRNA locus. The B. distachyon plastomes showed the same arrangement and number of genes (133) as our freshly annotated Bd21 control (Bd21C) (Table S6a). In particular, they contained 76 protein-coding genes, 7 of which were duplicated, 20 non-redundant tRNAs (out of a total of 38), 4 rRNAs in both inverted repeats, 4 pseudogenes (trnI, rps12a, trnT and trnI) and 2 hypothetical open reading frames (ycf). Several polymorphisms, mostly non-synonymous, were detected in comparison to several grass plastomes. The regions with the highest number of polymorphisms were rpoC2 (70 SNPs), ndhF (59 SNPs), rpoB (31 SNPs) and matK (30 SNPs), suggesting a significant correlation between SNP frequency and gene length (Table S6b; R² = 0.68, p < 2.2e-16).
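The reported association between SNP count and gene length corresponds to a simple linear fit. As a hedged sketch, the coefficient of determination of such a fit can be computed in Python as follows; the function name is hypothetical, and the per-gene lengths needed to reproduce the reported value are in Table S6b, which is not reproduced here.

```python
def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a simple linear regression.

    x and y are equal-length sequences of numbers, for example gene lengths
    and SNP counts per gene. Both must show some variation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)
```

A value near 1 would mean that gene length alone explains most of the variation in SNP counts, which is the pattern expected if mutations accumulate roughly uniformly along the coding sequence.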

B. stacei and B. hybridum accessions showed the same overall plastid genomic features as B. distachyon, with two exceptions (Fig. 2). They both contained a 1,161 bp insertion between psaI and rbcL in the Long Single-Copy (LSC) region. This insertion was also detected in homologous regions of several grasses (Table S6c), confirmed by read mapping (Fig. S2a, b) and found to contain a CDS fragment annotated as pseudogene rpl23 (Table S6d). The B. stacei and B. hybridum plastomes also contained a deletion of an rps19 copy between psbA and trnH in the IRb repeat, which was confirmed through Sanger sequencing (Fig. S2c). The presence of these genomic rearrangements in the plastid genomes of the three B. hybridum accessions suggests that they were inherited from B. stacei-type maternal parents. Six polymorphisms were detected between the B. hybridum and B. stacei plastomes (Table S6e). These polymorphisms were located in intergenic regions, except for a Syn substitution in psbT (ecotype BdTR6G, B. hybridum) and a NSyn mutation in one copy of rpl23 in ABR113 (B. hybridum). A conceptual RNA-edited translation (U to C) was inferred in the B. hybridum and B. stacei ndhB gene, as well as in the ndhK gene of the B. distachyon Gaz8 ecotype.

Genealogy, haplotypic groups and diversity of B. distachyon plastomes

BEAST (Fig. 3a), ML (Fig. S3a) and BI (Fig. S3b) analyses detected two main diverging lineages within B. distachyon that were structured phenotypically (Fig. 3a – Plastome tree, Table S8). One of them corresponded to an Extremely Delayed Flowering (EDF+) clade, and the second to a Spanish-Turkish (S+T+) clade of remaining accessions, which showed a mixture of flowering phenotypes (Fig. 3a – Plastome tree, Table S8). The second clade was further substructured geographically into a paraphyletic Western group ("Spanish" group – S+), including almost all ecotypes from Spain, France and Italy, and a monophyletic Eastern group ("Turkish" group – T+), including ecotypes from Turkey and Iraq, plus two Spanish accessions (ABR3, Uni2). While the divergences of the main lineages and sublineages had high bootstrap support (BS) and posterior probability support (PPS), the support of some internal branches of the S+ group was low (Figs. 3a – Plastome tree, S3a, b).

Haplotypic network analyses detected 36 or 32 distinct cpDNA haplotypes, including or excluding indels, respectively (Table S7). A set of 298 nucleotide polymorphic sites extracted from the full B. distachyon plastome alignment confirmed the occurrence of 32 distinct cpDNA haplotypes; 6 haplotypes were shared by different accessions (H1: 13; H2: 2; H3: 3; H4: 4; H5: 3; H6: 2) and 26 haplotypes were unique (Table S7). The TCS analysis clustered the 32 haplotypes into six groups (Fig. 3b), matching the structure observed in the genealogical cpDNA tree (Fig. 3a – Plastome tree). The haplotypic network was fully resolved except for one internal loop. The EDF+ haplotypes were separated from the cluster of S+ group and T+ group haplotypes by 59 and 74 step mutations, respectively. Within the EDF+ group there were two highly isolated clusters separated by 57 steps, one including only Turkish accessions (BdTR7A, H3, H5) and the second including Turkish and eastern European accessions (H4, Bd11, Bd291). The isolated Spanish Arn1 + Mon3 accessions of the S+T+ group showed an internal loop connecting their haplotypes with those of the EDF+ group (70 steps) and those of the remaining accessions of the S+T+ group (61 steps). Within the core S+T+ group, the haplotypes clustered into four relatively close clusters, three of them including only accessions from the West (Spain, France and Italy), and the fourth cluster including mostly accessions from the East (Turkey, Iraq, plus Uni2 and ABR3) (Fig. 3b).

Plastome genomic diversity was variable within B. distachyon accessions (S = 298, h = 32, Hd = 0.933), and especially within the S+ (S = 137, h = 17, Hd = 0.993) and EDF+ (S = 107, h = 6, Hd = 0.846) groups (Table 1a). Our analyses indicated that the T+ group was less variable (S = 12, h = 9, Hd = 0.658) than the others. Diversity θπ values were not significantly different among groups. The S+ and T+ groups showed the lowest number of nucleotide differences (d = 33.970), reflecting their close genomic affinities. In contrast, the EDF+ group showed the highest nucleotide differences to any of them (EDF+ – S+, d = 112.632; EDF+ – T+, d = 112.790), though it also shared 6 polymorphisms with the S+ group (EDF+ – S+, shm = 6) (Table 1b).

When the B. distachyon plastome genealogy was compared to a SNP-based nuclear pan-genome genealogy generated in our parallel study (Fig. 3a – Nuclear tree, Gordon SP et al. under review), the plastome tree revealed eleven cases of potential chloroplast capture and introgression. Seven cases (BdTR11A, BdTR11I, BdTR11G, BdTR13A, BdTR13C, BdTR3C, Bis1) corresponded to nuclear T+ ecotypes nested within the plastid EDF+ clade, two cases (ABR3, Uni2) to nuclear S+ ecotypes nested within the plastid T+ group, and two cases (Arn1, Mon3) to introgressed nuclear EDF+ ecotypes nested (and introgressed) within the plastid S+T+ clade (Fig. 3). All these cases suggest the existence of gene flow between the most diverged B. distachyon lineages. The STRUCTURE search further confirmed the potential 'admixed' nature of the Arn1 and Mon3 plastomes. The Bayesian structure analysis selected two optimal plastome groups (best K=2) that corresponded to the EDF+ and S+T+ clades, with individual haplotypes showing high percentages of membership (>95%) to their respective groups, except the Arn1 and Mon3 haplotypes, which showed similar percentages (40–60%) to both groups (Fig. 3a – plastome structure; Table S9). The next optimal grouping was for K=4; in this partition EDF+, S+ and T+ haplotypes clustered separately and the Arn1 and Mon3 haplotypes formed an independent group (all memberships >95%). None of the recombination methods assayed in RDP4 and OrgConv detected significant recombination in our data set; however, visual inspection of the polymorphic data matrix detected potential micro-recombination events in Arn1 and Mon3 (Fig. S4). Both haplotypes showed a large part of their sequences (polymorphic positions 1 – 225) similar to S+T+ sequences, and a small part of them (polymorphic positions 226 – 230) similar to EDF+ sequences. Polymorphic positions 1 – 237, 238 – 245 and 246 – 298 were located in the LSC, IR and SSC regions, respectively (Figs. 2, S4).
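The comparison of plastid and nuclear group assignments described above amounts to flagging accessions whose two labels disagree. A minimal Python sketch follows; the function and the dictionary layout are hypothetical, the eleven discordant accessions and their groups are those listed in the text, and the single concordant entry is a made-up placeholder included only to show that matching labels are not flagged.

```python
def discordant_assignments(nuclear, plastid):
    """Return {accession: (nuclear_group, plastid_group)} where groups differ."""
    return {
        acc: (nuclear[acc], plastid[acc])
        for acc in nuclear
        if acc in plastid and nuclear[acc] != plastid[acc]
    }


# Nuclear T+ ecotypes nested within the plastid EDF+ clade (from the text).
t_in_edf = ["BdTR11A", "BdTR11I", "BdTR11G", "BdTR13A", "BdTR13C",
            "BdTR3C", "Bis1"]

nuclear_groups = {acc: "T+" for acc in t_in_edf}
plastid_groups = {acc: "EDF+" for acc in t_in_edf}

# Nuclear S+ ecotypes nested within the plastid T+ group.
for acc in ("ABR3", "Uni2"):
    nuclear_groups[acc] = "S+"
    plastid_groups[acc] = "T+"

# Nuclear EDF+ ecotypes nested within the plastid S+T+ clade.
for acc in ("Arn1", "Mon3"):
    nuclear_groups[acc] = "EDF+"
    plastid_groups[acc] = "S+T+"

# Placeholder accession, not from the study, with matching labels.
nuclear_groups["example_concordant"] = "T+"
plastid_groups["example_concordant"] = "T+"

candidates = discordant_assignments(nuclear_groups, plastid_groups)
```

Label disagreement only nominates candidates; distinguishing chloroplast capture from incomplete lineage sorting still requires the tree-based and recombination evidence discussed in the text.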

Phylogenomics and divergence time estimations of Poaceae and B. distachyon lineages

ML (Fig. S5a) and BI (Fig. S5b) phylogenomic analyses of the grass plastome data set (Table S2) placed the monophyletic Brachypodium lineage in an intermediate and strongly supported diverging position within the Pooideae clade. Brachypodium was resolved as sister to the recently evolved core pooid clade, whereas the close Diarrheneae (Diarrhena) lineage was sister to the Brachypodium + core clade. Relationships among successively diverging basal Pooideae (Brachyelytreae, Phaenospermatae, Meliceae, Stipeae), BEP (Bambusoideae, Ehrhartoideae) and PACMAD lineages were congruent with previous studies; most bifurcations in the topology showed strong BS and PPS values. Within Brachypodium, the B. stacei clade (formed by B. stacei and the stacei-like B. hybridum plastomes) was resolved as sister to the B. distachyon clade. The latter lineage showed the divergence of the strongly supported EDF+ and S+T+ clades (Figs. S5a, b).

Both plastome raw pairwise genetic distances and pairwise patristic (RAxML tree) distances (Table S10, Fig. 4) supported the intermediate evolutionary position of Brachypodium within the Pooideae clade (Fig. S5a, b). Moreover, Tamura-Nei (raw) genetic and patristic distances indicated a closer relationship of Brachypodieae to more ancestral basal pooid lineages, e.g., smaller genetic/patristic distances to Stipeae and Phaenospermatae than to recently evolved core pooid lineages (Triticodae, Poodae) (Table S10, Fig. 4). They also revealed its closest relatedness to its evolutionarily nearest relative Diarrheneae. Distances of Brachypodieae to some Poodae lineages (e.g., Loliinae, Anthoxanthiinae) were similar to those observed to less related (e.g., Bambusoideae, Ehrhartoideae (Rhynchoryza)), or even much less related (Puelia) lineages (Table S10, Fig. 4).

The BEAST cpDNA maximum clade credibility (MCC) tree yielded the same Poaceae topology (Figs. 5, S6a) as the ML and BI trees (Figs. S5a, b). The dating analysis inferred intermediate Early Oligocene divergence times for the stem nodes of the Diarrheneae (31.9 Mya) and Brachypodieae (30.9 Mya) lineages, and divergence ages ranging from the more ancestral Mid-Late Eocene splits of the basal pooids (Brachyelytreae, 44.2 Mya; Phaenospermatae, 38.4 Mya; Meliceae, 36.7 Mya; Stipeae, 35.3 Mya) to the recent Late Oligocene-Early Miocene splits of the core pooid lineages (crown, 27.8 Mya; Poodae, 23.9 Mya; Triticodae, 17.6 Mya). A Mid-Late Miocene age (10.1 Mya) was estimated for the B. stacei / B. distachyon split and a recent Mid-Pleistocene age (0.9 Mya) for the split of the most recent common ancestor (MRCA) of B. distachyon (Figs. 5, S6a). According to our nested dating analysis, intraspecific divergences within B. distachyon occurred very recently, during the last half million years (e.g., EDF+ and S+T+ splits, 0.55 Mya; Figs. 3a – Plastome tree, S6b).

DISCUSSION

The chloroplast genomes of Brachypodium

Our study allowed us to carry out the first large-scale intraspecific plastome analysis of a grass, for the model species B. distachyon, and a comparative genomic analysis with its close congeners B. stacei and B. hybridum (Fig. 2; Table S5). We detected two major rearrangements between the B. distachyon and B. stacei/B. hybridum plastomes (Fig. S2), and no structural changes but a total of 415 polymorphisms (298 without indels) among the 53 B. distachyon ecotypes (Tables S6a, b). A 1,161 bp insert and the deletion of one copy of the rps19 gene, found in both the B. stacei and B. hybridum ecotypes, indicate that the former is likely the maternal diploid plastome donor of the B. hybridum line used in this study. This is consistent with previous findings that B. stacei was the maternal progenitor of most, though not all, wild B. hybridum populations (López-Alvarez et al., 2012). The small number of polymorphisms (6) found in the B. hybridum plastome relative to the B. stacei plastome (Table S6e) indicates either that the B. hybridum plastome has remained almost intact since the formation of B. hybridum or that there has been continuous gene flow from B. stacei into B. hybridum (e.g., in Pleistocene-Holocene times, after the dated split of the B. distachyon parent; Figs. 3a, S6b).

The 1.1 kbp insert found in the B. stacei/B. hybridum plastomes contains an rpl23 pseudogene of 225 bp located around position 56,335 bp (Table S6c; Figs. 2, S2). The presence of an rpl23 pseudogene in this region has been reported in several monocots and in a large number of grasses, with insert sizes ranging from 40 to 243 bp (Morris & Duvall, 2010), whereas other authors have detected a functional rpl23 copy in Agrostis stolonifera (NC_008591) and Sorghum bicolor (NC_008602) (Saski et al., 2007). In this study, all the studied B. distachyon plastomes lack the insert and show two annotated functional rpl23 copies and no pseudogene, whereas the B. stacei/B. hybridum plastomes also have two functional rpl23 copies plus the rbcL-psaI insert rpl23 pseudogene (Table S6c, Fig. 2). In monocots the trnH-rps19 cluster that contains rps19 is located within the junctions of the LSC and the two inverted repeats (Borsch & Quandt, 2009 and references therein). Wang et al. (2008) described three types of IR-LSC junctions based on the organization of their flanking genes in several monocots and dicots. While the studied B. distachyon plastomes fit the type III class (two trnH-rps19 clusters present in both IRs), typical of monocots, the B. stacei/B. hybridum plastomes show a single rps19 copy near the functional LSC flanking gene rpl22 and lack the second rps19 copy, fitting best the type I junction model. The type I class is mostly found in basal angiosperms, Magnoliids and Eudicots (Wang et al., 2008). Thus the rbcL-psaI insert rpl23 pseudogene and the trnH-rps19 type I cluster constitute landmarks of the more ancestral B. stacei chloroplast genome.

Flowering time divergence, chloroplast capture and introgression in B. distachyon plastomes

Our genealogical and haplotypic network analyses detected a main split between two intraspecific B. distachyon lineages (EDF+ vs S+T+) that are not primarily connected with geography but with flowering time phenotypic traits, though the second clade is further separated into two geographically disjunct western (S+) and eastern (T+) circum-Mediterranean groups (Figs. 3a – Plastome tree, S3a, b, Table S8). Haplotypic divergence data confirm the isolation of the EDF+ clade from the S+ and T+ genomic groups and similar haplotypic diversity values of EDF+ and S+ (Table 1a, b). Intraspecific evolutionary studies of organisms tend to recover the spatio-temporal divergences of populations, which are usually associated with a geographical distribution, detecting a typical isolation-by-distance (IBD) pattern (Wright, 1943; Jenkins et al., 2010). However, long-distance dispersal events and biological and ecological traits have influenced the population structure in B. distachyon (Vogel et al., 2009; Mur et al., 2011; López-Alvarez et al., 2012; Tyler et al., 2016). Here, we have detected a strong influence of flowering time in the ancestral divergence of the B. distachyon EDF+ and S+T+ lineages, as several EDF+ lines (BdTR7A, BdTR8I, Tek2, Tek4) flower considerably later than the S+T+ lines (Fig. 3a – Plastome tree, Table S8). Our parallel nuclear pan-genome study of B. distachyon has also recovered a main EDF+ clade, including all the EDF lines of our plastome clade (Fig. 3a – Nuclear tree), and recent population genetic studies of B. distachyon based on GBS data (Tyler et al., 2016) have also found it. Thus, flowering time is a main biological factor controlling the divergence of the major annual B. distachyon clades since the late Pleistocene (0.9–0.55 Mya) (Figs. 3a – Plastome and Nuclear trees, S6b). Flowering time has been extensively studied in temperate cereals (barley, ) that have winter and spring races governed by vernalization and photoperiod requirements analogous to the delayed and rapid flowering phenotypes observed in B. distachyon (Vogel & Bragg, 2009; Schwartz et al., 2010; Colton-Gagnon et al., 2014; Ream et al., 2014; Woods et al., 2014). Our study highlights the evolutionary importance of flowering time in driving intraspecies divergence.

It could be expected that flowering time isolation would create a barrier to gene flow, which might ultimately lead to (micro)speciation (Silvertown et al., 2005; Lowry et al., 2008; Noirot et al., 2015). However, our study has shown that this is not the case in B. distachyon, where frequent introgressions have apparently occurred between the EDF+ and S+T+ clades during the last half million years (Figs. 3a, S6b). Topological comparison between the plastome and nuclear trees (Fig. 3a) indicated that seven Turkish accessions (BdTR11A, BdTR11I, BdTR11G, BdTR13A, BdTR13C, BdTR3C, Bis1) that are deeply and strongly nested within the eastern group of the S+T+ clade in the nuclear tree are, however, deeply and strongly nested within the eastern group of the EDF+ clade in the plastome tree and network. Similarly, two Spanish accessions (ABR3, Uni2) deeply nested within the western group of the S+T+ clade in the nuclear tree are instead nested within the eastern group of the S+T+ clade in the plastome tree, though with low support (Figs. 3a, b, S3a, b). Moreover, two Spanish accessions (Arn1, Mon3), which are part of the EDF+ clade in the nuclear tree, are nested within the S+T+ clade in the plastome tree and form a loop with an EDF+ subgroup in the plastome haplotypic network (Figs. 3a, b, S3a, b). Interestingly, genomic structure analyses indicated considerable introgression signals in the Arn1 and Mon3 nuclear and plastid genomes, whereas the seven Turkish accessions and the two Spanish accessions show no evidence of introgression involving the other genetic group in their chloroplast or nuclear genomes (Figs. 3a – plastome genomic structure, S4). These results support the occurrence of two different introgression events. An early introgression of an S+T+ Spanish lineage with a member of the EDF+ clade could have originated the admixed ancestor of the Arn1/Mon3 lineage, which kept most of its maternal S+T+ plastome but 2/3 of its paternal nuclear EDF+ genome over generations (Gordon SP et al., unpublished data). According to our dating analysis, this introgression likely occurred in Ionian-Upper Pleistocene times (0.55 – 0.02 Mya) (Figs. 3a, S6b). By contrast, more recent late Pleistocene-Holocene (0.025 – 0.007 Mya) introgressions between geographically close Turkish EDF+ and S+T+ lines likely resulted in the seven lines that show chloroplast capture, combining intact EDF+ plastomes with intact paternal nuclear S+T+ genomes, the latter probably originating through repeated backcrossing to paternal S+T+ individuals (Figs. 3a, S4, S6b). A similar late Pleistocene-Holocene scenario of introgressions and repeated backcrossing, though between geographically distant S+ and T+ lines, probably resulted in the two Spanish lines that show chloroplast capture, combining intact T+ maternal plastomes with paternal nuclear S+ genomes (Figs. 3a, S4). These observations support previous evidence of long-distance dispersal of eastern B. distachyon seeds to the West across the Mediterranean basin (cf. López-Álvarez et al. 2012, 2015). Additionally, Uni2 shows a significantly smaller inbreeding coefficient (Fis = 0.48) than the remaining highly selfed B. distachyon accessions (median Fis = 0.88) (Gordon SP et al., unpublished data), suggesting that the reduced Fis might reflect recent interpopulation crosses.
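To give a sense of scale for these inbreeding coefficients, the sketch below converts Fis into an approximate selfing rate. It assumes the standard equilibrium relationship under mixed mating, F = s / (2 - s), so that s = 2F / (1 + F). This conversion is an assumption introduced here for illustration and is not an analysis reported in this study; the function name is hypothetical.

```python
def selfing_rate_from_fis(fis):
    """Equilibrium selfing rate s implied by F = s / (2 - s), i.e. s = 2F / (1 + F)."""
    if not 0.0 <= fis <= 1.0:
        raise ValueError("this conversion is only applied here for 0 <= Fis <= 1")
    return 2.0 * fis / (1.0 + fis)


if __name__ == "__main__":
    # Fis values quoted in the text: Uni2 and the median of the other accessions.
    for label, fis in [("Uni2", 0.48), ("median of remaining accessions", 0.88)]:
        s = selfing_rate_from_fis(fis)
        print(f"{label}: Fis = {fis:.2f} -> approximate selfing rate = {s:.2f}")
```

Under that assumption, Fis = 0.88 would correspond to roughly 94% selfing and Fis = 0.48 to roughly 65%, which is in line with the interpretation that Uni2 has experienced more outcrossing than the other accessions.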

Our analyses also point towards the potential existence of heteroplasmic recombination in the Arn1 and Mon3 plastomes (Fig. 3a – plastome structure; Table S9). Also, visual inspection of the polymorphic data matrix identified a large proportion of their plastomes as S+T+-type and a smaller proportion of them (e.g., micro-recombinations) as EDF+-type (Fig. S4). Natural chloroplast heteroplasmy originating from biparentally inherited chloroplasts is a rare event in angiosperms, where plastid inheritance is considered to be mostly maternal (Jansen & Ruhlman, 2012). However, evidence of cpDNA biparental inheritance and of cpDNA introgression has been documented in some flowering plants (Mason et al. 1994, 1995; Mogensen 1996), and frequent heteroplasmy and potential inter- or intraspecific recombination have been detected in the plastomes of the highly hybridogenous genus Citrus (Carbonell-Caballero et al., 2015). Also, interspecific chloroplast recombination was observed after somatic cell fusion in Nicotiana (Medgyesy et al., 1985). Our study reports the first case of potential intraspecific recombination between different plastome types in these two introgressed B. distachyon accessions.
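The kind of site-by-site inspection described above can be sketched as follows: at each polymorphic position where two reference groups carry different consensus alleles, a query plastome is labelled according to the group it matches. The sequences, group labels and function names below are hypothetical toy data, and this simple matching is only an illustration, not the procedure used to produce Fig. S4 or Table S9.

```python
from collections import Counter


def consensus(seqs):
    """Majority allele at each column of equally long aligned sequences."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*seqs)]


def classify_sites(query, group_a, group_b, label_a="S+T+", label_b="EDF+"):
    """Label each diagnostic site of the query by the group consensus it matches."""
    cons_a, cons_b = consensus(group_a), consensus(group_b)
    labels = []
    for q, a, b in zip(query, cons_a, cons_b):
        if a == b:
            continue  # not diagnostic: both groups share the same consensus allele
        labels.append(label_a if q == a else label_b if q == b else "other")
    return labels


# Toy alignment restricted to polymorphic columns (invented data).
st_group = ["AACGTTAC", "AACGTTAC", "AACGTTGC"]
edf_group = ["GGTGCAAT", "GGTGCAAT", "GGTGCAAC"]
query = "AACGCAAC"

labels = classify_sites(query, st_group, edf_group)
counts = Counter(labels)

if __name__ == "__main__":
    print(labels)
    print(dict(counts))
```

A short run of sites matching the minority group inside an otherwise uniform background is the pattern one would inspect further as a candidate recombinant tract; a dedicated recombination test would still be needed to support it.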


Evolutionary placement of a model genus for both temperate and tropical grasses

The phylogenomic analysis of 145 grass plastomes allowed us to infer the phylogenetic placement of Brachypodium and to calculate its genetic and patristic distances to other grass lineages (Table S10; Figs. 4, 5, S5a, b, S6a). The intermediate nesting of Brachypodium within the Pooideae clade and the relationships of the other Poaceae lineages agree with previous studies based on nuclear or plastid genes (Bouchenak-Khelladi et al., 2008; Schneider et al., 2011; Hochbach et al., 2015; Soreng et al., 2015) or whole plastome sequences (Saarela et al., 2015). The sister but non-inclusive relationship of Brachypodium to the core pooid clade [Triticodae (+Bromeae)/Poodae (Poeae+Aveneae)], originally proposed by Davis and Soreng (1993), was abandoned in some recent analyses in favor of the inclusion of Brachypodium within the ‘core pooids’, a non-taxonomic but independently evolved natural group (Davis & Soreng, 2007; Saarela et al., 2015; Soreng et al., 2015). Our ML and BI analyses support the sister relationship proposed by Davis and Soreng (1993) (Figs. S5a, b) as well as divergence times intermediate between those of the basal ancestral pooids and the recently evolved core pooids (Figs. 5, S6a). Our pairwise cpDNA genetic and patristic distances have further confirmed that Brachypodium is closer to some basal pooid lineages than to the core pooid lineages (Table S10; Fig. 4), corroborating similar results based on nuclear single-copy genes (Minaya et al., 2015), and that it is about as close to some core pooid groups as to the more distant Ehrhartoideae and Puelioideae lineages. The evolutionary placement of Brachypodium in the Poaceae supports its utility as a model system for the monocots, as has recently been shown in functional genomic studies of the regulation of vernalization and flowering time. B. distachyon shows both seasonal flowering response mechanisms close to those of core pooid grasses adapted to cold and temperate climates (Fjellheim et al., 2014) and new flowering-repressor vernalization genes shared with basal pooids, other tropical and subtropical grasses and the less related Musaceae and Arecaceae (Woods et al., 2016). The isolated and ‘bridging’ intermediate position of Brachypodium within the grasses supports its value as a model genus for all types of grasses, particularly for bioenergy crops from different grass subfamilies (e.g., Miscanthus, Paspalum (Panicoideae), Thinopyrum (Pooideae)).

Our estimated divergence times for the main Poaceae lineages (Ehrhartoideae, 52 Mya; Bambusoideae, 49 Mya; Pooideae, 44 Mya) (Figs. 5, S6a) are in agreement with those calculated by Bouchenak-Khelladi et al. (2010), Christin et al. (2014) and Vicentini et al. (2008) but slightly older than those estimated by Wu and Ge (2012). Our results support early Oligocene (32 Mya) and late Miocene (10 Mya) splits for the respective stem and crown Brachypodium nodes, which are also slightly older than those calculated by Catalán et al. (2012), though the highest posterior density (HPD) intervals overlap in both studies. The relatively old divergence inferred for the annual B. stacei and B. distachyon lineages in the late Miocene contrasts with the very recent burst of the intraspecific B. distachyon lineages. The estimated time of the late radiation (0.9 Mya) is in agreement with the estimated age of B. hybridum (~1 Mya; cf. Catalán et al. 2012), the allotetraploid derivative of crosses between B. stacei and B. distachyon. Thus the two complementary dating analyses fit a Mid-Pleistocene scenario for the almost contemporary origins of both parent and hybrid species.

CONCLUSION

Our comparative genomic study of whole plastome sequences of B. distachyon and its close relatives allowed us to detect intraspecific introgressions and other associated evolutionary events (e.g., biparental plastome inheritance, heteroplasmy) that could not be detected with single genes. The observed plastome admixture that accompanies the nuclear genome admixture in the B. distachyon Arn1 and Mon3 lines, and the essential swapping of plastomes among the three different B. distachyon plastome groups (EDF+, S+, T+), likely resulted from random backcrossing, followed by stabilization through selection pressure. The chloroplast genome of B. distachyon is much more constrained than its nuclear genome, since we do not observe variation in the plastome genes.

ACKNOWLEDGEMENTS

We thank Drs. Daniel Woods and Weilon Hao for fruitful discussions about flowering time features of the studied B. distachyon accessions and potential plastome recombination events, respectively. PC, BCM, RS and DLA received funding from the Spanish Ministry of Economy and Competitivity (Mineco) grant projects (CGL2012-39953-C02-01, CSIC134E2490 and CGL2016-79790-P). BC was funded by Fundación ARAID. RS and DLA were funded by their respective Spanish Mineco PhD fellowships. PC, RS and DLA were partially funded by a Bioflora grant co-funded by the Spanish Aragon Government and the European Social Fund. The work conducted by the US DOE Joint Genome Institute is supported by the Office of Science of the US Department of Energy under Contract no. DE-AC02-05CH11231.

AUTHOR CONTRIBUTIONS

BCM and PC designed the experiment; SG and JV collected the data; RS, CPC, DLA, BCM and PC performed the analyses; RS, CPC, BCM and PC wrote the paper; SG and JV contributed to the writing. All authors read, commented on and approved the paper. The authors declare no conflict of interest.

REFERENCES

Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. 2011. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27: 578–579.

Boetzer M, Pirovano W. 2012. Toward almost closed genomes with GapFiller. Genome Biology 13: R56.

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.

Borsch T, Quandt D. 2009. Mutational dynamics and phylogenetic utility of noncoding chloroplast DNA. Plant Systematics and Evolution 282: 169–199.

Bortiri E, Coleman-Derr D, Lazo GR, Anderson OD, Gu YQ. 2008. The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes. BMC Research Notes 1: 61.

Bouchenak-Khelladi Y, Salamin N, Savolainen V, Forest F, Bank M v d, Chase MW, Hodkinson TR. 2008. Large multi-gene phylogenetic trees of the grasses (Poaceae): Progress towards complete tribal and generic level sampling. Molecular Phylogenetics and Evolution 47: 488–505.

Bouchenak-Khelladi Y, Verboom GA, Savolainen V, Hodkinson TR. 2010. Biogeography of the grasses (Poaceae): A phylogenetic approach to reveal evolutionary history in geographical space and geological time. Botanical Journal of the Linnean Society 162: 543–557.

Bremer K. 2002. Gondwanan Evolution of the Grass Alliance of Families (). Evolution 56: 1374–1387.

Byars SG, Parsons Y, Hoffmann AA. 2009. Effect of altitude on the genetic structure of an Alpine grass , hiemata. Annals of Botany 103: 885–899.

Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25: 1972–1973.

Carbonell-Caballero J, Alonso R, Ibañez V, Terol J, Talon M, Dopazo J. 2015. A phylogenetic analysis of 34 chloroplast genomes elucidates the relationships between wild and domestic species within the genus Citrus. Molecular Biology and Evolution 32: msv082.

Catalán P, Chalhoub B, Chochois V, Garvin DF, Hasterok R, Manzaneda AJ, Mur LAJ, Pecchioni N, Rasmussen SK, Vogel JP, et al. 2014. Update on the genomics and basic biology of Brachypodium. International Brachypodium Initiative (IBI). Trends in Plant Science 19: 414–418.

Catalán P, Kellogg EA, Olmstead RG. 1997. Phylogeny of Poaceae subfamily Pooideae based on chloroplast ndhF gene sequences. Molecular Phylogenetics and Evolution 8: 150–166.

Catalán P, López-Álvarez D, Bellosta C, Villar L. 2016. Updated taxonomic descriptions, iconography, and habitat preferences of Brachypodium distachyon, B. stacei, and B. hybridum (Poaceae). 73: 1–14.

Catalán P, López-Alvarez D, Díaz-Pérez A, Sancho R, López-Herránz ML. 2015. Phylogeny and Evolution of the Genus Brachypodium. In: Vogel JP, ed. Plant Genetics and Genomics: Crops Models. 9–38.

Catalán P, Müller J, Hasterok R, Jenkins G, Mur LAJ, Langdon T, Betekhtin A, Siwinska D, Pimentel M, López-Alvarez D. 2012. Evolution and taxonomic split of the model grass Brachypodium distachyon. Annals of Botany 109: 385–405.

Catalán P, Olmstead RG. 2000. Phylogenetic reconstruction of the genus Brachypodium P. Beauv. (Poaceae) from combined sequences of chloroplast ndhF gene and nuclear ITS. Plant Systematics and Evolution 220: 1–19.

Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. Journal of Molecular Evolution 58: 424–441.

Christin PA, Spriggs E, Osborne CP, Strömberg CAE, Salamin N, Edwards EJ. 2014. Molecular dating, evolutionary rates, and the age of the grasses. Systematic Biology 63: 153–165.

Clement M, Posada D, Crandall KA. 2000. TCS: a computer program to estimate gene genealogies. Molecular Ecology 9: 1657–1660.

Colton-Gagnon K, Ali-Benali MA, Mayer BF, Dionne R, Bertrand A, Do Carmo S, Charron JB. 2014. Comparative analysis of the cold acclimation and freezing tolerance capacities of seven diploid Brachypodium distachyon accessions. Annals of Botany 113: 681–693.

Darriba D, Taboada GL, Doallo R, Posada D. 2012. jModelTest 2: more models, new heuristics and parallel computing. Nature Methods 9: 772.

Davis JI, Soreng RJ. 1993. Phylogenetic structure in the grass family (Poaceae) as inferred from chloroplast DNA restriction site variation. American Journal of Botany 80: 1444–1454.

Davis JI, Soreng RJ. 2007. A preliminary phylogenetic analysis of the grass subfamily Pooideae (Poaceae), with attention to structural features of the plastid and nuclear genomes, including an intron loss in GBSSI. Aliso: A Journal of Systematics and Evolutionary Botany 23: 335–348.

Dinh Thi VH, Coriton O, Le Clainche I, Arnaud D, Gordon SP, Linc G, Catalan P, Hasterok R, Vogel JP, Jahier J, Chalhoub B. 2016. Creating synthetic Brachypodium hybridum by Uniting the Divergent Genomes of B. distachyon and B. stacei. PLoS ONE (in press).

Döring E, Schneider J, Hilu KW, Röser M. 2007. Phylogenetic relationships in the Aveneae/Poeae complex (Pooideae, Poaceae). Kew Bulletin 62: 407–424.

Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29: 1969–1973.

Earl DA, vonHoldt BM. 2012. STRUCTURE HARVESTER: A website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources 4: 359–361.


Evanno G, Regnaut S, Goudet J. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: A simulation study. Molecular Ecology 14: 2611–2620.

Filiz E, Ozdemir BS, Budak F, Vogel JP, Tuna M, Budak H. 2009. Molecular, morphological, and cytological analysis of diverse Brachypodium distachyon inbred lines. Genome 52: 876–890.

Fjellheim S, Boden S, Trevaskis B. 2014. The role of seasonal flowering responses in adaptation of grasses to temperate climates. 5: 1–15.

Garvin DF. 2007. Brachypodium distachyon: A New Model System for Structural and Functional Analysis of Grass Genomes. In: Varshney RK, Koebner RMD, eds. Model Plants and Crop Improvement. 109–123.

Garvin DF, Gu YQ, Hasterok R, Hazen SP, Jenkins G, Mockler TC, Mur LAJ, Vogel JP. 2008. Development of genetic and genomic research resources for Brachypodium distachyon, a new model system for grass crop research. Crop Science: S69–S84.

Gordon SP, Priest H, Des Marais DL, Schackwitz W, Figueroa M, Martin J, Bragg JN, Tyler L, Lee CR, Bryant D, et al. 2014. Genome diversity in Brachypodium distachyon: Deep sequencing of highly diverse inbred lines. The Plant Journal 79: 361–374.

Guindon S, Gascuel O. 2003. A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood. Systematic Biology 52: 696–704.

Hochbach A, Schneider J, Röser M. 2015. A multi-locus analysis of phylogenetic relationships within grass subfamily Pooideae (Poaceae) inferred from sequences of nuclear single copy gene regions compared with plastid DNA. Molecular Phylogenetics and Evolution 87: 14–27.

Jansen RK, Ruhlman TA. 2012. Plastid genomes of seed plants. In: Bock R, Knoop V, eds. Genomics of chloroplast and mitochondria. Springer, 103–126.

Jenkins DG, Carey M, Czerniewska J, Fletcher J, Hether T, Jones A, Knight S, Knox J, Long T, Mannino M, et al. 2010. A meta-analysis of isolation by distance: relic or reference standard for landscape genetics? Ecography 33: 315–320.

Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution 30: 772–780.

Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, et al. 2012. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28: 1647–1649.

Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. 2009. Circos: An information aesthetic for comparative genomics. Genome Research 19: 1639–1645.

Kumar S, Stecher G, Tamura K. 2016. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Molecular Biology and Evolution.

Librado P, Rozas J. 2009. DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25: 1451–1452.

Liu Y, Schröder J, Schmidt B. 2013. Musket: A multistage k-mer spectrum-based error corrector for Illumina sequence data. Bioinformatics 29: 308–315.

Lohse M, Drechsel O, Kahlau S, Bock R. 2013. OrganellarGenomeDRAW - a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Research 41: 575–581.

López-Alvarez D, López-Herranz ML, Betekhtin A, Catalán P. 2012. A DNA Barcoding Method to Discriminate between the Model Plant Brachypodium distachyon and Its Close Relatives B. stacei and B. hybridum (Poaceae). PLoS ONE 7: e51058.

López-Alvarez D, Manzaneda AJ, Rey PJ, Giraldo P, Benavente E, Allainguillaume J, Mur L, Caicedo AL, Hazen SP, Breiman A, et al. 2015. Environmental niche variation and evolutionary diversification of the Brachypodium distachyon grass complex species in their native circum-Mediterranean range. American Journal of Botany 102: 1073–1088.

Lowry DB, Modliszewski JL, Wright KM, Wu CA, Willis JH. 2008. The strength and genetic basis of reproductive isolating barriers in flowering plants. Philosophical Transactions of the Royal Society 363: 3009–3021.

Ma PF, Zhang YX, Zeng CX, Guo ZH, Li DZ. 2014. Chloroplast phylogenomic analyses resolve deep-level relationships of an intractable tribe (Poaceae). Systematic Biology 63: 933–950.

Mairal M, Pokorny L, Aldasoro JJ, Alarcón ML, Sanmartín I. 2015. Ancient vicariance and climate-driven extinction explain continental-wide disjunctions in : the case of the Rand Flora genus Canarina (Campanulaceae). Molecular Ecology 24: 1335–1354.


Martin DP, Murrell B, Golden M, Khoosal A, Muhire B. 2015. RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evolution 1: 1–5.

Medgyesy P, Fejes E, Maliga P. 1985. Interspecific chloroplast recombination in a Nicotiana somatic hybrid. Proceedings of the National Academy of Sciences of the United States of America 82: 6960–6964.

Middleton CP, Senerchia N, Stein N, Akhunov ED, Keller B, Wicker T, Kilian B. 2014. Sequencing of chloroplast genomes from wheat, barley, rye and their relatives provides a detailed insight into the evolution of the Triticeae tribe. PLoS ONE 9: e85761.

Minaya M, Díaz-Pérez A, Mason-Gamer R, Pimentel M, Catalán P. 2015. Evolution of the beta-amylase gene in the temperate grasses: Non-purifying selection, recombination, semiparalogy, homeology and phylogenetic signal. Molecular Phylogenetics and Evolution 91: 68–85.

Morris LM, Duvall MR. 2010. The chloroplast genome of Anomochloa marantoidea (; Poaceae) comprises a mixture of grass-like and unique features. American Journal of Botany 97: 620–627.

Mur LAJ, Allainguillaume J, Catalán P, Hasterok R, Jenkins G, Lesniewska K, Thomas I, Vogel J. 2011. Exploiting the brachypodium tool box in and grass research. New Phytologist 191: 334–347.

Nadalin F, Vezzi F, Policriti A. 2012. GapFiller: a de novo assembly approach to fill the gap within paired reads. BMC Bioinformatics 13: S8.

Nock CJ, Waters DLE, Edwards MA, Bowen SG, Rice N, Cordeiro GM, Henry RJ. 2011. Chloroplast genome sequences from total DNA for plant identification. Plant Biotechnology Journal 9: 328–333.

Noirot M, Charrier A, Stoffelen P, Anthony F. 2015. Reproductive isolation, gene flow and speciation in the former Coffea subgenus: a review. Trees. Structure and Function: 1–12.

Peterson DG, Tomkins JP, Frisch DA, Wing RA, Paterson AH. 2000. Construction of plant bacterial artificial chromosome (BAC) libraries: An illustrated guide. Journal of Agricultural Genomics 5: 1–100.

Pokorny L, Oliván G, Shaw AJ. 2011. Phylogeographic Patterns in Two Southern Hemisphere Species of Calyptrochaeta (Daltoniaceae, Bryophyta). Systematic Botany 36: 542–553.


Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155: 945–959.

Ream TS, Woods DP, Schwartz CJ, Sanabria CP, Mahoy JA, Walters EM, Kaeppler HF, Amasino RM. 2014. Interaction of photoperiod and vernalization determines flowering time of Brachypodium distachyon. Plant Physiology 164: 694–709.

Ronen R, Boucher C, Chitsaz H, Pevzner P. 2012. sEQuel: Improving the accuracy of genome assemblies. Bioinformatics 28: 188–196.

Ronquist F, Huelsenbeck JP. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.

Ronquist F, Huelsenbeck J, Teslenko M. 2011. MrBayes Version 3.2 Manual: Tutorials and Model Summaries. 1–103.

Saarela JM, Wysocki WP, Barrett CF, Soreng RJ, Davis JI, Clark LG, Kelchner SA, Pires JC, Edger PP, Mayfield DR, et al. 2015. Plastid phylogenomics of the cool-season grass subfamily: Clarification of relationships among early-diverging tribes. AoB PLANTS 7: 1–27.

Saski C, Siri SL, Chittibabu F, Jansen RK, Luo H, Je V, Odd T, Rognli A, Daniell H, Liu J. 2007. Complete chloroplast genome sequences of Hordeum vulgare, Sorghum bicolor and Agrostis stolonifera, and comparative analyses with other grass genomes. Theoretical and Applied Genetics 115: 571–590.

Schippmann U. 1991. Revision der europäischen Arten der Gattung Brachypodium Palisot de Beauvois (Poaceae). Boissiera 45.

Schneider J, Winterfeld G, Hoffmann MH, Röser M. 2011. Duthieeae, a new tribe of grasses (Poaceae) identified among the early diverging lineages of subfamily Pooideae: molecular phylogenetics, morphological delineation, cytogenetics, and biogeography. Systematics and Biodiversity 9: 27–44.

Schwartz CJ, Doyle MR, Manzaneda AJ, Rey PJ, Mitchell-Olds T, Amasino RM. 2010. Natural variation of flowering time and vernalization responsiveness in Brachypodium distachyon. Bioenergy Research 3: 38–46.

Silvertown J, Servaes C, Biss P, Macleod D. 2005. Reinforcement of reproductive isolation between adjacent populations in the Park Grass Experiment. Heredity 95: 198–205.

Soreng RJ, Peterson PM, Romaschenko K, Davidse G, Zuloaga FO, Judziewicz EJ, Filgueiras TS, Davis JI, Morrone O. 2015. A worldwide phylogenetic classification of the Poaceae (Gramineae). Journal of Systematics and Evolution 53: 117–137.

Stamatakis A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312–1313.

Van Strien MJ, Holderegger R, Van Heck HJ. 2014. Isolation-by-distance in landscapes: considerations for landscape genetics. Heredity 114: 27–37.

Thorvaldsdóttir H, Robinson JT, Mesirov JP. 2013. Integrative Genomics Viewer (IGV): High-performance genomics data visualization and exploration. Briefings in Bioinformatics 14: 178–192.

Tyler AL, Lee SJ, Young ND, Deiulio GA, Benavente E, Reagon M, Sysopha J, Baldini RM, Troìa A, Hazen SP, et al. 2016. Population structure in the model grass Brachypodium distachyon is highly correlated with flowering differences across broad geographic areas. The Plant Genome: 1–55.

Vicentini A, Barber JC, Aliscioni SS, Giussani LM, Kellogg EA. 2008. The age of the grasses and clusters of origins of C4 photosynthesis. Global Change Biology 14: 2963–2977.

Vogel JP. 2016. The Rise of Brachypodium as a Model System. In: Vogel JP, ed. Genetics and genomics of Brachypodium. Springer, 1–8.

Vogel J, Bragg J. 2009. Brachypodium distachyon, a New Model for the Triticeae. In: Muehlbauer GJ, Feuillet C, eds. 427–449.

Vogel JP, Garvin DF, Leong OM, Hayden DM. 2006. Agrobacterium-mediated transformation and inbred line development in the model grass Brachypodium distachyon. Plant Cell, Tissue and Organ Culture 84: 199–211.

Vogel JP, Garvin DF, Mockler TC, Schmutz J, Rokhsar D, Bevan MW, Barry K, Lucas S, Harmon-Smith M, Lail K, et al. 2010. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463: 763–768.

Vogel J, Hill T. 2008. High-efficiency Agrobacterium-mediated transformation of Brachypodium distachyon inbred line Bd21-3. Plant Cell Reports 27: 471–478.

Vogel JP, Tuna M, Budak H, Huo N, Gu YQ, Steinwand MA. 2009. Development of SSR markers and analysis of diversity in Turkish populations of Brachypodium distachyon. BMC Plant Biology 9: 88.

Wang R-J, Cheng C-L, Chang C-C, Wu C-L, Su T-M, Chaw S-M. 2008. Dynamics and evolution of the inverted repeat-large single copy junctions in the chloroplast genomes of monocots. BMC Evolutionary Biology 8: 36.

Waters DLE, Nock CJ, Ishikawa R, Rice N, Henry RJ. 2012. Chloroplast genome sequence confirms distinctness of Australian and Asian wild rice. Ecology and Evolution 2: 211–217.

Woods DP, McKeown MA, Dong Y, Preston JC, Amasino RM. 2016. Evolution of VRN2/Ghd7-Like Genes in Vernalization-Mediated Repression of Grass Flowering. Plant Physiology 170: 1–12.

Woods DP, Ream TS, Amasino RM. 2014. Memory of the vernalized state in plants including the model grass Brachypodium distachyon. Frontiers in Plant Science 5: 99.

Wright S. 1943. Isolation by distance. Genetics 28: 114–138.

Wu ZQ, Ge S. 2012. The phylogeny of the BEP clade in grasses revisited: Evidence from the whole-genome sequences of chloroplasts. Molecular Phylogenetics and Evolution 62: 573–578.

Wysocki WP, Clark LG, Attigala L, Ruiz-Sanchez E, Duvall MR. 2015. Evolution of the bamboos (Bambusoideae; Poaceae): a full plastome phylogenomic analysis. BMC Evolutionary Biology 15: 50.

Zerbino DR. 2010. Using the Columbus extension to Velvet.

TABLES

Table 1. (a). Chloroplast haplotype diversity analysis of B. distachyon ecotypes and genomic groups (EDF+, S+, T+). Group size and chloroplast haplotype diversity parameters. (b). Pairwise estimates of the number of shared mutations (shm, above diagonal) and the average number of nucleotide differences (d, below diagonal) between genomic groups.

(a)

Genomic groups | N | S | h | Hd | θπ
EDF+ | 13 | 107 | 6 | 0.846 | 12.780 (3.872 – 31.128)
S+ | 18 | 137 | 17 | 0.993 | 12.388 (3.804 – 30.837)
T+ | 22 | 12 | 9 | 0.658 | 12.683 (3.784 – 28.087)
B. distachyon (all ecotypes) | 53 | 298 | 32 | 0.933 | 12.442 (4.218 – 28.245)

(b)

d (below) / shm (above) | EDF+ | S+ | T+
EDF+ | --- | 6 | 0
S+ | 112.632 | --- | 0
T+ | 112.790 | 33.970 | ---
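The diversity parameters reported in Table 1 can be illustrated with a minimal Python sketch. It assumes the standard definition of haplotype diversity, Hd = n/(n-1) × (1 − Σ p_i²), and treats every alignment column with more than one state as a segregating site; the function names and the toy alignment are hypothetical, missing data (N characters) are not handled, and this is not necessarily how the software used in the study computes these values.

```python
from collections import Counter
from itertools import combinations


def hd_from_counts(counts):
    """Haplotype diversity Hd = n/(n-1) * (1 - sum(p_i^2)) from haplotype counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two sequences are required")
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_p2)


def summarize(seqs):
    """Return (N, S, h, Hd, mean pairwise differences) for aligned, equal-length sequences."""
    n = len(seqs)
    # S: number of alignment columns showing more than one state
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # h: number of distinct haplotypes; Hd from their counts
    haplotypes = Counter(seqs)
    hd = hd_from_counts(list(haplotypes.values()))
    # mean number of differing positions over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return n, s, len(haplotypes), hd, mean_diff


# Hypothetical toy alignment (not data from this study).
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(summarize(toy))

# With N = 18 sequences and h = 17 haplotypes (S+ row of Table 1), exactly one
# haplotype must be shared by two sequences and the other 16 are singletons.
print(round(hd_from_counts([2] + [1] * 16), 3))  # 0.993, matching the Hd reported for S+
```

The last line serves as a consistency check: the only haplotype configuration compatible with N = 18 and h = 17 reproduces the Hd value listed for the S+ group.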

FIGURE LEGENDS

Figure 1. Native circum-Mediterranean geographic distributions of the B. distachyon, B. hybridum and B. stacei ecotypes used in the plastome evolutionary and genomic analyses. Symbol and color codes for accessions are indicated in the chart. Accession numbers correspond to those indicated in Table S1.

Figure 2. Plastome maps of B. distachyon ABR6 (inner circle) and B. stacei ABR114 (outer circle). A 1,161-bp insertion is shown in the B. stacei map (∆, see upper-left quadrant), as well as a deletion of the rps19 locus (*, see lower-right quadrant). Smaller inner circles and tracks correspond respectively to a map of plastome regions (LSC, SSC, IRA and IRB), a histogram of observed SNPs across all 57 aligned plastomes, and a histogram of undetermined nucleotides, marked as N characters in the alignments.

Figure 3. Intraspecific evolutionary analysis of B. distachyon plastomes, including dated plastome genealogy, haplotypic network and genomic structure plots compared against the B. distachyon nuclear genealogical tree.
(a). BEAST nested dated chronogram of 53 B. distachyon plastomes showing estimated divergence times for below-species level lineages. Datings (Mya) were inferred from calibrations obtained from above-species level estimations (left). Genomic structure plots showing percentages of membership of plastomes’ profiles to K=2 and K=4 genomic groups. Thickness of branches indicates posterior probability support (thick, 0.95–1; intermediate, 0.90–0.94; thin, <0.90) (center). Chloroplast capture and introgression events detected through topological contrast of the plastome and the nuclear trees (nDNA tree from Gordon SP et al. unpublished data) (right). Discontinuous and continuous lines mark potential chloroplast capture events and introgression events, respectively. Color codes for flowering time class groups and phylogenetic groups are indicated in the respective charts. Flowering time class groups are classified according to Ream et al. (2014) (see Table S8).
(b). Haplotypic statistical parsimony network constructed with the B. distachyon plastomes using TCS. Dots represent mutation steps; numbers of mutation steps are indicated on branches. Color codes for clusters are indicated in the chart.

Figure 4. Color-coded matrices of pairwise Tamura-Nei (TN) genetic distances between the plastome sequences of 99 Poaceae species and 3 Brachypodium (B. distachyon, B. stacei, B. hybridum) species. Below diagonal: pairwise raw TN genetic distances; above diagonal: pairwise patristic TN genetic distances (computed on the RAxML tree, see Fig. S5a). Color-associated distance values are indicated in the chart.

Figure 5. BEAST nested dated chronogram of 93 grass plastomes showing estimated divergence times and posterior probability support values for above-species level lineages. Stars indicate nodal calibration priors (ages) for the Poaceae and BEP+PACMAD clades. Thickness of branches indicates posterior probability support (thick, 0.95–1; intermediate, 0.90–0.94; thin, <0.90).
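The legend of Figure 4 distinguishes raw pairwise distances from patristic distances computed on a tree. As a minimal sketch of the latter, assuming the usual definition (the sum of branch lengths along the path connecting two tips), the following Python example computes patristic distances on a small rooted tree. The tree, its node names and its branch lengths are hypothetical and are not taken from the RAxML tree of this study.

```python
def path_to_root(tree, node):
    """Map each ancestor of `node` (and `node` itself) to its cumulative distance from `node`."""
    distances = {node: 0.0}
    total = 0.0
    while node in tree:  # the root has no entry in `tree`
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def patristic_distance(tree, tip_a, tip_b):
    """Sum of branch lengths on the path between two tips of a rooted tree."""
    up_from_a = path_to_root(tree, tip_a)
    total = 0.0
    node = tip_b
    # climb from tip_b until reaching a node that is also an ancestor of tip_a
    while node not in up_from_a:
        parent, length = tree[node]
        total += length
        node = parent
    return total + up_from_a[node]


# Hypothetical toy tree: child -> (parent, branch length).
toy_tree = {
    "tipA": ("inner", 0.2),
    "tipB": ("inner", 0.3),
    "inner": ("root", 0.1),
    "tipC": ("root", 0.5),
}

print(patristic_distance(toy_tree, "tipA", "tipB"))  # path tipA -> inner -> tipB
print(patristic_distance(toy_tree, "tipA", "tipC"))  # path tipA -> inner -> root -> tipC
```

Unlike a raw pairwise distance, which is estimated from two aligned sequences alone, the patristic value depends on the branch lengths fitted for the whole tree, which is why the two halves of the matrix can differ.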

SUPPLEMENTARY TABLES

Table S1. List of B. distachyon, B. stacei and B. hybridum accessions studied. Origin of samples: ABR1–ABR7 (Brachyomics collections (C. Stace & P. Catalán), Aberystwyth, UK); BdTR_ accessions (Filiz et al. 2009); Bd1-1, Bd2-3, Bd3-1, Bd18-1, Bd21 (Vogel et al. 2006); Bd21-3 (Vogel & Hill, 2008); Adi_, Gaz_, Tek_, Bis_, Kah_, Koz_ (Vogel et al., 2009); Foz1, Mig3, Mon3, Mur1, Uni2 (Mur et al., 2011). * BdTR6G cited as ‘B. distachyon’ in GRIN-Global; Filiz et al. (2009) described it as a “polyploid line”. **Bd30-1, developed by D. Garvin from material collected by A. Manzaneda. IL (inbred line). Web sites:
https://gold.jgi.doe.gov/biosamples?Study.GOLD%20Study%20ID=Gs0033763
https://gold.jgi.doe.gov/projects?page=6&Biosample.Biosample+Name=Brachypodium+distachyon&count=100
https://www.google.com/fusiontables/DataSource?docid=1EQ1jPj9PJdBj4_or3zQGtIThS2YSUjli6Jcd1Q#rows:id=1

Table S2. Grass plastomes employed in evolutionary and genomic analyses. (1). Genomes used in ML (RAxML) and BI (MrBayes) phylogenomic analyses. (2). Genomes used in Bayesian nested dating analysis (BEAST). (3). Genomes used to infer background k-mer distributions (DUK).

Table S3. Bioinformatic tools used in the assembly and annotation of Brachypodium plastomes and in their evolutionary and genomic analyses.

Table S4. Primers used for amplification and sequencing of IR-LSC, LSC-IR, IR-SSC and SSC-IR junction regions and of the IR rps19 copy.

Table S5. Comparative cpDNA data of the assembled B. distachyon, B. hybridum and B. stacei plastomes and GenBank accession numbers. (1) k-mer: length of k-mers in the best assembly. (2) C: number of contigs assembled (Velvet output). (3) Lc: length of the longest contig. (4) S: number of scaffolds assembled (SSPACE output). (5) Ls: length of the longest scaffold. (6) Duplicate IRb: assembly performed without Columbus module. The IRb is a duplicated IRa. (7) LTotal: total length of the assembled genome at the end of the process (including missing data, Ns). (8) N: missing data in percent. * “Handcraft scaffolding” and confirmation by SSPACE and GapFiller tools.

Table S6. Genomic rearrangements and mutations found in inter- and intraspecific comparisons of the B. distachyon, B. stacei and B. hybridum plastomes.
(a). Polymorphisms found between the plastomes of B. distachyon inbred line Bd21 (NC_011032.1) uploaded by Bortiri et al. (2008) and B. distachyon inbred line Bd21 assembled in the present study. Note that our newly assembled Bd21 plastome has better supporting evidence than the NC_011032.1 plastome, as most mutations detected in our assembly have high read-depth coverage and were also found in a large number of plastomes of the other studied B. distachyon accessions. *Annotated insertions. **Highly variable poly-A region.
(b). Characteristics of the 133 genes found in the B. distachyon, B. stacei and B. hybridum assembled plastomes, and in the B. distachyon Bd21 (NC_011032.1) reference plastome, annotated according to the best assembled B. distachyon ABR6 plastome (excluding the IRb region).
(c). Rearrangements reported in rpl23 and rps19 gene copies in several plastomes of grasses.
(d). rpl23 pseudogene output obtained from Blastx searches of the B. stacei/B. hybridum 1.1-kbp insert into annotated plastomes of several grasses.
(e). Polymorphisms detected among the assembled plastomes of the B. stacei and B. hybridum accessions.

Table S7. List of B. distachyon plastome haplotypes found across the 53 analyzed ecotypes’ plastomes.

Table S8. Flowering time groups classified according to Ream et al. (2014).

Table S9. Percentages of membership of 53 B. distachyon ecotypes’ plastome profiles to optimal K=2 and K=4 Bayesian genomic groups inferred with Structure v.2.3.2. * Introgressed ecotypes.

Table S10. Pairwise Tamura-Nei raw and patristic genetic distances between Brachypodium [B. distachyon (ABR6), B. stacei (ABR114), B. hybridum (ABR113)] and 91 grass plastomes (Table S2). Patristic distances were calculated in the best ML tree (Fig. S5a).

SUPPLEMENTARY FIGURES

Figure S1. Summarized pipeline used for the chloroplast genome assembly of the Brachypodium plastomes.

Figure S2. Evidence of major rearrangements found among the B. distachyon, B. stacei and B. hybridum plastomes.
(a). IGV image of the psaI-rbcL insert region (1,161 bp) found in the assembled B. stacei and B. hybridum plastomes.
(b). Alignment of the insert region in B. stacei, B. hybridum and B. distachyon (Bd21C) ecotypes.
(c). Electrophoresis gel showing the amplified LSC-IRb junction region.

Figure S3. Plastome phylogenomic analysis of B. distachyon.
(a). Maximum likelihood cpDNA phylogenomic tree and cladogram of 53 Brachypodium distachyon ecotypes computed with RAxML. Thickness of branches indicates bootstrap support (thick, 80–100%; intermediate, 50–79%; thin, <50%).
(b). Bayesian cpDNA 50% Majority Rule consensus phylogenomic tree and cladogram of 53 Brachypodium distachyon ecotypes computed with MrBayes. Thickness of branches indicates posterior probability support (thick, 0.95–1; intermediate, 0.90–0.94; thin, <0.90).

Figure S4. Visual inspection of potential recombination events detected in the plastomes of the introgressed B. distachyon Arn1 and Mon3 ecotypes.
(a). Aligned data matrix of 298 polymorphic positions found across the 53 studied B. distachyon plastomes.
(b). Detail of the recombinant region (polymorphic positions 141–207) pointing to potential microrecombination events (Red rectangle: positions shared between Arn1 and Mon3 and the EDF+ clade. Green rectangle: positions shared between Arn1 and Mon3 and the S+ group).

Figure S5. Plastome phylogenomic analysis of Poaceae.
(a). Maximum likelihood cpDNA phylogenomic tree and cladogram of 95 Poaceae and 53 Brachypodium distachyon lineages computed with RAxML. Thickness of branches indicates bootstrap support (thick, 80–100%; intermediate, 50–79%; thin, <50%).
(b). Bayesian cpDNA 50% Majority Rule consensus phylogenomic tree and cladogram of 95 Poaceae and 53 Brachypodium distachyon lineages computed with MrBayes. Thickness of branches indicates posterior probability support (thick, 0.95–1; intermediate, 0.90–0.94; thin, <0.90).

Figure S6. BEAST nested dating analysis of Poaceae (above-species) and B. distachyon (below-species) plastome sequences.
(a). BEAST nested dated chronogram of 93 above-species grass plastomes showing the estimated divergence times, HPD ranges (bars) and posterior probability support (line thickness; thick, 0.95–1; intermediate, 0.90–0.94; thin, <0.90) of each node. Stars indicate nodal calibration priors (ages) for the Poaceae and BEP+PACMAD clades.
(b). BEAST nested dated chronogram of 53 below-species B. distachyon plastomes showing divergence times, HPD ranges (bars) and posterior probability support (line thickness; thick, 0.95–1; intermediate, 0.90–0.94; thin, <0.90) of each node.

SUPPLEMENTARY TABLES

Table S1. List of B. distachyon, B. stacei and B. hybridum accessions studied. Origin of samples: ABR1 - ABR7 (Brachyomics collections (C. Stace & P. Catalán), Aberystwyth, UK); BdTR_ accessions (Filiz et al. 2009); Bd1-1, Bd2-3, Bd3-1, Bd18-1, Bd21 (Vogel et al. 2006); Bd21-3 (Vogel & Hill, 2008); Adi_, Gaz_, Tek_, Bis_, Kah_, Koz_ (Vogel et al., 2009); Foz1, Mig3, Mon3, Mur1, Uni2 (Mur et al., 2011). * BdTR6G cited as ‘B. distachyon’ in GRIN-Global; Filiz et al. (2009) described it as a “polyploid line”. **Bd30-1, developed by D. Garvin from material collected by A. Manzaneda. IL (inbred line). Web sites: https://gold.jgi.doe.gov/biosamples?Study.GOLD%20Study%20ID=Gs0033763 https://gold.jgi.doe.gov/projects?page=6&Biosample.Biosample+Name=Brachypodium+distachyon&count=100 https://www.google.com/fusiontables/DataSource?docid=1EQ1jPj9PJdBj4_or3zQGtIThS2YSUjli6Jcd1Q#rows:id=1

GOLD Collection Biosample SRX SRA Elevation Accession SRA study Bioproject Biosample Latitude Longitude location ID accession sample (masl) (JGI) Hérault, ABR2 Gb0017122 SRX182920 SRS360668 SRP001538 PRJNA32607 SAMN01162034 371 43° 36' 15.343" N 3° 15' 46.580" E France ABR3 Aísa, (ABY-Bs Huesca, Gb0017178 1928 42° 10' 49.8" N 0° 4’ 23.2" W 5088) Spain

Arén, ABR4 Huesca, Gb0017179 480 42° 15' 45.54" N 0° 43' 0.48" E (ABY-Bs 5089) Spain Jaca, ABR5 Huesca, Gb0017180 SRX182894 SRS360645 SRP001538 PRJNA32607 SAMN01161993 828 42° 34' 23.45" N 0° 33' 49.39" W (ABY-Bs 5090) Spain

Los Arcos, ABR6 Navarra, Gb0017181 SRX298413 SRS438486 SRP001538 PRJNA32607 SAMN02194258 484 42° 34' 27.48" N 2° 11' 5.39" W (ABY-Bs 5091) Spain Otero, ABR7 Valladolid, Gb0017232 725 41° 35' 23.86" N 4° 45' 24.26" W (ABY-Bs 5092) Spain SRX874557 ABR8 Siena, Italy Gb0017233 SRS844147 SRP054525 PRJNA258992 SAMN03003516 272 43° 18' 52.423" N 11° 19' 10.902" E SRX874558 Adi10 Adiyaman, Gb0009975 SRX185151 SRS361658 SRP001538 PRJNA32607 SAMN01163552 510 37° 46' 14.5" N 38° 21' 8.2" E (W6 39243) Turkey Adi12 Adiyaman, Gb0009864 510 37° 46' 14.5" N 38° 21' 8.2" E (W6 39245) Turkey Adi2 Adiyaman, Gb0017235 510 37° 46' 14.5" N 38° 21' 8.2" E (W6 39235) Turkey Arén, Arn1 Huesca, Gb0009976 SRX298355 SRS438433 SRP024421 PRJNA249894 SAMN02194211 681 42° 15' 23.44" N 0° 43' 47.46" E Spain SRX060134 Bd1-1 Soma, SRX060135 (PI 170218) Manisa, Gp0001144 SRS190935 SRP001538 PRJNA32607 SAMN00262772 141 39° 11' 27.44" N 27° 36' 28.59" E SRX060136 (W6 46201) Turkey SRX116611 Kaman, Bd18-1 Kırşehir (PI 245730) Gb0009918 1101 39° 22' 4.25" N 33° 43' 48.91" E Province, (W6 46204) Turkey Bd2-3 (PI 185133) Iraq Gb0009943 42 33° 45' 39.18" N 44° 24' 11.07" E (W6 46202) Bd21 near (PI 254867) Salakudin, Gb0012676 42 33° 45' 39.18" N 44° 24' 11.07" E (W6 36678) Iraq

near Bd21-3 SRX119501 SRS291714 SAMN00788671 Salakudin, Gp0039861 SRP010886 PRJNA250376 42 33° 45' 39.18" N 44° 24' 11.07" E (W6 39233) SRX146215 SRS312328 SAMN00991142 Iraq Bd3-1 (PI 185134) Iraq Gp0001284 SRX117923 SRP001538 SRP001538 PRJNA32607 SAMN00779234 42 33° 45' 39.18" N 44° 24' 11.07" E (W6 46203) Dilar, Bd30-1** SRX116649 Granada, Gp0001821 SRS190910 SRP001538 PRJNA32607 SAMN00262727 1220 36° 59' 25.76" N 3° 33' 31.44" W (W6 46206) SRX059915 Spain BdTR10C Turkey Gb0009946 SRX185149 SRS361656 SRP001538 PRJNA32607 SAMN01163550 1288 37° 46' 41.64" N 31° 53' 5.68" E (W6 39406) BdTR11A Turkey Gb0017236 986 38° 25' 0.42" N 28° 1' 52.75" E (W6 39418) BdTR11G Kirklareli, Gb0017237 124 41° 25' 17.86" N 27° 28' 36.81" E (W6 39424) Turkey BdTR11I Turkey Gb0009945 363 39° 44' 17.39" N 28° 2' 24.71" E (W6 39426) BdTR12C SRX059779 Turkey Gp0009928 SRS190847 SRP001538 PRJNA32607 SAMN00262664 1035 39° 44' 53.45" N 34° 39' 1.15" E (W6 39429) SRX059780 BdTR13A Ankara, Gb0017238 SRX183318 SRS360828 SRP001538 PRJNA32607 SAMN01162158 787 39° 45' 23.35" N 32° 25' 56.46" E (W6 39430) Turkey BdTR13C Ankara, Gb0009863 1192 39° 24' 46.28" N 32° 59' 17.24" E (W6 39432) Turkey BdTR1I Aydin, Gb0017239 SRX183383 SRS360859 SRP001538 PRJNA32607 SAMN01162198 841 38° 5' 35.03" N 28° 34' 59.02" E (W6 39308) Turkey BdTR2B Turkey Gb0012677 667 40° 4' 55.55" N 31° 19' 52.01" E (W6 39314) BdTR2G Ankara, Gb0009917 SRX185148 SRS361655 SRP001538 PRJNA32607 SAMN01163549 1596 40° 23' 37.13" N 32° 59' 7.32" E (W6 39319) Turkey BdTR3C Turkey Gb0009942 1957 36° 46' 58.92" N 32° 57' 46.71" E (W6 39332)

BdTR5I Turkey Gb0009974 1596 40° 23' 37.13" N 32° 59' 7.32" E (W6 39366) BdTR7A Yozgat, Gb0017240 SRX183377 SRS360854 SRP001538 PRJNA32607 SAMN01162193 1035 39° 44' 53.45" N 34° 39' 1.15" E (W6 39385) Turkey SRX181206 BdTR8I SRX181207 Turkey Gb0017241 SRS359840 SRP001538 PRJNA32607 SAMN01137370 2385 37° 6' 31.87" N 34° 4' 17.06" E (W6 39390) SRX181208 SRX181209 BdTR9K Eskişehir, Gb0009919 932 39° 45' 10.62" N 30° 47' 19.07" E (W6 39402) Turkey Bismil, Bis1 Gp0017242 529 37° 52' 35.6" N 41° 0' 54.3" E Turkey Foz de Gp0009893 Lumbier, (Mig1) Foz1 434 42º 38' 11.44" N 1º 18' 17.42" W Navarra, Project ID: Spain 404167 Gaz8 Gaziantep, Gb0009947 SRX185147 SRS361654 SRP001538 PRJNA32607 SAMN01163548 891 37° 7' 39.8" N 37° 23' 26.9" E (W6 39269) Turkey Ermita de Gp0009916 San (Mon1) Jer1 Jerónimo, 418 42° 3' 16.56" N 0° 0' 44.57" W Project ID: Huesca, 404166 Spain Kah1 Kahta, Gp0017374 665 37° 44' 2.3" N 38° 32' 0.2" E (W6 39278) Turkey Kah5 Kahta, Gb0017182 665 37° 44' 2.3" N 38° 32' 0.2" E (W6 39282) Turkey Koz1 Kozluk, Gb0012678 SRX183517 SRS360986 SRP001538 PRJNA32607 SAMN01162329 853 38° 9' 8.2.6" N 41° 36' 34.8" E (W6 39284) Turkey Koz3 Kozluk, SRX059781 Gp0009991 SRS190848 SRP001538 PRJNA32607 SAMN00262665 853 38° 9' 8.2.6" N 41° 36' 34.8" E (W6 39286) Turkey SRX059782

Ermita de Santa Lucía, Luc1 (G31i1) Berdún, Gp0017244 PRJNA249901 SAMN02821630 597 42° 36' 36.18" N 0° 53' 35.48" W Huesca, Spain San Miguel de Foces, Mig3 Ibieca, Gb0017183 SRX182705 SRS360564 SRP001538 PRJNA32607 SAMN01161923 572 42° 8' 52.76" N 0° 11' 41.89" W Huesca, Spain Puerto de Pallaruelo, Castejón de Mon3 Gb0017184 SRX182916 SRS360665 SRP001538 PRJNA32607 SAMN01162031 515 41° 39' 4.75" N 0° 12' 37.51" W Monegros. Zaragoza, Spain Castillo de

Mur1 Mur, Lleida, Gb0009944 SRX181229 SRS359860 SRP001538 PRJNA32607 SAMN01137389 487 42° 06' 18" N 0° 51' 23" E Spain Puerto del Perdón, Per1 (G30i1) Gp0017243 PRJNA249907 SAMN02947395 742 42° 44' 13.34" N 1° 44' 58.6" W Navarra, Spain Zaidín, S8iiC Huesca, Gb0017185 144 41° 36' 19.3" N 0° 08' 38.4" E Spain Gp0009900 Sigüés, (Sig1) Sig2 Zaragoza, 524 42° 36' 46.55" N 1° 0' 52.38" W Spain Project ID: 404169

Tek2 Tekirdag, Gb0012679 SRX183516 SRS360985 SRP001538 PRJNA32607 SAMN01162328 20 41° 0' 40.1" N 27° 31' 8.8" E (W6 39290) Turkey Tek4 Tekirdag, Gb0017188 20 41° 0' 40.1" N 27° 31' 8.8" E (W6 39292) Turkey Escuela Politécnica Uni2 Superior, Gb0017189 480 42° 7' 3.98" N 0° 26' 42.81" W Huesca, Spain Roncal, RON4 Navarra, Gp0039823 SRX711596 SRS710321 SRP047542 PRJNA251304 SAMN02821496 594 42° 46' 50" N 0° 57' 48'' W (RON2) Spain Bd29-1 Krym, (PI 639818) 260 44° 30' 55" N 33° 33' 23" E (W6 46205) Poblado de San Pob1 (G32i2) Antonio, Gp0017245 PRJNA249905 SAMN02947396 573 41° 0' 16.99" N 0° 11' 6.72" E B. hybridum Calaceite, Teruel, Spain BdTR6G* (W6 39378) Turkey Gb0022615 SRX716913 SRS712683 SRP048461 PRJNA250071 SAMN02821424 872 39° 45' 15'' N 33° 31' 16'' E B. hybridum ABR113 Lisbon, Gb0025282 PRJNA251263 SAMN03002988 187 38° 46' 58.775" N 9° 15' 1.757" W B. hybridum Torrent, ABR114 B. stacei Formentera, Gb0025047 PRJNA251220 SAMN03003255 122 39° 28' 35.350" N 2° 49' 55.448" E Spain

Table S2. Grass plastomes employed in evolutionary and genomic analyses. (1). Genomes used in ML (RAxML) and BI (MrBayes) phylogenomic analyses. (2). Genomes used in Bayesian nested dating analysis (BEAST). (3). Genomes used to infer background k-mer distributions (DUK).

1 2 3 Species Accession GI RAxML BEAST DUK Acidosasa purpurea NC_015820.1 340034177 x x x bicornis cultivar Clae57 NC_024831.1 685508511 x x Aegilops cylindrica NC_023096.1 568246973 x x Aegilops geniculata NC_023097.1 568244975 x x Aegilops kotschyi cultivar TA1980 NC_024832.1 699008472 x x Aegilops longissima cultivar TA1924 NC_024830.1 685508428 x Aegilops searsii cultivar TA1926 KJ614413.1 667754557 x x Aegilops sharonensis cultivar TA1995 NC_024816.1 697964657 x x Aegilops speltoides var. ligustica cultivar AE918 KJ614404.1 667753810 x x Aegilops tauschii cultivar AL8/78 KJ614412.1 667754474 x x Agrostis stolonifera NC_008591.1 118430280 x x x Ammophila breviligulata voucher CAN:Peterson NC_027465.1 884998160 x x 20867 Ampelocalamus calcareus NC_024731.1 675155489 x x Ampelodesmos mauritanicus voucher B:Royl & NC_027466.1 884998245 x x Schiers s.n. Anomochloa marantoidea NC_014062.1 295065706 x x x Anthoxanthum odoratum voucher CAN:Saarela NC_027467.1 884998329 x x 500 appalachiana NC_023934.1 608787536 x x NC_020341.1 452849461 x x NC_023935.1 608787620 x x Avena sativa voucher CAN:Saarela 775 NC_027468.1 884998413 x x Bambusa emeiensis NC_015830.1 340034430 x x x Bambusa oldhamii NC_012927.1 253729536 x x x aristosum voucher BH:J.I. Davis NC_027470.1 884998582 x x 777 Brachypodium distachyon Bd21 NC_011032.1 194033128 x Briza maxima voucher CAN:Saarela 284 NC_027471.1 884998669 x x Bromus vulgaris voucher CAN:Saarela 822 NC_027472.1 884998754 x x Chimonocalamus longiusculus NC_024714.1 675154211 x x Coix lacryma-jobi NC_013273.1 260677373 x x x Dactylis glomerata voucher CAN:Saarela 496 NC_027473.1 884998837 x x Dendrocalamus latiflorus NC_013088.1 255961360 x x x Deschampsia antarctica NC_023533.1 589229800 x x Diarrhena obovata voucher BH:J.I. 
Davis 756 NC_027474.1 884998922 x x Fargesia nitida NC_024715.1 675154294 x x Fargesia spathacea NC_024716.1 675154378 x x Fargesia yunnanensis NC_024717.1 675154462 x x Ferrocalamus rimosivaginus NC_015831.1 340034515 x x x altissima NC_019648.1 427436954 x x

Festuca arundinacea NC_011713.2 255961284 x Festuca arundinacea voucher CAN:Saarela 331 KM974751.1 768805826 x x Festuca ovina NC_019649.1 426406618 x x Festuca pratensis NC_019650.1 427437051 x x Gaoligongshania megalothyrsa NC_024718.1 675154546 x x tessellatus NC_024719.1 675154630 x x Helictochloa hookeri voucher CAN:Saarela NC_027469.1 884998498 x x 18359 Hierochloe odorata voucher A:E.A. Kellogg s.n. NC_027475.1 884999006 x x Hordeum jubatum voucher CAN:Saarela 18478 NC_027476.1 884999091 x x Hordeum vulgare subsp. vulgare cultivar Barke KC912687.1 521300931 x x Hordeum vulgare subsp. vulgare cultivar Morex EF115541.1 118201020 x Indocalamus longiauritus NC_015803.1 339906432 x x x Indocalamus wilsonii NC_024720.1 675154714 x x Indosasa sinica NC_024721.1 675154798 x x Lecomtella madagascariensis NC_024106.1 662020661 x x Leersia tisserantii JN415112.1 346228283 x x multiflorum NC_019651.1 427437197 x x Lolium perenne NC_009950.1 159106843 x x x mutica voucher US:W.J. Kress & M. Butts NC_027477.1 884999174 x x 04-7461 Melica subulata voucher CAN:Saarela 836 NC_027478.1 884999258 x x Oligostachyum shiuyingianum NC_024722.1 675154881 x x Olyra latifolia KF515509.1 628098861 x x Oryza nivara NC_005973.1 50233947 x x Oryza rufipogon KF428978.1 552954453 x x Oryza sativa (indica cultivar-group) isolate 93-11 AY522329.1 42795473 x x x Oryza sativa (japonica cultivar-group) isolate AY 522331.1 42795601 x PA64S Oryzopsis asperifolia voucher CAN:Saarela 430 NC_027479.1 884999342 x x chloroplast NC_015990.1 345895196 x x x Phaenosperma globosum voucher BH:J.I. Davis NC_027480.1 884999427 x x 779 Phalaris arundinacea voucher CAN:Saarela 973 NC_027481.1 884999512 x x Pharus lappulaceus NC_023245.1 570700293 x x Pharus latifolius NC_021372.1 511347561 x x Phleum alpinum voucher CAN:Saarela 1234 NC_027482.1 884999596 x x edulis NC_015817.1 340034006 x x x Phyllostachys nigra var. 
henonis NC_015826.1 340034345 x x x Phyllostachys propinqua NC_016699.1 374249330 x x Phyllostachys sulphurea NC_024669.1 671743764 x x Piptochaetium avenaceum voucher CAN:R.J. NC_027483.1 884999681 x x Soreng & K. Romaschenko 430 Pleioblastus maculatus chloroplas NC_024723.1 675155300 x x Poa palustris voucher CAN:Saarela 1080 NC_027484.1 884999765 x x nuttalliana NC_027485.1 884999850 x x Puelia olyriformis NC_023449.1 586929210 x x Rhynchoryza subulata NC_016718.1 374249599 x x

Saccharum hybrid cultivar NCo 310 NC_006084.1 50812505 x Saccharum hybrid cultivar SP-80-3280 NC_005878.2 50198865 x Sarocalamus faberi NC_024713.1 675154126 x x cereale NC_021761.1 525782195 x x Sorghum bicolor NC_008602.1 118614470 x x x Stipa hymenoides voucher CAN:Saarela 725 NC_027464.1 884998075 x x Thamnocalamus spathiflorus NC_024724.1 675155405 x x Torreyochloa pallida voucher CAN:Saarela 1110 NC_027486.1 884999935 x x Trisetum cernuum voucher CAN:Saarela 876 NC_027487.1 885000020 x x Triticum aestivum NC_002762.1 14017551 x Triticum aestivum cultivar Chinese Spring KJ614396.1 667753146 x x TA3008 Triticum monococcum subsp. aegilopoides KC912692.1 521301327 x x Triticum timopheevii cultivar TA0941 KJ614407.1 667754059 x x Triticum turgidum cultivar TA2801 KJ614399.1 667753395 x x Triticum urartu cultivar PI428335 KJ614411.1 667754391 x x levigata NC_024725.1 675154964 x x Zea mays X86563.2 11990232 x x x

Table S3. Bioinformatic tools used in the assembly and annotation of Brachypodium plastomes and in their evolutionary and genomic analyses.

| Bioinformatics tools | Brief description | References |
|---|---|---|
| Chloroplast assembly | | |
| DUK | DUK - A fast and efficient kmer based sequence matching tool. | (Li et al., 2011) |
| FastQC v.0.10.1 | FastQC is a quality control application for high throughput sequence data. It reads in sequence data in a variety of formats and can either provide an interactive application to review the results of several different QC checks, or create an HTML based report which can be integrated into a pipeline. | (Andrews, 2010) |
| Trimmomatic v.0.32 | Flexible trimmer for Illumina sequence data. | (Bolger et al., 2014) |
| Musket v.1.0.6 | Multistage k-mer spectrum-based error corrector for Illumina sequence data. | (Liu et al., 2013) |
| BWA v.0.7.8 | Fast and accurate short read alignment with Burrows–Wheeler transform. | (Li & Durbin, 2009) |
| VelvetOptimiser v.2.2.5 | Multi-threaded Perl script for automatically optimising the three primary parameter options (K, -exp_cov, -cov_cutoff) for the Velvet de novo sequence assembler. | (Gladman & Seemann, 2012) |
| Velvet v.1.2.07, Columbus module | Algorithms for de novo short read assembly using de Bruijn graphs. | (Zerbino & Birney, 2008; Zerbino, 2010) |
| SSPACE Basic v.2.0 | Stand-alone scaffolder of pre-assembled contigs using paired-read data. | (Boetzer et al., 2011) |
| GapFiller v.1.11 | De novo assembly approach to fill the gap within paired reads. | (Boetzer & Pirovano, 2012; Nadalin et al., 2012) |
| BLAST v.2.2.28+ | The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families. | (Camacho, 2013) |
| SEQuel v.1.0.2 | Tool for correcting errors (i.e., insertions, deletions, and substitutions) in contigs output from assembly. The algorithm behind SEQuel makes use of a graph structure called the positional "de Bruijn" graph, which models k-mers within reads while incorporating their approximate positions into the model. | (Ronen et al., 2012) |
| SAMtools v.0.1.18 | SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. | (Li et al., 2009) |
| IGV v.2.3.8 | High-performance viewer that efficiently handles large heterogeneous data sets, while providing a smooth and intuitive user experience at all levels of genome resolution. | (Thorvaldsdóttir et al., 2013) |
| Alignment and viewer | | |
| MAFFT v.7.031b | Multiple sequence alignment program. | (Katoh & Standley, 2013) |
| MEGA v.7.0.14 | The Molecular Evolutionary Genetics Analysis (MEGA) software is developed for comparative analyses of DNA and protein sequences. | (Kumar et al., 2016) |
| SeaView v.4 | Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building. | (Gouy et al., 2010) |
| Geneious v.8.1.4 | Integrated and extendable desktop software platform for the organization and analysis of sequence data. | (Kearse et al., 2012) |
| trimAl v.1.2rev59 | Tool for automated alignment trimming in large-scale phylogenetic analyses. | (Capella-Gutiérrez et al., 2009) |
| Annotation and drawing | | |
| CpGAVAS (web) | Integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences. | (Liu et al., 2012) |
| OrganellarGenomeDRAW (web) | Tool for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. | (Lohse et al., 2013) |
| Circos | Visualization tool for the identification and analysis of similarities and differences arising from comparisons of genomes. | (Krzywinski et al., 2009) |
| Phylogenetic, haplotypic and genomic diversity analyses | | |
| JModelTest v.2.1.7 | Tool to carry out statistical selection of best-fit models of nucleotide substitution. | (Guindon & Gascuel, 2003; Darriba et al., 2012) |
| RAxML v.8.1.17 | Tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. | (Stamatakis, 2014) |
| MrBayes v.3.2.2 | MrBayes 3 performs Bayesian phylogenetic analysis combining information from different data partitions or subsets evolving under different stochastic evolutionary models. | (Ronquist & Huelsenbeck, 2003; Ronquist et al., 2011) |
| BEAST v.1.8.2 | Bayesian Evolutionary Analysis Sampling Trees. | (Drummond et al., 2012) |
| BEAUti v.1.8.2 | A simple user interface for creating input files to run BEAST. | (Drummond et al., 2012) |
| Tracer v.1.6 | Tracer is a graphical tool for visualization and diagnostics of MCMC output. | (Rambaut et al., 2014) |
| TCS v.1.21 | Phylogenetic network estimation using statistical parsimony. | (Clement et al., 2000) |
| Structure v.2.3.2 | Free software package for using multi-locus genotype data to investigate population structure. | (Pritchard et al., 2000) |
| RDP v.4.56 | Recombination detection program that implements an extensive array of methods for detecting and visualizing recombination events. | (Martin et al., 2015) |
| In-house scripts | | |
| split_pairs v.0.5 | Efficient kseq-based program to sort and find paired reads within FASTQ/FASTA files, with the ability to edit headers with the power of Perl-style regular expressions. | |
| annot_fasta_from_gbk.pl | Tool for transferring features annotated on a reference GenBank file to another sequence (in FASTA format). | https://github.com/eead-csic-compbio/chloroplast_assembly_protocol |
| _check_matrix.pl | Script to analyze DNA polymorphisms along pre-aligned cp genomes. Produces data files to be used as tracks with Circos software. | |
| Chloroplast_Assembly_Protocol | A set of scripts for the assembly of chloroplast genomes out of whole-genome sequencing reads. | |


SUPPLEMENTARY REFERENCES TABLE S3

Andrews S. 2010. FastQC: a quality control tool for high throughput sequence data.

Boetzer M, Henkel C V., Jansen HJ, Butler D, Pirovano W. 2011. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27: 578–579.

Boetzer M, Pirovano W. 2012. Toward almost closed genomes with GapFiller. Genome biology 13: R56.

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.

Camacho C. 2013. BLAST+.

Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25: 1972–1973.

Clement M, Posada D, Crandall KA. 2000. TCS: a computer program to estimate gene genealogies. Molecular Ecology 9: 1657–1660.

Darriba D, Taboada GL, Doallo R, Posada D. 2012. jModelTest 2: more models, new heuristics and parallel computing. Nature Methods 9: 772.

Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29: 1969–1973.

Filiz E, Ozdemir BS, Budak F, Vogel JP, Tuna M, Budak H. 2009. Molecular, morphological, and cytological analysis of diverse Brachypodium distachyon inbred lines. Genome 52: 876–890.

Gladman S, Seemann T. 2012. VelvetOptimiser.


Gouy M, Guindon S, Gascuel O. 2010. SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Molecular biology and evolution 27: 221–224.

Guindon S, Gascuel O. 2003. A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood. Systematic Biology 52: 696–704.

Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution 30: 772–780.

Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, et al. 2012. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28: 1647– 1649.

Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. 2009. Circos: An information aesthetic for comparative genomics. Genome Research 19: 1639–1645.

Kumar S, Stecher G, Tamura K. 2016. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Molecular Biology and Evolution.

Li M, Copeland A, Han J. 2011. DUK - A Fast and Efficient Kmer Based Sequence Matching Tool.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.

Liu Y, Schröder J, Schmidt B. 2013. Musket: A multistage k-mer spectrum-based error corrector for Illumina sequence data. Bioinformatics 29: 308–315.

Liu C, Shi L, Zhu Y, Chen H, Zhang J, Lin X, Guan X. 2012. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences. BMC genomics 13: 715.

Lohse M, Drechsel O, Kahlau S, Bock R. 2013. OrganellarGenomeDRAW--a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic acids research 41: 575–581.

Martin DP, Murrell B, Golden M, Khoosal A, Muhire B. 2015. RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evolution 1: 1–5.

Mur LAJ, Allainguillaume J, Catalán P, Hasterok R, Jenkins G, Lesniewska K, Thomas I, Vogel J. 2011. Exploiting the brachypodium tool box in cereal and grass research. New Phytologist 191: 334–347.

Nadalin F, Vezzi F, Policriti A. 2012. GapFiller: a de novo assembly approach to fill the gap within paired reads. BMC Bioinformatics 13: S8.

Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155: 945–959.

Rambaut A, Suchard MA, Xie D, Drummond AJ. 2014. Tracer.

Ronen R, Boucher C, Chitsaz H, Pevzner P. 2012. sEQuel: Improving the accuracy of genome assemblies. Bioinformatics 28: 188–196.

Ronquist F, Huelsenbeck JP. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.

Ronquist F, Huelsenbeck J, Teslenko M. 2011. MrBayes Version 3.2 Manual: Tutorials and Model Summaries. : 1–103.

Stamatakis A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312–1313.

Thorvaldsdóttir H, Robinson JT, Mesirov JP. 2013. Integrative Genomics Viewer (IGV): High-performance genomics data visualization and exploration. Briefings in Bioinformatics 14: 178–192.

Vogel J, Hill T. 2008. High-efficiency Agrobacterium-mediated transformation of Brachypodium distachyon inbred line Bd21-3. Plant Cell Reports 27: 471–478.

Vogel JP, Tuna M, Budak H, Huo N, Gu YQ, Steinwand MA. 2009. Development of SSR markers and analysis of diversity in Turkish populations of Brachypodium distachyon. BMC plant biology 9: 88.

Zerbino DR. 2010. Using the Columbus extension to Velvet.

Zerbino DR, Birney E. 2008. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18: 821–829.


Table S4. Primers used for amplification and sequencing of IR-LSC, LSC-IR, IR-SSC and SSC-IR junction regions and of the IR rps19 copy.

| Primer name | Sequence |
|---|---|
| 1_IRa_LSC_Forward | AGCCGGATCTAAGTGTTGGC |
| 2_IRa_LSC_Reverse | GCTCATGGTTATTTTGGCCGAT |
| 3_LSC_IRb_Forward | ATTCCCCCAATTTGCGACCT |
| 4_LSC_IRb_Reverse | TGTTGGCTAGGTAAACGCCC |
| 5_IRb_SSC_Forward | ACGTTTGCTGGCTTATTTGGC |
| 6_IRb_SSC_Reverse | TCTGTAAGTCTAGYTATCCTCGGT |
| 7_SSC_IRa_Forward | ACTCGCTCAACTCGTTCCAA |
| 8_SSC_IRa_Reverse | AAGATACGGAGACTTGCTTCACA |


Table S5. Comparative cpDNA data of the assembled B. distachyon, B. hybridum and B. stacei plastomes and GenBank accession numbers. (1) kmer = length of k-mers in the best assembly. (2) C = number of contigs assembled (Velvet output). (3) Lc = length of the longest contig. (4) S = number of scaffolds assembled (SSPACE output). (5) Ls = length of the longest scaffold. (6) Duplicate IRb = assembly performed without the Columbus module; the IRb is a duplicate of the IRa. (7) LTotal = total length of the assembled genome at the end of the process (including missing data, Ns). (8) N = missing data in percent. * "handcraft scaffolding" and confirmation with the SSPACE and GapFiller tools.

| Accession ID (Embl/ENA) | Ecotype | kmer (1) | C (2) | Lc (bp) (3) | Median coverage depth | S (4) | Ls (bp) (5) | Duplicate IRb (6) | LTotal (bp) (7) | Ambiguous bases, Ns (%) (8) |
|---|---|---|---|---|---|---|---|---|---|---|
| LT558583 | ABR2 | 59 | 7 | 98,942 | 108 | 1 | 134,840 | | 135,170 | 0.1 |
| LT558584 | ABR3 | 47 | 3 | 79,476 | 299 | 2 | 101,121 | X | 135,147 | - |
| LT558585 | ABR4 | 59 | 8 | 98,896 | 241 | 1 | 134,958 | | 135,138 | <0.1 |
| LT558586 | ABR5 | 81 | 8 | 98,932 | 81 | 1 | 134,954 | | 135,187 | 0.1 |
| LT222229 | ABR6 | 47 | 3 | 79,487 | 239 | 2 | 101,032 | X | 135,159 | - |
| LT558587 | ABR7 | 59 | 9 | 98,883 | 250 | 3 | 134,772 | | 135,125 | 0.1 |
| LT558588 | ABR8 | 47 | 8 | 68,322 | 350 | 3 | 134,878 | | 135,214 | <0.1 |
| LT558590 | Adi2 | 71 | 8 | 98,841 | 76 | 1 | 135,065 | | 135,186 | 0.1 |
| LT558633 | Adi10 | 47 | 9 | 68,281 | 266 | 4 | 70,688 | X | 135,155 | 0.1 |
| LT558632 | Adi12 | 47 | 6 | 79,494 | 259 | 3 | 105,959 | X | 135,186 | 0.3 |
| LT558591 | Arn1 | 47 | 7 | 84,060 | 165 | 1 | 134,976 | | 135,116 | <0.1 |
| LT558592 | Bd1-1 | 47 | 10 | 87,720 | 211 | 3 | 135,053 | | 135,039 | 0.1 |
| LT558595 | Bd18-1 | 59 | 7 | 98,859 | 96 | 3 | 99,686 | | 135,191 | 0.4 |
| LT558593 | Bd2-3 | 81 | 10 | 92,493 | 84 | 8 | 93,342 | | 135,186 | 3.4 |
| LT558596 | Bd21-3 | 47 | 1 | 135,186 | 302 | 1 | 135,232 | | 135,186 | <0.1 |
| LT558597 | Bd21-control (Bd21C) | 47 | 12 | 82,896 | 145 | 2 | 134,579 | | 135,202 | 0.3 |
| LT558598 | Bd29-1 | 47 | 8 | 34,312 | 96 | 3 | 79,485 | X | 135,049 | <0.1 |
| LT558594 | Bd3-1 | 47 | 8 | 98,932 | 101 | 2 | 136,368 | | 135,186 | <0.1 |
| LT558599 | Bd30-1 | 47 | 12 | 88,757 | 75 | 2 | 120,904 | | 135,133 | - |
| LT558606 | BdTR10C | 59 | 11 | 90,871 | 258 | 6 | 102,044 | | 135,186 | 4.8 |
| LT558607 | BdTR11A | 59 | 9 | 68,319 | 92 | 2 | 134,785 | | 135,156 | 0.2 |
| LT558608 | BdTR11G | 59 | 20 | 72,915 | 159 | 13 | 134,379 | | 135,157 | 0.3 |
| LT558609 | BdTR11I | 47 | 13 | 80,716 | 166 | 9 | 81,807 | | 135,157 | 0.4 |
| LT558610 | BdTR12C | 47 | 7 | 98,923 | 75 | 1 | 135,174 | | 135,186 | - |
| LT558631 | BdTR13C | 81 | 15 | 81,100 | 83 | 11 | 92,314 | | 135,048 | 3.4 |
| LT558611 | BdTR13A | 47 | 10 | 68,288 | 168 | 3 | 134,851 | | 135,044 | <0.1 |
| LT558600 | BdTR1I | 81 | 10 | 50,454 | 83 | 1 | 135,164 | | 135,186 | 0.1 |
| LT558601 | BdTR2B | 73 | 11 | 68,321 | 192 | 3 | 135,029 | | 135,185 | 0.1 |
| LT558602 | BdTR2G | 71 | 8 | 68,314 | 78 | 2 | 118,743 | | 135,186 | 4.4 |
| LT558634 | BdTR3C | 47 | 22 | 70,632 | 271 | 15 | 75,137 | | 134,991 | 0.1 |
| LT558635 | BdTR5I | 59 | 3 | 79,506 | 187 | 2 | 101,051 | X | 135,186 | <0.1 |
| LT558604 | BdTR7A | 59 | 13 | 44,684 | 254 | 5 | 134,443 | | 135,141 | 0.4 |
| LT558605 | BdTR8I | 59 | 10 | 98,911 | 243 | 3 | 134,620 | | 135,159 | 0.5 |
| LT558636 | BdTR9K | 59 | 10 | 92,492 | 97 | 6 | 93,497 | | 135,186 | 5.7 |
| LT558612 | Bis1 | 47 | 7 | 87,717 | 175 | 1 | 134,788 | | 135,044 | 0.1 |
| LT558613 | Foz1 | 47 | 7 | 98,906 | 164 | 1 | 135,020 | | 135,149 | 0.1 |
| LT558582 | Gaz8 | 47 | 23 | 24,678 | 508 | 14 | 38,929 | | 135,187 | 4.3 |
| LT558614 | Jer1 | 47 | 3 | 79,492 | 133 | 2 | 101,037 | X | 135,161 | - |
| LT558615 | Kah1 | 47 | 8 | 98,853 | 105 | 1 | 134,976 | | 135,186 | <0.1 |
| LT558616 | Kah5 | 81 | 8 | 68,314 | 82 | 1 | 134,821 | | 135,186 | 0.3 |
| LT558617 | Koz1 | 47 | 8 | 98,924 | 161 | 2 | 135,043 | | 135,186 | 0.1 |
| LT558618 | Koz3 | 47 | 8 | 52,928 | 72 | 1 | 135,155 | | 135,186 | - |
| LT558619 | Luc1 | 59 | 8 | 98,890 | 109 | 2 | 135,015 | | 135,132 | 0.1 |
| LT558620 | Mig3 | 47 | 7 | 98,863 | 164 | 1 | 134,819 | | 135,116 | 0.1 |
| LT558621 | Mon3 | 47 | 9 | 98,910 | 167 | 3 | 134,977 | | 135,140 | <0.1 |
| LT558622 | Mur1 | 47 | 14 | 82,907 | 163 | 8 | 102,460 | | 135,174 | 1.0 |
| LT558623 | Per1 | 59 | 9 | 80,513 | 241 | 2 | 134,552 | | 135,175 | 0.4 |
| LT558625 | RON4 | 59 | 6 | 98,901 | 316 | 1 | 135,080 | | 135,144 | <0.1 |
| LT558626 | S8iiC | 47 | 12 | 98,846 | 278 | 7 | 111,777 | | 135,145 | 1.8 |
| LT558627 | Sig2 | 59 | 11 | 98,906 | 243 | 4 | 134,745 | | 135,149 | 0.1 |
| LT558629 | Tek2 | 47 | 8 | 98,911 | 163 | 2 | 134,887 | | 135,159 | 0.2 |
| LT558628 | Tek4 | 47 | 4 | 79,476 | 271 | 4 | 79,522 | X | 135,159 | <0.1 |
| LT558630 | Uni2 | 47 | 7 | 68,229 | 251 | 5 | 79,475 | X | 135,106 | <0.1 |
| LT222230 | ABR113 | 47 | 7 | 49,506 | 230 | 3 | 81,299 | X | 136,327 | - |
| LT558624 | Pob1 | 59 | 4 | 47,139 | 107 | 1 | 136,402 | | 136,327 | - |
| LT558603 | BdTR6G | 87 | 2 | 136,298 | 313 | 2 | 136,384 | | 136,326 | - |
| LT558589 | ABR114 | 47 | 6 | 42,837 | 272 | * | * | X | 136,330 | - |


Table S6. Genomic rearrangements and mutations found in inter- and intraspecific comparisons of the B. distachyon, B. stacei and B. hybridum plastomes.

(a). Polymorphisms found between the chloroplast genome of B. distachyon inbred line Bd21 (NC_011032.1) uploaded by Bortiri et al. (2008, 2010) and that of B. distachyon inbred line Bd21 assembled in the present study. Note that our newly assembled Bd21 plastome is better supported than NC_011032.1, as most mutations detected in our assembly are supported by high read-depth coverage and were also found in a large number of plastomes from the other studied B. distachyon accessions. * Annotated insertions. ** Highly variable poly-A region.

Brachypodium distachyon Bd21

Columns: Regions; Mutation (SNPs, indels*); NC_011032.1 (Bortiri et al. 2008); Current study Bd21-control (Bd21C); Consensus between our assembled Bd21-control and the other assembled lines of B. distachyon; Coverage (x) of Bd21-control polymorphisms; Region and mutation (synonymous/non-synonymous).

Indel AGG 1- 3 - 52/52 - Intergenic

Indel C 10 - 52/52 - Intergenic

Indel G 24 - 52/52 - Intergenic

Indel G 27 - 52/52 - Intergenic

Indel T 53 - 52/52 - Intergenic

LONG SINGLE COPY (LSC)

SNP G (593) T (586) 52/52 16,302 psbA - synonymous


Indel G 3,208 - 52/52 - Intergenic

Indel T 3,578 - 52/52 - Intergenic

SNP A (3,921) C (3,913) 37/52 11,536 Intergenic

Indel T - 6,753 52/52 14,463 psbK – non-synonymous

Indel A - 6,757 52/52 14,351 psbK – non-synonymous

SNP C (6,783) T (6,777) 52/52 14,490 psbK – non-synonymous

Indel A - 6,784 52/52 14,325 psbK – non-synonymous

Indel AAAAAA 11,141 – 11,146 - ** - Intergenic

Indel C 12,333 - 52/52 - Intergenic

Indel T - 17,447 52/52 16,325 Intergenic

SNP T (29,074) C (29,063) 52/52 16,750 rpoC2 – non-synonymous

SNP A (29,276) G (29,265) 52/52 13,439 rpoC2 – synonymous

SNP T (29,396) G (29,385) 52/52 8,711 rpoC2 – synonymous

Indel C - 29,487 52/52 13,702 rpoC2 – non-synonymous

Indel A 29,504 - 52/52 - rpoC2 – non-synonymous

Indel G - 36,487 52/52 14,617 Intergenic


Indel C - 36,490 52/52 14,950 Intergenic

SNP G (40,576) A (40,567) 52/52 11,318 psaA – synonymous

Indel A - 54,273 52/52 16,419 Intergenic

SNP K (70,065) T (70,058) 52/52 14,966 Intergenic

Missing data N (70,595) A (70,588) 52/52 15,477 Intergenic

Indel T - 70,628 52/52 16,033 Intergenic

Indel T - 77,483 52/52 13,873 Intergenic

Indel A - 77,492 52/52 14,002 Intergenic

Indel A - 77,494 52/52 13,879 Intergenic

Indel C 78,925 - 52/52 - Intergenic

Indel T - 78,929 46/52 12,284 Intergenic

SNP G (78,944) T (78,941) 52/52 12,543 Intergenic

Indel A - 78,950 52/52 12,884 Intergenic

Indel T - 98,920 46/52 14,041 Intergenic

Indel T - 98,924 46/52 14,316 Intergenic

INVERTED REPEAT (IR)

Indel A - 98,928 46/52 14,421 Intergenic


Indel G 98,948 - 52/52 - Intergenic

SNP A (100,189) C (100,189) 31/52 219 Intergenic

SHORT SINGLE COPY (SSC)

Indel A - 103,494 ** - Intergenic

Indel T - 115,719 49/52 12,988 Intergenic

Missing data - 117,709 – 117,948 - - Intergenic

INVERTED REPEAT (IR)

Indel T 123,317 - - - 16S ribosomal RNA

Missing data - 126,895 – 126,979 - - Intergenic

Missing data - 134,020 – 134,113 - - Intergenic


(b). Characteristics of the 133 genes found in the B. distachyon, B. stacei and B. hybridum assembled plastomes, and in the B. distachyon Bd21 (NC_011032.1) reference plastome, annotated according to the best assembled B. distachyon ABR6 plastome (excluding the IRb region).

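| Gene | Coordinate | Accessions compared | Length (bp) | Missing data (Ns) | Indels | SNPs | Parsimony informative | Non-synonymous | Missing in 122 grass cpDNA genomes |
|---|---|---|---|---|---|---|---|---|---|
| psbA | 88 | 58 | 1062 | 0 | 0 | 6 | 5 | 6 | 1 |
| matK | 1639 | 58 | 1536 | 0 | 0 | 30 | 28 | 30 | 2 |
| rps16 | 4401 | 58 | 258 | 0 | 0 | 4 | 4 | 4 | 3 |
| psbK | 6687 | 58 | 186 | 0 | 0 | 12 | 4 | 12 | 0 |
| psbI | 7283 | 58 | 111 | 0 | 0 | 0 | 0 | 0 | 0 |
| psbD | 8588 | 58 | 1062 | 0 | 0 | 10 | 9 | 10 | 0 |
| psbC | 9597 | 58 | 1422 | 0 | 0 | 6 | 6 | 6 | 1 |
| psbZ | 11627 | 58 | 189 | 0 | 0 | 0 | 0 | 1 | 2 |
| psbM | 16357 | 58 | 105 | 0 | 0 | 0 | 0 | 0 | 0 |
| petN | 17258 | 58 | 90 | 0 | 0 | 0 | 0 | 0 | 2 |
| rpoB | 19549 | 58 | 3231 | 0 | 0 | 31 | 31 | 31 | 0 |
| rpoC1 | 22812 | 58 | 2049 | 0 | 0 | 22 | 22 | 22 | 0 |
| rpoC2 | 25070 | 58 | 4443 | 0 | 0 | 70 | 63 | 70 | 1 |
| rps2 | 29819 | 58 | 711 | 0 | 0 | 10 | 9 | 10 | 0 |
| atpI | 30785 | 58 | 744 | 0 | 0 | 3 | 3 | 3 | 0 |
| atpH | 31915 | 58 | 246 | 0 | 0 | 1 | 1 | 1 | 0 |
| atpF | 32576 | 58 | 552 | 0 | 0 | 2 | 2 | 2 | 2 |
| atpA | 34044 | 58 | 1524 | 0 | 0 | 14 | 13 | 14 | 0 |
| rps14 | 36110 | 58 | 312 | 0 | 0 | 2 | 2 | 2 | 2 |
| psaB | 36567 | 58 | 2205 | 0 | 0 | 14 | 13 | 14 | 0 |
| psaA | 38797 | 58 | 2253 | 0 | 0 | 15 | 13 | 15 | 0 |
| ycf3 | 41685 | 58 | 513 | 0 | 0 | 3 | 3 | 3 | 3 |
| rps4 | 44631 | 58 | 606 | 0 | 0 | 2 | 2 | 2 | 0 |
| ndhJ | 48071 | 58 | 480 | 0 | 0 | 1 | 1 | 1 | 0 |
| ndhK | 48656 | 58 | 738 | 0 | 3 | 7 | 6 | 7 | 1 |
| ndhC | 49384 | 58 | 363 | 0 | 0 | 4 | 4 | 4 | 0 |
| atpE | 51608 | 58 | 414 | 0 | 0 | 3 | 3 | 3 | 0 |
| atpB | 52018 | 58 | 1497 | 0 | 0 | 9 | 9 | 9 | 2 |
| rbcL | 54300 | 57 | 1431 | 0 | 0 | 15 | 15 | 15 | 0 |
| psaI | 56192 | 58 | 111 | 0 | 0 | 1 | 1 | 1 | 3 |
| ycf4 | 56622 | 58 | 558 | 0 | 0 | 7 | 7 | 7 | 1 |
| cemA | 57605 | 58 | 693 | 0 | 0 | 5 | 5 | 5 | 2 |
| petA | 58522 | 58 | 963 | 0 | 0 | 4 | 4 | 4 | 0 |
| psbJ | 60319 | 58 | 123 | 0 | 0 | 0 | 0 | 0 | 3 |
| psbL | 60572 | 58 | 117 | 0 | 0 | 1 | 1 | 1 | 4 |
| psbF | 60711 | 58 | 120 | 0 | 0 | 0 | 0 | 0 | 2 |
| psbE | 60841 | 58 | 252 | 0 | 0 | 1 | 1 | 1 | 2 |
| petL | 62370 | 58 | 96 | 0 | 0 | 0 | 0 | 0 | 5 |
| petG | 62639 | 58 | 114 | 0 | 0 | 0 | 0 | 0 | 4 |
| psaJ | 63500 | 58 | 129 | 0 | 0 | 1 | 1 | 1 | 5 |
| rpl33 | 64070 | 58 | 201 | 0 | 0 | 1 | 1 | 1 | 3 |
| rps18 | 64579 | 58 | 492 | 0 | 3 | 6 | 6 | 6 | 5 |
| rpl20 | 65227 | 58 | 360 | 0 | 0 | 7 | 7 | 7 | 4 |
| rps12 | 66278 | 58 | 363 | 0 | 0 | 0 | 0 | 0 | 0 |
| rps12-2 | 66278 | 58 | 375 | 0 | 0 | 0 | 0 | 0 | 0 |
| clpP | 66533 | 58 | 651 | 0 | 0 | 3 | 3 | 3 | 0 |
| psbB | 67696 | 58 | 1527 | 0 | 0 | 12 | 12 | 12 | 0 |
| psbT | 69409 | 58 | 108 | 0 | 0 | 2 | 1 | 2 | 3 |
| psbN | 69565 | 58 | 132 | 0 | 0 | 0 | 0 | 0 | 1 |
| psbH | 69800 | 58 | 222 | 0 | 0 | 3 | 3 | 3 | 1 |
| petB | 70915 | 58 | 699 | 0 | 0 | 4 | 4 | 4 | 2 |
| petD | 72516 | 58 | 525 | 0 | 0 | 3 | 3 | 3 | 2 |
| rpoA | 73250 | 58 | 1020 | 0 | 0 | 15 | 15 | 15 | 0 |
| rps11 | 74334 | 58 | 432 | 0 | 0 | 4 | 4 | 4 | 1 |
| rpl36 | 74953 | 58 | 114 | 0 | 0 | 1 | 1 | 1 | 1 |
| infA | 75173 | 58 | 324 | 0 | 0 | 5 | 5 | 5 | 0 |
| rps8 | 75577 | 58 | 411 | 0 | 0 | 4 | 4 | 4 | 0 |
| rpl14 | 76129 | 58 | 372 | 0 | 0 | 3 | 3 | 3 | 1 |
| rpl16 | 76587 | 58 | 411 | 0 | 0 | 4 | 4 | 4 | 0 |
| rps3 | 78193 | 58 | 720 | 0 | 0 | 12 | 11 | 12 | 0 |
| rpl22 | 78970 | 58 | 450 | 0 | 0 | 6 | 5 | 6 | 0 |
| rps19 | 79493 | 58 | 282 | 1 | 0 | 2 | 2 | 3 | 3 |
| rpl2 | 80038 | 58 | 792 | 0 | 431 | 1 | 1 | 1 | 1 |
| rpl23 | 81540 | 58 | 282 | 0 | 0 | 1 | 1 | 1 | 0 |
| ndhB | 85219 | 58 | 1533 | 0 | 0 | 3 | 3 | 3 | 3 |
| rps7 | 87762 | 56 | 471 | 0 | 0 | 2 | 2 | 2 | 1 |
| rps15 | 100385 | 58 | 273 | 0 | 0 | 1 | 1 | 1 | 0 |
| ndhF | 101014 | 57 | 2225 | 0 | 15 | 59 | 57 | 59 | 1 |
| rpl32 | 104086 | 58 | 181 | 0 | 12 | 9 | 9 | 9 | 0 |
| ccsA | 105139 | 58 | 969 | 0 | 0 | 19 | 18 | 19 | 1 |
| ndhD | 106272 | 58 | 1503 | 0 | 0 | 17 | 16 | 17 | 0 |
| psaC | 107894 | 58 | 246 | 0 | 0 | 5 | 5 | 5 | 0 |
| ndhE | 108586 | 57 | 306 | 27 | 0 | 4 | 4 | 31 | 0 |
| ndhG | 109100 | 58 | 531 | 0 | 0 | 6 | 6 | 6 | 0 |
| ndhI | 109881 | 58 | 543 | 0 | 0 | 9 | 8 | 9 | 0 |
| ndhA | 110517 | 58 | 1089 | 0 | 0 | 15 | 14 | 15 | 1 |
| ndhH | 112643 | 58 | 1182 | 0 | 0 | 16 | 16 | 16 | 0 |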


(c). Rearrangements reported in rpl23 and rps19 gene copies in several plastomes of grasses.

| Species | Accession | Chloroplast length (bp) | rpl23: functional gene copies annotated in chloroplast | rpl23: annotation (pseudogene or functional gene) in "the insert" | rps19: copies annotated | rps19: annotation of copies |
|---|---|---|---|---|---|---|
| Acidosasa purpurea | NC_015820 | 139,697 | 2 | pseudogene not annotated | 2 | |
| Agrostis stolonifera | NC_008591 | 136,584 | 3 rpl23 copies | 56,532-56,816; rbcL-rpl23-psaI; YP_874745 | 2 | |
| Anomochloa marantoidea | NC_014062 | 138,412 | 2 | pseudogene rpl23 absent | not annotated | not annotated |
| Bambusa emeiensis | NC_015830 | 139,493 | 2 | pseudogene not annotated | 2 | |
| Bambusa oldhamii | NC_012927 | 139,350 | 2 | pseudogene not annotated | 2 | |
| Brachypodium distachyon Bd21 | NC_011032 | 135,199 | 2 | pseudogene rpl23 absent | 2 | |
| Brachypodium stacei ABR114 | current study | 136,330 | 2+1 pseudo | 56,338-56,565; rbcL - rpl23 pseudogene - psaI (Insert 56,336-57,496) | 1 | 80,961-81,242 |
| Brachypodium hybridum ABR113 | current study | 136,327 | 2+1 pseudo | 56,337-56,564; rbcL - rpl23 pseudogene - psaI (Insert 56,335-57,495) | 1 | 80,958-81,239 |
| Coix lacryma-jobi | NC_013273 | 140,745 | 2+1 pseudo | 58,900-59,163; rbcL - rpl23 pseudogene - accD pseudogene - psaI | 2 | |
| Dendrocalamus latiflorus | NC_013088 | 139,394 | 2 | pseudogene not annotated | 2 | |
| Ferrocalamus rimosivaginus | NC_015831 | 139,467 | 2 | pseudogene not annotated | 2 | |
| Festuca arundinacea | NC_011713 | 136,048 | 2 | pseudogene not annotated | 2 | |
| Hordeum vulgare subsp. vulgare cultivar Morex | EF115541 | 136,462 | 2+1 | 56,648-56,925; similar to rpl23; rbcL-rpl23 misc_feature-psaI | 2 | |
| Indocalamus longiauritus | NC_015803 | 139,668 | 2 | pseudogene not annotated | 2 | |
| Lolium perenne | NC_009950 | 135,282 | 2 | pseudogene not annotated | 2 | |
| Oryza sativa (indica cultivar-group) isolate 93-11 | AY522329 | 134,496 | 2 | pseudogene not annotated | 2 | 134,175-134,456; "similar to ribosomal protein S19" |
| Oryza sativa (japonica cultivar-group) isolate PA64S | AY522331 | 134,551 | 2 | pseudogene not annotated | 2 | 80,649-80,930; 134,226-134,507; "similar to ribosomal protein S19" |
| Panicum virgatum | NC_015990 | 139,619 | 2 | pseudogene not annotated | 2 | |
| Phyllostachys edulis | NC_015817 | 139,679 | 2 | pseudogene not annotated | 2 | |
| Phyllostachys nigra var. henonis | NC_015826 | 139,839 | 2 | pseudogene not annotated | 2 | |
| Saccharum hybrid cultivar NCo 310 | NC_006084 | 141,182 | 2+1 | 59,179-59,421; rbcL-rpl23 pseudogene-psaI | 2 | |
| Sorghum bicolor | NC_008602 | 140,754 | 3 rpl23 copies | 59,411-59,701; rbcL-rpl23-psaI; YP_899416 | 2 | |
| Triticum aestivum | NC_002762 | 134,545 | 2+1 | 55,636-56,919; rbcL-rpl23 pseudogene-psaI | 2 | |
| Zea mays | X86563 | 140,384 | 2 | pseudogene not annotated | 2 | |


(d). rpl23 pseudogene hits obtained from Blastx searches of the B. stacei/B. hybridum 1.1 kbp insert against annotated plastomes of several grasses.

| Description | Max. score | Total score | Query cover | E value | Ident. | Accession |
|---|---|---|---|---|---|---|
| ribosomal protein L23 [Bambusa oldhamii] | 146 | 146 | 19% | 3.00E-39 | 93% | YP_003029781.1 |
| ribosomal protein L23 [Aristida purpurea] | 144 | 144 | 19% | 1.00E-38 | 92% | YP_009072631.1 |
| ribosomal protein L23 [Greslania sp. McPherson 19217] | 144 | 144 | 19% | 2.00E-38 | 93% | YP_009135152.1 |
| ribosomal protein L23 [Agrostis stolonifera] | 144 | 144 | 19% | 2.00E-38 | 93% | YP_874779.1 |
| ribosomal protein L23 [Hordeum vulgare subsp. vulgare] | 144 | 144 | 19% | 3.00E-38 | 93% | AGP50796.1 |
| ribosomal protein L23 [Zea mays] | 143 | 143 | 19% | 4.00E-38 | 92% | NP_043068.1 |
| ribosomal protein L23 [Oryza sativa Japonica Group] | 143 | 143 | 19% | 4.00E-38 | 92% | NP_039429.1 |
| ribosomal protein L23 [Olyra latifolia] | 143 | 143 | 19% | 4.00E-38 | 92% | YP_009033485.1 |
| Putative ribosomal protein L23 from chromosome 10 chloroplast insertion [Oryza sativa Japonica Group] | 143 | 143 | 19% | 5.00E-38 | 92% | AAM08579.1 |
| ribosomal protein L23 [Oryza sativa Indica Group] | 144 | 144 | 19% | 7.00E-38 | 92% | AER12861.1 |


(e). Polymorphisms detected among the assembled plastomes of the B. stacei and B. hybridum accessions.

B. stacei B. hybridum B. hybridum B. hybridum ABR114 ABR113 BdTR6G Pop1 Positio Positio Positio Positio Mutatio Mutation Mutation Mutation n n n n n 1,552 indel T 1,552 indel T - - substitution - - - - 7,697 - - G (T) substitution (psbT gene - - - - - 70,902 synonymous - - ) T (G)

106,20 substitution ------2 A (T) 112,97 substitutio ------1 n T (G) substitution (rpl23 gene 134,46 – non – ------2 synonymous ) C (G)


Table S7. List of B. distachyon cpDNA haplotypes found across the 53 analyzed ecotypes’ plastomes.

| | Haplotypes (SNPs only, indels excluded) | Haplotypes (SNPs and indels) |
|---|---|---|
| Total # haplotypes | 32 | 36 |
| Unique haplotypes | 26 | 30 |
| H1 | 13 (Adi10; Adi12; Bd21-3; Bd2-3; Bd3-1; BdTR12C; BdTR1I; BdTR2B; BdTR2G; BdTR5I; BdTR9K; Kah1; Koz1) | 11 (Adi12; Bd21-3; Bd2-3; Bd3-1; BdTR12C; BdTR1I; BdTR2B; BdTR2G; BdTR9K; Kah1; Koz1) |
| H2 | 2 (BdTR10C; Kah5) | 2 (BdTR10C; Kah5) |
| H3 | 3 (BdTR11A; BdTR11G; BdTR11I) | 3 (BdTR11A; BdTR11G; BdTR11I) |
| H4 | 4 (BdTR13A; BdTR13C; BdTR3C; Bis1) | 2 (BdTR13A; Bis1) |
| H5 | 3 (BdTR8I; Tek2; Tek4) | 3 (BdTR8I; Tek2; Tek4) |
| H6 | 2 (Foz1; Sig2) | 2 (Foz1; Sig2) |
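The two columns of Table S7 differ only in whether alignment columns containing indels are taken into account. The following Python sketch illustrates that idea on a toy alignment; it is not the procedure used in the study, the function name `collapse_haplotypes` and the toy sequences are hypothetical, and missing data (Ns) are not treated specially.

```python
from collections import defaultdict


def collapse_haplotypes(alignment, include_indels=True):
    """Group aligned sequences into haplotypes (sets of identical sequences).

    alignment: dict mapping accession name -> aligned sequence (equal lengths).
    include_indels: if False, every alignment column that contains a gap ("-")
    in at least one sequence is ignored before sequences are compared.
    Returns a list of name lists, largest haplotype first.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")

    if include_indels:
        columns = range(length)
    else:
        columns = [i for i in range(length)
                   if all(alignment[name][i] != "-" for name in names)]

    groups = defaultdict(list)
    for name in names:
        key = "".join(alignment[name][i] for i in columns)
        groups[key].append(name)
    return sorted(groups.values(), key=lambda group: (-len(group), group))


# Toy alignment (illustrative only, not data from the study).
toy = {
    "acc1": "ACGT-A",
    "acc2": "ACGTTA",
    "acc3": "ACGT-A",
    "acc4": "ATGTTA",
}
with_indels = collapse_haplotypes(toy, include_indels=True)
snps_only = collapse_haplotypes(toy, include_indels=False)
```

In this toy case the indel column splits acc2 from acc1 and acc3, so counting indels yields more haplotypes than using SNPs alone, which is the same direction of difference as in the table.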


Table S8. Flowering time groups classified according to Ream et al. (2014)

| Groups | Flowering time (days) | Photoperiod requirements (hours) | Weeks vernalization (wV) |
|---|---|---|---|
| Extremely Rapid Flowering (ERF) | < 30 | 20 | NV* |
| Rapid Flowering (RF) | 30-35 | 20 | NV |
| Intermediate Rapid Flowering (IRF) | 50-60 | 20 | NV |
| Intermediate Delayed Flowering (IDF) | 50 | 20 | 2-4 |
| Delayed Flowering (DF) | 20-30 | 20 | 6-8 |
| Extremely Delayed Flowering (EDF) | 60 | 20 | 10 |

*NV = No vernalization


Table S9. Percentages of membership of 53 B. distachyon lines’ plastome profiles to optimal K= 2 and K= 4 Bayesian genomic groups inferred with Structure v.2.3.2. * introgressed ecotypes.

Bayesian genomic groups

K2 K4 Phylogenetic Ecotypes groups Group 1 Group 2 Group 1 Group 2 Group 3 Group 4

Bd29-1 1 0 0 0.999 0 0

Bd1-1 1 0 0 0.999 0 0

BdTR13A 1 0 0 0.999 0 0

BdTR13C 1 0 0 0.999 0 0

BdTR3C 1 0 0 0.999 0 0

Bis1 1 0 0 0.999 0 0

EDF+ BdTR7A 0.999 0.001 0.001 0.999 0 0

BdTR11A 1 0 0 0.999 0 0

BdTR11G 1 0 0 0.999 0 0

BdTR11I 1 0 0 0.999 0 0

BdTR8I 1 0 0 0.999 0 0

Tek2 1 0 0 0.999 0 0

Tek4 1 0 0 0.999 0 0

Arn1* 0.561 0.439 0.001 0.001 0.997 0.001

Mon3* 0.569 0.431 0 0 0.999 0

ABR8 0.001 0.999 0.996 0 0 0.003 S+ Jer1 0.001 0.999 0.997 0 0 0.002

ABR2 0.001 0.999 0.996 0 0.001 0.003

S8iiC 0.001 0.999 0.996 0 0 0.003

38

ABR6 0 1 0.998 0 0 0.002

RON4 0 1 0.998 0 0 0.001

ABR5 0 1 0.998 0 0 0.001

Mur1 0 1 0.998 0 0 0.001

Per1 0 1 0.998 0 0 0.001

Foz1 0 1 0.998 0 0 0.001

Sig2 0 1 0.998 0 0 0.001

Mig3 0 1 0.994 0 0 0.005

ABR4 0 1 0.996 0 0 0.003

ABR7 0 1 0.996 0 0 0.003

Bd30-1 0 1 0.996 0 0 0.003

Luc1 0 1 0.996 0 0 0.004

Bd21C 0 1 0.002 0 0 0.997

ABR3 0 1 0.002 0 0 0.997

Uni2 0 1 0.002 0 0 0.997

Bd18-1 0 1 0.002 0 0 0.997

Gaz8 0 1 0.001 0 0 0.998

Koz3 0 1 0.001 0 0 0.998

Adi10 0 1 0.001 0 0 0.998 T+ Adi12 0 1 0.001 0 0 0.998

Bd21-3 0 1 0.001 0 0 0.998

Bd2-3 0 1 0.001 0 0 0.998

Bd3-1 0 1 0.001 0 0 0.999

BdTR12C 0 1 0.001 0 0 0.998

BdTR1I 0 1 0.001 0 0 0.998

BdTR2B 0 1 0.001 0 0 0.998

39

BdTR2G 0 1 0.001 0 0 0.998

BdTR5I 0 1 0.001 0 0 0.998

BdTR9K 0 1 0.001 0 0 0.998

Kah1 0 1 0.001 0 0 0.998

Koz1 0 1 0.001 0 0 0.998

Adi2 0 1 0.001 0 0 0.998

BdTR10C 0 1 0.001 0 0 0.999

Kah5 0 1 0.001 0 0 0.998


Table S10. Pairwise Tamura-Nei raw and patristic genetic distances between Brachypodium [B. distachyon (ABR6), B. stacei (ABR114), B. hybridum (ABR113)] and 91 grass plastomes (Table S2). Patristic distances were calculated in the best ML tree (Fig. S5a).

Columns (left to right): patristic Tamura-Nei distances to B. distachyon (ABR6), B. stacei (ABR114) and B. hybridum (ABR113), followed by raw Tamura-Nei distances to the same three plastomes.

Anomochlooideae Anomochloa marantoidea 0.211 0.211 0.211 0.079 0.079 0.079

Pharus lappulaceus 0.156 0.156 0.156 0.059 0.059 0.060 Pharus latifolius 0.157 0.157 0.158 0.060 0.060 0.061

Puelioideae Puelia olyriformis 0.102 0.102 0.102 0.040 0.040 0.041 Lecomtella madagascariensis 0.133 0.133 0.133 0.050 0.050 0.051 Panicum virgatum 0.143 0.143 0.144 0.054 0.054 0.055

Zea mays 0.148 0.148 0.148 0.056 0.056 0.057 Panicoideae Coix lacryma-jobi 0.148 0.148 0.148 0.056 0.056 0.056 PACMAD Sorghum bicolor 0.145 0.145 0.145 0.055 0.055 0.055 Saccharum hybrid 0.142 0.142 0.142 0.054 0.054 0.054 Rhynchoryza subulata 0.119 0.119 0.120 0.047 0.047 0.047 Leersia tisserantii 0.139 0.139 0.139 0.054 0.054 0.054 Ehrhartoideae Oryza rufipogon 0.133 0.133 0.133 0.051 0.051 0.051 Oryza sativa 0.133 0.133 0.134 0.051 0.051 0.051 Oryza nivara 0.134 0.134 0.134 0.051 0.051 0.052 Olyra latifolia 0.118 0.118 0.118 0.047 0.047 0.048 Dendrocalamus latiflorus 0.095 0.095 0.096 0.038 0.038 0.039

Bambusa emeiensis 0.094 0.094 0.095 0.038 0.038 0.038 Bambusa oldhamii 0.097 0.097 0.097 0.039 0.039 0.039

Ampelocalamus calcareus 0.099 0.099 0.099 0.039 0.039 0.040 BEP (BOP) Gaoligongshania megalothyrsa 0.098 0.098 0.098 0.039 0.039 0.039

Ferrocalamus rimosivaginus 0.096 0.096 0.096 0.038 0.038 0.038 Gelidocalamus tessellatus 0.096 0.096 0.096 0.038 0.038 0.038 Bambusoideae Arundinaria gigantea 0.097 0.097 0.097 0.038 0.038 0.038 Arundinaria appalachiana 0.096 0.096 0.096 0.038 0.038 0.038 Arundinaria tecta 0.097 0.097 0.097 0.038 0.038 0.039 Acidosasa purpurea 0.095 0.095 0.096 0.038 0.038 0.038 Pleioblastus maculatus 0.095 0.095 0.096 0.038 0.038 0.038


Indosasa sinica 0.095 0.095 0.096 0.038 0.038 0.038 Oligostachyum shiuyingianum 0.095 0.095 0.096 0.038 0.038 0.038 Indocalamus wilsonii 0.096 0.096 0.097 0.038 0.038 0.039 Chimonocalamus longiusculus 0.096 0.096 0.097 0.038 0.038 0.039 Thamnocalamus spathiflorus 0.096 0.096 0.096 0.038 0.038 0.039 Sarocalamus faberi 0.095 0.095 0.095 0.037 0.037 0.038 Fargesia yunnanensis 0.095 0.095 0.095 0.037 0.037 0.038 Indocalamus longiauritus 0.095 0.095 0.095 0.038 0.038 0.038 Yushania levigata 0.095 0.095 0.095 0.038 0.038 0.038 Fargesia nitida 0.095 0.095 0.095 0.038 0.038 0.038 Fargesia spathacea 0.095 0.095 0.095 0.038 0.038 0.038 Phyllostachys propinqua 0.095 0.095 0.096 0.038 0.038 0.038 Phyllostachys edulis 0.095 0.095 0.095 0.038 0.038 0.038 Phyllostachys nigra 0.095 0.095 0.095 0.038 0.038 0.038 Phyllostachys sulphurea 0.095 0.095 0.095 0.038 0.038 0.038 Brachyelytreae Brachyelytrum aristosum 0.104 0.104 0.105 0.041 0.041 0.041

Phaenospermateae Phaenosperma globosum 0.077 0.077 0.077 0.031 0.031 0.032

Stipa hymenoides 0.078 0.078 0.078 0.031 0.031 0.032 Stipeae Piptochaetium avenaceum 0.074 0.074 0.074 0.030 0.030 0.030

Ampelodesmeae Ampelodesmos mauritanicus 0.070 0.070 0.071 0.028 0.028 0.029 Stipeae Oryzopsis asperifolia 0.070 0.070 0.071 0.028 0.028 0.029

Melica mutica 0.095 0.095 0.095 0.037 0.037 0.037 Meliceae Melica subulata 0.097 0.097 0.097 0.037 0.037 0.038

Diarheneae Diarrhena obovata 0.055 0.055 0.055 0.023 0.023 0.023 Avena sativa 0.099 0.099 0.099 0.039 0.039 0.039 Trisetum cernuum 0.095 0.095 0.095 0.037 0.037 0.037 Phalaris arundinacea 0.089 0.089 0.089 0.035 0.035 0.035 Torreyochloa pallida 0.087 0.087 0.087 0.035 0.035 0.035 Anthoxanthum odoratum 0.105 0.105 0.105 0.041 0.041 0.041 Hierochloe odorata 0.088 0.088 0.088 0.035 0.035 0.035 Briza maxima 0.098 0.098 0.099 0.039 0.039 0.039

Agrostis stolonifera 0.092 0.092 0.092 0.036 0.036 0.036
Ammophila breviligulata 0.085 0.085 0.085 0.034 0.034 0.034
Puccinellia nuttalliana 0.095 0.095 0.095 0.037 0.037 0.038
Aveneae + Poeae Phleum alpinum 0.090 0.090 0.091 0.036 0.036 0.036
Poa palustris 0.095 0.095 0.095 0.037 0.037 0.037
Helictochloa hookeri 0.099 0.099 0.099 0.039 0.039 0.039
Dactylis glomerata 0.098 0.098 0.099 0.039 0.039 0.039
Deschampsia antarctica 0.092 0.092 0.093 0.037 0.037 0.037

Festuca ovina 0.100 0.100 0.100 0.039 0.039 0.039 Festuca altissima 0.092 0.092 0.092 0.036 0.036 0.036

Pooideae Festuca arundinacea 0.101 0.101 0.102 0.039 0.039 0.040


Festuca pratensis 0.103 0.103 0.103 0.040 0.040 0.041 Lolium multiflorum 0.104 0.104 0.104 0.040 0.040 0.041 Lolium perenne 0.104 0.104 0.104 0.040 0.040 0.041 Bromus vulgaris 0.092 0.092 0.092 0.036 0.036 0.036 Hordeum jubatum 0.092 0.092 0.092 0.035 0.035 0.036 Hordeum vulgare 0.094 0.094 0.094 0.036 0.036 0.037 Secale cereale 0.090 0.090 0.090 0.035 0.035 0.036 Triticum monococcum 0.090 0.089 0.090 0.035 0.035 0.035 Triticum urartu 0.089 0.089 0.089 0.035 0.035 0.035

Aegilops tauschii 0.089 0.089 0.090 0.035 0.035 0.035
Aegilops cylindrica 0.089 0.089 0.090 0.035 0.035 0.035
Aegilops geniculata 0.089 0.089 0.090 0.035 0.035 0.036
Aegilops bicornis 0.089 0.089 0.089 0.035 0.035 0.035
Aegilops kotschyi 0.089 0.089 0.089 0.035 0.035 0.035
Bromeae + Triticeae Aegilops sharonensis 0.089 0.089 0.089 0.035 0.035 0.035
Aegilops longissima 0.089 0.089 0.089 0.035 0.035 0.035
Aegilops searsii 0.089 0.089 0.089 0.035 0.035 0.035
Aegilops speltoides 0.089 0.089 0.089 0.035 0.035 0.035
Triticum timopheevii 0.089 0.088 0.089 0.035 0.035 0.035
Triticum turgidum 0.089 0.089 0.089 0.035 0.035 0.035
Triticum aestivum 0.089 0.089 0.089 0.035 0.035 0.035
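A patristic distance between two tips is the sum of the branch lengths along the path that connects them in a tree. The Python sketch below shows that computation on a toy tree; the tree encoding (child mapped to parent and branch length), the function names and the branch lengths are assumptions made for illustration and are unrelated to the ML tree of Fig. S5a.

```python
def path_to_root(tree, node):
    """Map each ancestor of `node` (and `node` itself) to its cumulative
    branch-length distance from `node`.

    tree: dict mapping child -> (parent, branch_length); the root has no entry.
    """
    distances = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def patristic_distance(tree, tip_a, tip_b):
    """Sum of branch lengths on the path between two tips.

    With non-negative branch lengths, the shared ancestor that minimises the
    summed distance is the most recent common ancestor.
    """
    up_a = path_to_root(tree, tip_a)
    up_b = path_to_root(tree, tip_b)
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


# Toy tree ((A:0.02,B:0.03)n1:0.01,C:0.05)root; values are illustrative only.
toy_tree = {
    "A": ("n1", 0.02),
    "B": ("n1", 0.03),
    "n1": ("root", 0.01),
    "C": ("root", 0.05),
}
d_ab = patristic_distance(toy_tree, "A", "B")  # path A -> n1 -> B
d_ac = patristic_distance(toy_tree, "A", "C")  # path A -> n1 -> root -> C
```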


SUPPLEMENTARY INFORMATION. METHODS S1

Detailed description of the plastome automated assembly pipeline

A pipeline for the automated assembly and annotation of plastomes was developed (Fig. S1). This workflow employs a large set of bioinformatics software packages (Table S1). First, DUK (http://duk.sourceforge.net) is used to extract putative chloroplast reads from WGS reads. The next steps involve quality control and filtering of the raw sequencing reads using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and Trimmomatic (Bolger et al., 2014), respectively. Substitution errors can optionally be corrected with Musket (Liu et al., 2013). These trimming and filtering steps result in paired and single reads, which can be managed using split_pairs (https://github.com/eead-csic-compbio/split_pairs). Further quality control can be performed by assessing the orientation and insert size of paired reads after mapping them to a reference genome with BWA (Li & Durbin, 2009). Contig assembly can be performed with the Velvet de novo assembler (Zerbino & Birney, 2008) or, for reference-guided assembly, with the Columbus module of Velvet (Zerbino, 2010), attempting to resolve inverted repeats (IRs). Scaffolds are constructed using SSPACE (Boetzer et al., 2011), and GapFiller (Boetzer & Pirovano, 2012; Nadalin et al., 2012) is used to fill their gaps using all available PE and reverse-complemented MP reads. Potential overlaps among scaffold ends are confirmed with custom Perl scripts and BLASTN (Camacho, 2013).
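The read-extraction step described above can be illustrated with a minimal sketch of k-mer based read selection. This is not DUK's implementation or interface; the function names, the default k of 21 and the min_hits threshold are hypothetical choices made only to show the general idea of keeping reads that share k-mers with reference plastomes on either strand.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers found in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def select_plastid_reads(reads, references, k=21, min_hits=1):
    """Keep reads that share at least min_hits distinct k-mers with any
    reference sequence, considering both strands of the references.

    k and min_hits are illustrative defaults, not values from the paper.
    """
    ref_kmers = set()
    for ref in references:
        ref_kmers |= kmers(ref, k)
        ref_kmers |= kmers(reverse_complement(ref), k)

    kept = []
    for read in reads:
        hits = sum(1 for km in kmers(read, k) if km in ref_kmers)
        if hits >= min_hits:
            kept.append(read)
    return kept


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGGATCG"  # toy sequence, not a real plastome
    reads = ["TTGCAAGG", reverse_complement("GGCTTAACC"), "TTTTTTTTTT"]
    print(select_plastid_reads(reads, [reference], k=5))
```

Indexing both strands of the references means reads are matched regardless of the orientation in which they were sequenced; a production tool would also need to handle ambiguous bases and memory-efficient k-mer storage.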

Assembly and annotation of Brachypodium plastomes

Several Pooideae plastomes (Table S2) were used by DUK to infer the background chloroplast k-mer distributions. Pipeline parameters, such as the number of input reads or the k-mer length, were estimated beforehand with VelvetOptimiser (http://www.vicbioinformatics.com/software.velvetoptimiser.shtml). The assembly was guided by the reference Bd21 plastome. Some scaffolds were manually merged after checking for overlaps of length k - 1, where k is the Velvet k-mer length. Finally, any remaining errors were detected and corrected with the help of SEQuel (Ronen et al., 2012) and by visual inspection of the original sequence reads mapped onto the assembled scaffolds using IGV (Thorvaldsdóttir et al., 2013).
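The overlap check used for merging scaffolds can be sketched as follows. This is a simplified illustration, not the scripts used in the study: the function name is hypothetical, and it assumes an exact match between the last k - 1 bases of one scaffold and the first k - 1 bases of the next, both given in the same orientation.

```python
def merge_if_overlapping(left, right, k):
    """Merge two scaffolds if the last k-1 bases of `left` are identical to
    the first k-1 bases of `right`; return None when they do not overlap.

    k is the assembler k-mer length; exact, same-strand matching is an
    assumption of this sketch.
    """
    overlap = k - 1
    if overlap <= 0 or len(left) < overlap or len(right) < overlap:
        return None
    if left[-overlap:].upper() != right[:overlap].upper():
        return None
    # Keep the shared k-1 bases only once in the merged sequence.
    return left + right[overlap:]


if __name__ == "__main__":
    print(merge_if_overlapping("ACGTACGGA", "CGGATTTC", k=5))
    print(merge_if_overlapping("ACGTACGGA", "TTTTCCCC", k=5))
```

Returning None rather than raising an error makes it easy to loop over candidate scaffold pairs and keep only those that can be joined.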

Protein-coding genes and transfer RNAs in the B. distachyon (ABR6 ecotype), B. stacei (ABR114 ecotype) and B. hybridum (ABR113 ecotype) plastomes were identified and annotated using cpGAVAS (Liu et al., 2012) and BLAST (Camacho, 2013), with extensive manual curation. These annotations were then exported and adapted to the remaining genome assemblies with the Perl script _annot_fasta_from_gbk.pl, documented at https://github.com/eead-csic-compbio/chloroplast_assembly_protocol.

All protein-coding and tRNA genes were further aligned and validated by comparison with the B. distachyon Bd21 reference plastome (NC_011032.1). A circular gene map of the chloroplast genome was generated using OrganellarGenomeDRAW (Lohse et al., 2013). The similarities and differences among all assembled genomes were analyzed using the script _check_matrix.pl (https://github.com/eead-csic-compbio/chloroplast_assembly_protocol) and illustrated with Circos (Krzywinski et al., 2009), using the B. distachyon ABR6 line as reference and “window size = 100” (Fig. 2).
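One way to summarise differences against a reference in fixed-size windows is sketched below. This is only an illustration of a windowed comparison with a window of 100 positions; it is not the logic of _check_matrix.pl or of Circos, and it assumes two sequences that have already been aligned to equal length, with gaps written as "-" and counted as differences.

```python
def differences_per_window(ref_aln, query_aln, window=100):
    """Count mismatching alignment columns in consecutive, non-overlapping
    windows of `window` positions. The last window may be shorter.

    Both inputs must be pre-aligned strings of equal length; a gap ('-')
    opposite a base counts as a difference in this sketch.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")
    if window <= 0:
        raise ValueError("window must be a positive integer")

    counts = []
    for start in range(0, len(ref_aln), window):
        ref_chunk = ref_aln[start:start + window].upper()
        query_chunk = query_aln[start:start + window].upper()
        counts.append(sum(1 for a, b in zip(ref_chunk, query_chunk) if a != b))
    return counts


if __name__ == "__main__":
    reference = "A" * 250  # toy alignment, not real plastome data
    query = list(reference)
    for position in (5, 150, 151, 249):
        query[position] = "G"
    print(differences_per_window(reference, "".join(query), window=100))
```

Per-window counts of this kind can then be plotted as a track along the reference coordinates.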

Validation of plastid assemblies by PCR and Sanger sequencing

Junctions between the IR-LSC, LSC-IR, IR-SSC and SSC-IR regions of the assembled B. stacei ABR114 and B. hybridum ABR113 plastomes were amplified and sequenced. In addition, the deletion of one rps19 copy (180 bp) at the junction between LSC and IR in the B. stacei and B. hybridum lines (see Results) was confirmed by PCR amplification, gel electrophoresis and Sanger sequencing in all B. hybridum and B. stacei lines and in B. distachyon Bd21 (Fig. S2). Primers (Table S13) were designed using Geneious (Kearse et al., 2012).

Each 25 µL PCR reaction contained 2.5 µL of KAPA Taq buffer A with MgCl2, 2.5 µL of MgCl2, 0.63 µL of dNTPs, 0.25 µL of KAPA Taq DNA Polymerase, 0.5 µL of each primer (10 µM), 17.12 µL of Milli-Q water and 1 µL of template DNA. PCR conditions were 3 min at 94°C, followed by 30 cycles of 94°C for 30 s, 65°C for 45 s and 72°C for 1 min, and a final step of 7 min at 72°C.
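As a quick arithmetic check of the reaction described above, the listed component volumes add up to the stated 25 µL, and the programmed hold times total 4650 s (77.5 min), not counting temperature ramping. The sketch below encodes both sums; the variable and function names are hypothetical, and treating "each primer" as one forward and one reverse primer is an assumption.

```python
from decimal import Decimal

# Volumes (µL) per reaction as listed in the text. Splitting "each primer"
# into a forward and a reverse primer is an assumption of this sketch.
REACTION_UL = {
    "KAPA Taq buffer A with MgCl2": Decimal("2.5"),
    "MgCl2": Decimal("2.5"),
    "dNTPs": Decimal("0.63"),
    "KAPA Taq DNA Polymerase": Decimal("0.25"),
    "forward primer (10 uM)": Decimal("0.5"),
    "reverse primer (10 uM)": Decimal("0.5"),
    "Milli-Q water": Decimal("17.12"),
    "template DNA": Decimal("1"),
}

# Thermal profile from the text: (temperature in °C, hold time in seconds).
INITIAL_STEP = (94, 180)
CYCLE_STEPS = [(94, 30), (65, 45), (72, 60)]
N_CYCLES = 30
FINAL_STEP = (72, 420)


def total_volume_ul(mix):
    """Sum component volumes exactly (Decimal avoids float rounding)."""
    return sum(mix.values(), Decimal("0"))


def programmed_hold_seconds():
    """Total programmed hold time, excluding ramping between temperatures."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_STEP[1] + N_CYCLES * per_cycle + FINAL_STEP[1]


if __name__ == "__main__":
    print("Total volume (µL):", total_volume_ul(REACTION_UL))
    print("Programmed hold time (min):", programmed_hold_seconds() / 60)
```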


Supplementary references

Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. 2011. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27: 578–579.

Boetzer M, Pirovano W. 2012. Toward almost closed genomes with GapFiller. Genome Biology 13: R56.

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.

Camacho C. 2013. BLAST+.

Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, et al. 2012. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28: 1647–1649.

Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. 2009. Circos: An information aesthetic for comparative genomics. Genome Research 19: 1639–1645.

Li M, Copeland A, Han J. 2011. DUK - A Fast and Efficient Kmer Based Sequence Matching Tool.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.

Liu Y, Schröder J, Schmidt B. 2013. Musket: A multistage k-mer spectrum-based error corrector for Illumina sequence data. Bioinformatics 29: 308–315.

Liu C, Shi L, Zhu Y, Chen H, Zhang J, Lin X, Guan X. 2012. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences. BMC Genomics 13: 715.

Lohse M, Drechsel O, Kahlau S, Bock R. 2013. OrganellarGenomeDRAW: a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Research 41: 575–581.

Nadalin F, Vezzi F, Policriti A. 2012. GapFiller: a de novo assembly approach to fill the gap within paired reads. BMC Bioinformatics 13: S8.

Ronen R, Boucher C, Chitsaz H, Pevzner P. 2012. SEQuel: Improving the accuracy of genome assemblies. Bioinformatics 28: 188–196.

Thorvaldsdóttir H, Robinson JT, Mesirov JP. 2013. Integrative Genomics Viewer (IGV): High-performance genomics data visualization and exploration. Briefings in Bioinformatics 14: 178–192.

Zerbino DR. 2010. Using the Columbus extension to Velvet.

Zerbino DR, Birney E. 2008. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18: 821–829.
