Genome

Complete Plastome Sequences of Picea asperata Mast., P. crassifolia Kom. and Comparative Analyses with P. abies (L.) Karst. and P. morrisonicola Hayata

Journal: Genome

Manuscript ID gen-2018-0195.R1

Manuscript Type: Article

Date Submitted by the Author: 19-Mar-2019

Complete List of Authors: Ouyang, Fangqun; State Key Laboratory of Forest Genetics and Tree Breeding
Hu, Jiwen; Research Institute of Forestry, Chinese Academy of Forestry
Wang, Junchen; Northwest Agriculture & Forestry University
Ling, Juanjuan; Research Institute of Forestry, Chinese Academy of Forestry
Wang, Zhi; Research Institute of Forestry, Chinese Academy of Forestry
Wang, Nan; Research Institute of Forestry, Chinese Academy of Forestry
Ma, Jianwei; Research Institute of Forestry of Xiaolong Mountain
Zhang, Hanguo; State Key Laboratory of Tree Genetics and Breeding
Mao, Jianfeng; Beijing Forestry University
Wang, Junhui; Chinese Academy of Forestry

Keyword: Picea crassifolia, Picea asperata, plastome, ycf1, highly variable regions

Is the invited manuscript for consideration in a Special Issue?: Not applicable (regular submission)

https://mc06.manuscriptcentral.com/genome-pubs Page 1 of 41 Genome

1 Complete Plastome Sequences of Picea asperata Mast., P. crassifolia Kom. and

2 Comparative Analyses with P. abies (L.) Karst. and P. morrisonicola Hayata

3 Fangqun OuYang1, Jiwen Hu1, Junchen Wang12, Juanjuan Ling1, Zhi Wang1, Nan

4 Wang1, Jianwei Ma3, Hanguo Zhang4, Jian-Feng Mao5, Junhui Wang1*

5 1 State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree

6 Breeding and Cultivation of State Forestry Administration, Research Institute of

7 Forestry, Chinese Academy of Forestry, Beijing, PR China.

8 2 Northwest Agriculture & Forestry University, Xi’an, P. R. China.

9 3 Research Institute of Forestry of Xiaolong Mountain, Gansu Provincial Key

10 Laboratory of Secondary Forest Cultivation, Gansu, P. R. China.

11 4 State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University,

12 Harbin, People’s Republic of China

13 5 National Engineering Laboratory for Forest Tree Breeding, Key Laboratory for

14 Genetics and Breeding of Forest Trees and Ornamental of Ministry of

15 Education, College of Biological Science and Technology, Beijing Forestry

16 University, Beijing, 100083, PR China.

17 *Corresponding author: Junhui Wang, Dongxiaofu 1#, Xiangshan East Road, Haidian

18 District, Beijing, PR China, [email protected]

1 Abstract

2 Picea asperata and P. crassifolia have sympatric ranges and are closely related, but

3 the differences between these at the plastome level are unknown. To better

4 understand the patterns of variation among Picea plastomes, the complete plastomes

5 of P. asperata and P. crassifolia were sequenced. Then, the plastomes were compared

6 with the complete plastomes of P. abies and P. morrisonicola, which are closely and

7 distantly related to the focal species, respectively. We also used these sequences to

8 construct phylogenetic trees to determine the relationships among and between the

9 four species as well as additional taxa from Pinaceae and other gymnosperms.

10 Analysis of our sequencing data allowed us to identify 438 single nucleotide

11 polymorphism (SNP) point mutation events, 95 indel events, four inversion events,

12 and seven highly variable regions, including six gene spacer regions (psbJ-petA,

13 trnT-psaM, trnS-trnD, trnL-rps4, psaC-ccsA, and rps7-trnL) and one gene (ycf1). The

14 highly variable regions are appropriate targets for future use in the phylogenetic

15 reconstructions of closely related, sympatric Picea species as well as Pinaceae in

16 general.

17

18 Keywords Picea crassifolia, Picea asperata, plastome, ycf1, highly variable regions

19 Introduction

20 Spruces (Picea, Pinaceae) are important constituents of forests throughout the

21 Northern Hemisphere. The genus Picea comprises 34 species worldwide, and its

22 distribution and center of differentiation is in Asia (Farjon 2001). Most species occur

23 in China, including seven species endemic to the country (Farjon 2001; Fu et al.

24 1999). Picea species are generally morphologically similar and have incomplete

25 lineage sorting and interspecies introgression, which has resulted in a complicated

26 phylogeny and has caused researchers to experience problems with species

27 identification (Bouillé et al. 2011; Lockwood et al. 2013; Ran et al. 2006; Sullivan et

28 al. 2017). The significant topological incongruence among chloroplast DNA loci is

29 suggestive of recombination, but incongruence between Picea mitochondrial DNA

30 and chloroplast DNA is suggestive of introgression (Bouillé et al. 2011; Sullivan et

31 al. 2017). Growing evidence suggests that plastomes can be widely applied for

32 species and identification at both intra- and interspecific levels (Barrett et

33 al. 2016; Huang et al. 2014; Wu et al. 2013). Unlike most angiosperms,

34 plastid inheritance is predominantly paternal (Neale & Sederoff 1989). Unbroken

35 uniparental inheritance is one of the key assumptions for plant evolution studies (Wolfe &

36 Randle 2004). Compared to nuclear genomes, plastomes have lower mutation rates,

37 smaller genome sizes, and smaller effective population sizes, which could provide a

38 larger basis for comparative studies in Picea species (Birol et al. 2013; Nystedt et al.

39 2013; Sullivan et al. 2017; Wolfe et al. 1987).

40 Picea asperata Masters and P. crassifolia Komarov are endemic to China, with a

41 natural range in the eastern Qinghai-Tibetan Plateau (QTP). The two species are not

42 only important afforestation trees in northwestern China but also of great economic

43 and environmental significance. These species exhibit parapatric areas (Whittle and

44 Johnston 2002), but P. crassifolia extends to a higher northern latitude (Bi et al.

45 2016). Eckenwalder (2009) recognizes P. crassifolia as a variety of P. schrenkiana

46 Fisch. et Mey., which is not supported by molecular evidence in which P. crassifolia

47 was a member of the P. asperata complex (Lockwood et al. 2013). Picea asperata

48 and P. crassifolia are closely related and no derived chloroplast or mitochondrial

49 mutations are species-specific (Du et al. 2009). Nonetheless, the two species have

50 obvious phenotypic differences. Compared to P. crassifolia, the leaf apex of P.

51 asperata is acute and the leaf length/width ratio is greater (Bi et al. 2016). In addition,

52 unlike P. crassifolia, the winter buds of P. asperata are resinous (Fu et al. 1999).

53 Population bottlenecks during the late Pleistocene period may have reduced

54 population size, restricted interspecific gene flow, and further promoted the

55 divergence between P. asperata and P. crassifolia by accelerating the fixation of

56 different adaptive alleles in the diverging lineages (Bi et al. 2016; Räsänen and

57 Hendry 2010). A number of species have the ability to produce viable artificial

58 hybrids, including with parapatric and allopatric taxa (OECD 1999). Picea abies (L.)

59 Karst, native to northern and central Europe, can be easily hybridized with P.

60 asperata (Zhao et al. 2015) and P. crassifolia, as we previously verified in field trials

61 (Table S1). Crossability is generally high between species with close genetic

62 relationships (Eckenwalder and Press 2009); P. crassifolia, P. asperata, and P. abies

63 are genetically closely related, belonging to the same clade in the phylogenetic trees

64 estimated by chloroplast data (Lockwood et al. 2013; Ran et al. 2006; Sullivan et al.

65 2017). Phylogenetic comparisons of Picea species showed that the chloroplast DNA

66 and nuclear DNA trees were similar, suggesting that P. asperata and P. crassifolia are

67 closely related species. In contrast, significant topological differences were found in

68 the mitochondrial DNA tree, where P. abies was found to be closely related to P.

69 asperata and P. crassifolia (Ran et al. 2015). Previous work based on mitochondrial

70 DNA variation showed that most of the variation between P. asperata and P.

71 crassifolia is caused by variation within species (80%), with negligible variation

72 between species (Du et al. 2009). However, some differentiation of chloroplast DNA

73 sequences has been found between the two species. For example, one of nine

74 chlorotypes is restricted to P. asperata (Du et al. 2009). Sullivan et al. (2017)

75 analyzed the entire plastome of 65 accessions of Picea to test for deviations from

76 canonical plastome evolution, but no study has compared the complete plastome

77 sequences of the closely related P. asperata and P. crassifolia.

78 In this study, the complete plastomes of P. asperata and P. crassifolia were

79 sequenced and compared to the plastome sequences of P. abies (Birol et al. 2013) and

80 P. morrisonicola Hayata (Zou et al. 2013). Furthermore, using these plastome

81 sequences, we investigated the broader-scale phylogenetic relationships of Picea and

82 taxa from Pinaceae and other gymnosperms to verify the usefulness of plastome

83 sequences for studying phylogenetic relationships. This study aimed to (1) evaluate

84 the characteristics of the P. asperata and P. crassifolia plastomes, (2) identify the

85 variation in the plastomes of P. asperata, P. crassifolia, P. abies, and P.

86 morrisonicola, (3) better understand the patterns of variation of Picea plastomes, and (4)

87 determine the relationships among the four spruces as well as among these four

88 species and other Pinaceae and gymnosperm taxa.

89

90 Materials and Methods

91 Plant materials, DNA extraction, and DNA sequencing

92 On October 10, 2015, P. asperata and P. crassifolia needles were collected from

93 30-year-old trees growing at the improved tree seedling base at the Research

94 Institute of Forestry of Xiaolong Mountain, Tianshui, Gansu Province, China (34°07′

95 N, 105°24′ E). Total DNA was extracted from 1 g of fresh needles following the

96 cetyltrimethylammonium bromide (CTAB) method (Li et al. 2013). DNA fragments

97 were purified through agarose gel electrophoresis, and a 500-base pair (bp)-long

98 library was produced by NEBNext (New England Biolabs, Ipswich, Massachusetts,

99 USA) for sequencing analysis using a HiSeq 4000 platform (PE150).

100 Plastome assembly and annotation

101 Clean reads were generated by removing adapter sequences as well as reads with too

102 many (>10%) unknown base calls (N), low complexity, and low-quality bases (>50%

103 of the bases with a quality score <5). High quality paired-end reads were assembled

104 using SPAdes 3.6.1 (Bankevich et al. 2012) with the parameter kmer = 95. Contigs

105 from the plastome were then filtered using BLASTN (Altschul et al. 1997) and

106 aligned to the P. abies reference plastome (http://congenie.org/; Birol et al. 2013)

107 with Sequencher 4.10 (Gene Codes Corporation, Ann Arbor, Michigan, USA). To

108 further verify the contigs, Geneious 8.1 (Kearse et al. 2012) was used to map all reads

109 to the assembled plastome sequence. The consensus sequences were produced using

110 mapped reads in Geneious.
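The read-filtering thresholds stated above can be sketched as follows. This is a minimal illustration only, not the pipeline actually used: the function names `keep_read` and `keep_pair` are hypothetical, quality values are assumed to be already decoded to integer Phred scores, and adapter trimming and the low-complexity filter are not covered.

```python
def keep_read(seq, quals, max_n_frac=0.10, low_q=5, max_low_frac=0.50):
    """Return True if a read passes the N-content and low-quality filters.

    seq   : read sequence (string)
    quals : integer Phred scores, one per base (assumed pre-decoded)
    A read is rejected when more than 10% of its bases are N, or when
    more than 50% of its bases have a quality score below 5.
    """
    if not seq or len(seq) != len(quals):
        return False
    n_frac = seq.upper().count("N") / len(seq)
    low_frac = sum(q < low_q for q in quals) / len(quals)
    return n_frac <= max_n_frac and low_frac <= max_low_frac


def keep_pair(read1, read2):
    """Keep a read pair only if both mates pass (an assumption of this sketch)."""
    return keep_read(*read1) and keep_read(*read2)
```

Whether a pair is dropped when only one mate fails is not stated in the text; the sketch assumes that it is.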

111 The plastomes were annotated using the Dual Organellar GenoMe Annotator

112 (DOGMA) (Wyman et al. 2004). All annotations were manually checked to ensure

113 accurate identification of genes, especially for the genes unannotated by DOGMA,

114 including rps16, petB, and petD. All coding genes and the locations of RNA genes

115 were identified with BLASTX and BLASTN, respectively. The plastome maps of P.

116 asperata and P. crassifolia were generated using Organellar Genome DRAW

117 (http://ogdraw.mpimp-golm.mpg.de/index.shtml) (Lohse et al. 2013) and edited using

118 Adobe Illustrator CS5 (Adobe, San Jose, USA).

119 Comparative analysis of the complete plastomes of the four spruce species

120 The plastome sequences of P. abies (GenBank accession HF937082) and P.

121 morrisonicola (GenBank accession AB480556) were extracted from the National

122 Center for Biotechnology Information (NCBI) database. The plastome sequences of P.

123 asperata, P. crassifolia, P. abies, and P. morrisonicola were aligned using MAFFT

124 (Katoh and Standley 2013) and adjusted using Se-al software (Rambaut 2002).

125 Microstructural mutations and sequence polymorphisms among the four spruces were

126 analyzed using DnaSP 5.0 (Librado and Rozas 2009) employing the sliding window

127 method to screen highly variable regions. Plastomes were then analyzed using a 600

128 bp window length and a 25 bp step size. Microstructural mutations were analyzed and

129 classified into simple sequence repeat (SSR) and non-SSR indels and inversions. We

130 tested for structural rearrangements in de novo scaffolds of the complete plastomes of

131 four spruce species using the software package MUMmer v. 3.0 (Kurtz et al. 2004).
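For readers unfamiliar with the sliding-window screen, the idea can be sketched as below, using the 600 bp window and 25 bp step given above. This is a simplified illustration and not the DnaSP implementation: the function names are hypothetical, the input is assumed to be a list of equal-length, upper-case aligned sequences, and alignment columns containing gaps or ambiguous bases are simply skipped.

```python
from itertools import combinations


def window_pi(seqs, start, length):
    """Mean pairwise differences per site within one alignment window.

    Columns containing anything other than A, C, G or T are ignored
    (an assumption of this sketch).
    """
    window = [s[start:start + length] for s in seqs]
    cols = [c for c in zip(*window) if all(b in "ACGT" for b in c)]
    if not cols:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(col[i] != col[j] for col in cols for i, j in pairs)
    return diffs / (len(pairs) * len(cols))


def sliding_pi(seqs, window=600, step=25):
    """Return (window_start, pi) tuples along the alignment."""
    n = len(seqs[0])
    last_start = max(n - window, 0)
    return [(s, window_pi(seqs, s, window)) for s in range(0, last_start + 1, step)]
```

Windows whose value exceeds a chosen cut-off can then be reported as candidate highly variable regions.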

132 In order to better understand the patterns of variation of Picea plastomes, we also

133 compared sequence identity of the two new accessions (Picea asperata and P.

134 crassifolia) and the other three Picea plastomes extracted from the NCBI database, P.

135 sitchensis (GenBank accession EU998739), P. glauca (GenBank accession

136 KT634228), and P. jezoensis (GenBank accession KT337318) with P. abies as a

137 reference using mVISTA (Dubchak 2007).

138 Phylogenetic analysis

139 The 68 common conserved genes from the plastid genomes of 20 species from the

140 Pinaceae and 12 outgroup species (Table S3), from the Araucariaceae, Podocarpaceae,

141 Taxaceae, and Cupressaceae, were aligned using MAFFT and adjusted using Se-al

142 software (Rambaut 2002). We used these 32 species because at the time when we

143 sequenced Picea asperata and P. crassifolia the chloroplast sequences of these 32

144 species were already available. Two phylogenetic-inference methods, allowing for

145 different mutation rates for different genes and codon positions, were employed to

146 infer trees from these 68 concatenated regions. We fit general time reversible

147 (GTR+G) models to different genes and codon positions using the BIC criterion

148 implemented in PartitionFinder (Lanfear et al. 2012). Clade support (as a percentage)

149 was evaluated with 1,000 bootstrap replicates for both the maximum parsimony (MP)

150 and maximum likelihood (ML) methods. MP analysis was implemented in PAUP1.0

151 b10 (Simmons 2004) and ML analysis in RAxML 7.0.4 (Stamatakis 2006).

152

153 Results

154 High-throughput sequencing analysis, general features of the P. asperata and P.

155 crassifolia plastomes, and comparisons with P. abies and P. morrisonicola

156 Using an Illumina Hiseq 4000 (PE 150) system, the plastomes from Picea asperata

157 and P. crassifolia were sequenced, resulting in the generation of 14,912,912 and

158 8,506,474 total paired-end raw reads, respectively. Among these reads, 115,828 and

159 55,312 plastome reads were extracted with 140 X and 67 X coverage identified by

160 BLASTN against the plastome sequence of P. abies for the P. asperata (deposited in

161 GenBank: KY204451) and P. crassifolia plastomes (deposited in GenBank:

162 KY204450), respectively. The percentage of plastome reads of P. asperata was

163 0.78%, and the percentage of plastome reads of P. crassifolia was 0.65% of total raw

164 reads.
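The percentages and coverage values above can be checked with a short calculation. The sketch below assumes that coverage was derived as (number of plastome reads × 150 bp read length) / assembled plastome size, with the assembled sizes taken from this section (124,145 bp and 124,126 bp); this formula is our reading of the figures, not a method stated in the text, and the function names are hypothetical.

```python
READ_LEN = 150  # PE150 reads; each plastome read is assumed to contribute 150 bp


def plastome_fraction(plastome_reads, total_reads):
    """Percentage of raw reads assigned to the plastome."""
    return 100.0 * plastome_reads / total_reads


def mean_coverage(plastome_reads, genome_size, read_len=READ_LEN):
    """Approximate mean depth: sequenced plastome bases / plastome length."""
    return plastome_reads * read_len / genome_size


# (plastome reads, total raw reads, assembled plastome size in bp)
samples = {
    "P. asperata": (115_828, 14_912_912, 124_145),
    "P. crassifolia": (55_312, 8_506_474, 124_126),
}

for name, (pt_reads, total, size) in samples.items():
    pct = plastome_fraction(pt_reads, total)
    cov = mean_coverage(pt_reads, size)
    print(f"{name}: {pct:.2f}% plastome reads, ~{cov:.0f}X coverage")
```

Under these assumptions the calculation gives 0.78% and about 140X for P. asperata, and 0.65% and about 67X for P. crassifolia, in agreement with the reported values.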

165 The details of the P. asperata plastome assembly were as follows: the total

166 assembled genome size was 124,145 bp, which was divided over eight contigs with an

167 N50 of 20,773 bp and a maximum contig size of 39,232 bp. The details of the P.

168 crassifolia plastome assembly were as follows: the total assembled genome size was

169 124,126 bp, which was divided over 10 contigs with an N50 of 35,630 bp and a

170 maximum contig size of 39,556 bp.
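As a reminder of how the N50 statistic is defined, a minimal sketch follows. The individual contig lengths of the two assemblies are not listed here, so the example values in the usage note are purely hypothetical.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    summed assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

For hypothetical contigs of 50, 40, 30, 20, and 10 kb (150 kb in total), the two longest contigs together reach 90 kb, which exceeds half of the total, so the N50 is 40 kb.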

171 The complete plastomes of P. asperata and P. crassifolia (Fig. 1) were similar to

172 those of P. abies (124,084 bp) and P. morrisonicola (124,168 bp) (Table 1). The GC

173 content of the protein-coding regions of the P. asperata and P. crassifolia plastomes

174 was the same (38.71%), and this content was similar to that of P. abies (38.72%)

175 and P. morrisonicola (38.79%) (Table 1). The IR regions of the P. asperata and P.

176 crassifolia plastomes were highly reduced and exclusively contained repeated short

177 gene sequences of trnH-GUG (74 bp), trnI-CAU (73 bp), and ycf12 (101 bp) (Fig 1).

178 The plastomes of the four spruces displayed an equivalent number of encoded

179 genes, comprising 108 different functional genes (Fig. 1, Table S2) including 72

180 protein-coding genes, 32 tRNA genes, and four rRNA genes (16 S, 23 S, 5 S, and 4.5

181 S). Among the 108 genes, trnH-GUG, trnI-CAU, trnT-GGU, rps12, psbI and ycf12

182 were duplicated, existing as inverted repeat sequences. Eight protein-coding genes

183 and seven tRNA genes each presented one intron, whereas ycf3 contained two. In

184 addition, rps12 was identified as a trans-splicing gene. The 108 genes were classified

185 into three classes. The first class contained genes related to transcription and

186 translation, mainly encoding RNA polymerase subunits, rRNAs, ribosomal protein

187 products, and tRNAs. The second class contained genes related to photosynthesis,

188 particularly the Rubisco large subunit gene and components of the photosynthetic

189 electron transport chain. The third class contained genes related to the biosynthesis of

190 amino acids and fatty acids, protein translocation as well as some genes of unknown

191 function, including ycf2 and ycf12.

192 The aligned length of the four spruce plastomes was 124,517 bp and included 438 single

193 nucleotide polymorphisms (SNPs). The mean nucleotide diversity (π) was 0.0182.

194 Pairwise comparison of SNPs and π among the spruce plastomes revealed minimal

195 differences between P. asperata and P. crassifolia. Only seven SNPs were observed

196 and were located in psbI-trnE-UUC (3 SNPs), rpoC2 (1 SNP in coding sequence),

197 trnP-GGG-rpl32 (1 SNP), trnL-CAA-ycf2 (1 SNP), and trnV-UAC-trnH-GUG (1

198 SNP). Picea abies presented moderate differences from P. asperata and P. crassifolia

199 (69 and 71 SNPs, respectively). However, larger differences existed between P.

200 morrisonicola and the other three spruces, in which 396–408 SNPs were observed. It

201 is worth noting that we did not find structural rearrangements of P. asperata and P.

202 crassifolia compared with P. abies and P. morrisonicola after sequence alignment.

203 Sullivan et al. (2017) also found no evidence of rearrangements within scaffolds.

204 Divergence hotspots for the plastomes of P. asperata, P. crassifolia, P. abies, and

205 P. morrisonicola

206 The number of SNPs ranged from 0 to 16, and π ranged from 0 to 0.01361 for the

207 plastome sequences. Seven highly variable regions were identified employing π =

208 0.006 as the dividing value; these included six gene spacer regions (psbJ-petA,

209 trnT-psaM, trnS-trnD, trnL-rps4, psaC-ccsA, and rps7-trnL) and one gene (ycf1). In

210 addition, rps7-trnL and ycf1 presented the greatest variation (Fig. 2).

211 Numbers and patterns of SNP mutations

212 The patterns of the 438 SNP mutations are presented in Fig. 3. The probability of each

213 base variation was different. The numbers of C to T and G to A mutations were high

214 (145), whereas the numbers of C to G and G to C mutations were low (21). In the

215 plastomes of the four spruce species, 236 transitions and 202 transversions were

216 detected, and the transition-to-transversion ratio was 1.17:1.
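The classification behind these counts can be sketched as follows: a substitution between two purines (A, G) or two pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. The function names are hypothetical, and the sketch is an illustration rather than the software used in this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = {ref, alt}
    if ref == alt or not bases <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = bases <= PURINES or bases <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(n_transitions, n_transversions):
    """Number of transitions per transversion."""
    return n_transitions / n_transversions
```

With the counts reported above (236 transitions and 202 transversions), `ts_tv_ratio(236, 202)` is approximately 1.17 transitions per transversion.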

217 Among all of the identified base mutation events, 144, 35, and 258 occurred in

218 coding regions, introns, and intergenic regions, respectively. The 144 mutations

219 occurring in coding regions were located in 43 genes (Table S4), including 20

220 photosynthetic apparatus genes, seven ribosomal protein genes, four

221 transcription-related genes, three chlorophyll biosynthesis genes, and nine other

222 genes. The mutations were 30-fold more likely to occur in the ycf1 and ycf2 genes.

223 Notably, ycf1 and ycf2 together contain 48% of all coding sequence polymorphisms.

224 Only one base pair in rpoC2 showed a mutation between P. asperata and P.

225 crassifolia.

226 Numbers and forms of microstructural mutations

227 Various types of SSR loci are present in the genome, and different SSR loci display

228 different numbers of repetitions. Therefore, many indels may exist in the genome. In

229 this study, these indels are referred to as SSR indels, whereas other indels are referred

230 to as non-SSR indels.

231 Forty-eight SSR indels were detected among the four spruce plastomes, and

232 seven occurred in intronic regions. The largest SSR indel was 9 bp long. More

233 than half of the SSR indels (26, 54.17%) were single-base indels. The SSRs causing

234 indels were mainly single-base repeats (mainly A and T), whereas two-base repeat

235 indels only occurred three times. The indel located at the 3'-end of rps12 was adjacent

236 to two SSR repeats (repeating bases A and G) and was regarded as a single indel

237 event according to the principles of sequence alignment (Table S5).

238 Forty-seven non-SSR indels were detected in the four spruce plastomes. Six were

239 located in gene-coding regions, nine in intronic regions, and 32 in intergenic spacer

240 regions. The non-SSR indels occurred eight times in the gene encoding ycf1. The

241 sizes of the non-SSR indels ranged from 1 to 60 bp, and they were larger than the SSR

242 indels. The largest indel (60 bp) was located in ycf1 (Table S6).

243 Four inversion events were detected in the four spruce plastomes; these were

244 located in the psbA-trnK, trnH-trnT, trnI-trnF, and ycf3 introns. The sizes of the four

245 inversion fragments were 4, 3, 4, and 2 bp, respectively, for psbA-trnK, trnH-trnT,

246 trnI-trnF, and ycf3 introns, and the lengths of the repetitive sequences at both ends

247 were 13, 9, 19, and 2 bp, respectively (Table 2).

248 The direction of the four inversion events was analyzed further. Using P.

249 asperata as a reference (G), the inversions located in psbA-trnK and ycf3 occurred

250 only in P. morrisonicola; however, the inversions located in trnH-trnT and trnI-trnF

251 occurred in both P. abies and P. morrisonicola (Table 2).
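Small inversions of this kind, a short segment flanked by inverted repeats, can be recognized because the segment in one plastome is the reverse complement of the corresponding segment in the reference. A minimal sketch of that check is given below; the function names and the example sequences in the usage note are hypothetical and are not taken from Table 2.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_inversion(ref_segment, alt_segment):
    """True if alt_segment is the reverse complement of ref_segment.

    Palindromic segments, which equal their own reverse complement,
    are not counted because no change would be observable.
    """
    ref, alt = ref_segment.upper(), alt_segment.upper()
    return ref != alt and revcomp(ref) == alt


def has_inverted_repeat_flanks(left_flank, right_flank):
    """True if the two flanks form a perfect inverted repeat."""
    return revcomp(left_flank) == right_flank.upper()
```

For example, with a hypothetical 4 bp segment `AAAG` in the reference and `CTTT` in another species, `is_inversion("AAAG", "CTTT")` returns `True`.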

252 Comparison of plastome sequences with three published Picea plastomes

253 We also compared the two new accessions of Picea asperata and P. crassifolia with

254 the other three known Picea species (P. sitchensis, P. glauca, and P. jezoensis)

255 plastomes extracted from the NCBI. We found that spruce plastomes were highly

256 similar in pair-wise comparisons (Fig. S1). The structures of these plastomes were

257 generally conserved, and neither translocations nor inversions were detected among

258 the sequences. As expected, coding regions were more highly conserved than

259 noncoding regions. More concretely, most highly polymorphic regions were located

260 in intergenic regions. In addition to the six intergenic regions mentioned above, we

261 also found others (such as accD-psaL, ycf12-psbB, ycf2-trnL, and psbE-petL). These

262 regions may be undergoing more rapid nucleotide substitution at the species level,

263 which would indicate that molecular markers are important for phylogenetic analyses

264 and plant identification in Picea.

265 Reconstruction of phylogenetic relationships based on plastome genes

266 We performed phylogenetic analysis on 32 species whose plastomes were publicly

267 available. All phylogenetic analyses were performed using maximum likelihood (ML)

268 and parsimony (MP) methods on 68 shared plastome genes, protein-coding genes, and

269 conserved genes that were almost identical in five major clades: i) Pinaceae (including

270 Pinoideae, Piceoideae, Laricoideae, and Abietoideae), ii) Araucariaceae, iii)

271 Podocarpaceae, iv) Taxaceae, and v) Cupressaceae (Fig. 4, Fig. S2 and S3).

272 Relationships within most clades were strongly supported (>90%) with the notable

273 exception of Pinoideae and Cathaya argyrophylla, in which the nodes had less than

274 60% support. Pinaceae was sister to the other conifers, among which Taxaceae was

275 sister to Cupressaceae, and Araucariaceae was sister to Podocarpaceae. Piceoideae

276 was sister to Pinoideae, and these two together were sister to Laricoideae. Larix decidua

277 was found to have a closer relationship with Pseudotsuga sinensis (Laricoideae) than

278 to the Pinoideae and Piceoideae. With one exception, Cathaya argyrophylla

279 (Laricoideae) was moderately supported as a sister group to Pinoideae. Abies koreana,

280 Keteleeria davidiana, and Cedrus deodara in Abietoideae were found to be more

281 closely related to each other than to other Pinaceae species.

282 In the Piceoideae, P. asperata and P. crassifolia were found to be more closely

283 related to each other than to the seven other species included in the subfamily. Picea

284 abies was sister to P. asperata and P. crassifolia, with 100% support. The genetic

285 relationships of P. glauca and P. jezoensis could be collapsed into a polytomy

286 because the support was less than 80%, whereas P. morrisonicola and P. sitchensis

287 had 100% support as a clade that was sister to the P. glauca–P. jezoensis polytomy

288 and the P. abies, P. asperata, and P. crassifolia clade.

289

290 Discussion

291 In this study, the complete plastomes of Picea asperata and P. crassifolia were

292 sequenced, and a comparative analysis with the plastomes of two other spruces (P.

293 abies and P. morrisonicola) was performed to assess genome-wide mutational

294 dynamics within the genus Picea. The sizes of the four spruce plastomes were similar,

295 in the range of 124,084 bp to 124,168 bp, which is similar to that of P. jezoensis

296 (124,146 bp) (Yang et al. 2015) and slightly longer than that of P. glauca (123,266

297 bp) (Jackman et al. 2016). The four spruces evaluated in the present study displayed

298 the same number of encoded genes (108 genes), and six genes (i.e. rps12, trnT-GGU,

299 trnH-GUG, trnI-CAU, psbI, and ycf12) were duplicated. The gene content of the

300 Picea plastome was conserved and comprised 74 protein-coding genes, 36 tRNA

301 genes, and four rRNAs (Jackman et al. 2016; Sullivan et al. 2017). The ndh genes (11

302 in total) were lost in all four spruces, and only non-functional plastid ndh gene

303 fragments were present. Several Pinaceae species were also found to have lost the ndh

304 genes in their plastid genome (Wakasugi et al. 1994). The fate of ndh genes in Picea

305 involved a complex and dynamic scenario that cannot be considered to be a single

306 evolutionary loss (Lin et al. 2010). Ranade et al. (2016) demonstrated that ndh genes

307 were transferred to the nucleus during chloroplast evolution. In addition to the loss of

308 ndh genes, Pinaceae shared other synapomorphic plastome features, such as the

309 common loss of rps16 genes and expansion of IRs to the 3’ region of the psbA gene.

310 The rps16 gene was also absent in the P. asperata and P. crassifolia plastomes (Fig 1),

311 which is similar to a previous finding that the rps16 gene was absent from the Pinus

312 thunbergii plastome (Tsudzuki et al. 1992).

313 Phylogenetic inference among closely related species within subsections presents

314 a challenge. A growing body of evidence points to the presence of highly variable

315 markers in the plastome (Dong et al. 2012). These markers have been widely used to

316 study plant phylogenetics on lower taxonomic levels. In the present study,

317 comparisons of the four spruce plastomes suggested that ycf1, which encodes an

318 essential component of the plastid protein import apparatus, has the most variable

319 regions (Kikuchi et al. 2013). In addition, the largest indel (60 bp) is located within

320 the ycf1 sequence. ycf1, with two noncontiguous sections (i.e. ycf1a and ycf1b), is the

321 most promising plastid DNA barcode in land plants (Dong et al. 2015). Using the

322 distance method, ycf1b exhibited the highest discrimination success compared with

323 markers such as matK, rbcLb, and trnH-psbA among Pinus, Calycanthaceae, Iris,

324 Armeniaca, Paeonia, and Quercus species (Dong et al. 2015). However, ycf1 has a

325 higher ratio of non-synonymous to synonymous substitutions, which strongly

326 supports the hypothesis that this gene has been positively selected in Pinus (Parks et

327 al. 2009) and Picea (Sullivan et al. 2017). After excluding repetitive sequences and

328 poorly aligned regions, we found that the ycf1 gene had 21 positively selected codons,

329 indicating that this gene may be under very high selection pressure in Picea (Sullivan

330 et al. 2017). We also found six gene spacer regions (i.e. psbJ-petA, trnT-psaM,

331 trnS-trnD, trnL-rps4, psaC-ccsA, and rps7-trnL) that showed high variability based on

332 comparisons of four spruce plastomes. Among these spacer regions, only psbJ-petA

333 has been used in phylogenetic reconstructions—i.e. low-resolution reconstructions of

334 Osmanthus Lour. (Oleaceae) (Guo et al. 2011) and other angiosperm species

335 (Jaramillo et al. 2008).

336 Nonetheless, there are other issues to consider when using plastomes for

337 phylogenetic and/or population genetic studies. For instance, it is important to note

338 that the plastome is effectively a single-gene estimate. Incomplete lineage sorting can

339 cause the plastome tree to differ markedly from the true species tree (Doyle 1992), and

340 introgression can cause further discordance between plastome and species history

341 (Maddison 1997). There is also growing awareness of the importance of considering

342 the causes of phylogenetic discord to account for disagreement between phylogenetic

343 analyses given that plastomes are capable of recombination (Sullivan et al. 2017; Zhu

344 2018). Thus, reconstructing evolutionary relationships among closely-related species

345 should involve the use of multiple individuals and high-resolution loci (and even

346 whole plastomes) (Knowles and Carstens 2007; Sullivan et al. 2017; Syring et al.

347 2007).

348 In the present study, the topologies of ML and MP phylogenetic trees inferred

349 from 68 shared plastome genes, protein-coding genes, and codon positions were

350 almost identical (Fig. 4, Fig. S2 and S3). Pinaceae was sister to the other conifers,

351 among which Taxaceae was sister to Cupressaceae, and Araucariaceae was sister to

352 Podocarpaceae, which is in accordance with a previous phylogenetic analysis that

353 uncovered two subclades within cupressophytes, as inferred from 80 plastid

354 protein-coding genes (Wu and Chaw 2016). One is the Northern Hemisphere species,

355 consisting of Cupressaceae and Taxaceae, and the other is the Southern Hemisphere

356 species, containing Araucariaceae and Podocarpaceae. Our results also revealed an 357 ambiguous placement for CathayaDraft argyrophylla, a relative of Pinus and Picea; our 358 results suggested that C. argyrophylla was more closely related to Picea than to Pinus

359 (Lin et al. 2010; Wang et al. 2000). In addition, we found that Cedrus deodara

360 formed a clade with Keteleeria davidiana and Abies koreana, which agrees with

361 previous phylogenetic analyses that used comparative chloroplast genomics to

362 categorize Pinaceae genera and subfamilies (Lin et al. 2010).

363 With respect to the phylogenetic distribution of Picea species, our results

364 suggested that seven spruce species are sister to the Piceoideae (Fig. 4, Fig. S2 and

365 S3). The topologies of our phylogenies of these seven spruce species are similar to

366 those reported by Sullivan et al. (2017). Picea abies, P. asperata,

367 and P. crassifolia were found to be more closely related to each other than to other

368 Picea species. Similar results were also found by previous phylogenetic analyses that

369 used plastid, mitochondrial, and nuclear markers (Lockwood et al. 2013; Ran et al.

370 2006; Sullivan et al. 2017). However, several nodes with P. abies and predominately

371 northeast Asian taxa, including P. asperata and P. crassifolia, had less than 50%

372 support in Sullivan et al. (2017). The uncertainty of this topology is likely due to a

373 rapid and recent radiation given the very low interspecific genetic divergence and

374 monomorphic mitotypes (Ran et al. 2006; Sullivan et al. 2017). The plastomes of P.

375 abies, P. asperata, and P. crassifolia presented minimal differences; e.g. P. asperata

376 and P. crassifolia differed by only seven SNPs in the present study. A previous

377 study of chloroplast and mitochondrial DNA variations between P. asperata and P.

378 crassifolia also showed that none of the derived mutations are species specific (Du et

379 al. 2009). A recent divergence (approximately 127,000 years ago) between these two

380 species and a lack of fixed variation indicate that they are at the initial stage of

381 speciation (Bi et al. 2016). At this stage, there has been insufficient time to

382 accumulate genetic differentiation (Nielsen and Wakeley 2001). Moreover,

383 incomplete lineage sorting and gene flow result in extensive genetic sharing between

384 the two lineages (Bi et al. 2016).
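
As an illustration of how a SNP count between two aligned plastomes could be obtained, the following is a minimal sketch and not the pipeline used in this study. It assumes the two sequences have already been aligned to equal length; the function name `count_snps` and the toy sequences are hypothetical and do not represent real Picea data.

```python
def count_snps(seq_a: str, seq_b: str) -> int:
    """Count substitution differences between two aligned sequences.

    Columns where either sequence carries a gap ('-') or an ambiguous
    base are skipped, so indels are not counted as SNPs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    snps = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid and a != b:
            snps += 1
    return snps


# Toy alignment (hypothetical, for illustration only)
toy_a = "ATGCC-GTAACGT"
toy_b = "ATGTCAGTAACGA"
print(count_snps(toy_a, toy_b))
```

Gap columns are excluded here so that indel events can be tallied separately from substitutions, as is done in the comparisons reported above.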

385 Picea jezoensis is a widespread species found in cold-temperate and boreal

386 forests in eastern Siberia, northeast China, Korea, and Japan. Here, we found poor

387 support for the relationships among P. jezoensis, P. glauca, and P. abies on the

388 whole-plastome level. Similarly, previous phylogenetic studies based on individual

389 chloroplast DNA regions showed that these species are distinct (Lockwood et al.

390 2013; Ran et al. 2006). Sullivan et al. (2017) inferred that introgression between a P.

391 jezoensis-like species and the most recent common ancestor of P. abies may have

392 occurred in northeastern Russia during the Quaternary glaciation, prior to the

393 clade’s diversification and colonization of Eurasia. In addition, P. glauca and P.

394 sitchensis, which are distributed in western North America (Sutton et al. 1991), were

395 clustered into distantly related clades rather than in a monophyletic group (Fig. 4).

396 This finding is consistent with phylogenetic analyses conducted using the trnC-trnD

397 and trnT-trnF regions (Ran et al. 2006) as well as whole plastome alignments

398 (Sullivan et al. 2017). P. sitchensis and P. glauca, two distantly related species, have

399 been found to readily hybridize (Hamilton and Aitken 2013). However, a previous

400 study showed that P. glauca, P. engelmannii, and P. sitchensis were sister to each

401 other with strong support (Lockwood et al. 2013). The phylogeographical types

402 found at different sampling locations and the different plastome sequences within

403 them may also have caused discordance among Picea phylogenies (Aizawa et al.

404 2007; Lockwood et al. 2013; Ran et al. 2006).

405 Other than P. sitchensis, P. morrisonicola was found to be the most distantly

406 related to the other six Picea species (Fig. 4). Picea morrisonicola is a vulnerable

407 spruce species endemic to the island of Taiwan (Bodare et al. 2013); chloroplast,

408 mitochondrial and nuclear marker data suggest that this species is rather distantly

409 related to P. abies, P. crassifolia, and P. asperata (Ran et al. 2015). The population

410 of P. morrisonicola is small and geographically isolated (Bodare et al. 2013);

411 therefore, species-specific mutations in the chloroplast DNA would likely have

412 accumulated more rapidly in this species (Bouillé et al. 2011; Ran et al. 2006;

413 Sullivan et al. 2017). Thus, P. morrisonicola exhibits greater differences compared

414 with the other three spruce species examined in this study. Several studies have

415 investigated the phylogenetic relationships among spruces (Bouillé et al. 2011; Ran

416 et al. 2006; Sullivan et al. 2017); however, no concordant topologies have been

417 generated using plastid data (Bouillé et al. 2011; Ran et al. 2006; Sullivan et al.

418 2017). This can be attributed to interspecific plastome recombination and ancient

419 reticulate evolution in Picea (Bouillé et al. 2011; Sullivan et al. 2017).
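
Topological discordance of the kind discussed above can be quantified with a tree distance. The sketch below computes a rooted Robinson-Foulds distance, i.e. the number of clades found in one tree but not the other. It is an illustrative assumption rather than an analysis performed in this study: trees are represented as nested Python tuples, and the function names and the toy taxa (A to E) are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree given as nested tuples.

    Each clade is a frozenset of the leaf names below one internal node.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):          # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)              # the root clade is always shared
    return found


def rf_distance(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: clades present in only one tree."""
    return len(clades(tree_a) ^ clades(tree_b))


# Two hypothetical topologies over the same five taxa
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("B", "C"), "A"), ("D", "E"))
print(rf_distance(tree_1, tree_2))
```

A distance of zero indicates identical topologies; larger values indicate more conflicting clades between, for example, trees inferred from different plastid regions.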

420

421 Conclusions

422 In this study, by comparing the plastomes of P. asperata, P. crassifolia, P. abies, and

423 P. morrisonicola, we identified 438 SNPs, 95 indel events, four inversion events, and

424 seven highly variable regions: six gene spacer regions (psbJ-petA, trnT-psaM,

425 trnS-trnD, trnL-rps4, psaC-ccsA, and rps7-trnL) and one gene (ycf1). These identified

426 regions and mutations may be used as molecular markers for further phylogenetic

427 analyses of Picea and other Pinaceae species. Furthermore, phylogenetic analysis

428 inferred from shared plastome genes may be more precise than analyses using a single

429 gene from the plastome. However, because of incomplete lineage sorting and

430 interspecies introgression leading to inconsistent tree topologies, we suggest that

431 additional analyses should be conducted to confirm the relationships among Picea

432 species, especially based on larger genomic data.
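
Highly variable regions such as those listed above are commonly located by scanning an alignment with a sliding window. The following is a minimal sketch of that idea, computing nucleotide diversity (the mean pairwise proportion of differing sites) per window. It is not the DnaSP implementation; the window and step sizes, the function names, and the toy alignment are assumptions made for illustration only.

```python
from itertools import combinations


def pairwise_diff(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring gaps and ambiguous bases."""
    valid = set("ACGT")
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x in valid and y in valid:
            compared += 1
            if x != y:
                diffs += 1
    return diffs / compared if compared else 0.0


def sliding_pi(alignment, window, step):
    """Return (window_start, nucleotide diversity) for each full window."""
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [seq[start:start + window].upper() for seq in alignment]
        pairs = list(combinations(chunk, 2))
        pi = sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)
        results.append((start, pi))
    return results


# Toy alignment of three sequences (hypothetical data)
toy = ["AAAACCCC", "AAAACCGC", "AAAACGGC"]
windows = sliding_pi(toy, window=4, step=4)
print(windows)
```

Windows with the highest values would be the candidates for variable markers; in practice the window and step sizes would be chosen to suit the plastome length and the resolution required.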

433

434 Acknowledgements

435 This work was financially supported by the National Natural Science Foundation of

436 China (NSFC 31271265 and 31600541) and the China Postdoctoral Science Foundation

437 (CPSF 2016M591053). We acknowledge TopEdit LLC for linguistic editing and

438 proofreading during the preparation of this manuscript. We would also like to

439 thank the editor-in-chief, associate editor, and the expert reviewer for their valuable

440 comments and suggestions.

441

442 References

443 Aizawa, M., Yoshimaru, H., Saito, H., Katsuki, T., Kawahara, T., Kitamura, K., Shi,

444 F., and Kaji, M. 2007. Phylogeography of a northeast Asian spruce, Picea

445 jezoensis, inferred from genetic variation observed in organelle DNA markers.

446 Molecular Ecology 16(16): 3393-3405.

447 doi:10.1111/j.1365-294X.2007.03391.x.

448 Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and

449 Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: a new generation of

450 protein database search programs. Nucleic Acids Research 25(17): 3389-3402.

451 doi: 10.1093/nar/25.17.3389.

452 Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S.,

453 Lesin, V.M., Nikolenko, S.I., Pham, S., Prjibelski, A.D., Pyshkin A.V.,

454 Sirotkin A.V., Vyahhi N., Tesler G., Alekseyev M.A. and Pevzner P.A. 2012.

455 SPAdes: A New Genome Assembly Algorithm and Its Applications to

456 Single-Cell Sequencing. Journal of Computational Biology 19: 455-477.

457 doi:10.1089/cmb.2012.0021.

458 Barrett, C.F., Baker, W.J., Comer, J.R., Conran, J.G., Lahmeyer, S.C., Leebens‐Mack,

459 J.H. and Li, J., et al. 2016. Plastid genomes reveal support for deep

460 phylogenetic relationships and extensive rate variation among palms and other

461 commelinid monocots. New Phytologist 209(2): 855-870.

462 doi:10.1111/nph.13617.

463 Bi, H., Yue, W., Wang, X., Zou, J., Li, L., Liu, J., and Sun, Y. 2016. Late Pleistocene

464 climate change promoted divergence between Picea asperata and

465 P. crassifolia on the Qinghai–Tibet Plateau through recent bottlenecks.

466 Ecology & Evolution 6(13): 4435-4444. doi: 10.1002/ece3.2230.

467 Birol, I., Raymond, A., Jackman, S.D., Pleasance, S., Coope, R., Taylor, G.A. and

468 Yuen,S.M.M., et al. 2013. Assembling the 20 Gb white spruce (Picea glauca)

469 genome from whole-genome shotgun sequencing data. Bioinformatics 29(12):

470 1492-1497. doi: 10.1093/bioinformatics/btt178.

471 Bodare, S., Stocks, M., Yang, J.C., and Lascoux, M. 2013. Origin and demographic

472 history of the endemic Taiwan spruce (Picea morrisonicola). Ecology &

473 Evolution 3(10): 3320-3333. doi: 10.1002/ece3.698.

474 Bouillé, M., Senneville, S., and Bousquet, J. 2011. Discordant mtDNA and cpDNA

475 phylogenies indicate geographic speciation and reticulation as driving factors

476 for the diversification of the genus Picea. Tree Genetics & Genomes 7(3):

477 469-484. doi:10.1007/s11295-010-0349-z.

478 Dong, W., Liu, J., Yu, J., Wang, L., and Zhou, S. 2012. Highly variable chloroplast

479 markers for evaluating plant phylogeny at low taxonomic levels and for DNA

480 barcoding. PLoS One 7 : e35071. doi:10.1371/journal.pone.0035071.

481 Dong, W., Xu, C., Li, C., Sun, J., Zuo, Y., Shi, S. and Cheng, T., et al., 2015. ycf1, the

482 most promising plastid DNA barcode of land plants. Scientific Reports 5:

483 8348. doi:10.1038/srep08348.

484 Doyle, J.J. 1992. Gene trees and species trees: molecular systematics as one-character

485 taxonomy. Systematic Botany 17(1): 144-163.

486 Du, F.K., Petit, R.J., and Liu, J.Q. 2009. More introgression with less gene flow:

487 chloroplast vs. mitochondrial DNA in the Picea asperata complex in China,

488 and comparison with other Conifers. Molecular Ecology 18(7): 1396-1407.

489 doi: 10.1111/j.1365-294X.2009.04107.x.

490 Dubchak, I. 2007. Comparative Analysis and Visualization of Genomic Sequences

491 Using VISTA Browser and Associated Computational Tools. Methods in

492 Molecular Biology 395:3-16.

493 Eckenwalder, J.E., and Press, T. 2009. Conifers of the world. Timber Press.

494 Farjon, A. 2001. World checklist and bibliography of conifers. Royal Botanic

495 Gardens.

496 Fu, L., Li, N., and Mill, R. 1999. Pinaceae. In: Flora of China. Science Press, Beijing

497 and Missouri Botanical Garden Press, St. Louis. 11-52.

498 Guo, S.-Q., Xiong, M., Ji, C.-F., Zhang, Z.-R., Li, D.-Z., and Zhang, Z.-Y. 2011.

499 Molecular phylogenetic reconstruction of Osmanthus Lour. (Oleaceae) and

500 related genera based on three chloroplast intergenic spacers. Plant Systematics

501 and Evolution 294(1-2): 57-64. doi:10.1007/s00606-011-0445-z.

502 Hamilton, J.A., Aitken, S.N. 2013. Genetic and morphological structure of a spruce

503 hybrid (Picea sitchensis x P. glauca) zone along a climatic gradient. American

504 Journal of Botany. 100:1651–1662.

505 Huang, D.I., Hefer, C.A., Kolosova, N., Douglas, C.J., and Cronk, Q.C.B. 2014.

506 Whole plastome sequencing reveals deep plastid divergence and cytonuclear

507 discordance between closely related balsam poplars, Populus balsamifera and

508 P. trichocarpa (Salicaceae). New Phytologist 204(3): 693-703. doi:

509 10.1111/nph.12956.

510 Jackman, S.D., Warren, R.L., Gibb, E.A., Vandervalk, B.P., Mohamadi, H., Chu, J.

511 and Raymond, A., et al. 2016. Organellar Genomes of White Spruce (Picea

512 glauca): Assembly and Annotation. Genome Biology & Evolution 8(1):

513 29-41. doi:10.1093/gbe/evv244.

514 Katoh, K., and Standley, D.M. 2013. MAFFT multiple sequence alignment software

515 version 7: improvements in performance and usability. Molecular Biology &

516 Evolution 30: 772-780. doi:10.1093/molbev/mst010.

517 Kearse, M., Moir, R., Wilson, A., Stoneshavas, S., Cheung, M., Sturrock, S. and

518 Buxton, S., et al. 2012. Geneious Basic: An integrated and extendable desktop

519 software platform for the organization and analysis of sequence data.

520 Bioinformatics 28(12): 1647-1649. doi:10.1093/bioinformatics/bts199.

521 Knowles, L.L., and Carstens, B.C. 2007. Estimating a geographically explicit model

522 of population divergence. Evolution 61(3): 477-493.

523 doi:10.1111/j.1558-5646.2007.00043.x.

524 Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., and

525 Salzberg, S.L. 2004. Versatile and open software for comparing large

526 genomes. Genome Biology 5(2): R12. doi:10.1186/gb-2004-5-2-r12.

527 Lanfear, R., Calcott, B., Ho, S.Y.W., and Guindon, S. 2012. PartitionFinder:

528 combined selection of partitioning schemes and substitution models for

529 phylogenetic analyses. Molecular Biology & Evolution 29(6): 1695-1701.

530 doi:10.1093/molbev/mss020.

531 Li, J., Wang, S., Yu, J., Wang, L., and Zhou, S. 2013. A modified CTAB protocol for

532 plant DNA extraction. Chinese Bulletin of Botany 48(1): 72-78. doi:

533 10.3724/SP.J.1259.2013.00072.

534 Librado, P., and Rozas, J. 2009. DnaSP v5: a software for comprehensive analysis of

535 DNA polymorphism data. Bioinformatics 25(11): 1451-1452. doi:

536 10.1093/bioinformatics/btp187.

537 Lin, C.P., Huang, J.P., Wu, C.S., Hsu, C.Y., and Chaw, S.M. 2010. Comparative

538 Chloroplast Genomics Reveals the Evolution of Pinaceae Genera and

539 Subfamilies. Genome Biology & Evolution 2(1): 504-517.

540 doi:10.1093/gbe/evq036.

541 Lockwood, J.D., Aleksić, J.M., Zou, J., Wang, J., Liu, J., and Renner, S.S. 2013. A

542 new phylogeny for the genus Picea from plastid, mitochondrial, and nuclear

543 sequences. Molecular Phylogenetics & Evolution 69(3): 717-727. doi:

544 10.1016/j.ympev.2013.07.004.

545 Lohse, M., Drechsel, O., Kahlau, S., and Bock, R. 2013. OrganellarGenomeDRAW--a

546 suite of tools for generating physical maps of plastid and mitochondrial

547 genomes and visualizing expression data sets. Nucleic Acids Research

548 41(Web Server issue): 575-581. doi: 10.1093/nar/gkt289.

549 Maddison, W.P. 1997. Gene Trees in Species Trees. Systematic Biology 46(3):

550 523-536. doi: 10.1093/sysbio/46.3.523.

551 Neale DB, Sederoff RR. 1989. Paternal inheritance of chloroplast DNA and maternal

552 inheritance of mitochondrial DNA in loblolly pine. Theoretical and Applied

553 Genetics 77:212–216.

554 Nielsen, R., and Wakeley, J. 2001. Distinguishing Migration From Isolation: A

555 Markov Chain Monte Carlo Approach. Genetics 158(2): 885-896.

556 Nystedt, B., Street, N.R., Wetterbom, A., Zuccolo, A., Lin, Y.-C., Scofield, D.G. and

557 Vezzi, F. et al., 2013. The Norway spruce genome sequence and

558 genome evolution. Nature. 497:579-584. doi:10.1038/nature12211.

559 OECD. 1999. Consensus document on the biology of Picea glauca (Moench) Voss

560 (White Spruce). Series on Harmonization of Regulatory Oversight in

561 Biotechnology No. 13, eds. Joint Meet. Chemicals Committee and Working

562 Party on Chemicals (Environ. Direct., OECD Environ. Health and Safety

563 Publ.), Paris, France.

564 Parks, M., Cronn, R., and Liston, A. 2009. Increasing phylogenetic resolution at low

565 taxonomic levels using massively parallel sequencing of chloroplast genomes.

566 BMC Biology 7(84): 1-17. doi:10.1186/1741-7007-7-84.

567 Rambaut, A. 2002. Se-Al: Sequence Alignment Editor v2.0 a11. University of Oxford

568 UK.

569 Ran, J.H., Wei, X.X., and Wang, X.Q. 2006. Molecular phylogeny and biogeography

570 of Picea (Pinaceae): implications for phylogeographical studies using

571 cytoplasmic haplotypes. Molecular Phylogenetics & Evolution 41(2):

572 405-419. doi:10.1016/j.ympev.2006.05.039.

573 Ran, J.H., Shen, T.T., Liu, W.J., Wang, P.P., and Wang, X.Q. 2015. Mitochondrial

574 introgression and complex biogeographic history of the genus Picea.

575 Molecular Phylogenetics & Evolution 93: 63-76.

576 doi:10.1016/j.ympev.2015.07.020.

577 Ranade, S.S., García-Gil, M.R., and Rosselló, J.A. 2016. Non-functional plastid ndh

578 gene fragments are present in the nuclear genome of Norway spruce (Picea

579 abies L. Karsch): insights from in silico analysis of nuclear and organellar

580 genomes. Molecular Genetics & Genomics 291(2): 935-941.

581 doi:10.1007/s00438-015-1159-7.

582 Räsänen, K., and Hendry, A.P. 2010. Disentangling interactions between adaptive

583 divergence and gene flow when ecology drives diversification. Ecology

584 Letters 11(6): 624-636. doi: 10.1111/j.1461-0248.2008.01176.x.

585 Kikuchi S., Bédard J., Hirano M., Hirabayashi Y., Oishi M., Imai M. and Takase M.

586 2013. Uncovering the protein translocon at the chloroplast inner envelope

587 membrane. Science (New York, N.Y.) 339(6119): 571-574.

588 doi:10.1126/science.1229262.

589 Simmons, M.P. 2004. Independence of alignment and tree search. Molecular

590 Phylogenetics & Evolution 31(3): 874-879. doi:10.1016/j.ympev.2003.10.008.

591 Stamatakis, A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic

592 analyses with thousands of taxa and mixed models. Bioinformatics 22(21):

593 2688-2690. doi: 10.1093/bioinformatics/btl446.

594 Sullivan, A.R., Schiffthaler, B., Thompson, S.L., Street, N.R., and Wang, X.R. 2017.

595 Interspecific Plastome Recombination Reflects Ancient Reticulate Evolution

596 in Picea (Pinaceae). Molecular Biology & Evolution 34(7): 1689-1701.

597 Sutton, B.C., Flanagan, D.J., Gawley, J.R., Newton, C.H., Lester, D.T., and

598 El-Kassaby, Y.A. 1991. Inheritance of chloroplast and mitochondrial DNA in

599 Picea and composition of hybrids from introgression zones. Theoretical &

600 Applied Genetics 82(2): 242-248. doi:10.1007/bf00226220.

601 Syring, J., Farrell, K., Businsky, R., Cronn, R., and Liston, A. 2007. Widespread

602 genealogical nonmonophyly in species of Pinus subgenus Strobus. Systematic

603 Botany 56(2): 1-19. doi:10.1080/10635150701258787.

604 Tsudzuki, J., Nakashima, K., Tsudzuki, T., Hiratsuka, J., Shibata, M., Wakasugi, T.,

605 and Sugiura, M. 1992. Chloroplast DNA of black pine retains a residual

606 inverted repeat lacking rRNA genes: nucleotide sequences of trnQ, trnK,

607 psbA, trnI and trnH and the absence of rps16. Molecular & General Genetics

608 232(2): 206-214.

609 Wakasugi, T., Tsudzuki, J., Ito, S., Nakashima, K., Tsudzuki, T., and Sugiura, M.

610 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast

611 genome of the black pine Pinus thunbergii. Proceedings of the National

612 Academy of Sciences of the United States of America 91(21): 9794-9798.

613 Wang, X.Q., Tank, D.C., and Sang, T. 2000. Phylogeny and divergence times in

614 Pinaceae: evidence from three genomes. Molecular Biology & Evolution 17:

615 773-781. doi:10.1212/01.wnl.0000210464.94122.e1

616 Whittle, C.A., and Johnston, M.O. 2002. Male-Driven Evolution of Mitochondrial and

617 Chloroplastidial DNA Sequences in Plants. Molecular Biology & Evolution

618 19(6): 938-949. doi: 10.1093/oxfordjournals.molbev.a004151.

619 Wolfe A.D, Randle C.P. 2004. Recombination, heteroplasmy, haplotype

620 polymorphism, and paralogy in plastid genes: Implications for plant molecular

621 systematics. Systematic Botany 29:1011–1020.

622 Wolfe, K.H., Li, W.H., and Sharp, P.M. 1987. Rates of nucleotide substitution vary

623 greatly among plant mitochondrial, chloroplast, and nuclear DNAs.

624 Proceedings of the National Academy of Sciences of the United States of

625 America 84(24): 9054-9058.

626 Wu, C.S., Chaw SW., and Huang, Y.Y. 2013. Chloroplast phylogenomics indicates

627 that Ginkgo biloba is sister to Cycads. Genome biology and evolution 5(1):

628 243-254. doi:10.1093/gbe/evt001.

629 Wu, C.S., Chaw, S.M. 2016. Large-Scale Comparative Analysis Reveals the

630 Mechanisms Driving Plastomic Compaction, Reduction, and Inversions in

631 Conifers II (Cupressophytes). Genome biology and evolution 8:3740-3750

632 doi:10.1093/gbe/evw278.

633 Wyman, S.K., Jansen, R.K., and Boore, J.L. 2004. Automatic annotation of organellar

634 genomes with DOGMA. Bioinformatics 20(17): 3252-3255. doi:

635 10.1093/bioinformatics/bth352.

636 Yang, J.C., Joo, M., So, S., Yi, D.K., Shin, C.H., Lee, Y.M., and Choi, K. 2015. The

637 complete plastid genome sequence of Picea jezoensis (Pinaceae: Piceoideae).

638 Mitochondrial DNA Part A DNA Mapping Sequencing & Analysis 27: 3761.

639 doi: 10.3109/19401736.2015.1079894.

640 Zhao, W., Jiang, M., Ma, J., Xu, N., and Wang, J. 2015. Interspecific hybridization of

641 Picea and genetic testing of growth traits in F_1 seedlings. Forest Science and

642 Technology (In Chinese) 9: 40-43.

643 Zhu, A., Fan, W., Adams, R.P., Mower, J.P. 2018. Phylogenomic evidence for ancient

644 recombination between plastid genomes of the

645 Cupressus-Juniperus-Xanthocyparis complex (Cupressaceae). BMC

646 Evolutionary Biology 18:137 doi:10.1186/s12862-018-1258-2.

647 Zou, J., Sun, Y., Li, L., Wang, G., Wei, Y., Lu, Z., Wang, Q., and Liu, J. 2013.

648 Population genetic evidence for speciation pattern and gene flow between

649 Picea wilsonii, P. morrisonicola and P. neoveitchii. Annals of Botany 112(9):

650 1829-1844. doi:10.1093/aob/mct241.

651 Figure legends

652 Figure 1. Gene map of the P. asperata and P. crassifolia plastomes. Genes are

653 indicated by boxes on the inside (clockwise transcription) and outside

654 (counterclockwise transcription) as the outermost circle. Genes belonging to different

655 functional groups are color-coded. The dashed area in the inner circle indicates the

656 GC content of the plastome.

657

658 Figure 2. Sequence identity plot comparing the complete plastomes of P. asperata, P.

659 crassifolia, P. abies, and P. morrisonicola with P. asperata as a reference using

660 mVISTA. Gray arrows and thick black lines above the alignment indicate genes with

661 their orientation and the position of the IRs, respectively. A cutoff of 70% identity

662 was used for the plots, and the Y-scale represents the percent identity, and ranges

663 from 50 to 100%.

664

665 Figure 3. Patterns of nucleotide substitutions among the P. asperata, P. crassifolia, P.

666 abies and P. morrisonicola plastomes.

667

668 Figure 4. Phylogenetic tree inferred via maximum likelihood and parsimony using 68

669 shared protein-coding genes among 32 plastid genomes (20 from the Pinaceae and 12

670 outgroups from the Araucariaceae, Podocarpaceae, Taxaceae, and Cupressaceae).

671 Support values estimated from 1,000 bootstrap replicates through maximum

672 likelihood (ML) and parsimony (MP) are presented along the branches (MP/ML).

673

674 Supplementary materials

675 Table S1. Interspecific hybridization among Picea crassifolia and P. abies, P.

676 wilsonii, P. asperata, P. koraiensis, and P. glauca.

677

678 Table S2. Genes identified in the Picea asperata, P. crassifolia, P. abies, and P.

679 morrisonicola plastomes.

680

681 Table S3. List of 32 species whose cp sequences are available in GenBank.

682

683 Table S4. Base mutation events in gene coding regions of the plastomes of Picea

684 asperata, P. abies, and P. morrisonicola.

685

686 Table S5. Simple sequence repeat (SSR) indels in the plastomes of Picea asperata, P.

687 crassifolia, P. abies, and P. morrisonicola.

688

689 Table S6. Simple indels in the plastomes of Picea asperata, P. crassifolia, P. abies,

690 and P. morrisonicola.

691

692 Figure S1. Sequence identity plot comparing the complete plastomes of P. asperata,

693 P. crassifolia, P. abies, P. morrisonicola, P. sitchensis, P. glauca, and P. jezoensis

694 with P. abies as a reference using mVISTA. Gray arrows and thick black lines above

695 the alignment indicate genes with their orientation and the position of the IRs,

696 respectively. Genome regions are color-coded as exons and conserved non-coding

697 sequences (CNS). A cutoff of 50% identity was used for the plots, and the Y-scale

698 represents the percent identity, and ranges from 50 to 100%.

699

700 Figure S2 Phylogenetic tree inferred via maximum likelihood and parsimony using

701 68 shared genes among 32 plastid genomes (20 from the Pinaceae and 12 outgroups

702 from the Araucariaceae, Podocarpaceae, Taxaceae and Cupressaceae). Support

703 values estimated from 1,000 bootstrap replicates using maximum likelihood (ML).

704

705 Figure S3 Phylogenetic tree inferred via maximum likelihood and parsimony using

706 68 shared codon positions among 32 plastid genomes (20 from the Pinaceae and 12

707 outgroups from the Araucariaceae, Podocarpaceae, Taxaceae and Cupressaceae).

708 Support values estimated from 1,000 bootstrap replicates using maximum

709 likelihood (ML).

710

711

Table 1. Summary of the four complete Picea plastomes: Picea asperata, P. crassifolia, P. abies, and P. morrisonicola

Category                         P. asperata   P. crassifolia   P. morrisonicola   P. abies
Accession number                 KY204451      KY204450         AB480556           HF937082
Size (bp)                        124,145       124,126          124,168            124,084
% GC content                     38.71         38.71            38.79              38.72
Number of protein-coding genes   72            72               72                 72
Number of tRNA genes             32            32               32                 32
Number of rRNA genes             4             4                4                  4
Number of genes with introns     14            14               14                 14

Table 2. Inversion events and inversion directions in the plastomes of Picea asperata, P. crassifolia, P. abies, and P. morrisonicola

Gene fragment   Size of inversion (bp)   Repeat sequence length (bp)   P. asperata   P. crassifolia   P. abies   P. morrisonicola
psbA-trnK       4                        13                            G             G                G          A
trnH-trnT       3                        9                             G             A                A          A
trnI-trnF       4                        19                            G             G                A          A
ycf3 intron     2                        2                             G             G                G          A

A and G indicate different directions. The direction of the plastome of P. asperata (G) was used as a reference.
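
The direction calls in Table 2 rest on whether a short segment in one plastome is the reverse complement of the homologous segment in the reference. A minimal sketch of such a check is given below; it is an assumption about how the comparison could be scripted, not the procedure used in this study, and the function names and toy sequences are hypothetical.

```python
# Translation table for complementing unambiguous DNA bases
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_inverted(segment_ref: str, segment_query: str) -> bool:
    """True if the query segment is an inverted copy of the reference.

    Segments identical to the reference (including palindromic ones)
    are reported as not inverted.
    """
    ref = segment_ref.upper()
    query = segment_query.upper()
    return query != ref and query == reverse_complement(ref)


# Toy 4-bp segments (hypothetical, not taken from the plastomes)
print(is_inverted("GAAA", "TTTC"))  # inverted relative to the reference
print(is_inverted("GAAA", "GAAA"))  # same direction as the reference
```

In practice the segment would be the loop between the flanking inverted repeats, extracted from the aligned plastomes at each of the four sites.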

[Figure 1 (image): gene map of the Picea asperata (124,145 bp) and Picea crassifolia (124,126 bp) plastomes. Legend categories: photosystem I; photosystem II; cytochrome b/f complex; ATP synthase; RubisCO large subunit; RNA polymerase; ribosomal proteins (SSU); ribosomal proteins (LSU); clpP, matK; other genes; hypothetical chloroplast reading frames (ycf); transfer RNAs; ribosomal RNAs; introns.]

[Figure 2 (image): mVISTA sequence identity plot. Tracks: P. crassifolia, P. morrisonicola, P. abies. Y-axis: percent identity (50 to 100%). X-axis: alignment position (0k to 123k). Legend: gene, exon, intron, CNS.]
