<<

bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Whole genome sequence of an edible and potential medicinal ,

2 guangdongensis

3 Chenghua Zhang, Wangqiu Deng, Wenjuan Yan, Taihui Li1

4 State Key Laboratory of Applied Microbiology Southern , Guangdong

5 Provincial Key Laboratory of Microbial Culture Collection and Application,

6 Guangdong Open Laboratory of Applied Microbiology, Guangdong Institute of

7 Microbiology, Guangzhou, 510070, China

8 References number: 47

9

10

11

12

13

14

15

16

17

18 bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

19 Running title: Genome of Cordyceps guangdongensis

20 Keywords: Cordyceps, chromosome, transporters, transcription factors, Genome

21 Report

22 1Corresponding author: Guangdong Institute of Microbiology, Building 59, No.100

23 Courtyard, Xianlie Zhong Road, Yuexiu District, Guangzhou City, P.R. China,510070.

24 E -mail: [email protected]. Tel: 020-87137620

25

26

27

28

29

30

31

32

33

34

35

36 bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

37 ABSTRACT

38 Cordyceps guangdongensis is an edible fungus which has been approved as a

39 Novel Food by the Chinese Ministry of Public Health in 2013. It also has a broad

40 application prospect in pharmaceutical industries with many medicinal activities. In

41 this study, the whole genome of C. guangdongensis GD15, a single spore isolate from

42 a wild strain, was sequenced and assembled with Illumina and PacBio sequencing

43 technology. The generated genome is 29.05 Mb in size, comprising 9 scaffolds with

44 an average GC content of 57.01%. It is predicted to contain a total of 9150

45 protein-coding genes. Sequence identification and comparative analysis indicated that

46 the assembled scaffolds contained two complete chromosomes and four single-end

47 chromosomes, showing a high level assembly. Gene annotation revealed a diversity of

48 transporters that could contribute to the genome size and evolution. Besides,

49 approximately 15.49% and 13.70% genes involved in metabolic processes were

50 annotated by KEGG and COG respectively. Genes belonging to CAZymes accounted

51 for a proportion of 2.84% of the total genes. In addition, 435 transcription factors

52 (TFs) were identified, which were involved in various biological processes. Among

53 the identified TFs, the fungal transcription regulatory proteins (18.39%) and

54 fungal-specific TFs (19.77%) represented the two largest classes of TFs. These data

55 provided a much needed genomic resource for studying C. guangdongensis, laying a

56 solid foundation for further genetic and biological studies, especially for elucidating

57 the genome evolution and exploring the regulatory mechanism of fruiting body

58 development. bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

59 INTRODUCTION

60 Cordyceps guangdongensis T. H. Li, Q. Y. Lin & B. Song () was

61 discovered in southern China (Lin et al. 2008), and has been successfully cultivated

62 (Lin et al. 2010). Its fruiting body is nontoxic, and has become the second Novel Food

63 of Cordyceps species approved by the Ministry of Public Health of China in 2013.

64 This fungus has rich nutrient contents and bioactive compounds, such as cordycepic

65 acid, adenosine and polysaccharide (Lin et al. 2009; 2010). The contents of the

66 above-mentioned bioactive compounds in C. guangdongensis are similar to those of

67 the traditional Chinese invigorant, sinensis (=Cordyceps sinensis)

68 (Lin et al. 2009). Previous researches by the authors’ group indicated that the fruiting

69 bodies of C. guangdongensis possessed various therapeutic properties, including

70 antioxidant activity (Zeng et al. 2009), longevity-increasing activity (Yan et al. 2011),

71 anti-fatigue effect (Yan et al. 2013), curative effect on chronic renal failure (Yan et al.

72 2012), and anti-inflammatory effect (Yan et al. 2014). These active effects provided

73 great potential for its application in food and medicinal industries. However, it is

74 worth considering what makes the fruiting bodies of C. guangdongensis so valuable

75 and how to improve the fruiting body yield. Therefore, it is a matter of cardinal

76 significance for further understanding the fruiting body development and metabolism

77 mechanisms of C. guangdongensis, as well as the evolutionary relationship between

78 other related species.

79 In recent years, whole genome sequencing (WGS) is widely used to analyze

80 the relevance of phenotypic characters and genetic mechanism. The advanced bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

81 sequencing techniques and rapidly developed bioinformatic methods had become the

82 practical ways to further explore the molecular mechanisms of fungal development

83 and metabolism, the systematic and the evolutionary relationship of fungi.

84 To date, numerous genomes of fungi under have been released in

85 Ensembl fungus database (http://fungi.ensembl.org/index.html). Based on the

86 available genome sequences, researchers not only ascertained evolutionary

87 relationship of many fungi (Zheng et al. 2011; Bushley et al. 2013; Xia et al. 2017),

88 but also identified quite a lot medically active components (Zheng et al. 2011; Yin et

89 al. 2012; Bushley et al. 2013; Quandt et al. 2015). Meanwhile, various transcription

90 factors (TFs), including bZIP TFs, zinc finger TFs and fungal-specific TFs, were

91 proved to be involved in fruiting body development by transcriptome analysis on the

92 basis of genome sequences (Zheng et al. 2011; Yin et al. 2012; Yang et al. 2016).

93 However, the whole genome sequence for C. guangdongensis is still lacking.

94 In order to acquire abundant molecular information for further effectively

95 exploring the relevance of phenotypic characters and genetic mechanisms in C.

96 guangdongensis, the whole genome of C. guangdongensis was firstly sequenced by a

97 combined Illumina HiSeq2000 and PacBio platform in this study. The assemble level,

98 the types of transposable elements and the transcriptional factors (TFs) were further

99 analyzed. This genomic resource provided a novel insight into further genetic

100 researches that will facilitate in its application addressed in many areas of biology.

101 MATERIAL AND METHODS

102 Fungal strains and DNA extraction bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

103 Sample used for whole genome sequencing and assembly was isolated from the

104 strain GDGM30035 (wild fruiting bodies of C. guangdongensis). The strain was

105 cultured on the PDA medium at 23±1° for four weeks, aqueous suspensions of fungal

106 spores were prepared by pouring sterile distilled water onto the sporulated cultures

107 and gently scrapping the agar surface. The spore suspension was collected by passing

108 through four layers of sterile cheesecloth to remove mycelial fragments. Spore

109 suspension was adjusted to 1×103 conidia ml/l using a hemocytometer, and was

110 coated on the PDA medium covered with cellophane. Single colony (C.

111 guangdongensis GD15) was transferred onto new PDA medium and subcultured for

112 three times. For DNA isolation, the strain was cultured on PDA medium, which was

113 then covered with cellophane at 23°. Genomic DNA from 7-day-old fungal colony

114 was extracted using a CTAB-based extraction buffer (Watanabe et al. 2010). The

115 DNA concentration was determined using UV-Vis spectrophotometer (BioSpec-nano),

116 and the integrity of the DNA was detected using a 0.8% agarose gel, while the purity

117 of the DNA was analyzed by a PCR program using 16S rDNA primers.

118 Genome sequencing and assembly

119 The genome of C. guangdongensis GD15 was sequenced with the hybrid of the

120 next generation sequencing technology Illumina HiSeq 2000 and the third-generation

121 sequencing platform, PacBio, at the Beijing Genomics Institute at Shenzhen. DNA

122 libraries with 500 bp inserts were constructed and sequenced with the Illumina

123 HiSeq2000 Genome Analyzer sequencing technique. Long insert SMRTbell template

124 libraries were prepared according to PacBio protocols. The unqualified raw reads bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

125 obtained by PacBio were filtered out, the subreads (≥1000bp) were corrected by

126 Proovread 2.12 (https://github.com/BioInf-Wuerzburg/proovread) and then were

127 initially assembled by SMRT Analysis v.2.3.0 (Chin et al. 2013). The preliminary

128 assembly results were further corrected using small Illumina reads by GATK v1.6-13

129 (http://www.broadinstitute.org/gatk/), the scaffolds were assembled and optimized

130 using long Illumina reads by SSPACE Basic v2.0

131 (http://www.baseclear.com/genomics/bioinformatics/basetools/SSPACE) and

132 PBJelly2 v15.8.24 (https://sourceforge.net/projects/pb-jelly/).

133 Genome components analysis

134 The characteristic telomeric repeats (TTAGGG/CCCTAA) were searched on the

135 both ends of each scaffold within 100bp length. Repetitive elements included tandem

136 repeats and transposable elements (TEs). Tandem repeats were searched in all

137 scaffolds by Tandem Repeats Finder (TRF 4.04) described by Benson (1999). TEs

138 annotation was performed by RepeatMasker 4.06 (Smit et al. 2014) based on the

139 Repbase database (http://www.girinst.org/repbase/). The tRNAs were predicted by

140 tRNAscan-SE 1.3.1 (Lowe and Eddy, 1997), rRNAs were identified using RNAmmer

141 1.2 (Lagesen et al. 2007), and sRNA were predicted by Infernal based on the Rfam

142 database (Gardner et al. 2009). Genes were annotated based on sequence homology

143 and de novo gene predictions. The homology approach was based on the reference

144 genomes download from EnsemblFungi (http://fungi.ensembl.org/index.html)

145 including the protein sequences of C. militaris, O. sinensis and Cordyceps

146 ophioglossoides (= ophioglossoides). The de novo gene predictions bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

147 were performed by Genemarkes 4.21 (Ter-Hovhannisyan et al. 2008).

148 Functional annotation

149 Structural and functional annotations of genes were performed according to

150 various databases of ARDB (Antibiotic Resistance Genes Database) (Liu et al. 2009),

151 CAZymes (Carbohydrate-Active enZYmes Database) (Cantarel et al. 2009), COG

152 (Cluster of Orthologous Groups) (Tatusov et al. 2003), GO (Gene Ontology) (Bard

153 and Winter 2000), KEGG (Kyoto Encyclopedia of Genes and Genomes) (Kanehisa et

154 al. 2006), NR (Non-Redundant Protein Database), P450 (Magrane et al. 2011), PHI

155 (Pathogen Host Interactions) (Torto-Alalibo et al. 2009), SwissProt (Magrane et al.

156 2011), T3SS (Type III secretion system Effector protein) (Vargas et al. 2012),

157 TrEMBL, VFDB (Virulence Factors of Pathogenic Bacteria) (Chen et al. 2016)and so

158 on. Transcription factors (TFs) were analyzed based on the TRANSFAC database

159 (http://www.gene-regulation.com/pub/databases.html#transfac).

160 Data availability

161 The genome sequencing project has been deposited at GenBank under the

162 accession number NRQP00000000. The BioProject designation for this project is

163 PRJNA399600.

164 RESULTS AND DISCUSSION

165 Whole-genome assembly

166 A total of 3,926,378,523 reads representing a cumulative size of 3.926 Gb were bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

167 generated, including 13,392,532 and 3,912,985,991 reads obtained from Illumina and

168 PacBio sequencing platforms, respectively. The PacBio sequencing results showed

169 high quality of polymerasereads and subreads (Fig.S1). After the low quality reads

170 were filtered, a total of 3,484,503,143 reads were assembled into 9 scaffolds with an

171 N50 of 7.88 Mb from ∼183 average coverage, and a total length of 29.05 Mb with a

172 57.01% GC content was obtained (Fig. 1A). Based on the Illumina sequencing data,

173 the predicted genome size by K-mer analysis was 31.58Mb, the total size of the

174 combined assembly closely matched this estimate size (91.98%). Besides, compared

175 to the previously reported draft genomes listed in Table 1, the new assembly

176 sequences were more contiguous with higher GC content.

177 Chromosome analysis

178 Sequences analysis of telomeric repeats was used to estimate the number of

179 chromosomes for C. guangdongensis genome according to the method described by

180 Zheng et al. (2011). The characteristic telomeric repeats (TTAGGG/CCCTAA)n were

181 found at either 5’ or 3’ terminal of 6 scaffolds, of which the telomeric repeats were

182 found at both ends of scaffolds 1 and 3, suggesting that the two scaffolds are complete

183 chromosomes. The lengths of the two complete chromosomes were about 8.81M and

184 5.00M, respectively. Single-ended telomeric repeats were found at four scaffolds,

185 including the start of scaffolds 4 and 6, the end of scaffolds 2 and 5, suggesting that

186 these four scaffolds extended to the telomeres. The lengths of the four candidate

187 scaffolds are about 7.88M, 4.50M, 2.05M and 0.61M, respectively. The remaining

188 three scaffolds contained none of telomeric repeats, possibly due to incompleteness of bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

189 the scaffold sequence data (Table 2, Table S1). Furthermore, on scaffold 7, 14 genes

190 were identified which belonged to the core genes of mitochondrial genome, indicating

191 that this scaffold represent mitochondrial genome sequence. Previous researches

192 showed that the haploid genome of C. militaris contains seven chromosomes (Kramer

193 et al. 2017), and Cordyceps subsessilis (=) also contains seven

194 chromosomes (Stimberg et al. 1992). Taking into consideration of the chromosome

195 number in the above related species and the present telomeric repeats analysis, it was

196 inferred that C. guangdongensis may possibly also contain seven chromosomes, and

197 this hypothesis should be further proved by karyotype analysis.

198 Genome features and annotation

199 As shown in Table 3, a total of 9150 protein-coding genes were predicted in the

200 genome, including 31 rRNA, 111 tRNA, 121 sRNA, 25 snRNA and 26 miRNA. The

201 cumulative length of total genes accounted for 56.35% of the whole genome sequence

202 length, and the lengths of most genes were in the range of 200-5000bp (Fig. S3). The

203 exons showed a large proportion (48.29%) with the max number of 29,548, and the

204 number of introns was 20,398, with a total length of 2.34M (8.06%). The ncRNA with

205 a total number of 314 represented 0.4% of the genome assembly, suggesting that they

206 contributed only a small proportion to the overall genome size (Fig. 1B).

207 Of the 9150 identified genes, 8486 genes (92.74%) were annotated by the

208 databases described in the methods (Table S2). The present paper was mainly focused

209 on the genes involved in metabolic process. Among all the predicted genes,

210 approximately 37.22% were annotated by KEGG pathway, including 15.49% bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

211 involved in metabolism, accounting for the major proportion. Genes classified into

212 functional categories based on the COG analysis accounted for 23.83%, including

213 13.70% involved in metabolic processes. Thereinto, approximately 2.01% of total

214 genes were related to the biosynthesis, transportation, and catabolism of secondary

215 metabolites (Fig. 2). The percentage of genes encoding CAZymes was 2.84%, which

216 contributed to substrate degradation processes as nutrition for fungal development and

217 reproduction. Among the genes related with CAZymes, 103 genes encoding glycoside

218 hydrolases (GHs) accounted for the largest proportion of 1.12% of the total genes,

219 followed by 78 genes encoding carbohydrate-binding modules (CBMs) accounting for

220 0.85%,.and then 66 genes encoding glycosyl transferases (GTs) with a percentage of

221 0.72%. Genes belong to the carbohydrate esterases (CEs) and polysaccharide lyases

222 (PLs) had much lower percentages of about 0.13% and 0.01%, respectively. Since the

223 genes relevant to CAZymes in Pleurotus eryngii were not only involved in XXXX,

224 but also involved in its primordium differentiation and fruiting body development

225 (Xie et al. 2017), the above identified genes of CAZymes in C. guangdongensis could

226 likely also be involved in the primordium differentiation and fruiting body

227 development, and the studied results in this paper will be beneficial to further study

228 on the genetic and molecular mechanisms of the fruiting body development.

229 Repetitive elements

230 The cumulative sequences of repetitive elements identified in C. guangdongensis

231 genome occupied a percentage of 2.77% of the assembly sequences. The tandem

232 repeats represented 1.42% of the genome assembly with a total length of 412,989 bp; bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

233 and the TEs represented 1.35% of the genome assembly with a total length of 393,608

234 bp. The total number of TE families analyzed by RepeatMasker in the genome

235 assembly is 1534, of which 1527 (99.5%) belonged to the known TEs, including 1033

236 retrotransposons (Class I) and 494 DNA transposons (Class II); while the remaining

237 TEs could not be classified at this moment (Table 4 and Table S3).

238 The retrotransposons in Class I can be mainly divided into three groups of TEs,

239 including LINE, LTR and SINE; and each group contains several subgroups.

240 Retrotransposons, particularly L1, Copia, DIRS, ERV1, Gypsy, Pao, Alu and so on,

241 are the easiest ones to be annotated, which are also the most abundant transposons in

242 fungi (Suarez et al. 2017). The DNA transposons in Class II contain a lot of known

243 groups and some unclassified members. Among the transposons, hAT, MULE,

244 PIF-Harb and Tc1-Mariner were reported to be extraordinarily abundant in fungi,

245 while the transposons CMC and piggyBac have limited taxonomic distribution and

246 seem to remain in a few fungal taxa (Muszewska et al. 2017). Other transposons,

247 including P, Sola, Dada, Ginger, Zisupton and Merlin, had been identified only in a

248 handful of species (Kojima and Jurka 2013; Iyer et al. 2014; Majorek et al. 2014).

249 Previous studies indicated that TEs contributed to genome size expansion and

250 evolution (Cordaux et al. 2009; Sun et al. 2012), and played a crucial roles in a wide

251 range of biological events, including organism development (Kano et al. 2009;

252 Garcia-Perez et al. 2016), regulation (Elbarbary et al. 2016) and differentiation

253 (Morales-Hernandez et al. 2016), sometimes acting as novel promoters to activate

254 transcriptive process (Faulkner et al. 2009; Mita et al. 2016). Therefore, the bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

255 abundance in C. guangdongensis would have more significance in this regard, and

256 they should be further noticed for studying both fungal taxonomy application and

257 regulatory roles in fruiting body development by bonding to the transcription factors.

258 Transcription factors

259 Transcription factors (TFs) are essential to modulating diverse biological

260 processes by regulating gene expression and play central roles in the development and

261 evolution. In this study, function annotation identified 435 genes to be TFs in C.

262 guangdongensis, accounting for 4.75% of the total predicted genes. Like other fungi,

263 genes encoding fungal-specific TFs (86 members) and fungal transcription regulatory

264 proteins (80 members) represented the two largest classes of TFs in C.

265 guangdongensis, followed by C2H2-type Zinc finger TFs (54 members) and Winged

266 helix-repressor DNA binding proteins (54 members), accounting for approximately

267 19.77%, 18.39%, 12.41% and 12.41% of the total predicted TFs, respectively (Fig. 3).

268 Moreover, other different types of Zinc finger transcription factors were

269 identified, including 13 CCHC-type Zinc finger TFs (2.98%), 6 DHHC-type TFs

270 (0.06%) and 3 MIZ-type TFs (0.03%). Apart from these, there were 19 bZIP TFs

271 (0.21%), 16 MYB TFs (0.17%), 7 GATA TFs (0.08%) and 9 Homeobox-type TFs

272 (0.10%). In C. militaris, majority of TFs, such as Zn2Cys6-type TFs, GATA-type TFs,

273 bZIP TFs, CHCC-type TFs, were differential expressions during fruiting body

274 developmental stages (Zheng et al. 2011). Hence, the known information about

275 various TFs could guide the search for exploring the regulatory mechanism of TFs on

276 fruiting body development in C. guangdongensis. bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

277 Conclusion

278 High quality of genome sequencing of C. guangdongensis was presented in this

279 study. Two complete chromosomes and four single-end chromosomes were assembled

280 through sequence analysis of the genome. In the genomic sequences, diversities of

281 transposable elements were identified, which may contribute to the genome size and

282 evolution. Moreover, transcription factors existing in C. guangdongensis were

283 identified and classified, which may facilitate the further studying of fruiting body

284 development. With the knowledge of the whole genome sequence, more detailed

285 molecular information will be revealed, providing interesting insights into better

286 understanding the relevance of phenotypic characters and genetic mechanism in C.

287 guangdongensis.

288 ACKNOWLEDGMENTS

289 This work was supported by GDAS' Special Project of Science and Technology

290 Development (No. 2017GDASCX-0822), Natural Science Foundation of Guangdong

291 Province, China (2017A030310533), the Science and Technology Planning Project of

292 Guangdong Province, China (No. 2015A030302052; 2016A030303035), the National

293 Natural Science Foundation of China (No. 31470155), and the Science and

294 Technology Planning Project of Guangzhou, China (No. 201504291620324). The

295 authors sincerely thank the professor Chengshu Wang in Shanghai Institute for

296 Biological Sciences for providing the guidance of genome sequencing.

297 LITERATURE CITED bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

298 Bard, J., and R. Winter, 2000 Gene Ontology:tool for the unification of biology. Nat

299 Genet. 25: 25-29.

300 Benson, G., 1999 Tandem repeats finder: a program to analyze DNA sequences.

301 Nucleic Acids Res. 27: 573-580.

302 Bushley, K. E., R. Raja, P. Jaiswal, J. S. Cumbie, M. Nonogaki et al., 2013 The

303 Genome of Tolypocladium inflatum: Evolution, Organization, and Expression of

304 the Cyclosporin Biosynthetic Gene Cluster. PLoS Genet. 9: e1003496.

305 Cantarel, B. L., P. M. Coutinho, C. Rancurel, T. Bernard, V. Lombard et al., 2009 The

306 Carbohydrate-Active EnZymes database (CAZy): an expert resource for

307 Glycogenomics. Nucl Acids Res. 37: 233-238.

308 Chen, L. H., D. D. Zheng, B. Liu, J. Yang and Q. Jin, 2016 VFDB 2016 hierarchical

309 and refined dataset for big data analysis-10 years on. Nucleic Acids Res. 44:

310 694-697.

311 Chin, C.S., D.H. Alexander, P. Marks, A.A. Klammer, J. Drake et al., 2013 Nonhybrid,

312 finished microbial genome assemblies from long-read SMRT sequencing data.

313 Nat Methods. 10: 563-569.

314 Cordaux, R., and M.A. Batzer, 2009 The impact of retrotransposons on human

315 genome evolution. Nat Rev Genet. 10: 691-703.

316 Elbarbary, R. A., B.A. Lucas, and L.E. Maquat, 2016 Retrotransposons as regulators

317 of gene expression. Science. 351:aac7247. bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

318 Faulkner, G.J., Y. Kimura, C.O. Daub, S. Wani, C. Plessy et al., 2009 The regulated

319 retrotransposon transcriptome of mammalian cells. Nat Genet. 41: 563-571.

320 Garcia-Perez, J. L., T.J. Widmann, and I.R. Adams, 2016 The impact of transposable

321 elements on mammalian development. Development. 143: 4101-4114.

322 Gardner, P. P, J. Daub, J. G. Tate, E. P. Nawrocki, D. L. Kolbe et al., 2009 Rfam:

323 updates to the RNA families database. Nucl Acids Res. 37: 136-140.

324 Iyer L. M., D. Zhang, R. F. de Souza, P. J. Pukkila, A Rao et al., 2014

325 Lineage-specific expansions of TET/JBP genes and a new class of DNA

326 transposons shape fungal genomic and epigenetic landscapes. Proc Natl Acad Sci

327 U S A. 111: 1676-1683.

328 Kanehisa, M., S. Goto, M. Hattori, K. F. Aoki-Kinoshita, M. Itoh et al., 2006 From

329 genomics to chemical genomics: new developments in KEGG. Nucl Acids Res.

330 34: 354-357.

331 Kano, H., I. Godoy, C. Courtney, M.R. Vetter, G.L. Gerton et al., 2009 L1

332 retrotransposition occurs mainly in embryogenesis and creates somatic

333 mosaicism. Genes Dev. 23: 1303-1312.

334 Kojima, K. K., and J. Jurka, 2013 A superfamily of DNA transposons targeting

335 multicopy small RNA genes. PloS ONE. 8: e68260.

336 Kramer G. J., and J. R. Nodwell, 2017 Chromosome level assembly and secondary

337 metabolite potential of the parasitic fungus . BMC Genomics. bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

338 18: 912-921.

339 Lagesen, K., P. F. Hallin, E. A. Rødland, H. H. Staerfeldt, T. Rognes et al., 2007

340 RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucl Acids

341 Res. 35: 3100-3108.

342 Lin, Q. Y., T. H. Li, and B. Song, 2008 Cordyceps guangdongensis sp. nov. from

343 China. Mycotaxon.103: 371-376.

344 Lin, Q. Y., T. H. Li, B. Song, and H. Huang, 2009 Comparison of selected chemical

345 component levels in Cordyceps guangdongensis, C .sinensis and C .militaris.

346 Acta Edulis Fungi. 16: 54-57.

347 Lin, Q. Y., B. Song, H. Huang, and T. H. Li, 2010 Optimization of selected cultivation

348 parameters for Cordyceps guangdongensis. Lett Appl Microbiol. 51: 219-225.

349 Liu, B., and M. Pop, 2009 ARDB-Antibiotic Resistance Genes Database. Nucleic

350 Acids Res. 37: 443-447.

351 Lowe, T. M., and S. R. Eddy, 1997 tRNAscan-SE: A Program for Improved Detection

352 of Transfer RNA Genes in Genomic Sequence. Nucl Acids Res. 25: 0955-964.

353 Magrane, M., and U. Consortium, 2011 UniProt Knowledgebase: a hub of integrated

354 protein data. Database (Oxford), bar009.

355 Majorek K. A., S. Dunin-Horkawicz, K. Steczkiewicz, A. Muszewska, M. Nowotny et

356 al., 2014 The RNase H-like superfamily: new members, comparative structural

357 analysis and evolutionary classification. Nucleic Acids Res. 42: 4160-4179. bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

358 Mita, P., and J.D. Boeke, 2016 How retrotransposons shape genome regulation. Curr.

359 Opin. Genet. Dev. 37: 90-100.

360 Morales-Hernandez, A., F.J. Gonzalez-Rico, A.C. Roman, E. Rico-Leo, A.

361 AlvarezBarrientos et al., 2016 Alu retrotransposons promote differentiation of

362 human carcinoma cells through the aryl hydrocarbon receptor. Nucleic Acids Res.

363 44: 4665-4683.

364 Muszewska A., K. Steczkiewicz, M. Stepniewska-Dziubinska1, and K. Ginalski.,

365 2017 Cut-and-Paste transposons in fungi with diverse lifestyles. Genome Biol.

366 Evol. 9: 3463-3477.

367 Quandt, C. A., K. E. Bushley, and J. W. Spatafora, 2015 The genome of the

368 truffle-parasite Tolypocladium ophioglossoides and the evolution of antifungal

369 peptaibiotics. BMC Genomics. 16: 553-556.

370 Smit A. F. A., R. Hubley, and P. Green, 2014 RepeatMasker Open-4.0. 2013-2015.

371 URL http://www. repeatmasker. org.

372 Stimberg N., M. Walz, K. Schörgendorfer, and U. Kiick., 1992 Electrophoretic

373 karyotyping from Tolypocladium inflatum and six related strains allows

374 differentiation of morphologically similar species. Appl Microbiol Biotechnol.

375 37:485-9.

376 Suarez, N. A., A. Macia, and A. R. Muotri, 2017 LINE-1 Retrotransposons in healthy

377 and diseased human brain. Dev Neurobiol. 1-22. bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

378 Sun, C., D.B. Shepard, R.A. Chong, J. Lopez Arriaza, K. Hall et al., 2012 LTR

379 retrotransposons contribute to genomic gigantism in plethodontid salamanders.

380 Genome Biol Evol. 4: 168-183.

381 Tatusov, R. L., N. D. Fedorova, J. D. Jackson, A. R. Jacobs, B. Kiryutin et al., 2003

382 The COG database: an updated version includes eukaryotes. BMC

383 Bioinformatics. 4: 41-54.

384 Ter-Hovhannisyan, V., A. Lomsadze, Y. O. Chernoff, and M. Borodovsky, 2008 Gene

385 prediction in novel fungal genomes using an ab initio algorithm with

386 unsupervised training. Genome Res. 18: 1979-1990.

387 Torto-Alalibo, T., C. W. Collmer and M. Gwinn-Giglio, 2009 The Plant Associated

388 Microbe Gene Ontology (PAMGO) Consortium: community development of

389 new Gene Ontology terms describing biological processes involved in

390 microbe-host interactions. BMC microbiol. 9: 1-5.

391 Watanabe, M., K. Lee, K. Goto, S. Kumagai, Y. Sugita-Konishi et al., 2010 Rapid and

392 effective DNA extraction method with bead grinding for a large amount of

393 fungal DNA. J Food Protect. 73: 1077-1084.

394 Vargas, W. A., J. M. Martín, G. E. Rech, L. P. Rivera, E. P. Benito et al., 2012 Plant

395 defense mechanisms are activated during biotrophic and necrotrophic

396 development of Colletotricum graminicola in maize. Plant Physiol. 158:

397 1342-58. bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

398 Xia, E. H., D. R. Yang, J. J. Jiang, Q. J. Zhang, Y. Liu et al., 2017 The caterpillar

399 fungus, , genome provides insights into highland

400 adaptation of fungal pathogenicity. Scientific Reports.7:1806-1816.

401 Xie C., W. Gong, Z. Zhu, L. Yan, Z. Hu et al., 2017 Comparative transcriptomics of

402 Pleurotus eryngii reveals blue-light regulation of carbohydrate-active enzymes

403 (CAZymes) expression at primordium differentiated into fruiting body stage.

404 Genomics.

405 Yan, W. J., T. H. Li, and Z. D. Jiang, 2011 Anti-fatigue and life-prolonging effects of

406 Cordyceps guangdongensis. Food R & D. 32: 164-167.

407 Yan, W. J., T. H. Li, and Z. D. Jiang, 2012 Therapeutic effects of Cordyceps

408 guangdongensis on chronic renal failure rats induced by adenine. Mycosystema.

409 31: 432-442.

410 Yan, W. J., T. H. Li, J. H. Lao, B. Song, and Y. H. Shen, 2013 Anti-fatigue property of

411 Cordyceps guangdongensis and the underlying mechanisms. Pharm Biol. 51:

412 614-620.

413 Yan, W. J., T. H. Li, and Z. Y. Zhong, 2014 Anti-inflammatory effect of a novel food

414 Cordyceps guangdongensis on experimental rats with chronic bronchitis induced

415 by tobacco smoking. Food Funct. 5: 2552-2557.

416 Yang T., M. M. Guo, H. J. Yang, S. P. Guo, and C. H. Dong, 2016 The blue-light

417 receptor CmWC-1 mediates fruit body development and secondary metabolism bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

418 in Cordyceps militaris. Appl Microbiol and Biot. 100: 743-755.

419 Yin, Y., G. Yu, Y. Chen, S. Jiang, M. Wang et al., 2012 Genome-wide transcriptome

420 and proteome analysis on different developmental stages of Cordyceps militaris.

421 PLoS ONE. 7: e51853.

422 Zeng, H. B., T. H. Li, B. Song, Q. Y. Lin, and H. Huang, 2009 Study on antioxidant

423 activity of Cordyceps guangdongensis. Nat Prod Res Dev. 21: 201-204.

424 Zheng, P., Y. Xia, G. Xiao, C. H. Xiong, X. Hu et al., 2011 Genome sequence of the

425 pathogenic fungus Cordyceps militaris, a valued traditional Chinese

426 medicine. Genome Biol. 12: 116-137.

427 Table 1 Assembly summary statistics of Cordyceps guangdongensis GD15 compared

428 to other Cordyceps genomes.

429 Table 2 Chromosome analysis of Cordyceps guangdongensis GD15 genomic

430 sequence.

431 Table 3 Genome annotation features of Cordyceps guangdongensis GD15.

432 Table 4 Class analysis of transposable elements repeats in Cordyceps

433 guangdongensis.

434 Table S1 Genome sequences used to analyze the chromosome

435 Table S2 Genome annotation of proteins in Cordyceps guangdongensis genome

436 Table S3 Transposable elements classification in Cordyceps guangdongensis

437 Figure legends bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

438 Figure 1 General genomic feature of Cordyceps guangdongensis. A, I, scaffolds; II,

439 gene density represented as the number of genes per 100 kb; III, percentage of

440 coverage of repetitive sequences; IV, GC content estimated by the percentage of G +

441 C in 100 kb. B, Genomic element density including genic and nongenic features of the

442 overall genome assembly length including 40.5% non-annotated sequences.

443 Figure 2 COG functional classification of proteins in the Cordyceps guangdongensis

444 genome.

445 Figure 3 Transcription factors analysis in the Cordyceps guangdongensis genome.

446 Figure S1 Length and quality distributions of PacBio reads.

447 Figure S2 Length statistics of scaffolds generated by genomic sequencing.

448 Figure S3 Distribution of gene length predicted in the Cordyceps guangdongensis

449 genome. bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.