bioRxiv preprint doi: https://doi.org/10.1101/254243; this version posted January 25, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Whole genome sequence of an edible and potential medicinal fungus, Cordyceps guangdongensis
Chenghua Zhang, Wangqiu Deng, Wenjuan Yan, Taihui Li¹
State Key Laboratory of Applied Microbiology Southern China, Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, Guangdong Open Laboratory of Applied Microbiology, Guangdong Institute of Microbiology, Guangzhou, 510070, China
References number: 47
Running title: Genome of Cordyceps guangdongensis
Keywords: Cordyceps, chromosome, transporters, transcription factors, Genome Report
¹Corresponding author: Guangdong Institute of Microbiology, Building 59, No. 100 Courtyard, Xianlie Zhong Road, Yuexiu District, Guangzhou City, P.R. China, 510070. E-mail: [email protected]. Tel: 020-87137620
ABSTRACT
Cordyceps guangdongensis is an edible fungus that was approved as a Novel Food by the Chinese Ministry of Public Health in 2013. Its many reported medicinal activities also give it broad application prospects in the pharmaceutical industry. In this study, the whole genome of C. guangdongensis GD15, a single-spore isolate from a wild strain, was sequenced and assembled using Illumina and PacBio sequencing technologies. The resulting genome is 29.05 Mb in size, comprises 9 scaffolds with an average GC content of 57.01%, and is predicted to contain 9150 protein-coding genes. Sequence identification and comparative analysis indicated that the assembled scaffolds include two complete chromosomes and four single-end chromosomes, reflecting a high-level assembly. Gene annotation revealed a diversity of transposable elements that could contribute to genome size and evolution. In addition, approximately 15.49% and 13.70% of genes were annotated by KEGG and COG, respectively, as being involved in metabolic processes. Genes encoding CAZymes accounted for 2.84% of the total genes. A total of 435 transcription factors (TFs) involved in various biological processes were also identified. Among them, fungal transcription regulatory proteins (18.39%) and fungal-specific TFs (19.77%) represented the two largest classes. These data provide a much-needed genomic resource for studying C. guangdongensis and lay a solid foundation for further genetic and biological studies, especially for elucidating genome evolution and exploring the regulatory mechanisms of fruiting body development.
INTRODUCTION
Cordyceps guangdongensis T. H. Li, Q. Y. Lin & B. Song (Cordycipitaceae) was discovered in southern China (Lin et al. 2008) and has been successfully cultivated (Lin et al. 2010). Its fruiting body is nontoxic, and in 2013 it became the second Cordyceps species approved as a Novel Food by the Ministry of Public Health of China. The fungus is rich in nutrients and bioactive compounds such as cordycepic acid, adenosine and polysaccharides (Lin et al. 2009; 2010). The contents of these bioactive compounds in C. guangdongensis are similar to those of the traditional Chinese invigorant Ophiocordyceps sinensis (=Cordyceps sinensis) (Lin et al. 2009). Previous research by the authors' group indicated that the fruiting bodies of C. guangdongensis possess various therapeutic properties, including antioxidant activity (Zeng et al. 2009), longevity-increasing activity (Yan et al. 2011), an anti-fatigue effect (Yan et al. 2013), a curative effect on chronic renal failure (Yan et al. 2012), and an anti-inflammatory effect (Yan et al. 2014). These activities give the fungus great potential for application in the food and medicinal industries. However, it remains unclear what makes the fruiting bodies of C. guangdongensis so valuable and how fruiting body yield can be improved. It is therefore important to better understand the mechanisms of fruiting body development and metabolism in C. guangdongensis, as well as its evolutionary relationships with related species.
In recent years, whole genome sequencing (WGS) has been widely used to analyze the relationship between phenotypic characters and genetic mechanisms. Advanced sequencing techniques and rapidly developing bioinformatic methods have become practical means of exploring the molecular mechanisms of fungal development and metabolism, as well as the systematic taxonomy and evolutionary relationships of fungi. To date, numerous genomes of fungi in the Hypocreales have been released in the Ensembl Fungi database (http://fungi.ensembl.org/index.html). Based on the available genome sequences, researchers have not only ascertained the evolutionary relationships of many fungi (Zheng et al. 2011; Bushley et al. 2013; Xia et al. 2017), but also identified many medically active components (Zheng et al. 2011; Yin et al. 2012; Bushley et al. 2013; Quandt et al. 2015). Meanwhile, various transcription factors (TFs), including bZIP TFs, zinc finger TFs and fungal-specific TFs, have been shown to be involved in fruiting body development by transcriptome analyses based on genome sequences (Zheng et al. 2011; Yin et al. 2012; Yang et al. 2016). However, a whole genome sequence for C. guangdongensis is still lacking.
To obtain the molecular information needed to explore the relationship between phenotypic characters and genetic mechanisms in C. guangdongensis, its whole genome was sequenced for the first time in this study using a combination of the Illumina HiSeq 2000 and PacBio platforms. The assembly level, the types of transposable elements and the transcription factors (TFs) were then analyzed. This genomic resource provides new insight for further genetic research and will facilitate the application of the fungus in many areas of biology.
MATERIAL AND METHODS
Fungal strains and DNA extraction
The sample used for whole genome sequencing and assembly was isolated from strain GDGM30035 (wild fruiting bodies of C. guangdongensis). The strain was cultured on PDA medium at 23±1°C for four weeks. Aqueous suspensions of fungal spores were prepared by pouring sterile distilled water onto the sporulated cultures and gently scraping the agar surface. The spore suspension was passed through four layers of sterile cheesecloth to remove mycelial fragments, adjusted to 1×10³ conidia ml⁻¹ using a hemocytometer, and spread on PDA medium covered with cellophane. A single colony (C. guangdongensis GD15) was transferred onto fresh PDA medium and subcultured three times. For DNA isolation, the strain was cultured at 23°C on PDA medium covered with cellophane. Genomic DNA was extracted from 7-day-old fungal colonies using a CTAB-based extraction buffer (Watanabe et al. 2010). The DNA concentration was determined using a UV-Vis spectrophotometer (BioSpec-nano), DNA integrity was checked on a 0.8% agarose gel, and DNA purity was assessed by PCR using 16S rDNA primers.
Genome sequencing and assembly
The genome of C. guangdongensis GD15 was sequenced using a hybrid of the next-generation Illumina HiSeq 2000 technology and the third-generation PacBio platform at the Beijing Genomics Institute at Shenzhen. DNA libraries with 500 bp inserts were constructed and sequenced on the Illumina HiSeq 2000 Genome Analyzer. Long-insert SMRTbell template libraries were prepared according to PacBio protocols. Unqualified raw PacBio reads were filtered out, and the subreads (≥1000 bp) were corrected with Proovread 2.12 (https://github.com/BioInf-Wuerzburg/proovread) and then initially assembled with SMRT Analysis v.2.3.0 (Chin et al. 2013). The preliminary assembly was further corrected using small Illumina reads with GATK v1.6-13 (http://www.broadinstitute.org/gatk/), and the scaffolds were assembled and optimized using long Illumina reads with SSPACE Basic v2.0 (http://www.baseclear.com/genomics/bioinformatics/basetools/SSPACE) and PBJelly2 v15.8.24 (https://sourceforge.net/projects/pb-jelly/).
Genome components analysis
The characteristic telomeric repeats (TTAGGG/CCCTAA) were searched for within 100 bp of both ends of each scaffold. Repetitive elements included tandem repeats and transposable elements (TEs). Tandem repeats were searched for in all scaffolds with Tandem Repeats Finder (TRF 4.04; Benson 1999). TE annotation was performed with RepeatMasker 4.06 (Smit et al. 2014) based on the Repbase database (http://www.girinst.org/repbase/). tRNAs were predicted with tRNAscan-SE 1.3.1 (Lowe and Eddy 1997), rRNAs were identified using RNAmmer 1.2 (Lagesen et al. 2007), and sRNAs were predicted with Infernal based on the Rfam database (Gardner et al. 2009). Genes were annotated based on sequence homology and de novo gene prediction. The homology approach used reference genomes downloaded from EnsemblFungi (http://fungi.ensembl.org/index.html), including the protein sequences of C. militaris, O. sinensis and Cordyceps ophioglossoides (=Tolypocladium ophioglossoides). De novo gene prediction was performed with GeneMark-ES 4.21 (Ter-Hovhannisyan et al. 2008).
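The telomere search described at the start of this subsection can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and the requirement of at least two tandem copies of the motif (`min_copies`) is an assumption introduced here to avoid chance single hits. Only the motifs and the 100 bp window come from the text.

```python
from typing import Dict, Tuple

TELOMERE_MOTIFS = ("TTAGGG", "CCCTAA")  # motifs named in the text
WINDOW = 100  # bp inspected at each scaffold end, as stated in the text


def has_telomere(window: str, min_copies: int = 2) -> bool:
    """True if the window holds min_copies tandem copies of either motif.

    min_copies is an illustrative threshold, not a value from the paper.
    """
    window = window.upper()
    return any(motif * min_copies in window for motif in TELOMERE_MOTIFS)


def classify_scaffolds(scaffolds: Dict[str, str]) -> Dict[str, Tuple[bool, bool]]:
    """Map scaffold name -> (telomere at start, telomere at end)."""
    return {
        name: (has_telomere(seq[:WINDOW]), has_telomere(seq[-WINDOW:]))
        for name, seq in scaffolds.items()
    }
```

Under this scheme a scaffold flagged at both ends would be a candidate complete chromosome, and one flagged at a single end would be a candidate single-end chromosome.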
Functional annotation
Structural and functional annotation of genes was performed against a range of databases, including ARDB (Antibiotic Resistance Genes Database) (Liu and Pop 2009), CAZymes (Carbohydrate-Active enZYmes Database) (Cantarel et al. 2009), COG (Clusters of Orthologous Groups) (Tatusov et al. 2003), GO (Gene Ontology) (Bard and Winter 2000), KEGG (Kyoto Encyclopedia of Genes and Genomes) (Kanehisa et al. 2006), NR (Non-Redundant Protein Database), P450 (Magrane and UniProt Consortium 2011), PHI (Pathogen Host Interactions) (Torto-Alalibo et al. 2009), SwissProt (Magrane and UniProt Consortium 2011), T3SS (Type III Secretion System effector proteins) (Vargas et al. 2012), TrEMBL, and VFDB (Virulence Factors of Pathogenic Bacteria) (Chen et al. 2016). Transcription factors (TFs) were analyzed based on the TRANSFAC database (http://www.gene-regulation.com/pub/databases.html#transfac).
Data availability
The genome sequencing project has been deposited at GenBank under accession number NRQP00000000. The BioProject designation for this project is PRJNA399600.
RESULTS AND DISCUSSION
Whole-genome assembly
A total of 3,926,378,523 reads representing a cumulative size of 3.926 Gb were generated, including 13,392,532 and 3,912,985,991 reads obtained from the Illumina and PacBio sequencing platforms, respectively. The PacBio sequencing results showed high quality for both polymerase reads and subreads (Fig. S1). After low-quality reads were filtered out, a total of 3,484,503,143 reads were assembled into 9 scaffolds with an N50 of 7.88 Mb at ∼183× average coverage, giving a total length of 29.05 Mb with a GC content of 57.01% (Fig. 1A). Based on the Illumina sequencing data, the genome size predicted by K-mer analysis was 31.58 Mb; the total size of the combined assembly closely matched this estimate (91.98%). In addition, compared with the previously reported draft genomes listed in Table 1, the new assembly was more contiguous and had a higher GC content.
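As a rough illustration of how the N50 value relates to the scaffold lengths, the sketch below computes N50 in Python. The function is a generic definition, not the authors' tool. The six lengths are those quoted in the Chromosome analysis section; the three remaining scaffolds are not itemized in the text, so the reported assembly size is supplied as the total, which is an assumption of this sketch.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[float], total: Optional[float] = None) -> float:
    """Return the length L such that scaffolds of length >= L together
    cover at least half of the assembly (the N50)."""
    ordered = sorted(lengths, reverse=True)
    if total is None:
        total = sum(ordered)
    running = 0.0
    for length in ordered:
        running += length
        if running >= total / 2:
            return length
    raise ValueError("lengths do not reach half of the stated total")


# Scaffold lengths in Mb as quoted in the text (six telomere-bearing scaffolds).
reported_mb = [8.81, 5.00, 7.88, 4.50, 2.05, 0.61]
assembly_mb = 29.05  # reported total assembly size

print(n50(reported_mb, total=assembly_mb))  # 7.88
```

The two longest scaffolds (8.81 Mb and 7.88 Mb) already cover more than half of 29.05 Mb, which is consistent with the reported N50 of 7.88 Mb.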
Chromosome analysis
Sequence analysis of telomeric repeats was used to estimate the number of chromosomes in the C. guangdongensis genome, following the method described by Zheng et al. (2011). The characteristic telomeric repeats (TTAGGG/CCCTAA)n were found at the 5' or 3' terminus of 6 scaffolds. In scaffolds 1 and 3 the repeats were found at both ends, suggesting that these two scaffolds are complete chromosomes; their lengths were about 8.81 Mb and 5.00 Mb, respectively. Single-ended telomeric repeats were found in four scaffolds (at the start of scaffolds 4 and 6 and at the end of scaffolds 2 and 5), suggesting that these four scaffolds extend to a telomere. The lengths of these four candidate scaffolds are about 7.88 Mb, 4.50 Mb, 2.05 Mb and 0.61 Mb, respectively. The remaining three scaffolds contained no telomeric repeats, possibly because of incomplete scaffold sequence data (Table 2, Table S1). Furthermore, 14 genes belonging to the core genes of the mitochondrial genome were identified on scaffold 7, indicating that this scaffold represents the mitochondrial genome sequence. Previous studies showed that the haploid genome of C. militaris contains seven chromosomes (Kramer and Nodwell 2017), and that Cordyceps subsessilis (=Tolypocladium inflatum) also contains seven chromosomes (Stimberg et al. 1992). Considering the chromosome numbers in these related species together with the present telomeric repeat analysis, it is inferred that C. guangdongensis may also contain seven chromosomes; this hypothesis should be tested by karyotype analysis.
Genome features and annotation
As shown in Table 3, a total of 9150 protein-coding genes were predicted in the genome, together with 31 rRNAs, 111 tRNAs, 121 sRNAs, 25 snRNAs and 26 miRNAs. The cumulative length of all genes accounted for 56.35% of the whole genome sequence, and most genes were 200-5000 bp long (Fig. S3). Exons made up a large proportion of the genome (48.29%), with a total of 29,548 exons; there were 20,398 introns with a total length of 2.34 Mb (8.06%). The 314 ncRNAs represented 0.4% of the genome assembly, contributing only a small proportion of the overall genome size (Fig. 1B).
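The counts in this paragraph are mutually consistent, which a few lines of Python can confirm. This is only an arithmetic check on the figures quoted above; the variable names are illustrative, and the exon/intron relation assumes each gene is represented by a single transcript model.

```python
genes = 9150
exons = 29_548
introns = 20_398
ncrna = {"rRNA": 31, "tRNA": 111, "sRNA": 121, "snRNA": 25, "miRNA": 26}

# A gene with k exons has k - 1 introns, so summed over all genes the
# intron count equals exons minus genes (one transcript model per gene).
assert exons - genes == introns

# The five ncRNA classes add up to the reported ncRNA total of 314.
assert sum(ncrna.values()) == 314

mean_exons = exons / genes
print(f"mean exons per gene: {mean_exons:.2f}")
```

The agreement between the exon, intron and gene counts supports reading 29,548 as the total number of exons in the annotation.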
Of the 9150 predicted genes, 8486 (92.74%) were annotated by the databases described in the Methods (Table S2). The present paper focuses mainly on genes involved in metabolic processes. Among all predicted genes, approximately 37.22% were annotated to KEGG pathways, including 15.49% involved in metabolism, which was the largest category. Genes classified into functional categories by COG analysis accounted for 23.83%, including 13.70% involved in metabolic processes. Of these, approximately 2.01% of total genes were related to the biosynthesis, transport and catabolism of secondary metabolites (Fig. 2). Genes encoding CAZymes made up 2.84% of the total; these enzymes contribute to substrate degradation, which supplies nutrients for fungal development and reproduction. Among the CAZyme genes, the 103 genes encoding glycoside hydrolases (GHs) formed the largest group (1.12% of total genes), followed by 78 genes encoding carbohydrate-binding modules (CBMs; 0.85%) and 66 genes encoding glycosyl transferases (GTs; 0.72%). Genes belonging to the carbohydrate esterases (CEs) and polysaccharide lyases (PLs) were much less frequent, at about 0.13% and 0.01%, respectively. Because the CAZyme genes in Pleurotus eryngii are involved not only in XXXX but also in primordium differentiation and fruiting body development (Xie et al. 2017), the CAZyme genes identified in C. guangdongensis may likewise be involved in primordium differentiation and fruiting body development, and the results presented here will benefit further study of the genetic and molecular mechanisms of fruiting body development.
Repetitive elements
Repetitive elements identified in the C. guangdongensis genome together made up 2.77% of the assembly. Tandem repeats represented 1.42% of the genome assembly, with a total length of 412,989 bp, and TEs represented 1.35%, with a total length of 393,608 bp. RepeatMasker identified a total of 1534 TE families in the genome assembly, of which 1527 (99.5%) belonged to known TEs, including 1033 retrotransposons (Class I) and 494 DNA transposons (Class II); the remaining TEs could not yet be classified (Table 4 and Table S3).
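The proportions above follow directly from the reported lengths and counts. The short Python sketch below reproduces them; it assumes the rounded assembly size of 29.05 Mb as the denominator (the exact base count is not given in the text), so the last decimal place of any derived value may differ slightly from the published one.

```python
ASSEMBLY_BP = 29.05e6  # reported assembly size, rounded to 0.01 Mb (assumption)

tandem_bp = 412_989
te_bp = 393_608
class_i, class_ii, total_te = 1033, 494, 1534

tandem_pct = 100 * tandem_bp / ASSEMBLY_BP
te_pct = 100 * te_bp / ASSEMBLY_BP
known = class_i + class_ii          # retrotransposons + DNA transposons
unclassified = total_te - known

print(f"tandem repeats: {tandem_pct:.2f}% of assembly")
print(f"TEs:            {te_pct:.2f}% of assembly")
print(f"known TEs:      {known} of {total_te} "
      f"({100 * known / total_te:.1f}%), unclassified: {unclassified}")
```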
The retrotransposons in Class I can be divided into three main groups, LINE, LTR and SINE, each of which contains several subgroups. Retrotransposons, particularly L1, Copia, DIRS, ERV1, Gypsy, Pao and Alu, are the easiest to annotate and are also the most abundant transposons in fungi (Suarez et al. 2017). The DNA transposons in Class II comprise many known groups and some unclassified members. Among them, hAT, MULE, PIF-Harb and Tc1-Mariner have been reported to be extraordinarily abundant in fungi, while CMC and piggyBac have a limited taxonomic distribution and seem to persist in only a few fungal taxa (Muszewska et al. 2017). Other transposons, including P, Sola, Dada, Ginger, Zisupton and Merlin, have been identified in only a handful of species (Kojima and Jurka 2013; Iyer et al. 2014; Majorek et al. 2014). Previous studies indicated that TEs contribute to genome size expansion and evolution (Cordaux and Batzer 2009; Sun et al. 2012) and play crucial roles in a wide range of biological events, including organism development (Kano et al. 2009; Garcia-Perez et al. 2016), regulation (Elbarbary et al. 2016) and differentiation (Morales-Hernandez et al. 2016), sometimes acting as novel promoters to activate transcription (Faulkner et al. 2009; Mita and Boeke 2016). Therefore, the TEs in C. guangdongensis merit further attention, both for their use in fungal taxonomy and for their possible regulatory roles in fruiting body development through binding to transcription factors.
Transcription factors
Transcription factors (TFs) are essential modulators of diverse biological processes through their regulation of gene expression, and they play central roles in development and evolution. In this study, functional annotation identified 435 TF genes in C. guangdongensis, accounting for 4.75% of the total predicted genes. As in other fungi, genes encoding fungal-specific TFs (86 members) and fungal transcription regulatory proteins (80 members) represented the two largest TF classes in C. guangdongensis, followed by C2H2-type zinc finger TFs (54 members) and winged helix-repressor DNA-binding proteins (54 members); these classes accounted for approximately 19.77%, 18.39%, 12.41% and 12.41% of the total predicted TFs, respectively (Fig. 3).
Moreover, other types of zinc finger TFs were identified, including 13 CCHC-type zinc finger TFs (2.98%), 6 DHHC-type TFs (0.06%) and 3 MIZ-type TFs (0.03%). In addition, there were 19 bZIP TFs (0.21%), 16 MYB TFs (0.17%), 7 GATA TFs (0.08%) and 9 Homeobox-type TFs (0.10%). In C. militaris, the majority of TFs, such as Zn2Cys6-type, GATA-type, bZIP and CHCC-type TFs, were differentially expressed during fruiting body developmental stages (Zheng et al. 2011). Hence, existing knowledge of these TFs could guide the exploration of how TFs regulate fruiting body development in C. guangdongensis.
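The class shares quoted in this section can be recomputed from the member counts. The Python sketch below does so against two possible denominators, the 435 TFs and the 9150 predicted genes; the helper name `share` is hypothetical. The shares of the four largest classes match count/435, whereas some of the smaller percentages quoted above appear closer to count/9150, so printing both makes the basis of each figure easier to check.

```python
TOTAL_GENES = 9150
TOTAL_TFS = 435

# Member counts as reported in the text.
tf_counts = {
    "fungal-specific TFs": 86,
    "fungal transcription regulatory proteins": 80,
    "C2H2-type zinc finger TFs": 54,
    "winged helix-repressor DNA-binding proteins": 54,
    "CCHC-type zinc finger TFs": 13,
    "bZIP TFs": 19,
}


def share(count: int, denominator: int) -> float:
    """Percentage of `denominator` represented by `count`."""
    return 100 * count / denominator


for name, count in tf_counts.items():
    print(f"{name}: {share(count, TOTAL_TFS):.2f}% of TFs, "
          f"{share(count, TOTAL_GENES):.2f}% of genes")
```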
CONCLUSION
A high-quality genome sequence of C. guangdongensis is presented in this study. Sequence analysis of the genome showed that two complete chromosomes and four single-end chromosomes were assembled. A diversity of transposable elements was identified in the genomic sequence, which may contribute to genome size and evolution. Moreover, the transcription factors present in C. guangdongensis were identified and classified, which may facilitate further study of fruiting body development. With knowledge of the whole genome sequence, more detailed molecular information will be revealed, improving understanding of the relationship between phenotypic characters and genetic mechanisms in C. guangdongensis.
ACKNOWLEDGMENTS
This work was supported by the GDAS Special Project of Science and Technology Development (No. 2017GDASCX-0822), the Natural Science Foundation of Guangdong Province, China (No. 2017A030310533), the Science and Technology Planning Project of Guangdong Province, China (No. 2015A030302052; 2016A030303035), the National Natural Science Foundation of China (No. 31470155), and the Science and Technology Planning Project of Guangzhou, China (No. 201504291620324). The authors sincerely thank Professor Chengshu Wang of the Shanghai Institute for Biological Sciences for guidance on genome sequencing.
LITERATURE CITED
Bard, J., and R. Winter, 2000 Gene Ontology: tool for the unification of biology. Nat Genet. 25: 25-29.
Benson, G., 1999 Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27: 573-580.
Bushley, K. E., R. Raja, P. Jaiswal, J. S. Cumbie, M. Nonogaki et al., 2013 The Genome of Tolypocladium inflatum: Evolution, Organization, and Expression of the Cyclosporin Biosynthetic Gene Cluster. PLoS Genet. 9: e1003496.
Cantarel, B. L., P. M. Coutinho, C. Rancurel, T. Bernard, V. Lombard et al., 2009 The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res. 37: 233-238.
Chen, L. H., D. D. Zheng, B. Liu, J. Yang, and Q. Jin, 2016 VFDB 2016 hierarchical and refined dataset for big data analysis-10 years on. Nucleic Acids Res. 44: 694-697.
Chin, C. S., D. H. Alexander, P. Marks, A. A. Klammer, J. Drake et al., 2013 Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. 10: 563-569.
Cordaux, R., and M. A. Batzer, 2009 The impact of retrotransposons on human genome evolution. Nat Rev Genet. 10: 691-703.
Elbarbary, R. A., B. A. Lucas, and L. E. Maquat, 2016 Retrotransposons as regulators of gene expression. Science. 351: aac7247.
Faulkner, G. J., Y. Kimura, C. O. Daub, S. Wani, C. Plessy et al., 2009 The regulated retrotransposon transcriptome of mammalian cells. Nat Genet. 41: 563-571.
Garcia-Perez, J. L., T. J. Widmann, and I. R. Adams, 2016 The impact of transposable elements on mammalian development. Development. 143: 4101-4114.
Gardner, P. P., J. Daub, J. G. Tate, E. P. Nawrocki, D. L. Kolbe et al., 2009 Rfam: updates to the RNA families database. Nucleic Acids Res. 37: 136-140.
Iyer, L. M., D. Zhang, R. F. de Souza, P. J. Pukkila, A. Rao et al., 2014 Lineage-specific expansions of TET/JBP genes and a new class of DNA transposons shape fungal genomic and epigenetic landscapes. Proc Natl Acad Sci U S A. 111: 1676-1683.
Kanehisa, M., S. Goto, M. Hattori, K. F. Aoki-Kinoshita, M. Itoh et al., 2006 From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 34: 354-357.
Kano, H., I. Godoy, C. Courtney, M. R. Vetter, G. L. Gerton et al., 2009 L1 retrotransposition occurs mainly in embryogenesis and creates somatic mosaicism. Genes Dev. 23: 1303-1312.
Kojima, K. K., and J. Jurka, 2013 A superfamily of DNA transposons targeting multicopy small RNA genes. PLoS ONE. 8: e68260.
Kramer, G. J., and J. R. Nodwell, 2017 Chromosome level assembly and secondary metabolite potential of the parasitic fungus Cordyceps militaris. BMC Genomics. 18: 912-921.
Lagesen, K., P. F. Hallin, E. A. Rødland, H. H. Staerfeldt, T. Rognes et al., 2007 RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35: 3100-3108.
Lin, Q. Y., T. H. Li, and B. Song, 2008 Cordyceps guangdongensis sp. nov. from China. Mycotaxon. 103: 371-376.
Lin, Q. Y., T. H. Li, B. Song, and H. Huang, 2009 Comparison of selected chemical component levels in Cordyceps guangdongensis, C. sinensis and C. militaris. Acta Edulis Fungi. 16: 54-57.
Lin, Q. Y., B. Song, H. Huang, and T. H. Li, 2010 Optimization of selected cultivation parameters for Cordyceps guangdongensis. Lett Appl Microbiol. 51: 219-225.
Liu, B., and M. Pop, 2009 ARDB-Antibiotic Resistance Genes Database. Nucleic Acids Res. 37: 443-447.
Lowe, T. M., and S. R. Eddy, 1997 tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence. Nucleic Acids Res. 25: 955-964.
Magrane, M., and U. Consortium, 2011 UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford), bar009.
Majorek, K. A., S. Dunin-Horkawicz, K. Steczkiewicz, A. Muszewska, M. Nowotny et al., 2014 The RNase H-like superfamily: new members, comparative structural analysis and evolutionary classification. Nucleic Acids Res. 42: 4160-4179.
Mita, P., and J. D. Boeke, 2016 How retrotransposons shape genome regulation. Curr. Opin. Genet. Dev. 37: 90-100.
Morales-Hernandez, A., F. J. Gonzalez-Rico, A. C. Roman, E. Rico-Leo, A. Alvarez-Barrientos et al., 2016 Alu retrotransposons promote differentiation of human carcinoma cells through the aryl hydrocarbon receptor. Nucleic Acids Res. 44: 4665-4683.
Muszewska, A., K. Steczkiewicz, M. Stepniewska-Dziubinska, and K. Ginalski, 2017 Cut-and-Paste transposons in fungi with diverse lifestyles. Genome Biol. Evol. 9: 3463-3477.
Quandt, C. A., K. E. Bushley, and J. W. Spatafora, 2015 The genome of the truffle-parasite Tolypocladium ophioglossoides and the evolution of antifungal peptaibiotics. BMC Genomics. 16: 553-556.
Smit, A. F. A., R. Hubley, and P. Green, 2014 RepeatMasker Open-4.0. 2013-2015. URL http://www.repeatmasker.org.
Stimberg, N., M. Walz, K. Schörgendorfer, and U. Kück, 1992 Electrophoretic karyotyping from Tolypocladium inflatum and six related strains allows differentiation of morphologically similar species. Appl Microbiol Biotechnol. 37: 485-489.
Suarez, N. A., A. Macia, and A. R. Muotri, 2017 LINE-1 Retrotransposons in healthy and diseased human brain. Dev Neurobiol. 1-22.
378 Sun, C., D.B. Shepard, R.A. Chong, J. Lopez Arriaza, K. Hall et al., 2012 LTR
379 retrotransposons contribute to genomic gigantism in plethodontid salamanders.
380 Genome Biol Evol. 4: 168-183.
381 Tatusov, R. L., N. D. Fedorova, J. D. Jackson, A. R. Jacobs, B. Kiryutin et al., 2003
382 The COG database: an updated version includes eukaryotes. BMC
383 Bioinformatics. 4: 41-54.
384 Ter-Hovhannisyan, V., A. Lomsadze, Y. O. Chernoff, and M. Borodovsky, 2008 Gene
385 prediction in novel fungal genomes using an ab initio algorithm with
386 unsupervised training. Genome Res. 18: 1979-1990.
387 Torto-Alalibo, T., C. W. Collmer and M. Gwinn-Giglio, 2009 The Plant Associated
388 Microbe Gene Ontology (PAMGO) Consortium: community development of
389 new Gene Ontology terms describing biological processes involved in
390 microbe-host interactions. BMC microbiol. 9: 1-5.
391 Watanabe, M., K. Lee, K. Goto, S. Kumagai, Y. Sugita-Konishi et al., 2010 Rapid and
392 effective DNA extraction method with bead grinding for a large amount of
393 fungal DNA. J Food Protect. 73: 1077-1084.
394 Vargas, W. A., J. M. Martín, G. E. Rech, L. P. Rivera, E. P. Benito et al., 2012 Plant
395 defense mechanisms are activated during biotrophic and necrotrophic
396 development of Colletotricum graminicola in maize. Plant Physiol. 158:
397 1342-58.
398 Xia, E. H., D. R. Yang, J. J. Jiang, Q. J. Zhang, Y. Liu et al., 2017 The caterpillar
399 fungus, Ophiocordyceps sinensis, genome provides insights into highland
400 adaptation of fungal pathogenicity. Scientific Reports. 7: 1806-1816.
401 Xie, C., W. Gong, Z. Zhu, L. Yan, Z. Hu et al., 2017 Comparative transcriptomics of
402 Pleurotus eryngii reveals blue-light regulation of carbohydrate-active enzymes
403 (CAZymes) expression at primordium differentiated into fruiting body stage.
404 Genomics.
405 Yan, W. J., T. H. Li, and Z. D. Jiang, 2011 Anti-fatigue and life-prolonging effects of
406 Cordyceps guangdongensis. Food R & D. 32: 164-167.
407 Yan, W. J., T. H. Li, and Z. D. Jiang, 2012 Therapeutic effects of Cordyceps
408 guangdongensis on chronic renal failure rats induced by adenine. Mycosystema.
409 31: 432-442.
410 Yan, W. J., T. H. Li, J. H. Lao, B. Song, and Y. H. Shen, 2013 Anti-fatigue property of
411 Cordyceps guangdongensis and the underlying mechanisms. Pharm Biol. 51:
412 614-620.
413 Yan, W. J., T. H. Li, and Z. Y. Zhong, 2014 Anti-inflammatory effect of a novel food
414 Cordyceps guangdongensis on experimental rats with chronic bronchitis induced
415 by tobacco smoking. Food Funct. 5: 2552-2557.
416 Yang, T., M. M. Guo, H. J. Yang, S. P. Guo, and C. H. Dong, 2016 The blue-light
417 receptor CmWC-1 mediates fruit body development and secondary metabolism
418 in Cordyceps militaris. Appl Microbiol and Biot. 100: 743-755.
419 Yin, Y., G. Yu, Y. Chen, S. Jiang, M. Wang et al., 2012 Genome-wide transcriptome
420 and proteome analysis on different developmental stages of Cordyceps militaris.
421 PLoS ONE. 7: e51853.
422 Zeng, H. B., T. H. Li, B. Song, Q. Y. Lin, and H. Huang, 2009 Study on antioxidant
423 activity of Cordyceps guangdongensis. Nat Prod Res Dev. 21: 201-204.
424 Zheng, P., Y. Xia, G. Xiao, C. H. Xiong, X. Hu et al., 2011 Genome sequence of the
425 insect pathogenic fungus Cordyceps militaris, a valued traditional Chinese
426 medicine. Genome Biol. 12: 116-137.
427 Table 1 Assembly summary statistics of Cordyceps guangdongensis GD15 compared
428 to other Cordyceps genomes.
429 Table 2 Chromosome analysis of Cordyceps guangdongensis GD15 genomic
430 sequence.
431 Table 3 Genome annotation features of Cordyceps guangdongensis GD15.
432 Table 4 Class analysis of transposable element repeats in Cordyceps
433 guangdongensis.
434 Table S1 Genome sequences used to analyze the chromosome
435 Table S2 Genome annotation of proteins in Cordyceps guangdongensis genome
436 Table S3 Transposable elements classification in Cordyceps guangdongensis
437 Figure legends
438 Figure 1 General genomic features of Cordyceps guangdongensis. A, I, scaffolds; II,
439 gene density represented as the number of genes per 100 kb; III, percentage of
440 coverage of repetitive sequences; IV, GC content estimated by the percentage of G +
441 C in 100 kb. B, Genomic element density including genic and nongenic features of the
442 overall genome assembly length including 40.5% non-annotated sequences.
443 Figure 2 COG functional classification of proteins in the Cordyceps guangdongensis
444 genome.
445 Figure 3 Transcription factor analysis in the Cordyceps guangdongensis genome.
446 Figure S1 Length and quality distributions of PacBio reads.
447 Figure S2 Length statistics of scaffolds generated by genomic sequencing.
448 Figure S3 Distribution of gene length predicted in the Cordyceps guangdongensis
449 genome.