bioRxiv preprint doi: https://doi.org/10.1101/490276; this version posted December 9, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Comparative Transcriptome Analysis of Taenia pisiformis at Different Development Stages

Lin Chen1†*, Jing Yu1†, Jing Xu2†, Wei Wang1, Lili Ji1, Chengzhong Yang3, Hua Yu4

1 Key Lab of Meat Processing of Sichuan Province, College of Pharmacy and Biological Engineering, Chengdu University, Chengdu, 610106, China; [email protected] (L.C.); [email protected] (J.Y.); [email protected] (W.W.); [email protected] (L.J.)
2 Department of Parasitology, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China; [email protected] (J.X.)
3 Chongqing Key Laboratory of Biology, College of Life Sciences, Chongqing Normal University, Chongqing, 401331, China; [email protected] (C.Y.)
4 Sichuan Entry-Exit Inspection and Quarantine Bureau, Chengdu, 610041, China; [email protected] (H.Y.)
* Correspondence: [email protected]; Tel.: 086-28-84616805
† These authors contributed equally to this work.

Abstract: To characterize the transcriptome of Taenia pisiformis at different developmental stages, and to lay a foundation for screening vaccine antigens and drug target genes, the transcriptomes of adult and larval T. pisiformis were assembled and analyzed using bioinformatic tools. A total of 36,951 unigenes with a mean length of 950 bp were assembled, of which 12,665, 8,188, 7,577, and 6,293 were annotated by sequence similarity searches against the NR, Swiss-Prot, KOG, and KEGG databases, respectively. Notably, 5,662 unigenes had matches in all four databases and therefore received a relatively complete functional annotation. In addition, a total of 10,247 differentially expressed genes were identified: 6,910 unigenes were up-regulated and 3,337 were down-regulated in the adult stage relative to the larval stage. In summary, this study sequenced and analyzed the transcriptomes of the larval and adult stages of T. pisiformis. The differentially expressed genes identified between these two stages provide a basis for functional genomics, immunology and gene expression profiling of T. pisiformis.

Keywords: Taenia pisiformis, transcriptome, differentially expressed genes (DEGs), RNA-Seq


1. Introduction


Cysticercosis is one of the common parasitic diseases in rabbits, caused by the metacestode of Taenia pisiformis (Cestoidea; Cyclophyllidea; Taeniidae), also known as Cysticercus pisiformis. Adult T. pisiformis usually parasitizes the small intestine of canines and felines, such as dogs, wolves, jackals, foxes, raccoon dogs and cats (Owiny, 2001; Foronda and Valladares et al., 2003; Saeed and Maddox et al., 2006; Martinez and Hernandez et al., 2007; Lahmar and Sarciron et al., 2008; Bagrade and Kirjusina et al., 2009; Jia and Yan et al., 2010). The larvae (Cysticercus pisiformis) usually infect the liver capsule, greater omentum, mesentery and rectal serous membrane of rabbits, squirrels, mice, and other rodents (Zhou and Du et al., 2008). T. pisiformis can cause serious health problems and even death (Yang and Fu et al., 2012).

Many aspects of T. pisiformis have been studied, including its biological characteristics (Chen and Yang et al., 2015), population genetic diversity (Yang and Ren et al., 2013), gene function (Yang and Chen et al., 2013; Chen and Yang et al., 2014; 2014; Chen and Yang et al., 2016) and preventative measures against it (Chen and Yang et al., 2014). However, although the transcriptome of adult T. pisiformis has been determined (Yang and Fu et al., 2012), there is no report on comparative transcriptomics of T. pisiformis at different developmental stages, and, owing to limitations in materials and research methods, its infection mechanism is still unclear. Exploring gene expression patterns at different developmental stages of T. pisiformis will therefore help elucidate how the parasite infects its host and provide an important basis for the prevention and control of this tapeworm. In recent years, high-throughput sequencing has been widely adopted thanks to its large data output, high accuracy, high sensitivity and low running cost. It has been applied to many parasite species, such as Psoroptes ovis (He and Xu et al., 2016), Sarcoptes scabiei (He and Gu et al., 2017), Dirofilaria immitis (Fu and Lan et al., 2012), Taenia ovis (Zheng, 2017), Echinococcus granulosus (Ju, 2013), Taenia multiceps (Li and Zhang et al., 2017) and others (Kolev and Franklin et al., 2010; Cantacessi and Young et al., 2011; Sorber and Dimon et al., 2011; 2012; Schicht and Qi et al., 2014). With this method, it is possible to investigate the molecular mechanisms of specific biological processes and to profile overall gene expression at different stages of the parasite.

In the current research, Illumina sequencing was applied to study the transcriptomes of two developmental stages of T. pisiformis. By clarifying gene expression in the adult and larva of T. pisiformis, this research will lay a foundation for the diagnosis, prevention and treatment of cysticercosis.

2. Results

2.1 Illumina sequencing and assembly


RNA-Seq generated approximately 22 to 28 million raw reads (20 to 26 million clean reads after filtering) for each of the six cDNA libraries obtained from T. pisiformis (three from the adult stage and three from the larval stage). The average GC content was 47.25% (Table 1). All of the clean reads were assembled into a transcriptome, which was used as the reference sequence for further analyses. In total, 36,951 unigenes were detected across all transcriptomes (Table 2). The mean length and N50 were 950 bp and 1,998 bp, respectively (Table 2). According to the length distribution, 13,759 unigenes (43.50%) were ≥ 500 bp and 8,975 (24.29%) were ≥ 1,000 bp (Figure 1, Additional file 1: Table S1). The raw read dataset has been submitted to the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra_sub/sub.cgi) with the accession number SUB4234089.

Table 1 Summary of transcriptome data for adult (Tp) and larva (Cp) of T. pisiformis

Sample   Total raw reads  Reads after filter, number (%)  Read length (bp)  GC content (%)  Adapter, reads (%)  Low quality, reads (%)
Cp-1     27,788,994       26,124,150 (94.01%)             150               47.38%          696,628 (2.51%)     965,736 (3.48%)
Cp-2     25,598,082       23,928,260 (93.48%)             150               47.44%          653,040 (2.55%)     1,013,594 (3.96%)
Cp-3     22,221,546       20,382,060 (91.72%)             150               47.19%          640,246 (2.88%)     1,195,572 (5.38%)
Tp-1     23,644,240       22,091,652 (93.43%)             150               46.95%          615,538 (2.60%)     932,282 (3.94%)
Tp-2     25,641,834       24,071,136 (93.87%)             150               48.22%          658,922 (2.57%)     908,514 (3.54%)
Tp-3     24,713,304       23,233,138 (94.01%)             150               46.30%          551,842 (2.23%)     921,362 (3.73%)
Average  24,934,667       23,305,066 (93.42%)             150               47.25%          636,036 (2.56%)     989,510 (4.01%)

Table 2 Filtered results of transcriptome data

Genes number  GC percentage (%)  N50 (bp)  Max length (bp)  Min length (bp)  Average length (bp)  Total assembled bases
36,951        46.54              1,998     26,041           201              950                  35,115,151
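As a worked illustration of the assembly statistics in Table 2, the following minimal sketch shows how N50 and mean length are conventionally computed from a list of sequence lengths. The function names are ours and the toy length lists are made up for illustration; only the two totals in the last line are taken from Table 2. It is not the pipeline used in this study.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def mean_length(total_bases, n_sequences):
    """Mean sequence length in bp."""
    return total_bases / n_sequences


# Toy example with made-up lengths (not data from this study).
print(n50([100, 200, 300, 400, 500]))

# Totals reported in Table 2: 35,115,151 assembled bases over 36,951 unigenes,
# which gives a mean of about 950 bp.
print(round(mean_length(35_115_151, 36_951)))
```

Because N50 weights long sequences more heavily than the mean does, an N50 (1,998 bp) well above the mean length (950 bp) is expected for an assembly with many short unigenes.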

2.2 Functional annotation of unigenes

A total of 12,844 unigenes (34.76% of all unigenes) were annotated against the Nr (12,665, 34.28%), Swiss-Prot (8,188, 22.16%), KEGG (6,293, 17.03%) and KOG (7,577, 20.51%) protein databases using Blastx, and 5,662 unigenes had matches in all four databases (Figure 2). The remaining 24,107 unigenes failed to match any sequence with an E-value < 10^-5. Homologous genes came from several species. Of the Nr-annotated unigenes, most had the highest homology to genes from Echinococcus granulosus (7,474, 59.01%), followed by Echinococcus multilocularis (3,654, 28.85%), Hymenolepis microstoma (557, 4.4%), Daphnia magna (72, 0.57%), Taenia solium (67, 0.53%) and other species (841, 6.64%).

GO functional annotation provided a functional classification and enrichment analysis for differentially expressed genes (DEGs) (Li and Zhang et al., 2017). As shown in Figure 3, 16,269 unigenes were classified into 25 KOG functional categories. “Signal transduction mechanisms” was the largest cluster, with 2,882 (17.7%) unigenes, followed by “general function prediction only” (2,393 unigenes, 14.7%) and “posttranslational modification, protein turnover, chaperones” (1,699 unigenes, 10.4%). “Cell motility” had the fewest unigenes (only 34). In the transcriptome, the categories related to the growth of T. pisiformis included “carbohydrate transport and metabolism” (2.24%), “amino acid transport and metabolism” (1.46%), “nucleotide transport and metabolism” (1.03%), “lipid transport and metabolism” (1.82%), “signal transduction mechanisms” (17.71%), “transcription” (6.62%), “translation, ribosomal structure and biogenesis” (4.88%), “posttranslational modification, protein turnover, chaperones” (10.44%), “coenzyme transport and metabolism” (0.48%), “secondary metabolites biosynthesis, transport and catabolism” (0.47%) and “defense mechanisms” (0.54%).
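The E-value cutoff used for annotation earlier in this subsection can be applied to Blastx results as in the following minimal sketch. It assumes the hits were exported in BLAST's standard 12-column tabular format, which is our assumption since the output format is not stated here. The function name and the unigene and subject identifiers are hypothetical.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-5):
    """Keep, for each query, the highest-bitscore hit with E-value < max_evalue.

    Assumes the standard 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= max_evalue:
            continue  # hit does not pass the E-value cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits, for illustration only.
example = "\n".join([
    "\t".join(["Unigene0001", "subjA", "80.0", "100", "20", "0", "1", "300", "1", "100", "1e-30", "120"]),
    "\t".join(["Unigene0001", "subjB", "90.0", "100", "10", "0", "1", "300", "1", "100", "1e-50", "200"]),
    "\t".join(["Unigene0002", "subjC", "40.0", "50", "30", "0", "1", "150", "1", "50", "0.5", "30"]),
])
for query, (subject, evalue, bitscore) in best_hits(example).items():
    print(query, subject, evalue, bitscore)
```

Unigenes with no hit below the cutoff are simply absent from the result, which corresponds to the unannotated fraction described above.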

2.3 Unigene metabolic pathway analysis

A total of 3,122 unigenes were mapped to 227 KEGG pathways, which fell into five categories: “metabolism”, “genetic information processing”, “environmental information processing”, “cellular processes” and “organismal systems” (data not shown). The metabolism and genetic information processing categories together included over 3,500 unigenes. The five most abundant pathways were ribosome (273 unigenes, 8.74%), endocytosis (216, 6.92%), spliceosome (193, 6.18%), protein processing in endoplasmic reticulum (176, 5.64%) and oxidative phosphorylation (176, 5.64%) (Additional file 2: Table S2). These results indicate active protein metabolism in T. pisiformis.

2.4 Comparison of gene expression profiles of T. pisiformis at different developmental stages


To detect differences in gene expression between developmental stages in the life cycle of T. pisiformis, we compared the Cp (larva) and Tp (adult) groups. In total, 10,247 DEGs were identified, including 6,910 up-regulated and 3,337 down-regulated unigenes in the Tp (adult) group compared with the Cp (larva) group (Figure 5, Figure 6 and Additional file 3: Table S3). The DEGs were assigned to 198 KEGG pathways, among which “endocytosis” (ko04144, 86 unigenes) and “phagosome” (ko04145, 66 unigenes) showed significant enrichment (Additional file 4: Table S4). The 10 most significantly enriched pathways are listed in Table 3.

Table 3 Enriched pathways of differentially expressed genes

Pathway                                  DEGs①  KEGG②  P-value   Pathway ID
Phagosome                                66     117    0.000000  ko04145
Lysosome                                 45     87     0.000006  ko04142
Melanogenesis                            13     18     0.000187  ko04916
Endocytosis                              86     216    0.000321  ko04144
Gap junction                             16     26     0.000568  ko04540
Neuroactive ligand-receptor interaction  36     77     0.000700  ko04080
Ether lipid metabolism                   9      13     0.003180  ko00565
Sphingolipid metabolism                  17     32     0.003614  ko00600
Glutathione metabolism                   20     40     0.004147  ko00480
GnRH signaling pathway                   10     16     0.005574  ko04912

Note: ① DEGs, the number of differentially expressed genes in each KEGG pathway; ② KEGG, the number of genes in the pathway in the KEGG database.
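The statistical test behind the P-values in Table 3 is not stated here. One common choice for pathway enrichment is the one-sided hypergeometric test, and the sketch below illustrates that test under this assumption. The function name is ours. In the usage example, the pathway counts come from the Phagosome row of Table 3 and the background of 3,122 KEGG-mapped unigenes from section 2.3, but the number of KEGG-annotated DEGs (`n_degs`) is a hypothetical placeholder, so the printed value is not expected to reproduce the table.

```python
from math import comb


def enrichment_p_value(k, pathway_size, n_degs, background):
    """One-sided hypergeometric P(X >= k).

    k            DEGs observed in the pathway
    pathway_size background genes annotated to the pathway
    n_degs       DEGs with a pathway annotation
    background   all genes with a pathway annotation
    """
    upper = min(pathway_size, n_degs)
    # Exact integer arithmetic; math.comb returns 0 for impossible draws.
    favourable = sum(
        comb(pathway_size, i) * comb(background - pathway_size, n_degs - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(background, n_degs)


# Phagosome row of Table 3: 66 DEGs among 117 pathway genes.
# n_degs=1000 is a HYPOTHETICAL placeholder, not a value reported in this paper.
p = enrichment_p_value(k=66, pathway_size=117, n_degs=1000, background=3122)
print(f"{p:.3g}")
```

In practice the resulting P-values are usually corrected for multiple testing across all pathways examined.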

2.5 Development of SSR markers for T. pisiformis

Using the microsatellite scanning tool MISA, we identified 1,642 SSR loci consisting of di- to hexanucleotide repeats in the transcriptome data (Table 4). Of the 36,951 unigenes, 1,335 contained SSR loci, and 213 of these contained more than one SSR locus. Trinucleotide repeats accounted for the highest proportion (45.68%), followed by di-, tetra-, hexa- and pentanucleotide repeats, at 35.51%, 13.34%, 2.80% and 2.68%, respectively. A total of 95 motif types were detected. The ten most abundant repeat motifs were AC/GT (375), AGG/CCT (201), ACC/GGT (181), AG/CT (175), AGC/CTG (141), AAG/CTT (67), AAC/GTT (47), ATC/ATG (42), AT/AT (32) and ACG/CGT (32). These SSR features should be helpful for developing general molecular markers for T. pisiformis and related species.

Table 4 Distribution and dominant repeat motifs of the different SSR types

Repeat type      SSR number  Proportion (%)  Dominant repeat
Dinucleotide     583         35.51%          AC/GT
Trinucleotide    750         45.68%          AGG/CCT
Tetranucleotide  219         13.34%          AGGC/CCTG
Pentanucleotide  44          2.68%           ACGAG/CGTCT
Hexanucleotide   46          2.80%           ACCGCC/CGGTGG
Total            1,642       100%
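To illustrate the kind of scan that produces counts like those in Table 4, the following sketch finds perfect di- to hexanucleotide repeats with regular expressions. It is not MISA itself. The minimum repeat counts are assumed for illustration, because the thresholds used in this study are not stated here, and the function names and the test sequence are made up.

```python
import re
from collections import Counter

# ASSUMED minimum repeat counts per motif length (illustrative, not from the paper).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_redundant(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ACAC' or 'AA')."""
    n = len(motif)
    return any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # A unit of unit_len bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_redundant(motif):
                continue  # already counted under its shorter unit, or a homopolymer
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Made-up sequence containing (AC)7 and (AGG)5.
toy = "TTG" + "AC" * 7 + "GGATC" + "AGG" * 5 + "TTA"
ssrs = find_ssrs(toy)
print(ssrs)
print(Counter(len(motif) for _, motif, _ in ssrs))  # tally by repeat-unit length
```

Grouping each motif with its reverse complement (for example AC with GT) would be a further step before tallying motif classes as in Table 4.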

2.6 Prediction of allergen genes

The unigenes of T. pisiformis from the two developmental stages were compared by Blastx against allergen protein sequences, and 808 putative allergen genes were identified (Additional file 5: Table S5). Of these, 618 produced Blast hits to genes from other Taenia and Echinococcus species (Additional file 6: Table S6), while the other 190 did not (Additional file 7: Table S7).

The DEGs up-regulated in the adult group relative to the larval group were analyzed in the same way, and 343 putative allergen genes were predicted (Additional file 8: Table S8). Among them, 284 were shared with other Taenia and Echinococcus species (Additional file 9: Table S9), and 59 were not (Additional file 10: Table S10). A similar analysis of the significantly down-regulated DEGs predicted 461 putative allergen genes (Additional file 11: Table S11), of which 334 produced Blast hits to sequences from Taenia and Echinococcus (Additional file 12: Table S12), while 127 did not (Additional file 13: Table S13).

2.7 Analysis of the wnt signaling pathway and wnt genes

To better understand the biochemistry and physiology of T. pisiformis, we chose the “wnt signaling” pathway as a case study for further analysis. Across the transcriptome, 37 unigenes were mapped to this pathway and grouped into three categories (canonical pathway, planar cell polarity pathway and wnt/Ca2+ pathway). This pathway was one of the most enriched among the DEGs down-regulated in the adult group relative to the larval group, with 20 down-regulated DEGs related to it (Additional file 14: Figure S1). In the transcriptome annotations, wnt1, wnt4, wnt5a, wnt11 and wnt11b were found in both the larval and adult stages of T. pisiformis, whereas wnt2b-a and wnt3a were found in the adult but were absent in the larva (Additional file 15: Table S14).

3. Discussion

The development of a new generation of high-throughput sequencing technology has transformed transcriptome research (Chen and Guo et al., 2017). It offers a new way to analyze the complex genomic functions and cellular activities of organisms. The method does not require probes to be designed in advance and can capture the overall transcriptional activity of an organism at any growth or developmental stage under specific conditions. It is more precise and is not subject to cross-hybridization, thereby providing higher accuracy and a larger dynamic range (Wang and Gerstein et al., 2009; 2015). In the current study, the transcriptomes of T. pisiformis larvae and adults were sequenced on a HiSeq 2000 paired-end sequencing platform and assembled with the Trinity software. A total of 10,247 DEGs were obtained. The results may lay a foundation for research on the development of the reproductive system. They also provide extensive coverage of the transcriptome in long fragments, yielding a large amount of data for analyzing gene expression, predicting new genes, and exploring the metabolic pathways of T. pisiformis.

In this study, we carried out transcriptome sequencing of T. pisiformis larvae and adults and explored the differences in gene expression, metabolic pathways and functional clustering between the two developmental stages. Sample collection plays an important role in the accuracy and representativeness of transcriptome data. To make the data more comprehensive and representative, three replicates were sequenced for each stage.

In 2013, Ju (Ju, 2013) sequenced the mRNA of E. granulosus on Illumina's Solexa sequencing platform and obtained 2 GB of data. In 2017, the transcriptome of larval T. multiceps was analyzed for the first time.
A total of 70,253 unigenes with a mean length of 1,492 bp were obtained (Li and Zhang et al., 2017). In 2012, Illumina sequencing was used to characterize the transcriptome of T. pisiformis, and 72,957 unigenes were assembled (Yang and Fu et al., 2012). In the same study, the unigenes of T. pisiformis, E. granulosus and E. multilocularis were compared, and the results revealed the distribution characteristics of functional genes. In the current study, reads were assembled into 36,951 unigenes. The assembled unigenes were searched with BLASTX against the Nr, Swiss-Prot, KOG and KEGG databases; the annotation rates were 34.28%, 22.16%, 20.51% and 17.03%, respectively, and 12,844 unigenes were annotated in total. The average length of the T. pisiformis unigenes obtained in this study (950 bp) was longer than that in the earlier study (398 bp) (Yang and Fu et al., 2012). This indicates that the sequencing data were well assembled. The unigene annotations included a description of the gene name, GO terms and metabolic/signaling pathways, which provided biological information at a specific time. Such data could contribute to a more in-depth understanding of gene expression in T. pisiformis.

Parasitic diseases are closely associated with allergy because of the immunological characteristics that help maintain the larvae in the host (Vuitton, 2004). Homologous skin-sensitizing antibodies could be detected in the sera of rabbits infected with T. pisiformis (Leid and Williams, 1975). We therefore characterized putative T. pisiformis allergen genes and analyzed differences in their expression between the larval and adult stages. This lays a foundation for research on the pathogenic properties of these two stages. Comparison of T. pisiformis unigenes against allergen protein sequences produced 808 predicted allergen genes, of which 618 matched sequences in other Echinococcus and Taenia species. Further study of these genes will help researchers understand the interactions between T. pisiformis and its hosts.

The wnt signaling pathways are a group of signal transduction pathways involved in a wide range of cellular interactions throughout development, including the regulation of cell proliferation, segmentation and axial patterning (Dierick and Bejsovec, 1998). At least three wnt pathways are known: the canonical pathway, the planar cell polarity (PCP) pathway and the wnt/Ca2+ pathway (Logan and Nusse, 2004). Compared with planarians, parasitic flatworms have fewer wnt orthologs and paralogs. Riddiford and Olson (2011) reported that wnt is likely to be involved in the evolution of segmentation in platyhelminthes. Aulehla et al. (2003) reported that Wnt3a plays a major role in the segmentation clock controlling somitogenesis (Aulehla and Wehrle et al., 2003). Hou et al. (2014) reported that wnt4 may be related to cysticercus evagination and scolex/bladder development in T. solium cysticerci (Hou and Luo et al., 2014). We found seven wnt genes (wnt1, wnt2b-a, wnt3a, wnt4, wnt5a, wnt11 and wnt11b) in the adult-stage transcriptome annotations of T. pisiformis and five wnt genes in the larval stage (wnt2b-a and wnt3a were absent). The absence of wnt2b-a and wnt3a in the larval stage may be related to somitogenesis. The wnt gene data from this study may help clarify the role of wnt genes in the development of T. pisiformis.

4. Materials and Methods

4.1 Ethics statement

Animals were handled strictly according to the animal protection laws of the People's Republic of China (released on Sept. 18th, 2009) and the National Standards for Laboratory Animals in China (executed on May 1st, 2002). The Animal Ethics Committee of Sichuan Agricultural University (AECSCAU; Approval No. 2014–015) reviewed and approved this study.

4.2 Parasite

T. pisiformis larvae were collected from the greater omentum of two New Zealand white rabbits that had been naturally infected with this tapeworm on a farm in Sichuan, China. After morphological identification, three larvae were cleaned, labeled and kept in liquid nitrogen, and the other 10 larvae were used for the dog infection. Three adult worms (T. pisiformis) were recovered from the small intestine following the dog infection, washed three times in warm physiological saline to avoid contamination, then frozen immediately and stored in liquid nitrogen.

4.3 RNA isolation and Illumina sequencing

Total RNA was isolated from individual adults (including scolex, neck and strobila) and larvae of T. pisiformis using Trizol reagent (Invitrogen, Life Technologies, Carlsbad, CA, USA) according to the manufacturer’s protocol. Total RNA from independent samples was stored at -80 °C until use. RNA quality was verified on an Agilent 2100 RNA Nanochip (Agilent, Santa Clara, CA, USA) in terms of concentration, RNA integrity number and the 28S:18S ratio.

Poly(A) mRNA was isolated with the OligoTex mRNA mini kit (Qiagen). Fragmentation buffer was added to break the mRNA into short fragments (100–400 bp). First-strand cDNA (200±25 bp) was synthesized from the mRNA template with random hexamer primers, and second-strand cDNA was then synthesized by adding buffer, dNTPs, RNase H and DNA polymerase I. After purification with the QiaQuick PCR kit and elution with EB buffer, the fragments underwent end repair, poly(A) addition and sequencing adapter ligation. Fragments were then size-selected by 2% TAE-agarose gel electrophoresis and finally amplified by PCR. The quality of the sequencing libraries was assessed on the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA). The libraries were sequenced on an Illumina HiSeq 2000 platform (Illumina, USA). RNA-Seq data were produced by Guangzhou Jidiao Biotechnology Co. Ltd. For detailed steps, please refer to Wu et al. (Wu and Fu et al., 2012).

4.4 Assembly and Annotation

Before assembly, high-quality clean reads were obtained from the raw reads by removing adaptor sequences, duplicated sequences, reads containing more than 10% “N” (the “N” character represents an ambiguous base), and low-quality reads in which more than 10% of bases had a Q-value ≤ 20. All downstream analyses were performed using clean reads. The Trinity program was used for de novo assembly of the sequence data. The N-free assembled fragments obtained from the overlaps among reads are called unigenes.

Blastx alignment was carried out between the unigenes and databases including NR (NCBI non-redundant protein sequences), KO (KEGG Orthology), Swiss-Prot (a manually annotated and peer-reviewed protein sequence database), GO (Gene Ontology) and KOG (the Eukaryotic Orthologous Groups database).

The direction and CDS of the unigenes were determined from the best alignment results. Unigenes that could not be aligned to the above databases were scanned with the ESTScan software to obtain the CDS and the sequence direction. The MISA program was applied to identify microsatellite loci. Bioinformatics analyses were conducted as previously described (Yang and Fu et al., 2012).
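The two per-read quality criteria given at the start of this subsection (more than 10% “N”, or more than 10% of bases with Q ≤ 20) can be sketched as a simple predicate. This is a minimal illustration rather than the filtering software actually used. The function name is ours, Phred+33 quality encoding is an assumption, and adaptor and duplicate removal are not shown.

```python
def is_clean(seq, qual, max_n_frac=0.10, max_lowq_frac=0.10,
             q_cutoff=20, phred_offset=33):
    """Return True if a read passes both filters described in section 4.4.

    seq  : base calls, e.g. "ACGTN..."
    qual : FASTQ quality string of the same length (Phred+33 ASSUMED)
    """
    if not seq or len(seq) != len(qual):
        return False  # malformed record
    n_frac = seq.upper().count("N") / len(seq)
    low_quality = sum(1 for ch in qual if ord(ch) - phred_offset <= q_cutoff)
    lowq_frac = low_quality / len(qual)
    # Reads are discarded only when a fraction EXCEEDS its threshold.
    return n_frac <= max_n_frac and lowq_frac <= max_lowq_frac


# Made-up 20-base reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
good = is_clean("ACGT" * 5, "I" * 20)
too_many_n = is_clean("NNN" + "ACGT" * 4 + "A", "I" * 20)
low_quality = is_clean("ACGT" * 5, "###" + "I" * 17)
print(good, too_many_n, low_quality)
```

For paired-end data, a pair would typically be kept only if both mates pass, although that detail is not specified here.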

4.5 Analysis of gene expression

The Bowtie software was used to align reads and determine the mapping ratio. Gene expression was estimated by counting the number of reads mapped to each gene. Expression levels of individual unigenes from the different stages in the life cycle of T. pisiformis were evaluated with the RPKM method (reads per kb per million reads) (He and Xu et al., 2016).
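The RPKM normalization named above follows the standard definition: mapped reads for a gene, divided by gene length in kilobases and by total mapped reads in millions. The sketch below implements that definition; the function name and the example numbers are made up for illustration.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Equivalent to gene_reads / (length_kb * mapped_millions).
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Made-up example: 500 reads on a 2,000 bp unigene in a library of 20 million mapped reads.
value = rpkm(gene_reads=500, gene_length_bp=2000, total_mapped_reads=20_000_000)
print(value)
```

Normalizing by both length and library size is what makes values comparable between unigenes of different lengths and between the Cp and Tp libraries.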

4.6 Real-time PCR (qRT-PCR) validation

qRT-PCR was performed to verify the T. pisiformis expression data. Primers for differentially expressed genes and the housekeeping gene GAPDH were designed with the Primer3 tool, and the sequences are available in Additional file 16: Table S15. For qRT-PCR, an ABI7500FAST real-time PCR System (Applied Biosystems, Foster City, USA) and a SYBR® Premix Ex Taq™ II Kit (Takara, Japan) were used in accordance with the manufacturers’ recommendations. The qRT-PCR conditions were 96 °C for 2 min, followed by 40 cycles of 93 °C for 20 s and 59 °C for 20 s. The final melting curve was analyzed. The relative expression level of each gene was calculated using the 2^-ΔΔCt method (Livak and Schmittgen, 2001).
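The 2^-ΔΔCt calculation cited above can be written out as follows. The function name and the Ct values are hypothetical. The reference gene corresponds to GAPDH in this study, and which stage serves as the calibrator is an assumption made only for the example.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change by the 2^-ddCt method (Livak and Schmittgen, 2001)."""
    # Normalize the target gene to the reference gene within each condition.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the sample with the calibrator condition.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: adult as the sample, larva as the calibrator.
fold = relative_expression(ct_target_sample=22.0, ct_ref_sample=18.0,
                           ct_target_calibrator=25.0, ct_ref_calibrator=18.0)
print(fold)
```

A fold change above 1 indicates higher expression in the sample than in the calibrator; the method assumes roughly equal amplification efficiency for the target and reference genes.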

4.7 Prediction of putative allergens

To predict putative allergens, the unigenes of T. pisiformis in each development stage and the differentially expressed unigenes of T. pisiformis were compared by Blast against allergen protein sequences from the allergen database website (https://www.uniprot.org).

Author Contributions: Conceptualization, Lin Chen and Wei Wang; Formal analysis, Lin Chen; Funding acquisition, Wei Wang; Methodology, Lin Chen and Jing Yu; Project administration, Wei Wang; Resources, Lili Ji, Chengzhong Yang and Hua Yu; Supervision, Lin Chen; Writing – review & editing, Jing Xu.


Conflicts of Interest: The authors declare no conflict of interest.

Acknowledgments: This work was supported by Chengdu University Youth Fund Program [2018XZB06], Sichuan Province Soft Science Program [2014ZR0118] and Sichuan Province Science and Technology Program [2017JY0118].


References:

Almeida, G. and M. Amaral, et al. (2012). "Exploring the Schistosoma mansoni adult male transcriptome using RNA-seq." Exp Parasitol 132 (1): 22-31.
Aulehla, A. and C. Wehrle, et al. (2003). "Wnt3a plays a major role in the segmentation clock controlling somitogenesis." Dev. Cell 4 (3): 395-406.
Bagrade, G. and M. Kirjusina, et al. (2009). "Helminth parasites of the wolf Canis lupus from Latvia." Journal of Helminthology 83 (1): 63-68.
Cantacessi, C. and N. Young, et al. (2011). "The transcriptome of Trichuris suis-first molecular insights into a parasite with curative properties for key immune diseases of humans." PLoS One 6 (8): e23590.
Chen, D. and R. Guo, et al. (2017). "Transcriptome of Apis cerana cerana larval gut under the stress of Ascosphaera apis." Scientia Agricultura Sinica 50 (13): 2614-2623.
Chen, L. and D. Y. Yang, et al. (2014). "Protection against Taenia pisiformis larval infection induced by a recombinant oncosphere antigen vaccine." Genetics and Molecular Research 13 (3): 6148-6159.
Chen, L. and D. Yang, et al. (2014). "Evaluation of a novel Dot-ELISA assay utilizing a recombinant protein for the effective diagnosis of Taenia pisiformis larval infections." Veterinary Parasitology 204 (3-4): 214-220.
Chen, L. and D. Yang, et al. (2014). "Prokaryotic expression of Tp-dUTPase gene of Taenia pisiformis oncospheres and establishment of dot-ELISA using the expressed protein." Chin J Vet Sci 7 (34): 1100-1105.
Chen, L. and D. Yang, et al. (2015). "Biological Characteristics of Taenia pisiformis Sichuan Strain." Sichuan Journal of Zoology 34 (3): 439-443.
Chen, L. and D. Yang, et al. (2016). "Cloning and Bioinformatics Analysis of Tp-TSP2 Gene of Taenia pisiformis." Journal of Economic Animal 20 (04): 227-233.
Dierick, H. and A. Bejsovec (1998). "Cellular Mechanisms of Wingless/Wnt Signal Transduction." Current Topics in Developmental Biology 43 (1): 153-190.
Foronda, P. and B. Valladares, et al. (2003). "Helminths of the wild rabbit (Oryctolagus cuniculus) in Macaronesia." J Parasitol 89 (5): 952-957.
Fu, Y. and J. Lan, et al. (2012). "Novel Insights into the Transcriptome of Dirofilaria immitis." PLoS One 7 (7): e41639.
He, M. and J. Xu, et al. (2016). "Preliminary analysis of Psoroptes ovis transcriptome in different developmental stages." Parasites & Vectors 9 (570): 1-12.
He, R. and X. Gu, et al. (2017). "Transcriptome-microRNA analysis of Sarcoptes scabiei and host immune response." PLoS One 12 (5): e0177733.
Hou, J. and X. Luo, et al. (2014). "Sequence analysis and molecular characterization of Wnt4 gene in metacestodes of Taenia solium." Korean Journal of Parasitology 52 (2): 163-168.

12 of 16

Jia, W. and H. Yan, et al. (2010). "Complete mitochondrial genomes of Taenia multiceps, T. hydatigena and T. pisiformis: additional molecular markers for a tapeworm genus of human and animal health significance." BMC Genomics 1 (11): 447-460.
Ju, Y. (2013). mRNA Sequencing and Transcriptome Characteristic of Echinococcus granulosus, Ningxia Medical University. Master.
Kolev, N. and J. Franklin, et al. (2010). "The transcriptome of the human pathogen Trypanosoma brucei at single nucleotide resolution." PLoS Pathog 6 (9): e1001090.
Lahmar, S. and M. E. Sarciron, et al. (2008). "Echinococcus granulosus and other intestinal helminths in semi-stray dogs in Tunisia: infection and re-infection rates." Tunis Med 7 (86): 657-664.
Leid, R. W. and J. F. Williams (1975). "Reaginic antibody response in rabbits infected with Taenia pisiformis." International Journal for Parasitology 5 (2): 203-208.
Li, W. H. and N. Z. Zhang, et al. (2017). "Transcriptomic analysis of the larva Taenia multiceps." Research in Veterinary Science 115: 407-411.
Livak, K. J. and T. D. Schmittgen (2001). "Analysis of Relative Gene Expression Data Using Real-Time Quantitative PCR and the 2^(−ΔΔCT) Method." Methods 25 (4): 402-408.
Logan, C. Y. and R. Nusse (2004). "The Wnt signaling pathway in development and disease." Annu. Rev. Cell Dev. Biol. 20 (1): 781-810.
Martinez, F. J. and S. Hernandez, et al. (2007). "Estimation of canine intestinal parasites in Cordoba (Spain) and their risk to public health." Vet Parasitol 143 (1): 7-13.
Owiny, J. R. (2001). "Cysticercosis in laboratory rabbits." Contemp Top Lab Anim Sci 40 (2): 45-48.
Reuter, J. A. and D. V. Spacek, et al. (2015). "High-Throughput Sequencing Technologies." Molecular Cell 58 (4): 586-597.
Riddiford, N. and P. D. Olson (2011). "Wnt gene loss in flatworms." Development Genes and Evolution (221): 187.
Saeed, I. and C. Maddox, et al. (2006). "Helminths of red foxes (Vulpes vulpes) in Denmark." Vet Parasitol 139 (1-3): 168-179.
Schicht, S. and W. Qi, et al. (2014). "Whole transcriptome analysis of the poultry red mite Dermanyssus gallinae (De Geer, 1778)." Parasitology 141 (3): 336-46.
Sorber, K. and M. Dimon, et al. (2011). "RNA-Seq analysis of splicing in Plasmodium falciparum uncovers new splice junctions, alternative splicing and splicing of antisense transcripts." Nucleic Acids Res 39 (9): 3820-35.
Vuitton, D. A. (2004). Echinococcosis and Allergy, Humana Press.
Wang, Z. and M. Gerstein, et al. (2009). "RNA-Seq: a revolutionary tool for transcriptomics." Nat Rev Genet 10 (1): 57-63.
Wu, X. and Y. Fu, et al. (2012). "Detailed transcriptome description of the neglected cestode Taenia multiceps." PLoS One 7 (9): e45830.
Yang, D. and L. Chen, et al. (2013). "Prokaryotic Expression of Tp18 Gene from Taenia pisiformis and Antigenicity Analysis of the Expressed Product." Acta Veterinaria et Zootechnica Sinica 44 (11): 1819-1825.
Yang, D. and Y. Fu, et al. (2012). "Annotation of the Transcriptome from Taenia pisiformis and Its Comparative Analysis with Three Taeniidae Species." PLoS One 4 (7): e32283.
Yang, D. and Y. Ren, et al. (2013). "Population genetic diversity based on CO2 and ND4 genes in Taenia pisiformis from Sichuan." Chin J Vet Sci 33 (1): 43-48.
Zheng, Y. (2017). "High-throughput identification of miRNAs of Taenia ovis, a cestode threatening sheep industry." Infection, Genetics and Evolution 51: 98-100.
Zhou, Y. and A. Du, et al. (2008). "Research of harmfulness of Cysticercus pisiformis in rabbit." Journal of Zhejiang agricultural science (3): 372-373.

Figure 1 Length distribution of T. pisiformis unigenes

Figure 2 Venn diagrams of four database annotations

Figure 3 KOG functional annotations of putative proteins among transcripts of T. pisiformis

Figure 4 GO annotations of unigenes in T. pisiformis transcriptome

Figure 5 Volcano plot showing differentially expressed genes between three replicates of adult and three replicates of larva of T. pisiformis. The x-axis corresponds to the log2 fold-change value, and the y-axis displays the log10 (FDR). The red dots represent the significantly differentially expressed transcripts (P ≤ 0.05 and fold change ≥ 2) between the adult and the larva, while the black dots are not statistically significant (P > 0.05).
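
The colouring rule stated in the Figure 5 caption can be illustrated with a minimal sketch. It assumes that "fold change ≥ 2" is applied symmetrically, i.e. |log2 fold change| ≥ 1 in either direction, and that transcripts with P ≤ 0.05 but a smaller fold change are drawn as black dots; the caption does not state either point explicitly. The function name `is_significant` and the example values are hypothetical and are not taken from the study.

```python
import math


def is_significant(log2_fold_change, p_value, p_cutoff=0.05, fold_cutoff=2.0):
    """Return True if a transcript would be drawn as a red dot.

    Assumption: the fold-change threshold is symmetric, so a transcript
    passes when |log2 FC| >= log2(fold_cutoff), whichever stage is higher.
    """
    passes_p = p_value <= p_cutoff
    passes_fold = abs(log2_fold_change) >= math.log2(fold_cutoff)
    return passes_p and passes_fold


# Hypothetical (log2 fold change, P) pairs, for illustration only.
examples = [(1.0, 0.05), (-1.5, 0.01), (3.0, 0.20), (0.5, 0.001)]
for log2_fc, p in examples:
    colour = "red" if is_significant(log2_fc, p) else "black"
    print(f"log2FC={log2_fc:+.1f}  P={p:<6}  {colour}")
```

Under these assumptions a transcript needs to pass both thresholds to be counted as differentially expressed, which is why a large fold change with P > 0.05 remains a black dot.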

Figure 6 Statistics of differentially expressed genes comparing adult to larva of T. pisiformis

Figure 7 qRT-PCR validation of nine selected genes that were differentially expressed between the Cp group (samples from T. pisiformis larva) and the Tp group (samples from T. pisiformis adult) in the transcriptomic data. Error bars indicate standard error.