Author running head: Y. J. Fang et al.

Title running head: Sialotranscriptome of An. sinensis

Correspondence: Bin Chen, Institute of Entomology and Molecular Biology, College of Life Sciences, Chongqing Normal University, University Town, Chongqing 401331, China. Tel: +86 23 65918391 (office), 13638385631 (mobile); fax: +86 23 65910315; email: [email protected], [email protected]

*These authors contributed equally to the work.

ORIGINAL ARTICLE

Sialotranscriptome sequencing and analysis of Anopheles sinensis and comparison with Psorophora albipes sialotranscriptome (Diptera: Culicidae)

Ya-Jie Fang*, Zhen-Tian Yan and Bin Chen*

Chongqing Key Laboratory of Vector Insects; Institute of Entomology and Molecular Biology, College of Life Sciences, Chongqing Normal University, Chongqing, China.

This is an Accepted Article that has been peer-reviewed and approved for publication in Insect Science but has yet to undergo copy-editing and proof correction. Please cite this article as doi: 10.1111/1744-7917.12431.

This article is protected by copyright. All rights reserved.

Abstract Most adult female mosquitoes secrete saliva to facilitate blood sucking, digestion and nutrition, and mosquito-borne disease prevention. Knowledge of the classification and characteristics of sialotranscriptome genes is still quite limited. Anopheles sinensis is a major malaria vector in China and Southeast Asian countries. In this study, the An. sinensis sialotranscriptome was sequenced using the Illumina sequencing technique. A total of 10 907 unigenes were obtained and annotated for biological function and pathway, and 10 470 unigenes were mapped to the An. sinensis reference genome, with 70.46% of genes having 90%–100% genome mapping coverage. These mapped genes were classified into four categories: housekeeping (6632 genes), secreted (1177), protein-coding genes of unknown function (2646) and transposable elements (15). The housekeeping genes were divided into 27 classes, and the secreted genes were divided into 11 classes and 96 families. The classification, characteristics and evolution of these classes/families of secreted genes are further described and discussed. Comparison of the 1177 secreted genes of An. sinensis (subfamily Anophelinae) with the 811 of Psorophora albipes (subfamily Culicinae) shows that 6 classes/subclasses have more than twice the gene number and 2 classes (Uniquely found in anophelines, and Orphan proteins of unique standing) are unique in the former compared with the latter, whereas 4 classes/subclasses are much expanded and the class Uniquely found in Aedes is unique in the latter. The An. sinensis sialotranscriptome sequence data are the most complete for mosquitoes to date, and the analyses provide a comprehensive information framework for further research on the mosquito sialotranscriptome.

Key words Anopheles sinensis; comparative analysis; gene classification; Psorophora albipes; sialotranscriptome

Introduction

Anopheles sinensis is a major malaria vector in China and in eastern and southeastern Asia, with a wide distribution from Afghanistan, China (northern China and Taiwan), Japan, Korea and northeast India southward to western Indonesia (Sinka et al., 2011). It also transmits filarial nematodes (Zhang et al., 1994). Adult female mosquitoes need a blood meal to obtain nutrition for egg maturation. The salivary gland is an important organ that facilitates (1) blood feeding and digestion, (2) sugar, protein and lipid digestion, and (3) reduction of parasite infection and transmission (Das et al., 2010). When a mosquito sucks blood, the salivary gland releases a mixture containing anti-hemostatic factors, platelet aggregation inhibitors and vasodilators to obtain blood; meanwhile, it also releases defensin and lysozyme to minimize parasite infection. The mosquito salivary glands are paired and located in the anterior portion of the thorax. Each gland has three lobes, two lateral lobes and one median lobe (Siriyasatien et al., 2005a). Each lateral lobe can be divided into a proximal region and a distal region. The proximal region mainly secretes enzymes related to sugar feeding, while the distal region and the median lobe secrete compounds associated with blood sucking, which may be specific to female mosquitoes.

Research on the transcriptome and proteins of the mosquito salivary gland is essential to understand the mechanisms of blood sucking, digestion and nutrition, and the prevention of mosquito-borne diseases. To date, the salivary gland transcriptomes (sialotranscriptomes) of 11 mosquito species have been sequenced and analyzed, including Anopheles gambiae (Francischetti et al., 2002), Aedes aegypti (Francischetti et al., 2002; Ribeiro et al., 2007), An. stephensi (Valenzuela et al., 2003), An. darlingi (Calvo et al., 2004; Calvo et al., 2009b), Culex pipiens quinquefasciatus (Ribeiro et al., 2004), Ae. albopictus (Arca et al., 2007), An. funestus (Calvo et al., 2007), Toxorhynchites amboinensis (Calvo et al., 2008), Cx. tarsalis (Calvo et al., 2010a), Ochlerotatus triseriatus (Calvo et al., 2010b), and Psorophora albipes (Chagas et al., 2013). Comparative analyses of sialotranscriptomes in relation to blood feeding in An. gambiae and in response to sugar feeding in An. stephensi have also been reported (Das et al., 2010; Dixit et al., 2009). However, all of these sialotranscriptomes, except for that of Ps. albipes, were sequenced using first-generation sequencing techniques, so the transcripts sequenced were quite limited, with only 281–1273 contigs. Because of this partial sequencing, the classifications of these sialotranscriptomes were still preliminary. The sialotranscriptome of Ps. albipes was sequenced in depth using second-generation sequencing techniques and assembled into 43 466 contigs; however, only 3247 coding sequences (CDSs) were classified (Chagas et al., 2013). Ribeiro et al. (2010) summarized and reviewed the sialotranscriptomes of Nematocera in Diptera, including 10 mosquito species. Proteomic research on the salivary gland has mainly concentrated on comparisons before and after blood sucking or Plasmodium infection (Siriyasatien et al., 2005b; Cotama et al., 2013). A small number of salivary proteins have been preliminarily investigated: blood feeding-related proteins such as vasodilator (Ribeiro, 1992), apyrase (Champagne et al., 1995) and anophelin (Francischetti et al., 1999), and immunity-related proteins such as aedesin with antimicrobial activity (Godreuil et al., 2014) and the inhibitor of cysteine proteases associated with Plasmodium infection (Boysen & Matuschewski, 2013). In general, knowledge of the classification, characteristics, evolution and function of salivary gland genes is still limited.

Although the genome and general transcriptome of An. sinensis have been reported (Zhou et al., 2014; Chen et al., 2014), the sialotranscriptome of An. sinensis is still unknown. In this study, we sequenced and annotated the sialotranscriptome of An. sinensis using next-generation sequencing, classified the unigenes of the sialotranscriptome into various categories, and compared the sialotranscriptome with that of Ps. albipes. The An. sinensis sialotranscriptome sequence data are the most complete so far, and the analyses provide a comprehensive information framework for further research on the mosquito sialotranscriptome.

Materials and methods

Insect samples, salivary gland dissection and RNA extraction

The laboratory colony of An. sinensis was reared at the Institute of Entomology and Molecular Biology, Chongqing Normal University, China, under a 12 : 12 h light : dark cycle at 26 ± 1°C and 75% ± 5% relative humidity. The colony was established five years ago from a single gravid adult female originally collected from Wuxi, Jiangsu, and was confirmed to be An. sinensis by genome sequencing in our institute. Adult females were fed with 10% glucose solution and collected at 3–5 days post emergence for salivary gland dissection.

The collected mosquitoes were knocked down by freezing and then placed in a Petri dish kept cold on ice. Salivary glands were dissected with fine-tipped forceps in a microplate well containing a drop of sterilized 1× phosphate-buffered saline (1× PBS). The thorax was held with one pair of forceps, and the head was pulled away with another pair. The salivary glands, located in the anterior portion of the thorax, were exposed as the head was pulled away, freed from connected tissues, and rinsed with 1× PBS. A total of 180 salivary glands were collected and placed in 1 mL TRIzol Reagent (Ambion, USA) for RNA preparation.

Total RNA was extracted using TRIzol Reagent (Ambion, USA) following the manufacturer's protocol. To eliminate genomic DNA, the RNA samples were treated with RNase-Free DNase I according to the manufacturer's protocol (Qiagen, USA). RNA quality was assessed using a NanoDrop 1000 and agarose gel electrophoresis. RNA integrity was confirmed using the Agilent 2100 Bioanalyzer, with a minimum RNA integrity number of 7.

cDNA synthesis and Illumina sequencing

mRNA was purified from total RNA using oligo(dT) beads, and fragmentation buffer was added to break the mRNA into short fragments. Using these short fragments as templates, first-strand cDNA was generated by random hexamer-primed reverse transcription, followed by synthesis of second-strand cDNA using buffer, dNTPs, RNase H and DNA polymerase I. The cDNA fragments were purified using a QIAquick PCR extraction kit.

The purified cDNA fragments were then washed with EB buffer, and poly(A) ends and sequencing adapters were added to the fragments. Following agarose gel electrophoresis and extraction of cDNA from the gels, the cDNA fragments were purified and enriched by PCR to construct the final cDNA library. The cDNA library was sequenced on the Illumina HiSeq™ 2500 platform using paired-end technology by Gene Denovo Company (Guangzhou, China). A Perl program was written to select clean reads by removing low-quality reads (more than 50% of bases with Q < 20), reads with more than 5% N (unknown) bases, and reads containing adaptor sequences. Data quality was evaluated with FastQC. The clean sequencing data were deposited in the NCBI Short Read Archive (SRA) under accession number SRA426337 (http://www.ncbi.nlm.nih.gov/sra).
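
The three read-filtering criteria described above can be illustrated with a minimal Python sketch. It is not the authors' Perl program, which is not reproduced in the text; the function names, the Phred+33 quality encoding and any adapter string passed in are assumptions made only for illustration.

```python
def phred33_to_quals(qual_string):
    """Convert a FASTQ quality string to integer scores (assumes Phred+33)."""
    return [ord(ch) - 33 for ch in qual_string]


def is_clean_read(seq, quals, adapters=(), q_cutoff=20,
                  max_low_q_frac=0.5, max_n_frac=0.05):
    """Return True if a read passes the three filters given in the text.

    A read is rejected when more than 50% of its bases have Q < 20,
    when more than 5% of its bases are N, or when it contains any of
    the supplied adapter sequences (hypothetical inputs).
    """
    if not seq or len(seq) != len(quals):
        return False
    length = len(seq)
    low_quality = sum(1 for q in quals if q < q_cutoff)
    if low_quality / length > max_low_q_frac:
        return False
    if seq.upper().count("N") / length > max_n_frac:
        return False
    if any(adapter and adapter in seq for adapter in adapters):
        return False
    return True
```

Under these assumptions, a 10-base read with a single N is rejected, because 10% of its bases are unknown, which exceeds the 5% limit.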

Functional annotation of salivary gland transcriptome

Sequencing reads in FASTQ format were mapped to the reference genome, and splice junctions were identified using TopHat (Kim et al., 2013). The Cufflinks package (Trapnell et al., 2012) was used for genome-guided transcript assembly and expression abundance estimation. All transcripts were compared against several databases to obtain functional annotation using Blastx (Ye et al., 2006) with an E-value cut-off of 1e-5. The gene structure was optimized according to the transcript assembly from Cufflinks. The extensions of the 5′ and 3′ boundaries were determined by comparing the potential gene model with the existing gene annotation. The databases used to annotate gene function were the NCBI non-redundant protein (Nr) database, SwissProt, InterPro, the Gene Ontology (GO) database, the Clusters of Orthologous Groups of proteins (COG) database and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The Pfam and SMART databases were used to search for conserved protein domains. The SignalP server (Nielsen et al., 1997) was used to predict secreted proteins. TMHMM Server v.2.0 (http://www.cbs.dtu.dk/services/TMHMM/) was used to predict transmembrane domains. The program NetOGlyc was used to predict O-glycosylation sites on the proteins (Julenius et al., 2005).
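
The E-value cut-off applied to the Blastx searches can be sketched as a simple filter over tabular BLAST output. The sketch assumes the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score), which the text does not specify; the function name and the example identifiers are hypothetical.

```python
def best_hits(lines, max_evalue=1e-5):
    """Keep, for each query, the highest-scoring hit with E-value <= max_evalue."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit is weaker than the cut-off
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical example records, not data from the study.
example = [
    "unigene1\tprotA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t180.0",
    "unigene1\tprotB\t60.0\t150\t60\t0\t1\t450\t1\t150\t1e-20\t90.0",
    "unigene2\tprotC\t35.0\t50\t32\t1\t1\t150\t1\t50\t0.5\t25.0",
]
hits = best_hits(example)
```

In this toy input, only the first query retains a hit; the second query is left unannotated because its only hit has an E-value above 1e-5.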

Gene classification of transcripts and comparison with Ps. albipes sialotranscriptome

SignalP (Petersen et al., 2011) was used to predict signal peptides in the genes of the An. sinensis sialotranscriptome, and on this basis all transcripts were assigned to either the housekeeping or the secreted protein category. The transcripts were further annotated by Blast against the salivary gland protein database of Nematocera, including Culicidae, Simuliidae, Ceratopogonidae and Psychodidae, and then assigned to a class when they showed at least 40% sequence similarity to a defined class in the database. The housekeeping category was finally divided into 27 classes, and the secreted protein category into 11 classes, with reference to Ribeiro et al. (2010) and Chagas et al. (2013).
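
The two-step assignment described above (category from signal peptide prediction, class from database similarity) can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: the `Hit` record, the function name and the example class label are hypothetical, and the unknown-function and transposable-element categories reported in the Results are not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Hit:
    """Best database match for a transcript (hypothetical record)."""
    db_class: str
    percent_similarity: float


def classify(has_signal_peptide: bool, best_hit: Optional[Hit],
             min_similarity: float = 40.0) -> Tuple[str, Optional[str]]:
    """Return (category, class) for one transcript.

    The category follows the signal peptide prediction; a class is
    assigned only when the best hit reaches the similarity threshold.
    """
    category = "secreted" if has_signal_peptide else "housekeeping"
    if best_hit is not None and best_hit.percent_similarity >= min_similarity:
        return category, best_hit.db_class
    return category, None
```

For instance, a transcript with a predicted signal peptide and a best hit of 55% similarity to a class labelled "Enzyme" would be returned as ("secreted", "Enzyme"), whereas the same transcript with a 30% hit would receive no class.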

The annotation file of the Ps. albipes sialotranscriptome was obtained from Chagas et al. (2013). A total of 3247 CDSs and their corresponding protein sequences were downloaded from http://exon.niaid.nih.gov/transcriptome/Psorophora_albipes/Pso-s2-web.xlsx. The secreted genes of the Ps. albipes sialotranscriptome were counted, and their classification was summarized by category and subcategory. The genes and gene numbers in each category, subcategory and class were then compared between the An. sinensis and Ps. albipes sialotranscriptomes.

Results and Discussion

Sequencing results of An. sinensis sialotranscriptome

A total of 46.43 million raw reads (5.80 Gnt) were obtained from the sialotranscriptome sequencing (Table 1). After removal of adapters, reads with a high proportion of 'N' and low-quality reads, we obtained 45.85 million clean reads (5.73 Gnt), with a Q20 value of 95.54%, a GC content of 51.55% and an unknown base ('N') rate of 0.00% (Table 1). These clean reads were assembled by TopHat and Cufflinks, which produced 10 907 unigenes. Of these unigenes, 10 470 could be mapped to the reference genome, accounting for 61.79% of the annotated genes in the reference genome. To evaluate the assembled transcripts, the coverage range and percentage coverage were calculated. In total, 70.46% of the genes could be mapped to the reference genome with more than 90% coverage, which indicates the high quality of the assembled unigenes (Fig. 1).

Functional annotation

Based on Gene Ontology (GO) annotation, 3625 of these 10 470 transcripts were annotated and categorized into 3 main categories and 68 subcategories: biological process (30 subcategories), cellular component (14) and molecular function (24) (Table 1 and Fig. 2). In biological process, the subcategories metabolic process, cellular process and single-organism process were top-ranked in terms of gene number. In cellular component, the subcategories cell and cell part were top-ranked. In molecular function, the subcategories metabolic process, catalytic activity and binding were larger than the others. These results suggest that the transcripts might play roles in structure, regulation, transport and metabolism.

To predict biological pathways, these 10 470 transcripts were searched against the KEGG database. Of them, 4073 transcripts were annotated and categorized into 234 KEGG pathways (Table S2). The most represented pathways were Metabolic pathways (684 genes), Pathways in cancer (129), RNA transport (125), Protein processing in endoplasmic reticulum (112), Purine metabolism (111), Huntington's disease (109), Spliceosome (103), Endocytosis (101) and Alzheimer's disease (100). In addition, some transcripts took part in salivary secretion, amino acid metabolism, lipid metabolism, carbohydrate metabolism and drug metabolism. These transcripts may be involved in transcription, digestion, immunity and detoxification.

Of these 10 470 transcripts, 8339 genes were annotated by the SwissProt database, 8214 by the InterPro database and 5454 by the COG database (Fig. 3). Conserved domains were predicted in 8091 genes with Pfam (http://pfam.xfam.org/) and in 4135 genes with SMART (http://smart.embl-heidelberg.de/). A total of 1157 genes were predicted to have a signal peptide, which suggests that these genes may encode secreted proteins. All annotation results are presented in Table S1.

Classification of An. sinensis salivary genes and comparison with those of Ps. albipes

Based on the functional annotation and the reference classifications of Ribeiro et al. (2010) and Chagas et al. (2013), the An. sinensis salivary gland genes were classified into four categories: housekeeping genes (63.34% of the total), secreted genes (11.24%), protein-coding genes of unknown function (25.27%) and transposable elements (0.15%) (Fig. 4). There are 6632 housekeeping genes, which were further divided into 27 classes (Table S1 and Table S3). These genes are mainly involved in protein synthesis machinery, transporters and channels, signal transduction, detoxification, nucleotide metabolism, innate immunity and cytoskeletal formation. They have not been well discussed in earlier reports, and a detailed summary and functional discussion will require further sialotranscriptome data and research. There are 1177 secreted genes, which were divided into 11 classes and further into 96 families (Table 2, Table S1). These genes are mainly responsible for salivary functions, including blood feeding and anticoagulation, digestion and nutrition, and immunity and pathogen prevention. Because of their importance, we describe and discuss them in more detail in the sections below, organized by main class. In addition, 2646 protein-coding genes of unknown function and 15 transposable elements were identified in the An. sinensis sialotranscriptome.
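
The category shares quoted above follow directly from the gene counts, as the short Python check below illustrates. The variable names are illustrative only. Direct computation gives 0.14% for transposable elements, whereas the text reports 0.15%; the difference is likely a rounding adjustment so that the four reported shares sum to 100%.

```python
# Gene counts per category, as reported in the text.
counts = {
    "housekeeping": 6632,
    "secreted": 1177,
    "unknown function": 2646,
    "transposable element": 15,
}

# The four categories together account for all mapped unigenes.
total = sum(counts.values())

# Percentage share of each category, rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}
```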

The classification system for sialotranscriptomes originated from Ribeiro et al. (2010), and Chagas et al. (2013) later followed it for the Ps. albipes sialotranscriptome. We also used this system in the present study; however, with the deeper sequencing of the An. sinensis sialotranscriptome, the system appears somewhat contradictory. For example, we identified one gene in the subclass “Uniquely found in culicines”, 13 genes in the class “Protein families specific of black flies”, and 106 genes in the class “Families not reported on Nematocera - sialome review”. The classification system needs to be improved as research on sialotranscriptomes progresses.

Ubiquitous protein families existing outside Diptera

A total of 437 genes (23 families) in the subclass Enzyme were identified in the An. sinensis sialotranscriptome, whereas there were only 46 genes (13 families) in the Ps. albipes sialotranscriptome (Table S4). In the comparison of these two sialotranscriptomes, the families Pyrophosphatase/Phosphodiesterase, Metalloprotease, Dipeptidyl peptidase, Carboxylesterase, Triglyceride lipase, Phlebotomus phospholipase A2, Peroxiredoxin, Glutamate carboxypeptidase, Chitinase, Anopheline peroxidases and Purine hydrolase are unique to An. sinensis, while the family Sphingomyelin phosphodiesterase is unique to Ps. albipes. The family Dipeptidyl peptidase is reported here for the first time in the salivary glands of the mosquitoes investigated. The families Adenosine deaminase (7 in An. sinensis and 1 in Ps. albipes), Ribonuclease (31 and 3), Serine protease (162 and 5), Cathepsin (8 and 3), Mosquito lipase (5 and 1), Serine-type carboxypeptidase (10 and 2) and Glycosidase (28 and 7) were much expanded in An. sinensis, with gene numbers more than twice those in Ps. albipes (Table S4).
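
The "more than twice" criterion for expanded families can be checked against the gene counts given in the paragraph above. The sketch below is only an illustration of that arithmetic; the function name and data layout are assumptions, and the counts are copied from the text.

```python
# (An. sinensis, Ps. albipes) gene counts for the Enzyme families named in the text.
enzyme_families = {
    "Adenosine deaminase": (7, 1),
    "Ribonuclease": (31, 3),
    "Serine protease": (162, 5),
    "Cathepsin": (8, 3),
    "Mosquito lipase": (5, 1),
    "Serine-type carboxypeptidase": (10, 2),
    "Glycosidase": (28, 7),
}


def expanded(families, fold=2.0):
    """Return the fold difference for families whose first count exceeds
    `fold` times the second count (strictly more than twice by default)."""
    return {
        name: round(first / second, 2)
        for name, (first, second) in families.items()
        if second > 0 and first > fold * second
    }


ratios = expanded(enzyme_families)
```

All seven families satisfy the criterion, with fold differences ranging from about 2.7 (Cathepsin) to 32.4 (Serine protease).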

The subclass Immunity related proteins was also much expanded in An. sinensis, with 146 and 41 genes in An. sinensis and Ps. albipes, respectively. The expansion in An. sinensis occurred mainly in the families Defensin, Cecropin, Galectin, Fred/Ficolin and Leucine rich protein, while the family Lysozyme was much expanded in Ps. albipes. The family Diptericin is unique to An. sinensis in comparison with Ps. albipes. In contrast, the subclass OBP superfamily was much expanded in Ps. albipes, with 26 and 51 genes in An. sinensis and Ps. albipes, respectively. The families Anopheline short D7 family and Phlebotomine long D7 family are unique to the former, while the family Culicine short-D7 protein is unique to the latter. The family Salivary mosquito OBP was much expanded in the former, whereas the family Long form D7 salivary protein was much expanded in the latter. The uniqueness and expansion might be due to phylogenetic difference, with An. sinensis belonging to Anophelinae and Ps. albipes to Culicinae, and to distributional difference, with the former occurring in Asia and the latter in America.

The gene numbers in the subclasses Ubiquitous protease inhibitor domains and Mucins and peritrophins were comparable between An. sinensis and Ps. albipes. There were 42 and 39 genes in Ubiquitous protease inhibitor domains in An. sinensis and Ps. albipes, respectively. The families Culicoides Kunitz proteins and Simulium Kunitz proteins are unique to the former and are reported here for the first time in the mosquitoes investigated, and the family Metalloproteinase inhibitor is unique to the latter. There were 57 and 56 genes in Mucins and peritrophins in An. sinensis and Ps. albipes, respectively. The Widespread mucin family is unique to the former, while the family Aedes-specific mucin is unique to the latter. The family Peritrophin/chitin binding was much expanded in the former, whereas the families Mucin I and gSG5 were much expanded in the latter.

Among the other ubiquitous families, the families Selenoprotein and Aedes Phosphatidylethanolamine-binding protein are unique and the family Yellow phlebotomine was much expanded in An. sinensis in comparison with Ps. albipes.

Protein families specific of mosquitoes

The subclass Found in both culicines and anophelines was much expanded in Ps. albipes, with 17 and 46 genes in An. sinensis and Ps. albipes, respectively. In An. sinensis, the families 56 kDa, 37.7 kDa and 4.3 kDa are unique and the family Anopheline SG1 was much expanded. In Ps. albipes, the families HHH peptide, Aedes/An. darlingi 14-15, gSG8 and Aedes 62 kDa are unique and the families Basic tail, HHH family 2 and Hyp6.2 were much expanded. There were 34 genes identified in the subclass Uniquely found in culicines in Ps. albipes, but one gene in the family 23.5 kDa salivary protein was also identified in An. sinensis, which implies that this family is not specific to culicines. As expected, the subclass Uniquely found in anophelines was found only in An. sinensis, whereas the subclass Uniquely found in Aedes was found only in Ps. albipes.

Other classes of secreted genes

For the classes Ubiquitous protein families existing also outside Nematocera (with function unknown), Ubiquitous insect protein family existing also outside Nematocera (with function unknown), and Protein families exclusive of blood sucking Nematocera, genes were found in all subclasses for both An. sinensis and Ps. albipes; however, the former was expanded in An. sinensis and the latter was much expanded in Ps. albipes in comparison with each other (Table S4). For the class Protein families specific of black flies, the family Acetylcholine receptor/Simulium nigrimanum 8-10Cys is unique to An. sinensis, while the families H-rich, acidic proteins of Simulium and Simulium disintegrin similar to Phenoloxidase inhibitor peptide are unique to Ps. albipes. The class Salivary-orphan proteins of conserved secreted families was much expanded, and the class Orphan proteins of unique standing was unique, in An. sinensis in comparison with Ps. albipes. One gene of the Aedes 7 kDa family was found in An. sinensis. The class Family not reported on Nematocera (sialome review) was divided into 20 families, of which two had genes in both An. sinensis and Ps. albipes, 10 were unique to An. sinensis and eight were unique to Ps. albipes (Table S4). In addition, 231 and 379 other secreted genes were identified in An. sinensis and Ps. albipes, respectively.

Protein families found in both An. sinensis and Ps. albipes

The sialotranscriptome comparison between An. sinensis and Ps. albipes showed that 7 classes and 45 gene families were common to the two mosquito species. In the class Ubiquitous protein families existing outside Diptera, function known or presumed, there were 12 families in the subclass Enzyme, 11 families in the subclass Immunity related proteins, 8 families in the subclass OBP superfamily, 2 families in the subclass Mucins and peritrophins, and 1 family in the subclass Other ubiquitous families. There was 1 family in the class Ubiquitous protein families existing also outside Nematocera, with function unknown; 4 families in the class Ubiquitous insect protein family existing also outside Nematocera, with function unknown; 2 families in the class Protein families exclusive of blood sucking Nematocera; and 6 families in the class Protein families specific of mosquitoes. In addition, 25 genes in the class Other putative secreted proteins were identified in both An. sinensis and Ps. albipes.

Because both species are blood-sucking, these common gene families might be involved in the processes of blood feeding and immunity. Based on past research, the families 5′ nucleotidase/Apyrase (Ribeiro JMC, 1985), Adenosine deaminase (Ribeiro et al., 2001), Hyaluronidase (Cerna et al., 2002) and Serine protease (Hedstrom, 2002) might take part in blood feeding. Hyaluronidase and Serine protease might also take part in the immune process.

Serpin, Kazal, TIL, Cystatins and Schistocerca protease inhibitor were present in both mosquitoes, and Kazal, TIL and Cystatins might act as immune inhibitors that function against bacteria or suppress immune inflammation in the host (Kanost, 1999; Rawlings et al., 2004; Kotsyfakis et al., 2007). Lysozyme, Defensin, Gambicin, Cecropin, ML domains, Galectin, C-type lectin, Peptidoglycan recognition protein, Fred/Ficolin, Leucine rich protein and Gram-negative binding protein were common to the two mosquitoes. Among them, Fred/Ficolin and C-type lectin may act as pathogen recognition receptors, which could cause pathogens to aggregate and then induce other immune cells to participate in the immune response (Fujita et al., 2004). Mucin I mosquito family, virus induced mucin, Simulium mucin, gSG5 mucin protein family, SG3 protein, Mucin II mosquito family, Other mucin and Peritrophin/chitin binding were ubiquitous in the salivary gland and the intestinal tract, and they might take part in the immune regulation induced by viruses or bacteria (Ribeiro et al., 2010).

Salivary mosquito OBP and Long form D7 salivary protein were both present in the salivary glands of An. sinensis and Ps. albipes. Past research on the Aedes aegypti salivary gland showed that D7 protein has a high affinity for leukotriene and might be involved in immune regulation (Calvo et al., 2006; Calvo et al., 2009a). Antigen 5-related protein has been reported to act mainly as a platelet aggregation inhibitor in the horsefly salivary gland and may have proteolytic activity in snake and lizard venoms, and it can also bind immunoglobulin in the salivary gland of Stomoxys calcitrans (Nobile et al., 1996; Yamazaki et al., 2002; Ameri et al., 2008).

However, saglin, which was identified in An. sinensis but not in Ps. albipes, has been reported previously and may act as a receptor for Plasmodium sporozoites and mediate Plasmodium invasion of the mosquito salivary gland (Ghosh et al., 2009). In the comparison of An. sinensis and Ps. albipes, most secreted genes were common to the sialotranscriptomes of the two species, and they might be involved in digestion or blood sucking. However, some genes were found in only one of the species, which might be associated with geographical distribution, living environment, hosts or pathogens.

Acknowledgments

This research was supported by the Par-Eu Scholars Program (20136666), the National Natural Science Foundation of China (31672363, 31372265), the Coordinated Research Project of the International Atomic Energy Agency (18268/R1), the National Key Program of Science and Technology Foundation Work of China (2015FY210300) and the Chongqing graduate research innovation project (CYS14139). Conceived and designed the research: BC, YJF. Performed the analysis: YJF, BC, ZTY. Wrote the paper: YJF, BC.

Disclosure

The authors declare no conflict of interest.

372 References

373 Ameri, M., Wang, X., Wilkerson, M.J., Kanost, M.R. and Broce, A.B. (2008) An immunoglobulin binding protein 374 (antigen 5) of the stable (Diptera: Muscidae) salivary gland stimulates bovine immune responses. 375 Journal of Medical Entomology, 45, 94–101.

Arca, B., Lombardo, F., Francischetti, I.M., Pham, V.M., Mestres-Simon, M., Andersen, J.F. and Ribeiro, J.M. (2007) An insight into the sialome of the adult female mosquito Aedes albopictus. Insect Biochemistry and Molecular Biology, 37, 107–127.

Boysen, K.E. and Matuschewski, K. (2013) Inhibitor of cysteine proteases is critical for motility and infectivity of Plasmodium sporozoites. MBio, 4, e00874-13.

Calvo, E., Andersen, J., Francischetti, I.M., De, L.C.M., Debianchi, A.G., James, A.A., Ribeiro, J.M. and Marinotti, O. (2004) The transcriptome of adult female Anopheles darlingi salivary glands. Insect Molecular Biology, 13, 73–88.

Calvo, E., Dao, A., Pham, V.M. and Ribeiro, J.M. (2007) An insight into the sialome of Anopheles funestus reveals an emerging pattern in anopheline salivary protein families. Insect Biochemistry and Molecular Biology, 37, 164–175.

Calvo, E., Mans, B.J., Andersen, J.F. and Ribeiro, J.M. (2006) Function and evolution of a mosquito salivary protein family. Journal of Biological Chemistry, 281, 1935–1942.

Calvo, E., Mans, B.J., Ribeiro, J.M. and Andersen, J.F. (2009a) Multifunctionality and mechanism of ligand binding in a mosquito antiinflammatory protein. Proceedings of the National Academy of Sciences of the United States of America, 106, 3728–3733.

Calvo, E., Pham, V.M., Marinotti, O., Andersen, J.F. and Ribeiro, J.M. (2009b) The salivary gland transcriptome of the neotropical malaria vector Anopheles darlingi reveals accelerated evolution of genes relevant to hematophagy. BMC Genomics, 10, 57.

Calvo, E., Pham, V.M. and Ribeiro, J.M. (2008) An insight into the sialotranscriptome of the non-blood feeding Toxorhynchites amboinensis mosquito. Insect Biochemistry and Molecular Biology, 38, 499–507.

Calvo, E., Sanchez-Vargas, I., Favreau, A.J., Barbian, K.D., Pham, V.M., Olson, K.E. and Ribeiro, J.M. (2010a) An insight into the sialotranscriptome of the West Nile mosquito vector, Culex tarsalis. BMC Genomics, 11, 51.

Calvo, E., Sanchez-Vargas, I., Kotsyfakis, M., Favreau, A.J., Barbian, K.D., Pham, V.M., Olson, K.E. and Ribeiro, J.M. (2010b) The salivary gland transcriptome of the eastern tree hole mosquito, Ochlerotatus triseriatus. Journal of Medical Entomology, 47, 376–386.

Cerna, P., Mikes, L. and Volf, P. (2002) Salivary gland hyaluronidase in various species of phlebotomine sand flies (Diptera: Psychodidae). Insect Biochemistry and Molecular Biology, 32, 1691–1697.

Chagas, A.C., Calvo, E., Rios-Velasquez, C.M., Pessoa, F.A., Medeiros, J.F. and Ribeiro, J.M. (2013) A deep insight into the sialotranscriptome of the mosquito, Psorophora albipes. BMC Genomics, 14, 875.

Champagne, D.E., Smartt, C.T., Ribeiro, J.M. and James, A.A. (1995) The salivary gland-specific apyrase of the mosquito Aedes aegypti is a member of the 5'-nucleotidase family. Proceedings of the National Academy of Sciences of the United States of America, 92, 694–698.

Chen, B., Zhang, Y.J., He, Z., Li, W., Si, F., Tang, Y., He, Q., Qiao, L., Yan, Z., Fu, W. and Che, Y. (2014) De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae). Parasites & Vectors, 7, 314.

Cotama, S., Dekumyoy, P., Samung, Y. and Lek-Uthai, U. (2013) Salivary Glands Proteins Expression of Anopheles dirus A Fed on Plasmodium vivax- and Plasmodium falciparum-Infected Human Blood. Journal of Parasitology Research, 2013, 535267.


Das, S., Radtke, A., Choi, Y.J., Mendes, A.M., Valenzuela, J.G. and Dimopoulos, G. (2010) Transcriptomic and functional analysis of the Anopheles gambiae salivary gland in relation to blood feeding. BMC Genomics, 11, 566.

Dixit, R., Sharma, A., Mourya, D.T., Kamaraju, R., Patole, M.S. and Shouche, Y.S. (2009) Salivary gland transcriptome analysis during Plasmodium infection in malaria vector Anopheles stephensi. International Journal of Infectious Diseases, 13, 636–646.

Francischetti, I.M., Valenzuela, J.G., Pham, V.M., Garfield, M.K. and Ribeiro, J.M. (2002) Toward a catalog for the transcripts and proteins (sialome) from the salivary gland of the malaria vector Anopheles gambiae. Journal of Experimental Biology, 205, 2429–2451.

Francischetti, I.M., Valenzuela, J.G. and Ribeiro, J.M. (1999) Anophelin: kinetics and mechanism of thrombin inhibition. Biochemistry, 38, 16678–16685.

Fujita, T., Matsushita, M. and Endo, Y. (2004) The lectin-complement pathway–its role in innate immunity and evolution. Immunological Reviews, 198, 185–202.

Ghosh, A.K., Devenport, M., Jethwaney, D., Kalume, D.E., Pandey, A., Anderson, V.E., Sultan, A.A., Kumar, N. and Jacobs-Lorena, M. (2009) Malaria parasite invasion of the mosquito salivary gland requires interaction between the Plasmodium TRAP and the Anopheles saglin proteins. PLoS Pathogens, 5, e1000265.

Godreuil, S., Leban, N., Padilla, A., Hamel, R., Luplertlop, N., Chauffour, A., Vittecoq, M., Hoh, F., Thomas, F., Sougakoff, W., Lionne, C., Yssel, H. and Misse, D. (2014) Aedesin: structure and antimicrobial activity against multidrug resistant bacterial strains. PLoS ONE, 9, e105441.

Hedstrom, L. (2002) Serine protease mechanism and specificity. Chemical Reviews, 102, 4501–4524.

Julenius, K., Molgaard, A., Gupta, R. and Brunak, S. (2005) Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology, 15, 153–164.

Kanost, M.R. (1999) Serine proteinase inhibitors in arthropod immunity. Developmental and Comparative Immunology, 23, 291–301.

Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R. and Salzberg, S.L. (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology, 14, R36.

Kotsyfakis, M., Karim, S., Andersen, J.F., Mather, T.N. and Ribeiro, J.M. (2007) Selective cysteine protease inhibition contributes to blood-feeding success of the tick Ixodes scapularis. Journal of Biological Chemistry, 282, 29256–29263.

Nielsen, H., Engelbrecht, J., Brunak, S. and Von Heijne, G. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Engineering, 10, 1–6.

Nobile, M., Noceti, F., Prestipino, G. and Possani, L.D. (1996) Helothermine, a lizard venom toxin, inhibits calcium current in cerebellar granules. Experimental Brain Research, 110, 15–20.

Petersen, T.N., Brunak, S., Von Heijne, G. and Nielsen, H. (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods, 8, 785–786.

Rawlings, N.D., Tolle, D.P. and Barrett, A.J. (2004) Evolutionary families of peptidase inhibitors. Biochemical Journal, 378, 705–716.

Ribeiro, J.M. (1992) Characterization of a vasodilator from the salivary glands of the yellow fever mosquito Aedes aegypti. Journal of Experimental Biology, 165, 61–71.

Ribeiro, J.M., Arca, B., Lombardo, F., Calvo, E., Phan, V.M., Chandra, P.K. and Wikel, S.K. (2007) An annotated catalogue of salivary gland transcripts in the adult female mosquito, Aedes aegypti. BMC Genomics, 8, 6.

Ribeiro, J.M., Charlab, R., Pham, V.M., Garfield, M. and Valenzuela, J.G. (2004) An insight into the salivary transcriptome and proteome of the adult female mosquito Culex pipiens quinquefasciatus. Insect Biochemistry and Molecular Biology, 34, 543–563.

Ribeiro, J.M., Charlab, R. and Valenzuela, J.G. (2001) The salivary adenosine deaminase activity of the mosquitoes Culex quinquefasciatus and Aedes aegypti. Journal of Experimental Biology, 204, 2001–2010.

Ribeiro, J.M., Mans, B.J. and Arca, B. (2010) An insight into the sialome of blood-feeding Nematocera. Insect Biochemistry and Molecular Biology, 40, 767–784.

Ribeiro, J.M., Rossignol, P.A. and Spielman, A. (1985) Salivary gland apyrase determines probing time in anopheline mosquitoes. Journal of Insect Physiology, 31, 4.

Sinka, M.E., Bangs, M.J., Manguin, S., Chareonviriyaphap, T., Patil, A.P., Temperley, W.H., Gething, P.W., Elyazar, I.R., Kabaria, C.W., Harbach, R.E. and Hay, S.I. (2011) The dominant Anopheles vectors of human malaria in the Asia-Pacific region: occurrence data, distribution maps and bionomic precis. Parasites & Vectors, 4, 89.

Siriyasatien, P., Tangthongchaiwiriya, K., Jariyapan, N., Kaewsaitiam, S., Poovorawan, Y. and Thavara, U. (2005a) Analysis of salivary gland proteins of the mosquito Armigeres subalbatus. Southeast Asian Journal of Tropical Medicine and Public Health, 36, 64–67.

Siriyasatien, P., Tangthongchaiwiriya, K., Kraivichian, K., Nuchprayoon, S., Tawatsin, A. and Thavara, U. (2005b) Decrease of mosquito salivary gland proteins after a blood meal: an implication for pathogenesis of mosquito bite allergy. Journal of the Medical Association of Thailand, 88 Suppl 4, S255–S259.

Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L. and Pachter, L. (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols, 7, 562–578.

Valenzuela, J.G., Francischetti, I.M., Pham, V.M., Garfield, M.K. and Ribeiro, J.M. (2003) Exploring the salivary gland transcriptome and proteome of the Anopheles stephensi mosquito. Insect Biochemistry and Molecular Biology, 33, 717–732.

Yamazaki, Y., Koike, H., Sugiyama, Y., Motoyoshi, K., Wada, T., Hishinuma, S., Mita, M. and Morita, T. (2002) Cloning and characterization of novel snake venom proteins that block smooth muscle contraction. European Journal of Biochemistry, 269, 2708–2715.

Ye, J., Fang, L., Zheng, H., Zhang, Y., Chen, J., Zhang, Z., Wang, J., Li, S., Li, R., Bolund, L. and Wang, J. (2006) WEGO: a web tool for plotting GO annotations. Nucleic Acids Research, 34, W293–W297.

Zhang, S., Cheng, F. and Webber, R. (1994) A successful control programme for lymphatic filariasis in Hubei, China. Transactions of the Royal Society of Tropical Medicine and Hygiene, 88, 510–512.

Zhou, D., Zhang, D., Ding, G., Shi, L., Hou, Q., Ye, Y., Xu, Y., Zhou, H., Xiong, C., Li, S., Yu, J., Hong, S., Yu, X., Zou, P., Chen, C., Chang, X., Wang, W., Lv, Y., Sun, Y., Ma, L., Shen, B. and Zhu, C. (2014) Genome sequence of Anopheles sinensis provides insight into genetics basis of mosquito competence for malaria parasites. BMC Genomics, 15, 42.

Manuscript received June 27, 2016

Final version received October 27, 2016

Accepted November 15, 2016

Table 1 Statistics of sequencing, assembly and functional annotation for the transcriptome of An. sinensis salivary gland.

Sequencing results
  Number of total raw reads               46,434,832
  Number of total raw data (nt)           5,804,354,000
  Number of total clean reads             45,852,942
  Number of clean data (nt)               5,731,617,750
  Q20 percentage of total clean reads     95.54%
  GC percentage of total clean data       51.55%
  N percentage of total clean data        0.00%

Assembling results
  Number of all genes                     10,907
  Number of known genes                   10,470
  New genes                               437

Annotation (E-value <= 1e-5)
  Known genes with GO annotation          3625 (68 sub-categories)
  New genes with GO annotations           161
  Known genes with KEGG annotation        4073 (234 pathways)
  New genes with KEGG annotations         156
  Biological process                      30 sub-categories
  Cellular component                      14 sub-categories
  Molecular function                      24 sub-categories


Table 2 Classes/subclasses and gene number of secreted transcripts in An. sinensis and Ps. albipes sialotranscriptomes. Detailed information is in Table S4.

Class/subclass                                                An. sinensis  Ps. albipes†
1. Ubiquitous protein families existing outside Diptera,
   function known or presumed
     Enzyme                                                   437           46
     Ubiquitous protease inhibitor domains                    42            39
     Immunity related proteins                                146           41
     Mucins and peritrophins                                  57            56
     OBP superfamily                                          26            51
     Other ubiquitous families                                18            2
2. Ubiquitous protein families existing also outside          22            13
   Nematocera, with function unknown
3. Ubiquitous insect protein family existing also outside     10            9
   Nematocera, with function unknown
4. Protein families exclusive of blood sucking Nematocera     3             13
5. Protein families specific of mosquitoes
     Found in both culicines and anophelines                  17            46
     Uniquely found in culicines‡                             1             34
     Uniquely found in anophelines                            6             -
     Uniquely found in Aedes                                  -             14
6. Protein families specific of black flies‡                  13            2
7. Salivary-orphan proteins of conserved secreted families    33            8
8. Orphan proteins of unique standing                         8             -
9. Deorphanized proteins                                      1             27
10. Families not reported on Nematocera - sialome review‡     106           40
11. Other putative secreted proteins                          231           370
Total                                                         1177          811

†Chagas et al. (2013).

‡Our annotation and classification results showed a number of conflicts with the classification system of Ribeiro et al. (2010) and Chagas et al. (2013).
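As a consistency check on Table 2, the per-class counts can be summed and compared with the reported totals. The sketch below is illustrative only: the names `TABLE2_COUNTS` and `column_total` are hypothetical, and rows printed with a single value are assumed to belong to the column that lets the totals balance (anophelines and orphan proteins of unique standing under An. sinensis, Aedes under Ps. albipes).

```python
# Counts transcribed from Table 2 as (An. sinensis, Ps. albipes).
# Assumption: single-valued rows are assigned to the column that makes
# the reported totals balance; None marks an empty cell.
TABLE2_COUNTS = [
    (437, 46), (42, 39), (146, 41), (57, 56), (26, 51), (18, 2),  # class 1
    (22, 13),                                   # class 2
    (10, 9),                                    # class 3
    (3, 13),                                    # class 4
    (17, 46), (1, 34), (6, None), (None, 14),   # class 5
    (13, 2),                                    # class 6
    (33, 8),                                    # class 7
    (8, None),                                  # class 8
    (1, 27),                                    # class 9
    (106, 40),                                  # class 10
    (231, 370),                                 # class 11
]


def column_total(rows, index):
    """Sum one column of the table, skipping empty cells."""
    return sum(row[index] for row in rows if row[index] is not None)


an_sinensis_total = column_total(TABLE2_COUNTS, 0)
ps_albipes_total = column_total(TABLE2_COUNTS, 1)
print(an_sinensis_total, ps_albipes_total)  # 1177 811
```

Under this assignment both sums match the totals given in the table (1177 and 811).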

Figure legends:

Fig. 1 Coverage distribution of An. sinensis unigenes mapped to the reference genome. Coverage range and percentage coverage (in brackets) are shown in the corresponding charts.

Fig. 2 Gene ontology function classifications of salivary gland genes in An. sinensis.

Fig. 3 COG function classification of salivary gland genes in An. sinensis.

Fig. 4 Gene number and percentage of main functional gene categories in An. sinensis sialotranscriptome.