Accepted Manuscript

The Harderian gland transcriptomes of Caraiba andreae, Cubophis cantherigerus and Tretanorhinus variabilis, three colubroid snakes from Cuba

Dany Domínguez-Pérez, Jordi Durban, Guillermin Agüero-Chapin, Javier Torres Lopez, Reinaldo Molina Ruiz, Daniela Almeida, Juan J. Calvete, Vítor Vasconcelos, Agostinho Antunes

PII: S0888-7543(18)30501-9
DOI: https://doi.org/10.1016/j.ygeno.2018.11.026
Reference: YGENO 9154
To appear in: Genomics
Received date: 24 August 2018
Revised date: 31 October 2018
Accepted date: 27 November 2018

Please cite this article as: Dany Domínguez-Pérez, Jordi Durban, Guillermin Agüero-Chapin, Javier Torres Lopez, Reinaldo Molina Ruiz, Daniela Almeida, Juan J. Calvete, Vítor Vasconcelos, Agostinho Antunes, The Harderian gland transcriptomes of Caraiba andreae, Cubophis cantherigerus and Tretanorhinus variabilis, three colubroid snakes from Cuba. Ygeno (2018), https://doi.org/10.1016/j.ygeno.2018.11.026

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

The Harderian gland transcriptomes of Caraiba andreae, Cubophis cantherigerus and Tretanorhinus variabilis, three colubroid snakes from Cuba

Dany Domínguez-Pérez 1,2,+, Jordi Durban 3,+, Guillermin Agüero-Chapin 1,2, Javier Torres Lopez 4,5, Reinaldo Molina Ruiz 6, Daniela Almeida 1,2, Juan J. Calvete 3,*, Vítor Vasconcelos 1,2 and Agostinho Antunes 1,2,*

1 CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, Porto 4450-208, Portugal; [email protected] (D.D.-P.); [email protected] (G.A.-Ch.); [email protected] (D.A.); [email protected] (V.V.); [email protected] (A.A.)

2 Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, s/n, Porto 4169-007, Portugal

3 Evolutionary and Translational Venomics Laboratory, CSIC, Jaume Roig, 11, 46010, Valencia, Spain; [email protected] (J.D.); [email protected] (J.J.C.)

4 Department of Ecology and Evolutionary Biology, The University of Kansas, 1345 Jayhawk Blvd., Lawrence, Kansas 66045, USA; [email protected] (J.T.L.)

5 Faculty of Biology, Havana University, 25 St 455, La Habana CP 10400, Cuba

6 Centro de Bioactivos Químicos, Universidad Central “Marta Abreu” de Las Villas, 54830, Santa Clara, Cuba; [email protected] (R.M.-R.)

+ These authors contributed equally to this work

* Correspondence: [email protected] Tel.: +353‐22‐340‐1813; [email protected]

Abstract: The Harderian gland is a cephalic structure that is widely distributed among vertebrates. In snakes, the Harderian gland is anatomically connected to the vomeronasal organ via the nasolacrimal duct, and in some species it can be larger than the eyes. The function of the Harderian gland remains elusive, but it has been proposed to play a role in the production of saliva, pheromones, thermoregulatory lipids and growth factors, among others. Here, we have profiled the transcriptomes of the Harderian glands of three non-front-fanged colubroid snakes from Cuba, the Cuban Lesser Racer, Caraiba andreae; the Cuban Racer, Cubophis cantherigerus; and the Caribbean Water Snake, Tretanorhinus variabilis, using Illumina HiSeq2000 100 bp paired-end sequencing. In addition to ribosomal and non-characterized proteins, the most abundant transcripts encode putative transport/binding, lipocalin/lipocalin-like, and bactericidal/permeability-increasing-like proteins. Transcripts coding for putative canonical toxins described in venomous snakes were also identified. This transcriptional profile suggests a more complex function than previously recognized for this enigmatic organ.

Keywords: Harderian gland transcriptomics, non-front-fanged snakes, , Caraiba andreae, Cubophis cantherigerus, Tretanorhinus variabilis


1. Introduction

With widespread distribution, often large size and unclear function, the Harderian gland represents one of the least understood organs among vertebrates [1]. This gland is located within the eye’s orbit (Fig. 1), occupying a large volume [2-5] that in some species can exceed that of the eye [1]. The Harderian gland was first discovered in deer by the Swiss anatomist Johann Jacob Harder in 1694 [6], and it has subsequently been reported in most tetrapods (reptiles, amphibians, birds and mammals) that possess a nictitating membrane [1]. This gland seems to play an important role in terrestrial environments, because it has not been found in fish, aquatic urodeles, or in the aquatic forms of anurans. Nevertheless, it is present in some secondarily aquatic vertebrates such as crocodilians and cetaceans [1]. Proposed functions for the Harderian gland include the production of saliva, pheromones, thermoregulatory lipids and growth factors, photoprotection, a site of immune response and osmoregulation [1,7].

In squamates, the Harderian gland is anatomically part of the retinal-pineal axis and acts as an accessory to the lacrimal gland, secreting fluid that eases movement of the nictitating membrane [1,8-12]. Its morphological relationship with the vomeronasal (Jacobson's) organ (VNO) [13,14] and its role in vomerolfaction and/or chemosensory detection are well documented [3,15]. Other hypothetical functions of the Harderian gland in squamates, such as acting as an accessory salivary gland [16], providing a source of lubrication [2,17], and producing and secreting digestive enzymes [9], have been discussed. However, apart from snakes and pygopods, the Harderian gland empties its secretions directly into the orbital space [9,18], and thus it seems unlikely that the Harderian gland acts as an accessory salivary gland, as proposed [16]. In addition, an in-depth review revealed little evidence consistent with a role in digestion [19]. Further, the morphology of the squamate Harderian gland and the presence of alternative secretory organs make a role in corneal lubrication unlikely [15]. On the other hand, there is evidence supporting a function for the squamate Harderian gland in chemosensation/vomerolfaction, transferring chemical signals (female pheromones, feeding cues) to the chemosensory epithelium of the VNO [9,15,19-21]. Moreover, there is evidence that removal of the Harderian gland considerably reduced the capacity of garter snakes to respond to female pheromone, thereby impairing mating [20].

The Harderian gland has been considered “as the most inconstant of all the glands of the snake’s head in the extent of its development", since it varies from the condition of an incipient glandular tissue to an obvious organ in Dryophis, Homalopsis, Enhydris, Platurus [22,23], and in the African egg-eating snake Dasypeltis scabra [24]. The Harderian gland is often very large in cryptozoic and fossorial taxa [21,25-27] and in other species that eat worms or slugs, such as Tropidodipsas sartorii [28]. The gland is present in sea snakes, although reduced in pelagic species [4]. The proposed lubricant function for the Harderian gland in snakes is supported by the anatomical connection between the orbit around the eye and the mouth via the lachrymal duct [29,30], which transports fluid from the Harderian gland to the Jacobson’s organ [1]. It is noteworthy that all the snakes examined by West [22] had a typical albuminous (serous) Harderian gland with much smaller alveoli than the parotid gland, which gave rise to the venom gland of extant venomous snakes. However, despite the unique, morphologically differentiated structure of the Harderian gland, the identity of its secretions has not been unveiled in any snake species.

Proteomic analyses and Next Generation Sequencing (NGS) approaches combined with computational methods have been used to discover new proteins/genes from a variety of tissues of non-model organisms and to work out their roles in complex pathways such as toxin production or detoxification processes [31-44]. Specifically, de novo NGS studies have become the most important source of new large-scale data, giving a comprehensive picture of the transcriptional activity of the venom gland of front-fanged and, to a lesser extent, non-front-fanged snakes [45-59]. Here we report the first transcriptomes of the Harderian gland of three non-front-fanged colubroid snakes from Cuba: Caraiba andreae (Cuban Lesser Racer), an endemic species of a monotypic genus; and Cubophis cantherigerus (Cuban Racer) and Tretanorhinus variabilis (Cuban water snake), two native species that also inhabit other Caribbean islands, such as the Swan Islands and the Cayman Islands, respectively (although the specimens sampled represent endemic Cuban subspecies). These snakes occupy different ecological niches, with the first two being terrestrial and diurnal active foragers, while the Cuban water snake is a freshwater, nocturnal sit-and-wait predator [60]. To our knowledge, this study provides the first transcriptome-wide analysis of the Harderian gland in reptiles. The Harderian gland transcriptomes reported here revealed transcripts encoding putative binding/transport, antimicrobial and immunological proteins, but also toxins and other components found in snake venoms. This evidence suggests a more complex function than previously recognized for this enigmatic organ.

2. Results and Discussion

2.1. Harderian gland transcriptome and assembly statistics

The transcriptional activity of the Harderian glands of three species of colubroid snakes from Cuba was investigated using Illumina HiSeq2000 100 bp (base-pairs) paired-end (PE) sequencing. We obtained 54,593,038 (Caraiba andreae, Ca), 50,762,546 (Cubophis cantherigerus, Cc) and 56,115,370 (Tretanorhinus variabilis, Tv) reads, with an average insert size of ~280 bp. The reads were de novo assembled into 63,333 (Ca), 27,666 (Cc) and 62,324 (Tv) good-quality contigs (Table 1). Raw sequence data have been archived with the NCBI Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/Traces/sra) [61] under accession number SRP103723. Accession codes by species are SRX2730337 (Ca), SRX2730335 (Cc), and SRX2730333 (Tv). The Transcriptome Shotgun Assemblies have been deposited at DDBJ/EMBL/GenBank under the accessions GGQX00000000 (Ca), GGQY00000000 (Cc), and GGQZ00000000 (Tv). Supplementary Tables S1-S3 list the best Blast hits and statistics of all the contigs assembled from the transcriptomic analysis of the Harderian gland of the Cuban colubroids Ca, Cc, and Tv, respectively.

2.2. The tip of the iceberg: the top 25 expressed transcripts

Assembled contigs were used as templates for mapping raw paired-end reads, yielding more than 40 million mapped reads in each species, which represented 78.76% (Ca), 82.21% (Cc) and 81.77% (Tv) of the total reads, with the majority of reads mapped in pairs (Table 1). Blast analysis of assembled contigs yielded 45,258 hits for Ca, 23,370 for Cc, and 36,982 for Tv, representing 71.46%, 84.47%, and 59.33% of the respective transcriptome (Table 1). The estimated expression (in Reads per Kilobase of Exon per Million Mapped Reads (RPKM) and Transcripts per Million (TPM)) for all contigs in the Harderian gland transcriptomes of the three Cuban colubrids investigated is shown in Supplementary Tables S1-S3. Both RPKM and TPM had a similar distribution in the three snake species, with about 25 contigs, each representing  0.1% RPKM of the transcriptome, and a smaller number (9-15) of highly transcribed genes (Supplementary Figure S1).
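As a rough illustration of the two expression measures used above, the following Python sketch computes RPKM and TPM from per-contig read counts and contig lengths. The function names and the toy numbers are assumptions made for this example; they are not taken from the CLC Genomics Workbench output or from the Supplementary Tables. The helper `percent_of_total` assumes that the "% RPKM" shares quoted in the text are each contig's RPKM divided by the summed RPKM of the transcriptome.

```python
def rpkm(read_counts, lengths_bp):
    """Reads Per Kilobase per Million mapped reads, one value per contig."""
    total_reads = sum(read_counts)
    return [
        (count * 1e9) / (length * total_reads)
        for count, length in zip(read_counts, lengths_bp)
    ]


def tpm(read_counts, lengths_bp):
    """Transcripts Per Million: length-normalised rates rescaled to sum to 1e6."""
    rates = [count / length for count, length in zip(read_counts, lengths_bp)]
    total_rate = sum(rates)
    return [rate / total_rate * 1e6 for rate in rates]


def percent_of_total(values):
    """Share of each value in the total, in percent (assumed meaning of '% RPKM')."""
    total = sum(values)
    return [100.0 * value / total for value in values]


if __name__ == "__main__":
    # Hypothetical toy data: three contigs, mapped read counts and lengths in bp.
    counts = [100, 200, 700]
    lengths = [1000, 2000, 1000]
    print("RPKM:", rpkm(counts, lengths))
    print("TPM:", tpm(counts, lengths))
    print("% RPKM:", percent_of_total(rpkm(counts, lengths)))
```

Because both measures normalise by contig length, the two hypothetical contigs with the same reads-per-base rate receive the same RPKM and the same TPM; the measures differ only in the final scaling, which is why their distributions look alike.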

Amongst the 25 most expressed Harderian gland mRNAs of the three Cuban snakes sampled, orthologs of transcripts found in the maxillary venom glands of advanced snakes [52] were identified, including lipocalin-like toxins AFC17965 [Erythrolamprus miliaris] (8.8% RPKM in Ca; 6.25% in Cc; 3.3% in Tv), JQ340879 [Dispholidus typus] (Ca, 3.3%; Cc, 1.78%; Tv, 0.7%), and XM_014057573 [Thamnophis sirtalis] (Ca, 0.11%; Cc, 0.54%); and bactericidal/permeability-increasing protein-like 3 CAO79917 [Deinagkistrodon acutus] (Ca, 0.5%; Cc, 0.5%; Tv, 0.3%). In addition, 0.2% of the transcriptional activity of the Harderian gland of Caraiba andreae was identified as Ophiophagus hannah Ohanin precursor-like [AY351433] mRNA, and Tretanorhinus variabilis expressed a Boiga irregularis pentaxin-like transcript [A0A0B8RX63] that accounted for 0.3% RPKM of its Harderian gland transcriptome.

Moreover, a high occurrence of proteins with no ascribed function (uncharacterized proteins), which we called non-characterized proteins, was also found. The presence of homologous sequences without predicted function could be a consequence of the increasing number of large-scale sequencing projects deposited in databases. It also correlates with the limited knowledge of the Harderian gland gene products.

One of the most expressed transcripts corresponded to lipocalins, which are secreted proteins that bind a wide variety of small hydrophobic ligands with high affinity [62]. Members of this protein family are thought to be involved in biological processes such as immune response, olfaction and pheromone transport [62-65]. Lipocalins have been found in gland secretions of both invertebrate and vertebrate hematophagous taxa [66], and in transcriptomic studies of venom glands of a variety of advanced snakes [52]. In this respect, lipocalin has been reported as one of the most expressed transcripts in the Atractaspis aterrima venom gland [67], and in the Duvernoy’s gland transcriptome of Oxyrhopus guibei, where it represented 29% of the sequenced reads [54].

The presence of bactericidal proteins, in addition to snake toxins, points to a possible involvement in defense against pathogens. The family of bactericidal/permeability-increasing proteins has been associated with protection against Gram-negative bacteria in humans and mice [68], showing an evolutionary conservation (due to their very important role) among very distinct mammals that could also occur in other vertebrates, such as reptiles. Indeed, the bactericidal/permeability-increasing protein has been classified as an “endogenous antibiotic protein with activity against Gram–negative bacteria” [69]. However, other functions, such as odorant carrier or odorant removal, have previously been suggested for this family [68,70]. In addition, another transcript found among the most expressed contigs was pentaxin (pentraxin), which has also been reported to be involved in acute immunological responses [71].

Establishing the actual translation levels and functions of the putative proteins identified in the Harderian gland of the three Cuban snake species sampled requires further work, but it is tempting to hypothesize its role in the olfactory system, i.e., defense against pathogens, and in the transport of pheromones or odorants to the VNO, as previously suggested [20]. The hypothetical defensive function of the Harderian gland would be supported by the reported overexpression of lipocalin and bactericidal/permeability-increasing proteins in transcriptomes of the human and murine olfactory epithelium [70], a tissue that is permanently exposed to pathogen invasions [68,70].

2.3. Low abundance Harderian gland transcripts encoding canonical toxins

A low percentage (3.3-3.6%) of transcripts matched protein families that in other non-front-fanged snake species include bona fide or putative toxins [54] (Table 2). These 1,632 (Ca), 781 (Cc) and 1,299 (Tv) transcripts were manually filtered to sort out products of ordinary genes from those displaying significant homology to proteins previously identified in snake venoms or venom/Duvernoy's gland transcriptomes. Table 2 displays the identity, relative abundance and closest similarity to NCBI databank snake venom protein entries of contigs transcribed in the Harderian glands of the three Cuban snakes sampled.

Table 1. Summary statistics for the de novo assembly of the Harderian gland transcriptomes of the three Cuban colubroids, Caraiba andreae (Ca), Cubophis cantherigerus (Cc) and Tretanorhinus variabilis (Tv).

Figure 1. A) Picture highlighting the anatomical relationship between the positions of the Harderian (Hg) and Duvernoy's (Dg) cephalic glands of the Cuban Lesser Racer, Caraiba andreae. B) Pie charts of the relative expression of putative toxin-coding transcripts identified in the transcriptomes of the Harderian glands of the three Cuban colubrids, Caraiba andreae (Ca), Cubophis cantherigerus (Cc) and Tretanorhinus variabilis (Tv).

Our results point to a low expression of the putative toxin transcripts (0.0001-0.01% of the total transcriptional activity) in the respective Harderian glands of Cc and Tv. Only in Ca do transcripts encoding putative SVMPs represent >0.1% of the transcriptome (Table 2). In addition, there is little overlap between the sets of putative toxin transcripts among the three snakes. The most expressed toxin transcripts in Ca encoded sequences with closest homology to Philodryas SVMPs. This toxin class is the second most abundant among the putative toxin transcripts of Tv and is absent in Cc, whose Harderian gland "toxin transcriptome" is dominated by waprin-encoding mRNAs (Table 2). Moreover, CTL-encoding transcripts represent more than half of all the Tv Harderian gland toxin transcripts, but they barely account for 2% of the transcripts coding for putative toxins in the Harderian gland transcriptome of Caraiba andreae (Ca). Figure 1B displays the relative levels of putative toxin transcripts among the Harderian gland transcriptomes of the Cuban colubrids investigated.

The presence of toxin transcripts in the Harderian gland is plausible, since canonical toxins have been repeatedly reported in the oral secretions of some nonvenomous species of snakes and lizards [38,66,72,73]. In squamates, some tissues can produce toxin transcripts as a consequence of the single early origin of toxin-encoding genes at the base of the Toxicofera [74,75]. Moreover, this feature is not restricted to organs or glands whose embryonic fate is related to oral secretions. Besides salivary glands, scent glands and even skin gene-expression profiles from gekkonid lizards and some snakes revealed co-expression of toxin-related genes at levels comparable to those found in snake venom glands [74]. Therefore, the Harderian gland may produce toxins that occur in its secretion, at least at low levels. While the amount of toxins is unlikely to be functionally relevant in predation, it could play a role in feeding and in protection against microorganisms, harmful bacteria or infections in the orbital space, mucosal membranes and also skin.

Table 2. Putative canonical toxin-coding transcripts expressed in the Harderian gland transcriptomes of the Cuban colubroids, Caraiba andreae (Ca), Cubophis cantherigerus (Cc) and Tretanorhinus variabilis (Tv). C-NP, C-type natriuretic peptide precursor; CTL, C-type lectin-like; CRISP, cysteine-rich secretory protein; Hya, hyaluronidase; KUN, Kunitz-type serine protease inhibitor-like; SVMP, snake venom metalloproteinase; SP, serine protease; VEGF, vascular endothelial growth factor; WAP, waprin. Nucleotide sequences of assembled putative toxin-coding contigs are listed in Supplementary Table S4.

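| Species | Toxin family | Contig | Contig size (bp) | Best NCBI hit | % Query coverage | % ID | e-value | RPKM | % transcriptome | % Toxin family |
|---|---|---|---|---|---|---|---|---|---|---|
| Ca | C-NP | Contig16590 | 413 | DQ912656.1 Philodryas olfersii | 100.0 | 90.6 | 3.32E-174 | 132.2 | 0.01320 | 2.95 |
| Ca | C-NP | Contig17753 | 875 | DQ912656.1 Philodryas olfersii | 67.3 | 87.0 | 0.0 | | | |
| Ca | CTL | Contig613 | 442 | KM527183.1 Philodryas chamissonis | 41.6 | 91.9 | 1.76E-70 | 33.2 | 0.00330 | 1.97 |
| Ca | CTL | Contig17228 | 765 | DQ912658.1 Philodryas olfersii | 86.5 | 90.0 | 0.0 | 55.4 | 0.00550 | |
| Ca | CRISP | Contig16917 | 1681 | DQ912659.1 Philodryas olfersii | 49.5 | 89.9 | 0.0 | 1046.4 | 0.07400 | 23.94 |
| Ca | CRISP | Contig59349 | 2116 | DQ912659.1 Philodryas olfersii | 61.8 | 91.0 | 0.0 | | | |
| Ca | CRISP | Contig19342 | 847 | DQ139897.1 Philodryas olfersii | 62.5 | 85.3 | 7.53E-180 | 3.3 | 0.00033 | |
| Ca | CRISP | Contig48801 | 964 | EU938339.1 Naja kaouthia | 6.5 | 93.7 | 1.47E-18 | 15.5 | 0.00150 | |
| Ca | CRISP | Contig17831 | 1635 | AY093955.1 Rhabdophis tigrinus | 3.4 | 96.4 | 1.07E-16 | 9.0 | 0.00090 | |
| Ca | HYA | Contig46174 | 1032 | DQ840262.1 Echis carinatus sochureki | 67.7 | 88.1 | 0.0 | 1.5 | 0.00015 | 0.18 |
| Ca | HYA | Contig46175 | 1812 | AB851978.1 Ovophis okinavensis | 91.2 | 89.0 | 0.0 | 6.5 | 0.00054 | |
| Ca | HYA | Contig46176 | 1048 | AB851978.1 Ovophis okinavensis | 67.1 | 85.9 | 0.0 | | 0.00011 | |
| Ca | KUN | Contig17117 | 2876 | EU012449.1 Austrelaps labialis | 40.4 | 82.7 | 0.0 | 27.8 | 0.00270 | 0.74 |
| Ca | KUN | Contig42573 | 529 | DQ464286.1 Sistrurus catenatus edwardsi | 90.0 | 73.5 | 2.94E-81 | 5.6 | 0.00056 | |
| Ca | SVMP | Contig324 | 1459 | GQ139592.1 Philodryas olfersii | 46.0 | 79.0 | 1.32E-179 | 1103.2 | 0.11000 | 68.58 |
| Ca | SVMP | Contig16906 | 1673 | GQ139592.1 Philodryas olfersii | 47.4 | 87.1 | 0.0 | | | |
| Ca | SVMP | Contig21872 | 328 | GQ139592.1 Philodryas olfersii | 99.7 | 88.5 | 2.13E-118 | | | |
| Ca | SVMP | Contig44916 | 1139 | GQ139592.1 Philodryas olfersii | 88.5 | 92.8 | 0.0 | | | |
| Ca | SVMP | Contig44919 | 2158 | GQ139592.1 Philodryas olfersii | 76.0 | 89.7 | 0.0 | | | |
| Ca | SVMP | Contig45587 | 433 | GQ139592.1 Philodryas olfersii | 100.0 | 92.8 | 0.0 | | | |
| Ca | SVMP | Contig464 | 813 | GQ139592.1 Philodryas olfersii | 96.7 | 76.9 | 0.0 | | | |
| Ca | SVMP | Contig63240 | 255 | GQ139592.1 Philodryas olfersii | 100.0 | 92.5 | 8.94E-103 | | | |
| Ca | SVMP | Contig936 | 268 | KM527180.1 Philodryas chamissonis | 100.0 | 77.5 | 1.25E-50 | 1746.3 | 0.17500 | |
| Ca | SVMP | Contig16965 | 2628 | KM527180.1 Philodryas chamissonis | 85.8 | 91.1 | 0.0 | | | |
| Ca | SVMP | Contig17004 | 1177 | KM527180.1 Philodryas chamissonis | 100.0 | 88.1 | 0.0 | | | |
| Ca | SVMP | Contig44917 | 1578 | KM527180.1 Philodryas chamissonis | 43.4 | 93.0 | 0.0 | | | |
| Ca | SVMP | Contig44918 | 1479 | KM527180.1 Philodryas chamissonis | 88.8 | 93.2 | 0.0 | | | |
| Ca | SVMP | Contig49335 | 871 | KM527180.1 Philodryas chamissonis | 51.0 | 93.7 | 0.0 | | | |
| Ca | SVMP | Contig61177 | 440 | KM527180.1 Philodryas chamissonis | 47.7 | 89.5 | 2.74E-74 | | | |
| Ca | SVMP | Contig46524 | 370 | EF080840.1 Naja atra | 88.1 | 88.8 | 2.96E-117 | 81.0 | 0.00810 | |
| Ca | SP | Contig29475 | 2302 | AB848237.1 Protobothrops flavoviridis | 79.8 | 78.5 | 0.0 | 11.6 | 0.00116 | 0.25 |
| Ca | VEGF | Contig31052 | 4417 | AB851940.1 Protobothrops flavoviridis | 93.8 | 88.7 | 0.0 | 9.0 | 0.00900 | 0.23 |
| Ca | VEGF | Contig52705 | 621 | FJ554641.1 Protobothrops flavoviridis | 99.8 | 89.1 | 0.0 | 1.7 | 0.00017 | |
| Ca | WAP | Contig22160 | 1041 | AB851939.1 Protobothrops flavoviridis | 12.4 | 82.2 | 2.69E-28 | 29.1 | 0.00290 | 0.14 |
| Ca | WAP | Contig47028 | 492 | EU029743.1 Philodryas olfersii | 26.6 | 90.2 | 7.83E-44 | 22.2 | 0.00220 | |
| Cc | KUN | Contig11376 | 2355 | EU012449.1 Austrelaps labialis | 53.4 | 81.6 | 0.0 | 33.4 | 0.00330 | 20.17 |
| Cc | SP | Contig21245 | 2239 | AB848237.1 Protobothrops flavoviridis | 81.6 | 78.1 | 0.0 | 9.2 | 0.00091 | 5.52 |
| Cc | VEGF | Contig9026 | 4058 | AB852009.1 Ovophis okinavensis | 98.5 | 88.8 | 0.0 | 8.0 | 0.00080 | 4.80 |
| Cc | WAP | Contig93 | 604 | AB851939.1 Protobothrops flavoviridis | 19.2 | 83.6 | 6.50E-27 | 115.2 | 0.01150 | 69.50 |
| Tv | CTL | Contig36720 | 668 | GU190822.1 Bungarus flaviceps | 90.0 | 81.3 | 3.48E-170 | 4.4 | 0.00044 | 53.68 |
| Tv | CTL | Contig37483 | 722 | FJ790480.1 Suta nigriceps | 94.2 | 84.3 | 0.0 | 109.9 | 0.01100 | |
| Tv | CTL | Contig37484 | 736 | EF194720.1 Oxyuranus scutellatus | 84.7 | 79.5 | 2.13E-154 | 99.4 | 0.00990 | |
| Tv | CTL | Contig52500 | 559 | EU029700.1 Philodryas olfersii | 85.9 | 82.5 | 1.08E-137 | 106.8 | 0.01070 | |
| Tv | CRISP | Contig8067 | 1214 | A7X4T8.1 Causus rhombeatus | 11.8 | 87.5 | 3.02E-24 | 51.3 | 0.00510 | 8.59 |
| Tv | SVMP | Contig39002 | 1078 | AB851935.1 Protobothrops flavoviridis | 11.9 | 86.7 | 7.02E-36 | 5.8 | 0.00058 | 36.18 |
| Tv | SVMP | Contig40773 | 563 | EF080840.1 Naja atra | 57.4 | 75.4 | 1.66E-59 | 1.2 | 0.00012 | |
| Tv | SVMP | Contig54548 | 2655 | GQ139592.1 Philodryas olfersii | 86.1 | 86.5 | 0.0 | 209.0 | 0.02089 | |
| Tv | WAP | Contig15371 | 348 | A7X4M7.1 Philodryas olfersii | 69.0 | 88.8 | 6.36E-33 | 2.4 | 0.00024 | 1.53 |
| Tv | WAP | Contig42847 | 404 | EU029743.1 Philodryas olfersii | 51.1 | 81.1 | 3.84E-53 | 4.5 | 0.00045 | |
| Tv | WAP | Contig55283 | 235 | EU029742.1 Philodryas olfersii | 100.0 | 91.5 | 7.15E-91 | 2.2 | 0.00022 | |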

Indeed, the antibiotic activity of some snake toxins has been extensively reviewed and their antimicrobial effectiveness has been demonstrated [76,77]. Some of them, like CTL, SVMP and waprin (detected in this study), have subsequently been tested and shown to have potent antibacterial activity [76,78]. It is noteworthy that an SVMP from the Chinese viper Agkistrodon halys proved more active than conventional drugs against some multi-drug resistant human pathogens [79]. Besides, members of the waprin family have shown antimicrobial activity against a variety of bacteria, including Gram-positive bacteria [80] as well as multi-drug resistant strains [81]. Moreover, CTLs are also involved in innate immune responses against potential pathogens, related to their glycan recognition capability [82].

3. Concluding remarks

This work is the first to analyse the transcriptome of the Harderian gland in snake species. As is often the case with pioneering work, this study leaves open more questions than it helped to answer. The scarcity of data on the physiology and ecology of the snakes studied does not help to rationalise the transcriptomic results either. Thus, although our results seem to be compatible with the Harderian gland performing functions related to binding/transport of substances, e.g. pheromones and/or odorants in vomerolfaction, and protection against pathogens in the orbital space and mucosal membranes of the olfactory system, they also suggest that this enigmatic gland may produce toxins. Clearly, more detailed studies, including proteomic analysis of the Duvernoy's gland secretion, are needed. Toxic effects (inflammation, flushing, fever) due to Cubophis cantherigerus' non-lethal bites to humans have been documented [83-86], and predation by the Cuban racer on the Cuban giant anole, Anolis equestris, presumably using venom, has been reported [87]. The Cuban lesser racer, Caraiba andreae, a diurnal ground dwelling species and active forager that inhabits mesophilic [88] typically preys on frogs [60] and lizards [89]. The Caribbean water snake, a nocturnal and aquatic snake found on Cuba, Isla de la Juventud, and Grand Cayman in the Cayman Islands, preys on small fishes and frogs. However, whether these species use venom for predation, and whether the Harderian gland represents an ancillary source of toxins to the snake, deserves detailed future study.

4. Materials and Methods

4.1. Specimen sampling and Harderian gland extraction

The three specimens of colubroids, Caraiba andreae (Cuban Lesser Racer, Ca), Cubophis cantherigerus (Cuban Racer, Cc) and Tretanorhinus variabilis (Cuban Water Snake, Tv) used in this study were sampled from the wild: Ca (adult) was sampled at Pico San Juan, Cienfuegos; Cc (subadult) at Guanahacabibes, Pinar del Río; while Tv (adult) was sampled at Santa Fe, La Habana. Specimens' weight was 44.9 g (Ca), 160.4 g (Cc) and 94.1 g (Tv), and their snout-to-vent lengths were 382 mm (Ca), 537 mm (Cc) and 323 mm (Tv). Specimens were euthanized with sodium pentobarbital and both Harderian glands were removed (Figure 1A). The weight of each gland pair was measured on an analytical balance (model AW-224, Sartorius AG, Göttingen, Germany) and the glands were stored in 2 mL Eppendorf tubes containing RNA stabilizing reagent RNAlater (Thermo Fisher Scientific, Waltham, MA). All were manipulated according to the University of Havana Care, based on the Institutional Animal Care and Use Committees (IACUCs) Guide [90]. Vouchers were deposited under the following field codes at the herpetological collections of the Museo de Historia Natural “Felipe Poey”, University of Havana (Cc and Tv, CHC-222 and CHC-224, respectively) and the Instituto de Ecología y Sistemática (Ca, CHC-225), La Habana, Cuba.

4.2. Harderian gland RNA extraction and Illumina sequencing

Total RNA extraction from the glands was carried out with Qiagen’s RNeasy Mini kit (Venlo, The Netherlands). To this end, both glands of each specimen (Ca: 87.4 mg; Cc: 82.1 mg and Tv: 91.3 mg) were transferred to suitable vessels (Micrewtube® Microcentrifuge Tube with Screw Cap, Simport Scientific) containing 600 μL of homogenization buffer (RLT buffer). Glands were disrupted and homogenized using a Precellys® 24 tissue homogenizer (Bertin Technologies, Montigny le Bretonneux, France). The lysate was centrifuged at 8000× g for 3 min (VWR Micro Star 17R centrifuge, Radnor, Pennsylvania, US) and the supernatants were transferred to new tubes and mixed with 70% ethanol (1:1). Thereafter, 700 μL of the mixture was transferred to an RNeasy spin column, centrifuged for 15 s at 8000× g, and the flow-through discarded. Subsequently, 700 μL of RW1 buffer was added to the spin column and centrifuged for 15 s at 8000× g. Washing steps were performed by addition of 500 μL of RPE buffer and centrifugation under the same conditions. Before elution, a last washing step was performed with 500 μL of RPE buffer; the column was then centrifuged for 2 min at 8000× g and the flow-through discarded. Finally, a total volume of 30 μL of sample was eluted using RNase-free water.

RNA concentration was measured fluorometrically with a Qubit Fluorometer (Invitrogen, Carlsbad, CA, USA) and its integrity was examined by agarose gel electrophoresis. The RNA Integrity Number (RIN), as a criterion of total RNA integrity, was also evaluated with the 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Total RNA extracted from each species (Ca: 11.47 μg, RIN: 8.2; Cc: 11.87 μg, RIN: 8.8 and Tv: 10.89 μg, RIN: 7.6) was used for library preparation at Macrogen, Inc. (Seoul, South Korea) using the TruSeq stranded total RNA library prep with Ribo-Zero Globin kit, and finally sequenced in one lane on the Illumina HiSeq 2000 (Illumina, Inc., San Diego, CA, USA) with 100-base-pair (bp) paired-end (PE) reads.

4.3. Transcriptome assembly and bioinformatics analyses

Illumina data were analysed using an adapted version of the workflow described by Durban et al. [48], which includes available NGS algorithms and in-house scripts. First, the paired-end Illumina raw reads were trimmed and adapters removed using TrimmomaticPE (version 0.35). Identical reads were collapsed using FastX-collapser (FastX Toolkit version 0.0.14), and the resulting reads were considered high-quality and suitable for the downstream assembly process.
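The collapsing step can be pictured with the minimal Python sketch below, which merges identical sequences and records how many copies of each were seen. It only illustrates the idea, assuming reads are plain strings already in memory; it is not the FastX-collapser implementation, and the function name `collapse_reads` and the example reads are hypothetical.

```python
from collections import Counter


def collapse_reads(reads):
    """Merge identical sequences; return (sequence, copy_number) pairs,
    most abundant first."""
    return Counter(reads).most_common()


if __name__ == "__main__":
    # Hypothetical short reads; real input would come from trimmed FASTQ files.
    example = ["ACGTAC", "ACGTAC", "TTGACA", "ACGTAC", "TTGACA", "GGGCAT"]
    for sequence, copies in collapse_reads(example):
        print(sequence, copies)
```

Keeping the copy number alongside each unique sequence means that no abundance information is lost, although expression is estimated later in the workflow by mapping reads back to the assembled contigs.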

For de novo transcriptome assembly, a multi-assembler approach was employed, which combines de Bruijn graph and Overlap-Layout-Consensus (OLC) assemblers into a final data set ready for annotation. To this end, collapsed sequences from the preprocessing step were assembled with Oases (version 0.2.08) [91]. Assembly was done using k-mer values from 23 to 43 with a step value of 2, and the resulting assemblies were merged [92]. Collapsed sequences were also assembled with Trinity (version trinityrnaseq_r20131110) [93], setting a minimum length of 100 nucleotides per contig. Assembled sequences from Trinity and Oases were reassembled with CAP3 (Version Date: 08/06/2013) [94], with a minimum of 98% sequence identity required to overlap nucleotide sequences.
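For readers who want to see which k-mer sizes the sweep above covers, the short Python sketch below enumerates them. The helper name `kmer_series` is an assumption for this example, and the sketch does not run any assembler; it only lists the values from 23 to 43 in steps of 2.

```python
def kmer_series(start=23, stop=43, step=2):
    """Return the k-mer sizes from start to stop (inclusive) in increments of step."""
    return list(range(start, stop + 1, step))


if __name__ == "__main__":
    ks = kmer_series()
    print(ks)                    # the k values used for the individual assemblies
    print(len(ks), "assemblies to merge")
```

With these defaults the series contains only odd values, and each value corresponds to one single-k assembly that is later merged.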

Annotations were carried out in two steps. First, CAP3 contigs and singletons were masked using RepeatMasker (version 4.0.5) [95] (http://www.girinst.org, version 19.08, update of September 2014). Masked sequences were blasted (BLASTALL version 2.2.26) [96] against the non-redundant nucleotide NCBI and Uniref100 databases (update of November 2017), setting a cut-off e-value of e-03. Second, CAP3 contigs were imported into CLC Genomics Workbench (version 8.5.1) (Qiagen, Aarhus, Denmark) and used as references to map raw paired-end reads. Blast hits were concatenated with the CLC-mapped contigs to estimate the expression as RPKM (Reads Per Kilobase per Million mapped reads) [97] and TPM (transcripts per million).
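The e-value cut-off described above can be sketched in Python as a filter over tabular BLAST output. The sketch assumes the standard 12-column tab-separated layout (query, subject, % identity, alignment length, mismatches, gap openings, query start/end, subject start/end, e-value, bit score) and assumes that the best hit per contig is the one with the highest bit score. The function name `best_hits`, the inclusive threshold and the example lines are hypothetical and are not taken from the authors' in-house scripts.

```python
def best_hits(tabular_lines, max_evalue=1e-3):
    """Keep, for each query, the highest-scoring hit whose e-value passes the cut-off.

    Returns a dict: query id -> (subject id, e-value, bit score).
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit is weaker than the cut-off
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


if __name__ == "__main__":
    # Hypothetical hits for two contigs (values invented for illustration).
    rows = [
        ["ContigA", "hit1", "90.6", "413", "39", "0", "1", "413", "10", "422", "3.3e-174", "610"],
        ["ContigA", "hit2", "80.0", "200", "40", "0", "1", "200", "5", "204", "1e-20", "95"],
        ["ContigB", "hit3", "70.0", "50", "15", "0", "1", "50", "1", "50", "0.5", "20"],
    ]
    lines = ["\t".join(row) for row in rows]
    print(best_hits(lines))
```

In this toy input the second contig has no hit that passes the threshold, so it would remain unannotated, which is the situation of the contigs without Blast hits reported in Table 1.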

4.4. Harderian gland toxin characterization

Putative Harderian gland toxin transcripts, which we termed “potential toxins” (see Table 1), were extracted from BLAST hits to GenBank entries within Serpentes and manually filtered using keywords, including the acronyms of all known toxin protein families. For each toxin family, the nucleotide sequences obtained were translated in all six possible reading frames, and the translations with e-values better than e-03 were aligned onto the phylogenetically nearest full-length amino acid sequence, used as a reference, to create a multiple alignment with Cobalt (version 2.0.2, build June 2013) [98]. The total number of full-length or nearly full-length sequences was calculated using the analyze_blastPlus_topHit_coverage.pl script included in the Trinity package. Transcript abundance of a given toxin family was estimated by mapping pre-collapsed reads back to the coding sequences of previously annotated toxin-encoding transcripts. To this end, bowtie2 (version 2.2.4) [99] with the --very-sensitive preset (i.e. in --end-to-end mode) and Bowtie (version 1.0.0) with the --best reporting parameter were used. The relative expression of transcripts with positive Blast hits was calculated as RPKM (Reads Per Kilobase of exon per Million mapped reads) [97]. The relative expression of each toxin protein family (mol%) was calculated as the number of reads assigned to this protein family (Ri), normalized by the length (in nucleotides) of the reference transcript sequence (ntREF) and expressed as the % of total reads in the snake transcriptome (ΣReads): mol% toxin family “i” = %[(Ri/ntREF)/ΣReads].
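As a worked illustration of the mol% expression, the sketch below implements the formula exactly as stated in the text, with the percentage taken as a factor of 100. The function name and all input numbers are hypothetical and are not taken from the study's data.

```python
def mol_percent(reads_family, nt_ref, total_reads):
    """mol% of toxin family i = 100 * (Ri / ntREF) / SumReads, as written in the text."""
    if nt_ref <= 0 or total_reads <= 0:
        raise ValueError("reference length and total reads must be positive")
    return 100.0 * (reads_family / nt_ref) / total_reads


# Hypothetical values: 12,000 reads assigned to a family whose reference
# transcript is 1,500 nt long, in a transcriptome of 40,000,000 reads.
print(mol_percent(12_000, 1_500, 40_000_000))
```

Dividing by the reference length keeps long transcripts from appearing more abundant simply because they attract more reads.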

Supplementary Materials: The following files are available online at…: Figure S1, distribution (RPKM and TPM) of the most expressed transcripts; Table S1, list and statistics of all contigs assembled in the transcriptomic analysis of the Harderian gland of the Cuban colubroid Caraiba andreae; Table S2, list and statistics of all contigs assembled in the transcriptomic analysis of the Harderian gland of the Cuban colubroid Cubophis cantherigerus; Table S3, list and statistics of all contigs assembled in the transcriptomic analysis of the Harderian gland of the Cuban colubroid Tretanorhinus variabilis; Table S4, assembled nucleotide sequences of transcripts coding for the putative canonical toxins shown in Table 2.

Conflicts of Interest: The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the analyses or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Author Contributions: Conceptualization, Dany Domínguez-Pérez, Jordi Durban and Agostinho Antunes; Data curation, Dany Domínguez-Pérez and Jordi Durban; Formal analysis, Dany Domínguez-Pérez, Jordi Durban, Reinaldo Molina Ruiz and Daniela Almeida; Funding acquisition, Agostinho Antunes; Project administration, Agostinho Antunes; Resources, Agostinho Antunes; Software, Reinaldo Molina Ruiz and Daniela Almeida; Supervision, Juan J. Calvete, Vítor Vasconcelos and Agostinho Antunes; Writing – original draft, Dany Domínguez-Pérez, Jordi Durban, Guillermin Agüero-Chapin, Javier Torres Lopez, Juan J. Calvete and Agostinho Antunes; Writing – review & editing, Juan J. Calvete, Vítor Vasconcelos and Agostinho Antunes.

Acknowledgments: DDP was supported by a PhD grant (SFRH/BD/80592/2011) from the Portuguese Foundation for Science and Technology (FCT, Fundação para a Ciência e a Tecnologia, Portugal). GACh was also supported by a Postdoctoral grant (SFRH/BPD/92978/2013) from the FCT. This study was funded in part by the project PTDC/AAG-GLO/6887/2014 and by the Strategic Funding UID/Multi/04423/2013 through national funds provided by FCT and the European Regional Development Fund (ERDF) in the framework of the program PT2020, by the European Structural and Investment Funds (ESIF) through the Competitiveness and Internationalization Operational Program COMPETE 2020 and by National Funds through the FCT under the project PTDC/AAG-GLO/6887/2014 (POCI-01-0124-FEDER-016845), and by the Structured Programs of R&D&I INNOVMAR: Innovation and Sustainability in the Management and Exploitation of Marine Resources (NORTE-01-0145-FEDER-000035, Research Line NOVELMAR), funded by the Northern Regional Operational Program (NORTE2020) through the ERDF. Work performed at the Evolutionary and Translational Venomics Laboratory, Instituto de Biomedicina de Valencia (CSIC) was funded by grant BFU2013-42833-P from the Ministerio de Economía y Competitividad, Madrid, Spain (to JJC). We are grateful to: Tomás Michel Rodríguez Cabrera (Sociedad Cubana de Zoología) and Lázaro Cuellar Yanes (undergraduate Biology student from La Universidad de La Habana, Cuba) for collecting specimens; Bruno Reis (CIIMAR, FCUP, University of Porto) for his help supervising the total RNA extraction; Prof. Alan H. Savitzky (Utah State University, USA) for his useful comments and insights on the Harderian gland; Yudermys Moya Chaviano for helping with the preparation of Figure 1; and Filipe Silva and Emanuel Maldonado (CIIMAR, FCUP, University of Porto) for helping with the analysis of contig expression performed with the CLC Genomics Workbench 8.5.1 and with the preparation of Supplementary Table 1, respectively.

References

1. Payne, A. The Harderian gland: A tercentennial review. J. Anat. 1994, 185, 1-49.

2. McDowell, S.B. Toxicocalamus, a New Guinea genus of snakes of the family Elapidae. J. Zool. 1969, 159, 443-511.

3. Saint Girons, H. In Les glandes céphaliques exocrines des reptiles. I: Données anatomiques et histologiques, Ann. Sci. Nat. Zool., 1988, 221-255.

4. Saint Girons, H. In Les glandes céphaliques exocrines des reptiles. II: Considérations fonctionnelles et évolutives, Ann. Sci. Nat. Zool. 1989, 1-17.

5. Dullemeijer, P. A comparative functional-anatomical study of the heads of some Viperidae. Morph. Jb. 1959, 99, 881-985.

6. Harder, J. Glandula nova lachrymalis una cum ductu excretorio in cervis ed damis. Acta Erudit. Publ. Lipsias, 49-52.

7. Chieffi, G.; Baccari, G.C.; Di Matteo, L.; d'Istria, M.; Minucci, S.; Varriale, B. Cell biology of the Harderian gland. Int. Rev. Cytol. 1996, 168, 1-80.

8. Cowan, F.B.M. The ultrastructure of the lachrymal 'salt' gland and the Harderian gland in the euryhaline Malaclemys and some closely related stenohaline emydines. Can. J. Zool. 1971, 49, 691-697.

9. Saint Girons, H. Histologie comparée des glandes orbitaires des lépidosauriens. Ann. Sci. Nat. Zool. 1982, 4, 171-191.

10. Wetterberg, L.; Geller, E.; Yuwiler, A. Harderian gland: An extraretinal photoreceptor influencing the pineal gland in neonatal rats? Science 1970, 167, 884-885.

11. Reiter, R.J.; Richardson, B.A.; Matthews, S.A.; Lane, S.J.; Ferguson, B.N. Rhythms in immunoreactive melatonin in the retina and Harderian gland of rats: Persistence after pinealectomy. Life Sci. 1983, 32, 1229-1236.

12. Pevet, P.; Heth, G.; Hiam, A.; Nevo, E. Photoperiod perception in the blind mole rat (Spalax ehrenbergi, Nehring): Involvement of the Harderian gland, atrophied eyes, and melatonin. J. Exp. Zool. Part A: Ecol. Genet. Physiol. 1984, 232, 41-50.

13. Keverne, E.B. The vomeronasal organ. Science 1999, 286, 716-720.

14. Silva, L.; Antunes, A. Vomeronasal receptors in vertebrates and the evolution of pheromone detection. Annu Rev Anim Biosci. 2017, 5, 353-370.

15. Halpern, M. Nasal chemical senses in reptiles: Structure and function (Gans, C., Crews, D., eds) Biology of the Reptilia, vol. 18, Brain, Hormones, and Behavior. Chicago/IL: University of Chicago Press, 1992, 423–523.

16. Young, J.A.; Van Lennep, E.W. The morphology of salivary glands. Academic Press, 1978.

17. Walls, G.L. The vertebrate eye and its adaptive radiation. Cranbrook Institute of Science: Oxford, UK, 1942; p xiv, 785.

18. Bellairs, A.A.; Boyd, J. In The lachrymal apparatus in lizards and snakes.—i. The brille, the orbital glands, lachrymal canaliculi and origin of the lachrymal duct, Proc. Zool. Soc. London, Wiley Online Library, 1947, 81-108.

19. Rehorek, S.J. Squamate Harderian gland: An overview. Anatom. Rec. 1997, 248, 301-306.

20. Bentz, E.; Mason, R.T. The role of the Harderian gland in the chemical ecology of garter snakes. Department of Integrative Biology, Oregon State University (OSU); http://masonlab.science.oregonstate.edu/: 2017.

21. Savitzky, A.H. The relationships of the xenodontine colubrid snakes related to Ninia. [M.Sc. thesis]. Lawrence: University of Kansas, 1974.

22. West, G. On the histology of the salivary, buccal, and Harderian glands of the Colubridae, with notes on their tooth-succession and the relationships of the poison-duct. J. Linnean Soc. London, Zoology 1898, 26, 517-526.

23. Smith, M.; Bellairs, A.d.A. The head glands of snakes, with remarks on the evolution of the parotid gland and teeth of the opisthoglypha. J. Linnean Soc. London, Zoology 1947, 41, 351-368.

24. Greene, H.W.; Fogden, M.; Fogden, P. Snakes: The evolution of mystery in nature. Univ. California Press, 2000.

25. Schwarz-Karsten, H. Über die Orbitaldrüsen von Lacerta agilis, Lacerta muralis, Ophisops elegans, Tarentola mauretanica und Tropidonotus natrix. Morphol. Jb. 1937, 80, 248-279.

26. Savitzky, A.H. The origin of the new world proteroglyphous snakes and its bearing on the study of venom delivery systems in snakes. [Ph.D. thesis]. Lawrence: University of Kansas, 1979.

27. McCarthy, C. Monophyly of elapid snakes (Serpentes: Elapidae). An assessment of the evidence. Zool. J. Linnean Soc. 1985, 83, 79-93.

28. Zaher, H.; de Oliveira, L.; Grazziotin, F.G.; Campagner, M.; Jared, C.; Antoniazzi, M.M.; Prudente, A.L. Consuming viscous prey: A novel protein-secreting delivery system in neotropical snail-eating snakes. BMC Evol. Biol. 2014, 14, 58.

29. Pratt, C. In The morphology of the ethmoidal region of Sphenodon and lizards, Proc. Zool. Soc. London, Wiley Online Library 1948, 171-201.

30. Parsons, T.S. The nose and Jacobson's organ. In Biology of the Reptilia (B. C. Gans, ed.), Academic Press, London, Morphology 1970; Vol. 2.

31. Oakley, C.A.; Ameismeier, M.F.; Peng, L.; Weis, V.M.; Grossman, A.R.; Davy, S.K. Symbiosis induces widespread changes in the proteome of the model cnidarian Aiptasia. Cell Microbiol 2016, 18, 1009-1023.

32. Cziesielski, M.J.; Liew, Y.J.; Cui, G.; Schmidt-Roach, S.; Campana, S.; Marondedze, C.; Aranda, M. Multi-omics analysis of thermal stress response in a zooxanthellate cnidarian reveals the importance of associating with thermotolerant symbionts. Proc Biol Sci 2018, 285.

33. Ojeda, P.G.; Ramirez, D.; Alzate-Morales, J.; Caballero, J.; Kaas, Q.; Gonzalez, W. Computational studies of snake venom toxins. Toxins 2017, 10.

34. Tasoulis, T.; Isbister, G.K. A review and database of snake venom proteomes. Toxins 2017, 9.

35. Lomonte, B.; Fernandez, J.; Sanz, L.; Angulo, Y.; Sasa, M.; Gutierrez, J.M.; Calvete, J.J. Venomous snakes of Costa Rica: Biological and medical implications of their venom proteomic profiles analyzed through the strategy of snake venomics. J. Proteomics 2014, 105, 323-339.

36. Calvete, J.J. Snake venomics: From the inventory of toxins to biology. Toxicon 2013, 75, 44-62.

37. Ruder, T.; Sunagar, K.; Undheim, E.A.; Ali, S.A.; Wai, T.C.; Low, D.H.; Jackson, T.N.; King, G.F.; Antunes, A.; Fry, B.G. Molecular phylogeny and evolution of the proteins encoded by coleoid (cuttlefish, octopus, and squid) posterior venom glands. J Mol Evol 2013, 76, 192-204.

38. Fry, B.G.; Undheim, E.A.; Ali, S.A.; Jackson, T.N.; Debono, J.; Scheib, H.; Ruder, T.; Morgenstern, D.; Cadwallader, L.; Whitehead, D.; Nabuurs, R.; van der Weerd, L.; Vidal, N.; Roelants, K.; Hendrikx, I.; Gonzalez, S.P.; Koludarov, I.; Jones, A.; King, G.F.; Antunes, A.; Sunagar, K. Squeezers and leaf-cutters: Differential diversification and degeneration of the venom system in toxicoferan reptiles. Mol Cell Proteomics 2013, 12, 1881-1899.

39. Sunagar, K.; Fry, B.G.; Jackson, T.N.; Casewell, N.R.; Undheim, E.A.; Vidal, N.; Ali, S.A.; King, G.F.; Vasudevan, K.; Vasconcelos, V.M.; Antunes, A. Molecular evolution of vertebrate neurotrophins: Co-option of the highly conserved nerve growth factor gene into the advanced snake venom arsenal. PLoS ONE 2013, 8, e81827.

40. Low, D.H.; Sunagar, K.; Undheim, E.A.; Ali, S.A.; Alagon, A.C.; Ruder, T.; Jackson, T.N.; Gonzalez, S.P.; King, G.F.; Jones, A.; Antunes, A.; Fry, B.G. Dracula's children: Molecular evolution of vampire bat venom. J. Proteomics 2013, 89, 95-111.

41. Pereira, S.R.; Vasconcelos, V.M.; Antunes, A. The phosphoprotein phosphatase family of ser/thr phosphatases as principal targets of naturally occurring toxins. Crit Rev Toxicol 2011, 41, 83-110.

42. Pereira, S.R.; Vasconcelos, V.M.; Antunes, A. Computational study of the covalent bonding of microcystins to cysteine residues--a reaction involved in the inhibition of the ppp family of protein phosphatases. FEBS J 2013, 280, 674-680.

43. da Fonseca, R.R.; Johnson, W.E.; O'Brien, S.J.; Vasconcelos, V.; Antunes, A. Molecular evolution and the role of oxidative stress in the expansion and functional diversification of cytosolic glutathione transferases. BMC Evol Biol 2010, 10, 281.

44. da Fonseca, R.R.; Antunes, A.; Melo, A.; Ramos, M.J. Structural divergence and adaptive evolution in mammalian cytochromes P450 2C. Gene 2007, 387, 58-66.

45. Ducancel, F.; Durban, J.; Verdenaud, M. Transcriptomics and venomics: Implications for medicinal chemistry. Future Med. Chem. 2014, 6, 1629-1643.

46. Brahma, R.K.; McCleary, R.J.; Kini, R.M.; Doley, R. Venom gland transcriptomics for identifying, cataloging, and characterizing venom proteins in snakes. Toxicon 2015, 93, 1-10.

47. Durban, J.; Juárez, P.; Angulo, Y.; Lomonte, B.; Flores-Diaz, M.; Alape-Girón, A.; Sasa, M.; Sanz, L.; Gutiérrez, J.M.; Dopazo, J. Profiling the venom gland transcriptomes of Costa Rican snakes by 454 pyrosequencing. BMC Genomics 2011, 12, 259.

48. Durban, J.; Pérez, A.; Sanz, L.; Gómez, A.; Bonilla, F.; Chacón, D.; Sasa, M.; Angulo, Y.; Gutiérrez, J.M.; Calvete, J.J. Integrated “omics” profiling indicates that miRNAs are modulators of the ontogenetic venom composition shift in the Central American rattlesnake, Crotalus simus simus. BMC Genomics 2013, 14, 234.

49. McGivern, J.J.; Wray, K.P.; Margres, M.J.; Couch, M.E.; Mackessy, S.P.; Rokyta, D.R. RNA-seq and high-definition mass spectrometry reveal the complex and divergent venoms of two rear-fanged colubrid snakes. BMC Genomics 2014, 15, 1061.

50. Rokyta, D.R.; Lemmon, A.R.; Margres, M.J.; Aronow, K. The venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus). BMC Genomics 2012, 13, 312.

51. Rokyta, D.R.; Wray, K.P.; Lemmon, A.R.; Lemmon, E.M.; Caudle, S.B. A high-throughput venom-gland transcriptome for the eastern diamondback rattlesnake (Crotalus adamanteus) and evidence for pervasive positive selection across toxin classes. Toxicon 2011, 57, 657-671.

52. Fry, B.G.; Scheib, H.; Junqueira de Azevedo, I.L.; Silva, D.A.; Casewell, N.R. Novel transcripts in the maxillary venom glands of advanced snakes. Toxicon 2012, 59, 696-708.

53. Ching, A.T.; Rocha, M.M.; Paes Leme, A.F.; Pimenta, D.C.; de Fátima D Furtado, M.; Serrano, S.M.; Ho, P.L.; Junqueira-de-Azevedo, I.L. Some aspects of the venom proteome of the Colubridae snake Philodryas olfersii revealed from a Duvernoy's (venom) gland transcriptome. FEBS Letters 2006, 580, 4417-4422.

54. Junqueira-de-Azevedo, I.L.; Campos, P.F.; Ching, A.T.; Mackessy, S.P. Colubrid venom composition: An -omics perspective. Toxins 2016, 8, 230.

55. Corrêa-Netto, C.; Junqueira-de-Azevedo, I.L.; Silva, D.A.; Ho, P.L.; Leitão-de-Araújo, M.; Alves, M.L.M.; Sanz, L.; Foguel, D.; Zingali, R.B.; Calvete, J.J. Snake venomics and venom gland transcriptomic analysis of Brazilian coral snakes, Micrurus altirostris and M. corallinus. J. Proteomics 2011, 74, 1795-1809.

56. Gonçalves-Machado, L.; Pla, D.; Sanz, L.; Jorge, R.J.B.; Leitão-De-Araújo, M.; Alves, M.L.M.; Alvares, D.J.; De Miranda, J.; Nowatzki, J.; de Morais-Zani, K. Combined venomics, venom gland transcriptomics, bioactivities, and antivenomics of two Bothrops jararaca populations from geographic isolated regions within the Brazilian Atlantic rainforest. J. Proteomics 2016, 135, 73-89.

57. Schwartz, T.S.; Tae, H.; Yang, Y.; Mockaitis, K.; Van Hemert, J.L.; Proulx, S.R.; Choi, J-H.; Bronikowski, A.M. A garter snake transcriptome: Pyrosequencing, de novo assembly, and sex-specific differences. BMC Genomics 2010, 11, 694.

58. Ching, A.T.; Paes Leme, A.F.; Zelanis, A.; Rocha, M.M.; Furtado, M.F.; Silva, D.A.; Trugilho, M.R.; da Rocha, S.L.; Perales, J.; Ho, P.L.; Serrano, S.M.; Junqueira-de-Azevedo, I.L. Venomics profiling of Thamnodynastes strigatus unveils matrix metalloproteinases and other novel proteins recruited to the toxin arsenal of rear-fanged snakes. J. Proteome Res. 2012, 11, 1152-1162.

59. Pla, D.; Petras, D.; Saviola, A.J.; Modahl, C.M.; Sanz, L.; Pérez, A.; Juárez, E.; Frietze, S.; Dorrestein, P.C.; Mackessy, S.P.; Calvete, J.J. Transcriptomics-guided bottom-up and top-down venomics of neonate and adult specimens of the arboreal rear-fanged Brown Treesnake, Boiga irregularis, from Guam. J Proteomics. 2018, 174, 71-84.

60. Henderson, R.W.; Powell, R. Natural history of West Indian reptiles and amphibians. University Press of Florida, 2009.

61. Chang, Z.; Wang, Z.; Li, G. The impacts of read length and transcriptome complexity for de novo assembly: A simulation study. PLoS ONE 2014, 9, e94825.

62. Flower, D.R. The lipocalin protein family: Structure and function. Biochem. J. 1996, 318, 1-14.

63. Grzyb, J.; Latowski, D.; Strzałka, K. Lipocalins–a family portrait. J. Plant Physiol. 2006, 163, 895-915.

64. Miyawaki, A.; Matsushita, F.; Ryo, Y.; Mikoshiba, K. Possible pheromone-carrier function of two lipocalin proteins in the vomeronasal organ. EMBO J. 1994, 13, 5835.

65. Lee, K.-H.; Wells, R.G.; Reed, R.R. Isolation of an olfactory cDNA: Similarity to retinol-binding protein suggests a role in olfaction. Science 1987, 235, 1053-1057.

66. Fry, B.G.; Roelants, K.; Champagne, D.E.; Scheib, H.; Tyndall, J.D.; King, G.F.; Nevalainen, T.J.; Norman, J.A.; Lewis, R.J.; Norton, R.S. The toxicogenomic multiverse: Convergent recruitment of proteins into animal venoms. Ann. Rev. Genomics Hum. Genet. 2009, 10, 483-511.

67. Terrat, Y.; Sunagar, K.; Fry, B.G.; Jackson, T.N.; Scheib, H.; Fourmy, R.; Verdenaud, M.; Blanchet, G.; Antunes, A.; Ducancel, F. Atractaspis aterrima toxins: The first insight into the molecular evolution of venom in side-stabbers. Toxins 2013, 5, 1948-1964.

68. Andrault, J.-B.; Gaillard, I.; Giorgi, D.; Rouquier, S. Expansion of the bpi family by duplication on human chromosome 20: Characterization of the ry gene cluster in 20q11.21 encoding olfactory transporters/antimicrobial-like peptides. Genomics 2003, 82, 172-184.

69. Zhou, Z.-P.; Xia, X.-Y.; Guo, Q.-S.; Xu, C. Bactericidal/permeability-increasing protein originates in both the testis and the epididymis and localizes in mouse spermatozoa. Asian J Androl. 2014, 16, 309.

70. Olender, T.; Keydar, I.; Pinto, J.M.; Tatarskyy, P.; Alkelai, A.; Chien, M.-S.; Fishilevich, S.; Restrepo, D.; Matsunami, H.; Gilad, Y. The human olfactory transcriptome. BMC Genomics 2016, 17, 619.

71. Gewurz, H.; Zhang, X.-H.; Lint, T.F. Structure and function of the pentraxins. Curr. Op. Immunol. 1995, 7, 54-64.

72. Fry, B.G.; Lumsden, N.G.; Wüster, W.; Wickramaratna, J.C.; Hodgson, W.C.; Kini, R.M. Isolation of a neurotoxin (α-colubritoxin) from a nonvenomous colubrid: Evidence for early origin of venom in snakes. J. Mol. Evol 2003, 57, 446-452.

73. Fry, B.G.; Winter, K.; Norman, J.A.; Roelants, K.; Nabuurs, R.J.; Van Osch, M.J.; Teeuwisse, W.M.; Van Der Weerd, L.; Mcnaughtan, J.E.; Kwok, H.F. Functional and structural diversification of the anguimorpha lizard venom system. Mol Cell Proteomics 2010, mcp.M110.001370.

74. Hargreaves, A.D.; Swain, M.T.; Logan, D.W.; Mulley, J.F. Testing the toxicofera: Comparative transcriptomics casts doubt on the single, early evolution of the venom system. Toxicon 2014, 92, 140-156.

75. Hargreaves, A.D.; Swain, M.T.; Hegarty, M.J.; Logan, D.W.; Mulley, J.F. Restriction and recruitment—gene duplication and the origin and evolution of snake venom toxins. Genome Biol Evol. 2014, 6, 2088-2095.

76. de Oliveira Junior, N.G.; Franco, O.L. Snake venoms: Attractive antimicrobial proteinaceous compounds for therapeutic purposes. Cell Mol Life Sci 2013, 70, 4645-4658.

77. de Lima, D.C.; Alvarez Abreu, P.; de Freitas, C.C.; Santos, D.O.; Borges, R.O.; dos Santos, T.C.; Mendes Cabral, L.; Rodrigues, C.R.; Castro, H.C. Snake venom: Any clue for antibiotics and cam? J Evid Based Complementary Altern Med. 2005, 2, 39-47.

78. Rádis-Baptista, G.; Moreno, F.B.M.B.; de Lima Nogueira, L.; Martins, A.M.; de Oliveira Toyama, D.; Toyama, M.H.; Cavada, B.S.; de Azevedo, W.F.; Yamane, T. Crotacetin, a novel snake venom c-type lectin homolog of convulxin, exhibits an unpredictable antimicrobial activity. Cell Biochem Biophys. 2006, 44, 412-423.

79. Samy, R.P.; Gopalakrishnakone, P.; Chow, V.T.; Ho, B. Viper metalloproteinase (Agkistrodon halys pallas) with antimicrobial activity against multi-drug resistant human pathogens. J Cell Physiol 2008, 216, 54-68.

80. Nair, D.G.; Fry, B.G.; Alewood, P.; Kumar, P.P.; Kini, R.M. Antimicrobial activity of omwaprin, a new member of the waprin family of snake venom proteins. Biochem J. 2007, 402, 93-104.

81. Hagiwara, K.; Kikuchi, T.; Endo, Y.; Usui, K.; Takahashi, M.; Shibata, N.; Kusakabe, T.; Xin, H.; Hoshi, S.; Miki, M. Mouse swam1 and swam2 are antibacterial proteins composed of a single whey acidic protein motif. J Immunol 2003, 170, 1973-1979.

82. Drickamer, K. C-type lectin-like domains. Curr Opin Struct Biol. 1999, 9, 585-590.

83. Neill, W.T. Evidence of venom in snakes of the genera Alsophis and Rhadinaea. Copeia 1954, 1954, 59-60.

84. Jaume, M.L.; Garrido, O.H. Notas sobre mordeduras de jubo (Alsophis cantherigerus) Bibron (Reptilia, Serpentes Colubridae) en Cuba. Rev. Cub. Med. Trop 1980, 32, 145-148.

85. Poey, F. Mordedura de un jubo. El Genio Cient. La Habana 1873, 1, 94-8.

86. Minton, S.A. Venomous bites by nonvenomous snakes: an annotated bibliography of colubrid envenomation. J. Wilderness Med. 1990, 1, 119-127.

87. Rodríguez-Cabrera, T.M.; Torres, J.; Romero, R.M.; Podio-Martínez, J. Predation attempt by the Cuban racer, Cubophis cantherigerus (Squamata: Dipsadidae) on the Cuban giant anole, Anolis equestris buidei (Squamata: Dactyloidae), a threatened endemic subspecies. IRCF Reptiles and Amphibians 2016, 23, 46–50.

88. Schwartz, A.; Henderson, R.W. Amphibians and Reptiles of the West Indies: Descriptions, Distributions, and Natural History. Gainesville, University of Florida Press, 1991.

89. Alfonso, Y.U.; Pellicer-López, K.; Armijos-Ojeda, D. Endemic frog predation by the Cuban lesser racer, Caraiba andreae (Squamata: Dipsadidae), on La Melba, Alexander Von Humboldt National Park, eastern Cuba. Herpetol. Notes 2013, 6, 91-93.

90. Council, N.R. Guide for the care and use of laboratory animals. National Academies Press, 2010.

91. Schulz, M.H.; Zerbino, D.R.; Vingron, M.; Birney, E. Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformat. 2012, 28, 1086-1092.

92. Bregge-Silva, C.; Nonato, M.C.; de Albuquerque, S.; Ho, P.L.; Junqueira de Azevedo, I.L.; Diniz, M.R.V.; Lomonte, B.; Rucavado, A.; Díaz, C.; Gutiérrez, J.M. Isolation and biochemical, functional and structural characterization of a novel L-amino acid oxidase from Lachesis muta snake venom. Toxicon 2012, 60, 1263-1276.

93. Grabherr, M.G.; Haas, B.J.; Yassour, M.; Levin, J.Z.; Thompson, D.A.; Amit, I.; Adiconis, X.; Fan, L.; Raychowdhury, R.; Zeng, Q. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nature Biotech. 2011, 29, 644-652.

94. Huang, X.; Madan, A. Cap3: A DNA sequence assembly program. Genome Res. 1999, 9, 868-877.

95. Smit, A.; Hubley, R.; Green, P. Repeatmasker Open-4.0, Institute for Systems Biology. http://repeatmasker.org 2015.

96. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403-410.

97. Mortazavi, A.; Williams, B.A.; McCue, K.; Schaeffer, L.; Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-seq. Nature Meth. 2008, 5, 621-628.

98. Papadopoulos, J.S.; Agarwala, R. Cobalt: Constraint-based alignment tool for multiple protein sequences. Bioinformat. 2007, 23, 1073-1079.

99. Langmead, B.; Trapnell, C.; Pop, M.; Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10, R25.


Table 1. Summary statistics for de novo assembly of the Harderian gland transcriptomes of the three Cuban colubrids: Caraiba andreae (Ca), Cubophis cantherigerus (Cc) and Tretanorhinus variabilis (Tv).

Statistic                                     Ca                      Cc                      Tv

RAW DATA
Read Count                                    54,593,038              50,762,546              56,115,370
Total Bases                                   5,513,896,838           5,127,017,146           5,667,652,370
GC(%)                                         51.78                   48.8                    52
AT(%)                                         48.22                   51.2                    47.74
Q20(%)1                                       94.35                   95.27                   95.03
Q30(%)2                                       87.59                   89.49                   90.07

CAP3_assembly_stats
[...]                                         63.333                  27.666                  62.324
[...]                                         11.879                  9.693                   14.494
[...]                                         402                     393                     636
[...]                                         33                      35                      48
[...] 25000 bp)                               2                       2                       2
[...]                                         0                       0                       0
Largest contig                                29.43                   29.155                  28.545
Total length all contigs                      46,223,254              29,666,985              52,264,819
GC (%)                                        42.87                   42.65                   43.9
N503                                          1092                    1602                    1337
L504                                          10.568                  5175                    10.078
N's per 100 kbp5                              0.46                    0.02                    0.24

CLC mapped reads
imported references (contigs)                 63.333                  27.666                  62.324
number of reads (paired)                      42,997,058 (78.76%)     41,732,724 (82.21%)     45,884,816 (81.77%)
reads mapped in pairs                         32,702,780 (76.06%)     28,594,296 (68.52%)     30,750,932 (67.0%)
reads mapped in broken pairs                  4,529,254 (10.53%)      6,655,635 (15.95%)      5,139,839 (11.20%)
reads not mapped                              5,765,024 (13.41%)      6,482,793 (15.53%)      9,994,045 (21.78%)

Blast Positive hits
total of positive hits                        45,258 (71.46%)         23,370 (84.47%)         36,982 (59.33%)
potential toxins                              1632                    781                     1299
non-characterized within potential toxins     162                     90                      218

1 Q20(%): % of bases with phred score > 20

2 Q30(%): % of bases with phred score > 30

3 N50: contig length such that contigs of this length or longer account for at least half of the total assembly length

4 L50: number of contigs whose length is N50 or longer

5 N's per 100 kbp: nucleotide ambiguity per 100 kbp
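The N50 and L50 statistics reported in Table 1 can be computed from a list of contig lengths as sketched below. The sketch follows the standard definitions used by common assembly-evaluation tools; the function name and the example lengths are hypothetical and are not the study's contigs.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50: length of the contig at which the cumulative length of contigs,
         sorted from longest to shortest, first reaches half of the total.
    L50: number of contigs needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, rank
    raise ValueError("at least one contig length is required")


# Hypothetical contig lengths (nt): total 30, half 15.
print(n50_l50([10, 8, 6, 4, 2]))  # (8, 2)
```

Because N50 is weighted by length, a few long contigs can raise it substantially, so it is best read together with the contig count and total assembly length.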


Table 2. Putative canonical toxin-coding transcripts expressed in the Harderian gland transcriptomes of the Cuban colubroids, Caraiba andreae (Ca), Cubophis cantherigerus (Cc) and Tretanorhinus variabilis (Tv). C-NP, C-type natriuretic peptide precursor; CTL, C-type lectin-like; CRISP, cysteine-rich secretory protein; Hya, hyaluronidase; KUN, Kunitz-type serine protease inhibitor-like; SVMP, snake venom metalloproteinase; SP, serine protease; VEGF, vascular endothelial growth factor; WAP, waprin. Nucleotide sequences of assembled putative toxin-coding contigs are listed in Supplementary Table S4.

Species; Toxin family; Contig; Contig size (bp); Best NCBI hit; % Query coverage; % ID; e-value; RPKM; % transcriptome; % Toxin family

Ca C-NP Contig16590 413 DQ912656.1 Philodryas olfersii 100 90.6 3.32e-174 132.2 0.0132 2.95

Contig17753 875 67.31 87.0 0

KM527183.1 Philodryas 1.76e- Contig613 442 41.61 91.9 33.2 chamissonis 70 0.0033 CTL 1.97 DQ912658.1 Philodryas Contig17228 765 90.0 0 55.4 olfersii 86.54 0.0055

Contig16917 1681 DQ912659.1 Philodryas 49.49 89.9 0 1046.4 0.074 Contig59349 2116 olfersii 61.77 91.0 0

DQ139897.1 Philodryas 7.53e- Contig19342 847 85.3 3.3 olfersii 62.46 180 0.00033 CRISP 23.94 1.47e- Contig48801 964 EU938339.1 Naja kaouthia 93.7 15.5 6.54 18 0.0015

AY093955.1 Rhabdophis 1.07e- Contig17831 1635 96.4 9.0 tigrinus 3.36 16 0.0009

DQ840262.1 Echis Contig46174 1032 88.1 0 1.5 carinatus sochureki 67.7 0.00015 HYA 0.18 Contig46175 1812 AB851978.1 Ovophis 91.19 89.0 0 0.00054 6.5 Contig46176 1048 okinavensis 67.14 85.9 0 0.00011

EU012449.1 Austrelaps Contig17117 2876 82.7 0 27.8 labialis 40.4 0.0027 KUN 0.74 DQ464286.1 Sistrurus 2.94e- Contig42573 529 73.5 5.6 catenatus edwardsi 90.04 81 0.00056

GQ139592.1 Philodryas 1.32e- SVMP Contig324 1459 45.99 79.0 1103.2 0.110 68.58 olfersii 179

Contig16906 1673 47.4 87.1 0

2.13e- Contig21872 328 99.7 88.5 118

Contig44916 1139 88.5 92.8 0

Contig44919 2158 75.95 89.7 0

Contig45587 433 100 92.8 0

Contig464 813 96.68 76.9 0

8.94e- Contig63240 255 100 92.5 103

1.25e- Contig936 268 100 77.5 50

Contig16965 2628 85.77 91.1 0

Contig17004 1177 100 88.1 0 KM527180.1 Philodryas Contig44917 1578 43.42 93.0 0 1746.3 0.175 chamissonis Contig44918 1479 88.78 93.2 0

Contig49335 871 50.98 93.7 0

2.74e- Contig61177 440 47.73 89.5 74

2.96e- Contig46524 370 EF080840.1 Naja atra 88.8 81.0 88.11 117 0.0081

AB848237.1 Protobothrops SP Contig29475 2302 78.5 0 11.6 0.25 flavoviridis 79.84 0.00116

VEGF Contig31052 4417 AB851940.1 Protobothrops flavoviridis 93.77 88.7 0 9.0 0.0090 0.23

FJ554641.1 Protobothrops Contig52705 621 89.1 0 1.7 flavoviridis 99.84 0.00017

AB851939.1 Protobothrops 2.69e- Contig22160 1041 82.2 29.1 flavoviridis 12.39 28 0.0029 WAP 0.14 EU029743.1 Philodryas 7.83e- Contig47028 492 90.2 22.2 olfersii 26.61 44 0.0022

EU012449.1 Austrelaps KUN Contig11376 2355 81.6 0 33.4 20.17 labialis 53.39 0.0033

AB848237.1 Protobothrops SP Contig21245 2239 78.1 0 9.2 5.52 flavoviridis 81.64 0.00091 Cc AB852009.1 Ovophis VEGF Contig9026 4058 88.8 0 8.0 4.80 okinavensis 98.45 0.00080

AB851939.1 Protobothrops 6.50e- WAP Contig93 604 83.6 115.2 69.5 flavoviridis 19.21 27 0.0115

GU190822.1 Bungarus 3.48e- Contig36720 668 81.3 4.4 flaviceps 89.97 170 0.00044 Contig37483 722 FJ790480.1 Suta nigriceps 94.2 84.3 0 109.9 0.0110 Tv CTL 53.68 EF194720.1 Oxyuranus 2.13e- Contig37484 736 79.5 99.4 scutellatus 84.73 154 0.0099

Contig52500 559 EU029700.1 Philodryas olfersii 85.87 82.5 1.08e-137 106.8 0.0107

A7X4T8.1 Causus 3.02e- CRISP Contig8067 1214 87.5 51.3 8.59 rhombeatus 11.82 24 0.0051

AB851935.1 Protobothrops 7.02e- Contig39002 1078 86.7 5.8 flavoviridis 11.87 36 0.00058

1.66e- SVMP Contig40773 563 EF080840.1 Naja atra 75.4 1.2 36.18 57.37 59 0.000124

GQ139592.1 Philodryas Contig54548 2655 86.5 0 209.0 olfersii 86.1 0.02089

A7X4M7.1 Philodryas 6.36e- Contig15371 348 68.97 88.8 2.4 olfersii 33 0.00024

EU029743.1 Philodryas 3.84e- WAP Contig42847 404 81.1 4.5 1.53 olfersii 51.08 53 0.00045

EU029742.1 Philodryas 7.15e- Contig55283 235 91.5 2.2 olfersii 100 91 0.00022


Highlights:

• first transcriptomic analysis of a reptile Harderian gland

• transcriptomes of the Harderian glands revealed for three non-front-fanged colubroid snakes from Cuba

• most abundant transcripts encode putative transport/binding, lipocalin/lipocalin-like, and bactericidal/permeability-increasing-like proteins

• transcripts coding for putative canonical toxins described in venomous snakes were also identified

• transcriptional profile suggests a more complex function than previously recognized for the Harderian gland