marine drugs

Article Diversity of Conopeptides and Their Precursor Genes of Litteratus

Xinjia Li, Wanyi Chen, Dongting Zhangsun * and Sulan Luo *

Key Laboratory of Tropical Biological Resources of Ministry of Education, Key Laboratory for Marine Drugs of Haikou, School of Life and Pharmaceutical Sciences, Hainan University, Haikou 570228, China; [email protected] (X.L.); [email protected] (W.C.) * Correspondence: [email protected] (D.Z.); [email protected] (S.L.)

 Received: 14 August 2020; Accepted: 10 September 2020; Published: 14 September 2020 

Abstract: The venom of various Conus is composed of a rich variety of unique bioactive peptides, commonly referred to as conotoxins (conopeptides). Most conopeptides have specific receptors or ion channels as physiologically relevant targets. In this paper, high-throughput transcriptome sequencing was performed to analyze putative conotoxin transcripts from the venom duct of a vermivorous species, Conus litteratus native to the South China Sea. A total of 128 putative conotoxins were identified, most of them belonging to 22 known superfamilies, with 43 conotoxins being regarded as belonging to new superfamilies. Notably, the M superfamily was the most abundant in conotoxins among the known superfamilies. A total of 15 known cysteine frameworks were also described. The largest proportion of cysteine frameworks were VI/VII (C-C-CC-C-C), IX (C-C-C-C-C-C) and XIV (C-C-C-C). In addition, five novel cysteine patterns were also discovered. Simple sequence repeat detection results showed that di-nucleotide was the major type of repetition, and the codon usage bias results indicated that the codon usage bias of the conotoxin genes was weak, but the M, O1, O2 superfamilies differed in codon preference. Gene cloning indicated that there was no intron in conotoxins of the B1- or J superfamily, one intron with 1273–1339 bp existed in a mature region of the F superfamily, which is different from the previously reported gene structure of conotoxins from other superfamilies. This study will enhance our understanding of conotoxin diversity, and the new conotoxins discovered in this paper will provide more potential candidates for the development of pharmacological probes and marine peptide drugs.

Keywords: conotoxin; diversity; superfamily; cysteine framework; gene structure

1. Introduction The genus Conus is a large family of gastropods belonging to mollusks. There are about 800 species of Conus worldwide, which are distributed in tropical seas [1,2]. According to their diet habits, cone snails can be divided into three groups, vermivorous (V), molluscivorous (M), and piscivorous (P) species [3–5]. Although they move slowly, cone snails can prey on creatures with quick movement by skillfully injecting a small amount of a complex cocktail containing potent venom peptides. These venom peptides are commonly named conotoxins or conopeptides, and are a mixture of different bioactive compounds for defense and preying. These peptides, with different pharmacological activities, are composed of a small number of amino-acid residues that are especially rich in cysteines, and have become one of the main sources of peptide medicine [6–8]. With structural stability, target specificity and high affinity, conotoxins are able to skillfully target different ion channels and receptors, displaying massive potential in the study of peptide drugs and probes [9–12]. In 1978, one conotoxin, called myotoxin, from Conus geographus venom was purified and characterized for the first time. Since then, researchers have been engaged in the systematic study

Mar. Drugs 2020, 18, 464; doi:10.3390/md18090464 www.mdpi.com/journal/marinedrugs Mar. Drugs 2020, 18, 464 2 of 17

Mar. Drugs 2020, 18, x 2 of 18 of conotoxins [7,13]. As is well known, ω-MVIIA (ziconotide) has already approved by the FDA forsystematic clinical study application of conotoxins in the treatment [7,13]. As ofis well severe known, pain ω‐ [14MVIIA–16]. Generally, (ziconotide) conopeptide has already precursors approved possessby the FDA a conserved for clinical topological application organization, in the treatment which of issevere composed pain [14–16]. of three Generally, regions, including conopeptide an N-terminusprecursors possess signal regiona conserved (including topological about organization, 20 hydrophobic which amino is composed acids and of highly three regions, conserved including in the samean N‐terminus gene superfamily), signal region an intervening(including about propeptide 20 hydrophobic region, and amino a mature acids toxinand highly region conserved (hypervariable in the region)same gene [17 ,superfamily),18]. an intervening propeptide region, and a mature toxin region (hypervariable region)So [17,18]. far, there are three main classification methods for conotoxins. According to the similarity of the signalSo far, peptide there are region, three various main classification conotoxins are methods mainly for divided conotoxins. into 29 According gene superfamilies, to the similarity i.e., A, B1, of B2,the B3,signal and peptide C, etc. Thereregion, are various also 15 conotoxins temporary are gene mainly superfamilies divided into identified 29 gene in superfamilies, early divergent i.e., clade A, species.B1, B2, B3, For and conotoxins C, etc. There rich inare cysteines, also 15 temporary according gene to the superfamilies latest update identified of ConoServer in early on divergent Monday, 31clade August species. 2020, Forthere conotoxins are 31 kinds rich of in confirmed cysteines, cysteineaccording frameworks, to the latest designated update of as ConoServer I, II, III, IV, etc.on OnMonday, the basis 31 ofAugust pharmacological 2020, there activityare 31 kinds and target, of confirmed 12 pharmacological cysteine frameworks, families of designated conotoxins as (α -,I, γII,-, δIII,-, εIV,-, ι-, etc.κ-, µOn-, ρ -,theσ-, basisτ-, χ- of and pharmacologicalω-families) have activity been defined and target, [19–21 12]. pharmacological families of conotoxinsPrevious (α‐ studies, γ‐, δ‐, ε‐ have, ι‐, shown κ‐, μ‐, ρ‐ that, σ‐ each, τ‐, χ‐ Conusand ω‐speciesfamilies) contains have thousands been defined of conopeptides, [19–21]. which meansPrevious that there studies may have be more shown than that 1,000,000 each Conus natural species conotoxins contains thousands in cone snails of conopeptides, (about 800 species) which allmeans over that the there world may [22 be–25 more]. According than 1,000,000 to the natural latest conotoxins ConoServer in cone statistics, snails less(about than 800 only species) 1% all of conopeptideover the world sequences [22–25]. According have been to reported. the latest Fewer ConoServer than 300 statistics, conopeptides less than have only been 1% of characterized conopeptide pharmacologicallysequences have been [21]; therefore,reported. a largeFewer portion than of300 conotoxins conopeptides still remain have tobeen be discovered characterized and characterized.pharmacologically Nowadays, [21]; therefore, conotoxins a large are widelyportion used of conotoxins in the research still remain and development to be discovered of marine and peptidecharacterized. drugs, Nowadays, providing conotoxins a new ligand are toolwidely for used the study in the of research receptors, and ion development channels, etc.of marine [9–12]. Traditionalpeptide drugs, methods, providing including a new isolation ligand tool and for Sanger the study sequencing, of receptors, are generally ion channels, considered etc. [9–12]. to be time-consuming,Traditional methods, of lower including efficiency, isolation and and often Sanger limited sequencing, by sample are availability. generally considered to be time‐ consuming,With the of development lower efficiency, of high-throughput and often limited sequencing by sample platforms, availability. the combination of transcriptomic or proteomicWith the sequencing development with bioinformaticalof high‐throughput screening sequencing is expected platforms, to accelerate the ligandcombination and target of discoverytranscriptomic [26–28 or]. Inproteomic the present sequencing work, the with Hiseq bioinformatical Xten sequencing screening approach is wasexpected applied to to accelerate uncover theligand venom and ducttarget transcriptome discovery [26–28]. of a worm-hunting In the present species work, ofthe cone Hiseq snail, XtenC. sequencing litteratus, of approach different sizes was (Smallapplied 6–7cm, to uncover Middle the 8–10cm, venom Big duct 10–12cm; transcriptome Figure1). Weof furthera worm characterized‐hunting species its conotoxin of cone diversitysnail, C. oflitteratus cysteine, of framework, different sizes pharmacological (Small 6–7cm, activity Middle and gene8–10cm, superfamily. Big 10–12cm; The simple Figure sequence 1). We repeatsfurther (SSRs),characterized codon usageits conotoxin bias, and diversity gene structure of cysteine of the conotoxinsframework, were pharmacological also analyzed. activity This research and gene will supportsuperfamily. novel The bioactive simple peptide sequence discovery repeats and (SSRs), genomic codon feature usage studies bias, and for thegene exploitation structure ofof thisthe speciesconotoxins in the were future. also analyzed. This research will support novel bioactive peptide discovery and genomic feature studies for the exploitation of this species in the future.

Figure 1. TheThe shell shell and and venom venom duct duct of of C.C. litteratus litteratus. (.(A)A Small) Small conus conus (SC), (SC), (B) (middleB) middle conus conus (MC), (MC), (C) (bigC) bigconus conus (BC), (BC), (D) ( VenomD) Venom duct. duct.

2. Results

2.1. High‐Throughput Sequencing and De Novo Assembly of Conus Litteratus Transcriptome Venom transcriptome libraries of small (SC, Figure 1A), middle (MC, Figure 1B) and big (BC, Figure 1C) individuals of Conus litteratus were constructed and sequenced. Raw reads of the

Mar. Drugs 2020, 18, 464 3 of 17

2. Results

2.1. High-Throughput Sequencing and De Novo Assembly of Conus Litteratus Transcriptome Venom transcriptome libraries of small (SC, Figure1A), middle (MC, Figure1B) and big (BC, Figure1C) individuals of Conus litteratus were constructed and sequenced. Raw reads of the transcriptome sequencing were deposited in the NCBI Sequence Read Archive (SRA) database with a project ID PRJNA586870 including SRX7667250–SRX7667252. After removing adaptor sequences, ambiguous nucleotides and low-quality sequences, de novo assembly was performed using the Trinity software. The final assembly of each transcriptome contained 44,097 unigenes for SC, 37,861 unigenes for MC, and 45,554 unigenes for BC, with mean lengths of 440 bp, 301 bp and 356 bp, respectively. A summary of the three transcriptome assemblies is presented in Table1.

Table 1. Summary of the transcriptome assemblies.

SC MC BC Total raw reads 43,964,576 39,694,336 37,786,706 Total raw nucleotides (nt) 6,594,686,400 5,954,150,400 5,668,005,900 Total clean reads 39,777,162 36,169,722 34,326,494 Total clean nucleotides (nt) 5,234,397,520 4,766,618,449 4,656,292,329 Number 44,097 37,861 45,554 Total length (bp) 19,428,193 11,408,032 16,204,862 Unigenes Max length (bp) 10,772 10,772 10,772 Mean length (bp) 440 301 356 N50 (bp) 486 278 349

2.2. Conotoxin Diversity of C. Litteratus with Respect to Superfamily The putative conopeptide sequences from C. litteratus transcriptome were predicted by Conosorter and Blastp search against a local reference database of known conopeptides from the ConoDB databases, which were subsequently examined using the ConoPrec tool. After removal of duplications and truncated mature region sequences, a total of 128 putative conopeptides were obtained from three libraries (SC, MC, BC) (Figure2A, S1 and S2). A phylogenetic tree of the signal sequences from identified conotoxins of C. litteratus was also constructed to investigate the relationships among these conopeptides (Figure2B). The distance among signal sequences was measured on a global alignment performed by ClustalX2 [29]. As a result, most of these predicted conotoxins could be classified into 22 known superfamilies. In addition, 43 sequences belonged to new superfamilies, and 18 that were not assigned to any superfamily were taken as Unknown family. Moreover, M superfamily accounted for the highest proportion of the known superfamily. Of all these conotoxins, 25 have been reported in the ConoServer database (S3). All these sequences have at least one amino acid (aa) difference between them in the mature regions. Most of these identified conopeptides were full-length or nearly full-length. Herein, conopeptides from superfamilies E, F, H, Q, Divergent_MSTLGMVLL, Hormone superfamily, and Conodipine are first reported in C. litteratus. Furthermore, several rare superfamilies of conotoxins (fewer than 20 in the ConoServer database) were found in this research. They can be summarized as follows (Figure3). There were three precursors from the C superfamily (4), one precursor from the E superfamily (3), two precursors from the F superfamily (3), two precursors from the H superfamily (11) (S10-Figure S7), one precursor from the I3 superfamily (14), and three precursors from the P superfamily (16) (S10-Figure S8). The above numbers in parentheses refer to the conotoxin number of the superfamily reported in ConoServer. Mar. Drugs 2020, 18, x 4 of 18

Mar. Drugs 2020, 18, x 4 of 18 Mar. Drugs 2020, 18, 464 4 of 17

Figure 2. Summary of conopeptides from C. litteratus. (A) Total superfamilies of conopeptides identified in this paper. (B) Clustering analysis of signal sequences found in C. litteratus transcriptome and identification of gene superfamilies.

Herein, conopeptides from superfamilies E, F, H, Q, Divergent_MSTLGMVLL, Hormone superfamily, and Conodipine are first reported in C. litteratus. Furthermore, several rare superfamilies of conotoxins (fewer than 20 in the ConoServer database) were found in this research. They can be summarized as follows (Figure 3). There were three precursors from the C superfamily (4), one precursor from the E superfamily (3), two precursors from the F superfamily (3), two precursors Figure 2. Summary of conopeptides from C. litteratus.(A) Total superfamilies of conopeptides identified fromFigure the H 2.superfamily Summary of(11) conopeptides (S10‐Figure from S7), C.one litteratus precursor. (A )from Total the superfamilies I3 superfamily of conopeptides (14), and three in this paper. (B) Clustering analysis of signal sequences found in C. litteratus transcriptome and precursorsidentified from in this the paper. P superfamily (B) Clustering (16) analysis (S10‐Figure of signal S8). sequences The above found numbers in C. litteratus in parentheses transcriptome refer to identification of gene superfamilies. the conotoxinand identification number of of gene the superfamilies. superfamily reported in ConoServer.

Herein, conopeptides from superfamilies E, F, H, Q, Divergent_MSTLGMVLL, Hormone superfamily, and Conodipine are first reported in C. litteratus. Furthermore, several rare superfamilies of conotoxins (fewer than 20 in the ConoServer database) were found in this research. They can be summarized as follows (Figure 3). There were three precursors from the C superfamily (4), one precursor from the E superfamily (3), two precursors from the F superfamily (3), two precursors from the H superfamily (11) (S10‐Figure S7), one precursor from the I3 superfamily (14), and three precursors from the P superfamily (16) (S10‐Figure S8). The above numbers in parentheses refer to the conotoxin number of the superfamily reported in ConoServer.

Figure 3. The rare superfamilies of conotoxins identified from C. litteratus. The signal regions Figure 3. The rare superfamilies of conotoxins identified from C. litteratus. The signal regions are are underlined, the mature sequence are framed with blue rectangles (not the sequence with blue underlined, the mature regions are framed with blue rectangles, and cysteine residues are marked background), and cysteine residues are marked with orange rectangles (not the letters with orange with orange rectangles. background). Interestingly, eight conopeptides discovered in this study possessed very similar sequences, as Interestingly, eight conopeptides discovered in this study possessed very similar sequences, previously reported in other cone snail species (Table 2). More specifically, the T superfamily has two as previously reported in other cone snail species (Table2). More specifically, the T superfamily has two members (Lt015 and Lt044); the remaining six were classified into the A, O1, O2, O3, L and M superfamilies. Among these similar conopeptides, two sequences come from the vermivorous C. leopardus and four derive from C. eburneus. Surprisingly, there also exist two sequences (Lt019 and Lt048)Figure identified 3. The from rare thesuperfamilies piscivorous of conotoxinsC. geographus identifiedand C. from striatus C. litteratus, which. were The signal completely regions di arefferent with theunderlined, vermivorous the matureC. litteratus regionsin are diet. framed with blue rectangles, and cysteine residues are marked with orange rectangles.

Interestingly, eight conopeptides discovered in this study possessed very similar sequences, as previously reported in other cone snail species (Table 2). More specifically, the T superfamily has two

Mar. Drugs 2020, 18, 464 5 of 17

Table 2. Homologous sequences reported from other Conus species.

Cystine Homologous Cono-Toxins Super-Family Source Diet Pattern Sequences Lt015 T V Eu5.4 C. eburneus M Lt019 O2 1 S-S contryphan-G C. geographus P Lt020 A I Lt1.2/Lp1.1 C. leopardus V Lt034 L XIV Eu14.7 C. eburneus M Lt038 O1 VI/VII Eu6.8 C. eburneus M Lt044 T V Lt5g/Lp5.2 C. leopardus V Lt048 M III S3-Y01 C. striatus P Lt079 O3 VI/VII Eu6.6 C. eburneus M P: piscivorous; M: molluscivorous; V: vermivorous.

2.3. Conotoxin Diversity of C. Litteratus with Respect to Cysteine Framework The cysteine patterns of 128 conotoxins of C. litteratus were investigated (Figure4 and S1). Among them, 30 conotoxins were identified with no disulfide bond, including from superfamilies B1, H, M, O1, O2, O3; 14 sequences contained only one disulfide bond, including from superfamilies C, O2, F; 7 peptides contained an odd number of cysteine residues, including Conkunitzin. There were seven conotoxins that possessed unnamed cysteine frameworks with the following five novel patterns: Lt001 with the framework of (CC-C-C-C-CC-C-C-C-C-C), Lt004 and Lt127 with the framework of (C-C-C-C-C-C-C-C-C-C-C-C), Lt018 with the framework of (C-C-C-C-C-CC-C-C-C-C-C), Lt050 with the frameworkMar. of Drugs (CC-C-C-C-C-C-C), 2020, 18, x and Lt108 with the framework of (CC-C-C-C-C-C-CC-C-C-C-C-C).6 of 18

Figure 4.FigureSummary 4. Summary of di offf differenterent cystine cystine patterns of ofconopeptides conopeptides of C. litteratus of C. litteratus. .

There wereFPKM 83 sequences(fragments per with kilobase 15 known of transcript cysteine per million frameworks, mapped reads) which values can were be brieflycalculated depicted as follows. Ato total represent of 15 the conotoxins expression level with of framework each conopeptide. VI/VII Here, (C-C-CC-C-C) the top 30 conopeptides accounted with for the the highest biggest share, includingFPKM from values the superfamilies were also selected H, for I3, the O1, analysis O2, O3,of expression Q. There level were (Figure 10 5, conotoxins Table3). In the with top 30 framework conopeptides (Table 3), M, O1, O2 and T superfamilies were more common than other superfamilies, IX (C-C-C-C-C-C)and there also including existed many from conotoxins superfamily that fall P into and new Conkunitzin; superfamilies. 9 Framework with framework VI/VII (C‐C XIV‐CC‐ (C-C-C-C) including fromC‐C) was superfamilies the most abundant J, L, and Cys pattern. Conkunitzin; The results 6 with revealed framework that huge difference XV (C-C-CC-C-C-C-C), of the FPKM value including from superfamiliesexisted in the O2, top Divergent_MSTLGMVLL; 30 conotoxins. Actually, there was 5 withan over framework five hundred III‐fold (CC-C-C-CC) variation between including the from superfamiliesfirst and M; the 5 30th. with More framework importantly, V big (CC-CC), differences including in the expression from level superfamily were even present T. Four within conotoxins the same superfamily. had framework XXII (C-C-C-C-C-C-C-C) and VIII (C-C-C-C-C-C-C-C-C-C), three had framework I (CC-C-C) and XXI (CC-C-C-C-CC-C-C-C), two had framework XI (C-C-CC-CC-C-C). There was only one conotoxin discovered for framework IV (CC-C-C-C-C), XVI (C-C-CC), XX (C-CC-C-CC-C-C-C-C) and XXVII (C-CC-C-C-C), respectively. FPKM (fragments per kilobase of transcript per million mapped reads) values were calculated to represent the expression level of each conopeptide. Here, the top 30 conopeptides with the highest FPKM values were also selected for the analysis of expression level (Figure5, Table3). In the top 30 conopeptides (Table3), M, O1, O2 and T superfamilies were more common than other superfamilies, and there

Figure 5. FPKM ranking of top 30 conopeptides in each group.

Mar. Drugs 2020, 18, x 6 of 18

Figure 4. Summary of different cystine patterns of conopeptides of C. litteratus. Mar. Drugs 2020, 18, 464 6 of 17 FPKM (fragments per kilobase of transcript per million mapped reads) values were calculated to represent the expression level of each conopeptide. Here, the top 30 conopeptides with the highest FPKM values were also selected for the analysis of expression level (Figure 5, Table3). In the top 30 also existedconopeptides many conotoxins (Table 3), M, that O1, fall O2 and into T superfamilies new superfamilies. were more Frameworkcommon than other VI/VII superfamilies, (C-C-CC-C-C) was the most abundantand there also Cys existed pattern. many Theconotoxins results that revealed fall into new that superfamilies. huge difference Framework of the VI/VII FPKM (C‐C‐ valueCC‐ existed in the topC 30‐C) conotoxins. was the most abundant Actually, Cys there pattern. was The anresults over revealed five hundred-fold that huge difference variation of the FPKM between value the first and the 30th.existed More in the importantly, top 30 conotoxins. big Actually, differences there inwas the an expressionover five hundred level‐fold were variation even between present the within the first and the 30th. More importantly, big differences in the expression level were even present within same superfamily.the same superfamily.

Figure 5. FPKM ranking of top 30 conopeptides in each group. Figure 5. FPKM ranking of top 30 conopeptides in each group. Table 3. The top 30 conotoxins (FPKM ranking) and their gene superfamilies and Cys frame of

each dataset.

Cys Cys Cys Rank SC Super-Family MC Super-Family BC Super-Family Frame Frame Frame 1 Lt044 TV Lt044 TV Lt020 AI 2 Lt049 Nsf No S-S Lt020 AI Lt070 Nsf No S-S 3 Lt033 L XIV Lt057 F 1 S-S Lt072 TV 4 Lt072 TV Lt049 Nsf No S-S Lt118 F 1 S-S 5 Lt070 Nsf No S-S Lt072 TV Lt040 O1 VI/VII 6 Lt118 F 1 S-S Lt070 Nsf No S-S Lt049 Nsf No S-S 7 Lt020 AI Lt033 L XIV Lt033 L XIV 8 Lt036 O1 No S-S Lt048 M III Lt046 M III 9 Lt040 O1 VI/VII Lt040 O1 VI/VII Lt117 Nsf 1 S-S 10 Lt097 P IX Lt037 O1 VI/VII Lt044 TV 11 Lt015 TV Lt076 J XIV Lt035 O1 No S-S 12 Lt076 J XIV Lt013 O2 VI/VII Lt074 Nsf No S-S 13 Lt037 O1 VI/VII Lt015 TV Lt061 Nsf 1 S-S 14 Lt110 # XXI Lt039 O1 VI/VII Lt120 J XIV 15 Lt022 AI Lt067 M XVI Lt097 P IX 16 Lt003 Nsf unnamed1 Lt023 Nsf VI/VII Lt067 M XVI 17 Lt108 # unnamed2 Lt084 M No S-S Lt030 Q VI/VII 18 Lt013 O2 VI/VII Lt019 O2 1 S-S Lt047 M III 19 Lt016 Nsf No S-S Lt054 Nsf XXII Lt014 Nsf (C-C-C) 20 Lt019 O2 1 S-S Lt080 Nsf VI/VII Lt050 Nsf unnamed3 21 Lt048 M III Lt036 O1 No S-S Lt015 TV 22 Lt030 Q VI/VII Lt055 Nsf VIII Lt023 Nsf VI/VII 23 Lt038 O1 VI/VII Lt108 # unnamed2 Lt041 O1 VI/VII 24 Lt012 O2 XV Lt016 Nsf No S-S Lt109 # XXI 25 Lt067 M XVI Lt006 Nsf IX Lt045 unknown III 26 Lt082 Nsf No S-S Lt029 O1 VI/VII Lt019 O2 1 S-S 27 Lt085 M No S-S Lt012 O2 XV Lt012 O2 XV 28 Lt028 P IX Lt014 Nsf (C-C-C) Lt083 M No S-S 29 Lt050 Nsf unnamed3 Lt081 Nsf No S-S Lt108 # unnamed2 30 Lt034 L XIV Lt056 Nsf VIII Lt013 O2 VI/VII #: Con-ikot-ikot; unnamed 1: C-C-C-C-C-CC-C; unnamed 2: CC-C-C-C-C-C-CC-C-C-C-C-C; unnamed 3: CC-C-C-C-C-C-C. Mar. Drugs 2020, 18, 464 7 of 17

Among the top 30 conotoxin sequences with the highest FPKM values, there were eight individual-specific conopeptides for SC-top30, eight for MC-top30 and eleven for BC-top30, respectively. There were 13 peptides shared by all three groups—SC-top30, MC-top30 and BC-top30. Additionally, five peptides were shared by SC-top30 and MC-top30; four peptides were shared by SC and BC. OnlyMar. Drugs two 2020 peptides, 18, x were common to both MC-top30 and BC-top30 (Figure6). There8 of 18 were also 13 common sequences among the top 30 conopeptides of the three datasets (Figure6), in which Lt012, Table 3 displays the expression levels of the top 30 conotoxins discovered from C. litteratus of Lt013, Lt019different belonged sizes native to the to O2 South superfamily; China Sea. Although Lt015, Lt044, we mixed Lt072 the RNA belonged of five tovenom the ducts T superfamily; evenly, Lt020, Lt033, Lt040,there Lt067 may still and be Lt108 differences belonged on account to the of the A, fact L, O1, that Mdifferent and Con-ikot-ikot geographic and living superfamilies, environments respectively. However,may Lt049 result and in Lt070 higher wereintraspecific not assigned venom peptide to any diversity. known The family. content The of the FPKM venom rank duct may of these also common conotoxinsbe was different variable between in individuals different groups.of the same size from different geographic environments and living conditions.

Mar. Drugs 2020, 18, x 8 of 18

Table 3 displays the expression levels of the top 30 conotoxins discovered from C. litteratus of different sizes native to South China Sea. Although we mixed the RNA of five venom ducts evenly, there may still be differences on account of the fact that different geographic and living environments may result in higher intraspecific venom peptide diversity. The content of the venom duct may also be different between individuals of the same size from different geographic environments and living conditions.

Figure 6. Venn diagram of the top 30 conotoxins sequences from the three datasets. Figure 6. Venn diagram of the top 30 conotoxins sequences from the three datasets.

Table2.4.3 displays Comparison the of Conopeptides expression in the levels Three of Venom the Duct top Transcriptomes 30 conotoxins of Different discovered Size C. Litteratus from C. litteratus of different sizes native to South China Sea. Although we mixed the RNA of five venom ducts evenly, A total of 88, 74 and 75 putative conopeptides were identified from the SC, MC and BC C. there maylitteratus still be digroups,fferences respectively, on account which of theshowed fact thatdifferent diff erentindividual geographic size. Their and livingcomparative environments may resultdistribution in higher of intraspecificconopeptides is summarized venom peptide in the Venn diversity. diagrams The of Figure content 7A (S4). of theThere venom were 12 duct may also be diffpeptideserent between shared by SC individuals and MC. Nine of peptides the same were size common from to di bothfferent MC and geographic BC. Six peptides environments were and living conditions.shared by SC and BC. There were 41 peptides that were common among the three groups (Figure 7A). There were 22 conopeptides classified into 13 known superfamilies (Figure 7B). There were 15 peptides belonging to new superfamilies. Of all the known superfamilies, M, O2 and T possessed 2.4. Comparison of Conopeptides in the Three Venom Duct Transcriptomes of Di erent Size C. Litteratus three conopeptides, respectively (Figure 7B). ff Figure 6. Venn diagram of the top 30 conotoxins sequences from the three datasets. A total of 88, 74 and 75 putative conopeptides were identified from the SC, MC and BC C. litteratus groups, respectively,2.4. Comparison which of Conopeptides showed in the di Threefferent Venom individual Duct Transcriptomes size. Theirof Different comparative Size C. Litteratus distribution of conopeptidesA is total summarized of 88, 74 and in 75 theputative Venn conopeptides diagrams were of Figureidentified7A from (Supplementary the SC, MC and BC Material C. S4). There werelitteratus 12 peptides groups, sharedrespectively, by SC which and MC.showed Nine different peptides individual were size. common Their tocomparative both MC and BC. Six peptidesdistribution were shared of conopeptides by SC and is summarized BC. There in were the Venn 41 peptides diagrams of that Figure were 7A common (S4). There amongwere 12 the three peptides shared by SC and MC. Nine peptides were common to both MC and BC. Six peptides were groups (Figureshared7 A).by SC There and BC. were There 22 were conopeptides 41 peptides that classified were common into 13 among known the three superfamilies groups (Figure (Figure 7B). There were7A). 15 There peptides were belonging22 conopeptides to new classified superfamilies. into 13 known Of superfamilies all the known (Figure superfamilies, 7B). There were M, 15 O2 and T possessedpeptides three conopeptides, belonging to new respectively superfamilies. (Figure Of all the7B). known superfamilies, M, O2 and T possessed three conopeptides, respectively (Figure 7B).

Figure 7. (A) Venn diagram of conopeptide transcripts from different datasets of SC, MC and BC C. litteratus. (B) Conopeptides shared by the three different C. litteratus groups.

2.5. Characterization of Microsatellites and Codon Usage Bias in C. Litteratus Groups of Different Size The distribution and characteristics of SSR (simple sequence repeat) sequences were analyzed to lay a foundation for the development of SSR primers of C. litteratus (Table 4). A total of 9523, 3524 and 3777 SSRs were detected in SC, MC, BC transcriptomes, respectively. Di‐nucleotide was the major repeat type, including 8009/2444/2355 di‐nucleotide repeats in SC/MC/BC, respectively, which accounted for 84.10%, 69.35% and 62.35%, respectively. The most frequent di‐nucleotide repeated

Figure 7. (FigureA) Venn 7. (A) diagram Venn diagram of conopeptide of conopeptide transcripts from from different diff erentdatasets datasets of SC, MC of and SC, BC MC C. and BC C. litteratuslitteratus.(B) Conopeptides. (B) Conopeptides shared shared by by the the three three different different C. C.litteratus litteratus groupsgroups..

2.5. Characterization of Microsatellites and Codon Usage Bias in C. Litteratus Groups of Different Size The distribution and characteristics of SSR (simple sequence repeat) sequences were analyzed to lay a foundation for the development of SSR primers of C. litteratus (Table 4). A total of 9523, 3524 and 3777 SSRs were detected in SC, MC, BC transcriptomes, respectively. Di‐nucleotide was the major repeat type, including 8009/2444/2355 di‐nucleotide repeats in SC/MC/BC, respectively, which accounted for 84.10%, 69.35% and 62.35%, respectively. The most frequent di‐nucleotide repeated

Mar. Drugs 2020, 18, 464 8 of 17

2.5. Characterization of Microsatellites and Codon Usage Bias in C. Litteratus Groups of Different Size The distribution and characteristics of SSR (simple sequence repeat) sequences were analyzed to lay a foundation for the development of SSR primers of C. litteratus (Table4). A total of 9523, 3524 and Mar. Drugs 2020, 18, x 9 of 18 3777 SSRs were detected in SC, MC, BC transcriptomes, respectively. Di-nucleotide was the major repeat type,motifs including were TG/CA, 8009/2444 with/2355 29.91% di-nucleotide of SC, 21.74% repeats of inMC SC and/MC 20.41%/BC, respectively, of BC, as well which as accounted GT/AC, with for 84.10%,28.22% 69.35%of SC, and20.29% 62.35%, of MC respectively. and 19.49% The of most BC (Figure frequent 8). di-nucleotide Tri‐nucleotide repeated repeated motifs motifs were for TG /eachCA, withgroup 29.91% were ranked of SC, 21.74% second, of for MC which and 20.41%the percentages of BC, as were well 12.87% as GT/AC, for SC, with 23.52% 28.22% for of MC SC, and 20.29% 31.67% of MCfor BC. and The 19.49% proportion of BC (Figure of hexa8).‐ Tri-nucleotidenucleotide repeated repeated motifs motifs was for minimal, each group and were was rankedcalculated second, to be foronly which 0.00%, the 0.03% percentages and 0.16% were for 12.87% the SC, for MC SC, and 23.52% BC groups, for MC respectively. and 31.67% for BC. The proportion of hexa-nucleotide repeated motifs was minimal, and was calculated to be only 0.00%, 0.03% and 0.16% for the SC, MCTable and 4. BC The groups, distribution respectively. and characteristics of various SSR sequences in C. litteratus.

Table 4. The distribution and characteristicsProportion of of various total SSRSSR sequences Frequency in C. litteratus of occurrence. Quantity Primitive types (%) (%) QuantityS/M/B Proportion of Total SSR (%) Frequency of Occurrence (%) Primitive Types S/M/B S/M/B S/M/B S/M/B S/M/B dinucleotide 8009/2444/2355 84.10/69.35/62.35 18.16/6.45/5.17 dinucleotide 8009/2444/2355 84.10/69.35/62.35 18.16/6.45/5.17 trinucleotide 1226/829/1196 12.87/23.52/31.67 2.78/2.19/2.63 trinucleotide 1226/829/1196 12.87/23.52/31.67 2.78/2.19/2.63 tetranucleotidetetranucleotide 282282/242/213/242/213 2.96/6.87/5.64 2.96/6.87/5.64 0.64/0.64/0.47 0.64/0.64/0.47 pentanucleotidepentanucleotide 6/6/8/78/7 0.06/0.23/0.19 0.06/0.23/0.19 0.01/0.02/0.02 0.01/0.02/0.02 hexanucleotidehexanucleotide 0/0/1/61/6 0.00/0.03/0.16 0.00/0.03/0.16 0.00/0.00/0.01 0.00/0.00/0.01 totaltotal 95239523/3524/3777/3524/3777 100/100/100 100/100/100 21.60/9.31/8.29 21.60/9.31/8.29

Figure 8. The proportion of different di-nucleotides in different groups. Figure 8. The proportion of different di‐nucleotides in different groups. To understand the codon usage preference of conotoxin genes of C. litteratus, CodonW and The SequenceTo understand Manipulation the codon Suite (SMS usage online, preference http: of//www.bio-soft.net conotoxin genes /ofsms C./ )litteratus were used, CodonW to analyze and theThe codonSequence usage Manipulation bias. As a result Suite (S6), (SMS the online, effective http://www.bio number of codons‐soft.net/sms/) (ENC) of were conotoxin used to genes analyze ranged the fromcodon 32 usage to 61, withbias. anAs average a result value (S6), the of 52.11. effective It was number indicated of codons that codon (ENC) usage of conotoxin bias of conotoxin genes ranged genes offromC. litteratus 32 to 61,was with weak. an average Furthermore, value of the 52.11. codon It adaptationwas indicated index that (CAI) codon ranged usage from bias 0.128 of conotoxin to 0.409, fargenes less of than C. litteratus 1.0, which was also weak. showed Furthermore, its weak codon the codon usage adaptation bias. Analysis index of (CAI) the relative ranged synonymousfrom 0.128 to codon0.409, usagefar less (RSCU) than of1.0, conotoxin which precursorsalso showed showed its weak that 32codon codons usage with bias. RSCU Analysis> 1 existed of inthe conotoxin relative precursorsynonymous genes codon (Supplementary usage (RSCU) Material, of conotoxin S7). In theseprecursors preference showed codons, that CTG32 codons encoding with leucine RSCU and > 1 AGAexisted encoding in conotoxin of arginine precursor possessed genes (Supplementary the top 2 RSCU valuesMaterial, with S7). 2.20 In these and 1.57, preference respectively, codons, which CTG wasencoding much leucine higher thanand AGA other encoding codons encoding of arginine Leu possessed and Arg. the However, top 2 RSCU RSCU values values with of 2.20 all the and other 1.57, codonsrespectively, were lesswhich than was 1.40, much which higher further than reflected other codons that the encoding conotoxin Leu codon and Arg. usage However, bias was RSCU weak overall.values of Moreover, all the other conotoxin codons codons were less encoding than 1.40, Cys which are preferential further reflected to TGC, that with the an conotoxin RSCU of codon 1.15, ratherusage thanbias was TGT, weak with overall. an RSCU Moreover, of 0.85. conotoxin codons encoding Cys are preferential to TGC, with an RSCURSCU of analysis 1.15, rather was than performed TGT, with using an theRSCU top of 30 0.85. conopeptides with the highest RPKM values fromRSCU each transcriptome, analysis was performed including M, using O1 andthe top O2 superfamily30 conopeptides peptides with (Supplementarythe highest RPKM Material, values S7).from According each transcriptome, to the results, including there is noM, obviousO1 and diO2ff erencesuperfamily in RSCU peptides between (Supplementary the top 30 and Material, the total S7). According to the results, there is no obvious difference in RSCU between the top 30 and the total conotoxins. However, in most amino acids there were some differences in the RSCU values among the M, O1 and O2 superfamilies. As shown in Figure 9, when encoding leucine, the RSCU value of CTT in the O2 superfamily was 1.24, which was much higher than in th eM and O1 superfamilies, with an RSCU value of 0.56. In addition, CTC was more frequently used in the O1 superfamily than in the M and O2 superfamilies. The variation was more distinct when encoding arginine; the RSCU of AGA in the M superfamily was much less than in O1 and O2 superfamilies. RSCU of CGG in O2

Mar. Drugs 2020, 18, 464 9 of 17 conotoxins. However, in most amino acids there were some differences in the RSCU values among the M, O1 and O2 superfamilies. As shown in Figure9, when encoding leucine, the RSCU value of CTT in the O2 superfamily was 1.24, which was much higher than in th eM and O1 superfamilies, with an RSCU value of 0.56. In addition, CTC was more frequently used in the O1 superfamily than in the M and O2 superfamilies. The variation was more distinct when encoding arginine; the RSCU of AGA in theMar. M Drugs superfamily 2020, 18, x was much less than in O1 and O2 superfamilies. RSCU of CGG in O2 superfamily10 of 18 was >1, while it was lower than 0.5 in M and O1 superfamilies. The RSCU value of CGT in the M superfamily was >1, while it was lower than 0.5 in M and O1 superfamilies. The RSCU value of CGT superfamily was much higher than in the O1 and O2 superfamilies. For cystine codons, M and O1 in the M superfamily was much higher than in the O1 and O2 superfamilies. For cystine codons, M superfamilies preferred TGC, while the O2 superfamily tended to use TGT. and O1 superfamilies preferred TGC, while the O2 superfamily tended to use TGT.

Figure 9. RSCU value of three amino acids. (A) RSCU value of leucine. (B) RSCU value of cystine. (FigureC) RSCU 9. RSCU value ofvalue arginine. of three amino acids. (A) RSCU value of leucine. (B) RSCU value of cystine. (C) RSCU value of arginine. 2.6. Conotoxin Gene Cloning from C. Litteratus 2.6. ConotoxinTo explore Gene the gene Cloning structure from C. for Litteratus partial superfamily conopeptides, specific primers were designed accordingTo explore to cDNA the sequences gene structure of the B, for J and partial F superfamily superfamily conotoxins conopeptides, from C. litteratusspecific transcriptome.primers were Finally,designed there according were 19 to genes cDNA of B,sequences J and F superfamily of the B, J and conotoxins F superfamily cloned conotoxins from C. litteratus from genomeC. litteratus by PCR.transcriptome. Three genes Finally, belonged there to thewere B1 19 superfamily, genes of B, eight J and genes F superfamily belonged to theconotoxins J superfamily, cloned and from eight C. geneslitteratus belonged genome to by the PCR. F superfamily Three genes (S8, belonged S9). Furthermore, to the B1 superfamily, no intron existed eight ingenes the genes belonged of B1 to and the JJ superfamilysuperfamily, conotoxins. and eight genes A new belonged intron with to the 1273-1339 F superfamily bp was discovered(S8, S9). Furthermore, in the mature no region intron of existed the F superfamilyin the genes conotoxinsof B1 and J insuperfamily this work by conotoxins. comparing A and new matching intron with their 1273 cDNA‐1339 sequences bp was discovered (S9). in the mature region of the F superfamily conotoxins in this work by comparing and matching their 3. Discussion cDNA sequences (S9). Nowadays, high-throughput sequencing combined with bioinformatics analysis has become a very e3.ff ectiveDiscussion method for discovering and predicting novel conotoxins from various Conus species [22]. Here, next-generationNowadays, sequencing high‐throughput was applied sequencing for the combined venom duct with transcriptomes bioinformatics of analysis three di hasfferent-sized become a C. litteratus very effectivegroups. method After for removal discovering of duplications and predicting from novel the three conotoxins transcriptome from various sequencing Conus datasets, species 128[22]. unique Here, putative next‐generation conopeptides sequencing were discovered, was applied which for were the mainlyvenom classifiedduct transcriptomes into 22 superfamilies of three withdifferent various‐sized cysteine C. litteratus frameworks. groups. Interestingly, After removal although of duplications some conopeptide from the transcripts three transcriptome from small, C. litteratus mediumsequencing and datasets, big size 128 uniqueindividuals putative wereconopeptides different from were each discovered, other, their which venom were components mainly stillclassified displayed into a25 certain superfamilies overlap with (Figure various7). There cysteine were frameworks. 41 conopeptide Interestingly, transcripts although shared by some all individualsconopeptide from transcripts the three from groups. small, medium Different and individuals big size C. of litteratus the same individuals Conus species were different may produce from dieachfferent other, conopeptides their venom due components to intraspecific still displayed genetic variation, a certain which overlap increase (Figure conotoxin 7). There diversity were 41 drasticallyconopeptide [26 transcripts]. shared by all individuals from the three groups. Different individuals of the α sameThe Conus-conotoxins species may from produce the A different superfamily conopeptides target various due to intraspecific subtypes of genetic nicotinic variation, acetylcholine which receptorsincrease conotoxin [30–32], which diversity have drastically a wide distribution [26]. in cone snails. Surprisingly, only three precursors of α -conotoxinsThe α‐conotoxins Lt020, Lt021 from and the Lt022 A superfamily with type target I cysteine various framework subtypes of of CC-C-C nicotinic were acetylcholine identified α inreceptorsC. litteratus [30–32],transcriptomes, which have a wide which distribution belong to in the cone4 /snails.7 subfamily Surprisingly, with CC-X4-C-X7-C only three precursors mode. α Actually,of α‐conotoxins Lt020 is Lt020,-conotoxin Lt021 and LtIA, Lt022 which with was type discovered I cysteine by framework our lab previously, of CC‐C‐C and were can identified specifically in α β α blockC. litteratus the 3 transcriptomes,2 nAChRs subtype which [ 33belong]. However, to the α some4/7 subfamily-conotoxins with CC found‐X4‐ inC‐C.X7 litteratus‐C mode.that Actually, have beenLt020 reported is α‐conotoxin in ConoServer LtIA, which or other was discovered references wereby our not lab found previously, in the transcriptome and can specifically sequencing block inthe this α3β study,2 nAChRs such subtype as Lt1.46 [33]. precursor However, and some Lt1C α‐ precursorconotoxins [21 found]. This in phenomenon C. litteratus that also have exists been in α otherreported conus, in ConoServer that is, the genomeor other genesreferences of -conotoxinswere not found cannot in the be foundtranscriptome in the transcriptome sequencing in data. this study, such as Lt1.46 precursor and Lt1C precursor [21]. This phenomenon also exists in other conus, that is, the genome genes of α‐conotoxins cannot be found in the transcriptome data. This means that most of time, many α‐conotoxin precursor genes are not transcribed and remain silent. Conotoxin gene expression depends on the prey species, the environment, different development stage, etc., which results in more complex genetic diversity for conotoxin production. M superfamily peptides appear to be ubiquitous in Conus venom, and several conotoxins of this superfamily are known to be antagonists or blockers of voltage‐gated sodium channels, voltage‐gated K+ channels, or neuromuscular nAChRs [17,34]. The largest number of conotoxins in C. litteratus was

Mar. Drugs 2020, 18, 464 10 of 17

This means that most of time, many α-conotoxin precursor genes are not transcribed and remain silent. Conotoxin gene expression depends on the prey species, the environment, different development stage, etc., which results in more complex genetic diversity for conotoxin production. M superfamily peptides appear to be ubiquitous in Conus venom, and several conotoxins of this superfamily are known to be antagonists or blockers of voltage-gated sodium channels, voltage-gated K+ channels, or neuromuscular nAChRs [17,34]. The largest number of conotoxins in C. litteratus was found to be in the M superfamily in this study. There were 9 M superfamily mature peptides with various cysteine frameworks. Among them, four peptides—M-1: Lt046, Lt047, Lt048; and M-2: Lt58—had a cysteine framework of type III of CC-C-C-CC; one peptide, Lt067, showed a type XVI of C-C-CC; and one peptide, Lt086, had a type XXXII of C-CC-C-C-C framework. However, there were three peptides—Lt083, Lt084, Lt085—without disulfide bonds at all. In addition, it is known that LtIIIA is the same as the Lt046 conotoxin found in C. literatus herein, which exhibits the ability to enhance tetrodotoxin-sensitive sodium currents. LtIIIA is an agonist of voltage-gated sodium channels with no delayed inactivation belonging to the iota pharmacological family [35]. Currently, ω-conotoxin MVIIA (ziconotide) from the O1 superfamily is still the only marketed conotoxin of neuronal calcium channel blocker for treating severe chronic pain [16]. Here, seven genes of the O1 superfamily conotoxin precursors were discovered from C. litteratus transcriptomes. There were five O1 superfamily mature peptides with VI/VII cysteine frameworks of C-C-CC-C-C, which was the same as ω-conotoxin MVIIA. Interestingly, two O1 superfamily mature peptides possessed no disulfide bonds. However, there was only one peptide, Lt040, that was shared by all three transcriptomic datasets, and it was ranked top10 of FPKM in every group. T and O2 superfamily peptides were the most frequent and were quite common among the top 30 expressed conotoxins in the three different-sized C. litteratus transcriptomes. This suggested their important function for C. litteratus. Recent studies suggest that some T superfamily peptides target various subtypes of GPCRs, ion channels or neurotransmitter transporters. Some O2 superfamily peptides are thought to target pacemaker channels in molluscan neurons [17]. Three T superfamily peptides, Lt015, Lt044 and Lt072, contained the type V cysteine framework, CC-CC. Three O2 superfamily conopeptides were found from C. litteratus, of which Lt012 had the XV framework of C-C-CC-C-C-C-C, Lt013 contained the framework VI/VII of C-C-CC-C-C, and Lt019 had only one disulfide bond. All three O2 superfamily peptides ranked in the top 30. Interestingly, Lt033 belonged to the L superfamily, with a type XIV cysteine framework of C-C-C-C, and was ranked in the top 10 of FPKM in each group. Conotoxins have become a very important source for the development of marine peptide drugs, which target various receptors, ion channels, etc., with high selectivity, such as nicotinic acetylcholine receptors, sodium channels, potassium channel, calcium channel, and other relevant receptors, transporters, etc. The targets of conotoxins are involved in many human diseases, including pain, addiction, depression, autoimmune disease, cancers, etc. [8,36–39]. Therefore, these conotoxins from C. litteratus may target different receptors. Here, common conopeptides with known pharmacological activity shared by the top 30 in the three groups are listed in Table5. α-Conotoxin Lt020 (LtIA), a member of the α4/7 subfamily, showed high levels of expression with the average value of FPKM being ranked first in the three C. litteratus groups (Table5), and has been characterized as a potent antagonist of α3β2 nAChR subtype in our lab before [33]. Lt033 (LtXIVA) belongs to the L superfamily, with FPKM ranked 6th, and has been identified as an antinociceptive agent that targets the nAChR receptor [40]. Lt072 (LtVd), the 4th-ranked FPKM, can block tetrodotoxin-sensitive (TTX-S) Na+ channels [41]. Additionally, Lt040 (LtVIC) can block both tetrodotoxin-sensitive (TTX-S) and -resistant (TTX-R) Na+ channels [42]. The expression levels of these conotoxins are in the forefront and are shared among the three groups, which indicates that these conotoxins have much higher concentrations in the mixture of C. litteratus venom than other peptides. Therefore, it can easily be concluded that conotoxins of different pharmacological families in venom may have a synergistic effect in capturing prey and in defense processes. Moreover, α-conotoxins as Mar. Drugs 2020, 18, 464 11 of 17 nAChRs inhibitors and µ-conotoxins, as sodium channel inhibitors, have shown great potential to modulate pain pathways [8]; therefore, these conotoxins are promising candidates or lead compounds forMar. the Drugs development 2020, 18, x of novel marine drugs as pain therapeutics. Furthermore, we have begun to12 study of 18 the pharmacological function of other new conotoxins discovered in this study. Table 5. Common conopeptides with known pharmacology activity shared by the top 30 in three Tablegroups. 5. Common conopeptides with known pharmacology activity shared by the top 30 in three groups.

RankingRanking of FPKMof FPKMCystine Cystine Super‐ Reported‐ AverageAverage value Value of of Conotoxins Super-Family Reported-Activity SCSC MCMC BC BC Patternpattern family activity FPKMFPKM rank Rank Lt020 7 2 1 I A α* [33] 3.3 Lt020 7 2 1 I A α [33] 3.3 Lt072Lt072 44 55 33 VV TT μ* [41]µ [ 41]4.0 4.0 Lt033Lt033 33 77 77 XIVXIV LL α* [40]α [ 40]5.7 5.7 Lt040Lt040 99 99 55 VI/VIIVI/VII O1O1 μ* [42]µ [ 42]7.7 7.7 Most eukaryotic genomes have a great number of different types of simple tandem repeats knownMost as microsatellite eukaryotic genomes DNA or have simple a greatsequence number repeats. of diThefferent intronic types microsatellites of simple tandem may affect repeats gene knowntranscription as microsatellite and mRNA DNA splicing or simple[43]. A previous sequence report repeats. showed The that intronic a limited microsatellites set of 100 conopeptide may affect geneprecursor transcription genes could and generate mRNA thousands splicing [ 43of]. conopeptides A previous in report a single showed species thatof cone a limited snail. Variable set of 100peptide conopeptide processing precursor was expected genes could to contribute generate to thousands the evolution of conopeptides and diversity in of a singleConus speciesvenoms, of which cone snail.includes Variable alternative peptide cleavage processing sites, was expectedheterogeneous to contribute post‐translational to the evolution modifications, and diversity and of Conushighly venoms,variable which N‐ and includes C‐terminal alternative truncations cleavage [24]. sites, It has heterogeneous been reported post-translational that GT/AG is modifications, the main site and for highlyrecognition variable and N- splicing and C-terminal of pre‐ truncationsmRNA in [eukaryotes24]. It has been [44,45]. reported The thatjunctions GT/AG in is conotoxin the main site genes for recognitionbetween exon and and splicing intron of pre-mRNAare typical indonor eukaryotes and acceptor [44,45]. splice The junctions sites that in follow conotoxin the GT/AG genes between rule. In exonthis andpaper, intron SSR are analysis typical of donor partial and C. acceptor litteratus splice genes sites encoding that follow conotoxins the GT/ AGshowed rule. that In this the paper, most SSRabundant analysis type of partial was diC.‐nucleotide litteratus genes repeats, encoding of which conotoxins TG/CA, GT/AC, showed AG/CT, that the mostGA/TC abundant were the type top wasfour di-nucleotidedi‐nucleotide repeats, repeats of (Figure which 8). TG These/CA, GT simple/AC, AG repeats/CT, GA may/TC provide were the more top fouracceptor di-nucleotide and donor repeats splice (Figuresites for8). alternative These simple mRNA repeats splicing. may provide Hence, more it may acceptor be one and of donorthe reasons splice sitesfor the for alternativediversity of mRNAconotoxins, splicing. which Hence, could itproduce may be multiple one of the mRNA reasons and for protein the diversity isoforms. of Therefore, conotoxins, it whichmay result could in produceproteome multiple diversity, mRNA in andwhich protein there isoforms. are a lot Therefore, of different it may peptides result inwith proteome related, diversity, distinct in or which even thereopposing are a functions. lot of different peptides with related, distinct or even opposing functions. ThereThere were were 87, 87, 74 74 and and 74 74 conotoxins conotoxins belonging belonging to to 21, 21, 20, 20, and and 20 20 superfamilies superfamilies found found in in the the small,small, medium,medium, and and big big size sizeC. litteratus C. litteratustranscriptomes transcriptomes (SC, MC, (SC, BC), MC, respectively BC), respectively (Figure 10). (Figure The comparison 10). The ofcomparison results revealed of results that revealed the number that of the shared number conopeptides of shared conopeptides between each between pair of groups each pair ranged of groups from 47ranged to 53. from There 47 were to 53. several There individual-specific were several individual conopeptides‐specific for conopeptides the three cone for snail the groups three (Figurecone snail1): 29groups for small-size (Figure 1): individuals 29 for small (Figure‐size individuals1A), 12 for (Figure medium-size 1A), 12 individualsfor medium (Figure‐size individuals1B) and 19 (Figure for big-size1B) and individuals 19 for big‐size (Figure individuals1C), respectively (Figure 1C), (Figure respectively7A). C. litteratus(Figure 7A).venom C. litteratus peptides venom still displayed peptides significantstill displayed intraspecific significant variation intraspecific between divariationfferent-size between individuals, different which‐size is consistent individuals, with previouswhich is studyconsistent [46–49 with]. previous study [46–49].

Figure 10. Number of conotoxins and superfamilies discovered in different size C. litteratus transcriptomes. Figure 10. Number of conotoxins and superfamilies discovered in different size C. litteratus Theretranscriptomes. were eight conotoxin sequences discovered from C. litteratus venom transcriptome here that had been reported in other Conus species previously (Table2, S10-Figure S1). This means Conus speciesThere from were the sameeight cladeconotoxin or di ffsequenceserent clades discovered may generate from C. a fewlitteratus similar venom or even transcriptome complete same here that had been reported in other Conus species previously (Table 2, S10‐Figure S1). This means Conus species from the same clade or different clades may generate a few similar or even complete same conopeptides, although their genetic relationships may not be close to C. litteratus (S10‐Figure S2) [5]. This suggests that these conotoxins in Table 2 may be of slow evolution and conservative between different clades.

Mar. Drugs 2020, 18, 464 12 of 17 conopeptides, although their genetic relationships may not be close to C. litteratus (S10-Figure S2) [5].

ThisMar. Drugs suggests 2020, 18 that, x these conotoxins in Table2 may be of slow evolution and conservative between13 of 18 different clades. Conotoxin precursors precursors of of the the same same superfamily superfamily tend tend to be to conservative be conservative in the in signal the signal region, region, while whiletheir mature their mature regions regionsare highly are variable. highly variable.A similar situation A similar occurred situation in the occurred O1 and in O2 the superfamilies, O1 and O2 whethersuperfamilies, they were whether cysteine they‐poor were or cysteine-poor ‐rich peptides or(S10 -rich‐Figure peptides S3, S10(S10-Figure‐Figure S4). S3, It S10-Figureis obvious that S4). theIt is signal obvious sequences that the of signal O1 and sequences O2 superfamily of O1 and conopeptide O2 superfamily precursors conopeptide remain highly precursors evolutionarily remain conserved,highly evolutionarily even if their conserved, mature peptides even if their have mature extreme peptides variation have (S10 extreme‐Figure variation S3, S10‐Figure(S10-Figure S4). For S3, example,S10-Figure mature S4). For peptides example, of mature Lt035 and peptides Lt036 of in Lt035 the O1 and superfamily Lt036 in the do O1 not superfamily have any docysteine, not have while any thecysteine, rest of while the O1 the superfamily rest of the O1mature superfamily peptides mature in S10‐ peptidesFigure S3 in contain S10-Figure six cysteines S3 contain with six a cysteinesC‐C‐CC‐ withC‐C framework. a C-C-CC-C-C Similarly, framework. mature Similarly, peptides mature of Lt008 peptides and Lt009 of Lt008in the O2 and superfamily Lt009 in the have O2 superfamily no cysteine either,have no but cysteine the other either, O2 superfamily but the other mature O2 superfamily peptides in mature S10‐Figure peptides S4 contain in S10-Figure eight cysteines, S4 contain with eight a Ccysteines,‐C‐CC‐C with‐C‐C‐ aC C-C-CC-C-C-C-C framework. framework. Codon usage for cysteine in the O1 superfamilysuperfamily conopeptide genes (Lt037-Lt041)(Lt037‐Lt041) was also analyzed (S10 (S10-Figure‐Figure S5). It It was was highly highly conserved conserved for for cysteine cysteine residue residue II II (TGT), (TGT), III III (TGC) and IV (TGC). Meanwhile,Meanwhile, TGT TGT and and TGC TGC were were both both used used for for cysteine cysteine residue residue I, V I, andV and VI. VI. It is It obvious is obvious that that the preferencethe preference of cysteine of cysteine codon codon varies varies in di inff differenterent positions positions in O1 in O1 superfamily superfamily genes. genes. Codon Codon usage usage of ofcysteine cysteine residue residue II, III II, and III and IV are IV more are more conservative. conservative. This phenomenonThis phenomenon is very is common very common in the mature in the peptidemature peptide region of region conotoxin of conotoxin genes [1 genes,50]. [1,50]. Previous studies have shown that the precursor genes of α‐α-conotoxinsconotoxins of the A superfamily superfamily have one intron located in the pro pro-region‐region (Figure 11),11), and there there are two two introns introns present in in I1, I1, I2, I2, M, M, O1, O1, O2, O3, O3, P, P, S S and and T T superfamily superfamily precursor precursor genes genes [43]. [43 ].However, However, no nointron intron was was found found in either in either B1 or B1 J superfamilyor J superfamily conotoxin conotoxin genes genes in this in this study. study. Furthermore, Furthermore, a along long intron intron with with 1273 1273-1339‐1339 bp bp was discovered in F superfamily genes firstly, firstly, which was located in the the mature mature region region (Figure (Figure 1111 andand S9). S9). The intron of F superfamily genes was also rich in microsatellite sequences, which may affect affect gene transcription, mRNAmRNA splicing, splicing, or or mRNA mRNA export export to the to cytoplasmthe cytoplasm [51]. In[51]. addition, In addition, GT/TG GT/TG and AG /andGA AG/GAin this intron in this were intron the were most the abundant most abundant repeated motifs,repeated which motifs, is in which accordance is in accordance with the SSR with analysis the SSR in analysistranscriptome in transcriptome of C. litteratus of. C. This litteratus suggests. This that suggests diversity that of gene diversity structures of gene for structures different superfamilies for different superfamiliesmay have significant may have effects significant on conotoxin effects gene on conotoxin transcription. gene transcription.

Figure 11. Gene structures of F and A superfamily conotoxin precursors. Signal sequence: blue box; FigurePropeptide: 11. Gene green structures box; Mature of F sequence: and A superfamily red box; Intron: conotoxin black precursors. box. Signal sequence: blue box; Propeptide: green box; Mature sequence: red box; Intron: black box. 4. Materials and Methods

4.4.1. Materials Sample Collection, and Methods RNA Extraction and Sequencing

4.1. SampleC. litteratus Collection,specimens, RNA Extraction 6-12 cm and in Sequencing length, were collected in the offshore areas of the Xisha Islands, South China Sea, China. Immediately after collection, the cone snails were evenly divided C. litteratus specimens, 6‐12 cm in length, were collected in the offshore areas of the Xisha Islands, into three groups on the basis of length (Small, 6–7 cm; Middle, 8–10 cm; Big, 10–12 cm; Figure1), South China Sea, China. Immediately after collection, the cone snails were evenly divided into three and then dissected on ice soon after. To avoid the uncertainty and randomness of one single specimen, groups on the basis of length (Small, 6–7 cm; Middle, 8–10 cm; Big, 10–12 cm; Figure 1), and then five C. litteratus specimens of similar size for each group (SC, MC, BC) were selected to isolate their dissected on ice soon after. To avoid the uncertainty and randomness of one single specimen, five C. venom duct total RNAs (S10-Figure S6). Qubit2.0 was used to measure RNA concentration of each litteratus specimens of similar size for each group (SC, MC, BC) were selected to isolate their venom cone snail (S10-Table S1). Then, the same amount of every RNA in each group was mixed together duct total RNAs (S10‐Figure S6). Qubit2.0 was used to measure RNA concentration of each cone snail for transcriptome sequencing (S10-Table S2). Total RNAs were extracted from venom ducts using (S10‐Table S1). Then, the same amount of every RNA in each group was mixed together for Total RNA Extractor (Sangon Biotech, Shanghai, China) according to the manufacturer’s instructions; transcriptome sequencing (S10‐Table S2). Total RNAs were extracted from venom ducts using Total RNA Extractor (Sangon Biotech, Shanghai, China) according to the manufacturer’s instructions; Qubit2.0 was used for the detection of RNA concentration, and poly A mRNA molecules were purified using oligo (dT)‐attached magnetic beads. cDNA libraries were prepared using a VAHTSTM mRNA‐seq V2 Library Prep Kit for Illumina® (Vazyme Biotech, Nanjing, China) following the

Mar. Drugs 2020, 18, 464 13 of 17

Qubit2.0 was used for the detection of RNA concentration, and poly A mRNA molecules were purified using oligo (dT)-attached magnetic beads. cDNA libraries were prepared using a VAHTSTMmRNA-seq V2 Library Prep Kit for Illumina® (Vazyme Biotech, Nanjing, China) following the manufacturer’s protocol, and sequenced on the Illumina HiSeq2000 platform (Illumina, San Diego, CA, USA) by Sangon (Sangon Biotech, Shanghai, China).

4.2. Sequence Data Assembly and Analysis The raw data was first assessed using the FastQC v0.10.1 program (Babraham Institute, London, England), and in order to generate clean data, Trimmomatic v0.39 [52] was used for the removal of adapter sequences, with reads containing ploy-N and low-quality sequences. After obtaining clean data, Trinity v2.5.1 [53] was used for assembly with the default parameters and a minimum length of 201 bp. The longest transcript of a cluster with various isoforms was defined as a unigene. To estimate transcript abundance in terms of FPKM, RNA-Seq was performed by Expectation Maximization (RSEM) v1.2.31 [54]. For further protein functional annotation, all the assembled unigenes were searched against Nr (non-redundant protein database at NCBI), NT, PFAM, Swissprot and TrEMBL using BLASTX with an E-value < 1 10 5 and a similarity >30%, and only the top hits were considered [55–57]. × − 4.3. Prediction and Classification of Conotoxins The prediction and identification of conopeptides was conducted using the following steps. First, unigenes were annotated with ConoSorter to obtain preliminary results, including amino acid sequences, number of cysteines, and N-terminal hydrophobicity rate [58]. In addition, sequences were then identified by Blastp similarity searches with an E-value < 1e-05 against the local reference conotoxin database, which was constructed with 7414 conotoxins downloaded from the ConoDB (http://conco.ebc.ee/). Finally, the predicted conotoxin sequences were manually inspected using the ConoPrec tool in ConoServer and the Blastp in the NCBI database. The duplication and truncated mature region sequences were removed, and the signal peptides, gene superfamilies and cysteine frameworks were also checked for confirmation [1,26]. Taking into account the percentage of highly conserved signal sequence identity, the predicted conopeptide precursors were assigned to different superfamilies, with a universal identity threshold of 75%, and the particular threshold values for assigning conopeptides to I1, I2, L, M, P, S, con-ikot-ikot and divergent superfamilies were adjusted to 71.85, 57.6, 67.5, 69.3, 69.1, 72.9, 64.5 20.2 and ± 64.22 20.53%, respectively [58]. If the identity of a signal region was below the threshold value for ± any reported superfamily, the conopeptide was considered as a member of a new conotoxin superfamily. Meanwhile, those conotoxins without signal regions but still showing high similarity either in the pro-region or the mature region were regarded as belonging to the “unknown” family.

4.4. Analysis of SSR Distribution and Codon Usage Bias MISA v1.0 [59] was used (http://pgrc.ipk-gatersleben.de/misa/) to detect simple sequence repeats (SSRs). SSR were identified using the search criteria with the minimum repetitions of di-, tri-, tetra-, penta-, and hexa-nucleotides being 6, 5, 5, 5, and 5, respectively, and the flanking sequence length of the SSR loci was greater than 100 bp. Codon usage bias was analyzed by SMS suite (http://www.bio-soft.net/sms/) and CodonW 1.4.2 [60] with the default parameter values.

4.5. Isolation of Genomic DNA of C. Litteratus and Gene Cloning of Partial Superfamily Conotoxins Herein, magnetic bead method was used to extract high-quality genomic DNA of the venom bulb in C. litteratus. The primers of the B1, J, and F superfamilies were designed based on the coding sequence of conotoxins found in C. litteratus at the transcriptome level (Supplementary Material, S2). Then, conopeptide genes were amplified by PCR using the genomic DNA of C. litteratus as templates. After sanger dideoxy sequencing, the nucleotide sequences were translated into amino acid sequences and predicted by ConoPrec. Mar. Drugs 2020, 18, 464 14 of 17

5. Conclusions In summary, this paper comprehensively investigates the conotoxin diversity with respect to cysteine framework, pharmacological activity and gene superfamily of C. litteratus. A total of 128 unique putative conopeptide sequences were identified, most of which could be classified into 22 known superfamilies. There were 43 precursors belonging to new superfamilies. Among them, there were 15 known cysteine frameworks that had been described previously, and 5 novel cysteine patterns that were discovered here. Moreover, di-nucleotide was the most abundant type of repetition in conotoxin precursor genes. The codon usage bias of the conotoxin genes was not obvious. However, the M, O1, and O2 superfamilies differed in codon preference in most amino acids. We also discovered that the B1 and J superfamilies had no intron, and that the F superfamily possessed one intron with 1273–1339 bp in the mature region. Overall, this study lays the foundation for molecular biology research of conotoxin diversity, and the newly discovered conotoxins will provide more potential candidate peptides for developing new marine drugs.

Supplementary Materials: The following supplementary are available online at http://www.mdpi.com/1660- 3397/18/9/464/s1, S1: All conotoxins detail; S2: CDS of all conotoxin; S3: Reported conotoxins in conosever; S4: Conotoxin distribution in three databases (SC, MC and BC); S5: FPKM value of top 30; S6: Codon usage bias of all conotoxin genes; S7: RSCU value of different group; S8: B1 and J superfamily gene; S9: F superfamily gene structure; S10: RNA preparation in each group and other supporting information; S10-Figure S1: The alignment of highly homologous sequences in Table2 reported from other Conus species; S10-Figure S2: Bayesian tree based on a concatenation of the COI, 16S and 12S genes for the reduced dataset of 326 specimens cited from reference 5. Conus species with red frame may generate a few same conopeptide sequences compare to C. literatus; S10-Figure S3: Alignment of O1 superfamily conotoxins with or without rich cysteines; S10-Figure S4: Alignment of O2 superfamily conotoxins with or without rich cysteines; S10-Figure S5 Codon usage for cysteine in the O1 superfamily conotoxin genes; S10-Figure S6: Total RNAs of each venom ducts were extracted from different individuals of C. litteratus. M: RNA Marker, 1-5: Small size cone snail; 6-10: Medium size cone snail; 11-15: Big size cone snail; C: Positive control; S10-Figure S7: Alignment of the signal region of Lt065, Lt077 with the reported conotoxin belonging to H superfamily in Conosever; S10-Figure S8: Alignment of the signal region of Lt097 with the reported conotoxin belonging to P superfamily in Conosever; S10-Table S1: Total RNA concentrations for 15 specimens; S10-Table S2: Final RNA concentration of mixed samples of each group. Author Contributions: X.L., D.Z. and S.L. designed the experiments; X.L. and W.C. performed the experiments; X.L., D.Z. and S.L. analyzed the data; X.L., W.C. and S.L. wrote the manuscript. All authors have read and agreed to the published version of the manuscript. Funding: This work was supported by National Natural Science Foundation of China (No. 41966003 and No. 81872794). Conflicts of Interest: The authors declare no conflict of interest.

References

1. Lavergne, V.; Harliwong, I.; Jones, A.; Miller, D.; Taft, R.J.; Alewood, P.F. Optimized deep-targeted proteotranscriptomic profiling reveals unexplored Conus toxin diversity and novel cysteine frameworks. Proc. Natl. Acad. Sci. USA 2015, 112, E3782–E3791. [CrossRef][PubMed] 2. Prashanth, J.; Dutertre, S.; Jin, A.; Lavergne, V.; Hamilton, B.; Cardoso, F.; Griffin, J.; Venter, D.; Alewood, P.; Lewis, R.J. The role of defensive ecological interactions in the evolution of conotoxins. Mol. Ecol. 2016, 25, 598–615. [CrossRef][PubMed] 3. Yao, G.; Peng, C.; Zhu, Y.; Fan, C.; Jiang, H.; Chen, J.; Cao, Y.; Shi, Q. High-Throughput Identification and Analysis of Novel Conotoxins from Three Vermivorous Cone Snails by Transcriptome Sequencing. Mar. Drugs 2019, 17, 193. [CrossRef][PubMed] 4. Dutertre, S.; Jin, A.H.; Vetter, I.; Hamilton, B.; Sunagar, K.; Lavergne, V.; Dutertre, V.; Fry, B.G.; Antunes, A.; Venter, D.J.; et al. Evolution of separate predation- and defence-evoked venoms in carnivorous cone snails. Nat. Commun. 2014, 5, 3521. [CrossRef][PubMed] 5. Puillandre, N.; Bouchet, P.; Duda, T., Jr.; Kauferstein, S.; Kohn, A.; Olivera, B.; Watkins, M.; Meyer, C. Molecular phylogeny and evolution of the cone snails (, ). Mol. Phylogenet. Evol. 2014, 78, 290–303. [CrossRef][PubMed] Mar. Drugs 2020, 18, 464 15 of 17

6. Baldomero, M.; Olivera, B. Conus venom peptides, receptor and ion channel targets, and drug design: 50 million years of neuropharmacology. Mol. Biol. Cell 1997, 8, 2101–2109. 7. Norton, R.S.; Olivera, B.M. Conotoxins down under. Toxicon 2006, 48, 780–798. [CrossRef] 8. Lewis, R.J.; Dutertre, S.; Vetter, I.; Christie, M.J. Conus venom peptide pharmacology. Pharm. Rev. 2012, 64, 259–298. [CrossRef] 9. Nelson, L. Venomous snails: One slip, and you’re dead. Nature 2004, 429, 798–800. [CrossRef] 10. Olivera, B.M.; Teichert, R.W. Diversity of the neurotoxic Conus peptides. Mol. Microbiol. 2007, 7, 251. [CrossRef] 11. Xie, B.; Huang, Y.; Baumann, K.; Fry, B.G.; Shi, Q. From marine venoms to drugs: Efficiently supported by a combination of transcriptomics and proteomics. Mar. Drugs 2017, 15, 103. [CrossRef][PubMed] 12. McIntosh, J.M.; Jones, R.M. Cone venom–from accidental stings to deliberate injection. Toxicon 2001, 39, 1447–1451. [CrossRef] 13. Cruz, L.J.; Gray, W.R.; Olivera, B.M. Purification and properties of a myotoxin from Conus geographus venom. Arch. Biochem. Biophys. 1978, 190, 539–548. [CrossRef] 14. Olivera, B.M. ω-Conotoxin MVIIA: From marine snail venom to analgesic drug. In Drugs from the Sea; Fusetani, N., Ed.; Karger Publishers: Salt Lake City, UT, USA, 2000; pp. 74–85. 15. McGivern, J.G. Ziconotide: A review of its pharmacology and use in the treatment of pain. Neuropsychiatr. Dis. Treat. 2007, 3, 69. [CrossRef][PubMed] 16. Brinzeu, A.; Berthiller, J.; Caillet, J.B.; Staquet, H.; Mertens, P. Ziconotide for spinal cord injury-related pain. Eur. J. Pain. 2019, 23, 1688–1700. [CrossRef][PubMed] 17. Robinson, S.D.; Norton, R.S. Conotoxin gene superfamilies. Mar. Drugs 2014, 12, 6058–6101. [CrossRef] [PubMed] 18. Robinson, S.D.; Safavi-Hemami, H.; McIntosh, L.D.; Purcell, A.W.; Norton, R.S.; Papenfuss, A.T. Diversity of conotoxin gene superfamilies in the venomous snail, Conus victoriae. PLoS ONE 2014, 9, e87648. [CrossRef] 19. Kaas, Q.; Westermann, J.C.; Halai, R.; Wang, C.K.; Craik, D.J. ConoServer, a database for conopeptide sequences and structures. Bioinformatics 2008, 24, 445–446. [CrossRef] 20. Kaas, Q.; Westermann, J.C.; Craik, D.J. Conopeptide characterization and classifications: An analysis using ConoServer. Toxicon 2010, 55, 1491–1509. [CrossRef] 21. Kaas, Q.; Yu, R.; Jin, A.H.; Dutertre, S.; Craik, D.J. ConoServer: Updated content, knowledge, and discovery tools in the conopeptide database. Nucleic Acids Res. 2012, 40, D325–D330. [CrossRef] 22. Huang, Y.; Peng, C.; Yi, Y.; Gao, B.; Shi, Q. A Transcriptomic Survey of Ion Channel-Based Conotoxins in the Chinese Tubular Cone Snail (Conus betulinus). Mar. Drugs 2017, 15, 228. [CrossRef][PubMed] 23. Gao, B.; Peng, C.; Zhu, Y.; Sun, Y.; Zhao, T.; Huang, Y.; Shi, Q. High Throughput Identification of Novel Conotoxins from the Vermivorous Oak Cone Snail (Conus quercinus) by Transcriptome Sequencing. Int. J. Mol. Sci. 2018, 19, 3901. [CrossRef][PubMed] 24. Dutertre, S.; Jin, A.H.; Kaas, Q.; Jones, A.; Alewood, P.F.; Lewis, R.J. Deep venomics reveals the mechanism for expanded peptide diversity in cone snail venom. Mol. Cell Proteom. 2013, 12, 312–329. [CrossRef][PubMed] 25. Puillandre, N.; Duda, T.; Meyer, C.; Olivera, B.; Bouchet, P. One, four or 100 genera? A new classification of the cone snails. J. Molluscan Stud. 2015, 81, 1–23. [CrossRef][PubMed] 26. Peng, C.; Yao, G.; Gao, B.M.; Fan, C.X.; Bian, C.; Wang, J.; Cao, Y.; Wen, B.; Zhu, Y.; Ruan, Z.; et al. High-throughput identification of novel conotoxins from the Chinese tubular cone snail (Conus betulinus) by multi-transcriptome sequencing. Gigascience 2016, 5, 17. [CrossRef] 27. Li, R.; Bekaert, M.; Wu, L.; Mu, C.; Song, W.; Migaud, H.; Wang, C. Transcriptomic Analysis of Marine Gastropod Hemifusus tuba Provides Novel Insights into Conotoxin Genes. Mar. Drugs 2019, 17, 466. [CrossRef] 28. Himaya, S.W.A.; Lewis, R.J. Venomics-accelerated cone snail venom peptide discovery. Int. J. Mol. Sci. 2018, 19, 788. [CrossRef] 29. Larkin, M.A.; Blackshields, G.; Brown, N.P.; Chenna, R.; McGettigan, P.A.; McWilliam, H.; Valentin, F.; Wallace, I.M.; Wilm, A.; Lopez, R.; et al. Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23, 2947–2948. [CrossRef] 30. Tsetlin, V.I.; Utkin, Y.N.; Kasheverov, I.E.; Lyukmanova, E.N. 142. From alpha-Conotoxins and alpha-Neurotoxins to Endogenous “Prototoxins” and Binding Sites in Nicotinic Acetylcholine Receptors. Toxicon 2012, 60, 167–168. [CrossRef] Mar. Drugs 2020, 18, 464 16 of 17

31. Kryukova, E.; Ivanov, I.; Lebedev, D.; Spirova, E.; Egorova, N.; Zouridakis, M.; Kasheverov, I.; Tzartos, S.; Tsetlin, V. Orthosteric and/or Allosteric Binding of α-Conotoxins to Nicotinic Acetylcholine Receptors and Their Models. Mar. Drugs 2018, 16, 460. [CrossRef] 32. Sébastien, D.; Annette, N.; Victor, I.T. Nicotinic acetylcholine receptor inhibitors derived from snake and snail venoms. Neuropharmacology 2017, 127, 196–223. 33. Luo, S.; Akondi, K.B.; Zhangsun, D.; Wu, Y.; Zhu, X.; Hu, Y.; Christensen, S.; Dowell, C.; Daly, N.L.; Craik, D.J. Atypical α-conotoxin LtIA from Conus litteratus targets a novel microsite of the α3β2 nicotinic receptor. J. Biol. Chem. 2010, 285, 12355–12366. [CrossRef][PubMed] 34. Jacob, R.B.; McDougal, O.M. The M-superfamily of conotoxins: A review. Cell. Mol. Life Sci. 2010, 67, 17–27. [CrossRef][PubMed] 35. Wang, L.; Liu, J.; Pi, C.; Zeng, X.; Zhou, M.; Jiang, X.; Chen, S.; Ren, Z.; Xu, A. Identification of a novel M-superfamily conotoxin with the ability to enhance tetrodotoxin sensitive sodium currents. Arch. Toxicol. 2009, 83, 925–932. [CrossRef] 36. Wickenden, A.; Priest, B.; Erdemli, G. Ion channel drug discovery: Challenges and future directions. Future Med. Chem. 2012, 4, 661–679. [CrossRef][PubMed] 37. Wu, Y.; Ma, H.; Zhang, F.; Zhang, C.; Zou, X.; Cao, Z. Selective Voltage-Gated Sodium Channel Peptide Toxins from Venom: Pharmacological Probes and Analgesic Drug Development. ACS Chem. Neurosci. 2018, 9, 187–197. [CrossRef] 38. Catterall, W.A. Structure and regulation of voltage-gated Ca2+ channels. Annu. Rev. Cell Dev. Biol. 2000, 16, 521–555. [CrossRef] 39. Hurst, R.; Rollema, H.; Bertrand, D. Nicotinic acetylcholine receptors: From basic science to therapeutics. Pharmacol. Ther. 2013, 137, 22–54. [CrossRef] 40. Ren, Z.; Wang, L.; Qin, M.; You, Y.; Pan, W. Pharmacological characterization of conotoxin lt14a as a potent non-addictive analgesic. Toxicon 2015, 96, 57–67. [CrossRef] 41. Liu, J.; Wu, Q.; Pi, C.; Zhao, Y.; Zhou, M.; Wang, L.; Chen, S.; Xu, A. Isolation and characterization of a T-superfamily conotoxin from Conus litteratus with targeting tetrodotoxin-sensitive sodium channels. Peptides 2007, 28, 2313–2319. [CrossRef] 42. Wang, L.; Pi, C.; Liu, J.; Chen, S.; Peng, C.; Sun, D.; Zhou, M.; Xiang, H.; Ren, Z.; Xu, A. Identification and characterization of a novel O-superfamily conotoxin from Conus litteratus. J. Pept. Sci. 2008, 14, 1077–1083. [CrossRef][PubMed] 43. Yun, W.; Lei, W.; Maojun, Z.; Yuwen, Y.; Xiaoyan, Z.; Yuanyuan, Q.; Mengying, Q.; Shaonan, L.; Zhenghua, R.; Anlong, X. Molecular Evolution and Diversity of Conus Peptide Toxins, as Revealed by Gene Structure and Intron Sequence Analyses. PLoS ONE 2013, 8, e82495. [CrossRef] 44. Maniatis, T. Mechanisms of alternative pre-mRNA splicing. Science 1991, 251, 33–35. [CrossRef][PubMed] 45. Lopez, A.J. Alternative splicing of pre-mRNA: Developmental consequences and mechanisms of regulation. Annu. Rev. Genet. 1998, 32, 279–305. [CrossRef][PubMed] 46. Davis, J.; Jones, A.; Lewis, R.J. Remarkable inter- and intra-species complexity of conotoxins revealed by LC/MS. Peptides 2009, 30, 1222–1227. [CrossRef][PubMed] 47. Romeo, C.; Di Francesco, L.; Oliverio, M.; Palazzo, P.; Massilia, G.R.; Ascenzi, P.; Polticelli, F.; Schininà, M.E. Conus ventricosus venom peptides profiling by HPLC-MS: A new insight in the intraspecific variation. J. Sep. Sci. 2008, 31, 488–498. [CrossRef] 48. Jakubowski, J.A.; Kelley, W.P.; Sweedler, J.V.; Gilly, W.F.; Schulz, J.R. Intraspecific variation of venom injected by fish-hunting Conus snails. J. Exp. Biol. 2005, 208, 2873–2883. [CrossRef] 49. Safavi-Hemami, H.; Young, N.D.; Williamson, N.A.; Purcell, A.W. Proteomic Interrogation of Venom Delivery in Marine Cone Snails: Novel Insights into the Role of the Venom Bulb. J. Proteome Res. 2010, 9, 5610–5619. [CrossRef] 50. Loughnan, M.L.; Nicke, A.; Lawrence, N.; Lewis, R.J. Novel αD-Conopeptides and Their Precursors Identified by cDNA Cloning Define the D-Conotoxin Superfamily . Biochemistry 2009, 48, 3717–3729. [CrossRef] †‡ 51. Li, Y.-C.; Korol, A.B.; Fahima, T.; Nevo, E. Microsatellites within genes: Structure, function, and evolution. Mol. Biol. Evol. 2004, 21, 991–1007. [CrossRef] 52. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120. [CrossRef][PubMed] Mar. Drugs 2020, 18, 464 17 of 17

53. Grabherr, M.G.; Haas, B.J.; Yassour, M.; Levin, J.Z.; Thompson, D.A.; Amit, I.; Adiconis, X.; Fan, L.; Raychowdhury, R.; Zeng, Q.; et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011, 29, 644. [CrossRef][PubMed] 54. Li, B.; Dewey, C.N. RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinform. 2011, 12, 323. [CrossRef][PubMed] 55. Altschul, S. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410. [CrossRef] 56. Finn, R.D.; Penelope, C.; Eberhardt, R.Y.; Eddy, S.R.; Jaina, M.; Mitchell, A.L.; Potter, S.C.; Marco, P.; Matloob, Q.; Amaia, S.V. The Pfam protein families database: Towards a more sustainable future. Nucleic Acids Res. 2015, 44, D279–D285. [CrossRef] 57. Rolf, A.; Amos, B.; Wu, C.H.; Barker, W.C.; Brigitte, B.; Serenella, F.; Elisabeth, G.; Huang, H.; Rodrigo, L.; Michele, M. UniProt: The Universal Protein knowledgebase. Nucleic Acids Res. 2004, 32, D115–D119. 58. Lavergne, V.; Dutertre, S.; Jin, A.-h.; Lewis, R.J.; Taft, R.J.; Alewood, P.F. Systematic interrogation of the Conus marmoreus venom duct transcriptome with ConoSorter reveals 158 novel conotoxins and 13 new gene superfamilies. BMC Genom. 2013, 14, 708. [CrossRef] 59. Beier, S.; Thiel, T.; Munch, T.; Scholz, U.; Mascher, M. MISA-web: A web server for microsatellite prediction. Bioinformatics 2017, 33, 2583–2585. [CrossRef] 60. Peden, J.F. Analysis of codon usage. Univ. Nottm. 2000, 90, 73–74.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).