A Comprehensive Multi-Omic Approach Reveals a Simple Venom in a Diet Generalist, the Northern Short-Tailed Shrew, Blarina brevicauda
Thesis
Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the
Graduate School of The Ohio State University
By
Zachery R. Hanf B.S.
Graduate Program in Evolution, Ecology, and Organismal Biology
The Ohio State University
2019
Thesis committee:
Andreas Chavez, Advisor
H. Lisle Gibbs
Marymegan Daly
Copyright by Zachery R. Hanf 2019
Abstract
Venom is often comprised of a mixture of proteins derived from duplication events of regulatory genes that have undergone positive selection and neofunctionalization. Proteins operating in venom are often designed to disrupt some physiological process in prey organisms, and thus, in venomous organisms who prey on multiple divergent prey with differing physiologies, it is hypothesized that venom complexity should mirror dietary complexity.
Blarina brevicauda, the northern short-tailed shrew, is a venomous mammal that has been observed to eat a variety of prey from widely divergent taxa. However, only two proteins with toxic properties have been isolated from this venom system.
Given that venom is typically comprised of many toxic proteins and that B. brevicauda possesses a very broad diet, we use a multi-omic approach to determine whether B. brevicauda’s venom is more complex than is currently understood. We generated a de novo transcriptome assembly of the submaxillary gland using both short-read and long-range sequencing data, assembled a reference B. brevicauda genome, and also sequenced the salivary proteome for these shrews. We report that despite the shrew’s complex diet, B. brevicauda’s venom system is likely limited to BLTX, PLA2, and proenkephalin (Soricidin). Additionally, we find that KLK1 duplications are more extensive than previously realized, and that duplication events of these genes may have come from a distant shrew ancestor. Finally, we find two proteins that may be functioning as endogenous defense mechanisms and have a high probability of counteracting the self-harming effects of BLTX. Taken together, and given the lack of major dietary change in B. brevicauda relative to non-venomous shrews, these results suggest that venom may serve as an accessory adaptation to its greater prey capture method. Further functional assays for all venom proteins on both vertebrate and invertebrate prey would provide further insight into the ecological relevance of venom in this species, and possibly other venomous Eulipotyphlan species.
Acknowledgements
I would like to acknowledge and thank my advisor, Andreas Chavez, for his extreme support and dedication to helping me throughout the entirety of this project. I would also like to thank my committee members H. Lisle Gibbs and Marymegan Daly for their valuable and constructive feedback through the planning and execution of my project.
I would like to thank my various lab mates and undergrads for their help and fostering an environment for critical thinking. I would like to thank my partner Whitney King for her never ending support throughout my graduate education. Finally, I would like to thank my parents for believing in and supporting me during my educational journey.
Vita
2017…………………………………………………...B.S. The Ohio State University
2017,18,19…………………………………………….Graduate Teaching Assistant,
College of Life Science Education,
The Ohio State University
2017, 18……………………………………………….Graduate Teaching Assistant,
Department of Evolution, Ecology,
and Organismal Biology,
The Ohio State University
Field of Study
Major Field: Evolution, Ecology, and Organismal Biology
Table of Contents
Abstract
Acknowledgements
Vita
List of Tables
List of Figures
Introduction
Methods
Results
Discussion
Conclusion
Literature Cited
Appendix
List of Tables
Table 1: Results from various steps of the bioinformatic processing of raw RNA reads from the transcriptome of two individual B. brevicauda.
Table 2: Results from positive selection analyses conducted using the codeml program in PAML. M8a represents the null model in which dN/dS is fixed to 1 with 11 classes of sites taken into account. M8 represents the alternative model in which dN/dS is allowed to vary amongst 11 classes of sites.
List of Figures
Figure 1: Alignment of all Kallikrein-Like Serine Proteases identified from both genomic and transcriptomic analysis.
Figure 2: Genomic orientation of KLK1 tandem array in B. brevicauda. Flanking genes (KLK15 and ACPT) are shown along with expression values in TPM for each KLK1 paralog.
Figure 3: Three-dimensional model of BLTX showing sites undergoing positive selection (magenta). Active site residues are highlighted in red.
Figure 4: Relative abundance of Putative Venom Proteins in the saliva for both WH5 and WH4 individuals.
Figure 5: Phylogenetic reconstruction of KLK1 sequences from B. brevicauda and S. araneus (sequences with the “gi|” prefix). Hedgehog and Star-Nosed Mole KLK1 sequences were used as an outgroup. Node labels indicate bootstrap support (1000 bootstraps).
Figure 6: Electrostatic potential of modelled surface residues for 1) BLTX 2) Blarinasin 3) Non-Expressed Kallikrein 4) KLK1 in B. brevicauda. Red indicates a more negative electric potential, whereas blue indicates a more positive electric potential.
Figure 7: ClusPro predicted protein-protein interaction between BLTX (Purple) and the Double Headed Protease Inhibitor (Teal). The active sites in the catalytic cleft of BLTX are highlighted in yellow.
Introduction
Venom is typically comprised of a complex mixture of proteins and short peptides that are designed to disrupt prey’s vital physiological processes (Casewell et al. 2013). The proteins functioning within venom are often convergently co-opted from the same regulatory gene families, even amongst widely divergent venomous organisms (Fry and Wüster 2004; Fry et al.
2009). Gene duplications followed by rapid positive selection in these gene families allow for extreme functional diversity in venom proteins, and therefore allow for diversification of molecular targets. Thus, it is predicted that a venomous organism’s ability to prey on a specific prey type is facilitated by the classes of venom proteins that it produces. Therefore, in venomous animals that prey on multiple species, the dietary-breadth hypothesis predicts that venom complexity should correlate with dietary complexity (Daltry et al. 1996; Barlow et al. 2009;
Phuong et al. 2016; Pekár et al. 2018).
Within mammals, only a handful of extant species have evolved venom for prey capture and all of these species are within the Order Eulipotyphla (shrews, moles, hedgehogs, and solenodons) (Dufton 1992; Ligabue-Braun 2015). Eulipotyphlans are comprised of more than
500 species that have colonized a diverse range of terrestrial environments worldwide from arctic to tropical regions. Most Eulipotyphlans, including the venomous species, are generalist predators that feed on vertebrate and invertebrate prey from divergently related animal phyla, including arthropoda, annelida, mollusca, and chordata (Hamilton 1930; Hamilton 1941; Eadie
1952). Because venomous and non-venomous species share similar feeding habits, the selective pressures that drove the gain of venom in only certain Eulipotyphlans have been puzzling. This has stimulated debate as to whether venom evolved to help these species hunt larger prey,
particularly small vertebrates, or if venom evolved to further facilitate the caching of small invertebrate prey. Both of these behaviors have been observed in both venomous and non-venomous species, thus leading to the lack of clarity as to what selective factor drove the evolution of venom in only a handful of species. It is also possible that these hypotheses are not mutually exclusive, and that venom is important for both situations.
Blarina brevicauda (the northern short-tailed shrew) has the most potent venom of the
Eulipotyphlans (Dufton 1992). Dietary studies of B. brevicauda show that they consume widely divergent vertebrate and invertebrate prey, including insects, annelids, molluscs, arachnids, and rodents (Hamilton 1930; Hamilton 1941; George et al. 1986). Early functional studies on B. brevicauda’s venom using extracts from the submaxillary gland revealed that it causes respiratory arrest and even death in vertebrates; however, the venom also has a paralytic effect on invertebrates (Pearson 1942; Martin 1981). These findings suggest that B. brevicauda venom may have multiple functionalities for targeting different physiological functions in very divergent prey. More recently, proteomic isolation of venom proteins from submaxillary glands of B. brevicauda yielded two kallikrein-1 (KLK1)-like serine proteases (BLTX and Blarinasin).
However, only BLTX exhibited toxic activity in mice, specifically by cleaving kininogens to bradykinins, but its functionality was never evaluated in invertebrates (Kita et al. 2004; Kita et al. 2005). Kita et al. (2004) also pointed out that many mammals contain KLK1s that can cleave kininogens to bradykinin, but not all mammals are venomous, and stated that there are likely other constituents in the venom contributing to the lethality of shrew saliva. Furthermore, a short peptide called Soricidin has been isolated from B. brevicauda and found to have paralytic effects on invertebrates, but similarly, this peptide was not tested on vertebrate prey (Patent:
US8003754B2). Given that venom is typically comprised of many toxic proteins and that B.
brevicauda possesses a very broad diet, we explored whether B. brevicauda’s venom complexity matches their diet complexity and whether it contains more toxins than is currently understood.
To gain more insight into the complexity of venom in B. brevicauda, we took a comprehensive multi-omic approach utilizing genomic, transcriptomic, and proteomic methods.
We generated a de novo transcriptome assembly of the submaxillary gland using both short-read and long-range sequencing data from cDNA. We also assembled a de novo reference genome to be used with our reference transcriptome to bioinformatically search for transcripts with homology to known venom components found in other venomous systems. From our list of putative venom genes, we explored patterns of gene expression, evidence for positive selection, and their 3D-protein structure to provide more insight into their potential role as a venom transcript. These transcriptomic results were then compared with proteomic profiles of saliva from B. brevicauda to see which putative-venom transcripts from the submaxillary gland were present in the saliva and could potentially be used as venom.
Methods
Animal Capture and Tissue and Saliva Procurement
We captured two wild B. brevicauda animals (one female (WH4) and one male (WH5)) in pitfall traps near woodpiles at The Ohio State University’s Waterman Farm Headquarters in
June 2017. Saliva was collected by scruffing shrews and allowing them to bite on a piece of sterile medical tubing. The tubes were then placed in a sterile Eppendorf tube, put on ice, and stored at -80°C within 30 minutes. Shrews were then euthanized via an extended period of inhaled isoflurane. The submaxillary glands containing venom transcripts and other internal organs were immediately removed and preserved in RNAlater (Invitrogen) at room temperature for 24 hours and then stored at -80°C.
High-Molecular-Weight gDNA Extraction
High-molecular-weight genomic DNA (HMW gDNA) was isolated from frozen heart tissue of one B. brevicauda animal (WH4) using the Puregene Kit (Qiagen) following the manufacturer’s protocol with slight modification. These modifications included replacing all steps that required vortexing with gentle inversions to protect the integrity of larger DNA strands. HMW gDNA was quantified with the Qubit dsDNA HS Assay Kit (Life Technologies) and the quality of the molecular weight was assessed by using both genomic screen tapes on a
Tapestation 2200 (Agilent) and a pulse-field gel electrophoresis on a Pippin Pulse (Sage
Sciences). Fragments smaller than 600bp were removed from the HMW gDNA sample using the
Pippin HT (Sage Science).
Genomic DNA Library Construction, Sequencing, and de novo Assembly
A Chromium Controller Instrument (10X Genomics) at the DNA Technologies and
Expression Analysis Core at the UC Davis Genome Center was used for sample preparation of a
10X Genomics “linked-read” library for de novo genome assembly. Sample indexing and partition-barcoded libraries were prepared using the Chromium Genome Reagent Kit (10x
Genomics) according to manufacturer’s protocols described in the Chromium Genome User
Guide Rev A (https://support.10xgenomics.com/de-novo-assembly/sample-prep/doc/user-guide-chromium-genome-reagent-kit-v1-chemistry). In summary, approximately 1 ng of HMW gDNA in Master Mix was combined with a library of Genome Gel Beads and partitioning oil to create
Gel Bead-In-Emulsions (GEMs) within a microfluidic Genome Chip. HMW gDNA was partitioned across ~1 million GEMs where library construction took place. The library construction incorporated a unique 16bp barcode, an Illumina R1 sequencing primer, and a 6bp random primer sequence. GEM reactions were isothermally incubated (for 3 h at 30°C; for 10 min at 65°C; held at 4°C), and barcoded fragments ranging from a few to several hundred base pairs were generated. After incubation, the GEMs were broken and the barcoded DNA was recovered. Solid Phase Reversible Immobilization (SPRI) beads were used to purify and size select fragments for library preparation.
Standard library prep was performed according to the protocol described in the
Chromium Genome User Guide Rev A (https://support.10xgenomics.com/de-novo-assembly/sample-prep/doc/user-guide-chromium-genome-reagent-kit-v1-chemistry) to construct one sample-indexed library using 10x Genomics adaptors. The final library contained the P5 and P7 primers used in Illumina bridge amplification and was quantified by qPCR. Sequencing was
conducted with an Illumina HiSeq X with 2×150 paired-end reads based on the manufacturer’s protocols.
We assembled the “linked-read” HiSeq data using the Supernova 2.1.0™ assembler
(Weisenfeld et al. 2017) using the default recommended settings. Genome-wide statistics were calculated on the total number of phase blocks and the N50 of individual phase block sizes in the pseudohap outputs produced in the Supernova 10X assembly. Useful statistics about the genome assembly were also ascertained using the stats.py script that is part of the BBMAP suite
(Bushnell 2014).
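The N50 statistic mentioned above can be illustrated with a minimal Python sketch. This is not the Supernova or BBMap implementation; the function name and the toy scaffold lengths are hypothetical and chosen only to show the calculation (the length of the scaffold at which the cumulative sum, taken from longest to shortest, first reaches half of the total assembly length).

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    half_total = sum(ordered) / 2             # half of the total assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no scaffold lengths supplied")


# Toy example with made-up lengths (not data from this study).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_lengths)
```

In this toy case the total length is 300 bp, and the two longest scaffolds together reach 180 bp, which is past the halfway point, so the N50 is the length of the second scaffold.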
RNA Isolation and Sequencing
We extracted total RNA from the submaxillary tissue for both shrews using the Qiagen
RNeasy Plus Mini Kit following the manufacturer’s protocol. RNA concentration and quality were assessed using RNA screen Tapes on a TapeStation 2200 (Agilent Technologies). Poly-A-selected RNA libraries were constructed from total RNA for both shrews using a Kapa mRNA
Hyperprep Kit for Illumina platforms (Kapa Biosystems). Final library concentrations and fragment-size distributions were confirmed using a Qubit RNA HS Assay Kit (Invitrogen) and a
TapeStation 2200 (Agilent Technologies), respectively. Sequencing was conducted with an
Illumina HiSeq 4000 with 2×150 paired-end reads based on the manufacturer’s protocols at the
DNA Technologies and Expression Analysis Core at the UC Davis Genome Center.
We also generated long-range cDNA sequence data, which has been shown to be useful in resolving transcripts from difficult and highly paralogous venom gene families (Hargreaves and Mulley 2015), using Oxford Nanopore’s MinION sequencing platform. cDNA libraries were prepped with Nanopore’s cDNA-PCR Sequencing Kit (SQK-PCS108) for one shrew
(WH5) due to our limited amount of RNA. Libraries were prepared using Nanopore’s cDNA protocol and suggested enzymes (https://community.nanoporetech.com/protocols/cdna-pcr-sequencing/v/pcs_9035_v108_revh_26jun2017; LongAmp Taq 2X Master Mix: NEB; RNaseOUT: ThermoFisher; SuperScript IV reverse transcriptase, 5x RT buffer and 100 mM DTT: ThermoFisher; Exonuclease 1: NEB). The final library was loaded into a FLO-MIN106 R9 flowcell, and data acquisition was carried out without live basecalling until almost all of the flowcell’s pores read inactive (~10 hours).
Transcriptome Assembly
150bp paired-end RNAseq reads were pre-processed using methods adapted from Singhal
(2013) and Bi et al. (2012) with some modifications. Briefly, Trimmomatic (Bolger et al. 2014) was used to trim adapter contaminations and low-quality reads. Exact PCR and/or optical duplicate reads were removed using Super-Deduper (https://github.com/dstreett/Super-Deduper).
Bowtie2 (Langmead and Salzberg 2012) was used to align the resulting reads against the
Escherichia coli genome to remove potential bacterial contamination that might be present in the raw data. Overlapping paired reads were then merged using Flash (Magoč and Salzberg 2011) and their final quality assessed using FastQC (Babraham Bioinformatics).
Raw MinION reads from WH5 were converted from fast5 format to fastq format using the command-line python script for basecalling in Albacore. Bases whose quality score was below 10 were trimmed using Nanofilt (https://github.com/wdecoster/nanofilt).
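As an illustration of a quality threshold of 10, the following Python sketch computes a mean Phred score for a single read from its FASTQ quality string and checks it against the cutoff. It assumes, as a simplification that may not match how the filter was actually run, that the threshold is applied to the mean quality of each read, that qualities use the standard offset of 33, and that the mean is taken on the error-probability scale. The function names are hypothetical and this is not the Nanofilt source code.

```python
import math


def mean_phred(quality_string, offset=33):
    """Mean read quality, averaged on the error-probability scale."""
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each ASCII character to a Phred score, then to an error probability.
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    # Average the probabilities and convert back to the Phred scale.
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filter(quality_string, min_quality=10):
    """True if the read's mean quality meets the (assumed) threshold."""
    return mean_phred(quality_string) >= min_quality
```

Averaging error probabilities rather than raw Phred scores penalizes reads with a few very poor bases more heavily, which is one common design choice for long-read data.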
We created three different assemblies using the various data types. This included (1) a de novo assembly for WH4 using only short-read data from our HiSeq sequencing run (WH4- denovo); (2) a de novo assembly for WH5 using only short-read data (WH5-denovo); and (3) a
de novo assembly for WH5 using both short-read data and long-read data from the MinION sequencing run (WH5-longread). All assemblies were created using the default parameters in
Trinity, except for the inclusion of the long reads parameter (--long_reads) in the WH5-longread assembly where we also used the MinION reads. We used the TrinityStats.pl script to assess quality of the transcripts for each of the three assemblies.
Saliva Proteomics Sample Preparation
The tube containing shrew saliva was soaked in 50mM ammonium bicarbonate solution (just enough to cover the tube). The inside of the tube was washed by pipetting ammonium bicarbonate solution in and out 20 times. The ammonium bicarbonate solution was then removed and placed in a separate Eppendorf tube. The wash procedure was repeated two times, and the ammonium bicarbonate solution was pooled together and concentrated in a speed vacuum to a final volume of ~100uL for digestion (Sample prep #1). 200uL of 50mM ammonium
bicarbonate solution was then added to the Eppendorf tube containing the back tube; the sample was then shaken at room temperature for 30min, and the solution was collected (Sample prep #2). Protein concentration was then measured using Qubit.
For digestion, 5uL of DTT (5ug/uL in 50 mM ammonium bicarbonate) was added to sample prep #1 and #2 individually. Samples were incubated at 56°C for 15 min. After the incubation, 5uL of iodoacetamide (15 mg/ml in 50mM ammonium bicarbonate) was added and the sample incubated in the dark at room temperature for 30 min. Sequencing grade-modified trypsin (Promega, Madison WI) prepared in 50 mM ammonium bicarbonate was added to the sample at an estimated 1:50 enzyme-substrate ratio, and the reaction was carried out at 37°C overnight. After the digestion, acetic acid was added to the sample to quench the reaction.
The samples were dried in a vacufuge and resuspended in 20 uL of 50 mM acetic acid. Peptide concentration was determined by nanodrop (A280nm).
LC-MS/MS
Liquid chromatography-nanospray tandem mass spectrometry (LC/MS/MS) for protein identification was performed on a Thermo Scientific Orbitrap Fusion mass spectrometer equipped with an EASY-Spray™ Source and operated in positive ion mode. Samples were separated on an EASY-Spray nano column (PepMap™ RSLC, C18, 3 µm, 100 Å, 75 µm × 250 mm; Thermo Scientific) using a 2D RSLC HPLC system from Thermo Scientific. Each sample was injected (2ug) into the µ-Precolumn Cartridge (Thermo Scientific) and desalted with 0.1% formic acid in water for 5 minutes. The injector port was then switched to inject and the peptides were eluted off of the trap onto the column. Mobile phase A was 0.1% formic acid in water, and acetonitrile (with 0.1% formic acid) was used as mobile phase B. Flow rate was set at 300nL/min. Mobile phase B was increased from 2% to 20% in 40 min, then from 20% to 32% in 10 min, and again from 32% to 95% in 6 min; it was then kept at 95% for another 2 min before being brought back quickly to 2% in 2 min. The column was equilibrated at 2% of mobile phase B (or 98% A) for 15 min before the next sample injection. MS/MS data was acquired with a spray voltage of 1.7 kV and a capillary temperature of 275 °C. The scan sequence of the mass spectrometer was as follows: the analysis was programmed for a full scan recorded between m/z 375 and 1700 and an MS/MS scan to generate product ion spectra to determine amino acid sequence in consecutive scans starting from the most abundant peaks in the spectrum in the next 3 seconds. To achieve high mass accuracy MS determination, the full scan was performed in FT mode and the resolution was set at 120,000. The AGC Target ion number for FT full scan was set at 4 × 10⁵
ions, and maximum ion injection time was set at 50 ms. MSⁿ was performed using ion trap mode to ensure the highest signal intensity of MSⁿ spectra using CID (for 2+ to 7+ charges). The AGC Target ion number for ion trap MSⁿ scan was set at 1 × 10⁴ ions, and maximum ion injection time was set at 30 ms. The CID fragmentation energy was set to 30%. Dynamic exclusion was enabled with a repeat count of 1 within 60 s and a low mass width and high mass width of 10 ppm.
Sequence information from the MS/MS data was processed by converting the .raw files into a merged file (.mgf) using MSConvert (ProteoWizard). The resulting .mgf files were searched using Mascot Daemon by Matrix Science version 2.3.2 (Boston, MA) against a custom database derived from the transcriptome of our individual shrews. The mass accuracy of the precursor ions was set to 10 ppm, and accidental inclusion of one 13C peak was also accounted for in the search. The fragment mass tolerance was set to 0.5 Da. Considered variable modifications were oxidation (Met), deamidation (N and Q), and acetylation (K); carbamidomethylation (Cys) was set as a fixed modification. Four missed cleavages for the enzyme were permitted. A decoy database was also searched to determine the false discovery rate (FDR), and peptides were filtered according to the FDR. Proteins with less than 1% FDR and a minimum of 2 significant peptides detected were considered valid proteins.
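To make the 10 ppm precursor tolerance concrete, the short Python sketch below shows how a parts-per-million mass error is calculated and compared against a tolerance. The function names and the example m/z values are hypothetical and are not taken from the Mascot search itself; the sketch only illustrates the arithmetic.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Made-up values: an observed precursor 0.004 m/z units above a
# theoretical m/z of 500 is about 8 ppm away and would be accepted.
accepted = within_tolerance(500.004, 500.0)
```

Because the tolerance is relative, the absolute window widens with m/z: at m/z 500 a 10 ppm window is 0.005 units on either side, while at m/z 1500 it is 0.015 units.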
Bioinformatic Pipeline for Identification of Novel Toxins
To identify putative venom transcripts from our de novo transcriptome assemblies, we applied a pipeline described in Verdes et al. (2016) that filters out transcripts that lack signal peptides or do not match any toxins in existing toxin databases. We applied this method to
each of the three de novo transcriptome assemblies. First, we identified the coding potential of each transcript from each transcriptome assembly using TransDecoder
(https://github.com/TransDecoder/TransDecoder/wiki). The resulting amino acid sequences were then searched for the presence of signal peptides using the stand-alone command line version of
SignalP (Petersen et al. 2011). Amino acid sequences with signal peptides were then blasted against a curated database of known toxin protein sequences extracted from Tox-Prot
(https://www.uniprot.org/program/Toxins), a subdivision of the UniProt database, using blastp from the BLAST+ package (NCBI) with an e-value cutoff of 1e-5. The remaining transcripts were then manually annotated by blasting against NCBI’s non-redundant database using blastp.
Finally, expression levels for each of the putative-toxin transcripts were calculated using Kallisto
(Bray et al. 2016). We estimated transcripts per million (TPM) for all transcripts in both transcriptome assemblies and further examined only the transcripts with TPM values greater than
100 to minimize transcripts that are likely not to be biologically relevant in venom. Annotated transcripts with a match to known venom-protein families and with expression levels over 1,000
TPM were considered putative-venom genes.
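The final expression filter in this pipeline can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script used in this study: it assumes the standard Kallisto abundance.tsv column layout (target_id, length, eff_length, est_counts, tpm), and the function name, the set of toxin-matching transcript IDs, and the toy values are all hypothetical.

```python
import csv
import io


def filter_by_tpm(abundance_tsv, toxin_hits, min_tpm=1000.0):
    """Return {transcript_id: tpm} for toxin-matching transcripts above min_tpm.

    abundance_tsv: an open file (or file-like object) in Kallisto's
                   abundance.tsv layout.
    toxin_hits:    set of transcript IDs that matched a known toxin family
                   (assumed to come from the earlier BLAST step).
    """
    reader = csv.DictReader(abundance_tsv, delimiter="\t")
    kept = {}
    for row in reader:
        tpm = float(row["tpm"])
        if row["target_id"] in toxin_hits and tpm > min_tpm:
            kept[row["target_id"]] = tpm
    return kept


# Toy input; transcript IDs and values are invented for illustration.
toy = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t900\t750\t5000\t2500.0\n"
    "tx2\t800\t650\t300\t150.0\n"
    "tx3\t700\t550\t9000\t4000.0\n"
)
putative = filter_by_tpm(toy, toxin_hits={"tx1", "tx2"})
```

In the toy data, tx2 matches a toxin but falls below the 1,000 TPM cutoff, and tx3 is highly expressed but has no toxin match, so only tx1 is retained.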
Selection Tests
Positive selection has been identified as a driver of functional variation of duplicated genes in many gene families in many venom systems (Duda and Palumbi 1999; Vonk et al.
2013). Therefore in order to investigate the effects that positive selection has had on shaping the venom system of B. brevicauda, we conducted tests for positive selection in our putative venom genes using the Codeml program in PAML (Yang 2007). First, one-to-one orthologs for each putative toxin were downloaded from Ensembl (Frankish et al. 2017). Original alignments were
made using the MUSCLE (Edgar 2004) plug-in for Geneious v. 11.0.4
(https://www.geneious.com). These alignments were then manually evaluated for quality and edited if sequences were too short, had too many N’s, or were likely erroneous orthologues.
Sequences were then realigned using the Translation Align plug-in for Geneious and processed with trimAl with the -automated1 flag (Silla-Martínez et al. 2009) to remove spurious alignments and large uninformative gaps. Alignments were then converted into Phylip format with a custom python script so phylogenetic trees could be inferred using the GTR GAMMA nucleotide model with rapid bootstrapping methods (1,000 bootstraps) in RAxML (Stamatakis
2014). Branch-site models in the Codeml package in PAML (Yang 2007) were analyzed with the
Maximum Likelihood tree and final trimAl Phylip alignment to test for positive selection of putative venom genes. Briefly, the null model M8a, in which positive selection was prohibited (dN/dS fixed equal to 1) for 11 different site classes, was compared to the alternative model M8, in which positive selection was not restricted (dN/dS not fixed). Finally, we used likelihood ratio tests (2 × (lnL1 − lnL0), df = 1), where lnL1 and lnL0 are the log-likelihoods of the alternative and null models, respectively, to determine statistical significance between the two models.
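The likelihood ratio test described above can be sketched in a few lines of Python. This is an illustrative calculation only: the function name and the example log-likelihood values are hypothetical, and the p-value is computed from a chi-square distribution with one degree of freedom, following the df = 1 stated in the text. For one degree of freedom the upper-tail probability has the closed form erfc(sqrt(x / 2)), so no external library is needed.

```python
import math


def lrt_df1(lnl_null, lnl_alt):
    """Likelihood ratio test with 1 degree of freedom.

    Returns (statistic, p_value), where statistic = 2 * (lnL1 - lnL0).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    # The alternative model nests the null, so a negative statistic can
    # only arise from optimisation error; treat it as zero.
    stat = max(stat, 0.0)
    # Upper tail of a chi-square distribution with df = 1.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up log-likelihoods for a null and an alternative model.
statistic, p = lrt_df1(lnl_null=-1002.0, lnl_alt=-1000.0)
```

With these invented values the statistic is 4.0, which exceeds the conventional 3.84 critical value for df = 1, so the alternative model would be favoured at the 0.05 level.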
Phylogenetic Inference of KLK1 Gene Family
In order to infer the evolutionary history of KLK1 duplications within B. brevicauda, we created a phylogenetic tree containing genes from B. brevicauda and its sister species the common shrew (Sorex araneus, for which a reference genome is available) (NCBI Accession numbers: XM_012935295.1, XM_012935294.1, and XM_012935293.1). We used Eulipotyphlan
KLK1 genes from the star-nosed mole (Condylura cristata) and the hedgehog (Erinaceus europaeus) (NCBI Accession numbers XM_004694038.2 and XM_007531288.2 respectively),
as outgroups. Protein sequences for KLK1 genes for the common shrew, star-nosed mole and hedgehog were downloaded from NCBI. Alignments were made from these sequences using the
MUSCLE (Edgar 2004) plug-in for Geneious v. 11.0.4 (https://www.geneious.com), and trees were built using the GTR GAMMA nucleotide model with rapid bootstrapping method (1,000 bootstraps) in RAxML (Stamatakis 2014).
Protein Modelling and Docking
Due to the previous finding that substitutions in regulatory loops surrounding the catalytic cleft of BLTX facilitate its toxicity (Aminetzach et al. 2009), we modelled newly discovered KLK1 paralogous sequences to investigate their potential role as toxins. Input protein sequences were trimmed of their signal peptides (as identified by SignalP) and fed into the online homology-based protein folding server MUSTER (Wu and Zhang 2008). We visualized predicted protein models and examined the electrostatic potential of surface residues using the APBS electrostatics plug-in in PyMOL (Schrödinger, LLC 2015). We also modelled a novel Serine Protease Inhibitor that was found in our transcriptome using the same methods as for the KLK1s, with the intention of modelling protein-protein interactions between the inhibitor and BLTX. We modelled the potential interaction of BLTX and the Serine Protease Inhibitor using ClusPro by treating BLTX as the receptor and the double-headed inhibitor as the ligand
(Comeau et al. 2004). Results from this prediction were also visualized using PyMOL.
Results
Sequencing and Assembly Statistics
The 10X Genomics library from one individual shrew (WH4) produced a weighted mean molecule size of 20.32 kb that was sequenced on an Illumina HiSeq X and generated
360,640,993 150bp paired-end reads. The reads were then assembled with 31X effective coverage using Supernova 2.1.0™ and produced a final 1.66 Gb assembly that contained 166,552 scaffolds with a scaffold N50 of 338,775bp, of which 16,115 scaffolds were greater than 10 kb in length. Supernova estimated the total genome size from our sample to be 2.52 Gb.
We also generated 150bp paired-end RNA-seq data from the submaxillary glands of two individual adult shrews (WH4 and WH5). We generated 12,239,696 paired-end reads on an
Illumina HiSeq 4000 for the WH4 individual and 12,789,562 reads for the WH5 individual.
After quality control and filtering, we assembled resulting reads using Trinity with the default parameters. This produced assemblies containing 31,657 transcripts for the WH4 individual
(WH4-denovo) and 29,630 transcripts for the WH5 individual (WH5-denovo).
Along with the short-read RNA-seq data, we also generated full-length cDNA sequences for one shrew (WH5) utilizing Nanopore’s MinION benchtop sequencer. After library preparation and ~10 hours of data acquisition, we generated 972,946 reads, which were reduced to 513,552 high-quality reads after quality filtering. These reads were then processed with the short-read RNA-seq data from WH5 (WH5-longread) in Trinity using the --long_reads parameter to produce a final hybrid short-read/long-read assembly that contained 29,808 total transcripts.
Identification of Putative Venom Genes
We predicted 17,684 ORFs for the WH4-denovo assembly, 15,727 ORFs for the WH5-denovo assembly, and 15,754 ORFs for the WH5-longread assembly using TransDecoder. After filtering all ORFs for peptide sequences containing a signal peptide, we found 780, 759, and 762 transcripts with signal peptides in the WH4-denovo, WH5-denovo, and WH5-longread assemblies, respectively. From this list of transcripts with signal peptides, our BLAST searches against the Tox-Prot database revealed 59 sequences with homology to known toxins in the WH4-denovo assembly, 58 sequences in the WH5-denovo assembly, and 57 sequences in the WH5-longread assembly. The results from this portion of the pipeline are summarized in Table 1. Our comparison of these putative-toxin transcripts among the three assemblies following annotation showed that the transcripts overlapped. Only nine of these transcripts across all three assemblies were expressed at a high level, with TPM values greater than 1,000. These nine sequences consisted of BLTX, Blarinasin, proenkephalin, KLK1, a truncated KLK1 paralog, a double-headed protease inhibitor, oxytocin/neurophysin, endothelin-1, and a phospholipase A2 (PLA2). Most of the other highly expressed transcripts that were filtered out by our toxin-search pipeline belonged to known salivary genes (e.g., mucin and amylase).
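The three filtering criteria described in this section (a predicted signal peptide, homology to a Tox-Prot entry, and a TPM value greater than 1,000) can be illustrated with a short Python sketch. This is not the pipeline used in this study: the `Transcript` record, its field names, the function `putative_toxins`, and the toy entries are hypothetical and serve only to show how the criteria combine.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical record summarizing one assembled transcript."""
    name: str
    has_signal_peptide: bool  # assumed output of a signal-peptide predictor
    toxprot_hit: bool         # assumed result of a BLAST search against Tox-Prot
    tpm: float                # expression level in transcripts per million


def putative_toxins(transcripts, min_tpm=1000.0):
    """Return names of transcripts that pass all three filters."""
    return [
        t.name
        for t in transcripts
        if t.has_signal_peptide and t.toxprot_hit and t.tpm > min_tpm
    ]


# Toy data (invented for illustration, not values from this study).
example = [
    Transcript("toy_secreted_toxin_like", True, True, 2500.0),
    Transcript("toy_low_expression", True, True, 40.0),
    Transcript("toy_no_signal_peptide", False, True, 5000.0),
    Transcript("toy_salivary_non_toxin", True, False, 9000.0),
]

print(putative_toxins(example))
```

Only the first toy transcript passes, because each of the others fails exactly one criterion. This mirrors how highly expressed salivary genes without toxin homology drop out of the candidate list.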
Further examination of the genomic region containing the known BLTX toxin and its paralog Blarinasin using both the transcriptome and genome assemblies revealed other KLK1 serine-protease paralogs that have not previously been identified in B. brevicauda (Figure 1). We found two additional paralogous KLK1 genes showing homology to BLTX; however, one of them appeared to be missing the fifth exon of the gene (hereafter referred to as the ‘KLK1 paralog’ and the ‘truncated-KLK1 paralog,’ respectively). Both of these additional paralogous KLK1 genes were expressed at a relatively high abundance in the submaxillary gland. We also discovered a third additional KLK1 paralog in our reference genome that was not expressed in the submaxillary gland, although it did contain a translatable open-reading frame (hereafter referred to as the ‘non-expressed KLK1 paralog’). According to our reference genome, all paralogs were oriented in a tandem array (Figure 2). We also surmised that Blarinasin-1 and Blarinasin-2, which were treated as distinct paralogs in previous investigations (Aminetzach et al. 2009), are likely the same gene.
Selection Tests
We investigated patterns of selection on seven of the nine putative-toxin genes and found that only three of these showed evidence of positive selection using the branch-site tests (Table 2). These three genes included BLTX (2 sites: 187 V, 207 H) (Figure 3), the non-expressed KLK1 paralog (1 site: 186 D), and PLA2 (1 site: 75 G). We did not perform selection tests on the double-headed protease inhibitor and the truncated-KLK1 paralog because we were unable to find orthologous sequences for either of these genes.
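Branch-site tests are commonly evaluated with a likelihood-ratio test that compares a null model (no positive selection on the foreground branch) against an alternative model that allows it. The Python sketch below shows that calculation under stated assumptions: the log-likelihood values are invented for illustration, the function name `branch_site_lrt` is hypothetical, and the use of a chi-square distribution with one degree of freedom is a common convention rather than a statement of the exact settings used in this study.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models differing by one parameter.

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its p-value
    under a chi-square distribution with 1 degree of freedom (assumed).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Invented log-likelihoods, for illustration only.
stat, p_value = branch_site_lrt(lnl_null=-1000.0, lnl_alt=-998.08)
print(f"2*delta(lnL) = {stat:.2f}, p = {p_value:.3f}")
```

A statistic near 3.84 corresponds to a p-value near 0.05 under this convention. Site-level inferences such as the positions listed above come from a separate posterior analysis and are not reproduced by this sketch.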
Mass Spectrometry of Saliva
LC/MS analysis of the saliva from two adult B. brevicauda individuals yielded 127 proteins with at least 2 unique identifying spectra for the WH4 individual and 97 proteins for the WH5 individual. Of these identified proteins, 74 were shared between the two saliva samples. The results for putative venom proteins are summarized in Figure 4. Interestingly, seven of the ten most abundant proteins were shared between the two individuals. These seven included BLTX, Blarinasin, the truncated-KLK1 paralog, and PLA2. BLTX appeared to be the main component of the venom, as it comprised 11.34% of the abundance of all salivary proteins in WH4 and 7.57% in WH5. BLTX, Blarinasin, and the truncated-KLK1 paralog together accounted for 23.3% of all proteins identified in the WH4 sample and 19.16% of all proteins found in the WH5 sample. This suggests that around 20-25% of the salivary proteins in B. brevicauda are produced from the same set of paralogous genes. PLA2, in addition to being highly expressed in the submaxillary gland, was the third most abundant protein in WH4, constituting 8.98% of all proteins, and the fourth most abundant protein in WH5, comprising 6.63% of the sample. Proenkephalin, although expressed at a high level in the submaxillary gland, made up only 1.77% of all proteins in WH4 and 2.62% in WH5.
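The bookkeeping behind these summaries (keeping proteins with at least 2 unique spectra, finding proteins shared between samples, and expressing abundance as a percentage of the total) can be sketched in Python as follows. The function names, protein labels, and numbers are hypothetical toy values, and the abundance index is left generic because the sketch does not assume which quantification measure was used.

```python
def confident_proteins(unique_spectra, min_unique=2):
    """Keep proteins identified by at least `min_unique` unique spectra."""
    return {name for name, n in unique_spectra.items() if n >= min_unique}


def percent_abundance(abundance):
    """Convert per-protein abundance values to percentages of the total."""
    total = sum(abundance.values())
    return {name: 100.0 * value / total for name, value in abundance.items()}


# Toy spectral counts for two saliva samples (invented for illustration).
sample_a = {"protA": 5, "protB": 2, "protC": 1}
sample_b = {"protA": 3, "protC": 4, "protD": 2}

# Proteins confidently identified in both samples.
shared = confident_proteins(sample_a) & confident_proteins(sample_b)
print(sorted(shared))

# Relative abundance from a generic abundance index (toy values).
print(percent_abundance({"protA": 3.0, "protB": 1.0}))
```

In this toy case only `protA` is shared, because `protC` falls below the two-spectrum threshold in the first sample.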
We found no evidence of oxytocin/neurophysin in the salivary proteome of either shrew we sampled, despite this gene being expressed at a relatively high level in the submaxillary gland. We therefore conclude that this protein is likely not an important component of B. brevicauda venom. Additionally, despite being the most highly expressed transcript in the submaxillary gland, the double-headed serine protease inhibitor was only detected in the saliva of the WH4 individual and comprised only 0.55% of the total salivary proteome. Similarly, endothelin-1, despite being highly expressed in the submaxillary gland, was found at very low levels in the saliva of both shrews and constituted only 0.72% of all proteins in WH4 and 0.92% in WH5.
Phylogenetic Inference of KLK1
Our phylogenetic tree of the KLK1 gene family revealed two ancient duplications of the KLK1 gene prior to the divergence between B. brevicauda and the common shrew, S. araneus (Figure 5). This tree also showed that the KLK1 genes from the hedgehog and star-nosed mole were, as expected, outgroups to the shrew KLK1 genes. Low support values for the nodes leading to the divergence among BLTX, Blarinasin, and the S. araneus ortholog (Common Shrew KLK1-1) indicate that the order in which these genes split from each other is not clear.
3D Protein Modelling of KLK1 Paralogs
The three-dimensional protein structure and predicted electrostatic potential of surface residues for BLTX and Blarinasin revealed patterns similar to those found by Aminetzach et al. (2009), wherein BLTX had a negative catalytic pocket surrounded by generally positive regulatory loops (Figure 6.1) and Blarinasin, in contrast, had both a negative pocket and generally negative surrounding regulatory loops (Figure 6.2). The KLK1 paralog had a similar 3D structure and electrostatic charge to Blarinasin (Figure 6.4) and to orthologous genes from other species. The protein model for the non-expressed KLK1 paralog revealed that it had accumulated at least a few positive residues in the regions surrounding the active site, but lacked the canonical negatively charged pocket indicative of all KLK1s (Figure 6.3). We did not model the protein structure for the truncated (four-exon) KLK1 paralog because the lack of a fifth exon could cause misfolding in the predicted protein structure.
The presence of a highly expressed double-headed serine-protease inhibitor in the submaxillary gland and its potential to impact the function of BLTX (which is a serine protease) prompted us to investigate protein-protein interactions between these two proteins using ClusPro. When treating BLTX as the receptor and the inhibitor as the ligand, ClusPro predicted docking of the protease inhibitor in the active site of BLTX (Figure 7). Specifically, the protease binding loop of the inhibitor bound to BLTX with the reactive P1-P1’ site of the inhibitor positioned directly in the active site of BLTX. This configuration has been found to be the general mode of inhibition for many serine protease inhibitors, suggesting the possibility that this inhibitor interacts with BLTX in an inhibitory manner only inside the submaxillary gland, since it is scarcely detected in the saliva.
Discussion
In this study, we found the toxin composition of B. brevicauda venom to be relatively simple. Using a multi-omic approach with a de novo reference genome, a transcriptome of the submaxillary gland, and mass-spectrometry-based proteomics of saliva, we found that the main venom components in the saliva are likely limited to BLTX, PLA2, and proenkephalin. Two of these, BLTX and proenkephalin (particularly its shorter peptide named Soricidin), have been previously identified and shown to be functionally toxic to mice and mealworms, respectively (Kita et al. 2004; Patent: US8003754B2). We also discovered additional KLK1 paralogs in tandem array with BLTX that are also found in the saliva, but do not appear to be toxic like BLTX. Finally, we found two proteins in the submaxillary gland, but not in the saliva, that may act as endogenous self-defense mechanisms against BLTX.
Putative Venoms
Consistent with previous findings (Kita et al. 2004; Aminetzach et al. 2009), BLTX was found to be undergoing positive selection, was expressed at high levels in the submaxillary gland, and was consequently a major constituent of salivary proteins. Previously, this protein has been found to be lethal to mice by decreasing blood pressure through the cleavage of kininogens to release bradykinin (Kita et al. 2004). A previous study provides evidence suggesting that positive selection acting on lineage-specific insertions in and around the regulatory loops surrounding the catalytic pocket of BLTX has led to increased substrate specificity for this protein relative to its KLK1 derivative (Aminetzach et al. 2009). We were not able to test for selection in these insertions because they are specific to B. brevicauda and thus were trimmed from our alignment, which included multiple other mammal species. Aminetzach et al. (2009) found no evidence of positive selection acting at a gene-wide level, whereas we found evidence of positive selection at a gene-wide level at two sites (207 H and 187 V), possibly because our analyses had more power to detect selection with a larger alignment that included other divergent taxa (Fletcher and Yang 2010). Interestingly, one of these sites, 187 V, falls near a region surrounding the catalytic pocket of BLTX. However, since valine is a non-polar, uncharged amino acid, it is likely that this substitution does not contribute to the effects reported in Aminetzach et al. (2009).
As with previous work, we also found the previously characterized KLK1 paralog Blarinasin in B. brevicauda (Kita et al. 2005). Interestingly, this previous study found that despite having specificity for similar substrates as BLTX, Blarinasin was not lethal when injected into mice. This may be explained by the fact that Blarinasin lacks the positive amino acid substitutions in the regulatory loops surrounding its active site that were found in BLTX (Aminetzach et al. 2009). We also found in our transcriptome assembly a truncated-KLK1 paralog that was present in the saliva of both shrews, and another KLK1 paralog, which was not present in the saliva of either shrew. Additionally, the genomic region encompassing all of these paralogs was bordered by another KLK1 gene that was not expressed in the submaxillary gland despite containing a translatable open-reading frame. Given that these KLK1 paralogs are arranged in a tandem array in the genome, it is possible that at least some of them are subject to the same means of regulation (e.g., promoter/enhancer, post-transcriptional, post-translational). Interestingly, we noticed that these KLK1 paralogs are expressed at different levels in both the transcriptome and proteome, perhaps suggesting that they are modulated through different regulatory mechanisms.
Our models of the three-dimensional structure and predicted electrostatic potential of the two other modelled paralogs (the KLK1 paralog and the unexpressed KLK1 paralog) showed that neither of them possesses the positive amino-acid substitutions in the regulatory loops surrounding the active site that were reported for BLTX (Aminetzach et al. 2009). In fact, one of these sequences exhibited both a negatively charged catalytic pocket and negatively charged surrounding regulatory regions, leading us to believe that this is the gene with canonical KLK1 activity in these shrews. The unexpressed KLK1 paralog found in our genome, on the other hand, had a relatively neutral catalytic pocket and regulatory loops, suggesting that it may have lost its function as a kallikrein and/or gained a new function entirely. This result is interesting, and perhaps even expected, as this paralog was also found to be undergoing positive selection in our selection analysis.
Genomic duplication events of specific regulatory genes have been shown to result in the evolution of venom across many evolutionarily disparate venomous organisms (Gibbs and Rossiter 2008; Vonk et al. 2013). It appears from our phylogenetic results for the KLK1 gene family that there were two old duplication events of KLK1 paralogs that preceded the duplication event for BLTX. These two earlier duplication events appear to have occurred before the split between the Soricini and Blarinini lineages, as both Sorex araneus and B. brevicauda possess these two earlier duplications along with the original KLK1 gene. The duplication event for BLTX appears to have occurred more recently within the Blarinini lineage, as Sorex araneus does not possess a closely related paralog. Additionally, another duplication event occurred specific to the Blarinini lineage and led to a fifth KLK1 paralog specific to B. brevicauda. However, none of these duplicated paralogs were found in hedgehog or star-nosed mole, suggesting that duplication of KLK1 occurred within the Soricidae.
The proenkephalin gene that we discovered matched Soricidin from the Tox-Prot database and was found to be highly expressed in the submaxillary gland, but only present at moderate levels as a protein in the saliva. Proenkephalin is a precursor that undergoes post-translational cleavage resulting in multiple enkephalins, which are short peptides involved in opioid receptor signaling (Henry et al. 2017). Proenkephalins have been implicated as toxic components in both scorpion and fangblenny venom, where they exhibit hypotensive activity (Zhang et al. 2012; Casewell et al. 2017). Soricidin is a small peptide that was previously isolated from B. brevicauda with efficacy in certain cancer treatments (Bowen et al. 2013; Zhang et al. 2014). Specifically, this small peptide has been shown to have high affinity for TRPV6 calcium ion channels, and is capable of inhibiting the movement of calcium across the cellular membrane (an important physiological process in proliferating cancer cells) (Bowen et al. 2013). Soricidin was also found to be highly effective at immobilizing mealworms (Patent: US8003754B2), and thus it is likely that this toxin constituent is involved in B. brevicauda’s ability to immobilize and cache invertebrate prey for long periods of time.
The PLA2 gene found in our study had not previously been identified as a possible toxin in B. brevicauda’s venom. Although there are no functional assays on the toxicity of this gene, it possesses multiple characteristics suggesting that it has an important function in shrew saliva: it is highly expressed in the submaxillary gland, produced at high levels as a salivary protein, and undergoing rapid evolution. Phospholipase A2 enzymes (PLA2s) have been convergently recruited into venom arsenals across the animal kingdom and, depending on the specific venom system, can have drastically different pharmacological effects, including neurotoxic, myotoxic, inflammatory, and haemolytic activities (Fry and Wüster 2004; Kordiš 2011). PLA2 has the general property of hydrolyzing phospholipids, and thus plays an important role in lipid metabolism (Arni and Ward 1996). PLA2 paralogs and isoforms can have highly variable function and levels of toxicity even within the same venomous species/organism (Harris and Scott-Davey 2013). Interestingly, recent proteomic work on the saliva of Neomys fodiens, an unrelated venomous-shrew species, revealed a protein with homology to PLA2, which was speculated to contribute to the paralytic effects observed in N. fodiens venom (Kowalski et al. 2017). The extreme variability in PLA2 function across venom systems makes it difficult to suggest an exact function for PLA2 within the B. brevicauda venom system, but nonetheless it is a likely candidate venom gene. Further proteomic isolation and functional assays are needed to elucidate the functional role of this protein within this venom system.
Putative Endogenous Venom Defense?
Many venomous snakes are known to have inhibitory proteins in their circulatory systems that can act as direct antagonists to their own venom components (Mackessey 2010; Santos-Filho and Santos 2017). We believe this may be the case for the double-headed protease inhibitor found in B. brevicauda’s submaxillary gland. This inhibitor was one of the most highly expressed transcripts in B. brevicauda’s submaxillary gland, but was present as a protein only at extremely low levels in the saliva and was found in just one of the two shrew samples. This gene contains two Kazal-type serine protease inhibitor domains in tandem. Kazal-type serine protease inhibitors can contain multiple Kazal domains with a variable number of amino acids making up each domain. This variability is thought to confer different specificities for their target serine proteases (Rimphanitchayakit and Tassanakajon 2010). Kazal-type serine protease inhibitors have been found to have a role in the venom of the eyelash and side-striped palm vipers (Bothriechis schlegelii and Bothriechis lateralis); however, they are relatively rare venom constituents (Durban et al. 2011). We have shown with models of protein-protein interaction a high potential for the double-headed protease inhibitor to dock with BLTX directly in the active site of BLTX. Previous studies of Kazal-type protease inhibitor domains have similarly shown protease binding loops binding directly to the active sites of the proteases they inhibit (Krowarsch et al. 2003; Rimphanitchayakit and Tassanakajon 2010). Since the main component of B. brevicauda venom is a serine protease, BLTX, and we see little evidence for inhibitor proteins in the saliva, we surmise that this inhibitor is a self-defense mechanism against BLTX within the submaxillary gland, where it is highly expressed.
Another potential inhibitor in B. brevicauda that is highly expressed in the submaxillary gland, but was not abundantly present as a salivary protein, is endothelin-1. This gene is homologous and structurally similar to the snake venom Sarafotoxin, which is highly lethal to small vertebrates (Kochva et al. 1993) because it causes dramatic increases in blood pressure due to extreme vasoconstriction (Wollberg et al. 1989; Mackessey 2010). However, we suspect the high expression of endothelin within the submaxillary gland in B. brevicauda may be causing vasoconstriction as a means to ameliorate the vasodilation effects caused by the toxic BLTX. This would be consistent with previous work that has shown B. brevicauda to have a higher tolerance to its own venom than mice when injected with extracts from their own submaxillary gland (Pearson 1950).
Venom simplicity in relation to diet complexity
One of the lingering questions about the evolution of venom in B. brevicauda, and other venomous shrew species, is identifying the selective pressures that led to the production of venom. Diet complexity has been shown to correspond with greater complexity in venom composition in many organisms, including cone snails, snakes, and spiders (Daltry et al. 1996; Phuong et al. 2016; Pekár et al. 2018). However, despite their feeding on a wide range of prey from different animal phyla, including Arthropoda, Annelida, Mollusca, and Chordata, we find that there are likely just a few toxic proteins in B. brevicauda saliva (Hamilton 1941; George et al. 1986). Most non-venomous shrews and other Eulipotyphlans have as broad a diet as B. brevicauda and other venomous shrews (Hamilton 1930; Vander Wall 1990; Churchfield and Sheftel 1994). This order of mammals relies heavily on an active foraging strategy, specialized dentition, and masticatory systems for capturing and consuming prey (Dufton 1992; Furió et al. 2010; Folinsbee 2013). Thus, our finding of a simple venom in B. brevicauda suggests that the evolution of venom does not represent a major transition in foraging strategy in this taxonomic group, but that venom might be a very specialized accessory to their greater prey-capture strategy.
Other hypotheses that have been proposed to explain why venom evolved in B. brevicauda include the “hunting big” and the “hoarding small” hypotheses (Dufton 1992; Rode-Margono and Nekaris 2015). The “hunting big” hypothesis proposes that venom evolved in B. brevicauda to help facilitate the capture of small vertebrate prey. Certainly, the effects of the BLTX toxin could be a contributing factor in helping this shrew accomplish this. On the other hand, the “hoarding small” hypothesis proposes that venom has evolved to aid in the long-term caching of invertebrate prey, a behavior observed in B. brevicauda, as well as in several non-venomous shrews (Hamilton 1930; Hamilton 1941; Robinson and Brodie 1982; Martin 1984). The functional effects of the Soricidin toxin certainly support this hypothesis. PLA2 toxins are found in a wide range of venomous animals that specialize on either vertebrate or invertebrate prey, and thus without further functional assays it is not clear whether this putative toxin helps B. brevicauda capture vertebrate or invertebrate prey. To better understand the merit of either or both hypotheses, it would be helpful to have functional studies for all toxin types on both vertebrate and invertebrate prey. For instance, BLTX has only been functionally tested on mice, whereas Soricidin has only been functionally tested on mealworms. Furthermore, it would also be helpful to have more information on how much vertebrate and invertebrate prey contribute to the total caloric intake in B. brevicauda’s diet, as well as in the diets of other venomous and non-venomous species. A recent study on Neomys fodiens, another venomous shrew, suggests that N. fodiens caches significantly fewer invertebrate prey than S. araneus, a non-venomous shrew (Kowalski and Rychlik 2018). However, N. fodiens were able to overpower and cache larger prey more quickly than S. araneus, possibly suggesting that venom allows for a dietary expansion towards larger prey. Further dietary analyses on B. brevicauda would be vital in determining if functional convergence in prey type is evident between these shrews.
We cannot rule out the possibility that B. brevicauda seasonally modulates the toxin composition of its venom to reflect prey availability. Venom composition has been shown to vary seasonally in multiple different venomous animals (Gubenšek et al. 1974; Cologna et al. 2018). Shrews are active in the winter, and many species hoard prey to help them survive through the winter months due to their large metabolic demands and reduced prey availability (Vander Wall 1990). Blarina brevicauda has been observed to change its hoarding preference at different times of the year. For instance, Martin (1984) found B. brevicauda storing seeds in October, then hoarding insects in November, and then caching mice by the end of November. Throughout most of the geographic range of B. brevicauda, the climate in the winter is extremely cold and invertebrate prey are not as readily available. Therefore, it is possible that the composition of B. brevicauda’s venom changes to accommodate the availability of different types of prey. We conducted our study in the summer and thus may have missed changes in venom components or changes in their abundance.
Conclusion
Our research represents one of the first multi-omic comprehensive characterizations of a venom system. The components of venom in B. brevicauda appear to be relatively simple. Venom has evolved in at least two other Eulipotyphlan genera, and possibly several more given multiple anecdotal accounts of other shrew species paralyzing their prey (Dufton 1992; Folinsbee 2013). We do not know the molecular bases of venom in these other venomous shrews or whether the functional convergence evolved from similar proteins. Herein, we provide a comprehensive investigation of the venom system in short-tailed shrews that furthers our understanding of how the evolution of toxicity for prey capture has arisen in mammals. These findings will be useful for future comparative studies with other venomous and non-venomous Eulipotyphlans.
Literature Cited
Aminetzach YT, Srouji JR, Kong CY, Hoekstra HE. 2009. Convergent Evolution of Novel Protein Function in Shrew and Lizard Venom. Curr. Biol. [Internet] 19:1925–1931. Available from: http://dx.doi.org/10.1016/j.cub.2009.09.022
Arni RK, Ward RJ. 1996. Phospholipase A2—a structural review. Toxicon [Internet] 34:827–841. Available from: http://www.sciencedirect.com/science/article/pii/0041010196000360
Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics [Internet] 30:2114–2120. Available from: http://dx.doi.org/10.1093/bioinformatics/btu170
Bowen C V., DeBay D, Ewart HS, Gallant P, Gormley S, Ilenchuk TT, Iqbal U, Lutes T, Martina M, Mealing G, et al. 2013. In Vivo Detection of Human TRPV6-Rich Tumors with Anti-Cancer Peptides Derived from Soricidin. PLoS One 8:1–11.
Bray NL, Pimentel H, Melsted P, Pachter L. 2016. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. [Internet] 34:525. Available from: https://doi.org/10.1038/nbt.3519
Bushnell B. 2014. BBMap: A Fast, Accurate, Splice-Aware Aligner. In: United States. Available from: https://www.osti.gov/servlets/purl/1241166
Casewell NR, Visser JC, Baumann K, Dobson J, Han H, Kuruppu S, Morgan M, Romilio A, Weisbecker V, Mardon K, et al. 2017. The Evolution of Fangs, Venom, and Mimicry Systems in Blenny Fishes. Curr. Biol. [Internet] 27:1184–1191. Available from: http://www.sciencedirect.com/science/article/pii/S0960982217302695
Casewell NR, Wu W, Vonk FJ, Harrison RA, Fry BG. 2013. Complex cocktails: the evolutionary novelty of venoms. Trends Ecol. Evol. 28:219–229.
Churchfield S, Sheftel BI. 1994. Food niche overlap and ecological separation in a multi-species community of shrews in the Siberian taiga. J. Zool. 234:105–124.
Cologna CT, Rodrigues RS, Santos J, de Pauw E, Arantes EC, Quinton L. 2018. Peptidomic investigation of Neoponera villosa venom by high-resolution mass spectrometry: seasonal and nesting habitat variations. J. Venom. Anim. Toxins Incl. Trop. Dis. [Internet] 24:6. Available from: https://doi.org/10.1186/s40409-018-0141-3
Comeau SR, Gatchell DW, Vajda S, Camacho CJ. 2004. ClusPro: an automated docking and discrimination method for the prediction of protein complexes. Bioinformatics [Internet] 20:45–50. Available from: http://dx.doi.org/10.1093/bioinformatics/btg371
Daltry JC, Wüster W, Thorpe RS. 1996. Diet and snake venom evolution. Nature [Internet] 379:537–540. Available from: https://doi.org/10.1038/379537a0
Duda TF, Palumbi SR. 1999. Molecular genetics of ecological diversification: Duplication and rapid evolution of toxin genes of the venomous gastropod Conus. Proc. Natl. Acad. Sci. [Internet] 96:6820–6823. Available from: http://www.pnas.org/content/96/12/6820.abstract
Dufton MJ. 1992. Venomous mammals. Pharmacol. Ther. [Internet] 53:199–215. Available from: http://www.sciencedirect.com/science/article/pii/016372589290009O
Durban J, Juárez P, Angulo Y, Lomonte B, Flores-Diaz M, Alape-Girón A, Sasa M, Sanz L, Gutiérrez JM, Dopazo J, et al. 2011. Profiling the venom gland transcriptomes of Costa Rican snakes by 454 pyrosequencing. BMC Genomics [Internet] 12:259. Available from: https://www.ncbi.nlm.nih.gov/pubmed/21605378
Eadie WR. 1952. Shrew Predation and Vole Populations on a Localized Area. J. Mammal. [Internet] 33:185–189. Available from: http://www.jstor.org/stable/1375927
Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. [Internet] 32:1792–1797. Available from: https://dx.doi.org/10.1093/nar/gkh340
Fletcher W, Yang Z. 2010. The Effect of Insertions, Deletions, and Alignment Errors on the Branch-Site Test of Positive Selection. Mol. Biol. Evol. [Internet] 27:2257–2267. Available from: https://doi.org/10.1093/molbev/msq115
Folinsbee KE. 2013. Evolution of venom across extant and extinct eulipotyphlans. Comptes Rendus Palevol [Internet] 12:531–542. Available from: http://www.sciencedirect.com/science/article/pii/S1631068313000717
Frankish A, Vullo A, Zadissa A, Yates A, Thormann A, Parker A, Gall A, Moore B, Walts B, Aken BL, et al. 2017. Ensembl 2018. Nucleic Acids Res. [Internet] 46:D754–D761. Available from: https://doi.org/10.1093/nar/gkx1098
Fry BG, Roelants K, Champagne DE, Scheib H, Tyndall JDA, King GF, Nevalainen TJ, Norman JA, Lewis RJ, Norton RS, et al. 2009. The Toxicogenomic Multiverse: Convergent Recruitment of Proteins Into Animal Venoms. Annu. Rev. Genomics Hum. Genet. [Internet] 10:483–511. Available from: https://doi.org/10.1146/annurev.genom.9.081307.164356
Fry BG, Wüster W. 2004. Assembling an Arsenal: Origin and Evolution of the Snake Venom Proteome Inferred from Phylogenetic Analysis of Toxin Sequences. Mol. Biol. Evol. [Internet] 21:870–883. Available from: https://dx.doi.org/10.1093/molbev/msh091
Furió M, Agustí J, Mouskhelishvili A, Sanisidro Ó, Santos-Cubedo A. 2010. The paleobiology of the extinct venomous shrew Beremendia (Soricidae, Insectivora, Mammalia) in relation to the geology and paleoenvironment of Dmanisi (Early Pleistocene, Georgia). J. Vertebr. Paleontol. [Internet] 30:928–942. Available from: https://doi.org/10.1080/02724631003762930
George SB, Choate JR, Genoways HH. 1986. Blarina brevicauda. Mamm. Species [Internet]:1–9. Available from: http://www.jstor.org/stable/3504010?seq=1#page_scan_tab_contents
Gibbs HL, Rossiter W. 2008. Rapid Evolution by Positive Selection and Gene Gain and Loss: PLA2 Venom Genes in Closely Related Sistrurus Rattlesnakes with Divergent Diets. J. Mol. Evol. [Internet] 66:151–166. Available from: https://doi.org/10.1007/s00239-008-9067-7
Gubenšek F, Sket D, Turk V, Lebez D. 1974. Fractionation of Vipera ammodytes venom and seasonal variation of its composition. Toxicon [Internet] 12:167–168. Available from: http://www.sciencedirect.com/science/article/pii/0041010174902414
Hamilton WJ. 1930. The Food of the Soricidae. J. Mammal. [Internet] 11:26–39. Available from: http://www.jstor.org/stable/1373782
Hamilton WJ. 1941. The Food of Small Forest Mammals in Eastern United States. J. Mammal. [Internet] 22:250–263. Available from: http://www.jstor.org/stable/1374950
Hargreaves AD, Mulley JF. 2015. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing. PeerJ [Internet] 3:e1441. Available from: https://doi.org/10.7717/peerj.1441
Harris JB, Scott-Davey T. 2013. Secreted phospholipases A2 of snake venoms: effects on the peripheral neuromuscular system with comments on the role of phospholipases A2 in disorders of the CNS and their uses in industry. Toxins (Basel). [Internet] 5:2533–2571. Available from: https://www.ncbi.nlm.nih.gov/pubmed/24351716
Henry MS, Gendron L, Tremblay ME, Drolet G. 2017. Enkephalins: Endogenous Analgesics with an Emerging Role in Stress Resilience. Neural Plast. 2017.
Ishihama Y, Oda Y, Tabata T, Sato T, Nagasu T, Rappsilber J, Mann M. 2005. Exponentially Modified Protein Abundance Index (emPAI) for Estimation of Absolute Protein Amount in Proteomics by the Number of Sequenced Peptides per Protein. Mol. & Cell. Proteomics [Internet] 4:1265–1272. Available from: http://www.mcponline.org/content/4/9/1265.abstract
Kita M, Nakamura Y, Okumura Y, Ohdachi SD, Oba Y, Yoshikuni M, Kido H, Uemura D. 2004. Blarina toxin, a mammalian lethal venom from the short-tailed shrew Blarina brevicauda: Isolation and characterization. Proc. Natl. Acad. Sci. U. S. A. [Internet] 101:7542–7547. Available from: http://www.pnas.org/content/101/20/7542.abstract
Kita M, Okumura Y, Ohdachi SD, Oba Y, Yoshikuni M, Nakamura Y, Kido H, Uemura D. 2005. Purification and characterisation of blarinasin, a new tissue kallikrein-like protease from the short-tailed shrew Blarina brevicauda: Comparative studies with blarina toxin. Biol. Chem. 386:177–182.
Kochva E, Bdolah A, Wollberg Z. 1993. Sarafotoxins and endothelins: evolution, structure and function. Toxicon [Internet] 31:541–568. Available from: http://www.sciencedirect.com/science/article/pii/004101019390111U
Kordiš D. 2011. Evolution of Phospholipase A2 Toxins in Venomous Animals.
Kowalski K, Marciniak P, Rosiński G, Rychlik L. 2017. Evaluation of the physiological activity of venom from the Eurasian water shrew Neomys fodiens. Front. Zool. 14.
Kowalski K, Rychlik L. 2018. The role of venom in the hunting and hoarding of prey differing in body size by the Eurasian water shrew, Neomys fodiens. J. Mammal. 99:351–362.
Krowarsch D, Cierpicki T, Jelen F, Otlewski J. 2003. Canonical protein inhibitors of serine proteases. Cell. Mol. Life Sci. 60:2427–2444.
Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods [Internet] 9:357. Available from: http://dx.doi.org/10.1038/nmeth.1923
Ligabue-Braun R. 2015. Evolution of Venomous Animals and Their Toxins. Available from: http://link.springer.com/10.1007/978-94-007-6727-0
Mackessy SP. 2010. Handbook of Venoms and Toxins of Reptiles. Boca Raton: CRC Press.
Magoč T, Salzberg SL. 2011. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics [Internet] 27:2957–2963. Available from: http://dx.doi.org/10.1093/bioinformatics/btr507
Martin IG. 1981. Venom of the Short-Tailed Shrew (Blarina brevicauda) as an Insect Immobilizing Agent. J. Mammal. [Internet] 62:189–192. Available from: http://www.jstor.org/stable/1380494
Martin IG. 1984. Factors affecting food hoarding in the short-tailed shrew Blarina brevicauda. Mammalia 48:65–72.
Pearson OP. 1942. On the Cause and Nature of a Poisonous Action Produced by the Bite of a Shrew (Blarina brevicauda). J. Mammal. [Internet] 23:159–166. Available from: http://www.jstor.org/stable/1375068
Pearson OP. 1950. The submaxillary glands of shrews. Anat. Rec. [Internet] 107:161–169. Available from: https://doi.org/10.1002/ar.1091070206
Pekár S, Bočánek O, Michálek O, Petráková L, Haddad CR, Šedo O, Zdráhal Z. 2018. Venom gland size and venom complexity—essential trophic adaptations of venomous predators: A case study using spiders. Mol. Ecol. [Internet] 27:4257–4269. Available from: https://doi.org/10.1111/mec.14859
Petersen TN, Brunak S, von Heijne G, Nielsen H. 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods [Internet] 8:785. Available from: http://dx.doi.org/10.1038/nmeth.1701
Phuong MA, Mahardika GN, Alfaro ME. 2016. Dietary breadth is positively correlated with venom complexity in cone snails. BMC Genomics [Internet] 17:1–15. Available from: http://dx.doi.org/10.1186/s12864-016-2755-6
Rimphanitchayakit V, Tassanakajon A. 2010. Structure and function of invertebrate Kazal-type serine proteinase inhibitors. Dev. Comp. Immunol. [Internet] 34:377–386. Available from: http://www.sciencedirect.com/science/article/pii/S0145305X09002560
Robinson DE, Brodie ED. 1982. Food Hoarding Behavior in the Short-tailed Shrew Blarina brevicauda. Am. Midl. Nat. [Internet] 108:369–375. Available from: http://www.jstor.org/stable/2425498
Rode-Margono EJ, Nekaris AK. 2015. Cabinet of Curiosities: Venom Systems and Their Ecological Function in Mammals, with a Focus on Primates. Toxins 7.
Santos-Filho NA, Santos CT. 2017. Alpha-type phospholipase A(2) inhibitors from snake blood. J. Venom. Anim. Toxins Incl. Trop. Dis. [Internet] 23:19. Available from: https://www.ncbi.nlm.nih.gov/pubmed/28344595
Schrödinger, LLC. 2015. The PyMOL Molecular Graphics System, Version 1.8.
Silla-Martínez JM, Capella-Gutiérrez S, Gabaldón T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics [Internet] 25:1972–1973. Available from: https://dx.doi.org/10.1093/bioinformatics/btp348
Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics [Internet] 30:1312–1313. Available from: https://dx.doi.org/10.1093/bioinformatics/btu033
Vonk FJ, Casewell NR, Henkel CV, Heimberg AM, Jansen HJ, McCleary RJR, Kerkkamp HME, Vos RA, Guerreiro I, Calvete JJ, et al. 2013. The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system. Proc. Natl. Acad. Sci. [Internet] 110:20651–20656. Available from: http://www.pnas.org/content/110/51/20651.abstract
Wall SB Vander. 1990. Food Hoarding In Animals. Chicago: University of Chicago Press
Wollberg Z, Bdolah A, Kochva E. 1989. Vasoconstrictor effects of sarafotoxins in rabbit aorta: Structure-function relationships. Biochem. Biophys. Res. Commun. [Internet] 162:371–376. Available from: http://www.sciencedirect.com/science/article/pii/0006291X89920068
Wu S, Zhang Y. 2008. MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins Struct. Funct. Genet. 72:547–556.
Yang Z. 2007. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. [Internet] 24:1586–1591. Available from: https://dx.doi.org/10.1093/molbev/msm088
Zhang L-L, Zhang Q, Chen B, Wang Y, Liu C-C, Yang H. 2014. Biotoxins for Cancer Therapy. Asian Pacific J. Cancer Prev. 15:4753–4758.
Zhang Y, Xu J, Wang Z, Zhang X, Liang X, Civelli O. 2012. BmK-YA, an enkephalin-like peptide in scorpion venom. PLoS One [Internet] 7:e40417–e40417. Available from: https://www.ncbi.nlm.nih.gov/pubmed/22792309
Appendix: Tables and Figures
Assembly Name                            WH4-deNovo                   WH5-deNovo                   WH5-LongReads
Raw Reads                                2,000,000 150 bp paired end  2,000,000 150 bp paired end  2,000,000 150 bp paired end + 500,000 MinION reads
Assembled Transcripts                    31,657                       29,630                       29,808
Assembled Transcripts containing ORFs    17,684                       15,727                       15,754
ORFs containing a signal peptide         780                          759                          762
Proteins with hits to Tox-Prot database  59                           58                           57

Table 1: Results from various steps of the bioinformatic processing of raw RNA reads from the transcriptome of two individual B. brevicauda.
Protein Name          M8a lnL       M8 lnL        Likelihood Ratio Test                                P-value/Significance
BLTX                  -4245.426066  -4243.175318  2 x (-4243.175318 - (-4245.426066)) = 4.501496       0.033867 * (2 sites: 187 V, 207 H)
Blarinasin            -5350.820439  -5351.626880  2 x (-5351.626880 - (-5350.820439)) = 1.612882       0.204099
KLK1                  -5212.783533  -5213.394167  2 x (-5213.394167 - (-5212.783533)) = 1.221268       0.269125
Non-Expressed KLK1    -5384.553916  -5387.709546  2 x (-5384.553916 - (-5387.709546)) = 6.31126        0.011998 * (1 site: 186 D)
Soricidin             -6478.892152  -6478.733397  2 x (-6478.733397 - (-6478.892152)) = 0.31751        0.573114
Endothelin-1          -3787.737959  -3787.711426  2 x (-3787.711426 - (-3787.737959)) = 0.053066       0.817923
PLA2                  -3928.358225  -3925.249968  2 x (-3925.249968 - (-3928.358225)) = 6.216514       0.012657 * (1 site: 75 G)
Oxytocin/Neurophysin  -1468.823853  -1468.962410  2 x (-1468.962410 - (-1468.823853)) = -0.277114      N/A

Table 2: Results from positive selection analyses conducted using the codeml program in PAML. M8a represents the null model in which dn/ds is fixed to 1 with 11 classes of sites taken into account. M8 represents the alternative model in which dn/ds is allowed to vary amongst 11 classes of sites.
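The likelihood ratio tests in Table 2 can be reproduced with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and it assumes the statistic is compared against a chi-square distribution with one degree of freedom, an assumption that is consistent with the P-values reported for BLTX and PLA2. A negative statistic has no defined P-value under this test, so the sketch returns None in that case, matching the N/A entry for Oxytocin/Neurophysin.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic: 2 * (lnL of alternative - lnL of null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Assumption: df = 1 for the M8a vs. M8 comparison.
    Returns None when the statistic is negative (test undefined).
    """
    if stat < 0:
        return None
    # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# (M8a lnL, M8 lnL) values copied from Table 2.
rows = {
    "BLTX": (-4245.426066, -4243.175318),
    "PLA2": (-3928.358225, -3925.249968),
    "Oxytocin/Neurophysin": (-1468.823853, -1468.962410),
}

for name, (m8a, m8) in rows.items():
    stat = lrt_statistic(m8a, m8)
    p = chi2_sf_df1(stat)
    p_text = "N/A" if p is None else f"{p:.6f}"
    print(f"{name}: LRT = {stat:.6f}, p = {p_text}")
```

Using only the standard library keeps the sketch free of dependencies; a statistics package with a chi-square survival function would serve equally well.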
KLK_not_express    1 MCFLLLCLALSLLGTAYTHPTYAGHGKSGHIC-SLQCEKQFQPWQVTLYFNRENIGLCGG
KLK1               1 MCFLLLCLALTLGGTG-AVFPLPGIQIEARIYGGWECEKHSQPWQAAIYYNQGF--LCGA
Blarinasin         1 MYLLLLCLPLTLMGTG-AVPPGPSIEIHPRIVGGWECDKHSQPWQALLTFTNGLDGVCGG
4_Exon_KLK         1 MYFLLLCLALTLMGTG-AAPPYPGIQIHARIVGGWECDKHSQPWQAVLTFAK--NGFCGG
BLTX               1 MCFLLLCLTLTLAGTG-AVPTGPSIEIHSRIIGGWECDKHSQPWQALLTFTRKHNSVCGG

KLK_not_express   60 VLIHPKWVLTAAHCLGENYHIWLGLQGQTLNLSKAQHNWVSGKFPHPLY-MTQRNRWKSS
KLK1              58 VLVHPMWVLTAAHCIDRDYKVWLGLHNSSAPESTAQFFRVSESVLHPLFNLSLLIPMGNP
Blarinasin        60 VLVHPQWVLTAAHCIGDNYKIKLGLHDRFSKDDPFQEFQVSASFPHPSYNMRLLKLLLSD
4_Exon_KLK        58 VLVHPQWVLTAAHCFQDNIKVILGLHDLVSNEDTVQKIQVNAIFLHPLYNMTLRNLLKHH
BLTX              60 VLVHPQWVLTAAHCIGDNYKVLLGLHDRSSKESTVQEARVSARFPHPLYNMTLLNLLLSH

KLK_not_express  119 KIMYLINSGNARKVDYSNDLMLLRLELPAQLSDTVQVLDLPTQEPAEGSTCYIAAWSINY
KLK1             118 DMTWKEFVDTFQGVDFSHDLMLLRLDRPAVLTDTVKVLDLPTQEPQVGSKCLTSGWGSTD
Blarinasin       120 ELNDTYYDEISLGADFSHDLMMMQLEKPVQLNDAVQVLDLPTQEPQVGSKCHASGWGSMD
4_Exon_KLK       118 TKNSL---KTFRRADFSHDLMLLHLEHPVQLTDAVQVLDLPTQEPQVGSNCYASGWGSIN
BLTX             120 KMNLTFFYKTFLGADFSHDLMLLRLDQPVQLTDAVQVLDLPTQEPQVGSTCHVSGWGRTS

KLK_not_express  179 PETSRPID-TSSKLQCVNFKLLSNNVCGRNYVEKVTDTMLCAGRMEGSKGSCMGDSGAPL
KLK1             178 SYKGETIVKLSRELRCVDLDLLPNDDCAKAQIAKVTEYMLCAGVMEGGKDTCVGDSGGPL
Blarinasin       180 PYSRIDFP-RTGKLQCVDLTLMSNNECSRSHIFKITDDMLCAGHIKGRKDTCGGDSGGPL
4_Exon_KLK       175 PDAKNSFV-LPKTLQCVDLALLPNEICSRAYIFKMTEAMLCAGHMKGGKDTCG------
BLTX             180 QNYENSFV-LPEKLQCVEFTLLSNNECSHAHMFKVTEAMLCAGHMEGGKDSCVGDSGGPL

KLK_not_express  238 ICDGMLHGIASWGAPPCSPHYKSGLFVKIFPYVNWIQETIKANT
KLK1             238 ICDGVFQGITSWGVGPCAYRQKPGLYVKLFSYVDWIRETIATHS
Blarinasin       239 ICDGVFQGTTSWGSYPCGKPRMPGVYVKIFSHVDWIREIIATHS
4_Exon_KLK           ------
BLTX             239 ICDGVFQGIASWGSSPCGQQGRPGIYVKVFLYISWIQETIKAHS
Figure 1. Alignment of all Kallikrein-Like Serine Proteases identified from both genomic and transcriptomic analysis.
[Plot: Kallikrein Genomic Orientation and Expression. Y-axis: Expression Level (TPM), 0 to 14,000. X-axis: KLK15, KLK1, Blarinasin, BLTX, 4-Exon Kallikrein, Non-Expressed Kallikrein, ACPT.]
Figure 2. Genomic orientation of KLK1 tandem array in B. brevicauda. Flanking genes (KLK15 and ACPT) are shown along with expression values in TPM for each KLK1 paralog.
Figure 3: Three-dimensional model of BLTX showing sites undergoing positive selection (magenta). Active site residues are highlighted in red. Note that the amino acids under positive selection lie away from the active site.
[Plot: Relative Abundance of Putative Toxins in Salivary Proteome. Y-axis: Relative Abundance (%), 0 to 12. Categories: BLTX, KLK1, Soricidin, Blarinasin, Endothelin, Truncated-KLK1, Oxytocin/Neurophysin, Double-Headed Protease Inhibitor. Series: WH4, WH5.]
Figure 4: Relative abundance of putative venom proteins in the saliva of individuals WH4 and WH5.
Figure 5: Phylogenetic reconstruction of KLK1 sequences from B. brevicauda and the common shrew (S. araneus). Hedgehog and Star-Nosed Mole KLK1 sequences were used as an outgroup. Node labels indicate bootstrap support (1000 bootstraps).
Figure 6: Electrostatic potential of modelled surface residues for 1) BLTX, 2) Blarinasin, 3) Non-Expressed Kallikrein, and 4) KLK1 in B. brevicauda. Red indicates a more negative electric potential, whereas blue indicates a more positive electric potential.
Figure 7: ClusPro-predicted protein-protein interaction between BLTX (purple) and the Double-Headed Protease Inhibitor (teal). The active site residues in the catalytic cleft of BLTX are highlighted in yellow.