
A Comprehensive Multi-Omic Approach Reveals a Simple Venom in a Diet Generalist, the Northern Short-Tailed Shrew, Blarina brevicauda

Thesis

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the

Graduate School of The Ohio State University

By

Zachery R. Hanf B.S.

Graduate Program in Evolution, Ecology, and Organismal Biology

The Ohio State University

2019

Thesis committee:

Andreas Chavez, Advisor

H. Lisle Gibbs

Marymegan Daly


Copyright by Zachery R. Hanf 2019

Abstract

Venom is often composed of a mixture of proteins derived from duplications of regulatory genes that have undergone positive selection and neofunctionalization. Venom proteins typically act to disrupt a physiological process in prey, so for venomous organisms that prey on multiple divergent taxa with differing physiologies, venom complexity is hypothesized to mirror dietary complexity.

Blarina brevicauda, the northern short-tailed shrew, is a venomous mammal that has been observed eating prey from a variety of widely divergent taxa. However, only two proteins with toxic properties have been isolated from its venom system.

Given that venom is typically composed of many toxic proteins and that B. brevicauda has a very broad diet, we used a multi-omic approach to determine whether B. brevicauda's venom is more complex than is currently understood. We generated a de novo transcriptome assembly of the submaxillary gland using both short-read and long-read sequencing data, assembled a reference B. brevicauda genome, and sequenced the salivary proteome of the same individuals. We report that despite its complex diet, B. brevicauda's venom system is likely limited to BLTX, PLA2, and proenkephalin (Soricidin). Additionally, we find that KLK1 duplications are more extensive than previously realized and that these duplications may have originated in a distant shrew ancestor. Finally, we identify two proteins that may function as endogenous defense mechanisms and are likely to counteract the self-harming effects of BLTX. Taken together with the lack of major dietary differences between B. brevicauda and non-venomous shrews, these results suggest that venom may serve as an accessory adaptation within a broader prey-capture strategy. Functional assays of all venom proteins on both vertebrate and invertebrate prey would provide further insight into the ecological relevance of venom in this species, and possibly in other venomous Eulipotyphlan species.


Acknowledgements

I would like to acknowledge and thank my advisor, Andreas Chavez, for his extreme support and dedication to helping me throughout the entirety of this project. I would also like to thank my committee members H. Lisle Gibbs and Marymegan Daly for their valuable and constructive feedback through the planning and execution of my project.

I would like to thank my various lab mates and undergraduate assistants for their help and for fostering an environment of critical thinking. I would like to thank my partner Whitney King for her never-ending support throughout my graduate education. Finally, I would like to thank my parents for believing in and supporting me during my educational journey.


Vita

2017: B.S., The Ohio State University

2017, 2018, 2019: Graduate Teaching Assistant, College of Life Science Education, The Ohio State University

2017, 2018: Graduate Teaching Assistant, Department of Evolution, Ecology, and Organismal Biology, The Ohio State University

Field of Study

Major Field: Evolution, Ecology, and Organismal Biology

Table of Contents

Abstract

Acknowledgements

Vita

List of Tables

List of Figures

Introduction

Methods

Results

Discussion

Conclusion

Literature Cited

Appendix

List of Tables

Table 1: Results from successive steps of the bioinformatic processing of raw RNA reads from the submaxillary-gland transcriptomes of two individual B. brevicauda.

Table 2: Results from positive-selection analyses conducted using the codeml program in PAML. M8a is the null model, in which 10 site classes have dN/dS drawn from a beta distribution between 0 and 1 and an eleventh site class has dN/dS fixed at 1. M8 is the alternative model, in which dN/dS for the eleventh class is estimated and allowed to exceed 1.

List of Figures

Figure 1: Alignment of all kallikrein-like serine proteases identified from both genomic and transcriptomic analyses.

Figure 2: Genomic orientation of the KLK1 tandem array in B. brevicauda. Flanking genes (KLK15 and ACPT) are shown along with expression values in TPM for each KLK1 paralog.

Figure 3: Three-dimensional model of BLTX showing sites under positive selection (magenta). Active-site residues are highlighted in red.

Figure 4: Relative abundance of putative venom proteins in the saliva of the WH5 and WH4 individuals.

Figure 5: Phylogenetic reconstruction of KLK1 sequences from B. brevicauda and S. araneus (sequences with the “gi|” prefix). Hedgehog and star-nosed mole KLK1 sequences were used as outgroups. Node labels indicate bootstrap support (1,000 bootstraps).

Figure 6: Electrostatic potential of modelled surface residues for (1) BLTX, (2) Blarinasin, (3) the non-expressed kallikrein, and (4) KLK1 in B. brevicauda. Red indicates a more negative electric potential; blue indicates a more positive electric potential.

Figure 7: ClusPro-predicted protein-protein interaction between BLTX (purple) and the double-headed protease inhibitor (teal). Active-site residues in the catalytic cleft of BLTX are highlighted in yellow.


Introduction

Venom is typically composed of a complex mixture of proteins and short peptides that disrupt vital physiological processes in prey (Casewell et al. 2013). The proteins functioning within venom are often convergently co-opted from the same regulatory gene families, even in widely divergent venomous organisms (Fry and Wüster 2004; Fry et al. 2009). Gene duplication followed by rapid positive selection in these gene families allows for extreme functional diversity in venom proteins, and therefore for diversification of molecular targets. Thus, a venomous organism's ability to exploit a specific prey type is predicted to depend on the classes of venom proteins it produces. Accordingly, for venomous predators that feed on multiple species, the dietary-breadth hypothesis predicts that venom complexity should correlate with dietary complexity (Daltry et al. 1996; Barlow et al. 2009; Phuong et al. 2016; Pekár et al. 2018).

Within mammals, only a handful of extant species have evolved venom for prey capture, and all of them belong to the order Eulipotyphla (shrews, moles, hedgehogs, and solenodons) (Dufton 1992; Ligabue-Braun 2015). Eulipotyphla comprises more than 500 species that have colonized a diverse range of terrestrial environments worldwide, from arctic to tropical regions. Most Eulipotyphlans, including the venomous species, are generalist predators that feed on vertebrate and invertebrate prey from divergent phyla, including Arthropoda, Annelida, Mollusca, and Chordata (Hamilton 1930; Hamilton 1941; Eadie 1952). Because venomous and non-venomous species share similar feeding habits, the selective pressures that drove the gain of venom in only certain Eulipotyphlans have been puzzling. This has stimulated debate over whether venom evolved to help these species hunt larger prey, particularly small vertebrates, or to facilitate the caching of small invertebrate prey. Both behaviors have been observed in venomous and non-venomous species alike, leaving it unclear which selective factor drove the evolution of venom in only a handful of species. These hypotheses are also not mutually exclusive: venom may be important in both situations.

Blarina brevicauda (the northern short-tailed shrew) has the most potent venom of the Eulipotyphlans (Dufton 1992). Dietary studies of B. brevicauda show that they consume widely divergent vertebrate and invertebrate prey, including: , annelids, molluscs, arachnids, and (Hamilton 1930; Hamilton 1941; George et al. 1986). Early functional studies of extracts from the submaxillary glands of B. brevicauda revealed that the venom causes respiratory arrest and even death in vertebrates, but also has a paralytic effect on invertebrates (Pearson 1942; Martin 1981). These findings suggest that B. brevicauda venom may have multiple functions, targeting different physiological processes in very divergent prey. More recently, proteomic isolation of venom proteins from the submaxillary glands of B. brevicauda yielded two kallikrein-1 (KLK1)-like serine proteases (BLTX and Blarinasin).

However, only BLTX exhibited toxic activity in mice, specifically by cleaving kininogens to bradykinin, and its activity was never evaluated in invertebrates (Kita et al. 2004; Kita et al. 2005). Kita et al. (2004) also pointed out that many mammals possess KLK1s that can cleave kininogens to bradykinin, yet not all mammals are venomous, and they suggested that other constituents of the venom likely contribute to the lethality of shrew saliva. Furthermore, a short peptide called Soricidin has been isolated from B. brevicauda and found to have paralytic effects on invertebrates, but, similarly, this peptide was not tested on vertebrate prey (Patent: US8003754B2). Given that venom is typically composed of many toxic proteins and that B. brevicauda has a very broad diet, we explored whether B. brevicauda's venom complexity matches its dietary complexity and whether it contains more toxins than is currently understood.

To gain more insight into the complexity of venom in B. brevicauda, we took a comprehensive multi-omic approach utilizing genomic, transcriptomic, and proteomic methods.

We generated a de novo transcriptome assembly of the submaxillary gland using both short-read and long-read sequencing data from cDNA. We also assembled a de novo reference genome to use alongside our reference transcriptome in bioinformatic searches for transcripts homologous to known venom components from other venomous systems. For our list of putative venom genes, we examined patterns of gene expression, evidence for positive selection, and predicted 3D protein structure to gain insight into their potential roles as venom components. We then compared these transcriptomic results with proteomic profiles of B. brevicauda saliva to determine which putative venom transcripts from the submaxillary gland were present in the saliva and could therefore function as venom.


Methods

Animal Capture and Tissue and Saliva Procurement

We captured two wild B. brevicauda (one female, WH4, and one male, WH5) in pitfall traps near woodpiles at The Ohio State University's Waterman Farm Headquarters in June 2017. Saliva was collected by scruffing each shrew and allowing it to bite on a piece of sterile medical tubing. The tubing was then placed in a sterile Eppendorf tube, put on ice, and stored at −80 °C within 30 minutes. Shrews were then euthanized by prolonged inhalation of isoflurane. The submaxillary glands, which express the venom transcripts, and other internal organs were immediately removed, preserved in RNAlater (Invitrogen) at room temperature for 24 hours, and then stored at −80 °C.

High-Molecular-Weight gDNA Extraction

High-molecular-weight genomic DNA (HMW gDNA) was isolated from frozen heart tissue of one B. brevicauda individual (WH4) using the Puregene Kit (Qiagen) following the manufacturer's protocol with slight modifications: all steps that required vortexing were replaced with gentle inversion to protect the integrity of long DNA molecules. HMW gDNA was quantified with the Qubit dsDNA HS Assay Kit (Life Technologies), and its molecular-weight distribution was assessed using both Genomic ScreenTapes on a TapeStation 2200 (Agilent) and pulsed-field gel electrophoresis on a Pippin Pulse (Sage Science). Fragments smaller than 600bp were removed from the HMW gDNA sample using the Pippin HT (Sage Science).


Genomic DNA Library Construction, Sequencing, and de novo Assembly

Sample preparation of a 10X Genomics “linked-read” library for de novo genome assembly was performed on a Chromium Controller Instrument (10X Genomics) at the DNA Technologies and Expression Analysis Core at the UC Davis Genome Center. Sample indexing and partition-barcoded libraries were prepared using the Chromium Genome Reagent Kit (10x Genomics) according to the manufacturer's protocols described in the Chromium Genome User Guide Rev A (https://support.10xgenomics.com/de-novo-assembly/sample-prep/doc/user-guide-chromium-genome-reagent-kit-v1-chemistry). In summary, approximately 1ng of HMW gDNA in Master Mix was combined with a library of Genome Gel Beads and partitioning oil to create Gel Bead-In-Emulsions (GEMs) within a microfluidic Genome Chip. HMW gDNA was partitioned across ~1 million GEMs, where library construction took place. Library construction incorporated a unique 16bp barcode, an Illumina R1 sequencing primer, and a 6bp random primer sequence. GEM reactions were isothermally incubated (3 h at 30 °C; 10 min at 65 °C; held at 4 °C), generating barcoded fragments ranging from a few to several hundred base pairs. After incubation, the GEMs were broken and the barcoded DNA was recovered. Solid Phase Reversible Immobilization (SPRI) beads were used to purify and size-select fragments for library preparation.
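Supernova performs barcode handling internally, but the logic of a linked-read library can be illustrated with a short sketch. Reads that carry the same GEM barcode were generated inside the same partition, so they derive from a small set of long input molecules. The sketch below assumes, for illustration only, that the 16bp barcode occupies the first 16 bases of read 1 and is immediately followed by the 6bp random primer; the exact read layout of the 10x chemistry may differ, and the function names are hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 16  # 16bp GEM barcode, as described in the library preparation
PRIMER_LEN = 6    # 6bp random primer; its position directly after the barcode is an assumption


def split_read1(seq):
    """Split a linked-read R1 sequence into (barcode, insert).

    Assumes the barcode is the first BARCODE_LEN bases and that the next
    PRIMER_LEN bases are random-primer sequence, which is discarded.
    """
    if len(seq) <= BARCODE_LEN + PRIMER_LEN:
        raise ValueError("read too short to contain barcode, primer and insert")
    barcode = seq[:BARCODE_LEN]
    insert = seq[BARCODE_LEN + PRIMER_LEN:]
    return barcode, insert


def group_by_barcode(read1_seqs):
    """Group R1 inserts by barcode; reads sharing a barcode came from the same GEM."""
    groups = defaultdict(list)
    for seq in read1_seqs:
        barcode, insert = split_read1(seq)
        groups[barcode].append(insert)
    return dict(groups)
```

Grouping by barcode is what lets a linked-read assembler connect short reads into long-range scaffolds and phase blocks, even though each read is only 150bp.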

Standard library preparation was performed according to the protocol described in the Chromium Genome User Guide Rev A (https://support.10xgenomics.com/de-novo-assembly/sample-prep/doc/user-guide-chromium-genome-reagent-kit-v1-chemistry) to construct one sample-indexed library using 10x Genomics adaptors. The final library contained the P5 and P7 primers used in Illumina bridge amplification and was quantified by qPCR. Sequencing was conducted on an Illumina HiSeq X with 2×150 paired-end reads following the manufacturer's protocols.

We assembled the linked-read HiSeq data with the Supernova 2.1.0 assembler (Weisenfeld et al. 2017) using the default recommended settings. Genome-wide statistics, including the total number of phase blocks and the N50 of phase-block sizes, were calculated from the pseudohap outputs produced by Supernova. Additional assembly statistics were obtained using the stats.py script from the BBMap suite (Bushnell 2014).
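Supernova and BBMap report N50 directly; the short sketch below only illustrates how the statistic is defined. N50 is the length L such that sequences (scaffolds or phase blocks) of length L or longer together contain at least half of all assembled bases. The input list is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence (or phase-block) lengths."""
    ordered = sorted((int(x) for x in lengths if x > 0), reverse=True)
    if not ordered:
        raise ValueError("no positive lengths supplied")
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy example: total = 100 bp, half = 50; 40 + 30 = 70 >= 50, so N50 = 30.
print(n50([40, 30, 20, 10]))  # 30
```

Because N50 weights long sequences heavily, a scaffold N50 of several hundred kilobases can coexist with a large number of short scaffolds, as in the assembly statistics reported in the Results.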

RNA Isolation and Sequencing

We extracted total RNA from the submaxillary tissue of both shrews using the Qiagen RNeasy Plus Mini Kit following the manufacturer's protocol. RNA concentration and quality were assessed using RNA ScreenTapes on a TapeStation 2200 (Agilent Technologies). Poly-A-selected RNA libraries were constructed from total RNA for both shrews using a KAPA mRNA HyperPrep Kit for Illumina platforms (Kapa Biosystems). Final library concentrations and fragment-size distributions were confirmed using a Qubit RNA HS Assay Kit (Invitrogen) and a TapeStation 2200 (Agilent Technologies), respectively. Sequencing was conducted on an Illumina HiSeq 4000 with 2×150 paired-end reads following the manufacturer's protocols at the DNA Technologies and Expression Analysis Core at the UC Davis Genome Center.

We also generated long-read cDNA sequence data, which has been shown to help resolve transcripts from difficult, highly paralogous venom gene families (Hargreaves and Mulley 2015), using Oxford Nanopore's MinION sequencing platform. Because of our limited amount of RNA, a cDNA library was prepared for only one shrew (WH5) with Nanopore's cDNA-PCR Sequencing Kit (SQK-PCS108), following Nanopore's cDNA protocol and suggested enzymes (https://community.nanoporetech.com/protocols/cdna-pcr-sequencing/v/pcs_9035_v108_revh_26jun2017; LongAmp Taq 2X Master Mix: NEB; RNaseOUT: Thermo Fisher; SuperScript IV reverse transcriptase, 5x RT buffer, and 100 mM DTT: Thermo Fisher; Exonuclease I: NEB). The final library was loaded onto a FLO-MIN106 R9 flow cell, and data acquisition was carried out without live basecalling until almost all of the flow cell's pores were inactive (~10 hours).

Transcriptome Assembly

The 150bp paired-end RNA-seq reads were pre-processed using methods adapted from Singhal (2013) and Bi et al. (2012), with some modifications. Briefly, Trimmomatic (Bolger et al. 2014) was used to trim adapter contamination and low-quality bases. Exact PCR and/or optical duplicate reads were removed using Super-Deduper (https://github.com/dstreett/Super-Deduper). Bowtie2 (Langmead and Salzberg 2012) was used to align the resulting reads against the Escherichia coli genome to remove potential bacterial contamination in the raw data. Overlapping paired reads were then merged using FLASH (Magoč and Salzberg 2011), and their final quality was assessed using FastQC (Babraham Bioinformatics).
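The idea behind exact-duplicate removal can be shown in a few lines. This is not Super-Deduper's implementation; it is a minimal sketch that assumes duplicates are defined as read pairs with identical R1 and R2 sequences and keeps the first occurrence. Real tools may instead key on read prefixes or choose the highest-quality copy.

```python
def remove_exact_duplicates(pairs):
    """Drop read pairs whose (R1, R2) sequences exactly match an earlier pair.

    pairs: iterable of (r1_sequence, r2_sequence) tuples.
    Returns a list that keeps the first occurrence of each distinct pair, in order.
    """
    seen = set()
    unique = []
    for r1, r2 in pairs:
        key = (r1, r2)  # exact-match key; an assumption made for this sketch
        if key not in seen:
            seen.add(key)
            unique.append((r1, r2))
    return unique


reads = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("ACGT", "TTGC")]
print(remove_exact_duplicates(reads))  # [('ACGT', 'TTGA'), ('ACGT', 'TTGC')]
```

Keying on both mates means that two pairs sharing only R1 are kept, since they may represent distinct fragments.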

Raw MinION reads from WH5 were basecalled from fast5 format to fastq format using the command-line Python basecalling script in Albacore. Reads with a mean quality score below 10 were then removed using NanoFilt (https://github.com/wdecoster/nanofilt).
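NanoFilt filters whole reads on their average quality. The sketch below shows one common way to compute that average: Phred scores are converted to per-base error probabilities, averaged, and converted back to a Phred value. Treat this definition as an assumption; other tools simply take the arithmetic mean of the Phred values. Phred+33 encoding is also assumed, and the function names are hypothetical.

```python
import math


def mean_read_quality(qual_string, offset=33):
    """Mean Phred quality of a read, averaged in error-probability space."""
    if not qual_string:
        raise ValueError("empty quality string")
    # Phred Q corresponds to an error probability of 10^(-Q/10).
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual_string]
    mean_error = sum(probs) / len(probs)
    return -10 * math.log10(mean_error)


def passes_quality(qual_string, min_quality=10):
    """True if the read's mean quality is at least min_quality (Q10 here)."""
    return mean_read_quality(qual_string) >= min_quality
```

Averaging in probability space penalizes low-quality stretches more strongly: the two-base quality string "I#" (Q40 and Q2) averages to roughly Q5 rather than Q21.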

We created three different assemblies from the various data types: (1) a de novo assembly for WH4 using only short-read data from our HiSeq sequencing run (WH4-denovo); (2) a de novo assembly for WH5 using only short-read data (WH5-denovo); and (3) a de novo assembly for WH5 using both short-read data and long-read data from the MinION sequencing run (WH5-longread). All assemblies were created using the default parameters in Trinity, except for the addition of the long-reads parameter (--long_reads) in the WH5-longread assembly, which incorporated the MinION reads. We used the TrinityStats.pl script to assess the quality of the transcripts in each of the three assemblies.
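TrinityStats.pl reports separate counts for Trinity "genes" and transcripts. The sketch below shows how that grouping follows from Trinity's sequence identifiers, assuming the TRINITY_DN1000_c0_g1_i1-style naming used by recent Trinity versions, where isoforms of one gene share everything before the trailing "_i" suffix. The function name is hypothetical.

```python
import re

ISOFORM_SUFFIX = re.compile(r"_i\d+$")  # trailing isoform tag, e.g. "_i2"


def count_genes_and_isoforms(fasta_lines):
    """Count Trinity 'genes' and transcripts from FASTA header lines."""
    transcripts = set()
    genes = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        transcript_id = line[1:].split()[0]
        transcripts.add(transcript_id)
        genes.add(ISOFORM_SUFFIX.sub("", transcript_id))
    return len(genes), len(transcripts)
```

Transcript counts such as those reported for each assembly therefore include alternative isoforms, so the number of distinct Trinity genes is usually smaller.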

Saliva Proteomics Sample Preparation

The tubing containing shrew saliva was soaked in 50mM ammonium bicarbonate solution (just enough to cover the tubing). The inside of the tubing was washed by pipetting ammonium bicarbonate solution in and out 20 times, and the solution was then removed and placed in a separate Eppendorf tube. The wash procedure was repeated two times, and the pooled ammonium bicarbonate washes were concentrated in a vacuum centrifuge to a final volume of ~100 µL for digestion (sample prep #1). Next, 200 µL of 50mM ammonium bicarbonate solution was added to the Eppendorf tube containing the tubing, the sample was shaken at room temperature for 30 min, and the solution was collected (sample prep #2). Protein concentrations were then measured using a Qubit.

For digestion, 5 µL of DTT (5 µg/µL in 50 mM ammonium bicarbonate) was added to sample preps #1 and #2 individually, and the samples were incubated at 56 °C for 15 min. After incubation, 5 µL of iodoacetamide (15 mg/mL in 50 mM ammonium bicarbonate) was added, and the samples were incubated in the dark at room temperature for 30 min. Sequencing-grade modified trypsin (Promega, Madison, WI) prepared in 50 mM ammonium bicarbonate was added at an estimated 1:50 enzyme-to-substrate ratio, and digestion was carried out overnight at 37 °C. After digestion, acetic acid was added to quench the reaction.
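Mascot simulates the tryptic digest internally during the database search, but the commonly used trypsin rule is easy to sketch: cleave on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P). Permitting missed cleavages (four were allowed in the Mascot search) joins adjacent fragments and enlarges the candidate peptide set. The function name is hypothetical, and splitting on a zero-width pattern requires Python 3.7 or later.

```python
import re


def tryptic_peptides(protein, missed_cleavages=0):
    """In silico trypsin digest: cleave after K or R, except before P."""
    # Zero-width split point: preceded by K/R (lookbehind), not followed by P (lookahead).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` extra adjacent fragments.
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(fragments):
                break
            peptides.append("".join(fragments[start:end]))
    return peptides


print(tryptic_peptides("MKWVTFISLLRPK"))
# ['MK', 'WVTFISLLRPK']  (no cut after the R because it is followed by P)
```

With one missed cleavage the same sequence also yields the full-length peptide MKWVTFISLLRPK, which shows how quickly the search space grows as more missed cleavages are allowed.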


The samples were dried in a vacufuge and resuspended in 20 µL of 50 mM acetic acid. Peptide concentration was determined by NanoDrop (A280).

LC-MS/MS

Liquid chromatography-nanospray tandem mass spectrometry (LC-MS/MS) for protein identification was performed on a Thermo Scientific Orbitrap Fusion mass spectrometer equipped with an EASY-Spray source and operated in positive ion mode. Samples were separated on an EASY-Spray nano column (PepMap RSLC, C18, 3 µm, 100 Å, 75 µm × 250 mm; Thermo Scientific) using a 2D RSLC HPLC system from Thermo Scientific. Each sample (2 µg) was injected onto the µ-Precolumn Cartridge (Thermo Scientific) and desalted with 0.1% formic acid in water for 5 minutes. The injector port was then switched to inject, and the peptides were eluted off the trap onto the column. Mobile phase A was 0.1% formic acid in water, and mobile phase B was acetonitrile with 0.1% formic acid. The flow rate was set at 300 nL/min. Mobile phase B was increased from 2% to 20% over 40 min, from 20% to 32% over 10 min, and from 32% to 95% over 6 min, held at 95% for 2 min, and then returned to 2% over 2 min. The column was equilibrated at 2% mobile phase B (98% A) for 15 min before the next sample injection. MS/MS data were acquired with a spray voltage of 1.7 kV and a capillary temperature of 275 °C. The mass spectrometer was programmed to record a full scan between m/z 375 and 1700, followed by MS/MS scans that generated product-ion spectra for amino acid sequencing, taken in consecutive scans starting from the most abundant peaks in the spectrum over the next 3 seconds. To achieve high mass accuracy, the full scan was performed in FT mode at a resolution of 120,000. The AGC target for the FT full scan was set at 4 × 10⁵ ions, with a maximum ion injection time of 50 ms. MSn was performed in ion trap mode to ensure the highest signal intensity of MSn spectra, using CID (for 2+ to 7+ charges). The AGC target for the ion trap MSn scan was set at 1 × 10⁴ ions, with a maximum ion injection time of 30 ms. The CID fragmentation energy was set to 30%. Dynamic exclusion was enabled with a repeat count of 1 within 60 s and low and high mass widths of 10 ppm.
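The mobile-phase program above is piecewise linear, so the percentage of B at any time can be computed by interpolating between its breakpoints. In this sketch t = 0 is assumed to be the start of the gradient (after the 5-minute desalting step), and the breakpoints are transcribed from the description above; the names are hypothetical.

```python
# (time in min, % mobile phase B) breakpoints transcribed from the gradient description
GRADIENT = [
    (0, 2), (40, 20), (50, 32), (56, 95), (58, 95), (60, 2), (75, 2),
]


def percent_b(t):
    """Linearly interpolate mobile-phase B (%) at time t minutes."""
    if t < GRADIENT[0][0] or t > GRADIENT[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


print(percent_b(20))  # 11.0, halfway through the 2% to 20% ramp
```

The shallow first ramp (0.45% B per minute) spreads most peptides over the first 40 minutes, while the steep 32% to 95% step mainly serves to wash strongly retained material off the column.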

Sequence information from the MS/MS data was processed by converting the .raw files into merged Mascot generic format (.mgf) files using msConvert (ProteoWizard). The resulting .mgf files were searched with Mascot Daemon (Matrix Science, Boston, MA; version 2.3.2) against a custom database derived from the transcriptomes of our individual shrews. The precursor mass tolerance was set to 10 ppm, allowing for accidental selection of the first 13C isotope peak. The fragment mass tolerance was set to 0.5 Da. Variable modifications considered were oxidation (Met), deamidation (N and Q), and acetylation (K); carbamidomethylation (Cys) was set as a fixed modification. Up to four missed cleavages were permitted for trypsin. A decoy database was also searched to determine the false discovery rate (FDR), and peptides were filtered according to the FDR. Proteins with an FDR below 1% and at least 2 significant peptides were considered valid identifications.
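Two of the search settings above can be made concrete. First, a 10 ppm precursor tolerance that also allows the instrument to have picked the first 13C isotope peak means an observed m/z may match either the monoisotopic m/z or that value plus about 1.00336 Da divided by the charge. Second, a decoy search estimates FDR as the number of decoy hits divided by the number of target hits above a score cutoff. Mascot implements both internally; this is only a sketch with hypothetical names, and the simple decoy/target ratio is one of several FDR estimators in use.

```python
C13_SPACING = 1.003355  # mass difference between 13C and 12C, in Da


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def precursor_matches(observed_mz, theoretical_mz, charge, tol_ppm=10.0, max_c13=1):
    """True if observed m/z is within tol_ppm of the theoretical m/z,
    allowing for selection of up to max_c13 13C isotope peaks."""
    for n in range(max_c13 + 1):
        shifted = theoretical_mz + n * C13_SPACING / charge
        if abs(ppm_error(observed_mz, shifted)) <= tol_ppm:
            return True
    return False


def score_threshold_at_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest score cutoff whose estimated FDR (decoys / targets at or above
    the cutoff) does not exceed max_fdr; None if no cutoff qualifies."""
    best = None
    for cutoff in sorted(set(target_scores), reverse=True):
        n_target = sum(s >= cutoff for s in target_scores)
        n_decoy = sum(s >= cutoff for s in decoy_scores)
        if n_decoy / n_target <= max_fdr:
            best = cutoff
    return best
```

At 500 m/z, 10 ppm corresponds to only 0.005 m/z units, whereas a 2+ ion's 13C peak sits about 0.5 m/z units higher; without the isotope allowance, such a misassigned precursor would be rejected outright.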

Bioinformatic Pipeline for Identification of Novel Toxins

To identify putative venom transcripts from our de novo transcriptome assemblies, we applied the pipeline described in Verdes et al. (2016), which retains only transcripts that encode a signal peptide and match toxins in existing toxin databases. We applied this method to each of the three de novo transcriptome assemblies. First, we identified the coding potential of each transcript in each assembly using TransDecoder (https://github.com/TransDecoder/TransDecoder/wiki). The resulting amino acid sequences were then searched for signal peptides using the stand-alone command-line version of SignalP (Petersen et al. 2011). Amino acid sequences with signal peptides were then searched with blastp from the BLAST+ package (NCBI), using an e-value cutoff of 1e-5, against a curated database of known toxin protein sequences extracted from Tox-Prot (https://www.uniprot.org/program/Toxins), a subdivision of the UniProt database. The remaining transcripts were then manually annotated by searching them against NCBI's non-redundant database using blastp.
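blastp can write results in BLAST+ tabular format (-outfmt 6), and the -evalue option applies a cutoff during the search. The sketch below shows how such a tabular file might be post-processed to keep the best Tox-Prot hit per query transcript at e-value ≤ 1e-5. It assumes the default 12-column layout of -outfmt 6; the function and variable names are hypothetical.

```python
EVALUE_CUTOFF = 1e-5

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_toxin_hits(tabular_lines, evalue_cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} for the highest-bitscore hit
    per query among hits with evalue <= evalue_cutoff."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if evalue > evalue_cutoff:
            continue
        query = row["qseqid"]
        if query not in best or bitscore > best[query][2]:
            best[query] = (row["sseqid"], evalue, bitscore)
    return best
```

Re-applying the cutoff after the search is redundant when -evalue was already set, but it makes the filtering step explicit and easy to change.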

Finally, expression levels for each putative toxin transcript were calculated using Kallisto (Bray et al. 2016). We estimated transcripts per million (TPM) for all transcripts in each transcriptome assembly and examined further only transcripts with TPM values greater than 100, to exclude transcripts unlikely to be biologically relevant to venom. Annotated transcripts that matched known venom-protein families and had expression levels above 1,000 TPM were considered putative venom genes.
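Kallisto reports TPM directly; the sketch below only restates the formula and applies the 1,000 TPM threshold described above. For transcript i with estimated count n_i and effective length l_i, TPM_i = (n_i / l_i) / Σ_j (n_j / l_j) × 10^6, so TPM values within a sample always sum to one million. The inputs and function names are hypothetical.

```python
def tpm(counts, effective_lengths):
    """Transcripts per million from estimated counts and effective lengths."""
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and lengths must align")
    # Length-normalized rate for each transcript.
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def putative_venom(ids, tpm_values, toxin_hits, min_tpm=1000.0):
    """IDs that matched a venom-protein family and exceed min_tpm."""
    return [i for i, v in zip(ids, tpm_values) if i in toxin_hits and v > min_tpm]
```

Because TPM normalizes by transcript length before scaling, a short, highly transcribed peptide precursor can rank above a longer transcript that received more reads.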

Selection Tests

Positive selection has been identified as a driver of functional variation among duplicated genes in many gene families across many venom systems (Duda and Palumbi 1999; Vonk et al. 2013). Therefore, to investigate the effects that positive selection has had in shaping the venom system of B. brevicauda, we tested our putative venom genes for positive selection using the codeml program in PAML (Yang 2007). First, one-to-one orthologs for each putative toxin were downloaded from Ensembl (Frankish et al. 2017). Initial alignments were made using the MUSCLE (Edgar 2004) plug-in for Geneious v. 11.0.4 (https://www.geneious.com). These alignments were then manually evaluated for quality and edited if sequences were too short, had too many Ns, or were likely erroneous orthologs. Sequences were then realigned using the Translation Align plug-in for Geneious and processed with trimAl using the -automated1 flag (Silla-Martínez et al. 2009) to remove spurious alignments and large uninformative gaps. Alignments were then converted into PHYLIP format with a custom Python script so that phylogenetic trees could be inferred in RAxML (Stamatakis 2014) using the GTR GAMMA nucleotide model with rapid bootstrapping (1,000 bootstraps). Site models in codeml (Yang 2007) were then run with the maximum-likelihood tree and the final trimAl PHYLIP alignment to test putative venom genes for positive selection. Briefly, the null model M8a, in which dN/dS for the additional site class is fixed at 1 so that positive selection is not permitted, was compared with the alternative model M8, in which dN/dS for that class is estimated and may exceed 1; both models use 11 site classes. Finally, we used likelihood ratio tests (test statistic 2 × (lnL1 − lnL0), where lnL1 and lnL0 are the log-likelihoods of M8 and M8a, respectively; df = 1) to determine whether the two models differed significantly.
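codeml reports a log-likelihood for each model, and the likelihood ratio test is then simple arithmetic. The sketch below computes 2 × (lnL1 − lnL0) and its p-value under a χ² distribution with one degree of freedom, using the identity that the χ²₁ survival function equals erfc(√(x/2)). Using χ²₁ is generally regarded as conservative for the M8a versus M8 comparison. The function name and the log-likelihood values in the tests are hypothetical.

```python
import math


def lrt_m8a_vs_m8(lnL_null, lnL_alt):
    """Likelihood ratio statistic and p-value for M8a (null) vs M8 (alternative), df = 1."""
    stat = 2.0 * (lnL_alt - lnL_null)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimization noise
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function, df = 1
    return stat, p_value


# Hypothetical log-likelihoods: M8 improves lnL by 2 units over M8a.
stat, p = lrt_m8a_vs_m8(-1000.0, -998.0)
print(stat, round(p, 4))  # 4.0 0.0455
```

A statistic above 3.84 corresponds to p < 0.05 at df = 1, so in this toy case M8 would be preferred and sites with high posterior probabilities of dN/dS > 1 would then be inspected.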

Phylogenetic Inference of KLK1 Gene Family

To infer the evolutionary history of KLK1 duplications within B. brevicauda, we built a phylogenetic tree containing KLK1 genes from B. brevicauda and the related common shrew (Sorex araneus, for which a reference genome is available; NCBI accession numbers XM_012935295.1, XM_012935294.1, and XM_012935293.1). We used Eulipotyphlan KLK1 genes from the star-nosed mole (Condylura cristata) and the hedgehog (Erinaceus europaeus) (NCBI accession numbers XM_004694038.2 and XM_007531288.2, respectively) as outgroups. Protein sequences of the KLK1 genes for the common shrew, star-nosed mole, and hedgehog were downloaded from NCBI. Alignments were made from these sequences using the MUSCLE (Edgar 2004) plug-in for Geneious v. 11.0.4 (https://www.geneious.com), and trees were built in RAxML (Stamatakis 2014) using the GTR GAMMA nucleotide model with rapid bootstrapping (1,000 bootstraps).

Protein Modelling and Docking

Because substitutions in regulatory loops surrounding the catalytic cleft of BLTX have previously been shown to facilitate its toxicity (Aminetzach et al. 2009), we modelled the newly discovered KLK1 paralogous sequences to investigate their potential roles as toxins. Input protein sequences were trimmed of their signal peptides (as identified by SignalP) and submitted to MUSTER, an online homology-based protein structure prediction server (Wu and Zhang 2008). We visualized the predicted protein models and examined the electrostatic potential of surface residues using the APBS electrostatics plug-in in PyMOL (Schrödinger, LLC 2015). We also modelled a novel serine protease inhibitor found in our transcriptome, using the same methods as for the KLK1s, in order to model protein-protein interactions between the inhibitor and BLTX. We modelled the potential interaction of BLTX and this double-headed serine protease inhibitor using ClusPro (Comeau et al. 2004), treating BLTX as the receptor and the inhibitor as the ligand. Results from this prediction were also visualized using PyMOL.


Results

Sequencing and Assembly Statistics

The 10X Genomics library from one individual shrew (WH4) had a weighted mean molecule size of 20.32kb; sequencing on an Illumina HiSeq X generated 360,640,993 150bp paired-end reads. The reads were assembled at 31X effective coverage using Supernova 2.1.0, producing a final 1.66Gb assembly that contained 166,552 scaffolds with a scaffold N50 of 338,775bp, of which 16,115 scaffolds were longer than 10kb. Supernova estimated the total genome size from our sample to be 2.52Gb.

We also generated 150bp paired-end RNA-seq data from the submaxillary glands of two adult shrews (WH4 and WH5). We generated 12,239,696 paired-end reads on an Illumina HiSeq 4000 for the WH4 individual and 12,789,562 reads for the WH5 individual. After quality control and filtering, we assembled the resulting reads using Trinity with the default parameters. This produced assemblies containing 31,657 transcripts for the WH4 individual (WH4-denovo) and 29,630 transcripts for the WH5 individual (WH5-denovo).

Along with the short-read RNA-seq data, we also generated full-length cDNA sequences for one shrew (WH5) using Nanopore's MinION benchtop sequencer. After library preparation and ~10 hours of data acquisition, we generated 972,946 reads, which were reduced to 513,552 high-quality reads after quality filtering. These reads were then combined with the short-read RNA-seq data from WH5 in Trinity using the --long_reads parameter to produce a final hybrid short-read/long-read assembly (WH5-longread) that contained 29,808 total transcripts.

Identification of Putative Venom Genes


We predicted 17,684 ORFs for the WH4-denovo assembly, 15,727 ORFs for the WH5-denovo assembly, and 15,754 ORFs for the WH5-longread assembly using TransDecoder. After filtering all ORFs for peptide sequences containing a signal peptide, we retained 780, 759, and 762 transcripts with signal peptides in the WH4-denovo, WH5-denovo, and WH5-longread assemblies, respectively. BLAST searches of these signal-peptide-bearing transcripts against the Tox-Prot database identified 59 sequences with homology to known toxins in the WH4-denovo assembly, 58 in the WH5-denovo assembly, and 57 in the WH5-longread assembly. The results from this portion of the pipeline are summarized in Table 1. After annotation, a comparison of these putative-toxin transcripts among the three assemblies showed overlap in the transcripts recovered. Across all three assemblies, only nine of these transcripts were highly expressed (TPM > 1,000): BLTX, Blarinasin, proenkephalin, KLK1, a truncated KLK1 paralog, a double-headed protease inhibitor, oxytocin/neurophysin, endothelin-1, and a phospholipase A2 (PLA2). Most of the other highly expressed transcripts, which our toxin-search pipeline filtered out, belonged to known salivary genes (e.g., mucin and amylase).
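The candidate-toxin search above is a funnel of successive filters (predicted ORF, signal peptide, Tox-Prot homology, TPM > 1,000). The sketch below illustrates only that filtering logic, under stated assumptions: the `Transcript` record and its fields are hypothetical stand-ins for the combined outputs of ORF prediction, signal-peptide prediction, BLAST against Tox-Prot, and expression quantification, and the example records are invented. It is not the pipeline actually used in this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    """Hypothetical per-transcript summary combining several tool outputs."""
    name: str
    has_orf: bool               # an ORF was predicted for this transcript
    has_signal_peptide: bool    # a signal peptide was predicted in the ORF
    toxprot_hit: Optional[str]  # best Tox-Prot BLAST hit, or None if no hit
    tpm: float                  # expression level in transcripts per million


def toxin_candidates(transcripts, min_tpm=1000.0):
    """Apply the filters in order and record how many transcripts survive each one."""
    steps = [
        ("ORF", lambda t: t.has_orf),
        ("signal peptide", lambda t: t.has_signal_peptide),
        ("Tox-Prot homology", lambda t: t.toxprot_hit is not None),
        (f"TPM > {min_tpm:g}", lambda t: t.tpm > min_tpm),
    ]
    remaining = list(transcripts)
    counts = []
    for label, keep in steps:
        remaining = [t for t in remaining if keep(t)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Invented example records, for illustration only.
example = [
    Transcript("tx1", True, True, "KLK1-like", 25000.0),
    Transcript("tx2", True, True, "PLA2-like", 1500.0),
    Transcript("tx3", True, True, "KLK1-like", 40.0),     # homologous but lowly expressed
    Transcript("tx4", True, False, None, 90000.0),        # highly expressed, no signal peptide
    Transcript("tx5", False, False, None, 5.0),
]

hits, funnel = toxin_candidates(example)
for label, n in funnel:
    print(f"{label}: {n}")
print([t.name for t in hits])
```

Keeping the per-step counts alongside the final list mirrors how Table 1 reports the number of transcripts remaining after each stage.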

Further examination of the genomic region containing the known BLTX toxin and its paralog Blarinasin, using both the transcriptome and genome assemblies, revealed other KLK1 serine-protease paralogs that had not previously been identified in B. brevicauda (Figure 1). We found two additional KLK1 paralogs with homology to BLTX, one of which appeared to be missing the fifth exon of the gene (hereafter the ‘KLK1 paralog’ and the ‘truncated-KLK1 paralog,’ respectively). Both of these additional KLK1 paralogs were expressed at relatively high abundance in the submaxillary gland. We also discovered a third additional KLK1 paralog in our reference genome that was not expressed in the submaxillary gland but did contain a translatable open reading frame (hereafter the ‘non-expressed KLK1 paralog’). In our reference genome, all paralogs were arranged in a tandem array (Figure 2). We also infer that Blarinasin-1 and Blarinasin-2, which were treated as distinct paralogs in a previous investigation (Aminetzach et al. 2009), are likely the same gene.

Selection Tests

We investigated patterns of selection on seven of the nine putative-toxin genes and found that only three genes showed evidence of positive selection in branch-site tests (Table 2): BLTX (two sites: 187 V and 207 H; Figure 3), the non-expressed KLK1 paralog (one site: 186 D), and PLA2 (one site: 75 G). We did not perform selection tests on the double-headed protease inhibitor or the truncated-KLK1 paralog because we were unable to find orthologous sequences for either gene.
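A branch-site test is a likelihood-ratio test between an alternative model that allows a class of sites with ω > 1 on the foreground branch and a null model in which that ω is fixed at 1. The sketch below assumes the commonly used comparison of Model A against its null (as implemented, for example, in PAML's codeml); the log-likelihood values are invented, and the exact settings behind Table 2 may differ. The statistic 2ΔlnL is compared either with a χ² distribution with one degree of freedom (conservative) or with a 50:50 mixture of a point mass at zero and χ²₁.

```python
import math


def chi2_1_sf(x):
    """Survival function (upper-tail probability) of a chi-square with 1 df."""
    if x <= 0:
        return 1.0
    # For df = 1, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def branch_site_lrt(lnl_alt, lnl_null):
    """Likelihood-ratio test of branch-site Model A against its null (omega2 = 1).

    Returns (2*dlnL, p from chi2_1, p from a 50:50 mixture of 0 and chi2_1).
    """
    # Small negative differences can only come from optimizer noise, so clamp at 0.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_chi2 = chi2_1_sf(stat)
    p_mixture = 0.5 * p_chi2 if stat > 0 else 1.0
    return stat, p_chi2, p_mixture


# Invented log-likelihoods for one gene, for illustration only.
stat, p_chi2, p_mix = branch_site_lrt(lnl_alt=-5234.10, lnl_null=-5237.02)
print(f"2*dlnL = {stat:.2f}, p (chi2_1) = {p_chi2:.4f}, p (50:50 mixture) = {p_mix:.4f}")
```

A significant test indicates that some sites on the foreground branch have ω > 1; the individual sites (such as 187 V and 207 H in BLTX) are identified in a separate step, for example from posterior probabilities reported by the software.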

Mass Spectrometry of Saliva

LC/MS analysis of saliva from two adult B. brevicauda individuals identified 127 proteins with at least two unique identifying spectra for the WH4 individual and 97 proteins for the WH5 individual. Of these proteins, 74 were shared between the two saliva samples.

The results for putative venom proteins are summarized in Figure 4. Seven of the ten most abundant proteins were shared between the two individuals, including BLTX, Blarinasin, the truncated-KLK1 paralog, and PLA2. BLTX appeared to be the main component of the venom, comprising 11.34% of the abundance of all salivary proteins in WH4 and 7.57% in WH5. Together, BLTX, Blarinasin, and the truncated-KLK1 paralog accounted for 23.3% of all proteins identified in the WH4 sample and 19.16% of all proteins in the WH5 sample, suggesting that roughly a fifth to a quarter of the salivary protein in B. brevicauda is produced from the same set of paralogous genes. PLA2, in addition to being highly expressed in the submaxillary gland, was the third most abundant protein in WH4 (8.98% of all proteins) and the fourth most abundant in WH5 (6.63%). Proenkephalin, although highly expressed in the submaxillary gland, made up only 1.77% of all proteins in WH4 and 2.62% in WH5.
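The percentages above are relative abundances within each saliva sample. Because the Literature Cited includes the exponentially modified protein abundance index (emPAI; Ishihama et al. 2005), the sketch below assumes that approach: emPAI = 10^(N_observed / N_observable) − 1, with each protein's emPAI divided by the sample total to give mol%. The peptide counts are invented, the `mol_percent` helper is hypothetical, and the software actually used to produce these values may have computed them differently. The last step shows how the shares of the KLK1-derived paralogs are summed, as in the combined 23.3% and 19.16% figures.

```python
def empai(observed, observable):
    """emPAI = 10**(observed / observable) - 1 (Ishihama et al. 2005).

    observed: number of distinct peptides detected for the protein.
    observable: number of theoretically observable peptides (must be > 0).
    """
    return 10 ** (observed / observable) - 1


def mol_percent(counts, min_unique_spectra=2):
    """Convert per-protein counts for one sample into emPAI-based mol%.

    counts maps protein name -> (unique_spectra, observed, observable).
    Proteins with fewer than `min_unique_spectra` unique spectra are dropped,
    then each remaining emPAI is expressed as a percentage of the sample total.
    """
    kept = {
        name: empai(obs, obsbl)
        for name, (uniq, obs, obsbl) in counts.items()
        if uniq >= min_unique_spectra
    }
    total = sum(kept.values())
    return {name: 100.0 * value / total for name, value in kept.items()}


# Invented counts for a single hypothetical saliva sample.
sample = {
    "BLTX": (12, 9, 10),
    "Blarinasin": (6, 5, 9),
    "truncated-KLK1 paralog": (4, 3, 8),
    "PLA2": (5, 4, 6),
    "amylase": (20, 15, 30),
    "low-evidence protein": (1, 1, 12),  # removed: only one unique spectrum
}

abundance = mol_percent(sample)
klk1_family_share = sum(
    abundance[name] for name in ("BLTX", "Blarinasin", "truncated-KLK1 paralog")
)
for name, pct in sorted(abundance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.2f}%")
print(f"KLK1-derived paralogs combined: {klk1_family_share:.2f}%")
```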

We found no evidence of oxytocin/neurophysin in the salivary proteome of either shrew we sampled, despite its relatively high expression in the submaxillary gland; we therefore conclude that this protein is likely not an important component of B. brevicauda venom. Additionally, although the double-headed serine protease inhibitor was the most highly expressed transcript in the submaxillary gland, it was detected only in the saliva of the WH4 individual, where it comprised just 0.55% of the salivary proteome. Similarly, endothelin-1, despite being highly expressed in the submaxillary gland, was found at very low levels in the saliva of both shrews, constituting only 0.72% of all proteins in WH4 and 0.92% in WH5.

Phylogenetic Inference of KLK1

Our phylogenetic tree of the KLK1 gene family revealed two ancient duplications of the KLK1 gene that predate the divergence between B. brevicauda and S. araneus, the common shrew (Figure 5). As expected, the KLK1 genes from the hedgehog and star-nosed mole were outgroups to the shrew KLK1 genes. Low support values for the nodes separating BLTX, Blarinasin, and the S. araneus ortholog (Common Shrew KLK1-1) indicate that the order in which these genes diverged from each other is unresolved.

3D Protein Modelling of KLK1 Paralogs

The predicted three-dimensional structures and surface electrostatic potentials of BLTX and Blarinasin showed patterns similar to those found by Aminetzach et al. (2009): BLTX had a negatively charged catalytic pocket surrounded by generally positive regulatory loops (Figure 6.1), whereas Blarinasin had both a negatively charged pocket and generally negative surrounding regulatory loops (Figure 6.2). The KLK1 paralog had a 3D structure and electrostatic charge similar to those of Blarinasin (Figure 6.4) and of orthologous genes from other species.

The protein model for the non-expressed KLK1 paralog showed that it had accumulated at least a few positive residues in the regions surrounding the active site but lacked the negatively charged catalytic pocket characteristic of KLK1s (Figure 6.3). We did not model the structure of the four-exon (truncated) KLK1 paralog, because the missing fifth exon could cause misfolding in the predicted protein structure.
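The electrostatic comparisons above come from modelled 3D structures. As a much cruder, sequence-only check on the same idea, one can tally formally charged side chains in a loop segment near neutral pH, counting Lys and Arg as +1 and Asp and Glu as −1. The sketch below is that simplification (His, termini, and local pKa shifts are ignored), not the structure-based electrostatic method behind Figure 6, and the loop sequences are invented placeholders rather than the actual BLTX or Blarinasin loops.

```python
# Formal side-chain charges near pH 7; His, N/C termini and pKa shifts are ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq):
    """Approximate net side-chain charge of a peptide segment near pH 7."""
    return sum(CHARGE.get(aa, 0) for aa in seq.upper())


def describe(name, loops):
    """Sum the charge over several loop segments and print a one-line summary."""
    total = sum(net_charge(s) for s in loops)
    sign = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    print(f"{name}: net loop charge {total:+d} ({sign})")
    return total


# Hypothetical loop segments, invented for illustration only.
describe("toxin-like paralog", ["GKRNPK", "SRKD"])
describe("canonical-like paralog", ["GDENPE", "SEKD"])
```

A count like this cannot capture how residues are arranged around the catalytic pocket, which is why the structural models are needed for the comparisons reported here.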

The high expression of a double-headed serine-protease inhibitor in the submaxillary gland, and its potential to affect the function of BLTX (a serine protease), prompted us to investigate protein-protein interactions between the two using ClusPro. With BLTX treated as the receptor and the inhibitor as the ligand, ClusPro predicted that the protease inhibitor docks in the active site of BLTX (Figure 7). Specifically, the protease-binding loop of the inhibitor bound to BLTX with the reactive P1-P1’ site of the inhibitor positioned directly in the active site of BLTX. This configuration is the general mode of inhibition for many serine protease inhibitors, suggesting that this inhibitor may interact with BLTX in an inhibitory manner. Because the inhibitor is nearly absent from the saliva, any such interaction is likely confined to the submaxillary gland.


Discussion

In this study, we found the toxin composition of B. brevicauda venom to be relatively simple. Using a multi-omic approach that combined a de novo reference genome, a transcriptome of the submaxillary gland, and mass-spectrometry-based proteomics of saliva, we found that the main venom components in the saliva are likely limited to BLTX, PLA2, and proenkephalin. Two of these, BLTX and proenkephalin (particularly its shorter peptide, Soricidin), have previously been identified and shown to be toxic to mice and mealworms, respectively (Kita et al. 2004; Patent: US8003754B2). We also discovered additional KLK1 paralogs in a tandem array with BLTX that are also found in the saliva but do not appear to be toxic like BLTX. Finally, we found two proteins that are highly expressed in the submaxillary gland but largely absent from the saliva, and that may act as endogenous self-defense mechanisms against BLTX.

Putative

Consistent with previous findings (Kita et al. 2004; Aminetzach et al. 2009), BLTX showed evidence of positive selection, was expressed at high levels in the submaxillary gland, and was consequently a major constituent of the salivary proteins. This protein was previously shown to be lethal to mice, lowering blood pressure by cleaving kininogens to release kinins such as bradykinin (Kita et al. 2004). A previous study provided evidence that positive selection acting on lineage-specific insertions in and around the regulatory loops surrounding the catalytic pocket of BLTX has increased the substrate specificity of this protein relative to its KLK1 progenitor (Aminetzach et al. 2009). We were not able to test for selection on these insertions, because they are specific to B. brevicauda and were therefore trimmed from our alignment, which included multiple other mammal species. In contrast to our results, Aminetzach et al. (2009) found no evidence of positive selection acting at a gene-wide level. We detected positive selection at two sites (207 H and 187 V), possibly because our larger alignment, which included more divergent taxa, gave our analyses more power to detect selection (Fletcher and Yang 2010). Interestingly, one of these sites, 187 V, lies near the region surrounding the catalytic pocket of BLTX. However, because valine is a non-polar, uncharged amino acid, this substitution is unlikely to contribute to the effects reported by Aminetzach et al. (2009).

As in previous work, we also found the previously characterized KLK1 paralog Blarinasin in B. brevicauda (Kita et al. 2005). That study found that although Blarinasin has specificity for substrates similar to those of BLTX, it was not lethal when injected into mice. This may be because Blarinasin lacks the positive amino acid substitutions found in the regulatory loops surrounding the active site of BLTX (Aminetzach et al. 2009). Our transcriptome assembly also contained a truncated-KLK1 paralog that was present in the saliva of both shrews, and another KLK1 paralog that was not present in the saliva of either shrew. In addition, the genomic region encompassing all of these paralogs was bordered by another KLK1 gene that was not expressed in the submaxillary gland despite containing a translatable open reading frame. Because these KLK1 paralogs are arranged in a tandem array in the genome, at least some of them may be subject to the same means of regulation (e.g., promoter/enhancer, post-transcriptional, or post-translational). However, these KLK1 paralogs are expressed at different levels in both the transcriptome and the proteome, which may instead suggest that they are modulated through different regulatory mechanisms.


Our models of the three-dimensional structure and predicted electrostatic potential of the two additional KLK1 paralogs (the KLK1 paralog and the unexpressed KLK1 paralog) showed that neither possesses the positive amino-acid substitutions in the regulatory loops surrounding the active site that were reported for BLTX (Aminetzach et al. 2009). In fact, one of these sequences had both a negatively charged catalytic pocket and negatively charged surrounding regulatory regions, leading us to believe that it is the gene with canonical KLK1 activity in these shrews. The unexpressed KLK1 paralog found in our genome, on the other hand, had a relatively neutral catalytic pocket and regulatory loops, suggesting that it may have lost its function as a kallikrein and/or gained an entirely new function. This result is perhaps expected, as this paralog also showed evidence of positive selection in our selection analysis.

Genomic duplication of specific regulatory genes has been shown to underlie the evolution of venom across many evolutionarily disparate venomous organisms (Gibbs and Rossiter 2008; Vonk et al. 2013). Our phylogeny of the KLK1 gene family suggests that two older duplications of KLK1 preceded the duplication that gave rise to BLTX. These two earlier duplications appear to have occurred before the split between the Soricini and Blarinini lineages, as both Sorex araneus and B. brevicauda possess them along with the original KLK1 gene. The BLTX duplication appears to have occurred more recently, within the Blarinini lineage, because S. araneus does not possess a closely related paralog. An additional duplication specific to the Blarinini lineage produced a fifth KLK1 paralog found only in B. brevicauda. None of these duplicated paralogs were found in the hedgehog or star-nosed mole, suggesting that KLK1 duplication occurred within the Soricidae.

The proenkephalin gene that we discovered matched Soricidin in the Tox-Prot database and was highly expressed in the submaxillary gland, but it was present only at moderate levels as a protein in the saliva. Proenkephalin is a precursor protein that undergoes post-translational cleavage into multiple enkephalins, short peptides involved in opioid receptor signaling (Henry et al. 2017). Proenkephalins have been implicated as toxic components in both scorpion and fangblenny venoms, where they exhibit hypotensive activity (Zhang et al. 2012; Casewell et al. 2017). Soricidin is a small peptide, previously isolated from B. brevicauda, that has shown efficacy in certain cancer treatments (Bowen et al. 2013; Zhang et al. 2014). Specifically, it binds TRPV6 calcium channels with high affinity and can inhibit the movement of calcium across the cell membrane, an important physiological process in proliferating cancer cells (Bowen et al. 2013). Soricidin was also found to be highly effective at immobilizing mealworms (Patent: US8003754B2), so this toxin constituent is likely involved in B. brevicauda’s ability to cache and immobilize invertebrate prey for long periods of time.

The PLA2 gene found in our study had not previously been identified as a possible toxin in B. brevicauda’s venom. Although the toxicity of this protein has not been functionally assayed, several of its characteristics suggest an important function in shrew saliva: it is highly expressed in the submaxillary gland, abundant as a salivary protein, and shows evidence of positive selection. Phospholipase A2 enzymes (PLA2s) have been convergently recruited into venom arsenals across the animal kingdom and, depending on the venom system, can have drastically different pharmacological effects, including neurotoxic, myotoxic, inflammatory, and haemolytic activities (Fry and Wüster 2004; Kordiš 2011). PLA2s hydrolyze phospholipids and thus play an important role in lipid metabolism (Arni and Ward 1996). PLA2 paralogs and isoforms can vary widely in function and toxicity even within a single venomous species (Harris and Scott-Davey 2013). Interestingly, recent proteomic work on the saliva of Neomys fodiens, a distantly related venomous shrew species, revealed a protein with homology to PLA2 that was speculated to contribute to the paralytic effects of N. fodiens venom (Kowalski et al. 2017). The extreme variability of PLA2 function across venom systems makes it difficult to propose an exact function for PLA2 within the B. brevicauda venom system, but it is nonetheless a likely candidate venom gene. Further proteomic isolation and functional assays are needed to elucidate the role of this protein within this venom system.

Putative Endogenous Venom Defense?

Many venomous animals are known to have inhibitory proteins in their circulatory systems that can act as direct antagonists to their own venom components (Mackessy 2010; Santos-Filho and Santos 2017). We believe this may be the case for the double-headed protease inhibitor found in B. brevicauda’s submaxillary gland. This inhibitor was one of the most highly expressed transcripts in the submaxillary gland, but it was present in the saliva only at extremely low levels and in just one of the two shrews sampled. This gene contains two Kazal-type domains in tandem. Kazal-type serine protease inhibitors can contain multiple Kazal domains, each made up of a variable number of amino acids, and this variability is thought to confer different specificities for their target serine proteases (Rimphanitchayakit and Tassanakajon 2010). Kazal-type serine protease inhibitors have been found in the venom of the eyelash and side-striped palm vipers (Bothriechis schlegelii and B. lateralis), but they are relatively rare venom constituents (Durban et al. 2011). Our protein-protein interaction models indicate a high potential for the double-headed protease inhibitor to dock directly in the active site of BLTX. Previous studies of Kazal-type protease inhibitor domains have similarly shown protease-binding loops binding directly to the active sites of the proteases they inhibit (Krowarsch et al. 2003; Rimphanitchayakit and Tassanakajon 2010). Because the main component of B. brevicauda venom is a serine protease (BLTX) and we see little evidence of inhibitor proteins in the saliva, we suggest that this inhibitor acts as a self-defense mechanism against BLTX within the submaxillary gland, where it is highly expressed.

Another potential self-defense protein in B. brevicauda that is highly expressed in the submaxillary gland, but not abundant as a salivary protein, is endothelin-1. This gene is homologous and structurally similar to the venom toxin sarafotoxin, which is extremely lethal to small vertebrates (Kochva et al. 1993) because it causes dramatic increases in blood pressure through extreme vasoconstriction (Wollberg et al. 1989; Mackessy 2010). We suspect that the high expression of endothelin-1 within the submaxillary gland of B. brevicauda may cause vasoconstriction that counteracts the vasodilatory effects of BLTX. This would be consistent with previous work showing that B. brevicauda tolerates injected extracts of its own submaxillary gland better than mice do (Pearson 1950).

Venom simplicity in relation to diet complexity

One of the lingering questions about the evolution of venom in B. brevicauda, and in other venomous shrew species, is which selective pressures led to the production of venom. Diet complexity has been shown to correspond with greater venom complexity in many organisms, including cone snails, snakes, and spiders (Daltry et al. 1996; Phuong et al. 2016; Pekár et al. 2018). However, although B. brevicauda feeds on a wide range of prey from different animal phyla, including Arthropoda, Annelida, Mollusca, and Chordata (Hamilton 1941; George et al. 1986), we find that its saliva likely contains just a few toxic proteins. Most non-venomous shrews and other Eulipotyphlans have diets as broad as those of B. brevicauda and other venomous shrews (Hamilton 1930; Wall 1990; Churchfield and Sheftel 1994). This order of mammals relies heavily on active foraging, specialized dentition, and masticatory systems for capturing and consuming prey (Dufton 1992; Furió et al. 2010; Folinsbee 2013). Thus, the simplicity of B. brevicauda venom suggests that the evolution of venom does not represent a major transition in foraging strategy in this group, but that venom may be a highly specialized accessory to their broader prey-capture strategy.

Other hypotheses proposed to explain why venom evolved in B. brevicauda include the “hunting big” and “hoarding small” hypotheses (Dufton 1992; Rode-Margono and Nekaris 2015). The “hunting big” hypothesis proposes that venom evolved in B. brevicauda to facilitate the capture of small vertebrate prey, and the effects of BLTX could plausibly contribute to this. The “hoarding small” hypothesis, on the other hand, proposes that venom evolved to aid the long-term caching of invertebrate prey, a behavior observed in B. brevicauda as well as in several non-venomous shrews (Hamilton 1930; Hamilton 1941; Robinson and Brodie 1982; Martin 1984). The functional effects of Soricidin are consistent with this hypothesis. PLA2 toxins are found in a wide range of venomous animals that specialize on either vertebrate or invertebrate prey, so without further functional assays it is not clear whether this putative toxin helps B. brevicauda capture vertebrate or invertebrate prey. To better evaluate either or both hypotheses, functional studies of all toxin types on both vertebrate and invertebrate prey would be helpful. For instance, BLTX has been tested only on mice, whereas Soricidin has been tested only on mealworms. It would also be helpful to know how much vertebrate and invertebrate prey contribute to the total caloric intake of B. brevicauda, as well as of other venomous and non-venomous species. A recent study on Neomys fodiens, another venomous shrew, suggests that N. fodiens caches significantly fewer invertebrate prey than S. araneus, a non-venomous shrew (Kowalski and Rychlik 2018). However, N. fodiens was able to overpower and cache larger prey more quickly than S. araneus, possibly suggesting that venom allows a dietary expansion toward larger prey. Further dietary analyses of B. brevicauda would be vital in determining whether functional convergence in prey type is evident between these shrews.

We cannot rule out the possibility that B. brevicauda seasonally modulates the toxin composition of its venom to reflect prey availability. Venom composition has been shown to vary seasonally in multiple venomous animals (Gubenšek et al. 1974; Cologna et al. 2018). Shrews are active in winter, and many species hoard prey to survive the winter months because of their high metabolic demands and reduced prey availability (Vander Wall 1990). Blarina brevicauda has been observed to change its hoarding preference at different times of the year. For instance, Martin (1984) found B. brevicauda storing seeds in October, hoarding insects in November, and caching mice by the end of November. Throughout most of the geographic range of B. brevicauda, winters are extremely cold and invertebrate prey are less readily available. It is therefore possible that the composition of B. brevicauda’s venom changes to accommodate the availability of different types of prey. We conducted our study in the summer and thus may have missed seasonal changes in venom components or in their abundance.


Conclusion

Our research represents one of the first comprehensive multi-omic characterizations of a mammalian venom system. The components of venom in B. brevicauda appear to be relatively simple. Venom has evolved in at least two other Eulipotyphlan genera, and possibly several more, given multiple anecdotal accounts of other shrew species paralyzing their prey (Dufton 1992; Folinsbee 2013). We do not know the molecular bases of venom in these other lineages, nor whether their functional convergence evolved from similar proteins. Herein, we provide a comprehensive investigation of the venom system in short-tailed shrews that furthers our understanding of how toxicity for prey capture has evolved in mammals. These findings will be useful for future comparative studies with other venomous and non-venomous Eulipotyphlans.


Literature Cited

Aminetzach YT, Srouji JR, Kong CY, Hoekstra HE. 2009. Convergent Evolution of Novel Protein Function in Shrew and Lizard Venom. Curr. Biol. [Internet] 19:1925–1931. Available from: http://dx.doi.org/10.1016/j.cub.2009.09.022

Arni RK, Ward RJ. 1996. Phospholipase A2—a structural review. Toxicon [Internet] 34:827–841. Available from: http://www.sciencedirect.com/science/article/pii/0041010196000360

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics [Internet] 30:2114–2120. Available from: http://dx.doi.org/10.1093/bioinformatics/btu170

Bowen CV, DeBay D, Ewart HS, Gallant P, Gormley S, Ilenchuk TT, Iqbal U, Lutes T, Martina M, Mealing G, et al. 2013. In Vivo Detection of TRPV6-Rich Tumors with Anti-Cancer Peptides Derived from Soricidin. PLoS One 8:1–11.

Bray NL, Pimentel H, Melsted P, Pachter L. 2016. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. [Internet] 34:525. Available from: https://doi.org/10.1038/nbt.3519

Bushnell B. 2014. BBMap: A Fast, Accurate, Splice-Aware Aligner. In: United States. Available from: https://www.osti.gov/servlets/purl/1241166

Casewell NR, Visser JC, Baumann K, Dobson J, Han H, Kuruppu S, Morgan M, Romilio A, Weisbecker V, Mardon K, et al. 2017. The Evolution of Fangs, Venom, and Mimicry Systems in Blenny Fishes. Curr. Biol. [Internet] 27:1184–1191. Available from: http://www.sciencedirect.com/science/article/pii/S0960982217302695

Casewell NR, Wüster W, Vonk FJ, Harrison RA, Fry BG. 2013. Complex cocktails: the evolutionary novelty of venoms. Trends Ecol. Evol. 28:219–229.

Churchfield S, Sheftel BI. 1994. Food niche overlap and ecological separation in a multi-species community of shrews in the Siberian taiga. J. Zool. 234:105–124.

Cologna CT, Rodrigues RS, Santos J, de Pauw E, Arantes EC, Quinton L. 2018. Peptidomic investigation of Neoponera villosa venom by high-resolution mass spectrometry: seasonal and nesting habitat variations. J. Venom. Anim. Toxins Incl. Trop. Dis. [Internet] 24:6. Available from: https://doi.org/10.1186/s40409-018-0141-3


Comeau SR, Gatchell DW, Vajda S, Camacho CJ. 2004. ClusPro: an automated docking and discrimination method for the prediction of protein complexes. Bioinformatics [Internet] 20:45–50. Available from: http://dx.doi.org/10.1093/bioinformatics/btg371

Daltry JC, Wüster W, Thorpe RS. 1996. Diet and snake venom evolution. Nature [Internet] 379:537–540. Available from: https://doi.org/10.1038/379537a0

Duda TF, Palumbi SR. 1999. Molecular genetics of ecological diversification: Duplication and rapid evolution of toxin genes of the venomous gastropod Conus. Proc. Natl. Acad. Sci. [Internet] 96:6820–6823. Available from: http://www.pnas.org/content/96/12/6820.abstract

Dufton MJ. 1992. Venomous mammals. Pharmacol. Ther. [Internet] 53:199–215. Available from: http://www.sciencedirect.com/science/article/pii/016372589290009O

Durban J, Juárez P, Angulo Y, Lomonte B, Flores-Diaz M, Alape-Girón A, Sasa M, Sanz L, Gutiérrez JM, Dopazo J, et al. 2011. Profiling the venom gland transcriptomes of Costa Rican snakes by 454 pyrosequencing. BMC Genomics [Internet] 12:259. Available from: https://www.ncbi.nlm.nih.gov/pubmed/21605378

Eadie WR. 1952. Shrew Predation and Vole Populations on a Localized Area. J. Mammal. [Internet] 33:185–189. Available from: http://www.jstor.org/stable/1375927

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. [Internet] 32:1792–1797. Available from: https://dx.doi.org/10.1093/nar/gkh340

Fletcher W, Yang Z. 2010. The Effect of Insertions, Deletions, and Alignment Errors on the Branch-Site Test of Positive Selection. Mol. Biol. Evol. [Internet] 27:2257–2267. Available from: https://doi.org/10.1093/molbev/msq115

Folinsbee KE. 2013. Evolution of venom across extant and extinct eulipotyphlans. Comptes Rendus Palevol [Internet] 12:531–542. Available from: http://www.sciencedirect.com/science/article/pii/S1631068313000717

Frankish A, Vullo A, Zadissa A, Yates A, Thormann A, Parker A, Gall A, Moore B, Walts B, Aken BL, et al. 2017. Ensembl 2018. Nucleic Acids Res. [Internet] 46:D754–D761. Available from: https://doi.org/10.1093/nar/gkx1098

Fry BG, Roelants K, Champagne DE, Scheib H, Tyndall JDA, King GF, Nevalainen TJ, Norman JA, Lewis RJ, Norton RS, et al. 2009. The Toxicogenomic Multiverse: Convergent Recruitment of Proteins Into Animal Venoms. Annu. Rev. Genomics Hum. Genet. [Internet] 10:483–511. Available from: https://doi.org/10.1146/annurev.genom.9.081307.164356


Fry BG, Wüster W. 2004. Assembling an Arsenal: Origin and Evolution of the Snake Venom Proteome Inferred from Phylogenetic Analysis of Toxin Sequences. Mol. Biol. Evol. [Internet] 21:870–883. Available from: https://dx.doi.org/10.1093/molbev/msh091

Furió M, Agustí J, Mouskhelishvili A, Sanisidro Ó, Santos-Cubedo A. 2010. The paleobiology of the extinct venomous shrew Beremendia (Soricidae, Insectivora, Mammalia) in relation to the geology and paleoenvironment of Dmanisi (Early Pleistocene, Georgia). J. Vertebr. Paleontol. [Internet] 30:928–942. Available from: https://doi.org/10.1080/02724631003762930

George SB, Choate JR, Genoways HH. 1986. Blarina brevicauda. Mamm. Species [Internet]:1–9. Available from: http://www.jstor.org/stable/3504010?seq=1#page_scan_tab_contents

Gibbs HL, Rossiter W. 2008. Rapid Evolution by Positive Selection and Gene Gain and Loss: PLA2 Venom Genes in Closely Related Sistrurus Rattlesnakes with Divergent Diets. J. Mol. Evol. [Internet] 66:151–166. Available from: https://doi.org/10.1007/s00239-008-9067-7

Gubenšek F, Sket D, Turk V, Lebez D. 1974. Fractionation of Vipera ammodytes venom and seasonal variation of its composition. Toxicon [Internet] 12:167–168. Available from: http://www.sciencedirect.com/science/article/pii/0041010174902414

Hamilton WJ. 1930. The Food of the Soricidae. J. Mammal. [Internet] 11:26–39. Available from: http://www.jstor.org/stable/1373782

Hamilton WJ. 1941. The Food of Small Forest Mammals in Eastern United States. J. Mammal. [Internet] 22:250–263. Available from: http://www.jstor.org/stable/1374950

Hargreaves AD, Mulley JF. 2015. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing. Pochon X, editor. PeerJ [Internet] 3:e1441. Available from: https://doi.org/10.7717/peerj.1441

Harris JB, Scott-Davey T. 2013. Secreted phospholipases A2 of snake venoms: effects on the peripheral neuromuscular system with comments on the role of phospholipases A2 in disorders of the CNS and their uses in industry. Toxins (Basel). [Internet] 5:2533–2571. Available from: https://www.ncbi.nlm.nih.gov/pubmed/24351716

Henry MS, Gendron L, Tremblay ME, Drolet G. 2017. Enkephalins: Endogenous Analgesics with an Emerging Role in Stress Resilience. Neural Plast. 2017.

Ishihama Y, Oda Y, Tabata T, Sato T, Nagasu T, Rappsilber J, Mann M. 2005. Exponentially Modified Protein Abundance Index (emPAI) for Estimation of Absolute Protein Amount in Proteomics by the Number of Sequenced Peptides per Protein. Mol. Cell. Proteomics [Internet] 4:1265–1272. Available from: http://www.mcponline.org/content/4/9/1265.abstract

Kita M, Nakamura Y, Okumura Y, Ohdachi SD, Oba Y, Yoshikuni M, Kido H, Uemura D. 2004. Blarina toxin, a mammalian lethal venom from the short-tailed shrew Blarina brevicauda: Isolation and characterization. Proc. Natl. Acad. Sci. U. S. A. [Internet] 101:7542–7547. Available from: http://www.pnas.org/content/101/20/7542.abstract

Kita M, Okumura Y, Ohdachi SD, Oba Y, Yoshikuni M, Nakamura Y, Kido H, Uemura D. 2005. Purification and characterisation of blarinasin, a new tissue kallikrein-like protease from the short-tailed shrew Blarina brevicauda: Comparative studies with blarina toxin. Biol. Chem. 386:177–182.

Kochva E, Bdolah A, Wollberg Z. 1993. Sarafotoxins and endothelins: evolution, structure and function. Toxicon [Internet] 31:541–568. Available from: http://www.sciencedirect.com/science/article/pii/004101019390111U

Kordiš D. 2011. Evolution of Phospholipase A2 Toxins in Venomous Animals.

Kowalski K, Marciniak P, Rosiński G, Rychlik L. 2017. Evaluation of the physiological activity of venom from the Eurasian water shrew Neomys fodiens. Front. Zool. 14.

Kowalski K, Rychlik L. 2018. The role of venom in the hunting and hoarding of prey differing in body size by the Eurasian water shrew, Neomys fodiens. J. Mammal. 99:351–362.

Krowarsch D, Cierpicki T, Jelen F, Otlewski J. 2003. Canonical protein inhibitors of serine proteases. Cell. Mol. Life Sci. 60:2427–2444.

Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods [Internet] 9:357. Available from: http://dx.doi.org/10.1038/nmeth.1923

Ligabue-Braun R. 2015. Evolution of Venomous Animals and Their Toxins. Available from: http://link.springer.com/10.1007/978-94-007-6727-0

Mackessy SP. 2010. Handbook of Venoms and Toxins of Reptiles. Boca Raton: CRC Press.

Magoč T, Salzberg SL. 2011. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics [Internet] 27:2957–2963. Available from: http://dx.doi.org/10.1093/bioinformatics/btr507


Martin IG. 1981. Venom of the Short-Tailed Shrew (Blarina brevicauda) as an Immobilizing Agent. J. Mammal. [Internet] 62:189–192. Available from: http://www.jstor.org/stable/1380494

Martin IG. 1984. Factors affecting food hoarding in the short-tailed shrew Blarina brevicauda. Mammalia 48:65–72.

Pearson OP. 1942. On the Cause and Nature of a Poisonous Action Produced by the Bite of a Shrew (Blarina brevicauda). J. Mammal. [Internet] 23:159–166. Available from: http://www.jstor.org/stable/1375068

Pearson OP. 1950. The submaxillary glands of shrews. Anat. Rec. [Internet] 107:161–169. Available from: https://doi.org/10.1002/ar.1091070206

Pekár S, Bočánek O, Michálek O, Petráková L, Haddad CR, Šedo O, Zdráhal Z. 2018. Venom gland size and venom complexity—essential trophic adaptations of venomous predators: A case study using spiders. Mol. Ecol. [Internet] 27:4257–4269. Available from: https://doi.org/10.1111/mec.14859

Petersen TN, Brunak S, von Heijne G, Nielsen H. 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods [Internet] 8:785. Available from: http://dx.doi.org/10.1038/nmeth.1701

Phuong MA, Mahardika GN, Alfaro ME. 2016. Dietary breadth is positively correlated with venom complexity in cone snails. BMC Genomics [Internet] 17:1–15. Available from: http://dx.doi.org/10.1186/s12864-016-2755-6

Rimphanitchayakit V, Tassanakajon A. 2010. Structure and function of invertebrate Kazal-type serine proteinase inhibitors. Dev. Comp. Immunol. [Internet] 34:377–386. Available from: http://www.sciencedirect.com/science/article/pii/S0145305X09002560

Robinson DE, Brodie ED. 1982. Food Hoarding Behavior in the Short-tailed Shrew Blarina brevicauda. Am. Midl. Nat. [Internet] 108:369–375. Available from: http://www.jstor.org/stable/2425498

Rode-Margono EJ, Nekaris AK. 2015. Cabinet of Curiosities: Venom Systems and Their Ecological Function in Mammals, with a Focus on Primates. Toxins 7.

Santos-Filho NA, Santos CT. 2017. Alpha-type phospholipase A2 inhibitors from snake blood. J. Venom. Anim. Toxins Incl. Trop. Dis. [Internet] 23:19. Available from: https://www.ncbi.nlm.nih.gov/pubmed/28344595


Schrödinger, LLC. 2015. The PyMOL Molecular Graphics System, Version 1.8.

Silla-Martínez JM, Capella-Gutiérrez S, Gabaldón T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics [Internet] 25:1972–1973. Available from: https://dx.doi.org/10.1093/bioinformatics/btp348

Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics [Internet] 30:1312–1313. Available from: https://dx.doi.org/10.1093/bioinformatics/btu033

Vander Wall SB. 1990. Food Hoarding in Animals. Chicago: University of Chicago Press.

Vonk FJ, Casewell NR, Henkel CV, Heimberg AM, Jansen HJ, McCleary RJR, Kerkkamp HME, Vos RA, Guerreiro I, Calvete JJ, et al. 2013. The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system. Proc. Natl. Acad. Sci. [Internet] 110:20651–20656. Available from: http://www.pnas.org/content/110/51/20651.abstract

Wollberg Z, Bdolah A, Kochva E. 1989. Vasoconstrictor effects of sarafotoxins in rabbit aorta: Structure-function relationships. Biochem. Biophys. Res. Commun. [Internet] 162:371–376. Available from: http://www.sciencedirect.com/science/article/pii/0006291X89920068

Wu S, Zhang Y. 2008. MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins Struct. Funct. Genet. 72:547–556.

Yang Z. 2007. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. [Internet] 24:1586–1591. Available from: https://dx.doi.org/10.1093/molbev/msm088

Zhang L-L, Zhang Q, Chen B, Wang Y, Liu C-C, Yang H. 2014. Biotoxins for Cancer Therapy. Asian Pacific J. Cancer Prev. 15:4753–4758.

Zhang Y, Xu J, Wang Z, Zhang X, Liang X, Civelli O. 2012. BmK-YA, an enkephalin-like peptide in scorpion venom. PLoS One [Internet] 7:e40417. Available from: https://www.ncbi.nlm.nih.gov/pubmed/22792309


Appendix: Tables and Figures

| Assembly name | WH4-deNovo | WH5-deNovo | WH5-LongReads |
|---|---|---|---|
| Raw reads assembled | 2,000,000 150 bp paired-end | 2,000,000 150 bp paired-end | 2,000,000 150 bp paired-end + 500,000 MinION reads |
| Transcripts assembled | 31,657 | 29,630 | 29,808 |
| Assembled transcripts containing ORFs | 17,684 | 15,727 | 15,754 |
| ORFs containing a signal peptide | 780 | 759 | 762 |
| Proteins with hits to the Tox-Prot database | 59 | 58 | 57 |

Table 1: Results from each step of the bioinformatic processing of raw RNA-seq reads from the submaxillary gland transcriptomes of two B. brevicauda individuals (WH4 and WH5).


| Protein | M8a lnL | M8 lnL | LRT statistic | P-value | Sites under positive selection |
|---|---|---|---|---|---|
| BLTX | -4245.426066 | -4243.175318 | 4.501496 | 0.033867* | 2 sites: 187 V, 207 H |
| Blarinasin | -5350.820439 | -5351.626880 | 1.612882 | 0.204099 | |
| KLK1 | -5212.783533 | -5213.394167 | 1.221268 | 0.269125 | |
| Non-Expressed KLK1 | -5384.553916 | -5387.709546 | 6.31126 | 0.011998* | 1 site: 186 D |
| Soricidin | -6478.892152 | -6478.733397 | 0.31751 | 0.573114 | |
| Endothelin-1 | -3787.737959 | -3787.711426 | 0.053066 | 0.817923 | |
| PLA2 | -3928.358225 | -3925.249968 | 6.216514 | 0.012657* | 1 site: 75 G |
| Oxytocin/Neurophysin | -1468.823853 | -1468.962410 | -0.277114 | N/A | |

Table 2: Results from positive selection analyses conducted with the codeml program in PAML. M8a is the null model, in which dN/dS (ω) for the additional site class is fixed at 1, with 11 site classes in total. M8 is the alternative model, in which ω for that class is estimated and may exceed 1, again across 11 site classes. LRT, likelihood ratio test statistic (2ΔlnL); * P < 0.05. Sites under positive selection are given as position and amino acid.
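You can reproduce the LRT and P-value columns of Table 2 from the two log-likelihoods. codeml reports the lnL values, and the test is usually computed separately. The sketch below assumes a χ² reference distribution with one degree of freedom. This assumption is consistent with the reported numbers: for example, a statistic of 4.501496 gives P ≈ 0.0339. The helper name `lrt_m8a_vs_m8` is hypothetical and not part of PAML.

```python
import math
from typing import Optional, Tuple


def lrt_m8a_vs_m8(lnl_m8a: float, lnl_m8: float) -> Tuple[float, Optional[float]]:
    """Likelihood ratio test comparing codeml site models M8a (null) and M8.

    Returns the statistic 2 * (lnL_M8 - lnL_M8a) and a P-value from a
    chi-square distribution with one degree of freedom (an assumption).
    If the statistic is not positive, M8 fits no better than M8a and the
    P-value is returned as None (shown as N/A in Table 2).
    """
    stat = 2.0 * (lnl_m8 - lnl_m8a)
    if stat <= 0.0:
        return stat, None
    # Survival function of chi-square with df = 1: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Log-likelihoods copied from Table 2 (M8a first, then M8).
for name, m8a, m8 in [
    ("BLTX", -4245.426066, -4243.175318),
    ("PLA2", -3928.358225, -3925.249968),
    ("Oxytocin/Neurophysin", -1468.823853, -1468.962410),
]:
    stat, p = lrt_m8a_vs_m8(m8a, m8)
    print(name, round(stat, 6), "N/A" if p is None else round(p, 6))
```

Using χ² with one degree of freedom is the conservative choice. A 50:50 mixture of χ²₀ and χ²₁ would halve each positive-statistic P-value.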

KLK_not_express 1   MCFLLLCLALSLLGTAYTHPTYAGHGKSGHIC-SLQCEKQFQPWQVTLYFNRENIGLCGG
KLK1            1   MCFLLLCLALTLGGTG-AVFPLPGIQIEARIYGGWECEKHSQPWQAAIYYNQGF--LCGA
Blarinasin      1   MYLLLLCLPLTLMGTG-AVPPGPSIEIHPRIVGGWECDKHSQPWQALLTFTNGLDGVCGG
4_Exon_KLK      1   MYFLLLCLALTLMGTG-AAPPYPGIQIHARIVGGWECDKHSQPWQAVLTFAK--NGFCGG
BLTX            1   MCFLLLCLTLTLAGTG-AVPTGPSIEIHSRIIGGWECDKHSQPWQALLTFTRKHNSVCGG

KLK_not_express 60  VLIHPKWVLTAAHCLGENYHIWLGLQGQTLNLSKAQHNWVSGKFPHPLY-MTQRNRWKSS
KLK1            58  VLVHPMWVLTAAHCIDRDYKVWLGLHNSSAPESTAQFFRVSESVLHPLFNLSLLIPMGNP
Blarinasin      60  VLVHPQWVLTAAHCIGDNYKIKLGLHDRFSKDDPFQEFQVSASFPHPSYNMRLLKLLLSD
4_Exon_KLK      58  VLVHPQWVLTAAHCFQDNIKVILGLHDLVSNEDTVQKIQVNAIFLHPLYNMTLRNLLKHH
BLTX            60  VLVHPQWVLTAAHCIGDNYKVLLGLHDRSSKESTVQEARVSARFPHPLYNMTLLNLLLSH

KLK_not_express 119 KIMYLINSGNARKVDYSNDLMLLRLELPAQLSDTVQVLDLPTQEPAEGSTCYIAAWSINY
KLK1            118 DMTWKEFVDTFQGVDFSHDLMLLRLDRPAVLTDTVKVLDLPTQEPQVGSKCLTSGWGSTD
Blarinasin      120 ELNDTYYDEISLGADFSHDLMMMQLEKPVQLNDAVQVLDLPTQEPQVGSKCHASGWGSMD
4_Exon_KLK      118 TKNSL---KTFRRADFSHDLMLLHLEHPVQLTDAVQVLDLPTQEPQVGSNCYASGWGSIN
BLTX            120 KMNLTFFYKTFLGADFSHDLMLLRLDQPVQLTDAVQVLDLPTQEPQVGSTCHVSGWGRTS

KLK_not_express 179 PETSRPID-TSSKLQCVNFKLLSNNVCGRNYVEKVTDTMLCAGRMEGSKGSCMGDSGAPL
KLK1            178 SYKGETIVKLSRELRCVDLDLLPNDDCAKAQIAKVTEYMLCAGVMEGGKDTCVGDSGGPL
Blarinasin      180 PYSRIDFP-RTGKLQCVDLTLMSNNECSRSHIFKITDDMLCAGHIKGRKDTCGGDSGGPL
4_Exon_KLK      175 PDAKNSFV-LPKTLQCVDLALLPNEICSRAYIFKMTEAMLCAGHMKGGKDTCG------
BLTX            180 QNYENSFV-LPEKLQCVEFTLLSNNECSHAHMFKVTEAMLCAGHMEGGKDSCVGDSGGPL

KLK_not_express 238 ICDGMLHGIASWGAPPCSPHYKSGLFVKIFPYVNWIQETIKANT
KLK1            238 ICDGVFQGITSWGVGPCAYRQKPGLYVKLFSYVDWIRETIATHS
Blarinasin      239 ICDGVFQGTTSWGSYPCGKPRMPGVYVKIFSHVDWIREIIATHS
4_Exon_KLK          ------
BLTX            239 ICDGVFQGIASWGSSPCGQQGRPGIYVKVFLYISWIQETIKAHS

Figure 1: Alignment of all kallikrein-like serine proteases identified in the genomic and transcriptomic analyses.
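Pairwise percent identity is one common way to summarize how similar two paralogs in an alignment like Figure 1 are. The sketch below is an illustrative helper and is not part of the thesis pipeline; the function name `percent_identity` is hypothetical. It assumes that any column where either sequence has a gap is excluded, which is only one of several conventions for handling gaps.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a multiple sequence alignment.

    Columns where either sequence has a gap are skipped, so identity is
    computed only over positions where both sequences carry a residue.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    paired = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not paired:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for a, b in paired if a == b)
    return 100.0 * matches / len(paired)


# Usage on the first alignment block for BLTX and Blarinasin (Figure 1).
bltx_1 = "MCFLLLCLTLTLAGTG-AVPTGPSIEIHSRIIGGWECDKHSQPWQALLTFTRKHNSVCGG"
blarinasin_1 = "MYLLLLCLPLTLMGTG-AVPPGPSIEIHPRIVGGWECDKHSQPWQALLTFTNGLDGVCGG"
print(f"{percent_identity(bltx_1, blarinasin_1):.1f}%")
```

For whole-protein comparisons, concatenate each sequence's rows across all alignment blocks before calling the function.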


[Figure 2 plot: "Kallikrein Genomic Orientation and Expression"; y-axis: expression level (TPM), 0 to 14,000; x-axis, left to right: KLK15, KLK1, Blarinasin, BLTX, 4-Exon Kallikrein, Non-Expressed Kallikrein, ACPT]

Figure 2: Genomic orientation of the KLK1 tandem array in B. brevicauda. The flanking genes KLK15 and ACPT are shown, along with expression values (TPM) for each KLK1 paralog.
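The TPM (transcripts per million) values in Figure 2 normalize read counts for both transcript length and sequencing depth. The sketch below shows the standard TPM calculation. It uses hypothetical transcript names and counts, not thesis data, and this section does not name the quantification software used. Many quantification tools also use an effective transcript length rather than the raw length.

```python
from typing import Dict, Tuple


def tpm(counts_and_lengths: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Transcripts per million (TPM) from read counts and transcript lengths.

    counts_and_lengths maps a transcript name to (read_count, length_in_bp).
    """
    # Step 1: length-normalize each count (reads per base of transcript).
    rates = {name: count / length for name, (count, length) in counts_and_lengths.items()}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    # Step 2: rescale so that the values in the sample sum to one million.
    return {name: rate / total * 1e6 for name, rate in rates.items()}


# Hypothetical counts and lengths, for illustration only (not thesis data).
example = {"paralog_A": (100, 1000), "paralog_B": (200, 2000), "paralog_C": (300, 1000)}
print(tpm(example))
```

Because every sample sums to one million, TPM values can be compared across paralogs within a sample. In this example, paralog_A and paralog_B receive equal TPM even though paralog_B has twice as many reads, because it is twice as long.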


Figure 3: Three-dimensional model of BLTX showing sites under positive selection (magenta) and active-site residues (red). The positively selected sites lie away from the active site.


[Figure 4 plot: "Relative Abundance of Putative Toxins in Salivary Proteome"; y-axis: relative abundance (%), 0 to 12; x-axis: BLTX, KLK1, Soricidin, Blarinasin, Endothelin, Truncated-KLK1, Oxytocin/Neurophysin, Double-Headed Protease Inhibitor; series: WH4, WH5]

Figure 4: Relative abundance of putative venom proteins in the saliva of individuals WH4 and WH5.
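Ishihama et al. (2005), cited in the references above, define emPAI = 10^(N_observed / N_observable) − 1. They estimate each protein's molar share as its emPAI divided by the summed emPAI of all proteins in the sample. This section does not state how the abundances in Figure 4 were computed. Assuming they were derived this way, the calculation can be sketched as below. The protein names and peptide counts are hypothetical.

```python
from typing import Dict, Tuple


def empai(n_observed: int, n_observable: int) -> float:
    """Exponentially modified protein abundance index (Ishihama et al. 2005)."""
    if n_observable <= 0:
        raise ValueError("protein must have at least one observable peptide")
    return 10 ** (n_observed / n_observable) - 1


def relative_abundance(peptides: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Percent abundance of each protein as its share of the summed emPAI.

    peptides maps a protein name to (observed peptides, observable peptides).
    """
    scores = {name: empai(obs, total) for name, (obs, total) in peptides.items()}
    grand_total = sum(scores.values())
    if grand_total == 0:
        raise ValueError("no peptides observed")
    return {name: 100.0 * s / grand_total for name, s in scores.items()}


# Hypothetical peptide counts, for illustration only (not thesis data).
example = {"protein_A": (2, 4), "protein_B": (1, 1), "protein_C": (0, 5)}
print(relative_abundance(example))
```

In practice, the denominator sums over every protein identified in the saliva, not only the putative toxins. Under this assumption, that is why the toxin percentages in Figure 4 need not add up to 100%.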


Figure 5: Phylogenetic reconstruction of KLK1 sequences from B. brevicauda and the common shrew (S. araneus). Hedgehog and star-nosed mole KLK1 sequences were used as outgroups. Node labels indicate bootstrap support (1,000 bootstrap replicates).


Figure 6: Electrostatic potential of modelled surface residues for (1) BLTX, (2) Blarinasin, (3) Non-Expressed Kallikrein, and (4) KLK1 in B. brevicauda. Red indicates a more negative electrostatic potential and blue a more positive one.


Figure 7: ClusPro-predicted protein-protein interaction between BLTX (purple) and the Double-Headed Protease Inhibitor (teal). Active-site residues in the catalytic cleft of BLTX are highlighted in yellow.
