AN ABSTRACT OF THE DISSERTATION OF

Ehren J. Bentz for the degree of Doctor of Philosophy in Integrative Biology presented on March 20, 2019.

Title: Characterizing the Function of the Harderian Gland and its Interactions with the Vomeronasal Organ in the Red-sided Garter Snake, Thamnophis sirtalis parietalis.

Abstract approved: ______Robert T. Mason

The Harderian gland is a large cephalic gland present in most groups of terrestrial vertebrates. Although the Harderian gland has been the focus of numerous studies for more than 300 years, its physiological function remains largely unresolved. Harderian gland secretions are diverse among taxa, and many putative functions have been ascribed to this gland. The Red-sided garter snake (Thamnophis sirtalis parietalis) displays strong seasonal shifts in behavior: mating occurs immediately after emergence from hibernation in the spring, and behavior shifts abruptly to feeding several weeks later. These behaviors are mutually exclusive and coincide with sexually dimorphic physical changes to the Harderian gland. Male Harderian glands are hypertrophied upon emergence, as males use their vomeronasal chemosensory system to actively search for females expressing the sexual attractiveness pheromone, whereas the gland in recently emerged females remains regressed and quiescent. Here, I use garter snakes as a novel model to investigate the function of the Harderian gland and, through the use of modern molecular techniques, describe the mechanisms by which this historically enigmatic gland functions within the vomeronasal chemosensory system. Using high-throughput sequencing and bioinformatic analyses, I examine the functional characteristics of the Harderian gland transcriptome (the collection of all genes expressed as mRNA) to describe a general physiological function of this tissue. I describe patterns of variation by sex and season in the genes expressed in the Harderian gland, as well as in the chemosensory receptor proteins expressed in the vomeronasal organ. Additionally, I use protein mass spectrometry to identify the proteins present in the secretions of the vomeronasal organ and to characterize their functions, and I use an integrated analysis of protein mass spectrometry and RNA-sequencing data to infer which proteins within that fluid are likely to be produced primarily in the Harderian gland. The Harderian gland was found to express an abundance of genes encoding lipid-binding proteins and proteins involved in antimicrobial defense. Expression of these two categories of genes is significantly elevated compared to other tissues. This suggests that gene products produced in this tissue likely function to bind and solubilize lipids and act as a component of the immune system within the vomeronasal chemosensory system. The Harderian glands of male snakes were found to be more transcriptionally active in the spring than those of females. Male glands were also found to express more secretory and lipid-binding proteins than female glands throughout the year. Females in the spring were found to express genes involved in stress responses, suggesting that they may respond differently to stress than males. The glands of both males and females were found to express genes involved in porphyrin metabolism, a well-described characteristic of the Harderian glands of rodents. This is the first description of porphyrin metabolism in a squamate, although it was observed only during the summer feeding period. I found no evidence that the expression of antimicrobial defense proteins varies by either sex or season. The proteins identified in the fluid of the vomeronasal organ included an abundance of lipocalins (lipid-binding proteins) and extracellular immune proteins. I conducted in vitro bacterial killing assays demonstrating that this fluid has potent antimicrobial properties.
Tissue-specific expression showed that a large proportion of the identified lipocalins and antimicrobial proteins are produced in the Harderian gland and secreted into the vomeronasal organ. Expression of vomeronasal chemosensory receptors showed sexually dimorphic seasonal variation. Males expressed receptors from early spring throughout the summer, whereas female snakes expressed very few receptors during the spring mating period, suggesting that their vomeronasal chemosensory system is relatively inactive while mating. The vomeronasal receptor repertoire did not appear to vary greatly by season in either males or females. One protein of particular interest was highly expressed in the male Harderian gland but nearly absent in that of females. This protein was identified as a lipocalin (lipid-binding protein) and is a likely candidate for a putative pheromone-binding protein facilitating the solubilization and subsequent detection of the female sexual attractiveness pheromone. The findings presented here demonstrate that the Harderian gland is an integral component of the vomeronasal chemosensory system in the Red-sided garter snake, functioning both to facilitate the detection of chemical signals and as a component of the extracellular immune system that protects the sensitive vomeronasal sensory epithelium from environmental pathogens.

©Copyright by Ehren J. Bentz March 20, 2019 All Rights Reserved

Characterizing the Function of the Harderian Gland and its Interactions with the Vomeronasal Organ in the Red-sided Garter Snake, Thamnophis sirtalis parietalis

By

Ehren J. Bentz

A DISSERTATION

submitted to

Oregon State University

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Presented March 20, 2019 Commencement June 2019

Doctor of Philosophy dissertation of Ehren J. Bentz presented on March 20, 2019.

APPROVED:

Major Professor, representing Integrative Biology

Chair of the Department of Integrative Biology

Dean of the Graduate School

I understand that my thesis will become part of the permanent collection of Oregon State University libraries. My signature below authorizes release of my thesis to any reader upon request.

Ehren J. Bentz, Author

ACKNOWLEDGEMENTS

I would like to thank my daughter Jessica for the incredible amount of support and understanding she has shown me throughout the years. I know it was never easy to be the child of a single father in the midst of a doctoral degree program. Jessica, you are an amazing young woman. I grow more proud of you every day. No more puzzles! But, maybe someday we’ll get a chance to conduct some more Y-maze experiments in a sweltering barn. I would also like to extend my deepest and most sincere thanks and appreciation to my parents Gay and Brad Bentz who have supported my every decision and allowed me to become the person I am today. I can say with confidence that I would not have accomplished a fraction of what I have, and I would not be completing this degree without their love and encouragement. I also want to thank my amazing partner Rebecca Amantia who has loved me and supported me through this process. Through long days and late nights, you never once complained, even though some days I didn’t even get around to putting on pants. Thank you! I love you all!

I want to thank my doctoral advisor, Dr. Robert T. Mason, for his support and encouragement during my time as a graduate student. Even when experiments devolved into hopeless catastrophes, you continued to provide intellectual and financial support and allowed me the freedom to continue pursuing my interests in science. Additionally, I would like to thank my committee members: Dr. Felipe Barreto and Dr. Jeff Anderson, for their contributions both to my research and to my development as a graduate student; Dr. Jean Hall, who performed admirably as my graduate council representative; and Dr. Eli Meyer, without whom I would not have been able to perform the various molecular and bioinformatic pursuits on which my dissertation research rests.

I would also like to thank my lab partners Dave Hubert and Leslie Blakemore, who have been there every step of the way, whether I wanted to talk research or go on an impromptu snake-catchin’ canoe adventure on the Santiam River. We helped each other retain our sanity (or some of it at least) on cross-country drives and while crossing international borders in a van filled with hundreds of snakes. Oh, sweet Canada.

Last, but certainly not least, I would like to thank the Integrative Biology staff Tara Bevandich, Tresa Bowlin, Traci Durell-Khalife, Trudy Powell, Torri Givigliano, and Jane Van Order, who provided administrative support throughout my degree program. I don’t know how you do everything you do, but you do it extremely well. The degrees of many, many students have been made possible by your efforts.

Thank you all!

The research presented here was partially funded by the National Science Foundation Graduate Research Fellowship Program (NSF-GRFP), NSF grant 0620125, the J. C. Braly Natural History fund, and the Jack Kent Cooke Foundation.

High throughput nucleotide sequencing was performed by the Genomics & Cell Characterization Core Facility at University of Oregon, Eugene, OR; The Massively Parallel Sequencing Shared Resource (MPSSR) at Oregon Health & Science University, Portland, OR; and the Center for Genome Research and Biocomputing (CGRB), Oregon State University, Corvallis, OR.

Protein mass spectrometry was performed at the Oregon State University Mass Spectrometry Center, Corvallis, OR on the Orbitrap Fusion Lumos instrument provided by NIH grant # 1S10OD020111-01.

CONTRIBUTION OF AUTHORS

CHAPTER TWO Samples were collected in the field by EHREN BENTZ, DAVID HUBERT, and DR. ROBERT MASON. Transcriptome sequencing prep was conducted by EHREN BENTZ and DR. ELI MEYER. Scripts used for data analysis were developed by DR. ELI MEYER and EHREN BENTZ. Data analysis was conducted by EHREN BENTZ. Interpretation of analyses was conducted by EHREN BENTZ and DR. ROBERT MASON.

CHAPTER THREE Protein samples were collected in the field by EHREN BENTZ, DR. ROBERT MASON, and DAVID HUBERT. The protein preparation protocol for mass spectrometry was developed and conducted by EHREN BENTZ. Protein data analysis was conducted by EHREN BENTZ and DR. JEFF ANDERSON. Interpretation of analyses was conducted by EHREN BENTZ, DR. JEFF ANDERSON, DAVE HUBERT, and DR. ROBERT MASON. Experimental design and implementation of bacterial killing assays (BKAs) were conducted by EHREN BENTZ and LESLIE BLAKEMORE. BKAs were prepared and data were collected by HANNAH STUWE. R code used to analyze optical density readings was developed by LESLIE BLAKEMORE, and analysis was conducted by EHREN BENTZ.

CHAPTER FOUR Samples were collected in the field by EHREN BENTZ and DR. ROBERT MASON. Sequencing prep was conducted by EHREN BENTZ and DR. ELI MEYER. Scripts used in data analysis were developed by DR. ELI MEYER and EHREN BENTZ. Data analysis was performed by EHREN BENTZ. Interpretation of analyses was conducted by EHREN BENTZ and DR. ROBERT MASON.

CHAPTER FIVE Samples were collected in the field by EHREN BENTZ and DR. ROBERT MASON. Sequencing prep was conducted by EHREN BENTZ. Scripts used in data analysis were developed by DR. ELI MEYER and EHREN BENTZ. Data analysis was performed by EHREN BENTZ. Interpretation of analyses was conducted by EHREN BENTZ and DR. ROBERT MASON.

TABLE OF CONTENTS

Page

1. Introduction and Background

The Harderian gland ...... 1

The vomeronasal organ ...... 9

Sex pheromones ...... 20

Pheromone-binding proteins ...... 22

Thamnophis sirtalis parietalis as a research model ...... 25

Seasonal and sexual dimorphism of the Harderian gland of T. s. parietalis ...... 29

Brief introduction to technical methods ...... 37

Project Overview ...... 41

Chapter 1 References ...... 43

2. Characterization of the Harderian gland Transcriptome of T. s. parietalis

Introduction ...... 54

Methods...... 59

Results ...... 69

Discussion ...... 83

Chapter 2 References ...... 91

3. IDENTIFICATION OF THE PROTEIN COMPONENTS OF VOMERONASAL SECRETIONS OF THAMNOPHIS SIRTALIS PARIETALIS VIA PROTEIN MASS SPECTROMETRY

Introduction ...... 95

Methods...... 100

Results ...... 107

Discussion ...... 118

Chapter 3 References ...... 129

4. DESCRIBING THE FUNCTIONAL SIGNIFICANCE OF VARIATIONS IN GENE EXPRESSION BY SEX AND SEASON IN THE HARDERIAN GLAND OF THAMNOPHIS SIRTALIS PARIETALIS

Introduction ...... 133

Methods...... 138

Results ...... 145

Discussion ...... 171

Chapter 4 References ...... 185

5. VARIATIONS IN THE EXPRESSION OF VOMERONASAL CHEMOSENSORY RECEPTORS BY SEX AND SEASON

Introduction ...... 190

Methods...... 197

Results ...... 204

Discussion ...... 223

Chapter 5 References ...... 231

6. SYNTHESIS AND MAJOR CONCLUSIONS

Introduction ...... 235

Synthesis and conclusions...... 237

Future Directions ...... 249

BIBLIOGRAPHY ...... 255

APPENDICES

A.1 ‘Antimicrobial defense’ transcripts ...... 271

A.2 Primer sequences ...... 279

R script appendices ...... 280

Unix script appendices ...... 312

LIST OF FIGURES

Figure Page

1.1 Phylogeny of the Harderian gland in vertebrates ...... 2

1.2 Histology of the Harderian gland ...... 3

1.3 Phylogeny of the vomeronasal organ in vertebrates ...... 11

1.4 Chemosensory neurons in vomeronasal sensory epithelium ...... 13

1.5 Cross section of the vomeronasal organ of T. s. parietalis ...... 14

1.6 Ink injected into the Harderian gland ...... 18

1.7 The Harderian gland and vomeronasal organ in squamates ...... 19

1.8 Harderian gland mass against head length ...... 31

1.9 Acinar cell heights by sex and season ...... 32

1.10 Acinar lumen diameters by sex and season ...... 33

2.1 Transcriptomic analysis workflow ...... 65

2.2 Differential expression of Harderian gland and pooled samples ...... 76

3.1 Tissue specificity analysis workflow ...... 104

3.2 BKA plate layout diagram ...... 106

3.3 Differential expression of proteins in vomeronasal secretion ...... 112

3.4 Differential expression results of vomeronasal secretion proteins ...... 113

3.5 Protein abundance by protein origin and protein identity ...... 114

3.6 Bacterial growth curves in the presence of vomeronasal secretions...... 116

3.7 Bacterial killing potency of vomeronasal secretion ...... 117

4.1 Two-factor comparisons diagram in the Harderian gland ...... 142

4.2 Single-factor comparisons diagram in the Harderian gland ...... 143

4.3 Principal components analysis of Harderian gland gene expression ...... 146

4.4 Heat map of gene expression in the Harderian gland ...... 147

4.5 Differential expression by sex in the Harderian gland ...... 148

4.6 Differential expression by season in the Harderian gland ...... 151

4.7 Differential expression by sex-season interaction in the Harderian gland ...... 154

4.8 Differential expression by sex in the Harderian gland in spring ...... 158

4.9 Differential expression by sex in the Harderian gland in summer ...... 161

4.10 Differential expression by season in the Harderian gland in males ...... 164

4.11 Differential expression by season in the Harderian gland in females ...... 167

5.1 Two-factor comparisons diagram in the vomeronasal organ ...... 202

5.2 Single-factor comparisons diagram in the vomeronasal organ ...... 203

5.3 Principal components analysis of vomeronasal organ gene expression ...... 205

5.4 Differential expression in the vomeronasal organ, two-factor model ...... 207

5.5 Differential expression in the vomeronasal organ, single-factor model ...... 210

5.6 Counts of differentially expressed transcripts in the vomeronasal organ ...... 212

5.7 Differential expression of vomeronasal receptors, two-factor model ...... 214

5.8 Differential expression by sex of vomeronasal receptors in spring ...... 215

5.9 Z-scores by sex of vomeronasal receptors in spring ...... 216

5.10 Differential expression by sex of vomeronasal receptors in summer ...... 217

5.11 Differential expression by season of vomeronasal receptors in males ...... 218

5.12 Differential expression by season of vomeronasal receptors in females ...... 219

5.13 Z-scores by season of vomeronasal receptors in females ...... 220

5.14 Counts of differentially expressed transcripts in the vomeronasal organ ...... 222

LIST OF TABLES

Table Page

2.1 Summary statistics of transcriptome assembly ...... 71

2.2 Top 100 most abundant transcripts in the Harderian gland ...... 72

2.3 Enrichment analysis results - ‘Molecular Function’ ...... 78

2.4 Enrichment analysis results - ‘Biological Process’ ...... 79

2.5 Enrichment analysis results - ‘Cellular Component’ ...... 81

3.1 Proteins identified in vomeronasal secretions by nanoLC/MS/MS ...... 108

4.1 Gene set enrichment by sex in the Harderian gland ...... 150

4.2 Gene set enrichment by season in the Harderian gland ...... 153

4.3 Gene set enrichment by sex-season interaction in the Harderian gland ...... 155

4.4 Gene set enrichment by sex in the Harderian gland in spring ...... 159

4.5 Gene set enrichment by sex in the Harderian gland in summer ...... 163

4.6 Gene set enrichment by season in the Harderian gland of males ...... 166

4.7 Gene set enrichment by season in the Harderian gland of females ...... 169

5.1 Differentially expressed transcripts in the vomeronasal organ ...... 211

5.2 Differentially expressed vomeronasal receptors by sex and season ...... 221

Chapter 1

Background and Introduction

The Harderian gland:

The Harderian gland was first described by Johann Harder in 1694 as a novel structure located in the orbit of red deer. Harder found that the duct of the Harderian gland opened directly onto the eye, leading him to suggest that it served to lubricate the eye or eyelids (Hillenius et al. 2007; Harder, 1694). This structure is present in all major lineages of the clade Tetrapoda, and although it has been described for centuries, its physiological function remains unclear (Payne, 1994; Hillenius et al. 2007). In his comprehensive review of the Harderian gland, Payne (1994) states that “It is arguably the last remaining large organ of widespread distribution among the vertebrates to which we cannot confidently ascribe a confirmed function”.

Evolution of the Harderian gland:

Since Harder’s initial description, the Harderian gland has been described in a multitude of taxa and has been identified in all major clades of terrestrial vertebrates – Amphibia, Testudines, Crocodilia, Aves, , and Mammalia – but has not been described in fishes (Webb et al. 1992; Payne, 1994). These findings are commonly interpreted to indicate that the Harderian gland arose early in tetrapod evolution (Figure 1.1). Although the Harderian gland is found in all major lineages of terrestrial vertebrates, it is not present in every terrestrial vertebrate taxon. While particular details concerning the evolution of the Harderian gland are not entirely clear (Sakai, 1981), it is generally agreed that the structure first arose in early tetrapods and is ancestral to all terrestrial vertebrates. The absence of this structure in some groups of terrestrial vertebrates therefore appears to be due to secondary loss of the gland rather than to convergent evolution in the groups that possess it (Sakai, 1981; Payne, 1994).

Figure 1.1: Phylogeny of the Harderian gland in vertebrates: Cladogram of extant vertebrate groups illustrating the phylogeny of the Harderian gland in vertebrates. Numbered hash marks indicate: 1) the Harderian gland is present in embryos, larvae, or adults; 2) the Harderian gland is secondarily lost in some groups within these clades but present in a large number of extant taxa. Figure based on descriptions of the Harderian gland listed in Payne (1994).

Structure of the Harderian gland across taxa:

The Harderian gland is located within the orbit, usually attached to the orbital wall medial to the eye. It is a relatively large gland and is generally described as an alveolar or acinar secretory gland (Rehorek et al. 2000b; Chieffi Baccari et al. 1996; Smith & Bellairs, 1947). The main lobe of the gland contains many secretory bodies in which a layer of myoepithelial structural cells surrounds and supports individual clusters of columnar epithelial secretory cells (Wight et al. 1971) (Figure 1.2). Although the structure of the Harderian gland differs among taxa, it is generally described as a secretory structure, with secretory cells organized as individual alveoli or acini surrounding a lumen which initiates a secretory tubule (Chieffi Baccari et al. 1996; Rehorek et al. 2000b). Individual secretory tubules converge to create larger channels until all channels converge into a single secretory duct. The location of the terminal end of this secretory duct, and thus the location where the Harderian gland deposits its secretions, varies greatly among taxa (Payne, 1994). In most mammals, the Harderian gland secretes its contents onto the eye or nictitating membrane (Brownscheidle, 1974; Smith, 1976), and in birds the Harderian gland duct terminates at the conjunctival sac (Burns, 1992; Wight et al. 1971). In many squamates (lizards and snakes), rather than emptying onto the eye or associated structures, the duct of the Harderian gland is continuous with the nasolacrimal duct, which continues forward to the vomeronasal organ. In these taxa, the Harderian gland secretes its contents primarily into the vomeronasal lumen (Smith & Bellairs, 1947; Rehorek et al. 2000b).

Figure 1.2: Histology of the Harderian gland: Image shows a cross section of the Harderian gland in T. s. parietalis. Acinar cells are visibly organized into acini surrounding a central lumen. The acinar lumina converge to form larger ducts, and ultimately form the nasolacrimal duct in squamates. Figure from Erickson (2007).

Proposed functions of the Harderian gland and its secretions:

Harderian gland secretions have been found to be very diverse across taxa, and many putative functions have been ascribed to this gland. Although this gland has been described in all lineages of terrestrial vertebrates, the mammalian Harderian gland has been the focus of more studies than that of other groups.

Many studies have described Harderian gland secretions of mammals (particularly rodents) as containing large amounts of lipid molecules (Seyama, et al. 1992). Watanabe et al. (1980) found that Harderian gland secretions in rodents include a high concentration of lipids and that the terminal secretory duct opens onto the eye. This led the authors to suggest that the lipid secretions of this gland function in lubrication of the cornea and eyelid.

In addition to lubrication, it has sometimes been suggested that the lipids produced by the Harderian gland serve a thermoregulatory function. Thiessen, et al. (1992) found that gerbils used grooming behaviors to spread the lipid-rich secretions over their fur, where they may serve as a waterproofing substance. Gerbils with surgically removed Harderian glands were not able to thermoregulate effectively and lost body heat more rapidly than unaltered gerbils when placed in ice water.

Albone (1984) observed grooming behaviors in gerbils similar to those described by Thiessen, et al. (1992), but rather than attributing this behavior to waterproofing and thermoregulation, Albone suggested that the gerbils were spreading pheromones (chemical signals acting as a communication mechanism between members of the same species) over their fur. It was demonstrated that removal of the Harderian gland from male gerbils decreased sex-specific interactions such as aggressive behavior by other males and reproductive receptivity of females. Seyama & Uchijima (2007) described the production of lipids in the Harderian glands of golden hamsters and also suggested that these lipids acted as pheromones. The authors found that extracted and isolated lipids were recognized by other hamsters, and that male hamsters were more attracted to lipids originating from females than to those from other males, although this difference was not significant. The authors suggested that the lipids were used as long-lasting territorial pheromones and to attract mates. Pan, et al. (2010) used qPCR to investigate the expression of carbonic anhydrase in the Harderian gland tissue of mice, finding that mitochondrial carbonic anhydrase was highly expressed in this tissue. The authors noted that carbonic anhydrase is sometimes associated with lipid synthesis and attributed its expression in rodent Harderian glands to this function.

In addition to the production of lipids, many studies of the mammalian Harderian gland have described the gland in the context of synthesizing and secreting porphyrins (photoreactive precursors to heme groups) (Buzzell, et al. 1989; Spike et al. 1988; Rodriguez et al. 2003). Indeed, a previously undescribed form of porphyrin, “harderoporphyrin”, was isolated from the Harderian gland of the rat (Rattus norvegicus) and characterized (Kennedy, 1970). Although porphyrins are commonly known to be produced in the rodent Harderian gland, there is currently no well-supported physiological function for these molecules. However, the production of porphyrins in rodents appears to correlate with stressful events. Figge & Atkinson (1945) showed that rats increased the production of porphyrins when deprived of water for long periods of time. Harkness & Ridgway (1980) found that porphyrin secretion was linked to acute painful stimuli and limb restraint. Hipolide and Tufik (1995) showed that porphyrin production increased when rodents were experimentally deprived of sleep. In the Syrian hamster (Mesocricetus auratus), the Harderian gland is described as a model for the effects of oxidative stress in the presence of porphyrins (Coto-Montes, et al. 2001), and enzymes involved in porphyrin production have been observed to increase during estrus (Menendez-Pelaez, et al. 1991).

Several recent studies have used the rodent Harderian gland as a model for tumorigenesis because it readily develops tumors in response to chemical stimulation (Shankaran et al. 2001; Parnell et al. 2005; Cucinotta & Chappell, 2010).

The Harderian gland in birds is often discussed in the context of immune system function. Wight et al. (1971), in one of the earliest descriptions of the Harderian gland in the domestic chicken (Gallus gallus), showed that the gland secretes its contents onto the surface of the eye and noted the high likelihood that it functions in an immune capacity. Since that time, many studies in birds have focused on immune function as the primary role of the gland in this group. Mueller et al. (1971) used plaque-forming cell (PFC) immune assays to confirm that Harderian gland secretions from the domestic chicken bind bacterial pathogens in vitro. Montgomery and Maslin (1992) showed that the Harderian gland of birds is commonly involved in the activities of the head-associated lymphoid tissue (HALT). The Harderian gland in birds is also observed to produce large amounts of the immunoglobulins IgG, IgA, and IgM and may serve as the major immune response site of the conjunctival sac and the eye (Montgomery and Maslin, 1992; Burns, 1992).

More recently, the transcriptome of the Harderian gland of the domestic chicken was sequenced and compared to several other immune tissues to determine its role within the immune system (Deist & Lamont, 2018). The authors of this study found that the gland displayed abundant expression of immune proteins, including many extracellular antibodies secreted onto the surface of the eye. Additionally, they found the Harderian gland transcriptome to be enriched for proteins involved in the G protein-coupled receptor signaling pathway, a signaling pathway commonly associated with the response to pathogen invasion. The authors concluded that the Harderian gland is not only involved in the production of immune proteins but is also directly involved in the response to immune challenges as environmental pathogens contact the ocular mucosa.

The Harderian gland appears to be present in Testudines, but few studies have investigated its function in this group (Payne, 1994). Chieffi (1992) showed that the Harderian gland of terrapins contains groups of cells which appear to serve as salt-secreting structures. This finding led Chieffi to infer that the Harderian gland functions, at least in part, as an osmoregulatory gland in turtles.

The Harderian glands of crocodilians are not well studied or well understood. Rehorek, et al. (2005b) described the Harderian gland of the American alligator (Alligator mississippiensis) as possessing features similar to those of Mammalia, Aves, and Lepidosauria. The authors showed that the gland in this group contains immune structures similar to those found in birds, but also appears to secrete both lipids (as described in mammals) and proteins (as described in lizards and snakes).

The Harderian gland in amphibians is also found to secrete its contents onto the surface of the eye (Payne, 1994). Di Matteo et al. (1989) investigated the Harderian gland of the common green frog ( erythraea) and ascribed to it the function of moisturizing and lubricating the eye. The authors of this study also described a high concentration of mast cells and plasma cells in the intercellular spaces between acini, suggesting that the tissue may have immune functions in this group as well. d'Istria et al. (1991) showed that the Harderian gland of the green frog also contains large amounts of androgen receptors but did not ascribe a specific function to this observation. The Harderian glands of the green frog and the European green toad (Bufo viridis) express enzymes involved in the production of melatonin, suggesting that the gland may also function in part to regulate circadian rhythms (Serino et al. 1993).

The squamate Harderian gland has been shown to be highly morphologically variable (Rehorek, 1997). The hypothesized functions of this gland include corneal lubrication and digestion. However, in her review of the squamate Harderian gland, Rehorek (1997) argues that the suggested functions of lubrication and digestion are unlikely. Instead, Rehorek concludes that the morphological arrangement of the structures connecting to the gland shows that Harderian gland secretions enter the lacrimal apparatus but do not interact with the eye or digestive system. The nasolacrimal duct in this group provides a physical connection between the Harderian gland and the vomeronasal organ, and it has been shown that the majority of Harderian gland secretions in squamates pass into the lumen of the vomeronasal organ (Rehorek et al. 2000b).

Recently, Domínguez-Pérez et al. (2018) described the Harderian gland transcriptomes of three colubrid snake species. They showed that antimicrobial proteins and lipid-binding proteins were expressed in high abundance in this tissue. The authors also described the presence of many canonical toxin transcripts in this tissue, possibly indicating that the gland is involved in venom production in these species, a function which has yet to be observed in the Harderian gland.

The research described above does not constitute an exhaustive list of studies undertaken with the aim of describing the function of the Harderian gland. Rather, it is presented to demonstrate that these endeavors have led researchers to many different conclusions. Clearly, secretions of the Harderian gland vary among taxa and may contain a number of different components. These findings have led to many hypothesized functions of the gland but offer little evidence for any well-supported biological function of the secreted compounds.

The Vomeronasal organ:

The vomeronasal organ, also sometimes referred to as “Jacobson’s organ”, was first described by the Danish researcher Ludvig Jacobson in 1813 (Eng. Trans: Jacobson, 1999). This organ is a major chemosensory structure which has been described in all lineages of terrestrial vertebrates (Døving & Trotier, 1998; Bertmar, 1981).

Until the mid-1990s, the vomeronasal organ was thought to have evolved specifically to detect pheromones (Døving & Trotier, 1998). However, as our understanding of this important chemosensory organ has developed, research has shown that many organisms rely on sensory cues from the vomeronasal organ to detect prey kairomones (chemical signals used by predators to identify prey species).

Wang et al. (1993) and Lie et al. (1997) showed that earthworm shock secretions contain proteins that are detected by the vomeronasal organ of T. s. parietalis and evoke an active feeding response. Salamanders (Plethodon cinereus) have been shown to be unable to recognize prey without a functional vomeronasal organ (Placyk and Graves, 2002). Alving and Kardong (1996) showed that rattlesnakes (Crotalus viridis oreganus) use prey kairomones both to locate prey before a strike and to trail and locate envenomated prey after a strike. The authors found that sectioning the vomeronasal nerves resulted in complete cessation of prey trailing and of ingestion of prey after a strike.

In addition to locating prey, several studies have revealed that the vomeronasal organ is used to detect predator kairomones (chemical signals used by prey species to detect and avoid predators). Papes et al. (2010) demonstrated that mice use chemical signals to detect the presence of heterospecific species. Miller and Gutzke (1999) showed that rattlesnakes without functional vomeronasal organs did not display typical predator avoidance behaviors when encountering common kingsnakes (Lampropeltis getula), a known predator of rattlesnakes. The vomeronasal organ has even been observed to serve as a mechanism to detect and avoid chemical markers of sickness in conspecifics (Boillat, 2015).

Our current understanding of the vomeronasal organ shows that its function is not limited to the detection of pheromones; it fulfils many different roles. However, the vomeronasal organ appears to be specialized, being highly attuned to specific cues. Isogai et al. (2011) showed that the vomeronasal organ of mice contains a small number (relative to the main olfactory system) of chemical receptors that are specific to chemical signals obtained from a variety of animals. The authors presented mice with one of a wide array of chemical signals from conspecifics, heterospecific mouse species, and common predators, and used in-situ hybridization to visualize the activation of neurons within the vomeronasal organ. They found that the neurons were extremely sensitive to some signals, especially those likely to be of biological importance, such as those from conspecifics or predators, whereas neurons did not respond to signals from species unlikely to interact with mice in a biologically meaningful capacity. Additionally, they found that the number of activated neurons depended on the complexity of the signal, and that mice exposed to multiple cues at once showed a higher rate of neuronal activation. These findings indicate that each neuron is activated by a narrow range of biologically relevant chemical cues.

As our understanding of the vomeronasal organ continues to progress, it is becoming clear that this structure, while important for the detection of pheromones, fulfils many biologically important roles as a major chemosensory system.


Evolution and phylogeny of the vomeronasal organ:

The vomeronasal organ is present in all major clades of terrestrial vertebrates and appears to have first evolved early in the tetrapod lineage (Eisthen, 1992) (Figure 1.3). The vomeronasal organ is absent from fishes and appears to have been secondarily lost in many groups of aquatic tetrapods. These observations led Bertmar (1981) to conclude that the vomeronasal organ likely first evolved in response to selective pressures associated with the transition to a terrestrial lifestyle. Eisthen (1992) rejects this hypothesis based on evidence that the vomeronasal organ is present in many fully aquatic species, as well as in amphibian larvae. Regardless of the evolutionary pressures responsible for the development of this chemosensory system, it is agreed that a fully formed vomeronasal organ is limited to tetrapods (D’Aniello et al. 2017). The vomeronasal organ is especially prominent in rodents and squamates (Isogai, 2011; Halpern & Martinez-Marcos, 2003).

Figure 1.3: Phylogeny of the vomeronasal organ in vertebrates: Cladogram of extant vertebrate groups illustrating the phylogeny of the vomeronasal system (VS) in vertebrates. Numbered tick marks indicate: 1) VS is present in embryos, larvae, or adults; 2) VS is secondarily lost in adults; 3) VS is lost in some groups within this clade but present in many others. Figure adapted from Eisthen (1992).


Vomeronasal organ structure and neurology:

As its name suggests, the vomeronasal organ is located at the base of the nasal cavity near the vomer and nasal bones. The organ consists of a bi-lobed cartilaginous or bony capsule lined with a sensory epithelium composed of thousands of densely packed neurons (Døving & Trotier, 1998; Rehorek, 2000b).

The axons of vomeronasal chemosensory neurons lead to the accessory olfactory bulb of the brain for initial processing and interpretation of sensory signals (Rodriguez, 1999; Halpern, 1987). Because the sensory epithelium is located inside the rigid capsule, signal molecules must be transported into the vomeronasal lumen before they can be recognized. The vomeronasal lumen is accessed by a pair of ducts, one leading to each lobe of the organ (Halpern, 1987). The location of the duct openings varies among taxa, but they are usually located in the anterior portion of the hard palate or sometimes inside the nasal cavity. In squamates, the openings are found in the hard palate at the front of the mouth, where signal molecules are most often delivered by movements of the tongue. Because the tongue in many squamates is forked and attenuated (tapering toward the tip), it was suggested that the tips of the tongue are inserted into the vomeronasal ducts to transfer chemical cues collected from the environment (Broman, 1920); however, this hypothesis has since been thoroughly refuted. Schwenk (1994) showed that forked tongues have evolved multiple times throughout vertebrate evolution, but that the appearance of this feature is uncorrelated with the presence of a prominent bi-lobed vomeronasal organ. Rather, the forked tongue appears to be a specialized adaptation for following chemical trails, allowing animals to infer the direction of a trail by comparing the signals collected by each fork (Schwenk, 1994). Molecules appear instead to enter the vomeronasal ducts through the movement of fluid into and out of the vomeronasal lumen. This conclusion illustrates the importance of the vomeronasal organ in groups with a forked tongue (such as snakes and monitor lizards) and shows that the degree of specialization of this chemical sense has strongly influenced the evolution of associated structures (Brykczynska et al. 2013).


The structure of the vomeronasal organ varies surprisingly little among taxa (Bertmar, 1981). The vomeronasal lumen is commonly a fluid-filled space within the vomeronasal capsule. The lumen is lined with a sensory epithelium composed of thousands of sensory neurons (Halpern, 1987). The sensory neurons are bipolar, with the cell body located near the base of the sensory epithelium (Halpern & Martinez-Marcos, 2003). Their axons extend from the sensory epithelium to the accessory olfactory bulb of the brain. Each neuron possesses a dendritic projection oriented toward the fluid-filled center of the vomeronasal lumen (Figures 1.4, 1.5), and the tip of each projection bears many microvilli containing chemical receptor proteins (Rodriguez, 1999; Halpern, 1987).

Figure 1.4 Chemosensory neurons in vomeronasal sensory epithelium: Vomeronasal sensory epithelium of a mouse. Neurons are tagged with fluorescent labels to show the structures of individual cell bodies and dendrites. Chemical receptors are concentrated on microvilli bordering the vomeronasal lumen. Image adapted from Rodriguez et al. (1999).


Figure 1.5 Cross section of the vomeronasal organ of T. s. parietalis: Tissue was decalcified, formalin fixed, paraffin embedded, and sectioned to a thickness of 10 µm. The section was stained with hematoxylin and eosin. The vomeronasal lumen is visible as a clear space surrounded by sensory and non-sensory epithelium.

Vomeronasal chemosensory receptors:

Each neuron within the vomeronasal sensory epithelium expresses only one of three distinct classes of G-protein coupled receptor: V1R, V2R, or formyl peptide receptors (FPRs) (Dulac & Axel, 1995; Francia et al. 2014). Receptor expression patterns in vomeronasal sensory neurons are generally described as following the “one neuron, one receptor” rule, with each individual neuron expressing only a single receptor gene at any given time (Brykczynska et al. 2013; Isogai, 2011; Mazzoni et al. 2004; Bargmann, 1997). Most vomeronasal receptors bind a specific ligand; thus, each neuron in the vomeronasal organ responds to only one chemical cue. Complex chemical cues containing many ligand types are interpreted by the accessory olfactory bulb and recognized as a distinct pattern of neuronal activation (Isogai, 2011; Rodriguez, 1999).
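The combinatorial logic described above can be illustrated with a toy model. The Python sketch below is a deliberate simplification and not a model of any real receptor repertoire: the receptor names, ligand names, and the lookup table standing in for the accessory olfactory bulb are all hypothetical, and each receptor is assumed to bind exactly one ligand. It shows how neurons that each detect a single ligand can still distinguish complex cues through their joint pattern of activation.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical receptor -> ligand mapping ("one neuron, one receptor", and
# each receptor assumed to bind a single ligand). Names are placeholders.
RECEPTORS: Dict[str, str] = {
    "V2R_a": "ligand_A",
    "V2R_b": "ligand_B",
    "V2R_c": "ligand_C",
    "V1R_d": "ligand_D",
}

# Hypothetical stand-in for the accessory olfactory bulb: each known cue
# is recognized as a distinct combination of activated receptor types.
KNOWN_PATTERNS: Dict[FrozenSet[str], str] = {
    frozenset({"V2R_a", "V2R_b"}): "cue 1",
    frozenset({"V2R_b", "V2R_c"}): "cue 2",
}


def activation_pattern(cue: Set[str]) -> FrozenSet[str]:
    """Return the receptor types whose single ligand is present in the cue."""
    return frozenset(r for r, ligand in RECEPTORS.items() if ligand in cue)


def interpret(cue: Set[str]) -> str:
    """Look up the activation pattern; unknown patterns are unrecognized."""
    return KNOWN_PATTERNS.get(activation_pattern(cue), "unrecognized")


if __name__ == "__main__":
    print(interpret({"ligand_A", "ligand_B"}))  # cue 1
    print(interpret({"ligand_B", "ligand_C"}))  # cue 2
    print(interpret({"ligand_B"}))              # unrecognized
```

In this toy example both hypothetical cues contain ligand_B, so the neuron expressing V2R_b alone cannot tell them apart; only the pattern across several neurons does.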

In general, V1R vomeronasal receptors are thought to be more efficient at acquiring chemical signals in a dry environment, whereas V2R receptors are more efficient at acquiring chemical signals dissolved in aqueous fluid (Isogai et al. 2011; Brykczynska et al. 2013). The vomeronasal organs of most mammalian lineages contain a very high proportion of V1R receptors, whereas those of squamates contain a very high proportion of V2R receptors. This difference has been proposed to be a consequence of the transition from water to land during mammalian evolution (Shi & Zhang, 2007). However, Brykczynska et al. (2013) dispute this hypothesis, arguing instead that the opposite extremes in vomeronasal receptor composition observed in mammals and reptiles represent two unrelated, lineage-specific expansions of receptor gene repertoires.

The role of the vomeronasal organ in snakes:

The primary function of the squamate vomeronasal organ is to detect nonvolatile chemical signals (Rehorek, 2000a). Squamates, particularly snakes, have an extremely well-developed vomeronasal sense, usually paired with a tongue that is well adapted as a sophisticated chemical collection and delivery system (Brykczynska et al. 2013; Schwenk, 1994). In many groups, the vomeronasal organ facilitates a direct interaction between a vertebrate’s chemical environment and its brain, acting as a link through which the chemical environment can directly influence behavior (Isogai, 2011; Dulac & Axel, 1995; Buck & Axel, 1991). The vomeronasal organ of garter snakes in particular plays a critical role, acting as the sole mechanism for identifying potential mates and the primary mechanism for evaluating their quality. LeMaster and Mason (2002) demonstrated that male snakes are not only able to distinguish males from females based on chemical cues alone but are also able to assess a female’s quality as a potential mate. The authors found that the pheromone of large female snakes (indicative of high fecundity) contains a larger proportion of ω-9 cis-unsaturated methyl ketones. Male snakes were able to detect this difference and preferentially courted extracted pheromone rich in these unsaturated lipids, even in the absence of visual, tactile, or olfactory cues. In addition to being the primary means of detecting sex pheromone, the vomeronasal organ of garter snakes is also the primary mechanism for locating prey via the detection of prey kairomones (Halpern et al. 1983; Kubie et al. 1978; Inouchi et al. 1993).

Evidence for the involvement of the Harderian gland in the vomeronasal chemosensory system:

A growing body of evidence suggests that the Harderian gland may play an important role in the vomeronasal organ of some groups (Halpern, 1992; Payne, 1994; Buzzell, 1996b; Rehorek et al. 2000b). As early as the late 1800s, the association between the terminal end of the lacrimal duct and the vomeronasal organ prompted Born (1876) to suggest a link between the two structures, noting that the secretions of the Harderian gland enter the mouth and the vomeronasal organ. Broman (1920) appears to have been the first to suggest that Harderian gland secretions may act to dissolve odorant particles delivered to the vomeronasal organ by the tongue. Both Born (1876) and Broman (1920) were published in German, and their ideas received little attention, especially from English-speaking authors, until more recently. Bellairs and Boyd (1950) provided an early English description of the physiological relationship between the lacrimal apparatus (including the Harderian gland) and the vomeronasal organ. Buzzell (1996b) reviews the Harderian gland and discusses its association with the vomeronasal organ.

Hillenius et al. (2001) injected India ink to trace the path of Harderian gland secretions in the leopard frog (Lithobates pipiens) and the American bullfrog (Lithobates catesbeianus). The authors showed that Harderian gland secretions, despite the lack of a specific duct to channel the fluid, reach the vomeronasal sensory epithelium, suggesting that a function in the vomeronasal system is plausible for the gland in frogs. Schmidt and Wake (1990) demonstrated a similar arrangement in caecilians (Gymnophiona), showing that fluids from the tentacle sheath (which include Harderian gland secretions) enter the vomeronasal organ. Døving et al. (1993) suggested that the vomeronasal organ of frogs may function to detect waterborne chemical signals concentrated at the surface of the water. This proposal led Hillenius et al. (2001) to suggest that Harderian gland secretions may perform an odorant-binding function, since the eye, like the vomeronasal sensory epithelium, would often contact these chemical signals. These findings suggest that the association between the Harderian gland and the vomeronasal organ may predate that described in squamates, and that the functions of the ancestral Harderian gland may have included a chemosensory role. Rehorek (1997; 1998) and Rehorek et al. (1997) added significantly to the discussion surrounding the role of the Harderian gland in the vomeronasal chemosensory system, providing a very detailed review of the associations between the two structures in squamates and demonstrating the association in Australian species, including Strophurus intermedius. Continuing this line of research, Rehorek (2000a) examined the morphology of the Harderian gland in two species of pygopodid (legless lizards), showing that the nasolacrimal duct forms a continuous physical link between the Harderian gland and the lumen of the vomeronasal organ. In another study, Rehorek et al. (2000b) injected ³H-proline into the Harderian glands of Eastern and Red-sided garter snakes (Thamnophis sirtalis sirtalis and T. s. parietalis). The authors demonstrated that large amounts of radioactivity were present in cells of the vomeronasal sensory epithelium but were not detected in any other tissue.
This confirms that Harderian gland secretions do reach the lumen of the vomeronasal organ and suggests that this is the only location to which these secretions are transported. Similar injections using India ink rather than ³H-proline corroborate these findings, suggesting that the vast majority of Harderian gland secretions enter the nasolacrimal duct and are channeled to the vomeronasal organ. Rehorek et al. (2011) further provided histological observations of the vomeronasal organ of snakes suggesting that this structure does not possess the intrinsic secretory capacity to produce a volume of fluid large enough to fill its lumen. Additionally, the authors used transmission electron microscopy to visualize large protein secretory bodies in the posterior lobe of the Harderian gland, indicating that its secretions are primarily proteinaceous.

Rehorek and colleagues concluded that the Harderian gland in T. s. parietalis is a large secretory structure organized specifically to secrete proteinaceous and mucous fluid into the nasolacrimal duct, and thus into the lumen of the vomeronasal organ, and that the majority of the proteins within the vomeronasal lumen likely originate from the Harderian gland (Rehorek et al. 2000b; Rehorek et al. 2011).

The protein components of the fluid within the vomeronasal lumen are thought to be important to the function of the vomeronasal chemosensory system, facilitating the detection of non-polar, water-insoluble chemical cues, including sex pheromones (Mason & Halpern, 2011). Although this fluid is likely an important functional component of the vomeronasal chemosensory system in squamates, its protein components have not been identified.


Figure 1.6 India ink injected into the Harderian gland: Ink injected into the Harderian gland of an anesthetized snake flows through the nasolacrimal duct into the vomeronasal lumen before passing through the vomeronasal ducts into the mouth. Harderian gland secretions follow the same route, exiting the vomeronasal ducts to coat the tongue. Photo: Erickson, 2007.


Figure 1.7 The physical association between the Harderian gland and the vomeronasal organ in squamates: The Harderian gland of T. s. parietalis secretes its contents into the nasolacrimal duct to be transported directly to the lumen of the vomeronasal organ. Harderian gland secretions are in contact with the vomeronasal sensory epithelium and exit through the vomeronasal duct to coat the tongue allowing it to interact directly with chemical signals in the environment.

HG = Harderian gland; NC = nasal cavity; LC = lacrimal canal; VNO = vomeronasal organ; LD = lacrimal duct


Sex pheromones:

The term “pheromone” was coined by Karlson and Lüscher in 1959 to describe substances that are produced by one individual and received by another individual of the same species, affecting its physiology or behavior. The definition of a pheromone is based on function rather than on a specific molecular structure. Therefore, a wide variety of chemical signals may be classified as pheromones, provided they fit this broad functional definition. Currently, the commonly used definition of a pheromone states that the chemical signal must be released with the ‘intent’ of altering the behavior or physiology of a conspecific individual (‘intent’ is interpreted to mean that the pheromone signal must have evolved for the purpose of conspecific communication). Sex pheromones are a specific subset of pheromones released to sexually attract a conspecific of the opposite sex (Bernstein & Bernstein, 1997). The primary function of sex pheromones is to facilitate the location and identification of potential mates (Gomez-Diaz & Benton 2013). Sex pheromones have been identified in a multitude of animals, including but not limited to insects, arachnids, crustaceans, fishes, amphibians, squamates, and mammals (Leal, 2005; Gasket, 2007; Sorensen & Stacy, 2004; Balanger & Corkum, 2009; Mason et al. 1989; Achiraman & Archunan, 2005).

Because sex pheromones are released to allow identification and attraction of potential mates of the opposite sex, their production is commonly found to be extremely sex-biased, occurring in only one sex (Gomez-Diaz & Benton, 2013; Mason et al. 1989; Moraes et al. 2008).

The most thoroughly described vertebrate pheromones are those of the house mouse, Mus musculus (Achiraman & Archunan, 2005). Sex pheromones in mice, and the mechanisms of their release and reception, are well known (Cheetham et al. 2007).

The female sexual attractiveness pheromone of the Red-sided garter snake T. s. parietalis is discussed throughout this dissertation, as it is one of a small number of vertebrate pheromones that have been isolated, characterized, and synthesized. The sex pheromone of the Red-sided garter snake is used primarily as a mechanism for male snakes to locate females, but it also conveys information allowing the male to assess a female’s quality as a potential mate (LeMaster & Mason 2002). Because the size and number of offspring are correlated with the body size of female snakes, this is an important factor influencing male mate choice and reproductive success (Shine, 2003; Gregory, 1977). Male snakes are able to detect subtle differences in pheromone profiles and were found to prefer pheromones indicative of larger, and therefore more fecund, females. The sexual attractiveness pheromone appears to be the only means by which male Red-sided garter snakes recognize females. Therefore, this pheromone is considered essential to reproductive success in this species.


Pheromone-binding proteins:

The first pheromone-binding protein was discovered in the silk moth (Antheraea polyphemus). This protein was characterized and described through direct observation of its binding to the known pheromone molecule (Vogt and Riddiford, 1981). Since this first description, pheromone-binding proteins have been described in a large number of taxa and are common among animals (Pelosi & Maida, 1990). With the advent of high-throughput sequencing technology, a staggeringly large number of putative pheromone-binding proteins have been identified in insects (Fan et al. 2011). The functions of many of these proteins have been confirmed through experimental evidence (Brito et al. 2016; Pelosi and Maida, 1995; Chang et al. 2015). In insects, a large number of odorant-binding proteins may be identified even within a single species; Hekmat-Scafe et al. (2002) found 51 odorant-binding proteins in Drosophila melanogaster alone.

Many pheromone-binding proteins have been described in vertebrates as well. Pelosi et al. (1982) are generally credited with the first description of an odorant-binding protein, in the nasal mucosa of the cow (Bos taurus). Bocskei et al. (1992) were among the first to characterize and describe pheromone-binding proteins in the rat (Rattus norvegicus), a system that is now well known and is the most common vertebrate model for mammalian odorant- and pheromone-binding protein research. The authors used X-ray crystallography to observe the physical binding of known pheromone molecules by α2-globulin and major urinary proteins found in the urine of rats. Marchese et al. (1998) and Scaloni et al. (2001) identified pheromone-binding proteins in the nasal mucosa of the domestic pig (Sus scrofa domesticus). As with pheromones themselves, pheromone-binding proteins commonly display sex-biased expression, being present at much higher levels in the sex opposite to that producing the pheromone they bind (Beynon et al. 2008; Chang et al. 2015). Kohuro et al. (2017) characterized 34 odorant-binding proteins in the Asiatic rice borer (Chilo suppressalis) and used qPCR to evaluate expression by tissue and sex. The authors showed that many binding proteins display strongly sex-biased expression. Jin et al. (2014) characterized the expression of pheromone-binding proteins in rice borers and argued that differences in sex-biased expression levels likely reflect different biological functions of the pheromones they bind, suggesting that some pheromone/pheromone-binding protein pairs are used by males to detect conspecific males, some are used to locate and trail mates, and some are used by females to detect both males and other females.

Odorants (including many pheromones) are commonly small, often hydrophobic molecules (Pelosi & Maida, 1990). Not surprisingly, many odorant- and pheromone-binding proteins have been identified as small-molecule-binding proteins such as lipocalins (Tegoni et al. 2000). Lipocalins are a large family of small extracellular proteins with diverse sequences but conserved sequence motifs (Flower, 1996). Lipocalin protein structures are highly conserved, with an eight-stranded β-barrel that encloses an internal ligand-binding site. Lipocalins have been identified as pheromone-binding proteins in many taxa (Tegoni et al. 2000). The odorant-binding proteins first identified by Bocskei et al. (1992) were subsequently identified as lipocalins (Tegoni et al. 2000). The protein ‘aphrodisin’ was identified as a pheromone-binding lipocalin in the hamster (Mesocricetus auratus) (Singer & Macrides, 1993). Lipocalins have also been identified in the salivary glands of the domestic pig (Marchese et al. 1998) and in the urine of the mouse (Mus musculus) (Robertson et al. 1998).

Although many lipocalins have been identified as pheromone-binding proteins, Tegoni et al. (2000) argued that identifying a protein with the appropriate structural classification is not sufficient, without further evidence, to claim the identification of a pheromone-binding protein. To function as a pheromone-binding protein, a protein must be capable of physically binding a pheromone and must also be present in a location where it contacts both the pheromone and the chemical receptor responsible for detecting that pheromone. Pheromone-binding proteins must also be expressed in the appropriate sex and are likely to show sex-biased expression.


Although many lipocalins have been identified and confidently confirmed as pheromone-binding proteins in many groups, a great number of lipocalins may have other functions. The binding of small molecules, especially in sensitive tissues such as mucous membranes, may serve many biological functions. Stopková et al. (2014; 2017) investigated the physiological functions and evolutionary patterns of lipocalins identified in fluid on the mucous membranes of the eyes of mice. The authors present evidence for an alternative function of lipocalins expressed alongside pheromone-binding lipocalins, arguing that lipocalins may be used both for chemical signaling and in the immune system of the same tissue. Both sexually dimorphic and non-sexually dimorphic lipocalins were identified in the ‘tears’ of the mice used in these studies. The most highly sexually dimorphic lipocalins were identified as major urinary proteins (many of which are known pheromone-binding proteins). Lipocalins are also known to interact with small molecules in other capacities. The lipocalin α-1-microglobulin is known to bind and reduce radicals (Akerstrom et al. 2007). Flo et al. (2004) showed that the protein lipocalin-2 is involved in innate immune responses by binding and sequestering iron. These findings raise an interesting hypothesis about alternative functions of lipocalins in biological fluids.

Because the garter snake sexual attractiveness pheromone is very non-polar and insoluble in water, it is hypothesized that a pheromone-binding protein is expressed in males of this species and is responsible for transporting the pheromone to the vomeronasal sensory epithelium (Mason & Halpern, 2011).


Thamnophis sirtalis parietalis as a research model:

Thamnophis sirtalis parietalis (the Red-sided garter snake) provides an excellent research model that allows multiple approaches to be used to study many different aspects of vertebrate physiology. One of the most important characteristics of a vertebrate model is the availability of accessible wild populations large enough to provide adequate sample sizes, or the ability to maintain sufficient numbers in a laboratory facility. Wild populations of T. sirtalis in the Interlake region of Manitoba, Canada, are easily accessible and seasonally present in extremely high densities, allowing the observation or capture of thousands of animals per hour (Aleksiuk & Gregory, 1974). They can be easily transported and maintained for several years in a captive environment, and they will readily breed and give birth in captivity if housed appropriately (Blakemore et al., in prep).

Thamnophis sirtalis is a uniquely powerful research model with a broad range of applications within vertebrate physiology, but it is especially valuable as a model for the study of the vomeronasal chemosensory system and the Harderian gland. These structures are highly developed in this species (Rehorek et al. 2000b), and the vomeronasal chemosensory system is an extremely important component of garter snake physiology, affecting many different aspects of their biology. Garter snakes display very robust courtship, mating, and feeding behaviors driven almost entirely by their chemical senses. This allows the use of uniquely effective behavioral assays, which have been used successfully as research tools for decades (Kubie & Halpern, 1975; LeMaster et al. 2001; O’Donnell et al., 2004).

Laboratory mice have historically been the most common model for the study of the vomeronasal organ and Harderian gland. The genome of the house mouse (Mus musculus) was initially sequenced in 2002, and decades of previous research on this model have generated a vast knowledge base to be used as an informative tool (Mouse Genome Consortium, 2002). The vomeronasal organ of mice has long been identified as a chemosensory system used primarily for the detection of conspecific pheromones and heterospecific kairomones. The vomeronasal chemosensory system in snakes, however, is far more refined than that of rodents and plays a larger role in shaping behavior (Zuri & Halpern, 2003; Miller & Gutzke 1999). Compared with mice, snakes therefore provide an extremely powerful and informative model for the study of the vomeronasal organ, and especially for the study of the Harderian gland.

Additionally, T. s. parietalis displays several unique characteristics arising from its seasonally driven life history phases that accentuate behavioral and physiological adaptations to seasonal environmental changes (Aleksiuk & Gregory, 1974). Thamnophis sirtalis displays robust and mutually exclusive seasonal shifts in behavior. Mating behavior occurs immediately after emergence from hibernation and abruptly shifts to feeding behavior several weeks later (Gregory, 1974).

During the winter, the entire adult population of T. s. parietalis brumates in underground hibernacula to escape cold temperatures and to gain opportunities to mate in the spring. The largest of these hibernacula may contain tens of thousands of individuals (Aleksiuk & Gregory, 1974). As temperatures rise in the spring, male snakes emerge and congregate near the entrance to the den to await the emergence of females. Female snakes emerge over an extended period of approximately 30 days. Males locate and recognize females solely by using their vomeronasal organ to detect the sexual attractiveness pheromone expressed on the skin of female snakes (LeMaster and Mason, 2002). The high ratio of males to females near the den in the spring mating period results in mating aggregations (‘mating balls’) in which large numbers of male snakes surround a single female to compete for mating opportunities (Gregory, 1974). During this spring mating period, male snakes are very active and dedicate a very large fraction of their active time to searching for attractive females. While at the den, male snakes do not search for prey and show no interest in feeding, even if prey items are offered (O’Donnell et al. 2004). Spring mating behavior continues until all of the female snakes have emerged and moved away from the den. When mating opportunities at the den have ceased, male snakes disperse and travel as far as 20 km to find appropriate bodies of water with sufficient populations of prey items to serve as summer feeding grounds (Aleksiuk & Gregory, 1974). Upon arrival at the feeding areas, garter snakes no longer show interest in mating. At this point, the vomeronasal organ is used primarily to search for prey items, and snakes do not respond when presented with female pheromone (O’Donnell, 2004). Snakes remain in the summer feeding area until early fall, when they return to the den site to repeat this annual behavioral cycle.

The female sexual attractiveness pheromone of the Red-sided garter snake is one of the few vertebrate pheromones that have been thoroughly characterized. The pheromone is composed of a homologous series of long-chain methyl ketones present on the skin of female snakes. This lipid mixture is a hydrophobic solid that does not readily dissolve in an aqueous environment (Mason et al. 1989).

Because the molecules that make up the pheromone are hydrophobic, a mechanism must exist to dissolve them in aqueous solution before they can be transported into the vomeronasal lumen to contact sensory neurons (Mason & Halpern, 2011). Harderian gland secretions are hypothesized to perform this function. Huang et al. (2006) demonstrated that sex pheromone, while insoluble on its own, became soluble in the presence of Harderian gland homogenate. Histological studies show that the Harderian gland contains large numbers of secretory vesicles containing proteins (Rehorek, 2011). Because the vomeronasal organ has little intrinsic capacity for fluid secretion, the Harderian gland appears to produce the fluid filling the vomeronasal lumen (Rehorek et al. 2000b). Together, these observations suggest that a component of Harderian gland secretion facilitates the solubilization and transport of nonvolatile, nonpolar chemical signals (such as pheromone) from the environment to the vomeronasal sensory epithelium (Rehorek et al. 2011; Mason & Halpern, 2011). Breeding and feeding are arguably the two most important events in the life history of any animal. In snakes the vomeronasal organ mediates both, so it is a crucial component of the survival and fitness of individual snakes. The Harderian gland and the vomeronasal organ appear to function together in this species to facilitate chemoreception, affecting the snakes’ ability to recognize and assess mates as well as to detect and locate prey. In this sense, the Harderian gland of this species may be as important to the chemosensory system as the vomeronasal organ itself.


Seasonal and sexual dimorphism of the Harderian gland in T. s. parietalis:

Erickson (2007) investigated the structure of the Harderian gland in T. s. parietalis using histological techniques and documented sexually dimorphic seasonal changes in the gland’s structure across the winter brumation, spring mating, and summer feeding time points. This study was performed as an independent undergraduate research project in partial fulfillment of a bachelor’s degree from the Oregon State University Honors College. Because this study was performed in the Mason laboratory but has not yet been submitted for publication, I include here an abbreviated description of the histological and statistical methods used and report the author’s findings.

Twenty-seven adult Red-sided garter snakes (n = 15 males and n = 12 females) were collected from the field site near Inwood, Manitoba. “Summer” snakes were collected in late spring and housed under summer conditions for 20 weeks prior to tissue collection. “Winter” snakes were collected at the same location in late fall and housed under winter conditions for 10 weeks prior to tissue collection. “Spring” snakes were also collected in the fall and housed under winter conditions until spring, and their tissues were collected 7 days after emergence from brumation. All snakes were weighed and their snout-vent lengths measured prior to euthanasia.

Snakes were anesthetized with an injection of Brevital® sodium, and tissues were perfused with 10% phosphate-buffered formalin via a peristaltic pump. This method fixes the tissue quickly, improving the quality of histology and imaging. Following perfusion, Harderian glands were removed and individually weighed. Glands were embedded in paraffin and sectioned at a thickness of 10 µm. Slides were stained using a standard hematoxylin and eosin protocol. Sections were photographed under a microscope, and acinar cell heights and lumen diameters were measured for 20 acini per gland.


Statistics:

Paired t-tests were used to compare the masses of left and right Harderian glands. A three-way ANOVA was used to test for differences in acinar cell height and acinar lumen diameter between left and right glands, between males and females, and among seasons. No significant differences were detected between left and right glands in either cell height or lumen diameter. Measurements from the left and right glands were therefore combined, for a total of 40 cell heights and 40 lumen diameters per individual. Female T. s. parietalis are generally much larger than males of the same age. To account for this sexual dimorphism in body mass and head size, gland masses were divided by head length to create an “adjusted HG mass”. An “HG mass index” was defined as the residuals from a linear regression of Harderian gland mass on head length, and the regression was checked for normality before residuals were calculated. Two-way ANOVAs with sex and season as factors were used to compare HG mass indices and adjusted HG masses. All post-hoc comparisons were analyzed using Tukey’s HSD. All statistical tests were performed using SigmaStat 3.1®.
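The two size corrections and the left/right comparison above can be sketched as follows. This is a minimal illustration, not the SigmaStat procedure used by Erickson (2007). It makes three assumptions: an ordinary least-squares fit, the summed mass of both glands as "gland mass" (the text does not say which mass was regressed), and hypothetical numbers rather than study data. The helper names `adjusted_mass`, `hg_mass_index`, and `paired_t` are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev


def adjusted_mass(gland_mass_g, head_length_mm):
    """Ratio correction: gland mass divided by head length (g/mm)."""
    return [m / h for m, h in zip(gland_mass_g, head_length_mm)]


def hg_mass_index(gland_mass_g, head_length_mm):
    """Residuals from an OLS regression of gland mass on head length."""
    x_bar, y_bar = mean(head_length_mm), mean(gland_mass_g)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(head_length_mm, gland_mass_g))
    sxx = sum((x - x_bar) ** 2 for x in head_length_mm)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Observed minus predicted mass for each snake's head length.
    return [y - (intercept + slope * x) for x, y in zip(head_length_mm, gland_mass_g)]


def paired_t(left, right):
    """Paired t statistic and degrees of freedom for left vs. right glands."""
    diffs = [l - r for l, r in zip(left, right)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical values for five snakes (NOT data from Erickson, 2007).
head = [18.0, 20.5, 22.0, 24.5, 27.0]     # head length, mm
left = [0.010, 0.013, 0.012, 0.016, 0.019]  # left gland mass, g
right = [0.011, 0.012, 0.013, 0.015, 0.020]  # right gland mass, g
total = [l + r for l, r in zip(left, right)]  # assumption: summed mass

print(adjusted_mass(total, head))
print(hg_mass_index(total, head))
print(paired_t(left, right))
```

OLS residuals from a model with an intercept always sum to zero. Each snake's index is therefore its deviation from the mass expected for its head length, and a positive value marks a relatively large gland regardless of overall body size.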


Results:

Harderian Gland Mass:

Harderian gland mass correlated significantly with head length in both male and female snakes (R²=0.68, df=25, p<0.001) (Figure 1.8). Mean HG mass indices differed significantly between male and female snakes (F=7.601, df=1, p=0.012) and among the three seasons in which measurements were collected (F=6.277, df=2, p=0.007). The mean HG mass index of males was significantly larger than that of females (q=3.899, p=0.012). The mean HG mass index in summer was significantly greater than that in winter (q=4.687, p=0.009) and in spring (q=3.877, p=0.032). The mean HG mass index did not differ significantly between the winter and spring time points, and the sex-by-season interaction term was not significant. Adjusted HG masses differed significantly among the three seasons (F=9.055, df=2, p=0.001) but did not differ significantly between males and females. Adjusted HG masses were significantly greater in summer than in both spring (q=5.425, p=0.003) and winter (q=4.969, p=0.006).

Figure 1.8 Harderian gland mass against head length: Linear regression models of Harderian gland mass (g) plotted against total head length (mm). Regression lines were fit using SigmaStat 3.1® for n = 15 males (red) and n = 12 females (black). Relative to head length, male glands were significantly larger than those of females across the seasons in which they were measured. Figure: Erickson (2007).


Acinar cell height:

Acinar cell height differed significantly by season (F=151.706, df=2, p<0.001) and by sex (F=21.532, df=1, p<0.001), and the sex-by-season interaction was significant (F=15.921, df=2, p<0.001) (Figure 1.9). Combining males and females, cell heights in summer were significantly greater than those in winter (q=24.918, p<0.001) and those in spring (q=15.671, p<0.001). Cell heights in spring were significantly greater than those in winter (q=9.247, p<0.001). Combining data from all three seasons, cell heights of male snakes were greater than those of female snakes (q=6.711, p<0.001). In the spring, acinar cell heights of males were significantly larger than those of females (q=10.497, p<0.001), whereas the two sexes did not differ significantly during the winter or the summer. Acinar cell heights of male snakes were significantly greater in summer than in winter (q=18.012, p<0.001), in summer than in spring (q=6.147, p<0.001), and in spring than in winter (q=11.865, p<0.001). Acinar cell heights of female snakes were significantly greater in summer than in winter, but did not differ significantly between spring and winter.

Figure 1.9 Acinar cell heights by sex and season: Mean acinar cell heights of male (n=5; red) and female (n=4; black) garter snakes during the winter brumation, spring mating, and summer feeding time periods. Cell heights of males were significantly greater than those of females only during the spring mating period (q=10.497, p<0.001). Bars indicate the standard error of the mean cell heights. Figure: Erickson (2007).


Acinar lumen diameter:

Acinar lumen diameter differed significantly by season (F=59.094, df=2, p=0.001) and by sex (F=11.181, df=1, p=0.002), and the sex-by-season interaction was significant (F=5.122, df=2, p=0.010) (Figure 1.10). Lumen diameters of males were significantly greater than those of females in the winter (q=4.580, p=0.002) and in the spring (q=4.879, p=0.001), but did not differ significantly between the sexes during the summer. Lumen diameters of male snakes varied by season. They were significantly greater in summer than in winter (q=6.704, p<0.001) and in summer than in spring (q=7.759, p<0.001), but did not differ significantly between winter and spring. Lumen diameters of female snakes also varied by season. They were significantly larger in summer than in winter (q=11.285, p<0.001) or spring (q=12.511, p<0.001), but did not differ significantly between winter and spring. Combining males and females, acinar lumen diameters in summer were significantly greater than those in winter (q=12.880, p<0.001) and those in spring (q=14.498, p<0.001). Combined lumen diameters of spring snakes did not differ significantly from those of winter snakes. Combining data from all three seasons, acinar lumen diameters of male snakes were significantly greater than those of female snakes (q=4.887, p<0.001).

Figure 1.10 Acinar lumen diameters of Harderian glands by sex and season: Mean acinar lumen diameters of male (red) and female (black) garter snakes compared across three seasons (n = 5 males and 4 females per season). Bars indicate standard error. Figure: Erickson (2007).


Discussion of the findings of Erickson (2007):

Previous studies investigating the Harderian gland have reported sexual dimorphism in the gland’s structure. Shiao et al. (2012) used transcriptomic analyses to show that the expression of olfactory receptors in mice is sexually dimorphic, which likely results in differences between male and female mice in their ability to detect chemical cues. Dawley & Crowder (1995) observed significant seasonal and sexual dimorphism in the vomeronasal sensory epithelium of the red-backed salamander (Plethodon cinereus). Other studies have reported sexual dimorphism of the Harderian gland in the European green toad (Bufo viridis) (Minucci et al. 1989; 1994), the frog (Rana esculenta) (Serino et al. 2007), the rat (Rattus norvegicus) (Sashima et al. 1989), and the golden hamster (Mesocricetus auratus) (Buzzell, 1996a; Bucana & Nadakavukaren, 1972). The composition of Harderian gland protein secretions was found to be sexually dimorphic in the African clawed frog (Xenopus laevis) (Varriale & Chieffi, 1997) and in the hamster (Hoh et al. 1984). Harderin, an mRNA found in the Harderian gland of R. esculenta, is differentially expressed in males and females and regulated by hormone concentrations (Serino et al. 2007).

Seasonal variation in the Harderian gland has been reported in amphibians (Chieffi et al. 1992; Di Matteo et al. 1989; 1995; Minucci et al. 1989; 1990; Serino et al. 2007), in the hamster M. auratus (Buzzell et al. 1996), and in the gecko Tarentola mauritanica (Chieffi et al. 2000). Seasonal variation in the number of sensory neurons was observed in the Japanese toad (Bufo japonicus) (Nakazawa et al. 2009). The authors additionally noted that the observed neurogenesis was associated with increased sensitivity to specific chemical cues.

The results presented here show that the physical structure of the Harderian gland in T. s. parietalis is both seasonally variable and sexually dimorphic. In both males and females, the Harderian glands are larger in the summer than in spring and winter, and males have relatively larger glands than females. Histological results mirror those obtained from mass measurements. Acinar cell heights differ significantly by sex and by season, and the cell heights of male snakes are greater than those of females. The difference in cell height between the sexes was greatest in the spring, when cell heights of males were significantly larger than those of females, and it was not observed during the winter or the summer. Similarly, acinar lumen diameters of both males and females were greatest during the summer, when they did not differ significantly between the sexes. However, in both winter and spring, the acinar lumen diameter of males was larger than that of females. These findings are the first description of sexual dimorphism in the Harderian gland of any squamate taxon.

It is well documented that the annual behavioral cycles of T. s. parietalis are tightly coordinated with the seasonal environment (Aleksiuk & Gregory, 1974). The results presented here show that the Harderian gland of T. s. parietalis undergoes sexually dimorphic structural changes in coordination with the annual behavioral cycles observed in this species. Acinar cell height is likely the best indicator of secretory activity, as the cells must accumulate secretory granules in order to increase secretory activity (Rehorek et al. 2011).

Here, we observe a large difference in acinar cell heights between males and females in the spring, with male glands appearing enlarged and active. During the winter, the Harderian glands of both males and females are regressed and inactive, and during the summer, the Harderian glands of both sexes are enlarged and active.

During the spring mating period, male T. s. parietalis are consistently observed to actively use their vomeronasal chemosensory system to search for mates expressing female sexual attractiveness pheromone (Aleksiuk & Gregory, 1974; LeMaster and Mason, 2002). Females, in contrast, are not observed to tongue-flick and do not appear to use their vomeronasal organs to explore their chemical environment. In the winter, neither sex appears to use its vomeronasal organ. During the summer, both males and females appear to actively use their vomeronasal organs to locate prey kairomones, and likely also to detect predator kairomones or conspecific pheromones.


The results presented here mirror these behavioral cycles. The Harderian glands of both sexes are enlarged and active during the seasons when the vomeronasal organ is active and biologically important, and regressed when it is inactive. This suggests that the Harderian gland is intricately involved in the vomeronasal chemosensory system of this species. Sexually dimorphic seasonal changes in the function and structure of the vomeronasal chemosensory system have also been observed in amphibians. Dawley & Crowder (1995) observed seasonal and sexual dimorphism in the vomeronasal sensory epithelium of the red-backed salamander (Plethodon cinereus). These salamanders have a seasonal life history and use the vomeronasal organ differently across seasons. Given this, the authors suggested that seasonal neurogenesis in the vomeronasal sensory epithelium may be related to the observed shift from mating to feeding.

The observations of sexual dimorphism and seasonal change in the Harderian gland discussed here, together with the histological observations previously described by Rehorek (2000b; 2011), demonstrate that the vomeronasal chemosensory structures of T. s. parietalis, including the Harderian gland, are seasonally regulated. These structures may be optimized for energy conservation during brumation, for the detection of sex pheromones by males during the spring, and for the detection of prey kairomones by males and females during the summer.


A brief introduction to technical methods:

Preparing an expression-normalized transcriptome to serve as a reference sequence database:

A common challenge in transcriptome sequencing arises from unequal expression of individual genes (Conesa et al. 2016). High-throughput sequencing samples transcripts roughly in proportion to their abundance, so very highly expressed genes are sequenced far more often than rare transcripts. This produces very large numbers of reads from common transcripts and few or no reads from rare ones, and rare transcripts may therefore be missing from the transcriptome (Zhulidov et al. 2004). To compensate for this effect, I created a normalized transcriptome library. Normalization reduces the abundance of highly expressed transcripts and thereby increases the likelihood of capturing rare transcripts (Kitchen et al. 2015; Meyer et al. 2009; Zhulidov et al. 2004). This protocol is effective at sequencing rare gene products, but it necessarily sacrifices information about the expression level of each gene. Additionally, a transcriptome created from a multi-tissue pool of mRNA contains no information about the expression of any particular transcript, or about the origin (tissue or individual) of individual transcripts. To yield biologically relevant data, the transcriptome must therefore be paired with other techniques. The normalized transcriptome produced during this research is used repeatedly as a reference database. 3’ Tag-seq and protein mass spectrometry (both described below) are used to obtain expression profiles for transcripts and/or proteins, which then determine the tissue specificity and expression levels of individual transcripts represented in the transcriptome.
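The sampling argument above can be made concrete with a small calculation. Assume that reads are drawn independently in proportion to transcript abundance. The expected number of transcripts hit at least once by N reads is then the sum over transcripts of 1 - (1 - p_i)^N. The abundance values are hypothetical, and the capping rule that stands in for normalization is an assumption. Wet-lab normalization (for example, with duplex-specific nuclease) only approximately flattens the abundance distribution.

```python
def expected_detected(abundances, n_reads):
    """Expected number of transcripts sampled by at least one of n_reads reads."""
    total = sum(abundances)
    # P(transcript i is missed by every read) = (1 - p_i) ** n_reads
    return sum(1 - (1 - a / total) ** n_reads for a in abundances)


def cap_abundances(abundances, ceiling):
    """Hypothetical stand-in for normalization: reduce abundances above `ceiling`."""
    return [min(a, ceiling) for a in abundances]


# 10 very abundant transcripts and 990 rare ones (hypothetical copy numbers).
raw = [100_000] * 10 + [1] * 990
normalized = cap_abundances(raw, ceiling=10)

reads = 20_000
print(round(expected_detected(raw, reads)))
print(round(expected_detected(normalized, reads)))
```

With these made-up numbers, the unnormalized library is expected to detect only roughly 30 of the 1,000 transcripts. The flattened library detects nearly all of them with the same number of reads. The cost is the one noted above: after normalization, read counts no longer reflect expression levels.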

Tag-seq – an efficient and effective variation of RNA-seq gene expression profiling:

RNA-seq is a well-established method for identifying the mRNA transcripts expressed in an organism or tissue (Conesa et al. 2016). Traditional RNA-seq uses full-length transcript sequences to assemble a transcript library and to perform gene expression profiling in a single sequencing run. Tag-seq is an alternative method that sequences only the 3’ end of mature mRNA molecules. This allows the use of single-end Illumina sequencing and reduces the sequencing depth per sample required to achieve meaningful results (Meyer et al. 2011; Lohman et al. 2016). Because fewer reads are needed per sample, more biological replicates can be sequenced at the same cost, which substantially increases statistical power while reducing sequencing costs (Lohman et al. 2016). Because this method sequences only the 3’ end of each transcript, its reads cannot be used to assemble full-length transcripts. It therefore relies on an appropriate, high-quality reference database assembled from a previous sequencing run. Tag-seq is used in some capacity to assess gene expression profiles in all chapters of this dissertation, and in each case Tag-seq reads are mapped to the normalized transcriptome described above.
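The counting step that follows read mapping can be sketched as below. This is not the dissertation's pipeline. It assumes each Tag-seq read has already been assigned to a transcript in the reference transcriptome, and the transcript IDs are hypothetical. Tag-seq yields roughly one tag per mRNA molecule, taken from its 3’ end, so counts are only scaled by library size (counts per million). No correction for transcript length is applied.

```python
from collections import Counter


def counts_per_million(assignments):
    """Convert read-to-transcript assignments into counts per million (CPM).

    `assignments` holds one transcript ID per read, or None for reads that
    did not map to the reference transcriptome.
    """
    counts = Counter(a for a in assignments if a is not None)  # drop unmapped reads
    library_size = sum(counts.values())
    # One tag per molecule, so no length normalization is applied.
    return {tx: c * 1_000_000 / library_size for tx, c in counts.items()}


# Hypothetical read assignments after mapping.
mapped = ["tx_lipocalin", "tx_lipocalin", "tx_actin", None, "tx_lipocalin", "tx_actin"]
print(counts_per_million(mapped))
```

In practice, raw counts (not CPM) are usually passed to a differential expression package, which applies its own library-size normalization.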

Enrichment analysis to obtain biologically meaningful results from gene expression data:

The software package ErmineJ™ uses an enrichment analysis method based on gene score resampling (GSR). In GSR, each gene set (here, the genes annotated to a Gene Ontology term) receives a score that aggregates the scores of its member genes. The null hypothesis is that this gene set score is no more extreme than that of a set of the same size drawn at random from all scored genes, and the null distribution is estimated by resampling (Pavlidis et al. 2002). Using GSR, ErmineJ tests for functional enrichment of all Gene Ontology terms, and all of their parent terms, associated with each annotated gene in a reference database. A user-provided gene score is used to calculate a p-value for each gene set under the null hypothesis (Gillis et al. 2010). In short, this method tests whether the genes in a particular gene set have higher scores than would be expected if the set had been drawn at random from all genes. Here, the gene score is -log10(p-value) × log2FC, calculated from differential expression analysis. This score combines the magnitude and direction of the change in expression with the confidence in that change, so it is directionally aware. After calculating individual p-values for each gene set, ErmineJ applies a Benjamini-Hochberg multiple test correction to control the false discovery rate (Benjamini & Hochberg, 1995). This statistical procedure yields a conservative estimate of the gene sets that are enriched in a particular dataset based on differential expression analysis (Pavlidis et al. 2002).
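The three steps described above (gene score, resampling-based p-value, Benjamini-Hochberg correction) can be sketched in plain Python. This is not ErmineJ's code, and ErmineJ offers more options than shown here. The sketch assumes three things: a gene set's score is the mean of its members' scores, p-values come from a fixed number of random gene sets of the same size, and larger scores count as more enriched. Gene identifiers, GO set names, and differential expression values are hypothetical.

```python
import math
import random
from statistics import mean


def gene_score(p_value, log2_fc):
    """Directional gene score: -log10(p) multiplied by log2 fold change."""
    return -math.log10(p_value) * log2_fc


def gsr_p_value(set_genes, all_scores, n_resamples=10_000, seed=1):
    """Empirical p-value: how often a random gene set of equal size scores as high."""
    rng = random.Random(seed)
    observed = mean(all_scores[g] for g in set_genes)
    pool = list(all_scores.values())
    k = len(set_genes)
    hits = sum(mean(rng.sample(pool, k)) >= observed for _ in range(n_resamples))
    return (hits + 1) / (n_resamples + 1)  # add-one avoids reporting p = 0


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical differential expression results: gene -> (p-value, log2 fold change).
de_results = {
    "g1": (1e-6, 2.5), "g2": (1e-4, 1.8), "g3": (0.2, 0.1),
    "g4": (0.5, -0.2), "g5": (0.01, -1.5), "g6": (0.3, 0.05),
}
scores = {g: gene_score(p, fc) for g, (p, fc) in de_results.items()}
go_sets = {"GO_A": ["g1", "g2"], "GO_B": ["g3", "g4"]}  # hypothetical gene sets
raw_p = [gsr_p_value(genes, scores, n_resamples=2000) for genes in go_sets.values()]
print(dict(zip(go_sets, benjamini_hochberg(raw_p))))
```

With only six genes the resampling p-values cannot become small. The example shows the mechanics, not a realistic result. Real analyses score thousands of genes against thousands of GO terms.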

Liquid chromatography - Tandem mass spectrometry to identify and relatively quantify proteins in a complex mixture:

Although mRNA expression profiles are a valuable tool for identifying the mechanisms that regulate physiological processes in a specific tissue under specific conditions, mRNA abundance is not always a reliable indicator of the abundance of the encoded proteins (Li et al. 2014; Taniguchi et al. 2010). The investigations in this body of research target gene products that are not only expressed as mRNA but also translated into proteins that enter the extracellular secretory pathway and are present in the fluid within the lumen of the vomeronasal organ. The discrepancy between mRNA abundance measured via RNA-seq and the abundance of the corresponding proteins actually present in the vomeronasal lumen may be substantial. It is therefore necessary to conduct analyses at the level of proteins, and in the location where those proteins are likely to exert their biological effects. To identify the protein components of vomeronasal secretions, I collected fluid directly from the vomeronasal duct and identified its protein constituents via liquid chromatography coupled with tandem mass spectrometry (LC/MS/MS).
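Relative quantification from LC/MS/MS output can take several forms. The text does not specify which method was used. One common label-free option is the normalized spectral abundance factor (NSAF). NSAF divides each protein's spectral count by its length and then scales the values to sum to one. It is shown here only as an illustrative assumption, with hypothetical protein names, counts, and lengths.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), where SpC is the spectral
    count and L the protein length in amino acids.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}


# Hypothetical spectral counts and protein lengths (amino acids).
counts = {"lipocalin_A": 120, "lysozyme_like": 40, "albumin_like": 60}
lengths = {"lipocalin_A": 180, "lysozyme_like": 140, "albumin_like": 600}
print(nsaf(counts, lengths))
```

Dividing by length matters because longer proteins yield more peptides, and therefore more spectra, at the same molar amount.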

Bacterial killing assays to relatively quantify bactericidal properties of biological fluids:

Variations on bacterial killing assays (BKAs) are used extensively in immunological research to quantify the antimicrobial properties of biological fluids such as blood plasma (Dugovich et al. 2017). In general, a BKA involves exposing live bacteria to a biological fluid and allowing antimicrobial proteins within the fluid to bind, inactivate, or kill the bacterial cells. After this exposure, the bacteria are incubated and optical density readings are recorded at multiple time points to obtain growth curves. The growth curves are then compared among samples to relatively quantify the antimicrobial potency of each sample. Here, I use a modified form of BKA to demonstrate that vomeronasal secretions possess antimicrobial properties, and that these properties diminish as the concentration of secretions decreases.
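One way to reduce growth curves to a single comparable number is sketched below. This is an illustrative assumption, not necessarily the modified BKA analysis used in this research. It assumes every well is read at the same time points, summarizes growth as the area under the optical density curve (trapezoidal rule), and expresses killing relative to a bacteria-only control. All time points and OD values are hypothetical.

```python
def auc(times, od):
    """Area under a growth curve by the trapezoidal rule."""
    return sum(
        (t1 - t0) * (y0 + y1) / 2
        for t0, t1, y0, y1 in zip(times, times[1:], od, od[1:])
    )


def percent_inhibition(sample_od, control_od, times):
    """Growth suppression relative to the untreated control, in percent."""
    return 100 * (1 - auc(times, sample_od) / auc(times, control_od))


# Hypothetical OD readings (hours after exposure).
hours = [0, 2, 4, 6, 8]
control = [0.05, 0.10, 0.30, 0.60, 0.80]       # bacteria only
undiluted = [0.05, 0.05, 0.06, 0.08, 0.10]     # bacteria + undiluted secretion
diluted_1_10 = [0.05, 0.08, 0.20, 0.45, 0.70]  # bacteria + 1:10 secretion

for name, od in [("undiluted", undiluted), ("1:10", diluted_1_10)]:
    print(name, round(percent_inhibition(od, control, hours), 1))
```

A dilution series summarized this way should show inhibition falling as the secretion is diluted, which is the dose-dependent pattern described above.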


Project overview: The research presented in Chapters 2-6 of this thesis focuses on elucidating the function of the Harderian gland in T. s. parietalis and describing its role within the vomeronasal chemosensory system. Chapter 2 details the creation of a multi-tissue transcriptome and of the tissue-specific transcriptome of the Harderian gland in T. s. parietalis, and discusses the functional significance of the expressed genes. The research presented in this chapter used high-throughput RNA sequencing to create a normalized transcriptome. Expression data obtained from Harderian glands of T. s. parietalis were used to describe the Harderian gland transcriptome. Additionally, the expression of genes in the Harderian gland was compared to that of a multi-tissue pool, and enrichment analysis was used to determine the functional significance of the genes expressed in the Harderian gland. Chapter 3 characterizes the protein components of vomeronasal secretions. It tests the assumption presented by Rehorek (2000b; 2011) that the Harderian gland produces the majority of the proteins within this fluid. This chapter presents the results of analyses using protein mass spectrometry to identify protein components of vomeronasal secretion, and addresses possible tissues of origin of individual proteins via RNA-seq expression analyses. Additionally, the results presented in this chapter confirm the putative antimicrobial properties of vomeronasal secretion using BKAs. Chapter 4 describes seasonal variation and sexual dimorphism of gene expression in the Harderian gland of T. s. parietalis, and discusses the functional significance of the expressed genes within the context of the vomeronasal chemosensory system. This chapter used Tag-seq, with reads mapped to a normalized transcriptome, to obtain gene expression profiles describing sexual dimorphism and seasonal variation of gene expression in the Harderian gland of T. s. parietalis.
Chapter 5 describes the seasonal variation and sexual dimorphism of the vomeronasal receptor repertoire of T. s. parietalis. It discusses the functional significance of this variation as a possible mechanism mediating the behavioral shift from mating to feeding that takes place between spring and summer. This chapter used Tag-seq, with reads mapped to a normalized transcriptome, to assess sexual and seasonal variation in the expression of vomeronasal chemoreceptors. Chapter 6 combines and synthesizes the findings of Chapters 2-5 and discusses their significance within the context of the body of research suggesting that the Harderian gland functions as a component of the vomeronasal chemosensory system.


References:
Achiraman, S., Archunan, G. (2005). 3-Ethyl-2,7-dimethyl octane, a testosterone dependent unique urinary sex pheromone in male mouse (Mus musculus). Animal Reproduction Science. 87:151-61.
Akerstrom, B., Maghzal, G., Winterbourn, C., Kettle, A. (2007). The lipocalin alpha(1)-microglobulin has radical scavenging activity. Journal of Biological Chemistry. 282:31493-31503.
Albone, E. (1984). Mammalian Semiochemistry: The investigation of chemical signals between mammals. Chichester.
Aleksiuk, M., Gregory, P. (1974). Regulation of seasonal mating behavior in Thamnophis sirtalis parietalis. Copeia. 1974(3):681-689.
Alving, W., Kardong, K. (1996). The role of the vomeronasal organ in rattlesnake (Crotalus viridis oreganus) predatory behavior. Brain, Behavior and Evolution. 48(3):165-172.
Bargmann, C. (1997). Olfactory receptors, vomeronasal receptors, and the organization of olfactory information. Cell. 90(4):585-587.
Belanger, R., Corkum, L. (2009). Review of aquatic sex pheromones and chemical communication in anurans. Journal of Herpetology. 43(2).
Bellairs, A. (1970). The life of reptiles. Universe Books, New York.
Bellairs, A., Boyd, J. (1950). The lachrymal apparatus in lizards and snakes II: The anterior part of the lachrymal duct and its relationship with the palate and with the nasal and vomeronasal organs. Proc. Zool. Soc. Lond. 120:167-310.
Benjamini, Y., Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 57(1):289-300.
Bernstein, C., Bernstein, H. (1997). Sexual communication. Journal of Theoretical Biology. 188(1):69-78.
Bertmar, G. (1981). Evolution of vomeronasal organs in vertebrates. Evolution. 35:359-366.
Beynon, R., Hurst, J., Turton, M., Robertson, D., Armstrong, S., Cheetham, S., Simpson, D., MacNicoll, A., Humphries, R. (2008). Urinary lipocalins in Rodentia: Is there a Generic Model? In: Chemical Signals in Vertebrates 11. Springer, New York, NY.
Blakemore, L., Bentz, E., Hubert, D., Morehead, A., Mason, R. (In Prep.). Improved husbandry techniques resulting in higher lab-raised neonatal survival in the Red-sided garter snake, Thamnophis sirtalis parietalis.


Bocskei, Z., Groom, C., Flower, D., Wright, C., Phillips, S., Cavaggioni, A. (1992). Pheromone-binding to two rodent urinary proteins revealed by X-ray crystallography. Nature. 360:186-188.
Boillat, M., Challet, L., Rossier, D., Kan, C., Carleton, A., Rodriguez, I. (2015). The vomeronasal system mediates sick conspecific avoidance. Curr Biol. 25(2):251-255.
Born, G. (1876). Über die Nasenhöhlen und der Thränennasengang der Amphibien. Gegenb. morph. Jahrb. 2:578-64.
Brito, N., Moreira, M., Melo, A. (2016). A look inside odorant-binding proteins in insect chemoreception. Journal of Insect Physiology. 95:51-65.
Broman, I. (1920). Das Organon Vomero-Nasale Jacobsoni-ein Wassergeruchsorgan! Arb. anat. Inst., Wiesbaden (Anat. Hefte, Abt. I) 58:137-191.
Brownscheidle, C. (1974). The Morphology and Histochemistry of the Harderian Gland of the Mongolian Gerbil, Meriones unguiculatus. Doctoral dissertation, State University of New York at Buffalo, Buffalo, NY.
Brykczynska, U., Tzika, A., Rodriguez, I., Milinkovitch, M. (2013). Contrasted evolution of the vomeronasal receptor repertoires in Mammals and Squamate reptiles. Genome Biology and Evolution. 5(2):389-401.
Bucana, C., Nadakavukaren, M. (1972). Fine structure of the hamster Harderian gland. Z. Zellforsch. 129:178-187.
Buck, L., Axel, R. (1991). A novel multigene family may encode odorant receptors: a molecular basis for odor recognition. Cell. 65:175-187.
Burns, R. (1992). The Harderian gland in birds: histology and immunology. In: Harderian glands: porphyrin metabolism, behavioral and endocrine effects. Springer. 155-163.
Buzzell, G. (1996a). Sexual dimorphism in the Harderian gland of the Syrian hamster is controlled and maintained by hormones, despite seasonal fluctuations in hormone levels: functional implications. Microsc. Res. Tech. 34:133-138.
Buzzell, G. (1996b). The Harderian gland: perspectives. Microsc. Res. Tech. 34:2-5.
Buzzell, G., Menendez-Pelaez, A., Porkka-Heiskanen, T., Pangerl, B., Vaughan, M., Reiter, R. (1989). Bromocriptine prevents the castration-induced rise in porphyrin concentration in the Harderian glands of the male Syrian hamster, Mesocricetus auratus. Journal of Experimental Zoology. 249:172-176.
Chang, H., Liu, Y., Yang, T., Pelosi, P., Dong, S., Wang, G. (2015). Pheromone-binding proteins enhance the sensitivity of olfactory receptors to sex pheromones in Chilo suppressalis. Scientific Reports. 5:1309.
Cheetham, S., Thom, M., Jury, F., Ollier, W., Beynon, J., Hurst, J. (2007). The genetic basis of individual-recognition signals in the mouse. Curr Biol. 17:1771-1777.


Chieffi Baccari, G., Chieffi, G., Di Matteo, L., Danfis, D., De Rienzo, G., Minucci, S. (2000) Morphology of the Harderian gland of the gecko, Tarentola mauritanica. Journal of Morphology. 244(2):137-142.
Chieffi, G., Baccari, G., Di Matteo, L., Istria, M., Minucci, S., Varriale, B. (1996). Cell Biology of the Harderian Gland. International Review of Cytology. 168:1-80.
Chieffi, G., Chieffi-Baccari, G., Di Matteo, L., d’Istria, M., Marmorino, C., Minucci, S. and B. Varriale. (1992). The Harderian gland of amphibians and reptiles. In: Harderian Glands. Porphyrin Metabolism, Behavioral and Endocrine Effects. Webb, S.M., Hoffman, R.A., Puig-Domingo, M.L. and R. J. Reiter (Eds). Springer-Verlag, Berlin. 91-108.
Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., McPherson, A., Wojciech Szcześniak, M., Gaffney, D., Elo, L., Zhang, X., Mortazavi, A. (2016). A survey of best practices for RNA-seq data analysis. Genome Biology. 17:13.
Coto-Montes, A., Boga, J., Tomás-Zapico, C., Rodríguez-Colunga, M., Martínez-Fraga, J., Tolivia-Cadrecha, D., Menéndez, G., Hardeland, R. (2001) Physiological oxidative stress model: Syrian hamster Harderian gland—sex differences in antioxidant enzymes. Free Radical Biology and Medicine. 30(7):785-792.
Cucinotta, F., Chappell, L. (2010) Non-targeted effects and the dose response for heavy ion tumor induction. Mutat Res. 687(1-2):49-53.
D'Aniello, B., Semin, G. R., Scandurra, A., & Pinelli, C. (2017). The Vomeronasal Organ: A Neglected Organ. Frontiers in Neuroanatomy. 11:70.
Dawley, E., Crowder, J. (1995) Sexual and seasonal differences in the vomeronasal epithelium of the red-backed salamander (Plethodon cinereus). Journal of Comparative Neurology. 359(3):382-90.
Deist, M., Lamont, S. (2018) What makes the Harderian gland transcriptome different from other chicken immune tissues? A gene expression comparative analysis. Frontiers in Physiology. 9:492.
Di Matteo, L., Minucci, S., Chieffi Baccari, G., Pellicciari, C., d’Istria, M., Chieffi, G. (1989) The Harderian gland of the frog, Rana esculenta, during the annual cycle: histology, histochemistry and ultrastructure. Basic and Applied Histochemistry. 33(2):93-112.
d'Istria, M., Chieffi-Baccari, G., Di Matteo, L., Minucci, S., Varriale, B., Chieffi, G. (1991) Androgen receptor in the Harderian gland of Rana esculenta. Journal of Endocrinology. 129:227–232.
Domínguez-Pérez, D., Durban, J., Aguero-Chapin, G., Lopes, J., Molina, Ruiz, R., Almeida, D., Calvete, J., Vasconcelos, V., Antunes, A. (2018). The Harderian gland transcriptomes of Caraiba andreae, Cubophis cantherigerus and Tretanorhinus variabilis, three colubroid snakes from Cuba. Genomics. In press.
Døving, K., & Trotier, D. (1998). Structure and function of the vomeronasal organ. Journal of Experimental Biology. 201(21):2913-2925.
Døving, K., Trotier J., Rosin, F., Holley, A. (1993). Functional architecture of the vomeronasal organ of the frog (Rana). Acta Zool. 74:173-180.
Dugovich, B., Peel, M., Palmer, A., Zielke, R., Sikora, A., Beechler, B., Jolles, A., Epps, C., Dolan, B. (2017) Detection of bacterial-reactive natural IgM antibodies in desert bighorn sheep populations. PLoS ONE. 12(6):e0180415.
Dulac, C. & Axel, R. (1995) A novel family of genes encoding putative pheromone receptors in mammals. Cell. 83:195-206.
Eisthen, H. (1992) Phylogeny of the vomeronasal system and of receptor cell types in the olfactory and vomeronasal epithelia of vertebrates. Microsc Res Tech. 23(1):1-21.
Eisthen, H. (1997). Evolution of vertebrate olfactory systems. Brain Behav. Evol. 50:222–233.
Erickson, S. (2007) Sexual dimorphism and seasonal changes in the Harderian gland of the Red-sided garter snake, Thamnophis sirtalis parietalis. Bachelor’s thesis. Oregon State University Honors College. Corvallis OR.
Fan, J., Francis, F., Liu, Y., Chen, J., & Cheng, D. (2011). An overview of odorant-binding protein functions in insect peripheral olfactory reception. Genetics and Molecular Research. 10(4):3056-3069.
Figge, F., Atkinson, W. (1945) Relation of water metabolism to porphyrin incrustations in pantothenic acid-deficient rats. In: Proc. Soc. Exper. Biol. & Med.
Flo, T., Smith, K., Sato, S., Rodriguez, D., Holmes, M., Strong, R., Akira, S., Aderem, A. (2004) Lipocalin 2 mediates an innate immune response to bacterial infection by sequestrating iron. Nature. 432(7019):917-21.
Flower, D. (1996) The lipocalin protein family: structure and function. Biochem. J. 318:1-14.
Francia, S., Pifferi, S., Menini, A., Tirindelli, R. (2014) Vomeronasal Receptors and Signal Transduction in the Vomeronasal Organ of Mammals. In: Neurobiology of Chemical Communication. Boca Raton (FL): CRC Press/Taylor & Francis.
Gaskett, A. (2007) Spider sex pheromones: emission, reception, structures, and functions. Biological Reviews. 82(1):27-48.
Gillis, J., Mistry, M., Pavlidis, P. (2010) Gene function analysis in complex data sets using ermineJ. Nature Protocols. 5(6):1148-59.
Gomez-Diaz, C., Benton, R. (2013). The joy of sex pheromones. EMBO Reports. 14(10):874-883.
Gregory, P. (1974) Patterns of spring emergence of the Red-sided garter snake (Thamnophis sirtalis parietalis) in the Interlake region of Manitoba. Can J Zool. 52:1063–1069.
Gregory, P. (1977) Life-history parameters of the Red-sided garter snake (Thamnophis sirtalis parietalis) in an extreme environment, the Interlake region of Manitoba. Nat Mus Can Publ Zool. 13:1–44.
Halpern, M. (1987) The organization and function of the vomeronasal system. Annual Review of Neuroscience. 10:325-362.
Halpern, M. (1992). Nasal chemical senses in reptiles. Hormones, brain and behavior: Biology of the Reptilia. 18:423-523.
Halpern, M., Kubie, J., Silverstein, R., Muller-Schwarze, D. (1983) Snake tongue flicking behavior: clues to vomeronasal system functions. In: Chemical Signals III. Plenum, NY. 45-72.
Halpern, M., Martinez-Marcos, A. (2003) Structure and function of the vomeronasal system: an update. Progress in Neurobiology. 70:245-318.
Harder, J. (1694). Glandula nova lachrymalis una cum ductu excretorio in cervis et damis. Acta Erudit. Lips. 49–52.
Harkness, E., Ridgway, M. (1980). Chromodacryorrhea in laboratory rats (Rattus norvegicus): Etiologic considerations. Laboratory Animal Science. 30:841-4.
Hekmat-Scafe, D., Scafe, C., McKinney, A., Tanouye, M. (2002) Genome-wide analysis of the odorant-binding protein gene family in Drosophila melanogaster. Genome Res. 12(9):1357-69.
Hillenius, W., Phillips, A., Rehorek, S. (2007). “A new lachrymal gland with an excretory duct in red and fallow deer” by Johann Jacob Harder (1694): English translation and historical perspective. Annals of Anatomy. 189:423-433.
Hillenius, W., Rehorek, S. (2005). From the eye to the nose: ancient orbital to vomeronasal communication in tetrapods? In: Mason, R., LeMaster, M., Müller-Schwarze, D. (Eds.), Chemical Signals in Vertebrates (10). Kluwer Academic, New York. 228–241.
Hillenius, W., Watrobski, L., Rehorek, S. (2001) Passage of Tear Duct Fluids through the Nasal Cavity of Frogs. Journal of Herpetology. 35(4):701-704.
Hipolide, D., Tufik, S. (1995). Paradoxical sleep deprivation in female rats alters drug-induced behaviors. Physiology & Behavior. 57:1139-1143.
Hoh, J., Lin, W., Nadakavukaren, M. (1984) Sexual dimorphism in the Harderian gland proteins of the golden hamster. Comparative Biochemistry and Physiology. 77B:729-731.
Huang, G., Zhang, J., Wang, D., Mason, R., Halpern, M. (2006) Female snake sex pheromone induces membrane responses in vomeronasal sensory neurons of male snakes. Chemical Senses. 31:521–529.
Inouchi, J., Wang, D., Jiang, X. C., Kubie, J., & Halpern, M. (1993). Electrophysiological analysis of the nasal chemical senses in garter snakes. Brain, Behavior and Evolution. 41(3-5):171-182.
Isogai, Y., Si, S., Pont-Lezica, L., Tan, T., Kapoor, V., Murthy, V., Dulac, C. (2011) Molecular organization of vomeronasal chemoreception. Nature. 478:241-247.
Jacobson, L. (1813). Anatomisk Beskrivelse over et nyt Organ I Huusdyrenes Næse. Veterinær Selskapets Skrifter [in Danish] 2:209–246.
Jacobson, L. (1999). Anatomical description of a new organ in the nose of domesticated animals. English translation. Chem. Senses 23
Jin, J., Jhang, T., Lui, N., Dong, S. (2014) Different roles suggested by sex-biased expression and pheromone-binding affinity among three pheromone-binding proteins in the pink rice borer, Sesamia inferens (Lepidoptera: Noctuidae). Journal of Insect Physiology. 66:71-79.
Karlson, P., Lüscher, M. (1959) ‘Pheromones’: a new term for a class of biologically active substances. Nature. 183:55–56.
Kennedy, G. (1970) Harderoporphyrin: a new porphyrin from the Harderian glands of the rat. Comp. Biochem. Physiol. 36:21–36.
Khuhro, S., Liao, H., Zhu, G., Li, S., Ye, Z., Dong, S. (2017) Tissue distribution and functional characterization of odorant binding proteins in Chilo suppressalis (Lepidoptera: Pyralidae). J. of Asia-Pacific Entomology. 20:1104-1111.
Kitchen, S., Crowder, C., Poole, A., Weis, V., Meyer, E. (2015) De Novo Assembly and Characterization of Four Anthozoan (Phylum Cnidaria) Transcriptomes. G3: Genes, Genomes, Genetics. 5(11):2441-2452.
Kubie, J., Halpern, M. (1975) Laboratory observations of trailing behavior in garter snakes. Journal of Comparative and Physiological Psychology. 89:667-674.
Kubie, J., Vagvolgyi, A., Halpern, M. (1978) The roles of the vomeronasal and olfactory systems in the courtship behavior of male garter snakes. Journal of Comp. Physiol. Psychol. 92:627-641.
Leal, W. (2005) Pheromone reception. Top Curr Chem. 240:1–36.
LeMaster, M., Mason, R. (2002) Variation in a female sexual attractiveness pheromone controls male mate choice in garter snakes. Journal of Chemical Ecology. 28(6):1269-85.
LeMaster, M., Mason, R. (2001) Evidence for a female sex pheromone mediating male trailing behavior in the Red-sided garter snake, Thamnophis sirtalis parietalis. Chemoecology. 11:149-152.
Li, J., Bickel, P., Biggin, M. (2014) System wide analyses have underestimated protein abundances and the importance of transcription in mammals. PeerJ. 2:e270.
Lohman, B., Weber, J., & Bolnick, D. (2016). Evaluation of TagSeq, a reliable low-cost alternative for RNA seq. Molecular Ecology Resources. 16(6):1315-1321.
Marchese, S., Pes, D., Scaloni, A., Carbone, V., Pelosi, P. (1998) Lipocalins of boar salivary glands binding odours and pheromones. Eur. J. Biochem. 252:563-568.
Mason, R., Fales, H., Jones, T., Pannell, L., Chinn, J., Crews, D. (1989) Sex pheromones in snakes. Science. 245(4915):290-293.
Mason, R., Halpern, M. (2011). Chemical ecology of snakes: from pheromones to receptors. North American Society for Comparative Endocrinology Conference, 2011.
Mazzoni, E., Desplan, C., Celik, A. (2004) ‘One receptor’ rules in sensory neurons. Developmental Neuroscience. 26(5-6):388-395.
Menendez-Pelaez, A., Buzzell, G., Rodriguez, C., Reiter, R. (1991). Indole and porphyrin content of the Syrian hamster Harderian glands during the proestrous and estrous phases of the estrous cycle. Journal of Steroid Biochemistry and Molecular Biology. 38(1).
Meyer, E., Aglyamova, G., Matz, M. (2011) Profiling gene expression responses of coral larvae (Acropora millepora) to elevated temperature and settlement inducers using a novel RNA-Seq procedure. Molecular Ecology. 20:3599-3616.
Meyer, E., Aglyamova, G., Wang, S., Buchanan-Carter, J., Abrego, D., Willis, B., Matz, M. (2009) Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx. BMC Genomics. 10:219.
Miller, L., Gutzke, W. (1999). The role of the vomeronasal organ of crotalines (Reptilia: Serpentes: Viperidae) in predator detection. Animal Behaviour. 58:53-57.
Minucci, S., Baccari, G.C., Di Matteo, L. and G. Chieffi. (1989). A sexual dimorphism of the Harderian gland of the toad, Bufo viridis. Basic Appl. Histochem. 33:299-310.
Minucci, S., Chieffi Baccari, G. and L. DiMatteo. (1994). The effect of sex hormones on lipid content and mast cell number in the Harderian gland of the female toad, Bufo viridis. Cell Tiss. Res. 26:797-805.
Montgomery, R., Maslin, W. (1992) A comparison of the gland of Harder response and head associated lymphoid tissue (HALT) morphology in chicken and turkeys. Avian Diseases. 36:755-759.
Moraes, M., Pareja, M., Laumann, R., Borges, M. (2008) The chemical volatiles (semiochemicals) produced by neotropical stink bugs (Hemiptera: Pentatomidae). Neotrop Entomol. 37:489–505.
Mouse Genome Consortium (2002) Initial sequencing and comparative analysis of the mouse genome. Nature. 420:520-562.
Mueller, A., Sato, K., Glick, B. (1971) The chicken lacrimal gland, gland of Harder, caecal tonsil, and accessory spleens as sources of antibody-producing cells. Cell Immunol. 2(2):140-52.
Nakazawa, H., Ichikawa, M., Nagai, T. (2009). Seasonal increase in olfactory receptor neurons of the Japanese toad, Bufo japonicus, is paralleled by an increase in olfactory sensitivity to isoamyl acetate. Chemical Senses. 34(8):667-678.
O'Donnell, R., Shine, R., Mason, R. (2004) Seasonal anorexia in the male Red-sided Garter snake, Thamnophis sirtalis parietalis. Behavioral Ecology and Sociobiology. 56:413-41.
Pan, P., Waheed, A., Sly, W., Parkkila, S. (2010). Carbonic anhydrase in the mouse Harderian gland. Journal of Molecular Histology. 41(6):411-7.
Papes, F., Logan, D., Stowers, L. (2010). The vomeronasal organ mediates interspecies defensive behaviors through detection of protein pheromone homologs. Cell. 141(4):692-703.
Parker, M., Mason, R. (2012) How to make a sexy snake: estrogen activation of female sex pheromone in male Red-sided garter snakes. J Exp Biol. 215(5):723-730.
Parnell, P., Crossland, J., Beattie, R. (2005) Frequent Harderian Gland Adenocarcinomas in Inbred White-Footed Mice (Peromyscus leucopus). Comparative Medicine. 55(4):382-386.
Pavlidis, P., Lewis, D., Noble, W. (2002) Exploring gene expression data with class scores. Pacific Symposium on Biocomputing. 474–485.
Payne, A. (1994) The Harderian Gland: A tercentennial review. Journal of Anatomy. 185(1):1–49.
Pelosi, P., Baldaccini, N., Pisanelli, A. (1982) Identification of a specific olfactory receptor for 2-isobutyl-3-methoxypyrazine. Biochem. J. 201:245-248.
Pelosi, P., Maida, R. (1990) Odorant-binding proteins in vertebrates and insects. Chemical Senses. 15(2):205-215.
Pelosi, P., Maida, R. (1995) Odorant-binding proteins in insects. Comp. Biochem. Physiol. 111B(3):503-514.
Placyk, J., Jr., Graves, B. (2002). Prey Detection by Vomeronasal Chemoreception in a Plethodontid Salamander. Journal of Chemical Ecology. 28:1017-36.
Rehorek, S. (1997). Squamate Harderian gland: An overview. The Anatomical Record. 248(3):301-6.
Rehorek, S. (1998) The embryology of the anterior orbital glands of some squamate reptiles. Acta Soc. Zool. Bohem. 62:155–165.
Rehorek, S., Firth, B., Hutchinson, M. (1997). Morphology of the Harderian gland of some Australian geckos. J. Morphol. 231:253–259.
Rehorek, S., Firth, B., Hutchinson, M. (2000a) The structure of the nasal chemosensory system in squamate reptiles. Journal of Biosciences. 25(2):181-190.
Rehorek, S., Halpern, M., Firth, B., Hutchinson, M. (2011). The Harderian gland of two species of snakes: Pseudonaja textilis (Elapidae) and Thamnophis sirtalis (Colubridae). Canadian Journal of Zoology. 81:357-363.
Rehorek, S., Hillenius, W., Kennaugh, J., Chapman, N. (2005a) The gland and the sac – the preorbital apparatus of muntjacs. In: Mason, R., LeMaster, M., Müller-Schwarze, D. (Eds.), Chemical Signals in Vertebrates 10. Kluwer Academic, New York. 152–158.
Rehorek, S., Hillenius, W., Quan, W., Halpern, M. (2000b) Passage of Harderian gland secretions to the vomeronasal organ in Thamnophis sirtalis (Serpentes: Colubridae). Canadian Journal of Zoology. 78(7):1284-1288.
Rehorek, S., Legenzoff, E., Carmody, K., Smith, T., Sedlmayr, J. (2005b) Alligator tears: a reevaluation of the lacrimal apparatus of the crocodilians. J Morphol. 266(3):298-308.
Robertson, D., Hurst, J., Hubbard, S., Gaskell, S., Beynon, R. (1998) Ligands of urinary lipocalins from the mouse: uptake of environmentally derived chemicals. Journal of Chemical Ecology. 24(7):1127-1140.
Rodriguez, C., Mayo, J., Sainz, R., Antolin, I., Herrera, F., Martin, V., Reiter, R. (2003) Regulation of antioxidant enzymes: a significant role for melatonin. Journal of Pineal Research. 36(1):1-9.
Rodriguez, I., Feinstein, P., Mombaerts, P. (1999) Variable patterns of axonal projections of sensory neurons in the mouse vomeronasal system. Cell. 97(2):199-208.
Sakai, T. (1981) The Mammalian Harderian Gland: Morphology, Biochemistry, Function and Phylogeny. Arch. Hist. Jap. 44(4):299-333.
Sashima, M., Hatakey, S., Satoh, M., Suzuki, A. (1989) Harderianization is another sexual dimorphism of rat exorbital lacrimal gland. Acta Anatomica. 135:303-306.
Scaloni, A., Paolini, S., Brandazza, A. (2001) Purification, cloning and characterisation of odorant- and pheromone-binding proteins from pig nasal epithelium. Cell. Mol. Life Sci. 58(5-6):823-834.
Schmidt, A., Wake, M. (1990) Olfactory and vomeronasal systems of caecilians (Amphibia: Gymnophiona). J. Morphol. 205:255-268.
Schwenk, K. (1994). Why Snakes Have Forked Tongues. Science. 263(5153):1573-1577.
Serino, I., D'Istria, M., Monteleone, P. (1993) A comparative study of melatonin production in the retina, pineal gland and Harderian gland of Bufo viridis and Rana esculenta. Comparative Biochemistry and Physiology Part C: Pharmacology, Toxicology and Endocrinology. 106(1):189-193.
Serino, I., Izzo, G., Ferrara, D., d'Istria, M., Minucci, S. (2007). A new sex dimorphism in the Harderian gland of the frog Rana esculenta. Canadian Journal of Zoology. 85:909-915.
Seyama, Y., Kasama, T., Yasugi, E., Park, S., Kano, K. (1992) Lipids in Harderian glands and their significance. In: Webb, S., Hoffman, R., Puig-Domingo, M., Reiter, R. (eds). Harderian Glands. Springer, Berlin, Heidelberg.
Seyama, Y., Uchijima, Y. (2007). Novel function of lipids as a pheromone from the Harderian gland of golden hamster. Proceedings of the Japan Academy, Series B. 83(3):77-96.
Shankaran, V., Ikeda, H., Bruce, A., White, J., Swanson, P., Old, L., Schreiber, R. (2001) IFNgamma and lymphocytes prevent primary tumour development and shape tumour immunogenicity. Nature. 410(6832):1107-11.
Shi, P., Zhang, J. (2007) Comparative genomic analysis identifies an evolutionary shift of vomeronasal receptor gene repertoires in the vertebrate transition from water to land. Genome Res. 17:166-174.
Shiao, M., Chang, A., Liao, B., Ching, Y., Jade Lu, M., Chen, S., Li, W. (2012) Transcriptomes of mouse olfactory epithelium reveal sexual differences in odorant detection. Genome Biology and Evolution. 4(5):703–712.
Shine, R. (2003) Reproductive strategies in snakes. Proceedings of the Royal Society of London B. 270:995-1004.
Singer, A., Macrides, F. (1993) Composition of an aphrodisiac pheromone. Chem. Senses. 18:630.
Smith, M., Bellairs, A. (1947) The head glands of snakes, with remarks on the evolution of the parotid gland and teeth of the Opisthoglypha. Journal of the Linnean Society of London (Zoology). 41:351-368.
Soldi, R., Rodrigues, M., Aldrich, J., Zarbin, P. (2012) The male-produced sex pheromone of the true bug, Phthia picta, is an unusual hydrocarbon. J Chem Ecol. 38:814–824.
Sorensen, P., Stacey, N. (2004) Brief review of fish pheromones and discussion of their possible uses in the control of non-indigenous teleost fishes. New Zealand Journal of Marine and Freshwater Research. 38(3):399-417.
Spike, R., Payne, A., Moore, M. (1988) The effects of age on the structure and porphyrin synthesis of the Harderian gland of the female golden hamster. Journal of Anatomy. 160:157–166.
Stopková, R., Dudková, B., Hájková, P., Stopka, P. (2014) Complementary roles of mouse lipocalins in chemical communication and immunity. Biochemical Society Transactions. 42(4):893-898.
Stopkova, R., Klempt, P., Kuntova, B., Stopka, P. (2017). On the tear proteome of the house mouse (Mus musculus musculus) in relation to chemical signalling. PeerJ. 5:e3541.
Taniguchi, Y., Choi, P., Li, G., Chen, H., Babu, M., Hearn, J., Emili, A., Xie, X. (2010) Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 329(5991):533-538.
Tegoni, M., Pelosi, P., Vincent, F., Spinelli, S., Campanacci, V., Grolli, S., Ramoni, R., Cambillau, C. (2003) Mammalian odorant binding proteins. Biochimica et Biophysica Acta (BBA)-Protein Structure and Molecular Enzymology. 1482(1-2):229-240.
Thiessen, D. (1992) The function of the Harderian gland in the Mongolian gerbil, Meriones unguiculatus. In: Harderian glands: porphyrin metabolism, behavioral and endocrine effects. Springer. 127-140.
Varriale, B., Chieffi-Baccari, G., d’Istria, M., Di Matteo, L., Minucci, S., Serino, I. and G. Chieffi. (1992). Testosterone induction of poly(A)+ RNA synthesis and 35S methionine incorporation into proteins of Rana esculenta Harderian gland. Mol. Cell. Endocrinol. 84:R51-56.
Vogt, R., Riddiford, L. (1981). Pheromone-binding and inactivation by moth antennae. Nature. 293:161-163.
Wang, D., Jiang, X., Chen, P., Inouchi, J., Halpern, M. (1993) Chemical and immunological analysis of prey-derived vomeronasal stimulants. Brain Behav Evol. 41:246-254.
Watanabe, M. (1980) An autoradiographic, biochemical, and morphological study of the Harderian gland of the mouse. Journal of Morphology. 163(3):349-365.
Webb, S., Hoffman, R., Puig-Domingo, M., Reiter, R. (Eds.) (1992). Harderian Glands: Porphyrin Metabolism, Behavioral and Endocrine Effects. Springer, Berlin.
Zhulidov, P., Bogdanova, E., Shcheglov, A., Vagner, L., Khaspekov, G., Kozhemyako, V., Matz, M., Meleshkevitch, E., Moroz, L., Lukyanov, S., Shagin, D. (2004) Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucleic Acids Res. 32(3):e37.
Zuri, I., Halpern, M. (2003). Differential effects of lesions of the vomeronasal and olfactory nerves on garter snake (Thamnophis sirtalis) responses to airborne chemical stimuli. Behavioral Neuroscience. 117:169-183.

Chapter 2:

Characterization of the Harderian gland transcriptome in Thamnophis sirtalis parietalis

Introduction:

Although the Harderian gland is present in most lineages of terrestrial vertebrates, relatively little is known about its function (Payne, 1994; Deist & Lamont, 2018). Most studies that have sought to determine its function relied on histological techniques or physiological experiments. Relatively few have applied modern molecular techniques.

Deist & Lamont (2018) published the first targeted transcriptome of the Harderian gland, from the domestic chicken, Gallus gallus. Their analyses built on previous findings that the avian Harderian gland is often a functional component of the immune system. The authors compared gene expression in the Harderian gland with that of other immune tissues in the chicken. They found that the gland is an important component of the immune system in Gallus gallus and performs roles not performed by other tissues. Deist & Lamont also found that the Harderian gland transcriptome is enriched for genes in the G-protein coupled receptor signaling pathway. They took this as an indication that the gland may be involved in detecting and responding to environmental pathogens.

The transcriptome of a squamate Harderian gland was recently described from three species of colubrid snakes: Caraiba andreae, Cubophis cantherigerus, and Tretanorhinus variabilis (Domínguez-Pérez et al. 2018). That analysis is limited to a description of the 25 most abundant transcripts. Among these, the authors report proteins annotated as putative toxins, lipid-binding proteins, and proteins likely to have antimicrobial properties.

The research presented here is the first molecular study of the role of the Harderian gland in the vomeronasal chemosensory system. Histological studies of this tissue in T. s. parietalis provide compelling evidence that the gland is an integral component of the vomeronasal chemosensory system in this group (Rehorek et al. 2011). The main goals of this chapter are to characterize the Harderian gland transcriptome and to describe the functional significance of the genes it expresses. Additionally, this chapter seeks evidence for roles of the Harderian gland in the detection of sex pheromones and in antimicrobial defense.

Aim 1) Identify the transcriptome of the Harderian gland of T. s. parietalis and describe its functional significance:

To identify the transcriptome (a library of all expressed genes) of the Harderian gland of T. s. parietalis, I used high-throughput RNA sequencing and in silico analyses. Together these produced a database containing sequences for all mRNA transcripts expressed in this tissue.

The genome of T. sirtalis was sequenced in 2015 (Thamnophis sirtalis annotation release 100, NCBI), but it has proven of limited use for T. s. parietalis. Primer sets designed from its gene sequences have failed, and it is not useful as a reference for genome-guided transcriptome assembly. It also yields low mapping quality in RNA-seq experiments.

The T. s. concinnus genome was produced from snakes captured at E.E. Wilson Wildlife Refuge in Benton County, OR, a population of the subspecies Thamnophis sirtalis concinnus. The population near Inwood, MB (the focal population for all research presented here) belongs to the subspecies Thamnophis sirtalis parietalis. These two populations belong to different subspecies and occupy different geographic regions, so genetic differences likely contribute to the poor performance of this genome database. In addition, multiple isoforms, alternative splicing and other processes can produce different mature mRNAs from similar or identical genes. This makes accurate gene annotation by prediction algorithms challenging (Yandell & Ence, 2012). The gene sequences in the T. s. concinnus draft genome were predicted in silico from genomic DNA. They therefore give no information on tissue specificity and no evidence that a gene is actually transcribed in vivo.

Transcript sequences from the Inwood, MB population of T. s. parietalis are more representative of the focal population. They also consist of mature gene sequences that are transcribed in vivo. In downstream sequencing projects, this transcriptome has proven more useful for producing high-quality, meaningful data. To date, the transcriptome reference database produced during this project has outperformed the T. s. concinnus genome in every application involving T. s. parietalis.

The transcriptome produced here was annotated using protein names from UniProt as well as functional annotation terms from the Gene Ontology (GO) consortium. To predict the functional significance of genes expressed in the Harderian gland, I performed enrichment analysis to identify gene sets displaying significant enrichment in the Harderian gland compared to heart, liver, testis and brain tissue from the same population of T. s. parietalis.
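An enrichment test asks whether a gene set contains more genes of interest than chance would predict. The sketch below implements one common form, over-representation analysis, in Python. It tests whether a gene set (for example, all transcripts annotated with one GO term) contains more genes flagged as Harderian-enriched than expected by chance. The test statistic is the hypergeometric upper-tail probability. The function name and the example numbers are hypothetical. This illustrates the general technique and is not necessarily the exact enrichment procedure used in this chapter.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_universe: int, n_set: int, n_selected: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(n_universe, n_set, n_selected).

    n_universe: all genes tested (e.g. every annotated transcript)
    n_set:      genes annotated with one GO term (the gene set)
    n_selected: genes flagged as enriched in the Harderian gland
    k:          observed overlap between the gene set and the flagged genes
    """
    max_overlap = min(n_set, n_selected)
    # Number of ways to draw n_selected genes from the universe.
    total = comb(n_universe, n_selected)
    # Count the draws whose overlap with the gene set is at least k.
    tail = sum(
        comb(n_set, i) * comb(n_universe - n_set, n_selected - i)
        for i in range(k, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, a 50-gene GO set, 500 flagged genes, overlap of 10.
# The overlap expected by chance is 500 * 50 / 20,000 = 1.25 genes.
p = hypergeom_upper_tail(10, 20_000, 50, 500)
print(f"P(overlap >= 10) = {p:.3g}")
```

Many GO terms are tested at once. A p-value from a test like this would therefore normally be adjusted for multiple testing (for example, with the Benjamini-Hochberg procedure) before a term is called enriched.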

This study focuses on the role of the Harderian gland within the vomeronasal chemosensory system. Particular emphasis was therefore placed on proteins secreted from the gland that may interact with the vomeronasal organ. The secretome is the subset of the transcriptome encoding proteins secreted into the extracellular environment. To predict it, the transcriptome was analyzed in silico to identify proteins likely to enter the secretory pathway and be secreted from the cell.
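The first step in predicting a secretome is signal-peptide prediction. This is normally done with dedicated trained predictors (SignalP is a widely used example), and this section does not name the tool used here. The toy screen below only illustrates the underlying idea: secretory signal peptides carry a short hydrophobic core near the N-terminus. It flags proteins with a hydrophobic stretch in their first 30 residues, scored on the Kyte-Doolittle hydropathy scale. The function name, window size and threshold are hypothetical, uncalibrated choices.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def has_candidate_signal_peptide(
    protein: str, n_term: int = 30, window: int = 8, min_mean: float = 1.6
) -> bool:
    """Toy screen (hypothetical thresholds): True if any `window`-residue stretch
    within the first `n_term` residues has mean hydropathy >= `min_mean`."""
    region = protein.upper()[:n_term]
    # Unknown residues (e.g. 'X') get a neutral score of 0.0.
    scores = [KD.get(aa, 0.0) for aa in region]
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window >= min_mean:
            return True
    return False


print(has_candidate_signal_peptide("MKLLLLLLLLAAVSAQDEKRDEKR"))  # hydrophobic core present
```

A screen this crude also flags N-terminal transmembrane anchors, which look similarly hydrophobic. That is one reason secretome pipelines typically pair a trained signal-peptide predictor with a step that removes proteins carrying transmembrane domains.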

Aim 2) Research question: Does the Harderian gland produce lipid-binding proteins capable of binding and solubilizing female sexual attractiveness pheromone?

The T. s. parietalis sexual attractiveness pheromone, first described by Mason et al. (1989), consists of a series of non-polar, water-insoluble lipids (long-chain methyl ketones). The vomeronasal organ in snakes relies on an aqueous fluid to transport chemical cues from the environment to the sensory epithelium. Insoluble molecules such as lipids should therefore be difficult to transport and detect. Male garter snakes, however, respond immediately to extremely small amounts of female pheromone (LeMaster & Mason, 2001). The proposed explanation is that lipid-binding proteins bind and solubilize the lipid components of female pheromone. This would allow them to be carried in the aqueous fluid to the sensory epithelium of the vomeronasal organ, where they can be detected.

Secretions from the Harderian gland travel primarily to the lumen of the vomeronasal organ (Rehorek et al. 2000b), and the vomeronasal organ has no intrinsic specialized secretory structure. The Harderian gland is therefore hypothesized to produce many of the proteins within the vomeronasal lumen. These would include pheromone-binding proteins that solubilize lipid pheromones in this group. Based on this hypothesized role, I predicted that the Harderian gland would show high expression of genes encoding lipid-binding proteins. I tested this prediction with enrichment analysis. The analysis identified gene sets associated with Gene Ontology terms that were enriched in Harderian gland tissue relative to a combined set of other T. s. parietalis tissues.

Aim 3) Research question: Does the Harderian gland play a role in the immune system of the vomeronasal organ?

The Harderian gland has often been described as an immune tissue in birds (Albini et al. 1974). Squamates and birds are both sauropsids sharing a relatively recent common ancestor. It is therefore reasonable to hypothesize that the squamate Harderian gland also has some immune function. Harderian gland secretions travel primarily to the lumen of the vomeronasal organ. This tissue was therefore hypothesized to perform immune functions as part of the vomeronasal chemosensory system in T. s. parietalis. Accordingly, I predicted that the Harderian gland transcriptome would be enriched for extracellular antimicrobial proteins. I also predicted that many of those proteins would be present in the vomeronasal lumen. I defined a gene set containing all genes annotated as involved in antimicrobial defense. I then used enrichment analysis to test whether expression of this gene set was enriched in Harderian gland tissue compared to other tissues of T. s. parietalis.
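A custom gene set such as the antimicrobial-defense set can be defined from annotation text. It can then be tested with a gene-score resampling approach, broadly similar in spirit to the class-score methods of Pavlidis et al. (2002). The test compares the mean score of the set's genes with the mean scores of many randomly drawn gene sets of the same size. The sketch below assumes each gene has a numeric score, for instance a log fold change of Harderian gland expression relative to other tissues. It also assumes set membership is decided by keyword matching. Both the scoring and the keyword rule are hypothetical simplifications, not the annotation criteria used in this study.

```python
import random
from statistics import mean


def define_gene_set(annotations: dict[str, str], keywords: tuple[str, ...]) -> set[str]:
    """Hypothetical rule: a gene joins the set if its annotation mentions any keyword."""
    lowered = [kw.lower() for kw in keywords]
    return {
        gene
        for gene, text in annotations.items()
        if any(kw in text.lower() for kw in lowered)
    }


def resampling_p_value(
    scores: dict[str, float],
    gene_set: set[str],
    n_perm: int = 10_000,
    seed: int = 1,
) -> float:
    """One-sided empirical p-value: how often does a random gene set of the same
    size have a mean score at least as high as the real set?"""
    members = [g for g in gene_set if g in scores]
    if not members:
        raise ValueError("no gene-set members have scores")
    observed = mean(scores[g] for g in members)
    all_scores = list(scores.values())
    rng = random.Random(seed)  # fixed seed so results are reproducible
    hits = sum(
        mean(rng.sample(all_scores, len(members))) >= observed
        for _ in range(n_perm)
    )
    # The +1 terms count the observed set itself, so p is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Drawing random sets of the same size controls for the fact that smaller sets have noisier mean scores. A fixed seed makes a given run reproducible, and the smallest attainable p-value is 1/(n_perm + 1).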

Methods:

Animal collection and tissue collection:

Four adult Red-sided garter snakes (T. s. parietalis; n=2 male, n=2 female) were collected from the wild near the town of Inwood in the Interlake region of Manitoba, Canada (50°31'28"N, 97°30'00"W) in May 2015. All snakes were housed in outdoor nylon cloth arenas (1 × 1 × 1 m) for 24 hours between capture and tissue collection. Immediately prior to tissue collection, snakes were individually euthanized with an overdose of methohexital sodium (Brevital™; 0.005 mL/g of body mass, 1% solution). Euthanized snakes were placed on a workspace under a Jena dissecting scope, and tissues were removed with forceps and corneoscleral scissors. A total of 11 tissues were collected from the 4 snakes:
- Harderian gland pairs: n=2 male, n=2 female
- vomeronasal organs: n=2 male, n=2 female
- brains: n=1 male, n=1 female
- testis: n=1 male

Before and between dissections, both the workspace and the surgical instruments were cleaned to prevent cross-contamination of samples. The workspace was cleaned with sterile saline, followed by a 10% bleach solution, and then treated with RNase Away (Ambion). Surgical instruments were cleaned with sterile saline, aseptically treated with a 10% bleach solution and 100% ethanol, then placed in a Germinator 500™ dry sterilizer for >15 seconds. Tissues were dissected and rinsed briefly with nuclease-free water to remove blood and surface contaminants. They were then immediately placed in RNAlater™ RNA stabilization reagent (Invitrogen). Tissues were stored at 4°C for 24 hours to allow the RNAlater™ to fully permeate the tissue, then moved to -20°C for long-term storage until transport. Tissues were transported to Oregon State University on ice (~0°C) and stored at -20°C until RNA extraction.
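The euthanasia dose can be restated per unit body mass. The text does not say whether the 1% methohexital solution is weight/volume. Assuming it is 1% w/v (10 mg/mL), and using a hypothetical 100 g snake for illustration:

```latex
\begin{aligned}
V &= 0.005\ \mathrm{mL\,g^{-1}} \times m \\
D &= V \times 10\ \mathrm{mg\,mL^{-1}} = 0.05\ \mathrm{mg\,g^{-1}} \times m \quad (= 50\ \mathrm{mg\,kg^{-1}}) \\
m = 100\ \mathrm{g} &\;\Rightarrow\; V = 0.5\ \mathrm{mL},\quad D = 5\ \mathrm{mg}
\end{aligned}
```

Here V is the injected volume, m the body mass and D the delivered mass of methohexital sodium.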

RNA Extraction and Sequencing:

Tissues were removed from the RNA preservation reagent and immediately homogenized mechanically. Total RNA was then extracted with E.Z.N.A.® HP Total RNA Kits (Omega Bio-Tek). An aliquot of each RNA extract was run on a polyacrylamide gel and visualized to confirm that high-quality, intact RNA was present. The RNA concentration of each sample was estimated from absorbance at 260 nm. Measurements used a Spectramax™ M3 with a SpectraDrop™ microvolume microplate (Molecular Devices, LLC, San Jose, CA). An equal quantity (by nucleotide mass; 500 ng each) of each of the 11 RNA samples was added to a sample pool. The pooled RNA was concentrated by lithium chloride precipitation, resuspended in nuclease-free water, and diluted to 111 ng/μL. Full-length complementary DNA (cDNA) was reverse transcribed from the pooled RNA with Tetro™ Reverse Transcriptase (Bioline). The primer oligonucleotides target the poly-A tails of mature messenger RNAs, so the cDNA is enriched for mRNA. This cDNA was used to prepare a normalized sequencing library as described in Kitchen et al. (2015), with one change: cDNA normalization followed the procedure described in Meyer et al. (2009). Normalization evens out transcript abundances, which avoids over-sequencing abundant transcripts and improves representation of rare ones. Primer sequences used during library preparation are reported in Appendix A.2. The completed cDNA library was submitted to the Genomics & Cell Characterization Core Facility (University of Oregon, Eugene). There it was sequenced on the Illumina MiSeq® platform (2 lanes of 150 bp paired-end reads).
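The equal-mass pooling step reduces to a short calculation. The sketch below assumes the conventional factor of about 40 ng/μL of RNA per A260 unit at a 1 cm pathlength. It also assumes the plate reader reports pathlength-corrected absorbance. The function names and example readings are hypothetical. The functions return the volume of each extract that contains 500 ng, and the volume needed to bring a known RNA mass to 111 ng/μL.

```python
# Conventional conversion for RNA: A260 of 1.0 (1 cm path) ~ 40 ng/uL.
RNA_NG_PER_UL_PER_A260 = 40.0


def rna_concentration(a260: float, dilution: float = 1.0) -> float:
    """Estimate RNA concentration (ng/uL) from pathlength-corrected A260."""
    return a260 * RNA_NG_PER_UL_PER_A260 * dilution


def pooling_volumes(a260_by_sample: dict[str, float], target_ng: float = 500.0) -> dict[str, float]:
    """Volume (uL) of each extract that contains `target_ng` of RNA."""
    volumes = {}
    for sample, a260 in a260_by_sample.items():
        conc = rna_concentration(a260)
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample}")
        volumes[sample] = target_ng / conc
    return volumes


def resuspension_volume(total_ng: float, final_ng_per_ul: float) -> float:
    """Volume (uL) that brings `total_ng` to `final_ng_per_ul`, assuming no loss."""
    return total_ng / final_ng_per_ul


# Hypothetical readings for two extracts.
print(pooling_volumes({"HG_male_1": 2.5, "VNO_female_1": 1.25}))
```

For 11 samples at 500 ng each, the pool contains 5.5 μg of RNA. If none were lost during lithium chloride precipitation, resuspension_volume(11 * 500, 111) gives about 49.5 μL. Actual recovery after precipitation is typically lower.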

Quality Control and Assembly:

The two lanes of Illumina MiSeq® sequencing yielded approximately 82M pairs of 150 bp paired-end reads (~82M forward and ~82M reverse reads). Quality filtering was performed with the custom script “QualFilterFastq.pl” (Meyer, Github; Appendix U.1). Reads were removed from the dataset if they failed any of three filters. A read was removed if it contained: 1) 20 or more base pairs with a Quality Value (QV) score of 20 or less, 2) 50 or more Homologous Repeats (HR), or 3) 15 or more base pairs aligning to adapter sequences from library preparation. Reads passing the quality filters were examined using FastQC (version 0.11.3) (Andrews, 2010) to ensure that a high-quality dataset was used in subsequent steps. The remaining reads were processed with the custom script “fix_PE_fastq.pl” (Meyer, Github; Appendix U.2). This step matched read pairs and placed orphans (reads whose mate was not present in the dataset) into a separate file.
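The three read filters and the pair-matching step can be sketched in Python. This is a minimal illustration under stated assumptions, not the logic of QualFilterFastq.pl or fix_PE_fastq.pl:
- Quality strings use Phred+33 encoding.
- The HR criterion is interpreted as a run of identical bases (a homopolymer), which is an assumption about the script's definition.
- "Aligning to adapter" is simplified to sharing an exact 15-mer with an adapter.
- Read IDs are assumed to be already normalized, so that mates share the same key.
- All function names are hypothetical.

```python
import re


def count_low_quality(qual, max_qv=20, offset=33):
    """Number of bases with Phred quality <= max_qv (Phred+33 encoding assumed)."""
    return sum(1 for ch in qual if ord(ch) - offset <= max_qv)


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def shares_kmer(seq, adapter, k=15):
    """True if seq contains any exact k-mer of the adapter (stand-in for alignment)."""
    kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    return any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1))


def passes_filters(seq, qual, adapters, max_low=20, max_hr=50, adapter_k=15):
    """Apply the three filters; a read failing any one of them is removed."""
    if count_low_quality(qual) >= max_low:
        return False
    if longest_homopolymer(seq) >= max_hr:
        return False
    if any(shares_kmer(seq, a, adapter_k) for a in adapters):
        return False
    return True


def pair_reads(left, right):
    """Split reads (dicts of read_id -> record) into matched pairs and orphans."""
    shared = left.keys() & right.keys()
    pairs = [(left[i], right[i]) for i in sorted(shared)]
    left_orphans = [left[i] for i in sorted(left.keys() - shared)]
    right_orphans = [right[i] for i in sorted(right.keys() - shared)]
    return pairs, left_orphans, right_orphans
```

The orphan lists correspond to the separate left and right orphan files that were later supplied to Trinity.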

Reads were assembled with the de novo assembler Trinity v2.1.1 (Grabherr et al. 2011; Haas et al. 2013). Multiple Trinity assemblies were generated, and each was assessed with several quality metrics. Standard metrics (number of assembled contigs, mean and median contig length, N75, N50, N25) were used to compare the relative sizes of contigs in each assembly. BLASTx was used to determine how many predicted Thamnophis sirtalis genes (searched against Thamnophis sirtalis annotation release 100, NCBI) were captured by each assembly. The assembly method judged most useful across these metrics was selected. This choice was re-examined after subsequent RNA-seq data were generated. At that point, all assemblies were compared again by the percentage of reads mapping to the assembled contigs. This metric is the most useful for judging whether a transcriptome is suitable as a reference for read-mapping-based downstream applications, and it confirmed the selected assembly as the most useful for this purpose. The final assembly was generated with the following parameters. Reads with valid mates were assembled as pairs, and orphan reads were incorporated while maintaining their orientation (forward or reverse), using the flags “--left left.fastq,left_orphans.fastq” and “--right right.fastq,right_orphans.fastq”. The minimum contig length included in the assembly was 200 base pairs.
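The Nx statistics used to compare assemblies follow one definition. Sort the transcripts from longest to shortest and accumulate their lengths. Nx is the length of the transcript at which the running total first reaches x% of all assembled bases. The number of transcripts needed to get there is the "contained in N sequences" figure reported in the Results. A minimal sketch with a hypothetical function name:

```python
def nx_stats(lengths, fractions=(0.25, 0.50, 0.75)):
    """Return {fraction: (Nx length, number of transcripts)} for a list of lengths.

    Nx is the length L such that transcripts of length >= L together contain at
    least the given fraction of all assembled bases; the count is how many of
    the longest transcripts are needed to reach that fraction.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    results = {}
    for frac in fractions:
        running = 0
        for count, length in enumerate(ordered, start=1):
            running += length
            if running >= frac * total:
                results[frac] = (length, count)
                break
    return results
```

For example, contig lengths of 100, 200, 300 and 400 bp give an N25 of 400 bp (1 contig), an N50 of 300 bp (2 contigs) and an N75 of 200 bp (3 contigs).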

Identification of Genes and Annotation:

The final assembly was filtered to remove redundant transcripts using the custom script “trinity_reps.pl” (Meyer, Github; Appendix U.3). Highly similar transcripts indicating assembly of multiple isoforms per gene were collapsed into the longest Trinity sub-component present for each gene.
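Collapsing isoforms to one representative per gene can be illustrated in a few lines. The sketch makes these assumptions:
- It assumes Trinity v2-style identifiers such as "TRINITY_DN10_c0_g1_i2", where the trailing "_iN" suffix marks isoforms of one gene.
- It keeps the longest sequence per gene.
- It mirrors the outcome described above, not the internals of trinity_reps.pl.
- The function name is hypothetical.

```python
import re

# Trailing isoform suffix in Trinity v2-style IDs, e.g. "_i2" in TRINITY_DN10_c0_g1_i2.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def longest_isoform_per_gene(seqs):
    """Keep one representative (the longest sequence) per Trinity gene.

    seqs maps Trinity transcript IDs to nucleotide sequences.
    """
    best = {}  # gene ID -> transcript ID of the longest isoform seen so far
    for tid, seq in seqs.items():
        gene = ISOFORM_SUFFIX.sub("", tid)
        if gene not in best or len(seq) > len(seqs[best[gene]]):
            best[gene] = tid
    return {tid: seqs[tid] for tid in best.values()}
```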

The resulting transcriptome was annotated with BLASTx (version 2.7.1) via the custom script “GenesFromLocalDB_cgrb.pl” (Meyer, Github; Appendix U.4). BLAST searches were performed against all vertebrate proteins in Swissprot (manually curated, high-confidence annotations) and TrEMBL (a larger number of annotated proteins, but with lower confidence in the annotations) (The UniProt Consortium). To ensure that the most useful annotations were associated with transcripts, BLAST hits were filtered to exclude annotations whose gene names contained any of the following: “uncharacterized”, “unknown”, “RIKEN” (unnamed mouse genes), “hypothetical protein”, “whole genome shotgun”, or “predicted protein”. When a transcript matched one or more proteins in the database with high confidence (e-value
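The annotation filter can be sketched as follows: discard hits whose descriptions contain an uninformative term, then keep the lowest-e-value remaining hit per transcript. Several parts of this sketch are assumptions:
- Hits are assumed to arrive as (query, accession, description, e-value) tuples, for example parsed from BLAST tabular output that includes a subject-title column.
- That input shape and the function names are hypothetical.
- The default max_evalue of 1e-4 comes from the Table 2.2 caption. The exact cutoff applied at this step is not stated in the paragraph above.

```python
# Terms that mark an uninformative annotation (matched case-insensitively).
EXCLUDED_TERMS = ("uncharacterized", "unknown", "riken", "hypothetical protein",
                  "whole genome shotgun", "predicted protein")


def informative(description):
    """False if the hit description contains any excluded term."""
    text = description.lower()
    return not any(term in text for term in EXCLUDED_TERMS)


def best_annotations(hits, max_evalue=1e-4):
    """Pick the lowest-e-value informative hit per query transcript.

    hits: iterable of (query_id, subject_accession, description, evalue) tuples.
    Returns {query_id: (subject_accession, description, evalue)}.
    """
    best = {}
    for query, subject, desc, evalue in hits:
        if evalue > max_evalue or not informative(desc):
            continue
        if query not in best or evalue < best[query][2]:
            best[query] = (subject, desc, evalue)
    return best
```

Note that an uninformative hit with a better e-value is skipped in favor of the next informative hit, which is the point of the filter.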

Functional annotations (Gene Ontology consortium) were added to transcripts based on accession numbers of the most significant BLAST hit from either the Swissprot or TrEMBL (UniProt) databases. The continuously updated Gene Ontology associations file “goa_UniProt_all.gaf” (Gene Ontology.org) was downloaded and used to create a functional annotation table using the custom script “GOAnnotTable.pl” (Meyer, Github; Appendix U.5). Gene Ontology term annotations were then added to transcripts using the custom script “GO_by_gene.pl” (Meyer, Github; Appendix U.6).
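The GO annotation table amounts to a lookup from UniProt accession to GO IDs, applied through each transcript's best BLAST hit. The sketch makes these assumptions:
- It reads the standard tab-separated GAF 2.x layout: column 2 holds the accession, column 4 the qualifier, column 5 the GO ID, and header lines start with "!".
- It skips "NOT" annotations.
- It is not the logic of GOAnnotTable.pl or GO_by_gene.pl, and the function names are hypothetical.
- Because the full GAF file is very large, the parser is written to stream lines rather than load the file at once.

```python
from collections import defaultdict


def parse_gaf(lines):
    """Map UniProt accessions to sets of GO IDs from GAF 2.x lines."""
    go_by_acc = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if "NOT" in cols[3].split("|"):
            continue  # negated annotation: the gene product is NOT associated
        go_by_acc[cols[1]].add(cols[4])
    return go_by_acc


def go_by_transcript(best_hit_acc, go_by_acc):
    """Attach GO IDs to transcripts via the accession of their best BLAST hit."""
    return {tid: sorted(go_by_acc.get(acc, ())) for tid, acc in best_hit_acc.items()}
```

In use, one would pass an open file handle, for example `parse_gaf(open(path))`, so that lines are read one at a time.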

All possible open reading frames of 20 amino acids or greater were identified from all transcripts using the TransDecoder software package (Haas et al. 2013; https://transdecoder.github.io/). The output containing all possible open reading frames was then used as a reference to identify proteins containing known protein family domains from the Pfam-A database (Finn et al. 2014). The software package HMMER (Eddy, 2011) was used to search the Pfam-A database and identify protein domains via hidden Markov models. Open reading frames with matches to known domains (e-value ≤ 0.0001) were annotated with the name of the protein family identified.
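The idea of enumerating open reading frames on both strands can be shown with a short scan. The sketch makes these assumptions:
- It counts only complete ORFs, running from an ATG start codon to a stop codon, in the three frames of each strand.
- It requires at least 20 codons before the stop, matching the 20 amino acid minimum above.
- TransDecoder also reports 5'- and 3'-partial ORFs, so this is a simplified stand-in rather than its algorithm.
- Coordinates refer to the scanned strand, and the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=20):
    """Return (strand, frame, start, end) for ATG..stop ORFs with >= min_aa codons.

    start/end are 0-based, end-exclusive positions on the scanned strand
    (the reverse complement for '-'); the stop codon is included in end.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq.upper()))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_aa:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs
```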

Identification of the Harderian Gland Transcriptome:

The Harderian gland tissue-specific transcriptome was identified using high-throughput RNA-seq (Figure 2.1; see Chapter 4 for a complete description of RNA-seq methods). Harderian gland tissue was collected from 40 adult T. s. parietalis in both the spring and the summer (n=10 male spring, n=10 male summer, n=10 female spring, n=10 female summer). Total RNA was extracted from the isolated Harderian gland tissue. A sequencing library was then prepared using the established TAG-seq protocol (Meyer et al. 2011; Lohman et al. 2016), which uses sequence tags to efficiently detect expressed transcripts when they are mapped to an appropriate reference. Primer sequences used during library preparation are reported in Appendix A.2. The prepared library was submitted to the Oregon Health & Science University Massively Parallel Sequencing Shared Resource (MPSSR). High-throughput sequencing was performed on the Illumina HiSeq 2500 platform (2 lanes, 100 bp single-end reads). Reads were quality filtered and mapped to the T. s. parietalis multi-tissue transcriptome described above. Transcripts with a mean of at least 0.5 mapped reads across all samples were considered to be expressed in Harderian gland tissue and were used to estimate the full transcriptome of the Harderian gland.
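The expression criterion is a threshold on the mean count across samples. In the sketch below, the input is assumed to map each transcript ID to a list of per-sample read counts; that shape and the function name are hypothetical.

```python
def expressed_transcripts(counts, min_mean=0.5):
    """Return transcript IDs whose mean mapped-read count across samples is >= min_mean.

    counts maps transcript ID -> list of per-sample read counts.
    """
    return [tid for tid, row in counts.items()
            if row and sum(row) / len(row) >= min_mean]
```

Because the threshold applies to the mean, a transcript detected at low levels in only some samples can still pass. For example, counts of 0, 1, 0 and 1 across four samples average exactly 0.5.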

Figure 2.1: Transcriptomic analysis workflow. Analyses included construction of 1) a multi-tissue transcriptome from 2 male and 2 female snakes, 2) Tag-seq expression profiles from 39 Harderian glands, and 3) Tag-seq expression profiles from a pool of 35 tissues (brain, heart, liver and testis) from Thamnophis sirtalis parietalis.

Secretome Identification:

Multiple methods were used to identify the complete set of proteins secreted from Harderian gland tissue into the extracellular environment (the secretome). Secreted proteins contain a signal peptide: a short amino acid sequence that targets a protein to the rough endoplasmic reticulum and thereby into the secretory pathway. A common way to identify secreted proteins is therefore to search the possible open reading frames of each gene for signal peptides. To perform this search, I used the software package SignalP to identify proteins likely to be secreted from the cell (Petersen et al. 2011). The SignalP algorithm was applied to the previously generated reference file containing all possible open reading frames of 20 amino acids or longer. The default cutoff values of 0.45 (for proteins not predicted to be transmembrane) and 0.50 (for predicted transmembrane proteins) were used to identify secretory proteins.

Many transcripts in any transcriptome assembly are incomplete and may lack a signal peptide. For that reason, any sequence whose BLASTx match was associated with the Gene Ontology term “extracellular region” (GO:0005576) or “extracellular space” (GO:0005615) was included in the Harderian gland secretome, whether or not a signal peptide was present. Some proteins were found to be expressed primarily in the Harderian gland and were also identified by subsequent LC/MS/MS protein mass spectrometry of vomeronasal secretions (see Chapter 3). These proteins were also included in the secretome. They were detected both as mRNA in the Harderian gland and as protein in the extracellular fluid filling the lumen of the vomeronasal organ, so they were included whether or not additional evidence of secretion existed.
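The secretome is the union of three lines of evidence:
- a SignalP D-score at or above the cutoff for the relevant network;
- an extracellular GO term on the best BLASTx match;
- detection in the vomeronasal proteome.

The sketch makes these assumptions:
- SignalP results are assumed to be summarized as (transcript, D-score, network) tuples, where the network label 'TM' or 'noTM' selects the 0.50 or 0.45 cutoff.
- Treating a score exactly at the cutoff as positive is an assumption.
- The input shapes and function names are hypothetical.

```python
# Default SignalP D-score cutoffs quoted in the text.
SIGNALP_CUTOFFS = {"noTM": 0.45, "TM": 0.50}
# extracellular region, extracellular space
EXTRACELLULAR_GO = {"GO:0005576", "GO:0005615"}


def predicted_secreted(signalp_rows):
    """Transcripts with at least one ORF whose D-score meets its network's cutoff.

    signalp_rows: iterable of (transcript_id, d_score, network), network in {'TM', 'noTM'}.
    """
    return {tid for tid, d, net in signalp_rows if d >= SIGNALP_CUTOFFS[net]}


def build_secretome(signalp_rows, go_terms, vno_proteome_hits):
    """Union of signal-peptide, GO-term and proteomic evidence for secretion.

    go_terms: {transcript_id: iterable of GO IDs from the best BLASTx match}.
    vno_proteome_hits: transcript IDs detected as protein in vomeronasal fluid.
    """
    by_signal = predicted_secreted(signalp_rows)
    by_go = {tid for tid, terms in go_terms.items() if EXTRACELLULAR_GO & set(terms)}
    return by_signal | by_go | set(vno_proteome_hits)
```

Using a union means that incomplete transcripts lacking a signal peptide can still enter the secretome through the other two criteria.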

Gene Set Enrichment Analysis:

To determine whether the Harderian gland transcriptome was enriched for genes associated with particular functions, I performed a functional enrichment analysis. Genes expressed in the Harderian gland were compared to an existing dataset of combined multi-tissue gene expression data from T. s. parietalis (n=36 samples from 9 snakes; n=9 samples each from heart, liver, brain and testis) (Hubert et al. in prep). Harderian gland tissue was collected from 40 adult T. s. parietalis (n=10 male spring, n=10 male summer, n=10 female spring, n=10 female summer); these were the same samples used to determine the Harderian gland transcriptome in the section above. Total RNA was extracted from Harderian gland tissue. A sequencing library was prepared using the TAG-seq protocol and sequenced on the HiSeq 2500 platform (2 lanes, 100 bp single-end reads). Reads were mapped to the reference transcriptome to create a table of counts (see Chapter 4 for a complete description of RNA-seq methods). This table of counts was combined with a table from the existing multi-tissue expression data. Differential expression analysis was performed using the R package DESeq2 (Love et al. 2014). All Harderian gland samples were assigned to one level of a single factor, and samples from all other tissues were assigned to a second level of the same factor. Genes were filtered for low expression and retained only if their mean read count in Harderian gland tissue was ≥ 0.5. Harderian gland counts were compared to counts from all other tissues using this single-factor design, to identify genes significantly more abundant in Harderian gland tissue. From the DESeq2 output, a gene enrichment score was calculated as the negative log10 of the p-value multiplied by the log2 fold change (-log10(p-value) × log2FC).
This score is directionally aware, because the fold change indicates relative abundance. It also accounts for confidence in that estimate by incorporating the p-value. Applying this score to the DESeq2 output for all genes in the analysis creates a rank-ordered list of gene scores suitable for enrichment analysis with the command-line program ErmineJ (Gillis et al. 2010). The custom script Make_ermineJ_annotations.pl (Appendix U.7) was used to extract Gene Ontology (GO) term annotations for all genes included in the analysis. Enrichment analysis was performed with ErmineJ using precision-recall gene score resampling (Gillis et al., 2010). In addition to the gene sets defined by the Gene Ontology Consortium, an additional gene set was defined based on the hypothesis that the Harderian gland produces secreted proteins involved in immune system functions. This gene set includes 116 genes likely involved in “Antimicrobial defense” (Appendix A.1). Separate enrichment analyses were performed for each aspect of the Gene Ontology structure (Molecular Function, Biological Process, and Cellular Component).
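The gene score, and the resampling idea behind the enrichment test, can be sketched in a few lines. The sketch makes these assumptions:
- Per-gene p-values and log2 fold changes have already been exported from DESeq2.
- A tiny floor guards against p-values of exactly zero.
- The function names are hypothetical.

For the enrichment step, the sketch swaps in plain gene score resampling on the mean score. Random gene sets of the same size are drawn repeatedly, and the empirical p-value is the fraction of draws scoring at least as high as the real set. ErmineJ's precision-recall variant, which was used in the text, scores gene sets differently, so this is only an illustration of the general approach.

```python
import math
import random


def gene_score(pvalue, log2fc, floor=1e-300):
    """-log10(p) * log2FC: large and positive when a gene is confidently higher
    in the Harderian gland, large and negative when confidently lower."""
    return -math.log10(max(pvalue, floor)) * log2fc


def resampling_pvalue(scores, gene_set, n_iter=10000, seed=1):
    """Empirical p-value for the mean score of gene_set against random same-size sets.

    scores: {gene_id: gene score}; gene_set: iterable of gene IDs.
    """
    members = [g for g in gene_set if g in scores]
    k = len(members)
    if k == 0:
        raise ValueError("no gene-set members have scores")
    observed = sum(scores[g] for g in members) / k
    all_scores = list(scores.values())
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_iter)
               if sum(rng.sample(all_scores, k)) / k >= observed)
    # The +1 terms keep the estimate away from exactly zero.
    return (hits + 1) / (n_iter + 1)
```

To get the rank-ordered gene list itself, one can sort gene IDs by score, for example `sorted(scores, key=scores.get, reverse=True)`.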

Results:

Transcriptome assembly and annotation:

Sequencing data received from the Genomics & Cell Characterization Core Facility (University of Oregon, Eugene) yielded a total of 82,309,520 raw read pairs (2 lanes, Illumina MiSeq®, 150 bp paired-end reads). Processing involved quality filtering, removal of reads containing sequencing adapters, and matching of reads with valid mates. After these steps, 43,388,032 read pairs, 13,398,493 orphan forward (left) reads, and 33,574,177 orphan reverse (right) reads remained.

The resulting Trinity assembly contained a total of 89,188 unique transcripts (Table 2.1). The minimum length of transcripts retained in the assembly was 201 bp, the maximum length was 14,870 bp, and the average length was 776 bp. The N25 was 2,240 bp, contained in 5,358 sequences. This statistic is the transcript length at which transcripts of that length or longer, counted from the longest, contain 25% of all assembled bases. The N50 was 1,172 bp, contained in 16,156 sequences, and the N75 was 542 bp, contained in 38,126 sequences. The total GC content of the transcriptome was 40.14%.

Annotation of the transcriptome via BLASTx searches against Swissprot and TrEMBL successfully annotated 33,148 (~37.2%) transcripts. Of these, 21,070 (~23.6% of all transcripts) had valid Gene Ontology terms associated with their respective matches. Annotation via hidden Markov models to identify known functional domains (Pfam) yielded 16,866 (18.9%) successful annotations. Across the complete transcriptome assembly, a total of 792,191 possible open reading frames of 20 amino acids or longer were predicted. Of these, 15,597 were predicted to enter the secretory pathway and be secreted into the extracellular environment.

A subsequent RNA-seq experiment was conducted in T. s. parietalis under similar conditions. When reads from that experiment were mapped to the complete transcriptome, 95.7% of them mapped successfully.

Harderian gland tissue-specific transcriptome characterization:

The subset of Trinity transcripts retained in the Harderian gland tissue-specific transcriptome contained a total of 13,393 unique transcripts. The minimum length of transcripts was 201 bp, the maximum length was 14,870 bp, and the average length was 1,614 bp. The N25 was 3,517 bp, contained in 1,180 sequences. The N50 was 2,378 bp, contained in 3,068 sequences, and the N75 was 1,441 bp, contained in 5,946 sequences. The total GC content of the transcriptome was 41.06%.

Annotation of the transcriptome via BLASTx searches against Swissprot and TrEMBL successfully annotated 8,272 (~66.2%) transcripts. Of these, 5,883 (~43.9% of all transcripts) had valid associated Gene Ontology terms. Annotation via hidden Markov models to identify known functional domains (Pfam) yielded 6,222 (46.5%) successful annotations. A total of 4,343 transcripts from the Harderian gland were predicted to enter the secretory pathway and be secreted into the extracellular environment.

Table 2.1 Summary statistics of de novo assembly and annotation of the multi-tissue transcriptome and of the extracted Harderian gland-specific transcriptome:

Metric | Multi-Tissue Transcriptome | Harderian Gland-Specific Transcriptome
Total transcripts | 89,188 | 13,393
N50 | 1,172 bp | 2,378 bp
Min transcript length | 201 bp | 201 bp
Max transcript length | 14,870 bp | 14,870 bp
Mean transcript length | 776 bp | 1,614 bp
GC content | 40.14% | 41.06%
Annotated (UniProt) | 33,148 (37.2%) | 8,272 (66.2%)
Annotated (Gene Ontology) | 21,070 (~23.6%) | 5,883 (~43.9%)
Annotated (Pfam) | 16,866 (18.9%) | 6,222 (46.5%)
Secreted (predicted) | 15,597 | 4,343

Analysis of the most abundant transcripts revealed that ~96.7% of the total normalized read counts mapped to the 100 most abundant transcripts. The 3 most abundant transcripts accounted for ~74.2% of all normalized read counts. They were annotated as “Plasminogen”, “Lipocalin homologue”, and an unannotated protein of unknown function. The top 100 transcripts (Table 2.2) included 4 lipid-binding proteins of the lipocalin family and 12 proteins involved in antimicrobial defense. They also included many proteins involved in transcription, translation, and protein production, transport and secretion.
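The abundance figures above are shares of the summed mean normalized counts. Sort the transcripts by count, sum the top N, and divide by the total. A minimal sketch with a hypothetical function name:

```python
def top_n_share(mean_counts, n):
    """Fraction of total normalized counts held by the n most abundant transcripts.

    mean_counts maps transcript ID -> mean normalized count.
    """
    ordered = sorted(mean_counts.values(), reverse=True)
    total = sum(ordered)
    return sum(ordered[:n]) / total
```

As a consistency check on the numbers in Table 2.2, the three largest means sum to 3,403,458 (1,405,633 + 1,074,495 + 923,330). At ~74.2%, that implies a total of roughly 4.6 million normalized counts across all expressed transcripts.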

Table 2.2 The top 100 most highly expressed genes in the Harderian gland of Thamnophis sirtalis parietalis. "Protein Name" is based on annotations from the highest-confidence match in either the Swissprot or TrEMBL databases (BLASTx e-value ≤ 0.0001). "Predicted Function" and "Functional Category" were added based on protein information obtained from the National Center for Biotechnology Information (NCBI) or UniProt (UniProt.org) databases. "Mean normalized counts" represents the mean number of reads mapping to each transcript (SHRiMP2; Rumble et al. 2009). Normalization was performed as part of the DESeq2 expression analysis pipeline (Love et al. 2014).

Protein Name | Predicted Function | Functional Category | Mean Normalized Counts
Plasminogen | Plasmin precursor, degradation of proteins | Component of blood | 1,405,633
Lipocalin homologue | Binds and solubilizes lipids | Lipid Binding | 1,074,495
N/A | N/A | Unknown | 923,330
Harderian gland protein | N/A | Unknown | 324,506
HG33 Lipocalin | Binds and solubilizes lipids | Lipid Binding | 159,821
Pentaxin | Innate immune, Functional ancestors to antibodies | Pathogen defense | 72,584
N/A | N/A | Unknown | 48,933
N/A | N/A | Unknown | 48,282
Protein transport protein Sec24A | Formation of protein transport vesicles | Protein production/transport | 29,692
NADH-ubiquinone oxidoreductase chain 3 | Component of electron transport chain | Metabolic processes | 21,760
Lipocalin-type Prostaglandin D synthase | Binds and solubilizes lipids | Lipid Binding | 18,771
N/A | N/A | Unknown | 18,682
N/A | N/A | Unknown | 17,855
N/A | N/A | Unknown | 15,719
Ficolin 1 | Innate immune response, Binds pathogens | Pathogen defense | 14,744
N/A | N/A | Unknown | 14,271
N/A | N/A | Unknown | 12,571
N/A | N/A | Unknown | 11,347
Ficolin 1 | Innate immune response, Binds pathogens | Pathogen defense | 10,965
N/A | N/A | Unknown | 10,643
Bactericidal/permeability-increasing protein 3 | Antibiotic protein, Binds LPS | Pathogen defense | 10,135
N/A | N/A | Unknown | 9,407
Ficolin 1 | Innate immune response, Binds pathogens | Pathogen defense | 8,442
N/A | N/A | Unknown | 7,455
Signal peptidase complex subunit 3 | Cleaves signal peptide from secretory proteins | Protein production/transport | 6,313
N/A | N/A | Unknown | 6,032
NADH-ubiquinone oxidoreductase chain 1 | Component of electron transport chain | Metabolic processes | 6,001
Putative 60S ribosomal protein | Component of ribosomal complexes | Translation | 5,667
Cytochrome c oxidase subunit 1 | Component of electron transport chain | Metabolic processes | 5,263
N/A | N/A | Unknown | 5,080
LRRG00134 | Possible* tissue regeneration | Cell structure* | 5,071
Histone-lysine N-methyltransferase MLL3 | DNA methylation, regulation of transcription | Transcription | 4,949
Ficolin 1 | Innate immune response, Binds pathogens | Pathogen defense | 4,857
Signal transducer CD24 protein | Cell proliferation, Immune response | Cell proliferation | 4,657
Cytochrome b | Component of electron transport chain | Metabolic processes | 4,536
Galectin | Carbohydrate binding protein (glycosylation) | Mucus | 3,759
N/A | N/A | Unknown | 3,689
N/A | N/A | Unknown | 3,630
Arylsulfatase I | Hydrolyzes sulfate containing compounds | Metabolic processes | 3,492
Cytochrome c oxidase subunit 3 | Component of electron transport chain | Metabolic processes | 3,287
N/A | N/A | Unknown | 2,807
60S ribosomal protein L17 | Component of ribosomal complexes | Translation | 2,728
UPF0764 protein C16orf89 | Possible* tissue development or metabolism | Metabolism* | 2,687
Galectin | Carbohydrate binding protein (glycosylation) | Mucus | 2,505
N/A | N/A | Unknown | 2,500
Bactericidal/permeability-increasing protein 3 | Antibiotic protein, Binds LPS | Pathogen defense | 2,375
Peroxiredoxin-4 | Peroxide detoxification | Oxidative stress reduction | 2,195
Ficolin 1 | Innate immune response, Binds pathogens | Pathogen defense | 2,028
LRRG00134 | Possible* tissue regeneration | Cell structure* | 1,957
Transcription factor SOX-8 | Transcription factor | Transcription | 1,954
60S ribosomal protein L8 | Component of ribosomal complexes | Translation | 1,912
N/A | N/A | Unknown | 1,903
40S ribosomal protein SA | Component of ribosomal complexes | Translation | 1,884
40S ribosomal protein S3a | Component of ribosomal complexes | Translation | 1,751
Neuronal PAS domain protein 3 | Regulation of neurogenesis | Cell proliferation | 1,717
N/A | N/A | Unknown | 1,637
40S ribosomal protein S30 | Component of ribosomal complexes | Translation | 1,604
N/A | N/A | Unknown | 1,539
Fumarylacetoacetase protein | Amino acid degradation | Protein production/transport | 1,537
N/A | N/A | Unknown | 1,450
N/A | N/A | Unknown | 1,447
ATP synthase subunit a | Component of ATP synthetase | Metabolic processes | 1,358
N/A | N/A | Unknown | 1,318
N/A | N/A | Unknown | 1,229
Annexin | Calcium dependent phospholipase A inhibitor | Anti-inflammatory | 1,215
Signal sequence receptor subunit 1 | Cleaves signal peptide from secretory proteins | Protein production/transport | 1,190
60S ribosomal protein L7 | Component of ribosomal complexes | Translation | 1,187
Surfeit locus protein 4 | Structural component of ER and Golgi body | Protein production/transport | 1,175
60S ribosomal protein L6 | Component of ribosomal complexes | Translation | 1,131
Lipocalin | Binds and solubilizes lipids | Lipid Binding | 991
Proline-rich nuclear receptor coactivator 2 | Degradation of mRNA | Translation | 952
Kinesin protein KIF13B | Transport vesicle regulatory protein | Protein production/transport | 946
N/A | N/A | Unknown | 925
N/A | N/A | Unknown | 871
60S ribosomal protein L23a | Component of ribosomal complexes | Translation | 849
N/A | N/A | Unknown | 837
Acetyltransferase of pyruvate dehydrogenase | Conversion of pyruvate to acetyl-CoA and CO2 | Metabolic processes | 823
ER membrane protein complex subunit 3 | Protein folding in the endoplasmic reticulum | Protein production/transport | 806
40S ribosomal protein S28 | Component of ribosomal complexes | Translation | 801
Salivary agglutinin | Pathogen Binding, Host defense | Pathogen defense | 797
Dr1-associated corepressor | Negative regulation of transcription | Transcription | 780
Keratin, type I cytoskeletal 18 | Component of cytoskeleton | Cell structure | 770
Ficolin 3 | Innate immune, Lectin complement pathway | Pathogen defense | 718
60S ribosomal protein L35a | Component of ribosomal complexes | Translation | 704
Pentaxin | Innate immune, Functional ancestors to antibodies | Pathogen defense | 693
40S ribosomal protein S4 | Component of ribosomal complexes | Translation | 686
Eukaryotic translation initiation factor 5B | Translation initiation factor | Translation | 667
60S acidic ribosomal protein P2 isoform A | Component of ribosomal complexes | Translation | 657
Polyadenylate-binding protein | Translation initiation, binds mRNA | Translation | 653
N/A | N/A | Unknown | 653
N/A | N/A | Unknown | 649
PRA1 family protein | Regulates neurotransmitter concentrations | Signaling | 646
Protein disulfide-isomerase | Protein folding | Protein production/transport | 630
Ferritin | Binds iron, Maintains local ferrous oxide levels | Iron binding | 625
60S ribosomal protein L22 | Component of ribosomal complexes | Translation | 622
N/A | N/A | Unknown | 620
Eukaryotic translation initiation factor 4B | Translation initiation factor | Translation | 617
Translocon-associated protein subunit delta | Regulation of ER bound proteins, Binds calcium | Protein production/transport | 610
N/A | N/A | Unknown | 607
Ac1147 protein | N/A | Unknown | 590

Figure 2.2 Differential expression of Harderian gland and pooled samples. The analysis compared 39 Harderian glands to a pool of 35 tissues from brain, heart, liver and testis in Thamnophis sirtalis parietalis. The heat map represents z-scores calculated from regularized log-transformed read mapping counts (DESeq2; Love et al. 2014). It includes only differentially expressed genes with Benjamini-Hochberg adjusted p-values ≤ 0.01. Brackets indicating tissue-specific expression are approximate.

Enrichment analysis:

Differential expression analysis revealed a total of 6,217 genes which were differentially expressed in the Harderian gland compared to the multi-tissue pool (p ≤ 0.01). Of these, 1,674 genes showed significantly greater expression in Harderian gland tissue, and 4,543 (46%) genes showed significantly less expression in the Harderian gland than in other combined tissues (Figure 2.2).

Enrichment analysis of genes expressed in the Harderian gland compared to genes expressed in liver, heart, brain, and testis showed significant enrichment of several gene sets defined by Gene Ontology terms.

Gene sets in the “Molecular Function” aspect of the Gene Ontology (Table 2.3) showed significant enrichment for the following terms:
- lipid binding (GO:0008289; p=1.00e-4; BH adjusted p-value=0.0147);
- transporter activity (GO:0005215; p=3.00e-4; BH adjusted p-value=0.02205);
- calcium ion binding (GO:0005509; p=8.00e-4; BH adjusted p-value=0.0392);
- transmembrane transporter activity (GO:0022857; p=1.20e-3; BH adjusted p-value=0.0441).

An additional 16 GO terms showed significant unadjusted p-values but failed to pass multiple test correction.
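The adjusted p-values reported here come from ErmineJ. The sketch below illustrates the Benjamini-Hochberg step-up procedure itself and is not a reimplementation of ErmineJ. Each raw p-value is multiplied by m/rank, where m is the number of tests. A running minimum is then taken from the largest p-value downward. As a result, adjusted values never decrease with rank, are never smaller than the corresponding raw p-values, and are capped at 1. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, raw p-values of 0.01, 0.02, 0.03 and 0.5 adjust to 0.04, 0.04, 0.04 and 0.5.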

Table 2.3: Enrichment analysis results for the "Molecular Function" aspect of the Gene Ontology structure. "Gene set" indicates the molecular function of genes contained in the listed gene set. Enrichment analysis was performed with the ErmineJ software package using Precision Recall Gene Score Resampling (Gillis et al. 2010). Gene scores were calculated from differential expression analysis of Harderian gland tissue compared to a multi-tissue pool using the R package DESeq2 (Love et al. 2014), as -log10(p-value) × log2FC from the DESeq2 output. "Adjusted p-values" were based on ErmineJ output and calculated using the Benjamini-Hochberg method to control the false discovery rate. * Significant enrichment in Harderian gland tissue.

Gene set | Gene Ontology term | Adjusted p-value
lipid binding * | GO:0008289 | 0.0147
transporter activity * | GO:0005215 | 0.0221
calcium ion binding * | GO:0005509 | 0.0392
transmembrane transporter activity * | GO:0022857 | 0.0441
carbohydrate binding | GO:0030246 | 0.1793
hydrolase activity, hydrolyzing O-glycosyl compounds | GO:0004553 | 0.4729
metal ion transmembrane transporter activity | GO:0046873 | 0.4977
transferase activity, transferring hexosyl groups | GO:0016758 | 0.4814
hydrolase activity, acting on glycosyl bonds | GO:0016798 | 0.4377
ion transmembrane transporter activity | GO:0015075 | 0.4204
transcription regulator activity | GO:0140110 | 0.3996
DNA-binding transcription factor activity | GO:0003700 | 0.3859
transferase activity, transferring glycosyl groups | GO:0016757 | 0.3585
transmembrane signaling receptor activity | GO:0004888 | 0.3444
peptidase activity | GO:0008233 | 0.3646
channel activity | GO:0015267 | 0.3712
passive transmembrane transporter activity | GO:0022803 | 0.3712
inorganic molecular entity transmembrane transporter activity | GO:0015318 | 0.3632
peptidase activity, acting on L-amino acid peptides | GO:0070011 | 0.3642
endopeptidase activity | GO:0004175 | 0.3536

Gene sets in the “Biological Process” aspect (Table 2.4) showed significant enrichment of genes functioning in:
- transport (GO:0006810; p=1.00e-12; BH adjusted p-value=4.27e-10);
- localization (GO:0051179; p=1.00e-12; BH adjusted p-value=2.85e-10);
- establishment of localization (GO:0051234; p=1.00e-12; BH adjusted p-value=2.14e-10);
- organic substance transport (GO:0071702; p=1.00e-4; BH adjusted p-value=0.017);
- nitrogen compound transport (GO:0071705; p=3.00e-4; BH adjusted p-value=0.0427).

An additional 42 GO terms showed significant unadjusted p-values but failed to pass multiple test correction. Genes in the “G protein-coupled receptor signaling pathway” gene set did not show significant enrichment (p=0.0771; BH adjusted p-value ≈ 1.0, as rounded by the software). The custom “Antimicrobial defense” gene set showed significant enrichment (p=1.00e-12; BH adjusted p-value=8.54e-10).

Table 2.4: Enrichment analysis results for the "Biological Process" aspect of the Gene Ontology structure. "Gene set" indicates the biological process in which included genes are involved. An additional author-defined gene set including genes involved in “Antimicrobial defense” was also included in this analysis. Enrichment analysis was performed with the ErmineJ software package using Precision Recall Gene Score Resampling (Gillis et al. 2010). Gene scores were calculated from differential expression analysis of Harderian gland tissue compared to a multi-tissue pool using the R package DESeq2 (Love et al. 2014), as -log10(p-value) × log2FC from the DESeq2 output. "Adjusted p-values" were based on ErmineJ output and calculated using the Benjamini-Hochberg method to control the false discovery rate. * Significant enrichment in Harderian gland tissue.

Gene set | Gene Ontology term | Adjusted p-value
establishment of localization * | GO:0051234 | 2.14e-10
localization * | GO:0051179 | 2.85e-10
transport * | GO:0006810 | 4.27e-10
Antimicrobial defense * | Author Defined | 8.54e-10
organic substance transport * | GO:0071702 | 0.0171
nitrogen compound transport * | GO:0071705 | 0.0427
amide transport | GO:0042886 | 0.1281
peptide transport | GO:0015833 | 0.1464
transmembrane transport | GO:0055085 | 0.1964
vesicle-mediated transport | GO:0016192 | 0.2088
macromolecule localization | GO:0033036 | 0.2096
protein transport | GO:0015031 | 0.2847
establishment of protein localization | GO:0045184 | 0.6775
cellular amide metabolic process | GO:0043603 | 0.6882
protein localization | GO:0008104 | 0.6939
monocarboxylic acid transport | GO:0015718 | 0.7015
ion transport | GO:0006811 | 0.7160
receptor-mediated endocytosis | GO:0006898 | 0.7164
establishment of localization in cell | GO:0051649 | 0.7326
lipid catabolic process | GO:0016042 | 0.7645
carboxylic acid transport | GO:0046942 | 0.7729
anion transport | GO:0006820 | 0.8239
lipid localization | GO:0010876 | 0.8436
glycoprotein metabolic process | GO:0009100 | 0.8467
proteolysis | GO:0006508 | 0.8620
lipid transport | GO:0006869 | 0.8658
cellular lipid metabolic process | GO:0044255 | 0.8682
microtubule-based movement | GO:0007018 | 0.8695
Golgi vesicle transport | GO:0048193 | 0.8702
organic acid transport | GO:0015849 | 0.8726
ER to Golgi vesicle-mediated transport | GO:0006888 | 0.8843
metal ion transport | GO:0030001 | 0.8847
secretion | GO:0046903 | 0.8872
intracellular transport | GO:0046907 | 0.8896
phospholipid metabolic process | GO:0006644 | 0.8901
glycosylation | GO:0070085 | 0.8906
intracellular protein transport | GO:0006886 | 0.8934
protein processing | GO:0016485 | 0.8945
protein glycosylation | GO:0006486 | 0.9236
glycoprotein biosynthetic process | GO:0009101 | 0.9236
macromolecule glycosylation | GO:0043413 | 0.9236
ion transmembrane transport | GO:0034220 | 0.9239
inorganic anion transport | GO:0015698 | 0.9280
methylation | GO:0032259 | 0.9454
lipid phosphorylation | GO:0046834 | 0.9501
phosphatidylinositol phosphorylation | GO:0046854 | 0.9501
cation transport | GO:0006812 | 0.9638
organic anion transport | GO:0015711 | 0.9790

Gene sets in the “Cellular Component” (subcellular localization) aspect (Table 2.5) showed significant enrichment of genes targeted to the following components:
- extracellular region (GO:0005576; p=1.00e-12; BH adjusted p-value=1.17e-10);
- endomembrane system (GO:0012505; p=1.00e-12; BH adjusted p-value=5.85e-11);
- integral component of membrane (GO:0016021; p=1.00e-12; BH adjusted p-value=3.90e-11);
- intrinsic component of membrane (GO:0031224; p=1.00e-12; BH adjusted p-value=2.93e-11);
- organelle subcompartment (GO:0031984; p=1.00e-12; BH adjusted p-value=2.34e-11).

An additional 12 gene sets showed significant unadjusted p-values but failed to pass multiple test correction.

Table 2.5: Enrichment analysis results for the "Cellular Component" aspect of the Gene Ontology structure. "Gene set" indicates the subcellular localization of genes included in the indicated gene set. Enrichment analysis was performed with the ErmineJ software package using Precision Recall Gene Score Resampling (Gillis et al. 2010). Gene scores were calculated from differential expression analysis of Harderian gland tissue compared to a multi-tissue pool using the R package DESeq2 (Love et al. 2014), as -log10(p-value) × log2FC from the DESeq2 output. "Adjusted p-values" were based on ErmineJ output and calculated using the Benjamini-Hochberg method to control the false discovery rate. * Significant enrichment in Harderian gland tissue.

Gene set | Gene Ontology term | Adjusted p-value
extracellular region * | GO:0005576 | 1.17e-10
endomembrane system * | GO:0012505 | 5.85e-11
integral component of membrane * | GO:0016021 | 3.90e-11
intrinsic component of membrane * | GO:0031224 | 2.93e-11
organelle subcompartment * | GO:0031984 | 2.34e-11
endoplasmic reticulum | GO:0005783 | 0.0644
endoplasmic reticulum part | GO:0044432 | 0.0735
endoplasmic reticulum membrane | GO:0005789 | 0.0760
nuclear outer membrane-endoplasmic reticulum membrane network | GO:0042175 | 0.0728
endoplasmic reticulum subcompartment | GO:0098827 | 0.0655
Golgi apparatus | GO:0005794 | 0.0659
bounding membrane of organelle | GO:0098588 | 0.0672
Golgi apparatus part | GO:0044431 | 0.0945
Golgi subcompartment | GO:0098791 | 0.1002
Golgi membrane | GO:0000139 | 0.1076
membrane coat | GO:0030117 | 0.1740
coated membrane | GO:0048475 | 0.1740


Discussion:

Transcriptome characterization: The full multi-tissue transcriptome assembly produced during this research project contained a total of 89,188 unique transcripts. This is much larger than the 26,656 predicted genes in the T. s. concinnus genome (NCBI). Many genes are alternatively spliced, producing several isoforms per gene, which increases the number of transcripts relative to the number of genes predicted in the genome. In addition, although de novo assembly algorithms use current bioinformatic approaches to produce the best possible result, many steps during library preparation, sequencing and assembly introduce spurious sequences into the final transcriptome (Conesa et al. 2016; Yandell & Ence, 2012). Few transcriptomes are assembled without errors, so the full multi-tissue transcriptome likely contains spuriously assembled transcripts. This may partly explain two features of the assembly. The first is the relatively low N50 (1,172 bp), which indicates a large number of short transcripts. The second is the relatively small proportion of transcripts (~37.2%) that were successfully annotated via the UniProt database. This transcriptome does, however, appear to contain a large percentage of correctly assembled genes that are actually expressed in Harderian gland tissue. Reads produced from the additional sequencing run mapped to the transcriptome with high efficiency: 95.7% of reads were successfully mapped to produce the Harderian gland-specific transcriptome. The transcripts contained in the Harderian gland-specific transcriptome can therefore be described with relatively high confidence. Additional support for the quality of the Harderian gland-specific transcriptome comes from three observations: its more reasonable number of transcripts (13,393), its higher N50 (2,378 bp), and a much higher percentage of transcripts successfully annotated with BLASTx matches in the UniProt database (~66.2%).
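N50 summarizes assembly contiguity. It is the transcript length at which transcripts of that length or longer together contain half of all assembled bases. A surplus of short transcripts, whether spurious or fragmentary, therefore pulls it down. The following is a minimal Python sketch of the standard definition. The lengths are toy values, not data from this assembly.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of transcript lengths.

    N50 is the length L such that transcripts of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no transcript lengths supplied")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # defensive only; the loop always returns


toy = [2000, 1500, 1000, 800, 500, 300, 200]
print(n50(toy))               # 1500
print(n50(toy + [150] * 20))  # 800: many short transcripts lower N50
```

Adding twenty 150 bp transcripts to the toy set drops N50 from 1,500 to 800. This illustrates how a filtered, tissue-specific transcript set can show a higher N50 than the full assembly.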
Few (if any) de novo transcriptome assemblies contain 100% of expressed transcripts. The Harderian gland transcriptome is likely no exception and is probably missing a number of transcripts that are expressed in this tissue. We can, however, be confident in the transcripts that are included. Further support for this reference as a research tool comes from a comparison with the available T. s. concinnus genome. When used as a reference for RNA-seq studies in this species, the T. s. parietalis transcriptome greatly outperforms the genome. Approximately 55-60% of reads mapped to the genome, compared with 95.7% of reads mapped to the transcriptome reference produced here.

The most abundant transcripts expressed in Harderian gland tissue accounted for a very large proportion of all detected expression. The three most highly expressed transcripts encoded plasminogen, a lipocalin lipid-binding protein, and an unannotated protein of unknown function. The highly expressed unannotated protein (accounting for ~21% of all expression in this tissue) is intriguing, but little can be inferred about it without further analyses. Further study is necessary to determine the identity and function of this protein, as well as of the other unidentified proteins expressed in the Harderian gland.

High expression of lipid-binding proteins (likely lipocalins) was predicted in Harderian gland tissue because of the gland's hypothesized role in solubilizing lipid pheromones. The results described here show that lipocalins account for ~27.3% of gene expression identified in this tissue, and a single lipocalin alone accounts for ~23.4%. This supports the hypothesized role of this tissue in producing pheromone-binding proteins. To fulfil this role, however, these proteins must not only be expressed. They must also be secreted into the nasolacrimal duct and travel to the lumen of the vomeronasal organ. Gene expression analyses alone cannot determine where expressed extracellular proteins end up.
Chapter 3 of this dissertation investigates the protein contents of the fluid found within the vomeronasal organ.

Plasminogen is the inactive precursor of plasmin, a protease commonly found in the blood stream. Activated plasmin degrades many proteins and is important in regulating the lifetime of many proteins in the blood (Castellino, 2013). Plasmin is especially important in degrading blood clots, which form when fibrinogen is activated in the blood stream to produce fibrin polymers (Castellino & Ploplis, 2005). Plasminogen is normally produced primarily in the liver (Raum et al., 1980; Castellino, 2013). Here, extremely high levels of plasminogen expression (~30.6% of mRNA expression) were found in Harderian gland tissue. This finding is both unexpected and intriguing. It suggests that the Harderian gland may contribute important proteins to the blood stream, a role normally performed by the liver. The finding is preliminary, and more research is needed to determine its significance.
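Figures such as ~30.6% for plasminogen or ~27.3% for lipocalins are shares of the summed expression across all transcripts in the tissue. The following is a minimal Python sketch. It assumes expression values are already normalized for transcript length and sequencing depth; TPM is used here as an assumed unit. The transcript names and values are toy placeholders.

```python
from typing import Dict, Iterable


def expression_shares(expression: Dict[str, float]) -> Dict[str, float]:
    """Percentage of total expression contributed by each transcript."""
    total = sum(expression.values())
    if total <= 0:
        raise ValueError("total expression must be positive")
    return {tx: 100.0 * value / total for tx, value in expression.items()}


def group_share(expression: Dict[str, float], members: Iterable[str]) -> float:
    """Combined percentage for a group of transcripts (e.g., a protein family)."""
    shares = expression_shares(expression)
    return sum(shares[tx] for tx in members if tx in shares)


# Toy TPM-like values (hypothetical, not measurements from this study)
expression = {"tx1": 300.0, "tx2": 250.0, "tx3": 200.0, "tx4": 150.0, "tx5": 100.0}
shares = expression_shares(expression)
top_three = sorted(shares.values(), reverse=True)[:3]
print(shares["tx1"], sum(top_three), group_share(expression, ["tx2", "tx4"]))
```

A family-level share, such as the lipocalin total, is the sum of its member transcripts' shares. It can therefore exceed the share of any single member.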

Gene set enrichment: Gene expression analyses produce very large amounts of data that are difficult to summarize. Gene set enrichment analysis addresses this by comparing the expression of genes associated with functional annotation terms. It identifies functional categories in which many of the member genes show relatively high expression. The enrichment analysis performed here identified 15 gene sets enriched in Harderian gland tissue: 14 based on Gene Ontology terms and 1 custom-defined gene set. These gene sets are as follows:
- lipid binding
- transport
- localization
- establishment of localization
- organic substance transport
- transporter activity
- transmembrane transporter activity
- nitrogen compound (protein) transport
- calcium ion binding
- genes targeted to organelle subcompartments (often referring to the Golgi apparatus)
- membrane components
- genes targeted to the extracellular region (Table 2.5)

Together, these terms describe the production of proteins and their transport through the Golgi apparatus for ultimate secretion into the extracellular environment. This result is unsurprising given the structure of the Harderian gland in T. s. parietalis and the earlier identification of protein secretory granules in this gland (Rehorek et al. 2000b).
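Gene score resampling asks whether the genes in a set have higher scores, on average, than randomly drawn sets of the same size. The sketch below is a simplified mean-score variant written in plain Python for illustration. It is not ErmineJ's implementation and differs from the precision-recall method used for Table 2.5. The function name, toy scores and resampling count are assumptions.

```python
import random
from statistics import mean
from typing import Dict, Iterable


def gsr_pvalue(scores: Dict[str, float], gene_set: Iterable[str],
               n_resamples: int = 10_000, seed: int = 0) -> float:
    """One-sided empirical p-value for a gene set by gene score resampling.

    Null hypothesis: the set's mean gene score is no higher than the mean of
    a random set of the same size drawn from all scored genes.
    """
    members = [scores[g] for g in gene_set if g in scores]
    k = len(members)
    if k == 0:
        raise ValueError("gene set contains no scored genes")
    observed = mean(members)
    all_scores = list(scores.values())
    rng = random.Random(seed)
    hits = sum(
        mean(rng.sample(all_scores, k)) >= observed
        for _ in range(n_resamples)
    )
    # Add-one correction so an empirical p-value is never exactly zero
    return (hits + 1) / (n_resamples + 1)


# Toy example: 100 genes with scores 0..99 (hypothetical)
scores = {f"g{i}": float(i) for i in range(100)}
high_set = [f"g{i}" for i in range(95, 100)]  # highest-scoring genes
low_set = [f"g{i}" for i in range(5)]         # lowest-scoring genes
print(gsr_pvalue(scores, high_set, n_resamples=2000))  # small p-value
print(gsr_pvalue(scores, low_set, n_resamples=2000))   # 1.0
```

Because the null distribution is built from the observed scores themselves, no assumption about the shape of the score distribution is needed. This is one reason resampling approaches are popular for gene set analysis.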


Some additional GO terms individually showed significant enrichment (p ≤ 0.05) but did not pass multiple test correction. These included many genes involved in the secretory pathway, protein secretion, and targeting to the endoplasmic reticulum, Golgi apparatus and secretory vesicles (Table 2.5). This set of GO terms approaching significance also included several genes involved in glycosylation and carbohydrate binding. This pattern indicates the production and transport of glycoproteins, likely for the production of mucus. These results are not statistically significant under this conservative method. They are, however, consistent with the significant findings indicating that the Harderian gland in garter snakes is primarily a protein secretory structure.
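The gap between "significant" and "passed correction" comes from the Benjamini-Hochberg step-up procedure (Benjamini & Hochberg, 1995). This procedure scales each p-value by m/rank across all m gene sets tested. The following is a minimal Python sketch of the standard procedure. ErmineJ's reported values may differ in detail, and the p-values below are toy values.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.02, 0.03, 0.04, 0.2, 0.5, 0.7, 0.9]  # toy p-values
adj = benjamini_hochberg(raw)
print([round(a, 3) for a in adj])
```

In the toy input, four raw p-values fall at or below 0.05, but only one adjusted value does. The three middle p-values all adjust to 0.08, which mirrors how individually significant GO terms can fail correction.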

Enrichment of lipid-binding proteins: In addition to the large number of genes functioning in the secretory pathway, this analysis identified significant enrichment of genes associated with the GO term for lipid binding. Garter snakes use their vomeronasal organ to detect a variety of chemical signals, many of which are non-polar and insoluble in aqueous solution (LeMaster & Mason, 2001). Lipids are an extremely abundant component of all organisms. Snakes use lipids and other components of skin, mucus, etc. as kairomones to detect and trail prey (LeMaster & Mason, 2002). The T. s. parietalis sexual attractiveness pheromone consists of a homologous series of long-chain methyl ketones and is non-polar and insoluble in aqueous fluid (Mason et al. 1989). Detection of this pheromone is the primary means by which male snakes locate, recognize, and evaluate the quality of potential mates. It is therefore essential to reproduction and thus to individual fitness. To contact the sensory epithelium within the vomeronasal organ, chemical cues acquired via tongue-flicking must first be dissolved in the fluid within the vomeronasal lumen. Non-polar lipids do not readily dissolve in aqueous solution, so they must be bound to a protein to become soluble and may then be transported into the vomeronasal lumen. Lipid-binding proteins in the fluid of the vomeronasal lumen have been thought to fulfil this role (Mason & Halpern, 2011). Rehorek et al. (2011) presented evidence that the majority of the fluid within the vomeronasal organ is produced by the Harderian gland and travels to the vomeronasal lumen via the nasolacrimal duct. It was therefore hypothesized that the Harderian gland would produce large amounts of lipid-binding protein. This hypothesis is supported by the finding that Harderian gland tissue shows significant enrichment of mRNAs predicted to encode lipid-binding proteins.

Enrichment of antimicrobial proteins: Previous studies have indicated that the Harderian gland likely functions as a component of the immune system in many taxa, especially in birds (Burns, 1992). Deist & Lamont (2018) described the role of the Harderian gland in the immune system of the domestic chicken (Gallus gallus) and found that the tissue expresses a number of antimicrobial proteins. Domínguez-Pérez et al. (2018) examined the Harderian gland transcriptomes of three colubrid snake species and found antimicrobial proteins to be among the most highly expressed transcripts. These results must be interpreted carefully, however. Ranking transcripts by expression level alone does not reliably indicate the functional significance of a tissue without comparisons of gene expression with other tissues or between treatment conditions (Ballouz et al. 2017; Conesa et al. 2016).

The gene set enrichment analysis performed here showed that expression of genes involved in antimicrobial defense is significantly enriched in Harderian gland tissue. The comparison was made against other tissues from conspecifics in similar environmental conditions. The Harderian gland transcribes many genes whose protein products are predicted to be involved in antimicrobial defense. Enrichment of this gene set suggests that the Harderian gland has a specific role as an immune tissue, producing antimicrobial proteins at a higher rate than the other tissues included in the analysis. Secretions from this tissue are primarily transported through the nasolacrimal duct into the lumen of the vomeronasal organ. The antimicrobial proteins produced in the Harderian gland therefore likely protect the sensitive vomeronasal mucous membranes, which are often in contact with environmental pathogens.

Enrichment of the G-protein coupled receptor signaling pathway:

In addition to antimicrobial proteins, Deist & Lamont (2018) observed significant enrichment of genes involved in the G-protein coupled receptor signaling pathway in the chicken Harderian gland. The authors interpreted this as evidence that the Harderian gland in Gallus gallus not only secretes immune system proteins but also detects and responds to antigens that contact the eye and surrounding mucous membranes.

The analysis presented here yields no evidence of enrichment of transcripts involved in the G-protein coupled receptor signaling pathway in the Harderian gland of T. s. parietalis. Given the differences in the physiology of the Harderian gland and associated structures between these groups, this apparent disagreement is plausible.

The chicken Harderian gland is located in the orbit. The eyes and surrounding mucous membranes of birds are exposed to the environment and regularly come into contact with exogenous pathogens. In birds, the G-protein coupled receptor signaling pathway allows the Harderian gland to detect and respond to pathogen stimuli (Deist & Lamont, 2018). In snakes, however, the eye is covered by a “spectacle”, a fixed transparent scale that protects the eye from the environment (Sivak, 1977). In addition to protecting the eye from debris and damage, the spectacle likely also prevents the introduction of environmental pathogens. In this arrangement, the Harderian gland in snakes would likely have little need to respond to pathogens directly. Rather, its proposed immune function is to protect the mucous membranes of the vomeronasal organ, which is located relatively far from the gland itself. It is therefore not unexpected that the G-protein coupled receptor signaling pathway is not enriched in the Harderian gland of snakes.


Summary and Conclusions:

At the outset of this project, little was known about the Harderian gland in T. s. parietalis, especially regarding the function of its expressed genes. Building a Harderian gland-specific transcriptome was the first logical step toward understanding the function of this enigmatic tissue. The analyses presented here describe the most common (top ~96.7%) genes transcribed in Harderian gland tissue. They also predict which of these genes are likely to be secreted into the extracellular environment. Enrichment analyses support the hypothesis that the Harderian gland is primarily a secretory structure that secretes large amounts of protein. Enrichment analysis also highlights the importance of the Harderian gland in producing secreted proteins involved in lipid binding and extracellular immune functions. The G-protein coupled receptor signaling pathway was not enriched in Harderian gland tissue. This finding suggests that the gland may fulfill slightly different immune roles in birds and snakes, possibly because of differences in the physiology of the eye between these groups.

Analysis of the most highly expressed genes in the Harderian gland supports its hypothesized role in producing pheromone-binding proteins and also yielded other notable findings. The extremely high level of plasminogen expression warrants additional research on its function in the Harderian gland of garter snakes and of other taxa, as it may indicate a novel function of the gland. Further research opportunities arise from the several highly expressed proteins that are unannotated or have no known function. Discovering the functions of these proteins may improve our understanding of the Harderian gland or may reveal yet another novel function of this tissue.

This chapter may be considered the most important contribution of this project, as much of the research and analysis in subsequent chapters depends heavily on the annotated multi-tissue transcriptome produced here. Chapter 3 is based on shotgun proteomic analysis of the fluid contained within the lumen of the vomeronasal organ. This analysis was first attempted using reference proteins from the Swiss-Prot and TrEMBL databases, but that approach was unsuccessful. Similar analyses (presented in Chapter 3) using a protein sequence database prepared from the transcriptome successfully identified hundreds of protein components of this fluid. Chapters 4 and 5 present the results of RNA-seq based gene expression analyses. Both require the transcriptome produced here, as the published T. s. concinnus genome was shown to be insufficient for this purpose. To my knowledge, the results presented in this chapter comprise the most detailed transcriptome analysis of the Harderian gland in any organism to date.


References:

Albini, B., Wick, G., Rose, E., Orlans, E. (1974) Immunoglobulin production in chicken Harderian glands. Int Arch Allergy Immunol. 47:23-34.
Andrews, S. (2010) FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
Ballouz, S., Pavlidis, P., Gillis, J. (2017) Using predictive specificity to determine when gene set analysis is biologically meaningful. Nucleic Acids Research. 45(4):e20.
Barka, T., Anderson, P.J. (1965) Histochemistry: theory, practice and bibliography. Harper and Row, New York.
Bateman, A., Clements, J., Coggill, P., Eberhardt, R., Eddy, S., Heger, A., Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E., Tate, J., Punta, M. (2013) Pfam: the protein families database. Nucleic Acids Research. 42:D222-D230.
Benjamini, Y., Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 57(1):289-300.
Burns, R.B. (1992) The Harderian gland in birds: histology and immunology. In: Harderian glands: porphyrin metabolism, behavioral and endocrine effects. Springer. 155-163.
Castellino, F. (2013) Plasmin: activity and specificity. In: Handbook of proteolytic enzymes. 3.
Castellino, F., Ploplis, V. (2005) Structure and function of the plasminogen/plasmin system. Thrombosis and Haemostasis. 93(4):647-654.
Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., McPherson, A., Szcześniak, M.W., Gaffney, D., Elo, L., Zhang, X., Mortazavi, A. (2016) A survey of best practices for RNA-seq data analysis. Genome Biology. 17:13.
Deist, M., Lamont, S. (2018) What makes the Harderian gland transcriptome different from other chicken immune tissues? A gene expression comparative analysis. Frontiers in Physiology. 9:492.
Domínguez-Pérez, D., Durban, J., Aguero-Chapin, G., Lopes, J., Molina-Ruiz, R., Almeida, D., Calvete, J., Vasconcelos, V., Antunes, A. (2018) The Harderian gland transcriptomes of Caraiba andreae, Cubophis cantherigerus and Tretanorhinus variabilis, three colubrid snakes from Cuba. Genomics. In press.
Eddy, S. (2011) Accelerated profile HMM searches. PLoS Computational Biology. 7:e1002195.
Erickson, S. (2007) Sexual dimorphism and seasonal changes in the Harderian gland of the Red-sided garter snake, Thamnophis sirtalis parietalis. Honors Undergraduate thesis. Oregon State University Honors College. Corvallis, OR.
Finn, R., Bateman, A., Clements, J., Coggill, P., Eberhardt, R., Eddy, S., Heger, A., Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E., Tate, J., Punta, M. (2014) The Pfam protein families database. Nucleic Acids Research. 42:D222-D230.
Gillis, J., Mistry, M., Pavlidis, P. (2010) Gene function analysis in complex data sets using ErmineJ. Nature Protocols. 5(6):1148-1159.
Grabherr, M., Haas, B., Yassour, M., Levin, J., Thompson, D., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B., Nusbaum, C., Lindblad-Toh, K., Friedman, N., Regev, A. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology. 29(7):644-652.
Haas, B., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P., Bowden, J., Couger, M., Eccles, D., Li, B., Lieber, M., MacManes, M., Ott, M., Orvis, J., Pochet, N., Strozzi, F., Weeks, N., Westerman, R., William, T., Dewey, C.N., Henschel, R., LeDuc, R., Friedman, N., Regev, A. (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols. 8(8):1494-1512.
Halpern, M., Kubie, J., Silverstein, R., Muller-Schwarze, D. (1983) Snake tongue-flicking behavior: clues to vomeronasal system functions. Chemical Signals III. Plenum, NY. 45-72.
Hubert, D., Bentz, E., Mason, R. (In prep.) Transcriptional and physiological responses to acute thermal stress in the Red-sided garter snake Thamnophis sirtalis parietalis.
Kitchen, S., Crowder, C., Poole, A., Weis, V., Meyer, E. (2015) De novo assembly and characterization of four anthozoan (Phylum Cnidaria) transcriptomes. G3: Genes, Genomes, Genetics. 5(11):2441-2452.
LeMaster, M., Mason, R. (2001) Evidence for a female sex pheromone mediating male trailing behavior in the Red-sided garter snake, Thamnophis sirtalis parietalis. Chemoecology. 11:149-152.
LeMaster, M., Mason, R. (2002) Variation in a female sexual attractiveness pheromone controls male mate choice in garter snakes. Journal of Chemical Ecology. 28(6):1269-1285.
Lohman, B., Weber, J., Bolnick, D. (2016) Evaluation of TagSeq, a reliable low-cost alternative for RNA-seq. Molecular Ecology Resources. 16(6):1315-1321.
Love, M., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 15:550.
Mason, R., Fales, H., Jones, T., Pannell, L., Chinn, J., Crews, D. (1989) Sex pheromones in snakes. Science. 245(4915):290-293.
Mason, R., Halpern, M. (2011) Chemical ecology of snakes: from pheromones to receptors. North American Society for Comparative Endocrinology Conference. July, 2011.
Meyer laboratory. GitHub repository. https://github.com/Eli-Meyer.
Meyer, E., Aglyamova, G., Matz, M. (2011) Profiling gene expression responses of coral larvae (Acropora millepora) to elevated temperature and settlement inducers using a novel RNA-Seq procedure. Molecular Ecology. 20:3599-3616.
Meyer, E., Aglyamova, G., Wang, S., Buchanan-Carter, J., Abrego, D., Willis, B., Matz, M. (2009) Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx. BMC Genomics. 10:219.
NCBI. Thamnophis sirtalis parietalis annotation release 100.
Pavlidis, P., Lewis, D., Noble, W. (2002) Exploring gene expression data with class scores. Pacific Symposium on Biocomputing. 474-485.
Payne, A. (1994) The Harderian gland: a tercentennial review. Journal of Anatomy. 185(1):1-49.
Payne, A., McGadey, J., Moore, M., Thompson, G. (1977) Androgenic control of the Harderian gland in the male golden hamster. Journal of Endocrinology. 75:73-82.
Petersen, T., Brunak, S., von Heijne, G., Nielsen, H. (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods. 8:785-786.
Raum, D., Marcus, D., Alper, C., Levey, R., Taylor, P., Starzl, T. (1980) Synthesis of human plasminogen by the liver. Science. 208:1036-1037.
Rehorek, S., Firth, B., Hutchinson, M. (2000a) The structure of the nasal chemosensory system in squamate reptiles. Journal of Biosciences. 25(2):181-190.
Rehorek, S., Hillenius, W., Quan, W., Halpern, M. (2000b) Passage of Harderian gland secretions to the vomeronasal organ in Thamnophis sirtalis (Serpentes: Colubridae). Canadian Journal of Zoology. 78(7):1284-1288.
Rehorek, S., Halpern, M., Firth, T., Hutchinson, M. (2011) The Harderian gland of two species of snakes: Pseudonaja textilis (Elapidae) and Thamnophis sirtalis (Colubridae). Canadian Journal of Zoology. 81:357-363.
Rumble, S., Lacroute, P., Dalca, A., Fiume, M., Sidow, A. (2009) SHRiMP: accurate mapping of short color-space reads. PLoS Computational Biology. 5(5):e1000386.
Sivak, J. (1977) The role of the spectacle in the visual optics of the snake eye. Vision Research. 17:293-298.
The Gene Ontology (GO) database and informatics resource. (2004) Nucleic Acids Research. 32(suppl_1):D258-D261.
The UniProt Consortium. (2018) UniProt: the universal protein knowledgebase. Nucleic Acids Research. 46:2699.
Yandell, M., Ence, D. (2012) A beginner's guide to eukaryotic genome annotation. Nature Reviews Genetics. 13:329-342.


Chapter 3:

Identification of the protein components of vomeronasal secretions of Thamnophis sirtalis parietalis via protein mass spectrometry

Introduction:

The vomeronasal organ is a sensitive, specialized chemosensory structure common to all lineages of terrestrial vertebrates (Eisthen, 1992). Its structure and physiology vary among taxa, but it shares a common function and similar structural features across groups. In amphibians, mammals and lepidosaurs, the vomeronasal organ consists of a cartilaginous capsule with a fluid-filled lumen (Halpern, 1987; Rehorek et al. 2000a; Dulac et al. 2003). The lumen of this structure is lined with sensory epithelium containing many neurons, each expressing a specific chemical receptor protein (Isogai, 2011; Brykczynska, 2013). The source of the fluid filling the lumen of the vomeronasal organ differs among taxa. In mammals, the fluid is produced intrinsically by mucous and serous glands in the vomeronasal organ itself. In most amphibians, it is produced by specialized nasal glands. In squamate reptiles, the vomeronasal organ has no intrinsic source of fluid, and the majority of the fluid in the vomeronasal lumen appears to be produced in the Harderian gland (Rehorek et al. 2011). Histological studies of the role of the Harderian gland in the vomeronasal chemosensory system suggest that the vast majority of Harderian gland secretions enter the nasolacrimal duct and travel directly to the vomeronasal lumen (Rehorek et al. 2000b). The Harderian gland is a large secretory structure organized specifically to secrete proteinaceous and mucous fluid into the nasolacrimal duct (Rehorek et al. 2000a;b).

Histological work on the Harderian gland and vomeronasal organ of T. sirtalis has used transmission electron microscopy and mercury bromophenol blue staining to identify and visualize protein secretory granules (Barka & Anderson, 1965). These studies show that the Harderian gland actively produces and secretes large amounts of protein, whereas the vomeronasal organ lacks specialized protein secretory structures. This suggests that the fluid, and the majority of the proteins within the lumen of the vomeronasal organ, originate from the Harderian gland (Rehorek et al. 2000b; Rehorek et al. 2011). The protein components of the fluid within the vomeronasal lumen are thought to function as part of the vomeronasal chemosensory system. They are thought to facilitate the detection of non-polar, insoluble chemical cues, including sex pheromones (Mason & Halpern, 2011). This fluid is likely an important functional component of the vomeronasal chemosensory system in squamates, yet its protein components have not been identified.

Secretory proteins can be identified using multiple approaches. In silico analyses of transcriptomic and gene expression data are often sufficient to predict which proteins are likely to be secreted into the extracellular environment and to perform a specific function (Hathout, 2007). Messenger RNA expression data are a valuable indicator of the regulation of gene transcription in specific tissues or under specific conditions. However, mRNA abundance is not always a reliable indicator of the concentrations of the encoded proteins (Li et al. 2014; Taniguchi et al. 2010). Gene expression analyses of this kind are valuable in many contexts, but a nucleotide-based approach is not appropriate as the sole method for characterizing the protein components of the fluid within the vomeronasal organ. Although the Harderian gland is thought to be the primary source of this fluid, the fluid's protein components are likely a relatively small subset of the overall secretome of the Harderian gland. Additionally, the vomeronasal organ does not contain specialized secretory structures, yet at least a small percentage of the protein components within this fluid are very likely produced and secreted by cells within the vomeronasal organ itself. Proteins produced in the vomeronasal organ would not be detected by mRNA expression profiles of the Harderian gland. A protein-based approach is therefore much more appropriate for identifying the proteins present in the lumen of the vomeronasal organ.


Protein identification methods have advanced greatly in recent years. They now allow efficient identification and relative quantification of many hundreds of proteins simultaneously via “shotgun mass spectrometry” (Domon & Aebersold, 2006). Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is an effective method for making large numbers of simultaneous protein identifications. Accurate identifications with this method require an appropriate protein reference database. To identify the protein components of the fluid within the lumen of the vomeronasal organ, I collected vomeronasal secretions from T. s. parietalis and identified their protein components via protein mass spectrometry.

Research aims and predictions: The research presented in this chapter addresses several related goals and research questions. These are based on previous observations and hypothesized functions of the Harderian gland in T. s. parietalis. The major goal was to identify and characterize the protein components of vomeronasal secretion. Specific analyses were performed to identify possible pheromone-binding proteins and proteins capable of binding and solubilizing the T. s. parietalis sexual attractiveness pheromone. Additional analyses were performed to identify proteins likely to have antimicrobial functions and to confirm the antimicrobial properties of vomeronasal secretion.

Aim 1) Characterize the proteins present in the lumen of the vomeronasal organ, and determine which proteins are likely to be produced in the Harderian gland: To identify the protein components of the fluid within the vomeronasal lumen, I collected secretion from the vomeronasal organ of male and female T. s. parietalis and analyzed these secretion samples by mass spectrometry to identify protein components. This investigation targeted those gene products that are not only expressed as mRNA in the Harderian gland but are translated to proteins that enter the secretory pathway and are transported to the vomeronasal lumen.


To more closely examine the assumption that the majority of proteins found in the vomeronasal lumen are produced in the Harderian gland, an analysis was performed to compare the abundances of these proteins with RNA-seq based expression profiles of the associated genes in Harderian gland tissue and in vomeronasal organ tissue to determine the likely sources of individual proteins.
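One simple way to frame this comparison is a per-protein log2 ratio of transcript expression in Harderian gland versus vomeronasal organ tissue. The sketch below is hypothetical. The function name, the pseudocount of 1 and the fourfold (log2 ratio of 2) threshold are illustrative assumptions, not the criteria used in this chapter.

```python
import math


def likely_source(hg_expr: float, vno_expr: float,
                  min_log2_ratio: float = 2.0, pseudocount: float = 1.0) -> str:
    """Hypothetical rule assigning a detected protein to its likely tissue of origin.

    Compares expression of the protein's transcript in Harderian gland (HG)
    and vomeronasal organ (VNO) tissue as a pseudocounted log2 ratio.
    """
    log2_ratio = math.log2((hg_expr + pseudocount) / (vno_expr + pseudocount))
    if log2_ratio >= min_log2_ratio:
        return "Harderian gland"
    if log2_ratio <= -min_log2_ratio:
        return "vomeronasal organ"
    return "undetermined"


# Toy expression values (hypothetical)
for name, hg, vno in [("prot1", 500.0, 5.0), ("prot2", 3.0, 200.0), ("prot3", 50.0, 40.0)]:
    print(name, likely_source(hg, vno))
```

The pseudocount keeps the ratio finite when a transcript is undetected in one tissue. Proteins falling between the two thresholds are left unassigned rather than forced into either tissue.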

Aim 2) Describe evidence that a pheromone-binding protein is present in the fluid within the vomeronasal organ: Pheromone-binding proteins are a common mechanism that allows organisms to efficiently detect sex pheromones and thereby identify and evaluate potential mates (Chang et al. 2015). In most mating systems that rely on pheromonal mate recognition, it is the male that actively seeks female sex pheromone (LeMaster & Mason 2001, 2002). In such systems, the expression of sex pheromone-binding proteins is often strongly male-biased (Beynon et al. 2008; Jin et al. 2014; Chang et al. 2015). The female sexual attractiveness pheromone of T. s. parietalis has been identified as a series of long-chain methyl ketones that are non-volatile and insoluble in aqueous solution (Mason et al. 1989). In this group, the pheromone is detected via the vomeronasal organ (Kubie et al. 1978; Halpern et al. 1983). A major aim of this body of research is to identify and describe evidence for the hypothesized pheromone-binding protein, produced in the Harderian gland and active in the vomeronasal organ. Pheromone-binding proteins show male-biased expression in many taxa. It was therefore hypothesized that such a protein in T. s. parietalis would also display strongly male-biased expression. Additionally, a pheromone-binding protein in this group must be able to bind and solubilize the non-polar lipids (methyl ketones) of the sex pheromone present on the skin of female garter snakes. A candidate pheromone-binding protein in the vomeronasal fluid of T. s. parietalis should therefore be a lipid-binding protein, likely of the lipocalin family. It will very likely display a strong male bias in protein concentration, especially during the spring mating season.

In order to address this research question, I first created an annotated transcriptome and used in-silico analysis to identify proteins with predicted lipid-binding functions that are likely to be secreted (Chapter 2). To investigate the hypothesis that a pheromone-binding protein is present in the vomeronasal lumen, I performed relative quantitation of the protein components of the fluid in the lumen of the vomeronasal organ via mass spectrometry. To identify candidate pheromone-binding proteins, I searched for proteins that were both annotated as lipid-binding proteins and displayed male-biased and spring-biased protein concentrations.
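The candidate-selection criteria above can be expressed as a simple filter. The sketch below is illustrative only: the record fields, the example transcript IDs, and the two-fold cutoff used to call a protein "male-biased" or "spring-biased" are hypothetical assumptions, since no numeric cutoff is specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SecretionProtein:
    """Hypothetical per-protein summary of mean peak intensities."""
    transcript_id: str
    functional_category: str   # e.g. "Lipid binding", "Pathogen defense"
    mean_male: float           # mean peak intensity, male samples
    mean_female: float         # mean peak intensity, female samples
    mean_spring_male: float    # mean peak intensity, spring male pools
    mean_summer_male: float    # mean peak intensity, summer male pools


def is_candidate_pbp(p: SecretionProtein, min_ratio: float = 2.0) -> bool:
    """True if the protein is lipid-binding, male-biased and spring-biased.

    `min_ratio` (2.0) is an assumed cutoff, not a value from the study.
    A protein absent from females (mean 0) counts as male-biased as long
    as it was detected in males.
    """
    if p.functional_category != "Lipid binding":
        return False
    male_biased = p.mean_male > 0 and p.mean_male >= min_ratio * p.mean_female
    spring_biased = (p.mean_spring_male > 0
                     and p.mean_spring_male >= min_ratio * p.mean_summer_male)
    return male_biased and spring_biased


def candidate_pbps(proteins: List[SecretionProtein]) -> List[str]:
    """IDs of all proteins passing the candidate filter."""
    return [p.transcript_id for p in proteins if is_candidate_pbp(p)]


# Toy values for illustration only (not study data).
example = [
    SecretionProtein("tx_A", "Lipid binding", 120.0, 0.0, 300.0, 90.0),
    SecretionProtein("tx_B", "Lipid binding", 50.0, 45.0, 60.0, 55.0),
    SecretionProtein("tx_C", "Pathogen defense", 80.0, 0.0, 200.0, 10.0),
]
print(candidate_pbps(example))  # ['tx_A']
```

Requiring both biases at once mirrors the logic of the aim: a lipid-binding protein that is abundant in both sexes, or equally abundant in spring and summer, would not be singled out.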

Aim 3) Research question: Does the Harderian gland play a role in the immune system of the vomeronasal organ? The Harderian gland has often been described as an immune tissue in birds (Deist & Lamont, 2018; Burns, 1992; Albini et al. 1974). These observations, combined with what is known about the physiology of the Harderian gland and its relation to the vomeronasal organ in squamates, have led to the hypothesis that this tissue performs immune functions as part of the vomeronasal chemosensory system in T. s. parietalis. The in-silico analyses described in Chapter 2 provided additional support for this hypothesis, showing that gene expression in the Harderian gland is significantly enriched for proteins involved in antimicrobial defense (Appendix A.1). To further examine this hypothesis, I characterized the protein components of the fluid secreted into the lumen of the vomeronasal organ to identify likely antimicrobial proteins. To experimentally confirm the antimicrobial properties of vomeronasal secretion, I used bacterial killing assays (BKA) (Dolan et al. 2016; Dugovich et al. 2016; Blakemore, 2017) to demonstrate that this fluid kills bacteria in vitro.

Methods:

Animal collection and collection of vomeronasal organ secretion: Adult Red-sided garter snakes (T. s. parietalis) were collected from the wild near the town of Inwood in the Interlake region of Manitoba, Canada (50°31'28"N, 97°30'00"W) during May of 2016 and 2017. Following capture, all snakes were housed in outdoor nylon cloth arenas (1 × 1 × 1 m) until samples were collected. Vomeronasal secretion was collected from a total of 56 animals (n=43 males and n=13 females). Snakes were removed from enclosures, and their mass (g) and snout-vent length (SVL, cm) were recorded. To stimulate glandular secretion, snakes were given an intraperitoneal injection of pilocarpine (0.01 mg/g of body mass) immediately before anesthetization (Rosenberg, 1992). Snakes were anesthetized with a subcutaneous injection of Brevital™ (methohexital sodium; 0.0025 mL of 0.5% solution per 1 g body mass). When the anesthetic had taken full effect, snakes were placed in the workspace ventral side up. Clean gauze was used to hold the mouth open and to create a physical barrier isolating the vomeronasal duct, preventing contamination of the vomeronasal secretion by saliva or other fluids within the mouth. With barriers in place, the opening of the vomeronasal duct was wiped with sterile phosphate-buffered saline to remove residual fluid and possible contaminants. Vomeronasal secretion was collected by micro-syringe as it exited the vomeronasal duct and placed in a cryogenic vial on ice. Secretion was collected for approximately 15 minutes or until snakes began to regain consciousness. Following collection, secretion samples were immediately flash frozen in liquid nitrogen (-196 °C). Each animal yielded approximately 5-10 µL of secretion, which was frozen and stored either as an individual sample or in pools of 10 individuals in a single vial.
After collection of the secretion, snakes were removed from the workspace and monitored for the first hour, or until they showed no signs of anesthesia, then returned to their home enclosure to await release at the site of capture. Secretion samples were transported on dry ice (-79 °C) and stored at -80 °C until preparation for mass spectrometry.
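The weight-based doses above translate into per-animal amounts by simple multiplication. A minimal sketch, assuming the 0.5% Brevital™ solution is 0.5% weight/volume (5 mg methohexital per mL); the pilocarpine stock concentration is not given, so only the pilocarpine mass is computed. The function name is hypothetical.

```python
PILOCARPINE_MG_PER_G = 0.01   # mg pilocarpine per g body mass (Methods)
BREVITAL_ML_PER_G = 0.0025    # mL of 0.5% Brevital per g body mass (Methods)
BREVITAL_MG_PER_ML = 5.0      # assumes 0.5% means 0.5 g per 100 mL (w/v)


def snake_doses(body_mass_g: float) -> dict:
    """Return per-animal doses for a snake of the given body mass (g)."""
    if body_mass_g <= 0:
        raise ValueError("body mass must be positive")
    brevital_ml = BREVITAL_ML_PER_G * body_mass_g
    return {
        "pilocarpine_mg": PILOCARPINE_MG_PER_G * body_mass_g,
        "brevital_ml": brevital_ml,
        "methohexital_mg": brevital_ml * BREVITAL_MG_PER_ML,
    }


print(snake_doses(50.0))
```

For example, a 50 g snake would receive 0.5 mg of pilocarpine and 0.125 mL of Brevital™ (about 0.625 mg of methohexital under the w/v assumption).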

Preparation of samples for mass spectrometry: Samples selected for analysis included vomeronasal secretion from 6 individuals (n=3 males and n=3 females) and 5 pooled samples (n=2 spring male, n=2 summer male, and n=1 spring female, each containing secretion from 10 individuals). To prepare samples for mass spectrometry, the raw secretion was first diluted 1:1 with 50 mM ammonium bicarbonate (NH4HCO3) to lower the viscosity of the sample. Proteins were precipitated by adding 100% acetone to a ratio of 1:4 sample:acetone and incubating at -20 °C for 24 hours. Samples were then centrifuged, and the precipitated protein pellet was washed twice with fresh 100% acetone at -20 °C. Samples were centrifuged under vacuum to remove acetone and resuspended in 50 mM ammonium bicarbonate. Protein concentrations were determined via Qubit™ protein assay (Invitrogen), and samples were diluted to 1 µg/µL in 50 mM ammonium bicarbonate plus 1% RapiGest™ surfactant (Waters). To reduce disulfide bonds, dithiothreitol (DTT) was added to a final concentration of 10 mM and samples were incubated at 55 °C for 45 minutes. To alkylate cysteine residues, iodoacetamide was then added to a final concentration of 20 mM and samples were incubated at 25 °C for 30 minutes. Samples were digested with porcine trypsin (Promega #V5111), added at a ratio of 1:50 trypsin:protein, for 6 hours at 37 °C. To halt digestion and precipitate the surfactant, trifluoroacetic acid was added to a final concentration of 0.5% by volume and samples were incubated at 37 °C for 45 minutes. Samples were centrifuged, and the supernatant containing tryptic peptides was collected and stored at -80 °C until analysis. Samples were submitted to the Oregon State University Mass Spectrometry Center for analysis.
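The precipitation, reduction, alkylation, digestion and quench steps above involve routine dilution arithmetic. The sketch below computes reagent amounts for one sample at 1 µg/µL; the stock concentrations (500 mM DTT, 500 mM iodoacetamide, 0.5 µg/µL trypsin, neat TFA) are hypothetical placeholders, not values reported in this study. Each added volume solves c_stock · v = c_final · (V + v), so the reagent is diluted by the volume already in the tube.

```python
def volume_to_add(final_conc: float, stock_conc: float, current_volume: float) -> float:
    """Volume of stock to add so the mixture reaches `final_conc`.

    Solves stock_conc * v = final_conc * (current_volume + v), assuming the
    solution does not already contain the reagent. The result has the
    units of `current_volume`.
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * current_volume / (stock_conc - final_conc)


def acetone_volume_ul(raw_secretion_ul: float) -> float:
    """Acetone for a 1:4 sample:acetone ratio after a 1:1 buffer dilution."""
    diluted_ul = 2 * raw_secretion_ul
    return 4 * diluted_ul


def digest_plan(protein_ug: float,
                dtt_stock_mM: float = 500.0,            # hypothetical stock
                iaa_stock_mM: float = 500.0,            # hypothetical stock
                trypsin_stock_ug_per_ul: float = 0.5,   # hypothetical stock
                tfa_stock_pct: float = 100.0) -> dict:  # neat TFA (assumption)
    """Reagent amounts (uL, ug) for one sample, added in protocol order."""
    volume = protein_ug / 1.0                            # sample at 1 ug/uL
    dtt_ul = volume_to_add(10.0, dtt_stock_mM, volume)   # 10 mM DTT
    volume += dtt_ul
    iaa_ul = volume_to_add(20.0, iaa_stock_mM, volume)   # 20 mM iodoacetamide
    volume += iaa_ul
    trypsin_ug = protein_ug / 50.0                       # 1:50 trypsin:protein
    trypsin_ul = trypsin_ug / trypsin_stock_ug_per_ul
    volume += trypsin_ul
    tfa_ul = volume_to_add(0.5, tfa_stock_pct, volume)   # 0.5% v/v TFA
    return {"dtt_ul": dtt_ul, "iaa_ul": iaa_ul, "trypsin_ug": trypsin_ug,
            "trypsin_ul": trypsin_ul, "tfa_ul": tfa_ul,
            "final_ul": volume + tfa_ul}


print(digest_plan(50.0))
```

Computing each volume against the running total, rather than against the starting volume, keeps each reagent at its stated final concentration at the moment it is added.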

Mass spectrometry: Mass spectrometry was performed on the prepared complex peptide mixtures by nanoLC-MS/MS (ultra-performance liquid chromatography coupled with tandem mass spectrometry) using an Orbitrap Fusion™ Lumos™ (Thermo Scientific). Mass spectra matching and initial data analysis were performed by the OSU Mass Spectrometry Center. Raw files were analyzed with Thermo Scientific™ Proteome Discoverer™ 2.2 and searched using the Sequest™ HT engine. Mass spectra were searched against a database containing all possible open reading frames of 20 amino acids or greater identified from the multi-tissue transcriptome (described in Chapter 2) using the TransDecoder software package (Haas et al. 2013; https://transdecoder.github.io/). Proteins were considered to have been identified in a sample with “high confidence” only if identified by ≥ 2 unique peptides with a q-value (false discovery rate) of ≤ 0.05. Proteins were considered a component of vomeronasal secretion only if present in 2 or more samples. Proteins were considered components of vomeronasal secretion with “low confidence” if the identification was based on only a single unique peptide or was detected in only a single sample, but with a peptide q-value of < 0.05. These low-confidence proteins were not included in further analyses. Proteins identified in vomeronasal secretion with high confidence were investigated and described using the information available from the National Center for Biotechnology Information (NCBI) or UniProt databases (Geer et al. 2010; UniProt Consortium, 2017). Based on described protein functions, proteins were grouped into major functional categories (e.g. “Lipid binding”, “Pathogen defense”, “Component of blood”). The intensity (area under the curve) of mass spectra peaks was used to compare relative protein abundances between samples. Abundances were statistically compared between male and female samples using individually collected vomeronasal secretion samples (n=3 male and n=3 female) with Welch’s two-sample t-test, and multiple-test correction was performed using the Benjamini-Hochberg method to control the false discovery rate.
Additional qualitative comparisons were made between seasonal samples, as the analysis contained only n=2 male pooled samples from snakes in the summer. The proportion of the sum peak intensity of mass spectra was calculated using the mean peak intensity across all samples (n=3 male individual, n=3 male pooled, n=3 female individual, and n=1 female pooled samples).
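The identification rules and the multiple-test correction described above can be sketched as follows. This is an illustrative reimplementation, not the Proteome Discoverer or OSU pipeline: it assumes peptides have already been filtered to q ≤ 0.05 and that unique peptides are counted across all samples, and the function names are hypothetical. In practice, Welch's test is R's default `t.test()` (or `scipy.stats.ttest_ind(a, b, equal_var=False)` in Python), and the Benjamini-Hochberg adjustment is available as R's `p.adjust(p, method = "BH")`; the function below reproduces that adjustment in plain Python.

```python
from typing import List, Sequence


def confidence_tier(unique_peptides: int, samples_detected: int) -> str:
    """Classify a protein identification using the rules in Methods.

    `unique_peptides`: distinct peptides (q <= 0.05) assigned to the protein.
    `samples_detected`: number of samples in which the protein was found.
    """
    if unique_peptides >= 2 and samples_detected >= 2:
        return "high"
    if unique_peptides >= 1 and samples_detected >= 1:
        return "low"      # single peptide or single sample: excluded
    return "not detected"


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (same ordering as the input)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(confidence_tier(3, 4), confidence_tier(1, 5), confidence_tier(4, 1))  # high low low
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The running minimum enforces the step-up property of the procedure: a smaller raw p-value can never receive a larger adjusted value than a bigger one.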

Tissue specificity and predicting protein origin: To determine the most likely origin of individual proteins, gene expression for each component of the secretion was compared between Harderian gland and vomeronasal organ tissues (Figure 3.1). Total RNA was extracted from Harderian gland tissue from 20 male and 20 female snakes, and a sequencing library was prepared using the TAG-seq protocol and sequenced on the HiSeq 2500 platform (2 lanes, 100 bp single-end reads). An additional TAG-seq library was prepared from total RNA extracted from whole vomeronasal organs of a subset (n=16 male and n=16 female) of the same snakes and was sequenced on the HiSeq 3000 platform (1 lane, 50 bp single-end reads). Reads from both libraries were mapped to the multi-tissue reference transcriptome to create a table of combined counts (see Chapters 4 and 5 for a complete description of RNA-seq methods). Differential expression analysis was performed using the R package DESeq2 (Love et al. 2014). Harderian gland tissue was compared to vomeronasal organ tissue using a single-factor design to determine differential expression between tissues (see Appendix R.2 for R script). Proteins whose mRNAs were detected in only one tissue were considered to be expressed primarily in that tissue. Additional proteins were considered to be primarily expressed in either tissue if the tissue-specific expression showed a significant unadjusted fold change of ≥ 20x (log2 fold change ≥ 4.3219; p ≤ 0.01) toward the indicated tissue.
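The decision rule for assigning a predicted tissue of origin can be summarized in a few lines. This sketch assumes per-transcript counts and a DESeq2-style results table (log2 fold change and p-value) have already been computed; the function name, argument names and the sign convention (positive log2 fold change = higher in the Harderian gland) are illustrative assumptions, since the sign in DESeq2 depends on which factor level is the reference. Note that log2(20) ≈ 4.3219, the cutoff quoted above.

```python
import math
from typing import Optional

FOLD_CHANGE_CUTOFF = 20.0
LOG2_CUTOFF = math.log2(FOLD_CHANGE_CUTOFF)   # ~4.3219, as stated in Methods
P_CUTOFF = 0.01


def predicted_origin(hg_count: float, vno_count: float,
                     log2_fc: Optional[float], pvalue: Optional[float]) -> str:
    """Assign a predicted tissue of origin to one secreted protein.

    hg_count / vno_count: total (normalized) reads for the transcript in each
    tissue. log2_fc is assumed to be log2(HG / VNO), so positive values mean
    higher expression in the Harderian gland.
    """
    if hg_count == 0 and vno_count == 0:
        return "Neither"
    if vno_count == 0:          # mRNA detected only in the Harderian gland
        return "HG"
    if hg_count == 0:           # mRNA detected only in the vomeronasal organ
        return "VNO"
    if (log2_fc is not None and pvalue is not None
            and pvalue <= P_CUTOFF and abs(log2_fc) >= LOG2_CUTOFF):
        return "HG" if log2_fc > 0 else "VNO"
    return "HG+VNO"             # expressed in both without strong bias


print(predicted_origin(850.0, 12.0, 6.1, 1e-5))  # HG
```

Checking the "detected in only one tissue" cases first means those transcripts never depend on a fold change, which is undefined or unstable when one tissue has zero counts.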

Figure 3.1 Tissue specificity analysis workflow: Steps used to determine the primary tissue of origin for proteins identified in vomeronasal secretion. Methods include RNA-seq (Tag-seq; Meyer 2009) and protein mass spectrometry. Differential expression analysis was performed using DESeq2 (Love et al. 2014).

Bacterial Killing Assays (BKA)

The bactericidal assay used here was modified from Dolan et al. (2016). Gentamycin-resistant Escherichia coli cells were plated on LB agar plates containing 10 µg/mL gentamycin sulfate (MP Biomedicals LLC, 105030-QR12272). The plates were cultured for 2 days at 37 °C. Three similarly sized colonies were collected using a BBL Prompt system (BD Biosciences, VWR) to provide a relatively consistent number of bacterial cells for each completed BKA. The collected bacteria were vortexed for one minute in the BBL Prompt collection vials using the provided sterile PBS, then diluted to a 1:10 solution in sterile PBS. Ten wells of a clear, flat-bottom 96-well plate were used for a bacterial serial dilution (100%, 1:1, 1:2, 1:4, 1:8, 1:16, 1:32) to provide reference points of bacterial killing ability (0%, 50%, 75%, 87.5%, 93.75%, 96.88%, and 98.43% killing, respectively). Sample wells that did not reach the threshold were assumed to have between 98.43% and 100% bacterial killing ability. Negative control wells contained only 100 µL of sterile phosphate-buffered saline (PBS) and 100 µL of LB media. Sample dilutions were prepared from 5 replicates of pooled vomeronasal secretion. A serial dilution was performed, and samples with dilution factors of 100%, 1:1, 1:8, 1:16, 1:64, and 1:128 were added to six sample wells for each replicate. Prior to incubation, sample wells received 85 µL of sterile PBS, 5 µL of the appropriate sample dilution, and 10 µL of the prepared 1:10 stock bacterial dilution (100 µL total volume). The plate was then incubated while shaking at 25 °C for 20 minutes to allow antimicrobial protein components of the secretion to interact with and kill bacteria. After incubation, LB broth (100 µL) with gentamycin (10 µg/mL) was added to each well. The initial absorbance (optical density, OD, at 600 nm) of each well was measured with a SpectraMax M3 plate reader using SoftMax Pro 6.2.1 software. Plates were then incubated in a shaking incubator at 37 °C and 200 rpm. The OD at 600 nm was measured again after 6 hours of incubation and every hour thereafter until growth had continued for a total of 15 hours.
Results were analyzed using custom R code developed by Blakemore (2017), which calculates the ‘time to threshold’ for each well and assigns a bacterial killing potency to each sample dilution.
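The 'time to threshold' approach can be illustrated with a short sketch. This is not Blakemore's (2017) R code, whose details are not reproduced here. It assumes (1) the threshold crossing is found by linear interpolation between OD600 readings, (2) the standard wells contain 1, 1/2, 1/4, ... 1/64 of the stock bacteria, which reproduces the reference killing values listed above (to rounding), and (3) a sample well's surviving fraction is interpolated on a log2 scale between the standards' crossing times, consistent with exponential growth. All function names and example numbers are hypothetical.

```python
import math
from typing import Optional, Sequence

OD_THRESHOLD = 0.2                                  # OD600 threshold (Figure 3.6)
STANDARD_FRACTIONS = [2.0 ** -k for k in range(7)]  # 1, 1/2, ..., 1/64 of stock
REFERENCE_KILLING = [100 * (1 - f) for f in STANDARD_FRACTIONS]  # 0 ... 98.4375 %


def time_to_threshold(times: Sequence[float], ods: Sequence[float],
                      threshold: float = OD_THRESHOLD) -> Optional[float]:
    """First time (h) at which OD600 reaches `threshold`; None if it never does."""
    if ods and ods[0] >= threshold:
        return times[0]
    for k in range(1, len(ods)):
        if ods[k - 1] < threshold <= ods[k]:
            frac = (threshold - ods[k - 1]) / (ods[k] - ods[k - 1])
            return times[k - 1] + frac * (times[k] - times[k - 1])
    return None


def percent_killed(sample_time: Optional[float],
                   standard_times: Sequence[float]) -> Optional[float]:
    """Percent killing estimated from a well's time to threshold.

    `standard_times[k]` is the time to threshold of the standard well holding
    STANDARD_FRACTIONS[k] of the stock bacteria (times increase with k).
    Returns None when the well never crossed the threshold, i.e. killing is
    only known to exceed the highest reference value.
    """
    if sample_time is None:
        return None
    log_fracs = [math.log2(f) for f in STANDARD_FRACTIONS]
    if sample_time <= standard_times[0]:
        return REFERENCE_KILLING[0]
    if sample_time >= standard_times[-1]:
        return REFERENCE_KILLING[-1]
    for k in range(1, len(standard_times)):
        t0, t1 = standard_times[k - 1], standard_times[k]
        if t0 <= sample_time <= t1 and t1 > t0:
            w = (sample_time - t0) / (t1 - t0)
            log_f = log_fracs[k - 1] + w * (log_fracs[k] - log_fracs[k - 1])
            return 100 * (1 - 2.0 ** log_f)
    return None  # standards not strictly increasing; result undefined


standards = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]   # toy crossing times (h)
t = time_to_threshold([0, 6, 7, 8, 9], [0.05, 0.12, 0.18, 0.26, 0.40])
print(t, percent_killed(t, standards))   # crosses between 7 and 8 h
```

Interpolating on the log2 scale rather than linearly in killing percentage reflects the assumption that each halving of the starting population delays the threshold crossing by roughly one doubling time.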

Figure 3.2 BKA plate layout diagram: Dilutions used in the bacterial killing assay (BKA) to identify antimicrobial properties of the vomeronasal secretions of T. s. parietalis.

Results:

Protein identification results: Protein mass spectrometry analysis of all 11 samples of vomeronasal secretion yielded a total of 140 proteins identified with “high-confidence” (n ≥ 2 unique peptides detected in at least 2 samples; all peptides detected with q-value < 0.05) (Table 3.1). An additional 536 proteins were detected as “low-confidence” hits indicating that the protein identification was based on only a single unique peptide or was detected in only a single sample, but with a peptide q-value of <0.05. A total of 6 lipid-binding proteins were detected in vomeronasal secretion with high confidence, all of which were expressed primarily in the Harderian gland except one which showed low levels of expression in vomeronasal organ tissue as well. Antimicrobial proteins accounted for a large proportion of the proteins found in vomeronasal fluid. A total of 31 antimicrobial proteins were detected; 7 originated from genes expressed primarily in the Harderian gland, 10 expressed primarily in the vomeronasal organ, 13 expressed in both tissues, and 1 not found to be expressed in either tissue. Two proteins were, by a wide margin, the most highly abundant proteins in vomeronasal secretion. These proteins were produced primarily in the Harderian gland and annotated as “Lipocalin homologue” (~33.85% of sum peak intensity) and “Harderian gland protein HG33” (~31.74% of sum peak intensity).

Table 3.1 Proteins identified in vomeronasal secretions by nanoLC/MS/MS: Table includes 140 proteins identified with high confidence in the vomeronasal secretions of T. s. parietalis (Identified from at least 2 unique peptides; p-value ≤ 0.05). ‘Protein name’ indicates annotated protein name by UniProt, ‘Functional category’ is based on information available on NCBI and UniProt. ‘Primary expression’ indicates the tissue in which the protein was determined to be primarily expressed. HG= Harderian gland, VNO= vomeronasal organ. ‘% of sum peak intensity’ shows the percent of the total peak intensity from all proteins which was produced from the protein indicated.

Protein name | Functional category | Predicted expression site | % of sum peak intensity
Lipocalin homologue | Lipid binding | HG | 33.847%
Harderian gland protein HG33 | Unknown | HG | 31.741%
N/A | Unknown | HG | 8.007%
Bactericidal/permeability-increasing protein 3 | Pathogen defense | HG | 1.260%
N/A | Unknown | HG | 1.107%
Lipocalin (Candidate pheromone-binding Protein) | Lipid binding | HG | 0.322%
N/A | Unknown | HG | 0.185%
Lipocalin | Lipid binding | HG | 0.176%
Acidic mammalian chitinase isoform 1 | Pathogen defense | HG | 0.143%
Cholinesterase | Signaling | HG | 0.075%
Pentaxin | Pathogen defense | HG | 0.058%
Galectin | Mucus | HG | 0.057%
Ficolin 1 | Pathogen defense | HG | 0.047%
N/A | Unknown | HG | 0.041%
Ficolin 1 | Pathogen defense | HG | 0.036%
UPF0764 protein C16orf89 | Metabolism | HG | 0.027%
N/A | Unknown | HG | 0.012%
N/A | Unknown | HG | 0.009%
Pentaxin | Pathogen defense | HG | 0.009%
galectin-4 | Pathogen defense | HG | 0.007%
Chromosome 16 open reading frame 89 | Cell structure | HG | 0.005%
Parvalbumin | Calcium binding | HG | 0.001%
40S ribosomal protein SA | Metabolism | HG | 0.001%
Phospholipase A2 Group IIE | Calcium binding | HG | 0.000%
Lipocalin-type Prostaglandin D synthase | Lipid binding | HG | 0.000%
Perilipin-4 | Lipid binding | HG | 0.000%
Transferrin | Iron binding | VNO | 4.273%
Phospholipase A2 inhibitor alfa | Calcium binding | VNO | 1.030%
Alpha-2-macroglobulin | Component of blood | VNO | 0.149%
Venom factor | Pathogen defense | VNO | 0.145%
Protein G7c | Component of blood | VNO | 0.080%
Annexin | Anti-inflammatory | VNO | 0.071%
IgY1 | Pathogen defense | VNO | 0.070%
Actin gamma 1 | Cell structure | VNO | 0.055%
IgY1 | Pathogen defense | VNO | 0.026%
Creatine kinase B-type | Metabolism | VNO | 0.018%
Phospholipase A2 inhibitor | Calcium binding | VNO | 0.018%
Complement factor B | Pathogen defense | VNO | 0.013%
Keratin, type I cytoskeletal 19 | Cell structure | VNO | 0.011%
Plasma protease C1 inhibitor | Component of blood | VNO | 0.011%
Keratin, type II cytoskeletal 4 | Cell structure | VNO | 0.011%
Complement factor I | Pathogen defense | VNO | 0.009%
Cystatin | Protein metabolism/transport | VNO | 0.008%
Fibronectin | Cell structure | VNO | 0.007%
Immunoglobulin heavy constant mu | Pathogen defense | VNO | 0.006%
Ceruloplasmin | Copper binding | VNO | 0.005%
Intelectin 1 (Galactofuranose binding) | Pathogen defense | VNO | 0.004%
Glutathione S-transferase A1 | Pathogen defense | VNO | 0.002%
Phospholipase A2 inhibitor 31 kDa subunit | Calcium binding | VNO | 0.002%
Glutathione S-transferase P | Pathogen defense | VNO | 0.001%
p2X purinoceptor | Metabolism | VNO | 0.000%
polymeric immunoglobulin receptor | Pathogen defense | VNO | 0.000%
Salivary agglutinin | Pathogen defense | HG+VNO | 0.668%
Mesothelin | Cell structure | HG+VNO | 0.199%
Lipocalin homologue | Lipid binding | HG+VNO | 0.188%
CUB zona pellucida domain-containing protein | Cell structure | HG+VNO | 0.073%
Gelsolin | Cell structure | HG+VNO | 0.058%
Complement C3 | Pathogen defense | HG+VNO | 0.058%
Alpha-2-macroglobulin | Component of blood | HG+VNO | 0.053%
Glutathione S-transferase Mu 1 | Pathogen defense | HG+VNO | 0.047%
Lactadherin | Apoptosis | HG+VNO | 0.045%
Extracellular matrix protein 1 | Cell structure | HG+VNO | 0.024%
Glyceraldehyde-3-phosphate dehydrogenase | Carbohydrate metabolism/transport | HG+VNO | 0.019%
Hemopexin | Component of blood | HG+VNO | 0.019%
Complement factor H | Pathogen defense | HG+VNO | 0.019%
Alpha-enolase | Carbohydrate metabolism/transport | HG+VNO | 0.012%
Putative thrombin | Component of blood | HG+VNO | 0.011%
Vitelline membrane outer layer protein | Mucus | HG+VNO | 0.008%
Heat shock cognate protein HSP 90-beta | Thermal protection | HG+VNO | 0.008%
Complement factor H | Pathogen defense | HG+VNO | 0.008%
Mucin-16 | Mucus | HG+VNO | 0.007%
Salivary agglutinin | Pathogen defense | HG+VNO | 0.006%
Transketolase | Metabolism | HG+VNO | 0.006%
Xaa-Pro aminopeptidase 2 | Protein metabolism/transport | HG+VNO | 0.006%
Chitotriosidase | Pathogen defense | HG+VNO | 0.005%
Transgelin | Cell structure | HG+VNO | 0.005%
Triosephosphate isomerase | Carbohydrate metabolism/transport | HG+VNO | 0.005%
Fructose-bisphosphate aldolase | Carbohydrate metabolism/transport | HG+VNO | 0.005%
Cofilin-2 | Cell structure | HG+VNO | 0.005%
Pyruvate kinase | Carbohydrate metabolism/transport | HG+VNO | 0.004%
Vascular non-inflammatory molecule 2 | Cell structure | HG+VNO | 0.004%
Murinoglobulin-2 | Protein metabolism/transport | HG+VNO | 0.004%
Adipogenesis regulatory factor | Cell structure | HG+VNO | 0.004%
Elongation factor 1-alpha | Metabolism | HG+VNO | 0.004%
Carboxylic ester hydrolase | Metabolism | HG+VNO | 0.003%
Ribonuclease inhibitor | Metabolism | HG+VNO | 0.002%
Calumenin | Calcium binding | HG+VNO | 0.002%
Complement C5 | Pathogen defense | HG+VNO | 0.002%
Pentaxin | Pathogen defense | HG+VNO | 0.002%
Bactericidal/permeability-increasing protein 3 | Pathogen defense | HG+VNO | 0.002%
Heat shock cognate protein 70 | Thermal protection | HG+VNO | 0.002%
ADP-ribosylation factor 1 | Protein metabolism/transport | HG+VNO | 0.002%
Heat shock cognate 71 kDa protein | Thermal protection | HG+VNO | 0.002%
Annexin | Anti-inflammatory | HG+VNO | 0.002%
Annexin | Anti-inflammatory | HG+VNO | 0.002%
complement C4 | Pathogen defense | HG+VNO | 0.002%
39S ribosomal protein L16, mitochondrial | Protein metabolism/transport | HG+VNO | 0.001%
mucin-5AC | Mucus | HG+VNO | 0.001%
Galectin | Mucus | HG+VNO | 0.001%
Complement C5 | Pathogen defense | HG+VNO | 0.001%
Rho GDP-dissociation inhibitor 1 | Cell structure | HG+VNO | 0.001%
Nucleobindin-2 | Calcium binding | HG+VNO | 0.001%
Protein FAM3B isoform 1 | Component of blood | HG+VNO | 0.001%
Glutamine synthetase | Metabolism | HG+VNO | 0.001%
Kinesin family member 27 | Protein metabolism/transport | HG+VNO | 0.001%
40S ribosomal protein S4 | Protein metabolism/transport | HG+VNO | 0.001%
WD repeat-containing protein 1 | Cell structure | HG+VNO | 0.001%
Purine nucleoside phosphorylase | Metabolism | HG+VNO | 0.001%
Clathrin heavy chain | Protein metabolism/transport | HG+VNO | 0.000%
Ras homolog family member A | Cell structure | HG+VNO | 0.000%
Proteasome activator complex subunit 2 | Protein metabolism/transport | HG+VNO | 0.000%
complement component C6 | Pathogen defense | HG+VNO | 0.000%
Myosin-9 | Protein metabolism/transport | HG+VNO | 0.000%
Alpha-1,4 glucan phosphorylase | Carbohydrate metabolism/transport | HG+VNO | 0.000%
Villin-1 | Cell structure | HG+VNO | 0.000%
Tubulin polymerization-promoting protein family | Cell structure | HG+VNO | 0.000%
Ezrin | Cell structure | HG+VNO | 0.000%
Chloride intracellular channel protein | Signaling | HG+VNO | 0.000%
Nascent polypeptide-associated complex subunit α | Protein metabolism/transport | HG+VNO | 0.000%
Serine (Or cysteine) proteinase inhibitor | Protein metabolism/transport | HG+VNO | 0.000%
Rab GDP dissociation inhibitor | Cell structure | HG+VNO | 0.000%
N-acetylmuramoyl-L-alanine amidase | Protein metabolism/transport | HG+VNO | 0.000%
Protein disulfide-isomerase | Protein metabolism/transport | HG+VNO | 0.000%
Glucose-6-phosphate isomerase | Carbohydrate metabolism/transport | HG+VNO | 0.000%
Serum albumin | Component of blood | Neither | 10.363%
Fetuin-B | Metabolism | Neither | 0.141%
InfB | Protein metabolism/transport | Neither | 0.090%
Alpha-1-antiproteinase 2 | Protein metabolism/transport | Neither | 0.051%
Apolipoprotein A4 | Component of blood | Neither | 0.023%
Complement protein C3-1 | Pathogen defense | Neither | 0.020%
Kininogen-1 | Component of blood | Neither | 0.013%
Peptidyl-prolyl cis-trans isomerase | Protein metabolism/transport | Neither | 0.007%
Putative oxidoreductase C663.09c | Metabolism | Neither | 0.005%
Tubulin beta chain | Cell structure | Neither | 0.003%
Carboxypeptidase B2 | Component of blood | Neither | 0.003%
Heparin cofactor 2 | Component of blood | Neither | 0.003%
Fibrinogen gamma chain | Component of blood | Neither | 0.003%
Tubulin alpha-3 chain | Cell structure | Neither | 0.002%
Tubulin beta chain | Cell structure | Neither | 0.002%
Beta-2-glycoprotein 1 | Component of blood | Neither | 0.001%

Tissue-specific expression analysis results: Of the proteins detected in vomeronasal secretion with high confidence, 26 proteins (comprising ~77.17% of the sum peak intensity of all mass spectra) were found to be expressed primarily in Harderian gland tissue, and 26 proteins (~6.03% of sum peak intensity) originated from genes primarily expressed in the vomeronasal organ. A further 72 proteins (~1.63% of sum peak intensity) originated from genes expressed in both tissues without significant bias toward either tissue, and an additional 16 (~10.73% of sum peak intensity) originated from genes not found to be expressed in either the Harderian gland or the vomeronasal organ (Figures 3.3, 3.4, 3.5; Table 3.1). The remaining ~4.44% of sum peak intensity came from the 536 ‘low-confidence’ proteins not included in analyses.
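The percentages above are shares of the summed mean peak intensity, grouped by predicted tissue of origin. A minimal sketch of that bookkeeping follows, assuming each protein's intensity is averaged across samples (with non-detections entered as 0, an assumption) before normalizing; the function names, protein IDs and numbers are toy values, not study data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def percent_of_sum_intensity(intensities: Dict[str, List[float]]) -> Dict[str, float]:
    """Each protein's mean peak intensity as a percentage of the summed means."""
    means = {pid: mean(values) for pid, values in intensities.items()}
    total = sum(means.values())
    return {pid: 100 * m / total for pid, m in means.items()}


def percent_by_origin(percents: Dict[str, float],
                      origin: Dict[str, str]) -> Dict[str, float]:
    """Sum per-protein percentages within each predicted tissue of origin."""
    totals: Dict[str, float] = defaultdict(float)
    for pid, pct in percents.items():
        totals[origin[pid]] += pct
    return dict(totals)


# Toy example: three proteins measured in two samples.
toy = {"p1": [30.0, 50.0], "p2": [10.0, 10.0], "p3": [0.0, 20.0]}
shares = percent_of_sum_intensity(toy)
print(percent_by_origin(shares, {"p1": "HG", "p2": "VNO", "p3": "HG"}))
```

In this toy case the mean intensities are 40, 10 and 10, so proteins assigned to HG account for about 83.3% of the summed intensity and the VNO protein for about 16.7%.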

Figure 3.3: Differential expression of proteins identified in vomeronasal secretion: Sequencing was performed using Tag-seq on mRNA extracted from Harderian glands (n=40) and vomeronasal organs (n=32). The heatmap shows Z-scores for expression of 138 transcripts with protein products identified in vomeronasal secretion (excluding 2 transcripts with no mapped reads) and was generated using ‘Heatmap3’ (Zhao et al. 2014).
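Heatmaps of this kind commonly display each transcript's expression as a row-wise Z-score, so that transcripts with very different absolute expression levels can share one color scale. The sketch below shows that standardization; it assumes the input is already normalized (e.g. log-transformed) expression, uses the sample standard deviation, and does not claim to reproduce Heatmap3's internal scaling options. The function name is hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def row_zscores(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standardize each row (one transcript across samples) to mean 0, SD 1.

    Rows with zero variance are returned as all zeros. Each row needs at
    least two values for the sample standard deviation.
    """
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = stdev(row)
        scaled.append([0.0] * len(row) if sd == 0 else [(x - mu) / sd for x in row])
    return scaled


print(row_zscores([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]))  # [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```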

Figure 3.4 Distribution of differential expression results comparing Harderian gland to vomeronasal organ. Sequencing was performed using Tag-seq (Meyer, 2016) on mRNA extracted from Harderian glands (n=40) and vomeronasal organs (n=32) of Thamnophis sirtalis parietalis. Differential expression analysis was performed using DESeq2 (Love et al. 2014). Figure includes 138 transcripts with protein products identified in vomeronasal secretion.

Figure 3.5 Protein abundance by Protein Origin and Protein Identity: Peak intensity by protein (top) displays the percentage of the total peak intensity represented by the top 6 most abundant proteins identified in vomeronasal secretion. ‘Other’ refers to the remaining proteins not displayed individually. Color key indicates the predicted tissue of origin of each protein based on differential transcript abundances. Peak intensity by tissue (bottom) displays the percentage of the total peak intensity comprised of proteins predicted to originate primarily from the indicated tissue.

Relative quantification of proteins and comparisons by sex and season: Comparisons of the protein content of male and female vomeronasal secretions identified several proteins that were present in one sex but not detected in the other. Male samples contained 18 proteins that were detected only in males, including 2 proteins involved in the innate immune system and 4 lipid-binding proteins. Of particular note are two lipid-binding proteins annotated as lipocalins (Transcript IDs: DN1116_c0_g1_i1 and DN4762_c0_g1_i1), which were abundant in all male vomeronasal secretion samples (except one male sample in which DN4762_c0_g1_i1 was not detected). These proteins were not detected in any of the female samples. A seasonal comparison between pooled male samples showed that these proteins were ~2.79x and ~3.48x more abundant, respectively, in the spring than in the summer. Female samples contained 7 proteins that were absent from males, including 4 proteins involved in cellular structure, 1 involved in protein metabolism, 1 involved in carbohydrate metabolism, and 1 calcium-binding protein.

Bacterial Killing Assay (BKA) results: Negative control wells containing only PBS and LB broth showed no bacterial growth over a 15-hour period. Bacterial dilution wells containing PBS, LB broth and a dilution series of bacteria showed increased bacterial growth with increased initial bacterial concentrations. Sample wells containing PBS, LB broth, 10 µL of bacterial solution (stock dilution), and 5 µL of a dilution series from sample replicates (100% to 0.78% concentrations) showed that bacterial growth rates decreased with increasing concentrations of vomeronasal secretion (Figure 3.6).

Figure 3.6 Bacterial growth curves in the presence of vomeronasal secretions: Increasing concentrations of vomeronasal secretion of T. s. parietalis, from 0.78% to 100%, reduced bacterial growth rates in in-vitro bacterial killing assays (Dolan et al. 2016). Bacterial killing potency was measured by ‘time to OD threshold’ compared against a bacterial standard dilution (Blakemore, 2017). The OD threshold used (OD 0.2 at 600 nm) is indicated by a central horizontal black line. Note the horizontal position of bacterial growth curves as they cross the threshold.

Figure 3.7 Bacterial killing potency with increasing concentrations of vomeronasal secretion: Increasing concentrations of vomeronasal secretion of T. s. parietalis, from 0.78% to 100%, killed up to 90.13% of bacteria in in-vitro bacterial killing assays (Dolan et al. 2016). Each well received 5 µL of the dilution indicated. Bacterial killing potency was measured by ‘time to OD threshold’ compared against a bacterial standard dilution (Blakemore, 2017).

Discussion:

Identification of proteins and comparisons of relative protein abundance: The Harderian gland of T. s. parietalis had previously been hypothesized to function within the vomeronasal chemosensory system both by facilitating the detection of nonpolar chemical signals and by supplying extracellular immune functions. The transcriptomic analysis described in Chapter 2 showed significant functional enrichment of both lipid-binding proteins and antimicrobial defense proteins (Tables 2.2 & 2.3). The results of these enrichment analyses, however, cannot be used to infer the role of the Harderian gland in the vomeronasal chemosensory system without demonstrating that these proteins are present in vomeronasal secretion. Identification of vomeronasal secretion proteins was therefore a required and extremely important component of the overall goals of this dissertation research. Protein mass spectrometry analysis of vomeronasal secretion yielded 140 proteins identified with high confidence, including 6 lipid-binding proteins and 31 antimicrobial proteins. The most abundant protein in vomeronasal secretion was identified as a lipocalin (annotated as “Lipocalin homologue”). This single protein comprised over one third of the sum peak intensity from proteins detected in the fluid. Additional lipid-binding proteins were also among the most abundant proteins identified. Many antimicrobial proteins were also identified in vomeronasal secretion, together accounting for nearly 3% of the sum total of peak intensities from all proteins. These findings agree with those from the earlier transcriptomic enrichment analysis and demonstrate that lipid-binding and antimicrobial proteins are not only transcribed in high abundance in the Harderian gland, but that many proteins with these functions are also present in high abundance in the lumen of the vomeronasal organ.
These results suggest that the Harderian gland fulfills these roles; however, these findings are only meaningful if the proteins found in the vomeronasal organ originate from the Harderian gland rather than from the vomeronasal organ itself. The tissue-specificity analysis suggests that many of the important proteins identified in the vomeronasal organ are likely to be produced in the Harderian gland, and therefore that the Harderian gland likely has functional roles in the vomeronasal chemosensory system, both in binding nonpolar molecules and in producing extracellular antimicrobial proteins. The analysis methods described here provide support for the hypothesis that proteins produced in the Harderian gland travel through the nasolacrimal duct to the lumen of the vomeronasal organ. However, these results must be interpreted carefully, as they use mRNA expression to infer the predicted origins of proteins identified in vomeronasal secretion but do not demonstrate physical movement of individual proteins. The results of this meta-analysis combining mass spectrometry and RNA-seq can demonstrate 1) that a given protein is present in vomeronasal secretion, as measured by mass spectrometry, and 2) that the corresponding transcript encoding that protein is expressed as mRNA in either the Harderian gland or the vomeronasal organ. These are separate interpretations of two independent analysis methods. Although the results of this meta-analysis appear convincing when presented alongside well-supported evidence that Harderian gland fluid secretions pass into the vomeronasal organ (Rehorek et al. 2000b), they are not sufficient to definitively demonstrate the tissue of origin for any individual protein. Physical movement of individual proteins could be validated experimentally by isotope labeling followed by mass spectrometry, confirming that proteins produced in the Harderian gland are present in vomeronasal secretions. Additional experimental evidence could come from demonstrating that surgical ablation of the Harderian gland results in a significant decrease in the abundances of proteins predicted to be produced in the Harderian gland.
In the absence of such experimental evidence, the results of this analysis are best interpreted as predictions of tissue-specific protein production.

Harderian gland protein HG33: An additional protein of interest identified in vomeronasal secretion is annotated as “Harderian gland protein HG33”. This protein, despite accounting for nearly one third of the total sum peak intensity of proteins identified in vomeronasal fluid, has no identified function to date. HG33 was identified and named as an abundant component of a cDNA library produced from the Harderian gland of T. s. parietalis by Saxion & Rehorek (2013; NCBI). It is the second most abundant protein found in vomeronasal secretion and is highly expressed in the Harderian gland of this species. The function of this protein, although currently unknown, is likely very important to this system and warrants further investigation; a detailed analysis of the structure of HG33 may provide valuable insight into its function within vomeronasal secretions. In addition to the 140 proteins identified with high confidence, peptides belonging to 536 proteins were also identified in vomeronasal secretion. These proteins were identified either by a single peptide or in only a single sample, and therefore were not considered reliably identified components of vomeronasal secretion. Although identified with lower confidence, these proteins may be important functional components of this fluid. Due to high variation between samples and the small number of samples available for this study, the statistical power to detect differences in protein abundance between samples is low. Statistical comparisons of individual protein abundances yielded no significant differences between male and female samples. There are, however, several striking differences among samples, in which proteins were relatively abundant in one sex but absent from the other. Male samples contained 18 proteins that were detected only in males.
Among these were 4 lipid-binding proteins and 2 proteins involved in antimicrobial defense in the innate immune system. Female samples showed 7 proteins which were absent from male samples. These included 4 proteins involved in cellular structure, 1 involved in protein metabolism, 1 involved in carbohydrate metabolism, and 1 calcium binding protein. Based on this result, it appears that male-

121 specific vomeronasal secretion proteins include those which are hypothesized to play an active role in the chemosensory system by either binding chemical signals to facilitate detection, or by protecting the sensitive mucous membranes of the vomeronasal sensory epithelium from pathogens. Proteins detected only in female vomeronasal secretion include mostly proteins involved in cellular structure or metabolism. This comparison was conducted using vomeronasal secretion samples collected from the field during the spring mating season. During this period, male snakes actively search for female sexual attractiveness pheromone using their vomeronasal organ (LeMaster and Mason, 2001). Females are approached by courting males and may mate if they choose (Friesen et al. 2014). After mating, females spend relatively little time near the den, and move on to the surrounding areas to begin feeding (Lutterschmidt, et al. 2004). Erickson (2007) found that during this mating period, the Harderian glands of male snakes are hypertrophied and active, whereas the female glands remain regressed. Additionally, findings presented in Chapter 5 of this dissertation show that female vomeronasal receptors are almost universally downregulated during this period. These findings suggest that female snakes are not using their vomeronasal organ to explore their chemical environment during the spring mating period. However, after arriving at their respective summer feeding areas, females begin using their vomeronasal organ to actively search for prey. During the summer feeding period, the female Harderian glands are hypertrophied, and expression of vomeronasal receptors is similar to that of males (Erickson, 2007). This implies that there is a physiological activation of the vomeronasal organ between the spring and summer time periods. 
These findings show that male vomeronasal secretion contains lipid-binding and antimicrobial proteins that were not detected in females. Female secretion, in turn, contains proteins involved in metabolism and cell structure that were not detected in males. These differences likely reflect differences in physiology and seasonal timing. Male snakes secrete antimicrobial and lipid-binding proteins while their chemosensory system is active, whereas the vomeronasal organs of female snakes are undergoing physical and physiological changes. Proteins involved in metabolism and cell structure are likely abundant in the female vomeronasal organ during this period. They are probably not functionally important components of extracellular vomeronasal secretion, and may be present in the secretion simply because of their increased expression in this organ at this time. Two lipid-binding proteins were found to be abundant in male vomeronasal secretion but absent from all female samples. These same proteins were also more abundant in the spring mating season than in the summer feeding season. These characteristics were predicted to occur together as indicators of putative pheromone-binding proteins. A pheromone-binding protein must be capable of binding and solubilizing the lipid-based sexual attractiveness pheromone of T. s. parietalis. It would also likely show male-biased expression as well as seasonally biased expression, with higher expression in the spring than in the summer. Both proteins display all of these predicted characteristics and are therefore likely candidates for pheromone-binding proteins.
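The screening logic described above can be expressed as two filters. The first is a confidence filter that drops proteins supported by a single peptide or detected in a single sample. The second is a candidate test for pheromone-binding proteins: lipid-binding, detected in males but absent from every female sample, and more abundant in spring than in summer. The Python sketch below is illustrative only. The `ProteinRecord` structure, its field names, the use of "detected" as a stand-in for "abundant", and all numeric values are hypothetical. They are not the data or software used in this study.

```python
from dataclasses import dataclass


@dataclass
class ProteinRecord:
    # Hypothetical per-protein summary; field names are illustrative only.
    name: str
    n_peptides: int        # distinct peptides matched to this protein
    male: list[float]      # peak intensity per male sample (0 = not detected)
    female: list[float]    # peak intensity per female sample (0 = not detected)
    lipid_binding: bool    # functional annotation flag
    spring: float          # relative abundance in spring (hypothetical)
    summer: float          # relative abundance in summer (hypothetical)


def high_confidence(p: ProteinRecord) -> bool:
    """Keep proteins matched by more than one peptide AND detected in more than one sample."""
    n_detected = sum(1 for x in p.male + p.female if x > 0)
    return p.n_peptides > 1 and n_detected > 1


def pbp_candidate(p: ProteinRecord) -> bool:
    """Apply the predicted pheromone-binding protein criteria to one protein."""
    in_males = any(x > 0 for x in p.male)          # detected in at least one male
    absent_in_females = all(x == 0 for x in p.female)
    spring_biased = p.spring > p.summer
    return (high_confidence(p) and p.lipid_binding
            and in_males and absent_in_females and spring_biased)


# Toy records for illustration; not measurements from this study.
proteins = [
    ProteinRecord("lipocalin_A", 5, [9.1, 7.4, 8.0], [0, 0, 0], True, 3.2, 1.1),
    ProteinRecord("lipocalin_B", 4, [6.5, 5.9, 7.2], [1.3, 0, 0.8], True, 2.0, 1.8),
    ProteinRecord("keratin_X", 1, [0, 0, 0], [2.2, 0, 0], False, 0.5, 0.5),
]
candidates = [p.name for p in proteins if pbp_candidate(p)]
print(candidates)  # ['lipocalin_A']
```

In this toy set, `lipocalin_B` fails because it is detected in some female samples, and `keratin_X` fails the single-peptide confidence filter.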

Tissue-specific expression of the proteins found in vomeronasal secretion: In addition to characterizing the proteins present in vomeronasal secretion, this research aimed to examine the assumption that the Harderian gland supplies the majority of this fluid and its constituent proteins. The vomeronasal organ of T. s. parietalis appears to have little to no intrinsic secretory capacity, so it has been suggested that the fluid filling this organ must have an extrinsic source (Rehorek, 2000b). Further investigations identified the Harderian gland as the most likely source of this fluid. They also showed that this tissue, unlike the vomeronasal organ itself, contains many large secretory granules. The gland actively secretes large amounts of protein into the nasolacrimal duct, which ultimately empties into the lumen of the vomeronasal organ. Because of this anatomical arrangement, and the apparent lack of secretory ability in vomeronasal tissue, it has often been suggested that the Harderian gland produces the vast majority of the fluid and protein content of the vomeronasal organ (Rehorek et al. 2011). However, although the vomeronasal organ lacks specialized secretory structures, some fraction of the protein content of vomeronasal secretion is likely produced locally by cells within the vomeronasal organ itself. The integrated analysis presented here combines protein mass spectrometry and high-throughput RNA sequencing to predict the tissue of origin of the proteins found in vomeronasal secretion. The origin of each protein was predicted from its mRNA abundance in the Harderian gland and the vomeronasal organ. Twenty-six proteins were expressed only in the Harderian gland. Together, these proteins make up ~77.17% of the sum peak intensity of the mass spectra of proteins identified in vomeronasal secretions. Another 72 proteins were expressed in both the Harderian gland and the vomeronasal organ, but this set comprised only ~1.63% of sum peak intensity. A further 26 proteins were expressed only in the vomeronasal organ, comprising ~6.03% of sum peak intensity. Previously, the assumption that the Harderian gland contributes a large fraction of the fluid and protein in vomeronasal secretions was based primarily on histological evidence and had not been confirmed by other methods. The present findings support this hypothesis. The Harderian gland was expected to produce a large amount of this fluid as well as most of its protein components. The prediction that approximately 77% of the sum peak intensity originates from the Harderian gland is therefore not surprising. It does, however, support the hypothesis that the Harderian gland contributes the majority of the protein in the fluid filling the vomeronasal lumen. Also unsurprising was the finding that the vomeronasal organ expresses mRNA for proteins identified in this fluid. This suggests that the vomeronasal organ itself contributes a portion of the protein components, although on a relatively small scale.
These predicted origins were anticipated from previous research and are inferred here from mRNA abundances rather than measured directly. Even so, they are important for understanding the role of the Harderian gland within the vomeronasal chemosensory system. The proteins expressed in the Harderian gland and present in the vomeronasal organ make a large contribution to vomeronasal fluid. This is particularly true of the proteins predicted to solubilize nonpolar chemical signals and those involved in antimicrobial defense. These findings suggest that the Harderian gland plays several important roles in the vomeronasal chemosensory system.

Interestingly, an additional 16 proteins were present in vomeronasal secretion but were not expressed as mRNA in either the Harderian gland or the vomeronasal organ. Most of these proteins were minor constituents making up only a small percentage of the sum peak intensity. The exception was serum albumin, which comprised ~10.36% of the sum peak intensity despite not being expressed in either tissue. Serum albumin is produced primarily in the liver and is an abundant protein in blood serum (Fagerberg et al. 2013). The vomeronasal secretion may therefore have been contaminated by blood during collection. This appears unlikely, however, because all 11 samples contained this protein in high abundance. A contamination artifact would be expected to appear in only a subset of samples, or at more variable abundances. It therefore appears more likely that a transport mechanism allows serum albumin to exit the bloodstream and enter vomeronasal secretion. Given the high degree of vascularization observed in the Harderian gland, such exchange with the bloodstream more likely takes place in this tissue than in the vomeronasal organ. In the bloodstream, serum albumin functions mainly to stabilize osmotic pressure and to allow proteins and solutes to move into and out of the blood (Day et al. 1979). Serum albumin also carries hydrophobic substances through the bloodstream (Day et al. 1979).
If serum albumin is a genuine component of vomeronasal secretion rather than an artifact, it may act as an additional transporter of hydrophobic molecules that facilitates detection of chemical signals. Alternatively, it may point to yet another role of the Harderian gland in the vomeronasal chemosensory system: stabilizing colloid osmotic pressure in the fluid surrounding these very sensitive tissues and neurons. It may also serve another role altogether.
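The origin-prediction logic in this subsection can be sketched in a few lines. Each protein is assigned to one of four classes according to whether its transcript is detected in the Harderian gland (HG), the vomeronasal organ (VNO), both, or neither. The sketch then computes each class's share of the total peak intensity. The study used mRNA abundances, whereas this Python sketch reduces them to simple presence/absence flags. That reduction, the tuple layout, and the toy intensities are assumptions for illustration, not the study's pipeline or data.

```python
from collections import defaultdict


def predicted_origin(in_hg: bool, in_vno: bool) -> str:
    """Classify a protein by where its transcript was detected (simplified rule)."""
    if in_hg and in_vno:
        return "both"
    if in_hg:
        return "HG only"
    if in_vno:
        return "VNO only"
    return "neither"


def intensity_share(proteins):
    """Return each origin class's percentage of the summed peak intensity.

    `proteins` is an iterable of (name, intensity, in_hg, in_vno) tuples.
    """
    totals = defaultdict(float)
    for _name, intensity, in_hg, in_vno in proteins:
        totals[predicted_origin(in_hg, in_vno)] += intensity
    grand = sum(totals.values())
    return {cls: 100.0 * value / grand for cls, value in totals.items()}


# Toy values for illustration only; not measurements from this study.
toy = [
    ("protein_1", 60.0, True, False),
    ("protein_2", 20.0, True, False),
    ("protein_3", 5.0, True, True),
    ("protein_4", 5.0, False, True),
    ("albumin_like", 10.0, False, False),
]
shares = intensity_share(toy)
print(shares)  # {'HG only': 80.0, 'both': 5.0, 'VNO only': 5.0, 'neither': 10.0}
```

For comparison, the class sizes reported in this study (26, 72, 26 and 16 proteins) sum to the 140 high-confidence proteins. A protein such as serum albumin, with no local transcript, falls into the "neither" class.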


BKA interpretations and conclusions: Chemical signals from the environment must be physically transported into the vomeronasal lumen to be detected by sensory neurons (Halpern, 1987). As a result, this tissue is constantly exposed to environmental pathogens inadvertently carried alongside those signals. It has therefore been hypothesized that the vomeronasal organ requires defense mechanisms to protect these sensitive tissues. The fluid filling the vomeronasal lumen was hypothesized to be the likely source of this defense. Extracellular antimicrobial proteins secreted into it could bind, inactivate, or kill environmental pathogens and so prevent damage to cells within the vomeronasal organ. Bacterial killing assays (BKAs) were developed to quantify the relative antimicrobial properties of biological fluids such as blood plasma. They do this by allowing the plasma to contact and kill bacteria in a controlled environment (Dolan et al. 2016; Dugovich et al. 2017). Although this technique was developed for use with blood plasma, it can be applied to any biological fluid, and it was used here to test for antimicrobial properties in vomeronasal secretion. Standard BKAs include negative controls to show that samples are free from contamination. They also include a bacterial serial dilution, which provides a scale against which sample potencies are compared. Potency is assigned from the time-to-threshold, which is the time at which a specified optical density (OD) threshold is reached. This value is matched against a curve interpolated from the time/OD values measured in each well of the bacterial serial dilution. Bactericidal “potency” can therefore be approximated. For example, suppose the time-to-threshold of a sample well corresponded to that of the 1:8 bacterial dilution well. The sample would then contain roughly one eighth (12.5%) of the starting number of viable bacteria, and the potency would be approximated as “the sample killed ~87.5% of the bacteria in the well”.
Time-to-threshold potency calculations are an effective method of comparing bactericidal potency between samples (Dolan et al. 2016; Dugovich et al. 2017; Blakemore et al. in prep). When blood plasma is used in BKAs, the samples are assumed to have bactericidal properties, and the goal is to compare their potencies quantitatively. The modified BKA procedure used here was not designed to quantify and compare potencies. Instead, it was developed to determine experimentally whether vomeronasal secretion displays bactericidal properties. Positive and negative controls were used to demonstrate that any effects were due to the addition of vomeronasal secretion samples. Adding increasing amounts of vomeronasal secretion decreased bacterial growth rates, indicating that bacteria were killed during the initial incubation. These findings demonstrate that the vomeronasal secretion of T. s. parietalis does in fact have antimicrobial properties. This is the first description of experimental evidence for antimicrobial properties in vomeronasal secretions.
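The time-to-threshold calculation used in standard BKAs can be sketched as two steps. First, find when a well's OD first reaches the threshold. Second, place the sample's time on the standard curve built from the bacterial serial dilution. The Python sketch below rests on several assumptions. Linear interpolation between plate reads, interpolation on log2 of the inoculum fraction (which assumes roughly exponential growth), and the hypothetical standard-curve values are choices made for illustration. They are not the exact procedure of Dolan et al. (2016) or of this study, whose modified assay did not compute potencies.

```python
import math


def time_to_threshold(times, ods, threshold):
    """Time at which a growth curve first reaches `threshold` OD.

    Linearly interpolates between consecutive plate-reader readings;
    returns math.inf if the threshold is never reached.
    """
    if ods[0] >= threshold:
        return times[0]
    for i in range(1, len(times)):
        od0, od1 = ods[i - 1], ods[i]
        if od0 < threshold <= od1:
            t0, t1 = times[i - 1], times[i]
            return t0 + (threshold - od0) * (t1 - t0) / (od1 - od0)
    return math.inf


def percent_killed(sample_ttt, standard):
    """Approximate % of bacteria killed from a sample's time-to-threshold.

    `standard` holds (fraction_of_inoculum, time_to_threshold) pairs from a
    bacterial serial dilution (1, 1/2, 1/4, ...). Fewer starting bacteria take
    longer to reach the threshold; interpolation is done on log2(fraction),
    which assumes roughly exponential growth.
    """
    pts = sorted(standard, key=lambda p: p[1])  # earliest time = most bacteria
    for (f0, t0), (f1, t1) in zip(pts, pts[1:]):
        if t0 <= sample_ttt <= t1:
            x0, x1 = math.log2(f0), math.log2(f1)
            x = x0 + (sample_ttt - t0) * (x1 - x0) / (t1 - t0)
            surviving = 2.0 ** x  # equivalent fraction of viable bacteria
            return 100.0 * (1.0 - surviving)
    raise ValueError("sample time-to-threshold lies outside the standard curve")


# Hypothetical standard curve: each two-fold dilution takes one hour longer.
standard = [(1.0, 6.0), (0.5, 7.0), (0.25, 8.0), (0.125, 9.0), (0.0625, 10.0)]
print(percent_killed(9.0, standard))  # 87.5, i.e. matches the 1:8 well
print(time_to_threshold([0, 1, 2, 3], [0.0, 0.25, 0.75, 1.0], 0.5))  # 1.5
```

A sample reaching threshold at the same time as the 1:8 dilution well maps to a surviving fraction of 1/8, reproducing the ~87.5% figure in the worked example above.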

Summary and Conclusions: This research constitutes the first mass spectrometry-based analysis of proteins present in squamate vomeronasal secretions. Conducting this research first required building an annotated transcriptome to serve as a protein reference database. Analysis of the Harderian gland transcriptome (described in Chapter 2) showed high expression and functional enrichment of lipid-binding proteins, antimicrobial proteins, and several gene sets related to secretion of proteins into the extracellular environment. This information, however, was limited because it was based on mRNA expression in the Harderian gland. To describe the role of the Harderian gland in the vomeronasal chemosensory system, a protein-based analysis was needed. It had to confirm that important proteins were not only expressed in the Harderian gland but were also secreted and transported to the lumen of the vomeronasal organ. The primary goals of the research presented in this chapter were to 1) identify and describe the functions of the proteins present in vomeronasal secretion, 2) identify and describe evidence of pheromone-binding proteins present in the lumen of the vomeronasal organ, and 3) determine whether the Harderian gland plays a role in the immune system of the vomeronasal organ. This chapter describes 140 proteins identified with high confidence as components of vomeronasal secretion. An additional 536 proteins were also identified but did not meet the confidence thresholds required to be considered components of the secretion. The proteins identified with high confidence included lipid-binding proteins, antimicrobial proteins, and proteins with several other functions. To predict the tissue of origin of individual proteins identified in vomeronasal secretions, the gene expression profiles of the Harderian gland and vomeronasal organ were compared for the transcripts associated with all identified proteins. This analysis predicted that only 26 proteins are produced solely in the Harderian gland, but those proteins comprised over 77% of the sum peak intensity from mass spectra across all samples. In support of earlier findings, the proteins predicted to be produced in the Harderian gland and present in the vomeronasal organ included several highly abundant lipid-binding proteins as well as antimicrobial proteins. Both the Harderian gland and the vomeronasal organ express mRNA with predicted extracellular immune protein products. This suggests that both tissues likely contribute to the observed bactericidal/bacteriostatic potency, although the secretions of the Harderian gland appear to be more abundant than those of the vomeronasal organ. These results confirm that vomeronasal secretions have bactericidal/bacteriostatic properties. They also suggest that many of the extracellular immune proteins responsible originate from the Harderian gland, which would make this tissue a functional component of the vomeronasal immune system. Comparisons of relative protein abundances suggest that male Harderian glands in the spring are actively producing both lipid-binding and antimicrobial proteins. Female glands, by contrast, may be just becoming active, as indicated by the presence of several proteins related to metabolism and cellular structure. Two lipid-binding proteins were identified that were abundant in male secretions but completely absent from those of females. These same proteins were more abundant in the spring mating period than in the summer feeding period. These patterns mark both proteins as candidate pheromone-binding proteins, which are hypothesized to bind and solubilize female sex pheromone so that the male vomeronasal chemosensory system can detect it. This body of research would benefit from analysis of additional vomeronasal secretion samples, because the statistical power to detect significant differences between groups of only three individuals is low. The descriptive differences presented here are informative but lack the rigor expected of a quantitative analysis of protein expression data. Serum albumin was found to be highly abundant in vomeronasal secretion but was not expressed in either the Harderian gland or the vomeronasal organ. A study of the origin and possible functional significance of this albumin would be informative. Serum albumin is known to bind and solubilize hydrophobic molecules in the bloodstream, so it may help bind chemosensory signals in the vomeronasal organ. Albumin is also known to stabilize the osmotic environment in blood, allowing molecules to move freely into and out of cells and protecting sensitive tissues. The presence of serum albumin may therefore point to another possible function of the Harderian gland: sequestering albumin from the bloodstream for use inside the vomeronasal lumen. A protein of unknown function, “Harderian gland protein HG33”, was also found to be expressed in the Harderian gland and highly abundant in vomeronasal secretion. This protein comprises nearly one third of the sum peak intensity of proteins identified in vomeronasal secretion, yet nothing is known about its function. A targeted analysis of the function of this protein is a reasonable future direction for this research.
The protein mass spectrometry analysis described here, combined with the transcriptomic analysis described in Chapter 2, illustrates the functional importance of the Harderian gland in the vomeronasal chemosensory system of T. s. parietalis. It also suggests several possible directions for future study.


References:

Albini, B., Wick, G., Rose, E., Orlans, E. (1974) Immunoglobulin production in chicken Harderian glands. Int Arch Allergy Immunol. 47:23-34.
Barka, T., Anderson, P. (1965) Histochemistry: theory, practice and bibliography. Harper and Row, New York.
Beynon, R., Hurst, J., Turton, M., Robertson, D., Armstrong, S., Cheetham, S., Simpson, D., MacNicoll, A., Humphries, R. (2008) Urinary lipocalins in Rodentia: is there a Generic Model? In: Chemical Signals in Vertebrates 11. Springer, New York, NY.
Blakemore, L. (2017) Sex and survival: Reproduction and anti-microbial defense in the Red-sided garter snake (Thamnophis sirtalis parietalis). M.S. thesis. Oregon State University, Corvallis, OR.
Blakemore, L., Dolan, B., Bentz, E., Mason, R. (In prep.) Sex or survival: tradeoffs between reproduction and anti-microbial defense in the Red-sided garter snake, Thamnophis sirtalis parietalis.
Blighe, K. (2018) EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling. https://github.com/kevinblighe.
Brykczynska, U., Tzika, A., Rodriguez, I., Milinkovitch, M. (2013) Contrasted evolution of the vomeronasal receptor repertoires in mammals and squamate reptiles. Genome Biology and Evolution. 5(2):389-401.
Burns, R. (1992) The Harderian gland in birds: histology and immunology. In: Harderian glands: porphyrin metabolism, behavioral and endocrine effects. Springer. 155-163.
Chang, H., Liu, Y., Yang, T., Pelosi, P., Dong, S., Wang, G. (2015) Pheromone-binding proteins enhance the sensitivity of olfactory receptors to sex pheromones in Chilo suppressalis. Scientific Reports. 5:13093.
Day, J., Thorpe, S., Baynes, J. (1979) Nonenzymatically glucosylated albumin. In vitro preparation and isolation from normal human serum. J. Biol. Chem. 254(3):595-7.
Deist, M., Lamont, S. (2018) What makes the Harderian gland transcriptome different from other chicken immune tissues? A gene expression comparative analysis. Frontiers in Physiology. 9:492.
Dolan, B., Fisher, K., Colvin, M., Benda, S., Peterson, J., Kent, M., Schreck, C. (2016) Innate and adaptive immune responses in migrating spring-run adult chinook salmon, Oncorhynchus tshawytscha. Fish Shellfish Immunol. 48:136-44.
Domon, B., Aebersold, R. (2006) Mass spectrometry and protein analysis. Science. 312:212-217.
Dugovich, B., Peel, M., Palmer, A., Zielke, R., Sikora, A., Beechler, B., Jolles, A., Epps, C., Dolan, B. (2017) Detection of bacterial-reactive natural IgM antibodies in desert bighorn sheep populations. PLOS ONE. 12(6):e0180415.
Dulac, C., Torello, A. (2003) Molecular detection of pheromone signals in mammals: From genes to behaviour. Nat Rev Neurosci. 4:551-562.
Eisthen, H. (1992) Phylogeny of the vomeronasal system and of receptor cell types in the olfactory and vomeronasal epithelia of vertebrates. Microsc Res Tech. 23(1):1-21.
Erickson, S. (2007) Sexual dimorphism and seasonal changes on the Harderian gland of the Red-sided garter snake, Thamnophis sirtalis parietalis. Oregon State University Honors College thesis research.
Fagerberg, L., et al. (2013) Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteomics. 13(2):397-406.
Friesen, C., Uhrig, E., Squire, M., Mason, R., Brennan, P. (2014) Sexual conflict over mating in Red-sided garter snakes (Thamnophis sirtalis) as indicated by experimental manipulation of genitalia. Proceedings of the Royal Society B: Biological Sciences.
Geer, L., Marchler-Bauer, A., Geer, R., Han, L., He, J., He, S., Liu, C., Shi, W., Bryant, S. (2010) The NCBI BioSystems database. Nucleic Acids Res. 38:D492-D496.
Gregory, P. (1974) Patterns of spring emergence of the Red-sided garter snake (Thamnophis sirtalis parietalis) in the Interlake region of Manitoba. Canadian Journal of Zoology. 52:1063-1069.
Haas, B., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P., Bowden, J., Couger, M., Eccles, D., Li, B., Lieber, M., MacManes, M. D., Ott, M., Orvis, J., Pochet, N., Strozzi, F., Weeks, N., Westerman, R., William, T., Dewey, C. N., Henschel, R., LeDuc, R. D., Friedman, N., Regev, A. (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols. 8(8):1494-512.
Halpern, M. (1987) The organization and function of the vomeronasal system. Annual Review of Neuroscience. 10:325-362.
Halpern, M., Kubie, J., Silverstein, R., Muller-Schwarze, D. (1983) Snake tongue flicking behavior: clues to vomeronasal system functions. Chemical Signals III. Plenum, NY. 45-72.
Hathout, Y. (2007) Approaches to the study of the cell secretome. Expert Review of Proteomics. 4(2):239
Isogai, Y., Si, S., Pont-Lezica, L., Tan, T., Kapoor, V., Murthy, V., Dulac, C. (2011) Molecular organization of vomeronasal chemoreception. Nature. 478:241-247.
Jin, J., Jhang, T., Lui, N., Dong, S. (2014) Different roles suggested by sex-biased expression and pheromone-binding affinity among three pheromone-binding proteins in the pink rice borer, Sesamia inferens (Lepidoptera: Noctuidae). Journal of Insect Physiology. 66:71-79.
Kitchen, S., Crowder, C., Poole, A., Weis, V., Meyer, E. (2015) De Novo Assembly and Characterization of Four Anthozoan (Phylum Cnidaria) Transcriptomes. G3: Genes, Genomes, Genetics. 5(11):2441-2452.
Kubie, J., Vagvolgyi, A., Halpern, M. (1978) The roles of the vomeronasal and olfactory systems in the courtship behavior of male garter snakes. Journal of Comparative and Physiological Psychology. 92:627-641.
LeMaster, M., Mason, R. (2002) Variation in a female sexual attractiveness pheromone controls male mate choice in garter snakes. Journal of Chemical Ecology. 28(6):1269-85.
LeMaster, M., Mason, R. (2001) Evidence for a female sex pheromone mediating male trailing behavior in the Red-sided garter snake, Thamnophis sirtalis parietalis. Chemoecology. 11:149-152.
Li, J., Bickel, P., Biggin, M. (2014) System wide analyses have underestimated protein abundances and the importance of transcription in mammals. PeerJ. 2:e270.
Love, M., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 15:550.
Lutterschmidt, D., LeMaster, M., Mason, R. (2004) Effects of melatonin on the behavioral and hormonal responses of Red-sided garter snakes (Thamnophis sirtalis parietalis) to exogenous corticosterone. Hormones and Behavior. 46:692-702.
Mason, R., Fales, H., Jones, T., Pannell, L., Chinn, J., Crews, D. (1989) Sex pheromones in snakes. Science. 245(4915):290-293.
Mason, R., Halpern, M. (2011) Chemical ecology of snakes: from pheromones to receptors. North American Society for Comparative Endocrinology. Conference abstract. July, 2011.
Meyer, E., Aglyamova, G., Matz, M. (2011) Profiling gene expression responses of coral larvae (Acropora millepora) to elevated temperature and settlement inducers using a novel RNA-Seq procedure. Molecular Ecology. 20:3599-3616.
Meyer, E., Aglyamova, G., Wang, S., Buchanan-Carter, J., Abrego, D., Willis, B., Matz, M. (2009) Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx. BMC Genomics. 10:219.
Payne, A. (1994) The Harderian Gland: A tercentennial review. Journal of Anatomy. 185(1):1-49.
Rehorek, S., Firth, B., Hutchinson, M. (2000a) The structure of the nasal chemosensory system in squamate reptiles. Journal of Biosciences. 25(2):181-90.
Rehorek, S. J. (2000b) Passage of Harderian gland secretions to the vomeronasal organ in Thamnophis sirtalis (Serpentes: Colubridae). Canadian Journal of Zoology. 78(7):1284-1288.
Rehorek, S., Halpern, M., Firth, B., Hutchinson, M. (2011) The Harderian gland of two species of snakes: Pseudonaja textilis (Elapidae) and Thamnophis sirtalis (Colubridae). Canadian Journal of Zoology. 81:357-363.
Rosenberg, H. (1992) An improved method for collecting secretion from Duvernoy’s gland of colubrid snakes. Copeia. (1):244-246.
Taniguchi, Y., Choi, P., Li, G., Chen, H., Babu, M., Hearn, J., Emili, A., Xie, X. (2010) Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 329(5991):533-538.
The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Research. 45:D158-D169.


Chapter 4:

Describing the functional significance of variations in gene expression by sex and season in the Harderian gland of Thamnophis sirtalis parietalis

Introduction:

The Red-sided garter snake, Thamnophis sirtalis parietalis, is a well-studied squamate model that has been the focus of vertebrate research for decades (Aleksiuk & Gregory, 1974). These snakes display strong seasonal patterns that are tightly coordinated with environmental conditions, with mutually exclusive periods of mating and feeding behavior (Gregory, 1974).

During the winter hibernation period, the entire population of adult T. s. parietalis enters underground hibernacula. This lets the snakes escape cold temperatures and ensures that they are near the den in the spring when mating begins. As the snow melts in the spring, male snakes begin to emerge and position themselves near den openings to await the emergence of females. As females emerge, male snakes use their vomeronasal organ to locate them and evaluate their quality as potential mates (LeMaster & Mason, 2002). During this period, neither males nor females search for food, and neither accepts food if offered (O’Donnell et al. 2004). Females, unlike males, are not observed actively exploring their chemical environment with tongue-flicking behavior at this time. Females remain at the den only long enough to mate before moving on to the surrounding marshes to feed (Aleksiuk & Gregory, 1974). This difference in behavior at the den site suggests that male snakes use their vomeronasal organ more actively than females during this time.

Once the majority of females have left the den site, males migrate to the surrounding marshes and begin feeding as well. During the summer feeding period, both male and female snakes actively use their vomeronasal organ as the primary sense organ by which they locate prey items (Kubie et al. 1978; Halpern et al. 1983).


The lumen of the vomeronasal organ in T. s. parietalis is filled with aqueous fluid. The vomeronasal organ itself, however, does not appear to have the secretory structures needed to produce this volume of fluid, suggesting that the fluid has an external source (Rehorek et al. 2000a). Much of this fluid has been observed to originate from the Harderian gland, a large secretory structure near the orbit (Rehorek et al. 2000a; Rehorek, 2000b; Rehorek et al. 2011). Histological observations using electron microscopy and protein staining have identified a large number of protein-containing secretory vesicles in the Harderian gland. These proteins are secreted into the nasolacrimal duct and ultimately enter the vomeronasal lumen, where they come into contact with the sensory epithelium.

The T. s. parietalis sexual attractiveness pheromone has been identified as a homologous series of long-chain methyl ketones (saturated and unsaturated lipids). It is present as a solid, waxy substance on the skin of female snakes (Mason et al. 1989). Because this pheromone is insoluble in aqueous fluid, a pheromone-binding protein has been hypothesized to act as a transport mechanism. Such a protein would bind and solubilize the female pheromone, allowing it to reach the vomeronasal sensory epithelium and be detected.

The Harderian gland in T. s. parietalis has been observed to undergo significant changes in size and structure during the transition from winter brumation to the spring mating period, and again during the transition from mating to the summer feeding period (Erickson, 2007). During winter brumation, neither sex is actively using its vomeronasal organ to search for mates or locate prey, and the Harderian glands of both sexes were found to be regressed and appeared less active. During the spring mating period, males use their vomeronasal organ to actively locate females and evaluate their quality as potential mates. At this time, the male Harderian gland is hypertrophied, whereas the female gland remains inactive. During the summer feeding period, when both sexes use their vomeronasal organ to locate prey, the Harderian glands of both males and females were found to be hypertrophied and active.


Sexual and seasonal variation of both the vomeronasal organ and the Harderian gland have been observed in many taxa (Hoh, 1984; Minucci et al. 1989; Sashima, 1989; Dawley et al. 1995; Buzzell, 1996; Kondoh et al. 2012). The physiological and evolutionary links between these two structures appear well founded (Smith et al. 2017; Rehorek et al. 2011).

Based on these findings, gene expression profiles in the Harderian gland of T. s. parietalis are likely to vary by both sex and season. The research presented in this chapter aims to identify and describe patterns of gene expression that vary by sex and season in the Harderian gland of T. s. parietalis. It also aims to describe the functional significance of these expression profiles with respect to the role of the Harderian gland as a component of the vomeronasal chemosensory system.

Research Aims:

Aim 1) Is there evidence for sexual and seasonal variation of gene expression in the Harderian gland? Previous findings suggest that gene expression in the Harderian gland is highly likely to vary by both sex and season. As T. s. parietalis transitions from the spring mating period to the summer feeding period, the biological function of the vomeronasal chemosensory system changes dramatically. During the spring, only the Harderian glands of male snakes are active, as males use their vomeronasal organ to search for and evaluate female sexual attractiveness pheromone. During the summer, both male and female snakes use their vomeronasal organ to search for prey. This indicates that the activity of the gland varies by sex and season. The research presented here aims to identify patterns of gene expression that vary by both sex and season.


Aim 2) What is the functional significance of genes that are differentially expressed across sex and season? The biological functions of the vomeronasal organ of T. s. parietalis change dramatically between the spring mating period and the summer feeding period. Because the Harderian gland appears to be a component of the vomeronasal chemosensory system, the function of the Harderian gland is predicted to vary by sex and season as well. Because RNA-seq experiments generate a very large amount of data, gene expression results must be analyzed further to identify the functional significance of the underlying patterns. Gene set enrichment analysis accomplishes this by testing for coordinated variation across gene sets, each containing many genes that share common characteristics. Gene ontology annotation adds this functional information to expressed transcripts and facilitates gene set enrichment analysis. Using enrichment analysis, the research presented in this chapter aims to describe the biological and functional significance of the patterns of gene expression observed in male and female garter snakes during the spring mating period and the summer feeding period.

Aim 3) Is there evidence that a pheromone-binding protein is produced in the Harderian gland? Because the female sexual attractiveness pheromone of T. s. parietalis is insoluble in aqueous solution yet must pass through aqueous solution to be detected by the vomeronasal sensory epithelium, pheromone-binding proteins are hypothesized to be present in the vomeronasal lumen, where they would bind and solubilize sex pheromone (Mason & Halpern, 2011). Given the anatomical relationship between the vomeronasal organ and the Harderian gland in this group, the Harderian gland is believed to produce the majority of the fluid within the vomeronasal lumen (Rehorek, 2000b). Pheromone-binding proteins frequently display strongly sex-biased expression (Beynon et al. 2008; Jin, et al. 2014; Chang et al. 2015). Additionally, given the extreme seasonality of T. s. parietalis, a pheromone-binding protein would likely display seasonal bias as well. The research presented in this chapter aims to identify candidate pheromone-binding protein transcripts in T. s. parietalis based on a combination of hypothesized characteristics: 1) an apparent ability to bind and solubilize lipid pheromone, 2) male-biased expression, 3) spring-biased expression, and 4) expression levels high enough to be physiologically relevant for a protein that is hypothesized to be secreted in relatively large quantities.

Aim 4) Is there evidence for sexual and seasonal variation in immune functions of the Harderian gland? Because squamate reptiles and birds both belong to the clade Sauropsida, the Harderian glands of these groups likely share some common functions. The Harderian gland has often been described as an immune tissue in birds (Albini et al. 1974; Deist & Lamont, 2018), and has been observed to transcribe antimicrobial proteins in snakes (Domínguez-Pérez et al. 2018). Additionally, the research presented in Chapter 2 shows that gene expression in the Harderian gland of T. s. parietalis is enriched for transcripts associated with antimicrobial defense. Because vomeronasal secretions contain antimicrobial proteins, many of which are predicted to be produced in the Harderian gland, this tissue appears to play a role in immune function within the vomeronasal chemosensory system of T. s. parietalis (Chapters 2 & 3). This research aims to identify sex- and season-related variation in the expression of genes associated with antimicrobial defense in the Harderian gland of T. s. parietalis.


Methods:

Animal collection and tissue collection:

A total of 40 adult Red-sided garter snakes (T. s. parietalis; n = 10 spring males, n = 10 summer males, n = 10 spring females, n = 10 summer females) were collected from the wild near the town of Inwood in the Interlake region of Manitoba, Canada (50°31'28"N, 97°30'00"W). Spring snakes were collected from active mating aggregations in April 2016; only actively courting males and females being courted were collected. Summer snakes were collected from the surrounding marshland (summer feeding areas) in July 2016. All snakes were housed in outdoor nylon cloth arenas (1 × 1 × 1 m) for 24 hours after capture before tissue collection. Snakes were euthanized individually, immediately prior to tissue collection, via an overdose of methohexital sodium (Brevital™; 0.005 mL/g of body mass of a 1% solution). Euthanized snakes were placed on a workspace under a Jena dissecting microscope, and intact Harderian glands were removed with forceps and corneoscleral scissors. Before and between dissections, both the workspace and the surgical instruments were cleaned to prevent cross-contamination of samples. The workspace was cleaned with sterile saline followed by a 10% bleach solution, then treated with RNase Away (Ambion). Surgical instruments were cleaned with sterile saline, aseptically treated with a 10% bleach solution and 100% ethanol, then placed in a Germinator 500™ dry sterilizer for >15 seconds. Harderian glands were dissected, rinsed briefly with nuclease-free water to remove excess blood or surface contaminants, then immediately placed in RNALater™ RNA stabilization reagent (Invitrogen). Tissues were stored at 4°C for 24 hours to allow the RNALater™ to fully permeate the tissue, then moved to -20°C for long-term storage until transport. Tissues were transported to Oregon State University on ice (~0°C) and then stored at -20°C until RNA extraction.
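As a worked check of the euthanasia dose, and assuming that "1% solution" means 1% w/v (10 mg/mL), the volume-based dose converts to a mass-based dose as shown below. The 50 g body mass in the second line is a hypothetical example, not a measured value from this study.

```latex
\[
0.005\,\tfrac{\text{mL}}{\text{g}} \times 10\,\tfrac{\text{mg}}{\text{mL}}
  = 0.05\,\tfrac{\text{mg}}{\text{g}} = 50\,\tfrac{\text{mg}}{\text{kg}}
\]
\[
\text{e.g., } 50\,\text{g} \times 0.005\,\tfrac{\text{mL}}{\text{g}} = 0.25\,\text{mL}
  \quad (0.25\,\text{mL} \times 10\,\tfrac{\text{mg}}{\text{mL}} = 2.5\,\text{mg methohexital sodium})
\]
```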


RNA Extraction and Sequencing:

Tissues were removed from the RNA preservation reagent and immediately homogenized mechanically. Total RNA was extracted with E.Z.N.A.® HP Total RNA Kits (Omega Bio-Tek). An aliquot of each RNA extract was run on a polyacrylamide gel and visualized with ethidium bromide under ultraviolet light to confirm that high-quality, intact RNA was present. RNA concentrations were estimated from absorbance at 260 nm using a Spectramax™ M3 with a SpectraDrop™ microvolume microplate (Molecular Devices, LLC, San Jose, CA). Total RNA was diluted to ~111 ng/µl in all samples. A total of 1 µg of RNA from each sample was fragmented by heating at 95°C for 50 minutes, yielding RNA fragments of ~100–500 bp. Complementary DNA (cDNA) was reverse transcribed from the fragmented RNA and enriched for messenger RNA using Tetro™ Reverse Transcriptase (Bioline) with primer oligonucleotides designed to target mature messenger RNAs containing poly-A tails. This cDNA library was then used to prepare a Tag-seq sequencing library (Meyer et al. 2011; Lohman et al. 2016). Primer sequences used during library preparation are reported in Appendix A.2. The prepared library was submitted to the Oregon Health & Science University Massively Parallel Sequencing Shared Resource (MPSSR). High-throughput sequencing was performed on the Illumina HiSeq 2500 platform (2 lanes, 100 bp single end reads).
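The quantification and input-normalization steps reduce to simple arithmetic. The sketch below assumes the standard conversion of ~40 ng/µl of RNA per A260 unit at a 10 mm path length, and assumes the microplate reader reports path-length-corrected absorbance (the SpectraDrop plate itself uses a much shorter path). The function names and the example reading of 10.0 A260 units are hypothetical.

```python
# Arithmetic sketch for RNA quantification and input normalization.
# Assumption: absorbance values are already corrected to a 10 mm path length.
RNA_NG_PER_UL_PER_A260 = 40.0  # 1 A260 unit ~ 40 ng/ul for single-stranded RNA


def rna_concentration(a260, dilution_factor=1.0):
    """Estimate RNA concentration (ng/ul) from absorbance at 260 nm."""
    return a260 * RNA_NG_PER_UL_PER_A260 * dilution_factor


def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_ul):
    """Return (stock ul, water ul) needed to reach the target concentration."""
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("stock is more dilute than the target")
    stock_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return stock_ul, final_ul - stock_ul


def volume_for_mass(target_ng, conc_ng_per_ul):
    """Volume (ul) that contains target_ng of RNA at the given concentration."""
    return target_ng / conc_ng_per_ul


if __name__ == "__main__":
    stock = rna_concentration(10.0)  # hypothetical reading -> 400 ng/ul
    print(dilution_volumes(stock, 111.0, 20.0))  # stock and water volumes (ul)
    print(round(volume_for_mass(1000.0, 111.0), 2))  # ~9.01 ul carries 1 ug
```

At ~111 ng/µl, roughly 9 µl of each diluted sample carries the 1 µg of RNA used for heat fragmentation.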

Raw Data Processing and Quality Control:

The raw data from 2 lanes of Illumina HiSeq® 2500 yielded approximately 404.5M 100 bp single end reads (mean reads/sample: ~10.1M). Quality filtering was performed with the custom script “QualFilterFastq.pl” (Meyer, Github; Appendix U.2). Reads were trimmed to remove leading adapter sequence, truncated to 75 bp, and removed if they failed any of three separate filters, that is, if they contained: 1) 20 or more base pairs with a Quality Value (QV) score of 20 or less, 2) 10 or more Homologous Repeats (HR), or 3) 12 or more base pairs aligning to adapter sequences from library preparation. Reads passing quality filters were examined using FastQC (version 0.11.3) to ensure a high-quality dataset was used in subsequent steps (Andrews, 2010).
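The three read filters can be illustrated with a minimal Python sketch. This is not the QualFilterFastq.pl script itself, and it rests on several assumptions: quality strings use Phred+33 encoding; "homologous repeats" is interpreted as a single-base (homopolymer) run; "aligning to adapter sequences" is approximated by an exact 12-mer match; and leading adapter trimming is omitted. The adapter string used in the checks is a made-up placeholder, not a real library adapter.

```python
# Sketch of three read filters (not the QualFilterFastq.pl script itself).
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def low_quality_bases(qual, max_qv=20):
    """Count bases whose quality value is <= max_qv."""
    return sum(1 for ch in qual if ord(ch) - PHRED_OFFSET <= max_qv)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    longest = run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        previous = base
        longest = max(longest, run)
    return longest


def shares_adapter_kmer(seq, adapter, k=12):
    """True if any k-mer of the read occurs exactly in the adapter sequence."""
    adapter_kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    return any(seq[i:i + k] in adapter_kmers for i in range(len(seq) - k + 1))


def passes_filters(seq, qual, adapter, truncate_to=75,
                   low_q_limit=20, homopolymer_limit=10, adapter_k=12):
    """Truncate the read, then reject it if it fails any of the three filters."""
    seq, qual = seq[:truncate_to], qual[:truncate_to]
    if low_quality_bases(qual) >= low_q_limit:
        return False
    if longest_homopolymer(seq) >= homopolymer_limit:
        return False
    if shares_adapter_kmer(seq, adapter, adapter_k):
        return False
    return True
```

Each filter is a separate function so that the thresholds (20 low-quality bases, a run of 10, a 12 bp adapter match) can be checked against the values reported above.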

Read Mapping:

Processed and quality-filtered reads were mapped to the multi-tissue reference transcriptome (see Chapter 2 for a full description of transcriptome methods). Reads in separate files (.fastq) were mapped with gmapper, the short-read mapper of SHRiMP2 (Rumble et al. 2009; David et al. 2011). Gmapper was run with the flags “--qv-offset 33” (Phred+33 quality encoding, appropriate for current Illumina™ sequencing platforms), “-Q” (reads are in .fastq format), “--strata -o 3” (report only the highest-scoring mappings for each read, to a maximum of 3), “-N 4 -K 10000” (use 4 threads, each processing blocks of 10,000 reads) and “-L” (perform local rather than global mapping). The resulting sequence alignment map file (.sam) was then filtered to retain only high-quality, informative read mappings using the custom script “SAMFilterByGene.pl” (Meyer, Github; Appendix U.7). Read mappings were retained if the local alignment between the read and the reference contained at least 57 matches. Reads were counted as the number of reads uniquely assigned to a single reference sequence at the level of the Trinity subcomponent. Read mapping counts were compiled into a tab-delimited counts matrix containing all samples and all reference sequences with at least one valid mapping using the custom script “CombineExpression.pl” (Meyer, Github; Appendix U.8).
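The filtering, unique-assignment and matrix-building logic described above can be sketched as follows. This is a hypothetical illustration, not SAMFilterByGene.pl or CombineExpression.pl. Several assumptions apply. Alignment records are taken as already-parsed (read ID, reference ID, number of matches) tuples, for example the up to three top-scoring hits per read reported by gmapper with “--strata -o 3”. Reference IDs are Trinity-style (e.g., "DN1116_c0_g1_i1"). Collapsing to the gene/subcomponent level is done by dropping the trailing isoform suffix.

```python
import re
from collections import Counter, defaultdict

MIN_MATCHES = 57  # minimum matches in the local alignment (from the text)


def gene_level(ref_id):
    """Assumption: strip a Trinity isoform suffix such as '_i1'."""
    return re.sub(r"_i\d+$", "", ref_id)


def count_unique(alignments, min_matches=MIN_MATCHES):
    """Count reads whose passing alignments all fall in one gene-level reference.

    alignments: iterable of (read_id, ref_id, n_matches) tuples (hypothetical input).
    """
    hits = defaultdict(set)
    for read_id, ref_id, n_matches in alignments:
        if n_matches >= min_matches:
            hits[read_id].add(gene_level(ref_id))
    # Reads hitting more than one gene are ambiguous and are not counted.
    return Counter(next(iter(genes)) for genes in hits.values() if len(genes) == 1)


def counts_matrix(per_sample):
    """Tab-delimited matrix: one row per reference, one column per sample."""
    genes = sorted(set().union(*per_sample.values()))
    samples = sorted(per_sample)
    lines = ["\t".join(["gene"] + samples)]
    for gene in genes:
        row = [str(per_sample[s].get(gene, 0)) for s in samples]
        lines.append("\t".join([gene] + row))
    return "\n".join(lines)
```

Requiring every passing hit of a read to fall in the same gene-level reference is one way to implement "uniquely assigned"; multi-gene reads are discarded rather than split.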

Differential Expression Analysis:

Differential expression analysis was conducted using the R package DESeq2 (Love et al. 2014). Input files were loaded as a tab-delimited matrix of read mapping counts (described above) and a key file associating samples with treatment groups. Input counts were filtered to retain reference sequences with a mean of ≥ 0.5 valid mappings across samples. A full DESeq2 model was created including two factors with two levels each plus an interaction term: “sex” (male or female), “season” (spring or summer), and the “sex by season” interaction.
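The pre-filtering step applied to the counts matrix before model fitting can be sketched in Python. The modelling itself was done in R with DESeq2; this sketch covers only the mean-count filter. It assumes the matrix has a header row of sample names, with reference sequence IDs in the first column and integer counts in the remaining columns. The actual layout of the CombineExpression.pl output may differ, and the function names are hypothetical.

```python
import csv
import io


def read_counts(handle):
    """Parse a tab-delimited counts matrix: header row of sample names, then
    one row per reference sequence (ID in the first column)."""
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]
    counts = {row[0]: [int(x) for x in row[1:]] for row in reader if row}
    return samples, counts


def filter_by_mean(counts, min_mean=0.5):
    """Keep reference sequences whose mean count across samples is >= min_mean."""
    return {ref: c for ref, c in counts.items() if sum(c) / len(c) >= min_mean}


if __name__ == "__main__":
    example = "gene\ts1\ts2\nA\t0\t0\nB\t1\t0\nC\t5\t7\n"
    samples, counts = read_counts(io.StringIO(example))
    print(samples, sorted(filter_by_mean(counts)))  # reference A is dropped
```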

Preliminary examinations of the counts data were performed using principal components analysis (PCA) and hierarchical clustering of z-scores. Prior to visualization, the full model was transformed using the regularized log (rlog) transformation function included in the DESeq2 package. A PCA plot including all four combinations of factor levels (spring/male, summer/male, spring/female, and summer/female) was generated using the DESeq2 function “plotPCA”. A heatmap of hierarchically clustered z-scores was produced with the R package “heatmap3” (Zhao, et al. 2014).
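The per-transcript z-scores shown in the heatmaps can be illustrated with a short sketch. It assumes that row scaling is equivalent to R's scale(), meaning each transcript's rlog values are centered on their mean and divided by their sample standard deviation (n − 1 denominator, as with Python's statistics.stdev). It also assumes that the "top 1,000 most expressed" transcripts are chosen by mean transformed expression. Both the selection rule and the helper names are assumptions made for illustration.

```python
import statistics


def row_zscores(values):
    """Z-scores of one transcript across samples (sample standard deviation)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0] * len(values)  # constant rows carry no contrast
    return [(v - mean) / sd for v in values]


def top_n_by_mean(matrix, n=1000):
    """IDs of the n transcripts with the highest mean (transformed) expression."""
    return sorted(matrix, key=lambda tid: statistics.fmean(matrix[tid]), reverse=True)[:n]


def zscore_matrix(matrix, ids):
    """Row-scaled matrix for the selected transcripts, ready for clustering."""
    return {tid: row_zscores(matrix[tid]) for tid in ids}
```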

Differential expression analyses were conducted according to a two-factor design including the factors sex and season and the sex by season interaction (Figure 4.1). Additional differential expression analyses were conducted using a single-factor design including all combinations of sex and season (Figure 4.2). To test the effect of the interaction term, the full model was compared against a reduced model that included sex and season but lacked the interaction term, using a negative binomial likelihood ratio test. The factor levels “female” and “summer” were assigned as reference levels to generate output describing the effect of the interaction of spring season and male sex. The factors were then releveled to assign “male” and “summer” as reference levels to test the effect of the interaction of female sex and spring season. To test the effects of sex and season independent of their interaction, an additional full DESeq2 model was created without the interaction term. The effects of sex and season were tested against reduced models lacking the sex and season terms, respectively, using negative binomial likelihood ratio tests.
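Two pieces of the model comparison can be shown outside of R. The first is the treatment coding implied by the chosen reference levels. The second is the conversion of a likelihood ratio statistic into a p-value. DESeq2 fits the negative binomial GLMs and computes the log-likelihoods internally, so the log-likelihood values and helper names below are hypothetical. Every reduced model described above drops a single term of a two-level factor (or their interaction), so each test has one degree of freedom.

```python
import math


def design_row(sex, season, ref_sex="female", ref_season="summer", interaction=True):
    """Treatment-coded design row: intercept, sex, season[, sex:season]."""
    sex_term = 1.0 if sex != ref_sex else 0.0
    season_term = 1.0 if season != ref_season else 0.0
    row = [1.0, sex_term, season_term]
    if interaction:
        row.append(sex_term * season_term)
    return row


def lrt_pvalue_one_df(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and chi-square (1 df) upper-tail p-value."""
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2.0))
```

With "female" and "summer" as references, only spring males receive a 1 in the interaction column. After releveling sex to "male", that column instead flags spring females.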


Raw p-values were corrected for multiple testing using the Benjamini-Hochberg procedure to control the false discovery rate. Independent filtering by DESeq2 was allowed, which removed reference sequences whose mean normalized counts fell below the optimal threshold from multiple test correction calculations (Bourgon, et al. 2010). An adjusted p-value of ≤ 0.01 was used as the significance threshold for all differential expression analyses.
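The Benjamini-Hochberg adjustment, and the idea behind independent filtering, can be sketched in a few lines. This is a simplified illustration, not DESeq2's implementation. DESeq2 scans quantiles of the mean normalized count and smooths the number of rejections before choosing a threshold. The hypothetical helper independent_filter below instead takes the raw maximum over a fixed set of quantiles.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for pos in range(n - 1, -1, -1):
        i = order[pos]
        running_min = min(running_min, pvalues[i] * n / (pos + 1))
        adjusted[i] = running_min
    return adjusted


def independent_filter(base_means, pvalues, alpha=0.01,
                       quantiles=(0.0, 0.2, 0.4, 0.6, 0.8)):
    """Choose the mean-count cutoff that maximizes BH rejections at alpha.

    Returns (threshold, adjusted), where adjusted[i] is None for rows that
    were filtered out and therefore excluded from the correction.
    """
    sorted_means = sorted(base_means)
    best = None
    for q in quantiles:
        threshold = sorted_means[int(q * (len(sorted_means) - 1))]
        kept = [i for i, m in enumerate(base_means) if m >= threshold]
        adj = bh_adjust([pvalues[i] for i in kept])
        n_rejected = sum(a <= alpha for a in adj)
        if best is None or n_rejected > best[0]:
            full = [None] * len(base_means)
            for i, a in zip(kept, adj):
                full[i] = a
            best = (n_rejected, threshold, full)
    return best[1], best[2]
```

Removing low-count sequences that have little chance of reaching significance shrinks the number of tests, so the same raw p-values can survive correction. This explains why the number of sequences "included in multiple test correction calculations" differs between comparisons in the Results.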

Results of differential expression analyses were visualized using the R package “heatmap3”. Heatmaps depict clustered z-scores calculated from regularized log transformed counts for differentially expressed reference sequences.

Figure 4.1 Two factor differential expression comparisons in Harderian gland tissue: Comparisons made using the two factor model including Sex and Season. Comparisons are made across all samples (n = 9 spring males, n = 10 summer males, n = 10 spring females, n = 10 summer females).


Figure 4.2 Single-factor differential expression comparisons in Harderian gland tissue: Comparisons made using single-factor models including combinations of sex and season. Comparisons are made among samples within a single sex or season (n = 9 spring males, n = 10 summer males, n = 10 spring females, n = 10 summer females).


Gene Set Enrichment Analysis:

To determine whether differentially expressed genes were enriched for particular functions by sex and season, I performed functional enrichment analysis using the software ErmineJ (Gillis, et al. 2010). DESeq2 output files describing the effects of sex, season and the sex by season interaction were used to conduct enrichment analyses. From the DESeq2 outputs, gene scores were calculated as the negative log10 of the p-value multiplied by the log2 fold change (-log10(p-value) × log2FC). This score is directional, using the sign and magnitude of the fold change to indicate relative abundance, and it also reflects the statistical confidence in that difference through the p-value. The score was applied to all reference sequences in the DESeq2 outputs to create a rank-ordered list of gene scores for enrichment analysis with the command line version of ErmineJ. The custom script “Make_ermineJ_annotations.pl” (Appendix U.7) was used to extract gene ontology (GO) term annotations for all genes included in the analysis. Enrichment analysis was performed with ErmineJ using precision-recall gene score resampling (Pavlidis et al., 2002). In addition to the gene sets defined by the Gene Ontology Consortium, an author-defined gene set was included, based on previous findings that the Harderian gland transcriptome is enriched for genes involved in antimicrobial defense (Chapter 2). This gene set includes 116 genes likely involved in “Antimicrobial defense” (Appendix A.1). Separate enrichment analyses were performed for each aspect (Molecular function, Biological process, and Cellular component) of the Gene Ontology structure.
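The gene score can be computed and ranked with a short sketch. The input rows are assumed to be (gene ID, p-value, log2 fold change) triples taken from a DESeq2 results table. Rows whose p-value is missing (DESeq2 reports NA, for example for count outliers) are skipped. A p-value of exactly zero would give an infinite score, so this sketch skips those rows as well. How the original pipeline handled such rows is not stated, and the function names are hypothetical.

```python
import math


def gene_score(pvalue, log2_fold_change):
    """Signed score: -log10(p) * log2FC."""
    return -math.log10(pvalue) * log2_fold_change


def ranked_gene_scores(rows):
    """Rank (gene_id, score) pairs from highest to lowest score.

    rows: iterable of (gene_id, pvalue, log2FC); None marks a missing value.
    """
    scored = [
        (gene_id, gene_score(p, lfc))
        for gene_id, p, lfc in rows
        if p is not None and p > 0 and lfc is not None
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Genes that are strongly and significantly higher in the numerator group (for example, males when comparing males to females) rise to the top of the list. Genes that are strongly lower sink to the bottom, and genes with large p-values stay near zero regardless of fold change.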


Results:

Processing and read mapping: The raw data from 2 lanes of Illumina HiSeq® 2500 yielded approximately 404.5M 100 bp single end reads. After quality filtering, 254,829,083 high-quality reads remained (mean reads/sample: ~6.4M; range: 3,118,462–9,901,298 reads per sample). An average of 6,080,296 quality-filtered reads per sample mapped to the reference, a mean mapping rate of 95.46% (range: 93.73%–96.90%). Mapped reads were assigned to a subset of 32,472 reference transcripts. One sample (spring male #107) showed extreme variation not displayed by any other sample and was removed from analyses as an extreme outlier.

Exploratory analyses of counts data:

Exploratory analysis of counts data using principal components analysis showed that the variation between samples corresponded with the factor levels (Figure 4.3). All combinations of factor levels formed non-overlapping clusters when plotted on PC1 and PC2 using regularized log transformed counts. PC1 explained ~15% of the overall variation and PC2 explained ~9%. A heatmap of z-scores produced from regularized log transformed counts showed clusters of transcripts with abundances well above and below the mean (Figure 4.4). Z-score clustering gave results similar to those of the principal components analysis, indicating that variation in many genes correlates with the factor combinations. Distinct clusters with relatively high abundances in some factor combinations were mirrored by relatively low abundances in other factor combinations. For example, a distinct cluster with large z-scores in spring males showed relatively small z-scores in spring females, and other clusters showed similar mirrored patterns across all factor combinations.

One sample (spring male #107) showed variation that was substantially different from that of all other male snakes, in both spring and summer.


These outlying values may have been caused by a failed sequencing library preparation, producing sequencing artifacts or a skewed representation of transcripts in the library. The abnormal variation may also have had biological causes, such as a disease state, heavy parasite load or malnutrition. Because the source of the observed difference in gene expression could not be determined, this sample was removed, and all analyses were conducted with the remaining nine spring male samples.

Figure 4.3 Principal components analysis (PCA) of gene expression in the Harderian gland of T. s. parietalis: Principal components PC1 and PC2 calculated from regularized log transformed read mapping counts from: n = 9 spring males, n = 10 spring females, n = 10 summer males, and n = 10 summer females. Log transformation and PCA were performed using DESeq2 (Love et al. 2014).


Figure 4.4 Expression by transcript in the Harderian gland of T. s. parietalis. Heat map shows z-scores calculated from read mapping counts of the top 1,000 most expressed transcripts in Harderian gland tissue. Figure was produced and clustering performed using the R package “heatmap3” (Zhao, et al. 2014). Z-scores were calculated from regularized log transformed read mapping counts using DESeq2 (Love et al. 2014).


Differential expression analysis #1: Males vs. Females across all samples:

Differential expression analysis of Harderian gland tissues comparing males to females across all samples included 13,390 reference sequences with non-zero normalized read counts. A total of 351 genes (4.2%) displayed significantly increased abundances in males compared to females and 258 genes (3.1%) displayed significantly increased abundances in females compared to males (adjusted p-value ≤ 0.01) (Figure 4.5). Independent filtering by DESeq2 identified 104 reference sequences as outliers and 4,859 reference sequences with low normalized counts and included 8,430 sequences in multiple test correction calculations.

Figure 4.5 Differential expression in the Harderian gland of T. s. parietalis when comparing all male samples to all female samples. Heat map was produced and clustering performed using the R package “heatmap3” (Zhao, et al. 2014). Z-scores were calculated from regularized log transformed read mapping counts using DESeq2 (Love et al. 2014).


Examination of genes encoding lipid-binding proteins showed that the second most abundant transcript expressed in the Harderian gland is annotated as a lipocalin (a family of extracellular lipid-binding proteins). This transcript was not significantly differentially expressed by sex (adjusted p-value = 0.838). However, two other lipocalin transcripts (DN1116_c0_g1_i1 and DN4762_c0_g1_i1) were significantly differentially expressed between males and females (adjusted p-values: 2.52e-23 and 4.04e-17, respectively).

Enrichment analysis #1: Males vs. Females across all samples:

Enrichment analysis of gene sets based on 2-factor differential expression results included 809 gene sets from the Biological process aspect (B) of the Gene Ontology structure, 188 gene sets from the Molecular function aspect (M), and 121 gene sets from the Cellular component aspect (C). Male Harderian glands showed enrichment of 4 gene sets: genes targeted to the “Extracellular region” (GO:0005576; BH adjusted p-value = 0.0109), and genes with molecular functions associated with “Lipid binding” (GO:0008289; BH adjusted p-value = 1.71e-10), “Structural molecule activity” (GO:0005198; BH adjusted p-value = 0.0171), and “Structural constituent of ribosome” (GO:0003735; BH adjusted p-value = 0.0342) (Table 4.1). No gene sets were significantly enriched in females compared to males. The antimicrobial defense gene set was not enriched in either males or females (adjusted p-values: 0.763 in males; ~1.0, rounded by the software, in females).


Table 4.1 Significantly enriched gene sets in Harderian glands of males compared to females: Gene sets are reported if found to be significantly enriched in all male Harderian glands compared to all female Harderian glands (adjusted p-value ≤ 0.05). “Gene set” contains the description of the Gene Ontology (GO) term indicated. “Aspect” indicates the Molecular function (M), Biological process (B) or Cellular component (C) aspect of the gene ontology structure. “Sex” indicates the group in which the gene set was enriched. Enrichment analysis was performed with the ErmineJ software package using Precision Recall Gene Score Resampling (Gillis et al. 2010). Gene scores were calculated from differential expression analysis using the R package DESeq2 (Love et al. 2014) as -log10(p-value) × log2FC. “Adjusted p-values” are reported from ErmineJ output and were calculated using the Benjamini-Hochberg method to control the false discovery rate. No gene sets were found to be enriched in females compared to males.

Gene set | GO term | Aspect | Sex | Adjusted p-value
Lipid binding | GO:0008289 | M | Males | 1.71e-10
Structural molecule activity | GO:0005198 | M | Males | 0.0171
Structural constituent of ribosome | GO:0003735 | M | Males | 0.0342
Extracellular region | GO:0005576 | C | Males | 0.0109


Differential expression analysis #2: Spring vs. Summer across all samples:

Differential expression analysis of Harderian gland tissues comparing spring to summer across all samples included 13,390 reference sequences with non-zero normalized read counts. A total of 204 genes (1.5%) displayed significantly increased abundances in spring compared to summer and 258 genes (3.1%) displayed significantly increased abundances in summer compared to spring (BH adjusted p-value ≤ 0.01) (Figure 4.6). Independent filtering by DESeq2 identified 104 reference sequences as outliers and 5,629 reference sequences with low normalized counts, and included 7,660 sequences in multiple test correction calculations.

Figure 4.6 Differentially expressed genes in the Harderian gland of T. s. parietalis when comparing all spring samples to all summer samples. Heat map was produced and clustering performed using the R package “heatmap3” (Zhao, et al. 2014). Z-scores were calculated from regularized log transformed read mapping counts using DESeq2 (Love et al. 2014).


Lipid-binding proteins of interest identified in comparison #1 (above) showed no evidence of seasonal differential expression. Lipocalin (DN20506_c2_g1_i7) was not found to be significantly differentially expressed by season (adjusted p-value = 0.076). Lipocalin transcripts of interest (DN1116_c0_g1_i1 and DN4762_c0_g1_i1) identified as differentially expressed by sex were not found to be seasonally differentially expressed between spring and summer time periods (adjusted p-values: 0.131 and 0.124 respectively).

Enrichment analysis #2: Spring vs. Summer across all samples:

Enrichment analysis of gene sets based on 2-factor differential expression results included 809 gene sets from the Biological process aspect (B) of the Gene Ontology structure, 188 gene sets from the Molecular function aspect (M), and 121 gene sets from the Cellular component aspect (C). Seven gene sets were significantly enriched in summer Harderian glands. Enriched gene sets were targeted to the “Cytosolic part” (GO:0044445; BH adjusted p-value = 1.09e-10); associated with the molecular functions “Cofactor binding” (GO:0048037; BH adjusted p-value = 4.28e-11), “Tetrapyrrole binding” (GO:0046906; BH adjusted p-value = 5.70e-11), “Heme binding” (GO:0020037; BH adjusted p-value = 8.55e-11), and “Iron ion binding” (GO:0005506; BH adjusted p-value = 1.71e-10); and associated with the biological processes “Response to chemical” (GO:0042221; BH adjusted p-value = 7.20e-10) and “Response to drug” (GO:0042493; BH adjusted p-value = 3.60e-10) (Table 4.2). No gene sets were significantly enriched in spring compared to summer. The antimicrobial defense gene set was not enriched in either spring or summer (BH adjusted p-values: 1.0 in spring and ~1 in summer, both rounded by the software).


Table 4.2: Significantly enriched gene sets in Harderian gland tissue when comparing spring to summer: Gene sets are reported if found to be significantly enriched in Harderian gland tissue when comparing the spring mating period to the summer feeding period across all samples (adjusted p-value ≤ 0.05). “Gene set” contains the description of the Gene Ontology (GO) term indicated. “Aspect” indicates the Molecular function (M), Biological process (B) or Cellular component (C) aspect of the gene ontology structure. “Season” indicates the season in which the gene set was enriched. Enrichment analysis was performed with the ErmineJ software package using Precision Recall Gene Score Resampling (Gillis et al. 2010). Gene scores were calculated from differential expression analysis of Harderian gland tissue comparing all summer samples to all spring samples using the R package DESeq2 (Love et al. 2014), as -log10(p-value) × log2FC. “Adjusted p-values” are reported from ErmineJ output and were calculated using the Benjamini-Hochberg method to control the false discovery rate. No gene sets were found to be enriched in spring compared to summer.

Gene set | GO term | Aspect | Season | Adjusted p-value
Cofactor binding | GO:0048037 | M | Summer | 4.28e-11
Tetrapyrrole binding | GO:0046906 | M | Summer | 5.70e-11
Heme binding | GO:0020037 | M | Summer | 8.55e-11
Iron ion binding | GO:0005506 | M | Summer | 1.71e-10
Cytosolic part | GO:0044445 | C | Summer | 1.09e-10
Response to drug | GO:0042493 | B | Summer | 3.60e-10
Response to chemical | GO:0042221 | B | Summer | 7.20e-10


Differential expression analysis #3: Sex by Season interaction across all samples:

Differential expression analysis of the sex by season interaction included 13,390 reference transcripts and identified 205 genes (~1.5%) with significantly increased abundances and 178 genes (~1.3%) with significantly decreased abundances according to the interaction term (adjusted p-value ≤ 0.01) (Figure 4.7). Independent filtering by DESeq2 identified 76 reference sequences as outliers and 9,021 reference sequences with low normalized counts, and included 4,296 sequences in multiple test correction calculations.

Figure 4.7 Differentially expressed genes in the Harderian gland of T. s. parietalis investigating the effects of the sex by season interaction across all samples. Heat map was produced and clustering performed using the R package “heatmap3” (Zhao, et al. 2014). Z-scores were calculated from regularized log transformed read mapping counts using DESeq2 (Love et al. 2014).


Enrichment analysis #3: Sex by Season interaction across all samples:

Enrichment analysis of gene sets based on 2-factor differential expression results included 809 gene sets from the Biological process aspect (B) of the Gene Ontology structure, 188 gene sets from the Molecular function aspect (M), and 121 gene sets from the Cellular component aspect (C). Results showed significant enrichment of 10 Molecular function gene sets, 19 Cellular component gene sets and 27 Biological process gene sets (BH adjusted p-values ≤ 0.05; Table 4.3). The antimicrobial defense gene set was not enriched by the interaction of sex and season (BH adjusted p-value = 0.460).

Table 4.3: Gene sets significantly enriched in Harderian glands according to the sex by season interaction: Gene sets are reported if found to be significantly enriched in a comparison of the sex by season interaction term (adjusted p-value ≤ 0.05). “Gene set” contains the description of the Gene Ontology (GO) term indicated. “Aspect” indicates the Molecular function (M), Biological process (B) or Cellular component (C) aspect of the gene ontology structure. Enrichment analysis was performed with the ErmineJ software package using Precision Recall Gene Score Resampling (Gillis et al. 2010). Gene scores were calculated from differential expression analysis of the sex by season interaction in all spring and summer males and all spring and summer females using the R package DESeq2 (Love et al. 2014), as -log10(p-value) × log2FC. “Adjusted p-values” are reported from ErmineJ output and were calculated using the Benjamini-Hochberg method to control the false discovery rate.

Gene set | GO term | Aspect | Adjusted p-value
inorganic cation transmembrane transporter activity | GO:0022890 | M | 2.85e-11
proton transmembrane transporter activity | GO:0015078 | M | 3.42e-11
monovalent inorganic cation transmembrane transporter activity | GO:0015077 | M | 4.28e-11
cation transmembrane transporter activity | GO:0008324 | M | 5.70e-11
structural molecule activity | GO:0005198 | M | 8.55e-11
structural constituent of ribosome | GO:0003735 | M | 1.71e-10
ion transmembrane transporter activity | GO:0015075 | M | 7.33e-03
transmembrane transporter activity | GO:0022857 | M | 8.55e-03
inorganic molecular entity transmembrane transporter activity | GO:0015318 | M | 9.50e-03
peptidase activity | GO:0008233 | M | 0.031
proton transmembrane transport | GO:1902600 | B | 3.61e-10
monovalent inorganic cation transport | GO:0015672 | B | 7.21e-10
inorganic ion transmembrane transport | GO:0098660 | B | 0.018
ion transmembrane transport | GO:0034220 | B | 0.024
ribose phosphate metabolic process | GO:0019693 | B | 0.028
cation transmembrane transport | GO:0098655 | B | 0.029
purine ribonucleoside triphosphate metabolic process | GO:0009205 | B | 0.030
inorganic cation transmembrane transport | GO:0098662 | B | 0.031
ATP metabolic process | GO:0046034 | B | 0.031
purine nucleoside triphosphate metabolic process | GO:0009144 | B | 0.032
drug metabolic process | GO:0017144 | B | 0.032
ribonucleotide metabolic process | GO:0009259 | B | 0.033
purine-containing compound metabolic process | GO:0072521 | B | 0.033
nucleoside monophosphate metabolic process | GO:0009123 | B | 0.033
purine nucleoside monophosphate metabolic process | GO:0009126 | B | 0.033
purine ribonucleoside monophosphate metabolic process | GO:0009167 | B | 0.033
purine nucleotide metabolic process | GO:0006163 | B | 0.034
carbohydrate derivative biosynthetic process | GO:1901137 | B | 0.034
ribonucleoside monophosphate metabolic process | GO:0009161 | B | 0.034
ribonucleoside triphosphate metabolic process | GO:0009199 | B | 0.034
organophosphate metabolic process | GO:0019637 | B | 0.035
cation transport | GO:0006812 | B | 0.036
ion transport | GO:0006811 | B | 0.036
purine ribonucleotide metabolic process | GO:0009150 | B | 0.036
generation of precursor metabolites and energy | GO:0006091 | B | 0.036
nucleoside triphosphate metabolic process | GO:0009141 | B | 0.036
carbohydrate derivative metabolic process | GO:1901135 | B | 0.036
membrane protein complex | GO:0098796 | C | 5.50e-11
ribosome | GO:0005840 | C | 1.10e-10
ribosomal subunit | GO:0044391 | C | 0.029
proton-transporting two-sector ATPase complex | GO:0016469 | C | 0.031
endoplasmic reticulum part | GO:0044432 | C | 0.031
mitochondrial membrane | GO:0031966 | C | 0.031
nuclear outer membrane-endoplasmic reticulum membrane network | GO:0042175 | C | 0.032
endoplasmic reticulum | GO:0005783 | C | 0.033
cytosolic part | GO:0044445 | C | 0.035
organelle subcompartment | GO:0031984 | C | 0.036
mitochondrial inner membrane | GO:0005743 | C | 0.037
endoplasmic reticulum subcompartment | GO:0098827 | C | 0.038
envelope | GO:0031975 | C | 0.038
organelle envelope | GO:0031967 | C | 0.038
endoplasmic reticulum membrane | GO:0005789 | C | 0.039
organelle inner membrane | GO:0019866 | C | 0.043
extracellular region | GO:0005576 | C | 0.044
mitochondrial envelope | GO:0005740 | C | 0.044
mitochondrial part | GO:0044429 | C | 0.046


Differential expression analysis #4: Spring Males vs Spring Females:

Differential expression analysis of Harderian gland tissues comparing spring males to spring females included 14,027 reference sequences with non-zero normalized read counts. A total of 472 genes (~3.4%) displayed significantly increased abundances in spring males compared to spring females, and 488 genes (~3.5%) displayed significantly increased abundances in spring females compared to spring males (adjusted p-value ≤ 0.01) (Figure 4.8). Independent filtering by DESeq2 flagged no reference sequences as outliers, identified 5,722 reference sequences with low normalized counts, and included 8,316 sequences in the multiple test correction calculations.
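The gene counts and percentages reported for each comparison amount to a tally over a DESeq2 results table. The sketch below is illustrative only: the `DEResult` record and its field names are hypothetical stand-ins for an exported results table, a `padj` of `None` stands in for the `NA` that DESeq2 assigns to sequences removed by independent filtering, and percentages are taken relative to the number of reference sequences with non-zero normalized counts, as in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DEResult:
    """One row of a DESeq2-style results table (hypothetical field names)."""
    gene_id: str
    log2fc: float          # log2 fold change, numerator group vs denominator group
    padj: Optional[float]  # None where DESeq2 reports NA (e.g. filtered out)


def tally_de(results, alpha=0.01):
    """Return (n_up, n_down): genes significantly higher in the numerator
    group or in the denominator group at adjusted p-value <= alpha."""
    significant = [r for r in results if r.padj is not None and r.padj <= alpha]
    n_up = sum(1 for r in significant if r.log2fc > 0)
    n_down = sum(1 for r in significant if r.log2fc < 0)
    return n_up, n_down


def percent_of(n, total):
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * n / total, 1)


# Toy results: two significant genes (one in each direction), one
# non-significant gene, and one gene removed by independent filtering.
toy = [
    DEResult("geneA", 2.0, 0.001),
    DEResult("geneB", -1.5, 0.005),
    DEResult("geneC", 0.8, 0.20),
    DEResult("geneD", 3.0, None),
]
print(tally_de(toy))                                   # (1, 1)
print(percent_of(472, 14027), percent_of(488, 14027))  # 3.4 3.5
```

Applied to the totals for this comparison, the percentage helper reproduces the ~3.4% and ~3.5% reported above; the same two functions apply unchanged to comparisons #5-7.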

Figure 4.8 Differentially expressed genes in the Harderian gland of T. s. parietalis comparing all spring males to all spring females. The heat map was produced and clustering performed using the R package “heatmap3” (Zhao et al. 2014). Z-scores were calculated from regularized log (rlog) transformed read mapping counts. Differential expression testing and rlog transformation were performed with DESeq2 (Love et al. 2014).
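The heat-map z-scores are per-gene standardizations across samples. A minimal sketch, assuming a genes × samples matrix of rlog-transformed values is already available; the function name `row_zscores` is hypothetical, and the text does not state whether the z-scores were computed beforehand or by the plotting function's own row scaling.

```python
import numpy as np


def row_zscores(expr):
    """Standardize each gene (row) across samples (columns).

    Each row is centered on its mean and divided by its sample standard
    deviation (ddof=1, matching R's sd()). Rows with zero variance are
    returned as zeros instead of NaN.
    """
    expr = np.asarray(expr, dtype=float)
    centered = expr - expr.mean(axis=1, keepdims=True)
    sds = expr.std(axis=1, ddof=1, keepdims=True)
    return np.divide(centered, sds, out=np.zeros_like(centered), where=sds > 0)


# Two genes measured in three samples; the second gene does not vary.
z = row_zscores([[1.0, 2.0, 3.0],
                 [5.0, 5.0, 5.0]])
print(z)
```

Here the first gene becomes -1, 0, 1 and the constant second gene becomes all zeros, so every gene is drawn on the same color scale regardless of its absolute expression level.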


Enrichment analysis #4: Spring Males vs Spring Females:

Enrichment analysis of gene sets based on differential expression results comparing spring males to spring females included 473 gene sets from the Biological process aspect (B) of the Gene Ontology structure, 160 gene sets from the Molecular function aspect (M), and 109 gene sets from the Cellular component aspect (C). This analysis showed significant enrichment of 12 Molecular function gene sets and 12 Cellular component gene sets in spring males (BH adjusted p-values ≤ 0.05; Table 4.4). No gene sets were significantly enriched in spring females compared to spring males. The antimicrobial defense gene set was not enriched in either spring males or spring females (BH adjusted p-values: 0.156 for spring males; ~1 (software rounded) for spring females).

Table 4.4 Significantly enriched gene sets in Harderian glands when comparing Spring Males to Spring Females: Gene sets are reported if found to be significantly enriched in Harderian gland tissue comparing males in the spring mating period to females in the same time period (adjusted p-value ≤ 0.05). “Gene set” contains the description of the Gene Ontology (GO) term indicated. “Aspect” indicates either the Molecular function (M), Biological process (B) or Cellular component (C) of the gene ontology structure. “Group” indicates the sex and season in which the gene set was enriched. Enrichment analysis was performed with the ErmineJ software package using Precision Recall Gene Score Resampling (Gillis et al. 2010). Gene scores were calculated from the output of differential expression analysis with the R package DESeq2 (Love et al. 2014) as -log10(p-value) × log2(fold change). “Adjusted p-values” are reported from ErmineJ output and were calculated using the Benjamini-Hochberg method to control the false discovery rate (FDR). No gene sets were found to be enriched in spring females.

Gene set | GO term | Aspect | Group | Adjusted p-value
lipid binding | GO:0008289 | M | Spring Male | 4.90e-11
structural molecule activity | GO:0005198 | M | Spring Male | 7.35e-11
structural constituent of ribosome | GO:0003735 | M | Spring Male | 1.47e-10
oxidoreductase activity | GO:0016491 | M | Spring Male | 3.68e-03
proton transmembrane transporter activity | GO:0015078 | M | Spring Male | 5.88e-03
transporter activity | GO:0005215 | M | Spring Male | 0.0147
inorganic cation transmembrane transporter activity | GO:0022890 | M | Spring Male | 0.0165
cation transmembrane transporter activity | GO:0008324 | M | Spring Male | 0.0189
monovalent inorganic cation transmembrane transporter activity | GO:0015077 | M | Spring Male | 0.0229
ion transmembrane transporter activity | GO:0015075 | M | Spring Male | 0.0309
inorganic molecular entity transmembrane transporter activity | GO:0015318 | M | Spring Male | 0.0347
transmembrane transporter activity | GO:0022857 | M | Spring Male | 0.0355
ribonucleoprotein complex | GO:1990904 | C | Spring Male | 3.27e-11
ribosome | GO:0005840 | C | Spring Male | 4.90e-11
extracellular region | GO:0005576 | C | Spring Male | 9.80e-11
mitochondrion | GO:0005739 | C | Spring Male | 2.45e-03
ribosomal subunit | GO:0044391 | C | Spring Male | 5.88e-03
organelle membrane | GO:0031090 | C | Spring Male | 6.53e-03
membrane protein complex | GO:0098796 | C | Spring Male | 7.00e-03
mitochondrial part | GO:0044429 | C | Spring Male | 0.0319
organelle subcompartment | GO:0031984 | C | Spring Male | 0.0441
organelle inner membrane | GO:0019866 | C | Spring Male | 0.0452
mitochondrial envelope | GO:0005740 | C | Spring Male | 0.0472
mitochondrial membrane | GO:0031966 | C | Spring Male | 0.0474
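The gene score used as ErmineJ input combines the strength of evidence with the direction and size of the expression change. The sketch below implements the formula stated in the caption; the text does not say which DESeq2 p-value column (raw or adjusted) was used, so the function simply takes a p-value. The `flip` helper, which negates scores to test enrichment in the second group, is a hypothetical illustration of one way to handle the two directions, not a documented step of this analysis.

```python
import math


def gene_score(pvalue, log2fc):
    """Signed gene score: -log10(p-value) * log2 fold change.

    Large positive scores indicate strong evidence of higher expression in
    the numerator group (e.g. spring males); large negative scores indicate
    higher expression in the denominator group.
    """
    if not 0.0 < pvalue <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(pvalue) * log2fc


def flip(scores):
    """Negate scores so that genes higher in the denominator group rank
    highest (assumed approach for testing enrichment in the other group)."""
    return {gene: -s for gene, s in scores.items()}


scores = {"geneA": gene_score(0.001, 2.0), "geneB": gene_score(0.01, -1.5)}
print(scores)
print(flip(scores))
```

With these inputs geneA scores about 6 and geneB about -3. A p-value reported as exactly zero (numerical underflow) would need to be floored at a small positive value before scoring.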

161

Differential expression analysis #5: Summer Males vs. Summer Females:

Differential expression analysis of Harderian gland tissues comparing summer males to summer females included 12,819 reference sequences with non-zero normalized read counts. A total of 72 genes (~0.56%) displayed significantly increased abundances in summer males compared to summer females, and 29 genes (~0.23%) displayed significantly increased abundances in summer females compared to summer males (adjusted p-value ≤ 0.01) (Figure 4.9). Independent filtering by DESeq2 flagged no reference sequences as outliers, identified 6,243 reference sequences with low normalized counts, and included 6,606 sequences in the multiple test correction calculations.

Figure 4.9 Differentially expressed genes in the Harderian gland of T. s. parietalis comparing summer males to summer females. The heat map was produced and clustering performed using the R package “heatmap3” (Zhao et al. 2014). Z-scores were calculated from regularized log (rlog) transformed read mapping counts. Differential expression testing and rlog transformation were performed with DESeq2 (Love et al. 2014).


Enrichment analysis #5: Summer Males vs. Summer Females:

Enrichment analysis of gene sets based on differential expression results comparing summer males to summer females included 409 gene sets from the Biological process aspect (B) of the Gene Ontology structure, 136 gene sets from the Molecular function aspect (M), and 94 gene sets from the Cellular component aspect (C). Results showed significant enrichment of 1 Molecular function gene set and 1 Cellular component gene set in summer males (BH adjusted p-values ≤ 0.05; Table 4.5). No gene sets were significantly enriched in summer females compared to summer males. The antimicrobial defense gene set was not enriched in either summer males or summer females (BH adjusted p-values: 0.829 for summer males; 0.969 for summer females).

Of the two lipocalin transcripts identified as significantly differentially expressed by sex across all samples, neither was significantly more abundant in the spring than in the summer in males at the conservative significance cutoff (adjusted p-value ≤ 0.01) used elsewhere in this chapter (BH adjusted p-values 0.019 and 0.091, respectively). However, one transcript (DN1116_c0_g1_i1) can be considered significantly more abundant in the spring under the less conservative standard cutoff of 0.05.


Table 4.5 Significantly enriched gene sets in Harderian glands when comparing Summer Males to Summer Females: Gene sets are reported if found to be significantly enriched in Harderian gland tissue comparing males in the summer feeding period to females in the summer feeding period (adjusted p-value ≤ 0.05). “Gene set” contains the description of the Gene Ontology (GO) term indicated. “Aspect” indicates either the Molecular function (M), Biological process (B) or Cellular component (C) of the gene ontology structure. “Group” indicates the sex and season in which the gene set was enriched. Enrichment analysis was performed with the ErmineJ software package using Precision Recall Gene Score Resampling (Gillis et al. 2010). Gene scores were calculated from the output of differential expression analysis with the R package DESeq2 (Love et al. 2014) as -log10(p-value) × log2(fold change). “Adjusted p-values” are reported from ErmineJ output and were calculated using the Benjamini-Hochberg method to control the false discovery rate (FDR). No gene sets were found to be significantly enriched in summer females.

Gene set | GO term | Aspect | Group | Adjusted p-value
extracellular region | GO:0005576 | C | Summer Male | 8.30e-03
lipid binding | GO:0008289 | M | Summer Male | 1.24e-10
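The adjusted p-values in Tables 4.4 through 4.7 come from the Benjamini-Hochberg procedure, which DESeq2 also applies by default to its gene-level p-values. A minimal pure-Python sketch of the standard step-up calculation, written for illustration rather than as ErmineJ's own code:

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    For the i-th smallest of m p-values the raw adjustment is p * m / i;
    a running minimum taken from the largest p-value downward keeps the
    adjusted values monotone, and starting that minimum at 1.0 caps them at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # rank m (largest p) down to rank 1
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

The four example p-values adjust to approximately 0.02, 0.04, 0.04 and 0.02 (in input order): a gene set's adjusted value depends on its rank among all tested sets, which is why sets with similar raw p-values can share an adjusted value.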


Differential expression analysis #6: Spring Males vs. Summer Males:

Differential expression analysis of Harderian gland tissues comparing spring males to summer males included 12,259 reference sequences with non-zero normalized read counts. A total of 140 genes (~1.1%) displayed significantly increased abundances in spring males compared to summer males, and 87 genes (~0.71%) displayed significantly increased abundances in summer males compared to spring males (adjusted p-value ≤ 0.01) (Figure 4.10). Independent filtering by DESeq2 flagged no reference sequences as outliers, identified 6,185 reference sequences with low normalized counts, and included 6,080 sequences in the multiple test correction calculations.

Figure 4.10 Differentially expressed genes in the Harderian gland of T. s. parietalis comparing spring males to summer males. The heat map was produced and clustering performed using the R package “heatmap3” (Zhao et al. 2014). Z-scores were calculated from regularized log (rlog) transformed read mapping counts. Differential expression testing and rlog transformation were performed with DESeq2 (Love et al. 2014).


Enrichment analysis #6: Spring Males vs. Summer Males:

Enrichment analysis of gene sets based on differential expression results comparing spring males to summer males included 389 gene sets from the Biological process aspect (B) of the Gene Ontology structure, 135 gene sets from the Molecular function aspect (M), and 92 gene sets from the Cellular component aspect (C). Results showed significant enrichment of 4 Molecular function gene sets, 2 Cellular component gene sets, and 2 Biological process gene sets in spring males, and of 6 Molecular function gene sets, 2 Cellular component gene sets, and 1 Biological process gene set in summer males (BH adjusted p-values ≤ 0.05; Table 4.6). The antimicrobial defense gene set was not enriched in either spring males or summer males (BH adjusted p-values: 0.911 for spring males; ~1 (software rounded) for summer males).


Table 4.6 Significantly enriched gene sets in Harderian glands when comparing Spring Males to Summer Males: Gene sets are reported if found to be significantly enriched in Harderian gland tissue comparing males in the spring mating period to males in the summer feeding period (adjusted p-value ≤ 0.05). “Gene set” contains the description of the Gene Ontology (GO) term indicated. “Aspect” indicates either the Molecular function (M), Biological process (B) or Cellular component (C) of the gene ontology structure. “Group” indicates the sex and season in which the gene set was enriched. Enrichment analysis was performed with the ErmineJ software package using Precision Recall Gene Score Resampling (Gillis et al. 2010). Gene scores were calculated from the output of differential expression analysis with the R package DESeq2 (Love et al. 2014) as -log10(p-value) × log2(fold change). “Adjusted p-values” are reported from ErmineJ output and were calculated using the Benjamini-Hochberg method to control the false discovery rate (FDR).

Gene set | GO term | Aspect | Group | Adjusted p-value
amide biosynthetic process | GO:0043604 | B | Spring Male | 0.0230
peptide metabolic process | GO:0006518 | B | Spring Male | 0.0345
ribonucleoprotein complex | GO:1990904 | C | Spring Male | 4.05e-11
ribosome | GO:0005840 | C | Spring Male | 8.10e-11
structural molecule activity | GO:0005198 | M | Spring Male | 1.22e-10
structural constituent of ribosome | GO:0003735 | M | Spring Male | 6.10e-03
peptidase activity, acting on L-amino acid peptides | GO:0070011 | M | Spring Male | 0.0203
peptidase activity | GO:0008233 | M | Spring Male | 0.0244
response to chemical | GO:0042221 | B | Summer Male | 3.45e-10
cytosolic part | GO:0044445 | C | Summer Male | 8.10e-11
cytosol | GO:0005829 | C | Summer Male | 0.0122
heme binding | GO:0020037 | M | Summer Male | 6.10e-11
tetrapyrrole binding | GO:0046906 | M | Summer Male | 6.10e-11
iron ion binding | GO:0005506 | M | Summer Male | 1.22e-10
cofactor binding | GO:0048037 | M | Summer Male | 0.0244
phosphotransferase activity, alcohol group as acceptor | GO:0016773 | M | Summer Male | 0.0244
transition metal ion binding | GO:0046914 | M | Summer Male | 0.0244


Differential expression analysis #7: Spring Females vs. Summer Females:

Differential expression analysis of Harderian gland tissues comparing spring females to summer females included 14,350 reference sequences with non-zero normalized read counts. A total of 303 genes (~2.1%) displayed significantly increased abundances in spring females compared to summer females, and 250 genes (~1.7%) displayed significantly increased abundances in summer females compared to spring females (adjusted p-value ≤ 0.01) (Figure 4.11). Independent filtering by DESeq2 flagged no reference sequences as outliers, identified 7,258 reference sequences with low normalized counts, and included 7,117 sequences in the multiple test correction calculations.

Figure 4.11 Differentially expressed genes in the Harderian gland of T. s. parietalis comparing spring females to summer females. The heat map was produced and clustering performed using the R package “heatmap3” (Zhao et al. 2014). Z-scores were calculated from regularized log (rlog) transformed read mapping counts. Differential expression testing and rlog transformation were performed with DESeq2 (Love et al. 2014).


The lipocalin transcript (DN1116_c0_g1_i1) found to be significantly more abundant in males than in females (comparison #1) and significantly more abundant in spring males than in summer males (comparison #6) was not differentially expressed in spring females compared to summer females (BH adjusted p-value: 0.74).

Enrichment analysis #7: Spring Females vs. Summer Females:

Enrichment analysis of gene sets based on differential expression results comparing spring females to summer females included 432 gene sets from the Biological process aspect (B) of the Gene Ontology structure, 134 gene sets from the Molecular function aspect (M), and 98 gene sets from the Cellular component aspect (C). Results showed significant enrichment of 1 Molecular function gene set and 2 Biological process gene sets in spring females, and of 14 Molecular function gene sets, 5 Cellular component gene sets, and 12 Biological process gene sets in summer females (BH adjusted p-values ≤ 0.05; Table 4.7). The antimicrobial defense gene set was not enriched in either spring females or summer females (BH adjusted p-values: ~1.0 (software rounded) for spring females; ~0.288 for summer females).


Table 4.7 Significantly enriched gene sets in Harderian glands when comparing Spring Females to Summer Females: Gene sets are reported if found to be significantly enriched in Harderian gland tissue comparing females in the spring mating period to females in the summer feeding period (adjusted p-value ≤ 0.05). “Gene set” contains the description of the Gene Ontology (GO) term indicated. “Aspect” indicates either the Molecular function (M), Biological process (B) or Cellular component (C) of the gene ontology structure. “Group” indicates the season in which the indicated gene set was enriched in females. Enrichment analysis was performed with the ErmineJ software package using Precision Recall Gene Score Resampling (Gillis et al. 2010). Gene scores were calculated from the output of differential expression analysis with the R package DESeq2 (Love et al. 2014) as -log10(p-value) × log2(fold change). “Adjusted p-values” are reported from ErmineJ output and were calculated using the Benjamini-Hochberg method to control the false discovery rate (FDR).

Gene set | GO term | Aspect | Group | Adjusted p-value
response to stress | GO:0006950 | B | Spring Female | 1.94e-10
protein folding | GO:0006457 | B | Spring Female | 3.88e-10
unfolded protein binding | GO:0051082 | M | Spring Female | 1.26e-10
transmembrane transport | GO:0055085 | B | Summer Female | 1.94e-10
cation transport | GO:0006812 | B | Summer Female | 3.88e-10
cation transmembrane transport | GO:0098655 | B | Summer Female | 0.0129
inorganic ion transmembrane transport | GO:0098660 | B | Summer Female | 0.0172
proton transmembrane transport | GO:1902600 | B | Summer Female | 0.0194
oxidation-reduction process | GO:0055114 | B | Summer Female | 0.0194
monovalent inorganic cation transport | GO:0015672 | B | Summer Female | 0.0222
inorganic cation transmembrane transport | GO:0098662 | B | Summer Female | 0.0233
ion transport | GO:0006811 | B | Summer Female | 0.0259
ion transmembrane transport | GO:0034220 | B | Summer Female | 0.0310
response to chemical | GO:0042221 | B | Summer Female | 0.0353
drug metabolic process | GO:0017144 | B | Summer Female | 0.0453
cytosolic part | GO:0044445 | C | Summer Female | 8.60e-11
membrane protein complex | GO:0098796 | C | Summer Female | 4.30e-03
extracellular region | GO:0005576 | C | Summer Female | 5.73e-03
mitochondrion | GO:0005739 | C | Summer Female | 0.0215
cytosol | GO:0005829 | C | Summer Female | 0.0396
cofactor binding | GO:0048037 | M | Summer Female | 2.52e-11
tetrapyrrole binding | GO:0046906 | M | Summer Female | 3.15e-11
heme binding | GO:0020037 | M | Summer Female | 4.20e-11
proton transmembrane transporter activity | GO:0015078 | M | Summer Female | 6.30e-11
iron ion binding | GO:0005506 | M | Summer Female | 1.26e-10
oxidoreductase activity | GO:0016491 | M | Summer Female | 4.20e-03
inorganic cation transmembrane transporter activity | GO:0022890 | M | Summer Female | 4.73e-03
cation transmembrane transporter activity | GO:0008324 | M | Summer Female | 5.40e-03
monovalent inorganic cation transmembrane transporter activity | GO:0015077 | M | Summer Female | 5.60e-03
inorganic molecular entity transmembrane transporter activity | GO:0015318 | M | Summer Female | 0.0105
ion transmembrane transporter activity | GO:0015075 | M | Summer Female | 0.0113
transporter activity | GO:0005215 | M | Summer Female | 0.0115
transmembrane transporter activity | GO:0022857 | M | Summer Female | 0.0145
lipid binding | GO:0008289 | M | Summer Female | 0.0234


Discussion:

Exploratory analyses:

Principal components analysis (PCA) was performed as part of the preliminary data examination to identify major sources of variability and to identify samples that might be extreme outliers relative to the other samples. If the factors defined in the experimental design capture the major sources of variation, samples should group according to those factors on the leading principal components (Jolliffe et al. 2016). If the groupings on PC1 and PC2 did not correspond to the defined factor levels, this would indicate that substantial sources of variation were present that were not captured by the factors included in the analysis. Preliminary data can also be visualized by plotting z-scores of read mapping counts for all samples to check for clustering associated with the defined factors (Love et al. 2014).
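A preliminary PCA of this kind can be sketched as a singular value decomposition of a centered samples × genes matrix. This is a simplified illustration under stated assumptions: the input is taken to be a genes × samples matrix of variance-stabilized (e.g. rlog) values, genes are centered but not scaled, and no gene pre-selection is applied (DESeq2's `plotPCA`, by contrast, restricts to the 500 most variable genes by default). The function name `pca_scores` and the synthetic data are hypothetical.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Project samples onto their leading principal components.

    expr: genes x samples matrix of variance-stabilized expression values.
    Returns (scores, explained_variance_ratio), where scores has one row
    per sample and one column per component.
    """
    x = np.asarray(expr, dtype=float).T      # samples x genes
    x = x - x.mean(axis=0, keepdims=True)    # center each gene (no scaling)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    variance = s ** 2
    return scores, variance[:n_components] / variance.sum()


# Synthetic data: 50 genes x 6 samples, where 10 genes are strongly
# shifted in the first three samples (standing in for one sex or season).
rng = np.random.default_rng(0)
expr = rng.normal(scale=0.1, size=(50, 6))
expr[:10, :3] += 5.0
scores, ratio = pca_scores(expr)
print(np.round(scores[:, 0], 2))  # PC1 separates samples 0-2 from 3-5
print(np.round(ratio, 3))
```

Samples that fall far from their group's cluster on PC1 and PC2 are candidates for closer inspection as potential outliers; the sign of each component is arbitrary, so only the relative positions of samples matter.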

Preliminary examinations of the data included in this study showed that sample variation on PC1 and PC2 resulted in non-overlapping clusters associated with all defined factor levels. Examination of a heatmap of z-scores of the top 1000 most abundant genes also showed clearly defined clusters associated with factor combinations of sex and season.

One sample (spring male #107) differed substantially from all other samples. This outlying variation may have resulted from a failed sequencing library preparation, which could cause sequencing artifacts or a skewed representation of the transcripts present in the library. Although the snake appeared healthy at the time of tissue collection, health is sometimes difficult to assess (Uhrig et al. 2015), and the abnormal variation may instead have been caused by biological factors such as disease, a heavy parasite load, or malnutrition. Because the source of the difference in gene expression could not be determined, this sample was removed, and all analyses were conducted with the remaining nine spring male samples.


Preliminary data examinations performed here showed that the experimental design, including the factors of sex and season, captured a large proportion of the variation present in all samples. Based on the observed z-score patterns, differential expression analyses were expected to yield statistically significant and biologically meaningful results, so I proceeded with the differential expression and enrichment analyses as planned.

Differential expression analysis:

Differential expression analyses often identify a large number of differentially expressed genes, and the number of genes included in an RNA-seq study is often too large for the researcher to examine gene by gene. To make sense of such large amounts of biologically relevant data, researchers must use analysis methods that identify significant and relevant patterns when comparing gene expression profiles between groups (Conesa et al. 2016). Enrichment analysis identifies meaningful patterns of expression in an RNA-seq experiment by grouping reference sequences into biologically meaningful gene sets (Ballouz et al. 2017). This method statistically compares the expression profiles of gene sets annotated to terms defined by the Gene Ontology (GO) Consortium to identify significant enrichment of genes that share functions. The GO structure is split into three major aspects: “Molecular function”, “Biological process”, and “Cellular component” (i.e., subcellular localization). Identifying the gene sets enriched in each of the seven comparisons presented in this chapter is a powerful way to describe the functional significance of the effects of sex and season on the Harderian gland of T. s. parietalis (Gillis et al. 2010).
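The idea behind gene score resampling can be illustrated with a deliberately simplified version. ErmineJ's precision-recall variant used in this chapter aggregates gene scores differently; the sketch below instead uses the mean gene score of a set as its statistic and compares it with the means of random gene sets of the same size. The function name `resampling_pvalue`, its dictionary inputs, and the toy data are hypothetical.

```python
import random
from statistics import mean


def resampling_pvalue(gene_scores, gene_set, n_iter=10000, seed=1):
    """Empirical p-value for one gene set under a mean-score resampling null.

    gene_scores: dict mapping gene id -> score (higher = more interesting).
    gene_set: gene ids annotated to one GO term.
    The observed statistic is the mean score of the set's scored genes; the
    null distribution is the mean score of random sets of the same size
    drawn without replacement from all scored genes.
    """
    members = [g for g in gene_set if g in gene_scores]
    if not members:
        raise ValueError("no scored genes in this gene set")
    observed = mean(gene_scores[g] for g in members)
    all_scores = list(gene_scores.values())
    rng = random.Random(seed)
    k = len(members)
    hits = sum(1 for _ in range(n_iter) if mean(rng.sample(all_scores, k)) >= observed)
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_iter + 1)


# Toy data: 100 genes scored 0, except five high-scoring genes.
scores = {f"g{i}": 0.0 for i in range(100)}
for i in range(5):
    scores[f"g{i}"] = 10.0

p_enriched = resampling_pvalue(scores, [f"g{i}" for i in range(5)], n_iter=2000)
p_background = resampling_pvalue(scores, [f"g{i}" for i in range(50, 55)], n_iter=2000)
print(p_enriched, p_background)
```

In a full analysis, a p-value like this would be computed for every GO gene set, and the resulting p-values would then be adjusted for multiple testing (Benjamini-Hochberg in this chapter).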


Differential expression and enrichment analyses #1-3: Effects of Sex, Season and Sex by Season Interaction across all samples:

Male Harderian glands were found to be enriched for genes targeted to the extracellular region and for genes associated with the molecular function of lipid binding. Chapter 2 presents data showing that secretion to the extracellular region and the production of lipid-binding proteins are among the major functions of the Harderian gland in T. s. parietalis. These functions were expected from the literature on the Harderian gland in this group, as the gland is a specialized secretory acinar gland hypothesized to facilitate the binding and solubilization of lipids (Rehorek et al. 2011; Mason & Halpern, 2011). Enrichment of these gene sets in males relative to females would appear to indicate that male Harderian glands perform a different function from those of females. However, the differential expression analyses by sex and by the sex-by-season interaction show that the expression of many genes is affected by both sex and season, so additional single-factor comparisons were necessary to describe the underlying effects of sex and season.

In the summer, Harderian glands were found to be enriched for several gene sets, including cofactor binding, tetrapyrrole binding, heme binding, and iron ion binding. These gene sets are closely associated in the hierarchical GO structure, with heme binding and iron ion binding being sub-categories of cofactor binding and tetrapyrrole binding. These gene sets are therefore likely to reflect a single process of producing or transporting porphyrins (tetrapyrroles). Multiple studies have identified the Harderian gland as a site of porphyrin production, especially in rats and the Syrian hamster, although the function of the porphyrins in these groups is poorly understood (Cui et al. 2003; Payne 1994; Buzzell 1996).

Differential expression analysis of the sex-by-season interaction term identified 383 differentially expressed genes, and enrichment analysis based on gene scores calculated from this comparison showed enrichment of 55 gene sets. These findings show that substantial, biologically relevant differences in gene expression depend on the interaction of sex and season: not only were a large number of genes differentially expressed for this term, but those genes were associated with meaningful biological processes, molecular functions, and cellular components. These results alone, however, do not show how the variation is distributed between sex and season (Castaldi et al. 2017). The further comparisons (comparisons 4-7) describe the expression of genes and associated gene sets for each combination of the factors of sex and season (Figure 4.2).

No gene sets were found to be enriched in females or in the spring across all samples. This may simply reflect the fact that the female Harderian gland is regressed and generally inactive during the spring mating period (Erickson, 2007).

Differential expression and enrichment analysis #4: Spring Male vs Spring Female:

During the spring mating period, male Harderian glands showed enrichment of 12 molecular function gene sets and 12 cellular component gene sets compared to spring females, including genes associated with lipid binding. This gene set was also enriched in the Harderian gland compared to other tissues (Chapter 2) and in the summer compared to the spring across all samples. Because secretion of lipid-binding proteins appears to be an important function of the Harderian gland, this gene set would be expected to be enriched in any comparison in which one group's glands are more active overall, as male glands are relative to female glands in the spring.

In addition to lipid binding, 11 other molecular function gene sets and 12 cellular component gene sets were enriched in spring males compared to spring females. Genes targeted to both the extracellular region and the cytosol, as well as several gene sets associated with translation and several metabolic processes, were enriched in spring males. Together these findings suggest that the male Harderian gland is generally more active than that of females in the spring, displaying enrichment of many unrelated gene sets.

Differential expression and enrichment analysis #5: Summer Male vs Summer Female:

In the summer, gene expression in male Harderian glands was enriched for genes associated with lipid binding and genes targeted to the extracellular region. Both of these gene sets were also enriched in the summer across all samples (comparison #2). Because a comparison across all samples cannot detect sex-specific differences in the summer, this finding suggests that males contribute relatively more than females to the summer variation in the expression of lipid-binding genes. Lipid binding and secretion to the extracellular region appear to be important functions of the Harderian gland (Chapter 2). Because no other gene sets were enriched in summer males compared to summer females, the enrichment detected here is likely not merely the result of an increase in overall tissue activity.

Differential expression and enrichment analysis #6: Spring Male vs Summer Male:

Gene expression in the male Harderian gland in the spring compared to the summer was enriched for 8 gene sets associated with the ribosome, protein metabolic processes, and peptidase activity (which may indicate proteins entering the secretory pathway). Although neither secretory proteins nor genes targeted to the extracellular environment were enriched, the gene sets enriched in spring males are likely to be involved in protein production.

Enrichment of peptidase activity may indicate upregulation of enzymes such as signal peptidases, which cleave signal peptides from nascent proteins as they are translocated across the endoplasmic reticulum membrane. Such an upregulation could mean that more proteins are entering the secretory pathway at this time.

In the summer, the male Harderian gland was enriched for genes targeted to the cytosol and for genes associated with response to chemicals and with porphyrin binding. Enrichment of cytosolic rather than extracellular gene sets in the summer may indicate that male glands in this period are more active in metabolic functions than in the spring, when the Harderian gland produces large amounts of secretory protein. “Response to chemical” is a high-level gene set that includes many genes (it is a parent of many other terms in the GO structure). It is therefore less specific than most of the other gene sets reported here and may reflect any of several physiological response pathways active in the Harderian gland.

Differential expression and enrichment analysis #7: Spring Female vs Summer Female:

Harderian glands of spring females were enriched for genes associated with response to stress, protein folding, and unfolded protein binding. Because female Harderian glands are regressed and inactive in the spring compared to the summer, protein folding and unfolded protein binding may be related to preparing proteins for functions the Harderian gland performs later in the season.

Summer females showed enrichment of a large number of gene sets compared to spring females, including gene sets associated with porphyrin binding, lipid binding, and metabolic processes, as well as genes targeted to the cytosol and to the extracellular region.


During the summer feeding period, the female Harderian gland is much more active than during the spring mating period. This result suggests that many physiological processes and functions are upregulated in the summer as part of the overall increase in activity of this tissue.

Differential expression of lipid-binding proteins:

Several lipid-binding proteins were identified as being of interest in this study. One of these, annotated as a lipocalin, was the second most abundant transcript in the Harderian gland. Although extremely abundant, this transcript did not fit the expression patterns hypothesized to indicate a pheromone-binding protein. Pheromone-binding proteins very commonly show male-biased expression (Beynon et al. 2008; Jin et al. 2014; Chang et al. 2015). In T. s. parietalis, a pheromone-binding protein must be capable of binding and solubilizing lipids and would likely also show expression biased toward the spring mating period, as the pheromone is only physiologically important during mating. The most abundant lipocalin identified here is predicted to possess the structure necessary to bind lipids and may be capable of binding the T. s. parietalis sex pheromone, but it was not differentially expressed by either sex or season. However, two other transcripts annotated as lipocalins did show these expression patterns. Both were annotated as lipid-binding proteins and showed extremely male-biased expression: each was very abundant in the Harderian glands of male snakes and nearly absent from those of females. One of these (DN1116_c0_g1_i1) was far more abundant than the other (~10.5× higher). Additionally, DN1116_c0_g1_i1 was significantly more abundant in the spring than in the summer in male snakes (p < 0.02), whereas the other transcript was not. The less abundant of the two was not detected in females, and DN1116_c0_g1_i1, although detected in females, was not differentially expressed by season in females. Based on these results, one transcript, DN1116_c0_g1_i1, fit all of the expression patterns and possessed the characteristics predicted to indicate a pheromone-binding protein in T. s. parietalis.

Evidence for sexual and seasonal variation in immune functions of the Harderian gland:

In Chapters 2 and 3, evidence was described showing that gene expression in the Harderian gland is significantly enriched for genes associated with antimicrobial defense, and that extracellular antimicrobial proteins produced in the Harderian gland are secreted into the vomeronasal organ. Here, I examined the variation in immune functions of the Harderian gland by sex and season. Antimicrobial defense genes were not found to be significantly enriched in any group from any comparison made here. Based on differential expression and enrichment analysis, I find no evidence that the antimicrobial functions in the Harderian gland vary significantly by sex or season.
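The enrichment analyses referred to throughout this chapter ask whether a functional category (for example, antimicrobial defense) is over-represented among a set of genes of interest. The references cite ErmineJ (Gillis et al. 2010; Pavlidis et al. 2002), which implements several such methods; the sketch below assumes the simplest one, a one-sided hypergeometric over-representation test, and is not necessarily the exact procedure used here. The function and argument names are hypothetical.

```python
from math import comb


def hypergeometric_enrichment_p(overlap, set_size, query_size, background_size):
    """One-sided over-representation p-value, P(X >= overlap).

    background_size: genes tested (e.g. all expressed genes)
    set_size: background genes annotated to the category (e.g. antimicrobial defense)
    query_size: genes of interest (e.g. genes up-regulated in one group)
    overlap: genes of interest that are also in the category
    """
    if set_size > background_size or query_size > background_size:
        raise ValueError("category and query must be subsets of the background")
    if not 0 <= overlap <= min(set_size, query_size):
        raise ValueError("overlap is outside the possible range")
    total = comb(background_size, query_size)
    # Sum the hypergeometric probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, query_size - k)
        for k in range(overlap, min(set_size, query_size) + 1)
    )
    return tail / total
```

Because many categories are usually tested at once, p-values from a test like this would normally be corrected for multiple testing before a category is called enriched.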

Summary and conclusions:

Production of lipid-binding proteins has been hypothesized to be an important function of the Harderian gland, with these proteins acting as pheromone-binding proteins that bind and solubilize the lipid-based sexual attractiveness pheromone of T. s. parietalis to facilitate its detection by the vomeronasal organ (Mason & Halpern, 2011). Transcription of lipid-binding proteins has been observed in the Harderian gland transcriptomes of three species of snake: Caraiba andreae, Cubophis cantherigerus, and Tretanorhinus variabilis (Domínguez-Pérez et al. 2018). Findings presented in Chapter 3 show that lipid-binding proteins are highly abundant in the fluid filling the lumen of the vomeronasal organ, and these proteins are predicted to originate from the Harderian gland. Pheromone-binding proteins are often sex-biased, displaying high expression only in the sex that actively uses chemical cues to search for mates (usually males) (Beynon et al. 2008; Jin et al. 2014; Chang et al. 2015).

Observations presented here show that genes with lipid-binding protein products are enriched in the male Harderian gland across seasons, in males in both the spring and the summer, and in summer females compared to spring females (Comparisons 1, 4, 5, and 7). Because pheromone-binding proteins in T. s. parietalis must be capable of binding lipids, and only males use their vomeronasal organ to actively search for females by means of sex pheromones, pheromone-binding proteins would be expected to be enriched in males compared to females. Enrichment of lipid-binding protein genes in the Harderian glands of summer females is likely due to the overall increase in activity of this tissue in the summer (Erickson, 2007). These findings support the hypothesis that the Harderian gland produces a pheromone-binding protein and provide a rationale to examine these genes more closely.

A total of 24 proteins associated with lipid-binding functions were expressed in the Harderian gland. Of these, only two displayed extremely male-biased expression. These same proteins also showed some degree of seasonal bias, although this difference was statistically significant only for DN1116_c0_g1_i1 in males. Both proteins are annotated as lipocalins (a family of secreted lipid-binding proteins; Flower, 1996). The combination of lipid-binding function, significant sex-biased expression, and higher expression in the spring mating period than in the summer feeding period marks these proteins as putative pheromone-binding proteins in T. s. parietalis. One of these lipocalins (DN1116_c0_g1_i1) was more than 10 times as abundant as the other candidate and is considered more likely to fulfill the role of a pheromone-binding protein in this species.

Many studies have identified the Harderian gland as a site of porphyrin production, especially in rodents such as rats and hamsters; however, the function of these porphyrins is not well understood (Payne, 1994; Burns, 1992). Despite this uncertainty, porphyrin production by the Harderian gland of hamsters is often used as a model for oxidative stress (Coto-Montes et al. 2001). In rats, porphyrins may be linked to stress response or immune function, as their production has been shown to increase during times of stress or infection (Sakai, 1981; Harkness & Ridgway, 1980).

The findings presented here indicate that genes associated with porphyrin binding are enriched in the T. s. parietalis Harderian gland in summer males, in summer females, and in the summer across all samples (Comparisons 6, 7, and 2). While porphyrin synthesis is a well-known function of the Harderian gland in rodents, this is the first observation of this function in a squamate. The function of the porphyrin-binding genes observed in the garter snake Harderian gland is still unknown. However, these observations indicate that the expression of porphyrin-binding genes (likely reflecting upregulation of porphyrin synthesis) is biased toward the summer feeding period. Blakemore et al. (in prep.) found that immune responses to injury varied seasonally, with snakes showing weaker immune responses in the spring than in the summer, and that gravid females displayed weaker immune responses than females not in a reproductive state. The authors argued that this indicates a reproductive strategy that prioritizes reproduction over immune function in the spring. Because porphyrin binding was enriched only in the summer, when immune responses are stronger, the association between porphyrin synthesis and immune processes observed in rodents may also hold in T. s. parietalis.

Genes associated with translation and metabolic processes were enriched in the spring compared to the summer across all samples. Male Harderian glands are active during the spring, while female glands are regressed and appear inactive. During the summer, both male and female Harderian glands are active, as both sexes use their vomeronasal organs to search for prey (Erickson, 2007). Enrichment of gene sets associated with metabolic processes and translation during the spring is therefore likely due to a combination of male glands actively producing and secreting protein products and the female Harderian gland beginning its annual seasonal hypertrophy, which requires the translation of many metabolic proteins.

Expression of genes encoding proteins targeted to the cytosol was enriched in the summer across all samples and in summer males and summer females (Comparisons 2, 6, and 7). Expression of genes encoding proteins targeted to the extracellular region was enriched in males across all samples, in spring males, and in summer females (Comparisons 1, 3, 4, 5, and 7). These results agree with the interpretation of the expression patterns of metabolic processes and translation described above. Male Harderian glands are more active than female glands in the spring, and female glands are more active in the summer than in the spring. The expression patterns of genes encoding cytosolic and extracellular proteins are likely due to the increased overall activity of this tissue in summer females and to the observed increase in the proportion of metabolic activity in summer males (relative to the production and secretion of secretory proteins). An additional result showed enrichment of unfolded protein binding and protein folding (processes that accompany the synthesis of new proteins) in spring females, further supporting the proposition that gene expression in spring females reflects preparation for increased activity in the summer feeding period.

A decrease in hormonal stress responses of male T. s. parietalis has been observed during the spring mating period compared to the summer feeding period (Dayger & Lutterschmidt, 2016). These studies were all performed using male snakes only and did not compare stress responses in females. Blakemore et al. (in prep.) showed that male T. s. parietalis have weaker immune responses in the spring than during the summer feeding period. These findings suggest that immune response is seasonally regulated in this species, so it is possible that the response to stress varies by season as well. It is well documented that the spring mating period is stressful for T. s. parietalis; however, mating behavior in spring males is uncoupled from concentrations of stress hormones, likely due to a suppression of stress responses in a tradeoff favoring reproduction (Moore & Mason, 2001). Little is known about stress hormones or responses to stress in females at the den. Enrichment of genes involved in stress responses in spring females may indicate that responses to stress or stress hormones during mating are sexually dimorphic.

Genes involved in stress response were enriched in Harderian gland tissue only in females in the spring mating period. While males are thought to prioritize reproduction over stress and immune responses in the spring (reproduction over long-term survival), these findings suggest that females may employ the opposite strategy, prioritizing long-term survival over reproduction in the spring. Although this result was observed in the Harderian gland, stress responses in the spring are likely not confined to this tissue, as the stress response is a systemic hormonal response that usually affects all cells in the body.

The research presented in this chapter was performed to investigate seasonal and sexual variation in gene expression in the Harderian gland of T. s. parietalis. Previous research on the Harderian gland in many groups has shown seasonal and sexual dimorphism in its function (Buzzell, 1996; Minucci et al. 1989; Hoh et al. 1984; Sashima et al. 1989). It is well documented that the annual behavioral cycles of T. s. parietalis are tightly coordinated with the seasonal environment (Aleksiuk & Gregory, 1974), and the Harderian gland in this species has been shown to undergo significant structural changes coinciding with these seasonal behavioral shifts (Erickson, 2007).

Findings presented here represent a large body of evidence showing a substantial amount of statistically significant and biologically relevant seasonal and sexual variation in gene expression in the Harderian gland. Many significantly differentially expressed genes are described that vary across all combinations of treatment factors. Enrichment analyses identified many gene sets significantly enriched in each comparison, indicating that changes to expression profiles reflect biologically meaningful differences between male and female Harderian glands, and between glands sampled during the spring mating period and the summer feeding period.
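Calling thousands of transcripts "significantly differentially expressed" requires correcting for multiple testing. DESeq2 (Love et al. 2014, cited in the references) reports Benjamini-Hochberg adjusted p-values by default; assuming that correction applies here, the sketch below shows how the adjustment works. It is a plain-Python illustration, not DESeq2's own code, and it omits DESeq2's independent filtering step (Bourgon et al. 2010).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the minimum of
    # p * m / rank over its own rank and all larger ranks (keeping them monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene would then be called differentially expressed when its adjusted value falls below the chosen false discovery rate, for example 0.05.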

Taken together, the gene expression patterns described above show that the Harderian gland in T. s. parietalis displays significant seasonal variation and sexual dimorphism in the expression of lipid-binding proteins, genes associated with transcription and translation, porphyrin binding, responses to stress, and metabolic processes. These processes can be meaningfully interpreted within the context of the annual seasonal behavioral cycles of T. s. parietalis.

As described earlier in this chapter, a pheromone-binding protein in T. s. parietalis must be capable of binding and solubilizing the female sexual attractiveness pheromone, which consists of a series of long-chain methyl ketones (Mason et al. 1989). Therefore, any candidate pheromone-binding protein must be an extracellular lipid-binding protein. In addition, a pheromone-binding protein is hypothesized to show an extreme expression bias toward males and a bias toward the spring mating period (Beynon et al. 2008; Jin et al. 2014; Chang et al. 2015).
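The candidate criteria listed above (extracellular, lipid-binding, strongly male-biased, and spring-biased) can be expressed as a simple filter over per-transcript annotation and differential expression results. The sketch below is hypothetical: the record fields, the significance threshold `alpha`, and the minimum male log2 fold change `min_male_lfc` are illustrative assumptions, not values from this study. The seasonal test is applied within males, mirroring the comparison reported for DN1116_c0_g1_i1.

```python
from dataclasses import dataclass


@dataclass
class TranscriptSummary:
    """Hypothetical per-transcript summary combining annotation and DE results."""
    transcript_id: str
    lipid_binding: bool                  # annotated with a lipid-binding function
    extracellular: bool                  # predicted to be secreted / extracellular
    lfc_male_vs_female: float            # log2 fold change, males relative to females
    padj_male_vs_female: float           # adjusted p-value for the sex comparison
    lfc_spring_vs_summer_males: float    # log2 fold change in males, spring vs summer
    padj_spring_vs_summer_males: float   # adjusted p-value for that seasonal comparison


def is_pbp_candidate(t, alpha=0.05, min_male_lfc=1.0):
    """Apply the hypothesized pheromone-binding protein criteria to one transcript.

    alpha and min_male_lfc are illustrative thresholds, not values from this study.
    """
    male_biased = t.lfc_male_vs_female >= min_male_lfc and t.padj_male_vs_female < alpha
    spring_biased = t.lfc_spring_vs_summer_males > 0 and t.padj_spring_vs_summer_males < alpha
    return t.lipid_binding and t.extracellular and male_biased and spring_biased


def screen_candidates(transcripts, **thresholds):
    """Return the IDs of transcripts that satisfy every criterion."""
    return [t.transcript_id for t in transcripts if is_pbp_candidate(t, **thresholds)]
```

Under criteria of this kind, a highly abundant lipocalin that is not differentially expressed by sex or season would be excluded despite its lipid-binding annotation, which matches the reasoning applied to the most abundant lipocalin in this chapter.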

Research presented here identified many lipid-binding proteins expressed in high abundance. Only one such transcript fit the hypothesized expression patterns used to identify a likely pheromone-binding protein candidate. This candidate is annotated as a lipid-binding protein of the lipocalin family and displays high expression in males while being nearly absent from females. This lipocalin was significantly more abundant in males during the spring mating period than in the summer, but no seasonal difference was detected in females. Based on these findings, this lipocalin (ID: DN1116_c0_g1_i1) is considered a likely candidate for a pheromone-binding protein in T. s. parietalis.

However, this research lacks experimental evidence confirming the molecular function and physiological importance of this candidate protein, both of which are essential to confirming its role as a pheromone-binding protein. Such research is therefore considered the highest priority.


The Harderian gland in avian taxa has often been described as a tissue involved in the immune system (Albini et al. 1974; Deist & Lamont, 2018). On the supposition that the squamate Harderian gland may share this function, gene set enrichment analysis was performed to examine the expression of genes involved in antimicrobial defense in the Harderian gland of T. s. parietalis compared to a multi-tissue pool from the same species (Chapter 2). This analysis showed that gene expression in the Harderian gland is significantly enriched for genes associated with antimicrobial defense, indicating that this tissue functions as part of the immune system in this species, as it does in birds.

Building on these findings, the research in this chapter examined sexual and seasonal variation in the expression of antimicrobial defense proteins in the Harderian gland. Based on differential expression and enrichment analyses, I found no evidence of significant variation in antimicrobial defense associated with sex or season.


References:

Albini, B., Wick, G., Rose, E., Orlans, E. (1974) Immunoglobulin production in chicken Harderian glands. Int Arch Allergy Immunol. 47:23-34.
Aleksiuk, M., Gregory, P. (1974) Regulation of seasonal mating behavior in Thamnophis sirtalis parietalis. Copeia. 1974(3):681-689.
Andrews, S. (2010) FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
Ballouz, S., Pavlidis, P., Gillis, J. (2017) Using predictive specificity to determine when gene set analysis is biologically meaningful. Nucleic Acids Research. 45(4):e20.
Beynon, R., Hurst, J., Turton, M., Robertson, D., Armstrong, S., Cheetham, S., Simpson, D., MacNicoll, A., Humphries, R. (2008) Urinary lipocalins in Rodentia: is there a generic model? In: Chemical Signals in Vertebrates 11. Springer, New York, NY.
Blakemore, L., Dolan, B., Bentz, E., Mason, R. (In Prep.) Sex or survival: tradeoffs between reproduction and anti-microbial defense in the Red-sided garter snake, Thamnophis sirtalis parietalis.
Bourgon, R., Gentleman, R., Huber, W. (2010) Independent filtering increases detection power for high-throughput experiments. Proceedings of the National Academy of Sciences. 107(21):9546-9551.
Burns, R. (1992) The Harderian gland in birds: histology and immunology. In: Harderian glands: porphyrin metabolism, behavioral and endocrine effects. Springer. 155-163.
Buzzell, G. (1996) The Harderian gland: perspectives. Microscopy Research and Technique. 34(1).
Castaldi, P., Cho, M., Liang, L., Silverman, E., Hersh, C., Rice, K., Aschard, H. (2017) Screening for interaction effects in gene expression data. PLoS ONE. 12(3):e0173847.
Chang, H., Liu, Y., Yang, T., Pelosi, P., Dong, S., Wang, G. (2015) Pheromone-binding proteins enhance the sensitivity of olfactory receptors to sex pheromones in Chilo suppressalis. Scientific Reports. 5:13093.
Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., McPherson, A., Wojciech Szcześniak, M., Gaffney, D., Elo, L., Zhang, X., Mortazavi, A. (2016) A survey of best practices for RNA-seq data analysis. Genome Biology. 17:13.


Coto-Montes, A., Boga, J., Tomás-Zapico, C., Rodríguez-Colunga, M., Martínez-Fraga, J., Tolivia-Cadrecha, D., Menéndez, G., Hardeland, R. (2001) Physiological oxidative stress model: Syrian hamster Harderian gland—sex differences in antioxidant enzymes. Free Radical Biology and Medicine. 30(7):785-792.
Cui, Zhou, Satoh, Habara. (2003) A physiological role for protoporphyrin IX photodynamic action in the rat Harderian gland? Acta Physiologica Scandinavica. 179(2):149-154.
David, M., Dzamba, M., Lister, D., Ilie, L., Brudno, M. (2011) SHRiMP2: Sensitive yet practical short read mapping. Bioinformatics. 27(7):1011-1012.
Dawley, E., Crowder, J. (1995) Sexual and seasonal differences in the vomeronasal epithelium of the red-backed salamander (Plethodon cinereus). Journal of Comparative Neurology. 359:382-390.
Dayger, C., Lutterschmidt, D. (2016) Seasonal and sex differences in responsiveness to adrenocorticotropic hormone contribute to stress response plasticity in Red-sided garter snakes (Thamnophis sirtalis parietalis). Journal of Experimental Biology. 219(7):1022.
Deist, M., Lamont, S. (2018) What makes the Harderian gland transcriptome different from other chicken immune tissues? A gene expression comparative analysis. Frontiers in Physiology. 9:492.
Domínguez-Pérez, D., Durban, J., Aguero-Chapin, G., Lopes, J., Molina, Ruiz, R., Almeida, D., Calvete, J., Vasconcelos, V., Antunes, A. (2018) The Harderian gland transcriptomes of Caraiba andreae, Cubophis cantherigerus and Tretanorhinus variabilis, three colubroid snakes from Cuba. Genomics. In press.
Erickson, S. (2007) Sexual dimorphism and seasonal changes in the Harderian gland of the Red-sided garter snake, Thamnophis sirtalis parietalis. Corvallis, OR. Oregon State University.
Flower, D. (1996) The lipocalin protein family: structure and function. Biochemical Journal. 318:1-14.
Gillis, J., Mistry, M., Pavlidis, P. (2010) Gene function analysis in complex data sets using ErmineJ. Nature Protocols. 5(6):1148-1159.
Gregory, P. (1974) Patterns of spring emergence of the Red-sided garter snake (Thamnophis sirtalis parietalis) in the Interlake region of Manitoba. Canadian Journal of Zoology. 52:1063-1069.
Halpern, M., Kubie, J., Silverstein, R., Muller-Schwarze, D. (1983) Snake tongue-flicking behavior: clues to vomeronasal system functions. Chemical Signals III. Plenum, NY. 45-72.


Harkness, E., Ridgway, M. (1980) Chromodacryorrhea in laboratory rats (Rattus norvegicus): etiologic considerations. Laboratory Animal Science. 30:841-844.
Hoh, J., Lin, W., Nadakavukaren, M. (1984) Sexual dimorphism in the Harderian gland proteins of the golden hamster. Comparative Biochemistry and Physiology. 77B:729-731.
Huang, G., Zhang, J., Wang, D., Mason, R., Halpern, M. (2006) Female snake sex pheromone induces membrane responses in vomeronasal sensory neurons of male snakes. Chemical Senses. 31(6):521-529.
Jin, J., Jhang, T., Lui, N., Dong, S. (2014) Different roles suggested by sex-biased expression and pheromone-binding affinity among three pheromone-binding proteins in the pink rice borer, Sesamia inferens (Walker) (Lepidoptera: Noctuidae). Journal of Insect Physiology. 66:71-79.
Jolliffe, I., Cadima, J. (2016) Principal component analysis: a review and recent developments. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences. 374(2065):20150202.
Kondoh, D., Yamamoto, Y., Nakamuta, N., Taniguchi, K. (2012) Seasonal changes in the histochemical properties of the olfactory epithelium and vomeronasal organ in the Japanese striped snake, Elaphe quadrivirgata. Anatomia, Histologia, Embryologia. 41:41-53.
Kubie, J., Vagvolgyi, A., Halpern, M. (1978) The roles of the vomeronasal and olfactory systems in the courtship behavior of male garter snakes. Journal of Comparative and Physiological Psychology. 92:627-641.
LeMaster, M., Mason, R. (2002) Variation in a female sexual attractiveness pheromone controls male mate choice in garter snakes. Journal of Chemical Ecology. 28(6):1269-1285.
LeMaster, M., Mason, R. (2001) Evidence for a female sex pheromone mediating male trailing behavior in the Red-sided garter snake, Thamnophis sirtalis parietalis. Chemoecology. 11:149-152.
Lohman, B., Weber, J., Bolnick, D. (2016) Evaluation of TagSeq, a reliable low-cost alternative for RNA seq. Molecular Ecology Resources. 16(6):1315-1321.
Love, M., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 15:550.
Mason, R., Fales, H., Jones, T., Pannell, L., Chinn, J., Crews, D. (1989) Sex pheromones in snakes. Science. 245(4915):290-293.
Mason, R., Halpern, M. (2011) Chemical ecology of snakes: from pheromones to receptors. North American Society for Comparative Endocrinology Conference. July, 2011.


Meyer, E., Aglyamova, G., Matz, M. (2011) Profiling gene expression responses of coral larvae (Acropora millepora) to elevated temperature and settlement inducers using a novel RNA-Seq procedure. Molecular Ecology. 20:3599-3616.
Minucci, S., Chieffi Baccari, G., Di Matteo, L. (1989) A sexual dimorphism of the Harderian gland of the toad, Bufo viridis. Basic and Applied Histochemistry. 33:299-310.
Moore, I., Mason, R. (2001) Behavioral and hormonal responses to corticosterone in the male Red-sided garter snake, Thamnophis sirtalis parietalis. Physiology & Behavior. 72(5):669-674.
O'Donnell, R., Shine, R., Mason, R. (2004) Seasonal anorexia in the male Red-sided garter snake, Thamnophis sirtalis parietalis. Behavioral Ecology and Sociobiology. 56:413-41.
Pavlidis, P., Lewis, D., Noble, W. (2002) Exploring gene expression data with class scores. Pacific Symposium on Biocomputing. 474-485.
Payne, A. (1994) The Harderian gland: a tercentennial review. Journal of Anatomy. 185(1):1-49.
Payne, A., McGadey, J., Moore, M., Thompson, G. (1977) Androgenic control of the Harderian gland in the male golden hamster. Journal of Endocrinology. 75:73-82.
Rehorek, S. (2000b) Passage of Harderian gland secretions to the vomeronasal organ in Thamnophis sirtalis (Serpentes: Colubridae). Canadian Journal of Zoology. 78(7):1284-1288.
Rehorek, S., Firth, B., Hutchinson, M. (2000a) The structure of the nasal chemosensory system in squamate reptiles. Journal of Biosciences. 25(2):181-190.
Rehorek, S., Halpern, M., Firth, B., Hutchinson, M. (2011) The Harderian gland of two species of snakes: Pseudonaja textilis (Elapidae) and Thamnophis sirtalis (Colubridae). Canadian Journal of Zoology. 81:357-363.
Rumble, S., Lacroute, P., Dalca, A., Fiume, M., Sidow, A. (2009) SHRiMP: Accurate mapping of short color-space reads. PLoS Computational Biology. 5(5):e1000386.
Sakai, T. (1981) The mammalian Harderian gland: morphology, biochemistry, function and phylogeny. Arch. Hist. Jap. 44(4):299-333.
Sashima, M., Hatakey, S., Satoh, M., Suzuki, A. (1989) Harderianization is another sexual dimorphism of rat exorbital lacrimal gland. Acta Anatomica. 135:303-306.
Smith, T., Bhatnagar, K. (2017) Vomeronasal system evolution. Psychology. Encyclopedia of Neuroscience. 461-470.


Uhrig, E. (2015) Reproductive implications of parasitic infections and immune challenges in garter snakes. Ph.D. Thesis. Corvallis, OR. Oregon State University.
Zhao, S., Guo, Y., Sheng, Q., Shyr, Y. (2014) Advanced heat map and clustering analysis using Heatmap3. BioMed Research International. 986048.


Chapter 5:

Variations in the expression of vomeronasal chemosensory receptors by sex and season

Introduction:

The vomeronasal organ is a major chemosensory structure found in many lineages of terrestrial vertebrates (Bertmar, 1980). This organ consists of a cartilaginous or bony capsule lined with a sensory epithelium composed of thousands of densely packed neurons (Døving & Trotier, 1998; Rehorek et al. 2000a). The axons of these sensory neurons project to the accessory olfactory bulb of the brain, where their signals are processed (Halpern, 1987; Rodriguez et al. 1999).

To be detected, chemical signal molecules must be transported from the environment into the lumen of the vomeronasal organ, where they can reach the vomeronasal sensory epithelium (Rehorek et al. 2000a). In squamates, the opening into the vomeronasal organ (the vomeronasal duct) is located inside the mouth. The primary function of the squamate vomeronasal organ is to detect nonvolatile chemical signals (Rehorek et al. 2000b). In snakes, chemical signals are transported from the environment to the vomeronasal duct by tongue-flicking (Halpern, 1987). Snakes have a particularly well-adapted vomeronasal sense, which in many species is the predominant means by which they explore their environment (Brykczynska et al. 2013).

The vomeronasal sensory epithelium contains neurons that each express one of three distinct classes of G-protein coupled receptor: V1R, V2R, or formyl peptide receptors (FPRs) (Rivière et al. 2009; Dulac & Axel, 1995; Francia et al. 2014). At any given time, each sensory neuron in the vomeronasal organ expresses many copies of a receptor encoded by a single receptor gene (Bargmann, 1997; Mazzoni et al. 2004). Because most vomeronasal receptors bind a specific ligand, each neuron in the vomeronasal organ responds to only one chemical cue (or a very narrow class of chemical cues). Complex chemical cues detected by the vomeronasal organ are therefore interpreted by the brain as a distinct pattern of neuronal activation (Rodriguez et al. 1999).
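The coding scheme described above (one receptor type per neuron, narrow ligand specificity, and cue identity carried by the pattern of activated neurons) can be illustrated with a toy model. The sketch below is purely hypothetical: neuron, receptor, and ligand names are placeholders, and binding is reduced to set membership rather than any real affinity.

```python
def activation_pattern(cue, neuron_receptor, receptor_ligands):
    """Return the set of neurons activated by a cue.

    cue: set of ligand names present in the stimulus.
    neuron_receptor: maps each neuron to the single receptor type it expresses.
    receptor_ligands: maps each receptor type to the ligands it binds.
    """
    return {
        neuron
        for neuron, receptor in neuron_receptor.items()
        if receptor_ligands[receptor] & cue  # any shared ligand activates the neuron
    }
```

In this toy model, two mixtures that share one component activate overlapping but distinct sets of neurons, so each mixture is represented by its own activation pattern even though no single neuron identifies either mixture.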

Huang et al. (2006) used patch-clamp electrophysiology to show that the vomeronasal sensory epithelium of male T. s. parietalis responded to female sexual attractiveness pheromone, whereas female vomeronasal sensory epithelium did not. This method is a valuable research tool because it is independent of behavior and therefore does not rely on observations of live animals or interpretations of behavior to draw conclusions. Huang et al. exposed male and female vomeronasal neurons to sex pheromone and recorded the responses of the tissue in vitro. This result provides strong evidence that female vomeronasal neurons do not detect sex pheromone, suggesting that females are physiologically incapable of responding to the pheromone behaviorally. Male tissues in this study did respond to sex pheromone, and, as is often observed, males near the den in the wild respond strongly to very small amounts of sex pheromone (Mason et al. 1990). These results indicate that the physiology of the vomeronasal sensory epithelium itself differs between males and females.

The Red-sided garter snake, T. s. parietalis, has been used in vertebrate research for decades, and its physiology and behavior are well described. This species shows extremely robust seasonal behaviors: mating occurs at the den in early spring immediately following winter brumation, and feeding occurs in the summer (Shine et al. 2011).

During the spring mating period, male snakes actively use their vomeronasal organ to search for females, which they detect by the presence of sexual attractiveness pheromone (Mason et al. 1989). At this time, males use their chemical senses as the sole mechanism by which they detect females and evaluate their quality as potential mates (LeMaster & Mason, 2002). During this period, females are not observed to tongue-flick and do not appear to use their vomeronasal organ to evaluate mates. Neither male nor female snakes hunt or feed during the spring mating period, and they will not take food even if it is offered (Gregory & Stewart, 1975; O’Donnell et al. 2004). When spring mating comes to an end, both male and female snakes migrate to the surrounding marshes to feed. During this summer feeding period, both males and females actively use their vomeronasal organ to search for prey kairomones.

The observations of sexually dimorphic seasonal behavior in T. s. parietalis suggest that the vomeronasal organ in this species is used for very different purposes in the spring than in the summer, and in males than in females. These major differences raise the possibility that receptor repertoires differ by sex and shift by season. This hypothesis is supported by the findings of Huang et al. (2006), which showed that the vomeronasal sensory epithelium of male snakes responds to female sex pheromone whereas that of females does not.

Changes to chemosensory receptor repertoires over time have been observed in multiple studies. Johnstone et al. (2011) reported changes to olfactory receptor repertoires across developmental life stages of Atlantic salmon (Salmo salar). The authors suggest that the shift in olfactory receptor expression may be associated with a shift in the relative importance of olfactory functions at different life stages, possibly reflecting optimization for migration at early and late stages of life. Seasonal variation in the number of sensory neurons was observed in the Japanese toad (Bufo japonicus) (Nakazawa et al. 2009), and the authors noted that the observed neurogenesis was associated with increased sensitivity to specific chemical cues. Shiao et al. (2012) used transcriptomic analyses to show that the expression of olfactory receptors in mice is sexually dimorphic and likely results in differences between male and female mice in their ability to detect chemical cues. Dawley & Crowder (1995) observed significant seasonal and sexual dimorphism in the vomeronasal sensory epithelium of the red-backed salamander (Plethodon cinereus). The seasonal nature of this species, and the different ways in which it uses its vomeronasal organ in different seasons, prompted the authors to suggest that the observed seasonal neurogenesis in the vomeronasal sensory epithelium may be due to the production of specialized receptors used for mating and feeding at different times.

These observations show that chemical sensory systems are often physiologically optimized for the requirements of the organism at any given time. Because T. s. parietalis is observed to show extremely robust seasonal shifts in behavior accompanied by a shift in the importance of specific chemical cues (sex pheromone in spring shifting to prey kairomones in summer), it was hypothesized that seasonal variations and sexual dimorphism of the vomeronasal receptor repertoire exist in order to optimize the abilities of the vomeronasal chemosensory system in this species.

Research Aims:

Aim 1) Research Question: Does the receptor profile of T. s. parietalis differ by sex? A number of studies have observed differential responses to chemical cues by sex or by season (Kondoh et al. 2012; Nakazawa et al. 2009; Hamdani et al. 2008). The Red-sided garter snake is an excellent example of this phenomenon, displaying extreme variation by both sex and season in its responses to chemical cues (LeMaster & Mason, 2001; Lutterschmidt & Maine, 2014). Male snakes display robust and easily identifiable courtship behavior when exposed to sex pheromones in the spring, but do not respond to prey kairomones at that time. In the summer, the same animals cease responding to sex pheromone but show robust prey-trailing and feeding behavior based on chemical cues alone. Female snakes do not appear to respond to sex pheromone or prey kairomones in the spring but respond strongly to prey cues in the summer (Gregory & Stewart, 1975). These observations led to the hypothesis that male and female snakes express vomeronasal receptor repertoires that differ from each other or differ between the spring mating period and the summer feeding period.


To investigate this hypothesis, I performed high-throughput RNA-seq (Tag-seq) on vomeronasal organ tissues from males and females in both spring and summer and examined the differential expression of vomeronasal receptors to identify receptors differentially expressed by sex.
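Before receptor expression can be compared between sexes or seasons, Tag-seq counts must be normalized for differences in sequencing depth between libraries. Assuming the counts are analyzed with DESeq2 (Love et al. 2014, cited in the references), the default normalization is the median-of-ratios method. The sketch below closely follows that idea in plain Python for illustration only; it is not DESeq2's code, and the function names are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of genes, each a list of raw counts per sample (same sample order).
    Genes with a zero count in any sample are skipped, since their geometric mean is zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no genes with nonzero counts in every sample")
    # Log geometric mean of each gene across samples (the pseudo-reference sample).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Median over genes of log(count / geometric mean), then back-transform.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]
```

Using the median rather than the total count keeps a few very highly expressed transcripts from dominating the scaling, which matters in a tissue where a small number of genes may account for a large share of reads.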

Aim 2) Research Question: Does the receptor profile of T. s. parietalis differ by season? Is there evidence that seasonal variation in vomeronasal receptor profiles mediates the behavioral shift from mating to feeding in male Red-sided garter snakes? The response to chemical cues by T. s. parietalis has often been observed to differ by season. Male snakes display a particularly robust shift in behavior – responding only to sex pheromone during the spring mating period and responding only to prey kairomones during the summer feeding period (O’Donnell et al. 2004).

These observations have guided several researchers to investigate the source of this behavioral shift, and several potentially involved mechanisms have been described. Uhrig et al. (2012) suggested that seasonal variation in the production of sex pheromone in the skin of female snakes may be a mechanism mediating the shift in male behavior. The authors found that pheromone production in females decreased, and females became less attractive to males, two weeks after emergence from hibernation, suggesting that pheromone concentration or quality may play a role in the observed behavioral shift. However, more recent observations have shown that the behavioral shift in males often occurs independently of the decrease in female attractiveness, suggesting that pheromone production is not the primary mechanism mediating the behavioral shift (Dayger & Lutterschmidt, 2016).

An alternative mechanism proposes that pheromone-binding proteins produced in the Harderian gland of T. s. parietalis bind and solubilize the sex pheromone, allowing it to enter the fluid filling the vomeronasal lumen. This hypothesis is examined more closely here and in Chapters 3 and 4; however, little evidence was found that seasonal expression of pheromone-binding proteins could be responsible for the behavioral shift.

A third proposed mechanism is that the seasonal difference in male responses to sex pheromone arises at the neurological level: the same signal is interpreted differently in different seasons, rather than being detected less effectively (Lutterschmidt & Maine, 2014).

If the observed behavioral shift in male snakes is caused neither by a decrease in sex pheromone production in females nor by a decrease in expression of a pheromone-binding protein in the male Harderian gland, it is likely that the shift either arises at the neurological level or is mediated by pheromone detection in the vomeronasal organ itself.

Here, I address the hypothesis that the vomeronasal organ displays seasonal differential expression of pheromone receptors mediating the male behavioral shift by allowing males to detect the pheromone in the spring, while decreasing their ability to detect this signal in the summer.

Aim 3) Describe evidence for a specific pheromone receptor expressed in male T. s. parietalis. Because of the extremely strong seasonal and sexually dimorphic behaviors observed in T. s. parietalis, it is clear that the vomeronasal organ of this group detects very different chemical cues in males and females, and in spring and summer. Pheromone receptors often exhibit sex-biased expression (Touhara & Vosshall 2009; Wanner et al. 2010; Alekseyenko et al. 2006), and Huang et al. (2006) showed that the vomeronasal organ of male T. s. parietalis responds to sex pheromone whereas that of females does not. Based on these findings, the receptors responsible for detecting sex pheromone in this group may exhibit sexual dimorphism. It was hypothesized that the vomeronasal sensory epithelium of male T. s. parietalis expresses chemical receptor proteins responsible for the detection of female sexual attractiveness pheromone, whereas females lack these proteins. To address this hypothesis, I performed high-throughput RNA-seq (Tag-seq) to examine the expression of vomeronasal receptors in male and female T. s. parietalis and to identify candidate pheromone receptor proteins present in males but absent from females.

Methods:

Animal collection and tissue collection:

A total of 32 adult Red-sided garter snakes (T. s. parietalis; n = 9 spring males, n = 7 summer males, n = 8 spring females, n = 8 summer females) were collected from the wild near the town of Inwood in the Interlake region of Manitoba, Canada (50°31'28"N, 97°30'00"W). Spring snakes were collected from active mating aggregations in April 2016; only actively courting males and females being courted were collected. Summer snakes were collected from the surrounding marshland (summer feeding areas) in July 2016. All snakes were housed in outdoor nylon cloth arenas (1 × 1 × 1 m) for 24 hours after capture before tissue collection. Immediately prior to tissue collection, snakes were individually euthanized via an overdose of methohexital sodium (Brevital™; 0.005 mL/g of body mass, 1% solution). Euthanized snakes were placed on a workspace under a Jena dissecting scope, and intact vomeronasal organs were removed with forceps and corneoscleral scissors. Before and between dissections, both the workspace and the surgical instruments were cleaned to prevent cross-contamination of samples. The workspace was cleaned with sterile saline followed by a 10% bleach solution, then treated with RNase AWAY (Ambion). Surgical instruments were cleaned with sterile saline, aseptically treated with a 10% bleach solution and 100% ethanol, then placed in a Germinator 500™ dry sterilizer for >15 seconds. Dissected vomeronasal organs were rinsed briefly with nuclease-free water to remove excess blood or surface contaminants, then immediately placed in RNAlater™ RNA stabilization reagent (Invitrogen). Tissues were stored at 4 °C for 24 hours to allow the RNAlater™ to fully permeate the tissue, then moved to -20 °C for long-term storage until transport. Tissues were transported to Oregon State University on ice (~0 °C), then stored at -20 °C until RNA extraction.
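
As a worked conversion of the euthanasia dose, a minimal sketch assuming that "1% solution" means 1% w/v (10 mg of methohexital per mL): 0.005 mL/g corresponds to 0.05 mg/g, or 50 mg/kg. The 40 g body mass in the example is hypothetical and not a value from this study.

```python
# Worked conversion of the euthanasia dose stated in the Methods.
# Assumption: a "1% solution" is 1% w/v, i.e. 10 mg methohexital per mL.

ML_PER_G = 0.005   # injected volume per gram of body mass (from the Methods)
MG_PER_ML = 10.0   # concentration of a 1% w/v solution (assumption)


def dose_volume_ml(body_mass_g: float, ml_per_g: float = ML_PER_G) -> float:
    """Volume of 1% solution to inject for a snake of the given mass."""
    return body_mass_g * ml_per_g


def dose_mg_per_kg(ml_per_g: float = ML_PER_G, mg_per_ml: float = MG_PER_ML) -> float:
    """Dose expressed as mg of drug per kg of body mass."""
    return ml_per_g * mg_per_ml * 1000.0  # mg/g -> mg/kg


if __name__ == "__main__":
    mass_g = 40.0  # hypothetical body mass, for illustration only
    print(f"{dose_volume_ml(mass_g):.2f} mL for a {mass_g:.0f} g snake")
    print(f"{dose_mg_per_kg():.0f} mg/kg")
```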

RNA Extraction and Sequencing:

Tissues were removed from RNA preservation reagent and immediately homogenized mechanically. Samples were centrifuged and the liquid supernatant was collected to remove the bony, non-homogenized material of the vomeronasal capsule. Total RNA was extracted with E.Z.N.A.® HP Total RNA Kits (Omega Bio-Tek). A sample of each RNA extract was analyzed on a polyacrylamide gel and visualized using ethidium bromide and ultraviolet light to confirm that high-quality, intact RNA was present. RNA concentrations were estimated from absorbance at 260 nm using a SpectraMax™ M3 with a SpectraDrop™ microvolume microplate (Molecular Devices, LLC, San Jose, CA). Total RNA was diluted to a concentration of ~111 ng/µL in all samples. A total of 1 µg of RNA from each sample was fragmented by heating at 95 °C for 45 minutes, producing a distribution of RNA fragments of ~100-500 bp as assessed by polyacrylamide gel electrophoresis. Complementary DNA (cDNA) was reverse transcribed from the fragmented RNA and enriched for messenger RNA using Tetro™ Reverse Transcriptase (Bioline) with primer oligonucleotides designed to target mature messenger RNAs containing poly-A tails. This cDNA was then used to prepare a sequencing library following the protocol described in Meyer et al. (2011) and Lohman et al. (2016). Primer sequences used during library preparation are reported in Appendix A.2. The prepared library was submitted to the Oregon State University Center for Genome Research and Biocomputing (CGRB), and high-throughput sequencing was performed on the Illumina HiSeq 3000 platform (1 lane, 50 bp single-end reads).
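
The working dilution is consistent with a fixed 1 µg input: at ~111 ng/µL, 1 µg is contained in ~9 µL (1000 / 111 ≈ 9.0). The sketch below shows this arithmetic and a standard C1V1 = C2V2 dilution; the stock concentration and final volume used in the example are hypothetical.

```python
# Sketch of the RNA input arithmetic; stock values used below are hypothetical.
TARGET_NG = 1000.0     # 1 µg of total RNA per sample (from the Methods)
WORKING_CONC = 111.0   # ng/µL after dilution (from the Methods)


def input_volume_ul(target_ng: float = TARGET_NG,
                    conc_ng_per_ul: float = WORKING_CONC) -> float:
    """Volume of diluted RNA that contains target_ng of RNA."""
    return target_ng / conc_ng_per_ul


def dilution_volumes(stock_ng_per_ul: float, final_volume_ul: float,
                     target_conc: float = WORKING_CONC) -> tuple[float, float]:
    """Return (µL of stock, µL of nuclease-free water) using C1V1 = C2V2."""
    if stock_ng_per_ul < target_conc:
        raise ValueError("stock is already below the working concentration")
    stock_ul = target_conc * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    print(f"{input_volume_ul():.1f} µL holds 1 µg at 111 ng/µL")
    stock, water = dilution_volumes(stock_ng_per_ul=500.0, final_volume_ul=20.0)
    print(f"mix {stock:.2f} µL stock with {water:.2f} µL water")
```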

Raw Data Processing and Quality Control:

Raw reads were trimmed to remove adapter sequence based on the presence of a “GGG” motif at the start of the read using the custom script “TagTrimmer.pl”. Trimmed reads were removed from the dataset if they contained homopolymer repeats of 15 or more bases using the custom script “HRFilterFastq.pl”. Trimmed and homopolymer-filtered reads were then quality trimmed and filtered using BBDuk (Joint Genome Institute) to trim low-quality sequence and remove reads still containing adapter sequence. BBDuk was run with the flags ktrim=r k=23 mink=10 hdist=1 qtrim=rl trimq=15 minlen=35, which removes reads shorter than 35 bp after quality trimming. Reads passing quality filters were examined using FastQC (version 0.11.3) to ensure that a high-quality dataset was used in subsequent steps (Andrews, 2010).
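
The read-level filters described above can be sketched as follows. This is a hypothetical re-implementation written for illustration, not the TagTrimmer.pl or HRFilterFastq.pl scripts themselves: it assumes the “GGG” motif appears as a leading run of three or more G's, treats a homopolymer repeat as a run of at least 15 identical bases, and applies only the BBDuk minlen=35 length cutoff (BBDuk's adapter and quality trimming are not reproduced).

```python
import re
from typing import Optional

# Hypothetical re-implementation for illustration; NOT the TagTrimmer.pl /
# HRFilterFastq.pl scripts used in the study, whose exact rules may differ.
LEADING_G = re.compile(r"^G{3,}")              # assumed "GGG" leader at read start
HOMOPOLYMER = re.compile(r"([ACGTN])\1{14,}")  # run of >= 15 identical bases
MIN_LEN = 35                                   # mirrors BBDuk minlen=35


def trim_leading_g(seq: str, qual: str) -> tuple[str, str]:
    """Remove a leading run of three or more G's and the matching quality scores."""
    match = LEADING_G.match(seq)
    if match is None:
        return seq, qual
    return seq[match.end():], qual[match.end():]


def process_read(seq: str, qual: str) -> Optional[tuple[str, str]]:
    """Trim the G leader, then drop homopolymer-rich or too-short reads."""
    seq, qual = trim_leading_g(seq, qual)
    if HOMOPOLYMER.search(seq) or len(seq) < MIN_LEN:
        return None  # read is filtered out
    return seq, qual
```

Trimming the leader before testing for homopolymers mirrors the order of steps in the text, so a leading G run cannot by itself cause a read to be discarded.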

Read Mapping

Processed reads were mapped to the multi-tissue reference transcriptome (see Chapter 2 for a full description of transcriptome methods). Reads contained in separate files (.fastq) were mapped using the short-read mapper SHRiMP2 (gmapper; Rumble et al. 2009; David et al. 2011). gmapper was run with the flags “--qv-offset 33” (quality-score encoding appropriate for current Illumina™ sequencing platforms), “-Q” (reads are in .fastq format), “--strata -o 3” (report only the highest-scoring mappings for each read, to a maximum of 3), “-N 4 -K 10000” (use 4 threads, each processing blocks of 10,000 reads) and “-L” (perform local rather than global mapping). The resulting sequence alignment map file (.sam) was then filtered to retain only high-quality, informative read mappings using the custom script “SAMFilterByGene.pl” (Meyer, GitHub; Appendix U.7). Read mappings were retained if the local alignment between the read and the reference contained at least 27 matches. Reads were counted as the number of reads uniquely assigned to a single reference sequence at the level of the Trinity subcomponent. Read mapping counts were compiled into a tab-delimited counts matrix containing all samples and all reference sequences with at least one valid mapping using the custom script “CombineExpression.pl” (Meyer, GitHub; Appendix U.8).
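
The mapping filter and unique-assignment count can be illustrated with a minimal sketch over SAM text lines. It is not SAMFilterByGene.pl. It assumes that “matches” means aligned bases minus mismatches (using the SAM NM tag, which counts mismatches plus inserted and deleted bases), that a read lacking an NM tag has no mismatches, and that a Trinity subcomponent is the Trinity ID without its isoform suffix (for example, TRINITY_DN1_c0_g1_i2 becomes TRINITY_DN1_c0_g1). A read is counted only when all of its retained alignments fall within one subcomponent.

```python
import re
from collections import Counter, defaultdict
from typing import Iterable

# Hypothetical sketch of the filtering/counting logic described above; NOT
# SAMFilterByGene.pl. Gene IDs assume Trinity-style names with an "_i<N>" suffix.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
MIN_MATCHES = 27


def gene_id(reference: str) -> str:
    """Collapse a Trinity isoform ID to its parent subcomponent (assumption)."""
    return re.sub(r"_i\d+$", "", reference)


def matched_bases(cigar: str, edit_distance: int) -> int:
    """Matches = aligned bases - mismatches, where NM = mismatches + I + D."""
    aligned = indels = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op in "M=X":
            aligned += int(length)
        elif op in "ID":
            indels += int(length)
    mismatches = max(edit_distance - indels, 0)
    return aligned - mismatches


def count_unique_reads(sam_lines: Iterable[str]) -> Counter:
    """Count reads whose retained alignments all fall in a single subcomponent."""
    genes_per_read = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, cigar = fields[0], int(fields[1]), fields[2], fields[5]
        if flag & 4 or rname == "*":
            continue  # unmapped read
        # Missing NM tag: assume no mismatches.
        nm = next((int(t[5:]) for t in fields[11:] if t.startswith("NM:i:")), 0)
        if matched_bases(cigar, nm) >= MIN_MATCHES:
            genes_per_read[qname].add(gene_id(rname))
    return Counter(next(iter(g)) for g in genes_per_read.values() if len(g) == 1)
```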

Differential Expression Analysis

Differential expression analysis was conducted using the R package DESeq2 (Love et al. 2014). Inputs were a tab-delimited matrix of read mapping counts (described above) and a sample key file associating samples with treatment groups. Counts were filtered to retain reference sequences with a mean of ≥ 1 valid mapping per sample. A full DESeq2 model was created including two factors with two levels each and their interaction: “sex” (male or female), “season” (spring or summer), and the “sex by season” interaction term.
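
Compiling per-sample counts into a matrix and applying the mean ≥ 1 filter can be sketched as below. The function names and sample labels are hypothetical and the sketch is not CombineExpression.pl; it assumes that a gene absent from a sample's counts has zero reads in that sample.

```python
from collections import Counter

# Sketch: combine per-sample gene counts into a genes x samples matrix, then
# keep genes with a mean of at least 1 count per sample. Names are hypothetical.


def combine_counts(per_sample: dict[str, Counter]) -> tuple[list[str], dict[str, list[int]]]:
    """Return sorted sample names and a gene -> per-sample counts mapping."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    matrix = {g: [per_sample[s].get(g, 0) for s in samples] for g in genes}
    return samples, matrix


def filter_by_mean(matrix: dict[str, list[int]], min_mean: float = 1.0) -> dict[str, list[int]]:
    """Keep genes whose mean count across samples is at least min_mean."""
    return {g: row for g, row in matrix.items() if sum(row) / len(row) >= min_mean}


def to_tsv(samples: list[str], matrix: dict[str, list[int]]) -> str:
    """Render the matrix as tab-delimited text with a header row."""
    lines = ["gene\t" + "\t".join(samples)]
    lines += [g + "\t" + "\t".join(map(str, row)) for g, row in matrix.items()]
    return "\n".join(lines)
```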

Preliminary examination of the counts data was performed using principal components analysis (PCA) and hierarchical clustering visualization of z-scores. Before visualization, counts from the full model were transformed using the regularized log transformation function included in DESeq2. A PCA plot including all four combinations of factor levels (spring/male, summer/male, spring/female, and summer/female) was generated using the DESeq2 function “plotPCA”.

Differential expression analyses were conducted using a two-factor design including sex, season, and the sex-by-season interaction (Figure 5.1). Additional differential expression analyses were conducted using single-factor designs comparing the two sexes within each season and the two seasons within each sex (Figure 5.2).

To test the main effects of sex and season across all samples, a full DESeq2 model without the interaction term was created. The effects of sex and season were each tested against a reduced model lacking the sex or season term, respectively, using negative binomial likelihood ratio tests.

To test the effects of the interaction term, a full model was created including sex, season and sex-by-season interaction. This model was compared against a reduced model including the factors of sex and season but lacking the interaction term using a negative binomial likelihood ratio test.
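
Each likelihood ratio test above compares nested models that differ by one column of the design matrix, so each test has one degree of freedom. The sketch below builds design-matrix rows and converts a log-likelihood difference into a chi-square p-value. It assumes R's default alphabetical factor levels (female and spring as reference levels), and the log-likelihoods in the test are arbitrary; it illustrates the test itself, not DESeq2's fitting of the negative binomial model.

```python
import math

# Sketch of the model structure behind the likelihood ratio tests. Assumption:
# R's default alphabetical factor levels, so "female" and "spring" are the
# reference levels and the dummy columns encode male and summer.


def design_row(sex: str, season: str, interaction: bool = True) -> list[int]:
    """One row of the design matrix: intercept, male, summer[, male:summer]."""
    male = int(sex == "male")
    summer = int(season == "summer")
    row = [1, male, summer]
    if interaction:
        row.append(male * summer)
    return row


def lrt_pvalue(loglik_full: float, loglik_reduced: float, df: int = 1) -> tuple[float, float]:
    """LRT statistic 2 * (llF - llR) and its chi-square p-value (df = 1 only)."""
    if df != 1:
        raise NotImplementedError("this sketch only handles one dropped term")
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

The degrees of freedom equal the difference in column counts, for example `len(design_row("male", "summer")) - len(design_row("male", "summer", interaction=False))`, which is 1 for the interaction test; dropping the sex or season column from the additive model likewise removes one column.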

Raw p-values were corrected for multiple testing using the Benjamini-Hochberg procedure to control the false discovery rate. Independent filtering by DESeq2 was allowed in order to exclude from multiple test correction those reference sequences whose mean normalized counts fell below the optimal threshold (Bourgon et al. 2010). A standard threshold of adjusted p ≤ 0.05 was used to assign significance in all differential expression analyses.
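
A minimal sketch of the Benjamini-Hochberg adjustment follows, with sequences removed by independent filtering represented as None so that they do not count toward the number of tests m. It is a generic implementation for illustration, not DESeq2's internal code.

```python
from typing import Optional, Sequence


def bh_adjust(pvalues: Sequence[Optional[float]]) -> list[Optional[float]]:
    """Benjamini-Hochberg adjusted p-values.

    Entries set to None (e.g. removed by independent filtering) are excluded
    from the number of tests m and stay None in the output.
    """
    tested = [i for i, p in enumerate(pvalues) if p is not None]
    m = len(tested)
    order = sorted(tested, key=lambda i: pvalues[i])  # ascending raw p-values
    adjusted: list[Optional[float]] = [None] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Reducing m is what makes independent filtering useful: `bh_adjust([0.01, None, 0.04])` keeps the second tested value at 0.04, whereas `bh_adjust([0.01, 0.9, 0.04])`, which also tests a low-information sequence with p = 0.9, raises it to 0.06, above the 0.05 threshold.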

After differential expression was analyzed for all transcripts (using dispersion information from all transcripts), vomeronasal V2R receptor transcripts were extracted and examined to describe patterns of variation in the expression of vomeronasal receptor transcripts excluding all other reference sequences.

Results of differential expression analyses were visualized using the R packages “EnhancedVolcano” (Blighe, 2018) and “heatmap3” (Zhao et al. 2014). Volcano plots show the distribution of differential expression results on axes of log2 fold change and adjusted p-value. Heatmaps depict hierarchically clustered z-scores calculated from regularized log transformed read mapping counts of transcripts found to be differentially expressed.
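
The two visual encodings can be sketched as follows: heatmap rows are scaled to z-scores (each transcript centered on its mean and divided by its standard deviation across samples), and each volcano-plot point is placed at (log2 fold change, -log10 adjusted p-value). The sketch assumes the sample standard deviation (n - 1 denominator, as in R's scale()); the exact scaling options used with heatmap3 are an assumption here.

```python
import math
import statistics


def row_zscores(matrix: list[list[float]]) -> list[list[float]]:
    """Center and scale each transcript (row) across samples."""
    scaled = []
    for row in matrix:
        mean = statistics.mean(row)
        sd = statistics.stdev(row)  # sample SD (n - 1 denominator)
        # A constant row has no spread; map it to zeros instead of dividing by 0.
        scaled.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return scaled


def volcano_point(log2_fc: float, padj: float, alpha: float = 0.05) -> tuple[float, float, bool]:
    """Coordinates of one transcript on a volcano plot and its significance."""
    return log2_fc, -math.log10(padj), padj <= alpha
```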

Figure 5.1 Two-factor comparisons of gene expression in vomeronasal organs: Comparisons made using the two-factor model including Sex and Season. Comparisons are made across all samples (n = 9 spring males, n = 7 summer males, n = 8 spring females, and n = 8 summer females).

Figure 5.2 Single-factor comparisons of gene expression in vomeronasal organs: Comparisons made using single-factor model combinations of Sex and Season. Comparisons are made between individual groups: n = 9 spring males, n = 7 summer males, n = 8 spring females, n = 8 summer females.

Results:

Processing and read mapping.

The raw data from 2 lanes of Illumina HiSeq® yielded 216,326,941 50 bp single end reads. After quality filtering, 153,847,063 high quality reads remained. The mean reads/sample was ~4.8M with a range of 3,610,129 – 6,083,287 reads per sample.

A total of 118,126,475 quality-filtered reads mapped to the reference, with an average mapping rate of ~76.8% per sample (range: 73.73% – 79.84%). A total of 77,971 reference transcripts had at least one mapped read.
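
The read-processing summary can be checked with simple arithmetic using only the values reported above. The pooled mapping rate (all mapped reads over all filtered reads) is not the same quantity as the mean of per-sample mapping rates, although here both round to 76.8%.

```python
# Arithmetic check of the read-processing summary, using only reported values.
RAW_READS = 216_326_941
FILTERED_READS = 153_847_063
MAPPED_READS = 118_126_475
N_SAMPLES = 32  # 9 + 7 + 8 + 8 libraries

retained_fraction = FILTERED_READS / RAW_READS       # share of reads passing QC
mean_reads_per_sample = FILTERED_READS / N_SAMPLES   # average library size after QC
pooled_mapping_rate = MAPPED_READS / FILTERED_READS  # all samples pooled

if __name__ == "__main__":
    print(f"retained after QC: {retained_fraction:.1%}")
    print(f"mean reads/sample: {mean_reads_per_sample / 1e6:.2f} M")
    print(f"pooled mapping rate: {pooled_mapping_rate:.1%}")
```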

Exploratory analyses of counts data:

Exploratory analysis of counts data using principal components analysis showed that the variation between samples corresponded with the factor levels (Figure 5.3). All combinations of factor levels formed non-overlapping clusters when plotted on PC1 and PC2 using regularized log transformed counts. PC1 explained ~18% of overall variation and PC2 explained ~12%. A PCA plot including only transcripts annotated as V2R vomeronasal receptors also showed non-overlapping clusters, with PC1 explaining 13% and PC2 explaining 7% of variation.

Figure 5.3 Principal components analysis of gene expression in the vomeronasal organ: Principal components analyses produced from the top 2000 expressed transcripts (left) and all transcripts predicted to encode V2R vomeronasal receptors (right) expressed in the vomeronasal organs of T. s. parietalis from n = 9 spring males, n = 7 summer males, n = 8 spring females, and n = 8 summer females. Regularized log

#1 Differential Expression by Sex:

Differential expression analysis of vomeronasal organs comparing males to females across all samples included 43,422 reference sequences with non-zero normalized read counts. A total of 1,626 transcripts (3.7%) displayed significantly increased abundances in males compared to females, and 1,420 transcripts (3.3%) displayed significantly increased abundances in females compared to males (adjusted p-value ≤ 0.05) (Figure 5.4; Table 5.1). Independent filtering by DESeq2 identified 74 reference sequences as outliers and 5,023 reference sequences with low normalized counts; 38,325 sequences were included in multiple test correction calculations.

#2 Differential Expression by Season:

Differential expression analysis of vomeronasal organs comparing spring to summer across all samples included 43,422 reference sequences with non-zero normalized read counts. A total of 4,011 transcripts (9.2%) displayed significantly increased abundances in spring compared to summer, and 3,049 transcripts (7.0%) displayed significantly increased abundances in summer compared to spring (adjusted p-value ≤ 0.05) (Figure 5.4; Table 5.1). Independent filtering by DESeq2 identified 74 reference sequences as outliers and 1,663 reference sequences with low normalized counts; 41,685 sequences were included in multiple test correction calculations.

#3 Differential Expression by Sex-by-Season Interaction:

Differential expression analysis of vomeronasal organs examining the effect of the sex-by-season interaction across all samples included 43,422 reference sequences with non-zero normalized read counts. A total of 763 transcripts (1.8%) displayed significantly increased abundances and 746 transcripts (1.7%) displayed significantly decreased abundances associated with the sex-by-season interaction term (adjusted p-value ≤ 0.05) (Figure 5.4; Table 5.1). Independent filtering by DESeq2 identified 62 reference sequences as outliers and 13,436 reference sequences with low normalized counts; 29,924 sequences were included in multiple test correction calculations.
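
The filtering counts reported for the three two-factor analyses can be checked for internal consistency: outliers, low-count sequences and tested sequences should sum to the 43,422 sequences analyzed. The sketch also shows that the reported percentages use all 43,422 sequences as the denominator, not only the sequences included in multiple test correction.

```python
# Bookkeeping check for the two-factor analyses, using only the reported counts.
TOTAL = 43_422  # reference sequences with non-zero normalized counts

# (outliers, low-count filtered, included in multiple test correction)
FILTERING = {
    "sex": (74, 5_023, 38_325),
    "season": (74, 1_663, 41_685),
    "sex_by_season": (62, 13_436, 29_924),
}
# Significant transcripts as reported: (males, females), (spring, summer), (up, down)
SIGNIFICANT = {
    "sex": (1_626, 1_420),
    "season": (4_011, 3_049),
    "sex_by_season": (763, 746),
}


def percent_of_total(n: int, total: int = TOTAL) -> float:
    """Percentage of all analyzed sequences, rounded to one decimal place."""
    return round(100.0 * n / total, 1)


if __name__ == "__main__":
    for term, parts in FILTERING.items():
        first, second = SIGNIFICANT[term]
        print(f"{term}: filtering sums to {sum(parts)}; "
              f"{percent_of_total(first)}% / {percent_of_total(second)}% of {TOTAL}")
```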

Figure 5.4 Differential expression of all genes expressed in the vomeronasal organ of T. s. parietalis by sex, season, and sex-by-season interaction: Figures show results of two-factor differential expression analyses including all genes expressed in the vomeronasal organs of T. s. parietalis from n = 9 spring males, n = 7 summer males, n = 8 spring females and n = 8 summer females. Transcripts are plotted by Log2 fold change and -Log10 adjusted p-values. Points above the p-value cutoff line (orange) are significantly differentially expressed (adjusted p-value ≤ 0.05). Differential expression analyses were performed using DESeq2 (Love et al. 2014). Volcano plots were generated using the R package ‘EnhancedVolcano’ (Blighe, 2018). *Note differences in scale between figures.

#4 Differential Expression: Spring Male vs Spring Female:

Differential expression analysis of vomeronasal organs comparing spring males to spring females included 44,495 reference sequences with non-zero normalized read counts. A total of 2,061 transcripts (4.6%) displayed significantly increased abundances in spring males compared to spring females, and 1,987 transcripts (4.5%) displayed significantly increased abundances in spring females compared to spring males (adjusted p-value ≤ 0.05) (Figure 5.5, Table 5.1). Independent filtering by DESeq2 identified 0 reference sequences as outliers and 1,987 reference sequences with low normalized counts; 37,593 sequences were included in multiple test correction calculations.

#5 Differential Expression: Summer Male vs Summer Female:

Differential expression analysis of vomeronasal organs comparing summer males to summer females included 42,185 reference sequences with non-zero normalized read counts. A total of 686 transcripts (1.6%) displayed significantly increased abundances in summer males compared to summer females, and 296 transcripts (0.7%) displayed significantly increased abundances in summer females compared to summer males (adjusted p-value ≤ 0.05) (Figure 5.5, Table 5.1). Independent filtering by DESeq2 identified 0 reference sequences as outliers and 17,998 reference sequences with low normalized counts; 24,192 sequences were included in multiple test correction calculations.

#6 Differential Expression: Spring Male vs Summer Male:

Differential expression analysis of vomeronasal organs comparing spring males to summer males included 42,939 reference sequences with non-zero normalized read counts. A total of 1,225 transcripts (2.9%) displayed significantly increased abundances in spring males compared to summer males, and 892 transcripts (2.1%) displayed significantly increased abundances in summer males compared to spring males (adjusted p-value ≤ 0.05) (Figure 5.5, Table 5.1). Independent filtering by DESeq2 identified 0 reference sequences as outliers and 16,651 reference sequences with low normalized counts; 26,289 sequences were included in multiple test correction calculations.

#7 Differential Expression: Spring Female vs Summer Female:

Differential expression analysis of vomeronasal organs comparing spring females to summer females included 43,990 reference sequences with non-zero normalized read counts. A total of 3,277 transcripts (7.4%) displayed significantly increased abundances in spring females compared to summer females, and 3,151 transcripts (7.2%) displayed significantly increased abundances in summer females compared to spring females (adjusted p-value ≤ 0.05) (Figure 5.5, Table 5.1). Independent filtering by DESeq2 identified 0 reference sequences as outliers and 16,651 reference sequences with low normalized counts; 9,383 sequences were included in multiple test correction calculations.

Figure 5.5 Differential expression of all genes expressed in the vomeronasal organ of T. s. parietalis by Sex or Season: Figures show results of single-factor differential expression analyses including all genes expressed in the vomeronasal organs of T. s. parietalis from n = 9 spring males, n = 7 summer males, n = 8 spring females and n = 8 summer females. Transcripts are plotted by Log2 fold change and -Log10 adjusted p-values. Points above the p-value cutoff line (orange) are significantly differentially expressed (adjusted p-value ≤ 0.05). Differential expression analyses were performed using DESeq2 (Love et al. 2014). Volcano plots were generated using the R package ‘EnhancedVolcano’ (Blighe, 2018). *Note differences in scale between figures.

Table 5.1 Differential expression of all transcripts expressed in vomeronasal organ: Results of differential expression analyses including all genes expressed in the vomeronasal organs of T. s. parietalis from n = 9 spring males, n = 7 summer males, n = 8 spring females and n = 8 summer females. Two-factor comparisons show the results of differential expression analyses by Sex, Season and the Sex-by-Season interaction. Single-factor comparisons are made between samples within a single sex or season. ‘Total transcripts’ represents the number of transcripts included in each analysis (with a mean of at least 1 read mapping count per sample). ‘Sig. D.E. transcripts’ shows the number of significantly differentially expressed transcripts in each comparison, the proportion of the total number of transcripts that was differentially expressed, and the group in which significantly increased transcript abundance was detected. Differential expression analyses were performed using DESeq2 (Love et al. 2014).

Comparison | Total transcripts | Sig. D.E. transcripts

Two-Factor Comparisons
Sex | 43,422 | 1,626 (3.7%) (Males); 1,420 (3.3%) (Females)
Season | 43,422 | 4,011 (9.2%) (Spring); 3,049 (7.0%) (Summer)
Sex by Season | 43,422 | 763 (1.8%) (Up); 746 (1.7%) (Down)

Single-Factor Comparisons
Spr. Male vs Spr. Female | 44,495 | 2,061 (4.6%) (Spr. Males); 1,987 (4.5%) (Spr. Females)
Su. Male vs Su. Female | 42,195 | 686 (1.6%) (Su. Males); 296 (0.7%) (Su. Females)
Spr. Male vs Su. Male | 42,939 | 1,225 (2.9%) (Spr. Males); 892 (2.1%) (Su. Males)
Spr. Female vs Su. Female | 43,990 | 3,277 (7.4%) (Spr. Females); 3,151 (7.2%) (Su. Females)

Figure 5.6 Differentially expressed transcripts in the vomeronasal organ of T. s. parietalis: Counts of transcripts found to be differentially expressed by the comparison indicated. Differential expression was determined using DESeq2 (Love et al. 2014) from: n = 9 spring males, n = 7 summer males, n = 8 spring females, and n = 8 summer females. Bars represent counts of transcripts displaying significantly increased abundances in the group indicated by each comparison. Comparisons 1 & 2 show differential expression by sex and season, respectively, across all samples using a two-factor design. Comparison 3 shows counts of transcripts differentially expressed by the interaction of sex and season. Comparisons 4 - 7 show the number of

#1 Differential Expression of V2Rs by Sex:

Differential expression analysis of V2R vomeronasal receptor transcripts comparing males to females included 322 vomeronasal receptor transcripts with non-zero normalized read counts. A total of 79 transcripts (24.5%) displayed significantly increased abundances in males compared to females, and 2 transcripts (0.6%) displayed significantly increased abundances in females compared to males (adjusted p-value ≤ 0.05) (Figure 5.7; Table 5.2). Independent filtering statistics are those of the transcriptome-wide analysis from which these results were extracted: DESeq2 identified 74 reference sequences as outliers and 5,023 reference sequences with low normalized counts, and 38,325 sequences were included in multiple test correction calculations.

#2 Differential Expression of V2Rs by Season:

Differential expression analysis of V2R vomeronasal receptor transcripts comparing spring to summer included 322 vomeronasal receptor transcripts with non-zero normalized read counts. A total of 121 transcripts (37.6%) displayed significantly increased abundances in spring compared to summer, and 0 transcripts displayed significantly increased abundances in summer compared to spring (adjusted p-value ≤ 0.05) (Figure 5.7; Table 5.2). Independent filtering statistics are those of the transcriptome-wide analysis from which these results were extracted: DESeq2 identified 74 reference sequences as outliers and 1,663 reference sequences with low normalized counts, and 41,685 sequences were included in multiple test correction calculations.

#3 Differential Expression of V2Rs Sex-by-Season Interaction:

Differential expression analysis of V2R vomeronasal receptor transcripts examining the effect of the sex-by-season interaction included 322 vomeronasal receptor transcripts with non-zero normalized read counts. A total of 180 transcripts (55.9%) displayed significantly increased abundances and 1 transcript (0.3%) displayed significantly decreased abundances associated with the sex-by-season interaction term (adjusted p-value ≤ 0.05) (Figure 5.7; Table 5.2). Independent filtering statistics are those of the transcriptome-wide analysis from which these results were extracted: DESeq2 identified 62 reference sequences as outliers and 13,436 reference sequences with low normalized counts, and 29,924 sequences were included in multiple test correction calculations.
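
The V2R proportions are counts out of the 322 expressed receptor transcripts, expressed as percentages. A minimal sketch of the arithmetic for the two-factor analyses, using only the reported counts, follows; multiplying by 100 matters here, since 2 of 322 transcripts is a fraction of about 0.006, or about 0.6%.

```python
# Proportions of the 322 V2R transcripts in the two-factor analyses.
N_V2R = 322


def percent(n: int, total: int = N_V2R) -> float:
    """Express a count as a percentage (not a fraction) of total, 1 decimal."""
    return round(100.0 * n / total, 1)


REPORTED = {
    "sex, higher in males": 79,
    "sex, higher in females": 2,
    "season, higher in spring": 121,
    "sex-by-season, up": 180,
    "sex-by-season, down": 1,
}

if __name__ == "__main__":
    for label, n in REPORTED.items():
        print(f"{label}: {n}/{N_V2R} = {percent(n)}%")
```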

Figure 5.7 Differential expression of predicted V2R vomeronasal receptors expressed in the vomeronasal organ of T. s. parietalis by Sex, Season, and Sex-by-Season interaction: Figures show results of two-factor differential expression analyses including only V2R vomeronasal receptors expressed in the vomeronasal organs of T. s. parietalis from n = 9 spring males, n = 7 summer males, n = 8 spring females and n = 8 summer females. Transcripts are plotted by Log2 fold change and -Log10 adjusted p-values. Points above the p-value cutoff line (orange) are significantly differentially expressed (adjusted p-value ≤ 0.05). Differential expression analyses were performed using DESeq2 (Love et al. 2014). Volcano plots were generated using the R package ‘EnhancedVolcano’ (Blighe, 2018). *Note differences in scale between figures.

#4 Differential Expression of V2Rs: Spring Male vs Spring Female:

Differential expression analysis of V2R vomeronasal receptor transcripts comparing spring males to spring females included 316 vomeronasal receptor transcripts with non-zero normalized read counts. A total of 193 transcripts (61.1%) displayed significantly increased abundances in spring males compared to spring females, and 3 transcripts (0.9%) displayed significantly increased abundances in spring females compared to spring males (adjusted p-value ≤ 0.05) (Figure 5.8, Table 5.2). Independent filtering statistics are those of the corresponding transcriptome-wide analysis: DESeq2 identified 0 reference sequences as outliers and 1,987 reference sequences with low normalized counts, and 37,593 sequences were included in multiple test correction calculations.

Figure 5.8 Differential expression of predicted V2R receptors expressed in the vomeronasal organ of T. s. parietalis by Sex in Spring: Results of single-factor differential expression analysis of V2R vomeronasal receptors expressed in the vomeronasal organs of T. s. parietalis comparing n = 9 spring males to n = 8 spring females. Transcripts are plotted by Log2 fold change and -Log10 adjusted p-values. Points above the p-value cutoff line (orange) are significantly differentially expressed (adjusted p-value ≤ 0.05). Differential expression analyses were performed using DESeq2 (Love et al. 2014). Volcano plots were generated using the R package ‘EnhancedVolcano’ (Blighe, 2018).

Figure 5.9 Vomeronasal V2R receptors expressed in the vomeronasal organ of T. s. parietalis in the spring: Heatmaps represent z-scores of transcripts predicted to encode vomeronasal V2R receptors expressed in the vomeronasal organs of T. s. parietalis from n = 9 males and n = 8 females during the spring mating period. The untransformed heatmap shows z-scores based on raw read mapping counts and reveals a nearly universal downregulation of V2R receptors in spring females. The regularized log transformed heatmap shows transformed read mapping counts and illustrates that there is little difference between male and female receptor profiles in the spring after accounting for the downregulation observed in spring females. Transformation was performed using DESeq2 (Love et al. 2014).

#5 Differential Expression V2Rs: Summer Male vs Summer Female:

Differential expression analysis of V2R vomeronasal receptor transcripts comparing summer males to summer females included 346 vomeronasal receptor transcripts with non-zero normalized read counts. A total of 2 transcripts (0.6%) displayed significantly increased abundances in summer males compared to summer females, and 6 transcripts (1.7%) displayed significantly increased abundances in summer females compared to summer males (adjusted p-value ≤ 0.05) (Figure 5.10, Table 5.2). Independent filtering statistics are those of the corresponding transcriptome-wide analysis: DESeq2 identified 0 reference sequences as outliers and 17,998 reference sequences with low normalized counts, and 24,192 sequences were included in multiple test correction calculations.

Figure 5.10 Differential expression of predicted V2R receptors expressed in the vomeronasal organ of T. s. parietalis by Sex in Summer: Results of single-factor differential expression analysis of V2R vomeronasal receptors expressed in the vomeronasal organs of T. s. parietalis comparing n = 7 males to n = 8 females during the summer feeding period. Transcripts are plotted by Log2 fold change and -Log10 adjusted p-values. Points above the p-value cutoff line (orange) are significantly differentially expressed (adjusted p-value ≤ 0.05). Differential expression analyses were performed using DESeq2 (Love et al. 2014). Volcano plots were generated using the R package ‘EnhancedVolcano’ (Blighe, 2018).

#6 Differential Expression V2Rs: Spring Male vs Summer Male:

Differential expression analysis of V2R vomeronasal receptor transcripts comparing spring males to summer males included 343 vomeronasal receptor transcripts with non-zero normalized read counts. A total of 2 transcripts (0.6%) displayed significantly increased abundances in spring males compared to summer males, and 0 transcripts displayed significantly increased abundances in summer males compared to spring males (adjusted p-value ≤ 0.05) (Figure 5.11, Table 5.2). Independent filtering statistics are those of the corresponding transcriptome-wide analysis: DESeq2 identified 0 reference sequences as outliers and 16,651 reference sequences with low normalized counts, and 26,289 sequences were included in multiple test correction calculations.

Figure 5.11 Differential expression of predicted V2R receptors expressed in the vomeronasal organ of T. s. parietalis by Season in Males: Results of single-factor differential expression analysis of V2R vomeronasal receptors expressed in the vomeronasal organs of T. s. parietalis comparing n = 9 males during the spring mating period to n = 7 males during the summer feeding period. Transcripts are plotted by Log2 fold change and -Log10 adjusted p-values. Points above the p-value cutoff line (orange) are significantly differentially expressed (adjusted p-value ≤ 0.05). Differential expression analyses were performed using DESeq2 (Love et al. 2014). Volcano plots were generated using the R package ‘EnhancedVolcano’ (Blighe, 2018).

#7 Differential Expression V2Rs: Spring Female vs Summer Female:

Differential expression analysis of V2R vomeronasal receptor transcripts comparing spring females to summer females included 307 vomeronasal receptor transcripts with non-zero normalized read counts. A total of 1 transcript (0.3%) displayed significantly increased abundances in spring females compared to summer females, and 205 transcripts (66.8%) displayed significantly increased abundances in summer females compared to spring females (adjusted p-value ≤ 0.05) (Figure 5.12, Table 5.2). Independent filtering statistics are those of the corresponding transcriptome-wide analysis: DESeq2 identified 0 reference sequences as outliers and 16,651 reference sequences with low normalized counts, and 9,383 sequences were included in multiple test correction calculations.

Figure 5.12 Differential expression of predicted V2R receptors expressed in the vomeronasal organ of T. s. parietalis by Season in Females: Results of single-factor differential expression analysis of V2R vomeronasal receptors expressed in the vomeronasal organs of T. s. parietalis comparing n = 8 females in the spring mating period to n = 8 females during the summer feeding period. Transcripts are plotted by Log2 fold change and -Log10 adjusted p-values. Points above the p-value cutoff line (orange) are significantly differentially expressed (adjusted p-value ≤ 0.05). Differential expression analyses were performed using DESeq2 (Love et al. 2014). Volcano plots were generated using the R package ‘EnhancedVolcano’ (Blighe, 2018).


Figure 5.13 Vomeronasal receptor V2Rs expressed in the vomeronasal organ of female T. s. parietalis: Heatmaps represent z-scores of transcripts predicted to encode vomeronasal V2R receptors expressed in the vomeronasal organ of female T. s. parietalis from n = 8 spring and n = 8 summer samples. The untransformed heatmap, showing z-scores based on raw read mapping counts, shows a nearly universal downregulation of V2R receptors in spring females. The regularized log (rlog) transformed heatmap, showing transformed read mapping counts, illustrates that there is little difference in receptor repertoires between spring and summer after accounting for the downregulation observed in spring females. Transformation was performed using DESeq2 (Love et al. 2014).
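Both heatmaps display per-transcript (row) z-scores. The following is a minimal sketch of that row scaling. It assumes that rows are transcripts and columns are samples, and it uses the sample standard deviation (n - 1 denominator), as R's sd() does. `log2p1` is a simple stand-in transform chosen for illustration only. It is not DESeq2's rlog, which additionally shrinks between-sample differences for low-count transcripts.

```python
import math
from statistics import mean, stdev
from typing import List

Matrix = List[List[float]]  # rows = transcripts, columns = samples


def row_zscores(matrix: Matrix) -> Matrix:
    """Z-score each transcript across samples using the sample SD."""
    out = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        # a transcript with identical values in every sample gets all zeros
        out.append([0.0 if sd == 0 else (x - mu) / sd for x in row])
    return out


def log2p1(matrix: Matrix) -> Matrix:
    """log2(count + 1): a crude stand-in for DESeq2's rlog, for illustration only."""
    return [[math.log2(x + 1) for x in row] for row in matrix]
```

Scaling by row means each transcript's colours show only where it is high or low relative to its own mean, so a shift shared by every receptor in one group (such as spring females) dominates the untransformed panel.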


Table 5.2 Differential expression of V2R vomeronasal receptors expressed in vomeronasal organ: Results of differential expression analyses including only transcripts predicted to encode V2R vomeronasal receptors expressed in the vomeronasal organs of T. s. parietalis from n = 9 spring males, n = 7 summer males, n = 8 spring females and n = 8 summer females. Two-factor comparisons show the results of differential expression analyses by Sex, Season and the Sex-by-Season interaction across all samples. Single-factor comparisons show results of comparisons between samples within a single sex or season. ‘Total transcripts’ represents the number of transcripts included in each analysis (with a mean of at least 1 read mapping count per sample). ‘Sig. transcripts’ shows the number of significantly differentially expressed transcripts in each comparison, the proportion of the total number of transcripts which was differentially expressed, and the group in which significantly increased transcript abundance was detected. Differential expression analyses were performed using DESeq2 (Love et al. 2014).

Comparison | Total transcripts | Sig. transcripts (group with increased abundance)

Two-Factor
Sex | 322 | 79 (24.5%) (Males); 2 (0.6%) (Females)
Season | 322 | 121 (37.6%) (Spring); 0 (0%) (Summer)
Sex by Season | 322 | 180 (55.9%) (Up); 1 (0.3%) (Down)

Single-Factor
Spr. Male vs Spr. Female | 316 | 193 (61.1%) (Spr. Males); 3 (0.9%) (Spr. Females)
Su. Male vs Su. Female | 346 | 2 (0.6%) (Su. Males); 6 (1.7%) (Su. Females)
Spr. Male vs Su. Male | 343 | 2 (0.6%) (Spr. Males); 0 (0%) (Su. Males)
Spr. Female vs Su. Female | 307 | 1 (0.3%) (Spr. Females); 205 (66.8%) (Su. Females)


Figure 5.14 Differentially expressed V2R receptors in the vomeronasal organ of T. s. parietalis: Counts of transcripts predicted to encode V2R vomeronasal receptors found to be differentially expressed by the comparison indicated. Differential expression was determined using DESeq2 (Love et al. 2014) from: n = 9 spring males, n = 7 summer males, n = 8 spring females, and n = 8 summer females. Bars represent counts of transcripts displaying significantly increased abundances in the group indicated by each comparison. Comparisons 1 & 2 show differential expression by sex and season, respectively, across all samples using a two-factor design. Comparison 3 shows the combined number of transcripts differentially expressed by the interaction of sex and season. Comparisons 4-7 show the number of differentially expressed transcripts using single-factor comparisons of the indicated groups.


Discussion:

Preliminary analyses:

Principal components analysis was performed to identify major sources of variability and to identify samples exhibiting extreme outlying variation in expression compared to other samples. Principal components analysis projects the data onto uncorrelated axes constructed from the variation present in the data, without regard to factors defined by researchers. Therefore, if the defined factors capture the majority of the variation between samples, the data should be expected to cluster according to those factors (Jolliffe & Cadima, 2016). If the data showed groupings on PC1 and PC2 that did not correspond well to the defined factor levels, this would indicate that substantial sources of variation were present that were not captured by the factors included in the analysis.

Here, principal components analysis was used to examine clustering of expression on PC1 and PC2 using the top 2,000 expressed transcripts and, separately, the 322 V2R vomeronasal receptor transcripts expressed in these samples (Figure 5.3). For the top 2,000 transcripts, sample variation on PC1 and PC2 produced non-overlapping clusters corresponding to all defined factor levels. Spring male samples showed the most variation, whereas spring female samples showed the least. In the analysis of V2R receptors, samples from spring males, summer males, and summer females overlapped almost entirely on PC1 but did not overlap on PC2. Spring females showed the greatest degree of difference from other samples, both for the top 2,000 transcripts and for the V2R transcripts alone.

In both cases, the variation present in the samples was generally captured by the factors of sex and season. Non-overlapping clusters indicate that significant differential expression is likely to exist based on comparisons between these factors.
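The PCA step can be sketched in a few lines. The following is a minimal pure-Python sketch, not the thesis's pipeline. It assumes the input is already on a log scale (for example DESeq2 vst or rlog values). It also assumes that "top" transcripts are those with the highest variance across samples, as DESeq2's plotPCA selects them via its ntop argument. If the ranking was by mean expression instead, change the sort key. The scores are computed from the samples-by-samples Gram matrix by power iteration with deflation. In practice one would call prcomp() in R or DESeq2's plotPCA.

```python
import math
import random
from statistics import variance
from typing import List, Tuple

Matrix = List[List[float]]  # rows = transcripts, columns = samples (log scale)


def top_variable(rows: Matrix, n: int) -> Matrix:
    """Keep the n transcripts with the highest variance across samples."""
    return sorted(rows, key=variance, reverse=True)[:n]


def pca_scores(rows: Matrix, k: int = 2, iters: int = 500,
               seed: int = 0) -> List[Tuple[float, ...]]:
    """Per-sample scores on the first k principal components.

    Power iteration with deflation on the samples-by-samples Gram matrix;
    its eigenvectors scaled by sqrt(eigenvalue) are the PC scores.
    """
    n = len(rows[0])
    centered = []
    for r in rows:  # center each transcript across samples
        mu = sum(r) / n
        centered.append([v - mu for v in r])
    g = [[sum(c[i] * c[j] for c in centered) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0:  # no variance left to explain
                break
            v = [t / norm for t in w]
        lam = sum(v[i] * g[i][j] * v[j] for i in range(n) for j in range(n))
        components.append([math.sqrt(max(lam, 0.0)) * t for t in v])
        # remove this component so the next pass finds the next one
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return [tuple(comp[i] for comp in components) for i in range(n)]
```

The signs of PC scores are arbitrary. Only the relative positions of samples (which samples cluster together) carry meaning.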


Differential expression by sex, season and sex-by-season interaction across all samples:

Two-factor differential expression analysis of all genes across all samples identified a very large number of differentially expressed transcripts by sex, season, and sex-by-season interaction. Counts of differentially expressed transcripts are split relatively evenly between conditions in all comparisons, and no comparison shows an extreme expression bias toward a single condition. Comparisons by sex and season across all samples show over 3,000 and 7,000 differentially expressed transcripts, respectively. The sex-by-season comparison shows that over 1,500 transcripts display a significant interaction between sex and season.

Single-factor comparisons also show a large number of differentially expressed transcripts in the vomeronasal organ. As observed in two-factor comparisons, differentially expressed transcripts here are split approximately evenly. The largest difference in the number of differentially expressed genes was observed between males and females in the summer, where males displayed ~2.3 times the number observed in females.

T. s. parietalis is well known for its robust seasonal and annual behavioral cycles (Aleksiuk & Gregory, 1974). The differences in expression reported here coincide with the dramatic shift in behavior observed during the transition from the spring mating period to the summer feeding period. The vomeronasal organ is physiologically important both before and after this shift: males use this organ to search for sex pheromones in the spring, and both sexes use it to search for prey kairomones during the summer. This suggests that gene expression in this organ may be regulated to mediate physiological shifts coinciding with the behavioral shifts observed during this transition.


Differential expression of vomeronasal V2R receptors by sex, season and sex-by-season interaction across all samples:

As observed in comparisons across all samples, vomeronasal chemical receptors exhibit a large number of differentially expressed transcripts by sex and season. However, the expression patterns of vomeronasal receptors differ from those of all transcripts in that vomeronasal receptor expression is extremely biased toward males and toward the summer time period. The finding that differential expression is approximately evenly distributed in comparisons including all transcripts indicates that the extreme biases observed in receptor transcript expression are not due to biases in the complete library.

The interaction of sex and season also significantly affects the expression of vomeronasal receptors, with 181 of 322 receptor transcripts (56%) showing a significant sex-by-season interaction. This high rate of interaction makes the results of two-factor comparisons difficult to interpret. Therefore, to accurately characterize differences in gene expression profiles by sex and by season, it is necessary to examine the results of single-factor comparisons by sex and by season.
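Why a high interaction rate complicates the two-factor results can be seen with a worked example. The following is a minimal sketch under stated assumptions: treatment coding with male and summer as the reference levels (an assumption; the reference levels used in the thesis are not stated here), and a saturated 2 × 2 model on the log2 scale. In that setting the coefficients reduce to differences of group means, setting aside DESeq2's dispersion modelling and fold-change shrinkage. The group means are hypothetical and mimic one receptor that is lower only in spring females.

```python
from typing import Dict, Tuple

# (sex, season) -> mean log2 expression of one transcript in that group
GroupMeans = Dict[Tuple[str, str], float]


def saturated_2x2_coefficients(means: GroupMeans) -> Dict[str, float]:
    """Coefficients of log2 mean ~ sex + season + sex:season (treatment coding).

    Reference levels are assumed to be male and summer.
    """
    m_su, m_spr = means[("male", "summer")], means[("male", "spring")]
    f_su, f_spr = means[("female", "summer")], means[("female", "spring")]
    return {
        "intercept": m_su,              # summer males
        "sex_female": f_su - m_su,      # sex effect within summer
        "season_spring": m_spr - m_su,  # season effect within males
        # difference of differences: the part no additive model explains
        "female_x_spring": (f_spr - f_su) - (m_spr - m_su),
    }


# Hypothetical receptor that is reduced only in spring females.
example = {("male", "spring"): 8.0, ("male", "summer"): 8.0,
           ("female", "spring"): 3.0, ("female", "summer"): 8.0}
print(saturated_2x2_coefficients(example))
# -> {'intercept': 8.0, 'sex_female': 0.0, 'season_spring': 0.0, 'female_x_spring': -5.0}
```

When a single group deviates, the deviation lands entirely in the interaction term. Main-effect estimates then depend on which levels serve as reference, which is why the single-factor comparisons are easier to read.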

Differential expression of vomeronasal V2R receptors observed in single-factor comparisons within sex or season:

Single-factor analyses were performed to compare spring males to spring females (#4), summer males to summer females (#5), spring males to summer males (#6), and spring females to summer females (#7). Two of these comparisons (#4 and #7) show the extreme expression biases seen in the two-factor comparisons, whereas the other two (#5 and #6) show relatively little differential expression of vomeronasal receptors.


Comparisons of spring males to spring females showed 193 receptor transcripts significantly more abundant in males compared to only 3 transcripts in females. Likewise, comparisons of spring to summer samples in females showed 205 vomeronasal receptors significantly more abundant in summer females and only 1 transcript significantly more abundant in spring females. The lists of these differentially expressed transcripts overlap substantially, and do not represent completely different sets of vomeronasal receptors.

Comparisons of spring males to summer males showed only 2 receptors significantly more abundant in the spring and none significantly more abundant in the summer. A comparison of males to females in the summer showed 6 receptors significantly more abundant in females and only 2 in males.

The results of these comparisons, examined together, identify the primary source of the strong interaction of sex and season observed in the two-factor comparisons. The receptor profiles of summer females show very little differential expression when compared to either spring males or summer males. Receptor profiles in males also show very little differential expression between spring and summer. However, vomeronasal receptors in females are nearly universally downregulated in the spring.

The striking downregulation of vomeronasal receptors in the spring suggests that the vomeronasal organ of females during the spring mating period is physiologically less able to detect chemical cues than that of males, and than that of females in the summer. In contrast, the striking similarity of spring males, summer males, and summer females (indicated by the small number of differentially expressed vomeronasal receptor transcripts between these groups) shows that the receptor repertoire differs very little by sex and changes little throughout the year.


Identification of V2R, V1R or FPR receptors:

In many vertebrate taxa, especially mammals, the vomeronasal organ expresses a variety of vomeronasal receptor types (Rivière et al. 2009). Vomeronasal sensory epithelium contains neurons which express one of three distinct classes of receptor: V1R, V2R, or formyl peptide receptors (FPRs) (Dulac & Axel, 1995). V1R, V2R, and FPR receptors are commonly expressed simultaneously, often in different regions of the VNO (Liberles et al. 2009; Isogai et al. 2011). The data presented here show that only transcripts annotated as V2R vomeronasal receptors were expressed. The multi-tissue transcriptome presented in Chapter 2 contains 615 vomeronasal receptor transcripts, all of which were annotated as V2R. The T. s. concinnus genome (NCBI) contains a single gene annotated as a V1R receptor and a single gene annotated as an FPR. To date, no evidence has been provided showing the expression of either V1R or FPR receptors in the vomeronasal organ of T. s. concinnus, nor has evidence been provided that either V1R or FPR receptors are expressed in T. s. parietalis. Transcriptome annotation is far from a perfect process, and the possibility of misidentification and incorrect annotation of receptor transcripts exists. However, the extent of misidentification cannot be known without further research. At this time, only V2R vomeronasal receptors have been found to be expressed in this group, with the possible exception of a single V1R and one FPR for which no evidence of expression has yet been described in T. s. parietalis.

Summary and conclusions:

The analyses presented here show a high degree of both seasonal and sexual differential expression of vomeronasal receptors based on two-factor comparisons. This result, however, is somewhat misleading, considering that a large number of vomeronasal receptors are differentially expressed by the sex-by-season interaction. Single-factor comparisons within each sex and season show that the majority of the differential expression reported across all samples is due to a nearly universal downregulation of vomeronasal receptors in females in the spring. Comparisons of summer females to either spring or summer males show that the receptor profiles do not vary dramatically by sex or season. Although a small number of transcripts were found to be significantly differentially expressed in these comparisons, extreme biases are not observed. This indicates that, while the magnitude of vomeronasal receptor expression is seasonally regulated, the overall receptor repertoire does not appear to vary greatly throughout the year.

Many studies have observed differential responses to chemical cues detected by the vomeronasal organ based on sex or season (Kondoh et al. 2012; Nakazawa et al. 2009; Hamdani et al. 2008). The Red-sided garter snake is an excellent example of this phenomenon, displaying extreme variation by both sex and season in responses to chemical cues (LeMaster & Mason 2001; Lutterschmidt & Maine, 2014). Male snakes display robust and easily identifiable courtship behavior when exposed to sex pheromones in the spring, but do not respond to prey kairomones at that time. In the summer, the same animals cease responding to sex pheromone but show robust prey-trailing and feeding behavior based on prey kairomones alone (Gregory & Stewart 1975). Female snakes do not appear to respond to sex pheromone or prey kairomones in the spring but respond strongly to prey cues in the summer (Lutterschmidt & Maine, 2014; O’Donnell et al. 2004). These observations led to the hypothesis that male and female snakes may express vomeronasal receptor repertoires that differ from each other or that differ between the spring mating period and the summer feeding period. The findings presented here show a large number of receptor transcripts differentially expressed by sex and by season.
Upon closer examination, however, these differences appear to arise primarily from the near-universal downregulation of vomeronasal receptors in spring females. A comparison of males to females in the summer, when the vomeronasal organ is active in both sexes, failed to identify any receptors with sex-specific, or extremely sex-biased, patterns of expression. One would expect vomeronasal receptor expression not to vary significantly by sex during the summer, as both sexes use this organ to hunt for prey via prey kairomones during this time.

The response to chemical cues by T. s. parietalis has often been observed to differ by season. Male snakes display a particularly robust shift in behavior, responding only to sex pheromone during the spring mating period and only to prey kairomones during the summer feeding period. The findings presented here show no evidence that a shift in receptor profiles is a mechanism mediating this behavioral shift. Here, I describe differential expression of vomeronasal receptors showing that female receptors are nearly universally downregulated in the spring mating period compared to the summer feeding period. However, males display only 2 receptors found to be significantly differentially expressed between spring and summer. Those two receptors (transcript IDs: DN23797_co_g1_i3 and DN72559_c0_g1_i1) were present in both spring and summer males, and both showed differential expression p-values of approximately 0.049. Although statistically significant, neither displays an expression pattern suggesting that it is present at physiologically relevant abundances in the spring but not in the summer, so these receptors are unlikely to be a mechanism capable of completely ending detection of sex pheromone. Several hypothesized mechanisms may mediate this behavioral shift, including a decrease in pheromone production, a decrease in pheromone-binding protein abundance, a shift in vomeronasal receptor profiles, or higher-order neuronal interpretation of the signal. The concentrations of pheromone lipids on the skin of female snakes were found to vary by season but remain present in the summer (Uhrig et al. 2012). The findings of Lutterschmidt & Maine (2014) showed that the shift from mating to feeding in male T. s. parietalis is likely based on seasonal changes in neuroendocrine physiology.

Previous research employing patch-clamp electrophysiology shows that the vomeronasal sensory epithelium of male T. s. parietalis responds to female sexual attractiveness pheromone, whereas the female vomeronasal sensory epithelium does not (Huang et al. 2006). This observation was hypothesized to result from differential expression of vomeronasal receptor proteins responsible for detecting sex pheromones, present in males and absent from females. The findings presented here detected no vomeronasal receptors that are present in males but absent in females. This result does not allow identification of a specific receptor as a likely pheromone receptor candidate in male snakes. However, these results do offer an explanation for the electrophysiology results described above. Here, vomeronasal receptors in females were found to be nearly universally downregulated in the spring. The experiment conducted by Huang et al. was performed early in the spring mating period, when vomeronasal receptors in females are strongly downregulated. This suggests that those results were not due to the absence of only a single pheromone receptor (or a small number of receptors) in females, but more likely to a near-universal downregulation of vomeronasal receptors altogether. The set of downregulated receptors in spring females likely contains pheromone receptors, but it does not provide evidence that may be used to predict the identity of specific pheromone receptors. The research presented here supports the original hypothesis that pheromone receptors are expressed in male T. s. parietalis during the spring mating period, and further suggests that they are absent from females during this period. It does not, however, narrow the list of candidates to a small number of receptor proteins, but indicates that the pheromone receptors are likely among the 193 receptors more abundant in spring males than in spring females. Pheromone receptors are hypothesized to exhibit male-biased expression and seasonally biased expression toward the spring mating period.
From the results of comparisons between spring and summer males, two specific receptor transcripts (transcript IDs: DN23797_co_g1_i3 and DN72559_c0_g1_i1) can be distinguished as following these hypothesized expression patterns indicative of putative pheromone receptors. Although the differences in expression are small and this evidence is fairly weak, these receptors are the most likely pheromone receptor candidates in this data set.
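The hypothesized pheromone-receptor pattern (spring-biased in males and male-biased in spring) amounts to intersecting two contrasts. The following is a minimal sketch under stated assumptions. Each results table is assumed to map a transcript ID to a (log2 fold change, adjusted p-value) pair. A positive log2 fold change is assumed to mean higher abundance in the first-named group; in DESeq2 the sign depends on the contrast order. The IDs and values in the example are hypothetical and are not data from this study.

```python
from typing import Dict, List, Set, Tuple

# transcript ID -> (log2 fold change, adjusted p-value) for one contrast
Results = Dict[str, Tuple[float, float]]


def up_in_first_group(results: Results, alpha: float = 0.05) -> Set[str]:
    """IDs significantly higher in the first-named group of the contrast."""
    return {tid for tid, (lfc, padj) in results.items() if padj <= alpha and lfc > 0}


def pheromone_receptor_candidates(spring_vs_summer_males: Results,
                                  spring_males_vs_spring_females: Results,
                                  alpha: float = 0.05) -> List[str]:
    """Receptors that are both spring-biased in males and male-biased in spring."""
    seasonal = up_in_first_group(spring_vs_summer_males, alpha)
    sexual = up_in_first_group(spring_males_vs_spring_females, alpha)
    return sorted(seasonal & sexual)


# Hypothetical IDs and values, for illustration only.
season = {"tx_a": (0.8, 0.049), "tx_b": (1.1, 0.30), "tx_c": (-0.5, 0.01)}
sex = {"tx_a": (3.2, 1e-6), "tx_b": (2.9, 1e-5), "tx_c": (2.0, 1e-4)}
print(pheromone_receptor_candidates(season, sex))  # -> ['tx_a']
```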


References:

Alekseyenko, O., Baum, M., Cherry, J. (2006) Sex and gonadal steroid modulation of pheromone receptor gene expression in the mouse vomeronasal organ. Neuroscience. 140(4):1349-57.
Aleksiuk, M., Gregory, P. (1974) Regulation of Seasonal Mating Behavior in Thamnophis sirtalis parietalis. Copeia. 1974(3):681-689.
Andrews, S. (2010) FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Bargmann, C. (1997) Olfactory receptors, vomeronasal receptors, and the organization of olfactory information. Cell. 90(4):585-587.
Bertmar, G. (1981) Evolution of vomeronasal organs in vertebrates. Evolution. 35:359-366.
Blighe, K. (2018) EnhancedVolcano: publication-ready volcano plots with enhanced colouring and labeling. https://github.com/kevinblighe
Bourgon, R., Gentleman, R., Huber, W. (2010) Independent filtering increases detection power for high-throughput experiments. Proceedings of the National Academy of Sciences. 107(21):9546-9551.
Brykczynska, U., Tzika, A., Rodriguez, I., Milinkovitch, M. (2013) Contrasted evolution of the vomeronasal receptor repertoires in mammals and squamate reptiles. Genome Biology and Evolution. 5(2):389-401.
David, M., Dzamba, M., Lister, D., Ilie, L., Brudno, M. (2011) SHRiMP2: Sensitive yet practical short read mapping. Bioinformatics. 27(7):1011-1012.
Dawley, E., Crowder, J. (1995) Sexual and seasonal differences in the vomeronasal epithelium of the red-backed salamander (Plethodon cinereus). Journal of Comparative Neurology. 359(3):382-90.
Dayger, C., Lutterschmidt, D. (2016) Seasonal and sex differences in responsiveness to adrenocorticotropic hormone contribute to stress response plasticity in Red-sided garter snakes (Thamnophis sirtalis parietalis). Journal of Experimental Biology. 219(7):1022.
Døving, K., Trotier, D. (1998) Structure and function of the vomeronasal organ. Journal of Experimental Biology. 201:2913-2925.
Dulac, C., Axel, R. (1995) A novel family of genes encoding putative pheromone receptors in mammals. Cell. 83:195-206.
Francia, S., Pifferi, S., Menini, A., Tirindelli, R. (2014) Vomeronasal receptors and signal transduction in the vomeronasal organ of mammals. In: Neurobiology of Chemical Communication. Boca Raton (FL): CRC Press/Taylor & Francis.


Gregory, P., Stewart, K. (1975) Long-distance dispersal and feeding strategy of the Red-sided garter snake (Thamnophis sirtalis parietalis) in the Interlake of Manitoba. Canadian Journal of Zoology. 53:238-245.
Halpern, M. (1987) The organization and function of the vomeronasal system. Annual Review of Neuroscience. 10:325-362.
Hamdani, E., Lastein, S., Gregersen, F., Døving, K. (2008) Seasonal variations in olfactory sensory neurons—fish sensitivity to sex pheromones explained? Chemical Senses. 33(2):119-123.
Huang, G., Zhang, J., Wang, D., Mason, R., Halpern, M. (2006) Female snake sex pheromone induces membrane responses in vomeronasal sensory neurons of male snakes. Chemical Senses. 31:521-529.
Isogai, Y., Si, S., Pont-Lezica, L., Tan, T., Kapoor, V., Murthy, V., Dulac, C. (2011) Molecular organization of vomeronasal chemoreception. Nature. 478:241-247.
Johnstone, K., Lubieniecki, K., Koop, B., Davidson, W. (2011) Expression of olfactory receptors in different life stages and life histories of wild Atlantic salmon (Salmo salar). Molecular Ecology. 19:4059-69.
Jolliffe, I., Cadima, J. (2016) Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 374(2065):20150202.
Kondoh, D., Yamamoto, Y., Nakamuta, N., Taniguchi, K. (2012) Seasonal Changes in the Histochemical Properties of the Olfactory Epithelium and Vomeronasal Organ in the Japanese Striped Snake, Elaphe quadrivirgata. Anatomia, Histologia, Embryologia. 41:41-53.
LeMaster, M., Mason, R. (2001) Evidence for a female sex pheromone mediating male trailing behavior in the Red-sided garter snake, Thamnophis sirtalis parietalis. Chemoecology. 11:149-152.
LeMaster, M., Mason, R. (2002) Variation in a female sexual attractiveness pheromone controls male mate choice in garter snakes. Journal of Chemical Ecology. 28(6):1269-85.
Liberles, S., Horowitz, D., Contos, K., Wilson, J., Liberles, D., Buck, L. (2009) Formyl peptide receptors are candidate chemosensory receptors in the vomeronasal organ. Proceedings of the National Academy of Sciences. 106(24):9842-9847.
Lohman, B., Weber, J., Bolnick, D. (2016) Evaluation of TagSeq, a reliable low-cost alternative for RNAseq. Molecular Ecology Resources. 16(6):1315-1321.
Love, M., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 15:550.


Lutterschmidt, D., Maine, A. (2014) Sex or candy? Neuroendocrine regulation of the seasonal transition from courtship to feeding behavior in male Red-sided garter snakes (Thamnophis sirtalis parietalis). Hormones and Behavior. 66(1):120-34.
Mason, R., Fales, H., Jones, T., Pannell, L., Chinn, J., Crews, D. (1989) Sex pheromones in snakes. Science. 245(4915):290-293.
Mason, R., Halpern, M. (2011) Chemical ecology of snakes: from pheromones to receptors. North American Society for Comparative Endocrinology Conference, 2011.
Mason, R., Jones, T., Fales, H., Pannell, L., Crews, D. (1990) Characterization, synthesis, and behavioral response to the sex attractiveness pheromones of the Red-sided garter snake, Thamnophis sirtalis parietalis. Journal of Chemical Ecology. 16:27-36.
Mazzoni, E., Desplan, C., Celik, A. (2004) ‘One receptor’ rules in sensory neurons. Developmental Neuroscience. 26(5-6):388-395.
Meyer, E., Aglyamova, G., Matz, M. (2011) Profiling gene expression responses of coral larvae (Acropora millepora) to elevated temperature and settlement inducers using a novel RNA-Seq procedure. Molecular Ecology. 20:3599-3616.
Nakazawa, H., Ichikawa, M., Nagai, T. (2009) Seasonal increase in olfactory receptor neurons of the Japanese toad, Bufo japonicus, is paralleled by an increase in olfactory sensitivity to isoamyl acetate. Chemical Senses. 34:667-78.
O'Donnell, R., Shine, R., Mason, R. (2004) Seasonal anorexia in the male Red-sided garter snake, Thamnophis sirtalis parietalis. Behavioral Ecology and Sociobiology. 56:413-41.
Rehorek, S., Firth, B., Hutchinson, M. (2000a) The structure of the nasal chemosensory system in squamate reptiles. Journal of Biosciences. 25(2):181-190.
Rehorek, S. (2000b) Passage of Harderian gland secretions to the vomeronasal organ in Thamnophis sirtalis (Serpentes: Colubridae). Canadian Journal of Zoology. 78(7):1284-1288.
Rivière, S., Challet, L., Fluegge, D., Spehr, M., Rodriguez, I. (2009) Formyl peptide receptor-like proteins are a novel family of vomeronasal chemosensors. Nature. 459:574-577.
Rodriguez, I., Feinstein, P., Mombaerts, P. (1999) Variable patterns of axonal projections of sensory neurons in the mouse vomeronasal system. Cell. 97(2):199-208.


Rumble, S., Lacroute, P., Dalca, A., Fiume, M., Sidow, A. (2009) SHRiMP: Accurate Mapping of Short Color-space Reads. PLoS Computational Biology. 5(5):e1000386.
Shiao, M., Chang, A., Liao, B., Ching, Y., Jade Lu, M., Chen, S., Li, W. (2012) Transcriptomes of mouse olfactory epithelium reveal sexual differences in odorant detection. Genome Biology and Evolution. 4(5):703-712.
Shine, R., Elphick, M., Harlow, P., Moore, I., LeMaster, M., Mason, R. (2001) Movements, Mating, and Dispersal of Red-sided Garter snakes (Thamnophis sirtalis parietalis) from a communal den in Manitoba. Copeia. 2001(1):82-91.
Touhara, K., Vosshall, L. (2009) Sensing odorants and pheromones with chemosensory receptors. Annual Review of Physiology. 71:307-332.
Uhrig, E., Lutterschmidt, D., Mason, R., LeMaster, M. (2012) Pheromonal mediation of intraseasonal declines in the attractivity of female Red-sided garter snakes, Thamnophis sirtalis parietalis. Journal of Chemical Ecology. 38(1):71-80.
Wanner, K., Nichols, A., Allen, J., Bunger, P., Garczynski, S., Linn, C., Luetje, C. (2010) Sex pheromone receptor specificity in the European corn borer moth, Ostrinia nubilalis. PLoS ONE. 5(1):e8685.
Zhao, S., Guo, Y., Sheng, Q., Shyr, Y. (2014) Advanced heat map and clustering analysis using Heatmap3. BioMed Research International. 986048.


Chapter 6:

Synthesis and Major Conclusions

Chapter introduction:

The Harderian gland is the largest cephalic gland in most groups of terrestrial vertebrates. Although the Harderian gland has been the focus of numerous studies for more than 300 years, only recently has its physiological function begun to be resolved. Payne (1994) states in his comprehensive review of the Harderian gland that “It is arguably the last remaining large organ of widespread distribution among the vertebrates to which we cannot confidently ascribe a confirmed function.”

Recent studies have shown that the Harderian gland of squamate reptiles, including garter snakes, appears to be an important component of the vomeronasal chemosensory system. The squamate Harderian gland secretes its contents into the nasolacrimal duct, which carries the proteinaceous fluid to the lumen of the vomeronasal organ (Rehorek, 2011). It has been hypothesized that the Harderian gland of Red-sided garter snakes produces proteins essential to chemosensory function, such as pheromone-binding proteins and proteins involved in the extracellular immune system.

The majority of the research to determine the function of this gland has been conducted using mammalian models. The Harderian gland appears to have first evolved in early tetrapods and has recently been found to be associated with the vomeronasal organ in extant amphibians (Hillenius, 2001). The association with the vomeronasal organ in garter snakes suggests that this tissue may perform functions similar to those of the ancestral Harderian gland. Thus, the findings presented here may be more broadly applicable, and more informative about the function of the Harderian gland, than studies performed in more derived mammalian models. Currently, the best-described molecular model for the Harderian gland is the domestic chicken, Gallus gallus, in which the gland has been found to express extracellular antimicrobial proteins and is considered part of the immune system in avian taxa (Deist & Lamont, 2018). Birds, however, do not possess a fully formed vomeronasal chemosensory system, and the Harderian gland in this group is associated with the mucous membranes of the eye, which suggests a derived function in this group as well.

To date, little research aimed at determining the function of the Harderian gland has been conducted from a molecular perspective. Here, I use garter snakes as a model to investigate the function of the Harderian gland and, through the use of molecular techniques, have characterized several important aspects of the function of this historically enigmatic gland. This thesis research constitutes the most in-depth effort to determine the function of the Harderian gland in recent years.

Chapter 1 provides a literature review and the background information needed to place this body of research in an appropriate context. Chapter 2 details the creation of a multi-tissue transcriptome as well as the description of the tissue-specific transcriptome of the Harderian gland in T. s. parietalis and discusses the functional significance of the expressed genes. Chapter 3 focuses on the characterization of the protein components of the fluid filling the lumen of the vomeronasal organ and addresses the findings presented by Rehorek (2000b; 2011) suggesting that the Harderian gland produces the majority of the proteins within this fluid. Chapter 4 describes seasonal variation and sexual dimorphism of gene expression in the Harderian gland of T. s. parietalis and discusses the functional significance of the expressed genes within the context of the vomeronasal chemosensory system. Chapter 5 describes the seasonal variation and sexual dimorphism of the vomeronasal receptor repertoire of T. s. parietalis and discusses the possible functional significance of this variation as a suggested mechanism mediating the observed behavioral shift from mating to feeding which takes place between spring and summer. Here, I synthesize the findings of previous chapters, discuss their significance in the context of describing the role of the Harderian gland in the vomeronasal chemosensory system of the Red-sided garter snake, T. s. parietalis, and suggest avenues of future research.


Synthesis and Conclusions:

The Harderian gland of T. s. parietalis expresses genes encoding secretory proteins and lipid-binding proteins:

Enrichment analysis showed that the Harderian gland expresses a number of genes encoding proteins involved in the secretory pathway, as well as proteins secreted into the extracellular environment. This was observed in Chapter 2, where the Harderian gland transcriptome as a whole was enriched for these proteins. The gross structure and histology of the Harderian gland had already established that it is a secretory tissue, so enrichment of secretory pathway proteins and secreted proteins was expected; observing it supports the validity of the methods used. Enrichment of secretory proteins was also identified in Chapter 4, where males expressed genes encoding secretory proteins at higher levels than females during both the spring mating period and the summer feeding period.
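The specific enrichment methods used in Chapters 2 and 4 are not restated here. As a hedged illustration only, one common form of enrichment analysis is an over-representation test, which asks whether a gene set (for example, secretory pathway genes) appears among a list of genes of interest more often than expected by chance under the hypergeometric distribution. The sketch below assumes a background of N annotated genes, a gene set of size K, a list of n genes of interest, and an observed overlap of k; the function names are hypothetical, and all numbers in the usage example are invented for illustration and are not results from this dissertation.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes; K: genes in the set (e.g. secretory pathway);
    n: genes of interest (e.g. highly expressed); k: observed overlap.
    """
    upper = min(K, n)
    # Exact integer arithmetic; Python ints do not overflow.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def fold_enrichment(k: int, N: int, K: int, n: int) -> float:
    """Observed overlap relative to the overlap expected by chance (n * K / N)."""
    return (k / n) / (K / N)


# Hypothetical numbers for illustration only.
p = hypergeom_upper_tail(k=60, N=20000, K=500, n=1000)
fe = fold_enrichment(k=60, N=20000, K=500, n=1000)
print(f"fold enrichment = {fe:.2f}, p = {p:.2e}")
```

In this invented example, 25 overlapping genes would be expected by chance, so an overlap of 60 gives a fold enrichment of 2.4 and a very small p-value. Gene-set methods that rank all genes (rather than using a fixed list) are an alternative design choice.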

In addition to secretory proteins, the Harderian gland was found to be enriched for lipid-binding proteins. This was presented in both Chapters 2 and 4, showing that the Harderian gland as a whole expresses genes encoding lipid-binding proteins, and that the glands of males express these genes more actively than those of females in both spring and summer. The comparison of males to females in the spring also showed enrichment in males of a relatively large number of additional gene sets involved in many different processes, including those encoding cytosolic, metabolic, and ribosomal proteins. These data suggest that the Harderian gland of male garter snakes is very active in the spring, both metabolically and in producing lipid-binding and secretory proteins. These two functions appear to be related, as many of the lipid-binding proteins predicted to be produced in the Harderian gland were found in relatively large quantities in the vomeronasal lumen, as described in Chapter 3. These results, however, are correlational and inferred from mRNA expression profiles, and they require further experimental validation. Lipid-binding proteins were predicted to be expressed in the Harderian gland and are hypothesized to facilitate the solubilization and detection of nonpolar lipid pheromones.

Proteins expressed in the Harderian gland are present in the lumen of the vomeronasal organ:

Over the course of 14 years, Rehorek (1997; 1998), Rehorek et al. (1997; 2000a; 2000b; 2011), and Hillenius & Rehorek (2005) contributed a great deal to our understanding of the squamate Harderian gland and its associated structures. In this body of research, Rehorek showed that in most squamate taxa, the Harderian gland is physically associated with the vomeronasal chemosensory system. The study by Rehorek (2000b) is key to the conceptualization of this dissertation research because it demonstrates that the secretions of the Harderian gland in T. s. parietalis enter the nasolacrimal duct and are transported directly to the vomeronasal lumen. The authors also showed that the vomeronasal organ has little secretory ability, making it unlikely that a large proportion of the fluid in the vomeronasal lumen originates there. The research presented in Chapter 3 addresses this claim using integrated methods, including RNA-seq expression profiling, shotgun mass spectrometry, and bacterial killing assays (BKAs), to characterize the proteins present in this fluid, quantify their relative abundances, and assess the tissue of origin of individual proteins.

The mass spectrometry results presented here confidently identify 140 proteins in vomeronasal secretion, including a high proportion of lipid-binding (~35%) and antimicrobial defense proteins. The origins of these proteins were assessed using gene expression measured by RNA-seq. Over 77% of the summed peak intensity of the proteins identified by mass spectrometry was attributed to proteins predicted to be produced in the Harderian gland, whereas only 6% was attributed to proteins predicted to be produced in the vomeronasal organ itself. Both tissues were found to express genes encoding antimicrobial proteins, but the Harderian gland expressed the vast majority of the genes encoding the lipid-binding proteins identified in the vomeronasal lumen. In addition to these lipid-binding proteins, the fluid was also found to contain a large amount of the protein “Harderian gland protein HG33” (first sequenced by the Rehorek lab from the Harderian gland of the Red-sided garter snake). Here, the gene encoding HG33 was found to be expressed primarily in the Harderian gland, and this protein constitutes ~31% of the summed peak intensity in vomeronasal secretion. The function of HG33, however, remains unknown.
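The attribution of summed peak intensity to a tissue of origin can be illustrated with a short sketch. The exact decision rule used in Chapter 3 is not restated here, so this example assumes the simplest possible rule: each protein is assigned to whichever tissue shows the higher RNA-seq expression of its gene, proteins with no detectable expression in either tissue are left unassigned, and each tissue's share is its summed peak intensity divided by the total. The function name, tissue labels, protein names, and values are all hypothetical.

```python
from collections import defaultdict


def intensity_share_by_tissue(proteins):
    """Fraction of summed peak intensity attributed to each tissue.

    proteins: dict mapping a protein ID to (peak_intensity, expression),
    where expression maps tissue name -> RNA-seq expression (e.g. TPM).
    Assumed rule: a protein is assigned to its highest-expressing tissue;
    proteins with zero expression everywhere are 'unassigned'.
    """
    totals = defaultdict(float)
    grand_total = 0.0
    for intensity, expression in proteins.values():
        grand_total += intensity
        if not expression or max(expression.values()) <= 0:
            origin = "unassigned"
        else:
            origin = max(expression, key=expression.get)
        totals[origin] += intensity
    return {tissue: value / grand_total for tissue, value in totals.items()}


# Hypothetical proteins and values for illustration only.
example = {
    "protein_A": (50.0, {"harderian": 100.0, "vomeronasal": 5.0}),
    "protein_B": (30.0, {"harderian": 10.0, "vomeronasal": 40.0}),
    "protein_C": (20.0, {"harderian": 0.0, "vomeronasal": 0.0}),
}
print(intensity_share_by_tissue(example))
```

With these invented values, 50% of the summed intensity is attributed to the Harderian gland, 30% to the vomeronasal organ, and 20% remains unassigned. A stricter rule (for example, requiring a minimum expression ratio between tissues before assigning a protein) would leave more intensity unassigned.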

Chapter 3 also provides experimental evidence that the fluid within the vomeronasal organ has antimicrobial properties: in the in vitro BKAs, vomeronasal secretion inhibited or killed bacteria. These assays did not distinguish the properties of proteins produced in the Harderian gland from those of proteins produced in the vomeronasal organ, but antimicrobial defense proteins were found to be expressed in both tissues, so the effect may be due to a combination of proteins produced in both.

A candidate pheromone-binding protein was identified which was found to be both expressed in the Harderian gland and present in the lumen of the vomeronasal organ:

Any protein considered a candidate pheromone-binding protein produced in the Harderian gland of T. s. parietalis must fulfill several requirements. First, it must be capable of binding and solubilizing the methyl ketones (lipids) that compose the sexual attractiveness pheromone expressed on the skin of female snakes (Mason et al. 1989). Second, it must be present in a location where it can interact both with chemical signals in the environment and with the vomeronasal sensory epithelium. In T. s. parietalis, this means that it must be secreted into the vomeronasal lumen and exit through the vomeronasal duct onto the tongue, which is used to collect chemical signals. In addition to these requirements, a pheromone-binding protein is likely to display strongly male-biased expression, as well as expression biased toward the spring mating period.

In Chapters 3 and 4, one protein is described that fits all of these hypothetical characteristics. Chapter 4 describes a single transcript that was annotated as a lipocalin (an extracellular lipid-binding protein) and was highly expressed in the Harderian glands of males while being nearly absent from those of females (ID: DN1116_c0_g1_i1; hereafter referred to as the candidate pheromone-binding protein). This transcript was also significantly more highly expressed in males during the spring mating period than during the summer feeding period; this seasonal pattern was not observed in females. As described in Chapter 3, the same protein was identified (as a protein rather than as a transcript sequence) as a component of vomeronasal secretion using shotgun protein mass spectrometry. There, the candidate pheromone-binding protein was among the 10 most abundant proteins in the fluid within the vomeronasal organ. Additionally, it was detected only in secretions from male snakes and was nearly 3 times more abundant in males from the spring mating period than in those from the summer feeding period.

Together, these findings show that this protein is annotated as a lipocalin, that the gene encoding it is transcribed in the Harderian gland, and that the protein is present in vomeronasal secretion. Both transcript expression measured by RNA-seq and protein abundance in the vomeronasal lumen measured by mass spectrometry show the predicted male- and spring-biased patterns.

These findings are sufficient to identify a candidate pheromone-binding protein. Confirming its role, however, requires experimental evidence that this protein binds the pheromone, or that its presence increases the sensitivity of the vomeronasal sensory epithelium to sexual attractiveness pheromone.


The Harderian gland likely has a functional role as a component of the extracellular immune system within the vomeronasal chemosensory system:

The Harderian gland was found to express transcripts encoding proteins predicted to be involved in antimicrobial defense. Chapter 2 demonstrates that gene expression in the Harderian gland is enriched for antimicrobial defense proteins. Findings from Chapter 3 showed that many extracellular immune proteins are present in the fluid within the vomeronasal lumen and that many of those proteins are predicted to be produced in the Harderian gland. Immune function in the avian Harderian gland was first suggested by Wight et al. (1971), and it is well documented that the Harderian gland of birds produces extracellular antibodies (Montgomery and Maslin, 1992). Snakes and birds share a relatively recent (in evolutionary terms) common ancestor, both being members of the clade Sauropsida, so the Harderian glands of these two groups may share similar functions. Birds, however, do not appear to possess a fully formed vomeronasal organ, and in this group the Harderian gland is associated with the ocular mucosa.

Deist & Lamont (2018) found that the avian Harderian gland not only expresses genes encoding extracellular antibodies but also displays enriched expression of genes in the G-protein coupled receptor (GPCR) signaling pathway, and the authors suggest that this tissue is therefore also capable of responding to pathogen invasion. The findings here show no evidence of enrichment of the GPCR signaling pathway in the Harderian gland of T. s. parietalis. This may reflect differences in the morphology of the structures surrounding the Harderian gland, the eye, and the vomeronasal organ (or its absence in birds). Birds have an exposed ocular mucosa, whereas the eye of a snake is covered by a transparent scale. In this arrangement, the eye of a bird is frequently exposed to environmental pathogens, whereas the eye of a snake is protected from most of them. The Harderian gland of a snake instead appears to exert its immune effects remotely, via antimicrobial proteins secreted into the nasolacrimal duct. Therefore, the chicken Harderian gland may benefit from an ability to respond to pathogen invasion, whereas the gland in a snake would likely never contact pathogens directly. The ability of vomeronasal secretion to kill or deactivate bacteria in vitro, demonstrated in Chapter 3, constitutes the first experimental evidence consistent with an immune function of the Harderian gland in a squamate.

Evidence for porphyrin synthesis in the Harderian gland of T. s. parietalis:

Porphyrin production in the mammalian Harderian gland is among the few confirmed functions of this tissue in mammals. Although there is little evidence indicating a biological function for these molecules, porphyrin secretion has been associated with stress or trauma in rats and with oxidative stress in hamsters.

The findings presented in Chapter 4 show that the Harderian glands of both male and female snakes express genes related to porphyrin binding, but only during the summer feeding period. This is the first description of an association with porphyrin binding in a squamate, and because porphyrin production has not been described in amphibians, this description likely represents the evolutionarily earliest occurrence reported to date. As in mammals, the function of porphyrin-binding proteins here is unknown. If porphyrin production in garter snakes is associated with oxidative stress, its occurrence in the summer may be due to increased feeding and metabolism, as feeding does not occur at any other time of year. The association of porphyrin production with stress and physical trauma described in rodents may also indicate that the production of these compounds is related to infection or tissue-damage signals that elicit immune responses. Blakemore et al. (in prep. 2019) found that the immune responses of garter snakes are stronger in the summer feeding period than in the spring mating period; if the immune system is more active in the summer, the two may be related in some capacity. The potential functions of porphyrins in the Harderian glands of garter snakes mentioned here are based only on loose associations and should be regarded as speculative until they are investigated further.


Variation of V2R vomeronasal receptor profiles does not appear to mediate male seasonal behavioral cycles:

The expression of chemical receptors in the vomeronasal sensory epithelium of T. s. parietalis has been suggested as a possible mechanism contributing to the mutually exclusive behavioral shift from spring mating to summer feeding. Uhrig et al. (2012) showed that female pheromone profiles become less attractive several weeks after emergence and suggested that this may contribute to the behavioral shift observed in males. The behavioral shift, however, appears to occur independently of female emergence time, indicating that this is likely not the driving mechanism. Lutterschmidt et al. (2014) showed that the behavioral shift is likely due to changes within the neuroendocrine system that lead to differential responses to the signal, rather than to an inability to detect it.

No evidence was found here that a shift in the profile of chemosensory receptors expressed in the vomeronasal organ contributes to the behavioral shift. If the shift were driven by differential detection of sex pheromone resulting from different receptor repertoires in spring and summer, a downregulation of specific receptors (putative pheromone receptors) would likely be observed during the shift. Here, only two V2R receptors were found to be significantly differentially expressed and more abundant in the spring. Although statistically significant, these receptors had borderline p-values (~0.049), were expressed at relatively low levels in the spring, and were not absent in the summer, as would have been expected if they mediated seasonal behaviors. Their association with the spring mating season could indicate that they are involved in detecting sex pheromone, but it provides no evidence that their downregulation in the summer contributes to the behavioral shift. These observations indicate that the most likely mechanism is still the neuroendocrine regulation occurring in the brains of male snakes described by Lutterschmidt et al. (2014).

Although this study did not find evidence that receptor repertoires vary greatly by sex or change between spring and summer, it did show that vomeronasal chemosensory receptors are nearly universally downregulated in females in the spring. Huang et al. (2006) showed that the vomeronasal sensory epithelium of male T. s. parietalis responded to sexual attractiveness pheromone, but that of females did not. This disparity led to the hypothesis that the vomeronasal organs of male snakes may express specific pheromone receptors not expressed in females. The results presented in Chapter 5, however, demonstrate that rather than a single receptor or a small number of receptors being expressed only in males, nearly all of the vomeronasal receptors were downregulated in females. This suggests that the findings of Huang et al. (2006) are likely due to this nearly universal downregulation of receptors in spring females rather than to the absence of sex-specific pheromone receptors from the female vomeronasal sensory epithelium.

Evidence for sexually dimorphic seasonal regulation of the vomeronasal chemosensory system including the Harderian gland:

Repeatedly throughout this body of research, results suggest that the vomeronasal organs and Harderian glands of female garter snakes are relatively inactive during the spring. This is demonstrated by the findings that male Harderian glands in the spring are enriched for proteins involved in secretion and lipid binding (two major functions of the Harderian gland as a whole), as well as for many other functional gene sets involved in transcription, translation, and various metabolic processes. Additionally, with the exception of a single transcript, females showed very little expression of V2R vomeronasal chemical receptors. These findings mirror and corroborate those of Erickson (2007), which showed that the female Harderian gland was inactive in the spring compared both to males and to females in the summer.

It is consistently observed that male T. s. parietalis dedicate an extraordinary amount of time and energy to mating in the spring. Males emerge as soon as the snow melts, which usually occurs in early to mid-April. Males appear on the surface before females, and individuals spend much of the active portion of each day searching for potential mates and competing with other males for mating opportunities. This continues until most of the females have left the den in late May. During this time, males do not feed and will not take food even if it is offered. The season in which garter snakes can feed in Manitoba is very short and can be as little as 10 weeks, excluding the time it takes to migrate to and from the feeding areas. It is remarkable that these snakes remain at the den for over a month, trading valuable feeding opportunities for the opportunity to mate. This tradeoff is essential in this population because a male must be at the den in order to pass his genes to the next generation. The extreme nature of this explosive mating system illustrates the intensity of the selective force driving the entire population to participate. It follows that while males are at the den, they must maximize their ability to locate mates. The detection of sexual attractiveness pheromone via the vomeronasal chemosensory system is the primary (perhaps only) method by which a male snake locates and recognizes a female as a potential mate. This pheromone also conveys information about the size and body condition, and therefore the fecundity, of a female, making it an extremely important factor influencing male reproductive success (LeMaster and Mason, 2002).

In addition to likely facilitating the detection of female sex pheromone, the Harderian gland appears to be an important component of the immune system, producing antimicrobial defense proteins that protect the sensitive mucous membranes and sensory epithelium of the vomeronasal organ from environmental pathogens. It has been suggested that the various lipocalins found in ‘tears’ in the eyes of mice may function both in chemical signaling and as part of the immune system (Stopková, 2014). The findings presented here may lend support to this hypothesis, as several lipocalins are secreted by the Harderian gland in very large quantities. Although no experimental evidence shows that these lipocalins specifically have antimicrobial properties, Chapter 3 demonstrates that the whole secretion kills or disables bacteria in vitro.

Meanwhile, female Red-sided garter snakes begin to emerge several weeks after the first males appear. Females remain near the den for only a short time before migrating to the surrounding marshes to begin feeding. While at the den, females rarely tongue-flick and do not actively seek to locate or pursue mates; rather, they are actively pursued by large numbers of males seeking mating opportunities.

This mating system places the responsibility for locating mates on males at the den. Because locating mates depends heavily on a male's ability to detect female sex pheromones, it follows that a strong selective force would drive the vomeronasal organs of males to be extremely active during the spring mating period. The vomeronasal organ of females does not appear to be nearly as important to an individual female's reproductive success, suggesting that selective forces would drive the female vomeronasal chemosensory system toward an inactive state as a means of conserving energy, or at least would not drive it toward becoming more active.

Findings presented throughout this thesis indicate that the vomeronasal chemosensory system of males becomes active much earlier in the spring than that of females. During the spring, males use this chemosensory system more actively, tongue-flicking and responding to chemical cues. At this time, the male Harderian gland is physically hypertrophied, and expression of genes encoding many functional proteins is increased. Simultaneously, the vomeronasal organ begins to express chemical receptors that allow males to respond to their chemical environment.


Notes on transcriptome quality and applications within the T. s. parietalis system:

The genome of T. sirtalis was sequenced in 2015 (Thamnophis sirtalis annotation release 100, NCBI), but its usefulness as a reference for sequencing-based research involving T. s. parietalis is extremely limited. Several attempts by multiple research groups have shown that using gene sequences from this draft genome results in poor-quality data that are not useful for this purpose. With the same Tag-seq data presented in Chapters 2, 3, 4, and 5, mapping to the T. s. concinnus genome resulted in very low percentages of mapped reads and many low-quality alignments. These findings support the decision to create a multi-tissue transcriptome from T. s. parietalis. To my knowledge, no high-throughput RNA-seq has been attempted in T. s. concinnus, so the suitability of the genome as a reference for that subspecies cannot be evaluated.

Tag-seq data from the Harderian glands, which yielded low mapping percentages against the T. s. concinnus genome, yielded very high mapping percentages (>95%) against the transcriptome produced here. Read-mapping data from the vomeronasal organs (used in Chapters 3 and 5) yielded a lower percentage of mapped reads (~77%). This appears to be the result of an unexpected incompatibility between this specific version of the Tag-seq protocol and the Illumina platform on which the library was sequenced. The library produced from Harderian gland tissue was sequenced on the HiSeq2500, whereas the library from vomeronasal organ tissue was sequenced on the HiSeq3000. The latter produces more reads but, in conjunction with this library preparation, also produces more low-quality reads. Examination of the read-mapping data from this platform showed that the reads that do map appear to be assigned correctly, indicating that the issue lies with the number of reads mapped and that the quality of the findings is not substantially affected. Read mapping from both sequencing runs still far outperformed mapping to the genome.


Mapping RNA-seq data from a separate sequencing run to the multi-tissue transcriptome appears to have improved the quality of the data. The Harderian gland tissue-specific transcriptome showed greatly improved quality metrics compared with the full transcriptome assembly. This is likely due to the removal of a large number of spuriously assembled transcripts, which are present in most transcriptome assemblies. Transcripts to which reads mapped are likely genuine and have been demonstrated to be expressed in the Harderian gland of T. s. parietalis.
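The idea of a tissue-specific transcriptome, in which only assembled transcripts supported by mapped reads are retained, can be sketched as a simple filter. This is an illustration under stated assumptions rather than the pipeline used in Chapter 2: it assumes a per-sample table of mapped read counts and a hypothetical minimum total count (`min_total_reads`) below which a transcript is treated as unsupported. The function name and transcript IDs are placeholders.

```python
def tissue_specific_transcripts(assembly_ids, counts_by_sample, min_total_reads=1):
    """Return assembly transcripts supported by at least min_total_reads mapped reads.

    assembly_ids: all transcript IDs in the full (multi-tissue) assembly.
    counts_by_sample: dict of sample name -> {transcript ID: mapped read count}.
    Transcripts absent from every count table are treated as having zero reads.
    """
    totals = {tx: 0 for tx in assembly_ids}
    for sample_counts in counts_by_sample.values():
        for tx, n_reads in sample_counts.items():
            if tx in totals:  # ignore IDs not present in the assembly
                totals[tx] += n_reads
    return sorted(tx for tx, total in totals.items() if total >= min_total_reads)


# Placeholder IDs and counts for illustration only.
assembly = ["tx1", "tx2", "tx3", "tx4"]
counts = {
    "sample_1": {"tx1": 5, "tx2": 0},
    "sample_2": {"tx1": 3, "tx3": 1},
}
print(tissue_specific_transcripts(assembly, counts))     # ['tx1', 'tx3']
print(tissue_specific_transcripts(assembly, counts, 2))  # ['tx1']
```

Raising the threshold trades sensitivity for stringency: a higher minimum discards more low-support (possibly spurious) transcripts but may also drop genuinely lowly expressed ones.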


Future directions:

Differences in eye physiology may be responsible for the lack of G-protein coupled receptor signaling pathway enrichment in snakes compared to birds:

The Harderian gland of the chicken was found to be enriched for expression of proteins involved in the G-protein coupled receptor signaling pathway (Deist & Lamont, 2018). The authors attributed this to detection of and response to pathogens invading the ocular mucosa. The GPCR pathway was not found to be enriched in the Harderian gland of T. s. parietalis. As described here, the chicken eye is exposed, whereas the eyes of snakes are covered with a protective transparent scale and are not exposed to the environment. This hypothetical explanation is plausible given the differences in eye structure between these groups. However, the present findings only failed to detect enrichment of the GPCR pathway, which does not demonstrate that the pathway is inactive. Further research may test the validity of this explanation and further our understanding of the Harderian gland in both snakes and birds.

Harderian gland protein HG33:

Many proteins were identified in the fluid secretions of the vomeronasal organ, but the majority of the fluid consists of just a few proteins. The second most abundant protein, comprising nearly one-third of the total protein content of the secretion, is annotated as ‘Harderian gland protein HG33’, identified by Rehorek and Saxion (2013). No publication is associated with HG33 as of this writing, but it has been identified as a cDNA and deposited in the NCBI database. Neither the structure nor the function of HG33 is known. Because this protein is both highly expressed in the Harderian gland and present at very high abundance in vomeronasal secretion, it is likely a key component of this secretion and essential to our understanding of the role of the Harderian gland in the vomeronasal chemosensory system. These findings warrant additional research to determine the structure and function of HG33 and its role within the vomeronasal organ.


Further analysis of vomeronasal secretion using mass spectrometry:

The protein mass spectrometry analysis described in Chapter 3 included statistical comparisons of protein abundances in the fluid of the vomeronasal organ. Because of the high variation observed between samples, these comparisons did not yield any statistically significant differences, even for proteins with apparently very different abundance patterns. Further analyses in this line of research would benefit from additional samples to better characterize the sexual and seasonal variation in the protein content of vomeronasal secretion and to provide more statistically rigorous results.
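The benefit of adding samples can be made concrete with a standard sample-size approximation for comparing two group means. As a hedged sketch, it assumes roughly normally distributed (for example, log-transformed) protein abundances, equal group sizes, a two-sided test at significance level `alpha`, and a standardized effect size d (the difference in means divided by the standard deviation). It uses the normal approximation n = 2(z₁₋α/₂ + z_power)² / d², which slightly underestimates the sample size a t-test would require, and it is not the analysis performed in Chapter 3; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate samples per group for a two-sided, two-sample comparison.

    effect_size: standardized difference in means (difference / SD).
    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Higher between-sample variation shrinks the standardized effect size,
# which sharply increases the number of samples needed.
for d in (2.0, 1.0, 0.5):
    print(d, samples_per_group(d))
```

Under these assumptions, halving the standardized effect size roughly quadruples the required sample size (from 16 to 63 per group when d drops from 1.0 to 0.5), which is why high between-sample variation can mask large apparent differences in small studies.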

Additional ‘low confidence’ protein identifications in vomeronasal secretion:

The protein mass spectrometry portion of this study described in Chapter 3 identified 140 proteins with high confidence as components of vomeronasal secretion. In addition to these, 536 proteins were identified in the secretion but excluded from the list of high-confidence identifications because they were identified in only a single sample or based on a single unique peptide. While this list of lower-confidence proteins surely contains some spurious identifications and protein contaminants, it is also likely to contain proteins with important functions within this fluid. Because these data have already been generated, pursuing this line of research requires only researcher time and may yield interesting or important additional information about the function of these proteins.
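The split between high- and lower-confidence identifications can be expressed as a simple rule. As a sketch, assuming each identification records the number of samples in which the protein was detected and the number of unique peptides supporting it, a protein is treated as high confidence only if it was seen in at least two samples and supported by at least two unique peptides. The thresholds mirror the exclusion criteria stated in the preceding paragraph; the record layout, function name, and accession names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProteinID:
    accession: str          # hypothetical identifier
    n_samples: int          # samples in which the protein was detected
    n_unique_peptides: int  # unique peptides supporting the identification


def split_by_confidence(ids, min_samples=2, min_unique_peptides=2):
    """Return (high_confidence, lower_confidence) lists of identifications."""
    high, low = [], []
    for p in ids:
        if p.n_samples >= min_samples and p.n_unique_peptides >= min_unique_peptides:
            high.append(p)
        else:
            low.append(p)
    return high, low


ids = [
    ProteinID("prot_1", n_samples=6, n_unique_peptides=12),
    ProteinID("prot_2", n_samples=1, n_unique_peptides=5),  # single sample
    ProteinID("prot_3", n_samples=4, n_unique_peptides=1),  # single peptide
]
high, low = split_by_confidence(ids)
print([p.accession for p in high])  # ['prot_1']
print([p.accession for p in low])   # ['prot_2', 'prot_3']
```

Returning the lower-confidence list rather than discarding it keeps those identifications available for the follow-up analysis proposed here.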

Experiments to confirm the function of the candidate pheromone-binding protein identified here:

Chapters 3 and 4 present evidence that a candidate pheromone-binding protein is expressed in the Harderian gland and present in the fluid of the vomeronasal organ. The candidate protein appears to fulfill all of the necessary requirements: it is annotated as a lipocalin-family lipid-binding protein and is present as protein in the fluid of the vomeronasal organ. Additionally, it displays the expression patterns predicted for a pheromone-binding protein, with strongly male-biased expression and, in males but not in females, seasonally biased expression toward the spring mating period. The findings of Huang et al. (2006) show that Harderian gland homogenate increases the solubilization of sex pheromone, indicating that a pheromone-binding protein is likely to be present. The lipocalin discussed here is the most likely candidate, but no experimental evidence has yet confirmed its function. To confirm it as a pheromone-binding protein, experiments must demonstrate that this lipocalin binds the pheromone, or that the protein increases the sensitivity of the vomeronasal organ to sex pheromones. Such evidence may be obtained from in vitro binding and solubilization assays, from electrophysiological recordings of vomeronasal sensory epithelium responses to pheromone, or from knocking down the protein with siRNA or morpholinos paired with behavioral assays demonstrating a reduction in courtship behavior. This evidence is essential to confirming the pheromone-binding role of the Harderian gland and is considered the highest priority for future research in this area.

What is the functional significance of the differentially expressed genes in the vomeronasal organ?

This research focused on characterizing vomeronasal receptor expression in T. s. parietalis with respect to specific hypotheses. While effective for these purposes, the analyses performed here did not include an in-depth examination of differential expression across all genes in the vomeronasal organ of this species. Differential expression analyses identified thousands of genes that were significantly differentially expressed by sex, season, or both, but the functional significance of these genes is unknown. High-throughput sequencing (the most resource-intensive component of this analysis) has already been performed, and the remaining data await analysis and interpretation.

What are the functions of the vomeronasal receptors with contrasting expression patterns?

The findings presented here showed that the majority of vomeronasal receptors are downregulated in spring females. However, one receptor transcript was found to be significantly upregulated in spring females compared to summer females, and three receptors were found to be upregulated in spring females compared to spring males. One explanation for these findings is that the transcripts are misidentified: rather than V2Rs, they may be transcripts of other genes whose expression does not follow the general trend of the true V2R receptors. Alternatively, these transcripts may be genuine V2R receptors, in which case they may have important biological functions not shared by the other receptors. For instance, the detection of specific chemical signals by females in the spring may be important to the biology of this species. Further investigation is necessary to determine whether these transcripts are misidentified and, if they are correctly identified, to determine their function in the vomeronasal organ of T. s. parietalis.

Identification of FPRs, V1Rs, and ‘ancV1R’:

This study included all transcripts from the multi-tissue transcriptome (Chapter 2) that were annotated as vomeronasal receptors. A total of 615 receptor transcripts were identified in the transcriptome, of which 322 were found to be expressed in the vomeronasal organ of T. s. parietalis. All of these were annotated as V2R vomeronasal receptors. Across taxa, vomeronasal organs use a broader variety of receptors, including V1Rs, V2Rs, and formyl peptide receptors (FPRs). It is possible that the annotation of these transcripts was biased toward organisms closely related to T. s. parietalis, such as other snakes, other squamates, and other sauropsids. Many V1Rs and FPRs have been described in other groups, and some of the receptors found here may be misidentified. Additional annotation analyses may yield more accurate or more specific identifications of the vomeronasal receptors in this group.

Additionally, a recently described vomeronasal receptor, referred to as ‘ancV1R’ (ancient V1R), was identified in teleost fishes and found to be conserved across 400 million years of vertebrate evolution, as it is also present in mammals (Suzuki et al. 2018). Because this V1R is present in many vertebrate lineages, it has been suggested to represent an ancestral V1R protein. A targeted analysis of whether ancV1R is present in snakes may yield results with interesting evolutionary implications.

Expression of plasminogen in the Harderian gland:

Plasminogen is the inactive precursor of plasmin, a protease commonly found in the bloodstream. Plasmin is especially important in degrading blood clots, which form when fibrinogen is converted into fibrin polymers. Plasminogen is normally produced primarily in the liver. Here, however, extremely high levels of plasminogen expression (~30.6% of identified protein expression) were found in Harderian gland tissue. Despite this high level of expression, plasminogen was not detected in vomeronasal secretions by protein mass spectrometry. The functional significance of plasminogen production in the Harderian gland is unknown, as is the destination of the protein. It is possible that the plasminogen transcripts are not translated into protein except in the event of injury. It is also possible that the plasminogen produced here is secreted into the bloodstream rather than into the nasolacrimal duct, which carries Harderian gland secretions to the vomeronasal organ. These findings suggest that the Harderian gland of T. s. parietalis may have a role in plasminogen production not yet described in any organism.

Electrophysiology to examine responses to sex pheromone in the summer:

The patch-clamp electrophysiology study conducted by Huang et al. (2006) showed that the vomeronasal organs of males respond to sex pheromone, whereas those of females do not. This study, however, was conducted during the spring mating period, when vomeronasal receptors are almost universally downregulated in females. Repeating this study in the summer could determine 1) whether male vomeronasal organs still respond to sex pheromone during the summer, and 2) whether females are capable of responding to sex pheromone during the summer. If both were observed, these results would support the conclusion suggested here that seasonal variation in vomeronasal receptor profiles is unlikely to be the mechanism underlying the shift from mating to feeding observed in male snakes. They would also indicate that females do express vomeronasal receptors capable of detecting sex pheromone, but only in the summer.

Bibliography:

Achiraman, S., Archunan, G. (2005). 3-Ethyl-2,7-dimethyl octane, a testosterone dependent unique urinary sex pheromone in male mouse (Mus musculus). Animal reproduction science. 87:151-61. Akerstrom, B., Maghzal, G., Winterbourn, C., Kettle, A. (2007) The lipocalin alpha(1)-microglobulin has radical scavenging activity. Journal of Biological Chemistry. 282:31493–31503. Albini, B., Wick, G., Rose, E., Orlans, E. (1974) Immunoglobulin production in chicken Harderian glands. Int Arch Allergy Immunol. 47:23-34. Albone, E. (1984) Mammalian Semiochemistry. In: The investigation of chemical signals between mammals. (Chichester). Alekseyenko, O., Baum, M., Cherry, J. (2006) Sex and gonadal steroid modulation of pheromone receptor gene expression in the mouse vomeronasal organ. Neuroscience. 140(4):1349-57. Aleksiuk, M., Gregory, P. (1974) Regulation of seasonal mating behavior in Thamnophis sirtalis parietalis. Copeia. 1974(3):681489. Alving, W. & Kardong, K. (1996). The role of the vomeronasal organ in rattlesnake (Crotalus viridis oreganus) predatory behavior. Brain, Behavior and Evolution. 48(3):165-172. Andrews, S. (2010) FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc. Ballouz, S., Pavlidis, P., Gillis, J. (2017). Using predictive specificity to determine when gene set analysis is biologically meaningful. Nucleic Acids Research. 45(4):e20. Bargmann, C. (1997) Olfactory receptors, vomeronasal receptors, and the organization of olfactory information. Cell. 90(4):585-587. Barka, T., Anderson, P. (1965) Histochemistry: theory, practice and bibliography. Harper and Row, New York. Bateman, A., Clements, J., Coggill, P., Eberhardt, R., Eddy, S., Heger, A., Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E., Tate, J., Punta, M. (2013) Pfam: the protein families database. Nucleic acids research. 42:D222-D230. Belanger, R., Corkum, L. (2009) Review of aquatic sex pheromones and chemical communication in anurans. Journal of Herpetology. 43(2). Bellairs, A. (1970). In: The life of reptiles. Universe Books, New York.

Bellairs, A., Boyd, J. (1950). The lachrymal apparatus in lizards and snakes II: The anterior part of the lachrymal duct and its relationship with the palate and with the nasal and vomeronasal organs. Proc. Zool. Soc. Lond. 120:167-310. Benjamini, Y., & Hochberg, Y. (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. B57(1):289-300. Bernstein, C., Bernstein, H. (1997) Sexual communication. Journal of theoretical biology. 188(1):69-78. Bertmar, G. (1981). Evolution of vomeronasal organs in vertebrates. Evolution. 35:359–366. Beynon, R., Hurst, J., Turton, M., Robertson, D., Armstrong, S., Cheetham, S., Simpson, D., MacNicoll, A., Humphries, R. (2008) Urinary lipocalins in Rodentia: Is there a Generic Model? In: Chemical Signals in Vertebrates. 11. Springer, New York, NY. Blakemore, L. (2017) Sex and survival: Reproduction and anti-microbial defense in the Red-sided garter snake (Thamnophis sirtalis parietalis). M.S. thesis. Oregon State University. Corvallis, OR. Blakemore, L., Bentz, E., Hubert, D., Morehead, A., Mason, R. (In Prep.) Improved husbandry techniques resulting in higher lab-raised neonatal survival in the Red-sided garter snake, Thamnophis sirtalis parietalis. Blakemore, L., Dolan, B., Bentz, E., Mason, R. (In Prep.) Sex or survival: tradeoffs between reproduction and anti-microbial defense in the Red-sided garter snake, Thamnophis sirtalis parietalis. Blighe, K. (2018) EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling. https://github.com/kevinblighe. Bocskei, Z., Groom, C., Flower, D., Wright, C., Phillips, S., Cavaggioni, A. (1992) Pheromone-binding to two rodent urinary proteins revealed by X-ray crystallography. Nature. 360:186–8. Boillat, M., Challet, L., Rossier, D., Kan, C., Carleton, A., Rodriguez, I. (2015) The vomeronasal system mediates sick conspecific avoidance. Curr Biol. 25(2):251-255. Born, G. (1876). Über die Nasenhöhlen und der Thränennasengang der Amphibien. Gegenb. morph. Jahrb. 2:578-64. Bourgon, R., Gentleman, R., Huber, W. (2010) Independent filtering increases detection power for high-throughput experiments. Proceedings of the National Academy of Sciences. 107(21):9546-9551. Brito, N., Moreira, M., Melo, A. (2016) A look inside odorant-binding proteins in insect chemoreception. Journal of Insect Physiology. 95:51-65.

Broman, I. (1920). Das Organon Vomero-Nasale Jacobsoni-ein Wassergeruchsorgan! Arb. anat. Inst., Wiesbaden (Anat. Hefte, Abt. I) 58:137-191. Brownscheidle, C. (1974) The Morphology and Histochemistry of the Harderian Gland of the Mongolian Gerbil, Meriones unguiculatus. Doctoral dissertation, State University of New York at Buffalo, Buffalo, NY. Brykczynska, U., Tzika, A., Rodriguez, I., Milinkovitch, M. (2013) Contrasted evolution of the vomeronasal receptor repertoires in Mammals and Squamate reptiles. Genome Biology and Evolution. 5(2):389–401. Bucana, C., Nadakavukaren, M. (1972). Fine structure of the hamster Harderian gland. Z. Zellforsch. 129:178–187. Buck, L. & Axel, R. (1991) A novel multigene family may encode odorant receptors: a molecular basis for odor recognition. Cell. 65:175-187. Burns, R. (1992) The Harderian gland in birds: histology and immunology. In: Harderian glands: porphyrin metabolism, behavioral and endocrine effects. (Springer). 155-163. Buzzell, G. (1996a). Sexual dimorphism in the Harderian gland of the Syrian hamster is controlled and maintained by hormones, despite seasonal fluctuations in hormone levels: functional implications. Microsc. Res. Tech. 34:133–138. Buzzell, G. (1996b). The Harderian gland: perspectives. Microsc. Res. Tech. 34:2-5. Buzzell, G., Menendez-Pelaez, A., Porkka-Heiskanen, T., Pangerl, B., Vaughan, M., Reiter, R. (1989) Bromocriptine prevents the castration-induced rise in porphyrin concentration in the Harderian glands of the male Syrian hamster, Mesocricetus auratus. Journal of Experimental Zoology. 249:172-176. Castaldi, P., Cho, M., Liang, L., Silverman, E., Hersh, C., Rice, K., Aschard, H. (2017) Screening for interaction effects in gene expression data. PLoS ONE. 12(3):e0173847. Castellino, F. (2013). Plasmin: Activity and specificity. In: Handbook of proteolytic enzymes. 3. Castellino, F., Ploplis, V. (2005) Structure and function of the plasminogen/plasmin system. Thrombosis and Haemostasis. 93(4):647-654. Chang, H., Liu, Y., Yang, T., Pelosi, P., Dong, S., & Wang, G. (2015). Pheromone-binding proteins enhance the sensitivity of olfactory receptors to sex pheromones in Chilo suppressalis. Scientific reports. 5:1309. Cheetham, S., Thom, M., Jury, F., Ollier, W., Beynon, J., Hurst, J. (2007) The genetic basis of individual-recognition signals in the mouse. Curr Biol. 17:1771–1777.

Chieffi Baccari, G., Chieffi, G., Di Matteo, L., Danfis, D., De Rienzo, G., Minucci, S. (2000) Morphology of the Harderian gland of the Gecko, Tarentola mauritanica. Journal of morphology. 244(2):137-142. Chieffi, G., Baccari, G., Di Matteo, L., Istria, M., Minucci, S., Varriale, B. (1996). Cell Biology of the Harderian Gland. International Review of Cytology. 168:1-80. Chieffi, G., Chieffi-Baccari, G., Di Matteo, L., d’Istria, M., Marmorino, C., Minucci, S. and B. Varriale. (1992). The Harderian gland of amphibians and Reptiles. In: Harderian Glands. Porphyrin Metabolism, Behavioral and Endocrine Effects. Webb, S.M., Hoffman, R.A., Puig-Domingo, M.L. and R. J. Reiter (Eds). Springer-Verlag, Berlin. 91-108. Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., McPherson, A., Wojciech Szcześniak, M., Gaffney, D., Elo, L., Zhang, X., Mortazavi, A. (2016). A survey of best practices for RNA-seq data analysis. Genome Biology. 17:13. Coto-Montes, A., Boga, J., Tomás-Zapico, C., Rodríguez-Colunga, M., Martínez-Fraga, J., Tolivia-Cadrecha, D., Menéndez, G., Hardeland, R. (2001) Physiological oxidative stress model: Syrian hamster Harderian gland—sex differences in antioxidant enzymes. Free Radical Biology and Medicine. 30(7):785-792. Cucinotta, F., Chappell, L. (2010) Non-targeted effects and the dose response for heavy ion tumor induction. Mutat Res. 687(1-2):49-53. Cui, Zhou, Satoh, Habara. (2003). A physiological role for protoporphyrin IX photodynamic action in the rat Harderian gland? Acta Physiologica Scandinavica. 179(2):149-154. D'Aniello, B., Semin, G. R., Scandurra, A., & Pinelli, C. (2017). The Vomeronasal Organ: A Neglected Organ. Frontiers in neuroanatomy. 11:70. David, M., Dzamba, M., Lister, D., Ilie, L., Brudno, M. (2011) SHRiMP2: Sensitive yet practical short read mapping. Bioinformatics. 27(7):1011–1012. Dawley, E., Crowder, J. (1995) Sexual and seasonal differences in the vomeronasal epithelium of the red-backed salamander (Plethodon cinereus). Journal of Comparative Neurology. 359(3):382-90. Day, J., Thorpe, S., Baynes, J. (1979) Nonenzymatically glucosylated albumin. In vitro preparation and isolation from normal human serum. J. Biol. Chem. 254(3):595–7. Dayger, C., Lutterschmidt, D. (2016) Seasonal and sex differences in responsiveness to adrenocorticotropic hormone contribute to stress response plasticity in Red-sided garter snakes (Thamnophis sirtalis parietalis). Journal of Experimental Biology. 219(7):1022.

Deist, M., Lamont, S. (2018) What makes the Harderian gland transcriptome different from other chicken immune tissues? A gene expression comparative analysis. Frontiers in physiology. 9:492. Di Matteo, L., Minucci, S., Chieffi Baccari, G., Pellicciari, C., d’Istria, M., Chieffi, G. (1989) The harderian gland of the frog, Rana esculenta, during the annual cycle: histology, histochemistry and ultrastructure. Basic and Applied Histochemistry. 33(2):93-112. d'Istria, M., Chieffi-Baccari, G., Di Matteo, L., Minucci, S., Varriale, B., Chieffi, G. (1991) Androgen receptor in the Harderian gland of Rana esculenta. Journal of Endocrinology. 129:227–232. Dolan, B., Fisher, K., Colvin, M., Benda, S., Peterson, J., Kent, M., Schreck, C. (2016) Innate and adaptive immune responses in migrating spring-run adult chinook salmon, Oncorhynchus tshawytscha. Fish Shellfish Immunol. 48:136-44. Domínguez-Pérez, D., Durban, J., Aguero-Chapin, G., Lopes, J., Molina, Ruiz, R., Almeida, D., Calvete, J., Vasconcelos, V., Antunes, A. (2018). The Harderian gland transcriptomes of Caraiba andreae, Cubophis cantherigerus and Tretanorhinus variabilis, three colubroid snakes from Cuba. Genomics. In press. Domon, B., Aebersold, R. (2006) Mass spectrometry and protein analysis. Science. 312:212-217. Døving, K., & Trotier, D. (1998). Structure and function of the vomeronasal organ. Journal of experimental biology. 201(21):2913-2925. Døving, K., Trotier, J., Rosin, F., Holley, A. (1993). Functional architecture of the vomeronasal organ of the frog (Genus Rana). Acta Zool. 74:173-180. Dugovich, B., Peel, M., Palmer, A., Zielke, R., Sikora, A., Beechler, B., Jolles, A., Epps, C., Dolan, B. (2017) Detection of bacterial-reactive natural IgM antibodies in desert bighorn sheep populations. PLoS ONE. 12(6):e0180415. Dulac, C. & Axel, R. (1995) A novel family of genes encoding putative pheromone receptors in mammals. Cell. 83:195-206. Dulac, C., Torello, A. (2003) Molecular detection of pheromone signals in mammals: From genes to behaviour. Nat Rev Neurosci. 4:551–562. Eddy, S. (2011) Accelerated profile HMM searches. PLoS Computational Biology. 7:e1002195. Eisthen, H. (1992) Phylogeny of the vomeronasal system and of receptor cell types in the olfactory and vomeronasal epithelia of vertebrates. Microsc Res Tech. 23(1):1-21. Eisthen, H. (1997). Evolution of vertebrate olfactory systems. Brain Behav. Evol. 50:222–233.

Erickson, S. (2007) Sexual dimorphism and seasonal changes in the Harderian gland of the Red-sided garter snake, Thamnophis sirtalis parietalis. Bachelor’s thesis. Oregon State University Honors College. Corvallis, OR. Fagerberg, L., et al. (2013) Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteomics. 13(2):397-406. Fan, J., Francis, F., Liu, Y., Chen, J., & Cheng, D. (2011). An overview of odorant-binding protein functions in insect peripheral olfactory reception. Genetics and molecular research. 10(4):3056-3069. Figge, F., Atkinson, W. (1945) Relation of water metabolism to porphyrin incrustations in pantothenic acid-deficient rats. in: Proc. soc. exper. biol. & med. Finn, R., Bateman, A., Clements, J., Coggill, P., Eberhardt, R., Eddy, S., Heger, A., Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E., Tate, J., Punta, M. (2014) The Pfam protein families database. Nucleic Acids Research. 42:D222-D230. Flo, T., Smith, K., Sato, S., Rodriguez, D., Holmes, M., Strong, R., Akira, S., Aderem, A. (2004) Lipocalin 2 mediates an innate immune response to bacterial infection by sequestrating iron. Nature. 432(7019):917-21. Flower, D. (1996) The lipocalin protein family: structure and function. Biochem. J. 318:1-14. Francia, S., Pifferi, S., Menini, A., Tirindelli, R. (2014) Vomeronasal Receptors and Signal Transduction in the Vomeronasal Organ of Mammals. In: Neurobiology of Chemical Communication. Boca Raton (FL): CRC Press/Taylor & Francis. Friesen, C., Uhrig, E., Squire, M., Mason, R., Brennan, P. (2014) Sexual conflict over mating in Red-sided garter snakes (Thamnophis sirtalis) as indicated by experimental manipulation of genitalia. Proceedings of the Royal Society B: Biological Sciences. Gaskett, A. (2007) Spider sex pheromones: emission, reception, structures, and functions. Biological reviews. 82(1):27-48.
Geer, L., Marchler-Bauer, A., Geer, R., Han, L., He, J., He, S., Liu, C., Shi, W., Bryant, S. (2010) The NCBI BioSystems database. Nucleic Acids Res. 38:D492-D496. Gillis, J., Mistry, M., Pavlidis, P. (2010) Gene function analysis in complex data sets using ermineJ. Nature Protocols. 5(6):1148-59. Gomez-Diaz, C., Benton, R. (2013). The joy of sex pheromones. EMBO reports. 14(10):874-883. Grabherr, M., Haas, B., Yassour, M., Levin, J., Thompson, D., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B., Nusbaum, C., Lindblad-Toh, K., Friedman, N., Regev, A. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature biotechnology. 29(7):644-52. Gregory, P. (1974) Patterns of spring emergence of the Red-sided garter snake (Thamnophis sirtalis parietalis) in the Interlake region of Manitoba. Canadian Journal of Zoology. 52:1063-1069. Gregory, P. (1977) Life-history parameters of the Red-sided garter snake (Thamnophis sirtalis parietalis) in an extreme environment, the Interlake region of Manitoba. Nat Mus Can Publ Zool. 13:1–44. Gregory, P., Stewart, K. (1975) Long-distance dispersal and feeding strategy of the Red-sided garter snake (Thamnophis sirtalis parietalis) in the Interlake of Manitoba. Can. J. Zool. 53:238-245. Haas, B., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P., Bowden, J., Couger, M., Eccles, D., Li, B., Lieber, M., MacManes, M., Ott, M., Orvis, J., Pochet, N., Strozzi, F., Weeks, N., Westerman, R., William, T., Dewey, C. N., Henschel, R., LeDuc, R., Friedman, N., Regev, A. (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature protocols. 8(8):1494-512. Halpern, M. (1987) The organization and function of the vomeronasal system. Annual Review of Neuroscience. 10:325-362. Halpern, M. (1992). Nasal chemical senses in reptiles. Hormones, brain and behavior: Biology of the Reptilia. 18:423-523. Halpern, M., Kubie, J., Silverstein, R., Muller-Schwarze, D. (1983) Snake tongue flicking behavior: clues to vomeronasal system functions. In: Chemical Signals III. Plenum, NY. 45-72. Halpern, M., Martinez-Marcos, A. (2003) Structure and function of the vomeronasal system: an update. Progress in neurobiology. 70:245-318. Hamdani, E., Lastein, S., Gregersen, F., Døving, K. (2008) Seasonal variations in olfactory sensory neurons—fish sensitivity to sex pheromones explained? Chemical Senses. 33(2):119–123. Harder, J. (1694). Glandula nova lachrymalis una cum ductu excretorio in cervis et damis. Acta Erudit. Lips. 49–52. Harkness, E., Ridgway, M. (1980) Chromodacryorrhea in laboratory rats (Rattus norvegicus): Etiologic considerations. Laboratory animal science. 30:841-4. Hathout, Y. (2007) Approaches to the study of the cell secretome. Expert Review of Proteomics. 4(2):239. Hekmat-Scafe, D., Scafe, C., McKinney, A., Tanouye, M. (2002) Genome-wide analysis of the odorant-binding protein gene family in Drosophila melanogaster. Genome Res. 12(9):1357-69.

Hillenius, W., Phillips, A., Rehorek, S. (2007). “A new lachrymal gland with an excretory duct in red and fallow deer” by Johann Jacob Harder (1694): English translation and historical perspective. Annals of anatomy = Anatomischer Anzeiger: official organ of the Anatomische Gesellschaft. 189:423-33. Hillenius, W., Rehorek, S. (2005). From the eye to the nose: ancient orbital to vomeronasal communication in tetrapods? In: Mason, R., LeMaster, M., Müller-Schwarze, D. (Eds.), Chemical Signals in Vertebrates (10). Kluwer Academic, New York. 228–241. Hillenius, W., Watrobski, L., Rehorek, S. (2001) Passage of Tear Duct Fluids through the Nasal Cavity of Frogs. Journal of Herpetology. 35(4):701-704. Hipolide, D., Tufik, S. (1995). Paradoxical sleep deprivation in female rats alters drug-induced behaviors. Physiology & behavior. 57:1139-1143. Hoh, J., Lin, W., Nadakavukaren, M. (1984) Sexual dimorphism in the Harderian gland proteins of the golden hamster. Comparative Biochemistry and Physiology. 77B:729-731. Huang, G., Zhang, J., Wang, D., Mason, R., Halpern, M. (2006) Female snake sex pheromone induces membrane responses in vomeronasal sensory neurons of male snakes. Chemical Senses. 31:521–529. Hubert, D., Bentz, E., Mason, R. (In Prep.) Transcriptional and physiological responses to acute thermal stress in the Red-sided garter snake Thamnophis sirtalis parietalis. Inouchi, J., Wang, D., Jiang, X. C., Kubie, J., & Halpern, M. (1993). Electrophysiological analysis of the nasal chemical senses in garter snakes. Brain, behavior and evolution. 41(3-5):171-182. Isogai, Y., Si, S., Pont-Lezica, L., Tan, T., Kapoor, V., Murthy, V., Dulac, C. (2011) Molecular organization of vomeronasal chemoreception. Nature. 478:241-247. Jacobson, L. (1813). Anatomisk Beskrivelse over et nyt Organ I Huusdyrenes Næse. Veterinær Selskapets Skrifter [in Danish] 2:209–246. Jacobson, L. (1999). Anatomical description of a new organ in the nose of domesticated animals. English translation. Chem. Senses. 23. Jin, J., Jhang, T., Lui, N., Dong, S. (2014) Different roles suggested by sex-biased expression and pheromone-binding affinity among three pheromone-binding proteins in the pink rice borer, Sesamia inferens (Lepidoptera: Noctuidae). Journal of insect physiology. 66:71-79. Johnstone, K., Lubieniecki, K., Koop, B., Davidson, W. (2011) Expression of olfactory receptors in different life stages and life histories of wild Atlantic salmon (Salmo salar). Molecular Ecology. 19:4059-69.

Jolliffe, I., Cadima, J. (2016). Principal component analysis: a review and recent developments. Philosophical transactions. Series A, Mathematical, physical, and engineering sciences. 374(2065):20150202. Karlson, P., Lüscher, M. (1959) ‘Pheromones’: a new term for a class of biologically active substances. Nature. 183:55–56. Kennedy, G. (1970) Harderoporphyrin: a new porphyrin from the Harderian glands of the rat. Comp. Biochem. Physiol. 36:21–36. Khuhro, S., Liao, H., Zhu, G., Li, S., Ye, Z., Dong, S. (2017) Tissue distribution and functional characterization of odorant binding proteins in Chilo suppressalis (Lepidoptera: Pyralidae). J. of Asia-Pacific Entomology. 20:1104-1111. Kitchen, S., Crowder, C., Poole, A., Weis, V., Meyer, E. (2015) De Novo Assembly and Characterization of Four Anthozoan (Phylum Cnidaria) Transcriptomes. G3: Genes, Genomes, Genetics. 5(11):2441-2452. Kondoh, D., Yamamoto, Y., Nakamuta, N., Taniguchi, K. (2012) Seasonal Changes in the Histochemical Properties of the Olfactory Epithelium and Vomeronasal Organ in the Japanese Striped Snake, Elaphe quadrivirgata. Anatomia, Histologia, Embryologia. 41:41-53. Kubie, J., Halpern, M. (1975) Laboratory observations of trailing behavior in garter snakes. Journal of Comparative and Physiological Psychology. 89:667-674. Kubie, J., Vagvolgyi, A., Halpern, M. (1978) The roles of the vomeronasal and olfactory systems in the courtship behavior of male garter snakes. Journal of Comparative and Physiological Psychology. 92:627-641. Leal, W. (2005) Pheromone reception. Top Curr Chem. 240:1–36. LeMaster, M., Mason, R. (2001) Evidence for a female sex pheromone mediating male trailing behavior in the Red-sided garter snake, Thamnophis sirtalis parietalis. Chemoecology. 11:149-152. LeMaster, M., Mason, R. (2002) Variation in a female sexual attractiveness pheromone controls male mate choice in garter snakes. Journal of chemical ecology. 28(6):1269-85. Li, J., Bickel, P., Biggin, M. (2014) System wide analyses have underestimated protein abundances and the importance of transcription in mammals. PeerJ. 2:270. Liberles, S., Horowitz, D., Contos, K., Wilson, J., Liberles, D., Buck, L. (2009) Formyl peptide receptors are candidate chemosensory receptors in the vomeronasal organ. Proceedings of the National Academy of Sciences. 106(24):9842-9847.

Lohman, B., Weber, J., & Bolnick, D. (2016). Evaluation of TagSeq, a reliable low-cost alternative for RNA seq. Molecular ecology resources. 16(6):1315-1321. Love, M., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 15:550. Lutterschmidt, D., LeMaster, M., and Mason, R. (2004). Effects of melatonin on the behavioral and hormonal responses of Red-sided garter snakes (Thamnophis sirtalis parietalis) to exogenous corticosterone. Hormones and Behavior. 46:692-702. Lutterschmidt, D., Maine, A. (2014) Sex or candy? Neuroendocrine regulation of the seasonal transition from courtship to feeding behavior in male Red-sided garter snakes (Thamnophis sirtalis parietalis). Horm Behav. 66(1):120-34. Marchese, S., Pes, D., Scaloni, A., Carbone, V., Pelosi, P. (1998) Lipocalins of boar salivary glands binding odours and pheromones. Eur. J. Biochem. 252:563-568. Mason, R., Fales, H., Jones, T., Pannell, L., Chinn, J., Crews, D. (1989) Sex pheromones in snakes. Science. 245(4915):290-293. Mason, R., Halpern, M. (2011). Chemical ecology of snakes: from pheromones to receptors. North American Society for Comparative Endocrinology Conference, 2011. Mason, R., Jones, T., Fales, H., Pannell, L., Crews, D. (1990) Characterization, synthesis, and behavioral response to the sex attractiveness pheromones of the Red-sided garter snake, Thamnophis sirtalis parietalis. J Chem Ecol. 16:27–36. Mazzoni, E., Desplan, C., Celik, A. (2004) ‘One receptor’ rules in sensory neurons. Developmental neuroscience. 26(5-6):388-395. Menendez-Pelaez, A., Buzzell, G., Rodriguez, C., Reiter, R. (1991). Indole and porphyrin content of the Syrian hamster Harderian glands during the proestrous and estrous phases of the estrous cycle. Journal of Steroid Biochemistry and Molecular Biology. 38(1). Meyer laboratory. GitHub Repository. https://github.com/Eli-Meyer. Meyer, E., Aglyamova, G., Matz, M. (2011) Profiling gene expression responses of coral larvae (Acropora millepora) to elevated temperature and settlement inducers using a novel RNA-Seq procedure. Molecular Ecology. 20:3599-3616. Meyer, E., Aglyamova, G., Wang, S., Buchanan-Carter, J., Abrego, D., Willis, B., Matz, M. (2009) Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx. BMC Genomics. 10:219. Miller, L., Gutzke, W. (1999) The role of the vomeronasal organ of crotalines (Reptilia: Serpentes: Viperidae) in predator detection. Animal Behaviour. 58:53-57.

Minucci, S., Baccari, G.C., Di Matteo, L. and G. Chieffi. (1989) A sexual dimorphism of the Harderian gland of the toad, Bufo viridis. Basic Appl. Histochem. 33:299-310. Minucci, S., Chieffi Baccari, G. and L. DiMatteo. (1994). The effect of sex hormones on lipid content and mast cell number in the Harderian gland of the female toad, Bufo viridis. Cell Tiss. Res. 26:797-805. Montgomery, R., Maslin, W. (1992) A comparison of the gland of Harder response and head associated lymphoid tissue (HALT) morphology in chicken and turkeys. Avian Diseases. 36:755-759. Moore, I., Mason, R. (2001) Behavioral and hormonal responses to corticosterone in the male Red-sided garter snake, Thamnophis sirtalis parietalis. Physiology & Behavior. 72(5):669-674. Moraes, M., Pareja, M., Laumann, R., Borges, M. (2008) The chemical volatiles (semiochemicals) produced by neotropical stink bugs (Hemiptera: Pentatomidae). Neotrop Entomol. 37:489–505. Mouse Genome Sequencing Consortium (2002) Initial sequencing and comparative analysis of the mouse genome. Nature. 420:520-562. Mueller, A., Sato, K., Glick, B. (1971) The chicken lacrimal gland, gland of Harder, caecal tonsil, and accessory spleens as sources of antibody-producing cells. Cell Immunol. 2(2):140-52. Nakazawa, H., Ichikawa, M., Nagai, T. (2009). Seasonal increase in olfactory receptor neurons of the Japanese toad, Bufo japonicus, is paralleled by an increase in olfactory sensitivity to isoamyl acetate. Chemical senses. 34(8):667-678. NCBI. Thamnophis sirtalis parietalis annotation release 100. O'Donnell, R., Shine, R., Mason, R. (2004) Seasonal anorexia in the male Red-sided Garter snake, Thamnophis sirtalis parietalis. Behavioral Ecology and Sociobiology. 56:413-41. Pan, P., Waheed, A., Sly, W., Parkkila, S. (2010). Carbonic anhydrase in the mouse Harderian gland. Journal of Molecular Histology. 41(6):411-7. Papes, F., Logan, D., Stowers, L. (2010). The vomeronasal organ mediates interspecies defensive behaviors through detection of protein pheromone homologs. Cell. 141(4):692-703. Parker, M., Mason, R. (2012) How to make a sexy snake: estrogen activation of female sex pheromone in male Red-sided garter snakes. J Exp Biol. 215(5):723-30. Parnell, P., Crossland, J., Beattie, R. (2005) Frequent Harderian Gland Adenocarcinomas in Inbred White-Footed Mice (Peromyscus leucopus). Comparative Medicine. 55(4):382-386.

Pavlidis, P., Lewis, D., Noble, W. (2002) Exploring gene expression data with class scores. Pacific symposium on biocomputing. 474–485. Payne, A. (1994) The Harderian Gland: A tercentennial review. Journal of Anatomy. 185(1):1–49. Payne, A., McGadey, J., Moore, M., Thompson, G. (1977) Androgenic control of the Harderian gland in the male golden hamster. Journal of Endocrinology. 75:73-82. Pelosi, P., Baldaccini, N., Pisanelli, A. (1982) Identification of a specific olfactory receptor for 2-isobutyl-3-methoxypyrazine. Biochem. J. 201:245-248. Pelosi, P., Maida, R. (1990) Odorant-binding proteins in vertebrates and insects. Chemical senses. 15(2):205-215. Pelosi, P., Maida, R. (1995) Odorant-binding proteins in insects. Comp. Biochem. Physiol. 111B(3):503-514. Petersen, T., Brunak, S., von Heijne, G., Nielsen, H. (2011) SignalP 4.0 - Discrimination between signal peptides and transmembrane Regions. Nature Methods. 8:785-786. Placyk, J., Jr., Graves, B. (2002). Prey Detection by Vomeronasal Chemoreception in a Plethodontid Salamander. Journal of chemical ecology. 28:1017-36. Raum, D., Marcus, D., Alper, C., Levey, R., Taylor, P., Starzl, T. (1980) Synthesis of human plasminogen by the liver. Science. 208:1036-37. Rehorek, S. (1997). Squamate Harderian gland: An overview. The Anatomical Record. 248(3):301-6. Rehorek, S. (1998) The embryology of the anterior orbital glands of some squamate reptiles. Acta Soc. Zool. Bohem. 62:155–165. Rehorek, S., Firth, B., and Hutchinson, M. (1997). Morphology of the Harderian gland of some Australian geckos. J. Morphol. 231:253–259. Rehorek, S., Firth, B., Hutchinson, M. (2000a) The structure of the nasal chemosensory system in squamate reptiles. Journal of Biosciences. 25(2):181-90. Rehorek, S., Halpern, M., Firth, B., Hutchinson, M. (2011). The Harderian gland of two species of snakes: Pseudonaja textilis (Elapidae) and Thamnophis sirtalis (Colubridae). Canadian Journal of Zoology. 81:357-363. Rehorek, S., Hillenius, W., Kennaugh, J., Chapman, N. (2005a) The gland and the sac – the preorbital apparatus of muntjacs. In: Mason, R., LeMaster, M., Müller-Schwarze, D. (Eds.), Chemical Signals in Vertebrates 10. Kluwer Academic, New York. 152–158.

Rehorek, S., Hillenius, W., Quan, W., Halpern, M. (2000b) Passage of Harderian gland secretions to the vomeronasal organ in Thamnophis sirtalis (Serpentes: Colubridae). Canadian Journal of Zoology. 78(7):1284-1288.
Rehorek, S., Legenzoff, E., Carmody, K., Smith, T., Sedlmayr, J. (2005b) Alligator tears: a reevaluation of the lacrimal apparatus of the crocodilians. J Morphol. 266(3):298-308.
Rivière, S., Challet, L., Fluegge, D., Spehr, M., Rodriguez, I. (2009) Formyl peptide receptor-like proteins are a novel family of vomeronasal chemosensors. Nature. 459:574-577.
Robertson, D., Hurst, J., Hubbard, S., Gaskell, S., Beynon, R. (1998) Ligands of urinary lipocalins from the mouse: uptake of environmentally derived chemicals. Journal of Chemical Ecology. 24(7):1127-1140.
Rodriguez, I., Feinstein, P., Mombaerts, P. (1999) Variable patterns of axonal projections of sensory neurons in the mouse vomeronasal system. Cell. 97(2):199-208.
Rodriguez, C., Mayo, J., Sainz, R., Antolin, I., Herrera, F., Martin, V., Reiter, R. (2003) Regulation of antioxidant enzymes: a significant role for melatonin. Journal of Pineal Research. 36(1):1-9.
Rosenberg, H. (1992) An improved method for collecting secretion from Duvernoy's gland of colubrid snakes. Copeia. (1):244-246.
Rumble, S., Lacroute, P., Dalca, A., Fiume, M., Sidow, A. (2009) SHRiMP: Accurate mapping of short color-space reads. PLoS Comput Biol. 5(5):e1000386.
Sakai, T. (1981) The Mammalian Harderian Gland: Morphology, Biochemistry, Function and Phylogeny. Arch. Hist. Jap. 44(4):299-333.
Sashima, M., Hatakey, S., Satoh, M., Suzuki, A. (1989) Harderianization is another sexual dimorphism of rat exorbital lacrimal gland. Acta Anatomica. 135:303-306.
Scaloni, A., Paolini, S., Brandazza, A. (2001) Purification, cloning and characterisation of odorant- and pheromone-binding proteins from pig nasal epithelium. Cell. Mol. Life Sci. 58(5-6):823-834.
Schmidt, A., Wake, M. (1990) Olfactory and vomeronasal systems of caecilians (Amphibia: Gymnophiona). J. Morphol. 205:255-268.
Schwenk, K. (1994) Why snakes have forked tongues. Science. 263(5153):1573-1577.
Serino, I., D'Istria, M., Monteleone, P. (1993) A comparative study of melatonin production in the retina, pineal gland and Harderian gland of Bufo viridis and Rana esculenta. Comparative Biochemistry and Physiology Part C: Pharmacology, Toxicology and Endocrinology. 106(1):189-193.

Serino, I., Izzo, G., Ferrara, D., d'Istria, M., Minucci, S. (2007) A new sex dimorphism in the Harderian gland of the frog Rana esculenta. Canadian Journal of Zoology. 85:909-915.
Seyama, Y., Kasama, T., Yasugi, E., Park, S., Kano, K. (1992) Lipids in Harderian glands and their significance. In: Webb, S., Hoffman, R., Puig-Domingo, M., Reiter, R. (Eds.), Harderian Glands. Springer, Berlin, Heidelberg.
Seyama, Y., Uchijima, Y. (2007) Novel function of lipids as a pheromone from the Harderian gland of golden hamster. Proceedings of the Japan Academy, Series B. 83(3):77-96.
Shankaran, V., Ikeda, H., Bruce, A., White, J., Swanson, P., Old, L., Schreiber, R. (2001) IFNγ and lymphocytes prevent primary tumour development and shape tumour immunogenicity. Nature. 410(6832):1107-1111.
Shi, P., Zhang, J. (2007) Comparative genomic analysis identifies an evolutionary shift of vomeronasal receptor gene repertoires in the vertebrate transition from water to land. Genome Res. 17:166-174.
Shiao, M., Chang, A., Liao, B., Ching, Y., Jade Lu, M., Chen, S., Li, W. (2012) Transcriptomes of mouse olfactory epithelium reveal sexual differences in odorant detection. Genome Biology and Evolution. 4(5):703-712.
Shine, R. (2003) Reproductive strategies in snakes. Proceedings of the Royal Society of London B. 270:995-1004.
Shine, R., Elphick, M., Harlow, P., Moore, I., LeMaster, M., Mason, R. (2011) Movements, mating, and dispersal of Red-sided garter snakes (Thamnophis sirtalis parietalis) from a communal den in Manitoba. Copeia. 2001(1):82-91.
Singer, A., Macrides, F. (1993) Composition of an aphrodisiac pheromone. Chem. Senses. 18:630.
Sivak, J. (1977) The role of the spectacle in the visual optics of the snake eye. Vision Research. 17:293-298.
Smith, M., Bellairs, A. (1947) The head glands of snakes, with remarks on the evolution of the parotid gland and teeth of the Opisthoglypha. Journal of the Linnean Society of London (Zoology). 41:351-368.
Smith, T., Bhatnagar, K. (2017) Vomeronasal system evolution. Psychology. Encyclopedia of Neuroscience. 461-470.
Soldi, R., Rodrigues, M., Aldrich, J., Zarbin, P. (2012) The male-produced sex pheromone of the true bug, Phthia picta, is an unusual hydrocarbon. J Chem Ecol. 38:814-824.
Sorensen, P., Stacey, N. (2004) Brief review of fish pheromones and discussion of their possible uses in the control of non-indigenous teleost fishes. New Zealand Journal of Marine and Freshwater Research. 38(3):399-417.

Spike, R., Payne, A., Moore, M. (1988) The effects of age on the structure and porphyrin synthesis of the Harderian gland of the female golden hamster. Journal of Anatomy. 160:157-166.
Stopková, R., Dudková, B., Hájková, P., Stopka, P. (2014) Complementary roles of mouse lipocalins in chemical communication and immunity. Biochemical Society Transactions. 42(4):893-898.
Stopkova, R., Klempt, P., Kuntova, B., Stopka, P. (2017) On the tear proteome of the house mouse (Mus musculus musculus) in relation to chemical signalling. PeerJ. 5:e3541.
Taniguchi, Y., Choi, P., Li, G., Chen, H., Babu, M., Hearn, J., Emili, A., Xie, X. (2010) Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 329(5991):533-538.
Tegoni, M., Pelosi, P., Vincent, F., Spinelli, S., Campanacci, V., Grolli, S., Ramoni, R., Cambillau, C. (2003) Mammalian odorant binding proteins. Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology. 1482(1-2):229-240.
The Gene Ontology (GO) database and informatics resource. (2004) Nucleic Acids Research. 32(suppl_1):D258-D261.
The UniProt Consortium. (2018) UniProt: the universal protein knowledgebase. Nucleic Acids Res. 46:2699.
Thiessen, D. (1992) The function of the Harderian gland in the Mongolian gerbil, Meriones unguiculatus. In: Harderian Glands: Porphyrin Metabolism, Behavioral and Endocrine Effects. Springer. 127-140.
Touhara, K., Vosshall, L. (2009) Sensing odorants and pheromones with chemosensory receptors. Annual Review of Physiology. 71:307-332.
Uhrig, E. (2015) Reproductive implications of parasitic infections and immune challenges in garter snakes. Ph.D. Thesis. Oregon State University, Corvallis, OR.
Uhrig, E., Lutterschmidt, D., Mason, R., LeMaster, M. (2012) Pheromonal mediation of intraseasonal declines in the attractivity of female Red-sided garter snakes, Thamnophis sirtalis parietalis. J Chem Ecol. 38(1):71-80.
Variale, B., Chieffi-Baccari, G., d'Istria, M., Di Matteo, L., Minucci, S., Serino, I., Chieffi, G. (1992) Testosterone induction of poly(A)+ RNA synthesis and [35S]methionine incorporation into proteins of Rana esculenta Harderian gland. Mol. Cell. Endocrinol. 84:R51-R56.
Vogt, R., Riddiford, L. (1981) Pheromone binding and inactivation by moth antennae. Nature. 293:161-163.

Wang, D., Jiang, X., Chen, P., Inouchi, J., Halpern, M. (1993) Chemical and immunological analysis of prey-derived vomeronasal stimulants. Brain Behav Evol. 41:246-254.
Wanner, K., Nichols, A., Allen, J., Bunger, P., Garczynski, S., Linn, C., Luetje, C. (2010) Sex pheromone receptor specificity in the European corn borer moth, Ostrinia nubilalis. PLoS ONE. 5(1):e8685.
Watanabe, M. (1980) An autoradiographic, biochemical, and morphological study of the Harderian gland of the mouse. Journal of Morphology. 163(3):349-365.
Webb, S., Hoffman, R., Puig-Domingo, M., Reiter, R. (Eds.) (1992) Harderian Glands: Porphyrin Metabolism, Behavioral and Endocrine Effects. Springer, Berlin.
Yandell, M., Ence, D. (2012) A beginner's guide to eukaryotic genome annotation. Nature Reviews Genetics. 13:329-342.
Zhao, S., Guo, Y., Sheng, Q., Shyr, Y. (2014) Advanced heat map and clustering analysis using Heatmap3. BioMed Research International. 986048.
Zhulidov, P., Bogdanova, E., Shcheglov, A., Vagner, L., Khaspekov, G., Kozhemyako, V., Matz, V., Meleshkevitch, E., Moroz, L., Lukyanov, S., Shagin, D. (2004) Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucleic Acids Res. 32(3):e37.
Zuri, I., Halpern, M. (2003) Differential effects of lesions of the vomeronasal and olfactory nerves of garter snake (Thamnophis sirtalis) responses to airborne chemical stimuli. Behavioral Neuroscience. 117:169-183.

Appendices:

Appendix A.1

>DN33479_c8_g36_i1 len=692 Match_Acc=A0A0B8RRE3 Gene=Ig_lambda_chain_
>DN19858_c0_g2_i1 len=664 Match_Acc=H9G4D3 Gene=Immunoglobulin_heavy_constant_mu_ GO=GO:0002250,GO:0003697,GO:0003823,GO:0005615,GO:0006910,GO:0006911,GO:0006958,GO:0009897,GO:0009986,GO:0019731,GO:0031210,GO:0034987,GO:0042571,GO:0042742,GO:0042834,GO:0045087,GO:0050829,GO:0050853,GO:0050871,GO:0071756,GO:0071757,GO:0072562,
>DN31203_c5_g1_i1 len=1626 Match_Acc=H9G4D3 Gene=Immunoglobulin_heavy_constant_mu_ GO=GO:0002250,GO:0003697,GO:0003823,GO:0005615,GO:0006910,GO:0006911,GO:0006958,GO:0009897,GO:0009986,GO:0019731,GO:0031210,GO:0034987,GO:0042571,GO:0042742,GO:0042834,GO:0045087,GO:0050829,GO:0050853,GO:0050871,GO:0071756,GO:0071757,GO:0072562,
>DN31758_c0_g1_i1 len=3379 Match_Acc=G1KP53 Gene=Immunoglobulin_mu_binding_protein_2_ GO=GO:0000049,GO:0003676,GO:0003677,GO:0003723,GO:0005524,GO:0005634,GO:0005737,GO:0008094,GO:0008134,GO:0008186,GO:0008270,GO:0032508,GO:0032575,GO:0043022,GO:0043141,GO:0046872,GO:0051260,
>DN9902_c0_g1_i1 len=331 Match_Acc=F7FNM0 Gene=Joining_chain_of_multimeric_IgA_and_IgM_ GO=GO:0002250,GO:0003094,GO:0003697,GO:0005615,GO:0019731,GO:0019862,GO:0030674,GO:0031210,GO:0032461,GO:0034987,GO:0042803,GO:0042834,GO:0045087,GO:0060267,GO:0071748,GO:0071750,GO:0071751,GO:0071752,GO:0071756,
>DN28638_c0_g10_i2 len=413 Match_Acc=A0A0D9RMU8 Gene=Immunoglobulin_lambda_variable_5-48_
>DN32280_c24_g1_i1 len=510 Match_Acc=G1KMV4 Gene=Joining_chain_of_multimeric_IgA_and_IgM_ GO=GO:0002250,GO:0003094,GO:0003697,GO:0005615,GO:0016020,GO:0016021,GO:0019731,GO:0019862,GO:0030674,GO:0031210,GO:0032461,GO:0034987,GO:0042803,GO:0042834,GO:0045087,GO:0060267,GO:0071748,GO:0071750,GO:0071751,GO:0071752,GO:0071756,
>DN27409_c1_g1_i1 len=229 Match_Acc=A0A0B8RUH1 Gene=Ig_lambda_chain_
>DN2047_c0_g1_i1 len=276 Match_Acc=L8XZ18 Gene=Ig_lambda_chain_V-IV_region_Bau_
>DN7798_c0_g1_i2 len=1032 Match_Acc=A0A1U7SLM5 Gene=immunoglobulin_superfamily_member_10_
>DN30370_c0_g2_i1 len=992 Match_Acc=V8N3T3 Gene=Pentaxin_ GO=GO:0005576,GO:0046872,
>DN11223_c0_g1_i1 len=250 Match_Acc=A0A0B8RQH0 Gene=Ig_heavy_chain_V-III_region_VH26_
>DN77201_c0_g1_i1 len=1032 Match_Acc=G1KNK3 Gene=Tyrosine_kinase_with_immunoglobulin_like_and_EGF_like_domains_1_ GO=GO:0000166,GO:0001568,GO:0004672,GO:0004713,GO:0005524,GO:0005886,GO:0006468,GO:0016020,GO:0016021,GO:0016301,GO:0016310,GO:0016525,GO:0016740,GO:0018108,GO:0030336,GO:0032526,GO:0045026,
>DN26894_c0_g3_i2 len=494 Match_Acc=A0A291NHA2 Gene=Ig_lambda_variable_
>DN73118_c0_g1_i1 len=412 Match_Acc=H9GDH9 Gene=Drosha_ribonuclease_III_ GO=GO:0001530,GO:0003723,GO:0003725,GO:0004521,GO:0004525,GO:0005634,GO:0006396,GO:0010468,GO:0010586,GO:0010628,GO:0014069,GO:0016075,GO:0017151,GO:0030422,GO:0031053,GO:0031054,GO:0042803,GO:0045589,GO:0046332,GO:0050727,GO:0050829,GO:0050830,GO:0070412,GO:0070877,GO:0070878,GO:0090502,GO:2000628,
>DN63521_c0_g1_i1 len=833 Match_Acc=A0A1D5Q0H5 Gene=Immunoglobulin_lambda_variable_3-9_
>DN10830_c0_g5_i1 len=257 Match_Acc=A0A1U7SP05 Gene=immunoglobulin_iota_chain-like_
>DN81123_c0_g1_i1 len=224 Match_Acc=G3PF09 Gene=Sema_domain,_immunoglobulin_domain_(Ig),_short_basic_domain,_secreted,_(semaphorin)_3Ga_
>DN31203_c6_g1_i1 len=1689 Match_Acc=J9VCV5 Gene=IgY1_
>DN76585_c0_g1_i1 len=643 Match_Acc=A0A146W392 Gene=Ig_kappa_chain_V-IV_region_B17_
>DN24585_c0_g1_i1 len=1862 Match_Acc=G1KBB9 Gene=Immunoglobulin_superfamily_member_11_ GO=GO:0016020,GO:0016021,
>DN18351_c0_g1_i1 len=1501 Match_Acc=G1KH94 Gene=Immunoglobulin_superfamily_member_10_ GO=GO:0005576,GO:2001222,
>DN9443_c0_g1_i1 len=914 Match_Acc=G3SLN6 Gene=Recombination_signal_binding_protein_for_immunoglobulin_kappa_J_region_ GO=GO:0000122,GO:0000978,GO:0000982,GO:0001077,GO:0001103,GO:0001228,GO:0001525,GO:0001756,GO:0001837,GO:0001974,GO:0002193,GO:0002437,GO:0003139,GO:0003151,GO:0003157,GO:0003160,GO:0003198,GO:0003214,GO:0003222,GO:0003256,GO:0003677,GO:0003682,GO:0003700,GO:0005634,GO:0005654,GO:0005667,GO:0005730,GO:0005737,GO:0006355,GO:0006357,GO:0006366,GO:0006959,GO:0007219,GO:0007221,GO:0007507,GO:0008134,GO:0008284,GO:0008285,GO:0009912,GO:0009957,GO:0010468,GO:0010628,GO:0017053,GO:0021983,GO:0030097,GO:0030182,GO:0030183,GO:0030216,GO:0030279,GO:0030513,GO:0035019,GO:0035912,GO:0036302,GO:0042127,GO:0042742,GO:0043011,GO:0043565,GO:0045165,GO:0045596,GO:0045892,GO:0045944,GO:0047485,GO:0048505,GO:0048733,GO:0048820,GO:0048844,GO:0060045,GO:0060486,GO:0060716,GO:0060844,GO:0061419,GO:0070491,GO:0072554,GO:0072602,GO:0097101,GO:1901186,GO:1901189,GO:1901297,GO:2000138,
>DN77768_c0_g1_i1 len=441 Match_Acc=A0A2Y9SEQ3 Gene=immunoglobulin_lambda-1_light_chain-like_
>DN27408_c0_g1_i1 len=1507 Match_Acc=H9GG91 Gene=Spondin_2_ GO=GO:0001530,GO:0002448,GO:0003823,GO:0005578,GO:0005615,GO:0008228,GO:0032496,GO:0032755,GO:0032760,GO:0042742,GO:0043152,GO:0045087,GO:0050832,GO:0051607,GO:0060907,GO:0071222,
>DN46728_c0_g1_i1 len=413 Match_Acc=G1KBB9 Gene=Immunoglobulin_superfamily_member_11_ GO=GO:0016020,GO:0016021,
>DN13644_c0_g2_i1 len=238 Match_Acc=A0A291NGW7 Gene=IgH_variable_region_
>DN10808_c0_g1_i1 len=1385 Match_Acc=H9GRC4 Gene=V-set_and_immunoglobulin_domain_containing_10_like_ GO=GO:0005634,GO:0005654,GO:0016020,GO:0016021,
>DN30436_c0_g3_i1 len=711 Match_Acc=V8N3T3 Gene=Pentaxin_ GO=GO:0005576,GO:0046872,
>DN8145_c1_g1_i1 len=317 Match_Acc=A0A2Y9FNS9 Gene=leucine-rich_repeats_and_immunoglobulin-like_domains_protein_3_isoform_X2_
>DN15417_c1_g1_i1 len=520 Match_Acc=A0A0B8RQH0 Gene=Ig_heavy_chain_V-III_region_VH26_
>DN31203_c1_g1_i1 len=212 Match_Acc=A0A0B8RQH0 Gene=Ig_heavy_chain_V-III_region_VH26_
>DN31945_c1_g1_i3 len=1464 Match_Acc=C6S3P7 Gene=Bactericidal/permeability-increasing_protein-like_3_ GO=GO:0008289,
>DN9329_c0_g1_i1 len=303 Match_Acc=A0A0A1E3W4 Gene=Immmunoglobulin_lambda_light_chain_variable_region_
>DN14068_c0_g3_i1 len=208 Match_Acc=A0A0B8RRE3 Gene=Ig_lambda_chain_
>DN26636_c0_g1_i3 len=534 Match_Acc=A0A1U8DZM7 Gene=immunoglobulin_iota_chain-like_
>DN29579_c3_g2_i3 len=2156 Match_Acc=V8P3L2 Gene=Ig_lambda-2_chain_C_region_ GO=GO:0016020,GO:0016021,
>DN8145_c0_g1_i1 len=975 Match_Acc=G1K9M2 Gene=Leucine_rich_repeats_and_immunoglobulin_like_domains_3_ GO=GO:0016020,GO:0016021,GO:0032474,
>DN80871_c0_g1_i1 len=363 Match_Acc=M7C3D1 Gene=Ig_lambda_chain_V_region_4A_
>DN22973_c0_g1_i1 len=536 Match_Acc=S6BGD4 Gene=IgG_H_chain_
>DN22109_c0_g1_i1 len=2968 Match_Acc=A0A1U8D3J0 Gene=polymeric_immunoglobulin_receptor_ GO=GO:0016020,GO:0016021,
>DN47045_c0_g1_i1 len=559 Match_Acc=H9GPZ4 Gene=Pentaxin_ GO=GO:0005576,GO:0046872,
>DN24393_c0_g1_i1 len=470 Match_Acc=M7C381 Gene=Ig_lambda_chain_V-II_region_BUR_
>DN22018_c0_g5_i1 len=482 Match_Acc=V8N3T3 Gene=Pentaxin_ GO=GO:0005576,GO:0046872,
>DN31668_c0_g1_i2 len=528 Match_Acc=A0A2U3YDB4 Gene=immunoglobulin_lambda-like_polypeptide_5_
>DN18156_c0_g2_i1 len=1637 Match_Acc=A0A0Q3PU08 Gene=Ig_gamma-1_chain_C_region,_membrane-bound_form_
>DN10884_c0_g1_i1 len=287 Match_Acc=A0A0B8RQH0 Gene=Ig_heavy_chain_V-III_region_VH26_
>DN13637_c0_g1_i1 len=293 Match_Acc=A0A0B8RQH0 Gene=Ig_heavy_chain_V-III_region_VH26_
>DN7718_c0_g1_i2 len=940 Match_Acc=M7BP71 Gene=Pentaxin_ GO=GO:0005576,GO:0046872,
>DN6889_c0_g1_i1 len=851 Match_Acc=G1KAV7 Gene=Leucine_rich_repeats_and_immunoglobulin_like_domains_1_ GO=GO:0007605,GO:0016020,GO:0016021,GO:0032474,GO:0060384,
>DN33562_c2_g1_i1 len=3237 Match_Acc=A0A0F7Z0N6 Gene=Single_Ig_IL-1-related_receptor_ GO=GO:0007165,GO:0016020,GO:0016021,
>DN66349_c0_g1_i1 len=316 Match_Acc=L5MK11 Gene=Ig_heavy_chain_V-III_region_GAL_
>DN27409_c0_g17_i1 len=746 Match_Acc=A0A0B8RUH1 Gene=Ig_lambda_chain_
>DN31929_c0_g1_i1 len=3569 Match_Acc=A0A1U7RHL8 Gene=immunoglobulin-like_domain-containing_receptor_1_ GO=GO:0016020,GO:0016021,
>DN30769_c0_g2_i1 len=4807 Match_Acc=A0A1U7R844 Gene=immunoglobulin_superfamily_member_3_ GO=GO:0016020,GO:0016021,
>DN35183_c0_g1_i1 len=244 Match_Acc=A0A147B334 Gene=Ig_lambda_chain_V-II_region_NIG-84_
>DN14651_c0_g1_i1 len=1100 Match_Acc=V8NNF3 Gene=Sialic_acid-binding_Ig-like_lectin_10_ GO=GO:0016020,GO:0016021,GO:0030246,
>DN21286_c0_g3_i1 len=241 Match_Acc=A0A0B8RRE3 Gene=Ig_lambda_chain_
>DN21985_c0_g1_i2 len=3039 Match_Acc=G1KUK5 Gene=Leucine_rich_repeats_and_immunoglobulin_like_domains_2_ GO=GO:0016020,GO:0016021,
>DN32849_c5_g1_i1 len=1337 Match_Acc=H9GM88 Gene=Leucine_rich_repeat,_Ig-like_and_transmembrane_domains_1_
>DN2037_c0_g1_i1 len=976 Match_Acc=G1KAV7 Gene=Leucine_rich_repeats_and_immunoglobulin_like_domains_1_ GO=GO:0007605,GO:0016020,GO:0016021,GO:0032474,GO:0060384,
>DN58638_c0_g1_i1 len=225 Match_Acc=H0XWP3 Gene=Immunoglobulin_superfamily_containing_leucine_rich_repeat_2_ GO=GO:0016020,GO:0016021,
>DN7843_c0_g1_i1 len=886 Match_Acc=F1N9L3 Gene=Immunoglobulin_superfamily_DCC_subclass_member_4_
>DN67015_c0_g1_i1 len=289 Match_Acc=A0A2U8J8Z6 Gene=Ig_heavy_chain_variable_region_
>DN55370_c0_g1_i1 len=234 Match_Acc=V8NDG6 Gene=Lipopolysaccharide-binding_protein_ GO=GO:0001530,GO:0006955,GO:0008289,GO:0050829,
>DN30280_c0_g1_i2 len=1063 Match_Acc=U3JUK1 Gene=Transmembrane_and_immunoglobulin_domain_containing_1_ GO=GO:0005737,GO:0005886,GO:0016020,GO:0016021,GO:0030334,GO:0042127,GO:0043066,GO:0090559,
>DN63612_c0_g1_i1 len=286 Match_Acc=H9GMX7 Gene=Immunoglobulin_superfamily_member_8_ GO=GO:0005886,GO:0016020,GO:0016021,
>DN67588_c0_g1_i1 len=322 Match_Acc=A0A0B8RRE3 Gene=Ig_lambda_chain_
>DN74656_c0_g1_i1 len=349 Match_Acc=H0ZFQ6 Gene=Immunoglobulin_superfamily_DCC_subclass_member_4_ GO=GO:0016020,GO:0016021,
>DN2702_c0_g1_i1 len=1119 Match_Acc=A0A2D4N988 Gene=Pentaxin_
>DN31203_c0_g1_i1 len=342 Match_Acc=A0A0B8RQH0 Gene=Ig_heavy_chain_V-III_region_VH26_
>DN26059_c0_g1_i1 len=1445 Match_Acc=H9GHP0 Gene=Leucine_rich_repeat_and_Ig_domain_containing_1_ GO=GO:0005154,GO:0005578,GO:0005615,GO:0007165,GO:0007409,GO:0016020,GO:0016021,
>DN18441_c0_g3_i1 len=722 Match_Acc=H9GHP0 Gene=Leucine_rich_repeat_and_Ig_domain_containing_1_ GO=GO:0005154,GO:0005578,GO:0005615,GO:0007165,GO:0007409,GO:0016020,GO:0016021,
>DN49765_c0_g1_i1 len=396 Match_Acc=R4GBL7 Gene=Immunoglobulin_superfamily_member_5_ GO=GO:0016020,GO:0016021,
>DN33630_c0_g1_i1 len=1707 Match_Acc=K7FNG8 Gene=Drosha_ribonuclease_III_ GO=GO:0001530,GO:0003723,GO:0004521,GO:0004525,GO:0006396,GO:0010468,GO:0010586,GO:0010628,GO:0014069,GO:0016075,GO:0017151,GO:0031053,GO:0031054,GO:0042803,GO:0045589,GO:0046332,GO:0050727,GO:0050829,GO:0050830,GO:0070412,GO:0070877,GO:0070878,GO:0090501,GO:0090502,GO:2000628,
>DN52167_c0_g1_i1 len=309 Match_Acc=M7BYS0 Gene=Cytochrome_c_oxidase_subunit_4_isoform_2_ GO=GO:0001530,GO:0004129,GO:0006955,GO:0008289,GO:0016020,GO:0016021,GO:0022900,GO:0050829,GO:1902600,
>DN11883_c0_g2_i1 len=354 Match_Acc=A0A1U8DZM7 Gene=immunoglobulin_iota_chain-like_
>DN61872_c0_g1_i1 len=978 Match_Acc=A0A0B8RQH0 Gene=Ig_heavy_chain_V-III_region_VH26_
>DN21343_c0_g1_i1 len=662 Match_Acc=A0A0B8RX63 Gene=Pentaxin_ GO=GO:0005576,GO:0046872,
>DN20383_c0_g1_i1 len=490 Match_Acc=V8N449 Gene=Pentaxin_ GO=GO:0005576,GO:0046872,
>DN10916_c0_g1_i1 len=247 Match_Acc=A0A2U3YDB4 Gene=immunoglobulin_lambda-like_polypeptide_5_
>DN23773_c0_g1_i2 len=1647 Match_Acc=H9GGG6 Gene=IgLON_family_member_5_ GO=GO:0016020,GO:0016021,
>DN32391_c1_g1_i1 len=552 Match_Acc=A0A1U7R844 Gene=immunoglobulin_superfamily_member_3_ GO=GO:0016020,GO:0016021,
>DN70739_c0_g1_i1 len=209 Match_Acc=F6YDB3 Gene=Leucine-rich_repeats_and_immunoglobulin-like_domains_3_ GO=GO:0016020,GO:0016021,
>DN20090_c0_g2_i1 len=2204 Match_Acc=G1K9Q5 Gene=WAP,_follistatin/kazal,_immunoglobulin,_kunitz_and_netrin_domain_containing_2_ GO=GO:0001501,GO:0004867,GO:0005576,GO:0005615,GO:0007179,GO:0010466,GO:0010951,GO:0030414,GO:0030512,GO:0032091,GO:0043392,GO:0048019,GO:0048747,GO:0050431,GO:0060021,GO:1900116,
>DN59901_c0_g1_i1 len=358 Match_Acc=G1KAV7 Gene=Leucine_rich_repeats_and_immunoglobulin_like_domains_1_ GO=GO:0007605,GO:0016020,GO:0016021,GO:0032474,GO:0060384,
>DN25039_c2_g1_i1 len=1258 Match_Acc=A6MLD7 Gene=Immunoglobulin_binding_protein_1-like_protein_ GO=GO:0009966,
>DN12361_c0_g1_i1 len=347 Match_Acc=A0A0B8RX63 Gene=Pentaxin_ GO=GO:0005576,GO:0046872,
>DN28279_c0_g1_i1 len=1609 Match_Acc=R4G959 Gene=V-set_and_immunoglobulin_domain_containing_10_ GO=GO:0016020,GO:0016021,
>DN31203_c3_g1_i1 len=1312 Match_Acc=J9VCV5 Gene=IgY1_
>DN20112_c0_g1_i1 len=1553 Match_Acc=A0A1U7TAJ8 Gene=sialic_acid-binding_Ig-like_lectin_5_ GO=GO:0016020,GO:0016021,GO:0030246,
>DN4637_c0_g1_i2 len=1669 Match_Acc=K7F7A0 Gene=Leucine_rich_repeat,_Ig-like_and_transmembrane_domains_1_ GO=GO:0016020,GO:0016021,
>DN1451_c0_g1_i1 len=1008 Match_Acc=M7B4Z6 Gene=Ig_gamma-1_chain_C_region,_membrane-bound_form_ GO=GO:0016020,GO:0016021,
>DN28488_c0_g1_i1 len=1011 Match_Acc=C6S3P7 Gene=Bactericidal/permeability-increasing_protein-like_3_ GO=GO:0008289,
>DN5499_c0_g1_i1 len=287 Match_Acc=L9KLT7 Gene=Ig_lambda_chain_V-VI_region_SUT_
>DN30999_c0_g1_i1 len=1646 Match_Acc=A0A1U8CVA6 Gene=immunoglobulin_superfamily_member_1-like_ GO=GO:0016020,GO:0016021,
>DN28003_c0_g3_i1 len=2419 Match_Acc=G1KJ80 Gene=Immunoglobulin_like_domain_containing_receptor_2_ GO=GO:0009749,GO:0016020,GO:0016021,GO:0030073,GO:0030154,GO:0031016,GO:0048873,
>DN56656_c0_g1_i1 len=346 Match_Acc=G1KX42 Gene=Immunoglobulin_superfamily_member_22_ GO=GO:0005859,GO:0006941,GO:0007015,GO:0008307,GO:0030018,GO:0031430,GO:0045214,GO:0051015,GO:0051371,GO:0071688,GO:0097493,
>DN33295_c2_g1_i1 len=1057 Match_Acc=G1KAV7 Gene=Leucine_rich_repeats_and_immunoglobulin_like_domains_1_ GO=GO:0007605,GO:0016020,GO:0016021,GO:0032474,GO:0060384,
>DN66356_c0_g1_i1 len=254 Match_Acc=A0A0B8RUH1 Gene=Ig_lambda_chain_
>DN60363_c0_g1_i1 len=717 Match_Acc=G1KJ80 Gene=Immunoglobulin_like_domain_containing_receptor_2_ GO=GO:0009749,GO:0016020,GO:0016021,GO:0030073,GO:0030154,GO:0031016,GO:0048873,
>DN5766_c0_g1_i1 len=211 Match_Acc=A0A0B8RRE3 Gene=Ig_lambda_chain_
>DN31203_c4_g4_i1 len=632 Match_Acc=L8B0U3 Gene=IgG_heavy_chain_
>DN71932_c0_g1_i1 len=303 Match_Acc=H9G4Z2 Gene=Serpin_family_E_member_1_ GO=GO:0001300,GO:0001525,GO:0002020,GO:0004867,GO:0005102,GO:0005576,GO:0005615,GO:0010469,GO:0010757,GO:0010951,GO:0014912,GO:0030194,GO:0030336,GO:0031012,GO:0032757,GO:0033629,GO:0035491,GO:0045766,GO:0048260,GO:0050729,GO:0050829,GO:0051918,GO:0061044,GO:0070062,GO:0071222,GO:0090026,GO:0090399,GO:0097187,GO:1901331,GO:1902042,GO:2000098,GO:2000352,
>DN55388_c0_g1_i1 len=313 Match_Acc=A0A1U7R844 Gene=immunoglobulin_superfamily_member_3_ GO=GO:0016020,GO:0016021,
>DN31831_c2_g12_i1 len=782 Match_Acc=A0A2D4H243 Gene=Pentaxin_
>DN44440_c0_g1_i1 len=331 Match_Acc=V8NNF3 Gene=Sialic_acid-binding_Ig-like_lectin_10_ GO=GO:0016020,GO:0016021,GO:0030246,
>DN17589_c0_g2_i1 len=287 Match_Acc=A0A0F8CCW3 Gene=Ig_heavy_chain_V-III_region_VH26_
>DN15329_c0_g3_i1 len=229 Match_Acc=A0A0B8RRE3 Gene=Ig_lambda_chain_
>DN11037_c0_g2_i1 len=2332 Match_Acc=A0A2I0M5E8 Gene=Leucine_rich_repeat_and_Ig_domain_containing_2_
>DN534_c0_g1_i1 len=450 Match_Acc=H9GR44 Gene=Transforming_growth_factor_beta_1_ GO=GO:0000060,GO:0000122,GO:0000165,GO:0001570,GO:0001657,GO:0001775,GO:0001837,GO:0001843,GO:0001933,GO:0001934,GO:0002028,GO:0002062,GO:0002244,GO:0002460,GO:0002513,GO:0003179,GO:0003823,GO:0005114,GO:0005125,GO:0005160,GO:0005576,GO:0005578,GO:0005615,GO:0005634,GO:0005737,GO:0005902,GO:0006468,GO:0006611,GO:0006754,GO:0006796,GO:0006874,GO:0006954,GO:0007050,GO:0007093,GO:0007173,GO:0007179,GO:0007182,GO:0007183,GO:0007219,GO:0007435,GO:0007492,GO:0007507,GO:0008083,GO:0008156,GO:0008283,GO:0008284,GO:0008285,GO:0008354,GO:0009611,GO:0009986,GO:0010468,GO:0010469,GO:0010628,GO:0010629,GO:0010718,GO:0010763,GO:0010800,GO:0010862,GO:0010936,GO:0014003,GO:0016049,GO:0016202,GO:0016477,GO:0017015,GO:0019049,GO:0019899,GO:0021915,GO:0022408,GO:0030214,GO:0030217,GO:0030279,GO:0030308,GO:0030335,GO:0030501,GO:0030509,GO:0031012,GO:0031065,GO:0031293,GO:0031334,GO:0031663,GO:0032270,GO:0032355,GO:0032570,GO:0032667,GO:0032700,GO:0032740,GO:0032801,GO:0032930,GO:0032943,GO:0032967,GO:0033138,GO:0034713,GO:0034714,GO:0035066,GO:0035307,GO:0042110,GO:0042127,GO:0042130,GO:0042306,GO:0042307,GO:0042482,GO:0042802,GO:0042981,GO:0043011,GO:0043029,GO:0043117,GO:0043406,GO:0043408,GO:0043491,GO:0043536,GO:0043537,GO:0043539,GO:0043552,GO:0043932,GO:0045066,GO:0045216,GO:0045589,GO:0045591,GO:0045596,GO:0045599,GO:0045662,GO:0045786,GO:0045892,GO:0045893,GO:0045930,GO:0045944,GO:0048146,GO:0048298,GO:0048468,GO:0048535,GO:0048642,GO:0050680,GO:0050714,GO:0050731,GO:0050868,GO:0050921,GO:0051098,GO:0051101,GO:0051897,GO:0055010,GO:0060325,GO:0060389,GO:0060390,GO:0060391,GO:0060395,GO:0060762,GO:0060965,GO:0061035,GO:0070306,GO:0070374,GO:0070723,GO:0071158,GO:0071363,GO:0071407,GO:0071560,GO:0085029,GO:0097191,GO:1900126,GO:1900182,GO:1901666,GO:1902895,GO:1903077,GO:1903799,GO:1903800,GO:1903911,GO:1905313,GO:1990402,GO:2000249,GO:2000679,GO:2000727,
>DN54792_c0_g1_i1 len=711 Match_Acc=V8NAK5 Gene=BPI_fold-containing_family_C_protein_ GO=GO:0001530,GO:0006955,GO:0008289,GO:0050829,
>DN5359_c0_g1_i1 len=899 Match_Acc=A0A1U7RE89 Gene=immunoglobulin_superfamily_member_22_
>DN63323_c0_g1_i1 len=280 Match_Acc=XP_013930920 Gene=immunoglobulin_superfamily_containing_leucine-rich_repeat_protein-like_[Thamnophis_sirtalis]
>DN54589_c0_g1_i1 len=497 Match_Acc=XP_013932084 Gene=immunoglobulin_superfamily_member_3_isoform_X1_[Thamnophis_sirtalis]
>DN19003_c0_g3_i1 len=383 Match_Acc=XP_013912506 Gene=bactericidal_permeability-increasing_protein-like_[Thamnophis_sirtalis]
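Each header above follows one layout: a contig identifier, `len=` (contig length), `Match_Acc=` (best-matching accession), `Gene=` (annotated name) and, for some records only, a comma-separated `GO=` list. As a minimal sketch, assuming one header per string, single-space field separators and no spaces inside values, the hypothetical R helper `parse_header()` below (not part of the original analysis) splits a header into these fields so the GO terms can be tabulated.

```r
# Hypothetical helper, not part of the original analysis: split one
# Appendix A.1 header into its labelled fields.
parse_header <- function(header) {
  header <- sub("^>", "", header)                    # drop the leading '>'
  tokens <- strsplit(header, " ", fixed = TRUE)[[1]] # fields are space-separated
  fields <- tokens[-1]                               # everything after the ID

  # Return the value of a "key=value" field, or NA if the field is absent
  get_field <- function(key) {
    prefix <- paste0(key, "=")
    hit <- fields[startsWith(fields, prefix)]
    if (length(hit) == 0) return(NA_character_)
    substring(hit[1], nchar(prefix) + 1)
  }

  go <- get_field("GO")
  go_terms <- if (is.na(go)) character(0) else strsplit(go, ",", fixed = TRUE)[[1]]

  list(
    id        = tokens[1],
    length    = as.integer(get_field("len")),
    accession = get_field("Match_Acc"),
    gene      = get_field("Gene"),
    go        = go_terms[nzchar(go_terms)]  # guard against empty entries such as ",,"
  )
}
```

For example, `lapply(readLines(path), parse_header)`, with `path` a hypothetical text file holding one header per line, returns one list per transcript; records without a `GO=` field get an empty `go` vector.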

Appendix A.2 Primer sequence appendix – Primers used to prepare sequencing libraries for normalized transcriptome sequencing and Tag-seq protocols:

Primer and adaptor sequences used for normalized transcriptome library prep:
CA1-20TVN: AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTVN
CA1-TS-YY (RNA): AAGCAGTGGTATCAACGCAGAGTACYYGGG
CA1: AAGCAGTGGTATCAACGCAGAGTAC
PE-Top: ACACTCTTTCCCTACACGACGCTCTTCCGATC*T
HT-Bot: /5Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCA
UTBC74: CAAGCAGAAGACGGCATACGAGATAATCGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
HT34: AATGATACGGCGACCACCGAGATCTACACCGAGAACACTCTTTCCCTACACGACGCTCTTCCGATCT

Primers and oligonucleotides used to prepare Tag-seq libraries:
ILL-4N-TS: ACCGCAUGCGGCUACACGACGCUCUUCCGAUCUNNNNGGG
3ILL-20TV: ACGTGTGCTCTTCCGATCTAATTTTTTTTTTTTTTTTTTTTV
5ILL: CTACACGACGCTCTTCCGATCT

Alternate primer set used for Tag-seq cDNA synthesis in Chapter 5:
ILL-4N-TS: ACCGCAUGCGGCUACACGACGCUCUUCCGAUCUNNNNGGG
ILL_2N_TCG_2N_TS: ACCGCATGCGGCTACACGACGCTCTTCCGATCTNNTGCNNGGG
ILL_2N_GCWTCH_2N_TS: ACCGCATGCGGCTACACGACGCTCTTCCGATCTNNGCWTCHNNGGG

qPCR primers for library quantification:
ILL-Lib1: AATGATACGGCGACCACCGA
ILL-Lib2: CAAGCAGAAGACGGCATACGA

R script Appendix R.1 HG Transcriptome Differential Expression Compared to a Multi-Tissue pool (Enrichment analysis Pipeline): Used in: Chapter 2

setwd("C:/Users/Ehren/Desktop/analysis")
library("DESeq2")  # DESeq2 itself provides rlog()/rlogTransformation(); there is no separate 'rld' package to load
library('RColorBrewer')
library("heatmap3")

# Upload the all_counts.tab file
df <- read.table("Enr_combined_counts.tab")
#head (df)

# Create a data.frame with the Key that expresses the factors for each sample
dkey = read.table("Enr_key.tab", header=T)
#head(dkey)
#dkey

# Filter for coverage - filter out low expression data
columns <- ncol(df)
coveragethrd <-1 # min average number of reads per sample
pvalue <- 0.01 # max adjusted p value for significance
nrow(df)
df <- df[rowSums(df)>=(columns*coveragethrd),]
df <- df[rowSums(df[,1:39])>=(39*coveragethrd),]
nrow(df)

# Make the matrix of df dataframe
dm <- as.matrix(df)

# Create full model
dds_enrichment <- DESeqDataSetFromMatrix(dm, dkey, design = ~ Tissue)
dds_enrichment$Tissue <- relevel(dds_enrichment$Tissue, ref="ALL")
dds_enrichment<-DESeq(dds_enrichment)

# Test the effect of Tissue
res_Tissue<-results(dds_enrichment, alpha=pvalue)
res_Tissue.nona<- na.omit(res_Tissue)
nrow(res_Tissue.nona[res_Tissue.nona$padj<=pvalue,])
#summary(res_Tissue)
summary(res_Tissue.nona)
#write.table (res_Tissue.nona, "C:/Users/Ehren/Desktop/analysis/HG_Enrichment_results.tab", sep="\t", col.names = NA)
hmcol<-colorRampPalette(c("Blue","Black","Orange"))(100)

#Create a list including the genes (the heat map can't include them all)

high_exp <- df[rowSums(df[,1:39])>=(39*1),]
high_exp_gene_names <- rownames(high_exp)
high_exp_res <- res_Tissue.nona[rownames(res_Tissue.nona)%in% unlist (high_exp_gene_names),]
nrow(high_exp_res)
summary(high_exp_res)
high_exp_norm_matrix <- counts(dds_enrichment,normalized=TRUE)[high_exp_gene_names,]
high_exp_matrix <- counts(dds_enrichment,normalized=FALSE)[high_exp_gene_names,]
vst_high_exp <- varianceStabilizingTransformation(high_exp_matrix)
heatmap3(vst_high_exp, Rowv=TRUE, Colv=NA, col=hmcol, scale="row",labRow=NA)
heatmap3(high_exp_matrix, Rowv=TRUE, Colv=NA, col=hmcol, scale="row",labRow=NA)
heatmap3(high_exp_norm_matrix, Rowv=TRUE, Colv=NA, col=hmcol, scale="row",labRow=NA)

#write.table (high_exp_res, "C:/Users/Ehren/Desktop/analysis/HG_highexp_results.tab", sep="\t", col.names = NA)
hmap_pvalue <- .01 #<- pvalue used to filter genes for heat map
high_sig_Tissue <- rownames(high_exp_res[high_exp_res$padj

# Create heat maps of DE genes only (z-scores)
hmap_pvalue<-0.01 #<- pvalue used to filter genes for heat map
hmcol<-colorRampPalette(c("Blue","Black","Orange"))(100)

HG_DE_Tissue <- rownames(res_Tissue.nona[res_Tissue.nona$padj

R script Appendix R.2 Determining Origin of Proteins in vomeronasal secretion: Used in: Chapter 3

setwd("C:/Users/Ehren/Desktop/analysis/HG_VNO_Protein_source")
library("DESeq2")  # provides rlogTransformation(); there is no separate 'rld' package to load
library("heatmap3")
library("EnhancedVolcano")

#clear environment
rm(list=ls())
if (!is.null(dev.list())) dev.off()  # close any open graphics device; dev.off() errors when none is open

# Upload the all_counts.tab file
df <- read.table("HG_VNO_combined_counts.tab")
#head (df)
df_HG <- df[1:40]
df_VNO <- df[41:72]
#head(df_HG)
#head(df_VNO)

# Create a data.frame with the Key that expresses the factors for each sample
dkey = read.table("HG_VNO_Key.tab", header=T)
dkey

# Upload list of proteins found in the VNO
proteins <- read.table("proteins_in_vno.tab")
proteins

# heatmap parameters:
hmap_pvalue <- 0.01
hmcol<-colorRampPalette(c("Blue","Black","Orange"))(100)

# Filter for coverage - filter out low expression data
columns <- ncol(df)
columns_HG <- ncol(df_HG)
columns_VNO <- ncol(df_VNO)
coveragethrd <-.05 # min average number of reads per sample
pvalue <- 0.01 # max adjusted p value for significance
nrow(df)
df_HG <- df_HG[rowSums(df_HG)>=(columns_HG*coveragethrd),]
df_VNO <- df_VNO[rowSums(df_VNO)>=(columns_VNO*coveragethrd),]
HG_retain <- as.vector(rownames(df_HG))
VNO_retain <- as.vector(rownames(df_VNO))
nrow(df_HG)
nrow(df_VNO)
retain <- append(HG_retain,VNO_retain)
length(retain)
retain <- unique(retain)
length(retain)
df <- df[rownames(df)%in% unlist (retain),]
nrow(df)

# Make the matrix of df dataframe
dm <- as.matrix(df)

# Create full model
dds_HG_VNO <- DESeqDataSetFromMatrix(dm, dkey, design = ~ Tissue)
dds_HG_VNO$Tissue <- relevel(dds_HG_VNO$Tissue, ref="VNO")
dds_HG_VNO<-DESeq(dds_HG_VNO)

# Test the effect of Tissue
res_HG_VNO<-results(dds_HG_VNO, alpha=pvalue)
res_HG_VNO.nona<- na.omit(res_HG_VNO)
nrow(res_HG_VNO.nona[res_HG_VNO.nona$padj<=pvalue,])
#summary(res_Tissue)
summary(res_HG_VNO.nona)
#write.table (res_HG_VNO.nona, "C:/Users/Ehren/Desktop/analysis/HG_VNO_results.tab", sep="\t", col.names = NA)

# extract high expression DE genes
high_sig_Tissue <- rownames(high_exp_res[high_exp_res$padj

# Create heat maps of all genes (z-scores)
select_norm<- order(rowMeans(counts(dds_HG_VNO,normalized=TRUE)),decreasing=TRUE)
select<- order(rowMeans(counts(dds_HG_VNO,normalized=FALSE)),decreasing=TRUE)
length(select)
vst <- varianceStabilizingTransformation(dds_HG_VNO)
#rld <- rlogTransformation(dds_HG_VNO)
heatmap3(counts(dds_HG_VNO,normalized=TRUE)[select_norm,], Rowv=TRUE, Colv=NA, col = hmcol, scale="row",labRow = NA)
heatmap3(assay(vst)[select,], Rowv=TRUE, Colv=NA, col = hmcol, scale="row",labRow = NA)

# Create heat maps of DE genes only (z-scores)
#jpeg(filename="HG_VNO_heatmap.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
HG_DE_Tissue <- rownames(res_HG_VNO.nona[res_HG_VNO.nona$padj

#PCA Plot
print(plotPCA(vst, intgroup=c('Tissue')))
#print(plotPCA(rld, intgroup=c('Tissue')))

######################################################################################
# Subset results for proteins found in the VNO
# Adjust pvalues for proteins found in the VNO
######################################################################################
res_proteins <- res_HG_VNO[rownames(res_HG_VNO)%in% unlist (proteins),]
nrow(res_HG_VNO)
nrow(proteins)
# Drop the genome-wide padj (column 6) and re-adjust with BH within the protein subset
res_proteins <- res_proteins[,-6]
pvals_proteins <- res_proteins$pvalue
proteins_padj <- p.adjust(pvals_proteins, method = "BH")
res_proteins <-cbind(res_proteins, proteins_padj)
res_proteins
res_proteins.nona<- na.omit(res_proteins)
nrow(res_proteins.nona)
#write.table (res_proteins, "C:/Users/Ehren/Desktop/analysis/HG_VNO_Protein_source/HG_VNO_proteins_results.tab", sep="\t", col.names = NA)
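Removing column 6 (`padj`) and re-running Benjamini-Hochberg on only the transcripts whose proteins appear in `proteins_in_vno.tab` shrinks the family of tests from every filtered transcript to that subset. The sketch below illustrates the effect with invented p-values; the sizes (1000 transcripts, 20 of them "proteins") are hypothetical.

```r
# Hypothetical raw p-values: 1000 transcripts, the first 20 standing in for
# transcripts whose proteins were detected in the VNO fluid
p_all <- c(rep(0.001, 20), seq(0.002, 1, length.out = 980))
is_protein <- c(rep(TRUE, 20), rep(FALSE, 980))

# BH over every tested transcript (the role of DESeq2's padj column)
padj_all <- p.adjust(p_all, method = "BH")

# BH over the protein subset only (the role of proteins_padj in the script)
padj_subset <- p.adjust(p_all[is_protein], method = "BH")

print(unique(padj_all[is_protein]))  # 0.05
print(unique(padj_subset))           # 0.001
```

With identical raw p-values, the protein transcripts adjust to 0.05 across all 1000 tests but stay at 0.001 within the 20-test subset. The smaller family is defensible here because the subset is defined by the separate protein list rather than by the RNA-seq p-values themselves.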

######################################################################################
# Heatmap of expression of proteins identified in the VNO:
proteins_list<- rownames(res_proteins.nona)
length(proteins_list)
proteins_matrix <- counts(dds_HG_VNO,normalized=FALSE)[proteins_list,]
rld_proteins <- rlogTransformation(proteins_matrix)

# Volcano_plots
######################################################################################
#jpeg(filename="HG_VNO_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_HG_VNO,
  lab = rownames(res_HG_VNO),
  x = 'log2FoldChange',
  y = "padj",
  transcriptPointSize = 3,
  transcriptLabSize = 2.5,
  xlab = bquote(~Log[2]~ "fold change"),
  ylab = bquote(~-Log[10]~("Adj. p-value ")),
  pCutoff = 0.05,
  FCcutoff=0,
  cutoffLineType = "longdash",
  cutoffLineWidth = 1,
  cutoffLineCol = "green4",
  col=c('black', 'blue', 'black', 'orange3'),
  title = ' ')
dev.off()
jpeg(filename="HG_VNO_proteins_volcano.jpg", units="in", width=12.5, height=7, pointsize=12, res=1200)
EnhancedVolcano(res_proteins,
  lab = rownames(res_proteins),
  x = 'log2FoldChange',
  y = "proteins_padj",
  xlab = bquote(~Log[2]~ "fold change"),
  ylab = bquote(~-Log[10]~("Adj. p-value ")),
  transcriptPointSize = 3,
  transcriptLabSize = 2.5,
  colAlpha = 0.75,
  pCutoff = 0.05,
  FCcutoff=4.32192809489, #<- log2 of 20x unadjusted fc
  cutoffLineType = "longdash",
  cutoffLineWidth = 1.2,
  cutoffLineCol = "green4",
  col=c('grey50', 'black', 'blue3', 'orange3'),
  borderWidth = 1.5,
  legendPosition = "right",
  legendLabSize = 14,
  legendIconSize = 5,
  legend = c("Not Significant", "Log2 FC", "Adj. p-value<0.01", "Significant; >20x fold change"),
  title = 'Expression of Proteins identified in the Vomeronasal Organ')
dev.off()

R script Appendix R.3 Bacterial killing assays, time to threshold: Used in: Chapter 4
setwd("C:/Users/Ehren/Desktop/BKA_R")
library(ggplot2)
library(dplyr)
library(purrr)
library(rootSolve)

## Clear environment
rm(list=ls())
dev.off()

####BUILD DATA FRAME AND REMOVE NC WELLS#############
df <- read.table("protdat.tab", header = TRUE, sep = "\t")
df <- df[df$group != "NA" & is.na(df$group) == FALSE,]
df

# Set the OD threshold (keep it at 0.2 unless the bacteria are not in exponential growth at that OD)
threshold <- 0.2

# number of plates
np <- 3

# number of groups (number of bacterial dilutions + number of biological samples)
n <-13

# total number of subsamples
N = n*np

# number of time measurements
t <- 11

# number of snake subsamples (Total number of sample OD readings divided by number of individual 'IDs')
ns <- 33

# change only file number (if needed)
ttth <- 0
for(i in 1:N){
  temp <- 0
  temp <- df[((i-1)*t+1):(i*t),]
  ttth[i] <- min(temp[temp$OD > threshold,]$time)
}
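The loop above walks the long-format table in blocks of `t` rows (one block per subsample) and records the first time point whose OD exceeds `threshold`; a subsample that never crosses gets `Inf` from `min()` of an empty vector. A self-contained sketch of the same rule on invented readings, using `split()` instead of index arithmetic and assuming, as the script does, that rows are ordered by subsample and then by time (the helper `first_cross` is hypothetical):

```r
threshold <- 0.2
t_points <- c(0, 1, 2, 3, 4)  # hypothetical measurement times

# Hypothetical OD readings for three subsamples, stacked in blocks
od <- data.frame(
  subsample = rep(1:3, each = length(t_points)),
  time = rep(t_points, times = 3),
  OD = c(0.05, 0.10, 0.22, 0.30, 0.35,   # crosses at time 2
         0.05, 0.21, 0.25, 0.30, 0.33,   # crosses at time 1
         0.05, 0.06, 0.08, 0.10, 0.12)   # never crosses
)

# First time with OD above thr; Inf if the subsample never crosses
first_cross <- function(block, thr) {
  above <- block$time[block$OD > thr]
  if (length(above) == 0) Inf else min(above)
}
ttth <- vapply(split(od, od$subsample), first_cross, numeric(1), thr = threshold)
print(ttth)  # values: 2 1 Inf
```

Returning `Inf` explicitly avoids the warning that `min()` emits on an empty vector while keeping the same value.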

########################### CURVE FITTING FOR OPTICAL DENSITY #########################
#The function used is map() from the package purrr.
#It applies a function to every element of a list, much like lapply()
#Every element of our list is the data for an individual snake or subsample
#map() performs an lm or glm fit to each snake or subsample.
#We then extract the coefficients, and find the roots of the estimated lm function minus the threshold.

###Make list of data for each snake###
snakelist <- list()

#Store the data for a single subsample in each element of the list
for(i in 1:N){
  snakelist[[i]] <- df[((i-1)*t+1):(i*t),]
}
snakelist
snakelist[[N]]

#FIT A CUBIC CURVE OF OD AGAINST TIME TO THE READINGS OF EACH SUBSAMPLE
fits <- map(snakelist, ~lm(OD~time+I(time^2)+I(time^3), .x))
fits

###NUMERICALLY SOLVE FOR THE TIME VALUE AT WHICH FITTED CURVE CROSSES .2
####NOTE: SOME CURVES' MEANS DO NOT CROSS THE THRESHOLD,
####THEY ARE SHOWN AS -INF OR A NEGATIVE IN OD_ESTIMATES
OD_estimates <- map_dbl(fits, ~max(uniroot.all(function(x){.x$coef[1]+.x$coef[2]*x+.x$coef[3]*I(x^2)+.x$coef[4]*I(x^3)-.2}, c(-100,18))))
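The curve-fitting step fits a cubic in time to each subsample and takes the largest root of (fitted curve minus 0.2) inside `c(-100, 18)`, found with `rootSolve::uniroot.all`. As a dependency-free alternative (a swapped-in technique, not the author's), base R's `polyroot()` returns all complex roots of the cubic at once; keeping the real roots in the same interval and taking the maximum applies the same rule. The OD values below are generated from an invented cubic so the crossing can be checked.

```r
threshold <- 0.2
# Hypothetical readings at 11 time points, generated from a known cubic
d <- data.frame(time = 0:10)
d$OD <- 0.05 + 0.01 * d$time + 0.002 * d$time^2 + 0.0001 * d$time^3

fit <- lm(OD ~ time + I(time^2) + I(time^3), data = d)
b <- unname(coef(fit))

# polyroot() takes coefficients in increasing order of degree
roots <- polyroot(c(b[1] - threshold, b[2], b[3], b[4]))
real_roots <- Re(roots)[abs(Im(roots)) < 1e-6]
in_range <- real_roots[real_roots >= -100 & real_roots <= 18]

# Same convention as the script: -Inf when the curve never crosses
tt_threshold <- if (length(in_range) == 0) -Inf else max(in_range)
print(tt_threshold)  # a value between 5.9 and 6
```

`uniroot.all` searches a grid for sign changes and can miss roots where the curve only touches the threshold; `polyroot` avoids that for polynomials, at the cost of needing the real/complex tolerance shown.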

##calculate the mean for every 3 observations
#post_means <- 0
#for(i in 1:n){
#post_means[i] <- mean(estimates[((i-1)*3+1):(i*3)])
#}

############## FIND EARLIEST TIMES FIRST, THEN GET MEAN #####################
ttth <- 0
for(i in 1:N){
  temp <- 0
  temp <- df[((i-1)*t+1):(i*t),]
  ttth[i] <- min(temp[temp$OD > threshold,]$time)
}
post_means <- post_medians <- 0
for(i in 1:n){
  post_means[i] <- mean(ttth[((i-1)*np+1):(i*np)])
  post_medians[i] <- median(ttth[((i-1)*np+1):(i*np)])
}
##########################################################################################################
#############THIS SECTION PERFORMS THE CONVERSION TO A POTENCY BASED OFF OF CONTROL WELLS###########
##########################################################################################################
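The second loop summarizes consecutive blocks of `np` subsample times, one block per group. Assuming the same group-by-group ordering, the summaries can be obtained by reshaping the vector into an `np`-row matrix so that each column is one group; the values below are invented.

```r
np <- 3  # subsamples (plates) per group
n  <- 4  # groups
# Hypothetical time-to-threshold values ordered group by group
ttth <- c(2, 3, 4,   1, 1, 2,   5, 6, 10,   Inf, 7, 8)

ttth_mat <- matrix(ttth, nrow = np, ncol = n)  # column j holds group j
post_means   <- colMeans(ttth_mat)
post_medians <- apply(ttth_mat, 2, median)
print(post_means)    # 3.000000 1.333333 7.000000 Inf
print(post_medians)  # 3 1 6 8
```

A single subsample that never crosses the threshold (`Inf`) makes its group mean `Inf`, while the group median is unaffected.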

#make time to threshold column
df["TTTh"]<-0
for(i in 1:N){
  df[((i-1)*t+1):(i*t),]$TTTh<-OD_estimates[i]
}

#create a data frame with only the TTTh for each SUBSAMPLE for non-control wells
nocontrol_collapsed <- aggregate(TTTh~replicate+ID, data = df[df$sex != "NA",], FUN = max)

#Make a list where each element is the data for the control wells for a SINGLE PLATE
control_vals <- list()
for(i in 1:np){
  # as.character() first so that a factor column converts to its values, not its level codes
  control_vals[[i]] <- df[df$replicate == i & df$ID == "DIL", c("TTTh", "group")] %>% mutate(group = as.numeric(as.character(group)))
  control_vals[[i]] <- aggregate(TTTh~group, data = control_vals[[i]], FUN = max)
}

##Fit a gaussian regression with logit link to each control group and extract coefficients
##This form of regression forces all the predicted values to be between 0 and 1.
#First, we have to add 1 to all 0 potency values (this is already a rough approximation, so I'm cheating a bit to make glm run)
for(i in 1:np){
  control_vals[[i]]$group[control_vals[[i]]$group == 0] <- control_vals[[i]]$group[control_vals[[i]]$group == 0]+1
}

#glm fit with potency/100 as response and logit link.
fits_control <- map(control_vals, ~glm(group/100~TTTh, family = gaussian(link = "logit"), data = .x)$coefficients)

#Also tried linear fit
#fits_control <- map(control_vals, ~glm(group/100~TTTh, data = .x)$coefficients)

####Estimate potency based on glm fit
nocontrol_collapsed["potency"] <- 0
for(i in 1:np){
  for(j in seq(i,(ns-np)+i, by = np)){
    nocontrol_collapsed[j,]$potency <- exp(fits_control[[i]][1]+fits_control[[i]][2]*nocontrol_collapsed[j,]$TTTh) /
      (1+exp(fits_control[[i]][1]+fits_control[[i]][2]*nocontrol_collapsed[j,]$TTTh))
    #linear fit
    #nocontrol_collapsed[j,]$potency <- fits_control[[i]][1]+fits_control[[i]][2]*nocontrol_collapsed[j,]$TTTh
  }
}
potencies <- aggregate(potency~ID, data = nocontrol_collapsed, FUN = mean)
potencies
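The potency step calibrates each plate on its control (dilution) wells: a Gaussian GLM with a logit link is fit to potency/100 against TTTh, and a sample's potency is the inverse logit exp(eta)/(1 + exp(eta)) of the linear predictor at its TTTh. The sketch below checks, on invented control data generated from an exact logistic curve, that this manual formula matches base R's `plogis()` and `predict(type = "response")`; the intercept -5, slope 1.1 and all TTTh values are hypothetical.

```r
# Hypothetical control wells for one plate: potency (%) follows an exact logistic in TTTh
controls <- data.frame(TTTh = c(2, 3, 4, 5, 6, 7))
controls$group <- 100 * plogis(-5 + 1.1 * controls$TTTh)

fit <- glm(group / 100 ~ TTTh, family = gaussian(link = "logit"), data = controls)
b <- unname(coef(fit))

# Hypothetical TTTh values for sample wells on the same plate
sample_ttth <- c(3.5, 5.5)
eta <- b[1] + b[2] * sample_ttth

potency_manual  <- exp(eta) / (1 + exp(eta))  # form used in the script
potency_plogis  <- plogis(eta)                 # base-R inverse logit
potency_predict <- predict(fit, newdata = data.frame(TTTh = sample_ttth), type = "response")
print(round(b, 3))  # recovers the generating values -5 and 1.1
```

`plogis()` is numerically safer than the explicit ratio when eta is large, because `exp(eta)` can overflow to `Inf` and turn the ratio into `NaN`.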

#########################################################
# Plots of OD/time:
#########################################################

# Plot growth curves for all groups including controls:

#jpeg(filename="bka_allgroups.jpg", units="in", width=12, height=10, pointsize=12, res=1200)
ggplot(aes(time,OD,col = group), data = df)+
  geom_point(aes(shape = as.factor(replicate)))+
  scale_shape(solid=TRUE)+
  facet_wrap(~group)+
  geom_hline(yintercept = .2)+
  theme(legend.position="none")+
  coord_cartesian(ylim=c(0, 0.36))
#dev.off()

# Plot growth curves for all groups + show threshold:
jpeg(filename="bka_sample_curves.jpg", units="in", width=10, height=6, pointsize=20, res=1200)
ggplot(aes(time,OD,col = group), data = df[df$ID != "DIL",])+
  geom_point(aes(shape = as.factor(replicate), col = as.factor(replicate)))+
  scale_shape(solid=TRUE)+
  geom_line(aes(time, OD, col = as.factor(replicate), group = replicate))+
  facet_wrap(~ID)+
  geom_hline(yintercept = .2)+
  theme(legend.position="right")+
  coord_cartesian(ylim=c(0, 0.36))
dev.off()

# Plot growth curves for each group individually + show threshold:
for(g in c("1", "2", "3", "4", "5", "6")){
  print(ggplot(aes(time,OD,col = group), data = df[df$group == g,])+
    geom_point(aes(shape = as.factor(replicate), col = as.factor(replicate)))+
    geom_line(aes(time, OD, col = as.factor(replicate), group = replicate))+
    facet_wrap(~ID)+
    geom_hline(yintercept = .2)+
    theme(legend.position="none"))
}

R script Appendix R.4 Differential Expression by Sex and Season in the Harderian gland: Used in: Chapter 4

setwd("C:/Users/Ehren/Desktop/analysis/HG_sex_season")
library("DESeq2")
library("heatmap3")
library("EnhancedVolcano")

#clear environment
rm(list=ls())
dev.off()

# Load the HG counts table (males in columns 1-19, females in columns 20-39)
df <- read.table("HG_combined_counts.tab")
df_female<-df[20:39]
df_male<-df[1:19]
df_spring<-df[c(1:9,20:29)]
df_summer<-df[c(10:19,30:39)]
#head (df)

# Create a data.frame with the Key that expresses the factors for each sample
dkey <- read.table("HG_key.tab", header=T)
dkey_female<-dkey[20:39,] #females
dkey_male<-dkey[1:19,] #males
dkey_spring<-dkey[c(1:9,20:29),] #spring
dkey_summer<-dkey[c(10:19,30:39),] #summer

#Heatmap parameters:
hmcol<-colorRampPalette(c("Blue","Black","Orange"))(15)
hmap_pvalue<-0.01 #<- pvalue used to filter genes for heat map

# Filter for coverage - filter out low expression data
columns <- ncol(df)
columns_female <- ncol(df_female)
columns_male <- ncol(df_male)
columns_spring <- ncol(df_spring)
columns_summer <- ncol(df_summer)
coveragethrd <- 0.5 # min average number of reads per sample
pvalue <- 0.01 # max adjusted p value for significance
nrow(df)
df <- df[rowSums(df)>=(columns*coveragethrd),]
nrow(df)
nrow(df_female)
df_female <- df_female[rowSums(df_female)>=(columns_female*coveragethrd),]
nrow(df_female)
nrow(df_male)
df_male <- df_male[rowSums(df_male)>=(columns_male*coveragethrd),]
nrow(df_male)
nrow(df_spring)
df_spring <- df_spring[rowSums(df_spring)>=(columns_spring*coveragethrd),]
nrow(df_spring)
nrow(df_summer)
df_summer <- df_summer[rowSums(df_summer)>=(columns_summer*coveragethrd),]
nrow(df_summer)

# Make the matrix of df dataframe
dm <- as.matrix(df)
dm_female <- as.matrix(df_female)
dm_male <- as.matrix(df_male)
dm_spring <- as.matrix(df_spring)
dm_summer <- as.matrix(df_summer)

# Create full model for females only
dds_female <- DESeqDataSetFromMatrix(dm_female, dkey_female, design = ~ season)
dds_female$season <- relevel(dds_female$season, ref="summer")
dds_female<-DESeq(dds_female)

# Test the effect of season in females only
res_season_female<-results(dds_female, contrast = c("season", "spring", "summer"),alpha = pvalue)
res_season_female.nona<- na.omit(res_season_female)
nrow(res_season_female.nona[res_season_female.nona$padj<=pvalue,])
summary(res_season_female)
summary(res_season_female.nona)
#write.table (res_season_female.nona, "C:/Users/Ehren/Desktop/analysis/season_females_results.tab", sep="\t", col.names=NA)

# Create full model for males only
dds_male <- DESeqDataSetFromMatrix(dm_male, dkey_male, design = ~ season)
dds_male$season <- relevel(dds_male$season, ref="summer")
dds_male<-DESeq(dds_male)

# Test the effect of season in males only
res_season_male<-results(dds_male, contrast = c("season", "spring", "summer"),alpha=pvalue)
res_season_male.nona<- na.omit(res_season_male)
nrow(res_season_male.nona[res_season_male.nona$padj<=pvalue,])
summary(res_season_male)
summary(res_season_male.nona)
#write.table (res_season_male.nona, "C:/Users/Ehren/Desktop/analysis/season_males_results.tab", sep="\t", col.names=NA)

# Create full model for spring samples only
dds_spring <- DESeqDataSetFromMatrix(dm_spring, dkey_spring, design = ~ sex)
dds_spring$sex <- relevel(dds_spring$sex, ref="female")
dds_spring<-DESeq(dds_spring)

# Test the effect of sex in spring samples only
res_spring_sex<-results(dds_spring, c("sex", "male", "female"), alpha=pvalue)
res_spring_sex.nona<- na.omit(res_spring_sex)
nrow(res_spring_sex.nona[res_spring_sex.nona$padj<=pvalue,])
summary(res_spring_sex)
summary(res_spring_sex.nona)
#write.table (res_spring_sex.nona, "C:/Users/Ehren/Desktop/analysis/spring_sex_results.tab", sep="\t", col.names=NA)

# Create full model for summer samples only
dds_summer <- DESeqDataSetFromMatrix(dm_summer, dkey_summer, design = ~ sex)
dds_summer$sex <- relevel(dds_summer$sex, ref="female")
dds_summer<-DESeq(dds_summer)

# Test the effect of sex in summer samples only
res_summer_sex<-results(dds_summer, c("sex", "male", "female"), alpha=pvalue)
res_summer_sex.nona<- na.omit(res_summer_sex)
nrow(res_summer_sex.nona[res_summer_sex.nona$padj<=pvalue,])
summary(res_summer_sex)
summary(res_summer_sex.nona)
#write.table (res_summer_sex.nona, "C:/Users/Ehren/Desktop/analysis/summer_sex_results.tab", sep="\t", col.names=NA)

# Create full model including interaction term
dds_full <- DESeqDataSetFromMatrix(dm, dkey, design = ~ sex + season + sex:season)
dds_full$sex <- relevel(dds_full$sex, ref="female") # Modify these lines to set interaction comparisons below
dds_full$season <- relevel(dds_full$season, ref="summer") # Modify these lines to set interaction comparisons below
dds_full <- DESeq(dds_full)
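The two `relevel()` calls decide what the interaction coefficient means: with `female` and `summer` as reference levels, it measures how far spring males depart from what the sex and season main effects alone predict. Base R's `model.matrix()` on a hypothetical sample key shows the resulting column names. The script requests the interaction as `sexmale.seasonspring`, which is the `make.names()` form of the model-matrix name (colon replaced by a dot).

```r
# Hypothetical sample key: 2 samples per sex x season cell
key <- data.frame(
  sex    = factor(rep(c("male", "female"), each = 4)),
  season = factor(rep(c("spring", "summer"), times = 4))
)
# Reference levels as set in the script
key$sex    <- relevel(key$sex, ref = "female")
key$season <- relevel(key$season, ref = "summer")

mm <- model.matrix(~ sex + season + sex:season, data = key)
print(colnames(mm))
# "(Intercept)" "sexmale" "seasonspring" "sexmale:seasonspring"
print(make.names(colnames(mm)[4]))  # "sexmale.seasonspring"
```

Changing either reference level renames and reinterprets these coefficients, which is why the script's comments flag those lines as the ones to edit.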

#Test the effects of the sex:season interaction term
dds_interaction <-nbinomLRT(dds_full,reduced = ~ sex + season)
resultsNames(dds_interaction)
res_interaction<- results(dds_interaction,contrast=list("sexmale.seasonspring"),alpha = pvalue)
res_interaction.nona<- na.omit(res_interaction)
nrow(res_interaction.nona[res_interaction.nona$padj<=pvalue,])
summary(res_interaction)
summary(res_interaction.nona)
#write.table (res_interaction.nona, "C:/Users/Ehren/Desktop/analysis/HG_interaction_results.tab", sep="\t", col.names=NA)
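`nbinomLRT()` tests the interaction gene by gene by comparing the full model `~ sex + season + sex:season` with the reduced model `~ sex + season`; twice the log-likelihood difference is referred to a chi-squared distribution whose degrees of freedom equal the number of dropped coefficients (one here). The sketch below shows that logic for a single hypothetical gene, using a base-R Poisson GLM as a simplified stand-in. DESeq2 itself fits a negative binomial model with size factors and shrunken dispersions, so its statistics will differ.

```r
# Hypothetical counts for one gene, 3 samples per sex x season cell
key <- data.frame(
  sex    = relevel(factor(rep(c("female", "male"), each = 6)), ref = "female"),
  season = relevel(factor(rep(rep(c("spring", "summer"), each = 3), times = 2)), ref = "summer"),
  count  = c(20, 22, 19,  21, 18, 23,   # females: spring, summer
             80, 85, 90,  25, 24, 22)   # males: spring, summer
)
full    <- glm(count ~ sex + season + sex:season, family = poisson, data = key)
reduced <- glm(count ~ sex + season, family = poisson, data = key)

# Likelihood ratio statistic on 1 df (the interaction coefficient)
lr_stat <- as.numeric(2 * (logLik(full) - logLik(reduced)))
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# anova() on the nested fits performs the same test
lrt_table <- anova(reduced, full, test = "Chisq")
print(c(LR = lr_stat, p = p_value))
```

Because the models are nested, the LR statistic equals the drop in residual deviance reported by `anova()`.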

# Create full model NOT including interaction term
dds_sex_season <- DESeqDataSetFromMatrix(dm, dkey, design = ~ sex + season)
dds_sex_season <- DESeq(dds_sex_season)

# Test the effect of sex
dds_sex <-nbinomLRT(dds_sex_season,reduced = ~ season)
resultsNames(dds_sex)
res_sex<-results(dds_sex, contrast = c("sex", "male", "female"),alpha = pvalue)
res_sex.nona<- na.omit(res_sex)
nrow(res_sex.nona[res_sex.nona$padj<=pvalue,])
summary(res_sex)
summary(res_sex.nona)
#write.table (res_sex, "C:/Users/Ehren/Desktop/analysis/HG_sex_results.tab", sep="\t", col.names = NA)

# Test the effect of season
dds_season <-nbinomLRT(dds_sex_season,reduced = ~ sex)
res_season<-results(dds_season, contrast = c("season", "spring", "summer"),alpha = pvalue)
res_season.nona<- na.omit(res_season)
nrow(res_season.nona[res_season.nona$padj<=pvalue,])
summary(res_season)
summary(res_season.nona)
#write.table (res_season, "C:/Users/Ehren/Desktop/analysis/HG_season_results.tab", sep="\t", col.names = NA)

# PCA Plot
rld <- rlogTransformation(dds_full)
jpeg(filename="PCA.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
"plotPCA"(rld, intgroup=c('season','sex'), ntop = 2000, returnData = FALSE)
dev.off()
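Roughly, `plotPCA()` selects the `ntop` transcripts with the highest variance across samples in the transformed data, runs a centered (unscaled) PCA with the samples as observations, and labels each axis with its share of the variance. A base-R approximation on a simulated log-scale matrix is sketched below; the matrix, the group shift and `ntop = 100` are invented for illustration, and DESeq2 adds the grouping and plotting on top of this.

```r
set.seed(42)
# Simulated log-scale expression: 500 transcripts x 8 samples, two groups of 4
n_genes <- 500
mat <- matrix(rnorm(n_genes * 8, mean = 5), nrow = n_genes,
              dimnames = list(paste0("tx", 1:n_genes), paste0("s", 1:8)))
mat[1:50, 5:8] <- mat[1:50, 5:8] + 3  # first 50 transcripts shifted in group 2

# Keep the ntop most variable transcripts
ntop <- 100
row_var <- apply(mat, 1, var)
select <- order(row_var, decreasing = TRUE)[seq_len(min(ntop, length(row_var)))]

# PCA with samples as rows: centered, not scaled
pca <- prcomp(t(mat[select, ]))
percent_var <- pca$sdev^2 / sum(pca$sdev^2)
print(round(100 * percent_var[1:2], 1))
print(round(pca$x[, 1], 2))  # PC1 scores separate samples 1-4 from 5-8
```

Restricting to the most variable transcripts keeps thousands of near-constant genes from diluting the structure, which is what the `ntop = 2000` argument in the script controls.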

# Create rld matrix of the top n transcripts:
top_exp<- order(rowMeans(counts(dds_sex_season,normalized=FALSE)),decreasing=TRUE)[1:1000]
top_exp_matrix <- counts(dds_full,normalized=FALSE)[top_exp,]
rld_top_exp<-rlogTransformation(top_exp_matrix)

#Heatmap of z scores; top n transcripts
#jpeg(filename="TOP1000.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_top_exp ,method="ward.D2", Rowv=TRUE, Colv=NA, col=hmcol, scale="row",labRow=NA, showRowDendro = TRUE)
#dev.off()

#Heatmaps of DE transcripts:

#DE #1 by sex
HG_DE_sex <- rownames(res_sex.nona[res_sex.nona$padj

#DE #2 by season
HG_DE_season <- rownames(res_season.nona[res_season.nona$padj

#DE #3 by sex:season interaction
HG_DE_interaction <- rownames(res_interaction.nona[res_interaction.nona$padj

#DE #4 Spring Males vs Spring Females:
HG_DE_spring_sex <- rownames(res_spring_sex.nona[res_spring_sex.nona$padj

#DE #5 Summer Males vs Summer Females:
HG_DE_summer_sex <- rownames(res_summer_sex.nona[res_summer_sex.nona$padj

#DE #6 Spring Males vs Summer Males:
HG_DE_season_male <- rownames(res_season_male.nona[res_season_male.nona$padj

#DE #7 Spring Females vs Summer Females:
HG_DE_season_female <- rownames(res_season_female.nona[res_season_female.nona$padj

## Heatmaps:

#DE #1 heatmap: sex
#jpeg(filename="DE1.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_HG_DE_sex, method = "complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row",labRow=NA, showRowDendro = TRUE)
dev.off()

#DE #2 heatmap: season
jpeg(filename="DE2.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_HG_DE_season, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row",labRow=NA,showRowDendro = TRUE)
dev.off()

#DE #3 heatmap: sex:season interaction
jpeg(filename="DE3.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_HG_DE_interaction, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row",labRow=NA,showRowDendro = TRUE)
dev.off()

#DE #4 heatmap: Spring Males vs Spring Females:
jpeg(filename="DE4.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_HG_DE_spring_sex, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row",labRow=NA,showRowDendro = TRUE)
dev.off()

#DE #5 Summer Males vs Summer Females:
jpeg(filename="DE5.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_HG_DE_summer_sex, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row",labRow=NA,showRowDendro = TRUE)
dev.off()

#DE #6 Spring Males vs Summer Males:
jpeg(filename="DE6.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_HG_DE_season_male, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row",labRow=NA,showRowDendro = TRUE)
dev.off()

#DE #7 Spring Females vs Summer Females:
jpeg(filename="DE7.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_HG_DE_season_female, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row",labRow=NA,showRowDendro = TRUE)
dev.off()

R script Appendix R.5 Vomeronasal Receptor Differential Expression by sex and season: Used in: Chapter 5

setwd("C:/Users/Ehren/Desktop/analysis/VNO_RNA_seq")
library("DESeq2")
library("heatmap3")
library("EnhancedVolcano")

#clear environment
rm(list=ls())
dev.off()

# Load the VNO counts table (males in columns 1-16, females in columns 17-32)
df <- read.table("VNO_counts.tab")
df_male<-df[1:16]
df_female<-df[17:32]
df_spring<-df[c(1:9,17:24)]
df_summer<-df[c(10:16,25:32)]
#head (df)

# Create a data.frame with the Key that expresses the factors for each sample
dkey <- read.table("key_VNO.tab", header=T)
dkey_male<-dkey[1:16,] #males
dkey_female<-dkey[17:32,] #females
dkey_spring<-dkey[c(1:9,17:24),] #spring
dkey_summer<-dkey[c(10:16,25:32),] #summer
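The sample subsets are hard-coded as index ranges (males 1:16, females 17:32, spring `c(1:9,17:24)`, summer `c(10:16,25:32)`), which silently breaks if the count table or key is ever reordered. Assuming the key has `sex` and `season` columns (as the design formulas require) and that its rows follow the same order as the count columns, the same indices can be derived from the key itself. The toy key below is hypothetical and only mirrors the layout implied by those ranges.

```r
# Toy key mirroring the layout implied by the hard-coded ranges:
# 9 spring + 7 summer males, then 8 spring + 8 summer females
dkey <- data.frame(
  sex    = c(rep("male", 16), rep("female", 16)),
  season = c(rep("spring", 9), rep("summer", 7), rep("spring", 8), rep("summer", 8))
)

idx_male   <- which(dkey$sex == "male")
idx_female <- which(dkey$sex == "female")
idx_spring <- which(dkey$season == "spring")
idx_summer <- which(dkey$season == "summer")

# Subsets would then be taken as, e.g., df[idx_spring] and dkey[idx_spring, ]
print(idx_spring)  # 1 to 9, then 17 to 24
```

Deriving the indices from the key keeps the count columns and the sample metadata aligned by construction rather than by position.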

# Read in list of VNO receptors (615 transcripts)
receptors_list<-read.table("VNO_receptors.tab")
nrow(receptors_list)

#Heatmap parameters:
hmcol<-colorRampPalette(c("Blue","Black","Orange"))(15)
hmap_pvalue<-0.05 #<- pvalue used to filter genes for heat map

# Filter for coverage - filter out low expression data
columns <- ncol(df)
columns_male <- ncol(df_male)
columns_female <- ncol(df_female)
columns_spring <- ncol(df_spring)
columns_summer <- ncol(df_summer)
coveragethrd <- 1 # min average number of reads per sample
pvalue <- 0.05 # max adjusted p value for significance
nrow(df)
df <- df[rowSums(df)>=(columns*coveragethrd),]
nrow(df)
nrow(df_female)
df_female <- df_female[rowSums(df_female)>=(columns_female*coveragethrd),]
nrow(df_female)
nrow(df_male)
df_male <- df_male[rowSums(df_male)>=(columns_male*coveragethrd),]
nrow(df_male)
nrow(df_spring)
df_spring <- df_spring[rowSums(df_spring)>=(columns_spring*coveragethrd),]
nrow(df_spring)
nrow(df_summer)
df_summer <- df_summer[rowSums(df_summer)>=(columns_summer*coveragethrd),]
nrow(df_summer)

# Make the matrix of df dataframe
dm <- as.matrix(df)
dm_female <- as.matrix(df_female)
dm_male <- as.matrix(df_male)
dm_spring <- as.matrix(df_spring)
dm_summer <- as.matrix(df_summer)

# Create full model NOT including interaction term
dds_sex_season <- DESeqDataSetFromMatrix(dm, dkey, design = ~ sex + season)
dds_sex_season$sex <- relevel(dds_sex_season$sex, ref="female") # Modify these lines to set interaction comparisons below
dds_sex_season$season <- relevel(dds_sex_season$season, ref="summer") # Modify these lines to set interaction comparisons below
dds_sex_season <- DESeq(dds_sex_season)

# Test the effect of sex
dds_sex <-nbinomLRT(dds_sex_season,reduced = ~ season)
resultsNames(dds_sex)
res_sex<-results(dds_sex, contrast = c("sex", "male", "female"),alpha = pvalue)
res_sex.nona<- na.omit(res_sex)
nrow(res_sex.nona[res_sex.nona$padj<=pvalue,])
summary(res_sex)
summary(res_sex.nona)
#write.table (res_sex, "C:/Users/Ehren/Desktop/analysis/VNO_sex_results.tab", sep="\t", col.names = NA)

# Test the effect of season
dds_season <-nbinomLRT(dds_sex_season,reduced = ~ sex)
res_season<-results(dds_season, contrast = c("season", "spring", "summer"),alpha = pvalue)
res_season.nona<- na.omit(res_season)
nrow(res_season.nona[res_season.nona$padj<=pvalue,])
summary(res_season)
summary(res_season.nona)
#write.table (res_season, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_seq/VNO_season_results.tab", sep="\t", col.names = NA)

# Create full model with all genes including interaction term
dds_full <- DESeqDataSetFromMatrix(dm, dkey, design = ~ sex + season + sex:season)
dds_full$sex <- relevel(dds_full$sex, ref="female") # Modify these lines to set interaction comparisons below
dds_full$season <- relevel(dds_full$season, ref="summer") # Modify these lines to set interaction comparisons below
dds_full <- DESeq(dds_full)

#Test the effects of the sex:season interaction term
dds_interaction <-nbinomLRT(dds_full,reduced = ~ sex + season)
resultsNames(dds_interaction)
res_interaction<- results(dds_interaction,contrast=list("sexmale.seasonspring"),alpha = pvalue)
res_interaction.nona<- na.omit(res_interaction)
nrow(res_interaction.nona[res_interaction.nona$padj<=pvalue,])
summary(res_interaction)
summary(res_interaction.nona)
#write.table (res_interaction.nona, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_seq/VNO_interaction_results.tab", sep="\t", col.names=NA)

# Create full model for females only
dds_female <- DESeqDataSetFromMatrix(dm_female, dkey_female, design = ~ season)
dds_female$season <- relevel(dds_female$season, ref="summer")
dds_female<-DESeq(dds_female)

# Test the effect of season in females only
res_season_female<-results(dds_female, contrast = c("season", "spring", "summer"),alpha = pvalue)
res_season_female.nona<- na.omit(res_season_female)
nrow(res_season_female.nona[res_season_female.nona$padj<=pvalue,])
summary(res_season_female)
summary(res_season_female.nona)
#write.table (res_season_female.nona, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_seq/VNO_season_female_results.tab", sep="\t", col.names=NA)

# Create full model for males only
dds_male <- DESeqDataSetFromMatrix(dm_male, dkey_male, design = ~ season)
dds_male$season <- relevel(dds_male$season, ref="summer")
dds_male<-DESeq(dds_male)

# Test the effect of season in males only
res_season_male<-results(dds_male, contrast = c("season", "spring", "summer"),alpha=pvalue)
res_season_male.nona<- na.omit(res_season_male)
nrow(res_season_male.nona[res_season_male.nona$padj<=pvalue,])
summary(res_season_male)
summary(res_season_male.nona)
#write.table (res_season_male.nona, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_seq/VNO_season_male_results.tab", sep="\t", col.names=NA)

# Create full model for spring samples only
dds_spring <- DESeqDataSetFromMatrix(dm_spring, dkey_spring, design = ~ sex)
dds_spring$sex <- relevel(dds_spring$sex, ref="female")
dds_spring<-DESeq(dds_spring)

# Test the effect of sex in spring samples only
res_spring_sex<-results(dds_spring, c("sex", "male", "female"), alpha=pvalue)
res_spring_sex.nona<- na.omit(res_spring_sex)
nrow(res_spring_sex.nona[res_spring_sex.nona$padj<=pvalue,])
summary(res_spring_sex)
summary(res_spring_sex.nona)
#write.table (res_spring_sex.nona, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_seq/VNO_spring_sex_results.tab", sep="\t", col.names=NA)

# Create full model for summer samples only
dds_summer <- DESeqDataSetFromMatrix(dm_summer, dkey_summer, design = ~ sex)
dds_summer$sex <- relevel(dds_summer$sex, ref="female")
dds_summer<-DESeq(dds_summer)

# Test the effect of sex in summer samples only
res_summer_sex<-results(dds_summer, c("sex", "male", "female"), alpha=pvalue)
res_summer_sex.nona<- na.omit(res_summer_sex)
nrow(res_summer_sex.nona[res_summer_sex.nona$padj<=pvalue,])
summary(res_summer_sex)
summary(res_summer_sex.nona)
#write.table (res_summer_sex.nona, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_seq/VNO_summer_sex_results.tab", sep="\t", col.names=NA)

######################################################################
# PCA Plot of all data:
rld <- rlogTransformation(dds_full)
#jpeg(filename="VNO_PCA_top2000.jpg", units="in", width=10, height=10, pointsize=20, res=1200)
"plotPCA"(rld, intgroup=c('season','sex'), ntop = 2000, returnData = FALSE)
dev.off()

# PCA Plot of receptors only:
receptors_list <- read.table("VNO_receptors.tab")
receptors_df <- df[rownames(df)%in% unlist (receptors_list),]
receptors_df <- as.matrix(receptors_df)
nrow(receptors_df)
receptors_df <- receptors_df[rowSums(receptors_df)>=(columns*coveragethrd),]
nrow(receptors_df)
receptors_dm <- as.matrix(receptors_df)
dds_receptors <- DESeqDataSetFromMatrix(receptors_dm, dkey, design = ~sex + season)
rld_receptors <- rlogTransformation(dds_receptors)
jpeg(filename="VNO_PCA_receptors.jpg", units="in", width=10, height=10, pointsize=20, res=1200)
"plotPCA"(rld_receptors, intgroup=c('season','sex'), returnData = FALSE)
dev.off()

# Create rld matrix of the top n transcripts:
#top_exp<- order(rowMeans(counts(dds_sex_season,normalized=FALSE)),decreasing=TRUE)[1:2000]
#top_exp_matrix <- counts(dds_full,normalized=FALSE)[top_exp,]
#rld_top_exp<-rlogTransformation(top_exp_matrix)

#Heatmap of z scores; top n transcripts
#jpeg(filename="VNO_heatmap_TOP1000.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
#heatmap3(rld_top_exp ,method="ward.D2", Rowv=TRUE, Colv=NA, col=hmcol, scale="row",labRow=NA, showRowDendro = TRUE)
#dev.off()

#####################################################################################

# Make comparison results tables:

#Comparison #1
res_receptors_sex <- res_sex[rownames(res_sex)%in% unlist (receptors_list),]
res_receptors_sex <- res_receptors_sex[,-6]
nrow(res_receptors_sex)
pvals_receptors_sex <- res_receptors_sex$pvalue
receptor_padj <- p.adjust(pvals_receptors_sex, method = "BH")
res_receptors_sex <-cbind(res_receptors_sex, receptor_padj)
res_receptors_sex.nona<- na.omit(res_receptors_sex)
#write.table (res_receptors_sex.nona, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_Seq/VNO_receptors_sex_results1.tab", sep="\t", col.names = NA)

#Comparison #2
res_receptors_season <- res_season[rownames(res_season)%in% unlist (receptors_list),]
res_receptors_season <- res_receptors_season[,-6]
nrow(res_receptors_season)
pvals_receptors_season <- res_receptors_season$pvalue
receptor_padj <- p.adjust(pvals_receptors_season, method = "BH")
res_receptors_season <-cbind(res_receptors_season, receptor_padj)
#write.table (res_receptors_season, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_Seq/VNO_receptors_season_results.tab", sep="\t", col.names = NA)

#Comparison #3
res_receptors_interaction <- res_interaction[rownames(res_interaction)%in% unlist (receptors_list),]
res_receptors_interaction <- res_receptors_interaction[,-6]
nrow(res_receptors_interaction)
pvals_receptors_interaction <- res_receptors_interaction$pvalue
receptor_padj <- p.adjust(pvals_receptors_interaction, method = "BH")
res_receptors_interaction <-cbind(res_receptors_interaction, receptor_padj)
#write.table (res_receptors_interaction, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_Seq/VNO_receptors_interaction_results.tab", sep="\t", col.names = NA)

#Comparison #4
res_receptors_season_male <- res_season_male[rownames(res_season_male)%in% unlist (receptors_list),]
res_receptors_season_male <- res_receptors_season_male[,-6]
nrow(res_receptors_season_male)
pvals_receptors_season_male <- res_receptors_season_male$pvalue
receptor_padj <- p.adjust(pvals_receptors_season_male, method = "BH")
res_receptors_season_male <-cbind(res_receptors_season_male, receptor_padj)
#write.table (res_receptors_season_male, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_Seq/VNO_receptors_season_male_results.tab", sep="\t", col.names = NA)

#Comparison #5
res_receptors_season_female <- res_season_female[rownames(res_season_female)%in% unlist (receptors_list),]
res_receptors_season_female <- res_receptors_season_female[,-6]
nrow(res_receptors_season_female)
pvals_receptors_season_female <- res_receptors_season_female$pvalue
receptor_padj <- p.adjust(pvals_receptors_season_female, method = "BH")
res_receptors_season_female <-cbind(res_receptors_season_female, receptor_padj)
#write.table (res_receptors_season_female, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_Seq/VNO_receptors_season_female_results.tab", sep="\t", col.names = NA)

#Comparison #6
res_receptors_summer_sex <- res_summer_sex[rownames(res_summer_sex)%in% unlist (receptors_list),]
res_receptors_summer_sex <- res_receptors_summer_sex[,-6]
nrow(res_receptors_summer_sex)
pvals_receptors_summer_sex <- res_receptors_summer_sex$pvalue
receptor_padj <- p.adjust(pvals_receptors_summer_sex, method = "BH")
res_receptors_summer_sex <-cbind(res_receptors_summer_sex, receptor_padj)
#write.table (res_receptors_summer_sex, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_Seq/VNO_receptors_summer_sex_results.tab", sep="\t", col.names = NA)

#Comparison #7
res_receptors_spring_sex <- res_spring_sex[rownames(res_spring_sex) %in% unlist(receptors_list),]
res_receptors_spring_sex <- res_receptors_spring_sex[,-6]
nrow(res_receptors_spring_sex)
pvals_receptors_spring_sex <- res_receptors_spring_sex$pvalue
receptor_padj <- p.adjust(pvals_receptors_spring_sex, method = "BH")
res_receptors_spring_sex <- cbind(res_receptors_spring_sex, receptor_padj)
#write.table(res_receptors_spring_sex, "C:/Users/Ehren/Desktop/analysis/VNO_RNA_Seq/VNO_receptors_spring_sex_results.tab", sep="\t", col.names = NA)
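Each receptor comparison above drops the genome-wide DESeq2 `padj` column. It then re-applies the Benjamini-Hochberg correction across the vomeronasal receptor subset only. The Python sketch below illustrates that step and is not the code used in this analysis. `bh_adjust` is a hypothetical re-implementation of the step-up procedure performed by R's `p.adjust(method = "BH")`. `adjust_within_subset` is a hypothetical helper that mirrors the `%in% unlist(receptors_list)` filter. Missing p-values (`NA` in R, `None` here) are assumed to be excluded from the number of tests.

```python
from typing import Dict, Iterable, List, Optional


def bh_adjust(pvalues: List[Optional[float]]) -> List[Optional[float]]:
    """Benjamini-Hochberg adjusted p-values (sketch of R's p.adjust(method="BH")).

    None entries are skipped and returned as None; n counts only observed p-values.
    """
    observed = [i for i, p in enumerate(pvalues) if p is not None]
    n = len(observed)
    adjusted: List[Optional[float]] = [None] * len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # of p * n / rank so adjusted values stay monotone and never exceed 1.
    order = sorted(observed, key=lambda i: pvalues[i], reverse=True)
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k  # 1-based rank in ascending order of p
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_within_subset(pvals_by_gene: Dict[str, Optional[float]],
                         subset: Iterable[str]) -> Dict[str, Optional[float]]:
    """Keep only genes listed in `subset`, then BH-adjust within that subset."""
    wanted = set(subset)
    genes = [g for g in pvals_by_gene if g in wanted]
    adjusted = bh_adjust([pvals_by_gene[g] for g in genes])
    return dict(zip(genes, adjusted))


if __name__ == "__main__":
    print(bh_adjust([0.01, 0.04, 0.03, 0.2]))
```

The key design point is that n is the number of receptor transcripts with a p-value, not the number of transcripts in the whole transcriptome.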

#####################################################################################

#Heatmaps of DE transcripts:

#DE #1 by sex
res_receptors_sex.nona <- na.omit(res_receptors_sex)
DE_receptors_sex <- rownames(res_receptors_sex.nona[res_receptors_sex.nona$receptors_sex_de_padj

#DE #2 by season
res_receptors_season.nona <- na.omit(res_receptors_season)
DE_receptors_season <- rownames(res_receptors_season.nona[res_receptors_season.nona$receptors_season_de_padj

#DE #3 by sex:season interaction
res_receptors_interaction.nona <- na.omit(res_receptors_interaction)
DE_receptors_interaction <- rownames(res_receptors_interaction.nona[res_receptors_interaction.nona$receptors_interaction_de_padj

rld_DE_receptors_interaction <- rlogTransformation(DE_receptors_interaction_matrix)

#DE #4 spring male v spring female
res_receptors_spring_sex.nona <- na.omit(res_receptors_spring_sex)
DE_receptors_spring_sex <- rownames(res_receptors_spring_sex.nona[res_receptors_spring_sex.nona$receptors_spring_sex_de_padj

#DE #5 summer male v summer female
res_receptors_summer_sex.nona <- na.omit(res_receptors_summer_sex)
DE_receptors_summer_sex <- rownames(res_receptors_summer_sex.nona[res_receptors_summer_sex.nona$receptors_summer_sex_de_padj

#DE #6 spring male v summer male
res_receptors_season_male.nona <- na.omit(res_receptors_season_male)
DE_receptors_season_male <- rownames(res_receptors_season_male.nona[res_receptors_season_male.nona$receptors_season_male_de_padj

#DE #7 spring female v summer female
res_receptors_season_female.nona <- na.omit(res_receptors_season_female)
DE_receptors_season_female <- rownames(res_receptors_season_female.nona[res_receptors_season_female.nona$receptors_season_female_de_padj

## Heatmaps:

#DE #1 heatmap: sex
#jpeg(filename="DE1.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_DE_receptors_sex, method = "complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row", labRow=NA, showRowDendro = TRUE)
dev.off()

#DE #2 heatmap: season
#jpeg(filename="DE2.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_DE_receptors_season, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row", labRow=NA, showRowDendro = TRUE)
dev.off()

#DE #3 heatmap: sex:season interaction
#jpeg(filename="DE3.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_DE_receptors_interaction, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row", labRow=NA, showRowDendro = TRUE)
dev.off()

#DE #4 heatmap: Spring Males vs Spring Females:
#jpeg(filename="DE4.jpg", units="in", width=10, height=10, pointsize=12, res=1200)
heatmap3(DE_receptors_spring_sex_matrix, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row", labRow=NA, showRowDendro = TRUE)
dev.off()
#jpeg(filename="rld_DE4.jpg", units="in", width=10, height=10, pointsize=12, res=1200)
heatmap3(rld_DE_receptors_spring_sex, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row", labRow=NA, showRowDendro = TRUE)
dev.off()

#DE #5 Summer Males vs Summer Females:
#jpeg(filename="DE5.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_DE_receptors_summer_sex, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row", labRow=NA, showRowDendro = TRUE)
dev.off()

#DE #6 Spring Males vs Summer Males:
#jpeg(filename="DE6.jpg", units="in", width=10, height=10, pointsize=12, res=2100)
heatmap3(rld_DE_receptors_season_male, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row", labRow=NA, showRowDendro = TRUE)
dev.off()

#DE #7 Spring Females vs Summer Females:
jpeg(filename="DE7_receptors.jpg", units="in", width=10, height=10, pointsize=12, res=1200)
heatmap3(DE_receptors_season_female_matrix, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row", labRow=NA, showRowDendro = TRUE)
dev.off()
jpeg(filename="rld_DE7_receptors.jpg", units="in", width=10, height=10, pointsize=12, res=1200)
heatmap3(rld_DE_receptors_season_female, method="complete", Rowv=TRUE, Colv=NA, col=hmcol, scale="row", labRow=NA, showRowDendro = TRUE)
dev.off()
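All of the heatmaps above pass `scale = "row"`. The colours therefore show each transcript's expression relative to its own mean across samples, not its absolute abundance. We assume that `heatmap3` scales rows the same way base R's `heatmap` does: it centres each row and then divides by the row's sample standard deviation. Under that assumption, the transformation can be sketched in Python as below. `scale_rows` is a hypothetical name. Constant rows come back as NaN, which is what the equivalent R arithmetic (0/0) produces.

```python
import math
from statistics import mean, stdev
from typing import List


def scale_rows(matrix: List[List[float]]) -> List[List[float]]:
    """Z-score each row: subtract the row mean, divide by the row's sample SD."""
    scaled = []
    for row in matrix:
        centre = mean(row)
        spread = stdev(row)  # sample SD (n - 1 denominator), like R's sd()
        if spread == 0:
            # a constant row has no variation to display
            scaled.append([math.nan] * len(row))
        else:
            scaled.append([(x - centre) / spread for x in row])
    return scaled


if __name__ == "__main__":
    print(scale_rows([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]))
```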

######################################################################
# Volcano Plots:

#DE #1 by sex
#jpeg(filename="DE1_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_sex, lab = rownames(res_sex), x = 'log2FoldChange', y = 'padj', xlim = c(-7, 7), ylim = c(0, 250), transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = 'Expression by Sex Across All Samples')
dev.off()

#DE #2 by season
#jpeg(filename="DE2_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_season, lab = rownames(res_season), x = 'log2FoldChange', y = 'padj', xlim = c(-7, 7), ylim = c(0, 250), transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = ' Expression by Season Across All Samples')
dev.off()

#DE #3 by sex-season interaction
#jpeg(filename="DE3_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_interaction, lab = rownames(res_interaction), x = 'log2FoldChange', y = 'padj', xlim = c(-7, 7), transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = ' Expression by Sex-by-Season Interaction Across All Samples')
dev.off()


#DE #4 Spring Males vs Spring Females:
#jpeg(filename="DE4_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_spring_sex, lab = rownames(res_spring_sex), x = 'log2FoldChange', y = 'padj', xlim = c(-7.5, 7.5), transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = ' Expression in Spring Males and Spring Females')
dev.off()

#DE #5 Summer Males vs Summer Females:
#jpeg(filename="DE5_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_summer_sex, lab = rownames(res_summer_sex), x = 'log2FoldChange', y = 'padj', xlim = c(-7.5, 7.5), transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = ' Expression in Summer Males and Summer Females')
dev.off()

#DE #6 Spring Males vs Summer Males:
#jpeg(filename="DE6_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_season_male, lab = rownames(res_season_male), x = 'log2FoldChange', y = 'padj', xlim = c(-7.5, 7.5), transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = ' Expression in Spring Males and Summer Males')
dev.off()

#DE #7 Spring Females vs Summer Females:
#jpeg(filename="DE7_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_season_female, lab = rownames(res_season_female), xlim = c(-7.5, 7.5), x = 'log2FoldChange', y = 'padj', transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = ' Expression in Spring Females and Summer Females')
dev.off()

# Receptor Volcano Plots:

#DE #1 by sex
#jpeg(filename="DE1_receptors_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_receptors_sex, lab = rownames(res_receptors_sex), x = 'log2FoldChange', y = 'receptor_padj', transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = 'Vomeronasal Receptor Expression by Sex Across All Samples')
dev.off()

#DE #2 by season
#jpeg(filename="DE2_receptors_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_receptors_season, lab = rownames(res_receptors_season), x = 'log2FoldChange', y = 'receptor_padj', transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = 'Vomeronasal Receptor Expression by Season Across All Samples')
dev.off()

#DE #3 by sex-season interaction
#jpeg(filename="DE3_receptors_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_receptors_interaction, lab = rownames(res_receptors_interaction), x = 'log2FoldChange', y = 'receptor_padj', transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = 'Vomeronasal Receptor Expression by Sex-by-Season Interaction Across All Samples')
dev.off()

#DE #4 Spring Males vs Spring Females:
#jpeg(filename="DE4_receptors_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_receptors_spring_sex, lab = rownames(res_receptors_spring_sex), x = 'log2FoldChange', y = 'receptor_padj', transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = 'Vomeronasal Receptor Expression in Spring Males and Spring Females')
dev.off()

#DE #5 Summer Males vs Summer Females:
#jpeg(filename="DE5_receptors_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_receptors_summer_sex, lab = rownames(res_receptors_summer_sex), x = 'log2FoldChange', y = 'receptor_padj', transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = 'Vomeronasal Receptor Expression in Summer Males and Summer Females')
dev.off()

#DE #6 Spring Males vs Summer Males:
#jpeg(filename="DE6_receptors_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_receptors_season_male, lab = rownames(res_receptors_season_male), x = 'log2FoldChange', y = 'receptor_padj', transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = 'Vomeronasal Receptor Expression in Spring Males and Summer Males')
dev.off()

#DE #7 Spring Females vs Summer Females:
#jpeg(filename="DE7_receptors_volcano.jpg", units="in", width=12.5, height=10, pointsize=12, res=1200)
EnhancedVolcano(res_receptors_season_female, lab = rownames(res_receptors_season_female), x = 'log2FoldChange', y = 'receptor_padj', transcriptPointSize = 3, transcriptLabSize = 2.5, pCutoff = 0.05, FCcutoff = 0, col = c('black', 'blue', 'black', 'orange3'), title = 'Vomeronasal Receptor Expression in Spring Females and Summer Females')
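Every volcano plot above applies `pCutoff = 0.05` to the adjusted p-value (`padj` or `receptor_padj`) and sets `FCcutoff = 0`. The `col` argument supplies one colour per point category. As we understand EnhancedVolcano's defaults, the four colours correspond, in order, to: not significant, fold-change only, p-value only, and both. The Python sketch below reproduces that classification under those assumptions. `volcano_category`, `volcano_coords` and `COLOURS` are hypothetical names. With a zero fold-change cutoff, almost every point passes the fold-change test. In practice, then, significant transcripts are drawn in orange and the rest in blue.

```python
import math
from typing import Tuple

# colour per category, in the order given by col = c('black', 'blue', 'black', 'orange3')
COLOURS = {"NS": "black", "Log2FC": "blue", "P": "black", "P & Log2FC": "orange3"}


def volcano_category(log2fc: float, padj: float,
                     p_cutoff: float = 0.05, fc_cutoff: float = 0.0) -> str:
    """Assign a point to one of four volcano-plot categories."""
    passes_p = padj < p_cutoff
    passes_fc = abs(log2fc) > fc_cutoff
    if passes_p and passes_fc:
        return "P & Log2FC"
    if passes_p:
        return "P"
    if passes_fc:
        return "Log2FC"
    return "NS"


def volcano_coords(log2fc: float, padj: float) -> Tuple[float, float]:
    """x is the log2 fold change; y is -log10 of the adjusted p-value."""
    return log2fc, -math.log10(padj)


if __name__ == "__main__":
    for fc, p in [(1.5, 0.01), (-2.0, 0.3), (0.0, 0.01)]:
        category = volcano_category(fc, p)
        print(fc, p, category, COLOURS[category])
```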


Unix custom script code appendices

Unix script Appendix U.1 “QualFilterFastq.pl”

#!/usr/bin/perl
# written by E Meyer, [email protected]
# distributed without any guarantees or restrictions

# -- check for dependencies
$mod2="Bio::SeqIO";
unless(eval("require $mod2")) {print "$mod2 not found. Exiting\n"; exit;}
use Bio::SeqIO;
$mod3="Bio::Seq::Quality";
unless(eval("require $mod3")) {print "$mod3 not found. Exiting\n"; exit;}
use Bio::Seq::Quality;
$scriptname=$0; $scriptname =~ s/.+\///g;

# -- program description and required arguments
unless ($#ARGV == 3) {
    print "\nRemoves reads containing too many low quality basecalls from a set of short sequences \n";
    print "Output:\t high-quality reads in FASTQ format\n";
    print "Usage:\t $scriptname input.fastq low_score min_LQ output.fastq\n";
    print "Arguments:\n";
    print "\t input.fastq\t raw input reads in FASTQ format\n";
    print "\t low_score\t quality scores below this are considered low quality (LQ)\n";
    print "\t min_LQ\t\t reads with more than this many LQ bases are excluded\n";
    print "\t output.fastq\t name for output file of HQ reads in FASTQ format\n";
    print "\n";
    exit;
}

my $fastqfile = $ARGV[0];
my $lowq = $ARGV[1];
my $minlq = $ARGV[2];
my $outfqfile = $ARGV[3];

#my $inseqs = new Bio::SeqIO(-file=>$fastqfile, -format=>"fastq-illumina");
my $inseqs = new Bio::SeqIO(-file=>$fastqfile, -format=>"fastq");

my %gh; my $scount = 0; my $toolow = 0;
while ($seq = $inseqs->next_seq) {
    $scount++;
    $qo = new Bio::Seq::Quality(-accession_number=>$seq->display_id, -qual=>$seq->qual, -verbose=>-1);
    $qot = $qo->qual_text;
    @qoa = split(" ", $qot);
    $qid = $qo->accession_number;
    $lqcount = 0;
    # qualities from Bio::SeqIO are already numeric Phred scores
    foreach $q (@qoa) {
        if ($q < $lowq) {$lqcount++;}
    }
    if ($lqcount > $minlq) {$toolow++; next;}
    $gh{$qid}++;
}

print "Output from ", $scriptname, "\n";
print "Checked ", $scount, " reads.\n";
print $toolow, " failed.\n";
print $scount - $toolow, " passed.\n";
print $toolow/$scount, " rejection rate.\n";

print "Writing sequences to output...\n";
#my $inseqs = new Bio::SeqIO(-file=>$fastqfile, -format=>"fastq-illumina");
#my $outseqs = new Bio::SeqIO(-file=>">$outfqfile", -format=>"fastq-illumina");
my $inseqs = new Bio::SeqIO(-file=>$fastqfile, -format=>"fastq");
my $outseqs = new Bio::SeqIO(-file=>">$outfqfile", -format=>"fastq");
$ocount = 0;
while ($seq = $inseqs->next_seq) {
    $qo = new Bio::Seq::Quality(-accession_number=>$seq->display_id, -qual=>$seq->qual, -verbose=>-1);
    $qid = $qo->accession_number;
    if (exists($gh{$qid})) {$outseqs->write_seq($seq); $ocount++;}
}

print "Done.\n";
print $ocount, " sequences written to output.\n";
system("date");
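The filtering rule in `QualFilterFastq.pl` counts the bases whose quality score is below `low_score` and discards the read if that count exceeds `min_LQ`. Below is a minimal Python sketch of the same rule. It assumes standard four-line FASTQ records with Phred+33 (Sanger) quality strings. The function names are hypothetical, and the sketch does not handle multi-line or truncated records.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it)
        next(it)  # the '+' separator line
        qual = next(it)
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def passes(qual: str, low_score: int, min_lq: int) -> bool:
    """Keep the read unless more than min_lq bases score below low_score."""
    n_low = sum(1 for q in phred33(qual) if q < low_score)
    return n_low <= min_lq


def filter_fastq(lines: Iterable[str], low_score: int, min_lq: int) -> List[Record]:
    return [r for r in read_fastq(lines) if passes(r[2], low_score, min_lq)]


if __name__ == "__main__":
    reads = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "II##"]
    print(filter_fastq(reads, 20, 1))
```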


Unix script Appendix U.2 “fix_PE_fastq.pl”

# removes orphans from a pair of PE Illumina fastq reads
# this script assumes there is no character defining F and R reads (e.g. _1 or /1);
# rather, they have identical names in the two files
# this script also assumes there are no differences in order between files
# orphans are written to a new file in case the user wants to work with them

$scriptname = $0; $scriptname =~ s/.+\///;
unless ($#ARGV==5) {
    print "\nUsage: $scriptname input_F input_R output_F output_R output_UF output_UR\n";
    print "Where:\tinput_F:\tinput, forward reads (FASTQ)\n";
    print "\tinput_R:\tinput, reverse reads (FASTQ)\n";
    print "\toutput_F:\ta name for output file, forward reads with valid mates (FASTQ)\n";
    print "\toutput_R:\ta name for output file, reverse reads with valid mates (FASTQ)\n";
    print "\toutput_UF:\ta name for output file, unpaired reads (orphans) from forward input (FASTQ)\n";
    print "\toutput_UR:\ta name for output file, unpaired reads (orphans) from reverse input (FASTQ)\n\n";
    exit;
}

$if1 = $ARGV[0]; $if2 = $ARGV[1];
$of1 = $ARGV[2]; $of2 = $ARGV[3];
$of3 = $ARGV[4]; $of4 = $ARGV[5];

# loop through infile 1 and build a hash of sequence IDs
open(IN, $if1);
while(<IN>) {
    chomp;
    if ($_ !~ /\S/) {next;}
    $count++;
    if ($count==4) {$count=0;next;}
    if ($count==1) {
        $f1c++;
        $h1{$_}++;
    }
}
close(IN);

# loop through infile 2 and build a hash of sequence IDs found in both files
open(IN, $if2);
$count=0;
while(<IN>) {
    chomp;
    if ($_ !~ /\S/) {next;}
    $count++;
    if ($count==4) {$count=0;next;}
    if ($count==1) {
        $f2c++;
        if (exists($h1{$_})) { $bh{$_}++; }
    }
}
close(IN);

# loop through infile 1 and write out paired sequences to outfile 1 and unpaired to outfile 3
open(IN, $if1);
open(OUT, ">$of1");
open(UPO, ">$of3");
$count=0; $switch=0;
while(<IN>) {
    chomp;
    if ($_ !~ /\S/) {next;}
    $count++;
    if ($count==4) {$count=0;}
    if ($count==1) {
        $switch=0;
        if (exists($bh{$_})) { $gc++; $switch++; }
        else {$upn1++;}
    }
    if ($switch>0) { print OUT $_, "\n"; }
    else { print UPO $_, "\n"; }
}
close(IN); close(OUT); close(UPO);

# loop through infile 2 and write out paired sequences to outfile 2 and unpaired to outfile 4
open(IN, $if2);
open(OUT, ">$of2");
open(UPO, ">$of4");
$count=0; $switch=0;
while(<IN>) {
    chomp;
    if ($_ !~ /\S/) {next;}
    $count++;
    if ($count==4) {$count=0;}
    if ($count==1) {
        $switch=0;
        if (exists($bh{$_})) { $switch++; }
        else {$upn2++;}
    }
    if ($switch>0) { print OUT $_, "\n"; }
    else { print UPO $_, "\n"; }
}
close(IN); close(OUT); close(UPO);

print $f1c, " sequences in ", $if1, "\n";
print $f2c, " sequences in ", $if2, "\n";
print $gc, " valid pairs written to $of1 and $of2\n";
print $upn1, " unpaired reads written to $of3\n";
print $upn2, " unpaired reads written to $of4\n";
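`fix_PE_fastq.pl` makes two passes over the data. First, it records the read names in the forward file and keeps those that also occur in the reverse file. Second, it routes every record into a paired output or an orphan output. The Python sketch below shows the same set logic applied to read names alone. `split_pairs` is a hypothetical name. Like the Perl script, it assumes that mates carry identical names (no `/1` or `/2` suffix).

```python
from typing import List, Sequence, Tuple


def split_pairs(forward_ids: Sequence[str], reverse_ids: Sequence[str]
                ) -> Tuple[List[str], List[str], List[str], List[str]]:
    """Return (paired_F, paired_R, orphans_F, orphans_R), preserving input order."""
    shared = set(forward_ids) & set(reverse_ids)
    paired_f = [r for r in forward_ids if r in shared]
    paired_r = [r for r in reverse_ids if r in shared]
    orphans_f = [r for r in forward_ids if r not in shared]
    orphans_r = [r for r in reverse_ids if r not in shared]
    return paired_f, paired_r, orphans_f, orphans_r


if __name__ == "__main__":
    print(split_pairs(["@a", "@b", "@c"], ["@a", "@c", "@d"]))
```

The two paired outputs stay in sync only if both input files list the shared reads in the same order. This is the same assumption the Perl script states in its header.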


Unix script Appendix U.3 “trinity_reps.pl”

#!/usr/bin/perl
$scriptname=$0; $scriptname =~ s/.+\///g;
unless ($#ARGV==3) {
    print "\nSelects the longest representative for each component or\n";
    print "subcomponent in a Trinity transcriptome assembly.\n";
    print "Usage: $scriptname assembly.fasta option output.tab output.fasta\n";
    print "Where:\n";
    print "\tassembly.fasta:\tthe input file, assembled by Trinity\n";
    print "\toption:\t\tc for component, or s for subcomponent\n";
    print "\toutput.tab:\ta name for the output summary file\n";
    print "\toutput.fasta:\ta name for the output file of representative transcripts\n\n";
    exit;
}
use Bio::SeqIO;

# define variables
$inseq = $ARGV[0];
$opt = $ARGV[1];
$outtab = $ARGV[2];
$outseq = $ARGV[3];
$iseqs = new Bio::SeqIO(-file=>$inseq, -format=>"fasta");
$oseqs = new Bio::SeqIO(-file=>">$outseq", -format=>"fasta");
open(OUT, ">$outtab");

# record length, component, and subcomponent for each transcript
while ($seq = $iseqs->next_seq) {
    $noseq++;
    $rsid = $seq->display_id;
    @rsa = split("_", $rsid);
    $nh{$rsid}{"c"} = $rsa[0];
    $nh{$rsid}{"s"} = $rsa[0]."_".$rsa[1];
    $clh{$rsa[0]}{$rsid} = $seq->length;
    $slh{$rsa[0]."_".$rsa[1]}{$rsid} = $seq->length;
    $nrch{$rsa[0]}++;
    $nrsh{$rsa[0]."_".$rsa[1]}++;
}

# build a hash of selected transcripts
if ($opt eq "c") {
    print OUT "comp\tmembers\tlongest\n";
    foreach $c (sort(keys(%clh))) {
        %subch = %{$clh{$c}};
        @ta = sort{$subch{$b}<=>$subch{$a}}(keys(%subch));
        $nmc = @ta;
        $ti = $ta[0];
        $gh{$ti}++;
        print OUT $c, "\t", $nmc, "\t", $ti, "\n";
    }
}
elsif ($opt eq "s") {
    print OUT "subcomp\tmembers\tlongest\n";
    foreach $s (sort(keys(%slh))) {
        %subsh = %{$slh{$s}};
        @ta = sort{$subsh{$b}<=>$subsh{$a}}(keys(%subsh));
        $nms = @ta;
        $ti = $ta[0];
        $gh{$ti}++;
        print OUT $s, "\t", $nms, "\t", $ti, "\n";
    }
}

# write out selected transcripts to output
$iseqs = new Bio::SeqIO(-file=>$inseq, -format=>"fasta");
while ($seq = $iseqs->next_seq) {
    $rsid = $seq->display_id;
    if (exists($gh{$rsid})) { $oseqs->write_seq($seq); }
}

# print summary output
@nrca = keys(%nrch); @nrsa = keys(%nrsh);
$nnrc = @nrca; $nnrs = @nrsa;
print STDERR $noseq, " sequences in input file.\n";
print STDERR $nnrc, " components.\n";
print STDERR $nnrs, " subcomponents.\n";
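`trinity_reps.pl` splits each Trinity identifier on underscores and groups transcripts by component (`comp0`) or by subcomponent (`comp0_c0`). It then keeps the longest member of each group. The Python sketch below shows the selection step. It assumes old-style Trinity names such as `comp0_c0_seq1`; newer Trinity releases use names like `TRINITY_DN1000_c0_g1_i1`, which this split would not group correctly. `longest_representatives` is a hypothetical name. This sketch breaks length ties by keeping the first transcript seen, whereas in the Perl sort the tie order depends on hash ordering.

```python
from typing import Dict


def longest_representatives(lengths: Dict[str, int], level: str = "c") -> Dict[str, str]:
    """Map each component ('c') or subcomponent ('s') to its longest transcript ID."""
    best: Dict[str, str] = {}
    for seq_id, length in lengths.items():
        parts = seq_id.split("_")
        key = parts[0] if level == "c" else parts[0] + "_" + parts[1]
        if key not in best or length > lengths[best[key]]:
            best[key] = seq_id
    return best


if __name__ == "__main__":
    lens = {"comp0_c0_seq1": 500, "comp0_c0_seq2": 800, "comp0_c1_seq1": 300}
    print(longest_representatives(lens, "c"))
    print(longest_representatives(lens, "s"))
```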


Unix script Appendix U.4 “GenesFromLocalDB_cgrb.pl”

#!/usr/bin/perl
# written by E Meyer, [email protected]
# distributed without any guarantees or restrictions
# this version accounts for odd IO behavior on CGRB

$scriptname=$0; $scriptname =~ s/.+\///g;

# -- program name
print "\n", "-"x60, "\n";
print "$scriptname v 1.1\n";
print "Created 14 Sep 2010\n";
print "Last modified 02 Feb 2016\n";
print "-"x60, "\n";

# -- program description and required arguments
unless ($#ARGV == 5) {
    print "Assigns gene names to a set of DNA sequences based on sequence similarity\n";
    print "with other genes of known function. This version relies on both a local\n";
    print "sequence database (formatted for blast) and a local definitions file in which\n";
    print "each sequence ID is associated with a gene name or other annotation.\n";
    print "Output:\t a table of best matches and a fasta file of annotated sequences.\n";
    print "Usage:\t script -i=seqs -t=threads -b=exclude -n/p=db -a=defs -e=evalue\n";
    print "Arguments:\n";
    print "\t -i=seqs\t fasta file of sequences to be annotated\n";
    print "\t -t=threads\t number of threads to use in blast search\n";
    print "\t -b=exclude\t a text file of \'bad words\' to be excluded (e.g. \'uncharacterized\')\n";
    print "\t -p/n=db1\t The sequence database to search. p=protein, n=nucleotide\n";
    print "\t -a=defs\t Tab delimited file of gene names associated with that db (ID, Name).\n";
    print "\t -e=evalue\t critical e-value for blast search.\n";
    print "\n";
    exit;
}

# -- use statements
use warnings;
use Bio::SeqIO;
use Bio::SearchIO;

# -- user defined settings
my $nprog = "tblastx";
my $pprog = "blastx";
my $hno = 20;
my $crit = 0.0001;

# -- data input
my @unk; my @dbs; my @ann; my %out; my %uhs; my @plist; my $exopt; my $run; my %dhs; my @tlist;
system("date");
foreach $argi (0 .. $#ARGV) {
    $name = $ARGV[$argi];
    chomp ($name);
    @flag = split(/=/, $name);
    if ($flag[0] eq "-i") {$qfil = $flag[1];}
    if ($flag[0] eq "-b") {$bfil = $flag[1];}
    if ($flag[0] eq "-p") {$prog = $pprog; $db = $flag[1]}
    if ($flag[0] eq "-n") {$prog = $nprog; $db = $flag[1]}
    if ($flag[0] eq "-a") {$def = $flag[1]}
    if ($flag[0] eq "-t") {$tno = $flag[1]}
    if ($flag[0] eq "-e") {$crit = $flag[1]}
}
my $qseq = new Bio::SeqIO (-file=>$qfil, -format=>'fasta');
system ("cp $qfil tq.fasta");
open (BIN, $bfil);
@bword = <BIN>;
close(BIN);
print "Avoided terms: ";
for (@bword) {chomp ($_); print $_, ", ";}
print "\n\n";

# build hard coded variables
my @chars = ("A".."Z", "a".."z");
print "Loading sequences...\n";
while (my $seq = $qseq->next_seq) {
    $nom = $seq->display_id;
    $dfl = $seq->description;
    $ss = $seq->seq;
    push @unk, $nom;
    $uhs{$nom} = $ss;
    $dhs{$nom} = $dfl;
}
my %shs = %uhs;
my @orig = keys(%shs);
print "Done.\n";

# -- load gene name database
print "Loading definitions file...\n";
open (TAB, $def);
my %tabh;
while(<TAB>) {
    chomp;
    @cols = split("\t", $_);
    $tabh{$cols[0]} = $cols[1];
}
system("date");
print "Finished loading definitions file.\n\n";

# -- run BLAST search
if (!-e "out.br") {
    print "Blasting $qfil against $db ...\n";
    $eset = $crit*10;
    $string .= $chars[rand @chars] for 1..8;
    system("mkdir /data/$string");
    system("$prog -db $db -query $qfil -evalue $eset -num_descriptions $hno -num_alignments $hno -out /data/$string/out.br -num_threads $tno");
    system("cp /data/$string/out.br .");
    system("rm -rf /data/$string");
    system("date");
    print "Finished blasting ".$db.".\n\n";
}
else {
    print "Blast report out.br already exists. Using this file.\n";
}

# -- parse out top hits for each query and identify the gene name for that hit
print "Parsing blast report from ".$db."...\n";
my $br = new Bio::SearchIO (-file=>"out.br", format=>'blast');
RESULTS: while (my $result = $br->next_result) {
    $qid = $result->query_accession;
    $hcount = 0;
    HITS: while (my $hit = $result->next_hit) {
        $check = 0;
        if ($hcount>0) {next RESULTS;}
        my $hobs = $hit->significance;
        if ($hobs > $crit) {next RESULTS;}
        # print $qid, "\t", $hit->accession, "\n";
        HSPS: while (my $hsp = $hit->next_hsp) {
            $eobs = $hsp->evalue;
            if ($hcount>0) {next RESULTS;}
            if ($eobs > $crit) {next HITS;}
            if ($eobs <= $crit) {
                $hid = $hit->accession;
                if ($hid =~ /\|.*\|/) {
                    $hid =~ s/^\w+\|//;
                    $hid =~ s/\|.+//;
                }
                if (!exists($tabh{$hid})) {next HITS;}
                $thisname = $tabh{$hid};
                foreach $bw (@bword) {
                    chomp($bw);
                    if ($thisname =~ /$bw/i) {$check++; next HITS;}
                }
                if ($check == 0) {
                    $thisname =~ s/ /_/g;
                    $out{$qid}{"hit"} = $hid;
                    $out{$qid}{"name"} = $thisname;
                    $out{$qid}{"e"} = $eobs;
                    delete($uhs{$qid});
                    $hcount++;
                    next RESULTS;
                }
            }
        }
    }
}
system("date");
print "Finished parsing report from ".$db.".\n\n";

# -- output both annotated and non-annotated sequences with whatever information
# -- is available for the sequence. original descriptions are retained.
my $outseqs = new Bio::SeqIO(-file=>">gene_annotated.fasta", -format=>'fasta');
foreach $q (@orig) {
    if (defined($out{$q})) {
        $so = new Bio::Seq(-display_id=>$q, -seq=>$shs{$q});
        $od = $dhs{$q};
        $nd = $od." Match_Acc=".$out{$q}{"hit"}." Gene=".$out{$q}{"name"};
        $so->description($nd);
        $outseqs->write_seq($so);
    }
    if (!defined($out{$q})) {
        $so = new Bio::Seq(-display_id=>$q, -seq=>$shs{$q}, -description=>$dhs{$q});
        $outseqs->write_seq($so);
    }
}

# -- output summary information
my @un = keys (%uhs);
my $uno = @un;
print $uno." sequences remained un-annotated.\n";
my @an = keys (%out);
my $ano = @an;
print $ano." sequences successfully annotated.\n\n";
foreach $q (keys(%out)) {
    print $q, "\t";
    print $out{$q}{"hit"}, "\t";
    print $out{$q}{"e"}, "\t";
    print $out{$q}{"name"}, "\t";
    print "\n";
}
print "\n";
#system("rm error*.log");
system("rm tq.fasta");
#system("rm tqi.fasta");
system("date");
print "\n";
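The parsing loop in `GenesFromLocalDB_cgrb.pl` walks the BLAST hits for each query in ranked order. It accepts the first hit that (i) passes the e-value cutoff, (ii) has an accession present in the definitions file, and (iii) has a gene name containing none of the excluded "bad words". The Python sketch below restates that rule on a pre-parsed, best-first list of `(accession, evalue)` hits. It does not parse BLAST reports, and `clean_accession` and `pick_annotation` are hypothetical names. The sketch makes one deliberate simplification: bad words are matched as case-insensitive substrings, whereas the Perl script treats each bad word as a regular expression.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple


def clean_accession(hit_id: str) -> str:
    """Reduce e.g. 'sp|P12345|NAME_HUMAN' to 'P12345', as the Perl regexes do."""
    if re.search(r"\|.*\|", hit_id):
        hit_id = re.sub(r"^\w+\|", "", hit_id)
        hit_id = re.sub(r"\|.+", "", hit_id)
    return hit_id


def pick_annotation(hits: Sequence[Tuple[str, float]],
                    definitions: Dict[str, str],
                    bad_words: List[str],
                    max_evalue: float = 1e-4) -> Optional[Tuple[str, str, float]]:
    """Return (accession, gene_name, evalue) for the first acceptable hit, or None.

    `hits` must be ordered best-first, as in a BLAST report.
    """
    for raw_id, evalue in hits:
        if evalue > max_evalue:
            return None  # every later hit is at least as weak
        accession = clean_accession(raw_id)
        name = definitions.get(accession)
        if name is None:
            continue  # accession missing from the definitions file
        if any(word.lower() in name.lower() for word in bad_words):
            continue  # e.g. skip 'uncharacterized' proteins
        return accession, name.replace(" ", "_"), evalue
    return None
```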


Unix script Appendix U.5 “GOAnnotTable.pl”

#! /usr/bin/env perl
#
# quick script to parse out the go UniProt association file
# into a more sensible format. E Meyer 08 Aug 2008
# modified 10 May 2014 to reduce memory use
open (IN, $ARGV[0]);
while (<IN>) {
    if ($_ =~ /^\!/) {next;}
    chomp;
    @cols = split("\t", $_);
    $gh{$cols[1]}{$cols[4]}++;
}
close(IN);
foreach $a (sort(keys(%gh))) {
    %ah = %{$gh{$a}};
    print $a, "\t";
    foreach $g (sort(keys(%ah))) {
        print $g, " ";
    }
    print "\n";
}
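`GOAnnotTable.pl` reads a UniProt GO association (GAF) file and skips the `!` comment lines. For each remaining line, it collects the GO identifier in column 5 under the UniProt accession in column 2. Below is a Python sketch of the same reshaping, using the hypothetical function name `go_table`. It returns sorted GO IDs for each accession instead of printing the space-separated lines that the Perl script writes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def go_table(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group GO IDs (GAF column 5) by UniProt accession (GAF column 2)."""
    terms: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if line.startswith("!"):
            continue  # GAF header / comment line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # skip malformed or blank lines
        terms[cols[1]].add(cols[4])
    return {acc: sorted(gos) for acc, gos in sorted(terms.items())}
```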


Unix script Appendix U.6 “GO_by_gene.pl”

#! /usr/bin/env perl
$scriptname=$0; $scriptname =~ s/.+\///g;

# -- program name
print "-"x60, "\n";
print "$scriptname v 1.00\n";
print "Created 09 Feb 2012\n";
print "Last modified 09 Feb 2012\n";
print "-"x60, "\n";

# -- program description and required arguments
unless ($#ARGV == 2) {
    print "Assigns GO terms to a set of sequences already annotated with gene names based on UniProt.\n";
    print "Output:\t a fasta file of annotated sequences.\n";
    print "Usage:\t $scriptname input annotations output\n";
    print "Arguments:\n";
    print "\t input\t fasta file of sequences to be annotated\n";
    print "\t annotations\t file of UniProt GO annotations\n";
    print "\t output\t a name for the output file\n";
    print "\n";
    exit;
}

my $seqfile = $ARGV[0];
my $dbfile = $ARGV[1];
my $outfile = $ARGV[2];

open(DB, $dbfile);
while(<DB>) {
    chomp;
    @cols = split("\t", $_);
    $cols[1] =~ s/ /\,/g;
    $dbh{$cols[0]} = $cols[1];
}
close(DB);

open(IN, $seqfile);
open(OUT, ">$outfile");
while(<IN>) {
    chomp;
    $goi = $mi = $igi = "";
    unless ($_ =~ />/) {print OUT $_, "\n"; next;}
    $_ =~ s/>//;
    @bits = split(" ", $_);
    $iti = $bits[0];
    $acci = "";
    foreach $b (@bits) {
        $bcount++;
        if ($b =~ /Match_Acc=/) {$acci = $b; $acci =~ s/Match_Acc=//;}
    }
    if (exists($dbh{$acci})) {$goi = $dbh{$acci};}
    else {$goi = "";}
    print OUT ">";
    foreach $b (@bits) {
        if ($b =~ /GOMatch=/) {next;}
        if ($b =~ /GOTerms=/) {next;}
        print OUT $b, " ";
    }
    if ($goi ne "") {print OUT "GO=", $goi, " ";}
    print OUT "\n";
}
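`GO_by_gene.pl` reads the `Match_Acc=` accession that `GenesFromLocalDB_cgrb.pl` writes into each FASTA header and looks up its GO terms. It drops any older `GOMatch=`/`GOTerms=` tokens and appends the matching terms as a comma-separated `GO=` token. The sketch below restates this header rewrite in Python. `add_go_to_header` is a hypothetical name. The GO lookup is assumed to be a dict that maps each accession to its GO IDs. Unlike the Perl script, the sketch leaves no trailing space in the header.

```python
from typing import Dict, List


def add_go_to_header(header: str, go_by_acc: Dict[str, List[str]]) -> str:
    """Append GO=term1,term2 to a FASTA header based on its Match_Acc= token."""
    tokens = header.lstrip(">").split()
    accession = ""
    for token in tokens:
        if token.startswith("Match_Acc="):
            accession = token[len("Match_Acc="):]
    kept = [t for t in tokens if "GOMatch=" not in t and "GOTerms=" not in t]
    terms = go_by_acc.get(accession, [])
    if terms:
        kept.append("GO=" + ",".join(terms))
    return ">" + " ".join(kept)
```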


Unix script Appendix U.7 “Make_ermineJ_annotations.pl”

#!/usr/bin/perl
# written by E Meyer, [email protected], and modified by E. Bentz.
# Distributed without any guarantees or restrictions

# -- check arguments and print usage statement
$scriptname=$0; $scriptname =~ s/.+\///g;
$usage = <<USAGE;

Usage: $scriptname -s scores.tab -r FASTA -o output.tab

Required arguments:

-s score file (created from DESeq etc.) with 2 columns
   column 1 = gene IDs
   column 2 = score
   (both columns require a header)

-r reference file (GO annotated FASTA) from which to extract gene information

-o output annotations file (.tab)
   column 1 = gene IDs
   column 2 = probe IDs (may be identical to gene IDs)
   column 3 = protein product descriptions
   column 4 = UniProt protein ID
   column 5 = comma delimited list of gene ontology (GO) terms

USAGE

## assign input options:

$mod1="Getopt::Std";
unless(eval("require $mod1")) {print "$mod1 not found. Exiting\n"; exit;}
use Getopt::Std;
getopts("s:r:o:h");

if (!$opt_s || !$opt_r || !$opt_o || $opt_h) {print "\n", "-"x60, "\n", $scriptname, "\n", $usage, "-"x60, "\n\n"; exit;}
my $scorefile = $opt_s;  # input score file
my $reference = $opt_r;  # reference .fasta file
my $outfile = $opt_o;    # output annotations file

## read in score IDs

open (SCORES, $scorefile) or die "$! error opening scores file";
while(<SCORES>) {
    chomp;
    $score_ids{substr($_, 0, index($_, "\t"))} += 1;
}
close SCORES;

## Process FASTA file lines matching scorefile

open(FASTA, $reference) or die "$! error opening FASTA file";
open(OUT, ">$outfile") or die "$! error opening output file";
while(<FASTA>) {
    chomp;
    my $seq = $_;
    my ($seq_id) = $seq =~ /^>*(\S+)/;
    if (exists($score_ids{$seq_id})) {
        @annots = split(" ", $seq);
        $sid = $annots[0];
        $sid =~ s/\>//;
        $anh{$sid}{"gene"} = $anh{$sid}{"match"} = $anh{$sid}{"go"} = "";
        foreach $a (@annots) {
            if ($a =~ /Gene=/) {
                $genestr = $a;
                $genestr =~ s/Gene=//;
                $genestr =~ s/_/ /g;
                $anh{$sid}{"gene"} = $genestr;
                # print $genestr,"\n";
            }
            if ($a =~ /Match_Acc=/) {
                $matchstr = $a;
                $matchstr =~ s/Match_Acc=//;
                $anh{$sid}{"match"} = $matchstr;
                # print $matchstr,"\n";
            }
            if ($a =~ /GO=/) {
                $gostr = $a;
                $gostr =~ s/GO=//;
                $anh{$sid}{"go"} = $gostr;
                # print $gostr,"\n";
            }
        }
    }
}

## print to outfile

foreach $s (sort(keys(%anh))) {
    if ($anh{$s}{"go"} ne "") {
        print OUT $s, "\t", $s, "\t";
        if (exists($anh{$s}{"gene"})) { print OUT $anh{$s}{"gene"}; }
        if (exists($anh{$s}{"match"})) { print OUT "(", $anh{$s}{"match"}, ")", "\t"; }
        print OUT $anh{$s}{"go"};
        print OUT "\t";
        print OUT "\n";
    }
    else {
        print OUT $s, "\t", $s, "\t";
        if ($anh{$s}{"gene"} ne "") { print OUT $anh{$s}{"gene"}; }
        if ($anh{$s}{"match"} ne "") { print OUT " (", $anh{$s}{"match"}, ")"; }
        print OUT "\t\n";
    }
}

close(FASTA); close(OUT);
print "\n","Done.","\n";
print "\n","Output file: $outfile is now ready for use with ermineJ.","\n\n";
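`Make_ermineJ_annotations.pl` keeps only the FASTA headers whose sequence ID appears in the score file. It then turns each header's `Gene=`, `Match_Acc=` and `GO=` tokens into a tab-delimited annotation row. A Python sketch of this step follows. `ermine_row` and `ermine_rows` are hypothetical names. Each row is written with four columns (ID, ID, description, GO terms), and the sketch does not reproduce the exact spacing and trailing tabs of the Perl output.

```python
from typing import Iterable, List, Set


def ermine_row(header: str) -> str:
    """Turn one annotated FASTA header into 'id<TAB>id<TAB>description<TAB>GO terms'."""
    tokens = header.lstrip(">").split()
    seq_id = tokens[0]
    gene = match = go = ""
    for token in tokens[1:]:
        if token.startswith("Gene="):
            gene = token[len("Gene="):].replace("_", " ")
        elif token.startswith("Match_Acc="):
            match = token[len("Match_Acc="):]
        elif token.startswith("GO="):
            go = token[len("GO="):]
    description = f"{gene} ({match})" if match else gene
    return "\t".join([seq_id, seq_id, description, go])


def ermine_rows(headers: Iterable[str], score_ids: Set[str]) -> List[str]:
    """Keep headers whose ID is in the score file; sort rows by ID as the Perl script does."""
    rows = [ermine_row(h) for h in headers if h.lstrip(">").split()[0] in score_ids]
    return sorted(rows)
```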


Unix script Appendix U.8 “SAMFilterByGene.pl”

#!/usr/bin/env perl
# written by E Meyer, [email protected]
# distributed without any guarantees or restrictions

# -- check arguments and print usage statement
$scriptname=$0; $scriptname =~ s/.+\///g;
$usage = <<USAGE;

Matches may be counted as:
(1) the number of reads matching each sequence in the reference
(2) the number of reads matching each gene, from a user defined list
(3) the number of reads matching each component or subcomponent in a Trinity assembly

NOTE: make sure that when a read matches multiple reference sequences (ambiguous)
your mapper reports all of these, or at least all alignments as strong as the best
alignment. e.g. with SHRiMP you could use the flag --strata. This is NOT the default
behavior for some mappers, but is required to exclude ambiguous matches before
further analysis.

Usage: $scriptname -i input -m matches -o output
Required arguments:
    -i input       Output from any short read mapper, in SAM format.
    -m matches     Minimum number of matching bases required to consider an alignment valid.
    -o output      A name for the filtered output (SAM format).
Options:
    -p option      1: Report the number of reads matching each reference sequence in a
                   separate output file "counts.tab". 0: Don't produce this file (default).
    -r method      The method used for counting matches.
                   s: (default) count the number of reads uniquely assigned to a single
                      reference sequence.
                   t: count the number of reads uniquely assigned to each gene, as defined by
                      the component-subcomponent-transcript structure in de novo transcriptome
                      assemblies produced by the Trinity assembler. This requires that your
                      reference sequences are named in the style of Trinity assemblies,
                      e.g. comp0_c0_seq1. (see option -t)
                   g: count the number of reads uniquely assigned to each gene, as defined by
                      a user-supplied gene list (see option -g)
    -t option      Options for method "-r t". Choose whether to count genes at the level of
                   components ("-t c") (the default) or sub-components ("-t s").
    -g gene_list   Required for method "-r g". The name of a gene list (tab-delimited text)
                   formatted as
                       sequence1   gene1
                       sequence2   gene1
    -l length      Minimum length of aligned region (match, mismatch, + gaps) required to
                   consider an alignment valid. Only relevant if your mapper uses local
                   alignment. Default (for global alignments) is set equal to -m.
USAGE
if ($#ARGV < 3 || $ARGV[0] eq "-h") {print "\n", "-"x60, "\n", $scriptname, "\n", $usage, "-"x60, "\n\n"; exit;}

# -- module and executable dependencies
$mod1 = "Getopt::Std";
unless (eval("require $mod1")) {print "$mod1 not found. Exiting\n"; exit;}
use Getopt::Std;

# get variables from input
getopts('i:m:o:p:r:t:g:l:h');	# -i, -m and -o are required; -h prints the usage statement
if (!$opt_i || !$opt_m || !$opt_o || $opt_h) {print "\n", "-"x60, "\n", $scriptname, "\n", $usage, "-"x60, "\n\n"; exit;}
if ($opt_p) {$cprint = $opt_p;} else {$cprint = 0;}
if ($opt_t) {$countlevel = $opt_t;} else {$countlevel = "c";}
if ($opt_r) {$method = $opt_r;} else {$method = "s";}
if ($opt_r eq "g" && !$opt_g) {print "\n", "-"x60, "\n", $scriptname, "\n", $usage, "-"x60, "\n\n"; exit;}
if ($opt_g) {$amblist = $opt_g;}
if ($opt_l) {$athd = $opt_l;} else {$athd = $opt_m;}
my $infile = $opt_i;
my $mthd = $opt_m;
my $outfile = $opt_o;
$ambig = $tooshort = 0;

if ($method eq "s") {
	# read in sam file output and build a hash, counting raw mappings
	open(IN, $infile);
	my %maph;
	while (<IN>) {
		if ($_ =~ /^@/) {next;}
		chomp;
		$rowi = $_;
		$rowi =~ s/^>//;
		@cols = split("\t", $rowi);
		$ncols = @cols;
		if ($cols[2] eq "*") {next;}
		$rawmap++;

		# -- extract alignment length (sum of the lengths of the M operations in the CIGAR string)
		$numstr = $cols[5];
		@chars = split("M", $numstr);
		$aligni = 0;
		foreach $c (@chars) {
			$c =~ s/.+\D//g;
			if ($c > 0) {$aligni += $c;}
		}
		# -- extract mismatches (NM:i tag) and apply alignment length and mismatch thresholds
		$mismatchi = 0;
		foreach $flag (@cols[11..$ncols-1]) {
			if ($flag =~ /NM\:i/) {
				$flag =~ s/NM\:i\://;
				$mismatchi = $flag;
			}
		}
		$matchi = $aligni - $mismatchi;
		if ($matchi < $mthd) {$tooweak++; next;}
		if ($aligni < $athd) {$tooshort++; next;}

		# -- add extracted data to hash
		$maph{$cols[0]}{$cols[2]}{"count"}++;
		$maph{$cols[0]}{$cols[2]}{"string"} = $_;
		$maph{$cols[0]}{$cols[2]}{"align"} = $aligni;
		$maph{$cols[0]}{$cols[2]}{"match"} = $matchi;
	}

	# count number of reads with one or more matches passing thresholds
	@mra = keys(%maph);
	$nmr = @mra;

	# select final set of unique mappings
	open(OUT, ">$outfile");
	open(IN, $infile);
	while (<IN>) {
		chomp;
		if ($_ =~ /^@/) {print OUT $_, "\n";}
		else {last;}
	}
	my %refch;
	foreach $r (@mra) {
		%rh = %{$maph{$r}};
		@ma = keys(%rh);
		$nma = @ma;
		if ($nma > 1) {
			@moa = sort {$rh{$b}{"match"} <=> $rh{$a}{"match"}} (keys(%rh));
			if ($rh{$moa[0]}{"match"} == $rh{$moa[1]}{"match"}) {
				$ambig++;
				next;
			}
		}
		if ($nma == 1) {@moa = @ma;}
		$unimap++;
		$refch{$moa[0]}++;
		print OUT $rh{$moa[0]}{"string"}, "\n";
	}

	if ($cprint eq 1) {
		open(CTS, ">counts.tab");
		@sref = sort {$refch{$b} <=> $refch{$a}} (keys(%refch));
		foreach $s (@sref) {
			print CTS $s, "\t", $refch{$s}, "\n";
		}
	}

	print $rawmap, " raw mappings altogether.\n";
	print $nmr, " reads had one or more matches\n";
	print $tooshort, " excluded for short alignments\n";
	print $tooweak, " excluded for weak matches\n";
	print $ambig, " excluded for ambiguous matches\n";
	print $unimap, " unique mappings remained\n";
}

elsif ($method eq "g") {
	# read in list of functionally equivalent reference sequences (FERS) and build a hash
	%fersh = ();
	open(IN, $amblist);
	while (<IN>) {
		chomp;
		($seq, $gene) = split("\t", $_);
		$fersh{$seq} = $gene;
	}
	close(IN);

	# read in sam file output and build a hash, counting raw mappings
	$ambig = $tooshort = 0;
	open(IN, $infile);
	my %maph = ();
	while (<IN>) {
		if ($_ =~ /^@/) {next;}
		chomp;
		$rowi = $_;
		$rowi =~ s/^>//;
		@cols = split("\t", $rowi);
		$ncols = @cols;
		if ($cols[2] eq "*") {next;}
		$rawmap++;

		# -- extract alignment length (sum of the lengths of the M operations in the CIGAR string)
		$numstr = $cols[5];
		@chars = split("M", $numstr);
		$aligni = 0;
		foreach $c (@chars) {
			$c =~ s/.+\D//g;
			if ($c > 0) {$aligni += $c;}
		}
		# -- extract mismatches (NM:i tag) and apply alignment length and mismatch thresholds
		$mismatchi = 0;
		foreach $flag (@cols[11..$ncols-1]) {
			if ($flag =~ /NM\:i/) {
				$flag =~ s/NM\:i\://;
				$mismatchi = $flag;
			}
		}
		$matchi = $aligni - $mismatchi;
		if ($matchi < $mthd) {$tooweak++; next;}
		if ($aligni < $athd) {$tooshort++; next;}

		# -- check the FERS list for each sequence to decide how to count the reads
		$assign = "";
		if (exists($fersh{$cols[2]})) {
			$assign = $fersh{$cols[2]};
		}
		else {$assign = $cols[2];}

		# -- check if a mapping already exists for this read-ref pair
		# -- and keep the best mapping if so
		if (exists($maph{$cols[0]}{$assign})) {
			$prevmatch = $maph{$cols[0]}{$assign}{"match"};
			$thismatch = $matchi;
			if ($prevmatch >= $thismatch) {next;}
		}
		# -- add extracted data to hash
		$maph{$cols[0]}{$assign}{"count"}++;
		$maph{$cols[0]}{$assign}{"string"} = $_;
		$maph{$cols[0]}{$assign}{"align"} = $aligni;
		$maph{$cols[0]}{$assign}{"match"} = $matchi;
	}

	# count number of reads with one or more matches passing thresholds
	@mra = keys(%maph);
	$nmr = @mra;

	# select final set of unique mappings
	open(OUT, ">$outfile");
	open(IN, $infile);
	while (<IN>) {
		chomp;
		if ($_ =~ /^@/) {print OUT $_, "\n";}
		else {last;}
	}
	my %refch;
	foreach $r (@mra) {
		%rh = %{$maph{$r}};
		@ma = keys(%rh);
		$nma = @ma;
		if ($nma > 1) {
			@moa = sort {$rh{$b}{"match"} <=> $rh{$a}{"match"}} (keys(%rh));
			if ($rh{$moa[0]}{"match"} == $rh{$moa[1]}{"match"} && $moa[0] ne $moa[1]) {
				$ambig++;
				next;
			}
		}
		if ($nma == 1) {@moa = @ma;}
		$unimap++;
		$refch{$moa[0]}++;
		print OUT $rh{$moa[0]}{"string"}, "\n";
	}

	if ($cprint eq 1) {
		open(CTS, ">counts.tab");
		@sref = sort {$refch{$b} <=> $refch{$a}} (keys(%refch));
		foreach $s (@sref) {
			print CTS $s, "\t", $refch{$s}, "\n";
		}
	}
	print $rawmap, " raw mappings altogether.\n";
	print $nmr, " reads had one or more matches\n";
	print $tooshort, " excluded for short alignments\n";
	print $tooweak, " excluded for weak matches\n";
	print $ambig, " excluded for ambiguous matches\n";
	print $unimap, " unique mappings remained\n";
}

elsif ($method eq "t") {
	# read in sam file output and build a hash, counting raw mappings
	open(IN, $infile);
	my %maph;
	while (<IN>) {
		if ($_ =~ /^@/) {next;}
		chomp;
		$rowi = $_;
		$rowi =~ s/^>//;
		@cols = split("\t", $rowi);
		$ncols = @cols;
		if ($cols[2] eq "*") {next;}
		$rawmap++;

		# -- extract alignment length (sum of the lengths of the M operations in the CIGAR string)
		$numstr = $cols[5];
		@chars = split("M", $numstr);
		$aligni = 0;
		foreach $c (@chars) {
			$c =~ s/.+\D//g;
			if ($c > 0) {$aligni += $c;}
		}
		# -- extract mismatches (NM:i tag) and apply alignment length and mismatch thresholds
		$mismatchi = 0;
		foreach $flag (@cols[11..$ncols-1]) {
			if ($flag =~ /NM\:i/) {
				$flag =~ s/NM\:i\://;
				$mismatchi = $flag;
			}
		}
		$matchi = $aligni - $mismatchi;
		if ($matchi < $mthd) {$tooweak++; next;}
		if ($aligni < $athd) {$tooshort++; next;}

		# -- apply the count level decision
		$refname = $cols[2];
		if ($countlevel eq "c") {$refname =~ s/_.+//g;}
		elsif ($countlevel eq "s") {$refname =~ s/_seq.+//g;}
		# -- add extracted data to hash
		$maph{$cols[0]}{$refname}{"count"}++;
		$maph{$cols[0]}{$refname}{"string"} = $_;
		$maph{$cols[0]}{$refname}{"align"} = $aligni;
		$maph{$cols[0]}{$refname}{"match"} = $matchi;
	}

	# count number of reads with one or more matches passing thresholds
	@mra = keys(%maph);
	$nmr = @mra;

	# select final set of unique mappings
	open(OUT, ">$outfile");
	open(IN, $infile);
	while (<IN>) {
		chomp;
		if ($_ =~ /^@/) {print OUT $_, "\n";}
		else {last;}
	}
	my %refch;
	foreach $r (@mra) {
		%rh = %{$maph{$r}};
		@ma = keys(%rh);
		$nma = @ma;
		$status = 0;
		if ($nma > 1) {
			@moa = sort {$rh{$b}{"match"} <=> $rh{$a}{"match"}} (keys(%rh));
			for ($m = 0; $m < @moa; $m++) {
				if ($status > 0) {next;}
				if ($rh{$moa[$m]}{"match"} == $rh{$moa[$m+1]}{"match"}) {
					if ($moa[$m] ne $moa[$m+1]) {
						$ambig++;
						$status++;
					}
				}
			}
		}
		if ($nma == 1) {@moa = @ma;}
		if ($status > 0) {next;}
		$unimap++;
		$refch{$moa[0]}++;
		print OUT $rh{$moa[0]}{"string"}, "\n";
	}
	if ($cprint eq 1) {
		open(CTS, ">counts.tab");
		@sref = sort {$refch{$b} <=> $refch{$a}} (keys(%refch));
		foreach $s (@sref) {
			print CTS $s, "\t", $refch{$s}, "\n";
		}
	}
	print $rawmap, " raw mappings altogether.\n";
	print $nmr, " reads had one or more matches\n";
	print $tooshort, " excluded for short alignments\n";
	print $tooweak, " excluded for weak matches\n";
	print $ambig, " excluded for ambiguous matches\n";
	print $unimap, " unique mappings remained\n";
}
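For readers checking the thresholds, the per-alignment logic shared by the three counting methods can be restated as small, testable subroutines. This is a minimal sketch and not part of the original script: the subroutine names `aligned_length`, `match_count`, `trinity_gene` and `best_unique` are hypothetical. As in the script, alignment length is taken as the sum of the CIGAR M operations and mismatches come from the NM:i tag; `best_unique` compares only the two best-scoring references, as methods "s" and "g" do.

```perl
use strict;
use warnings;

# Sum the lengths of all M (alignment match) operations in a CIGAR string,
# e.g. "10S50M2I20M" -> 70. Other operations (S, I, D, N, H, P, =, X) are ignored.
sub aligned_length {
    my ($cigar) = @_;
    my $len = 0;
    while ($cigar =~ /(\d+)M/g) {
        $len += $1;
    }
    return $len;
}

# Matching bases = aligned length minus the edit distance in the NM:i tag
# (taken as 0 when no NM tag is present among the optional fields).
sub match_count {
    my ($cigar, @optional) = @_;
    my $nm = 0;
    foreach my $tag (@optional) {
        $nm = $1 if $tag =~ /^NM:i:(\d+)$/;
    }
    return aligned_length($cigar) - $nm;
}

# Collapse a Trinity-style name such as comp0_c0_seq1 to the component
# level ("c" -> comp0) or the subcomponent level ("s" -> comp0_c0).
sub trinity_gene {
    my ($name, $level) = @_;
    if    ($level eq "c") { $name =~ s/_.+//; }
    elsif ($level eq "s") { $name =~ s/_seq.+//; }
    return $name;
}

# Given reference => matching bases for one read, return the best
# reference, or undef when the two best references are tied (ambiguous).
sub best_unique {
    my (%match) = @_;
    my @refs = sort { $match{$b} <=> $match{$a} } keys %match;
    return undef unless @refs;
    return undef if @refs > 1 && $match{$refs[0]} == $match{$refs[1]};
    return $refs[0];
}

# Example with a made-up SAM record (SEQ and QUAL omitted as "*"):
my @fields = split /\t/,
    "read1\t0\tcomp5_c0_seq1\t100\t255\t10S50M2I20M\t*\t0\t0\t*\t*\tAS:i:0\tNM:i:3";
my $aligned = aligned_length($fields[5]);
my $matches = match_count($fields[5], @fields[11 .. $#fields]);
print "$aligned aligned bases, $matches matching bases\n";   # prints "70 aligned bases, 67 matching bases"
print trinity_gene($fields[2], "c"), "\n";                  # prints "comp5"
```

Because the regular expression reads each length and operation pair directly, it should give the same totals as the script's split-on-M approach for well-formed CIGAR strings.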


Unix script Appendix U.9 “CombineExpression.pl”

#!/usr/bin/perl
# written by E Meyer, [email protected]
# distributed without any guarantees or restrictions

$scriptname = $0;
$scriptname =~ s/.+\///g;
if (!$ARGV[0] || $ARGV[0] eq "-h") {
	print "\nCombines expression data (counts) produced by mapping multiple\n";
	print "samples against a shared database.\n";
	print "Usage: $scriptname file1 file2 .. fileN >output.tab\n";
	print "\toutput.tab: \tName of output file (rows=genes, columns=samples)\n\n";
	exit;
}

my %bigh;

print "\t";
foreach $argi (0..$#ARGV) {
	print $ARGV[$argi], "\t";
	open(TAB, $ARGV[$argi]);
	while (<TAB>) {
		chomp;
		@cols = split("\t", $_);
		$bigh{$cols[0]}{$argi} = $cols[1];
	}
}
print "\n";

foreach $r (sort(keys(%bigh))) {
	print $r, "\t";
	foreach $argi (0..$#ARGV) {
		if (exists($bigh{$r}{$argi})) {print $bigh{$r}{$argi}, "\t";}
		else {print 0, "\t";}
	}
	print "\n";
}
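CombineExpression.pl performs what is essentially an outer join of the per-sample count tables on their first column, filling missing entries with zero. The sketch below restates that step so it can be checked without input files; the subroutine name `combine_counts` is hypothetical, and each sample is supplied as an in-memory hash rather than read from a file.

```perl
use strict;
use warnings;

# Merge per-sample count tables (gene => count) into one gene-by-sample
# table. Genes absent from a sample are reported as 0, as in the script.
sub combine_counts {
    my @samples = @_;                      # list of hash references, one per sample
    my %genes;
    foreach my $table (@samples) {
        $genes{$_} = 1 foreach keys %$table;
    }
    my @rows;
    foreach my $gene (sort keys %genes) {
        my @counts = map { exists $_->{$gene} ? $_->{$gene} : 0 } @samples;
        push @rows, [$gene, @counts];
    }
    return @rows;
}

# Example with made-up counts for two samples:
my @rows = combine_counts({ geneA => 5, geneB => 2 }, { geneB => 7, geneC => 1 });
print join("\t", @$_), "\n" foreach @rows;
```

As in the script, rows are sorted alphabetically by gene name and columns follow the order in which the samples are given.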


Unix script Appendix U.10 “TagTrimmer.pl”

#!/usr/bin/env perl
# written by E Meyer, [email protected]
# distributed without any guarantees or restrictions

#-- check arguments and print usage statement
$scriptname = $0;
$scriptname =~ s/.+\///g;
$usage = <<USAGE;

This version (modified by E. Bentz, 10-26-2018) does not discard reads lacking the GGG motif. Reads lacking the GGG motif may instead be trimmed to a defined range (by default, no bases are removed). For the original version, run TagTrimmer.pl.

Usage: $scriptname -i input -o output
Required arguments:
	-i input	raw input reads in FASTQ format
	-o output	name for output file of HQ reads in FASTQ format
Options:
	-b beginning	defines the beginning of the range in which to search for the GGG motif (default = 1)
	-e end		defines the end of the range in which to search for the GGG motif (default = 8)
	-f first	defines the first base to keep from reads lacking the GGG motif (default = 1)
	-l last		defines the last base to keep from reads lacking the GGG motif (default = last base)

USAGE

# -- module and executable dependencies
$mod1 = "Getopt::Std";
unless (eval("require $mod1")) {print "$mod1 not found. Exiting\n"; exit;}
use Getopt::Std;

#get variables from input
getopts('i:o:b:e:f:l:h');
if (!$opt_i || !$opt_o || $opt_h) {print "\n", "-"x60, "\n", $scriptname, "\n", $usage, "-"x60, "\n\n"; exit;}
my $seqfile = $opt_i;	# input.fastq
my $outfile = $opt_o;	# output.fastq
my $rstart = 1; if ($opt_b) {$rstart = $opt_b;}	# beginning of region to search (-b)
my $rend = 8; if ($opt_e) {$rend = $opt_e;}	# end of region to search
my $first = 1; if ($opt_f) {$first = $opt_f;}	# first base to keep from reads lacking GGG motif
my $last = 1000; if ($opt_l) {$last = $opt_l;}	# last base to keep from reads lacking GGG motif

##### loop through fastq file and record presence and position of motif in each read

print "\n";
print "Processing $seqfile with $scriptname.\n";
print "Reading $seqfile and searching for GGG motif in the region $rstart - $rend...\n";
system("date");
open (IN, $seqfile);
my $line = 0;
while (<IN>) {
	chomp;
	$line++;
	if ($line eq 1) {
		$sid = $_;
		$incount++;
		next;
	}
	elsif ($line eq 2) {
		$ssi = $_;
		$regseq = substr($ssi, $rstart-1, ($rend-$rstart+1));
		$found = 0;
		for ($a = $rend-3; $a >= $rstart; $a--) {
			$subseq = substr($regseq, $a, 3);
			if ($subseq eq "GGG") {
				$cuti = $a+3+1;
				$found++;
				last;
			}
		}
		if ($found eq 0) {
			$nomotif++;
			next;
		}
		$motif++;
		$sh{$sid} = $cuti;
		# print $sid,"\n";
		# print $regseq,"\n";
		# print $cuti,"\n";
		# print $ssi,"\n";
		# print substr($ssi, $cuti-1, (length($ssi)-$cuti)),"\n";
		next;
	}
	elsif ($line eq 4) {$line = 0;}
}
close(IN);
print "Done.\n";
system("date");

##### read in fastq file again and print trimmed reads to output:

print "Printing trimmed reads to $outfile ...\n";
open (IN, $seqfile);
open (OUT, ">$outfile");
$line = 0;
while (<IN>) {
	chomp;
	$line++;
	if ($line eq 1) {
		$sid = $_;
	}

	# if "GGG" motif was found, trim it from the sequence:
	if (exists($sh{$sid})) {
		if ($line eq 1 || $line eq 3) {
			print OUT $_, "\n";
		}
		if ($line eq 2 || $line eq 4) {
			$ssi = $_;
			$leni = length($ssi);
			$subseq = substr($ssi, ($sh{$sid}-1), ($leni-$sh{$sid}+1));	#<-added "+1" to keep the last base in $ssi
			print OUT $subseq, "\n";
		}
	}
	# if no "GGG" motif was found, keep only bases $first..$last (by default, the whole read):
	else {
		if ($line eq 1 || $line eq 3) {
			print OUT $_, "\n";
		}
		if ($line eq 2 || $line eq 4) {
			print OUT substr($_, $first-1, $last-$first+1), "\n";
		}
	}
	if ($line eq 1) {$outcount++;}
	if ($line eq 4) {$line = 0;}
}
close(IN);
close(OUT);
print "Done.\n";
system("date");

print $incount, " sequences in $seqfile.\n";
print $motif, " of these had a GGG motif in the region $rstart - $rend.\n";
$pred = int($nomotif/$incount*1000+0.5)/10;
print $nomotif, " of these lacked this motif in this region ($pred %).\n";
print $outcount, " sequences written to $outfile.\n";
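The two-pass trimming logic can be restated for a single FASTQ record. This is a simplified sketch rather than a transcription of the script: the subroutine name `trim_record` is hypothetical, and the window is interpreted as 1-based positions $begin..$end with the motif required to lie entirely inside it. Because the script's loop starts from offset 1 of the extracted window and computes the cut position relative to the window rather than the read, results may differ from the script when a GGG starts at the first window position or when -b is greater than 1.

```perl
use strict;
use warnings;

# Simplified sketch of the 5' GGG trimming rule applied to one FASTQ record.
# $begin, $end:  1-based window searched for the motif (script defaults 1 and 8)
# $first, $last: 1-based range kept when no motif is found (defaults 1 and 1000)
# Returns (trimmed sequence, trimmed quality string, 1 if a motif was found else 0).
sub trim_record {
    my ($seq, $qual, $begin, $end, $first, $last) = @_;
    # search right to left for a GGG lying entirely inside the window
    for (my $start = $end - 2; $start >= $begin; $start--) {
        next if $start + 2 > length($seq);          # motif would run past the read
        if (substr($seq, $start - 1, 3) eq "GGG") {
            my $keep_from = $start + 2;              # 0-based offset just after the motif
            return (substr($seq, $keep_from), substr($qual, $keep_from), 1);
        }
    }
    # no motif: keep bases $first..$last, clipped to the read length
    $last = length($seq) if $last > length($seq);
    my $len = $last - $first + 1;
    return (substr($seq, $first - 1, $len), substr($qual, $first - 1, $len), 0);
}

# Example with a made-up read: the GGG at positions 3-5 and everything before it is removed.
my ($seq, $qual, $found) = trim_record("ACGGGTTTAC", "IIIIIHHHHH", 1, 8, 1, 1000);
print "$seq\t$qual\t$found\n";    # prints TTTAC, HHHHH and 1, tab-separated
```

Trimming the quality string with the same offsets keeps each record's sequence and quality lines the same length, which is what the script achieves by applying the same substr to lines 2 and 4.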


Unix script Appendix U.11 “HRFilterFastq.pl”

Filters a FASTQ file to remove sequences containing homopolymer repeats (HR) longer than the specified threshold.
Usage: HRFilterFastq.pl sequences crit_length output
Arguments:
	sequences	file of short reads to be filtered, fastq format
	crit_length	reads containing HRs longer than this will be excluded
	output		a name for the output file (fastq format)

#!/usr/bin/perl
# written by E Meyer, [email protected]
# distributed without any guarantees or restrictions

# -- program description and required arguments
$scriptname = $0;
$scriptname =~ s/.+\///g;
if ($#ARGV != 2 || $ARGV[0] eq "-h") {
	print "\nFilters a FASTQ file to remove sequences containing homopolymer\n";
	print "repeats (HR) longer than the specified threshold\n";
	print "Usage:\t $scriptname sequences crit_length output\n";
	print "Arguments:\n";
	print "\t sequences\t file of short reads to be filtered, fastq format\n";
	print "\t crit_length\t reads containing HRs longer than this will be excluded\n";
	print "\t output\t\t a name for the output file (fastq format)\n";
	print "\n";
	exit;
}

my $seqfile = $ARGV[0];	# raw reads, fastq format
my $critlen = $ARGV[1];	# critical length
my $outfile = $ARGV[2];	# name for output file, fastq format
my @vals = qw{A C G T N};
my @stra = ();
foreach $v (@vals) {
	$ssi = $v x $critlen;
	push (@stra, $ssi);
}

# loop through fastq file and print out only those passing filter
open (IN, $seqfile);
open (OUT, ">$outfile");
my $switch = 0;
while (<IN>) {
	chomp;
	$count++;
	if ($count == 1) {$ss = substr($_, 0, 8);}
	if ($_ =~ /^$ss/) {
		$thisname = $_;
		$nextup = 1;
		$switch = 0;
	}
	else {
		if ($nextup == 1) {
			$switch = 1;
			foreach $s (@stra) {
				if ($_ =~ /$s/) {$switch = 0;}
			}
			$nextup = 0;
			if ($switch == 0) {$fail++;}
			elsif ($switch == 1) {
				print OUT $thisname, "\n";
				$pass++;
			}
		}
		if ($switch > 0) {print OUT $_, "\n";}
	}
}
close(IN);
print $fail, " reads failed\n";
print $pass, " reads passed\n";
system("date");
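The script recognises header lines by the first eight characters of the first read name ($ss), so it assumes every header in the file shares that prefix. One alternative, sketched below and not taken from the original script, reads each record as four consecutive lines and tests for a homopolymer with a back-referencing regular expression. The subroutine names `has_homopolymer` and `filter_fastq` are hypothetical, and the sketch assumes unwrapped four-line FASTQ records. Both this sketch and the script exclude reads containing a run of exactly crit_length identical bases, so "longer than" in the usage text is best read as "at least".

```perl
use strict;
use warnings;

# True (1) if $seq contains a run of at least $critlen identical bases
# (A, C, G, T or N); this is the condition the script tests by matching
# "A" x crit_length, "C" x crit_length, and so on.
sub has_homopolymer {
    my ($seq, $critlen) = @_;
    my $n = $critlen - 1;                       # repeats needed after the first base
    return $seq =~ /([ACGTN])\1{$n}/ ? 1 : 0;
}

# Read FASTQ records four lines at a time from $in, write records without a
# long homopolymer to $out, and return (number passed, number failed).
sub filter_fastq {
    my ($in, $out, $critlen) = @_;
    my ($pass, $fail) = (0, 0);
    while (defined(my $header = <$in>)) {
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        last unless defined $qual;              # stop at a truncated final record
        chomp(my $bases = $seq);
        if (has_homopolymer($bases, $critlen)) { $fail++; next; }
        print {$out} $header, $seq, $plus, $qual;
        $pass++;
    }
    return ($pass, $fail);
}

# Example using in-memory filehandles and two made-up reads:
my $fastq = "\@r1\nACGTTTTTTA\n+\nIIIIIIIIII\n\@r2\nACGTACGTAC\n+\nIIIIIIIIII\n";
my $kept  = "";
open(my $in,  "<", \$fastq) or die "cannot read input: $!";
open(my $out, ">", \$kept)  or die "cannot write output: $!";
my ($pass, $fail) = filter_fastq($in, $out, 5);
close($in);
close($out);
print "$pass reads passed, $fail reads failed\n";   # r1 (six T's) fails, r2 passes
```

Reading fixed four-line records avoids relying on a shared header prefix, at the cost of not handling FASTQ files whose sequences are wrapped over several lines.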