© COPYRIGHT

by

Deborah Jane Weinstein

2019

ALL RIGHTS RESERVED

GENOME-WIDE ANALYSIS OF EXPRESSION IN

HALICEPHALOBUS MEPHISTO

(THE DEVIL WORM)

BY

Deborah Jane Weinstein

ABSTRACT

The nematode Halicephalobus mephisto was discovered in an isolated aquifer 1.3 km below ground. H. mephisto thrives under extreme conditions, including elevated heat (37.2°C) and minimal oxygen, classifying it as an extremophile. H. mephisto is an important discovery for the study of evolution and adaptation, with particular interest in its thermophilic abilities. Here we report the full transcriptome and genome of H. mephisto. In the process we identified a unique adaptation: over-amplification of AIG1 and Hsp70, with 168 and 142 domains, respectively. Hsp70 was over-expressed under elevated heat conditions, along with ARMET and Bax inhibitor-1, suggesting these genes help H. mephisto survive elevated heat. AIG1 was not upregulated in elevated heat, suggesting its use for non-heat abiotic stressors such as hypoxia. This paper sheds light on the genomic adaptations that have evolved in H. mephisto to survive its challenging environment.


TABLE OF CONTENTS

ABSTRACT ...... ii

LIST OF TABLES ...... iv

LIST OF ILLUSTRATIONS ...... v

LIST OF ABBREVIATIONS ...... vi

INTRODUCTION ...... 1

OVERVIEW OF METHODS UTILIZING GENOMIC PROGRAMS ...... 6

INPUT DATA ...... 8

MAKER 2 ...... 10

DOMAIN ANALYSIS ...... 11

GENE EXPRESSION ANALYSIS BY TOPHAT2-STRINGTIE-BALLGOWN ...... 13

GENE ONTOLOGY AND MANUAL INSPECTION OF TOP 20 UP- AND DOWN-REGULATED ...... 17

TREE BUILDING ...... 19

CROSS-SPECIES HSP70 COMPARISON AND VENN DIAGRAM ...... 22

CONCLUSION ...... 24

APPENDIX ...... 30

REFERENCES ...... 35


LIST OF TABLES

Table 1: Preliminary RNA samples from H. mephisto ...... 8

Table 2: Transcripts Generated by Trinity ...... 9

Table 3: Final Genome (Omega) Assembly ...... 9

Table 4: Gene Predictions from Maker2 ...... 10

Table 5: GO Terms for H. mephisto ...... 18


LIST OF ILLUSTRATIONS

Figure 1: Phylogenetic Tree of Nematode Phylum Containing P. redivivus ...... 2

Figure 2: Phylogenetic Tree of H. mephisto ...... 3

Figure 3: The Computational Programs Utilized During this Research Proposal...... 6

Figure 4: Domain Analysis...... 12

Figure 5: Transcriptome Analysis of Gene Expression in H. mephisto...... 15

Figure 6: RNA Expression Data Analysis of H. mephisto...... 16

Figure 7: Bayesian Phylogenetic Tree of Hsp70...... 20

Figure 8: RAxML Phylogenetic Tree of AIG1...... 21

Figure 9: Graph Displaying the Number of Hsp70 Genes Across Nematode Species...... 22

Figure 10: Venn Diagram Comparing Orthologous Gene Clusters between H. mephisto, P. redivivus, C. elegans and D. melanogaster...... 23


LIST OF ABBREVIATIONS

H. mephisto Halicephalobus mephisto

C. elegans Caenorhabditis elegans

P. redivivus Panagrellus redivivus

HGT Horizontal gene transfer

HSP Heat-shock protein

ARMET Arginine-rich, mutated in early-stage tumors

BI-1 Bax Inhibitor-1

S. maltophilia Stenotrophomonas maltophilia

ER Endoplasmic reticulum

UPR Unfolded protein response

PERK PKR-like ER kinase

RIDD Regulated IRE1α-Dependent Decay

AIG1 avrRpt2-induced gene 1

NGM Nematode Growth Media

BAC Bacterial artificial chromosome


INTRODUCTION

Extremophiles are organisms that inhabit physically or geochemically extreme environments and are of significant interest to science (Rothschild 2001). These organisms extend the boundaries of survival for parameters such as temperature, pH, salinity, water and pressure. Most belong to Bacteria and Archaea (Rothschild 2001), yet they extend into Eukaryota as well. Currently, eukaryotic extremophiles have been the focus of fewer research studies than extremophiles from the other two domains of the three-domain system (Rothschild 2001).

One newly discovered nematode is Halicephalobus mephisto (the Devil Worm), which was discovered in 2011 in the Beatrix Gold Mine in South Africa, 1.3 km below the surface. H. mephisto, of Phylum Nematoda, is the first multicellular organism discovered at such sub-terrestrial depths. The environment it inhabits poses multiple survival challenges, leading to its classification as an extremophilic nematode (Borgonie et al., 2011). The environmental temperature was recorded at 37.2°C, the highest temperature recorded for a free-living nematode, and the dissolved oxygen was recorded at 13-72 μM (0.42-2.3 mg/L); for comparison, the minimum level required for fish to survive is 4-6 mg/L (Utah State University, 2018). Put another way, H. mephisto can survive with ~10-40 fold less oxygen than standard surface water levels (Borgonie et al., 2011).

The combined effect of low oxygen levels in the water and an elevated temperature creates extremely harsh survival conditions for H. mephisto. The water containing H. mephisto was carbon-dated to be 3,000-12,000 years old and contained minimal tritium (Borgonie et al., 2011). This leads to the conclusion that the water was not mixing with any surface water and is an isolated environment. Therefore, the evolution of H. mephisto occurred within an ‘underground Galapagos’ akin to a subsurface island. The existence of H. mephisto demonstrates that not only bacteria or fungi but also multicellular animals can live in such harsh, oxygen-deficient conditions, thereby making its genome highly interesting to science.

Due to their adaptive ability, nematodes, also known as roundworms, are a highly diverse phylum and one of the most abundant, with organisms in nearly every type of habitat (Corsi et al., 2015).

Figure 1: Phylogenetic Tree of Nematode Phylum Containing P. redivivus. A 99-protein tree including C. elegans and P. redivivus, and their position relative to H. mephisto. Image credit: Maggie Lau from Princeton

The genomes of over one hundred nematodes have been sequenced, including Trichinella spiralis (Mitreva and Jasmer, 2008), Caenorhabditis briggsae and Caenorhabditis elegans (Martin et al., 2014). C. elegans was the first multicellular animal ever sequenced and is the most widely used model for nematodes (Corsi et al., 2015). However, C. elegans is not the closest relative of H. mephisto, since it belongs to the family Rhabditidae (Corsi et al., 2015) while H. mephisto belongs to the family Panagrolaimidae (Borgonie et al., 2011). A closer relative of H. mephisto was recently sequenced, Panagrellus redivivus, also known as “the microworm”, providing a more closely related species for evolutionary comparison (Srinivasan et al., 2013). This is displayed in Figure 1, which shows the placement of P. redivivus within the nematode phylum (Srinivasan et al., 2013). Figure 2 shows the current phylogeny of H. mephisto based on rDNA data from Bayesian inference (Borgonie et al., 2011).

Figure 2: Phylogenetic Tree of H. mephisto Recent phylogenetic tree is based on small-subunit rDNA data using Bayesian inference (50% majority rule). Image credit: G Borgonie et al. Nature. 474, 79-82 (2011) doi:10.1038/nature09974

With the availability of sequencing technology in this growing field, there is now the ability to sequence various classes of organisms. Micro-aerobes, like H. mephisto, have various medical and agricultural applications and could lead to possible solutions for environmental damage, especially pollution (Urbieta et al., 2015). Nematodes are considered one of the most successful groups of organisms in terms of survival, as they are found in nearly every habitat and occupy various trophic levels as well as biological niches (Peters 2013). They are increasingly being utilized as bioindicators for ecological and toxicity purposes because their thin and absorbent skin layer renders them highly sensitive to chemical alterations in their environment. This makes them excellent indicators of chemical pollution in water and soil (Peters 2013). This motivates efforts to sequence their entire genomes, particularly for species living in unusual environments (Urbieta et al., 2015).

Since the discovery of H. mephisto in 2011, the Bracht lab has been involved in sequencing its genome, and much progress has been made to date. Illumina sequencing was utilized to determine the total genome size of 61.4 Mb. Running the genome through Maker2, a protein prediction software, yielded predicted proteins, albeit without any RNA data (S. Allen unpublished data). The results illustrated that H. mephisto does share proteins with other nematodes; however, 46% of the proteins possessed by H. mephisto have no known homologs in other nematodes outside the genus Halicephalobus. This high level of novel proteins is consistent with other sequenced nematodes and the evolutionary plasticity of nematode genomes (Rödelsperger et al., 2013), but also raises questions of how H. mephisto evolved. Two genes, AIG1 and Hsp70, were found to be present in unusually high copy number compared to other genes. AIG1 is a GTP-binding protein that functions in the stress responses of both plants and animals, where it has a role in initiating and regulating the immune response (Wang and Li, 2009). Interestingly, it has not been reported in any other nematode and, when discovered, was reported not to be present in C. elegans (Nitta et al., 2006).

The other over-amplified gene, Hsp70, is known to exist in other nematodes, but never at such high copy number (over 100 copies), which makes it of particular interest (S. Allen unpublished data). Heat exposure causes the accumulation of misfolded or denatured proteins, which alters the proteins’ function or renders them inactive; Hsp70 is a chaperone protein that “fixes” these misfolded or denatured proteins by re-folding them (Murphy 2013). Hsp70 chaperones are stress-induced, meaning they are activated in response to stresses, including elevated heat. This family is known to maintain protein homeostasis and protect proteins from degradation (Murphy 2013). Some previously identified Hsp70 sequences in H. mephisto were more similar to prokaryotic Hsp70 proteins than to eukaryotic versions (S. Allen unpublished data). However, Hsp70 is not restricted to prokaryotes; in fact, most organisms have at least one copy (Murphy 2013). Therefore, it is possible that H. mephisto acquired Hsp70 from prokaryotes and then copied the gene repeatedly as a way to survive in its environment.

However, gene presence is not enough. Gene presence identifies the genetic make-up of an organism, but gene expression illustrates how these genes are employed to give that organism a survival advantage. This is the goal of my thesis: to determine gene expression for H. mephisto. RNA-seq data had been acquired prior to my joining the Bracht lab, and my thesis work is to process this data and produce meaningful biological insight. This was achieved by first annotating the genome with RNA data to more accurately determine the specific genes comprising the genome. Once the genes were more confidently identified, their expression was explored under normal and warm conditions, with the prediction that heat-adapted genes would be expressed at statistically higher levels under the warmer conditions of 38°C and 40°C relative to the 25°C samples.


OVERVIEW OF METHODS UTILIZING GENOMIC PROGRAMS

The process of annotating the H. mephisto genome was completed through computational genomic methods, which utilize multiple software programs in a specific order. The output from one program acted as the input for the next, allowing each step to build upon the previous one to achieve the end result. To begin the RNA-informed annotation process, the preliminary RNA sequence data had to be assembled to generate RNA transcripts (the transcriptome). The transcriptome generated by Trinity (Table 2), along with the genome assembly (mephisto_omega.fasta, generated prior to my joining the Bracht lab), acted as input to Maker2, a gene annotation software. The output of Maker2 consists of a set of comprehensive gene predictions (Yandell et al., 2012). From these predicted proteins, domains were identified, counted, and compared to C. elegans, the best-annotated (and free-living) nematode. Finally, we measured their expression levels with the TopHat2-StringTie-Ballgown suite and wrote the second Halicephalobus mephisto manuscript (after the one reporting its discovery). This paper is currently under revision at Nature Communications. The computational steps are shown schematically in Figure 3.

Figure 3: The Computational Programs Utilized During this Research Proposal. The order is shown with the arrows pointing toward the next program.


INPUT DATA

The raw input data was RNA-seq data for H. mephisto. RNA was extracted from H. mephisto by Sarah Allen and sent to an off-site laboratory, the Mount Sinai Genomics Core in New York, for sequencing. In order to mimic the temperature environment where the worms were found, the worms had been cultured at 25°C, 38°C and 40°C. The sequencing was paired-end, so a forward (R1) and reverse (R2) read were obtained for each RNA fragment (24 files in total), and quality analysis confirmed that the paired files contained identical numbers of reads. The number of read pairs determined for each RNA sample is shown in Table 1, along with the temperature at which the H. mephisto was cultured. The read length was 50 bp for all RNA samples.

Table 1: Preliminary RNA samples from H. mephisto. All RNA samples that were sequenced, with their respective read pairs and culture temperatures.

RNA Sample    Read Pairs (total #)    Temperature (°C)
10            85,558,677              40
11            81,322,974              40
12            75,062,353              40
14            81,153,241              40
M1R           87,659,136              38
M2R           76,497,401              38
M3R           80,308,441              25
M4R           82,353,707              25
M5R           84,313,869              25
M6R           86,409,522              38
M7R           72,836,439              40
M8R           75,232,037              40

This data was needed for two purposes: 1) new (better) gene predictions using Maker2, and 2) gene expression analysis using TopHat2, StringTie, and Ballgown. For input to Maker2, RNA samples were assembled because Maker2 cannot take raw reads as input. We decided against assembling all 12 read sets, instead choosing one read set from each culture temperature (25°C, 38°C, and 40°C) to ensure that each was represented in the final transcript data. Trinity, a transcriptome assembler (Yandell et al., 2012), was utilized to generate transcripts from read sets 10 (40°C), M4R (25°C) and M6R (38°C). Trinity takes the short 50 bp reads of the RNA sequence data and overlaps them to build transcripts (Haas et al., 2013). The combined transcripts assembled by Trinity constitute the RNA transcriptome, since a transcriptome is the sum of all the RNA data (Haas et al., 2013). The number of transcripts generated by Trinity for each sample is shown in Table 2. Interestingly, we obtained more transcripts at higher temperatures, and by far the greatest number of transcripts from the 40°C sample, consistent with the idea that many genes are turned on as an adaptive response to heat.

Table 2: Transcripts Generated by Trinity. Transcript numbers generated from the chosen RNA samples.

RNA Sample    Read Pairs (total #)    Temperature (°C)    Number of Transcripts
10            85,558,677              40                  48,483
M4R           82,353,707              25                  35,468
M6R           86,409,522              38                  36,726
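As an illustration of the assembly step, the following minimal sketch builds one Trinity command line per chosen read set. The FASTQ file names, CPU count, memory limit and output directory names are hypothetical placeholders, since the exact options used for the thesis assemblies are not recorded here.

```python
# Sketch only: file names and resource settings below are assumptions,
# not the values used in the thesis.
READ_SETS = {"10": 40, "M4R": 25, "M6R": 38}  # sample -> culture temperature (C)


def trinity_command(sample, cpu=8, max_memory="50G"):
    """Build a Trinity command line for one paired-end read set."""
    return [
        "Trinity",
        "--seqType", "fq",                   # input reads are FASTQ
        "--left", f"{sample}_R1.fastq",      # forward reads (hypothetical name)
        "--right", f"{sample}_R2.fastq",     # reverse reads (hypothetical name)
        "--CPU", str(cpu),
        "--max_memory", max_memory,
        "--output", f"trinity_{sample}",     # one output directory per sample
    ]


if __name__ == "__main__":
    for sample, temperature in READ_SETS.items():
        print(f"# sample {sample}, cultured at {temperature} C")
        print(" ".join(trinity_command(sample)))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without Trinity installed; each list could be passed to subprocess.run on a system where it is available.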

The final genome assembly (mephisto_omega.fasta) was created by Dr. Bracht from a combination of Illumina and PacBio sequencing. This genome assembly is summarized in Table 3 and was provided to Maker2 for gene annotation. Because it was created prior to my project with RNA-seq data, it is not described further here.

Table 3: Final Genome (Omega) Assembly. Summary statistics for the final genome assembly.

Total Assembly (bp)    Longest Contig (bp)    Total Scaffolds    N50 (Kbp)
61,428,783             2,546,343              880                313
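For readers unfamiliar with the N50 statistic reported in Table 3, the sketch below shows how it is conventionally computed from a list of scaffold lengths. The lengths in the example are made up for illustration and are not H. mephisto data; the function is not the tool that produced the table.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence downward until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical values).
toy_lengths = [10, 8, 6, 4, 2]
print(n50(toy_lengths))
```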


MAKER 2

Maker2 was utilized to make comprehensive gene predictions for H. mephisto. It begins with repeat identification, followed by alignment of the RNA-seq transcripts and proteins to the genome to make evidence-supported gene predictions (Yandell et al., 2012). Two types of gene predictions are made: non-evidence-driven (ab initio) and evidence-driven. Non-evidence-driven predictions use Augustus and SNAP, algorithms that take only the sequence characteristics into account, meaning that no external evidence (RNA or protein) supports these predictions. Maker2 also incorporates expressed sequence tags (ESTs) (in our case, 3 assembled H. mephisto RNA-seq transcriptomes combined) and protein-based evidence hints (in our case, 28 sequenced nematode protein sets were used) (Yandell et al., 2012). The combined transcriptomes from Table 2 were fed into Maker2 in order to generate the gene predictions shown in Table 4. Gene predictions determine the gene locations and in turn the proteins’ identities. Both overlapping (evidence-driven) and non-overlapping (non-evidence-driven) genes were generated. Since genes code for both RNA and proteins, predicted transcripts and proteins are output by Maker2. Therefore, in Table 4, protein and gene can be used interchangeably. The non-overlapping proteins are the low-confidence genes, and the overlapping proteins are the high-confidence genes (with some form of supporting evidence). Both non-overlapping and overlapping proteins were utilized for domain counting to ensure no genes were accidentally overlooked in the domain counts.

Table 4: Gene Predictions from Maker2. Predicted gene numbers for the sequence sets 10, M4R and M6R are given.

Overlapping Proteins    Non-overlapping Proteins    Total Proteins
11,587                  5,197                       16,784


DOMAIN ANALYSIS

The Hidden Markov Model-based algorithm HMMER identifies protein domains (Eddy 2011). The E-value is used to determine statistically significant alignments and reflects the likelihood of a match occurring by chance (Ochoa et al., 2015). Chance matches should not be included in the domain counts, since doing so would yield inaccurate results. A maximum E-value of 10^-4 (one chance match in 10,000) was set in advance as the threshold. Domain analysis indicates the enrichment or reduction of certain domains in an organism. In this study, the domain counts of H. mephisto were compared to those of C. elegans using a Python script to determine domain-specific gene presence or absence in H. mephisto. Both overlapping and non-overlapping proteins (genes) from Table 4 were used for the domain analysis to generate a combined scatter plot (Figure 4). The majority of domains are of reduced abundance in H. mephisto relative to C. elegans, suggesting selection or gene loss in H. mephisto. Others fell on the 1:1 line. The two genes previously found to be present in high copy number in H. mephisto, AIG1 and Hsp70 (S. Allen unpublished data), were confirmed in Figure 4 to again be over-represented. 168 AIG1 domains were found in H. mephisto, but none in C. elegans. 142 Hsp70 domains were found in H. mephisto, compared with only 20 in C. elegans.
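The domain-counting logic described above can be sketched as follows. This is not the script used in the thesis: it assumes the HMMER hits have already been reduced to hypothetical (domain name, E-value) pairs, and the example hits are invented for illustration.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-4  # threshold stated in the text


def count_domains(hits, cutoff=E_VALUE_CUTOFF):
    """Count domain occurrences among hits whose E-value passes the cutoff.

    `hits` is an iterable of (domain_name, e_value) pairs.
    """
    return Counter(name for name, e_value in hits if e_value <= cutoff)


def compare_counts(counts_a, counts_b):
    """Return {domain: (count_a, count_b)} for every domain seen in either species."""
    domains = sorted(set(counts_a) | set(counts_b))
    return {d: (counts_a.get(d, 0), counts_b.get(d, 0)) for d in domains}


# Invented example hits; the Kinase hit at E = 0.3 fails the cutoff.
mephisto_hits = [("Hsp70", 1e-30), ("Hsp70", 2e-12), ("AIG1", 5e-9), ("Kinase", 0.3)]
elegans_hits = [("Hsp70", 1e-40), ("Kinase", 1e-20)]

table = compare_counts(count_domains(mephisto_hits), count_domains(elegans_hits))
for domain, (n_mephisto, n_elegans) in table.items():
    print(domain, n_mephisto, n_elegans)
```

Each (H. mephisto count, C. elegans count) pair corresponds to one point on a scatter plot such as Figure 4; a domain absent from one species gets a count of zero rather than being dropped.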


[Figure 4 image: log-scale scatter plot of C. elegans protein domains (y-axis) against H. mephisto protein domains (x-axis), with the Hsp70 and AIG1 points labeled.]

Figure 4: Protein Domain Analysis. The domains are represented by numerical values, and the same known protein domains were searched for in both organisms. (The 1:1 line displays equal copies of the protein, and therefore gene, in both organisms. If a point is above the 1:1 line then it is over-represented in C. elegans, while points below the 1:1 line are over-represented in H. mephisto. The two domains over-represented in H. mephisto are colored red and labeled as Hsp70 and AIG1.)


GENE EXPRESSION ANALYSIS BY TOPHAT2-STRINGTIE-BALLGOWN

Once genes were predicted, we wanted to know their relative expression. To do that, we used the TopHat2-StringTie-Ballgown suite of programs. TopHat2 aligns the RNA-seq data against a specified genome, with the Maker2 genes supplied as an input GFF3 file (Trapnell et al., 2012). Therefore, TopHat2 was given the genome of H. mephisto, both the forward (R1) and reverse (R2) reads for each RNA sample from Table 1, and the annotation from Maker2. TopHat2 requires that each RNA sample be run separately, using both the forward (R1) and reverse (R2) reads. TopHat2 aligns RNA-seq data to the genome in order to determine splice sites in the genome (Trapnell et al., 2012). StringTie takes the alignment files from TopHat2 and assembles them to generate a transcriptome annotation, and also quantifies expression levels by estimating the abundance of each transcript (Pertea et al., 2016). Ballgown plots the gene abundance and expression data from the StringTie output for visualization in R (Pertea et al., 2016). These programs provided the expression data.
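The per-sample alignment and quantification steps can be sketched as command lines built in Python. The index name, annotation file name, read file names, thread count and output paths are all hypothetical, and the StringTie options shown (-e to quantify only annotated transcripts, -B to write Ballgown input tables) are an assumption about a typical configuration, not a record of the options used in the thesis.

```python
SAMPLES = ["10", "11", "12", "14", "M1R", "M2R", "M3R", "M4R",
           "M5R", "M6R", "M7R", "M8R"]  # sample names from Table 1

ANNOTATION = "maker2_genes.gff3"   # hypothetical file name for the Maker2 genes
GENOME_INDEX = "mephisto_index"    # hypothetical Bowtie2 index basename


def tophat2_command(sample, threads=8):
    """Align one paired-end sample to the genome, guided by the annotation."""
    return ["tophat2", "-p", str(threads), "-G", ANNOTATION,
            "-o", f"tophat_{sample}", GENOME_INDEX,
            f"{sample}_R1.fastq", f"{sample}_R2.fastq"]


def stringtie_command(sample, threads=8):
    """Quantify transcripts from the TopHat2 alignments of one sample."""
    return ["stringtie", f"tophat_{sample}/accepted_hits.bam",
            "-p", str(threads), "-G", ANNOTATION,
            "-e", "-B",  # assumed options: annotated transcripts only, Ballgown tables
            "-o", f"ballgown/{sample}/{sample}.gtf"]


if __name__ == "__main__":
    # Each sample is processed separately, as the text requires.
    for sample in SAMPLES:
        print(" ".join(tophat2_command(sample)))
        print(" ".join(stringtie_command(sample)))
```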

Transcriptome analysis in H. mephisto was compiled into a boxplot (Figure 5). The boxplot illustrates the comparative expression levels of Hsp70 and AIG1 under heat (38 and 40°C) and normal (25°C) temperatures. Each box summarizes all of the transcripts found, rather than a single transcript, at the given temperatures. A change in expression means the gene is expressed more at one temperature relative to another. Hsp70 genes had greater expression at 38-40°C compared to 25°C, while AIG1 expression was unchanged between temperatures. This is consistent with Hsp70's known heat stress-related function (Murphy 2013). We also conclude that Hsp70 appears to contribute to H. mephisto's thermotolerance, while AIG1 may be utilized for a different abiotic stress. Overall, Hsp70 and AIG1 had lower expression on a per-gene basis than the average for the other genes in the genome at both temperatures.

RNA expression data of H. mephisto was compiled into a volcano plot (Figure 6), which contains both statistically significant and insignificant genes. The volcano plot contains all genes of the complete H. mephisto genome that could be translated, totaling 28,068 genes. The volcano plot comprises four regions: statistically significant up-regulated and down-regulated genes, and statistically insignificant up-regulated and down-regulated genes. To be considered significant, a gene required a q-value of less than 0.05 and up- or down-regulation of at least 2-fold. The fold change reflects up- or down-regulation in heat conditions (38-40°C) relative to normal (25°C) conditions. Our analysis yielded 960 statistically significant genes, 675 down-regulated and 285 up-regulated. ARMET and Hsp70 were upregulated, with ARMET having the second-highest fold-change of all, a 44-fold upregulation under warm conditions (the other highly upregulated gene is unknown, Figure 6). The exact function of ARMET is unknown, but its suggested role is in degrading misfolded proteins in the endoplasmic reticulum (ER) to combat ER stress (Mizobuchi et al., 2007). Therefore, it is thought to share a similar function with Hsp70, in that it deals with damaged proteins, but specifically in the ER (Mizobuchi et al., 2007; Murphy 2013).
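The significance criteria for the volcano plot (q-value below 0.05 and at least a 2-fold change in either direction) can be expressed as a small classification function. This is a sketch of the stated thresholds only; the q-values in the example are invented, and the function is not the Ballgown analysis itself.

```python
import math

Q_CUTOFF = 0.05   # q-value threshold stated in the text
MIN_FOLD = 2.0    # minimum fold change stated in the text


def classify(fold_change, q_value):
    """Classify a gene as 'up', 'down' or 'not significant'.

    `fold_change` is expression in heat (38-40 C) divided by expression
    at 25 C, so values above 1 mean up-regulation in heat.
    """
    log2_fc = math.log2(fold_change)
    if q_value < Q_CUTOFF and abs(log2_fc) >= math.log2(MIN_FOLD):
        return "up" if log2_fc > 0 else "down"
    return "not significant"


# Fold changes 44.35, 4.0 and 1.5 follow values quoted in the text;
# the q-values are hypothetical.
examples = {
    "ARMET-like": (44.35, 0.001),
    "BI-1-like": (4.0, 0.01),
    "Hsp70-like": (1.5, 0.01),
    "peptidase-like": (0.25, 0.01),
}
for name, (fc, q) in examples.items():
    print(name, classify(fc, q))
```

Working in log2 space treats a 2-fold increase and a 2-fold decrease symmetrically, which is why volcano plots conventionally use log2 fold change on the x-axis.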

Hsp70 was not upregulated enough to reach the 2-fold threshold in the volcano plot, but did go up 1.5-fold (Figure 5). Nevertheless, the upregulation of both ARMET and Hsp70 aligns with their suggested and known functions, as both are involved in handling protein damage due to heat (Mizobuchi et al., 2007; Murphy 2013). Another significantly up-regulated gene was Bax inhibitor-1 (BI-1), at 4-fold up-regulation. BI-1 combats ER stress by acting as an anti-apoptotic protein (Cai et al., 2018). It binds IRE1α and suppresses its endoribonuclease and kinase activity, in turn stopping the stress-induced cell death pathway and promoting cell adaptation and survival instead (Sano and Reed, 2013).

Figure 5: Transcriptome Analysis of Gene Expression in H. mephisto. The box shows the median and the first and third quartiles, the whiskers specify the 15th and 85th percentiles, and notches indicate confidence intervals. For All Genes, 28,068 transcripts were included in the analysis, while for Hsp70 and AIG1, 142 and 57 transcripts are shown, respectively. (Filtering for < 5 FPKM across all replicates was performed to remove potential pseudogenes or misannotations.)


Figure 6: RNA Expression Data Analysis of H. mephisto. Volcano plot of statistically significant and insignificant genes. The fold change is given to indicate up- or down-regulation of specified genes.


GENE ONTOLOGY AND MANUAL INSPECTION OF TOP 20 UP- AND DOWN-REGULATED

To discover potential trends in biological processes and/or molecular function, InterProScan was utilized. InterProScan combines protein signatures from numerous databases to perform protein analysis. To accomplish this, InterProScan compares the target protein sequence against non-redundant protein databases (Quevillon et al., 2005), classifies the sequences into families, and predicts the presence of domains or other important sites. As output, an InterPro (IPR) number was provided, as well as Gene Ontology (GO) term assignments, which were used for additional information. Subsequently, GO analysis was conducted to determine common functionality between proteins using the Python package GOATools (Table 5). GO analysis determined that cuticle components and peptidases were down-regulated. This is shown in Table 5, where both cysteine-type peptidase activity (GO:0008234) and structural constituent of cuticle (GO:0042302) were enriched in the down-regulated study set. The ratios were 133 and 89, respectively, both out of 28,062. Bonferroni correction was applied to adjust the p-values for multiple testing. GO analysis could not distinguish a trend in up-regulated genes, since many of the up-regulated genes were unknown.
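As a brief illustration of the Bonferroni correction mentioned above, the sketch below multiplies each raw p-value by the number of tests and caps the result at 1. It is a stand-alone example with invented p-values and does not call GOATools or reproduce its internals.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values.

    Each p-value is multiplied by the number of tests performed,
    and the result is capped at 1.0.
    """
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Invented raw p-values for three hypothetical GO terms.
raw = [0.001, 0.02, 0.5]
adjusted = bonferroni(raw)
for p, p_adj in zip(raw, adjusted):
    print(p, p_adj, "significant" if p_adj < 0.05 else "not significant")
```

With three tests, a raw p-value of 0.02 no longer passes a 0.05 cutoff after adjustment, which shows how the correction guards against chance enrichments when many GO terms are tested at once.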

We also evaluated the top twenty down-regulated and up-regulated statistically significant genes by eye based on InterProScan classification, confirming the overall trend that peptidases were down-regulated and transcriptional and metabolic proteins were up-regulated. The fold-change of each gene was noted as well, which provided additional discoveries. Of the statistically significant up-regulated genes with recognizable similarity to known genes, one of the most interesting that we found by eye was HMEPH_10756-RA, which had the highest fold-change value of 44.35, with the next highest being 7.91. HMEPH_10756-RA codes for two protein domains: 1) ARMET, which is thought to be involved in degrading misfolded proteins in the endoplasmic reticulum (ER) and thus acts to counter ER stress (Mizobuchi et al., 2007), and 2) SAP, which is a nuclear protein domain involved in DNA repair, transcription and RNA processing, or programmed chromatin degradation (InterPro Website). As stated previously, the exact function of ARMET is unknown, but it is believed to be involved with ER stress as a quality control alert (Mizobuchi et al., 2007). BI-1, another conserved anti-apoptotic protein that acts against ER stress, displayed a 4-fold up-regulation (Cai et al., 2018; Sano and Reed, 2013).

Table 5: GO Terms for H. mephisto GO terms enriched in H. mephisto of genes downregulated under heat stress.


TREE BUILDING

Trees were constructed to determine the phylogenetic relatedness of H. mephisto AIG1 and Hsp70 genes to other related proteins. Protein-coding sequences of 5 other nematode worms were downloaded from WormBase and combined with all of the H. mephisto genes. Non-nematode outgroups were included as well: dujardin, Rhizophagus irregularis, Arabidopsis thaliana and Homo sapiens. These combined sequences served as the raw data to build Hsp70 and AIG1 gene trees using Geneious. The first step was generating a proper protein alignment with MAFFT, which was refined over multiple rounds. The purpose of this iterative process was to remove problematic sequences that created large gaps and deletions, in order to achieve a high-quality alignment. Protein sequences above 1,000bp and below 600bp were removed immediately for the Bayesian trees, as they were deemed too long or too short and would disrupt the alignment. Once the alignment was satisfactory, MrBayes was run to produce Bayesian trees of both Hsp70 and AIG1. The Hsp70 tree was rooted on E. coli AAA18300, while AIG1 was rooted on Ras_Ce (Ras protein let-60, Caenorhabditis elegans). The Bayesian tree for AIG1 did not resolve well, so a RAxML tree was built for AIG1, still with Ras_Ce as the root. To evaluate the robustness of the tree topology, 100 iterations of bootstrap analysis were run for each tree generation. In the end, the RAxML tree was used for AIG1 and the Bayesian tree for Hsp70.
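The length filter applied before alignment can be sketched as below. In the thesis this step was carried out within the Geneious workflow; the function here is only an illustration, the sequence names are hypothetical, and the 600 and 1,000 limits are taken from the text in the units reported there.

```python
MIN_LENGTH = 600    # lower limit stated in the text
MAX_LENGTH = 1000   # upper limit stated in the text


def filter_by_length(records, min_len=MIN_LENGTH, max_len=MAX_LENGTH):
    """Keep sequences whose length lies within [min_len, max_len].

    `records` maps a sequence name to its sequence string.
    """
    return {name: seq for name, seq in records.items()
            if min_len <= len(seq) <= max_len}


# Hypothetical sequences built from a repeated letter, for illustration only.
records = {
    "too_short": "M" * 450,
    "kept_a": "M" * 640,
    "kept_b": "M" * 1000,
    "too_long": "M" * 1500,
}
kept = filter_by_length(records)
print(sorted(kept))
```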

The phylogenetic trees of Hsp70 and AIG1 illustrated a unique and separate expansion of these two genes in H. mephisto. The Bayesian analysis recovered known paralogs specific to cellular compartments such as the mitochondria and endoplasmic reticulum (ER) (Yu et al., 2015), along with a distinct clade that grouped human, mouse, and nematode genes together (Cluster I, Figure 7). The most fascinating aspect of the Bayesian tree (Figure 7) and the RAxML tree (Figure 8) was the presence of 37-member and 45-member H. mephisto-only groups, respectively. In the Bayesian tree, this H. mephisto-only group was most closely related to a novel Diploscapter clade of 53 members (Cluster III, Figure 7). This Bayesian analysis suggests that the Hsp70 family of proteins has undergone significant amplification within these two nematode lineages, which are evolutionarily distant but both inhabit hot environmental conditions. This supports the conclusion that the expansion occurred within the H. mephisto lineage rather than H. mephisto inheriting the expanded Hsp70 family from a common ancestor.

Figure 7: Bayesian Phylogenetic Tree of Hsp70. H. mephisto sequences are marked with an asterisk (*) and D. pachys sequences with an arrow. Branch numbers indicate the number of times that branch was recovered out of the total runs, while the scale bar represents substitutions per site.

Figure 8: RAxML Phylogenetic Tree of AIG1. H. mephisto sequences are marked with an asterisk (*) and two fungal (Rhizophagus irregularis) sequences are highlighted with light red boxes. Branch numbers specify bootstrap support and the scale bar represents substitutions per site.


CROSS-SPECIES HSP70 COMPARISON AND VENN DIAGRAM

The number of Hsp70 genes was compared between free-living and parasitic nematodes. An expansion in Hsp70 was apparent in free-living nematodes, while parasitic nematodes lacked the expansion, suggesting that Hsp70 expansion is a general strategy for coping with environmental thermal stress rather than host thermal stress (Figure 9).

The website OrthoVenn (Wang et al., 2015) was utilized by uploading the genome of each selected species into the chosen template to generate a Venn diagram (Figure 10). The aim was to identify orthology between the genes of H. mephisto, D. melanogaster, C. elegans, and P. redivivus. The Venn diagram identified a set of 2,164 genes shared among all nematodes and 3,233 shared among all four invertebrates. H. mephisto showed greater similarity to P. redivivus, with a gene overlap of 1,563, while the overlap between H. mephisto and C. elegans was only 90 genes. This orthology result supports the conclusion that P. redivivus is a closer relative of H. mephisto than C. elegans is.

Figure 9: Graph Displaying the Number of Hsp70 Genes Across Nematode Species. Chart quantifies proteins not domains.


Figure 10: Venn Diagram Comparing Orthologous Gene Clusters between H. mephisto, P. redivivus, C. elegans and D. melanogaster.


CONCLUSION

In order for any organism to survive including , it must be capable of coping with the environmental and genetic stresses it encounters (Oakes and Papa, 2015). One such stress is endoplasmic reticulum (ER) stress, in which misfolded proteins accumulate in the ER because stress impedes its protein folding, modification and secretion capabilities. The long-term consequences are harmful impacts on cells and even cell death (Oakes and Papa, 2015; Sano and Reed, 2013). To combat this, the ER has evolved a method to detect and resolve misfolded proteins by triggering an evolutionarily conserved cellular stress response, the unfolded protein response (UPR). The UPR in turn functions to restore homeostasis and functionality of the ER (Sano and Reed, 2013; Wu et al., 2014) before the stress becomes a threat to the survival of the organism (Oakes and Papa, 2015). Interestingly, we found that upregulated genes in H. mephisto are related to each other through the ER stress pathway involving unfolded proteins. We identified strong up-regulation of the chaperone Hsp70 and the ER-stress combatants ARMET and BI-1 under warm temperatures (38-40°C). All these genes promote cellular survival or assist repair of heat-related protein damage that would threaten organismal survival.

We found 675 genes downregulated and only 285 upregulated under heat stress. The trend toward reduced mRNA or protein expression at higher temperatures (Figure 6) is consistent with the unfolded protein response (UPR), which induces degradation of ribosome-bound transcripts (Hollien and Weissman, 2006). The first step occurs through the sensor PKR-like ER kinase (PERK), which, once activated by auto-phosphorylation, inhibits eIF2α activity, slowing global protein translation (Oakes and Papa, 2015; Wang and Kaufman, 2016). This provides the cell time to resolve the backlog of proteins already present in the ER and prevents the situation from escalating further (Oakes and Papa, 2015). The mRNA degradation is mediated by IRE1α, whose endoribonuclease activity initiates splicing of XBP-1 mRNA to produce an active transcription factor. XBP-1, in turn, works along with ATF6 to up-regulate UPR genes that increase ER size and function (Sano and Reed, 2013; Oakes and Papa, 2015) to promote cell survival (Sano and Reed, 2013). A second role of IRE1α is to selectively target and degrade mRNAs based on their localization to the ER membrane and the amino acid sequence they encode (Hollien and Weissman, 2006). This degradation of mRNA, and the resulting reduction in mRNA expression, is the Regulated IRE1α-Dependent Decay (RIDD) pathway of the UPR, which relieves the immediate protein load (Sano and Reed, 2013; Hollien and Weissman, 2006).

Surprisingly, in Schizosaccharomyces pombe, IRE1 recognizes BIP1 mRNA (which encodes an Hsp70-family protein) but cleaves the 3' UTR, with the counterintuitive effect of stabilizing the mRNA and increasing its translation (Kimmig et al., 2012; Wu et al., 2014). RIDD is therefore complementary to other UPR mechanisms because it potentially halts production of proteins that challenge ER recovery and clears translocation and folding machinery to facilitate the re-folding process (Hollien and Weissman, 2006). Overall, the combination of halting global protein synthesis, degrading mRNA associated with the ER membrane, and increasing ER coping mechanisms during the UPR gives the cell time to resolve the backlog of proteins present in the ER, facilitating a return of the cell to homeostasis and continued cell survival (Oakes and Papa, 2015). This is just one of multiple stress pathways that organisms have evolved to monitor internal systems and maintain homeostasis by triggering stress-induced regulatory mechanisms (Wu et al., 2014).

An important component of the UPR in H. mephisto is Hsp70, which also combats the accumulation of misfolded proteins, in this case those resulting from prolonged heat exposure. Hsp70 is a stress-induced chaperone that, when activated by elevated heat, “fixes” misfolded or denatured proteins by re-folding them, maintaining protein homeostasis and protecting proteins from degradation (Murphy, 2013). One of the most strongly heat-induced genes in H. mephisto was arginine-rich, mutated in early-stage tumors (ARMET), which shares a similar function, as it is involved in the UPR in the endoplasmic reticulum (ER) (Mizobuchi et al., 2007). Inside the ER, ARMET interacts directly with the ER Hsp70 protein BiP/GRP78 to mediate the UPR and bring the ER back to homeostasis (Mizobuchi et al., 2007). ARMET was up-regulated 44-fold by heat in H. mephisto (Figure 6), the highest fold change observed for an identifiable protein. The abundance of ARMET is consistent with the over-amplification of Hsp70, as the two share a similar function in responding to heat-induced protein damage but work in different cellular compartments: Hsp70 in the cytosol and ARMET in the ER (Mizobuchi et al., 2007). This supports co-expression, as under hot conditions it may be vital for H. mephisto to express both simultaneously, especially if they jointly protect the organism from heat damage in different organelles.

For heat shock proteins to be beneficial, they must be activated by the heat shock response, a cellular mechanism utilized by all organisms. The mammalian heat shock factor, HSF1, activates heat shock protein genes (HSPs) such as Hsp70, which, as described above, encode molecular chaperones that refold or destroy damaged proteins (Brunquell et al., 2017). H. mephisto was found to have one copy of this HSF protein. However, in warm conditions with upregulated Hsp70, HSF levels remained unchanged. This is important because in most organisms, including C. elegans, the HSF1 gene is the regulator of Hsp70 (Brunquell et al., 2016; Bunch, 2017), so this result supports the idea that H. mephisto utilizes a novel transcriptional regulator for its Hsp70 that remains unknown at this time.


The other family originally identified as significantly expanded, AIG1, did not show elevated expression under warmer (38-40°C) conditions. Therefore, the expanded AIG1 family in H. mephisto likely still mediates abiotic stress, but not stress from elevated heat. In Arabidopsis, AIG1 responds to heat (Liu et al., 2008), while in mammals AIG1 is involved in immune system functioning, inhibiting apoptosis during T-cell maturation (Nitta et al., 2006). Given that AIG1 was not induced in H. mephisto under heat, we posit that these genes are involved in responding to hypoxia or other non-thermal abiotic stresses present in the deep terrestrial subsurface. AIG1 likely still assists H. mephisto in its extremophilic survival, but not in coping with elevated heat as Hsp70 does.

Heat induced the up-regulation of another gene in H. mephisto, Bax Inhibitor-1 (BI-1) (Cai et al., 2018). This gene encodes a conserved anti-apoptotic protein that prevents ER stress-induced apoptosis (Cai et al., 2018). BI-1 accomplishes this by binding IRE1α and suppressing its endoribonuclease and kinase activity, which promotes cell adaptation and survival (Sano and Reed, 2013) by preventing the completion of the stress-induced cell death pathway. BI-1 therefore acts as a negative regulator of IRE1α to suppress cell death from ER stress (Lisbona et al., 2009). In addition, BI-1 triggers early adaptive responses that allow cells to cope with and recover from ER stress, such as increasing XBP-1 levels and up-regulating UPR target genes (Lisbona et al., 2009). Since BI-1 is another ER stress pro-survival factor (Cai et al., 2018), BI-1 possibly compensates for the lack of heat induction of AIG1, as AIG1 plays a similar role in inhibiting apoptosis (Wang and Li, 2009), but BI-1 may be more specifically tuned to the UPR.

H. mephisto demonstrates a dramatic expansion of Hsp70 paralogs, but there is also a delicate balancing act in terms of gene expression. High expression of Hsp70 damages the development, fertility, and growth of organisms (Sorensen et al., 2003); however, H. mephisto needs to effectively harness Hsp70 to survive. H. mephisto may have evolved a way around this problem. Though Hsp70 genes are upregulated in heat, their expression is kept statistically low on a per-gene basis in comparison to all other genes. The Bayesian tree demonstrated a 37-member expansion of Hsp70 specific to H. mephisto (Figure 7). The combination of low per-gene expression and high copy number demonstrates a unique molecular balancing act that H. mephisto has struck. Due to the higher copy number, the gene dosage of Hsp70 is effectively 51-fold higher than that of a single-copy gene, meaning that less expression is needed from each Hsp70 gene to achieve the same overall heightened expression. Expressing these Hsp70s at low-to-moderate levels may bypass the cost to development, fertility, and growth imposed by high Hsp70 expression (Sorensen et al., 2003). We hypothesize that the harmful effect is due to the amount of energy consumed by Hsp70 to re-fold proteins, so that having more copies reduces the energy consumption of each individual Hsp70. In the end, heightened Hsp70 expression during heat stress is achieved across a greater number of Hsp70 gene copies. We hypothesize that this balancing act evolved in H. mephisto as a remedy for these deleterious effects while providing “as needed” expression in elevated heat. Further support for “as needed” expression is that Hsp70 showed reduced expression at cooler temperatures (Figure 5), where it is presumably not needed. H. mephisto thus illustrates an ability to harness Hsp70 sufficiently to survive without jeopardizing its health or growth.
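The copy-number argument can be illustrated with simple arithmetic: if total Hsp70 output is approximated as per-gene expression multiplied by effective gene dosage, then a higher dosage lowers the expression required from each gene. The sketch below assumes, purely for illustration, that every copy contributes equally. The function name and the target value are hypothetical, and only the 51-fold dosage figure is taken from the text.

```python
def per_gene_expression_needed(target_total, gene_dosage):
    """Per-gene expression required to reach a target total output.

    Assumes every copy contributes equally, which is a simplifying
    assumption for illustration and not a measured property.
    """
    if gene_dosage <= 0:
        raise ValueError("gene_dosage must be positive")
    return target_total / gene_dosage


target_total = 510.0  # hypothetical total expression, arbitrary units

single_copy = per_gene_expression_needed(target_total, 1)
expanded = per_gene_expression_needed(target_total, 51)  # 51-fold dosage from the text

print(single_copy, expanded)
```

Under these assumptions each gene in the expanded family needs only 1/51 of the expression that a single-copy gene would need for the same total output.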

We conclude that Hsp70, ARMET, and BI-1 protect H. mephisto from elevated heat by maintaining proper protein structure and inhibiting cell death pathways linked to the unfolded protein response (Brunquell et al., 2017; Mizobuchi et al., 2007; Murphy, 2013), while AIG1 is likely responsible for protection from a non-thermal stressor. Our work demonstrates a diversity of mechanisms harnessed by H. mephisto for coping with the subterrestrial environment, centered largely on managing unfolded protein stress by upregulating Hsp70 and inhibiting apoptosis.


APPENDIX

WORM CULTURE AND PRELIMINARY EXPERIMENTS WITH DIPLOSCAPTER PACHYS

The expanded Hsp70 gene family also exists in another free-living nematode, Diploscapter spp. (Figure 9), for which there is some evidence of heat tolerance in the literature. However, this has never been rigorously tested in a laboratory setting. Therefore, an additional experiment was conducted on Diploscapter pachys to confirm this thermophilic nature. Diploscapter was discovered in 40°C thermal water (Lemzina and Gagarin, 1994) and has been cultured in manure and soil at 30°C (Gibbs et al., 2005). We obtained D. pachys from the Gunsalus lab at NYU and tested its thermal tolerance in the laboratory by placing the worms in an incubator at 35°C. C. elegans was used as a negative control because it is known to be unable to survive at high temperatures; its maximum growth temperature is 25°C, so it should die. The C. elegans was derived from a 1 mL frozen stock that we revived. NGM plates were made according to Stiernagle (2006) to maintain both D. pachys and C. elegans stocks. The preliminary trials were run at 30°C and 35°C. C. elegans died after a day at both temperatures. D. pachys survived at both temperatures but could not successfully reproduce at 35°C, as the eggs would not hatch, as if in an arrested state.

The D. pachys stock was contaminated upon arrival, and the contamination then transferred to the stock plates. The contamination had to be removed to obtain a sterile colony for the actual experiment, since contaminated worms could confound or alter the results. Spot bleaching was done according to Stiernagle (2006) to clean the worms and remove the contamination, yielding the desired sterile population. However, D. pachys died within a week of the contamination being removed. This suggested a symbiotic relationship in which D. pachys needs the contaminant in order to survive. To better understand this possible connection, the unknown contaminant was identified by sequencing its 16S ribosomal RNA. The contaminant was most closely related to Stenotrophomonas maltophilia. S. maltophilia is a bacterium found in a variety of environmental habitats, including extreme ones, and is involved in plant growth and the nitrogen cycle (An and Berg, 2018). We hypothesize that its importance for D. pachys is as a nitrogen-fixing bacterium that provides some compound needed by the worm. Eukaryotes are unable to fix their own nitrogen, so they must obtain pre-fixed nitrogen directly through symbiotic relationships with nitrogen-fixing prokaryotes (Kneip et al., 2007). We hypothesize that D. pachys needs S. maltophilia for this reason, resulting in a symbiotic relationship between these organisms. We isolated the bacteria in pure culture and made freezer stocks stored at -80°C.

We found that bleach does not kill S. maltophilia. The bacteria remained at the edges of where the bleached chunk was removed, allowing S. maltophilia to re-contaminate the plate days after the chunk was removed. Alternative methods of removing S. maltophilia therefore had to be investigated. Antibiotics were investigated next to determine whether they could be added to the plates to prevent S. maltophilia from growing. If antibiotics were used, an alternative strain of E. coli would be needed as a food source, since the standard E. coli would be killed by the antibiotic, leaving D. pachys without food. Therefore, resistant strains of E. coli were tested against selected antibiotics to confirm resistance. The antibiotics tested were 1:5 Trimethoprim-Sulfamethoxazole, Tetracycline, and Chloramphenicol, all of which inhibited growth of the bacteria. We also tested ampicillin and kanamycin, but neither inhibited growth of this bacterium. Chloramphenicol was selected for use in the NGM plates. We used an E. coli transformed with a Bacterial Artificial Chromosome (BAC) conferring chloramphenicol resistance, called strain 102, derived during Dr. Bracht’s postdoctoral research. This was the D. pachys food source on the engineered Chloramphenicol plates. Because D. pachys appears dependent on metabolite(s) produced by S. maltophilia, we needed to incorporate them into the plates in a way that would not allow bacterial growth. To do so, a 2-day culture of S. maltophilia (37°C in LB) was spun down (200g for 2 minutes), and the supernatant was filter sterilized with a 0.2 µm filter, producing a spent-media extract that was infused directly into the chloramphenicol plates. The custom NGM plates were thus infused with S. maltophilia cell-free spent media, Chloramphenicol (25 µL per plate), and E. coli strain 102. The worms were picked onto these plates with bleach, and in one successful trial the worms grew healthy and reproduced free of any contaminating bacteria. It appears the custom plates provide D. pachys with the necessary “ingredients” for both survival in culture and completion of their life cycle in the absence of contaminating Stenotrophomonas. In the future, the metabolite involved in the symbiotic relationship can be identified by running mass spectrometry on the extract, which we have stored in the freezer for this purpose.


CREATING LAB STOCKS OF H. MEPHISTO

H. mephisto stock was recently sent to the Bracht lab and has just begun to be cultured. A frozen stock of H. mephisto was successfully made following Stiernagle (2006) and Barrière and Félix (2014). A Baermann funnel was utilized to isolate and collect H. mephisto worms from a plate containing E. coli. A rubber tube was fitted to a funnel, and the tube was closed with a clamp. The tube was then filled with M9 up to the neck of the funnel. A plate with a significant number of H. mephisto adults was selected to ensure a quality yield. The funnel was lined with a piece of sieve, the contents of the plate were scooped into the sieve, and the sieve was folded over to enclose the sample. Enough M9 was poured onto the sample to wet it but not fully submerge it. Active H. mephisto swim toward the M9 and, in turn, into the tube, where they are preserved in the M9. The Baermann funnel was left overnight to give the H. mephisto worms enough time to crawl out of the sample. The next day the tube was unclamped and the liquid collected (Barrière and Félix, 2014) into two 50 mL centrifuge tubes.

Before the freezing solution was made, both 50 mL tubes were centrifuged at 200g for 2 minutes in a 2’ swing bucket to get most, if not all, of the H. mephisto worms to settle to the bottom of the tube. The M9 not containing H. mephisto was then aspirated out of both 50 mL centrifuge tubes to reduce the total volume of liquid that needed to be matched with S Buffer. Then 10 mL of S Buffer was added as a wash, generating 12 mL total combined in a single centrifuge tube. This was followed by a heat-shock treatment (30 min at 37°C), which was shorter than the recommended 1 hr heat-shock (Barrière and Félix, 2014). After the heat-shock, S Buffer containing 30% glycerol and 2 mM CaCl2 was added at equal volume (creating 15% glycerol, 1 mM CaCl2 final concentration in S Buffer). The tube containing the H. mephisto ready for freezing was inverted several times to ensure that all the components had mixed sufficiently and that the H. mephisto were evenly dispersed in the liquid. The stock was then prepared by evenly aliquoting the solution into 1.8 mL cryovials, each containing 1 mL. Lastly, the cryovials were put into a labeled cardboard box, which was placed inside a Styrofoam shipping box to allow the H. mephisto to freeze slowly. One of the cryovials was thawed and examined on a plate; it contained live, active worms, demonstrating that the freezing process was successful.
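The final concentrations quoted above (15% glycerol, 1 mM CaCl2) follow from mixing equal volumes and can be checked with the standard dilution relation C1 × V1 = C2 × V2. The sketch below is illustrative only. The function and variable names are hypothetical, and it assumes the 12 mL worm suspension contributes no glycerol or CaCl2 of its own.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Concentration after dilution, from C1 * V1 = C2 * V2."""
    if total_volume <= 0:
        raise ValueError("total_volume must be positive")
    return stock_conc * stock_volume / total_volume


worm_volume_ml = 12.0      # worm suspension in S Buffer (from the text)
freezing_volume_ml = 12.0  # freezing solution added at equal volume
total_ml = worm_volume_ml + freezing_volume_ml

# Assumes glycerol and CaCl2 come only from the freezing solution.
glycerol_pct = final_concentration(30.0, freezing_volume_ml, total_ml)
cacl2_mm = final_concentration(2.0, freezing_volume_ml, total_ml)

print(glycerol_pct, cacl2_mm)
```

Because the two volumes are equal, every component of the freezing solution is simply halved, independent of the absolute volume used.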


REFERENCES

An, S.Q. and Berg, G. (2018). Stenotrophomonas maltophilia. Trends Microbiol, 26(7), 637-638. doi:10.1016/j.tim.2018.04.006

Barrière, A., & Félix, M.-A. (2014). Isolation of C. elegans and related nematodes (May 2, 2014). WormBook, ed. The C. elegans Research Community, WormBook. doi:10.1895/wormbook.1.115.2, http://www.wormbook.org.

Borgonie, G., García-Moyano, A., Litthauer, D., Bert, W., Bester, A., van Heerden, E., Möller, C., Erasmus, M., and Onstott, T.C. (2011). Worms from hell: Nematoda from the terrestrial deep subsurface of South Africa. Nature, 474, 79–82.

Brunquell, J., Morris, S., Lu, Y., Cheng, F., & Westerheide, S. D. (2016). The genome-wide role of HSF-1 in the regulation of gene expression in Caenorhabditis elegans. BMC genomics, 17, 559. doi:10.1186/s12864-016-2837-5

Brunquell, J., Snyder, A., Cheng, F., & Westerheide, S. D. (2017). HSF-1 is a regulator of miRNA expression in Caenorhabditis elegans. PLoS one, 12(8), e0183445. doi.org/10.1371/journal.pone.0183445

Bunch, H. (2017). RNA polymerase II pausing and transcriptional regulation of the HSP70 expression. Eur J Cell Biol, 96(8), 739-745. doi:10.1016/j.ejcb.2017.09.003

Cai, J., Wei, S., Lu, Y., Wu, Z., Qin, Q., & Jian, J. (2018). Bax inhibitor-1 from orange spotted grouper, Epinephelus coioides involved in viral infection. Fish Shellfish Immunol, 78, 91-99. doi:10.1016/j.fsi.2018.04.020

Corsi, A.K., Wightman, B., and Chalfie, M. (2015). A Transparent window into biology: A primer on Caenorhabditis elegans. WormBook 1–31.

Eddy S. R. (2011). Accelerated Profile HMM Searches. PLoS computational biology, 7(10), e1002195. doi:10.1371/journal.pcbi.1002195

Gibbs, D. S., Anderson, G. L., Beuchat, L. R., Carta, L. K., & Williams, P. L. (2005). Potential role of Diploscapter sp. strain LKC25, a bacterivorous nematode from soil, as a vector of food-borne pathogenic bacteria to preharvest fruits and vegetables. Applied and environmental microbiology, 71(5), 2433–2437. doi:10.1128/AEM.71.5.2433-2437.2005

Haas, B. J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P. D., Bowden, J., Couger, M. B., Eccles, D., Li, B., Lieber, M., MacManes, M. D., Ott, M., Orvis, J., Pochet, N., Strozzi, F., Weeks, N., Westerman, R., William, T., Dewey, C. N., Henschel, R., LeDuc, R. D., Friedman, N., … Regev, A. (2013). De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature protocols, 8(8), 1494-512. doi:10.1038/nprot.2013.084


Hollien, J., & Weissman, J. S. (2006). Decay of endoplasmic reticulum-localized mRNAs during the unfolded protein response. Science, 313, 104–107. doi:10.1126/science.1129631

InterPro Website. https://www.ebi.ac.uk/interpro/ (accessed March 18, 2019)

Kimmig, P., Diaz, M., Zheng, J., Williams, C. C., Lang, A., Aragón, T., … Walter, P. (2012). The unfolded protein response in fission yeast modulates stability of select mRNAs to maintain protein homeostasis. eLife, 1, e00048. doi:10.7554/eLife.00048

Kneip, C., Lockhart, P., Voss, C., & Maier, U. G. (2007). Nitrogen fixation in eukaryotes--new models for symbiosis. BMC evolutionary biology, 7, 55. doi:10.1186/1471-2148-7-55

Lemzina, L.V., & Gagarin, V.G. (1994). New species of free-living nematodes from thermal waters in Kyrghyzstan. Zoosystematica Rossica, 3(1), 19-21.

Lisbona, F., Rojas-Rivera, D., Thielen, P., Zamorano, S., Todd, D., Martinon, F., Glavic, A., Kress, C., Lin, J. H., Walter, P., Reed, J. C., Glimcher, L. H., … Hetz, C. (2009). BAX inhibitor-1 is a negative regulator of the ER stress sensor IRE1alpha. Molecular cell, 33(6), 679-91. doi: 10.1016/j.molcel.2009.02.017

Liu, C., Wang, T., Zhang, W. & Li, X. (2008) Computational identification and analysis of immune-associated nucleotide gene family in Arabidopsis thaliana. J Plant Physiol, 165, 777-787, doi:10.1016/j.jplph.2007.06.002

Martin, J., Rosa, B. A., Ozersky, P., Hallsworth-Pepin, K., Zhang, X., Bhonagiri-Palsikar, V., … Mitreva, M. (2014). Helminth.net: expansions to Nematode.net and an introduction to Trematode.net. Nucleic acids research, 43(Database issue), D698–D706. doi:10.1093/nar/gku1128

Mitreva, M., & Jasmer, D. P. (2008). Advances in the sequencing of the genome of the adenophorean nematode Trichinella spiralis. Parasitology, 135(8), 869–880. doi:10.1017/S0031182008004472

Mizobuchi, N., Hoseki, J., Kubota, H., Toyokuni, S., Nozaki, M., Naitoh, M., Koizumi, A., & Nagata, K. (2007). ARMET is a soluble ER protein induced by the unfolded protein response via ERSE-II element. Cell Struct Funct, 32, 41-50. doi:10.1247/csf.07001

Murphy, M.E. (2013). The HSP70 family and cancer. Carcinogenesis, 34(6), 1181–1188. doi:10.1093/carcin/bgt111

Nitta, T., Nasreen, M., Seike, T., Goji, A., Ohigashi, I., Miyazaki, T., … Takahama, Y. (2006). IAN Family Critically Regulates Survival and Development of T Lymphocytes. PLoS Biology, 4(4), e103. doi:10.1371/journal.pbio.0040103


Oakes, S. A., & Papa, F. R. (2015). The role of endoplasmic reticulum stress in human pathology. Annual review of pathology, 10, 173–194. doi:10.1146/annurev-pathol-012513-104649

Ochoa, A., Storey, J. D., Llinás, M., & Singh, M. (2015). Beyond the E-Value: Stratified Statistics for Protein Domain Prediction. PLoS computational biology, 11(11), e1004509. doi:10.1371/journal.pcbi.1004509

Pertea, M., Kim, D., Pertea, G., Leek, J. T., & Salzberg, S. L. (2016). Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie, and Ballgown. Nature Protocols, 11(9), 1650–1667. doi:10.1038/nprot.2016.095

Peters, A. (2013). Application and commercialization of nematodes. Appl Microbiol Biotechnol, 97(14), 6181–6188. doi:10.1007/s00253-013-4941-7

Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R., & Lopez, R. (2005). InterProScan: protein domains identifier. Nucleic acids research, 33(Web Server issue), W116–W120. doi:10.1093/nar/gki442

Rödelsperger, C., Streit, A., and Sommer, R.J. (2013) Structure, Function and Evolution of The Nematode Genome. In: eLS. John Wiley & Sons, Ltd: Chichester. DOI: 10.1002/9780470015902.a0024603

Rothschild, L.J. (2001). Life in Extreme Environments. Ad Astra. 14, 1092–1101.

Sano, R., & Reed, J. C. (2013). ER stress-induced cell death mechanisms. Biochimica et biophysica acta, 1833(12), 3460–3470. doi:10.1016/j.bbamcr.2013.06.028

Sorensen, J. G., Kristensen, T. N., & Loeschcke, V. (2003). The evolutionary and ecological role of heat shock proteins. Ecology Letters, 6(11), 1025-1037. doi:10.1046/j.1461-0248.2003.00528

Srinivasan, J., Dillman, A. R., Macchietto, M. G., Heikkinen, L., Lakso, M., Fracchia, K. M., … Sternberg, P. W. (2013). The draft genome and transcriptome of Panagrellus redivivus are shaped by the harsh demands of a free-living lifestyle. Genetics, 193(4), 1279–1295. doi:10.1534/genetics.112.148809

Stiernagle, T. (2006). Maintenance of C. elegans (February 11, 2006). WormBook, ed. The C. elegans Research Community, WormBook. doi:10.1895/wormbook.1.101.1, http://www.wormbook.org.

Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D. R., … Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature protocols, 7(3), 562–578. doi:10.1038/nprot.2012.016


Urbieta, M.S., Donati, E.R., Chan, K.-G., Shahar, S., Sin, L.L., and Goh, K.M. (2015). Thermophiles in the genomic era: Biodiversity, science, and applications. Biotechnol. Adv, 33, 633–647. doi:10.1016/j.biotechadv.2015.04.007

Utah State University. (2018). Water Quality. Utah State University Extension. (Last accessed April 12, 2019)

Wu, H., Ng, B. S., & Thibault, G. (2014). Endoplasmic reticulum stress response in yeast and humans. Bioscience Reports, 34(4), e00118. doi:10.1042/BSR20140058

Wang, Y., Coleman-Derr, D., Chen, G. & Gu, Y. Q. (2015) OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Res 43, W78-84. doi:10.1093/nar/gkv487

Wang, M. & Kaufman, R. J. (2016) Protein misfolding in the endoplasmic reticulum as a conduit to human disease. Nature, 529(7586), 326-335. doi:10.1038/nature17041

Wang, Z., & Li, X. (2009). IAN/GIMAPs are conserved and novel regulators in vertebrates and angiosperm plants. Plant Signaling & behavior, 4(3), 165–167.

Yu, A., Li, P., Tang, T., Wang, J., Chen, Y., & Liu, L. (2015). Roles of Hsp70s in Stress Responses of Microorganisms, Plants, and Animals. BioMed research international, 2015, 510319. doi:10.1155/2015/510319

Yandell, M., & Ence, D. (2012). A beginner’s guide to eukaryotic genome annotation. Nature Reviews Genetics, 13, 329-342.
