A Molecular Evolutionary Approach for Targeted-transcriptomics of PCB Exposure: A Case Study Involving Hexagenia spp.

by

Gina Louise Capretta

A Thesis presented to The University of Guelph

In partial fulfilment of requirements for the degree of Master of Science in Integrative Biology

Guelph, Ontario, Canada

© Gina Capretta, December, 2015

ABSTRACT

A MOLECULAR EVOLUTIONARY APPROACH FOR TARGETED-TRANSCRIPTOMICS OF PCB EXPOSURE: A CASE STUDY INVOLVING Hexagenia spp.

Gina Louise Capretta, University of Guelph, 2015

Advisor: Dr. Mehrdad Hajibabaei

This thesis developed a methodology to target candidate xenobiotic-interacting genes conserved across taxa, and tested that methodology as a tool to indicate chemical exposure and elucidate possible biological effects. Using PCBs as the chemical class, this thesis identified conserved candidate PCB-interacting genes and designed and tested degenerate primers to amplify those genes in divergent species. Using next-generation sequencing technology, this thesis also investigated the targeted-transcriptome response of H. rigida, a common ecotoxicological test species, following exposure of Hexagenia spp. to PCB-52 in a 96-hour water-only test, in which survivorship and bioaccumulation were also measured. Transcript sequences of target genes were generated for H. rigida and successfully annotated. Significant down-regulation of three genes (HSP90AB1, TUBA1C, and ALDH6A1) indicated the biological processes that may be disrupted. This research shows the potential for linking molecular events to outcomes at higher levels of biological organization, an approach relevant to environmental risk assessment.

Dedication

To my family

Acknowledgements

There are many people to whom I owe a debt of gratitude; their contributions helped support and encourage me throughout this journey.

Thank you to my advisor, Dr. Mehrdad Hajibabaei, for challenging me to explore new technologies, new methodologies, and new ideas. Thank you for allowing me to travel to numerous conferences and encouraging me to present talks and posters. To all the members of the Hajibabaei lab (both past and present): Nicole, Mike, Steph, Rachel, Ian, and Katie, thanks for always being there to listen to my frustrations and for providing perspective when the going got tough. To Shannon, lab manager and technician extraordinaire, thank you for helping me with RNA extractions and providing sound advice on lab protocols. A big thank you to Dr. Shadi Shokralla, for helping me through my entire thesis project, for providing both expertise and time, and for guiding me through the sequencing and bioinformatics processes. Thank you to NSERC, OGS, the Ministry of Environment and Climate Change (MOECC), and the Department of Integrative Biology for providing funds for my degree. Thank you to my Advisory Committee. Thank you Dr. Glen Van Der Kraak for your advice and the Mus musculus and Danio rerio cDNA samples. Thank you Dr. Paul Sibley for your valuable comments and suggestions during the editing process. Your insight pushed me to do better. Thank you to Trudy Watson-Leung from MOECC, Aquatic Toxicology, Laboratory Services Branch. Thank you for collaborating on this project, for providing the Hexagenia organisms and set up for my PCB exposures, for providing me with tissue bioaccumulation data and water chemistry results, and thank you for your enthusiasm in integrating expression analysis in the work you do. Last, but not least, thank you to my family. To my daughters, Anabel and Elsa, who remind me every day how important it is for young girls to have strong and positive female role models in science. Thank you to my mom, dad, and sister for always encouraging my scientific pursuits.
Thank you to Joel, my best friend, my colleague, and my partner in life – you encouraged me to start a Master’s, encouraged me to stick with it, and you are still encouraging me as I write this, providing me with your knowledge, expertise, guidance, support, time and above all else, love.

Table of Contents

Chapter 1. General Introduction

Chapter 2. In silico identification of candidate PCB-interacting genes conserved in animals
    Introduction
    Materials and Methods
        Identification of candidate PCB-interacting genes with known invertebrate homologues
        Sequence alignment and analysis
        Functional characterization
    Results
    Discussion and Conclusion
    Tables and Figures

Chapter 3. Design and validation of degenerate primers for candidate PCB-interacting genes conserved in animals
    Introduction
    Materials and Methods
        CODEHOP degenerate primers design
        Validation of degenerate CODEHOP primers
        RNA isolation and cDNA synthesis
        PCR amplification
        Visual confirmation
        Bioinformatics confirmation
        Sanger sequencing and analysis
    Results
    Discussion and Conclusion
    Tables and Figures

Chapter 4. Assessing the targeted-transcriptome response of Hexagenia rigida to PCB-52 using PCB-interacting genes conserved across taxa
    Introduction
    Materials and Methods
        Exposures
        Tissue bioaccumulation of PCB-52
        Selection of candidate genes
        DNA barcoding for species verification
        Total RNA extraction for targeted-transcriptomics
        RT-PCR for cDNA synthesis
        Library preparation for Illumina MiSeq sequencing
        Sequencing on Illumina MiSeq
        Data pre-processing
        Annotation and alignment
        Statistical analysis on technical replicates
        Differential gene expression analysis
    Results
    Discussion and Conclusion
    Tables and Figures

Chapter 5. General Conclusion

References

Appendix

List of Tables

Table 2.1 Number and sources of mRNA sequences used. 6.23 ± 0.75 = average (± SD) number of sequences per gene alignment; 109 = total number of homologous PCB-interacting genes, with at least 1 invertebrate sequence per gene; 279.86 ± 116.88 base pairs = average (± SD) length of domain per gene.

Table 2.2 Domain conservation of candidate PCB-interacting genes in each KEGG pathway.

Table 2.3 Functional characterization of candidate PCB-interacting genes conserved across taxa (N=109). * = multi-functional; Lower conservation: < 49.5% conservation/domain; Higher conservation: ≥ 49.5% conservation/domain.

Table 3.1 Candidate PCB-interacting genes conserved in animals used for degenerate primer design (N=71; ≥ 49.5% nucleotide conservation for domain selected).

Table 3.2 Annealing temperature for each gene’s primer pair. *=housekeeping gene. Primers in bold were tested on Sample 2 of H. limbata.

Table 3.3 Number of primer sets considered successful at each confirmation level for each species.

Table 4.1 Degenerate primers used in targeted RNA-seq analysis. * = housekeeping gene; TA = annealing temperature; Uppercase letters = non-degenerate core; Lowercase letters = degenerate clamp. Homologue gene name and symbol are represented by D. melanogaster.

Table 4.2 Water chemistry, survivorship, and tissue bioaccumulation results from 96 hour PCB-52 exposure.

Table 4.3 Number of each species of Hexagenia per treatment.

Table 4.4 Sequencing and annotation results from initial bioinformatics processing.

Table 4.5 Log2 fold change values of all genes for the acetone (carrier solution) control and 0.033 µg/L PCB-52. Asterisks denote log2 fold changes with an absolute value ≥ 1 that are statistically significant (p < 0.05).

Table 4.6 Induction of HSP90 by several environmental stressors in various invertebrate species.

Appendix

Table A1. Candidate PCB-interacting genes conserved in animals (N=109).

Table A2. Summary statistics for domain nucleotide conservation of candidate PCB-interacting genes per KEGG pathway.

Table A3. Summary statistics for domain conservation of candidate PCB-interacting genes per KEGG pathway.

Table A4. CODEHOP primers and primer properties (N=68). Uppercase letters = non-degenerate core; Lowercase letters = degenerate clamp; D = degeneracy; Tm = melting temperature; * = housekeeping gene.

Table A5. BLAST output for all genes tested using CODEHOP primers for all species. Q=query; SL=sequence length.

List of Figures

Figure 1.1 Generic structure of PCBs. Numbering is on the left ring and positioning on the right ring (From Richardson and Schlenk 2011).

Figure 1.2 Thesis conceptual framework, divided by objective. A candidate PCB-interacting gene is a gene with a documented effect on gene expression associated with any polychlorinated biphenyl in any organism across any level of biological organization.

Figure 2.1 Workflow to identify candidate PCB-interacting genes conserved across taxa using available online databases and bioinformatics tools used in molecular evolution. * Nucleotide conservation is represented by the number of identical bases over the total number of bases for the region selected.

Figure 2.2 Relative frequency of candidate PCB-interacting genes conserved in animals per KEGG Pathway.

Figure 3.1 CODEHOP design strategy of degenerate primers for PCB-interacting genes conserved in animals. * Organisms include: H. sapiens, M. musculus, G. gallus, D. rerio, X. laevis, D. melanogaster, and C. elegans, with some exceptions and substitutions; MSA = multiple sequence alignment.

Figure 3.2 Validation strategy of degenerate primers for PCB-interacting genes conserved in animals.

Figure 3.3 Example visual confirmation of gene amplification using CODEHOP primers for H. limbata (H), D. rerio (D), and M. musculus (M).

Figure 3.4 Relative frequency of BLAST amplicon sequence ID.

Figure 3.5 PCB-interacting genes conserved in animals successfully amplified, sequenced, and identified using CODEHOP degenerate primer sets (N=37).

Figure 4.1 ALDH6A1, TUBA1C, and HSP90AB1 gene expression in H. rigida exposed to 0.033 µg/L PCB-52. Asterisks denote statistically significant (p<0.05) fold changes between exposed and control groups, as detected by edgeR. Bars show standard error of the mean normalized counts per million.

Appendix

Figure A2a. A simplified comparison of data generation between RNA-seq and targeted RNA-seq. * In targeted RNA-seq, total RNA can also be prepared initially and reverse transcribed into cDNA; primers can enrich for mRNA during PCR.

Figure A2b. A simplified comparison of data analysis between RNA-seq and targeted RNA-seq.

List of Symbols, Abbreviations, or Nomenclature

AHR         aryl hydrocarbon receptor
ALDH6A1     aldehyde dehydrogenase 6 family, member A1
AP2A2       adaptor-related complex 2, alpha 2 subunit
ATP5B       ATP synthase, H+ transporting, mitochondrial F1 complex, beta
BLAST       Basic Local Alignment Search Tool
bp          base pairs
cDNA        complementary DNA
CODEHOP     Consensus-Degenerate Hybrid Oligonucleotide Primer
CTD         Comparative Toxicogenomics Database
CYP         cytochrome p450 gene family
DL-PCB      dioxin-like PCB
dNTP        deoxynucleotide triphosphate
GLGQ        good length, good quality
EEF1A1      eukaryotic translation elongation factor 1, alpha 1
ETS2        V-ets avian erythroblastosis virus E26 oncogene homolog 2
HK          housekeeping
HSP         heat shock protein
HSP90AB1    heat shock protein 90kDa alpha (cytosolic), class B member 1
HSPA8       heat shock 70kDa protein 8
JH          juvenile hormone
KEGG        Kyoto Encyclopedia of Genes and Genomes
LC50        lethal concentration 50
MOECC       Ministry of Environment and Climate Change
mRNA        messenger RNA
NCBI        National Center for Biotechnology Information
NDL-PCB     non-dioxin-like PCB
NGS         next-generation sequencing
PCB         polychlorinated biphenyl
PCB-52      2,2’,5,5’-tetrachlorobiphenyl
PCR         polymerase chain reaction
POP         persistent organic pollutant
PGD         phosphogluconate dehydrogenase
PK          pyruvate kinase
QA/QC       quality assurance/quality control
qRT-PCR     quantitative real time PCR
RT-PCR      reverse transcription PCR
RNA         ribonucleic acid
RNA-seq     RNA sequencing
RPS18       ribosomal protein S18
SULT        sulfotransferase gene family
TUBA1C      tubulin, alpha, 1c
UGT         UDP glucuronosyltransferase gene family
18S rRNA    18S ribosomal RNA
20E         20-Hydroxyecdysone

Chapter 1

General Introduction

Aquatic environments, habitats for both vertebrate and invertebrate taxa, are continuously impacted by a range of xenobiotics including organic and inorganic compounds, metals, and nanoparticles (Lee et al. 2015). One of the main goals of environmental risk assessment (ERA) is to protect individuals, populations, communities, and ecosystems from the adverse effects of chemical contamination (Lee et al. 2015). ERA is a complex process that is informed by many lines of evidence, including sediment chemistry, in situ studies of benthic communities, laboratory-based bioassays, and more recently, functional genomics, i.e., transcriptomics (Lee et al. 2015). As linked to ecotoxicology, transcriptomics is the study of gene expression arising from responses to environmental toxicant exposure. Transcriptomics is widely used to identify stress-sensitive genes that respond to pollutants at a molecular level (Heinloth et al. 2004; Vandersteen 2012). Expression analysis of these stress-sensitive genes will aid in identifying the particular molecular components involved in xenobiotic response. Also considered biomarker genes, these stress-sensitive genes can act as sublethal indicators of toxicant exposure (Steinberg et al. 2008). Sublethal effects can provide valuable early warning signals of toxicant exposure, highlight potential pollution in need of remediation, and be useful for monitoring recovery after management strategies have been implemented (Van der Oost et al. 2003; Steinberg et al. 2008; Berninger et al. 2014). Sublethal effects can also elucidate causal and mechanistic relationships between biomarker genes and adverse effects at higher levels of biological organization; the potential for linking knowledge from these different lines of evidence in order to understand toxicity progression forms the basis of the adverse outcome pathway (AOP) framework, a new paradigm for ERA (Ankley et al. 2010; Groh et al. 2015).
Recently, a molecular evolutionary approach to transcriptomics has gained traction in ecotoxicology in order to better extrapolate conserved toxicant-induced molecular events across species, as well as to elucidate species-specific responses of biomarker genes to xenobiotic exposure (Groh et al. 2015; LaLone et al. 2013a). A molecular evolutionary approach is based on the premise that stress-sensitive genes conserved across taxa have functional importance in xenobiotic response (Kültz 2005; Piña et al. 2007; Hoffman and Willi 2008). For example, conserved functional domains across target genes have been identified in different species for chemicals with known modes of action, including 17α-ethinyl estradiol, permethrin, and 17β-trenbolone (LaLone et al. 2013b). In addition, a highly conserved neurotransmission-related transcriptional network was identified through a meta-analysis of transcriptional responses from previously conducted experiments on the neurotoxic effects of hexahydro-1,3,5-trinitro-1,3,5-triazine (RDX) in divergent organisms (Garcia-Reyero et al. 2011). Lastly, a multi-gene multi-species molecular tool designed to enable comparative gene expression and screen for endocrine disrupting chemicals as an early warning signal of exposure has been developed for fish (Baker et al. 2009) and frogs (Veldhoen et al. 2014). Multi-gene multi-species molecular tools have yet to be developed for other environmental contaminants, such as PCBs, or for use in aquatic communities across a wider range of taxa that includes both invertebrates and vertebrates. The development of taxonomically wide-range transcriptome analysis would provide gene expression data for target genes from many different organisms exposed to pollution. This taxonomically wide-range transcriptome analysis would contribute to the weight-of-evidence approach used to identify sites of potential biological impact in a timely manner and could elucidate the possible biological effects, either conserved or taxon-specific, that can occur at higher levels of biological organization.
PCBs are persistent organic pollutants (POPs) that continue to pose a risk to wildlife and human health (Environment Canada 2015). PCBs consist of 209 congeners, each congener with differing numbers and positions of chlorine atoms on the biphenyl structure (Figure 1.1).

Figure 1.1 Generic structure of PCBs. Numbering is on the left ring and positioning on the right ring. (From Richardson and Schlenk 2011)

PCBs are persistent and bioaccumulative, and they biomagnify in both aquatic and terrestrial food chains (Giesy and Kannan 1998; Kelly and Gobas 2001). Though the import, manufacture, and sale of PCBs were banned in Canada in 1977, and PCB release into the environment has been illegal nationwide since 1985, PCBs still enter the environment through various routes, including accidental fires and/or leakage from PCB-containing products in landfill sites (Giesy and Kannan 1998). PCBs have been detected worldwide in air, water, sediment, and biota across multiple taxonomic groups (Kelly and Gobas 2001; Papp et al. 2007; Richardson and Schlenk 2011; Cullon et al. 2012; Melymuk et al. 2012; Van Praet et al. 2012; Byer et al. 2013; Dorneles et al. 2013; Gioia et al. 2013; Jaspers et al. 2013; Marek et al. 2013). PCBs have been found in Arctic regions (Eckhardt et al. 2007), as well as locally, in the Niagara River and in the sediments and benthic macroinvertebrates of Lyons Creek East, a tributary of the Niagara River (Milani et al. 2013).

PCBs can be broadly categorized into two main groups: dioxin-like PCBs (DL-PCBs) and non-dioxin-like PCBs (NDL-PCBs). DL-PCBs have the planar conformation needed to interact with the aryl hydrocarbon receptor (AHR) pathway of toxicity, similar to dioxins (Giesy and Kannan 1998; Hamers et al. 2011). NDL-PCBs are nonplanar PCBs that do not interact with the AHR pathway, and their mode(s) of toxic action remain less characterized (Giesy and Kannan 1998; Hamers et al. 2011; Viluksela et al. 2012). Notable endocrine- and reproductive-disrupting (Antunes-Fernandes et al. 2011), neurotoxic (Dutta et al. 2012; Boix et al. 2011), immunotoxic (Canesi et al. 2003), cytotoxic (Yilmaz et al. 2012), genotoxic (Sandel et al. 2008), and carcinogenic (Senthilkumar et al. 2011) effects of PCBs have been observed in wildlife and humans.
Transcriptomic studies of PCB exposure in a range of organisms contribute to a wealth of data that can be used in applying a molecular evolutionary framework to find genes for taxonomically wide-range transcriptome analysis. For example, exposure to some environmental pollutants, including PCBs, affects the expression of heat shock proteins (Triebskorn et al. 2002), genes involved in energy metabolism (Kodvanti et al. 2011; Pujolar et al. 2012), lipid metabolism (Menzel et al. 2007; Menzel et al. 2009), and intracellular signaling (Kodvanti et al. 2011; Wens et al. 2011), as well as several xenobiotic metabolizing genes (Pujolar et al. 2012; Garcia et al. 2013; Rhee et al. 2013), genes related to genetic information processing (Garcia et al. 2012) and cellular growth and architecture (Lassere et al. 2012).

The focus of this thesis is two-fold. Firstly, I apply a molecular evolutionary framework to develop a methodology to find target genes conserved across vertebrates and invertebrates, in the context of PCB exposure. Secondly, as a proof of concept, I test the methodology as a tool to indicate chemical exposure and elucidate possible biological effects, by investigating the targeted-transcriptome response to PCB exposure of Hexagenia rigida, a common ecotoxicological test species, using next-generation sequencing (NGS) technology.

Each data chapter focuses on one objective; the results from each objective inform the next objective (Figure 1.2). In chapter 2, I establish the relevance of a molecular evolutionary approach for PCBs, discuss the utility of a targeted-transcriptome NGS strategy, and, using bioinformatics tools, identify a set of candidate PCB-interacting genes conserved across taxa. In chapter 3, I design, optimize, and test degenerate primers for those candidate PCB-interacting genes in divergent species (H. rigida, H. limbata, Danio rerio, and Mus musculus). In chapter 4, I validate the molecular tools developed in chapter 3 by investigating the targeted-transcriptome response of H. rigida following an acute water-only exposure of Hexagenia spp. to PCB-52, using NGS technology. This research will contribute to the development of improved taxonomically wide-range transcriptome methodology for ecotoxicologists and for the ERA process, as well as provide new gene expression data on the effects of an NDL-PCB on H. rigida, an ecotoxicological test species.

Figure 1.2 Thesis conceptual framework, divided by objective. A candidate PCB-interacting gene is one that changes expression in response to polychlorinated biphenyl exposure, documented in at least one organism.

Chapter 2

In silico identification of candidate PCB-interacting genes conserved in animals

Introduction

Over the last fifteen years, functional gene expression analysis (i.e., transcriptomics) has become increasingly integrated into the field of ecotoxicology. Two approaches have generally been used in ecotoxicological applications of transcriptomics: biomarker analysis and whole transcriptome analysis. In this context, biomarker analysis relies on the expression of a limited number of genes to detect molecular changes (i.e., differential gene expression) in response to exposure to, and/or the toxic effect of, a pollutant. Often these gene targets are directly linked to a specific group of pollutants and/or a known mechanism of action. For example, commonly applied biomarkers of exposure to DL-PCBs include genes involved in the well-characterized AHR-mediated pathway of toxicity: aryl hydrocarbon receptor (AHR), aryl hydrocarbon receptor repressor (AHRR), aryl hydrocarbon receptor nuclear translocator (ARNT), and cytochrome p450 1A (CYP1A) (Zhou et al. 2010). Though these biomarkers of PCB exposure are well established in transcriptomic studies of vertebrates, most notably fish (Triebskorn et al. 2002; Veldhoen et al. 2011; Ribecco et al. 2012; Wellband and Heath 2013), they may not be well suited to assess the functional gene expression of invertebrate bioindicator assemblages such as Ephemeroptera (mayflies), Plecoptera (stoneflies), and Trichoptera (caddisflies) or conventional test organisms such as Chironomus (midge), Daphnia (cladoceran), and Lumbriculus (oligochaete). For example, several studies indicate that invertebrate AHR homologues lack specific high-affinity ligand binding sites (Butler et al. 2001; Hahn 2002), suggesting that the adaptive role of AHR as a regulator of xenobiotic metabolism arose during vertebrate evolution (Hahn 2002). Consequently, conventional vertebrate-based biomarkers of PCBs have limited usability for understanding environmental exposure to, and impacts on, invertebrates. Thus, ecotoxicologists and the environmental risk assessment (ERA) process would benefit immensely from the development of taxonomically wide-range transcriptome analyses.

In contrast to biomarker gene analysis, RNA-sequencing (RNA-seq), an NGS technology that sequences the entire transcriptome, has recently brought whole transcriptome analysis into ecotoxicological studies (Garcia et al. 2012; Mehinto et al. 2012; Hu et al. 2014). Whole transcriptome analysis involves measuring the functional gene expression of the entire transcriptome and using differential expression analysis to identify the genes specific to the organism that show a significant fold change in response to a pollutant. Cost, biases inherent to RNA-seq methodology, and the computational challenges of large data sets, however, complicate the analysis of mRNA expression data. For example, the entire length of a gene cannot be sequenced using RNA-seq, so the mRNA (or cDNA) is fragmented into much shorter segments before sequencing. This fragmentation process is not uniform: it can be affected by both positional and sequence-specific biases. Positional bias refers to fragments being preferentially located towards the beginning or end of transcripts. Sequence-specific bias refers to a more global effect in which the sequences surrounding potential fragments affect their probability of being selected for sequencing (Roberts et al. 2011). Both of these biases may render some fragments more available for sequencing than others. In addition to fragmentation problems, working with several hundred gigabytes of data is another challenge. The computational challenges of storing, retrieving, and processing massive amounts of data can complicate analysis of mRNA expression data (Wang et al. 2009).
Often, considerable sequencing effort and many bioinformatics tools are devoted to analyzing the entire transcriptome, when only a small fraction of the data generated can be annotated successfully with current approaches, especially in non-model species (Ockendon et al. 2015), or reveals significant changes in gene expression (Bansal et al. 2014). Though new software and approaches are continuously being developed to address these biases, the whole transcriptome methodology of RNA-seq is impractical as an “early warning” tool for environmental monitoring and risk assessment of chemicals, which typically involve many sites and organisms (Fang and Cui 2011; Garber et al. 2011; Liu et al. 2014).

Targeted RNA-seq (targeted transcriptome analysis) involves preselecting a subset of genes, which are then selectively amplified so that the expression of a targeted transcriptome can be quantified (see Appendix Figure A2a for details). A fragmentation step is not required in targeted RNA-seq. As such, the biases inherent to fragmentation are lessened. Working with a smaller data set (generated from a targeted transcriptome rather than an entire transcriptome) will also lessen the bioinformatics challenges addressed above and may provide more meaningful analysis of targeted gene expression (see Appendix Figure A2b for details).

A molecular evolutionary approach to identify genes for taxonomically wide-range transcriptome analysis, in the context of PCB exposure, would overcome the limitations of biomarker specificity for this chemical class and lessen the challenges of whole transcriptome analysis, as addressed above. This approach is based on bridging two fundamental principles. Firstly, conserved regions of the genome are often fundamentally important to organisms’ most basic functions. Upon environmental perturbation, universal mechanisms that deal with stress, such as those related to cellular functions and certain aspects of metabolism, aim to restore normal functioning (Kültz 2005; Hoffman and Willi 2008). Identification of these universal mechanisms is informed by the homology of stress-sensitive genes that share a conserved domain across taxa, both at the nucleotide and amino acid level. These conserved domains signify specific protein functions, such as DNA-binding, ligand-binding, and/or catalytic activity, that can respond to xenobiotic exposure. Secondly, transcription profiles between species may be similar within certain classes of chemicals (e.g., metals, POPs, etc.).
Chemical classes share structural and/or functional similarities that may target common and conserved modes of action across phylogenetically distant organisms (Piña et al. 2007; Hoffman and Willi 2008; Garcia-Reyero et al. 2011). As such, the genes and/or gene families with conserved domains across taxa that have shown interaction with pollutants in one or several species may constitute suitable targets to assess the potential response, either conserved or taxon-specific, in a wide range of organisms to xenobiotics, such as PCBs.

The objective of this chapter was to use a molecular evolutionary approach to identify a key set of candidate PCB-interacting genes conserved across taxa (from invertebrates to vertebrates), by mining online databases and applying bioinformatics tools used in molecular evolution. This research builds on other studies that have employed similar in silico approaches using only vertebrate sequence information, with target genes identified a priori (Baker et al. 2009; LaLone et al. 2013a; Veldhoen et al. 2014). For example, the most relevant model for reliable prediction of drug or xenobiotic toxicity was determined using protein sequence conservation of the ligand-binding pocket of 28 toxicity target proteins, when comparing human proteins to their vertebrate aquatic species counterparts (McRobb et al. 2014). For the present work, I hypothesized that candidate PCB-interacting genes conserved across taxa can be identified using a bioinformatics approach if invertebrate homologues exist with corresponding sequence information for each candidate PCB-interacting gene.

Materials and Methods

Identification of candidate PCB-interacting genes with known invertebrate homologues

The Comparative Toxicogenomics Database (CTD) (www.ctdbase.org) is a publicly available online database that curates interactions between environmental chemicals and genes and/or gene products. Though its aim is to facilitate the understanding of xenobiotic effects on human health, the CTD can also be used to identify potential novel biomarkers of exposure to specific pollutants that may be useful in environmental biomonitoring approaches. Data-mining the CTD generated a list of PCB-interacting genes (see Figure 2.1 for workflow). Search terms used to query the CTD for gene-PCB interactions included PCBs, common DL-PCBs such as PCB-126, and common NDL-PCBs such as PCB-52. The Web of Science (http://thomsonreuters.com/web-of-science/) and Google Scholar (www.scholar.google.ca) were also searched for more recent PCB-gene interactions. This list of candidate PCB-interacting genes was cross-referenced with the publicly available Homologene Database (www.ncbi.nlm.nih.gov/homologene), which uses an automated system to detect homologues among annotated genes based on protein sequence identity. Genes without known invertebrate (e.g., nematode and fly) homologues in the Homologene Database were discarded, such that the final candidate PCB-interacting gene set contained only genes with at least one identified invertebrate homologue per gene.
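
The cross-referencing step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure actually used in the thesis: the function name, the gene labels (GENE_A to GENE_D), and the toy homologue table are hypothetical and do not come from the CTD or the Homologene Database. It assumes that each candidate gene has already been mapped to the set of taxa in which a homologue was reported.

```python
# Taxa treated as invertebrates in this sketch; the text gives nematode and fly as examples.
INVERTEBRATE_TAXA = {"fly", "nematode"}


def filter_by_invertebrate_homologue(candidates, homologue_taxa):
    """Keep candidate genes that have at least one invertebrate homologue.

    candidates: list of gene symbols (order is preserved).
    homologue_taxa: dict mapping gene symbol -> set of taxa with a known homologue.
    Genes absent from the dict are treated as having no known homologues.
    """
    kept = []
    for gene in candidates:
        taxa = homologue_taxa.get(gene, set())
        if taxa & INVERTEBRATE_TAXA:  # non-empty intersection = invertebrate homologue exists
            kept.append(gene)
    return kept


# Made-up records for illustration; these are not real database entries.
toy_homologues = {
    "GENE_A": {"human", "mouse", "fly"},
    "GENE_B": {"human", "mouse", "fish"},
    "GENE_C": {"human", "nematode"},
}
toy_candidates = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

final_set = filter_by_invertebrate_homologue(toy_candidates, toy_homologues)
print(final_set)
```

In this toy example, GENE_B is discarded because its homologues are all vertebrate, and GENE_D is discarded because it has no homologue record at all.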

Sequence alignment and analysis

For each homologous candidate PCB-interacting gene, one mRNA (cDNA) sequence each from human, mouse, chicken, frog, fish, fly, and nematode (with some exceptions and substitutions; see Table 2.1 for details) was downloaded from GenBank into a sequence file. These organisms were chosen because individually, they are model species whose sequence information is readily available and collectively, they represent phylogenetically distant taxa. Using MEGA 5.2.2 software (Tamura et al. 2011), nucleotide sequences were aligned with Clustal using default parameters (Chenna et al. 2003). Each nucleotide alignment was translated to its amino acid sequence to determine the highly conserved protein domain per gene. Using the highly conserved protein domain as a guide, genes with < 49.5% nucleotide conservation for the region selected were classified into the “lower conservation” group and genes with ≥ 49.5% nucleotide conservation per domain were classified into the “higher conservation” group. Nucleotide conservation was represented by the number of identical bases over the total number of aligned bases for the region selected. Amino acid conservation was represented by the number of aligned residues over the total number of residues for the domain selected.
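
The nucleotide conservation measure and the 49.5% cut-off described above can be illustrated with a short Python sketch. It is not the MEGA workflow used in the thesis. It assumes that an "identical base" means an alignment column in which every sequence carries the same base, that columns containing a gap count as non-identical, and that the denominator is the alignment length of the selected region. The function names and the four toy sequences are hypothetical.

```python
THRESHOLD = 49.5  # percent nucleotide conservation; the cut-off stated in the text


def nucleotide_conservation(aligned_seqs):
    """Percent of alignment columns in which all sequences share the same base.

    aligned_seqs: list of equal-length strings taken from one aligned region.
    Assumption in this sketch: a column containing a gap ('-') is not identical.
    """
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal aligned length")
    identical = 0
    for column in zip(*aligned_seqs):
        bases = {b.upper() for b in column}
        if len(bases) == 1 and "-" not in bases:
            identical += 1
    return 100.0 * identical / length


def classify(percent):
    """Assign the conservation group using the threshold from the text."""
    return "higher conservation" if percent >= THRESHOLD else "lower conservation"


# Made-up aligned region (10 columns, 7 of them identical across all sequences).
toy_alignment = [
    "ATGGCCAAGT",
    "ATGGCTAAGT",
    "ATGGCCAAGA",
    "ATGACCAAGT",
]
pct = nucleotide_conservation(toy_alignment)
print(pct, classify(pct))
```

Other definitions are possible, for example excluding gapped columns from the denominator, and they would shift individual genes across the threshold, so the definition should be fixed before genes are grouped.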

Functional characterization

Among other functions, the Kyoto Encyclopedia of Genes and Genomes (KEGG) is a reference database that links information from the experimental literature to biological systems, and it can be used to infer function and/or cellular pathway from genomic information. Using KEGG, the functional characterization of


each gene in the final gene set was determined at specific function and pathway levels (Kanehisa and Goto 2000; Kanehisa et al. 2014).

Results

Identification and conservation of candidate PCB-interacting genes

Using an in silico molecular evolutionary approach, a key set of evolutionarily conserved candidate PCB-interacting genes (N=109) was identified (see Appendix Table A1 for details and full gene names). CYP1A1, CYP1A2, and SULT1A1, common POP-metabolizing enzymes, were not included in the final gene set because Homologene did not identify any invertebrate homologues and/or GenBank mRNA sequence information was insufficient. Across all sequences used for alignment for the 109 genes, the overall average (± SD) conserved domain length was 279.86 ± 116.88 base pairs and the overall average nucleotide conservation was 50.90 ± 8.63%. The domain of tubulin, alpha 1b (TUBA1B) had the highest nucleotide conservation (72.76%) and the domain of cell division cycle 6 (CDC6) had the lowest (30.93%). Overall average (± SD) amino acid conservation was 66.43 ± 16.31%. The TUBA1B domain had the highest amino acid conservation (97.56%) and the glutathione S-transferase pi 1 (GSTP1) domain had the lowest (31.07%). Seven gene families with at least two members were also identified: aldehyde dehydrogenases (ALDH2, ALDH1A1, ALDH6A1); tubulins (TUBA1B, TUBA1C, TUBB2B); heat shock proteins (HSPA1B, HSP90AB1, HSPA4, HSPA8); solute carriers (SLC1A3, SLC7A4, SLC7A7); phosphoenolpyruvate carboxykinases (PCK1, PCK2); ATP-binding cassette transporters (ABCB1A, ABCC3); and ATP synthases (ATP5B, ATP5H).

Functional characterization

Overall, KEGG pathway analysis characterized all 109 genes: 47% involve metabolism, 21% involve genetic information processing, 15% involve cellular processes, and 13% involve environmental information processing (Fig. 2.2). Similar nucleotide conservation and amino acid conservation profiles were


generated across each KEGG pathway (Table 2.2; see Appendix Table A2 and Table A3 for summary statistics on conservation information). Among the less conserved candidate PCB-interacting genes (N=38; nucleotide conservation < 49.5%; Table 2.3), the highest proportion of genes involved metabolism (N=24; 63.16%), within which the greatest number of genes were annotated to xenobiotic metabolism (N=14; 58.33%). This less conserved group included several common biomarker genes conventionally used to detect exposure to PCBs and dioxins, including AHR, acetylcholinesterase (ACHE), and cytochrome P450, family 3 (CYP3A4, CYP3A5), as well as Phase 2 xenobiotic-metabolizing enzymes, including UDP glucuronosyltransferase 1 family, polypeptide A1 (UGT1A1), sulfotransferase family 1E, estrogen-preferring, member 1 (SULT1E1), and glutathione S-transferase pi 1 (GSTP1), and transporters from Phase 3 of xenobiotic metabolism, including ABCB1A, ABCC3, SLC1A3, and SLC7A7. Among the more conserved candidate PCB-interacting genes (N=71; nucleotide conservation ≥ 49.5%; Table 2.3), the highest proportion of genes also involved metabolism (N=27; 38.03%), within which the greatest number of genes were associated with carbohydrate metabolism (N=15; 21.13% of the 71 genes).

Discussion and conclusion

The framework presented here (Figure 2.1) shows the feasibility of using a molecular evolutionary approach coupled with publicly available bioinformatics tools to identify a suite of candidate PCB-interacting genes conserved across taxa. Using this approach, genes conventionally used in environmental biomonitoring of PCBs in vertebrates were either discarded from the analysis due to lack of definitive invertebrate homologues in GenBank (e.g., CYP1A1, CYP1A2) or were grouped into the lower nucleotide conservation group (e.g., AHR, ACHE, CYP3A4, CYP3A5). This absence of homologues or lower nucleotide conservation may indicate differences between vertebrates and invertebrates in the presence, function and/or ability of these genes and gene products to interact with toxicants. For CYP1s, for example, no clear evidence exists of the presence or inducibility of CYP1 enzymes in invertebrate taxa (Zhou et al. 2010; Koenig et al. 2012). As previously mentioned, invertebrate AHR


homologues also lack specific high affinity ligand binding sites (Butler et al. 2001; Hahn 2002). Both of these examples suggest that the ability of AHR and CYP1s to bind their respective ligands and mediate xenobiotic metabolism is a vertebrate-specific ability acquired during vertebrate evolution (Hahn 2002; Zhou et al. 2010; Koenig et al. 2012). As such, these genes may not be taxonomically universal biomarkers of exposure to PCBs. Furthermore, comparative genomic studies of cellular metabolism indicate that xenobiotic metabolism pathway genes are less well conserved, especially between invertebrates and vertebrates, and may be associated with taxon-specific innovations (Peregrin-Alvarez et al. 2009). Consequently, conventional vertebrate-specific PCB-interacting genes commonly used in an ecotoxicological framework may not be useful when trying to assess the functional genomic response of both vertebrates and invertebrates in an environmental biomonitoring workflow.

Applying a molecular evolution approach also provided insight into pathway conservation and range of gene function. From my findings, the highest proportion of candidate PCB-interacting genes from the higher conservation group belonged to carbohydrate metabolism. Other studies also indicate that this pathway is one of the most conserved, not only among metazoans but among archaea and bacteria as well (Peregrin-Alvarez et al. 2003; Peregrin-Alvarez et al. 2009). Given the KEGG functional characterization of genes in the gene set, my findings also show that these genes are involved in a wide range of functions. This breadth of function among the 109 genes may provide a wider yet more refined picture of PCB exposure across taxa than can be derived from analyzing the expression of one or two genes or the entire transcriptome from a single species.

Several gene families were also represented in my candidate PCB-interacting gene set, identified using an in silico methodology.
HSPs are ubiquitous proteins involved in a general cellular stress response that aims to reduce damage and maintain and/or re-establish cellular homeostasis (Gupta et al. 2009). HSP70 is the most highly conserved HSP and the first to be induced when an organism is under cellular insult. Given that HSP70 can be induced by a range of stressors, such as temperature changes, anoxia, and several toxicants, its use as a biomarker to predict


specific stressors is limited (Gupta et al. 2009). However, including HSP70 genes (e.g., HSPA4, HSPA8 from my findings) in a set of conserved candidate PCB-interacting genes may be informative because their level of expression can be compared to the expression of other genes for any correlations across taxa.

Applying a molecular evolutionary framework also identified genes from the tubulin superfamily (TUBA1B, TUBA1C, TUBB2B) as conserved candidate PCB-interacting genes. Tubulin is the subunit protein of microtubules, which are important for the structure and kinetics of the cytoskeleton of all organisms (Ludueña 2013). Both α and β tubulins have conventionally been used as housekeeping genes in transcriptomic studies of vertebrates and invertebrates (Willett et al. 1997; Zhang et al. 2012). Housekeeping genes are genes that show stable expression levels across a wide range of tissues and under a wide range of stressors, such as toxicants, diseases, and temperature changes. They are used to normalize the expression of genes under study. My results indicate, however, that though tubulins are conserved, they may not be reliable housekeeping genes due to their established interaction with PCBs.

In order to validate this evolutionarily conserved gene set, further experiments can be performed. These genes can be used to develop the molecular tools necessary to probe the transcriptome of standard toxicological organisms and common bioindicator species. For example, degenerate primers, which are used to amplify genes in non-model organisms and/or to amplify a target gene(s) in more than one species, can be designed from the sequence information of each gene. Once designed and used to amplify genes across phylogenetically distant taxa, the utility of these genes and associated molecular tools can be tested in a variety of ways relevant to the AOP paradigm of ERA and applicable in a biomonitoring workflow.
In an ecotoxicological framework, current laboratory-based bioassays (for both model, e.g. Danio rerio, and non-model, e.g. Hyalella azteca, toxicological organisms) that characterize a site suspected of PCB contamination, such as acute lethality tests, bioaccumulation, and/or reproductive bioassays, can be performed concurrently with gene expression analysis in order to link expression levels of PCB-interacting genes with traditional toxicological endpoints. In the field, testing the


gene set and associated molecular tools in polluted vs. non-polluted aquatic communities, where the presence and abundance of common aquatic bioindicator species (e.g. Hexagenia rigida and Hexagenia limbata) are a proxy of water quality, would also demonstrate the usability of our approach in a biomonitoring workflow.

Traditional gene expression studies use either a few biomarker genes or the whole transcriptome for analysis of exposure to xenobiotics; each of these methods, however, has its own drawbacks. The present study shows the feasibility of using a molecular evolutionary approach coupled with publicly available bioinformatics tools to identify a suite of candidate PCB-interacting genes conserved across taxa. This targeted-transcriptome methodology serves as a framework to determine any xenobiotic-interacting genes conserved among distantly related organisms, with the computational tools to analyze them.


Comparative Toxicogenomics Database (CTD)
• data-mine the CTD for candidate PCB-interacting genes
• data-mine Google Scholar, Web of Science for candidate PCB-interacting genes

Homologene
• cross-reference list with Homologene to identify genes with invertebrate homologues

GenBank
• download mRNA (cDNA) sequences from available organisms

• align sequences by ClustalW using Mega 5.2.2 software
• determine highly conserved region from protein multiple sequence alignment (MSA)
• group genes into higher conservation (≥ 49.5% nucleotide conservation/domain) or lower conservation (< 49.5% nucleotide conservation/domain)

Figure 2.1 Workflow to identify candidate PCB-interacting genes conserved across taxa using available online databases and bioinformatics tools used in molecular evolution. * nucleotide conservation is represented by the number of identical bases over the total number of aligned bases for the region selected.


Table 2.1 Number and sources of mRNA sequences used.

Group         Taxonomy     Common name           Species                    Number of mRNA (cDNA) sequences
Invertebrate  Nematoda     roundworm             Caenorhabditis elegans     55
                           roundworm             Ascaris suum               1
                           heartworm             Dirofilaria immitis        1
              Cnidaria     stony coral           Acropora millepora         2
              Mollusca     zebra mussel          Dreissena polymorpha       1
              Arthropoda   fruit fly             Drosophila melanogaster    75
                           honey bee             Apis florea                3
                           silkworm              Bombyx mori                2
                           mosquito              Anopheles gambiae          4
                           -                     Hexagenia limbata          8
                           cabbage looper        Trichoplusia ni            1
                           pea aphid             Acyrthosiphon pisum        2
                           deer tick             Ixodes scapularis          1
                           water flea            Daphnia pulex              1
                           salmon louse          Lepeophtheirus salmonis    1
              Total number of invertebrate sequences                        158
Vertebrate    Mammalia     human                 Homo sapiens               109
                           mouse                 Mus musculus               107
              Aves         chicken               Gallus gallus              95
                           duck                  Anas platyrhynchos         2
                           turkey                Meleagris gallopavo        1
              Teleostei    zebra fish            Danio rerio                100
                           fathead minnow        Pimephales promelas        1
                           olive flounder        Paralichthys olivaceus     1
                           rainbow trout         Oncorhynchus mykiss        1
                           Nile tilapia          Oreochromis niloticus      1
              Amphibia     African clawed frog   Xenopus laevis             82
                           Western clawed frog   Xenopus tropicalis         20
              Total number of vertebrate sequences                          520
TOTAL                                                                       678

6.23 ± 0.75 = average (± SD) number of sequences per gene alignment; 109 = total number of homologous PCB-interacting genes, at least 1 invertebrate sequence/gene; 279.86 ± 116.88 base pairs = average (± SD) length of domain per gene.


Table 2.2 Domain conservation of candidate PCB-interacting genes in each KEGG pathway.

KEGG Pathway                            % Nucleotide Conservation   % Amino acid Conservation   Number of genes/pathway
Cellular Processes                      53.01 ± 9.90                70.80 ± 18.01               17
Environmental Information Processing    51.29 ± 5.40                65.38 ± 14.24               14
Genetic Information Processing          55.60 ± 6.99                75.81 ± 13.05               23
Metabolism                              49.09 ± 8.23                62.86 ± 9.47                51
Other                                   36.71 ± 5.27                45.75 ± 14.38               4
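As a consistency check, the per-pathway nucleotide means in Table 2.2 can be related to the overall average reported in the Results by weighting each pathway mean by its gene count. The Python sketch below does this with the values copied from the table; the variable names are illustrative, and because the inputs are already rounded to two decimals the result is only expected to agree with the reported 50.90% to within rounding.

```python
# (pathway, mean % nucleotide conservation, number of genes), copied from Table 2.2
TABLE_2_2 = [
    ("Cellular Processes", 53.01, 17),
    ("Environmental Information Processing", 51.29, 14),
    ("Genetic Information Processing", 55.60, 23),
    ("Metabolism", 49.09, 51),
    ("Other", 36.71, 4),
]

total_genes = sum(n for _, _, n in TABLE_2_2)

# Gene-weighted mean of the pathway means equals the mean over all genes.
weighted_mean = sum(mean * n for _, mean, n in TABLE_2_2) / total_genes

# Share of the gene set assigned to each pathway, in percent.
shares = {name: 100.0 * n / total_genes for name, _, n in TABLE_2_2}

print(f"genes: {total_genes}")
print(f"gene-weighted mean nucleotide conservation: {weighted_mean:.2f}%")
for name, share in shares.items():
    print(f"{name}: {share:.1f}% of genes")
```

The computed shares are unrounded, so they may differ by up to a percentage point from the whole-number percentages quoted in the text and in Figure 2.2.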


Table 2.3 Functional characterization of candidate PCB-interacting genes conserved across taxa (N=109).

KEGG Pathway Specific Function Interacting Genes (Human) Lower Conservation Higher Conservation (N=38) (N=71)

Cellular Processes Cell growth and death AKTIP, CDC6 CCNA2, CDK2, (N=17) SMARCB1, RAB1A

Cell communication EXOC3 CKAP5

Cell motility TUBB2B, TUBA1B, TUBA1C, PAK6

CAT, HACL1, AP2A2, Transport and catabolism CROT CLTC

Environmental Information Processing Signal transduction HIF1A, NOS1* GTPBP1, MAPK1, (N=14) PLCB1, VDAC1, VDAC2, IGF1R,

Signaling molecules and GRM3 COL4A1, CCRN4L, interaction GABBR2

Membrane transporters SELENBP1, CYB5A

Genetic Information Transcription VDR GABPA, ARID1A, Processing CDC5L, POLR2B, (N=23) HNF4α, NF1A, RXR, LHX2

Translation SYMPK RPS8, RPLP0, EEF1A1

HSP90AB1, HSPA1B, Folding, sorting, and PRKCSH HSPA4, HSPA8, CANX, degradation of proteins P4HB, CCT5, NR5A2

Replication and repair PARP1


Metabolism Xenobiotic metabolism: AHR, AIP, CYP3A4, ARNT (N=51) Phase 1 – Functionalization CYP3A5, PGRMC1 reactions

Xenobiotic metabolism: UGT1A1, SULT1E1, Phase 2 – Conjugation EPHX1, GSTP1 reactions

Xenobiotic metabolism: ABCB1A, ABCC3, SLC7A4 Phase 3 – Transporters SLC1A3, SLC7A7

Amino acid metabolism ALDH6A1 DDC, GLUD1, PSAT1 ALAS2

Nucleotide metabolism HPD, TH AMPD3

ACHE*, ACADS*, Lipid metabolism ACOX1*, FASN, GPAM, PAFAH1B2 ACSL5*, UGCG,

Energy metabolism ATP5H ATP5B, NDUFS2, SDHA

Carbohydrate metabolism ALDH2, DLAT, ALAD LDHA, MDH2 G6PD, ALDH1A1, IDH2, PDHA1, PYGM, PGD, GAPDH, PCK1, PCK2, ENO1, PKM

FKBP4, GRTP1, Other TMEM98, FBLN1 (N=4)

* = multi-functional ; Lower conservation: < 49.5% conservation/domain; Higher conservation: ≥ 49.5% conservation/domain


[Pie chart: Metabolism 47%; Genetic Information Processing 21%; Cellular Processes 15%; Environmental Information Processing 13%; Other 4%]

Figure 2.2 Relative frequency of candidate PCB-interacting genes conserved in animals per KEGG Pathway.


Chapter 3

Design and validation of degenerate primers for candidate PCB- interacting genes conserved in animals

Introduction

In order to measure functional gene expression, messenger RNA (mRNA) of a gene of interest first needs to be successfully amplified in the target organism(s) to verify the presence and expression level of that gene. In most cases, this requires amplification of complementary DNA (cDNA), which is synthesized from template mRNA, through PCR. Oligonucleotide primers, short segments of DNA usually 20-30 bp long, are a key component of successful PCR. During PCR, template genomic DNA or cDNA is denatured, a gene-specific primer pair (forward primer and reverse primer) anneals to the targeted section of the single-stranded DNA template, DNA polymerase extends each primer by incorporating complementary nucleotide bases, and through multiple cycles, millions of copies of double-stranded DNA specific to the target gene are created.

Not all primers are strictly specific, however. A primer sequence is considered degenerate (non-specific) if some positions have multiple possible bases. Degenerate primers are actually mixtures of similar non-degenerate primers and are used, among other reasons, to amplify genes in non-model organisms (animals without known sequence information) and/or to amplify a target gene(s) in more than one species.

The use of degenerate primers in ecotoxicology is not new. Ecotoxicologists have used degenerate primers to amplify genes in order to: perform phylogenetic analyses of xenobiotic-interacting genes across species (Karchner et al. 2000; Teramitsu et al. 2000; Barber et al. 2007); quantify gene expression in response to a toxicant or other abiotic stressors (Zhang et al. 2005; Deane and Woo, 2006; Farcy et al. 2007; Bigot et al. 2010); investigate molecular mechanism(s) of differential sensitivity to toxicants (Karchner et al. 2000; Barber et al. 2007); and identify,


validate, and characterize putative homologs or new members of gene subfamilies (Miller et al. 1999; Kullman, 2000; Teramitsu et al. 2000; Zhang et al. 2005; Barber et al. 2007; Farcy et al. 2007; Bigot et al. 2010).

Most often, degenerate primers for ecotoxicological applications are designed from species closely related to the species of interest. For example, within the vertebrates, known fish sequences of vitellogenin, CYP1A, and HSP70 have been used to successfully design primers and amplify those target genes in non-model fish species (Bowman and Denslow 1999; Miller et al. 1999; Zhang et al. 2005; Deane and Woo, 2006). Marine mammalian gene sequences for CYP1A have also been used to amplify unknown marine mammalian gene sequences (Teramitsu et al. 2000). Sometimes, sequences from multiple model vertebrate species, such as mammals, fish, and reptiles, are used to design degenerate primers to amplify CYPs in non-model fish species (Barber et al. 2007).

The above examples of degenerate primer use in ecotoxicology demonstrate successful amplification of conventional biomarker genes for persistent organic pollutants, such as PCBs, in vertebrates. However, many of these biomarker genes are vertebrate specific (e.g. CYP1s and AHR) and, consequently, are not useful in assessing the functional genomic response of invertebrate bioindicator species or common invertebrate test species to pollutants (Hahn 2002; Koenig et al. 2012). As such, new genomic tools that bridge this gap between vertebrate and invertebrate functional gene expression arising from toxicant exposure need to be developed.

The objective of this chapter was to design, optimize and test degenerate primers for candidate PCB-interacting genes conserved in animals (from Chapter 2). The degenerate primer design method employed is based on the Consensus-Degenerate Hybrid Oligonucleotide Primer (CODEHOP) design strategy (Staheli et al. 2010).
This design strategy is based on all the possible codons for the conserved regions of protein multiple sequence alignments. CODEHOP primers are usually longer than standard primers and contain a short degenerate 3’ core and a longer 5’ non-degenerate clamp. The 3’ core is chosen from a set of primers that contain all the possible codons for the conserved amino acid domain; the 5’ clamp stabilizes the primer (Staheli et al. 2010). CODEHOP has been successfully used to design primers


for the PCR amplification of genes across divergent taxa (Hajibabaei et al. 2006; Chakravorty et al. 2010; Staheli et al. 2010). To my knowledge, however, this CODEHOP design strategy has yet to be applied within a molecular evolutionary ecotoxicology framework, using protein multiple sequence alignments from a range of organisms (human, mouse, fish, chicken, frog, fly, and nematode) to target evolutionarily conserved xenobiotic-interacting genes.

I hypothesized that degenerate primers for selected PCB-interacting genes conserved across taxa can be developed if target genes can be amplified, Sanger-sequenced, and bioinformatically identified in common, yet taxonomically divergent, ecotoxicological test species: Hexagenia limbata, Hexagenia rigida, Danio rerio, and Mus musculus. To test this hypothesis, I applied a multi-tiered PCR-Sanger sequencing workflow that included visual confirmation of gene amplification on a 1.5% agarose gel, Sanger DNA sequencing, and two bioinformatic confirmation tests on sequence data: identification of the target gene using BLAST and subsequent alignment of sequences to verify the target gene region.

Materials and Methods

CODEHOP degenerate primer design

A set of candidate PCB-interacting genes highly conserved in animals (N=71, ≥ 49.5% nucleotide conservation for domain selected) was previously identified in Chapter 2 (Table 3.1). For each of these genes, the translated nucleotide sequences (1 sequence from each of H. sapiens, M. musculus, G. gallus, D. rerio, X. laevis, D. melanogaster, and C. elegans per gene, with some exceptions and substitutions) were aligned by ClustalW (Chenna et al. 2003) using Mega 5.2.2 software (Tamura et al. 2011) to determine the highly conserved protein domain (see Figure 3.1 for workflow). Once the conserved domain was determined, Block Maker (http://blocks.fhcrc.org/blocks/make_blocks.html) converted these alignments into a Blocks Database format. Degenerate primers were designed using these blocks of protein multiple sequence alignments and the default parameters in CODEHOP (http://blocks.fhcrc.org/codehop.html) for each candidate PCB-interacting gene (Rose et al., 2003; Default parameters: primer concentration = 50nM, maximum


degeneracy = 128, genetic code = standard, codon usage table = gbinv for invertebrate). Based on amplicon length (between 100 and 320 bp), low degeneracy (at most 3 “n”s in the degenerate 3’ end; n = A, C, T or G) and appropriate annealing temperature (between 52 and 58°C), one primer set (forward and reverse) for each candidate PCB-interacting gene was chosen. In order to decrease the degeneracy of primer sets, an inosine “i” was incorporated for every “n”. Inosine is a nucleotide base that is identical to guanine, but lacking the N2 amino group. It is found in the wobble position, the 5’-nucleotide of tRNA anticodons (Crick 1966), and can loosely pair with the four nucleotides (A, C, T, G) (Ben-Dov and Kushmaro 2015). Inosine has been successfully used in degenerate primers to allow annealing to different, but similar, sequences during PCR (Ben-Dov and Kushmaro 2015).

This CODEHOP design strategy was repeated for two housekeeping genes: RPS18 and CPOX. These housekeeping genes are well documented (Maroniche et al. 2011; Kuchipudi et al. 2012; Eisenberg and Levanon 2013), lack known interaction with PCBs, possess high conservation across taxa (at least one invertebrate homologue), and have a suitable amplicon length (between 100 and 320 bp). A segment of the ribosomal RNA (18S rRNA) was also used as a housekeeping gene based on the above criteria; however, a universal primer pair (F-1183 and R-1438) was chosen from Hadziavdic et al. (2014).
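The degeneracy and primer selection criteria described above can be illustrated with a small Python sketch. It is not part of CODEHOP and was not used in this work: the function names and the example primer are hypothetical, the degeneracy counts are the standard IUPAC nucleotide ambiguity codes, and counting “n”s over the whole primer assumes that the 5’ clamp is fully non-degenerate.

```python
from math import prod

# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_OPTIONS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
    "I": 1,  # inosine is a single synthesized base, so it adds no mixture complexity
}


def degeneracy(primer):
    """Number of distinct oligonucleotides in the mixture a degenerate primer represents."""
    return prod(IUPAC_OPTIONS[base] for base in primer.upper())


def replace_n_with_inosine(primer):
    """Substitute an inosine ('I') for every fully degenerate position ('N')."""
    return primer.upper().replace("N", "I")


def passes_selection(primer, amplicon_bp, annealing_c, max_n=3):
    """Apply the three selection criteria from the text.

    Assumption: all 'N' positions lie in the degenerate 3' core, so counting
    them over the whole primer is equivalent to counting them in the 3' end.
    """
    return (
        100 <= amplicon_bp <= 320
        and primer.upper().count("N") <= max_n
        and 52 <= annealing_c <= 58
    )


# Hypothetical primer: non-degenerate 5' clamp followed by a degenerate 3' core.
example = "GGAGCTGATCAARGTNTAYGG"
print(degeneracy(example))                          # R x N x Y = 2 * 4 * 2
print(degeneracy(replace_n_with_inosine(example)))  # R x I x Y = 2 * 1 * 2
print(passes_selection(example, amplicon_bp=250, annealing_c=55))
```

Replacing the single “n” with inosine reduces the number of oligonucleotides in this hypothetical mixture from 16 to 4, which is the sense in which inosine substitution lowers degeneracy.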

Validation of degenerate CODEHOP primers

a) RNA isolation and cDNA synthesis

Experimental animals

Hexagenia spp. eggs (a mixture of H. limbata and H. rigida; mayflies) were collected from Lake St. Clair in the spring/summer of 2013 (Dr. J. Ciborowski, University of Windsor). Hexagenia spp. were cultured in the toxicology labs of the MOECC, Laboratory Services Branch, Etobicoke, ON (MOECC 2014). The whole nymph body, which includes different tissue types, was used for RNA isolation. Danio rerio (zebra fish) and the undifferentiated embryonic stem cells of M. musculus (mouse) were cultured, RNA extracted, and reverse transcribed to cDNA in


the Van der Kraak lab at the University of Guelph, Guelph, ON. The D. rerio cDNA samples were from liver tissue and comprised a mixture from several individuals (see Figure 3.2 for validation workflow).

RNA isolation

RNAlater-preserved H. limbata (N=2) and H. rigida (N=1) samples were lysed in liquid nitrogen and homogenized using a QIAshredder homogenizer (Qiagen, Germany). Total RNA purification was performed using the standard silica matrix method with the RNeasy Mini Kit for total RNA extraction from animal tissues (Qiagen, Mississauga, ON, Canada), incorporating on-column DNase digestion (Qiagen, Mississauga, ON, Canada) prior to final elution of RNA. RNA from each sample was eluted with 40 µL RNase-free water. The concentration and integrity of each RNA sample were measured via UV absorption (260/280) using a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, USA). Total RNA was stored at -80°C until cDNA preparation.

cDNA synthesis and RT-PCR amplification

First-strand synthesis was performed on each sample using the SuperScript III First-Strand Synthesis System for RT-PCR kit (Invitrogen, California, USA) in a total reaction volume of 20 µL.

H. limbata Sample 1 (2 mL vessel): 4.2 µL RNA, 1 µL oligo(dT), 1 µL dNTP (10 mM each), 3.78 µL DEPC-treated water, to which 10 µL of the following cDNA synthesis mix was added: 2 µL 10X RT buffer, 4 µL mM MgCl2, 2 µL 0.1 M DTT, 1 µL RNaseOUT (40 U/µL), and SuperScript™ III RT (200 U/µL).

H. limbata Sample 2 (2 mL vessel): 7.86 µL RNA, 1 µL oligo(dT), 1 µL dNTP (10 mM each), 0.13 µL DEPC-treated water, to which 10 µL of the following cDNA synthesis mix was added: 2 µL 10X RT buffer, 4 µL mM MgCl2, 2 µL 0.1 M DTT, 1 µL RNaseOUT (40 U/µL), and SuperScript™ III RT (200 U/µL).


H. rigida (2 mL vessel): 5.1 µL RNA, 1 µL random hexamers, 1 µL dNTP (10 mM each), 2.90 µL DEPC-treated water, to which 10 µL of the following cDNA synthesis mix was added: 2 µL 10X RT buffer, 4 µL mM MgCl2, 2 µL 0.1 M DTT, 1 µL RNaseOUT (40 U/µL), and SuperScript™ III RT (200 U/µL).

Reverse transcription of each sample was carried out as follows: incubation for 50 min at 50°C, termination for 5 min at 85°C, addition of 1 µL of RNase H to each sample tube, followed by incubation for 20 min at 37°C. The concentration and integrity of the final cDNA were measured via UV absorption (260/280) using a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, USA). cDNA was stored at -20°C until PCR amplification.

b) PCR Amplification

i) Sample preparation for PCR

The annealing temperature of the forward and reverse primer for each gene was used to group sets of genes for PCR (Table 3.1). All primers were tested on the same D. rerio cDNA. All primers were tested using H. limbata from Sample 1 except as otherwise noted. Only the primers that showed visual amplification in D. rerio and/or H. limbata were tested on M. musculus cDNA and H. rigida cDNA. The concentration of cDNA from H. limbata, H. rigida, D. rerio, and M. musculus was normalized to 50 ng/µL. For each PCB-interacting gene and housekeeping gene, PCR reaction mixtures were prepared totaling 25 µL, each containing: 17.5 µL H2O, 2.5 µL buffer, 1.0 µL MgCl2, 0.5 µL dNTPs, 2.0 µL cDNA, 2.0 µL Taq, 0.5 µL forward primer and 0.5 µL reverse primer.

ii) Running PCR

Each tube was placed in a thermocycler in its appropriate column based on annealing temperature. The following parameters were used: 2 min at 94°C, 40 cycles


of: 40 s at 94°C (melting step), 1 min at 43-59°C (annealing step), 30 s at 72°C (elongation step); then 5 min at 72°C, and hold at 10°C.

c) Gene verification

Visual confirmation

Amplicons were analyzed by running PCR products on a 1.5% agarose gel for visualization. Amplicons showing bands of the expected size were considered a positive result (successfully amplified).

Amplicon purification

M. musculus amplicons from several genes (N=15) showed smearing or multiple bands on the gel and required repeating the initial PCR and a subsequent purification step prior to sequencing. Purification was performed using the MinElute® PCR Purification Kit (Qiagen, Mississauga, ON, Canada).

Bioinformatics confirmation

Amplicons successfully amplified (based on visual confirmation) in at least one test species were prepared for Sanger sequencing.

i) Preparing Sanger Sequencing

Sanger sequencing reaction mixtures each contained the following, totaling 11.5 µL per reaction: 2 µL amplicon, 8.5 µL BigDye, and 1 µL primer. The reactions for forward and reverse primers were done separately.

ii) Running Sanger Sequencing

Sanger sequencing reaction mixtures were placed in a thermal cycler. The following parameters were used: 2 min at 96°C, 30 cycles of: 30 s at 96°C, 15 s at 55°C, 4 min at 60°C, and hold at 4°C.

iii) Sanger Sequencing Analysis

Forward and reverse sequences were determined using an Applied Biosystems Genetic Analyzer (Thermo Fisher Scientific, USA). Sequences were analyzed using Codon Code Aligner (http://www.codoncode.com/aligner/). They were assembled first, then quality trimmed by discarding ambiguous bases (N’s) from the beginning and end of each consensus sequence. For each contig, consensus


sequences were searched against GenBank sequences using BLAST (Altschul et al. 1990) to verify each gene. For those sequences that did not form a contig (e.g. one of the sequence directions was of poor quality), either the forward or the reverse sequence was compared against the nr/nt nucleotide database in GenBank. For H. limbata and H. rigida, BLAST parameters were: blastn using default parameters, filtering for Metazoa, excluding vertebrates. For D. rerio and M. musculus, BLAST parameters were: blastn using default parameters (no filtering). In order to verify a match between amplicon sequence and gene identity, the top hit (gene and species) was used. Because H. limbata and H. rigida do not have a reference genome (only a sample of mRNA sequences used in this study are in GenBank), verification for most Hexagenia spp. sequences was based on the top hit gene match for any insect species. A match between each gene and its BLAST ID was considered a positive result (successfully sequenced and identified).

Amplicon sequences for each gene per species were aligned with the initial sequences used to generate the CODEHOP primers and with the forward and reverse CODEHOP primers by ClustalW (Chenna et al. 2003) using Mega 5.2.2 software (Tamura et al. 2011). Visual inspection of the alignment of each amplicon sequence between the forward and reverse primer sequence for each gene was used to verify that the correct region of the gene was amplified.
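The end-trimming and top-hit matching steps described above can be sketched in Python as follows. This only illustrates the logic: trimming in this study was performed in Codon Code Aligner, the function names and the example sequence are hypothetical, and the case-insensitive comparison of gene symbols is an assumption about how a “match” might be defined.

```python
def trim_ambiguous_ends(sequence):
    """Remove leading and trailing runs of 'N' from a consensus sequence.

    Internal ambiguous bases are left untouched; only the ends are trimmed.
    """
    return sequence.upper().strip("N")


def is_positive_identification(target_gene, top_hit_gene):
    """Assumed matching rule: the top BLAST hit names the same gene symbol as the target."""
    return target_gene.strip().upper() == top_hit_gene.strip().upper()


# Hypothetical consensus sequence with ambiguous bases at both ends and one inside.
raw = "NNNACGTNACGTNN"
trimmed = trim_ambiguous_ends(raw)
print(trimmed)

print(is_positive_identification("GAPDH", "gapdh"))
print(is_positive_identification("ENO1", "ENO3"))
```

Under this strict rule, a top hit from the same gene family but with a different gene symbol is not counted as a positive identification.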

Results

Primer Design

Based on the criteria used to design primer sets, degenerate primers were designed for 68 (out of 71) candidate PCB-interacting genes and two housekeeping genes. The CODEHOP design process did not work for three genes because the estimated amplicon length (between forward and reverse primers) was less than 100 bp, a criterion necessary for better success in amplification (see Appendix Table A4 for all primers and primer properties).


Visual confirmation

Of the 68 primer pairs tested by PCR, 63% of genes (3 HK and 40 PCB-interacting) were amplified in at least one species, based on analyzing the PCR products (amplicons) on a 1.5% agarose gel. Specifically, 46% of genes (3 HK and 28 PCB-interacting) and 54% of genes (3 HK and 34 PCB-interacting) were visually amplified in H. limbata and D. rerio, respectively. Of the genes that showed visual confirmation in H. limbata, D. rerio, or both species, 81% and 84% also showed visual confirmation in M. musculus and H. rigida, respectively. Of the genes visually amplified, 47% (2 HK and 18 PCB-interacting) were amplified in all four species (Figure 3.3). All negative controls were blank.

Bioinformatic confirmation

Amplicons of genes (3 HK and 40 PCB-interacting genes) visually amplified in at least one test species were Sanger-sequenced; resulting sequences were identified through BLAST (see Appendix Table A5 for BLAST output). Of the genes sequenced, 86% (3 HK and 34 PCB-interacting) correctly matched the top hit generated by BLAST in at least one species (Figure 3.4). Of these 37 genes, 12 were correctly identified in all four test species, 1 was identified only in H. limbata, 2 only in D. rerio, and 5 only in M. musculus (Figure 3.5). In at least two species, 78% of the genes were correctly identified. For all the genes correctly identified through BLAST, 100% of gene sequences aligned correctly between primers. For five genes in D. rerio (MAPK1, ENO1, SLC7A4, CDK2, and RXRα), sequences aligned properly and the correct gene family was identified, but either a different gene ID or an isoenzyme was determined by BLAST. For five genes in M. musculus (ENO3, HSPA1B, PCK1, TUBA1B, TUBB2B), sequences aligned properly and the correct gene family was identified, but either a different gene ID or an isoenzyme was determined by BLAST.
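The percentages reported in the two sections above follow from the stated gene counts, as the Python sketch below shows. The variable names are illustrative. One assumption is flagged in the code: the text gives single-species counts for H. limbata, D. rerio, and M. musculus only, so the "at least two species" figure is computed on the assumption that no gene was identified in H. rigida alone.

```python
def percent(part, whole):
    """Percentage rounded to the nearest whole number, as reported in the text."""
    return round(100 * part / whole)


PRIMER_PAIRS_TESTED = 68

visual_any = 3 + 40        # 3 HK + 40 PCB-interacting genes, at least one species
visual_h_limbata = 3 + 28
visual_d_rerio = 3 + 34
visual_all_four = 2 + 18
blast_any = 3 + 34         # correctly identified by BLAST in at least one species

# Identified in one species only: H. limbata (1), D. rerio (2), M. musculus (5).
# Assumption: no gene was identified in H. rigida alone (not stated in the text).
blast_single_species = 1 + 2 + 5

summary = {
    "visually amplified in >= 1 species": percent(visual_any, PRIMER_PAIRS_TESTED),
    "visually amplified in H. limbata": percent(visual_h_limbata, PRIMER_PAIRS_TESTED),
    "visually amplified in D. rerio": percent(visual_d_rerio, PRIMER_PAIRS_TESTED),
    "amplified genes seen in all four species": percent(visual_all_four, visual_any),
    "sequenced genes confirmed by BLAST": percent(blast_any, visual_any),
    "confirmed genes found in >= 2 species": percent(blast_any - blast_single_species, blast_any),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

Note that the first three percentages use the 68 primer pairs as the denominator, whereas the last three use the number of genes that passed the preceding confirmation step.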

False positives and false negatives

When results at the visual and bioinformatic levels of confirmation were compared for each species, false positives (Type 1 error) and false negatives (Type 2 error) were identified (Table 3.3). For example, single bands of the correct length for GAPDH appeared for all test species, but the H. limbata GAPDH sequence did not return any match through BLAST, while the GAPDH sequences of the other species were correctly identified. Conversely, false negatives were found in H. rigida, D. rerio, and M. musculus, in which BLAST confirmed the identity of genes that were not visually amplified.

Discussion and Conclusion

The CODEHOP design and verification strategy was successful for 54% (37 out of 68) of the primer pairs tested in at least one species; 12 of these primer pairs were successful in all four species. This multi-step verification method was essential for determining the success of this degenerate primer design strategy. Since visual amplification on a 1.5% agarose gel can only indicate the presence of DNA (within a narrow threshold for detection) of a certain length (number of bp), it was imperative to also confirm gene identity through sequencing and then bioinformatics to exclude false positives.

The false positives in the visual confirmation results may be due to non-specific amplification. By definition, a degenerate primer is actually a mixture of similar non-degenerate primers. Only one of these non-degenerate primers matches the DNA template, so the most suitable non-degenerate primer is diluted in the mixture. Consequently, the greater the degeneracy of the degenerate primer, the lower the chance that the most suitable non-degenerate primer binds to the DNA template and that the target is amplified through PCR. As such, non-specific amplification can occur when a stretch of sequence in the degenerate primer binds to other DNA sequences, and a non-target gene is consequently amplified. Despite trying to minimize the degeneracy of each primer by incorporating an “inosine” for every “n”, some of our unsuccessful primers may still be too degenerate to successfully amplify only target genes. Furthermore, studies have shown that inosine does not bind to nucleotides indiscriminately; based on molecular thermostability, the binding preference is cytosine > adenine > thymine > guanine (Watkins and SantaLucia 2005). This binding preference may also result in non-specific amplification. In addition, some of our target genes are from families in which genes share several copies of homologous sequences (e.g. tubulin, enolase, retinoic acid receptor); these sequences may be the sequences targeted by our degenerate primers. As such, isoenzymes and/or subtypes of a gene, and possibly paralogous genes (homologous genes that are related by duplication within the genome of one species, but are much less functionally conserved), may have been preferentially amplified, especially for the gene sequences from D. rerio and M. musculus that aligned properly between the primers, but failed the bioinformatics BLAST ID test.

The lack of visual confirmation for amplification of some genes using the degenerate primers can indicate either a lack of amplification (negative result) or amplification below the level of visual detection (false negative). Firstly, a lack of amplification (negative result) in some genes for any of the species could be due to differences in the life stage of each animal, as well as differences in the tissue type of each animal used for RNA extraction and subsequent cDNA preparation. If a gene is not actively expressed during a specific life stage or in a specific tissue type, the gene will not be available for PCR amplification from cDNA. Furthermore, secondary structures, such as hairpins, can occur when the primer loops and partially anneals due to complementary sequences within the length of the primer. Primer dimers, another secondary structure that can form, occur when two primers anneal at their respective 3' ends. In both of these cases, smaller DNA molecules are formed and, through PCR amplification, can become the dominant DNA template; these secondary structures generally appear on a gel as fainter bands in the 30-50 bp size range (Singh et al. 2000).
False negatives in the visual confirmation results may be due to the differential sensitivity of each verification method: the concentration of DNA required to visibly fluoresce on a 1.5% agarose gel is greater than the concentration of DNA required for successful sequencing (Hajibabaei et al. 2005). The second tier of verification, bioinformatic validation, reduced the chance of recovering false positives and false negatives. Using a CODEHOP degenerate primer design strategy coupled with visual and bioinformatic validation tools, a set of conserved candidate PCB-interacting genes (N=10) and housekeeping genes (N=2) was successfully amplified, sequenced, and


identified in all four test species: H. limbata, H. rigida, D. rerio, and M. musculus. A breadth of cellular functions across different biological pathways is still represented by this set of genes (Figure 3.5). Though these 10 genes are a small subset of the original 68 genes with which we began, this gene set still represents some of the fundamental aspects of life's most basic functions and the universal processes that deal with stress (Kültz 2005; Hoffman and Willi 2008). To my knowledge, this is the first time degenerate primers have been used to amplify xenobiotic-interacting, evolutionarily conserved genes over such taxonomic breadth, and in non-model, conventional ecotoxicological invertebrate species. Recently, Veldhoen et al. (2014) designed primers that amplified a set of genes known to interfere with thyroid functioning when exposed to chemical contaminants; these primers were validated on a range of non-model anuran species, with a phylogenetic distance of 260 million years of evolution. My gene set was amplified in species with a phylogenetic distance of approximately 600 million years of evolution (Hedges et al. 2015). The success of amplifying this set of conserved candidate PCB-interacting genes in divergent organisms provides the tools to potentially probe the targeted-transcriptome of a wide range of species, from vertebrates to invertebrates, from model organisms to non-model organisms, and from bioindicator species to conventional toxicological organisms, both naturally exposed and laboratory-exposed to PCBs. Furthermore, this approach serves as a framework for the design and validation of other evolutionarily conserved gene sets for other toxicants (e.g. metals, nanoparticles, or endocrine-disrupting chemicals).


Table 3.1 Candidate PCB-interacting genes conserved in animals used for degenerate primer design (N=71, ≥ 49.5% nucleotide conservation for domain selected).

KEGG Pathway | Specific Function | Gene ID | Gene Name
Cellular Processes | Cell growth and death | CDK2 | cyclin-dependent kinase 2
 | | RAB1A | member RAS oncogene family
 | | SMARCB1 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily b, member 1
 | | CCNA2 | cyclin A2
 | Cell communication | CKAP5 | cytoskeleton associated protein 5
 | Cell motility | PAK6 | p21 protein (Cdc42/Rac)-activated kinase 6
 | | TUBA1B | tubulin, alpha 1b
 | | TUBA1C | tubulin, alpha 1c
 | | TUBB2B | tubulin, beta, 2b
 | Transport and catabolism | AP2A2 | adaptor-related protein complex 2 alpha 2 subunit
 | | CAT | catalase
 | | CLTC | clathrin, heavy chain
 | | HACL1 | 2-hydroxyacyl-CoA lyase 1
Environmental Information Processing | Signal transduction | IGF1R | insulin-like growth factor 1 receptor
 | | GTPBP1 | GTP binding protein 1
 | | MAPK1 | mitogen-activated protein kinase 1
 | | PLCB1 | phospholipase C beta 1
 | | VDAC2 | voltage-dependent anion channel 2
 | | ETS2 | v-ets avian erythroblastosis virus E26 oncogene
 | Signaling molecules and interaction | CCRN4L | carbon catabolite repression 4-like
 | | COL4A1 | collagen, type IV, alpha 1
 | | GABBR2 | gamma-aminobutyric acid B receptor 2
 | Membrane transporters | CYB5A | cytochrome b5 type A (microsomal)
 | | SELENBP1 | selenium binding protein
Genetic Information Processing | Transcription | ARID1A | AT-rich interactive domain 1A (SWI-like)
 | | CDC5L | cell division cycle 5-like
 | | HNF4A | hepatocyte nuclear factor 4, alpha
 | | LHX2 | LIM homeobox 2
 | | NF1A | nuclear factor 1/A
 | | POLR2B | polymerase (RNA) II (DNA directed) polypeptide B
 | | RXRα | retinoid x receptor, alpha
 | Translation | EEF1A1 | eukaryotic translation elongation factor 1 alpha 1
 | | RPLP0 | ribosomal protein, large, P0
 | | RPS8 | ribosomal protein S8
 | Folding, sorting, degradation of proteins | CANX | calnexin
 | | CCT5 | chaperonin containing Tcp subunit 5
 | | HSPA1B | heat shock protein 70kDa protein 1B
 | | HSPA4 | heat shock protein 70kDa protein 4
 | | HSPA8 | heat shock protein 70kDa protein 8
 | | HSP90AB1 | heat shock protein alpha 90 kDa, class B, member 1
 | | NR5A2 | nuclear receptor, subfamily 5, group A, member 2
 | Replication and repair | PARP1 | poly[ADP-ribose] polymerase 1
Metabolism | Xenobiotic metabolism | ARNT | aryl hydrocarbon receptor nuclear translocator
 | | SLC7A4 | solute carrier family 7, member 4
 | Amino acid metabolism | DDC | dopa decarboxylase
 | | GLUD1 | glutamate dehydrogenase 1
 | | PSAT1 | phosphoserine aminotransferase 1
 | | ALAS2 | delta-aminolevulinate synthase 2
 | | ALDH6A1 | aldehyde dehydrogenase 6, member A1
 | Nucleotide metabolism | AMPD3 | adenosine monophosphate deaminase 3
 | Lipid metabolism | UGCG | UDP-glucose ceramide glucosyltransferase
 | | ACSL5 | acyl-CoA synthetase long-chain family member 5
 | Energy metabolism | ATP5B | ATP synthase, H+ transporting, mitochondrial F1 complex, beta polypeptide
 | | SDHA | succinate dehydrogenase complex, subunit A, flavoprotein
 | | ALAD | aminolevulinate dehydratase
 | Carbohydrate metabolism | ALDH1A1 | aldehyde dehydrogenase 1, member A1
 | | ALDH2 | aldehyde dehydrogenase 2
 | | DLAT | dihydrolipoamide S-acetyltransferase
 | | ENO1 | enolase 1, alpha
 | | GAPDH | glyceraldehyde-3-phosphate dehydrogenase
 | | G6PD | glucose-6-phosphate dehydrogenase
 | | IDH2 | isocitrate dehydrogenase 2 (NADP+)
 | | PCK1 | phosphoenolpyruvate carboxykinase 1
 | | PCK2 | phosphoenolpyruvate carboxykinase 2
 | | PDHA1 | pyruvate dehydrogenase (lipoamide), alpha 1
 | | PGD | phosphogluconate dehydrogenase
 | | PKM | pyruvate kinase, muscle
 | | PYGM | phosphorylase glycogen, muscle


GenBank
• download mRNA sequences from available organisms* for each candidate PCB-interacting gene (from Chapter 2)

ClustalW using Mega 5.2.2 software
• align translated sequences to determine highly conserved region from protein MSA
• retain genes with ≥ 49.5% nucleotide conservation per domain across taxa

Block Maker software
• convert MSA into Blocks Database format

CODEHOP software
• generate list of all possible degenerate primers from blocks of protein MSA

Primer Optimization
• choose one primer set (forward and reverse) based on: amplicon length (100-320 bp), melting temperature (52-58 °C), and low degeneracy (≤ 3 n's)
• change all n's to inosine "i" to reduce overall degeneracy

Figure 3.1 CODEHOP design strategy of degenerate primers for candidate PCB- interacting genes conserved in animals * organisms include: H. sapiens, M. musculus, G. gallus, D. rerio, X. laevis, D.melanogaster, C. elegans; MSA = multiple sequence alignment
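The primer optimization criteria listed in Figure 3.1 can be illustrated with a short Python sketch. This is not the CODEHOP software and not the procedure actually used in this thesis; the function names and the example sequence are hypothetical, inosine is assumed to count as a single base when degeneracy is computed, the "≤ 3 n's" limit is assumed to apply to each primer separately, and the amplicon-length and melting-temperature bounds are assumed to be inclusive.

```python
# Number of bases represented by each IUPAC nucleotide code.
# Assumption: inosine ("i") is counted as one base, not as a degenerate position.
IUPAC_FOLD = {
    "a": 1, "c": 1, "g": 1, "t": 1, "i": 1,
    "r": 2, "y": 2, "s": 2, "w": 2, "k": 2, "m": 2,
    "b": 3, "d": 3, "h": 3, "v": 3,
    "n": 4,
}


def degeneracy(primer):
    """Return how many non-degenerate primers make up the degenerate mixture."""
    total = 1
    for base in primer.lower():
        total *= IUPAC_FOLD[base]
    return total


def substitute_inosine(primer):
    """Replace every fully degenerate position (n) with inosine (i)."""
    return primer.lower().replace("n", "i")


def passes_criteria(forward, reverse, amplicon_bp, tm_forward, tm_reverse):
    """Apply the Figure 3.1 selection criteria to one candidate primer pair."""
    length_ok = 100 <= amplicon_bp <= 320
    tm_ok = all(52.0 <= tm <= 58.0 for tm in (tm_forward, tm_reverse))
    n_ok = all(p.lower().count("n") <= 3 for p in (forward, reverse))
    return length_ok and tm_ok and n_ok


if __name__ == "__main__":
    example = "acgtnry"  # hypothetical sequence, not a primer from this study
    print(degeneracy(example))
    print(degeneracy(substitute_inosine(example)))
```

Under these assumptions, replacing a single "n" with inosine reduces the size of the primer mixture four-fold, which is the rationale given in the figure for the substitution.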


Figure 3.2 Validation strategy of degenerate primers for candidate PCB-interacting genes conserved in animals.


Table 3.2 Annealing temperature for each gene's primer pair.

Temp (°C) | Genes (N=68)
43.1 | NDUFS2
45.9 | 18S rRNA*, SLC7A4
54.1 | AP2A2, GAPDH, GLUD1, TUBB2B
54.9 | CCNA2
55.1 | CYB5A, CLTC, P4HB, HSPA8, IDH2, ALDH1A1
55.2 | EEF1A1, ARNT, ATP5B, CDC5L, PGD, ETS2, HSPA1B, HNF4A
55.3 | PCK2
55.5 | SMARCB1, CCRN4L, PYGM, NF1A
55.8 | RXRα, UGCG, VDAC2
56.0 | G6PD, RPLP0, CPOX*, RPS18*, NR5A2, PCK1, COL4A1, GTPBP1
56.2 | PAK6, AMPD3, CCT5, DDC, ARID1A
56.6 | SDHA, ALAD, HSPA4, CKAP5, DLAT, PKM
57.0 | TUBA1B, CANX
57.3 | CDK2, ENO1, IGF1R, RPS8, GABPA, RAB1A, LHX2, PSAT1, ALDH2
57.6 | TUBA1C
57.9 | HSP90AB1, ALDH6A1, CAT, PDHA1, MAPK1, PLCB1
58.9 | POLR2B

* = housekeeping gene. Primers in bold were tested on Sample 2 of H. limbata.


Figure 3.3 Example visual confirmation of gene amplification using CODEHOP primers for H. limbata (H), D. rerio (D), and M. musculus (M).

Figure 3.4 Relative frequency of BLAST amplicon sequence ID.


Figure 3.5 Candidate PCB-interacting genes conserved in animals successfully amplified, sequenced, and identified using CODEHOP degenerate primers sets (N=37).


Table 3.3 Number of primer sets considered successful at each confirmation level for each species.

Number of genes

Species | Visual confirmation | Bioinformatic confirmation | False positive | False negative
H. limbata | 30 | 21 | 9 | 0
H. rigida | 36 | 23 | 14 | 1
D. rerio | 37 | 29 | 11 | 3
M. musculus | 37 | 29 | 10 | 2
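The relationship between the two confirmation levels in Table 3.3 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes that a false positive is a gene with a visible band but no correct BLAST identification, and a false negative is a gene with no visible band but a correct BLAST identification, as described in the Results. Under that reading, the bioinformatic count for a species should equal the visual count minus the false positives plus the false negatives.

```python
# Counts copied from Table 3.3:
# species -> (visual, bioinformatic, false positive, false negative)
TABLE_3_3 = {
    "H. limbata": (30, 21, 9, 0),
    "H. rigida": (36, 23, 14, 1),
    "D. rerio": (37, 29, 11, 3),
    "M. musculus": (37, 29, 10, 2),
}


def classify(visual_band, blast_match):
    """Label one gene in one species from its two confirmation results."""
    if visual_band and blast_match:
        return "confirmed"
    if visual_band and not blast_match:
        return "false positive"
    if not visual_band and blast_match:
        return "false negative"
    return "not amplified"


def expected_bioinformatic(visual, false_pos, false_neg):
    """Bioinformatic count implied by the visual count and the two error counts."""
    return visual - false_pos + false_neg


def check_table(table):
    """Return, per species, whether the row is internally consistent."""
    return {
        species: expected_bioinformatic(visual, fp, fn) == bioinformatic
        for species, (visual, bioinformatic, fp, fn) in table.items()
    }


if __name__ == "__main__":
    for species, consistent in check_table(TABLE_3_3).items():
        print(species, consistent)
```

A check of this kind is a quick way to confirm that the four columns of the table agree with one another for every species.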


Chapter 4

Assessing the targeted-transcriptome response of Hexagenia rigida exposed to PCB-52 using candidate PCB-interacting genes conserved across taxa

Introduction

One of the goals of ecotoxicology is to detect and analyze environmental toxicants collected from water, sediment, and bioindicator species and subsequently perform laboratory-based bioassays to characterize each contaminant's toxicity. These bioassays use both lethal (e.g. mortality) and sublethal (e.g. growth, reproduction) endpoints as well as bioaccumulation analyses to evaluate the potential risks these toxicants pose to wildlife and human health. More recently, a molecular evolutionary approach to transcriptomics has been used to develop multi-gene, multi-species tools that can elucidate the conserved or taxon-specific molecular events that underpin these standard ecotoxicological endpoints, as well as act as a sublethal indicator of toxicant exposure across a range of taxa (Baker et al. 2009; Veldhoen et al. 2014). These contributions can also provide valuable linkages between steps of the adverse outcome pathway (AOP), a new paradigm proposed for environmental risk assessment (ERA) (Ankley et al. 2010; Kramer et al. 2011; LaLone et al. 2013).

The objective of this chapter was to demonstrate the potential of a molecular evolutionary approach to target the transcriptome (outlined in Chapter 2), by using a set of multi-gene, multi-species molecular tools (developed in Chapter 3) and next-generation sequencing (NGS) technology, to investigate the targeted-transcriptome response of Hexagenia rigida, a common invertebrate ecotoxicological test species, to PCB-52 exposure. PCBs are well-characterized legacy contaminants that biomagnify in both terrestrial and aquatic food chains (Giesy and Kannan 1998; Kelly and Gobas 2001). PCBs are detected worldwide, with notable toxic effects across many levels of biological organization (Antunes-Fernandes et al. 2011; Boix et al. 2011; Kodavanti et al. 2011; Dutta et al. 2012; Canesi et al. 2003; Yilmaz et al. 2012; Sandal et al. 2008; Senthilkumar et al. 2011). More specifically, PCB-52 is a non-dioxin-like PCB (NDL-PCB) commonly found in the sediments of Ontario's freshwaters, as well as other parts of Canada (Kostyniak et al. 2005). It was also the congener present in the highest amounts in fish from Fox River, USA and in aquatic organisms from Beaverdams Creek, Ontario prior to remediation efforts in 2010 (Kostyniak et al. 2005; Watson-Leung, personal communication 2013). Though PCB-52 is considered to be one of the most neurotoxic NDL-PCBs due to its lower chlorination pattern (Fernandes et al. 2010), the toxic mode(s) of action, physiological response, and functional genomic response of NDL-PCBs as a chemical class are much less understood (Viluksela et al. 2012).

Hexagenia spp. (Ephemeroptera: H. limbata and H. rigida) are burrowing mayflies native to Ontario's freshwaters and are an important food source for both fish and birds (Hunt 1958; Edsall et al. 2005; Papp et al. 2007). As nymphs, Hexagenia spp. live in and consume the sediment, such that environmental toxicants, including PCBs, can readily bioaccumulate and consequently move through the food chain (Mauck and Olson 1977; Landrum and Poore 1988; Papp et al. 2007). Sensitive to environmental change, Hexagenia spp. are considered a bioindicator group for assessing ecosystem health (Edsall et al. 2005). Because Hexagenia spp. can bioaccumulate high levels of PCBs without lethal toxicity (Gerwurtz et al. 2000), they may also be good models to study the sublethal effects of PCBs at the molecular level, even though their genomes have not yet been fully sequenced. In ecotoxicology labs, Hexagenia spp. (H. limbata and H. rigida) are also routinely used in sediment exposures, and more recently, water-only exposures (Watson-Leung, personal communication 2013; Harwood et al. 2014).
Water-only exposures are often used for sediment-dwelling species to eliminate any effects sorption of organic contaminants by the sediment may have on the bioavailability of the experimental toxicant (Fremling and Mauck 1980; Harkley et al. 1994). Consequently, uptake of environmental toxicants from water occurs actively


through pumping of the gills and passively by diffusing across the body wall (Landrum and Poore 1988). H. limbata and H. rigida are morphologically cryptic at the 5 mg acute toxicity testing stage, and though differential abdominal pigmentation can distinguish them at the 20-30 mg bioaccumulation stage (Elderkin et al. 2012), they are not routinely identified to species in laboratory-based bioassays (Watson-Leung, personal communication 2013). Given that species-specific responses to environmental change have been documented for H. limbata and H. rigida (Bustos and Corkum 2013), DNA barcoding was also used to differentiate the species in this experiment.

In this study, a water-only acute experiment on Hexagenia spp. exposed to PCB-52 was performed in which survivorship and bioaccumulation were measured, and the targeted transcriptome of H. rigida was analyzed to determine differential gene expression of candidate PCB-interacting genes conserved across taxa. This chapter aimed to validate the molecular evolutionary methodology used to target the transcriptome in three ways: 1) successful generation of transcript sequences for H. rigida; 2) effective annotation of transcripts, compared to whole transcriptome analysis; and 3) significant differential expression of candidate PCB-interacting genes. I hypothesized that exposure of H. rigida to PCB-52 would alter the gene expression of the candidate PCB-interacting genes. Based on previous studies in a range of animals showing that exposure to environmental pollution, including PCBs, affects gene expression (Triebskorn et al. 2002; Menzel et al. 2007; Menzel et al. 2009; Kodavanti et al. 2011; Pujolar et al. 2012; Garcia et al. 2012; Rhee et al. 2013), I predicted that the expression of genes belonging to the following pathways would increase upon exposure to PCB-52: genetic information processing, including translation (EEF1A1) and protein sorting, folding, and degradation (HSP90AB1, HSPA8), and environmental information processing, including signaling (ETS2). I also predicted that the expression of genes belonging to the following pathways would decrease upon exposure to PCB-52: energy metabolism (ATP5B), carbohydrate metabolism (PGD, PKM), amino acid metabolism (ALDH6A1), and cellular processes, including cell motility (TUBA1C) and transport and catabolism (AP2A2).


Furthermore, insight into the biological processes that may be disrupted by exposure to PCB-52 in H. rigida may also be gained from this targeted-transcriptome approach.

Materials and Methods

Exposures

Hexagenia spp. eggs (a mixture of H. limbata and H. rigida) were collected from gravid females from Lake St. Clair in spring 2013 (by Dr. J. Ciborowski, University of Windsor) and stored, hatched, and reared in the Aquatic Toxicology Unit, Laboratory Services Branch at the MOECC according to standard government procedures (MOECC 2014). All water used in culturing and testing was dechlorinated City of Toronto tap water. Bioaccumulation-size (20-30 mg) Hexagenia spp. were used for the water-only acute toxicity test (MOECC 2012); average mass was 20.1 mg. On test set-up day, four (2 L) plastic vessels were cleaned with Eliminase, thoroughly rinsed with distilled water, and dried. For this study, Hexagenia spp. were exposed to two concentrations of PCB-52: 0.033 µg/L and 0.12 µg/L. These concentrations were chosen based on a range of environmentally relevant concentrations of PCBs in pore water published in the literature (Oen et al. 2011; Martinez et al. 2013). PCB-52 (Ultra Scientific Analytical Solution, North Kingstown, RI) was dissolved in acetone (0.003%) to reach each PCB-52 test concentration. A water-only control and an acetone carrier control (0.003%) were also included in this experiment. Water-only exposures were carried out in 2 L plastic vessels, each containing 80 Hexagenia spp. specimens per treatment. There were no replicate vessels per treatment. Because Hexagenia nymphs are burrowing organisms, they require artificial substrates in which they can hide and satisfy their thigmotactic requirements (Fremling and Mauck 1980). Studies have shown that test nymphs swim constantly and continuously try to burrow in the bottom of the test vessel when artificial substrates are not provided, which fatigues the nymphs and may make them more susceptible to toxicants (Fremling and Mauck 1980). As such, 20 silicone tubes and two pieces of plastic mesh were added to each


vessel to satisfy thigmotactic requirements. The water was warmed to 22.5°C at test initiation with minimum aeration. Water quality parameters (pH, conductivity, dissolved oxygen, and temperature) were measured for each test vessel at the beginning and end of the experiment, according to government protocols (MOECC, 2012). After the 4-day exposure period, survival was recorded. From each vessel, 10 living organisms were randomly removed with tweezers, flame-sterilizing tweezers between each removal, and individually placed in 5mL RNase-free/DNase- free tubes containing RNAlater (Ambion, Life Technologies, Inc., Burlington, ON). All RNAlater filled tubes with organisms were stored at -20.0°C. The remaining organisms (N=70/treatment) were collected and sent for tissue analysis to measure bioaccumulation. Water samples (1L) were taken for PCB analysis from each vessel at 0 hour and 96 hour time periods according to the MOECC guidelines.

Tissue bioaccumulation of PCB-52 Bioaccumulation of PCB-52 in lipid tissue of Hexagenia spp. was determined by gas/liquid chromatography-electron capture detection (GLC-ECD) using standard methods for the determination of polychlorinated biphenyls, organochlorines and chlorobenzenes in fish, clams and mussels by the MOECC (MOECC 2011).

Selection of candidate genes

Ten candidate genes for targeted RNA-seq analysis were previously selected based on three criteria: 1) they are evolutionarily conserved (discussed in Chapter 2), 2) they respond to PCBs, based on the literature (discussed in Chapter 2), and 3) they are amplifiable across taxa using degenerate primers (discussed in Chapter 3). The candidate genes shown in Table 4.1 come from a breadth of cellular pathways: Genetic Information Processing (EEF1A1, HSPA8, and HSP90AB1), Metabolism (ALDH6A1, PGD, PKM, and ATP5B), Cellular Processes (TUBA1C and AP2A2), and Environmental Information Processing (ETS2). Two housekeeping genes (18S rRNA and RPS18) were also selected for analysis.


DNA barcoding for species verification

Given that the culture of Hexagenia spp. reared at the MOECC is a mixture of H. rigida and H. limbata, a leg from each individual was removed for DNA barcoding. DNA was extracted from each leg using a silica matrix method with the NucleoSpin® Tissue Kit (Macherey-Nagel, GmbH & Co, Bethlehem, PA). Using Folmer's forward and reverse barcoding primers, the DNA barcode region of COI was amplified through PCR (Folmer et al. 1994; Hebert et al. 2003). The PCR protocol used was: 94°C for 1 minute; 5 cycles of 94°C for 40 seconds, 45°C for 40 seconds, and 72°C for 1 minute; 35 cycles of 94°C for 40 seconds, 51°C for 40 seconds, and 72°C for 1 minute; and finally, 72°C for 5 minutes. PCR products were run on a 1.5% agarose gel for visualization. Amplicons were prepared for Sanger sequencing. Forward and reverse sequences were determined using an Applied Biosystems Genetic Analyzer (Thermo Fisher Scientific, USA). Using default parameters in CodonCode Aligner (http://www.codoncode.com/aligner), sequences were trimmed and compared against GenBank using MEGAblast (Altschul et al. 1990) to identify the species, retaining only the top hit.

Total RNA extraction for targeted-transcriptomics

Based on the DNA barcoding results, only RNAlater-preserved H. rigida samples (N=29; see Table 4.3 for number of H. rigida per treatment) were lysed individually in liquid nitrogen and homogenized using a QIAshredder homogenizer (Qiagen, Germany). Total RNA purification was performed using the standard silica matrix method with the RNeasy Mini Kit for total RNA extraction from animal tissues (Qiagen, Germany), incorporating on-column DNase digestion (Qiagen, Germany) to remove any genomic DNA contamination prior to final elution of RNA. RNA from each sample was eluted in two successive steps with 30 µL and 20 µL of RNase-free water, respectively. The concentration and integrity of each RNA sample were measured spectrophotometrically using a NanoDrop 1000 (Thermo Fisher Scientific, USA). All total RNA preparations had purity values of 2.05-2.18 (A260/A280), indicating the presence of high-quality RNA. Total RNA was stored at -80°C until preparation of cDNA.


RT-PCR for cDNA synthesis

First-strand cDNA synthesis was performed on each sample in triplicate using the SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen, California, USA), such that there were three technical replicates for each biological sample. Each RNA/primer mixture contained 500 ng of total RNA, 1 µL random hexamers (50 ng/µL), 1 µL dNTP mix (10 mM), and a calculated volume of DEPC-treated water for a total reaction volume of 10 µL. To this reaction mixture, 10 µL of the following cDNA synthesis mix was added: 2 µL 10X RT buffer, 4µL mM MgCl2, 2 µL 0.1 M DTT, 1 µL RNaseOUT (40 U/µL), and 1 µL SuperScript III RT (200 U/µL), for a total reaction volume of 20 µL. Reverse transcription of each sample was carried out as follows: incubation for 50 min at 50°C, termination for 5 min at 85°C, and addition of 1 µL of RNase H to each sample tube followed by incubation for 20 min at 37°C. The concentration and integrity of the final cDNA were measured via UV absorption (260/280) using a NanoDrop 1000 (Thermo Fisher Scientific, USA). All cDNA preparations had purity values of 1.72-1.87. cDNA was stored at -20°C until PCR amplification.
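The "calculated volume of DEPC-treated water" in the RNA/primer mixture follows from the RNA concentration of each sample. A minimal Python sketch of that calculation is given below; the function name and the example concentration are hypothetical, and the fixed volumes (1 µL random hexamers, 1 µL dNTP mix, 10 µL total) are those stated above.

```python
def rna_primer_mix(rna_ng_per_ul, rna_ng=500.0, hexamers_ul=1.0,
                   dntp_ul=1.0, total_ul=10.0):
    """Return the RNA and water volumes (µL) for one RNA/primer mixture.

    rna_ng_per_ul is the measured RNA concentration of the sample; the other
    defaults are the amounts described in the text.
    """
    if rna_ng_per_ul <= 0:
        raise ValueError("RNA concentration must be positive")
    rna_ul = rna_ng / rna_ng_per_ul
    water_ul = total_ul - hexamers_ul - dntp_ul - rna_ul
    if water_ul < 0:
        # The sample is too dilute to deliver 500 ng within the 10 µL mixture.
        raise ValueError("RNA too dilute for the stated reaction volume")
    return {"rna_ul": rna_ul, "water_ul": water_ul}


if __name__ == "__main__":
    # Hypothetical sample at 125 ng/µL.
    print(rna_primer_mix(125.0))
```

With these volumes, a sample would need an RNA concentration of at least 62.5 ng/µL for 500 ng to fit within the 8 µL left after the hexamers and dNTP mix are added.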

Library preparation for Illumina MiSeq sequencing

cDNA was diluted for all samples by 1/10 (6 µL cDNA to 54 µL) using Molecular Biology Grade (MBG) water. Using 2.0 µL of the diluted cDNA, along with 17.5 µL MBG H2O, 2.5 µL Buffer, 1.0 µL MgCl2, 0.5 µL dNTPs, 2.0 µL Platinum Taq, and 0.5 µL each of forward and reverse primers, PCR for each gene (for all 87 samples) was performed using gene-specific primers with adapter tails according to each gene's specific annealing temperature (Table 4.1). The PCR was carried out as follows: 2 min at 94°C; 30 cycles of 40 s at 94°C (melting step), 1 min at 43-59°C (annealing step), and 30 s at 72°C (elongation step); 5 min at 72°C; hold at 10°C. Amplicons were checked by running the same row (Row F of the 96-well plate) on a 1.5% agarose gel for visualization. For each sample, 5 µL of the amplicons of each gene were pooled. All pooled samples were purified to remove primers, unused dNTPs, enzymes, primer dimers, salts, and other impurities using the MinElute® PCR Purification Kit (Qiagen, Germany). The concentration of each amplicon library


was assessed using PicoGreen® dsDNA Quantitation Reagent on the Turner Biosystems TBS-380 Mini-Fluorometer. Amplicon library concentrations ranged from 7.12 to 16.85 ng/µL. A DNA bioanalyzer (model 1000, Agilent Technologies, Germany) was used to determine the fragment sizes within the amplicon libraries using the DNA chip DNA 7500 kit (Agilent Technologies, Germany) and DNA chip 7500 reagents kit (Agilent Technologies, Germany).

Sequencing on Illumina MiSeq

The generated amplicon libraries were dual indexed with Nextera XT indexes (FC-131-1002), using 9 cycles only to attach the indexes. Equimolar portions of each indexed library were pooled and sequenced on a single MiSeq flowcell using the V2 MiSeq 500-cycle sequencing kit (250 x 2) (MS-102-2001), following the Illumina MiSeq Paired End Sequencing Protocol, Part # 15039740 Revision C (Illumina, USA).
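Equimolar pooling requires converting each library's mass concentration and mean fragment length into a molar concentration. The sketch below is illustrative only and does not reproduce the calculation actually performed: it assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, and the function names, the 300 bp fragment length, and the 50 fmol target are hypothetical. The two concentrations are the ends of the range reported above.

```python
def library_nanomolar(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration from ng/µL to nM.

    Assumes an average mass of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def pooling_volumes(libraries, fmol_each):
    """Return the volume (µL) of each library that contains fmol_each femtomoles.

    libraries maps a library name to (concentration in ng/µL, mean size in bp).
    Because 1 nM equals 1 fmol/µL, volume = fmol_each / nM.
    """
    return {
        name: fmol_each / library_nanomolar(conc, size_bp)
        for name, (conc, size_bp) in libraries.items()
    }


if __name__ == "__main__":
    # Hypothetical libraries; 300 bp is an assumed mean fragment size.
    example = {"library_low": (7.12, 300), "library_high": (16.85, 300)}
    for name, volume in pooling_volumes(example, fmol_each=50.0).items():
        print(name, round(volume, 2), "µL")
```

The more concentrated library contributes a proportionally smaller volume, so that every library is represented by the same number of molecules in the pool.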

Bioinformatics pipeline

Data pre-processing

Raw reads were demultiplexed using Usearch software (http://www.drive5.com/usearch/). Sequences were filtered and trimmed using SeqPrep software (https://github.com/jstjohn/SeqPrep) based on the following protocol: 1) remove adapter sequences from FASTQ files; 2) perform quality control on unpaired ends separately using a sliding window of 10 bp with steps of 5 bp to remove sequences in which bases have Phred scores of less than 20 (99% accuracy of base call); 3) remove any sequences shorter than 50 bp; 4) convert FASTQ files to FASTA files; and 5) remove primers. Sequences were also denoised and clustered at 99% identity using Usearch (http://www.drive5.com/usearch/).
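The sliding-window quality step and the minimum-length step of this protocol can be sketched in Python as follows. This is not the SeqPrep implementation; it is a simplified illustration that assumes a read is rejected when the mean Phred score of any window falls below 20, and the function names are hypothetical.

```python
def phred_to_accuracy(q):
    """Base-call accuracy implied by a Phred score (Q20 corresponds to 99%)."""
    return 1.0 - 10 ** (-q / 10.0)


def window_means(quals, window=10, step=5):
    """Mean Phred score of each window along a read.

    Windows start every `step` bases; the last windows may be shorter than
    `window` so that every base is covered by at least one window.
    """
    means = []
    for start in range(0, len(quals), step):
        chunk = quals[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def passes_filter(quals, min_q=20, min_len=50, window=10, step=5):
    """Keep a read only if it is long enough and every window meets min_q."""
    if len(quals) < min_len:
        return False
    return all(mean >= min_q for mean in window_means(quals, window, step))


if __name__ == "__main__":
    good_read = [30] * 60                        # hypothetical quality scores
    low_region = [30] * 20 + [5] * 10 + [30] * 30
    print(passes_filter(good_read), passes_filter(low_region))
```

A read with a short stretch of low-quality bases is removed even if its overall average quality is high, which is the purpose of using a window rather than a whole-read mean.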

Annotation and alignment A non-redundant refseq_rna library (www.ncbi.nlm.nih.gov/refseq/) customized for each gene (1521 concatenated sequences, approximately 100-120 sequences per gene from conventional organisms) was created. Clusters were


generated via comparison against this local library using the following parameters: 50% similarity, word size 10, E10, and filtering hits to a minimum of 50 bps alignment length. Each biological sample generated six files (three technical replicates, each with two directions, forward and reverse) with counts for each gene.

Statistical analysis on technical replicates

Bartlett's test (Bartlett 1937) for homoscedasticity (equal variance across samples) was performed on the technical replicates of each biological sample in each treatment, comparing the standard deviations of the measurement variable (number of counts of each gene); the technical replicate data were homoscedastic (p>0.05). The Kruskal-Wallis test (Kruskal and Wallis 1952) is a non-parametric method for testing whether the mean ranks of the measurement variable are significantly different among members of a group (similar to a one-way analysis of variance, by mean ranks); this test requires that the data be homoscedastic. The Kruskal-Wallis test can be used as a quality control check of the experimental methods (RNA extraction, cDNA synthesis, PCR, NGS). Consequently, a significant Kruskal-Wallis test result (p<0.05) can be used as a guide to filter out a biological sample in which the mean ranks of gene counts among the technical replicates are not the same (Vaux et al. 2012). For 84 out of 87 technical replicates, p>0.05. For G2X1, G2X2, and G2X3, p<0.05; therefore, biological sample G2 was removed from subsequent data analysis. Technical replicates were collapsed such that each biological sample is the sum of all six technical replicate files per gene. Pooling technical replicates increases sequencing depth for differential gene expression analysis (Wang et al. 2010), is a common method in differential gene expression analysis using NGS data (Robinson et al. 2010; Wang et al. 2010; Love et al. 2014), and is a requirement for edgeR software in R (Robinson et al. 2010).
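Collapsing the technical replicates amounts to summing the per-gene counts across the six count files of each biological sample. A minimal Python sketch is shown below; the function name and the counts are hypothetical and serve only to illustrate the summation, not the software actually used.

```python
from collections import Counter


def collapse_replicates(replicate_counts):
    """Sum per-gene counts over all technical replicate files of one sample.

    replicate_counts is a list of dictionaries, one per count file
    (three technical replicates, each with forward and reverse reads),
    mapping gene name to read count.
    """
    total = Counter()
    for counts in replicate_counts:
        total.update(counts)  # Counter.update adds counts gene by gene
    return dict(total)


if __name__ == "__main__":
    # Hypothetical counts for two genes across six files.
    files = [
        {"HSP90AB1": 120, "TUBA1C": 80},
        {"HSP90AB1": 118, "TUBA1C": 79},
        {"HSP90AB1": 131, "TUBA1C": 90},
        {"HSP90AB1": 129, "TUBA1C": 88},
        {"HSP90AB1": 125, "TUBA1C": 85},
        {"HSP90AB1": 127, "TUBA1C": 83},
    ]
    print(collapse_replicates(files))
```

A gene that is absent from one file simply contributes nothing from that file, so missing entries are treated as zero counts in this sketch.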


Differential gene expression analysis

Several studies (Rapaport et al. 2013; Schurch et al. 2015) have evaluated many RNA-seq analysis methods and software programs, including edgeR, DESeq, Cuffdiff, DEGseq, baySeq, and limma, based on performance criteria, such as accuracy and sensitivity of detecting differentially expressed genes, as well as the influence of biological replicate number on performance criteria. Though no single method is considered superior across all categories of comparison, edgeR is a leading tool for many RNA-seq studies when trying to control the false discovery rate, especially in those studies with fewer than twelve biological replicates per treatment (Dillies et al. 2012; Schurch et al. 2015). Gene counts were processed with edgeR, an empirical analysis of digital gene expression data (Robinson et al. 2010), in the R software environment. Genes that did not reach at least two counts per million reads in at least three biological replicates per treatment were removed from analysis because they are not likely to impact differential expression analysis (Robinson et al. 2010). Gene counts were Trimmed Mean of M-values (TMM) normalized (Robinson et al. 2010), which corrects for biases due to different sequencing depth and/or relative expression of other transcripts between samples, and is recommended for RNA-seq data (Dillies et al. 2012; Schurch et al. 2015). In edgeR, TMM is computed by eliminating the 30% of genes that contain the most extreme M-values (i.e. log-fold changes) for the samples being compared (Robinson and Oshlack 2010; Finotello and Di Camillo 2014); the resulting factor is then used to normalize the entire data set. Differential gene expression analysis was performed using the exact test in edgeR by computing a log2-fold change for each gene between control and treatment groups. A gene was considered significantly differentially expressed if its absolute log2-fold change was greater than or equal to 1, with a p-value less than 0.05.
P-value correction for multiple testing was not incorporated because the analysis used only eleven genes; this correction lowers the p-value for statistical significance to reduce the chance of Type 1 error, but it is typically used for much larger data sets, with hundreds to thousands of genes in which false positives are more likely to occur (Jelaso et al. 2003; Diz et al. 2011).
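The filtering and significance rules described above can be illustrated with a minimal Python sketch. This is not the edgeR implementation used in the thesis (which runs in R); the function names, the example counts, and the p-values below are hypothetical and chosen only for illustration. The filter is written under the assumption that a gene is retained when it reaches at least two counts per million in at least three replicates.

```python
def counts_per_million(counts):
    """Scale one sample's raw gene counts to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def passes_cpm_filter(cpm_values, min_cpm=2.0, min_replicates=3):
    """Return True if at least `min_replicates` replicates reach `min_cpm`.

    Assumption: this mirrors the thesis rule of two CPM in three replicates.
    """
    return sum(value >= min_cpm for value in cpm_values) >= min_replicates


def is_differentially_expressed(log2_fc, p_value,
                                min_abs_log2_fc=1.0, alpha=0.05):
    """Apply the two-part rule: |log2 fold change| >= 1 and p < 0.05."""
    return abs(log2_fc) >= min_abs_log2_fc and p_value < alpha


# Hypothetical CPM values for one gene across four replicates of a treatment.
example_cpm = [2.5, 3.0, 0.0, 4.1]
print(passes_cpm_filter(example_cpm))            # True: three replicates reach 2 CPM

# Hypothetical p-values paired with log2 fold changes of the size reported here.
print(is_differentially_expressed(-2.0, 0.01))   # True: large change, small p
print(is_differentially_expressed(-0.4, 0.01))   # False: change too small
print(is_differentially_expressed(-2.7, 0.20))   # False: p not below 0.05
```

The last call shows why a large fold change alone is not enough under this rule: both conditions must hold for a gene to be called differentially expressed.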


Results

Water quality parameters and chemistry results

All measured water quality parameters were within suitable ranges for Hexagenia spp. well-being and are summarized as follows: temperature: 22.35 ± 0.19 °C, dissolved oxygen: 8.15 ± 0.76 mg/L, pH: 8.15 ± 0.14, conductivity: 354.88 ± 63.67 µS/cm. The measured concentrations of PCB-52 for each test concentration at 0 hours and 96 hours are shown in Table 4.2. Due to the drastic drop (-97%) in PCB-52 concentration at 96 hours for the 0.12 µg/L concentration, the actual exposure concentration bioavailable during the test is uncertain; concentrations of organic contaminants can decrease due to sorption to exposure vessels (Harkley et al. 1994). Consequently, this concentration is excluded, and only results from the lower exposure concentration, 0.033 µg/L PCB-52, are included in the gene expression results and discussion.
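As a quick arithmetic check on the reported drop, the percent change can be computed from the 0-hour and 96-hour water concentrations listed in Table 4.2. The helper name below is hypothetical, and the table values are rounded to three decimals, so the result only approximates the figure quoted in the text.

```python
def percent_change(initial, final):
    """Percent change from an initial to a final concentration."""
    return 100.0 * (final - initial) / initial


# Measured PCB-52 water concentrations (µg/L) from Table 4.2.
high_dose = percent_change(0.120, 0.003)
low_dose = percent_change(0.033, 0.033)

print(round(high_dose, 1))  # -97.5, in line with the roughly -97% drop reported
print(round(low_dose, 1))   # 0.0, the lower concentration was stable
```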

Survivorship and bioaccumulation results

Survivorship of the 80 individuals per treatment is summarized in Table 4.2. There were no significant differences in survivorship between groups (Kruskal-Wallis test, p=0.406).

DNA Barcoding Summary

The species identity of each individual in the experiment was determined by DNA barcoding and is summarized in Table 4.3. Based on the uneven distribution of H. rigida and H. limbata across treatments and the necessity of at least 3 biological samples per treatment for reliable differential gene expression analysis (Liu et al. 2014), only H. rigida individuals were used in the subsequent RNA-seq analysis workflow.
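The replicate requirement can be made concrete with a small Python sketch that checks, for each species, whether every treatment in Table 4.3 has at least three barcoded individuals. The data structure and function name are hypothetical; the counts are taken from Table 4.3.

```python
MIN_REPLICATES = 3  # minimum biological samples per treatment (Liu et al. 2014)

# Number of barcoded individuals per species and treatment (Table 4.3).
individuals = {
    "Water-only control": {"H. rigida": 8, "H. limbata": 2},
    "Acetone control": {"H. rigida": 5, "H. limbata": 5},
    "0.033 µg/L PCB-52": {"H. rigida": 9, "H. limbata": 1},
    "0.12 µg/L PCB-52": {"H. rigida": 7, "H. limbata": 3},
}


def species_with_enough_replicates(table, minimum):
    """Return species that reach `minimum` individuals in every treatment."""
    species = set().union(*(row.keys() for row in table.values()))
    return sorted(
        name for name in species
        if all(row.get(name, 0) >= minimum for row in table.values())
    )


print(species_with_enough_replicates(individuals, MIN_REPLICATES))
# ['H. rigida']
```

H. limbata falls below three individuals in the water-only control and the 0.033 µg/L treatment, which is why only H. rigida satisfies the rule across all groups.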

Sequencing and Annotation Results

Over 12 million reads were obtained via high-throughput sequencing. Pre-processing generated 11,106,331 good length, good quality (GLGQ) reads. Of the GLGQ reads, 92.5% (11,105,409) annotated to one of the targeted genes in the local library. Table 4.4 details sequencing and annotation results. Because the PK gene did not reach at least two counts per million reads in at least three biological samples per treatment, it was removed from analysis.

Targeted RNA-seq gene expression

Statistical evaluation of normalized read counts revealed that three genes (HSP90AB1, ALDH6A1, and TUBA1C) in the 0.033 µg/L PCB-52 treatment were significantly differentially expressed compared to the control; these genes showed a significant decrease in expression (Figure 4.1). The acetone (carrier solution) control showed no significant changes in gene expression for any gene. The expression of the housekeeping gene 18S rRNA did not significantly change across any treatment. Table 4.5 details gene expression changes across treatments.

Discussion and Conclusion

Targeted-transcriptome advantages

In this water-only 96-hour experiment, the targeted transcriptome of H. rigida exposed to PCB-52 was analyzed to determine differential gene expression of PCB-interacting genes conserved across taxa. Survivorship of Hexagenia spp. across treatments and bioaccumulation of PCB-52 were also measured. Though H. rigida is a common ecotoxicological test organism, its genome has not been sequenced. To my knowledge, this is the first study to assess gene expression in any member of the Ephemeridae family of mayflies in the context of a chemical stressor. In addition, my molecular evolutionary methodology showed that a targeted-transcriptome approach, one that includes the identification of conserved target genes as well as the design and validation of degenerate primers, can be successfully applied in typical ecotoxicological testing using a conventional, yet non-model, test organism such as H. rigida. The generation of sequence information for all the genes used in this study increases the repository of sequence information available for H. rigida. This sequence information can be analyzed in future studies probing the genetic variation in these genes, in order to determine any isoforms, haplotypes, and/or SNPs that may exist.


Using a targeted-transcriptome approach, 92.5% of the GLGQ transcripts annotated to one of the genes in the local library. This annotation efficacy is far higher than that of other studies that use the whole-transcriptome approach of RNA-seq in other non-model organisms. For example, in a xenobiotic study involving Aphis glycines, a soybean pest, only 30% of transcripts had one or more hits to protein sequences in the online refseq_protein database in GenBank (Bansal et al. 2014). Other studies using RNA-seq in non-model insects also show similar annotation efficacy (Mittapalli et al. 2010; Bai et al. 2011; Chen et al. 2012). As such, whole-transcriptome approaches in non-model organisms produce large data sets, of which only a small portion may be useful and/or valuable for analysis.

Using the targeted-transcriptome approach to analyze gene expression of candidate PCB-interacting genes also revealed the finer detail that can be captured compared to measuring survival and/or tissue concentrations of PCBs alone. In this bioassay, survivorship of Hexagenia spp. was similar across all treatments; PCB-52 was not lethal at either concentration. Several studies of invertebrates, including Hyalella azteca and Daphnia magna, exposed to NDL PCBs also show no effect on survival when exposed to much greater concentrations (Borgmann et al. 1990; Dillon et al. 1990). Similarly, the bioaccumulation results are much lower than the range of PCB concentrations in the tissues of other invertebrates that show toxic effects, perhaps due to the low concentrations of PCB-52 used in this experiment (Borgmann et al. 1990). Though there was no significant difference in survival of Hexagenia spp. as a population and tissue concentration of PCB-52 was minimal, significant differences in the expression of three genes (HSP90AB1, TUBA1C, and ALDH6A1) in the 0.033 µg/L treatment indicate the presence of sublethal effects at the subcellular level. These sublethal effects are not a measure of toxicity, since PCB-52 was not lethal in this experiment, but may be more a measure of xenobiotic-induced disruption to certain biological processes.


Targeted-transcriptome response to PCB-52

HSP90AB1

The various members of the heat shock protein 90 family are highly conserved across taxa. In many different invertebrate species, the presence of a single HSP90 isoform is most common (Gupta et al. 1995; Picard et al. 2002); however, more recently, two different isoforms of HSP90 have been isolated in the marine crab, Portunus trituberculatus (Zhang et al. 2009), and in the honey bee, Apis mellifera (Xu et al. 2010). Residing in the cytoplasm and ATP-dependent, HSP90 has many cellular functions, including: intracellular transport (protein secretion and trafficking), maintenance, cell signaling, and acting as a chaperone to facilitate the folding/unfolding and assembly/disassembly of many molecules (McClellan et al. 2007; Pearl et al. 2008). In particular, HSP90 associates with tubulin (Gupta et al. 2010; see discussion below) and is involved in the reorganization of the microtubular network of eukaryotes (Krtková et al. 2012). Also, during times of environmental stress, HSP90 is required for the progression of the cell cycle (including cytokinesis and meiosis) (McClellan et al. 2007).

Previous studies in various invertebrate species have shown the expression of HSP90 can be significantly induced by different environmental stressors (Table 4.6). In this study, however, HSP90 was found to have decreased expression following the 0.033 µg/L PCB-52 exposure (log2FC = -2.0). Though HSP90 is down-regulated internally during diapause in some insects (Aruda et al. 2011; Tachibana et al. 2004), the only diapause state for Hexagenia spp. occurs in unhatched eggs, a different life stage than the nymphal life stage used in this experiment (Giberson and Rosenberg 1992).

HSP90 also functions in ecdysteroid (steroid hormone) maturation in invertebrates. 20-hydroxyecdysone (20E) and juvenile hormone (JH) are two hormones important for growth and development, especially during the molting process (Jones et al. 2002; Liu et al. 2013). Studies on insects and other invertebrates also show 20E and JH upregulate the expression and nuclear translocation of HSP90 (Chang et al. 1999; Liu et al. 2013). Furthermore, in the moth, Helicoverpa armigera, HSP90 is necessary for gene expression in the 20E and JH pathways by interacting with different proteins once nuclear translocation has occurred (Liu et al. 2013). In addition, studies show HSP90 is required for proper ecdysteroid receptor (EcR) folding in Drosophila (Arbeitman and Hogness 2000). Given that HSP90 is integral to both the 20E and JH pathways, PCB-52 induced down-regulation of HSP90 may interfere with normal functioning of ecdysteroid signaling, ecdysteroid synthesis, and/or molting in H. rigida. Though further work is needed to study the effects of PCBs on other genes in the ecdysteroid pathway and JH pathway, studies have shown that PCBs (DL, NDL, and mixtures) can disrupt molting in other invertebrates, including the water flea and fiddler crab (Fingerman and Fingerman 1977; Zou and Fingerman 1997). A mechanism of action of PCB-induced molting disruption, however, remains unclear (Zou 2005).

Reduced HSP90 gene expression in invertebrates also produces physical deformities. Inhibition of Drosophila HSP90 (either by mutation or pharmacologically induced) caused deformed legs and eyes, and showed wing, thorax, and bristle abnormalities in adults (Rutherford and Lindquist 1998). In C. elegans, down-regulation of HSP90 caused defects in muscle cells, leading to muscular structure abnormalities (Gaiser et al. 2011). A morphological comparison of both nymphs and adult H. rigida exposed to PCB-52 during development, either naturally or in the lab, compared to non-exposed H. rigida, would be an area of future work, to make connections between exposure to NDL PCBs, reduced gene expression of HSP90, and possible phenotypic variation of adults.

TUBA1C

Tubulin is a highly conserved superfamily of globular proteins; α-tubulin and β-tubulin heterodimers form the major component of microtubules. Microtubules act as a scaffold to determine cell shape, as well as a backbone on which organelles and vesicles can move. Microtubules, along with microfilaments and intermediate filaments, form the cytoskeleton, and together, these proteins help regulate cell growth and movement (Parker et al. 2014).

Interestingly, microtubules are also associated with steroid hormone biosynthesis in both vertebrates and invertebrates. In mammals, several studies confirm the association between disrupting microtubule stability and impairment of steroidogenesis (Rajan et al. 1985; Carnegie et al. 1987; Benis and Mattson 1989). Insect studies have shown an increase in expression of tubulin following exposure to 20E in Chironomus tentans (Fretz et al. 2001), as well as a decrease in tubulin signal from microtubules in ecdysone-deficient D. melanogaster mutants (Jin et al. 2005). In many insect species, microtubules are also thought to mediate intracellular transport of EcR to the nucleus, as well as aid the translocation of ecdysteroid precursors, stimulating ecdysteroidogenesis (Watson et al. 1996; Vafopoulou 2009). Recently, Vafopoulou and Steel (2012) proposed this nucleocytoplasmic shuttling occurs via the HSP90-EcR complex using microtubules as a mechanism for transport.

In this experiment, expression of TUBA1C in H. rigida significantly decreased following exposure to 0.033 µg/L PCB-52 (log2FC = -1.0). Experiments on mammalian systems also show a reduction of α-tubulin following exposure to other NDL PCBs. For example, the TUBA1C protein was less abundant following PCB 153 exposure in MCF-7 cells (human mammary adenocarcinoma cells) (Lassere et al. 2012). In three-month-old rats perinatally exposed to PCB 138, protein expression of TUBA1C was also reduced (Campagna et al. 2011). Moreover, tubulin assembly into spindle formation was affected in mouse oocytes exposed to Aroclor 1254 in vitro (Liu et al. 2015). Reduction of TUBA1C gene expression following PCB-52 exposure, coupled with the involvement of tubulin and its interaction with HSP90 during steroidogenesis, is further evidence that NDL PCBs may interfere with normal ecdysteroid pathway functioning in H. rigida. Additional studies of other genes integral to cytoskeleton formation, which also play a role in steroidogenesis (e.g. actin, as reviewed in Sewer and Li 2008), may further illuminate the connection between NDL PCBs and disruption to steroid hormone maturation in insects.

ALDH6A1

Aldehyde dehydrogenases are a superfamily of conserved proteins that catalyze the oxidation of many aldehyde substrates and play a role in cellular protection. Aldehyde dehydrogenases can influence neuronal function and are also associated with xenobiotic metabolism (Marchitti et al. 2008; Koppaka et al. 2012). Located in the mitochondria, ALDH6A1 is a CoA-dependent enzyme involved in valine and pyrimidine catabolism. ALDH6A1 catalyzes the decarboxylation of malonate semialdehyde and methylmalonate semialdehyde to acetyl-CoA and propionyl-CoA at the distal end of the L-valine pathway (Marchitti et al. 2008).

Though mutations in the human ALDH6A1 gene cause a rare autosomal recessive disorder, insight into ALDH6A1 function can be gained by reviewing the associated biological effects. In humans, ALDH6A1 deficiency leads to metabolic abnormalities, including transient methylmalonic acidemia (the inability to properly process proteins and fats), and is associated with dysmyelination and severe psychomotor and developmental delays (Marchitti et al. 2008; Sass et al. 2012; Marcadier et al. 2013). A role for ALDH6A1 involving lipid metabolism, in both vertebrates and invertebrates, is also supported in the literature. Upregulation of ALDH6A1 gene expression occurs during adipogenesis in rats (Kedishvili et al. 1994) and in the fat cells attached to the mandibular glands that synthesize royal jelly of queen honey bees (Hasegawa et al. 2009). Conversely, down-regulation of either ALDH6A1 protein or gene expression has been documented in numerous studies, including: obese individuals post-diet (Bouwman 2009), obese individuals with type 2 diabetes (Dharuri et al. 2014), and hibernating black bears and ground squirrels that use fat stores as the source for energy during winter months (Fedorov et al. 2011; Rose et al. 2011).

Interestingly, neurological and endocrine-related abnormalities are also evident with PCB-induced changes to ALDH6A1 expression in mammals. Four-month-old rats perinatally exposed to PCB-52 showed a reduction of peripheral blood ALDH6A1 gene expression (DeBoever et al. 2013) along with impaired motor coordination (Boix et al. 2010). ALDH6A1 protein has also been identified as an androgen-dependent and developmentally regulated epididymal sperm protein in rats (Suryawanshi et al. 2012). Mice exposed to Aroclor 1254 showed a decrease in ALDH6A1 gene expression in the epididymis coupled with impairment of epididymal tight junctions (Cai et al. 2013). Given these biological associations with changes in ALDH6A1 gene expression in other animals, significant down-regulation of ALDH6A1 in H. rigida (log2FC = -1.3) following exposure to 0.033 µg/L PCB-52 may indicate that this NDL PCB congener interferes with certain aspects of lipid metabolism and can affect motor control, though future studies on both nymphs and adults are required to make these possible connections clearer.

Non-differentially expressed genes

Six of the targeted genes (EEF1A1, HSPA8, PGD, ATP5B, AP2A2, ETS2) did not show any significant difference in expression following exposure to 0.033 µg/L PCB-52 in H. rigida. There are several possible reasons for this lack of differential expression. Firstly, these evolutionarily conserved genes were chosen based on their known interaction with any PCB congener (both DL and NDL) in at least one animal species. These genes may not interact with PCB-52 specifically and/or these genes may not interact with PCBs in H. rigida.

Secondly, targeting gene expression at the end of the 96-hour acute test provides only one time point for analysis. Temporal gene expression experiments aim to uncover time-sensitive responses to complex biological processes, including those resulting from xenobiotic exposure (Androulakis et al. 2007). Sometimes a gene can be differentially expressed at one time point and not another during an exposure experiment. For example, time series gene expression studies on the larval fathead minnow (Pimephales promelas) exposed to the phenylpyrazole insecticide fipronil show differences in the expression of genes involving cellular metabolism and endocrine function when early and late time points were measured (Beggel et al. 2012). Though measuring gene expression at different time points was beyond the scope of this project, it is an area of future work for the non-differentially expressed genes in our gene set.

In addition to temporal gene expression, the instar of the nymphal life stage of H. rigida used in this experiment may also be a factor. Though the 20-30 mg nymphs of H. rigida are useful for bioaccumulation analysis, the smaller 5 mg nymphs are more sensitive to pollution (Watson-Leung, personal communication 2013). Studies on mice have also shown different responses to xenobiotics between life stages, which in part may be due to differences in expression of xenobiotic metabolizing enzymes and transporters (Lee et al. 2011).

Lastly, it is possible that the 0.033 µg/L PCB-52 concentration was too low to induce any differential gene expression for those six non-differentially expressed genes. Given that the concentrations chosen in this experiment were on the low end of environmentally relevant concentrations for PCBs, using higher concentrations of PCB-52 within that range warrants consideration.

In summary, a water-only 96-hour acute experiment on Hexagenia spp. exposed to PCB-52 was performed in which survivorship and bioaccumulation were measured, and the targeted transcriptome of H. rigida was analyzed to determine differential gene expression of candidate PCB-interacting genes conserved across taxa. Using this gene set, along with degenerate primers designed to work in phylogenetically distant animals, transcript sequences of all targeted genes were successfully generated for H. rigida, a non-model yet commonly used ecotoxicological test organism. This targeted-transcriptome approach also allowed for effective annotation of transcripts, such that over 90% of the data was usable and valuable. Moreover, significant differential expression of three genes in this study, alongside similar survivorship across all groups, showed the sublethal effects that can be missed if measuring mortality and/or bioaccumulation alone. Targeted gene expression analysis also provided insight into the biological processes that may be disrupted by exposure to PCB-52 in H. rigida. Down-regulation of both HSP90AB1 and TUBA1C gene expression may suggest NDL-PCBs, such as PCB-52, interfere with the ecdysteroid hormone pathway in H. rigida. Down-regulation of ALDH6A1 gene expression may also suggest that NDL-PCBs, such as PCB-52, interfere with certain aspects of lipid metabolism and motor control in H. rigida. Further studies are required to make these possible biological connections clearer.


Though further work is needed to apply this molecular evolutionary framework in both vertebrates and other invertebrates exposed to PCB-52 and other environmental toxicants, the success of this study as a proof of concept suggests this targeted-transcriptome approach may complement standard ecotoxicological tests and may be useful in a biomonitoring context as a multispecies tool to screen for sites with potential biological impact. Furthermore, this research contributes gene expression data necessary to understand the molecular events relevant to an AOP approach for ERA of PCB-52, and possibly other NDL-PCBs.


Table 4.1 Degenerate primers used in targeted RNA-seq analysis.

Gene (Human) | Insect Homologue | Primers | TA (°C) | AS (bp)
EEF1A1 (Eukaryotic translation elongation factor 1 alpha 1) | EF-1alpha | F - CCACCGGGACTTCATCaaraayatgat; R - CCGTGTGATCTTCCAGccyttraacca | 55.2 | 320
HSPA8 (Heat shock 70kDa protein 8) | HSP70-1 | F - CCAGCGGCAGGCCaciaargaygc; R - AGTGGTTCACCATCCGGttrtcraartc | 55.1 | 212
HSP90AB1 (Heat shock protein 90kDa alpha (cytosolic), B, 1) | HSP90 | F - CCAGGAGGAGTACGGCgarttytayaa; R - GCTCGTCGCAGTTGtccatdatraa | 57.9 | 165
ALDH6A1 (Aldehyde dehydrogenase 6 family, A1) | ALDH6A1 | F - CGGCATCGCCCCCttyaayttycc; R - CGCCGTGGTTCTTGgcicccatrtt | 57.9 | 287
PGD (Phosphogluconate dehydrogenase) | Pgd | F - GAAGGGCACCGGCaartggacigc; R - GGATGATGCAGCCGccickccacat | 55.2 | 278
PK (Pyruvate kinase, muscle) | PyK | F - CGAGATCCCCGCCgaraargtitt; R - CGGTCTCGCCGGACarcatiayrca | 56.6 | 152
ATP5B (ATP synthase, H+ transporting, mitochondrial F1, beta) | ATPsyn-beta | F - GGCCGAGTACTTCCGGgaycargargg; R - AGGTCGTCGGCGggiacrtadat | 55.2 | 185
TUBA1C (Tubulin, alpha, 1c) | α-tubulin | F - TGGAGCGGCTGTCCgtigaytaygg; R - CGGCAGATGTCGTAGATGgcytcrttrtc | 57.6 | 142
AP2A2 (Adaptor-related protein complex 2, alpha 2) | AP-2alpha | F - TCCTCCAACAGGTACACCgaraarcarat; R - CTCGTGGGCGAAGgcytcigccat | 54.1 | 149
ETS2 (V-Ets avian erythroblastosis virus E26 oncogene homolog) | ETS1/2 | F - CCAGATCCAGCTGtggcarttyyt; R - GCCCGGGACAGCTTCtcrtarttcat | 55.2 | 128
18S rRNA* (18S ribosomal RNA) | 18S rRNA | F - AATTTGACTCAACACGGG; R - CATCACAGACCTGTTATTGC | 45.9 | 224
RPS18* (Ribosomal protein, S18) | RpS18 | F - CCCTGGTGATCCCCgaraarttyca; R - CAGGAACCAGTCGGGGayyttrtaytg | 56.0 | 198

* housekeeping gene; TA = annealing temperature; AS = amplicon size; Uppercase letters = non-degenerate clamp; Lowercase letters = degenerate core. Insect homologue gene names and symbols are those of D. melanogaster.
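To make the primer notation in Table 4.1 concrete, the following Python sketch splits a primer into its uppercase (non-degenerate) and lowercase (degenerate) parts and counts how many distinct sequences the degenerate part represents. The function names are hypothetical, the degeneracy values follow the standard IUPAC nucleotide codes, and treating "i" as inosine (a single base that pairs broadly, counted once) is an assumption about the table's notation.

```python
# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_DEGENERACY = {
    "a": 1, "c": 1, "g": 1, "t": 1,
    "r": 2, "y": 2, "s": 2, "w": 2, "k": 2, "m": 2,
    "b": 3, "d": 3, "h": 3, "v": 3,
    "n": 4,
    "i": 1,  # assumption: inosine is one physical base, so it adds no variants
}


def split_primer(primer):
    """Split a primer into (uppercase 5' part, lowercase 3' part)."""
    first_lower = next(
        (index for index, base in enumerate(primer) if base.islower()),
        len(primer),
    )
    return primer[:first_lower], primer[first_lower:]


def degeneracy(primer):
    """Count the distinct sequences encoded by the lowercase part."""
    _, degenerate_part = split_primer(primer)
    total = 1
    for code in degenerate_part:
        total *= IUPAC_DEGENERACY[code]
    return total


eef1a1_forward = "CCACCGGGACTTCATCaaraayatgat"
print(split_primer(eef1a1_forward))  # ('CCACCGGGACTTCATC', 'aaraayatgat')
print(degeneracy(eef1a1_forward))    # 4: one r (2 bases) times one y (2 bases)
```

A fully specified primer such as the 18S rRNA forward primer has no lowercase part, so the same function reports a degeneracy of 1.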


Table 4.2 Water chemistry, survivorship, and tissue bioaccumulation results from 96-hour PCB-52 exposure.

Treatment | PCB-52 water concentration, 0 hr (µg/L) | PCB-52 water concentration, 96 hr (µg/L) | Survivorship (%) | Tissue bioaccumulation, % lipid | Tissue bioaccumulation, PCB-52 (µg/g ww)
Pre-exposure control | N/A | N/A | 80 | 0.62 | 0.0016
Water only control | 0.000 | 0.000 | 85.0 | 0.67 | 0.001
Acetone control | 0.000 | 0.000 | 82.5 | 0.80 | 0.001
0.033 µg/L | 0.033 | 0.033 | 80.0 | 0.46 | 0.013
0.12 µg/L | 0.120 | 0.003 | 81.3 | 0.44 | 0.034

* N/A: not applicable; pre-exposure controls were used for baseline data of survivorship and tissue concentration of PCB-52.

Table 4.3 Number of each species of Hexagenia per treatment.

Treatment | H. rigida | H. limbata
Water-only control | 8 | 2
Acetone control | 5 | 5
0.033 µg/L PCB-52 | 9 | 1
0.12 µg/L PCB-52 | 7 | 3

Table 4.4 Sequencing and annotation results from initial bioinformatics processing.

Description | Value
Number of libraries | 87
Total number of sequences | 12,061,138
Min number of sequences per sample | 17,577
Max number of sequences per sample | 119,812
Average number of sequences per sample | 69,317
Number of GLGQ sequences | 11,106,331
Number of annotated sequences | 11,105,409


Figure 4.1 ALDH6A1, TUBA1C, and HSP90AB1 gene expression in H. rigida exposed to 0.033 µg/L PCB-52. Asterisks denote statistically significant (p<0.05) log2 fold changes between exposed and control groups, as detected by edgeR. Bars show standard error of the mean normalized counts per million.


Table 4.5 Log2 fold change values of all genes for the acetone (carrier solution) control and 0.033 µg/L PCB-52.

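Function | Gene (Human) | Acetone | 0.033 µg/L
Genetic Information Processing | EEF1A1 | 0.1 | 0.2
Genetic Information Processing | HSPA8 | -0.2 | -0.4
Genetic Information Processing | HSP90AB1 | 0.0 | -2.0*
Metabolism | ALDH6A1 | -1.3 | -1.3*
Metabolism | PGD | 0.0 | -0.3
Metabolism | ATP5B | 0.4 | 0.0
Cellular Processes | TUBA1C | -0.2 | -1.0*
Cellular Processes | AP2A2 | 0.2 | 0.3
Environmental Information Processing | ETS2 | 0.4 | -2.7
Housekeeping Genes | 18S rRNA | -0.1 | 0.1
Housekeeping Genes | RPS18 | 0.6 | 0.4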

*Asterisks denote log2 fold changes with an absolute value ≥ 1 that are statistically significant (p < 0.05).
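Log2 fold changes can be easier to interpret when converted back to a linear scale. The sketch below, with hypothetical function names, converts the three significant values in Table 4.5 into expression relative to the control.

```python
def fold_change(log2_fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2_fc


def percent_of_control(log2_fc):
    """Express treated expression as a percentage of control expression."""
    return 100.0 * fold_change(log2_fc)


# Significant log2 fold changes at 0.033 µg/L PCB-52 (Table 4.5).
significant = {"HSP90AB1": -2.0, "ALDH6A1": -1.3, "TUBA1C": -1.0}

for gene, value in significant.items():
    print(gene, round(percent_of_control(value), 1))
# HSP90AB1 25.0
# ALDH6A1 40.6
# TUBA1C 50.0
```

Under this conversion, the threshold of an absolute log2 fold change of 1 corresponds to expression being at least halved or at least doubled relative to the control.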

Table 4.6 Induction of HSP90 by several environmental stressors in various invertebrate species.

Stressor | Invertebrate species | Reference
cold/heat | Tetranychus cinnabarinus (carmine spider mite) | Feng et al. 2009
cold/heat | Portunus trituberculatus (marine crab) | Qian et al. 2012
cold/heat | Apolygus lucorum (true bug) | Sun et al. 2014
metal ions | Portunus trituberculatus (marine crab) | Zhang et al. 2009
metal ions | Spodoptera litura (moth) | Shu et al. 2010
bisphenol A | Charybdis japonica (marine crab) | Park and Kwak 2014
low salinity | Portunus trituberculatus (swimming crab) | Bao et al. 2014
pH challenges | Litopenaeus vannamei (shrimp) | Qian et al. 2012
pesticides | Apolygus lucorum (true bug) | Sun et al. 2014


Chapter 5

General Conclusion

This thesis, as a proof of concept, shows that the molecular evolutionary framework to target the transcriptome in the context of xenobiotic exposure has the potential for functional gene expression analysis. It addresses the challenges of biomarker specificity and working with whole transcriptome analysis, especially when using non-model organisms. The degenerate primers for conserved genes that were developed can be used for other studies of gene expression following PCB exposure in divergent organisms, or can be used as a framework targeting evolutionarily conserved gene sets of other toxicants, such as metals, nanoparticles, and other endocrine disrupting chemicals. Furthermore, applying this approach alongside standard ecotoxicological bioassays would complement the results of lethal endpoint testing by elucidating sublethal effects that can occur, as well as contribute to the mechanistic understanding that may be linked to adverse outcomes at higher levels of biological organization; this contribution supports the adverse outcome pathway (AOP) approach, a new paradigm for environmental risk assessment (ERA).

This thesis also provides new gene expression data for H. rigida, an organism whose genome has not yet been sequenced. The sequence information generated for all the genes used in this study increases the repository of sequence information available for H. rigida, and can be used in future studies to probe the genetic variation in these genes, and determine any isoforms, haplotypes, and/or SNPs that may exist. Based on the differential gene expression analysis, new questions and hypotheses can also be proposed that involve the possible connections between PCB-52 and its effect on certain aspects of growth and development, such as molting, in Hexagenia spp.

Further work is needed to apply the “multispecies” molecular approach developed in this thesis to other organisms, both laboratory-exposed and naturally exposed to PCBs. Specifically, it would be necessary to repeat the exposure experiment with Hexagenia spp. such that enough H. limbata replicates are obtained to do a species-specific comparison of gene expression. As well, it would be necessary to perform a comparable PCB-52 exposure experiment with model toxicological organisms, such as D. rerio and M. musculus, to investigate any species-level differences in gene expression between invertebrates and vertebrates that may occur. In addition, repeating the exposure experiment on H. rigida with a time series component would be helpful in investigating the six genes that did not show any significant differential gene expression at the end of the 96-hour exposure. It would also be interesting to repeat the exposure experiment with Hexagenia spp. using other chemicals, such as DL-PCBs (e.g. PCB 126), dioxins, polycyclic aromatic hydrocarbons, and halogenated aromatic hydrocarbons, that are known to interact with the aryl hydrocarbon receptor (AHR). In natural ecosystems, testing the gene set and associated molecular tools in PCB-polluted vs. non-polluted aquatic communities, where the presence and abundance of common aquatic bioindicator species (e.g. H. rigida and H. limbata) are used for water-quality analysis, would also signify the usability of this evolutionarily conserved targeted-transcriptome approach in a biomonitoring workflow.

From this biomonitoring perspective, the development and application of multispecies molecular tools that are useful across a diverse range of organisms, from invertebrates to vertebrates, from model organisms to non-model organisms, from bioindicator species to conventional ecotoxicological organisms, may help provide an “early warning” of chemical exposure, such that sites with potential biological impact may be identified in a timely manner. In addition, taxonomically wide-range transcriptome analysis can also provide knowledge of molecular events that can elucidate possible biological effects, either conserved or taxon-specific, that may occur at higher levels of biological organization. It is the potential for linking knowledge from different lines of evidence, including in situ studies on benthic organism communities, laboratory-based bioassays, and transcriptomics, in order to understand toxicity progression relevant to the ERA process, that makes this research, even as a proof of concept, important and valuable.


References:
Altschul, S.F., W. Gish, W. Miller, E.W. Myers and D.J. Lipman 1990. Basic local alignment search tool. Journal of Molecular Biology 215:403-410.
Androulakis, I.P., E. Yang, and R.R. Almon 2007. Analysis of time-series gene expression data: Methods, challenges, and opportunities. Annual Review of Biomedical Engineering 9:205-208.
Ankley, G.T., R.S. Bennett, R.J. Erickson, D.J. Hoff, M.W. Hornung, R.D. Johnson et al. 2010. Adverse outcome pathways: a conceptual framework to support ecotoxicological research and risk assessment. Environmental Toxicology and Chemistry 29:730-741.
Arbeitman, M.N. and D.S. Hogness 2000. Molecular chaperones activate the Drosophila ecdysone receptor, an RXR heterodimer. Cell 101:67-77.
Aruda, A.M., M.F. Baumgartner, A.M. Reitzel, and A.M. Tarrant 2011. Heat shock protein expression during stress and diapause in the marine copepod Calanus finmarchicus. Journal of Insect Physiology 57:665-675.
Antunes-Fernandes, E.C., T.F.H. Bovee, F.E.J. Daamen, R.J. Helsdingen, M. van den Berg, and M.B.M. van Duursen 2011. Some OH-PCBs are more potent inhibitors of aromatase activity and (anti-) glucocorticoids than non-dioxin like (NDL)-PCBs and MeSO2-PCBs. Toxicology Letters 206:158-165.
Baker, M.E., B. Ruggeri, L.J. Sprague, C. Eckhardt-Ludka, J. Lapira, I. Wick, L. Soverchia, M. Ubaldi, A.M. Polzonetti-Magni, D. Vidal-Dorsch et al. 2009. Analysis of endocrine disruption in southern California coastal fish using an aquatic multispecies microarray. Environmental Health Perspectives 117(2):223-230.
Bai, X., P. Mamidala, S.P. Rajarapu, S.C. Jones, and O. Mittapalli 2011. Transcriptomics of the bed bug (Cimex lectularius). PLoS One 6(1):e16336.
Bansal, R., M.A.R. Mian, O. Mittapalli, and A.P. Michel 2014. RNA-seq reveals a xenobiotic stress response in the soybean aphid, Aphis glycines, when fed aphid-resistant soybean. BMC Genomics 15:972.
Bao, X.N., C.K. Mu, C. Zhang, Y. Wang, W.W. Song, R.H. Li and C.L. Wang 2014. mRNA expression profiles of heat shock proteins of wild and salinity-tolerant swimming crabs, Portunus trituberculatus, subjected to low salinity stress. Genetics and Molecular Research 13(3):6837-6847.
Barber, D.S., A.J. McNally, N. Garcia-Reyero, and D. Denslow 2007. Exposure to p,p’-DDE or dieldrin during the reproductive season alters hepatic CYP expression in largemouth bass (Micropterus salmoides). Aquatic Toxicology 81:27-35.
Bartlett, M.S. 1937. Properties of sufficiency and statistical tests. Proceedings of the Royal Statistical Society Series A. 160:268-282.
Ben-Dov, E. and A. Kushmaro 2015. Inosine at different primer positions to study structure and diversity of prokaryotic populations. Current Issues in Molecular Biology 17:53-56.
Benis, R. and P. Mattson 1989. Microtubules, organelle transport, and steroidogenesis in cultured adrenocortical tumor cells. Tissue Cell 21:479-494.


Beggel, S., I. Werner, R.E. Connon, and J.P. Geist 2012. Impacts of the phenylpyrazole insecticide fipronil on larval fish: Time-series gene transcription responses in fathead minnow (Pimephales promelas) following short-term exposure. Science of the Total Environment 426:160-165.
Berninger, J.P., D. Martinovic-Weigelt, N. Garcia-Reyero, L. Escalon, E.J. Perkins, G.T. Ankley et al. 2014. Using transcriptomic tools to evaluate biological effects across effluent gradients at a diverse set of study sites in Minnesota, USA. Environmental Science and Technology 47:2404-2412.
Bigot, A., Vasseur, P., and R. Rodius 2010. SOD and CAT cDNA cloning, and expression pattern of detoxification genes in the freshwater bivalve Unio tumidus transplanted into the Moselle river. Ecotoxicology 19:369-376.
Boix, J., O. Cauli, and V. Felipo 2010. Developmental exposure to polychlorinated biphenyls 52, 138 or 180 affects differentially learning or motor coordination in adult rats. Mechanisms involved. Neuroscience 167:994-1003.
Boix, J., O. Cauli, H. Leslie, and V. Felipo 2011. Differential long-term effects of developmental exposure to polychlorinated biphenyls 52, 138 or 180 on motor activity and neurotransmission. Gender dependence and mechanisms involved. Neurochemistry International 58:69-77.
Borgmann, U., W.P. Norwood, and K.M. Ralph 1990. Chronic toxicity and bioaccumulation of 2,5,2’,5’- and 3,4,3’,4’-tetrachlorobiphenyl and Aroclor® 1242 in the amphipod Hyalella azteca. Archives of Environmental Contamination and Toxicology 19:558-564.
Bouwman, F.G., M. Claessens, M.A. van Baak, J.P. Noben, P. Wang, W.H.M. Saris, and E.C.M. Mariman 2009. The physiologic effects of caloric restriction are reflected in the in vivo adipocyte-enriched proteome of overweight/obese subjects. Journal of Proteome Research 8:5532-5540.
Bowman, C.J. and N. Denslow 1999.
Development and validation of a species- and gene-specific molecular biomarker: Vitellogenin mRNA in largemouth bass (Micropterus salmoides). Ecotoxicology 8:399-416.
Bustos, C. and L.D. Corkum 2013. Delayed egg hatching accounts for replacement of burrowing mayflies Hexagenia rigida by Hexagenia limbata after recolonization in western Lake Erie. Journal of Great Lakes Research 39:168-172.
Butler, R.A., M.L. Kelley, W.H. Powell, M.E. Hahn, and R.J. Van Beneden 2001. An aryl hydrocarbon receptor (AHR) homologue from the soft-shell clam Mya arenaria: evidence that invertebrate AHR homologues lack 2,3,7,8-tetrachlorodibenzo-p-dioxin and ß-naphthoflavone binding. Gene 278:223-234.
Byer, J.D., M. Alaee, R.S. Brown, M. Lebeuf, S. Backus, M. Keir, G. Pacepavicius, J. Casselman, C. Belpaire, K. Oliveira, G. Verreault, and P.V. Hodson 2013. Spatial trend of dioxin-like compounds in Atlantic anguillid eels. Chemosphere 91(10):1439-1446.
Cai, J., C. Wang, L. Huang, M. Chen, Z. Zuo 2013. A novel effect of polychlorinated biphenyls: Impairment of the tight junction in mouse epididymis. Toxicological Sciences 134(2):382-390.


Canesi, L., C. Ciacci, M. Betti, A. Scarpato, B. Citterio, C. Pruzzo, and G. Gallo 2003. Effects of PCB congeners on the immune function of Mytilus hemocytes: alterations of tyrosine kinase mediated cell signaling. Aquatic Toxicology 63:293-306.
Carnegie, J.A., I. Dardick, B.K. Tsang 1987. Microtubules and the gonadotropic regulation of granulosa cell steroidogenesis. Endocrinology 120:819–828.
Chakravorty, S. and J.O. Vigoreaux 2010. Amplification of orthologous genes using degenerate primers. Methods in Molecular Biology 634:175-185.
Chang, E.S., S.A. Chang, R. Keller, P.S. Reddy, M.J. Snyder, J.L. Spees et al. 1999. Quantification of stress in lobsters: crustacean hyperglycemic hormone, stress proteins, and gene expression. American Zoologist 39:487–495.
Chen Y., Cassone B.J., Bai X., Redinbaugh M.G., Michel A.P. 2012. Transcriptome of the plant virus vector Graminella nigrifrons, and the molecular interactions of maize fine streak rhabdovirus transmission. PLoS One 7(7):e40613.
Chenna, R., H. Sugawara, T. Koike, R. Lopez, T.J. Gibson, D.G. Higgins, and J.D. Thompson 2003. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Research 31:3497-500.
Crick, F.H.C. 1966. Codon-anticodon pairing: the wobble hypothesis. Journal of Molecular Biology 19:548-555.
Cullon, D.L., M.B. Yunker, J.R. Christensen, R.W. Macdonald, M.J. Whiticar, N.J. Dangerfield and P.S. Ross 2012. Biomagnification of polychlorinated biphenyls in a harbor seal (Phoca vitulina) food web from the Strait of Georgia, British Columbia, Canada. Environmental Toxicology and Chemistry 31(11):2445-2455.
Deane, E.E. and N.Y.S. Woo 2006. Impact of heavy metals and organochlorines on hsp70 and hsc70 gene expression in black sea bream fibroblasts. Aquatic Toxicology 79:9-15.
DeBoever, P., B. Wens, J. Boix, V. Felipo, and G. Schoeters 2013.
Perinatal exposure to purity-controlled polychlorinated biphenyl 52, 138, or 180 alters toxicogenomics profiles in peripheral blood of rats after 4 months. Chemical Research in Toxicology 26:1159-1167.
Dharuri, H., P.A.C. ‘t Hoen, J.B. van Klinken, P. Henneman, J.F.J. Laros, M.A. Lips, F. el Bouazzaoui, G-J.B. van Ommen et al. 2014. Downregulation of the acetyl-CoA metabolic network in adipose tissue of obese diabetic individuals and recovery after weight loss. Diabetologia 57:2384-2392.
Dillies, M-A., A. Rau, J. Aubert, C. Hennequet-Antier, M. Jeanmougin, N. Servant, C. Keime, G. Marot, D. Castel, J. Estelle, G. Guernec et al. 2012. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing analysis. Briefings in Bioinformatics doi:10.1093/bib/bbs046.
Dillon, T.M., W.H. Benson, R.A. Stackhouse, A.M. Crider 1990. Effects of selected PCB congeners on survival, growth, and reproduction in Daphnia magna. Environmental Toxicology and Chemistry 9:1317-1326.
Diz, A.P., A. Carvajal-Rodriguez, and D.O.F. Skibinski 2011. Multiple hypothesis testing in proteomics: A strategy for experimental work. Molecular and Cellular Proteomics doi:10.1074/mcp.M110.004374.


Dorneles, P.R., P. Sanz, G. Eppe, A.F. Azevedo, C.P. Bertozzi, M.A. Martinez, E.R. Secchi et al. 2013. High accumulation of PCDD, PCDF, and PCB congeners in marine mammals from Brazil: A serious PCB problem. Science of the Total Environment 463-464:309-318.
Dutta, S.K., P.S. Mitra, S. Ghosh, S. Zang, D. Sonneborn, I. Hertz-Picciotto, T. Trnovec et al. 2012. Differential gene expression and a functional analysis of PCB-exposed children: understanding disease and disorder development. Environment International 40:143-154.
Eckhardt, S., K. Breivik, S. Mano, and A. Stohl 2007. Record high peaks in PCB concentration in the Arctic atmosphere due to long-range transport of biomass burning emissions. Atmospheric Chemistry and Physics 7:4527-4536.
Edsall, T.A., M.T. Bur, O.T. Gorman, and J.S. Schaeffer 2005. Burrowing mayflies as indicators of ecosystem health: Status of populations in western Lake Erie, Saginaw Bay and Green Bay. Aquatic Ecosystem Health and Management Strategy 8(2):107-116.
Elderkin, C.L., L.D. Corkum, C. Bustos, E.L. Cunningham, and D.J. Berg 2012. DNA barcoding to confirm morphological traits and determine relative abundance of burrowing mayfly species in western Lake Erie. Journal of Great Lakes Research 38:180-186.
Eisenberg, E. and E.Y. Levanon 2013. Human housekeeping genes, revisited. Trends in Genetics 29(10):569-574.
Environment Canada. 2015. Polychlorinated Biphenyls (PCBs). Last modified October 3rd, 2014. www.ec.gc.ca/bpc-pcb/.
Fang, Z. and X. Cui 2011. Design and validation issues in RNA-seq experiments. Briefings in Bioinformatics 12:280-287.
Farcy, E., Serpentini, A., Fievet, B., and J.M. Lebel 2007. Identification of cDNAs encoding HSP70 and HSP90 in the abalone Haliotis tuberculata: Transcriptional induction in response to thermal stress in hemocyte primary culture. Comparative Biochemistry and Physiology – Part B 146:540-550.
Fedorov, V.B., A.V. Goropashnaya, T. Øivind, N.C. Stewart, C. Chang, H. Wang, J. Yan, L.C. Showe, M.K.
Showe, and B.M. Barnes 2011. Modulation of gene expression in heart and liver of hibernating black bears (Ursus americanus). BMC Genomics 12:171. http://www.biomedcentral.com/1471-2164/12/171.
Feng, H., L. Wang, Y. Liu, L. He, M. Li, W. Lu and C. Xu 2009. Molecular characterization and expression of a heat shock gene (HSP90) from the carmine spider mite, Tetranychus cinnabarinus (Boisduval). Journal of Insect Science 10(112) insectscience.org/10.112.
Fernandes, E.C., H.S. Hendricks, R.G. van Kleef, M. van den Berg, and R.H. Westerink 2010. Potentiation of the human GABA(A) receptor as a novel mode of action of lower-chlorinated non-dioxin-like PCBs. Environmental Science and Technology 44:2864-2869.
Fingerman, S. and M. Fingerman 1977. Effects of polychlorinated biphenyl and polychlorinated dibenzofuran on molting of the Fiddler Crab, Uca pugilator. Bulletin of Environmental Contamination and Toxicology 18:138-142.


Finotello, F. and B. Di Camillo 2014. Measuring differential gene expression with RNA-seq: Challenges and strategies for data analysis. Briefings in Functional Genomics doi:10.1093/bfgp/elu035.
Folmer, O., M. Black, W. Hoeh, R. Lutz, and R. Vrijenhoek 1994. DNA primers for amplification of mitochondrial cytochrome c oxidase subunit 1 from diverse metazoan invertebrates. Molecular Marine Biology and Biotechnology 3(5):294-299.
Fremling, C.R. and W.L. Mauck 1980. Methods for using nymphs of burrowing mayflies (Ephemeroptera, Hexagenia) as toxicity test organisms. Pages 81-97 in A.L. Buikema, Jr. and John Cairns, Jr., Eds. Aquatic Invertebrate Bioassays. American Society for Testing and Materials.
Fretz, A. and K.D. Spindler 2001. Hormonal regulation of actin and tubulin in an epithelial cell line from Chironomus tentans. Archives of Insect Biochemistry and Physiology 46:11-18.
Gaiser, A.M., C.J.O. Kaiser, V. Haslbeck and K. Richter 2011. Downregulation of the Hsp90 system causes defects in the muscle cells of Caenorhabditis elegans. PLoS One 6(9):e25485.
Garber, M., M.G. Grabherr, M. Guttman, and C. Trapnell 2011. Computational methods for transcriptome annotation and quantification using RNA-seq. Nature Methods 8:469-477.
Garcia, T.I., Y. Shen, D. Crawford, M.F. Oleksiak, A. Whitehead, and R.B. Walter 2012. RNA-Seq reveals complex genetic responses to Deepwater Horizon oil release in Fundulus grandis. BMC Genomics 13:474.
Garcia-Reyero, N., T. Habib, M. Pirooznia, K.A. Gust, P. Gong, C. Warner, M. Wilbanks et al. 2011. Conserved toxic response across phylogenetic lineages: a meta-analysis of the neurotoxic effects of RDX among multiple species using toxicogenomics. Ecotoxicology 20:580-594.
Gewurtz, S.B., R. Lazar and G.D. Haffner 2000. Comparison of polycyclic aromatic hydrocarbon and polychlorinated biphenyl dynamics in benthic invertebrates of Lake Erie, USA. Environmental Toxicology and Chemistry 19(12):2943-2950.
Giberson, D.J. and D.M. Rosenberg 1992.
Life histories of burrowing mayflies (Hexagenia limbata and H. rigida, Ephemeroptera: Ephemeridae) in a northern Canadian reservoir. Freshwater Biology 32:501-518.
Giesy, J.P. and K. Kannan 1998. Dioxin-like and non-dioxin-like effects of polychlorinated biphenyls (PCBs): implications for risk assessment. Critical Reviews in Toxicology 28:511-569.
Gioia, R., A.J. Akindele, S.A. Adebusoye, K.A. Asante, S. Tanabe, A. Buekens, A.J. Sasco 2013. Polychlorinated biphenyls (PCBs) in Africa: A review of environmental levels. Environmental Science and Pollution Research doi:10.1007/s11356-013-1739-1.
Groh, K.J., R.N. Carvalho, J.K. Chipman, N.D. Denslow, M. Halder, C.A. Murphy, D. Roelofs, A. Rolaki, K. Schirmer, K.H. Watanabe 2015. Development and application of the adverse outcome pathway framework for understanding and predicting chronic toxicity: I. Challenges and research needs in ecotoxicology. Chemosphere 120:764-777.
Gupta, R.S. 1995. Phylogenetic analysis of the 90 kD heat shock family of protein


sequences and an examination of the relationship among animals, plants, and fungi species. Molecular Biology and Evolution 12(6):1063-1073.
Gupta, S.C., A. Sharma, M. Mishra, R.K. Mishra, and D.K. Chowdhuri 2009. Heat shock proteins in toxicology: How close and how far. Life Sciences 86:377-384.
Hadziavdic, K., Lekang, K., Lanzen, A., Jonassen, I., Thompson, E.M. and C. Troedsson 2014. Characterization of the 18S rRNA gene for designing universal eukaryote specific primers. PLOS ONE 9(2):e87624.
Hahn, M.E. 2002. Aryl hydrocarbon receptors: diversity and evolution. Chemico-Biological Interactions 141:131-160.
Hajibabaei, M., J. Xia, G. Drouin 2006. Seed plant phylogeny: gnetophytes are derived conifers and a sister group to Pinaceae. Molecular Phylogenetics and Evolution 40:208-217.
Hajibabaei, M., J.R. deWaard, N.V. Ivanova, S. Ratnasingham, R.T. Dooh, S.L. Kirk, P.M. Mackie, and P.D.N. Hebert 2005. Critical factors for assembling a high volume of DNA barcodes. Philosophical Transactions of the Royal Society B: Biological Sciences 360:1959-1967.
Hamers, T., J.H. Kamstra, P.H. Cenijn, K. Pencikova, L. Palkova, P. Simeckova, J. Vondracek, P.L. Andersson, M. Stenberg, and M. Machala 2011. In vitro toxicity profiling of ultra pure non-dioxin like polychlorinated biphenyl congeners and their relative toxic contribution to PCB mixtures in humans. Toxicological Sciences 121(1):88-100.
Harkey, G.A., P.F. Landrum, and S.J. Klaine 1994. Comparison of whole sediment, elutriate and pore-water exposures for use in assessing sediment-associated organic contaminants in bioassays. Environmental Toxicology and Chemistry 13(8):1315-1329.
Harwood, A.D., A.K. Rothert, and M.J. Lydy 2014. Using Hexagenia in sediment bioassays: Methods, applicability, and relative sensitivity. Environmental Toxicology and Chemistry 33(4):868-874.
Hasegawa, M., S. Asanuma, T. Fujiyuki, T. Kiya, T. Sasaki, D. Endo, M. Morioka and T. Kubo 2009.
Differential gene expression in the mandibular glands of queen and worker honeybee, Apis mellifera L: Implications for caste-sensitive aldehyde and fatty acid metabolism. Insect Biochemistry and Molecular Biology 39:661-667.
Hebert, P.D.N., A. Cywinski, S.L. Ball, and J.R. deWaard 2003. Biological identifications through DNA barcodes. Proceedings of the Royal Society of London B 270:313-321.
Hedges, S.B., J. Marin, M. Suleski, M. Paymer and S. Kumar 2015. Tree of life reveals clock-like speciation and diversification. Molecular Biology and Evolution 32(4):835-845. doi:10.1093/molbev/msv037.
Heinloth, A.N., R.D. Irwin, G.A. Boorman, P. Nettesheim, R.D. Fannin, S.O. Sieber, M.L. Snell, et al. 2004. Gene expression profiling of rat livers reveals indicators of potential adverse effects. Toxicological Sciences 80:193-202.
Hoffmann, A.A. and Y. Willi 2008. Detecting genetic responses to environmental change. Nature Reviews Genetics 9:421-432.


Hu, H. et al. 2014. RNA-Seq Identifies Key Reproductive Gene Expression Alterations in Response to Cadmium Exposure. BioMed Research International. http://dx.doi.org/10.1155/2014/529271.
Hunt, B.P. 1958. The life history and economic importance of burrowing mayfly, Hexagenia limbata, in southern Michigan lakes. Bulletin of the Institute of Fisheries Research, No. 4. Michigan Department of Conservation, Lansing, MI.
Jaspers, V.L.B., C. Sonne, F. Soler-Rodriguez, D. Boertmann, R. Dietz, M. Eens, L.M. Rasmussen and A. Covaci 2013. Persistent organic pollutants and methoxylated polybrominated diphenyl ethers in different tissues of white-tailed eagles (Haliaeetus albicilla) from West Greenland. Environmental Pollution 175:137-146.
Jelaso, A.M., Lehigh-Shirey, E., J. Means, and C.F. Ide 2003. Gene expression patterns to predict exposure to PCBs in developing Xenopus laevis tadpoles. Environmental and Molecular Mutagenesis 42:1-10.
Jones, R.P., C.Y. Ang and L.S. Inouye 2002. Effects of PCB 30 and its hydroxylated metabolites on ecdysteroid-mediated gene expression. Bulletin of Environmental Contamination and Toxicology 69:763–770.
Jin, X., X. Sun and Q. Song 2005. Woc Gene Mutation Causes 20E-Dependent α-Tubulin Detyrosination in Drosophila melanogaster. Archives of Insect Biochemistry and Physiology 60:116–129.
Kanehisa, M. and S. Goto 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research 28:27-30.
Kanehisa, M., S. Goto, Y. Sato, M. Kawashima, M. Furumichi, and M. Tanabe 2014. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Research 42:D199-205.
Karchner, S.I., Kennedy S.W., Trudeau, S., and M.E. Hahn 2000. Towards molecular understanding of species differences in dioxin sensitivity: initial characterization of Ah receptor cDNAs in birds and an amphibian. Marine Environmental Research 50:51-56.
Kedishvili, N.Y., G.W. Goodwin, K.M. Popov and R.A. Harris 2000.
Mammalian Methylmalonate-Semialdehyde Dehydrogenase. Methods in Enzymology 324:207-218.
Kelly, B.C. and F.A.P.C. Gobas 2001. Bioaccumulation of persistent organic pollutants in lichen-caribou-wolf food chains of Canada’s Central and Western Arctic. Environmental Science and Technology 35:325-334.
Kodavanti, P.R.S., C. Osorio, J.E. Royland, R. Ramabhadran, and O. Alzate 2011. Aroclor 1254, a developmental neurotoxicant, alters energy metabolism- and intracellular signaling-associated protein networks in rat cerebellum and hippocampus. Toxicology and Applied Pharmacology 256:290-299.
Koenig, S., P. Fernández, and M. Solé 2012. Differences in cytochrome P450 enzyme activities between fish and crustacea: relationship with the bioaccumulation patterns of polychlorobiphenyls (PCBs). Aquatic Toxicology 108:11-17.
Koppaka, V., D.C. Thompson, Y. Chen, M. Ellerman, K.C. Nicolaou, R.O. Juvonen, D. Peterson, R.A. Deitrich et al. 2012. Aldehyde dehydrogenase inhibitors: A comprehensive review of the pharmacology, mechanism of action, substrate specificity, and clinical application. Pharmacological Reviews 64(3):520-539.


Kostyniak, P.J., L.G. Hansen, J.J. Widholm, R.D. Fitzpatrick, J.R. Olson, J.L. Helferich, K.H. Kim et al. 2005. Formulation and characterization of an experimental PCB mixture designed to mimic human exposure from contaminated fish. Toxicological Sciences 88(2):400-411.
Kramer, V.J., M.A. Etterson, M. Hecker, C.A. Murphy, G. Roesijadi, D.J. Spade, J.A. Spromberg, M. Wang, and G.T. Ankley 2011. Adverse outcome pathways and ecological risk assessment: Bridging to population level effects. Environmental Toxicology and Chemistry 30(1):64-76.
Kruskal, W.H. and W.A. Wallis 1952. Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association 47(260):583-621.
Krtková, J., A. Zimmermann, K. Schwarzerová and P. Nick 2012. Hsp90 binds microtubules and is involved in the reorganization of the microtubular network in angiosperms. Journal of Plant Physiology 169:1329–1339.
Kuchipudi, S.V., Tellabati, M., Nelli, R., White, G.A., Baquero Perez, B., Sebastian, S., Slomka, M.J., Brookes, S.M., Brown, I.H., Dunham, S.P., and K-C. Chang 2012. 18S rRNA is a reliable normalization gene for real time PCR based on influenza virus infected cells. Virology Journal 9:230.
Kullman, S.W., Hamm, J.T., and D.E. Hinton 2005. Identification and characterization of a cDNA encoding cytochrome P450 3A from the Fresh Water Teleost Medaka (Oryzias latipes). Archives of Biochemistry and Biophysics 380(1):29-38.
Kültz, D. 2005. Molecular and evolutionary basis of the cellular stress response. Annual Reviews in Physiology 67:225-57.
LaLone, C.A., D.L. Villeneuve, J.E. Cavallin, M.D. Kahl, E.J. Durhan, E.A. Makynen, K.M. Jensen, K.E. Stevens, M.N. Severson, C.A. Blanksma et al. 2013a. Cross-species sensitivity to a novel agonist of potential environmental concern, spironolactone. Environmental Toxicology and Chemistry 32(11):2528-2541.
LaLone, C.A., D.L. Villeneuve, L.D. Burgoon, C.L. Russom, H.W. Helgen, J.P. Berninger, J.E. Tietge, M.N. Severson, J.E. Cavallin, G.T.
Ankley 2013b. Molecular target sequence similarity as a basis for species extrapolation to assess the ecological risk of chemicals with known modes of action. Aquatic Toxicology 144-145:141-154.
Landrum, P.F. and R. Poore 1988. Toxicokinetics of selected xenobiotics in Hexagenia limbata. Journal of Great Lakes Research 14(4):427-437.
Lasserre, J.P., F. Fack, T. Serchi, D. Revets, S. Planchon, J. Renault, L. Hoffmann, et al. 2012. Atrazine and PCB 153 and their effects on the proteome of the subcellular fractions of human MCF-7 cells. Biochimica et Biophysica Acta (BBA) – Proteins and Proteomics 1824:833-841.
Lee, J.W., E-J. Won, S. Raisuddin, J-S. Lee 2015. Significance of adverse outcome pathways in biomarker-based environmental risk assessment in aquatic organisms. Journal of Environmental Sciences 35:15-127.
Liu, W., F-X. Zhang, M-J. Cai, W-L. Zhao, X-R. Li, J-X. Wang, X-F. Zhao 2013. The hormone-dependent function of Hsp90 in the crosstalk between 20-hydroxyecdysone and juvenile hormone signaling pathways in insects is determined by differential phosphorylation and protein interaction.


Biochimica et Biophysica Acta 1830:5184-5192.
Liu, Y., J. Zhou, and K.P. White 2014. RNA-seq differential expression studies: more sequence or more replication? Bioinformatics 30:301-304.
Liu, S-Z., Z-F. Wei, X-Q. Meng, X-Y. Han, D. Cheng, T. Zhong, T-L. Zhang, Z-B. Wang 2015. Exposure to Aroclor-1254 impairs spindle assembly during mouse oocyte maturation. Environmental Toxicology doi:10.1002/tox.22169.
Love, M.I., W. Huber, and S. Anders 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15:550 http://dx.doi.org/10.1186/s13059-014-0550-8.
Ludueña, R.F. 2013. A hypothesis on the origin and evolution of tubulin. International Review of Cell and Molecular Biology 302:41-185.
Marcadier, J.L., A.M. Smith, D. Pohl, J. Schwartzentruber, O.Y. Al-Dirbashi, FORGE Canada Consortium, J. Majewski, R.J.A. Wanders et al. 2013. Mutations in ALDH6A1 encoding methylmalonate semialdehyde dehydrogenase are associated with dysmyelination and transient methylmalonic aciduria. Orphanet Journal of Rare Diseases 8:98.
Marchitti, S.A., C. Brocker, D. Stagos and V. Vasiliou 2008. Non-P450 aldehyde oxidizing enzymes: the aldehyde dehydrogenase superfamily. Expert Opinion on Drug Metabolism & Toxicology 4(6):697-720.
Marek, R.F., A. Martinez, and K.C. Hornbuckle 2013. Discovery of hydroxylated polychlorinated biphenyls (OH-PCBs) in sediment from Lake Michigan Waterway and original commercial Aroclors. Environmental Science and Technology 47:8204-8210.
Maroniche, G.A., Sagadin, M., Mongelli, V.C., Truol, G.A., and M. del Vas 2011. Reference gene selection for gene expression studies using RT-qPCR in virus-infected planthoppers. Virology Journal 8:230.
Martinez, A., C. O’Sullivan, D. Reible and K.C. Hornbuckle 2013. Sediment pore water distribution coefficients of PCB congeners in enriched black carbon sediment. Environmental Pollution 182:357-363.
Mauck, W.L. and Olson, L.E. 1977.
Polychlorinated biphenyls in adult mayflies () from the Upper Mississippi River. Bulletin of Environmental Contamination & Toxicology 17(4):387-390.
McClellan, A.J., Y. Xia, A.M. Deutschbauer, R.W. Davis, M. Gerstein and J. Frydman 2007. Diverse Cellular Functions of the Hsp90 Molecular Chaperone Uncovered Using Systems Approaches. Cell 131:121–135.
McRobb, F.M., V. Sahagún, I. Kufareva, and R. Abagyan 2014. In silico analysis of the conservation of human toxicity and endocrine disruption targets in aquatic species. Environmental Science and Technology 48:1964-1972.
Mehinto, A.C., C.J. Martyniuk, D.J. Spade and N.D. Denslow 2012. Applications of next-generation sequencing in fish ecotoxicogenomics. Frontiers in Genetics 3(62):1-10.
Melymuk, L., M. Robson, P.A. Helm, and M.L. Diamond 2012. PCBs, PBDEs, and PAHs in Toronto air: Spatial and seasonal trends and implications for contaminant transport. Science of the Total Environment 429:272-280.
Menzel, R., H.L. Yeo, S. Rienau, S. Li, C.E.W. Steinberg, and S.R. Stürzenbaum 2007. Cytochrome P450s and short-chain dehydrogenases mediate the


toxicogenomic response of PCB-52 in the nematode Caenorhabditis elegans. Journal of Molecular Biology 370:1-13.
Menzel, R., S.C. Swain, S. Hoess, E. Claus, S. Menzel, C.E.W. Steinberg, G. Reifferscheid et al. 2009. Gene expression profiling to characterize sediment toxicity – a pilot study using Caenorhabditis elegans whole genome microarrays. BMC Genomics 10:160.
Milani, D., L.C. Grapentine, and R. Fletcher 2013. Sediment contamination in Lyons Creek East, a tributary of the Niagara River: Part I. Assessment of Benthic Macroinvertebrates. Archives of Environmental Contamination and Toxicology 64:65-86.
Miller, H.C., Mills, G.N., Bembo, G.D., Macdonald, J.A., and C.W. Evans 1999. Induction of cytochrome P4501A (CYP1A) in Trematomus bernacchii as an indicator of environmental pollution in Antarctica: assessment by quantitative RT-PCR. Aquatic Toxicology 44:183-193.
Mittapalli O., Bai X., Mamidala P., Rajarapu S.P., Bonello P., Herms D.A. 2010. Tissue specific transcriptomics of the exotic invasive insect pest emerald ash borer (Agrilus planipennis). PLoS One 5(10):e13708.
MOECC 2011. SOP: Bioaccumulation of sediment-associated contaminants in freshwater organisms (SOP:BIOACC.v12 LaSB Method # E3495) Laboratory Services Branch, Aquatic Toxicology Unit.
MOECC 2012. SOP: Hexagenia spp. Acute lethality testing of chemicals (SOP HX3. V5.1) Laboratory Services Branch, Aquatic Toxicology Unit.
MOECC 2014. Hexagenia spp. culturing (SOP HX1.v6) Laboratory Services Branch, Aquatic Toxicology Unit.
Ockendon, N.F., L.A. O’Connell, S.J. Bush, J. Monzón-Sandoval, H. Barnes, T. Székely, H.A. Hofmann, S. Dorus, A.O. Urrutia 2015. Optimization of next-generation sequencing transcriptome annotation for species lacking sequenced genomes. Molecular Ecology Resources doi:10.1111/1755-0998.12465.
Oen, A.M.P., E.M.L. Janssen, G. Cornelissen, G.D. Breedveld, E. Eek and R.G.
Luthy 2011. In Situ Measurement of PCB Pore Water Concentration Profiles in Activated Carbon-Amended Sediment Using Passive Samplers. Environmental Science & Technology 45:4053-4059.
Papp, Z., G.R. Bortolotti, M. Sebastian, and J.E.G. Smits 2007. PCB congener profiles in nestling tree swallows and their insect prey. Archives of Environmental Contamination and Toxicology 52:257-263.
Park, K. and I-S. Kwak 2014. Characterization and gene expression of heat shock protein 90 in marine crab Charybdis japonica following bisphenol A and 4-nonylphenol exposure. Environmental Health and Toxicology 29:e2014002 http://dx.doi.org/10.5620/eht.2014.29.e2014002.
Parker, A.L., M. Kavallaris and J.A. McCarroll 2014. Microtubules and their role in cellular stress in cancer. Frontiers in Oncology 4(153):1-19.
Pearl, L.H. and C. Prodromou 2006. Structure and mechanism of the Hsp90 molecular chaperone machinery. Annual Review of Biochemistry 75:271-94.
Peregrin-Alvarez, J.M., C. Sanford, and J. Parkinson 2009. The conservation and evolutionary modularity of metabolism. Genome Biology 10:R63.


Peregrin-Alvarez, J.M., S. Tsoka, and C.A. Ouzounis 2003. The phylogenetic extent of metabolic enzymes and pathways. Genome Research 13:422-427.
Picard, D. 2002. Heat-shock protein 90, a chaperone for folding and regulation. Cellular and Molecular Life Sciences 59(10):1640-8.
Piña, B., M. Casado, and L. Quirós 2007. Analysis of gene expression as a new tool in ecotoxicology and environmental monitoring. Trends in Analytical Chemistry 26:1145-1154.
Pujolar, J.M., I.A.M. Marino, M. Milan, A. Coppe, G.E. Maes, F. Capoccioni, E. Ciccotti et al. 2012. Surviving in a toxic world: transcriptomics and gene expression profiling in response to environmental pollution in the critically endangered European eel. BMC Genomics 13:507.
Qian, X., X. Liu, L. Wang, X. Wang, Y. Li, J. Xiang and P. Wang 2012. Gene expression profiles of four heat shock proteins in response to different acute stresses in shrimp, Litopenaeus vannamei. Comparative Biochemistry and Physiology, Part C: Toxicology & Pharmacology 156(3-4):211-220.
Rajan, V.P. and K.M. Menon 1985. Involvement of microtubules in lipoprotein degradation and utilization for steroidogenesis in cultured rat luteal cells. Endocrinology 117:2408–2416.
Rapaport, F., R. Khanin, Y. Liang, M. Pirun, A. Krek, P. Zubo, C.E. Mason, N.D. Socci, and D. Betel 2013. Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biology 14:R95. doi:10.1186/gb-2013-14-9-r95.
Ren, B. Vallanat, D. Dolker and J.C. Corton 2011. Hepatic xenobiotic metabolizing enzyme and transporter gene expression through the life stages of the mouse. PLoS ONE 6(9):e24381. doi:10.1371/journal.pone.0024381.
Rhee, J-S., B-M. Kim, B-S. Choi, I-Y. Choi, R.S.S. Wu, D.R. Nelson and J-S. Lee 2013. Whole spectrum of cytochrome p450 genes and molecular responses to water-accommodated fractions exposure in the marine medaka. Environmental Science & Technology 47(9):4804-4812.
Ribecco, C., G. Hardiman, R. Šášik, S. Vittori, and O. Carnevali 2012.
Teleost fish (Solea solea): a model for ecotoxicological assay of contaminated sediments. Aquatic Toxicology 109:133-142.
Richardson, K.L. and D. Schlenk 2011. Biotransformation of 2,2’,5,5’-Tetrachlorobiphenyl (PCB 52) and 3,3’,4,4’-Tetrachlorobiphenyl (PCB 77) by liver microsomes from four species of sea turtles. Chemical Research in Toxicology 24:718-725.
Roberts, A., C. Trapnell, Donaghey J., Rinn, J.L., and L. Pachter 2011. Improving RNA-seq expression estimates by correcting for fragment bias. Genome Biology 12:R22.
Robinson, M.D. and G.K. Smyth 2008. Small sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9:321-332.
Robinson M.D., McCarthy D.J., and Smyth G.K. 2010. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:1-


Robinson, M. and A. Oshlack 2010. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11(3) doi:10.1186/gb-2010-11-3-r25.
Rose, J.C., L.E. Epperson, H.V. Carey and S.L. Martin 2011. Seasonal liver protein differences in a hibernator revealed by quantitative proteomics using whole animal isotopic labeling. Comparative Biochemistry and Physiology Part D Genomics Proteomics 6(2):163-170.
Rutherford, S.L. and S. Lindquist 1998. Hsp90 as a capacitor for morphological evolution. Nature 396:336–342.
Sandal, S., B. Yilmaz, and D.O. Carpenter 2008. Genotoxic effects of PCB 52 and PCB 77 on cultured human peripheral lymphocytes. Mutation Research 654:88-92.
Sass, J.O., M. Walter, J.P.H. Shield, A.M. Atherton, U. Garg, D. Scott, C.G. Woods, L.D. Smith 2012. 3-Hydroxyisobutyrate aciduria and mutations in the ALDH6A1 gene coding for methylmalonate semialdehyde dehydrogenase. Journal of Inherited Metabolic Disease 35:437-442.
Schurch, N.J., P. Schofield, M. Gierliński, C. Cole, A. Sherstnev, V. Singh, N. Wrobel, K. Gharbi, G.G. Simpson, T. Owen-Hughes, and M. Blaxter 2015. Evaluation of tools for differential gene expression analysis by RNA-seq on a 48 biological replicate experiment. arXiv:1505.02017 [q-bio.GN].
Senthilkumar, P.K., A.J. Klingelhutz, J.A. Jacobus, H. Lehmler, L.W. Robertson, and G. Ludewig 2011. Airborne polychlorinated biphenyls (PCBs) reduce telomerase activity and shorten telomere length in immortal human skin keratinocytes (HaCat). Toxicology Letters 204:64-70.
Sewer, M.B. and D. Li 2008. Regulation of Steroid Hormone Biosynthesis by the Cytoskeleton. Lipids 43:1109–1115.
Shu, Y., Y. Du, J. Wang 2010. Molecular characterization and expression patterns of Spodoptera litura heat shock protein 70/90, and their response to zinc stress. Comparative Biochemistry and Physiology, Part A 158A:102-110.
Singh, V.K., R. Govindarajan, S. Naik and A. Kumar 2000.
The effect of hairpin structure on PCR amplification efficiency. Molecular Biology Today 1(3):67-69.
Staheli, J.P., Boyce, R.B., Kovarik, D., and T.M. Rose 2010. CODEHOP PCR and CODEHOP PCR Primer Design. Methods in Molecular Biology 687:57-73.
Steinberg, C.E.W., S.R. Stürzenbaum, and R. Menzel 2008. Genes and environment – Striking the fine balance between sophisticated biomonitoring and true functional environmental genomics. Science of the Total Environment 400:142-161.
Sun, Y., Y. Sheng, L. Bai, Y. Zhang, Y. Xiao, L. Xiao, Y. Tan and Y. Shen 2014. Characterizing heat shock protein 90 gene of Apolygus lucorum (Meyer-Dür) and its expression in response to different temperature and pesticide stresses. Cell Stress and Chaperones 19:725-739.
Suryawanshi, A.R., S.A. Khan, C.S. Joshi and V.V. Khole 2012. Epididymosome-mediated acquisition of MMSDH, an androgen-dependent and developmentally regulated epididymal sperm protein. Journal of Andrology 33(5):963-974.


Tachibana, S-I., H. Numata, S.G. Goto 2004. Gene expression of heat-shock proteins (Hsp23, Hsp70 and Hsp90) during and after larval diapause in the blow fly Lucilia sericata. Journal of Insect Physiology 51: 641-647.
Tamura, K., D. Peterson, N. Peterson, G. Stecher, M. Nei, and S. Kumar 2011. MEGA 5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution 28:2731-2739.
Teramitsu, I., Yamamoto, Y., Chiba, I., Iwata, H., Tanabe, S., Fujise, Y., Kazusaka, A., Akahori, F., and S. Fujita 2000. Identification of novel cytochrome P4501A genes from five marine mammal species. Aquatic Toxicology 51:145-153.
Triebskorn, R., S. Adam, H. Casper, W. Honnen, M. Pawert, M. Schramm, J. Schwaiger, et al. 2002. Biomarkers as diagnostic tools for evaluating the effects of unknown past water quality conditions on stream organisms. Ecotoxicology 11:451-465.
Vafopoulou, X. 2009. Ecdysteroid receptor (EcR) is associated with microtubules and with mitochondria in the cytoplasm of prothoracic gland cells of Rhodnius prolixus (Hemiptera). Archives of Insect Biochemistry and Physiology 72(4):249-262.
Vafopoulou, X. and C.G. Steel 2012. Cytoplasmic travels of the ecdysteroid receptor in target cells: Pathways for both genomic and non-genomic actions. Frontiers in Endocrinology 3(43):1-16.
Vandersteen, W. 2012. Detecting gene expression profiles associated with environmental stressors within an ecological context. Molecular Ecology 20:1322-1323.
Van der Oost, R., J. Beyer, and N.P.E. Vermeulen 2003. Fish bioaccumulation and biomarkers in environmental risk assessment. Environmental Toxicology and Pharmacology 13:57-149.
Van Praet, N., A. Covaci, J. Teuchies, L. DeBruyn, and H. Van Gossum 2012. Levels of persistent organic pollutants in larvae of the damselfly Ischnura elegans (Odonata, Coenagrionidae) from different ponds in Flanders, Belgium. Science of the Total Environment 423: 162-167.
Vaux, D.L., F. Fidler and G. Cumming 2012. Replicates and repeats – what is the difference and is it significant? European Molecular Biology Organization 13(4):291-296.
Veldhoen, N., M. Kobylarz, C.J. Lowe, L. Meloche, A.M.H. deBruyn, and C.C. Helbing 2011. Relationship between municipal wastewater outfall in the benthic indicator species Modiolus modiolus (L.). Aquatic Toxicology 105:119-126.
Veldhoen, N., C.R. Propper, C.C. Helbing 2014. Enabling comparative gene expression studies of thyroid hormone action through the development of a flexible real-time quantitative PCR assay for the use across multiple anuran indicator and sentinel species. Aquatic Toxicology 148:162-173.
Viluksela, M., L.T.M. van der Ven, D. Schrenk, H. Lilienthal, P.L. Andersson, K. Halldin and H. Hakansson 2012. Biological and toxicological effects of non-dioxin like PCBs. Acta Veterinaria Scandinavica 54(Suppl 1):S16.
Wang, Z., M. Gerstein, M. Snyder 2009. RNA-seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10: 57-63.


Watkins, N.E. Jr. and J. SantaLucia Jr. 2005. Nearest-neighbor thermodynamics of deoxyinosine pairs in DNA duplexes. Nucleic Acids Research 33(19): doi:10.1093/nar/gki918.
Watson, R.D., S. Ackerman-Morris, W.A. Smith, C.J. Watson and W.E. Bollenbacher 1996. Involvement of microtubules in prothoracicotropic hormone-stimulated ecdysteroidogenesis by insect (Manduca sexta) prothoracic glands. Journal of Experimental Zoology 276(1):63-9.
Wellband, K.W. and D.D. Heath 2013. Environmental associations with gene transcription in Babine Lake rainbow trout: evidence for local adaptation. Ecology and Evolution 3:1194-1208.
Wens, B., P. DeBoever, M. Maes, K. Hollanders, and G. Schoeters 2011. Transcriptomics identifies differences between ultrapure non-dioxin like polychlorinated biphenyls (PCBs) and dioxin-like PCB 126 in cultured peripheral blood mononuclear cells. Toxicology 287:113-23.
Willett, K.L., S.J. McDonald, M.A. Steinberg, K.B. Beatty, M.C. Kennicutt, and S.H. Safe 1997. Biomarker sensitivity for polynuclear aromatic hydrocarbons contamination in two marine fish species collected in Galveston Bay, Texas. Environmental Toxicology and Chemistry 16:1472-1479.
Xu, P.J., J.H. Xiao, Q.Y. Xia, B. Murphy and D.W. Huang 2010. Apis mellifera has two isoforms of cytoplasmic HSP90. Insect Molecular Biology 19(4):593-597.
Yilmaz, B., S. Sandal, and D.O. Carpenter 2012. PCB 9 exposure induced endothelial cell death while increasing intracellular calcium ROS levels. Environmental Toxicology 27:185-191.
Zhang, Y., D. Chen, M.A. Smith, B. Zhang, and X. Pan 2012. Selection of reliable reference genes in Caenorhabditis elegans for analysis of nanotoxicity. PLoS One 7:e31849.
Zhang, Z., Hu, J., An, W., Jin, F., An, L., and S. Tao 2005. Induction of vitellogenin mRNA in juvenile Chinese sturgeon (Acipenser sinensis Gray) treated with 17ß-estradiol and 4-nonylphenol. Environmental Toxicology and Chemistry 24(8): 1944-1950.
Zhou, H., Y. Qu, H. Wu, C. Liao, J. Zheng, X. Diao, and Q. Xue 2010. Molecular phylogenies and evolutionary behavior of AhR (aryl hydrocarbon receptor) pathway genes in aquatic animals: Implications for the toxicology mechanism of some persistent organic pollutants (POPs). Chemosphere 78:193-205.
Zou, E. and M. Fingerman 1997. Effects of estrogenic xenobiotics on molting of the water flea, Daphnia magna. Ecotoxicology and Environmental Safety 38:281-285.
Zou, E. 2005. Impacts of xenobiotics on crustacean molting: The invisible endocrine disruption. Integrative and Comparative Biology 45: 33-38.


Appendix - Tables

Table A1. Candidate PCB-interacting genes conserved in animals (N=109).

| Gene Name (Human) | Abbreviation | Reference |
|---|---|---|
| ATP-binding cassette, subfamily B, member 1A | ABCB1 | Arzuaga et al. 2009; Takahashi et al. 2009 |
| ATP-binding cassette, subfamily C, member 3 | ABCC3 | Maher et al. 2005; Ovando et al. 2010 |
| Acyl-CoA dehydrogenase, C2-C3, short chain | ACADS | Wens et al. 2011 |
| Acetylcholinesterase | ACHE | Muthuvel et al. 2006; Chuiko et al. 2007; Durou et al. 2007 |
| Acyl-CoA oxidase 1, palmitoyl | ACOX1 | Takahashi et al. 2009 |
| Acyl-CoA synthetase long-chain, family member 5 | ACSL5 | Carlson et al. 2009 |
| Aryl hydrocarbon receptor | AHR | Suh et al. 2003; Chen and Bunce 2004; Jensen et al. 2010; Curran et al. 2011; Hamers et al. 2011 |
| Aryl hydrocarbon receptor interacting protein | AIP | Whitehead et al. 2010 |
| AKT interacting protein | AKTIP | De Boever et al. 2013 |
| Aminolevulinate dehydratase | ALAD | Ovando et al. 2010 |
| Aminolevulinate synthase 2 | ALAS2 | Ovando et al. 2010 |
| Aldehyde dehydrogenase family 1, member 1A | ALDH1A1 | Carlson et al. 2009 |
| Aldehyde dehydrogenase family 2 | ALDH2 | Carlson et al. 2009 |
| Aldehyde dehydrogenase family 6, member 1A | ALDH6A1 | Takahashi et al. 2009 |
| Adenosine monophosphate deaminase | AMPD3 | Takahashi et al. 2009 |
| Adaptor-related protein complex 2 alpha | AP2A2 | Sazonova et al. 2011 |
| AT rich interactive domain 1A (SWI-like) | ARID1A | Whitehead et al. 2010 |
| Aryl hydrocarbon receptor nuclear transporter | ARNT | Heid et al. 2001; Mortensen et al. 2007; Arzuaga et al. 2009 |
| ATP synthase, H+ transporting | ATP5B | Kodavanti et al. 2011 |
| ATP synthase, H+ transporting | ATP5H | Campagna et al. 2011 |
| Calnexin | CANX | Takahashi et al. 2009; Lasserre et al. 2012 |
| Catalase | CAT | Takahashi et al. 2009 |
| Cyclin A2 | CCNA2 | Wens et al. 2013 |
| CCR4 carbon catabolite repression 4-like | CCRN4L | De Boever et al. 2013 |
| Chaperonin containing TCP1, subunit 5 | CCT5 | Campagna et al. 2011 |
| Cell division cycle 5-like | CDC5L | Takahashi et al. 2009 |
| Cell division cycle 6 | CDC6 | De Boever et al. 2013 |
| Cyclin-dependent kinase 2 | CDK2 | Wens et al. 2011 |
| Cytoskeleton associated protein 5 | CKAP5 | Wens et al. 2011 |
| Clathrin heavy chain | CLTC | Campagna et al. 2011 |
| Collagen, type IV, alpha 1 | COL4A1 | De Boever et al. 2013 |
| Carnitine O-octanoyltransferase | CROT | Takahashi et al. 2009 |
| Cytochrome b5 type A | CYB5A | Lasserre et al. 2012 |
| Cytochrome P450, family 3, subfamily A, polypeptide 4 | CYP3A4 | Petersen et al. 2007; Thum et al. 2008; Williams et al. 2008; Al-Salman and Plant 2012 |
| Cytochrome P450, family 3, subfamily A, polypeptide 5 | CYP3A5 | Thum et al. 2008; Kopec et al. 2010, 2011 |
| Dopa decarboxylase | DDC | Ovando et al. 2010 |
| Dihydrolipoamide S-acetyltransferase | DLAT | Ovando et al. 2010 |
| Eukaryotic translation elongation factor | EEF1A | Campagna et al. 2011 |
| Enolase 1 (alpha) | ENO1 | Campagna et al. 2011 |
| Epoxide hydrolase 1, microsomal | EPHX1 | Lubet et al. 1992 |
| Exocyst complex component 3 | EXOC3 | Ovando et al. 2010 |
| Fatty acid synthase | FASN | Carlson et al. 2009; Ovando et al. 2010 |
| FK506 binding protein 4 | FKBP4 | Ovando et al. 2010; Campagna et al. 2011; Lasserre et al. 2012 |
| Glucose-6-phosphate dehydrogenase | G6PD | Carlson et al. 2009; Ovando et al. 2010 |
| Gamma-aminobutyric acid (GABA) B receptor, 2 | GABBR2 | Takahashi et al. 2009; Dickerson et al. 2011 |
| GA binding protein transcription factor | GABPA | Arzuaga et al. 2009 |
| Glyceraldehyde-3-phosphate dehydrogenase | GAPDH | Campagna et al. 2011 |
| Glutamate dehydrogenase 1 | GLUD1 | Carlson et al. 2009 |
| Glycerol-3-phosphate acyltransferase | GPAM | Ovando et al. 2010 |
| Glutamate receptor, metabotropic 3 | GRM3 | De Boever et al. 2013 |
| Growth hormone regulated TBC protein | GRTP1 | Williams et al. 2008 |
| Glutathione S-transferase pi 1 | GSTP1 | Wens et al. 2011 |
| GTP-binding protein 1 | GTPBP1 | De et al. 2010; Ribecco et al. 2012 |
| 2-hydroxy-CoA lyase 1 | HACL1 | Silkworth et al. 2008; Ovando et al. 2010 |
| Hypoxia inducible factor 1, alpha subunit | HIF1A | Clausen et al. 2005 |
| Hepatic nuclear factor 4, alpha | HNF4α | Takahashi et al. 2009; Ovando et al. 2010; De Boever et al. 2013 |
| 4-hydroxyphenylpyruvate dioxygenase | HPD | Ovando et al. 2006; Takahashi et al. 2009 |
| Heat shock protein 90kDa beta, member 1 | HSP90AB1 | Wens et al. 2011 |
| Heat shock 70kDa protein 1B | HSPA1B | De et al. 2010 |
| Heat shock 70kDa protein 4 | HSPA4 | De Boever et al. 2013 |
| Heat shock 70kDa protein 8 | HSPA8 | Lasserre et al. 2012 |
| Isocitrate dehydrogenase 2 | IDH2 | Ovando et al. 2006 |
| Insulin-like growth factor 1 receptor | IGF1R | Wens et al. 2011 |
| Lactate dehydrogenase A | LDHA | Wens et al. 2011 |
| LIM homeobox 2 | LHX2 | De Boever et al. 2013 |
| Mitogen-activated protein kinase 1 | MAPK1 | Al-Anati et al. 2009 |
| Malate dehydrogenase 2, NAD | MDH2 | Takahashi et al. 2009 |
| NADH-coenzyme Q reductase | NDUFS2 | Whitehead et al. 2010 |
| Nuclear factor 1/A | NF1A | Silkworth et al. 2008; Ovando et al. 2010 |
| Nitric oxide synthase 1 (neuronal) | NOS1 | Takahashi et al. 2009; Whitehead et al. 2010 |
| Nuclear factor subfamily 5 group A, member 2 | NR5A2 | Takahashi et al. 2009 |
| Prolyl 4-hydroxylase, beta | P4HB | Takahashi et al. 2009; Okada et al. 2009; Lasserre et al. 2012 |
| Platelet-activating factor acetylhydrolase 1b, catalytic subunit 2 | PAFAH1B2 | Campagna et al. 2011 |
| P21 protein (Cdc42/Rac) activated kinase 6 | PAK6 | De Boever et al. 2013 |
| Poly(ADP-ribose) polymerase 1 | PARP1 | Lin et al. 2009 |
| Phosphoenolpyruvate carboxykinase 1 | PCK1 | Takahashi et al. 2009; Wellband and Heath 2013 |
| Phosphoenolpyruvate carboxykinase 2 | PCK2 | Takahashi et al. 2009; Wellband and Heath 2013 |
| Pyruvate dehydrogenase (lipoamide) alpha 1 | PDHA1 | Takahashi et al. 2009; Campagna et al. 2011 |
| Phosphogluconate dehydrogenase | PGD | De Flora et al. 1985, 1986 |
| Progesterone receptor membrane component | PGRMC1 | Lasserre et al. 2012 |
| Pyruvate kinase muscle | PKM | Wellband and Heath 2013; Campagna et al. 2011 |
| Phospholipase C, beta 1 | PLCB1 | Wens et al. 2011 |
| RNA polymerase II | POLR2B | Arzuaga et al. 2009 |
| Protein kinase C substrate | PRKCSH | Takahashi et al. 2009; Lasserre et al. 2012 |
| Phosphoserine aminotransferase | PSAT1 | Ovando et al. 2010 |
| Phosphorylase, glycogen, muscle | PYGM | Campagna et al. 2011 |
| Member RAS oncogene family | RAB1A | Lasserre et al. 2012 |
| Retinoid X receptor, alpha | RXR | Olsvik et al. 2013 |
| Ribosomal protein, large P0 | RPLP0 | Takahashi et al. 2009; Lasserre et al. 2012 |
| Ribosomal protein S8 | RPS8 | Shimada et al. 2010 |
| Succinate dehydrogenase complex, subunit A | SDHA | Campagna et al. 2011 |
| Selenium-binding protein 1 | SELENBP1 | Takahashi et al. 2009; Lasserre et al. 2012 |
| Solute carrier family 17, member 7 | SLC17A7 | De Boever et al. 2013 |
| Solute carrier family 1, member 3 | SLC1A3 | Wens et al. 2011 |
| Solute carrier family 7, member 4 | SLC7A4 | Wens et al. 2011 |
| SWI/SNF related, matrix assoc., actin dependent regulator of chromatin, subfamily b, member 1 | SMARCB1 | Wens et al. 2011 |
| Sulfotransferase family 1E, estrogen-preferring, 1 | SULT1E1 | Wang et al. 2005; Wang and James 2007; Ekuase et al. 2011; Hamers et al. 2011 |
| Symplekin | SYMPK | Wens et al. 2013 |
| Tyrosine hydroxylase | TH | Lyng et al. 2007 |
| Transmembrane protein 98 | TMEM98 | De Boever et al. 2013 |
| Tubulin, alpha 1b | TUBA1B | Campagna et al. 2011 |
| Tubulin, alpha 1c | TUBA1C | Lasserre et al. 2012 |
| Tubulin, beta 2b | TUBB2B | Pelletier et al. 2009 |
| UDP-glucose ceramide glucosyltransferase | UGCG | Carlson et al. 2009; Ovando et al. 2010 |
| UDP glucuronosyltransferase 1 family, polypeptide A1 | UGT1A1 | Ovando et al. 2010; Roos et al. 2011 |
| Voltage-dependent anion channel 1 | VDAC1 | Takahashi et al. 2009 |
| Voltage-dependent anion channel 2 | VDAC2 | Takahashi et al. 2009 |
| Vitamin D (1,25-dihydroxyvitamin D3) receptor | VDR | Ju et al. 2012 |

Supplementary references (from Table A1)

Al-Anati, L., J. Högberg, and U. Stenius 2009. Non-dioxin like PCBs phosphorylate Mdm2 at Ser166 and attenuate the p53 response in HepG2 cells. Chemico-Biological Interactions 287:113-23.


Al-Salman, F. and N. Plant 2012. Non-coplanar polychlorinated biphenyls (PCBs) are direct agonists for the human pregnane-X receptor and constitutive androstane receptor, and activate target gene expression in a tissue-specific manner. Toxicology and Applied Pharmacology 263:7-13.
Arzuaga, X., N. Ren, A. Stromberg, E.P. Black, V. Arsenescu, L.A. Cassis, Z. Majkova, et al. 2009. Induction of gene pattern changes associated with dysfunctional lipid metabolism induced by dietary fat and exposure to a persistent organic pollutant. Toxicology Letters 189:96-101.
Baker, M.E., B. Ruggeri, L.J. Sprague, C. Eckhardt-Ludka, J. Lapira, I. Wick, L. Soverchia, M. Ubaldi, A.M. Polzonetti-Magni, D. Vidal-Dorsch et al. 2009. Analysis of endocrine disruption in southern California coastal fish using an aquatic multispecies microarray. Environmental Health Perspectives 117(2): 223-230.
Campagna, R., L. Brunelli, L. Airoldi, R. Fanelli, H. Hakansson, R.A. Heimeier, P. DeBoever, et al. 2011. Cerebellum proteomics addressing the cognitive deficit of rats exposed to the food-relevant polychlorinated biphenyl 138. Toxicological Sciences 123:170-179.
Carlson, E.A., C. McCulloch, A. Koganti, S.B. Goodwin, T.R. Sutter, and J.B. Silkworth 2009. Divergent transcriptomic responses to aryl hydrocarbon receptor agonists between rat and human primary hepatocytes. Toxicological Sciences 11:257-72.
Chen, G. and N.J. Bunce 2004. Interaction between halogenated aromatic compounds in the Ah receptor signal transduction pathway. Environmental Toxicology 19:480-489.
Chuiko, G.M., D.E. Tillitt, J.L. Zajicek, B.A. Flerov, V.M. Stepanova, Y.Y. Zhelnin, and V.A. Podgornaya 2007. Chemical contamination of the Rybinsk Reservoir, northwest Russia: relationship between liver polychlorinated biphenyls (PCB) content and health indicators in bream (Abramis brama). Chemosphere 67:527-36.
Clausen, I., S. Kietz, and B. Fischer 2005. Lineage-specific effects of polychlorinated biphenyls (PCB) on gene expression in the rabbit blastocyst. Reproductive Toxicology 20:47-56.
Curran, C.P., C.V. Vorhees, M.T. Williams, M.B. Genter, M.L. Miller, D.W. Nebert 2011. In utero and lactational exposure to a complex mixture of polychlorinated biphenyls: toxicity in pups dependent on the Cyp1A2 and Ahr genotypes. Toxicological Sciences 119:189-208.
De, S., S. Ghosh, R. Chatterjee, Y.-Q. Chen, L. Moses, A. Kesari, E.P. Hoffman, et al. 2010. PCB congener specific oxidative stress response by microarray analysis using human liver cell line. Environment International 36:907-917.
DeBoever, P., B. Wens, J. Boix, V. Felipo, and G. Schoeters 2013. Perinatal exposure to purity-controlled polychlorinated biphenyl 52, 138, or 180 alters toxicogenomics profiles in peripheral blood of rats after 4 months. Chemical Research in Toxicology 26:1159-1167.
De Flora, S., A. Morelli, C. Basso, M. Romano, D. Serra, and A. De Flora 1985. Prominent role of DT-diaphorase as a cellular mechanism reducing chromium(VI) and reverting its mutagenicity. Cancer Research 45:3188-96.
De Flora, S., M. Romano, C. Basso, M. Bagnasco, C.F. Cesarone, G.A. Rossi, and A. Morelli 1986. Detoxifying activities in alveolar macrophages of rats treated with acetylcysteine, diethyl maleate and/or aroclor. Anticancer Research 6:10009-12.
Dickerson, S.M., S.L. Cunningham, and A.C. Gore 2011. Prenatal PCBs disrupt early neuroendocrine development of the rat hypothalamus. Toxicology and Applied Pharmacology 252:36-46.
Durou, C., L. Poirier, J.C. Amiard, H. Budzinski, M. Gnassia-Barelli, K. Lemenach, L. Peluhet, et al. 2007. Biomonitoring in a clean and multi-contaminated estuary based on biomarkers and chemical analyses in the endobenthic worm Nereis diversicolor. Chemosphere 67:527-36.
Ekuase, E.J., Y. Liu, H.J. Lehmler, L.W. Robertson, and M.W. Duffel 2011. Structure-activity relationships for hydroxylated polychlorinated biphenyls as inhibitors of the sulfation of dehydroepiandrosterone catalyzed by human hydroxysteroid sulfotransferase SULT2A1. Chemical Research in Toxicology 24:1720-8.
Hamers, T., J.H. Kamstra, P.H. Cenjin, K. Pencikova, L. Palkova, P. Simeckova, J. Vondracek, et al. 2011. In vitro toxicity profiling of ultrapure non-dioxin-like polychlorinated biphenyl congeners and their relative toxic contribution to PCB mixtures in humans. Toxicological Sciences 121:88-100.
Heid, S.E., M.K. Walker, and H.I. Swanson 2001. Correlation of cardiotoxicity mediated by halogenated aromatic hydrocarbons to aryl hydrocarbon receptor activation. Toxicological Sciences 61:187-96.
Jensen, B.A., C.M. Reddy, R.K. Nelson, and M.E. Hahn 2010. Developing tools for risk assessment in protected species: Relative potencies inferred from competitive binding of halogenated aromatic hydrocarbons to aryl hydrocarbon receptors from beluga (Delphinapterus leucas) and mouse. Aquatic Toxicology 100: 238-45.
Ju, L., K. Tang, X.R. Guo, Y. Yang, G.Z. Zhu, and Y. Lou 2012. Effects of embryonic exposure to polychlorinated biphenyls on zebrafish skeletal development. Molecular Medicine Reports 5: 1227-31.
Kodavanti, P.R., C. Osorio, J.E. Royland, R. Ramabhadran, and O. Alzate 2011. Aroclor 1254, a developmental neurotoxicant, alters energy metabolism and intracellular signaling-associated protein networks in rat cerebellum and hippocampus. Toxicology and Applied Pharmacology 256:290-9.
Kopec, A.K., L.D. Burgoon, D. Ibrahim-Aibo, B.D. Mets, C. Tashiro, D. Potter, B. Sharratt et al. 2010. PCB153-elicited hepatic responses in the immature, ovariectomized C57BL/6 mice: comparative toxicogenomic effects of dioxin and non-dioxin-like ligands. Toxicology and Applied Pharmacology 243:359-371.
Kopec, A.K., M.L. D’Souza, B.D. Mets, L.D. Burgoon, S.E. Reese, K.J. Archer, D. Potter, et al. 2011. Non-additive hepatic gene expression elicited by 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) and 2,2’,4,4’,5,5’-hexachlorobiphenyl (PCB153) co-treatment in C57BL/6 mice. Toxicology and Applied Pharmacology 256: 154-167.
LaLone, C.A., D.L. Villeneuve, J.E. Cavallin, M.D. Kahl, E.J. Durhan, E.A. Makynen, K.M. Jensen, K.E. Stevens, M.N. Severson, C.A. Blanksma et al. 2013a. Cross-species sensitivity to a novel androgen receptor agonist of potential environmental concern, spironolactone. Environmental Toxicology and Chemistry 32(11):2528-2541.
Lasserre, J.P., F. Fack, T. Serchi, D. Revets, S. Planchon, J. Renault, L. Hoffmann, et al. 2012. Atrazine and PCB 153 and their effects on the proteome of the subcellular fractions of human MCF-7 cells. Biochimica et Biophysica Acta (BBA) – Proteins and Proteomics 1824:833-841.
Lin, C.H., C.L. Huang, M.C. Chuang, Y.J. Wang, D.R. Chen, S.T. Chen, and P.H. Lin 2009. Protective role of estrogen receptor-alpha on lower chlorinated PCB congener-induced DNA damage and repair in human tumoral breast cancer cells. Toxicology Letters 188:11-9.
Lubet, R.A., K.H. Dragnev, D.P. Chauhan, R.W. Nims, B.A. Diwan, J.M. Ward, C.R. Jones, et al. 1992. A pleiotropic response to phenobarbital-type enzyme inducers in the F344/NCr rat: effects of chemicals of varied structure. Biochemical Pharmacology 43:1067-78.
Lyng, G.D., A. Snyder-Keller, and R.F. Seegal 2007. Polychlorinated biphenyl-induced neurotoxicity in organotypic cocultures of developing rat ventral mesencephalon and striatum. Toxicological Sciences 97:128-39.
Maher, J.M., X. Cheng, A.L. Slitt, M.Z. Dieter, and C.D. Klaassen 2005. Induction of the multidrug resistance-associated protein family of transporters by chemical activators of receptor-mediated pathways in mouse liver. Drug Metabolism & Disposition 33:956-62.
Mortensen, A.S., M. Braathen, M. Sandvik, and A. Arukwe 2007. Effects of hydroxy-polychlorinated biphenyl (OH-PCB) congeners on the xenobiotic biotransformation gene expression patterns in primary culture of Atlantic salmon (Salmo salar) hepatocytes. Ecotoxicology and Environmental Safety 68:351-360.
Muthuvel, R., P. Venkataraman, G. Krishnamoorthy, D.N. Gunadharini, P. Kanagaraj, A. Jone Stanley, N. Srinivasan et al. 2006. Antioxidant effect of ascorbic acid on PCB (Aroclor 1254) induced oxidative stress in hypothalamus of albino rats. Clinica Chimica Acta 365: 297-303.
Okada, K., S. Hashimoto, Y. Funae, and S. Imaoka 2009. Hydroxylated polychlorinated biphenyls (PCBs) interact with protein disulfide isomerase and inhibit its activity. Chemical Research in Toxicology 22:899-904.
Olsvik, P.A., V. Berg, and J.L. Lyche 2013. Transcriptional profiling in burbot (Lota lota) from Lake Mjøsa – a Norwegian lake contaminated by several organic pollutants. Ecotoxicology and Environmental Safety 92: 94-103.
Ovando, B.J., C.M. Vezina, B.P. McGarrigle, and J.R. Olson 2006. Hepatic gene down regulation following acute and subchronic exposure to 2,3,7,8-tetrachlorodibenzo-p-dioxin. Toxicological Sciences 94: 428-38.
Ovando, B.J., C.A. Ellison, C.M. Vezina, and J.R. Olson 2010. Toxicogenomic analysis of exposure to TCDD, PCB126 and PCB153: identification of genomic biomarkers of exposure to AhR ligands. BMC Genomics 11:583. doi:10.1186/1471-2164-11-583.
Pelletier, G., S. Masson, M.J. Wade, J. Nakai, R. Alwis, S. Mohottalage, P. Kumarathasan, P. Black, et al. 2009. Contribution of methylmercury, polychlorinated biphenyls and organochlorine pesticides to the toxicity of a contaminant mixture based on Canadian Arctic population blood profiles. Toxicology Letters 184: 176-85.
Ribecco, C., G. Hardiman, R. Sasik, S. Vittori, and O. Carnevali 2012. Teleost fish (Solea solea): A novel model for ecotoxicological assay of contaminated sediments. Aquatic Toxicology 109: 133-142.
Roos, R., P.L. Andersson, K. Halldin, H. Hakansson, E. Westerholm, T. Hamers, G. Hamscher, et al. 2011. Hepatic effects of a highly purified 2,2',3,4,4',5,5'-heptachlorobiphenyl (PCB 180) in male and female rats. Toxicology 284:42-53.
Sazonova, N.A., T. DasBanerjee, F.A. Middleton, S. Gowtham, S. Schuckers, and S.V. Faraone 2011. Transcriptome-wide gene expression in a rat model of attention deficit hyperactivity disorder symptoms: rats developmentally exposed to polychlorinated biphenyls. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics 156: 898-912.
Silkworth, J.B., E.A. Carlson, C. McCulloch, K. Illouz, S. Goodwin, and T.R. Sutter 2008. Toxicogenomic analysis of gender, chemical, and dose effects in livers of TCDD- or aroclor 1254-exposed rats using a multifactorial model. Toxicological Sciences 102: 291-309.
Shimada, M., S. Kameo, N. Sugawara, K. Yaginuma-Sakurai, N. Kurokawa, S. Mizukami-Murata, K. Nakai et al. 2010. Gene expression profiles in the brain of the neonate mouse perinatally exposed to methylmercury and/or polychlorinated biphenyls. Archives of Toxicology 84: 271-86.
Suh, J., J.S. Kang, K.H. Yang, and N.E. Kaminski 2003. Antagonism of the aryl-hydrocarbon receptor-dependent induction of CYP1A1 and inhibition of IgM expression by di-ortho-substituted polychlorinated biphenyls. Toxicology and Applied Pharmacology 187:11-21.
Takahashi, M., T. Negishi, M. Imamura, E. Sawano, Y. Kuroda, Y. Yoshikawa, and T. Tashiro 2009. Alterations of glutamate receptors and exocytosis-related factors by a hydroxylated-polychlorinated biphenyl in developing rat brains. Toxicology 257:17-24.
Thum, T. and J. Borlak 2008. Detection of early signals of hepatotoxicity by gene expression profiling studies with cultures of metabolically competent human hepatocytes. Archives of Toxicology 82:89-101.
Veldhoen, N., C.R. Propper, C.C. Helbing 2014. Enabling comparative gene expression studies of thyroid hormone action through the development of a flexible real-time quantitative PCR assay for the use across multiple anuran indicator and sentinel species. Aquatic Toxicology 148:162-173.
Wang, L.Q., H.J. Lehmler, L.W. Robertson, C.N. Falany, and M.O. James 2005. In vitro inhibition of human hepatic and cDNA-expressed sulfotransferase activity with 3-hydroxybenzo[a]pyrene by polychlorobiphenylols. Environmental Health Perspectives 113:680-7.


Wang, L.Q. and M.O. James 2007. Sulfonation of 17beta-estradiol and inhibition of sulfotransferase activity by polychlorobiphenylols and celecoxib in channel catfish, Ictalurus punctatus. Aquatic Toxicology 81:286-92.
Wens, B., P. DeBoever, M. Maes, K. Hollanders, and G. Schoeters 2011. Transcriptomics identifies differences between ultrapure non-dioxin like polychlorinated biphenyls (PCBs) and dioxin-like PCB 126 in cultured peripheral blood mononuclear cells. Toxicology 287: 113-23.
Wens, B., P. DeBoever, M. Verbeke, K. Hollanders, and G. Schoeters 2013. Cultured human peripheral blood mononuclear cells alter their gene expression when challenged with endocrine-disrupting chemicals. Toxicology 303:17-24.
Whitehead, A., D.A. Triant, D. Champlin, and D. Nacci 2010. Comparative transcriptomics implicates mechanism of evolved pollution tolerance in a killifish population. Molecular Ecology 19: 5186-5203.
Williams, T.D., A. Diab, F. Ortega, V.S. Sabine, R.E. Godfrey, F. Falciani, J.K. Chipman, and S.G. George 2008. Transcriptomic responses of European flounder (Platichthys flesus) to model toxicants. Aquatic Toxicology 90:83-91.


Table A2. Summary statistics for domain nucleotide conservation of candidate PCB-interacting genes per KEGG pathway.

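| KEGG pathway | N | Mean | Std. Deviation | Median | 95% Confidence Interval for Mean, Lower | 95% Confidence Interval for Mean, Upper | Min. | Max. |
|---|---|---|---|---|---|---|---|---|
| Cellular Processes | 17 | 53.01 | 9.90 | 53.88 | 49.15 | 56.87 | 30.93 | 72.76 |
| Environmental Information Processing | 14 | 51.29 | 5.40 | 50.60 | 47.04 | 55.54 | 42.65 | 60.90 |
| Genetic Information Processing | 23 | 55.60 | 6.99 | 55.22 | 52.31 | 51.30 | 31.72 | 65.82 |
| Metabolism | 51 | 49.09 | 8.23 | 50.43 | 46.86 | 51.32 | 31.72 | 65.82 |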

ANOVA test summary – nucleotide conservation

| Source of Variation | Sum of Squares | d.f. | Mean squares | F | p |
|---|---|---|---|---|---|
| between | 722.8 | 3 | 240.9 | 3.797 | 0.013 |
| within | 6410 | 101 | 63.46 | | |
| total | 7133 | 104 | | | |
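The F ratio reported in the ANOVA summary can be checked from the tabulated sums of squares and degrees of freedom. The following is a minimal sketch, assuming a standard one-way ANOVA across the four KEGG categories; the function name `anova_f` is illustrative and is not taken from the thesis, and the p value is not recomputed here because that would require an F-distribution routine outside the standard library.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (MS_between, MS_within, F) for a one-way ANOVA summary."""
    ms_between = ss_between / df_between  # mean square between groups
    ms_within = ss_within / df_within     # mean square within groups (error)
    return ms_between, ms_within, ms_between / ms_within


# Values copied from the nucleotide-conservation ANOVA summary (Table A2).
ms_b, ms_w, f_ratio = anova_f(722.8, 3, 6410.0, 101)
print(f"MS between = {ms_b:.1f}, MS within = {ms_w:.2f}, F = {f_ratio:.3f}")
```

Small differences in the last digit relative to the table are expected, because the tabulated sums of squares are themselves rounded.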


Table A3. Summary statistics for domain amino acid conservation of candidate PCB-interacting genes per KEGG pathway.

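| KEGG pathway | N | Mean | Std. Deviation | Median | 95% Confidence Interval for Mean, Lower | 95% Confidence Interval for Mean, Upper | Min. | Max. |
|---|---|---|---|---|---|---|---|---|
| Cellular Processes | 17 | 70.79 | 18.0 | 69.99 | 63.45 | 78.14 | 32.43 | 97.56 |
| Environmental Information Processing | 14 | 65.38 | 14.2 | 61.20 | 57.29 | 73.48 | 41.18 | 94.37 |
| Genetic Information Processing | 23 | 75.81 | 13.1 | 80.56 | 69.50 | 82.13 | 50.00 | 91.63 |
| Metabolism | 51 | 62.86 | 15.5 | 65.28 | 58.62 | 67.10 | 25.26 | 97.52 |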

ANOVA test summary – amino acid conservation

| Source of Variation | Sum of Squares | d.f. | Mean squares | F | p |
|---|---|---|---|---|---|
| between | 2931 | 3 | 976.8 | 4.190 | 0.0077 |
| within | 2.3548E04 | 101 | 233.1 | | |
| total | 2.6478E04 | 104 | | | |


Table A4. CODEHOP primers and primer properties (N=68).

Upper case letters = non-degenerate clamp; Lower case letters = degenerate core; D=degeneracy; Tm=melting temperature; *=housekeeping gene.
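The degeneracy D listed for each primer is conventionally the number of distinct sequences represented by the primer, that is, the product of the number of bases each IUPAC ambiguity code stands for. The sketch below illustrates that calculation under this assumption; the function name and the example primer are hypothetical and are not taken from Table A4, and inosine is not handled.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1, "U": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer):
    """Return the number of distinct sequences encoded by a primer string."""
    degeneracy = 1
    for base in primer.upper():  # case only marks the primer region
        degeneracy *= IUPAC_COUNTS[base]
    return degeneracy


# Hypothetical primer: upper-case non-degenerate part, lower-case degenerate part.
example_primer = "GGCTACGTGaarccngaytgg"
print(primer_degeneracy(example_primer))
```

In this hypothetical primer only the lower-case positions r, n, and y contribute, so the upper-case part leaves the product unchanged.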




Table A5. BLAST output for all genes tested using CODEHOP primers.

Q=query; SL=sequence length. Legend: Low quality; Incorrect gene sequence identified.





Appendix – Figures

Figure A2a A simplified comparison of data generation between RNA-seq and targeted RNA-seq. *In Targeted RNA-seq, Total RNA can also be prepared initially and reverse transcribed into cDNA; primers can enrich for mRNA during PCR.


Figure A2b A simplified comparison of data analysis between RNA-seq and targeted RNA-seq.
