DE NOVO TRANSCRIPTOME ASSEMBLY FOR THE OLYMPIA OYSTER

(OSTREA LURIDA), A SPECIES OF CONSERVATIONAL CONCERN

A University Thesis Presented to the Faculty of California State University, East Bay

In Partial Fulfillment of the Requirements for the Degree Master of Science in Biological Science

By Ashley Maynard June 2016

Copyright © 2016 by Ashley Maynard


ABSTRACT

The Olympia oyster, Ostrea lurida, is the only oyster native to the west coast of North America. Once abundant, O. lurida is now considered functionally extinct, and the absence of Olympia oysters has contributed to a continual decline in the health of California’s valuable estuary ecosystems. Human-assisted reintroduction is one potential strategy to increase O. lurida population sizes and help restore Californian estuaries. Ideally, restoration would use genotypes capable of surviving future conditions. However, which O. lurida populations will be most tolerant of climate change, in particular low salinity, and therefore most suitable for use in reintroduction remains uncertain. RNA-Seq has emerged as a vital tool to understand ecological and evolutionary processes of non-model organisms. Using a novel RNA-Seq pipeline, a transcriptome for O. lurida was generated, yielding 51,574 contigs and accounting for upwards of 10,000 unique genes. Quality control metrics, including mean contig length (1664 bp), percent of reads mapped back to the reference transcriptome (78%), and percent of annotated reads (49%), are similar to those of other non-model organism transcriptome assemblies, and the transcriptome offers a substantial improvement over existing sequence resources for O. lurida. In future research, this transcriptome can be used to better understand how O. lurida populations manage environmental stress at the molecular level and to help identify which populations of O. lurida are ultimately best suited for reintroduction.


DE NOVO TRANSCRIPTOME ASSEMBLY FOR THE OLYMPIA OYSTER

(OSTREA LURIDA), A SPECIES OF CONSERVATIONAL CONCERN

By Ashley Maynard

Approved:

Tyler G. Evans

Christopher Baysdorfer


ACKNOWLEDGEMENTS

There are a lot of people who should be thanked for their support throughout this research, and honestly I think making sure everyone is given their due is almost harder than writing up all the research in the first place. For anyone who has completed or is even in the process of scientific research, I am sure you can sympathize and relate to the vast amounts of tireless work that go into it. I strongly believe that without the support of the following people I would never have been able to end this journey with my sanity intact. Firstly, I would like to express my sincere gratitude to my advisor Dr. Tyler Evans for his knowledge and guidance throughout my Master’s research. Besides my advisor, I would like to extend my thanks to the rest of my thesis committee: Dr. Christopher Baysdorfer and Dr. Brian Perry. I would also like to extend my thanks to my collaborators, Dr. Jill Bible and Dr. Eric Sanford, at UC Davis. If not for their amazing research and diligence this research would not have been possible. It should be mentioned that there are those who were not directly involved with this project but who have supplied me with confidence, strength and knowledge throughout my time at California State University East Bay, and so I would like to thank the staff and faculty of the Biology department. Additionally, I want to thank my friends from the BCP and Masters programs. It was the stimulating conversations, drinking nights and healthy competition that pushed me through all those bumps in the road so inherent to research. Lastly, but certainly not least, I want to thank my family. I love you all so very much, and it really has been your support and love that have motivated me to press on even when I wanted to give up. Thank you all so very much; I am forever thankful.


TABLE OF CONTENTS

Abstract
Acknowledgements
List of Figures
List of Tables
1 Introduction
1.1 Background
1.2 Thesis Statement
1.3 Significance of Research
1.4 RNA Sequencing of Olympia Oysters
2 Experimental Procedures
2.1 Experimental Workflow
2.2 Collection and Salinity Challenge
2.3 RNA Isolation
2.4 cDNA Library Construction
2.5 Assessment of Library Quality
2.6 RNA-Sequencing
2.7 Bioinformatics
2.8 Mapping Reads to De Novo Transcriptome
2.9 Phylogeny
2.10 Comparison of Gene Ontology Among Bivalves
3 Results
3.1 RNA Sequencing
3.2 Transcriptome Assembly
3.3 Annotation
4 Discussion
4.1 De Novo Transcriptome Assembly
4.2 Future Work
References
Appendix A: RNA Isolation Information
Appendix B: RNA-Seq Sample Lane Assignments
Appendix C: Assembly Parameters
Appendix D: Raw Transcriptome Results
Appendix E: Reads Mapped to Final Transcriptome
Appendix F: Genebank Sequences (16S) Used For Phylogeny Tree


LIST OF FIGURES

Figure 1: O. lurida in natural environment in San Francisco Bay
Figure 2: The historic distribution of O. lurida
Figure 3: Climate processes, from global to estuarine outcomes
Figure 4: Mortality results from salinity exposure among O. lurida populations
Figure 5: Map of collection sites
Figure 6: Maximum likelihood tree based on 16S subunit of rRNA
Figure 7: Experimental workflow
Figure 8: O. lurida settled on experimental settlement plate
Figure 9: Traditional method of de novo assembly
Figure 10: Stepwise method of de novo assembly
Figure 11: Number of contigs comparison across four assembly levels
Figure 12: Mean contig length comparison across four assembly levels
Figure 13: N50 comparison across four assembly levels
Figure 14: Nucleotide distribution among assembly levels
Figure 15: Percent of the sample reads mapped to the assembled transcriptome
Figure 16: BLASTx and BLASTp results from Trinotate
Figure 17: Cellular Component GO ID comparison of BLASTx and BLASTp
Figure 18: Molecular Function GO ID comparison of BLASTx and BLASTp
Figure 19: Biological Processes GO ID comparison of BLASTx and BLASTp
Figure 20: Comparison of mean contig lengths among four oyster species
Figure 21: Comparison of N50 and mean contig lengths of the traditional and stepwise assembly methods
Figure 22: Comparison of the overall number of contigs between the “traditional” and stepwise assembly methods
Figure 23: Percent of contigs annotated by BLAST homology compared among non-model organisms
Figure 24: Comparison of GO IDs among four bivalves
Figure 25: Comparison of the number of genes by GO ID category
Figure 26: Map of salt-related signal transduction, effectors, and physiological changes in response to salt stress
Figure 27: Biological GO IDs related to oyster response to hypoosmotic stress
Figure 28: Future experimental workflow


LIST OF TABLES

Table 1: Salinity exposure table for gene expression
Table 2: PCR conditions for library enrichment
Table 3: Lane and barcode assignments for each of the 36 samples sequenced
Table 4: Distribution of samples by population and treatment
Table 5: Summary table of dispersion of sample reads
Table 6: Salinity exposure table for SNP detection


1 INTRODUCTION

1.1 BACKGROUND

The Olympia oyster Ostrea lurida (Fig 1) is the only native oyster along the west coast of the United States and Canada. Ostrea lurida were once abundant in bays and estuaries along their historical distribution (Fig 2), and were harvested at low intensities for thousands of years (Barrett, 1963). However, numbers of O. lurida began to decrease sharply at the beginning of the Gold Rush in 1849 (Kirby, 2004), when a growing population of people began to inhabit the San Francisco Bay area. Increased demand for O. lurida led to the commercialization of fisheries, which by 1890 had dramatically reduced oyster populations within California estuaries (Kirby, 2004). In the decades that followed, continued overfishing, pollution, and other anthropogenic activities acted to further deplete O. lurida numbers in coastal California such that they are presently considered functionally extinct (Baker, 1995). Today oyster abundance is estimated to be 1% of historic levels in the San Francisco Bay (Wasson et al., 2014).

As O. lurida numbers decline, so too does the health of California’s estuaries. Ostrea lurida is considered a foundation species that provides habitat for a community of at least 47 other species and assists in maintaining the overall health of estuary environments by providing valuable ecosystem services such as coastline protection and water filtration (Kimbro and Grosholz, 2006; Newell, 2004; Grosholz et al., 2007; Zabin et al., 2010). As suspension feeders, O. lurida are efficient filtration systems, and a single adult O. lurida can filter approximately 2.4 liters of seawater per hour at 25°C (Ermgassen et al., 2013). At this rate, up to 62% of suspended particulate matter can be removed from waters overlying oyster assemblages, which improves water quality and stimulates the growth of beneficial sea grasses (Brumbaugh et al., 2007). In addition, O. lurida grow in large oyster beds that act like breakwaters by attenuating waves and preventing coastal erosion (Hertlein, 1959).

Figure 1: O. lurida in natural environment in San Francisco Bay.

Considering the ecological importance of O. lurida, resource managers have been targeting the species for restoration since the late 1990s. Early efforts to restore O. lurida began in Puget Sound, Washington in 1999 (Peter-Contesse and Peabody, 2005). Restoration efforts typically involve spreading oyster seed, large-scale deployment of substrate to encourage juvenile settlement, or a combination of the two. The effort in Puget Sound continues today: the Puget Sound Restoration Fund is currently engaged in an ongoing 10-year restoration project using both methods of restoration (Olympia Oyster Restoration). Other projects in Oregon and in Central and Southern California range from deploying small structures for recruitment to larger-scale mixed-species restoration projects with both physical and biological objectives in a “living shorelines” model (Wasson et al., 2014). In July and August of 2012, the San Francisco Bay Living Shorelines Project constructed oyster and eelgrass reefs as part of a restoration plan for the San Rafael and Hayward Shoreline. In 2015, the San Francisco Bay Living Shorelines Project reported over two million oysters settled on reef structures. Accumulation of O. lurida is credited with providing substantial food and nesting resources, increasing wave attenuation rates by 30-50%, and increasing invertebrate, fish and bird populations (SF Bay Living Shorelines).

Figure 2: The historic distribution of O. lurida along the North American Coast from Alaska to Baja (depicted in orange).

Human-assisted transplants of adult oysters into degraded estuaries represent another potential strategy for increasing O. lurida numbers and promoting estuary recovery. Where seeding oyster larvae is not an option due to unfavorable conditions for recruitment, seeding sites with adult oysters is the only remaining option to encourage juvenile settlement and create a self-sustaining population. While there are no currently known restoration efforts involving transplantation of adult Olympia oysters, this approach has been used successfully in the past in estuaries on the east coast using Eastern oysters (Crassostrea virginica). In 1998, 65,000 adult oysters were transplanted into experimental estuaries in Virginia, and large increases in the number of juveniles and recruits were observed the following season (Brumbaugh et al., 2000).

Rapid climate change poses a growing threat to O. lurida and to human-assisted restoration efforts. While there have been several studies investigating the factors limiting O. lurida restoration with regard to substrate and recruitment (Brumbaugh & Coen, 2009; Brumbaugh et al., 2006; Wasson, 2010; Trimble et al., 2009), little is known about how climate change will affect restoration efforts. Restoration involving reintroduction of adult oysters is currently hampered by uncertainty about whether or not O. lurida populations differ in their susceptibility toward climate change stressors, and which stock populations (i.e. genotypes) should be used in restoration to ensure transplanted oysters will survive future environmental conditions. Even relatively small restoration projects can be costly, and it makes little sense to transplant oysters whose genotype limits their ability to cope with climate change. Brumbaugh et al. (2007) estimate costs upwards of $100,000 per acre for restored oyster reef. Ongoing restoration projects in Maryland and Virginia involving the construction of artificial reefs using oyster shells have estimated costs of $10,000 per acre (Schaafsma & Turner, 2009).


Climate change-driven shifts in water salinity have emerged as a prominent risk to O. lurida. Estuaries in San Francisco Bay are expected to increase in overall salinity due to rising sea levels, while concurrently experiencing more frequent freshwater floods as a result of increased and heavier winter precipitation (Cloern et al., 2011) and freshwater runoff from land (Scavia et al., 2002). Climate change-driven increases in air temperature cause precipitation that would typically fall as snow to fall as rain, and also accelerate winter snowmelt. These freshwater flows inundate estuary oyster beds, causing large and rapid declines in water salinity (Fig 3; Scavia et al., 2002). Ostrea lurida are osmoconformers, meaning they are unable to actively regulate the osmotic composition of their extracellular fluids. Oysters and other bivalves maintain osmotic homeostasis during sudden and dramatic shifts in salinity largely through behavioral changes, physically closing their shells until peripheral osmoreceptors detect a return to normal salinity (Berger & Kharazova, 1997). Valve closing behavior can allow marine mollusks to survive for short periods under salinity stress, but it prevents access to oxygen, thus restricting aerobic metabolism, and is therefore not a long-term coping mechanism. Field and laboratory observations show that O. lurida thrive in salinities above 25 ppt (Baker, 1995; Couch & Hassler, 1989), but tolerate only short exposure to lower salinities (Wasson et al., 2014). For example, juvenile O. lurida suffer significant mortality when exposed to salinities below 10 practical salinity units (psu) for five or more days (Wasson et al., 2014). Field monitoring data also show that oyster performance metrics, including size, recruitment rate, and growth, are negatively correlated with the average percentage of days when salinity is below 25 psu (Wasson et al., 2014). Extreme freshwater events, like those predicted to occur in the San Francisco Bay, overwhelm short-term strategies for maintaining osmotic homeostasis and can cause mass mortalities in oyster beds. Furthermore, sub-lethal exposure to water with reduced salinity is likely to impair physiological performance in O. lurida and compromise the ability of oysters to provide ecosystem services that facilitate estuary restoration (Wasson et al., 2014).

Figure 3: Climate processes, from global to estuarine outcomes.

The long-term persistence of O. lurida in San Francisco Bay and success of restoration efforts depend in part on identifying genotypes tolerant of low salinity water, which requires an enhanced understanding of local adaptation in O. lurida, and also how distinct oyster populations will respond to future stresses stemming from climate change.

Local adaptation results in resident genotypes that have a higher fitness in their native habitat than do foreign genotypes from more distant populations (Kawecki & Ebert, 2004). As a consequence of local adaptation, populations of the same species react differently to changing environmental conditions. For example, one population of oysters may possess a trait, such as increased low salinity tolerance, that an adjacent population encountering different environmental conditions will not have. Some populations of O. lurida within the San Francisco Bay appear locally adapted to water salinity, which is thought to derive from intense selection following freshwater flooding (Bible & Sanford, 2016). Sites exposed to low salinity water favor the survival of genotypes more tolerant of hypoosmotic stress compared with sites that experience freshwater events less frequently. Identifying populations adapted to low salinity will facilitate estuary restoration efforts in Northern California because these populations are most likely to survive future environmental conditions projected for the region.

In 2011, freshwater flooding caused mass mortality within the Loch Lomond O. lurida population of San Francisco Bay (Bible & Sanford, 2016). We hypothesized that oysters surviving this mortality event would have been selected for enhanced tolerance of reduced salinity. Consistent with this prediction, experiments performed by our collaborators, Jillian Bible and Eric Sanford from Bodega Marine Lab of the University of California Davis (BML), demonstrate that Loch Lomond oysters exhibit significantly greater survival during low salinity challenge compared with Oyster Point and Tomales Bay populations (Fig 4). As the frequency of freshwater flooding is expected to increase within these estuaries due to climate change, Loch Lomond oysters may be better suited for future restoration efforts in Northern California.


Figure 4: Mortality results from salinity exposure among O. lurida populations. Control conditions are shown on the right, next to freshwater flooding conditions shown on the left. The graph depicts survival along the y axis and test population along the x axis. Among the populations exposed to freshwater flooding conditions, survival of the Loch Lomond oysters differs significantly from that of the Oyster Point and Tomales Bay oysters.

1.2 THESIS STATEMENT

The mechanistic basis for increased tolerance of low salinity seawater in Olympia oysters is unknown, and rarely does nature provide an opportunity to explore adaptive responses to the environment so soon after a major selective event. To explore local adaptation to salinity stress in O. lurida populations inhabiting three Northern California estuaries, Tomales Bay, Oyster Point, and Loch Lomond (Smith and Hollibaugh, 1997; NERR 2010; Fig 5), I used a comparative genomics approach to investigate mechanisms of enhanced low salinity tolerance. A necessary first step toward achieving this goal was to increase DNA sequence resources for O. lurida, and here I describe RNA sequencing and construction of a comprehensive transcriptome for O. lurida. This research identified over 10,000 putative genes for this species of conservation concern, and the O. lurida transcriptome will facilitate future studies aimed at understanding the molecular underpinnings of low salinity tolerance.

Figure 5: Map of collection sites, Tomales Bay (TB), Loch Lomond (LL), and Oyster Point (OP), in the San Francisco Bay.

1.3 SIGNIFICANCE OF RESEARCH

This research integrates genomics, environmental physiology, and marine conservation to address a problem of local concern. Data from this work will advance estuary restoration by providing nucleic acid sequence information that can be used to identify traits associated with tolerance toward climate change and genetic markers that can be used to screen other O. lurida populations for enhanced hypoosmotic tolerance. More broadly, climate change has emerged as a threat to healthy marine ecosystems world-wide and the O. lurida transcriptome will enable downstream analyses that inform decisions regarding the susceptibility or tolerance of oysters toward climate change. This research also highlights problems surrounding California’s estuaries and the negative consequences of human activities on these important ecosystems. Results generated from this work will be useful to resource managers and may therefore foster stewardship and sustainable use of one of the most heavily populated coastlines in the state.

1.4 RNA SEQUENCING OF OLYMPIA OYSTERS

Rapid advances in next generation sequencing (NGS) technology and bioinformatics tools now offer a means to generate large amounts of nucleic acid sequence information for any species of interest. RNA-Seq is a revolutionary tool for characterizing changes in the cellular mRNA pool or transcriptome, and provides superior measurement of transcript levels compared with alternative methods (Wang et al., 2009). RNA-Seq allows a survey of the transcriptome without prior knowledge of the genome sequence, which is not the case with hybridization-based approaches such as microarrays. With RNA-Seq, a population of mRNA is converted to a library of fragmented complementary DNA (cDNA). The cDNA fragments are then sequenced to obtain short reads from one end (single-end sequencing) or both ends (paired-end sequencing) of the fragment. These short reads are then assembled into longer contigs using homology to full length gene sequences contained in DNA sequence databases like NCBI and Uniprot. The result is a large number of putative RNA transcripts (genes), representing the pool of mRNA present in cells at the time of sampling, that is, a transcriptome. A transcriptome in itself provides important information about the complement of genes an organism uses to survive and respond to its environment. In addition, a transcriptome facilitates downstream analyses that provide more detailed information about how an organism modifies mRNA to cope with environmental change.

The number of short RNA-Seq reads that map to a given contig reflects the abundance of that particular mRNA, and by comparing the number of reads per contig between samples (for example control versus low salinity treatments) it is possible to determine how organisms modify mRNA abundance (i.e. gene expression) to cope with environmental change. Having expression information for such a large number of contigs provides an integrated picture of the molecular, cellular and physiological mechanisms that collectively constitute an organism’s response to the environment (Evans & Hofmann, 2012).
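The read-counting logic described above can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in this thesis: the contig names and counts are hypothetical, and the counts-per-million scaling and pseudocount are illustrative assumptions standing in for the normalization that dedicated differential expression software performs.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts per contig to counts per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {contig: 1e6 * n / total for contig, n in counts.items()}


def log2_fold_change(control, treatment, pseudocount=1.0):
    """Return log2(treatment / control) per contig after library-size scaling.

    The pseudocount (an illustrative choice) avoids division by zero for
    contigs with no reads in one of the two samples.
    """
    cpm_control = counts_per_million(control)
    cpm_treatment = counts_per_million(treatment)
    contigs = set(cpm_control) | set(cpm_treatment)
    return {
        contig: math.log2(
            (cpm_treatment.get(contig, 0.0) + pseudocount)
            / (cpm_control.get(contig, 0.0) + pseudocount)
        )
        for contig in contigs
    }


# Hypothetical read counts per contig, not data from this study.
control_counts = {"contig_1": 100, "contig_2": 900}
low_salinity_counts = {"contig_1": 400, "contig_2": 600}

changes = log2_fold_change(control_counts, low_salinity_counts)
for contig in sorted(changes):
    print(f"{contig}\t{changes[contig]:+.2f}")
```

A positive value indicates a contig with relatively more reads under low salinity than under control conditions, and a negative value indicates the reverse.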

Olympia oysters do not have a sequenced genome; therefore, we cannot map our short RNA-Seq reads to a complete set of DNA sequences. The most closely related species with a fully sequenced genome is the Pacific oyster (Crassostrea gigas). Parsimony-based phylogeny studies indicate that C. gigas and O. lurida are distant relatives placed in two separate genera of the family Ostreidae (Polson et al., 2009). In addition, a maximum likelihood analysis supports the conclusion that O. lurida and C. gigas are distant relatives (Fig 6). Within the 16S subunit of ribosomal RNA (rRNA) alone there is roughly a 16% difference between the two species. The 16S subunit has slow rates of evolution, and is therefore highly conserved across species. Such variation between the 16S subunit of C. gigas and O. lurida indicates that there is considerable DNA sequence variation between the species, which makes using the C. gigas genome ineffective as a template for assembling short RNA-Seq reads into larger contigs. Given this sequence divergence, the alternative strategy of using DNA sequence information from all other available organisms to assemble our O. lurida RNA sequences into longer contigs must be taken. The objective of this research was to develop a de novo transcriptome for O. lurida exposed to normal and low salinity seawater using this approach. The newly developed transcriptome will facilitate a number of important downstream applications that will allow researchers to better understand local adaptation to salinity in populations inhabiting Northern California estuaries.

Figure 6: Maximum likelihood tree based on 16S subunit of rRNA; bootstrap support values are based on 500 replicates.
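A percent difference such as the roughly 16% reported above for 16S can be computed from a pairwise alignment as the fraction of compared positions that differ. The Python sketch below shows one simple way to do this. It assumes the two sequences are already aligned to equal length and that gapped positions are skipped; the thesis does not state which convention was used, and the example sequences are hypothetical.

```python
def percent_difference(seq_a, seq_b, gap="-"):
    """Percent of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip alignment gaps (an assumed convention)
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


# Hypothetical aligned fragments, not real 16S sequences.
fragment_a = "ACGTACGTAC"
fragment_b = "ACGTTCGTAA"
print(f"{percent_difference(fragment_a, fragment_b):.1f}% difference")
```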


2 EXPERIMENTAL PROCEDURES

2.1 EXPERIMENTAL WORKFLOW

Figure 7 depicts the general pipeline of methods conducted at BML and California State University East Bay.

Figure 7: Experimental workflow graphical representation (adapted from Simple Fool’s Guide to Population Genomics (De Wit et al., 2012)).

2.2 COLLECTION AND SALINITY CHALLENGE

Oyster field collection:

Experiments were performed on second generation progeny from adult O. lurida collected from Tomales Bay, Loch Lomond and Oyster Point. Adult O. lurida collected from the field were transported to seawater facilities at BML, and maintained in ambient seawater for two generations under care of our collaborators Jill Bible and Eric Sanford.


Salinity challenge:

Salinity challenge was performed by our collaborators at BML. Settlement plates (Fig 8) containing 5-8 oysters from each population were transferred to experimental aquaria and either held in ambient seawater (~33 ppt) for the duration of the experiment (control) or exposed to a decline in salinity from 33 ppt to 5 ppt over the course of five days (~5.6 ppt/day) and then held in 5 ppt seawater for four more days prior to being rapidly frozen in liquid nitrogen and stored at -80°C (Table 1). Preliminary experiments demonstrated that this treatment regime did not cause significant differences in mortality between any of the three populations.

Figure 8: Ostrea lurida settled on experimental settlement plate.


Table 1: Salinity exposure table for gene expression.

Day    Salinity (ppt)
1      33
2      25
3      20
4      15
5      10
6      5
7      5
8      5
9      5
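As a quick check on the exposure regime, the schedule in Table 1 can be encoded directly and the average rate of decline recomputed. The Python sketch below does only that; the function and variable names are illustrative.

```python
# Day -> salinity (ppt), transcribed from Table 1.
SALINITY_SCHEDULE = {1: 33, 2: 25, 3: 20, 4: 15, 5: 10, 6: 5, 7: 5, 8: 5, 9: 5}


def mean_daily_decline(schedule):
    """Average salinity drop per day between the first day and the first day at the minimum."""
    days = sorted(schedule)
    start_day = days[0]
    lowest = min(schedule.values())
    first_low_day = min(day for day in days if schedule[day] == lowest)
    return (schedule[start_day] - lowest) / (first_low_day - start_day)


def days_at_minimum(schedule):
    """Number of days listed in the schedule at the lowest salinity."""
    lowest = min(schedule.values())
    return sum(1 for value in schedule.values() if value == lowest)


print(f"Mean decline: {mean_daily_decline(SALINITY_SCHEDULE):.1f} ppt/day")
print(f"Days listed at minimum salinity: {days_at_minimum(SALINITY_SCHEDULE)}")
```

The computed mean decline of 28 ppt over five daily steps agrees with the ~5.6 ppt/day stated in the text, although the individual daily steps in Table 1 are not uniform (8 ppt on the first step, 5 ppt thereafter).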

2.3 RNA ISOLATION

Whole frozen oysters were dissected and placed in 2ml Eppendorf tubes at BML and transferred to a -80°C freezer at California State University East Bay. Six oysters per treatment per population were used in RNA sequencing. A detailed description of the total RNA isolation protocol is provided below:

TRIzol® RNA purification:

RNA was extracted from frozen oysters using the TRIzol method described by Chomczynski & Sacchi (1987). 1ml of TRIzol® reagent (Life Technologies) was added to each 2ml Eppendorf tube containing whole frozen oyster tissue. Samples were homogenized in TRIzol® using a TissueLyser LT for 4 minutes at 50 Hz. Tissues were further homogenized by repeatedly drawing and expelling the homogenate through a 20-gauge (0.99mm) needle and syringe until no tissue debris was visible. The resulting homogenate was centrifuged at high speed (>10,000 rpm) for 10 minutes at room temperature and the supernatant was then transferred to a clean 2ml tube. 300µl of phenol-chloroform was added, tubes were inverted repeatedly to mix, and samples were incubated at room temperature for 5 minutes. Following incubation, samples were centrifuged at high speed (>10,000 rpm) for 15 minutes at 4°C. The resulting supernatant was transferred to a new 2ml tube and 500µl of isopropanol was added and vortex mixed. Samples were then incubated at room temperature for 10 minutes before being centrifuged at high speed (>10,000 rpm) for 10 minutes. 500µl of 70% ethanol was added to the pelleted RNA and vortex mixed. Precipitated RNA was centrifuged at high speed (>10,000 rpm) for 10 minutes at 4°C. Next, the 70% ethanol was removed, and the RNA pellet was allowed to air dry for 15-20 minutes. Dried RNA was re-suspended in 100µl RNase free water.

RNA cleanup protocol for whole Olympia Oysters:

Isolated RNA was cleaned to remove degraded RNA using Qiagen’s RNeasy Mini protocol. All cleanup steps were carried out at room temperature to reduce contamination from organic salts that can precipitate out of cold buffers. 350µl Buffer RLT and 250µl of 100% ethanol were added to each RNA sample and mixed well by pipetting. The resulting 700µl volume was transferred to RNeasy spin columns placed in a 2ml collection tube. Samples were centrifuged at high speed (>10,000 rpm) for 15 seconds at room temperature. The collection tube was discarded and the spin column put in a new 2ml collection tube. 500µl of Buffer RPE was added to each RNeasy spin column. To prevent contamination with organics present in the Buffer RLT, the column was rolled and inverted to remove any Buffer RLT on the side of the column. The column was again centrifuged at high speed (>10,000 rpm) for 15 seconds and this process was repeated once to ensure RNA was fully precipitated. The spin column was then put in a new 1.5ml collection tube and centrifuged at high speed (>10,000 rpm) at room temperature for 1 minute to ensure the column was dry and free of Buffer RPE. Finally, 50µl of RNase free water was added to each column and centrifuged at high speed (>10,000 rpm) for 1 minute to elute the purified RNA. Isolated, clean RNA was stored at -80°C.

Assessing RNA quality using NanoDrop:

NanoDrop spectrophotometry was used to assess the quality of oyster RNA. From the NanoDrop program menu, “Nucleic Acid Measurement” and Sample Type “RNA-40” were selected. 2µl of nuclease-free water was used to blank the NanoDrop. After drying with a KimWipe, 2µl of each RNA sample was added and measured. Concentration (ng/µl), as well as A260:A280 and A260:A230 ratios, were recorded. Only those samples with A260:A280 and A260:A230 ratios greater than 1.8 and 2.0, respectively, were used to construct cDNA libraries.
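The acceptance rule above can be expressed as a small filter. The Python sketch below applies the two ratio thresholds stated in the text to a list of readings; the record layout, sample identifiers, and values are hypothetical and do not come from Appendix A.

```python
from dataclasses import dataclass


@dataclass
class NanoDropReading:
    sample_id: str
    conc_ng_per_ul: float
    a260_a280: float
    a260_a230: float


def passes_purity_check(reading, min_a260_a280=1.8, min_a260_a230=2.0):
    """True if both absorbance ratios exceed the thresholds used in this protocol."""
    return reading.a260_a280 > min_a260_a280 and reading.a260_a230 > min_a260_a230


# Hypothetical readings for illustration only.
readings = [
    NanoDropReading("sample_A", 412.0, 2.05, 2.21),
    NanoDropReading("sample_B", 388.5, 1.74, 2.10),  # fails A260:A280
    NanoDropReading("sample_C", 295.3, 1.98, 1.62),  # fails A260:A230
]

accepted = [r.sample_id for r in readings if passes_purity_check(r)]
print("Accepted for library construction:", accepted)
```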

2.4 CDNA LIBRARY CONSTRUCTION

Purified mRNA was used to construct cDNA libraries using the NEBNext Ultra Directional RNA Library Prep Kit for the Illumina sequencing platform (New England Biolabs). A detailed protocol is provided below:


mRNA Isolation, fragmentation and priming:

First Strand Reaction Buffer and Random Primer Mix was first prepared by mixing each of the following components* in a clean, labeled PCR tube.

Component                                              Volume
NEBNext First Strand Synthesis Reaction Buffer (5X)    8µl
NEBNext Random Primers                                 2µl
Nuclease-free water                                    10µl

*The volume of each component was multiplied by the number of samples used to make a total master mix.
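The master mix scaling described in the footnote is simple multiplication, sketched below in Python. The per-sample volumes come from the table above and the sample count of 36 from the title of Table 3; the optional `overage` parameter for pipetting loss is an assumption added for illustration and is not part of the protocol as written.

```python
# Per-sample volumes (µl) from the First Strand Reaction Buffer and Random Primer Mix table.
PER_SAMPLE_UL = {
    "NEBNext First Strand Synthesis Reaction Buffer (5X)": 8.0,
    "NEBNext Random Primers": 2.0,
    "Nuclease-free water": 10.0,
}


def master_mix(per_sample_ul, n_samples, overage=0.0):
    """Scale per-sample volumes to a master mix for n_samples.

    overage is a hypothetical fractional excess (e.g. 0.1 for 10%) to allow
    for pipetting loss; the default of 0.0 reproduces the stated protocol.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    factor = n_samples * (1.0 + overage)
    return {component: volume * factor for component, volume in per_sample_ul.items()}


mix = master_mix(PER_SAMPLE_UL, n_samples=36)
for component, volume in mix.items():
    print(f"{component}: {volume:.1f} µl")
print(f"Total: {sum(mix.values()):.1f} µl")
```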

A 15µl aliquot of NEBNext Oligo d(T)25 beads was transferred into a nuclease-free PCR tube. The beads were mixed and washed in 100µl of RNA Binding Buffer and placed on a magnetic rack at room temperature for 2 minutes. The supernatant was removed and discarded, with careful attention not to disturb the beads, and this wash was repeated once. Beads were then re-suspended in 50µl of RNA Binding Buffer before 50µl of total RNA sample was added to the tube. Each sample was placed in a thermal cycler and heated at 65°C for 5 minutes to denature the RNA and facilitate binding of the Poly-A mRNA to the beads before being held at 4°C. Samples were quickly centrifuged to collect any condensation at the top of the PCR tubes and beads were re-suspended before being incubated at room temperature for 5 minutes to allow the mRNA to bind to the beads.

Following the 5 minute incubation, each sample was placed on a magnetic rack at room temperature for 2 minutes to separate the Poly-A mRNA bound to the metal beads from the solution. The supernatant was discarded and 200µl of Wash Buffer was added to remove unbound RNA. This wash step was repeated once. The supernatant was once again removed and discarded, and 50µl of Tris Buffer was added to each tube and mixed by pipetting. Each sample was placed in a thermal cycler and heated at 80°C for 2 minutes, then held at 25°C to elute the Poly-A mRNA from the beads. Samples were quickly centrifuged to collect any condensation at the top of the PCR tube and 50µl of RNA Binding Buffer was added to the sample to allow the mRNA to rebind to the beads. Each sample was then incubated at room temperature for 5 minutes, before being placed on the magnetic rack at room temperature for 2 minutes. The supernatant was then removed and discarded, and each sample was washed by adding 200µl of Wash Buffer. Each sample was again placed on the magnetic rack at room temperature for 2 minutes and the supernatant removed. 200µl of Tris Buffer was added, each sample was placed on the magnetic rack at room temperature for 2 minutes, and a 10µl tip was used to remove the remaining Tris Buffer.

To elute mRNA from the beads, 15µl of the previously prepared First Strand Synthesis Reaction Buffer and Random Primer mix (2x) was added. Each sample was then incubated at 94°C for 15 minutes in a thermal cycler. Upon removal, samples were quickly centrifuged to collect any condensation at the top of the PCR tube and beads were resuspended before being transferred to a magnetic rack. 10µl of purified mRNA was transferred to a clean nuclease-free PCR tube. The tubes were held on ice before proceeding to First Strand cDNA Synthesis.

First strand cDNA synthesis:

The following components were added to the 10µl of primed mRNA:

Murine RNase Inhibitor                  0.5µl
ProtoScript II Reverse Transcriptase      1µl
Nuclease-free water                     8.5µl
TOTAL with mRNA                          20µl

The sample was then incubated in a preheated thermal cycler under the following conditions:

10 minutes at 25°C
15 minutes at 42°C
15 minutes at 70°C
Hold at 4°C

Samples were quickly centrifuged to collect any condensation at the top of the PCR tube and the second strand synthesis reaction was performed immediately. The following reagents were added to the First Strand Synthesis Reaction (20µl):

Nuclease-free water                       48µl
Second Strand Synthesis Buffer (10X)       8µl
Second Strand Synthesis Enzyme Mix         4µl
TOTAL with First Strand Synthesis Rxn     80µl

Reagents were mixed thoroughly by pipetting and incubated in a thermal cycler for 1 hour at 16°C.

Purification of double-stranded cDNA:

144µl (1.8X) of resuspended AMPure XP beads were added to the second strand synthesis reaction (80µl). Each sample was mixed well by pipet, incubated for 5 minutes at room temperature, and briefly centrifuged to collect any sample on the sides of the tube. Each sample was placed on a magnetic rack for 5 minutes to separate beads from supernatant, after which the supernatant was removed and discarded. 200µl of freshly prepared 80% ethanol was added to each tube while still in the magnetic rack. Each sample was incubated at room temperature for 30 seconds, followed by removal of the supernatant. This ethanol wash was repeated once and the beads were left to air dry for 5 minutes. DNA target was eluted from the beads by adding 60µl of TE buffer and left to incubate at room temperature for 2 minutes. Each sample was placed in the magnetic rack until the solution was clear and 55.5µl of the resulting supernatant was transferred to a clean nuclease-free PCR tube.

End repair/dA-tail of cDNA library:

To the purified double-stranded cDNA (55.5µl), the following components were added:

NEBNext End Repair Reaction Buffer (10X)   6.5µl
NEBNext End Prep Enzyme Mix                  3µl
TOTAL with double-stranded cDNA             65µl

Each sample was then incubated in a thermal cycler under the following conditions:

30 minutes at 20°C
30 minutes at 65°C
Hold at 4°C

Samples were spun in the centrifuge to collect any condensation at the top of the PCR tube.

Adaptor ligation:

Prior to starting the adaptor ligation, the NEBNext Adaptor for Illumina (15µM) was diluted 10-fold (1:9) in sterile water to 1.5µM. The following components were added to the dA-Tailed cDNA (65µl):

Blunt/TA Ligase Master Mix     15µl
Diluted NEBNext Adaptor         1µl
Nuclease-free Water           2.5µl
TOTAL with dA-Tailed cDNA    83.5µl

Components were mixed by pipetting and briefly centrifuged. Each sample was then incubated for 15 minutes at 20°C in a thermal cycler. 3µl of USER Enzyme was added to the ligation mixture and each sample was incubated at 37°C for an additional 15 minutes.
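
The 10-fold adaptor dilution used in this step follows the standard relation C1 x V1 = C2 x V2. The sketch below works through that arithmetic; the 10µl final volume is a hypothetical value chosen only for illustration, since the volume of diluted adaptor prepared is not stated.

```python
def dilute(stock_um, target_um, final_volume_ul):
    """Return (stock volume, diluent volume) in µl from C1*V1 = C2*V2."""
    if target_um > stock_um:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_um * final_volume_ul / stock_um
    return stock_ul, final_volume_ul - stock_ul


# 15 µM stock diluted to 1.5 µM; the 10 µl final volume is an assumed example.
stock_ul, water_ul = dilute(stock_um=15.0, target_um=1.5, final_volume_ul=10.0)
```

With these inputs the result is one part adaptor to nine parts water, matching the 1:9 ratio given in the text.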

Purification of ligation reaction:

13.5µl nuclease-free water was added to the ligation reaction (86.5µl) to bring the total reaction volume to 100µl. 100µl of resuspended AMPure XP beads were added, mixed by pipet, and incubated for 5 minutes at room temperature. Each tube was briefly centrifuged and placed on a magnetic rack to separate beads from supernatant. After 5 minutes, the supernatant was removed and 200µl of freshly prepared 80% ethanol was added. The sample was then incubated at room temperature for 30 seconds, and supernatant removed. The ethanol wash was repeated once and the beads were allowed to air dry for 5 minutes while the tube was on the magnetic rack with the lid open. DNA target was eluted from the beads with 52µl TE buffer and 50µl transferred to a clean PCR tube. To the 50µl supernatant, 50µl of resuspended AMPure XP beads were added and incubated at room temperature for 5 minutes. Each tube was placed on a magnetic rack for 5 minutes to separate the beads from the supernatant. Supernatant was discarded and 200µl of freshly prepared 80% ethanol was added and each sample was incubated at room temperature for 30 seconds before the supernatant was removed and discarded. This ethanol wash was repeated once. Beads were then allowed to air dry for 5 minutes. DNA target was eluted from the beads using 23µl of TE buffer. The sample was mixed via gentle pipetting and put in the magnetic rack until the solution was clear. Without disturbing the beads, 20µl of the supernatant was transferred to a clean PCR tube.

PCR library enrichment:

The following components were added to the cDNA (23µl):

NEBNext High-Fidelity PCR Master Mix (2X)   25µl
Universal PCR Primer (25µM)                  1µl
Index (X) Primer (25µM)                      1µl
TOTAL with cDNA                             50µl

Each sample was incubated in a thermal cycler following the conditions in Table 2.

Table 2: PCR conditions for library enrichment.

CYCLE STEP             TEMPERATURE   TIME         CYCLES
Initial Denaturation   98°C          30 seconds   1
Denaturation           98°C          10 seconds   15
Annealing              65°C          30 seconds   15
Extension              72°C          30 seconds   15
Final Extension        75°C          5 minutes    1
Hold                   4°C           ∞

PCR reaction purification:

Following amplification, samples were quickly centrifuged to collect any condensation on top of the PCR tube. 50µl of resuspended Agencourt AMPure XP beads were added to the PCR reaction and incubated at room temperature for 5 minutes. Each tube was placed on a magnetic rack for 5 minutes to separate the beads from the supernatant. Supernatant was removed and 200µl of freshly prepared 80% ethanol was added and incubated at room temperature for 30 seconds. This wash step was repeated once before each sample was allowed to air dry for 5 minutes. To elute the DNA target from the beads, 23µl of TE Buffer was added to the tube. The sample was mixed well via pipet before being placed back in the magnetic rack until the solution was clear. 20µl of the supernatant was then transferred to a clean PCR tube, and stored at -20°C.

2.5 ASSESSMENT OF LIBRARY QUALITY

The Agilent High Sensitivity DNA Assay Protocol and Bioanalyzer were used to assess the quality of the cDNA libraries. 1µl of each library was run on a DNA High Sensitivity chip. The electropherogram was reviewed to determine fragment size and concentration of cDNA. A detailed description of the Agilent high sensitivity DNA assay protocol is provided below:

Preparing the Gel-Dye mix:

Agilent High Sensitivity DNA Reagents were removed from cold storage and allowed to warm to room temperature for a minimum of 30 minutes. Following room temperature acclimation, 15 µl of High Sensitivity DNA dye concentrate was added to the vial of High Sensitivity DNA gel matrix and vortexed to mix. The vial was briefly centrifuged to remove contents from the side of the vial, before contents were transferred to a spin filter and centrifuged at 2240 RCF for 10 minutes. The prepared gel-dye was then loaded immediately following manufacturer’s instructions.

Loading the high sensitivity DNA chip:

Reagents were removed from cold storage and allowed to acclimate to room temperature for 30 minutes (if not already done to prepare gel-dye mix). A new High Sensitivity DNA chip was then put on the chip priming station. 9µL of gel-dye mix was pipetted into the third G well. The plunger was then set to 1mL and the priming station closed. The plunger was then pushed down until it was held by the priming station clip and it was left for 60 seconds. After 60 seconds, the clip was released and the plunger was allowed to rise on its own for 5 seconds before being returned manually to the 1mL mark. The priming station was then opened and 9µL of gel-dye mix was pipetted into the remaining G wells. 5µL of marker was added to each of the remaining wells and no wells were left empty. 1µL of High Sensitivity DNA ladder was added to the ladder well. Each of the remaining 11 wells was filled with 1µL of sample. The chip was then placed in a vortex mixer for 1 minute at 2400 rpm and analyzed with the Agilent 2100 Bioanalyzer instrument.

2.6 RNA-SEQUENCING

cDNA libraries were submitted to the Genomics Sequencing Laboratory at UC Berkeley and sequenced using the HiSeq 4000 instrument (Illumina). Illumina sequencing amplifies library fragments on a solid surface, the flow cell, and detects the signal given by fluorescent reversible terminators. This is much like Sanger sequencing; however, the fluorescent terminator dNTP stops the synthesis of the growing DNA copy strand only temporarily, allowing just one base to be added by the polymerase enzyme in each cycle. The detector records the signal, the terminator is removed, and the next dNTP is added on the flow cell. The 36 samples were multiplexed, whereby unique individual sequences called “barcodes” were added to the ends of the cDNA fragments from each sample (Table 3; Appendix B). Multiplexing allows multiple samples to be sequenced on the same lane and identified later during data analysis. Three lanes of the flow cell were used for sequencing, yielding 150 base-pair paired-end reads.

Paired-end sequencing protocols are useful for de novo transcriptome assembly because cDNA sequences are read from both the 5’ and 3’ ends of the cDNA fragment, as opposed to single reads where only one end is sequenced. Reading from both ends generates more sequences than single end reads and thus increases the likelihood of a homologous sequence match and assembly into a longer contig.

Table 3: Lane and barcode assignments for each of the 36 samples sequenced.

Lane 1             Lane 2             Lane 3
Sample  Barcode    Sample  Barcode    Sample  Barcode
TE01A   1          TE13B   1          TE25C   1
TE02A   2          TE14B   2          TE26C   2
TE03A   3          TE15B   3          TE27C   3
TE04A   4          TE16B   4          TE28C   4
TE05A   5          TE17B   5          TE29C   5
TE06A   6          TE18B   6          TE30C   6
TE07A   7          TE19B   7          TE31C   7
TE08A   8          TE20B   8          TE32C   8
TE09A   9          TE21B   9          TE33C   9
TE10A   10         TE22B   10         TE34C   10
TE11A   11         TE23B   11         TE35C   11
TE12A   12         TE24B   12         TE36C   12

2.7 BIOINFORMATICS

Resulting raw sequence data were processed and assembled using CLC Genomics Workbench version 8.5.1 (CLCbio). Analysis of the RNA-Seq data occurred in three basic steps: 1) Quality control of raw sequence information, 2) Assembly, and 3) Annotation.

Quality control/trimming:

Sequences were downloaded as raw “fastq” files, text files that store the sequence of thymines (Ts), guanines (Gs), cytosines (Cs) and adenines (As) for each read together with a quality score for each base call.

Sequence reads were quality trimmed and filtered with parameters allowing for 2 ambiguous nucleotides, deletion of the first 10 base pairs on the 5’ end and an error probability limit of 0.05 applied to Phred quality scores. A Phred score is a quality score used in NGS that estimates the probability that a base is called incorrectly. Phred scores are logarithmically linked to error probabilities: a Phred quality score of 20 means the probability of an incorrect base call is 1 in 100, analogous to a 99% base call accuracy, while the error probability limit of 0.05 applied here corresponds to a Phred score of approximately 13 (95% base call accuracy). Reads shorter than 20 nucleotides were removed, as were NEB Illumina Universal Adapters (AGATCGGAAGAGCACACGTCTGAACTCCAGTCA and AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT). Ribosomal RNA (rRNA) and mitochondrial RNA were depleted in order to reduce the reads of non-coding sequences.
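
The logarithmic link between Phred scores and error probabilities, Q = -10 x log10(P), can be sketched as below. The conversion functions follow directly from that definition; the quality-string decoder assumes the Phred+33 ASCII offset commonly used in current Illumina fastq files, which is an assumption about the raw files rather than a setting reported here.

```python
import math


def phred_to_error_prob(q):
    """Error probability for a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_phred33(quality_string):
    """Decode a fastq quality string, assuming the Phred+33 ASCII offset."""
    return [ord(ch) - 33 for ch in quality_string]


p_q20 = phred_to_error_prob(20)        # error probability at Q20
q_limit = error_prob_to_phred(0.05)    # Phred score at the 0.05 limit
```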

Resulting reads were then mapped to known rRNA and mitochondrial sequences in an effort to deplete the sample of known non-coding RNAs, from now on called contamination. Seven contamination references were used, each representing either the whole mitochondrion, 18S rRNA subunit or 28S subunit of rRNA. These sequences were identified by creating a de novo assembly using one arbitrary sample (TE01A) and default parameters. The top 50 duplicated sequences were identified using BLASTx. The following seven represented the most abundant contamination found from the top 50 sequences:

1. cristagalli 18S rRNA gene. Accession AJ389635

2. Crassostrea virginica 28S ribosomal RNA gene, partial sequence. Accession AY145400

3. isolate 11 28S ribosomal RNA gene, partial sequence. Accession JQ611538

4. Saccostrea sp. LC-2013 28S ribosomal RNA gene, partial sequence. Accession KC847151

5. Crassostrea virginica voucher BivAToL-276 28S ribosomal RNA gene, partial sequence. Accession KC429429

6. Ostrea lurida mitochondrion, complete genome. Accession KC768038.

7. Spisula solidissima external transcribed spacer, partial sequence; 18S ribosomal RNA gene, internal transcribed spacer 1, 5.8S ribosomal RNA gene, internal transcribed spacer 2, and 28S ribosomal RNA gene, complete sequence; and external transcribed spacer, partial sequence. Accession JN196041.

Reads that did not map to the contamination sequences listed above were collected and used for the final assembly.

De novo transcriptome assembly:

A pooled de novo transcriptome assembly was done using sequence reads from all 36 oyster samples (Table 4). A transcriptome was assembled using two methods and the results compared to determine which approach offers the most complete transcriptome.

Table 4: Distribution of samples by population and treatment.

Treatment   Loch Lomond   Tomales Bay   Oyster Point
5 ppt       (5LL)         (5M)          (5OP)
            TE06A         TE02A         TE03A
            TE12A         TE04A         TE07A
            TE13B         TE11A         TE08A
            TE20B         TE19B         TE26C
            TE25C         TE31C         TE27C
            TE33C         TE35C         TE29C
33 ppt      (33LL)        (33M)         (33OP)
            TE05A         TE01A         TE10A
            TE09A         TE18B         TE14B
            TE22B         TE21B         TE15B
            TE23B         TE24B         TE16B
            TE28C         TE30C         TE17B
            TE34C         TE32C         TE36C

The first method of assembly (hereafter referred to as the traditional method; Fig 9) assembles a single transcriptome by analyzing sequences from all 36 samples simultaneously. The second method uses a stepwise approach, where multiple transcriptomes are assembled using subsets of samples and are then combined to generate the final assembly (Fig 10).

Figure 9: Traditional method of de novo assembly. [Diagram: Oyster Point, Loch Lomond and Tomales Bay samples feed directly into the Final Assembly.]

To construct a de novo transcriptome using the traditional method, all 36 individual samples (TE01A-TE36C; Table 4) are assembled together using CLC bio software. CLC bio’s de novo assembly algorithm works by using de Bruijn graphs. De Bruijn graphs work by making a table of sub-sequences in the reads called words. Each word overlaps with a forward and a backward neighboring word, and chains of overlapping words form longer sequences. When a sequencing error or single nucleotide polymorphism (SNP) creates a discrepancy in the sequence, a bubble occurs. An example of a bubble would be the sequence CCTAGTTA versus the sequence CCTACTTA, where the bubble would manifest at the difference between the G and C. In a simple case, the CLC bio software will resolve a bubble by choosing the sequence with the most read coverage.

In the case of diploid organisms, a heterozygous site can produce a roughly fifty-fifty distribution of both sequences, making the choice of one sequence over the other arbitrary. When SNPs or sequencing errors occur close together, such that the space between them is less than the length of the word, the bubble size increases to encompass all the sites in that space. The traditional assembly was assembled using a word size of 26, bubble size of 50, and minimum assembled contig length of 200 bp.
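
How words and bubbles arise can be illustrated with a toy word graph built from the two example sequences above. This is a simplified sketch and not the CLC bio implementation: it links each word to the word that follows it in a read and reports where paths diverge and re-converge. A word size of 3 is used only so the example stays readable; the assembly itself used a word size of 26.

```python
from collections import defaultdict


def words(read, k):
    """Return the overlapping sub-sequences (words) of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_graph(reads, k):
    """Link each word to the word that follows it in any read."""
    graph = defaultdict(set)
    for read in reads:
        ws = words(read, k)
        for left, right in zip(ws, ws[1:]):
            graph[left].add(right)
    return graph


def bubble_points(graph):
    """Return words where paths diverge and words where they re-converge."""
    opens = sorted(w for w, nxt in graph.items() if len(nxt) > 1)
    incoming = defaultdict(set)
    for w, nxt in graph.items():
        for n in nxt:
            incoming[n].add(w)
    closes = sorted(w for w, prev in incoming.items() if len(prev) > 1)
    return opens, closes


reads = ["CCTAGTTA", "CCTACTTA"]   # the two example sequences from the text
graph = build_graph(reads, k=3)    # k=3 is an illustrative word size
opens, closes = bubble_points(graph)
```

In this toy graph the paths split after the shared word CTA and rejoin at TTA, so the bubble spans the words that contain the G/C difference.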

The first step in the stepwise method of de novo assembly was to construct 36 individual transcriptomes corresponding to each biological sample (TE01A-TE36C). The 36 individual transcriptomes were then used to construct six group assemblies, one for each population x treatment combination (33OP, 5OP, 33LL, 5LL, 33M, 5M; Table 4). The six group assemblies were then combined into three population assemblies by assembling the two group assemblies from the same population into one population assembly (example: 33OP and 5OP create the OP population assembly). Lastly, the final assembly was constructed by assembling the three population assemblies into one final assembly. Parameters for each level of assembly can be found in the Appendix (C).
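
The four levels of the stepwise method can be laid out as a simple plan, sketched below. The sample-to-group assignments are copied from Table 4; the `assemble` function is a hypothetical stand-in that only records which inputs feed each assembly (the actual assemblies were run in CLC bio), and the population labels LL and M are assumed by analogy with the OP label used in the text.

```python
# Sample-to-group assignments copied from Table 4.
GROUPS = {
    "5LL": ["TE06A", "TE12A", "TE13B", "TE20B", "TE25C", "TE33C"],
    "5M": ["TE02A", "TE04A", "TE11A", "TE19B", "TE31C", "TE35C"],
    "5OP": ["TE03A", "TE07A", "TE08A", "TE26C", "TE27C", "TE29C"],
    "33LL": ["TE05A", "TE09A", "TE22B", "TE23B", "TE28C", "TE34C"],
    "33M": ["TE01A", "TE18B", "TE21B", "TE24B", "TE30C", "TE32C"],
    "33OP": ["TE10A", "TE14B", "TE15B", "TE16B", "TE17B", "TE36C"],
}
# Population labels are illustrative; only "OP" appears in the text.
POPULATIONS = {"LL": ["5LL", "33LL"], "M": ["5M", "33M"], "OP": ["5OP", "33OP"]}


def assemble(name, inputs):
    """Hypothetical stand-in for an assembler run; it only records the plan."""
    return {"name": name, "inputs": list(inputs)}


def stepwise_plan():
    """Build the individual, group, population and final assembly levels."""
    individual = {
        sample: assemble(sample, [sample + "_reads"])
        for samples in GROUPS.values()
        for sample in samples
    }
    group = {
        code: assemble(code, [individual[s]["name"] for s in samples])
        for code, samples in GROUPS.items()
    }
    population = {
        pop: assemble(pop, [group[c]["name"] for c in codes])
        for pop, codes in POPULATIONS.items()
    }
    final = assemble("final", [population[p]["name"] for p in POPULATIONS])
    return individual, group, population, final


individual, group, population, final = stepwise_plan()
```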

Figure 10: Stepwise method of de novo assembly.

Transcriptome Annotation:

Annotation of the resulting contigs was carried out using the Trinotate annotation suite on Atmosphere. Atmosphere is provided by CyVerse (formerly known as iPlant), and is a cloud-computing server system used to create and launch isolated virtual machines (VM). VMs are web-based and able to handle computationally intensive bioinformatics tasks. VMs can be configured to desired hardware specifications, where allocations of memory, hard-drive, and processing power can be chosen. Pre-installed software can be chosen to complete a multitude of tasks, including but not limited to: assembly, mapping and annotation. A VM (4 CPUs, 16 GB memory and 160 GB hard-drive) was used to run the software suite Trinotate_RNAseq_annoation_v2. All annotation was done using open source code from http://trinotate.github.io/. TransDecoder was used to find the likely coding sequences within the final contigs, where the minimum open reading frame length was set to 100 amino acids. The Basic Local Alignment Search Tool (BLAST) was used to find regions of local similarity between sequences. BLASTx was used to search for homology, where the nucleotide sequences (contigs) are compared to The National Center for Biotechnology Information (NCBI) protein sequence database. BLASTp was used to search the TransDecoder-predicted proteins against the NCBI protein sequence database. Both BLAST searches were set with an e-value threshold of 1.0E-5, identical to that used in another genomic study of O. lurida (Timmins-Schiffman et al., 2013).
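
Applying the 1.0E-5 e-value threshold to BLAST results can be sketched as follows. The sketch assumes hits are available in the 12-column tab-separated layout produced by BLAST+ with `-outfmt 6`, where the eleventh column holds the e-value; the output format actually used within Trinotate is not stated here, and the two example rows are made up for illustration.

```python
import csv
import io

EVALUE_THRESHOLD = 1.0e-5  # threshold stated in the text


def significant_hits(handle, threshold=EVALUE_THRESHOLD):
    """Yield (query, subject, evalue) for rows at or below the threshold.

    Assumes the 12-column tab-separated layout of BLAST+ `-outfmt 6`,
    where column 11 (index 10) holds the e-value.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        evalue = float(row[10])
        if evalue <= threshold:
            yield row[0], row[1], evalue


# Made-up rows for illustration only; identifiers are hypothetical.
example = io.StringIO(
    "contig_1\tprotA\t85.0\t120\t18\t0\t1\t360\t1\t120\t1e-50\t210\n"
    "contig_2\tprotB\t30.0\t40\t28\t0\t1\t120\t1\t40\t0.5\t25.1\n"
)
hits = list(significant_hits(example))
```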

Gene ontologies (GO) were identified from both BLAST homology searches using PANTHER version 10 (Mi et al., 2013). All GO identifications fell into three broad functional categories (Biological Processes, Cellular Components and Molecular Function). Further annotation was performed to identify protein domains using the program HMMER, signal peptides with SignalP 4.1 Server, and transmembrane regions with TMHMM Server v. 2.0.

2.8 MAPPING READS TO DE NOVO TRANSCRIPTOME

The number of short RNA-Seq reads that map back to the final assembled transcriptome provides an indication of how well the transcriptome was assembled. A high proportion of reads being incorporated into the final transcriptome suggests the majority of sequences were actually used to create the transcriptome. In contrast, if few reads map to the final transcriptome, the transcriptome is less complete because fewer sequences were used in the assembly. Processed reads from all 36 samples were mapped against the final assembly. Mapping was done in CLC bio with a mismatch cost of 2, insertion cost of 3, deletion cost of 3, length fraction of 0.8, and a similarity fraction of 0.8 (Appendix E).

2.9 PHYLOGENY

The closest relative of O. lurida with an available genome is C. gigas. To determine if the C. gigas genome can be used as a template to construct the O. lurida transcriptome, we compared the 16S ribosomal subunits for each species. If 16S ribosomal subunit sequences are similar between the two species, the C. gigas genome can be used as a template for assembling an O. lurida transcriptome. However, if sequences are highly divergent, the O. lurida transcriptome will be better assembled considering sequences from all species in databases. 16S ribosomal subunit sequences acquired from GenBank (Appendix F) were aligned using default parameters in CLUSTALx. Minor alignment changes were made by eye in Mesquite (Maddison & Maddison, 2015). A maximum likelihood tree was constructed in PAUP* version 4 (Swofford, 2003) using a TVM+I+G model and heuristic search. Bootstrap values were determined using 500 non-parametric "Fast" Stepwise Addition replicates and the same model settings as the heuristic search (Fig 6).

A pairwise comparison was made using 16S sequences of O. lurida and C. gigas surveying 389 sites.

2.10 COMPARISON OF GENE ONTOLOGY AMONG BIVALVES

Comparing gene ontologies of multiple bivalve species can provide insight into differences and similarities in biological traits and physiological function. To compare the final transcriptome from the stepwise assembly method to other available bivalve transcriptomes, sequence contigs with annotations were obtained for four bivalves: 1) Corbicula fluminea, commonly called the Asian clam and found in freshwater (Huihui et al. 2013), 2) Pinctada martensii, commonly called the Akoya pearl oyster (Shi et al. 2013), 3) Crassostrea virginica, commonly called the eastern or American oyster (Eierman & Hare, 2014), and 4) O. lurida (Timmins-Schiffman et al., 2013). PANTHER version 10 (Mi et al., 2013) used Uniprot IDs to assign each contig to an ontology, and the number of contigs within each ontology was then compared among the four species.

3 RESULTS

3.1 RNA SEQUENCING

Sequencing 36 samples on the Illumina HiSeq 4000 yielded 1,079,771,558 150 bp reads. Quality trimming and removal of rRNA contamination reduced the number of reads to 637,134,988 (59%). These 637,134,988 reads were then used to assemble a transcriptome using the two methods described previously (Table 5).

Table 5: Summary table of dispersion of sample reads.

Read Status Number of Reads

Raw reads 1,079,771,558

Contamination depleted reads 637,134,988

3.2 TRANSCRIPTOME ASSEMBLY

Traditional method of assembly:

The traditional method of assembly yielded 842,411 contigs totaling 306,888,475 nucleotides. The mean contig length of this assembly is 364 bp, while the N50 length is 334 bp. Nucleotide distribution in this transcriptome shows the expected pattern, where the abundances of thymine (T), guanine (G), cytosine (C) and adenine (A) are relatively similar. Moreover, the distribution of A and T is the same, as is the distribution of G and C. GC content of the traditional method assembly is 42.4%.

Stepwise method of assembly:

Individual assemblies generated an average of 77,045 contigs; the smallest individual assembly is made of 19,423 contigs (TE13B) and the largest of 160,452 contigs (TE02A). Group assemblies generated an average of 56,836 contigs, with the smallest group assembly made of 42,036 contigs (33OP) and the largest of 67,501 contigs (33LL). Population assemblies yielded an average of 36,719 contigs. The smallest population transcriptome amounts to 31,086 contigs (Oyster Point), while the largest is 41,760 contigs (Loch Lomond). The final transcriptome is formed of 51,574 contigs (Fig 11).

Figure 11: Number of contigs across four assembly levels (individual mean: 77,045; group mean: 56,836; population mean: 36,719; final assembly: 51,574). Error bars show 95% CI for individual assembly mean (n=36), group assembly mean (n=6) and population assembly mean (n=3).

Longer contigs are an indication that sequences were assembled to cover a greater portion of the coding regions for that gene; therefore, long contig lengths are desired when assembling a transcriptome. The mean contig length of the individual assemblies is 488 base pairs (bp), where the smallest individual assembly has a mean contig length of 328 bp (TE02A) and the largest mean contig length is 604 bp (TE09A). Group assemblies generated an average mean contig length of 1,084 bp. The smallest group assembly has a mean contig length of 1,026 bp (5OP), while the largest is 1,171 bp (33M). Population assemblies have a mean contig length of 1,693 bp. The smallest mean contig length of the three assemblies is 1,649 bp (Loch Lomond), while the largest mean contig length is 1,733 bp (Oyster Point). The mean contig length of the final assembly is 1,664 bp (Fig 12), where the largest contig found is 33,424 bp in length.

Figure 12: Mean contig length (bp) across four assembly levels (individual mean: 488; group mean: 1,084; population mean: 1,693; final assembly: 1,664). Error bars show 95% CI for individual assembly mean (n=36), group assembly mean (n=6) and population assembly mean (n=3).

The N50 is defined as the contig size such that all the contigs equal to or greater than that size account for at least half of the total assembled bases (Miller et al., 2010). A very low N50 length can be indicative of a poor assembly, such that the short fragments from RNA-Seq were unable to be combined into longer contigs that represent full length or near full length genes. The mean N50 length of the individual assemblies is 559 base pairs (bp), where the smallest individual assembly has an N50 length of 282 bp (TE02A) and the largest N50 length is 815 bp (TE09A). Group assemblies generated a mean N50 length of 1,533 bp. The smallest group assembly has an N50 length of 1,433 bp (5OP), while the largest is 1,701 bp (33M). Population assemblies have an average N50 length of 2,465 bp. The smallest N50 length of the three assemblies is 2,440 bp (Loch Lomond), while the largest N50 is 2,483 bp (Tomales Bay). The N50 length of the final assembly is 2,604 bp (Fig 13).
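
The N50 definition above can be written out as a short calculation. This is a minimal sketch of the standard definition, not the routine used by CLC bio, and the toy contig lengths are hypothetical values chosen for illustration.

```python
def n50(lengths):
    """Return the contig length at which the running total of bases,
    taken from the longest contig downward, first reaches half of
    the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


def mean_length(lengths):
    """Return the mean contig length."""
    return sum(lengths) / len(lengths)


toy_lengths = [100, 200, 300, 400, 500]   # hypothetical contig lengths (bp)
toy_n50 = n50(toy_lengths)
toy_mean = mean_length(toy_lengths)
```

For the toy lengths, the two longest contigs (500 and 400 bp) together hold 900 of the 1,500 bases, so the N50 is 400 bp while the mean length is 300 bp; the two metrics can differ substantially when contig lengths are skewed.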

Figure 13: N50 (bp) across four assembly levels (individual mean: 559; group mean: 1,533; population mean: 2,465; final assembly: 2,604). Error bars show 95% CI for individual assembly mean (n=36), group assembly mean (n=6) and population assembly mean (n=3).

Nucleotide distribution was tracked at all levels of assembly. All nucleotides should be distributed equally and matching base pairs (thymine to adenine and cytosine to guanine) should have equal distribution. Deviation from these ratios indicates a problem in transcriptome assembly. Abundance of thymine (T), guanine (G), cytosine (C) and adenine (A) remains relatively constant throughout all levels of assembly (Fig 14). The final assembly showed equal distribution of matching base pairs and a GC content of 40.8%.
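
Base composition and GC content of a set of contigs can be computed as sketched below. This is an illustrative calculation rather than the CLC bio report itself; the two toy contigs are hypothetical, and any ambiguous bases (such as N) are counted in the denominator but not reported.

```python
from collections import Counter


def base_composition(contigs):
    """Fraction of A, C, G and T across all contigs (case-insensitive).

    The denominator counts every character, so ambiguous bases such
    as N lower the reported fractions slightly.
    """
    counts = Counter()
    for seq in contigs:
        counts.update(seq.upper())
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(contigs):
    """Combined fraction of G and C bases."""
    comp = base_composition(contigs)
    return comp["G"] + comp["C"]


toy_contigs = ["ATGCAT", "GGAATT"]   # hypothetical sequences
composition = base_composition(toy_contigs)
gc = gc_content(toy_contigs)
```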

Figure 14: Nucleotide distribution among assembly levels (individual, group, population and final). [Bar chart: percentage (0-35%) of adenine, cytosine, guanine, thymine and any nucleotide at each assembly level.]

To test completeness of the final transcriptome, processed reads from all 36 samples were mapped against the final de novo transcriptome, which totals 85,799,458 nucleotides. A total of 371,378,602 reads were mapped in pairs and 105,919,542 were mapped as broken pairs, together representing 78% of sequence reads. The remaining 132,497,962 reads (22%) could not be mapped to the final assembly (Fig 15).
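
The mapping percentages can be reproduced from the read counts reported above. The sketch assumes that the total used as the denominator is the sum of the three reported categories (mapped in pairs, mapped as broken pairs, and unmapped), which is an inference from the figures rather than something stated explicitly.

```python
# Read counts as reported in the text.
mapped_in_pairs = 371_378_602
mapped_broken_pairs = 105_919_542
unmapped = 132_497_962

mapped = mapped_in_pairs + mapped_broken_pairs
total = mapped + unmapped   # assumed denominator: sum of the three categories

pct_mapped = 100 * mapped / total
pct_unmapped = 100 * unmapped / total
```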

Figure 15: Percent of the sample reads mapped to the assembled transcriptome (total mapped: 78%; not mapped: 22%).

3.3 ANNOTATION

Quality control metrics (mean contig length, number of contigs and N50 lengths) indicate that the step-wise assembled transcriptome is superior to that assembled using the traditional method. Consequently, downstream analyses were performed using only the stepwise assembled transcriptome. BLASTx and BLASTp searches were used to find homologous genes in the NCBI database and determine the identity of contigs.

Approximately one-half of contigs were annotated: BLASTx analysis found homology for a total of 25,160 (49%) contigs and BLASTp analysis found homology for a total of 23,257 (45%) contigs (Fig 16). When contigs that received the same annotation were collapsed, BLASTx and BLASTp identified 10,125 and 10,247 unique contigs, respectively.

Figure 16: BLASTx and BLASTp results from Trinotate show percentage of contigs with homology at or above the 1E-5 e-value threshold (BLASTx: 49% with BLAST hits, 51% with no BLAST hits; BLASTp: 45% with BLAST hits, 55% with no BLAST hits).

Gene Ontology (GO) analysis was performed using the PANTHER database on the set of unique contigs from both BLASTx and BLASTp in order to classify the annotated genes in terms of function. Gene functions are described in terms of three broad categories: biological processes in which each gene participates, the molecular function of each gene, and the cellular component in which genes are typically expressed. Gene ontology information was available for 8,073 (79.7%) contigs from BLASTx analysis and 8,223 (80%) from BLASTp analysis. Functional annotation using BLASTx and BLASTp gave very similar results (Figures 17-19).

Figure 17: Cellular Component GO ID comparison of BLASTx and BLASTp annotation results where the outer circle is BLASTp results and inner circle is BLASTx results. [Pie chart categories: synapse, cell junction, membrane, macromolecular complex, extracellular matrix, cell part, organelle, extracellular region.]

Figure 18: Molecular Function GO ID comparison of BLASTx and BLASTp annotation results where the outer circle is BLASTp results and inner circle is BLASTx results. [Pie chart categories: transporter activity, translation regulator activity, protein binding transcription factor activity, enzyme regulator activity, catalytic activity, channel regulator activity, receptor activity, nucleic acid binding transcription factor activity, antioxidant activity, structural molecule activity, binding.]

Figure 19: Biological Processes GO ID comparison of BLASTx and BLASTp annotation results where the outer circle is BLASTp results and inner circle is BLASTx results. [Pie chart categories: cellular component organization or biogenesis, cellular process, localization, apoptotic process, reproduction, biological regulation, response to stimulus, developmental process, rhythmic process, multicellular organismal process, locomotion, biological adhesion.]

4 DISCUSSION

4.1 DE NOVO TRANSCRIPTOME ASSEMBLY

Our final O. lurida transcriptome was created using 637,134,988 reads, which is far greater sequencing depth than that used to create other oyster transcriptomes for C. virginica, C. gigas, O. lurida, and C. hongkongensis using 1,256,652 (Eierman & Hare, 2014), 15,354,006 (Zhao et al., 2012), 387,720,843 (Timmins-Schiffman et al., 2013), and 107,076,589 (Zhao et al., 2014) reads, respectively. In addition, these transcriptomes were assembled using sequences from a single sample of pooled individuals, rather than multiple replicates from different treatments as was the case here.

Our use of multiple replicates within each treatment and population allows future analysis of gene expression and is therefore preferable to the pooled experimental design (Liu et al., 2014). However, most uses of RNA-Seq within the field of molecular ecology lack biological replicates. Recently, Todd et al. (2016) reported that of 158 eco-evolutionary studies published between 2008 and 2014, only 23 (15%) used more than three replicates per treatment. Our final transcriptome was assembled using six biological replicates per population (three populations) per treatment (two treatments). The increase in biological replicates is advantageous for downstream analysis, but the large number of reads, as is the case here, presents a bioinformatics challenge. Computer algorithms used to align short sequences into longer contigs can become overwhelmed when hundreds of millions of sequences are simultaneously analyzed. Due to the large number of reads in this experiment (over 600 million), an alternative transcriptome assembly strategy was developed. De novo transcriptomes are commonly assessed using a set of quality control metrics that include N50, mean contig length, number of contigs, and percentage of reads that could be mapped back to the final assembly (Moreton et al., 2016). Assembling the transcriptome in a stepwise fashion showed an overall increasing trend in the N50 length and mean contig length (Fig 12 and Fig 13). The final assembly N50 value is 2.6 Kb (Fig 13), a value larger than the 1.9 Kb N50 reported for the coding regions of the Pacific oyster genome (Zhang et al., 2012). The mean contig length for the final O. lurida

O. lurida (Final Transcriptome) 1664

O. lurida 611

C. virginica 453

C. gigas 419

C. hongkongensis 645

0 200 400 600 800 1000 1200 1400 1600 1800 Length (bp)

Figure 20: Comparison of mean contig lengths among four oyster species. The final transcriptome from the stepwise method of assembly, O. lurida (Timmins-Schiffman et al., 2013), C. virginica (Eierman & Hare, 2014), C. gigas (Zhao et al., 2012), and C. hongkongensis (Zhao et al., 2014).

assembly is 1664 (Fig 12) which is larger than the mean contig lengths reported for the four other de novo oyster transcriptomes (Fig 20). In contrast, results from the traditional method of transcriptome assembly show a considerably lower N50 value (334 bp) and mean contig length (364 bp). Lower N50 and mean contig lengths using the traditional

51

method of assembly may have arisen as a consequence of the overwhelming the CLC bio software with an extremely large number short sequence reads. The algorithm used by

CLC bio software to assemble the short reads into longer contigs tends to generate considerably shorter contigs when input sequences are extremely high in number, as they were in this case. The resulting shorter contigs, as evidenced by a small N50 and mean contig length, are more likely to represent incompletely assembled contigs and only

3000 2604 2500

2000 1664

1500

basepairs (bp) 1000

500 334 364

0 N50 Mean Contig Length

Traditional Assembly Stepwise Assembly

Figure 21: Comparison of N50 and mean contig lengths of the traditional and stepwise assembly methods.

fractions of the full coding sequence of the gene (Fig 21). By comparison, when using the stepwise method of transcriptome assembly, the short RNA-Seq reads are provided to the

CLC bio software in batches that do not overwhelm the assembly algorithm, which results in greater N50 and mean contig lengths. Combined these results strongly suggest

52

the novel stepwise approach to de novo transcriptome assembly used here was more effective in generating contigs covering the full length of gene coding regions.
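The two contiguity metrics discussed above can be illustrated with a minimal Python sketch. It assumes the common definition of N50 (the contig length L such that contigs of length L or longer contain at least half of all assembled bases). The function names and the two toy contig sets are hypothetical and are not data from this study; CLC bio reports these metrics itself, so the sketch only illustrates how a fragmented assembly and a contiguous assembly of the same total size differ.

```python
def n50(lengths):
    """Return N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def mean_contig_length(lengths):
    """Return the arithmetic mean contig length."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    return sum(lengths) / len(lengths)


# Hypothetical toy assemblies (lengths in bp), both totalling 2100 bp.
fragmented = [300, 300, 350, 350, 400, 400]  # many short contigs
contiguous = [900, 1200]                     # few long contigs

for name, contigs in (("fragmented", fragmented), ("contiguous", contiguous)):
    print(name, "contigs:", len(contigs),
          "N50:", n50(contigs),
          "mean:", mean_contig_length(contigs))
```

In the toy data, the same number of assembled bases yields fewer contigs, a larger N50, and a larger mean length when reads are joined into longer contigs, which is the pattern reported for the stepwise assembly relative to the traditional assembly.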

Overall, the total contig number decreases from the individual assembly to the stepwise assembly. The final stepwise assembly identified 51,574 contigs (Figure 11), whereas the traditional method of assembly generated 842,411 contigs (Fig 22). The smaller number of contigs with the stepwise assembly suggests this approach was much more effective in assembling the short sequences into full length or near full length contigs of gene coding regions. The much higher number of contigs identified by the traditional assembly results from an inability to combine reads into longer contigs and likely represents partial and incomplete gene coding sequences. Thus, the stepwise assembly method identified more complete transcripts, as opposed to the many small contigs representing fractions of the full length transcript identified by the traditional method. Based on these data, we conclude that the traditional transcriptome is a less complete assembly than the final assembly from the stepwise method, and that any downstream analyses should utilize the transcriptome assembled using the stepwise method. Future transcriptome assemblies using the CLC bio algorithm are likely to benefit from a stepwise approach when very large numbers of RNA-Seq reads are being input.

Figure 22: Comparison of the overall number of contigs between the “traditional” (842,411 contigs) and stepwise (51,574 contigs) assembly methods.

The final metric used to assess quality of the stepwise transcriptome is the percentage of short RNA-Seq reads that match some portion of the final assembled transcriptome. This quality control metric is based on the fact that most of the original, short RNA-Seq reads should match some part of the fully assembled transcriptome. If the number of original reads matching the completed transcriptome is low, it suggests few of these original reads were incorporated into the transcriptome and the assembly is likely incomplete. In contrast, a high percentage of original reads matching some portion of the final transcriptome indicates most of the original sequences were incorporated into the transcriptome and the assembly is of high quality. Of the more than 600 million original 150 bp reads, 78% mapped back to the stepwise assembled transcriptome (Fig 15). Honaas et al. (2016) suggest 65% represents a high percentage of mappable reads. Timmins-Schiffman et al. (2013) reported that 222,773,113 of their reads (~57.5%) could be mapped in a previous O. lurida transcriptome. Mapping 78% of the reads in this experiment is consistent with the deeper sequencing and greater gene coverage achieved in the present study relative to the previous O. lurida transcriptome (Timmins-Schiffman et al., 2013).
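The mapping percentages compared above follow from simple arithmetic on read counts. The sketch below, in Python, recomputes the ~57.5% figure from the read counts cited in this section for Timmins-Schiffman et al. (2013) and compares both transcriptomes against the 65% benchmark attributed to Honaas et al. (2016). The function and constant names are hypothetical, and the 78% value is entered as reported rather than recomputed, because the exact number of mapped reads is not stated here.

```python
def mapping_rate(mapped_reads, total_reads):
    """Return the percentage of input reads that map back to the assembly."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must lie between 0 and total_reads")
    return 100.0 * mapped_reads / total_reads


# Benchmark for a high mapping percentage, as cited in the text.
HIGH_MAPPING_THRESHOLD = 65.0

# Read counts cited in this section for the previous O. lurida transcriptome.
previous_rate = mapping_rate(222_773_113, 387_720_843)

# Reported directly in the text for the stepwise assembly (not recomputed).
stepwise_rate = 78.0

for label, rate in (("previous", previous_rate), ("stepwise", stepwise_rate)):
    status = "above" if rate >= HIGH_MAPPING_THRESHOLD else "below"
    print(f"{label}: {rate:.1f}% mapped ({status} the 65% benchmark)")
```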

BLAST searches were used to annotate the assembled contigs from the stepwise transcriptome. BLASTx and BLASTp homology identified 49% and 45% of the total contigs, respectively. This annotation rate is comparable with those reported for other de novo assemblies in non-model organisms, including those of other oyster species (Timmins-Schiffman et al., 2013; Bettencourt et al., 2010; Qin et al., 2012; Wang et al., 2013; Zhao et al., 2012; Eierman & Hare, 2014; Zhao et al., 2014; Figure 23).

Figure 23: Percent of contigs annotated by BLAST homology compared among non-model organisms: the O. lurida final transcriptome from the stepwise method of assembly (49%), C. virginica (51%; Eierman & Hare, 2014), C. hongkongensis (33%; Zhao et al., 2014), C. gigas (16%; Zhao et al., 2012), Chlamys farreri (47%; Wang et al., 2013), Crassostrea angulata (21%; Qin et al., 2012), Bathymodiolus azoricus (29%; Bettencourt et al., 2010), and O. lurida (37%; Timmins-Schiffman et al., 2013).

Multiple quality control methods show that the final assembly from the stepwise method is not only of high quality but also offers advantages over the only other genomic reference available for O. lurida (Timmins-Schiffman et al., 2013). The mean contig length of the assembled contigs is longer than that of the contigs available from the transcriptome by Timmins-Schiffman et al. (2013), which could indicate that the contigs assembled here are more complete gene transcripts. This idea is supported by the fact that a higher percentage of reads mapped back to the final transcriptome (78%) than to the other reference (57.5%). Furthermore, more contigs were successfully annotated by BLAST homology in the final transcriptome (49%) than in the previous study (37%).

Closely related species are expected to have similar physiologies, while species separated by greater lengths of evolutionary time are expected to have diverged physiologically. We would therefore expect bivalves to share many basic molecular and cellular pathways, but show divergence in genes functioning within pathways that allow species to adapt to the unique abiotic challenges of specific habitats. Gene ontologies were subsequently assigned to annotated contigs as a means of exploring how gene functions are distributed across the transcriptomes of different bivalve species.

Distribution of ontologies indicates that genes present in our final transcriptome span a broad range of functions. In most cases, the percentage of genes annotating to each ontology was very similar between the bivalve species surveyed (Figure 24). However, C. virginica (light blue; Figure 24) does not follow the trend observed in other species for several categories. This functional divergence could result from technical differences during the assembly of each transcriptome, reflect evolutionary differences in gene numbers needed for the different lifestyles of O. lurida and C. virginica, or both.

Figure 24: Comparison of GO IDs among four bivalves. Bars show the percentage of genes (0% to 60%) annotating to ontologies within the Cellular Component, Biological Process, and Molecular Function categories for the final assembly, P. martensii, C. fluminea, O. lurida, and C. virginica.

While both species inhabit estuaries, C. virginica is found along the N. American East Coast, and O. lurida exclusively on the West Coast. Ocean variability in nearshore regions of the U.S. West Coast and its estuaries is distinctly different from that in estuaries and nearshore regions of the East Coast. For example, the U.S. West Coast is dominated by an oceanographic process called upwelling, whereas the East Coast is not. Upwelling is caused by a seasonal change in wind direction that transports colder, saltier, more acidic, and nutrient-rich water into estuaries all along the U.S. West Coast (Hickey & Banas, 2003). Thus, differences in upwelling between coasts cause O. lurida and C. virginica habitats to differ considerably in terms of prevailing temperature, salinity, pH, and nutrient availability. Such pronounced environmental differences are expected to influence gene evolution in these two oyster species and may contribute to the differences in ontology described here.

Both oysters are commonly found in estuaries and coastal areas that routinely experience reductions in salinity; however, C. virginica appears more tolerant of low salinity than O. lurida. There are reproducing populations of C. virginica in estuaries where salinity can be less than 5 psu for several months (Eastern Oyster Biological Review Team, 2007). In contrast, O. lurida thrives in salinities above 25 ppt (Baker, 1995; Couch & Hassler, 1989), but only tolerates exposure to low salinity for short periods (hours to days) (Wasson et al., 2014). Differences in the proportion of genes annotating to different ontologies may be the result of C. virginica employing a greater number of genes to cope with low salinity compared with O. lurida. C. virginica has a higher proportion of genes in the cellular component categories extracellular region and extracellular matrix (Figure 24). Past studies reveal these ontologies are amongst the most significantly enriched during exposure to low salinity in both C. virginica and C. gigas (Eierman & Hare, 2014). The extracellular matrix and extracellular region ontologies refer to gene products that are secreted from the cell but retained in the interstitial fluid or hemolymph. Changes in the osmolarity of interstitial fluids create concentration gradients between the inside and outside of cells that can lead to the diffusion of ions and water across membranes and severe disruptions in protein function and homeostasis. Added genes functioning within extracellular region and extracellular matrix may help reduce membrane permeability and passive diffusion, thereby contributing to the enhanced ability of C. virginica to survive in low salinity seawater. Additionally, functional groups including receptor, catalytic, and transporter activity account for a higher proportion of genes in C. virginica than in O. lurida (Figure 24). Consistent with salinity change altering the osmolarity of interstitial fluids and the diffusion of ions across membranes, added ion transporters in C. virginica are plausibly used to transport ions across plasma membranes in order to maintain intracellular ion homeostasis. Again, the ability to express these added transport genes likely contributes to enhanced salinity tolerance in C. virginica.

Technical differences in experimental methodology also likely contribute to ontology variation between C. virginica and O. lurida (Figure 24). Comparing transcriptomes is challenging because protocols used to assemble, annotate, and analyze RNA-Seq data are not standardized, and are often modified to meet the objectives of specific research projects. One potential technical difference affecting the number of genes annotating to distinct ontologies between C. virginica and O. lurida is sequencing depth. The C. virginica transcriptome was constructed using approximately 1.3 million sequence reads, whereas our O. lurida transcriptome was assembled from over 600 million reads. Far greater sequencing depth in our experiments increases the detection of comparatively rare transcripts expressed at low levels, and we would expect to identify more genes as a result. Some PANTHER gene ontology annotations support this hypothesis. The total number of genes functionally annotated by PANTHER for C. virginica was 1902 (45.6%), whereas 8223 (80%) genes were annotated in the final O. lurida transcriptome. Figure 25 shows the distribution of genes by number in each category. For the categories shown, O. lurida has more genes than C. virginica.

The presence of genes that pertain directly to salinity stress adaptation in invertebrates also supports the quality of our final transcriptome (Figures 26 and 27). Key biological functions related to hypoosmotic stress have been characterized in C. gigas (Zhao et al., 2012; Meng et al., 2013), and not surprisingly, genes involved in ion and water transport underlie tolerance of low and fluctuating salinity. Meng et al. (2013) analyzed C. gigas gene expression over the course of a seven-day low salinity exposure (5, 10 and 15 ppt), and found that expression of voltage-gated Na+/K+ channels decreased at low salinity. Our O. lurida transcriptome includes multiple Na+ channel protein subunits (SCN2A and SCN5A), as well as voltage-gated K+ channels (KCNQ1, KCNC2, KCNAB2, and KCNH2). Meng et al. (2013) also showed an increase in Ca2+ channel expression, and our O. lurida transcriptome contains genes coding for Ca2+ channels (CAC1C, CAC1A) and a calcium-gated K+ channel (KCNMA1). Aquaporins are water-transporting proteins that protect against changes in cell volume during osmotic stress, and again, our O. lurida transcriptome includes several aquaporin genes (AQP4, AQP8, AQP9 and AQP10) (Meng et al., 2013).

Figure 25: Comparison of the number of genes by GO ID category between C. virginica and the final O. lurida assembly. Outlying categories were identified by comparing the percentages of genes mapped to GO categories in the two species. Categories shown: extracellular region (GO:0005576) and extracellular matrix (GO:0031012) in Cellular Component; biological adhesion (GO:0022610), response to stimulus (GO:0050896), developmental process (GO:0032502), immune system process (GO:0002376), multicellular organismal process (GO:0032501), and cellular process (GO:0009987) in Biological Process; transporter activity (GO:0005215), receptor activity (GO:0004872), and catalytic activity (GO:0003824) in Molecular Function. Y-axis: number of genes (0 to 3000).

Salinity tolerance research in C. gigas has led to the development of a gene network model that can be used to better illustrate how changes in gene expression contribute to adaptive responses to low salinity in this species (Meng et al., 2013). The model has been extended with genes included in our O. lurida transcriptome (Figure 26). The model suggests a large number of genes with diverse functions work in concert to promote osmotic stress acclimation. Osmotic stress is detected by osmosensing proteins that respond either to mechanical stress caused by cell volume changes or to imbalances in the concentration of solutes in the interstitial fluid. Some membrane channels, such as aquaporins, are thought to act as cellular osmosensors. Osmotic sensing proteins are connected to signal transduction pathways that are capable of amplifying the stress signal and relaying this signal to the appropriate set of downstream proteins. Fluctuations in intracellular calcium concentration (calcium is an important second messenger in cell signaling) and cascades of protein phosphorylation catalyzed by kinases are important in transducing osmotic stress signals. Ultimately, signal transduction cascades terminate at osmotic effector proteins that collectively work to restore homeostasis. Key osmotic effector proteins include ion transporters that can shuttle solutes across membranes and enzymes that catalyze chemical reactions forming osmolytes. Osmolytes are molecules that can be accumulated intracellularly in order to eliminate concentration gradients that form across membranes when environmental salinity changes. Thus, osmolytes are critical to preventing the passive diffusion of water and ions that causes cell volume change and disrupts normal ionic concentrations. Free amino acids are important osmolytes, genes involved in their formation are up-regulated during salinity stress, and our O. lurida transcriptome includes many genes in amino acid synthesis pathways. As mentioned previously, the O. lurida transcriptome also includes transcripts for many ion transport proteins, which, by moving ions across membranes, are also key osmotic stress effector proteins. The model developed by Meng et al. (2013) also suggests that antioxidant genes that prevent accumulation of reactive oxygen species (ROS), genes involved in immune responses, and certain metabolic enzymes play a role in adaptive responses to salinity change, and multiple genes from each of these pathways were identified in our O. lurida transcriptome.

Figure 26: Map, adapted from Meng et al. (2013), of salt-related signal transduction, effectors, and physiological changes in response to salt stress, based on research from C. gigas. Proteins correspond to those found in the O. lurida final assembly.

Number of genes per ontology (x-axis: 0 to 250):

Primary Metabolic Process
  cellular amino acid biosynthetic process: 81
  cellular amino acid catabolic process: 51
  regulation of cellular amino acid metabolic process: 6

Immune System Process
  immune response: 118
  antigen processing and presentation: 7
  macrophage activation: 50

Transmembrane Transporter Activity
  ion channel activity: 152
  cation transmembrane transporter activity: 228
  hydrogen ion transmembrane transporter activity: 22

Figure 27: Biological GO IDs related to oyster response to hypoosmotic stress. Level 2 ontologies from PANTHER GO-Slim based on homology from BLASTx using the contigs from the final transcriptome generated by the stepwise assembly method.

In summary, we have presented a pipeline with a novel assembly method to create more complete transcriptomes when a large number of biological replicates is used. We created a transcriptome of a non-model organism with 51,574 contigs, of which approximately 10,200 are unique genes. Key genes as well as gene ontologies associated with hypoosmotic stress in marine bivalves were identified within our transcriptome. This transcriptome will serve as a genomic resource to facilitate future studies in O. lurida, especially those relating to local adaptation to salinity stress in three northern California O. lurida populations.

4.2 FUTURE WORK

Our hypoosmotic stress transcriptome is a key resource for future experiments relating to the ecology, evolution, and conservation of O. lurida (Fig 28). Future experiments will characterize changes in gene expression in order to define the physiological basis of differences in hypoosmotic stress tolerance between populations of O. lurida. Assessment of gene expression in C. gigas shows that exposure to low salinity seawater principally modifies the expression of ion transport, cytoskeletal, and metabolic genes (Zhao et al., 2012). Similarly, the marine mussels Mytilus galloprovincialis and Mytilus trossulus show modifications of ion transport genes under low salinity stress (Lockwood & Somero, 2010). We hypothesize that genes with similar functions will differ in expression between Loch Lomond oysters and oysters from either Oyster Point or Tomales Bay following low salinity challenge. We also hypothesize that gene expression patterns from Oyster Point and Tomales Bay oysters will be more similar to each other than to those of Loch Lomond oysters.

Figure 28: Graphical representation of the future experimental workflow.

To determine evolutionary changes needed to increase hypoosmotic tolerance in O. lurida, changes in allele frequency between oysters from each population held at ambient salinity and oysters from each population that survived a low-salinity challenge (Table 6) will be assessed. The salinity challenge was performed by our collaborators at BML and followed a protocol similar to that previously described, but included holding oysters for an additional nine days at 5 ppt (as opposed to only four days (Table 1)).

Table 6: Salinity exposure table for SNP detection.

Day   Salinity (ppt)
1     33
2     25
3     20
4     15
5     10
6     5
7     5
8     5
9     5
10    5
11    5
12    5
13    5
14    5

RNA was extracted and cDNA libraries were prepared in parallel with the samples used for transcriptome assembly. These samples were sequenced on three lanes to generate 50 base-pair single-end reads on the HiSeq 2000 Illumina instrument. These reads can be mapped to specific genes using our newly assembled transcriptome to determine the locations of polymorphisms. Changes in allele frequency between populations can then be determined by statistically comparing the frequency of nucleotide substitutions in each full-length transcript between populations. We hypothesize that changes in allele frequency will occur disproportionately in regulatory genes, including cell signaling enzymes and transcription factors. These genes are capable of modifying the expression of large numbers of downstream genes in order to promote environmental acclimation (Fu et al., 2014).
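The statistical comparison of allele frequencies proposed above could take many forms, and the specific test is not stated here. As one possible sketch, the Python example below applies a two-sided Fisher's exact test to reference and alternate allele counts at a single polymorphic site in oysters held at ambient salinity versus survivors of the low-salinity challenge. The choice of test, the function names, and the allele counts are all assumptions made for illustration and are not data from this study. A full analysis would also need to correct for testing many sites.

```python
from math import comb


def alt_allele_frequency(ref_count, alt_count):
    """Return the alternate allele frequency at one site."""
    return alt_count / (ref_count + alt_count)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. ambient, survivors); columns are allele counts
    (reference, alternate). The p-value sums the hypergeometric probabilities
    of all tables, with the same margins, that are no more likely than the
    observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical allele counts at one SNP (reference, alternate).
ambient = (30, 10)
survivors = (18, 22)

p_value = fisher_exact_two_sided(*ambient, *survivors)
print("ambient alt frequency:", alt_allele_frequency(*ambient))
print("survivor alt frequency:", alt_allele_frequency(*survivors))
print("two-sided p-value:", p_value)
```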


REFERENCES

Baker, P. (1995). Review of ecology and fishery of the Olympia oyster, Ostrea lurida, with annotated bibliography. Journal of Shellfish Research, (14), 501-518.

Barrett, E. (1963). The California Oyster Industry. State of California, Department of Fish and Game Fish Bulletin, 1-104.

Barrett, E.M. (1963). The California oyster industry. California Department of Fish and Game Fish Bulletin. 123.103pp.

Bible, J. & Sanford, E. (2016). Local adaptation in an estuarine foundation species: Implications for restoration. Biological Conservation, 193:95-102.

Berger, V. & Kharazova, A. (1997). Mechanisms of salinity adaptations in marine molluscs. Interactions and Adaptation Strategies of Marine Organisms, 115-126.

Bettencourt, R., Pinheiro, M., Egas, C., Gomes, P., Afonso, M., Shank, T., & Santos, R.S. (2010) High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent Bathymodiolus azoricus. BMC genomics 11:559.

Brumbaugh, R., & Coen, L. (2009). Contemporary Approaches for Small-Scale Oyster Reef Restoration to Address Substrate Versus Recruitment Limitation: A Review and Comments Relevant for the Olympia Oyster, Ostrea lurida Carpenter 1864. Journal of Shellfish Research, 28(1), 147-161.

Brumbaugh, R., Beck, M., Coen, L., Craig, L., & Hicks, P. (2006). A practitioner’s guide to the design and monitoring of shellfish restoration projects: An ecosystem services approach. The Nature Conservancy, Arlington, VA., MRD Educational Report, 22, 1-32.

Brumbaugh, R., Coen, L., & Grizzle, R. (2007). A Practitioner’s Perspective on Shellfish Restoration: Why Don’t We Manage Shellfish as the Ecosystem Engineers That They Really Are? Coastal Zone, 07.


Brumbaugh, R., Sorabella, L., Garcia, C., Goldsborough, W., & Wesson, J. (2000). Making a case for community-based oyster restoration: An example from Hampton Roads, Virginia, U.S.A. Journal of Shellfish Research, 19(1), 467-472.

Chomczynski, P. & Sacchi, N. (1987). Single step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal Biochem 162:156–159.

Cloern, J., Knowles, N., Brown, L., Cayan, D., Dettinger, M., Morgan, T., & Jassby, A. (2011). Projected Evolution of California's San Francisco Bay-Delta-River System in a Century of Climate Change. PLoS ONE, 6(9), E24465-E24465. doi:10.1371/journal.pone.0024465.

Couch, D., & Hassler, T. (1989). Species Profiles: Life Histories and Environmental Requirements of Coastal Fishes and Invertebrates (Pacific Northwest). Fish and Wildlife Service Biological Report, (82), 11-124.

DeWit, P., Pespeni, M., Ladner, J., Barshis, D., Seneca, F., Jaris, H., & Palumbi, S. (2013). The simple fool's guide to population genomics via RNA-Seq: An introduction to high-throughput sequencing data analysis. Molecular Ecology Resources, 1058-1067.

Eastern Oyster Biological Review Team. (2007). Status review of the eastern oyster (Crassostrea virginica). Report to the National Marine Fisheries Service, Northeast Regional Office. February 16, 2007. NOAA Tech. Memo. NMFS F/SPO-88, 105 p.

Eierman, L., & Hare, M. (2014). Transcriptomic analysis of candidate osmoregulatory genes in the eastern oyster Crassostrea virginica. BMC Genomics, (15), 503-503 doi:10.1186/1471-2164-15-503.

Ermgassen, P., Gray, M., Langdon, C., Spalding, M., & Brumbaugh, R. (2013). Quantifying the historic contribution of Olympia oysters to filtration in Pacific Coast (USA) estuaries and the implications for restoration objectives. Aquatic Ecology, 47(2), 149-161. doi:10.1007/s10452-013-9431-6.


Evans, T.G. & Hofmann, G.E. (2012). Defining the limits of physiological plasticity: how gene expression can assess and predict the consequences of ocean change. Philosophical Transactions of the Royal Society B, 367, 1733–1745. doi:10.1098/rstb.2012.0019.

Fu, X., Sun, Y., Wang, J., Xing, Q., Zou, J., Li, R., & Bao, Z. (2014). Sequencing-based gene network analysis provides a core set of gene resource for understanding thermal adaptation in Zhikong scallop Chlamys farreri. Molecular Ecology Resources (14), 184-198.

Grosholz, E., Moore, J., Zabin, C., Attoe, S., & Obernoite, R. (2007). Planning for native oyster restoration in San Francisco Bay. Final Report to California Coastal Conservancy.

Hertlein, L. (1959). Notes on California Oysters. Veliger, (2), 5-10.

Hickey, B.M. & Banas, N.S. (2003). Oceanography of the U.S. Pacific Northwest Coastal Ocean and Estuaries with Application to Coastal Ecology. Estuaries, 26(4B), 1010-1031.

Honaas, L. A., Wafula, E. K., Wickett, N. J., Der, J. P., Zhang, Y., Edger, P. P., & de Pamphilis, C. W. (2016). Selecting Superior De Novo Transcriptome Assemblies: Lessons Learned by Leveraging the Best Plant Genome. PLoS ONE, 11(1), e0146062. http://doi.org/10.1371/journal.pone.0146062.

Huihui, C., Zha, J., Liang, X., Bu, J., Wang, M., & Wang, Z. (2013). Sequencing and De Novo Assembly of the Asian Clam (Corbicula fluminea) Transcriptome Using the Illumina GAIIx Method. PLoS ONE, 8 (11), e79516. doi.org/10.1371/journal.pone.0079516.

Kawecki, T., & Ebert, D. (2004). Conceptual issues in local adaptation. Ecology Letters, 7(12), 1225-1241. doi:10.1111/j.1461-0248.2004.00684.

Kimbro, D., & Grosholz, E. (2006). Disturbance Influences Oyster Community Richness and Evenness, but Not Diversity. Ecology, 87(9), 2378-2388.


Kirby, M. (2004). Fishing down the coast: Historical expansion and collapse of oyster fisheries along continental margins. Proceedings of the National Academy of Sciences, 101(35), 13096-13099. doi:10.1073/pnas.0405150101.

Liu, Y., Zhou, J., & White, K. P. (2014). RNA-seq differential expression studies: more sequence or more replication? Bioinformatics, 30(3), 301–304. http://doi.org/10.1093/bioinformatics/btt688.

Lockwood, B., & Somero, G. (2010). Transcriptomic responses to salinity stress in invasive and native blue mussels (genus Mytilus). Molecular Ecology, 20(3), 517-529. doi:10.1111/j.1365-294X.2010.04973.

Maddison, W. P. & Maddison, D.R. (2015). Mesquite: a modular system for evolutionary analysis. Version 3.04. http://mesquiteproject.org.

Meng, J., Zhu, Q., Zhang, L., Li, C., Li, L., She, Z., Huang, B., & Zhang, G. (2013). Genome and Transcriptome Analyses Provide Insight into the Euryhaline Adaptation Mechanism of Crassostrea gigas. PLoS one, doi.org/10.1371/journal.pone.0058563.

Mi, H., Muruganujan, A., Casagrande, J.T., & Thomas, P.D. (2013) Large-scale Gene Function Analysis with the PANTHER Classification System. Nature Protocols 8, 1551 – 1566. doi: 10.1038/nprot.2013.092.

Miller, J. R., Koren, S., & Sutton, G. (2010). Assembly Algorithms for Next-Generation Sequencing Data. Genomics, 95(6), 315–327. doi.org/10.1016/j.ygeno.2010.03.001.

Moreton, J., Izquierdo, A., & Emes, R.D. (2016). Assembly, Assessment, and Availability of De novo Generated Transcriptomes. Frontiers in Genetics. doi: 10.3389/fgene.2015.00361.

NERR. (2010). National Estuarine Research Reserve, Centralized Data Management Office.


Newell, R. (2004). Ecosystem influences of natural and cultivated populations of suspension feeding bivalve mollusks: A review. Journal of Shellfish Research. 23(8), 51-61.

Olympia Oyster Restoration. Retrieved from http://www.restorationfund.org/projects/olympiaoyster.

Peter-Contesse, T. & Peabody, B. (2005). Reestablishing Olympia Oyster Populations in Puget Sound, Washington. A Washington Sea Grant Program publication.

Polson, M.P., Hewson, W.E., Eernisse, D.J., Baker, P.K., & Zacherl, D.C. (2009). You say Conchaphila, I say Lurida: Molecular evidence for restricting the Olympia oyster (Ostrea lurida Carpenter 1864) to temperate western North America. Journal of Shellfish Research, 28(1), 11-21.

Qin, J., Huang, Z., Chen, J., Zou, Q., You, W., & Ke, C. (2012). Sequencing and de novo analysis of Crassostrea angulata (Fujian Oyster) from 8 different developing phases using 454 GSFlx. PLoS One, 7: e43653.

Scavia, D., Field, J., Boesch, D., Buddemeier, R., Burkett, V., Cayan, D., Titus J. (2002). Climate Change Impacts On U.S. Coastal And Marine Ecosystems. Estuaries, (25), 149-164.

Schaafsma, M., & Turner, R. (2009). Valuation of Coastal and Marine Ecosystem Services: A Literature Review. Studies in Ecological Economics Coastal Zones Ecosystem Services, 103-125.

SF Bay Living Shorelines. Retrieved from http://www.sfbaylivingshorelines.org/sf_shorelines_about.html.

Shi, Y., Yu, C., Gu, Z., Zhan, X., Wang, Y., & Wang, A. (2013). Characterization of Pearl Oyster (Pinctada martensii) Mantle Transcriptome Unravels Biomineralization Genes. Marine Biotechnology, 15(2): 175-187.


Smith, S., & Hollibaugh, J. (1997). Annual Cycle and Interannual Variability of Ecosystem Metabolism in a Temperate Climate Embayment. Ecological Monographs, (67), 509-533.

Swofford, D. L. (2003). PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.

Timmins-Schiffman, E., Friedman, C., Metzger, D., White, S., & Roberts, S. (2013). Genomic resource development for shellfish of conservation concern. Molecular Ecology Resources, 13(2), 295-305.

Todd, E. V., Black, M. A., & Gemmell, N. J. (2016). The power and promise of RNA-seq in ecology and evolution. Molecular Ecology, 25, 1224–1241. doi:10.1111/mec.13526.

Trimble, A., Ruesink, J., & Dumbauld, B. (2009). Factors Preventing the Recovery of a Historically Overexploited Shellfish Species, Ostrea lurida Carpenter 1864. Journal of Shellfish Research, (28), 97-106.

Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-Seq: A Revolutionary Tool for Transcriptomics. Nature Reviews Genetics, 10(1), 57-63. doi:10.1038/nrg2484.

Wang, S., Hou, R., Bao, Z., Du, H., He, Y., Su, H., … Hu, X. (2013). Transcriptome sequencing of Zhikong Scallop (Chlamys farreri) and comparative transcriptome analysis with Yesso Scallop (Patinopecten yessoensis). PLoS ONE, 8: e63927.

Wasson, K. (2010). Informing Olympia oyster Restoration: Evaluation of Factors That Limit Populations in a California Estuary. Wetlands, (30), 449-459.

Wasson, K., Zabin, C., Bible, J., Ceballos, E., Chang, A., Deck, A., & Ferner, M. (2014). Guide to Olympia oyster Restoration and Conservation. Environmental conditions and sites that support sustainable populations in central California. San Francisco Bay National Estuarine Research Reserve.

Zabin, C., Attoe, S., Grosholz, E., & Coleman-Hulbert, C. (2010). Shellfish Conservation and Restoration in San Francisco Bay: Opportunities and Constraints. Final Report for the Subtidal Habitat Goals Committee. University of California, Davis.

Zhang, L., Li, L., Zhu, Y., Zhang, G., & Guo, X. (2012). Transcriptome Analysis Reveals a Rich Gene Set Related to Innate Immunity in the Eastern Oyster (Crassostrea virginica). Marine Biotechnology, 16(1), 17-33.

Zhao, X., Yu, H., Kong, L., & Li, Q. (2012). Transcriptomic Responses to Salinity Stress in the Pacific Oyster Crassostrea gigas. PLoS ONE, 7(9), E46244-E46244. doi:10.1371/journal.pone.0046244

Zhao, X., Yu, H., Kong, L., Liu, S., & Li, Q. (2014). Comparative Transcriptome Analysis of Two Oysters, Crassostrea gigas and Crassostrea hongkongensis Provides Insights into Adaptation to Hypo-Osmotic Conditions. PLoS ONE, 10(3): e0118665. doi:10.1371/journal.pone.0118665

APPENDIX A: RNA ISOLATION INFORMATION

Group 5M

| Sample | Isolation Date | Concentration (ng/µl) | A260 | A280 | 260/280 | 260/230 | Index Primer | Lib Creation Date | Lib Code | Bioanalyzer Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 791-1 | 4/20/15 | 635.1 | 15.88 | 7.346 | 2.16 | 2.14 | Not Used | 5/25/15 | 5 M 1 (1) | 5/28/15 |
| 791-1 | 4/20/15 | 635.1 | 15.88 | 7.346 | 2.16 | 2.14 | 2 | 6/27/15 | 5 M 1 (2) | 7/2/15 |
| 290-1 | 4/28/15 | 1613.9 | 40.35 | 18.81 | 2.14 | 2.29 | Not Used | 6/10/15 | 5 M 2 (1) | 6/11/15 |
| 290-1 | 4/28/15 | 1613.9 | 40.35 | 18.81 | 2.14 | 2.29 | Not Used | 6/12/15 | 5 M 2 (2) | 6/19/15 |
| 791-3 | 5/1/15 | 435.4 | 10.88 | 5.147 | 2.11 | 2.08 | 11 | 6/27/15 | 5 M 2 (3) | 7/2/15 |
| 142-2 | 4/28/15 | 1721.5 | 43.04 | 20.08 | 2.14 | 2.31 | 11 | 6/16/15 | 5 M 3 | 6/19/15 |
| 142-1 | 4/28/15 | 1027.2 | 25.68 | 12.09 | 2.12 | 2.09 | 4 | 5/25/15 | 5 M 4 | 5/28/15 |
| 290-2 | 4/29/15 | 1236.2 | 30.91 | 14.33 | 2.16 | 2.26 | 7 | 6/18/15 | 5 M 5 | 6/19/15 |
| 142-3 | 5/2/15 | 1870.7 | 46.77 | 21.79 | 2.15 | 2.33 | 7 | 6/12/15 | 5 M 6 | 6/19/15 |

Group 33M

| Sample | Isolation Date | Concentration (ng/µl) | A260 | A280 | 260/280 | 260/230 | Index Primer | Lib Creation Date | Lib Code | Bioanalyzer Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 746-2 | 5/2/15 | 1010.8 | 25.271 | 11.693 | 2.16 | 2.13 | 12 | 6/16/15 | 33 M 1 | 6/19/15 |
| 746-1 | 5/2/15 | 2201.7 | 55.044 | 25.689 | 2.14 | 2.33 | 8 | 6/12/15 | 33 M 2 | 6/19/15 |
| 295-2 | 5/2/15 | 1222.1 | 30.552 | 14.137 | 2.16 | 2.33 | 9 | 6/12/15 | 33 M 3 | 6/19/15 |
| 285-2 | 4/29/15 | 965.7 | 24.143 | 11.158 | 2.16 | 2.27 | 6 | 6/12/15 | 33 M 4 | 6/19/15 |
| 295-3 | 4/29/15 | 646.8 | 16.17 | 7.557 | 2.14 | 2.29 | 6 | 6/4/15 | 33 M 5 | 6/11/15 |
| 295-1 | 4/20/15 | 1792.2 | 44.804 | 20.664 | 2.17 | 2.25 | 1 | 6/10/15 | 33 M 6 | 6/11/15 |

Group 5LL

| Sample | Isolation Date | Concentration (ng/µl) | A260 | A280 | 260/280 | 260/230 | Index Primer | Lib Creation Date | Lib Code | Bioanalyzer Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 517-1 | 4/20/15 | 995.3 | 24.882 | 11.575 | 2.15 | 2.33 | 1 | 5/18/15 | 5 LL 1 (1) | 5/19/15 |
| 517-1 | 4/20/15 | 995.3 | 24.882 | 11.575 | 2.15 | 2.33 | Not Used | 6/4/15 | 5 LL 1 (2) | 6/11/15 |
| 485-1 | 4/20/15 | 920.5 | 23.013 | 10.65 | 2.16 | 2.35 | 6 | 6/18/15 | 5 LL 2 | 6/19/15 |
| 766-2 | 4/28/15 | 667.4 | 16.685 | 7.819 | 2.13 | 2.34 | 1 | 6/16/15 | 5 LL 3 | 6/19/15 |
| 517-2 | 4/29/15 | 1341.2 | 33.531 | 15.565 | 2.15 | 2.19 | 12 | 6/10/15 | 5 LL 4 | 6/11/15 |
| 766-1 | 5/2/15 | 600.4 | 15.01 | 6.973 | 2.15 | 2.38 | 9 | 6/18/15 | 5 LL 5 | 6/19/15 |
| 493-2 | 5/2/15 | 316.8 | 7.921 | 3.689 | 2.15 | 2.2 | 10 | 6/18/15 | 5 LL 6 | 6/19/15 |

Group 33LL

| Sample | Isolation Date | Concentration (ng/µl) | A260 | A280 | 260/280 | 260/230 | Index Primer | Lib Creation Date | Lib Code | Bioanalyzer Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 598-2 | 5/2/15 | 627.2 | 15.681 | 7.34 | 2.14 | 2.36 | 4 | 6/10/15 | 33 LL 1 | 6/19/15 |
| 595-1 | 5/2/15 | 1560.2 | 39.005 | 18.091 | 2.16 | 2.34 | 10 | 6/18/15 | 33 LL 2 | 6/19/15 |
| 592-1 | 5/2/15 | 1414.5 | 35.363 | 16.43 | 2.15 | 2.35 | 10 | 6/12/15 | 33 LL 3 | 6/19/15 |
| 598-1 | 4/29/15 | 1805.8 | 45.144 | 21.058 | 2.14 | 2.32 | 9 | 6/4/15 | 33 LL 4 | 6/11/15 |
| 516-1 | 4/20/15 | 680.3 | 17.008 | 7.897 | 2.15 | 2.28 | 5 | 6/18/15 | 33 LL 5 | 6/19/15 |
| 501-1 | 4/20/15 | 1330.3 | 33.257 | 15.335 | 2.17 | 2.27 | 11 | 6/27/15 | 33 LL 6 | 7/1/15 |

Group 5OP

| Sample | Isolation Date | Concentration (ng/µl) | A260 | A280 | 260/280 | 260/230 | Index Primer | Lib Creation Date | Lib Code | Bioanalyzer Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 863-2 | 4/20/15 | 527.3 | 13.182 | 6.057 | 2.18 | 2.28 | Not Used | 5/25/15 | 5 OP 1 (1) | 5/28/15 |
| 863-2 | 4/20/15 | 527.3 | 13.182 | 6.057 | 2.18 | 2.28 | 3 | 6/27/15 | 5 OP 1 (2) | 7/2/15 |
| 569-1 | 4/20/15 | 472.6 | 11.814 | 5.498 | 2.15 | 2.18 | 7 | 6/4/15 | 5 OP 2 | 6/11/15 |
| 565-1 | 5/2/15 | 1338.4 | 33.459 | 15.574 | 2.15 | 2.25 | 5 | 6/27/15 | 5 OP 3 | 7/2/15 |
| 580-2 | 4/29/15 | 312.9 | 7.823 | 3.623 | 2.16 | 2.3 | 3 | 6/16/15 | 5 OP 4 | 6/19/15 |
| 580-1 | 5/1/15 | 990.1 | 24.753 | 11.441 | 2.16 | 2.29 | 8 | 6/4/15 | 5 OP 5 | 6/11/15 |
| 552-1 | 5/1/15 | 1179.5 | 29.489 | 13.724 | 2.15 | 2.19 | 2 | 6/10/15 | 5 OP 6 | 6/11/15 |

Group 33OP

| Sample | Isolation Date | Concentration (ng/µl) | A260 | A280 | 260/280 | 260/230 | Index Primer | Lib Creation Date | Lib Code | Bioanalyzer Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 535-1 | 5/2/15 | 672.6 | 16.814 | 7.794 | 2.16 | 2.39 | 4 | 6/16/15 | 33 OP 1 | 6/19/15 |
| 854-1 | 4/20/15 | 1090.9 | 27.271 | 12.554 | 2.17 | 2.07 | 5 | 6/27/15 | 33 OP 2 | 7/2/15 |
| 864-1 | 4/28/15 | 1285.9 | 32.149 | 14.975 | 2.15 | 2.33 | 2 | 6/16/15 | 33 OP 3 | 6/19/15 |
| 532-1 | 4/20/15 | 1334.5 | 33.363 | 15.371 | 2.17 | 2.22 | 3 | 6/10/15 | 33 OP 4 | 6/11/15 |
| 864-2 | 5/1/15 | 1085.5 | 27.138 | 12.565 | 2.16 | 2.21 | 10 | 6/4/15 | 33 OP 5 | 6/11/15 |
| 559-1 | 4/20/15 | 801.8 | 20.046 | 9.292 | 2.16 | 2.2 | 12 | 6/27/15 | 33 OP 6 | 7/2/15 |
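
The spectrophotometer columns in Appendix A are related by simple arithmetic. The sketch below recomputes the concentration and the 260/280 ratio for sample 791-1 (Group 5M). It assumes the conventional factor of 40 ng/µl of RNA per A260 unit, which the thesis does not state explicitly. The function names are hypothetical.

```python
# Assumed conversion factor (not stated in the thesis): by convention,
# one A260 unit corresponds to roughly 40 ng/µl of RNA.
RNA_NG_PER_UL_PER_A260 = 40.0


def rna_concentration(a260):
    """Estimate RNA concentration (ng/µl) from absorbance at 260 nm."""
    return a260 * RNA_NG_PER_UL_PER_A260


def purity_ratio(a260, a280):
    """Return the 260/280 absorbance ratio used as a purity indicator."""
    return a260 / a280


# Sample 791-1 from Group 5M: A260 = 15.88, A280 = 7.346
conc = rna_concentration(15.88)
ratio = purity_ratio(15.88, 7.346)
print(f"{conc:.1f} ng/µl, 260/280 = {ratio:.2f}")
```

Under this assumption the recomputed concentration agrees with the tabulated 635.1 ng/µl to within rounding of the displayed A260 value, and the ratio reproduces the tabulated 2.16.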

APPENDIX B: RNA-SEQ SAMPLE LANE ASSIGNMENTS

Lane A

| QB3 Sample # | Sample | Sample | Group | cDNA lib code | Index Primer |
|---|---|---|---|---|---|
| TE01A | 33-2 | 295-1 | M | 33 M 6 | 1 |
| TE02A | 5-3 | 791-1 | M | 5 M 1 (2) | 2 |
| TE03A | 5-2 | 863-2 | OP | 5 OP 1 (2) | 3 |
| TE04A | 5-1 | 142-1 | M | 5 M 4 | 4 |
| TE05A | 33-3 | 516-1 | LL | 33 LL 5 | 5 |
| TE06A | 5-2 | 485-1 | LL | 5 LL 2 | 6 |
| TE07A | 5-1 | 569-1 | OP | 5 OP 2 | 7 |
| TE08A | 5-3 | 580-1 | OP | 5 OP 5 | 8 |
| TE09A | 33-1 | 598-1 | LL | 33 LL 4 | 9 |
| TE10A | 33-3 | 864-2 | OP | 33 OP 5 | 10 |
| TE11A | 5-1 | 142-2 | M | 5 M 3 | 11 |
| TE12A | 5-2 | 517-2 | LL | 5 LL 4 | 12 |

Lane B

| QB3 Sample # | Sample | Sample | Group | cDNA lib code | Index Primer |
|---|---|---|---|---|---|
| TE13B | 5-2 | 517-1 | LL | 5 LL 1 (1) | 1 |
| TE14B | 33-3 | 864-1 | OP | 33 OP 3 | 2 |
| TE15B | 33-1 | 532-1 | OP | 33 OP 4 | 3 |
| TE16B | 33-2 | 535-1 | OP | 33 OP 1 | 4 |
| TE17B | 33-1 | 854-1 | OP | 33 OP 2 | 5 |
| TE18B | 33-2 | 295-3 | M | 33 M 5 | 6 |
| TE19B | 5-2 | 290-2 | M | 5 M 5 | 7 |
| TE20B | 5-1 | 493-2 | LL | 5 LL 6 | 8 |
| TE21B | 33-2 | 295-2 | M | 33 M 3 | 9 |
| TE22B | 33-3 | 592-1 | LL | 33 LL 3 | 10 |
| TE23B | 33-2 | 501-1 | LL | 33 LL 6 | 11 |
| TE24B | 33-3 | 746-2 | M | 33 M 1 | 12 |

Lane C

| QB3 Sample # | Sample | Sample | Group | cDNA lib code | Index Primer |
|---|---|---|---|---|---|
| TE25C | 5-3 | 766-2 | LL | 5 LL 3 | 1 |
| TE26C | 5-3 | 552-1 | OP | 5 OP 6 | 2 |
| TE27C | 5-3 | 580-2 | OP | 5 OP 4 | 3 |
| TE28C | 33-1 | 598-2 | LL | 33 LL 1 | 4 |
| TE29C | 5-1 | 565-1 | OP | 5 OP 3 | 5 |
| TE30C | 33-1 | 285-2 | M | 33 M 4 | 6 |
| TE31C | 5-1 | 142-3 | M | 5 M 6 | 7 |
| TE32C | 33-3 | 746-1 | M | 33 M 2 | 8 |
| TE33C | 5-3 | 766-1 | LL | 5 LL 5 | 9 |
| TE34C | 33-1 | 595-1 | LL | 33 LL 2 | 10 |
| TE35C | 5-3 | 791-3 | M | 5 M 2 (3) | 11 |
| TE36C | 33-3 | 559-1 | OP | 33 OP 6 | 12 |

APPENDIX C: ASSEMBLY PARAMETERS

All assemblies were set to a minimum contig length of 200 bp.

Individual Sample Assemblies

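| Sample # | Word Size | Bubble Size |
|---|---|---|
| TE01A | 23 | 50 |
| TE02A | 23 | 50 |
| TE03A | 21 | 50 |
| TE04A | 23 | 50 |
| TE05A | 23 | 50 |
| TE06A | 23 | 50 |
| TE07A | 23 | 50 |
| TE08A | 21 | 50 |
| TE09A | 23 | 50 |
| TE10A | 23 | 50 |
| TE11A | 23 | 50 |
| TE12A | 23 | 50 |
| TE13B | 20 | 50 |
| TE14B | 21 | 50 |
| TE15B | 21 | 50 |
| TE16B | 21 | 50 |
| TE17B | 21 | 50 |
| TE18B | 21 | 50 |
| TE19B | 20 | 50 |
| TE20B | 20 | 50 |
| TE21B | 21 | 50 |
| TE22B | 21 | 50 |
| TE23B | 21 | 50 |
| TE24B | 21 | 50 |
| TE25C | 23 | 50 |
| TE26C | 23 | 50 |
| TE27C | 23 | 50 |
| TE28C | 23 | 50 |
| TE29C | 23 | 50 |
| TE30C | 23 | 50 |
| TE31C | 23 | 50 |
| TE32C | 24 | 50 |
| TE33C | 23 | 50 |
| TE34C | 24 | 50 |
| TE35C | 23 | 50 |
| TE36C | 22 | 50 |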

Group Assemblies

| Group | Samples | Word Size | Bubble Size |
|---|---|---|---|
| 5OP | TE03A, TE07A, TE08A, TE26C, TE27C, TE29C | 21 | 451 |
| 5M | TE02A, TE04A, TE11A, TE19B, TE31C, TE35C | 21 | 433 |
| 5LL | TE06A, TE12A, TE13B, TE20B, TE25C, TE33C | 21 | 487 |
| 33OP | TE10A, TE14B, TE15B, TE16B, TE17B, TE36C | 20 | 516 |
| 33M | TE01A, TE18B, TE21B, TE24B, TE30C, TE32C | 21 | 544 |
| 33LL | TE05A, TE09A, TE22B, TE23B, TE28C, TE34C | 21 | 516 |

Population Assemblies

| Population | Word Size | Bubble Size |
|---|---|---|
| Oyster Point | 20 | 1040 |
| Loch Lomond | 20 | 1095 |
| Tomales Bay | 20 | 1114 |

Final Assembly

| Assembly | Word Size | Bubble Size |
|---|---|---|
| Final Assembly | 20 | 1688 |

APPENDIX D: RAW TRANSCRIPTOME RESULTS

Individual Assembly

| Sample # | N50 | Mean Contig Length | # of Contigs | Group |
|---|---|---|---|---|
| TE01A | 722 | 573 | 85791 | 33M |
| TE02A | 282 | 328 | 160452 | 5M |
| TE03A | 399 | 388 | 42590 | 5OP |
| TE04A | 715 | 564 | 88754 | 5M |
| TE05A | 456 | 425 | 115326 | 33LL |
| TE06A | 336 | 366 | 127382 | 5LL |
| TE07A | 467 | 437 | 86815 | 5OP |
| TE08A | 455 | 441 | 27678 | 5OP |
| TE09A | 815 | 604 | 89536 | 33LL |
| TE10A | 491 | 447 | 84336 | 33OP |
| TE11A | 686 | 556 | 80204 | 5M |
| TE12A | 747 | 577 | 88183 | 5LL |
| TE13B | 393 | 386 | 19423 | 5LL |
| TE14B | 535 | 499 | 37824 | 33OP |
| TE15B | 566 | 524 | 36681 | 33OP |
| TE16B | 577 | 528 | 40344 | 33OP |
| TE17B | 587 | 533 | 38903 | 33OP |
| TE18B | 456 | 439 | 27629 | 33M |
| TE19B | 446 | 431 | 22603 | 5M |
| TE20B | 458 | 441 | 21389 | 5LL |
| TE21B | 574 | 526 | 43329 | 33M |
| TE22B | 584 | 530 | 43390 | 33LL |
| TE23B | 518 | 484 | 35824 | 33LL |
| TE24B | 498 | 475 | 28174 | 33M |
| TE25C | 691 | 551 | 91975 | 5LL |
| TE26C | 708 | 558 | 104548 | 5OP |
| TE27C | 507 | 452 | 109224 | 5OP |
| TE28C | 579 | 488 | 115461 | 33LL |
| TE29C | 386 | 395 | 117296 | 5OP |
| TE30C | 688 | 543 | 97798 | 33M |
| TE31C | 589 | 497 | 98940 | 5M |
| TE32C | 770 | 572 | 123083 | 33M |
| TE33C | 662 | 529 | 115658 | 5LL |
| TE34C | 768 | 567 | 128867 | 33LL |
| TE35C | 286 | 344 | 118347 | 5M |
| TE36C | 739 | 578 | 79865 | 33OP |

Group Assemblies

| Group | Samples | N50 | Mean Contig Length | # of Contigs |
|---|---|---|---|---|
| 5OP | TE03A, TE07A, TE08A, TE26C, TE27C, TE29C | 1433 | 1026 | 58183 |
| 33OP | TE10A, TE14B, TE15B, TE16B, TE17B, TE36C | 1468 | 1061 | 42036 |
| 5M | TE02A, TE04A, TE11A, TE19B, TE31C, TE35C | 1461 | 1056 | 56132 |
| 33M | TE01A, TE18B, TE21B, TE24B, TE30C, TE32C | 1701 | 1171 | 57820 |
| 5LL | TE06A, TE12A, TE13B, TE20B, TE25C, TE33C | 1569 | 1108 | 59341 |
| 33LL | TE05A, TE09A, TE22B, TE23B, TE28C, TE34C | 1565 | 1084 | 67501 |

Population Assemblies

| Population | N50 | Mean Contig Length | # of Contigs |
|---|---|---|---|
| Oyster Point (OP) | 2473 | 1733 | 31086 |
| Loch Lomond (LL) | 2440 | 1649 | 41760 |
| Tomales Bay (M) | 2483 | 1696 | 37310 |

Final Assembly

| Assembly | N50 | Mean Contig Length | # of Contigs |
|---|---|---|---|
| Final Assembly | 2604 | 1664 | 51574 |
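
For readers unfamiliar with the summary statistics in Appendix D, the sketch below shows how N50 and mean contig length are conventionally computed from a list of contig lengths. It uses the standard definition of N50 and is not the assembler's own code. The function names and the toy lengths are hypothetical and are not data from this thesis.

```python
def n50(lengths):
    """Return N50: the contig length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


def mean_contig_length(lengths):
    """Return the arithmetic mean of the contig lengths."""
    return sum(lengths) / len(lengths)


# Hypothetical toy contig lengths in bp (all at or above a 200 bp minimum)
toy_lengths = [200, 300, 400, 500, 600, 1000, 2000]
print(n50(toy_lengths), round(mean_contig_length(toy_lengths)))
```

Because N50 is weighted by bases rather than by contig count, a few long contigs can raise it well above the mean, as in the population and final assemblies.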

APPENDIX E: READS MAPPED TO FINAL TRANSCRIPTOME

| Sample # | Mapped (pairs) | Mapped (orphans) | Unmapped | Total Reads | Total Mapped | Total Not Mapped |
|---|---|---|---|---|---|---|
| TE01A | 12975668 | 3503553 | 3954859 | 20434080 | 81% | 19% |
| TE02A | 10897108 | 3191320 | 7564180 | 21652608 | 65% | 35% |
| TE03A | 2260048 | 496347 | 1013967 | 3770362 | 73% | 27% |
| TE04A | 13430226 | 4440068 | 4355626 | 22225920 | 80% | 20% |
| TE05A | 12119224 | 3190797 | 4615947 | 19925968 | 77% | 23% |
| TE06A | 12498320 | 3816178 | 6048194 | 22362692 | 73% | 27% |
| TE07A | 13354262 | 4406066 | 4361646 | 22121974 | 80% | 20% |
| TE08A | 1352056 | 280630 | 400760 | 2033446 | 80% | 20% |
| TE09A | 13829336 | 4252205 | 4115557 | 22197098 | 81% | 19% |
| TE10A | 12700900 | 3375269 | 3993109 | 20069278 | 80% | 20% |
| TE11A | 13900350 | 3697797 | 4423797 | 22021944 | 80% | 20% |
| TE12A | 13445824 | 4976114 | 4081228 | 22503166 | 82% | 18% |
| TE13B | 597682 | 180023 | 364819 | 1142524 | 68% | 32% |
| TE14B | 1391470 | 364497 | 390807 | 2146774 | 82% | 18% |
| TE15B | 1499866 | 435598 | 440440 | 2375904 | 81% | 19% |
| TE16B | 1703190 | 354662 | 436746 | 2494598 | 82% | 18% |
| TE17B | 1490234 | 417029 | 393837 | 2301100 | 83% | 17% |
| TE18B | 1096832 | 236207 | 300357 | 1633396 | 82% | 18% |
| TE19B | 863416 | 218954 | 303128 | 1385498 | 78% | 22% |
| TE20B | 754776 | 202356 | 279000 | 1236132 | 77% | 23% |
| TE21B | 1526454 | 548238 | 459160 | 2533852 | 82% | 18% |
| TE22B | 1681946 | 511150 | 483318 | 2676414 | 82% | 18% |
| TE23B | 1382364 | 348653 | 420565 | 2151582 | 80% | 20% |
| TE24B | 1110134 | 254856 | 325230 | 1690220 | 81% | 19% |
| TE25C | 15751050 | 4232877 | 4620869 | 24604796 | 81% | 19% |
| TE26C | 20529562 | 6967419 | 7042629 | 34539610 | 80% | 20% |
| TE27C | 17858142 | 5244951 | 6063845 | 29166938 | 79% | 21% |
| TE28C | 18352306 | 5612894 | 5495196 | 29460396 | 81% | 19% |
| TE29C | 18027280 | 5582237 | 6971177 | 30580694 | 77% | 23% |
| TE30C | 23202162 | 5012414 | 5650656 | 33865232 | 83% | 17% |
| TE31C | 16169372 | 4456691 | 4891611 | 25517674 | 81% | 19% |
| TE32C | 24850932 | 6810751 | 7433903 | 39095586 | 81% | 19% |
| TE33C | 18244492 | 4824072 | 5919194 | 28987758 | 80% | 20% |
| TE34C | 30412010 | 8173899 | 8380207 | 46966116 | 82% | 18% |
| TE35C | 11852096 | 3255177 | 14082635 | 29189908 | 52% | 48% |
| TE36C | 8267512 | 2047593 | 2419763 | 12734868 | 81% | 19% |
| Total | 371378602 | 105919542 | 132497962 | 609796106 | 78% | 22% |
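
The percentage columns in Appendix E can be recovered from the read counts. The sketch below assumes that the three count columns are all in units of reads and sum to the total, which holds for the rows as tabulated. The function name is hypothetical.

```python
def mapping_summary(pairs, orphans, unmapped):
    """Return (total reads, % mapped, % not mapped), with percentages
    rounded to whole numbers as in the table."""
    total = pairs + orphans + unmapped
    mapped_pct = 100 * (pairs + orphans) / total
    return total, round(mapped_pct), round(100 - mapped_pct)


# Counts taken from the "Total" row of Appendix E
total, mapped, not_mapped = mapping_summary(371378602, 105919542, 132497962)
print(total, mapped, not_mapped)
```

Applied to the totals row, this reproduces the reported 609,796,106 reads with 78% mapped and 22% not mapped. The same calculation for TE35C gives the 52% mapping rate that stands out from the other samples.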

APPENDIX F: GENBANK SEQUENCES (16S) USED FOR PHYLOGENY TREE

| Species | Accession |
|---|---|
| Alectryonella plicatula | AF052072 |
| Teskeyostrea weberi | AY376601 |
| Saccostrea cucullata | AF458901 |
| Saccostrea commercialis | AF353100 |
| Ostrea stentina | AF052064 |
| Ostrea puelchana | AF052073 |
| Ostrea permollis | AF052075 |
| Ostrea lurida | AF052071 |
| Ostrea edulis | DQ280032 |
| Ostrea denselamellosa | AF052067 |
| | AF052063 |
| Ostrea algoensis | AF052062 |
| | AF052066 |
| Dendostrea frons | AF052070 |
| Dendostrea folium | AF052069 |
| Crassostrea virginica | AF092285 |
| Crassostrea rhizophorae | AJ312938 |
| Crassostrea hongkongensis | JF808181 |
| Crassostrea gigas | AJ553903 |
| Crassostrea ariakensis | AY160757 |