
Genomic Platforms and Molecular Physiology of Stress Tolerance

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Justin Peyton MS

Graduate Program in Evolution, Ecology and Organismal Biology

The Ohio State University

2015

Dissertation Committee:

Professor David L. Denlinger, Advisor

Professor Zakee L. Sabree

Professor Amanda A. Simcox

Professor Joseph B. Williams

Copyright by

Justin Tyler Peyton

2015

Abstract

As ectotherms with a high surface area to volume ratio, insects are particularly susceptible to desiccation and low temperature stress. In this dissertation, I examine the molecular underpinnings of two facets of these stresses: rapid cold hardening and cryoprotective dehydration.

Rapid cold hardening (RCH) is an insect’s ability to prepare for cold stress when that stress is preceded by exposure to an intermediate temperature for minutes to hours. To gain a better understanding of cold shock, recovery from cold shock, and RCH in Sarcophaga bullata, I examine the transcriptome with microarrays and the metabolome with gas chromatography coupled with mass spectrometry (GCMS) in response to these treatments. I found that RCH has very little effect on the transcriptome, but results in a shift from aerobic metabolism to glycolysis/gluconeogenesis during RCH and preserves metabolic homeostasis during recovery.

In cryoprotective dehydration (CD), a moisture gradient is established between external ice and the moisture in the body of an insect. As temperatures decline, the external ice crystals grow, drawing in more moisture, which dehydrates the insect and causes its melting point to track the ambient temperature. To gain a better understanding of CD and dehydration in Belgica antarctica, I explore the transcriptome with RNA sequencing and the metabolome with GCMS. I found upregulation of genes involved in autophagy and downregulation of those involved in apoptosis. I also found a coordinated shutdown of metabolism during cryoprotective dehydration.

Sequencing the genome of an organism is an expensive and time consuming endeavor, but with the advent of next generation sequencing, it is possible for a single lab or a small group of allied labs to undertake the task. Because of its importance as a model for polar biology, low temperature biology, and dehydration tolerance, I present the assembled, annotated, and characterized genome of B. antarctica. Because of its importance as a model for diapause and low temperature biology, I present the assembled, annotated, and characterized genome of S. bullata.


Dedication

To Alexander Edwin who inspires me to be the very best I can be.


Acknowledgments

First, I would like to thank my friends and family who have supported me during this long and difficult process. Without their love and help I never would have achieved my goals. Second, I would like to thank my advisor, Dr. David Denlinger, for his great advice and superlative editing. Next, I would like to thank my other committee members, Dr. Zakee Sabree, Dr. Amanda Simcox, and Dr. Joseph Williams, for their valuable comments. Finally, I would like to thank my collaborators and labmates. Together, as a team, is how we push science forward.


Vita

2005 ...... B.S. Chemistry, Milligan College

2008...... M.S. Mathematics, East Tennessee State University

2008 to present ...... Graduate Teaching Assistant, The Ohio State University

Publications

1. Paquette, C., Joplin, K. H., Seier, E., Peyton, J. T., & Moore, D. (2008) Sex-specific differences in spatial behavior in the flesh fly Sarcophaga crassipalpis. Physiological Entomology, 33, 382-388.

2. Michaud, R. M., Teets, N. M., Peyton, J. T., Blobner, B. M. & Denlinger, D. L. (2011) Heat shock response to hypoxia and its attenuation during recovery in the flesh fly, Sarcophaga crassipalpis. Journal of Insect Physiology, 57, 203-210.

3. Geji, R., Lou, Y., Munther, D. & Peyton, J. (2011) Convergence to ideal free dispersal strategies and coexistence. Bulletin of Mathematical Biology, 74, 257-299.

4. Teets, N. M., Peyton, J. T., Ragland, G. J., Colinet, H., Renault, D., Hahn, D. A., & Denlinger, D. L. (2012) Combined transcriptomic and metabolomic approach uncovers molecular mechanisms of cold tolerance in a temperate flesh fly. Physiological Genomics, 44, 764-777.

5. *Teets, N. M., *Peyton, J. T., Colinet, H., Renault, D., Kelley, J. L., Kawarasaki, Y., Lee, R. E. & Denlinger, D. L. (2012) Gene expression changes governing extreme dehydration tolerance in an Antarctic insect. Proceedings of the National Academy of Sciences, 109(50), 20744–20749.

6. Kelley, J. L., Peyton, J. T., Fiston-Lavier, A., Teets, N. M., Yee, M., Johnston, S. J., Bustamante, C. D., Lee, R. E. & Denlinger, D. L. (2014) Compact genome of the Antarctic midge is likely an adaptation to an extreme environment. Nature Communications, 5, 4611.

7. The International Glossina Genome Initiative. (2014) Genome sequence of the tsetse fly (Glossina morsitans): vector of African trypanosomiasis. Science, 344, 380-386.

8. Kobelkovaa, A., Goto, S. G., Peyton, J. T., Ikeno, T., Lee, R. E. & Denlinger, D. L. (2015) Continuous activity and no cycling of clock genes in the Antarctic midge during the polar summer. Journal of Insect Physiology, 81, 90-96.

Fields of Study

Major Field: Evolution, Ecology and Organismal Biology


Table of Contents

Abstract

Dedication

Acknowledgments

Vita

Table of Contents

List of Tables

List of Figures

Chapter 1: Review of Mechanisms of Insect Cold Tolerance

1.1 Nature of Cold Stress

1.2 On Dealing with Ice

1.3 Preparing for the Cold

1.4 Rapid Cold Hardening

1.5 Cryoprotective Dehydration

1.6 Sarcophaga bullata

1.7 Belgica antarctica

1.8 Specific Goals

Chapter 2: Compact Genome of the Antarctic Midge is Likely an Adaptation to an Extreme Environment

Citation

Abstract

2.1 Introduction

2.2 Results

2.3 Discussion

2.4 Methods

2.4.1 Biological Sample

2.4.2 DNA Library Preparation and Sequencing

2.4.3 PacBio Library Preparation and Sequencing

2.4.4 De Novo Genome Assembly

2.4.5 Repeat Annotation

2.4.6 Gene Annotation

2.4.7 Comparative Analyses

2.4.8 Functional Enrichment Analysis

2.4.9 Polymorphism Detection

2.4.10 Demography From a Single Genome

2.5 Acknowledgements

2.6 Tables

2.7 Figures

Chapter 3: Gene Expression Changes Governing Extreme Dehydration Tolerance in an Antarctic Insect

Citation

Abstract

3.1 Introduction

3.2 Results and Discussion

3.3 Methods

3.4 Tables

3.5 Figures

Chapter 4: Sequencing the Genome of the Flesh Fly, Sarcophaga bullata, Provides a Platform for Physiological Research

Abstract

4.1 Introduction

4.2 Methods

4.2.1 Source of

4.2.2 Sequencing

4.2.3 Quality Control

4.2.4 Genome Assembly

4.2.5 Annotation

4.2.6 Comparative Genomics

4.3 Results and Discussion

4.3.1 Sequencing and Assembly

4.3.2 Annotation

4.3.3

4.3.4 Library Specific Expression

4.4 Conclusions

4.5 Tables

Chapter 5: Combined Transcriptomic and Metabolomic Approach Uncovers Molecular Mechanisms of Cold Tolerance in a Temperate Flesh Fly

Citation

Abstract

5.1 Introduction

5.2 Methods

5.3 Results and Discussion

5.4 Conclusions

5.5 Acknowledgements

5.6 Tables

5.7 Figures

Appendix A: Supplement to Chapter 2

A.1 Supplementary Methods

A.2 Supplemental Tables

A.3 Supplemental Figures

Appendix B: Supplement to Chapter 3

B.1 Supplemental Materials and Methods

B.2 Supplemental Tables

B.3 Supplemental Figures

Appendix C: Supplement to Chapter 4

C.1 Supplemental Tables

Appendix D: Supplement to Chapter 5

D.1 Supplemental Tables

References

List of Tables

Table 2.1 Genome assembly and annotation summary

Table 2.2 Repeat content in B. antarctica

Table 3.1 GO enrichment analysis

Table 3.2 GSA revealing enriched KEGG pathways during desiccation

Table 4.1 Comparison of repeat masking

Table 4.2 Summary of genic content

Table 4.3 Comparison of exon and intron content

Table 5.1 Number of differentially expressed probes

Table 5.2 DAVID enrichment analysis for the Control vs. CS+2R comparison

Table 5.3 DAVID enrichment analysis for the Control vs. RCH+CS+2R comparison

Table 5.4 GSA of genes involved in recovery from cold shock

Table 5.5 GSA of genes enriched in the C vs. RCH+CS+2R comparison

Table 5.6 Drosophila cold stress genes expressed during recovery in S. bullata

Table 5.7 Metabolic pathways modulated by RCH

Table 5.8 Coordinated pathways enrichment during recovery from cold shock

Table A.1 Flow cytometry estimates for species

Table A.2 Comparison of different assemblers

Table A.3 Nested TEs

Table A.4 Distribution of unique TEs

Table A.5 Distance to closest gene from annotated TE

Table A.6 CEGMA analysis of the B. antarctica assembled genome

Table A.7 CEGMA analysis of five

Table A.8 GC content in the five Dipteran species used for comparative analyses

Table A.9 Presence of piRNA pathway genes in B. antarctica assembly

Table A.10 Coding region length summaries for each species

Table A.11 Coding region length summaries for loci with one-to-one orthologs

Table A.12 Intron length summaries for each species

Table B.1 GO enrichment analysis of cryoprotective dehydration

Table B.2 GSA revealing enriched pathways during cryoprotective dehydration

Table B.3 GO enrichment analysis of CD vs. D comparison

Table B.4 Summary of read statistics from Illumina sequencing

Table B.5 Primers used for qPCR validation

Table C.1 Summary of Illumina read filtering

Table C.2 Reapr Summary

Table C.3 Assembly statistics

Table C.4 Summary of repeat elements

Table C.5 Summary of level 2 molecular function Gene Ontology terms

Table C.6 Summary of level 2 cellular process Gene Ontology terms

Table C.7 Summary of level 2 cellular component Gene Ontology terms

Table C.8 Gene Ontology terms enriched in S. bullata v. dipteran comparison

Table C.9 Library specific expression

Table D.1 Differentially expressed probes in Control vs. CS+2R comparison

Table D.2 Differentially expressed probes in Control vs. RCH+CS+2R comparison

Table D.3 Expression of heat stress genes during recovery from cold shock

Table D.4 Expression of hypoxia genes during recovery from cold shock

Table D.5 Expression of hyperoxia genes during recovery from cold shock

Table D.6 Expression of oxidative stress genes during recovery from cold shock

Table D.7 Metabolite content in response to RCH and cold shock

Table D.8 Metabolite pathway enrichment analysis

List of Figures

Figure 2.1 Larval and adult stages of B. antarctica

Figure 2.2 Distribution of genome annotations among five Diptera

Figure 2.3 Orthologous gene clusters

Figure 2.4 Demographic history inferred from a single B. antarctica genome

Figure 3.1 Expression summary

Figure 3.2 Pathway diagrams illustrating autophagy-related genes

Figure 3.3 Similarity between gene expression profiles

Figure 5.2 Effect of rapid cold-hardening

Figure 5.3 Expression heat map

Figure 5.4 Multivariate analysis of expression

Figure 5.5 Metabolomic response to rapid cold-hardening and cold shock

Figure 5.6 Heat map and multivariate analysis of metabolomics

Figure A.2 Distribution of 17-mers from raw sequence data

Figure A.3 Distribution of scaffold lengths from assembled genome

Figure A.4 Coverage histogram of bases in assembled genome

Figure A.5 cox1 relationship among B. antarctica individuals

Figure A.6 Codon usage bias estimates for each species

Figure A.7 Intron size distribution comparison

Figure A.8 Demographic history inferred from a single B. antarctica genome

Figure A.9 Demographic history inferred from a single B. antarctica genome

Figure B.1 Results of qPCR validation experiment

Figure B.2 Metabolomic response to desiccation and cryoprotective dehydration

Figure B.3 Hierarchical clustering of the metabolomics dataset

Chapter 1: Review of Mechanisms of Insect Cold Tolerance

1.1 Nature of Cold Stress

Because of their high surface area to volume ratio, insects are particularly susceptible to thermal stress. Low temperatures have many detrimental effects on insects, including altering enzyme kinetics, interfering with mobility (Kelty and Lee, 1999; Powell and Bale, 2006; Kelty et al., 1996), decreasing fecundity (Rinehart et al., 2000), decreasing longevity (Powell and Bale, 2005), and undermining spatial conditioning (Kim et al., 2005). An insect’s ability to overcome or mitigate these challenges determines whether it survives, and its capacity for cold tolerance thus limits its range and guides its evolution. For insects to overwinter in temperate and more extreme climates, mechanisms have evolved that enable them to survive these stresses.

There are different intensities of cold exposure. When insects are exposed to relatively mild cold, regulation of cellular ion balance is compromised (Kostal et al., 2006). This interferes with neuromuscular function (Staszak and Mutchmor, 1973), leading to a reversible cessation of movement termed chill coma (Kostal et al., 2004; MacMillan and Sinclair, 2011). Although chill coma is reversible, in many cases it may represent an ecological end point because the insect cannot seek a more favorable microclimate or evade predators until temperatures have risen and homeostasis is reestablished. If exposure is extended, ion balance disruption, depletion of ATP (Dollo et al., 2010), and buildup of toxic metabolic end products (Rojas and Leopold, 1996) lead to indirect chill injury.

When insects are subjected to harsher temperatures, direct chill injury occurs, which is characterized by the denaturation of proteins (Feder and Hofmann, 1999), restructuring of the cytoskeleton (Kim et al., 2006), and, most importantly, membrane damage caused by membrane phase change (Kostal, 2010).

It may seem odd not to define the intensities of cold shock simply by temperature ranges, but the intensity of a cold shock is inherently relative: what is deadly cold to one insect is business as usual for another. For example, -5°C will kill Drosophila melanogaster (Kostal et al., 2011), but a Himalayan midge will remain mobile (Kohshima, 1984). Time is also an important consideration. In Sarcophaga crassipalpis, a 25 day exposure to 0°C is lethal (Chen and Denlinger, 1992), but that same temperature can be protective if a two hour exposure precedes a cold shock (Lee et al., 1987).

1.2 On Dealing with Ice

At low temperatures, ice formation is a serious risk. The mechanical stress of ice crystal formation can perforate cellular membranes, resulting in massive damage. A small volume of water does not freeze immediately at its melting point; instead, it enters a supercooled state. The supercooling point (SCP) is the temperature at which ice will spontaneously form (Lee, 1989). This can be substantially lower than the melting temperature. The smaller the body of water, the higher its capacity for supercooling (Angell, 1982). Because insects act as small bodies of water, they can supercool extensively.

There are two main strategies insects employ when dealing with ice formation: freeze tolerance and freeze avoidance (Salt, 1961). While freeze tolerant insects survive internal ice formation, the more common freeze avoiding insects do not (Bale, 1989). The two strategies handle the threat of ice differently. Freeze avoiding insects minimize the chance of ice formation by having a low SCP, while freeze tolerant insects maintain a higher SCP in order to avoid uncontrolled ice formation (Zachariassen, 1985; Mazur, 1984).

Insects often have an SCP above that of a similarly sized pool of water. This is due to the presence of ice nucleators, which promote freezing at higher temperatures (Zachariassen and Hammel, 1976). By reducing the number of ice nucleators, freeze avoiding insects can lower their SCP (Neven et al., 1986). By producing ice nucleating proteins and transporting them into the extracellular space, freeze tolerant insects can ensure that ice forms at relatively high temperatures and in the extracellular space (Zachariassen et al., 2008). Freeze avoiding insects may also produce antifreeze proteins that lower the SCP by inhibiting ice nucleation (Duman, 2002).

Cryoprotectants are a large, loosely defined class of small, stable, and non-toxic molecules that enhance cold and freezing tolerance (Lee, 2010). Commonly found in extremely high concentrations, they work largely through their colligative properties (Salt, 1961; Zachariassen, 1985). The molecules most commonly included in this class are polyhydric alcohols (glycerol, mannitol, and sorbitol) and sugars, but others such as amino acids (proline and alanine) are sometimes included (Lee, 2010; Lee, 1991). In freeze avoiding insects, the accumulation of cryoprotectants serves to increase the insect’s supercooling capacity.
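The colligative effect can be illustrated with the standard cryoscopic relation for dilute aqueous solutions, in which the melting point falls by about 1.86°C for each osmole of solute per kilogram of water. The following Python sketch is an illustration only: the function names and the example concentrations are assumptions introduced here rather than values from the cited studies, and ideal behavior is assumed even though it breaks down at the very high concentrations found in some overwintering insects.

```python
# Cryoscopic constant of water: degrees C of melting point depression
# per osmole of dissolved particles per kilogram of water.
KF_WATER = 1.86


def melting_point_c(osmolality: float) -> float:
    """Ideal-solution estimate of the melting point (degrees C) of an aqueous fluid."""
    if osmolality < 0:
        raise ValueError("osmolality must be non-negative")
    return -KF_WATER * osmolality


def solute_needed(baseline_osmolality: float, target_mp_c: float) -> float:
    """Extra non-dissociating solute (mol per kg water) needed to reach target_mp_c."""
    required = -target_mp_c / KF_WATER
    return max(0.0, required - baseline_osmolality)


# Hypothetical example: a body fluid at 0.5 Osm/kg and a target melting point of -5.58 C.
extra_glycerol = solute_needed(0.5, -5.58)
print(round(extra_glycerol, 2))
```

Under these assumptions, a fluid at 0.5 Osm/kg would need roughly 2.5 mol of added glycerol per kilogram of water to bring its melting point to -5.58°C.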

In freeze tolerant insects there is a close relationship between osmotic stress and cold stress because the insect is often not frozen solid (Lee and Costanzo, 1998). Ice forms extracellularly due to the presence of ice nucleating proteins (Zachariassen et al., 2008). As the temperature is lowered, ice crystals grow, incorporating more water molecules. This concentrates the cellular fluid as water moves from the cellular fluid into the extracellular fluid and eventually into the ice crystal. As the cellular fluid becomes more concentrated, its melting point falls, limiting intracellular ice formation (Lee and Costanzo, 1998; Zachariassen, 1991). If cellular concentrations become too high, the insect will succumb to osmotic stress. Accumulation of cryoprotectants in freeze tolerant insects lowers the amount of ice formed at any temperature below the supercooling point, thus reducing the osmotic stress experienced (Lee, 2011; Zachariassen, 1979).

While traditionally two strategies have been recognized for dealing with ice formation (Salt, 1961), cryoprotective dehydration has recently emerged as a third (Elnitsky et al., 2008; Holmstrup et al., 2002). Because supercooled water has a higher vapor pressure than ice, when insects are in close proximity to a frozen substrate, a vapor pressure gradient is established that draws water from the insect into the ice (Holmstrup, 2011). In invertebrates with permeable cuticles, dehydration continues until the vapor pressure of the body fluids equals the vapor pressure of the ice (Holmstrup et al., 2002). At this point the melting point of the body fluids equals the ambient temperature and will track it as the temperature is lowered (Holmstrup, 2011). Because of the amount of water lost, insects that undergo cryoprotective dehydration need substantial dehydration tolerance. For example, Belgica antarctica, the only insect in which cryoprotective dehydration has been described, can survive the loss of 70% of its water content (Benoit et al., 2007).
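The equilibrium described above can be sketched numerically. Assuming ideal colligative behavior (1.86°C of melting point depression per osmole per kilogram of water), no solute synthesis, and no osmotically inactive water, the fraction of body water remaining at equilibrium is the ratio of the initial osmolality to the osmolality required at the ambient temperature. The function names and the starting osmolality below are hypothetical and are not taken from the cited studies.

```python
# Cryoscopic constant of water (degrees C per osmole per kg of water).
KF_WATER = 1.86


def equilibrium_osmolality(ambient_c: float) -> float:
    """Osmolality at which the melting point of the body fluid equals ambient_c."""
    if ambient_c > 0:
        raise ValueError("ambient temperature must be at or below 0 C")
    return -ambient_c / KF_WATER


def water_fraction_remaining(initial_osmolality: float, ambient_c: float) -> float:
    """Fraction of initial body water left once vapor pressure equilibrium is reached.

    Solute content is held constant, so osmolality scales inversely with water mass.
    """
    target = equilibrium_osmolality(ambient_c)
    if target <= initial_osmolality:
        return 1.0  # fluid already melts at or below ambient; no net water loss needed
    return initial_osmolality / target


# Hypothetical example: 0.5 Osm/kg body fluid held over ice at -3 C.
remaining = water_fraction_remaining(0.5, -3.0)
print(round(remaining, 2))
```

With these hypothetical inputs, about 31% of the initial water remains at equilibrium, which shows why substantial dehydration tolerance is a prerequisite for this strategy.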

1.3 Preparing for the Cold

Cold hardening is the ability to mitigate the negative effects of cold stress with preparation. Because of how important cold stress is to survival, insects have evolved mechanisms to cold harden and increase survivability including seasonal hardening, diapause, and rapid cold hardening.

As fall transitions into winter, temperatures decline. Insects can sense this shift and, over a period of weeks, mount a defense called seasonal hardening (Salt, 1961). By synthesizing cryoprotectants (Salt, 1961), restructuring membranes (Overgaard et al., 2006), and producing stress proteins (Vesala, 2012), insects can significantly improve their cold survival. For example, Sarcophaga crassipalpis reared at 15°C can survive a 2 h exposure to -10°C, while those reared at 30°C cannot withstand a 1 h exposure to -10°C (Chen et al., 1987 #2).

In addition to seasonal hardening, many insects overwinter in a dormant state called diapause. Diapause is a hormonally controlled, typically optional, developmental stage that is typified by significantly reduced metabolism and increased stress tolerance (Denlinger, 1991). Many insects use day length as an accurate predictor of the seasonal trials ahead (Zaslavski, 1988). This token stimulus enables them to prepare by sequestering additional reserves, producing stress proteins, accumulating cryoprotectants, and finding a good place to overwinter before harsh conditions prevail (Denlinger, 2002). Diapause is associated with changes in the expression of many genes, but the roles of many of these genes are poorly understood (Ragland et al., 2010).

In contrast to preparations made on the seasonal time scale, insects can increase cold tolerance on a time scale of minutes to hours in a process called rapid cold hardening (RCH). For example, no pharate adults of Sarcophaga bullata survive a 2 hour exposure to -10°C; however, when that same challenge is preceded by 2 hours at 0°C, 91% survive through to adult emergence (Lee et al., 1987). RCH is associated with membrane restructuring (Michaud and Denlinger, 2007), shifts in ion homeostasis (Armstrong et al., 2012), and inhibition of apoptotic cell death (Yi et al., 2007). RCH is thought to be involved in tolerating the diurnal cycle of temperature (Kelty and Lee, 2001).

1.4 Rapid Cold Hardening

As mentioned before, RCH is an insect’s ability to prepare for cold stress when that stress is preceded by exposure to an intermediate temperature for minutes to hours (Lee et al., 1987). In contrast to seasonal cold hardening and diapause, a benefit from RCH can be garnered in as little as 10 minutes (Lee et al., 1987). The benefits of RCH are not limited to survival. RCH preserves reflexive and flight behaviors, enabling escape from predators (Kelty et al., 1996; Larsen and Lee, 1994). It also maintains courtship, mating, and fecundity, thus ensuring many offspring (Powell and Bale, 2005; Rinehart et al., 2000; Shreve et al., 2004). The benefits of RCH can also be seen at the cellular level. RCH inhibits apoptotic cell death, improving cell viability by as much as 38% (Yi and Lee, 2007).

Early experiments on Sarcophaga crassipalpis suggested that accumulation of cryoprotectants, including glycerol, is an important component of RCH (Lee et al., 1987; Chen et al., 1987). The central role of cryoprotectants has been called into question by experiments on the sister species Sarcophaga bullata, which has a similar RCH response but in which no cryoprotectant synthesis was found (Teets et al., 2012). The role of the brain is also unclear. The fact that isolated tissues can rapidly cold harden ex vivo implies that the brain is not necessary for the RCH response (Teets et al., 2013; Teets et al., 2008), but the brain has been shown to enhance the RCH benefit (Yoder et al., 2006), and capa neuropeptides have been shown to contribute to the cold response (Terhzaz et al., 2015).

Evidence is also mixed about the role that lipid bilayer remodeling may play in RCH. In homeoviscous adaptation, the composition of the lipid bilayer is adjusted in order to maintain proper fluidity (Sinensky, 1974). Shortening the fatty acid chains increases fluidity at low temperatures. Increasing the proportion of unsaturated fatty acids increases fluidity by changing how efficiently the bilayer can pack (Kostal, 2010). Rapid production of oleic acid at the expense of almost all other fatty acids is accompanied by an increase in fluidity of the lipid bilayer during RCH in S. crassipalpis (Michaud and Denlinger, 2006; Lee et al., 2006). This would help prevent the lipid bilayer from transitioning into a gel state during cold shock and help maintain homeostasis. Similar results have been seen in Drosophila melanogaster, with linoleic acid increasing in abundance instead of oleic acid (Overgaard et al., 2005). In selected lines of D. melanogaster, however, RCH is observed without changes in lipid composition (MacMillan, 2009).

Ion homeostasis and signaling are emerging as potential players in RCH. Cold shock disrupts potassium homeostasis in the brain of D. melanogaster. Somewhat counterintuitively, RCH pretreatment results in a larger perturbation that is cleared at a higher rate (Armstrong et al., 2012). Recent evidence suggests that calcium signaling is critical to triggering RCH. During RCH, cellular calcium concentrations rise in isolated tissues (Teets et al., 2013; Teets et al., 2008). Removing calcium from the test media or pharmacologically inhibiting the calcium/calmodulin/CaMKII signaling axis diminishes the RCH benefit.

Two other signaling pathways have been implicated in RCH: apoptosis signaling and mitogen-activated protein kinase (MAPK) signaling. As mentioned earlier, RCH inhibits apoptosis in cold stressed cells. This seems to be accomplished by accumulation of Bcl-3, an anti-apoptosis protein, and reduction of caspases, which are responsible for triggering the mass destruction of proteins (Yi et al., 2007; Yi and Lee, 2011). In S. crassipalpis, p38 MAPK is activated within 10 minutes of exposure to 0°C and deactivated within 3 hours of returning to 25°C (Fujiwara and Denlinger, 2007). This is intriguing because of how closely it mirrors the RCH phenotype: ten minutes is the minimum time needed to initiate RCH, 0°C is the most effective temperature for instigating the RCH response in that species (Chen et al., 1987), and the RCH benefit is lost after 2 hours (Coulson and Bale, 1990).

It has been argued that RCH, as outlined here, is not ecologically relevant because such sharp changes in temperature are unlikely to be seen in nature. Interestingly, the rate of chilling does not seem to be the critical factor for triggering RCH; what appears to matter is the amount of time the organism spends inside a window of temperatures (Chen and Denlinger, 1987). Furthermore, RCH has been demonstrated in more ecologically relevant situations such as day-night like thermal cycles (Kelty and Lee, 2001) and exposures in the field (Kelty, 2007). RCH has also been observed in diverse taxa including Diptera, Coleoptera, Lepidoptera, Thysanoptera, Hemiptera, and Orthoptera (Lee et al., 1987; McDonald et al., 1997; Larsen and Lee, 1994). Taken together, this points to an evolutionarily important and ecologically relevant process that warrants further investigation.
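The idea that time spent within a temperature window, rather than cooling rate, governs induction can be made concrete with a small calculation on a logged temperature trace. The Python sketch below is illustrative only: the function name, the sampling interval, the trace, and the bounds of the inducing window are all hypothetical and are not taken from the cited studies.

```python
from typing import Sequence


def minutes_in_window(temps_c: Sequence[float], step_min: float,
                      low_c: float, high_c: float) -> float:
    """Total time (minutes) that evenly spaced readings fall inside [low_c, high_c]."""
    if low_c > high_c:
        raise ValueError("low_c must not exceed high_c")
    readings_inside = sum(1 for t in temps_c if low_c <= t <= high_c)
    return readings_inside * step_min


# Hypothetical trace sampled every 30 minutes during a slow overnight cooling.
trace_c = [20.0, 10.0, 5.0, 0.0, -2.0, 3.0, 12.0]

# Hypothetical inducing window of -2 to 6 degrees C.
exposure_min = minutes_in_window(trace_c, step_min=30.0, low_c=-2.0, high_c=6.0)
print(exposure_min)
```

Two traces with very different cooling rates can yield the same exposure time by this measure, which is the sense in which the rate of chilling need not be the critical factor.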

1.5 Cryoprotective Dehydration

The literature on insect cryoprotective dehydration (CD) is understandably sparse. Even though it may be a widespread strategy, CD has only been described in 16 species, only one of which is a true insect (Holmstrup, 2014). It has been described in such taxonomically diverse groups as Nematoda (Wharton et al., 2003), Tardigrada (Holmstrup, 2014), Oligochaeta (Holmstrup, 1994), and Insecta (Elnitsky et al., 2008). The physiological mechanisms of CD are just starting to be investigated.

Cryoprotective dehydration involves loss of a large proportion of an invertebrate’s water. This dehydration stress results in the denaturation of proteins. Heat shock proteins can help refold denatured proteins and are upregulated in response to CD in some species (Clark et al., 2009; Sorensen et al., 2010; Sorensen and Holmstrup, 2013). Trehalose metabolism is emerging as a potential driver of CD. In Megaphorura arctica, a springtail, CD is accompanied by upregulation of trehalose-6-phosphate synthase (Clark et al., 2009), which correlates well with trehalose accumulation (Sorensen and Holmstrup, 2013; Bahrndorff et al., 2007). The role of membrane restructuring in CD is uncertain. Palmitic acid accumulation at the expense of linolenic acid, as well as an overall decrease in the average degree of unsaturation and in the ratio of unsaturated to saturated fatty acids (UFA/SFA), was observed in M. arctica (Bahrndorff et al., 2007). These decreases do not conform to the predictions of homeoviscous adaptation theory. This may point to the involvement of another molecule, such as cholesterol, in modulating membrane fluidity. Given that CD has been demonstrated in such a diverse group of taxa, more study is needed to understand its underpinnings and to develop a general understanding of CD.
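The two membrane indices mentioned above can be computed directly from a fatty acid profile. The Python sketch below assumes a profile expressed in mol% and uses made-up compositions purely for illustration; only the double bond counts of the named fatty acids are established facts, and the function name is introduced here.

```python
from typing import Dict, Tuple

# Double bonds per molecule for some common fatty acids.
DOUBLE_BONDS: Dict[str, int] = {
    "palmitic": 0,   # 16:0
    "stearic": 0,    # 18:0
    "oleic": 1,      # 18:1
    "linoleic": 2,   # 18:2
    "linolenic": 3,  # 18:3
}


def unsaturation_indices(mol_percent: Dict[str, float]) -> Tuple[float, float]:
    """Return (UFA/SFA ratio, mean number of double bonds per fatty acid).

    Assumes the profile contains at least one saturated fatty acid.
    """
    total = sum(mol_percent.values())
    unsaturated = sum(v for k, v in mol_percent.items() if DOUBLE_BONDS[k] > 0)
    saturated = total - unsaturated
    mean_double_bonds = sum(DOUBLE_BONDS[k] * v for k, v in mol_percent.items()) / total
    return unsaturated / saturated, mean_double_bonds


# Hypothetical profiles: palmitic acid rises at the expense of linolenic acid.
before = {"palmitic": 20.0, "stearic": 10.0, "oleic": 30.0, "linoleic": 20.0, "linolenic": 20.0}
after = {"palmitic": 30.0, "stearic": 10.0, "oleic": 30.0, "linoleic": 20.0, "linolenic": 10.0}

print(unsaturation_indices(before))
print(unsaturation_indices(after))
```

In this made-up example, replacing linolenic acid with palmitic acid lowers both indices together, the same direction of change reported for M. arctica.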

1.6 Sarcophaga bullata

Sarcophaga bullata Parker (Diptera: Sarcophagidae) is a flesh fly whose range extends across the United States (Byrd and Castner, 2009). Pupae have a mass of about 120 mg when fed ad libitum. The complete life cycle of S. bullata takes about 28 days when reared at 25°C. It is a holometabolous insect with 3 larval instars (Denlinger and Zdarek, 1994). S. bullata is a freeze avoiding insect that overwinters in a pupal diapause (Henrich and Denlinger, 1982).

S. bullata and its sister species, S. crassipalpis, have been models for cold tolerance for more than 40 years (Denlinger, 1972). There is substantial literature devoted to understanding their seasonal acclimation (Chen et al., 1987 #2), diapause (Adedokun and Denlinger, 1984; Henrich and Denlinger, 1982; Rinehart et al., 2007), rapid cold hardening (Lee et al., 1987; Chen et al., 1987; Michaud and Denlinger, 2006), and acute cold stress (Joplin et al., 1990; Chen et al., 1990; Yocum et al., 1994).

In S. bullata, the offspring of a mother that went through pupal diapause will not enter diapause even if strongly diapause-inducing environmental conditions prevail. Termed the maternal effect, this has the evolutionary advantage of eliminating the possibility that the first generation in early spring will enter diapause because day lengths have not yet surpassed the critical photoperiod (Henrich and Denlinger, 1982). The hypothesis that the maternal effect is mediated by epigenetics is currently being investigated. If epigenetic involvement is confirmed, S. bullata would become an important insect model organism for the study of epigenetics.

S. bullata can be parasitized by wasps of the genus Nasonia (Desjardins et al., 2010). N. vitripennis is an important model in its own right (Werren et al., 2010) and, when combined with S. bullata, will form a powerful platform for studying host-parasitoid interactions.

Sequencing the genome of an organism is an expensive and time consuming endeavor, but with the advent of next generation sequencing, it is possible for a single lab or a small group of allied labs to undertake the task. Because S. bullata is an important model organism in so many ways, it is worth the cost and effort to sequence its genome. In so doing, we develop a platform that advances research on cold tolerance, epigenetics, and parasitology.

1.7 Belgica antarctica

Belgica antarctica Jacobs (Diptera: Chironomidae) is the southernmost free-living insect (Usher and Edwards, 1984). Its range includes the west coast of the Antarctic Peninsula and adjacent islands, stretching from 61° S in the South Shetland Islands to 65° 27' S through the Gerlache Strait (Sugg et al., 1983). This freeze-tolerant insect can overwinter in any of its four larval instars (Kawarasaki et al., 2014). Larvae of this holometabolous insect reach a length of about 7 mm (Usher and Edwards, 1984).

Relatively little of the insect’s two year life cycle is spent as an adult. Alive for about a week, the adults emerge, mate, lay eggs and die (Peckham, 1971).

B. antarctica is the only free living insect endemic to Antarctica and is thus uniquely evolved to withstand the harsh conditions common there. As such, it provides a powerful model for examining many environmental stressors. The most conspicuous stressor, cold, has been studied for decades in B. antarctica (Baust and Edwards, 1979).

Dehydration stress tolerance is also particularly important in Antarctica because most water is inaccessible as ice for most of the year (Kennedy, 1993). B. antarctica is also one of the few organisms, and the only true insect, shown to undergo cryoprotective dehydration (Elnitsky et al., 2008; Holmstrup, 2014). Other stressors, such as oxidative stress (Lopez-Martinez et al., 2008), heat shock (Michaud et al., 2008), and exposure to fluids of extreme salinity (Elnitsky et al., 2009), have been explored. Because of its value as a model for stress and polar biology, it is worth the cost and effort to sequence the genome of Belgica antarctica.

1.8 Specific Goals

The remainder of this dissertation is divided into four chapters, each of which addresses a different goal. First, in order to further develop Belgica antarctica as a model for stress and polar biology, I present its annotated and characterized genome. Second, in order to better understand the physiological mechanisms of dehydration and cryoprotective dehydration, I explore the transcriptome and metabolome of B. antarctica with RNA sequencing and gas chromatography coupled with mass spectrometry (GCMS). Next, in order to better understand the physiological underpinnings of cold shock, recovery from cold shock, and rapid cold hardening, I explore the transcriptome and metabolome of Sarcophaga bullata using microarray and GCMS. Finally, in order to further develop S. bullata as a model for stress tolerance, diapause, and parasitology, I present the annotated and characterized genome of S. bullata.


CHAPTER 2: Compact Genome of the Antarctic Midge is Likely an Adaptation to an Extreme Environment

Citation

This chapter has previously been published. The work was done in a collaborative setting and represents the work of several people. I modeled repetitive elements and annotated the genome. I also performed the Gene Ontology analysis and helped prepare the manuscript.

Kelley, J. L., Peyton, J. T., Fiston-Lavier, A., Teets, N. M., Yee, M., Johnston, S. J., Bustamante, C. D., Lee, R. E. & Denlinger, D. L. Compact genome of the Antarctic midge is likely an adaptation to an extreme environment. Nat Commun 4611, 1-8, (2014).

Abstract

The midge, Belgica antarctica, is the only insect endemic to Antarctica, and thus it offers a powerful model for probing responses to extreme temperatures, freeze tolerance, dehydration, osmotic stress, ultraviolet radiation and other forms of environmental stress. Here we present the first genome assembly of an extremophile, the first dipteran in the family Chironomidae, and the first Antarctic eukaryote to be sequenced. At 99 megabases, B. antarctica has the smallest insect genome sequenced thus far. Although it has a similar number of genes as other Diptera, the midge genome has very low repeat density and a reduction in intron length. Environmental extremes appear to constrain genome architecture, not gene content. The few transposable elements present are mainly ancient, inactive retroelements. An abundance of genes associated with development, regulation of metabolism and responses to external stimuli may reflect adaptations for surviving in this harsh environment.


2.1 Introduction

Loss of the land bridge between South America and the Antarctic Peninsula isolated the southernmost continent 33 million years ago (Livermore et al., 2004), yielding a cold, desert environment inhospitable to most forms of terrestrial life. Although the surrounding ocean nurtures an abundance of marine life, and offshore islands offer summer breeding grounds for birds and seals, few animals are found year-round in Antarctica’s terrestrial environment. Insects, the dominant life form on most continents, are represented by a single endemic Antarctic species, a wingless midge, Belgica antarctica (Diptera: Chironomidae) (Convey and Block, 1996; Sugg and Baust, 1983), a species first noted by a naturalist aboard the S.Y. Belgica, a Belgian exploratory ship that plied the waters off the Antarctic Peninsula at the end of the 19th century (Peckham, 1971). In its patchy habitat along the Antarctic Peninsula, B. antarctica is subjected to a range of environmental onslaughts including temperature extremes, periodic desiccation, exposure to both fresh water ice melt and high-salinity sea water, intense ultraviolet exposure, high nitrogen generated from penguin rookeries and elephant seal wallows and high winds (Teets and Denlinger, 2014; Elnitsky et al., 2009; Lopez-Martinez et al., 2008). The adults, like those of many other species living on wind-swept islands, are apterous (wingless). The larvae (Fig. 2.1a), encased in ice for most of the year, require 2 years to complete their development and then pupate and emerge as adults (Fig. 2.1b) at the beginning of their third austral summer. The apterous adults crawl over surfaces of rocks and other substrates, mate, lay eggs and die within 7–10 days after emergence.


Unusual adaptations, including winglessness, freeze tolerance, severe desiccation tolerance and constitutive expression of heat shock proteins (Rinehart et al., 2006), allow this fly to survive in the inhospitable climate of rocky outcrops along the Antarctic Peninsula. The genome of this fly can be expected to offer insights into genomic processes and genome evolution essential to its survival. At the molecular level, select genes that have been examined in B. antarctica include those encoding heat shock proteins (Rinehart et al., 2009), the antioxidant enzymes catalase and superoxide dismutase (Lopez-Martinez et al., 2008), a collection of genes that responded to changes in hydration state (Lopez-Martinez et al., 2008; Teets et al., 2012) and an aquaporin (Goto et al., 2011). It is clear from these molecular studies that this species displays some unusual patterns of gene expression. For example, unlike in most organisms, the messenger RNAs encoding heat shock proteins, catalase and superoxide dismutase are expressed at high levels all the time, not just in response to a sudden stress. Interestingly, there are also Antarctic fish species that have constitutive expression of heat shock protein 70 (Buckley et al., 2004; Hofmann et al., 2000). The novelty of these responses suggests there are unique genomic adaptations to cope with extreme environments.

The genome we present for the extremophile B. antarctica is the first for a dipteran in the family Chironomidae; it consists of 99 megabase pairs assembled using over 100-fold depth coverage of the genome. It is the smallest insect genome discovered thus far. This unusually small genome has low repeat content and a general lack of transposable elements (TEs), which are mainly limited to retroelements. The gene content is similar to other Diptera; however, intron length and repeat elements are greatly reduced. Genes that are abundant compared with the related dipteran Anopheles gambiae are associated with development, regulation of metabolism and responses to external stimuli. The genome provides a foundation for studying extremophile biology and insect genome evolution.

2.2 Results

The genome of B. antarctica is the smallest yet reported for an insect. The estimate of total genome size based on flow cytometry is 1C=99.25±0.4 megabase pairs (Mbp) for the female and 1C=98.4±0.1 Mbp for the male (Supplementary Methods; Supplementary Fig. 1; Supplementary Table 1). On the basis of the raw sequence reads, we estimate the size of the B. antarctica genome to be >89.5 Mbp but <105 Mbp (Supplementary Fig. 2). Previous cytological preparations of polytene chromosomes from salivary glands indicate B. antarctica has three linkage groups (2n=6) (Atchley and Davis, 1979). We used a single larva of B. antarctica of unknown sex from Cormorant Island, near Palmer Station, Antarctica for the reference genome, using Illumina sequencing technology and Velvet de novo (Zerbino and Birney, 2008) for assembly. Several assemblers were compared (Supplementary Table 2). Paired-end reads from RNA-seq data (Teets et al., 2012) were used to improve the assembly by scaffolding contigs, resulting in 5,064 scaffolds. One Pacific Biosciences RSII SMRTbell library was generated to scaffold the assembly, which added minimal scaffolding owing to the limited amount of DNA in a single individual. The size of the assembled genome was 89.6 Mbp, including ambiguous bases; this represents over 90% of the total genome (Table 2.1). The assembly consists of 5,003 contigs >300 bp (Supplementary Fig. 3), with an N50 contig length of 98.2 kilobases (kb) and an average coverage of 177× (Supplementary Fig. 4). A total of 83.89 Mbp (93.7% of the assembled genome) was contained in 1,256 contigs >10 kb. The longest contig assembled was over 622 kb. These multiple lines of evidence, as well as the identification of nearly all (97%) core eukaryotic genes, suggest a high-quality assembled genome (see also Supplementary Methods). The genome of B. antarctica is smaller than even the tiny genomes reported for the body louse (104.7 Mbp) and Strepsiptera (108 Mbp) (Johnston et al., 2004; Kirkness et al., 2010). Previously published genome size estimates for three chironomid species, as well as new flow cytometry estimates for three additional members of the family Chironomidae (1C=108–118 Mbp), further suggest that the B. antarctica genome is small even for a chironomid (Supplementary Methods; Supplementary Table 1). The small genome found in this Antarctic midge does not conform to the coupling frequently reported between low temperatures and large genomes (Hessen et al., 2013), thus suggesting that alternative evolutionary forces are driving the small size of this genome.

The only other Diptera with genomes near 100 Mbp are Coboldia fuscipes and Psychoda cinerea, cosmopolitan species whose genome sizes may be constrained by early developmental traits (Schmidt-Ott et al., 2009).
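The N50 statistic quoted for the assembly can be illustrated with a minimal Python sketch. It assumes the common definition of N50 (the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length). The function name and the toy lengths are illustrative assumptions and are not taken from the assembly reported here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Assumes the usual definition: sort contigs from longest to shortest
    and report the length of the contig at which the cumulative sum
    first reaches half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Toy example (hypothetical lengths, not the B. antarctica contigs).
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)
```

With the toy lengths, the running total reaches half of the 300 bp total at the second-longest contig, so the sketch returns 80.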

Amplification, deletion and rearrangements of repeated DNA sequences may account for intraspecific variations in genome size (Biemont, 2008). In B. antarctica, the small size of the genome is a function of a paucity of repeats, including a reduction in the number of TEs, and of the reduced length of introns (Fig. 2.2). Analysis of the repeat content of the genome assembly revealed that repeat elements comprise only 0.49% of the assembled genome and 10% of the entire genome, assuming that the discrepancy between the assembled genome size and the flow cytometry estimate is due to repeat elements. Most of the repeat elements we identified were found in low-complexity sequences (Table 2.2; Supplementary Methods; Supplementary Data 1; Supplementary Tables 3–5). Using known TE libraries and examining raw reads, we estimate that only 0.016% of the genome failed to assemble due to TE insertions. Furthermore, no species-specific TEs were detected (Supplementary Methods). The B. antarctica genome has ~0.12% of the genome as TEs, a small proportion compared with Aedes aegypti (47%) (Nene et al., 2007), Anopheles gambiae (16%) (Sharakhova et al., 2007), Culex quinquefasciatus (29%) (Arensburger et al., 2010) and Drosophila melanogaster (20%) (Quesneville et al., 2005; Adams et al., 2000). In contrast to the above, the body louse Pediculus humanus humanus, similar to B. antarctica, has a small genome (1C=105 Mbp) associated with a low TE proportion (1.03% of genome) (Kirkness et al., 2010).
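The relationship between the two repeat percentages (0.49% of the assembly, about 10% of the whole genome) can be checked with a short calculation. The sketch below assumes a total genome size of 99 Mbp (the rounded flow cytometry estimate) and an assembled size of 89.6 Mbp, and treats all unassembled sequence as repeats, as the text does. The variable and function names are illustrative only.

```python
def whole_genome_repeat_fraction(assembled_mbp, total_mbp, repeat_fraction_in_assembly):
    """Fraction of the whole genome that is repetitive, assuming every
    base missing from the assembly is repeat sequence."""
    repeats_in_assembly = assembled_mbp * repeat_fraction_in_assembly
    unassembled = total_mbp - assembled_mbp
    return (repeats_in_assembly + unassembled) / total_mbp


# Values taken from the text; 99 Mbp is the rounded flow cytometry estimate.
fraction = whole_genome_repeat_fraction(
    assembled_mbp=89.6,
    total_mbp=99.0,
    repeat_fraction_in_assembly=0.0049,
)
percent = 100 * fraction  # a little under 10%
```

Under these assumptions the repeat content of the whole genome comes out at roughly 9.9%, in line with the 10% stated above; nearly all of it is the unassembled 9.4 Mbp rather than repeats present in the assembly.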

The TEs found in the B. antarctica genome were of multiple origins. The TEs represent 154 TE families from the three main TE orders (DNA elements, retroelements with long terminal repeat (LTR) and non-LTR retroelements). A total of 513 TE insertion locations were identified in the assembled genome (Table 2.2). Of those 513 TE insertions, 74 were nested with >1 TE insertion, while the remaining 439 clearly correspond to unique TE insertions. An additional 23 TE insertions were detected as absent from the assembly, as they were located at the flanking regions of the contigs. Most of the unique insertions are from retroelements. The reduced number of TEs in the genome was reflected in ribosomal genes. R1 and R2 non-LTR retroelements are present in nearly all arthropods and have been identified in the ribosomal DNA (rDNA) loci of nearly all lineages examined to date (ref. 26). However, based on reconstruction of the rDNA region, B. antarctica lacks both R1 and R2 non-LTR retroelements. All lines of evidence suggest that the TE insertions in B. antarctica are of ancient origin.

Approximately 19.4% (just under 19 Mbp) of the genome is protein coding in B. antarctica, and the genome contains 97% of the core eukaryotic genes (Supplementary Methods; Supplementary Tables 6 and 7). A large proportion of the genome is coding in comparison with Ae. aegypti (22 Mbp, 1.6% of the genome), An. gambiae (20.7 Mbp, 7.6%), C. quinquefasciatus (24.9 Mbp, 4.3%) and D. melanogaster (22.8 Mbp, 13.6%) (Fig. 2.2). A total of 13,517 protein-coding genes were annotated, underscoring that loss of gene function is not driving the small genome of B. antarctica. On the basis of cox1 sequence data, our sample clusters with other samples collected at the same location (Supplementary Fig. 5). Of the 13,517 gene models, 12,914 were supported by at least one RNA-seq read, and 9,011 were supported by at least 100 RNA-seq reads. Among the annotated genes, 8,575 genes have unique alignments to entries in the SwissProt database, and 10,557 genes have matches in the non-redundant database. Genes are clustered in regions of relatively high GC content (GC content of coding regions is 47%, compared with 37% for non-coding regions; Supplementary Table 8).

We compared the B. antarctica genome with those of four other dipteran species: three mosquitoes (Ae. aegypti, An. gambiae and C. quinquefasciatus) and D. melanogaster, the insect with the most completely annotated genome. Overall, B. antarctica has intermediate genome GC content but a lower coding GC% than any of the other four Diptera (Supplementary Table 8). Analysis of codon usage suggests that B. antarctica is not unusual compared with the other four dipteran species (Supplementary Fig. 6). Potential clusters of orthologous genes for comparative analyses were determined using annotations from the four dipteran species (Fig. 2.3). In an orthoMCL (Stockert and Roos, 2003) comparison between the five species, 4,910 genes were unique to B. antarctica, and 3,582 one-to-one orthologous genes between all five species were identified. Given the lack of TEs in the B. antarctica genome, we interrogated the PIWI-interacting RNA (piRNA) pathway genes (Supplementary Methods). The piRNA pathway serves to control transposon activity (Brennecke et al., 2007). We identify several key players in the piRNA pathway that are absent from the B. antarctica genome (Supplementary Table 9). Intron size distribution was compared with protein-coding length distribution, calculated for the one-to-one orthologs as well as all genes for each of the five dipteran species (Supplementary Tables 10–12; Supplementary Fig. 7). The comparison showed that reduction in intron length also contributed to the reduced size of this genome.

While the number of genes is consistent with other Diptera, the relative proportion in different ontologies varies between B. antarctica and An. gambiae. Gene ontology (GO) terms were assigned to the gene models for B. antarctica and the published genes of An. gambiae using Blast2GO (Conesa et al., 2005); this yielded 8,856 and 8,653 genes, respectively, with at least one GO term. A comparison of the gene sets using Fisher’s exact test revealed 162 GO terms positively enriched in B. antarctica and 20 terms negatively enriched (Supplementary Data 2). Many of the positively enriched terms fall into two broad categories, ‘development’ (38 terms) and ‘regulation of biological processes’ (50 terms).

The effective population size of B. antarctica has been decreasing over the past 10,000 years. Mapping reads onto the assembled genome allowed us to identify 195,860 of the 88,780,579 non-repeat-masked base pairs in the assembled genome as putative heterozygotes; this is an ~0.2% heterozygosity rate, which suggests an average of one heterozygous position for every 450 bp in this single individual (in contrast, D. melanogaster has an order of magnitude more heterozygosity, at ~2%). The result is similar (~185,000 single-nucleotide polymorphisms (SNPs) in 83.89 Mbp) when the analysis is limited to contigs greater than 10 kb. Using the D. melanogaster single-nucleotide mutation rate of 8.4 × 10⁻⁹ per site per generation (ref. 30), we estimate that the time-averaged effective population size of B. antarctica is ~60,000 diploid individuals. We used a pairwise sequentially Markovian coalescent analysis (Li and Durbin, 2011) to infer changes in population size over time from a single individual (Fig. 2.4). Assuming that the mutation rate estimated for B. antarctica is correct, the analysis suggests that the population reached its maximum size just prior to the last glacial maximum, suggesting that the midge populations declined markedly at the glacial maximum but survived in refugia during the period of extensive glaciation. The use of alternate mutation rates would, of course, shift the estimates either higher or lower (Supplementary Figs 8 and 9). Our work is consistent with current hypotheses on Antarctic arthropod dispersal, indicating that most endemic species established well before the last glacial maximum and survived in isolated refugia during glacial periods (Allegrucci et al., 2006). Moreover, low levels of genetic diversity suggest a small effective population size, implying that strong selective pressure drove the fixation of adaptive alleles underlying these unique features of the midge genome.
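The heterozygosity and effective population size figures in this paragraph can be reproduced approximately with a few lines of Python. The sketch assumes the standard neutral-equilibrium relationship θ = 4Nₑμ, with the per-site heterozygosity of the single diploid individual used as the estimate of θ, and takes the mutation rate as 8.4 × 10⁻⁹ per site per generation. The text does not state which estimator was actually used, so this is an illustration under those assumptions rather than a reconstruction of the analysis.

```python
# Counts reported in the text.
heterozygous_sites = 195_860
callable_sites = 88_780_579

# Assumed D. melanogaster mutation rate (per site per generation).
mutation_rate = 8.4e-9

# Per-site heterozygosity and the average spacing between heterozygous sites.
heterozygosity = heterozygous_sites / callable_sites
bp_per_heterozygous_site = 1 / heterozygosity

# Assumption: theta = 4 * Ne * mu, with heterozygosity as the estimate of theta.
effective_population_size = heterozygosity / (4 * mutation_rate)
```

Under these assumptions, heterozygosity is about 0.22% (one heterozygous site per roughly 450 bp) and the effective population size is on the order of 65,000 diploid individuals, the same order as the ~60,000 reported above.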

2.3 Discussion

The small genome size of B. antarctica is achieved by a reduction in repeats, TEs and intron size, a result similar to that reported for the bladderwort plant (Utricularia gibba) (Ibarra-Laclette et al., 2013) and pufferfish (Tetraodon nigroviridis) (Roest et al., 2000), whose small genomes are also attained by a great reduction in non-genic DNA. Intron size has been correlated to TE number in D. melanogaster (Cridland et al., 2013), suggesting that in B. antarctica small intron size may be a result of few TEs. There is strong evidence that DNA-mediated transposons (class II) are derived from horizontal gene transfer (Robertson, 2002; Robertson and Lampe, 1995); we hypothesize that horizontal gene transfer events are likely rare in the species-poor Antarctic environment, thus reducing the presence of TEs in Antarctic genomes. While gene content remains largely conserved, the absence of TEs does, however, have a major impact on certain gene classes (for example, the piRNA pathway genes). Moreover, there is evidence that the expression of Hsp90 may be a suppressor of TE movement through either direct interaction with piRNA biogenesis or transposon activation (Specchia et al., 2010). The constitutive expression of Hsp90 in B. antarctica larvae (Rinehart et al., 2006) may also contribute to the lack of active TEs in the genome.


Among negatively enriched terms, ‘odourant binding’ suggests a loss of sensory perception in B. antarctica, a feature that may reflect the limited food choices available to the midge in its Antarctic environment, as well as its mating behaviour that does not involve long-distance orientation. By contrast, the mosquito relies extensively on odour reception for the complicated challenges of finding vertebrate hosts and floral nectar sources. It is worth noting that any hypotheses derived from the GO analysis are preliminary, since our comparison was restricted to a single species. The availability of closely related species with well-assembled genomes will facilitate a more in-depth analysis. Notably absent in this genome were genes encoding late embryogenesis abundant (LEA) proteins; these are critical for surviving extreme dehydration in a close relative, the African sleeping midge, Polypedilum vanderplanki (Cornette et al., 2010). Dehydration is also a critically important response in larvae of B. antarctica, but in this species other genes, including constitutively expressed heat shock proteins and genes involved in regulation of autophagy, likely contribute to dehydration survival (Teets et al., 2012).

Use of a single individual for sequencing imposed technological limitations due to the DNA input requirements needed for an assembly. The fresh mass of the fourth instar larva used in the study was <1.5 mg, and it contained <1 µg of DNA. To add Pacific Biosciences RSII (PacBio) data, we prepared the sequencing library from DNA extracted from a second individual. The PacBio data yielded only a modest improvement in assembly connectivity due to the low input of genomic DNA. This highlights the need for long-read sequencing technologies that have low input DNA requirements for assemblies of small organisms that cannot be easily reared in the laboratory or readily collected in the field.

In recent times, several additional midge species have arrived in the northern perimeter of the Antarctic Peninsula, but these species are also found in Tierra del Fuego, suggesting that they are recent immigrants from South America (Convey and Block, 1996). B. antarctica has a single congener, B. albipes, a species restricted to one of the sub-Antarctic islands, Isles Crozet. The B. antarctica genome sets the stage for future comparative genomics of Antarctic and sub-Antarctic species.

As the first polar insect and first freeze-tolerant insect to be sequenced, B. antarctica offers a unique opportunity to probe the genome architecture of an extremophile. Lynch (2006) argues that small effective population size should lead to genome expansion, presumably due to a reduction in the efficacy of purifying selection. However, this is the opposite of what we observe. Among the genome’s conspicuous features, in fact, is its small size, which is attained without reducing the number of protein-coding genes. By stripping the genome of repeats and TEs and by reducing the length of introns, the genome has been streamlined to the minimum yet reported for an insect. Our interpretation is, therefore, that the small midge genome is the result of genome adaptation via fixation of strongly selected mutations that overcame the opposing action of genetic drift inherent in small populations.

2.4 Methods

2.4.1 Biological Sample


The single fourth instar larva used for sequencing was collected near Palmer Station, Antarctica (64°46'S, 64°04'W) in January 2011. It was held in its natural substrate at 2 ºC and shipped to the Denlinger home laboratory at The Ohio State University, where it was held at 4 ºC under a daily 16:8 h light:dark regime until it was used for sequencing. A single individual was used to reduce assembly issues introduced by genetic heterozygosity.

2.4.2 DNA Library Preparation and Sequencing

DNA was extracted using the Qiagen DNeasy Blood and Tissue Kit and sheared using a Covaris E220 (Woburn, MA) (duty cycle 10, intensity 5, cycles/burst 200 and time 180 s) to ~400 bp. Sheared genomic DNA was gel purified and used as input for Illumina sequencing library prep, with end-repair using the NEBNext end-repair kit (E6050L), and A-tailing with Taq polymerase. Ligation to Illumina paired-end adapters (PE-102-1003) was done using Ultrapure ligase (L603-HC-L) from Enzymatics, and amplification with iProof (Bio-Rad). Agencourt Ampure XP beads were used for clean-up and size selection at each step. The resulting DNA library was sequenced on a single lane of an Illumina HiSeq2000 at the Stanford Center for Genomics and Personalized Medicine.

2.4.3 PacBio Library Preparation and Sequencing

Genomic DNA was sheared using a G-Tube (Covaris Inc.) to generate 10-kb fragments. The sheared DNA was converted into a SMRTbell library using the SMRTbell Template Preparation Kit 1.0 (Pacific Biosciences, Menlo Park, CA). SMRTbell library templates were sequenced using standard SMRT sequencing with a P4 DNA polymerase on the Pacific Biosciences (PacBio) RSII system according to the manufacturer’s protocol in the Genomics Core at Washington State University, Pullman, WA.

2.4.4 De Novo Genome Assembly

Genome size determinations from flow cytometry were produced following procedures described in Hare and Johnston (2011) (see also Supplementary Methods). The assembly individual was sequenced to over 100-fold coverage using one lane of Illumina HiSeq2000 sequencing technology with a 400-bp insert paired-end sequencing library. Filtered sequence reads were assembled using Velvet de novo (Zerbino and Birney, 2008). Two iterations of ERANGE (Mortazavi et al., 2010) were used to scaffold the assembled contigs with RNA-sequencing data from pooled larvae (Teets et al., 2012) (Supplementary Methods). Filtered sequence reads were mapped back to the reference genome using Burrows-Wheeler Aligner (bwa) with default parameters (Li and Durbin, 2009). PacBio scaffolding was accomplished using PBJelly (Jelly 14.1.14 with blasr 1.3.1) (English et al., 2012). Sequence reads mapped to the assembled genome were also used to estimate the percentage of PCR duplicates in the sequencing library using the Picard MarkDuplicates tool (Picard, 2013) and to estimate coverage using BamTools (Barnett et al., 2011). To assess assembly quality, RNA-sequencing data from pooled larvae (Teets et al., 2012) were mapped to the assembled genome using Bowtie with default parameters (Langmead et al., 2009).

2.4.5 Repeat Annotation

Repeat annotation was accomplished using RepeatMasker (Smit et al., 1996-2010) and the T-lex2 de novo pipeline (Fiston-Lavier et al., 2012; Fiston-Lavier et al., 2011). The T-lex2 de novo pipeline uses reads mapped to the reference and identifies read pairs for which only one read is mapped successfully, called one-end anchored (OEA) pairs, as well as reads that are partially mapped, called split reads. To discriminate TE sequences from other insertions or reference repeated sequences, the unmapped reads from OEA pairs and split reads are BLATed against a library of TE sequences. Finally, a clustering step defines the TE insertion breakpoint on the reference sequence (here our assembly). To avoid bias in TE discovery owing to the quality of the TE library, we launched RepeatMasker and the T-lex2 de novo pipeline using three previously curated TE libraries: (i) the Drosophila TE library, (ii) the An. gambiae TE library and (iii) a dipteran repeat library constructed from the 12 Drosophila genomes and An. gambiae (RepBase version 17.08 09-01-2012, http://www.fruitfly.org/data/p_disrupt/datasets/ASHBURNER/D_mel_transposon_sequence_set.fasta version 9.41) (Smith et al., 2007). Species-specific TE reconstruction was attempted with ReAs (Li et al., 2005).
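The distinction between one-end anchored pairs and split reads described above can be made concrete with a small Python sketch. This is not the T-lex2 code: the record layout, the function name and the 20-base clipping threshold are hypothetical choices made only for illustration, and a real pipeline would read these values from the alignment file.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical summary of how the two reads of a pair aligned."""
    name: str
    read1_mapped: bool
    read2_mapped: bool
    read1_clipped_bases: int = 0  # bases of read 1 left unaligned
    read2_clipped_bases: int = 0  # bases of read 2 left unaligned


def classify_pair(pair, min_clip=20):
    """Label a pair as 'OEA', 'split', 'concordant' or 'unmapped'.

    min_clip is an assumed threshold for calling a read partially mapped.
    """
    if pair.read1_mapped != pair.read2_mapped:
        return "OEA"  # exactly one read anchors the pair to the reference
    if pair.read1_mapped and pair.read2_mapped:
        clipped = max(pair.read1_clipped_bases, pair.read2_clipped_bases)
        return "split" if clipped >= min_clip else "concordant"
    return "unmapped"


# Toy input (hypothetical pairs).
pairs = [
    ReadPair("p1", True, False),
    ReadPair("p2", True, True, read1_clipped_bases=35),
    ReadPair("p3", True, True),
    ReadPair("p4", False, False),
]
labels = {p.name: classify_pair(p) for p in pairs}
```

In the workflow described in the text, the unmapped mates of the OEA pairs and the unaligned portions of the split reads would then be compared against a TE library and clustered to place insertion breakpoints; those steps are not shown here.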

Reconstruction of the R1 and R2 TEs was attempted using multiple approaches. Known R1 and R2 elements were downloaded from NCBI, and BLAST was used to map B. antarctica raw reads to the available R1 and R2 element sequences. Few reads mapped to the rDNA from other species, and there was no significant evidence for rDNA sequence. Moreover, we undertook a targeted reassembly of the rDNA region using rDNA sequence from D. melanogaster and An. gambiae. Raw reads were mapped to the D. melanogaster and An. gambiae 18S and 28S rDNA sequence. All reads were used in a targeted reassembly of the region, which was compared with the reference assembly and an assembly from RNA-seq data obtained from Teets et al. (2014) using Trinity de novo (Haas et al., 2013). All genomic reads were then mapped to the rDNA region using bwa (Li and Durbin, 2009) and visually inspected for evidence of insertions or reads with split mappings or OEA mappings that would be indicative of TE insertion.

2.4.6 Gene Annotation

Gene annotation was accomplished using the MAKER annotation pipeline (Cantarel et al., 2008) to map protein homology data, expressed sequence tag evidence and ab initio gene predictions to the draft genome. Protein homology data were provided by protein databases of D. melanogaster and Ae. aegypti obtained from FlyBase (McQuilton et al., 2012) and VectorBase (Megy et al., 2012), respectively. To avoid spurious matches to repetitive regions of the genome, RepeatMasker was used to mask low-complexity regions (Smit et al., 1996-2010). In addition to the included libraries, a custom repeat library for use with RepeatMasker was created with RepeatModeler, RECON, RepeatScout and TRF (Smit and Hubley, 2008-2010). Filtered RNA-sequencing reads from Teets et al. (2012) were mapped to the repeat-masked genome with Bowtie and TopHat, and putative transcripts were assembled with Cufflinks (Trapnell et al., 2010). The putative transcripts were used in MAKER as expressed sequence tag evidence. An iterative approach with four rounds of training was used with MAKER and the training of the ab initio predictors SNAP (Korf, 2004) and AUGUSTUS (Stanke et al., 2006). The benefit of the iterative approach is that the gene models improve with each round, which in turn improves the training of the ab initio predictors. For the first round, SNAP was not used and the included ‘fly’ hidden Markov model was used in AUGUSTUS. In subsequent rounds, gene models predicted in the previous round of MAKER were used to generate hidden Markov models for SNAP and AUGUSTUS.

Functional annotation was accomplished with Blast2GO (Conesa et al., 2005; Conesa and Gotz, 2008). Transcripts predicted by MAKER were compared using BLAST with the SwissProt database (ref. 62) (E<10⁻¹⁰) and the non-redundant database (E<10⁻¹⁰). Transcripts predicted by MAKER and the top 19 BLAST hits (blastx, NR, e-value <0.0001) were loaded into Blast2GO. GO terms were assigned based on BLAST hits and InterProScan results (Burge et al., 2012). The set of core eukaryotic genes in the assembled genome was identified using CEGMA (Parra et al., 2007). Nuclear transfer RNAs were predicted using tRNAscan-SE 1.3.1 (Lowe and Eddy, 1997) with options -H and -y and Aragorn 1.2.346) with options -w -t -i116 -l -d (Supplementary Data 3).
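The E-value cutoff applied to the BLAST comparisons above (read here as E < 10⁻¹⁰) amounts to a simple filter on the hit table. The sketch below assumes hits are available in the standard 12-column tabular BLAST format, in which the E-value is the eleventh column; the text does not say which output format was used, and the function name and example lines are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-10):
    """Keep (query, subject, evalue) for tabular BLAST hits below max_evalue.

    Assumes the standard 12-column tabular layout: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Two made-up hits: one well below the cutoff, one above it.
example = [
    "gene1\tP12345\t85.0\t200\t30\t0\t1\t200\t5\t204\t1e-50\t300\n",
    "gene2\tQ67890\t40.0\t60\t36\t0\t1\t60\t10\t69\t0.002\t35\n",
]
significant = filter_hits(example)
```

Only the first made-up hit passes the filter; raising `max_evalue` to 1e-4 would correspond to the looser cutoff used when loading hits into Blast2GO.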

2.4.7 Comparative Analyses

For comparative analyses, we used the first annotated isoforms for each locus from the annotations of An. gambiae (12,669) (Sharakhova et al., 2007; Holt et al., 2002), Ae. aegypti (15,996) (Nene et al., 2007), C. quinquefasciatus (18,954) (Arensburger et al., 2010) and D. melanogaster (13,492) (McQuilton et al., 2012). Clusters of orthologous genes between the five species were identified using the OrthoMCL package (Li et al., 2007). Effective number of codons, for codon usage estimates, was calculated using ENCPrime using the implementation of Nc’ that accounts for background nucleotide composition (Novembre, 2010). Sizes of introns and exons were calculated using in-house scripts.
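The in-house scripts used for intron and exon sizes are not reproduced here. As a minimal sketch of the kind of calculation involved, intron lengths can be derived from the exon coordinates of a single transcript, assuming 1-based inclusive coordinates as in GFF-style annotation. The function name and the example coordinates are hypothetical.

```python
def intron_lengths(exons):
    """Return intron lengths for one transcript.

    exons: list of (start, end) tuples, 1-based and inclusive, all on the
    same sequence. Introns are taken as the gaps between consecutive
    exons after sorting by start coordinate.
    """
    ordered = sorted(exons)
    lengths = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1
        if gap > 0:  # ignore abutting or overlapping exons
            lengths.append(gap)
    return lengths


# Hypothetical three-exon transcript.
example_exons = [(1, 100), (201, 300), (351, 400)]
example_introns = intron_lengths(example_exons)  # [100, 50]
exon_sizes = [end - start + 1 for start, end in example_exons]  # [100, 100, 50]
```

Applying such a function to the first annotated isoform of each locus would give the per-species intron size distributions that are compared in the Results.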

2.4.8 Functional Enrichment Analysis

We conducted GO enrichment analysis to determine whether any functional groups of genes were enriched in B. antarctica relative to An. gambiae. GO annotations for B. antarctica and An. gambiae were obtained via Blast2GO (Conesa et al., 2005; Conesa and Gotz, 2008), and compared with the built-in Fisher’s exact test (false discovery rate < 0.05).
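As a rough illustration of the enrichment test described above, the following Python sketch computes a one-sided Fisher's exact test for over-representation from the hypergeometric distribution. It is not Blast2GO's implementation; the function name and the example counts are hypothetical, and a false discovery rate correction would still have to be applied across all GO terms tested.

```python
from math import comb


def fisher_exact_greater(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: genes in the test set annotated with the GO term
    n: size of the test set
    K: genes annotated with the GO term in test + reference sets combined
    N: total number of genes in test + reference sets combined
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts, for illustration only (not values from this study).
p = fisher_exact_greater(k=30, n=500, K=60, N=2000)
print(f"one-sided P = {p:.3g}")
```

Because the function uses exact integer binomial coefficients, it avoids the rounding problems of factorial-based floating point formulas, at the cost of speed for very large gene sets.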

2.4.9 Polymorphism Detection

We identified SNPs in the individual used for the genome assembly. Putative SNPs were identified by mapping trimmed sequencing reads back to the assembled genome using bwa (Li and Durbin, 2009). We then applied the Genome Analysis Toolkit (McKenna et al., 2010) to the mapped reads for PCR duplicate removal, base quality score recalibration and indel realignment. SNP discovery as well as genotyping was performed using standard hard filtering parameters (DePristo et al., 2011).


2.4.10 Demography From a Single Genome

We estimated the demographic history of the single individual using the pairwise sequentially Markovian coalescent method (Li and Durbin, 2011). This method uses sequence data from a diploid individual, mapped to the reference genome, to infer ancestral effective population sizes at time points in the past determined by the rate of lineage coalescence at that time. The pairwise sequentially Markovian coalescent method was run on data mapped to contigs larger than 10 kb, with coverage limits set by the parameters -d 58 and -D 354, and bootstrap sampling was executed 100 times.

2.5 Acknowledgements

This work was funded by NSF OPP-ANT-0837613 and ANT-0837559 to D.L.D. and R.E.L. and NIH NRSA GM087069 to J.L.K. We thank Marc Mangel for making this collaboration possible through a timely introduction. We thank Jeffrey D. Jensen for useful discussions.


2.6 Tables

Table 2.1 Genome assembly and annotation summary

Genome
  Size (1n)                        99 Mbp
  Karyotype                        2n = 6
  GC content                       39%
  Genes                            19.4%
Assembly
  Size in scaffolds > 300 bp       89.6 Mbp
  Number of scaffolds > 500 bp     3,589
  Number of scaffolds > 10 kb      1,256
  N50                              98,263
  NG50                             85,160
Annotation
  Coding loci                      13,517
  Non-coding loci                  337
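The N50 and NG50 statistics in Table 2.1 can be illustrated with a short Python sketch. It assumes the standard definitions: N50 is the scaffold length at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly size, and NG50 uses half of the estimated genome size instead. The function name and the toy scaffold lengths are hypothetical and are not taken from the B. antarctica assembly.

```python
def n50(lengths, genome_size=None):
    """Return N50 of the scaffold lengths, or NG50 if genome_size is given.

    Returns None when the assembly covers less than half of genome_size.
    """
    reference = genome_size if genome_size is not None else sum(lengths)
    target = reference / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


# Toy scaffold lengths, for illustration only.
scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
print("N50 :", n50(scaffolds))                   # half of the assembly (32 / 2)
print("NG50:", n50(scaffolds, genome_size=40))   # half of an assumed genome size
```

When the estimated genome size exceeds the assembled size, as in Table 2.1, NG50 can be no larger than N50.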


Table 2.2 Repeat content in B. antarctica. LTR, long terminal repeat; TEs, transposable elements. Repeat annotations were obtained by combining two distinct approaches: (i) RepeatMasker was used to annotate all classes of repeats in the assembly, while (ii) T-lex2 was used to discover and annotate TEs in the genome that were missed by the assembly. Several libraries of TE consensus sequences were used for the annotation of individual TEs. While the Drosophila library improved the non-TE detection, the Anopheles library allowed detection of more TE sequences. We report here the maximum repeat content estimated using the different libraries. *Values in parentheses correspond to the 15 TE sequences with multiple annotations (3 with DNA/non-LTR, 2 DNA/LTR and 10 LTR/non-LTR).


Type of repeat              Sequence    Avg. length (bp)    Total coverage    Total coverage
                            number      (min-max)           (bp)              (%)
Low-complexity              8,536       32 (12-247)         276,261           0.31
Simple repeats              999         37 (19-306)         36,911            0.04
TEs
  Class I/retroelements
    LTR                     324 (12)*   228 (25-2,454)      74,297            0.08
    Non-LTR                 115 (13)*   207 (30-4,919)      26,554            0.03
  Class II/DNA elements     59 (5)*     133 (32-865)        8,536             0.009
Small RNA                   36          199 (42-3,605)      7,165             0.01
Total                       10,084      43 (12-4,919)       429,724           0.49


2.7 Figures

Figure 2.1: Larval and adult stages of B. antarctica. Fourth (final) instar larvae of the Antarctic midge B. antarctica (a) and an adult male (b). This is the southernmost insect and the only insect species endemic to Antarctica. Larval length is 6–7 mm, and the adult male is about 3 mm. Photo by Richard E. Lee Jr.


Figure 2.2 Distribution of genome annotations among five Diptera. Each panel is ordered with respect to overall assembled genome size. The four panels represent the total amount of sequence in each annotation: genome size, intron, coding sequence (CDS) and transposable elements (TEs).


Figure 2.3 Orthologous gene clusters. Venn diagram of orthologous gene clusters among B. antarctica, An. gambiae, Ae. aegypti, C. quinquefasciatus and D. melanogaster. The numbers in each area indicate the number of orthologous gene clusters in each category, and the numbers in parentheses indicate the total number of genes in each area. The Venn diagram was generated at http://bioinformatics.psb.ugent.be/webtools/Venn/.


Figure 2.4 Demographic history inferred from a single B. antarctica genome. Pairwise sequentially Markovian coalescent (PSMC) analysis of inferred historical population sizes, using variant data from the sequenced diploid individual and a mutation rate of 0.84 × 10⁻⁸ per site per generation. Populations of B. antarctica reached a maximum size coinciding with the glacial maximum at 20,000 years ago (vertical line on graph). The x axis gives time measured by pairwise sequence divergence and the y axis gives the effective population size measured by the scaled mutation rate. The green lines correspond to PSMC inferences on 100 rounds of bootstrapped sequences, while the red line corresponds to the estimate from the data.


Chapter 3: Gene Expression Changes Governing Extreme Dehydration Tolerance in an Antarctic Insect

Citation

This chapter has previously been published. The work was done in a collaborative setting and represents the work of several people. N. M. Teets and I are co-first authors of this paper. I helped prepare samples, led the analysis of the data, helped interpret the analysis and coauthored the manuscript.

Teets, N. M., Peyton, J. T., Colinet, H., Renault, D., Kelley, J. L., Kawarasaki, Y., Lee, R. E. & Denlinger, D. L. Gene expression changes governing extreme dehydration tolerance in an Antarctic insect. Proc Natl Acad Sci 109, 20744-20749 (2012).

Abstract

Among terrestrial organisms, arthropods are especially susceptible to dehydration, given their small body size and high surface area to volume ratio. This challenge is particularly acute for polar arthropods that face near-constant desiccating conditions, as water is frozen and thus unavailable for much of the year. The molecular mechanisms that govern extreme dehydration tolerance in insects remain largely undefined. In this study, we used RNA sequencing to quantify transcriptional mechanisms of extreme dehydration tolerance in the Antarctic midge, Belgica antarctica, the world’s southernmost insect and only insect endemic to Antarctica. Larvae of B. antarctica are remarkably tolerant of dehydration, surviving losses up to 70% of their body water. Gene expression changes in response to dehydration indicated up-regulation of cellular recycling pathways including the ubiquitin-mediated proteasome and autophagy, with concurrent down-regulation of genes involved in general metabolism and ATP production. Metabolomics results revealed shifts in metabolite pools that correlated closely with changes in gene expression, indicating that coordinated changes in gene expression and metabolism are a critical component of the dehydration response. Finally, using comparative genomics, we compared our gene expression results with a transcriptomic dataset for the collembolan, Megaphorura arctica. Although B. antarctica and M. arctica are adapted to similar environments, our analysis indicated very little overlap in expression profiles between these two arthropods. Whereas several orthologous genes showed similar expression patterns, transcriptional changes were largely species specific, indicating these polar arthropods have developed distinct transcriptional mechanisms to cope with similar desiccating conditions.

3.1 Introduction

For organisms living in arid environments, mechanisms to maintain water balance and cope with dehydration stress are an essential physiological adaptation. Insects, in particular, are at high risk of dehydration because of their small body size and consequent high surface area to volume ratio (Gibbs et al., 1997). Physiological mechanisms for maintaining water balance in insects include adaptations to reduce cuticular water permeability (Gibbs, 1998) and mechanisms to reduce respiratory water loss (Chown, 2002). When water balance cannot be maintained, insects evoke a suite of molecular mechanisms to cope with cellular osmotic stress. For example, during periods of dehydration, heat shock proteins are up-regulated to minimize protein damage (Benoit et al., 2010), whereas aquaporins mediate water movement between cellular compartments (Liu et al., 2011). However, we have a limited knowledge of the large-scale molecular changes prompted by water loss.

Among terrestrial biomes, polar environments are particularly challenging from a water balance perspective, as water is frozen and therefore unavailable for much of the year (Kennedy, 1993). Polar arthropods are typically extremely tolerant of desiccation, with some being able to survive near-anhydrobiotic conditions (Montiel et al., 1998). One such dehydration-tolerant polar arthropod is the Antarctic midge, Belgica antarctica, Antarctica’s only endemic insect and the southernmost free-living insect. Larvae of B. antarctica are among the most dehydration-tolerant insects known, surviving a 70% loss of water under ecologically relevant conditions (Hayward et al., 2007). In this species, the ability to tolerate dehydration is an important adaptation for successful overwintering. The loss of water enhances acute freezing tolerance (Hayward et al., 2007). In addition, overwintering midge larvae are capable of undergoing another distinct form of dehydration, known as cryoprotective dehydration (Elnitsky et al., 2008). During cryoprotective dehydration, a gradual decrease in temperature in the presence of environmental ice creates a vapor pressure gradient that draws water out of the body, thereby depressing the body fluid melting point and allowing larvae to remain unfrozen at subzero temperatures (Holmstrup et al., 2002).

In this study, we used next-generation RNA sequencing (RNA-seq) to quantify genome-wide mRNA changes in response to both dehydration at a constant temperature and cryoprotective dehydration. Although our recent work on B. antarctica has revealed several key molecular mechanisms of dehydration tolerance, including expression of heat shock proteins (Lopez-Martinez et al., 2009), aquaporins (Goto et al., 2011; Yi et al., 2011), and metabolic genes (Teets et al., 2012), we lack a comprehensive understanding of the genes and pathways involved in extreme dehydration tolerance. To date, only three studies have examined large-scale transcriptional changes in response to dehydration in insects, all of which were conducted on tropical species. Using a semiquantitative EST approach, Cornette et al. (2010) identified genes associated with anhydrobiosis in the African sleeping midge, Polypedilum vanderplanki, whereas Wang et al. (2011) and Matzkin and Markow (2009) used microarrays to examine genome-wide transcriptional changes following dehydration in Anopheles gambiae and Drosophila mojavensis, respectively. In addition to the insect studies, transcriptional responses to desiccation have been reported for an Arctic arthropod closely related to insects, the springtail (Collembola) Megaphorura arctica (Clark et al., 2009), as well as a widely distributed collembolan, Folsomia candida (Timmermans et al., 2009). Here, in response to dehydration, we report up-regulation of recycling pathways such as the proteasome and autophagy with a concurrent shutdown of central metabolism. Complementary metabolomics experiments supported a number of our transcriptome observations, indicating a strong correlation between gene expression and metabolic end products during dehydration. Using comparative genomics, we also compared the molecular response to dehydration in the Antarctic species B. antarctica with that of the Arctic arthropod M. arctica (Clark et al., 2009).

3.2 Results and Discussion

The Antarctic midge, B. antarctica, is one of the most dehydration tolerant insects that has been characterized. In this study, we used RNA-seq to measure gene expression levels in response to the following treatments that hereafter we refer to as control, desiccation, and cryoprotective dehydration: control, held at 4 °C and 100% relative humidity, fully hydrated; desiccation, constant temperature of 4 °C and 93% relative humidity for 5 d, resulting in ~40% water loss; cryoprotective dehydration, gradually chilled over 5 d from −0.6 to −3 °C at vapor pressure equilibrium with surrounding ice and then held at −3 °C for 10 d (Elnitsky et al., 2008) (also yielded ~40% water loss).

Both dehydration treatments resulted in substantial changes in gene expression. Of the ~11,500 gene models that had enough reads to support estimation of differential expression, 3,275 and 2,365 were differentially expressed during desiccation and cryoprotective dehydration, respectively (Fig. 3.1A; Datasets S1 and S2). Hierarchical clustering analysis indicated that the desiccation and cryoprotective dehydration treatments yielded distinct transcriptional signatures (Fig. 3.1B). However, a majority of the differentially expressed genes were shared between the two treatments (Fig. 3.1C), and downstream analyses revealed that many enriched pathways were identical. Thus, for clarity, we will primarily discuss the results of the desiccation treatment, whereas specific results from the cryoprotective dehydration treatment can be found in Tables S1 and S2. Additionally, a direct comparison of the desiccation and cryoprotective dehydration treatments, highlighting the expression differences between these two conditions, is provided in Dataset S3 and Table S3. However, it is worth mentioning that time differences between the two dehydration treatments (5 d for desiccation and 15 d for cryoprotective dehydration) may also contribute to differences between these treatments.

To validate our expression results, we used qPCR to measure expression of 13 genes in the same RNA samples used for RNA-seq. Overall, there was excellent agreement between the RNA-seq results and qPCR results (Fig. S1).

3.3.1 Functional Categories of Differentially Expressed Genes

To place these large-scale changes in gene expression into a meaningful context, we identified enriched functional categories using gene ontology (GO) enrichment analysis (Table 3.1) and enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways using gene set analysis (GSA; Table 3.2). To distinguish between functional categories of genes that are turned on and off in response to desiccation, we separated the GO enrichment analysis into lists of up- and down-regulated genes.

3.3.2 Functional Categories Up-Regulated During Desiccation


In response to desiccation, we observed enrichment of several functional terms, notably terms related to stress response, ubiquitin-dependent proteasome, actin organization, and signal transduction, specifically several GTPase enzymes that are involved in membrane trafficking (Table 3.2). The GO term “response to heat” was enriched in the up-regulated genes, and this category primarily encompasses the heat shock proteins (hsps), cellular chaperones that repair misfolded proteins in response to various environmental stressors (Feder and Hofmann, 1999), including heat, cold (Teets et al., 2012), oxidative damage (Girardot et al., 2004), and dehydration (Benoit et al., 2010; Lopez-Martinez et al., 2009). Our group has demonstrated the importance of hsps in B. antarctica stress tolerance (Lopez-Martinez et al., 2009; Rinehart et al., 2006), but previous studies were limited to a few hsp genes obtained by targeted approaches. Here, we report up-regulation of numerous putative hsps, including members of the small heat shock protein (three members), hsp40 (two members), hsp70 (eight members), and hsp90 (one member) families (Dataset S1). We also observed ~1.8-fold up-regulation of hsf, the transcription factor that regulates hsp expression (Morimoto, 1998). In addition to chaperone activity, hsps target damaged proteins to the proteasome to prevent accumulation of dysfunctional proteins and to recycle peptides and amino acids (Goldberg, 2003). Indeed, we detected enrichment of GO terms related to ubiquitin-dependent proteolysis (Table 3.1) in the desiccation up-regulated genes. Our results indicate coordinated up-regulation of hsps and proteasomal genes, which cooperatively function to repair and degrade damaged proteins during dehydration.


In our GSA, we observed positive enrichment of the KEGG pathway “Regulation of autophagy” during desiccation (Table 3.2). Autophagy is a catabolic process in which parts of the cytoplasm and organelles are sequestered into vesicles and digested in lysosomes (Maiuri et al., 2007), thereby conserving cellular macromolecules and energy during periods of stress and nutrient deprivation. Although autophagy can be an alternative means of programmed cell death, during times of stress, autophagy can reduce the amount of cell death by recycling cellular components and inhibiting apoptotic cell death (Maiuri et al., 2007). We hypothesize that during dehydration, the level of autophagy increases, which conserves energy and promotes survival during prolonged periods of cellular stress. We identified 92 homologs of genes with known function in autophagy and programmed cell death that were differentially expressed during desiccation and/or cryoprotective dehydration (Dataset S4). Several lines of evidence support the hypothesis that dehydration promotes autophagy while concurrently inhibiting apoptosis (Fig. 3.2A). This evidence includes the following. (i) An 11-fold up-regulation of sestrin during desiccation. Sestrins are highly conserved genes that have an antioxidant function and promote longevity by inhibiting apoptosis and increasing autophagy via inhibition of TOR signaling (Lee et al., 2010). (ii) Significant up-regulation of six autophagy-related signaling genes (atg1, atg6, atg8, atg9, atg13, and atg18) that carry out the essential cellular functions of autophagy (He and Klionsky, 2009). (iii) Up-regulation of four transcription factors, eip74EF, eip75EF, cabut, and maf-S, that are positive regulators of autophagy in D. melanogaster (Gorski et al., 2003). (iv) A threefold up-regulation of thread, a potent inhibitor of apoptotic cell death that prevents activity of proapoptotic caspases (Lisi et al., 2000). (v) Up-regulation of proteasomal genes, suggesting cross-talk and cooperation between these distinct cellular recycling pathways (Korolchuk et al., 2010). We suspect that the autophagy pathway serves an important protective function by limiting cell death and turnover of macromolecules during dehydration, especially during the long Antarctic winter.

3.3.3 Functional Categories Down-Regulated During Dehydration

Up-regulation of cellular recycling pathways, such as ubiquitin-mediated proteasome and autophagy, likely serves to conserve energy during prolonged dehydration. Consistent with this idea, we observed down-regulation of genes related to general metabolism and ATP production (Table 3.1; Fig. 3.2B). Larvae of B. antarctica significantly depress oxygen consumption rates in response to dehydration (Benoit et al., 2007). Metabolic depression is a common adaptation in dehydration-tolerant insects, presumably to minimize respiratory water loss and to minimize the loss of water bound to glycogen and other carbohydrates (Marron et al., 2003). This dehydration-mediated metabolic shutdown is strongly supported by gene expression data, as nearly 25% of all metabolic genes in our dataset were down-regulated in response to desiccation (Table 3.1). We noted a general shutdown of carbohydrate catabolism and ATP generation; nearly every gene involved in glycolysis, the tricarboxylic acid (TCA) cycle, and ATP synthesis is down-regulated (Fig. 3.2B). Furthermore, among our down-regulated genes, we observed enrichment of genes related to protein, lipid, and chitin metabolism, as well as energetically expensive processes such as membrane transport, including proton, cation, carbohydrate, and amino acid transport. A decrease in metabolic activity was further supported by our GSA results; nearly every negatively enriched KEGG pathway (i.e., pathways in which genes tended to be down-regulated) was related to metabolism, including several pathways related to carbohydrate and amino acid metabolism (Table 3.2). Thus, taken together, both GO enrichment analysis and GSA of KEGG pathways revealed a coordinated shutdown of metabolic activity at the transcript level. We hypothesize that these mechanisms may be particularly important for overwintering larvae, contributing to energy conservation during the long Antarctic winter.

3.3.4 Dehydration-Induced Changes in the Metabolome

To determine whether the above changes in metabolic gene expression correlated with changes in metabolic endpoints, we conducted a follow-up metabolomics experiment with the same treatment conditions. Using targeted GC-MS metabolomics, we measured levels of 36 compounds in response to desiccation and cryoprotective dehydration. As with gene expression, desiccation and cryoprotective dehydration had a major impact on the metabolome, as the concentrations of 32 of the 36 compounds significantly changed in at least one treatment (Fig. S2). Although the metabolic changes induced by desiccation and cryoprotective dehydration were largely similar, our treatment groups were distinct from one another, as determined by hierarchical clustering (Fig. S3).

We observed several distinct metabolic responses to desiccation, and these were generally supported by gene expression data. We noted the following. (i) Decreased levels of the glycolytic intermediates glucose-6-phosphate and fructose-6-phosphate, which reflected down-regulation of glycolysis genes (Fig. 3.2B). Hexokinase and glucose-6-phosphate isomerase, the enzymes that synthesize glucose-6-phosphate and fructose-6-phosphate, were both significantly down-regulated (>1.5-fold). Additionally, we observed decreased levels of lactate, the endpoint of anaerobic respiration through glycolysis. (ii) Accumulation of citrate, which is evidence of decreased flux through the TCA cycle, was supported by down-regulation of a number of TCA cycle genes (Fig. 3.2B). An alternative explanation for accumulation of citrate would be increased oxidation of fatty acids, but this hypothesis is not supported by the gene expression data, as a majority of fatty acid metabolism genes were down-regulated (Tables 3.1 and 3.2). (iii) Increase in proline levels from 7.8 to 21.1 nmol/mg dry mass in response to desiccation, which was supported by 1.5-fold up-regulation of pyrroline-5-carboxylate reductase, the terminal enzyme of proline synthesis. Additionally, we observed 1.3-fold up-regulation of glutamate synthase and concurrent accumulation of glutamate, a precursor of proline, from 12.6 to 29.9 nmol/mg dry mass (Fig. S2). Although proline is a potent cryoprotectant in insects (Kostál et al., 2011) and confers desiccation tolerance in plants (Verbruggen and Hermans, 2008), proline has not been linked to dehydration in insects. (iv) Accumulation of several osmoprotective polyols, of which the quantities of sorbitol (increase from 0.5 to 4.3 nmol/mg dry mass) and mannitol (increase from 5.0 to 155.1 nmol/mg dry mass) exhibited the most dramatic changes. Additionally, fructose, a precursor for both mannitol and sorbitol, increased from 1.3 to 33.4 nmol/mg dry mass. Although the genes involved in mannitol and sorbitol synthesis are poorly defined in insects, we did observe 4.6-fold up-regulation of phosphoenolpyruvate carboxykinase, which catalyzes the rate-limiting step of gluconeogenesis (Hanson and Reshef, 1997). Up-regulation of this gene leads to increased glucose production via gluconeogenesis, with glucose serving as a central precursor for the synthesis of most sugar alcohols. Interestingly, we did not observe accumulation of glucose during dehydration (Fig. S2), suggesting glucose is being shunted to other pathways as soon as it is produced. On the whole, there was good agreement between gene expression and metabolomics data. However, some metabolite changes could not be correlated with changes at the transcript level, suggesting posttranscriptional levels of control. Also, in some instances, changes in gene expression may alter rates of metabolic flux that are not captured in these types of metabolomics analyses.
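To put the metabolite changes reported above on a common scale, the following Python sketch converts the concentrations quoted in the text (nmol/mg dry mass, control versus desiccation) into fold changes and log2 fold changes. Only the five concentration pairs stated above are used; the variable and function names are illustrative, and no statistical testing is implied.

```python
from math import log2

# Concentrations (nmol/mg dry mass) quoted in the text: (control, desiccation).
levels = {
    "proline": (7.8, 21.1),
    "glutamate": (12.6, 29.9),
    "sorbitol": (0.5, 4.3),
    "mannitol": (5.0, 155.1),
    "fructose": (1.3, 33.4),
}


def fold_changes(pairs):
    """Return {metabolite: desiccation / control} for each concentration pair."""
    return {name: after / before for name, (before, after) in pairs.items()}


fold = fold_changes(levels)
# List metabolites from the largest to the smallest relative increase.
for name, fc in sorted(fold.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:10s} {fc:6.1f}-fold  (log2 = {log2(fc):.2f})")
```

On this scale, mannitol and fructose show the largest relative increases, whereas proline and glutamate each rise between two- and threefold.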

3.3.5 Comparative Genomics of Molecular Response to Dehydration

The transcriptomic response to dehydration has been studied in three other insects, the African sleeping midge P. vanderplanki (Cornette et al., 2010), the mosquito A. gambiae (Wang et al., 2011), and the cactophilic fruit fly, D. mojavensis (Matzkin and Markow, 2009), as well as two closely related arthropods, the Arctic collembolan M. arctica (Clark et al., 2009) and the collembolan F. candida (Timmermans et al., 2009), thus facilitating cross-species comparisons of dehydration-induced gene expression. We observed several general similarities between our dataset and the transcriptome of P. vanderplanki, which inhabits temporary pools in tropical Africa. As in B. antarctica, dehydration in P. vanderplanki induced expression of a number of heat shock proteins, including multiple members of the hsp70 family. Additionally, dehydration in P. vanderplanki causes up-regulation of genes involved in cell death signaling and ubiquitin-mediated proteasome, patterns that are also quite prevalent in our dataset.

However, one conspicuous difference between our dataset and that of P. vanderplanki is the absence of late embryogenesis abundant (LEA) proteins in the B. antarctica genome, despite B. antarctica and P. vanderplanki being in the same family, Chironomidae. LEA proteins are dehydration-associated proteins found in organisms ranging from bacteria to animals (Hand et al., 2011), but P. vanderplanki is the only true insect in which LEA genes have been identified.

Like B. antarctica, D. mojavensis is adapted to desiccating, albeit warm, desert environments. As in our dataset, severe dehydration in D. mojavensis elicited significant modulation of numerous metabolic pathways, including down-regulation of genes regulating flux through glycolysis and the TCA cycle (Matzkin and Markow, 2009). Thus, it appears down-regulation of metabolism may be a general feature of xeric-adapted insects. In contrast, comparing our expression data with A. gambiae revealed little overlap between our dataset and the mosquito response to desiccation. Nonetheless, similar to our results, Wang et al. (2011) observed down-regulation of numerous metabolic genes, particularly genes related to chitin metabolism.

The transcriptomic study of dehydration in M. arctica (Clark et al., 2009) included two treatments very similar to our desiccation and cryoprotective dehydration treatments, allowing a formal comparison of the two datasets. M. arctica (formerly Onychiurus arcticus) is found on numerous islands in the northern Palearctic (Hodkinson et al., 1994), and like B. antarctica is extremely dehydration-tolerant and capable of using cryoprotective dehydration as an overwintering strategy (Montiel et al., 1998). Thus, we investigated whether B. antarctica and M. arctica share common transcriptional responses to desiccation and cryoprotective dehydration, despite their geographic and phylogenetic separation.

Using reciprocal blast, we identified 1,280 putative one-to-one orthologs between the B. antarctica gene models and the M. arctica EST library. Of these, we found 12 genes that were up-regulated in response to both desiccation and cryoprotective dehydration in both species, and 7 that were down-regulated (Dataset S5). Of note, common up-regulated genes included an hsp40 gene, two genes involved in the ubiquitin-mediated proteasome, and a GTPase involved in membrane trafficking, thus supporting the central roles of these processes during dehydration. Among the seven down-regulated genes in common were four genes involved in carbohydrate hydrolysis and a single peptidase, indicating that down-regulation of metabolic genes may be a common attribute of dehydration. Additionally, there were 37 genes that were either up- or down-regulated in response to desiccation only (Dataset S6), and 2 genes up-regulated only during cryoprotective dehydration. Genes specific to cryoprotective dehydration were a gene involved in unfolded protein binding and an acid-amino acid ligase.
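The reciprocal blast step can be sketched in Python as a reciprocal-best-hit filter. This is a minimal illustration rather than the pipeline used in this study: it assumes each search has already been reduced to (query, subject, bit score) tuples, that the best hit is the one with the highest bit score, and the gene and EST identifiers in the example are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    Ties keep the first hit encountered.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical identifiers and scores, for illustration only.
midge_vs_springtail = [("geneA", "est1", 250.0), ("geneA", "est2", 90.0),
                       ("geneB", "est2", 120.0)]
springtail_vs_midge = [("est1", "geneA", 245.0), ("est2", "geneA", 95.0),
                       ("est2", "geneB", 60.0)]
print(reciprocal_best_hits(midge_vs_springtail, springtail_vs_midge))
```

In the toy data, geneB's best hit is est2, but est2's best hit is geneA, so only the geneA/est1 pair is retained as a putative one-to-one ortholog.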

Despite the above similarities in dehydration-induced gene expression, the expression profiles of B. antarctica and M. arctica during dehydration were largely different. The Venn diagrams in Fig. 3.3 A and B indicate that more differentially expressed genes are specific to a particular species than are shared between the two species. Also, hierarchical clustering indicates a high degree of separation in the transcript signatures of B. antarctica and M. arctica (Fig. 3.3C). Thus, the transcript signature for a particular group is more dependent on the species than the dehydration treatment it experienced. This result suggests that despite being adapted to similar habitats, B. antarctica and M. arctica have evolved distinct molecular responses to dehydration. General comparisons with a second collembolan transcriptomic dataset, that of F. candida (Timmermans et al., 2009), also revealed very little similarity to B. antarctica. In F. candida, desiccation at a constant temperature likewise results in down-regulation of lipid and chitin metabolism genes, but aside from these examples, very few genes showed similar expression patterns.

These differences in expression patterns may reflect different strategies for combating dehydration; whereas B. antarctica shuts down metabolic activity and waits for favorable conditions to return, F. candida relies on active water vapor absorption to restore water balance during prolonged periods of desiccation. However, because B. antarctica and collembolans are so phylogenetically distant, similar comparisons with closely related chironomids are needed to better understand the evolutionary physiology of dehydration tolerance in this taxonomic family that is so well known for its extreme tolerance of multiple environmental stresses.

3.3 Methods


Larvae of B. antarctica were collected on offshore islands near Palmer Station (64°46′S, 64°04′W) in January 2010 and shipped to The Ohio State University. Before an experiment, fourth-instar larvae were handpicked from substrate in ice water and left at 4 °C overnight on moist filter paper to standardize body water content.

For these experiments, larvae were exposed to the following conditions: control (C, held at 100% relative humidity at 4 °C), desiccation (D, exposed to 93% relative humidity for 5 d at 4 °C), and cryoprotective dehydration (CD, temperature gradually lowered from −0.6 to −3 °C over 5 d in the presence of environmental ice and then held at −3 °C for 10 d). During cryoprotective dehydration, larvae lose water through the cuticle to the surrounding ice and remain unfrozen by decreasing the hemolymph melting point to match the temperature of the surrounding ice (Elnitsky et al., 2008). Both the desiccation and cryoprotective dehydration treatments resulted in ~40% water loss, with survival near 100%. Immediately after treatment, larvae were frozen at −70 °C, where they were held until RNA and metabolite extractions. Each treatment consisted of three biological replicates, with each replicate containing 20 larvae. Total RNA was extracted from larvae using TRIzol reagent (Life Technologies), and RNA-seq libraries were prepared with the Illumina TruSeq RNA Sample Preparation kit (Illumina) according to the manufacturer’s protocol. Libraries were checked for the correct insert size on an Agilent Bioanalyzer 2100 and sequenced on an Illumina Genome Analyzer II. A summary of the raw sequencing data is provided in Table S4.

Reads were mapped to B. antarctica genomic contigs using Bowtie and TopHat (Trapnell et al., 2009), and we counted the total number of sequencing reads that aligned to each putative gene model in the draft B. antarctica genome using HTSeq. Genes were annotated using blastx (E-value cutoff of 1E−4) to compare our gene models with annotated protein sequences from Aedes aegypti and Drosophila melanogaster, and GO terms were assigned to each gene model with Blast2GO (Conesa et al., 2005).

Differentially expressed genes were determined using the R package DESeq (Anders and Huber, 2010). For hierarchical clustering of the phenotypic classes, we obtained variance-stabilized data from DESeq, calculated a matrix of distances, and used the R function hclust for clustering. Enriched GO terms were determined using the R package GOseq (Young et al., 2010), with P values corrected using the Benjamini and Hochberg method (Benjamini and Hochberg, 1995). We restricted the output to GO terms with ontology “Biological Process” to limit redundancy. Additionally, we tested for enriched KEGG pathways with the R package GSA (Efron and Tibshirani, 2007). For GSA, we mapped our gene models to the A. aegypti proteome and tested the entire set of A. aegypti KEGG pathways for enrichment. Expression results were validated by conducting qPCR on a subset of genes (Table S5).
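
The Benjamini and Hochberg correction cited above can be illustrated with a short Python sketch. This is only a minimal illustration of the step-up procedure, not the R code used in this study; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
```

Each adjusted value is the raw P value scaled by the number of tests divided by its rank, with a running minimum taken so that the ordering of the terms is preserved.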

For comparative analysis with M. arctica, we identified putative orthologs between B. antarctica and M. arctica using reciprocal blast. We restricted gene expression comparisons to the two treatments in ref. 18 that were analogous to our desiccation and cryoprotective dehydration treatments, the treatments named “0.9 salt” and “−2°C,” respectively. The M. arctica microarray data were obtained from ArrayExpress (accession no. E-MEXP-2105) and analyzed using the R package limma according to the parameters outlined in Rinehart et al. (2006). To determine overall similarity in gene expression between groups, we conducted hierarchical clustering on the samples, restricting the analysis to orthologous transcripts.
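
The text states only that orthologs were identified by reciprocal blast. One common way to implement this is a reciprocal best hit filter, sketched below in Python. The choice of bit score as the ranking criterion, the function names, and the gene identifiers are all assumptions made for illustration and are not taken from the original analysis.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits: midge genes (Ba) searched against springtail genes (Ma)
# and the reverse search.
ba_vs_ma = [("Ba1", "Ma1", 200), ("Ba1", "Ma2", 150), ("Ba2", "Ma2", 90)]
ma_vs_ba = [("Ma1", "Ba1", 210), ("Ma2", "Ba1", 140), ("Ma2", "Ba2", 80)]
orthologs = reciprocal_best_hits(ba_vs_ma, ma_vs_ba)
```

In this toy example only Ba1 and Ma1 are retained, because Ma2 has a better match to Ba1 than to Ba2.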

Because a large number of metabolic genes were differentially regulated in our treatments, we also conducted a metabolomics analysis of the same treatment conditions. Metabolomics experiments were conducted as in Nene et al. (2007).

Additional methodological detail is provided in Appendix B.

3.4 Tables

Table 3.1 GO enrichment analysis, desiccation. Enrichment analysis of genes up-regulated or down-regulated in response to desiccation.

GO term     Definition                                            FDR       No. up- or down-regulated  Total in category
Up
GO:0006511  Ubiquitin-dependent protein catabolic process         7.35E−03  29                         82
GO:0007465  R7 cell fate commitment                               1.20E−02  10                         14
GO:0009408  Response to heat                                      1.20E−02  21                         50
GO:0007015  Actin filament organization                           1.96E−02  28                         80
GO:0006468  Protein phosphorylation                               1.96E−02  86                         363
GO:0007264  Small GTPase mediated signal transduction             2.75E−02  26                         98
GO:0042176  Regulation of protein catabolic process               8.51E−02  6                          8
Down
GO:0006508  Proteolysis                                           9.33E−20  152                        595
GO:0008152  Metabolic process                                     1.52E−17  193                        827
GO:0055114  Oxidation-reduction process                           1.62E−12  160                        732
GO:0006030  Chitin metabolic process                              4.22E−12  44                         104
GO:0015992  Proton transport                                      2.55E−08  19                         30
GO:0015986  ATP synthesis coupled proton transport                8.39E−07  14                         19
GO:0005975  Carbohydrate metabolic process                        1.48E−05  48                         177
GO:0006754  ATP biosynthetic process                              3.68E−05  17                         32
GO:0006629  Lipid metabolic process                               8.70E−05  41                         159
GO:0006810  Transport                                             1.36E−04  133                        756
GO:0006096  Glycolysis                                            2.38E−04  15                         30
GO:0055085  Transmembrane transport                               1.22E−03  82                         408
GO:0006099  Tricarboxylic acid cycle                              2.16E−03  15                         36
GO:0006123  Mitochondrial electron transport, cytochrome c to O2  3.08E−03  6                          7
GO:0003333  Amino acid transmembrane transport                    1.12E−02  11                         28
GO:0006812  Cation transport                                      1.72E−02  15                         40
GO:0044262  Cellular carbohydrate metabolic process               3.11E−02  5                          6
GO:0008643  Carbohydrate transport                                3.34E−02  18                         61
GO:0015672  Monovalent inorganic cation transport                 3.34E−02  4                          4
GO:0015991  ATP hydrolysis coupled proton transport               3.34E−02  11                         28
GO:0006032  Chitin catabolic process                              5.91E−02  9                          22
GO:0019083  Viral transcription                                   8.20E−02  6                          11
GO:0006865  Amino acid transport                                  9.50E−02  7                          15
GO:0009253  Peptidoglycan catabolic process                       9.98E−02  6                          13

Table 3.2 GSA revealing enriched KEGG pathways during desiccation. Positive gene sets are enriched gene sets in which genes tend to be up-regulated, whereas negative gene sets are enriched gene sets in which genes tend to be down-regulated.

Gene set name                                 Score   Adjusted P value
Positive gene sets*
Regulation of autophagy                        1.27   <2E−4
TGF-β signaling pathway                        1.11   <2E−4
mTOR signaling pathway                         0.82   <2E−4
Endocytosis                                    0.68   <2E−4
Ether lipid metabolism                         0.41   <2E−4
Negative gene sets*
Glyoxylate and dicarboxylate acid metabolism  −2.07   <2E−4
Glycolysis/gluconeogenesis                    −1.32   <2E−4
Starch and sucrose metabolism                 −1.19   <2E−4
Galactose metabolism                          −1.05   <2E−4
Nicotinate and nicotinamide metabolism        −1.03   <2E−4
Propanoate metabolism                         −1.01   <2E−4
Pyruvate metabolism                           −0.85   <2E−4
Tryptophan metabolism                         −0.78   <2E−4
β-Alanine metabolism                          −0.71   <2E−4
Valine, leucine, and isoleucine degradation   −0.60   <2E−4
Arginine and proline metabolism               −0.58   <2E−4
Metabolism of xenobiotics                     −0.58   <2E−4
Glutathione metabolism                        −0.54   <2E−4
Fatty acid metabolism                         −0.54   <2E−4
Folate biosynthesis                           −0.45   <2E−4
Phagosome                                     −0.35   <2E−4

3.5 Figures

Figure 3.1 Expression summary (A), Venn diagram (B), and dendrogram (C) showing degree of similarity between the D and CD groups. In A and B, the criterion for differentially expressed genes was a false discovery rate (FDR) < 0.05. In C, the length of each branch indicates the relative distance between two nodes. C, control; D, desiccation; CD, cryoprotective dehydration.

Figure 3.2 Pathway diagrams illustrating autophagy-related genes. (A) Up-regulation of autophagy-related genes. (B) Down-regulation of carbohydrate metabolism and ATP synthesis. Green boxes indicate significant up-regulation, red boxes indicate significant down-regulation, and gray boxes indicate no significant change in expression. Gene abbreviations are provided in SI Methods. Only the results for the closest homolog to each D. melanogaster gene (determined by BLAST) are included. Consecutive arrows indicate steps where intermediate reactions are not pictured or the intermediate reactions are unknown.

Figure 3.3 Similarity between gene expression profiles. Venn diagrams (A and B) and dendrogram (C) showing degree of similarity between the gene expression profiles of the Antarctic midge B. antarctica (Ba) and the Arctic springtail M. arctica (Ma) in response to desiccation (D) and cryoprotective dehydration (CD). The numbers of shared and unique up-regulated genes are depicted in A, whereas the numbers of shared and unique down-regulated genes are depicted in B. In C, hierarchical clustering was conducted on the log fold change values for each orthologous gene in each sample.

Chapter 4: Sequencing the Genome of the Flesh Fly, Sarcophaga bullata, Provides a Platform for Physiological Research

Abstract

The flesh fly, Sarcophaga bullata, is a well-established model organism for diapause, development, and stress tolerance and an emerging model for the study of diapause epigenetics. Here we present the assembled genome of this important species and highlight surprising differences between it and other closely related dipterans. We predicted 14,375 protein coding genes and found enrichment of many Gene Ontology categories, some of which relate to stress tolerance and epigenetics. A comparison of life stages and tissues revealed some tissue and life stage differences in transcripts.

4.1 Introduction

Sarcophaga bullata Parker (Diptera: Sarcophagidae), a flesh fly widely distributed across North America (Byrd and Castner, 2009), gives birth to active first instar larvae that are deposited on carrion. This lifestyle exposes the larvae to a plethora of stresses including anoxia, temperature extremes, and pathogens. S. bullata and a sister species, S. crassipalpis, are large flies that are easy to rear in the laboratory, which has made them useful models for diapause, cold tolerance, and other stress responses for more than 40 years (Denlinger, 1972). Substantial literature is devoted to understanding the fly’s diapause (Denlinger, 1972; Henrich and Denlinger, 1982; Rinehart et al., 2007), seasonal acclimation (Adedoken and Denlinger, 1984; Chen et al., 1987), rapid cold hardening (Lee et al., 1987; Chen et al., 1987; Michaud and Denlinger, 2006), and acute cold stress (Joplin et al., 1990; Chen et al., 1990; Yocum et al., 1994). S. bullata also offers a potent model for probing the basis for an interesting maternal effect regulating diapause. If a female of S. bullata enters pupal diapause, none of her progeny are capable of entering diapause, even if reared in a strong diapause-inducing environment (Henrich and Denlinger, 1982; Rockey et al., 1991). This maternal effect has the evolutionary advantage of allowing adults to emerge early in the spring while preventing their progeny from interpreting the short daylengths that prevail at that time as a signal for diapause entry. Later generations are fully capable of responding to short daylengths as a signal for diapause entry. Mounting evidence suggests an important role of epigenetics in regulating this maternal effect in S. bullata (Reynolds et al., 2013).

In addition, S. bullata has been a key species used for pioneering experiments on insect endocrinology, including discovery of the neuropeptide bursicon (Fraenkel and Hsiao, 1964), the multiple endocrine factors regulating the complex behavior associated with puparium formation (Zdarek and Fraenkel, 1970; Zdarek and Fraenkel, 1972), as well as hormones associated with reproduction (Bylemans et al., 1994). S. bullata has also featured prominently in experiments in neurobiology (Mitchell and Itagaki, 1992; Mitchell et al., 1999).

The fact that S. bullata is a favored host of the jewel wasp Nasonia vitripennis (Desjardins et al., 2010; Daneels et al., 2013; Rivers et al., 2011) adds further experimental importance to S. bullata as a model species. Combining the sequencing of S. bullata with genomic information on N. vitripennis offers a powerful platform for studying host-parasitoid interactions (Werren et al., 2010).

The close association of flesh flies with garbage, rotting carcasses, and feces suggests these flies may serve as mechanical vectors of disease (Graczyk et al., 2001). Cases of myiasis in humans and domesticated animals, especially sheep, are well known (James, 1947; Shiota et al., 1990; Morris, 1987), ranking these flies as minor, but sometimes significant, species of medical and veterinary importance.

We anticipate that the sequenced and annotated genome of S. bullata that we present here will facilitate advances using this model system for a range of research topics including diapause, endocrinology, stress tolerance, epigenetics, and parasitology, among others.

4.2 Methods

4.2.1 Source of Flies

Flies used in this project originated from a colony of Sarcophaga bullata collected in Columbus, Ohio by the Denlinger laboratory at Ohio State University and subsequently maintained in culture by Carolina Biological Supply Co. (Burlington, NC). Samples prepared by the Bachtrog laboratory at the University of California, Berkeley, were purchased directly from Carolina Biological Supply and are hereafter referred to as the Carolina strain. Samples prepared by the Werren laboratory at the University of Rochester also were derived from flies purchased from Carolina Biological Supply, but they were maintained for several years in the Werren laboratory prior to use in this study, and are hereafter referred to as the Werren strain.

4.2.2 Sequencing

DNA from a single male and a single female of the Carolina strain was extracted with the DNeasy Blood and Tissue kit (Qiagen, Valencia, CA) in the Bachtrog laboratory; Illumina library preparation and sequencing were completed at the Beijing Genomics Institute.

DNA from three pooled females of the Werren strain was extracted with the DNeasy Blood and Tissue kit in the Werren laboratory. Illumina library preparation and sequencing were completed at the University of Rochester Genomics Research Center and resulted in 21 billion raw bases.

DNA was also extracted from the Werren strain using a Gentra Puregene Tissue kit (Qiagen) with a modified protocol in the Denlinger lab. In short, our protocol differed from the manufacturer’s protocol in the following ways: all references to vortexing in the original protocol were replaced with mixing by tube inversions; incubation with occasional mixing was replaced with incubation and continual mixing in a hybridization oven; and DNA was precipitated using several small aliquots of isopropanol chilled to −20°C instead of a single aliquot of room-temperature isopropanol. An aliquot of DNA was run on a 0.1% agarose gel to check for degradation. DNA was quantified using a Qubit 2.0 Fluorometer and the double stranded DNA high sensitivity kit (Life Technologies, Grand Island, NY) following the manufacturer’s instructions. Samples were sent to the Duke Genome Sequencing and Analysis Core Resource (Durham, NC) for library preparation and sequencing of fifty SMRT cells.

4.2.3 Quality Control

PacBio reads were split into sub-reads and filtered for quality score at the Duke facility. Sub-reads were further filtered by size. FastQC was used throughout to monitor the quality of Illumina reads and identified two issues affecting the raw reads. First, there was an unexpected A/T/G/C distribution in the first few 5’ bases of reads obtained from BGI; this was addressed by trimming off these bases. Second, genomic reads obtained from the Werren strain had an unexpectedly high concentration of certain k-mers at particular places along the read. This was addressed with a custom Perl script that checked for the offending combinations on each read and, if found, trimmed the 5’ end of that read. Reads were further filtered for primer contamination, quality, and size with Trimmomatic (Bolger et al., 2014).

In k-mer correction, the k-mers from reads are created and counted. K-mers that occur at lower than expected frequency are often the result of sequencing errors. It is then possible to correct, trim, or filter reads containing these rare k-mers. Different k-mer filtering approaches were tried with the k-mer filtering program supplied with SOAPdenovo and were assessed based on the quality of the assembly produced (see section 4.2.4). The assembly that was chosen was created from the reads filtered by counting the k-mers of length 19 produced from the Werren reads.
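
The idea behind k-mer filtering can be sketched in a few lines of Python. This is a toy illustration only and is not the algorithm implemented in the SOAPdenovo filtering program; the function names, the count threshold, and the example reads are hypothetical, and k is set to 4 rather than 19 to keep the example small.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def filter_reads(reads, k, min_count):
    """Keep reads in which every k-mer occurs at least min_count times.

    Reads shorter than k contain no k-mers and are kept unchanged.
    """
    counts = count_kmers(reads, k)
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if all(counts[kmer] >= min_count for kmer in kmers):
            kept.append(read)
    return kept


# Three identical reads and one read carrying a single-base difference.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGTTC"]
clean = filter_reads(reads, k=4, min_count=2)
```

The fourth read contains k-mers (CGTT and GTTC) that occur only once in the data set, so it is treated as a likely sequencing error and removed.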

4.2.4 Genome Assembly

Utilizing the reads from three filtering approaches, three assemblers, and different settings, over 250 assemblies were created. Many were discarded based on size and continuity of the assembly. Reapr (Hunt et al., 2013) was used to examine differences among the retained assemblies and to break possible misassemblies. One short read assembly was chosen based on the number of “error free bases” as reported by Reapr. That assembly was produced by the assembler SOAPdenovo2 (Luo et al., 2012) with the following settings: [-d 1 -k 65] during sparse pregraph phase, [-R] during contig phase, [-k 27] during map phase, and [-F -V] during the scaff phase. The chosen assembly was filtered for vector contamination using VecScreen. It was then improved with PBJelly (English et al., 2012) and the PacBio reads using default settings.

4.2.5 Annotation

Gene annotation was accomplished using the MAKER annotation pipeline (Cantarel et al., 2008) to map protein homology data, expressed sequence tag evidence, and ab initio gene predictions to the draft genome. Protein homology data were provided by Swiss-Prot (The UniProt Consortium, 2015). To avoid spurious matches to repetitive regions of the genome, RepeatMasker was used to mask low-complexity regions (Smit et al., 1996-2010). In addition to the included libraries, a custom repeat library for use with RepeatMasker was created with RepeatModeler (Smit and Hubley, 2008-2015), RECON (Bao and Eddy, 2002), RepeatScout (Smit and Hubley, 2008-2015), and TRF (Smit and Hubley, 2008-2010). Filtered RNA-sequencing reads were mapped to the genome with Bowtie2 (Langmead and Salzberg, 2012), junctions were mapped with TopHat (Trapnell et al., 2009), and putative transcripts were assembled with Cufflinks (Trapnell et al., 2010). The output from TopHat and Cufflinks was converted into GFF files and passed to MAKER as expressed sequence tag evidence. An iterative approach with three rounds was used to run MAKER and to train the ab initio predictors SNAP (Korf, 2004) and AUGUSTUS (Stanke et al., 2006). For the first round, SNAP was not used and the included ‘fly’ hidden Markov model was used in AUGUSTUS. In subsequent rounds, gene models predicted in the previous round of MAKER were used to generate hidden Markov models for SNAP and AUGUSTUS.

Functional annotation was accomplished with Blast2GO (Conesa et al., 2005; Conesa and Gotz, 2008). Transcripts predicted by MAKER and the top 20 BLAST hits (blastx, NR, e-value < 1E−5) from the Swiss-Prot database were loaded into Blast2GO, filtering out hits from the same species. GO terms were assigned based on BLAST hits and InterProScan results (Burge et al., 2012) and reduced to GOslim terms.

4.2.6 Comparative Genomics

The gene models produced here were compared to a general dipteran set composed of Drosophila melanogaster, Glossina morsitans, Musca domestica, and Lucilia cuprina. To facilitate this comparison, all gene models were assigned GO terms in the same way as S. bullata. GO enrichment analysis of S. bullata genes was accomplished with a Fisher’s exact test as implemented in Blast2GO.
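
For a single GO term, the enrichment test can be viewed as a one-sided Fisher's exact test on a 2 × 2 table (species by presence of the term), which reduces to a hypergeometric tail probability. The Python sketch below illustrates that calculation under this assumption; it is not the Blast2GO implementation, and the function name and example counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided P value for over-representation of a GO term.

    N: total genes in both sets combined
    K: genes carrying the term in both sets combined
    n: genes in the test set (here, S. bullata)
    k: test-set genes carrying the term
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of all tables at least as extreme as observed.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical toy counts: 10 genes overall, 4 with the term,
# 3 genes in the test set, 2 of which carry the term.
p = fisher_enrichment_p(k=2, n=3, K=4, N=10)
```

With these toy counts the P value is 40/120, or one third. In a real analysis one such test is run per GO term and the resulting P values are corrected for multiple testing.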

4.2.7 Library Specific Expression

RNA reads were mapped to the gene models produced by MAKER using TopHat. Hits were sorted with samtools (Li et al., 2009) and counted with HTSeq (Anders et al., 2014) utilizing the intersection-strict mode. A gene model was considered unique to a library if more than 0.5 reads mapped to it per million reads in the library and no other libraries had any reads that mapped to it.
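
The library-specificity rule described above can be written as a short Python sketch. The data structures, function name, gene names, and read counts below are hypothetical and serve only to illustrate the rule; they are not taken from the original analysis scripts.

```python
def library_specific_genes(counts, library_sizes, min_rpm=0.5):
    """Return {gene: library} for genes detected in exactly one library.

    counts: {gene: {library: mapped_read_count}}
    library_sizes: {library: total_reads_in_library}
    A gene qualifies if a single library has any reads for it and that
    library exceeds min_rpm reads per million.
    """
    unique = {}
    for gene, per_library in counts.items():
        expressed = [lib for lib, n in per_library.items() if n > 0]
        if len(expressed) != 1:
            continue  # reads in zero libraries or in more than one
        lib = expressed[0]
        rpm = per_library[lib] * 1_000_000 / library_sizes[lib]
        if rpm > min_rpm:
            unique[gene] = lib
    return unique


# Hypothetical counts for three gene models across three libraries.
counts = {
    "geneA": {"testes": 30, "ovaries": 0, "larvae": 0},
    "geneB": {"testes": 5, "ovaries": 2, "larvae": 0},
    "geneC": {"testes": 0, "ovaries": 0, "larvae": 4},
}
sizes = {"testes": 20_000_000, "ovaries": 25_000_000, "larvae": 10_000_000}
specific = library_specific_genes(counts, sizes)
```

Here only geneA qualifies: geneB has reads in two libraries, and geneC falls below the 0.5 reads per million threshold (4 reads in 10 million is 0.4).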

4.3 Results and Discussion

4.3.1 Sequencing and Assembly

Illumina sequencing resulted in 428 million reads totaling 40 billion bases (67X coverage). About 83% of the reads were retained after filtering (Table C.1). PacBio sequencing yielded 20 million size-selected reads totaling 14 billion bases (23X coverage).

Different assembly protocols resulted in short read assemblies (SRA) ranging in size from 230 million bases to 3.8 billion bases, i.e., 39% to 630% of the 593 million bases measured by flow cytometry (Meaghan et al., 2014), which are divided among 2n = 12 chromosomes (Bultman and Mezzanote, 1987). Reapr reported surprisingly large differences in quality among SRAs, even for similar approaches (Table C.2). For example, Reapr reported 84% error free bases (efb) in the SRA that was chosen, but reported 56% efb in an SRA whose only difference was how the reads were filtered and 59% efb in one whose only difference was that the parameter k was two lower. This reinforces the idea that experimentation is needed to find the best combination of parameters for a good assembly (Bradnam et al., 2013; Salzberg et al., 2011).
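
The fold coverage values quoted for the Illumina and PacBio data follow from dividing the total sequenced bases by the genome size. The Python sketch below reproduces that arithmetic using the 593 million base flow cytometry estimate; the function and variable names are hypothetical.

```python
GENOME_SIZE_BP = 593e6  # flow cytometry estimate quoted in the text


def fold_coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Average number of times each genomic base is sequenced."""
    return total_bases / genome_size


illumina = fold_coverage(40e9)  # roughly 67.5
pacbio = fold_coverage(14e9)    # roughly 23.6
```

Truncating these values gives the 67X and 23X figures reported for the two platforms.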

The use of Reapr was a good investment of time and computational resources. In addition to providing quality statistics that helped select the best SRA from many seemingly similar ones, Reapr also broke and trimmed the SRAs at sites of likely misassembly, thus resulting in substantive changes to the chosen assembly. Trimming reduced the number of non-gap bases from 439 Mbp to 427 Mbp. Reapr broke many scaffolds, increasing their number from 328K to 345K and reducing the N50 from 11.5 Kbp to 9.3 Kbp (Table C.3).

The PacBio reads were used with PBJelly to fill gaps, extend contigs, and further scaffold the contigs. PBJelly provided substantial improvements in size and continuity. The number of non-gap bases was increased from 427 Mbp to 519 Mbp and the N50 was increased from 9.3 Kbp to 29.5 Kbp (Table C.3).
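
N50, the continuity statistic used in this section, is the length of the shortest sequence in the smallest set of longest sequences that together contain at least half of the assembled bases. A minimal Python sketch of the calculation follows; the function name and the toy scaffold lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold or contig lengths."""
    total = sum(lengths)
    running = 0
    # Accumulate from the longest sequence until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


before_break = n50([10, 8, 6, 4, 2])    # 8
after_break = n50([5, 5, 8, 6, 4, 2])   # 5: longest scaffold split in two
```

The toy example shows why breaking scaffolds at suspected misassemblies lowers N50 even though the total number of bases is unchanged, and why gap filling and scaffolding raise it.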

To help reduce sequencing contamination, the scaffolds were screened against the UniVec database maintained by NCBI. In total, 117 of the sequences had suspected contamination and were either trimmed or removed altogether. This draft was used for annotation. Since scaffolds shorter than 1 Kbp are of little use during annotation, they were ignored in further analysis.

To assess quality of the assembly, two different programs were used to scan for orthologs common to eukaryotes. Both found a high percentage of orthologs, thus indicating a well-assembled genome including most of the protein coding regions. BUSCO found 96% of the searched orthologs. There was evidence for 4% duplication, a relatively small percentage given the challenge of assembling multiple individuals into one assembly. CEGMA reported finding partial matches for >98% of the orthologs and full length matches for 92% of the orthologs.

4.3.2 Annotation

To avoid spurious matches to repetitive regions, RepeatMasker was used to soft mask repetitive regions of the genome. The “melanogaster” repeat library failed to mask any retroelements or DNA transposons. A custom library masked 230K retroelements and 127K DNA transposons. Including interspersed repeats, simple repeats, and other types of repetitive regions, a total of 148 Mbp (31%) were masked (Table C.4). These results line up better with D. melanogaster (29%) (http://www.repeatmasker.org/) and Lucilia cuprina (33%) (Anstead et al., 2015) than with Musca domestica (52%) (Scott et al., 2014). It is unclear if this is further evidence of a unique evolutionary trajectory of M. domestica, as proposed by Scott et al. (2014), or a symptom of higher sensitivity / lower specificity of WindowMasker (Morgulis et al., 2006), since wildly different results have been reported when WindowMasker and RepeatMasker have been run on the same genome (Table 4.1).

We predict 14,375 protein coding genes (summary statistics are given in Table 4.2). This number is similar to the 13,919 protein coding genes found in D. melanogaster (dos Santos et al., 2015) and the 14,180 protein coding genes predicted in M. domestica (Scott et al., 2014). The number of exons per gene is also similar between S. bullata (4.6) and M. domestica (4.4). The most interesting difference is that introns in M. domestica are on average about twice as long (Scott et al., 2014). Intron length in S. bullata is closer to that of D. melanogaster and Glossina morsitans in spite of its genome being closer in size to that of M. domestica (Table 4.3).

4.3.3 Gene Ontology

The S. bullata transcripts had more Gene Ontology (GO) terms assigned to them on average than the dipteran set. This can be seen in the level 2 summary in Table C.5.

The most common GO biological processes were cellular process, single-organism process, and metabolic process. The most common GO molecular functions were binding, catalytic activity, and transporter activity. The most common GO cellular components were cell, organelle, and membrane. These top GO terms were identical between S. bullata and other members of the dipteran set, and indeed much of the entire summary is similar. The only substantial difference is the proportion of genes assigned each term.

All of the significantly enriched GO terms had more genes in S. bullata than expected given the dipteran set (Table C.6). The most interesting of these enriched terms revolve around the stress tolerance axis. Response to stress, cell death, homeostatic process, autophagy, aging, and protein folding are all positively enriched. Given the stress-filled niche occupied by the flesh fly, it is not surprising that S. bullata has evolved substantial defenses. An enriched immune system process is beneficial because a life cycle that brings S. bullata into contact with a carcass also frequently brings it into contact with pathogens. Many epigenetic changes are accomplished by altering the packaging of DNA. Interestingly, some GO terms related to DNA packing are enriched: DNA binding, histone binding, helicase activity, and organization.

4.3.4 Library Specific Expression

Several genes were expressed in only one of our RNA libraries (Table C.7). Unfortunately, most of these genes were assigned neither descriptions nor Gene Ontology terms. Neither of the two female-specific genes was assigned a description. For some genes it is hard to determine the biological relevance of the assigned physiological function. For example, Sbullata00012956-RA, a testes library-specific gene, is similar to polycystin-2, a calcium permeable ion channel, but the meaning of this remains unclear.

Both of the larval-specific genes that were assigned descriptions are biologically relevant. Sbullata00013305-RA has high similarity to sp24d, a serine endopeptidase. Larvae turn on a special endopeptidase to help break down their meat diet, although sp24d in Anopheles gambiae does not appear to be developmentally specific (Han et al., 1997). Sbullata00013747-RA is similar to lcp-22, a cuticular protein found in the larvae of Bombyx mori.

Both of the ovary-specific genes are developmental sequence-specific transcription factors. Sbullata00007528-RA is similar to ca in D. melanogaster, a transcription factor that is involved in determination of cells in the nervous system. Sbullata00010051-RA is similar to ftz, which is required for body segmentation during embryogenesis. It is unusual that ftz was found in the ovaries library and not in the ovaries and embryos library. Its presence in the ovaries library may point to substantial differences in the way segmentation is accomplished in S. bullata and D. melanogaster (Ingham, 1988). In D. melanogaster the ftz gene is only expressed during early development (Kuroiwa, 1984). The most likely explanation for why the ftz homolog was not found in the ovaries and embryos library is that sampling was possibly not done during the critical window for expression.

Both of the identified genes specific to the ovaries and embryos library are protein catabolism enzymes. Sbullata00009475-RA is most similar to sp34, a venom serine protease. The connection between sp34 and embryological development is unclear, but Sbullata00009475-RA is also similar to CG3355 in D. melanogaster. CG3355 is also a serine protease and has also been noted during embryological development (Fisher et al., 2012). Sbullata00014354-RA is similar to carboxypeptidase b from the noble crayfish, Astacus astacus.

4.4 Conclusions

We present the assembled and annotated genome of Sarcophaga bullata, a medically relevant species that is also an important model organism for stress tolerance. The assembly of 522 Mbp represents 88% of the 593 million bases measured by flow cytometry (Meaghan et al., 2014). We predicted and analyzed 14,375 protein-coding genes, which provide insights into the development and evolution of S. bullata. This platform will enhance future studies of diapause, stress tolerance, parasitology, and diapause epigenetics.

4.5 Tables

Table 4.1 Comparison of repeat masking. Comparison of the repetitive content as masked by RepeatMasker and WindowMasker.

Source: Dm, Ag, Am: http://www.repeatmasker.org/; Md: Scott et al., 2014; Lc: Anstead et al., 2015.

Species  RepeatMasker (%)  WindowMasker (%)
Drosophila melanogaster  28.61  28.92
Anopheles gambiae  14.1  21.52
Apis mellifera  6.21  40.93
Musca domestica  52
Lucilia cuprina  *57.8
Sarcophaga bullata  31.15

Table 4.2 Summary of genic content

         Count (#)  Mean (bp)  Total (bp)
Gene     14375      9814.97    141090148
5'-UTR   13741      274.63     3773750
Exon     66485      421.85     28046913
Intron   52110      1989.80    103688416
3'-UTR   10594      526.81     5581069

Table 4.3 Comparison of exon and intron content

Source: Dm: dos Santos et al., 2015; Gm: International Glossina Genome Initiative, 2014; Md: Scott et al., 2014; Lc: Anstead et al., 2015.

Species                  Exon Number  Exon Length (bp)  Intron Number  Intron Length (bp)
Drosophila melanogaster  77,682       539               58,537         1,700
Glossina morsitans       63,000       475               52,000         1,600
Musca domestica          67,886       431               52,875         3,889
Sarcophaga bullata       66,485       422               52,110         1,989
Lucilia cuprina                       432                              2,560

Chapter 5: Combined Transcriptomic and Metabolomic Approach Uncovers Molecular Mechanisms of Cold Tolerance in a Temperate Flesh Fly

Citation

This chapter has previously been published. The work was done in a collaborative setting and represents the work of several people. I led the data analysis, helped interpret the analysis, and helped prepare the manuscript.

Teets, N. M., Peyton, J. T., Ragland, G. J., Colinet, H., Renault, D., Hahn, D. A., & Denlinger, D. L. Combined transcriptomic and metabolomic approach uncovers molecular mechanisms of cold tolerance in a temperate flesh fly. Physiol Genomics 44, 764-777 (2012).

Abstract

The ability to respond rapidly to changes in temperature is critical for insects and other ectotherms living in variable environments. In a physiological process termed rapid cold-hardening (RCH), exposure to nonlethal low temperature allows many insects to significantly increase their cold tolerance in a matter of minutes to hours. Additionally, there are rapid changes in gene expression and cell physiology during recovery from cold injury, and we hypothesize that RCH may modulate some of these processes during recovery. In this study, we used a combination of transcriptomics and metabolomics to examine the molecular mechanisms of RCH and cold shock recovery in the flesh fly, Sarcophaga bullata. Surprisingly, out of ~15,000 expressed sequence tags (ESTs) measured, no transcripts were upregulated during RCH, and likewise RCH had a minimal effect on the transcript signature during recovery from cold shock. However, during recovery from cold shock, we observed differential expression of ~1,400 ESTs, including a number of heat shock proteins, cytoskeletal components, and genes from several cell signaling pathways. In the metabolome, RCH had a slight yet significant effect on several metabolic pathways, while cold shock resulted in dramatic increases in gluconeogenesis, amino acid synthesis, and cryoprotective polyol synthesis. Several biochemical pathways showed congruence at both the transcript and metabolite levels, indicating that coordinated changes in gene expression and metabolism contribute to recovery from cold shock. Thus, while RCH had very minor effects on gene expression, recovery from cold shock elicits sweeping changes in gene expression and metabolism along numerous cell signaling and biochemical pathways.

5.1 Introduction

Due to a small body size and ectothermic nature, adaptations for surviving low temperature are a critical component of an insect’s physiology. In many cases, the overwintering physiology of an insect limits its potential range, particularly in the face of a changing climate (Bale and Hayward, 2010). While insects undergo many gradual biochemical and physiological changes in anticipation of winter (Lee, 2010), they are also capable of responding to low temperature on much shorter time scales. In a process termed rapid cold-hardening (RCH), insects can significantly enhance their cold tolerance in response to brief (i.e., minutes to hours) ecologically relevant chilling exposure (Kelty and Lee, 1999; Lee et al., 1987). Furthermore, during recovery from a cold challenge, insects undergo a number of physiological changes to repair cellular damage, for example by synthesizing heat shock proteins (Hsps) to repair misfolded proteins (Colinet et al., 2010).

In recent years, several studies have begun to unravel the cellular and molecular mechanisms associated with RCH. Evidence suggests that RCH is triggered at the cellular level by signaling events including p38 MAP kinase (Fujiwara and Denlinger,

2007) and calcium signaling (Teets et al., 2008). Downstream of these signaling events, several physiological changes contributing to RCH have been uncovered. In the flesh fly,

Sarcophaga crassipalpis, there is a slight elevation of the cryoprotectant glycerol in response to RCH, although the amount of glycerol appears to be too low to be a major driver of RCH (Lee et al., 1987). In both Drosophila melanogaster and S. crassipalpis,

RCH increases the proportion of unsaturated fatty acids in the cell membrane (Michaud and Denlinger, 2006; Overgaard et al., 2005; cf. MacMillan et al., 2009) and inhibits apoptotic pathways to prevent cell death (Yi and Lee, 2011; Yi et al., 2007).

Additionally, a recent proteomics study indicated that three different proteins, including a small Hsp, are more abundant in the brains of flesh flies exposed to RCH (Li and Denlinger, 2008). However, despite these recent advances, many of the underlying mechanisms of RCH remain unknown.

In D. melanogaster, three separate microarray studies have measured gene expression either in response to artificial selection for cold resistance (Telonis-Scott et al., 2009) or in direct response to various cold exposures (Qin et al., 2005; Zhang et al.,

2011). These experiments elucidated many key players in the molecular response to cold, including the multitude of Hsps involved in cold recovery (Qin et al., 2005) and possible cross talk between environmental stress signals and immune pathways (Zhang et al.,

2011). While Qin et al. (2005) intended to measure gene expression during RCH, they allowed a 30 min recovery after hardening, so changes in gene expression could not be directly attributed to the hardening period. To our knowledge, the only study to measure gene expression during cold hardening is that of Sinclair et al. (2007), who measured the expression of five candidate genes in D. melanogaster during RCH.

They failed to detect any expression differences in their candidate genes during hardening

(although some were differentially expressed during recovery) and hypothesized that

RCH does not require the synthesis of new gene products. However, gene expression changes during RCH have yet to be examined on a genome-wide scale. Also, since D. melanogaster is considerably less cold tolerant than its temperate counterparts (Hoffman,

2010), similar studies in cold-adapted species are necessary to fully grasp molecular adaptations to low temperature. The completion of an expressed sequence tag (EST) library for S. crassipalpis (Hahn et al., 2009) has made such work possible for

sarcophagid flies, which have long been a model for cold tolerance research (Adedokun and Denlinger, 1984; Chen et al., 1991). Recent work has described transcriptional changes associated with overwintering dormancy in S. crassipalpis (Ragland et al.,

2010), but no large-scale transcriptomic studies of acute cold stress have been conducted in this group.

In this study, we explore the molecular mechanisms of RCH and cold shock recovery in Sarcophaga bullata using a combined transcriptomic and metabolomic approach. Our custom microarray platform allowed us to simultaneously measure the expression of ~15,000 transcripts in response to cold. Additionally, using a targeted GC-MS approach, we tracked the levels of 35 metabolites in response to the same treatments.

Our experimental design permitted us to address the following hypotheses: 1) RCH causes changes in gene expression and/or metabolism during the hardening period; 2)

Recovery from cold shock elicits changes in gene expression and/or metabolism to repair cellular damage; and 3) RCH conditions alter gene expression and/or metabolite composition during recovery from cold shock. Our results indicate that while differential gene expression is not a major contributor to RCH, RCH does have a significant effect on specific metabolic pathways. Furthermore, our results identify a number of genes and metabolites that are rapidly elevated during recovery from cold shock.

5.2 Methods

5.2.1 Animals


Flesh flies, S. bullata, were reared at 25°C under a 16 h/8 h light-dark cycle according to Denlinger (1972). Red-eye pharate adults were used for all experiments.

5.2.2 Experimental Conditions

For the microarray experiments, pharate adult flies were exposed to the following temperature conditions: control (maintained at 25°C, Fig. 5.1A), RCH (exposed to 0°C for 2 h, Fig. 5.1A), cold shock + 2 h recovery (CS+2R: -10°C for 2 h followed by 25°C for 2 h; Fig. 5.1B), and RCH + cold shock + 2 h recovery (RCH+CS+2R: 0°C for 2 h, -10°C for 2 h, followed by 25°C for 2 h; Fig. 5.1C). For metabolomics experiments, flies were exposed to the same four treatments as well as two additional treatments with 24 h recovery (CS+24R and RCH+CS+24R; Fig. 5.1, B and C). Immediately after treatment, flies were snap-frozen in liquid nitrogen and stored at -70°C until RNA and metabolite extraction. For the microarray experiments, we collected six biological replicates for each treatment, while the metabolomics experiments consisted of 10 replicates for each treatment.

5.2.3 Microarray Data Acquisition

For each RNA sample, four flies were removed from storage at -70°C and immediately homogenized together in 4 ml of Tri reagent (Ambion, Carlsbad, CA) with a ground glass homogenizer. From each homogenate, 1 ml was removed, and RNA was purified using the RiboPure Kit (Ambion) according to the manufacturer’s protocol. Total

RNA was quantified using a NanoDrop spectrophotometer (Thermo Fisher Scientific,


Waltham, MA), and the integrity was checked on an Agilent 2100 Bioanalyzer (Agilent,

Santa Clara, CA). Starting with 500 ng of total RNA, Cy3- and Cy5-labeled

cRNA was generated with the Agilent Low RNA Input Linear Amplification Kit

(Agilent). The labeled samples were hybridized to custom Agilent 4 X 44K arrays containing a previously designed probe set from a closely related species, S. crassipalpis

(Ragland et al., 2010). Although the probe set was designed for a different species, MA plots suggested a distribution of fold change and intensity values similar to those observed when similar arrays were hybridized to S. crassipalpis. Also, targeted analyses in our lab have revealed very little sequence difference between S. crassipalpis and S. bullata. Arrays were scanned on an Agilent G2505B scanner, and data were extracted using Feature Extraction 9.5 software (Agilent). For each treatment comparison, six biological replicates were run; however, several failed quality control, so we report the results from four independent arrays for each comparison. Because the 16 arrays did not contain the same four replicates of each treatment, our dataset includes five replicates of control flies, four replicates of RCH flies, five replicates of

CS+2R flies, and five replicates of RCH+CS+2R flies. The hybridization scheme is depicted in Fig. 5.1D.

5.2.3 Metabolomics Data Acquisition

Individual frozen flies were homogenized in 750 µl of cold (-20°C) 2:1 methanol-chloroform using a tungsten bead-beating apparatus (Retsch MM301; Retsch, Haan, Germany) at 25 Hz for 1.5 min. After homogenization, 500 µl of ice-cold water was added to each sample to separate an upper aqueous phase from a lower nonpolar phase. Two aliquots (60 and 180 µl) of the upper phase were transferred to chromatographic vials and vacuum dried using a Speed Vac Concentrator (MiVac; Genevac, Ipswich, UK). The 60 µl aliquots were used to quantify the few metabolites that surpassed the upper detection limit of the equipment in the larger-volume sample. The dried samples were doubly derivatized by first suspending the sample in 30 µl of 20 mg/ml methoxyamine hydrochloride (Sigma-Aldrich, St. Louis, MO) and heating for 90 min at 40°C, followed by the addition of 30 µl of N-methyl-N-(trimethylsilyl) trifluoroacetamide (Sigma-Aldrich) and heating for 45 min at 40°C. This on-line derivatization process was conducted with a CTC CombiPal autosampler (GERSTEL, Mülheim an der Ruhr, Germany), which standardized the derivatization process and ensured that each sample was derivatized for an identical amount of time.

To identify and quantify metabolites, samples were injected into a GC-MS consisting of a Trace GC Ultra chromatograph and a Trace DSQII quadrupole mass spectrometer (Thermo Fisher Scientific). We injected 1 µl of each sample into the GC using the splitless mode (Hanson and Reshef, 1997; Abdrakhamanova et al., 2003), and the oven temperature was increased from 70 to 170°C at 5°C/min and from 170 to 310°C at 7°C/min, then held at 310°C for 3 min. We used a fused silica column (TR5 MS, I.D. 25 mm, 95% dimethyl siloxane, 5% phenyl polysilphenylene-siloxane), with helium as the carrier gas at a flow rate of 1 ml/min. MS detection was achieved using electron impact. The ion source temperature was set to 250°C, and the MS transfer line to 300°C. The order of injection was randomized to prevent bias due to machine drift. Compounds were identified in the MS using a selective ion mode (electron energy: ~70 eV) to search only for ions that matched metabolites in our database of 60 pure reference compounds. The quantity of each metabolite was determined using quadratic calibration curves drawn from pure compounds run at 11 different concentrations ranging from 10 to 3,000 µM.

Concentrations were also corrected relative to an internal standard, arabinose, to correct for any sample loss during extraction or injection. Finally, all concentrations were divided by the fresh mass of the individual.
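The quantification steps described above (inverting a quadratic calibration curve, correcting against the internal standard, and dividing by fresh mass) can be sketched in Python as follows. This is an illustration under stated assumptions, not the software used in this study: the function names, calibration coefficients, and peak areas are hypothetical, the calibration curve is assumed to have the form area = a·c² + b·c + c0, and the arabinose correction is assumed to be a simple ratio of expected to measured signal.

```python
import math


def invert_quadratic_calibration(area, a, b, c0):
    """Solve area = a*c**2 + b*c + c0 for the concentration c.

    The coefficients are hypothetical stand-ins for a curve fitted to the
    pure-compound standards. The root on the increasing branch of the
    curve is returned, which assumes the detector response rises with
    concentration over the calibrated range.
    """
    if a == 0:
        return (area - c0) / b  # curve degenerates to a straight line
    discriminant = b * b - 4.0 * a * (c0 - area)
    if discriminant < 0:
        raise ValueError("peak area lies outside the calibration curve")
    return (-b + math.sqrt(discriminant)) / (2.0 * a)


def normalize_content(concentration, is_expected, is_measured, fresh_mass_mg):
    """Correct for internal-standard recovery, then divide by fresh mass.

    Scaling by is_expected / is_measured is an assumed form of the
    arabinose correction; the text does not give the exact formula.
    """
    recovery_corrected = concentration * (is_expected / is_measured)
    return recovery_corrected / fresh_mass_mg


# Hypothetical numbers, for illustration only.
conc = invert_quadratic_calibration(area=1905.0, a=-1e-4, b=2.0, c0=5.0)
content = normalize_content(conc, is_expected=100.0, is_measured=80.0,
                            fresh_mass_mg=50.0)
```

Converting the result to nmol/mg would additionally require the extract volume, which this sketch does not model.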

5.2.4 Data Analysis

Microarray data were processed and normalized using the limma package for R

(Smyth, 2004). The data were background corrected and normalized within arrays using a lowess approach. Additionally, to standardize intensities across arrays, we conducted a between array normalization with the “scale” method. After normalization, data integrity was checked using a combination of MA plots, box plots, and red-green intensity plots.

Replicate probes were collapsed by taking the average M value for each spot. To find differentially expressed probes, we used the limma pipeline to fit a linear model and compute empirical Bayes statistics from the linear contrasts. P values were adjusted using the Benjamini and Hochberg method (Benjamini and Hochberg, 1995) to control the false discovery rate (FDR). For the top 150 differentially expressed genes, we constructed a heat map of the expression ratios using JMP 9 (SAS, Cary, NC). The microarray data are deposited in the National Center for Biotechnology Information Gene Expression Omnibus database, accession number GSE36483.
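For readers unfamiliar with the Benjamini and Hochberg adjustment mentioned above, the following Python sketch shows the step-up procedure on a plain list of P values. It is a minimal illustration, not the limma code used for the analysis; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the same order as the input."""
    n = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so that each adjusted value is
    # no larger than the adjusted value of the next-larger P value.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for four probes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in adjusted]
```

A probe is then called differentially expressed when its adjusted value falls below the chosen FDR threshold (0.05 in this chapter).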

To conduct the multivariate analyses described below, the normalized probe intensities for each sample were averaged across replicate probes, and the resulting intensities were averaged across technical replicates of the same individual. The intensities were log2 transformed prior to data analysis. Principal components analysis

(PCA) was conducted using the R function prcomp, while hierarchical clustering of the phenotypic classes was computed using the Ward method in JMP 9. To test for enriched functional categories in our dataset, Sarcophaga ESTs were mapped to their D. melanogaster homologs, and lists of significantly (FDR < 0.05) differentially expressed genes were submitted to the DAVID functional annotation database

(http://david.abcc.ncifcrf.gov) (Huang et al., 2009). Specifically, we used the functional annotation clustering tool in DAVID, which finds overrepresented Gene Ontology (GO) terms (Ashburner et al., 2000) and places these GO terms into nonredundant clusters. We also tested for enriched KEGG pathways using the R package GSA (Efron and

Tibshirani, 2007). Additionally, gene sets were generated from lists of differentially expressed genes from previous microarray studies, and these a priori lists were also tested for enrichment using GSA. Whereas DAVID simply tests for enrichment in a defined list of genes, GSA takes into account expression values for each probe in calculating the enrichment score. For GSA, we used full, nonredundant log2 transformed data and performed 1,000 permutations to estimate the FDR.
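The preprocessing described at the start of this paragraph (averaging replicate probes, then log2 transforming the intensities) can be illustrated with a short Python sketch. The function name, probe identifiers, and intensity values are hypothetical, and the sketch covers only a single sample; it is not the code used for the analysis.

```python
import math
from collections import defaultdict


def collapse_and_log2(spots):
    """Average the intensities of replicate probes, then log2 transform.

    `spots` is a list of (probe_id, normalized_intensity) pairs for one
    sample; replicate probes share the same probe_id.
    """
    grouped = defaultdict(list)
    for probe_id, intensity in spots:
        grouped[probe_id].append(intensity)
    return {
        probe_id: math.log2(sum(values) / len(values))
        for probe_id, values in grouped.items()
    }


# Hypothetical intensities: est_1 is spotted twice, est_2 once.
sample = [("est_1", 6.0), ("est_1", 10.0), ("est_2", 32.0)]
log_intensities = collapse_and_log2(sample)
```

Averaging technical replicates of the same individual would follow the same pattern, grouping by individual rather than by probe.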


To analyze the metabolomics data, metabolite contents expressed as nmol/mg fresh mass were first log2 transformed. Metabolite quantities were compared across the six treatments using ANOVA and a pooled t-test in JMP 9, and the resulting P values were adjusted using the Benjamini and Hochberg method (Benjamini and Hochberg,

1995) to control the FDR at 0.05. PCA and hierarchical clustering on the phenotypic classes were conducted as before. To identify metabolic pathways associated with our treatments, metabolite pathway enrichment analysis was conducted using MetaboAnalyst

(http://www.Metaboanalyst.ca), a web-based platform for metabolomics data analysis

(Xia et al., 2009).
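As a rough illustration of the per-metabolite comparison described above, the following Python sketch log2 transforms metabolite contents and computes a one-way ANOVA F statistic across treatment groups. The analysis in this chapter was run in JMP 9; the function name and the example values here are hypothetical, and the sketch stops at the F statistic rather than computing a P value.

```python
import math


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - mean) ** 2 for group, mean in zip(groups, means) for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical contents (nmol/mg) for one metabolite in three treatments.
raw = [[2.0, 4.0, 8.0], [4.0, 8.0, 16.0], [32.0, 64.0, 128.0]]
logged = [[math.log2(x) for x in group] for group in raw]
f_stat, df_b, df_w = one_way_anova_f(logged)
```

The P value for each metabolite would then be obtained from the F distribution with the returned degrees of freedom and adjusted across metabolites to control the FDR.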

5.3 Results and Discussion

5.3.1 RCH Significantly Enhances Cold Tolerance

To verify that our strain of S. bullata exhibited the RCH phenotype, we measured the effects of RCH on the cold tolerance of pharate adult flies (i.e., flies that have completed ~75% of adult development but have yet to eclose from the puparium) (Lee et al., 1987). In this experiment, the criterion for survival was successful eclosion, which occurred ~5–6 days after the experiment. While ~80% of flies survived a 2 h cold shock at -8°C, only 20% survived at -9°C, and none successfully emerged following 2 h at -

10°C (Figure 5.2). However, when flies were exposed to 0°C for 2 h prior to being cold shocked, nearly all survived -9°C and ~50% survived -10°C. Thus, for these experiments, we selected -10°C as our cold shock temperature, since at this temperature RCH allowed significant survival at a temperature that is normally 100% lethal. While no flies emerge

as adults after experiencing a -10°C cold shock, all were still able to continue adult morphogenesis, so flies sampled 2 h after cold shock were clearly still alive.

5.3.2 RCH Has Very Little Effect on the Transcriptome

Our treatment design and microarray hybridization scheme (Figure 5.1) allowed us to test two separate hypotheses regarding the effects of RCH on gene expression: 1)

RCH induces or turns off specific genes during the hardening period (i.e., the 2 h at 0°C), and 2) RCH alters the transcriptional signature during recovery from cold shock.

However, we found very little evidence in support of either hypothesis. In the Control vs.

RCH comparison, no ESTs were differentially expressed between the two groups (Table

5.1). In addition, while numerous ESTs were differentially expressed during recovery from cold shock (see below), RCH had little effect on gene expression during recovery; a direct comparison of the CS+2R and RCH+CS+2R treatments showed that only five

ESTs were differentially expressed between the two treatments, none of which differed by >33%. Of these five ESTs, three were downregulated cell signaling genes

(Arrestin-1, TNF-receptor-associated factor 4, and CG10737), perhaps indicating an RCH-mediated shutdown of certain cell signaling events. In particular, TNF-receptor-associated factor 4 is a positive regulator of apoptosis (Cha et al., 2003); thus, significant downregulation of this transcript in the RCH+CS+2R group relative to the CS+2R group may reflect inhibition of apoptosis by RCH (Yi et al., 2007).

While RCH had little effect on a gene-by-gene comparison, we also conducted several multivariate analyses in an attempt to reveal subtle differences in gene expression

caused by RCH. A heat map diagram of the top 150 differentially expressed ESTs, as determined by ranking the F-statistics for each probe, separated our four treatment groups into four distinct clades, indicating that RCH causes some differences in expression patterns among the most labile genes (Figure 5.3). However, when the entire dataset was considered, we were once again unable to detect differences attributable to RCH. Both

PCA and hierarchical clustering of the entire dataset produced similar results, in that the phenotypes form two distinct clusters: a cluster consisting of control and RCH samples, and a cluster consisting of CS+2R and RCH+CS+2R samples (Figure 5.4). Finally, using gene set analysis (GSA), we attempted to discover subtle, coordinated changes in gene expression in response to RCH to 1) test for enrichment of specific KEGG pathways and

2) test whether there were detectable similarities between our dataset and expression patterns from previous microarray studies of cold and other stressors using a priori gene lists from those published reports (Efron and Tibshirani, 2007). Because GSA tests for enrichment across entire pathways or lists of genes, it has the capability to detect differential expression of pathways even in the absence of major changes in individual genes. However, when comparing control vs. RCH and CS+2R vs. RCH+CS+2R we found no gene sets were enriched. Thus, at both the individual gene and pathway level,

RCH had very little effect on gene expression, both during the hardening period and during recovery from cold shock.

In plants, a number of cold-related genes are upregulated within minutes of transfer to low temperature (Thomashow, 1999), so we suspected some genes may be differentially expressed during RCH. However, in D. melanogaster, RCH does not

appear to be associated with changes in gene expression (Sinclair et al., 2007) and can occur even when protein synthesis is blocked with cycloheximide (Misener et al., 2001).

Because RCH occurs so rapidly, there may not be ample time to synthesize new gene products during hardening, particularly at low temperatures. This idea is supported by a proteomics study of the wasp Aphidius colemani, where relatively few proteins were upregulated during cold exposure, while nearly 1⁄3 of the proteome changed during recovery (Colinet et al., 2010). Similarly, in the linden bug, Pyrrhocoris apterus, hsp70 is expressed at very low levels during cold exposure but is rapidly upregulated upon return to room temperature (Kostál and Tollarova-Borovanska, 2009). However, even more puzzling is the failure of RCH to alter expression patterns during recovery from cold shock. RCH triggers a number of signaling pathways, including MAP kinase signaling (Fujiwara and Denlinger, 2007) and apoptosis signaling (Yi and Lee, 2011), thus we hypothesized this would be reflected in the gene expression profiles upon return to ambient temperature. However, of the ~1,400 differentially expressed ESTs during recovery from cold shock (Table 5.1; see discussion below), only five ESTs were significantly changed by RCH. These results indicate that the existing cellular machinery is sufficient to carry out RCH, suggesting that second messenger systems and other posttranslational processes are the likely drivers of RCH.

5.3.3 Recovery from Cold Shock Has a Large Effect on the Transcriptome

While RCH had little effect on gene expression in S. bullata, we identified 1,378

ESTs that were differentially expressed in the Control vs. CS+2R comparison (Table 5.1,


Table C.1). Of these, 111 were up- or downregulated by at least 1.5-fold. The results for the Control vs. RCH+CS+2R comparison were remarkably similar; 1,525 ESTs were differentially expressed, including 134 that were up- or downregulated by at least 1.5-fold (Table 5.1, Table C.2). This, in combination with the results discussed above, indicates that recovery from cold shock is the primary driver of differential gene expression in our treatments. Because the gene expression profiles of CS+2R and RCH+CS+2R flies were largely similar, we will discuss the results of the Control vs. CS+2R and Control vs.

RCH+CS+2R comparisons together but for simplicity will refer to specific results from only the Control vs. CS+2R comparison.
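The 1.5-fold cutoff used above can be expressed on the log2 ratio (M value) scale that limma reports. The following Python sketch is illustrative only: the function name and example M values are hypothetical, and it assumes that the cutoff was applied symmetrically to up- and downregulated ESTs.

```python
import math


def passes_fold_threshold(m_value, fold=1.5):
    """Return True if a log2 ratio corresponds to at least `fold` change.

    A 1.5-fold change in either direction corresponds to an absolute
    log2 ratio of log2(1.5), roughly 0.585.
    """
    return abs(m_value) >= math.log2(fold)


# Hypothetical M values for three ESTs.
calls = [passes_fold_threshold(m) for m in (0.5, -0.6, 1.96)]
```

Under this convention, an EST must be both statistically significant after FDR adjustment and pass the fold-change threshold to be counted among the most strongly regulated transcripts.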

Using the DAVID functional annotation tool (Huang et al., 2009), we tested for enriched GO terms in our list of differentially expressed genes (Tables 5.2 and 5.3). Of note, we found several enriched GO terms related to cytoskeletal organization and cell shape. Because the cell membrane is one of the major sites of cold shock damage

(Cossins, 1983), recent evidence suggests that changes in the actin cytoskeleton are an essential component of cold-hardening and repair of cold damage. In mosquitoes, for example, two actin genes are upregulated in preparation for winter, and cold shock induces a reorganization of actin fibers in the midgut (Kim et al., 2006). Similar effects of cold on the cytoskeleton have been observed in plants (Abdrakhamanova et al., 2003), fish (Detrich et al., 1989), and even mammals (Al-Fageeh and Smales, 2006), suggesting a central role for the actin cytoskeleton during cold stress. In our dataset, we found differential expression of 55 ESTs related to the GO biological process “actin cytoskeleton organization,” indicating that cytoskeletal reorganization is also an

important component of cold shock recovery in S. bullata. Other noteworthy results from the DAVID analysis are 14 ESTs related to the heat shock response and 32 transcripts associated with programmed cell death, supporting the critical role of apoptotic cell death during cold shock injury (Yi and Lee, 2011; Yi et al., 2007).

We also identified several enriched KEGG pathways and a priori gene sets by using GSA (Tables 5.4 and 5.5). In particular, we identified several biochemical pathways (discussed below) and cell signaling pathways that were enriched during recovery from cold shock. Some of these cell signaling pathways, including Jak/STAT signaling and insulin signaling, have yet to be implicated in the response to cold.

Jak/STAT signaling does have known immune functions (Arbouzova and Zeidler, 2006), perhaps explaining in part the overlap between cold stress and immune signaling (Zhang et al., 2011). Others, such as Wnt signaling and dorso-ventral axis formation, are predominantly considered developmental pathways. However, recent research in

Drosophila is revealing that many embryonic and developmental signaling pathways are co-opted for other functions during later stages of development. For example, Wnt signaling participates in hindgut regeneration in adults (Takashima et al., 2008), while the primary regulators of dorso-ventral axis formation are also key regulators of the fungal immune response later in life (Lemaitre et al., 1996). One enriched signaling pathway that is particularly promising is phosphatidylinositol signaling; components of this signaling system have been implicated as regulators of apoptosis during oxidative stress in Drosophila (Terhzaz et al., 2010). However, further experiments are needed to validate the exact function of these signaling pathways during cold shock recovery in S. bullata.


In addition to enrichment of several KEGG categories, GSA also revealed congruence between our dataset and several other transcriptomic datasets from the literature. In particular, during recovery from cold shock, there was significant enrichment of genes involved in the Drosophila response to cold (Qin et al., 2005), with several genes being significantly upregulated in both studies (Table 5.6). In addition, there was strong enrichment of genes involved in the Drosophila response to hypoxia

(Liu et al., 2006), hyperoxia (Landis et al., 2004), oxidative stress (Girardot et al., 2004), and heat (Sørensen et al., 2005), suggesting that these different environmental stresses have overlapping transcriptional signatures. Indeed, cold exposure directly causes oxidative stress in insects (Lalouette et al., 2011), which likely explains the similarities in gene expression between cold and various forms of oxygen stress. The genes responsible for much of this overlap were the heat shock proteins, which are known to be involved in a number of environmental stresses (Feder and Hofmann, 1999). Examples of other genes upregulated both in our cold shock recovery treatment and during other forms of environmental stress include: phosphoenolpyruvate carboxykinase (PEPCK), an important metabolic regulator (see below); hairy, a transcription factor that regulates metabolism during hypoxic stress (Zhou et al., 2008); and DGP-1, a translation elongation factor that functions during periods of oxidative stress (Girardot et al., 2004).

Tables summarizing the expression of these stress-related genes during cold shock recovery are provided in Tables D.3–D.6. Interestingly, a gene set derived from a recent study on repeated cold exposure in D. melanogaster (Zhang et al., 2011), which included a single cold exposure treatment similar to our CS+2R treatment, showed no significant

enrichment in our data. However, that study used a milder temperature regime and a longer recovery period (6 h), which could explain the lack of overlap.

5.3.4 RCH Has a Significant Impact on Several Metabolic Pathways

While RCH had no detectable effect on gene expression, we did observe several metabolic changes attributable to RCH. During the 2 h at 0°C, there was a significant increase in two glycolytic intermediates, glucose-6-phosphate (45% increase) and fructose-6-phosphate (9% increase; Figure 5.5, Table D.7); these were the only two metabolites that changed during RCH. This suggests a rapid shift from aerobic metabolism to glycolysis/gluconeogenesis during RCH, perhaps to begin the process of diverting carbon flow toward cryoprotectant synthesis (Kostál et al., 2004). Similar observations were made by Michaud and Denlinger (2007) and Overgaard et al. (2007) in S. crassipalpis and D. melanogaster, respectively, indicating that a shift toward glucose production may be a general feature of RCH. In addition, S. crassipalpis undergoes several other changes during RCH, including elevation of the cryoprotectants glycerol and sorbitol. However, it is worth noting that Michaud and Denlinger (2007) used 8 h at 4°C as their RCH treatment; thus, these flies had a much longer time at a milder temperature to carry out these biochemical changes.

In addition to changes in metabolism during RCH, RCH also altered the metabolic signature of flies during recovery from cold shock. During recovery from cold shock, we observed increases in numerous compounds, including nearly every amino acid, sugar, and polyol that we measured (Figure 5.5, Table D.7). While a number of

these changes may be adaptive to help repair cold injury (such as synthesis of cryoprotectants to protect damaged membranes and proteins), some of these changes may simply reflect significant protein breakdown and an inability to maintain homeostasis due to cold shock damage. However, for several compounds, RCH dampened the increase, perhaps reflecting a homeostasis-preserving function for RCH. For example, after a 2 h recovery from cold shock, flies exposed to RCH had 47% less glucose, 25% less glycerol, and 46% less sorbitol. These differences resulted in RCH+CS+2R flies having significantly lower total levels of both sugars and polyols than their CS+2R counterparts

(Fig. 5.5, Table D.7). In Drosophila, long-term cold acclimation helps preserve metabolic homeostasis following cold shock (Colinet et al., 2012), and it appears that short-term

RCH has a similar effect in Sarcophaga. This effect was not as pronounced after 24 h of recovery, but nonetheless this evidence suggests that RCH helps preserve homeostasis immediately following cold shock.

To help put the above metabolic changes into context, we also conducted metabolic pathway enrichment analysis on our metabolomics data set. In a similar manner to GSA, metabolic pathway analysis looks for coordinated changes in metabolites that belong to the same pathway. In the control vs. RCH comparison, several metabolic pathways were enriched, including glycolysis/gluconeogenesis, amino sugar and nucleotide sugar metabolism, and starch and sucrose metabolism (Table 5.7).

Interestingly, in the CS+2R vs. RCH+CS+2R comparison, despite a number of differences in individual metabolites, no specific pathway enrichment was detected, meaning that the observed metabolite differences were not coordinated along an entire

pathway. In contrast, after 24 h of recovery, there were several metabolic pathways that showed differences between CS+24R and RCH+CS+24R individuals, notably several pathways related to sugar and amino acid metabolism. One pathway that was enriched both during RCH and during recovery, the pentose phosphate pathway (also referred to as the hexose monophosphate shunt), was previously shown to provide some of the energy and reducing equivalents for cryoprotectant synthesis (Storey and Storey, 1990). Thus, the impact of RCH on this pathway may be particularly important for increased cold tolerance. While these changes need to be explored in more detail to determine their adaptive benefits, they do demonstrate that RCH exerts a direct effect on certain metabolic pathways, both during hardening and during recovery from cold shock.

5.3.5 Recovery from Cold Shock Elicits Sweeping Changes in Metabolite Content

Although RCH had substantial effects on the metabolome, the effects of cold shock and recovery on the metabolome were even more dramatic (Figs. 5.5 and 5.6,

Table D.7). Using both PCA and hierarchical clustering (Fig. 5.6), our metabolomics data form three distinct clusters: a cluster consisting of control and RCH flies, a cluster consisting of CS+2R and RCH+CS+2R flies, and a cluster consisting of CS+24R and

RCH+CS+24R flies. Thus, similar to the gene expression data, recovery from cold shock was the major driver of differences in metabolite content. As mentioned above, recovery from cold shock caused an increase in almost every metabolite we measured; out of 24 total amino acids, sugars, and polyols in our dataset, 22 were elevated by 24 h of recovery from cold shock (Figure 5.5, Table D.7). Because nearly every compound

changed, the results of metabolic pathway analysis were not particularly informative, since nearly every pathway showed evidence of enrichment relative to control samples

(Table D.8). One interesting finding from these data is the presence of a multiple-component cryoprotectant system in S. bullata. While the importance of glycerol as a cryoprotectant has been well established in S. bullata (Yoder et al., 2006), sorbitol and inositol were both more abundant than glycerol in our samples. Of all the metabolites we measured, sorbitol showed the most dramatic changes; CS+24R flies showed a 96-fold increase in sorbitol levels compared with control, an increase in sorbitol content from 6.2

± 0.6 to 608 ± 66.9 pmol/mg. However, despite the dramatic accumulation of sorbitol and other cryoprotectants, polyol levels in response to cold shock were much lower than those observed in overwintering individuals, both within S. bullata and compared with other species. For example, diapausing pupae of S. bullata accumulate ~450X more glycerol than CS+24R flies in our study (Chen et al., 1991), while another temperate dipteran,

Eurosta solidaginis, accumulates sorbitol at levels ~250X greater than CS+24R flies

(Storey et al., 1981). Thus, while our data clearly demonstrate a multiple-component cryoprotectant system during recovery from cold shock, the actual concentrations of these compounds are much lower than in some overwintering insects. However, evidence suggests that polyol cryoprotectants may have specific protective functions in the cell, in addition to the colligative effects most often attributed to these compounds (Yancey,

2005).

5.3.6 Coordinated Changes in Gene Expression and Metabolism

While the transcriptomics and metabolomics data have been discussed separately up to this point, there were several points of congruence between the two datasets. First, the shift to gluconeogenesis during recovery from cold shock correlates with a 3.9-fold upregulation of PEPCK (Table D.1). PEPCK catalyzes the conversion of oxaloacetate to phosphoenolpyruvate, the rate-limiting step of gluconeogenesis (Hanson and Reshef, 1997). Interestingly, we also found evidence in our GSA that genes related to insulin signaling are enriched during recovery from cold shock (Tables 5.4 and 5.5). This enrichment of insulin signaling appeared to be driven by significant upregulation of protein kinase 61c and downregulation of gigas, two regulatory proteins that are linked to insulin signaling. Normally, insulin signaling is thought to inhibit expression of PEPCK (Barthel and Schmoll, 2003), but this concept is primarily based on vertebrate research; in both S. crassipalpis and the apple maggot, Rhagoletis pomonella, there is concurrent upregulation of both insulin signaling and PEPCK during overwintering diapause (Ragland et al., 2010; Sinclair et al., 2007). Also, due to the relative lack of information concerning invertebrate insulin signaling (Wu and Brown, 2006), our a priori list contains only 11 insulin-related genes and is thus likely not comprehensive. Our combined GSA and metabolite pathway analysis also revealed four pathways that were enriched at both the transcript and metabolite level (Table 5.8). One such KEGG pathway, valine, leucine, and isoleucine biosynthesis, was strongly enriched in both datasets. Also, the KEGG pathway urea cycle and metabolism of amine groups was significantly downregulated during recovery from cold shock, thus contributing to the observed accumulation of amino acids (Tables 5.4 and 5.5). We hypothesize that increased amino acid biosynthesis serves a cryoprotective role during recovery from cold shock, is necessary to support the burst of protein synthesis during recovery from cold shock (Joplin et al., 1990), or both. While previous studies have primarily focused on carbohydrates and low-molecular weight polyols in response to cold, recent research suggests that amino acids are indeed an important component of cold-hardiness in some insects (Kostál et al., 2011; Kostál et al., 2011). A second pathway that was enriched in both the transcriptome and metabolome was pyruvate metabolism, which serves as a key intersection point between carbohydrate and amino acid metabolism (Kanehisa and Goto, 2000). Finally, two pathways, inositol phosphate metabolism and pentose and glucuronate interconversions, reflect the biosynthesis of the second most abundant polyol, inositol, and the three 5-carbon polyols in our dataset, ribitol, xylitol, and arabitol. Overall, these results show reasonably good agreement between the transcriptomic and metabolomic data, although these pathways should be further explored with a targeted approach to verify their role in cold shock recovery.

While our data showed relatively good agreement at the transcript and metabolite level, there were many instances when changes in metabolism were not reflected by changes in gene expression. This is not surprising, as there are many levels of biological organization between transcription and metabolite synthesis (Feder and Walser, 2005). Thus, the transcriptomics data in particular should be interpreted with caution, because transcript levels may not be entirely representative of the physiological state of the organism. However, there are well-established cases where gene expression and metabolic endpoints are strongly correlated, as in the case of PEPCK discussed above (Hanson and Reshef, 1997). Also, while transcriptomics data may not always reflect the physiological function of an organism, they can provide important clues as to which pathways are activated during times of stress, by identifying the downstream genes that are regulated by these pathways (Feder and Walser, 2005).

5.4 Conclusions

Our experiments allowed us to examine the effects of RCH on the transcriptome and metabolome both during the hardening period and during recovery from a subsequent cold shock. Despite its dramatic effect on cold tolerance, RCH had little effect on the transcriptome of S. bullata. Using several multivariate tools, we were unable to detect differences in gene expression attributable to RCH. At the very least, these results indicate that transcriptional regulation is not a major contributor to RCH. Instead, future research will focus on changes in protein phosphorylation and other signaling events that govern RCH. For example, p38 MAP kinase is rapidly phosphorylated during RCH (Fujiwara and Denlinger, 2007), and there are likely numerous other signaling proteins, both upstream and downstream of p38, that have yet to be identified. In particular, we would like to identify the signaling mechanisms that drive the observed changes in metabolism attributable to RCH.

While the primary purpose of this study was to dissect the molecular mechanisms of RCH, a secondary objective was to discover pathways involved in recovery from cold shock. Indeed, while RCH had minor effects on gene expression and metabolism, an abundance of genes and metabolites were differentially expressed during recovery from cold shock. Our results have allowed us to generate/advance the following hypotheses regarding the mechanisms of cold injury repair in insects: 1) Cytoskeletal rearrangement is crucial for the repair of cold damage, as evidenced by the abundance of cytoskeletal genes upregulated during recovery from cold shock; 2) Coordination of numerous cell signaling pathways is a key component of cold-damage repair. Future experiments will seek to identify the relationships between these pathways and determine which are essential for cold repair; and 3) Many of the genes involved in cold shock repair are also essential for other forms of environmental stress. Evidence from both the present study and previous work suggests the presence of a common stress-signaling axis (or axes) that is activated by disparate forms of environmental stress.

5.5 Acknowledgements

We thank Yanping Zhang of the Florida Genetics Core Facility for assistance with microarray experiments. Also, we thank Vanessa Larvor at the University of Rennes for maintaining the GC-MS and running samples for the metabolomics experiments. Finally, we acknowledge the late Rob Michaud for help in conceiving and designing this study.

5.6 Tables

Table 5.1 Number of differentially expressed probes

Probes that were considered significant had false discovery rate (FDR) < 0.05, while the 1.5X columns contain probes that were both significant (FDR < 0.05) and 1.5-fold up- or downregulated. For each comparison, we measured the expression of 15,558 distinct expressed sequence tags (ESTs). RCH, rapid cold-hardening; CS, cold shock; R, recovery.

Comparison            FDR < 0.05  1.5X Up  1.5X Down
Control vs. RCH       0           0        0
Control vs. CS+R      1,378       103      8
Control vs. RCH+CS+R  1,525       125      9
CS+R vs. RCH+CS+R     5           0        0
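
The two filters described in the caption of Table 5.1 can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: the FDR procedure is not named in this section, so a Benjamini-Hochberg adjustment is assumed, and the function names and toy values are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here only as a common FDR procedure; the
    section does not state which procedure produced the reported FDRs.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_probes(p_values, log2_fold_changes, fdr_cutoff=0.05, fold_cutoff=1.5):
    """Count significant probes and those also >= fold_cutoff up or down."""
    fdr = benjamini_hochberg(p_values)
    threshold = math.log2(fold_cutoff)  # 1.5-fold on the log2 scale
    significant = [i for i, q in enumerate(fdr) if q < fdr_cutoff]
    up = sum(1 for i in significant if log2_fold_changes[i] >= threshold)
    down = sum(1 for i in significant if log2_fold_changes[i] <= -threshold)
    return len(significant), up, down


# Toy input, invented for illustration only (not data from this study).
p_values = [0.001, 0.01, 0.02, 0.8]
log2_fc = [1.0, -0.7, 0.2, 2.0]
n_sig, n_up, n_down = count_probes(p_values, log2_fc)
print(n_sig, n_up, n_down)
```

Note that a probe can be large in fold change and still be excluded (the last toy probe), because the fold-change columns count only probes that also pass the FDR cutoff.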

Table 5.2 DAVID enrichment analysis for the Control vs. CS+2R comparison

ESTs with FDR <0.05 and mapped to a Drosophila melanogaster protein database using blastx (E-value cutoff 1E-4) were included in the enrichment analysis. The “clustered observations” were obtained using the DAVID functional annotation clustering tool to cluster similar Gene Ontology (GO) terms into clustered observations. The “unclustered observations” were not placed into a functional cluster by the DAVID analysis. All clustered observations contained at least 1 GO term with FDR <0.05, while each of the unclustered observations had FDR <0.05.

Description                                            Type of GO Term     Enrichment Score  Genes Represented, n
Clustered observations
vesicle mediated transport, endocytosis, phagocytosis  biological process  5.1               59
nucleotide binding                                     molecular function  4.4               123
cell adhesion                                          biological process  3.9               31
actin cytoskeleton organization                        biological process  3.4               55
cell morphogenesis                                     biological process  2.8               84
tracheal system development                            biological process  2.5               28
cytoskeleton-dependent intracellular transport         biological process  2.4               17
cytoskeleton organization                              biological process  2.4               61
programmed cell death                                  biological process  2.3               32
Unclustered observations
cytoskeletal protein binding                           molecular function  2.6               39
actin binding                                          molecular function  3                 26
protein localization                                   biological process  1.8               47
unfolded protein binding                               molecular function  3                 16
calcium ion binding                                    molecular function  1.9               33
negative regulation of signal transduction             biological process  2.7               18
negative regulation of cell communication              biological process  2.6               18
response to heat                                       biological process  3.1               14

Table 5.3 DAVID enrichment analysis for the Control vs. RCH+CS+2R comparison

ESTs with FDR <0.05 and mapped to a Drosophila melanogaster protein database using blastx (E-value cutoff 1E-4) were included in the enrichment analysis. The “clustered observations” were obtained using the DAVID functional annotation clustering tool to cluster similar Gene Ontology (GO) terms into clustered observations. The “unclustered observations” were not placed into a functional cluster by the DAVID analysis. All clustered observations contained at least 1 GO term with FDR <0.05, while each of the unclustered observations had FDR <0.05.

Description                                            Type of GO Term     Enrichment Score  Genes Represented, n
Clustered observations
vesicle mediated transport, endocytosis, phagocytosis  biological process  7.1               71
nucleotide binding                                     molecular function  6.3               138
response to temperature stress                         biological process  4                 23
actin cytoskeleton organization                        biological process  3.8               64
epithelial development                                 biological process  3.4               53
cytoskeleton                                           cellular component  3.2               86
cytoskeleton organization                              biological process  2.9               97
postembryonic development                              biological process  2.9               63
RNAi mediated gene silencing                           biological process  2.8               46
cell morphogenesis                                     biological process  2.7               94
cell adhesion                                          biological process  2.6               21
Unclustered observations
cytoskeletal protein binding                           molecular function  2.8               46
actin binding                                          molecular function  3.2               29
cell adhesion                                          biological process  2.8               34
biological adhesion                                    biological process  2.6               34
regulation of cell shape                               biological process  3.3               23
negative regulation of signal transduction             biological process  3.1               22
unfolded protein binding                               molecular function  3.4               19
regulation of cell morphogenesis                       biological process  2.9               24
negative regulation of cell communication              biological process  3                 22
protein localization                                   biological process  1.7               47
protein folding                                        biological process  2.5               50

Table 5.4 GSA of genes involved in recovery from cold shock

Log2 intensity values for each probe that mapped to a D. melanogaster gene (E-value <1E-5) were included in the analysis. Gene sets were obtained from the KEGG database and from a priori lists generated from other microarray studies. Each gene set included in the table has FDR <0.1. GSA, gene set analysis.

Category  Gene Set                                      Genes Measured, n  Score  FDR
KEGG      valine, leucine, and isoleucine biosynthesis  8                  2.57   <0.0001
KEGG      dorso-ventral axis formation                  11                 0.75   <0.0001
KEGG      Jak/STAT signaling pathway                    12                 1.38   <0.0001
A priori  Drosophila hypoxia response (42)              55                 2.61   <0.0001
A priori  Drosophila hyperoxia response (37)            57                 1.64   <0.0001
A priori  Drosophila cold stress (49)                   14                 3.05   <0.0001
A priori  insulin receptor signaling pathway (62)       9                  0.95   <0.0001
KEGG      urea cycle and metabolism of amine groups     18                 1.54   <0.0001
A priori  Drosophila heat stress (54)                   81                 1.66   <0.0001
A priori  Drosophila oxidative stress response (23)     369                0.89   0.013
KEGG      pentose and glucuronate interconversions      16                 1.01   0.013
A priori  Drosophila ecdysone signaling (9)             274                0.74   0.023
KEGG      phosphatidylinositol signaling                25                 1.06   0.042
KEGG      Wnt signaling                                 39                 0.9    0.049
KEGG      VEGF signaling                                28                 0.83   0.059
KEGG      TGF-beta signaling                            18                 0.8    0.075
KEGG      inositol phosphate metabolism                 19                 0.9    0.075
KEGG      pyruvate metabolism                           29                 1.01   0.082
KEGG      p53 signaling                                 11                 0.86   0.084
KEGG      glycerolipid metabolism                       34                 0.83   0.093

Table 5.5 GSA of genes enriched in the C vs. RCH+CS+2R comparison

Log2 intensity values for each probe that mapped to a D. melanogaster gene (E-value <1E-5) were included in the analysis. Gene sets were obtained from the KEGG database and from a priori lists generated from other microarray studies. Each gene set included in the table has FDR <0.1. GSA, gene set analysis.

Category  Gene Set                                      Genes Measured, n  Score  FDR
KEGG      valine, leucine, and isoleucine biosynthesis  8                  2.57   <0.0001
A priori  Drosophila hypoxia response (42)              55                 2.57   <0.0001
A priori  Drosophila cold stress (49)                   14                 2.78   <0.0001
A priori  Drosophila heat stress (54)                   81                 1.47   <0.0001
KEGG      urea cycle and metabolism of amine groups     18                 -1.54  <0.0001
A priori  Drosophila hyperoxia response (37)            57                 1.61   0.011
KEGG      Jak/STAT signaling pathway                    12                 1.37   0.016
KEGG      pyruvate metabolism                           29                 1.01   0.016
A priori  Drosophila ecdysone signaling (9)             274                0.84   0.035
A priori  Drosophila oxidative stress response (23)     369                0.87   0.044
KEGG      pentose and glucuronate interconversions      16                 0.88   0.044
KEGG      inositol phosphate metabolism                 19                 0.94   0.044
KEGG      phosphatidylinositol signaling                25                 0.98   0.044
A priori  Drosophila reproductive diapause (6)          257                0.63   0.045
KEGG      dorso-ventral axis formation                  11                 0.96   0.055
A priori  insulin receptor signaling pathway (62)       9                  0.99   0.063
KEGG      Wnt signaling                                 39                 0.84   0.079

Table 5.6 Drosophila cold stress genes expressed during recovery in S. bullata

This list of genes, identified as significantly enriched by GSA, was obtained from a microarray study of Drosophila cold stress (Qin et al., 2005). The column “GSA Gene Score” is a modified t-statistic that reflects the relative importance of a particular transcript toward the overall enrichment of that gene set.

EST Accession     Description                                                                   Drosophila RefSeq Homolog  Blastx E-value  GSA Gene Score  Log2 FC  FDR
EZ605491          CG8026, isoform B [Drosophila melanogaster]                                   NP_610468.1                3.00E-72        15.48           0.9      5.56E-11
U96099.2          Sarcophaga crassipalpis 23 kDa heat shock protein ScHSP23 mRNA, complete cds  NA                         NA              11.41           1.74     3.74E-11
EZ598021          CG15745, isoform A [Drosophila melanogaster]                                  NP_572873.1                6.98E-10        7.13            0.63     4.16E-09
EZ597482          Ubiquitin-5E [Drosophila melanogaster]                                        NP_727078.1                4.90E-127       5.67            0.39     1.29E-05
EZ601452          SRY interacting protein 1 [Drosophila melanogaster]                           NP_524712.1                1.79E-08        4.3             0.12     2.45E-01
SRR006884.66098   pinocchio, isoform A [Drosophila melanogaster]                                NP_608568.1                8.51E-37        3.93            0.9      1.37E-05
SRR006884.70084   CG3814, isoform A [Drosophila melanogaster]                                   NP_610824.1                1.20E-06        2.75            0.14     3.44E-01
EZ601126          CG3345 [Drosophila melanogaster]                                              NP_608509.1                1.67E-06        1.66            0.21     5.46E-01
EZ604149          draper, isoform B [Drosophila melanogaster]                                   NP_728660.2                1.97E-25        0.36            0.03     7.78E-01
EZ610357          CG15347, isoform A [Drosophila melanogaster]                                  NP_572479.1                8.11E-63        -0.17           -0.02    8.83E-01
SRR006884.112334  CG2118, isoform A [Drosophila melanogaster]                                   NP_651896.1                6.55E-08        -0.87           0        9.82E-01
EZ599928          mitochondrial acyl carrier protein 1, isoform B [Drosophila melanogaster]     NP_477002.1                1.13E-13        -1.05           -0.04    6.88E-01
SRR006884.85625   Ect3 [Drosophila melanogaster]                                                NP_650142.1                2.30E-13        -1.12           0.07     6.29E-01
EZ600486          CG8778 [Drosophila melanogaster]                                              NP_610805.1                1.45E-108       -1.14           -0.05    6.73E-01

Table 5.7 Metabolic pathways modulated by RCH

Pairwise metabolite pathway enrichment analysis was conducted on the log2 metabolite contents for each comparison. Significantly enriched pathways with FDR <0.05 are included in the table.

Comparison             Pathway                                      Represented  Impact    FDR
Control vs. RCH        glycolysis or gluconeogenesis                3            1.43E-01  2.34E-03
                       amino sugar and nucleotide sugar metabolism  4            1.53E-01  2.34E-03
                       pentose phosphate pathway                    5            8.99E-02  1.84E-02
                       galactose metabolism                         8            9.34E-02  2.21E-02
                       starch and sucrose metabolism                6            8.77E-02  2.99E-02
CS+24R vs. RCH+CS+24R  none
CS+2R vs. RCH+CS+2R    starch and sucrose metabolism                6            8.77E-02  3.75E-02
                       lysine degradation                           2            1.46E-02  3.75E-02
                       amino sugar and nucleotide sugar metabolism  4            1.53E-01  3.75E-02
                       pentose phosphate pathway                    5            8.99E-02  3.75E-02
                       glycolysis or gluconeogenesis                3            1.43E-01  3.75E-02
                       galactose metabolism                         8            9.34E-02  3.75E-02
                       fructose and mannose metabolism              4            1.58E-01  3.75E-02
                       glutathione metabolism                       4            2.81E-02  4.46E-02

Table 5.8 Coordinated pathway enrichment during recovery from cold shock

KEGG pathways that were significantly enriched (FDR <0.1) in the Control vs. CS+2R comparison by both GSA for the gene expression data and metabolite pathway enrichment analysis for the metabolomics data are included.

Biochemical Pathway                           Genes Measured, n  GSA FDR    Metabolites Measured, n  Metabolite Pathway FDR
Valine, leucine, and isoleucine biosynthesis  8                  <1.00E-04  4                        1.64E-03
Pentose and glucuronate interconversions      16                 0.013      3                        1.06E-04
Inositol phosphate metabolism                 20                 0.075      1                        1.04E-03
Pyruvate metabolism                           29                 0.082      1                        6.23E-03

5.7 Figures

Figure 5.1 Experimental design

Temperature treatments (A–C) and hybridization design (D) for microarray experiment. Temperature treatments are depicted in A–C, with an arrow depicting the time of sampling for each treatment. In A, the solid line depicts control conditions while the dashed line indicates RCH conditions. In D, treatments connected with a double arrow were hybridized on the same chip; n = 4 biological replicates for each comparison. CS, cold shock; 2R, 2 h recovery; 24R, 24 h recovery.

Figure 5.2 Effect of rapid cold-hardening

Effect of rapid cold-hardening (RCH) on the cold tolerance of pharate adult flesh flies. Cold-shocked flies were directly exposed to the indicated test temperature for 2 h, while flies in the RCH groups were exposed to 0°C for 2 h prior to being transferred to the test temperature. Flies that successfully eclosed as adults were considered alive. Different letters represent significant differences between groups (ANOVA, Tukey, P < 0.05).

Figure 5.3 Expression heat map

Heat map showing expression patterns of the top 150 most differentially expressed ESTs (expressed sequence tags). Expression values are given as the log2 ratio of each treatment relative to the control sample hybridized on the same chip. All control values are set to 0. The phenotypes (horizontal axis) and probes (vertical axis) are separated with hierarchical clustering, and each distinct cluster of samples is indicated by a colored bar.

Figure 5.4 Multivariate analysis of expression

Principal components analysis (A) and hierarchical clustering (B) of the entire microarray dataset. Input data were the log2 intensity values for each individual sample. In B, the dendrogram is scaled to represent the distance between each branch. The distinct cluster containing control and RCH samples is highlighted in blue, while the cluster containing CS+2R and RCH+CS+2R samples is highlighted in red.

Figure 5.5 Metabolomic response to rapid cold-hardening and cold shock

Relative changes in metabolite contents in response to RCH and cold shock. Metabolite contents are expressed as the mean ± SE fold change of each metabolite relative to control. *Significant difference (ANOVA, false discovery rate < 0.05) between that treatment and the control within a particular metabolite.

Figure 5.6 Heat map and multivariate analysis of metabolomics

Heat map diagram (A) and principal components analysis (B) of the entire metabolomics dataset. In A, the colors represent the log2 fold change of each metabolite relative to the mean control level. Individual samples (horizontal axis) and compounds (vertical axis) are separated using hierarchical clustering, with the dendrogram scaled to represent the distance between each branch. The cluster containing control and RCH groups is highlighted in green, the cluster containing CS+2R and RCH+CS+2R groups is highlighted in orange, while the cluster containing the CS+24R and RCH+CS+24R groups is highlighted in red. In B, the input data consisted of the log2 metabolite content for each compound measured in each sample.

Conclusions

Belgica antarctica has the smallest insect genome yet measured. B. antarctica does not achieve its size by a reduction in the number of coding genes. B. antarctica achieves its size by reducing the amount of repetitive elements, including transposable elements, and introns. B. antarctica shows positive enrichment of many Gene Ontology categories related to development and regulation of biological processes. B. antarctica shows negative enrichment of the Gene Ontology category of odorant binding.

Dehydration and cryoprotective dehydration result in similar expression changes in B. antarctica. In response to dehydration, B. antarctica has a coordinated response to refold and break down misfolded proteins. A large-scale reduction of metabolism is seen in response to dehydration at both the transcriptomic and metabolomic levels. This is especially true of glycolysis and the citric acid cycle. B. antarctica accumulates protective compounds such as proline and polyols in response to dehydration. In response to dehydration, B. antarctica promotes autophagy and inhibits apoptosis. B. antarctica and Megaphorura arctica have largely different expression patterns in response to desiccation in spite of having evolved to live in similar environments.

Sarcophaga bullata has fewer interspersed repeats than closely related species with similar-sized genomes. S. bullata has smaller introns on average than closely related species with similar-sized genomes. Gene Ontology (GO) analysis revealed enrichment of GO terms related to stress tolerance and DNA packaging. Transcriptional analysis revealed differences among tissues and life stages.

Rapid Cold Hardening (RCH) has very little effect on transcription during treatment or during recovery. RCH ramped up gluconeogenesis, probably to produce cryoprotective polyols. Recovery from cold shock has sweeping effects on the metabolome including increasing the concentration of almost every metabolite studied. RCH dampened the increase associated with cold shock and recovery for some metabolites including sugars and polyols.

Appendix A: Supplement to Chapter 2

A.1 Supplementary Methods

A.1.1 Genome size estimate from flow cytometry

Genome size determinations were produced following procedures described in Hare & Johnston (2011); an expansion on those methods is provided here. A single head of the species of interest plus a single head of the D. melanogaster standard (1C = 175 Mbp) were placed into 1 ml of Galbraith buffer in a 2-ml Kontes Dounce homogenizer tube and stroked 15 times with the A pestle to release nuclei from both the sample and standard. The resultant solution was filtered through 40 µm nylon mesh, stained a minimum of 20 min in the dark with 25 µl of propidium iodide, and then run on a Partec Cyflow cytometer to score relative red fluorescence (> 590 nm) of nuclei from the sample and standard. The amount of DNA in the sample was determined as the mean channel number of the 2C peak of the sample divided by the mean channel number of the 2C peak of the standard times the amount of DNA in the standard. All DNA estimates were determined from a co-preparation of sample and (internal) standard. The position of the sample peak relative to that of the other peaks was established by a single run with the sample or (external) standard prepared and stained individually. Average genome sizes of males and females of B. antarctica were based on 20 total replicate estimates on 10 males and 10 females (Supplementary Fig. 1, Supplementary Table 1). Flow cytometry estimates of genome size were also performed for three additional members of the family Chironomidae; the samples were collected in Minnesota and provided by Leonard C. Ferrington Jr., University of Minnesota.
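
The ratio calculation described above can be written out as a short sketch. The function name and the peak channel values below are hypothetical and chosen only to show the arithmetic; they are not measurements from this study. Only the 175 Mbp value for the D. melanogaster standard comes from the text.

```python
def genome_size_mbp(sample_2c_channel, standard_2c_channel, standard_1c_mbp=175.0):
    """Estimate 1C genome size from co-prepared sample and standard nuclei.

    sample_2c_channel:   mean channel number of the sample's 2C peak
    standard_2c_channel: mean channel number of the standard's 2C peak
    standard_1c_mbp:     1C DNA content of the standard (175 Mbp for the
                         D. melanogaster standard named in the text)
    """
    if sample_2c_channel <= 0 or standard_2c_channel <= 0:
        raise ValueError("peak channel means must be positive")
    # Fluorescence ratio of the two 2C peaks scales the known standard.
    return sample_2c_channel / standard_2c_channel * standard_1c_mbp


# Hypothetical peak positions, for illustration only.
estimate = genome_size_mbp(sample_2c_channel=56.0, standard_2c_channel=100.0)
print(f"{estimate:.1f} Mbp")
```

Because both peaks are 2C, the ploidy factor cancels and the ratio can be applied directly to the 1C value of the standard.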

A.1.2 Genome size estimate from sequence reads

Genome size was estimated from sequence reads using a k-mer based approach (Zhang et al, 2012). The genome size is estimated as the total number of k-mers (in this case 17-mers) divided by the maximal frequency of the k-mer (Supplementary Fig. 2).
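
As a rough sketch of this estimate, the division can be applied to a k-mer depth histogram. The code interprets "maximal frequency" as the depth at the main peak of the k-mer depth distribution. The function name, the `min_depth` filter for likely error k-mers, and the toy histogram are assumptions made for illustration; they are not taken from the cited method or from the data in this study.

```python
def estimate_genome_size(kmer_histogram, min_depth=2):
    """Estimate genome size from a k-mer depth histogram.

    kmer_histogram maps depth -> number of distinct k-mers seen at that depth.
    min_depth drops very low-depth k-mers, which are assumed here to be
    dominated by sequencing errors (an illustrative choice).
    """
    usable = {d: n for d, n in kmer_histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(depth * count for depth, count in usable.items())
    # Peak depth = depth shared by the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    return total_kmers / peak_depth


# Toy histogram, invented for illustration only.
histogram = {1: 500, 9: 100, 10: 300, 11: 100}
size = estimate_genome_size(histogram)
print(size)
```

In the toy histogram the peak sits at depth 10 and the retained k-mers total 5,000 occurrences, so the sketch returns 500.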

A.1.3 Assembly strategy

The individual used for assembly was sequenced to over 100x coverage using one lane of Illumina HiSeq2000 sequencing technology with a 400 bp insert paired-end sequencing library. A total of 92 million paired-end reads of 101 bp were input into Velvet de novo with a k-mer of 55 and an insert length of 400 bp (Zerbino & Birney, 2008). A total of 5,422 contigs were output from the Velvet de novo assembly. Two iterations of ERANGE using the paired-end RNA-sequencing data (Mortazavi et al, 2010) were used to scaffold the assembled contigs, reducing the number of contigs to 5,064 (Supplementary Fig. 3). Complex regions flanked by sequence reads are represented by stretches of Ns.

A.1.4 Evidence of high quality genome assembly

Multiple lines of evidence confirm that the assembled genome is of high quality and represents most of the DNA sequence. Ninety-five percent of the sequencing reads mapped to the reference genome, with a modal coverage of 177; coverage is calculated based on reads mapped to the assembled genome using BamTools (Barnett et al, 2011) (Supplementary Fig. 4). Genome quality assessment was also accomplished by mapping RNA-sequencing data to the assembled genome (see Methods); over 87% of RNA-sequencing reads from Teets et al. (2012) mapped to the assembled reference. The RNA-sequencing libraries were exclusively from fourth instar larvae, thus only genes expressed during the fourth larval instar are present in the data. Moreover, the concordance between the flow cytometry estimate and the assembled genome size suggests that the assembly is complete, with few or no significant blocks of missing chromatin.

A.1.5 Repeats and transposable elements

Transposable element (TE) insertion locations were identified (see Methods, Table 2, Supplementary Data 1). Sixty-eight of the TE insertion sites contain more than one nested TE insertion (Supplementary Table 3). Sequences at the remaining 468 sites clearly correspond to unique TE insertions, representing TEs from the three main TE orders (DNA, non-Long Terminal Repeat (LTR), LTR) (Supplementary Table 4). Most insertions corresponded to retroelements: 306 LTR, 107 non-LTR retroelements, and only 55 DNA elements. No annotated genes were found in contigs containing 105 of the 468 unique TE insertions identified in the assembled genome, indicating that some contigs contain highly repetitive sequence and no apparent coding regions. Among the TE insertions, more than 60% were located inside or less than 1 kb from an identified gene (Supplementary Table 5). We detected one full-length LTR, indicating that while we are able to detect full-length LTRs using the short-read sequence data, LTRs are not present in the other LTR retroelements, thus suggesting that those elements are inactive. Multiple approaches to TE assembly recovered only partial TE sequences, each with high divergence from the canonical consensus TE sequence of the respective families. We conclude that there are very few TEs in the genome, and those that are present are likely to be old and inactive. No species-specific TEs were detected in the raw reads using ReAs (Li et al, 2005).

A.1.6 Gene annotation of core eukaryotic genes

The set of core eukaryotic genes in the assembled genome was identified using CEGMA (Parra et al, 2007). Of the 248 genes identified as the most highly conserved core eukaryotic genes (Parra et al, 2007), the assembled B. antarctica genome contains 233 (Supplementary Table 6). Including partial matches, the assembled genome contains 97.6% of the highly conserved core eukaryotic genes. Moreover, the average copy number of those genes is low (1.13); both lines of evidence suggest that the assembly is complete and contains the majority of the protein coding sequences, especially compared to other larger insect genomes in which core eukaryotic genes were also identified (Supplementary Table 7). The D. melanogaster and An. gambiae assembled genomes both contain over 97% of the complete core eukaryotic genes, whereas the Ae. aegypti and C. quinquefasciatus assembled genomes contain 54.4% and 51.2% of the core genes, respectively, with higher average copy numbers of 1.47 and 1.3, respectively.

A.1.7 piRNA pathway proteins

The presence of piRNA pathway genes was interrogated using the OrthoMCL (Li et al, 2003) comparison among the five species. Three of the loci (rhino, squash, and zucchini) are not present in a cluster of orthologous genes. We executed a tblastx search of the genome for those loci to confirm that the genes are absent from the genome and not merely mis-annotated. The tblastx search revealed few regions of homology that were limited to common protein domains (such as PLD6 for zuc), suggesting that the loci are not present in B. antarctica. It has been shown that rhino, krimper, and aubergine have been subjected to pervasive positive selection in Drosophila species (Simkin et al, 2013). This suggests that a rhino ortholog may be too divergent to be identified as an ortholog; however, this does not appear to be the case for the other two genes.

A.1.8 Evidence for known infections or symbionts

To determine whether there was any evidence for Wolbachia or Spiroplasma melliferum in B. antarctica, sequence reads were mapped to the Wolbachia genome (Genbank NZ_AAQP00000000) (Clark et al, 2007) and S. melliferum genome (Genbank AGBZ01000003) (Alexeev et al, 2012). There was no evidence of either species in B. antarctica.

A.1.9 Mitochondrial annotation

Contigs from the original genome assembly were compared using BLAST to mitochondrial genomes of Drosophila melanogaster (NC_001709.1) and Chironomus tepperi (NC_016167.1) (Beckenbach, 2012) to identify mitochondrial contigs. The reference mitochondrial sequence was present as two contigs in the assembled genome. The two contigs were oriented and merged by homology. Annotation of the mitochondrial genome was accomplished by homology using Mauve (Darling et al, 2004). Mitochondrial tRNA genes were identified using the tRNAScan-SE 1.21 server (Lowe et al, 1997) and alignment to other mt-tRNA genes. The annotated mitochondrial sequence is 15,912 bp, with 13 genes and 18 tRNAs.

A.2 Supplemental Tables

Table A.1 Flow cytometry estimates for Chironomidae species

Species                       n   Genome size ± S.E. (Mbp)  Reference
Chironomus tentans            -   205                       Petitpierre, 1996
Chironomus riparius (female)  -   196.2 ± 1.0               Schmidt-Ott et al., 2009
Chironomus riparius (male)    -   194.3 ± 1.1               Schmidt-Ott et al., 2009
Prodiamesa olivacea           -   127                       Zacharias, 1979
Allacapnia sp.                3   118.3 ± 1.4               New in this study
Diamesa mendotae              9   116.3 ±                   New in this study
Micropsectra sp.              6   108.2 ±                   New in this study
Belgica antarctica (female)   10  99.25 ±                   New in this study
Belgica antarctica (male)     10  98.4 ±                    New in this study

Table A.2 Comparison of different assemblers

Assembler             version  k   reads used  Size (bp)  contigs >300  N50    NG50   REAPR error free bases (%)  REAPR errors  REAPR warnings
SGA                   0.10.13  -   1/3         90448289   15248         33724  29620  82.49                       2303          27020
SGA                   0.10.13  -   all         91442312   25571         14861  12876  77.21                       2074          42092
ABySS                 1.3.7    35  all         88718974   5256          44041  37719  89.9                        2192          15016
ABySS                 1.3.7    39  all         90460926   5580          41622  37036  89.85                       2379          15240
Velvet                1.2.10   49  all         89431929   4870          95812  82919  84.06                       9053          18484
Velvet                1.2.10   53  all         89969631   5043          95610  82768  83.97                       8929          19485
Velvet                1.1.05   55  all         89519377   5422          94529  82407  85.88                       6203          19889
Velvet                1.2.10   55  all         90252044   5089          94510  83802  83.21                       9263          20247
Velvet                1.2.10   57  all         90660484   5199          95728  85026  82.53                       10755         20809
Velvet                1.2.10   57  1/3         89581777   5434          72821  64577  82.44                       10185         20162
Velvet+ERANGE         1.1.05   55  all         89501225   5064          97740  84969  85.85                       6427          19956
Velvet+ERANGE PacBio  1.1.05   -   all         89589133   5003          98263  85160  85.93                       6440          20464
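
For readers comparing the N50 and NG50 columns of Table A.2, the following is a minimal sketch of how the two statistics are commonly computed. The section does not define them, so the standard definitions are assumed: N50 uses half of the total assembly length as the target, and NG50 uses half of an estimated genome size. The function name and contig lengths are hypothetical.

```python
def n50(contig_lengths, genome_size=None):
    """Return N50, or NG50 when an estimated genome_size is supplied.

    The statistic is the length of the contig at which the running total,
    taken over contigs sorted from longest to shortest, first reaches half
    of the reference length (assembly length for N50, genome size for NG50).
    """
    reference = genome_size if genome_size is not None else sum(contig_lengths)
    target = reference / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return 0  # contigs cover less than half of the supplied genome size


# Toy contig lengths, invented for illustration only.
lengths = [80, 70, 50, 40, 30, 20]
assembly_n50 = n50(lengths)
assembly_ng50 = n50(lengths, genome_size=400)
print(assembly_n50, assembly_ng50)
```

NG50 is lower than N50 whenever the assembly is shorter than the assumed genome size, which is why the two columns can rank assemblers differently.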


Table A.3 Nested TEs

TE orders     TE families                              copies (#)
DNA           Mariner2;Tc1                             1
DNA           Marwolen1;Mariner2;Tc1                   1
DNA           Paris;Quetzal                            1
DNA           Quetzal;Tc1                              1
DNA           Helitron1;RtaG4                          2
DNA;LTR       Galileo;Gypsy68                          1
DNA;LTR       P4;HMS-Beagle                            1
LTR           Accord2;Gypsy59                          1
LTR           Accord2;Stalker                          1
LTR           Bel2;ninja                               3
LTR           Bel4;diver                               2
LTR           Bel4;roo                                 1
LTR           Bel8;diver                               1
LTR           Bel8;diver;max-element                   1
LTR           Bel8;max-element                         3
LTR           Bel8;rooA                                3
LTR           Copia;frogger                            1
LTR           Copia;mtanga                             5
LTR           Copia1;Copia                             2
LTR           Copia1;Copia2                            1
LTR           Copia1;Copia4                            1
LTR           Copia1;mtanga                            3
LTR           Copia2;1731                              1
LTR           Copia2;Copia4                            2
LTR           Copia4;Copia                             5
LTR           Copia4;Copia;frogger                     1
LTR           Copia4;frogger                           2
LTR           Copia2;frogger                           1
LTR           frogger;mtanga                           1
LTR           Gypsy10;Gypsy6                           1
LTR           Gypsy10;Tabor                            1
LTR;non-LTR   Ag-Jock-13;Ag-Outcast-6;mdg1;G2;Tabor    1
LTR;non-LTR   Amer3;Copia                              3
LTR;non-LTR   MinoAg1;roo                              1
LTR;non-LTR   roo;Rt2                                  1
LTR;non-LTR   Rt1;Tabor                                1
LTR;non-LTR   Tabor;Ag-Jock-13                         1
non-LTR       Cr1-1;Cr1-4                              1
non-LTR       MinoAg1;Tart                             1
non-LTR       R7AG2;Tart                               1
non-LTR       Rt1;Tart                                 1
non-LTR       Rt2;Tart                                 1
non-LTR       RtaG4;Tart                               3

Table A.4 Distribution of unique TEs

TE order   TE family        copies (#)
DNA        BuT2             1
DNA        BuT3             1
DNA        DNAREP1          1
DNA        Harbinger1       2
DNA        hAT-2            1
DNA        Helitron1        12
DNA        Helitron2        3
DNA        ISBu2            3
DNA        Kepler           1
DNA        Mariner1         3
DNA        Mariner2         1
DNA        mini-me          3
DNA        P4               1
DNA        Polinton         5
DNA        Quetzal          1
DNA        S-element        1
DNA        Tc1              3
DNA        Transib1         6
DNA        Transib2         1
DNA        Transib3         2
DNA        tsessbeII        1
DNA        Uhu              2
LTR        17.6             3
LTR        1731             1
LTR        Accord           1
LTR        Accord2          2
LTR        aurora-element   1
LTR        Bel1             1
LTR        Bel10            1
LTR        Bel11            2
LTR        Bel12            3
LTR        Bel13            2
LTR        Bel14            5
LTR        Bel15            4
LTR        Bel17            1
LTR        Bel18            2
LTR        Bel2             4
LTR        Bel4             6
LTR        Bel8             3
LTR        Bel9             2
LTR        Burdock          1
LTR        Chouto           1
LTR        Circe            2
LTR        Copia            16
LTR        Copia1           8
LTR        Copia2/Dm88      7
LTR        Copia3           2
LTR        Copia4           20
LTR        Copia5           2
LTR        diver2           2
LTR        flea             1
LTR        frogger          4
LTR        GATE             1
LTR        Gypsy            2
LTR        Gypsy1           1
LTR        Gypsy10          4
LTR        Gypsy12          1
LTR        Gypsy12A         1
LTR        Gypsy15          1
LTR        Gypsy19          1
LTR        Gypsy20          1
LTR        Gypsy21          5
LTR        Gypsy24          1
LTR        Gypsy25          1
LTR        Gypsy26          2
LTR        Gypsy27          3
LTR        Gypsy29          1
LTR        Gypsy37          2
LTR        Gypsy38          1
LTR        Gypsy4           1
LTR        Gypsy40          2
LTR        Gypsy41          1
LTR        Gypsy5           1
LTR        Gypsy55          7
LTR        Gypsy57          3
LTR        Gypsy58          1
LTR        Gypsy59          1
LTR        Gypsy6           3
LTR        Gypsy61          5
LTR        Gypsy62          3
LTR        Gypsy63          10
LTR        Gypsy64          7
LTR        Gypsy65          1
LTR        Gypsy68          23
LTR        Gypsy69          12
LTR        Gypsy8           1
LTR        Gypsy9           1
LTR        HMS-Beagle       5
LTR        Invader1         2
LTR        Invader3         1
LTR        Invader3         1
LTR        max-element      6
LTR        mdg1             6
LTR        mtanga           5
LTR        Ninja            3
LTR        nomad            2
LTR        Osvaldo          4
LTR        roo              8
LTR        Springer         1
LTR        Stalker2         1
LTR        Tabor            16
LTR        Tom              5
LTR        tv1              1
LTR        Ulysses          1
LTR        ZAM              1
non-LTR    aara8            3
non-LTR    Ag-Jock-13       3
non-LTR    Agam2            4
non-LTR    Amer3            1
non-LTR    Baggins          1
non-LTR    Baggins1         2
non-LTR    Bilbo            3
non-LTR    BS               2
non-LTR    Cr1-1            4
non-LTR    Cr1-4            2
non-LTR    Cr1-8            1
non-LTR    Cr1-9            2
non-LTR    Cr1A             1
non-LTR    Cr1A3            2
non-LTR    Doc2             2
non-LTR    Doc3             2
non-LTR    Doc4             1
non-LTR    Doc5             1
non-LTR    Doc6/Juan        4
non-LTR    Dong             1
non-LTR    F-element        3
non-LTR    FW               2
non-LTR    G2               1
non-LTR    G3               3
non-LTR    G4               2
non-LTR    G5               2
non-LTR    HidaAg1          1
non-LTR    I-element        1
non-LTR    Loa              7
non-LTR    MinoAg1          1
non-LTR    R7Ag1            2
non-LTR    R7AG2            1
non-LTR    Rt1              3
non-LTR    Rt1a             1
non-LTR    Rt2              1
non-LTR    RtaG4            9
non-LTR    Tart             20
non-LTR    Tart-B           1
non-LTR    uvir             2
non-LTR    worf             2

Table A.5 Distance to closest gene from annotated TE

Distance            Number
No gene on contig   105
0                   246
0-1kb               55
1-5kb               54
5-10kb              6
10-50kb             2
Total               468

Table A.6 CEGMA analysis of the B. antarctica assembled genome

Core Eukaryotic Genes Mapping Approach (CEGMA) analysis of the B. antarctica assembled genome. Group 1 contains the least conserved of the Core Eukaryotic Genes (CEG) and Group 4 the most conserved.

CEG group   Complete proteins (#)   % Complete   Total observed (#)   Average Copy Number   % Orthologs
Complete    233                     93.95        264                  1.13                  9.44
  Group 1   61                      92.42        67                   1.1                   6.56
  Group 2   53                      94.64        63                   1.19                  13.21
  Group 3   56                      91.8         61                   1.09                  8.93
  Group 4   63                      96.92        73                   1.16                  9.52
Partial     242                     97.58        295                  1.22                  15.29
  Group 1   66                      100          77                   1.17                  12.12
  Group 2   55                      98.21        71                   1.29                  20
  Group 3   58                      95.08        70                   1.21                  15.52
  Group 4   63                      96.92        77                   1.22                  14.29

Table A.7 CEGMA analysis of five genomes

Core Eukaryotic Genes Mapping Approach (CEGMA) analysis of the five Dipteran genomes used for comparative analyses.

Species               CEG set    Complete proteins (#)   % Complete   Total observed (#)   Average Copy Number   % Orthologs
Ae. aegypti           Complete   135                     54.44        198                  1.47                  29.63
                      Partial    161                     64.92        262                  1.63                  36.65
An. gambiae           Complete   242                     97.58        451                  1.86                  49.17
                      Partial    245                     98.79        588                  2.4                   54.29
B. antarctica         Complete   232                     93.55        263                  1.13                  8.62
                      Partial    242                     97.58        296                  1.22                  14.46
C. quinquefasciatus   Complete   127                     51.21        165                  1.3                   27.56
                      Partial    140                     56.45        206                  1.47                  35.71
D. melanogaster       Complete   241                     97.18        279                  1.16                  13.69
                      Partial    245                     98.79        293                  1.2                   15.92

Table A.8 GC content in the five Dipteran species used for comparative analyses

Species               Genome GC%   Coding GC%
Ae. aegypti           36.2         49.9
An. gambiae           40.9         56.7
B. antarctica         39.0         47.0
C. quinquefasciatus   34.9         55.3
D. melanogaster       40.2         53.4

Table A.9 Presence of piRNA pathway genes in B. antarctica assembly

Dmel name   Dmel Annotation   Dmel symbol   function    interacting partners*   ortholog present
Ago3        CG40300           AGO3          piRNA       2                       yes
Armitage    CG11513           armi          Helicase    4                       yes
Aubergine   CG6137            aub           piRNA       15                      yes
Krimper     CG15707           krimp         Tudor       ?                       yes
Piwi        CG6122            piwi          piRNA       5                       yes
Rhino       CG10683           rhi           Chromatin   9                       no
SpnE        CG3158            spn-E         Helicase    2                       yes
Squash      CG4711            squ           Nuclease    1                       no
Vasa        CG3506            vas           Helicase    4                       yes
Zucchini    CG12314           zuc           Nuclease    1                       no

Table A.10 Coding region length summaries for each species

Species               Minimum   Median   Mean   Maximum   Total bp
Ae. aegypti           90        1062     1376   33984     22005111
An. gambiae           81        1140     1551   47532     19648704
B. antarctica         69        1008     1403   26004     18619038
C. quinquefasciatus   60        1016     1310   27324     24835191
D. melanogaster       48        1152     1542   68916     20804839

Table A.11 Coding region length summaries for loci with one-to-one orthologs

Species               Minimum   Median   Mean   Maximum   Total bp
Ae. aegypti           159       1473     1847   15903     6618768
An. gambiae           246       1548     2004   16485     7177935
B. antarctica         183       1455     1827   13965     6544527
C. quinquefasciatus   162       1461     1837   16269     6578769
D. melanogaster       246       1623     2038   16683     7300473

Table A.12 Intron length summaries for each species

Species               Median   Mean   Maximum
Ae. aegypti           150      3728   329295
An. gambiae           101      1136   196915
B. antarctica         69       333    19461
C. quinquefasciatus   109      1474   88659
D. melanogaster       103      955    142973

A.3 Supplemental Figures

Figure A.1 Flow cytometry estimates for B. antarctica

Flow cytometry estimate histogram for B. antarctica using D. melanogaster as the standard.

Figure A.2 Distribution of 17-mers from raw sequence data

Only 17-mers with at least 4 occurrences are shown. K-mers seen fewer than 4 times in the dataset likely arise from sequencing errors and are excluded from the plot for visualization. The smaller peak (at uniqueness 69) represents regions in the genome with a single nucleotide polymorphism (SNP).
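A k-mer distribution of the kind plotted in Figure A.2 can be illustrated with a short Python sketch. This is a simplified, hypothetical example rather than the tool used for the figure: it counts every k-mer in a list of reads, tabulates how many distinct k-mers occur at each multiplicity, and drops multiplicities below a cutoff (4 in the figure). Unlike dedicated k-mer counters, it does not merge a k-mer with its reverse complement, and the toy reads are invented.

```python
from collections import Counter


def kmer_spectrum(reads, k=17, min_count=4):
    """Map each k-mer multiplicity to the number of distinct k-mers seen that often."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Histogram of multiplicities, keeping only k-mers seen at least min_count times.
    spectrum = Counter(counts.values())
    return {mult: n for mult, n in sorted(spectrum.items()) if mult >= min_count}


# Toy input (not real data): five copies of one read plus one unrelated read.
toy_reads = ["ACGTACGT"] * 5 + ["TTTTGGGG"]
print(kmer_spectrum(toy_reads, k=4, min_count=4))
```

In the toy input, the k-mers from the single unrelated read occur once each and are removed by the cutoff, which mirrors how rare, error-derived k-mers are excluded from the plot.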


Figure A.3 Distribution of scaffold lengths from assembled genome

Length distribution of scaffolds from the final genome assembly. The smallest scaffold is 300 base pairs. Note the y-axis is on a log scale.

Figure A.4 Coverage histogram of bases in assembled genome

For each base pair in the assembled genome, coverage is calculated based on reads mapped to the assembled genome using BamTools (Barnett et al., 2011).

Figure A.5 cox1 relationship among B. antarctica individuals

Existing B. antarctica sequences from the cox1 locus collected in the Allegrucci et al. (2012) study were downloaded from GenBank and aligned to the data from this study, and a maximum likelihood tree was built using RAxML (Stamatakis, 2006). Sample codes from Allegrucci et al. (2012) are Livingston Island, Byers Peninsula (BGS, BBS, BLR), Spert Island (BSI), Cierva Point (BCP), Danco Island (DIB), Goudier Island (PLB), Palmer Station (BPS), Berthelot Island (BBI), Peterman Island (BPI), Gand Island (BGI), and Cape Evensen (BCE).

Figure A.6 Codon usage bias estimates for each species

Side-by-side boxplots of codon usage among five Diptera species using an estimate of the effective number of codons that accounts for background nucleotide composition. Only genes in the set of 3,582 one-to-one orthologs were used in the analysis to ensure that the same loci were compared between species. Each box represents the interquartile range, and outliers lying more than 1.5 times the interquartile range beyond the box are represented as dots.

Figure A.7 Intron size distribution comparison

Boxplot comparing the natural logarithm of intron size for the five Diptera species. Data are also shown in Table A.12. Each box represents the interquartile range, and outliers lying more than 1.5 times the interquartile range beyond the box are represented as dots.

Figure A.8 Demographic history inferred from a single B. antarctica genome

Pairwise Sequentially Markovian Coalescent (PSMC) analysis for inferred historical population sizes using variant data from the sequenced individual using a mutation rate of 0.4 × 10⁻⁸. The x-axis gives time measured by pairwise sequence divergence converted to years and the y-axis gives the effective population size measured by the scaled mutation rate. The green lines correspond to PSMC inferences on 100 rounds of bootstrapped sequences (as shown in Figure 4), while the red line corresponds to the estimate from the data.

Figure A.9 Demographic history inferred from a single B. antarctica genome

Pairwise Sequentially Markovian Coalescent (PSMC) analysis for inferred historical population sizes using variant data from the sequenced individual using a mutation rate of 1.7 × 10⁻⁸. The x-axis gives time measured by pairwise sequence divergence converted to years and the y-axis gives the effective population size measured by the scaled mutation rate. The green lines correspond to PSMC inferences on 100 rounds of bootstrapped sequences (as shown in Figure 4), while the red line corresponds to the estimate from the data.

Appendix B: Supplement to Chapter 3

B.1 Supplemental Materials and Methods

B.1.1 RNA Extraction and Library Preparation

Total RNA was extracted from larvae using TRIzol reagent (Life Technologies) according to the manufacturer’s protocol. RNA quantity and purity were assessed on a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific), and integrity was measured on an Agilent Bioanalyzer 2100 (Agilent Technologies). To generate RNA sequencing (RNA-seq) libraries, we used the Illumina TruSeq RNA Sample Preparation kit (Illumina) according to the manufacturer’s protocol. In short, mRNA was purified from 2 μg total RNA from each sample, fragmented, and converted to double-stranded cDNA. Sequencing barcodes were ligated to the cDNA fragments, and the resulting fragments were amplified using PCR. Libraries were validated on an Agilent Bioanalyzer 2100 to ensure the libraries had the expected fragment size of ~300 bp.

B.1.2 Sequencing

Libraries were quantified using qPCR and sequenced at the Ohio Agricultural Research and Development Center Molecular and Cellular Imaging Center. Sequencing libraries were multiplexed into groups of three (so that each multiplexed library contained one library from each of the three treatment groups) and sequenced on an Illumina Genome Analyzer II. For each sample, we obtained between 1.2 and 11.7 million 76-bp reads (Table B.4).

B.1.3 Mapping and Counting Reads

Reads were mapped to Belgica antarctica genomic contigs (in preparation) using Bowtie and TopHat (Trapnell et al., 2009), a short-read aligner that is capable of predicting exon-exon splice junctions. After mapping, alignment files were processed using SAMtools (Li et al., 2009), and counts were generated with HTSeq, a Python package for high-throughput sequencing analysis. Using HTSeq, we counted the total number of sequencing reads that aligned to each putative gene model in the draft B. antarctica genome. Our draft genome contains ~13,500 gene models that were derived from a combination of RNA-seq reads, BLAST hits, and ab initio gene prediction software using MAKER (Cantarel et al., 2008). Of these, ~11,500 had enough aligned reads to allow estimation of differential gene expression. A relatively high percentage (>76% for all samples) of reads aligned to gene models, suggesting a good representation of the transcriptome. Using blastx (E-value cutoff of 1E−4), we compared our gene models with annotated protein sequences from Aedes aegypti and Drosophila melanogaster to determine putative functions, and gene ontology (GO) terms were assigned to each gene model using Blast2GO (Conesa et al., 2005).

B.1.4 RNA-seq Data Analysis

To determine which genes were differentially expressed (DE), we used the R package DESeq (Anders and Huber, 2010). In short, DESeq normalizes counts so that library size is equivalent for each sample, estimates a variance function, and tests for expression difference between two treatment conditions using a negative binomial distribution. We ran DESeq for each pairwise comparison of treatments (i.e., C vs. D, C vs. CD, and D vs. CD). For clarity, throughout this manuscript, a fold change >1 for comparison X vs. Y indicates higher expression in group Y relative to X, whereas a fold change <1 indicates lower expression in group Y relative to X. For hierarchical clustering of the phenotypic classes, we obtained variance stabilized data from DESeq, calculated a matrix of distances, and used the R function hclust for clustering.
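The library-size normalization and the fold-change convention described above can be illustrated with a short Python sketch. It is a simplified illustration of the median-of-ratios size-factor idea described by Anders and Huber (2010), not the DESeq code used for the analysis; the function names and the toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: dict mapping gene -> list of raw counts, one entry per sample."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if min(gene_counts) == 0:
            continue  # skip genes whose geometric mean across samples is zero
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor of a sample: median ratio of its counts to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


def fold_change(norm_x, norm_y):
    """Fold change for the comparison 'X vs. Y': mean of group Y divided by mean of group X."""
    return (sum(norm_y) / len(norm_y)) / (sum(norm_x) / len(norm_x))


# Toy counts (not real data): sample 2 was sequenced twice as deeply as sample 1.
toy = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}
factors = size_factors(toy)
normalized_g1 = [c / f for c, f in zip(toy["g1"], factors)]
print(factors, normalized_g1)
```

Dividing each sample's counts by its size factor puts the two toy libraries on a common scale, so a gene whose raw counts differ only because of sequencing depth ends up with equal normalized values. With `fold_change`, a value above 1 means higher expression in group Y, matching the convention stated in the text.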

After identifying DE genes, enriched GO terms were determined using the R package GOseq (Young et al., 2010), which accounts for transcript length bias associated with RNA-seq data. We separately tested for enriched GO terms in genes that were up- and down-regulated to identify which categories of genes were induced and which were repressed by a particular treatment. After enrichment testing, P values were corrected using the Benjamini and Hochberg method (Benjamini and Hochberg, 1995) to control the false discovery rate. We restricted the output to GO terms with ontology “Biological Process” to limit redundancy. Additionally, we tested for enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Kanehisa and Goto, 2000) using the R package gene set analysis (GSA) (Efron and Tibshirani, 2007). Unlike traditional overrepresentation analysis, GSA uses the actual expression values for each gene in determining its enrichment score. For GSA, we mapped our gene models to the A. aegypti proteome and tested the entire set of A. aegypti KEGG pathways for enrichment. The input for GSA was a matrix of normalized counts for the B. antarctica gene models that had a significant (E-value < 1E−4) BLAST hit against A. aegypti. In cases where two or more gene models mapped to the same A. aegypti protein, only the best BLAST match was retained.
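The false discovery rate correction mentioned above follows a simple step-up procedure. The Python sketch below is a minimal illustration of the Benjamini and Hochberg (1995) adjustment, not the R code used in the analysis; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P values (not real data).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A GO term or metabolite would then be reported as significant when its adjusted value falls below the chosen false discovery rate threshold.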

B.1.5 Comparative Genomics of Dehydration Response

Using microarrays, Clark et al. (2009) identified ESTs responsive to desiccation and cryoprotective dehydration in the Arctic collembolan, M. arctica. We restricted our comparison to the two treatments in Clark et al. (2009) that were analogous to our desiccation and cryoprotective dehydration treatments: the treatments named “0.9 salt” and “−2 °C,” respectively. For simplicity, these treatments will also be referred to as desiccation and cryoprotective dehydration.

Putative orthologs between B. antarctica and M. arctica were determined by conducting reciprocal BLAST (algorithm tblastx) of our gene models against the M. arctica ESTs found on the microarray. The M. arctica microarray data were obtained from ArrayExpress (accession no. E-MEXP-2105) and analyzed using the R package limma according to the parameters outlined in Clark et al. (2009). Finally, using the R package VennDiagram, we calculated the degree of overlap between orthologous up- and down-regulated genes among the four species/treatment combinations. Additionally, to determine the overall similarity in gene expression between groups, we conducted hierarchical clustering on the samples, restricting the analysis to orthologous transcripts. Hierarchical clustering was conducted on the log fold change values for each transcript from each individual sample using JMP 9 (SAS Institute). For our dataset, we calculated the log fold change of each transcript relative to the mean expression value of the control group. For the M. arctica data, log fold changes for each EST were obtained from the limma pipeline following between-array normalization.
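The reciprocal BLAST criterion for putative orthologs can be sketched in Python as follows. This is a hypothetical illustration only: it assumes that hits have already been parsed into (query, subject, bit score) tuples and that the best hit is the one with the highest bit score, a ranking rule the text does not specify. The sequence identifiers are invented.

```python
def best_hits(hits):
    """hits: iterable of (query, subject, bit_score); keep the top-scoring subject per query."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy hit tables (not real data): gene models vs. ESTs and ESTs vs. gene models.
forward = [("Ba1", "Ma1", 200), ("Ba1", "Ma2", 150), ("Ba2", "Ma2", 90), ("Ba3", "Ma1", 120)]
reverse = [("Ma1", "Ba1", 210), ("Ma2", "Ba1", 140), ("Ma2", "Ba2", 95)]
print(reciprocal_best_hits(forward, reverse))
```

Only pairs that select each other in both search directions are kept, which is what makes the criterion conservative: in the toy data, two of the three gene models have a forward hit but are discarded because the reverse search points elsewhere.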

B.1.6 qPCR Validation

To validate results from the RNA-seq analysis, we conducted qPCR on a subset of 13 genes. We selected genes from several functional categories of interest (i.e., heat shock proteins, detoxification enzymes, regulators of cell death, and structural components of the cuticle and cytoskeleton), including a mix of genes that were up- and down-regulated by our treatments. Primers were designed using IDT’s primer design software (www.idtdna.com) with the following parameters: length of 24 nt, melting temperature of 60 °C, and product size of 100–180 bp. Primers were tested using conventional PCR and gel electrophoresis for a product of the correct size, and standard curves were generated from a 10-fold dilution series of PCR products. The primer sequences and standard curves are presented in Table B.5.

cDNA for qPCR was generated from aliquots of the same RNA samples used for RNA-seq, thus allowing a direct correlation of RNA-seq and qPCR results. Total RNA was further purified using the Ambion RiboPure kit (Life Technologies), and cDNA was generated with the Invitrogen SuperScript VILO cDNA Synthesis Kit. The resulting cDNA samples were diluted 10× before analysis and stored at −20 °C. Each qPCR reaction consisted of 2 μL cDNA, 2 μL of each primer at 250 nM concentration, 4 μL water, and 10 μL 2× iQ SYBR Green Supermix (Bio-Rad). Reactions were carried out on a Bio-Rad iCycler iQ Real-Time PCR Detection System, with the following temperature protocol: 94 °C for 3 min, followed by 40 cycles of 94 °C for 10 s, 58 °C for 30 s, and 72 °C for 30 s. After each run, a melt curve was generated to verify that only one product was present in the reaction. Baseline correction, amplitude normalization, and threshold cycle (Ct) calculations were conducted according to Larionov et al. (2005) with a custom MATLAB script. Relative gene expression was calculated using the 2^(−ΔCt) method, with rpl19 serving as the reference gene. To convert to fold change, the mean 2^(−ΔCt) value for each treatment group was divided by the mean value for the control.
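The relative expression calculation described above can be written out as a short worked example. The Python sketch below is illustrative only (the original calculations used a custom script that is not reproduced here); the function names and Ct values are hypothetical, with ΔCt taken as the Ct of the target gene minus the Ct of the reference gene in the same sample.

```python
def relative_expression(ct_target, ct_reference):
    """Relative expression as 2^(-dCt), with dCt = Ct(target) - Ct(reference gene)."""
    return 2.0 ** -(ct_target - ct_reference)


def mean_relative_expression(samples):
    """samples: list of (ct_target, ct_reference) pairs, one pair per biological sample."""
    values = [relative_expression(target, ref) for target, ref in samples]
    return sum(values) / len(values)


def qpcr_fold_change(treatment, control):
    """Mean 2^(-dCt) of the treatment group divided by the mean of the control group."""
    return mean_relative_expression(treatment) / mean_relative_expression(control)


# Toy Ct values (not real data): the target crosses threshold two cycles earlier after treatment.
control = [(24.0, 18.0), (25.0, 19.0)]
treatment = [(22.0, 18.0), (23.0, 19.0)]
print(qpcr_fold_change(treatment, control))
```

In the toy data the target amplifies two cycles earlier relative to the reference gene in the treated samples, which corresponds to a four-fold increase under the assumption of perfect doubling per cycle.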

B.1.7 Metabolomics

Because a large number of metabolic genes were differentially regulated in our treatments, we also conducted a metabolomics analysis of the same treatment conditions.

Groups of 15 larvae were homogenized in 600 μL of 2:1 methanol:chloroform, 400 μL water was added for phase separation, and 180 μL of the upper aqueous phase was vacuum dried. The extract was resuspended in 30 μL of 20 mg/mL methoxyamine hydrochloride in pyridine and heated for 60 min at 40 °C while shaking. Subsequently, 30 μL of N-methyl-N-(trimethylsilyl)trifluoroacetamide was added, and the sample was heated for an additional 60 min at 40 °C. All derivatization steps were conducted with a CTC CombiPal autosampler (Gerstel) to ensure uniformity of samples. After derivatization, samples were run on a Trace GC Ultra chromatograph coupled to a Trace DSQII quadrupole mass spectrometer (Thermo Fisher Scientific). Oven conditions were as follows: from 70 to 170 °C at 5 °C/min, from 170 to 280 °C at 7 °C/min, from 280 to 320 °C at 15 °C/min, and then the oven remained for 4 min at 320 °C. Spectra were screened for 60 pure reference compounds in a custom database, and quantification was accomplished by comparing samples to a 10-point standard curve of pure analyte. Data were analyzed by conducting an ANOVA followed by a pooled-t test for each compound in JMP 9. P values were corrected using the Benjamini-Hochberg method (Benjamini and Hochberg, 1995).
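As a worked example of the oven program described above, the total run time per sample follows from the ramp rates and the final hold. The Python sketch below simply performs this arithmetic; the function name is hypothetical, and the calculation assumes that each ramp starts immediately at the temperature where the previous one ended, with no additional hold or equilibration time.

```python
def program_minutes(start_temp, ramps, final_hold_min):
    """ramps: list of (target_temp_C, rate_C_per_min) applied in order."""
    total = 0.0
    temp = start_temp
    for target, rate in ramps:
        total += (target - temp) / rate  # minutes spent on this ramp
        temp = target
    return total + final_hold_min


# Ramps taken from the oven conditions in the text.
oven_ramps = [(170, 5), (280, 7), (320, 15)]
print(round(program_minutes(70, oven_ramps, 4), 1))  # about 42.4 min per run
```

The three ramps take 20, about 15.7, and about 2.7 min, respectively, so with the 4-min hold at 320 °C each run lasts roughly 42.4 min under these assumptions.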


B.2 Supplemental Tables

Table B.1 GO enrichment analysis of cryoprotective dehydration

GO term      Definition                                    FDR        No. up- or       Total in
                                                                      down-regulated   category
Up
GO:0009408   Response to heat                              4.18E−03   19               50
GO:0032312   Regulation of ARF GTPase activity             4.25E−03   8                13
GO:0016310   Phosphorylation                               1.81E−02   82               466
GO:0006950   Response to stress                            1.81E−02   17               59
GO:0007015   Actin filament organization                   2.32E−02   25               80
GO:0006468   Protein phosphorylation                       3.18E−02   71               363
GO:0070936   Protein K48-linked ubiquitination             4.44E−02   4                6
GO:0032436   Positive regulation of proteasomal            4.50E−02   4                5
             ubiquitin-dependent protein catabolic process
GO:0007298   Border follicle cell migration                5.20E−02   23               79
GO:0043405   Regulation of MAP kinase activity             5.20E−02   3                4
GO:0045081   Negative regulation of interleukin-10         5.20E−02   3                3
             biosynthetic process
GO:0045599   Negative regulation of cell differentiation   5.20E−02   3                3
Down
GO:0006508   Proteolysis                                   4.93E−18   115              595
GO:0008152   Metabolic process                             3.58E−13   133              827
GO:0006030   Chitin metabolic process                      1.89E−08   32               104
GO:0006629   Lipid metabolic process                       1.95E−08   38               159
GO:0055114   Oxidation-reduction process                   2.23E−06   101              732
GO:0005975   Carbohydrate metabolic process                1.37E−05   37               177
GO:0055085   Transmembrane transport                       3.85E−05   64               408
GO:0006810   Transport                                     6.19E−05   96               756
GO:0015986   ATP synthesis coupled proton transport        4.85E−04   10               19
GO:0006096   Glycolysis                                    4.85E−04   12               30
GO:0008643   Carbohydrate transport                        8.43E−04   17               61
GO:0015992   Proton transport                              9.27E−04   12               30
GO:0006754   ATP biosynthetic process                      2.03E−03   11               32
GO:0009253   Peptidoglycan catabolic process               4.58E−03   7                13
GO:0015672   Monovalent inorganic cation transport         8.97E−03   4                4
GO:0050830   Defense response to Gram-positive bacterium   1.30E−02   8                21
GO:0030239   Myofibril assembly                            4.94E−02   6                15
GO:0060361   Flight                                        5.86E−02   5                10
GO:0001894   Tissue homeostasis                            6.42E−02   3                3
GO:0044262   Cellular carbohydrate metabolic process       6.59E−02   4                6
GO:0031032   Actomyosin structure organization             7.34E−02   3                3
GO:0045087   Innate immune response                        7.34E−02   16               91
GO:0016045   Detection of bacterium                        7.48E−02   4                6
GO:0015991   ATP hydrolysis coupled proton transport       7.48E−02   8                28

Table B.2 GSA revealing enriched pathways during cryoprotective dehydration

In Gene Set Analysis (GSA), positive gene sets are enriched gene sets in which genes tend to be up-regulated, whereas negative gene sets are enriched gene sets in which genes tend to be down-regulated.

Gene set name                                  Score   P value
Positive gene sets*
Jak/STAT signaling pathway                     1.22    <2E−4
Ubiquitin mediated proteolysis                 0.95    <2E−4
Natural killer cell mediated cytotoxicity      0.73    <2E−4
mTOR signaling pathway                         0.71    <2E−4
Wnt signaling pathway                          0.54    <2E−4
Purine metabolism                              0.37    <2E−4
Negative gene sets*
Glycolysis/gluconeogenesis                     -1.38   <2E−4
Starch and sucrose metabolism                  -1.16   <2E−4
Propanoate metabolism                          -1.15   <2E−4
Galactose metabolism                           -1.13   <2E−4
Pyruvate metabolism                            -0.92   <2E−4
Ether lipid metabolism                         -0.86   <2E−4
Drug metabolism - cytochrome P450              -0.83   <2E−4
Retinol metabolism                             -0.81   <2E−4
Valine, leucine, and isoleucine degradation    -0.79   <2E−4
Metabolism of xenobiotics                      -0.75   <2E−4
Glutathione metabolism                         -0.5    <2E−4
Amino sugar and nucleotide sugar metabolism    -0.42   <2E−4

Table B.3 GO enrichment analysis of CD vs. D comparison

GO enrichment analysis of genes more highly expressed in the cryoprotective dehydration (CD) group relative to the desiccation (D) group. FDR, False Discovery Rate.

GO term      Definition               FDR        No. up- or down-regulated   Total in category
GO:0006950   Response to stress       5.18E−04   11                          59
GO:0045214   Sarcomere organization   9.78E−04   11                          41
GO:0030239   Myofibril assembly       3.15E−02   6                           15
GO:0009408   Response to heat         3.99E−02   9                           50
GO:0006508   Proteolysis              9.64E−02   31                          595

Table B.4 Summary of read statistics from Illumina sequencing

Total reads includes the raw number of unprocessed reads obtained from Illumina sequencing, whereas number of high-quality reads refers to the reads that remained after read trimming and filtering during the mapping step. The last column shows the percentage of high-quality reads that unambiguously mapped to a B. antarctica gene model.

Sample   Total no. of reads   No. high-quality reads   Percentage of high-quality reads mapping to a gene model
C1       1,423,663            1,210,411                77.93
C2       11,739,615           10,335,845               79.26
C3       1,180,431            1,027,833                78.06
D1       2,836,265            2,385,458                77.44
D2       5,211,200            4,553,910                78.89
D3       2,746,396            2,349,825                76.81
CD1      2,620,152            2,223,350                77.98
CD2      7,924,380            6,996,403                79.39
CD3      1,861,076            1,608,715                77.89

Table B.5 Primers used for qPCR validation

R² and efficiency (E) were determined by conducting an eight-point standard curve with purified PCR product as template. cyp450a and cyp450b, two different cytochrome P450 genes; hsp40, 40-kDa heat shock protein; hsp70, 70-kDa heat shock protein; l(2)efl, lethal-2 essential for life; mlck, myosin light chain kinase; rpl19, ribosomal protein L19; spermidine syn., spermidine synthase; tep3, thiolester containing protein III; UDP-GlycTrans, UDP-glycosyltransferase.

Gene                Accession no.   R²        E (%)   Primers                              Tm (°C)
l(2)efl             GAAK01009816    0.9999    97      F: 5′-ATGGTGCGGTCCTTAACCTTGACT-3′    60.4
                                                      R: 5′-AAATTGCGCAGCGACACCCTTATC-3′    60.3
hsp40               GAAK01004380    0.9991    77.3    F: 5′-TCGCAATCATTCAACGTTCACGGC-3′    60.4
                                                      R: 5′-TGTTGATGTCTTCCAGGCTGACCA-3′    60.4
hsp70               GAAK01011953    0.9956    94.9    F: 5′-CTGCTTTGGCTTACGGTTTGGACA-3′    60
                                                      R: 5′-AGATCCCTCGTCGATGGTCAAGAT-3′    59.3
UDP-GlycTrans       GAAK01002922    0.9996    95.4    F: 5′-CGAACTGCTGCATTCCAAGCAAGA-3′    60.2
                                                      R: 5′-GCAACCAACGGAACGTTGAACTGA-3′    60.2
cyp450a             GAAK01011671    0.9996    97.3    F: 5′-TTCGTACTGGAAGAAACTCGGCGT-3′    60.2
                                                      R: 5′-ACGGTGTGCCAAACGACTTCAATG-3′    60.3
cyp450b             GAAK01006077    0.9998    96.7    F: 5′-TCATGGAGCGCGTCGTTAAAGAGA-3′    60.3
                                                      R: 5′-CGGTGCAGCGCGTATATGTTCAAA-3′    60.1
Sestrin             GAAK01000559    0.9999    96.7    F: 5′-GCTTGTTGCTATCCTGACCGCATT-3′    59.9
                                                      R: 5′-TGGCCTCCAGAATCATCAGGTTCA-3′    60.1
Relish              GAAK01006924    0.9991    100.2   F: 5′-TCTTGCGAACTCCGCCTTACAGAA-3′    60.3
                                                      R: 5′-ACTTGTACCGAAACTCGATGGGCT-3′    60.3
tep3                GAAK01010272    0.9991    100.6   F: 5′-TGACGTCAAAGACGAGGGAAACCA-3′    60.3
                                                      R: 5′-TGAACGGGCGGATCATGAACGATA-3′    60.3
thread              GAAK01000576    0.9999    94.9    F: 5′-TCGGTTCCTCGTTCTTCGTTTCCA-3′    60.2
                                                      R: 5′-ACGACAACCCTTGGGTAGAACACA-3′    60.2
spermidine syn.     GAAK01013086    0.99999   94.1    F: 5′-GCCGTTTATGGCTTGTGGGTTTGA-3′    60.3
                                                      R: 5′-ACTGCTGGGCCAATAGGATCACTT-3′    60.4
cuticular protein   GAAK01011152    0.9997    97.8    F: 5′-TTAACGCCCGCTTGTGATATGTGC-3′    60
                                                      R: 5′-AAGAAAGCGGCATCGTAATGCGTG-3′    60.3
mlck                GAAK01011539    0.9999    96.4    F: 5′-CCGGTGATTACAAATGCATCGCCA-3′    60.1
                                                      R: 5′-ACTCAAGTGTGGTCGTTCGGTTCT-3′    60.3
rpl19               GAAK01002260    0.999     92.7    F: 5′-ACATCCACAAGCGTAAGGCTGAGA-3′    60.3
                                                      R: 5′-TTCTTGTTTCTTGGTGGCGATGCG-3′    60.1

B.3 Supplemental Figures

Figure B.1 Results of qPCR validation experiment

In A and B, the fold changes obtained by both RNA-seq and qPCR are graphed together for the C vs. D (A) and C vs. CD (B) comparisons. In C, individual log fold changes obtained by RNA-seq and qPCR for each gene in each sample are plotted with the best-fit regression line. Log fold changes for each sample were determined relative to the mean of the control group and were normalized to a reference gene, rpl19. C, control; D, desiccation; CD, cryoprotective dehydration; l(2)efl, lethal-2 essential for life; hsp40, 40-kDa heat shock protein; hsp70, 70-kDa heat shock protein; cyp450a and cyp450b, two different cytochrome P450 genes; tep3, thiolester containing protein III; mlck, myosin light chain kinase.

Figure B.2 Metabolomic response to desiccation and cryoprotective dehydration

Bars represent mean ± SE of the fold change of each metabolite relative to control. Different letters represent significant differences between groups (ANOVA, pooled t-test, false discovery rate < 0.05).

Figure B.3 Hierarchical clustering of the metabolomics dataset

Hierarchical clustering was conducted on the log metabolite concentrations for each compound in each sample using the Ward method. C, control; D, desiccation; CD, cryoprotective dehydration.

Appendix C: Supplement to Chapter 4

C.1 Supplemental Tables

Table C.1 Summary of Illumina read filtering

                   Reads (#)     Bases (#)        Reads (%)   Bases (%)
Raw                428,355,968   40,615,616,800   100         100
Quality Filtered   414,083,892   37,890,395,975   96.67       93.29
k-mer Filtered     376,681,172   33,610,109,059   87.94       82.75

Table C.2 Reapr Summary

Comparison of the Reapr output for different assemblers, filtering schemes, and parameters. WFull refers to k-mer filtering with k-mer distribution created with reads from a single library. Equal refers to k-mer filtering with k-mer distribution created with reads from equal amounts from each library. Qual refers to quality only filtering (no k-mer filtering).

Assembler    Filtering   k    Error Free (bp)   Error Free (%)   Total (bp)
SOAPdenovo   WFull       63   360,137,912       77.99            461,749,204
SOAPdenovo   WFull       65   364,395,182       81.15            449,012,964
SOAPdenovo   WFull       67   345,903,768       80.06            432,068,322
SOAPdenovo   WFull       69   340,035,087       82.57            411,837,682
SOAPdenovo   WFull       71   317,015,653       82.93            382,254,842
SOAPdenovo   Qual        65   284,506,728       56.47            503,843,443
SOAPdenovo   Qual        67   279,315,522       60.67            460,417,480
SOAPdenovo   Qual        69   264,617,309       63.31            417,940,701
SOAPdenovo   Qual        71   255,445,841       68.21            374,498,616
SOAPdenovo   Equal       65   291,373,129       57.40            507,580,294
SOAPdenovo   Equal       67   286,664,060       61.88            463,266,561
SOAPdenovo   Equal       69   269,330,295       63.94            421,193,959
SOAPdenovo   Equal       71   255,093,486       67.55            377,654,689
Abyss        WFull       63   330,663,211       59.41            556,617,841
Abyss        WFull       65   277,722,332       50.30            552,139,698
Abyss        WFull       67   273,679,107       50.03            546,994,756
Abyss        WFull       69   342,894,970       63.30            541,726,415
Abyss        WFull       71   341,260,993       63.84            534,536,703
Celera       WFull       22   -                 -                3,836,556,978

Table C.3 Assembly statistics

Summary statistics throughout the assembly process. Number is the number of scaffolds. Longest is the length (bp) of the longest scaffold. Total and Total (no N) are the lengths (bp) of the entire assembly with and without Ns, respectively. n50 is the N50 of the assembly (bp).

               SOAPdenovo   Reapr       PBJelly     UniVec
Number         328427       345805      241421      241305
Longest        1086837      551903      2355881     2355881
Total          449012964    447765945   522293227   522270905
Total (no N)   438919414    427062759   519197457   519175170
n50            11465        9328        29439       29476

Table C.4 Summary of repeat elements

Type Number Bases % of Genome

SINEs: 33478 7221604 1.52

LINEs: 179696 32921812 6.91

LTR 17235 6433260 1.35

DNA 127299 19969616 4.19

Unclassified: 397153 59304762 12.45

Total interspersed 125851054 26.42

Small RNA 323 44904 0.01

Simple repeats 416660 18229578 3.83

Low complexity 89350 4553468 0.96

Total 148389050 31.15


Table C.5 Summary of level 2 molecular function Gene Ontology terms

GO ID GO Term S. bullata (#) S. bullata (%) diptera (#) diptera (%)

GO:0000988 protein binding transcription factor activity 504 3.51 969 1.25

GO:0001071 nucleic acid binding transcription factor activity 670 4.66 1941 2.50

GO:0003824 catalytic activity 4839 33.66 20654 26.61

GO:0005198 structural molecule activity 522 3.63 1910 2.46

GO:0005215 transporter activity 1033 7.19 4297 5.54

GO:0005488 binding 8460 58.85 37306 48.06

GO:0009055 electron carrier activity 138 0.96 326 0.42

GO:0016015 morphogen activity 2 0.01 1 0.00

GO:0016209 antioxidant activity 83 0.58 230 0.30

GO:0016530 metallochaperone activity 6 0.04 17 0.02

GO:0031386 protein tag 4 0.03 10 0.01

GO:0042056 chemoattractant activity 5 0.03 10 0.01

GO:0045182 translation regulator activity 45 0.31 81 0.10

GO:0045735 nutrient reservoir activity 11 0.08 14 0.02

GO:0060089 molecular transducer activity 726 5.05 2546 3.28

GO:0098772 molecular function regulator 747 5.20 2064 2.66


Table C.6 Summary of level 2 cellular process Gene Ontology terms

S. S. bullata bullata diptera diptera (#) (%) (#) (%) GO:0009987 cellular process 8591 59.76 29813 38.40 GO:0044699 single-organism process 8474 58.95 26726 34.43 GO:0008152 metabolic process 7675 53.39 26643 34.32 GO:0065007 biological regulation 6675 46.43 16710 21.53 GO:0050896 response to stimulus 5791 40.29 12762 16.44 GO:0032501 multicellular organismal process 5543 38.56 10085 12.99 GO:0032502 developmental process 5165 35.93 9197 11.85 GO:0071840 cellular component organization or biogenesis 4868 33.86 9282 11.96 GO:0051179 localization 4411 30.69 11806 15.21 GO:0023052 signaling 3531 24.56 8564 11.03 GO:0051704 multi-organism process 2420 16.83 3719 4.79 GO:0022414 reproductive process 1785 12.42 2874 3.70 GO:0002376 immune system process 1779 12.38 2624 3.38 GO:0000003 reproduction 1460 10.16 2725 3.51 GO:0040011 locomotion 1445 10.05 2503 3.22 GO:0040007 growth 1032 7.18 1862 2.40 GO:0007610 behavior 981 6.82 1455 1.87 GO:0022610 biological adhesion 838 5.83 1582 2.04 GO:0048511 rhythmic process 341 2.37 408 .53 GO:0044848 biological phase 262 1.82 484 .62 GO:0001906 cell killing 45 .31 44 .06 GO:0098743 cell aggregation 20 .14 20 .03


Table C.7 Summary of level 2 cellular component Gene Ontology terms

S. bullata (#) S. bullata (%) diptera (#) diptera (%)

GO:0005576 extracellular region 3016 20.98 5806 7.48

GO:0005623 cell 8668 60.30 23173 29.85

GO:0009295 nucleoid 48 0.33 73 0.09

GO:0016020 membrane 5414 37.66 15964 20.56

GO:0019012 virion 11 0.08 57 0.07

GO:0030054 cell junction 869 6.05 1222 1.57

GO:0031012 extracellular matrix 251 1.75 515 0.66

GO:0031974 membrane-enclosed lumen 3110 21.63 5179 6.67

GO:0032991 macromolecular complex 3845 26.75 9656 12.44

GO:0043226 organelle 7864 54.71 19153 24.67

GO:0044215 other organism 18 0.13 18 0.02

GO:0045202 synapse 742 5.16 1205 1.55

GO:0055044 symplast 137 0.95 302 0.39


Table C.8 Gene Ontology terms enriched in S. bullata v. dipteran comparison

GO-ID Term Category FDR P-Value #Test #Ref #notAnnotTest #notAnnotRef OverUnder

GO:0043234 protein complex C 0 0 3216 7984 7059 48808 OVER

GO:0005886 plasma membrane C 0 0 2674 4668 7601 52124 OVER

GO:0005654 nucleoplasm C 0 0 2154 2132 8121 54660 OVER

GO:0005829 cytosol C 0 0 2126 3365 8149 53427 OVER

GO:0005739 mitochondrion C 8.67E-240 2.45E-240 1573 3025 8702 53767 OVER

GO:0005794 Golgi apparatus C 3.63E-214 1.11E-214 1194 2009 9081 54783 OVER

GO:0005783 endoplasmic reticulum C 1.2E-165 4.28E-166 1087 2055 9188 54737 OVER

GO:0005768 endosome C 1.56E-131 6.52E-132 683 1067 9592 55725 OVER

GO:0016023 cytoplasmic membrane-bounded vesicle C 8.84E-129 3.9E-129 775 1363 9500 55429 OVER

GO:0005929 cilium C 1.763E-89 9.371E-90 401 546 9874 56246 OVER

GO:0000228 nuclear chromosome C 2.568E-88 1.374E-88 445 675 9830 56117 OVER

GO:0005764 lysosome C 5.617E-87 3.06E-87 427 633 9848 56159 OVER

GO:0005615 extracellular space C 4.769E-72 2.646E-72 546 1120 9729 55672 OVER

GO:0005815 microtubule organizing center C 7.687E-65 4.367E-65 464 917 9811 55875 OVER

GO:0005635 nuclear envelope C 1.265E-58 7.86E-59 341 583 9934 56209 OVER

GO:0009536 plastid C 3.252E-44 2.301E-44 282 517 9993 56275 OVER

GO:0005777 peroxisome C 1.649E-40 1.227E-40 198 296 10077 56496 OVER

GO:0005730 nucleolus C 5.655E-29 4.565E-29 831 2945 9444 53847 OVER

GO:0005578 proteinaceous extracellular matrix C 1.905E-25 1.563E-25 144 247 10131 56545 OVER

GO:0005811 lipid particle C 1.657E-18 1.409E-18 103 177 10172 56615 OVER

GO:0005618 cell wall C 1.732E-10 1.566E-10 96 233 10179 56559 OVER

GO:0005840 ribosome C 5.023E-10 4.589E-10 249 866 10026 55926 OVER

GO:0009579 thylakoid C 2.024E-09 1.869E-09 29 31 10246 56761 OVER

GO:0019899 enzyme binding F 9.71E-280 2.32E-280 1489 2444 8786 54348 OVER

GO:0008134 transcription factor binding F 1.7E-106 8.31E-107 535 810 9740 55982 OVER

continued


Table C.8 continued

GO-ID Term Category FDR P-Value #Test #Ref #notAnnotTest #notAnnotRef OverUnder

GO:0008289 lipid binding F 1.846E-61 1.104E-61 481 1007 9794 55785 OVER

GO:0008092 cytoskeletal protein binding F 9.242E-51 6.233E-51 672 1840 9603 54952 OVER

GO:0043167 ion binding F 1.814E-48 1.259E-48 3497 15258 6778 41534 OVER

GO:0003677 DNA binding F 4.762E-44 3.401E-44 1216 4279 9059 52513 OVER

GO:0001071 nucleic acid binding transcription factor activity F 4.762E-44 3.402E-44 670 1941 9605 54851 OVER

GO:0030234 enzyme regulator activity F 6.338E-42 4.611E-42 542 1470 9733 55322 OVER

GO:0003729 mRNA binding F 1.395E-36 1.066E-36 203 338 10072 56454 OVER

GO:0016887 ATPase activity F 5.896E-30 4.741E-30 469 1375 9806 55417 OVER

GO:0042393 histone binding F 3.169E-26 2.58E-26 140 229 10135 56563 OVER

GO:0016874 ligase activity F 1.38E-24 1.137E-24 338 945 9937 55847 OVER

GO:0004871 signal transducer activity F 1.794E-21 1.496E-21 658 2373 9617 54419 OVER

GO:0004386 helicase activity F 1.53E-19 1.286E-19 160 353 10115 56439 OVER

GO:0030674 protein binding, bridging F 3.934E-18 3.372E-18 111 205 10164 56587 OVER

GO:0016491 oxidoreductase activity F 7.469E-18 6.426E-18 818 3225 9457 53567 OVER

GO:0022857 transmembrane transporter activity F 1.479E-16 1.278E-16 861 3478 9414 53314 OVER

GO:0005198 structural molecule activity F 3.412E-16 2.97E-16 522 1910 9753 54882 OVER

GO:0016746 transferase activity, transferring acyl groups F 1.809E-15 1.58E-15 282 884 9993 55908 OVER

GO:0016829 lyase activity F 2.03E-15 1.78E-15 217 620 10058 56172 OVER

GO:0032182 ubiquitin-like protein binding F 3.751E-11 3.34E-11 98 232 10177 56560 OVER

GO:0016853 isomerase activity F 1.045E-10 9.409E-11 147 424 10128 56368 OVER

GO:0016791 phosphatase activity F 1.045E-10 9.41E-11 248 843 10027 55949 OVER

GO:0008565 protein transporter activity F 1.186E-09 1.092E-09 92 228 10183 56564 OVER

GO:0004518 nuclease activity F 2.039E-09 1.89E-09 165 521 10110 56271 OVER

GO:0008135 translation factor activity, RNA binding F 7.469E-09 6.973E-09 108 300 10167 56492 OVER

continued


Table C.8 continued

GO-ID Term Category FDR P-Value #Test #Ref #notAnnotTest #notAnnotRef OverUnder

GO:0016765 transferase activity, transferring alkyl or aryl (other than methyl) groups F 2.956E-08 2.799E-08 65 148 10210 56644 OVER

GO:0008233 peptidase activity F 8.536E-06 8.139E-06 567 2548 9708 54244 OVER

GO:0003924 GTPase activity F 2.416E-05 2.312E-05 169 641 10106 56151 OVER

GO:0016757 transferase activity, transferring glycosyl groups F 2.585E-05 2.482E-05 210 831 10065 55961 OVER

GO:0016779 nucleotidyltransferase activity F 4.035E-05 3.887E-05 112 392 10163 56400 OVER

GO:0016301 kinase activity F 9.647E-05 9.326E-05 553 2548 9722 54244 OVER

GO:0016798 hydrolase activity, acting on glycosyl bonds F 0.0002897 0.000281 137 529 10138 56263 OVER

GO:0019843 rRNA binding F 0.0012543 0.001221 36 102 10239 56690 OVER

GO:0002376 immune system process P 0 0 1779 2624 8496 54168 OVER

GO:0006950 response to stress P 0 0 3266 5527 7009 51265 OVER

GO:0007165 signal transduction P 0 0 3176 7926 7099 48866 OVER

GO:0030154 cell differentiation P 0 0 3198 4948 7077 51844 OVER

GO:0048646 anatomical structure formation involved in morphogenesis P 9.28E-271 2.31E-271 1271 1856 9004 54936 OVER

GO:0008283 cell proliferation P 6.88E-269 1.74E-269 1296 1940 8979 54852 OVER

GO:0008219 cell death P 3.05E-260 8.09E-261 1555 2797 8720 53995 OVER

GO:0006464 cellular protein modification process P 2.04E-251 5.69E-252 2438 6068 7837 50724 OVER

GO:0007267 cell-cell signaling P 2.51E-234 7.17E-235 1015 1359 9260 55433 OVER

GO:0000003 reproduction P 5.56E-231 1.61E-231 1460 2725 8815 54067 OVER

GO:0042592 homeostatic process P 1.14E-211 3.56E-212 1160 1924 9115 54868 OVER

GO:0007005 mitochondrion organization P 2.16E-199 6.81E-200 659 640 9616 56152 OVER

GO:0048870 cell motility P 7.27E-197 2.37E-197 1013 1587 9262 55205 OVER

GO:0009790 embryo development P 1.91E-183 6.4E-184 1393 2924 8882 53868 OVER

GO:0006461 protein complex assembly P 1.15E-167 4E-168 1095 2064 9180 54728 OVER

GO:0040007 growth P 1.51E-167 5.3E-168 1032 1862 9243 54930 OVER

GO:0006629 lipid metabolic process P 9.84E-162 3.53E-162 1211 2505 9064 54287 OVER

continued


Table C.8 continued

GO-ID Term Category FDR P-Value #Test #Ref #notAnnotTest #notAnnotRef OverUnder

GO:0007010 cytoskeleton organization P 6.29E-147 2.38E-147 1118 2328 9157 54464 OVER

GO:0050877 neurological system process P 9.29E-144 3.55E-144 871 1545 9404 55247 OVER

GO:0016192 vesicle-mediated transport P 4.53E-128 2.02E-128 1087 2413 9188 54379 OVER

GO:0007155 cell adhesion P 7.77E-125 3.51E-125 826 1558 9449 55234 OVER

GO:0044403 symbiosis, encompassing mutualism through parasitism P 5.36E-116 2.51E-116 616 979 9659 55813 OVER

GO:0051301 cell division P 1.91E-107 9.22E-108 606 1011 9669 55781 OVER

GO:0006914 autophagy P 2.87E-107 1.39E-107 369 375 9906 56417 OVER

GO:0006605 protein targeting P 4.45E-106 2.31E-106 658 1181 9617 55611 OVER

GO:0006259 DNA metabolic process P 8.858E-91 4.62E-91 784 1744 9491 55048 OVER

GO:0003013 circulatory system process P 1.499E-89 7.869E-90 406 559 9869 56233 OVER

GO:0021700 developmental maturation P 6.988E-88 3.761E-88 403 561 9872 56231 OVER

GO:0007067 mitotic nuclear division P 4.871E-63 2.881E-63 368 630 9907 56162 OVER

GO:0007059 chromosome segregation P 8.927E-61 5.368E-61 320 505 9955 56287 OVER

GO:0005975 carbohydrate metabolic process P 3.295E-60 1.992E-60 778 2111 9497 54681 OVER

GO:0051604 protein maturation P 5.054E-60 3.072E-60 271 373 10004 56419 OVER

GO:0006790 sulfur compound metabolic process P 1.068E-58 6.529E-59 349 606 9926 56186 OVER

GO:0007568 aging P 5.226E-55 3.264E-55 390 767 9885 56025 OVER

GO:0006397 mRNA processing P 1.206E-54 7.654E-55 445 953 9830 55839 OVER

GO:0006913 nucleocytoplasmic transport P 1.281E-53 8.256E-54 424 893 9851 55899 OVER

GO:0007009 plasma membrane organization P 2.328E-52 1.524E-52 289 475 9986 56317 OVER

GO:0006520 cellular amino acid metabolic process P 4.336E-51 2.91E-51 449 1006 9826 55786 OVER

GO:0006412 translation P 1.384E-49 9.561E-50 612 1626 9663 55166 OVER

GO:0055085 transmembrane transport P 6.527E-48 4.554E-48 1047 3445 9228 53347 OVER

GO:0006091 generation of precursor metabolites and energy P 1.369E-46 9.641E-47 437 1016 9838 55776 OVER

GO:0034330 cell junction organization P 3.405E-42 2.443E-42 220 347 10055 56445 OVER

continued


Table C.8 continued

GO-ID Term Category FDR P-Value #Test #Ref #notAnnotTest #notAnnotRef OverUnder

GO:0030198 extracellular matrix organization P 6.358E-41 4.668E-41 246 433 10029 56359 OVER

GO:0019748 secondary metabolic process P 7.633E-41 5.63E-41 171 221 10104 56571 OVER

GO:0022618 ribonucleoprotein complex assembly P 6.606E-40 4.982E-40 231 396 10044 56396 OVER

GO:0051186 cofactor metabolic process P 3.036E-37 2.3E-37 378 917 9897 55875 OVER

GO:0034655 nucleobase-containing compound catabolic process P 1.376E-34 1.079E-34 390 996 9885 55796 OVER

GO:0043473 pigmentation P 7.737E-32 6.092E-32 134 176 10141 56616 OVER

GO:0030705 cytoskeleton-dependent intracellular transport P 2.233E-24 1.848E-24 136 231 10139 56561 OVER

GO:0007034 vacuolar transport P 4.686E-20 3.923E-20 122 223 10153 56569 OVER

GO:0071554 cell wall organization or biogenesis P 2.451E-16 2.125E-16 98 180 10177 56612 OVER

GO:0006399 tRNA metabolic process P 4.322E-13 3.834E-13 181 517 10094 56275 OVER

GO:0032196 transposition P 2.51E-10 2.277E-10 29 27 10246 56765 OVER

GO:0006457 protein folding P 2.817E-09 2.62E-09 202 681 10073 56111 OVER

GO:0071941 nitrogen cycle metabolic process P 1.645E-08 1.541E-08 20 15 10255 56777 OVER
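The four count columns of Table C.8 (#Test, #Ref, #notAnnotTest, #notAnnotRef) can be read as a 2x2 contingency table for each GO term. The sketch below is a minimal illustration of how an over-representation test could be computed from one row. It assumes a one-sided hypergeometric (Fisher-type) test; the table does not state which test or software produced the reported P-values, so the function names and the choice of a one-sided tail are assumptions, and the output is not expected to reproduce the tabulated values exactly.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def odds_ratio(test: int, ref: int, not_test: int, not_ref: int) -> float:
    """Odds of carrying the term in the test set relative to the reference set."""
    return (test * not_ref) / (ref * not_test)


def fisher_over_p(test: int, ref: int, not_test: int, not_ref: int) -> float:
    """One-sided P(X >= test) under the hypergeometric null (assumed test)."""
    annotated = test + ref            # sequences carrying the GO term
    test_total = test + not_test      # size of the test set
    total = test + ref + not_test + not_ref
    log_denom = log_comb(total, test_total)
    p = 0.0
    for k in range(test, min(annotated, test_total) + 1):
        log_p = (log_comb(annotated, k)
                 + log_comb(total - annotated, test_total - k)
                 - log_denom)
        p += math.exp(log_p)          # underflows harmlessly to 0.0 for tiny terms
    return min(p, 1.0)


# Row GO:0009579 (thylakoid) from Table C.8: 29, 31, 10246, 56761
row = (29, 31, 10246, 56761)
print("odds ratio:", odds_ratio(*row))
print("one-sided p:", fisher_over_p(*row))
```

For rows with very large counts, the summed probabilities fall below double-precision range and the function returns 0.0, which matches how the most strongly enriched terms appear in the table as 0.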


Table C.9 Library-specific expression. RPM: reads per million reads mapped; F: female; L: larva; Ov: ovaries from unmated females; OvEm: ovaries from mated females; T: testes.

Library Model Description Counts RPM

F Sbullata00009846-RA None 153 17.35800696

F Sbullata00013667-RA None 135 15.3158885

L Sbullata00003329-RA None 13326 367.9538332

L Sbullata00007365-RA None 5752 158.8226361

L Sbullata00012747-RA None 1265 34.92883078

L Sbullata00002604-RA None 1076 29.71021496

L Sbullata00000268-RA None 956 26.39680808

L Sbullata00013747-RA cu22_bommo larval cuticle protein lcp-22 os=bombyx mori gn=lcp22 pe=2 sv=1 644 17.78195022

L Sbullata00014197-RA None 448 12.37005232

L Sbullata00010070-RA None 273 7.538000635

L Sbullata00006946-RA None 185 5.108168929

L Sbullata00002089-RA None 175 4.832051689

L Sbullata00011203-RA None 156 4.307428934

L Sbullata00013968-RA None 143 3.948476523

L Sbullata00013908-RA None 64 1.767150332

L Sbullata00005941-RA None 45 1.242527577

L Sbullata00009372-RA None 41 1.132080681

L Sbullata00011018-RA None 38 1.04924551

L Sbullata00013305-RA sp24d_anoga serine protease sp24d os=anopheles gambiae gn=sp24d pe=2 sv=3 20 0.552234479

Ov Sbullata00007528-RA ast5_drome achaete-scute complex protein t5 os=drosophila melanogaster gn=ac pe=2 sv=1 24 1.814774442

continued


Table C.9 continued

Library Model Description Counts RPM

OvEm Sbullata00010163-RA None 14169 367.3297458

OvEm Sbullata00012603-RA None 862 22.34725393

OvEm Sbullata00009222-RA None 333 8.632987886

OvEm Sbullata00010970-RA None 331 8.581138109

OvEm Sbullata00009475-RA sp34_apime venom serine protease 34 os=apis mellifera pe=2 sv=1 306 7.933015895

OvEm Sbullata00005016-RA None 133 3.448010177

OvEm Sbullata00014071-RA None 109 2.825812852

OvEm Sbullata00004893-RA None 96 2.4887893

OvEm Sbullata00009537-RA None 91 2.359164858

OvEm Sbullata00007463-RA None 84 2.177690638

OvEm Sbullata00007516-RA None 67 1.736967533

OvEm Sbullata00009684-RA None 43 1.114770207

OvEm Sbullata00003081-RA None 40 1.036995542

OvEm Sbullata00010531-RA None 29 0.751821768

OvEm Sbullata00010249-RA None 21 0.544422659

OvEm Sbullata00014354-RA cbpb_astas carboxypeptidase b os=astacus astacus pe=1 sv=1 20 0.518497771

T Sbullata00004754-RA None 62 5.024358821

T Sbullata00011112-RA None 55 4.457092502

T Sbullata00011998-RA None 22 1.782837001

T Sbullata00012956-RA pkd2_human polycystin-2 os=homo sapiens gn=pkd2 pe=1 sv=3 21 1.701798955

T Sbullata00009926-RA None 18 1.458684819

T Sbullata00011346-RA None 18 1.458684819

T Sbullata00001679-RA None 11 0.8914185

T Sbullata00008679-RA None 9 0.729342409

T Sbullata00012220-RA None 7 0.567266318
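The RPM column of Table C.9 is defined in the caption as reads per million reads mapped, that is, a model's read count divided by the library's total mapped reads and scaled by one million. The following sketch illustrates that arithmetic. The table does not report the total number of mapped reads per library, so the code back-calculates an implied total from one row and checks it against a second row of the same library; the function names are hypothetical and the implied total is an inference, not a reported value.

```python
def reads_per_million(count: int, total_mapped: float) -> float:
    """RPM = count / total mapped reads in the library * 1e6."""
    return count / total_mapped * 1_000_000


def implied_total_mapped(count: int, rpm: float) -> float:
    """Invert the RPM formula to estimate the library size from one row."""
    return count / rpm * 1_000_000


# Two rows from library F in Table C.9
total_f = implied_total_mapped(153, 17.35800696)   # from Sbullata00009846-RA
rpm_second = reads_per_million(135, total_f)       # Sbullata00013667-RA

print("implied mapped reads in library F:", round(total_f))
print("recomputed RPM for second row:", rpm_second)
```

Because RPM normalizes within a library, values are comparable between models in the same library, while raw counts across libraries of different depth are not.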


Appendix D: Supplement to Chapter 5

D.1 Supplemental Tables

Table D.1 Differentially expressed probes in Control vs. CS+2R comparison. This table contains the list of all probes that had FDR<0.05 and were at least 1.5-fold up- or down-regulated in the Control vs. CS+2R comparison, as well as some metadata for each probe. For probes derived from previously-annotated Sarcophaga sequences, we did not include the Drosophila homolog. FDR, False Discovery Rate.
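The selection rule in this caption combines a significance cutoff with a fold-change cutoff. Since the table reports fold change on a log2 scale, a 1.5-fold change in either direction corresponds to an absolute log2 FC of at least log2(1.5), about 0.585, which is consistent with the smallest absolute values listed (0.59). The sketch below shows one way such a filter could be written. It is an illustration only: the `Probe` type and function name are hypothetical, the first two probes are taken from Table D.1, and the last two are made-up examples that fail one criterion each.

```python
import math
from typing import NamedTuple


class Probe(NamedTuple):
    accession: str
    log2_fc: float
    fdr: float


# 1.5-fold expressed on the log2 scale (about 0.585)
LOG2_FC_CUTOFF = math.log2(1.5)


def is_differentially_expressed(
    probe: Probe,
    fdr_cutoff: float = 0.05,
    log2_fc_cutoff: float = LOG2_FC_CUTOFF,
) -> bool:
    """True if FDR < cutoff and the probe changed at least 1.5-fold up or down."""
    return probe.fdr < fdr_cutoff and abs(probe.log2_fc) >= log2_fc_cutoff


probes = [
    Probe("EZ610106", 2.32, 4.13e-17),        # Table D.1, up-regulated
    Probe("EZ610719", -4.45, 1.71e-16),       # Table D.1, down-regulated
    Probe("HYPOTHETICAL_1", 0.40, 1e-6),      # made-up: significant but < 1.5-fold
    Probe("HYPOTHETICAL_2", 1.20, 0.20),      # made-up: > 1.5-fold but FDR too high
]

selected = [p.accession for p in probes if is_differentially_expressed(p)]
print(selected)
```

Taking the absolute value of log2 FC applies the same threshold to up- and down-regulated probes, which is why negative entries such as -4.45 pass the filter.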


Table D.1

EST Accession Description Drosophila RefSeq Homolog Blastx E-value log2 FC FDR

EZ610106 CG10570 NP_652585.1 4.89E-11 2.32 4.13E-17

EZ610719 prophenol oxidase A1 NP_476812.1 3.21E-160 -4.45 1.71E-16

EZ611625 NA NA NA 2.34 1.71E-16

EZ611655 Heat-shock-protein-70Bb NP_524927.2 1.46E-151 3.68 1.35E-15

EZ608140 Heat-shock-protein-70Aa NP_731651.1 7.03E-70 3.29 4.72E-15

SRR006884.54720.1 epidermal stripes and patches, isoform A NP_524490.2 1.74E-23 2.11 5.41E-15

AF107338.2 Sarcophaga 70 kDa heat shock protein ScHSP70 NA NA 2.86 5.95E-15

EZ608890 asparagine synthetase NP_996132.1 7.67E-108 2.77 7.32E-15

EZ597552 NA NA NA 1.83 2.89E-14

EZ601392 pathetic, isoform C NP_729505.1 1.44E-167 1.97 3.13E-14

EZ608919 phosphoenolpyruvate carboxykinase, isoform A NP_523784.2 1.46E-89 1.54 1.19E-13

EZ611115 lethal (2) essential for life, isoform A NP_523827.1 2.16E-42 1.46 1.19E-13

EZ610108 NA NA NA 1.68 1.35E-13

EZ605413 CG1673 NP_572884.1 4.37E-61 2.02 1.86E-13

EZ612017 defense repressor 1 NP_611680.2 6.72E-29 1.14 1.86E-13

SRR006884.231129.1 NA NA NA 1.28 1.86E-13

EZ605270 NA NA NA 1.79 2.41E-13

EZ608250 CG9572, isoform A NP_608372.1 0 1.64 2.41E-13

EZ597976 Dgp-1, isoform B NP_611302.1 3.44E-86 2.07 5.25E-13

EZ604518 CG32103, isoform B NP_729802.1 2.78E-35 1.60 5.66E-13

EZ616395 NA NA NA 1.55 9.21E-13

EZ597277 CG17734, isoform B NP_731561.1 6.71E-20 1.06 1.07E-12

EZ609079 heat shock protein 83 NP_523899.1 0 1.66 1.07E-12

EZ597590 DnaJ-like-1, isoform A NP_523936.2 1.09E-128 1.14 1.14E-12

AF261773.1 Sarcophaga HSP90 gene, partial cds NA NA 1.36 2.58E-12

EZ609815 CG31288 NP_733094.1 3.09E-92 1.92 3.36E-12

SRR006884.17184.1 CG1673 NP_572884.1 4.07E-42 1.79 3.59E-12

EZ617119 NA NA NA 1.44 6.49E-12

EZ607850 hairy, isoform A NP_523977.2 1.03E-72 1.37 1.12E-11

U96099.2 Sarcophaga HSP23, ScHSP23 NA NA 1.74 3.74E-11

EZ609349 CG10731 NP_001027422.1 2.60E-56 0.91 4.45E-11

SRR006884.221712.1 NA NA NA 2.37 4.61E-11

EZ602763 NA NA NA 1.51 4.65E-11

EZ611782 NA NA NA 0.74 4.80E-11

EZ605491 CG8026, isoform B NP_610468.1 3.00E-72 0.90 5.56E-11

SRR006884.148248.1 CG3759, isoform A NP_609287.3 7.19E-28 1.25 9.87E-11

EZ611426 NA NA NA 1.67 1.33E-10

EZ610237 CG32103, isoform C NP_729803.1 1.34E-40 1.55 1.78E-10

SRR006884.108373.1 NA NA NA 1.66 1.91E-10

EZ616916 Ecdysone-inducible gene L2, isoform B NP_728960.1 4.79E-47 0.94 1.97E-10

EZ600924 CG6465 NP_650004.1 5.40E-135 -0.97 2.14E-10

SRR006884.98433.1 NA NA NA -0.77 2.95E-10

EZ600617 CG10467 NP_648026.1 9.29E-124 -0.76 2.99E-10

continued


Table D.1 continued

EST Accession Description Drosophila RefSeq Homolog Blastx E-value log2 FC FDR

SRR006884.134994.1 CG10444 NP_611465.2 1.06E-18 1.01 5.73E-10

EZ600280 CG10103 NP_648058.1 1.73E-89 0.63 1.33E-09

EZ612503 TGF-beta activated kinase 1 NP_524080.1 6.52E-13 0.59 1.87E-09

EZ599915 CG15528 NP_651767.2 1.79E-61 0.93 3.30E-09

EZ608352 twin of m4 NP_524073.2 7.74E-32 -0.67 3.30E-09

EZ614742 NA NA NA 0.86 3.89E-09

EZ598021 CG15745, isoform A NP_572873.1 6.98E-10 0.63 4.16E-09

EZ603466 CG4302 NP_611563.1 3.14E-118 0.74 4.20E-09

EZ606826 NA NA NA 0.91 4.28E-09

EZ600126 cellular retinaldehyde binding protein NP_523939.1 3.82E-77 0.76 6.57E-09

EZ597235 NA NA NA 0.67 7.21E-09

SRR006884.82471.1 NA NA NA 0.93 7.21E-09

SRR006884.127793.1 NA NA NA 0.98 7.81E-09

EZ600225 c11.1 NP_652606.1 1.04E-41 0.61 8.07E-09

EZ608338 CG1600, isoform A NP_724572.1 1.31E-132 0.71 8.64E-09

SRR006884.79852.1 NA NA NA 0.85 8.80E-09

EZ617564 nimrod C3 NP_524928.2 2.37E-26 -0.59 9.10E-09

EZ603690 tissue inhibitor of metalloproteases, isoform A NP_524301.2 3.11E-46 0.76 9.93E-09

EZ601976 lipid storage droplet-2, isoform B NP_001036276.1 9.73E-126 0.82 1.06E-08

EZ603392 CG13510 NP_611700.2 1.56E-43 0.62 1.21E-08

EZ613093 NA NA NA 0.80 1.43E-08

EZ597550 heat shock protein 23 NP_523999.1 3.92E-48 1.63 1.44E-08

EZ614212 bmcp, isoform B NP_648501.1 2.38E-30 0.67 1.60E-08

EZ604019 CG6299, isoform B NP_788906.1 4.32E-49 0.77 1.61E-08

SRR006884.141855.1 myosin 31DF, isoform A NP_523538.1 2.96E-34 0.70 1.64E-08

EZ606871 transport and golgi organization 14 NP_608577.2 6.30E-17 0.68 2.21E-08

SRR006884.50189.1 GIIIspla2 NP_572454.2 4.10E-18 0.75 2.60E-08

EZ598987 NA NA NA 0.63 3.33E-08

EZ603367 Xrp1, isoform A NP_732384.1 9.48E-13 0.71 3.60E-08

SRR006884.182940.1 NA NA NA 0.63 4.10E-08

EZ602220 lamin C NP_523742.2 1.76E-13 0.76 5.54E-08

SRR006884.49279.1 NA NA NA 0.68 6.14E-08

EZ610004 astray NP_524001.2 6.52E-83 0.60 6.79E-08

SRR006884.182737.1 NA NA NA 0.89 6.79E-08

SRR006884.78656.1 NA NA NA 0.94 7.38E-08

EZ599261 CG11050, isoform A NP_609052.1 1.29E-73 0.62 8.05E-08

EZ608889 CG11050, isoform A NP_609052.1 3.26E-43 0.67 9.60E-08

SRR006884.201391.1 CG42327 NP_001036706.2 6.47E-32 0.59 9.72E-08

SRR006884.83266.1 NA NA NA 0.71 1.07E-07

EZ615581 lamin C NP_523742.2 2.42E-39 0.82 1.34E-07

EZ602774 CG2765, isoform A NP_611988.1 2.56E-48 0.63 1.66E-07

SRR006884.89607.1 suppressor of cytokine signaling at 36E, isoform A NP_724096.1 2.70E-46 0.71 2.28E-07

EZ606899 CG32155 NP_730119.1 5.96E-78 0.67 2.30E-07

SRR006884.187442.1 NA NA NA 0.61 2.38E-07

EZ599632 heat shock protein cognate 3, isoform A NP_727563.1 5.00E-80 0.60 2.56E-07

EZ603692 alpha/beta hydrolase2 NP_608751.2 1.36E-76 0.59 2.57E-07

continued

Table D.1 continued

EST Accession Description Drosophila RefSeq Homolog Blastx E-value log2 FC FDR

EZ601636 CG10407 NP_650519.1 7.56E-66 0.65 3.89E-07

SRR006884.242661.1 translocase of outer membrane 34 NP_524796.1 2.03E-36 0.61 6.63E-07

EZ598796 heat shock protein cognate 3, isoform A NP_727563.1 0 0.81 7.34E-07

EZ606591 NA NA NA 0.70 1.16E-06

SRR006884.74523.1 methuselah-like 9 NP_612029.1 6.93E-26 0.59 1.17E-06

EZ599411 Cyp303a1 NP_652070.1 7.52E-130 0.97 1.19E-06

SRR006884.195476.1 NA NA NA 0.64 1.21E-06

SRR006884.131738.1 wurst NP_573180.1 2.72E-06 0.69 1.61E-06

SRR006884.239437.1 CG6592 NP_648016.1 6.65E-05 0.75 3.56E-06

EZ612535 NA NA NA -0.61 4.77E-06

SRR006884.250231.1 NA NA NA 0.65 5.11E-06

EZ617454 NA NA NA -0.64 7.26E-06

SRR006884.164446.1 NA NA NA 0.60 7.70E-06

EZ597554 heat shock protein 23 NP_523999.1 1.34E-40 0.62 7.82E-06

SRR006884.66098.1 pinocchio, isoform A NP_608568.1 8.51E-37 0.90 1.37E-05

SRR006884.75201.1 NA NA NA 0.76 2.24E-05

SRR006884.129049.1 NA NA NA 1.31 2.73E-05

SRR006884.93116.1 Ras opposite NP_523916.2 1.72E-05 0.69 5.94E-05

SRR006884.3701.1 CG42666, isoform B NP_569945.1 6.62E-21 0.59 1.14E-04

EZ598153 CG10165, isoform A NP_609982.1 6.60E-16 0.60 2.12E-03


Table D.2 Differentially expressed probes in Control vs. RCH+CS+2R comparison. This table contains the list of all probes that had FDR<0.05 and were at least 1.5-fold up- or down-regulated in the Control vs. RCH+CS+2R comparison, as well as some metadata for each probe. For probes derived from previously-annotated Sarcophaga sequences, we did not include the Drosophila homolog. FDR, False Discovery Rate.


Table D.2

EST Accession Description Drosophila RefSeq Homolog Blastx E-value log2 FC FDR

EZ610106 CG10570 NP_652585.1 4.89E-11 2.54 8.01E-18

EZ611625 NA NA NA 2.53 6.29E-17

EZ610719 prophenol oxidase A1 NP_476812.1 3.21E-160 -4.49 1.24E-16

EZ611655 Heat-shock-protein-70Bb NP_524927.2 1.46E-151 3.99 3.04E-16

EZ608140 Heat-shock-protein-70Aa NP_731651.1 7.03E-70 3.48 1.69E-15

EZ608890 asparagine synthetase NP_996132.1 7.67E-108 3.04 1.77E-15

AF107338.2 70 kDa heat shock protein ScHSP70 NA NA 3.03 2.08E-15

SRR006884.54720.1 epidermal stripes and patches, isoform A NP_524490.2 1.74E-23 2.18 2.17E-15

EZ601392 pathetic, isoform C NP_729505.1 1.44E-167 2.29 2.20E-15

EZ597552 NA NA NA 2.05 3.28E-15

EZ610108 NA NA NA 1.98 7.95E-15

EZ611115 lethal (2) essential for life, isoform A NP_523827.1 2.16E-42 1.52 5.53E-14

EZ608919 phosphoenolpyruvate carboxykinase, isoform A NP_523784.2 1.46E-89 1.58 6.82E-14

SRR006884.231129.1 NA NA NA 1.34 9.08E-14

EZ605270 NA NA NA 1.90 1.03E-13

EZ605413 CG1673 NP_572884.1 4.37E-61 2.05 1.34E-13

EZ609815 CG31288 NP_733094.1 3.09E-92 2.33 1.60E-13

EZ597976 Dgp-1, isoform B NP_611302.1 3.44E-86 2.22 1.60E-13

EZ608250 CG9572, isoform A NP_608372.1 0 1.61 3.10E-13

EZ607850 hairy, isoform A NP_523977.2 1.03E-72 1.68 3.86E-13

AF261773.1 Sarcophaga crassipalpis heat shock protein 90 gene, partial cds NA NA 1.52 3.86E-13

EZ612017 defense repressor 1 NP_611680.2 6.72E-29 1.05 4.77E-13

EZ597590 DnaJ-like-1, isoform A NP_523936.2 1.09E-128 1.20 4.94E-13

EZ609079 heat shock protein 83 NP_523899.1 0 1.71 5.89E-13

EZ616395 NA NA NA 1.52 1.11E-12

EZ604518 CG32103, isoform B NP_729802.1 2.78E-35 1.47 1.77E-12

U96099.2 Sarcophaga crassipalpis 23kDa heat shock protein ScHSP23 mRNA, complete cds NA NA 2.07 1.90E-12

EZ597277 CG17734, isoform B NP_731561.1 6.71E-20 1.01 1.91E-12

EZ617119 NA NA NA 1.53 2.11E-12

SRR006884.17184.1 CG1673 NP_572884.1 4.07E-42 1.82 2.31E-12

SRR006884.221712.1 NA NA NA 2.76 3.04E-12

EZ605491 CG8026, isoform B NP_610468.1 3.00E-72 1.05 4.01E-12

SRR006884.108373.1 NA NA NA 2.04 6.23E-12

EZ616916 Ecdysone-inducible gene L2, isoform B NP_728960.1 4.79E-47 1.15 6.53E-12

SRR006884.148248.1 CG3759, isoform A NP_609287.3 7.19E-28 1.37 2.09E-11

EZ603590 NA NA NA 1.47 2.09E-11

EZ611782 NA NA NA 0.76 2.67E-11

EZ602763 NA NA NA 1.47 6.30E-11

EZ600617 CG10467 NP_648026.1 9.29E-124 -0.83 7.28E-11

EZ611426 NA NA NA 1.72 7.41E-11

EZ609349 CG10731 NP_001027422.1 2.60E-56 0.87 7.85E-11

SRR006884.134994.1 CG10444 NP_611465.2 1.06E-18 1.12 9.36E-11

EZ608352 twin of m4 NP_524073.2 7.74E-32 -0.80 1.77E-10

EZ613093 NA NA NA 1.06 1.87E-10

SRR006884.98433.1 NA NA NA -0.77 2.55E-10

continued


Table D.2 continued

EZ606826 NA NA NA 1.09 2.64E-10

EZ597550 heat shock protein 23 NP_523999.1 3.92E-48 2.09 3.17E-10

EZ600924 CG6465 NP_650004.1 5.40E-135 -0.93 3.93E-10

EZ599915 CG15528 NP_651767.2 1.79E-61 1.05 4.47E-10

EZ601976 lipid storage droplet-2, isoform B NP_001036276.1 9.73E-126 1.00 4.52E-10

EZ610467 CG3226 NP_572332.1 8.63E-65 0.63 4.86E-10

EZ611783 CG10103 NP_648058.1 1.46E-13 0.61 5.33E-10

EZ600280 CG10103 NP_648058.1 1.73E-89 0.65 5.57E-10

EZ603466 CG4302 NP_611563.1 3.14E-118 0.83 5.93E-10

EZ606871 transport and golgi organization 14 NP_608577.2 6.30E-17 0.85 6.67E-10

EZ610237 CG32103, isoform C NP_729803.1 1.34E-40 1.33 1.62E-09

EZ614212 bmcp, isoform B NP_648501.1 2.38E-30 0.77 2.05E-09

SRR006884.79723.1 bmcp, isoform B NP_648501.1 2.17E-27 0.70 2.29E-09

EZ603690 tissue inhibitor of metalloproteases, isoform A NP_524301.2 3.11E-46 0.82 2.80E-09

EZ598021 CG15745, isoform A NP_572873.1 6.98E-10 0.64 3.25E-09

EZ602774 CG2765, isoform A NP_611988.1 2.56E-48 0.82 3.35E-09

SRR006884.141855.1 myosin 31DF, isoform A NP_523538.1 2.96E-34 0.77 3.56E-09

EZ603392 CG13510 NP_611700.2 1.56E-43 0.64 8.44E-09

SRR006884.82471.1 NA NA NA 0.91 9.08E-09

EZ597235 NA NA NA 0.64 1.19E-08

EZ599261 CG11050, isoform A NP_609052.1 1.29E-73 0.70 1.39E-08

EZ601640 sprint, isoform G NP_788896.1 9.63E-55 0.72 1.40E-08

SRR006884.49279.1 NA NA NA 0.75 1.43E-08

EZ610004 astray NP_524001.2 6.52E-83 0.66 1.53E-08

EZ604019 CG6299, isoform B NP_788906.1 4.32E-49 0.76 1.81E-08

EZ598987 NA NA NA 0.65 1.84E-08

SRR006884.83266.1 NA NA NA 0.79 2.13E-08

SRR006884.127793.1 NA NA NA 0.91 2.19E-08

SRR006884.50189.1 GIIIspla2 NP_572454.2 4.10E-18 0.75 2.42E-08

EZ614742 NA NA NA 0.75 2.74E-08

EZ598183 CG12795 NP_001097074.1 1.69E-28 0.63 2.94E-08

EZ606072 NA NA NA 0.61 3.15E-08

SRR006884.78656.1 NA NA NA 0.99 3.23E-08

EZ606899 CG32155 NP_730119.1 5.96E-78 0.76 3.34E-08

EZ607675 Glutamate-cysteine ligase modifier subunit NP_732780.1 9.50E-62 0.64 3.69E-08

EZ600865 NA NA NA 0.71 4.23E-08

EZ599632 heat shock protein cognate 3, isoform A NP_727563.1 5.00E-80 0.68 4.34E-08

EZ606591 NA NA NA 0.88 4.34E-08

EZ607807 CG12795 NP_001097074.1 2.29E-07 0.64 5.45E-08

EZ603692 alpha/beta hydrolase2 NP_608751.2 1.36E-76 0.65 5.76E-08

SRR006884.74279.1 NA NA NA 0.65 5.76E-08

EZ608889 CG11050, isoform A NP_609052.1 3.26E-43 0.68 6.44E-08

EZ599411 Cyp303a1 NP_652070.1 7.52E-130 1.18 6.70E-08

EZ600126 cellular retinaldehyde binding protein NP_523939.1 3.82E-77 0.64 7.09E-08

SRR006884.182737.1 NA NA NA 0.88 7.62E-08

EZ602220 lamin C NP_523742.2 1.76E-13 0.73 8.99E-08

SRR006884.239437.1 CG6592 NP_648016.1 6.65E-05 0.98 9.62E-08

EZ612535 NA NA NA -0.78 1.50E-07

EZ597554 heat shock protein 23 NP_523999.1 1.34E-40 0.82 1.74E-07

SRR006884.79852.1 NA NA NA 0.67 2.71E-07

SRR006884.129049.1 NA NA NA 1.85 2.77E-07

continued


Table D.2 continued

EZ613335 HDAC4, isoform D NP_572868.3 6.92E-30 0.71 2.85E-07

SRR006884.74523.1 methuselah-like 9 NP_612029.1 6.93E-26 0.65 2.99E-07

EZ609900 annexin IX, isoform B NP_476603.1 1.29E-95 0.60 3.25E-07

SRR006884.83846.1 HDAC4, isoform B NP_727682.1 2.72E-38 0.63 3.28E-07

EZ607342 CG2017, isoform B NP_731024.1 1.86E-73 0.62 3.51E-07

EZ615581 lamin C NP_523742.2 2.42E-39 0.76 3.56E-07

SRR006884.131738.1 wurst NP_573180.1 2.72E-06 0.76 4.42E-07

SRR006884.242661.1 translocase of outer membrane 34 NP_524796.1 2.03E-36 0.62 4.58E-07

EZ617062 NA NA NA 0.69 4.71E-07

SRR006884.60792.1 NA NA NA 0.65 6.01E-07

EZ611432 X box binding protein-1 NP_726032.3 7.01E-40 0.60 8.68E-07

SRR006884.250231.1 NA NA NA 0.73 9.38E-07

EZ615862 NA NA NA 0.59 9.65E-07

SRR006884.89607.1 suppressor of cytokine signaling at 36E, isoform A NP_724096.1 2.70E-46 0.63 1.03E-06

EZ598796 heat shock protein cognate 3, isoform A NP_727563.1 0 0.78 1.03E-06

EZ596809 NA NA NA 0.68 1.27E-06

EZ602986 tissue inhibitor of metalloproteases, isoform A NP_524301.2 1.57E-14 0.63 2.51E-06

EZ598810 CG7214 NP_609141.1 1.74E-10 0.63 3.18E-06

EZ598153 CG10165, isoform A NP_609982.1 6.60E-16 1.03 3.62E-06

SRR006884.164446.1 NA NA NA 0.63 3.82E-06

SRR006884.46929.1 helix loop helix protein 106, isoform A NP_730449.1 7.90E-22 0.59 4.06E-06

SRR006884.21927.1 NA NA NA 0.63 4.35E-06

EZ617454 NA NA NA -0.65 4.73E-06

EZ598790 NA NA NA 0.61 1.70E-05

EZ598074 thread, isoform A NP_524101.2 3.38E-35 -0.59 4.44E-05

SRR006884.75201.1 NA NA NA 0.70 5.41E-05

SRR006884.93116.1 Ras opposite NP_523916.2 1.72E-05 0.67 6.99E-05

EZ604907 CG15497 NP_650976.1 1.15E-53 0.70 7.17E-05

SRR006884.68759.1 NA NA NA 0.73 2.49E-04

EZ607226 NA NA NA 0.68 6.64E-04

EZ602845 CG12995 NP_728059.1 1.19E-14 -1.24 7.21E-04

SRR006884.66098.1 pinocchio, isoform A NP_608568.1 8.51E-37 0.61 1.04E-03

EZ601961 neural conserved at 73EF, isoform I NP_001097629.1 1.71E-47 0.61 1.36E-03

EZ598957 pale, isoform A NP_476897.1 9.05E-169 1.07 1.80E-03

EZ604700 NA NA NA 0.64 5.51E-03

EZ600519 Attacin-A NP_523745.1 9.29E-28 0.68 1.78E-02

EZ608452 diptericin NP_476808.1 2.39E-15 1.70 1.96E-02

EZ598023 NA NA NA 1.42 2.01E-02


Table D.3 Expression of heat stress genes during recovery from cold shock. This list of genes, identified as significantly enriched using gene set analysis (GSA), was obtained from a microarray study of Drosophila heat stress (Sorensen et al., 2005). The column "GSA gene score" is a modified t-statistic that reflects the relative importance of a particular transcript towards the overall enrichment of that gene set.


Table D.3

EST Accession Description Drosophila RefSeq Homolog Blastx E-value GSA gene score Log2 FC FDR

AF107338.2 Sarcophaga crassipalpis 70 kDa heat shock protein ScHSP70 (HSP70) mRNA, partial cds NA NA 36.38 2.86 5.95E-15

FLY.8943.C3 Heat-shock-protein-70Aa [Drosophila melanogaster] NP_731651.1 7.03E-70 25.59 3.29 4.72E-15

AF261773.1 Sarcophaga crassipalpis heat shock protein 90 gene, partial cds NA NA 18.09 1.36 2.58E-12

U96099.2 Sarcophaga crassipalpis 23kDa heat shock protein ScHSP23 mRNA, complete cds NA NA 11.41 1.74 3.74E-11

FLY.9464.C1 phosphoenolpyruvate carboxykinase, isoform A [Drosophila melanogaster] NP_523784.2 1.46E-89 10.65 1.54 1.19E-13

FLY.10178.C1 astray [Drosophila melanogaster] NP_524001.2 6.52E-83 5.39 0.6 6.79E-08

FLY.10812.C2 thor [Drosophila melanogaster] NP_477295.1 5.31E-20 4.58 0.32 1.76E-04

FLY.5812.C1 CG10383 [Drosophila melanogaster] NP_609896.1 3.60E-51 4.39 0.55 8.92E-05

FLY.9317.C1 CG1946 [Drosophila melanogaster] NP_610319.2 1.40E-18 4.27 0.43 8.16E-06

FLY.1081.C1 CTP:phosphocholine cytidylyltransferase 1, isoform A [Drosophila melanogaster] NP_647621.1 9.19E-148 3.37 0.25 3.36E-03

FLY.1418.C1 Ecdysone-inducible gene L3 [Drosophila melanogaster] NP_476581.1 5.78E-169 3.11 0.46 1.75E-05

FLY.9371.C1 CG16749 [Drosophila melanogaster] NP_649881.1 5.75E-17 1.73 0.1 3.46E-01

FLY.2711.C1 CG16762 [Drosophila melanogaster] NP_647722.1 5.19E-30 1.66 0.33 1.41E-02

FLY.1945.C1 CG3106 [Drosophila melanogaster] NP_572577.1 0 1.58 0.06 5.92E-01

FLY.6518.C1 CG10189 [Drosophila melanogaster] NP_609975.1 3.83E-35 1.28 0.13 2.87E-01

FLY.259.C1 CG18522 [Drosophila melanogaster] NP_650475.1 2.33E-98 1.25 0.12 2.02E-01

FLY.502.C1 homogentisate 1,2-dioxygenase [Drosophila melanogaster] NP_523544.2 1.73E-41 1.19 0.02 8.27E-01

FLY.994.C1 juvenile hormone 3 [Drosophila melanogaster] NP_611387.1 3.01E-147 1.09 0.07 6.17E-01

FLY.9329.C1 CG14439, isoform A [Drosophila melanogaster] NP_572347.1 7.07E-70 0.96 0.12 8.14E-02

continued


Table D.3 continued

FLY.3028.C1 CG11200, isoform B [Drosophila melanogaster] NP_611471.1 6.33E-121 0.94 0.21 7.80E-02

FLY.16066.C1 CG5966 [Drosophila melanogaster] NP_572286.1 1.79E-37 0.93 0.12 2.66E-01

FLY.4743.C1 multiple inositol polyphosphate phosphatase 1, isoform A [Drosophila melanogaster] NP_524109.2 6.40E-32 0.84 0.23 7.58E-02

EUA37Q302F5T6Z CG17224, isoform A [Drosophila melanogaster] NP_722867.1 1.51E-09 0.71 -0.01 9.62E-01

EUA37Q301DLFRS CG6660 [Drosophila melanogaster] NP_651104.1 1.93E-12 0.65 0.05 7.95E-01

FLY.13427.C1 CG31300 [Drosophila melanogaster] NP_651373.2 5.04E-05 0.51 0.11 3.70E-01

FLY.738.C1 CG5945 [Drosophila melanogaster] NP_609626.1 3.38E-29 0.49 0.11 4.35E-01

FLY.9718.C1 thioredoxin reductase-1, isoform A [Drosophila melanogaster] NP_511082.2 0 0.43 0.03 7.74E-01

FLY.10507.C2 heat shock protein 27 [Drosophila melanogaster] NP_524000.1 5.06E-58 0.35 -0.02 9.10E-01

FLY.10355.C1 CG5023 [Drosophila melanogaster] NP_650867.1 2.03E-85 0.34 0.15 2.56E-01

FLY.10299.C1 CG10514 [Drosophila melanogaster] NP_651371.1 1.74E-22 0.33 0.07 5.42E-01

EUA37Q302JLPCO thiolester containing protein II, isoform A [Drosophila melanogaster] NP_523506.1 6.13E-06 0.3 0.05 6.54E-01

FLY.1408.C1 CG7916 [Drosophila melanogaster] NP_609682.1 1.34E-92 0.28 0.01 9.81E-01

FLY.87.C1 CG4847, isoform A [Drosophila melanogaster] NP_611221.1 3.35E-116 0.27 0.06 6.53E-01

EUA37Q301CCVIK CG6602 [Drosophila melanogaster] NP_648020.1 5.19E-05 0.15 0.14 2.08E-01

FLY.124.C2 CG10472 [Drosophila melanogaster] NP_648017.1 2.53E-86 0.01 -0.05 7.24E-01

FLY.8677.C1 CG18249, isoform C [Drosophila melanogaster] NP_649771.1 8.58E-17 0.01 0 9.78E-01

EUA37Q302JGIYY ornithine aminotransferase precursor [Drosophila melanogaster] NP_649139.1 1.10E-15 -0.03 0.06 7.32E-01

FLY.10168.C2 CG8997 [Drosophila melanogaster] NP_609681.1 1.12E-80 -0.04 -0.08 6.10E-01

EUA37Q302JOT8U CG8773 [Drosophila melanogaster] NP_650273.2 1.22E-22 -0.05 -0.05 9.11E-01

FLY.2460.C2 urate oxidase [Drosophila melanogaster] NP_476779.1 2.93E-121 -0.06 0.02 8.72E-01

continued


Table D.3 continued

| Sequence ID | Description | Drosophila RefSeq Homolog | Blastx E-value | GSA gene score | Log2 FC | FDR |
|---|---|---|---|---|---|---|
| FLY.988.C1 | CG11378 [Drosophila melanogaster] | NP_569886.1 | 9.18E-76 | -0.17 | -0.05 | 6.04E-01 |
| EUA37Q302G9AOD | CG8774, isoform A [Drosophila melanogaster] | NP_650274.2 | 3.70E-24 | -0.23 | -0.04 | 8.09E-01 |
| FLY.1190.C1 | CG31414 [Drosophila melanogaster] | NP_001097888.1 | 3.53E-30 | -0.5 | 0.04 | 7.41E-01 |
| EUA37Q301CWH7G | inwardly rectifying potassium channel 3, isoform A [Drosophila melanogaster] | NP_609903.2 | 5.60E-36 | -0.53 | 0.06 | 5.56E-01 |
| FLY.9866.C1 | CG2254 [Drosophila melanogaster] | NP_572436.1 | 3.52E-114 | -0.55 | -0.04 | 7.25E-01 |
| EUA37Q302IJ598 | CG9702 [Drosophila melanogaster] | NP_651810.1 | 2.62E-25 | -0.6 | -0.05 | 7.22E-01 |
| FLY.10624.C2 | refractory to sigma P, isoform A [Drosophila melanogaster] | NP_476700.1 | 3.87E-28 | -0.67 | -0.05 | 6.36E-01 |
| EUA37Q301BZDUA | CG10513 [Drosophila melanogaster] | NP_651370.2 | 2.08E-14 | -0.67 | -0.07 | 5.63E-01 |
| FLY.177.C1 | CG5390 [Drosophila melanogaster] | NP_609374.1 | 2.12E-127 | -0.72 | -0.03 | 8.10E-01 |
| FLY.10471.C1 | CG6726, isoform A [Drosophila melanogaster] | NP_651120.1 | 1.43E-31 | -0.76 | -0.08 | 3.54E-01 |
| FLY.519.C1 | CG6426 [Drosophila melanogaster] | NP_611163.2 | 4.57E-65 | -0.77 | -0.01 | 9.53E-01 |
| FLY.3238.C1 | CG15201 [Drosophila melanogaster] | NP_572691.1 | 6.53E-16 | -0.77 | -0.03 | 8.89E-01 |
| FLY.3087.C1 | CG16985 [Drosophila melanogaster] | NP_647730.1 | 9.72E-33 | -0.82 | -0.11 | 1.81E-01 |
| FLY.9422.C3 | Odorant-binding protein 99b [Drosophila melanogaster] | NP_651713.1 | 2.70E-37 | -0.82 | -0.07 | 4.62E-01 |
| FLY.13646.C1 | CG3984 [Drosophila melanogaster] | NP_650420.1 | 8.01E-05 | -0.83 | 0.05 | 6.65E-01 |
| FLY.9085.C3 | CG5107 [Drosophila melanogaster] | NP_651401.1 | 3.84E-07 | -0.87 | -0.19 | 3.57E-01 |
| FLY.343.C1 | CG3246 [Drosophila melanogaster] | NP_608781.2 | 2.10E-159 | -0.9 | -0.08 | 5.32E-01 |
| FLY.5979.C1 | NAD-dependent methylenetetrahydrofolate dehydrogenase, isoform B [Drosophila melanogaster] | NP_476929.1 | 1.12E-80 | -0.91 | -0.09 | 2.26E-01 |
| FLY.10202.C2 | CG5618, isoform A [Drosophila melanogaster] | NP_649211.1 | 4.68E-159 | -1 | -0.09 | 1.69E-01 |
| FLY.10471.C2 | CG6733 [Drosophila melanogaster] | NP_651123.1 | 1.12E-20 | -1.05 | -0.06 | 4.12E-01 |
| FLY.10554.C3 | CG12116 [Drosophila melanogaster] | NP_572481.1 | 2.14E-66 | -1.08 | -0.04 | 8.89E-01 |
| FLY.9488.C1 | ferredoxin, isoform A [Drosophila melanogaster] | NP_523993.1 | 1.14E-43 | -1.2 | -0.04 | 7.40E-01 |
| FLY.2917.C1 | vermilion [Drosophila melanogaster] | NP_511113.1 | 5.57E-154 | -1.27 | -0.13 | 2.10E-01 |

continued


Table D.3 continued

| Sequence ID | Description | Drosophila RefSeq Homolog | Blastx E-value | GSA gene score | Log2 FC | FDR |
|---|---|---|---|---|---|---|
| FLY.3971.C1 | CG30491 [Drosophila melanogaster] | NP_610306.1 | 5.90E-62 | -1.35 | -0.13 | 1.25E-01 |
| FLY.11818.C1 | Ance-5 [Drosophila melanogaster] | NP_573392.2 | 6.85E-64 | -1.44 | -0.1 | 3.25E-01 |
| FLY.1256.C3 | CG12374 [Drosophila melanogaster] | NP_610819.1 | 5.05E-126 | -1.68 | -0.12 | 2.57E-01 |
| FLY.12551.C1 | CG6126 [Drosophila melanogaster] | NP_650528.1 | 3.03E-05 | -1.78 | -0.16 | 6.47E-02 |
| EUA37Q302I27F3 | CG14528 [Drosophila melanogaster] | NP_651643.1 | 4.38E-25 | -1.79 | -0.11 | 1.97E-01 |
| FLY.1034.C1 | CG13607, isoform A [Drosophila melanogaster] | NP_651215.1 | 1.89E-19 | -1.81 | -0.02 | 8.38E-01 |
| EUA37Q301AMACP | CG10026, isoform A [Drosophila melanogaster] | NP_724195.1 | 2.72E-14 | -1.82 | -0.11 | 2.69E-01 |
| FLY.2561.C1 | CG3348 [Drosophila melanogaster] | NP_652261.1 | 3.97E-08 | -1.86 | -0.14 | 1.08E-01 |
| FLY.10893.C1 | Odorant-binding protein 56a [Drosophila melanogaster] | NP_611442.1 | 5.43E-29 | -1.94 | -0.12 | 5.85E-01 |
| EUA37Q302G1BTZ | rosy [Drosophila melanogaster] | NP_524337.1 | 2.73E-30 | -2.06 | -0.04 | 6.71E-01 |
| FLY.535.C2 | CG5162 [Drosophila melanogaster] | NP_573202.1 | 3.14E-137 | -2.23 | -0.1 | 4.91E-01 |
| EUA37Q301BI7MQ | sugarbabe [Drosophila melanogaster] | NP_610826.1 | 8.00E-11 | -2.32 | -0.16 | 6.63E-02 |
| EUA37Q302HFSZG | CG31908, isoform A [Drosophila melanogaster] | NP_723239.1 | 1.72E-21 | -2.46 | -0.08 | 3.62E-01 |
| FLY.3495.C4 | CG16799 [Drosophila melanogaster] | NP_611504.2 | 8.93E-28 | -2.56 | -0.13 | 9.31E-02 |
| FLY.10206.C1 | CG32483 [Drosophila melanogaster] | NP_728540.1 | 0 | -2.57 | -0.17 | 5.21E-02 |
| FLY.2376.C1 | alkaline phosphatase 4, isoform A [Drosophila melanogaster] | NP_524601.2 | 0 | -3.23 | -0.42 | 3.41E-05 |
| FLY.2140.C2 | Ecdysone-induced protein 28/29kD, isoform A [Drosophila melanogaster] | NP_524085.2 | 9.51E-63 | -4.38 | -0.27 | 1.11E-03 |


Table D.4 Expression of hypoxia genes during recovery from cold shock. This list of genes, identified as significantly enriched using gene set analysis (GSA), was obtained from a microarray study of the Drosophila hypoxia response (Liu et al., 2006).

The column "GSA gene score" is a modified t-statistic that reflects the relative importance of a particular transcript towards the overall enrichment of that gene set.
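As a reading aid for the Log2 FC and FDR columns, the following minimal Python sketch converts log2 fold changes to linear fold changes and flags transcripts below an FDR cutoff. The five rows are copied from Table D.4; the 0.05 cutoff and the function names are assumptions made for illustration and are not taken from the analysis itself.

```python
# Rows copied from Table D.4: (EST accession, description, log2 FC, FDR).
ROWS = [
    ("EZ608140", "Heat-shock-protein-70Aa", 3.29, 4.72e-15),
    ("EZ607850", "hairy, isoform A", 1.37, 1.12e-11),
    ("EZ602131", "dunce, isoform I", 0.16, 3.71e-02),
    ("EZ598728", "CG10396", 0.04, 7.90e-01),
    ("EZ602354", "thread, isoform A", -0.42, 2.79e-03),
]

FDR_CUTOFF = 0.05  # assumed threshold for illustration; not stated in the table


def fold_change(log2_fc):
    """Convert a log2 fold change to a linear fold change (2 ** log2_fc)."""
    return 2.0 ** log2_fc


def significant(rows, cutoff=FDR_CUTOFF):
    """Return the rows whose FDR is strictly below the cutoff."""
    return [row for row in rows if row[3] < cutoff]


if __name__ == "__main__":
    for accession, description, log2_fc, fdr in significant(ROWS):
        direction = "up" if log2_fc > 0 else "down"
        print(f"{accession}\t{description}\t{direction}\t"
              f"{fold_change(log2_fc):.2f}-fold\tFDR={fdr:.2e}")
```

A negative Log2 FC gives a linear fold change below 1, that is, lower expression in the treatment than in the reference.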


Table D.4

| EST Accession | Description | Drosophila RefSeq Homolog | Blastx E-value | GSA gene score | Log2 FC | FDR |
|---|---|---|---|---|---|---|
| EZ608140 | Heat-shock-protein-70Aa | NP_731651.1 | 7.03E-70 | 25.59 | 3.29 | 4.72E-15 |
| EZ610106 | CG10570 | NP_652585.1 | 4.89E-11 | 24.61 | 2.32 | 4.13E-17 |
| EZ607850 | hairy, isoform A | NP_523977.2 | 1.03E-72 | 11.49 | 1.37 | 1.12E-11 |
| U96099.2 | Sarcophaga crassipalpis 23kDa heat shock protein ScHSP23 mRNA, complete cds | NA | NA | 11.41 | 1.74 | 3.74E-11 |
| EZ610237 | CG32103, isoform C | NP_729803.1 | 1.34E-40 | 10.81 | 1.55 | 1.78E-10 |
| EZ608919 | phosphoenolpyruvate carboxykinase, isoform A | NP_523784.2 | 1.46E-89 | 10.65 | 1.54 | 1.19E-13 |
| EZ616916 | Ecdysone-inducible gene L2, isoform B | NP_728960.1 | 4.79E-47 | 8.64 | 0.94 | 1.97E-10 |
| SRR006884.89607 | suppressor of cytokine signaling at 36E, isoform A | NP_724096.1 | 2.70E-46 | 8.42 | 0.71 | 2.28E-07 |
| EZ608338 | CG1600, isoform A | NP_724572.1 | 1.31E-132 | 7.28 | 0.71 | 8.64E-09 |
| EZ598021 | CG15745, isoform A | NP_572873.1 | 6.98E-10 | 7.13 | 0.63 | 4.16E-09 |
| EZ614842 | CG17325, isoform A | NP_652586.1 | 2.43E-05 | 7.13 | 0.54 | 2.34E-05 |
| EZ610004 | astray | NP_524001.2 | 6.52E-83 | 5.39 | 0.60 | 6.79E-08 |
| EZ604643 | CG10383 | NP_609896.1 | 3.60E-51 | 4.39 | 0.55 | 8.92E-05 |
| EZ601015 | CG13868, isoform A | NP_611472.1 | 3.10E-124 | 3.22 | 0.46 | 1.61E-05 |
| EZ599302 | Ecdysone-inducible gene L3 | NP_476581.1 | 5.78E-169 | 3.11 | 0.46 | 1.75E-05 |
| SRR006884.207492 | brummer, isoform B | NP_001163445.1 | 8.09E-28 | 2.22 | 0.24 | 8.89E-03 |
| EZ602131 | dunce, isoform I | NP_726849.1 | 1.23E-79 | 1.74 | 0.16 | 3.71E-02 |
| SRR006884.163264 | CG15673 | NP_611591.2 | 4.53E-06 | 1.70 | 0.01 | 9.71E-01 |
| SRR006884.175245 | CG1882, isoform A | NP_610326.1 | 3.33E-12 | 1.60 | 0.15 | 2.40E-01 |
| EZ603223 | CG3308 | NP_650963.2 | 1.68E-107 | 1.54 | 0.17 | 6.81E-02 |
| EZ598495 | CG7224, isoform A | NP_609170.1 | 1.59E-39 | 1.43 | 0.13 | 1.43E-01 |
| EZ614161 | methuselah-like 5 | NP_650126.2 | 2.18E-38 | 1.36 | 0.12 | 8.35E-02 |
| EZ604900 | vrille, isoform A | NP_477191.1 | 8.29E-16 | 1.26 | 0.30 | 1.66E-02 |
| EZ611923 | CG11796, isoform A | NP_730536.1 | 1.90E-157 | 1.24 | 0.11 | 2.41E-01 |
| EZ598728 | CG10396 | NP_610168.1 | 4.09E-37 | 1.00 | 0.04 | 7.90E-01 |
| EZ616432 | CG5966 | NP_572286.1 | 1.79E-37 | 0.93 | 0.12 | 2.66E-01 |
| EZ598238 | CG6834 | NP_650104.2 | 9.78E-40 | 0.85 | 0.14 | 8.49E-02 |
| EZ603539 | CG32548, isoform B | NP_728167.1 | 2.01E-09 | 0.41 | 0.04 | 7.17E-01 |
| EZ608725 | CG6870 | NP_609852.1 | 8.52E-37 | 0.33 | -0.06 | 5.48E-01 |
| EZ612562 | imaginal disc growth factor 1 | NP_477258.1 | 7.77E-06 | 0.25 | 0.05 | 6.75E-01 |
| EZ608996 | CG15829 | NP_648082.1 | 3.75E-32 | 0.20 | -0.02 | 9.41E-01 |
| EZ610437 | heat shock protein cognate 2 | NP_524339.1 | 4.99E-06 | 0.12 | 0.19 | 5.16E-02 |
| EZ611263 | desat1, isoform A | NP_652731.1 | 7.65E-156 | 0.08 | 0.03 | 8.39E-01 |
| EZ615333 | cabut, isoform A | NP_722636.1 | 6.75E-10 | 0.02 | 0.05 | 5.33E-01 |
| SRR006884.168148 | relish, isoform A | NP_477094.1 | 2.03E-06 | 0.02 | -0.02 | 8.69E-01 |
| EZ610997 | CG13324 | NP_610833.1 | 2.77E-07 | -0.01 | 0.00 | 9.94E-01 |
| SRR006884.73787 | phosphoribosylamidotransferase 2, isoform B | NP_523949.2 | 9.73E-20 | -0.02 | 0.04 | 6.68E-01 |
| EZ597113 | ade5 | NP_572826.1 | 2.35E-107 | -0.08 | 0.00 | 9.87E-01 |
| EZ609026 | CG4363 | NP_611613.1 | 2.62E-55 | -0.16 | -0.01 | 9.29E-01 |
| SRR006884.242280 | Cyp311a1, isoform B | NP_572780.3 | 1.52E-17 | -0.33 | -0.09 | 5.02E-01 |
| EZ602559 | scylla | NP_648456.2 | 1.40E-47 | -0.45 | 0.02 | 8.96E-01 |
| EZ615859 | amylase proximal | NP_536346.1 | 3.96E-21 | -0.59 | -0.01 | 9.76E-01 |
| EZ598690 | CG6188 | NP_650223.1 | 5.17E-128 | -0.64 | -0.02 | 8.95E-01 |
| SRR006884.101284 | serpin 28D | NP_609172.1 | 6.97E-18 | -0.71 | -0.04 | 7.25E-01 |
| EZ609912 | henna, isoform A | NP_523963.2 | 0 | -0.86 | -0.01 | 9.71E-01 |
| SRR006884.109548 | CG30372, isoform B | NP_001014504.1 | 7.18E-31 | -0.89 | -0.02 | 8.88E-01 |
| EZ603287 | coracle, isoform C | NP_725864.1 | 7.87E-85 | -1.06 | -0.05 | 7.80E-01 |
| EZ597977 | CG15043 | NP_573300.1 | 2.59E-27 | -1.18 | -0.14 | 6.03E-01 |

continued


Table D.4 continued

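| EST Accession | Description | Drosophila RefSeq Homolog | Blastx E-value | GSA gene score | Log2 FC | FDR |
|---|---|---|---|---|---|---|
| EZ610691 | CG16758, isoform B | NP_647727.2 | 1.72E-97 | -1.18 | -0.02 | 8.94E-01 |
| EZ599864 | CG5550 | NP_611160.2 | 1.28E-10 | -1.39 | -0.06 | 4.91E-01 |
| EZ596829 | Trehalose-6-phosphate synthase 1 | NP_608827.1 | 0 | -1.73 | -0.14 | 1.23E-01 |
| SRR006884.112085 | sugarbabe | NP_610826.1 | 8.00E-11 | -2.32 | -0.16 | 6.63E-02 |
| EZ597538 | photoreceptor dehydrogenase, isoform A | NP_524105.2 | 1.32E-109 | -2.91 | -0.09 | 2.62E-01 |
| SRR006884.262398 | CG13321 | NP_610831.1 | 2.85E-08 | -3.29 | -0.16 | 9.91E-02 |
| EZ602354 | thread, isoform A | NP_524101.2 | 3.64E-12 | -4.18 | -0.42 | 2.79E-03 |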


Table D.5 Expression of hyperoxia genes during recovery from cold shock. This list of genes, identified as significantly enriched using gene set analysis (GSA), was obtained from a microarray study of the Drosophila hyperoxia response (Landis et al., 2004). The column "GSA gene score" is a modified t-statistic that reflects the relative importance of a particular transcript towards the overall enrichment of that gene set.


Table D.5

| EST Accession | Description | Drosophila RefSeq Homolog | Blastx E-value | GSA gene score | Log2 FC | FDR |
|---|---|---|---|---|---|---|
| EZ605491 | CG8026, isoform B | NP_610468.1 | 3.00E-72 | 15.48 | 0.90 | 5.56E-11 |
| EZ597976 | Dgp-1, isoform B | NP_611302.1 | 3.44E-86 | 13.14 | 2.07 | 5.25E-13 |
| EZ607850 | hairy, isoform A | NP_523977.2 | 1.03E-72 | 11.49 | 1.37 | 1.12E-11 |
| EZ608919 | phosphoenolpyruvate carboxykinase, isoform A | NP_523784.2 | 1.46E-89 | 10.65 | 1.54 | 1.19E-13 |
| EZ616916 | Ecdysone-inducible gene L2, isoform B | NP_728960.1 | 4.79E-47 | 8.64 | 0.94 | 1.97E-10 |
| SRR006884.50189 | GIIIspla2 | NP_572454.2 | 4.10E-18 | 5.97 | 0.75 | 2.60E-08 |
| EZ607342 | CG2017, isoform B | NP_731024.1 | 1.86E-73 | 5.07 | 0.54 | 2.50E-06 |
| EZ604643 | CG10383 | NP_609896.1 | 3.60E-51 | 4.39 | 0.55 | 8.92E-05 |
| SRR006884.33130 | calcium-independent phospholipase A2 VIA, isoform A | NP_648366.2 | 2.32E-34 | 4.28 | 0.31 | 4.97E-04 |
| EZ597412 | HspB8, isoform A | NP_608326.1 | 4.21E-82 | 4.18 | 0.43 | 2.28E-07 |
| EZ606080 | charged multivesicular body protein 2b | NP_647947.1 | 1.76E-48 | 4.01 | 0.26 | 2.98E-04 |
| EZ610102 | aminolevulinate synthase | NP_477281.1 | 7.57E-109 | 3.11 | 0.51 | 1.84E-04 |
| SRR006884.175245 | CG1882, isoform A | NP_610326.1 | 3.33E-12 | 1.60 | 0.15 | 2.40E-01 |
| EZ596828 | glutathione S transferase D1, isoform A | NP_524326.1 | 1.07E-88 | 1.53 | 0.29 | 2.34E-04 |
| SRR006884.29383 | CG3008 | NP_608871.1 | 4.00E-30 | 1.47 | 0.07 | 4.01E-01 |
| EZ598495 | CG7224, isoform A | NP_609170.1 | 1.59E-39 | 1.43 | 0.13 | 1.43E-01 |
| EZ608087 | ferritin 2 light chain homologue, isoform A | NP_524802.2 | 6.54E-05 | 1.28 | 0.12 | 5.02E-02 |
| EZ603846 | CG6428 | NP_570088.1 | 2.29E-79 | 1.27 | 0.13 | 3.41E-01 |
| EZ604900 | vrille, isoform A | NP_477191.1 | 8.29E-16 | 1.26 | 0.30 | 1.66E-02 |
| EZ597398 | CG18522 | NP_650475.1 | 2.33E-98 | 1.25 | 0.12 | 2.02E-01 |
| EZ611923 | CG11796, isoform A | NP_730536.1 | 1.90E-157 | 1.24 | 0.11 | 2.41E-01 |
| EZ609148 | Jun-related antigen, isoform A | NP_476586.1 | 4.18E-54 | 0.99 | 0.03 | 7.91E-01 |
| EZ616432 | CG5966 | NP_572286.1 | 1.79E-37 | 0.93 | 0.12 | 2.66E-01 |
| EZ608187 | CG6459 | NP_611243.1 | 1.62E-99 | 0.93 | 0.13 | 2.71E-01 |
| SRR006884.88866 | muscle LIM protein at 84B, isoform A | NP_477122.1 | 4.64E-35 | 0.83 | 0.13 | 1.08E-01 |
| EZ601597 | CG10420 | NP_651356.1 | 1.24E-18 | 0.81 | 0.04 | 8.19E-01 |
| EZ610446 | muscle protein 20, isoform A | NP_476643.1 | 7.17E-92 | 0.61 | 0.12 | 3.81E-01 |
| SRR006884.129257 | sniffer | NP_572466.1 | 8.70E-05 | 0.54 | 0.11 | 2.73E-01 |
| SRR006884.107010 | shawn, isoform D | NP_001027071.1 | 1.04E-05 | 0.53 | 0.00 | 9.93E-01 |
| EZ596875 | PHGPx, isoform C | NP_728868.1 | 7.92E-83 | 0.48 | 0.02 | 8.60E-01 |
| EZ610419 | CG5955 | NP_649230.1 | 9.01E-131 | 0.47 | -0.05 | 6.98E-01 |
| EZ609296 | thioredoxin reductase-1, isoform A | NP_511082.2 | 0 | 0.43 | 0.03 | 7.74E-01 |
| EZ606400 | CG6272 | NP_648434.1 | 3.58E-06 | 0.22 | -0.01 | 9.59E-01 |
| EZ613870 | CG12264 | NP_609533.1 | 4.90E-78 | 0.14 | 0.02 | 9.06E-01 |
| SRR006884.66367 | CG8249 | NP_611060.2 | 2.21E-32 | -0.18 | 0.01 | 9.58E-01 |
| EZ605256 | CG3740 | NP_569932.1 | 1.36E-28 | -0.24 | 0.07 | 5.36E-01 |
| EZ602671 | CG2789 | NP_608531.1 | 1.63E-57 | -0.31 | -0.05 | 6.20E-01 |
| EZ610105 | eukaryotic initiation factor 2beta | NP_524043.1 | 1.18E-93 | -0.36 | -0.01 | 9.52E-01 |
| EZ599574 | CG12262 | NP_648149.1 | 0 | -0.40 | 0.04 | 8.01E-01 |
| EZ603626 | CG30152 | NP_611509.1 | 6.67E-76 | -0.48 | -0.08 | 3.70E-01 |
| EZ610770 | ribosomal protein L24-like | NP_650073.1 | 1.31E-80 | -0.48 | -0.01 | 9.08E-01 |

continued


Table D.5 continued

| EST Accession | Description | Drosophila RefSeq Homolog | Blastx E-value | GSA gene score | Log2 FC | FDR |
|---|---|---|---|---|---|---|
| EZ609730 | Alanyl-tRNA synthetase, isoform A | NP_523511.2 | 2.14E-70 | -0.49 | -0.02 | 9.04E-01 |
| EZ602109 | globin 1, isoform B | NP_524369.1 | 1.48E-45 | -0.51 | 0.03 | 7.98E-01 |
| EZ600610 | escl | NP_723702.1 | 3.63E-73 | -0.80 | -0.05 | 7.62E-01 |
| SRR006884.103225 | CG17327, isoform A | NP_731760.1 | 2.29E-05 | -0.82 | -0.08 | 3.09E-01 |
| EZ608197 | CG9836 | NP_649840.1 | 4.62E-70 | -0.96 | -0.07 | 3.39E-01 |
| SRR006884.54967 | CG5958 | NP_609119.2 | 2.70E-11 | -1.01 | -0.03 | 8.23E-01 |
| EZ605599 | CG13315 | NP_652443.1 | 2.97E-14 | -1.02 | -0.07 | 5.41E-01 |
| EZ608570 | Rab-related protein 4 | NP_524744.2 | 9.71E-79 | -1.10 | -0.06 | 5.27E-01 |
| EZ608950 | ferredoxin, isoform A | NP_523993.1 | 1.14E-43 | -1.20 | -0.04 | 7.40E-01 |
| EZ610947 | CG2004 | NP_572501.1 | 2.66E-107 | -1.22 | -0.10 | 3.83E-01 |
| EZ602568 | CG30491 | NP_610306.1 | 5.90E-62 | -1.35 | -0.13 | 1.25E-01 |
| EZ597105 | Z band alternatively spliced PDZ-motif protein 66, isoform A | NP_729398.1 | 6.39E-105 | -1.79 | -0.10 | 4.63E-01 |
| SRR006884.42596 | CG6512, isoform B | NP_730250.1 | 1.83E-34 | -1.83 | -0.07 | 5.25E-01 |
| EZ600403 | cln3 | NP_649011.1 | 9.81E-98 | -1.84 | -0.09 | 2.53E-01 |
| EZ607040 | lysozyme P | NP_476828.1 | 6.39E-46 | -1.90 | -0.14 | 1.68E-01 |
| SRR006884.66544 | CG6287 | NP_609496.1 | 6.67E-13 | -2.49 | -0.19 | 8.68E-02 |


Table D.6 Expression of oxidative stress genes during recovery from cold shock. This list of genes, identified as significantly enriched using gene set analysis (GSA), was obtained from a microarray study of Drosophila oxidative stress (Girardot et al., 2004).

The column "GSA gene score" is a modified t-statistic that reflects the relative importance of a particular transcript towards the overall enrichment of that gene set.
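To illustrate how per-gene scores can contribute to a set-level enrichment value, the sketch below computes a maxmean statistic in Python. This is only an assumption for illustration: the caption does not say which set-level statistic was used, and maxmean is simply one common choice in gene set analysis. The function name and the sign convention are choices of this sketch, and the five example scores are a small subset of Table D.6, so the result does not reproduce any reported enrichment.

```python
def maxmean(gene_scores):
    """Return a signed maxmean statistic for one gene set.

    The positive parts and the absolute negative parts of the gene scores
    are each averaged over ALL genes in the set; the larger average wins.
    The result is negated when the negative part dominates (a convention
    chosen for this sketch).
    """
    m = len(gene_scores)
    if m == 0:
        raise ValueError("gene set is empty")
    mean_pos = sum(s for s in gene_scores if s > 0) / m
    mean_neg = sum(-s for s in gene_scores if s < 0) / m
    return mean_pos if mean_pos >= mean_neg else -mean_neg


if __name__ == "__main__":
    # A few GSA gene scores copied from Table D.6 (illustrative subset only).
    subset = [36.38, 25.59, 16.49, -1.19, -1.30]
    print(f"maxmean of subset: {maxmean(subset):.3f}")
```

Under this statistic a few transcripts with large positive scores can dominate the set-level value even when most members have scores near zero, which is consistent with the pattern in the tables, where a handful of heat shock and metabolic transcripts carry the highest gene scores.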


Table D.6

| EST Accession | Description | Drosophila RefSeq Homolog | Blastx E-value | GSA gene score | Log2 FC | FDR |
|---|---|---|---|---|---|---|
| AF107338.2 | Sarcophaga crassipalpis 70 kDa heat shock protein ScHSP70 (HSP70) mRNA, partial cds | NA | NA | 36.38 | 2.86 | 5.95E-15 |
| EZ608140 | Heat-shock-protein-70Aa | NP_731651.1 | 7.03E-70 | 25.59 | 3.29 | 4.72E-15 |
| SRR006884.17184 | CG1673 | NP_572884.1 | 4.07E-42 | 16.49 | 1.79 | 3.59E-12 |
| EZ597976 | Dgp-1, isoform B | NP_611302.1 | 3.44E-86 | 13.14 | 2.07 | 5.25E-13 |
| U96099.2 | Sarcophaga crassipalpis 23kDa heat shock protein ScHSP23 mRNA, complete cds | NA | NA | 11.41 | 1.74 | 3.74E-11 |
| EZ608919 | phosphoenolpyruvate carboxykinase, isoform A | NP_523784.2 | 1.46E-89 | 10.65 | 1.54 | 1.19E-13 |
| SRR006884.8919 | CG10103 | NP_648058.1 | 2.94E-05 | 9.70 | 0.34 | 7.28E-05 |
| EZ600225 | c11.1 | NP_652606.1 | 1.04E-41 | 9.38 | 0.61 | 8.07E-09 |
| SRR006884.214223 | CG11658, isoform A | NP_648498.1 | 4.38E-17 | 7.28 | 0.37 | 1.31E-04 |
| EZ598021 | CG15745, isoform A | NP_572873.1 | 6.98E-10 | 7.13 | 0.63 | 4.16E-09 |
| EZ614842 | CG17325, isoform A | NP_652586.1 | 2.43E-05 | 7.13 | 0.54 | 2.34E-05 |
| SRR006884.125032 | daughter of sevenless, isoform A | NP_523890.2 | 5.15E-21 | 6.72 | 0.49 | 2.86E-06 |
| SRR006884.50189 | GIIIspla2 | NP_572454.2 | 4.10E-18 | 5.97 | 0.75 | 2.60E-08 |
| EZ610004 | astray | NP_524001.2 | 6.52E-83 | 5.39 | 0.60 | 6.79E-08 |
| EZ598183 | CG12795 | NP_001097074.1 | 1.69E-28 | 5.28 | 0.52 | 5.07E-07 |
| EZ601028 | CG4928, isoform B | NP_573179.1 | 2.85E-170 | 5.17 | 0.49 | 2.95E-05 |
| EZ607342 | CG2017, isoform B | NP_731024.1 | 1.86E-73 | 5.07 | 0.54 | 2.50E-06 |
| SRR006884.159055 | CG11242 | NP_611425.1 | 1.76E-05 | 4.76 | 0.31 | 1.53E-04 |
| EZ598110 | CG13795 | NP_609138.3 | 4.64E-31 | 4.73 | 0.42 | 1.55E-05 |
| EZ604643 | CG10383 | NP_609896.1 | 3.60E-51 | 4.39 | 0.55 | 8.92E-05 |
| EZ597412 | HspB8, isoform A | NP_608326.1 | 4.21E-82 | 4.18 | 0.43 | 2.28E-07 |
| EZ597624 | CG17273 | NP_650918.1 | 0 | 3.84 | 0.42 | 3.67E-05 |
| SRR006884.74280 | CG32091 | NP_729728.1 | 4.16E-15 | 3.82 | 0.32 | 6.05E-04 |
| SRR006884.107432 | aralar1, isoform C | NP_733364.1 | 1.04E-29 | 3.73 | 0.27 | 3.91E-04 |
| EZ604817 | Isoleucyl-tRNA synthetase, isoform A | NP_730716.1 | 1.12E-62 | 3.66 | 0.20 | 1.26E-02 |
| EZ617493 | CG12004, isoform A | NP_647627.1 | 1.68E-24 | 3.43 | 0.31 | 5.15E-04 |
| EZ609833 | charged multivesicular body protein 3 | NP_649451.1 | 5.01E-11 | 3.40 | 0.12 | 1.72E-01 |
| EZ598788 | CTP:phosphocholine cytidylyltransferase 1, isoform A | NP_647621.1 | 9.19E-148 | 3.37 | 0.25 | 3.36E-03 |
| EZ603872 | CG17982 | NP_572447.1 | 1.18E-05 | 3.25 | 0.14 | 1.16E-01 |
| EZ611153 | thioredoxin peroxidase 1, isoform A | NP_477510.1 | 1.83E-97 | 3.19 | 0.33 | 4.73E-05 |
| EZ610102 | aminolevulinate synthase | NP_477281.1 | 7.57E-109 | 3.11 | 0.51 | 1.84E-04 |
| EZ599302 | Ecdysone-inducible gene L3 | NP_476581.1 | 5.78E-169 | 3.11 | 0.46 | 1.75E-05 |
| EZ603144 | CG12576, isoform A | NP_608465.1 | 3.35E-31 | 3.09 | 0.23 | 1.52E-02 |
| SRR006884.87534 | belphegor | NP_524996.1 | 6.24E-35 | 2.64 | 0.35 | 9.54E-04 |
| EZ602635 | CG7967 | NP_647640.1 | 1.60E-64 | 2.37 | 0.16 | 9.15E-03 |
| SRR006884.226258 | CG10638, isoform A | NP_729808.1 | 1.95E-28 | 2.34 | 0.14 | 1.06E-01 |
| SRR006884.54631 | CG11841 | NP_651661.1 | 8.74E-13 | 2.30 | 0.11 | 1.51E-01 |
| SRR006884.207492 | brummer, isoform B | NP_001163445.1 | 8.09E-28 | 2.22 | 0.24 | 8.89E-03 |
| EZ597506 | malic enzyme, isoform B | NP_524880.2 | 6.65E-120 | 2.21 | 0.43 | 8.81E-05 |
| EZ613015 | plenty of SH3s | NP_523776.1 | 1.41E-23 | 2.19 | 0.13 | 2.19E-01 |
| SRR006884.76417 | CG31694 | NP_722858.1 | 1.04E-13 | 2.14 | 0.23 | 5.37E-02 |

continued


Table D.6 continued

| EST Accession | Description | Drosophila RefSeq Homolog | Blastx E-value | GSA gene score | Log2 FC | FDR |
|---|---|---|---|---|---|---|
| EZ598395 | cctgamma, isoform A | NP_650572.2 | 1.69E-168 | 2.04 | 0.23 | 8.51E-02 |
| EZ603857 | augmenter of liver regeneration | NP_608353.2 | 2.38E-63 | 1.89 | 0.16 | 1.34E-01 |
| SRR006884.194912 | Tao-1, isoform D | NP_728267.1 | 1.23E-32 | 1.86 | 0.37 | 1.46E-03 |
| EZ606727 | transportin, isoform A | NP_477368.1 | 2.78E-69 | 1.86 | 0.20 | 1.00E-01 |
| EZ610927 | Tryptophanyl-tRNA synthetase, isoform B | NP_524826.1 | 0 | 1.78 | 0.07 | 3.78E-01 |
| EZ617649 | CG7264 | NP_648515.1 | 1.52E-23 | 1.71 | 0.15 | 2.64E-01 |
| SRR006884.190184 | CG13887, isoform B | NP_612062.1 | 1.10E-15 | 1.69 | 0.26 | 1.02E-02 |
| EZ598215 | Aspartyl-tRNA synthetase | NP_476609.1 | 0 | 1.66 | 0.19 | 6.55E-02 |
| SRR006884.32763 | CG15485 | NP_609577.2 | 9.09E-26 | 1.63 | 0.35 | 9.27E-02 |
| SRR006884.175245 | CG1882, isoform A | NP_610326.1 | 3.33E-12 | 1.60 | 0.15 | 2.40E-01 |
| EZ599743 | CG32549, isoform A | NP_996510.1 | 0 | 1.59 | 0.08 | 3.12E-01 |
| EZ596828 | glutathione S transferase D1, isoform A | NP_524326.1 | 1.07E-88 | 1.53 | 0.29 | 2.34E-04 |
| EZ597641 | CG6230 | NP_609490.1 | 2.30E-29 | 1.51 | 0.11 | 1.34E-01 |
| EZ612635 | CG12972 | NP_649307.1 | 2.08E-38 | 1.49 | 0.11 | 2.42E-01 |
| SRR006884.29383 | CG3008 | NP_608871.1 | 4.00E-30 | 1.47 | 0.07 | 4.01E-01 |
| EZ598495 | CG7224, isoform A | NP_609170.1 | 1.59E-39 | 1.43 | 0.13 | 1.43E-01 |
| EZ601918 | CG5792, isoform A | NP_609590.1 | 7.83E-85 | 1.42 | 0.05 | 5.69E-01 |
| EZ617312 | Fem-1, isoform A | NP_611508.1 | 1.76E-25 | 1.37 | 0.20 | 3.12E-02 |
| EZ617417 | CG13624, isoform C | NP_651271.1 | 5.92E-22 | 1.37 | 0.21 | 1.63E-01 |
| EZ613821 | virus-induced RNA 1, isoform A | NP_723746.1 | 3.26E-06 | 1.36 | 0.16 | 1.31E-02 |
| EZ602113 | shaggy, isoform D | NP_476716.2 | 3.75E-82 | 1.34 | 0.14 | 3.10E-02 |
| EZ604719 | tetraspanin 42Ed | NP_523630.1 | 1.10E-45 | 1.33 | 0.19 | 1.66E-02 |
| EZ602498 | tetraspanin 42El | NP_523638.1 | 5.72E-61 | 1.31 | 0.28 | 8.44E-03 |
| EZ602449 | Glutamyl-prolyl-tRNA synthetase, isoform A | NP_524471.2 | 1.44E-84 | 1.30 | 0.21 | 1.09E-01 |
| EZ609440 | heat shock protein 60, isoform A | NP_511115.2 | 0 | 1.29 | 0.19 | 1.03E-01 |
| EZ605428 | CG10189 | NP_609975.1 | 3.83E-35 | 1.28 | 0.13 | 2.87E-01 |
| EZ608087 | ferritin 2 light chain homologue, isoform A | NP_524802.2 | 6.54E-05 | 1.28 | 0.12 | 5.02E-02 |
| EZ603846 | CG6428 | NP_570088.1 | 2.29E-79 | 1.27 | 0.13 | 3.41E-01 |
| EZ617538 | Histidyl-tRNA synthetase, isoform B | NP_573305.1 | 2.41E-48 | 1.26 | 0.19 | 1.62E-02 |
| SRR006884.91163 | RagC | NP_610361.1 | 4.22E-07 | 1.26 | -0.01 | 9.54E-01 |
| EZ597398 | CG18522 | NP_650475.1 | 2.33E-98 | 1.25 | 0.12 | 2.02E-01 |
| SRR006884.276817 | GDP dissociation inhibitor | NP_523524.2 | 5.10E-29 | 1.24 | 0.15 | 1.62E-01 |
| EZ611923 | CG11796, isoform A | NP_730536.1 | 1.90E-157 | 1.24 | 0.11 | 2.41E-01 |
| SRR006884.262899 | fizzy | NP_477501.1 | 1.44E-15 | 1.22 | 0.01 | 9.53E-01 |
| EZ609548 | CG1129, isoform A | NP_649498.1 | 4.74E-178 | 1.21 | 0.10 | 3.82E-01 |
| EZ597852 | homogentisate 1,2-dioxygenase | NP_523544.2 | 1.73E-41 | 1.19 | 0.02 | 8.27E-01 |
| EZ602847 | CG3338 | NP_608825.2 | 1.35E-50 | 1.19 | 0.12 | 1.55E-01 |
| SRR006884.143200 | membrane steroid binding protein, isoform A | NP_573087.1 | 1.97E-17 | 1.18 | 0.13 | 2.93E-01 |
| EZ612914 | CG8678 | NP_610100.1 | 2.09E-19 | 1.18 | 0.10 | 2.29E-01 |
| SRR006884.33178 | CG1208, isoform B | NP_649598.1 | 2.10E-19 | 1.15 | 0.06 | 6.35E-01 |
| SRR006884.66921 | CG3808 | NP_649083.2 | 3.30E-20 | 1.06 | 0.03 | 7.78E-01 |
| EZ616416 | rhophilin, isoform A | NP_511168.3 | 1.55E-14 | 1.05 | -0.03 | 7.59E-01 |
| EZ598986 | Glycyl-tRNA synthetase, isoform B | NP_730022.1 | 3.48E-155 | 1.04 | 0.05 | 6.19E-01 |
| EZ606663 | CG11029 | NP_608955.1 | 1.77E-25 | 1.03 | 0.07 | 5.33E-01 |
| EZ598054 | Chmp1 | NP_649051.3 | 2.31E-87 | 1.02 | 0.07 | 4.24E-01 |

continued


Table D.6 continued

| EST Accession | Description | Drosophila RefSeq Homolog | Blastx E-value | GSA gene score | Log2 FC | FDR |
|---|---|---|---|---|---|---|
| SRR006884.42501 | mutagen-sensitive 210, isoform A | NP_476861.1 | 2.47E-31 | 0.98 | 0.04 | 7.01E-01 |
| EZ601406 | CG10802 | NP_570062.1 | 7.77E-103 | 0.97 | 0.07 | 5.38E-01 |
| SRR006884.135175 | CG13189 | NP_610712.1 | 7.21E-20 | 0.97 | 0.04 | 7.65E-01 |
| SRR006884.262835 | Valyl-tRNA synthetase, isoform A | NP_524838.1 | 3.89E-45 | 0.95 | 0.04 | 8.28E-01 |
| EZ604970 | Prp31 | NP_648756.1 | 2.20E-82 | 0.93 | 0.04 | 7.12E-01 |
| EZ616432 | CG5966 | NP_572286.1 | 1.79E-37 | 0.93 | 0.12 | 2.66E-01 |
| EZ608187 | CG6459 | NP_611243.1 | 1.62E-99 | 0.93 | 0.13 | 2.71E-01 |
| EZ610202 | Thioredoxin-like | NP_523938.2 | 1.52E-103 | 0.92 | 0.04 | 6.88E-01 |
| EZ607656 | XRCC1 | NP_572217.1 | 1.61E-47 | 0.92 | 0.02 | 8.48E-01 |
| SRR006884.157105 | wings down | NP_476900.1 | 7.97E-22 | 0.91 | 0.05 | 6.63E-01 |
| EZ605933 | CG18643 | NP_650072.2 | 2.94E-16 | 0.90 | 0.09 | 2.02E-01 |
| EZ608323 | CG7033, isoform A | NP_572524.1 | 2.39E-89 | 0.89 | 0.18 | 1.14E-01 |
| EZ603505 | CG7044 | NP_650940.2 | 9.50E-56 | 0.88 | 0.06 | 7.22E-01 |
| EZ600823 | msb1l | NP_609991.1 | 8.62E-16 | 0.85 | 0.05 | 8.18E-01 |
| SRR006884.132048 | p53, isoform B | NP_996267.1 | 9.06E-10 | 0.83 | 0.02 | 9.16E-01 |
| EZ597627 | CG7267 | NP_572518.1 | 1.60E-34 | 0.82 | 0.14 | 1.28E-01 |
| EZ601597 | CG10420 | NP_651356.1 | 1.24E-18 | 0.81 | 0.04 | 8.19E-01 |
| EZ613034 | CG3689, isoform C | NP_001036597.1 | 1.12E-38 | 0.81 | 0.08 | 3.93E-01 |
| SRR006884.49030 | CG4733 | NP_650842.2 | 1.32E-08 | 0.80 | 0.11 | 2.97E-01 |
| EZ603017 | CG9821, isoform B | NP_788602.1 | 9.70E-26 | 0.80 | 0.16 | 1.22E-02 |
| EZ601361 | reduction in Cnn dots 5 | NP_647852.1 | 8.41E-08 | 0.78 | 0.10 | 2.56E-01 |
| SRR006884.270592 | proliferation disrupter | NP_725841.1 | 1.74E-05 | 0.78 | 0.11 | 2.75E-01 |
| EZ614462 | CG6353 | NP_650975.1 | 1.25E-35 | 0.77 | 0.04 | 7.36E-01 |
| EZ599277 | sds22 | NP_650619.1 | 3.66E-111 | 0.75 | 0.04 | 7.41E-01 |
| EZ610855 | CG8679 | NP_610099.1 | 9.26E-31 | 0.74 | 0.12 | 5.62E-01 |
| EZ610954 | CG2076 | NP_572681.1 | 1.12E-99 | 0.73 | 0.10 | 2.76E-01 |
| EZ615387 | CG5290 | NP_649026.1 | 1.72E-08 | 0.72 | 0.05 | 6.69E-01 |
| EZ603230 | CG8468, isoform B | NP_610940.1 | 3.20E-110 | 0.71 | 0.05 | 6.01E-01 |
| EZ602012 | cuticular protein 72Ec | NP_648884.1 | 9.56E-23 | 0.69 | 0.08 | 5.14E-01 |
| EZ599707 | barrier to autointegration factor | NP_609176.1 | 1.18E-42 | 0.67 | 0.05 | 4.86E-01 |
| EZ616441 | bigmax | NP_651556.2 | 2.55E-41 | 0.67 | 0.02 | 8.28E-01 |
| EZ608061 | CG3714, isoform A | NP_722961.1 | 0 | 0.63 | 0.07 | 4.41E-01 |
| EZ607760 | elongator complex protein 1 | NP_650098.1 | 4.06E-23 | 0.61 | 0.05 | 6.69E-01 |
| EZ603340 | CG7272 | NP_648768.1 | 3.76E-65 | 0.58 | 0.02 | 9.32E-01 |
| EZ603922 | CG4726 | NP_608572.1 | 8.52E-32 | 0.58 | 0.03 | 8.39E-01 |
| SRR006884.8678 | CG5323 | NP_611346.1 | 1.10E-15 | 0.58 | -0.03 | 7.87E-01 |
| EZ608188 | CG15617 | NP_611151.1 | 1.02E-99 | 0.56 | 0.03 | 7.98E-01 |
| SRR006884.129257 | sniffer | NP_572466.1 | 8.70E-05 | 0.54 | 0.11 | 2.73E-01 |
| EZ600356 | malvolio, isoform E | NP_524425.2 | 1.45E-110 | 0.51 | 0.06 | 5.32E-01 |
| EZ602349 | SNF4/AMP-activated protein kinase gamma subunit, isoform F | NP_732598.1 | 6.66E-63 | 0.51 | 0.08 | 4.75E-01 |
| EZ612449 | hepatocyte growth factor regulated tyrosine kinase substrate, isoform C | NP_722830.2 | 1.97E-11 | 0.50 | 0.11 | 5.41E-01 |
| EZ596875 | PHGPx, isoform C | NP_728868.1 | 7.92E-83 | 0.48 | 0.02 | 8.60E-01 |
| EZ610419 | CG5955 | NP_649230.1 | 9.01E-131 | 0.47 | -0.05 | 6.98E-01 |
| EZ613572 | replication protein A 70 | NP_524274.1 | 9.30E-07 | 0.45 | -0.02 | 8.63E-01 |
| EZ598464 | eIF2B-alpha | NP_651752.1 | 1.82E-69 | 0.45 | 0.02 | 8.88E-01 |
| SRR006884.111167 | rad50 | NP_726199.3 | 2.50E-20 | 0.44 | 0.05 | 6.77E-01 |
| EZ609296 | thioredoxin reductase-1, isoform A | NP_511082.2 | 0 | 0.43 | 0.03 | 7.74E-01 |

continued


Table D.6 continued

| EST Accession | Description | Drosophila RefSeq Homolog | Blastx E-value | GSA gene score | Log2 FC | FDR |
|---|---|---|---|---|---|---|
| EZ610760 | CG17331 | NP_609804.1 | 5.72E-85 | 0.41 | -0.03 | 8.14E-01 |
| EZ603621 | Ugt86Da | NP_652626.1 | 3.17E-24 | 0.38 | -0.02 | 8.86E-01 |
| SRR006884.120965 | blue cheese | NP_608968.2 | 1.07E-39 | 0.38 | 0.01 | 9.25E-01 |
| SRR006884.136976 | CG3609, isoform A | NP_608675.1 | 3.11E-10 | 0.37 | 0.01 | 9.45E-01 |
| EZ603021 | CG7059, isoform A | NP_651034.2 | 6.81E-40 | 0.37 | 0.00 | 9.89E-01 |
| EZ610502 | heat shock protein 27 | NP_524000.1 | 5.06E-58 | 0.35 | -0.02 | 9.10E-01 |
| SRR006884.159817 | thiolester containing protein II, isoform A | NP_523506.1 | 6.13E-06 | 0.30 | 0.05 | 6.54E-01 |
| EZ610570 | ubiquitin activating enzyme 1 | NP_477310.2 | 0 | 0.29 | 0.00 | 9.70E-01 |
| EZ600551 | CG3040 | NP_572372.1 | 2.65E-72 | 0.29 | 0.00 | 9.90E-01 |
| EZ612562 | imaginal disc growth factor 1 | NP_477258.1 | 7.77E-06 | 0.25 | 0.05 | 6.75E-01 |
| EZ606400 | CG6272 | NP_648434.1 | 3.58E-06 | 0.22 | -0.01 | 9.59E-01 |
| SRR006884.217646 | CG4630 | NP_610847.3 | 9.65E-28 | 0.21 | 0.06 | 6.21E-01 |
| EZ600213 | Tat-binding protein-1 | NP_524464.1 | 0 | 0.18 | -0.05 | 7.16E-01 |
| EZ600569 | CG9253 | NP_610090.1 | 1.50E-80 | 0.17 | 0.01 | 9.11E-01 |
| EZ600519 | Attacin-A | NP_523745.1 | 9.29E-28 | 0.15 | 0.17 | 6.71E-01 |
| SRR006884.91373 | CG6602 | NP_648020.1 | 5.19E-05 | 0.15 | 0.14 | 2.08E-01 |
| EZ610228 | lethal (2) 37Cc, isoform A | NP_724165.1 | 1.41E-115 | 0.15 | 0.01 | 9.00E-01 |
| EZ607882 | CG17259 | NP_608743.2 | 0 | 0.14 | 0.03 | 8.25E-01 |
| SRR006884.186050 | CG5380 | NP_651029.1 | 5.82E-33 | 0.13 | 0.01 | 9.52E-01 |
| SRR006884.58882 | Hermansky-Pudlak syndrome 1 ortholog | NP_610997.1 | 1.78E-34 | 0.13 | 0.04 | 6.87E-01 |
| EZ599177 | CG2950, isoform B | NP_608866.1 | 1.07E-76 | 0.13 | 0.04 | 7.10E-01 |
| EZ606802 | CG10098 | NP_649694.1 | 5.19E-40 | 0.11 | 0.04 | 8.51E-01 |
| EZ602859 | Rab40, isoform A | NP_572800.2 | 4.86E-50 | 0.10 | 0.06 | 5.38E-01 |
| EZ608096 | upheld, isoform E | NP_001014739.1 | 6.93E-128 | 0.10 | 0.13 | 1.47E-01 |
| EZ602796 | CG9272 | NP_610078.2 | 1.81E-53 | 0.08 | 0.05 | 5.98E-01 |
| EZ610975 | CG17904 | NP_609805.1 | 3.70E-08 | 0.07 | 0.01 | 9.40E-01 |
| EZ598329 | CG2246, isoform B | NP_733386.1 | 0 | 0.05 | 0.06 | 5.93E-01 |
| EZ598663 | CG16727 | NP_650818.1 | 1.23E-64 | 0.05 | 0.02 | 8.80E-01 |
| SRR006884.21925 | Cyp6a22 | NP_652075.1 | 1.19E-14 | 0.04 | -0.02 | 8.48E-01 |
| EZ611614 | CG4203 | NP_650428.1 | 4.48E-36 | 0.03 | -0.02 | 9.06E-01 |
| EZ611034 | Tal | NP_523835.2 | 5.61E-138 | 0.02 | 0.05 | 5.33E-01 |
| SRR006884.85980 | mitotic 15, isoform B | NP_524901.2 | 5.08E-13 | -0.02 | -0.01 | 9.55E-01 |
| SRR006884.73787 | phosphoribosylamidotransferase 2, isoform B | NP_523949.2 | 9.73E-20 | -0.02 | 0.04 | 6.68E-01 |
| EZ604337 | Ugt36Bc, isoform A | NP_652627.2 | 6.33E-46 | -0.03 | 0.00 | 9.86E-01 |
| SRR006884.159177 | ornithine aminotransferase precursor | NP_649139.1 | 1.10E-15 | -0.03 | 0.06 | 7.32E-01 |
| EZ609710 | glutamine synthetase 1, isoform B | NP_476570.1 | 5.19E-78 | -0.04 | 0.05 | 6.08E-01 |
| EZ609401 | CG8317 | NP_611138.2 | 3.11E-17 | -0.04 | 0.11 | 4.69E-01 |
| EZ609186 | immune deficiency | NP_573394.1 | 4.34E-29 | -0.06 | -0.06 | 4.40E-01 |
| EZ608356 | CG9547 | NP_609040.1 | 0 | -0.06 | 0.04 | 7.15E-01 |
| EZ600693 | urate oxidase | NP_476779.1 | 2.93E-121 | -0.06 | 0.02 | 8.72E-01 |
| EZ599354 | NTPase, isoform A | NP_477370.1 | 2.31E-54 | -0.07 | 0.04 | 7.95E-01 |
| EZ597113 | ade5 | NP_572826.1 | 2.35E-107 | -0.08 | 0.00 | 9.87E-01 |
| SRR006884.117052 | adenosine 2, isoform A | NP_477212.1 | 2.89E-32 | -0.08 | -0.06 | 5.91E-01 |
| EZ614354 | CG9772, isoform B | NP_730816.2 | 2.40E-17 | -0.08 | 0.00 | 9.78E-01 |
| SRR006884.150401 | CG10341, isoform A | NP_609898.1 | 3.08E-34 | -0.08 | -0.02 | 9.23E-01 |
| SRR006884.43142 | CG4658, isoform A | NP_788020.1 | 1.02E-45 | -0.09 | 0.09 | 2.92E-01 |
| EZ603070 | vacuolar protein sorting 13 | NP_610299.2 | 1.90E-92 | -0.09 | -0.02 | 8.94E-01 |
| EZ610043 | proteasome 26kD subunit | NP_524115.1 | 5.66E-100 | -0.10 | 0.09 | 2.42E-01 |

continued


Table D.6 continued

| EST Accession | Description | Drosophila RefSeq Homolog | Blastx E-value | GSA gene score | Log2 FC | FDR |
|---|---|---|---|---|---|---|
| SRR006884.243315 | CG4049 | NP_611885.3 | 2.15E-27 | -0.12 | 0.04 | 7.37E-01 |
| EZ597054 | CG15771, isoform A | NP_572257.2 | 2.21E-62 | -0.12 | -0.01 | 9.26E-01 |
| EZ606944 | CG7766, isoform A | NP_572525.3 | 1.78E-100 | -0.13 | 0.03 | 8.24E-01 |
| EZ602877 | lobe | NP_524787.1 | 1.87E-17 | -0.14 | 0.02 | 8.84E-01 |
| EZ599922 | lethal (2) 06496 | NP_652036.1 | 1.32E-48 | -0.15 | -0.02 | 8.25E-01 |
| EZ607719 | enhancer of decapping 3 | NP_648992.1 | 1.35E-54 | -0.15 | -0.03 | 8.24E-01 |
| EZ610057 | updo, isoform A | NP_610501.1 | 1.69E-169 | -0.16 | 0.01 | 9.57E-01 |
| EZ609026 | CG4363 | NP_611613.1 | 2.62E-55 | -0.16 | -0.01 | 9.29E-01 |
| EZ607290 | CG3604 | NP_608801.1 | 1.36E-05 | -0.17 | -0.13 | 4.30E-01 |
| SRR006884.66367 | CG8249 | NP_611060.2 | 2.21E-32 | -0.18 | 0.01 | 9.58E-01 |
| EZ597526 | proteasome alpha5 subunit, isoform A | NP_725669.1 | 5.58E-90 | -0.19 | -0.03 | 7.76E-01 |
| EZ601388 | keren | NP_524129.1 | 1.21E-06 | -0.19 | -0.01 | 9.28E-01 |
| SRR006884.133846 | locomotion defects, isoform D | NP_732773.1 | 3.55E-09 | -0.19 | 0.04 | 6.51E-01 |
| EZ613375 | CG6607 | NP_651266.1 | 4.63E-17 | -0.20 | -0.04 | 7.18E-01 |
| SRR006884.160099 | CG6608, isoform A | NP_650034.1 | 3.56E-06 | -0.21 | 0.19 | 1.20E-01 |
| EZ606175 | Immune-regulated catalase | NP_650584.1 | 1.50E-66 | -0.22 | -0.03 | 8.34E-01 |
| EZ606679 | CG4269 | NP_611683.1 | 3.02E-21 | -0.23 | -0.08 | 5.27E-01 |
| SRR006884.271668 | Rev1 | NP_612047.1 | 6.65E-21 | -0.23 | 0.00 | 9.89E-01 |
| EZ606051 | mediator complex subunit 17 | NP_650686.1 | 3.04E-44 | -0.23 | -0.05 | 5.74E-01 |
| EZ598343 | lipase 4, isoform A | NP_609418.1 | 2.38E-154 | -0.24 | -0.03 | 8.16E-01 |
| EZ605256 | CG3740 | NP_569932.1 | 1.36E-28 | -0.24 | 0.07 | 5.36E-01 |
| SRR006884.75170 | CG32343, isoform B | NP_728520.2 | 3.14E-10 | -0.26 | 0.08 | 2.85E-01 |
| SRR006884.122969 | CG32635 | NP_727708.1 | 3.94E-05 | -0.26 | 0.02 | 8.59E-01 |
| SRR006884.4243 | CG5646 | NP_651568.1 | 8.05E-20 | -0.27 | 0.01 | 9.86E-01 |
| EZ610173 | midway, isoform A | NP_609813.1 | 1.08E-121 | -0.28 | 0.01 | 9.63E-01 |
| EZ601540 | CG5384 | NP_609377.1 | 0 | -0.29 | -0.05 | 5.19E-01 |
| EZ602671 | CG2789 | NP_608531.1 | 1.63E-57 | -0.31 | -0.05 | 6.20E-01 |
| EZ608349 | CG6277 | NP_651525.1 | 2.57E-113 | -0.31 | -0.01 | 9.39E-01 |
| EZ611110 | CG9336 | NP_610069.2 | 2.21E-38 | -0.32 | -0.03 | 8.26E-01 |
| EZ610209 | yippee, isoform A | NP_572882.1 | 8.04E-34 | -0.33 | -0.07 | 5.06E-01 |
| EZ605371 | iron regulatory protein 1B | NP_524303.2 | 6.36E-40 | -0.34 | 0.02 | 8.87E-01 |
| EZ603406 | CG4585 | NP_477376.1 | 1.44E-85 | -0.35 | -0.02 | 8.78E-01 |
| EZ610710 | CG12379 | NP_573061.2 | 9.67E-29 | -0.36 | -0.01 | 9.53E-01 |
| EZ610105 | eukaryotic initiation factor 2beta | NP_524043.1 | 1.18E-93 | -0.36 | -0.01 | 9.52E-01 |
| EZ609113 | transferrin 1 | NP_523401.2 | 1.75E-05 | -0.38 | -0.01 | 9.28E-01 |
| SRR006884.184489 | CG6767, isoform A | NP_648345.1 | 5.20E-26 | -0.39 | -0.04 | 8.08E-01 |
| EZ599115 | phosphogluconate mutase | NP_524675.1 | 7.37E-167 | -0.39 | 0.02 | 8.63E-01 |
| EZ597060 | Mov34 | NP_523845.2 | 3.21E-153 | -0.41 | -0.06 | 4.35E-01 |
| EZ611964 | structure specific recognition protein | NP_523830.2 | 2.87E-48 | -0.42 | -0.04 | 6.84E-01 |
| SRR006884.13996 | CG11668 | NP_650254.1 | 8.18E-19 | -0.42 | 0.06 | 4.42E-01 |
| EZ610634 | CG6769 | NP_573252.1 | 1.53E-41 | -0.43 | 0.00 | 9.77E-01 |
| EZ602559 | scylla | NP_648456.2 | 1.40E-47 | -0.45 | 0.02 | 8.96E-01 |
| EZ614754 | CG1550 | NP_610325.1 | 6.55E-21 | -0.45 | 0.01 | 9.70E-01 |
| EZ610724 | CG3817 | NP_650405.2 | 1.98E-11 | -0.46 | -0.09 | 3.55E-01 |
| EZ600930 | asrij | NP_611733.1 | 9.13E-71 | -0.49 | 0.00 | 9.94E-01 |
| SRR006884.227093 | CG14655 | NP_649494.1 | 2.65E-16 | -0.50 | 0.01 | 9.55E-01 |
| EZ610203 | proteasome 26S subunit subunit 4 ATPase | NP_524469.2 | 0 | -0.50 | -0.01 | 9.24E-01 |
| EZ610982 | replication protein A2 | NP_610077.1 | 6.24E-19 | -0.50 | -0.06 | 4.50E-01 |
| EZ602109 | globin 1, isoform B | NP_524369.1 | 1.48E-45 | -0.51 | 0.03 | 7.98E-01 |

continued


Table D.6 continued

| EST Accession | Description | Drosophila RefSeq Homolog | Blastx E-value | GSA gene score | Log2 FC | FDR |
|---|---|---|---|---|---|---|
| EZ601436 | yippee interacting protein 2 | NP_523528.1 | 9.14E-15 | -0.52 | -0.02 | 9.41E-01 |
| EZ606230 | Attacin-A | NP_523745.1 | 1.75E-05 | -0.54 | -0.16 | 2.76E-01 |
| EZ597569 | CG9934, isoform A | NP_609597.1 | 6.44E-91 | -0.54 | -0.06 | 6.21E-01 |
| EZ609980 | N-methyl-D-aspartate receptor-associated protein, isoform C | NP_523722.1 | 3.05E-77 | -0.56 | -0.02 | 8.16E-01 |
| EZ596994 | Cyp6d5 | NP_650327.1 | 6.63E-144 | -0.60 | -0.10 | 3.91E-01 |
| EZ609860 | adenosylhomocysteinase at 13 | NP_511164.2 | 0 | -0.61 | -0.02 | 8.92E-01 |
| SRR006884.33494 | Microfibril-associated protein 1 | NP_647679.1 | 5.06E-05 | -0.62 | 0.03 | 8.79E-01 |
| EZ597396 | proteasome 54kD subunit | NP_524204.1 | 1.23E-148 | -0.63 | -0.05 | 5.90E-01 |
| EZ607670 | CG7506 | NP_648144.3 | 3.06E-72 | -0.64 | 0.01 | 9.61E-01 |
| EZ599068 | death executioner caspase related to Apopain/Yama | NP_477462.1 | 3.91E-39 | -0.64 | 0.00 | 9.95E-01 |
| EZ596801 | CG2046 | NP_649590.1 | 7.85E-57 | -0.64 | -0.05 | 5.71E-01 |
| EZ610017 | CG9773 | NP_649813.1 | 5.06E-91 | -0.65 | -0.05 | 6.06E-01 |
| EZ607889 | CG11899 | NP_652046.1 | 5.39E-169 | -0.65 | -0.03 | 7.35E-01 |
| EZ608392 | CG8636 | NP_570011.1 | 3.91E-131 | -0.65 | -0.01 | 9.60E-01 |
| EZ617641 | maroon-like | NP_523423.1 | 3.56E-22 | -0.66 | 0.00 | 9.87E-01 |
| EZ610698 | refractory to sigma P, isoform A | NP_476700.1 | 3.87E-28 | -0.67 | -0.05 | 6.36E-01 |
| EZ599652 | CG3394, isoform B | NP_611906.1 | 1.46E-52 | -0.68 | 0.06 | 6.37E-01 |
| EZ600220 | CG15019, isoform A | NP_647899.1 | 2.62E-16 | -0.68 | -0.05 | 6.28E-01 |
| EZ600528 | lethal (2) 35Bg | NP_524938.1 | 2.49E-61 | -0.69 | -0.08 | 2.76E-01 |
| EZ599785 | CG14615 | NP_608459.1 | 1.84E-44 | -0.70 | 0.05 | 5.44E-01 |
| SRR006884.255897 | CG3534 | NP_650582.1 | 5.04E-27 | -0.71 | 0.02 | 8.89E-01 |
| EZ616042 | Rrp4 | NP_611807.1 | 3.50E-37 | -0.72 | -0.08 | 4.22E-01 |
| SRR006884.7518 | CG3709 | NP_608500.1 | 4.34E-20 | -0.73 | -0.05 | 6.48E-01 |
| SRR006884.71371 | CG2909 | NP_572616.1 | 3.87E-05 | -0.73 | 0.01 | 9.47E-01 |
| EZ598851 | CG14022 | NP_608922.1 | 5.30E-44 | -0.74 | -0.05 | 6.76E-01 |
| SRR006884.107856 | CG5800 | NP_573230.1 | 1.27E-11 | -0.76 | 0.03 | 7.93E-01 |
| EZ598828 | CG6182 | NP_651223.2 | 8.44E-76 | -0.79 | -0.05 | 5.88E-01 |
| EZ610751 | serpin 27A | NP_652024.1 | 1.52E-125 | -0.79 | -0.09 | 3.00E-01 |
| EZ600610 | escl | NP_723702.1 | 3.63E-73 | -0.80 | -0.05 | 7.62E-01 |
| SRR006884.103225 | CG17327, isoform A | NP_731760.1 | 2.29E-05 | -0.82 | -0.08 | 3.09E-01 |
| EZ614040 | CG3984 | NP_650420.1 | 8.01E-05 | -0.83 | 0.05 | 6.65E-01 |
| SRR006884.128601 | glycerol kinase, isoform A | NP_524655.1 | 1.21E-14 | -0.84 | -0.01 | 9.36E-01 |
| EZ608218 | CG12321 | NP_650685.1 | 7.95E-23 | -0.84 | 0.02 | 8.29E-01 |
| EZ605616 | DEAD box protein 45A | NP_476927.1 | 5.62E-43 | -0.84 | -0.06 | 4.35E-01 |
| EZ615632 | CG3448 | NP_648316.1 | 2.79E-06 | -0.85 | -0.02 | 8.83E-01 |
| SRR006884.194701 | CG6199, isoform A | NP_648451.1 | 1.15E-20 | -0.86 | 0.05 | 7.47E-01 |
| EZ609912 | henna, isoform A | NP_523963.2 | 0 | -0.86 | -0.01 | 9.71E-01 |
| EZ610009 | CG9363, isoform A | NP_649895.1 | 7.63E-119 | -0.87 | -0.06 | 5.92E-01 |
| EZ613130 | ferrochelatase, isoform A | NP_524613.1 | 3.69E-91 | -0.87 | 0.01 | 9.32E-01 |
| EZ604331 | CG7135 | NP_573269.1 | 1.45E-27 | -0.89 | -0.04 | 6.69E-01 |
| EZ604823 | NAD-dependent methylenetetrahydrofolate dehydrogenase, isoform B | NP_476929.1 | 1.12E-80 | -0.91 | -0.09 | 2.26E-01 |
| SRR006884.83747 | sorbitol dehydrogenase 1 | NP_477348.1 | 3.23E-26 | -0.92 | -0.10 | 8.12E-01 |
| EZ610607 | proteasome beta3 subunit | NP_649858.1 | 1.31E-107 | -0.93 | -0.06 | 4.81E-01 |
| EZ603603 | metallothionein B | NP_524413.1 | 3.36E-17 | -0.94 | -0.18 | 1.68E-01 |
| EZ597426 | CG4757 | NP_650043.1 | 4.37E-47 | -0.94 | -0.12 | 4.60E-01 |
| EZ610720 | proteasome beta5 subunit, isoform A | NP_652014.1 | 1.81E-10 | -0.95 | -0.11 | 2.92E-01 |
| EZ611246 | CG8206 | NP_573064.1 | 8.70E-05 | -0.95 | -0.06 | 6.72E-01 |
| EZ608335 | CG8801 | NP_610484.1 | 2.13E-101 | -0.97 | -0.02 | 8.65E-01 |

continued


Table D.6 continued

EZ617653 | CG6841 | NP_649073.1 | 2.13E-51 | -0.99 | -0.09 | 3.34E-01
EZ606257 | bocksbeutel, isoform A | NP_649917.1 | 2.47E-13 | -1.00 | -0.05 | 5.60E-01
SRR006884.54967 | CG5958 | NP_609119.2 | 2.70E-11 | -1.01 | -0.03 | 8.23E-01
EZ613573 | CG2658, isoform A | NP_570017.1 | 7.60E-36 | -1.04 | -0.03 | 8.58E-01
EZ597306 | CG8209 | NP_648167.1 | 2.98E-15 | -1.06 | -0.11 | 2.51E-01
SRR006884.250489 | Rpn9, isoform A | NP_651177.1 | 1.38E-07 | -1.06 | -0.09 | 4.15E-01
SRR006884.150010 | CG11594, isoform B | NP_647848.1 | 9.29E-07 | -1.09 | -0.13 | 4.85E-02
EZ609317 | tetraspanin 29Fa, isoform A | NP_523515.1 | 8.19E-27 | -1.09 | -0.07 | 4.35E-01
EZ608570 | Rab-related protein 4 | NP_524744.2 | 9.71E-79 | -1.10 | -0.06 | 5.27E-01
EZ601870 | CG1703 | NP_572736.1 | 2.43E-131 | -1.10 | -0.07 | 5.61E-01
EZ599343 | CG10778 | NP_572425.1 | 1.57E-106 | -1.10 | -0.08 | 2.96E-01
EZ611027 | Pros45 | NP_608447.1 | 0 | -1.15 | -0.06 | 5.42E-01
EZ615724 | CG9098, isoform A | NP_608979.2 | 7.67E-33 | -1.18 | -0.08 | 3.13E-01
EZ600070 | Palmitoyl-protein thioesterase 2 | NP_609500.1 | 1.19E-49 | -1.19 | -0.17 | 1.77E-02
EZ608950 | ferredoxin, isoform A | NP_523993.1 | 1.14E-43 | -1.20 | -0.04 | 7.40E-01
EZ601523 | rhythmically expressed gene 2 | NP_612043.1 | 1.91E-97 | -1.20 | -0.07 | 5.91E-01
EZ610947 | CG2004 | NP_572501.1 | 2.66E-107 | -1.22 | -0.10 | 3.83E-01
EZ613817 | O-fucosyltransferase 2 | NP_569916.1 | 2.28E-13 | -1.23 | 0.01 | 9.55E-01
EZ612550 | Smg5, isoform A | NP_609685.1 | 5.68E-36 | -1.24 | -0.08 | 3.70E-01
SRR006884.34309 | NTF2-related export protein 1 | NP_611833.1 | 3.95E-29 | -1.25 | -0.13 | 1.63E-01
EZ616919 | CG3838, isoform B | NP_609298.1 | 3.05E-34 | -1.25 | -0.06 | 5.47E-01
EZ598420 | CG1681 | NP_572886.2 | 1.79E-80 | -1.25 | -0.13 | 1.83E-01
EZ597136 | tetraspanin 29Fb, isoform B | NP_523516.1 | 1.04E-10 | -1.27 | -0.10 | 3.65E-01
EZ617518 | CG11486, isoform G | NP_647767.1 | 2.52E-44 | -1.29 | -0.06 | 5.73E-01
EZ603183 | CG31743 | NP_609820.1 | 3.78E-124 | -1.30 | -0.16 | 4.84E-02
EZ609559 | Rpn11 | NP_608905.1 | 2.64E-173 | -1.31 | -0.12 | 9.92E-02
EZ598881 | Rpn1 | NP_649158.1 | 9.50E-87 | -1.32 | -0.10 | 2.19E-01
EZ610900 | CG9034 | NP_652538.1 | 1.46E-11 | -1.32 | -0.04 | 7.12E-01
EZ604939 | CG10153 | NP_610986.1 | 3.28E-20 | -1.34 | -0.08 | 3.73E-01
EZ602568 | CG30491 | NP_610306.1 | 5.90E-62 | -1.35 | -0.13 | 1.25E-01
EZ597921 | lethal (2) 03709, isoform C | NP_725832.2 | 6.77E-121 | -1.40 | -0.07 | 3.98E-01
SRR006884.180657 | CG9003, isoform B | NP_001097271.1 | 5.06E-27 | -1.40 | -0.10 | 2.76E-01
EZ609282 | CG5045 | NP_609388.1 | 2.94E-97 | -1.42 | -0.12 | 1.07E-01
SRR006884.122763 | G protein alpha49B, isoform H | NP_523718.1 | 1.15E-12 | -1.43 | -0.03 | 8.04E-01
EZ612194 | Ance-5 | NP_573392.2 | 6.85E-64 | -1.44 | -0.10 | 3.25E-01
SRR006884.28673 | RNA polymerase I 135kD subunit | NP_476708.1 | 8.15E-43 | -1.44 | -0.10 | 1.96E-01
SRR006884.123509 | pugilist, isoform B | NP_731489.2 | 8.82E-21 | -1.46 | -0.16 | 2.53E-01
EZ610659 | PGRP-SB1 | NP_648917.1 | 5.62E-79 | -1.46 | -0.24 | 8.29E-02
EZ600852 | CG11089, isoform A | NP_651305.1 | 2.67E-112 | -1.47 | -0.09 | 4.07E-01
SRR006884.35855 | plexus, isoform A | NP_726208.1 | 1.08E-07 | -1.47 | -0.10 | 3.60E-01
EZ597739 | Rpt1 | NP_477473.1 | 1.97E-50 | -1.53 | -0.05 | 6.10E-01
EZ609945 | cuticular protein 57A, isoform A | NP_611489.1 | 3.39E-62 | -1.58 | -0.03 | 7.26E-01
EZ609879 | CG3699 | NP_569875.2 | 2.47E-31 | -1.63 | -0.12 | 2.34E-01
EZ600365 | CG1139 | NP_647686.1 | 5.75E-109 | -1.66 | -0.13 | 3.07E-01
EZ610730 | sclp, isoform B | NP_001162727.1 | 2.89E-66 | -1.68 | -0.19 | 3.44E-01
SRR006884.85171 | CG6910 | NP_648556.3 | 7.88E-20 | -1.71 | -0.14 | 3.72E-01
EZ608091 | CG6115 | NP_652578.1 | 6.35E-33 | -1.72 | -0.09 | 2.66E-01
SRR006884.265754 | 6-phosphofructo-2-kinase, isoform G | NP_477451.1 | 1.54E-38 | -1.75 | -0.11 | 5.20E-01
EZ601014 | CG12012, isoform A | NP_647813.1 | 7.02E-19 | -1.77 | -0.10 | 1.56E-01


Table D.6 continued

EZ599672 | CG10877 | NP_650894.1 | 9.64E-28 | -1.77 | -0.07 | 4.58E-01
EZ610061 | CG42486 | NP_001162960.1 | 9.92E-19 | -1.81 | -0.14 | 3.56E-01
SRR006884.42596 | CG6512, isoform B | NP_730250.1 | 1.83E-34 | -1.83 | -0.07 | 5.25E-01
SRR006884.116401 | CG4080 | NP_648305.1 | 1.80E-41 | -1.85 | -0.14 | 8.73E-02
EZ617643 | alpha-coatomer protein, isoform A | NP_477395.1 | 6.39E-32 | -1.88 | -0.07 | 4.13E-01
SRR006884.94997 | brain washing | NP_610020.1 | 1.01E-16 | -1.90 | -0.13 | 2.32E-01
EZ600173 | Rpn2 | NP_651677.2 | 2.02E-54 | -2.03 | -0.13 | 1.21E-01
EZ599347 | visgun, isoform C | NP_648349.1 | 3.13E-17 | -2.04 | -0.12 | 1.73E-01
SRR006884.270688 | rosy | NP_524337.1 | 2.73E-30 | -2.06 | -0.04 | 6.71E-01
EZ610369 | CG5059, isoform A | NP_649239.1 | 2.85E-56 | -2.16 | -0.12 | 1.17E-01
EZ605460 | CG15021 | NP_647902.1 | 4.31E-07 | -2.19 | -0.14 | 7.37E-02
EZ599137 | Rpn12 | NP_648904.1 | 3.22E-89 | -2.24 | -0.14 | 6.92E-02
EZ615955 | hemomucin | NP_477159.1 | 1.02E-37 | -2.24 | -0.19 | 2.36E-02
EZ608918 | CG12177 | NP_572911.1 | 2.08E-81 | -2.34 | -0.23 | 2.17E-02
EZ604730 | Roc2, isoform A | NP_610691.1 | 8.81E-45 | -2.34 | -0.20 | 7.00E-03
EZ601062 | CG6762, isoform A | NP_573250.1 | 8.35E-45 | -2.40 | -0.18 | 1.56E-01
EZ598236 | CG17333 | NP_572656.1 | 2.94E-42 | -2.42 | -0.18 | 2.21E-02
SRR006884.171002 | CG31908, isoform A | NP_723239.1 | 1.72E-21 | -2.46 | -0.08 | 3.62E-01
SRR006884.66544 | CG6287 | NP_609496.1 | 6.67E-13 | -2.49 | -0.19 | 8.68E-02
EZ599716 | cytochrome P450-4p1 | NP_524828.1 | 7.21E-87 | -2.69 | -0.05 | 5.59E-01
EZ600269 | O-6-alkylguanine-DNA alkyltransferase | NP_477366.1 | 4.19E-18 | -2.88 | -0.22 | 6.00E-03
EZ598913 | serine pyruvate aminotransferase | NP_511062.1 | 5.06E-11 | -2.88 | -0.24 | 2.54E-02
SRR006884.23495 | CG3999 | NP_649989.1 | 5.74E-33 | -3.00 | -0.16 | 9.99E-02
EZ597253 | CG3011, isoform A | NP_572278.1 | 5.36E-07 | -3.11 | -0.12 | 2.39E-01
SRR006884.61637 | RhoGAP68F | NP_648552.1 | 6.24E-35 | -3.25 | -0.15 | 9.55E-02


Table D.7 Metabolite content in response to RCH and cold shock. Metabolite contents are expressed as mean ± SE in nmol metabolite mg⁻¹ fresh mass. Within each row, different letters indicate significant differences between groups for that metabolite (ANOVA, FDR<0.05).
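The mean ± SE entries reported in Table D.7 can be illustrated with a minimal sketch. It assumes the standard error is the sample standard deviation divided by the square root of the number of replicates; the replicate values and the function name `mean_se` are hypothetical and are not taken from the dissertation's data or analysis scripts.

```python
import math


def mean_se(values):
    """Return (mean, standard error of the mean) for a list of replicates.

    Assumption: SE = sample standard deviation (n - 1 denominator) / sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates to estimate SE")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


# Hypothetical replicate measurements (nmol metabolite per mg fresh mass).
replicates = [1.0, 2.0, 3.0, 4.0, 5.0]
m, se = mean_se(replicates)
print(f"{m:.2f}±{se:.2f}")
```

The significance letters in the table come from a separate ANOVA step that this sketch does not attempt to reproduce.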


Table D.7

Metabolite content (nmol/mg fresh mass)
Metabolite | Control | RCH | CS+2R | RCH+CS+2R | CS+24R | RCH+CS+24R
Valine | 1.62±0.09a,b | 1.60±0.13b | 1.97±0.07c | 1.87±0.09a,c | 2.76±0.13d | 2.73±0.09d
Glycine | 2.86±0.07a,b | 2.73±0.10b | 3.43±0.09c | 3.16±0.12a,c | 4.44±0.18d | 4.66±0.18d
Serine | 1.62±0.08a | 1.57±0.10a | 2.13±0.09b | 1.91±0.08b | 3.03±0.17c | 3.19±0.19c
Glutamate | 4.43±0.35a,b | 4.59±0.28a | 5.15±0.18a | 4.69±0.31a | 3.55±0.22c | 3.63±0.13b,c
Proline | 5.66±0.22a | 6.02±0.24a,b,c | 6.27±0.42a,b,c | 5.95±0.33a,c | 7.03±0.36b | 6.82±0.37b,c
Leucine | 1.08±0.06a | 1.02±0.05a | 1.29±0.06b | 1.13±0.04a,b | 1.86±0.07c | 1.88±0.07c
Isoleucine | 0.65±0.02a | 0.63±0.03a | 0.84±0.04b | 0.73±0.02c | 1.35±0.04d | 1.37±0.05d
Threonine | 0.62±0.03a | 0.63±0.05a | 0.87±0.06b | 0.74±0.03b | 1.48±0.07c | 1.55±0.07c
Alanine | 0.64±0.02a | 0.57±0.04a | 0.63±0.06a | 0.64±0.05a | 1.4±0.06b | 1.34±0.10b
Phenylalanine | 0.48±0.03a | 0.50±0.02a,b | 0.57±0.02b | 0.53±0.03a,b | 0.93±0.03c | 0.88±0.03c
Ribose | 1.77E-02±1.2E-03a | 1.60E-02±1.2E-03a | 1.84E-02±5.1E-04a | 1.66E-02±8.2E-04a | 1.77E-02±8.1E-04a | 1.70E-02±7.4E-04a
Glucose | 0.52±0.04a | 0.59±0.04a | 2.75±0.61b | 1.47±0.11c | 6.15±0.63d | 4.12±0.47e
Fructose | 3.50E-03±2.0E-04a | 2.88E-03±9.4E-05a | 2.22E-02±2.8E-03b | 1.51E-02±8.1E-04c | 0.23±0.03d | 0.13±0.01e
Mannose | 3.61E-02±9.1E-04a | 3.82E-02±1.2E-03a | 0.11±0.01b | 7.81E-02±3.7E-03c | 0.30±0.03d | 0.22±0.02e
Maltose | 8.23E-02±7.6E-03a | 8.26E-02±7.0E-03a | 0.27±0.04b | 0.18±0.02c | 0.12±8.7E-03d | 9.36E-02±6.6E-03a,d
Trehalose | 7.93±0.32a,b | 7.38±0.33b | 8.70±0.26a,c | 8.33±0.30a,c | 9.15±0.37c,d | 9.92±0.32d
Glycerol | 2.69E-02±2.0E-03a | 2.70E-02±2.2E-03a | 4.11E-02±4.9E-03b | 3.08E-02±1.4E-03a | 0.18±6.5E-03c | 0.18±8.5E-03c
Erythritol | 8.46E-02±2.8E-03a | 8.50E-02±3.5E-03a | 9.60E-02±3.9E-03a | 8.98E-02±3.7E-03a | 0.14±5.9E-03b | 0.14±6.8E-03b
Xylitol | 6.62E-03±1.7E-04a | 6.08E-03±2.5E-04a | 1.21E-02±1.4E-03b | 8.32E-03±3.53E-04c | 2.32E-02±1.0E-03d | 2.44E-02±2.1E-03d
Arabitol | 1.99E-02±6.5E-04a | 1.98E-02±8.0E-04a | 2.33E-02±8.7E-04b,c | 2.11E-02±9.8E-04a,c | 2.58E-02±1.1E-03d | 2.80E-02±1.3E-03d
Ribitol | 3.10E-03±1.5E-04a | 3.09E-03±1.4E-04a | 4.20E-03±2.3E-04b | 3.74E-03±1.8E-04b | 1.47E-02±7.1E-04c | 1.52E-02±1.2E-03c
Galactitol | 5.14E-02±3.2E-03a | 5.21E-02±3.63E-03a | 6.90E-02±4.0E-03b | 5.74E-02±2.4E-03a,b | 9.70E-02±6.2E-03c | 9.52E-02±5.5E-03c
Inositol | 0.25±0.01a,b | 0.24±0.01b | 0.33±0.01c | 0.28±0.01a | 0.53±0.02d | 0.57±0.03d
Sorbitol | 6.26E-03±5.8E-04a | 5.65E-03±4.1E-04a | 3.88E-02±7.4E-03b | 2.10E-02±1.2E-03c | 0.61±0.07d | 0.46±0.06d
Glucose-6-phosphate | 0.25±7.4E-03a | 0.37±0.01b | 0.50±0.04c | 0.41±0.01b,d | 0.39±5.5E-03b | 0.44±8.5E-03c,d
Fructose-6-phosphate | 0.17±2.8E-03a | 0.19±4.3E-03b | 0.21±5.8E-03c | 0.20±2.3E-03b | 0.20±2.2E-03b | 0.20±3.4E-03b
Citrate | 3.87±0.15a | 3.67±0.17a | 4.17±0.18a | 4.02±0.15a | 1.20±0.12b | 1.20±0.09b
Succinate | 1.19±0.04a,b | 1.06±0.05b | 1.43±0.05c | 1.24±0.05a | 1.22±0.08a,b | 1.36±0.06a,c
Fumarate | 0.80±0.03a,b | 0.75±0.03b | 0.96±0.03c | 0.89±0.03a,c | 1.49±0.06d | 1.58±0.05d
Malate | 3.22±0.07a | 3.21±0.12a | 3.70±0.12b | 3.48±0.11a,b | 7.51±0.28c | 7.91±0.23c
Glycerol-3-phosphate | 6.00±0.18a,b | 5.74±0.20b | 7.11±0.21c | 6.60±0.22a,c | 8.05±0.28d | 9.18±0.32d
Phosphate | 1.46±0.07a | 1.46±0.08a | 1.77±0.08b | 1.52±0.08a | 2.54±0.13c | 2.75±0.14c
Putrescine | 2.21E-02±2.5E-03a,b | 1.81E-02±3.0E-03b | 2.63E-02±2.4E-03a | 2.27E-02±3.3E-03a,b | 2.83E-02±3.6E-03a | 4.34E-02±4.1E-03c
Cadaverine | 2.55E-02±3.1E-03a,b | 2.19E-02±4.0E-03b | 2.97E-02±2.8E-03a | 2.50E-02±3.2E-03a,b | 3.16E-02±3.4E-03a | 5.42E-02±5.2E-03c
Glucono-delta-lactone | 0.31±0.02a,b | 0.28±0.02b | 0.42±0.03c | 0.36±0.03a,c | 0.94±0.05d | 0.78±0.06d
Total amino acids | 14.00±0.67a | 13.84±0.64a | 16.87±0.45b | 15.39±0.62a,b | 20.81±0.87c | 21.25±0.76c
Total sugars | 8.59±0.32a | 8.12±0.24a | 11.87±0.71b | 10.10±0.37c | 15.96±0.99d | 14.50±0.48d
Total polyols | 0.45±0.02a,b | 0.43±0.02b | 0.61±0.03c | 0.51±0.02a | 1.61±0.10d | 1.52±0.10d


Table D.8 Metabolite pathway enrichment analysis. Metabolite pathway enrichment analysis of the CS+2R, RCH+CS+2R, CS+24R, and RCH+CS+24R treatments relative to control. Pairwise metabolite pathway enrichment analysis was conducted on the log2 metabolite contents for each comparison. Only significantly enriched pathways (FDR<0.05) are included in the table.
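The FDR<0.05 cutoff used for Table D.8 can be illustrated with a minimal sketch of a false discovery rate adjustment. It assumes the Benjamini-Hochberg step-up procedure; this section does not state which FDR method the enrichment software applied, so the choice is an assumption, and the p-values and the function name `benjamini_hochberg` are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only for illustration; the method behind
    the FDR column of Table D.8 is not confirmed by this section.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four pathways.
raw = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw)
significant = [q < 0.05 for q in fdr]
print(fdr, significant)
```

Under this sketch, a pathway would be listed in the table only if its adjusted value falls below 0.05.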


Table D.8

Comparison | Pathway | Represented metabolites (#) | Impact | FDR
Control v. CS+2R | Fructose and Mannose metabolism | 4 | 1.58E-01 | 6.53E-08
Control v. CS+2R | Galactose metabolism | 8 | 9.34E-02 | 1.17E-07
Control v. CS+2R | Starch and sucrose metabolism | 6 | 8.77E-02 | 1.77E-07
Control v. CS+2R | Amino sugar and nucleotide sugar metabolism | 4 | 1.53E-01 | 1.11E-06
Control v. CS+2R | Glycolysis or gluconeogenesis | 3 | 1.43E-01 | 1.82E-06
Control v. CS+2R | Pentose phosphate pathway | 5 | 8.99E-02 | 1.82E-06
Control v. CS+2R | Pentose and glucuronate interconversions | 3 | 3.19E-02 | 1.06E-04
Control v. CS+2R | Primary bile acid biosynthesis | 1 | 8.22E-03 | 3.96E-04
Control v. CS+2R | Purine metabolism | 1 | 0.00E+00 | 3.96E-04
Control v. CS+2R | Thiamine metabolism | 1 | 0.00E+00 | 3.96E-04
Control v. CS+2R | Glycine, serine, and threonine metabolism | 3 | 4.20E-01 | 7.21E-04
Control v. CS+2R | Inositol phosphate metabolism | 1 | 1.37E-01 | 1.04E-03
Control v. CS+2R | Ascorbate and aldarate metabolism | 1 | 0.00E+00 | 1.04E-03
Control v. CS+2R | Methane metabolism | 2 | 1.75E-02 | 1.17E-03
Control v. CS+2R | Cyanoamino acid metabolism | 2 | 0.00E+00 | 1.17E-03
Control v. CS+2R | Riboflavin metabolism | 1 | 0.00E+00 | 1.18E-03
Control v. CS+2R | Tyrosine metabolism | 2 | 0.00E+00 | 1.49E-03
Control v. CS+2R | Valine, leucine, and isoleucine biosynthesis | 4 | 3.98E-02 | 1.64E-03
Control v. CS+2R | Nicotinate and nicotinamide metabolism | 1 | 0.00E+00 | 1.65E-03
Control v. CS+2R | Sphingolipid metabolism | 1 | 0.00E+00 | 1.99E-03
Control v. CS+2R | Sulphur metabolism | 1 | 0.00E+00 | 1.99E-03
Control v. CS+2R | Porphyrin and chlorophyll metabolism | 3 | 0.00E+00 | 2.12E-03
Control v. CS+2R | Glycerophospholipid metabolism | 1 | 3.26E-02 | 2.12E-03
Control v. CS+2R | Aminoacyl tRNA biosynthesis | 10 | 1.13E-01 | 2.63E-03
Control v. CS+2R | Propanoate metabolism | 2 | 1.34E-03 | 2.81E-03
Control v. CS+2R | Valine, leucine, and isoleucine degradation | 3 | 2.23E-02 | 3.79E-03
Control v. CS+2R | Citrate cycle (TCA cycle) | 4 | 1.38E-01 | 4.48E-03
Control v. CS+2R | Phenylalanine metabolism | 3 | 1.19E-01 | 5.66E-03
Control v. CS+2R | Pyruvate metabolism | 1 | 0.00E+00 | 6.23E-03
Control v. CS+2R | Glycerolipid metabolism | 2 | 1.96E-01 | 7.00E-03
Control v. CS+2R | Glyoxylate and dicarboxylate metabolism | 3 | 2.75E-02 | 9.34E-03
Control v. CS+2R | Pantothenate and CoA biosynthesis | 1 | 0.00E+00 | 1.08E-02
Control v. CS+2R | Butanoate metabolism | 3 | 3.55E-02 | 1.46E-02
Control v. CS+2R | Cysteine and methionine metabolism | 2 | 1.20E-02 | 1.89E-02
Control v. CS+2R | Alanine, aspartate, and glutamate metabolism | 4 | 2.36E-01 | 2.73E-02


Table D.8 continued

Control v. CS+2R | Nitrogen metabolism | 3 | 0.00E+00 | 2.83E-02
Control v. CS+2R | Phenylalanine, tyrosine, and tryptophan metabolism | 1 | 6.20E-04 | 4.62E-02
Control v. RCH+CS+2R | Fructose and Mannose metabolism | 4 | 1.58E-01 | 1.49E-11
Control v. RCH+CS+2R | Galactose metabolism | 8 | 9.34E-02 | 3.31E-11
Control v. RCH+CS+2R | Starch and sucrose metabolism | 6 | 8.77E-02 | 1.53E-09
Control v. RCH+CS+2R | Amino sugar and nucleotide sugar metabolism | 4 | 1.53E-01 | 5.73E-09
Control v. RCH+CS+2R | Glycolysis or gluconeogenesis | 3 | 1.43E-01 | 6.22E-08
Control v. RCH+CS+2R | Pentose phosphate pathway | 5 | 8.99E-02 | 9.28E-08
Control v. RCH+CS+2R | Pentose and glucuronate interconversions | 3 | 3.19E-02 | 2.93E-02
Control v. CS+24R | Fructose and Mannose metabolism | 4 | 1.58E-01 | 4.06E-16
Control v. CS+24R | Galactose metabolism | 8 | 9.34E-02 | 4.06E-16
Control v. CS+24R | Citrate cycle (TCA cycle) | 4 | 1.38E-01 | 1.56E-15
Control v. CS+24R | Starch and sucrose metabolism | 6 | 8.77E-02 | 5.34E-15
Control v. CS+24R | Pentose and glucuronate interconversions | 3 | 3.19E-02 | 1.44E-14
Control v. CS+24R | Riboflavin metabolism | 1 | 0.00E+00 | 3.22E-14
Control v. CS+24R | Glycerolipid metabolism | 2 | 1.96E-01 | 8.98E-14
Control v. CS+24R | Amino sugar and nucleotide sugar metabolism | 4 | 1.53E-01 | 1.41E-13
Control v. CS+24R | Glyoxylate and dicarboxylate metabolism | 3 | 2.75E-02 | 4.77E-13
Control v. CS+24R | Butanoate metabolism | 3 | 3.55E-02 | 5.28E-13
Control v. CS+24R | Pentose phosphate pathway | 5 | 8.99E-02 | 7.01E-13
Control v. CS+24R | Pyruvate metabolism | 1 | 0.00E+00 | 7.01E-13
Control v. CS+24R | Glycolysis or gluconeogenesis | 3 | 1.43E-01 | 7.01E-13
Control v. CS+24R | Alanine, aspartate, and glutamate metabolism | 4 | 2.36E-01 | 1.02E-12
Control v. CS+24R | Porphyrin and chlorophyll metabolism | 3 | 0.00E+00 | 5.22E-12
Control v. CS+24R | Taurine and hypotaurine metabolism | 1 | 3.24E-02 | 2.36E-11
Control v. CS+24R | Selenoamino acid metabolism | 1 | 0.00E+00 | 2.36E-11
Control v. CS+24R | Aminoacyl tRNA biosynthesis | 10 | 1.13E-01 | 2.85E-10
Control v. CS+24R | Inositol phosphate metabolism | 1 | 1.37E-01 | 4.16E-10
Control v. CS+24R | Ascorbate and aldarate metabolism | 1 | 0.00E+00 | 4.16E-10
Control v. CS+24R | Cysteine and methionine metabolism | 2 | 1.20E-02 | 4.52E-10
Control v. CS+24R | Valine, leucine, and isoleucine biosynthesis | 4 | 3.98E-02 | 4.53E-10
Control v. CS+24R | Nicotinate and nicotinamide metabolism | 1 | 0.00E+00 | 1.20E-09
Control v. CS+24R | Glycine, serine, and threonine metabolism | 3 | 4.20E-01 | 1.90E-09
Control v. CS+24R | Nitrogen metabolism | 3 | 0.00E+00 | 3.48E-09


Table D.8 continued

Control v. CS+24R | Tyrosine metabolism | 2 | 0.00E+00 | 3.75E-09
Control v. CS+24R | Valine, leucine, and isoleucine degradation | 3 | 2.23E-02 | 4.33E-09
Control v. CS+24R | Phenylalanine metabolism | 3 | 1.19E-01 | 2.53E-08
Control v. CS+24R | Primary bile acid biosynthesis | 1 | 8.22E-03 | 2.96E-08
Control v. CS+24R | Purine metabolism | 1 | 0.00E+00 | 2.96E-08
Control v. CS+24R | Thiamine metabolism | 1 | 0.00E+00 | 2.96E-08
Control v. CS+24R | Methane metabolism | 2 | 1.75E-02 | 7.94E-08
Control v. CS+24R | Cyanoamino acid metabolism | 2 | 0.00E+00 | 7.94E-08
Control v. CS+24R | Phenylalanine, tyrosine, and tryptophan metabolism | 1 | 6.20E-04 | 3.02E-07
Control v. CS+24R | Sphingolipid metabolism | 1 | 0.00E+00 | 3.20E-07
Control v. CS+24R | Sulphur metabolism | 1 | 0.00E+00 | 3.20E-07
Control v. CS+24R | Pantothenate and CoA biosynthesis | 1 | 0.00E+00 | 1.04E-06
Control v. CS+24R | Propanoate metabolism | 2 | 1.34E-03 | 2.34E-06
Control v. CS+24R | Arginine and proline metabolism | 4 | 1.90E-01 | 6.36E-06
Control v. CS+24R | Glycerophospholipid metabolism | 1 | 3.26E-02 | 7.66E-06
Control v. CS+24R | Lysine degradation | 2 | 1.46E-02 | 2.24E-03
Control v. CS+24R | Glutathione metabolism | 4 | 2.81E-02 | 1.38E-02
Control v. RCH+CS+24R | Citrate cycle (TCA cycle) | 4 | 1.38E-01 | 5.68E-17
Control v. RCH+CS+24R | Glyoxylate and dicarboxylate metabolism | 3 | 2.75E-02 | 2.40E-15
Control v. RCH+CS+24R | Pyruvate metabolism | 1 | 0.00E+00 | 4.80E-14
Control v. RCH+CS+24R | Galactose metabolism | 8 | 9.34E-02 | 5.67E-14
Control v. RCH+CS+24R | Fructose and Mannose metabolism | 4 | 1.58E-01 | 6.46E-14
Control v. RCH+CS+24R | Glycerolipid metabolism | 2 | 1.96E-01 | 2.28E-13
Control v. RCH+CS+24R | Starch and sucrose metabolism | 6 | 8.77E-02 | 5.65E-13
Control v. RCH+CS+24R | Butanoate metabolism | 3 | 3.55E-02 | 1.52E-12
Control v. RCH+CS+24R | Riboflavin metabolism | 1 | 0.00E+00 | 3.99E-12
Control v. RCH+CS+24R | Porphyrin and chlorophyll metabolism | 3 | 0.00E+00 | 5.31E-12
Control v. RCH+CS+24R | Amino sugar and nucleotide sugar metabolism | 4 | 1.53E-01 | 5.31E-12
Control v. RCH+CS+24R | Pentose and glucuronate interconversions | 3 | 3.19E-02 | 1.83E-11
Control v. RCH+CS+24R | Nicotinate and nicotinamide metabolism | 1 | 0.00E+00 | 3.61E-11
Control v. RCH+CS+24R | Pentose phosphate pathway | 5 | 8.99E-02 | 7.03E-11
Control v. RCH+CS+24R | Glycolysis or gluconeogenesis | 3 | 1.43E-01 | 1.01E-10
Control v. RCH+CS+24R | Valine, leucine, and isoleucine biosynthesis | 4 | 3.98E-02 | 1.92E-10
Control v. RCH+CS+24R | Tyrosine metabolism | 2 | 0.00E+00 | 2.26E-10


Table D.8 continued

Control v. RCH+CS+24R | Aminoacyl tRNA biosynthesis | 10 | 1.13E-01 | 3.96E-10
Control v. RCH+CS+24R | Inositol phosphate metabolism | 1 | 1.37E-01 | 4.23E-10
Control v. RCH+CS+24R | Ascorbate and aldarate metabolism | 1 | 0.00E+00 | 4.23E-10
Control v. RCH+CS+24R | Glycine, serine, and threonine metabolism | 3 | 4.20E-01 | 5.50E-10
Control v. RCH+CS+24R | Valine, leucine, and isoleucine degradation | 3 | 2.23E-02 | 1.67E-09
Control v. RCH+CS+24R | Alanine, aspartate, and glutamate metabolism | 4 | 2.36E-01 | 1.98E-09
Control v. RCH+CS+24R | Nitrogen metabolism | 3 | 0.00E+00 | 3.11E-09
Control v. RCH+CS+24R | Primary bile acid biosynthesis | 1 | 8.22E-03 | 5.89E-09
Control v. RCH+CS+24R | Purine metabolism | 1 | 0.00E+00 | 5.89E-09
Control v. RCH+CS+24R | Thiamine metabolism | 1 | 0.00E+00 | 5.89E-09
Control v. RCH+CS+24R | Phenylalanine metabolism | 3 | 1.19E-01 | 6.26E-09
Control v. RCH+CS+24R | Methane metabolism | 2 | 1.75E-02 | 3.21E-08
Control v. RCH+CS+24R | Cyanoamino acid metabolism | 2 | 0.00E+00 | 3.21E-08
Control v. RCH+CS+24R | Glycerophospholipid metabolism | 1 | 3.26E-02 | 5.21E-08
Control v. RCH+CS+24R | Cysteine and methionine metabolism | 2 | 1.20E-02 | 1.00E-07
Control v. RCH+CS+24R | Sphingolipid metabolism | 1 | 0.00E+00 | 1.34E-07
Control v. RCH+CS+24R | Sulphur metabolism | 1 | 0.00E+00 | 1.34E-07
Control v. RCH+CS+24R | Pantothenate and CoA biosynthesis | 1 | 0.00E+00 | 2.88E-07
Control v. RCH+CS+24R | Propanoate metabolism | 2 | 1.34E-03 | 3.72E-07
Control v. RCH+CS+24R | Arginine and proline metabolism | 4 | 1.90E-01 | 4.12E-07
Control v. RCH+CS+24R | Phenylalanine, tyrosine, and tryptophan metabolism | 1 | 6.20E-04 | 4.89E-07
Control v. RCH+CS+24R | Taurine and hypotaurine metabolism | 1 | 3.24E-02 | 6.08E-07
Control v. RCH+CS+24R | Selenoamino acid metabolism | 1 | 0.00E+00 | 6.08E-07
Control v. RCH+CS+24R | Lysine degradation | 2 | 1.46E-02 | 1.01E-05
Control v. RCH+CS+24R | Glutathione metabolism | 4 | 2.81E-02 | 2.65E-05


References

Abdrakhamanova, A., Wang, Q. Y., Khokhlova, L. & Nick, P. Is microtubule

disassembly a trigger for cold acclimation? Plant Cell Physiol 44, 676–686,

(2003).

Adams, M. D. et al. The genome sequence of Drosophila melanogaster. Science 287,

2185–2195, (2000).

Adedokun, T. A. & Denlinger, D. L. Cold-hardiness - a component of the diapause

syndrome in pupae of the flesh flies, Sarcophaga crassipalpis and Sarcophaga

bullata. Physiol Entomol 9, 361–364, (1984).

Alexeev, D. et al. Application of Spiroplasma melliferum proteogenomic profiling for the

discovery of virulence factors and pathogenicity mechanisms in host-associated

spiroplasmas. J Proteome Res 11, 224-236, (2012).

Al-Fageeh, M. B. & Smales, C. M. Control and regulation of the cellular responses to

cold shock: the responses in yeast and mammalian systems. Biochem J 397, 247–

259, (2006).

Allegrucci, G., Carchini, G., Convey, P. & Sbordoni, V. Evolutionary geographic

relationships among orthocladine chironomid midges from maritime Antarctic

and sub-Antarctic islands. Biol J Linn Soc 106, 258-274, (2012).

Allegrucci, G., Carchini, G., Todisco, V., Convey, P. & Sbordoni, V. A molecular

phylogeny of Antarctic Chironomidae and its implications for biogeographical

history. Polar Biol. 29, 320–326, (2006).

Anders, S. & Huber, W. Differential expression analysis for sequence count data.

Genome Biol 11, R106, (2010).

Anstead, C. A. et al. Lucilia cuprina genome unlocks parasitic fly biology to underpin

future interventions. Nat Commun 6, 1-11, (2015).

Arbouzova, N. I. & Zeidler, M. P. JAK/STAT signalling in Drosophila: insights into

conserved regulatory and cellular functions. Development 133, 2605–2616,

(2006).

Arensburger, P. et al. Sequencing of Culex quinquefasciatus establishes a platform for

mosquito comparative genomics. Science 330, 86–88, (2010).

Armstrong, G. A., Esteban, C., Rodríguez, R. & Robertson, M. Cold hardening

modulates K+ homeostasis in the brain of Drosophila melanogaster during chill

coma. J Insect Physiol 52, 1511-1516, (2012).

Ashburner, M. et al. Gene Ontology C. Gene Ontology: tool for the unification of

biology. Nat Genet 25, 25–29, (2000).

Atchley, W. R. & Davis, B. L. Chromosomal variability in the Antarctic insect, Belgica

antarctica (Diptera: Chironomidae). Ann. Entomol. Soc. Am. 72, 246–252,

(1979).

Bahrndorff, S., Petersen, S. O., Loeschcke, V., Overgaard, J. & Holmstrup, M.

Difference in cold and drought tolerance of high arctic and sub-arctic populations

of Megaphorura arctica Tullberg 1876 (Onychiuridae: Collembola). Cryobiology

55, 315-323, (2007).

Baker, D. A. & Russell, S. Gene expression during Drosophila melanogaster egg

development before and after reproductive diapause. BMC Genomics 10, (2009).

Bale, J. S. & Hayward, S. A. L. Insect overwintering in a changing climate. J Exp Biol

213, 980–994, (2010).

Bale, J. S. Cold hardiness and overwintering of insects. Agric Zool Rev 3, 157, (1989).

Barnett, D. W., Garrison, E. K., Quinlan, A. R., Stromberg, M. P. & Marth, G. T.

BamTools: a C++ API and toolkit for analyzing and managing BAM files.

Bioinformatics 27, 1691–1692, (2011).

Barthel, A. & Schmoll, D. Novel concepts in insulin regulation of hepatic

gluconeogenesis. Am J Physiol Endocrinol Metab 285, E685–E692, (2003).

Baust, J. G. & Edwards, J. S. Mechanisms of freezing tolerance in Antarctic midge,

Belgica antarctica. Physiol Entomol 4, 1-5, (1979).

Baust, J. G. & Lee, R. E. Multiple stress tolerance in an Antarctic terrestrial arthropod:

Belgica antarctica. Cryobiology 24, 140-147, (1987).

Beckenbach, A. T. Mitochondrial genome sequences of Nematocera (lower Diptera):

evidence of rearrangement following a complete genome duplication in a winter

crane fly. Genome Biol Evol 4, 89-101, (2012).

Beckstead, R. B., Lam, G. & Thummel, C. S. The genomic response to 20-

hydroxyecdysone at the onset of Drosophila metamorphosis. Genome Biol 6,

(2005).


Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate – a practical and

powerful approach to multiple testing. J R Stat Soc Ser B Stat Methodol 57, 289–

300, (1995).

Benoit, J. B. et al. Mechanisms to reduce dehydration stress in larvae of the Antarctic

midge, Belgica antarctica. J Insect Physiol 53, 656–667, (2007).

Benoit, J. B., Lopez-Martinez, G., Phillips, Z. P., Patrick, K. R. & Denlinger, D. L.

Heat shock proteins contribute to mosquito dehydration tolerance. J Insect

Physiol 56, 151–156, (2010).

Biemont, C. Genome size evolution: within-species variation in genome size. Heredity

(Edinb) 101, 297–298, (2008).

Bradnam, K. R. et al. Assemblathon 2: evaluating de novo methods of genome

assembly in three vertebrate species. GigaScience 2, 1-31, (2013).

Brennecke, J. et al. Discrete small RNA-generating loci as master regulators of

transposon activity in Drosophila. Cell 128, 1089–1103, (2007).

Buckley, B. A., Place, S. P. & Hofmann, G. E. Regulation of heat shock genes in

isolated hepatocytes from an Antarctic fish, Trematomus bernacchii. J. Exp. Biol.

207, 3649–3656, (2004).

Burge, S. et al. Manual GO annotation of predictive protein signatures: the InterPro

approach to GO curation. Database (Oxford) 2012, bar068 (2012).

Bylemans, D. et al. Sequencing and characterization of trypsin modulating oostatic

factor (TMOF) from the ovaries of the grey fleshfly, Neobellieria (Sarcophaga)

bullata 50, 61-72, (1994).


Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging

model organism genomes. Genome Res. 18, 188–196, (2008).

Cha, G. H. et al. Discrete functions of TRAF1 and TRAF2 in Drosophila melanogaster

mediated by c-Jun N-terminal kinase and NF-kappa B-dependent signaling

pathways. Mol Cell Biol 23, 7982–7991, (2003).

Chen, C. P. & Denlinger, D. L. Reduction of cold injury in flies using an intermittent

pulse of high temperature. Cryobiology 29, 138-143, (1992).

Chen, C. P., Denlinger, D. L. & Lee, R. E. Cold-shock injury and rapid cold hardening

in the flesh fly Sarcophaga crassipalpis. Physiol Zool 60, 297-304, (1987).

Chen, C. P., Denlinger, D. L. & Lee, R. E. Responses of nondiapausing flesh flies

(Diptera: Sarcophagidae) to low rearing temperatures: developmental rate, cold

tolerance, and glycerol concentrations. Ann Entomol Soc Am 80, 790-796, (1987).

Chen, C. P., Denlinger, D. L. & Lee, R. E. Seasonal variation in generation time,

diapause and cold hardiness in a central Ohio population of the flesh fly,

Sarcophaga bullata. Ecol Entomol 16, 155–162, (1991).

Chen, C. P., Lee, R. E. & Denlinger, D. L. A comparison of the responses of tropical

and temperate flies (Diptera: Sarcophagidae) to cold and heat stress. J Comp

Physiol B 160, 543-547, (1990).

Chown, S. L. Respiratory water loss in insects. Comp Biochem Physiol A Mol Integr

Physiol 133, 791–804, (2002).

Clark, A. G. et al. Evolution of genes and genomes on the Drosophila phylogeny.

Nature 450, 203-218, (2007).


Clark, M. S. et al. Surviving the cold: molecular analyses of insect cryoprotective

dehydration in the Arctic springtail Megaphorura arctica (Tullberg). BMC

Genomics 10, 328, (2009).

Colinet, H., Lee, S. F. & Hoffmann, A. Knocking down expression of Hsp22 and Hsp23

by RNA interference affects recovery from chill coma in Drosophila

melanogaster. J Exp Biol 213, 4146–4150, (2010).

Colinet, H., An Nguyen, T. T., Cloutier, C., Michaud, D. & Hance, T. Proteomic

profiling of a parasitic wasp exposed to constant and fluctuating cold exposure.

Insect Biochem Mol Biol 37, 1177–1188, (2007).

Colinet, H., Larvor, V., Laparie, M. & Renault, D. Exploring the plastic response to

cold acclimation through metabolomics. Funct Ecol 26, 711–722, (2012).

Conesa, A. & Gotz, S. Blast2GO: a comprehensive suite for functional analysis in plant

genomics. Int. J. Plant Genomics 2008, 619832 (2008).

Conesa, A. et al. Blast2GO: A universal tool for annotation, visualization and analysis in

functional genomics research. Bioinformatics 21, 3674–3676, (2005).

The UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource

(UniProt). Nucleic Acids Res. 40, D71–D75 (2012).

Convey, P. & Block, W. Antarctic diptera: ecology, physiology and distribution. Eur. J.

Entomol. 93, 1–13, (1996).

Cornette, R. et al. Identification of anhydrobiosis-related genes from an expressed

sequence tag database in the cryptobiotic midge Polypedilum vanderplanki

(Diptera; Chironomidae). J Biol Chem 285, 35889–35899, (2010).


Cossins, A. R. The adaptation of membrane structure and function to changes in

temperature. In: Cellular Acclimatisation to Environmental Change, edited by

Cossins AR, Sheterline P. Cambridge: Cambridge University Press, 1983, p. 3–

31.

Cridland, J. M., Macdonald, S. J., Long, A. D. & Thornton, K. R. Abundance and

distribution of transposable elements in two Drosophila QTL mapping resources.

Mol. Biol. Evol. 30, 2311–2327, (2013).

Danneels, E. L. et al. Early changes in the pupal transcriptome of the flesh fly

Sarcophaga crassipalpis to parasitization by the ectoparasitic wasp, Nasonia

vitripennis. Insect Biochem Mol Biol 43, 1189-1200, (2013).

Darling, A. C., Mau, B., Blattner, F. R. & Perna, N. T. Mauve: multiple alignment of

conserved genomic sequence with rearrangements. Genome Res 14, 1394-1403,

(2004).

Denlinger, D. L. Induction and termination of pupal diapause in Sarcophaga (Diptera -

Sarcophagidae). Biol Bull 142, 11-&, (1972).

Denlinger, D. L. & Žđárek, J. Metamorphosis behavior of flies. Annu Rev Entomol 39,

243-266, (1994).

Denlinger, D. L. Regulation of diapause. Annu Rev Entomol 47, 93-122, (2002).

DePristo, M. A. et al. A framework for variation discovery and genotyping using next-

generation DNA sequencing data. Nat. Genet. 43, 491–498, (2011).


Desjardins, C. A., Perfectti, F., Bartos, J. D., Enders, L. S. & Werren, J. H. The

genetic basis of interspecies host preference difference in the model parasitoid

Nasonia. Heredity 104, 270-277, (2010).

Detrich, H. W., Johnson, K. A. & Marchese-Ragona, S. P. Polymerization of Antarctic

fish tubulins at low-temperatures - energetic aspects. Biochemistry (Mosc) 28,

10085–10093, (1989).

Dollo, V. H., Yi, S. X. & Lee, R. E. High temperature pulses decrease indirect chilling

injury and elevate ATP levels in the flesh fly, Sarcophaga crassipalpis.

Cryobiology 60, 351-353, (2010).

Duman, J. G. The inhibition of ice nucleators by insect antifreeze proteins is enhanced

by glycerol and citrate. J Comp Physiol B 172, 163-168, (2002).

Efron, B. & Tibshirani, R. On testing the significance of sets of genes. Ann Appl Stat 1,

107–129, (2007).

Elnitsky, M. A., Benoit, J. B., Denlinger, D. L. & Lee, R. E. Desiccation tolerance

and drought acclimation in the Antarctic collembolan Cryptopygus antarcticus. J

Insect Physiol 54, 1432-1439, (2008).

Elnitsky, M. A., Benoit, J. B., Lopez-Martinez, G., Denlinger, D. L. & Lee, R. E.

Osmoregulation and salinity tolerance in the Antarctic midge, Belgica antarctica:

seawater exposure confers enhanced tolerance to freezing and dehydration. J Exp

Biol 212, 2864–2871, (2009).


Elnitsky, M. A., Hayward, S. A. L., Rinehart, J. P., Denlinger, D. L. & Lee, R. E.

Cryoprotective dehydration and the resistance to inoculative freezing in the

Antarctic midge, Belgica antarctica. J Exp Biol 211, 524–530, (2008).

English, A. C. et al. Mind the gap: upgrading genomes with Pacific Biosciences RS

long-read sequencing technology. PLoS ONE 7, e47768 (2012).

Feder, M. E. & Hofmann, G. E. Heat-shock proteins, molecular chaperones, and the

stress response: evolutionary and ecological physiology. Annu Rev Physiol 61,

243–282, (1999).

Feder, M. E. & Walser, J. C. The biological limitations of transcriptomics in

elucidating stress and stress responses. J Evol Biol 18, 901–910, (2005).

Fisher, B. et al. BDGP insitu homepage

http://insitu.fruitfly.org/cgi-bin/ex/insitu.pl (2012)

Fiston-Lavier, A. S., Carrigan, M., Petrov, D. A. & Gonzalez, J. T-lex: a program for

fast and accurate assessment of transposable element presence using next-

generation sequencing data. Nucleic Acids Res. 39, e36 (2011).

Fiston-Lavier, A. S., Vejnar, C. E. & Quesneville, H. Transposable sequence evolution

is driven by gene context. Preprint at arXiv:1209.0176 [q-bio.GN] (2012).

Fraenkel, G. & Hsiao, C. Bursicon, a hormone which mediates tanning of the cuticle in

the adult fly and other insects. J Insect Physiol 11, 513-556, (1965).

Francesconi, F. & Lupi, O. Myiasis. Clin Microbiol Rev 25, 79-105, (2012).


Fujiwara, Y. & Denlinger, D. L. p38 MAPK is a likely component of the signal

transduction pathway triggering rapid cold hardening in the flesh fly Sarcophaga

crassipalpis. J Exp Biol 210, 3295–3300, (2007).

Gibbs, A. G. Water-proofing properties of cuticular lipids. Am Zool 38, 471–482, (1998).

Gibbs, A. G., Chippindale, A. K. & Rose, M. R. Physiological mechanisms of evolved

desiccation resistance in Drosophila melanogaster. J Exp Biol 200, 1821–1832,

(1997).

Girardot, F., Monnier, V. & Tricoire, H. Genome wide analysis of common and specific

stress responses in adult Drosophila melanogaster. BMC Genomics 5, 16, (2004).

Goldberg, A. L. Protein degradation and protection against misfolded or damaged

proteins. Nature 426, 895–899, (2003).

Gorski, S. M. et al. A SAGE approach to discovery of genes involved in autophagic cell

death. Curr Biol 13, 358–363, (2003).

Goto, S. G. et al. Functional characterization of an aquaporin in the Antarctic midge

Belgica antarctica. J Insect Physiol 57, 1106–1114, (2011).

Graczyk, T. K., Knight, R., Gilman, R. H. & Cranfield, M. R. The role of non-biting

flies in the epidemiology of human infectious diseases. Microbes Infect 3, 231-

235, (2001).

Haag-Liautard, C. et al. Direct estimation of per nucleotide and genomic deleterious

mutation rates in Drosophila. Nature 445, 82–85, (2007).

Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the

Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512,

(2013).

Hahn, D. A., Ragland, G. J., Shoemaker, D. D. & Denlinger, D. L. Gene discovery

using massively parallel pyrosequencing to develop ESTs for the flesh fly

Sarcophaga crassipalpis. BMC Genomics 10, (2009).

Han, Y. S. et al. Cloning and characterization of serine protease from the human malaria

vector, Anopheles gambiae. Insect Mol Biol. 6, 385-395, (1997).

Hand, S. C., Menze, M. A., Toner, M., Boswell, L. & Moore, D. LEA proteins during

water stress: Not just for plants anymore. Annu Rev Physiol 73, 115–134, (2011).

Hanson, R. W. & Reshef, L. Regulation of phosphoenolpyruvate carboxykinase (GTP)

gene expression. Annu Rev Biochem 66, 581–611, (1997).

Hare, E. E. & Johnston, J. S. Genome size determination using flow cytometry of

propidium iodide-stained nuclei. Methods Mol. Biol. 772, 3–12 (2011).

Hayward, S. A. L., Rinehart, J. P., Sandro, L. H., Lee, R. E. & Denlinger, D. L. Slow

dehydration promotes desiccation and freeze tolerance in the Antarctic midge

Belgica antarctica. J Exp Biol 210, 836–844, (2007).

He, C. C. & Klionsky, D. J. Regulation mechanisms and signaling pathways of

autophagy. Annu Rev Genet 43, 67–93, (2009).

Henrich, V. C. & Denlinger, D. L. A maternal effect that eliminates pupal diapause in

progeny of the flesh fly, Sarcophaga bullata. J Insect Physiol 28, 881-884,

(1982).

Hessen, D. O., Daufresne, M. & Leinaas, H. P. Temperature-size relations from the

cellular-genomic perspective. Biol. Rev. Camb. Philos. Soc. 88, 476–489, (2013).

Hodkinson, I. D. et al. Feeding studies on Onychiurus arcticus (Tullberg) (Collembola,

Onychiuridae) on West Spitsbergen. Polar Biol 14, 17–19, (1994).

Hoffmann, A. A. Physiological climatic limits in Drosophila: patterns and implications.

J Exp Biol 213, 870–880, (2010).

Hofmann, G. E., Buckley, B. A., Airaksinen, S., Keen, J. E. & Somero, G. N. Heat-

shock protein expression is absent in the antarctic fish Trematomus bernacchii

(family Nototheniidae). J. Exp. Biol. 203, 2331–2339, (2000).

Holmstrup, M. Physiology of cold hardiness in cocoons of five earthworm taxa

(Lumbricidae: Oligochaeta). J Comp Physiol B 164, 222-228, (1994).

Holmstrup, M. The ins and out of water dynamics in cold tolerant soil invertebrates. J

Therm Biol 45, 117-123, (2014).

Holmstrup, M., Bayley, M. & Ramløv, H. Supercool or dehydrate? An experimental

analysis of overwintering strategies in small permeable arctic invertebrates. Proc

Natl Acad Sci USA 99, 5716–5720, (2002).

Holt, R. A. et al. The genome sequence of the malaria mosquito Anopheles gambiae.

Science 298, 129–149, (2002).

Huang, D. W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools:

paths toward the comprehensive functional analysis of large gene lists. Nucleic

Acids Res 37, 1–13, (2009).

Hunt, M. et al. REAPR: a universal tool for genome assembly evaluation. Genome Biol

14, R47, (2013).

Ibarra-Laclette, E. et al. Architecture and evolution of a minute plant genome. Nature

498, 94–98, (2013).

Jakubczak, J. L., Burke, W. D. & Eickbush, T. H. Retrotransposable elements R1 and

R2 interrupt the rRNA genes of most insects. Proc. Natl Acad. Sci. USA 88,

3295–3299, (1991).

Johnston, J. S., Ross, L. D., Beani, L., Hughes, D. P. & Kathirithamby, J. Tiny

genomes and endoreduplication in Strepsiptera. Insect. Mol. Biol. 13, 581–585,

(2004).

Joplin, K. H., Yocum, G. D. & Denlinger, D. L. Cold shock elicits expression of heat-

shock proteins in the flesh fly, Sarcophaga crassipalpis. J Insect Physiol 36,

825–834, (1990).

Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic

Acids Res 28, 27–30, (2000).

Kawarasaki, Y., Teets, N. M., Denlinger, D. L. & Lee, R. E. Alternative overwintering

strategies in an Antarctic midge: freezing vs. cryoprotective dehydration. Funct

Ecol 28, 933-943, (2014).

Kelty, J. D. & Lee, R. E. Induction of rapid cold hardening by cooling at ecologically

relevant rates in Drosophila melanogaster. J Insect Physiol 45, 719–726, (1999).

Kelty, J. D. & Lee, R. E. Rapid cold-hardening of Drosophila melanogaster (Diptera:

Drosophilidae) during ecologically based thermoperiodic cycles. J Exp Biol 204,

1659-1666, (2001).

Kelty, J. D., Killian, K. A. & Lee, R. E. Cold shock and rapid cold-hardening of pharate

adult flesh flies (Sarcophaga crassipalpis): effects on behavior and

neuromuscular function following eclosion. Physiol Entomol 21, 283-288, (1996).

Kelty, J. Rapid cold-hardening of Drosophila melanogaster in a field setting. Physiol

Entomol 32, 343-350, (2007).

Kennedy, A. D. Water as a limiting factor in the Antarctic terrestrial environment: A

biogeographical synthesis. Arct Alp Res 25, 308–315, (1993).

Kim, M., Robich, R. M., Rinehart, J. P. & Denlinger, D. L. Upregulation of two actin

genes and redistribution of actin during diapause and cold stress in the northern

house mosquito, Culex pipiens. J Insect Physiol 52, 1226–1233, (2006).

Kim, S. Y., Denlinger, D. L. & Smith, B. Spatial conditioning in the flesh fly,

Sarcophaga crassipalpis: disruption of learning by cold shock and protection by

rapid cold hardening. J Asia-Pacific Entomol 8, 345-351, (2005).

Kirkness, E. F. et al. Genome sequences of the human body louse and its primary

endosymbiont provide insights into the permanent parasitic lifestyle. Proc. Natl.

Acad. Sci. USA 107, 12168–12173, (2010).

Kohshima, S. A novel cold-tolerant insect found in a Himalayan glacier. Nature 310,

225-227, (1984).

Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59, (2004).

Korolchuk, V. I., Menzies, F. M. & Rubinsztein, D. C. Mechanisms of cross-talk

between the ubiquitin-proteasome and autophagy-lysosome systems. FEBS Lett

584, 1393–1398, (2010).

Koštál, V. & Tollarova-Borovanska, M. The 70 kDa heat shock protein assists during

the repair of chilling injury in the insect, Pyrrhocoris apterus. PLoS ONE 4,

e4546, (2009).

Koštál, V. et al. Long-term cold acclimation extends survival time at 0 °C and modifies

the metabolomic profiles of the larvae of the fruit fly Drosophila melanogaster.

PLoS ONE 6, e25025, (2011).

Koštál, V., Tollarova, M. & Sula, J. Adjustments of the enzymatic complement for

polyol biosynthesis and accumulation in diapausing cold-acclimated adults of

Pyrrhocoris apterus. J Insect Physiol 50, 303–313, (2004).

Koštál, V., Yanagimoto, M. & Bastl, J. Chilling-injury and disturbance of ion

homeostasis in the coxal muscle of the tropical cockroach (Nauphoeta cinerea).

Comp Biochem Physiol Part B 143, 171-179, (2006).

Koštál, V., Zahradnickova, H. & Simek, P. Hyperprolinemic larvae of the drosophilid

fly, Chymomyza costata, survive cryopreservation in liquid nitrogen. Proc Natl

Acad Sci USA 108, 13041–13046, (2011).

Kuroiwa, A., Hafen, E. & Gehring, W. J. Cloning and transcriptional analysis of the

segmentation gene fushi tarazu of Drosophila. Cell 37, 825-831, (1984).

Lalouette, L., Williams, C. M., Hervant, F., Sinclair, B. J. & Renault, D. Metabolic rate

and oxidative stress in insects exposed to low temperature thermal fluctuations.

Comp Biochem Physiol A Mol Integr Physiol 158, 229–234, (2011).

Landis, G. N. et al. Similar gene expression patterns characterize aging and oxidative

stress in Drosophila melanogaster. Proc Natl Acad Sci USA 101, 7663–7668,

(2004).

Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient

alignment of short DNA sequences to the human genome. Genome Biol. 10, R25,

(2009).

Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods

9, 357-360, (2012).

Larionov, A., Krause, A. & Miller, W. A standard curve based method for relative real

time PCR data processing. BMC Bioinformatics 6, 62, (2005).

Larsen, K. J. & Lee, R. E. Cold tolerance including rapid cold hardening and

inoculative freezing of fall migrant monarch butterflies in Ohio. J Insect Physiol

40, 859-864, (1994).

Laslett, D. & Canback, B. ARAGORN, a program to detect tRNA genes and tmRNA

genes in nucleotide sequences. Nucleic Acids Res. 32, 11–16, (2004).

Lee, J. H. et al. Sestrin as a feedback inhibitor of TOR that prevents age-related

pathologies. Science 327, 1223–1228, (2010).

Lee, R. E. & Costanzo, J. P. Biological ice nucleation and ice distribution in cold-hardy

ectothermic animals. Annu Rev Physiol 60, 55-72, (1998).

Lee, R. E. A primer on insect cold tolerance. In: Low Temperature Biology of Insects,

edited by Denlinger DL, Lee RE. Cambridge: Cambridge University Press, 2010,

p. 3–34.

Lee, R. E. Insect cold-hardiness: to freeze or not to freeze. BioScience 39, 308-313,

(1989).

Lee, R. E., Chen, C. P. & Denlinger, D. L. A rapid cold-hardening process in insects.

Science 238, 1415–1417, (1987).

Lee, R. E., Damodaran, K., Yi, S. X. & Lorigan, G. A. Rapid cold-hardening increases

membrane fluidity and cold tolerance of insect cells. Cryobiology 52, 459-463,

(2006).

Lemaitre, B., Nicolas, E., Michaut, L., Reichhart, J. M. & Hoffmann, J. A. The

dorsoventral regulatory gene cassette spatzle/Toll/cactus controls the potent

antifungal response in Drosophila adults. Cell 86, 973–983, (1996).

Li, H. et al. & 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map

format and SAMtools. Bioinformatics 25, 2078-2079, (2009).

Li, A. & Denlinger, D. L. Rapid cold hardening elicits changes in brain protein profiles

of the flesh fly, Sarcophaga crassipalpis. Insect Mol Biol 17, 565–572, (2008).

Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler

transform. Bioinformatics 25, 1754–1760, (2009).

Li, H. & Durbin, R. Inference of human population history from individual whole-

genome sequences. Nature 475, 493–496, (2011).

Li, L., Stoeckert, Jr. C. J. & Roos, D. S. OrthoMCL: identification of ortholog groups

for eukaryotic genomes. Genome Res. 13, 2178–2189, (2003).

Li, R. et al. ReAS: recovery of ancestral sequences for transposable elements from the

unassembled reads of a whole genome shotgun. PLoS Comput. Biol. 1, e43

(2005).

Lisi, S., Mazzon, I. & White, K. Diverse domains of THREAD/DIAP1 are required to

inhibit apoptosis induced by REAPER and HID in Drosophila. Genetics 154,

669–678, (2000).

Liu, G. W., Roy, J. & Johnson, E. A. Identification and function of hypoxia-response

genes in Drosophila melanogaster. Physiol Genomics 25, 134–141, (2006).

Liu, K., Tsujimoto, H., Cha, S. J., Agre, P. & Rasgon, J. L. Aquaporin water channel

AgAQP1 in the malaria vector mosquito Anopheles gambiae during blood feeding

and humidity adaptation. Proc Natl Acad Sci USA 108, 6062–6066, (2011).

Livermore, R., Eagles, G., Morris, P. & Maldonado, A. Shackleton fracture zone: no

barrier to early circumpolar ocean circulation. Geology 32, 797–800, (2004).

Lopez-Martinez, G. et al. Dehydration, rehydration, and overhydration alter patterns of

gene expression in the Antarctic midge, Belgica antarctica. J Comp Physiol B

179, 481–491, (2009).

Lopez-Martinez, G., Elnitsky, M. A., Benoit, J. B., Lee, Jr. R. E. & Denlinger, D. L.

High resistance to oxidative damage in the Antarctic midge Belgica antarctica,

and developmentally linked expression of genes encoding superoxide dismutase,

catalase and heat shock proteins. Insect Biochem. Mol. Biol. 38, 796–804, (2008).

Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer

RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964, (1997).

Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de

novo assembler. GigaSci 18, 1-6, (2012).

Lynch, M. The origins of eukaryotic gene structure. Mol. Biol. Evol. 23, 450–468,

(2006).

MacMillan, H. A. & Sinclair, B. J. Mechanisms underlying insect chill-coma. J Insect

Physiol 57, 12-20, (2011).

MacMillan, H. A., Guglielmo, C. G. & Sinclair, B. J. Membrane remodeling and

glucose in Drosophila melanogaster: a test of rapid cold-hardening and chilling

tolerance hypotheses. J Insect Physiol 55, 243–249, (2009).

Maiuri, M. C., Zalckvar, E., Kimchi, A. & Kroemer, G. Self-eating and self-killing:

crosstalk between autophagy and apoptosis. Nat Rev Mol Cell Biol 8, 741–752,

(2007).

Marron, M. T., Markow, T. A., Kain, K. J. & Gibbs, A. G. Effects of starvation and

desiccation on energy metabolism in desert and mesic Drosophila. J Insect

Physiol 49, 261–270, (2003).

Matkin, L. M. & Markow, T. A. Transcriptional regulation of metabolism associated

with the increased desiccation resistance of the cactophilic Drosophila

mojavensis. Genetics 182, 1279–1288, (2009).

Mazur, P. Freezing of living cells: mechanisms and implications. Am J Physiol 247,

C125-C142, (1984).

McDonald, J. R., Bale, J. S. & Walters, K. F. A. Rapid cold hardening in the western

flower thrips Frankliniella occidentalis. J Insect Physiol 43, 759-766, (1997).

McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for

analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303,

(2010).

McQuilton, P., St Pierre, S. E., Thurmond, J. & FlyBase Consortium. FlyBase 101–the basics

of navigating FlyBase. Nucleic Acids Res. 40, D706–D714, (2012).

Megy, K. et al. VectorBase: improvements to a bioinformatics resource for invertebrate

vector genomics. Nucleic Acids Res. 40, D729–D734, (2012).

Michaud, M. R. & Denlinger, D. L. Oleic acid is elevated in cell membranes during

rapid cold-hardening and pupal diapause in the flesh fly, Sarcophaga crassipalpis.

J Insect Physiol 52, 1073–1082, (2006).

Michaud, M. R. & Denlinger, D. L. Shifts in the carbohydrate, polyol, and amino acid

pools during rapid cold-hardening and diapause-associated cold-hardening in

flesh flies (Sarcophaga crassipalpis): a metabolomics comparison. J Comp

Physiol B 177, 753–763, (2007).

Michaud, M. R. et al. Metabolomics reveals unique and shared metabolic changes in

response to heat shock, freezing and desiccation in the Antarctic midge, Belgica

antarctica. J Insect Physiol 54, 645-655, (2008).

Misener, S. R., Chen, C. P. & Walker, V. K. Cold tolerance and proline metabolic gene

expression in Drosophila melanogaster. J Insect Physiol 47, 393–400, (2001).

Mitchell, B. K. & Itagaki, H. Interneurons of the subesophageal ganglion of

Sarcophaga bullata responding to gustatory and mechanosensory stimuli. J Comp

Physiol 171, 213-230, (1992).

Mitchell, B. K., Itagaki, H. & Rivet, M. P. Peripheral and central structures involved in

insect gustation. Microsc Res Techniq 47, 401-415, (1999).

Montiel, P. O., Grubor-Lajsic, G. & Worland, M. R. Partial desiccation induced by

subzero temperatures as a component of the survival strategy of the Arctic

collembolan Onychiurus arcticus (Tullberg). J Insect Physiol 44, 211–219,

(1998).

Morgulis, A., Gertz, M. E., Schäffer, A. A. & Agarwala, R. WindowMasker: window-

based masker for sequenced genomes. Bioinformatics 22, 134-141, (2006).

Morimoto, R. I. Regulation of the heat shock transcriptional response: Cross talk

between a family of heat shock factors, molecular chaperones, and negative

regulators. Genes Dev 12, 3788–3796, (1998).

Morris, B. First reported case of human aural myiasis caused by the flesh fly

Parasarcophaga crassipalpis (Diptera: Sarcophagidae). J Parasitology 73, 1068-

1069, (1987).

Mortazavi, A. et al. Scaffolding a Caenorhabditis nematode genome with RNA-seq.

Genome Res 20, 1740–1747, (2010).

Nasonia Genome Working Group, The. Functional and evolutionary insights from the

genomes of three parasitoid Nasonia species. Science 327, 343-347, (2010).

Nene, V. et al. Genome sequence of Aedes aegypti, a major arbovirus vector. Science

316, 1718–1723, (2007).

Neven, L. G., Duman, J. G., Beals, J. M. & Castellino, F. J. Overwintering adaptations

of the stag beetle, Ceruchus piceus: removal of ice nucleators in the winter to

promote supercooling. J Comp Physiol B 165, 707-716, (1986).

Nirmala, X., Hypša, V. & Žurovec, M. Molecular phylogeny of Calyptratae

(Diptera: Brachycera): the evolution of 18S and 16S ribosomal rDNAs in higher

dipterans and their use in phylogenetic inference. Insect Mol Biol 10, 475-485,

(2001).

Novembre, J. A. Accounting for background nucleotide composition when measuring

codon usage bias. Mol Biol Evol 19, 1390–1394, (2002).

Overgaard, J. et al. Metabolomic profiling of rapid cold hardening and cold shock in

Drosophila melanogaster. J Insect Physiol 53, 1218–1232, (2007).

Overgaard, J., Sørensen, J. G., Petersen, S. O., Loeschcke, V. & Holmstrup, M.

Changes in membrane lipid composition following rapid cold hardening in

Drosophila melanogaster. J Insect Physiol 51, 1173–1182, (2005).

Overgaard, J., Sørensen, J. G., Petersen, S. O., Loeschcke, V. & Holmstrup, M.

Reorganization of the membrane lipids during fast and slow cold hardening in

Drosophila melanogaster. Physiol Entomol 31, 328-335, (2006).

Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core

genes in eukaryotic genomes. Bioinformatics 23, 1061–1067, (2007).

Peckham, V. Notes on the chironomid midge Belgica antarctica Jacobs at Anvers Island

in the maritime Antarctic. Pac Insects Monogr 25, 145–166, (1971).

Petitpierre, E. Molecular cytogenetics and taxonomy of insects, with particular reference

to the Coleoptera. Int J Insect Morphol Embryol 25, 115-134, (1996).

Picard. http://picard.sourceforge.net (2013).

Powell, S. J. & Bale, J. S. Effect of long-term and rapid cold hardening on the cold

torpor temperature of an aphid. Physiol Entomol 31, 348-352 (2006).

Powell, S. J. & Bale, J. S. Low temperature acclimated populations of the grain aphid

Sitobion avenae retain ability to rapidly cold harden with enhanced fitness. J Exp

Biol 208, 2615-2620, (2005).

Qin, W., Neal, S. J., Robertson, R. M., Westwood, J. T. & Walker, V. K. Cold

hardening and transcriptional change in Drosophila melanogaster. Insect Mol Biol

14, 607–613, (2005).

Quesneville, H. et al. Combined evidence annotation of transposable elements in

genome sequences. PLoS Comput. Biol. 1, 166–175 (2005).

Ragland, G. J., Denlinger, D. L. & Hahn, D. A. Mechanisms of suspended animation

are revealed by transcript profiling of diapause in the flesh fly. Proc Natl Acad Sci

USA 107, 14909–14914, (2010).

Ragland, G. J., Egan, S. P., Feder, J. L., Berlocher, S. H. & Hahn, D. A.

Developmental trajectories of gene expression reveal candidates for diapause

termination: a key life-history transition in the apple maggot fly Rhagoletis

pomonella. J Exp Biol 214, 3948–3959, (2011).

Reynolds, J. A., Clark, J., Diakoff, S. J. & Denlinger, D. L. Transcriptional evidence

for small RNA regulation of pupal diapause in the flesh fly, Sarcophaga bullata.

Insect Biochem Mol Biol 43, 982-989, (2013).

Rinehart, J. P. et al. Continuous up-regulation of heat shock proteins in larvae, but not

adults, of a polar insect. Proc Natl Acad Sci USA 103, 14223–14227, (2006).

Rinehart, J. P., Yocum, G. D. & Denlinger, D. L. Thermotolerance and rapid cold

hardening ameliorate the negative effects of brief exposures to high or low

temperatures on the fecundity in the flesh fly, Sarcophaga crassipalpis. Physiol

Entomol 25, 330-336, (2000).

Robertson, H. M. & Lampe, D. J. Distribution of transposable elements in arthropods.

Annu. Rev Entomol 40, 333–357 (1995).

Robertson, H. M. in: Mobile DNA II (eds Craig, N. L., Craigie, R., Gellert, M. &

Lambowitz, A. M.) (ASM, 2002).

Rockey, S. J., Yoder, J. A. & Denlinger, D. L. Reproductive and developmental

consequences of diapause maternal effect in the flesh fly, Sarcophaga bullata.

Physiol Entomol 16, 477-483, (1991).

Roest Crollius, H. et al. Characterization and repeat analysis of the compact genome of

the freshwater pufferfish Tetraodon nigroviridis. Genome Res. 10, 939–949,

(2000).

Rojas, R. R. & Leopold, R. A. Chilling injury in the housefly: evidence for the role of

oxidative stress between pupariation and emergence. Cryobiology 33, 447-458,

(1996).

Salt, R. W. Principles of insect cold-hardiness. Annu Rev Entomol 6, 55-74, (1961).

Schmidt-Ott, U., Rafiqi, A. M., Sander, K. & Johnston, J. S. Extremely small

genomes in two unrelated dipteran insects with shared early developmental traits.

Dev. Genes Evol. 219, 207–210, (2009).

Scott, J. G. et al. Genome of the house fly, Musca domestica L., a global vector of

disease with adaptation to a septic environment. Genome Biol 466, 1-16, (2014).

Sharakhova, M. V. et al. Update of the Anopheles gambiae PEST genome assembly.

Genome Biol. 8, R5, (2007).

Shiota, T., Yoshida, Y., Hirai, S. & Torii, S. Intestinal myiasis caused by

Parasarcophaga crassipalpis (Diptera: Sarcophagidae). Pediatrics 85, 215,

(1990).

Simkin, A., Wong, A., Poh, Y. P., Theurkauf, W. E. & Jensen, J. D. Recurrent and

recent selective sweeps in the piRNA pathway. Evolution Int J Org Evol 67,

1081-1090, (2013).

Simpson, J. T. & Durbin, R. Efficient de novo assembly of large genomes using

compressed data structures. Genome Res 22, 549-556, (2012).

Simpson, J. T. et al. ABySS: a parallel assembler for short read sequence data. Genome

Res 19, 1117-1123, (2009).

Sinclair, B. J., Gibbs, A. G. & Roberts, S. P. Gene transcription during exposure to,

and recovery from, cold and desiccation stress in Drosophila melanogaster. Insect

Mol Biol 16, 435–443, (2007).

Sinensky, M. Homeoviscous adaptation- a homeostatic process that regulates the

viscosity of membrane lipids in Escherichia coli. Proc Natl Acad Sci USA 71,

522-525, (1974).

Smit, A. F. A. & Hubley, R. RepeatModeler Open-1.0 http://www.repeatmasker.org

(2008–2010).

Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-3.0

http://www.repeatmasker.org (1996–2010).

Smith, C. D. et al. Improved repeat identification and masking in Dipterans. Gene 389,

1–9 (2007).

Smyth, G. K. Linear models and empirical Bayes methods for assessing differential

expression in microarray experiments. Stat Appl Genet Molec Biol 3, Article 3,

(2004).

Sørensen, J. G. & Holmstrup, M. Candidate gene expression associated with

geographical variation in cryoprotective dehydration of Megaphorura arctica. J

Insect Physiol 59, 804-811, (2013).

Sørensen, J. G., Heckmann, L. H. & Holmstrup, M. Temporal gene expression

profiles in a palaearctic springtail as induced by desiccation, cold exposure and

during recovery. Funct Ecol 24, 838-846, (2010).

Sørensen, J. G., Nielsen, M. M., Kruhoffer, M., Justesen, J. & Loeschcke, V. Full

genome gene expression analysis of the heat stress response, in Drosophila

melanogaster. Cell Stress Chaperones 10, 312–328, (2005).

Specchia, V. et al. Hsp90 prevents phenotypic variation by suppressing the mutagenic

activity of transposons. Nature 463, 662–665, (2010).

Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with

thousands of taxa and mixed models. Bioinformatics 22, 2688-2690, (2006).

Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic

Acids Res. 34, W435–W439, (2006).

Staszak, D. J. & Mutchmor, J. A. Influence of temperature on induction of chill-coma

and movement of the American cockroach, Periplaneta americana. Comp

Biochem Physiol 45A, 895-908, (1973).

Storey, J. M. & Storey, K. B. Carbon balance and energetics of cryoprotectant synthesis

in a freeze-tolerant insect - responses to perturbation by anoxia. J Comp Physiol B

160, 77–84, (1990).

Storey, K. B., Baust, J. G. & Storey, J. M. Intermediary metabolism during low-

temperature acclimation in the overwintering gall fly larva, Eurosta solidaginis. J

Comp Physiol 144, 183–190, (1981).

Sugg, P., Edwards, J. S. & Baust, J. Phenology and life history of Belgica antarctica,

an Antarctic midge (Diptera: Chironomidae). Ecol Entomol 8, 105–113 (1983).

Takashima, S., Mkrtchyan, M., Younossi-Hartenstein, A., Merriam, J. R. &

Hartenstein, V. The behaviour of Drosophila adult hindgut stem cells is

controlled by Wnt and Hh signalling. Nature 454, 651–658, (2008).

Teets, N. M. & Denlinger, D. L. Surviving in a frozen desert: Environmental stress

physiology of terrestrial Antarctic arthropods. J Exp Biol 217, 84–93, (2014).

Teets, N. M. et al. Gene expression changes governing extreme dehydration tolerance in

an Antarctic insect. Proc Natl Acad Sci USA 109, 20744–20749 (2012).

Teets, N. M. et al. Rapid cold-hardening in larvae of the Antarctic midge Belgica

antarctica: cellular cold-sensing and a role for calcium. Am J Physiol Regul

Integr Comp Physiol 294, R1938–R1946, (2008).

Teets, N. M. et al. Uncovering molecular mechanisms of cold tolerance in a temperate

flesh fly using a combined transcriptomic and metabolomic approach. Physiol

Genomics 44, 764–777, (2012).

Teets, N. M., Kawarasaki, Y., Lee, R. E. & Denlinger, D. L. Expression of genes

involved in energy mobilization and osmoprotectant synthesis during thermal and

dehydration stress in the Antarctic midge, Belgica antarctica. J Comp Physiol B,

10.1007/s00360-012-0707-2, (2012).

Teets, N. M., Yi, S. X., Lee, R. E. & Denlinger, D. L. Calcium signaling mediates cold

sensing in insect tissues. Proc Natl Acad Sci USA 22, 9154-9159, (2013).

Telonis-Scott, M., Hallas, R., McKechnie, S. W., Wee, C. W. & Hoffmann, A. A.

Selection for cold resistance alters gene transcript levels in Drosophila

melanogaster. J Insect Physiol 55, 549–555, (2009).

Terhzaz, S. et al. Cell-specific inositol 1,4,5 trisphosphate 3-kinase mediates epithelial

cell apoptosis in response to oxidative stress in Drosophila. Cell Signal 22, 737–

748, (2010).

Thomashow, M. F. Plant cold acclimation: freezing tolerance genes and regulatory

mechanisms. Annu Rev Plant Physiol Plant Mol Biol 50, 571–599, (1999).

Timmermans, M. J., Roelofs, D., Nota, B., Ylstra, B. & Holmstrup, M. Sugar sweet

springtails: On the transcriptional response of Folsomia candida (Collembola) to

desiccation stress. Insect Mol Biol 18, 737–746, (2009).

Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals

unannotated transcripts and isoform switching during cell differentiation. Nat

Biotechnol 28, 511–515, (2010).

Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: Discovering splice junctions with

RNA-Seq. Bioinformatics 25, 1105–1111, (2009).

UniProt Consortium, The. UniProt: a hub for protein information. Nucl Acids Res 43,

D204-D212, (2015).

Usher, M. B. & Edwards, M. A dipteran from south of the Antarctic circle: Belgica

antarctica (Chironomidae), with a description of its larva. Biol J Linn Soc 22, 19-

31, (1984).

Verbruggen, N. & Hermans, C. Proline accumulation in plants: A review. Amino Acids

35, 753–759, (2008).

Vesala, L., Salminen, T. S., Laiho, A., Hoikkala, A. & Kankare, M. Cold tolerance

and cold-induced modulation of gene expression in two Drosophila virilis group

species with different distributions. Insect Mol Biol 21, 107-118, (2012).

Wang, M. H., Marinotti, O., Vardo-Zalik, A., Boparai, R. & Yan, G. Y. Genome-

wide transcriptional analysis of genes associated with acute desiccation stress in

Anopheles gambiae. PLoS ONE 6, e26011, (2011).

Wharton, D. A., Goodall, G. & Marshall, C. J. Freezing survival and cryoprotective

dehydration as cold tolerance mechanisms in the Antarctic nematode

Panagrolaimus davidi. J Exp Biol 206, 215-221, (2003).

Wu, Q. & Brown, M. R. Signaling and function of insulin-like peptides in insects. Annu

Rev Entomol 51, 1–24, (2006).

Xia, J. G., Psychogios, N., Young, N. & Wishart, D. S. MetaboAnalyst: a web server

for metabolomic data analysis and interpretation. Nucleic Acids Res 37, W652–

W660, (2009).

Yancey, P. H. Organic osmolytes as compatible, metabolic and counteracting

cytoprotectants in high osmolarity and other stresses. J Exp Biol 208, 2819–2830,

(2005).

Yi, S. X. & Lee, R. E. Rapid cold-hardening blocks cold-induced apoptosis by inhibiting

the activation of pro-caspases in the flesh fly Sarcophaga crassipalpis. Apoptosis

16, 249–255, (2011).

Yi, S. X. et al. Function and immuno-localization of aquaporins in the Antarctic midge

Belgica antarctica. J Insect Physiol 57, 1096–1105, (2011).

Yi, S. X., Moore, C. W. & Lee, R. E. Rapid cold-hardening protects Drosophila

melanogaster from cold-induced apoptosis. Apoptosis 12, 1183–1193, (2007).

Yocum, G. D. et al. Alteration of the eclosion rhythm and eclosion behavior in the flesh

fly, Sarcophaga crassipalpis, by low and high temperature stress. J Insect Physiol

40, 13-21, (1994).

Yoder, J. A., Benoit, J. B., Denlinger, D. L. & Rivers, D. B. Stress-induced

accumulation of glycerol in the flesh fly, Sarcophaga bullata: Evidence indicating

anti-desiccant and cryoprotectant functions of this polyol and a role for the brain

in coordinating the response. J Insect Physiol 52, 202–214, (2006).

Young, M. D., Wakefield, M. J., Smyth, G. K. & Oshlack, A. Gene ontology analysis

for RNA-seq: accounting for selection bias. Genome Biol 11, R14, (2010).

Zacharias, H. Underreplication of a polytene chromosome arm in the chironomid

Prodiamesa olivacea. Chromosoma 72, 23-51, (1979).

Zachariassen, K. E. & Hammel, H. T. Nucleating agents in the haemolymph of insects

tolerant to freezing. Nature 262, 285-288, (1976).

Zachariassen, K. E. Physiology of cold tolerance in insects. Physiol Rev 65, 799-832,

(1985).

Zachariassen, K. E. The mechanism of the cryoprotective effect of glycerol in beetles

tolerant to freezing. J Insect Physiol 25, 29-32, (1979).

Zachariassen, K. E., Li, N. G., Laugsand, A. E., Kristiansen, E. & Pedersen, S. A. Is

the strategy for cold hardiness in insects determined by their water balance? A

study on two closely related families of beetles: Cerambycidae and

Chrysomelidae. J Comp Physiol B 178, 977-984, (2008).

Zaslavskii, V. A., & Veerman, A. Insect development: Photoperiodic and temperature

control. Berlin: Springer-Verlag. (1988).

Zdarek, J. & Fraenkel, G. Overt and covert effects of the endogenous and exogenous

ecdysone in puparium formation of flies. Proc Natl Acad Sci 67, 331-337, (1970).

Zdarek, J. & Fraenkel, G. The mechanism of puparium formation in flies. J Exp Zool

179, 315-324, (1972).

Zerbino, D. R. & Birney, E. Velvet: algorithms for de novo short read assembly using

de Bruijn graphs. Genome Res 18, 821–829, (2008).

Zhang, G. et al. The oyster genome reveals stress adaptation and complexity of shell

formation. Nature 490, 49-54, (2012).

Zhang, J., Marshall, K. E., Westwood, J. T., Clark, M. S. & Sinclair, B. J. Divergent

transcriptomic responses to repeated and single cold exposures in Drosophila

melanogaster. J Exp Biol 214, 4021–4029, (2011).

Zhirong, B. & Eddy, S. R. Automated de novo identification of repeat sequence

Families in Sequenced Genomes. Genome Res 12, 1269-1276, (2002).

Zhou, D. et al. Mechanisms underlying hypoxia tolerance in Drosophila melanogaster:

hairy as a metabolic switch. PLoS Genet 4, e1000221, (2008).
