
UNDERSTANDING THE GENOMIC BASIS OF STRESS ADAPTATION IN PICOCHLORUM

By

FATIMA FOFLONKER

A dissertation submitted to the

School of Graduate Studies

Rutgers, The State University of New Jersey

In partial fulfillment of the requirements

For the degree of

Doctor of Philosophy

Graduate Program in Microbial Biology

Written under the direction of

Debashish Bhattacharya

And approved by

______

______

______

______

New Brunswick, New Jersey

January 2018

ABSTRACT OF THE DISSERTATION

Understanding the Genomic Basis of Stress Adaptation in Green Algae

by FATIMA FOFLONKER

Dissertation Director:

Debashish Bhattacharya

Gaining a better understanding of adaptive evolution has become increasingly important for predicting how key primary producers will respond to environmental fluctuations driven by climate change. In my doctoral research, the genomes of four taxa from a naturally robust green algal lineage, Picochlorum (Chlorophyta, Trebouxiophyceae), were sequenced to allow comparative genomic and transcriptomic analyses. The overarching goal of this work was to investigate environmental adaptations and the origin of halotolerance. Found in environments ranging from brackish estuaries to hypersaline terrestrial habitats, this lineage tolerates a wide range of fluctuating salinities, light intensities, and temperatures, and has a robust photosystem II. The small, reduced diploid genomes (13.4-15.1 Mbp) of Picochlorum, indicative of genome specialization to extreme environments, have resulted in an interesting genomic organization, including the clustering of genes in the same biochemical pathway and of coregulated genes. Coregulation of co-localized genes in “gene neighborhoods” is more prominent soon after exposure to salinity shock, suggesting a role in the rapid response to salinity stress in Picochlorum. Despite the pressure for genome reduction, key gene gains are seen through gene family expansion of an important SOS1 salt transporter and through bacterium-derived horizontal gene transfer (HGT). Thirteen instances of HGT were identified that display differential acquisition among Picochlorum taxa, indicating an ongoing process in this lineage. The presence of introns, differential expression under salinity shock, and the use of high-quality genomes from closely related taxa provide robust support for the integration of HGT candidates into host nuclear genomes. Transferred genes are potentially functionally relevant and encode proteins with roles related to osmolyte production, cell wall metabolism, and metabolic flexibility. A transcriptomic comparison of two sister taxa with very similar genomes, Picochlorum SENEW3 from a brackish lagoon and P. oklahomensis from a hypersaline salt plain environment, was performed under high (1.5 M NaCl) and low (10 mM NaCl) salinity shock conditions. This work revealed different regulatory responses to salinity shock in terms of osmolyte production, reflecting nitrogen availability in the respective environments and indicating that habitat-driven regulation of the existing gene inventory is key to environmental adaptation. These diploid sister taxa also show one striking difference: their levels of haplotype heterozygosity. RNA-seq expression data support condition-dependent allele-specific gene expression, indicating that maintaining a large, divergent allele pool is functionally relevant in P. oklahomensis. Overall, Picochlorum has revealed differences in adaptation strategies between species that appear nearly identical in morphology and gene sequence. My study has provided insights into the adaptive strategies used by organisms with reduced gene inventories, which are reflected in selection acting on genome organization, gene regulation, and specialization.


ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Debashish Bhattacharya, for his guidance, patience, and support; my committee members, Dr. Kay Bidle, Dr. Jeff Boyd, and Dr. Lena Struwe; the Microbial Biology graduate program director, Dr. Gerben Zylstra; my research collaborators, Dr. G. C. Dismukes, Dr. Gennady Ananyev, Dr. Hwan Su Yoon, and Dr. Mehdi Javinmard; the members of the Bhattacharya lab for their advice and help; my mom and dad for supporting me and my education; and my family and friends for their support. I would also like to thank Dr. H. Boyd Woodruff for fellowship support during my first year in the Microbial Biology graduate program, the Phycological Society of America for research support, and the NSF-IGERT (Integrative Graduate Education and Research Traineeship) program for renewable fuels at Rutgers University (0903675) for fellowship support.


DEDICATION

To the memory of my late husband, Mahmood.


ACKNOWLEDGEMENT OF PUBLICATIONS

Chapter 2 has been published as Foflonker F, Price DC, Qiu H, Palenik B, Wang S & Bhattacharya D (2015) Genome of the halotolerant green alga Picochlorum sp. reveals strategies for thriving under fluctuating environmental conditions. Environ Microbiol 17: 412-426. F. Foflonker participated in writing the manuscript and is directly responsible for all genomic analyses, figures 2-7, and all tables.

Chapter 3 has been published as Foflonker F, Ananyev G, Qiu H, Morrison A, Palenik B, Dismukes GC & Bhattacharya D (2016) The unexpected extremophile: Tolerance to fluctuating salinity in the green alga Picochlorum. Algal Research 16: 465-472. F. Foflonker participated in writing the manuscript and is directly responsible for all analyses, tables, and figures.

Chapter 4 is being prepared for publication as Foflonker F, Mollegard D, Ong M, Yoon HS & Bhattacharya D (2018) Genomic analysis of Picochlorum species reveals how microalgae adapt to fluctuating environments. F. Foflonker participated in writing the manuscript and is directly responsible for all analyses, tables, and figures.


TABLE OF CONTENTS

Abstract of the Dissertation

Acknowledgments

Dedication

Acknowledgement of Publications

Table of Contents

List of Tables

List of Figures

Chapter 1: Introduction

Microalgae as biofuel feedstock

Salinity stress on eukaryotic microalgae

Picochlorum as a biofuel candidate and model to study salinity stress

Scope of the thesis

Chapter 2: Genome of the halotolerant green alga Picochlorum sp. reveals strategies for thriving under fluctuating environmental conditions

Abstract

Introduction

Results

Genome features and phylogeny

Clusters of functionally related genes

Transporter analysis

Growth rates in the presence of organic carbon sources

HGT analysis

Selenoproteins

Hydrogenase activity and other genes of interest

Discussion

Experimental Procedures

Strains and culture conditions

DNA and RNA extraction and library construction

Genome and transcriptome sequencing

Construction of multi-protein tree

Phylogenomic analysis

Transporter analysis

Functionally clustered pathways

Acknowledgements

Chapter 3: Elucidating salinity shock response mechanisms in Picochlorum

Abstract

Introduction

Results and Discussion

High and low salinity stress elicits separate metabolic responses

Co-localization of co-expressed genes in response to salt shock

High photorespiration influences carbon and nitrogen flux at high salinity

Starch and osmolytes

Response of the photosynthetic machinery

Materials and Methods

Salinity shock experimental conditions

Transcriptome sequencing

Transcriptome analysis

Co-localization analysis

Photosynthetic measurements

Conclusions

Acknowledgements

Supporting Information

Cell wall remodeling prevalent at both high and low salinity stress

Membrane remodeling key to low salinity shock response

Other responses

Chapter 4: Characterization of multiple Picochlorum genomes to elucidate the origin of salt tolerance

Abstract

Introduction

Results

Physiology

Genome features and assembly

Phylogeny and genome synteny

Horizontal gene transfer

Gene gain/loss

Picochlorum SENEW3 and P. oklahomensis transcriptome comparison in response to salinity shock

Allele-specific expression

Discussion

Materials and Methods

Strain information and growth rates

Genome sequencing

Genome assembly, gene prediction, and annotation

Genome synteny

Construction of multi-protein tree

Genome comparison

Phylogenomic methods

RNA-seq

Allele-specific expression

Acknowledgments

Conclusion

References


LIST OF TABLES

Table 2.1 Clustered genes in shared pathways in Picochlorum SENEW3.

Table 2.2 Instances of HGT that were identified in the Picochlorum SENEW3 genome and their putative gene functions. Putative gene annotations, results of the BLAST analysis, phylogenetic domain of gene origin, putative gene function, putative prokaryotic donor, and the number of EST reads that mapped to the genes are shown.

Table S2.1 List of predicted proteins in Picochlorum SENEW3 showing their putative annotations and results of a BLASTP search against a comprehensive in-house database.

Table S2.2 Ostreococcus tauri nitrate assimilation gene clusters and the corresponding Picochlorum SENEW3 genes and their contig locations. Genes located on the same contig are shown in boldface. The genes are ordered according to their physical location in the Ostreococcus tauri genome. We note that the Ostreococcus Maf4 gene located in the Cnx2-Maf4-Cnx5 cluster described in the original paper is absent from the gene cluster in the latest genome assembly (Ostreococcus tauri v2.0 from JGI).

Table 3.S1. Number of RNA-seq reads in each experiment.

Table 3.S2. Co-localized genes in clusters in various gene sets. The highlighted cells represent statistically significant results.

Table 3.S3. Examples of co-localized clusters allowing for two intervening genes that do not follow the expression pattern. Highlighted in yellow are conditions under which clusters meet co-localization criteria. Blue indicates genes that are part of a cluster; red denotes intervening genes. Also included are orthologs in Chlorella vulgaris and Coccomyxa subellipsoidea. Full clustering information is available in Excel file Table 3.S8.

Table 3.S4. Gene expression and predicted targeting for genes in pathways involved in salt stress. Accompanying table for Figure 3.3. E.C. number, enzyme commission number; TC number, transporter classification number; NDE, not differentially expressed; C, chloroplast; M, mitochondria; S, signal peptide.

Table 3.S5. Differentially expressed genes of bacterial origin. NDE, not differentially expressed; L2fc, log2 fold change.

Table 4.1. Sequencing and assembly statistics of Picochlorum species.

Table 4.S1. Summary statistics for variant detection in Picochlorum SENEW3 and P. oklahomensis primary assemblies.

Table 4.S2. Genome assembly completeness compared to BUSCO core Eukaryota.

Table 4.S3. Collinearity between Picochlorum assemblies (# collinear homolog pairs / # homolog pairs). Maximum gaps allowed = 5.

Table 4.S4. Gene duplications categorized into types in the following priority order: segmental duplication (whole genome duplications) (minimum genes per block 5, maximum gaps 25), tandem, proximal (< 20 intervening genes), dispersed, and singletons.

Table 4.S6. Gene expression of HGT-derived genes in P. oklahomensis and Picochlorum SENEW3. Differentially expressed genes are highlighted in green.

Table 4.S7. Organelle genome statistics.

Table 4.S8. Predicted genes in plastid genomes. Present (+); absent (-).

Table 4.S9. Predicted genes in mitochondrial genomes. Present (+); absent (-); found in nuclear genome (N).

Table 4.S10. Allele-specific gene expression in P. oklahomensis. Primary and haplotig columns represent the percentage of gene pairs with > 90% of reads mapping to one of the two alleles on either the primary or haplotig contigs. Biallelic expression is defined as between 40 and 60% of reads mapping to both alleles.


LIST OF FIGURES

Figure 1.1 UV images of Picochlorum SENEW3 stained with BODIPY. Green fluorescence indicates neutral lipids.

Figure 2.1 Phylogenetic analysis of Picochlorum SENEW3. Multi-gene maximum likelihood tree of ten green algae inferred from an alignment of 480,102 amino acids.

Figure 2.3 Analysis of metabolite transporters in Picochlorum SENEW3 showing frequency of transporters per transporter family in Picochlorum SENEW3 and O. tauri.

Figure 2.4 Putative distribution and functions of metabolite transporters in the Picochlorum SENEW3 cell showing transporters involved in the salt stress response.

Figure 2.5 Mixotrophic growth of Picochlorum SENEW3. (A) Growth under salt stress and (B) growth under 1.5 M NaCl salt stress with the addition of different amounts of glucose.

Figure 2.7. Putative functions in Picochlorum SENEW3 conferred by HGT.

Figure S2.1. Mixotrophic growth of Picochlorum SENEW3 cultures in the absence of high salt stress (0.4 M NaCl) with the addition of different amounts of glucose.

Figure 3.1. (a) Average chlorophyll variable fluorescence yield (Fv/Fm) and (b) growth rate of the algal cultures acclimated to 1 M NaCl media.

Figure 3.2. (a) Examples of co-expressed and co-localized gene clusters. (b) Number of genes co-localized versus total genes in gene set at the 1.0 and 1.5 L2fc cutoffs.

Figure 3.3. Summary of the salt shock response in Picochlorum SENEW3 at 1 h under (a) high salinity and (b) low salinity conditions.

Figure 3.4. Photoinhibition under 1500 µE m−2 s−1 high light conditions in the presence and absence of the chloroplast protein synthesis inhibitor lincomycin (LIN). Cells adapted to 1 M NaCl media were incubated in media at various salinities.

Figure 3.S1. Venn diagram showing the number of genes that are DE (shared and unique) when comparing high salinity and low salinity at the 1 h and 5 h time points.

Figure 3.S2. Gene expression patterns over the time course (data centered) at (A) high salinity and (B) low salinity.

Figure 3.S3. KEGG metabolic maps comparing the low and high salinity stress response at 1 h, revealing little overlap in gene responses (blue). Green: low salinity; red: high salinity; blue: both. (A) Background expression showing all expressed genes (DE and not DE). (B) Up-regulated. (C) Down-regulated.

Figure 3.S4. Example of randomization analysis simulating gene clustering of same-sized data (N=1000).

Figure 3.S5. KEGG pathway analysis of genes involved in the TCA cycle under (A) high salinity and (B) low salinity.

Figure 3.S6. KEGG pathway analysis of genes involved in protein processing in the endoplasmic reticulum under (A) high salinity and (B) low salinity.

Figure 3.S7. Effect of salinity on quality factor (QF) over 24 hours for cells initially grown in 1 M NaCl and incubated in media at various salinities. QF describes the Kok fitting parameters, alpha (misses) and beta (double hits), of Fv/Fm data from figure 1.

Figure 4.1. (A) Acclimated growth rates of Picochlorum species in media with varying salinity (10 mM to 1.2 M). (B) Growth rates of P. oklahomensis and Picochlorum SENEW3 acclimated to 1 M NaCl and shocked with 10 mM and 1.5 M NaCl.

Figure 4.2. Phylogeny of Picochlorum and other sequenced chlorophytes. Multi-protein tree constructed from an alignment of 1122 proteins (295,805 characters). Overall gene family gains plus losses are noted on the branches.

Figure 4.3. Acquisition of HGT-derived genes in Picochlorum.

Figure 4.S1. Distribution of variant frequencies in (A) P. oklahomensis and (B) Picochlorum SENEW3 primary assemblies.

Figure 4.S2. Synteny between Picochlorum SENEW3 (right) and other species (left).

Figure 4.S3. (A) IQ-TREE of HGT candidate peptidase S9. (B) Transcriptome evidence for an intron in the gene in Picochlorum SENEW3 under control (1 M NaCl) conditions. (C) Collinearity of this candidate with Picochlorum SENEW3 as the reference chromosome. (D) Collinearity of this candidate with P. soloecismus as the reference chromosome.

Figure 4.S4. (A) IQ-TREE of HGT candidate GDP-mannose 4,6-dehydratase gene. (B) Transcriptome evidence for the gene in P. oklahomensis under control (1 M NaCl) conditions. (C) Collinearity of this candidate with Picochlorum SENEW3 as the reference chromosome. (D) Collinearity of this candidate with P. soloecismus as the reference chromosome.

Figure 4.S5. (A) IQ-TREE of HGT candidate indolepyruvate decarboxylase. (B) Transcriptome evidence for the gene in Picochlorum SENEW3 under control (1 M NaCl) conditions. (C) Collinearity of this candidate with Picochlorum SENEW3 as the reference chromosome. (D) Collinearity of this candidate with P. soloecismus as the reference chromosome.

Figure 4.S6. Venn diagram of differentially expressed genes in P. oklahomensis under the four conditions tested.

Figure 4.S7. Gene expression comparison between P. oklahomensis and Picochlorum SENEW3. (A) 1.5 M NaCl 1 h, (B) 10 mM NaCl 1 h, (C) 1.5 M NaCl 5 h, (D) 10 mM NaCl 5 h.

Figure 4.S8. Number of (A) monoallelically (118 pairs total) and (B) biallelically (200 pairs total) expressed allele pairs shared under various salinity treatment conditions.

Figure 4.S9. Number of allele pairs showing a high change in the ratio of primary:haplotig allele expression under various salinity treatment conditions.


Chapter 1: Introduction

Microalgae as biofuel feedstock

In the quest for renewable and sustainable energy, algae have emerged as promising sources of biofuel. Marine or halotolerant algae are particularly desirable as biofuel feedstock because they can be grown on non-arable land with non-potable brackish or salt water. Unlike land-based biofuel crops, algae reduce competition with food crops for land and drinking water (1). Microalgae are favored because of their fast growth rates, large biomass production, and desirable lipid profiles. Finally, algal biomass can be processed into liquid biofuels that can be incorporated directly into existing infrastructure and pipelines (2).

Several sought-after characteristics, if combined, would create the ideal microalga for biofuel production. First, the strain should have high biomass productivity and high lipid content. Constitutive lipid production would be best, because the nutrient deprivation technique typically used to induce oil body formation diverts energy from growth (3). The strain should be robust enough to withstand the shear stresses caused by mixing and the temperature, salinity, and light fluctuations associated with traditional open pond cultivation systems. It should also be resistant to the contamination and infections that commonly result in resource competition or the decimation of entire pond cultures (4). Considering the effects of high light intensity on a cell, including photoinhibition and high mutation rates, and the fact that open pond culturing facilities have typically been built in desert environments, the ability to withstand high light intensities would also be desirable in a potential biofuel feedstock strain (5). Strains with reduced sensitivity of Rubisco to high oxygen concentrations are desirable, because oxygen competes with carbon dioxide as a substrate for Rubisco, limiting yield (6). Cells that autoflocculate, that are larger or heavier, or that have thin membranes would reduce harvest and extraction costs. Moreover, the ideal strain would excrete the desired lipids, avoiding the harvest process altogether (3). Finally, the ability to produce a high-value co-product in conjunction with biofuels, a low-value product, would make the process more economically feasible (7).

The use of halophiles for algal biomass production is a strategic choice; culturing halophiles reduces freshwater usage, and salinity can be utilized as a crop protection mechanism. When water is lost from an open pond system through evaporation, it can be replaced either by freshwater, thereby maintaining constant salinity, or by seawater, resulting in a gradual increase in the overall salinity of the pond (8). Ideally, saltwater, saline aquifer water, or nutrient-rich wastewater would be the more environmentally sustainable choice for large-scale algal biomass production, which requires a halophilic feedstock organism; however, a substantial amount of freshwater is still required to overcome the effects of evaporation (1). In addition to evaporation, open pond systems are also susceptible to contamination. The use of selective conditions, such as high pH in the case of Spirulina platensis or high salinity for Dunaliella salina, has been successful in reducing contamination and facilitating the growth of unialgal cultures (4). Von Alvensleben et al. showed that a salinity of 36‰, about the salinity of seawater, was capable of slowing the contamination of a Picochlorum atomus culture by the freshwater cyanobacterial contaminant Pseudanabaena limnetica (9). Contamination by non-target algae or cyanobacteria can also lead to resource competition or the release of potentially toxic allelochemicals into the culture and environment (9).
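The effect of the two make-up water choices on pond salinity can be illustrated with a simple salt balance. The sketch below is only an illustration under stated assumptions and is not a model taken from the cited studies: the pond is refilled to a constant volume after each day of evaporation, evaporation removes water but no salt, salinity in ‰ is treated as mass of salt per unit volume with density changes ignored, and the function name, the 1% daily evaporation fraction, and the 35‰ seawater value are hypothetical choices made for the example.

```python
def simulate_pond_salinity(initial_salinity, makeup_salinity, evap_fraction, days):
    """Track pond salinity (in permille) over daily evaporate-and-refill cycles.

    Assumptions (illustrative only): the pond returns to a constant volume
    after each refill, evaporation removes pure water, and density changes
    are ignored.
    """
    salinity = initial_salinity
    history = [salinity]
    for _ in range(days):
        # Salt already in the pond stays put; the refilled fraction of the
        # volume brings in additional salt at the make-up water's salinity.
        salinity = salinity + evap_fraction * makeup_salinity
        history.append(salinity)
    return history


# Hypothetical scenario: a pond at 35 permille losing 1% of its volume per day.
freshwater_refill = simulate_pond_salinity(35.0, 0.0, 0.01, 10)
seawater_refill = simulate_pond_salinity(35.0, 35.0, 0.01, 10)
```

Under these assumptions, freshwater make-up holds salinity constant, whereas seawater make-up raises it by a fixed increment with each cycle, which matches the gradual increase described above.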

Salinity stress on eukaryotic microalgae

Microalgae are ubiquitous in marine and freshwater environments and, as such, are exposed to a variety of salinity stresses, often coupled with osmotic and desiccation stress. Sources of these stresses include river and tidal influx into estuaries, precipitation and evaporation in ponds or terrestrial environments, diel exposure to salinity and desiccation stress at low and high tide in intertidal zones, wind-driven salinity fluctuations near the coast, and salinity changes in brine pockets during sea ice freezing and thawing cycles (10).

The majority of the algal salinity response literature focuses on the cell wall-less halotolerant green alga Dunaliella. In Dunaliella, the immediate responses to salt stress are biophysical and independent of gene expression. These include immediate loss of water and shrinking of the cell under hyperosmotic conditions, followed by the passive influx of external ions and reuptake of water. However, the resulting internal excess of ions inhibits many cellular functions, including photosynthesis and translation (11). The short-term response involves the synthesis or expulsion of osmolytes under hyper- and hyposaline conditions, respectively. Osmolytes are small organic compounds of low molecular weight that are involved in osmoacclimation and do not disrupt cellular processes. This response occurs within 2-3 hours in Dunaliella. The acclimated response, which involves the accumulation of stress-induced proteins, starts around 12 hours after initial salt exposure in Dunaliella (12).

The cell membrane is an integral component of halotolerance because it acts as a barrier to solutes entering or exiting the cell and may be involved in osmotic sensing. Salinity affects the plasma membrane of cells in many ways: hyperosmotic stress increases cell rigidity and causes shrinking, while hypoosmotic stress leads to an increase in cell volume and membrane fluidity (13). In the cell wall-less Dunaliella, the ability to adjust fatty acid composition and organization, thereby maintaining membrane fluidity, is also an important factor in salt tolerance. The molecular responses to hypersalinity stress include the induction of a fatty acid elongase involved in fatty acid elongation, ultimately leading to desaturation in order to maintain membrane fluidity (14); changes in membrane sterols that may also be involved in osmotic signaling (15); changes in membrane lipid order to maintain elasticity (16); and membrane reservoirs that allow the expansion and contraction of the cell membrane without apoptosis (17).

While not much is known about salinity sensing in microalgae, membrane mechanosensitive ion channels or cytoskeletal elements play a role in sensing changes in turgor pressure in higher plants (18). The retraction of the cell membrane from the cell wall is one way to maintain membrane integrity and has been noted in the aeroterrestrial green alga Zygnema under osmotic stress as well as in Asterochloris erici under desiccation stress (19, 20). It has been suggested that cell walls may play a role in osmotic stress tolerance by forming a rigid protective layer that resists water loss; alternatively, cell wall elasticity may be an important strategy for maintaining cell integrity in streptophytes and higher plants (21, 22).

Microalgae apply a ‘salt-out strategy’ that excludes salt from the cytoplasm through enhancement of membrane transporters, in contrast to halophilic prokaryotes, which utilize a ‘salt-in strategy’ in which they accumulate KCl to maintain ionic and turgor pressure (23). Salt Overly Sensitive (SOS) transporters involved in sodium extrusion in Arabidopsis are also involved in the salinity response in Picochlorum. Other enhanced components include carbonic anhydrases in Dunaliella and Picochlorum and an iron transporter in Dunaliella, which are potentially involved in overcoming carbon dioxide and iron limitations under high salinity (24-26). In Dunaliella, nitrate transporters are upregulated under high salinity and are coupled to the sodium gradient rather than the proton gradient (27). Dunaliella and Porphyra purpurea may also sequester salt in vacuoles, similar to higher plants (10).

Microalgae accumulate a variety of compatible osmolytes, often more than one in the same organism. Osmolytes include glycerol in Dunaliella and Phaeodactylum, and betaine or proline in Picochlorum and Fragilariopsis cylindrus (28, 29). Dimethylsulphoniopropionate (DMSP) uptake and rapid expulsion into the environment by Cylindrotheca closterium or Phaeodactylum under salinity shock suggest that it can also serve in osmoacclimation (29, 30). Osmolytes are formed from glucose generated either by photosynthesis or by starch degradation (31). Starch degradation frees up resources for the carbon pool. Carbon flux is redirected to the production of compatible osmolytes, such as glycerol in Dunaliella and proline in Picochlorum, and energy is utilized for transporters involved in ion exclusion. Salinity tolerance is an energy-intensive process in which energy is diverted from growth to maintaining homeostasis; therefore, pathways such as carbohydrate metabolism are differentially expressed (26, 32).

Salinity stress is associated with the generation of reactive oxygen species (ROS); antioxidant responses therefore increase in Dunaliella, including ascorbate and glutathione peroxidases and alpha-tocopherol (33). Antioxidant upregulation is also correlated with salinity stress in Chlamydomonas (34).

Under stress, including salinity stress, some algae such as Chlamydomonas and Dunaliella lose their flagella or flagellar activity and form aggregates of cells, called palmelloids or palmella, that are surrounded by an exopolysaccharide matrix that reduces ion influx. These structures dissociate upon removal of the stress, accompanied by recovery of flagella (35, 36). Extracellular polysaccharide substances (EPS) were found to protect the photosynthetic apparatus and resulted in enhanced viability and maintenance of Fv/Fm under hypersaline conditions in Cylindrotheca closterium (37).

Salinity may inhibit photosynthesis and the repair of photosystem II after photoinhibition (38). Photosynthesis is reduced while photorespiration is increased under salt stress. Energy is also utilized under salinity stress to repair photosystem II in order to resume photosynthesis. Kim and co-authors showed that many photosynthesis-related proteins were downregulated under both high and low salinity stress in Dunaliella, and that photosynthetic efficiency (Fv/Fm) was reduced under low salinity stress (39). Reduced Fv/Fm was also noted in Fragilariopsis cylindrus (28). Light stress in combination with salinity stress results in enhanced photoinhibition in Chlamydomonas and sea ice algae (38, 40). Carotenoids such as β-carotene in Dunaliella or lutein in Botryococcus braunii accumulate under salinity stress and may be involved in photoprotection; carotenoid accumulation was also seen in Picochlorum (41, 42).

Gene and protein expression changes in microalgae are similar to the response in higher plants. Genes involved in protein folding, including chaperones, are also commonly expressed in response to the protein denaturation that can occur at high salt concentrations. CO2 availability decreases with increased salinity, correlating with the upregulation of carbonic anhydrases as a counter-response to convert bicarbonate into CO2 (25, 26, 43). Long-term effects of salinity generally include increased respiration and a reduced growth rate in favor of maintaining cell homeostasis.

Picochlorum as a biofuel candidate and model to study salinity stress

Bioprospecting for unique, robust, and highly efficient strains is the first step towards developing a suitable biofuel candidate through downstream genetic manipulation or adaptation for increased yield. The species we have chosen to investigate as a potential biofuel candidate, Picochlorum sp. strain SENEW3 (SENEW3), was selected because it is highly robust in the face of environmental fluctuations, specifically salt stress. Picochlorum SENEW3 is a small coccoid green alga (Trebouxiophyceae, Chlorophyta) that is about 2-3 µm in diameter. It was isolated from a small permanent pond in the San Elijo Lagoon system, one of the largest coastal wetlands in San Diego, California. The lagoon is a shallow-water estuary, and therefore a brackish environment, and is subject to large fluctuations in salinity through evaporation, precipitation, and tidal influx of seawater. Salinities range from 108.3‰ in the dry season to freshwater levels (1.7‰) in the rainy winter season. Nutrient levels (phosphate, nitrate, nitrite, ammonium) vary substantially as well. As is commonplace in brackish environments, this pond exhibits low species diversity. The plankton is typically dominated throughout the year by three major species: the green alga Picocystis sp., a diatom Chaetoceros sp., and Picochlorum SENEW3. Of the three, Picochlorum SENEW3 was found to have the broadest range of salinity tolerance and was present year-round in the pond, whereas Picocystis had the highest salinity tolerance. Picochlorum SENEW3 grows above 16 °C and has a reduced growth rate above 32 °C. Pigment analysis indicates that it contains the carotenoids violaxanthin and zeaxanthin. Picochlorum SENEW3 also shows significant lipid body accumulation under nitrogen limitation (Figure 1.1) (44).

Figure 1.1 UV images of Picochlorum SENEW3 stained with BODIPY. Green fluorescence indicates neutral lipids (44).

Studies have assessed the potential of various Picochlorum species as candidates for biofuel production. These studies report doubling times between 36 - 48 hours and maximum biomass concentration 1.8-2.1 g/L (45, 46). Zhu et al. showed that

Picochlorum oklahomensis can be easily harvested with common methods of flocculation such as pH adjustment and chitosan addition (46). Most studies reported lipid content around 20-25% (10% fatty acids), however lipid profiles differ vastly between species of the , and between Picochlorum and other microalgae (45-47). Picochlorum species also produce several carotenoids that can be used to produce high value co-products, such as eye vitamin supplements and other dietary supplements (44-46). Additionally,

Picochlorum soloecismus has been found to be amenable to genetic manipulation in order to increase lipid production (48).

The Picochlorum lineage has evolved halotolerance from a freshwater ancestral state (49). Halotolerance within the genus varies, with Picochlorum SENEW3 and P. oklahomensis capable of tolerating the hypersaline conditions of a brackish lagoon and a salt plains environment, respectively. Picochlorum species also have highly reduced, specialized genomes, making this lineage a good model to address evolutionary questions about a complex trait like halotolerance.

Scope of the thesis

Using Picochlorum SENEW3 as a model, my thesis aims to explore two basic questions of broad ecological importance: How do microalgae adapt to environmental stresses such as salinity stress? How does habitat-driven adaptation shape small genomes like that of Picochlorum? Addressing these questions from a genomics perspective will provide a better understanding of the molecular mechanisms of salinity and stress tolerance.

Studying the evolution of halotolerance is fundamentally interesting because it is a complex response that employs a variety of mechanisms to adapt to salt stress. Chapter 1 involves the sequencing and characterization of a new green algal isolate, Picochlorum SENEW3, generating hypotheses on salinity tolerance mechanisms and metabolic flexibility. Chapter 2 establishes a model for the salinity shock response in Picochlorum SENEW3 through transcriptome analysis, suggests that genome organization is important for a rapid coordinated response to salinity stress, and explores the robust nature of its photosystem II under salinity and high light stress. Finally, Chapter 3 identifies mechanisms of environmental adaptation and evolution of salinity tolerance through a comparative genomics analysis of several Picochlorum species. Utilizing robust genome assemblies from multiple microalgae, this chapter determines the role and extent of HGT in the lineage and examines evolutionary strategies including coordinated gene expression and gene family gain, loss, and expansion.

Chapter 2: Genome of the halotolerant green alga Picochlorum sp. reveals strategies for thriving under fluctuating environmental conditions

Fatima Foflonker1, Dana C. Price2, Huan Qiu3, Brian Palenik4, Shuyi Wang4 and Debashish Bhattacharya1

1Departments of Biochemistry and Microbiology, 2Plant Biology, 3Ecology, Evolution and Natural Resources, Rutgers University, New Brunswick, NJ 08901, USA. 4Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA 92093, USA.

Abstract

An expected outcome of climate change is intensification of the global water cycle, which magnifies surface water fluxes, and consequently alters salinity patterns. It is therefore important to understand the adaptations and limits of microalgae to survive changing salinities. To this end, we sequenced the 13.5 Mbp genome of the halotolerant green alga Picochlorum SENEW3 (SENEW3) that was isolated from a brackish water pond subject to large seasonal salinity fluctuations. Picochlorum SENEW3 encodes 7367 genes, making it one of the smallest and most gene dense eukaryotic genomes known.

Comparison with the pico-prasinophyte Ostreococcus tauri, a species with a limited range of salt tolerance, reveals the enrichment of transporters putatively involved in the salt stress response in Picochlorum SENEW3. Analysis of cultures and the protein complement highlights the metabolic flexibility of Picochlorum SENEW3, which encodes genes involved in urea metabolism, acetate assimilation and fermentation, acetoin production and glucose uptake, many of which form functional gene clusters. Twenty-four cases of horizontal gene transfer from bacterial sources were found in Picochlorum SENEW3, with these genes involved in stress adaptation including osmolyte production and growth promotion. Our results identify Picochlorum SENEW3 as a model for understanding microalgal adaptation to stressful, fluctuating environments.

Introduction

Climate change is expected to intensify changes in the water cycle at the rate of a 7% increase in intensity per degree Kelvin of warming (50). Increased evaporation and precipitation, caused by warmer waters and the ability of warmer air to retain more moisture, are the major driving forces in this cycle (51). The predicted magnification of surface water fluxes from evaporation and precipitation correlates closely with changing salinity patterns (52). Salt concentration in water also affects its density and thereby the vertical mixing patterns of water (53). In addition to the challenge of adapting to salinity variation, phytoplankton communities will also face differences in nutrient and light availability due to changes in turbulence (54).

Picochlorum sp. strain SENEW3 (here SENEW3) is potentially a highly useful model to understand the effects of salinity stress on microalgae because of its wide range of salt tolerance. Picochlorum SENEW3 is a tiny, coccoid (i.e., non-motile) green alga (Trebouxiophyceae, Chlorophyta) that is 2-3 µm in cell diameter. It was isolated from a small permanent pond in the San Elijo Lagoon system in San Diego County, California. The pond is subject to large seasonal fluctuations in salinity (1.7-108.3‰ [i.e., parts per thousand]) via evaporation, precipitation, and tidal influx of seawater. Laboratory studies have confirmed the wide salt tolerance range of Picochlorum SENEW3, which extends from at least 3.5 to 105‰ (55). In comparison, other Picochlorum species grow maximally to ca. 75‰ salinity, whereas species from a freshwater sister clade to Picochlorum grow in salinities up to 25‰ (49). Picochlorum SENEW3 tolerates temperatures above 16°C but exhibits a reduced growth rate above 32°C. Carotenoid production and significant lipid body accumulation under nitrogen limitation suggest that Picochlorum SENEW3 may be a promising species for commercial algal biomass applications (55).

Here we report the genome sequence of the natural isolate, Picochlorum

SENEW3. We analyze possible mechanisms of adaptation to salt stress through comparisons of metabolite transporters, identify genome regions of functionally clustered genes, and investigate the role of horizontal gene transfer (HGT) in potentially enhancing the stress tolerance capabilities of this free-living green alga.

Results

Genome features and phylogeny

A total of 830 Mbp of paired-end (2 x 150bp) Illumina sequence data were generated from Picochlorum SENEW3 using the Illumina MiSeq Personal Genome Sequencer, and 98.3% of reads mapped to the assembled contigs. The assembly comprised 1,266 contigs with an N50 of 124.5 Kbp and an average genome coverage of 62x (52x median coverage). A total of 2.07 Gbp of RNA-seq data from this species were used to train the ab initio gene predictor Augustus (Stanke and Morgenstern, 2005), resulting in high quality gene models for downstream analysis. Our data show that the 13.5 Mbp nuclear genome encodes 7,367 protein-coding genes, with 5,795 introns, a G+C content of

46.1%, and a gene density of 1.8 Kbp/gene. These values are comparable to

Ostreococcus tauri (i.e., 12.6 Mbp genome; 1.6 Kbp/gene). Of the 458 shared core genes compiled in the Core Eukaryotic Genes Mapping Approach (CEGMA; http://korflab.ucdavis.edu/datasets/cegma/) database, 454 (99%) are present in the

Picochlorum SENEW3 draft genome suggesting a complete assembly. Putative annotations of all Picochlorum SENEW3 predicted proteins, their contig of origin, top database hit, and other attributes are presented in Table 2.S1. A maximum likelihood tree inferred from a concatenated alignment of 480,102 amino acids unambiguously places

Picochlorum SENEW3 as sister to Chlorella variabilis within Chlorophyta (100% bootstrap support) and reveals that its average protein evolutionary divergence rate (i.e., branch length) is elevated since its split from C. variabilis (Fig. 2.1). Nested within two other Trebouxiophyceae that have much larger genomes (e.g., 46 Mbp, 49 Mbp), the relatively small genome of Picochlorum SENEW3 likely indicates genome reduction in this taxon (Fig. 2.1). Streamlined genomes are characteristic of fast-evolving species that live in specialized ecological niches or in extreme environments (e.g., (56-59)).
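The summary statistics quoted in this section can be checked with simple arithmetic. The short Python sketch below recomputes the gene density and the CEGMA completeness from the values given in the text; the function and variable names are illustrative assumptions and do not correspond to any script used in the original analysis.

```python
# Values taken from the text above; names are illustrative only.
GENOME_SIZE_BP = 13.5e6      # nuclear genome size (13.5 Mbp)
N_GENES = 7367               # predicted protein-coding genes
CEGMA_FOUND = 454            # core eukaryotic genes detected
CEGMA_TOTAL = 458            # core eukaryotic genes in the CEGMA set


def gene_density_kbp(genome_bp, n_genes):
    """Average genomic span per gene, in Kbp/gene."""
    return genome_bp / n_genes / 1e3


def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


density = gene_density_kbp(GENOME_SIZE_BP, N_GENES)
completeness = percent(CEGMA_FOUND, CEGMA_TOTAL)

print(f"gene density: {density:.2f} Kbp/gene")
print(f"CEGMA completeness: {completeness:.1f}%")
```

Rounded to one decimal place, the density agrees with the 1.8 Kbp/gene reported above, and the completeness rounds to the reported 99%.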

Clusters of functionally related genes

Recent evidence suggests that eukaryotic gene order may not be random, but rather some groups of functionally related genes form gene clusters (60). Here we identified gene clusters involved in a shared metabolic pathway, while allowing for the presence of intervening genes with unknown or un-related functions. A total of 5,795 Picochlorum

SENEW3 proteins were mapped to 482 pathways annotated in the Unipathway database.

This analysis resulted in a list of 633 proteins with BLASTp hits to Unipathway that were manually examined for evidence of functional clustering.
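The clusters described here were identified by manual inspection, but the underlying idea (genes from the same pathway lying close together on a contig, with a few unrelated genes tolerated in between) can be sketched programmatically. The following Python example is a minimal illustration under stated assumptions: the input format, the `max_intervening` threshold, and all gene and pathway labels are hypothetical and are not taken from the original analysis.

```python
from collections import defaultdict


def find_pathway_clusters(genes, max_intervening=2, min_size=2):
    """Group same-pathway genes that lie close together on a contig.

    genes: iterable of (contig, order_index, gene_id, pathway_or_None),
    where order_index is the position of the gene along the contig.
    Returns a list of (contig, pathway, [gene_ids]) tuples.
    """
    by_key = defaultdict(list)
    for contig, idx, gene_id, pathway in genes:
        if pathway is not None:          # unannotated genes only act as spacers
            by_key[(contig, pathway)].append((idx, gene_id))

    clusters = []
    for (contig, pathway), members in sorted(by_key.items()):
        members.sort()
        current = [members[0]]
        for prev, nxt in zip(members, members[1:]):
            # number of genes lying between two annotated pathway members
            if nxt[0] - prev[0] - 1 <= max_intervening:
                current.append(nxt)
            else:
                if len(current) >= min_size:
                    clusters.append((contig, pathway, [g for _, g in current]))
                current = [nxt]
        if len(current) >= min_size:
            clusters.append((contig, pathway, [g for _, g in current]))
    return clusters


# Toy data (hypothetical contigs, genes, and pathway labels).
toy_genes = [
    ("c1", 0, "a", "urea"), ("c1", 1, "b", None), ("c1", 2, "c", "urea"),
    ("c1", 3, "d", "urea"), ("c1", 9, "e", "urea"),
    ("c2", 0, "f", "acetate"), ("c2", 2, "g", "acetate"),
    ("c2", 7, "h", "nitrate"),
]
toy_clusters = find_pathway_clusters(toy_genes)
print(toy_clusters)
```

In the toy data, gene `e` is excluded from the urea cluster because five genes separate it from `d`, which exceeds the assumed tolerance of two intervening genes.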

One interesting cluster we uncovered contains genes involved in urea uptake and degradation (Picochlorum_contig_54.g177.t1 – Picochlorum_contig_54.g180.t1 [Table

2.S1]). In contrast to the major route of urea degradation to carbon dioxide and ammonia by the nickel-containing urease present in green algae such as Ostreococcus and

Micromonas species, as well as land plants (58), Picochlorum SENEW3 and some green algae and fungi, including C. reinhardtii and C. vulgaris, use a two-step process involving an ATP-dependent urea carboxylase and allophanate hydrolase/amidase (61,

62). In green algae, these two enzymes are encoded by genes in close proximity, and are

673 bp apart in Picochlorum SENEW3 (Fig. 2.2; (63)). We did not detect any genes that encode subunits of the nickel-dependent urease complex in the Picochlorum SENEW3 genome. The Picochlorum SENEW3 urease gene cluster also includes a high affinity urea:Na+ symporter similar to DUR3 in Arabidopsis thaliana (AtDUR3) that is involved in import of urea during nitrogen starvation. This is the sole urea transporter identified in the Picochlorum SENEW3 genome. The clustering of these genes (not the case in

Ostreococcus) is not surprising because exogenous urea is an important nitrogen source to support amino acid biosynthesis in phytoplankton (64). Consistent with these results, the major nitrate assimilation cluster found in Ostreococcus (57) is also largely conserved in Picochlorum SENEW3 (see Table 2.S2).

Another pathway that shows evidence of clustering is the acetate assimilation pathway that leads to acetyl-CoA biosynthesis. Two genes encoding acetate kinase and phosphate acetyltransferase are located in close proximity

(Picochlorum_contig_155.g703.t1 and Picochlorum_contig_155.g705.t1). These genes are also present in C. reinhardtii but not in Ostreococcus and function in acetate assimilation or, in the reverse direction, generate ATP through fermentation (65). This evidence suggests that Picochlorum SENEW3 may be able to utilize acetate as a carbon source, enhancing its metabolic flexibility, and may be capable of energy generation through fermentation during anoxic conditions. Also noteworthy is a cluster of two genes encoding alpha-acetolactate decarboxylase and acetolactate synthase

(Picochlorum_contig_58.g128.t1 and Picochlorum_contig_58.g129.t1) that are involved in the conversion of pyruvate to acetoin as part of the (R,R)-butane-2,3-diol biosynthesis pathway. These genes have homologs in C. variabilis and appear to have a bacterial origin, but are absent in other , including Ostreococcus. In some bacteria, these two genes form an operon (alsSD operon in Bacillus subtilis) and are involved in the fermentative production of acetoin, a neutral four-carbon molecule that serves to maintain cellular pH levels, regulate NAD/NADH ratios, and act as a carbon storage molecule that can be excreted or reutilized during stationary phase (66-68). Acetolactate synthase also catalyzes the first step in the biosynthesis of the branched chain amino acids leucine, isoleucine, and valine (67). The genome of Picochlorum SENEW3 does not, however, encode alsR, the transcription factor essential for the transcription of the B. subtilis alsSD operon, nor does it include (R,R)-butanediol dehydrogenase, the enzyme responsible for the reduction of acetoin to 2,3-butanediol, a common fermentation product of industrial importance in bacteria (69, 70). Other clusters of functionally related genes are listed in Table 2.1.

We also used the program C-Hunter (71) to identify functional clusters based on Gene Ontology (GO) terms. Several clusters of 5-8 genes were identified that are involved in the following functions: response to abiotic stimulus, transferase activity, hydrolase activity, and nucleotide binding. Clusters of three or fewer genes included those involved in biotin synthesis, the citric acid cycle, inorganic phosphate transport, histone proteins, and response to abscisic acid, a stress indicator (results not shown). These results provide evidence for the clustering of functionally related genes in Picochlorum SENEW3 and suggest that some of the clusters we identified likely play a role in environmental adaptation. Finally, we note that due to the fragmented nature of the genome assembly and the limited number of genes that have pathway annotations, the gene clusters we have identified likely represent an underestimate of the true number. A more complete assembly is likely to physically connect additional genes with shared pathway functions into single contigs.

Transporter analysis

The number and type of metabolite transporters were compared between Picochlorum SENEW3 and Ostreococcus tauri, a species with limited halotolerance. We identified putative membrane transport proteins and classified them based on sequence similarity (BLASTp, E-value cutoff ≤ 1 × 10⁻¹⁰) using the Transporter Classification Database (TCDB; http://www.tcdb.org/) (72). This resulted in the identification of 719 putative membrane transport proteins in Picochlorum SENEW3 that were categorized into 124 families, representing 9.8% of the gene inventory. Fewer transport proteins were identified in O. tauri: 660 proteins in 112 families, comprising 8.3% of the gene inventory. The most common transporter proteins in Picochlorum SENEW3 belong to the ATP-binding cassette (ABC) superfamily (60 proteins), followed by the nuclear mRNA exporter (mRNA-E) family (39), and the peroxisomal protein importer (PPI) family (37) (see Fig. 2.3).
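The classification step described above (keeping BLASTp hits at or below the E-value cutoff and tallying proteins by transporter family) can be illustrated with a short Python sketch. This is not the script used in the original analysis: the input tuples, the best-hit rule, and the toy hits are assumptions made for illustration, and the family is taken here as the first three fields of the TC number.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-10   # cutoff stated in the text


def tc_family(tc_number):
    """Reduce a five-field TC number (e.g. 2.A.36.7.3) to its family (2.A.36)."""
    return ".".join(tc_number.split(".")[:3])


def count_transporter_families(hits, cutoff=E_VALUE_CUTOFF):
    """Tally query proteins per TC family using each query's best hit.

    hits: iterable of (query_id, tc_number, e_value) tuples, assumed to have
    been parsed beforehand from tabular BLASTp output against TCDB.
    """
    best = {}
    for query, tc, evalue in hits:
        if evalue > cutoff:
            continue                      # hit too weak, ignore
        if query not in best or evalue < best[query][1]:
            best[query] = (tc, evalue)
    return Counter(tc_family(tc) for tc, _ in best.values())


# Toy hits (hypothetical protein IDs and E-values).
toy_hits = [
    ("p1", "2.A.36.7.3", 1e-50),
    ("p1", "2.A.37.1.1", 1e-20),   # weaker second hit for the same protein
    ("p2", "3.A.1.201.7", 1e-12),
    ("p3", "2.A.36.7.3", 1e-11),
    ("p4", "1.A.1.2.3", 1e-5),     # fails the cutoff
]
family_counts = count_transporter_families(toy_hits)
print(family_counts)

# Share of the gene inventory made up by transporters, from the text.
transporter_share = 100.0 * 719 / 7367
print(f"{transporter_share:.1f}% of predicted genes")
```

The final calculation reproduces the 9.8% figure from the 719 transporters and 7,367 predicted genes reported above.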

Picochlorum SENEW3 and O. tauri share a set of 165 distinct transporter classification (TC) numbers, making up a core set of transporter functions. The TC number distinguishes transport proteins at the level of subfamily and substrate range. Picochlorum SENEW3 encodes 533 proteins with these shared TC numbers, whereas O. tauri encodes 508, indicating gene expansion in the Picochlorum SENEW3 genome. Putative functional annotations of the overrepresented proteins in the Picochlorum SENEW3 genome include the mitochondrial protein translocase family (3.A.8.1.1) and the sodium/hydrogen (AtNHX8) exchanger (2.A.36.7.3). Picochlorum SENEW3 had 175 transport proteins with 35 distinct TC numbers that were not present in O. tauri. The most abundant transporters found only in Picochlorum SENEW3 include general amino acid transporters (AAP3) (2.A.18.2.3) and the multidrug resistance protein 4 involved in efflux of drugs and signaling molecules (3.A.1.201.7). Protein families overrepresented in O. tauri include the Resistance-Nodulation-Cell Division (RND) Superfamily (2.A.6) that functions in drug and lipid efflux, the Voltage-gated Ion Channel (VIC) Superfamily (1.A.1), and potassium transport related proteins. These data show generally that both Picochlorum SENEW3 and Ostreococcus contain a large number of membrane transporters, many of which are shared and some of which are unique to each taxon. The latter likely reflect adaptations to their different environments, although this needs to be tested using functional analyses.
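The comparison of shared and taxon-specific TC numbers amounts to set operations on the two annotation tables. The Python sketch below shows one way this could be done; it assumes each species is represented as a mapping from protein ID to a single TC number, and the toy protein IDs and assignments are hypothetical rather than drawn from the actual annotations.

```python
def compare_tc_numbers(proteins_a, proteins_b):
    """Compare two species given dicts mapping protein ID -> TC number."""
    tcs_a = set(proteins_a.values())
    tcs_b = set(proteins_b.values())
    shared = tcs_a & tcs_b
    return {
        "shared_tc": shared,
        "only_a_tc": tcs_a - tcs_b,
        "only_b_tc": tcs_b - tcs_a,
        # proteins per species that carry one of the shared TC numbers
        "a_proteins_in_shared": sum(tc in shared for tc in proteins_a.values()),
        "b_proteins_in_shared": sum(tc in shared for tc in proteins_b.values()),
    }


# Toy annotation tables (hypothetical protein IDs and assignments).
toy_a = {"a1": "2.A.36.7.3", "a2": "2.A.36.7.3",
         "a3": "3.A.8.1.1", "a4": "2.A.18.2.3"}
toy_b = {"b1": "2.A.36.7.3", "b2": "3.A.8.1.1"}

comparison = compare_tc_numbers(toy_a, toy_b)
print(comparison)
```

In this toy case species A has three proteins in the two shared TC numbers against two in species B, which is the kind of difference interpreted above as gene family expansion.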

Nonetheless, it is likely that some of the transport proteins contribute to the broad range of salt tolerance in Picochlorum SENEW3 (Fig. 2.4). For example, this alga encodes six copies of the AtNHX8/salt overly sensitive 1 (SOS1) gene (compared to one in O. tauri), which encodes a plasma membrane Na+/H+ antiporter involved in the extrusion of sodium from the cell that is essential for salt tolerance (73, 74). Sodium extrusion via the Na+/H+ antiporter is coupled to an H+ gradient formed by an H+-ATPase. Picochlorum SENEW3 also contains one gene annotated as a subunit of Na+/K+-ATPase, involved in the ATP-dependent active extrusion of sodium from the cell, which is particularly useful under high pH conditions when the export of sodium via the Na+/H+ antiporter is rendered ineffective (75). Thought initially to be exclusive to animals, homologs of the Na+/K+-ATPase have been reported in some algae including Dunaliella salina, Heterosigma akashiwo, and Porphyra yezoensis (76-78). Also present is NHX1, a Na+ (K+)/H+ antiporter localized in the vacuolar membrane that is involved in the vacuolar accumulation of K+ for osmotic adjustment (79). NHX1, similar to SOS1, is also driven by a proton gradient formed by the vacuolar H+-ATPase and H+-translocating inorganic pyrophosphatase (80).

Other proteins found in Picochlorum SENEW3 but not in O. tauri include three copies of the mechanosensitive channel 1 (MSC1), likely located in the chloroplast. Mechanosensitive channels are present in most prokaryotes and sense changes in the membrane, and are often involved in sensing osmotic stress (81). Picochlorum SENEW3 also has two inward rectifying potassium channels (IRK). Maintaining a high intracellular potassium level is one strategy used to reduce the toxic effects of Na+ on cells (82). Picochlorum SENEW3 contains several more amino acid permeases than O. tauri. These are primarily the general amino acid permease 3 (AAP3) involved in the transport of neutral and acidic amino acids. Amino acids and other nitrogen-containing compounds accumulate in cells as osmolytes in response to salt stress (83).

Other environmental adaptations include genes involved in metal uptake. Picochlorum SENEW3 has several additional transporters for zinc and other heavy metals including iron and magnesium in the Zinc (Zn2+)-Iron (Fe2+) Permease (ZIP) Family (2.A.5), the Putative Tripartite Zn2+ Transporter (TZT) Family (9.B.10), and the CorA Metal Ion Transporter (MIT) Family (1.A.35); there are 23 genes in these three categories in Picochlorum SENEW3 compared to 10 in O. tauri. Picochlorum SENEW3 has an abundance of ABC transporters, many of which are multidrug resistance proteins. In terms of metabolic flexibility, Picochlorum SENEW3 contains seven hexose transporters including one that is homologous to the Hup1 glucose transporter in Chlorella kessleri (84), consistent with the reported mixotrophic capabilities of Picochlorum strain S1b in the presence of glucose (85).

Growth rates in the presence of organic carbon sources

Given the discovery of putative glucose transporters in the Picochlorum SENEW3 genome, we tested the impact of glucose on cell growth. For this experiment, we raised Picochlorum SENEW3 under different levels of salt stress, added glucose to the medium, and then measured the growth rate (Fig. 2.5). This analysis reveals suppressed growth rates and longer acclimation periods between 1.4 and 1.6 M NaCl, with no growth observed at 1.8 M NaCl under the laboratory conditions used (see Experimental Procedures). Mixotrophic growth with the addition of glucose under 1.5 M NaCl stress showed increased maximum cell density with increasing glucose concentrations. No evidence of heterotrophic growth was observed with the addition of glucose in the dark. Mixotrophic growth on glucose has also been shown to increase growth rates in Picochlorum S1b and Chlorella vulgaris (85, 86). Unlike Picochlorum SENEW3 and C. vulgaris, C. reinhardtii lacks hexose transporters (87). These results are consistent with our comparative genomic analysis, suggesting that glucose may be taken up and metabolized by the cell, thereby partially mitigating the effects of high salt stress. Preliminary culture experiments in which acetate was added to the medium show that Picochlorum SENEW3, similar to C. reinhardtii, can grow mixotrophically in the presence of this organic carbon source (Foflonker and Bhattacharya, unpublished results).

HGT analysis

We investigated the extent of HGT in the Picochlorum SENEW3 genome using an automated phylogenomic pipeline (88). Here we focused on inter-domain HGT because of the greater sampling depth of prokaryote genomes and their large phylogenetic distance from Picochlorum SENEW3, which provides a clear signal of foreign gene acquisition. We generated 5,871 maximum likelihood (Randomized Axelerated Maximum Likelihood [RAxML]; (89)) protein trees (containing >3 phyla) using the

Picochlorum SENEW3 predicted proteins as the query. These trees were sorted using the program PhyloSort (90) to search for cases of monophyly with Bacteria, Archaea, or

Vira. Phylogenies of interest were then manually examined to identify candidates for

HGT with >60% bootstrap support for the sister-group relationship between Picochlorum

SENEW3 and prokaryotes or trees that included only prokaryotes with the Picochlorum

SENEW3 protein (Fig. 2.6). Given the rampant history of HGT among prokaryotes and the relatively rich green algal/plant database, we presumed that the absence of eukaryotic proteins in the latter trees (except for Picochlorum SENEW3) was sufficient evidence to implicate HGT.
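The screening criteria described in this section (trees with more than three phyla, and either a well-supported sister relationship to prokaryotes or a tree containing only prokaryotes besides the query) can be summarized as a simple filter. The Python sketch below is illustrative only: it does not reproduce PhyloSort or the manual inspection step, and the `TreeSummary` record and its fields are hypothetical stand-ins for whatever summary is extracted from each tree.

```python
from dataclasses import dataclass

PROKARYOTIC_DOMAINS = {"Bacteria", "Archaea"}


@dataclass
class TreeSummary:
    """Hypothetical per-tree summary; field names are assumptions."""
    query: str
    n_phyla: int              # number of phyla represented in the tree
    sister_domain: str        # domain of the sister group to the query
    sister_bootstrap: float   # bootstrap support (%) for that relationship
    other_eukaryotes: int     # eukaryotic sequences besides the query


def is_hgt_candidate(tree, min_phyla=4, min_bootstrap=60.0):
    """Apply the two criteria from the text to one tree summary."""
    if tree.n_phyla < min_phyla:              # text requires >3 phyla
        return False
    only_prokaryotes = tree.other_eukaryotes == 0
    supported_sister = (tree.sister_domain in PROKARYOTIC_DOMAINS
                        and tree.sister_bootstrap > min_bootstrap)
    return only_prokaryotes or supported_sister


# Toy summaries (hypothetical values).
toy_trees = [
    TreeSummary("q1", 5, "Bacteria", 85.0, 3),    # supported sister group
    TreeSummary("q2", 5, "Eukaryota", 95.0, 10),  # vertical inheritance
    TreeSummary("q3", 4, "Bacteria", 40.0, 0),    # prokaryote-only tree
    TreeSummary("q4", 3, "Bacteria", 99.0, 0),    # too few phyla
    TreeSummary("q5", 6, "Archaea", 55.0, 2),     # weak support
]
candidates = [t.query for t in toy_trees if is_hgt_candidate(t)]
print(candidates)
```

Any tree passing such a filter would still need manual examination, as was done in the analysis described above.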

Using this approach, we found 24 instances of HGT unique to Picochlorum SENEW3 (i.e., not found in any other sequenced green alga) of prokaryotic, mainly bacterial, origin (Table 2.2). This can be compared to ca. 74 genes of putative prokaryotic origin in the Bathycoccus prasinos genome (91). Fifteen of the 24 are expressed and have at least 20X EST coverage (Table 2.2) under standard culture conditions (see Experimental Procedures). Interestingly, many of the 24 genes in Picochlorum SENEW3 have functions potentially related to stress adaptation and the majority are related to carbohydrate metabolism. Most are glycosyltransferases, glycoside hydrolases, and polysaccharide lyases that function in polysaccharide synthesis and in the degradation of polysaccharides into their sugar moieties. The polysaccharide-degrading enzymes gained, including a cellulase, may function in cell wall recycling or remodeling, or may be excreted from the cell and function in nutrient acquisition, thereby providing metabolic flexibility (92). Other genes may be involved in cell wall synthesis, for example in the GDP-mannose to GDP-fucose pathway; both genes in this pathway (GDP-mannose-4,6-dehydratase and GDP-L-fucose synthase) appear to have a bacterial origin. In Arabidopsis, GDP-mannose-4,6-dehydratase mutants are deficient in L-fucose, a precursor of the cell wall constituent rhamnose, leading to weakened cell walls resulting in stunted growth (93, 94). Also noteworthy are several genes of suspected HGT origin involved in carbohydrate modifications in the cell wall in the green algae C. variabilis and Ostreococcus lucimarinus (57, 95). Several glycosyltransferases of foreign origin involved in cell surface protein modifications were identified in the cyanobacterium Synechococcus, and hypothesized to function as a predation avoidance mechanism (96). Glycosylation was also noted as an enriched category of genes of both prokaryotic and non-Viridiplantae eukaryotic origin in B. prasinos (91). Taken together, these data suggest that HGT in green algae has repeatedly conferred genes involved in cell wall and cell surface modifications.

Several of the HGT-derived genes may contribute to the broad salt tolerance properties of Picochlorum SENEW3 (Fig. 2.7). A gene encoding glycerol dehydrogenase is involved in the synthesis of glycerol, a common osmolyte involved in osmoregulation during salt stress. Proteases, including a glutamyl endopeptidase known to be induced during salt stress, break peptide bonds, thereby freeing amino acids like glutamate, which may act as osmolytes (97).

Other HGT-derived genes potentially involved in stress adaptation include those involved in sulfate scavenging, growth promoting hormone synthesis, cell cycle control, and DNA methylation. An arylsulfatase gene in Picochlorum SENEW3 is potentially involved in sulfur mineralization by the hydrolysis of sulfate esters to sulfate (98).

Whereas most plants and algae increase inorganic sulfur uptake in response to sulfur limitation stress typical of freshwater environments, periplasmic arylsulfatases provide the means to utilize organic sulfur as an alternative, and are induced in C. reinhardtii and

Volvox carteri under sulfur limitation (99-101). Excess sulfur can be incorporated into sulfur-containing amino acids such as cysteine and methionine or shunted to the synthesis of DMSP, an osmoprotectant that is favored under nitrogen-limiting conditions (102).

Another bacterial gene found in Picochlorum SENEW3 encodes regulator of chromosome condensation (RCC1), a protein that binds the nucleosome and is involved in chromosome segregation during cell division (103). RCC1 may be involved in cell cycle control, which is important in unpredictable environments. It is also among the expanded gene families in the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana (104).

Also of foreign origin is an enzyme involved in the synthesis of a plant growth-promoting hormone, indole-3-acetic acid, typically produced by plants and plant-associated soil bacteria (105). This growth hormone promotes growth in co-cultures of Chlorella vulgaris with the plant growth-promoting rhizobacterium Azospirillum brasilense (106).

Selenoproteins

Selenoproteins are selenocysteine-containing proteins, often oxidoreductases, that are widely distributed in all domains of life. Green algae such as O. tauri and C. reinhardtii contain 26 and 10 selenoproteins, respectively, whereas some higher plants lack selenoproteins (57, 107, 108). Selenocysteine is encoded by UGA (typically read as a stop codon) in the mRNA sequence, and its translation as selenocysteine is dependent on the presence of a SECIS element (selenocysteine insertion sequence) located in the 3’ UTR in eukaryotes (109). We used TBLASTn to search for similarity between Picochlorum SENEW3 proteins and known selenoproteins in O. tauri (57). Top hits were tested for the occurrence of the conserved stem-loop structure of the SECIS element using SECISearch (110), and candidates were then manually verified. Using this approach, surprisingly, no clear evidence was found for the presence of selenoproteins in Picochlorum SENEW3, although our data do not preclude their potential occurrence in the genome.
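Because selenocysteine is specified by an in-frame UGA codon, one simple preliminary check on a candidate coding sequence is to list internal in-frame TGA codons (UGA in the mRNA). The Python sketch below illustrates only this step; it is not the TBLASTn and SECISearch procedure used here, it does not test for a SECIS element, and the example sequence is a made-up toy.

```python
def internal_inframe_tga(cds):
    """Return 0-based codon indices of in-frame TGA codons in a CDS,
    excluding the final codon (the expected terminal stop).

    cds: DNA coding sequence read in frame from its first base.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon == "TGA"]


# Toy sequence: ATG GCT TGA GGA TGA (hypothetical, for illustration only).
toy_cds = "ATGGCTTGAGGATGA"
sec_candidates = internal_inframe_tga(toy_cds)
print(sec_candidates)
```

An internal TGA flagged in this way is only a candidate site; in practice it could also reflect a gene model error or a pseudogene, which is why a SECIS search and manual verification are needed.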

Hydrogenase activity and other genes of interest

The Picochlorum SENEW3 genome contains two hydA-like genes encoding [Fe]-hydrogenase involved in maintaining pH homeostasis while releasing H2 gas in response to anoxic stress. Hydrogen evolution is present in green algae such as C. reinhardtii but is absent from others (e.g., D. salina) and is a target as an alternative energy source (111). Our finding suggests that Picochlorum SENEW3 may have H2-evolution capabilities, adding to its repertoire of stress adaptations. This species may also provide an attractive alternative to C. reinhardtii for H2 production, particularly because a high salt, selective medium could be utilized for cultivation. Finally, similar to C. variabilis NC64A, the genome of Picochlorum SENEW3 contains genes involved in chitin and chitosan biosynthesis (112). A homolog of chitin synthase, four copies of chitin deacetylase, three chitinase genes, and a chitosanase gene were found (see Table 2.S1), suggesting that chitin and chitosan may contribute to the resilience of the Picochlorum SENEW3 cell wall.

Discussion

Microalgae are increasingly being looked upon as indicators of climate change and, as their genomes are being determined and metabolic pathways described, as potential models for biotechnology (55, 113, 114). Two traits of particular interest for the biofuel industry are salt tolerance, as a means of achieving crop protection in open pond systems, and high lipid production (114). Picochlorum SENEW3 has both of these properties and is therefore a potential algal biofuel candidate (55). Understanding how this alga is able to survive in a stressful and fluctuating environment offers the promise to advance both applied and fundamental research.

Given this interest, it was serendipitous that Picochlorum SENEW3 has a highly reduced and compact genome of size 13.5 Mbp (i.e., with a gene present every ca. 2 Kbp) that is highly amenable to comparative analysis. Our work identifies a suite of characters that differentiate Picochlorum SENEW3 from its non-halotolerant sisters that represent major innovations in this lineage. One of these is the clustering of functionally related genes such as for urea metabolism and acetate assimilation that likely allow a rapid response to nutrient stress. Another adaptation to salt and nutrient stress in Picochlorum

SENEW3 is the expansion of metabolite transporter gene families with 719 members representing nearly 10% of the gene inventory. We found seven transporters putatively capable of glucose uptake with culture work showing that glucose in the medium improved algal growth rate under high salt stress (Fig. 2.5). A third major input into overcoming environmental stress in Picochlorum SENEW3 is HGT of bacterial genes.

The notion that microalgae can, like bacteria, gather genes from the environment to adapt to changing conditions has still not been widely tested with free-living taxa. A recent study of the extremophilic (i.e., found in hot acidic waters near fumaroles) Cyanidiophytina red alga Galdieria sulphuraria showed that its remarkable metabolic flexibility (e.g., glycerol metabolism, ability to detoxify mercury and arsenic) is explained by HGT from prokaryotic sources. Its sister lineage, the rock-dwelling G. phlegrea, has, since its split from G. sulphuraria, regained all of the genes required for urea hydrolysis through (likely independent) HGTs from bacteria, allowing it to survive in the nitrogen-limited cryptoendolithic environment (58). Our results with Picochlorum SENEW3 significantly extend these findings and show that an alga that lives in a variety of environmental conditions ranging from mesophilic to halophilic is also able to acquire genes from the environment to extend its metabolic flexibility. This trait is evident in the enhanced repertoire of proteins involved in carbohydrate metabolism, osmolyte regulation, sulfate scavenging, and cell cycle control. The 24 clear cases of HGT we identified in Picochlorum SENEW3 stand in stark contrast to the obvious strong selection in this lineage for shedding genes and reducing genome size and complexity. This observation suggests that the prokaryote-derived genes in Picochlorum SENEW3 must confer an ecological, adaptive advantage.

In summary, although Picochlorum SENEW3 is little known in the general scientific literature, our results identify it as a potentially valuable model for investigating the origin of metabolic flexibility in eukaryotic microbes. The next step is to develop genetic tools in Picochlorum SENEW3 to test the hypotheses presented here, with the goal of using this knowledge to improve other microbial strains of interest to serve basic and applied research goals.

Experimental Procedures

Strains and culture conditions

Picochlorum strain SENEW3 (SENEW3) was isolated by B.P. and S.W. from the San Elijo Lagoon system in San Diego County, California, and is described further in (55). The alga was cultivated in artificial seawater-based (115) Guillard’s F/2 medium without silica at 25°C under continuous light (~100 µE/m2/s) on a rotary shaker at 100 rpm (Innova 43, New Brunswick Eppendorf).

The high salt stress experiments were done in duplicate cultures by varying the concentration of NaCl in the artificial seawater based F/2 medium. Mixotrophic growth rate experiments under 1.5 M NaCl stress were performed in triplicate cultures with the addition of 1-30 g/L of glucose that was filter-sterilized using 0.2 µm cellulose acetate filters. Heterotrophic growth was tested with the addition of 5 g/L glucose in the dark.

Picochlorum SENEW3 stock solution was used to inoculate 100 mL flasks to an inoculation density of 1 x 10^5 cells/mL. Algal growth was determined by cell counts using a hemacytometer (Neubauer improved, Hausser Scientific) and ImageJ software.
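The conversion from hemacytometer counts to cell density, and from two densities to a growth rate, can be sketched as follows. This is a minimal illustration, not the procedure used in this study: it assumes counts are taken over the large 1 mm x 1 mm squares of a Neubauer improved chamber (0.1 mm depth, so 1e-4 mL per square), and the function names, counts, and time interval are hypothetical.

```python
import math

# Volume above one large 1 mm x 1 mm square at 0.1 mm depth: 0.1 mm^3 = 1e-4 mL.
LARGE_SQUARE_VOLUME_ML = 1e-4


def cells_per_ml(counts, dilution_factor=1.0):
    """Convert cell counts per large square into cells/mL."""
    mean_count = sum(counts) / len(counts)
    return mean_count * dilution_factor / LARGE_SQUARE_VOLUME_ML


def specific_growth_rate(n0, n1, hours):
    """Exponential growth rate (per hour) between two density measurements."""
    return math.log(n1 / n0) / hours


# Hypothetical counts: four squares at inoculation and four squares 48 h later.
start = cells_per_ml([10, 10, 10, 10])
later = cells_per_ml([40, 40, 40, 40])
mu = specific_growth_rate(start, later, hours=48.0)
doubling_time_h = math.log(2) / mu
print(f"{start:.3g} -> {later:.3g} cells/mL, doubling time {doubling_time_h:.1f} h")
```

A mean of 10 cells per large square corresponds to the 1 x 10^5 cells/mL inoculation density stated above.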

DNA and RNA extraction and library construction

Approximately 100 mg of cells was harvested by centrifugation at 4,000 rpm for 2 minutes and then immediately frozen in liquid nitrogen. DNA extraction was performed using the DNeasy Plant Mini Kit (Qiagen, Valencia, CA) and total RNA was extracted using the RNeasy Mini Kit (Qiagen). DNA and cDNA libraries were constructed using the Nextera DNA Sample Preparation Kit V2 and TruSeq RNA Sample Preparation Kit, respectively (Illumina Inc., San Diego, CA), following the manufacturer’s protocols.

Genome and transcriptome sequencing

A total of 830 Mbp of paired-end (2 x 150 bp) Illumina genome sequence data and 2.07 Gbp (13.8 million reads) of paired-end (2 x 150 bp) mRNAseq data were generated from Picochlorum SENEW3 using the Illumina MiSeq Personal Genome Sequencer (Illumina, Inc., San Diego, CA). The genome assembly was generated using the CLC Genomics Workbench de novo assembler (CLC Bio, Aarhus, Denmark) and consisted of 1,266 contigs totaling 13.45 Mbp with an N50 of 124,539 bp. The RNA-seq data were aligned to the genome using GSNAP (116) and the output used to train the ab initio gene predictor Augustus (117), resulting in 7,367 high quality gene models for downstream analysis.
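For readers unfamiliar with the assembly statistics quoted above, the sketch below shows how an N50 and an average sequencing depth are computed. The contig lengths in the toy example are hypothetical; only the 830 Mbp and 13.45 Mbp totals come from the text, and the depth estimate assumes every sequenced base contributes to the assembly.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


def mean_depth(sequenced_bp, assembly_bp):
    """Average number of sequenced bases per assembled base."""
    return sequenced_bp / assembly_bp


# Hypothetical contig lengths (bp), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)

# Totals reported in the text: 830 Mbp of genomic reads, 13.45 Mbp assembly.
depth = mean_depth(830e6, 13.45e6)
print(f"toy N50 = {toy_n50} bp; approximate depth = {depth:.0f}x")
```

Under that assumption the reported totals correspond to roughly 62-fold average depth.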

The sequence data used to assemble the draft Picochlorum SENEW3 genome and the assembled contigs can be accessed via NCBI BioProject PRJNA245752. The genome assembly, gene models, and phylogenomic output (see below) are also available at: http://cyanophora.rutgers.edu/picochlorum/.

Construction of multi-protein tree

We collected complete proteome data from ten green algae: Picochlorum SENEW3, Chlamydomonas reinhardtii (118), Volvox carteri (119), Chlorella variabilis (112), Coccomyxa subellipsoidea (95), Micromonas isolate RCC299 (120), Micromonas pusilla CCMP1545 (120), Ostreococcus lucimarinus, Ostreococcus tauri (57), and Bathycoccus prasinos (91), from the glaucophyte Cyanophora paradoxa (88), and from the red alga Porphyridium purpureum (Bhattacharya et al., 2013). These combined data were subjected to an all-vs-all self-BLASTp search (E-value cut-off ≤1e-05). Ortholog groups across the 12 taxa were constructed using OrthoMCL (121) with default settings.
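The E-value cut-off applied to the all-vs-all search can be illustrated with a short filter. This sketch assumes the BLASTp results were written in the standard 12-column tabular format, in which the E-value is the eleventh column; the text does not state the output format, and the sequence names and scores below are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5


def filter_hits(handle, cutoff=EVALUE_CUTOFF):
    """Yield (query, subject, evalue) for hits at or below the E-value cutoff."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])  # column 11 of the tabular format
        if evalue <= cutoff:
            yield row[0], row[1], evalue


# Hypothetical hits: one strong match and one that fails the cutoff.
example_rows = [
    ["geneA", "geneB", "45.0", "200", "110", "0", "1", "200", "1", "200", "1e-50", "180"],
    ["geneA", "geneC", "25.0", "60", "45", "2", "1", "60", "5", "64", "0.5", "30"],
]
example_text = "\n".join("\t".join(row) for row in example_rows)
kept = list(filter_hits(io.StringIO(example_text)))
print(kept)
```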

Sequence alignments were constructed for orthologous groups containing only one sequence in each green algal taxon (allowing missing data in a maximum of 2 taxa). The alignments were built using MAFFT (--auto) (122), with poorly aligned regions removed using Gblocks (-b4=5; -b5=h) (123). Because Gblocks is unable to remove badly aligned individual sequences within well-conserved blocks, we applied T-Coffee (124) to remove poorly aligned residues (i.e., conservation score ≤5). Sequences shorter than one-half of the alignment length and columns with <8 residues were also removed from the alignments. The resulting alignments (≥100 amino acids) were used for single gene tree construction using PhyML3 (125) under the LG+Γ+F+I model. The 20% of trees (and their alignments) with the longest total branch length were removed. The remaining 1656 alignments were concatenated into a super-alignment (480,102 amino acids). The multi-protein tree was built using RAxML (89) under the PROTGAMMALGF model. The bootstrap values were generated using 100 replicates.
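Two of the filters described above, the single-copy ortholog selection and the removal of the longest trees, can be sketched as follows. This is an illustration under stated assumptions rather than the scripts used here: the data structures, function names, and taxon labels are hypothetical, and total branch length is assumed to have been computed beforehand for each tree.

```python
from collections import Counter


def is_single_copy(group, green_taxa, max_missing=2):
    """group: list of (taxon, sequence_id) pairs belonging to one ortholog group.

    Accept the group if every green algal taxon has at most one sequence
    and no more than `max_missing` green algal taxa are absent.
    """
    counts = Counter(taxon for taxon, _ in group if taxon in green_taxa)
    if any(n > 1 for n in counts.values()):
        return False
    return len(green_taxa) - len(counts) <= max_missing


def drop_longest_trees(total_branch_length, fraction=0.2):
    """Return the tree names kept after removing the longest `fraction` of trees."""
    n_drop = int(len(total_branch_length) * fraction)
    ranked = sorted(total_branch_length, key=total_branch_length.get, reverse=True)
    return set(ranked[n_drop:])


# Hypothetical example with four green algal taxa and one outgroup taxon.
greens = {"A", "B", "C", "D"}
ok_group = [("A", "a1"), ("B", "b1"), ("C", "c1"), ("X", "x1"), ("X", "x2")]
paralog_group = [("A", "a1"), ("A", "a2"), ("B", "b1"), ("C", "c1")]
kept_trees = drop_longest_trees({"t1": 1.0, "t2": 9.0, "t3": 2.0, "t4": 3.0, "t5": 4.0})
print(is_single_copy(ok_group, greens), is_single_copy(paralog_group, greens), sorted(kept_trees))
```

Copy number is checked only in the green algal taxa, following the wording of the text; whether outgroup paralogs were also excluded is not stated.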

Phylogenomic analysis

Automated phylogenomic analysis of individual proteins was done as previously described (88). Briefly, BLASTp was used to retrieve a set of taxonomically diverse sequences from our in-house protein database. Sequence alignments were constructed using MAFFT v6.864b (122), and RAxML 7.2.8 (PROTGAMMAWAG model; 100 bootstrap replicates) was used to generate 5,871 trees containing ≥3 phyla. Trees were sorted for monophyly with bacteria, archaea, and viruses using PhyloSort (90). Instances of HGT were then manually confirmed, requiring either ≥60% bootstrap support for the sister relationship between Picochlorum SENEW3 and prokaryotes or a tree containing only Picochlorum SENEW3 and prokaryotes.
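A first-pass screen of the kind that precedes manual confirmation can be sketched from leaf labels alone. The sketch assumes labels of the form "Phylum-Species" (as in Figure 2.6), counts the Picochlorum label toward the phylum total, and uses a small, illustrative set of prokaryotic group names; none of these choices is taken from PhyloSort, and the function does not examine topology or bootstrap support, which still requires inspecting the tree.

```python
# Illustrative subset of prokaryotic group labels (an assumption, not exhaustive).
PROKARYOTIC_GROUPS = {
    "Proteobacteria", "Actinobacteria", "Cyanobacteria", "Firmicutes",
    "Bacteroidetes/Chlorobi", "Chloroflexi", "Archaea",
}


def group_of(label):
    """Return the taxonomic prefix of a 'Group-Species' leaf label."""
    return label.split("-", 1)[0]


def classify_tree(labels, target="Picochlorum", min_groups=3):
    """Classify a tree by the taxonomic composition of its leaves."""
    groups = {group_of(label) for label in labels}
    if target not in groups:
        return "no_target"
    if len(groups) < min_groups:
        return "too_few_groups"
    if groups - {target} <= PROKARYOTIC_GROUPS:
        return "prokaryote_only"  # candidate of the kind shown in Figure 2.6B
    return "mixed"  # needs a topology and bootstrap check


leaves = [
    "Picochlorum-contig-15-g22-t1",
    "Proteobacteria-Shewanella amazonensis",
    "Cyanobacteria-Nostoc punctiforme",
    "Firmicutes-Staphylococcus hominis",
]
print(classify_tree(leaves))
print(classify_tree(leaves + ["Viridiplantae-Arabidopsis thaliana"]))
```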

Transporter analysis

Putative membrane transport proteins and their classifications were identified based on sequence similarity searches (BLASTp, E-value cutoff ≤1 x 10^-10) against the Transporter Classification Database (TCDB). TC numbers were used to identify a set of core or shared proteins and those unique to Picochlorum SENEW3 or O. tauri.
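The comparison of TC numbers between the two species amounts to set operations, as in the sketch below. It assumes the comparison is made at the family level (the first three fields of a TC number); the text does not specify the level used, and the hit lists shown are hypothetical.

```python
def tc_family(tc_number):
    """Reduce a full TC number such as '2.A.1.1.1' to its family, '2.A.1'."""
    return ".".join(tc_number.split(".")[:3])


def compare_families(hits_a, hits_b):
    """Split transporter families into shared and species-specific sets."""
    fam_a = {tc_family(tc) for tc in hits_a}
    fam_b = {tc_family(tc) for tc in hits_b}
    return {
        "shared": fam_a & fam_b,
        "only_a": fam_a - fam_b,
        "only_b": fam_b - fam_a,
    }


# Hypothetical best-hit TC numbers for each species.
picochlorum_hits = ["2.A.1.1.1", "3.A.1.201.1", "2.A.36.1.1"]
otauri_hits = ["2.A.1.2.3", "3.A.1.204.1", "2.A.29.1.1"]
result = compare_families(picochlorum_hits, otauri_hits)
print(result)
```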

Functionally clustered pathways

We downloaded pathway annotations from the UniPathway database (126). The sequences of the underlying genes were retrieved from UniProtKB/Swiss-Prot, which comprises a collection of high quality, manually annotated, and non-redundant protein sequences (http://www.ebi.ac.uk/uniprot). The resulting database comprises proteins for 907 reactions (gene families) that build 207 pathways (478 sub-pathways). Picochlorum SENEW3 protein sequences were used as queries to search against the database using BLASTp (E-value cutoff ≤1 x 10^-10). This resulted in a list of 633 Picochlorum SENEW3 proteins with significant hits. Physically linked genes involved in the same pathway were then manually identified. C-hunter (71) was also used to identify functional clusters based on Gene Ontology (GO) terms (127) under two parameter sets: (i) minimum number of genes per cluster 2, maximum cluster size 3, E-value cutoff ≤1 x 10^-3, and threshold of cluster overlap 10%; and (ii) minimum number of genes per cluster 2, maximum cluster size 50, E-value cutoff ≤1 x 10^-4, and threshold of cluster overlap 50%. GO terms were identified using the Blast2GO program (default settings) (128). The top two levels of the GO scheme (e.g., molecular function and biological process) were not considered because they are too general to provide insights in this analysis.
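The manual search for physically linked genes in the same pathway could be pre-screened computationally, as sketched here. This is not C-hunter and not the procedure used in this work. It assumes gene identifiers of the form contig_N.gM.tK and that consecutive gene numbers (M) on one contig are physical neighbors, which is an assumption about the naming scheme; the pathway labels and the mapping are hypothetical.

```python
import re
from collections import defaultdict

GENE_ID = re.compile(r"^(contig_\d+)\.g(\d+)\.t\d+$")


def parse_gene(gene_id):
    """Split 'contig_54.g177.t1' into ('contig_54', 177)."""
    match = GENE_ID.match(gene_id)
    if match is None:
        raise ValueError(f"unexpected gene identifier: {gene_id}")
    return match.group(1), int(match.group(2))


def linked_pathway_genes(gene_to_pathway, max_gap=1):
    """Return (pathway, [gene ids]) for runs of >= 2 neighboring genes in one pathway."""
    members = defaultdict(list)
    for gene_id, pathway in gene_to_pathway.items():
        contig, index = parse_gene(gene_id)
        members[(pathway, contig)].append((index, gene_id))

    clusters = []
    for (pathway, _contig), genes in sorted(members.items()):
        genes.sort()
        run = [genes[0]]
        for gene in genes[1:]:
            if gene[0] - run[-1][0] <= max_gap:
                run.append(gene)
                continue
            if len(run) >= 2:
                clusters.append((pathway, [gid for _, gid in run]))
            run = [gene]
        if len(run) >= 2:
            clusters.append((pathway, [gid for _, gid in run]))
    return clusters


# Hypothetical pathway assignments.
assignments = {
    "contig_54.g177.t1": "urea degradation",
    "contig_54.g178.t1": "urea degradation",
    "contig_54.g179.t1": "urea degradation",
    "contig_9.g40.t1": "urea degradation",
    "contig_54.g200.t1": "glycolysis",
}
print(linked_pathway_genes(assignments))
```

Raising `max_gap` tolerates intervening genes from other pathways; the default of 1 accepts only directly adjacent gene models.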

Acknowledgements

This work was supported by a grant from the Department of Energy (DE-EE0003373/001). The authors have no conflict of interest with respect to this work.

Taxon                          Genome size   Gene number
Picochlorum SE3                13.3 Mb       7367
Chlorella variabilis           46 Mb         9791
Coccomyxa subellipsoidea       49 Mb         9851
Volvox carteri                 138 Mb        14566
Chlamydomonas reinhardtii      121 Mb        15143
Ostreococcus lucimarinus       13 Mb         7805
Ostreococcus tauri             12 Mb         8116
Bathycoccus prasinos           15 Mb         7847
Micromonas strain RCC299       21 Mb         10056
Micromonas pusilla CCMP1545    22 Mb         10575

Figure 2.1 Phylogenetic analysis of Picochlorum SENEW3. Multi-gene maximum likelihood tree of ten green algae inferred from an alignment of 480,102 amino acids. The genome size and protein-coding gene number are shown for each taxon.

[Figure 2.2 diagram. Gene models: contig 54.g177.t1, contig 54.g178.t1, contig 54.g179.t1, contig 54.g180.t1. Labels: extracellular urea; high affinity urea : Na+ transporter; urea; urea carboxylase (6.3.4.6); urea-1-carboxylate; allophanate hydrolase (3.5.1.54); amidase/allophanate hydrolase, partial; CO2; urease (3.5.1.5); urea cycle.]

Figure 2.2 Analysis of the urea gene cluster. The urea functional gene cluster in Picochlorum SENEW3 includes genes encoding a high affinity urea transporter and two enzymes involved in urea degradation (urea carboxylase and allophanate hydrolase) instead of a single urease enzyme.

[Figure 2.3 bar chart. Axes: Frequency (0 to 60) versus Transporter Family. Series: Picochlorum SE3, Ostreococcus tauri, Total. Transporter families shown:]

The ATP-binding Cassette (ABC) Superfamily
The Nuclear mRNA Exporter (mRNA-E) Family
The Peroxisomal Protein Importer (PPI) Family
The Gap Junction-forming Connexin (Connexin) Family
The Endoplasmic Reticular Retrotranslocon (ER-RT) Family
The Major Facilitator Superfamily (MFS)
The Mitochondrial Carrier (MC) Family
The Plant Photosystem I Supercomplex (PSI) Family
The Drug/Metabolite Transporter (DMT) Superfamily
The H+- or Na+-translocating F-type, V-type and A-type ATPase (F-ATPase)
The Chloroplast Envelope Protein Translocase (CEPT or Tic-Toc)
The H+ or Na+-translocating NADH Dehydrogenase (NDH) Family
The General Secretory Pathway (Sec) Family
The P-type ATPase (P-ATPase) Superfamily
The Ankyrin (Ankyrin) Family
The Proposed Fatty Acid Group Translocation (FAT) Family
The Mitochondrial Protein Translocase (MPT) Family
The Nuclear t-RNA Exporter (t-Exporter) Family
The Amino Acid/Auxin Permease (AAAP) Family
The Autotransporter-1 (AT-1) Family
The Multidrug/Oligosaccharidyl-lipid/Polysaccharide (MOP) Flippase
The G-protein-coupled receptor (GPCR) Family
The Unknown BART Superfamily-1 (UBS1) Family
The Zinc (Zn2+)-Iron (Fe2+) Permease (ZIP) Family
The Putative Tripartite Zn2+ Transporter (TZT) Family
The Monovalent Cation:Proton Antiporter-1 (CPA1) Family
The Transient Receptor Potential Ca2+ Channel (TRP-CC) Family
The α-Latrotoxin (Latrotoxin) Family
The Voltage-gated Ion Channel (VIC) Superfamily
The Voltage-gated K+ Channel β-subunit (Kvβ) Family
The DedA or YdjX-Z (DedA) Family

Figure 2.3 Analysis of metabolite transporters in Picochlorum SENEW3 showing frequency of transporters per transporter family in Picochlorum SENEW3 and O. tauri.

Minimum frequency cutoff = 6 for Picochlorum SENEW3.

[Figure 2.4 diagram. Cell schematic showing the cell membrane, vacuole, and chloroplast. Labeled transporters: IRK, SOS1, AAP3, MSC1, NHX1, H+-ATPase, Na+-ATPase, and H+-pyrophosphatase. Transported species: H+, Na+, K+, amino acids, cations, and anions. Legend: channel, facilitated transporter, ATPase.]

Figure 2.4 Putative distribution and functions of metabolite transporters in the Picochlorum SENEW3 cell, showing transporters involved in the salt stress response.

[Figure 2.5 plots. Both panels: Cells/mL (log scale, 1.00E+04 to 1.00E+08) versus Hours. (A) Series: 0.4, 1.4, 1.5, and 1.6 M NaCl. (B) Series: 0, 1, 5, 10, and 30 g/L glucose.]

Figure 2.5 Mixotrophic growth of Picochlorum SENEW3. (A) Growth under salt stress and (B) growth under 1.5 M NaCl salt stress with the addition of different amounts of glucose. The inoculation density was 1 x 10^5 cells/mL. The error bars represent standard error of (A) duplicate and (B) triplicate cultures, except for the 0.4 M NaCl result in (A) that represents a single culture.

[Figure 2.6 trees. Two maximum likelihood trees, panels (A) and (B), with bootstrap values on branches. Picochlorum SENEW3 sequences: Picochlorum-contig-43-g357-t1 and Picochlorum-contig-15-g22-t1. Scale bars: 0.2 and 0.5 substitutions/site.]

Figure 2.6. Examples of two types of RAxML trees that show evidence of HGT in Picochlorum SENEW3. The results of 100 bootstrap replicates are shown on the branches when ≥50%. (A) Tree inferred from an indolepyruvate decarboxylase enzyme that shows Picochlorum SENEW3 to be monophyletic with bacteria, not Viridiplantae which are elsewhere in this tree. (B) Tree inferred from an alpha-1,2-mannosidase enzyme that provides an example of a tree containing only Picochlorum SENEW3 and a wide diversity of bacteria.

[Figure 2.7 diagram. Cell schematic with functions labeled (a) to (g): cell wall degradation, cell wall synthesis, growth hormone synthesis, glycerol and osmolytes (DMSP), protein degradation to amino acids, sulfate ester degradation to sulfate, and cell cycle control in the nucleus.]

Figure 2.7. Putative functions in Picochlorum SENEW3 conferred by HGT. Many genes have roles in cell growth, response to nutrient limitation, or production of osmolytes for osmoregulation. (a) Glycosyltransferases, glycoside hydrolases, and polysaccharide lyases may function in cell wall degradation. Some may be excreted from the cell. (b) GDP-mannose-4,6-dehydratase and GDP-L-fucose synthase are involved in fucose synthesis, which is important for cell wall integrity. (c) Indolepyruvate decarboxylase catalyzes a step in the conversion of indolepyruvate to indole-3-acetic acid, a plant auxin. (d) Glycerol dehydrogenase catalyzes the conversion of glycerone to glycerol, a common osmoprotectant. (e) Three peptidases are involved in protein cleavage to amino acids. (f) A periplasmic arylsulfatase converts sulfate esters to sulfate for uptake into the cell. (g) Regulator of chromosome condensation (RCC1) protein is involved in cell cycle control.

Table 2.1 Clustered genes in shared pathways in Picochlorum SENEW3

Table 2.2 Instances of HGT that were identified in the Picochlorum SENEW3 genome and their putative functions. The putative gene annotations, gene domain, putative origin, results of BLAST and phylogenetic analysis, prokaryotic donor, number of EST reads that mapped to the genes, and putative gene function are shown.

[Figure S2.1 plot. Cells/mL (log scale, 1.00E+04 to 1.00E+08) versus Hours (0 to 300). Series: 0, 1, 5, 10, and 30 g/L glucose.]

Figure S2.1. Mixotrophic growth of Picochlorum SENEW3 cultures in the absence of high salt stress (0.4 M NaCl) with the addition of different amounts of glucose. The inoculation density was 1 x 10^5 cells/mL. The error bars represent standard error from triplicate cultures.

Table S2.1 List of predicted proteins in Picochlorum SENEW3 showing their putative annotations and results of a BLASTP search against a comprehensive in-house database.

Table available in the online supplementary Excel file. http://onlinelibrary.wiley.com/doi/10.1111/1462-2920.12541/full

Table S2.2 Ostreococcus tauri nitrate assimilation gene clusters and the corresponding Picochlorum SENEW3 genes and their contig locations. Genes located on the same contig are shown in boldface. The genes are ordered according to their physical location in the Ostreococcus tauri genome. We note that the Ostreococcus Maf4 gene located in the Cnx2-Maf4-Cnx5 cluster described in the original paper is absent from the gene cluster in the latest genome assembly (Ostreococcus tauri v2.0 from JGI).

Gene number   Annotation   Picochlorum SENEW3 gene
1             Nar1         contig_150.g639.t1
2             Nia          contig_16.g91.t1
3             Nii          contig_16.g88.t1
4             Nar2         Missing
5             Nrt2         contig_16.g90.t1
6             Snt          contig_16.g87.t1
7             Cnx2         contig_16.g86.t1
8             Cnx5         contig_62.g288.t1
9             Cb5f         contig_150.g664.t1
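Because the caption distinguishes genes that share a contig, that grouping can be recomputed directly from the gene identifiers in Table S2.2. The sketch below assumes the contig name is the part of the identifier before the first period; the variable and function names are illustrative.

```python
from collections import defaultdict

# (annotation, Picochlorum SENEW3 gene) pairs from Table S2.2; None marks a missing gene.
TABLE_S2_2 = [
    ("Nar1", "contig_150.g639.t1"),
    ("Nia", "contig_16.g91.t1"),
    ("Nii", "contig_16.g88.t1"),
    ("Nar2", None),
    ("Nrt2", "contig_16.g90.t1"),
    ("Snt", "contig_16.g87.t1"),
    ("Cnx2", "contig_16.g86.t1"),
    ("Cnx5", "contig_62.g288.t1"),
    ("Cb5f", "contig_150.g664.t1"),
]


def genes_sharing_a_contig(rows):
    """Return {contig: [annotations]} for contigs carrying two or more listed genes."""
    groups = defaultdict(list)
    for annotation, gene_id in rows:
        if gene_id is None:
            continue
        contig = gene_id.split(".", 1)[0]
        groups[contig].append(annotation)
    return {contig: names for contig, names in groups.items() if len(names) > 1}


shared = genes_sharing_a_contig(TABLE_S2_2)
print(shared)
```

Five of the eight genes present (Nia, Nii, Nrt2, Snt, and Cnx2) fall on contig_16, with gene numbers g86 to g91.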

Chapter 3: Elucidating Salinity Shock Response Mechanisms in Picochlorum

Fatima Foflonker¹, Gennady Ananyev²,³, Huan Qiu⁴, Andrenette Morrison⁵, Brian Palenik⁶, G. Charles Dismukes²,³, and Debashish Bhattacharya¹

¹Department of Biochemistry and Microbiology, ²Waksman Institute of Microbiology, ³Department of Chemistry and Chemical Biology, ⁴Department of Ecology, Evolution and Natural Resources, ⁵Department of Marine and Coastal Sciences, Rutgers University, NJ, USA, ⁶Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA

Abstract

The broadly halotolerant green alga, Picochlorum strain SENEW3, has a highly reduced nuclear genome of 13.5 Mbp that encodes only 7,367 genes. It was originally isolated from a shallow, mesophilic brackish-water lagoon that experiences extreme changes in temperature, light, and in particular, salinity (freshwater to 3-fold seawater). We challenged Picochlorum cells with high or low salinity shock and used transcriptomic and chlorophyll fluorescence analyses to elucidate tolerance to salinity fluctuation. The transcriptome analysis showed that one-half of the coding regions are differentially expressed in response to salinity changes. In addition, a significant number of co-expressed genes (usually from different metabolic pathways) are co-localized in the genome, forming clusters of 2-10 genes. Whereas the overall salt stress response in Picochlorum SENEW3 is similar to that in other salt-tolerant algae, the “operon-like” structure in this species likely contributes to rapid recovery during salinity fluctuation. In summary, our work elucidates how evolutionary forces play out in a streamlined genome. Picochlorum SENEW3 relies on a broad array of adaptations, from horizontally transferred adaptive genes to the co-localization of stress response genes and a robust photosystem II, to deal with a fluctuating environment. These attributes make Picochlorum SENEW3 of great biotechnological interest.

Abbreviations: HGT, horizontal gene transfer; DE, differentially expressed; ASW, artificial seawater; FRRF, fast repetition rate fluorometer; STF, single turnover flash; PSII, photosystem II; WOC, water-oxidizing complex; QF, quality factor; FDR, false discovery rate; L2fc, log2 fold change; TCA, tricarboxylic acid cycle; THF, tetrahydrofolate; LIN, lincomycin.

Introduction

Most well-known eukaryotic extremophiles are better described as polyextremophiles because they survive multiple types of environmental stress. The sea ice diatom, Melosira arctica, faces not only cold temperatures but also high pH, irradiance stress, and fluctuating salinity due to the brine habitat in the ice (129). On the other end of the temperature scale, Galdieria sulphuraria and Cyanidioschyzon merolae, red algal thermoacidophiles native to hot spring environments, tolerate high levels of salinity and toxic metal concentrations as well as low pH (130). What is shared by these species in extreme environments are highly reduced genomes, gene inventories (5-10

47 thousand genes), and functional specialization (131). In the case of G. sulphuraria, horizontal gene transfer (HGT) and expansion of gene families with adaptive functions also have played key roles in their evolutionary history (58, 59). But are there environments that are less exotic (i.e., mesophilic) that may pose just as much environmental risk? These habitats could include the intertidal zone and shallow water lagoons, both of which endure high ultraviolet and visible light levels, desiccation, and fluctuating salinity. To address this issue, we studied Picochlorum sp. strain SENEW3

(SENEW3) (hereafter, Picochlorum SENEW3), a “polyextremotolerant” green alga

(Chlorophyta, Trebouxiophyceae) that was isolated from a shallow water estuary in San

Diego County, California. This alga has one of the smallest genomes known (13.5 Mbp) for a free-living eukaryote and encodes only 7,367 genes (132). Nonetheless,

Picochlorum SENEW3 is remarkably robust in the face of environmental perturbations, thriving in freshwater as well as in 3-fold the salinity of seawater, light intensities between 80–2000 µE/m2/s, and temperatures that range from at least 16–33 °C (55).

Picochlorum SENEW3 shows similarities in halotolerance range, environment, and other abiotic stress tolerances to the closely related species Picochlorum oklahomensis isolated from the Great Salt Plains of Oklahoma (133).

Several other Picochlorum species, including P. oklahomensis and P. atomus, have been investigated and are considered suitable for biofuel, nutritional, nutraceutical, and waste water remediation applications (9, 45-47, 85, 134-137). This is due to the robustness, ability to harvest through common flocculation methods (136), high biomass production (1.8-2.1 g/L maximum biomass) (45, 46), high protein content for potential use as a feedstock (350-550 g/kg) (9, 46, 134), high carotenoid content for high value

co-products (45), and lipid accumulation (total lipid content reported to be 16-25% dry weight, unstressed cells) (45-47, 134) exhibited by members of this genus. The NAABB consortium reported a Picochlorum strain with the ability to rapidly accumulate lipids under nitrogen depletion, accompanied with an increase in starch accumulation, and demonstrated effective genetic manipulation in increasing lipid accumulation by as much as 38% (Department of Energy, 2014). Salinity has also been reported as an effective method of crop protection in reducing freshwater cyanobacterial contaminants in P. atomus cultures (9). Therefore, Picochlorum SENEW3 may be of great biotechnological interest due to its ability to withstand a hypervariable environment. This allows the use of salinity as a crop protection mechanism and makes this alga potentially suitable for large-scale open-pond cultivation.

Results and Discussion

High and low salinity stress elicits separate metabolic responses

The growth rate and average PSII quantum efficiency (proportional to Fv/Fm) were used to assess the effects of various salt concentrations (10mM – 1.6 M NaCl) on

1M NaCl pre-adapted cultures of the alga (Fig. 3.1). The time dependence of Fv/Fm over

24 h shows that the PSII quantum yield, which is a measure of the light energy conversion to heat + photochemical water oxidation, is diminished at both lower and higher salt concentrations. The salt dependence of Fv/Fm after 24 h follows a similar sequence as the growth rate, indicating that energy resources that normally go into growth are diverted to the maintenance of osmotic balance, with both hypo- and hyper-osmotic balance diverting energy resources. Higher growth rates at lower salinities (0.1

and 0.01M) than the pre-adapted condition (1M) may simply indicate that lower salinity results in optimum growth over the period of several days. However, Fig. 3.1 indicates that lower salinities have similar effects as high salinities in the short-term. When examined in more detail it is apparent that the kinetic response of Fv/Fm differs for hypo- vs. hyper-osmotic changes. The kinetics of recovery are faster when the change in salt concentration is smallest for both hypo- and hyper-osmotic changes. Additionally, all of the hyper-salinity samples (1M, 1.5M, and 1.6 M NaCl) attain a steady-state between 7h and 24 h, whereas the hypo-salinity samples (0.1M, 0.01M, and 0 M NaCl) are still recovering at 24 h, indicated by the positive slopes. This difference may be simply because the changes in concentration are so much larger (10X and 100X), thereby slowing recovery. No large-scale cell rupture due to osmotic shock was observed microscopically, but smaller-scale rupture could have gone unnoticed due to the small cell size (2-3 µm) of this coccoid alga.

For the transcriptome analysis we chose 1.5M and 10mM NaCl concentrations to represent hyper- and hypo-salinity stresses because both elicited similar photosynthetic efficiency responses in the short-term and exhibited some recovery of growth rate under long-term acclimation. The 1M to 10mM stress represents what Picochlorum cells might experience during a rain event in their natural environment (55). The two salinity shock conditions were normalized to a pre-stress state (1M NaCl, 0 h), and therefore do not address the stress responses from algal cell transfer indicated by the drop in Fv/Fm of the control cells (Fig. 3.1a). Transcriptomic analysis reveals that salt stress induces significant gene expression changes in Picochlorum SENEW3 with a total of 3,681 genes

(50% of the nuclear gene inventory) being differentially expressed (DE; >1 log2 fold

change [hereafter, L2fc], p < 0.01, adjusted for a false discovery rate, FDR, of 5%) by salt stress over the 5-hour time course, relative to the pre-stress control. This includes 12 of 24 genes that originated in Picochlorum SENEW3 via HGT from bacterial sources

(Fig. 3.S2) (132), suggesting functional relevance of these foreign genes in salt tolerance, analogous to what was found in G. sulphuraria (58, 59). More genes showed differential expression at 1h (3,256) than at 5h (1,629) and around one-half of the DE genes at 1h returned to below threshold levels (<1 and >-1 L2fc) by the 5h time point (77% high, 68% low salinity) (Fig. 3.S1, 3.S2). In addition, each salinity treatment resulted in different

DE gene sets, with some statistically significant overlap of genes involved in multiple treatments (Fig. 3.S1, Table 3.S3; the full list of DE genes is shown in Table 3.S4). For example, the 1h up-regulated gene set shared 173 genes, with 742 and 722 genes belonging uniquely to the high and low salinity responses, respectively. In terms of biochemical pathways, there is little overlap in the initial high and low salinity responses, indicating two very different types of stress on the cell (Fig. 3.3, Fig. 3.S3). We posit that high salinity shock, low salinity shock, and salt acclimation all represent different challenges to

Picochlorum SENEW3, which deals with them using specialized gene sets.

Co-localization of co-expressed genes in response to salt shock

For each culture treatment, we searched for clusters of co-localized DE genes with a shared expression pattern (i.e., co-expressed), allowing for a maximum of two intervening genes not sharing that expression pattern (Fig. 3.2a). Randomization analysis was performed to determine the statistical significance of these putative clusters (Fig.

3.S4). These results indicate that the number of genes in clusters is statistically significant

in most data sets, meaning that larger clusters of co-localized genes are found within co-expressed data sets than could be attributed to chance alone. Under some conditions, the number of clusters formed is also significant. These results hold true for the majority of the conditions, even when more stringent gene expression cutoffs are used (1.5 vs. 1.0

L2fc) (Table 3.S5, Fig. 3.2a). Allowing for three intervening genes resulted in loss of statistical significance in clustering. The underlying transcription control mechanisms of these clusters are, however, unknown.

The co-localized genes comprise 2-10 gene clusters and constitute 42-72% of the total number of genes in individual co-expressed data sets (1.0 L2fc cutoff). The results shown in Fig. 3.2 demonstrate that more co-expressed genes, both up- and down-regulated and at both salinities, are co-localized at 1h when compared to the 5h time point. For example, 61.6% (546/886) of genes form clusters under high salinity at 1h compared to 42.4% (225/531) at the 5h time point (1.0 L2fc cutoff). This suggests that genes are organized in close proximity in the genome for more efficient regulation of gene expression during the initial phase of salinity shock. Although these clusters of co-localized and co-expressed genes are functionally related in that they all take part in the salt stress response, the vast majority do not appear to be members of the same biochemical pathway. Exceptions include genes involved in urea degradation, nitrate assimilation, acetate assimilation, pyruvate to acetoin conversion, some light regulated genes, and photorespiration (Table 3.S6). Interestingly, many of the co-localized genes in

Picochlorum SENEW3 are not physically linked in the genomes of other green algae such as Chlorella vulgaris and Coccomyxa subellipsoidea (Table 3.S6; full output presented in Table 3.S7). Some clusters, including genes in the nitrate assimilation

pathway, however, are partially conserved in other green algae, including C. vulgaris, C. subellipsoidea, and C. reinhardtii (138). The cluster of photorespiration-related genes appears to be conserved in C. subellipsoidea as well. These results suggest that stress-related genes in streamlined eukaryotic genomes may be organized in operon-like structures similar to those of bacteria. This presumably allows rapid activation under stressful conditions. However, no evidence of polycistronic transcription was found in

Picochlorum SENEW3.

High photorespiration influences carbon and nitrogen flux at high salinity

The most striking change in response to high salinity shock at 1h is the up-regulation of genes involved in photorespiration (Fig. 3.3; Table 3.S8), which may function in stress protection (139). Photorespiration in green algae such as C. reinhardtii differs from that in higher plants, in that glycolate is converted to glyoxylate via glycolate dehydrogenase in the mitochondrion rather than via glycolate oxidase in the peroxisome

(140). In photorespiration in higher plants, H2O2 is a byproduct of the glycolate oxidase reaction, resulting in an increased need for catalase activity. Up-regulation of glycolate dehydrogenase and down-regulation of glycolate oxidase and catalase suggests that the mechanism of photorespiration in Picochlorum SENEW3 is similar to that of other green algae. Glycolate dehydrogenase is also located in a cluster with two other genes in the photorespiration pathway (Table 3.S6).

Reduced CO2 concentrations at high salinities can inhibit the preferred photosynthetic carbon fixation pathway of Rubisco, stimulating the competing pathway of photorespiration. Conditions of CO2 limitation are also correlated with increased

carbon concentrating activities via up-regulation of carbonic anhydrase in Dunaliella salina; i.e., to increase photosynthetic carbon fixation through the Calvin-Benson-Bassham cycle (25, 141). Picochlorum SENEW3 also displays this response with strong up-regulation of carbonic anhydrase under high salinity stress. The TCA cycle is down-regulated at 1h, potentially due to the inhibition of reductant production via the TCA cycle due to excessive NADH produced during the glycine to serine conversion in photorespiration (142) (Fig. 3.3, 3.S5). One-carbon metabolism including tetrahydrofolate (THF) synthesis from glycine and the interconversion between its derivatives is also highly up-regulated under high salinity. In Arabidopsis, THF metabolism is essential to photorespiration, thereby maintaining carbon fixation under

CO2-limited conditions (143).

High photorespiration rates in plants result in the generation of ammonia as a byproduct of the conversion of glycine to serine, which is subsequently converted to glutamate (144). Accordingly, one copy each of glutamine synthetase and ferredoxin-dependent glutamate synthase, the major pathway for ammonia fixation, is up-regulated in Picochlorum SENEW3 under high salinity stress. Nitrate and urea assimilation are also significantly up-regulated, potentially serving as nitrogen sources for the increased protein synthesis demands under salinity stress. The opposite is true for nitrogen assimilation at low salinity stress; however, translation and ribosome biogenesis are up-regulated. These results again indicate that Picochlorum SENEW3 cells are less stressed at low (versus high) salinity.

Starch and osmolytes

Proline is the major osmolyte in Picochlorum oklahomensis; glycerol and glucosylglycerol were detected to a lesser extent (145). One gene involved in proline synthesis was up-regulated in Picochlorum SENEW3 under high salinity, and an increased expression of pathways leading towards glutamate synthesis, the precursor of proline, was observed. In Dunaliella, starch formation decreases and starch degradation increases as salinity increases in favor of glycerol synthesis (31, 146). In contrast, starch synthesis is up-regulated and starch degradation is down-regulated, whereas no evidence of increased glycerol synthesis was observed in Picochlorum SENEW3 under the high salt conditions used in this experiment. Synthesis of trehalose, another osmolyte, is up-regulated at 1h and sorbitol is up-regulated at 5h. As expected, in order to maintain osmotic equilibrium at low salinity, osmolyte synthesis is down-regulated, and accordingly starch synthesis is up-regulated. The up-regulation of starch synthesis under both high and low salinity stress conditions may indicate that proline rather than glycerol is the major osmolyte in Picochlorum SENEW3. Other aspects of the stress response in

Picochlorum SENEW3 (e.g., cell wall and membrane remodeling) are discussed in the supplementary information Notes S1.

Response of the photosynthetic machinery

Photoinhibition involves light-induced oxidative damage to the PSII protein D1 and inactivation of the WOC (147, 148). PSII repair entails D1 protein digestion by internal proteases and replacement of the damaged protein via de novo synthesis (149).

Photoinhibition, when accelerated by environmental stresses, including salinity, reduces

photosynthetic carbon fixation, which accelerates light-driven H2O2 generation, which in turn accelerates D1 protein damage (38, 148, 150, 151). We measured the rate of photoinhibition under high light, 1500 µE m−2 s-1 (i.e., different from the transcriptome experimental conditions), at various salinities in the presence and absence of the chloroplast protein synthesis inhibitor lincomycin (LIN). These photoinhibition curves

(Fig. 3.4) are the result of several interacting phenomena: photoprotection, D1 biosynthesis de novo, and osmoregulation after salinity shock. They may be fitted via biexponential decrease of Fv/Fm, with the two phases distinguished by an initial period of osmoregulation and a later phase dominated by D1 biosynthesis. Both high and low salinity shock conditions exhibit a similar pattern of reduced decline of D1 protein synthesis compared to the control (1M NaCl); 10 mM NaCl shows the shortest difference in lifetimes between samples with and without LIN (1.6 min) (Fig. 3.4a) compared to the control (16.9 min), 1.5M NaCl (6.1 min), and 0.4M NaCl (6.8 min) conditions. We postulate that the additional stress of salinity shock diverts resources from D1 protein repair to the energy intensive process of maintaining cell homeostasis during salt shock.

These results partially support the hypothesis that high salinity may provide protection against other abiotic stresses such as irradiance and temperature. This phenomenon has been reported for other halotolerant green algae, such as Nannochloris sp. and Dunaliella parva (152). However, because our experiments examined salinity shock, the observed lessened stress effects under 0.4M and 1M NaCl and high light may be better explained by acclimation.

Fig. 3.S7 highlights the faster recovery of the WOC cycling efficiency, indicated by the quality factor (QF) at low salinity shock versus high salinity shock (halftime = 0.5

h at 10mM NaCl and 4.5 h at 1.5 M NaCl). Recovery of WOC cycling efficiency appears to correlate with recovery of Fv/Fm or overall photosynthetic efficiency at high salinity, but recovers faster than Fv/Fm at low salinity. Whereas D1 repair is equally inhibited by the low and high salinity conditions, WOC efficiency may provide a greater contribution to photodamage at high salinities.

Materials and Methods

Salinity shock experimental conditions

Picochlorum sp. strain SENEW3 was cultivated in artificial seawater (153)-based Guillard’s f/2 medium (154) containing 1M NaCl, without silica (f/2 ASW –Si). Cultures were grown at 25 °C under continuous light (100 µE m−2 s-1) on a rotary shaker at 100 rpm (Innova 43, New Brunswick Eppendorf). To determine growth rate, cells adapted to

1M NaCl f/2 media were inoculated in f/2 media containing various salt concentrations and cell counts were performed using a hemacytometer (Neubauer improved, Hausser

Scientific), image capture (Infinity 2 camera, Lumenera corporation), and ImageJ counting software. Growth rates were determined based on cell counts during exponential phase using triplicate cultures (Equation 1).

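$$\text{growth rate} = \frac{c\,\left[\log(\text{count}_2) - \log(\text{count}_1)\right]}{\text{time}_2 - \text{time}_1} \qquad (1)$$

Here count1 and count2 are the cell counts at time1 and time2, and c stands for the leading numerical constant, whose digits are not legible in this copy.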

For the transcriptome experiment, cells adapted to 1M NaCl were pelleted, washed, and inoculated in fresh f/2 media containing 1.5M NaCl (high salt stress) and

10mM NaCl (low salt stress). NaCl was the only component of the f/2 ASW -Si media

modified for all experiments. Approximately 100 mg of cells were harvested and flash frozen after 1h and 5h of incubation with salt treatment under 100 µE m−2 s-1 light. Cells harvested prior to treatment were used as the control. This experiment was performed in triplicate; i.e., three separate cultures were used for each salinity condition and each of these was sampled at the 1h and at 5h time points.

Transcriptome sequencing

Frozen algal cell samples were homogenized using the TissueLyser II (Qiagen) and total RNA was extracted according to the RNeasy Plant Mini Kit (Qiagen) protocol.

The cDNA libraries were constructed using the TruSeq RNA Sample Preparation Kit

(Illumina, San Diego, CA) following the manufacturer’s protocols. RNA concentrations were determined using a NanoDrop 2000c Spectrophotometer (Thermoscientific) and

Qubit 2.0 Fluorometer (Life Technologies). Libraries were sequenced using the MiSeq

Personal Genome Sequencer (Illumina) with 2x80bp (paired end) and 1x160bp read lengths. See Table 3.S1 for details. These Illumina transcriptome data can be retrieved via

NCBI BioProject PRJNA245752 and are also available at http://cyanophora.rutgers.edu/picochlorum/.

Transcriptome analysis

RNA-seq reads were trimmed and aligned to the Picochlorum SENEW3 genome using CLC Genomics Workbench. Reads < 50bp in length were discarded during trimming and the RNA-seq analysis in CLC was performed using default parameters.

Differential expression was determined using DESeq (R/bioconductor package) (155)

and read counts determined by the CLC RNA-seq analysis as the input. A P-value of 1% and a Log2 fold change of 1 were set as the cutoffs for all differentially expressed (DE) genes discussed in this paper. DE genes were used as input for the KEGG metabolic pathway mapping tool (156) to identify metabolic impacts of salinity stress. Gene ontology (GO) term enrichment analysis was performed using Fisher’s exact test as part of the blast2go software (128). Differential gene expression overlap between treatments was determined with Venny 2.0 (157). The inference of protein targeting was done using TargetP (158).
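The cutoffs described above can be illustrated with a minimal sketch. It assumes strict inequalities (p < 0.01 and an absolute L2fc greater than 1), following the thresholds quoted in the Results; the `GeneResult` type, the `classify` function, and the gene identifiers and values are hypothetical and are not part of the DESeq or CLC workflow used in this study.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene_id: str     # hypothetical identifier, for illustration only
    l2fc: float      # log2 fold change relative to the pre-stress control
    p_value: float   # p-value reported by the differential expression test


def classify(result, l2fc_cutoff=1.0, p_cutoff=0.01):
    """Label one gene as 'up', 'down', or 'not DE' using both cutoffs."""
    if result.p_value >= p_cutoff:
        return "not DE"          # fails the significance cutoff
    if result.l2fc > l2fc_cutoff:
        return "up"
    if result.l2fc < -l2fc_cutoff:
        return "down"
    return "not DE"              # significant, but the change is too small


# Invented example values; none of these come from the study's data.
results = [
    GeneResult("gene_a", 2.3, 0.001),
    GeneResult("gene_b", -1.4, 0.004),
    GeneResult("gene_c", 0.6, 0.0005),
    GeneResult("gene_d", 3.0, 0.2),
]
labels = {r.gene_id: classify(r) for r in results}
print(labels)
```

A gene must pass both cutoffs to be counted as DE, so a large fold change with a weak p-value (gene_d) and a significant but small change (gene_c) are both labeled "not DE".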

Co-localization analysis

A co-regulated gene cluster was defined as two or more genes with shared properties (e.g., up-regulation in response to salt treatment) that were physically linked.

One or two intervening genes (without the shared properties) between adjacent genes were allowed. Gene clusters were merged when separated by two genes or fewer. To test if the observed clustering of genes up-regulated by salt treatment was significantly more than expected by chance alone, we removed genes derived from contigs encoding fewer than three genes. Clusters of up-regulated genes were then identified as described above.

We randomly sampled the same number of genes as those actually up-regulated from the total gene population, and identified gene clusters using this randomly sampled gene set.

This random sampling-based analysis was repeated 1,000 times and the information regarding resulting gene clusters was recorded. The gene cluster information (the number of clusters and number of genes in clusters) derived from actual data and randomly generated data were plotted as shown in supplementary information (SI) Fig. 3.S4. The p-value of actual gene number in a cluster (or cluster number) was defined as the number of random samples generating a clustered gene number equal to or greater than the actual number divided by the total random sample size (i.e., 1000).
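The cluster definition and randomization test described in this section can be sketched as follows. This is an illustrative reading of the procedure, not the code used in the study: the function names, the data layout (a dictionary mapping each contig to its ordered gene list), the fixed random seed, and the toy gene identifiers are all assumptions.

```python
import random


def find_clusters(contigs, flagged, max_gap=2):
    """Return clusters of >= 2 flagged genes with <= max_gap intervening genes."""
    clusters = []
    for genes in contigs.values():
        if len(genes) < 3:
            continue  # contigs encoding fewer than three genes are excluded
        current, last_pos = [], None
        for pos, gene in enumerate(genes):
            if gene not in flagged:
                continue
            if last_pos is not None and pos - last_pos - 1 <= max_gap:
                current.append(gene)          # close enough: extend the cluster
            else:
                if len(current) >= 2:
                    clusters.append(current)  # close the previous cluster
                current = [gene]
            last_pos = pos
        if len(current) >= 2:
            clusters.append(current)
    return clusters


def randomization_p_value(contigs, flagged, n_iter=1000, max_gap=2, seed=0):
    """Fraction of random gene sets with at least as many clustered genes."""
    rng = random.Random(seed)
    eligible = [g for genes in contigs.values() if len(genes) >= 3 for g in genes]
    real = flagged & set(eligible)
    observed = sum(len(c) for c in find_clusters(contigs, real, max_gap))
    hits = 0
    for _ in range(n_iter):
        sample = set(rng.sample(eligible, len(real)))  # same-sized random set
        simulated = sum(len(c) for c in find_clusters(contigs, sample, max_gap))
        if simulated >= observed:
            hits += 1
    return observed, hits / n_iter


# Toy data: contig_2 has fewer than three genes and is therefore ignored.
contigs = {
    "contig_1": ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"],
    "contig_2": ["g9", "g10"],
}
flagged = {"g1", "g3", "g7", "g8", "g9", "g10"}
clusters = find_clusters(contigs, flagged)
observed, p_value = randomization_p_value(contigs, flagged)
print(clusters, observed, p_value)
```

In the toy data, g1 and g3 are separated by one intervening gene and form a cluster, whereas g3 and g7 are separated by three genes and therefore fall into different clusters. The same function can count clusters rather than clustered genes by replacing the sum of cluster sizes with the number of clusters.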

Photosynthetic measurements

Chlorophyll fluorescence was measured in Fig. 3.1 using a fast repetition rate fluorometer (FRRF) with saturated light pulses produced by a laser diode at an intensity of 32,000 µE m−2 s-1 with a 50µs flash duration (defined as single turnover flash, STF), after 2 minutes of dark incubation (159). Mid-exponential phase cells adapted to 1M

NaCl were pelleted then shocked in media with various salinities. Aliquots and measurements were taken hourly. The relative quantum efficiency of PSII charge separation (QY) was approximated by the yield of variable chlorophyll fluorescence intensity (Fv/Fm) (160). Signal averaging of four trains of 50 flashes was performed with

2 min dark pre-incubation between each train. The steady-state value of Fv/Fm was obtained by averaging the first 50 individual STFs, denoted Fv/Fm, and is reported herein for three biological replicates. This average eliminates the transient damping of period- four oscillations.

The relative fraction of photosystem II water-oxidizing complex (PSII-WOC) centers that achieve productive water oxidation following primary charge separation was determined from lines fit to the amplitude of the period-four oscillations of Fv/Fm from individual STFs. The oscillations from 50 STFs were fitted to a modified Kok model

(VZAD model) to obtain the inefficiency parameters describing the rate of damping and the S state populations in the dark prior to flashing (161). By analogy with resonance

circuit theory, the Quality Factor (QF) is used to quantify the efficiency of oscillations, defined as the inverse of the Kok damping terms, QF = (alpha + beta)^(-1). Here, alpha represents misses (STF not resulting in advancement of oxidation state of PSII-WOC) and beta represents double hits (STF resulting in advancement by two oxidation states).

Kok parameters were determined as an average of four measurements. The QF is reported as an average of three biological replicates.
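As a small worked illustration of the QF definition above, the sketch below computes QF from miss and double-hit parameters and averages it over three replicates. The function name and the alpha and beta values are hypothetical and were chosen only to show the arithmetic; they are not measurements from this study.

```python
from statistics import mean


def quality_factor(alpha, beta):
    """QF = 1 / (alpha + beta), with alpha = misses and beta = double hits."""
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    total = alpha + beta
    if total == 0:
        raise ValueError("QF is undefined when alpha + beta is zero")
    return 1.0 / total


# Invented (alpha, beta) pairs standing in for three biological replicates.
replicates = [(0.10, 0.05), (0.12, 0.04), (0.08, 0.06)]
qf_values = [quality_factor(a, b) for a, b in replicates]
qf_mean = mean(qf_values)
print(qf_values, qf_mean)
```

Smaller damping terms give a larger QF, so a culture whose misses and double hits decrease during recovery shows a rising QF over time.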

Photoinhibition was determined by tracking Fv/Fm during exposure to 1500 µE m−2 s-1 high light conditions in the presence and absence of chloroplast protein synthesis inhibitor lincomycin (LIN). Exponential phase cells adapted to 1M NaCl were pelleted and then shocked with 0.01, 0.4, 1, and 1.5 M NaCl with the addition of 10 mM bicarbonate and 1 mM LIN. Fv/Fm was measured using a PAM (pulse amplitude modulated) fluorometer (Photon Systems Instruments, Brno, Czech Republic) after 2-minute dark adaptation. Samples were exposed to high light for a total of 70 minutes with measurements taken every ten minutes (7 min light exposure, 2 min dark adaptation, 1 minute measurement time). Experiments were performed with three biological replicates.

The PSII inactivation rate was calculated by fitting Fv/Fm over time to a biexponential decay function using the OriginPro 2014 software (OriginLab). The lifetimes of the biexponential curves were calculated using the fitting parameters (Fig. 3.4) and equation

2:

$$\text{lifetime} = A_1 t_1 + (A_2 + t_2) \qquad (2)$$

Conclusions

Our genome-wide and PSII analyses demonstrate that despite living in an apparently mesophilic lagoon environment, environmental fluctuations have left significant “footprints” on the Picochlorum SENEW3 genome. Diametrically opposed salinity conditions impose distinct challenges that this alga responds to with specialized gene sets. Shock and acclimated responses also differ, highlighting the challenges posed by a rapid versus gradual environmental change with respect to salinity. Overall,

Picochlorum SENEW3 responds to high/low salinity stress in a similar fashion as other algae and plants; i.e., photoprotective mechanisms, oxidative stress response, cell wall and membrane rearrangement, nitrogen assimilation, and diverting resources from growth and PSII repair in favor of maintaining homeostasis. Despite these shared responses, the key to the ability of Picochlorum SENEW3 to withstand massive environmental fluctuations is likely explained by genome organization. Co-localization of genes within specialized gene sets appears to function similarly to bacterial operons in enhancing a rapid response to shock. In summary, our study highlights a compact genome that has evolved a broad array of adaptations from HGT to co-localization of stress response genes to a robust PSII to deal with a challenging environment. Although our results suggest that energetic resources are diverted from growth and productivity during periods of salinity shock in favor of maintaining cell homeostasis, Picochlorum SENEW3 is highly adapted to rapid acclimation to salinity shock (beginning within 5 hours) and maintains growth rates over a broad range of salinities. Therefore,

Picochlorum SENEW3 may be suitable for open-pond cultivation under conditions of high irradiance and high salinity, which would not be tolerable to competing freshwater

and many marine microalgal species. Its broad range of salinity and shock tolerance makes Picochlorum SENEW3 particularly suited to cultivation in seawater or saline groundwater because it can tolerate evaporative loss in an open-pond environment.

Acknowledgements

This work was supported by grants from the Department of Energy (DE-

EE0003373/001) to D.B. and B.P., a grant from the DOE Office of Basic Energy

Sciences (DE-FG02-10ER16195) to G.C.D., and graduate training support from the

National Science Foundation IGERT for Renewable and Sustainable fuels program at

Rutgers University (0903675) to F.F. We are grateful to the Rutgers University School of

Environmental and Biological Sciences and members of the Genome Cooperative at

SEBS for supporting this research. F.F. was aided by an R workshop sponsored by the

Porphyra Algal Genomics RCN (NSF0741907).

Figure 3.1. (a) Average chlorophyll variable fluorescence yield (Fv/Fm) and (b) Growth rate of the algal cultures acclimated to 1M NaCl media, following transfer to media containing various salt concentrations under 100 µE m−2 s-1 light. Fv/Fm is the average of

4 trains of 50 STFs and three biological replicates. Error bars represent standard deviation.

Figure 3.2. (a) Examples of co-expressed and co-localized gene clusters. Gene clusters, as defined in the text, are denoted with boxes. Blue represents up-regulated genes; red, down-regulated; gray, not DE. The numbers above the boxes show relative gene locations within contigs; gene numbers are labeled above the boxes. (b) Number of genes co-localized versus total genes in gene set (percentages labeled) at the 1.0 and 1.5 L2fc cutoffs. Only statistically significant results are shown (see Table 3.S5 for details).

Figure 3.3. Summary of the salt shock response in Picochlorum SENEW3 at 1h under (a) high salinity (1.5M NaCl) and (b) low salinity (10mM NaCl) conditions. Genes/pathways in blue are up-regulated (>1.0 L2fc); red, down-regulated (< -1.0 L2fc); gray, not DE

(<1.0 L2fc and > -1.0 L2fc). Solid colored arrows indicate that at least one copy of the gene is DE (i.e., other copies may not be DE), whereas arrows with a blue/red gradient have gene copies that are both up- and down-regulated. The numbers correspond to enzymes shown in Table 3.S8. G3P, glyceraldehyde 3-phosphate; DHAP, dihydroxyacetone phosphate; DHA, dihydroxyacetone; THF, tetrahydrofolate; SQDG, sulfoquinovosyl diacylglycerol; UDP-GlcNAc, uridine diphosphate N-acetylglucosamine; geranylgeranyl-PP, geranylgeranyl pyrophosphate.

Figure 3.4. Photoinhibition under 1500 µE m−2 s-1 high light conditions in the presence and absence of chloroplast protein synthesis inhibitor lincomycin (LIN). Cells adapted to

1M NaCl media were incubated in media at 0.01, 0.4, 1, and 1.5M NaCl. The data are fit to a second order exponential decay curve. The relative amplitudes (A1, A2) and lifetimes in minutes (t1, t2) values are shown. Error bars represent standard deviation.

Supporting Information

Notes S1. Salinity stress responses in Picochlorum SENEW3

Cell wall remodeling prevalent at both high and low salinity stress

Cell wall composition and rigidity are known to change in plant cells under salinity stress, leading to strengthening or weakening of the cell wall (162, 163). DE of genes related to the synthesis and degradation of cell wall constituents at both high and low salinity shock might imply a role for cell wall rearrangement in response to salt shock in

Picochlorum SENEW3. This includes up-regulation of synthesis and degradation of the potential cell wall components chitin and cellulose. Cellulose synthase was shown to be essential for salinity and tolerance in Arabidopsis (164, 165) or may be potentially involved in oxidative stress regulation rather than cell wall rearrangement (165).

Chitinases may alternatively function in plant signaling in response to abiotic stresses including osmotic stress (166, 167). N-glycan biosynthesis is also highly affected by salt shock; it is down-regulated at high salinity and up-regulated at low salinity shock at 1h.

N-glycans are essential for salt stress tolerance in Arabidopsis and play a role in protein folding and maintaining proper cell wall formation (164).

Membrane remodeling key to low salinity shock response

Hypoosmotic shock leads to the rapid influx of water into a cell. Dunaliella, which lacks a rigid cell wall, can expand rapidly upon dilution shock. It relies on vesicle fusion of small vesicles from the endoplasmic reticulum (ER) for plasma and chloroplast membrane expansion (17). Genes involved in vesicle trafficking are up-regulated at 1h under low salinity shock in Picochlorum SENEW3. This includes genes involved in the

SNARE complex and processing of proteins in the endoplasmic reticulum (ER) (Fig.

3.S6). Another gene that is up-regulated, plant synaptotagmin, is thought to be involved in

membrane repair by sensing an influx of extracellular calcium during abiotic stress response, including salinity (168, 169). Several more genes encoding small Rab GTPase proteins are up-regulated at low salinity in Picochlorum SENEW3, and are implicated in vesicle trafficking and the high salt tolerance response in Arabidopsis (170). Membrane remodeling may be important in facilitating expansion of cellular compartments under hypoosmotic stress in Picochlorum SENEW3. Additionally, aquaporins, channels that facilitate water transport across the cell membrane, are significantly down-regulated under both treatments, potentially to reduce osmotic stress through water flux.

Other responses

Carotenoid synthesis is also up-regulated at high salinity at 1h. This response is also observed in Dunaliella under high salinity shock (41).

Figure 3.S1. Venn diagram showing the number of genes that are DE (shared and unique) when comparing high salinity and low salinity at 1h and 5h time points. The numbers under the labels indicate the total DE genes per comparison.

[Figure 3.S2 plots. Each panel shows centered log2(vsd) (y-axis, −3 to 3) at the 0H, 1H, and 5H time points.

A. cluster_1, 38 genes; cluster_2, 219 genes; cluster_3, 85 genes; cluster_4, 346 genes; cluster_5, 1000 genes; cluster_6, 120 genes; cluster_7, 937 genes; cluster_8, 173 genes; cluster_9, 192 genes.

B. cluster_1, 336 genes; cluster_2, 281 genes; cluster_3, 356 genes; cluster_4, 299 genes; cluster_5, 254 genes; cluster_6, 86 genes; cluster_7, 43 genes; cluster_8, 40 genes; cluster_9, 198 genes; cluster_10, 46 genes; cluster_11, 649 genes.]

Figure 3.S2. Gene expression patterns over the time course (data centered) at (A) high salinity and (B) low salinity. The 0h time point is the pre-stressed control. These analyses show genes that are DE in at least one time point. In panel (A), clusters 5, 7, and 9 contain DE genes whose expression patterns return to within +/- 1 L2fc of the control value; in panel (B), clusters 1, 3, 4, and 9 show this pattern.
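The "returning to within +/- 1 L2fc of the control value" criterion can be illustrated with a minimal sketch. It assumes that each cluster is summarized by its mean centered log2 expression at 0h, 1h, and 5h, and that the return is judged at the 5h time point; the function name and the example values are hypothetical and are not taken from the data.

```python
def returns_to_baseline(profile, tolerance=1.0):
    """Return True if the 5h value lies within `tolerance` log2 units of the 0h control.

    `profile` maps time labels ("0h", "1h", "5h") to the mean centered
    log2 expression of a cluster (an assumed summary of each panel).
    """
    return abs(profile["5h"] - profile["0h"]) <= tolerance


# Hypothetical cluster profiles (illustrative values only)
transient = {"0h": -0.4, "1h": 1.8, "5h": 0.1}  # peaks at 1h, then recovers
sustained = {"0h": -1.2, "1h": 0.9, "5h": 1.6}  # still elevated at 5h
```

Under these assumptions, `returns_to_baseline(transient)` is true and `returns_to_baseline(sustained)` is false.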


Figure 3.S3. KEGG metabolic maps comparing the low and high salinity stress response at 1h, revealing little overlap in gene responses (blue). Green: low salinity, red: high salinity, blue: both. (A) Background expression showing all expressed genes (DE and not DE). (B) Up-regulated. (C) Down-regulated.

Figure 3.S4. Example of randomization analysis simulating gene clustering of same-sized data (N=1000). The red line represents the number of genes co-localized in clusters, based on actual data, under high salinity at 1h at various gene expression change cutoffs. Results in top 5% were taken as significant results (blue line).

[Plots not reproduced: three density histograms, for the 1.0, 1.5, and 2.0 log2fold cutoffs. x-axis: number of co-localized genes; y-axis: density.]
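The randomization analysis summarized for Figure 3.S4 can be sketched as a one-sided permutation test. This is a minimal illustration and not the code used in this work. It assumes that genes are indexed by integer position along a single contig, that a cluster is a run of two or more selected genes separated by at most `max_gap` unselected genes, and that significance means the observed count falls in the top 5% of the random draws. All function names are hypothetical.

```python
import random


def count_colocalized(selected, max_gap=0):
    """Count selected genes that lie in a run of two or more selected genes.

    `selected` holds integer gene positions along one contig. Two selected
    genes join the same run when at most `max_gap` unselected genes lie
    between them.
    """
    positions = sorted(selected)
    total, run = 0, 1
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev - 1 <= max_gap:
            run += 1
        else:
            if run >= 2:
                total += run
            run = 1
    if run >= 2:
        total += run
    return total


def randomization_test(n_genes, selected, n_iter=1000, max_gap=0, seed=0):
    """Compare the observed count with same-sized random gene sets.

    Returns the observed count and the fraction of random sets whose count
    is at least as large (a one-sided empirical p-value).
    """
    rng = random.Random(seed)
    observed = count_colocalized(selected, max_gap)
    null = [
        count_colocalized(rng.sample(range(n_genes), len(selected)), max_gap)
        for _ in range(n_iter)
    ]
    p_value = sum(x >= observed for x in null) / n_iter
    return observed, p_value


# Hypothetical input: 22 DE genes out of 1000, 20 of them contiguous
example_selected = list(range(100, 120)) + [500, 700]
observed, p_value = randomization_test(1000, example_selected)
```

A result would be called significant here when `p_value` is below 0.05. Extending the sketch to several contigs would require counting runs per contig, so that clusters do not span contig boundaries.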


Figure 3.S5. KEGG pathway analysis of genes involved in the TCA cycle under (A) high salinity and (B) low salinity. Up-regulated genes are in green, and down-regulated genes are in red.


Figure 3.S6. KEGG pathway analysis of genes involved in protein processing in the endoplasmic reticulum under (A) high salinity and (B) low salinity. Up-regulated genes are in green, and down-regulated genes are in red.


Figure 3.S7. Effect of salinity on quality factor (QF) over 24 hours for cells initially grown in 1M NaCl and incubated in media at various salinities. QF describes the Kok fitting parameters, alpha (misses) and beta (double hits), of Fv/Fm data from Figure 3.1. Higher QF indicates higher efficiency of WOC cycling.

Table 3.S1. Number of RNA-seq reads in each experiment.

Salinity | Time (Hours) | Read length | Number of Reads
1.0M NaCl | 0 | 2x80 (2) and 1x160 (1) | 25,723,152
10mM NaCl | 1 | 2x80 (2) and 1x160 (1) | 32,040,665
10mM NaCl | 5 | 2x80 (2) and 1x160 (1) | 21,095,104
1.5M NaCl | 1 | 1x160 (3) | 13,231,733
1.5M NaCl | 5 | 1x160 (3) | 13,745,130

Table 3.S2. Co-localized genes in clusters in various gene sets. The highlighted cells represent statistically significant results.

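Salinity | Time | Up/Down regulated | Cut-off (log2fold) | Number of clusters | Number of genes in clusters | Total genes in data set
1.5M | 1H | Up | 1 | 204 | 546 | 886
1.5M | 1H | Up | 1.5 | 74 | 174 | 433
1.5M | 1H | Down | 1 | 304 | 890 | 1231
1.5M | 1H | Down | 1.5 | 132 | 331 | 672
1.5M | 5H | Up | 1 | 100 | 225 | 531
1.5M | 5H | Up | 1.5 | 30 | 65 | 242
1.5M | 5H | Down | 1 | 47 | 103 | 359
1.5M | 5H | Down | 1.5 | 4 | 7 | 100
0.01M | 1H | Up | 1 | 196 | 494 | 857
0.01M | 1H | Up | 1.5 | 79 | 170 | 448
0.01M | 1H | Down | 1 | 240 | 628 | 955
0.01M | 1H | Down | 1.5 | 106 | 234 | 531
0.01M | 5H | Up | 1 | 109 | 266 | 574
0.01M | 5H | Up | 1.5 | 36 | 79 | 262
0.01M | 5H | Down | 1 | 61 | 142 | 416
0.01M | 5H | Down | 1.5 | 13 | 32 | 155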


Table 3.S3. Examples of co-localized clusters allowing for two intervening genes that do not follow the expression pattern. Highlighted in yellow are conditions under which clusters meet co-localization criteria. Blue indicates genes that are part of a cluster, red denotes intervening genes. Also included are orthologs in Chlorella vulgaris and Coccomyxa subellipsoidea. Full clustering information available in Excel file Table 3.S8.

Picochlorum gene number | Annotation | 1.5M 1h expression (L2fc) | 1.5M 5h expression (L2fc) | 10mM 1h expression (L2fc) | 10mM 5h expression (L2fc) | Chlorella gene number | Coccomyxa gene number

Cluster 1: Photorespiration
contig_97.g399.t2 | serine glyoxylate aminotransferase | 4.464 | NDE | NDE | -0.827 | Chr_1_10098;Chr_1_10098 | Scf_6_10358;Scf_6_10358
contig_97.g401.t1 | blue light receptor | 0.836 | -0.420 | NDE | NDE | Chr_6_10841;Chr_6_10841 | Scf_6_10356;Scf_6_10356
contig_97.g400.t1 | glycolate dehydrogenase | 3.942 | NDE | NDE | NDE | Chr_3_11048;Chr_3_11048 | Scf_17_10093;Scf_7_10437
contig_97.g402.t1 | hydroxypyruvate reductase | 3.773 | NDE | NDE | NDE | Chr_6_10974;Chr_6_10974 | Scf_6_10359;Scf_6_10359

Cluster 2: Photorespiration
contig_97.g408.t1 | mitochondrial substrate carrier family protein | 5.909 | 1.277 | NDE | 0.738 | Chr_6_10973;Chr_6_10973 | Scf_6_10354;Scf_6_10354
contig_97.g409.t1 | serine-pyruvate aminotransferase | 3.430 | NDE | 0.911 | -0.703 | Chr_6_10958;Chr_6_10958 | Scf_6_10355;Scf_6_10355
contig_97.g410.t1 | alanine aminotransferase | 2.432 | -0.501 | -2.514 | 1.257 | Chr_10_10748;Chr_6_10756 | Scf_6_10357;Scf_6_10357

Cluster 3: nitrogen assimilation
contig_16.g86.t1 | molybdopterin biosynthesis protein cnx2 | 1.398 | NDE | -2.976 | -0.676 | Chr_13_10831;Chr_13_10831 | Scf_12_10284;Scf_12_10284
contig_16.g87.t1 | sulfate/molybdate transporter | NDE | NDE | -3.193 | 0.715 | Chr_4_10041;Chr_4_10041 | Scf_11_10121;Scf_11_10121
contig_16.g88.t1 | ferredoxin--nitrite reductase | 1.675 | NDE | -2.285 | 1.056 | Chr_9_10511;Chr_9_10511 | Scf_11_10138;Scf_11_10138
contig_16.g89.t1 | high affinity nitrate transporter | 1.318 | NDE | -1.094 | 0.568 | | Scf_11_10112;Scf_11_10112
contig_16.g90.t1 | high-affinity nitrate transporter 2.1 | 1.604 | 1.134 | -2.087 | -0.579 | Chr_9_10512;Chr_9_10512 | Scf_7_10422;Scf_7_10422
contig_16.g91.t1 | nitrate reductase | 2.215 | 1.559 | -2.173 | 1.942 | Chr_9_10515;Chr_9_10515 | Scf_11_10137;Scf_11_10137

Cluster 4: urea assimilation
contig_54.g177.t1 | urea:Na+ high affinity transporter | 2.169 | NDE | -1.020 | -1.308 | Chr_8_10044;Chr_17_10158 | Scf_15_10193;Scf_15_10193
contig_54.g178.t1 | urea carboxylase | 2.654 | -1.664 | -4.369 | 0.707 | Chr_8_10041;Chr_8_10041 | Scf_20_10117;Scf_20_10117

Cluster 5: light regulated genes
contig_104.g612.t1 | early light induced chloroplast precursor | 3.194 | 1.949 | 2.439 | NDE | Chr_2_11233;Chr_2_11233 | Scf_2_10727;Scf_2_10727
contig_104.g613.t1 | ---NA--- | NDE | NDE | NDE | NDE | |
contig_104.g614.t1 | at3g22840 mwi23_21 | 4.494 | 3.188 | NDE | 1.140 | Chr_2_11233;Chr_2_11233 | Scf_9_10120;Scf_9_10120
contig_104.g615.t1 | protein | 1.630 | 0.532 | -0.720 | 1.106 | Chr_6_11283;Chr_6_11283 | Scf_10_10309;Scf_10_10309

Cluster 6
contig_144.g741.t1 | proteasome component ecm29 | -3.801 | 0.946 | 1.156 | -0.833 | sCrscaffold_79_10004;sCrscaffold_79_10004 | Scf_12_10000;Scf_12_10000
contig_144.g742.t1 | major facilitator superfamily transporter | -2.647 | 0.953 | NDE | -0.573 | Chr_5_10049;Chr_5_10049 | Scf_18_10267;Scf_18_10267
contig_144.g743.t1 | cytochrome c peroxidase | -1.829 | NDE | -0.813 | -0.459 | Chr_2_11422;Chr_2_11422 | Scf_11_10030;Scf_11_10030
contig_144.g744.t1 | l-ascorbate peroxidase | -1.294 | NDE | NDE | -0.840 | Chr_6_10765;Chr_6_10765 | Scf_18_10219;Scf_18_10219
contig_144.g745.t1 | protein | -1.816 | 0.975 | NDE | -1.445 | Chr_12_10814;Chr_3_10334 | Scf_3_10493;Scf_3_10493
contig_144.g746.t1 | 5'-3' exoribonuclease 2 | -1.227 | -0.588 | NDE | NDE | Chr_3_10095;Chr_3_10095 | Scf_18_10275;Scf_18_10275
contig_144.g747.t1 | signal peptidase 18k chain | -0.543 | NDE | NDE | NDE | Chr_9_10440;Chr_9_10440 | Scf_18_10260;Scf_18_10260
contig_144.g748.t1 | protein | -1.297 | NDE | 0.832 | 2.400 | |
contig_144.g749.t1 | phosphatidylserine decarboxylase | -1.392 | NDE | -0.627 | NDE | Chr_2_10999;Chr_2_10999 | Scf_25_10064;Scf_25_10064

Cluster 7
contig_128.g715.t1 | protein ilityhia | NDE | NDE | 1.151 | NDE | Chr_1_10334;Chr_1_10334 | Scf_13_10207;Scf_13_10207
contig_128.g716.t1 | betrophin chloride channel domain | -2.050 | NDE | 1.405 | -3.297 | Chr_7_10892;Chr_6_10602 | Scf_5_10036;Scf_5_10036
contig_128.g717.t1 | ferredoxin | NDE | NDE | 1.925 | 1.030 | Chr_7_10470;Chr_7_10470 | Scf_15_10079;Scf_15_10079
contig_128.g718.t1 | histone h2a | NDE | NDE | 2.072 | 0.431 | Chr_6_10570;Chr_17_10331 | Scf_15_10233;Scf_15_10233

Cluster 8
contig_87.g255.t1 | cathepsin l precursor | NDE | NDE | NDE | 1.570 | Chr_5_10358;Chr_5_10358 | Scf_2_10637;Scf_2_10637
contig_87.g256.t1 | had-superfamily subfamily variant 3 | -2.285 | NDE | 1.331 | NDE | Chr_5_10123;Chr_5_10123 | Scf_16_10032;Scf_11_10221
contig_87.g257.t1 | trehalose phosphate synthase | NDE | -0.958 | -0.552 | NDE | Chr_13_10768;Chr_13_10768 | Scf_1_10644;Scf_1_10644
contig_87.g258.t1 | ---NA--- | NDE | 3.191 | 2.658 | NDE | |
contig_87.g259.t1 | ornithine decarboxylase | -0.854 | 2.544 | 3.268 | -0.390 | Chr_3_10250;Chr_1_11440 | Scf_24_10079;Scf_24_10079
contig_87.g260.t1 | haloacid dehalogenase-like hydrolase | NDE | NDE | 1.931 | NDE | Chr_6_10071;Chr_6_10071 | Scf_1_10633;Scf_1_10633
contig_87.g261.t1 | proline dehydrogenase | -0.842 | -1.343 | -1.802 | -1.080 | Chr_1_10758;Chr_16_10014 | Scf_1_10522;Scf_1_10522
contig_87.g262.t1 | vacuolar atp synthase proteolipid subunit | NDE | NDE | NDE | -0.571 | Chr_6_10187;Chr_6_10187 | Scf_1_10521;Scf_1_10521
contig_87.g263.t1 | secreted trypsin-like serine protease | 1.251 | 1.342 | 3.226 | 2.449 | | Scf_3_10400;Scf_3_10400
contig_87.g264.t1 | phosphatidic acid phosphatase-like protein | 1.210 | NDE | NDE | NDE | Chr_6_10022;Chr_6_10022 | Scf_1_10520;Scf_1_10520
contig_87.g265.t1 | glycosyl transferase | NDE | 0.541 | 3.187 | 1.790 | Chr_3_11034;Chr_3_10732 |
contig_87.g266.t1 | harpin binding protein 1 | 0.992 | NDE | NDE | 1.270 | sCrscaffold_38_10018;sCrscaffold_38_10018 |
contig_87.g267.t1 | nucleic acid binding protein | NDE | NDE | NDE | NDE | Chr_1_10665;Chr_1_10665 | Scf_1_10597;Scf_1_10597
contig_87.g268.t1 | ---NA--- | 1.766 | NDE | 1.383 | -1.541 | |
contig_87.g269.t1 | 30s ribosomal protein s15 | 3.280 | NDE | 1.535 | NDE | Chr_6_10323;Chr_6_10323 | Scf_1_10494;Scf_1_10494

Cluster 9
contig_85.g687.t1 | protein | NDE | NDE | 1.375 | NDE | Chr_13_10630;Chr_13_10630 | Scf_13_10305;Scf_13_10305
contig_85.g688.t1 | short-chain dehydrogenase reductase sdr | 0.674 | NDE | -1.714 | 0.441 | sCrscaffold_27_10023;Chr_16_10566 | Scf_2_10686;Scf_2_10686
contig_85.g689.t1 | arginine n-methyltransferase | 1.102 | 0.749 | 1.905 | NDE | Chr_1_11037;Chr_1_11037 | Scf_3_10449;Scf_3_10449
contig_85.g690.t1 | ---NA--- | 1.479 | 1.211 | 1.865 | NDE | |
contig_85.g691.t1 | ---NA--- | NDE | NDE | NDE | -0.723 | |
contig_85.g692.t1 | vesicle-associated protein 1-2-like | -0.713 | NDE | 1.175 | -1.470 | Chr_3_11003;Chr_3_11003 | Scf_23_10137;Scf_23_10137
contig_85.g693.t1 | ---NA--- | -2.335 | NDE | -1.133 | NDE | |
contig_85.g694.t1 | major facilitator superfamily expressed | -2.218 | NDE | 0.793 | -1.007 | Chr_2_11351;Chr_2_11351 | Scf_13_10293;Scf_13_10293
contig_85.g695.t1 | protein | -1.703 | 1.340 | 2.458 | 3.450 | Chr_11_10244;Chr_11_10244 | Scf_13_10231;Scf_13_10231
contig_85.g696.t1 | protein | NDE | NDE | -0.991 | NDE | sCrscaffold_31_10006;sCrscaffold_31_10006 | Scf_13_10229;Scf_13_10229
contig_85.g697.t1 | glutamine cyclotransferase | NDE | 0.746 | 1.338 | -3.234 | Chr_11_10193;Chr_11_10193 | Scf_18_10183;Scf_18_10183

Cluster 10
contig_4.g156.t1 | nucleoside diphosphate kinase | NDE | NDE | -1.250 | 0.479 | Chr_16_10068;Chr_16_10068 | Scf_18_10199;Scf_18_10199
contig_4.g157.t1 | acetylornithine aminotransferase | NDE | -0.910 | -0.655 | -0.605 | Chr_13_10710;Chr_13_10710 | Scf_1_10211;Scf_1_10211
contig_4.g158.t1 | hydroxyproline-rich glycoprotein family protein | NDE | NDE | -0.461 | -0.946 | Chr_1_11334;Chr_1_11334 | Scf_2_10567;Scf_2_10567
contig_4.g159.t1 | abc transporter g family member 7 | -1.290 | NDE | -1.599 | 1.928 | Chr_12_11329;Chr_12_11329 | Scf_5_10335;Scf_5_10335
contig_4.g160.t1 | major lipid droplet protein | 0.852 | 1.072 | 2.003 | 1.733 | Chr_9_10415;Chr_9_10415 | Scf_5_10352;Scf_5_10352
contig_4.g161.t1 | translation factor-like | -0.925 | -1.129 | -1.485 | NDE | Chr_1_10342;Chr_1_10342 | Scf_2_10562;Scf_2_10562
contig_4.g162.t1 | amino acid transport protein aap2 | 1.722 | NDE | -3.429 | -0.694 | | Scf_10_10102;Scf_10_10102
contig_4.g163.t1 | protein | -3.361 | NDE | -1.161 | -0.635 | Chr_7_10922;Chr_7_10922 |
contig_4.g164.t1 | protein | 0.611 | NDE | NDE | NDE | Chr_16_10890;Chr_16_10890 | Scf_10_10008;Scf_10_10008
contig_4.g165.t1 | craniofacial development protein 1-like | NDE | -0.764 | -1.470 | 0.864 | Chr_14_10501;Chr_14_10501 | Scf_10_10007;Scf_10_10007


Table 3.S4. Gene expression and predicted targeting for genes in pathways involved in salt stress. Accompanying table for Figure 3.3. E.C. number, enzyme commission number; TC number, transporter classification number; NDE, not differentially expressed; C, chloroplast; M, mitochondria; S, signal peptide.

# in fig. | Enzyme | E.C./TC number | Pathway | Gene ID | 1.5M 1h (L2fc) | 1.5M 5h (L2fc) | 10mM 1h (L2fc) | 10mM 5h (L2fc) | Targeting
1 | ribulose-bisphosphate carboxylase small subunit | | Photorespiration; Calvin cycle | contig_124.g739.t1 | 1.778 | NDE | 0.944 | NDE | C
 | | | | contig_58.g145.t1 | 1.131 | 0.736 | NDE | -0.510 | C
2 | phosphoglycolate phosphatase | | Photorespiration | contig_43.g127.t1 | 4.398 | 0.420 | NDE | -0.856 | _
3 | glycolate dehydrogenase | | Photorespiration | contig_97.g400.t1 | 3.942 | NDE | NDE | NDE | M
4 | glutamate-glyoxylate aminotransferase | 2.6.1.2 | Photorespiration | contig_180.g875.t1 | 5.142 | 1.815 | -0.773 | -1.753 | _
 | | | | contig_97.g410.t1 | 2.432 | -0.501 | -2.514 | 1.257 | _
5 | glycine decarboxylase | 2.1.2.1 | Photorespiration; THF cycle | contig_194.g795.t1 | 5.144 | 1.987 | 0.567 | 2.415 | C
6 | serine-glyoxylate aminotransferase | | Photorespiration | contig_97.g399.t2 | 4.464 | NDE | NDE | -0.827 | M
 | | | | contig_97.g409.t1 | 3.430 | NDE | 0.911 | -0.703 | M
7 | hydroxypyruvate reductase | 1.1.1.81 | Photorespiration | contig_97.g402.t1 | 3.773 | NDE | NDE | NDE | M
8 | glycerate kinase | | Photorespiration | not found | NDE | NDE | NDE | NDE | NDE
9 | rubisco activase | | Calvin cycle | contig_329.g728.t1 | 3.439 | 0.799 | NDE | 1.119 | C
10 | phosphoglycerate kinase | 2.7.2.3 | Calvin cycle | contig_142.g606.t1 | 1.278 | NDE | NDE | NDE | C
11 | glyceraldehyde-3-phosphate dehydrogenase | 1.2.1.12 | Calvin cycle | contig_126.g527.t1 | -1.484 | NDE | NDE | NDE | M
 | | | Calvin cycle | contig_69.g562.t1 | 1.689 | NDE | 0.592 | 1.925 | C
12 | triosephosphate isomerase | 5.3.1.1 | Calvin cycle | contig_187.g830.t1 | -2.747 | NDE | -0.458 | NDE | _
 | | | | contig_45.g328.t1 | NDE | -0.882 | -0.580 | -3.685 | C
13 | fructose-bisphosphate aldolase | 4.1.2.13 | Calvin cycle | contig_141.g848.t1 | 1.587 | NDE | 0.959 | 0.848 | C
 | | | | contig_58.g141.t1 | 1.544 | NDE | NDE | -0.690 | C
 | | | | contig_29.g120.t1 | -0.764 | NDE | -0.776 | -0.678 | _
 | | | | contig_201.g653.t1 | NDE | NDE | NDE | NDE | _
14 | fructose-1,6-bisphosphatase I | 3.1.3.11 | Calvin cycle | contig_104.g638.t1 | NDE | -0.811 | 0.324 | -1.984 | C
 | | | | contig_58.g98.t1 | NDE | -1.082 | -1.152 | 0.296 | _
15 | transketolase | 2.2.1.1 | Calvin cycle | contig_180.g864.t1 | 1.313 | -0.510 | 0.582 | -1.466 | C
16 | sedoheptulose-1,7-bisphosphatase | 3.1.3.37 | Calvin cycle | contig_58.g212.t1 | -1.834 | NDE | NDE | NDE | _
 | | | | contig_177.g561.t1 | 0.889 | -0.887 | 0.591 | 0.742 | C
17 | ribose 5-phosphate isomerase A | 5.3.1.6 | Calvin cycle | contig_155.g681.t1 | NDE | -0.913 | -0.541 | 0.952 | M
18 | phosphoribulokinase | 2.7.1.19 | Calvin cycle | contig_16.g155.t1 | 0.615 | NDE | NDE | -2.128 | C
19 | glycine dehydrogenase | 1.4.4.2 | THF cycle | contig_194.g795.t1 | 5.144 | 1.987 | 0.567 | 2.415 | C
20 | 10-formyltetrahydrofolate synthetase | 6.3.4.3 | THF cycle | contig_201.g635.t1 | 3.168 | NDE | NDE | 0.433 | _
21 | glutamine synthetase | 6.3.1.2 | Nitrogen metabolism | contig_240.g949.t1 | 0.919 | -1.539 | -0.946 | -0.654 | M
 | | | | contig_135.g465.t1 | 1.141 | NDE | -2.777 | 1.628 | _
22 | glutamate synthase | 1.4.1.13/1.4.1.14 | Nitrogen metabolism | contig_97.g403.t1 | -1.593 | 0.747 | -1.922 | -2.177 | _
 | | 1.4.7.1 | Nitrogen metabolism | contig_68.g372.t1 | 3.290 | NDE | -1.188 | NDE | C
23 | glutamate dehydrogenase | 1.4.1.4 | Nitrogen metabolism | contig_82.g338.t1 | -3.102 | -1.142 | -0.937 | NDE | _
 | | 1.4.1.3 | | contig_57.g156.t1 | -2.206 | -1.475 | -1.234 | 0.578 | M
 | | 1.4.1.3 | | contig_203.g874.t1 | 1.203 | NDE | -1.279 | -0.962 | _
24 | ferredoxin-nitrite reductase | 1.7.7.1 | Nitrogen metabolism | contig_16.g88.t1 | 1.675 | NDE | -2.285 | 1.056 | M
25 | nitrite transporter NAR1-like | | | contig_150.g639.t1 | 3.224 | 1.371 | 0.676 | NDE | M
26 | nitrate reductase | 1.7.1.1/1.7.1.2/1.7.1.3 | Nitrogen metabolism | contig_16.g91.t1 | 2.215 | 1.559 | -2.173 | 1.942 | _
27 | high affinity nitrate transporter NTR2.1-like | 2.A.1.8.12 | Nitrogen metabolism | contig_16.g90.t1 | 1.604 | 1.134 | -2.087 | -0.579 | _
 | high affinity nitrate transporter NAR2.1-like | 8.A.20.3.1 | | contig_16.g89.t1 | 1.318 | NDE | -1.094 | 0.568 | S
28 | urea:Na+ high affinity transporter | 2.A.21.6.2 | Urea assimilation | contig_54.g177.t1 | 2.169 | NDE | -1.020 | -1.308 | _
29 | urea carboxylase | 6.3.4.6 | Urea assimilation | contig_54.g178.t1 | 2.654 | -1.664 | -4.369 | 0.707 | _
30 | allophanate hydrolase | 3.5.1.54 | Urea assimilation | contig_54.g179.t1 | NDE | NDE | NDE | NDE | M
31 | phosphoglucomutase | 5.4.2.2 | Starch metabolism | contig_44.g332.t1 | 2.405 | 0.824 | 0.993 | 1.056 | C
32 | ADP-glucose pyrophosphorylase | 2.7.7.27 | Starch metabolism | contig_169.g538.t1 | 1.221 | 1.834 | 1.656 | NDE | C
 | | | | contig_758.g788.t1 | NDE | NDE | NDE | NDE | M
 | | | | contig_144.g761.t1 | 0.480 | NDE | -0.602 | NDE | C
 | | | | contig_66.g284.t1 | NDE | NDE | NDE | 1.081 | M
33 | soluble starch synthase III | 2.4.1.21 | Starch metabolism | contig_205.g899.t1 | 1.852 | NDE | -0.569 | 0.811 | C
 | | | | contig_187.g808.t1 | NDE | NDE | NDE | NDE | C
34 | granule bound starch synthase I | 2.4.1.242 | Starch metabolism | contig_152.g808.t1 | 4.331 | 4.385 | 2.750 | -0.997 | C
 | | | | contig_169.g552.t1 | NDE | -1.233 | -0.421 | -1.036 | C
 | | | | contig_84.g479.t1 | 0.985 | NDE | -0.594 | -2.497 | M
35 | 1,4-alpha-glucan branching enzyme | 2.4.1.18 | Starch metabolism | contig_125.g810.t1 | 1.467 | NDE | -0.310 | -1.505 | M
 | | | | contig_146.g649.t1 | -0.756 | NDE | -0.405 | 2.197 | C
36 | glucan water dikinase | 2.7.9.4 | Starch metabolism | contig_134.g589.t1 | -2.456 | NDE | -0.627 | NDE | _
 | | | | contig_58.g234.t1 | NDE | NDE | -0.411 | NDE | M
37 | phosphoglucan water dikinase | 2.7.9.5 | Starch metabolism | contig_196.g901.t1 | NDE | NDE | -1.291 | 1.872 | M
38 | beta-amylase | 3.2.1.2 | Starch metabolism | contig_139.g662.t1 | NDE | -0.612 | NDE | NDE | M
 | | | | contig_196.g908.t1 | 0.457 | -0.422 | -0.866 | -1.600 | _
39 | isoamylase | 3.2.1.68 | Starch metabolism | contig_39.g62.t1 | 0.431 | -1.565 | -0.342 | -3.190 | M
 | | | | contig_45.g478.t1 | -1.863 | NDE | -2.104 | NDE | M
 | | | | contig_150.g638.t1 | -0.567 | NDE | -0.738 | 0.807 | _
40 | disproportionating enzyme 2 | 2.4.1.25 | Starch metabolism | contig_43.g172.t1 | NDE | NDE | -2.598 | 0.541 | _
41 | disproportionating enzyme 1 | 2.4.1.25 | Starch metabolism | contig_57.g261.t1 | NDE | NDE | NDE | NDE | NDE
42 | alpha-amylase | 3.2.1.1 | Starch metabolism | contig_96.g415.t1 | -0.691 | NDE | -1.059 | NDE | M
 | | | | contig_125.g820.t1 | -2.490 | -1.361 | -4.549 | 0.411 | C
 | | | | contig_16.g159.t1 | NDE | NDE | -0.733 | NDE | S
43 | starch phosphorylase | 2.4.1.1.1 | Starch metabolism | contig_17.g11.t1 | -1.099 | 0.891 | NDE | NDE | M
 | | | | contig_64.g296.t1 | -1.574 | 0.758 | -0.859 | 0.513 | C
44 | glycerol 3-phosphate dehydrogenase | 1.1.1.8/1.1.1.94 | Glycerol metabolism | contig_29.g64.t1 | NDE | -1.381 | NDE | NDE | _
 | | | | contig_62.g244.t1 | NDE | NDE | NDE | 0.648 | M
45 | glycerol 3-phosphate phosphatase | 3.1.3.21 | Glycerol metabolism | not found | NDE | NDE | NDE | NDE | NDE
46 | glycerol kinase | 2.7.1.30 | Glycerol metabolism | contig_140.g808.t1 | -0.544 | -0.662 | -0.960 | -0.615 | _
47 | dihydroxyacetone kinase | 2.7.1.29 | Glycerol metabolism | contig_54.g92.t1 | -1.375 | 0.752 | NDE | NDE | _
48 | glycerol dehydrogenase | 1.1.1.6 | Glycerol metabolism | contig_91.g462.t1 | NDE | 1.003 | -1.524 | -2.571 | M
49 | possible dihydroxyacetone reductase | 1.1.1.156 | Glycerol metabolism | contig_307.g934.t1 | 3.635 | NDE | NDE | 0.878 | _
50 | trehalose phosphate synthase | 2.4.1.15 | Trehalose synthesis | contig_87.g257.t1 | NDE | -0.958 | -0.552 | NDE | _
 | | | | contig_139.g614.t1 | NDE | NDE | NDE | NDE | _
51 | trehalose 6-phosphate phosphatase | 3.1.3.12 | Trehalose synthesis | contig_124.g772.t1 | 1.811 | NDE | NDE | NDE | _
52 | cellulose synthase | 2.4.1.12 | Cellulose metabolism | contig_58.g204.t1 | 1.065 | NDE | NDE | -1.114 | _
53 | cellulase | 3.2.1.4 | Cellulose metabolism | contig_66.g280.t1 | 1.919 | NDE | NDE | NDE | S
54 | UDP-sulfoquinovose synthase | 3.13.1.1 | SQDG synthesis | contig_124.g763.t1 | 1.234 | NDE | 0.477 | -1.498 | C
55 | sulfoquinovosyltransferase | 2.4.1.- | SQDG synthesis | contig_62.g265.t1 | 1.704 | NDE | NDE | NDE | _
56 | chitin synthase | 2.4.1.16 | chitin metabolism | contig_69.g530.t1 | 1.456 | NDE | 1.588 | -1.075 | _
57 | chitinase | 3.2.1.14 | chitin metabolism | contig_57.g176.t1 | 1.324 | NDE | 1.711 | -0.949 | S
58 | beta-N-acetylhexosaminidase | 3.2.1.52 | chitin metabolism | contig_43.g170.t1 | -1.138 | NDE | -0.958 | NDE | S
59 | N-acetylglucosamine kinase | 2.7.1.59 | chitin metabolism | not found | NDE | NDE | NDE | NDE | NDE
60 | phosphoacetylglucosamine mutase | 5.4.2.3 | chitin metabolism | contig_180.g865.t1 | NDE | NDE | -0.561 | NDE | _
61 | UDP-N-acetylglucosamine | 2.7.7.23 | chitin metabolism | contig_4.g106.t1 | -2.484 | NDE | NDE | NDE | _
62 | aquaporin | 1.A.8.8.1 | transporter | contig_163.g793.t1 | -4.077 | -3.818 | -2.677 | -0.628 | _
63 | glutathione peroxidase | 1.11.1.9 | oxidative stress response | contig_64.g297.t1 | 1.350 | 1.169 | NDE | NDE | _
63 | glutathione reductase | 1.8.1.7 | oxidative stress response | contig_69.g553.t1 | -1.060 | NDE | 0.957 | -0.482 | _
64 | carbonic anhydrase | | Calvin cycle | contig_41.g125.t1 | NDE | NDE | NDE | 0.710 | M
 | | | | contig_55.g188.t1 | 4.962 | 1.768 | NDE | -0.558 | _
65 | phytoene synthase | 2.5.1.32 | carotenoid biosynthesis | contig_141.g849.t1 | 1.398 | -1.684 | NDE | 1.905 | C
66 | zeta-carotene isomerase | 5.2.1.12 | carotenoid biosynthesis | contig_34.g78.t1 | 1.032 | NDE | NDE | NDE | M
67 | 15-cis-phytoene desaturase | 1.3.5.5 | carotenoid biosynthesis | contig_150.g646.t1 | 1.400 | -0.977 | 0.751 | 2.779 | M
68 | zeta-carotene desaturase | 1.3.5.6 | carotenoid biosynthesis | contig_58.g224.t1 | 2.081 | -0.437 | NDE | 0.859 | M
69 | lycopene beta cyclase | 5.5.1.19 | carotenoid biosynthesis | contig_45.g484.t2 | 0.930 | -1.117 | 1.417 | NDE | C
70 | beta-ring hydroxylase | 1.14.-.- | carotenoid biosynthesis | contig_146.g617.t1 | NDE | -0.964 | -1.772 | -0.432 | M
71 | carotene epsilon-monooxygenase | 1.14.99.45 | carotenoid biosynthesis | contig_18.g35.t1 | 0.586 | -0.852 | -1.106 | 1.307 | _
72 | glutamate 5-kinase | 2.7.2.11 | proline synthesis | contig_150.g671.t1 | -0.795 | NDE | -0.628 | -2.027 | _
 | | | | contig_100.g607.t1 | 0.513 | NDE | NDE | -1.960 | _
73 | glutamate-5-semialdehyde dehydrogenase | 1.2.1.41 | proline synthesis | contig_30.g21.t1 | NDE | NDE | NDE | NDE | M
74 | pyrroline-5-carboxylate reductase | 1.5.1.2 | proline synthesis | contig_135.g458.t1 | 1.482 | NDE | -1.397 | -2.048 | _
 | | | | contig_84.g431.t1 | -2.145 | -0.865 | -1.389 | 1.318 | M

Table 3.S5. Differentially expressed genes of bacterial origin. NDE, not differentially expressed; L2fc, log2fold change.

Contig ID | Annotation | 0.01M NaCl 1h (L2fc) | 0.01M NaCl 5h (L2fc) | 1.5M NaCl 1h (L2fc) | 1.5M NaCl 5h (L2fc)
contig_43.g357.t1 | indolepyruvate decarboxylase | 3.450 | 3.006 | 1.316 | 1.284
contig_107.g527.t1 | Trypsin-like serine protease | 2.946 | 1.460 | 2.909 | 1.265
contig_231.g589.t1 | glycosyl transferase family protein | 1.855 | NDE | NDE | NDE
contig_114.g490.t1 | O-methyltransferase | 1.014 | NDE | NDE | NDE
contig_82.g397.t1 | glutamylendopeptidase | -3.334 | -3.314 | NDE | NDE
contig_201.g667.t1 | sheath polysaccharide-degrading enzyme | -1.923 | NDE | NDE | NDE
contig_196.g914.t1 | Peptidase M13(PepO) | -1.801 | NDE | NDE | NDE
contig_91.g462.t1 | glycerol dehydrogenase | -1.524 | 1.494 | NDE | 1.003
contig_101.g759.t1 | glycosyltransferase glycosyltransferase o-methyltransferase | -1.197 | NDE | 1.187 | NDE
contig_96.g379.t1 | sulfatase | 2.302 | NDE | NDE | NDE
contig_28.g272.t1 | sheath polysaccharide-degrading enzyme | NDE | NDE | 3.343 | 3.169
contig_166.g707.t1 | rmt2 protein | NDE | NDE | 1.304 | NDE
contig_126.g526.t1 | dna methylase n-4 n-6 domain protein | NDE | NDE | 1.315 | NDE

Table 3.S6. Excel file of DE genes. Highlighted cells (orange) denote HGT.

Table 3.S7. Excel file of shared differentially expressed genes between the 4 conditions tested (high and low salinity at 1h and 5h)

Table 3.S8. Excel file of all co-expressed co-localized genes.


Chapter 4: Characterization of multiple Picochlorum genomes to elucidate the origin of salt tolerance.

Fatima Foflonker1, Devin Mollegard2, Meichin Ong2, Hwan Su Yoon3, and Debashish Bhattacharya1

1Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA.

2School of Environmental and Biological Sciences, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA.

3Department of Biological Sciences, Sungkyunkwan University, Suwon 16419, Korea.

Abstract

Understanding how microalgae adapt to rapidly changing environments is important not only to basic science but also to illuminate the potential impact of climate change on the biology of critical primary producers. Here we used the green algal genus, Picochlorum, as a model to elucidate strategies of environmental adaptation. This taxon has highly reduced and specialized nuclear genomes. We found that coordinated regulation of the gene inventory is utilized in adaptation to salinity stress, in combination with gene loss and gene family expansion. Relying on robust genome assemblies from multiple Picochlorum species, we determined the extent of horizontal gene transfer (HGT) from prokaryotic sources and its role in the origin of novel functions. HGT is an ongoing and dynamic process in this algal clade, with transfer, divergence, and loss playing key roles in adaptation. Genes of bacterial origin ‘move’ within Picochlorum genomes, suggesting that HGT-derived sequences may impact genome rearrangement. Large differences in levels of heterozygosity were found in diploid haplotypes among Picochlorum species. Biallelic divergence was pronounced in P. oklahomensis (salt plains environment) when compared to its closely related sister strain Picochlorum SENEW3 (brackish water environment); allele-specific expression data under salinity shock suggest a role in the response to environmental stress. In summary, we provide fundamental insights into how microbial eukaryotes with limited gene inventories expand their habitat range from mesophilic to halophilic by evolving coordinated gene regulation and allelic diversity, with minor but important contributions made by the acquisition of foreign prokaryotic genes.

Introduction

Adaptive evolution is a process by which natural variation in populations under divergent selection from the environment results in niche expansion, local specialization, and speciation (171, 172). Understanding environmentally directed evolution is of increasing importance (173, 174) because a fluctuating climate can impact foundational primary producers such as microalgae (175-179). More broadly, understanding how traits such as halotolerance evolve is important in the face of salinizing soils and water sources that will impact agricultural crops and food security in the coming years (180, 181). Here we study habitat-driven adaptive differentiation of the green algal genus, Picochlorum (Chlorophyta, Trebouxiophyceae). This halotolerant clade evolved from freshwater ancestors, thereby providing a model for elucidating the origin of salinity tolerance and other stress adaptations (49).

We performed a comparative genome analysis of five Picochlorum species that represent a diversity of halotolerance capacities within the genus. Cells of this unicellular genus are small (1.5-3µm in diameter), coccoid, and contain a single chloroplast and mitochondrion (182). Picochlorum species have been studied for their potential application in wastewater remediation (183, 184), biomass production (46, 48, 134, 185-189), and as aquaculture feedstock (190, 191). One species investigated in this study, Picochlorum soleocismus, has been found to be amenable to genetic manipulation in order to increase lipid production (48).

The nuclear genome of Picochlorum SENEW3 was previously sequenced using Illumina short-read technology and found to be highly reduced in size (13.5 Mbp), resulting in interesting features, including ‘gene neighborhoods’ of coding regions that are co-expressed under salt stress, potentially as a response to selective constraints imposed by fluctuating environmental conditions (26). This species was also highly robust to salinity and high light stress, and its adaptations included horizontal gene transfer (HGT) from prokaryotic sources of genes related to stress responses, the clustering of functionally related genes, and a robust photosystem II water oxidation complex (26, 132).

In the past, many green algae selected for draft genome sequencing were haploid dominant, to ameliorate issues with DNA polymorphisms that interfere with the assembly process. Due to inherent limitations of current sequencing technologies, some genome assemblies may not accurately represent the ploidy of the organism, including that of Picochlorum SENEW3 (132). This is a result of the loss of haplotype phasing information when short-read (e.g., Illumina) data are used for assembly. In cases of high haplotype diversity, this can result in a patchwork consensus assembly that represents either haplotype at a given polymorphic site (SNP). For this reason, phase-aware assemblers that rely on long-read data (e.g., PacBio) may reveal the existence of more diploid algal species than have been previously reported, as will be shown in our analysis.

Through an integrated genomic and transcriptomic analysis of the halotolerant Picochlorum lineage, with a focus on two sister species, we address the consequences of habitat-driven adaptation on halotolerance and other key stress adaptations by analyzing physiology, genome architecture, genome evolution, and transcriptional regulation. We address the question of how species with reduced genomes utilize, modify, or adjust regulation of existing gene inventories to adapt to new environmental challenges. With the limited number of HGT analyses of eukaryotic genomes that are currently available, there has been active debate regarding the extent and role of HGT in eukaryote evolution (192). Using high quality genome information, we conducted a rigorous search for HGT and studied its role in niche adaptation in the Picochlorum lineage. We also demonstrated that haplotype diversity can be extensive in algal species and addressed the functional relevance of divergent alleles to stress adaptation.


Results

Physiology

Four Picochlorum strains of varying halotolerance were used in the growth rate analysis. Picochlorum SENEW3 was isolated from a small permanent pond with seasonally variable salinity in the San Elijo Lagoon estuary in San Diego County, California, USA (55). A closely related sister species, Picochlorum oklahomensis, was isolated from a temporary saline pool in the Great Salt Plains, Oklahoma, USA. The Great Salt Plains also experiences high desiccation, UV irradiation, and diel temperature variation (182). Picochlorum NBRC102739 was originally isolated from the Pacific Ocean, and P. oculata from brackish waters of the York River Estuary, Virginia, USA. P. oklahomensis, Picochlorum NBRC102739, and P. oculata were obtained from culture collections. Picochlorum soleocismus DOE 101 was not available in culture, but its genome was previously published and is included in this analysis (48).

Salinity tolerance ranges were determined for the four Picochlorum strains that we had in culture (i.e., excluding P. soleocismus) (Figure 4.1A). Picochlorum taxa maintain relatively robust growth rates over a wide range of salinities, with Picochlorum SENEW3 and P. oklahomensis tolerating the highest salinities, making this genus a good model to study physiological acclimation. The two hypersaline-tolerant species, Picochlorum SENEW3 and P. oklahomensis, have similar growth rates at high salinities, but P. oklahomensis has a lower growth rate at low salinities. Both species grew at up to 1.2M NaCl in this experiment, and growth has been reported at up to 1.8M NaCl for Picochlorum SENEW3 and up to 150 psu for P. oklahomensis (132, 193). These results may reflect the different salinity regimes of the two habitats. P. oklahomensis experiences only intermittent freshwater exposure in its hypersaline terrestrial salt flat environment, during brief periods of heavy rainfall that form pools that dry up in a matter of days, whereas Picochlorum SENEW3 is exposed to seasonal salinity changes in a permanent pond, with seasonal rainfall followed by gradual evaporation over the remainder of the year (55, 194). Henley et al., however, were able to show near maximal growth of P. oklahomensis and P. oculata in freshwater (i.e., 0% NaCl) (194).

Genome features and assembly

Three different Picochlorum taxa were sequenced, and an improved genome assembly was generated for Picochlorum SENEW3. Picochlorum SENEW3 and P. oklahomensis were sequenced using the PacBio RSII system, generating 1.4 and 3.7Gbp of data, respectively. The Illumina platform was used to sequence P. oculata (1.8Gbp) and Picochlorum NBRC102739 (2.4Gbp). The assembler in CLC Genomics Workbench and FALCON-Unzip were used to assemble the Illumina and PacBio data, respectively. The genome sizes of the five Picochlorum taxa (excluding Picochlorum NBRC102739, see below) in this study range from 13.4 – 15.14Mbp, encoding 6,340 – 7,037 predicted genes (see Table 4.1). The PacBio generated assembly of the Picochlorum SENEW3 genome resulted in a high quality 13.5Mbp nuclear genome assembly in 16 contigs with 7,014 predicted genes and an N50 = 1.1Mbp. This is a significant improvement over the previous Illumina generated assembly of 13.5Mbp in 1,266 contigs with 7,367 predicted genes and an N50 = 124.5kbp (132).
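The N50 statistic used to compare the two assemblies can be computed directly from a list of contig lengths. The following is a minimal sketch, assuming the common definition of N50 (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly). The function name and the toy contig lengths are hypothetical and are not taken from the assemblies reported here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in bp).

    Assumed definition: sort contigs from longest to shortest and return
    the length of the contig at which the running total first reaches
    half of the total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


# Hypothetical toy assembly (lengths in bp), for illustration only.
toy_contigs = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
```

A more fragmented assembly of the same total size yields a smaller N50, which is why this statistic is commonly reported alongside contig number.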

Picochlorum SENEW3 and P. oklahomensis were identified as diploid organisms, and the primary (haplotype) assemblies could be separated from heterozygous regions (haplotigs) using FALCON-Unzip. The primary assembly sizes of P. oklahomensis (13.47Mbp) and Picochlorum SENEW3 (13.36Mbp) contrast with the haplotig assemblies, which totaled 12.05Mbp and 0.46Mbp, respectively. The haplotigs of P. oklahomensis consist of 80 contigs mapping to 11.8Mbp of the primary assembly (87%) and encode 7,063 predicted proteins. Haplotig assembly sizes suggest that the P. oklahomensis genome is highly heterozygous compared to Picochlorum SENEW3, and indeed, variant detection results in 48-fold more single or multi-nucleotide variants per kbp in P. oklahomensis (Table 4.S1). Figure 4.S1 shows the distribution of variant frequencies around 50%, supporting the diploid nature of these genomes. The other Picochlorum species (P. oculata, P. soleocismus) are likely to be diploid as well but have low heterozygosity, resulting in a smaller Illumina assembly, whereas the 22.76Mbp genome assembly of Picochlorum NBRC102739 with 12,018 genes may represent an overestimate resulting from high heterozygosity. The complexity introduced by highly heterozygous diploid genomes that are sequenced using short-read data leads to the assembly of polymorphic regions as separate contigs, resulting in many smaller contigs and an over-estimation of the genome size and gene inventory (195).

To remove potential false duplications during the assembly process, a non-redundant gene set (<99% similarity) was used for the analyses in this paper, resulting in 6,327 – 6,832 genes (excluding Picochlorum NBRC102739) (Table 4.1). When compared to the BUSCO core Eukaryota protein set, the Picochlorum assemblies contain between 82% (P. oklahomensis) and 93% (Picochlorum SENEW3) of the core proteins. This is comparable to other green algal assemblies (Chlorella variabilis, 90.4%; Coccomyxa subellipsoidea, 93.4%) (Table 4.S2).


Phylogeny and genome synteny

A multi-protein maximum likelihood tree (Figure 4.2) constructed from a super-alignment of 1,122 proteins (293,805 amino acids) from 18 green algae reveals that Picochlorum SENEW3 and P. oklahomensis are closely related sister species, as are P. oculata and P. soleocismus (bootstrap support 100). This is reflected in the similar genome sizes, predicted gene numbers, and G + C contents within each of the two pairs of sister species.

To determine the percentage collinearity (i.e., number of collinear homolog pairs / number of homolog pairs) among the Picochlorum species, the MCScanX_h program of the MCScanX toolkit was used with homologous groups determined by an OrthoMCL analysis (Blastp E-value cutoff < 1E-5) as the input. Picochlorum NBRC102739 was omitted from this analysis due to its lower quality assembly. This pairwise collinearity analysis (minimum block size = 5, maximum gaps = 5) revealed values ranging from 80.0% to 88.7%, with the highest collinearity occurring between the sister species pairs Picochlorum SENEW3 and P. oklahomensis (88.3%), and Picochlorum DOE101 and P. oculata (88.7%) (Table 4.S3). For a full MCScanX synteny comparison for the five species, see SI Excel file 1. For comparison, collinearity between Picochlorum SENEW3 and C. variabilis is 45.6% and between Picochlorum SENEW3 and C. subellipsoidea is 7.4%, indicating that synteny is well conserved within the Picochlorum clade. The synteny between P. oklahomensis and Picochlorum SENEW3 is shown in Figure 4.S2 and consists of 24 collinear blocks between 7 and 746 genes in length (average = 242 genes). This image reveals a region of 306 genes that is missing in P. oklahomensis, corresponding to part of contig 7 in Picochlorum SENEW3. These genes are present in the three other species (SI Excel file 1), and thus these missing data are likely due to sequencing error. Synteny is highly conserved between the sister species P. oklahomensis and Picochlorum SENEW3, whereas some rearrangement is seen in P. soleocismus, a more distantly related species with a high-quality genome. Although the percentage of collinear homologous gene blocks between Picochlorum SENEW3 and P. oculata is 85.8%, Figure 4.S2C highlights the fragmented nature of Illumina assemblies, which makes synteny on a larger scale difficult to assess.
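The percentage collinearity defined in this section (number of collinear homolog pairs divided by number of homolog pairs) reduces to a simple ratio once the two counts are available. The sketch below is illustrative only: it assumes the counts have already been extracted from the collinearity output, and the function name and example counts are hypothetical rather than values from Table 4.S3.

```python
def percent_collinearity(collinear_pairs, homolog_pairs):
    """Percentage of homolog pairs that fall within collinear blocks.

    collinear_pairs: number of homolog pairs located in collinear blocks
    homolog_pairs:   total number of homolog pairs between the two genomes
    """
    if homolog_pairs <= 0:
        raise ValueError("homolog_pairs must be positive")
    if not 0 <= collinear_pairs <= homolog_pairs:
        raise ValueError("collinear_pairs must lie between 0 and homolog_pairs")
    return 100.0 * collinear_pairs / homolog_pairs


# Hypothetical counts, for illustration only.
example = percent_collinearity(883, 1000)
```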

The duplicate gene classifier program of the MCScanX toolkit was used to classify redundant genes as segmental, tandem, proximal (<20 intervening genes), or dispersed duplications (Blastp cutoff 1e-10). Excluding Picochlorum NBRC102739, singletons make up the majority (90.4 – 93.5%) of Picochlorum species genomes, followed by dispersed duplications (5.4 – 7.4%) (Table 4.S4). About 75% of the Picochlorum NBRC102739 genome is categorized as segmental duplications, again likely reflecting falsely duplicated chromosomal regions resulting from a lower quality assembly.

Horizontal gene transfer

The five Picochlorum species in this study were analyzed using automated phylogenetic methods to identify instances of HGT. The standard approach is to identify cases of incongruence between gene trees and species trees; e.g., Picochlorum species nested in bacterial clades. In total, 13 HGT-derived gene families of prokaryotic origin were identified that are unique to Picochlorum (i.e., not found in any other available Chlorophyta genome), 8 of which are common to all five species in this genus (Figure 4.3 and Table 4.S5). This represents a set of HGT candidates with phylogenetic evidence and transcriptomic evidence in P. oklahomensis or Picochlorum SENEW3. Two genes contain introns, further validating integration into the nuclear genome and ruling out contamination (Figure 4.S3). A total of 7/24 of the HGT candidates previously identified in Picochlorum SENEW3 were verified by this analysis using the more robust genome assembly, the IQ-TREE method, and the increased number of sequenced algal genomes.

Figure 4.3 shows the differential gain of HGT-derived genes, indicating the ongoing and dynamic nature of HGT. Cellulase was acquired only in Picochlorum SENEW3 and P. oklahomensis. Three genes were potentially gained in the P. oculata and P. soleocismus lineage; however, there are no transcriptome data to verify expression of these HGT candidates. Acetolactate synthase, involved in branched chain amino acid biosynthesis and fermentation, appears to have been acquired in a common ancestor of Picochlorum species and lost in P. oklahomensis and Picochlorum SENEW3.

One note of interest is that among the eight HGT candidate families present in all Picochlorum species, the majority (6/8 genes in Picochlorum SENEW3) are collinear in the two sister species, but not across all species. Instead, HGT candidates appear as insertions of one or a few genes into otherwise collinear blocks of more distantly related species (Figures 4.S3, 4.S4, 4.S5). This feature, along with the phylogenies, indicates that these transfers occurred in the Picochlorum common ancestor and were later rearranged in genomes, suggesting that regions containing transferred genes are also more likely to be mobile.

The majority (8/9) of HGT-derived gene families in Picochlorum SENEW3 or P. oklahomensis are differentially expressed (DE; see below) under at least one stress condition in either species, indicating functional relevance of many of these genes in stress tolerance (Table 4.S6). The potential HGT of genes involved in fermentation (fumarate reductase, O-succinylbenzoate-CoA ligase, acetolactate synthase) indicates that foreign gene acquisition increases metabolic flexibility in anaerobic environments. Some foreign genes are also involved in cofactor biosynthesis: a putative O-succinylbenzoate-CoA ligase involved in menaquinone synthesis, and a pimeloyl-ACP methyl ester carboxylesterase-like protein involved in biotin synthesis. Glycerol dehydrogenase is involved in osmolyte production and therefore salinity tolerance. Two genes in the GDP-fucose synthesis pathway, which is involved in cell wall metabolism, are also of foreign origin. Indolepyruvate decarboxylase, involved in the production of the plant auxin indole-3-acetate (IAA), is differentially expressed under all salinity stress conditions in Picochlorum SENEW3 and P. oklahomensis, suggesting that it may be involved in stress regulation (Table 4.S6). Auxins have been shown to increase tolerance to salinity stress in higher plants (196). It has been suggested that IAA produced by bacteria serves as a signaling molecule mediating bacteria-diatom symbiotic interactions, or that IAA produced endogenously by diatoms plays a role in intra-species signaling (197-199).

Gene gain/loss

We analyzed the organelle genomes in the Picochlorum genus. The chloroplast genome size ranged from 72.7 – 74.5kbp with 65 or 68 predicted genes (Tables 4.S7, 4.S8). The P. oculata and P. soleocismus lineages have lost the three required genes (chlN, chlB, chlL) for the light-independent chlorophyll synthesis pathway in plastids. These genes are present in the other Picochlorum species as well as in Chlorella and Chlamydomonas. Loss of this pathway is common in algae, angiosperms, and in other chlorophytes such as Ostreococcus tauri and Micromonas pusilla (200-203). The mitochondrial genomes are between 33.0 – 38.7kbp with 14 – 17 predicted genes (Tables 4.S7, 4.S9). P. soleocismus and P. oculata are missing NADH dehydrogenase subunit 9 (nad9) and small subunit ribosomal RNA (rrnS). Ribosomal protein S10 (rps10) has copies present in both the nuclear genome and mitochondrial genome in P. oklahomensis and Picochlorum SENEW3, whereas it is only present in the nuclear genomes of the other three species. All five species show conserved synteny of rps10 in the nuclear genome, and nuclear copies are DE in P. oklahomensis and Picochlorum SENEW3 under the salinity shock conditions discussed below. Many independent transfers of rps10 to the nuclear genome have been found in angiosperms with subsequent rapid loss of the mitochondrial copy, suggesting that this may be a recent transfer in which the mitochondrial copy has not yet been lost from P. oklahomensis and Picochlorum SENEW3 (204).

With regard to the nuclear gene inventory, OrthoMCL and the DOLLOP program of the PHYLIP package (Dollo parsimony method) were used to infer gene family gains and losses among sequenced chlorophyte species (Figure 4.2). Gene family loss can be seen along the branches of the Picochlorum genus, consistent with genome size reduction in these species. Picochlorum species share a core set of 3,825 gene families, with sister species P. oklahomensis and Picochlorum SENEW3 sharing 5,343 and P. oculata and Picochlorum NBRC102739 sharing 5,405 gene families. One interesting finding is the loss of urea assimilation related genes in P. oklahomensis. These genes are clustered in the genomes of other Picochlorum species and include a high affinity sodium:urea transporter, urea carboxylase, and allophanate hydrolase. These genes and the urease homologs are absent in the PacBio and Illumina generated assemblies from P. oklahomensis and appear to be missing from an otherwise collinear gene order shared with Picochlorum SENEW3. Algal growth in the Great Salt Plains may be dependent on nitrogen availability; therefore, loss of the urea assimilation pathway suggests limited access to urea as a N-source (205, 206).

Gene family expansion of the sodium hydrogen exchanger 7-family (SOS1), involved in salt extrusion, is seen in Picochlorum SENEW3 (3 copies) and Picochlorum NBRC102739 (3 copies), compared to a single-copy gene in the other Picochlorum species. Over-expression of SOS1 improves salinity tolerance in Arabidopsis; however, none of these genes were differentially expressed in Picochlorum SENEW3 under previously tested salinity shock conditions (26, 207).

Picochlorum SENEW3 and P. oklahomensis transcriptome comparison in response to salinity shock

Transcriptome analysis was performed on P. oklahomensis cultures acclimated to F/2 medium containing 1M NaCl and shocked with high salinity (1.5M NaCl) and low salinity (10mM NaCl), corresponding to a previous experiment done with Picochlorum SENEW3 under the same conditions (26). Messenger RNA was sequenced from the 1h and 5h time points after exposure to salinity shock. A log2 fold change (l2fc) cutoff of 1 (adjusted P-value < 0.01) was used to designate differential gene expression (DE). Growth rates (Figure 4.1B) show that both species respond similarly under salinity stress and show improved growth at lower salinity after acclimation of a few days, whereas cells remain stressed under high salinity. Whereas P. oklahomensis and Picochlorum SENEW3 have similar growth responses to low salinity shock approximately 4 days after changing from 1M to 10mM NaCl in the medium (Figure 4.1B), Picochlorum SENEW3 shows better acclimated growth rates at low salinity (Figure 4.1A). This suggests that the long-term acclimation responses, which were not explored in our RNA-seq experiment, may differ.
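The differential expression criterion stated in this section (a log2 fold change cutoff of 1 with an adjusted P-value below 0.01) can be expressed as a small filter. This is a minimal sketch under stated assumptions: the cutoff is assumed to apply to the absolute log2 fold change and to be inclusive, and the function name and return labels are hypothetical. It does not reproduce the statistical testing that generates the fold changes and adjusted P-values.

```python
def classify_de(log2fc, padj, lfc_cutoff=1.0, padj_cutoff=0.01):
    """Label a gene as 'up', 'down', or 'not DE' under one condition.

    Assumptions (not stated explicitly in the text): the fold-change cutoff
    applies to |log2fc| and is inclusive; the adjusted P-value cutoff is strict.
    """
    if padj >= padj_cutoff or abs(log2fc) < lfc_cutoff:
        return "not DE"
    return "up" if log2fc > 0 else "down"


# Hypothetical values, for illustration only.
labels = [classify_de(1.5, 0.001), classify_de(-2.0, 0.005), classify_de(3.0, 0.05)]
```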

Between 2,202 and 3,135 genes were DE under each of the four conditions tested, totaling 4,967 unique DE genes under at least one condition, representing 73% of the genome. This can be compared to 50% of the gene inventory in Picochlorum SENEW3 affected by salinity stress under the same conditions. The core salinity stress response in P. oklahomensis is comprised of 9.4% of these genes, whereas 22.4% of the genes are uniquely involved in the high salinity response, and 31.1% in the low salinity response (Figure 4.S6). A total of 4,841 gene homolog pairs show DE in P. oklahomensis and Picochlorum SENEW3 under at least one condition, with more genes detected as DE in P. oklahomensis than in Picochlorum SENEW3 under all conditions. Responses of each species within the first hour show more similarity at low salinity (53.6% and 61.6% of up- and down-regulated genes in Picochlorum SENEW3 shared with P. oklahomensis) and for down-regulated genes at high salinity (45.8%) than for up-regulated genes at high salinity (28.7%). By 5h the percentage of shared genes is similar across conditions, implying that the greatest regulatory differences are in the response to high salinity shock at 1h (Figure 4.S7). Furthermore, 53 genes up-regulated in P. oklahomensis under high salinity shock at 1h are down-regulated in Picochlorum SENEW3 (97 genes in Picochlorum SENEW3 show the opposite regulation).


The response of P. oklahomensis and Picochlorum SENEW3 to high salinity differs greatly with respect to nitrogen metabolism, photorespiration, and osmolyte production (SI Excel file 2). Whereas Picochlorum SENEW3 shows an increase in nitrate and urea uptake and assimilation at 1h under high salinity, P. oklahomensis shows down-regulation of nitrate reduction and has lost the ability to take up urea from the environment. Subsequently, pathways involved in synthesizing the nitrogen-rich amino acids glycine, glutamate, glutamine, homoserine, and proline are down-regulated in P. oklahomensis. Glycerol metabolism is up-regulated in P. oklahomensis, suggesting that glycerol may be the major osmolyte as an alternative to proline in nitrogen-limited environments. Starch metabolism, which shows some down-regulation under high salinity, has a stronger down-regulation at low salinity in P. oklahomensis, whereas this pathway is up-regulated under both high and low salinity shock in Picochlorum SENEW3. Similar to the response in P. oklahomensis, glycerol content increase is coupled with starch degradation under hyperosmotic shock in Dunaliella parva (208). Additionally, fatty acid biosynthesis and elongation are up-regulated in Picochlorum SENEW3 and down-regulated in P. oklahomensis at 1h under high salinity shock. This suggests that carbon is diverted from storage in starch and lipids for glycerol production in P. oklahomensis. In contrast, proline may be the major osmolyte in Picochlorum SENEW3.

Whereas strong up-regulation of photorespiration was observed in Picochlorum SENEW3, this is not the case for P. oklahomensis under high salinity, and the pathway is strongly down-regulated, along with the Calvin-Benson-Bassham cycle, under low salinity shock. However, other reactive oxygen species detoxification mechanisms are up-regulated, including glutathione metabolism in both species under high salinity. Tocopherol and plastoquinol-9 synthesis are such up-regulated mechanisms in P. oklahomensis under high salinity. Tocopherols function as antioxidants to protect the cell against photosynthesis-derived reactive oxygen species and, ultimately, photoinhibition (209).

Allele-specific expression

A total of 2,119 allele pairs were identified in P. oklahomensis between the genes predicted from the primary and haplotig contigs (reciprocal best blast hit, identity >70%, coverage >90%). RNA-seq reads were mapped to the combined set of primary and haplotig contigs at high stringency (100% identity and length fraction), removing any nonspecific hits and allele pairs with <10 reads mapped (210, 211). This resulted in 721 allele pairs with significant mapping under at least one condition. Of these, about 1/3 of allele pairs showed monoallelic expression (>90% reads mapped to one allele) and 1/3 showed biallelic expression (40-60% reads mapped to each allele) under each condition (Table 4.S10). Monoallelic expression may also be a result of the lack of reads mapping to the alternate allele under the specified set of filters. A set of 340 of these had mapping data under all conditions. The majority of these allele pairs did not change in their categorized expression type (monoallelic vs. biallelic): 90/118 (76%) monoallelic pairs were shared between the control and at least one condition, as were 121/200 (61%) biallelic pairs. A total of 64% of monoallelic compared to 24% of biallelically expressed gene pairs showing no change are still expressed under all conditions, suggesting that biallelic expression is more dynamic under stress (Figure 4.S8). Taking the top 10% of allele pairs showing a high change in primary:haplotig expression (i.e., change in ratio of reads mapped to each allele) under at least one condition compared to the control results in a subset of 128 pairs with approximately >18% change. This subset includes 3 core subunits of DNA-directed RNA polymerases and the 20S core proteasome, indicating that divergent alleles may play a role in transcription modulation. This set also includes 71 allele pairs for which we have expression data for all 4 conditions (Figure 4.S9), and the majority, 70% (50 allele pairs; 25 low, 25 high), are specific to either low or high salinity rather than shared across conditions. This again suggests functional relevance for allele-specific expression patterns.
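The expression categories used in this section (monoallelic, >90% of reads mapped to one allele; biallelic, 40-60% of reads mapped to each allele; pairs with <10 mapped reads removed) can be written as a simple classifier. The sketch below is illustrative only. It assumes the 10-read minimum applies to the combined read count of the allele pair, and the label for pairs falling between the two categories ("intermediate") is a hypothetical name introduced here.

```python
def classify_allele_pair(primary_reads, haplotig_reads, min_reads=10):
    """Classify an allele pair by the share of reads mapped to each allele.

    Assumption: the minimum read filter applies to the combined count of
    the primary and haplotig alleles.
    """
    total = primary_reads + haplotig_reads
    if total < min_reads:
        return "filtered"            # too few reads to call
    primary_fraction = primary_reads / total
    if primary_fraction > 0.9 or primary_fraction < 0.1:
        return "monoallelic"         # >90% of reads on one allele
    if 0.4 <= primary_fraction <= 0.6:
        return "biallelic"           # 40-60% of reads on each allele
    return "intermediate"            # hypothetical label for all other pairs


# Hypothetical read counts, for illustration only.
calls = [classify_allele_pair(95, 5), classify_allele_pair(50, 50)]
```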

Discussion

The fundamental importance of high quality genomes and gene models for non-model organisms that have little or no ploidy information is highlighted here with the apparently “simple”, reduced genomes of the Picochlorum lineage. Generation of high quality genome assemblies allowed us to address questions of local adaptation in microbial eukaryotes from a genome-wide perspective. The differences in salinity range of each species and reduced genome size suggest that salinity and other environmental stress tolerance is specialized for individual habitats, making Picochlorum an ideal model for understanding differential adaptation.

Our study revealed that this lineage is comprised of diploid organisms that contain differing levels of heterozygosity between haplotypes. This novel insight contrasts with genome data from many green algae that are derived from haploid stages. Low heterozygosity in P. oculata, P. soleocismus, and Picochlorum SENEW3 suggests that these populations may have experienced inbreeding or, alternatively, that there was a recent diversification of alleles in P. oklahomensis and Picochlorum NBRC102739. P. oklahomensis and Picochlorum SENEW3 are of particular interest because they are sister species that inhabit different hypersaline environments. Although very similar in genome size, synteny, gene number, gene content, sequence identity, and with identical 18S rRNA sequences, the glaring difference between these lineages is the high level of heterozygosity in P. oklahomensis. Thus, allele-specific gene expression analysis was performed to identify potential advantages of maintaining highly heterozygous alleles. We found evidence of expression changes between alleles under various stress conditions, suggesting that heterozygosity may provide a functional advantage under stress. However, the data available from short read RNA-sequencing methods were of limited utility, and alternative methods such as PacBio Iso-Seq sequencing of full-length transcripts coupled with better gene models are clearly needed to address this question. Similarly, Mock et al. found highly divergent alleles in natural populations of the diatom Fragilariopsis cylindrus, and showed that allele-specific expression was condition-dependent. Furthermore, they showed that condition-dependent expression correlated with recent diversifying selection, suggesting that maintaining a large pool of diverse alleles may contribute to tolerance of fluctuating environmental conditions (212). In our study, we report two sister taxa, both inhabiting environments with fluctuating salinity, that utilize contrasting strategies in terms of allelic diversity. High allelic diversity in P. oklahomensis may be reflective of other extreme environmental fluctuations in a terrestrial hypersaline environment, including irradiation and desiccation stresses.


Whereas the gene inventory is very similar between Picochlorum SENEW3 and P. oklahomensis, gene expression differs greatly under salinity stress, revealing that the environment has a significant effect on driving transcriptional responses. In P. oklahomensis, loss of urea assimilation pathways and lack of reliance on nitrogen-rich products during salinity stress suggest a N-limited environment. A carbon shunt towards glycerol production suggests that glycerol may be the major osmolyte in P. oklahomensis. This response contrasts with the up-regulation of nitrogen uptake and assimilation and proline metabolism under high salinity shock in Picochlorum SENEW3. In addition, gene family expansion of the SOS1 sodium transporter is observed in Picochlorum SENEW3 and may contribute to salinity tolerance.

HGT has been shown to drive niche specialization and stress tolerance in microbial eukaryotes; however, there is limited information on the impacts of HGT from multiple, phylogenetically closely related taxa (58, 59, 213-215). Using five independent genome assemblies, three of which are high quality, we find clear evidence of HGT from prokaryotic donors that were differentially acquired or fixed in Picochlorum species. Our results reveal HGT to be a dynamic process involving the transfer, divergence, rearrangement, and loss of genes. The lack of collinearity of HGT genes in otherwise collinear portions of the genome across Picochlorum species suggests that transferred genes of foreign origin were subsequently rearranged in the recipient genome, possibly acting as a driver of genomic rearrangement. There is evidence that HGT-derived genes tend to be incorporated into genomic regions rich in transposable or repetitive elements. One hypothesis is that genes transferred from prokaryotes to eukaryotes were first incorporated into these regions, and later, the gene duplicated, and subsequently a copy was incorporated into gene-rich regions (216). Another study showed conserved gene order of HGT among mealybugs (217). Gene acquisitions, most of which are differentially expressed under the salinity stress conditions tested in two species, may be advantageous in each organism’s environment. HGT candidates include genes involved in osmolyte production and in anaerobic and cofactor metabolism. They also include two genes in a pathway involved in cell wall metabolism, underlining the functional relevance of HGTs. One HGT candidate, indolepyruvate decarboxylase, involved in the production of the plant auxin indole-3-acetate (IAA), is significantly differentially expressed under all salinity stress conditions in both species, suggesting a role in salt stress signaling. Caution is warranted when identifying HGT without high quality assemblies, transcriptome data, and data from closely related species, as evidenced by 10/24 previously identified HGT candidates in the Picochlorum SENEW3 assembly V1 failing to be validated with this new higher quality assembly.

Through the Picochlorum lineage we see multiple factors at play in shaping the genomes adapted to environmental niches. Selective pressure for genome reduction is observed with loss of environmentally irrelevant pathways, suggesting niche specialization. Under this pressure for genome reduction, gene gain or gene family expansion may highlight environmentally advantageous adaptations. HGT is one method of novel functionally relevant gene gain that we show is a dynamic and ongoing process in microbial eukaryotic evolution, which may also drive genomic rearrangement.

Variations in allelic diversity among Picochlorum species highlight another avenue available to algal lineages to adapt to fluctuating environmental conditions.


Materials and Methods

Strain information and growth rates

Picochlorum cultures (Picochlorum SENEW3, P. oklahomensis UTEX B2795, Picochlorum NBRC102739 (formerly MBIC10091), and P. oculata UTEX LB 1998 (synonym Nannochloris oculata)) were cultivated in Guillard's f/2 medium (218) without silica, based on artificial seawater (115) (f/2 ASW–Si). Cultures were grown at 25 °C under continuous light (100 µE m−2 s−1) on a rotary shaker at 100 rpm (Innova 43, New Brunswick Eppendorf).

Cell counts were performed using a hemacytometer (Neubauer improved, Hausser Scientific), image capture (Infinity 2 camera, Lumenera Corporation), and ImageJ counting software. Acclimated growth rates were determined for all Picochlorum species in F/2 media with various salinities (0.01, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, and 1.2M NaCl), based on cell counts during exponential phase using four replicate cultures. The growth rate experiment for P. oklahomensis and Picochlorum SENEW3 under shock conditions was performed by shocking cells acclimated to F/2 media with 1M NaCl to 10mM and 1.5M NaCl media, using triplicate cultures. Growth rates were then calculated during exponential phase, approximately four days after shock.
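The text does not state the formula used to convert cell counts into growth rates. A common choice for cultures in exponential phase is the specific growth rate, mu = ln(N2/N1)/(t2 - t1), and the sketch below assumes that definition. The function names and the example cell densities are hypothetical and do not correspond to measurements from this study.

```python
import math


def specific_growth_rate(n_start, n_end, days):
    """Specific growth rate (per day) between two cell counts.

    Assumed formula: mu = ln(n_end / n_start) / days, valid only while
    the culture is in exponential phase.
    """
    if n_start <= 0 or n_end <= 0 or days <= 0:
        raise ValueError("cell counts and elapsed time must be positive")
    return math.log(n_end / n_start) / days


def doubling_time(mu):
    """Doubling time (days) corresponding to a specific growth rate."""
    if mu <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / mu


# Hypothetical cell densities (cells per mL) measured two days apart.
mu = specific_growth_rate(1e5, 4e5, 2.0)
```

Replicate cultures would each yield one such rate, which can then be averaged per salinity treatment.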

Genome sequencing

Algal cells were flash frozen in liquid nitrogen, homogenized using a mortar and pestle, and gDNA was extracted for PacBio sequencing according to the “isolation of genomic DNA from plants and filamentous fungi using the Qiagen Genomic-tip” user-developed protocol using the Qiagen Genomic-Tip 100/G Kit. Genomic DNA was extracted from P. oculata and Picochlorum NBRC102739 for Illumina sequencing using the Qiagen DNeasy Plant Mini Kit.

SMRT Pacific Biosciences sequencing, library preparation, and genome assembly were performed by DNA Link, Inc (Seoul, South Korea). Picochlorum SENEW3 and P. oklahomensis were sequenced on the PacBio RSII system using P6-C4 chemistry with the MagBead OneCellPerWell v1 Protocol (insert size 20 kb, movie time 1x240 min) and 2 and 4 SMRT cells, respectively. The P. oculata and Picochlorum NBRC102739 libraries were prepared using the Illumina TruSeq Nano Library Preparation Kit and sequenced using the MiSeq Personal Genome Sequencer (Illumina) with 2x300 bp (paired end) reads and a 600-cycle kit. CLC Genomics Workbench was used for de novo assembly of Illumina reads using default parameters. Chloroplast and mitochondrial genomes were derived from Illumina sequencing for all species.

Genome assembly, gene prediction, and annotation

FALCON-Unzip was used to assemble Picochlorum SENEW3 V2 and P. oklahomensis primary and haplotig contigs (195). Eight contigs appearing to be duplicates of regions in larger contigs in P. oklahomensis were manually removed. The CLC Genomics Workbench was used to generate assemblies from the Illumina derived sequencing reads for Picochlorum NBRC102739 and P. oculata. Contigs derived from bacterial contamination were manually removed based on top Blastp hits. P. oculata and Picochlorum NBRC102739 contigs under 10x read coverage were removed except those with eukaryotic Blastp hits. Assemblies and gene predictions can be retrieved at http://cyanophora.rutgers.edu/picochlorum/.
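The coverage-based contig filter described above (contigs under 10x read coverage removed unless they have eukaryotic Blastp hits) amounts to a two-condition rule. The following is a minimal sketch, assuming per-contig mean coverage and a yes/no flag for eukaryotic hits have already been computed. The function name, contig identifiers, and values are hypothetical.

```python
def keep_contig(mean_coverage, has_eukaryotic_hit, min_coverage=10.0):
    """Return True if a contig passes the coverage filter.

    A contig is kept when its read coverage is at least min_coverage, or
    when it falls below that threshold but has a eukaryotic Blastp hit.
    """
    return mean_coverage >= min_coverage or has_eukaryotic_hit


# Hypothetical contigs: (identifier, mean coverage, eukaryotic Blastp hit?)
contigs = [
    ("contig_a", 42.0, False),
    ("contig_b", 6.5, True),
    ("contig_c", 3.1, False),
]
retained = [name for name, cov, euk in contigs if keep_contig(cov, euk)]
```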


Variant frequencies in P. oklahomensis and Picochlorum SENEW3 were calculated by mapping Illumina reads (5.9M reads and 23.9M paired end reads) to primary contigs derived from PacBio sequencing and FALCON-Unzip (100% length fraction, 70% identity), and then using the basic variant detection function of CLC Genomics Workbench (ploidy = 2, min coverage = 10, min frequency = 1%). Insertions and deletions were excluded from the frequency calculations.

Augustus (219) was used with Picochlorum SENEW3 as a model, using hints from Picochlorum SENEW3 ESTs (132) to predict genes in Picochlorum SENEW3, P. oklahomensis, P. oculata, and Picochlorum NBRC102739. DOGMA was used for gene prediction and annotation of chloroplast genomes (220). To assess genome completeness, protein sequences were used as queries against the BUSCO Eukaryota_odb9 core gene set (221). CD-hit was used to remove redundant genes with >99% similarity (222).

Gene function and gene ontology (GO) terms were annotated using the Blast2GO pipeline with default parameters (128). Pathways were annotated using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Automatic Annotation Server (KAAS) with the bi-directional best hit method against the eukaryotic representative gene set and including available algae with the default threshold cutoff (223).

Genome synteny

The MCScanX_h program of the MCScanX package was used to determine synteny with a list of OrthoMCL (BlastP cutoff <1E-5) generated ortholog and co-ortholog pairs between the Picochlorum species as the input (224). Protein predictions before removal of 99% identity redundant proteins were used in this analysis to better assess collinearity. Collinear blocks are defined as having at least 5 collinear genes with <25 intervening genes. Synteny was visualized using Circos v-0.69 (225). The duplicate gene classifier program of the MCScanX package, with an all-versus-all Blast (Blastp 1E-10 cutoff) as the input, was used to classify duplicate genes uniquely into the following categories in order of priority: segmental > tandem > proximal (<20 intervening genes) > dispersed.
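The priority rule stated above (segmental > tandem > proximal > dispersed) means that a gene supported by more than one kind of duplication evidence is assigned to the highest-ranked category only. The sketch below illustrates that rule alone. It is not the MCScanX implementation, and the function name and the use of a set of evidence labels as input are assumptions made for illustration.

```python
# Categories in decreasing order of priority, as listed in the text.
PRIORITY = ("segmental", "tandem", "proximal", "dispersed")


def classify_duplicate(evidence):
    """Assign a gene to exactly one duplication category.

    evidence: set of category names supported for this gene (hypothetical
    input format). Genes with no supporting evidence are singletons.
    """
    for category in PRIORITY:
        if category in evidence:
            return category
    return "singleton"


# Hypothetical genes, for illustration only.
assignments = [
    classify_duplicate({"dispersed", "tandem"}),
    classify_duplicate({"proximal"}),
    classify_duplicate(set()),
]
```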

Construction of multi-protein tree

Proteome data from 18 chlorophyte green algae and 3 streptophyte species were collected: Chlamydomonas reinhardtii (118), Volvox carteri (119), Chlorella variabilis (112), Coccomyxa subellipsoidea (95), Micromonas sp. RCC299, Micromonas pusilla CCMP1545 (120), Ostreococcus tauri (57), Ostreococcus lucimarinus, Ostreococcus sp. RCC809 (US Department of Energy, Phytozome), Bathycoccus prasinos (226), Chlorella protothecoides (227), Gonium pectorale (228), Picochlorum SENEW3, Picochlorum NBRC102739, P. oklahomensis, P. oculata, and Picochlorum DOE101, and the streptophytes Klebsormidium flaccidum (229), Arabidopsis thaliana (230), and Physcomitrella patens (231). CD-HIT was used to remove redundant proteins with >95% similarity per genome (222). Ortholog groups were constructed using OrthoMCL with a BlastP cutoff E-value < 1E-5 (121). Alignments of orthologous groups were generated from this combined data set (allowing for missing data from a maximum of 3 chlorophytes and 2 streptophyte outgroups) as described in Foflonker et al. (132). A total of 1122 alignments were concatenated into a super-alignment (293,805 amino acids) and the multi-protein tree was built using IQ-TREE (232) with the built-in model selection function and branch support estimated using ultrafast bootstrap (UFboot) with 1,500 bootstrap replicates (-bb 1500).
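The missing-data criterion used to select ortholog groups for the super-alignment can be sketched as follows. This is a minimal illustration, not the pipeline of Foflonker et al. (132): the function names and the dictionary layout are hypothetical, and any further criteria applied in the original workflow (for example, handling of multi-copy groups) are not shown.

```python
# Illustrative sketch: keep an ortholog group only if it is missing from at
# most 3 chlorophytes and at most 2 streptophyte outgroups.
def passes_missing_data_filter(group_taxa, chlorophytes, streptophytes,
                               max_missing_chlorophytes=3,
                               max_missing_streptophytes=2):
    """group_taxa: taxa with at least one sequence in the ortholog group."""
    present = set(group_taxa)
    missing_chlorophytes = len(set(chlorophytes) - present)
    missing_streptophytes = len(set(streptophytes) - present)
    return (missing_chlorophytes <= max_missing_chlorophytes
            and missing_streptophytes <= max_missing_streptophytes)


def select_groups(ortholog_groups, chlorophytes, streptophytes):
    """ortholog_groups: group name -> list of taxa represented (hypothetical)."""
    return [name for name, taxa in ortholog_groups.items()
            if passes_missing_data_filter(taxa, chlorophytes, streptophytes)]
```

Groups that pass this filter would then be aligned individually and concatenated, with absent taxa represented as missing data in the super-alignment.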

Genome comparison

OrthoMCL analysis was performed using the default parameters (BlastP E-value cutoff < 1E-5) with the 5 Picochlorum species, 13 other green algal species (described under ‘Construction of multi-protein tree’), and 3 outgroup species: Arabidopsis thaliana, Klebsormidium flaccidum, and Physcomitrella patens. To infer gene family gain or loss, the Dollo parsimony method (233) in the DOLLOP program of the PHYLIP package (234) was used, with gene families determined by the OrthoMCL analysis as the input. The Fisher's exact test function of the Blast2GO program was used to determine enrichment of GO terms. Pathway comparisons were performed using KEGG Mapper.
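The logic of Dollo parsimony for a single gene family can be illustrated with a short Python sketch. This is not the DOLLOP implementation: it assumes a rooted tree written as nested tuples of leaf names, assumes the family is absent at the root, and uses hypothetical function names. Under these assumptions, the family is gained once at the smallest clade containing every species that has it, and is lost on each branch below that point leading to a clade with no member of the family.

```python
# Illustrative sketch of Dollo parsimony for one presence/absence character.
def leaves(node):
    """Return the set of leaf names under a node (a str or a tuple of nodes)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def dollo_events(tree, present):
    """Return (gain_clade, loss_clades) for a family found in `present` taxa."""
    present = frozenset(present) & leaves(tree)
    if not present:
        return None, []

    # Single gain: descend to the smallest clade that holds all present taxa.
    node = tree
    while not isinstance(node, str):
        containing = [child for child in node if present <= leaves(child)]
        if not containing:
            break
        node = containing[0]
    gain_clade = leaves(node)

    # Losses: below the gain, any child clade with no present taxon is one loss.
    losses = []

    def walk(current):
        if isinstance(current, str):
            return
        for child in current:
            child_leaves = leaves(child)
            if child_leaves & present:
                walk(child)
            else:
                losses.append(child_leaves)

    walk(node)
    return gain_clade, losses
```

Summing gains and losses per branch over all OrthoMCL gene families would give branch totals of the kind reported on the multi-protein tree.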

Phylogenomic methods

To search for HGTs, the Picochlorum proteomes were subjected to a phylogenomic pipeline (235). Briefly, we downloaded the protein database (RefSeq version 58) from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/refseq/). For each genus (e.g., Arabidopsis), we retained the species with the largest number of sequences (e.g., A. thaliana) and removed the remaining species (e.g., A. lyrata). This reduced RefSeq database was then combined with a red algal sequence database (collected in (236)), chromalveolate protein sequences derived from the MMETSP database (237), and sequences from green algae, including the recently published green algal genomes: Picochlorum soleocismus DOE101 (greenhouse.lanl.gov), Klebsormidium flaccidum (229), Gonium pectorale (228), Chlorella protothecoides (227), Coccomyxa subellipsoidea (95), Bathycoccus prasinos (226), and four Picochlorum species sequenced in this study. For each order or species, the redundant sequences (sequence identity ≥85%) were removed using CD-HIT v4.5.4 (238). The Picochlorum query sequences were used in a Blastp search (E-value < 1E-5) against the local database mentioned above. The top 1,000 Blastp hits (sorted by bit score) from each query were parsed via custom scripts to extract ≤12 representatives from each phylum to create a taxonomically diverse sample. The Blastp hits were re-ordered according to query-hit identity, followed by the sampling of another set of representative sequences. The query sequence was then combined with the two sets of sampled representative sequences. Sequence alignments were built using Muscle v3.8.31 under default settings (239), followed by trimming using TrimAl v1.2 (240) in the automated mode (-automated1). FastTree v2.1.7 (241) was used under the WAG model to build phylogenetic trees consisting of at least 4 leaves. A custom script was used to identify trees in which Picochlorum was nested among prokaryotes (236).
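The two-pass taxonomic sampling of Blastp hits described above can be sketched in Python. The custom scripts used in this study are not reproduced here; the function name, the hit record fields ("subject", "phylum", "bitscore", "identity"), and the way the two samples are merged are assumptions made for illustration.

```python
# Illustrative sketch: sample up to 12 hits per phylum from the top 1,000 hits,
# once in bit-score order and once in percent-identity order, then merge.
from collections import defaultdict


def sample_representatives(hits, max_hits=1000, max_per_phylum=12):
    """hits: list of dicts with 'subject', 'phylum', 'bitscore', 'identity'."""
    top = sorted(hits, key=lambda h: h["bitscore"], reverse=True)[:max_hits]

    def pick(ordered_hits):
        # Walk the hits in order and keep each one until its phylum is full.
        per_phylum = defaultdict(int)
        chosen = []
        for hit in ordered_hits:
            if per_phylum[hit["phylum"]] < max_per_phylum:
                per_phylum[hit["phylum"]] += 1
                chosen.append(hit["subject"])
        return chosen

    by_score = pick(top)
    by_identity = pick(sorted(top, key=lambda h: h["identity"], reverse=True))

    # Merge the two samples, dropping subjects that were picked twice.
    combined, seen = [], set()
    for subject in by_score + by_identity:
        if subject not in seen:
            seen.add(subject)
            combined.append(subject)
    return combined
```

The query sequence would be added to the returned list before alignment. Sampling twice under different orderings keeps both the strongest-scoring and the most similar sequences from each phylum in the tree.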

Candidate HGT sequences were then reanalyzed using IQ-TREE v1.4.3 (232) with the built-in model selection function and branch support estimated using ultrafast bootstrap (UFboot) with 1,500 bootstrap replicates (-bb 1500). Phylogenetic trees were then manually inspected for evidence of monophyly between Picochlorum and prokaryotes, or for trees containing only prokaryotes, with at least 4 species and 30% identity. HGT candidates in Picochlorum SENEW3 and P. oklahomensis were verified with transcriptome data; any lacking transcriptional evidence were removed.

RNA-seq

RNA extraction, sequencing, and analysis were done as described previously (26), with the exception that libraries were sequenced using the Illumina MiSeq Reagent Kit v3 (150-cycle) in paired-end 2 x 75 mode. The Biocyc Pathway Tools were used to create a pathway database for P. oklahomensis. The OmicsViewer and OmicsDashboard functionalities were used to visualize and analyze RNA-seq data (242). Illumina transcriptome data are available at http://cyanophora.rutgers.edu/picochlorum/.

Allele-specific expression

RNA-seq reads were mapped to the combined set of primary and haplotig contigs at high stringency (100% identity and length fraction), removing any nonspecific hits. Allele pairs were determined by taking the reciprocal Blastp hits (identity >70%, coverage >90%) of genes predicted on the haplotig and primary contigs. The ratio of primary to haplotig reads mapped was calculated by totaling the reads mapped across the three trials at each condition, filtering out genes with <10 reads mapped per allele pair.
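The read-count summary for one condition can be sketched in Python. This is a minimal illustration rather than the analysis code used in this study. The thresholds for monoallelic (>90% of reads from one allele) and biallelic (40 to 60% of reads) expression follow the definitions given with Table 4.S10; the function name, the input layout, and the "intermediate" label for all remaining pairs are assumptions.

```python
# Illustrative sketch: total reads over replicates, drop low-coverage allele
# pairs, and classify expression by the fraction of reads from the primary allele.
def classify_allele_pairs(read_counts, min_reads=10):
    """read_counts: pair id -> list of (primary_reads, haplotig_reads),
    one tuple per replicate of a single condition (hypothetical layout)."""
    calls = {}
    for pair_id, replicates in read_counts.items():
        primary = sum(p for p, _ in replicates)
        haplotig = sum(h for _, h in replicates)
        total = primary + haplotig
        if total < min_reads:
            continue  # fewer than 10 reads mapped to this allele pair
        # A fraction is used instead of a ratio to avoid division by zero.
        primary_fraction = primary / total
        if primary_fraction > 0.9:
            label = "monoallelic_primary"
        elif primary_fraction < 0.1:
            label = "monoallelic_haplotig"
        elif 0.4 <= primary_fraction <= 0.6:
            label = "biallelic"
        else:
            label = "intermediate"  # assumed label; not a category in the text
        calls[pair_id] = (primary_fraction, label)
    return calls
```

Running the function once per condition and comparing the labels between conditions would identify allele pairs whose expression pattern changes with salinity.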

Acknowledgments

This work was supported by a grant from the Department of Energy (DE-EE0003373/001) to D.B., graduate training support from the National Science Foundation IGERT for Renewable and Sustainable Fuels program at Rutgers University (0903675) to F.F., and a grant in aid of research from the Phycological Society of America to F.F. We are grateful to the Rutgers University School of Environmental and Biological Sciences Genome Cooperative for supporting this research and to Dr. Sarah Kingan at Pacific Biosciences for her help in interpreting the genome assemblies.


Figure 4.1. (A) Acclimated growth rates of Picochlorum species in media with varying salinity (10 mM to 1.2 M NaCl). (B) Growth rates of P. oklahomensis and Picochlorum SENEW3 acclimated to 1 M NaCl and shocked with 10 mM and 1.5 M NaCl. In both panels, the x-axis is salinity (M NaCl) and the y-axis is growth rate (h-1).


Table 4.1. Sequencing and assembly statistics of Picochlorum species


Figure 4.2. Phylogeny of Picochlorum and other sequenced chlorophytes. Multi-protein tree constructed from an alignment of 1122 proteins (295,805 characters). Overall gene family gains plus losses, determined using Dollo parsimony and OrthoMCL, are noted on branches in blue or green. Bootstrap support is labeled on nodes. Scale: amino acid substitutions per site.


Figure 4.3. Acquisition of HGT-derived genes in Picochlorum.

Table 4.S1. Summary statistics for variant detection in Picochlorum SENEW3 and P. oklahomensis primary assemblies. Minimum frequency = 1%, minimum coverage = 10.

Species             Primary assembly  Haplotig assembly  Single/multi         Variants per Kb       Mean of frequencies
                    size (Mbp)        size (Mbp)         nucleotide variants  (including reference  (excluding reference
                                                                              allele)               allele)
Picochlorum SENEW3  13.47             0.46               2,532                0.190                 49.5
P. oklahomensis     13.36             12.05              122,058              9.136                 49.1


Figure 4.S1. Distribution of single and multi nucleotide variant frequencies in (A) P. oklahomensis and (B) Picochlorum SENEW3 primary assemblies. Includes frequencies for the reference allele and its variant allele pair.


Table 4.S2. Genome assembly completeness compared to the BUSCO core Eukaryota dataset.

Species                   Found  Fragmented  Missing  Percentage found
                                                      (out of 303 total)
Coccomyxa subellipsoidea  269    14          20       93.4
Chlorella variabilis      258    16          29       90.4
Picochlorum NBRC 102739   272    8           23       92.4
P. oculata                261    11          31       89.4
P. oklahomensis           209    40          54       82.2
Picochlorum SENEW3 V2     275    8           20       93.4
P. soleoscismus           257    3           43       85.8

Table 4.S3. Collinearity between Picochlorum assemblies (# collinear homolog pairs/ # homolog pairs). Maximum gaps allowed = 5.

                  P. oculata         P. oklahomensis    Picochlorum SENEW3 V2
P. soleoscismus   88.7% (5228/5892)  80.0% (3726/4657)  80.8% (4415/5462)
P. oculata        --                 84.0% (3970/4727)  85.8% (4699/5479)
P. oklahomensis   --                 --                 88.3% (5313/6019)

Chlorella variabilis: 45.6% (2084/4568)
Coccomyxa subellipsoidea: 7.8% (319/4298)


Table 4.S4. Genes categorized by duplication type in the following priority order: segmental duplication (whole genome duplications) (min genes per block 5, maximum gaps 25), tandem, proximal (< 20 intervening genes), dispersed, and singletons.

Species                  Singletons     Dispersed     Proximal   Tandem     Segmental duplication
P. soleoscismus          6129 (93.2%)   386 (5.4%)    21 (0.3%)  40 (0.6%)  0 (0%)
Picochlorum NBRC 102739  1628 (14.3%)   1091 (9.6%)   19 (0.2%)  34 (0.3%)  8594 (75.6%)
P. oculata               5915 (93.5%)   353 (5.6%)    10 (0.2%)  49 (0.8%)  0 (0%)
P. oklahomensis          6,339 (92.8%)  391 (5.7%)    32 (0.4%)  56 (0.8%)  14 (0.2%)
Picochlorum SENEW3       6112 (90.4%)   501 (7.4%)    40 (0.6%)  83 (1.2%)  24 (0.4%)

Figure 4.S2. Synteny between Picochlorum SENEW3 (right) and other Picochlorum species (left). Contigs are numbered and represented by the outer segments; Picochlorum SENEW3 contigs are colored, gray contigs represent the second species. Ribbons connect collinear blocks and color corresponds to the originating contig on SENEW3.


Table 4.S5. Putative HGT-derived genes and gene counts per species. Pnb: Picochlorum NBRC 102739, Poc: P. oculata, Pok: P. oklahomensis, Pse: Picochlorum SENEW3, Pso: P. soleoscismus.


Figure 4.S3. (A) IQ-TREE of HGT candidate peptidase S9 (scale bar: 0.6). (B) Transcriptome evidence for an intron in the gene in Picochlorum SENEW3 under control (1M NaCl) conditions. (C) Collinearity of this candidate with Picochlorum SENEW3 as the reference chromosome. Double vertical lines indicate regions of collinear blocks that are not collinear homologs. (D) Collinearity of this candidate with P. soleocismus as the reference chromosome. Full collinearity data can be found in SI Excel file 1.


Figure 4.S4. (A) IQ-TREE of HGT candidate GDP-mannose 4,6-dehydratase gene (scale bar: 0.2). (B) Transcriptome evidence for the gene in P. oklahomensis under control (1M NaCl) conditions. (C) Collinearity of this candidate with Picochlorum SENEW3 as the reference chromosome. Double vertical lines indicate regions of collinear blocks that are not collinear homologs. (D) Collinearity of this candidate with P. soleocismus as the reference chromosome. Full collinearity data can be found in SI Excel file 1.


Figure 4.S5. (A) IQ-TREE of HGT candidate indolepyruvate decarboxylase (scale bar: 0.4). (B) Transcriptome evidence for the gene in Picochlorum SENEW3 under control (1M NaCl) conditions. (C) Collinearity of this candidate with Picochlorum SENEW3 as the reference chromosome. Double vertical lines indicate regions of collinear blocks that are not collinear homologs. (D) Collinearity of this candidate with P. soleocismus as the reference chromosome. Full collinearity data can be found in SI Excel file 1.


Table 4.S6. Gene expression of HGT-derived genes in P. oklahomensis and Picochlorum SENEW3. Differentially expressed genes are highlighted in green.


Table 4.S7. Organelle genome statistics

                         Chloroplast           Mitochondria
Species                  Length (bp)  Genes    Length (bp)  Genes
Picochlorum NBRC 102739  74,484       68       33,007       16
P. oculata               72,741       65       38,672       14
P. oklahomensis          74,180       68       38,173       17
Picochlorum SENEW3       74,269       68       37,587       17
P. soleoscismus          72,761       65       38,692       14

Table 4.S8. Predicted genes in plastid genomes. Present (+); absent (-).

Gene    Picochlorum NBRC 102739  P. oculata  P. oklahomensis  Picochlorum SENEW3  P. soleoscismus
accD    +                        +           +                +                   +
atpA    +                        +           +                +                   +
atpB    +                        +           +                +                   +
atpE    +                        +           +                +                   +
atpF    +                        +           +                +                   +
atpH    +                        +           +                +                   +
atpI    +                        +           +                +                   +
cemA    +                        +           +                +                   +
chlB    +                        -           +                +                   -
chlI    +                        +           +                +                   +
chlL    +                        -           +                +                   -
chlN    +                        -           +                +                   -
clpP    +                        +           +                +                   +
infA    +                        +           +                +                   +
lhbA    +                        +           +                +                   +
minD    +                        +           +                +                   +
petA    +                        +           +                +                   +
petB    +                        +           +                +                   +
petD    +                        +           +                +                   +
petG    +                        +           +                +                   +
petL    +                        +           +                +                   +
psaA    +                        +           +                +                   +
psaB    +                        +           +                +                   +
psaC    +                        +           +                +                   +
psaI    +                        +           +                +                   +
psaJ    +                        +           +                +                   +
psaM    +                        +           +                +                   +
psbA    +                        +           +                +                   +
psbB    +                        +           +                +                   +
psbC    +                        +           +                +                   +
psbD    +                        +           +                +                   +
psbE    +                        +           +                +                   +
psbF    +                        +           +                +                   +
psbH    +                        +           +                +                   +
psbI    +                        +           +                +                   +
psbJ    +                        +           +                +                   +
psbK    +                        +           +                +                   +
psbL    +                        +           +                +                   +
psbM    +                        +           +                +                   +
psbN    +                        +           +                +                   +
psbT    +                        +           +                +                   +
psbZ    +                        +           +                +                   +
rbcL    +                        +           +                +                   +
rpl12   +                        +           +                +                   +
rpl14   +                        +           +                +                   +
rpl16   +                        +           +                +                   +
rpl19   +                        +           +                +                   +
rpl2    +                        +           +                +                   +
rpl20   +                        +           +                +                   +
rpl23   +                        +           +                +                   +
rpl32   +                        +           +                +                   +
rpl36   +                        +           +                +                   +
rpl5    +                        +           +                +                   +
rps11   +                        +           +                +                   +
rps12   +                        +           +                +                   +
rps14   +                        +           +                +                   +
rps18   +                        +           +                +                   +
rps19   +                        +           +                +                   +
rps2    +                        +           +                +                   +
rps3    +                        +           +                +                   +
rps4    +                        +           +                +                   +
rps7    +                        +           +                +                   +
rps8    +                        +           +                +                   +
rps9    +                        +           +                +                   +
tufA    +                        +           +                +                   +
ycf12   +                        +           +                +                   +
ycf3    +                        +           +                +                   +
ycf4    +                        +           +                +                   +


Table 4.S9. Predicted genes in mitochondrial genomes. Present (+); absent (-); found in nuclear genome (N).

Gene               Picochlorum SENEW3  P. oklahomensis  Picochlorum NBRC102739  P. oculata  P. soleocismus
atp1               +                   +                +                       +           +
atp6               +                   +                +                       +           +
atp9               +                   +                +                       +           +
cob                +                   +                +                       +           +
cox1               +                   +                +                       +           +
cox2               +                   +                +                       +           +
cox3               +                   +                +                       +           +
nad1               +                   +                +                       +           +
nad4               +                   +                +                       +           +
nad4L              +                   +                +                       +           +
nad7               +                   +                +                       +           +
nad9               +                   +                +                       -           -
rps2               +                   +                +                       +           +
rps10              +                   +                N                       N           N
rps12              +                   +                +                       +           +
rrnL               +                   +                +                       +           +
rrnS               +                   +                +                       -           -
Total gene number  17                  17               16                      14          14

Figure 4.S6. Venn diagram of differentially expressed genes in P. oklahomensis under the four conditions tested.

Figure 4.S7. Gene expression comparison between P. oklahomensis and Picochlorum SENEW3. (A) 1.5M NaCl 1h, (B) 10mM NaCl 1h, (C) 1.5M NaCl 5h, (D) 10mM NaCl 5h.

Table 4.S10. Allele-specific gene expression in P. oklahomensis. Primary and haplotig columns represent the percentage of gene pairs with > 90% of reads mapping to one of the two alleles, on either the primary or haplotig contigs. Biallelic expression is defined as between 40 and 60% of reads mapping to each allele.

Condition          Primary allele   Haplotig allele   Monoallelic   Biallelic   Total gene pairs
1M 0h (control)    17.8%            13.1%             30.9%         33.7%       618
1.5M 1h            19.4%            12.9%             32.4%         31.1%       479
1.5M 5h            18.7%            14.9%             33.6%         29.5%       509
10mM 1h            21.2%            13.5%             34.7%         32.7%       495
10mM 5h            19.8%            14.1%             33.9%         34.4%       616
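The thresholds stated in the caption of Table 4.S10 can be illustrated with a minimal sketch. It assumes that per-pair read counts for the primary and haplotig alleles are already available. The function names `classify_pair` and `summarize`, the handling of values that fall exactly on a threshold, the "intermediate" and "no_coverage" labels, and the use of all supplied pairs as the denominator are illustrative assumptions, not the pipeline used in this work.

```python
from collections import Counter


def classify_pair(primary_reads, haplotig_reads):
    """Classify one allele pair from its read counts (hypothetical helper)."""
    total = primary_reads + haplotig_reads
    if total == 0:
        return "no_coverage"  # assumption: pairs without reads get their own label
    primary_frac = primary_reads / total
    haplotig_frac = haplotig_reads / total
    if primary_frac > 0.90:   # > 90% of reads map to the primary allele
        return "primary"
    if haplotig_frac > 0.90:  # > 90% of reads map to the haplotig allele
        return "haplotig"
    if 0.40 <= primary_frac <= 0.60:  # each allele receives 40-60% of reads
        return "biallelic"
    return "intermediate"     # assumption: everything else is left unclassified


def summarize(pairs):
    """Return the percentage of pairs in each class, plus a monoallelic total."""
    if not pairs:
        return {}
    counts = Counter(classify_pair(p, h) for p, h in pairs)
    pct = {label: 100.0 * n / len(pairs) for label, n in counts.items()}
    # Monoallelic = primary-biased + haplotig-biased pairs
    pct["monoallelic"] = pct.get("primary", 0.0) + pct.get("haplotig", 0.0)
    return pct


# Toy read counts (made up for illustration only)
example = [(95, 5), (3, 97), (50, 50), (70, 30)]
print(summarize(example))
```

Under this reading, the Monoallelic column is the sum of the Primary allele and Haplotig allele columns; for the control condition, 17.8% + 13.1% = 30.9%.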

Figure 4.S8. Number of (A) monoallelically (118 pairs total) and (B) biallelically (200 pairs total) expressed allele pairs shared under various salinity treatment conditions.

Figure 4.S9. Number of allele pairs showing a large change in the ratio of primary:haplotig allele expression under various salinity treatment conditions (71 pairs total).

CONCLUSION

Picochlorum served as a model for understanding the adaptive evolution of unicellular eukaryotes to environmental fluctuations. What appeared to be a small, “simple” genome was revealed to belong to a diploid organism with complex adaptive traits. This highlights the difficulty of working with non-model species for which ploidy information is limited, and suggests that, with the use of new phase-aware sequencing technologies, more algal genomes may turn out to be diploid.

Varying levels of haplotype heterozygosity among Picochlorum species and condition-dependent allele-specific expression in P. oklahomensis raise questions about why large divergent allele pools are maintained in some species and whether this is related to extreme environmental fluctuations (212). Genome reduction pressure has resulted in the loss of environmentally irrelevant pathways, the clustering of genes in the same biochemical pathway, and the colocalization of coregulated genes in “gene neighborhoods”. These adaptive strategies in genome organization are only recently coming to light in other eukaryotes, including plants, and pose the question of what kinds of transcriptional or epigenetic regulation may be involved with clustered genes that do not exhibit polycistronic translation (243). Despite the pressure for genome reduction, key novel functional gains of bacterial genes were observed in the Picochlorum lineage and were differentially acquired, indicating the dynamic nature of HGT. This is novel evidence that contributes to an ongoing debate about the role and extent of HGT in eukaryotic evolution and provides a robust example with high-quality genomes from closely related species. One HGT candidate in particular, involved in IAA production, brings to light the role of IAA signaling in algae, which may mediate inter-species (algal-bacterial, symbiotic) interactions or intra-species signaling (197-199). Our results align with the latter hypothesis.

Another adaptive strategy utilized by this genus is metabolic flexibility in the form of mixotrophy to overcome salinity stress. Overall, my research has characterized and generated interest in an important algal genus with potential for biomass production, contributed four new high-quality genomes to the sequenced green algal genome database, and advanced the understanding of eukaryotic adaptive evolution and niche expansion. Future work should address these issues in lab-based studies of other Picochlorum species and use metagenomics and metatranscriptomics to test hypotheses regarding the potential sources of bacterial HGT and differential gene expression in nature. Full-length transcript sequencing, better gene models, and metatranscriptomics would help to better address the role of divergent allele pools in Picochlorum taxa. Furthermore, it is important to develop genetic tools (e.g., RNAi, CRISPR-Cas9) in different Picochlorum species to study the function(s) of genes that I have postulated to play key roles in adaptation to salinity stress.

REFERENCES

1. Farooq W, Suh WI, Park MS, & Yang J-W (2015) Water use and its recycling in microalgae cultivation for biofuel application. Bioresource technology 184:73-81.
2. Singh NK & Dhar DW (2011) Microalgae as second generation biofuel. A review. Agronomy for Sustainable Development 31(4):605-629.
3. Wijffels RH & Barbosa MJ (2010) An outlook on microalgal biofuels. Science 329(5993):796-799.
4. Scott SA, et al. (2010) Biodiesel from algae: challenges and prospects. Current Opinion in Biotechnology 21(3):277-286.
5. Demmig-Adams B & Adams WW III (1992) Photoprotection and other responses of plants to high light stress. Annual Review of Plant Biology 43(1):599-626.
6. Ogawa T, Fujii T, & Aiba S (1980) Effect of oxygen on the growth (yield) of Chlorella vulgaris. Archives of microbiology 127(1):25-31.
7. Stephens E, et al. (2010) An economic and technical evaluation of microalgal biofuels. Nature biotechnology 28(2):126-128.
8. Borowitzka M (2013) Species and Strain Selection. Algae for Biofuels and Energy, eds Borowitzka MA & Moheimani NR (Springer), Vol 5, pp 77-89.
9. von Alvensleben N, Stookey K, Magnusson M, & Heimann K (2013) Salinity Tolerance of Picochlorum atomus and the Use of Salinity for Contamination Control by the Freshwater Cyanobacterium Pseudanabaena limnetica. Plos One 8(5):e63569.
10. Kirst G (1990) Salinity tolerance of eukaryotic marine algae. Annual review of plant biology 41(1):21-53.
11. Hagemann M (2011) Molecular biology of cyanobacterial salt acclimation. FEMS microbiology reviews 35(1):87-123.
12. Chen H & Jiang JG (2009) Osmotic responses of Dunaliella to the changes of salinity. J Cell Physiol 219(2):251-258.
13. Los DA & Murata N (2004) Membrane fluidity and its roles in the perception of environmental signals. Biochimica et Biophysica Acta (BBA)-Biomembranes 1666(1):142-157.
14. Azachi M, et al. (2002) Salt induction of fatty acid elongase and membrane lipid modifications in the extreme halotolerant alga Dunaliella salina. Plant Physiology 129(3):1320-1329.
15. Zelazny AM, Shaish A, & Pick U (1995) Plasma membrane sterols are essential for sensing osmotic changes in the halotolerant alga Dunaliella. Plant physiology 109(4):1395-1403.

16. Curtain CC, Looney FD, Regan DL, & Ivancic NM (1983) Changes in the ordering of lipids in the membrane of Dunaliella in response to osmotic-pressure changes. An esr study. Biochemical Journal 213(1):131-136.
17. Maeda M & Thompson GA (1986) On the mechanism of rapid plasma membrane and chloroplast envelope expansion in Dunaliella salina exposed to hypoosmotic shock. The Journal of cell biology 102(1):289-297.
18. Türkan I & Demiral T (2009) Recent developments in understanding salinity tolerance. Environmental and Experimental Botany 67(1):2-9.
19. Kaplan F, Lewis LA, Herburger K, & Holzinger A (2013) Osmotic stress in Arctic and Antarctic strains of the green alga Zygnema (Zygnematales, Streptophyta): effects on photosynthesis and ultrastructure. Micron 44:317-330.
20. Gasulla F, et al. (2013) The response of Asterochloris erici (Ahmadjian) Skaloud et Peksa to desiccation: a proteomic approach. Plant, cell & environment 36(7):1363-1378.
21. Karsten U & Holzinger A (2012) Light, temperature, and desiccation effects on photosynthetic activity, and drought-induced ultrastructural changes in the green alga Klebsormidium dissectum (Streptophyta) from a high alpine soil crust. Microbial ecology 63(1):51-63.
22. Martinez J, Silva H, Ledent J, & Pinto M (2007) Effect of drought stress on the osmotic adjustment, cell wall elasticity and cell volume of six cultivars of common beans (Phaseolus vulgaris L.). European Journal of Agronomy 26(1):30-38.
23. Flowers TJ, Munns R, & Colmer TD (2014) Sodium chloride toxicity and the cellular basis of salt tolerance in halophytes. Annals of botany 115(3):419-431.
24. Fisher M, Gokhman I, Pick U, & Zamir A (1997) A structurally novel transferrin-like protein accumulates in the plasma membrane of the unicellular green alga Dunaliella salina grown in high salinities. Journal of biological chemistry 272(3):1565-1570.
25. Fisher M, Gokhman I, Pick U, & Zamir A (1996) A salt-resistant plasma membrane carbonic anhydrase is induced by salt in Dunaliella salina. Journal of Biological Chemistry 271(30):17718-17723.
26. Foflonker F, et al. (2016) The unexpected extremophile: Tolerance to fluctuating salinity in the green alga Picochlorum. Algal Research 16:465-472.
27. Katz A, Waridel P, Shevchenko A, & Pick U (2007) Salt-induced changes in the plasma membrane proteome of the halotolerant alga Dunaliella salina as revealed by blue native gel electrophoresis and nano-LC-MS/MS analysis. Molecular & Cellular Proteomics 6(9):1459-1472.
28. Krell A, Funck D, Plettner I, John U, & Dieckmann G (2007) Regulation of proline metabolism under salt stress in the psychrophilic diatom Fragilariopsis cylindrus (Bacillariophyceae). Journal of Phycology 43(4):753-762.
29. Dickson D & Kirst G (1987) Osmotic adjustment in marine eukaryotic algae: the role of inorganic ions, quaternary ammonium, tertiary sulphonium and carbohydrate solutes. New Phytologist 106(4):645-655.

30. Van Bergeijk SA, Van der Zee C, & Stal LJ (2003) Uptake and excretion of dimethylsulphoniopropionate is driven by salinity changes in the marine benthic diatom Cylindrotheca closterium. European Journal of Phycology 38(4):341-349.
31. Goyal A (2007) Osmoregulation in Dunaliella, Part II: Photosynthesis and starch contribute carbon for glycerol synthesis during a salt stress in Dunaliella tertiolecta. Plant Physiology and Biochemistry 45(9):705-710.
32. Yokthongwattana C, et al. (2012) Proteomic analysis of salinity-stressed Chlamydomonas reinhardtii revealed differential suppression and induction of a large number of important housekeeping proteins. Planta 235(3):649-659.
33. Jahnke LS & White AL (2003) Long-term hyposaline and hypersaline stresses produce distinct antioxidant responses in the marine alga Dunaliella tertiolecta. Journal of Plant Physiology 160(10):1193-1202.
34. Yoshida K, Igarashi E, Wakatsuki E, Miyamoto K, & Hirata K (2004) Mitigation of osmotic and salt stresses by abscisic acid through reduction of stress-derived oxidative damage in Chlamydomonas reinhardtii. Plant Science 167(6):1335-1341.
35. Khona DK, et al. (2016) Characterization of salt stress-induced palmelloids in the green alga, Chlamydomonas reinhardtii. Algal Research 16:434-448.
36. Wei S, et al. (2017) Salinity-Induced Palmella Formation Mechanism in Halotolerant Algae Dunaliella salina Revealed by Quantitative Proteomics and Phosphoproteomics. Frontiers in plant science 8.
37. Steele DJ, Franklin DJ, & Underwood GJ (2014) Protection of cells from salinity stress by extracellular polymeric substances in diatom biofilms. Biofouling 30(8):987-998.
38. Neale PJ & Melis A (1989) Salinity-stress enhances photoinhibition of photosynthesis in Chlamydomonas reinhardtii. Journal of Plant Physiology 134(5):619-622.
39. Kim M, Park S, Polle JE, & Jin E (2010) Gene expression profiling of Dunaliella sp. acclimated to different salinities. Phycological research 58(1):17-28.
40. Ralph PJ, Ryan KG, Martin A, & Fenton G (2007) Melting out of sea ice causes greater photosynthetic stress in algae than freezing in. Journal of phycology 43(5):948-956.
41. Borowitzka MA, Borowitzka LJ, & Kessly D (1990) Effects of salinity increase on carotenoid accumulation in the green alga Dunaliella salina. Journal of applied Phycology 2(2):111-119.
42. Rao AR, Dayananda C, Sarada R, Shamala T, & Ravishankar G (2007) Effect of salinity on growth of green alga Botryococcus braunii and its constituents. Bioresource technology 98(3):560-564.
43. Liska AJ, Shevchenko A, Pick U, & Katz A (2004) Enhanced photosynthesis and redox energy production contribute to salinity tolerance in Dunaliella as revealed by homology-based proteomics. Plant physiology 136(1):2806-2817.
44. Wang S, Lambert W, Giang S, Goericke R, & Palenik B (2013) Microalgal Assemblages in a Poikilohaline Pond.

45. De la Vega M, Díaz E, Vila M, & León R (2011) Isolation of a new strain of Picochlorum sp. and characterization of its potential biotechnological applications. Biotechnology Progress 27(6):1535-1543.
46. Zhu Y & Dunford NT (2013) Growth and biomass characteristics of Picochlorum oklahomensis and Nannochloropsis oculata. Journal of the American Oil Chemists' Society:1-9.
47. Pereira H, et al. (2013) Isolation and fatty acid profile of selected microalgae strains from the Red Sea for biofuel production. Energies 6(6):2773-2783.
48. Unkefer CJ, et al. (2017) Review of the algal biology program within the National Alliance for Advanced Biofuels and Bioproducts. Algal Research 22:187-215.
49. Henley WJ, et al. (2004) Phylogenetic analysis of the ‘Nannochloris-like’ algae and diagnoses of Picochlorum oklahomensis gen. et sp. nov. (Trebouxiophyceae, Chlorophyta). Phycologia 43(6):641-652.
50. Held IM & Soden BJ (2006) Robust responses of the hydrological cycle to global warming. Journal of climate 19(21):5686-5699.
51. Yu L (2011) A global relationship between the ocean water cycle and near-surface salinity. Journal of Geophysical Research: Oceans 116(C10).
52. Durack PJ, Wijffels SE, & Matear RJ (2012) Ocean salinities reveal strong global water cycle intensification during 1950 to 2000. Science 336(6080):455-458.
53. Schmitt RW (2008) Salinity and the global water cycle. Oceanography 21(1):12-19.
54. Lauria ML, Purdie DA, & Sharples J (1999) Contrasting phytoplankton distributions controlled by tidal turbulence in an estuary. Journal of Marine Systems 21(1):189-197.
55. Wang S, Lambert W, Giang S, Goericke R, & Palenik B (2014) Microalgal assemblages in a poikilohaline pond. Journal of Phycology 50(2):303-309.
56. Matsuzaki M, et al. (2004) Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428(6983):653-657.
57. Palenik B, et al. (2007) The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proceedings of the National Academy of Sciences 104(18):7705-7710.
58. Qiu H, et al. (2013) Adaptation through horizontal gene transfer in the cryptoendolithic red alga Galdieria phlegrea. Current Biology 23(19):R865-R866.
59. Schönknecht G, et al. (2013) Gene transfer from bacteria and archaea facilitated evolution of an extremophilic eukaryote. Science 339(6124):1207-1210.
60. Blumenthal T (2004) Operons in eukaryotes. Briefings in Functional Genomics 3(3):199-211.
61. Hodson RC, Williams SK, & Davidson W (1975) Metabolic control of urea catabolism in Chlamydomonas reinhardi and Chlorella pyrenoidosa. Journal of Bacteriology 121(3):1022-1035.

62. Thompson JF & Muenster A-ME (1971) Separation of the Chlorella ATP: urea amido-lyase into two components. Biochemical and biophysical research communications 43(5):1049-1055.
63. Strope PK, Nickerson KW, Harris SD, & Moriyama EN (2011) Molecular evolution of urea amidolyase and urea carboxylase in fungi. BMC evolutionary biology 11(1):80.
64. Solomon CM, Collier JL, Berg GM, & Glibert PM (2010) Role of urea in microbial metabolism in aquatic systems: a biochemical and molecular review. Aquatic Microbial Ecology 59(1):67-88.
65. Ingram-Smith C, Martin SR, & Smith KS (2006) Acetate kinase: not just a bacterial enzyme. Trends in microbiology 14(6):249-253.
66. Grundy FJ, Waters DA, Takova TY, & Henkin TM (1993) Identification of genes involved in utilization of acetate and acetoin in Bacillus subtilis. Molecular microbiology 10(2):259-271.
67. Xiao Z & Xu P (2007) Acetoin metabolism in bacteria. Critical reviews in microbiology 33(2):127-140.
68. Renna MC, Najimudin N, Winik L, & Zahler S (1993) Regulation of the Bacillus subtilis alsS, alsD, and alsR genes involved in post-exponential-phase production of acetoin. Journal of Bacteriology 175(12):3863-3875.
69. Frädrich C, et al. (2012) The transcription factor AlsR binds and regulates the promoter of the alsSD operon responsible for acetoin formation in Bacillus subtilis. Journal of bacteriology 194(5):1100-1112.
70. Nicholson WL (2008) The Bacillus subtilis ydjL (bdhA) gene encodes acetoin reductase/2,3-butanediol dehydrogenase. Applied and environmental microbiology 74(22):6832-6838.
71. Yi G, Sze S-H, & Thon MR (2007) Identifying clusters of functionally related genes in genomes. Bioinformatics 23(9):1053-1060.
72. Saier Jr MH, Tran CV, & Barabote RD (2006) TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic acids research 34(suppl_1):D181-D186.
73. Qiu Q-S, Guo Y, Dietrich MA, Schumaker KS, & Zhu J-K (2002) Regulation of SOS1, a plasma membrane Na+/H+ exchanger in Arabidopsis thaliana, by SOS2 and SOS3. Proceedings of the National Academy of Sciences 99(12):8436-8441.
74. Wu S-J, Ding L, & Zhu J-K (1996) SOS1, a genetic locus essential for salt tolerance and potassium acquisition. The Plant Cell 8(4):617-627.
75. Taylor AR, Brownlee C, & Wheeler GL (2012) Proton channels in algae: reasons to be excited. Trends in plant science 17(11):675-684.
76. Katz A & Pick U (2001) Plasma membrane electron transport coupled to Na+ extrusion in the halotolerant alga Dunaliella. Biochimica et Biophysica Acta (BBA)-Bioenergetics 1504(2):423-431.
77. Shono M, Wada M, Hara Y, & Fujii T (2001) Molecular cloning of Na+-ATPase cDNA from a marine alga, Heterosigma akashiwo. Biochimica et Biophysica Acta (BBA)-Biomembranes 1511(1):193-199.

78. Kishimoto M, et al. (2013) Functional expression of an animal type-Na+-ATPase gene from a marine red seaweed Porphyra yezoensis increases salinity tolerance in rice plants. Plant Biotechnology 30(4):417-422.
79. Barragán V, et al. (2012) Ion exchangers NHX1 and NHX2 mediate active potassium uptake into vacuoles to regulate cell turgor and stomatal function in Arabidopsis. The Plant Cell 24(3):1127-1142.
80. Blumwald E (2000) Sodium transport and salt tolerance in plants. Current opinion in cell biology 12(4):431-434.
81. Nakayama Y, Fujiu K, Sokabe M, & Yoshimura K (2007) Molecular and electrophysiological characterization of a mechanosensitive channel expressed in the chloroplasts of Chlamydomonas. Proceedings of the National Academy of Sciences 104(14):5883-5888.
82. Amtmann A & Beilby MJ (2010) The role of ion channels in plant salt tolerance. Ion Channels and Plant Stress Responses, (Springer), pp 23-46.
83. Parida AK & Das AB (2005) Salt tolerance and salinity effects on plants: a review. Ecotoxicology and environmental safety 60(3):324-349.
84. Sauer N, Caspari T, Klebl F, & Tanner W (1990) Functional expression of the Chlorella hexose transporter in Schizosaccharomyces pombe. Proceedings of the National Academy of Sciences 87(20):7949-7952.
85. Chen T-Y, Lin H-Y, Lin C-C, Lu C-K, & Chen Y-M (2012) Picochlorum as an alternative to Nannochloropsis for grouper larval rearing. Aquaculture 338:82-88.
86. Liang Y, Sarkany N, & Cui Y (2009) Biomass and lipid productivities of Chlorella vulgaris under autotrophic, heterotrophic and mixotrophic growth conditions. Biotechnology letters 31(7):1043-1049.
87. Doebbe A, et al. (2007) Functional integration of the HUP1 hexose symporter gene into the genome of C. reinhardtii: impacts on biological H2 production. Journal of biotechnology 131(1):27-33.
88. Price DC, et al. (2012) Cyanophora paradoxa genome elucidates origin of photosynthesis in algae and plants. Science 335(6070):843-847.
89. Stamatakis A, Hoover P, & Rougemont J (2008) A rapid bootstrap algorithm for the RAxML web servers. Systematic biology 57(5):758-771.
90. Moustafa A & Bhattacharya D (2008) PhyloSort: a user-friendly phylogenetic sorting tool and its application to estimating the cyanobacterial contribution to the nuclear genome of Chlamydomonas. BMC Evolutionary Biology 8(1):6.
91. Moreau H, et al. (2012) Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. Genome biology 13(8):R74.
92. Gruber S & Seidl-Seiboth V (2012) Self versus non-self: fungal cell wall degradation in Trichoderma. Microbiology 158(1):26-34.
93. Bonin CP, Potter I, Vanzin GF, & Reiter W-D (1997) The MUR1 gene of Arabidopsis thaliana encodes an isoform of GDP-D-mannose-4,6-dehydratase, catalyzing the first step in the de novo synthesis of GDP-L-fucose. Proceedings of the National Academy of Sciences 94(5):2085-2090.

94. Reiter W-D, Chapple CC, & Somerville CR (1993) Altered growth and cell walls in a fucose-deficient mutant of Arabidopsis. Science 261:1032-1032.
95. Blanc G, et al. (2012) The genome of the polar eukaryotic microalga Coccomyxa subellipsoidea reveals traits of cold adaptation. Genome Biol 13(5):R39.
96. Palenik B, et al. (2003) The genome of a motile marine Synechococcus. Nature 424(6952):1037-1042.
97. Gabdrakhmanova L, et al. (2005) Salt stress induction of glutamyl endopeptidase biosynthesis in Bacillus intermedius. Microbiological research 160(3):233-242.
98. De Hostos EL, Togasaki RK, & Grossman A (1988) Purification and biosynthesis of a derepressible periplasmic arylsulfatase from Chlamydomonas reinhardtii. The Journal of cell biology 106(1):29-37.
99. Lien T & Schreiner Ø (1975) Purification of a derepressible arylsulfatase from Chlamydomonas reinhardti: Properties of the enzyme in intact cells and in purified state. Biochimica et Biophysica Acta (BBA)-Enzymology 384(1):168-179.
100. Takahashi H, Kopriva S, Giordano M, Saito K, & Hell R (2011) Sulfur assimilation in photosynthetic organisms: molecular functions and regulations of transporters and assimilatory enzymes. Annual review of plant biology 62:157-184.
101. Yildiz FH, Davies JP, & Grossman AR (1994) Characterization of sulfate transport in Chlamydomonas reinhardtii during sulfur-limited and sulfur-sufficient growth. Plant Physiology 104(3):981-987.
102. Ratti S & Giordano M (2008) Allocation of sulfur to sulfonium compounds in microalgae. Sulfur Assimilation and Abiotic Stress in Plants, (Springer), pp 317-333.
103. England JR, Huang J, Jennings MJ, Makde RD, & Tan S (2010) RCC1 uses a conformationally diverse loop region to interact with the nucleosome: a model for the RCC1–nucleosome complex. Journal of molecular biology 398(4):518-529.
104. Bowler C, et al. (2008) The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature 456(7219):239-244.
105. Sergeeva E, Liaimer A, & Bergman B (2002) Evidence for production of the phytohormone indole-3-acetic acid by cyanobacteria. Planta 215(2):229-238.
106. De-Bashan LE, Antoun H, & Bashan Y (2008) Involvement of indole-3-acetic acid produced by the growth-promoting bacterium Azospirillum spp. in promoting growth of Chlorella vulgaris. Journal of Phycology 44(4):938-947.
107. Novoselov SV, et al. (2002) Selenoproteins and selenocysteine insertion system in the model plant cell system, Chlamydomonas reinhardtii. The EMBO journal 21(14):3681-3693.

108. Lobanov AV, et al. (2007) Evolutionary dynamics of eukaryotic selenoproteomes: large selenoproteomes may associate with aquatic life and small with terrestrial life. Genome biology 8(9):R198.
109. Liu Q & Jiang L (2011) Bioinformatics of selenoproteins. Selenoproteins and Mimics, (Springer), pp 125-140.
110. Mariotti M, Lobanov AV, Guigo R, & Gladyshev VN (2013) SECISearch3 and Seblastian: new tools for prediction of SECIS elements and selenoproteins. Nucleic acids research 41(15):e149-e149.
111. Cao H, Zhang L, & Melis A (2001) Bioenergetic and metabolic processes for the survival of sulfur-deprived Dunaliella salina (Chlorophyta). Journal of applied phycology 13(1):25-34.
112. Blanc G, et al. (2010) The Chlorella variabilis NC64A genome reveals adaptation to photosymbiosis, coevolution with viruses, and cryptic sex. The Plant Cell 22(9):2943-2955.
113. Huesemann MH, et al. (2009) Biomass productivities in wild type and pigment mutant of Cyclotella sp. (Diatom). Applied biochemistry and biotechnology 157(3):507-526.
114. Hannon M, Gimpel J, Tran M, Rasala B, & Mayfield S (2010) Biofuels from algae: challenges and potential. Biofuels 1(5):763-784.
115. Goldman JC & McCarthy JJ (1978) Steady state growth and ammonium uptake of a fast-growing marine diatom. Limnology and oceanography 23(4):695-703.
116. Wu TD & Nacu S (2010) Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26(7):873-881.
117. Stanke M & Morgenstern B (2005) AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic acids research 33(suppl_2):W465-W467.
118. Merchant SS, et al. (2007) The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318(5848):245-250.
119. Prochnik SE, et al. (2010) Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri. Science 329(5988):223-226.
120. Worden AZ, et al. (2009) Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. Science 324(5924):268-272.
121. Li L, Stoeckert CJ, & Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome research 13(9):2178-2189.
122. Katoh K, Misawa K, Kuma Ki, & Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic acids research 30(14):3059-3066.
123. Talavera G & Castresana J (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic biology 56(4):564-577.
124. Notredame C, Higgins DG, & Heringa J (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of molecular biology 302(1):205-217.

125. Guindon S, et al. (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic biology 59(3):307-321.
126. Morgat A, et al. (2011) UniPathway: a resource for the exploration and annotation of metabolic pathways. Nucleic acids research 40(D1):D761-D769.
127. Ashburner M, et al. (2000) Gene Ontology: tool for the unification of biology. Nature genetics 25(1):25-29.
128. Conesa A, et al. (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21(18):3674-3676.
129. Boetius A, et al. (2013) Export of algal biomass from the melting Arctic sea ice. Science 339(6126):1430-1432.
130. Barbier G, et al. (2005) Comparative genomics of two closely related unicellular thermo-acidophilic red algae, Galdieria sulphuraria and Cyanidioschyzon merolae, reveals the molecular basis of the metabolic flexibility of Galdieria sulphuraria and significant differences in carbohydrate metabolism of both algae. Plant Physiology 137(2):460-474.
131. Wolf YI & Koonin EV (2013) Genome reduction as the dominant mode of evolution. Bioessays 35(9):829-837.
132. Foflonker F, et al. (2015) Genome of the halotolerant green alga Picochlorum sp. reveals strategies for thriving under fluctuating environmental conditions. Environmental microbiology 17(2):412-426.
133. Henley WJ, Kvíderová J, Kirkwood AE, Milner J, & Potter AT (2007) Life in a Hypervariable Environment. Algae and Cyanobacteria in Extreme Environments, (Springer), pp 681-694.
134. Dahmen I, et al. (2014) Optimisation of the critical medium components for better growth of Picochlorum sp. and the role of stressful environments for higher lipid production. Journal of the Science of Food and Agriculture 94(8):1628-1638.
135. El-Kassas HY (2013) Growth and fatty acid profile of the marine microalga Picochlorum Sp. grown under nutrient stress conditions. The Egyptian Journal of Aquatic Research 39(4):233-239.
136. Zhu Y (2012) Biomass and flocculation characteristics of Picochlorum oklahomensis and Nannochloropsis oculata (Oklahoma State University).
137. Anonymous (2014) National alliance for advanced biofuels and bio-products (NAABB) final report (Department of Energy).
138. Goldman JC & McCarthy JJ (1978) Steady state growth and ammonium uptake of a fast-growing marine diatom. Limnology and Oceanography 23(4):695-703.
139. Guillard RRL & Ryther JH (1962) Studies of Marine Planktonic Diatoms: I. Cyclotella nana Hustedt and Detonula confervacea (Cleve) Gran. Canadian Journal of Microbiology 8(2):229-239.

140. Kanehisa M, Goto S, Sato Y, Furumichi M, & Tanabe M (2011) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res:gkr988.
141. Oliveros JC (2007) VENNY. An interactive tool for comparing lists with Venn Diagrams.
142. Emanuelsson O, Nielsen H, Brunak S, & von Heijne G (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 300(4):1005-1016.
143. Ananyev G & Dismukes GC (2005) How fast can photosystem II split water? Kinetic performance at high and low frequencies. Photosynthesis research 84(1-3):355-365.
144. Kolber ZS, Prášil O, & Falkowski PG (1998) Measurements of variable chlorophyll fluorescence using fast repetition rate techniques: defining methodology and experimental protocols. Biochimica et Biophysica Acta (BBA)-Bioenergetics 1367(1):88-106.
145. Vinyard DJ, Zachary CE, Ananyev G, & Dismukes GC (2013) Thermodynamically accurate modeling of the catalytic cycle of photosynthetic oxygen evolution: A mathematical solution to asymmetric Markov chains. Biochimica et Biophysica Acta (BBA)-Bioenergetics 1827(7):861-868.
146. Quesada A, Hidalgo J, & Fernandez E (1998) Three Nrt2 genes are differentially regulated in Chlamydomonas reinhardtii. Molecular and General Genetics MGG 258(4):373-377.
147. Wingler A, Lea PJ, Quick WP, & Leegood RC (2000) Photorespiration: metabolic pathways and their role in stress protection. Philosophical Transactions of the Royal Society B: Biological Sciences 355(1402):1517-1529.
148. Stabenau H & Winkler U (2005) Glycolate metabolism in green algae. Physiologia Plantarum 123(3):235-245.
149. Booth W & Beardall J (1991) Effects of salinity on inorganic carbon utilization and carbonic anhydrase activity in the halotolerant alga Dunaliella salina (Chlorophyta). Phycologia 30(2):220-225.
150. Bykova NV, Keerberg O, Pärnik T, Bauwe H, & Gardeström P (2005) Interaction between photorespiration and respiration in transgenic potato plants with antisense reduction in glycine decarboxylase. Planta 222(1):130-140.
151. Collakova E, et al. (2008) Arabidopsis 10-formyl tetrahydrofolate deformylases are essential for photorespiration. The Plant Cell Online 20(7):1818-1832.
152. Rachmilevitch S, Cousins AB, & Bloom AJ (2004) Nitrate assimilation in plant shoots depends on photorespiration. Proceedings of the National Academy of Sciences, USA 101(31):11506-11510.
153. Henley WJ, et al. (2004) Phylogenetic analysis of the 'Nannochloris-like' algae and diagnoses of Picochlorum oklahomensis gen. et sp. nov. (Trebouxiophyceae, Chlorophyta). Phycologia 43(6):641-652.

154. Xia B-B, Wang S-H, Duan J-B, & Bai L-H (2014) The relationship of glycerol and glycolysis metabolism pathway under hyperosmotic stress in Dunaliella salina. Central European Journal of Biology 9(9):901-908.
155. Murata N, Takahashi S, Nishiyama Y, & Allakhverdiev SI (2007) Photoinhibition of photosystem II under environmental stress. Biochimica et Biophysica Acta (BBA)-Bioenergetics 1767(6):414-421.
156. Kolling DR, Brown TS, Ananyev G, & Dismukes GC (2009) Photosynthetic Oxygen Evolution Is Not Reversed at High Oxygen Pressures: Mechanistic Consequences for the Water-Oxidizing Complex. Biochemistry 48(6):1381-1389.
157. Dasgupta J, Ananyev GM, & Dismukes GC (2008) Photoassembly of the water-oxidizing complex in photosystem II. Coordination chemistry reviews 252(3):347-360.
158. Takahashi S & Murata N (2008) How do environmental stresses accelerate photoinhibition? Trends in Plant Science 13(4):178-182.
159. Allakhverdiev SI & Murata N (2004) Environmental stress inhibits the synthesis de novo of proteins involved in the photodamage–repair cycle of photosystem II in Synechocystis sp. PCC 6803. Biochimica et Biophysica Acta (BBA)-Bioenergetics 1657(1):23-32.
160. Henley WJ, Major KM, & Hironaka JL (2002) Response to salinity and heat stress in two halotolerant chlorophyte algae. Journal of Phycology 38(4):757-766.
161. Iraki NM, Bressan RA, Hasegawa P, & Carpita NC (1989) Alteration of the physical and chemical structure of the primary cell wall of growth-limited plant cells adapted to osmotic stress. Plant Physiology 91(1):39-47.
162. An P, et al. (2014) Effects of NaCl on root growth and cell wall composition of two soya bean cultivars with contrasting salt tolerance. Journal of agronomy and crop science 200(3):212-218.
163. Kang JS, et al. (2008) Salt tolerance of Arabidopsis thaliana requires maturation of N-glycosylated proteins in the Golgi apparatus. Proceedings of the National Academy of Sciences 105(15):5933-5938.
164. Zhu J, et al. (2010) A cellulose synthase-like protein is required for osmotic stress tolerance in Arabidopsis. The Plant Journal 63(1):128-140.
165. Kasprzewska A (2003) Plant chitinases-regulation and function. Cellular and Molecular Biology Letters 8(3):809-824.
166. Yun D-J, et al. (1996) Novel osmotically induced antifungal chitinases and bacterial expression of an active recombinant isoform. Plant Physiology 111(4):1219-1225.
167. Schapire AL, et al. (2008) Arabidopsis synaptotagmin 1 is required for the maintenance of plasma membrane integrity and cell viability. The Plant Cell Online 20(12):3374-3388.
168. Yamazaki T, Kawamura Y, Minami A, & Uemura M (2008) Calcium-dependent freezing tolerance in Arabidopsis involves membrane resealing via synaptotagmin SYT1. The Plant Cell Online 20(12):3389-3404.


169. Mazel A, Leshem Y, Tiwari BS, & Levine A (2004) Induction of salt and osmotic stress tolerance by overexpression of an intracellular vesicle trafficking protein AtRab7 (AtRabG3e). Plant Physiology 134(1):118-128.
170. Schluter D (2001) Ecology and the origin of species. Trends in ecology & evolution 16(7):372-380.
171. Rundell RJ & Price TD (2009) Adaptive radiation, nonadaptive radiation, ecological speciation and nonecological speciation. Trends in Ecology & Evolution 24(7):394-399.
172. Davis MB, Shaw RG, & Etterson JR (2005) Evolutionary responses to changing climate. Ecology 86(7):1704-1714.
173. Sgro CM, Lowe AJ, & Hoffmann AA (2011) Building evolutionary resilience for conserving biodiversity under climate change. Evolutionary Applications 4(2):326-337.
174. Doney SC, et al. (2011) Climate change impacts on marine ecosystems.
175. Scheffer M, Straile D, van Nes EH, & Hosper H (2001) Climatic warming causes regime shifts in lake food webs. Limnology and Oceanography 46(7):1780-1783.
176. Winder M & Sommer U (2012) Phytoplankton response to a changing climate. Hydrobiologia 698(1):5-16.
177. Flanagan KM, McCauley E, Wrona F, & Prowse T (2003) Climate change: the potential for latitudinal effects on algal biomass in aquatic ecosystems. Canadian Journal of Fisheries and Aquatic Sciences 60(6):635-639.
178. Hoegh-Guldberg O & Bruno JF (2010) The impact of climate change on the world’s marine ecosystems. Science 328(5985):1523-1528.
179. Munns R & Gilliham M (2015) Salinity tolerance of crops–what is the cost? New Phytologist 208(3):668-673.
180. Dasgupta S, Hossain MM, Huq M, & Wheeler D (2015) Climate change and soil salinity: The case of coastal Bangladesh. Ambio 44(8):815-826.
181. Henley WJ, et al. (2004) Phylogenetic analysis of the ‘Nannochloris-like’ algae and diagnoses of Picochlorum oklahomensis gen. et sp. nov. (Trebouxiophyceae, Chlorophyta). Phycologia 43(6):641-652.
182. Wang S, Shi X, & Palenik B (2016) Characterization of Picochlorum sp. use of wastewater generated from hydrothermal liquefaction as a nitrogen source. Algal Research 13:311-317.
183. Kim EJ, Park S, Hong H-J, Choi Y-E, & Yang J-W (2011) Biosorption of chromium (Cr (III)/Cr (VI)) on the residual microalga Nannochloris oculata after lipid extraction for biodiesel production. Bioresource technology 102(24):11155-11160.
184. Cai W, Dunford NT, Wang N, Zhu S, & He H (2016) Audible sound treatment of the microalgae Picochlorum oklahomensis for enhancing biomass productivity. Bioresource technology 202:226-230.
185. Dogaris I, Brown T, Loya B, & Philippidis G (2016) Cultivation study of the marine microalga Picochlorum oculatum and outdoor deployment in a novel bioreactor for high-density production of algal cell mass. Biomass and bioenergy 89:11-23.


186. Ra CH, Kang C-H, Jung J-H, Jeong G-T, & Kim S-K (2016) Enhanced biomass production and lipid accumulation of Picochlorum atomus using light-emitting diodes (LEDs). Bioresource technology 218:1279-1283.
187. Tran D, et al. (2014) An isolated Picochlorum species for aquaculture, food, and biofuel. North American Journal of Aquaculture 76(4):305-311.
188. Watanabe K & Fujii K (2016) Isolation of high-level-CO2-preferring Picochlorum sp. strains and their biotechnological potential. Algal Research 18:135-143.
189. Chen T-Y, Lin H-Y, Lin C-C, Lu C-K, & Chen Y-M (2012) Picochlorum as an alternative to Nannochloropsis for grouper larval rearing. Aquaculture 338:82-88.
190. Kumar SD, et al. (2016) Evaluation of suitability of wastewater-grown microalgae (Picochlorum maculatum) and copepod (Oithona rigida) as live feed for white leg shrimp Litopenaeus vannamei post-larvae. Aquaculture International 25(1):393-411.
191. Martin WF (2017) Too much eukaryote LGT. BioEssays.
192. Henley WJ, Major KM, & Hironaka JL (2002) Response to salinity and heat stress in two halotolerant chlorophyte algae. Journal of Phycology 38(4):757-766.
193. Henley WJ, Kvíderová J, Kirkwood AE, Milner J, & Potter AT (2007) Life in a Hypervariable Environment. Algae and Cyanobacteria in Extreme Environments:681-694.
194. Chin C-S, et al. (2016) Phased diploid genome assembly with single-molecule real-time sequencing. Nature methods 13(12):1050-1054.
195. Egamberdieva D (2009) Alleviation of salt stress by plant growth regulators and IAA producing bacteria in wheat. Acta Physiologiae Plantarum 31(4):861-864.
196. Iqbal M & Ashraf M (2007) Seed Treatment with Auxins Modulates Growth and Ion Partitioning in Salt‐stressed Wheat Plants. Journal of Integrative Plant Biology 49(7):1003-1015.
197. Lau S, Shao N, Bock R, Jürgens G, & De Smet I (2009) Auxin signaling in algal lineages: fact or myth? Trends in plant science 14(4):182-188.
198. Amin S, et al. (2015) Interaction and signalling between a cosmopolitan phytoplankton and associated bacteria. Nature 522(7554):98-101.
199. Wakasugi T, et al. (1997) Complete nucleotide sequence of the chloroplast genome from the green alga Chlorella vulgaris: the existence of genes possibly involved in chloroplast division. Proceedings of the National Academy of Sciences 94(11):5967-5972.
200. Shi C & Shi X (2006) Characterization of Three Genes Encoding the Subunits of Light‐Independent Protochlorophyllide Reductase in Chlorella protothecoides CS‐41. Biotechnology progress 22(4):1050-1055.
201. Li J, Goldschmidt-Clermont M, & Timko MP (1993) Chloroplast-encoded chlB is required for light-independent protochlorophyllide reductase activity in Chlamydomonas reinhardtii. The Plant Cell 5(12):1817-1829.


202. Hunsperger HM, Cattolico RA, & Randhawa T (2015) Extensive horizontal gene transfer, duplication, and loss of chlorophyll synthesis genes in the algae. BMC evolutionary biology 15(1):16.
203. Adams KL, Daley DO, Qiu Y-L, Whelan J, & Palmer JD (2000) Repeated, recent and diverse transfers of a mitochondrial gene to the nucleus in flowering plants. Nature 408(6810):354-357.
204. Major KM, Kirkwood AE, Major CS, McCreadie JW, & Henley WJ (2005) In situ studies of algal biomass in relation to physicochemical characteristics of the Salt Plains National Wildlife Refuge, Oklahoma, USA. Saline Systems 1(1):11.
205. Kirkwood AE & Henley WJ (2006) Algal community dynamics and halotolerance in a terrestrial, hypersaline environment. Journal of phycology 42(3):537-547.
206. Shi H, Lee B-h, Wu S-J, & Zhu J-K (2003) Overexpression of a plasma membrane Na+/H+ antiporter gene improves salt tolerance in Arabidopsis thaliana. Nature biotechnology 21(1):81-85.
207. Gimmler H & Möller E (1981) Salinity‐dependent regulation of starch and glycerol metabolism in Dunaliella parva. Plant, Cell & Environment 4(5):367-375.
208. Nowicka B & Kruk J (2012) Plastoquinol is more active than α-tocopherol in singlet oxygen scavenging during high light stress of Chlamydomonas reinhardtii. Biochimica et Biophysica Acta (BBA)-Bioenergetics 1817(3):389-394.
209. Stevenson KR, Coolon JD, & Wittkopp PJ (2013) Sources of bias in measures of allele-specific expression derived from RNA-seq data aligned to a single reference genome. BMC genomics 14(1):536.
210. Vijaya Satya R, Zavaljevski N, & Reifman J (2012) A new strategy to reduce allelic bias in RNA-Seq read mapping. Nucleic acids research 40(16):e127-e127.
211. Mock T, et al. (2017) Evolutionary genomics of the cold-adapted diatom Fragilariopsis cylindrus. Nature 541(7638):536-540.
212. Schönknecht G, Weber AP, & Lercher MJ (2014) Horizontal gene acquisitions by eukaryotes as drivers of adaptive evolution. Bioessays 36(1):9-20.
213. Raymond JA & Kim HJ (2012) Possible role of horizontal gene transfer in the colonization of sea ice by algae. PloS one 7(5):e35968.
214. Bhattacharya D, et al. (2013) Genome of the red alga Porphyridium purpureum. Nature communications 4:1941.
215. Husnik F & McCutcheon JP (2017) Functional horizontal gene transfer from bacteria to eukaryotes. Nature Reviews Microbiology.
216. Husnik F & McCutcheon JP (2016) Repeated replacement of an intrabacterial symbiont in the tripartite nested mealybug symbiosis. Proceedings of the National Academy of Sciences:201603910.
217. Guillard RR & Ryther JH (1962) Studies of marine planktonic diatoms: I. Cyclotella nana Hustedt, and Detonula confervacea (Cleve) Gran. Canadian journal of microbiology 8(2):229-239.


218. Stanke M, Steinkamp R, Waack S, & Morgenstern B (2004) AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic acids research 32(suppl_2):W309-W312.
219. Wyman SK, Jansen RK, & Boore JL (2004) Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20(17):3252-3255.
220. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, & Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31(19):3210-3212.
221. Li W & Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22(13):1658-1659.
222. Moriya Y, Itoh M, Okuda S, Yoshizawa AC, & Kanehisa M (2007) KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic acids research 35(suppl_2):W182-W185.
223. Wang Y, et al. (2012) MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic acids research 40(7):e49-e49.
224. Krzywinski M, et al. (2009) Circos: an information aesthetic for comparative genomics. Genome research 19(9):1639-1645.
225. Moreau H, et al. (2012) Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. Genome Biol 13(8):R74.
226. Gao C, et al. (2014) Oil accumulation mechanisms of the oleaginous microalga Chlorella protothecoides revealed through its genome, transcriptomes, and proteomes. BMC Genomics 15:582.
227. Hanschen ER, et al. (2016) The Gonium pectorale genome demonstrates co-option of cell cycle regulation during the evolution of multicellularity. Nature communications 7.
228. Hori K, et al. (2014) Klebsormidium flaccidum genome reveals primary factors for plant terrestrial adaptation. Nat Commun 5:3978.
229. Blanc G, Hokamp K, & Wolfe KH (2003) A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome research 13(2):137-144.
230. Rensing SA, et al. (2008) The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319(5859):64-69.
231. Nguyen L-T, Schmidt HA, von Haeseler A, & Minh BQ (2014) IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular biology and evolution 32(1):268-274.
232. Farris JS (1977) Phylogenetic analysis under Dollo's Law. Systematic Biology 26(1):77-88.
233. Felsenstein J (2002) PHYLIP (Phylogeny Inference Package) version 3.6a3.
234. Qiu H, Cai G, Luo J, Bhattacharya D, & Zhang N (2016) Extensive horizontal gene transfers between plant pathogenic fungi. BMC Biol 14:41.
235. Qiu H, Price DC, Yang EC, Yoon HS, & Bhattacharya D (2015) Evidence of ancient genome reduction in red algae (Rhodophyta). J Phycol 51(4):624-636.


236. Keeling PJ, et al. (2014) The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol 12(6):e1001889.
237. Li W & Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22(13):1658-1659.
238. Edgar RC (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113.
239. Capella-Gutiérrez S, Silla-Martínez JM, & Gabaldón T (2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25(15):1972-1973.
240. Price MN, Dehal PS, & Arkin AP (2010) FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One 5(3):e9490.
241. Karp PD, et al. (2015) Pathway Tools version 19.0 update: software for pathway/genome informatics and systems biology. Briefings in bioinformatics 17(5):877-890.