<<

DEVELOPMENT OF CRISPR/CAS9 IN

AND OTHER TOWARD UNDERSTANDING

AND MANIPULATING ENERGY ALLOCATION

by

Brian Vogler A thesis submitted to the Faculty and the Board of Trustees of the Colorado School of

Mines in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Applied

Chemistry).

Golden, Colorado

Date ______

Signed: ______

Brian W. Vogler

Signed: ______

Dr. Matthew C. Posewitz Thesis Advisor

Golden, Colorado

Date ______

Signed: ______

Dr. Thomas Gennett Department Head Department of Chemistry

ii

ABSTRACT

Nannochloropsis is a of eukaryotic that grows well in outdoor bioreactors and produces high yields of triacylglycerols (TAGs), which can be processed into . In this work, we chemically characterize the storage of Nannochloropsis for the first time, and interrupt required for its biosynthesis to (1) understand their function and (2) interrogate whether this unused biomass component can be eliminated without significant impact on viability.

To generate targeted knockouts, we developed CRISPR/Cas9 methods for

Nannochloropsis and first interrupted nitrate reductase, a common target for genetic knockout because it is both inessential and easily scored. The method and validated chassis was then used to interrogate the beta-glucan synthase (BGS) and transglycosylase (TGS) enzymes believed to be responsible for the backbone polymerization and branching of the storage beta glucan, respectively.

We identified no significant growth defects in our laboratory culturing conditions but did confirm that both were fundamental to synthesis of this beta glucan storage carbohydrate. The generated knockouts of either gene do not produce the elevated carbohydrate phenotype of wild-type cells in response to nitrogen deprivation. We also observed a non-bleaching phenotype in knockouts of BGS, where chlorophyll and carotenoid content remain elevated in mature cultures when wild-type cultures reduce their chlorophyll and carotenoid content.

The design and diagnostic CRISPR/Cas9 methods developed for Nannochloropsis were then modified to transform a fast-growing high-light-, high-heat-, and high-salt-tolerant microalga

Chlorella sp. strain CCMP252 with complexed Cas9-sgRNA to generate insertional knockouts of the nitrate reductase gene, demonstrating the broad applicability of the lessons learned.

iii

TABLE OF CONTENTS

ABSTRACT ...... iii

LIST OF FIGURES ...... ix

LIST OF TABLES ...... xiii

ACKNOWLEDGEMENTS ...... xiv

CHAPTER 1 INTRODUCTION ...... 1

1.1 Photosynthetic control the global environment ...... 1

1.2 Eukaryotic algae ...... 3

1.3 Nannochloropsis as a new model system ...... 5

1.4 Intracellular accumulation of fats and ...... 6

1.5 Algal cell walls ...... 8

1.6 Mixotrophic and heterotrophic cultivation ...... 12

1.7 Focus of this work...... 13

CHAPTER 2 CHARACTERIZATION OF PLANT CARBON SUBSTRATE UTILIZATION BY AUXENOCHLORELLA PROTOTHECOIDES ...... 14

2.1 Introduction...... 14

2.2 Materials and methods ...... 15

2.2.1. Cultivation ...... 15

2.2.2. Metabolite 1D 1H-NMR ...... 17

2.2.3. FAME analysis ...... 17

2.2.4. Glycome profiling ...... 18

2.2.5. Proteomic analysis ...... 18

2.2.5.1. Computerized comparisons ...... 19

2.2.5.2. MW and pI measurements ...... 19

2.2.5.3 digestion and peptide extraction ...... 20

2.2.5.4. NanoLC-MS/MS ...... 20

iv 2.2.5.5. Data processing and protein identification ...... 21

2.2.6. Genomic sequencing, assembly, and annotation ...... 22

2.2.7. Transcriptomic sequencing and analysis ...... 24

2.3. Results and discussion ...... 25

2.3.1. Degradation of and plant substrates ...... 25

2.3.2. analysis ...... 31

2.3.3. and transcriptomics ...... 33

2.4. Conclusions ...... 40

2.5. Acknowledgements ...... 41

CHAPTER 3 CHARACTERIZATION OF THE NANNOCHLOROPSIS GADITANA STORAGE CARBOHYDRATE: A 1,3-BETA GLUCAN WITH LIMITED 1,6-BRANCHING ...... 42

3.1. Introduction...... 43

3.2. Materials and methods ...... 44

3.2.1. Carbohydrate extraction ...... 44

3.2.2. 1H-NMR analysis ...... 45

3.2.3. Size exclusion chromatography ...... 45

3.2.4. Linkage analysis ...... 46

3.2.5. Degrees of polymerization and branching calculations ...... 46

3.2.6. Phylogenetic analysis...... 47

3.3. Results and discussion ...... 47

3.3.1. Nannochloropsis produces a β-1,3-linked soluble glucan with β-1,6 branching ...... 47

3.3.2. Glucan synthase and branching are conserved among ...... 50

3.3.3. Nannochloropsis may utilize a glycogenin-like protein to nucleate laminarin ... 51

3.3.4. Bioinformatics suggests three potential laminarinases from N. gaditana ...... 52

3.3.5. Manipulating carbohydrate synthesis increases accumulation in ...... 54

3.4. Conclusions ...... 56

v 3.5. Acknowledgement ...... 56

CHAPTER 4 CAS9-GENERATED GLUCAN SYNTHASE KNOCKOUT OF NANNOCHLOROPSIS GADITANA ...... 57

4.1. Introduction...... 57

4.1.1. Nannochloropsis storage carbohydrate ...... 57

4.1.2. CRISPR/Cas9 nuclease-directed insertional mutagenesis ...... 58

4.2. Materials and methods ...... 59

4.2.1. Design and assembly of Cas9 expression vector for Nannochloropsis gaditana ...... 59

4.2.2. Generation and identification of Cas9-expressing N. gaditana CCMP526 ...... 60

4.2.3. Design and synthesis of single guide RNAs ...... 61

4.2.4. In vitro assay of sgRNA nuclease activity ...... 61

4.2.5. Transformation of Nannochloropsis gaditana CG15 with sgRNA ...... 61

4.2.6. Identification of glucan synthase and KRE6-like branching enzyme knockouts...... 62

4.2.7. Culturing for growth phenotype ...... 62

4.2.8. Quantification of storage glucan ...... 63

4.2.9. Quantification and composition of total fatty acids ...... 63

4.3. Results and discussion ...... 63

4.3.1. Stable Cas9 expression does not have a significant effect on cell growth ...... 63

4.3.2. Soluble carbohydrate accumulation is induced following nitrogen starvation ... 63

4.3.3. Knockout of BGS or TGS drastically reduces soluble carbohydrate accumulation ...... 64

4.3.4. Growth under light/dark cycling ...... 66

4.3.5. BGS and TGS knockout strains accumulate significantly more total fatty acids ...... 66

4.3.6. BGS knockout cells are less sensitive to photoinhibition in older cultures ...... 67

4.4. Conclusions ...... 67

CHAPTER 5 HIGH-EFFICIENCY CAS9-DIRECTED INSERTIONAL GENE KNOCKOUT IN CHLORELLA SP. STRAIN CCMP252 ...... 72

vi 5.1. Introduction...... 72

5.2. Materials and methods ...... 72

5.2.1. Design, scoring, and synthesis of sgRNA...... 72

5.2.2. In vitro assay of sgRNA...... 73

5.2.3. Transformation of Chlorella sp. strain CCMP252 using complexed Cas9- sgRNA ...... 74

5.2.4. PCR analysis of transformants ...... 75

5.2.5. Locus sequencing ...... 75

5.2.6. NR phenotype on agar ...... 76

5.3. Results and discussion ...... 76

5.3.1. Chlorella sp. strain CCMP252 is high-light tolerant ...... 76

5.3.2 Most Cas9-induced breakages result in insertion of selection marker fragment ...... 76

5.4. Conclusions ...... 76

CHAPTER 6 CONCLUSIONS AND FUTURE WORK ...... 79

6.1. General discussion ...... 79

6.1.1. Identifying fluorescent profile of N. gaditana to inform choice of FP reporter ... 79

6.1.2. Struggles with RNA interference ...... 80

6.1.3. Cas9 diagnostic tests and troubleshooting methods ...... 81

6.2. Recommendations for future research ...... 81

6.2.1. Interrupting synthesis to generate protoplast strains ...... 82

6.2.2. Characterizing the carbon concentrating mechanisms of Nannochloropsis ..... 82

6.2.3. Implementation of Cas9 RNP complex and expansion into other organisms ... 83

6.2.4. Knockout of chlororespiration in high-light tolerant strains ...... 83

6.2.5. Precision gene editing ...... 83

REFERENCES ...... 85

APPENDIX A SUPPLEMENTAL MATERIALS ...... 101

APPENDIX B PERMISSIONS ...... 106

vii APPENDIX C LAB-READY METHODS...... 116

APPENDIX D GENERATED GENETIC MATERIALS ...... 130

viii

LIST OF FIGURES

Figure 2.1. Algae utilization of carboxymethylcarboxymethyl cellulose (CMC) in the absence of light. Reduction of the red stain indicates CMC consumption. Prestained and stained plates of (A) A. protothecoides and (B) Chlamydomonas reinhardtii are shown ...... 26

Figure 2.2. Comparison of A. protothecoides growth with and without augmentation with raw switchgrass (0.2% w/v). Cultures were grown with a 12:12 light:dark cycle at ~80 µmol/m-2s-1 light with constant shaking at room temperature for two weeks. All experiments were performed in triplicate and the error bars represent the standard deviations of the three biological replicates. Significant differences in cell density are marked with asterisks (* for p < 0.05 and ** for p < 0.01). Triplicate cultures at 14 days are shown in the inset ... 26

Figure 2.3. The doublings per day were calculated for each sample and the average for each set (Algae + Switchgrass and Algae alone) is shown...... 26

Figure 2.4. Lipid contents (A) and profiles (B) by GC/FID for A. protothecoides biomass grown with and without switchgrass. Data shown as the mean of the triplicate growth experiments and the error bars represent the standard deviations of the three biological replicates ...... 27

Figure 2.5. Metabolite 1D 1H-NMR analysis of A. protothecoides cultures. HS culture medium was amended with (A) switchgrass or (B) CMC in the presence and absence of algae. Representative spectra from each treatment are shown...... 28

Figure 2.6. Plant cell wall glycan-directed monoclonal antibodies detection (Glycome profiling) of most major non-cellulosic carbohydrates in plant cellulosic substrates incubated in media with or without A. protothecoides for 2 weeks. Regions of evidence for carbohydrate utilization/modification by algal growth are indicated by light blue arrows and xylan region is shown in yellow dotted block ...... 30

Figure 2.7. Cell wall glycan-directed mAb screening of cell wall extracts in culture media incubated without (1-5) and with (1C-5C) algae. 4M KOH extracts were isolated from raw switchgrass (1), ionic liquid pre-treated switchgrass (2), 25% H2O2-washed switchgrass (3), raw eucalyptus (4), and ionic-liquid pre- treated eucalyptus (5) to supplement with media for incubating with and without A. protothecoides. Lighted bars represent presence of sugar. Green- highlighted boxes show levels of xyloglucans are vastly reduced when algae is present (+). Blue-highlighted box shows algae produces a valuable coproduct arabinogalactan. A representative sample from each treatment is shown ...... 31

Figure 2.8. Phylogenetic analysis of (A) Glycosyl Hydrolase Family 5 and (B) Glycosyl Hydrolase Family 9 and their nearest characterized neighbors ...... 35

ix Figure 2.9. Alignments of the GH5 (A) and GH9 (B) family proteins and their nearest characterized neighbors. Two PROSITE motifs for presumed active site are available for family 5, and one for family 9 ...... 35

Figure 2.10. Proteome of A. protothecoides in the presence and absence of plant substrates. 2D Gel images of protein extracts from (A) Algae Alone, (B) Plant Alone, and (C) Algae + Plant. Spots present on the silver-stained gels of sample Plant Alone were removed from the image and total spot counts. (D) Overlay image shows sample Algae + Plant in green overlaying Algae Alone in magenta. Image appears white where spots of similar intensity overlay each other ...... 36

Figure 3.1. The protons of interest appear between 4.2 and 4.6 ppm, as demonstrated by Kim et al. [1]. Spectra for (A) N. gaditana soluble glucan extract and (B) laminarin are shown. At right (C), the generalized structure is shown, with anomeric carbons highlighted with the colors of their corresponding chemical shifts ...... 49

Figure 3.2. Results of linkage analysis showing a majority of laminarin-related residues, with evidence of some putative cellulose (cell wall) contamination. Relative abundances of each residue were used to estimate degree of polymerization and degree of branching ...... 49

Figure 3.3. Proposed conserved glucan metabolism of investigated stramenopiles, including the UGPase/PGM fusion that strongly affects β-1,3-glucan synthesis in diatoms. The specific glucosyl hydrolases active on laminarin are not yet verified ...... 50

Figure 3.4. Phylogenetic trees generated for the (A) branching enzyme, (B) glucan synthase, and (C-F) glycoside hydrolases. Both branching enzyme and glucan synthase clustered as expected, with stramenopiles separating well from green plants and fungi. Of the glycoside hydrolases, (C) Nga03687, (D) Nga00952, and (E) Nga00051 clustered similarly, unlike other considered genes. (F) Nga05229 is included as a representative example of rejected glycoside hydrolase ...... 55

Figure 4.1. Cas9 experression vector constructed for Nannochloropsis gaditana CCMP526 using a viral SV40 promoter to drive hygromycin resistance and the endogenous elongation factor promoter to drive Cas9 expression. Cas9 is codon-optimized and fused to an NLS, GFP, and another NLS ...... 59

Figure 4.2. Soluble carbohydrate extracted from cultures split into nitrogen replete and nitrogen starved media at late exponential phase and cultured for 72 hours. Error bars represent standard deviations of biological triplicates. Soluble carbohydrate is presented on a per cell (left) and fraction of dry cell weight (right) basis. Asterisks (*) indicate significant difference from both wild-type and Cas9-GFP background (p-value < 0.05) ...... 64

Figure 4.3. Total carbohydrate from cultures split into nitrogen replete and nitrogen starved media at late exponential phase and cultured for 72 hours. Error bars represent standard deviations of biological triplicates. Soluble carbohydrate is presented on a per cell (left) and fraction of dry cell weight x (right) basis. Asterisks (*) indicate significant difference from both wild-type and Cas9-GFP background (p-value < 0.05) ...... 65

Figure 4.4. The total fatty acid content as a fraction of dry cell weight (%DCW) for cultures split into nitrogen replete (+N) and nitrogen starved (-N) media at late exponential phase and cultured for 72 hours. Error bars represent standard deviations of biological triplicates ...... 67

Figure 4.5. Cell densities of fluorescent subpopulations as cultures mature. Taken as a whole, no difference in growth is observable between the selected strains (top), but separating the fluorescent subpopulations reveals more complexity. In exponential growth, nearly all cells are in the healthy fluorescent region (High Chl). As cultures mature, a population appears (Mid Chl) with lower 695 nm fluorescence and more fluorescence at 574 nm. Following this, a population with even less 695 nm fluorescence and more 574 nm fluorescence appears (Low Chl) and the transitional population fades. Beyond this point, increased cell density is limited to cells in the Low Chl cluster, and the density of healthy cells plateaus. Error bars represent standard deviation of four biological replicates, except for wild-type. For each knockout, four genetically unique mutants were used. Asterisks (*) indicate significant difference from Cas9-GFP background (p-value < 0.05) ...... 69

Figure 4.6. and carotenoid content of late exponential phase cultures (1.2- 1.5x108 cells/ml) grown under 12/12 light/dark cycle. Samples were taken at 8 hrs into the 12 hr light period. Error bars represent standard deviation of four biological replicates, except for wild-type. For each knockout, four genetically unique mutants were used. Asterisks (*) indicate significant difference from Cas9-GFP background (p-value < 0.05) ...... 71

Figure 5.1. Genomic sequence of Chlorella sp. CCMP252 with annotated target sites (light blue) primers for in vitro assay, PCR analysis, and sequencing, and nitrate reductase coding sequence (yellow) ...... 73

Figure 5.2. Agarose gel electrophoresis gel fragment analysis of in vitro sgRNA guide assay. Two different guides were tested in lanes 1 and 2, and negative control with no guide added was run in lane 3 ...... 74

Figure 5.3. -irradiance curve for Chlorella sp. strain CCMP252 and Tetraselmis suecica UTEX2286. Error bars indicate standard deviations of triplicate measurements ...... 77

Figure 5.4. Exemplary agarose gel illustrating the increased band size resulting from insertion of the selection marker fragment at the Cas9 target site in the nitrate reductase coding sequence. Left and right lanes are loaded with 5 µl of O’GeneRuler 1 kb Plus DNA Ladder (ThermoFisher), and wild-type C. 252 was used as template in lane 8, and water in lane 10 ...... 78

Figure A.1. After normalization by optical density, the fluorescence signal of Venus could be detected by subtraction of the chlorophyll autofluorescence from wild-type culture in one isolated mutant. The expected peak emission of Venus is at 528 nm ...... 102

xi Figure A.2. Fluorescence scans of wild-type N. gaditana to identify suitable fluorescent reporters and optimal excitation and emission wavelength settings. For each FP, the peak excitation wavelength was used and the emission spectrum scanned. The vertical bars indicate the reported peak emission wavelength for each FP. The colored curves indicate the photosynthetic autofluorescence of wild-type N. gaditana. A dotted line traces the % transmittance of wild-type N. gaditana at each wavelength. The ideal reporter has the vertical bar intersect its matching curve at a low fluorescence value, and the dotted line at a high value, so that the light- harvesting complex is not quenching the fluorescent signal. On the other hand, higher wavelength FPs are associated with lower quantum efficiencies (lower signal) ...... 104

Figure A.3. Polyketide synthases identified in Nannochloropsis gaditana CCMP526. A canonical type I fatty acid synthase (FAS) is shown at right, indicating the essential domains. To the right of each domain cartoon is a number representing the relative transcript coverate from combined N+ and N- transcriptome datasets [2]. PKS1 is essentially a type 1 FAS. PKS4-6 are very similar but terminate with a short-chain dehydrogenase/reductase domain (SDR) rather than the thioesterase domain. The highest expressed PKS transcript (PKS6) has been knocked out, but shares high sequence similarity with PKS4 and is structurally very similar to PKS5. This gene redundancy may make knockout effects difficult to identify. The next target is PKS3, which has a unique arrangement of domains and is the next-highest expressed transcript. PKS2 is highly repetitive, with repeat regions having very little difference, complicating assembly ...... 105

xii

LIST OF TABLES

Table 2.1. General characteristics of the Auxenochlorella protothecoides genome ...... 23

Table 2.2. A catalog of the potential GH-encoding genes from A. protothecoides sp. 0710 and their UTEX 25 homologs. Included are the glycosyl hydrolase families identified by the Conserved Domain Database and the respective E- Value. Also included are predicted subcellular localizations and reliability scores (1 is high confidence, 5 is low), generated by TargetP. From the transcriptome analysis, the fold-change (FC) of each transcript is given during the dark and light periods in the presence of CMC, relative to the unsupplemented culture, as well as p values. Significant differences (p < 0.05) are highlighted in bold ...... 34

Table 2.3. Quantitative summary of differentially expressed proteins on duplicate 2D gels. 2D gel comparisons of 1C Algae + Plant and 2C Algae + H2O2 Treated Plant versus Algae Alone. Reference spot numbering, pI, and MW are given for changing polypeptide spots analyzed in samples 1C Algae + Raw Plant (gels LF1093 #5-6), 2C Algae + H2O2 Treated Plant (gels LF1093 #8 and LF1094 #1), and HSC Algae Alone (gels LF1093 #3-4). Also shown are the fold increases or decreases (difference) of the polypeptides for 1C Algae + Raw Plant and 2C Algae + H2O2 Treated Plant vs. HSC Algae Alone. The differences are calculated from average spot percentages from duplicate 2D gels (individual spot density divided by total density of all measured spots). Polypeptide spots increased by a fold increase of ≥ 3.0 or a fold increase of ≥ 1.7 and a p-value ≤ 0.05 are highlighted in blue in the difference column, while spots decreased with a fold decrease of ≤ -3.0 or a fold decrease of ≤ 1.7 and a p-value ≤ 0.05 are highlighted in red. NC = difference not confirmed. Polypeptide spots with a p-value of ≤ 0.05 are highlighted in green in the T-test column. Spot percentages are given to indicate relative abundance. A total of 519 spots were analyzed. Highlighted spots were selected for identification by spot analysis...... 37

Table 2.4. The differentially expressed spots selected for proteomic analysis. Fold changes are calculated from average spot percentages (individual spot density divided by total density of all measured spots for duplicate 2D gels) ...... 39

Table 3.1. The degree of polymerization (DP) and degree of branching (DB) are calculated separately using our 1H-NMR and linkage data. For context, DP and DB measurements of soluble carbohydrates from diatoms and brown alga are provided ...... 47

Table 5.1. Primer sequences used in C252 Cas9 work ...... 73

xiii

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my advisor, Dr. Matthew Posewitz, whose knowledge and understanding of photosynthetic biochemistry has been essential to navigating the vast field of possible projects, and to avoiding many of the pitfalls presented by these complex hosts. I am very grateful to have been surrounded by kind, supportive, and talented colleagues throughout my time in graduate school, and would like to especially thank Dr. Robert

Jinkerson, Dylan Franks, Dr. Huiya Gu, Dr. Victoria Work, Dr. Lee Stanish, Dr. Matt Scholz, Dr.

Fiona Davies, Jeff Chung, Lyle Sisson, Dr. Wei Fang, Lukas Dahlin, Dylan Thomas, Amy

Ashford, Dr. Melissa Cano, and Ed Dempsey. I would also like to thank my committee members, Dr. Nannette Boyle, Dr. Brian Trewyn, and Dr. Ryan Richards for their helpful guidance and inspiring questions. I would also like to thank Dr. Amanda Barry for her support and encouragement, and for the opportunity to translate my skills into new organisms.

I would especially like to thank Dr. Joseph Weissman and ExxonMobil for their faithful support of algal research and development, without whom this research would not be possible. They have been driven and intelligent sponsors of much of our work, and their projects have already seen significant success. It is a bold strategy, and I hope that it pays off for them.

I owe a great deal of thanks to my previous mentors, specifically Dr. Charles Gersbach and Dr. Antonio Jimeno, for stoking my interest in molecular biology and teaching me about exciting new techniques in and applications of biotechnology.

Finally, my family is the most unwaveringly supportive collection of individuals I could ask for. Their continued patience balanced with urgency, and faith tempered with inquiry, have ensured that I would complete this work. Most of all, I am grateful for the constant and exceptional support of my wife, Laura, to whom I owe the greatest thanks.

xiv

CHAPTER 1

INTRODUCTION

Modified from a paper published in Environmental Progress & Sustainable Energy1

Victoria H. Work2, Fiona K. Bentley3, Matthew J. Scholz3, Sarah D’Adamo3, Huiya Gu3, Brian W.

Vogler3, 4, Dylan T. Franks3, Lee F. Stanish3, Robert E. Jinkerson3 and Matthew C. Posewitz3, 5

1.1. Photosynthetic microorganisms control the global environment

It is estimated that one million years of global photosynthate stored as fossil fuels is combusted every year [3]. Our ability to extract and utilize these energy reserves has enabled a unique period in the Earth’s history – a time in which abundant energy carriers are used to provide relatively inexpensive electricity, heating, and transportation fuels, as well as sustain current agricultural practices. This in turn has facilitated a rapid expansion in the human population, which recently surpassed 7 billion and is projected to rise to 8 billion within the next

25 years [4]. Energy and food are inextricably linked to a society’s well-being, and the 21st century will likely see dramatic changes in how these products are generated and distributed.

As a growing human population encounters diminishing fossil energy reserves, climate change, and terrestrial ecosystem degradation, one key component in this transformation will be leveraging photosynthesis in novel ways to satisfy aspects of the world’s energy and nutritional demands [5–7].

______

1 Reprinted with modification and additions with permission from Environmental Progress & Sustainable Energy 2013, 32, 989. 2 Department of Civil and Environmental Engineering, Colorado School of Mines. 3 Department of Chemistry and Geochemistry, Colorado School of Mines. 4 Contributed content about Nannochloropsis to sections 1.3 and 1.4, and added section 1.6. 5 Corresponding author.

1 Photosynthetic microorganisms (PSMs), which include algae and , have the potential to fulfill an important role in this transformation, and PSMs are currently being intensively investigated for the efficient production of several targeted bioenergy carriers. These include, but are not limited to, triacylglycerols (TAGs), carbohydrates (e.g., starch, glycogen, sucrose), H2, free fatty acids (FFA), omega-3-fatty acids and other nutraceuticals, vitamins, amino acids and proteins, terpenoids, alcohols, alkanes, aldehydes, and organic acids. PSMs are capable of exemplary photosynthetic yields and can thrive in a diverse range of ecosystems, including saltwater [5,7–10]. As potable water is a highly prized and limited resource, the development of marine systems for bioenergy and food production will likely prove critical in satisfying anthropologic demands. Contemporary agriculture required centuries of evolution and breeding, whereas targeted biomass improvements in PSMs is a relatively new concept that is in the transformative stages of development. Fortunately, rapid advances in genome sequencing, molecular biology, and analytical chemistry are allowing scientists to dissect and improve PSM metabolic networks to more efficiently produce a diversity of targeted products.

Although the conversion of sunlight and CO2 into basic precursors suitable for transportation fuel and agricultural, nutritional, and specialty products is an inherent property of

PSMs, improvements in product yields and processing technologies are necessary to attain environmental and economic sustainability. Currently, PSM productivities are limited by well- recognized shortcomings, which include: (a) inefficiencies in coupling light capture to photosynthetic processes, (b) the absorption of excess excitation energy relative to the amount used in photochemical processes, (c) insufficient CO2 availability, trafficking, and reduction, and

(d) limited metabolic flux directed to biocommodity pathways [6–11]. Additionally, harvesting

PSMs from media where they typically represent approximately 1% of the total mass remains a significant techno-economic challenge, as does the deconstruction and segregation of PSM biomass into purified or enriched products of value.

2 and strain evolution will likely be used to overcome factors currently limiting the large-scale deployment of PSMs. Ultimately, a self-contained bioprocessing platform in a single organism may emerge – a “photons to fuels” PSM that is capable of highly efficient light utilization, CO2 assimilation, and metabolic partitioning into biomolecules that are readily harvested and transformed into commodity products. This would uncouple the current need for a secondary organism (e.g., yeast, ) to convert photosynthate (e.g., cellulose) to , thereby significantly simplifying costly integration steps. The deployment of these organisms should be designated to footprints that do not compete with agriculture or require substantial freshwater or fertilizer inputs. Recent years have seen rapid growth in PSM research and significant advances are systematically being attained in the quest to improve PSM productivities.

1.2. Eukaryotic algae

Eukaryotic algae can accumulate high levels of intracellular starch or , as well as TAGs, particularly when stressed for nutrients such as nitrogen. Significant research efforts are therefore focused on studying the metabolic rearrangements induced by nutrient stress, as well as the biosynthetic routes to TAGs, storage carbohydrates, and H2. Hundreds of distinct algal strains from existing and newly established [12] culture collections have been evaluated for their biofuel or nutraceutical production potential. Contemporary algal-biofuels research is divided among approaches that either (a) characterize and manipulate biofuel- relevant processes in established model laboratory organisms, which typically have low productivities outdoors [8,13–19], or (b) study industrially promising organisms that have high productivities, but are not as fully characterized and often do not have established or robust genetic manipulation techniques [5,6,20–24]. Importantly, the barriers to genome sequencing, annotation, and assembly have diminished in recent years and many new algal genomes are now emerging, several with biofuels or biocommodity production being the principal driver for investigation. 3 Genetic manipulations that alter algal metabolism or introduce transgenes can be used to increase production of a desired molecule. Although successful homologous recombination in eukaryotic algae is reported for only a select few [25,26], several organisms can be successfully transformed using exogenous DNA that is ectopically incorporated into the genome, enabling the transgenic expression of proteins [15]. Random integration of marker genes can disrupt endogenous genes for forward genetic screens. Gene silencing by RNA interference (RNAi) has also emerged as a powerful genetic tool to attenuate levels of targeted transcripts and is particularly well developed in C. reinhardtii [27–34]. RNAi in algae relies on two classes of sRNA as genetic templates: microRNA (miRNA) and small interfering RNA

(siRNA) [28,29]. originate from endogenous, non-coding single-stranded RNA that fold into imperfect stem-loop structures displaying low complementarity. Alternatively, siRNA can originate from exogenous, double stranded RNA (dsRNA) molecules containing hairpin structures displaying near-perfect complementarity [28,29]. These hairpin and stem-loop structures are recognized by the enzyme Dicer, which binds and processes the double-stranded region into sRNA. Dicer and sRNA form a complex that, when joined by the protein Argonaute-

Piwi (AGO-Piwi), form an effector known as the RNA-induced silencing complex (RISC) [28,29].

The AGO-Piwi protein cleaves the “passenger strand,” leaving sRNA as a template for the subsequent recognition, binding, and destruction of complementary mRNA. An alternative but related pathway involves the activity of RNA-dependent RNA polymerases (RDRs) that serve to amplify single-stranded transcripts or sRNA precursors leading to the production of dsRNAs destined for RNAi pathways [28,29].

The existence of RNAi pathways has been experimentally confirmed in C. reinhardtii and

P. tricornutum [27–34], and homologs to the key components of RNAi machinery have been identified from genome and transcriptome sequences in Volvox carteri, Dunaliella salina,

Porphyra yezoensis, Ectocarpus siliculosus, Nannochloropsis oceanica, Vaucheria frigida and

Euglena gracilis. Gene silencing through the introduction of exogenously synthesized dsRNAs 4 has long been used as a transient method for probing gene function. Techniques to express inverted repeat (IR) transgenes that code for dsRNA molecules or artificial miRNAs (amiRNA) are also established [27–34].

1.3. Nannochloropsis as a new model system

While model organisms such as C. reinhardtii and P. tricornutum have been instrumental for studying aspects of photosynthesis, nutrient acquisition, and algal metabolism, these organisms have yet to emerge in successful large-scale outdoor cultivation platforms.

Researchers are now focusing on algal species with the demonstrated ability to grow outdoors at meaningful scales, where they are currently used in nutraceutical and aquaculture industries.

Among these are highly oleaginous marine species of Nannochloropsis. Several

Nannochloropsis genomes have been published [2,35–37], as well as transformation methods

[2,25,35]. Nannochloropsis species are particularly attractive as new model systems due to their exemplary lipid production characteristics [38–40]. In outdoor culture, Nannochloropsis is projected to produce more than 90 kg of oil per hectare per day [41]. Although lipid production for biofuels has largely driven the use of Nannochloropsis species as model organisms, additional biocommodity prospects exist, including the production of a number of valuable pigments and eicosapentaenoic acid [42].

Nannochloropsis gaditana is a small (2-4 µm), nonmotile, marine related to diatoms and [2,43,44]. Nannochloropsis and the lineage are widely held to have evolved from a secondary endosymbiotic event of red alga, resulting in oftentimes dramatically different solutions to the same challenges that green plants and alga have addressed (e.g., energy storage and transport from the photosynthetic organelle, carbon concentration, resistance to predation). It is spherical in shape with a rugged cell wall and large lipid droplets. The lipid profile of Nannochloropsis species is rich in C14, C16, and C18, with variable amounts of C20 [2,45,46]. With nuclear transformation becoming a routine process in

Nannochloropsis, it is expected that rapid progress in investigating and manipulating lipid 5 metabolic pathways, cell wall biosynthesis, and light-harvesting complexes will ensue, and that the addition of exogenous elements to produce other marketable chemicals will be undertaken.

Homologous recombination has been reported in one strain of Nannochloropsis (W2JB3) [25].

In N. gaditana, a number of genetic ontology (GO) terms associated with RNAi were identified

[128, 131] [2,47], indicating that Nannochloropsis may also be amenable to manipulation by

RNAi techniques.

CRISPR was first demonstrated in N. gaditana in 2016, with the successful knockout of nitrate reductase [48]. This gene is a popular first target because most algae can be cultured such that the gene is unnecessary, by providing nitrogen in the form of ammonium rather than nitrate. Following growth, nitrate reductase-deficient mutants can be screened by growth on plates with nitrate as the sole source of nitrogen.

Many of these newly emerging organisms require extensive time for phototrophic growth on agar plates, thereby delaying the subsequent selection and study of genetic transformants.

Enabling heterotrophy is a potential mechanism to accelerate research in these cases.

Zaslavskaia and colleagues demonstrated a successful trophic conversion of P. tricornutum by introducing a human transporter (Glut1) into the nuclear genome [49]. Similar approaches have been applied to C. reinhardtii and Volvox carteri by expressing the hexose/H+ symporter (HUPI) gene from Chlorella [50,51] and in cyanobacteria by expressing the glucose transporter gene (glcP) from Synechocystis PCC 6803 [52]. Although heterotrophy is frequently established by simply expressing an appropriate transporter, other enzymes, for example α- ketoglutarate dehydrogenase in the tricarboxylic acid cycle (TCA cycle) or isocitrate lyase in glyoxylate cycle, may have to be introduced in some cases [53–57].

1.4. Intracellular accumulation of fats and carbohydrates

In the model green alga C. reinhardtii, notable advances have enabled starch hyperaccumulation [19,58], reduction of light-harvesting antennae for more efficient light harvesting [59–62], and increased TAG accumulation in starchless mutants [18,19,63–65]. In 6 the model Phaeodactylum tricornutum, medium chain and incorporation into TAG has been realized [15]. Medium chain fatty acid derivatives have lower melting points relative to the C16 and C18 fatty acids that typically dominate algal biomass and are therefore better suited for biofuel applications.

Throughout the light day, Nannochloropsis and other Stramenopiles accumulate a β-1,3- linked glucan, storing the excess energy for consumption by cellular processes in during the dark. As a result, the composition of the cell biomass fluctuates throughout the day/night cycle, with this soluble carbohydrate fraction peaking at the end of the day period.

In diatoms, this carbohydrate supply is stored in within the cell, with the membrane-bound beta-glucan synthase (BGS) incorporating the activated UDP-glucose monomers in the cytosol into the growing which is elongated into the interior of the [66]. The many transmembrane domains of the Kre6-like branching enzyme suggest that this enzyme is also membrane-bound, and adds the β-1,6-linked branched glucose units as the polysaccharide is synthesized [67].

The beta glucan synthase homolog of Nannochloropsis gaditana is highly similar to that identified in diatoms and brown alga [66,68,69], with predicted transmembrane domains between an FKS1 domain and a glucan synthase domain and at both N- and C-termini of the protein. This arrangement is in contrast to the beta glucan synthases identified in yeast and other fungus, which lack the N-terminal transmembrane domains [66].

The Nannochloropsis β-1,6-transglycosylase (TGS) believed responsible for the branching of this carbohydrate is likewise similar to the homolog from diatoms and brown alga, but notably more like the enzyme from fungus than from Oomycetes in its arrangement of transmembrane domains.

Analysis of the TGS and BGS homologs from across the evolutionary tree provides some insights into the history and divergence of these enzymes. All the identified beta glucan synthase homologs of vascular plants encode a Vta1 domain (pfam04652) at the N-terminus. 7 The Vta1 domain is associated with VPS20, which is implicated in the concentration and sorting of cargo proteins [70]. The BGS homologs of all stramenopiles, as well as vascular plants, encode an FKS1 domain (pfam14288) N-terminal to the glucan synthase domain (pfam02364), separated by a set of transmembrane domains.

In oomycetes, the beta glucan homolog is fused to an N-terminal SKN1 domain

(pfam03935), which is the same domain used to identify the transglycosylase protein. In fact, searches for branching enzymes among oomycetes yields the same genes as the beta glucan synthase search, heavily supporting the related functions of these two proteins. It is worth noting that at least two oomycetes harbor a predicted second copy of the beta glucan synthase gene lacking the N-terminal SKN1 domain, but still possessing the FKS1 domain and transmembrane region between.

1.5. Algal cell walls

In contrast to cyanobacterial approaches that are focused on metabolite secretion, most algal biofuels research is centered on harvesting cells then deconstructing and segregating algal biomass to attain the end product, typically TAG. Understanding and manipulating algal cell walls is likely to improve both cell harvesting and processing methodologies. To be economically viable, high oil species must effectively resist predation and endure the environmental changes inherent to outdoor growth facilities, as well as be amenable to efficient separation and extraction. These traits are embedded in the structure of the cell wall. Biofuel production processes will benefit from a deeper understanding of cell wall biosynthesis, composition, architecture, dynamism, and vulnerabilities. This will allow molecular biologists to genetically engineer more compliant strains, cue harvesters to intelligent flocculant options, guide chemists’ choices of enzymes and chemicals for biomass extraction and provide engineers with a physical basis for modeling the bulk flow properties of algal slurries.

Even at the ultrastructural level, eukaryotic algae present a diversity of cell wall types, including some algae, such as species of Dunaliella, that do not have a cell wall [71]. In C. 8 reinhardtii, the cell wall comprises five distinct layers of hydroxyproline-rich glycoproteins [72].

Yamada et al. found three distinct wall architectures among the thin-section electron micrographs of 12 surveyed species of Chlorella that contained either two layers—with or without a trilaminar outer sheath—or one microfibrillar layer [73]. The chrysophycean algae,

Pleurochrysis scherfellii, exhibits a wall of cellulosic plates embedded in a gelatinous, pectic matrix [74]. A thin retaining wall that is studded with tubules and freeze-fractures into two distinct layers surrounds colonies of Botryococcus braunii (Race B) cells, which are themselves each surrounded by a bipartite, polysaccharide wall [75].

As is the case with higher plants, algal walls assemble a panoply of molecules. Typically, , proteins, and glycoproteins are believed to form the bulk of the wall, but few mass balances of algal cell wall composition have been published (for example, [76]). Other classes of molecules found in the wall of some algae include algaenans [77–81], phlorotannins

[82], and lignin [83].

Cell wall polysaccharides including cellulose [84], chitin or chitosan-like molecules [85], hemicelluloses [86], pectins [87], fucans [88], alginates [89], ulvans [90], carrageans [91], and lichenins [92] have all been described in various algae. Fucans are proposed anticoagulants

[93], alginates are used by the pharmaceutical and cosmetic industries [94], and cellulose can be converted to biofuel. Chemical modifications of the polysaccharides, such as acetylation

[95,96] and sulfonation [88,89], help determine the properties of these molecules and how they are cross-linked into larger networks. An understanding of these linkages could yield improved approaches toward biomass deconstruction, and to that end, a large number of enzymes have been identified that attack both the primary linkages and cross-links of these matrices [97]. Ionic linkages among pectins are important determinants of cell wall rigidity and susceptibility to enzymatic attack in higher plants, and this is likely true in some algae as well [98]. If so, ionic chelators such as EDTA and EGTA might facilitate biomass extraction.

9 Identification of polysaccharides often begins with determination of their monomeric constituents, and an abundance of literature addresses these compositions in several species.

Notably, Takeda [99,100] catalogued monomeric sugars to classify 40 species of Chlorella and nine species of Scenedesmus. Species of Chlorella were grouped into cells with “rigid” walls

(resistant to 2 N TFA) of either glucosamine or galactose and mannose, and wall “matrix” material (hydrolyzed in 2 N TFA) that contained a broad suite of pentose and hexose sugars.

Scenedesmus species exhibited rigid walls of glucose, galactose and mannose (but no glucosamine), and wall matrices that were mannose, galactose, and glucose.

Though surveys of amino acid profiles of algal walls are available [101,102], characterization of the corresponding proteome is still in its infancy. Structural glycoproteins are common scaffolding material in the extracellular matrices of plants and animals, and plants often express hydroxyproline-rich glycoproteins (HRGPs), including the covalent and insoluble network of extensins and the GPI-anchored, soluble arabinogalactan proteins (AGPs) [103].

Unsurprisingly, this is true of algae as well. The cell wall of C. reinhardtii, for example, comprises self-assembling, extensin-like HRGPs [72], and HRGPs are also found in other [86,104]. The susceptibility of the Chlorella vulgaris cell wall to trypsin digestion suggests one way in which understanding the structural role of algal wall proteins might lead to improved biomass extraction methods [105].

Many algal genomes have now been published, and, in consort with proteomics efforts, that data is being mined for insight into cell wall metabolism. Wang et al. isolated 81 protein orthologs from the cell wall fraction of Haematococcus pluvialis, including proteins involved in cell wall construction, modification, and hydrolysis and others involved in signaling, transport, motility, and heat shock [106]. Putative cell wall proteins in the brown alga Ectocarpus siliculosus have also been identified [89].

Algaenans represent an intriguing class of molecules present in the cell walls of some algae [77–81]. These long-chain and typically saturated hydrocarbons are highly resistant to 10 hydrolysis and solvation, resembling the exine material of pollen grains and the cutin and cutan of higher plants [107,108]. Resolving whether these compounds are truly distinct from one another will require greater understanding of how each is biosynthesized. Some of the more recalcitrant algal cell walls, such as those found in species of Chlorella, Scenedesmus, and

Nannochloropsis, contain algaenans, and it is tempting to hypothesize that their presence makes these cell walls difficult to breach. What little is known of algaenans largely comes from the organic geochemistry literature, where they have been studied as a potential source of diagenic material. Allard et al. concluded that the algaenans of Chlorella emersonii, Tetraedron minimum and Scenedesmus communis were at least in part monomeric, extremely long-chain

(up to C120) carboxylic acids and alcohols [109]. On the other hand, Blokker et al. determined that polymeric, long-chain, ω-hydroxy fatty acids form the structural basis of algaenan in some green algae and are found ester-bound to the cell walls of several species [81]. Such findings suggest a role for in biomass disassembly. In fact, the hairlike projections on the outer surface of the Chlorella vulgaris cell wall are susceptible to phospholipase A1 digestion [105].

Other potentially valuable molecules associated with the cell wall are the carotenoids.

Though carotenoids are typically found in the plastid where they mediate light and heat transfer, a few studies have found them tightly associated with cell walls. It is not clear whether the carotenoids are integral to the wall or if free carotenoids are simply adventitiously binding to shed or otherwise disrupted walls. Canthaxanthin, , lutein, and other pigments have been described in preparations of the cell walls of Scenedesmus obliquus [102], and pigment bodies are strongly associated with the shed walls of a Nannochloropsis sp. [110]. Some algae divide by eleutheroschisis, wherein the parental cell wall is shed into the media as the daughter cells emerge. Owing to their resilient nature, these often pigmented, shed walls persist and accumulate in the culture where they likely affect light transmittance.

The and composition of the N. gaditana CCMP526 cell wall was characterized in 2014 [111]. A combination of quick-freeze, deep-etch electron , 11 FTIR, and carbohydrate analysis indicated a bilayer construction with an inner layer of cellulose enveloped by a hydrophobic algaenan outer shell. Algae protected by robust cell walls that buffer the inner cell from external threat possess an inherent advantage. Yet, these are barriers not only against predators, but also to biofuel processors who must contend with refractory biomass when trying to extract commodities of interest from the intracellular compartment. An expanded understanding of algal cell wall monomeric composition, polymeric architecture, and dynamic compositional changes are necessary to inform research regarding manipulation of cell walls into more compliant harvesting and deconstruction phenotypes, which will doubtlessly yield more cost-effective approaches to algal biofuel production.

1.6. Mixotrophic and heterotrophic cultivation

Significant effort has been made to maximize productivity under photoautotrophic growth conditions; however, little progress has been made to discover and understand reduced carbon assimilation pathways or enzymatic degradation of complex carbon substrates in algae.

Utilization of plant-based carbon substrates in addition to photosynthesis (mixotrophic growth) for biochemical assimilation into biomass, biofuels, and bioproducts, can increase cultivation productivity and improve the economic viability of algal-derived biofuels. Herein we report that a freshwater production strain of microalgae, Auxenochlorella protothecoides UTEX 25, is capable of directly degrading and utilizing non-food plant substrates, such as switchgrass, for cell growth. Glycome profiling of plant substrates before and after addition to A. protothecoides cultures demonstrates the utilization of xyloglucans. Genomic, proteomic and transcriptomic analyses revealed the identity of many enzymes that are hypothesized to be involved in complex carbohydrate degradation, including several family 5 and 9 glycosyl hydrolases. This work paves the way for future designer engineering of plant-carbon utilization to further improve productivity of algal production strains.

12 1.7. Focus of this work

Continued expansion of the human population combined with ambitions for higher living standards must be reconciled with climate change, ecosystem degradation, and the finite reserves of fossil fuels. Photosynthesis provides the caloric, nutritional, and oxygenic requirements for human metabolism and established the fossil energy reserves that are now deeply integrated into our society. A transition to more sustainable means of providing the bioenergy resources required for current and future anthropologic lifestyle is necessarily imminent and requires new photosynthetic platforms. Compared to traditional agricultural crops, relatively little time and effort has been devoted to PSM strain improvements for societal gain.

The last decade has seen explosive interest in gaining a better understanding of PSM physiology, metabolism, and photosynthesis, and dramatic advances have been made in manipulating these organisms. This work will focus on the development of extraction and characterization methods, genetic engineering technologies, and troubleshooting tests.

Several PSMs are already highly productive, genetically tractable, and likely more straightforward to manipulate than higher plants. Considerable opportunities remain to improve photosynthetic efficiencies and carbon utilization, as well as modulating product titers to enable economic feasibility. Given the rapid rate of progress in these areas at present, meaningful advances using these metabolically versatile organisms are likely to emerge in the coming years.

13

CHAPTER 2

CHARACTERIZATION OF PLANT CARBON SUBSTRATE UTILIZATION BY

AUXENOCHLORELLA PROTOTHECOIDES

Modified from a paper to be published in Algal Research1

Brian Vogler2, 3, 4, Shawn Starkenburg3, Nilusha Sudasinghe3, Joseph N.M. Rollin5, Sivakumar

Pattathil6, and Amanda N. Barry3, 7

2.1. Introduction

Auxenochlorella protothecoides strain UTEX25, a freshwater microalga, produces large quantities of [112–114], that can be directly converted to fuel. A. protothecoides can grow mixotrophically on simple sugars to increase biomass productivity [115] but feeding sugar to algae is not economically viable. Furthermore, adding sugar monomers to open ponds will inevitably cause rampant contamination by heterotrophic organisms. A potential cost-effective and energy-efficient alternative is the utilization of raw or minimally-treated (e.g., acid-treated, heat-treated) lignocellulosic feedstocks: trees, grasses, and agricultural residues [116]. These feedstocks are expected to increase in abundance over time which would further lower cost

[114]. Whereas current technology allows these plant substrates to be processed into alcohols with low energy density, using these substrates as feedstocks for algae converts them to high energy density algal lipids with highly reduced carbon chains compatible with existing transportation fuel infrastructure (e.g., jet fuel, diesel, and gasoline).

______

1 Reproduced with permission of coauthors. 2 Contributed to the metabolite NMR, genome analysis, proteomics and transcriptomics analysis, prepared figures and tables for publication, and prepared article for submission. 3 Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87544. 4 Department of Chemistry, Colorado School of Mines. 5 National Renewable Energy Laboratory, Golden, CO 80401. 6 Past: Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602. 7 Corresponding author.

14 The extent to which algae can degrade the three polymers present in lignocellulose

(cellulose, hemicelluloses, and lignin) is not known. Previous studies have shown that

Chlamydomonas reinhardtii is capable of degrading cellulose [116]. However, the degradation of complex lignocellulose has not been explored. Potential glycosyl hydrolases, enzymes that catalyze the glycosidic bonds in glycans such as cellulose and hemicelluloses, and their associated domains can be found in the deposited genome and transcriptome sequences of algae; however, no algal glycosyl hydrolases involved in the deconstruction of plant substrates have been identified. In the current study, we present the first example of algae degradation and utilization of untreated plant substrate, the putative genetic and molecular mechanism(s) behind this degradation, and identify potential glycosyl hydrolases that may be involved in plant deconstruction.

2.2. Materials and methods

2.2.1 Cultivation

Auxenochlorella protothecoides UTEX 25 was obtained from the UTEX Culture

Collection of Algae (https://utex.org/). Cultures were maintained on Sueoka’s high salt (HS) media[117] on agar plates under constant light at room temperature. Cultures were replated on

HS as necessary to maintain culture viability. For carboxymethylcellulose (CMC) degradation studies, cells were plated on HS media agar plates with 0.1% w/v CMC (Sigma) and grown in constant light or constant dark for 1 week. Culture plates were flooded with 0.2% w/v Congo

Red (Sigma) in water, incubated for 20 min, decanted, and then destained with repeated washings of 1 M NaCl to determine the degree of CMC degradation.

In growth with plant substrate experiments, plant matter (raw Pinacum virgatum

(switchgrass), 25% H2O2 pretreated switchgrass, ionic liquid pretreated switchgrass, raw

Eucalyptus grandis (eucalyptus), ionic liquid pretreated eucalyptus, untreated disintegrated

Betula pendula (silver birch) wood chips, soda-pulped disintegrated silver birch wood chips, liquid hot water pretreated Zea mays (corn) stover, ammonia fiber expansion (AFEX) pretreated 15 eucalyptus top portion, raw Solidago canadensis (goldenrod) biomass, and extractive ammonia fiber expansion (EAFEX) pretreated corn stover) (all biomass provided to S. Pattathil courtesy of

Idaho National Laboratory and Bruce Dale, Great Lakes Bioenergy Research Center, Michigan

State University) was added to HS media. For initial glycomic analysis, triplicate 50 mL cultures of HS media with and without (control) 0.2% w/v plant substrate and with and without (control)

A. protothecoides, inoculated from a liquid culture at an algae cell OD750 of 0.02, were grown at room temperature on a shaker in a 12/12 hr light/dark cycle at ~80 µmol/m-2s-1 light intensity.

Triplicate liquid samples of 50 mL were shipped to the University of Georgia Complex

Carbohydrate Research Center for glycome analysis.

Hemicellulose (xylan and xyloglucan) enriched extracts were extracted from plant substrates with 4 M KOH as previously described [118]. For hemicellulose analyses, triplicate

100 mL cultures of HS media with and without (control) 0.01 g xylan extracts of raw switchgrass, ionic liquid pretreated switchgrass, 25% H2O2 pretreated switchgrass, raw eucalyptus, and ionic liquid pretreated eucalyptus with and without (control) A. protothecoides at a starting OD750 of

0.037 were grown in constant dark or constant minimal light (~25 μmol·m-2·s-1) and constant shaking at 22 °C for two weeks. 50 mL of each of these cultures was sent for glycome analyses.

The remaining culture volume was centrifuged, and algae pellets were stored at -80 °C for proteomic analyses.

To examine growth with raw switchgrass, triplicate 50 mL cultures of A. protothecoides inoculated at around 4 x 105 cells/mL were grown with and without (control) switchgrass (0.2% w/v) at room temperature on a shaker in a 12/12 hr light/dark cycle at ~80 µmol/m-2s-1 light intensity. Cell density was determined daily for all samples with a hemocytometer. All experiments were performed in triplicate and the standard deviations and a significance value

(P-value) of the three biological replicates were determined. After 2 weeks of growth, culture samples were filtered to remove plant debris for metabolite 1D 1H-NMR analysis and fatty acid methyl ester (FAME) analysis for determination of algal lipids. Specific growth rate of flask 16 cultures was calculated with the following equation: μ = ln(N2/N1)/(t2 − t1), where μ is the specific growth rate, and N1 and N2 are the biomass at time 1 (t1) and time 2 (t2), respectively during the exponential growth phase.

For transcriptomic analysis, 50 mL cultures of HS with A. protothecoides at OD750 of 0.03 were grown in triplicate with and without (control) 0.1% w/v CMC (Sigma) in flasks under constant light or constant dark. After one week of growth, cells were pelleted and stored at -80

°C prior to RNA extraction and sequencing.

2.2.2. Metabolite 1D 1H-NMR

Analysis of cellulose and CMC conversion was conducted on an Avance 400 MHz NMR spectrophotometer (Bruker). Samples of HS media with switchgrass or CMC alone, algae alone, and switchgrass or CMC with algae, were incubated in constant light and shaking for one week.

Cells and insoluble biomass were pelleted by centrifugation at 14,000 x g for three minutes and the supernatants were diluted by addition of 20% v/v D2O. Trimethylsilylpropionic acid (Sigma) was used as an internal standard at 1 g/L. Water suppression was conducted using the

Watergate W5 pulse sequence with double gradient echo [119], with a D1 relaxation time of 10 s, 4 dummy scans and 8 sample scans.

2.2.3. FAME analysis

Lipids were determined as FAME by gas chromatography coupled with flame ionization detection (GC/FID) according to Van Wychen et al. [120]. Acid-catalyzed transesterification to quantify the total fatty acids in the biomass was performed by treating 5-10 mg of freeze-dried algae biomass containing 25 µL of 10 mg/mL methyl tridecanoate (C13:0ME) as the internal standard with 200 µL of chloroform:methanol (2:1, v/v) and 300 μL of 0.6 M HCl:methanol. The samples were heated at 85 °C for 1 h and FAMEs were back extracted with 1 mL of hexane.

The hexane extracts were analyzed with an Agilent 7890A Series GC/FID. 2 μL injections at a 10:1 split ratio were loaded onto a DB-WAX column (30 m length x 0.25 mm inner diameter x 0.25 μm film thickness, (Agilent Technologies, Santa Clara, CA)). Helium was used 17 as the carrier gas at a flow rate of 1 mL/min. Initial column temperature was 100 °C. Then, the temperature was ramped to 200 °C at 25 °C/min and held for 1 min and again ramped to 242 °C at 1.5 °C/min and held for 1 min (35 min total). The inlet and the detector temperatures were

250 and 280 °C, respectively. Chromatographic signals were matched to a GLC 461C 30- component FAME standard mix (Nu-Chek Prep, Inc., Elysian, MN), and FAME quantification was performed by internal calibration using C13:0ME as the internal standard.

2.2.4 Glycome profiling

To determine the type of cell-wall glycans in A. protothecoides, glycome profiling was performed as previously described [118]. Briefly, sequential extracts of the Alcohol Insoluble

Residues (AIR) from various plant biomass samples was completed using increasingly harsh reagents to isolate cell wall components on the bases of the relative tightness with which they are integrated onto the cell walls followed by screening with a comprehensive suite of cell wall glycan directed monoclonal antibodies (mAbs) that can monitor glycan epitopes comprised in most major non-cellulosic matrix glycans. The comprehensive suite of mAbs employed in glycome profiling were procured from laboratory stocks (CCRC, JIM and MAC series) at the

Complex Carbohydrate Research Center (available through CarboSource Services; http://www.carbosource.net) or from BioSupplies (Australia) (BG1, LAMP).

2.2.5 Proteomic analysis

Pellets were shipped to Kendrick Labs (Madison, WI) for analysis by 2D gel electrophoresis. Two-dimensional electrophoresis was performed according to the carrier ampholine method of isoelectric focusing [121,122]. Isoelectric focusing was carried out in a glass tube of inner diameter 3.3 mm using pH 3-10 Isodalt Servalytes (Serva, Heidelberg,

Germany) for 20,000 volt-hrs. 100 ng of an IEF internal standard, tropomyosin, was added to each sample. This protein migrates as a doublet with lower polypeptide spot of MW 33,000 and pI 5.2; an arrow on the stained gels marks its position. The tube gel pH gradient plot for this set of Servalytes was determined with a surface pH electrode. After equilibration for 10 min in buffer 18 "O" (10% glycerol, 50mM dithiothreitol, 2.3% SDS and 0.0625 M tris, pH 6.8), each tube gel was sealed to the top of a stacking gel that overlaid a 10% acrylamide slab gel (1.0 mm thick). SDS slab gel electrophoresis was carried out for about 5 hrs at 25 mA/gel. The following proteins

(Sigma Chemical Co., St. Louis, MO and EMD Millipore, Billerica, MA) were used as molecular weight standards: myosin (220,000), phosphorylase A (94,000), catalase (60,000), actin

(43,000), carbonic anhydrase (29,000), and lysozyme (14,000). These standards appear as bands at the basic edge of the silver-stained [123] 10% acrylamide slab gels. The gels were dried between sheets of cellophane paper with the acid edge to the left.

2.2.5.1. Computerized comparisons

Duplicate gels were obtained from each sample and were scanned with a laser densitometer (Model PDSI, Molecular Dynamics Inc, Sunnyvale, CA). The scanner was checked for linearity prior to scanning with a calibrated Neutral Density Filter Set (Melles Griot,

Irvine, CA). The images were analyzed using Progenesis Same Spots software version 4.5,

2011, TotalLab, UK) and Progenesis PG240 software (version 2006, TotalLab, UK).

Computerized analysis for these pairs included image warping followed by spot finding, background subtraction (average on boundary), matching, and quantification in conjunction with detailed manual checking. Spot % was calculated from spot integrated density above background (volume) expressed as a percentage of total density above background of all spots measured. Differential expression was defined as fold-change of the spot percentages. For example, if corresponding protein spots from two different samples (e.g., photoautotrophic versus mixotrophic) have the same spot %, the difference is indicated as 1.0; if the spot % between samples is twice as large as wild type, the difference is 2.0 indicating 2-fold up regulation.

2.2.5.2. MW and pI measurements

The isoelectric point (pI) measurements were approximated based on the pH gradient plot for the batch of ampholines for conditions of 9 M urea and room temperature of 22 °C. The 19 molecular weight and pI values for each spot were determined from algorithms applied to the reference image.

2.2.5.3. Protein digestion and peptide extraction

Spots 211, 263, 391, 466, 68, 433, 572, and 576 were selected for excision and sequencing. Proteins that were separated by SDS-PAGE/2D-PAGE and stained by Coomassie dye were excised, washed and the proteins from the gel were treated according to published protocols [124–126]. Briefly, the gel pieces were washed in high purity, high performance liquid chromatography HPLC grade water, dehydrated and cut into small pieces and destained by incubating in 50 mM ammonium bicarbonate, 50 mM ammonium bicarbonate/50% acetonitrile, and 100% acetonitrile under moderate shaking, followed by drying in a speed-vac concentrator.

The gel bands were then rehydrated with 50 mM ammonium bicarbonate. The procedure was repeated twice. The gel bands were then rehydrated in 50 mM ammonium bicarbonate containing 10 mM DTT and incubated at 56 °C for 45 minutes. The DTT solution was then replaced by 50 mM ammonium bicarbonate containing 100 mM Iodoacetamide for 45 minutes in the dark, with occasional vortexing. The gel pieces were then re-incubated in 50 mM ammonium bicarbonate/50% acetonitrile, and 100% acetonitrile under moderate shaking, followed by drying in speed-vac concentrator. The dry gel pieces were then rehydrated using 50 mM ammonium bicarbonate containing 10 ng/μL trypsin and incubated overnight at 37 °C under low shaking.

The resulting peptides were extracted twice with 5% formic acid/50 mM ammonium bicarbonate/50% acetonitrile and once with 100% acetonitrile under moderate shaking. Peptide mixture was then dried in a speed-vac, solubilized in 20 μL of 0.1% formic acid/2% acetonitrile.

2.2.5.4. NanoLC-MS/MS

The peptides mixture was analyzed by reversed phase nanoliquid chromatography (LC) and MS (LC-MS/MS) using a NanoAcuity UPLC (Micromass/Waters, Milford, MA) coupled to a

Q-TOF Xevo G2 mass spectrometer (Micromass/Waters, Milford, MA), according to published procedures [125,127,128]. Briefly, the peptides were loaded onto a 100 μm x 10 mm 20 NanoAquity BEH130 C18 1.7 μm UPLC column (Waters, Milford, MA) and eluted over a 60- minute gradient of 2-80% organic solvent (ACN containing 0.1% FA) at a flow rate of 400 nL/min. The aqueous solvent was 0.1% FA in HPLC water. The column was coupled to a

Picotip Emitter Silicatip nano-electrospray needle (New Objective, Woburn, MA). MS data acquisition involved survey MS scans and automatic data dependent analysis (DDA) of the top six ions with the highest intensity ions with the charge of 2+, 3+ or 4+ ions. The MS/MS was triggered when the MS signal intensity exceeded 250 counts/second. In survey MS scans, the three most intense peaks were selected for collision-induced dissociation (CID) and fragmented until the total MS/MS ion counts reached 10,000 or for up to 6 seconds each. The entire procedure used was previously described [125,127,128]. Calibration was performed for both precursor and product ions using 1 pmol GluFib (Glu1-Fibrinopeptide B) standard peptide with the sequence EGVNDNEEGFFSAR and the monoisotopic doubly-charged peak with m/z of

785.84.

2.2.5.5. Data processing and protein identification

The raw data were processed using ProteinLynx Global Server (PLGS, version 2.4) software as previously described [127]. The following parameters were used: background subtraction of polynomial order 5 adaptive with a threshold of 30%, two smoothings with a window of three channels in Savitzky-Golay mode and centroid calculation of top 80% of peaks based on a minimum peak width of 4 channels at half height. The resulting pkl files were submitted for database search and protein identification to the in-house Mascot server

(www.matrixscience.com, Matrix Science, London, UK) for database search using the following parameters: databases from NCBI (Plants), parent mass error of 0.5 Da with 1 13C, product ion error of 0.8 Da, enzyme used: trypsin, three missed cleavages, propionamide as cysteine fixed modification and Methionine oxidized as variable modification. To identify the false negative results, we used additional parameters such as different databases or organisms, a narrower error window for the parent mass error (1.2 and then 0.2 Da) and for the product ion error (0.6 21 Da), and up to two missed cleavage sites for trypsin. In addition, the pkl files were also searched against in-house PLGS database version 2.4 (www.waters.com) using searching parameters similar to the ones used for Mascot search. The Mascot and PLGS database search provided a list of proteins for each gel band. To eliminate false positive results, for the proteins identified by either one peptide or a mascot score lower than 25, we verified the MS/MS spectra that led to identification of a protein.

2.2.6. Genomic sequencing, assembly, and annotation

The A. protothecoides UTEX25 genome was sequenced using a combination of Illumina

HiSeq 2000 [129] and 454 sequencing technologies [130]. A 1× 100 base-pair Illumina shotgun library was prepared using standard TruSeq protocols and sequenced from bulk A. protothecoides genomic DNA on an Illumina GAII sequencer to generate 2 billion reads.

Additional shotgun single-end and paired-end (11-kb insert) DNA libraries were prepared for sequencing on the 454 Titanium platform, which resulted in the generation of 1.16 and 1.15 million reads, respectively. The 454 single-end data and the 454 paired-end data were assembled together using Newbler (version 2.3, release 091027_1459). The Illumina-generated sequences were assembled separately with (version 1.0.13) [131]. The resulting consensus sequences from both the VELVET and Newbler assemblies were computationally shredded into 10-kb fragments and re-assembled with reads from the 454 paired-end library using parallel phrap (version 1.080812, High-Performance Software, LLC) [132]. The mitochondrial genome was identified in this final hybrid assembly using homologous blast searches against Chlorella variabilis and Chlorella vulgaris mitochondrial genomes.

Misassemblies were corrected using Dupfinisher [133], and repeat resolution was performed in

Consed to generate the final consensus sequences [134].

The resulting assembly was annotated using MAKER version 2.32.8_5.24 [135]. De novo assembled transcripts from all functional studies (see Transcriptome Analysis), and protein

22 Table 2.1. General characteristics of the Auxenochlorella protothecoides genome.

Feature Total sequence length (bp) 21217637 N50 scaffold length (bp) 639896 Longest scaffold (bp) 2059987 Number of contigs 1892 Number of scaffolds 217 % of assembly in scaffolded contigs 98.9 Number of genes 6039 % of genome covered by genes 77 mean exons per mRNA 7 mean per mRNA 6 % of genes annotated 74.3 mitochondria genome size (bp) xx genome size (bp) 84579

homology evidence from a closely related strain of A. protothecoides (Genbank Accession

#GCF_000733215.1) were used to train and constrain structural gene models. All predicted genes were functionally annotated using InterProScan (version 5.21-60.0) by conducting homology searches against proteins in IntroPro, Uniprot and Swiss-Prot database using Blastp

(version NCBI 2.2.28+). The mitochondrial genome was previously deposited under accession number KC631634.1. The general characteristics of the assembly and annotation are described in Table 2.1.

From the genome sequence of A. protothecoides UTEX 25 and Auxenochlorella sp.

0710, thirty potential proteins with homology to the glycosyl hydrolase (GH) family of proteins were identified [136]. The identified proteins were scanned for conserved domains using NCBI’s

23 conserved domain database [137] to identify the GH family and were analyzed using TargetP

1.1 to predict subcellular localization [138]. The glycosyl hydrolases of families 5 and 9 were each used to search the characterized members of these families available on the

Carbohydrate-Active enZYmes Database (CAZY; www.cazy.org), to identify and verify the specific reactions catalyzed. Using PhyML 3.2 with the Le & Gascuel substitution model and

100 bootstraps, we estimated the maximum likelihood phylogenies of these GH5 and GH9 sequences from the predicted protein sequences [139].

2.2.7 Transcriptomic sequencing and analysis

As described above, cultures were grown in triplicate with and without 0.1% CMC in HS media in flasks in both constant light and constant dark. In all cases, after five days, total RNA was extracted from 100 mg of cells within the algae pellet using the Direct-zol RNA-miniprep kit

(ZYMO, P/N 2051) according to the manufacturer’s instructions. Each total RNA sample (tRNA) was enriched for mRNA by hybridizing the poly(A) tail to oligo d(T)25 probes covalently coupled to magnetic beads, followed by elution (NEB, P/N S1419S). The enriched mRNA fractions were prepared for Illumina sequencing using the ScriptSeq V.2 RNA-seq Library Preparation Kit

(Epicentre, P/N SSV21106). The resulting sequencing library for each sample was denatured to a final concentration of 1.5 pM before being loaded onto the NextSeq Reagent Cartridge for a paired-end sequencing on a Nextseq 500 (2 X 150bp), multiplexed at 6 samples per lane.

The recovered forward and reverse 2 X 151 bp Nextseq reads were trimmed for quality using FastQ Quality Control Software [140]. All reads with a minimum sliding window q-score of

15, and a minimum trimmed read length of > 29 base pairs were kept for mapping and differential expression analysis. Quality filtered reads were mapped with Bowtie2 using the default settings, per the manual. Abundance estimation (RPKM; fragments per kilobase of exon per million mapped reads) was calculated using RSEM. Differential expression of transcripts was determined with EdgeR. Transcripts that were differentially expressed >2 fold with a p- value < 0.001 were considered statistically significant. 24 2.3. Results and discussion

2.3.1. Degradation of cellulose and plant substrates

To examine the degradation of cellulose by A. protothecoides UTEX 25, we grew A. protothecoides on CMC-containing agar plates kept constant dark (Figure 2.1). Based on the clearing of Congo Red staining of the cellulose in the plate kept in the dark, A. protothecoides appears not only to actively degrade cellulose, but to do so more than C. reinhardtii (Figure 2.1).

The diffuse pattern of degradation on the plate suggests that the cellulolytic enzyme is secreted from the organism. To explore the impact of plant substrate degradation by A. protothecoides, we examined the growth of the organism on raw switchgrass. By measuring cell counts over time, we observed a >40% increase in growth rate over days 1-5 in the presence of switchgrass

(μ = 1.21 vs. 0.85 day-1), resulting in an average of >140% increase in cell count at stationary phase (days 9-14). (Figure 2.2; Figure 2.3).

To determine the impact of mixotrophic growth with switchgrass on lipid accumulation, we examined the total algal lipids by FAME analysis. A. protothecoides grown with switchgrass show higher lipid accumulation compared to the cultures grown without (9% vs. 7%; Figure

2.3A). Relatively lower lipid contents observed for A. protothecoides in the present study could possibly due to growing the cultures with no exogenous supply. Increasing lipid concentration by supplementation of organic carbon in A. protothecoides and other Chlorella sp. in either heterotrophic or mixotrophic culturing has been observed previously [136], but this study is the first we are aware of that utilizes raw plant matter as the carbon source. Fatty acid profiles for the cultures grown with and without switchgrass were compared to examine the impact of the carbon source on the fatty acid distribution (Figure 2.4B). Biomass generated from the cultures grown with switchgrass showed slightly higher levels of C14:0, C16:0, C18:2n6 and

C18:3n3 fatty acids compared to the cultures grown without switchgrass. Additionally, C20:0 and C20:1 are only observed in the biomass generated from the cultures grown in the presence of switchgrass. 25

Figure 2.1. Algae utilization of carboxymethyl cellulose (CMC) in the absence of light. Reduction of the red stain indicates CMC consumption. Prestained and stained plates of (A) A. protothecoides and (B) Chlamydomonas reinhardtii are shown.

Figure 2.2. Comparison of A. protothecoides growth with and without augmentation with raw switchgrass (0.2% w/v). Cultures were grown with a 12:12 light:dark cycle at ~80 µmol/m-2s-1 light with constant shaking at room temperature for two weeks. All experiments were performed in triplicate and the error bars represent the standard deviations of the three biological replicates. Significant differences in cell density are marked with asterisks (* for p < 0.05 and ** for p < 0.01). Triplicate cultures at 14 days are shown in the inset.

26

3.0 2.5 Algae + Switchgrass 2.0 Algae alone 1.5 1.0 0.5

0.0 Doublings/Day -0.5 -1.0 -1.5 0 2 4 6 8 10 12 14 Days since Inoculation Figure 2.3. The doublings per day were calculated for each sample and the average for each set (Algae + Switchgrass and Algae alone) is shown.

Figure 2.4. Lipid contents (A) and fatty acid profiles (B) by GC/FID for A. protothecoides biomass grown with and without switchgrass. Data shown as the mean of the triplicate growth experiments and the error bars represent the standard deviations of the three biological replicates.

To observe whether plant substrate structure was modified by A. protothecoides, we performed metabolite 1D 1H-NMR on cultures with substrate (switchgrass or CMC) alone, with algae alone, and with both substrate and algae. By examining metabolite 1D 1H-NMR, we observe a diminishing in peaks associated with alkane C-H bonds (0.8-1.5 ppm) and methylene 27 A

B

Figure 2.5. Metabolite 1D 1H-NMR analysis of A. protothecoides cultures. HS culture medium was amended with (A) switchgrass or (B) CMC in the presence and absence of algae. Representative spectra from each treatment are shown.

28 protons in cellulose (3.5-4.0 ppm) [141], when comparing cultures with switchgrass alone to cultures where algae and switchgrass are grown together (Figure 2.5A). We see these same peaks disappearing with cultures grown with CMC with and without algae (Figure 2.5B), indicating a similar mechanism of carbon utilization with both substrates.

With each substrate, the signal attributed to alkane C-H bonds from algae cultivated with substrate is less than or equal to the signals from either substrate or algae alone, when traces are roughly normalized. This allows the possibility that the substrate alkanes are completely or almost completely incorporated into the algae, as we would otherwise expect a signal intensity between the two (algae alone and substrate alone). We also note two peaks, at 1.27 and 2.22 ppm, which are present in algae alone, but that disappear when the algae are cultured with either substrate. These phenomena are more difficult to explain, but may be the result of a chemical produced by the cells only in the absence of substrate. Alternatively, this chemical may be altered or degraded by interaction with our substrates.

To assess the range of non-cellulosic plant cell wall glycans (including most major hemicelluloses and pectic-polysaccharides) that are utilized or modified by A. protothecoides, we profiled its glycome (using suite of monoclonal antibodies raised against different types of cell wall glycans) in cultures amended with a panel of 12 untreated or pretreated plant biomass substrates. After active culturing, the abundance of specific cell wall glycan epitopes in the cell wall extracts varied as a result of presence of specific plant biomass substrates (Figure 2.6; light blue arrows in yellow dotted blocks).

Algal growth specifically altered the abundance of epitopes contained in hemicellulose like non-fucosylated and fucosylated xyloglucan, in substituted and unsubstituted xylan, pectic backbone including homogalacturonan and rhamnogalacturonan backbones, and in pectic arabinogalactan. In general, the depletion of these epitopes was noted in a majority of the biomass tested. Further, non-cellulosic glycan epitopes increased in extracts with algae indicting

29

Figure 2.6. Plant cell wall glycan-directed monoclonal antibodies detection (Glycome profiling) of most major non-cellulosic carbohydrates in plant cellulosic substrates incubated in media with or without A. protothecoides for 2 weeks. Regions of evidence for carbohydrate utilization/modification by algal growth are indicated by light blue arrows and xylan region is shown in yellow dotted block.

that A. protothecoides induced modifications in the plant cell wall structure, causing changes in the extractability of epitopes of xyloglucans, pectic polysaccharides, glucomannan and β-glucan.

To further examine which specific glycans are utilized, we grew A. protothecoides on specific plant carbohydrate extracted from switchgrass (enriched in hemicelluloses, such as xylans and xyloglucans). A. protothecoides was able to utilize the extracts as a carbon source.

When we repeated glycome profiling that included hemicellulose-specific antibodies, we found that the predominant components utilized by the algae cells were xyloglucans (Figure 2.6), suggesting that A. protothecoides actively expresses one or more xyloglucanases, enzymes that facilitates the hydrolysis of cellulose in wood. This finding is consistent with the fact that this organism was first isolated from woody material [142]. Unexpectedly and outside the scope of

30 this study, glycome profiling also revealed the presence of pectic arabinogalactans in the A. protothecoides cell wall, a valuable thickening agent for food [143] (Figure 2.7).

Figure 2.7. Cell wall glycan-directed mAb screening of cell wall extracts in culture media incubated without (1-5) and with (1C-5C) algae. 4nM KOH extracts were isolated from raw switchgrass (1), ionic liquid pre-treated switchgrass (2), 25% H2O2-washed switchgrass (3), raw eucalyptus (4), and ionic-liquid pre-treated eucalyptus (5) to supplement with media for incubating with and without A. protothecoides. Lighted bars represent presence of sugar. Green-highlighted boxes show levels of xyloglucans are vastly reduced when algae is present (+). Blue-highlighted box shows algae produces a valuable coproduct arabinogalactan. A representative sample from each treatment is shown.

2.3.2. Genome analysis

To determine the genetic basis of the observed xyloglucanase activity, we sequenced, assembled and annotated the genome of A. protothecoides UTEX25. Homology searches with publicly available xyloglucanase protein and transcript sequences identified at least 30 potential proteins in A. protothecoides UTEX25 with homology to the glycosyl hydrolase (GH) family of proteins (Table 2.1).

All of these proteins are also conserved in the genome of a closely related strain of A. protothecoides [136]. Proteins involved in ER lipid and glycoprotein shuttling and synthesis

31 (glycosyl families 16, 31, and 32) were identified along with three GH family 47 homologs that potentially function as alpha-mannosidases. Other GH or GH-like proteins identified include proteins homologous to GH family 20 protein (a potential hexosamidase), GH family 38 (a potential alpha-mannosidase), and GH family 63, potentially involved in the transport of phosphorylated mannosylglycerate within the cell, with a C-terminal adenylosuccinate lyase.

Furthermore, a GH 18 family homolog was identified that contains a D-X-X-D-X-D-X-Q motif.

Whereas chitinase members of the GH18 family contain a D-X-X-D-X-D-X-E motif, the Glu to

Gln substitution is indicative of xylanase inhibitors [144]. Two of these potential glycosyl hydrolases include sequences with high homology (E-value < 10-30) to peptidases, likely involved in other cellular processes.

The largest fraction of GH and GH-like protein homologs identified in A. protothecoides are most similar to the family 5 (GH5) and family 9 (GH9) type of glycosyl hydrolases. Two of the GH5 family of proteins contain a PAN/APPLE-like domain, which are functionally versatile mediators of protein-protein and protein-carbohydrate interactions [145]. Finally, one protein

(APUTEX2500005777) includes a CESA-CelA-like domain, of the glucosyl transferase family, usually involved in the elongation of the glucan chain of cellulose. Maximum likelihood phylogenetic trees of these two families with the characterized enzymes from the nearest neighbor are all of only two enzyme commission (EC) numbers (Figure 2.7).

For GH5, all characterized homologs were EC 3.2.1.78, indicating mannan endo-1,4- beta-mannosidase activity. The GH 9 homologs were all EC 3.2.1.4, indicating endocellulase activity. The characterized members of the GH 5 and 9 families have over 20 different catalytic functions, including xylanase activities (EC 3.2.1.8, 3.2.1.37, 3.2.1.151), which were notably absent from the nearest homologs.

The sequences of the predicted GH5 and GH9 proteins were further analyzed for the presence of active sites. In family 5, each sequence was found to have the conserved Asn in the active site. More broadly, the complete PROSITE motif ([LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]- 32 {SENR}-N-E-[PV]-[RHDNSTLIVFY]) was also present in all members, when allowing for two mismatches, with five having only one incorrect amino acid and member completely conserved

(Figure 2.9). GH family 9 has two active site signatures ([STV]-x-[LIVMFY]-[STV]-x(2)-G-x-

[NKR]-x(4)-[PLIVM]-H-x-R and [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA]), with three residues annotated as active sites. The first signature was not completely conserved in any of the members (and completely absent from one), but the active His residue is present in four of the seven members. The second signature is found in all but one member, with complete conservation in four of the seven, including the active Asp and Glu residues. Notably, coding sequences in the UTEX25 gene models for most of the glycosyl hydrolases differed slightly from the sequences catalogued in NCBI, with additional exons or introns.

2.3.3. Proteomics and transcriptomics

To further characterize the enzymes involved in cellulolytic degradation and to explore the molecular mechanisms of plant substrate utilization, we analyzed the proteome and transcriptome of A. protothecoides UTEX 25 cultures grown in the presence or absence of either raw or H2O2-treated switchgrass. With respect to the proteome (2D gel analysis; Figure

2.10), several proteins were upregulated in the presence of plant substrates compared to algae grown photoautotrophically (Table 2.2). Eight of the proteins with the highest degree of differential expression were selected for identification by LC-MS/MS (Table 2.4).

The most differentially expressed algal protein in the presence of plant substrate (spot

211) is predicted to have an NAD-binding domain and may be a potential oxidoreductase. The remaining seven spots contained peptides from the 50S ribosomal protein L9, a putative RNA- binding protein, putative plastid-lipid-associated chloroplastic protein 4, 60S ribosomal protein

L6, protein YOP1, chlorophyll a-b binding protein 3C, ATP-dependent Clp protease proteolytic subunit-related protein 4, 40S ribosomal protein S8, 60S ribosomal protein L19-2, chlorophyll a- b binding protein of LHCII type I, ribulose bisphosphate carboxylase small chain 7, fructokinase-

1, photosystem II CP43 chlorophyll apoprotein, and photosystem I reaction center subunit II. 33 Table 2.2. A catalog of the potential GH-encoding genes from A. protothecoides sp. 0710 and their UTEX25 homologs. Included are the glycosyl hydrolase families identified by the Conserved Domain Database and the respective E-Value. Also included are predicted subcellular localizations and reliability scores (1 is high confidence, 5 is low), generated by TargetP. From the transcriptome analysis, the fold-change (FC) of each transcript is given during the dark and light periods in the presence of CMC, relative to the unsupplemented culture, as well as p values. Significant differences (p < 0.05) are highlighted in bold.

Transcriptome Analysis Gene ID CDD GH Hit CMC Dark CMC Light UTEX 25 sp. 0710 Family E-Value FC p-value FC p-value APUTEX2500001730 KFM23057.1 GH2 0.0E+000 1.0 0.3412 0.9 0.1265 APUTEX2500000235 KFM29315.1 GH5 1.9E-020 0.5 0.0189 1.1 0.3191 APUTEX2500000742 KFM23002.1 GH5 7.1E-020 1.3 0.2214 1.3 0.0248 APUTEX2500000844 KFM25716.1 GH5 1.3E-022 1.3 0.0531 1.2 0.0766 APUTEX2500000985 KFM28016.1 GH5 3.5E-017 0.2 0.0720 1.2 0.3396 APUTEX2500002156 KFM22632.1 GH5 4.0E-025 1.0 0.3401 1.1 0.1168 APUTEX2500002671 KFM29130.1 GH5 2.7E-021 1.2 0.2156 1.1 0.3292 APUTEX2500003890 KFM28332.1 GH5 7.4E-013 0.9 0.3470 1.0 0.3405 APUTEX2500004827 KFM23138.1 GH5 8.2E-021 1.6 0.0707 0.9 0.3385 APUTEX2500000468 KFM25061.1 GH9 2.9E-084 1.1 0.2258 1.0 0.3130 APUTEX2500000981 KFM28012.1 GH9 3.4E-050 1.4 0.0384 0.9 0.2630 APUTEX2500003119 KFM24581.1 GH9 4.6E-098 1.0 0.3535 1.2 0.0761 APUTEX2500005369-A KFM24183.1 GH9 1.1E-113 1.8 0.0195 1.3 0.1806 APUTEX2500005369-B KFM24184.1 GH9 9.2E-104 0.9 0.3152 0.9 0.3074 APUTEX2500005369-C KFM24182.1 GH9 2.8E-100 1.4 0.0384 0.9 0.1853 APUTEX2500005777 KFM24064.1 GH9 3.6E-067 1.4 0.0125 1.1 0.1316 APUTEX2500000143 KFM24770.1 GH16 3.0E-065 1.2 0.2949 1.2 0.2977 APUTEX2500002026 KFM25245.1 GH18 6.8E-009 1.5 0.0084 0.9 0.2833 APUTEX2500002706 KFM22671.1 GH20 2.9E-104 1.5 0.0643 1.0 0.3535 APUTEX2500005097 KFM28226.1 GH27 8.8E-128 0.7 0.0905 1.2 0.1310 APUTEX2500001996 KFM25289.1 GH31 0.0E+000 0.7 0.0547 1.0 0.3534 APUTEX2500003474 KFM25547.1 GH31 0.0E+000 0.5 0.0197 1.0 0.3450 APUTEX2500000041 KFM23987.1 GH32 1.7E-067 1.6 0.0222 1.1 0.2441 APUTEX2500000101 KFM28270.1 GH38 6.6E-065 0.9 0.3005 1.0 0.2626 APUTEX2500005781 KFM24068.1 GH43 3.9E-015 1.2 0.2908 0.7 0.1250 APUTEX2500000370 KFM24944.1 GH43 2.3E-005 1.2 0.2460 1.0 0.3494 APUTEX2500000577 KFM26117.1 GH47 3.7E-172 0.9 0.2522 1.2 0.1479 APUTEX2500002068 KFM28922.1 GH47 6.1E-111 1.1 0.3259 1.0 0.3458 APUTEX2500004355 KFM22865.1 GH47 0.0E+000 1.2 0.2791 1.1 0.1297 APUTEX2500001195 KFM26337.1 GH63 0.0E+000 1.4 0.0281 1.1 0.1980

34 A

2.0 B

0.5 Figure 2.8. Phylogenetic analysis of (A) Glycosyl Hydrolase Family 5 and (B) Glycosyl Hydrolase Family 9 proteins and their nearest characterized neighbors.

A

B

Figure 2.9. Alignments of the GH5 (A) and GH9 (B) family proteins and their nearest characterized neighbors. Two PROSITE motifs for presumed active site are available for family 5, and one for family 9.

35 A B

C D

Figure 2.10. Proteome of A. protothecoides in the presence and absence of plant substrates. 2D Gel images of protein extracts from (A) Algae Alone, (B) Plant Alone, and (C) Algae + Plant. Spots present on the silver-stained gels of sample Plant Alone were removed from the image and total spot counts. (D) Overlay image shows sample Algae + Plant in green overlaying Algae Alone in magenta. Image appears white where spots of similar intensity overlay each other.

36 Table 2.3. Quantitative summary of differentially expressed proteins on duplicate 2D gels. 2D gel comparisons of 1C Algae + Plant and 2C Algae + H2O2 Treated Plant versus Algae Alone. Reference spot numbering, pI, and MW are given for changing polypeptide spots analyzed in samples 1C Algae + Raw Plant (gels LF1093 #5-6), 2C Algae + H2O2 Treated Plant (gels LF1093 #8 and LF1094 #1), and HSC Algae Alone (gels LF1093 #3-4). Also shown are the fold increases or decreases (difference) of the polypeptides for 1C Algae + Raw Plant and 2C Algae + H2O2 Treated Plant vs. HSC Algae Alone. The differences are calculated from average spot percentages from duplicate 2D gels (individual spot density divided by total density of all measured spots). Polypeptide spots increased by a fold increase of ≥ 3.0 or a fold increase of ≥ 1.7 and a p-value ≤ 0.05 are highlighted in blue in the difference column, while spots decreased with a fold decrease of ≤ -3.0 or a fold decrease of ≤ 1.7 and a p-value ≤ 0.05 are highlighted in red. NC = difference not confirmed. Polypeptide spots with a p-value of ≤ 0.05 are highlighted in green in the T-test column. Spot percentages are given to indicate relative abundance. A total of 519 spots were analyzed. Highlighted spots were selected for identification by spot analysis.

Algae + Algae + Algae + Algae + Raw Plant H O Plant Algae Raw H O vs Algae vs Algae Spot Alone Plant Plant Alone ₂Alone₂ # pI MW Ave. % Ave. % Ave.₂ %₂ Fold p Fold p 211 7.5 41,729 0.087 0.048 0.014 6.4 0.020 3.5 0.548 259 4.8 38,224 0.024 0.007 0.011 2.1 0.030 -1.5 0.162 263 5.3 37,867 0.053 0.039 0.032 1.7 0.031 1.2 0.116 367 7.9 31,260 0.389 0.565 0.093 NC 0.368 6.1 0.349 391 6.5 28,763 0.008 0.006 0.005 1.7 0.018 1.3 0.499 461 7.3 24,060 0.018 0.034 0.004 4.2 0.102 NC 0.412 466 7.5 23,833 0.034 0.020 0.007 5.1 0.037 3.1 0.230 638 4.4 12,461 0.029 0.018 0.005 5.9 0.439 3.6 0.257 639 4.4 12,420 0.018 0.019 0.005 3.6 0.244 3.8 0.033 110 6.3 71,361 0.007 0.008 0.015 -2.1 0.009 -1.8 0.173 169 5.6 50,059 0.008 0.015 0.016 -2.0 0.018 -1.1 0.786 179 6.9 46,718 0.005 0.013 0.018 -3.2 0.107 -1.4 0.568 216 6.5 40,750 0.008 0.015 0.018 -2.3 0.021 -1.2 0.612 251 7.3 39,230 0.021 0.028 0.044 -2.1 0.025 -1.6 0.069 293 6.2 36,608 0.022 0.031 0.039 -1.8 0.041 -1.3 0.309 310 6.2 35,227 0.070 0.148 0.130 -1.9 0.050 1.1 0.716 313 6.3 35,359 0.054 0.086 0.129 -2.4 0.004 -1.5 0.034 314 6.4 35,096 0.032 0.032 0.090 -2.8 0.041 -2.8 0.081 328 6.9 34,340 0.019 0.021 0.038 -2.0 0.014 -1.8 0.142 386 6.1 29,540 0.033 0.048 0.136 -4.1 0.275 -2.9 0.220 397 6.3 27,953 0.054 0.082 0.102 -1.9 0.001 -1.2 0.015 583 7.4 15,107 0.075 0.188 0.167 -2.2 0.016 1.1 0.702 68 5.2 86,138 0.004 0.004 0.002 2.1 0.174 2.0 0.049 395 7.9 28,052 0.052 0.240 0.069 -1.3 0.108 3.5 0.303 433 6.4 26,103 0.488 0.479 0.225 2.2 0.346 2.1 0.005 522 7.9 20,300 0.035 0.120 0.040 -1.2 0.616 3.0 0.116 526 5.8 18,799 0.013 0.044 0.015 -1.1 0.625 3.0 0.392

37 Table 2.3. Continued

554 7.2 17,200 0.016 0.035 0.021 -1.3 0.044 1.7 0.020 572 5.0 16,102 0.055 0.119 0.054 1.0 0.963 2.2 0.033 576 7.3 16,015 0.009 0.025 0.005 1.8 0.425 4.7 0.159 39 5.5 112,999 0.034 0.021 0.045 -1.3 0.158 -2.1 0.049 74 7.7 82,072 0.024 0.014 0.049 -2.1 0.378 -3.5 0.186 228 6.4 40,520 0.017 0.007 0.025 -1.5 0.041 -3.5 0.040 268 4.8 37,407 0.013 0.003 0.005 2.4 0.123 -1.7 0.008 375 5.0 30,047 0.012 0.006 0.013 -1.1 0.887 -2.0 0.018 377 6.7 30,198 0.054 0.017 0.130 -2.4 0.277 -7.5 0.156

Interestingly, spot 68 had homology to a hypothetical protein from Gossypium raimondii, a species of cotton plant endemic to northern Peru. However, the Mascot score for this protein was very low (29) and is based upon a single peptide and could be a false identification.

Transcriptomic analysis of A. protothecoides grown with and without CMC in either constant light or constant dark revealed several potential proteins targets for cellulolytic activity

(Table 2.2; Table 2.4). One of the most significantly differentially expressed transcripts in A. protothecoides grown in the presence of CMC in the dark was identified as a potential sugar transporter (XM_011403249.1), upregulated approximately 34X compared to phototrophically grown algae. Not unexpectedly, other identified transcripts upregulated in the presence of CMC include genes involved in and carbon storage. Consistent with the proteomics results, an increase in transcripts for a50S ribosomal protein subunit (XM_011402927.1) and

Fructokinase-1 (XM_011401440.1) were observed. Notably, the potential oxidoreductase, Spot

211, discovered by proteomics was not significantly (within a p-value < 0.05) upregulated in the presence of CMC (XM_011398701.1). A member of the glycosyl hydrolase family 63

(APUTEX2500001195), annotated as a mannosylglycerate transporter (XM_011400931.1), had higher levels of in the presence of CMC in the dark. Most interesting, we observed the significant upregulation of three potential glycosyl hydrolases (APUTEX2500005369A,

APUTEX2500005777, and APUTEX2500004827) transcripts from the GH5 and GH9 families

(previously known as cellulase families A and E 38 Table 2.4. The differentially expressed spots selected for proteomic analysis. Fold changes are calculated from average spot percentages (individual spot density divided by total density of all measured spots for duplicate 2D gels).

Algae Algae Algae + Raw Algae + H O Algae + + Plant vs Plant vs Alone Raw H O Algae Alone Algae Alone₂ ₂ MW Plant Plant Mascot Spot Ave. Ave. ₂ ₂ No. pI (g) % % Ave. % Fold p Fold p Protein Identified Score 211 7.5 41,729 0.014 0.087 0.048 6.40 0.020 3.51 0.548 hypothetical protein F751_1382 233 Ribulose bisphosphate carboxylase/oxygenase activase 167 263 5.3 37,867 0.032 0.053 0.039 1.69 0.031 1.22 0.116 Fructokinase-1 177 photosystem II CP43 chlorophyll apoprotein 93 Chlorophyll a-b binding protein of LHCII type I 76 391 6.5 28,763 0.005 0.008 0.006 1.70 0.018 1.34 0.499 putative RNA-binding protein 88 putative RNA-binding protein 53 466 7.5 23,833 0.007 0.034 0.020 5.12 0.037 3.07 0.230 putative plastid-lipid-associated protein 4 98 68 5.2 86,138 0.002 0.004 0.004 2.06 0.174 2.03 0.049 hypothetical protein B456_007G254600 29 433 6.4 26,103 0.225 0.488 0.479 2.17 0.346 2.13 0.005 Chlorophyll a-b binding protein of LHCII type I 455 60S ribosomal protein L6 226 Protein YOP1 159 Chlorophyll a-b binding protein 3C 104 ATP-dependent Clp protease proteolytic subunit-related

protein 4 82 40S ribosomal protein S8 63 60S ribosomal protein L19-2 59 572 5.0 16,102 0.054 0.055 0.119 1.02 0.963 2.21 0.033 Chlorophyll a-b binding protein of LHCII type I 306 Ribulose bisphosphate carboxylase small chain 7 76 576 7.3 16,015 0.005 0.009 0.025 1.81 0.425 4.67 0.159 Photosystem I reaction center subunit II 116 50S ribosomal protein L9 95

39 [146]), in cultures grown in the dark with CMC. One of these family 9 glycosyl hydrolases

(APUTEX2500005777) encodes the CESA cellulose synthase domain N-terminal to the glycosyl hydrolase. One of the family 5 glucosyl hydrolases (APUTEX2500004827) is one of the two glycosyl hydrolases identified earlier with a PAN/APPLE domain C-terminal to the glycosyl hydrolase. Six other potential glycosyl hydrolases sequences (APUTEX2500000370,

APUTEX2500000844, APUTEX2500000742, APUTEX2500002671, APUTEX2500005369C, and APUTEX2500000981) with homology to proteins from families GT64, GH5, and GH9 were slightly induced in the presence of cellulose albeit not significantly. We also observed a significant induction of a transcript encoding the putative xylanase inhibitor

(APUTEX2500002026), in the presence of CMC when grown in the dark. We hypothesize that this protein may serve a protective role, inhibiting hydrolytic activity within the cell.

2.4. Conclusions

We discovered that the algae A. protothecoides UTEX25 is capable of directly degrading and utilizing non-food plant substrates, such as switchgrass, for cell growth. A. protothecoides is also known to be a high-lipid producing green microalgae, capable of generating more than half of its cell mass as lipids [113]. Although A. protothecoides was known to utilize sugars, we have demonstrated that growth can be enhanced in the presence of raw plant biomass under mixotrophic conditions and that enzymes with cellulolytic activity are actively secreted. Notably,

A. protothecoides is substantially more effective at degrading cellulose than the only other known cellulose-degrading algae (Figure 1) [116]. We also discovered that A. protothecoides has a faster growth rate on plant substrates compared to media without an organic C source

(Figure 2) and produces more lipids (Figure 3). By examining both metabolite 1D 1H-NMR and profiling the plant glycome, we observed that the structure of plant substrate is modified by the algae. In NMR, a diminishing in peaks associated with alkane C-H bonds (0.8-1.5 ppm) and methylene protons in cellulose (3.5-4.0 ppm) [141], when comparing cultures with plant substrate alone to cultures where algae and substrate are grown together (Figure 4) is 40 observed. Glycome profiling revealed that the predominant components utilized by the algae were xyloglucans (Figure 6), suggesting that A. protothecoides actively expresses one or more xyloglucanases. Genomic, transcriptomic, and proteomic analyses identified many members of the glycosyl hydrolase family, including 6 potential glycosyl hydrolases from families 5 and 9 are encoded in the genome and were significantly upregulated in the presence of plant substrate

(Table 1). This research identifies potential gene targets for future genetic engineering of plant substrate degradation and calls attention to the potential benefits of growing algae with plant substrate at larger scales to generate greater algae biomass and lipid production.

2.5. Acknowledgements

This work was supported by a grant to A.N.B. from the Laboratory Directed Research and Development Early Career Research Program at Los Alamos National Laboratory

(20170533ECR) and funds provided to S.R.S. by the Department of Energy’s Bioenergy

Technologies Office, under contract DE-EE0003046 to the National Alliance for Advanced

Biofuels and Bioproducts. Special thanks to Geoffrey Smith, Olga Chertkov, Cheryl Gleasner,

Kim McMurry, Oliver Oviedo, and Xiang Li Zhang for technical assistance.

A.N.B. designed and contributed to all experiments, B.W.V. contributed to glycosyl hydrolase enzyme sequence and structural analyses, S.R.S completed all genomic and transcriptomic sequencing, N.S. contributed to cell growth and FAME data collection, J.Y.S. contributed to plate experiments, J.A.R. contributed to NMR data collection, and S.P. contributed to glycome data collection. All authors contributed to data analysis and to the writing of the article.

41

CHAPTER 3

CHARACTERIZATION OF THE NANNOCHLOROPSIS GADITANA STORAGE

CARBOHYDRATE: A 1,3-BETA GLUCAN WITH LIMITED 1,6-BRANCHING

Modified from a paper to be submitted to Algal Research1

Brian W. Vogler2, 3, Jacob Brannum4, Jeffrey W. Chung5, Mark Seger3, Matthew C. Posewitz3

As a result of a secondary endosymbiosis of a red alga, stramenopiles retained β-1,3 glucans as the primary soluble storage carbohydrate, in contrast to the starch and glycogen of green plants and animals. This storage carbohydrate has been identified and characterized in representative diatoms, brown algae, and oomycetes, but has not been biochemically characterized in species of Nannochloropsis, an industrially relevant genus of algae favored for its lipid content and productivity. In this study, the soluble storage carbohydrate of

Nannochloropsis is characterized by 1H-NMR, linkage analysis, and size exclusion chromatography. The putative genes encoding enzymes required for synthesis of the carbohydrate oligosaccharide from glucose 6-phosphate are identified. A glycogenin-like glycosyltransferase family 8 (GT8) protein was also identified to be conserved among all species of Nannochloropsis, despite the lack of glycogen in the genus. Homologs were identified in available genomes of brown algae, diatoms, and oomycetes, all documented to utilize β-1,3 glucans for carbon storage. Finally, a set of likely laminarinases are highlighted from the glycosyl hydrolases by phylogenetic analyses.

______

1 Reproduced with permission of the coauthors. 2 This author wrote and edited all text, prepared all samples, and performed final analysis. 3 Department of Chemistry, Colorado School of Mines. 4 Department of Physics, Colorado School of Mines. 5 Department of Immunology and Microbiology, University of Colorado School of Medicine, Denver, CO, USA, formerly Department of Chemistry, Colorado School of Mines.

42 3.1. Introduction

In response to stress, many algae accumulate chemical energy stores in the form of lipids and/or carbohydrates that can used to maintain cellular viability until growth conditions that support cell proliferation are restored. Nannochloropsis is a genus of algae of biotechnological interest due to its relative stability in outdoor cultivation, and significant lipid production; especially in response to the stress of nitrogen-starvation. Species of

Nannochloropsis have been shown to accrue lipid to greater than 60% of their dry cell weight

[41]. Although much of the stored energy in species of Nannochloropsis is in the form of triacylglyceride (TAG), Nannochloropsis has also been demonstrated to accumulate a soluble storage carbohydrate during the day of a diel light-cycle and consuming this carbohydrate throughout the night [147].

While most organisms store carbon in α-1,4-glucans, stramenopiles typically synthesize

β-1,3-glucans. In brown algae, this polysaccharide is called laminarin, with occasional 1-linked

D-mannitol at non-reducing ends, and has short β-1,6-linked branches. Diatoms produce a polysaccharide called chrysolaminarin which is structurally similar to laminarin but lacks mannitol. The mycolaminarin of oomycetes is closest to the chrysolaminarin of diatoms.

Nannochloropsis is a genus within eustigmatophyceae, another class of stramenopiles. Recent work suggests that Nannochloropsis synthesizes laminarin as the storage glucan based on the presence of polymeric mannitol in the intracellular neutral sugars of Nannochloropsis oceanica

IMET1 as detected after ion chromatography [148], but the storage glucan has not yet been confirmed.

Although the soluble carbohydrates in species of Nannochloropsis have frequently been described as laminarin or chrysolaminarin, little biochemical evidence supporting this assignment or characterizing the nature of the soluble carbohydrate in species of

Nannochloropsis has been reported [112]. Here we use a variety of analytical techniques (1H-

NMR, glycosyl composition, glycosyl linkage analysis, and size exclusion chromatography) to 43 characterize the most abundant soluble carbohydrates oligomers in Nannochloropsis gaditana

CCMP526.

An understanding of the carbohydrate structure and the enzymes involved in its synthesis and consumption will help us to understand energy management and physiological culturing phenotypes, especially during diel growth, in this phototroph and related algae.

Additionally, manipulation of carbohydrate accumulation may allow directing increased carbon flux toward TAG, or alternatively into more carbohydrate [66,68,149,150].

The persistence of β-1,3-glucans as a storage carbohydrate in this evolutionary lineage is peculiar, and a better understanding of all the genes involved, as well as the energetic role within the cell, will likely help efforts to understand how and why this unique structure was maintained. It is believed that the last eukaryotic common ancestor had the tools for synthesizing both α-1,4-glycogen and β-1,3-glucans [151], and that the primary endosymbiosis of cyanobacteria, employing starch rather than glycogen, gave rise to a hybrid metabolism built around starch in green and .

3.2. Materials and methods

3.2.1. Carbohydrate extraction

Soluble carbohydrates were extracted from 50 ml of Nannochloropsis gaditana

CCMP526 culture grown under constant light (90 µmol photons m-2·s-1) in a 250 ml flask, shaking at 200 rpm in air-level CO2 at 24-26 °C. The culture was centrifuged for 10 min at 3500 rcf and the supernatant decanted. The cell pellet was resuspended in 4.5 ml of deionized water and transferred to six 1.7 ml tubes. These tubes were centrifuged at 15,000 rcf for 3 min and the supernatant was removed. Approximately 100 mg of garnets (BioSpec Products catalogue number 11079103gar) and 0.75 ml 100% methanol was added to each tube. The samples were vortexed for 5 min to lyse the cells. The cells and garnets were centrifuged again, and the supernatant removed. This process was repeated three more times, until the green color had been removed from the pellet. 44 The lysate was resuspended in 0.75 ml of 0.1 M HCl and incubated at 70 °C for 1 h. The insoluble cell debris was pelleted by centrifugation and the soluble supernatant was transferred to 15 ml tubes. The soluble carbohydrate was precipitated by the addition of 9 volumes of 100% ethanol and a 1-h incubation at -80 °C. Following this, the carbohydrate was pelleted by centrifugation at 3500 rcf for 30 min. The supernatant was removed, and the remaining solvent was removed using a lyophilizer. The dried sample was subsequently analyzed by proton NMR, size exclusion chromatography, and carbohydrate residue and linkage analysis.

3.2.2. 1H-NMR analysis

Approximately 6 mg of N. gaditana soluble carbohydrate extract or laminarin from

Laminaria digitata [used as a control (Sigma Aldrich L9634)] was dissolved in 1 ml of 6:1

DMSO-d6:D2O solvent, to which 4 drops of TFA were added. Proton scans of the N. gaditana sample were taken from 25 °C to 75 °C in increments of 10 °C. The sample temperature of 25

°C showed optimal placement for the water peak in that there was no overlap with the surrounding peaks of interest. A JOEL AC-500 NMR spectrometer acquired 2800 1H-NMR scans for each sample at 25 °C, using 5.89 µsec (45°) pulses with a 5.75 s repetition time. The resulting data were line-fitted using MestReNova software (Supplemental Table 2), utilizing a mixed Lorentzian/Gaussian lineshape function with least-squares optimization.

3.2.3. Size exclusion chromatography

Carbohydrate sizing was performed by the Complex Carbohydrate Research Center

(Athens, GA), who also provided residue and linkage analysis on the same 6 mg sample used for 1H-NMR analysis. For size exclusion chromatography, the sample was brought to 1 mg/mL in 50 mM ammonium acetate, pH 5.5 and filtered (0.45 µm) before injection. A volume of 100 µl of sample was resolved first using a Superose 12 PC 3.2/30 column (GE Healthcare Life

Sciences) with an ELS detector, which indicated the dominance of short-length carbohydrates

(Supplementary Figure 1). A second analysis using a Superose 12 10/300 GL column with refractive index detection was performed for better resolution at the smaller sizes of interest. 45 3.2.4. Linkage analysis

For glycosyl linkage analysis, the sample was permethylated, depolymerized, reduced, and acetylated; and the resultant partially methylated alditol acetates (PMAAs) analyzed by gas chromatography-mass spectrometry (GC-MS) as described by Heiss et al. [152].

About 1 mg of sample was used for linkage analysis. The sample was suspended in

200 µl of dimethyl sulfoxide and left to stir for 3 days. Permethylation was effected by two rounds of treatment with sodium hydroxide (15 min) and methyl iodide (45 min). Following sample workup, the permethylated material was hydrolyzed using 2 M TFA (2 h in sealed tube at 121 °C), reduced with NaBD4, and acetylated using acetic anhydride/TFA. The resulting

PMAAs were analyzed on an Agilent 7890A GC interfaced to a 5975C MSD (mass selective detector, electron impact ionization mode); separation was performed on a 30 m Supelco SP-

2331 bonded phase fused silica capillary column.

3.2.5. Degrees of polymerization and branching calculations

The degree of polymerization (DP) and degree of branching (DB) were calculated from the

1H-NMR data according to Kim et al. [1]:

1 푎푙푙 푎푛표푚푒푟푖푐 퐻 푡푒푟푚푖푛푎푙 푠푖푑푒 푐ℎ푎푖푛 퐷푃푛 = 퐷퐵푛 = 1 Formulas for calculating푟푒푑푢푐푖푛푔 the푡푒푟푚푖푛푢푠 corresponding statistics from linkage data푎푙푙 푎푛표푚푒푟푖푐 were arrived퐻 at by using the sum of all laminarin-related residues (3-linked glucopyranosyl, terminal glucopyranosyl, terminal mannopyranosyl, and 3,6-linked glucopyranosyl) as the numerator of DP and denominator of DB. Instead of the reducing terminus, the number of particles was calculated by taking the terminal residues and subtracting the branches. The 3,6-linked residues were used to calculate the branching. The formulas used to compute the DP and DB from linkage data are:

-

- 푎푙푙 푙푎푚푖푛푎푟푖푛 푟푒푠푖푑푢푒푠 3,6 푙푖푛푘푒푑 퐷푃푛 = 퐷퐵푛 = 푡푒푟푚푖푛푎푙 − 3,6 푙푖푛푘푒푑 푎푙푙 푙푎푚푖푛푎푟푖푛 푟푒푠푖푑푢푒푠

46 3.2.6. Phylogenetic analysis

The glucan synthase, Kre6-like branching enzyme, and glycogenin-like protein were

BLASTed against the non-redundant protein sequences of 26 selected species, spanning vascular plants, green alga, , and fungi (Supplemental Table 1). The best match from each organism with an E-value below 10-30 was used to construct a ClustalW alignment for each protein, using a BLOSUM cost matrix. These alignments were used to create PhyML trees with the Le Gascuel substitution method and 100 bootstraps [139].

3.3. Results and discussion

3.3.1. Nannochloropsis produces a β-1,3-linked soluble glucan with β-1,6 branching

It has been largely assumed that Nannochloropsis species utilize a storage carbohydrate structured similarly to its evolutionary neighbors, diatoms (chrysolaminarin [153]) and brown alga (laminarin [151,154]). Despite this assumption, or perhaps because of it, the storage carbohydrate of Nannochloropsis has not previously been characterized. Using 1H-NMR, glycosyl composition, glycosyl linkage analysis, and size exclusion chromatography, we have largely confirmed this general structure of β-1,3-linked glucans with occasional β-1,6 branching as well as the approximate degrees of polymerization (DP) and branching (DB).

Table 3.1. The degree of polymerization (DP) and degree of branching (DB) are calculated separately using our 1H-NMR and linkage data. For context, DP and DB measurements of soluble carbohydrates from diatoms and brown alga are provided.

Sample DP DB Method Referenced Work Laminarin 33 0.07 1H-NMR Kim et al. [1] Laminarin 28 0.063 1H-NMR Caballero et al. [155] P. tricornutum 17 0.015 1H-NMR Caballero et al. [155] Laminarin 13 0.043 1H-NMR This study N. gaditana 8 0.028 - 0.105 1H-NMR This study N. gaditana 8.1 - 9.2 0.0036 - 0.0071 Linkage This study Laminarin 30.3 0.052 Linkage Read et al. [154]

47 Both our linkage and 1H-NMR data indicate degrees of polymerization much shorter than those reported for other stramenopiles. By size exclusion chromatography, greater than 90% of the total soluble carbohydrate peak area was <5 kDa, corresponding to 30 glucose residues or less. Roughly, 70% of the peak area corresponded to molecular weights of 1 kDa and less, or carbohydrates of only 7 glucose residues and shorter.

Because the 1H-NMR spectra in the region correlating to the proton of the side chain terminus anomeric carbon (4.24-4.29 ppm) is less clear in the N. gaditana trace than for the L. digitata extract, we provide upper and lower bounds for the area of this signal (Figure 1). This range is carried through the calculation of DB, resulting in the presented possible range.

Results of the linkage analysis showed evidence consistent with minor cellulose contamination (Figure 2), which may have caused an underestimation of DP by increasing the observed terminal glucopyranosyl residues. However, the high MW of cellulose suggests few terminal residues relative to the number of 4-linked residues, and SEC data suggests the cellulose contamination to be less than 10% of the sample (Supplementary Figure 1).

Putative cellulose contamination prevents us from declaring the presence or absence of mannitol in the non-reducing ends of the soluble carbohydrate. As a result, we present a possible range for values of DP and DB from linkage analysis, depending on how much of the observed terminal mannopyranosyl, if any, is included as carbohydrate termini. When all of the terminal mannopyranosyl signal is presumed to be a component of the storage carbohydrate

(laminarin), the calculated DP is in closer agreement with the value calculated from our 1H-NMR analysis.

Trehalose is a disaccharide produced in many organisms, inherited in stramenopiles from the red algal [151]. The presence of trehalose metabolism in N. gaditana may have caused an underestimation of the degree of polymerization by linkage analysis. The

α-1,1-linked glucose units of trehalose will likely have appeared as non-reducing terminal glucose 48 β-1,3 backbone

Reducing terminus

Side chain terminus A

C

B

Figure 3.1. The protons of interest appear between 4.2 and 4.6 ppm, as demonstrated by Kim et al. [1]. Spectra for (A) N. gaditana soluble glucan extract and (B) laminarin are shown. At right (C), the generalized structure is shown, with anomeric carbons highlighted with the colors of their corresponding chemical shifts.

2,3-linked residues, 1.0%

2-linked rhamnopyranosyl, 1.3%

Other terminal residues, 1.4%

3-linked 3-linked glucopyranosyl, mannopyranosyl, 3.8% 74.5% Other, 16.6%

4-linked residues, 9.1%

3,6-linked glucopyranosyl, 0.6% Terminal mannopyranosyl, 2.1% Terminal glucopyranosyl, 7.7% Figure 3.2. Results of linkage analysis showing a majority of laminarin-related residues, with evidence of some putative cellulose (cell wall) contamination. Relative abundances of each residue were used to estimate degree of polymerization and degree of branching.

49 units following permethylation, hydrolyzation, and acetylation. However, while the 1H-NMR method is similarly incapable of discriminating the α-1,1-linked glucose units of trehalose from the non-reducing terminus, the signal from the anomeric carbon of the reducing terminus is used to calculate degree of polymerization, reversing the effect on DB calculation.

3.3.2. Glucan synthase and branching enzyme are conserved among stramenopiles

It is reasonable to assume that there exists a means of regulating storage carbohydrate production in Nannochloropsis, and that it is at the generation of UDP-glucose from glucose-1- phosphate, analogous to mechanism of starch regulation in green plants (Figure 3). In green plants, this regulation is demonstrated to be largely allosteric, such that overexpression of the wild-type gene had little impact on starch content [156].

The Nannochloropsis genomes sequenced to date all contain a UDP-glucose pyrophosphorylase (UGPase), phosphoglucomutase (PGM), and a UGPase-PGM fusion polypeptide (Figure 3). The conversion from the metabolically flexible glucose 6-phosphate to the polysaccharide-fated glucose 1-phosphate is catalyzed by phosphoglucomutase. This is a reversible interconversion. The UGPase enzyme transfers uracil diphosphate to the 1-position of glucose using the energy from the release of a phosphate from each substrate.

Figure 3.3. Proposed conserved glucan metabolism of investigated stramenopiles, including the UGPase/PGM fusion that strongly affects β-1,3-glucan synthesis in diatoms. The specific glucosyl hydrolases active on laminarin are not yet verified.

50 This charged UDP-glucose is the energetic substrate of β-1,3-glucan synthase, which frees

UDP and catalyzes the addition of glucose units to the growing β-glucan polymer. The Kre6 enzyme of Saccharomyces cerevisiae and Candida albicans drastically reduce the presence of

β-1,6-glycosidic bonds when knocked down [157–159].

A glucosyltransferase family 48 (GT48) enzyme and a Kre6-like enzyme are hypothesized to be responsible for generating the linear β-1,3-backbone and β-1,6-branching linkages of laminarin, respectively [151]. Two genes with high homology to GT48 and Kre6-like enzyme are found within sequenced Nannochloropsis genomes, with high conservation and transcript support where available. The functionality of the Kre6-like protein was demonstrated in P. tricornutum when two of the three Kre6-like proteins from P. tricornutum were shown to functionally complement a Kre6 knockout of S. cerevisiae (the third was not investigated) [67].

Considering the very low degree of branching calculated here, it is unclear what benefit this rare branching may confer to the organism. It is surprising that this branching enzyme appears so highly conserved throughout this lineage, and it suggests an additional functionality for this enzyme.

3.3.3. Nannochloropsis may utilize a glycogenin-like protein to nucleate laminarin

Glycogenin is the enzyme responsible for initiating the synthesis of glycogen, the storage carbohydrate of animals and fungi [160]. Resulting from efforts to understand the storage carbohydrate metabolism of Nannochloropsis (and Microchloropsis) species, a glycosyltransferase family 8 (GT8) protein with high similarity to glycogenin was found conserved among all species. Homologs were identified in available genomes of brown algae, diatoms, and oomycetes, all documented to utilize β-1,3 glucans for carbon storage [161–163].

Near neighbors were found in multiple apicomplexans and ciliates, and an individual each of cryptophyte, labyrinthulomycete, and pelagophyceae.

Because Nannochloropsis lacks glycogen, it is possible that this enzyme is responsible instead for the nucleation of its own storage carbohydrate, chrysolaminarin. Its conservation 51 among many organisms with similar storage carbons supports this hypothesis, but it is notably absent in a few genomes (i.e. Phaeodactylum tricornutum, ). One explanation is that this nucleation function was taken up by another enzyme in these exceptions, allowing for the loss of this otherwise conserved gene. It may also be that these organisms do encode their own glycogenin-like gene, but that they are absent from their incomplete genomes. Finally, it is possible that this glycogenin-like gene is not a necessary component of β-1,3 glucan biosynthesis, but instead involved in protein glycosylation or another unknown function.

Phylogenetic analysis was conducted on the glycogenin-like protein of Nannochloropsis using the nearest homologous predicted proteins from an array of vascular plants, alga, fungi and oomycetes. The resulting maximum likelihood tree clusters the vascular plants, in which glycogenin functions to nucleate the formation of glycogen. Outside of this cluster, there is a grouping of heterokonts, including oomycetes, diatoms, brown algae, and Nannochloropsis – all of which synthesize structurally similar β-1,3 glucans for carbon storage. Separate from these groupings are the glycogenin-like homologs of fungi.

In an analysis of the light/dark transcriptional coordination, Poliner et al. found the glucan synthase transcript to be upregulated in anticipation of dusk, with peak transcription after the beginning of the dark phase [147]. Interestingly, the Kre6-like branching enzyme is maximally expressed in the middle of the light-phase, with minimum transcription at dusk. The glycogenin- like protein is also cyclic, with higher transcription during the light and a swift downregulation at dusk.

3.3.4. Bioinformatics suggests three potential laminarinases from N. gaditana

The Chromalveolate hypothesis proposes that both stramenopiles and alveolates descended from the same genetic parentage, meaning one instance of the secondary endosymbiosis of a red alga [164]. This theory has the advantage of being much simpler than the alternate theories of as many as five independent endosymbiosis events for chromalveolates [165], but does not easily explain the significantly divergent storage 52 carbohydrate metabolisms. If both lineages evolve from a shared parent with a glycogen host metabolism and starch-producing red alga, it is hard to imagine why one lineage would refactor around starch (as did green plants and red alga before), but another would abandon all of this for a β-1,3-glucans storage carbohydrate. A more complete understanding of the genes involved in this β-1,3-glucan metabolism will better inform phylogenetic studies into this evolutionary question.

To identify likely candidates for the breakdown of the Nannochloropsis storage glucan, a list was made of the predicted proteins from glycoside hydrolase families with potential activity on β-1,3 glucans. The glycoside hydrolase families we selected were GH1, GH2, GH3, GH9, and GH81. The proteins from this screen were evaluated phylogenetically for clustering among organisms utilizing β-1,3 storage glucans (in-group). Among these proteins, only three resulted in a clear separation of these organisms, one GH1 family and two GH3 (Figure 3.4). Homologs of the two GH3 enzymes (Nga00051 and Nga03687) were found in eight of the ten in-group organisms, and GH1 homologs were found in seven (absent from Albugo laibachii). The two

Thalassiosira genomes were not found to have any of these three enzymes. We hypothesize that Thalassiosira acquired a different enzyme, perhaps through , capable of degrading their storage glucans. This functional redundancy would have allowed for the loss of the original enzyme.

Additional GH families with potential activity on β-1,3 glucans, as well as undiscovered

GH families may also be present that were not identified in this analysis. The Thalassiosira genomes necessarily have enzymes capable of hydrolyzing their storage glucans, and the identification of these enzymes would help to further illuminate the structures which dictate this function.

We also sought to utilize the Poliner et al. dataset to provide context for our proposed laminarinases. The glycoside hydrolase family 1 enzyme Nga00952 had two high-similarity hits in the N. oceanica CCMP1779 genome, 1902 and 2516. These enzymes peaked at the end of 53 light and after onset of dark, respectively. The expression of 1902 demonstrated the most dramatic transcriptional upregulation of the set of storage carbohydrate-related genes identified, with an 8-fold change at hours 9 and 12 relative to hour 0. The GH family 3 enzymes each had a high-similarity (E-value = 0) homolog in CCMP1779.

3.3.5. Manipulating carbohydrate synthesis increases lipid accumulation in diatoms

In the diatom , the glucan synthase gene (Thaps3_12695) was knocked down by antisense RNA expression [68], resulting in reduced chrysolaminarin content, as much as 54% after 24 hours of silicon starvation. In the diatom Phaeodactylum tricornutum, a knockout of the bifunctional UDP-glucose pyrophosphorylase- phosphoglucomutase fusion protein resulted in a dramatic 45-fold increase in nutrient-replete

TAG accumulation [149]. This mutant also accumulated two-fold more than wildtype in nitrogen- depleted medium. More recently, a knockdown of this same gene by both antisense and inverted repeat expression resulted in significantly lower chrysolaminarin content, an increased total lipid content (19.5% dry weight), and slightly slower growth rate [150].

It is possible that a resulting buildup of precursor metabolites might generate product feedback inhibition on upstream reactions. On one hand, this accumulation of metabolites has the potential to redirect carbon to preferred metabolic pathways but may instead reduce the rate of carbon fixation. When the beta glucan synthase gene from P. tricornutum was knocked down,

TAG levels increased, but growth and oxygen evolution rates decreased [66]. For the knockdown exhibiting 22% wildtype level of beta glucan synthase transcript, TAG content nearly doubled following nitrogen deprivation. In that study, the knockdown generated surprising effects on the thylakoids, which may not translate to Nannochloropsis, allowing hope that the growth defects might be avoided.

54

A B

KEY Fungus Green alga Vascular plant Oomycete Non-green alga C D E F

Figure 3.4. Phylogenetic trees generated for the (A) branching enzyme, (B) glucan synthase, and (C-F) glycoside hydrolases. Both branching enzyme and glucan synthase clustered as expected, with stramenopiles separating well from green plants and fungi. Of the glycoside hydrolases, (C) Nga03687, (D) Nga00952, and (E) Nga00051 clustered similarly, unlike other considered genes. (F) Nga05229 is included as a representative example of rejected glycoside hydrolase.

55 3.4. Conclusion

Nannochloropsis gaditana CCMP526 produces a soluble β-1,3-linked storage glucan with extremely sparse β-1,6 branching and a surprisingly short average length of 10-20 glucose residues per molecule. The glucan synthase and branching enzyme likely to be responsible for the formation β-1,3 and β-1,6 glycosidic bonds have been identified in the available N. gaditana and N. salina genomes (CCMP526, B-31, CCMP527, HH-1, CCMP1776, CCMP537), but knockout/complementation or enzymatic studies are necessary to definitively assign these biochemical activities. Similarly, three genes have been identified as high potential candidates for laminarinase activity by phylogenetic analysis and structural homology, but again enzymatic analysis is necessary for confirmation. Finally, a gene encoding a glycogenin-like protein is conserved among Nannochloropsis genomes, despite a lack of glycogen, perhaps performing a novel function, such as the nucleation of the storage beta glucan.

3.5. Acknowledgement

Supported in part by the Chemical Sciences, Geosciences and Biosciences Division,

Office of Basic Energy Sciences, U.S. Department of Energy Grant Number DE-SC0015662 to

Parastoo Azadi at the Complex Carbohydrate Research Center. B.W.V designed and contributed to all experiments, J.C. and B.W.V. generated research samples, J.B. performed

NMR data collection and processing, M.S. interpreted NMR results. All authors contributed to data analysis and to the writing of the article. The authors declare no potential financial or other interests that could be perceived to influence the outcomes of the research. No conflicts, informed consent, human or animal rights applicable. All authors declare agreement to authorship and submission of the manuscript for peer review.

56

CHAPTER 4

CAS9-GENERATED GLUCAN SYNTHASE KNOCKOUT

IN NANNOCHLOROPSIS GADITANA

Modified from a paper to be submitted to Algal Research1

Brian W. Vogler2, 3 Matthew C. Posewitz3

4.1. Introduction

4.1.1. Nannochloropsis storage carbohydrate

Nannochloropsis gaditana is a Stramenopile in the Eustigmatophyceae class, related to

Diatoms and brown alga, and more distantly to oomycetes. Stramenopiles have been demonstrated to accumulate β-1,3-linked carbohydrate, with β-1,6-branching, during the light day to be consumed during the night. Recently, this storage carbohydrate was chemically characterized in Nannochloropsis gaditana CCMP526 as a short (DPn = 5-15) oligosaccharide with very infrequent branching. The genes involved in its biosynthesis have been identified, though the activity of the KRE6-like branching enzyme has yet to be demonstrated.

In the diatom Phaeodactylum tricornutum, the fusion PGM/UGPase enzyme has been both knocked down by RNAi and knocked out by a targeted TALE nuclease. The knockdown demonstrated significant reduction in chrysolaminarin levels (50-60% reduction), with a modest impact on specific growth rate under a 12:12 dark-light cycle (5% reduction) [150]. The effect of the knockout on chrysolaminarin levels was not evaluated, but a significant increase in TAG accumulation was observed when grown in nutrient-replete medium (45-fold increase) [149].

______

1 Reproduced with permission of the coauthors. 2 This author wrote and edited all text and executed all experiments. 3 Department of Chemistry, Colorado School of Mines.

57

This result was predicted by the analogous knockout of ADP-glucose pyrophosphorylase in

Chlamydomonas reinhardtii, which yielded an increase in TAG accumulation in that system [19].

The presence of two parallel pathways from glucose 6-phosphate to UDP-glucose means that even a knockout of this fusion enzyme is likely to cause an incomplete phenotype.

Alternatively, targeted knockout of the enzyme required for the synthesis of the β-1,3-linked backbone should eliminate production of the storage carbohydrate completely. This enzyme was knocked down in the diatom Thalassiosira pseudonana, resulting in a decreased chrysolaminarin accumulation by Aniline blue fluorescence. Knockdown in Phaeodactylum tricornutum also impaired chrysolaminarin synthesis [66]. They also observed elevated triacylglycerol levels, which combines with an absence of nucleotide sugar precursor to suggest that precursor UDPG is shuttled to alternative pathways.

This knockout study investigates the effects of eliminating synthesis of this storage carbohydrate, for the first time in the Stramenopile division, avoiding the potential growth defects potentially caused by RNA interference [166]. We also generated knockouts of the putative branching enzyme, a KRE6-like protein conserved among this division.

4.1.2. CRISPR/Cas9 nuclease-directed insertional mutagenesis

The rapid expansion of CRISPR/Cas9 technology in model systems has quickly raised the bar for the development of new model organisms [167–169]. The implementation of Cas9 in new systems is made difficult by (1) the low expression levels and resulting challenge to validate translation and targeting and (2) the lack of diagnostic tools to identify failures and help troubleshoot in undeveloped organisms. Doubts and fears plagued Cas9 research, leaving it an incredibly powerful black box with cytotoxicity concerns frequently being offered as explanations for failure [170].

Following evidence for very low activity in Nannochloropsis published in 2016 [48], a much more practical strategy relying on the use of a Cas9-expressing chassis strain was published in 2017 suggesting that Nannochloropsis might not struggle with stable Cas9

58 expression [171]. In this work, we share a strategy to verify Cas9 expression and localization

(by GFP fluorescence), assay the activity of synthesized guide RNA in vitro, and validate insertional knockouts within weeks.

4.2. Materials and methods

4.2.1. Design and assembly of Cas9 expression vector for Nannochloropsis gaditana

Plasmid pNgEx_hyg_bfloGFPa1 was synthesized for us by DNA2.0 (now ATUM) with a codon-optimized hygromycin resistance gene (AphIV) driven by SV40 promoter and terminator, followed by codon-optimized bfloGFP [172] driven by the N. gaditana CCMP526 eukaryotic elongation factor 1 alpha promoter and aldolase terminator (Figure 4.1). A C. reinhardtii codon- optimized Cas9 with C-terminal SV40 nuclear localization sequence was graciously provided by

Dr. Donald Weeks (University of Nebraska, Lincoln, NE) [173] and cloned before the fluorescent protein, with an added linker sequence fusing the two (S-A-G-G-G-S-G-G). The SV40 NLS was added again to the C-terminus of this Cas9-GFP fusion product. The resulting plasmid was

ClaI (2, 6)

CDS chimeric hygroB S 0 promoter S 0 late polyA region S 0 enhancer Ori pUC EEF1A Promoter and ATG BsaBI (761) NdeI (5, 51)

Gentamicin-r

ngO -hyg Cas -NLS-GFP-NLS 13,353 bp CrCas

Ampicillin-r Transaldolase terminator S 0 NLS bfloGFP poly-G linker S 0 NLS

PmeI (10,2 5) NheI (10,268)

Figure 4.1. Cas9 experression vector constructed for Nannochloropsis gaditana CCMP526 using a viral SV40 promoter to drive hygromycin resistance and the endogenous elongation factor promoter to drive Cas9 expression. Cas9 is codon-optimized and fused to an NLS, GFP, and another NLS.

59 linearized by BsiWI overnight and cleaned by phenol-chloroform/isoamyl alcohol extraction in followed by ethanol precipitation and resuspended to a final concentration > 1 µg/µl.

4.2.2. Generation and identification of Cas9-expressing N. gaditana CCMP526

Nannochloropsis gaditana CCMP526 was cultured in buffered, enriched f/2 (50% filtered seawater, 5.88 mM NaNO3, 3 mM NH4Cl, 435 µM NaH2PO4, 5X f/2 vitamins, 2X f/2 trace metals, 10 mM HEPES pH 8.0) under constant 100 µE/m2s light with shaking at 105 rpm and 27

°C. Following centrifugation at 3500 rcf for 15 minutes, growth medium was removed, and cells were resuspended in 25-40 ml of sterile 375 mM D-sorbitol. This was repeated two more times, with a cell count before the last centrifugation. The pelleted cells were resuspended in a volume of D-sorbitol such that the final concentration of cells was 109 cells/ml. This cell suspension was aliquoted into 100 µl reactions in sterile 1.7 ml tubes, and 1-5 µg of linearized plasmid was added to each tube for transformation.

Each cell/DNA solution was transferred to a chilled 2 mm gap electroporation cuvette and electroporated at 12000 /cm, with 750 Ω shunt resistance and 50 µF capacitance, and quickly transferred to 900 µl D-sorbitol in a culture tube, taking effort to reduce the exposure time of the cells to the cold cuvette. After 2-5 hours, 4-9 ml of growth medium was added to each tube and they were incubated overnight in a rotary shaker under light. The following day, cultures were pelleted by centrifugation and all but 1 ml of media was removed. The cells were resuspended and split across three plates of enriched f/2 augmented with 1% agar and 200 mg/L Hygromycin B Gold (InvivoGen).

Colonies appeared after 2-3 weeks and were streaked onto gridded plates of the same media formulation. Viable cultures were transferred to liquid and analyzed by flow cytometry on an Attune NxT flow cytometer with blue laser (ThermoFisher Scientific). Cultures with no observable green fluorescence above WT were abandoned and remaining cultures were maintained. An early isolate with 3-5% GFP-positive cells (CG15) was used to generate the knockout lines in this work, but this culture was further enriched by fluorescence assisted cell

60 sorting on a FACSAria (BD Biosciences) to 10% GFP-positive after two rounds, and other isolates with greater populations of GFP-positive cells prior to sorting were later generated. The

Cas9 coding sequence of CG15 was amplified from genomic DNA by PCR and sequenced by

GeneWiz (South Plainfield, NJ) to ensure the sequence had not been mutated.

4.2.3. Design and synthesis of single guide RNAs

RNA guides were chosen by uploading the genomic sequences to Benchling

(https://benchling.com) with coding regions annotated. Guides of 20 nucleotides in length with

NGG PAMs were selected based on a high on-target score, generated by Benchling based on the design rules of Doench et al [174]. For each target, two guides were selected, and primers were synthesized with 5’ and 3’ additions, according to the EnGen® sgRNA Synthesis Kit (New

England Biolabs) manual. Guides were then synthesized according to the kit protocol and assayed for in vitro nuclease activity following DNase I treatment and RNA purification by RNA

Clean & Concentrator™-25 (Zymo Research).

4.2.4. In vitro assay of sgRNA nuclease activity

Prior to testing nuclease activity, guides were diluted to 300 nM in 50 µl nuclease-free water, heated to 98 °C for 5 minutes and allowed to cool to room temperature. 3 µl of each guide was then added to equimolar Cas9 Nuclease, S. pyogenes (New England Biolabs) and incubated for 10 minutes at 25 °C, as well as a no-guide negative control for each target. After

Cas9-guide binding, substrate DNA including both target sites was added to each tube, such that the molar ratio of Cas9:sgRNA:substrate was 20:20:1, according to the protocol provided with the Cas9. The reactions were incubated overnight at 37 °C and followed by fragment analysis on 1% agarose gel.

4.2.5. Transformation of Nannochloropsis gaditana CG15 with sgRNA

Zeocin (InvivoGen) was used as a secondary selection for sgRNA transformation into

CG15. The resistance cassette was generated by Gibson assembly of a codon-optimized Sh ble gene in place of the hygromycin resistance gene in pNgEx_hyg_bfloGFPa1. The resistance

61 cassette was amplified by PCR and the resulting fragment was purified using the QIAquick PCR

Purification Kit (Qiagen) and concentrated using a Savant™ DNA120 Speed ac™ Concentrator to a concentration >1 µg/µl.

Liquid culture of CG15 was cultured for transformation with guide RNA similar to Cas9 plasmid transformation, but growth media was supplemented with 200 mg/L hygromycin. RNA guides were concentrated to >1 µg/ml by Speed ac™. For transformations the same protocol was followed, but with 1-2 µl of concentrated sgRNA and 1 µl of concentrated Sh ble cassette added prior to electroporation rather than linearized Cas9-GFP plasmid. After overnight incubation, each culture was split onto three plates with 30 mg/L Zeocin antibiotic.

4.2.6. Identification of glucan synthase and KRE6-like branching enzyme knockouts

After streaking onto gridded plates with 30 mg/mL Zeocin, sgRNA transformants were

2 incubated in 1% CO2 under constant 100 µE/m s light and transferred into 500 µl in culture tubes with Zeocin as they came up. These liquid cultures were amplified by PCR using the same primers used to generate the target substrate for in vitro guide assays. The products generated were analyzed by agarose gel electrophoresis and fragments indicating insertion of the Sh ble cassette were purified using Monarch® PCR & DNA Cleanup Kit (New England

Biolabs) and sequenced.

4.2.7. Culturing for growth phenotype

Cultures of wildtype, Cas9-expression background, Δbgs, and Δtgs were inoculated in triplicate at an initial density of 1x106 cells/ml. Every two days, 200 µl of culture were transferred to sterile 1.7 ml tubes. 50 µl of this was diluted 5X with modified f/2 and analyzed by flow cytometry. 100 µl was pelleted by centrifugation, with 90 µl of the supernatant removed and replaced with 190 µl of DI water. The cells were resuspended and heated to 60°C for 20 min to release cellular soluble carbon [175]. Samples were then pelleted again at 20,000 rcf for 10 min, and 100 µl of the cleared lysate separated. Both fractions were frozen for subsequent Anthrone analysis.

62

4.2.8. Quantification of storage glucan

Anthrone reagent in concentrated sulfuric acid was used to quantify soluble carbohydrate [176]. Glucose standards ranging from 100 to 500 µg/ml and the cleared lysates were diluted 10X in chilled Anthrone reagent (2 g/L Anthrone in 71% concentrated sulfuric acid), and incubated for 12 minutes at 100 °C. All samples were immediately chilled in an ice bath and

200 µl of each was plated in triplicate in a 96-well microplate and absorbance was read at 625 nm.

4.2.9. Quantification and composition of total fatty acid

From continuous illumination cultures, 500 µl was taken during stationary phase and stored at -20 °C for analysis. Total lipids were extracted and converted to fatty acid methyl esters as described previously [15,19]. 1.0 ml of 0.7 M KOH in 95% methanol was added to the cultures and mixed, then incubated at 100 °C for 1.5 hours to achieve cell lysis and saponification. Acid-catalyzed methylation was executed by adding 1.5 ml 12 N HCl:MeOH,

1:16 (v/v) and incubating at 60 °C for 16 hours. After cooling, fatty acid methyl esters were extracted into 1.25 ml hexane and analyzed by GC-FID.

4.3. Results and discussion

4.3.1. Stable Cas9 expression does not have significant effect on cell growth

There were several potential barriers to the successful implementation of a Cas9- expressing Nannochloropsis mutant chassis. Most concerning at the outset were reports of

Cas9 toxicity in C. reinhardtii [170], necessitating transformation with vectors encoding both

Cas9 and sgRNA, both of which were quickly lost. Even in later work, this toxicity was supported by the absence of Cas9 DNA in all targeted mutants generated [177].

4.3.2. Soluble carbohydrate accumulation is induced following nitrogen starvation

Upon transfer to nitrogen-deficient media, wild-type and Cas9-GFP expressing cell lines accumulated significant soluble carbohydrate. Relative to cultures maintained in nitrogen-replete

63 media, soluble carbohydrate increased per cell and as a fraction of dry cell weight (%DCW) 4- fold (Figure 4.1).

300 3.5%

250 3.0%

N-replete 2.5% 200 N-replete N-starved 2.0% N-starved 150 1.5% 100 1.0% * * * *

50 0.5%

Soluble carbohydrates (fg/cell) carbohydrates Soluble Soluble carbohydrates (%DCW) carbohydrates Soluble 0 0.0% WT Cas9 Δbgs Δtgs WT Cas9 Δbgs Δtgs

Figure 4.2. Soluble carbohydrate extracted from cultures split into nitrogen replete and nitrogen starved media at late exponential phase and cultured for 72 hours. Error bars represent standard deviations of biological triplicates. Soluble carbohydrate is presented on a per cell (left) and fraction of dry cell weight (right) basis. Asterisks (*) indicate significant difference from both wild-type and Cas9-GFP background (p-value < 0.05).

4.3.3. Knockout of BGS or TGS drastically reduces soluble carbohydrate accumulation

In both sets of knockouts, soluble carbohydrate was significantly reduced relative to both wild-type and Cas9-GFP-expressing cultures following nitrogen starvation. The effect was not noticeable in nitrogen replete conditions, but following nitrogen starvation, carbohydrate levels were comparable to nitrogen-replete cultures of control cultures (Figure 1).

It is surprising that the Δtgs line displayed a similar disruption of soluble carbohydrate accumulation. It may be the case that the lack of beta glucan branching causes the storage carbohydrate to be less soluble, causing its loss from the analyzed soluble fraction. To investigate this possibility, we also quantified the total carbohydrate from these same cultures

(Figure 4.2).

Investigation of the total carbohydrate content supports the conclusion that the Δtgs line, like the Δbgs line, is unable (or significantly hampered) in its ability to synthesize or accumulate

64 the storage beta glucan. It may be that the lack of branching effects the capacity to store the carbohydrate, perhaps effecting packing efficiency. Alternatively, it may be that the branch sites function to regulate the breakdown of this storage carbohydrate, and that a lack of branching one mechanism to moderate the degradation and use of the carbohydrate.

1600 18% N-replete N-replete 1400 N-starved 16% N-starved 1200 14% * 1000 12% 10% 800 8% 600 6% 400 4%

200 2% Total carbohydrates (fg/cell)carbohydrates Total 0 (%DCW)carbohydrates Total 0% WT Cas9 Δbgs Δtgs WT Cas9 Δbgs Δtgs

Figure 4.3. Total carbohydrate from cultures split into nitrogen replete and nitrogen starved media at late exponential phase and cultured for 72 hours. Error bars represent standard deviations of biological triplicates. Soluble carbohydrate is presented on a per cell (left) and fraction of dry cell weight (right) basis. Asterisks (*) indicate significant difference from both wild- type and Cas9-GFP background (p-value < 0.05).

It must be pointed out that our understanding of the β-1,6-transglycosylase is very limited. Almost all of our knowledge comes from inference by homology to the yeast Kre6 proteins, believed to be involved in the β-1,6-branching of β-1,3 chains of yeast cell walls

[157,158]. The capacity for tgs genes of P. tricornutum to functionally complement a transglycosylase-deficient Δkre6 yeast strain supports the functional similarity of these two genes [67]. In C. albicans, knockdown of skn1 (the conserved domain present in the tgs genes of stramenopiles) reduced β-1,6-glucan with no effect on β-1,3-glucan [158], but in

Saccharomyces cerevisiae, mutations of KRE6 reduced levels of both β-1,6-glucan and β-1,3- glucan [178], which leaves the door open to the N. gaditana homolog being involved in β-1,3- glucan synthesis.

65

4.3.4. Growth under light/dark cycling

In P. tricornutum, bgs knockdowns demonstrated a lower growth rate than wild-type under constant light, but growth difference was lost in light/dark cycles with short days (8L/16D,

4L/20D [66]. In our growth experiments under constant light, Δbgs and Δtgs strains grew slightly better than WT, though not significantly. This trend was maintained under light/dark cycling

(Figure 4.3), with both knockouts growing slightly better than wild-type, but well within expected deviation.

4.3.5. BGS and TGS knockout strains accumulate significantly more total fatty acids

In diatom knockdowns of the beta glucan synthase, total fatty acid accumulation was significantly greater relative to wild-type [66,68]. Within Thalassiosira pseudonana, an array of knockdowns with unmeasured effect on transcript or protein levels yielded significantly greater total fatty acid accumulation only at stationary phase of culture [68]. A knockdown in P. tricornutum yielded a near doubling of total fatty acid content relative to WT following nitrogen depletion. While we did notice an increase in total fatty acid content, the increase was not statistically significant (p > 0.05, Figure 4.2).

4.3.6. BGS knockout cells are less sensitive to photoinhibition in older cultures

In diel experiments, Δbgs mutant lines demonstrated a significantly smaller population of cells displaying photoinhibition by flow cytometry, measured by reduced chlorophyll autofluorescence. This difference was only observed at late growth, beyond 10 days of diel culture, at cell densities above 100 million cells/ml (Figure 4.4). As cell densities reached this level, a subpopulation of cells with reduced chlorophyll autofluorescence appeared (Mid Chl). A few days later, another population arose, with further reduced chlorophyll autofluorescence

(Low Chl). At the same time this second population began to grow, the mid-group began disappearing, apparently transitioning into the Low Chl phenotype, because the density of cells with healthy phenotypes appeared to plateau in all cultures except the Δbgs mutant lines.

66

45%

N-replete 40% N-starved 35%

30%

25%

20% * 15%

10% Total fatty acid content (%DCW) content acid fattyTotal

5%

0% WT Cas9-GFP Δbgs Δtgs

Figure 4.4. The total fatty acid content as a fraction of dry cell weight (%DCW) for cultures split into nitrogen replete (+N) and nitrogen starved (-N) media at late exponential phase and cultured for 72 hours. Error bars represent standard deviations of biological triplicates.

In the Δbgs knockout lines, the Mid Chl subpopulation appeared significantly later, and the Low Chl subpopulation remained significantly smaller than that observed for either the control populations or the Δtgs knockout lines. While the Δtgs knockouts did significantly delay their shift to Mid Chl and Low Chl, they eventually followed the control lines and failing to produce more photoactive cells. These results suggest a role for the storage glucan in regulating the operation of the photosystem. In our Δbgs knockout lines, we indeed observe significantly more chlorophyll and carotenoids per cell relative to both the Cas9-GFP background and wildtype (Figure 4.5).

4.4. Conclusions

Knockout of the predicted beta-glucan synthase gene in Nannochlorpsis gaditana confirms its involvement in the synthesis of the β-1,3-linked soluble storage carbohydrate by

67 significantly reducing its production. Suprisingly, knockout of the predicted branching enzyme had similar effect, also significantly reducing levels of the storage carbohydrate.

We denomstrated a modest imprvement in lipd levels under nutrient-replete conditions relative to wild-type and Cas9-expressing background. We did not observe the growth defects shown by knockdowns of the beta-glucan synthase gene in Phaeodactylum tricornutum, instead growing just as well under both constant light and a light/dark cycle.

Storage carbohydrates have been observed to mediate stress response in cyanobacteria [179]. In this work, we demonstrate a similar non-bleaching phenotype resulting from the knockout of the beta-glucan synthase, but not for the transglycosylase knockout, despite a similar effect on synthesis of the beta glucan. The disconnection between these two observed phenotypes invites further investigation.

68

Figure 4.5. Cell densities of fluorescent subpopulations as cultures mature. Taken as a whole, no difference in growth is observable between the selected strains (top), but separating the fluorescent subpopulations reveals more complexity. In exponential growth, nearly all cells are in the healthy fluorescent region (High Chl). As cultures mature, a population appears (Mid Chl) with lower 695 nm fluorescence and more fluorescence at 574 nm. Following this, a population with even less 695 nm fluorescence and more 574 nm fluorescence appears (Low Chl) and the transitional population fades. Beyond this point, increased cell density is limited to cells in the Low Chl cluster, and the density of healthy cells plateaus. Error bars represent standard deviation of four biological replicates, except for wild-type. For each knockout, four genetically unique mutants were used. Asterisks (*) indicate significant difference from Cas9-GFP background (p-value < 0.05).

69

250 Algae 200 WT CG_15 150 BGS 100 TGS 50 Cell density density Cell 0 (million cells/ml)(million 0 48 96 144 192 240 288 336 384 432 480

200 WT High Chl 150 CG_15 * BGS * * * 100 TGS * * * * * * 50

Cell density density Cell 0 (million cells/ml)(million 0 48 96 144 192 240 288 336 384 432 480

25 WT Mid Chl 20 CG_15 15 BGS * TGS 10 * * * * 5 * Cell density density Cell * * 0 *** **** * * * (million cells/ml)(million 0 48 96 144 192 240 288 336 384 432 480

50 WT Low Chl 40 CG_15 30 BGS 20 TGS * * 10 *

Cell density density Cell * 0 * * * * * (million cells/ml) (million 0 48 96 144 192 240 288 336 384 432 480

70

300 * 250

200

150

fg/cell * 100

50

0 WT CG_15 BGS TGS

Chl a Carotenoids

Figure 4.6. Chlorophyll a and carotenoid content of late exponential phase cultures (1.2-1.5x108 cells/ml) grown under 12/12 light/dark cycle. Samples were taken at 8 hrs into the 12 hr light period. Error bars represent standard deviation of four biological replicates, except for wild-type. For each knockout, four genetically unique mutants were used. Asterisks (*) indicate significant difference from Cas9-GFP background (p-value < 0.05).

71

CHAPTER 5

HIGH-EFFICIENCY CAS9-DIRECTED INSERTIONAL GENE KNOCKOUT

IN CHLORELLA SP. STRAIN CCMP252

5.1. Introduction

Following the successful implementation of Cas9-directed insertional gene knockout in

Nannochloropsis gaditana CCMP526, we saught to test the adaptability of the methods developed and the robustness of the technique. While the strength of N. gaditana rests largely in its resistance to predation and high fat content, it demonstrates a relatively poor growth rate and succeptibility to high heat-induced crashes, as well as being evolutionarily removed from the green plant lineage. In contrast, Chlorella sp. CCMP252 is a fast-growing (30.2±0.9 g/m2d), high-light-, high-salt-, and heat-tolerant microalgae of the green lineage.

This work utilizes the guide troubleshooting and mutant screening methods developed in

N. gaditana and applies them to a new system. The technique is further improved here by the application of a newly available commercial supply of affordable and electroporation-compatible purified Cas9-NLS. This strategy of using complexed Cas9-sgRNA allows the use of one more resistance marker to use for sequential combined gene knockouts, as we no longer need to use one for maintenance of Cas9 expression. The avoidance of Cas9 expression also simplifies the analysis of generated mutants, as they can be simply compared to the wild-type host rather than the Cas9-expressing background strain as well.

5.2. Materials and Methods

5.2.1. Design, scoring, and synthesis of sgRNA

RNA guides were chosen by uploading the genomic nitrate reductase sequence

(Genetic Data D.4) to Benchling (https://benchling.com) with coding regions annotated. Guides of 20 nucleotides in length with NGG PAMs were selected based on a high on-target score,

72 generated by Benchling based on the design rules of Doench et al [3]. For each target, two guides were selected, and primers were synthesized with 5’ and 3’ additions, according to the

EnGen® sgRNA Synthesis Kit (New England Biolabs) manual (Table 5.1). Guides were then synthesized according to the kit protocol and assayed for in vitro nuclease activity following

DNase I treatment and RNA purification by RNA Clean & Concentrator™-25 (Zymo Research).

Figure 5.1. Genomic sequence of Chlorella sp. CCMP252 with annotated target sites (light blue) primers for in vitro assay, PCR analysis, and sequencing, and nitrate reductase coding sequence (yellow).

Table 5.1. Primer sequences used in C252 Cas9 work.

Primer name Nucleotide sequence (5’→3’) EnGen_C252_NR_1 TTCTAATACGACTCACTATAGAGTCCCCAGAATCAAATGGGGTTTTAGAGCTAGA EnGen_C252_NR_2 TTCTAATACGACTCACTATAGCAAGCCCGTTCACTACCACAGTTTTAGAGCTAGA pcr_C252_NR_F1 AAAATCAAGAGGCACCGTGAGAGAGGC pcr_C252_NR_R1 CAGCACCCCAGTTGAATCCAATCGTCT pcr_C252_NR_F2 GTGCGACCAACACCTGCTCTCATGTAT pcr_C252_NR_R2 ATCGTCTGCTTCACCATGTTCTGCTCC C252_NR_seq_for GGATCAGCTGAATATTCAAGACATCTTC C252_NR_seq_rev GCAGCAGGGAGGCTGCAC

5.2.2. In vitro assay of sgRNA

Prior to testing nuclease activity, guides were diluted to 300 nM in 50 µl nuclease-free water, heated to 98 °C for 5 minutes and allowed to cool to room temperature. 3 µl of each guide was then added to equimolar Cas9 Nuclease, S. pyogenes (New England Biolabs) and incubated for 10 minutes at 25 °C, as well as a no-guide negative control for each target. After

Cas9-guide binding, substrate DNA including both target sites was added to each tube, such that the molar ratio of Cas9:sgRNA:substrate was 20:20:1, according to the protocol provided

73 with the Cas9. The reactions were incubated overnight at 37 °C and followed by fragment analysis on 1% agarose gel (Figure 2).

Figure 5.2. Agarose gel electrophoresis gel fragment analysis of in vitro sgRNA guide assay. Two different guides were tested in lanes 1 and 2, and negative control with no guide added was run in lane 3.

5.2.3. Transformation of Chlorella sp. strain CCMP252 using complexed Cas9-sgRNA

Chlorella was grown with constant shaking under constant light and ambient, air-level

CO2, in half-salinity f/2 medium augmented with urea as a nitrogen source and increased nutrients, as used previously. At a cell density anywhere from 2 x 106 to 2 x 107 cells/ml, culture was pelleted at 2000 rcf for 10 min and the media removed. The cell pellet was washed twice with sterile ᴅ-sorbitol (70 g/L) and resuspended to a final concentration of 2 x 109 cells/ml. This suspension was separated into 100 µl aliquots before addition of genetic material.

Cas9-sgRNA RNPs were assembled by first preparing 5 µM dilutions of sgRNA in

RNase-free water, melting at 98 °C for 5 min and allowing to cool back to 20 °C at a rate of 1

°C/s. These properly-folded sgRNA were then incubated with Cas9-NLS protein (QB3

MacroLab, UC Berkeley, CA), each at a concentration of 2 µM, for 30 minutes. Once bound, the

74 complexes were added to the aliquoted cell suspensions along with PCR fragments encoding antibiotic resistance (0.5-1 µg) and mixed by digital manipulation (flicking).

Electroporation of Chlorella sp. CCMP252 was achieved using a voltage of 850 V across a chilled 2 mm gap cuvette, with a fixed time constant of 20 ms in a Gene Pulser Xcell™

Electroporation System (Bio-Rad Laboratories, Hercules, CA). Cell solutions were immediately resuspended to 1 ml with growth media and transferred to vented culture tubes. Cultures were further diluted with 5 ml of additional media and incubated overnight in a rotary shaker at ambient light, temperature, and CO2. Cultures were plated the next morning, splitting each tube across two plates with phleomycin, ampicillin, and kanamycin (30 µg/ml, 100 µg/ml, and 100

µg/ml, respectively).

After 5-7 days, colonies were picked and streaked onto similar plates. Some fraction

(approx. 20%) of these colonies failed to grow on replating, implying either escapes or insufficient expression levels of the antibiotic resistance marker. Transformants that did grow were analyzed by PCR of the targeted gene.

5.2.4. PCR analysis of transformants

Two rounds of PCR using nested primers (Supplementary Table 1) were carried out, using KOD Hot Start Master Mix (Millipore Sigma). The thermocycler protocol began with a 2 min polymerase activation at 95 °C, then 35 cycles with a 30 s denaturation at 95 °C, an annealing temperature of 65 °C for 20 s, and 2 min extension at 70 °C. The reaction was completed with a final extension at 70 °C for 10 min. Results of the first round of PCR were diluted 100X with deionized water and used as template for a secondary amplification using the same settings. PCR products were cleaned up on using NEB Monarch PCR & DNA Cleanup Kit according to kit instructions.

5.2.5. Locus sequencing

The genomic PCR products of the targeted region were sent for sequencing using primers internal to the second amplification set of primers, with primers directed inward from

75 either site of both CRISPR sites (Supplementary Table 1). If sequence did not reach either (1) the other end of the insertion site or (2) the antibiotic resistance fragment, primers internal to the resistance marker and directed outward were used to close the gap.

5.2.6. NR phenotype on agar

Thirty-six viable colonies from restreaking onto urea-containing plates with antibiotic were inoculated into N-free liquid media and replica plated onto plates containing urea as a nitrogen source (positive control), nitrate as a nitrogen source (test), and no nitrogen source

(negative control). Once all plated cells appeared bleached on the nitrogen-free plate, the nitrate-only plate was scored for ability to grow.

5.3. Results and discussion

5.3.1. Chlorella sp. strain CCMP252 is high-light tolerant

The results of our photosynthetic rate analysis show Chlorella sp. strain CCMP252 to be high-light tolerant, reaching a photosynthetic maximum of > 165 µmol O2/mg Chl·hr at a light intensity of 1000 µmol/m2s, with little photoinhibition noticed up to an irradiance of 2000

µmol/m2s (Figure 5.1).

5.3.2. Most Cas9-induced breakages result in insertion of selection marker fragment

Cas9 is often used to disrupt genes by insertion or deletion of a few nucleotides, resulting in a frameshift and an early stop. In our analysis, most interrupted genes were instead caused by insertion of our selection marker fragment (Figure 5.2). The prevalence of insertions at the cleavage site suggests that the rate limiting factor for marker integration may indeed be the presence of double-stranded breaks (DSB) into which the marker may be inserted.

Presumably, this would mean that increasing the occurrence of DSBs in transformed cells should increase the observed rate of transformants.

5.4. Conclusions

We have presented a simple and high-efficiency method for knocking out selected genes in a fast-growing, high light-, high temperature-, and high salt-tolerant green alga. The Chlorella

76

200 180 160

/mg Chl hr) Chl /mg 140 2 120

mol O mol 100 μ 80 Chlorella sp. strain CCMP252 60 Tetraselmis suecica UTEX2286

40 evolution ( evolution 2 20

0 Net O Net 0 500 1000 1500 2000 Irradiance (μmol/m2s)

Figure 5.3. Photosynthesis-irradiance curve for Chlorella sp. strain CCMP252 and Tetraselmis suecica UTEX2286. Error bars indicate standard deviations of triplicate measurements.

genus is large and diverse [180], and the strain employed by this work is exceptionally facile to manipulate in the laboratory. The methods demonstrated in this work will allow for rapid reverse genetic investigation owing to the quick growth both on plate and in liquid.

The ease of implementing this technique in Chlorella suggest that this strategy may be broadly applicable to other host strains. We aim to apply this same technique to newly developed strains as well as on Nannochloropsis gaditana CCMP526.

77

Figure 5.4. Exemplary agarose gel illustrating the increased band size resulting from insertion of the selection marker fragment at the Cas9 target site in the nitrate reductase coding sequence. Left and right lanes are loaded with 5 µl of O’GeneRuler 1 kb Plus DNA Ladder (ThermoFisher), and wild-type C. 252 was used as template in lane 8, and water in lane 10.

78

CHAPTER 6

CONCLUSIONS AND FUTURE WORK

6.1. General Discussion

Nannochloropsis remains among a small set of algae that grow well in outdoor open pond bioreactors, with many faster growing strains succumbing to predation or infestation. This, paired with the capacity to accumulate high levels of fatty acids, make it a genus of ongoing interest to large-scale industrial production, driving the need for advanced genetic tools and investigation.

6.1.1. Identifying fluorescent profile of N. gaditana to inform choice of FP reporter

We quickly realized that we were not detecting the fluorescence levels presented earlier in our lab in cyanobacterial hosts (Appendix A.1). In cyanobacteria (Synechococcus sp. strain

7002), we could visualize widespread fluorescence in nearly every cell under the fluorescence scope, even using suboptimal optical filters. It was also very easy to detect their fluorescence in bulk solutions, simply by taking fixed wavelength readings from a fluorimeter.

For Nannochloropsis, neither of these methods was readily useful. Using the same fluorescent protein (Venus [181]), the fluorescence was not easily distinguishable from chlorophyll autofluorescence by fluorescent microscopy. Analyzing a large set (40+) of FP transformants by fluorimeter illustrated the need to subtract chlorophyll autofluorescence, necessitating a sweep of the emission spectrum. Eventually, one mutant was detected with fluorescence. Unfortunatly, this same mutant under fluorescence microscopy was still challenging to localize.

After this experience, we designed an investigation into the fluorescence profile of the host strain in search for a better fluorescent reporter (Figure A.2). The results led us to the adoption of a newly characterized green fluorescent protein with improved quantum yield [172].

79

The choice of this new fluorescence reporter and the arrival of flow cytometry instrumentation to our lab has made detection and characterization of fluorescent mutants much faster and more successful.

6.1.2. Struggles with RNA interference

The use of RNA interference has been demonstrated several times in the

Nannochloropsis genus [171,182,183], but has always resulted in an array of knockdown levels, largely caused by the varying accessibility of the construct site of integration into the genome.

We struggled to achieve significant knockdowns of thioreductase and PKS genes in our

Nannochloropsis. These complications were in part due to the challenging GC-rich targets, whose sequences often generate secondary structures that plague PCR analysis with difficult to interpret incorrectly-sized bands and favored smaller products. Combining with these obstacles were the expected growth biases toward RNAi mutants with modest or negligible levels of gene silencing in these targets that significantly benefit cell viability and competitiveness. As each transformation generates a heterogenous spread of knockdown efficacy, we made an effort to maintain all mutants generated by transformation, with an emphasis on those potentially displaying growth defects.

Over time, many of the slow-growing mutants died out next to healthier generated mutants. Transcription analysis by quantitative PCR showed that the mutants who survived were indistinguishable from wild-type cells by their levels of expression of the gene targets.

These early challenges, paired with reports of non-specific RNAi defects and potential instability ended these efforts, allowing us to focus on targeted nucleases.

CRISPR/Cas9 technology allows us to both improve the efficacy of RNAi and to achieve the same effect by other means. The efficacy can be improved by targeting highly accessible regions of the genome for integration of the expression cassette. By doing this, we can eliminate the greatest predicted variable in the heterogenous knockdown levels observed for traditional

80

RNAi experiments. There are, for example, highly expressed transcripts with no predicted coding sequence that would make suitable integration sites for RNAi expression cassettes.

The alternative to RNA interference is gene attenuation by using Cas9-targeted insertion to interrupt the 5’ or 3’ untranslated region of a gene of interest, thus destabilizing the mRNA product and lowering steady-state levels. This is similar in effect to RNAi and has been demonstrated already in Nannochloropsis [171].

6.1.3. Cas9 diagnostic tests and troubleshooting methods

When this work started, there was no practical way to troubleshoot Cas9 activity. As a result, each component needed to be validated separately. This meant tagging a Cas9 with both

NLS and GFP to ensure that (a) the Cas9 was being expressed and (b) it was being localized to the nucleus of the cell. Only once these were validated could we go on to check other components of the strategy.

We also developed a strategy for amplifying the target DNA and using a commercial

Cas9 protein to test our synthesized sgRNA for its ability to cut the exact target sequence. This helped us identify the need to melt and reanneal our guides. This work also highlighted the significant increase in activity from sgRNA that had been vetted through an online scoring algorithm to select for better target sequences. The ability to validate each guide as it was synthesized allowed us to eliminate ineffective or degraded guides as a potential problem.

Finally, the use of instertional mutagenesis allowed us to rapidly screen generated mutants in large number very early by use of PCR. This technique reduced the turnaround time between experiment and results and helped us to more quickly optimize our transformation conditions.

6.2. Recommendations for future research

Cas9 and other CRISRP-related nucleases (e.g., Cpf1) represent an incredibly enabling technology. In this work, we demonstrated the ability to knock out targeted genes by insertion of an antibiotic resistance marker cassette. In other systems, modified Cas9 proteins lacking

81 nuclease activity (dCas9) have been fused to activator domains (e.g., VP64, p65), repressor domains (e.g., KRAB), and methyltransferases to alter expression levels of genes. There is now also a wave of tailored alteration to genomes which rely on to manipulate a few bases, insert a fused epitope tag, or alter UTRs to generate specific new products by supplying ssDNA as template.

6.2.1. Interrupting cell wall synthesis to generate protoplast strains

After successfully knocking out nitrate reductase in Nannochloropsis, the next goals were to characterize and validate the genes involved in the synthesis of the storage carbohydrate, and to generate a cell wall mutant. The cell wall of N. gaditana is composed of a cellulose layer outside the cell membrane, itself enveloped in a layer of algaenan.

We have identified four cellulose synthase genes in N. gaditana, and two of these genes have been knocked out with no easily detected phenotype. We are continuing efforts to knock out the remaining two copies. It is possible that a cell wall mutant will be more susceptible to antibiotic, which may make selection of these mutants especially challenging.

In addition to the cellulose synthase genes, we have identified six different polyketide synthase genes in Nannochloropsis, which we hypothesize to be involved in the the biosynthesis of the long aliphatic chains of the algeanan layer of the cell wall (Figure A.3). We have successfully knocked out the most highly transcribed PKS, PKS6, but it has gene redundancy in two other transcripts (PKS4 and PKS5). We are now focused on PKS3, which has a unique arrangement of catalytic domains.

6.2.2. Characterizing the carbon concentrating mechanisms of Nannochloropsis

We have expended significant effort developing the fluorescent marker technology for N. gaditana, which included the overexpression of GFP-tagged elements of the carbon concentrating mechanism. We have isolated high-expression mutants of N. gaditana which allow for the localization of each of the identified proteins, including four carbonic anhydrases and two bicarbonate transporters that are common to all species of Nannochloropsis currently

82 sequenced. Of note, we observed the localization of a previously characterized carbonic anhydrase (CAH1) to the cell wall in our mutants [184].

We also identified the presence of a tandem repeat of an alpha carbonic anhydrase which is only found in gaditana and salina species of Nannochloropsis, and completely absent from others (oceanica, oculata, limnetica, granulate).

6.2.3. Implementation of Cas9 RNP complex and expansion into other organisms

While cost and availability directed these early efforts toward a stable Cas9-expressing host strain, the rapid development in this field has brought down prices and increased availability.

We have demonstrated the flexibility of Cas9 in two systems, and the host-independence of the

RNP complex system suggests utility in a broad range of hosts. We will apply this technique to new systems our lab is working with in order to increase our interrogative power by inserting expression cassettes into highly active regions and knocking out genes of interest.

6.2.4. Knockout of chlororespiration in high-light tolerant strains

Other efforts in our lab have focused on isolating and characterizing strains of alga with natural tolerance for high light conditions. These organisms exhibit reduced antenna size and efficient use of incident light. We are adapting our RNP strategies to interrogate their photosystems, beginning with the knockout of the plastidial terminal oxidase, believed to be among few means for dealing with excessive electron flow without production of reactive oxygen species. A knockout would open the door to further investigation of the flow of electrons around the photosynthetic apparatus.

6.2.5. Precision gene editing

In model systems, the technology has been advanced as far as allowing very specific alterations by combination of CRISPR/Cas9 with single-stranded template DNA. The use of ssDNA allows for efficient binding spanning double-stranded breaks, promoting homologous recombination. Cas9-induced homologous recombination will allow for manipulation of active sites, addition of fluorescent proteins or epitope tags to endogenous genes, or tuning of

83 regulatory elements. Amid the rapid progression in genetic techniques following the adoption of

Cas9, demonstrating this capability is sure to be the next bar to meet in order to compete for funding.

84

REFERENCES

[1] Y.T. Kim, E.H. Kim, C. Cheong, D.L. Williams, C.W. Kim, S.T. Lim, Structural characterization of beta-D-(1 --> 3, 1 --> 6)-linked glucans using NMR spectroscopy, Carbohydr. Res. 328 (2000) 331–341.

[2] R. Radakovits, R.E. Jinkerson, S.I. Fuerstenberg, H. Tae, R.E. Settlage, J.L. Boore, M.C. Posewitz, Draft genome sequence and genetic transformation of the oleaginous alga Nannochloropis gaditana, Nat. Commun. 3 (2012) 686. doi:10.1038/ncomms1688.

[3] P. Falkowski, Ocean Science: The power of , Nature. (2012). doi:10.1038/483S17a.

[4] D.E. Bloom, 7 Billion and Counting, Science. 333 (2011) 562–569. doi:10.1126/science.1209290.

[5] G.C. Dismukes, D. Carrieri, N. Bennette, G.M. Ananyev, M.C. Posewitz, Aquatic phototrophs: efficient alternatives to land-based crops for biofuels, Curr. Opin. Biotechnol. 19 (2008) 235–240. doi:10.1016/j.copbio.2008.05.007.

[6] Q. Hu, M. Sommerfeld, E. Jarvis, M. Ghirardi, M. Posewitz, M. Seibert, A. Darzins, Microalgal triacylglycerols as feedstocks for biofuel production: perspectives and advances, Plant J. Cell Mol. Biol. 54 (2008) 621–639. doi:10.1111/j.1365- 313X.2008.03492.x.

[7] R.E. Blankenship, D.M. Tiede, J. Barber, G.W. Brudvig, G. Fleming, M. Ghirardi, M.R. Gunner, W. Junge, D.M. Kramer, A. Melis, T.A. Moore, C.C. Moser, D.G. Nocera, A.J. Nozik, D.R. Ort, W.W. Parson, R.C. Prince, R.T. Sayre, Comparing Photosynthetic and Photovoltaic Efficiencies and Recognizing the Potential for Improvement, Science. 332 (2011) 805–809. doi:10.1126/science.1200165.

[8] S.S. Merchant, J. Kropat, B. Liu, J. Shaw, J. Warakanont, TAG, You’re it! Chlamydomonas as a reference organism for understanding algal triacylglycerol accumulation, Curr. Opin. Biotechnol. 23 (2012) 352–363. doi:10.1016/j.copbio.2011.12.001.

[9] R. Radakovits, R.E. Jinkerson, A. Darzins, M.C. Posewitz, Genetic Engineering of Algae for Enhanced Biofuel Production, Eukaryot. Cell. 9 (2010) 486–501. doi:10.1128/EC.00364-09.

[10] .H. Work, S. D’Adamo, R. Radakovits, R.E. Jinkerson, M.C. Posewitz, Improving photosynthesis and metabolic networks for the competitive production of phototroph- derived biofuels, Curr. Opin. Biotechnol. 23 (2012) 290–297. doi:10.1016/j.copbio.2011.11.022.

[11] M. Mitra, A. Melis, Optical properties of microalgae for enhanced biofuels production, Opt. Express. 16 (2008) 21807–21820. doi:10.1364/OE.16.021807.

85

[12] L.G. Elliott, C. Feehan, L.M.L. Laurens, P.T. Pienkos, A. Darzins, M.C. Posewitz, Establishment of a bioenergy-focused microalgal culture collection, Algal Res. 1 (2012) 102–113. doi:10.1016/j.algal.2012.05.002.

[13] J. Lü, C. Sheahan, P. Fu, Metabolic engineering of algae for fourth generation biofuels production, Energy Environ. Sci. 4 (2011) 2451. doi:10.1039/c0ee00593b.

[14] J.H. Mussgnug, S. Thomas Hall, J. Rupprecht, A. Foo, V. Klassen, A. McDowall, P.M. Schenk, O. Kruse, B. Hankamer, Engineering photosynthetic light capture: impacts on improved solar energy to biomass‐ conversion, Plant Biotechnol. J. 5 (n.d.) 802–814. doi:10.1111/j.1467-7652.2007.00285.x.

[15] R. Radakovits, P.M. Eduafo, M.C. Posewitz, Genetic engineering of fatty acid chain length in Phaeodactylum tricornutum, Metab. Eng. 13 (2011) 89–95. doi:10.1016/j.ymben.2010.10.003.

[16] B.A. Rasala, S.P. Mayfield, The microalga Chlamydomonas reinhardtii as a platform for the production of human protein therapeutics, Bioeng. Bugs. 2 (2011) 50–54. doi:10.4161/bbug.2.1.13423.

[17] J. Rupprecht, From systems biology to fuel—Chlamydomonas reinhardtii as a model for a systems biology approach to improve biohydrogen production, J. Biotechnol. 142 (2009) 10–20. doi:10.1016/j.jbiotec.2009.02.008.

[18] Z.T. Wang, N. Ullrich, S. Joo, S. Waffenschmidt, U. Goodenough, Algal Lipid Bodies: Stress Induction, Purification, and Biochemical Characterization in Wild-Type and Starchless Chlamydomonas reinhardtii, Eukaryot. Cell. 8 (2009) 1856–1868. doi:10.1128/EC.00272-09.

[19] V.H. Work, R. Radakovits, R.E. Jinkerson, J.E. Meuser, L.G. Elliott, D.J. Vinyard, L.M.L. Laurens, G.C. Dismukes, M.C. Posewitz, Increased Lipid Accumulation in the Chlamydomonas reinhardtii sta7-10 Starchless Isoamylase Mutant and Increased Carbohydrate Synthesis in Complemented Strains, Eukaryot. Cell. 9 (2010) 1251–1261. doi:10.1128/EC.00075-10.

[20] K. Anandarajah, G. Mahendraperumal, M. Sommerfeld, Q. Hu, Characterization of microalga Nannochloropsis sp. mutants for improved production of biofuels, Appl. Energy. 96 (2012) 371–377. doi:10.1016/j.apenergy.2012.02.057.

[21] Y. Chisti, Biodiesel from microalgae, Biotechnol. Adv. 25 (2007) 294–306. doi:10.1016/j.biotechadv.2007.02.001.

[22] H. Hu, K. Gao, Optimization of growth and fatty acid composition of a unicellular marine , Nannochloropsis sp., with enriched carbon sources, Biotechnol. Lett. 25 (2003) 421–425. doi:10.1023/A:1022489108980.

[23] J. Van Wagenen, T.W. Miller, S. Hobbs, P. Hook, B. Crowe, M. Huesemann, Effects of Light and Temperature on Fatty Acid Production in Nannochloropsis Salina, Energies. 5 (2012) 731–740. doi:10.3390/en5030731.

86

[24] B. Wawrik, B.H. Harriman, Rapid, colorimetric quantification of lipid from algal cultures, J. Microbiol. Methods. 80 (2010) 262–266. doi:10.1016/j.mimet.2010.01.016.

[25] O. Kilian, C.S.E. Benemann, K.K. Niyogi, B. Vick, High-efficiency homologous recombination in the oil-producing alga Nannochloropsis sp, Proc. Natl. Acad. Sci. (2011). doi:10.1073/pnas.1105861108.

[26] A. Minoda, R. Sakagami, F. Yagisawa, T. Kuroiwa, K. Tanaka, Improvement of Culture Conditions and Evidence for Nuclear Transformation by Homologous Recombination in a Red Alga, Cyanidioschyzon merolae 10D, Plant Cell Physiol. 45 (2004) 667–671. doi:10.1093/pcp/pch087.

[27] J.A. Casas-Mollano, J. Rohr, E.-J. Kim, E. Balassa, K. van Dijk, H. Cerutti, Diversification of the Core RNA Interference Machinery in Chlamydomonas reinhardtii and the Role of DCL1 in Transposon Silencing, Genetics. 179 (2008) 69–81. doi:10.1534/genetics.107.086546.

[28] H. Cerutti, F. Ibrahim, Turnover of Mature miRNAs and siRNAs in Plants and Algae, in: Regul. MicroRNAs, Springer, New York, NY, 2010: pp. 124–139. doi:10.1007/978-1- 4419-7823-3_11.

[29] H. Cerutti, X. Ma, J. Msanne, T. Repas, RNA-Mediated Silencing in Algae: Biological Roles and Tools for Analysis of Gene Function, Eukaryot. Cell. 10 (2011) 1164–1172. doi:10.1128/EC.05106-11.

[30] V.D. Riso, R. Raniello, F. Maumus, A. Rogato, C. Bowler, A. Falciatore, Gene silencing in the marine diatom Phaeodactylum tricornutum, Nucleic Acids Res. 37 (2009) e96–e96. doi:10.1093/nar/gkp448.

[31] A. Huang, L. He, G. Wang, Identification and characterization of microRNAs from Phaeodactylum tricornutum by high-throughput sequencing and bioinformatics analysis, BMC Genomics. 12 (2011) 337. doi:10.1186/1471-2164-12-337.

[32] A. Molnar, A. Bassett, E. Thuenemann, F. Schwach, S. Karkare, S. Ossowski, D. Weigel, D. Baulcombe, Highly specific gene silencing by artificial microRNAs in the unicellular alga Chlamydomonas reinhardtii, Plant J. 58 (n.d.) 165–174. doi:10.1111/j.1365- 313X.2008.03767.x.

[33] A. Molnár, F. Schwach, D.J. Studholme, E.C. Thuenemann, D.C. Baulcombe, miRNAs control in the single-cell alga Chlamydomonas reinhardtii, Nature. 447 (2007) 1126–1129. doi:10.1038/nature05903.

[34] J. Rohr, N. Sarkar, S. Balenger, B. Jeong, H. Cerutti, Tandem inverted repeat system for selection of effective transgenic RNAi strains in Chlamydomonas, Plant J. 40 (n.d.) 611– 621. doi:10.1111/j.1365-313X.2004.02227.x.

[35] E. Corteggiani Carpinelli, A. Telatin, N. itulo, C. Forcato, M. D’Angelo, R. Schiavon, A. Vezzi, G.M. Giacometti, T. Morosinotto, G. Valle, Chromosome Scale Genome Assembly and Transcriptome Profiling of Nannochloropsis gaditana in Nitrogen Depletion, Mol. Plant. 7 (2014) 323–335. doi:10.1093/mp/sst120.

87

[36] D. Wang, K. Ning, J. Li, J. Hu, D. Han, H. Wang, X. Zeng, X. Jing, Q. Zhou, X. Su, X. Chang, A. Wang, W. Wang, J. Jia, L. Wei, Y. Xin, Y. Qiao, R. Huang, J. Chen, B. Han, K. Yoon, R.T. Hill, Y. Zohar, F. Chen, Q. Hu, J. Xu, Nannochloropsis Genomes Reveal Evolution of Microalgal Oleaginous Traits, PLoS Genet. 10 (2014) e1004094. doi:10.1371/journal.pgen.1004094.

[37] A.S. Schwartz, R. Brown, I. Ajjawi, J. McCarren, S. Atilla, N. Bauman, T.H. Richardson, Complete Genome Sequence of the Model Oleaginous Alga Nannochloropsis gaditana CCMP1894, Genome Announc. 6 (2018) e01448-17. doi:10.1128/genomeA.01448-17.

[38] S. Boussiba, A. Vonshak, Z. Cohen, Y. Avissar, A. Richmond, Lipid and biomass production by the halotolerant microalga Nannochloropsis salina, Biomass. 12 (1987) 37– 47. doi:10.1016/0144-4565(87)90006-0.

[39] A. Converti, A.A. Casazza, E.Y. Ortiz, P. Perego, M. Del Borghi, Effect of temperature and nitrogen concentration on the growth and lipid content of Nannochloropsis oculata and Chlorella vulgaris for biodiesel production, Chem. Eng. Process. Process Intensif. 48 (2009) 1146–1151. doi:10.1016/j.cep.2009.03.006.

[40] D. Simionato, E. Sforza, E. Corteggiani Carpinelli, A. Bertucco, G.M. Giacometti, T. Morosinotto, Acclimation of Nannochloropsis gaditana to different illumination regimes: Effects on lipids accumulation, Bioresour. Technol. 102 (2011) 6026–6032. doi:10.1016/j.biortech.2011.02.100.

[41] L. Rodolfi, G. Chini Zittelli, N. Bassi, G. Padovani, N. Biondi, G. Bonini, M.R. Tredici, Microalgae for oil: Strain selection, induction of lipid synthesis and outdoor mass cultivation in a low-cost photobioreactor, Biotechnol. Bioeng. 102 (2009) 100–112. doi:10.1002/bit.22033.

[42] L.M. Lubián, O. Montero, I. Moreno-Garrido, I.E. Huertas, C. Sobrino, M.G. Valle, G. Parés, Nannochloropsis (Eustigmatophyceae) as source of commercially valuable pigments, J. Appl. Phycol. 12 (2000) 249–255. doi:10.1023/A:1008170915932.

[43] J.M.S. Rocha, J.E.C. Garcia, M.H.F. Henriques, Growth aspects of the marine microalga Nannochloropsis gaditana, Biomol. Eng. 20 (2003) 237–242. doi:10.1016/S1389- 0344(03)00061-3.

[44] R.A. Andersen, R.W. Brett, D. Potter, J.P. Sexton, Phylogeny of the Eustigmatophyceae Based upon 18S rDNA, with Emphasis on Nannochloropsis, Protist. 149 (1998) 61–74. doi:10.1016/S1434-4610(98)70010-0.

[45] A. Vieler, G. Wu, C.-H. Tsai, B. Bullard, A.J. Cornish, C. Harvey, I.-B. Reca, C. Thornburg, R. Achawanantakun, C.J. Buehl, M.S. Campbell, D. Cavalier, K.L. Childs, T.J. Clark, R. Deshpande, E. Erickson, A. Armenia Ferguson, W. Handee, Q. Kong, X. Li, B. Liu, S. Lundback, C. Peng, R.L. Roston, Sanjaya, J.P. Simpson, A. TerBush, J. Warakanont, S. Zäuner, E.M. Farre, E.L. Hegg, N. Jiang, M.-H. Kuo, Y. Lu, K.K. Niyogi, J. Ohlrogge, K.W. Osteryoung, Y. Shachar-Hill, B.B. Sears, Y. Sun, H. Takahashi, M. Yandell, S.-H. Shiu, C. Benning, Genome, Functional Gene Annotation, and Nuclear Transformation of the Oleaginous Alga Nannochloropsis oceanica CCMP1779, PLoS Genet. 8 (2012) e1003064. doi:10.1371/journal.pgen.1003064.

88

[46] P.A. Hodgson, R.J. Henderson, J.R. Sargent, J.W. Leftley, Patterns of variation in the lipid class and fatty acid composition of Nannochloropsis oculata (Eustigmatophyceae) during batch culture, J. Appl. Phycol. 3 (1991) 169–181. doi:10.1007/BF00003699.

[47] R.E. Jinkerson, R. Radakovits, M.C. Posewitz, Genomic insights from the oleaginous model alga Nannochloropsis gaditana, Bioengineered. 4 (2013) 37–43. doi:10.4161/bioe.21880.

[48] Wang Qintao, Lu Yandu, Xin Yi, Wei Li, Huang Shi, Xu Jian, Genome editing of model oleaginous microalgae Nannochloropsis spp. by CRISPR/Cas9, Plant J. 88 (2016) 1071– 1081. doi:10.1111/tpj.13307.

[49] L.A. Zaslavskaia, J.C. Lippmeier, C. Shih, D. Ehrhardt, A.R. Grossman, K.E. Apt, Trophic Conversion of an Obligate Photoautotrophic Organism Through Metabolic Engineering, Science. 292 (2001) 2073–2075. doi:10.1126/science.160015.

[50] A. Hallmann, M. Sumper, The Chlorella hexose/H+ symporter is a useful selectable marker and biochemical reagent when expressed in Volvox, Proc. Natl. Acad. Sci. 93 (1996) 669–673. doi:10.1073/pnas.93.2.669.

[51] A. Doebbe, J. Rupprecht, J. Beckmann, J.H. Mussgnug, A. Hallmann, B. Hankamer, O. Kruse, Functional integration of the HUP1 hexose symporter gene into the genome of C. reinhardtii: Impacts on biological H2 production, J. Biotechnol. 131 (2007) 27–33. doi:10.1016/j.jbiotec.2007.05.017.

[52] C.-C. Zhang, R. Jeanjean, F. Joset, Obligate phototrophy in cyanobacteria: more than a lack of sugar transport, FEMS Microbiol. Lett. 161 (1998) 285–292. doi:10.1111/j.1574- 6968.1998.tb12959.x.

[53] A.J. Smith, J. London, R.Y. Stanier, Biochemical Basis of Obligate Autotrophy in Blue- Green Algae and Thiobacilli, J. Bacteriol. 94 (1967) 972–983.

[54] M.I. Mendzhul, T.G. Lysenko, O.A. Shainskaia, I.V. Busakhina, [Activity of tricarboxylic acid cycle enzymes in cyanobacteria Spirulina platensis], Mikrobiolohichnyi Zhurnal Kiev Ukr. 1993. 62 (2000) 3–10.

[55] A.H. NEILSONM, O. HOLM-HANSEN, R.A. LEWIN, An Obligately Autotrophic Mutant of Chlamydomonas dysosmos i a Biochemical Elucidation, Microbiology. 71 (1972) 141– 148. doi:10.1099/00221287-71-1-141.

[56] J. PEARCE, C.K. LEACH, N.G. CARR, The Incomplete Tricarboxylic Acid Cycle in the Blue-green Alga Anabaena Variabilis, Microbiology. 55 (1969) 371–378. doi:10.1099/00221287-55-3-371.

[57] S.J. Van Dien, Y. Okubo, M.T. Hough, N. Korotkova, T. Taitano, M.E. Lidstrom, Reconstruction of C3 and C4 metabolism in Methylobacterium extorquens AM1 using transposon mutagenesis, Microbiology. 149 (2003) 601–609. doi:10.1099/mic.0.25955-0.

[58] M.H. Spalding, Modulation of low carbon dioxide inducible proteins (lci) for increased biomass production and photosynthesis, US20130007916A1, 2013. https://patents.google.com/patent/US20130007916A1/en (accessed June 11, 2018).

89

[59] J.E.W. Polle, S.-D. Kanakagiri, A. Melis, tla1, a DNA insertional transformant of the green alga Chlamydomonas reinhardtii with a truncated light-harvesting chlorophyll antenna size, Planta. 217 (2003) 49–59. doi:10.1007/s00425-002-0968-1.

[60] H. Kirst, J.G. Garcia-Cerdan, A. Zurbriggen, T. Ruehle, A. Melis, Truncated Photosystem Chlorophyll Antenna Size in the Green Microalga Chlamydomonas reinhardtii upon Deletion of the TLA3-CpSRP43 Gene, Plant Physiol. 160 (2012) 2251–2260. doi:10.1104/pp.112.206672.

[61] H. Kirst, J.G. García-Cerdán, A. Zurbriggen, A. Melis, Assembly of the Light-Harvesting Chlorophyll Antenna in the Green Alga Chlamydomonas reinhardtii Requires Expression of the TLA2-CpFTSY Gene, Plant Physiol. 158 (2012) 930–945. doi:10.1104/pp.111.189910.

[62] M. Mitra, H. Kirst, D. Dewez, A. Melis, Modulation of the light-harvesting chlorophyll antenna size in Chlamydomonas reinhardtii by TLA1 gene over-expression and RNA interference, Phil Trans R Soc B. 367 (2012) 3430–3443. doi:10.1098/rstb.2012.0229.

[63] Y. Li, D. Han, G. Hu, D. Dauvillee, M. Sommerfeld, S. Ball, Q. Hu, Chlamydomonas starchless mutant defective in ADP-glucose pyrophosphorylase hyper-accumulates triacylglycerol, Metab. Eng. 12 (2010) 387–391. doi:10.1016/j.ymben.2010.02.002.

[64] Y. Li, D. Han, G. Hu, M. Sommerfeld, Q. Hu, Inhibition of starch synthesis results in overproduction of lipids in Chlamydomonas reinhardtii, Biotechnol. Bioeng. 107 (n.d.) 258–268. doi:10.1002/bit.22807.

[65] M. Siaut, S. Cuiné, C. Cagnon, B. Fessler, M. Nguyen, P. Carrier, A. Beyly, F. Beisson, C. Triantaphylidès, Y. Li-Beisson, G. Peltier, Oil accumulation in the model green alga Chlamydomonas reinhardtii: characterization, variability between common laboratory strains and relationship with starch reserves, BMC Biotechnol. 11 (2011) 7. doi:10.1186/1472-6750-11-7.

[66] W. Huang, I. Haferkamp, B. Lepetit, M. Molchanova, S. Hou, W. Jeblick, C.R. Bártulos, P.G. Kroth, Reduced vacuolar β-1,3-glucan synthesis affects carbohydrate metabolism as well as plastid homeostasis and structure in Phaeodactylum tricornutum, Proc. Natl. Acad. Sci. (2018) 201719274. doi:10.1073/pnas.1719274115.

[67] W. Huang, C.R. Bártulos, P.G. Kroth, Diatom Vacuolar 1,6-β-Transglycosylases can Functionally Complement the Respective Yeast Mutants, J. Eukaryot. Microbiol. 63 (2016) 536–546. doi:10.1111/jeu.12298.

[68] M. Hildebrand, K. Manandhar-Shrestha, R. Abbriano, Effects of chrysolaminarin synthase knockdown in the diatom Thalassiosira pseudonana: Implications of reduced carbohydrate storage relative to green algae, Algal Res. 23 (2017) 66–77. doi:10.1016/j.algal.2017.01.010.

[69] G. Michel, T. Tonon, D. Scornet, J.M. Cock, B. Kloareg, Central and storage carbon metabolism of the brown alga Ectocarpus siliculosus: insights into the origin and evolution of storage carbohydrates in Eukaryotes, New Phytol. 188 (2010) 67–81. doi:10.1111/j.1469-8137.2010.03345.x.

90

[70] S.C.L. Yeo, L. Xu, J. Ren, V.J. Boulton, M.D. Wagle, C. Liu, G. Ren, P. Wong, R. Zahn, P. Sasajala, H. Yang, R.C. Piper, A.L. Munn, Vps20p and Vta1p interact with Vps4p and function in multivesicular body sorting and endosomal transport in Saccharomyces cerevisiae, J Cell Sci. 116 (2003) 3957–3970. doi:10.1242/jcs.00751.

[71] A. Ben-Amotz, M. Avron, The Role of Glycerol in the Osmotic Regulation of the Halophilic Alga Dunaliella parva, Plant Physiol. 51 (1973) 875–878. doi:10.1104/pp.51.5.875.

[72] W.S. Adair, S.A. Steinmetz, D.M. Mattson, U.W. Goodenough, J.E. Heuser, Nucleated assembly of Chlamydomonas and Volvox cell walls., J. Cell Biol. 105 (1987) 2373–2382. doi:10.1083/jcb.105.5.2373.

[73] T. Yamada, K. Sakaguchi, Comparative studies onChlorella cell walls: Induction of protoplast formation, Arch. Microbiol. 132 (1982) 10–13. doi:10.1007/BF00690809.

[74] R.M. Brown, W.W. Franke, H. Kleinig, H. Falk, P. Sitte, SCALE FORMATION IN CHRYSOPHYCEAN ALGAE: I. Cellulosic and Noncellulosic Wall Components Made by the Golgi Apparatus, J. Cell Biol. 45 (1970) 246–271. doi:10.1083/jcb.45.2.246.

[75] T.L. Weiss, R. Roth, C. Goodson, S. Vitha, I. Black, P. Azadi, J. Rusch, A. Holzenburg, T.P. Devarenne, U. Goodenough, Colony Organization in the Green Alga Botryococcus braunii (Race B) Is Specified by a Complex Extracellular Matrix, Eukaryot. Cell. 11 (2012) 1424–1440. doi:10.1128/EC.00184-12.

[76] E. Loos, D. Meindl, Composition of the cell wall of Chlorella fusca, Planta. 156 (1982) 270–273. doi:10.1007/BF00393735.

[77] B. Allard, J. Templier, C. Largeau, An improved method for the isolation of artifact-free algaenans from microalgae, Org. Geochem. 28 (1998) 543–548. doi:10.1016/S0146- 6380(98)00012-6.

[78] F. Gelin, I. Boogers, A.A.M. Noordeloos, J.S.S. Damste, R. Riegman, J.W. De Leeuw, Resistant biomacromolecules in marine microalgae of the classes Eustigmatophyceae and Chlorophyceae: Geochemical implications, Org. Geochem. 26 (1997) 659–675. doi:10.1016/S0146-6380(97)00035-1.

[79] R.B. Kodner, R.E. Summons, A.H. Knoll, Phylogenetic investigation of the aliphatic, non- hydrolyzable biopolymer algaenan, with a focus on green algae, Org. Geochem. 40 (2009) 854–862. doi:10.1016/j.orggeochem.2009.05.003.

[80] A.J. Simpson, X. Zang, R. Kramer, P.G. Hatcher, New insights on the structure of algaenan from Botryoccocus braunii race A and its hexane insoluble botryals based on multidimensional NMR spectroscopy and electrospray–mass spectrometry techniques, Phytochemistry. 62 (2003) 783–796. doi:10.1016/S0031-9422(02)00628-3.

[81] P. Blokker, S. Schouten, H. van den Ende, J.W. de Leeuw, P.G. Hatcher, J.S. Sinninghe Damsté, Chemical structure of algaenans from the fresh water algae Tetraedron minimum, Scenedesmus communis and Pediastrum boryanum, Org. Geochem. 29 (1998) 1453–1468. doi:10.1016/S0146-6380(98)00111-9.

91

[82] R. Koivikko, J. Loponen, T. Honkanen, V. Jormalainen, CONTENTS OF SOLUBLE, CELL-WALL-BOUND AND EXUDED PHLOROTANNINS IN THE BROWN ALGA Fucus vesiculosus, WITH IMPLICATIONS ON THEIR ECOLOGICAL FUNCTIONS, J. Chem. Ecol. 31 (2005) 195–212. doi:10.1007/s10886-005-0984-2.

[83] P.T. Martone, J.M. Estevez, F. Lu, K. Ruel, M.W. Denny, C. Somerville, J. Ralph, Discovery of Lignin in Seaweed Reveals Convergent Evolution of Cell-Wall Architecture, Curr. Biol. 19 (2009) 169–175. doi:10.1016/j.cub.2008.12.031.

[84] D.P. Delmer, Cellulose Biosynthesis, Annu. Rev. Plant Physiol. 38 (1987) 259–290. doi:10.1146/annurev.pp.38.060187.001355.

[85] E. Kapaun, W. Reisser, A chitin-like glycan in the cell wall of a Chlorella sp. (Chlorococcales, Chlorophyceae), Planta. 197 (1995) 577–582. doi:10.1007/BF00191563.

[86] D.S. Domozych, K.D. Stewart, K.R. Mattox, The comparative aspects of cell wall chemistry in the green algae (chlorophyta), J. Mol. Evol. 15 (1980) 1–12. doi:10.1007/BF01732578.

[87] D.S. Domozych, A. Serfis, S.N. Kiemle, M.R. Gretz, The structure and biochemistry of charophycean cell walls: I. Pectins of Penium margaritaceum, Protoplasma. 230 (2007) 99–115. doi:10.1007/s00709-006- 0197-8.

[88] O. Berteau, B. Mulloy, Sulfated fucans, fresh perspectives: structures, functions, and biological properties of sulfated fucans and an overview of enzymes active toward this class of polysaccharide, Glycobiology. 13 (2003) 29R-40R. doi:10.1093/glycob/cwg058.

[89] G. Michel, T. Tonon, D. Scornet, J.M. Cock, B. Kloareg, The cell wall polysaccharide metabolism of the brown alga Ectocarpus siliculosus. Insights into the evolution of extracellular matrix polysaccharides in Eukaryotes, New Phytol. 188 (2010) 82–97. doi:10.1111/j.1469-8137.2010.03374.x.

[90] M. Lahaye, A. Robic, Structure and Functional Properties of Ulvan, a Polysaccharide from Green Seaweeds, Biomacromolecules. 8 (2007) 1765–1774. doi:10.1021/bm061185q.

[91] G. Michel, W. Helbert, R. Kahn, O. Dideberg, B. Kloareg, The Structural Bases of the Processive Degradation of ι-Carrageenan, a Main Cell Wall Polysaccharide of Red Algae, J. Mol. Biol. 334 (2003) 421–433. doi:10.1016/j.jmb.2003.09.056.

[92] C.W. Ford, E. Percival, 551. Polysaccharides synthesised by Monodus subterraneus. Part II. The cell-wall glucan, J. Chem. Soc. Resumed. 0 (1965) 3014–3016. doi:10.1039/JR9650003014.

[93] C.M.P.G. Dore, M.G. das C. Faustino Alves, L.S.E. Pofírio Will, T.G. Costa, D.A. Sabry, L.A.R. de Souza Rêgo, C.M. Accardo, H.A.O. Rocha, L.G.A. Filgueira, E.L. Leite, A sulfated polysaccharide, fucans, isolated from brown algae Sargassum vulgare with

92

anticoagulant, antithrombotic, antioxidant and anti-inflammatory effects, Carbohydr. Polym. 91 (2013) 467–475. doi:10.1016/j.carbpol.2012.07.075.

[94] H.H. Tønnesen, J. Karlsen, Alginate in Drug Delivery Systems, Drug Dev. Ind. Pharm. 28 (2002) 621–630. doi:10.1081/DDC-120003853.

[95] M.J. Selig, W.S. Adney, M.E. Himmel, S.R. Decker, The impact of cell wall acetylation on corn stover hydrolysis by cellulolytic and xylanolytic enzymes, Cellulose. 16 (2009) 711– 722. doi:10.1007/s10570-009-9322-0.

[96] T. Wu, S. Zivanovic, Determination of the degree of acetylation (DA) of chitin and chitosan by an improved first derivative UV method, Carbohydr. Polym. 73 (2008) 248– 253. doi:10.1016/j.carbpol.2007.11.024.

[97] S. Bauer, P. Vasu, S. Persson, A.J. Mort, C.R. Somerville, Development and application of a suite of polysaccharide-degrading enzymes for analyzing plant cell walls, Proc. Natl. Acad. Sci. 103 (2006) 11417–11422. doi:10.1073/pnas.0604632103.

[98] J.B. Wehr, N.W. Menzies, F.P. C. Blamey, Inhibition of cell-wall autolysis and pectin degradation by cations, Plant Physiol. Biochem. 42 (2004) 485–492. doi:10.1016/j.plaphy.2004.05.006.

[99] H. Takeda, Sugar Composition of the Cell Wall and the of Chlorella (chlorophyceae)1, J. Phycol. 27 (n.d.) 224–232. doi:10.1111/j.0022-3646.1991.00224.x.

[100] H. Takeda, Cell wall sugars of some Scenedesmus species, Phytochemistry. 42 (1996) 673–675. doi:10.1016/0031-9422(95)00952-3.

[101] T. Punnett, E.C. Derrenbacker, The Amino Acid Composition of Algal Cell Walls, Microbiology. 44 (1966) 105–114. doi:10.1099/00221287-44-1-105.

[102] J. Burczyk, B. Śmietana, K. Termińska-Pabis, M. Zych, P. Kowalowski, Comparison of nitrogen content amino acid composition and glucosamine content of cell walls of various chlorococcalean algae, Phytochemistry. 51 (1999) 491–497. doi:10.1016/S0031- 9422(99)00063-1.

[103] M.J. Kieliszewski, D.T.A. Lamport, L. Tan, M.C. Cannon, Hydroxyproline-Rich Glycoproteins: Form and Function, in: Annu. Plant Rev., American Cancer Society, 2018: pp. 321–342. doi:10.1002/9781119312994.apr0442.

[104] F.A. Baylson, B.W. Stevens, D.S. Domozych, Composition and Synthesis of the Pectin and Protein Components of the Cell Wall of Closterium Acerosum (chlorophyta), J. Phycol. 37 (n.d.) 796–809. doi:10.1046/j.1529-8817.2001.00179.x.

[105] H.G. Gerken, B. Donohoe, E.P. Knoshaug, Enzymatic cell wall degradation of Chlorellavulgaris and other microalgae for biofuels production, Planta. 237 (2013) 239–253. doi:10.1007/s00425-012-1765-0.

93

[106] S.-B. Wang, Q. Hu, M. Sommerfeld, F. Chen, Cell wall proteomics of the green alga Haematococcus pluvialis (Chlorophyceae), PROTEOMICS. 4 (n.d.) 692–708. doi:10.1002/pmic.200300634.

[107] A.W. Atkinson, B.E.S. Gunning, P.C.L. John, Sporopollenin in the cell wall of Chlorella and other algae: Ultrastructure, chemistry, and incorporation of 14C-acetate, studied in synchronous cultures, Planta. 107 (1972) 1–32. doi:10.1007/BF00398011.

[108] C.E. Jeffree, The Fine Structure of the Plant Cuticle, in: Annu. Plant Rev. Vol. 23 Biol. Plant Cuticle, Wiley-Blackwell, 2007: pp. 11–125. doi:10.1002/9780470988718.ch2.

[109] B. Allard, M.-N. Rager, J. Templier, Occurrence of high molecular weight lipids (C80+) in the trilaminar outer cell walls of some freshwater microalgae. A reappraisal of algaenan structure, Org. Geochem. 33 (2002) 789–801. doi:10.1016/S0146-6380(02)00029-3.

[110] L. Rodolfi, G.C. Zittelli, L. Barsanti, G. Rosati, M.R. Tredici, Growth medium recycling in Nannochloropsis sp. mass cultivation, Biomol. Eng. 20 (2003) 243–248. doi:10.1016/S1389-0344(03)00063-7.

[111] M.J. Scholz, T.L. Weiss, R.E. Jinkerson, J. Jing, R. Roth, U. Goodenough, M.C. Posewitz, H.G. Gerken, Ultrastructure and Composition of the Nannochloropsis gaditana Cell Wall, Eukaryot. Cell. (2014) EC.00183-14. doi:10.1128/EC.00183-14.

[112] A. Barry, A. Wolfe, C. English, C. Ruddick, D. Lambert, 2016 National Algal Biofuels Technology Review, USDOE Office of Energy Efficiency and Renewable Energy (EERE), Bioenergy Technologies Office (EE-3B), 2016.

[113] J. Olivares, National Alliance for Advanced Biofuels and Bioproducts Synopsis (NAABB) Final Report, US DOE-EERE Biotechnol. Off. (2014).

[114] M.H. Langholtz, B.J. Stokes, L.M. Eaton, 2016 Billion-Ton Report: Advancing Domestic Resources for a Thriving Bioeconomy, EERE Publication and Product Library, 2016.

[115] K. Miazek, C. Remacle, A. Richel, D. Goffin, Effect of Lignocellulose Related Compounds on Microalgae Growth and Product Biosynthesis: A Review, Energies. 7 (2014) 4446– 4481. doi:10.3390/en7074446.

[116] O. Blifernez-Klassen, V. Klassen, A. Doebbe, K. Kersting, P. Grimm, L. Wobbe, O. Kruse, Cellulose degradation and assimilation by the unicellular phototrophic eukaryote Chlamydomonas reinhardtii, Nat. Commun. 3 (2012) 1214. doi:10.1038/ncomms2210.

[117] N. Sueoka, Mitotic Replication of Deoxyribonucleic Acid in Chlamydomonas Reinhardi, Proc. Natl. Acad. Sci. 46 (1960) 83–91.

[118] S. Pattathil, U. Avci, J.S. Miller, M.G. Hahn, Immunological Approaches to Plant Cell Wall and Biomass Characterization: Glycome Profiling, in: Biomass Convers., Humana Press, Totowa, NJ, 2012: pp. 61–72. doi:10.1007/978-1-61779-956-3_6.

94

[119] M. Liu, X. Mao, C. Ye, H. Huang, J.K. Nicholson, J.C. Lindon, Improved WATERGATE Pulse Sequences for Solvent Suppression in NMR Spectroscopy, J. Magn. Reson. 132 (1998) 125–129. doi:10.1006/jmre.1998.1405.

[120] S. Van Wychen, L.M.L. Laurens, Determination of Total Lipids as Fatty Acid Methyl Esters (FAME) by in situ Transesterification: Laboratory Analytical Procedure (LAP), National Renewable Energy Laboratory (NREL), Golden, CO., 2013.

[121] P.H. O’Farrell, High resolution two-dimensional electrophoresis of proteins., J. Biol. Chem. 250 (1975) 4007–4021.

[122] A. Burgess-Cassler, J.J. Johansen, D.A. Santek, J.R. Ide, N.C. Kendrick, Computerized quantitative analysis of coomassie-blue-stained serum proteins separated by two- dimensional electrophoresis., Clin. Chem. 35 (1989) 2297–2304.

[123] B.R. Oakley, D.R. Kirsch, N.R. Morris, A simplified ultrasensitive silver stain for detecting proteins in polyacrylamide gels, Anal. Biochem. 105 (1980) 361–363.

[124] A. Shevchenko, M. Wilm, O. Vorm, M. Mann, Mass Spectrometric Sequencing of Proteins from Silver-Stained Polyacrylamide Gels, Anal. Chem. 68 (1996) 850–858. doi:10.1021/ac950914h.

[125] C.C. Darie, K. Deinhardt, G. Zhang, H.S. Cardasis, M.V. Chao, T.A. Neubert, Identifying transient protein–protein interactions in EphB2 signaling by blue native PAGE and mass spectrometry, PROTEOMICS. 11 (2011) 4514–4528. doi:10.1002/pmic.201000819.

[126] I. Sokolowska, A.G. Woods, M.A. Gawinowicz, U. Roy, C.C. Darie, Identification of Potential Tumor Differentiation Factor (TDF) Receptor from Steroid-responsive and Steroid-resistant Breast Cancer Cells, J. Biol. Chem. 287 (2012) 1719–1733. doi:10.1074/jbc.M111.284091.

[127] I. Sokolowska, C. Dorobantu, A.G. Woods, A. Macovei, N. Branza-Nichita, C.C. Darie, Proteomic analysis of plasma membranes isolated from undifferentiated and differentiated HepaRG cells, Proteome Sci. 10 (2012) 47. doi:10.1186/1477-5956-10-47.

[128] I. Sokolowska, M.A. Gawinowicz, A.G.N. Wetie, C.C. Darie, Disulfide proteomics for identification of extracellular or secreted proteins, ELECTROPHORESIS. 33 (2012) 2527–2536. doi:10.1002/elps.201200182.

[129] S. Bennett, Solexa Ltd, Pharmacogenomics. 5 (2004) 433–438. doi:10.1517/14622416.5.4.433.

[130] M. Margulies, M. Egholm, W.E. Altman, S. Attiya, J.S. Bader, L.A. Bemben, J. Berka, M.S. Braverman, Y.-J. Chen, Z. Chen, S.B. Dewell, L. Du, J.M. Fierro, X.V. Gomes, B.C. Godwin, W. He, S. Helgesen, C.H. Ho, G.P. Irzyk, S.C. Jando, M.L.I. Alenquer, T.P. Jarvie, K.B. Jirage, J.-B. Kim, J.R. Knight, J.R. Lanza, J.H. Leamon, S.M. Lefkowitz, M. Lei, J. Li, K.L. Lohman, H. Lu, V.B. Makhijani, K.E. McDade, M.P. McKenna, E.W. Myers, E. Nickerson, J.R. Nobile, R. Plant, B.P. Puc, M.T. Ronan, G.T. Roth, G.J. Sarkis, J.F. Simons, J.W. Simpson, M. Srinivasan, K.R. Tartaro, A. Tomasz, K.A. Vogt, G.A. Volkmer, S.H. Wang, Y. Wang, M.P. Weiner, P. Yu, R.F. Begley, J.M. Rothberg, Genome

95

sequencing in microfabricated high-density picolitre reactors, Nature. 437 (2005) 376– 380. doi:10.1038/nature03959.

[131] D.R. Zerbino, E. Birney, Velvet: Algorithms for de novo short read assembly using de Bruijn graphs, Genome Res. 18 (2008) 821–829. doi:10.1101/gr.074492.107.

[132] B. Ewing, L. Hillier, M.C. Wendl, P. Green, Base-Calling of Automated Sequencer Traces UsingPhred. I. Accuracy Assessment, Genome Res. 8 (1998) 175–185. doi:10.1101/gr.8.3.175.

[133] C. Han, P. Chain, Finishing Repetitive Regions Automatically with Dupfinisher., in: BIOCOMP, 2006: pp. 142–147.

[134] D. Gordon, C. Abajian, P. Green, Consed: A Graphical Tool for Sequence Finishing, Genome Res. 8 (1998) 195–202. doi:10.1101/gr.8.3.195.

[135] C. Holt, M. Yandell, MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects, BMC Bioinformatics. 12 (2011) 491. doi:10.1186/1471-2105-12-491.

[136] C. Gao, Y. Wang, Y. Shen, D. Yan, X. He, J. Dai, Q. Wu, Oil accumulation mechanisms of the oleaginous microalga Chlorella protothecoides revealed through its genome, transcriptomes, and proteomes, BMC Genomics. 15 (2014) 582. doi:10.1186/1471-2164- 15-582.

[137] A. Marchler-Bauer, M.K. Derbyshire, N.R. Gonzales, S. Lu, F. Chitsaz, L.Y. Geer, R.C. Geer, J. He, M. Gwadz, D.I. Hurwitz, C.J. Lanczycki, F. Lu, G.H. Marchler, J.S. Song, N. Thanki, Z. Wang, R.A. Yamashita, D. Zhang, C. Zheng, S.H. Bryant, CDD: NCBI’s conserved domain database, Nucleic Acids Res. 43 (2015) D222–D226. doi:10.1093/nar/gku1221.

[138] O. Emanuelsson, H. Nielsen, S. Brunak, G. von Heijne, Predicting Subcellular Localization of Proteins Based on their N-terminal Amino Acid Sequence, J. Mol. Biol. 300 (2000) 1005–1016. doi:10.1006/jmbi.2000.3903.

[139] S. Guindon, J.-F. Dufayard, V. Lefort, M. Anisimova, W. Hordijk, O. Gascuel, New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0, Syst. Biol. 59 (2010) 307–321. doi:10.1093/sysbio/syq010.

[140] C.-C. Lo, P.S.G. Chain, Rapid evaluation and quality control of next generation sequencing data with FaQCs, BMC Bioinformatics. 15 (2014). doi:10.1186/s12859-014- 0366-2.

[141] A. Isogai, NMR analysis of cellulose dissolved in aqueous NaOH solutions, Cellulose. 4 (1997) 99–107. doi:10.1023/A:1018471419692.

[142] U.C.C. of Algae, UTEX 25 Chlorella protothecoides, UTEX Cult. Collect. Algae. (n.d.). https://utex.org/products/utex-0025 (accessed February 19, 2018).

96

[143] E.G. Maxwell, N.J. Belshaw, K.W. Waldron, V.J. Morris, Pectin – An emerging new bioactive food polysaccharide, Trends Food Sci. Technol. 24 (2012) 64–73. doi:10.1016/j.tifs.2011.11.002.

[144] M. Hennig, J.N. Jansonius, A.C. Terwisscha van Scheltinga, B.W. Dijkstra, B. Schlesier, Crystal Structure of Concanavalin B at 1.65 Å Resolution. An “Inactivated” Chitinase from Seeds ofCanavalia ensiformis, J. Mol. Biol. 254 (1995) 237–246. doi:10.1006/jmbi.1995.0614.

[145] P.J. Brown, A.C. Gill, P.G. Nugent, J.H. McVey, F.M. Tomley, Domains of invasion organelle proteins from apicomplexan parasites are homologous with the Apple domains of blood coagulation factor XI and plasma pre-kallikrein and are members of the PAN module superfamily, FEBS Lett. 497 (2001) 31–38.

[146] B. Henrissat, M. Claeyssens, P. Tomme, L. Lemesle, J.-P. Mornon, Cellulase families revealed by hydrophobic cluster analysis, Gene. 81 (1989) 83–95. doi:10.1016/0378- 1119(89)90339-9.

[147] E. Poliner, N. Panchy, L. Newton, G. Wu, A. Lapinsky, B. Bullard, A. Zienkiewicz, C. Benning, S.-H. Shiu, E.M. Farré, Transcriptional coordination of physiological responses in Nannochloropsis oceanicaCCMP1779 under light/dark cycles, Plant J. 83 (2015) 1097–1113. doi:10.1111/tpj.12944.

[148] J. Jia, D. Han, H.G. Gerken, Y. Li, M. Sommerfeld, Q. Hu, J. Xu, Molecular mechanisms for photosynthetic carbon partitioning into storage neutral lipids in Nannochloropsis oceanica under nitrogen-depletion conditions, Algal Res. 7 (2015) 66–77. doi:10.1016/j.algal.2014.11.005.

[149] F. Daboussi, S. Leduc, A. Maréchal, G. Dubois, V. Guyot, C. Perez-michaut, A. Amato, A. Falciatore, A. Juillerat, M. Beurdeley, D.F. Voytas, L. Cavarec, P. Duchateau, Genome engineering empowers the diatom Phaeodactylum tricornutum for biotechnology, Nat. Commun. Lond. 5 (2014) 3831. doi:http://dx.doi.org/10.1038/ncomms4831.

[150] B.-H. Zhu, H.-P. Shi, G.-P. Yang, N.-N. Lv, M. Yang, K.-H. Pan, Silencing UDP-glucose pyrophosphorylase gene in Phaeodactylum tricornutum affects carbon allocation, New Biotechnol. 33 (2016) 237–244. doi:10.1016/j.nbt.2015.06.003.

[151] G. Michel, T. Tonon, D. Scornet, J.M. Cock, B. Kloareg, Central and storage carbon metabolism of the brown alga Ectocarpus siliculosus: insights into the origin and evolution of storage carbohydrates in Eukaryotes, New Phytol. 188 (2010) 67–81. doi:10.1111/j.1469-8137.2010.03345.x.

[152] C. Heiss, J. Stacey Klutts, Z. Wang, T.L. Doering, P. Azadi, The structure of Cryptococcus neoformans galactoxylomannan contains β-d-glucuronic acid, Carbohydr. Res. 344 (2009) 915–920. doi:10.1016/j.carres.2009.03.003.

[153] S.M. Myklestad, Production, Chemical Structure, Metabolism, and Biological Function of the (1→3)-Linked, β3-D-Glucans in Diatoms, Biol. Oceanogr. 6 (1989) 313–326. doi:10.1080/01965581.1988.10749534.

97

[154] S.M. Read, G. Currie, A. Bacic, Analysis of the structural heterogeneity of laminarin by electrospray-ionisation-mass spectrometry, Carbohydr. Res. 281 (1996) 187–201. doi:10.1016/0008-6215(95)00350-9.

[155] M.A. Caballero, D. Jallet, L. Shi, C. Rithner, Y. Zhang, G. Peers, Quantification of chrysolaminarin from the model diatom Phaeodactylum tricornutum, Algal Res. 20 (2016) 180–188. doi:10.1016/j.algal.2016.10.008.

[156] D.M. Stark, K.P. Timmerman, G.F. Barry, J. Preiss, G.M. Kishore, Regulation of the Amount of Starch in Plant Tissues by ADP Glucose Pyrophosphorylase, Science. 258 (1992) 287–292. doi:10.1126/science.258.5080.287.

[157] T. Kurita, Y. Noda, T. Takagi, M. Osumi, K. Yoda, Kre6 Protein Essential for Yeast Cell Wall β-1,6-Glucan Synthesis Accumulates at Sites of Polarized Growth, J. Biol. Chem. 286 (2011) 7429–7438. doi:10.1074/jbc.M110.174060.

[158] T. Mio, T. Yamada-Okabe, T. Yabe, T. Nakajima, M. Arisawa, H. Yamada-Okabe, Isolation of the Candida albicans homologs of Saccharomyces cerevisiae KRE6 and SKN1: expression and physiological function., J. Bacteriol. 179 (1997) 2363–2372.

[159] C. Boone, S.S. Sommer, A. Hensel, H. Bussey, Yeast KRE genes provide evidence for a pathway of cell wall beta-glucan assembly., J. Cell Biol. 110 (1990) 1833–1843. doi:10.1083/jcb.110.5.1833.

[160] J. Pitcher, C. Smythe, P. Cohen, Glycogenin is the priming glucosyltransferase required for the initiation of glycogen biogenesis in rabbit skeletal muscle, Eur. J. Biochem. 176 (1988) 391–395. doi:10.1111/j.1432-1033.1988.tb14294.x.

[161] E.G.V. Percival, A.G. Ross, 156. The constitution of laminarin. Part II. The soluble laminarin of Laminaria digitata, J. Chem. Soc. Resumed. (1951) 720–726. doi:10.1039/JR9510000720.

[162] A. Beattie, E.L. Hirst, E. Percival, Studies on the metabolism of the Chrysophyceae. Comparative structural investigations on leucosin (chrysolaminarin) separated from diatoms and laminarin from the brown algae, Biochem. J. 79 (1961) 531–537. doi:10.1042/bj0790531.

[163] M.C. Wang, S. Bartnicki-Garcia, Mycolaminarans: storage (1→3)-β-D-glucans from the cytoplasm of the fungus phytophthora palmivora, Carbohydr. Res. 37 (1974) 331–338. doi:10.1016/S0008-6215(00)82922-5.

[164] T. Cavalier-Smith, Principles of Protein and Lipid Targeting in Secondary Symbiogenesis: Euglenoid, , and Sporozoan Plastid Origins and the Eukaryote Family Tree1,2, J. Eukaryot. Microbiol. 46 (1999) 347–366. doi:10.1111/j.1550- 7408.1999.tb04614.x.

[165] C.F. Delwiche, J.D. Palmer, The origin of plastids and their spread via secondary symbiosis, in: Orig. Algae Their Plast., Springer, Vienna, 1997: pp. 53–86. doi:10.1007/978-3-7091-6542-3_3.

98

[166] E.M. Trentacoste, R.P. Shrestha, S.R. Smith, C. Glé, A.C. Hartmann, M. Hildebrand, W.H. Gerwick, Metabolic engineering of lipid increases microalgal lipid accumulation without compromising growth, Proc. Natl. Acad. Sci. 110 (2013) 19748– 19753. doi:10.1073/pnas.1309299110.

[167] O. Shalem, N.E. Sanjana, F. Zhang, High-throughput functional genomics using CRISPR–Cas9, Nat. Rev. Genet. 16 (2015) 299–311. doi:10.1038/nrg3899.

[168] J.A. Doudna, E. Charpentier, The new frontier of genome engineering with CRISPR- Cas9, Science. 346 (2014) 1258096. doi:10.1126/science.1258096.

[169] P.D. Hsu, E.S. Lander, F. Zhang, Development and Applications of CRISPR-Cas9 for Genome Engineering, Cell. 157 (2014) 1262–1278. doi:10.1016/j.cell.2014.05.010.

[170] W. Jiang, A.J. Brueggeman, K.M. Horken, T.M. Plucinak, D.P. Weeks, Successful Transient Expression of Cas9 and Single Guide RNA Genes in Chlamydomonas reinhardtii, Eukaryot. Cell. 13 (2014) 1465–1469. doi:10.1128/EC.00213-14.

[171] I. Ajjawi, J. Verruto, M. Aqui, L.B. Soriaga, J. Coppersmith, K. Kwok, L. Peach, E. Orchard, R. Kalb, W. Xu, T.J. Carlson, K. Francis, K. Konigsfeld, J. Bartalis, A. Schultz, W. Lambert, A.S. Schwartz, R. Brown, E.R. Moellering, Lipid production in Nannochloropsis gaditana is doubled by decreasing expression of a single transcriptional regulator, Nat. Biotechnol. 35 (2017) 647–652. doi:10.1038/nbt.3865.

[172] E.K. Bomati, J.E. Haley, J.P. Noel, D.D. Deheyn, Spectral and structural comparison between bright and dim green fluorescent proteins in Amphioxus, Sci. Rep. 4 (2014) 5469. doi:10.1038/srep05469.

[173] W. Jiang, H. Zhou, H. Bi, M. Fromm, B. Yang, D.P. Weeks, Demonstration of CRISPR/Cas9/sgRNA-mediated targeted gene modification in Arabidopsis, tobacco, sorghum and rice, Nucleic Acids Res. 41 (2013) e188–e188. doi:10.1093/nar/gkt780.

[174] J.G. Doench, N. Fusi, M. Sullender, M. Hegde, E.W. Vaimberg, K.F. Donovan, I. Smith, Z. Tothova, C. Wilen, R. Orchard, H.W. Virgin, J. Listgarten, D.E. Root, Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9, Nat. Biotechnol. 34 (2016) 184–191. doi:10.1038/nbt.3437.

[175] E. Granum, S.M. Myklestad, A simple combined method for determination of β-1,3-glucan and cell wall polysaccharides in diatoms, Hydrobiologia. 477 (2002) 155–161. doi:10.1023/A:1021077407766.

[176] D.L. Morris, Quantitative determination of carbohydrates with Dreywood’s anthrone reagent., Sci. Wash. 107 (1948) 254–255.

[177] W.Z. Jiang, D.P. Weeks, A gene-within-a-gene Cas9/sgRNA hybrid construct enables gene editing and gene replacement strategies in Chlamydomonas reinhardtii, Algal Res. 26 (2017) 474–480. doi:10.1016/j.algal.2017.04.001.

[178] T. Roemer, H. Bussey, Yeast beta-glucan synthesis: KRE6 encodes a predicted type II membrane protein required for glucan synthesis in vivo and for glucan synthase activity in vitro., Proc. Natl. Acad. Sci. U. S. A. 88 (1991) 11295–11299.

99

[179] M. Gründel, R. Scheunemann, W. Lockau, Y. Zilliges, Impaired glycogen synthesis causes metabolic overflow reactions and affects stress responses in the cyanobacterium Synechocystis sp. PCC 6803, Microbiology. 158 (2012) 3032–3043. doi:10.1099/mic.0.062950-0.

[180] C. Bock, L. Krienitz, T. Pröschold, Taxonomic reassessment of the genus Chlorella (Trebouxiophyceae) using molecular signatures (barcodes), including description of seven new species., Fottea. 11 (2011) 293–312. doi:10.5507/fot.2011.028.

[181] A. Rekas, J.-R. Alattia, T. Nagai, A. Miyawaki, M. Ikura, Crystal Structure of Venus, a Yellow Fluorescent Protein with Improved Maturation and Reduced Environmental Sensitivity, J. Biol. Chem. 277 (2002) 50573–50578. doi:10.1074/jbc.M209524200.

[182] Wei Li, Xin Yi, Wang Qintao, Yang Juan, Hu Hanhua, Xu Jian, RNAi based targeted gene knockdown in the model oleaginous microalgae Nannochloropsis oceanica, Plant J. 89 (2017) 1236–1250. doi:10.1111/tpj.13411. ‐

[183] X. Ma, L. Yao, B. Yang, Y.K. Lee, F. Chen, J. Liu, RNAi-mediated silencing of a pyruvate dehydrogenase kinase enhances triacylglycerol biosynthesis in the oleaginous marine alga Nannochloropsis salina, Sci. Rep. 7 (2017) 11485. doi:10.1038/s41598-017-11932- 4.

[184] C.W. Gee, K.K. Niyogi, The carbonic anhydrase CAH1 is an essential component of the carbon-concentrating mechanism in Nannochloropsis oceanica, Proc. Natl. Acad. Sci. 114 (2017) 4537–4542. doi:10.1073/pnas.1700139114.

[185] B. Gschloessl, Y. Guermeur, J.M. Cock, HECTAR: A method to predict subcellular targeting in heterokonts, BMC Bioinformatics. 9 (2008) 393. doi:10.1186/1471-2105-9- 393.

[186] K.E. Apt, L. Zaslavkaia, J.C. Lippmeier, M. Lang, O. Kilian, R. Wetherbee, A.R. Grossman, P.G. Kroth, In vivo characterization of diatom multipartite plastid targeting signals, J. Cell Sci. 115 (2002) 4061–4069. doi:10.1242/jcs.00092.

100

APPENDIX A

SUPPLEMENTAL MATERIALS

A.1. Targeting thioesterase to the plastid of N. gaditana A.1.1. Identifying bidirectional promoter with plastid-targeting signal peptides Bidirectional promoters were identified from our compiled promoter data, filtered for promoters driving genes in both directions. For this list of bidirectional promoters, the genes on either side were identified by VLOOKUP function to correlate gene ID (Nga#####) to annotation. Keywords suggesting a plastid localization of a gene (e.g., plastid, photo, light, chlorophyll) were then used to identify likely genes. Promoters driving two potentially plastid-localized genes were then sorted for transcript level of the gene on either side, with a preference for highly-expressed gene products. The top promoters identified in this list were analyzed more closely, and the predicted protein products processed through HECTAR (http://webtools.sb-roscoff.fr/) to confirm predicted signal peptides and plastid localization [185]. The winner identified from this search drove two predicted chlorophyll a-binding proteins. A.1.2. Construction of the overexpression plasmid An overexpression plasmid was designed using the UEP-Sh ble cassette validated by earlier work [2]. Following this, the identified bidirectional promoter was designed to drive a yellow fluorescent protein (Venus [181]) in one direction, and a C12 thioesterase (UcFatB1 from Umbellularia californica) in the other. Each of these proteins was inserted after the predicted bipartite plastid targeting sequence [186], conservatively including a few additional amino acids. This plasmid was synthesized by DNA2.0 (now ATUM) and transformed into N. gaditana CCMP526 as described previously [2]. A.1.3. Fluorescence analysis by spectrophotometer Successful transformants were restreaked on antibiotic plates. Viable transformants were inoculated into liquid f/2 half-salt media with antibiotic and grown under constant light and agitation. Fluorescence from these cultures was analyzed on a

101

NanoLog Spectrofluorometer (Horiba, Kyoto, Japan) with a 515 nm excitation, 530 nm emission, and 2 mm slit width. No culture displayed significant increase in fluorescence relative to wildtype. To investigate further, all cultures were investigated by fluorescence microscopy on a Nikon Eclipse 80i, using a 41017 Endow GFP bandpass GFP filter set (Chroma Technology, Bellows Falls, Vermont, USA). A fluorescence scan method was ultimately able to identify one clone with detectable fluorescence over wild-type, by normalizing fluorescence signal by optical density of the culture and subtracting wildtype fluorescence from each mutant. By this method, one isolate was ultimately identified with a fluorescence emission peak at the expected 530 m wavelength (Figure A.1).

30 Wildtype 25 C12 Mut 20 C12 Mut 20 (Corrected) 20

15

10

5 Fluorescence intensity (MCPS) intensity Fluorescence

0 525 530 535 540 545 550 555 560 Wavelength (nm)

Figure A.1. After normalization by optical density, the fluorescence signal of Venus could be detected by subtraction of the chlorophyll autofluorescence from wild-type culture in one isolated mutant. The expected peak emission of Venus is at 528 nm.

A.1.4. Transcription analysis In the absence of protein confirmation for most cultures, we sought to verify RNA transcription. Using primers designed against our inserted transgenes (both the

102 fluorescent protein and the thioesterase), we tested extracted RNA for presence of our transcripts. Previous work demonstrated very poor yields of RNA extraction from Nannochloropsis [45], suggesting either incomplete lysis or otherwise significant degradation. We improved extraction under the assumption that lysis was the limiting factor. Liquid cultures were pelleted and resuspended in 1 ml TEN buffer (10 mM Tris–

HCl, 10 mM Na2EDTA, 150 mM NaCl). This concentrate was transferred to tubes with approximately 100 mg of garnets (BioSpec Products catalogue number 11079103gar). The tubes were centrifuged, and the suspension buffer removed. Ice cold methanol (100%) was added to the tubes and they were vortexed for 5-10 minutes to shear the cells, then incubated on ice for 5-10 minutes more. The insoluble nucleic acids, proteins, and cell debris were pelleted by centrifugation at 20,000 rcf for 10 min at 4 °C. The methanol was removed and replaced, and the process repeated until the pellet was grey in color, indicating extraction of the methanol-soluble chlorophyll from the cells. The lysate was resuspended in 150 µl ice-cold, nuclease-free water, followed by addition of 350 µl SDS-EB (2% SDS, 400 mM NaCl, 40 mM Na2EDTA, 100 mM Tris– HCl). The solution was vortexed once more and incubated on ice for another 5 min to dissolve lysate. Proteins were extracted by addition of 500 µl phenol:chloroform:isoamyl alcohol (25:24:1). Following mixing, incubation, and centrifugation, the aqueous phase was transferred to a new tube. This was repeated twice more with 500 µl chloroform:isoamyl alcohol (24:1). Finally, remaining salts were extracted by ethanol precipitation, and the pellet resuspended in RNase-free water. Amplification of transgenes from generated cDNA validated successful transcription in multiple mutants, and reverse transcriptase-negative controls showed no amplification, eliminating the possibility of genomic DNA contamination.

103

8 30%

7 25%

6 EBFP2 20% 5 ECFP mCerulean 4 15% sfGFP mCitrine

RFU 100000x RFU 3 mKO 10% dTomato 2 mCherry 5% % Trans. 1

0 0% 400 450 500 550 600 650 Wavelength (nm)

Figure A.2. Fluorescence scans of wild-type N. gaditana to identify suitable fluorescent reporters and optimal excitation and emission wavelength settings. For each FP, the peak excitation wavelength was used and the emission spectrum scanned. The vertical bars indicate the reported peak emission wavelength for each FP. The colored curves indicate the photosynthetic autofluorescence of wild-type N. gaditana. A dotted line traces the % transmittance of wild-type N. gaditana at each wavelength. The ideal reporter has the vertical bar intersect its matching curve at a low fluorescence value, and the dotted line at a high value, so that the light-harvesting complex is not quenching the fluorescent signal. On the other hand, higher wavelength FPs are associated with lower quantum efficiencies (lower signal).

104

Figure A.3. Polyketide synthases identified in Nannochloropsis gaditana CCMP526. A canonical type I fatty acid synthase (FAS) is shown at right, indicating the essential domains. To the right of each domain cartoon is a number representing the relative transcript coverate from combined N+ and N- transcriptome datasets [2]. PKS1 is essentially a type 1 FAS. PKS4-6 are very similar but terminate with a short-chain dehydrogenase/reductase domain (SDR) rather than the thioesterase domain. The highest expressed PKS transcript (PKS6) has been knocked out, but shares high sequence similarity with PKS4 and is structurally very similar to PKS5. This gene redundancy may make knockout effects difficult to identify. The next target is PKS3, which has a unique arrangement of domains and is the next-highest expressed transcript. PKS2 is highly repetitive, with repeat regions having very little difference, complicating assembly.

105

APPENDIX B

PERMISSIONS

Brian Vogler Wed, Jun 6, 2018 at 12:02 PM Reply-To: [email protected] To: "Barry, Amanda N" Cc: "Rollin, Joseph" , "[email protected]" , "Sudasinghe Appuhamilage, Nilusha Maduwanthi" , "Starkenburg, Shawn Robert" , "Schambach, Jenna Young"

Dear Co-authors,

I am writing to obtain permission from you to include this work in my doctoral thesis materials. If you agree, please reply to this email. For record keeping purposes, the article I am referring to is:

Brian W. Vogler, Shawn R. Starkenburg, Nilusha Sudasinghe, Jenna Y. Schambach, Joseph A. Rollin, Sivakumar Pattathil, and Amanda N. Barry, Characterization of plant carbon substrate utilization by Auxenochlorella protothecoides, Algal Research

Thank you in advance, and please let me know if you have any questions.

Very sincerely,

Brian Vogler

Barry, Amanda N Wed, Jun 6, 2018 at 11:36 AM To: "[email protected]" OK, then great!

Amanda

From: Brian Vogler Sent: Wednesday, June 6, 2018 11:34:59 AM [Quoted text hidden]

[Quoted text hidden]

Sivakumar Pattathil Wed, Jun 6, 2018 at 12:03 PM

106

To: [email protected] Cc: "Barry, Amanda N" , "Rollin, Joseph" , "Sudasinghe Appuhamilage, Nilusha Maduwanthi" , "Starkenburg, Shawn Robert" , "Schambach, Jenna Young"

Yes, I approve. [Quoted text hidden]

Rollin, Joseph Wed, Jun 6, 2018 at 12:04 PM To: "[email protected]" Cc: "Barry, Amanda N"

Good to go Brian. Good luck!

Joe

From: Brian Vogler Reply-To: "[email protected]" Date: Wednesday, June 6, 2018 at 12:02 PM To: "Barry, Amanda N" Cc: "Rollin, Joseph" , "[email protected]" , "Sudasinghe Appuhamilage, Nilusha Maduwanthi" , "Starkenburg, Shawn Robert" , "Schambach, Jenna Young" Subject: Re: Resubmission to Algal Research

Dear Co-authors,

[Quoted text hidden]

Schambach, Jenna Young Wed, Jun 6, 2018 at 1:04 PM To: "[email protected]" Yes, you have my permission, good luck!

From: Brian Vogler Sent: Wednesday, June 6, 2018 12:02:11 PM To: Barry, Amanda N Cc: Rollin, Joseph; [email protected]; Sudasinghe Appuhamilage, Nilusha Maduwanthi; Starkenburg, Shawn Robert; Schambach, Jenna Young Subject: Re: Resubmission to Algal Research

[Quoted text hidden]

Starkenburg, Shawn Robert Thu, Jun 7, 2018 at 8:55 AM To: "[email protected]"

107

Definitely agree.

Shawn

From: Sivakumar Pattathil Date: Wednesday, June 6, 2018 at 12:03 PM To: "[email protected]" Cc: "Barry, Amanda N" , "Rollin, Joseph" , "Sudasinghe Appuhamilage, Nilusha Maduwanthi" , "Starkenburg, Shawn Robert" , "Schambach, Jenna Young" Subject: Re: Resubmission to Algal Research

Yes, I approve.

[Quoted text hidden]

Brian Vogler Wed, Jun 6, 2018 at 1:50 PM Reply-To: [email protected] To: Victoria Work , Fiona Davies , Matthew Scholz , [email protected], Huiya Gu , "Franks, Dylan" , Lee Stanish , Robert Jinkerson , Matthew Posewitz

Hello Co-authors,

I am writing to obtain your permission to include in my doctoral thesis materials from a publication we coauthored together. If you agree, please reply to this email.

The material, including figures and text, is from the following publication:

Work, .H., Bentley, F.K., Scholz, M.J., D’Adamo, S., Gu, H., ogler, B.W., Franks, D.T., Stanish, L.F., Jinkerson, R.E., and Posewitz, M.C. (2013). Biocommodities from photosynthetic microorganisms. Environmental Progress & Sustainable Energy 32, 989–1001.

Thank you in advance, and please let me know if you have any questions.

Sincerely, Brian Vogler

Robert Jinkerson Wed, Jun 6, 2018 at 1:54 PM

108

To: Brian Vogler

Dear Brian,

You have my permission to include our coauthored material in your thesis.

-Robert

Robert Jinkerson Assistant Professor Department of Chemical and Environmental Engineering Department of Botany and Plant Sciences University of California, Riverside Riverside, CA 92521 Office: 951-827-2271 JinkersonLab.ucr.edu [email protected] [Quoted text hidden]

Fiona Davies Wed, Jun 6, 2018 at 2:01 PM To: [email protected]

Hi Brian,

Yes, I agree to you using this material.

Thanks, Fiona [Quoted text hidden]

Adamo, Sarah d Wed, Jun 6, 2018 at 2:04 PM To: "[email protected]"

Dear Brian, no doubts, you have my permission to include any material from the publication we coauthored.

All the best, Sarah D’Adamo [Quoted text hidden]

Huiya Gu Wed, Jun 6, 2018 at 2:21 PM To: [email protected]

Brian,

You have my permission on this.

Congratulations on your thesis.

Huiya [Quoted text hidden]

Wed, Jun 6, 2018 at 2:38 Matthew Scholz PM To: "[email protected]" , Victoria Work , Fiona Davies

109

, "[email protected]" , Huiya Gu , "Franks, Dylan" , Lee Stanish , Robert Jinkerson , Matthew Posewitz

Hmmm… k . …. ….I g g. S .

Good luck!

Matt

Matt Scholz, Ph.D. Program Manager, Sustainable Phosphorus Alliance Senior Sustainability Scientist, ASU's Julie Ann Wrigley Global Institute of Sustainability

O 480.727.4271 | C 520.312.0448 | E [email protected] https://phosphorusalliance.org/ | @SustainP

From: Brian Vogler Sent: Wednesday, June 6, 2018 12:50 PM To: Victoria Work ; Fiona Davies ; Matthew Scholz ; [email protected]; Huiya Gu ; Franks, Dylan ; Lee Stanish ; Robert Jinkerson ; Matthew Posewitz Subject: Permission to include coauthored material in my thesis

[Quoted text hidden]

Victoria Work Wed, Jun 6, 2018 at 3:04 PM To: Brian Vogler

Yes! Congratulations, Brian!

Victoria [Quoted text hidden]

Franks, Dylan Wed, Jun 6, 2018 at 4:00 PM To: [email protected]

110

Go for it!

Good luck and see you soon. Enjoy the camping trip! [Quoted text hidden] -- Dylan Franks Doctoral Candidate · Algal Ecophysiology Dept. of Plant Biology, Ecology and Evolution Oklahoma State University · 301 Physical Science Stillwater, OK 74078

Lee Stanish Mon, Jun 11, 2018 at 3:43 PM To: Brian Vogler Cc: Victoria Work , Fiona Davies , Matthew Scholz , [email protected], Huiya Gu , "Franks, Dylan" , Robert Jinkerson , Matthew Posewitz

Of course! Thanks for asking, and best of luck wrapping up your thesis. Best, -Lee [Quoted text hidden]

JOHN WILEY AND SONS LICENSE TERMS AND CONDITIONS Jun 11, 2018

This Agreement between Brian Vogler ("You") and John Wiley and Sons ("John Wiley and Sons") consists of your license details and the terms and conditions provided by John Wiley and Sons and Copyright Clearance Center.

License Number 4363820844014 License date Jun 07, 2018 Licensed Content John Wiley and Sons Publisher Licensed Content Environmental Progress & Sustainable Energy Publication Licensed Content Title Biocommodities from photosynthetic microorganisms Licensed Content Author Victoria H. Work, Fiona K. Bentley, Matthew J. Scholz, et al Licensed Content Date Sep 10, 2013 Licensed Content Volume 32 Licensed Content Issue 4 Licensed Content Pages 13

111

Type of Use Dissertation/Thesis Requestor type Author of this Wiley article Format Print and electronic Portion Full article Will you be translating? No Title of your thesis / DEVELOPMENT OF CRISPR/CAS9 IN NANNOCHLOROPSIS AND OTHER dissertation ALGAE TOWARD UNDERSTANDING AND MANIPULATING ENERGY ALLOCATION Expected completion Aug 2018 date Expected size (number of 120 pages) Requestor Location Brian Vogler 4922 Meade St

DENVER, CO 80221 United States Attn: Brian Vogler Publisher Tax ID EU826007151 Total 0.00 USD Terms and Conditions TERMS AND CONDITIONS

This copyrighted material is owned by or exclusively licensed to John Wiley & Sons, Inc. or one of its group companies (each a"Wiley Company") or handled on behalf of a society with which a Wiley Company has exclusive publishing rights in relation to a particular work (collectively "WILEY"). By clicking "accept" in connection with completing this licensing transaction, you agree that the following terms and conditions apply to this transaction (along with the billing and payment terms and conditions established by the Copyright Clearance Center Inc., ("CCC's Billing and Payment terms and conditions"), at the time that you opened your RightsLink account (these are available at any time at http://myaccount.copyright.com).

Terms and Conditions

• The materials you have requested permission to reproduce or reuse (the "Wiley Materials") are protected by copyright.

• You are hereby granted a personal, non-exclusive, non-sub licensable (on a stand-alone basis), non- transferable, worldwide, limited license to reproduce the Wiley Materials for the purpose specified in the licensing process. This license, and any CONTENT (PDF or image file) purchased as part of your order, is for a one-time use only and limited to any maximum distribution number specified in the license. The first instance of republication or reuse granted by this license must be completed within two years of the date of the grant of this license (although copies prepared before the end date may be distributed thereafter). The Wiley Materials shall not be used in any other manner or for any other purpose, beyond what is granted in the license. Permission is granted subject to an appropriate acknowledgement given to the author, title of the material/book/journal and the publisher. You shall also duplicate the copyright notice that appears in the Wiley publication in your use of the Wiley Material. Permission is also granted on the understanding that nowhere in the text is a previously published source acknowledged for all or part of this Wiley Material. Any third party content is expressly excluded from this permission.

112

• With respect to the Wiley Materials, all rights are reserved. Except as expressly granted by the terms of the license, no part of the Wiley Materials may be copied, modified, adapted (except for minor reformatting required by the new Publication), translated, reproduced, transferred or distributed, in any form or by any means, and no derivative works may be made based on the Wiley Materials without the prior permission of the respective copyright owner.For STM Signatory Publishers clearing permission under the terms of the STM Permissions Guidelines only, the terms of the license are extended to include subsequent editions and for editions in other languages, provided such editions are for the work as a whole in situ and does not involve the separate exploitation of the permitted figures or extracts, You may not alter, remove or suppress in any manner any copyright, trademark or other notices displayed by the Wiley Materials. You may not license, rent, sell, loan, lease, pledge, offer as security, transfer or assign the Wiley Materials on a stand-alone basis, or any of the rights granted to you hereunder to any other person.

• The Wiley Materials and all of the intellectual property rights therein shall at all times remain the exclusive property of John Wiley & Sons Inc, the Wiley Companies, or their respective licensors, and your interest therein is only that of having possession of and the right to reproduce the Wiley Materials pursuant to Section 2 herein during the continuance of this Agreement. You agree that you own no right, title or interest in or to the Wiley Materials or any of the intellectual property rights therein. You shall have no rights hereunder other than the license as provided for above in Section 2. No right, license or interest to any trademark, trade name, service mark or other branding ("Marks") of WILEY or its licensors is granted hereunder, and you agree that you shall not assert any such right, license or interest with respect thereto

• NEITHER WILEY NOR ITS LICENSORS MAKES ANY WARRANTY OR REPRESENTATION OF ANY KIND TO YOU OR ANY THIRD PARTY, EXPRESS, IMPLIED OR STATUTORY, WITH RESPECT TO THE MATERIALS OR THE ACCURACY OF ANY INFORMATION CONTAINED IN THE MATERIALS, INCLUDING, WITHOUT LIMITATION, ANY IMPLIED WARRANTY OF MERCHANTABILITY, ACCURACY, SATISFACTORY QUALITY, FITNESS FOR A PARTICULAR PURPOSE, USABILITY, INTEGRATION OR NON-INFRINGEMENT AND ALL SUCH WARRANTIES ARE HEREBY EXCLUDED BY WILEY AND ITS LICENSORS AND WAIVED BY YOU.

• WILEY shall have the right to terminate this Agreement immediately upon breach of this Agreement by you.

• You shall indemnify, defend and hold harmless WILEY, its Licensors and their respective directors, officers, agents and employees, from and against any actual or threatened claims, demands, causes of action or proceedings arising from any breach of this Agreement by you.

• IN NO EVENT SHALL WILEY OR ITS LICENSORS BE LIABLE TO YOU OR ANY OTHER PARTY OR ANY OTHER PERSON OR ENTITY FOR ANY SPECIAL, CONSEQUENTIAL, INCIDENTAL, INDIRECT, EXEMPLARY OR PUNITIVE DAMAGES, HOWEVER CAUSED, ARISING OUT OF OR IN CONNECTION WITH THE DOWNLOADING, PROVISIONING, VIEWING OR USE OF THE MATERIALS REGARDLESS OF THE FORM OF ACTION, WHETHER FOR BREACH OF CONTRACT, BREACH OF WARRANTY, TORT, NEGLIGENCE, INFRINGEMENT OR OTHERWISE (INCLUDING, WITHOUT LIMITATION, DAMAGES BASED ON LOSS OF PROFITS, DATA, FILES, USE, BUSINESS OPPORTUNITY OR CLAIMS OF THIRD PARTIES), AND WHETHER OR NOT THE PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THIS LIMITATION SHALL APPLY NOTWITHSTANDING ANY FAILURE OF ESSENTIAL PURPOSE OF ANY LIMITED REMEDY PROVIDED HEREIN.

• Should any provision of this Agreement be held by a court of competent jurisdiction to be illegal, invalid, or unenforceable, that provision shall be deemed amended to achieve as nearly as possible the same economic effect as the original provision, and the legality, validity and enforceability of the remaining provisions of this Agreement shall not be affected or impaired thereby.

• The failure of either party to enforce any term or condition of this Agreement shall not constitute a waiver of either party's right to enforce each and every term and condition of this Agreement. No breach under this agreement shall be deemed waived or excused by either party unless such waiver or consent is in writing signed by the party granting such waiver or consent. The waiver by or consent

113

of a party to a breach of any provision of this Agreement shall not operate or be construed as a waiver of or consent to any other or subsequent breach by such other party.

• This Agreement may not be assigned (including by operation of law or otherwise) by you without WILEY's prior written consent.

• Any fee required for this permission shall be non-refundable after thirty (30) days from receipt by the CCC.

• These terms and conditions together with CCC's Billing and Payment terms and conditions (which are incorporated herein) form the entire agreement between you and WILEY concerning this licensing transaction and (in the absence of fraud) supersedes all prior agreements and representations of the parties, oral or written. This Agreement may not be amended except in writing signed by both parties. This Agreement shall be binding upon and inure to the benefit of the parties' successors, legal representatives, and authorized assigns.

• In the event of any conflict between your obligations established by these terms and conditions and those established by CCC's Billing and Payment terms and conditions, these terms and conditions shall prevail.

• WILEY expressly reserves all rights not specifically granted in the combination of (i) the license details provided by you and accepted in the course of this licensing transaction, (ii) these terms and conditions and (iii) CCC's Billing and Payment terms and conditions.

• This Agreement will be void if the Type of Use, Format, Circulation, or Requestor Type was misrepresented during the licensing process.

• This Agreement shall be governed by and construed in accordance with the laws of the State of New York, USA, without regards to such state's conflict of law rules. Any legal action, suit or proceeding arising out of or relating to these Terms and Conditions or the breach thereof shall be instituted in a court of competent jurisdiction in New York County in the State of New York in the United States of America and each party hereby consents and submits to the personal jurisdiction of such court, waives any objection to venue in such court and consents to service of process by registered or certified mail, return receipt requested, at the last known address of such party.

WILEY OPEN ACCESS TERMS AND CONDITIONS

Wiley Publishes Open Access Articles in fully Open Access Journals and in Subscription journals offering Online Open. Although most of the fully Open Access journals publish open access articles under the terms of the Creative Commons Attribution (CC BY) License only, the subscription journals and a few of the Open Access Journals offer a choice of Creative Commons Licenses. The license type is clearly identified on the article.

The Creative Commons Attribution License

The Creative Commons Attribution License (CC-BY) allows users to copy, distribute and transmit an article, adapt the article and make commercial use of the article. The CC-BY license permits commercial and non-

Creative Commons Attribution Non-Commercial License

The Creative Commons Attribution Non-Commercial (CC-BY-NC)License permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.(see below)

Creative Commons Attribution-Non-Commercial-NoDerivs License

114

The Creative Commons Attribution Non-Commercial-NoDerivs License (CC-BY-NC-ND) permits use, distribution and reproduction in any medium, provided the original work is properly cited, is not used for commercial purposes and no modifications or adaptations are made. (see below)

Use by commercial "for-profit" organizations

Use of Wiley Open Access articles for commercial, promotional, or marketing purposes requires further explicit permission from Wiley and will be subject to a fee.

Further details can be found on Wiley Online Library http://olabout.wiley.com/WileyCDA/Section/id-410895.html

115

APPENDIX C

LAB-READY METHODS

Method 1. DNA extraction from N. gaditana and other robust organisms with garnets in MeOH.

1. Pellet culture by centrifugation

o Use the microfunnel® to add garnets to a new tube to approximately the 0.1 mL

mark during this period

2. Resuspend cells in 1 mL TEN Buffer (10 mM Tris–HCl, 10 mM Na2EDTA, 150 mM NaCl)

o DI water is fine here

3. Transfer suspension to 1.7 mL tube with garnets added

4. Pellet cells by centrifugation at 18,000 rcf for 1 min

5. Discard supernatant

6. Add 500 µl 100% methanol

7. Vortex to resuspend and incubate on ice for 5-10 min

8. Pellet insoluble fraction by centrifugation at 20,000 rcf for 5 min

9. Discard supernatant

10. Repeat steps 6-9 two more times (for a total of 3 methanol washes/extractions)

11. Resuspend lysate in 150 µl nuclease-free H2O

12. Add 350 µl SDS-EB (2% SDS, 400 mM NaCl, 40 mM Na2EDTA, 100 mM Tris–HCl)

13. Vortex cells to mix and incubate at RT for 5 min to further lyse cells

14. Add 500 µl phenol:chloroform:isoamyl alcohol (25:24:1)

15. Shake vigorously for 5-10 sec by hand and vortex for 5-10 sec to mix further

16. Incubate on ice for 5 min

17. Separate phases by centrifugation at 12000 rcf for 5 min

18. Transfer aqueous phase (roughly 500 µL) to a PLG Heavy tube

116

19. Add 500 µl chloroform:isoamyl alcohol (24:1)

20. Vortex for 5-10 sec to mix, and incubate on ice for 5 min

21. Separate phases by centrifugation at 12000 rcf for 5 min

22. Repeat steps 19-21

23. Transfer upper phase to a new tube

24. Add 50 µL 3 M NaOAc and vortex to mix (do not pipette)

25. 1000 µL ethanol (200 proof) and vortex to mix (do not pipette

26. Incubate at -80 °C for 1 hr

o Chill the centrifuge rotor at -20 °C during this period

27. Centrifuge at max speed (>16000 rcf) for 15 min

28. Discard supernatant, being careful not to disturb pellet

29. Resuspend pellet in 70-75% ethanol

30. Centrifuge at max speed for 15 min

31. Discard supernatant, being careful not to disturb pellet

32. Allow sample to air dry for 5 min

33. Resuspend pellet in 100 ul IDTE (or other solvent of choice)

Method 2. Single guide RNA design using Benchling and Geneious.

1. Identify coding sequence of your gene of interest and annotate it, including >1 kb of

genomic sequence on either side, if possible

2. Export this annotated sequence as a GenBank flat file (.gb), removing other annotations

for simplicity

3. Login to Benchling (use [email protected] Google login or create your own)

o Benchling may drag you through a tutorial if you create your own account

4. Open Projects tab on the left (looks like a briefcase)

117

5. Create Project by clicking the plus (+) tile at the top

6. Name your project effectively (suggest including gene target and organism)

7. Click the plus tile again to Create … and select Import DNA sequences

8. Find your saved GenBank file and import it

9. Click the CRISPR button to Design and Analyze Guides

10. In the Guide Parameters dialogue, leave Single guide as the Design Type, 20 as the

Guide Length, and NGG s the PAM

o For green alga, use Chlamydomonas reinhardtii as the Genome

118

o For stramenopiles, use Phaeodactylum tricornutum as the Genome

11. Select Linear Map tab from the top right

12. Select a CDS from the linear map

13. Open the Design CRISPR tab again

14. Click the green plus tile to Create

15. Repeat steps 11-14 for any remaining CDS elements

16. Above the table of available target sequences, click On-Target Score to sort entries by

predicted cutting efficiency

17. Use your judgement to identify 2-3 targets

o Optimizing a high On-Target Score and Position such that you are not targeting a

site so late in the sequence that an insertion would have limited effect

18. Add these CRISPR sites to your annotated file in Geneious

19. Design EnGen sgRNA synthesis primers using the Excel formula:

o ="TTCTAATACGACTCACTATA"&IF(LEFT(cell,1)="G","","G")&cell&"GTTTTAGAGC

TAGA", where cell is a reference to the target sequence

20. Design primers to amplify the target substrate DNA

119

Method 3. Target substrate primer design.

1. In Geneious, shift+select both CRISPR target sequences

2. Click the Primers button in the top ribbon and Design New Primers ...

3. Adjust the Primer Characteristics to the following:

120

4. Expand the Target Region such that the fragments resulting from a cut at either CRISPR

site will result in a discernibly smaller band on gel electrophoresis (e.g., if the product is 1

kb, ensure that each primer is at least 150 bp away from both CRISPR sites, so that the

larger resulting fragment is only 900 bp)

o Expand the region by subtracting from the left limit and adding to the right (e.g., in

the above example, change 747 to 647 and 1402 to 1502)

5. Expand the Included Region such that the software has >150 bp of sequence available to

search for primers

o From the Target Region, subtract 150 from the left limit and add 150 to the right limit

(e.g., in the example, change 747 to 1252 and 1402 to 1650)

6. Press OK to generate 10 primer sets

7. Click on the Annotations tab to view a table of the annotations

8. Delete the generated primers with Hairpin Tms (if they all have hairpins, leave at least

two unique forward and reverse primers with the lowest Tms)

o If the Hairpin Tm column is not shown, open the Columns dropdown and activate

the Hairpin Tm menu item

121

o The Self Dimer Tm is also worth looking at, though typically not an issue

2. If there are still more than two forward and reverse primers available, select two unique

primers from the remaining and rename them (keep the others for now)

3. Test these primer pairs for Pair Dimers by first pairing them in Geneious (if they are not

already paired)

o Ctrl+select the forward and reverse primers to be paired

o Click the Primers button from the top ribbon and select Characteristics for

Selection …

o Make sure Pair is selected and press OK

4. Repeat for the second, nested set

5. If the Pair Dimer Tm is too high, try a different pair

6. If the two pairs of primers look good, delete the remaining primers

Method 4. Cas9 guide synthesis using NEB EnGen sgRNA Synthesis Kit.

Required materials

• Dilute primers at 1 μM with EnGen additions (later referred to as Target-specific DNA

Oligo)

o Design primers using Benchling (walkthrough)

o To hydrate primer, add 10 μl per nmol (e.g., add 273 μl nuclease-free water to 27.3

nmol primer)

o To prepare 1 μM working stock, dilute 2 μl of 100 μM stock with 1 8 μl nuclease-free

water

• EnGen® sgRNA Synthesis Kit, S. pyogenes with 2X reaction mixture thawed on ice

• Nuclease-free water

122

Procedure

1. Thaw EnGen 2X sgRNA Reaction Mix, S. pyogenes and Target-specific DNA Oligo (1

μM)

2. Mix and pulse-spin each component in a microfuge prior to use

o Store enzyme mix on ice but assemble reaction at room temperature

3. Assemble the reaction at room temperature in a PCR tube strip in the order listed

o Avoid master mixes, and add the enzyme last to each reaction

Reagent Per rxn

Nuclease-free water 3 μl

EnGen 2X sgRNA Reaction Mix, S. pyogenes 10 μl

Target-specific DNA Oligo (1 μM) 5 μl

EnGen sgRNA Enzyme Mix 2 μl

Total volume 20 μl

4. Mix thoroughly and pulse-spin in a microfuge

5. Incubate at 37 °C for 30 minutes

6. Transfer reaction to ice

7. For DNase treatment bring volume to 50 μl by adding 30 μl of nuclease-free water

8. Add 2 μl of DNase I (RNase-free, provided), mix and incubate at 37 °C for 15 minutes

9. Proceed with purification of RNA by RNA Concentrator column (ZYMO)

10. Add 2 volumes (100 μl) RNA Binding Buffer to each sample and mix

11. Add an equal volume (150 μl) of ethanol (95-100%) and mix

12. Transfer the sample to a Zymo-Spin™ IIC Column in a collection tube and centrifuge for

30 seconds

123

13. Discard flow-through

14. Add 00 μl RNA Prep Buffer to the column and centrifuge for 30 seconds

15. Discard the flow-through

16. Add 700 μl RNA Wash Buffer to the column and centrifuge for 30 seconds

17. Discard the flow-through

18. Add 00 μl RNA Wash Buffer to the column and centrifuge for 2 minutes to ensure

complete removal of the wash buffer

19. Transfer the column carefully into an RNase-free tube

20. Add 50 μl DNase/RNase-Free Water directly to the column matrix and centrifuge for 30

seconds

Method 5. In vitro guide assay.

Required materials

• DNase-treated sgRNAs (protocol)

• Purified target substrate fragment

(note for primer design)

• Thawed 10X Cas9 Nuclease Reaction Buffer

• 20 µM Cas9 Nuclease, S. pyogenes

• Nuclease-free water

Procedure

1. Dilute guide RNA to 300 nM with nuclease-free water on ice

o Input concentration (Conc) for each guide in the Guide Assay worksheet

o Leave dilution volume (bottom) unless volumes required are too large or small

Tube Guide Conc RNA Water

1 C252_NR_1 325.3 1.23 18.77

124

2 C252_NR_2 387.0 1.04 18.96

3 Desmo_NR_1 375.7 1.07 18.93

4 Desmo_NR_2 379.5 1.06 18.94

5

6

2. Melt and re-anneal diluted guides using thermocycler Melt&Anneal protocol

3. Dilute substrate DNA to 30 nM with nuclease-free water on ice

o Input concentration (Conc), length (bp), and number of guides for each target into

the Guide Assay worksheet

o Use the volumes in the farthest right column, under Mol Dilution

Tube Target Conc # DNA Water

1 C252_NR 37.1 2 7.77 2.23

2 Desmo_NR 30.2 2 14.12 -4.12

4. Assemble the reaction at room temperature in the following order:

o First make a water, buffer, Cas9 master mix enough for all reactions (including

negative controls):

o These quantities are calculated in the Guide Assay worksheet

Component Per rxn Master Mix (enough for 6.5 rxns)

Nuclease-free water 21 µl 135.9 µl

10X Cas9 Nuclease Reaction Buffer 3 µl 19.5 µl

125

20 µM Cas9 Nuclease, S. pyogenes 0.1 µl 0.7 µl

5. Aliquot 24 µl this master mix into PCR tubes

6. Add 3 µl sgRNA (or RNase-free water for negative controls) to each reaction tube:

Component Per 30 µl rxn

Water, buffer, Cas9 mixture 24 µl

300 nM sgRNA 3 µl (30 nM final)

Reaction volume 27 µl

4. Pre-incubate for 10 minutes at 25 °C

5. Add substrate to each tube:

30 nM substrate DNA 3 µl (3 nM final)

Total reaction volume 30 µl

6. Mix thoroughly and pulse-spin in a microfuge

7. Incubate at 37 °C overnight

8. Proceed with fragment analysis by gel electrophoresis

Method 6. Ribonucleoprotein complex assembly.

Required materials

• 10 μl tube S. pyogenes Cas9 nuclease (qb3 MacroLab, UC Berkeley)

• alidated sgRNA (>168 ng/μl)

Procedure

1. Dilute sgRNA to 5 μM in PCR tubes

RNA for Cas9 spreadsheet

126

sgRNA Conc. RNA H2O

Desmo_NR_1 375.7 4.45 5.55

Desmo_NR_2 379.5 4.41 5.59

2. Incubate guide dilutions at 98 °C for 5 min and allow to cool to room temperature

3. Prepare dilution of Cas9 protein

▪ (3 μl of Cas + 2 μl of water)

4. Combine Cas9 with sgRNA (and water)

a. For 1 μM solution:

▪ 2.25 μl Cas dilution

▪ 2 μl guide (@ 5 μM)

▪ .25 μl water

b. For 2 μM solution:

▪ .5 μl Cas dilution

▪ μl guide (@ 5 μM)

5. Incubate at room temperature for 5-30 min

6. Add 1.5 μl of marker fragment to each

Method 7. Transformation of Chlorella sp. strain CCMP252 with Cas9 RNP.

Cell culture Plates

• Culture cells to about 2.25 million cells/ml • f/2 EU

• Exxon 2.0 medium (or Mf/2, or f/2 EU) • Phleomycin 30 µg/ml

• Constant light and shaking • Ampicillin 100 µg/ml

• Ambient CO2 (or 1%, whatever) • Kanamycin 100 µg/ml

127

Cell preparation (for n transformations)

1. Measure cell density by Attune flow 9. Transfer the contents of each 1.7 ml tube

cytometer into a 2 mm electroporation cuvette

2. Pellet n x 200 M cells by centrifuge @ • Important - avoid bubbles

2000 rcf for 10 min o Tap cuvette on bench to settle

3. Decant supernatant and resuspend in 25 10. Electroporate cells:

ml filter-sterilized 375 mM ᴅ-sorbitol o Time constant protocol

4. Pellet cells again with same centrifuge o Voltage = 850 V

setting as above o TC = 20 ms

o Incubate n 2 mm electroporation o Cuvette = 2 mm

cuvettes on ice while waiting • Important - wipe cuvette dry with

5. Decant supernatant and resuspend cells Kimwipe before placement into

in n x 100 µl sterile d-sorbitol electroporator

6. Aliquot 100 µl of cells into sterile 1.7 ml 11. Chill cuvettes/cells on ice for (>) 5 min

tube for each reaction 12. Transfer each sample to respective

7. Mix in 2-5 µg of DNA for each reaction culturing tubes

8. Incubate on ice for at least 5 min 13. Dilute each culture tube with 5 ml

o Add 900 µl Exxon 2.0 medium to n additional media

culturing tubes (for later) 14. Incubate in tube rotator overnight at

o Label tubes for transformation ambient temperature, CO2, light

128

Plating the next day

1. Transfer 5.85 ml of culture to 15 ml 4. Resuspend pellet in remaining volume

centrifuge tubes (6 x 975 µl) 5. Deposit 300 µl per plate (2X) and spread 6. Leave plates half-covered in the back of 2. Centrifuge tubes at 3000 rcf for 10 min the hood until dry and cover 7. Wrap plates with parafilm and transfer to a. Prepare spreaders at this time (Percival or CO2 box) for 3-4 day

3. Remove and discard 5.25 ml (6 x 875 µl)

129

APPENDIX D

GENETIC MATERIALS

Genetic Data D.1. Genbank sequence of ngBDP_hyg_Venus_UcFatB1-HA used for plastid- targeting of thioesterase in N. gaditana CCMP526.

LOCUS ngBDP_SV40-hyg_V 8588 bp DNA circular 10-JUN-

2018

DEFINITION Dual expression vector.

ACCESSION urn.local...e-94ld6wa

KEYWORDS .

SOURCE

ORGANISM Synthetic

.

COMMENT Serial Cloner Genbank Format SerialCloner_Type=DNA

SerialCloner_Comments=Ligation of :

ngPT_UEP-ble_BDP_Venus_UcFatB1-HA.gb [7102 nt ] : (PmeI[3911] /

PacI[2799]) to PacI-SV40-hyg-PmeI.txt.xdna [1486 nt] : (PacI[7] /

PmeI[1494]) SerialCloner_Ends=0,0,,0,.

FEATURES Location/Qualifiers

primer_bind complement(496..513)

misc_feature complement(652..1455)

misc_feature complement(1503..2351)

/note="Geneious type: gene"

primer_bind 2673..2692

promoter 2819..3021

130 3066..3198 misc_feature 3227..3253

/note="Geneious type: conflict" gene 3258..4283 terminator 4292..4500 primer_bind complement(4355..4376) terminator complement(4501..4584) primer_bind 4544..4563

CDS complement(4594..5853) gene complement(4597..4683) primer_bind complement(4669..4695) gene complement(4696..5595) primer_bind 4696..4719 primer_bind complement(5566..5593) primer_bind complement(5701..5722) sig_peptide complement(5803..5853) promoter complement(5854..6824) primer_bind complement(5866..5884)

CDS order(6825..6903,7262..8133) sig_peptide 6825..6875 intron 6904..7261 primer_bind 7330..7353 misc_feature 7414..8130

/note="Geneious type: gene" terminator 8143..8410

131

primer_bind complement(8159..8178)

primer_bind 8224..8243

terminator complement(8411..8543)

ORIGIN

1 ggtctcctat acgtctctta tacgacatca ccgatgggga acccagacgc tgagtacgta

61 ttctaaatgc ataataaata ctgataacat cttatagttt gtattatatt ttgtattatc

121 gttgacatgt ataattttga tatcaaaaac tgattttccc tttattattt tcgagattta

181 ttttcttaat tctctttaac aaactagaaa tattgtatat acaaaaaatc ataaataata

241 gatgaatagt ttaattatag gtgttcatca atcgaaaaag caacgtatct tatttaaagt

301 gcgttgcttt tttctcattt ataaggttaa ataattctca tatatcaagc aaagtgacag

361 gcgcccttaa atattctgac aaatgctctt tccctaaact ccccccataa aaaaacccgc

421 cgaagcgggt ttttacgtta tttgcggatt aacgattact cgttatcaga accgcccagg

481 gggcccgagc ttaagactgg ccgtcgtttt acaacacaga aagagtttgt agaaacgcaa

541 aaaggccatc cgtcaggggc cttctgctta gtttgatgcc tggcagttcc ctactctcgc

601 cttccgcttc ctcgctcact gactcgctgc gctcggtcgt tcggctgcgg cgagcggtat

661 cagctcactc aaaggcggta atacggttat ccacagaatc aggggataac gcaggaaaga

721 acatgtgagc aaaaggccag caaaaggcca ggaaccgtaa aaaggccgcg ttgctggcgt

781 ttttccatag gctccgcccc cctgacgagc atcacaaaaa tcgacgctca agtcagaggt

841 ggcgaaaccc gacaggacta taaagatacc aggcgtttcc ccctggaagc tccctcgtgc

901 gctctcctgt tccgaccctg ccgcttaccg gatacctgtc cgcctttctc ccttcgggaa

961 gcgtggcgct ttctcatagc tcacgctgta ggtatctcag ttcggtgtag gtcgttcgct

1021 ccaagctggg ctgtgtgcac gaaccccccg ttcagcccga ccgctgcgcc ttatccggta

1081 actatcgtct tgagtccaac ccggtaagac acgacttatc gccactggca gcagccactg

1141 gtaacaggat tagcagagcg aggtatgtag gcggtgctac agagttcttg aagtggtggg

1201 ctaactacgg ctacactaga agaacagtat ttggtatctg cgctctgctg aagccagtta

132

1261 ccttcggaaa aagagttggt agctcttgat ccggcaaaca aaccaccgct ggtagcggtg

1321 gtttttttgt ttgcaagcag cagattacgc gcagaaaaaa aggatctcaa gaagatcctt

1381 tgatcttttc tacggggtct gacgctcagt ggaacgacgc gcgcgtaact cacgttaagg

1441 gattttggtc atgagcttgc gccgtcccgt caagtcagcg taatgctctg cttttaccaa

1501 tgcttaatca gtgaggcacc tatctcagcg atctgtctat ttcgttcatc catagttgcc

1561 tgactccccg tcgtgtagat aactacgata cgggagggct taccatctgg ccccagcgct

1621 gcgatgatac cgcgagaacc acgctcaccg gctccggatt tatcagcaat aaaccagcca

1681 gccggaaggg ccgagcgcag aagtggtcct gcaactttat ccgcctccat ccagtctatt

1741 aattgttgcc gggaagctag agtaagtagt tcgccagtta atagtttgcg caacgttgtt

1801 gccatcgcta caggcatcgt ggtgtcacgc tcgtcgtttg gtatggcttc attcagctcc

1861 ggttcccaac gatcaaggcg agttacatga tcccccatgt tgtgcaaaaa agcggttagc

1921 tccttcggtc ctccgatcgt tgtcagaagt aagttggccg cagtgttatc actcatggtt

1981 atggcagcac tgcataattc tcttactgtc atgccatccg taagatgctt ttctgtgact

2041 ggtgagtact caaccaagtc attctgagaa tagtgtatgc ggcgaccgag ttgctcttgc

2101 ccggcgtcaa tacgggataa taccgcgcca catagcagaa ctttaaaagt gctcatcatt

2161 ggaaaacgtt cttcggggcg aaaactctca aggatcttac cgctgttgag atccagttcg

2221 atgtaaccca ctcgtgcacc caactgatct tcagcatctt ttactttcac cagcgtttct

2281 gggtgagcaa aaacaggaag gcaaaatgcc gcaaaaaagg gaataagggc gacacggaaa

2341 tgttgaatac tcatattctt cctttttcaa tattattgaa gcatttatca gggttattgt

2401 ctcatgagcg gatacatatt tgaatgtatt tagaaaaata aacaaatagg ggtcagtgtt

2461 acaaccaatt aaccaattct gaacattatc gcgagcccat ttatacctga atatggctca

2521 taacacccct tgtttgcctg gcggcagtag cgcggtggtc ccacctgacc ccatgccgaa

2581 ctcagaagtg aaacgccgta gcgccgatgg tagtgtgggg actccccatg cgagagtagg

2641 gaactgccag gcatcaaata aaacgaaagg ctcagtcgaa agactgggcc tttcgcccgg

2701 gctaataagg ggggctggat cgcttcgtgt tccccatcgg tgatgtcgta taggaagcag

133

2761 tatacgagac ctataggaga cgtatatggt cttcttttaa ttaagtaggc ctctcgagtg

2821 catctcaatt agtcagcaac catagtcccg cccctaactc cgcccatccc gcccctaact

2881 ccgcccagtt ccgcccattc tccgccccat cgctgactaa ttttttttat ttatgcagag

2941 gccgaggccg cctcggcctc tgagctattc cagaagtagt gaggaggctt ttttggaggc

3001 ctaggctttt gcaaaaagct tgcggccgct taagctgcag aagttggtcg tgaggcactg

3061 ggcaggtaag tatcaaggtt acaagacagg tttaaggaga ccaatagaaa ctgggcttgt

3121 cgagacagag aagactcttg cgtttctgat aggcacctat tggtcttact gacatccact

3181 ttgcctttct ctccacaggt gtccactccc aggttcaata cagctcttaa gcggccgcaa

3241 gcttgccgcc aacatcgatg aaaaagcctg aactcaccgc gacgtctgtc gagaagtttc

3301 tgatcgaaaa gttcgacagc gtctccgacc tgatgcagct ctcggagggc gaagaatctc

3361 gtgctttcag cttcgatgta ggagggcgtg gatatgtcct gcgggtaaat agctgcgccg

3421 atggtttcta caaagatcgt tatgtttatc ggcactttgc atcggccgcg ctcccgattc

3481 cggaagtgct tgacattggg gaattcagcg agagcctgac ctattgcatc tcccgccgtg

3541 cacagggtgt cacgttgcaa gacctgcctg aaaccgaact gcccgctgtt ctgcagccgg

3601 tcgcggaggc catggatgcg atcgctgcgg ccgatcttag ccagacgagc gggttcggcc

3661 cattcggacc gcaaggaatc ggtcaataca ctacatggcg tgatttcata tgcgcgattg

3721 ctgatcccca tgtgtatcac tggcaaactg tgatggacga caccgtcagt gcgtccgtcg

3781 cgcaggctct cgatgagctg atgctttggg ccgaggactg ccccgaagtc cggcacctcg

3841 tgcacgcgga tttcggctcc aacaatgtcc tgacggacaa tggccgcata acagcggtca

3901 ttgactggag cgaggcgatg ttcggggatt cccaatacga ggtcgccaac atcttcttct

3961 ggaggccgtg gttggcttgt atggagcagc agacgcgcta cttcgagcgg aggcatccgg

4021 agcttgcagg atcgccgcgg ctccgggcgt atatgctccg cattggtctt gaccaactct

4081 atcagagctt ggttgacggc aatttcgatg atgcagcttg ggcgcagggt cgatgcgacg

4141 caatcgtccg atccggagcc gggactgtcg ggcgtacaca aatcgcccgc agaagcgcgg

4201 ccgtctggac cgatggctgt gtagaagtac tcgccgatag tggaaaccga cgccccagca

134

4261 ctcgtccgag ggcaaaggaa tgagtttaaa ctagcctcgg aagtcagaag aagatcaggg

4321 aacgtactgc cgacgacaga agcagagagc ctagcaggcc aatgcttcta ggtgtgatga

4381 tggcaggatt gatcagatct tgcggcaacg tgtctttcgg cctttgctcc ctgtgttgta

4441 taaaagaaaa ggatgaaaag cgacccagac ggactaaaaa aacggaacaa ggactcatta

4501 tttagaacag caaggccatt gctttatctt tatatttcac gacttctcca ccccatccat

4561 tgccacctcg gccttcctgg atcaactagt atactaggca tagtctggca catcataagg

4621 gtatccagca taatcaggaa catcatacgg atagccagca tagtctggca catcataggg

4681 atacattcga agcttaacac ggggttcggc tgggatcacg gaaataccac gaaaactgtc

4741 agtgagtttg ggacgccatt ctgtgcgagc gcggaggact tcagagccgc cctcgagttg

4801 caaaagatgg tcgcaaacca gtcccgcttc gctcgagcct ccggacacag tcgtaagact

4861 gcgaaggacg ctgtcgcgag tgcattcacg gcggtattcc agggtaaacg acgaaatgtg

4921 gtgggactcg aagatagagt ctggaaccgt ttcgaaaacc caagcgacgt acttcaagtt

4981 gttgacgtgc tgattcacgt ccaagtcgtt ccatcgcggc gtgagtccac cctgaatgta

5041 gtcggcggtg gaatcgttga gtttctggag ctttttgatt tcgtcatcct tgacggccac

5101 gttatcgatg aaagcaggac caatttcgcc acgaacttcg tcgggaatag tggagaggcg

5161 acgtgtgcga gtattcatca agacggacaa ggacgtgcag cgggtcagaa tctctccggt

5221 cttacaatca cgcaccaaga aatcacgacg catgccgttg ttgccagacg caccaatcca

5281 acattcgact tcgacggtgt ctccccaggt ggggtatcgc tcgaccgcga cgtgagtacg

5341 gcggacgacc cacatcaagt cgcgcttact catctccaag gtcgtgccaa aaccatcacc

5401 cagaatacca accgatttcg cgtgattcaa tgtagcctct tgcatgtggt tcatgacagc

5461 gaggatgctc gtcgagcggt caggtccgac ttcataggag cggatggcaa aggtacgacg

5521 gaacaccaga ccatggagtc cgaagtgatc gtccagaagc tgggggagct taggcttggg

5581 cttccattcc aacatatggg cgatctcggg ccgtgtgacc cagggaaggg cggtcgagag

5641 ggtcgggtcg tcccggacgt accactttcc gagtgggccc gagccaggct ccacctcctt

5701 gaggtagttg ttggcagtag cggccttacc ggccttggct cccttccccg ccttgatggc

135

5761 ctccatcttg gttgcggctg gcacccgggc gttgggggtg aggaaagcct gggcgccgac

5821 cagacccgac aggaagagga ctgcggtctt catcgcggtt gataggacgc aaataggggg

5881 cctggaacgg aatgtattta taaggtgtag attagaggga gtgcggattt acgaacaata

5941 atagcaccag ctttggtctt caaagacatg gtctctcact acagcgcgag gtcgagatag

6001 attttaaact ttctcagagt ggcagcgtga atgtgcgtta caactctttg aatctttgga

6061 aagcactcaa ctacaataag tgtatgaggt gccatgtatt gacaagtcaa aagaattcat

6121 cgtctgaatc catgtcttct ggagccaaaa tccttctctg cctcgccgta ctgtgcttct

6181 cattagcgcc tgctggccct aggcttcgat aaagcaagag cgcacatgag agctatcagc

6241 ggacgtattc ggtcccaaca agttgactca ataggtccta gggaaacgca tggtctaggt

6301 cctggaatga gttcatcagg caagaaaggg gccggacaga ggagtcgggg aagcgagcga

6361 tgggagttga attgaggagg ataccgttcc tgcggtcgac tggtctgcgt ggaaataatg

6421 ccccacaagg aaggataaat cttctccctc gcggcgtagc gccctcctca agagctgccc

6481 aagtttaaat ctgcgcaatc tttgggcccc aatatagctc tgagcctcac gaatggcttc

6541 atggctgcct cctggttgac gtgcggagcc agaaaggatt tcctcttcaa cgtgagcgtt

6601 aaatgcttgt cccatgtccg acgcgggggc gaccccgcgt ggacgtgcgc ttgcagaacg

6661 tttactccct cctcctagac aatgaagtat gtggaataga acacatgtac tctcgtgttt

6721 ggggatgtgc acgattatga gcgaggagaa gtttttggtg tcaaaatcac actctcttcc

6781 acctattaat cctacacgag cagatccact cccccttctt caaaatgaag tccgcgatgg

6841 tacttgtcgg cagcttggct gtggcctctg cattcgttcc tgcggcccct aagatgtctc

6901 gcagtaagta tagcgcgcgg gagggtcatg cattcgtgga gaatgtgttt gacgttacga

6961 aatgtaattt gaggatcgag ggatgtaacc gtcctagact atgaggggtt ccaaacacgt

7021 agcactttgc gatgaaaatt ttcatgtcga tatctgcata actctctatc ggtttgaggg

7081 ctggatagtc gggtgtgtcg ggcggtgggg agattgtaca tgacgtgacc tcgcagatga

7141 gggcaaagta ctactttttg gaaaagtggg tagcttttta ccggcccagc tgaccttaat

7201 tttcaaaaag tgtttgcgaa tcctaggact catacgccaa ccctctttct gtttttgaca

136

7261 gctcgtggga tgacacgcat ggccgtgaac gacatcctgg gctcagacgt ggagactggc

7321 ggcgtgtggg accccctcaa tttttccaaa gacgagggca gcctgtaccg atatcgcgcc

7381 gtggagctca agcacggccg attggcggga tccatggtga gcaagggcga ggagctgttc

7441 accggggtgg tgcccatcct ggtcgagctg gacggcgacg taaacggcca caagttcagc

7501 gtgtccggcg agggcgaggg cgatgccacc tacggcaagc tgaccctgaa gttcatctgc

7561 accaccggca agctgcccgt gccctggccc accctcgtga ccaccttcgg ctacggcctg

7621 cagtgcttcg cccgctaccc cgaccacatg aagcagcacg acttcttcaa gtccgccatg

7681 cccgaaggct acgtccagga gcgcaccatc ttcttcaagg acgacggcaa ctacaagacc

7741 cgcgccgagg tgaggttcga gggcgacacc ctggtgaacc gcatcgagct gaagggcatc

7801 gacttcaagg aggacggcaa catcctgggg cacaagctgg agtacaacta caacagccac

7861 aacgtctata tcatggccga caagcagaag aacggcatca aggtgaactt caagatccgc

7921 cacaacatcg aggacggcag cgtgcagctc gccgaccact accagcagaa cacccccatc

7981 ggcgacggcc ccgtgctgct gcccgacaac cactacctga gctaccagtc cgccctgagc

8041 aaagacccca acgagaagcg cgatcacatg gtcctgctgg agttcgtgac cgccgcaggg

8101 atccaggtga gcatggacga gctgtacaag taagttaacg cgtaggaaag tgggttacct

8161 ggctagctag ggagggttac tcgtctcaag ttcatggcga gcgaatgaat gatcgggagg

8221 gactatgatc gggatggtcg gcgggcaatt atgcagggca tgctagtatg agtgcaaggt

8281 agaatgctac ggacgtcatg gtgaagctcg acgacgcttc ccacaatcag atagcggtgg

8341 gtctcggttg ggacgcataa cctccatttc tgcccttgaa cgaccaaaga aagaagtgga

8401 aaaaatacaa cccgcccgga caatttgtca tctcttcgct ttattcccgt ctcttgcttt

8461 tttcattccc ccttcctgct tcccccacga cgatgtacaa cgcacatgca tgctgcgacg

8521 acttcaagtt gatgatcggt ctaggcctgc agggctcagc ggccgctcta gaccccgaag

8581 acactata

//

137

Genetic Data D.2. Genbank sequence of ngOX_CAH1-bfloGFPa1, an representative of the expression plasmid used to localize identified components of the carbon concentrating mechanism in N. gaditana CCMP526.

LOCUS ngOX_CAH1-bfloGF 10143 bp DNA circular UNA 11-JUN-

2018

DEFINITION A new nucleotide sequence entered manually.

ACCESSION urn.local...eu-8buax2e

VERSION urn.local...eu-8buax2e

KEYWORDS .

SOURCE

ORGANISM

.

FEATURES Location/Qualifiers

misc_feature 442..972

/product="Gentamicin-r"

/note="Geneious type: gene"

misc_feature 1071..1743

/product="Ori_pUC"

/note="Geneious type: rep_origin"

promoter 2559..2761

intron 2806..2938

gene 2998..4023

polyA_signal 4066..4287

misc_feature 4307..4543

/note="Geneious type: conflict"

/note="Geneious type: 3'UTR"

138

promoter 4550..5448

misc_feature 5452..7062

/note="Geneious type: ORF"

sig_peptide 5452..5520

misc_feature 6385..6402

/note="Geneious type: polylinker"

gene 6403..7062

terminator 7089..7605

misc_feature 8105..8962

/product="Ampicillin-r"

/note="Geneious type: gene"

ORIGIN

1 ggcctatttc cctaaagggt ttattgagaa tatgtttttc gtcagcgcca atccctgggt

61 gagtttcacc agttttgatt taaacgtggc caatatggac aacttcttcg cccccgtttt

121 cactatgggc aaatattata cgcaaggcga caaggtgctg atgccgctgg cgattcaggt

181 tcatcatgcc gtctgtgatg gcttccatgt cggcagaatg cttaatgaat tacaacagta

241 ctgcgatgag tggcagggcg gggcgtaaac gccgaggagg aaaaaaaaat gcgctcacgc

301 aactggtcca gaaccttgac cgaacgcagc ggtggtaacg gcgcagtggc ggttttcatg

361 gcttgttatg actgtttttt tgtacagtct atgcctcggg catccaagca gcaagcgcgt

421 tacgccgtgg gtcgatgttt gatgttatgg agcagcaacg atgttacgca gcagggcagt

481 cgccctaaaa caaagttagg tggctcaagt atgggcatca ttcgcacatg taggctcggc

541 cctgaccaag tcaaatccat gcgggctgct cttgatcttt tcggtcgtga gttcggtgac

601 gtagccacct actcccaaca tcagccggac tccgattacc tcgggaactt gctccgtagt

661 aagacattca tcgcgcttgc tgccttcgac caagaagcgg ttgttggcgc tctcgcggct

721 tacgttctgc ccaagtttga gcagccgcgt agtgagatct atatctatga tctcgcagtc

139

781 tccggcgagc accggaggca gggcattgcc accgcgctca tcaatctcct caagcatgag

841 gccaacgcgc ttggtgctta tgtgatctac gtgcaagcag attacggtga cgatcccgca

901 gtggctctct atacaaagtt gggcatacgg gaagaagtga tgcactttga tatcgaccca

961 agtaccgcca cctaagcaga gcattacgct gacttgacgg gacggcgcaa gctcatgacc

1021 aaaatccctt aacgtgagtt acgcgcgcgt cgttccactg agcgtcagac cccgtagaaa

1081 agatcaaagg atcttcttga gatccttttt ttctgcgcgt aatctgctgc ttgcaaacaa

1141 aaaaaccacc gctaccagcg gtggtttgtt tgccggatca agagctacca actctttttc

1201 cgaaggtaac tggcttcagc agagcgcaga taccaaatac tgttcttcta gtgtagccgt

1261 agttagccca ccacttcaag aactctgtag caccgcctac atacctcgct ctgctaatcc

1321 tgttaccagt ggctgctgcc agtggcgata agtcgtgtct taccgggttg gactcaagac

1381 gatagttacc ggataaggcg cagcggtcgg gctgaacggg gggttcgtgc acacagccca

1441 gcttggagcg aacgacctac accgaactga gatacctaca gcgtgagcta tgagaaagcg

1501 ccacgcttcc cgaagggaga aaggcggaca ggtatccggt aagcggcagg gtcggaacag

1561 gagagcgcac gagggagctt ccagggggaa acgcctggta tctttatagt cctgtcgggt

1621 ttcgccacct ctgacttgag cgtcgatttt tgtgatgctc gtcagggggg cggagcctat

1681 ggaaaaacgc cagcaacgcg gcctttttac ggttcctggc cttttgctgg ccttttgctc

1741 acatgttctt tcctgcgtta tcccctgatt ctgtggataa ccgtattacc gcctttgagt

1801 gagctgatac cgctcgccgc agccgaacga ccgagcgcag cgagtcagtg agcgaggaag

1861 cggaaggcga gagtagggaa ctgccaggca tcaaactaag cagaaggccc ctgacggatg

1921 gcctttttgc gtttctacaa actctttctg tgttgtaaaa cgacggccag tcttaagctc

1981 gggccccctg ggcggttctg ataacgagta atcgttaatc cgcaaataac gtaaaaaccc

2041 gcttcggcgg gtttttttat ggggggagtt tagggaaaga gcatttgtca gaatatttaa

2101 gggcgcctgt cactttgctt gatatatgag aattatttaa ccttataaat gagaaaaaag

2161 caacgcactt taaataagat acgttgcttt ttcgattgat gaacacctat aattaaacta

2221 ttcatctatt atttatgatt ttttgtatat acaatatttc tagtttgtta aagagaatta

140

2281 agaaaataaa tctcgaaaat aataaaggga aaatcagttt ttgatatcaa aattatacat

2341 gtcaacgata atacaaaata taatacaaac tataagatgt tatcagtatt tattatgcat

2401 ttagaatacg tactcagcgt ctgggttccc catcggtgat gtcgtataag agacgtatag

2461 gagacctata gtgtcttcgg ggatttaaat gaattccccg gggatcccgc ttagcgagct

2521 ctctagacat gccctgcagg accggtaggc ctctcgagtg catctcaatt agtcagcaac

2581 catagtcccg cccctaactc cgcccatccc gcccctaact ccgcccagtt ccgcccattc

2641 tccgccccat cgctgactaa ttttttttat ttatgcagag gccgaggccg cctcggcctc

2701 tgagctattc cagaagtagt gaggaggctt ttttggaggc ctaggctttt gcaaaaagct

2761 tgcggccgct taagctgcag aagttggtcg tgaggcactg ggcaggtaag tatcaaggtt

2821 acaagacagg tttaaggaga ccaatagaaa ctgggcttgt cgagacagag aagactcttg

2881 cgtttctgat aggcacctat tggtcttact gacatccact ttgcctttct ctccacaggt

2941 gtccactccc aggttcaata cagctcttaa gcggccgcaa gcttgccgcc aacatcgatg

3001 aaaaagcctg aactcaccgc gacgtctgtc gagaagtttc tgatcgaaaa gttcgacagc

3061 gtctccgacc tgatgcagct ctcggagggc gaagaatctc gtgctttcag cttcgatgta

3121 ggagggcgtg gatatgtcct gcgggtaaat agctgcgccg atggtttcta caaagatcgt

3181 tatgtttatc ggcactttgc atcggccgcg ctcccgattc cggaagtgct tgacattggg

3241 gaattcagcg agagcctgac ctattgcatc tcccgccgtg cacagggtgt cacgttgcaa

3301 gacctgcctg aaaccgaact gcccgctgtt ctgcagccgg tcgcggaggc catggatgcg

3361 atcgctgcgg ccgatcttag ccagacgagc gggttcggcc cattcggacc gcaaggaatc

3421 ggtcaataca ctacatggcg tgatttcatc tgcgcgattg ctgatcccca tgtgtatcac

3481 tggcaaactg tgatggacga caccgtcagt gcgtccgtcg cgcaggctct cgatgagctg

3541 atgctttggg ccgaggactg ccccgaagtc cggcacctcg tgcacgcgga tttcggctcc

3601 aacaatgtcc tgacggacaa tggccgcata acagcggtca ttgactggag cgaggcgatg

3661 ttcggggatt cccaatacga ggtcgccaac atcttcttct ggaggccgtg gttggcttgt

3721 atggagcagc agacgcgcta cttcgagcgg aggcatccgg agcttgcagg atcgccgcgg

141

3781 ctccgggcgt atatgctccg cattggtctt gaccaactct atcagagctt ggttgacggc

3841 aatttcgatg atgcagcttg ggcgcagggt cgatgcgacg caatcgtccg atccggagcc

3901 gggactgtcg ggcgtacaca aatcgcccgc agaagcgcgg ccgtctggac cgatggctgt

3961 gtagaagtac tcgccgatag tggaaaccga cgccccagca ctcgtccgag ggcaaaggaa

4021 tgacgtacga ccaggtctag agtcggggcg gtcggccgct tcgagcagac atgataagat

4081 acattgatga gtttggacaa accacaacta gaatgcagtg aaaaaaatgc tttatttgtg

4141 aaatttgtga tgctattgct ttatttgtaa ccattataag ctgcaataaa caagttaaca

4201 acaacaattg cattcatttt atgtttcagg ttcaggggga ggtgtgggag gttttttaaa

4261 gcaagtaaaa cctctacaaa tgtggtaaaa tgcataagga tctgaacgat ggagcggaga

4321 atgggcggaa ctgggcggag ttaggggcgg gatgggcgga gttaggggcg ggactatggt

4381 tgctgactaa ttgagatgca tgctttgcat acttctgcct gctggggagc ctggggactt

4441 tccacacctg gttgctgact aattgagatg catgctttgc atacttctgc ctgctgggga

4501 gcctggggac tttccacacc ctaactgaca cacattccac agctcatgag tgccattgtt

4561 tcttctttct ccctctgttg tctcaatcgg aaaagctagg gccggggggt ggccgtaatc

4621 ccagacattc cgggcgatgc ctcgagccaa cgggtggctc ccgacgtgct tggctttcgt

4681 gggcagggag agccctgcgc tgggttttag acgtgccgcg agtgctgctt gattcgaggt

4741 acacgttaga gatatgaggc acatagcact tcgctggcac ccccgcgtgt aggcgtgtcc

4801 tcatccgtgt ttggtttggt acgtcggtta ccggagaggc ctgtgtggct gtcacgcgcg

4861 acgaagacgg gggagtgttg cctcgtgcct ccaggtgctt tgatgacgcg cattgcgcgt

4921 ttcgcttatc caaacaccgc aatcactatt tgaccatgca atcaggttga tagggacgtc

4981 aaagtgcatt ttctgccttt aaactggtcg cgcgggtgta gatgtggcat ctggacggga

5041 ccctcagaac gaagaaaaca agatagggat ggctaagaat tgcacacgcc cctcgtgtgg

5101 ctgaaagact tttgcgccac attaacgcgt aaacaaattg agctagtctt acagttattc

5161 aggaggagga caaaattatg tcgtccgacc gtgtgttatc taggaagccg tgacgcgaga

5221 catataccta gtatctctag tacggctgcg tcttctttcc tccctttttg aaaggaagac

142

5281 gtcggtgtcc gcgattctat tttatgcgaa aggtgtggga ggtgcattag tcgcgctttt

5341 cttgatagtt gaccgtctgg ctgattcagg gaaatcctca caccatcctg tctttctcac

5401 acgacagcct cttctttcgg aatattcccc tcaacaccta atctgtaaca tatgaagatc

5461 ttcacgacct ctgcagcatg cctggccctc atgatggccc ccttgtccct gatcaaggct

5521 gccacaagca tggaagacct tgagaagatg gagtcctgga gctacatcca caatgtgaac

5581 tacgcaccct gtgacgacgc ggacgccaag tctttcgggt gctgcgggaa gcccgaagac

5641 gggaaggagc gatgcggtcc cgacaactgg atccatatca atgagaccac gaccaccacc

5701 tgcgccaacg tgaagggaaa ccaacagagc cccatcaaca tccaaaccgt gggcaccacc

5761 ctcacgccca tggacaaggc ggacgggacc atgtccttct ccgggcacac gtgcgaagcc

5821 ttcattgagt tcaaggaggg gacgtgggag gtggttatcc ctgacacgtg caccttccaa

5881 gccaagatcg ggggcaccga ctacaaactc ctccagttcc acttccacaa cgcggagcac

5941 tccatcaact tctcctacat gccgctggag gtccatctcg tccacgccaa ggtagacagt

6001 cccaaggaca tcgccgtcct ctccgtcttc ctcgcgcccg gggcctcctc cacgttcttt

6061 gagaccctgc aggcgaacga ggcgcaggca gaccccgacc accccgacct ggccaccaag

6121 aacaagatca acgcctacga catcctcccc gaggaccact ccttctggca ctacgtgggc

6181 tccctcacca cccctccctg ccagaccgag aagggacagt ctgtgaactg gtacctgttc

6241 aagacgccca acaccctgtc ctacagccag ctctactact tcacctccta cttccgcgac

6301 ctgcccctgt cggacaacgg ccgcatctcc cgcgacaccc aggacctggt ggagggaacc

6361 accctcttca ctttcccctc cgccggcgga ggaagcggag gaatgcctct gcccgcaacc

6421 cacgacatcc accttcacgg ctccatcaac ggccacgagt tcgacatggt ggggggagga

6481 aaaggcgacc cgaacgccgg ctcgctggtg accacagcga aatccaccaa gggtgccctg

6541 aagttctctc cctacttgat gatcccccac ctcgggtacg ggtactacca gtacctcccc

6601 tacccggacg gaccctcgcc tttccaggtc tccatgttgg aaggatcggg gtatgcagtc

6661 taccgcgtgt tcgactttga agacggaggc aagctgtcta ccgagtttaa gtactcctac

6721 gagggttccc atatcaaggc cgacatgaag ctgatgggaa gcggtttccc tgacgacggc

143

6781 ccagtcatga ccagccagat tgtcgaccag gacggctgcg tgtccaagaa gacgtatctt

6841 aacaacaaca ccatcgtgga cagcttcgac tggagttaca acctgcagaa tgggaagcgc

6901 tacagggccc gagtgtcgag ccactacatc ttcgacaagc ccttttcagc cgatctcatg

6961 aagaagcagc cggtcttcgt ttaccgcaag tgccacgtga aggctaccaa gaccgaagtc

7021 accctggacg agagggagaa ggcgttctac gagctggctt gagctagcat taataccatg

7081 gtttaaacta agctttcgca tccacgtaca aaactggatg ctgaatgaga tctggaggga

7141 gggatggctg ccgagtccgg agtccaagga aattcaagcg gttacgaagg aaattgtatg

7201 tcctgctaat actttatggt cgatggggaa tttggttgat tacgagtgag agttcatttg

7261 ctgctatcac ttcaagttcg agagattttc aggatgaata ggaaggaggt cagtattgag

7321 gcggacgaga ggctccgtgg ggccgcagct cgcgcgagag ggcgcagcgc cttttgggaa

7381 aggtggactg atctcgtcac cgagacgaag ctcatgcgct ttgatttatt tatgggcctc

7441 aggggaagat ataactcttt ttttgttgat gaggtgatgt gtcttagttt aatgtgtcag

7501 gcagccactg gtcctgactt atccgaatcc tacttgtcat agcgaggaga agaatgaaac

7561 ataccaagaa gcaatgagtg gtgcaatgac actctgactg ttacgggcgc gcctgagggc

7621 ggccgcacta gtcagctgat gcgcatggta ccttaattaa aaaagaagac catatacgtc

7681 tcctataggt ctcgtatact gcttcctata cgacatcacc gatggggaac acgaagcgat

7741 ccagcccccc taattagccc gggcgaaagg cccagtcttt cgactgagcc tttcgtttta

7801 tttgatgcct ggcagttccc tactctcgca tggggagtcc ccacactacc atcggcgcta

7861 cggcgtttca cttctgagtt cggcatgggg tcaggtggga ccaccgcgct actgccgcca

7921 ggcaaacaag gggtgttatg agccatattc aggtataaat gggctcgcga taatgttcag

7981 aattggttaa ttggttgtaa cactgacccc tatttgttta tttttctaaa tacattcaaa

8041 tatgtatccg ctcatgagac aataaccctg ataaatgctt caataatatt gaaaaaggaa

8101 gaatatgagt attcaacatt tccgtgtcgc ccttattccc ttttttgcgg cattttgcct

8161 tcctgttttt gctcacccag aaacgctggt gaaagtaaaa gatgctgaag atcagttggg

8221 tgcacgagtg ggttacatcg aactggatct caacagcggt aagatccttg agagttttcg

144

8281 ccccgaagaa cgttttccaa tgatgagcac ttttaaagtt ctgctatgtg gcgcggtatt

8341 atcccgtatt gacgccgggc aagagcaact cggtcgccgc atacactatt ctcagaatga

8401 cttggttgag tactcaccag tcacagaaaa gcatcttacg gatggcatga cagtaagaga

8461 attatgcagt gctgccataa ccatgagtga taacactgcg gccaacttac ttctgacaac

8521 gatcggagga ccgaaggagc taaccgcttt tttgcacaac atgggggatc atgtaactcg

8581 ccttgatcgt tgggaaccgg agctgaatga agccatacca aacgacgagc gtgacaccac

8641 gatgcctgta gcgatggcaa caacgttgcg caaactatta actggcgaac tacttactct

8701 agcttcccgg caacaattaa tagactggat ggaggcggat aaagttgcag gaccacttct

8761 gcgctcggcc cttccggctg gctggtttat tgctgataaa tccggagccg gtgagcgtgg

8821 ttctcgcggt atcatcgcag cgctggggcc agatggtaag ccctcccgta tcgtagttat

8881 ctacacgacg gggagtcagg caactatgga tgaacgaaat agacagatcg ctgagatagg

8941 tgcctcactg attaagcatt ggtaaaggag gaaaaaaaaa tgagccatat tcaacgggaa

9001 acgtcgaggc cgcgattaaa ttccaacatg gatgctgatt tatatgggta taaatgggct

9061 cgcgataatg tcgggcaatc aggtgcgaca atctatcgct tgtatgggaa gcccgatgcg

9121 ccagagttgt ttctgaaaca tggcaaaggt agcgttgcca atgatgttac agatgagatg

9181 gtcagactaa actggctgac ggaatttatg ccacttccga ccatcaagca ttttatccgt

9241 actcctgatg atgcatggtt actcaccact gcgatccccg gaaaaacagc gttccaggta

9301 ttagaagaat atcctgattc aggtgaaaat attgttgatg cgctggcagt gtgcacctgt

9361 ttgtaattgt ccttttaaca gcgatcgcgt atttcgcctc gctcaggcgc aatcacgaat

9421 gaataacggt ttggttgatg cgagtgattt tgatgacgag cgtaatggct ggcctgttga

9481 acaagtctgg aaagaaatgc ataaactttt gccattctca ccggattcag tcgtcactca

9541 tggtgatttc tcacttgata accttatttt tgacgagggg aaattaatag gttgtattga

9601 tgttggacga gtcggaatcg cagaccgata ccaggatctt gccatcctat ggaactgcct

9661 cggtgagttt tctccttcat tacagaaacg gctttttcaa aaatatggta ttgataatcc

9721 tgatatgaat aaattgcagt ttcatttgat gctcgatgag tttttctaaa ggaggaaaaa

145

9781 aaaatggaga aaaaaatcac tggatatacc accgttgata tatcccaatg gcatcgtaaa

9841 gaacattttg aggcatttca gtcagttgct caatgtacct ataaccagac cgttcagctg

9901 gatattacgg cctttttaaa gaccgtaaag aaaaataagc acaagtttta tccggccttt

9961 attcacattc ttgcccgcct gatgaatgct catccggagt tccgtatggc aatgaaagac

10021 ggtgagctgg tgatatggga tagtgttcac ccttgttaca ccgttttcca tgagtgcaca

10081 cgacgatttc cggcagtttc tacacatata ttcgcaagat gtggcgtgtt acggtgaaaa

10141 cct

//

Genetic Data D.3. Genbank sequence of ngOX-hyg_Cas9-NLS-GFP-NLS, the vector used to generate the stable Cas9-expressing chassis strain of N. gaditana CCMP526.

LOCUS ngOX-hyg_Cas9-NL 13353 bp DNA circular UNA 11-JUN-

2018

DEFINITION A new nucleotide sequence entered manually.

ACCESSION urn.local...y-94nh5f0

VERSION urn.local...y-94nh5f0

KEYWORDS .

SOURCE

ORGANISM

.

FEATURES Location/Qualifiers

misc_feature 442..972

/product="Gentamicin-r"

/note="Geneious type: gene"

misc_feature 1071..1743

/product="Ori_pUC"

146

/note="Geneious type: rep_origin"

promoter 2559..2761

intron 2806..2938

CDS 2998..>3000

gene 2998..4023

polyA_signal 4066..4287

misc_feature 4307..4543

/note="Geneious type: conflict"

/note="Geneious type: 3'UTR"

promoter 4550..5448

CDS <5452..10296

gene 5452..9555

sig_peptide 9568..9588

misc_feature 9595..9612

/note="Geneious type: polylinker"

gene 9613..>10269

sig_peptide 10273..10293

terminator 10299..10815

misc_feature 11315..12172

/product="Ampicillin-r"

/note="Geneious type: gene"

ORIGIN

1 ggcctatttc cctaaagggt ttattgagaa tatgtttttc gtcagcgcca atccctgggt

61 gagtttcacc agttttgatt taaacgtggc caatatggac aacttcttcg cccccgtttt

121 cactatgggc aaatattata cgcaaggcga caaggtgctg atgccgctgg cgattcaggt

147

181 tcatcatgcc gtctgtgatg gcttccatgt cggcagaatg cttaatgaat tacaacagta

241 ctgcgatgag tggcagggcg gggcgtaaac gccgaggagg aaaaaaaaat gcgctcacgc

301 aactggtcca gaaccttgac cgaacgcagc ggtggtaacg gcgcagtggc ggttttcatg

361 gcttgttatg actgtttttt tgtacagtct atgcctcggg catccaagca gcaagcgcgt

421 tacgccgtgg gtcgatgttt gatgttatgg agcagcaacg atgttacgca gcagggcagt

481 cgccctaaaa caaagttagg tggctcaagt atgggcatca ttcgcacatg taggctcggc

541 cctgaccaag tcaaatccat gcgggctgct cttgatcttt tcggtcgtga gttcggtgac

601 gtagccacct actcccaaca tcagccggac tccgattacc tcgggaactt gctccgtagt

661 aagacattca tcgcgcttgc tgccttcgac caagaagcgg ttgttggcgc tctcgcggct

721 tacgttctgc ccaagtttga gcagccgcgt agtgagatct atatctatga tctcgcagtc

781 tccggcgagc accggaggca gggcattgcc accgcgctca tcaatctcct caagcatgag

841 gccaacgcgc ttggtgctta tgtgatctac gtgcaagcag attacggtga cgatcccgca

901 gtggctctct atacaaagtt gggcatacgg gaagaagtga tgcactttga tatcgaccca

961 agtaccgcca cctaagcaga gcattacgct gacttgacgg gacggcgcaa gctcatgacc

1021 aaaatccctt aacgtgagtt acgcgcgcgt cgttccactg agcgtcagac cccgtagaaa

1081 agatcaaagg atcttcttga gatccttttt ttctgcgcgt aatctgctgc ttgcaaacaa

1141 aaaaaccacc gctaccagcg gtggtttgtt tgccggatca agagctacca actctttttc

1201 cgaaggtaac tggcttcagc agagcgcaga taccaaatac tgttcttcta gtgtagccgt

1261 agttagccca ccacttcaag aactctgtag caccgcctac atacctcgct ctgctaatcc

1321 tgttaccagt ggctgctgcc agtggcgata agtcgtgtct taccgggttg gactcaagac

1381 gatagttacc ggataaggcg cagcggtcgg gctgaacggg gggttcgtgc acacagccca

1441 gcttggagcg aacgacctac accgaactga gatacctaca gcgtgagcta tgagaaagcg

1501 ccacgcttcc cgaagggaga aaggcggaca ggtatccggt aagcggcagg gtcggaacag

1561 gagagcgcac gagggagctt ccagggggaa acgcctggta tctttatagt cctgtcgggt

1621 ttcgccacct ctgacttgag cgtcgatttt tgtgatgctc gtcagggggg cggagcctat

148

1681 ggaaaaacgc cagcaacgcg gcctttttac ggttcctggc cttttgctgg ccttttgctc

1741 acatgttctt tcctgcgtta tcccctgatt ctgtggataa ccgtattacc gcctttgagt

1801 gagctgatac cgctcgccgc agccgaacga ccgagcgcag cgagtcagtg agcgaggaag

1861 cggaaggcga gagtagggaa ctgccaggca tcaaactaag cagaaggccc ctgacggatg

1921 gcctttttgc gtttctacaa actctttctg tgttgtaaaa cgacggccag tcttaagctc

1981 gggccccctg ggcggttctg ataacgagta atcgttaatc cgcaaataac gtaaaaaccc

2041 gcttcggcgg gtttttttat ggggggagtt tagggaaaga gcatttgtca gaatatttaa

2101 gggcgcctgt cactttgctt gatatatgag aattatttaa ccttataaat gagaaaaaag

2161 caacgcactt taaataagat acgttgcttt ttcgattgat gaacacctat aattaaacta

2221 ttcatctatt atttatgatt ttttgtatat acaatatttc tagtttgtta aagagaatta

2281 agaaaataaa tctcgaaaat aataaaggga aaatcagttt ttgatatcaa aattatacat

2341 gtcaacgata atacaaaata taatacaaac tataagatgt tatcagtatt tattatgcat

2401 ttagaatacg tactcagcgt ctgggttccc catcggtgat gtcgtataag agacgtatag

2461 gagacctata gtgtcttcgg ggatttaaat gaattccccg gggatcccgc ttagcgagct

2521 ctctagacat gccctgcagg accggtaggc ctctcgagtg catctcaatt agtcagcaac

2581 catagtcccg cccctaactc cgcccatccc gcccctaact ccgcccagtt ccgcccattc

2641 tccgccccat cgctgactaa ttttttttat ttatgcagag gccgaggccg cctcggcctc

2701 tgagctattc cagaagtagt gaggaggctt ttttggaggc ctaggctttt gcaaaaagct

2761 tgcggccgct taagctgcag aagttggtcg tgaggcactg ggcaggtaag tatcaaggtt

2821 acaagacagg tttaaggaga ccaatagaaa ctgggcttgt cgagacagag aagactcttg

2881 cgtttctgat aggcacctat tggtcttact gacatccact ttgcctttct ctccacaggt

2941 gtccactccc aggttcaata cagctcttaa gcggccgcaa gcttgccgcc aacatcgatg

3001 aaaaagcctg aactcaccgc gacgtctgtc gagaagtttc tgatcgaaaa gttcgacagc

3061 gtctccgacc tgatgcagct ctcggagggc gaagaatctc gtgctttcag cttcgatgta

3121 ggagggcgtg gatatgtcct gcgggtaaat agctgcgccg atggtttcta caaagatcgt

149

3181 tatgtttatc ggcactttgc atcggccgcg ctcccgattc cggaagtgct tgacattggg

3241 gaattcagcg agagcctgac ctattgcatc tcccgccgtg cacagggtgt cacgttgcaa

3301 gacctgcctg aaaccgaact gcccgctgtt ctgcagccgg tcgcggaggc catggatgcg

3361 atcgctgcgg ccgatcttag ccagacgagc gggttcggcc cattcggacc gcaaggaatc

3421 ggtcaataca ctacatggcg tgatttcatc tgcgcgattg ctgatcccca tgtgtatcac

3481 tggcaaactg tgatggacga caccgtcagt gcgtccgtcg cgcaggctct cgatgagctg

3541 atgctttggg ccgaggactg ccccgaagtc cggcacctcg tgcacgcgga tttcggctcc

3601 aacaatgtcc tgacggacaa tggccgcata acagcggtca ttgactggag cgaggcgatg

3661 ttcggggatt cccaatacga ggtcgccaac atcttcttct ggaggccgtg gttggcttgt

3721 atggagcagc agacgcgcta cttcgagcgg aggcatccgg agcttgcagg atcgccgcgg

3781 ctccgggcgt atatgctccg cattggtctt gaccaactct atcagagctt ggttgacggc

3841 aatttcgatg atgcagcttg ggcgcagggt cgatgcgacg caatcgtccg atccggagcc

3901 gggactgtcg ggcgtacaca aatcgcccgc agaagcgcgg ccgtctggac cgatggctgt

3961 gtagaagtac tcgccgatag tggaaaccga cgccccagca ctcgtccgag ggcaaaggaa

4021 tgacgtacga ccaggtctag agtcggggcg gtcggccgct tcgagcagac atgataagat

4081 acattgatga gtttggacaa accacaacta gaatgcagtg aaaaaaatgc tttatttgtg

4141 aaatttgtga tgctattgct ttatttgtaa ccattataag ctgcaataaa caagttaaca

4201 acaacaattg cattcatttt atgtttcagg ttcaggggga ggtgtgggag gttttttaaa

4261 gcaagtaaaa cctctacaaa tgtggtaaaa tgcataagga tctgaacgat ggagcggaga

4321 atgggcggaa ctgggcggag ttaggggcgg gatgggcgga gttaggggcg ggactatggt

4381 tgctgactaa ttgagatgca tgctttgcat acttctgcct gctggggagc ctggggactt

4441 tccacacctg gttgctgact aattgagatg catgctttgc atacttctgc ctgctgggga

4501 gcctggggac tttccacacc ctaactgaca cacattccac agctcatgag tgccattgtt

4561 tcttctttct ccctctgttg tctcaatcgg aaaagctagg gccggggggt ggccgtaatc

4621 ccagacattc cgggcgatgc ctcgagccaa cgggtggctc ccgacgtgct tggctttcgt

150

4681 gggcagggag agccctgcgc tgggttttag acgtgccgcg agtgctgctt gattcgaggt

4741 acacgttaga gatatgaggc acatagcact tcgctggcac ccccgcgtgt aggcgtgtcc

4801 tcatccgtgt ttggtttggt acgtcggtta ccggagaggc ctgtgtggct gtcacgcgcg

4861 acgaagacgg gggagtgttg cctcgtgcct ccaggtgctt tgatgacgcg cattgcgcgt

4921 ttcgcttatc caaacaccgc aatcactatt tgaccatgca atcaggttga tagggacgtc

4981 aaagtgcatt ttctgccttt aaactggtcg cgcgggtgta gatgtggcat ctggacggga

5041 ccctcagaac gaagaaaaca agatagggat ggctaagaat tgcacacgcc cctcgtgtgg

5101 ctgaaagact tttgcgccac attaacgcgt aaacaaattg agctagtctt acagttattc

5161 aggaggagga caaaattatg tcgtccgacc gtgtgttatc taggaagccg tgacgcgaga

5221 catataccta gtatctctag tacggctgcg tcttctttcc tccctttttg aaaggaagac

5281 gtcggtgtcc gcgattctat tttatgcgaa aggtgtggga ggtgcattag tcgcgctttt

5341 cttgatagtt gaccgtctgg ctgattcagg gaaatcctca caccatcctg tctttctcac

5401 acgacagcct cttctttcgg aatattcccc tcaacaccta atctgtaaca tatggacaag

5461 aagtacagca tcggcctgga catcggcacg aactcggtgg gctgggcggt gatcacggac

5521 gagtacaagg tgccctccaa gaagttcaag gtgctgggca acaccgaccg ccactcgatc

5581 aagaagaacc tgatcggcgc cctgctgttc gactccggcg agaccgccga ggcgacgcgc

5641 ctgaagcgca ccgcgcgtcg ccgctacacg cgtcgcaaga accgcatctg ctacctgcag

5701 gagatcttca gcaacgagat ggccaaggtg gacgactcgt tcttccaccg cctggaggag

5761 tccttcctgg tggaggaaga caagaagcac gagcgccacc ccatcttcgg caacatcgtg

5821 gacgaggtgg cctaccacga gaagtacccg acgatctacc acctgcgcaa gaagctggtg

5881 gacagcaccg acaaggcgga cctgcgcctg atctacctgg ccctggcgca catgatcaag

5941 ttccgcggcc acttcctgat cgagggcgac ctgaaccccg acaactcgga cgtggacaag

6001 ctgttcatcc agctggtgca gacctacaac cagctgttcg aggagaaccc gatcaacgcc

6061 tccggcgtgg acgccaaggc gatcctgagc gcgcgcctgt ccaagagccg tcgcctggag

6121 aacctgatcg cccagctgcc cggcgagaag aagaacggcc tgttcggcaa cctgatcgcg

151

6181 ctgtcgctgg gcctgacgcc gaacttcaag tccaacttcg acctggccga ggacgcgaag

6241 ctgcagctga gcaaggacac ctacgacgac gacctggaca acctgctggc ccagatcggc

6301 gaccagtacg cggacctgtt cctggccgcg aagaacctgt cggacgccat cctgctgtcc

6361 gacatcctgc gcgtgaacac cgagatcacg aaggcccccc tgtcggcgtc catgatcaag

6421 cgctacgacg agcaccacca ggacctgacc ctgctgaagg cgctggtgcg ccagcagctg

6481 ccggagaagt acaaggagat cttcttcgac cagagcaaga acggctacgc cggctacatc

6541 gacggcggcg cgtcgcaaga ggagttctac aagttcatca agcccatcct ggagaagatg

6601 gacggcacgg aggagctgct ggtgaagctg aaccgcgagg acctgctgcg caagcagcgc

6661 accttcgaca acggcagcat cccccaccag atccacctgg gcgagctgca cgccatcctg

6721 cgtcgccaag aggacttcta cccgttcctg aaggacaacc gcgagaagat cgagaagatc

6781 ctgacgttcc gcatccccta ctacgtgggc ccgctggccc gcggcaacag ccgcttcgcg

6841 tggatgaccc gcaagtcgga ggagaccatc acgccctgga acttcgagga agtggtggac

6901 aagggcgcca gcgcgcagtc gttcatcgag cgcatgacca acttcgacaa gaacctgccc

6961 aacgagaagg tgctgccgaa gcactccctg ctgtacgagt acttcaccgt gtacaacgag

7021 ctgacgaagg tgaagtacgt gaccgagggc atgcgcaagc ccgccttcct gagcggcgag

7081 cagaagaagg cgatcgtgga cctgctgttc aagaccaacc gcaaggtgac ggtgaagcag

7141 ctgaaagagg actacttcaa gaagatcgag tgcttcgaca gcgtggagat ctcgggcgtg

7201 gaggaccgct tcaacgccag cctgggcacc taccacgacc tgctgaagat catcaaggac

7261 aaggacttcc tggacaacga ggagaacgag gacatcctgg aggacatcgt gctgaccctg

7321 acgctgttcg aggaccgcga gatgatcgag gagcgcctga agacgtacgc ccacctgttc

7381 gacgacaagg tgatgaagca gctgaagcgt cgccgctaca ccggctgggg ccgcctgagc

7441 cgcaagctga tcaacggcat ccgcgacaag cagtccggca agaccatcct ggacttcctg

7501 aagagcgacg gcttcgcgaa ccgcaacttc atgcagctga tccacgacga ctcgctgacc

7561 ttcaaagagg acatccagaa ggcccaggtg tcgggccagg gcgactccct gcacgagcac

7621 atcgccaacc tggcgggctc ccccgcgatc aagaagggca tcctgcagac cgtgaaggtg

152

7681 gtggacgagc tggtgaaggt gatgggccgc cacaagccgg agaacatcgt gatcgagatg

7741 gcccgcgaga accagaccac gcagaagggc cagaagaaca gccgcgagcg catgaagcgc

7801 atcgaggaag gcatcaagga gctgggctcg cagatcctga aggagcaccc cgtggagaac

7861 acccagctgc agaacgagaa gctgtacctg tactacctgc agaacggccg cgacatgtac

7921 gtggaccagg agctggacat caaccgcctg tccgactacg acgtggacca catcgtgccc

7981 cagagcttcc tgaaggacga ctcgatcgac aacaaggtgc tgacccgcag cgacaagaac

8041 cgcggcaaga gcgacaacgt gccgtcggag gaagtggtga agaagatgaa gaactactgg

8101 cgccagctgc tgaacgccaa gctgatcacg cagcgcaagt tcgacaacct gaccaaggcc

8161 gagcgcggtg gcctgtcgga gctggacaag gcgggcttca tcaagcgcca gctggtggag

8221 acccgccaga tcacgaagca cgtggcgcag atcctggact cccgcatgaa cacgaagtac

8281 gacgagaacg acaagctgat ccgcgaggtg aaggtgatca ccctgaagtc caagctggtc

8341 agcgacttcc gcaaggactt ccagttctac aaggtgcgcg agatcaacaa ctaccaccac

8401 gcccacgacg cgtacctgaa cgccgtggtg ggcaccgcgc tgatcaagaa gtaccccaag

8461 ctggagagcg agttcgtgta cggcgactac aaggtgtacg acgtgcgcaa gatgatcgcc

8521 aagtcggagc aggagatcgg caaggccacc gcgaagtact tcttctactc caacatcatg

8581 aacttcttca agaccgagat cacgctggcc aacggcgaga tccgcaagcg cccgctgatc

8641 gagaccaacg gcgagacggg cgagatcgtg tgggacaagg gccgcgactt cgcgaccgtg

8701 cgcaaggtgc tgagcatgcc ccaggtgaac atcgtgaaga agaccgaggt gcagacgggc

8761 ggcttctcca aggagagcat cctgccgaag cgcaactcgg acaagctgat cgcccgcaag

8821 aaggactggg accccaagaa gtacggcggc ttcgactccc cgaccgtggc ctacagcgtg

8881 ctggtggtgg cgaaggtgga gaagggcaag tccaagaagc tgaagagcgt gaaggagctg

8941 ctgggcatca ccatcatgga gcgcagctcg ttcgagaaga accccatcga cttcctggag

9001 gccaagggct acaaagaggt gaagaaggac ctgatcatca agctgccgaa gtactcgctg

9061 ttcgagctgg agaacggccg caagcgcatg ctggcctccg cgggcgagct gcagaagggc

9121 aacgagctgg ccctgcccag caagtacgtg aacttcctgt acctggcgtc ccactacgag

153

9181 aagctgaagg gctcgccgga ggacaacgag cagaagcagc tgttcgtgga gcagcacaag

9241 cactacctgg acgagatcat cgagcagatc tcggagttct ccaagcgcgt gatcctggcc

9301 gacgcgaacc tggacaaggt gctgagcgcc tacaacaagc accgcgacaa gcccatccgc

9361 gagcaggcgg agaacatcat ccacctgttc accctgacga acctgggcgc cccggccgcg

9421 ttcaagtact tcgacaccac gatcgaccgc aagcgctaca cctccacgaa agaggtgctg

9481 gacgcgaccc tgatccacca gagcatcacc ggcctgtacg agacgcgcat cgacctgagc

9541 cagctgggcg gcgactcccg cgcggacccg aagaagaagc gcaaggtttc cgccggcgga

9601 ggaagcggag gaatgcctct gcccgcaacc cacgacatcc accttcacgg ctccatcaac

9661 ggccacgagt tcgacatggt ggggggagga aaaggcgacc cgaacgccgg ctcgctggtg

9721 accacagcga aatccaccaa gggtgccctg aagttctctc cctacttgat gatcccccac

9781 ctcgggtacg ggtactacca gtacctcccc tacccggacg gaccctcgcc tttccaggtc

9841 tccatgttgg aaggatcggg gtatgcagtc taccgcgtgt tcgactttga agacggaggc

9901 aagctgtcta ccgagtttaa gtactcctac gagggttccc atatcaaggc cgacatgaag

9961 ctgatgggaa gcggtttccc tgacgacggc ccagtcatga ccagccagat tgtcgaccag

10021 gacggctgcg tgtccaagaa gacgtatctt aacaacaaca ccatcgtgga cagcttcgac

10081 tggagttaca acctgcagaa tgggaagcgc tacagggccc gagtgtcgag ccactacatc

10141 ttcgacaagc ccttttcagc cgatctcatg aagaagcagc cggtcttcgt ttaccgcaag

10201 tgccacgtga aggctaccaa gaccgaagtc accctggacg agagggagaa ggcgttctac

10261 gagctggcta gcccgaagaa gaagcgcaag gtttaaacta agctttcgca tccacgtaca

10321 aaactggatg ctgaatgaga tctggaggga gggatggctg ccgagtccgg agtccaagga

10381 aattcaagcg gttacgaagg aaattgtatg tcctgctaat actttatggt cgatggggaa

10441 tttggttgat tacgagtgag agttcatttg ctgctatcac ttcaagttcg agagattttc

10501 aggatgaata ggaaggaggt cagtattgag gcggacgaga ggctccgtgg ggccgcagct

10561 cgcgcgagag ggcgcagcgc cttttgggaa aggtggactg atctcgtcac cgagacgaag

10621 ctcatgcgct ttgatttatt tatgggcctc aggggaagat ataactcttt ttttgttgat

154

10681 gaggtgatgt gtcttagttt aatgtgtcag gcagccactg gtcctgactt atccgaatcc

10741 tacttgtcat agcgaggaga agaatgaaac ataccaagaa gcaatgagtg gtgcaatgac

10801 actctgactg ttacgggcgc gcctgagggc ggccgcacta gtcagctgat gcgcatggta

10861 ccttaattaa aaaagaagac catatacgtc tcctataggt ctcgtatact gcttcctata

10921 cgacatcacc gatggggaac acgaagcgat ccagcccccc taattagccc gggcgaaagg

10981 cccagtcttt cgactgagcc tttcgtttta tttgatgcct ggcagttccc tactctcgca

11041 tggggagtcc ccacactacc atcggcgcta cggcgtttca cttctgagtt cggcatgggg

11101 tcaggtggga ccaccgcgct actgccgcca ggcaaacaag gggtgttatg agccatattc

11161 aggtataaat gggctcgcga taatgttcag aattggttaa ttggttgtaa cactgacccc

11221 tatttgttta tttttctaaa tacattcaaa tatgtatccg ctcatgagac aataaccctg

11281 ataaatgctt caataatatt gaaaaaggaa gaatatgagt attcaacatt tccgtgtcgc

11341 ccttattccc ttttttgcgg cattttgcct tcctgttttt gctcacccag aaacgctggt

11401 gaaagtaaaa gatgctgaag atcagttggg tgcacgagtg ggttacatcg aactggatct

11461 caacagcggt aagatccttg agagttttcg ccccgaagaa cgttttccaa tgatgagcac

11521 ttttaaagtt ctgctatgtg gcgcggtatt atcccgtatt gacgccgggc aagagcaact

11581 cggtcgccgc atacactatt ctcagaatga cttggttgag tactcaccag tcacagaaaa

11641 gcatcttacg gatggcatga cagtaagaga attatgcagt gctgccataa ccatgagtga

11701 taacactgcg gccaacttac ttctgacaac gatcggagga ccgaaggagc taaccgcttt

11761 tttgcacaac atgggggatc atgtaactcg ccttgatcgt tgggaaccgg agctgaatga

11821 agccatacca aacgacgagc gtgacaccac gatgcctgta gcgatggcaa caacgttgcg

11881 caaactatta actggcgaac tacttactct agcttcccgg caacaattaa tagactggat

11941 ggaggcggat aaagttgcag gaccacttct gcgctcggcc cttccggctg gctggtttat

12001 tgctgataaa tccggagccg gtgagcgtgg ttctcgcggt atcatcgcag cgctggggcc

12061 agatggtaag ccctcccgta tcgtagttat ctacacgacg gggagtcagg caactatgga

12121 tgaacgaaat agacagatcg ctgagatagg tgcctcactg attaagcatt ggtaaaggag

155

12181 gaaaaaaaaa tgagccatat tcaacgggaa acgtcgaggc cgcgattaaa ttccaacatg

12241 gatgctgatt tatatgggta taaatgggct cgcgataatg tcgggcaatc aggtgcgaca

12301 atctatcgct tgtatgggaa gcccgatgcg ccagagttgt ttctgaaaca tggcaaaggt

12361 agcgttgcca atgatgttac agatgagatg gtcagactaa actggctgac ggaatttatg

12421 ccacttccga ccatcaagca ttttatccgt actcctgatg atgcatggtt actcaccact

12481 gcgatccccg gaaaaacagc gttccaggta ttagaagaat atcctgattc aggtgaaaat

12541 attgttgatg cgctggcagt gtgcacctgt ttgtaattgt ccttttaaca gcgatcgcgt

12601 atttcgcctc gctcaggcgc aatcacgaat gaataacggt ttggttgatg cgagtgattt

12661 tgatgacgag cgtaatggct ggcctgttga acaagtctgg aaagaaatgc ataaactttt

12721 gccattctca ccggattcag tcgtcactca tggtgatttc tcacttgata accttatttt

12781 tgacgagggg aaattaatag gttgtattga tgttggacga gtcggaatcg cagaccgata

12841 ccaggatctt gccatcctat ggaactgcct cggtgagttt tctccttcat tacagaaacg

12901 gctttttcaa aaatatggta ttgataatcc tgatatgaat aaattgcagt ttcatttgat

12961 gctcgatgag tttttctaaa ggaggaaaaa aaaatggaga aaaaaatcac tggatatacc

13021 accgttgata tatcccaatg gcatcgtaaa gaacattttg aggcatttca gtcagttgct

13081 caatgtacct ataaccagac cgttcagctg gatattacgg cctttttaaa gaccgtaaag

13141 aaaaataagc acaagtttta tccggccttt attcacattc ttgcccgcct gatgaatgct

13201 catccggagt tccgtatggc aatgaaagac ggtgagctgg tgatatggga tagtgttcac

13261 ccttgttaca ccgttttcca tgagtgcaca cgacgatttc cggcagtttc tacacatata

13321 ttcgcaagat gtggcgtgtt acggtgaaaa cct

//

Genetic Data D.4. Genomic sequence of nitrate reductase gene from Chlorella sp. CCMP252

LOCUS NR_genomic 11817 bp DNA linear UNA 06-JUN-2018

SOURCE C252 (749)

156

ORGANISM C252

.

FEATURES Location/Qualifiers

primer_bind 2624..2650

primer_bind 2651..2677

primer_bind 2992..3019

CDS order(3001..3074,3187..3287,3595..3744,4146..4208,

4614..4724,5397..5535,5989..6137,6516..6644,6935..7078,

7267..7410,7657..7763,8084..8231,8562..8746,8918..9066,

9351..9472,9802..9977,10294..10438,10688..10875)

misc_feature order(3614..3630,3631..3636)

/note="Geneious type: CRISPR"

misc_feature order(complement(3647..3663),complement(3641..3646))

/note="Geneious type: CRISPR"

primer_bind complement(3799..3816)

primer_bind complement(4159..4185)

primer_bind complement(4179..4205)

ORIGIN

1 ttgttatttc tatacttatt tcaatgaaaa ttttttcatt tttaggatta tttgtatgca

61 tcaatttgat tattcataca attattttgg gtcaattagt ttagcctttg ttatgtatgt

121 aaggccggga gaaaaattta gccaagtata tgtgtatata gacattgatt gtttgttatt

181 tgtatcgctc ctgtttagta tagaataacc ccatgtacaa gaccagccca gataatgaac

241 atcatggatg aaccttgtcc attcctttca aacatcacga gcacaaactg agcagcaaca

301 gctagtgctg ttgtttttgc tggggcctct caccatgagc cagggggcgg gagtcgataa

361 gctcgtgggg gtcaatgccc gcaatagggc tgcctgtgat attccgcacg ttatccatcc

157

421 cgctcatcac attgctcagg ccccgagact ggaggccctc aatcaccctg tcagcatcct

481 ccagcttcag cccacgaagc tgcagcacac gtgggcaatc gcacacatgt cgagtaggaa

541 aacacactcg gttgcatgtt catttgacaa aatgcacgag aaggacagga atgaactggg

601 gtttaggacc agtgcaagtg gtgcctacat tatgcaattc ataaccaagg ctatctggac

661 tggggattgg agcttcccca gctacacctc atcaactaaa tcaacctgat acagtgagca

721 atcatcgact gcctttgcgg gcaagattca aaagcggcaa tagattgtgc accgcccttg

781 ttgtcgttgc ttcctccttt cttccaaata tcagcttcat ataaacacat ggtaagctta

841 cctgtatgtt ggcacgagtg gtgatgtcag cacatccatc cgctccatag ggggctatgg

901 agtctccgag gaacctcatc tgctccgagg tgagctcccc atttgggacc tgcacaaaaa

961 gctcgaaagc attcattaaa gtttccagta agattaatgt tcgagtcaac actggtcttg

1021 cataatgtca tccagatggc atgagctggc atgccattga cagccagcct tcttggtgct

1081 tcatgagcag aaaggatcca atgcaaaatg aatacatcga tacagcaaat aaaacaatgt

1141 gggacagcct atcaatcagg cgcagtatga gcctgttcca accgactctc atgtacatgc

1201 acaggaacag gaagttcagg cagcagttgc gaagcccacc ttcaagcgca tcatgaaacg

1261 tccaggtgtc ttctttctcc tgtgaaacaa ccctgcccac ttaagcctga cttcaatatc

1321 atcaaggtcc aaatcttccc aatttgtctt gccctctcta cagcacacac aatgcaggtg

1381 aatgactggt cgcatgcagc ggcagccagc tgctttggga tttcacactt cctgtatgac

1441 ttcgaaacat acttcacaat gcaagacatc gccaacaacg gacaatcatc ctcaaaataa

1501 ttctcctgca gctgacacgt ctttaacgta aagcacaaca agaaccactt gcattgcggg

1561 aatatacatg attttgacag ttacccaggc ataccttatg agctgtgcca ggtcggatac

1621 ctctgtccac accctggagc cgcatttctc attcttgatg cgctcaaact tgttggcttg

1681 cttggccatc gcacgcatct acatgtagca aatgtatctc cagttacttg tggcattgat

1741 acatggatcg attctggcag ccccgatata catttgataa ttagttcact gagctggcat

1801 caataccgaa tagctcaaca caacatggga tgttgattag tgcgaatata tgcatgactt

1861 caggttatat tgggaggata gcaaccgcag ctcacccgtg cctcctcagg taaatatctc

158

1921 agaccgcttg tttctaggtc atagcccagc tccgccatct tagctgcaac ctccttggac

1981 aacgaatgtt catcagcgcc cgatgattca tcagaataaa aggatgcctc ctgctgcgcg

2041 acttccaaag ctggcctggg agcgataaat ccgcctctgc gttgcctgat cagagggacg

2101 cttgggcggc agcagaaacg ccggtgcttg aatgcgacct gtattcaagc atatcaattg

2161 tggctgattg aacatgtcta cgtcttttat aaaaaaatca acggtcacag cggcttatcg

2221 gcttgccccc acttgggaag aaagtggtca agtcactacc gagtagccag aaaatagccc

2281 caaaaagatg taaagctctt tagcaaaata ttgatgcgtt tgaacgctct ttagtccttc

2341 tcgatacctg catactcctc tgaactcccg gttttatgtg ccttcaataa cgcgcgtatt

2401 tcaatcgagt tgcttgcagt gaaggtcgat ttgaatactg cagcgagcat ttaccaaata

2461 tataaggcac caaaatgatt tgtattgtta cagtgatgct gacctcaatc atgttgagct

2521 gagactcctc attacggctt gcgcccgcta ctccacatgc tgcgcctgct ctgagcaagg

2581 gtgcagcgcg aatatgtcat atgagagcgg gtgcaagggg ccgaaaatca agaggcaccg

2641 tgagagaggc gtgcgaccaa cacctgctct catgtattgc atgagactgg gaagtcaacc

2701 tagctctaaa ataaatgagc cgtgagtcga agccctactg gctcgttcga gctggaacct

2761 cagcactgcg cctcagcggg gactatatta tagagcagat atggggttca ctggtaactc

2821 atgtatagat gagaaccttc tattagcaat tcaatcttcg caagaatggt tttgctcaaa

2881 gaagacagcg ctcacggcaa gcatgacgca atagccgagc tgaaacgtat ttctgtcaac

2941 ggcgttggcc tgaaggcaaa cacagctccc gcagaaaaat ctcttattcc cggatcagct

3001 gaatattcaa gacatcttcc agtgaacaag attcataaag atgacgagaa caccagggat

3061 gactgggtgc caaggtaatc gagtattcct ttccttgcac actgtcacag cggaataccg

3121 taccaactgg tttgtgtctg tagaatactg caattactcc tcctccctgg ctaattcgta

3181 ttgcaggcat cccgatcttg tcaggctgac aggccgacac ccattcaact gcgagccgat

3241 acctactcgt ctgtttgaga gctttattac tcctccaagc cttcattgta agaattgcgt

3301 atgattgcgt gaaaaagtac cgctacatac atttccatca tcatcaatcg caatctttat

3361 gcgctaacat tgcatcagct gagcggttgt tagtacggta ctgttgcaca gcttgctcac

159

3421 cagccactgg tcgtgtgaaa cagcaaaacg aggctgctca gtcattttct attttatcat

3481 tcatatgtta tgttgtagct ttatattatt atggtaactt cccatgattg cacagattgc

3541 ccgataacat aatttcctat gattttcttg actcgccatg agttttttct gcagatgttc

3601 gcaaccacgg ggcagtcccc agaatcaaat gggaggagca ccgtgtggta gtgaacgggc

3661 ttgtttccaa ccccaagtcg tttaccatgg ccgacatact cgcgatgcca tcagttgatg

3721 tcacatgcac cttgacatgc gtaggtgtgt cacagcttac atacatgcct ctcagtcctt

3781 gctgctgttc ttgccccagt gcagcctccc tgctgcatcc accttgccac gctccgtaca

3841 gaatcctatg taaaaacatc attggtgcat gagacacatt ccttgggatg ccatcccttg

3901 tgctgctgga atccctgatg atcaagcact gctgttgctg aatccaacgc caccccggct

3961 gcactctttg ggttattttg gatgcatcag aatatagctg catcatcatg tcaacaaaag

4021 ggggggacag cagggtgcag gcgaacgcat ttacgtgtgg acgcgccaca ccaccacgtg

4081 cgggcaggga cagttgatat tttgcgaaga aaattgcgct tgtgtacaca tggacatgtg

4141 tgcaggtaac aggcgcaagg agcagaacat ggtgaagcag acgattggat tcaactgggg

4201 tgctgctggt aaggacctgg tgggttttcc agccacacag cttgattaga tcactgctgc

4261 tagtcatggg ctcgaaatcc cctctggctc agcgccgaca gtgcgcagta ctagtccaac

4321 agctgccaaa accttctgag caggtcactg tcatttgctt cagacagctc agttagtacg

4381 taggctgcat gggcaaggac tgacaatgtt gtagcaatcg tggatgggag aggcgcggaa

4441 gcacaggctc actgtgtccc tgcattagaa atcacctgtg ccttttgctt tacgcggctt

4501 cctgctttcc tgcttgcatg cattgttgtt ccgcggatat tttattctgt gtcctgtttt

4561 gtcgtctgtt cttgttttgc aatcattctt gcaaccacta acccctcctc caggcacggg

4621 gtgctcgacc tggacgggtg tgaggctgag ggacgtcctg caggctgccg gggtcaagtc

4681 gcgggatgag ggcgcccggc acgtgtgctt tagggggccg aagggtgagg acggtggggc

4741 cagcccctgt tgctcgtgct gtctgctgat gctgcacgcc atgcagagca tcttgtcata

4801 ccgcgatgga tgaggcacgc cagactgtgc accagaatca tctgtcatac tgcccatgct

4861 gtttccctcg aaaagaacaa tcccacctct ctgggcacgg agcattcgaa ctagagtaac

160

4921 agattgtcgg cctggagacc aacccgcttg catgaagtga ccgctgcaaa aaaccctaac

4981 tgccacaggc aggcgctggc ccgcatcccg catttccgca gtccccagca acatgtgctc

5041 accagggggc ctgatctgtg tgttgtatcg aagtcttcaa gtgcatatgt ttacattccc

5101 ctcttcatat tgctttaaag aacaccagac aaactgcaat gaattattat atatatatgg

5161 atagattata ataacagaaa tggtggggaa aaaatggagc ccccaggcgg caccccctct

5221 ctctcaattc ttatcaacat cccccccccc ccccccttcc cccccctttc cgtcagaaga

5281 ccaccgtcta gccactcctt gatgctgggg ttcccctaaa taccccctga agaaccctgg

5341 cctccaaccc taagaaagat gtcatcgtca ttacatgtat caatgaatgc acgcaggcga

5401 gcttcccaag ggcgaggatg ggtcctatgg gacctcactc actctctgga aggccatgga

5461 cccgtcagcg gatgtgatcc tggcctacaa gcagaacggc aggctgctca cccccgacca

5521 tggcttcccc atcaggtgag ggaggggctg gcagggtcgc ttgcttctgc tgaggggggg

5581 tgcgggcatg gggagcggtg tgtgctgcag cccctggaag ggcgggtaca gccttgggac

5641 tttgtttacc ctatatagga gccgcgtctt gcctggagat gatcaccgca tctcgtccgg

5701 ccgagcacca ttgtttggtt gtctcgtctt ggtgataact tgtgcgtcct tgcttacccc

5761 ttgtcttggt ggcgcaagga tgtttccctc atcctggtgg tgagggtact tcgtttccca

5821 cccattatgg tggcgaacat cgcctctctc tggtgctggg caccgttctg ggaacaggcg

5881 tgagccaaaa agctcctctt ttcatgtata tctggggagc ttcgtgggaa gggtggctga

5941 gagcggcttc tggcatactc acctgaccct gctggttgct tggtgcagga tcatcatccc

6001 cggctacatt ggtggccgta tggtcaagtg gctggaggag atcacagtca ccagcgagga

6061 gtcaaccaac cattaccact actttgacaa ccgggtgctg ccatcgcatg tggacgagaa

6121 gaaagcgatg gatgagggta agcatgcaag gccaagcgat gccaggaaga tctgcagcat

6181 gcagttgact ttcattctga ttgaaaaaga ctgtctgttg ttgcttcatt acctgattcg

6241 aacaaagaag gaagatggta cacttgtttc cctctgctgg agcttggctg cacacagata

6301 gcagcagctg ctgctgctgt gctggtgttt gttcatgggg gtgtccatgg tggatatgca

6361 catgacaatg acggtgtttg ggatgctgtg aatgatgagg cagcaatact gttatttctg

161

6421 ctcaagcatg tgaacatgca tggtcctctc acctgcttgt agatttgagt aaaaatgcca

6481 gcactaacca tgtggatgcc gcccttcttg cgtagggtgg ttctacaagc ctgaatacat

6541 ctgcaacgac ctggcggtga actccgccat cggctaccca ttgcatgagg aagtggtgcc

6601 actgaacaag cgcacgcaca cagtgcgagg atatgcgtac tctggtcagt ctgccctctt

6661 gaccacctgt cgctcatgcc tggctggacc acatggctgg ccctgagtat gtgaatataa

6721 cttgaccagg tcgtggaggt acagaaaacc aaacatgaca ggtttatgcc tgatgcgatc

6781 ccgcaatcct gcattgcaac aagtccggtg tggcatgcca ggcttgctcc tctcactgtt

6841 gggcacaccc gcaaacatga gtcagaggat agggagggcg cgagggcggc ggccagctgc

6901 caaactctgc ctgaggtttt cgctatttat gcaggttgcg gcaatcagat agtgaggtgt

6961 gaagtttctc tggatgatgg gaagagctgg gtactggcgg acatcacgca caaagtggag

7021 cctaacagct acgggaaata ttggtcgtgg gtgtggtggc aagtggaagt gcccatgggt

7081 gcgtggttat gcatcgaagc tgcgatcctc actatcattc cacttctgtt tgcatggcac

7141 gcaaccacct accaccgccc tgcatgcttt ttgacaaatg agccttgtgt tgacaagcaa

7201 gtggttgggc actttgtcta gcttgtgatg ggcaaaagct gaacactttg tttcctgctg

7261 acgcagcgga tctgatgcga tcgcctgaga tttgctgccg tgcttttgac agcacgatga

7321 acacgcagcc aaacacgttc acgtggaatg tgatgggcat gatgaataac tgtgtgtacc

7381 gagtgaaagt gcaccctgaa ttaacagcgg gtgggtgctg ctttgcttca catatcatgc

7441 tgtaacccac aattgctgca tgagaggttg ggcacgtcaa ctagcgccag tgacttgcca

7501 gctgctctgt ttctagtcat taatcctgtg agtaaaccca gggttgacac cgatcgatta

7561 atatgcacat tcaagcaaac gctgctgccg acctccttgc agttgcactg tagctgccat

7621 ggctatatct gagtgtggac acggcatggc atgcagatgg tgggttcggc cttcgtttcg

7681 agcaccccac ggtagcaggt cccacagtgg gtggctggat gaacagagca gccaacgata

7741 agacgtcttc acaggaaaaa aaggtgaact gtaccagata gcaacgttgc acagaagagt

7801 gagccagccc gtcctgatcc agatgttttc aggcgctcgt gctatgcatg ctgttgtggc

7861 acaagtacgt gggcaaggct caagtggctg tcaccgtcca tgtctggaat gagcgcacca

162

7921 tagtatccac caatgctatt cgtcaagctc tggaacatac atattcactt cacatcagcc

7981 acttaagggg tcaagaacag cccgaacagc aacaggagct gccacagggc acttttcatc

8041 tggcataata cagaacaatc accacaaacc ttcaacaatg caggctgctc tccctgctgc

8101 tgcccagccc caatctaacg gccgaacctt caccatggag gaggtggaaa agcacgacac

8161 caaggagtca gcatggtttg tggtggatgg cagggtgtat gatgcgactc cctacctcaa

8221 ggatcaccca ggtgcaccat catctctgac gtgtacatgc aacgtgctgc gcatggtcag

8281 ggtcatgttt aaagttgtaa accctaaaaa tcctgaatag gttgcatgcg cctgtttggt

8341 ccgagatgag gagtatagcg ttgtggttca tgcctcacct gagcagacca gctgcagctg

8401 tgtgcaaggt gtacacattg attctgattg ccttgcatag cagctacgcg ctgctaaacg

8461 tggtgccaag gtgccagagc ttgggcactg tgcattcctt tccacacact ctggtgtgca

8521 ccagccgagt ggccaatgca tccgttgccc tgtgcatcca ggtggtgcgg acagcatttt

8581 gcttgtagcc ggcaccgacg ccacggagga gttccaggct atacactcgg ccaaggccaa

8641 ggcgatgctg ctagagtact acattgggga tctggaggga ggcgtagaca gcccgcccgc

8701 gcttcctgag cctgacgtcc ctacacctac ccctgctcct gcccctgcag ccaacgggca

8761 cagcaggccg tcttcggcca gcgagctggc caacatgaat acccaccgtc aagccaatgg

8821 gttctccagt ggtgcggcag aggtggtcaa tgcggtgggg tctggcccca aagcagcgcc

8881 gccagcgcca gcatccgctg caattaccac agacctggtg gccctgaacc ccaagaagaa

8941 gctggccttt gcgctggtgg agaaggagga gctgagtcac aacaccaggc gattccggtt

9001 tgcgctcccc tcctcccagc accgattcgg gctgccggtg gggaagcacg tcttcctgta

9061 tgccaggtgc gtgggcatga ctggtaccac acctcaccgt gtcttctggg tgagaagccc

9121 cttcctccct cctcccccct gctgggcctg ctcctgccac tcctatgcct tgaagaggtg

9181 ggcagtcaaa agcaagcatt tgggtgaatg atcgtggaga aacgatttgg aacgaaacga

9241 taagaaagaa ttatccctag tggagcttac acgaccgaca cgtggatcaa taggttacat

9301 gcgcagaatt gaaacgcatg gaacctaaaa tttatctgat gccggtgcag agtgaatggc

9361 gagttggtga tgcgcgccta cactccatcc agtagcgatg atgacctggg ctacttcgag

163

9421 ctggttgtca agatttacag agccaacgag cacccacgat tccctgaggg aggtgagcca

9481 ctgtctcgtc gccagtgact tacctctggc gtacgagctc ggatctgcac acacagctgc

9541 tgcggagtgt tgtatgtgcc gcagtggcac cattttctgc agcgacactc atgctgcaga

9601 tcagccataa aattccttgg ttctgtgcag ctctaaatcc gttgcattgg atcaacactg

9661 tatcccagtc agtgcagtga ggccagcaaa cctcattgct ggtttgaact gggttgcact

9721 caaatccata acaattgctg cctcaccccc taactcatgg ctctactgga tggtctggtt

9781 gattgattgc atttcatgca ggcaaaatgt cccagtacct ggactccctg gcgctggggg

9841 acaccattga tgtaaaggga cccacaggcc acgtgcacta tacgggcagg gggcggtaca

9901 ctctggatgg ggagccgtac ttcaccacac atatttccat gatcgctgga ggcaccggca

9961 tcacccccat gtatcaggtg ggcacaaaga tatacctgtg ctcttacacg atacctctca

10021 gcgtgccatg cagccacctc atggcatgcg aggcctgcac gcttttttgc agcagctgga

10081 agggactgaa gctgcattag tttgctggtg ccgctcatcc tttgcaagtt ttttctaaag

10141 aaagaacagc tgtgagattc agtgagcaaa aggtagacct gtggtggtcg gtcgtgtctg

10201 gctagagggg tagcgattgc cagcaacacg ttgtacaatg tgcggaaagt gccaccttca

10261 acaaggtgaa cgtctgcatg gcatgcactg caggtgatca aggctgttct caaggatgca

10321 gacgacacca ctcagctcag tttgctgtat gcgaacgttg cacccgatga catcctgctg

10381 agggaggagt tggacgcgct ggcccagagg cacaccaact tcgacgtgtg gtatacgggt

10441 aggtcaatgg aggccagcgc ttatggtgtg gcatttaact gcattgacgc tcgtgcacat

10501 cgagagaggc tcatgtcagg gagtatgctg tgcagttgca cgcatgggcg aacgcatggt

10561 cctctgaccg gtttgcatct gcctgaaagc agcttgcatt ccaacacaca tctggctctg

10621 gctggtcacc tgttaccatg atgccattga cgccacctga cttgcttctc tttctcttgg

10681 cctgcagtgg accgccccac agatggatgg cagtacagca ctggcttcat ctgtgaggac

10741 atgatccgtg agaggctgtt ccctgctggg ccttccacag tgtgcttcct gtgcggcccg

10801 ccgcccatga tcaagtttgc gtgcctacct aaccttgcca gtctggggta taatgagaag

10861 cattgtattc agttctgaag gcactaggga atgcaatggc agttgcgaag tatttgatcg

164

10921 cgaaactgcg ttgtatctgc tttttgaacg tttacaacaa ttcccaactg ttcaattaac

10981 ccagcacacg ctgcggtaac ttatggacta ttgagatgtc cgaggatttt ataatggtgt

11041 tttatcaagc atttcatgtt ttgttgtgca ctcacatttg tgttctcctt gccgtaattc

11101 tttcctgttc cgtgttttgg tttctttctg ccgctgtaca tcctggaggt gctgataaac

11161 acaggatgtg tattcgctcg atttccaaat ctgtcttgtg caatcatgtt tcccaaaccc

11221 acctgacgat aactgcagta gagccgaaat ggactattat acatgtctgc catggacctt

11281 gcgttccctt tttcattcta ttgtatttcc tcctacagtc aagaaatata tagaatgaag

11341 cagtttttcc tgctgattca agtgcacttg ttgcattaca gatatgtact ttgctgttat

11401 caggaataat gcatgctaaa tgtggaattc caaaatcatt ccacaaccct gtaatgagga

11461 aagctacgtt gtcccaattt tccaacttat ttgaagcaca ctttcgacga acatgccatc

11521 ccagggattt acgcaattat atagcttaac aatactattg gattaaatac gggttgatct

11581 gtccttgcat gtgcgcatcc acagcctcag gctcagttgg gggttattat tactcttttt

11641 gattcttggg cagggtttct cggaatgcaa ctcgtcttaa agctaactgc gaaagtataa

11701 tcatgactga gttgcataaa tcctattact tcgaactctg tcgcataagc atgagtgcca

11761 tggcgtccag gacctcgata tttggtcgag tacgggacag ggtcaaaccc cccctga

//

165