<<

University of Rhode Island DigitalCommons@URI

Open Access Dissertations

2018

THALASSIOSIROID RESPONSES TO SILICON STRESS AND

Joselynn Wallace University of Rhode Island, [email protected]

Follow this and additional works at: https://digitalcommons.uri.edu/oa_diss

Recommended Citation Wallace, Joselynn, "THALASSIOSIROID DIATOM RESPONSES TO SILICON STRESS AND OCEAN ACIDIFICATION" (2018). Open Access Dissertations. Paper 762. https://digitalcommons.uri.edu/oa_diss/762

This Dissertation is brought to you for free and open access by DigitalCommons@URI. It has been accepted for inclusion in Open Access Dissertations by an authorized administrator of DigitalCommons@URI. For more information, please contact [email protected]. THALASSIOSIROID DIATOM RESPONSES TO SILICON

STRESS AND OCEAN ACIDIFICATION

BY

JOSELYNN WALLACE

A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE

REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

IN

BIOLOGICAL AND ENVIRONMENTAL SCIENCES

UNIVERSITY OF RHODE ISLAND

2018

DOCTOR OF PHILOSOPHY DISSERTATION

OF

JOSELYNN WALLACE

APPROVED:

Dissertation Committee:

Major Professor Bethany D. Jenkins

Christopher E. Lane

Tatiana A. Rynearson

Nasser H. Zawia DEAN OF THE GRADUATE SCHOOL

UNIVERSITY OF RHODE ISLAND 2018

ABSTRACT 1

Atmospheric CO2 has risen dramatically since the industrial revolution. This 2 rise in atmospheric and oceanic pCO2 has perturbed ocean carbonate chemistry and 3 led to ocean acidification. are that account for 40% of oceanic 4 primary production through photosynthetic carbon fixation, which is aided by their 5 carbon concentrating mechanism (CCM). The CCM uses the bicarbonate transporters 6

(BCTs) and carbonic anhydrases (CAs). Our current understanding of how diatoms 7 might respond to ocean acidification is based on experiments using model diatoms or 8 assessing the response of the bulk diatom community, rather than assessing a diversity 9 of diatoms in a complex environment. This dissertation aims to expand our knowledge 10 regarding diatom response to CO2 in ecologically important, non-model diatoms and 11 their response in laboratory experiments and field mesocosms to alterations in CO2 12 concentration. 13

Diatoms’ primary production is a function of their growth, which is 14 constrained by the availability of nutrients in the surface ocean. Silicon is a nutrient 15 that is particularly important for diatoms, as they are unique in their requirement for 16 silicon to build their cell walls. Silicon limitation has been observed in low iron high 17 nutrient low chlorophyll (HNLC) regions and the North Atlantic Ocean, although 18 these studies have focused on the whole diatom community rather than specific diatom 19 groups that may not uniformly experience silicon limitation. Genetic markers have 20 been used to probe species-specific iron status in the field, and similar molecular 21 markers of silicon status could be powerful tools to probe the silicon status of different 22 co-existing diatom species. However, current studies of silicon limitation have relied 23

on model diatoms rather than species that are likely to be found in HNLC regions or 24 the North Atlantic Ocean, limiting the ability to develop appropriate molecular 25 markers. This dissertation aimed to fill in these knowledge gaps using transcriptomic 26 studies of Thalassiosiroid diatom isolate cultures as well as incubations of mixed 27 diatom assemblages. 28

Chapter 1 of this thesis examined the transcriptomes of two closely related 29

Thalassiosira diatoms to better understand the silicon limitation gene expression 30 response of diatoms found in ecosystems where their growth may be constrained by 31 silicon availability, toward the goal of developing appropriate molecular markers of 32 silicon status. This study found a gene family encoding putative ATP-grasp domain 33 proteins that were upregulated in silicon-limited T. oceanica and T. weissflogii. The 34 upregulation of these genes is unique to silicon limitation and its expression is not 35 induced in nitrate or iron limitation conditions. The members of this gene family were 36 also upregulated in response to silicon limitation in previously published T. 37 pseudonana transcriptome studies and homologs were found across a diversity of 38 diatoms, suggesting that it might be useful as a silicon limitation marker in many 39 diatoms, in addition to spp. diatoms. 40

Chapter 2 of this thesis adds to our knowledge of the diversity of CCM and 41 high CO2 response in different species of the Thalassiosira genus of diatoms. CO2 42 manipulation experiments were conducted with four Thalassiosira species – T. 43 pseudonana, T. rotula, T. weissflogii, and T. oceanica, and transcriptomes were 44 sequenced from the latter 3 species. These species displayed a range of growth rate 45 changes across CO2 conditions, from no change across conditions (T. pseudonana), 46

slower growth in lower CO2 (T. weissflogii), slower growth in higher CO2 (T. 47 oceanica), and faster growth in higher CO2 (T. rotula). The accompanying 48 transcriptome evidence for T. rotula, T. weissflogii, and T. oceanica indicate that these 49 differences in CO2 responses boil down to the ability to dynamically regulate the 50 carbon concentrating mechanism while maintaining internal redox and ion 51 homeostasis, as well as trade-offs between funneling excess CO2 into internal storage 52 pools versus a higher growth rate. 53

Chapter 3 uses a metatranscriptome approach to generalize about how the CCM 54 of Thalassiosira spp. differs from that of Skeletonema spp. diatoms. This study 55 involved CO2 manipulation experiments of a assemblage collected from 56

Massachusetts Bay during March of 2014. We show evidence of distinct enzymatic 57 strategies for N and P uptake and scavenging in these two diatom genera. There is also 58 evidence of distinct strategies in their CCMs – while Thalassiosira spp. diatoms seem 59 to use the α, δ, and ζ carbonic anhydrases and the substrate-specific SLC4 and non- 60 specific SLC26 bicarbonate transporters, Skeletonema spp. diatoms rely on the γ and ζ 61 carbonic anhydrases and the SLC26 bicarbonate transporters or diffuse CO2 uptake. 62

This suggests that the co-existence of Skeletonema spp. and Thalassiosira spp. might 63

- be partially due to a CCM that uses distinct forms of inorganic carbon (CO2 or HCO3 ) 64 and CA isozymes that use different metal cofactors for their enzymatic activity (γ can 65 substitute iron, δ can substitute cobalt). Furthermore, if Skeletonema does rely on 66

- diffusive CO2 uptake rather than active transport of HCO3 using SLC4 or SLC26 67 transporters, Thalassiosira may have a competitive advantage in high CO2 as it 68 downregulates these active transporters. 69

ACKNOWLEDGMENTS 70

The last 7 years of my life has certainly been challenging. I would have not made 71 it through without the constant support of friends, family, colleagues and mentors. I 72 would like to take this opportunity to thank my advisor, Bethany Jenkins. You have 73 been incredibly supportive of me and have provided me with many opportunities to 74 develop as a scientist. Thank you to my committee, Tatiana Rynearson, Chris Lane, 75 and Marta Gómez-Chiarri, for their patience, guidance, and precious time. I would like 76 to acknowledge Andrew King and Dreux Chappell, you have both been mentors and 77 role models for the kind of scientist and person I would like to be. I’d like to 78 acknowledge all the MetaLab group for helping me refine all of my talks and posters 79 over the years. Special thanks to all of the other Jenkins lab mates who have shared 80 snacks with me over the years – LeAnn, Shelley, AJ, Laura, Kris, Alexa, and Eric. 81

Special thanks to all of my very large family, particularly my nieces Ronnie Clo and 82

Sophie Pie. I would also like to acknowledge my partner, Andrew. I love you and 83 thank you for always reminding me to drink water and for washing my travel mugs. 84

85

86

87

88

89

v

PREFACE 90

This dissertation was formatted in accordance with the manuscript format 91 guidelines established by the Graduate School of the University of Rhode Island 92

93

94

95

96

97

98

99

100

vi

TABLE OF CONTENTS 101 102 103 ABSTRACT ...... ii 104

ACKNOWLEDGMENTS ...... v 105

PREFACE ...... vi 106

TABLE OF CONTENTS...... vii 107

LIST OF TABLES ...... xii 108

LIST OF FIGURES ...... xii 109

INTRODUCTION...... 1 110

Ocean acidification …………………………………………………………………... 1 111 112 Diatoms experience spatial and temporal fluctuations in CO2 ...... 1 113 114 Diatoms and the carbon concentrating mechanism ……………………………...... … 2 115 116 Diversification of the carbon concentrating mechanism ………………...…..….…… 3 117 118 cAMP-mediated CO2 sensing in diatoms ……………………………………..……... 4 119 120 Silicon limited diatom growth and primary production …………………………..…. 5 121 122 Silicon uptake and deposition …………………………………...………..……….… 6 123 124 Molecular biology of silicon limitation ……………………………..………….……. 7 125 126 Thesis motivation and outline ……………………………………...…………….….. 9 127 128 References ...... 15 129 130 Manuscript 1: Transcriptome sequencing reveals contrasting silicon limitation 131 responses in two Thalassiosiroid diatom species ...... 26 132

Abstract ...... 27 133 134 Introduction ...... 28 135 136 Materials and methods ...... 34 137

vii

138 Culturing, RNA extraction, and sequencing ...... 34 139 140 Trimming, assembly, and read mapping ...... 37 141 142 Annotation and gene ontologies ...... 38 143 144 Ortholog finding with Silix ...... 39 145 146 Differential expression analysis and hierarchical clustering ...... 39 147 148 Silicon transporter phylogeny ...... 41 149 150 Silaffins, cingulins, silacidins, frustulins, pleuralins, and polyamines ...... 42 151 152 Results and discussion ...... 43 153 154 Transcriptome assembly and annotation ...... 43 155 156 Differential expression analysis and hierarchical clustering ...... 44 157 158 Identification of genes sensitive to silicon status: silicon transporters and 159 160 silicon efflux proteins ...... 46 161 162 A subset of putative silaffins, cingulins, silacidins, frustulins, and pleuralins are 163 164 differentially regulated in Thalassiosiroids ...... 48 165 166 Genes involved in long-chain polyamine synthesis are not differentially 167 168 regulated in T. oceanica or T. weissflogii ...... 52 169 170 Identifying novel markers of silicon stress ...... 53 171 172 ATP-grasp proteins are silicon sensitive and silicon specific ...... 54 173 174 Identifying metabolic pathways with silicon sensitivity ...... 56 175 176 Sel1/MYND zinc finger domain gene expansion in T. oceanica ...... 59 177 178 Conclusions ...... 61 179 180 References ...... 64 181 182 183

vii

Manuscript 2: Thalassiosira diatoms display distinct, species-specific strategies 184 for responding to changes in pCO2 ...... 82 185

Abstract ...... 83 186 187 Introduction ...... 84 188 189 Materials and methods ...... 90 190 191 Culturing …………...... 90 192 193 Sequencing, mapping, differential expression ...... 91 194 195 Identification and comparison of orthologs ...... 92 196 197 Predicted protein localization ...... 93 198 199 Results ...... 95 200 201 Species-specific metabolic rearrangement, redox and ion homeostasis in low CO2 ....96 202 203 Bicarbonate transporters …………………………………………………………….. 97 204 205 Copy number variations in bicarbonate transporters ………………..….…… 97 206 207 Varied bicarbonate transporter localization prediction and CO2 sensitivity.....98 208 209 Carbonic anhydrases ……………………………………………………………………. 99 210 211 Copy number variations in carbonic anhydrases ...... 99 212 213 Varied localization predictions …………………....….. 100 214 215 Carbonic anhydrase CO2 sensitivity ...... 101 216 217 Putative C4 photosynthesis genes ...... 102 218 219 cAMP secondary messengers and CO2/HCO3- sensing ...... 104 220 221 Discussion ...... 105 222 223 Key point of regulation for T. oceanica carbon concentrating mechanism: SLC4- 224 225 mediated HCO3- uptake into the cytoplasm ...... 105 226 227

ix

T. oceanica decreased growth rate in high CO2 due to ionic and redox stress ….... 107 228

Key point of regulation for T. weissflogii carbon concentrating mechanism: 229 230 cytosolic-predicted β, δ, and ζ carbonic anhydrases preventing CO2 leakage 231 232 ...... 108 233 234

T. weissflogii energetic trade-offs across CO2 conditions ...... 110 235 236 Key point of regulation for T. rotula carbon concentrating mechanism: SLC4- 237 238 mediated HCO3- uptake into the plastid ...... 110 239 240

T. rotula relies on degradation pathways to sustain growth in low CO2 ...... 112 241 242

Evidence for cAMP-mediated CO2 sensing in T. oceanica, but not T. weissflogii or 243 244 T. rotula …………………………………………………………………....…….… 112 245 246 Conclusions ...... 113 247 248 References ...... 116 249 250 Manuscript 3: Following diatom response to ocean acidification in mesocosm 251 252 experiments with metatranscriptome sequencing indicates contrasting CO2 253 254 responses in Skeletonema and Thalassiosira diatoms ...... 133 255 256 Abstract ...... 134 257 258 Introduction ...... 135 259 260 Materials and methods ...... 138 261 262 Seawater collection and incubation ...... 138 263 264 Macronutrients, carbonate, and chlorophyll measurements ...... 139 265 266 DNA extraction ...... 141 267 268 RNA extraction ...... 141 269 270 DNA Amplicon sequencing ...... 141 271

x

272 RNA sequencing …...... 143 273 274 Two step read mapping approach: DIAMOND and BWA ……………………...... 143 275 276 Normalization ...... 144 277 278 Differential transcript expression ………………………………………………….. 144 279 280 Metatranscriptome rarefaction ...... 145 281 282 Annotation and differential KO expression ………………………….…………….. 145 283 284 Identification of carbon concentrating mechanism components and autolysis genes 285 286 ...... 146 287 288 Results and discussion ...... 146 289 290 Chlorophyll, nutrients, and carbonate chemistry ...... 146 291 292 Metatranscriptome reads are dominated by Skeletonema and Thalassiosira diatoms 293 294 ...... 149 295 296 Diatom genera show different patterns of expression of KOs for C, N, and P 297 298 acquisition and across CO2 conditions ...... 150 299 300 Expression patterns in ion and redox homeostasis KO2 indicate diatoms experience 301 increased oxidative stress in high CO2 ...... 153 302

Diatoms have distinct complements of carbon concentrating mechanism components 303 that are regulated in response to CO2 ...... 154 304

Conclusions ...... 156 305 306 References ...... 158 307 308 Appendix I ...... 175 309 310 Appendix II …………………………………………………...... …………..…. 178 311 312 Appendix III ……………………………………………………...... ………….... 179 313

xi

LIST OF TABLES 314 315 316 TABLE PAGE 317 318 MANUSCRIPT 1 319

Table 1. Summary table for T. oceanica and T. weissflogii showing the total 320 number of transcripts assembled and number of transcripts with function or 321 protein domain information assigned by KEGG automated annotation server or 322

InterProScan…...... …81 323

Table 2. Summary table for T. oceanica and T. weissflogii showing the number 324 of transcripts differentially expressed in silicon limitation relative to nutrient 325 replete conditions (>= 0.95 posterior probability of a 2-fold change). Numbers 326 shown in the “Si specific” rows indicate the numbers of genes that are silicon- 327 specifically differentially expressed and not part of a general stress 328 response…………………………………………….……….…..……….……81 329

MANUSCRIPT 2 330

Table 1. Numbers of differentially expressed genes (>= 0.95 posterior 331 probability of a 2-fold change in ASC) (Wu et al. 2010) across pCO2 condition 332 in each transcriptome………………………………………………………...131 333

Table 2. Number of putative carbonic anhydrases from model diatom genomes 334 and the transcriptomes described in this study. The localization and activity of 335 the full complement of putative carbonic anhydrases in P. tricornutum and T. 336 pseudonana have not been completely experimentally verified (Tachibana et 337 al. 2011; Samukawa et al. 2014; Kikutani et al. 2016) 338

……………………………………………………...... 132 339

xii

340 LIST OF FIGURES 341 342 FIGURE PAGE 343 344 MANUSCRIPT 1 345 346 Figure 1. Depiction of the results from the hierarchical clustering of T. 347 oceanica and T. weissflogii silicon sensitive genes, with only the results from 348 silicon-specific clusters shown. Each panel shows the per-cluster gene 349 expression values as a log2-fold change in silicon, iron, or nitrate limitation 350 relative to the nutrient replete condition. The bar height indicates the per- 351 cluster median expression value and the error bars indicate the interquartile 352 range. The numbers at the bottom of each panel indicate the number of genes 353 in each cluster……………………….……………………………………….76 354

Figure 2. Phylogenetic analysis of diatom silicon transporters. A maximum 355 likelihood comparison for silicon transporters from Fragilariopsis cylindrus, 356

Thalassiosira pseudonana, T. rotula, T. oceanica, and T. weissflogii was 357 conducted. For each silicon transporter identified in this study, a heat map is 358 also shown. The heat map colors indicate the log2-fold change of each putative 359 silicon transporter in silicon (Si), iron (Fe), or nitrate (N) limited condition 360 relative to the nutrient replete condition. The asterisk indicates which silicon 361 transporters were statistically significantly differentially expressed. A 1:1 line 362 is also inset, which shows the log10 RPKM expression value in nutrient replete 363

(x-axis) and silicon limited (y-axis) conditions. Each point corresponds to a 364 silicon transporter and the shape of the point indicates which species the 365 silicon transporter originates from. The black diagonal line indicates where 366

xiii

points would fall if they have the same expression value in the two conditions 367 compared. …...…………………………..………………………………… 77 368

Figure 3. A series of 1:1 lines showing the expression values for putative 369 silaffins, cingulins, silacidins, and frustulins identified in this study. Each point 370 represents a single transcript. The x-axis is the log 10 transformed RPKM 371 values in the nutrient replete condition and the y-axis is the log 10 transformed 372

RPKM values in silicon limitation. The black diagonal line indicates where 373 points would fall if they have the same expression value in the two conditions 374 compared. The color of the point indicates the direction of differential 375 expression in silicon limitation (up (red), down (blue), or no change) and the 376 shape indicates whether or not each differentially expressed transcript is 377 specific to silicon limitation or is part of a general stress response. 378

………………………………………………………………………….……. 78 379

Figure 4. Heat map overlaid on a pathway map of genes involved in the 380 precursors of long-chain polyamine biosynthesis. Colors in the heat map 381 indicate the log2-fold change in silicon (Si), iron (Fe), or nitrate (N) limitation 382 relative to the nutrient replete condition. The full description for each KEGG 383 orthology are as follows: K00138:aldB, aldehyde dehydrogenase; 384

K00286:proC, pyrroline-5-carboxylate reductase; K01259:pip, proline 385 iminopeptidase; K01476:arginase; K00819:rocD, ornithine--oxo-acid 386 transaminase; K01581:ornithine decarboxylase; K00789:metK, S- 387 adenosylmethionine synthetase; K01611:speD, S-adenosylmethionine 388 decarboxylase; K00797:speE, spermidine synthase. …………………….…. 79 389

xiv

Figure 5. Heat map of the silicon specific, differentially expressed genes from 390

T. oceanica and T. weissflogii and their KEGG orthology or silicon metabolism 391 gene assignment categories. Colors in the heat map indicate the relative 392 proportion of silicon specific genes in each category. Some categories that 393 would not apply to diatoms were excluded from this analysis (e.g., 394

Cardiovascular diseases). …………………………………………………… 80 395

MANUSCRIPT 2 396

Figure 1. A heat map showing a subset of the differentially expressed genes 397 from T. weissflogii, T. rotula, and T. oceanica. Warmer colors in the heat map 398 indicate an upregulation and a cooler color indicates a downregulation. “Low” 399 refers to the <230 ppm CO2 condition, “projected” refers to the 690-1340 ppm 400 pCO2 condition, and “max” refers to the 2900-5100 ppm pCO2 401 condition…………………………………………………..……………..…..124 402

Figure 2. A maximum likelihood phylogeny of the bicarbonate transporters 403 from the model diatoms T. pseudonana (JGI Thaps3 gene IDs) and P. 404 tricornutum (JGI IDs or NCBI accession numbers), the transcriptomes 405 assembled in this study (indicated by asterisks), and MAFFT homologs 406 included for reference. All alignments were done in MAFFT (version 7) using 407 the L-INS-i strategy with the BLOSUM62 scoring matrix, the “leave gappy 408 regions” option selected, and the MAFFT-homologs option selected (Katoh & 409

Standley 2013). RAxML maximum likelihood trees were computed with the 410 following parameters: PROTGAMMA matrix, BLOSUM62 substitution 411

xv

model, and 1000 bootstrap replicates. The tree was mid-point rooted in 412

FigTree. Support values less than 70 are not shown……………………...…125 413

Figure 3. Expression values of the carbonic anhydrases and bicarbonate 414 transporters from the transcriptomes analyzed in this study. Each point 415 represents a single transcript. The x-axis is the log 10 transformed RPKM 416 values in the 320-390 ppm pCO2 and the y-axis is the log 10 transformed 417

RPKM values in the <230 ppm condition, the 690-1340 ppm pCO2 condition, 418 and the 2900-5100 ppm pCO2 condition at the bottom of the figure. The black 419 diagonal line indicates the same expression value in the two conditions 420 compared. Circled points appearing above this line are upregulated in the 421 condition indicated on the y-axis and circled points below it are downregulated 422 in the condition indicated on the y-axis (>= 0.95 posterior probability of a 2- 423 fold change in ASC) (Wu et al. 2010). The color of the points indicates the 424 specific isozyme type and the shape of the point indicates from which 425 organism the transcript 426 originates………………………………………………………………..…...126 427

Figure 4. A maximum likelihood phylogeny of the carbonic anhydrases from 428 the model diatoms T. pseudonana (JGI Thaps3 gene IDs) and P. tricornutum 429

(JGI IDs or NCBI accession numbers), the three Thalassiosira transcriptomes 430 assembled in this study (indicated by asterisks), and two additional carbonic 431 anhydrases from Halothiobacillus neapolitanus and Plasmodium falciparum 432 for reference. All alignments were done in MAFFT (version 7) using the L- 433

INS-i strategy with the BLOSUM62 scoring matrix, the “leave gappy regions” 434

xvi

option selected (Katoh & Standley 2013). RAxML maximum likelihood trees 435 were computed with the following parameters: PROTGAMMA matrix, 436

BLOSUM62 substitution model, and 1000 bootstrap replicates. The carbonic 437 anhydrase tree was rooted using the gamma carbonic anhydrases because they 438 are the most ancient carbonic anhydrase (Smith et al. 1999). Support values 439 less than 70 are not shown. ……………………………………………..….. 128 440

Figure 5. Putative carbonic anhydrases (CAs) that have been localized in the 441 model diatom T. pseudonana (from Samukawa et al. 2014) and the predicted 442 localization of putative CAs from T. oceanica and T. weissflogii. The 443 localizations of the putative CAs reported in this study have not been 444 experimentally verified and are based on the predictions from SignalP (V4.1), 445

TargetP (V1.1), and ASA-find (1.1.5) (Bendtsen et al. 2004; Emanuelsson et 446 al. 2007; Gruber et al. 2015). SignalP was run against the organism 447 group and the default cutoff values were used. TargetP was run with the plant 448 model and “winner-takes-all” option selected. PPS = periplastid space, Cyt = 449 cytoplasm, mit = mitochondria, CER = chloroplast endoplasmic reticulum, 450

PPC = periplastidal compartment, CEV = chloroplast envelope, Str = 451 chloroplast stroma, Pyr = pyrenoid. ……………………………………… 129 452

Figure 6. A schematic summarizing the enzymes thought to be involved in the 453 diatom CCM, including variations of C4 metabolism (Reinfelder et al. 2000; 454

Sage 2004; Kustka et al. 2014). Grey boxes indicate enzymes performing the 455 conversions between the metabolites on either end of the black arrow. Grey 456 boxes: AGAT = alanine glyoxylate amino transferase, ALAT = alanine amino 457

xvii

transferase, CA = carbonic anhydrase, ENO = enolase, GDC = glycine 458 decarboxylase P-protein, GK = glycerate kinase , HPR = hydroxypyruvate 459 reductase, MDH = malate dehydrogenase, ME = malic enzyme, PEPC = 460 phosphoenolpyruvate carboxylase, PEPCK = phosphoenolpyruvate 461 carboxykinase, PGM = phosphoglycerate mutase, PPDK = pyruvate, 462 phosphate dikinase, PYC = pyruvate carboxylase, SLC4/SLC26 = solute 463 carrier family 4/6, SPT = serine pyruvate transaminase. Outside grey boxes: 464

- CO2 = carbon dioxide, GLC = glycerate, GLX = glyoxylate, HCO3 = 465

+ bicarbonate, MAL = malate, NH4 = ammonium, OAA = oxaloacetate, PEP = 466 phosphoenolpyruvate, PYR = pyruvate ……………………………….… 130 467

MANUSCRIPT 3 468

-1 - - Figure 1. (A) ppm CO2, (B) Chlorophyll L , (C) µM NO3 +NO2 , (D) µM PO4, 469

(E) µM Si(OH)4, (F) µM POC, (G) µM PON, (H) µM POP, (I) µM BSi, and 470

(J) temperature °C for each incubation carboy. The x-axis indicates the date the 471 variable was measured and the y-axis indicates the values and variables 472 measured. The shapes and colors indicate the CO2 condition shown. Points and 473 error bars indicate the mean and range of the measurement across the triplicate 474 incubation carboys. The asterisks indicate significant differences relative to the 475 present day pCO2 condition (1-way ANOVA; p < 0.05). Dilution days are 476 indicated with black arrows. Solid shapes indicate that the measurement shown 477 was taken before dilution, empty shapes indicate the measurement was taken 478 post-dilution. ………..…………………………………………………..…. 165 479

xviii

Figure 2. A: Bar charts showing the relative distributions of metatranscriptome 480 reads mapped to the MMETSP database using either DIAMOND or BWA, 481 moving from less specific to more specific from left to right. B: DADA2 482 taxonomic assignment of 18S V4 rDNA amplicon reads using the PR2 483 database. ………………………………………………………………....… 166 484

Figure 3. A pathway heat map showing broad metabolic rearrangements in the 485 citrate cycle in Skeletonema and Thalassiosira across CO2 conditions. The 486 color indicates the log2-fold change of total reads mapped per genus per 487

KEGG KO in low or high CO2 relative to the present day CO2 condition. 488

……………………………………………………………………………… 168 489

Figure 4. A heat map showing broad metabolic rearrangements in nitrogen 490 uptake pathway members in Skeletonema and Thalassiosira across CO2 491 conditions. The color indicates the log2-fold change of total reads mapped per 492 genus per KEGG KO in low or high CO2 relative to the present day CO2 493 condition. ……...…………………………………………………………… 169 494

Figure 5. A pathway heat map showing broad metabolic rearrangements in 495 branched-chain amino acid metabolism in Skeletonema and Thalassiosira 496 across CO2 conditions. The color indicates the log2-fold change of total reads 497 mapped per genus per KEGG KO in low or high CO2 relative to the present 498 day CO2 condition. ………………………………………………………… 170 499

Figure 6. A pathway heat map showing broad metabolic rearrangements in 500 phosphorus metabolism in Skeletonema and Thalassiosira across CO2 501 conditions. The color indicates the log2-fold change of total reads mapped per 502

xix

genus per KEGG KO in low or high CO2 relative to the present day CO2 503 condition. ……………………………………………………...………....… 171 504

Figure 7. A pathway heat map showing broad metabolic rearrangements in 505 phosphorus metabolism in Skeletonema and Thalassiosira across CO2 506 conditions. The color indicates the log2-fold change of total reads mapped per 507 genus per KEGG KO in low or high CO2 relative to the present day CO2 508 condition. …..…………………………………………………...………..… 172 509

Figure 8. A series of 1 to 1 lines showing the expression values of the α, δ, γ, 510 and ζ carbonic anhydrases from Skeletonema spp. and Thalassisoira spp. 511 diatoms. Each point indicates a gene and the color indicates which genera of 512 diatom the sequence is from. The x-axis is the expression value in present day 513 pCO2 and the y-axis is the expression value in either low or high pCO2. The 514 black line indicates the point at which the expression of a gene is the same in 515 both conditions. Genes that were differentially expressed (>=0.95 posterior 516 probability of >2-fold change) are indicated by the black star) (Wu et al. 2010). 517

……………………………………………………..…...………………...… 173 518

Figure 9. A series of 1 to 1 lines showing the expression values of the SLC4 519 and SLC26 bicarbonate transporters from Skeletonema spp. and Thalassisoira 520 spp. diatoms. Each point indicates a gene and the color indicates which genera 521 of diatom the sequence is from. The x-axis is the expression value in present 522 day pCO2 and the y-axis is the expression value in either low or high pCO2. 523

The black line indicates the point at which the expression of a gene is the same 524 in both conditions. Genes that were differentially expressed (>=0.95 posterior 525

xx

probability of >2-fold change) are indicated by the black star) (Wu et al. 2010). 526

……………………………………………………..…...………………...… 174 527

528

xxi

INTRODUCTION 529

Ocean acidification 530

Fossil fuel combustion has raised atmospheric pCO2 from ~277 ppm in the 531 year 1750 to the present level of ~400 ppm (Tans 2009; Le Quéré et al. 2016). 532

Increased levels of dissolved CO2 gas in seawater perturbs carbonate chemistry in the 533 ocean, which results in a decreased buffering capacity, or ocean acidification (OA) 534

(Caldeira & Wickett 2003). Climate models suggest that pCO2 levels in the 535 atmosphere may rise to ~1000 ppm by the year 2100 if no measures are taken to 536 reduce emissions (Solomon et al. 2007). Through physical mixing and biological 537 activity, the ocean sequesters approximately 1/3 of anthropogenic CO2 and has been 538 the only net sink for CO2 in the past 200 years (Sabine et al. 2004; Khatiwala et al. 539

2013). 540

Diatoms experience spatial and temporal fluctuations in CO2 541

Diatoms are photosynthetic single-celled that account for 40% of 542 oceanic primary production and approximately 20% of global primary production 543

(Nelson et al. 1995; Field et al. 1998). Diatoms have experienced fluctuations in pCO2 544 on different temporal scales throughout their evolutionary history. Molecular clock 545 estimates suggest that diatoms arose around 240 million years ago (Sims et al. 2006; 546

Medlin 2016). This coincides with the start of a glacial “hot-house” period, where 547 atmospheric pCO2 levels were as high as 6000 ppm (Kump et al. 2009). In more recent 548 history, diatoms have also experienced low pCO2: pre-industrial (before 1850) 549 atmospheric pCO2 averaged ~270 ppm (Tans 2009). In the present day, diatoms can 550 also experience fluctuations in pCO2 levels that vary on much shorter time scales. 551

1

Coastal diatoms might experience dissolved pCO2 concentrations greater than 850 552 ppm during seasonal coastal upwelling events that entrain CO2-rich deep water into 553 the lit ocean (Feely et al. 2008). Diatoms’ distributions in the ocean can also influence 554 the pCO2 conditions they experience over the course of their life histories. Relative to 555 coastal diatoms, diatoms living in the open-ocean experience more stable pCO2 556 conditions (Hofmann et al. 2011). Coastal diatoms may have unique strategies for 557 coping in high CO2 that are not seen in open-ocean diatoms. 558

Diatoms & the carbon concentrating mechanism 559

Diatoms' key role in global carbon cycles is aided by a carbon-concentrating 560 mechanism (CCM), which concentrates CO2 around RuBisCO, the enzyme that begins 561 the CO2 fixation process (Badger et al. 1998; Hopkinson et al. 2011). The diatom CCM 562

- uses bicarbonate transporters (BCTs) to take up HCO3 and carbonic anhydrases (CAs) 563

- to interconvert HCO3 and CO2 (Badger et al. 1998; Hopkinson et al. 2011). The 564

- predominant dissolved form of inorganic carbon in the ocean is bicarbonate (HCO3 ) 565

(Zeebe & Wolf-Gladrow 2001), which is transported into the cell by the BCTs and can 566 be interconverted to CO2 (which diffuses across membranes without active transport) 567 via the activity of a suite of carbonic anhydrases (Meldrum & Roughton 1933; Stadie 568

& O’Brien 1933). The use of a carbon concentrating mechanism and active transport 569 of inorganic carbon requires a significant energetic investment by the cell (Raven 570

1991). If diatoms can downregulate CCM components when CO2 levels are high, this 571 may result in the conservation of cellular energy as anthropogenic CO2 emissions rise. 572

This energy conservation may confer a competitive advantage to some diatoms and 573 other groups of phytoplankton and could influence community composition in the 574

2

future. The ability or inability to regulate the CCM in response to fluctuating pCO2 575 may be related to how diatoms sense changes in pCO2. 576

Diversification of the diatom CCM 577

The diatom CCM has been most well-studied in the model diatoms 578

Thalassiosira pseudonana and Phaeodactylum tricornutum and includes the a, β, δ, g, 579 q, and ζ CAs and the SLC4 and SLC26 BCTs (Tachibana et al. 2011; Samukawa et al. 580

2014; Kikutani et al. 2016; Hopkinson 2014; Hopkinson et al. 2016). Thalassiosira 581 pseudonana and Phaeodactylum tricornutum generally contain the same complement 582 of CCM genes, with a few exceptions: The δ and ζ CA isozymes have not been found 583 in P. tricornutum, while the β CAs have not been found in T. pseudonana (Tachibana 584 et al. 2011; Samukawa et al. 2014). The θ carbonic anhydrases have been 585 characterized in P. tricornutum, but their expression in T. pseudonana has not been 586 confirmed (Kikutani et al. 2016). The SLC4 transporters have been characterized in P. 587 tricornutum, but not T. pseudonana (Nakajima et al. 2013). The sub-cellular 588 arrangements of the CCM components (and thus the key points of CCM regulation) 589 also differ in these two species, suggesting that their CCMs work quite differently 590

(Tachibana et al. 2011; Samukawa et al. 2014; Hopkinson et al. 2016). While T. 591 pseudonana has CAs localized to the periplastid space, the cytosol, and the chloroplast 592 stroma, P. tricornutum does not (Samukawa et al. 2014). In contrast, P. tricornutum 593 has CAs localized to the chloroplast endoplasmic reticulum and the pyrenoid, T. 594 pseudonana does not (Tachibana et al. 2011). It has been hypothesized that this 595 diversification and re-configuration of the CCM in different diatoms is due to historic 596 fluctuations in atmospheric CO2 (Shen et al. 2017). However, there are few studies 597

3

examining the CO2-sensitivity and regulation of the CCM across a diversity of 598 diatoms. This is particularly true of diatoms living in the open-ocean that may not 599 have the same high CO2-coping mechanisms as coastal isolates (like T. pseudonana). 600 cAMP-mediated CO2 sensing in diatoms 601

One aspect of carbon physiology that might influence how diatoms cope in 602 high CO2 is their ability to sense changes in CO2. Sensing of pCO2 in diatoms is via 603 the activity of cAMP secondary messengers (Harada et al. 2006; Ohno et al. 2012; 604

Hennon et al. 2015). Soluble adenylate cyclases generate cAMP secondary 605 messengers from ATP in response to signals received from bicarbonate (Buck et al. 606

1999). The cAMP secondary messengers activate cAMP-dependent protein kinases 607 that phosphorylate and activate bZIP domain transcription factors that in turn bind to 608 cAMP response elements and regulate downstream gene expression (Ohno et al. 609

2012). Phosphodiesterases regulate cellular cAMP levels by dephosphorylating cAMP 610 back into AMP (Conti & Beavo 2007). Studies in P. tricornutum and T. pseudonana 611 suggest that the activity of a soluble adenylyl cyclase is increased in the presence of 612

CO2/HCO3, and that the resulting increase in cellular cAMP levels induces the activity 613 of bZIP transcription factors (Harada et al. 2006; Ohno et al. 2012; Hennon et al. 614

2015). In high CO2, the transcription factors bind to upstream CO2-cAMP-responsive 615 elements and repress the expression of CCM genes, including the expression of the 616 adenylyl cyclase itself (Hennon et al. 2015). The pCO2 sensitivity of the β carbonic 617 anhydrase of P. tricornutum was repressed when cells were treated with a cAMP 618 inhibitor IBMX (Harada et al. 2006) or when cAMP-responsive motifs were deleted 619 upstream of the β carbonic anhydrase gene (Ohno et al. 2012). These studies of 620

4

cAMP-mediated CO2 sensing have been conducted in the model diatoms T. 621 pseudonana (Hennon et al. 2015) and P. tricornutum (Harada et al. 2006; Ohno et al. 622

2012). In light of the diversification and re-configuration of the CCM in different 623 diatoms (Shen et al. 2017), it is possible that this cAMP-mediated CO2-sensing 624 mechanism is not ubiquitous across diatoms. 625

Silicon limits diatom growth and primary production 626

In addition to being aided by the CCM, diatoms’ primary production is also a 627 function of their growth, which is constrained by the availability of nutrients in the 628 ocean. Many phytoplankton share growth requirements for nutrients such as nitrate, 629 phosphate, and iron, but diatoms are unique among the phytoplankton as they require 630 silicon, which is used to build their ornately silicified cell walls or (Lewin 631

1957; Lewin 1962; Jørgensen 1955). 632

It has been suggested that silicon ultimately limits diatom growth throughout 633 the ocean (Dugdale et al. 1995; Yool & Tyrrell 2003) by several different 634 mechanisms. Firstly, there is a discrepancy in the available pools of recycled nitrogen 635 and recycled silicon in the surface ocean. Silicon recycling occurs deeper in the water 636 column than that of other nutrients (C, N, P), leading to a propensity for silicon 637 depletion in the surface ocean (Dugdale et al. 1995; Treguer et al. 1995; Dugdale & 638

Wilkerson 1998). Secondly, diatoms growing in upwelled water can become silicon 639 limited if these waters are enriched in nitrate relative to silicon, which been observed 640 in the North Atlantic Ocean (Allen et al. 2005; Johnson et al. 2007). Thirdly, silicon 641 may also set the upper limit on diatom primary production in iron limited high nutrient 642 low chlorophyll (HNLC) regions (Dugdale et al. 1995; Dugdale & Wilkerson 1998). 643

5

This is because diatoms growing low iron conditions (< 5 nM), take up as much as 8 644 times more silicon than nitrate (Franck et al. 2000). Thus, diatoms in HNLC regions 645 may be driven into silicon limitation due to their increased silicon requirement 646

(Hutchins & Bruland 1998; De La Rocha et al. 2000). It has also been suggested that 647 silicon processing may require iron as a cofactor, which would lead to a 648 biochemically-dependent iron and silicon co-limitation (Mock et al. 2008; Saito et al. 649

2008). Silicon and trace metal co-limitation of diatoms has been documented in the 650

HNLC waters of the Equatorial Pacific Ocean (Brzezinski et al. 2011), the Southern 651

Ocean and sub-Antarctic (Boyd et al. 1999; Leblanc et al. 2005; Franck et al. 2000), 652 and the sub-Arctic Pacific (Koike et al. 2001). As these studies assessed the silicon 653 status of the bulk diatom community, it is currently not known whether these 654 responses are uniform across particular diatom genera, species, or strains. Genetic 655 markers have been used to probe iron status in the field, and similar molecular markers 656 of silicon status could be powerful tools to probe the silicon status of different co- 657 existing diatom species (Chappell et al. 2015). 658

Silicon uptake and deposition 659

Silicic acid is a small, uncharged molecule and can freely diffuse across diatom 660 membranes when concentrations are high (Hildebrand et al. 2018), but when silicic 661 acid concentration is low, silicon transporters are used for active transport (Kimberlee 662

Thamatrakoln & Hildebrand 2008). Diatoms’ silicified cell wall is formed inside the 663 silicon deposition vesicle (Drum & Pankratz 1964; Reimann et al. 1966). Silicification 664 inside the silicon deposition vesicle is aided by several classes of molecules -- the 665 silaffins (Poulsen & Kröger 2004; Kröger et al. 1999), cingulins (Scheffel et al. 2011), 666

6

silacidins (Wenzl et al. 2008), and long-chain polyamines (Kröger et al. 2000) all play 667 a role in silicon precipitation and formation. Pleuralins and frustulins are 668 tightly associated with the silicified cell wall (Kröger et al. 1997; Kröger et al. 1994), 669 but they are not thought to be directly involved in silicon deposition in the silicon 670 deposition vesicle (Kröger & Wetherbee 2000; van de Poll et al. 1999). 671

Diatom silicon physiology and the cell cycle are tightly linked. In addition to 672 requiring silicon for their cell walls (Lewin 1957; Lewin 1962; Jørgensen 1955), 673 silicon is also required for diatoms’ DNA synthesis to proceed during S phase of the 674 cell cycle (Darley & Volcani 1969). This phase of the cell cycle also sees peak 675 expression of silicon transporter mRNA (but the lowest expression of silicon 676 transporter proteins) (Thamatrakoln & Hildebrand 2007). Diatom frustule valves are 677 formed just after cell division, during G2/M phase, while girdle band synthesis 678 happens in G1 phase (Eppley et al. 1967; Hildebrand et al. 2007). These different 679 processes that require silicon explain why silicon limited T. weissflogii cells 680 sometimes arrest in the G2/M phase or other times in the G1/S phase transition 681

(Brzezinski et al. 1990; Vaulot et al. 1987). 682

Molecular biology of silicon limitation 683

Genetic studies have given us a better understanding of the molecular 684 underpinnings of diatom silicon physiology. Silicon limited T. pseudonana 685 upregulates silicon transporter gene copies SIT1 and SIT2 (transcripts and proteins) in 686 silicon limitation (Mock et al. 2008; Shrestha et al. 2012; Du et al. 2014; Smith et al. 687

2016; Brembu et al. 2017) and also has a higher concentration of silacidin proteins 688 relative to silicon replete condition (Richthammer et al. 2011), although transcriptome 689

7

studies in T. pseudonana did not see differential expression of silacidin genes (Mock 690 et al. 2008; Shrestha et al. 2012). Genes for the biosynthesis of long-chain polyamine 691 precursors (spermidine synthases) (Bouchereau et al. 1999) and the silaffins were 692 upregulated during silicon limitation (Mock et al. 2008; Brembu et al. 2017), while 693 others were upregulated only after silicon resupply (Shrestha et al. 2012; Brembu et al. 694

2017), leading to the hypothesis that distinct silaffin gene copies or LCPAs are 695 optimized for silicon deposition in either low or high silicon concentrations. In 696 addition to SIT1 and SIT2, T. pseudonana also has a third silicon transporter, SIT3, 697 that is lowly expressed and not sensitive to changes in silicon, leading to the 698 hypothesis that it functions as a silicon sensor rather than a transporter (Thamatrakoln 699

& Hildebrand 2007). Cingulins were downregulated during silicon limitation and 700 upregulated after silicon resupply, but were also upregulated during the S phase of the 701 cell cycle (when silicon deposition does not occur) suggesting that cingulins have 702 other roles in the cell aside aside from silicon deposition (Shrestha et al. 2012; Brembu 703 et al. 2017). Given that diatoms also require silicon for DNA synthesis during S phase 704 of the cell cycle (Darley & Volcani 1969), when cingulin expression increases 705

(Shrestha et al. 2012; Brembu et al. 2017), it is also feasible that the cingulins play a 706 role in signaling cellular silicon status before DNA synthesis proceeds. Notably, the 707 studies to date have not included diatoms found in HNLC regions, where silicon sets 708 an upper limit on diatom production (Dugdale & Wilkerson 1998; Chappell et al. 709

2013), or the North Atlantic Ocean where silicon limitation of the bulk diatom 710 community has been observed (Allen et al. 2005; Johnson et al. 2007), limiting the 711 ability to develop appropriate molecular markers. 712

8

Thesis motivation and outline 713

Diatom primary production is a function of their overall growth rate, which is 714 constrained by the availability of dissolved nutrients in the surface ocean. Many 715 phytoplankton share growth requirements for nutrients such as nitrate, phosphate, and 716 iron, but diatoms are unique among the phytoplankton as they require silicon, which is 717 used to build their ornately silicified cell walls or frustules (Lewin 1957; Lewin 1962; 718

Jørgensen 1955). Silicon growth limitation of the bulk diatom community has been 719 observed in the North Atlantic Ocean (Allen et al. 2005; Johnson et al. 2007) and 720 silicon and trace metal co-limitation has been documented in the HNLC waters of the 721

Equatorial Pacific Ocean (Brzezinski et al. 2011), the Southern Ocean and sub- 722

Antarctic (Boyd et al. 1999; Leblanc et al. 2005; Franck et al. 2000), the sub-Arctic 723

Pacific (Koike et al. 2001), and the equatorial Pacific (Dugdale & Wilkerson 1998). 724

As these were studies assessing the bulk diatom community, it is unclear whether or 725 not certain diatom species are experiencing silicon limitation while others are not. 726

Molecular markers have been useful for assessing the iron status of a diatom species, 727

Thalassiosira oceanica, that is prevalent across oceanic gradients (Chappell et al. 728

2015), and a similar approach could prove useful for determining whether or not T. 729 oceanica and other co-occuring diatoms experience silicon limitation in a bulk 730 community. However, most studies looking at the molecular and genetic response to 731 silicon limitation have used the model diatoms T. pseudonana (Mock et al. 2008; 732

Shrestha et al. 2012; Du et al. 2014; Smith et al. 2016; Brembu et al. 2017) or P. 733 tricornutum (Sapriel et al. 2009), rather than diatoms typically found in places where 734 silicon is thought to limit diatom primary production (Dugdale & Wilkerson 1998; 735

9

Allen et al. 2005; Johnson et al. 2007), hindering our ability to develop appropriate 736 molecular markers. 737

Chapter 1 of this thesis examined the transcriptomes of two closely related 738

Thalassiosira diatoms to better understand the silicon limitation gene expression 739 response of diatoms found in ecosystems where their growth may be constrained by 740 silicon availability, toward the goal of developing appropriate molecular markers of 741 silicon status. Thalassiosira oceanica CCMP1005 has a high tolerance for iron 742 limitation (Sunda & Huntsman 1995) and may persist in chronically iron limited 743

HNLC regions (Chappell et al. 2013) where diatoms’ silicon requirements are high 744

(Hutchins & Bruland 1998; De La Rocha et al. 2000). Thalassiosira weissflogii 745

CCMP1010 was isolated from the North Atlantic Ocean, an ocean basin where silicon- 746 depleted upwelling has been observed (Allen et al. 2005; Johnson et al. 2007). We 747 selected this genus as it allows us to make comparisons to studies in T. pseudonana 748 which has a completely sequenced genome (Armbrust et al. 2004) and its silicon 749 metabolism and molecular biology is very well studied (Poulsen & Kröger 2004; 750

Thamatrakoln & Hildebrand 2007; Mock et al. 2008; Wenzl et al. 2008; Scheffel et al. 751

2011; Shrestha et al. 2012; Du et al. 2014; Smith et al. 2016; Brembu et al. 2017). 752

While T. weissflogii upregulated the silicon transporters in low silicon, T. oceanica did 753 not. We hypothesize that this is due to the lack of a SIT3-like silicon sensor in T. 754 oceanica. We also generally found very different silicon limitation responses in these 755 two diatoms. We hypothesize that this is because silicon is required for cell wall 756 synthesis (Lewin 1957; Lewin 1962; Jørgensen 1955) and DNA replication (Darley & 757

Volcani 1969) in diatoms and that these two processes are being differentially affected 758

10

in silicon limited T. oceanica and T. weissflogii. However, we did find a family of 759

ATP-grasp domain proteins of unknown function that were upregulated in silicon- 760 limited T. oceanica, T. weissflogii, and T. pseudonana (Smith et al. 2016; Brembu et 761 al. 2017). The upregulation of these genes is unique to silicon limitation and their 762 expression is not induced in nitrate or iron limitation conditions. It is also found across 763 a diversity of diatoms in the MMETSP database (Keeling et al. 2014), suggesting that 764 they might be useful as a silicon limitation markers in many diatoms, in addition to 765

Thalassioisra spp. diatoms. 766

Increased fossil fuel consumption since the industrial revolution has increased 767

CO2 levels in the atmosphere and in the ocean (Tans 2009; Le Quéré et al. 2016). The 768 increased CO2 in the ocean has perturbed carbonate chemistry and led to ocean 769 acidification (Caldeira & Wickett 2003). Diatoms have experienced large fluctuations 770 in CO2 conditions on evolutionary time scales (Sims et al. 2006; Kump et al. 2009; 771

Tans 2009; Medlin 2016), and it has been suggested that these historical fluctuations 772 in CO2 have led to a diversification and reconfiguration of the diatom CCM (Shen et 773 al. 2017). Most studies of the diatom CCM have used the model diatoms T. 774 pseudonana and P. tricornutum, and they suggest that the CCM of these diatoms 775 works very differently (Tachibana et al. 2011; Samukawa et al. 2014; Kikutani et al. 776

2016; Hopkinson 2014; Hopkinson et al. 2016). Further, CCM function and evolution 777 may have been influenced by diatom biogeography, as open-ocean species typically 778 experience stable CO2 conditions that are in equilibrium with atmospheric levels of 779

CO2 and coastal diatoms may experience very high CO2 conditions during seasonal 780 upwelling of high CO2 deep water (Feely et al. 2008; Hofmann et al. 2011). There is 781

11

currently a paucity of studies of the diatom CCM and its sensitivity to CO2 across a 782 diversity of diatom species with representatives from a range of biogeographical 783 distributions in the ocean. 784

Chapter 2 of this thesis adds to our knowledge of the diversity of CCM and 785 high CO2 response in different species of the Thalassiosira genus of diatoms. CO2 786 manipulation experiments were conducted with four Thalassiosira species – T. 787 pseudonana, T. rotula, T. weissflogii, and T. oceanica, and transcriptomes were 788 sequenced from the latter 3 species. T. rotula is a coastal diatom that may have 789 experienced high CO2 during seasonal upwelling, while T. oceanica and T. weissflogii 790 are open-ocean isolates that have likely experienced relatively stable CO2 conditions 791

(Feely et al. 2008; Hofmann et al. 2011). These species displayed a range of growth 792 rate changes across CO2 conditions (<230 ppm CO2 to >2900 ppm CO2), from no 793 change across conditions (T. pseudonana), slower growth in lower CO2 (T. 794 weissflogii), slower growth in higher CO2 (T. oceanica), and faster growth in higher 795

CO2 (T. rotula) (King et al. 2015). The accompanying transcriptome evidence for T. 796 rotula, T. weissflogii, and T. oceanica indicate that the molecular basis for these 797 differences in CO2 responses boil down to the ability to dynamically regulate the CCM 798 while maintaining internal redox and ion homeostasis, as well as trade-offs between 799 funneling excess CO2 into internal storage pools versus a higher growth rate (Wallace 800 et al. in prep). 801

Chapter 3 uses a metatranscriptome approach to generalize about how the 802

CCM of Thalassiosira spp. differs from that of Skeletonema spp. diatoms. 803

Community-level global gene expression profiling (metatranscriptome) studies are a 804

12

powerful approach that capture the molecular basis for different diatoms’ contrasting 805 growth and nutrient acquisition strategies in mixed assemblages. This study involved 806

CO2 manipulation experiments of a plankton assemblage collected from 807

Massachusetts Bay during March of 2014. Seawater was collected in acid-cleaned 808 carboys, pre-filtered to remove grazers larger than 180 µm, amended with nutrients to 809 mimic the mixed layer depth, and pCO2 levels were manipulated to approximate 810 low/pre-industrial (less than 215 ppm pCO2), present-day ambient pCO2 in the open 811 ocean (330-390 ppm pCO2), and future pCO2 projections (780-1320 ppm pCO2) (Tans 812

2009; Le Quéré et al. 2016). Metatranscriptome and diatom-specific 18S rDNA 813 amplicon sequencing (Zimmermann et al. 2011; Mellett et al. 2018) suggest that 814

Thalassiosira spp. and Skeletonema spp. were abundant in these incubations, 815 consistent with previous light microscopy observations conducted at the final 816 experimental time point, as well as previous studies of Massachusetts Bay in March 817

(Hunt et al. 2010). We show evidence of distinct enzymatic strategies for N and P 818 uptake and scavenging in these two diatom genera. There is also evidence of distinct 819 strategies in their CCMs – while Thalassiosira diatoms seem to use the α, δ, and ζ 820 carbonic anhydrases and the substrate-specific SLC4 and non-specific SLC26 821 bicarbonate transporters, Skeletonema relies on the γ and ζ carbonic anhydrases and 822 the SLC26 bicarbonate transporters or diffuse CO2 uptake (Nakajima et al. 2013; 823

Alper & Sharma 2013). This suggests that the co-existence of Skeletonema spp. and 824

Thalassiosira spp. might be partially due to a CCM that use distinct CA isozymes that 825 use different metal cofactors for their enzymatic activity (γ can substitute iron, δ can 826 substitute cobalt) (Lane & Morel 2000; Tripp et al. 2004). Furthermore, if 827

13

- Skeletonema does rely on diffusive CO2 uptake rather than active transport of HCO3 828 using SLC4 or SLC26 transporters, Thalassiosira may have a competitive advantage 829 in high CO2 as it downregulates these active transporters. 830

831

14

References 832

Allen, J.T. et al., 2005. Diatom carbon export enhanced by silicate upwelling in the 833

northeast Atlantic. Nature, 437(7059), pp.728–732. 834

Alper, S.L. & Sharma, A.K., 2013. The SLC26 Gene Family of Anion Transporters 835

and Channels. Molecular Aspects of Medicine, 34(2–3), pp.494–515. 836

Armbrust, E.V. et al., 2004. The Genome of the Diatom Thalassiosira pseudonana: 837

Ecology, Evolution, and Metabolism. Science, 306(October), pp.79–86. 838

Badger, M.R. et al., 1998. The diversity and coevolution of Rubisco, plastids, 839

pyrenoids, and chloroplast-based CO2-concentrating mechanisms in algae. 840

Canadian Journal of Botany, 76(6), pp.1052–1071. 841

Bouchereau, A. et al., 1999. Polyamines and environmetal challenges: recent 842

development. Plant Science, 140, pp.103–125. 843

Boyd, P. et al., 1999. Role of iron, light, and silicate in controlling algal biomass in 844

subantarctic waters SE of New Zealand. Journal of Geophysical Research, 104, 845

pp.395–408. 846

Brembu, T. et al., 2017. Dynamic responses to silicon in Thalasiossira pseudonana - 847

Identification, characterisation and classification of signature genes and their 848

corresponding protein motifs. Scientific Reports, 7(1), pp.1–14. 849

Brzezinski, M. a. et al., 2011. Co-limitation of diatoms by iron and silicic acid in the 850

equatorial Pacific. Deep-Sea Research Part II: Topical Studies in Oceanography, 851

58(3–4), pp.493–511. 852

Brzezinski, M., Olson, R. & Chisholm, S., 1990. Silicon availability and cell-cycle 853

progression in marine diatoms. Marine Ecology Progress Series, 67(M), pp.83– 854

96. 855 15

Buck, J. et al., 1999. Cytosolic adenylyl cyclase defines a unique signaling molecule 856

in mammals. Proceedings of the National Academy of Sciences, 96(1), pp.79–84. 857

Caldeira, K. & Wickett, M.E., 2003. Oceanography: anthropogenic carbon and ocean 858

pH. Nature, 425(6956), p.365. 859

Chappell, P.D. et al., 2015. Genetic indicators of iron limitation in wild populations of 860

Thalassiosira oceanica from the northeast Pacific Ocean. The ISME Journal, 9, 861

pp.592–602. 862

Chappell, P.D. et al., 2016. Preferential depletion of zinc within Costa Rica upwelling 863

dome creates conditions for zinc co-limitation of primary production. Journal of 864

Plankton Research, 38(2), pp.244–255. 865

Chappell, P.D. et al., 2013. Thalassiosira spp. community composition shifts in 866

response to chemical and physical forcing in the northeast Pacific Ocean. 867

Frontiers in Microbiology, 4(SEP), pp.1–14. 868

Conti, M. & Beavo, J., 2007. Biochemistry and Physiology of Cyclic Nucleotide 869

Phosphodiesterases: Essential Components in Cyclic Nucleotide Signaling. 870

Annual Review of Biochemistry, 76(1), pp.481–511. 871

Darley, W.M. & Volcani, B.E., 1969. Role of silicon in diatom metabolism. A silicon 872

requirement for deoxyribonucleic acid synthesis in the diatom Cylindrotheca 873

fusiformis Reimann and Lewin. Experimental cell research, 58(2), pp.334–342. 874

Drum, R.W. & Pankratz, S.H., 1964. Post mitotic fine structure of Gomphonema 875

parvulum. Journal of ultrastructure research, 10, pp.217–223. 876

Du, C. et al., 2014. ITRAQ-based proteomic analysis of the metabolism mechanism 877

associated with silicon response in the marine diatom Thalassiosira pseudonana. 878

16

Journal of Proteome Research, 13(2), pp.720–734. 879

Dugdale, R.C. & Wilkerson, F.P., 1998. Silicate regulation of new production in the 880

equatorial Pacific upwelling. Nature, 391(January), pp.270–273. 881

Dugdale, R.C., Wilkerson, F.P. & Minas, H.J., 1995. The role of a silicate pump in 882

driving new production. Deep-Sea Research Part I: Oceanographic Research 883

Papers, 42(5), pp.697–719. 884

Eppley, R.W., Holmes, R.W. & Paasche, E., 1967. Periodicity in cell division and 885

physiological behavior of Ditylum brightwellii a marine planktonic diatom during 886

grow in light-dark cycles. Archiv Fur Mikrobiologie, 56, pp.305–323. 887

Feely, R. a et al., 2008. Evidence for upwelling of corrosive “acidified” water onto the 888

continental shelf. Science, 320(5882), pp.1490–1492. 889

Field, C.B. et al., 1998. Primary Production of the Biosphere: Integrating Terrestrial 890

and Oceanic Components. Science, 281(5374), pp.237–240. 891

Franck, V.M. et al., 2000. Iron and silicic acid concentrations regulate Si uptake north 892

and south of the Polar Frontal Zone in the Pacific Sector of the Southern Ocean. 893

Deep Sea Research Part II: Topical Studies in Oceanography, 47(15–16), 894

pp.3315–3338. 895

Harada, H. et al., 2006. CO2 sensing at ocean surface mediated by cAMP in a marine 896

diatom. Plant physiology, 142(3), pp.1318–1328. 897

Hennon, G.M.M. et al., 2015. Diatom acclimation to elevated CO2 via cAMP 898

signalling and coordinated gene expression. Nature Climate Change, 5(August), 899

pp.761–766. 900

Hildebrand, M., Frigeri, L.G. & Davis, A.K., 2007. Synchronized growth of 901

17

Thalassiosira pseudonana (Bacillariophyceae) provides novel insights into cell- 902

wall synthesis processes in relation to the cell cycle. Journal of Phycology, 43(4), 903

pp.730–740. 904

Hildebrand, M., Lerch, S.J.L. & Shrestha, R.P., 2018. Understanding Diatom Cell 905

Wall Silicification—Moving Forward. Frontiers in Marine Science, 5(April), 906

pp.1–19. 907

Hofmann, G.E. et al., 2011. High-Frequency Dynamics of Ocean pH : A Multi- 908

Ecosystem Comparison. PLoS ONE , 6(12). 909

Hopkinson, B.M., 2014. A chloroplast pump model for the CO2 concentrating 910

mechanism in the diatom Phaeodactylum tricornutum. Photosynthesis Research, 911

121(2–3), pp.223–233. 912

Hopkinson, B.M. et al., 2011. Efficiency of the CO2-concentrating mechanism of 913

diatoms. Proceedings of the National Academy of Sciences, 108(10), pp.3830– 914

3837. 915

Hopkinson, B.M., Dupont, C.L. & Matsuda, Y., 2016. The physiology and genetics of 916

CO2 concentrating mechanisms in model diatoms. Current Opinion in Plant 917

Biology, 31, pp.51–57. 918

Hunt, C.D. et al., 2010. Phytoplankton patterns in Massachusetts Bay-1992-2007. 919

Estuaries and Coasts, 33(2), pp.448–470. 920

Hutchins, D. a & Bruland, K.W., 1998. Iron-limited diatom growth and Si : N uptake 921

ratios in a coastal upwelling regime. Nature, 393(June), pp.561–564. 922

Johnson, M. et al., 2007. Ammonium accumulation during a silicate-limited diatom 923

bloom indicates the potential for ammonia emission events. Marine Chemistry, 924

18

106(1–2 SPEC. ISS.), pp.63–75. 925

Jørgensen, E.G., 1955. Variations in the Silica Content of Diatoms. Physiologia 926

Plantarum, 8(4), pp.840–845. 927

Keeling, P.J. et al., 2014. The Marine Microbial Eukaryote Transcriptome Sequencing 928

Project (MMETSP): Illuminating the Functional Diversity of Eukaryotic Life in 929

the Oceans through Transcriptome Sequencing. PLoS Biology, 12(6). 930

Khatiwala, S. et al., 2013. Global ocean storage of anthropogenic carbon. 931

Biogeosciences, 10(4), pp.2169–2191. 932

Kikutani, S. et al., 2016. Thylakoid luminal θ-carbonic anhydrase critical for growth 933

and photosynthesis in the marine diatom Phaeodactylum tricornutum. 934

Proceedings of the National Academy of Sciences, 113(35), pp.9828–9833. 935

King, A.L. et al., 2015. Effects of CO2 on growth rate, C:N:P, and fatty acid 936

composition of seven marine phytoplankton species. Marine Ecology Progress 937

Series, 537, pp.59–69. 938

Koike, I. et al., 2001. Silicate to nitrate ratio of the upper sub-arctic Pacific and the 939

Bering Sea basin in summer: Its implication for phytoplankton dynamics. Journal 940

of Oceanography, 57(3), pp.253–260. 941

Kröger, N. et al., 1997. Characterization of a 200-kDa Diatom Protein that is 942

Specifically Associated with a Silica-Based Substructure of the Cell Wall. Eur J 943

Biochem, 250(1), pp.99–105. 944

Kröger, N. et al., 2000. Species-specific polyamines from diatoms control silica 945

morphology. Proceedings of the National Academy of Sciences of the United 946

States of America, 97(26), pp.14133–14138. 947

19

Kröger, N., Bergsdorf, C. & Sumper, M., 1994. A new calcium binding glycoprotein 948

family constitutes a major diatom cell wall component. The EMBO journal, 949

13(19), pp.4676–4683. 950

Kröger, N., Deutzmann, R. & Sumper, M., 1999. Polycationic Peptides from Diatom 951

Biosilica That Direct Silica Nanosphere Formation. Science, 286(5442), 952

pp.1129–1132. 953

Kröger, N. & Wetherbee, R., 2000. Pleuralins are involved in theca differentiation in 954

the diatom Cylindrotheca fusiformis. Protist, 151(3), pp.263–273. 955

Kump, L.R., Bralowe, T.J. & Ridgwell, A., 2009. Ocean Acidification in Deep Time. 956

Oceanography, 22(4), pp.94–107. 957

De La Rocha, C.L. et al., 2000. Effects of iron and zinc deficiency on elemental 958

composition and silica production by diatoms. Marine Ecology Progress Series, 959

195, pp.71–79. 960

Lane, T.W. & Morel, F.M.M., 2000. Regulation of Carbonic Anhydrase Expression by 961

Zinc, Cobalt, and Carbon Dioxide in the Marine Diatom Thalassiosira 962

weissflogii. Plant Physiology, 123(1), pp.345–352. 963

Leblanc, K. et al., 2005. Fe and Zn effects on the Si cycle and diatom community 964

structure in two contrasting high and low-silicate HNLC areas. Deep-Sea 965

Research Part I: Oceanographic Research Papers, 52(10), pp.1842–1864. 966

Lewin, J.C., 1957. Silicon metabolism in diatoms IV. Growth and frustule formation 967

in pelliculosa. Canadian Journal of Microbiology, 3, pp.427–433. 968

Lewin, R.A., 1962. Silicification. In Physiology and biochemistry of algae. New York: 969

Academic Press, pp. 445–455. 970

20

Medlin, L.K., 2016. Evolution of the diatoms: major steps in their evolution and a 971

review of the supporting molecular and morphological evidence. Phycologia, 972

55(1), pp.79–103. 973

Meldrum, N.U. & Roughton, F.J., 1933. Carbonic anhydrase. Its preparation and 974

properties. The Journal of Physiology, 80(2), pp.113–42. 975

Mellett, T. et al., 2018. The biogeochemical cycling of iron, copper, nickel, cadmium, 976

manganese, cobalt, lead, and scandium in a California Current experimental 977

study. Limnology and Oceanography, 63, pp.S425–S447. 978

Mock, T. et al., 2008. Whole-genome expression profiling of the marine diatom 979

Thalassiosira pseudonana identifies genes involved in silicon bioprocesses. 980

Proceedings of the National Academy of Sciences of the United States of 981

America, 105(5), pp.1579–84. 982

Nakajima, K., Tanaka, A. & Matsuda, Y., 2013. SLC4 family transporters in a marine 983

diatom directly pump bicarbonate from seawater. Proceedings of the National 984

Academy of Sciences of the United States of America, 110(5), pp.1767–72. 985

Nelson, D.M. et al., 1995. Production and dissolution of biogenic silica in the ocean: 986

Revised global estimates, comparison with regional data and relationship to 987

biogenic sedimentation. Global Biogeochemical Cycles, 9(3), pp.359–372. 988

Ohno, N. et al., 2012. CO2-cAMP-Responsive cis-Elements Targeted by a 989

Transcription Factor with CREB/ATF-Like Basic Zipper Domain in the Marine 990

Diatom Phaeodactylum tricornutum. Plant Physiology, 158(1), pp.499–513. 991 van de Poll, W.H., Vrieling, E.G. & Gieskes, W.W.C., 1999. Location and expression 992

of frustulins in the pennate diatoms Cylindrotheca fusiformis, Navicula 993

21

pelliculosa, and Navicula salinarum (Bacillariophyceae). J Phycol, 35, pp.1044– 994

1053. 995

Poulsen, N. & Kröger, N., 2004. Silica morphogenesis by alternative processing of 996

silaffins in the diatom Thalassiosira pseudonana. Journal of Biological 997

Chemistry, 279(41), pp.42993–42999. 998

Le Quéré, C. et al., 2016. Global Carbon Budget 2016. Earth System Science Data, 999

8(2), pp.605–649. 1000

Raven, J., 1991. Physiology of inorganic carbon acquisition and implications for 1001

resource use efficiency by marine phytoplankton: Relation to increased carbon 1002

dioxide and temperature. Plant, Cell and Environment, 14(8), pp.779–794. 1003

Reimann, B.E.F., Lewin, Joyce, C. & Volcani, B.E., 1966. Studies on the 1004

Biochemistry and Fine Structure of Silica Shell Formation in Diatoms: II. The 1005

Structure of the Cell Wall of Navicula pelliculosa (Breb.) Hilse. Journal of 1006

Phycology, 2(2), pp.74–84. 1007

Richthammer, P. et al., 2011. Biomineralization in Diatoms: The Role of Silacidins. 1008

ChemBioChem, 12(9), pp.1362–1366. 1009

Sabine, C.L. et al., 2004. The Oceanic Sink for Anthropogenic CO2. Science, 1010

305(5682), pp.367–371. A 1011

Saito, M.A., Goepfert, T.J. & Ritt, J.T., 2008. Some thoughts on the concept of 1012

colimitation: Three definitions and the importance of bioavailability. Limnology 1013

and Oceanography, 53(1), pp.276–290. 1014

Samukawa, M. et al., 2014. Localization of putative carbonic anhydrases in the marine 1015

diatom, Thalassiosira pseudonana. Photosynthesis Research, 121(2–3), pp.235– 1016

22

249. 1017

Sapriel, G. et al., 2009. Genome-wide transcriptome analyses of silicon metabolism in 1018

Phaeodactylum tricornutum reveal the multilevel regulation of silicic acid 1019

transporters. PLoS ONE, 4(10). 1020

Scheffel, A. et al., 2011. Nanopatterned protein microrings from a diatom that direct 1021

silica morphogenesis. Proceedings of the National Academy of Sciences of the 1022

United States of America, 108(8), pp.3175–80. Available at: 1023

http://www.pnas.org/content/108/8/3175.short. 1024

Shen, C., Dupont, C.L. & Hopkinson, B.M., 2017. The diversity of carbon dioxide- 1025

concentrating mechanisms in marine diatoms as inferred from their genetic 1026

content. Journal of Experimental Botany. 1027

Shrestha, R. et al., 2012. Whole transcriptome analysis of the silicon response of the 1028

diatom Thalassiosira pseudonana. BMC Genomics, 13(1), p.499. 1029

Sims, P.A., Mann, D.G. & Medlin, L.K., 2006. Evolution of the diatoms: insights 1030

from fossil, biological and molecular data. Phycologia, 45(4), pp.361–402. 1031

Smith, S.R. et al., 2016. Transcript level coordination of carbon pathways during 1032

silicon starvation-induced lipid accumulation in the diatom Thalassiosira 1033

pseudonana. New Phytologist, 210(3), pp.890–904. 1034

Solomon, S. et al., 2007. IPCC 2007 Climate Change 2007. In The Physical Science 1035

Basis. Contribution of Working Group I to the Fourth Assessment Report of the 1036

Intergovernmental Panel on Climate Change. Cambridge: Cambridge University 1037

Press. 1038

Stadie, W.C. & O’Brien, H., 1933. The Catalysis of the Hydration of Carbon Dioxide 1039

23

and Dehydration of Carbonic Acid by an Enzyme Isolated from Red Blood Cells. 1040

The Journal of Biological Chemistry, 103(2), pp.521–529. 1041

Sunda, W.G. & Huntsman, S.A., 1995. Iron uptake and growth limitation in oceanic 1042

and coastal phytoplankton. Marine Chemistry, 50, pp.189–206. 1043

Tachibana, M. et al., 2011. Localization of putative carbonic anhydrases in two marine 1044

diatoms, Phaeodactylum tricornutum and Thalassiosira pseudonana. 1045

Photosynthesis Research, 109(1–3), pp.205–221. 1046

Tans, P., 2009. An Accounting of the Observed Increase in Oceanic and Atmospheric 1047

CO2 and the Outlook for the Future. Oceanography, 22(4), pp.26–35. 1048

Thamatrakoln, K. & Hildebrand, M., 2007. Analysis of Thalassiosira pseudonana 1049

silicon transporters indicates distinct regulatory levels and transport activity 1050

through the cell cycle. Eukaryotic Cell, 6(2), pp.271–279. 1051

Thamatrakoln, K. & Hildebrand, M., 2008. Silicon Uptake in Diatoms Revisited: A 1052

Model for Saturable and Nonsaturable Uptake Kinetics and the Role of Silicon 1053

Transporters. Plant Physiology, 146(3), pp.1397–1407. 1054

Treguer, P. et al., 1995. The Silica Balance in the World Ocean - a Reestimate. 1055

Science, 268(5209), pp.375–379. 1056

Tripp, B.C. et al., 2004. A Role for Iron in an Ancient Carbonic Anhydrase. Journal of 1057

Biological Chemistry, 279(8), pp.6683–6687. 1058

Vaulot, D. et al., 1987. Cell-cycle response to nutrient starvation in two phytoplankton 1059

species, Thalassiosira weissflogii and Hymemonas carterae. Marine Biology, 95, 1060

pp.625–630. 1061

Wenzl, S. et al., 2008. Silacidins: Highly acidic phosphopeptides from diatom shells 1062

24

assist in silica precipitation in vitro. Angewandte Chemie - International Edition, 1063

47(9), pp.1729–1732. 1064

Yool, A. & Tyrrell, T., 2003. The role of diatoms in regulating the ocean’s silicate 1065

cycle. Global Biogeochemical Cycles, 17(4), pp.1–24. 1066

Young, J.N. et al., 2016. Large variation in the Rubisco kinetics of diatoms reveals 1067

diversity among their carbon-concentrating mechanisms. Journal of Experimental 1068

Botany, 67(11), pp.3445–3456. 1069

Zeebe, R.E. & Wolf-Gladrow, D., 2001. CO2 in Seawater: Equilibrium, Kinetics, 1070

Isotopes 1st ed. D. Halpern, ed., Amsterdam: Elsevier Oceanography Series. 1071

Zimmermann, J., Jahn, R. & Gemeinholzer, B., 2011. Barcoding diatoms: Evaluation 1072

of the V4 subregion on the 18S rRNA gene, including new primers and protocols. 1073

Organisms Diversity and Evolution, 11(3), pp.173–192. 1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

25

Manuscript 1 1086

Formatted for publication in Journal of Phycology 1087

Transcriptome sequencing reveals contrasting silicon limitation responses in two 1088

Thalassiosira diatom species 1089

Joselynn R. Wallace1, Tani Leigh1, P. Dreux Chappell2, Bethany D. Jenkins1,3 1090

1 Department of Cell and Molecular Biology, University of Rhode Island, Kingston, 1091 RI, USA 1092 1093 2 Ocean, Earth and Atmospheric Sciences, Old Dominion University, Norfolk, VA, 1094 USA 1095 1096 3 Graduate School of Oceanography, University of Rhode Island, Narragansett, RI, 1097 USA 1098 1099

26

ABSTRACT 1100

Diatoms perform 40% of primary production in the global ocean, which is 1101 constrained by the availability of growth-requiring nutrients. This includes silicon, 1102 which diatoms use to build cell walls or frustules. Silicon limitation of diatom growth 1103 has been observed in the North Atlantic Ocean and some high nutrient low chlorophyll 1104

(HNLC) regions, but these studies assessed the silicon status of the bulk diatom 1105 community and it is not known whether silicon limitation is uniform across particular 1106 diatom species. Genetic markers have been used to probe iron status of diatom species 1107 in the field, and similar markers of silicon status could be used to probe the silicon 1108 status of different co-existing diatom species. However, silicon limited transcriptome 1109 studies have been restricted mainly to the model diatom Thalassiosira pseudonana 1110 and have not included diatoms found in the North Atlantic Ocean or HNLC regions, 1111 hampering the ability to develop appropriate molecular markers. This study analyzed 1112 the silicon limited transcriptomes of two closely related Thalassiosira diatoms found 1113 in HNLC regions and the North Atlantic Ocean, toward the goal of developing 1114 appropriate molecular markers of silicon status. We found a gene family encoding 1115 putative ATP-grasp domain proteins that were upregulated in silicon-limited (but not 1116 iron or nitrate-limited) T. oceanica and T. weissflogii. Members of this gene family 1117 were also upregulated in previously published T. pseudonana transcriptomes and 1118 homologs were found across a diversity of diatoms, suggesting that they might be 1119 useful as a silicon limitation marker in many diatoms, in addition to Thalassioisra spp. 1120 diatoms. 1121

1122

27

INTRODUCTION 1123

Diatoms are photosynthetic single-celled eukaryotes that account for 40% of 1124 oceanic primary production and approximately 20% of global primary production 1125

(Nelson et al. 1995; Field et al. 1998). The rate and extent of diatoms’ primary 1126 production is a function of their growth, which is constrained by the availability of 1127 nutrients in the ocean. Many phytoplankton share growth requirements for nutrients 1128 such as nitrate, phosphate, and iron, but diatoms are unique among the phytoplankton 1129 as they have an absolute growth requirement for silicon, which is used to build their 1130 ornately silicified cell walls or frustules (Lewin 1957; Lewin 1962; Jørgensen 1955). 1131

It has been suggested that silicon ultimately limits diatom growth throughout 1132 the ocean (Dugdale et al. 1995; Yool & Tyrrell 2003) by several different 1133 mechanisms. Firstly, there is a discrepancy in the available pools of recycled nitrogen 1134 and recycled silicon in the surface ocean. Silicon recycling occurs deeper in the water 1135 column than that of other nutrients (C, N, P), leading to a propensity for silicon 1136 depletion in the surface ocean (Dugdale et al. 1995; Treguer et al. 1995; Dugdale & 1137

Wilkerson 1998). Secondly, diatoms growing in upwelled water can become silicon 1138 limited if these waters are enriched in nitrate relative to silicon. This has been 1139 observed in the North Atlantic Ocean (Allen et al. 2005; Johnson et al. 2007). Thirdly, 1140 silicon may also set the upper limit on diatom primary production in iron limited high 1141 nutrient low chlorophyll (HNLC) regions (Dugdale & Wilkerson 1998). Diatoms 1142 growing in low iron conditions (< 5 nM), take up as much as 8 times more silicon than 1143 nitrate (Franck et al. 2000). Thus, diatoms in HNLC regions may be driven into silicon 1144 limitation due to their increased silicon requirement (Rueter & Morel 1981; Hutchins 1145

28

& Bruland 1998; De La Rocha et al. 2000). It has also been suggested that silicon 1146 processing may require iron as a cofactor, which would lead to a biochemically- 1147 dependent iron and silicon co-limitation (Mock et al. 2008; Saito et al. 2008). Silicon 1148 and trace metal co-limitation of diatoms has been documented in the HNLC waters of 1149 the Equatorial Pacific Ocean (Brzezinski et al. 2011), the Southern Ocean and sub- 1150

Antarctic (Boyd et al. 1999; Leblanc et al. 2005; Franck et al. 2000), and the sub- 1151

Arctic Pacific (Koike et al. 2001). As these studies assessed the silicon status of the 1152 bulk diatom community, it is currently not known if different diatom species 1153 experience varying degrees of silicon stress. 1154

Results from lab experiments show that silicon metabolism and physiology 1155 vary widely across diatom species. There is a wide range in diatoms’ silicon quotas, 1156 which vary from less than 0.1 to greater than 200 pmol silicon per cell (Martin- 1157

Jézéquel et al. 2000). Diatoms can also vary greatly in their maximal silicon uptake 1158 rate, even amongst diatoms in the same genus. For example, the maximal uptake rate 1159 of Thalassiosira weissflogii (CCMP1336) is approximately 10 fold higher than for 1160

Thalassiosira pseudonana (CCMP1335) (Thamatrakoln & Hildebrand 2008). Closely 1161 related diatoms also vary in their silicon storage capacities, exemplified in the 4-fold 1162 difference in the ratio of silicon cell wall:silicon storage pool in two strains of 1163

Navicula pelliculosa (Thamatrakoln & Hildebrand 2008). These differences in silicon 1164 metabolism and physiology suggest that diatoms may have different strategies for 1165 coping with silicon limitation in the wild and highlight the need for molecular markers 1166 to probe the silicon status of whole diatom communities and also for more sensitive, 1167 species-specific markers. 1168

29

Biochemical studies in diatoms have revealed several classes of proteins that 1169 are involved in silicon processing. The silicon transporters (SITs) are used for uptake 1170 when silicic acid levels are low (Thamatrakoln & Hildebrand 2008). Other molecules 1171 involved in silicon processing include the silaffins (Poulsen & Kröger 2004; Kröger et 1172 al. 1999), cingulins (Scheffel et al. 2011), silacidins (Wenzl et al. 2008), long-chain 1173 polyamines (Kröger et al. 2000), frustlins (Kröger et al. 1994; Kröger et al. 1996), and 1174 pleuralins (Kröger et al. 1997; Kröger & Wetherbee 2000). The silaffins, cingulins, 1175 and silacidins all contain an N-terminal ER signal peptide to facilitate their import into 1176 the ER and are subsequently localized to the silicon deposition vesicle, where they 1177 interact with long-chain polyamines to precipitate silicon for frustule formation 1178

(Kröger & Poulsen 2008; Scheffel et al. 2011; Wenzl et al. 2008; Poulsen et al. 2013; 1179

Drum & Pankratz 1964). While the pleuralin and frustulin molecules are tightly 1180 associated with the silicified cell wall (Kröger et al. 1997; Kröger et al. 1994), they are 1181 not thought to be directly involved in silicon deposition in the silicon deposition 1182 vesicle (Kröger & Wetherbee 2000; van de Poll et al. 1999). 1183

Genetic studies have also given us a better understanding of the molecular 1184 underpinnings of diatom silicon physiology. The majority of these studies have been 1185 conducted in the model diatom T. pseudonana. Silicon limited T. pseudonana 1186 upregulate the expression of the genes and proteins for the silicon transporters SIT1 1187 and SIT2 (Mock et al. 2008; Shrestha et al. 2012; Du et al. 2014; Smith et al. 2016; 1188

Brembu et al. 2017). Other studies show that they are also tightly regulated with the 1189 cell cycle (Thamatrakoln & Hildebrand 2007), so it may not be possible to distinguish 1190 aspects of cell cycle progression from changes in silicon status. In addition to SIT1 1191

30

and SIT2, T. pseudonana also has a third silicon transporter, SIT3, that is expressed at 1192 low levels and insensitive to changes in silicon and the cell cycle (Thamatrakoln & 1193

Hildebrand 2007; Shrestha et al. 2012); leading to the hypothesis that it may function 1194 as a silicon sensor (Thamatrakoln & Hildebrand 2007). Silicon-limited T. pseudonana 1195 also increase their concentration of silacidin proteins relative to silicon replete 1196 condition (Richthammer et al. 2011). Silaffin genes are differentially regulated both in 1197 response to silicon supply and deficit, leading to the hypothesis that distinct silaffin 1198 gene copies are optimized for silicon deposition in either low or high silicon 1199 concentrations (Shrestha et al. 2012; Brembu et al. 2017). Cingulins are 1200 downregulated during silicon limitation and upregulated after silicon resupply, but 1201 they are also upregulated during the S phase of the cell cycle (when silicon deposition 1202 does not occur) suggesting that cingulins have roles in the cell aside from silicon 1203 deposition (Shrestha et al. 2012; Brembu et al. 2017). In T. pseudonana, silicidin 1204 genes are not differentially expressed. Genes for the biosynthesis of long-chain 1205 polyamine (LCPA) precursors (spermidine synthases) are both up and downregulated 1206 in silicon limitation, suggesting that distinct spermidine synthases may be optimized 1207 for LCPA synthesis in either low or high silicon concentrations or that LCPAs also 1208 play a larger role in diatom cells aside from silicon deposition (Bouchereau et al. 1209

1999; Shrestha et al. 2012; Mock et al. 2008). 1210

Other general trends regarding diatom silicon metabolism emerge from genetic 1211 studies in T. pseudonana and P. tricornutum. Firstly, these studies note that many of 1212 the silicon-sensitive transcripts are unique to diatoms or the particular diatom species 1213 tested (Mock et al. 2008; Sapriel et al. 2009; Shrestha et al. 2012). Genes involved in 1214

31

intracellular signaling or post-transcriptional regulation (e.g. kinases) are both up 1215

(Smith et al. 2016; Brembu et al. 2017; Mock et al. 2008; Shrestha et al. 2012) and 1216 downregulated (Mock et al. 2008; Sapriel et al. 2009; Shrestha et al. 2012) in silicon 1217 limitation. 1218

To date, the genetic response of diatoms to silicon limitation has not been 1219 followed in diatoms isolated from regions of the ocean where silicon sets an upper 1220 limit on diatom production, such as iron-limited, high-nutrient, low chlorophyll 1221

(HNLC) regions, (Dugdale & Wilkerson 1998; Chappell et al. 2016). A potential 1222 target for ecological markers of silicon status is the suite of genes encoding proteins 1223 involved in silicon processing described above. Homologs of some of these silicon 1224 processing proteins are bioinformatically difficult to predict in new species. For 1225 example, the silaffins (Poulsen & Kröger 2004) and silacidins (Pamirsky & 1226

Golokhvast 2013; Wenzl et al. 2008; Richthammer et al. 2011) don’t show global 1227 sequence homology across taxa and are characterized by their overall amino acid 1228 content. Similarly, the pleuralins and frustulins are characterized by the presence of 1229 short repeat domains rather than their global sequence homology (Kröger et al. 1997; 1230

Kröger & Wetherbee 2000; Armbrust et al. 2004; Kröger & Poulsen 2008; Kröger et 1231 al. 1994; Kröger et al. 1996). The cingulins have only been identified in T. 1232 pseudonana and do not have homologs in P. tricornutum or F. cylindrus, suggesting 1233 that there is no cingulin sequence homology across species (or that cingulins are 1234 unique to T. pseudonana) (Scheffel et al. 2011). This makes finding and confirming 1235 orthologs using global sequence homology-based approaches a challenge. Amongst 1236 silicon metabolism proteins, the only ones that can be easily in novel genomes and 1237

32

transcriptomes are the silicon transporters (SITs) and genes involved in the 1238 biosynthesis of the long-chain polyamines (LCPAs). SITs are distributed across all the 1239 major diatom lineages and can be identified by homology searches of protein 1240 similarity (Hildebrand et al. 1997; Hildebrand et al. 1998; Durkin et al. 2016). 1241

Although their lengths and overall structures vary across species, the biosynthesis of 1242

LCPAs involves putrescine, spermine, and spermidine precursors, and the genes for 1243 the synthesis of these precursors can be identified by homology searches of protein 1244 similarity (Kröger et al. 2000; Bouchereau et al. 1999). 1245

As global gene-based studies of silicon metabolism across diatom species have 1246 been restricted to a few species, the database of silicon-responsive genes across 1247 species is limited, particularly from diatoms found in silicon limited environments. In 1248 order to better understand the responses of diatoms found in ecosystems where their 1249 growth could be constrained by silica availability and to develop genetic markers that 1250 could be used in the environment to profile silicon sufficiency, we selected two related 1251 diatom species in the Thalassiosira genus that may be found in areas of the ocean 1252 where silicon may limit diatom growth and primary production. Thalassiosira 1253 oceanica CCMP1005 has a high tolerance for iron limitation and may persist in 1254 chronically iron limited HNLC regions where silicon requirements are high (Sunda & 1255

Huntsman 1995; Franck et al. 2000; Chappell et al. 2015). Thalassiosira weissflogii 1256

CCMP1010 was isolated from the North Atlantic Ocean, an ocean basin where silicon- 1257 depleted upwelling has been observed (Allen et al. 2005; Johnson et al. 2007). We 1258 selected this genus as it allows us to make comparisons to studies in T. pseudonana 1259 which has a completely sequenced genome (Armbrust et al. 2004). 1260

33

Comparative quantitative transcriptomes of Thalassiosira oceanica 1261

CCMP1005 and Thalassiosira weissflogii CCMP1010 were analyzed for candidate 1262 gene markers of silicon limitation by profiling their global gene expression under 1263 silicon-limiting growth in comparison to profiles in other limiting nutrients, nitrate and 1264 iron. Annotations based on the KEGG database (Moriya et al. 2007) were used to 1265 determine whether or not broad metabolic pathways are silicon-sensitive in both T. 1266 oceanica and T. weissflogii. Silix homology-based networks (Miele et al. 2011) 1267 including other Thalassiosiroid transcriptomes were used to search for individual, 1268 homologous genes with silicon-specific regulation in T. oceanica and T. weissflogii. 1269

Homology-independent methods (FIMO, BioStrings, SignalP) were used to identify 1270 putative silaffins, cingulins, silacidins, and frustulins from Thalassiosira oceanica 1271

CCMP1005 and Thalassiosira weissflogii CCMP1010 and their potential utility as 1272 molecular markers of silicon stress is discussed (Grant et al. 2011; Pagès et al. 2016; 1273

Petersen et al. 2011). 1274

MATERIALS AND METHODS 1275

Culturing, RNA extraction, and Sequencing 1276

Cultures of Thalassiosira oceanica CCMP1005 and Thalassiosira weissflogii 1277

CCMP1010 were obtained from the National Center for Marine Algae and Microbiota 1278

(NCMA; West Boothbay Harbor, ME, USA). Culturing and sequencing was 1279 completed in a range of conditions in order to get a thorough representation of each 1280 diatoms’ transcript pool for the de novo assemblies. For nutrient replete, silicon 1281 limited, iron limited, and nitrate limited culturing, T. weissflogii and T. oceanica were 1282 grown in triplicate in a modified F/2 media (Guillard & Ryther 1962; Guillard 1975) 1283

34

containing 400 nM iron or 4 nM iron for the iron limited condition. A final 1284 concentration of 10 µM silicon was used for the silicon limited condition and no 1285 nitrate was added to the media for the nitrate limited condition. Macronutrients and 1286 vitamin solutions were treated with prepared Chelex-100 resin (Bio-Rad Laboratories 1287

Inc., Hercules, CA, USA) and syringe-filtration sterilized before being added to 1288 microwave sterilized Sargasso Sea water (Price et al. 1989). For inclusion in the de 1289 novo assembly, additional sequencing was performed using RNA from diatoms grown 1290 in F/2 medium at four CO2 levels ranging from ~100 ppm to ~4000 ppm using 1291 autoclaved Woods Hole sea water amended with nutrients, vitamins, and trace metal 1292 mix that had not been Chelex-treated (Guillard & Ryther 1962; Guillard 1975; King et 1293 al. 2015). For the replete and silicon limited diatom culturing, biomass was collected 1294 after 3 days of growth, when the cells in the silicon limited condition began to grow 1295 more slowly relative to the silicon replete condition. Experiments were also conducted 1296 using the diatom Thalassiosira rotula (CCMP3096) to obtain a draft transcriptome for 1297 inclusion in Silix homology networks and comparison to T. oceanica and T. 1298 weissflogii (Miele et al. 2011). Duplicate cultures of T. rotula were grown in a range 1299 of three CO2 conditions as well as iron-limited and nutrient replete conditions. Cells 1300 were collected on 3 µm pore size polyester filters (Sterlitech Corporation, Kent, WA, 1301

USA) by gentle filtration, flash frozen in liquid nitrogen, and stored at -80 °C until 1302

RNA extraction. 1303

RNA was extracted from filters following a modified version of the RNeasy 1304

Mini Kit protocol (Qiagen,Venlo, Netherlands). The protocol was modified by 1305 vortexing filters in lysis buffer with 0.5 mm zircon beads (BioSpec, Bartlesville, OK, 1306

35

USA) for 15 minutes to lyse the cells. Cell debris was immediately removed by 1307 running lysate over a QiaShredder column (Qiagen). DNA contamination was 1308 removed by an on-column DNAse step using the RNase-free DNase Set (Qiagen) and 1309 an additional DNAse step was performed on eluted RNA using the Turbo DNA-free 1310 kit (Life Technologies, Carlsbad, CA, USA), both performed as per the manufacturers’ 1311 instructions. Samples were quantified using the Qubit High-Sensitivity Kit (Life 1312

Technologies). 1313

All culturing was performed in triplicate, except for the T. rotula CO2 1314 experiments, which were performed in duplicate. Equal amounts of total RNA from 1315 each like-condition replicate was pooled and sequenced. Nutrient replete, silicon 1316 limited, iron limited, and nitrate limited RNA from T. weissflogii and T. oceanica was 1317 sent to the National Center for Genome Resources as part of the Gordon and Betty 1318

Moore Marine Eukaryotic Transcriptome Sequencing Project (MMETSP) for 1319 sequencing on an Illumina HiSeq 2000 (30 million 50-nucleotide paired-end reads) 1320

(Keeling et al. 2014). RNA from T. oceanica, T. weissflogii, and T. rotula grown in 1321 the various CO2 conditions were sequenced at the JP Sulzberger Genome Sequencing 1322

Center at Columbia University on an Illumina HiSeq 2500 (30 million 100-nucleotide 1323 paired-end reads). The T. rotula iron limited and nutrient replete sequencing was done 1324 at the Joint Genome Institute on an Illumina HiSeq 2500 (50 million 100-nucleotide 1325 paired-end reads). All libraries were mRNA enriched using a poly-A pull-down 1326 method. 1327

1328

1329

36

Trimming, Assembly, and Read Mapping 1330

Initial read quality assessments were performed in FastQC (Andrews 2010). 1331

Raw reads were subsequently imported into CLC Workbench (CLC Bio-Qiagen, 1332

Aarhus, Denmark) with an insert size ranging from 1-400 nucleotides. Reads were 1333 trimmed in CLC Workbench with the following parameters: removal of 13 bases off 1334 the 5’ end, .001 quality cut off, and no ambiguous nucleotide base calls. 1335

Assemblies of each library were conducted in CLC Workbench, using a set of 1336 k-mers ranging from 1/2 read length to full length read after trimming. The assemblies 1337 were completed with no scaffolding, automatic bubble size selection, minimum contig 1338 length of 200, and with the auto detect paired distance options selected. To remove 1339 redundant contigs that were assembled more than once (e.g., present in multiple 1340 libraries or assemblies), the resulting assemblies were pooled using CD-HIT-EST 1341 using the following flags: c .99 -G .99 -n 10 -d 1000 -aS .75 -g 1 -p 1 -T 0 (Li & 1342

Godzik 2006; Fu et al. 2012). 1343

ORF prediction was performed on the nucleotide sequences as follows: 1344 nucleotide sequences were run in BlastX against a custom database containing the 1345 predicted proteins (filtered set of best models per locus) from available diatom 1346 genomes from JGI (Thalassiosira pseudonana Thaps3 and Thaps3_bd, Fragiliariopsis 1347 cylindrus v1.0, Phaeodactylum tricornutum Phatr2 and Phatr2_bd, and Pseudo- 1348 multiseries v1) and transcriptomes available through MMETSP as of March 1349

2013. The assemblies and BlastX output were uploaded to ORFpredictor to determine 1350 coding (CDS) and protein sequences (Min et al. 2005). 1351

37

Additional clustering of protein sequences was performed to remove redundant 1352 amino acid sequences that were assembled multiple times but did not cluster at the 1353 nucleotide sequence level. Proteins were clustered in CD-HIT using the following 1354 flags: -c .75 -G 0 -n 5 -d 1000 -aS .75 -g 1 -p 1 -T 0. CDS sequences corresponding to 1355 the clustered protein sequences that were greater than 150 amino acids were used for 1356 subsequent read mapping. Trimming and assembly for T. rotula were completed as 1357 described for T. oceanica and T. weissflogii, although the iron limited and nutrient 1358 replete reads were included as guide-reads only, as there was contaminating ribosomal 1359

RNA sequence in these libraries. Sequences with less than 1 RPKM (T. oceanica) or 2 1360

RPKM (T. weissflogii and T. rotula) across all conditions (i.e., consistently very lowly 1361 expressed) were removed from the assembly as potential contaminants (e.g. organelle 1362 transcripts). 1363

Mapping for downstream expression analysis was done using the RNAseq 1364 module in CLC Workbench with the following parameters: mismatch cost (2), 1365 insertion cost (3), deletion cost (3), length fraction (0.8), similarity fraction (0.8), 1366 paired distances (auto-detect), strand specific (both), maximum number of hits for a 1367 read (10). 1368

Annotation and Gene Ontologies 1369

The translated protein sequences from each diatom transcriptome assembly 1370 were uploaded to the KEGG automatic annotation server (Moriya et al. 2007). The 1371

BLAST search program was used to query the “Genes” representative data set, plus 1372 the two model diatoms available in the KEGG database (T. pseudonana and P. 1373

38

tricornutum). An additional annotation step was performed in InterProScan (version 1374

5.7-48.0 for T. oceanica and 5.13-52.0 for T. weissflogii) (Jones et al. 2014). 1375

Ortholog Finding with Silix 1376

Silix networks were used to infer orthologous transcripts across diatom species 1377

(Miele et al. 2011). This is a blast-based homology network construction approach that 1378 groups similar sequences into gene families. Two sequences are included in the same 1379 family if the high-scoring segment pair length covers at least 80% of the query protein 1380 length and if their similarity is over 35% identity. The input for this analysis was the 1381

T. rotula draft transcriptome assembly, the T. pseudonana filtered protein models from 1382

JGI (Thaps3 and Thaps3_bd), and the assembled T. oceanica and T. weissflogii 1383 transcriptomes. 1384

Differential Expression Analysis and Hierarchical Clustering 1385

Differential gene expression was inferred with Analysis of Sequence Counts 1386

(ASC), using the total number of reads mapped to each transcript. Transcripts with at 1387 least a 95% posterior probability of a 2-fold change were considered differentially 1388 expressed (Wu et al. 2010). 1389

To identify genes that respond specifically to silicon status rather than as part 1390 of a general stress response, hierarchical clustering was performed on the genes 1391 differentially expressed in silicon limitation. The gene expression values (as RPKM) 1392 in replete, silicon limited, nitrate limited, and iron limited conditions were the input to 1393 this analysis. Expression values were normalized by adding an offset value of 1 to all 1394

RPKM values and the log2 fold change for silicon limited, nitrate limited, and iron 1395 limited conditions was calculated relative to the nutrient replete condition. 1396

39

The complete Euclidian distance was calculated using the dist function and 1397 hierarchical clustering was performed using hclust, both part of the stats library 1398 included in R (R Development Core Team 2013). The resulting dendograms were cut 1399 at a height of 6 (T. oceanica upregulated) or 4 (T. oceanica downregulated, T. 1400 weissflogii up and downregulated). The heights used were selected based on the ability 1401 to distinguish clusters of genes regulated by silicon from those that may be part of a 1402 general stress response (i.e., clusters of genes upregulated or downregulated in all 1403 nutrient limitation scenarios examined). 1404

Genes were grouped according to their cluster assignment and the per-cluster 1405 median and interquartile ranges of expression values were tabulated and plotted. 1406

Clusters of genes were considered part of a general stress response if the per-cluster 1407 interquartile range of expression values was above (or below, in the case of genes 1408 downregulated in silicon limitation) the log2 fold change 0 threshold in iron or nitrate 1409 limitation (i.e., genes upregulated in silicon limitation that are also upregulated in iron 1410 and/or nitrate limitation were considered part of a general stress response; e.g., T. 1411 oceanica up in silicon limitation cluster #7) (Supplemental Figure 1, Supplemental 1412

Table 3). Gene clusters with non-overlapping interquartile ranges were also 1413 considered part of a silicon-specific expression response (i.e., clusters of genes that are 1414 responsive to iron and/or nitrate are still considered part of a silicon-specific response 1415 as long as their fold-change values in silicon limitation are considerably larger than the 1416 fold-change in iron and/or nitrate limitation; e.g., T. weissflogii up in silicon limitation 1417 cluster #3) (Supplemental Figure 1, Supplemental Table 3). Clusters are also 1418 considered part of a silicon-specific expression response even if their fold change is 1419

40

large in iron or nitrate limitation, as long as the direction of regulation in these 1420 conditions is in the opposite direction than what is seen in silicon limitation (e.g., T. 1421 oceanica up in silicon limitation cluster #6) (Supplemental Figure 1, Supplemental 1422

Table 3). 1423

Silicon Transporter Phylogeny 1424

Representative diatom silicon transporter (SIT) protein sequences 1425 corresponding to the 5 major phylogenetic SIT clades (A, B, C, D, and E) (Durkin et 1426 al. 2016) from F. cylindrus (Durkin et al. 2016) and the 3 known SITs from T. 1427 pseudonana (SIT1, SIT2, SIT3) (Thamatrakoln et al. 2006) were used as a BlastP 1428 query to identify putative silicon transporters from the T. oceanica and T. weissflogii 1429 transcriptomes and from the previously published T. oceanica genome (Lommer et al. 1430

2012). Genes with at least 35% identity and 80% query coverage compared to the 1431 known SITs were considered putative silicon transporters. The alignment and tree also 1432 included partial SIT sequences from T. weissflogii CCMP1010 (Alverson 2007). 1433

SIT amino acid sequences were aligned in MAFFT (version 7) using the L- 1434

INS-i strategy with the BLOSUM62 scoring matrix and the “leave gappy regions” 1435 option selected (Katoh & Standley 2013). Alignments were trimmed in Jalview and 1436 trees were constructed in RAxML from a 325 amino acid fragment of the alignment. A 1437 maximum likelihood tree was computed with the following parameters: 1438

PROTGAMMA matrix, BLOSUM62 substitution model, and 1000 bootstrap 1439 replicates. The tree was mid-point rooted in FigTree. 1440

1441

1442

41

Silaffins, cingulins, silacidins, frustulins, pleuralins, and polyamines 1443

Homologs of other silicon processing proteins are bioinformatically difficult to 1444 predict in new species. This is because the silaffins (Poulsen & Kröger 2004), 1445 silacidins (Pamirsky & Golokhvast 2013), and cingulins (Scheffel et al. 2011) do not 1446 show global sequence homology across taxa, while the pleuralins and frustulins are 1447 characterized by the presence of short repeat domains rather than their global sequence 1448 homology (Kröger et al. 1997; Kröger & Wetherbee 2000; Armbrust et al. 2004; 1449

Kröger & Poulsen 2008; Kröger et al. 1994; Kröger et al. 1996). Common domain or 1450 amino acid content features were used to search for potential homologs of silicon 1451 processing proteins in the translated transcriptomes of T. oceanica and T. weissflogii. 1452

To identify silaffins and cingulins in T. oceanica and T. weissflogii, a FIMO 1453 search (Grant et al. 2011) was run for the pentalysine cluster KXXKXXKXX-KXXK 1454

(Poulsen et al. 2013) and proteins that did not contain an N-terminal ER signal peptide 1455 in SignalP were filtered out (Petersen et al. 2011; Poulsen et al. 2013). Using the 1456 predicted proteins from the T. pseudonana genome, this approach was sufficient to 1457 recover all of the sequences for silaffins and cingulins present in T. pseudonana 1458

(Poulsen & Kröger 2004; Scheffel et al. 2011). 1459

The T. pseudonana silacidin sequence was used as a benchmark to search for 1460 silacidins in T. oceanica and T. weissflogii. This protein is acidic and is ~14% aspartic 1461 acid and 17% glutamic acid residues (Wenzl et al. 2008; Richthammer et al. 2011). 1462

Biostrings in R was used to find proteins in T. oceanica and T. weissflogii that pass 1463 this threshold of acidic amino acids per protein and the presence or absence of an N- 1464

42

terminal ER signal peptide was determined by SignalP (Pagès et al. 2016; Petersen et 1465 al. 2011). 1466

To identify putative frustulins in T. oceanica and T. weissflogii, a FIMO search 1467

(Grant et al. 2011) for the hexapeptide motifs CEGDCD and VPGCSG that are 1468 diagnostic of the ACR (acidic cysteine rich domains) of frustulins was performed 1469

(Kröger et al. 1994; Kröger et al. 1996). Transcripts without an N terminal ER signal 1470 peptide were filtered out using SignalP (Petersen et al. 2011; Kröger & Poulsen 2008). 1471

Pleuralins have been identified and characterized in Cylindrotheca but are not 1472 present in T. pseudonana and are characterized by PSCD (proline, serine, cysteine, 1473 and aspartic acid-rich) repeat domains (Kröger et al. 1997; Kröger & Wetherbee 2000; 1474

Armbrust et al. 2004). The PSCD domain repeats of the pleuralins from C. fusiformis 1475 are 87 or 89 amino acids in length and are 22% proline, 11% serine, 11% cysteine, and 1476

9% aspartic acid (Kröger et al. 1997). These percentages were used as a benchmark to 1477 scan our transcriptomes for pleuralins using BioStrings with a sliding window length 1478 of 87 amino acids (Pagès et al. 2016). 1479

KEGG annotations were searched for genes involved in the synthesis of 1480 polyamines, including those involved in polyamine metabolism as well as arginine, 1481 ornithine, and glutamate metabolism that funnel into polyamine metabolism. 1482

RESULTS AND DISCUSSION 1483

Transcriptome Assembly and Annotation 1484

In total, 36382 transcripts were assembled from T. oceanica and 24900 1485 transcripts from T. weissflogii (Table 1). When compared to the previously sequenced 1486

T. oceanica, the number of T. oceanica transcripts assembled is less than the total 1487

43

number of predicted genes (37921) but more than the number of genes (29306) in the 1488 non-redundant predicted set of genes (Lommer et al. 2012). InterProScan and KEGG 1489

KAAS assigned initial functional annotations or protein domain information to 12450 1490

T. oceanica and 9580 T. weissflogii transcripts (Table 1, Supplemental Table 1). This 1491 corresponds to 34% and 38% of transcripts in the respective transcriptomes of T. 1492 oceanica and T. weissflogii. This is consistent with numbers from other studies that 1493 were able to assign functional annotations to 48% (17920 transcripts) or 40% (10591) 1494 of assembled de novo transcriptomes from microbial eukaryotes Aureococcus 1495 anophagefferens and Euglena gracilis, respectively (Frischkorn et al. 2014; Yoshida et 1496 al. 2016). 1497

Differential Expression Analysis and Hierarchical Clustering 1498

Differential gene expression between the silicon limited and silicon replete 1499 condition was inferred with Analysis of Sequence Counts (Wu et al. 2010). Genes 1500 with a 95% posterior probability of at least a 2-fold change in silicon limitation 1501 relative to the nutrient replete condition were considered differentially expressed (Wu 1502 et al. 2010). 1503

To distinguish a silicon-specific stress response from a more general 1504 nutrient stress response, the subset of transcripts differentially expressed in silicon 1505 limitation were hierarchically clustered. The clustering input was the per-transcript 1506 log2-fold change expression values in nitrate limited, iron limited, and silicon limited 1507 conditions relative to the nutrient replete condition. Differentially expressed genes in 1508

T. oceanica were binned into 18 clusters – 7 upregulated (containing 947 genes) and 1509

11 downregulated (containing 809 genes) (Supplemental Figure 1, Supplemental 1510

44

Table 3). Of these T. oceanica clusters, 4 (containing 823 genes) were part of a 1511 silicon-specific upregulation and 4 (containing 539 genes) were part of a silicon- 1512 specific downregulation (Figure 1). The differentially expressed genes in T. weissflogii 1513 were binned into 47 clusters – 23 upregulated (containing 2814 genes) and 24 1514 downregulated (containing 1467 genes) (Supplemental Figure 1, Supplemental Table 1515

3). Of these T. weissflogii clusters, 6 (containing 557 genes) were part of a silicon- 1516 specific upregulation and 3 (containing 801 genes) were part of a silicon-specific 1517 downregulation (Figure 1). 1518

Previous studies have found a significant overlap in the subset of genes 1519 upregulated in iron and silicon limited T. pseudonana transcriptomes (Mock et al. 1520

2008). There are 3 clusters of genes that are upregulated in iron and silicon limited T. 1521 weissflogii (Supplemental Figure 1), but there are not any clusters in T. oceanica that 1522 are upregulated in both iron and silicon limited conditions (Supplemental Figure 1). 1523

This discrepancy might be explained by differences in iron physiology. T. oceanica 1524 can tolerate lower iron concentrations than T. pseudonana (Sunda & Huntsman 1995), 1525 but T. weissflogii and T. pseudonana show similar thresholds for their iron 1526 requirements (Whitney et al. 2011). Perhaps the overlap in iron and silicon sensitive 1527 genes is a feature of diatoms with higher iron requirements rather than diatoms that are 1528 tolerant of low iron. 1529

1530

1531

1532

45

Identification of genes sensitive to silicon status: silicon transporters and putative 1533 silicon efflux proteins 1534

Sequence similarity to known SITs from Thalassiosira pseudonana and 1535

Fragilariopsis cylindrus was used identify SIT sequences from T. oceanica, T. 1536 weissflogii, and T. rotula. All of the SITs from these four Thalassiosira diatoms (T. 1537 pseudonana, T. oceanica, T. weissflogii, and T. rotula) are in clade E (Lommer et al. 1538

2012; Durkin et al. 2016). There is no evidence of any clade B SITs, unlike other 1539

Thalassiosira diatoms (Durkin et al. 2016). The numbers of SITs present in each 1540 diatom varies (Thamatrakoln et al. 2006; Durkin et al. 2016). Our T. oceanica data 1541 confirms results from previous studies indicating that CCMP1005 has a single SIT 1542 gene copy (Lommer et al. 2012; Durkin et al. 2016) (Figure 2). The single SIT in T. 1543 oceanica shares a branch point with T. pseudonana SIT1 and SIT2 and it is also 1544 grouped together with the single full length silicon transporter recovered from the T. 1545 oceanica genome (support value of 100) (Figure 2) (Lommer et al. 2012). We 1546 recovered 4 SIT sequences from T. weissflogii CCMP1010 (Figure 2), a greater 1547 number than the 2 obtained in a previous study (Alverson 2007) that relied on 1548 amplification methods using degenerate primers to amplify SIT gene fragments . We 1549 also recovered only a single SIT from T. rotula CCMP3096. It is possible that this 1550 diatom has additional SIT copies not captured in this assembly, as there was no silicon 1551 limited condition included in the T. rotula transcriptome to potentially induce their 1552 expression (Figure 2). 1553

Sequence alignment and maximum likelihood tree construction indicates that 1554 three of the SITs in T. weissflogii appear to be paralogous sequences derived from a 1555

46

single ancestral SIT copy that forms a clade with T. pseudonana SIT1 and SIT2 1556

(Figure 2). Two of these sequences group with previously identified T. weissflogii SIT 1557 fragments (Alverson 2007), as illustrated by their grouping together on the tree with 1558 bootstrap support values of 100 (Figure 2). The third T. weissflogii SIT recovered from 1559 the transcriptome also clusters with one of the previously identified SIT fragments 1560

(Alverson 2007), although with a lower support value of 89 (Figure 2). The fourth T. 1561 weissflogii SIT forms a clade with T. pseudonana SIT3 (Figure 2). This is in contrast 1562 with previous studies indicating that T. weissflogii CCMP1010 didn’t have a SIT3-like 1563 gene copy (Alverson 2007). Perhaps the degenerate primers used in the Alverson 2007 1564 study could not distinguish between or amplify some of the SITs from T. weissflogii. 1565

One of the T. weissflogii SITs (T. weissflogii SIT3, K32_12335) is one of the 1566 most highly silicon responsive genes in the dataset and is ~15 fold upregulated (log2- 1567 fold change of ~4) in silicon limitation versus nutrient replete (Figure 2). The 1568 upregulation this T. weissflogii SIT is consistent with previous studies where some 1569

SITs are upregulated in T. pseudonana and P. tricornutum grown in low silicon 1570

(Shrestha et al. 2012; Sapriel et al. 2009). The low expression of the SIT3-like 1571 sequence from T. weissflogii is consistent with what was observed for the SIT3 from 1572

T. pseudonana, which is constitutively lowly expressed, leading to the suggestion that 1573 it functions as a silicon sensor (Thamatrakoln & Hildebrand 2007). 1574

Surprisingly, the single SIT in T. oceanica was not significantly differentially 1575 expressed, but only slightly upregulated in low silicon (Figure 2). Although a similar 1576 silicon-indifferent trend has been seen in T. pseudonana SIT3, the single T. oceanica 1577

SIT groups with the silicon sensitive SIT1 and SIT2 sequences (Figure 2) (Shrestha et 1578

47

al. 2012). The lack of a strong silicon limitation response in the T. oceanica SIT might 1579 be due to the fact that it lacks a SIT3-like gene copy that has been hypothesized to act 1580 as a silicon sensor in T. pseudonana (Thamatrakoln & Hildebrand 2007). It is also 1581 possible that the single T. oceanica SIT both senses silicon status as well as 1582 functioning as a transporter. 1583

In addition to SIT mediated silicon transport, it has been hypothesized that 1584

T. pseudonana also contains a silicon efflux transporter. This is because silicon- 1585 limited T. pseudonana upregulated a gene annotated as a citrate transporter that shares 1586 homology with a plant silicon efflux transporter (Shrestha et al. 2012). The Silix gene 1587 family (FAM001283) containing the putative silicon efflux protein from T. 1588 pseudonana also contains a single gene from each T. oceanica and T. weissflogii 1589

(Supplemental Table 4), but they are not differentially expressed in silicon limitation 1590

(Supplemental Table 2), suggesting that this gene is not sensitive to silicon status in T. 1591 oceanica or T. weissflogii. 1592

A subset of putative silaffins, cingulins, silacidins, frustulins, and pleuralins are 1593 differentially regulated in Thalassiosiroids 1594

Identification of proteins (silaffins, cingulins, silacidins, frustulins, and 1595 pleuralins) involved in silicon processing using homology-based approaches is 1596 challenging because many of these genes do not show global sequence homology 1597 across species and are characterized by the presence of conserved domains or amino 1598 acid content (Poulsen & Kröger 2004; Pamirsky & Golokhvast 2013; Kröger et al. 1599

1997; Kröger & Wetherbee 2000; Kröger & Poulsen 2008; Kröger et al. 1994; Kröger 1600 et al. 1996). We queried the T. oceanica and T. weissflogii transcriptomes for 1601

48

candidate genes involved with silicon processing using a variety of homology- 1602 independent methods. 1603

The search for silaffins and cingulins involved a FIMO search (Grant et al. 1604

2011) for the pentalysine cluster KXXKXXKXX-KXXK (Poulsen et al. 2013). 1605

Proteins lacking an N-terminal ER signal peptide in SignalP were filtered out 1606

(Petersen et al. 2011; Poulsen et al. 2013), as these proteins would not be localized to 1607 the ER and silicon deposition vesicle. This approach yielded 138 proteins in T. 1608 oceanica and 243 proteins in T. weissflogii (Supplemental Table 7). A similar search 1609 of the T. pseudonana predicted proteins from the genome sequence (filtered set of best 1610 models per locus) yielded 162 proteins, including the 6 known cingulins and 3 known 1611 silaffins (Poulsen & Kröger 2004; Scheffel et al. 2011). The majority of the other 1612 putative silaffin-like and cingulin-like genes we identified are not differentially 1613 expressed in response to silicon (Figure 3, Supplemental Table 7). In T. oceanica, 10 1614 of the putative silaffins and cingulins we identified are upregulated and 6 are 1615 downregulated in response to low silicon (Figure 3, Supplemental Table 7). In T. 1616 weissflogii, 2 are upregulated and 37 are downregulated in response to low silicon 1617

(Figure 3, Supplemental Table 7). This pattern of expression whereby some silaffins 1618 and cingulins are upregulated and some are downregulated in response to silicon status 1619 is consistent with previous studies that saw both the up and downregulation of 1620 different silaffin and cingulin homologs in silicon limited T. pseudonana (Mock et al. 1621

2008; Shrestha et al. 2012; Brembu et al. 2017). Our data is also consistent with the 1622 hypotheses that distinct silaffin gene copies are optimized for silicon deposition in 1623 either low or high silicon and that cingulins might play a wider role in diatoms other 1624

49

than silicon deposition (Shrestha et al. 2012; Brembu et al. 2017). Perhaps this 1625 dynamic regulation of silaffins and cingulins is part of the underlying molecular 1626 mechanism that allows diatoms to regulate the thickness of their frustules in 1627 accordance with silicon availability (Paasche 1973). 1628

Putative T. oceanica and T. weissflogii silicadins were identified using the T. 1629 pseudonana silacidin sequence as a benchmark as this acidic protein is comprised of 1630

~14% aspartic acid and 17% glutamic acid residues (Wenzl et al. 2008; Richthammer 1631 et al. 2011). Biostrings in R was used to find proteins in T. oceanica and T. weissflogii 1632 that pass this threshold of acidic amino acids per protein and the presence or absence 1633 of an N-terminal ER signal peptide was determined by SignalP (Pagès et al. 2016; 1634

Petersen et al. 2011). In T. oceanica and T. weissflogii, 17 and 28 proteins met the 1635 silacidin amino acid composition threshold cutoffs (Figure 3, Supplemental Table 8). 1636

None of the copies in T. oceanica were differentially expressed in silicon limitation, 1637 nor do they have an N terminal ER signal peptide (Figure 3, Supplemental Table 8). In 1638

T. weissflogii, 2 silacidin-like proteins are upregulated in silicon limitation while one 1639 is downregulated and regulation of all three is specific to the silicon limited condition, 1640 although none of the 3 have an N terminal ER signal peptide (Figure 3, Supplemental 1641

Table 8). While neither transcriptome study in T. pseudonana saw a differential 1642 expression of the silacidin gene, (Mock et al. 2008; Shrestha et al. 2012), silicon 1643 limited T. pseudonana accumulated higher amounts of silacidin protein in low silicon 1644

(Richthammer et al. 2011). This suggests that the transcription of silacidin genes is not 1645 a key point of regulation for silicon processing in T. pseudonana or T. oceanica but 1646 may be post-transcriptionally or post-translationally regulated. 1647

50

To identify putative frustulins in T. oceanica and T. weissflogii, a FIMO search 1648

(Grant et al. 2011) for the hexapeptide motifs CEGDCD and VPGCSG that are 1649 diagnostic of the acidic cysteine rich domains (ACR) of frustulins (Kröger et al. 1994; 1650

Kröger et al. 1996) was performed and yielded 2480 putative proteins from T. 1651 oceanica and 1739 proteins from T. weissflogii. Transcripts without a predicted N 1652 terminal ER signal peptide were filtered out using SignalP (Petersen et al. 2011), 1653 narrowing this number down to 313 proteins from T. oceanica and 215 from T. 1654 weissflogii (Figure 3, Supplemental Table 9). Most of theses putative frustulins are not 1655 differentially expressed in silicon limitation (Figure 3, Supplemental Table 9). In T. 1656 oceanica, 25 putative frustulins are upregulated in silicon limitation and 22 of them 1657 specifically regulated in the silicon limited condition (Figure 3, Supplemental Table 1658

9). Similarly, T. oceanica downregulates 18 putative frustulins in silicon limitation, 13 1659 of which are specific to the silicon limited condition specifically downregulated 1660

(Figure 3, Supplemental Table 9). In T. weissflogii, 24 of the putative frustulins are 1661 upregulated in silicon limitation, 3 of them specific to the silicon limited condition. Of 1662 the putative frustulins, 33 are downregulated in silicon limitation and 18 of them with 1663 silicon specificity (Figure 3, Supplemental Table 9). The lack of silicon-sensitivity of 1664 most of the putative frustulins is not surprising – in T. pseudonana, only a single 1665 frustulin was upregulated 7 hours after silicon resupply (Shrestha et al. 2012). Further, 1666 it is likely that many of these sequences are not truly frustulins, as the number of 1667 putative frustulins recovered from T. oceanica and T. weissflogii is significantly higher 1668 than the 3 frustulin sequences from the T. pseudonana genome (Armbrust et al. 2004). 1669

51

Pleuralins have been identified and characterized in Cylindrotheca and are 1670 characterized by PSCD (proline, serine, cysteine, and aspartic acid-rich) repeat 1671 domains (Kröger et al. 1997; Kröger & Wetherbee 2000). The PSCD domain repeats 1672 are 87 or 89 amino acids in length and are 22% proline, 11% serine, 11% cysteine, and 1673

9% aspartic acid (Kröger et al. 1997). These percentages were used as a benchmark to 1674 scan the translated transcriptomes’ protein sequences for pleuralins using BioStrings 1675 with a sliding window length of 87 amino acids (Pagès et al. 2016). We did not 1676 recover any putative pleuralin sequences, consistent with previous studies that did not 1677 find this class of molecule present in T. pseudonana (Armbrust et al. 2004). 1678

Genes involved in long-chain polyamine synthesis are not differentially regulated 1679 in T. oceanica or T. weissflogii 1680

Long-chain polyamines (LCPAs) are additional molecules that play a role in 1681 silicon deposition in diatoms (Kröger et al. 2000). Proline, glutamate, and arginine 1682 fuel polyamine biosynthesis via their conversion to ornithine, which is converted into 1683 putrescine, spermine, and spermidine – the polyamine precursor molecules 1684

(Bouchereau et al. 1999). The enzymes spermine and spermidine synthase catalyze the 1685 conversion of putrescine to spermine and spermidine using S-adenosyl-L-methionine 1686 and S-adenosyl-metioninamine as a substrate, which is derived from methionine 1687 metabolism (Bouchereau et al. 1999). 1688

Genes involved in funneling proline, arginine, glutamate, ornithine, and 1689 methionine toward polyamine synthesis were of interest as potential markers of silicon 1690 stress in diatoms and their presence in the transcriptomes was determined by their 1691

KEGG annotations. However, we found little evidence of silicon-specific changes in 1692

52

polyamine metabolism, with the exception of T. oceanica upregulating a proline 1693 iminopeptidase (K01259) that cleaves proline residues from peptides and T. 1694 weissflogii upregulating an arginase (K01476), which converts arginine to ornithine 1695

(Figure 4). This is in contrast with previous studies in T. pseudonana, where 1696 spermidine and spermine synthases were up and downregulated during silicon 1697 limitation (Shrestha et al. 2012; Mock et al. 2008). This suggests that while the 1698 synthesis of LCPA precursors may be a key point of regulation in silicon processing 1699 for T. pseudonana, that may not be the case in T. oceanica or T. weissflogii. 1700

Identifying novel markers of silicon stress 1701

Molecular markers of silicon stress can include novel genes or genes lacking 1702 an annotated function in a well described metabolic pathway. To determine whether or 1703 not any of the unannotated genes may be good markers of silicon stress in diatoms, we 1704 used a network-based approach to identify homologous genes that were upregulated 1705 specifically in response to silicon in both T. oceanica and T. weissflogii. Silix 1706 networks were computed using the T. rotula draft transcriptome assembly, the T. 1707 pseudonana filtered protein models from JGI (Thaps3 and Thaps3_bd), and the 1708 assembled T. oceanica and T. weissflogii transcriptomes. (Miele et al. 2011; Armbrust 1709 et al. 2004). Homologues were identified if they were in the same Silix gene family 1710

(e.g., were 35% identical over at least 80% query coverage per HSP). As an example, 1711 the silicon transporters cluster together in a single Silix family (FAM000927, 1712

Supplemental Table 4). 1713

53

The Silix gene network consists of 104,848 genes grouped into 77,386 gene 1714 families (Supplemental Table 4, Supplemental Table 5). In the Silix networks, l32,857 1715 genes (grouped into 9076 gene families) are present in 2 or more diatom species. 1716

However, the majority of genes (71,991) are grouped into 68,310 species-specific 1717 gene families that contain representatives from 1 diatom species only (Supplemental 1718

Table 4, Supplemental Table 5). This pattern (many genes residing in species-specific 1719 gene families) also holds true for the silicon-specific differentially expressed genes. In 1720

T. oceanica, 621 of the up and 270 of downregulated genes are in in species-specific 1721 gene families (Supplemental Table 6). In T. weissflogii, 381 of the up and 613 of 1722 downregulated genes are also in species-specific gene families (Supplemental Table 1723

6). To target potential markers of silicon stress, networks were subsequently filtered to 1724 include only Silix families with at least one silicon-specifically upregulated gene from 1725 both T. oceanica and T. weissflogii. 1726

The search for silicon specific, upregulated homologous genes in T. oceanica 1727 and T. weissflogii revealed 4 gene families containing candidate silicon stress markers. 1728

The most promising of these is a family of ATP-grasp domain proteins (FAM001061). 1729

Silix gene families were also used to recover T. oceanica and T. weissflogii homologs 1730 to the T. pseudonana putative silicon efflux protein (Shrestha et al. 2012) and also 1731 highlighted the presence of an interesting gene expansion in T. oceanica. 1732

ATP-grasp proteins are silicon sensitive and silicon specific 1733

The most promising marker of silicon stress from this study are members of a 1734 gene family of ATP-grasp domain proteins, which contains 3 silicon-specific 1735 upregulated genes – 2 from T. weissflogii and 1 from T. oceanica (EJK54127 from the 1736

54

genome Lommer et al. 2012). This Silix family also contains 2 genes that are 1737 upregulated in silicon limited T. pseudonana (Brembu et al. 2017; Smith et al. 2016). 1738

Another study saw that T. pseudonana slightly upregulated 1 of these 2 genes ~7 hours 1739 after silicon resupply, but its expression values return to baseline levels at the 9 hour 1740 mark (Shrestha et al. 2012). 1741

To determine whether or not these ATP-grasp proteins are found in other 1742 diatoms, the MMETSP database was queried for sequences similar (35% identity and 1743

80% query coverage per HSP) to the 3 silicon-sensitive ATP grasp proteins in the T. 1744 oceanica and T. weissflogii transcriptome assemblies. Sequences meeting our 1745 similarity threshold were found in 143 genes across 29 diatom species across 17 1746 diatom genera. The only non-diatom sequences recovered were 2 sequences from the 1747 Pyrodinium bahamense and Alexandrium monilatum. 1748

Proteins containing ATP-grasp domains are ATP-dependent enzymes that 1749 ligate carboxylate groups to amino or imino groups (Galperin & Koonin 1997). They 1750 are found throughout the tree of life and ~40% of the described ATP-grasp domain 1751 proteins are involved in purine biosynthesis or are biotin carboxylases (e.g., Acetyl- 1752

CoA carboxylase, pyruvate carboxylase, carbamoyl phosphate synthetase) (Fawaz et 1753 al. 2011). Other examples of ATP-grasp domain proteins include D-Alanine-D- 1754

Alanine ligases (involved in peptidoglycan biosynthesis), glutathione synthetases 1755

(involved in glutathione metabolism), tubulin-tyrosine ligases (involved in 1756 microtubule assembly), and they also have been indicated to play a role in the 1757 biosynthesis of mycosporine-like amino acids in , where they play a role 1758 in UV protection (Fawaz et al. 2011; Gao & Garcia-Pichel 2011). Mycosporine-like 1759

55

amino acids are tightly embedded into diatom frustules and are hypothesized to play a 1760 role in mediating UV stress (Ingalls et al. 2010). 1761

The upregulation of ATP-grasp domains in low silicon may be related to the 1762 thinning of diatom frustules in low silicon, and need for enhanced UV protection. In 1763 addition to fortification of their non-silicified cell wall components to compensate for 1764 thinner frustules, it is also possible that the ATP-grasp domain proteins are functioning 1765 similarly to the tubulin-tyrosine ligases involved in microtubule assembly and the 1766 cytoskeleton (Fawaz et al. 2011). This would be consistent with the role of the 1767 cytoskeleton-mediated positioning of the silicon deposition vesicle (Tesson & 1768

Hildebrand 2010). While the specific enzymatic activity of the family of ATP-grasp 1769 domain proteins discussed in this study is unclear, they are an attractive choice as a 1770 molecular marker of silicon stress in diatoms. Although transcriptome evidence 1771 suggests they are wide-spread throughout the diatoms, their silicon sensitivity in 1772 individual diatom species should be verified. 1773

Identifying metabolic pathways with silicon sensitivity 1774

In addition to the previous gene family based approach, an additional strategy 1775 for finding potential molecular markers of silicon stress is to look for common, 1776 silicon-sensitive pathways in both T. oceanica and T. weissflogii. To find silicon- 1777 sensitive metabolic pathways in both T. oceanica and T. weissflogii, genes upregulated 1778 in silicon stress and not part of a general stress response were binned into broad 1779

KEGG modules or silicon metabolism gene categories (Figure 5). There are several 1780 pathways that are upregulated in silicon limited T. oceanica and T. weissflogii, 1781 including “Cellular processes and signaling”, “Membrane transport”, and 1782

56

“Xenobiotics biodegradation and metabolism” (Figure 5). Although similar pathways 1783 are silicon-sensitive in both diatoms, the direction of expression (e.g. up or 1784 downregulated in silicon limitation) is often different in each diatom (Figure 5). 1785

Silicon limited T. oceanica upregulates the “silaffin-cingulins”, “Cell growth and 1786 death”, “Replication and repair”, “Transcription” and “Glycan biosynthesis and 1787 metabolism”, these same categories are downregulated in silicon limited T. weissflogii 1788

(Figure 5). Silicon limited T. weissflogii upregulates “Genetic information 1789 processing”, “Biosynthesis of secondary metabolites”, “Metabolism of terpenoids and 1790 polyketides”, and “Energy metabolism”, these same categories are downregulated in 1791 silicon limited T. oceanica (Figure 5). 1792

Although there are not specific gene homologs in the “Cellular processes and 1793 signaling”, “Membrane transport”, and “Xenobiotics biodegradation and metabolism” 1794 categories (e.g., genes annotated with the same KEGG orthology assignment) that are 1795 differentially expressed in both diatoms, these categories contain several phosphatases, 1796 peptidases, proteases, and kinases (Supplemental Table 1, Supplemental Table 2). 1797

Perhaps during silicon limitation, some degree of post-translational regulation of 1798 protein activity is occuring via degradation by peptidases/proteases or 1799 phosphorylation/de-phosphorylation by kinases/phosphatases. Genes involved in 1800 modifying proteins could be related to the silacidin and silaffin proteins being 1801 phosphorylated (Wenzl et al. 2008; Krog̈ er et al. 2002). It could also be related to the 1802 fact that protein degradation is an important mediator of transitioning through distinct 1803 phases of the cell cycle (Glotzer et al. 1991), a process that is tightly associated with 1804

57

silicon uptake and processing (Brzezinski et al. 1990; Vaulot et al. 1987; 1805

Thamatrakoln & Hildebrand 2007). 1806

The contradictory trends in silicon-sensitive gene expression between T. 1807 oceanica and T. weissflogii may be related to previous studies indicating that silicon 1808 processing is tied to DNA replication, cell division, and the cell cycle (Eppley et al. 1809

1967; Darley & Volcani 1969; Vaulot et al. 1987; Brzezinski et al. 1990; 1810

Thamatrakoln & Hildebrand 2007; Hildebrand et al. 2007). Like other eukaryotic 1811 cells, diatom cells mitotically divide during M phase, DNA synthesis and replication 1812 occurs during S phase, and these two phases of the cell cycle are separated by the G1 1813 and G2 gap phases (Mitchison 1971). Diatom frustule valves are formed just after cell 1814 division, during G2/M phase, while girdle band synthesis happens in G1 phase 1815

(Eppley et al. 1967; Hildebrand et al. 2007). In addition to cell wall synthesis, diatoms 1816 also require silicon for DNA replication to proceed during the S phase of the cell cycle 1817

(Darley & Volcani 1969). These silicon requirements explain why silicon limited T. 1818 weissflogii cells sometimes arrest in the G2/M phase or other times in the G1/S phase 1819 transition (Brzezinski et al. 1990; Vaulot et al. 1987). 1820

As we were largely focused on identifying markers that would be applicable in 1821 asynchronous field populations, we focused on the onset of silicon limitation and 1822 made no attempt to synchronize cell cycle shifts in any of the cultures during these 1823 experiments, so we cannot speak definitively to specific aspects of the cell cycle that 1824 are affected by silicon limitation and we do not know if either diatom was silicon 1825 limited to the point of cell cycle arrest. However, the cyclin B1/B2 genes of P. 1826 tricornutum were upregulated in the G2/M phase of the cell cycle (Huysman et al. 1827

58

2010). The Silix gene family containing the T. pseudonana B1/B2 cyclins 1828

(FAM003363, Supplemental Table 4) contains a silicon-specifically upregulated gene 1829 in T. oceanica and a silicon-specifically downregulated gene in T. weissflogii, 1830 suggesting that at least some of the silicon limited T. oceanica were in G2/M phase 1831 while some of the silicon limited T. weissflogii cells were likely not in G2/M phase at 1832 the time the transcriptomes were sequenced (Supplemental Table 3) (Huysman et al. 1833

2010). 1834

The different biological processes that happen during distinct phases of the cell 1835 cycle might explain the contradictory trends seen in silicon limited T. oceanica and T. 1836 weissflogii. Silicon limited T. oceanica upregulates genes in “Replication and repair”, 1837 suggesting that it may be awaiting sufficient silicon resupply for DNA synthesis to 1838 proceed (Figure 5). Silicon limited T. weissflogii upregulates genes in “Translation”, 1839 which may be indicative of the biosynthesis of proteins required for silicon processing 1840

(e.g., SITs) (Figure 5). 1841

Sel1/MYND zinc finger domain gene expansion in T. oceanica 1842

One gene family (FAM006551) contained a highly responsive gene (~8 fold 1843 upregulation in silicon stress) and 7 other silicon-sensitive genes in T. oceanica 1844

(27088) (Supplemental Table 3, Supplemental Table 4) but did not have homologs in 1845 our T. weissflogii transcriptome assembly. This gene family is highly expanded in T. 1846 oceanica and consists of 683 genes, 463 of them contain a Sel1 like repeat (including 1847 the highly silicon sensitive gene in T. oceanica) 44 have a MYND zinc finger domain 1848

(Supplemental Table 1). A search for Sel1-like repeats in the T. oceanica genome 1849

(through protists.ensembl.org) reveals that there are 1051 Sel1-like repeats in the T. 1850

59

oceanica genome, while a similar query of Thalassiosira pseudonana, Phaeodactylum 1851 tricornutum, and Fragilariopsis cylindrus genomes (through genome.jgi.doe.gov) 1852 found 8, 5, and 17 genes, respectively, confirming the expansion of Sel1-like repeat 1853 proteins in T. oceanica. Similarly, searches for the MYND zinc finger proteins 1854

(InterPro accession IPR002893) in the genomes of T. oceanica, T. pseudonana, P. 1855 tricornutum, and F. cylindrus revealed 443, 6, 8, and 134 genes, respectively, 1856 indicating that the MYND zinc finger domain proteins are expanded in both T. 1857 oceanica and F. cylindrus. Sel1 domain proteins are extracellular proteins involved in 1858 regulating Notch signaling in C. elegans, and MYND zinc finger domains are thought 1859 to be involved in mediating protein-protein interactions, rather than the canonical 1860

DNA binding associated with zinc finger domains (Grant & Greenwald 1996; 1861

Gamsjaeger et al. 2006). This is consistent with previous studies indicating that Notch 1862 signaling proteins are upregulated during formation of diatom frustule valves 1863

(Shrestha et al. 2012). 1864

A search for homologs to the differentially regulated Sel1/MYND zinc finger 1865 domain containing genes in the MMETSP database revealed that homologs are wide- 1866 spread across 168 species across the alveolate, amoebozoa, archplastida, hacrobia, 1867 rhizaria, and stramenopile lineages. In most species (including the diatoms), these 1868 genes are not highly expanded (e.g., less than 20 copies) (Supplemental Figure 5). 1869

However, in addition to being highly expanded in T. oceanica, homologs to this gene 1870 family are also highly expanded in Cyclotella and Skeletonema diatoms (Supplemental 1871

Figure 5). 1872

60

The expansion of MYND zinc finger domain proteins in F. cylindrus has been 1873 previously reported and was tentatively attributed to the high zinc concentrations in 1874 the Southern Ocean where this diatom is found (Mock et al. 2017). F. cylindrus also 1875 experiences chronically low iron in the Southern Ocean, as does T. oceanica in the 1876 northeast Pacific (Chappell et al. 2015; Bertrand et al. 2015). Perhaps these expanded 1877 gene families in both F. cylindrus and T. oceanica are part of lineage-specific metal 1878 sensing mechanisms. The upregulation of this gene in silicon limited T. oceanica 1879 would be consistent with the hypothesis that iron is a co-factor for diatoms’ silicon 1880 processing (Mock et al. 2008). 1881

In this context, the high copy number of MYND/Sel1-like genes in 1882

Skeletonema diatoms is particularly interesting, as these diatoms are typically found in 1883 coastal environments where iron is not thought to limit diatom growth (Kooistra et al. 1884

2008). One possible explanation is that Sel1-like repeats have also been suggested to 1885 play a role in mediating protein degradation in the endoplasmic reticulum and perhaps 1886 this is a key point of metabolic regulation in both T. oceanica and Skeletonema 1887 diatoms (Mittl & Schneider-Brachert 2007). 1888

CONCLUSIONS 1889

This aim of this study was to find molecular markers of silicon stress in two 1890 ecologically relevant diatoms, Thalassiosira oceanica and Thalassiosira weissflogii. 1891

Ideal silicon stress gene markers should be upregulated in silicon limitation and not 1892 other nutrient stressors and have homologs found across a range of diatoms. This 1893 study revealed a family of ATP-grasp domain proteins that were upregulated in both 1894

T. oceanica and T. weissflogii uniquely in the silicon limited condition. Previous 1895

61

studies in T. pseudonana also saw an upregulation of this gene family in silicon 1896 limitation (Smith et al. 2016; Brembu et al. 2017). Homologs of this gene were 1897 widespread through other diatoms, but their silicon sensitivity across these different 1898 diatoms has yet to be tested. 1899

Unlike T. pseudonana (Shrestha et al. 2012) and T. weissflogii (this study), T. 1900 oceanica does not strongly upregulate its silicon transporter in silicon limitation. This 1901 may be related to the absence of a SIT3-like sequence in this diatom, as SIT3 has been 1902 suggested to play a role as a silicon sensor in T. pseudonana (Thamatrakoln & 1903

Hildebrand 2007). The hypothesis that SIT3 functions as a silicon sensor was born 1904 from the observation that T. pseudonana SIT3 is lowly expressed and is not 1905 upregulated in silicon limitation (Thamatrakoln & Hildebrand 2007). This study has 1906 produced incongruous results in its support for this hypothesis. The low expression of 1907 the SIT3-like sequence from T. weissflogii is consistent with the suggestion that it 1908 might act as a silicon sensor, but its expression is sensitive to silicon (Figure 2), which 1909 is contrary to what has been shown for T. pseudonana SIT3 (Thamatrakoln & 1910

Hildebrand 2007; Shrestha et al. 2012). Future discovery of the SIT3 promoter 1911 sequence would aid in further attempts to discern whether or not SIT3 acts as a silicon 1912 sensor. This would enable experiments where SIT1/SIT2 expression and diatom 1913 silicification are quantified in the context of a SIT3 promoter deletion. 1914

Homology-independent methods were used to identify putative silaffins, 1915 cingulins, silacidins, and frustulin sequences (Figure 3, Supplemental Table 7, 1916

Supplemental Table 8, Supplemental Table 9), but it is likely that most of these are not 1917 truly functioning in silicon processing because most of the genes identified are not 1918

62

silicon sensitive (Figure 3) and the number of genes recovered are much higher than 1919 what has been found for T. pseudonana (Supplemental Table 7, Supplemental Table 8, 1920

Supplemental Table 9) (Poulsen & Kröger 2004; Scheffel et al. 2011; Wenzl et al. 1921

2008; Armbrust et al. 2004). The candidates we did identify as being silicon-sensitive 1922 with N terminal ER targeting sequences could be good candidates to test for function 1923 using diatom gene editing methods. These methods have been used to characterize the 1924 urease gene in T. pseudonana (Hopes et al. 2016). 1925

Another approach to finding silicon stress markers was to look for similar 1926 metabolic pathways that are silicon sensitive in T. oceanica and T. weissflogii. 1927

Although similar KEGG pathways are silicon-sensitive in the two diatoms, the 1928 direction of expression is sometimes in opposition in each diatom (e.g., upregulated in 1929 one silicon limited diatom and down in the other) (Figure 5). This trend is most clear 1930 in the “Transcription”, “Translation”, and “Replication and repair” categories (Figure 1931

5). These different manifestations of silicon limitation may be related to the fact that 1932 silicon is required for both DNA synthesis and cell wall synthesis (Darley & Volcani 1933

1969; Eppley et al. 1967; Hildebrand et al. 2007), so it is possible that silicon 1934 limitation is disproportionately influencing these two processes in the diatoms tested. 1935

In this context, the argument for using the ATP-grasp domain proteins as molecular 1936 markers of silicon status is quite strong, as both manifestations of silicon limitation 1937 saw this gene upregulated. 1938

1939

63

References 1940

Allen, J.T. et al., 2005. Diatom carbon export enhanced by silicate upwelling in the 1941

northeast Atlantic. Nature, 437(7059), pp.728–732. 1942

Alverson, A.J., 2007. Strong purifying selection in the silicon transporters of marine 1943

and freshwater diatoms. Limnology and Oceanography, 52(4), pp.1420–1429. 1944

Andrews, S., 2010. FastQC: A quality control tool for high throughput sequence data. 1945

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ 1946

Armbrust, E.V. et al., 2004. The Genome of the Diatom Thalassiosira pseudonana: 1947

Ecology, Evolution, and Metabolism. Science, 306(October), pp.79–86. 1948

Bertrand, E.M. et al., 2015. Phytoplankton-bacterial interactions mediate 1949

micronutrient colimitation at the coastal Antarctic sea ice edge. Proceedings of 1950

the National Academy of Sciences of the United States of America, 112(32), 1951

pp.9938–43. 1952

Bouchereau, A. et al., 1999. Polyamines and environmetal challenges: recent 1953

development. Plant Science, 140, pp.103–125. 1954

Boyd, P. et al., 1999. Role of iron, light, and silicate in controlling algal biomass in 1955

subantarctic waters SE of New Zealand. Journal of Geophysical Research, 104, 1956

pp.395–408. 1957

Brembu, T. et al., 2017. Dynamic responses to silicon in Thalasiossira pseudonana - 1958

Identification, characterisation and classification of signature genes and their 1959

corresponding protein motifs. Scientific Reports, 7(1), pp.1–14. 1960

Brzezinski, M. a. et al., 2011. Co-limitation of diatoms by iron and silicic acid in the 1961

equatorial Pacific. Deep-Sea Research Part II: Topical Studies in Oceanography, 1962

64

58(3–4), pp.493–511. 1963

Brzezinski, M., Olson, R. & Chisholm, S., 1990. Silicon availability and cell-cycle 1964

progression in marine diatoms. Marine Ecology Progress Series, 67(M), pp.83– 1965

96. 1966

Chappell, P.D. et al., 2015. Genetic indicators of iron limitation in wild populations of 1967

Thalassiosira oceanica from the northeast Pacific Ocean. The ISME Journal, 9, 1968

pp.592–602. 1969

Chappell, P.D. et al., 2016. Preferential depletion of zinc within Costa Rica upwelling 1970

dome creates conditions for zinc co-limitation of primary production. Journal of 1971

Plankton Research, 38(2), pp.244–255. 1972

Darley, W.M. & Volcani, B.E., 1969. Role of silicon in diatom metabolism. A silicon 1973

requirement for deoxyribonucleic acid synthesis in the diatom Cylindrotheca 1974

fusiformis Reimann and Lewin. Experimental cell research, 58(2), pp.334–342. 1975

Drum, R.W. & Pankratz, S.H., 1964. Post mitotic fine structure of Gomphonema 1976

parvulum. Journal of ultrastructure research, 10, pp.217–223. 1977

Du, C. et al., 2014. ITRAQ-based proteomic analysis of the metabolism mechanism 1978

associated with silicon response in the marine diatom Thalassiosira pseudonana. 1979

Journal of Proteome Research, 13(2), pp.720–734. 1980

Dugdale, R.C. & Wilkerson, F.P., 1998. Silicate regulation of new production in the 1981

equatorial Pacific upwelling. Nature, 391(January), pp.270–273. 1982

Dugdale, R.C., Wilkerson, F.P. & Minas, H.J., 1995. The role of a silicate pump in 1983

driving new production. Deep-Sea Research Part I: Oceanographic Research 1984

Papers, 42(5), pp.697–719. 1985

65

Durkin, C.A. et al., 2016. The evolution of silicon transporters in diatoms. Journal of 1986

Phycology, 52(5), pp.716–731. 1987

Eppley, R.W., Holmes, R.W. & Paasche, E., 1967. Periodicity in cell division and 1988

physiological behavior of Ditylum brightwellii a marine planktonic diatom during 1989

grow in light-dark cycles. Archiv Fur Mikrobiologie, 56, pp.305–323. 1990

Fawaz, M. V, Topper, M. & Firestine, S.M., 2011. The ATP-Grasp Enzymes. 1991

Bioorganic Chemistry, 39(5–6), pp.185–191. 1992

Field, C.B. et al., 1998. Primary Production of the Biosphere: Integrating Terrestrial 1993

and Oceanic Components. Science, 281(5374), pp.237–240. 1994

Franck, V.M. et al., 2000. Iron and silicic acid concentrations regulate Si uptake north 1995

and south of the Polar Frontal Zone in the Pacific Sector of the Southern Ocean. 1996

Deep Sea Research Part II: Topical Studies in Oceanography, 47(15–16), 1997

pp.3315–3338. 1998

Frischkorn, K.R. et al., 2014. De novo assembly of Aureococcus anophagefferens 1999

transcriptomes reveals diverse responses to the low nutrient and low light 2000

conditions present during blooms. Frontiers in Microbiology, 5(JULY), pp.1–16. 2001

Fu, L. et al., 2012. CD-HIT: Accelerated for clustering the next-generation sequencing 2002

data. Bioinformatics, 28(23), pp.3150–3152. 2003

Galperin, M.Y. & Koonin, E. V., 1997. A diverse superfamily of enzymes with ATP- 2004

dependent carboxylate-amine/thiol ligase activity. Protein Science, 6(12), 2005

pp.2639–2643. 2006

Gamsjaeger, R. et al., 2006. Sticky fingers : zinc-fingers as protein-recognition motifs. 2007

Trends in Biochemical Sciences, 32(2). 2008

66

Gao, Q. & Garcia-Pichel, F., 2011. An ATP-Grasp ligase involved in the last 2009

biosynthetic step of the iminomycosporine shinorine in Nostoc punctiforme 2010

ATCC 29133. Journal of Bacteriology, 193(21), pp.5923–5928. 2011

Glotzer, M., Murray, A.W. & Kirschner, M.W., 1991. Cyclin is degraded by the 2012

ubiquitin pathway. Nature, 349(6305), pp.132–138. 2013

Grant, B. & Greenwald, I., 1996. The Caenorhabditis elegans sel-1 gene, a negative 2014

regulator of lin-12 and glp-1, encodes a predicted extracellular protein. Genetics, 2015

143(1), pp.237–247. 2016

Grant, C.E., Bailey, T.L. & Noble, W.S., 2011. FIMO: Scanning for occurrences of a 2017

given motif. Bioinformatics, 27(7), pp.1017–1018. 2018

Guillard, R.R.L., 1975. Culture of Phytoplankton for Feeding Marine Invertebrates. In 2019

Culture of Marine Invertebrate Animals. pp. 29–60. 2020

Guillard, R.R.L. & Ryther, J.H., 1962. Studies of marine planktonic diatoms I. 2021

Cyclotella nana Hustedt, and Detonula confervasea (Cleve) Gran. Canadian 2022

Journal of Microbiology, 8(1140), pp.229–239. 2023

Hildebrand, M. et al., 1997. A gene family of silicon transporters. Nature, 385, 2024

pp.688–689. 2025

Hildebrand, M., Dahlin, K. & Volcani, B.E., 1998. Characterization of a silicon 2026

transporter gene family in Cylindrotheca fusiformis: sequences, expression 2027

analysis, and identification of homologs in other diatoms. Molecular and General 2028

Genetics MGG, 260(5), pp.480–486. 2029

Hildebrand, M., Frigeri, L.G. & Davis, A.K., 2007. Synchronized growth of 2030

Thalassiosira pseudonana (Bacillariophyceae) provides novel insights into cell- 2031

67

wall synthesis processes in relation to the cell cycle. Journal of Phycology, 43(4), 2032

pp.730–740. 2033

Hopes, A. et al., 2016. Editing of the urease gene by CRISPR-Cas in the diatom 2034

Thalassiosira pseudonana. Plant Methods, 12(1), pp.1–12. 2035

Hutchins, D. a & Bruland, K.W., 1998. Iron-limited diatom growth and Si : N uptake 2036

ratios in a coastal upwelling regime. Nature, 393(June), pp.561–564. 2037

Huysman, M.J.J. et al., 2010. Genome-wide analysis of the diatom cell cycle unveils a 2038

novel type of cyclins involved in environmental signaling. Genome biology, 2039

11(2), p.R17. 2040

Ingalls, A.E., Whitehead, K. & Bridoux, M.C., 2010. Tinted windows: The presence 2041

of the UV absorbing compounds called mycosporine-like amino acids embedded 2042

in the frustules of marine diatoms. Geochimica et Cosmochimica Acta, 74(1), 2043

pp.104–115. 2044

Johnson, M. et al., 2007. Ammonium accumulation during a silicate-limited diatom 2045

bloom indicates the potential for ammonia emission events. Marine Chemistry, 2046

106(1–2 SPEC. ISS.), pp.63–75. 2047

Jones, P. et al., 2014. InterProScan 5: Genome-scale protein function classification. 2048

Bioinformatics, 30(9), pp.1236–1240. 2049

Jørgensen, E.G., 1955. Variations in the Silica Content of Diatoms. Physiologia 2050

Plantarum, 8(4), pp.840–845. 2051

Katoh, K. & Standley, D.M., 2013. MAFFT multiple sequence alignment software 2052

version 7: Improvements in performance and usability. Molecular Biology and 2053

Evolution, 30(4), pp.772–780. 2054

68

Keeling, P.J. et al., 2014. The Marine Microbial Eukaryote Transcriptome Sequencing 2055

Project (MMETSP): Illuminating the Functional Diversity of Eukaryotic Life in 2056

the Oceans through Transcriptome Sequencing. PLoS Biology, 12(6). 2057

King, A.L. et al., 2015. Effects of CO2 on growth rate, C:N:P, and fatty acid 2058

composition of seven marine phytoplankton species. Marine Ecology Progress 2059

Series, 537, pp.59–69. 2060

Koike, I. et al., 2001. Silicate to nitrate ratio of the upper sub-arctic Pacific and the 2061

Bering Sea basin in summer: Its implication for phytoplankton dynamics. Journal 2062

of Oceanography, 57(3), pp.253–260. 2063

Kooistra, W.H.C.F. et al., 2008. Global Diversity and Biogeography of Skeletonema 2064

Species (Bacillariophyta). Protist, 159(2), pp.177–193. 2065

Krog̈ er, N. et al., 2002. Self-Assembly of Highly Phosphorylated Silaffins and Their 2066

Function in Biosilica Morphogenesis. Science, 298(5593), pp.584–586. 2067

Kröger, N. et al., 1997. Characterization of a 200-kDa Diatom Protein that is 2068

Specifically Associated with a Silica-Based Substructure of the Cell Wall. Eur J 2069

Biochem, 250(1), pp.99–105. 2070

Kröger, N. et al., 2000. Species-specific polyamines from diatoms control silica 2071

morphology. Proceedings of the National Academy of Sciences of the United 2072

States of America, 97(26), pp.14133–14138. 2073

Kröger, N., Bergsdorf, C. & Sumper, M., 1994. A new calcium binding glycoprotein 2074

family constitutes a major diatom cell wall component. The EMBO journal, 2075

13(19), pp.4676–4683. 2076

Kröger, N., Bergsdorf, C. & Sumper, M., 1996. Frustulins: domain conservation in a 2077

69

protein family associated with diatom cell walls. European Journal of 2078

Biochemistry, 239(2), pp.259–264. 2079

Kröger, N., Deutzmann, R. & Sumper, M., 1999. Polycationic Peptides from Diatom 2080

Biosilica That Direct Silica Nanosphere Formation. Science, 286(5442), 2081

pp.1129–1132. 2082

Kröger, N. & Poulsen, N., 2008. Diatoms-from cell wall biogenesis to 2083

nanotechnology. Annual review of genetics, 42, pp.83–107. 2084

Kröger, N. & Wetherbee, R., 2000. Pleuralins are involved in theca differentiation in 2085

the diatom Cylindrotheca fusiformis. Protist, 151(3), pp.263–273. 2086

De La Rocha, C.L. et al., 2000. Effects of iron and zinc deficiency on elemental 2087

composition and silica production by diatoms. Marine Ecology Progress Series, 2088

195, pp.71–79. 2089

Leblanc, K. et al., 2005. Fe and Zn effects on the Si cycle and diatom community 2090

structure in two contrasting high and low-silicate HNLC areas. Deep-Sea 2091

Research Part I: Oceanographic Research Papers, 52(10), pp.1842–1864. 2092

Lewin, J.C., 1957. Silicon metabolism in diatoms IV. Growth and frustule formation 2093

in Navicula pelliculosa. Canadian Journal of Microbiology, 3, pp.427–433. 2094

Lewin, R.A., 1962. Silicification. In Physiology and biochemistry of algae. New York: 2095

Academic Press, pp. 445–455. 2096

Li, W. & Godzik, A., 2006. CD-HIT: A fast program for clustering and comparing 2097

large sets of protein or nucleotide sequences. Bioinformatics, 22(13), pp.1658– 2098

1659. 2099

Lommer, M. et al., 2012. Genome and low-iron response of an oceanic diatom adapted 2100

70

to chronic iron limitation. Genome Biology, 13(7), p.R66. 2101

Martin-Jézéquel, V., Hildebrand, M. & Brzezinski, M.A., 2000. REVIEW SILICON 2102

METABOLISM IN DIATOMS: IMPLICATIONS FOR GROWTH. J. Phycol, 2103

36(February), pp.821–840. 2104

Miele, V., Penel, S. & Duret, L., 2011. Ultra-fast sequence clustering from similarity 2105

networks with SiLiX. BMC bioinformatics, 12, p.116. 2106

Min, X.J. et al., 2005. OrfPredictor: Predicting protein-coding regions in EST-derived 2107

sequences. Nucleic Acids Research, 33(SUPPL. 2), pp.677–680. 2108

Mitchison, J.M., 1971. The biology of the cell cycle, Cambridge: Cambridge 2109

University Press. 2110

Mittl, P.R.E. & Schneider-Brachert, W., 2007. Sel1-like repeat proteins in signal 2111

transduction. Cellular Signalling, 19(1), pp.20–31. 2112

Mock, T. et al., 2017. Evolutionary genomics of the cold-adapted diatom 2113

Fragilariopsis cylindrus. Nature Publishing Group, 541(7638), pp.536–540. 2114

Mock, T. et al., 2008. Whole-genome expression profiling of the marine diatom 2115

Thalassiosira pseudonana identifies genes involved in silicon bioprocesses. 2116

Proceedings of the National Academy of Sciences of the United States of 2117

America, 105(5), pp.1579–84. 2118

Moriya, Y. et al., 2007. KAAS: An automatic genome annotation and pathway 2119

reconstruction server. Nucleic Acids Research, 35(SUPPL.2), pp.182–185. 2120

Nelson, D.M. et al., 1995. Production and dissolution of biogenic silica in the ocean: 2121

Revised global estimates, comparison with regional data and relationship to 2122

biogenic sedimentation. Global Biogeochemical Cycles, 9(3), pp.359–372. 2123

71

Paasche, E., 1973. Silicon and the ecology of marine plankton diatoms. I. 2124

Thalassiosira pseudonana (Cyclotella nana) grown in a chemostat with silicate as 2125

limiting nutrient. Marine Biology, 19(2), pp.117–126. 2126

Pagès, H. et al., 2016. Biostrings: String objects representing biological sequences, 2127

and matching algorithms. 2128

Pamirsky, I.E. & Golokhvast, K.S., 2013. Origin and status of homologous proteins of 2129

biomineralization (Biosilicification) in the of phylogenetic domains. 2130

BioMed Research International, 2013. 2131

Petersen, T.N. et al., 2011. SignalP 4.0: discriminating signal peptides from 2132

transmembrane regions. Nature methods, 8(10), pp.785–6. 2133 van de Poll, W.H., Vrieling, E.G. & Gieskes, W.W.C., 1999. Location and expression 2134

of frustulins in the pennate diatoms Cylindrotheca fusiformis, Navicula 2135

pelliculosa, and Navicula salinarum (Bacillariophyceae). J Phycol, 35, pp.1044– 2136

1053. 2137

Poulsen, N. et al., 2013. Pentalysine clusters mediate silica targeting of silaffins in 2138

Thalassiosira pseudonana. Journal of Biological Chemistry, 288(28), pp.20100– 2139

20109. 2140

Poulsen, N. & Kröger, N., 2004. Silica morphogenesis by alternative processing of 2141

silaffins in the diatom Thalassiosira pseudonana. Journal of Biological 2142

Chemistry, 279(41), pp.42993–42999. 2143

Price, N.M. et al., 1989. Preparation and Chemistry of the Artificial Algal Culture 2144

Medium Aquil. Biological Oceanography, 6(5–6), pp.443–461. 2145

R Development Core Team, 2013. R: A language and environment for statistical 2146

72

computing. R Foundation for Statistical Computing, Vienna, Austria. URL 2147

http://www.R-project.org/. R Foundation for Statistical Computing, Vienna, 2148

Austria. 2149

Richthammer, P. et al., 2011. Biomineralization in Diatoms: The Role of Silacidins. 2150

ChemBioChem, 12(9), pp.1362–1366. 2151

Rueter, J.G. & Morel, F.M.M., 1981. The interaction between zinc deficiency and 2152

copper toxicity as it affects the silicic acid uptake mechanisms in Thalassiosira 2153

pseudonana. Limnology and Oceanography, 26(1), pp.67–73. 2154

Saito, M.A., Goepfert, T.J. & Ritt, J.T., 2008. Some thoughts on the concept of 2155

colimitation: Three definitions and the importance of bioavailability. Limnology 2156

and Oceanography, 53(1), pp.276–290. 2157

Sapriel, G. et al., 2009. Genome-wide transcriptome analyses of silicon metabolism in 2158

Phaeodactylum tricornutum reveal the multilevel regulation of silicic acid 2159

transporters. PLoS ONE, 4(10). 2160

Scheffel, A. et al., 2011. Nanopatterned protein microrings from a diatom that direct 2161

silica morphogenesis. Proceedings of the National Academy of Sciences of the 2162

United States of America, 108(8), pp.3175–80. 2163

Shrestha, R. et al., 2012. Whole transcriptome analysis of the silicon response of the 2164

diatom Thalassiosira pseudonana. BMC Genomics, 13(1), p.499. 2165

Smith, S.R. et al., 2016. Transcript level coordination of carbon pathways during 2166

silicon starvation-induced lipid accumulation in the diatom Thalassiosira 2167

pseudonana. New Phytologist, 210(3), pp.890–904. 2168

Sunda, W.G. & Huntsman, S.A., 1995. Iron uptake and growth limitation in oceanic 2169

73

and coastal phytoplankton. Marine Chemistry, 50, pp.189–206. 2170

Tesson, B. & Hildebrand, M., 2010. Extensive and intimate association of the 2171

cytoskeleton with forming silica in diatoms: Control over patterning on the meso- 2172

and micro-scale. PLoS ONE, 5(12). 2173

Thamatrakoln, K., Alverson, A.J. & Hildebrand, M., 2006. Comparative sequence 2174

analysis of diatom silicon transporters: Toward a mechanistic model of silicon 2175

transport. Journal of Phycology, 42(4), pp.822–834. 2176

Thamatrakoln, K. & Hildebrand, M., 2007. Analysis of Thalassiosira pseudonana 2177

silicon transporters indicates distinct regulatory levels and transport activity 2178

through the cell cycle. Eukaryotic Cell, 6(2), pp.271–279. 2179

Thamatrakoln, K. & Hildebrand, M., 2008. Silicon uptake in diatoms revisited: a 2180

model for saturable and nonsaturable uptake kinetics and the role of silicon 2181

transporters. Plant Physiol, 146(3), pp.1397–1407. 2182

Treguer, P. et al., 1995. The Silica Balance in the World Ocean - a Reestimate. 2183

Science, 268(5209), pp.375–379. 2184

Vaulot, D. et al., 1987. Cell-cycle response to nutrient starvation in two phytoplankton 2185

species, Thalassiosira weissflogii and Hymemonas carterae. Marine Biology, 95, 2186

pp.625–630. 2187

Wenzl, S. et al., 2008. Silacidins: Highly acidic phosphopeptides from diatom shells 2188

assist in silica precipitation in vitro. Angewandte Chemie - International Edition, 2189

47(9), pp.1729–1732. 2190

Whitney, L.A.P. et al., 2011. Characterization of putative iron responsive genes as 2191

species-specific indicators of iron stress in Thalassiosiroid diatoms. Frontiers in 2192

74

Microbiology, 2(NOV), pp.1–14. 2193

Wu, Z. et al., 2010. Empirical bayes analysis of sequencing-based transcriptional 2194

profiling without replicates. BMC Bioinformatics, 11(1), p.564. 2195

Yool, A. & Tyrrell, T., 2003. The role of diatoms in regulating the ocean’s silicate 2196

cycle. Global Biogeochemical Cycles, 17(4), pp.1–24. 2197

Yoshida, Y. et al., 2016. De novo assembly and comparative transcriptome analysis of 2198

Euglena gracilis in response to anaerobic conditions. BMC genomics, 17(1), 2199

p.182. 2200

2201

2202

2203

2204

2205

2206

2207

2208

2209

75

2210 8 19 n=6 n=1 height height 7 5 n=15 n=199 silicon sensitive sensitive silicon

: Down in Silicon Limitation 6 4 n=70 n=596 : Down in Silicon Limitation

cluster gene expression expression cluster gene 6 4 2 0

-2 -4 - veresus nutrient replete nutrient veresus

. weissflogii T change fold 2 log 1 . oceanica T. T. weissflogii n=453 T

8 6 4 2 0

-2 -4

and

veresus nutrient replete nutrient veresus log 2 fold change fold 2 log 22 n=1 6 19 T. T. oceanica n=4 n=5

3 14 n=25 n=61 9 2 n=40 : Up in Silicon Limitation : Up in Silicon Limitation n=164 specific clusters shown. Each the panel per shows specific clusters 6 1 - . oceanica T n=12 . weissflogii n=717 T

4 2 0

-2 -4 -6 veresus nutrient replete nutrient veresus log 2 fold change fold 2 log 3 n=351

8 6 4 2 0 -2 -4 -6 veresus nutrient replete nutrient veresus

Iron Nitrate Silicon log 2 fold change fold 2 log Nutrient Limiting cluster median expression value and the error bars indicate the interquartile range. The numbers value The range.at and indicate the the the cluster error median interquartile bars expression fold change in silicon, iron, or nitrate limitation relative to the nutrient replete condition. The bar condition. limitationrelative The the replete to nutrient fold iron, nitrate silicon, or in change - - Depiction of the results from the hierarchical clustering of from results the Depiction the hierarchical of

Figure 1. genes, with the from onlyresults silicon a as log2 values the indicates per in bottom cluster. each number panel each indicate of the genes of

76

2211 5 log2 fold change relative to replete 4 1 2.5 0.0 −2.5 −5.0 w The asterisk 3 2 4 N 2 Species 3 e F 5 The black diagonal The line 1 oceanica * Si * * * ASC posterior probability >= 0.95 * log 10 RPKM nutrient replete 0

5 4 3 2 1 0 log 10 RPKM silicon limited silicon RPKM 10 log 1 2 3 4 5 fold change ofchange fold putative each E - E E was conducted. was silicon For each

ade l B e d _002291245 Clade a l _002295920 Clade P 7 P 5 X I X I MA T T. T. weissflogii axis) conditions. Each point corresponds to point a Each axis) corresponds conditions. K0 157255 C

T3 NCB -

t I 4 I T2 NCB A A K20_1219 I S G

9 ro S J

3

P de de i s 5 u 1 la la T1 NCBI XP_002290700 C 6 B

I , , and A 2680 Un dr S

de ABF5043 0 de

lin ica ica I de C la y la ana n n ABF5043 la c n

. I pseudonana F K54_189 . NCB K32_1233 K32_558 264054 C

212017 C

pseudonana T . I I ocea ocea T G 0 G . . NCB K57_921 J J 0 T T

1 s s pseudo 138651 C .

263777 C 0 u u

I rotula 0 T I D . 186485 C 1 I G dr dr T G T. T. oceanica J

J G

de s J , lin lin

s 0 u s y y la u 0 8 u 1 c 9 dr . . c dr 9 0 F F dr n. the map The log2 colors indicate heat 8 0 lin 1 lin 4 ters. from A for likelihoodsilicon transporters maximum comparison y 8 lin 9 y 8 9 0 c y 4 0 . 0 0 . c 0 c F 1 . 0 F 1 211148 C 1 F

I T. T. rotula G axis) axis) and (ysilicon limited J ,

- s u same expression value in expression same conditions the compared two

dr tically significantly differentially expressed. A 1:1 line is also is inset, also tically the A line expressed. which significantly 1:1 shows differentially F. cylin 0 0 1 2 . 0 point indicates which species the silicon transporterwhich from.species originates indicates point y have the y have

the 9 6 0 0 0 1 7 Thalassiosira Thalassiosira pseudonana , 0 0 1 Phylogenetic analysis of diatom silicon transpor diatom of Phylogenetic silicon analysis where where points would if fall

. 2 Figure cylindrus Fragilariopsis show transporter is also this heat study, in map identified a condition. silicon (N) replete tonutrient transporter limited nitrate or relative silicon the (Fe), (Si), iron condition in which indicates silicon transporters were statis log10value in(x expression RPKM replete nutrient silicon transporter the the of shape and indicates

77

2212

Figure 3. A series of 1:1 lines showing the expression values for putative silaffins, cingulins, silacidins, and frustulins identified in this study. Each point represents a single transcript. The x-axis is the log 10 transformed RPKM values in the nutrient replete condition and the y-axis is the log 10 transformed RPKM values in silicon limitation. The black diagonal line indicates where points would fall if they have the same expression value in the two conditions compared. The color of the point indicates the direction of differential expression in silicon limitation (up (red), down (blue), or no change) and the shape indicates whether or not each differentially expressed transcript is specific to silicon limitation or is part of a general stress response.

78

2213 K00318 2.5 To 33304 0.0 −5.0 −2.5 Tw K24_638 Si Fe N log2 fold change K00286 relative to replete To 20745 indicates silicon-specific * differential expression Tw K28_13844 glutamate Si Fe N

proline methionine

K01259 K00819 K00789 To 13814 To 27543 To 19346 To 30787 * Tw K24_1059 To 655 Si Fe N Si Fe N Tw K20_3684 Tw K32_48 peptide Tw K57_766 Si Fe N

K01476 To 31198 Tw K24_10694 ornithine S-adenosyl-L-methionine *Si Fe N K01581 K01611 arginine To 27123 To 28306 Tw K20_882 Tw K32_4437 Si Fe N Si Fe N

putrescine S-adenosyl-metioninamine

K00797 To 4706 To 7947 Tw K50_13069 Si Fe N

spermine/spermidine Figure 4. Heat map overlaid on a pathway map of genes involved in the precursors of long-chain polyamine biosynthesis. Colors in the heat map indicate the log2-fold change in silicon (Si), iron (Fe), or nitrate (N) limitation relative to the nutrient replete condition. The full description for each KEGG orthology are as follows: K00138:aldB, aldehyde dehydrogenase; K00286:proC, pyrroline-5-carboxylate reductase; K01259:pip, proline iminopeptidase; K01476:arginase; K00819:rocD, ornithine--oxo-acid transaminase; K01581:ornithine decarboxylase; K00789:metK, S- adenosylmethionine synthetase; K01611:speD, S-adenosylmethionine decarboxylase; K00797:speE, spermidine synthase

79

T. oceanicaT. weissflogii

SITs silaffins−cingulins silacidins frustulins 09193 Cellular processes and signaling 09192 Genetic information processing 09143 Cell growth and death 09142 Cell motility 09141 Transport and catabolism 09133 Signaling molecules and interaction 3 2 09132 Signal transduction 1 09131 Membrane transport 0 09124 Replication and repair −1 09123 Folding, sorting and degradation −2 to nutrient replete −3

09122 Translation log2 fold change relative not differentially 09121 Transcription expressed 09111 Xenobiotics biodegradation and metabolism 09110 Biosynthesis of other secondary metabolites 09109 Metabolism of terpenoids and polyketides 09108 Metabolism of cofactors and vitamins 09107 Glycan biosynthesis and metabolism 09106 Metabolism of other amino acids 09105 Amino acid metabolism 09104 Nucleotide metabolism 09103 Lipid metabolism 09102 Energy metabolism 09101 Carbohydrate metabolism Figure 5. Heat map of the silicon specific, differentially expressed genes from T. oceanica and T. weissflogii and their KEGG orthology or silicon metabolism gene assignment categories. Colors in the heat map indicate the relative proportion of silicon specific genes in each category. Some categories that would not apply to diatoms were excluded from this analysis (e.g., Cardiovascular diseases).

2214

80

2215 Table 1. Summary table for T. oceanica and T. weissflogii showing the total number of transcripts assembled and number of transcripts with function or protein domain information assigned by KEGG automated annotation server or InterProScan.

T. oceanica T. weissfogii Total Transcripts 36382 24900 Annotated 12450 (34%) 9580 (38%)

Table 2. Summary table for T. oceanica and T. weissflogii showing the number of transcripts differentially expressed in silicon limitation relative to nutrient replete conditions (>= 0.95 posterior probability of a 2-fold change). Numbers shown in the “Si specific” rows indicate the numbers of genes that are silicon-specifically differentially expressed and not part of a general stress response.

T. oceanica T. weissfogii Up 947 2814 Up Si specifc 823 557 Down 809 1467 Down Si specifc 539 801

81

Manuscript 2 2216

Formatted for publication in Marine Ecology Progress Series 2217

Thalassiosira diatoms display distinct, species-specific strategies for responding to 2218

changes in pCO2 2219

Joselynn Wallace1, Andrew L. King2, Gary H. Wilkfors3, Shannon L. Meseck3, 2220 Bethany D. Jenkins1,4 2221 2222 1 Department of Cell and Molecular Biology, University of Rhode Island, Kingston, 2223 RI, USA 2224 2225 2 NIVA Norwegian Institute for Water Research, Bergen, Norway 2226 2227 3 NOAA/NMFS, Northeast Fisheries Science Center, Milford Laboratory, Milford, 2228 CT, USA 2229 2230 4 Graduate School of Oceanography, University of Rhode Island, Narragansett, RI, 2231 USA 2232 2233

82

ABSTRACT 2234

Fossil fuel combustion has raised atmospheric pCO2 from ~277 ppm in 1750 to 2235 the present level of ~400 ppm. The ocean sequesters ~30% of that CO2, decreasing its 2236 buffering capacity and pH (ocean acidification). Diatoms are responsible for ~40% of 2237 oceanic primary production based upon photosynthetic fixation of CO2. This is aided 2238 by a carbon-concentrating mechanism (CCM), which uses bicarbonate transporters 2239

- - (BCTs) to take up HCO3 and carbonic anhydrases (CAs) to interconvert HCO3 and 2240

CO2. There are several distinct classes of BCTs and CAs found in diatoms, including 2241 the substrate-specific SLC4 and non-specific SLC26 BCTs, as well as the α, β, δ, γ, ζ, 2242 and θ CAs. The ability to regulate and conserve energetic costs associated with the 2243

CCM may influence diatom community composition in a higher CO2 future. CO2 2244 manipulation experiments were conducted with four Thalassiosira species – T. 2245 pseudonana, T. rotula, T. weissflogii, and T. oceanica, and transcriptomes were 2246 sequenced from the latter 3 species. T. rotula is a coastal diatom that may have 2247 experienced high CO2 during seasonal upwelling, while T. oceanica and T. weissflogii 2248 are open-ocean isolates that have likely experienced relatively stable CO2 conditions 2249

(Feely et al. 2008; Hofmann et al. 2011). These species displayed a range of growth 2250 rate changes across CO2 conditions (<230 ppm CO2 to >2900 ppm CO2), from no 2251 change across conditions (T. pseudonana), slower growth in lower CO2 (T. 2252 weissflogii), slower growth in higher CO2 (T. oceanica), and faster growth in higher 2253

CO2 (T. rotula) (King et al. 2015). The accompanying transcriptome evidence for T. 2254 rotula, T. weissflogii, and T. oceanica indicate that these differences in CO2 responses 2255 boil down to the ability to dynamically regulate the carbon concentrating mechanism 2256

83

while maintaining internal redox and ion homeostasis, as well as trade-offs between 2257 funneling excess CO2 into internal storage pools versus a higher growth rate. 2258

2259 INTRODUCTION 2260

Anthropogenic activity (fossil fuel emissions and land use changes) has led to 2261 an increase in atmospheric pCO2 concentration from ~277 ppm in 1750 to over 400 2262 ppm currently (Tans 2009; Le Quéré et al. 2016; Scripps 2017). Atmospheric CO2 2263 equilibrates with the ocean, which sequesters ~20% of anthropogenic CO2 through 2264 physical processes and biological activity and has been the only net sink for CO2 in the 2265 past 200 years (Sabine et al. 2004; Khatiwala et al. 2013). Phytoplankton are 2266 organisms that live in the surface ocean and convert CO2 into organic biomass via 2267 photosynthesis and serve as an important biological source of CO2 uptake. 2268

Phytoplankton cells that sink out of the lit ocean can export the CO2 formerly in the 2269 atmosphere, to the deep ocean. 2270

As global CO2 levels rise, the increased dissolved CO2 in the ocean disturbs 2271 carbonate chemistry and leads to a decrease in pH and buffering capacity, or ocean 2272 acidification (Caldeira & Wickett 2003; Sabine et al. 2004). Ocean acidification is 2273 thought to have negative consequences for some phytoplankton groups, like those that 2274 have calcareous skeletons comprised of calcium carbonate (e.g., Riebesell et al. 2000). 2275

The impact of ocean acidification on other groups of non-calcified phytoplankton, like 2276 diatoms that have silicified skeletons, is less clear (e.g., King et al. 2015). 2277

Diatoms are a phytoplankton group that perform roughly 40% of oceanic 2278 primary production via photosynthetic CO2 fixation (Nelson et al. 1995). Diatoms 2279 have experienced fluctuations in pCO2 on different temporal scales throughout their 2280

84

evolutionary history. Molecular clock estimates suggest that diatoms arose around 240 2281 million years ago (Sims et al. 2006; Medlin 2016). This coincides with the start of a 2282 glacial “hot-house” period, where atmospheric pCO2 levels were as high as 6000 ppm 2283

(Kump et al. 2009). In more recent history, diatoms have also experienced low pCO2: 2284 pre-industrial (before 1850) atmospheric pCO2 averaged ~270 ppm (Tans 2009). 2285

Diatoms can also experience fluctuations in pCO2 levels that vary on much shorter 2286 time scales. Coastal diatoms might experience dissolved pCO2 concentrations greater 2287 than 850 ppm during seasonal coastal upwelling events (Feely et al. 2008). The 2288 distribution patterns of diatoms in the ocean can also influence the pCO2 conditions 2289 they experience over the course of their life histories. Relative to coastal diatoms, 2290 diatoms living in the open-ocean experience more stable pCO2 conditions (Hofmann et 2291 al. 2011). 2292

Climate models suggest that pCO2 levels in the atmosphere may rise to ~1000 2293 ppm by the year 2100 if no measures are taken to reduce emissions (Solomon et al. 2294

2007). In light of the sensitivity of calcified phytoplankton to ocean acidification, 2295 diatoms may have an even larger role to play in the marine food web if no measures 2296 are taken to reduce CO2 emissions. Within the diatoms as well, there will likely be 2297 species that are ecological "winners" in adapting to elevated pCO2. Physiological 2298 studies examining how diatoms respond to changing pCO2 suggest that there is a wide 2299 range of physiological responses within the diatoms. Members of the same genera 2300 differ in their physiological response to PCO2 concentration. This is illustrated by the 2301 differences in specific growth rates of members of the Thalassiosira genus grown in 2302 different CO2 conditions. Some genera within this species seem to thrive in high CO2; 2303

85

the specific growth rate of T. rotula increased by at least 13% in elevated (690 ppm) 2304

CO2 (King et al. 2015). A similar trend is seen in T. weissflogii, which grows faster in 2305 high CO2 and slower in low pCO2 (King et al. 2015; Shi et al. 2010). The data for T. 2306 oceanica is less clear – in one study the specific growth rate is decreased by at least 2307

19% in high pCO2 (700 ppm) (King et al. 2015), while other studies indicate that the 2308 specific growth rate of this diatom is increased in high pCO2 (Shi et al. 2010). Other 2309

Thalassiosira diatoms appear to be indifferent to CO2 levels; the growth rate of T. 2310 pseudonana does not change with pCO2 (Crawfurd et al. 2011; King et al. 2015). 2311

One aspect of phytoplankton biology that may influence which groups are 2312 winners and losers in a higher CO2 future is their ability to regulate how much of their 2313 energy budgets they spend on inorganic carbon concentrating mechanisms (Giordano 2314 et al. 2005; Raven et al. 2014). During the dark reactions of photosynthesis, diatoms 2315 use the enzyme ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) to fix 2316

CO2 onto ribulose bisphosphate, producing 3-phosphoglycerate and fueling the Calvin 2317 cycle. However, RuBisCO can also fix O2 onto ribulose bisphosphate, producing 2-P- 2318 glycolate, fueling the less energetically efficient photorespiration pathway. Diatoms 2319 use carbon concentrating mechanisms (CCMs) to increase the supply of CO2 proximal 2320 to RuBisCO, to energetically favor carbon fixation rather than photorespiration. 2321

The use of carbon concentrating mechanisms and active transport of inorganic 2322 carbon requires a significant energetic investment by the cell (Raven 1991). If diatoms 2323 can downregulate CCM components when CO2 levels are high, this may result in the 2324 conservation of cellular energy as anthropogenic CO2 emissions rise. This energy 2325 conservation may confer a competitive advantage to some diatoms and other groups of 2326

86

phytoplankton and could influence community composition in the future. The ability 2327 or inability to regulate the CCM in response to fluctuating pCO2 may be related to 2328 how diatoms sense changes in pCO2. 2329

Diatom CCMs have been studied mainly in the model diatoms Thalassiosira 2330 pseudonana and Phaeodactylum tricornutum. Diatom CCMs depend on the activity of 2331 bicarbonate transporters and carbonic anhydrases -- particularly those enzymes 2332 localized to the plastid or the pyrenoid, a proteinaceous sub-compartment of the 2333 plastid where carbon fixation occurs (Hopkinson et al. 2011). Bicarbonate transporters 2334 allow membrane-impermeable bicarbonate to cross cell membranes, while carbonic 2335 anhydrases interconvert carbon dioxide and bicarbonate. This facilitates the diffusion 2336 of CO2 across membranes and also allows for the conversion of CO2 to bicarbonate to 2337 prevent it from leaking back out of the cell. 2338

- Carbonic anhydrases interconvert HCO3 and CO2. They allow diatoms to 2339 concentrate CO2 proximal to the carbon fixation enzyme RuBisCO, as they can 2340

- prevent CO2 leakage across membranes by controlling the interconversion of HCO3 2341 and CO2 (Hopkinson 2014; Hopkinson et al. 2016). Carbonic anhydrases can also 2342 facilitate CO2 diffusion into the cell if they are extracellular, and mitochondrial 2343 carbonic anhydrases may be involved in recuperating respired CO2 (Hopkinson 2014; 2344

Hopkinson et al. 2016). There are several distinct classes of carbonic anhydrases 2345 found in diatoms, termed (α, β, δ, γ, ζ, and θ), each with their own unique evolutionary 2346 history and distribution across the tree of life and within diatom groups (Hewett- 2347

Emmett & Tashian 1996; Tachibana et al. 2011; Samukawa et al. 2014; Kikutani et al. 2348

2016). For example, although the δ carbonic anhydrases have been found in 2349

87

Thalassiosira diatoms and other phytoplankton groups (e.g., haptophytes), they have 2350 not been found in P. tricornutum (McGinn & Morel 2008; Tachibana et al. 2011; Soto 2351 et al. 2006). The ζ carbonic anhydrases have only been found in Thalassiosira diatoms 2352

(Tachibana et al. 2011; Samukawa et al. 2014). Thus, carbon concentrating 2353 mechanism function may differ between diatoms as their genetic complement is 2354 variable. Diatom carbonic anhydrases are sensitive to CO2 conditions and proteomic 2355 and gene expression studies across pCO2 conditions indicate that the δ and ζ carbonic 2356 anhydrases play an important role in the CCM of Thalassiosira pseudonana 2357

(Tachibana et al. 2011; Samukawa et al. 2014; Clement et al. 2017; Hennon et al. 2358

2015; Crawfurd et al. 2011; McGinn & Morel 2008). There is some additional 2359 evidence that the α and γ carbonic anhydrases may also play a role in the CCM in T. 2360 pseudonana (Tachibana et al. 2011; Samukawa et al. 2014). 2361

Diatoms also differ as to whether they can use C4 metabolism as part of a 2362

CCM. C4 metabolism involves the carboxylation and decarboxylation of 4-carbon 2363 compounds and requires a carboxylating enzyme (phosphoenolpyruvate carboxylase) 2364 that is localized outside of the plastid and a decarboxylating enzyme 2365

(phosphoenolpyruvate carboxykinase or malic enzyme) that is plastid-localized 2366

(Reinfelder et al. 2000; Sage 2004). Transaminases, pyruvate phosphate dikinase, and 2367 malate dehydrogenase perform other conversions to ultimately regenerate 2368 phosphoenolpyruvate (Sage 2004). C4 metabolism has been implicated to be a 2369 potentially important aspect of CCMs in the diatom Thalassiosira weissflogii 2370

(Reinfelder et al. 2004; Roberts et al. 2007) but is not thought to be part of the CCM 2371 for the model diatom Phaeodactylum tricornutum (Haimovich-Dayan et al. 2013). 2372

88

Sensing of pCO2 in diatoms may be via the activity of cAMP secondary 2373 messengers (Harada et al. 2006; Ohno et al. 2012; Hennon et al. 2015). Adenylate 2374 cylases generate cAMP secondary messengers from ATP in response to signals 2375 received from bicarbonate (Buck et al. 1999). The cAMP secondary messengers 2376 activate cAMP-dependent protein kinases that phosphorylate and activate bZIP 2377 domain transcription factors that in turn bind to cAMP response elements and regulate 2378 downstream gene expression (Ohno et al. 2012). Phosphodiesterases regulate cellular 2379 cAMP levels by dephosphorylating cAMP back into AMP (Conti & Beavo 2007). 2380 cAMP response binding-element proteins are also activated by other signal and 2381 activator pairs, including calcium and calcium-calmodulin-dependent kinases (Mayr & 2382

Montminy 2001). Studies in P. tricornutum and T. pseudonana suggest that the 2383 activity of a soluble adenylyl cyclase is increased in the presence of CO2/HCO3, and 2384 that the resulting increase in cellular cAMP levels induces the activity of bZIP 2385 transcription factors (Harada et al. 2006; Ohno et al. 2012; Hennon et al. 2015). The 2386 transcription factors bind to upstream CO2-cAMP-responsive elements and repress the 2387 expression of CCM genes, including the expression of the adenylyl cyclase itself 2388

(Hennon et al. 2015). When cAMP function is inhibited chemically (Harada et al. 2389

2006) or by gene deletion (Ohno et al. 2012), the pCO2 sensitivity of the β carbonic 2390 anhydrases is repressed in diatoms. 2391

The majority of our molecular and genetic understanding regarding the 2392 metabolic adjustments diatoms may make in a high CO2 future ocean is largely shaped 2393 by studies in two model diatoms, Thalassiosira pseudonana and Phaeodactylum 2394 tricornutum. As these diatoms show significant differences in the metabolism they use 2395

89

to respond to changes in pCO2 (Hopkinson et al. 2016), we wanted to probe this 2396 response in more closely related diatoms that have broad ecological distributions, 2397 albeit in different ocean basins. In this study, we probed the genetic response of 2398

Thalassiosiroid diatoms (T. rotula CCMP3096, T. oceanica CCMP1005, and T. 2399 weissflogii CCMP1010) that demonstrate distinct physiological responses to different 2400 pCO2 concentrations using quantitative global transcriptome profiling. Our gene 2401 expression data is consistent with our previous physiological study (King et al. 2015) 2402 in that related diatom species utilize distinct complements of genes involved in their 2403

CO2-dependent growth response and carbon concentrating mechanisms. 2404

MATERIALS AND METHODS 2405

Culturing 2406

Culturing experiments were conducted as described elsewhere (King et al 2407

2015). Briefly, triplicate cultures were grown in f/2 media at three (T. rotula) or four 2408

(T. oceanica and T. weissflogii) pCO2 levels until growth rates stabilized (for 7-10 2409 generations) (Guillard & Ryther 1962; Guillard 1975). CO2 levels ranged from an 2410 approximation of: low/pre-industrial (less than 230 ppm pCO2), present-day ambient 2411 pCO2 in the open ocean (320-390 ppm pCO2), future pCO2 projections (690-1340 ppm 2412 pCO2), and the maximum that diatoms have experienced in their evolutionary history 2413

(2900-5100 ppm pCO2) (Sims et al. 2006; Medlin 2016; Kump et al. 2009). 2414

After 7-10 generations at the target CO2 level, cells were collected on 3 µm 2415 polyester filters (Sterlitech Corporation, Kent, WA, USA) by gentle filtration, flash 2416 frozen in liquid nitrogen, and stored at -80 °C until RNA could be extracted. RNA was 2417 extracted from filters following a modified version of the RNeasy Mini Kit protocol 2418

90

(Qiagen,Venlo, Netherlands). The protocol was modified by vortexing filters in lysis 2419 buffer with 0.5 mm zircon beads (BioSpec, Bartlesville, OK, USA) for 15 minutes to 2420 lyse the cells. Cell debris was immediately removed by running lysate over a 2421

QiaShredder column (Qiagen). DNA contamination was removed by an on-column 2422

DNAse step using the RNase-free DNase Set (Qiagen) and an additional DNAse step 2423 was performed on eluted RNA using the Turbo DNA-free kit (Life Technologies, 2424

Carlsbad, CA, USA), both performed as per the manufacturers’ instructions. Samples 2425 were quantified using the Qubit High-Sensitivity Kit (Life Technologies). 2426

Sequencing, mapping, differential expression 2427

Equal amounts of total RNA from each triplicate were pooled and sent to the 2428

JP Sulzberger Genome Sequencing Center at Columbia University. All sequencing 2429 was performed on an Illumina HiSeq platform and all libraries were mRNA enriched 2430 using a poly-A pull-down method (Illumina, San Diego, CA, USA). Target sequencing 2431 depth was 30 million, single-end, 100 basepair read length. 2432

Initial read quality assessments were performed in FastQC (Andrews 2010). 2433

Raw reads were subsequently imported into CLC Workbench (CLC Bio-Qiagen, 2434

Aarhus, Denmark) with an insert size ranging from 1-400 nucleotides. Reads were 2435 trimmed in CLC Workbench with the following parameters: removal of 13 bases off 2436 the 5’ end, .001 quality cut off, and no ambiguous nucleotide base calls. Reads were 2437 mapped to de novo transcriptomes that were assembled as described elsewhere 2438

(Wallace et al., in prep). Reads were mapped using the RNAseq module in CLC 2439

Workbench with the following parameters: mismatch cost (2), insertion cost (3), 2440 deletion cost (3), length fraction (0.8), similarity fraction (0.8), paired distances (auto- 2441

91

detect), strand specific (both), maximum number of hits for a read (10). Annotations 2442 from the transcriptome sequences were refined by running the translated protein 2443 sequences on the KEGG KAAS automated server (Moriya et al. 2007). The BLAST 2444 search program and bi-directional best hit options were selected. The sequences were 2445 queried against the “genes” data set, with the addition of the diatoms Thalassiosira 2446 pseudonana (tps) and Phaeodactylum tricornutum (pti). An additional annotation step 2447 was conducted using InterProScan with the pathways and goterms options selected 2448

(Jones et al. 2014). 2449

Differential gene expression was inferred with Analysis of Sequence Counts 2450

(ASC) from the total numbers of reads mapped to each transcript (Wu et al. 2010). 2451

Transcript expression levels in the present-day ambient open ocean pCO2 (320-390 2452 ppm) were compared to expression levels in low/pre-industrial (less than 230 ppm 2453 pCO2), future pCO2 projections (690-1340 ppm pCO2), and the maximum that diatoms 2454 have experienced in their evolutionary history (2900-5100 ppm pCO2). Transcripts 2455 with at least a 95% posterior probability of a 2-fold change were considered 2456 differentially expressed. 2457

Identification and comparison of orthologs 2458

Protein sequences for putative carbonic anhydrases, bicarbonate transporters, 2459 and additional genes potentially involved in C4 metabolism were used for 2460 phylogenetic analysis. An initial BlastP search was conducted to find potential 2461 orthologous genes in the translated transcriptomes. Any sequence meeting our 2462 threshold against any of these queries (35% ID, 80% query coverage per HSP cutoff) 2463

92

were considered an ortholog. We also included any sequence identified as a carbonic 2464 anhydrase or bicarbonate transporter by InterProScan (Jones et al. 2014). 2465

Resulting sequences from the T. oceanica transcriptome were cross-referenced 2466 to the T. oceanica genome (Lommer et al. 2012). Putative carbonic anhydrase, 2467 bicarbonate transporter, and C4 ortholog nucleotide sequences from the T. oceanica 2468 transcriptome were queried against the Ensembl Protist T. oceanica genome blast 2469 database using blastn. The output values for the top corresponding genomic locations 2470 are reported and if multiple top hits were reported, all of those hits were included. 2471

This data is summarized in Supplemental Table 3. 2472

For the carbonic anhydrases, the query sequences used were putative carbonic 2473 anhydrase sequences described in Thalassiosira pseudonana and Phaeodactylum 2474 tricornutum (Tachibana et al. 2011; Samukawa et al. 2014; Kikutani et al. 2016). Also 2475 included in this analysis were the eta type carbonic anhydrases from Plasmodium 2476 falciparum and the epsilon type carbonic anhydrases from Halothiobacillus 2477 neapolitanus (Del Prete et al. 2014; So et al. 2004). 2478

For the bicarbonate transporters, we used the SLC4 sequences from T. 2479 pseudonana and P. tricornutum (Nakajima et al. 2013). We queried the T. pseudonana 2480

JGI genome portal for “SLC26 family” keyword and used those sequences as the 2481 sequence queries and included them in the phylogeny (Armbrust et al. 2004). 2482

For genes with a potential role in C4 metabolism, the T. pseudonana JGI 2483 genome portal was queried for genes annotated as having a key role in C4 metabolism 2484

(Reinfelder et al. 2000; Sage 2004; Kustka et al. 2014; Armbrust et al. 2004). 2485

93

All alignments were done in MAFFT (version 7) using the L-INS-i strategy 2486 with the BLOSUM62 scoring matrix, the “leave gappy regions” option selected, and 2487 the MAFFT-homologs option selected for the bicarbonate transporter proteins (Katoh 2488

& Standley 2013). FASTA alignments were converted into PHYLIP format and 2489 imported into RaxML (Felsenstein 1989; Stamatakis 2014). Maximum likelihood trees 2490 were computed with the following parameters: PROTGAMMA matrix, BLOSUM62 2491 substitution model, and 1000 bootstrap replicates. The carbonic anhydrase tree was 2492 rooted using the gamma carbonic anhydrases because they are the most ancient 2493 carbonic anhydrase (Smith et al. 1999). The bicarbonate transporter tree was mid-point 2494 rooted in FigTree. 2495

Predicted Protein Localization 2496

SignalP (V4.1), TargetP (V1.1), and ASA-find (1.1.5) were used to determine 2497 the putative localization of differentially expressed genes and genes potentially 2498 involved in a carbon concentrating mechanism (Bendtsen et al. 2004; Emanuelsson et 2499 al. 2007; Gruber et al. 2015). TMHMM version 2.0 was used for prediction of 2500 transmembrane domains (Krogh et al. 2001). SignalP was run against the eukaryote 2501 organism group and the default cutoff values were used. TargetP was run with the 2502 plant model and “winner-takes-all” option selected. 2503

The input for these analyses were the translated protein sequences from the 2504 transcriptome assemblies as found by ORF-Predictor and reported elsewhere (Min, 2505

Butler, Storms, & Tsang, 2005, Wallace et al., in prep). 2506

The output of these predictions were interpreted as follows: only proteins 2507 predicted to be plastid targeted by ASA-find (low or high confidence) were considered 2508

94

imported across all four plastid membranes and into the stroma. If a protein was not 2509 plastid targeted but was SignalP positive, it was considered likely to be targeted across 2510 the chloroplast-ER (CER) membrane and retained in the CER compartment. If a 2511

SignalP negative protein was predicted by TargetP to be targeted to the mitochondria 2512 then that protein was considered targeted to the mitochondria (Kroth et al. 2008). 2513

Proteins that were not predicted to have a signal cleavage site and were not predicted 2514 to be localized to the mitochondria were considered cytosolic. If the sequence was 2515 predicted to be SignalP negative, have a transmembrane domain in TMHMM, and 2516 secreted as per TargetP, the localization for this sequence is considered potentially 2517 extracellular. 2518

RESULTS 2519

The three Thalassiosira diatoms studied here show very different physiological 2520 responses to changes in CO2 (King et al. 2015). Transcriptomes sequenced from the 2521 same incubations highlighted the different underlying transcriptional responses. T. 2522 oceanica, T. weissflogii, and T. rotula differentially express different numbers of 2523 genes across CO2 conditions (relative to the present-day approximation of 320-390 2524 ppm pCO2) (Table 1). T. oceanica shows a relatively muted response in global gene 2525 expression in <230 ppm and 690-1340 ppm pCO2 conditions compared to the 2900- 2526

5100 ppm pCO2 condition (Table 1). The larger shift in expression in high pCO2 2527 coincides with a decreased specific growth rate in 690-1340 and 2900-5100 ppm pCO2 2528 that is not reflected in any changes in the molecular ratios of C:N:P across pCO2 (King 2529 et al. 2015). T. weissflogii shows a more muted response in the 690-1340 ppm pCO2, 2530 differentially expressing only 156 genes (Table 1). This is in contrast to the <230 ppm 2531

95

and 2900-5100 ppm CO2 conditions, where there is differential expression of 581 and 2532

434 genes, respectively (Table 1). This corresponds to a decreased specific growth rate 2533 of T. weissflogii in <230 ppm pCO2 that is not reflected in any changes in the 2534 molecular ratios of C:N:P, which do not change across the pCO2 conditions tested 2535

(King et al 2015). The specific growth rate of T. rotula increases in 690-1320 ppm 2536 pCO2 relative to the specific growth rates in <230 and 320-390 ppm pCO2 (King et al. 2537

2015). There is also an increase in the C:P ratio in <230 ppm pCO2 (King et al. 2015). 2538

This corresponds to a stronger differential expression signal in the <230 ppm pCO2 2539 condition, where 988 genes are differentially expressed (Table 1). This is in contrast 2540 with the 690-1340 ppm pCO2 condition, where only 62 genes are differentially 2541 expressed (Table 1). There were no 2900-5100 ppm CO2 cultures, as the high CO2 and 2542 low cell density could not be maintained simultaneously (King et al. 2015). 2543

Species-specific metabolic rearrangement, redox and ion homeostasis in low CO2 2544

T. rotula grown in low pCO2 (<230 ppm) increases the expression of 2545 degradative genes (Supplemental Table 1), including fatty acid and branched chain 2546 amino acid degradation (Figure 1). There also is a low CO2 downregulation of 2547 biosynthetic reactions, chromosomes, translation machinery, and cell cycle 2548 progression (Supplemental Table 1). 2549

T. weissflogii grown in low pCO2 decreases the expression of gluconeogenesis 2550 and pentose phosphate pathway genes (Figure 1). There also is a low CO2 2551 downregulation of genes involved in amino acid, fatty acid, and tRNA biosynthesis 2552

(Supplemental Table 1) and an increase the expression of fatty acid and branched- 2553

96

chain amino acid degradation genes (although not as many genes as T. rotula) 2554

(Supplemental Table 1). 2555

The trends in T. rotula and T. weissflogii are in contrast to what is observed in 2556

T. oceanica, where there is not a clear reorganization of metabolism with changing 2557 pCO2. T. oceanica does not increase the expression of genes involved in degradative 2558 processes in low CO2, but increased the expression of a single gene of unknown 2559 function (Supplemental Table 1). Similarly, T. oceanica does not increase the 2560 expression of genes involved in biosynthetic pathways in high CO2 or decrease their 2561 expression in low CO2 (Supplemental Table 1). However, in high CO2, T. oceanica 2562 upregulates genes involved in transporting protons and cations as well as redox 2563 homeostasis genes (Figure 1). 2564

Bicarbonate transporters 2565

Copy number variation in bicarbonate transporters 2566

The 3 Thalassiosira diatom species considered in this study have different 2567 numbers of bicarbonate transporters. T. pseudonana has 2 SLC4 transporters 2568

(Armbrust et al. 2004; Nakajima et al. 2013), T. weissflogii is predicted to have 3, T. 2569 rotula 4, and T. oceanica 5 (Figure 2, Supplemental Table 2). T. weissflogii is 2570 predicted to have 6 SLC26 transporters, T. rotula 6, T. oceanica 4, and we recovered 2571

9 from the T. pseudonana genome (Figure 2, Supplemental Table 2) (Armbrust et al. 2572

2004). The number of bicarbonate transporter sequences recovered from our T. 2573 oceanica transcriptome are in agreement with what is observed in the T. oceanica 2574 genome (Lommer et al. 2012) (Supplemental Table 3). 2575

2576

97

Varied bicarbonate transporter localization predictions and CO2 sensitivity 2577

The 3 Thalassiosira diatom species in this study show distinct potential 2578 differences in localization of their bicarbonate transporters and expression patterns. 2579

All of the putative SLC4 and SLC26 transporters recovered here have at least 1 2580 transmembrane helix, suggesting that all are localized to a membrane (Supplemental 2581

Table 2). T. rotula and T. oceanica each have 1 plastid predicted SLC4 transporter, 1 2582 mitochondrial predicted SLC4 transporter, and 1 mitochondrial predicted SLC26 2583 transporter (Supplemental Table 2). In <230 ppm pCO2, T. rotula downregulates a 2584 cytosolic membrane predicted SLC4 transporter and upregulates a plastid predicted 2585

SLC4 transporter but does not differentially express SLC26 transporters across CO2 2586 conditions (Figure 3, Supplemental Table 2). T. oceanica downregulates 2 cytosolic 2587 membrane predicted SLC4 transporters in 2900-5100 ppm pCO2 but does not 2588 differentially express SLC26 transporters across CO2 conditions (Figure 3, 2589

Supplemental Table 2). The T. rotula and T. oceanica results are consistent with 2590 results from T. pseudonana, where an SLC4 transporter in the cytosolic membrane 2591 was downregulated in high CO2 (Hennon et al. 2015). In contrast, most of the SLC4 2592 and SLC26 transporters in T. weissflogii are predicted to be in the cytosolic 2593 membrane, save for 1 plastid predicted SLC26 transporter (Supplemental Table 2). In 2594

<230 ppm pCO2, T. weissflogii downregulates a plastid predicted SLC26 but does not 2595 differentially express SLC4 transporters across CO2 conditions (Figure 3, 2596

Supplemental Table 2). Interestingly, none of the SLC4 or SLC26 transporters from T. 2597 weissflogii are predicted to be localized to the mitochondria (Supplemental Table 2). 2598

2599

98

Carbonic Anhydrases 2600

Copy number variation in carbonic anhydrases 2601

The copy numbers and isoform distributions of the putative carbonic 2602 anhydrases from the assembled Thalassiosira transcriptomes vary across species 2603

(Table 2). The T. weissflogii transcriptome has the largest number of putative carbonic 2604 anhydrases (19), while the T. oceanica and T. rotula transcriptomes have 14 and 13, 2605 respectively, which is on par with the 14 putative carbonic anhydrases found in the T. 2606 pseudonana genome (Table 2) (Armbrust et al. 2004; Tachibana et al. 2011; 2607

Samukawa et al. 2014; Kikutani et al. 2016). T. weissflogii appears to be particularly 2608 enriched in its number of ζ carbonic anhydrases (4 copies), compared with the single 2609 copies found in T. pseudonana, T. oceanica, and T. rotula, while this class of carbonic 2610 anhydrase is absent entirely from P. tricornutum (Table 2) (Tachibana et al. 2011; 2611

Samukawa et al. 2014). Three of the T. weissflogii ζ carbonic anhydrases are in the 2612 same clade as the other diatoms, while the fourth T. weissflogii ζ carbonic anhydrase is 2613 basal to all the other ζ carbonic anhydrases (Figure 4). The T. weissflogii 2614 transcriptome also has a higher number of α carbonic anhydrases – 7 copies, compared 2615 to 2 in T. rotula, 3 in T. oceanica and T. pseudonana, and 5 in P. tricornutum (Table 2616

2) (Tachibana et al. 2011; Samukawa et al. 2014). The Thalassiosira α carbonic 2617 anhydrases are in clade separate from those in P. tricornutum (Figure 4). The T. 2618 weissflogii β carbonic anhydrase copy groups closely with the P. tricornutum copies 2619

(support value of 98) and the T. rotula copy also clusters with this group although with 2620 poor support values (support value of 23) (Figure 4). There is no evidence of θ, η, or ε 2621

99

carbonic anhydrases in our assembled transcriptomes (Figure 4) (Kikutani et al. 2016; 2622

Del Prete et al. 2014; So et al. 2004). 2623

The number of carbonic anhydrases recovered from the T. oceanica 2624 transcriptome are generally supported by the genomic evidence, with the exception of 2625 the δ carbonic anhydrases (Supplemental Table 3) (Lommer et al. 2012). Each of the 3 2626

α carbonic can be aligned to the genome, although 1 copy did not have a gene 2627 prediction overlapping at its mapping location (Supplemental Table 3). The 5 γ 2628 carbonic anhydrases and the single ζ carbonic anhydrase also map to distinct genomic 2629 locations and are supported by the gene model predictions (Supplemental Table 3). 2630

This congruence starts to break down when comparing the genome and transcriptome 2631

δ carbonic anhydrases -- 2 of the transcripts are potentially distinct alleles of the same 2632 gene, as they map to the same genetic location and their sequences differ only by 3 2633 nucleotides (Supplemental Table 3). Additionally, 2 of the transcripts both map 2634 equally well to the same 3 genomic locations, although in each of these scenarios the 2635 transcript sequence only covers part of the predicted gene model (Supplemental Table 2636

3). 2637

Varied carbonic anhydrase localization predictions 2638

Each diatom also has a unique pattern of predicted localization for their 2639 particular suite of carbonic anhydrases (Figure 5, Supplemental Table 2). T. oceanica 2640 does not have any carbonic anhydrases predicted to be targeted to the plastid or the 2641 chloroplast endoplasmic reticulum but has 5 mitochondrion predicted carbonic 2642 anhydrases (two δ and three γ) and the remainder are predicted to be cytoplasmic 2643

(Figure 5, Supplemental Table 2). T. rotula has a single plastid predicted α carbonic 2644

100

anhydrase, 1 mitochondrion predicted γ carbonic anhydrase, and the remainder are 2645 predicted to be cytoplasmic (Figure 5, Supplemental Table 2). T. weissflogii has 2 2646 chloroplast endoplasmic reticulum predicted α carbonic anhydrases, 3 mitochondrion 2647 predicted carbonic anhydrases (2γ and 1α), 1 extracellular predicted δ carbonic 2648 anhydrase, and the remainder are cytoplasmic (Figure 5, Supplemental Table 2). 2649

Carbonic anhydrase CO2 sensitivity 2650

The putative β, δ, and ζ carbonic anhydrases in T. weissflogii are generally the 2651 most CO2 sensitive (Figure 3, Supplemental Table 2). In <230 ppm pCO2, T. 2652 weissflogii upregulates 4 ζ carbonic anhydrases, 1 δ carbonic anhydrase, and 1 β 2653 carbonic anhydrase (Figure 3, Supplemental Table 2). The δ carbonic anhydrase is 2654 also downregulated in both 690-1340 ppm and 2900-5100 ppm pCO2. There is an 2655 additional δ carbonic anhydrase downregulated in both 690-1340 ppm and 2900-5100 2656 ppm pCO2 but not upregulated in <230 ppm pCO2 (Figure 3, Supplemental Table 2). 2657

In 2900-5100 ppm pCO2 there also is downregulation of 1 of the ζ carbonic 2658 anhydrases that is upregulated in <230 ppm pCO2 (Figure 3, Supplemental Table 2). 2659

The putative carbonic anhydrases in T. oceanica are not differentially 2660 expressed in <230 or 690-1340 ppm pCO2 but there is a downregulation of 1 α and 2 δ 2661 carbonic anhydrases in 2900-5100 ppm pCO2 (Figure 3, Supplemental Table 2). 2662

However, there also is an upregulation of the other 3 δ carbonic anhydrases in 2900- 2663

5100 ppm pCO2 (Figure 3, Supplemental Table 2). None of the putative carbonic 2664 anhydrases in T. rotula are differentially expressed in any CO2 condition tested 2665

(Figure 3, Supplemental Table 2). 2666

2667

101

2668

Putative C4 photosynthesis genes 2669

Each of the three diatom transcriptomes contain PEPC, PEPCK, ME, and PYC 2670

-- the key carboxylating and decarboxylating enzymes hypothesized to be critical for 2671 diatom C4 metabolism (Figure 6, Supplemental Table 2) (Reinfelder et al. 2000; Sage 2672

2004; Kustka et al. 2014). The localization and pCO2 sensitivity of these enzymes 2673 varies across species, but there is little compelling evidence that C4 enzymes are 2674 playing an important role in the CCM of these Thalassiosiroid diatoms (Supplemental 2675

Table 2). The putative C4 metabolism genes recovered from the T. oceanica 2676 transcriptome are supported by the genomic sequences (Lommer et al. 2012) 2677

(Supplemental Table 3). 2678

If PEPC mediated carboxylation is important for a functional CCM, PEPC 2679 should be localized outside the plastid and be upregulated in low pCO2 or 2680 downregulated in high CO2 (Figure 6). Consistent with the suggestion that PEPC plays 2681 a role in the diatom CCM, T. oceanica downregulates a putative cytosolic PEPC in 2682

2900-5100 ppm pCO2 (Supplemental Table 2). Similarly, in <230 ppm pCO2 T. 2683 weissflogii upregulates 1 putative mitochondrial PEPC and 1 putatively cytosolic 2684

PEPC (which is also downregulated in 2900-5100 ppm pCO2) (Supplemental Table 2). 2685

However, T. weissflogii also upregulates a PEPC that is predicted to be plastid 2686 localized in <230 ppm pCO2 (Supplemental Table 2). Although there is a T. rotula 2687

PEPC that is predicted to be localized to the mitochondria which consistent with a 2688 potential role in a C4 CCM, it is not differentially expressed in any CO2 condition 2689 tested here (Supplemental Table 2). 2690

102

If PEPCK mediated decarboxylation is important for a functional CCM, 2691

PEPCK should be plastid localized and be upregulated in low pCO2 or downregulated 2692 in high CO2 (Figure 6). The T. oceanica PEPCK is mitochondrion-predicted and is 2693 downregulated in 690-1340 ppm pCO2 but then upregulated in the 2900-5100 ppm 2694 pCO2 condition, suggesting it is not involved in decarboxylation steps of the CCM 2695

(Supplemental Table 2). The T. weissflogii PEPCK is upregulated in 2900-5100 ppm 2696 pCO2 but it is also mitochondrion predicted, suggesting it is also not involved in 2697 decarboxylation steps of the CCM (Supplemental Table 2). The T. rotula PEPCK is 2698 upregulated in <230 ppm pCO2, but it is predicted to be cytosolic, suggesting it is also 2699 not involved in decarboxylation steps of the CCM (Supplemental Table 2). 2700

If ME mediated decarboxylation is important for a functional CCM, ME 2701 should be plastid localized and be upregulated in low pCO2 or downregulated in high 2702

CO2. T. oceanica downregulates a putative cytosolic ME in 2900-5100 ppm pCO2, 2703 suggesting it is not involved in the CCM because of its predicted localization 2704

(Supplemental Table 2). T. weissflogii downregulates a putative mitochondrial ME in 2705

<230 ppm pCO2, suggesting it is also not involved in the CCM (Supplemental Table 2706

2). T. rotula does not differentially express ME and ME is not predicted to be plastid 2707 localized in this diatom (Supplemental Table 2), suggesting that ME decarboxylation 2708 is also not important for a functional CCM in this diatom. 2709

If PYC mediated decarboxylation is important for a functional CCM, PYC 2710 should be plastid localized and be upregulated in low pCO2 or downregulated in high 2711

CO2. T. weissflogii downregulates a PYC in <230 ppm pCO2 and downregulates 2712 another in 2900-5100 ppm pCO2 and both copies are predicted to be cytosolic 2713

103

(Supplemental Table 2). T. oceanica does not differentially express PYC and PYC is 2714 not plastid localized (Supplemental Table 2), suggesting PYC is not a key aspect of 2715 the CCM of this diatom. Although the PYC in T. rotula is not differentially expressed, 2716 it is plastid localized, so its potential role in the CCM is ambiguous (Supplemental 2717

Table 2). 2718

- cAMP secondary messengers and CO2/HCO3 sensing 2719

Like T. pseudonana, T. oceanica upregulates adenylyl cyclase (IPR001054) in 2720 high CO2 but also downregulates 6 additional adenylyl cyclases (Supplemental Table 2721

1) (Hennon et al. 2015). Querying the NCBI NR database with the upregulated 2722 adenylyl cyclase indicates that the top blast hit (after the T. oceanica genome 2723 accession EJK45951.1) is the soluble adenylyl cyclase that has been suggested to play 2724 a role in CO2 sensing in T. pseudonana (XP_002288198.1) (Hennon et al. 2015). 2725

Unlike T. oceanica and T. pseudonana, T. rotula and T. weissflogii do not 2726 differentially express adenylyl cyclases in response to pCO2 (Supplemental Table 1). 2727

In contrast to the T. pseudonana study, but consistent with maintaining high 2728 intracellular cAMP levels in high CO2, T. oceanica downregulates a 2729 phosphodiesterase in 2900-5100 ppm pCO2 (Supplemental Table 1) (Hennon et al. 2730

2015). T. oceanica also downregulates 3 bZIP domain transcription factors 2731

(IPR004827) in 2900-5100 ppm pCO2, consistent with the suggestion that cAMP- 2732 responsive transcription factors downregulate clusters of genes involved in the CCM 2733

(including their own expression) in high CO2 (Supplemental Table 1) (Hennon et al. 2734

2015). There is also a single cAMP-dependent kinase downregulated by T. oceanica in 2735

2900-5100 ppm pCO2, but there are 3 additional cAMP-dependent kinases that are 2736

104

upregulated as well (Supplemental Table 1). T. weissflogii downregulates a 2737 phosphodiesterase in low pCO2 (<230 ppm pCO2), which is consistent with the 2738

Hennon et al. 2015 study that saw an upregulation of phosphodiesterases in T. 2739 pseudonana in elevated CO2 (Supplemental Table 1) (Hennon et al. 2015) 2740

(Supplemental Table 1). In contrast to T. pseudonana (where cAMP-responsive 2741 transcription factors downregulate clusters of genes involved in the CCM, including 2742 their own expression in high CO2 (Hennon et al. 2015)), T. rotula and T. weissflogii 2743 downregulate bZIP domain transcription factor and upregulate cAMP-dependent 2744 protein kinases in low CO2 (<230 ppm pCO2) (Supplemental Table 1). This suggests 2745 that bZIP domain transcription factors are not increasing the expression of CCM gene 2746 clusters (including their own expression) in T. rotula and T. weissflogii in low CO2. 2747

This is supported by the increase of cAMP-dependent kinases in T. rotula and T. 2748 weissflogii in low CO2 (Supplemental Table 1), as it was suggested that cAMP- 2749 responsive elements would upregulate the CCM in low CO2, including their own 2750 expression (Hennon et al. 2015). 2751

DISCUSSION 2752

Key point of regulation for T. oceanica carbon concentrating mechanism: SLC4- 2753

- mediated HCO3 uptake into the cytoplasm 2754

The role of carbonic anhydrases in the CCM of T. oceanica is unclear, as these 2755 enzymes are up and downregulated the 2900-5100 ppm pCO2 condition, which 2756 approximates the highest levels diatoms have experienced over the course of their 2757 evolution (Figure 3, Supplemental Table 2) (Sims et al. 2006; Medlin 2016; Kump et 2758 al. 2009). T. pseudonana also upregulates carbonic anhydrases in high CO2 (Hennon et 2759

105

al. 2015), but the high CO2 condition used in that study (~800 ppm) is considerably 2760 lower than the high CO2 conditions where T. oceanica differentially expresses 2761 bicarbonate transporters or carbonic anhydrases (2900-5100 ppm) and T. oceanica 2762 doesn’t differentially express bicarbonate transporters or carbonic anhydrases in the 2763

690-1340 ppm CO2 condition that would be the closest approximation to the Hennon 2764

2015 study (Figure 3, Supplemental Table 2) (Hennon et al. 2015). 2765

The incongruence between the δ carbonic anhydrases recovered from the 2766 genome (Lommer et al. 2012) and transcriptome of T. oceanica may be explained by 2767 the observation that the δ carbonic anhydrases have undergone relatively recent, 2768 lineage-specific gene expansions (Shen et al. 2017). These recent gene duplication 2769 events may lead to species-specific gene expansions that may not have had time to 2770 diverge and would encode very similar transcripts that would likely be merged in de 2771 novo assemblies. 2772

Consistent with a potential role in C4-based CCM, T. oceanica decreases the 2773 expression of a cytosolic PEPC gene in high CO2 (Supplemental Table 2, Figure 6) 2774

(Reinfelder et al. 2000; Sage 2004; Kustka et al. 2014). However, the potential 2775 decarboxylating enzymes’ (PEPCK, ME, PYC) localization (ME and PYC) and/or 2776 expression (PEPCK and PYC) suggests that C4 metabolism is not part of the CCM of 2777

T. oceanica (Supplemental Table 2, Figure 6) (Reinfelder et al. 2000; Sage 2004; 2778

Kustka et al. 2014). 2779

The high CO2 downregulation of 2 cytosolic membrane SLC4 transporters 2780 suggests that exogenous bicarbonate uptake is the key point of regulation for T. 2781 oceanica’s CCM (Figure 3, Supplemental Table 2). This is supported by the higher 2782

106

number of SLC4 transporters found in T. oceanica (5) relative to the 3 found in T. 2783 weissflogii (this study) and T. pseudonana (Armbrust et al. 2004; Nakajima et al. 2784

2013) and the 4 found in T. rotula (this study). 2785

T. oceanica decreased growth rate in high CO2 due to ionic and redox stress 2786

We hypothesize that T. oceanica’s decreased growth rate in high CO2 is related 2787 to oxidative stress and perturbed intracellular ionic balance (Figure 1, Supplemental 2788

Table 1) (King et al. 2015). 2789

Ionic stress in T. oceanica grown in high CO2 may be explained by the 2790 downregulation of SLC4 transporters in high CO2 (Figure 3, Supplemental Table 2). 2791

The SLC4 transporters in diatoms require Na+ ions and are thought to be either Na+ co- 2792

+ - - transpoters or Na driven Cl / HCO3 exchangers (Nakajima et al. 2013). Their activity 2793 requires the movement of Na+ back out of the cell and/or Cl- back into the cell to 2794 maintain ionic gradients across cellular membranes (Nakajima et al. 2013). The 2795 downregulation of SLC4 transporters in high CO2 may perturb the intracellular 2796 balance of Na+ and/or Cl- in T. oceanica. 2797

The redox stress observed in T. oceanica may be related to the carbonic 2798 anhydrases, as the carbonic anhydrases in P. tricornutum are partly redox regulated 2799 via the activity of thioredoxin (Kikutani et al. 2012). T. oceanica’s regulation of redox 2800 enzymes in high pCO2 may indicate that this diatom uses similar redox regulation of 2801 carbonic anhydrase activity (Figure 1, Supplemental Table 1). 2802

2803

2804

107

Key point of regulation for T. weissflogii carbon concentrating mechanism: 2805 cytosolic-predicted β, δ, and ζ carbonic anhydrases preventing CO2 leakage 2806

In terms of the carbon concentrating mechanism, there is a tight regulation of 2807 cytosolic-predicted β, δ, and ζ carbonic anhydrases across CO2 conditions (Figure 3, 2808

Figure 5, Supplemental Table 2). This is consistent with studies in T. pseudonana and 2809

P. tricornutum that suggest the β, δ, and ζ classes of carbonic anhydrases are 2810 upregulated in low pCO2 and/or downregulated in high pCO2 (Tachibana et al. 2011; 2811

Samukawa et al. 2014; Clement et al. 2017; Hennon et al. 2015; Crawfurd et al. 2011; 2812

McGinn & Morel 2008). 2813

None of the T. weissflogii SLC4 or SLC26 bicarbonate transporters are 2814 downregulated in 690-1340 or 2900-5100 or upregulated in <230 ppm pCO2 (Figure 2815

3, Supplemental Table 2). This is consistent with proteomics studies in T. pseudonana 2816 that did not observe downregulation of SLC4 transporters in high pCO2 conditions 2817

(Clement et al. 2017), but is in contrast with a transcriptome study where T. 2818 pseudonana downregulated an SLC4 transporter in high pCO2 (Hennon et al. 2015). 2819

There is one SLC26 transporter predicted to be localized to the plastid that is 2820 downregulated in <230 ppm pCO2 in T. weissflogii (Figure 3, Supplemental Table 2), 2821 but it is possible that this transporter is involved in transporting other anions (e.g., 2822 sulfate) (Alper & Sharma 2013). 2823

Consistent with a potential role in a C4-based CCM, T. weissflogii decreases 2824 the expression of a cytosolic PEPC gene in high CO2 and increases its expression in 2825 low CO2 (Supplemental Table 2, Figure 6) (Reinfelder et al. 2000; Sage 2004; Kustka 2826 et al. 2014). However, there also is upregulation of a plastid predicted PEPC in low 2827

108

CO2, which would potentially compete with RuBisCO for CO2 and is in opposition to 2828 what is expected of a carbon concentrating mechanism (Supplemental Table 2, Figure 2829

6) (Reinfelder et al. 2000; Sage 2004; Kustka et al. 2014). PEPCK or ME mediated- 2830 decarboxylation is unlikely in T. weissflogii because both genes are predicted to be 2831 mitochondrial rather than plastid localized (Supplemental Table 2, Figure 6, 2832

(Reinfelder et al. 2000; Sage 2004; Kustka et al. 2014)). Further, they are upregulated 2833 in 2900-5100 ppm pCO2 and downregulated in <230 ppm pCO2, which is counter to 2834 what would be expected of the CCM (Supplemental Table 2, Figure 6) (Reinfelder et 2835 al. 2000; Sage 2004; Kustka et al. 2014). Although there is a T. weissflogii PYC gene 2836 that is downregulated in 2900-5100 ppm pCO2, its predicted localization to the cytosol 2837 suggest that it is not playing a role as a C4 decarboxylase, which would require 2838 localization to the plastid (Supplemental Table 2, Figure 6) (Reinfelder et al. 2000; 2839

Sage 2004; Kustka et al. 2014). 2840

Their predicted localization to the cytoplasm (Figure 3, Figure 5, Supplemental 2841

Table 2) suggests that similar to T. oceanica, the key point of regulation for the T. 2842 weissflogii carbon concentrating mechanism is the flux of inorganic through the 2843

- cytoplasm. While this is mediated by SLC4 HCO3 transporters in T. oceanica, T. 2844 weissflogii seems to focus more on diffusive CO2 uptake and using carbonic 2845 anhydrases to prevent CO2 leakage out of the cell. 2846

The presence of both β and ζ carbonic anhydrases in T. weissflogii is 2847 interesting, as neither of the two model diatoms examined thus far (T. pseudonana and 2848

P. tricornutum) have both the β and ζ carbonic anhydrases – T. pseudonana has a ζ but 2849 not a β, and the reverse is true for P. tricornutum (Tachibana et al. 2011; Samukawa et 2850

109

al. 2014) (Table 2). It has also been noted that the β and ζ carbonic anhydrases are 2851 found in relatively few diatom transcriptomes compared to the more commonly found 2852

α, δ, and γ carbonic anhydrases, highlighting the rarity of the scenario where a diatom 2853 simultaneously has both of these carbonic anhydrases (Shen et al. 2017). However, it 2854 is possible that β and ζ carbonic anhydrase are more widely distributed than it would 2855 seem from available transcriptome data if their transcripts are too lowly expressed to 2856 be recovered by assemblies that do not include a low pCO2 condition which may be 2857 required to induce their expression. 2858

T. weissflogii energetic trade-offs across CO2 conditions 2859

We hypothesize that the energetic cost of upregulating the carbonic anhydrases 2860 may be the cause of the decreased specific growth rate in <230 ppm pCO2 (Figure 3, 2861

Supplemental Table 2) (Giordano et al. 2005; Raven et al. 2014; King et al. 2015). T. 2862 weissflogii grown in low pCO2 also decreases the expression of the pentose phosphate 2863 pathway (Figure 1, Supplemental Table 1), fatty acid synthesis (Supplemental Table 2864

1), and glycolysis/gluconeogenesis (Figure 1, Supplemental Table 1) suggesting that 2865 in higher pCO2 it may divert carbon and reducing power toward the synthesis of 2866 storage compounds like fatty acids and/or glucose/chrysolaminarin instead of an 2867 increased specific growth rate, as seen in T. rotula grown in high pCO2 (King et al. 2868

2015). 2869

Key point of regulation for T. rotula carbon concentrating mechanism: SLC4- 2870

- mediated HCO3 uptake into the plastid 2871

T. rotula does not differentially express any carbonic anhydrases in any CO2 2872 condition tested here (Figure 3, Supplemental Table 2). However, some of the δ, γ, 2873

110

and ζ carbonic anhydrases are highly expressed and perhaps are not differentially 2874 expressed because they are constitutively expressed across CO2 conditions, as has 2875 been observed for a plastid localized α carbonic anhydrases of T. pseudonana (Figure 2876

3, Supplemental Table 2) (Hennon et al. 2015). 2877

T. rotula does upregulate a plastid-predicted SLC4 transporter in <230 ppm 2878 pCO2, but it also downregulates a cytosolic SLC4 transporter in the same condition 2879

(Figure 3, Supplemental Table 2). This is in contrast with T. pseudonana, which 2880 downregulates a cytoplasmic SLC4 gene in high CO2 and seems to rely on a 2881

- bestrophin putative ion channel to transport HCO3 into the plastid (Hennon et al. 2882

2015). 2883

There is little evidence of a C4-like CCM in T. rotula, which does not 2884 differentially express PEPC across CO2 conditions. Its localization to the mitochondria 2885 is also not consistent with a carboxylase role in a C4-based CCM unless it is 2886 functioning as a means to recover respired CO2 (Supplemental Table 2, Figure 6) 2887

(Reinfelder et al. 2000; Sage 2004; Kustka et al. 2014). The PEPCK, ME, and PYC 2888 enzymes that could act as decarboxylases are either not CO2 sensitive or not localized 2889 to the plastid (Reinfelder et al. 2000; Sage 2004; Kustka et al. 2014). There is a plastid 2890 localized PYC, and while lack of CO2 sensitivity makes its role in a C4-based CCM 2891 ambiguous, its high expression suggests it may be constitutively expressed 2892

(Supplemental Table 2, Figure 6) (Reinfelder et al. 2000; Sage 2004; Kustka et al. 2893

2014). 2894

These data suggest that T. rotula may rely more on SLC4 uptake into the 2895 plastid for their CCM rather than uptake into the cytosol or the activity of carbonic 2896

111

anhydrases, while the potential role of a C4-like CCM in this diatom remains 2897 ambiguous. 2898

T. rotula relies on degradation pathways to sustain growth in low CO2 2899

The upregulation of genes involved in degrading amino and fatty acids in low 2900 pCO2 suggests that T. rotula is relying on these catabolic reactions to fuel growth 2901 when inorganic carbon is less readily available (Figure 1, Supplemental Table 1). The 2902 slower growth rate in 320-390 ppm pCO2 and <230 ppm pCO2 might indicate that T. 2903 rotula is becoming carbon limited in these conditions, but the increased C:P ratio in 2904

<230 ppm pCO2 suggests that T. rotula is probably not carbon limited but is 2905 potentially phosphate limited in low pCO2 (King et al. 2015). T. rotula does not 2906 upregulate alkaline phosphatases or phosphate transporters in low pCO2, genes that are 2907 indicative of phosphate limitation in T. pseudonana (Dyhrman et al. 2012), but there is 2908 an upregulation of a serine/threonine protein phosphatase protein that cleaves 2909 phosphate groups from serine and threonine residues (Supplemental Table 1). T. rotula 2910 doesn’t increase the expression of genes indicative of increased carbon storage in high 2911

CO2 (Supplemental Table 1) consistent with the suggestion that it funnels excess 2912 carbon toward an increased growth rate rather than synthesis of storage compounds as 2913 suggested for T. weissflogii (King et al. 2015). 2914

Evidence for cAMP-mediated CO2 sensing in T. oceanica, but not T. weissflogii or 2915

T. rotula 2916

Evidence suggests diatoms sense external CO2 levels via the activity of cAMP 2917 secondary messengers (Ohno et al. 2012; Hennon et al. 2015). This seems to be true of 2918

112

T. oceanica, but there is little evidence of cAMP mediated CO2 sensing in T. rotula 2919 and T. weissflogii. 2920

One possible complicating factor is the cAMP-mediated cross-talk between 2921

CO2 and light sensing (P. tricornutum diatoms grown in low light are less able to 2922 upregulate β carbonic anhydrases in response to low pCO2 (Tanaka et al. 2016)). The 2923 experiments described in this study were conducted in growth-saturating light 2924 conditions, so a low-light response is not anticipated here (King et al. 2015). However, 2925 the role of cAMP secondary messengers in diatom biology is not completely 2926 understood and there may be additional cross-talk of cAMP (or other signaling 2927 pathways) that obscure further indication of CO2 mediated cAMP signaling in T. 2928 rotula and T. weissflogii. An additional possibility is that T. rotula and T. weissflogii 2929 sense changes in external pCO2 through some other mechanism that does not utilize 2930 adenylyl cylases or cAMP as CO2 sensors. 2931

CONCLUSIONS 2932

The results of this study suggest that the gene complement, subcellular 2933 localization, and CO2 sensitivity of the CCM varies not only across diatom genera 2934

(e.g., Thalassiosira versus Phaeodactylum) (Tachibana et al. 2011; Samukawa et al. 2935

2014), but can also very across species within the same diatom genera. This is 2936 consistent with the hypothesis that historic fluctuations in atmospheric CO2 have led to 2937 a diversification and re-configuation of the CCM within the diatoms (Young et al. 2938

2016; Shen et al. 2017). This study suggests that this diversification has occurred even 2939 in diatoms within the same genus. Additional studies of diatom genomes or low CO2 2940 transcriptomes will also be important for understanding whether or not the presence of 2941

113

both β and ζ carbonic anhydrases in T. weissflogii (Table 2, Figure 5, Supplemental 2942

Table 2) is as anomalous as it would seem based on current available diatom genomic 2943 and transcriptomic evidence. 2944

Different species’ strategies for balancing carbon storage versus carbon 2945 spending (e.g., increased growth rate) may influence community composition and a 2946 species’ ability to cope with additional stressors in a high CO2 future. For example, a 2947 larger internal pool of fatty acids and lipids may help T. weissflogii to survive periods 2948 of phosphorus limitation in high CO2 by allowing access to an intracellular pool of 2949 phospholipids (Dyhrman et al. 2012). Additional mesocosm experiments applying 2950 multiple stressors (e.g., phosphate limitation and elevated CO2) to these naturally 2951 occurring phytoplankton assemblages may give us a better understanding of how 2952 contrasting, species-specific strategies impact competitive dynamics and diatom 2953 community composition. 2954

Many of the conclusions about the diatom CCM presented here rely on 2955 computational predictions of sub-cellular protein localization. The computational 2956 predictions of sub-cellular localization of diatom proteins do not always agree with 2957 observed localizations (Samukawa et al. 2014), so the predicted localizations 2958 presented here (Figure 5, Supplemental Table 2) should be experimentally validated in 2959 future studies. Additionally, the protein products of putative CCM genes requires 2960 biochemical characterization to confirm their enzymatic activity. 2961

The hypothesis that T. oceanica uses cAMP mediated CO2 sensing (but T. 2962 rotula and T. weissflogii don’t) could be tested by tracking the expression of the CCM 2963 genes (carbonic anhydrases and bicarbonate transporters) across a range of CO2 2964

114

conditions with and without the addition of the phosphodiesterase inhibitor 3-isobutyl- 2965

1-methylxanthine (Hennon et al. 2015). These studies could be particularly useful if 2966 they are combined with measurements of carbon assimilation rates via 2967 photosynthetron P versus E cruves (phototosynthesis-irradiance response curves) 2968

(Lewis & Smith 1983). 2969

2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005

115

References 3006 3007 Alper, S.L. & Sharma, A.K., 2013. The SLC26 Gene Family of Anion Transporters 3008

and Channels Seth. Molecular Aspects of Medicine, 34(2–3), pp.494–515. 3009

Andrews, S., 2010. FastQC: A quality control tool for high throughput sequence data. 3010

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ 3011

Armbrust, E.V. et al., 2004. The Genome of the Diatom Thalassiosira pseudonana: 3012

Ecology, Evolution, and Metabolism. Science, 306(October), pp.79–86. 3013

Bendtsen, J.D. et al., 2004. Improved prediction of signal peptides: SignalP 3.0. 3014

Journal of Molecular Biology, 340(4), pp.783–795. 3015

Buck, J. et al., 1999. Cytosolic adenylyl cyclase defines a unique signaling molecule 3016

in mammals. Proceedings of the National Academy of Sciences, 96(1), pp.79–84. 3017

Caldeira, K. & Wickett, M.E., 2003. Oceanography: anthropogenic carbon and ocean 3018

pH. Nature, 425(6956), p.365. 3019

Clement, R. et al., 2017. Responses of the marine diatom Thalassiosira pseudonana to 3020

changes in CO2 concentration : a proteomic approach. Nature Publishing Group, 3021

(February), pp.1–12. 3022

Conti, M. & Beavo, J., 2007. Biochemistry and Physiology of Cyclic Nucleotide 3023

Phosphodiesterases: Essential Components in Cyclic Nucleotide Signaling. 3024

Annual Review of Biochemistry, 76(1), pp.481–511. 3025

Crawfurd, K.J. et al., 2011. The response of Thalassiosira pseudonana to long-term 3026

exposure to increased CO2 and decreased pH. PLoS ONE, 6(10), pp.1–9. 3027

Dyhrman, S.T. et al., 2012. The transcriptome and proteome of the diatom 3028

Thalassiosira pseudonana reveal a diverse phosphorus stress response. PLoS 3029

116

ONE, 7(3). 3030

Emanuelsson, O. et al., 2007. Locating proteins in the cell using TargetP, SignalP and 3031

related tools. Nature Protocols, 2(4), pp.953–971. 3032

Feely, R. a et al., 2008. Evidence for upwelling of corrosive “acidified” water onto the 3033

continental shelf. Science, 320(5882), pp.1490–1492. 3034

Felsenstein, J., 1989. PHYLIP - Phylogeny inference package - v3.2. Cladistics, 3035

pp.164–166. 3036

Giordano, M., Beardall, J. & Raven, J.A., 2005. CO2 CONCENTRATING 3037

MECHANISMS IN ALGAE: Mechanisms, Environmental Modulation, and 3038

Evolution. Annual Review of Plant Biology, 56(1), pp.99–131. 3039

Gruber, A. et al., 2015. Plastid proteome prediction for diatoms and other algae with 3040

secondary plastids of the red lineage. Plant Journal, 81(3), pp.519–528. 3041

Guillard, R.R.L., 1975. Culture of Phytoplankton for Feeding Marine Invertebrates. In 3042

Culture of Marine Invertebrate Animals. pp. 29–60. 3043

Guillard, R.R.L. & Ryther, J.H., 1962. Studies of marine planktonic diatoms I. 3044

Cyclotella nana Hustedt, and Detonula confervasea (Cleve) Gran. Canadian 3045

Journal of Microbiology, 8(1140), pp.229–239. 3046

Haimovich-Dayan, M. et al., 2013. The role of C4 metabolism in the marine diatom 3047

Phaeodactylum tricornutum. New Phytologist, 197(1), pp.177–185. 3048

Harada, H. et al., 2006. CO2 sensing at ocean surface mediated by cAMP in a marine 3049

diatom. Plant physiology, 142(3), pp.1318–1328. 3050

Hennon, G.M.M. et al., 2015. Diatom acclimation to elevated CO2 via cAMP 3051

signalling and coordinated gene expression. Nature Climate Change, 5(August), 3052

117

pp.761–766. 3053

Hewett-Emmett, D. & Tashian, R.E., 1996. Functional diversity, conservation, and 3054

convergence in the evolution of the α-, β-, and γ-carbonic anhydrase gene 3055

families. Molecular Phylogenetics and Evolution, 5(1), pp.50–77. 3056

Hofmann, G.E. et al., 2011. High-Frequency Dynamics of Ocean pH : A Multi- 3057

Ecosystem Comparison. PLoS ONE, 6(12). 3058

Hopkinson, B.M., 2014. A chloroplast pump model for the CO2 concentrating 3059

mechanism in the diatom Phaeodactylum tricornutum. Photosynthesis Research, 3060

121(2–3), pp.223–233. 3061

Hopkinson, B.M. et al., 2011. Efficiency of the CO2-concentrating mechanism of 3062

diatoms. Proceedings of the National Academy of Sciences, 108(10), pp.3830– 3063

3837. 3064

Hopkinson, B.M., Dupont, C.L. & Matsuda, Y., 2016. The physiology and genetics of 3065

CO2 concentrating mechanisms in model diatoms. Current Opinion in Plant 3066

Biology, 31, pp.51–57. 3067

Jones, P. et al., 2014. InterProScan 5: Genome-scale protein function classification. 3068

Bioinformatics, 30(9), pp.1236–1240. 3069

Katoh, K. & Standley, D.M., 2013. MAFFT multiple sequence alignment software 3070

version 7: Improvements in performance and usability. Molecular Biology and 3071

Evolution, 30(4), pp.772–780. 3072

Khatiwala, S. et al., 2013. Global ocean storage of anthropogenic carbon. 3073

Biogeosciences, 10(4), pp.2169–2191. 3074

Kikutani, S. et al., 2012. Redox regulation of carbonic anhydrases via thioredoxin in 3075

118

chloroplast of the marine diatom Phaeodactylum tricornutum. Journal of 3076

Biological Chemistry, 287(24), pp.20689–20700. 3077

Kikutani, S. et al., 2016. Thylakoid luminal θ-carbonic anhydrase critical for growth 3078

and photosynthesis in the marine diatom Phaeodactylum tricornutum. 3079

Proceedings of the National Academy of Sciences, 113(35), pp.9828–9833. 3080

King, A.L. et al., 2015. Effects of CO2 on growth rate, C:N:P, and fatty acid 3081

composition of seven marine phytoplankton species. Marine Ecology Progress 3082

Series, 537, pp.59–69. 3083

Krogh, A. et al., 2001. Predicting transmembrane protein topology with a hidden 3084

markov model: application to complete genomes. Journal of Molecular Biology, 3085

305(3), pp.567–580. 3086

Kroth, P.G. et al., 2008. A Model for Carbohydrate Metabolism in the Diatom 3087

Phaeodactylum tricornutum Deduced from Comparative Whole Genome 3088

Analysis. PLoS ONE, 3(1), p.e1426. 3089

Kump, L.R., Bralowe, T.J. & Ridgwell, A., 2009. Ocean Acidification in Deep Time. 3090

Oceanography, 22(4), pp.94–107. 3091

Kustka, A.B. et al., 2014. Low CO2 results in a rearrangement of carbon metabolism 3092

to support C4 photosynthetic carbon assimilation in Thalassiosira pseudonana. 3093

New Phytologist, 204(3), pp.507–520. 3094

Lewis, M. & Smith, J., 1983. A small volume, short-incubation-time method for 3095

measurement of photosynthesis as a function of incident irradiance. Marine 3096

Ecology Progress Series, 13, pp.99–102. 3097

Lommer, M. et al., 2012. Genome and low-iron response of an oceanic diatom adapted 3098

119

to chronic iron limitation. Genome Biology, 13(7), p.R66. 3099

Mayr, B. & Montminy, M., 2001. Transcriptional regulation by the phosphorylation- 3100

dependent factor CREB. Nature reviews. Molecular cell biology, 2(8), pp.599– 3101

609. 3102

McGinn, P.J. & Morel, F.M.M., 2008. Expression and regulation of carbonic 3103

anhydrases in the marine diatom Thalassiosira pseudonana and in natural 3104

phytoplankton assemblages from Great Bay, New Jersey. Physiologia plantarum, 3105

133(1), pp.78–91. 3106

Medlin, L.K., 2016. Evolution of the diatoms: major steps in their evolution and a 3107

review of the supporting molecular and morphological evidence. Phycologia, 3108

55(1), pp.79–103. 3109

Min, X.J. et al., 2005. OrfPredictor: Predicting protein-coding regions in EST-derived 3110

sequences. Nucleic Acids Research, 33(SUPPL. 2), pp.677–680. 3111

Moriya, Y. et al., 2007. KAAS: An automatic genome annotation and pathway 3112

reconstruction server. Nucleic Acids Research, 35(SUPPL.2), pp.182–185. 3113

Nakajima, K., Tanaka, A. & Matsuda, Y., 2013. SLC4 family transporters in a marine 3114

diatom directly pump bicarbonate from seawater. Proceedings of the National 3115

Academy of Sciences of the United States of America, 110(5), pp.1767–72. 3116

Nelson, D.M. et al., 1995. Production and dissolution of biogenic silica in the ocean: 3117

Revised global estimates, comparison with regional data and relationship to 3118

biogenic sedimentation. Global Biogeochemical Cycles, 9(3), pp.359–372. 3119

Ohno, N. et al., 2012. CO2-cAMP-Responsive cis-Elements Targeted by a 3120

Transcription Factor with CREB/ATF-Like Basic Zipper Domain in the Marine 3121

120

Diatom Phaeodactylum tricornutum. Plant Physiology, 158(1), pp.499–513. 3122

Del Prete, S. et al., 2014. Discovery of a new family of carbonic anhydrases in the 3123

malaria pathogen Plasmodium falciparum--the η-carbonic anhydrases. 3124

Bioorganic & medicinal chemistry letters, 24(18), pp.4389–96. 3125

Le Quéré, C. et al., 2016. Global Carbon Budget 2016. Earth System Science Data, 3126

8(2), pp.605–649. 3127

Raven, J., 1991. Physiology of inorganic carbon acquisition and implications for 3128

resource use efficiency by marine phytoplankton: Relation to increased carbon 3129

dioxide and temperature. Plant, Cell and Environment, 14(8), pp.779–794. 3130

Raven, J.A., Beardall, J. & Giordano, M., 2014. Energy costs of carbon dioxide 3131

concentrating mechanisms in aquatic organisms. Photosynthesis Research, 3132

121(2–3), pp.111–124. 3133

Reinfelder, J.R., Kraepiel, a M. & Morel, F.M., 2000. Unicellular C4 photosynthesis 3134

in a marine diatom. Nature, 407(October), pp.996–999. 3135

Reinfelder, J.R., Milligan, A.J. & Morel, F.M.M., 2004. The Role of the C 4 Pathway 3136

in Carbon Accumulation and Fixation in a Marine Diatom. Plant Physiology, 3137

135(August), pp.2106–2111. 3138

Riebesell, U. et al., 2000. Reduced calcification of marine plankton in response to 3139

increased atmospheric CO2. Nature, 407(6802), pp.364–7. 3140

Roberts, K. et al., 2007. C3 and C4 Pathways of Photosynthetic Carbon Assimilation 3141

in Marine Diatoms Are under Genetic, Not Environmental, Control. Plant 3142

Physiology, 145(1), pp.230–235. 3143

Sabine, C.L. et al., 2004. The Oceanic Sink for Anthropogenic CO2. Science, 3144

121

305(5682), pp.367–371. 3145

Sage, R.F., 2004. The evolution of C4 photosynthesis. New Phytologist, 161(2), 3146

pp.341–370. 3147

Samukawa, M. et al., 2014. Localization of putative carbonic anhydrases in the marine 3148

diatom, Thalassiosira pseudonana. Photosynthesis Research, 121(2–3), pp.235– 3149

249. 3150

Scripps, 2017. The Keeling Curve: Carbon Dioxide Measurements at Mauna Loa. 3151

Scripps Institute of Oceanography. 3152

Shen, C., Dupont, C.L. & Hopkinson, B.M., 2017. The diversity of carbon dioxide- 3153

concentrating mechanisms in marine diatoms as inferred from their genetic 3154

content. Journal of Experimental Botany. 3155

Shi, D. et al., 2010. Effect of Ocean Acidification on Iron Availability to Marine 3156

Phytoplankton. Science, 327(5966), pp.676–679. 3157

Sims, P.A., Mann, D.G. & Medlin, L.K., 2006. Evolution of the diatoms: insights 3158

from fossil, biological and molecular data. Phycologia, 45(4), pp.361–402. 3159

Smith, K.S. et al., 1999. Carbonic anhydrase is an ancient enzyme widespread in 3160

prokaryotes. Proceedings of the National Academy of Sciences of the United 3161

States of America, 96(26), pp.15184–9. 3162

So, A.K.C. et al., 2004. A Novel Evolutionary Lineage of Carbonic Anhydrase (ε 3163

Class) Is a Component of the Carboxysome Shell. Journal of Bacteriology, 3164

186(3), pp.623–630. 3165

Solomon, S. et al., 2007. IPCC 2007 Climate Change 2007. In The Physical Science 3166

Basis. Contribution of Working Group I to the Fourth Assessment Report of the 3167

122

Intergovernmental Panel on Climate Change. Cambridge: Cambridge University 3168

Press. 3169

Soto, A.R. et al., 2006. Identification and preliminary characterization of two cDNAs 3170

encoding unique carbonic anhydrases from the marine alga . 3171

Applied and Environmental Microbiology, 72(8), pp.5500–5511. 3172

Stamatakis, A., 2014. RAxML version 8: A tool for phylogenetic analysis and post- 3173

analysis of large phylogenies. Bioinformatics, 30(9), pp.1312–1313. 3174

Tachibana, M. et al., 2011. Localization of putative carbonic anhydrases in two marine 3175

diatoms, Phaeodactylum tricornutum and Thalassiosira pseudonana. 3176

Photosynthesis Research, 109(1–3), pp.205–221. 3177

Tanaka, A. et al., 2016. Light and CO2 /cAMP Signal Cross Talk on the Promoter 3178

Elements of Chloroplastic β-Carbonic Anhydrase Genes in the Marine Diatom 3179

Phaeodactylum tricornutum. Plant Physiology, 170(2), pp.1105–1116. 3180

Tans, P., 2009. An Accounting of the Observed Increase in Oceanic and Atmospheric 3181

CO2 and the Outlook for the Future. Oceanography, 22(4), pp.26–35. 3182

Wu, Z. et al., 2010. Empirical bayes analysis of sequencing-based transcriptional 3183

profiling without replicates. BMC Bioinformatics, 11(1), p.564. 3184

Young, J.N. et al., 2016. Large variation in the Rubisco kinetics of diatoms reveals 3185

diversity among their carbon-concentrating mechanisms. Journal of Experimental 3186

Botany, 67(11), pp.3445–3456. 3187

3188

3189

3190

123

3191 T. weissfogii low projectedmax Figure 1. A heat map dihydrolipoamide dehydrogenase 4 pyruvate kinase showing a subset of fructose-1,6-bisphosphatase I the differentially aldehyde dehydrogenase log2 fold change expressed genes from glucose-6-phosphate 1-epimerase 2 triosephosphate isomerase T. weissflogii, T. glyceraldehyde 3-phosphate dehydrogenase rotula, and T. pyruvate dehydrogenase E2 component 0 oceanica. Warmer phosphoglycerate kinase colors in the heat map phosphoglucomutase pyruvate kinase indicate an transketolase -2 upregulation and a ribose 5-phosphate isomerase cooler color indicates ribose 5-phosphate isomerase a downregulation. “Low” refers to the T. rotula <230 ppm CO2 low projected 4 condition, “projected”

3-hydroxyacyl-CoA dehydrogenase log2 fold change butyryl-CoA dehydrogenase refers to the 690-1340 2 acetyl-CoA C-acetyltransferase ppm pCO2 condition, acetyl-CoA acyltransferase 2 and “max” refers to acetyl-CoA acyltransferase 0 enoyl-CoA hydratase the 2900-5100 ppm long-chain 3-hydroxyacyl-CoA dehydrogenase -2 pCO2 condition. short/branched chain acyl-CoA dehydrogenase

T. oceanica low projectedmax 4 cation/H+ exchanger cation/H+ exchanger H+-transporting ATPase caspase-like domain

thiol-activated cytolysin log2 fold change 2 thioredoxin domain thioredoxin domain thioredoxin reductase thioredoxin-like fold thioredoxin-like fold 0 thioredoxin-like fold peroxiredoxin Q/BCP glutathione S-transferase glutathione S-transferase haem peroxidase -2 haem peroxidase catalase-peroxidase

124

3192

Figure 2. A maximum likelihood phylogeny of the bicarbonate transporters from the model diatoms T. pseudonana (JGI Thaps3 gene IDs) and P. tricornutum (JGI IDs or NCBI accession numbers), the transcriptomes assembled in this study (indicated by asterisks), and MAFFT homologs included for reference. All alignments were done in MAFFT (version 7) using the L-INS-i strategy with the BLOSUM62 scoring matrix, the “leave gappy regions” option selected, and the MAFFT-homologs option selected (Katoh & Standley 2013). RAxML maximum likelihood trees were computed with the following parameters: PROTGAMMA matrix, BLOSUM62 substitution model, and 1000 bootstrap replicates. The tree was mid-point rooted in FigTree. Support values less than 70 are not shown.

125

3193

126

3194 3195 3196 3197 Figure 3. Expression values of the carbonic anhydrases and bicarbonate 3198 transporters from the transcriptomes in this study. Each point represents a 3199 single transcript. The x-axis is the log 10 transformed RPKM values in the 3200 320-390 ppm pCO2 and the y-axis is the log 10 transformed RPKM values in 3201 the <230 ppm condition, the 690-1340 ppm pCO2 condition, and the 2900- 3202 5100 ppm pCO2 condition at the bottom of the figure. The black diagonal line 3203 indicates the same expression value in the two conditions compared. Circled 3204 points appearing above this line are upregulated in the condition indicated on 3205 the y-axis and circled points below it are downregulated in the condition 3206 indicated on the y-axis (>= 0.95 posterior probability of a 2-fold change in 3207 ASC) (Wu et al. 2010). The color of the points indicates the specific isozyme 3208 type and the shape of the point indicates from which organism the transcript 3209 originates. 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223

127

Figure 3. Expression values of the carbonic anhydrases and bicarbonate transporters from the transcriptomes in this study. Each point represents a single transcript. The x-axis is the log 10 transformed RPKM values in the 320-390 ppm pCO2 and the y-axis is the log 10 transformed RPKM values in the <230 ppm condition, the 690-1340 ppm pCO2 condition, and the 2900-5100 ppm pCO2 condition at the bottom of the figure. The black diagonal line indicates the same expression value in the two conditions compared. Circled points appearing above this line are upregulated in the condition indicated on the y- axis and circled points below it are downregulated in the condition indicated on the y-axis (>= 0.95 posterior probability of a 2-fold change in ASC) (Wu et al. 2010). The color of the points indicates the specific isozyme type and the shape of the point indicates from which organism the transcript originates.

Figure 4. A maximum likelihood phylogeny of the carbonic anhydrases from the model diatoms T. pseudonana (JGI Thaps3 gene IDs) and P. 3224 tricornutum (JGI IDs or NCBI accession numbers), the three 3225 Thalassiosira transcriptomes assembled in this study (indicated by asterisks), and two additional carbonic anhydrases from Halothiobacillus neapolitanus and Plasmodium falciparum for reference. All alignments were done in MAFFT (version 7) using the L- INS-i strategy with the BLOSUM62 scoring matrix, the “leave gappy regions” option selected (Katoh & Standley 2013). RAxML maximum likelihood trees were computed with the following parameters: PROTGAMMA matrix, BLOSUM62 substitution model, and 1000 bootstrap replicates. The carbonic anhydrase tree was rooted using the gamma carbonic anhydrases because they are the most ancient carbonic anhydrase (Smith et al. 1999). Support values less than 70 are not shown.

128

3226 T. pseudonana

plasmamembrane

PPS CA2 Figure 5. Putative Cyt CA3 CA1 Pyr carbonic anhydrases CA1 CA1 (CAs) that have been CA3 localized in the model CA4 CA1 PPC Str CA2 diatom T. pseudonana mit (from Samukawa et al. CER CEV 2014) and the predicted

T. oceanica localization of putative CAs from T. oceanica and plasmamembrane CA1 T. weissflogii. The PPS CA3 CA2 localizations of the CA4 CA3 CA5 putative CAs reported in CA4 Pyr CA1 this study have not been CA5 Cyt experimentally verified CA1 PPC Str CA2 and are based on the CA1 CA3 mit predictions from SignalP CA2 CER CEV (V4.1), TargetP (V1.1), and ASA-find (1.1.5) T. weissflogii (Bendtsen et al. 2004; plasmamembrane Emanuelsson et al. 2007; CA2 CA1 PPS Gruber et al. 2015). CA3 CA2 CA6 CA3 SignalP was run against CA7 CA4 CA4 Pyr the eukaryote organism CA1 βCA1 CA2 CA5 group and the default CA3 CA3

Cyt CA4 cutoff values were used. PPC Str CA1 TargetP was run with the CA1 CA2 mit CER CEV plant model and “winner- takes-all” option selected.

T. rotula PPS = periplastid space, Cyt = cytoplasm, mit = plasmamembrane mitochondria, CER = PPS CA1 CA1 CA1 chloroplast endoplasmic CA2 CA2 CA3 reticulum, PPC = CA3 Pyr CA5 periplastidal compartment, CA4 CA1 βCA1 CEV = chloroplast CA2 Cyt PPC Str envelope, Str = chloroplast CA4 stroma, Pyr = pyrenoid. mit CER CEV

129

OAA

+ PEPC NH4 MDH PYC

GLX GDC CO2 PEPCK

CA MAL HCO3

ME PYR - SLC4/SLC26 HCO3 ALAT CO2 CO2 AGAT PEP PPDK PYR SPT

HPR ENO GLC PGM GK

3227 Figure 6. A schematic summarizing the enzymes thought to be involved in the diatom CCM, including variations of C4 metabolism (Reinfelder et al. 2000; Sage 2004; Kustka et al. 2014). Grey boxes indicate enzymes performing the conversions between the metabolites on either end of the black arrow. Grey boxes: AGAT = alanine glyoxylate amino transferase, ALAT = alanine amino transferase, CA = carbonic anhydrase, ENO = enolase, GDC = glycine decarboxylase P-protein, GK = glycerate kinase , HPR = hydroxypyruvate reductase, MDH = malate dehydrogenase, ME = malic enzyme, PEPC = phosphoenolpyruvate carboxylase, PEPCK = phosphoenolpyruvate carboxykinase, PGM = phosphoglycerate mutase, PPDK = pyruvate, phosphate dikinase, PYC = pyruvate carboxylase, SLC4/SLC26 = solute carrier family 4/6, SPT = serine pyruvate transaminase. Outside grey boxes: CO2 = carbon - dioxide, GLC = glycerate, GLX = glyoxylate, HCO3 = bicarbonate, + MAL = malate, NH4 = ammonium, OAA = oxaloacetate, PEP = phosphoenolpyruvate, PYR = pyruvate.

130

3228 Table 1. Numbers of differentially expressed genes (>= 0.95 posterior probability of a 2-fold change in ASC) (Wu et al. 2010) across pCO2 condition in each transcriptome.

low max projected T. weissfogii up 334 39 186 24900 genes down 247 117 248

T. oceanica up 1 56 1278 36382 genes down 106 2 2539

T. rotula up 678 34 - 31792 genes down 310 28 -

131

3229 3230

3231

3232

3233

13 14 19 14 10 3234 total

3235 θ 1 0 1 0 0 T. pseudonana 3236 and and

1 1 ζ 0 4 1 3237

3238 5 γ 3 5 5 2 et 2011; (Tachibana al. 3239 P. tricornutum P.

. 3240 5 δ 4 4 4 0

3241 β 1 0 1 0 2 3242

3243 α 7 3 2 3 5 tely experimentally tely verified experimentally 3244

comple 3245 ogii f umber of putative carbonic anhydrases from model diatom from and umbercarbonic model genomes anhydrases putative of

N 3246 .

rotula T. 3247 T. oceanica T. T. weiss T. Table 2 and Thefull the activity transcriptomes in described the localization of this study. complement inof carbonic anhydrases putative have not been Samukawa et 2014; al. et 2016) al. Kikutani P. tricornutum P. pseudonana T. 3248

3249

3250

3251

3252

132

Manuscript 3 3253

Formatted for publication in Journal of Experimental Marine Biology and Ecology 3254

Following diatom response to ocean acidication in mesocosm experiments with 3255 metatranscriptome sequencing indicates contrasting CO2 responses in Skeletonema 3256 and Thalassiosira diatoms 3257 3258 Joselynn Wallace1, Andrew L. King2, Gary H. Wilkfors3, Shannon L. Meseck3, Yuan 3259 Liu3, Bethany D. Jenkins1,4 3260 3261 1 Department of Cell and Molecular Biology, University of Rhode Island, Kingston, 3262 RI, USA 3263 3264 2 NIVA Norwegian Institute for Water Research, Bergen, Norway 3265 3266 3 NOAA/NMFS, Northeast Fisheries Science Center, Milford Laboratory, Milford, 3267 CT, USA 3268 3269 4 Graduate School of Oceanography, University of Rhode Island, Narragansett, RI, 3270 USA 3271 3272

133

3273 ABSTRACT 3274

Fossil fuel combustion has raised atmospheric pCO2 from ~277 ppm in 1750 to 3275 the present level of ~400 ppm. The ocean sequesters ~30% of that CO2, decreasing its 3276 buffering capacity and pH (ocean acidification). Diatoms are responsible for ~40% of 3277 oceanic primary production based upon photosynthetic fixation of CO2, which is aided 3278 by a carbon-concentrating mechanism (CCM), which uses bicarbonate transporters 3279

- - (BCTs) to take up HCO3 and carbonic anhydrases (CAs) to interconvert HCO3 and 3280

CO2. There are several distinct classes of BCTs and CAs found in diatoms, including 3281 the substrate-specific SLC4 and non-specific SLC26 BCTs, as well as the α, β, δ, γ, ζ, 3282 and θ CAs. The ability to regulate and conserve energetic costs associated with the 3283

CCM may influence diatom community composition in a higher CO2 future. 3284

This study used mesocosm CO2 manipulation incubations to examine the 3285 diatom community response to changes in pCO2. Seawater from Vineyard Sound, MA 3286 was collected in March of 2014, pre-filtered to remove large grazers, and amended 3287 with nutrients. The pCO2 levels were manipulated to approximate pre-industrial 3288 conditions (< 215 ppm pCO2), present-day pCO2 in the open ocean (330-390 ppm 3289 pCO2), and future pCO2 projections (>780 ppm pCO2). After 20 days, 3290 metatranscriptomes were sequenced (>3-µm size fraction); the majority of reads were 3291 attributed to Skeletonema spp. and Thalassiosira spp. diatoms. 3292

We show evidence of differing CCM strategies: Thalassiosira diatoms regulate 3293 the α, δ, and ζ CAs and the SLC4 and SLC26 BCTs; whereas, Skeletonema regulates 3294 the γ and ζ CAs and the SLC26 BCTs. The CCM of these two diatoms require 3295 different metal cofactors: CAs generally use zinc as a cofactor, but the γ CAs can 3296

134

substitute with iron and δ CAs can substitute with cobalt. Thalassiosira’s ability to 3297

— downregulate the HCO3 specific SLC4 BCTs in high CO2 may afford this organism 3298 some energetic savings in high CO2. 3299

INTRODUCTION 3300

Through physical mixing and biological activity, the ocean sequesters 3301 approximately 1/3 of anthropogenic CO2 and has been the only net sink for CO2 in the 3302 past 200 years (Sabine et al. 2004; Khatiwala et al. 2013). Rising anthropogenic CO2 3303 emissions have led to an increase in atmospheric pCO2 concentration from ~277 ppm 3304 in the year 1750 to over 400 ppm presently (Tans 2009; Le Quéré et al. 2016). 3305

Increased levels of dissolved CO2 gas in seawater perturbs carbonate chemistry in the 3306 ocean, which results in a decreased buffering capacity, or ocean acidification (OA) 3307

(Caldeira & Wickett 2003). 3308

Diatoms are a phytoplankton group responsible for ~40% of oceanic primary 3309 production via photosynthetic fixation of CO2 (Nelson et al. 1995). The predominant 3310

- dissolved form of inorganic carbon in the ocean is bicarbonate (HCO3 ) (Zeebe & 3311

Wolf-Gladrow 2001). Diatom CO2 fixation is aided by carbon-concentrating 3312

- mechanisms (CCMs), which use bicarbonate transporters (BCTs) to take up HCO3 3313

- and carbonic anhydrases (CAs) to interconvert HCO3 and CO2 adjacent to the carbon 3314 fixation enzyme RuBisCO (Badger et al. 1998; Hopkinson et al. 2011). Carbonic 3315 anhydrases are a protein family that arose from convergent evolution (Hewett-Emmett 3316

& Tashian 1996) that has resulted in distinct protein isoforms. In the diatoms, these 3317 include the widespread α, δ, and γ types to rare β and ζ carbonic anhydrases, and to the 3318

135

newly described and relatively underexplored θ carbonic anhydrases (Shen et al. 2017; 3319

Kikutani et al. 2016) 3320

In natural phytoplankton communities, diatoms will generally outcompete 3321 other phytoplankton in high CO2 (Tortell et al. 2002; Tortell et al. 2008; Bach et al. 3322

2017), however the diatom growth response can vary across diatom genera. CO2 3323 manipulation experiments of natural diatom assemblages indicate that in high CO2, 3324

Thalassiosira spp. grow faster than Skeletonema spp. (Sett et al. 2018) while 3325

Skeletonema spp. grow faster than Nitzschia spp. (Kim et al. 2006). Species within the 3326

Thalassiosira diatom genus also display a range of growth rate changes across CO2 3327 conditions (<230 ppm CO2 to >2900 ppm CO2), from no change across conditions (T. 3328 pseudonana), slower growth in lower CO2 (T. weissflogii), slower growth in higher 3329

CO2 (T. oceanica), and faster growth in higher CO2 (T. rotula) (King et al. 2015). 3330

Global gene expression profiling data (transcriptomes) from the King et al. 3331

2015 CO2 response experiments (King et al. 2015) indicate that differences in CO2 3332 responses between T. rotula, T. weissflogii, and T. oceanica boil down to their ability 3333 to dynamically regulate their carbon concentrating mechanisms while maintaining 3334 internal redox and ion homeostasis, as well as potential trade-offs between funneling 3335 excess CO2 into internal storage pools versus a higher growth rate (Wallace et al. in 3336 prep). This range of plasticity in the diatoms’ CCM is particularly important, as the 3337 use of this mechanism is energetically costly (Raven 1991). Diatoms that do 3338 downregulate the CCM in high CO2 conserve this energy, potentially conferring a 3339 competitive advantage over those that do not. However, diatoms that tend to store 3340 excess carbon in high CO2 (rather than exhibit higher growth rates) would have larger 3341

136

internal storage pools of lipids and carbohydrates, potentially allowing those 3342 populations to persist when nutrients are scarce. These differences in CCM regulation 3343 and storage tendencies could influence diatom community composition and dynamics, 3344 but the full impact of these contrasting abilities is not captured by single-species 3345 isolate experiments that do not accurately reflect the complex and competitive 3346 environment that diatoms inhabit. 3347

Community-level global gene expression profiling (metatranscriptome) studies 3348 are a powerful approach that capture the molecular basis for different diatoms’ 3349 contrasting growth and nutrient acquisition strategies in mixed assemblages. 3350

Metatranscriptomes sequenced from nutrient-amended waters on the coast of 3351

Narragansett Bay, Rhode Island suggest that Skeletonema and Thalassiosira diatoms 3352 co-exist because the two diatom genera may access different internal and external 3353 pools of nitrogen and phosphorus (Alexander et al. 2015), while Thalassiosira and 3354

Pseudo-nitzschia diatoms co-exist in the iron-limited Pacific Ocean by using enzymes 3355 with different metal cofactor requirements (Cohen et al. 2017). Similar approaches 3356 combining metatranscriptomes and manipulation of CO2 levels in natural 3357 phytoplankton assemblages might give a more complete picture of how changing CO2 3358 levels impact different diatoms’ metabolic capacities in their complex environment. 3359

This study used mesocosm CO2 manipulation incubations to examine the 3360 diatom community response to changes in pCO2. Surface seawater from Vineyard 3361

Sound, MA was pre-filtered to remove large grazers (>180 µm), amended with 3362 nutrients, and the pCO2 levels in the incubations were manipulated to approximate 3363 low/pre-industrial (less than 215 ppm pCO2) , present-day ambient pCO2 in the open 3364

137

ocean (330-390 ppm pCO2), and future pCO2 projections (780-1320 ppm pCO2) 3365

(Tans 2009; Le Quéré et al. 2016). Metatranscriptomes enriched for eukaryotic 3366 biomass (size fraction greater than 3 µm) were sequenced on an Illumina HiSeq 3367 platform. Metatranscriptome sequence reads were mapped to a database of 3368 transcriptomes available through the Gordon and Betty Moore Foundation Marine 3369

Microbial Eukaryote Transcriptome Sequencing Project (Keeling et al. 2014). Across 3370 all the libraries, the majority of reads map to Skeletonema and Thalassiosira diatoms. 3371

The presence of these diatoms across all incubations is corroborated by light 3372 microscopy observations of the incubations at the final time point and barcoding of the 3373 diatom-specific V4 region of the 18S rDNA gene. Analysis of the transcriptional 3374 response in Skeletonema and Thalassiosira in these mesocosm experiments show 3375 distinct enzymatic strategies for N and P uptake and scavenging in these two diatom 3376 genera. There is also evidence of distinct strategies in their CCMs – while 3377

Thalassiosira diatoms seem to use the α, δ, and ζ carbonic anhydrases and the SLC4 3378 and SLC26 bicarbonate transporters, Skeletonema uses the γ and ζ carbonic 3379 anhydrases and the SLC26 bicarbonate transporters. 3380

MATERIALS AND METHODS 3381

Seawater collection and Incubation 3382

On March 4th 2014, surface seawater was collected in Vineyard Sound, 3383

Massachusetts using acid-cleaned 50 liter carboys. After Nitex mesh pre-filtering to 3384 remove particles larger than 180 µm, water was distributed into 9 acid-cleaned, 10 L 3385 carboys. Target pCO2 levels were chosen to approximate low/pre-industrial (less than 3386

215 ppm pCO2), present-day ambient pCO2 in the open ocean (330-390 ppm pCO2), 3387

138

and future pCO2 projections (780-1320 ppm pCO2) (Tans 2009; Le Quéré et al. 2016). 3388

-1 Target CO2 concentrations were obtained by sparging with CO2 (<100 ml min ). and 3389

-1 between by gentle bubbling with an air:CO2 mixture (<30 ml min ). The 3390 macronutrients nitrate, phosphate and silicic acid were added to mimic nutrient levels 3391 in the mixed layer, aiming for final concentrations of 10 µM nitrate, 20 µM silicate, 3392 and 0.2 µM phosphate. Incubations were diluted with nutrient-amended filtered 3393

th th th th seawater pre-equilibrated to target CO2, on the 8 , 11 , 14 , and 17 day of 3394 incubation. On day 11, phosphate concentrations were amended to 1 µM to avoid a 3395 phosphate limitation. 3396

All incubations were kept in an outdoor tank incubator, with temperature 3397 maintained (between -1 to 3 °C) by a flow-through water system from near-shore 3398 water. All incubations were conducted in triplicate for 20 days time to examine 3399 differences in CO2 response while minimizing complicating factors that might arise 3400 during a longer incubation (e.g., growth on the walls of the carboys, etc.). After 20 3401 days, biomass was collected on 3 µm pore-size polyester filters (Sterlitech 3402

Corporation, Kent, WA, USA) by gentle filtration using a peristaltic pump, flash 3403 frozen in liquid nitrogen, and stored at -80 °C until DNA and RNA could be extracted. 3404

Macronutrients, carbonate, and chlorophyll measurements 3405

Nitrate + nitrite, phosphate, and silicic acid levels were measured from 0.2 µm 3406 filtered aliquots from each carboy and analyzed using a Quattro autoanalyzer. All 3407 nutrient analysis protocols were developed by the Royal Netherlands Institute for Sea 3408

Research. Nitrate + nitrite was determined using the red azo dye method (Method NO 3409

Q-068- 05 Rev. 4), phosphate was determined using a phosophomolybdenum complex 3410

139

(Method NO Q-064-05 Rev. 3), and a silico-molybdenum blue complex was used to 3411 determine silicic acid (Method NO Q-066-05 Rev. 3). 3412

Dissolved inorganic carbon (DIC) and pH samples were collected periodically 3413 throughout the course of the experiment, as per the Department of Energy handbook 3414

(DOE 1994). Total DIC was measured using a DIC analyzer based upon sample 3415 acidification and LI-COR CO2 detection (Apollo SciTech). pH (total scale) was 3416 determined colorimetrically using metacresol purple (Sigma-Aldrich) with a Varian 3417 dual beam spectrophotometer (Agilent Technologies), 10 cm cylindrical cells 3418

(Innovative Lab Supply), and jacketed cell holders, with temperature maintained by a 3419

Peltier cooler. All samples were analyzed within 60 minutes of collection. 3420

Chlorophyll samples were measured by vacuum-filtration onto 25 mm GF/F 3421 glass fiber filters (Whatman, GE HealthCare Biosciences) that were then submerged in 3422

90% acetone in 15 ml centrifuge tubes. Chlorophyll was extracted from the filters for 3423

1 or 24 hours in the dark at -20°C. After extracting, samples were vortexed and 3424 acetone was decanted and read in a Turner Designs fluorometer (Parsons et al. 1984). 3425

Particulate organic C, N, and P samples were vacuum-filtered onto pre- 3426 combusted 25 mm GF/C glass fiber filters (Whatman, GE HealthCare Biosciences). 3427

Briefly, POC and PON samples were dried in an oven at 60°C, acidified with HCl 3428 fumes, and analyzed using gas chromatography with a CHNSO analyzer (Costech) 3429

(Parsons et al. 1984). POP samples were rinsed with 0.17 M Na2SO4, dried at 90°C in 3430

0.017 M MgSO4, and measured using the colorimetric molybdate method (Solorzano 3431

& Sharp 1980). 3432

3433

140

DNA Extraction 3434

DNA extractions were performed using the DNeasy Plant Kit (Qiagen, 3435

Germany) following the manufactuer's protocols with the following modifications: 3436

Filters were bead-beat in lysis buffer with 0.5 mm zircon beads and RNaseA 3437

(BioSpec, Bartlesville, OK, USA) for 1 minute to lyse the cells. Cell debris was 3438 immediately removed by running lysate over a Qiagen QiaShredder column. Extracted 3439

DNA samples were quantified on a NanoDrop 8000 spectrophotometer and stored at - 3440

20 °C 3441

RNA Extraction 3442

RNA extractions were performed using the RNeasy Mini Kit (Qiagen, 3443

Germany) following the manufacturer's instructions with the following modifications. 3444

Filters were bead-beat in lysis buffer with 0.5 mm zircon beads (BioSpec, Bartlesville, 3445

OK, USA) for 1 minute to lyse the cells. Cell debris was immediately removed by 3446 running lysate over a Qiagen QiaShredder column. DNA contamination was removed 3447 by an on-column DNAse step using the RNase-free DNase Set (Qiagen) and an 3448 additional DNAse step was performed on eluted RNA using the Turbo DNA-free kit 3449

(Life Technologies, Carlsbad, CA, USA), both performed as per the manufacturers’ 3450 instructions. RNA samples were quantified using the Qubit High-Sensitivity Kit (Life 3451

Technologies) and stored at -80 °C 3452

DNA Amplicon sequencing 3453

Diatom-specific primers to amplify the V4 region of the 18S rDNA gene 3454

(Zimmermann et al. 2011) were modified to include Illumina MiSeq (Illumina, San 3455

141

Diego, CA, USA) sequencing adapters (Mellett et al. 2018) (forward primer: 5’- 3456

TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATTCCAGCTCCAA 3457

TAGCG-3’; reverse primer: 5’- GTCTCGTGGGCTCGGAGATGTGTATAAGAG 3458

ACAGGACTACGATGGTATCTAATC-3’). The PCR reaction used the following: 2- 3459

8 ng of template, 500 nmol L-1 of each primer, 1x BIO-X-ACT Short Mix, and water 3460 to reach a 25 µL reaction volume. The following conditions were used for the PCR 3461 cycle: 5 minutes at 95˚C; 32 cycles of 1 minute at 95˚C, 1 minute at 61.8˚C, 30 3462 seconds at 72˚C; and a final extension step for 10 minutes at 72˚C. PCR products were 3463 purified using the Qiagen QIAquick Purification Kit (Qiagen, Germany) followed by 3464 visualization on a 1.2% agarose gel. Sequencing was performed using an Illumina 3465

MiSeq Reagent Kit V2 (500 cycle), with 2x250 bp paired-end reads at a depth of 3466

100,000-500,000 reads per sample. Sequencing was performed at the Rhode Island 3467

Genomics Sequencing Center. 3468

Adapter sequences were trimmed from raw reads using CutAdapt version 1.15 3469

(Martin 2011) with the ‘--minimum-length 1’ flag set to omit reads with a length of 0 3470 and the ‘–discard-untrimmed’ flag set to discard reads that do not have the adapter 3471 sequence and a maximum 10% error rate allowed. DADA2 version 1.8.0 (Callahan et 3472 al. 2016) was used to de-noise reads (filterAndTrim default settings with the following 3473 changes: truncLen of 220 to discard reads shorter than 220 bases and maxEE of 2 to 3474 discard reads with more than 2 expected errors) and merge read pairs (mergePairs 3475 using default settings). DADA2 was also used to remove chimeras 3476

(removeBimeraDenovo using default settings) and amplicons were filtered to remove 3477 sequences less than 390 bp or greater than 410 bp. Taxonomy was assigned using the 3478

142

PR2 database version 4.10.0 (Guillou et al. 2013) and rarefaction curves were made in 3479 vegan version 2.5-2 (Oksanen et al. 2018) (Supplemental Figure 1). 3480

RNA Sequencing 3481

Equal amounts of total RNA from each triplicate were pooled and sent to the 3482

JP Sulzberger Genome Sequencing Center at Columbia University. All sequencing 3483 was performed on an Illumina HiSeq 2500 platform and all libraries were mRNA 3484 enriched using a poly-A pull-down method. Target sequencing depth was 30 million, 3485 paired-end, 100 base pair read length. Raw reads were trimmed in Trimmomatic 3486 version 0.3.2 (Bolger et al. 2014) as follows: remove any Illumina adapters, trim the 3487 first 15 nucleotides, trim below quality score of 20 using a sliding window of 6 3488 nucleotides, and 25 nucleotide minimum read length. The number of reads remaining 3489 after trimming are reported in Supplemental Table 1. 3490

Two step read mapping approach: DIAMOND and BWA 3491

A custom database was constructed from all available sequences from the 3492

Gordon and Betty Moore Foundation Marine Microbial Eukaryote Transcriptome 3493

Sequencing Project, clustered by species at 98% ID in CD-HIT-EST (Li & Godzik 3494

2006; Keeling et al. 2014). Using this custom database, an initial read mapping step 3495 was performed using blastX in DIAMOND version 0.8.37 (Buchfink et al. 2015). The 3496 mapping parameters used in DIAMOND were as follows: --top 0 –more-sensitive – 3497 evalue .00001. This allows only the top hits with an evalue cut-off of e-5 to be 3498 reported as mapped. If there are multiple top hits for a read (e.g., a read matches to 3499 multiple protein sequences equally well), the proteins that recruit this read are all 3500 given equal weight. There is no apportioning of read counts; a read that maps to only 1 3501

143

protein is given equal weight relative to a read that maps to 10 proteins. Forward and 3502 reverse reads were mapped separately because DIAMOND does not support paired- 3503 end read mapping. Proteins with at least 1 read mapped were binned at the phylum, 3504 genus, and species level. 3505

After filtering the database to include only those organisms most highly 3506 represented in the DIAMOND search (Skeletonema and Thalassiosira diatoms), a 3507 subsequent read mapping step was performed using Burrows-Wheeler Aligner (BWA- 3508

MEM) (Li & Durbin 2010). The mapping parameters used in BWA-MEM were as 3509 follows: -T 60 –aM –k 10, as has been used in other metatranscriptome studies 3510

(Alexander et al. 2015). The alignments were filtered in SAMtools (Li et al. 2009) so 3511 that only reads with the highest possible quality score (MAPQ60) were counted as 3512 mapped to each transcript. 3513

Normalization 3514

To account for changes in species abundance in each library, the resulting raw 3515 read counts were normalized as per-million reads mapped to each diatom species in 3516 the library using Python pandas (pandas.pydata.org). Transcripts with at least 2 3517 normalized reads mapped in at least 1 library were included in further analysis. 3518

Differential Transcript Expression 3519

Any transcript that passed the expression threshold (at least 2 normalized reads 3520 mapped in at least 1 library) were ran through Analysis of Sequence Counts (ASC) 3521

(Wu et al. 2010) to assess differential expression in the low and high CO2 conditions 3522 relative to the present-day CO2 condition. The raw read counts were the input to ASC, 3523 which is an empirical Bayes approach that estimates the prior distribution based on the 3524

144

input data and is suitable for use in situations where biological replicates were pooled 3525 before sequencing (Wu et al. 2010), as was done here. The counts for each diatom 3526 species were ran individually through ASC (Wu et al. 2010). If the relative abundance 3527 for a given species changed across conditions, this would be reflected by a change in 3528 library size sequenced from those conditions. These differences in library size are 3529 taken into account when ASC builds a model of the transcript counts (Wu et al. 2010). 3530

Transcripts with a greater than 0.95 posterior probability of a 2-fold change were 3531 considered differentially expressed. 3532

Metatranscriptome rarefaction 3533

Metatranscriptome rarefaction curves were constructed to determine 3534 sufficiency of sequencing depth. First, libraries were re-sampled (10% intervals from 3535

10% to 100% of each library) using seqtk (Li 2018). The reads were mapped back 3536

(using BWA-MEM and MAPQ60 filtering) to the subset of genes from Skeletonema 3537 and Thalassiosira with at least 2 reads mapped in at least 1 library. The rarefaction 3538 curves shown plot the number of transcripts that had at least 2 reads mapped per re- 3539 sampled library (Supplemental Figure 2). 3540

Annotation and Differential KO Expression 3541

The associated protein sequences from the transcripts that passed the 3542 expression threshold (at least 2 normalized reads mapped in at least 1 library) were 3543 annotated using the KEGG Automatic Annotation Server (KAAS) with the following 3544 settings: the bi-directional best hit assignment method using the BLAST search 3545 program to query the GENES database plus the model diatoms T. pseudonana and P. 3546 tricorntutum (Moriya et al. 2007). To assess differential expression of specific KEGG 3547

145

KOs across CO2 conditions, genes were grouped first by the genus they originated 3548 from (Skeletonema or Thalassiosira) and then clustered by their KO assignment. The 3549 total number of normalized reads mapped per KO and per genus were ran through 3550

Analysis of Sequence Counts (ASC) (Wu et al. 2010). In addition to its use for 3551 differential expression of genes, ASC is appropriate for any counts-based 3552 measurement, as was done here to determine differential KO expression. Any KO with 3553 a greater than 0.95 posterior probability of a 2-fold change was considered 3554 differentially expressed. 3555

Identification of carbon concentrating mechanism components 3556

Protein sequences from carbonic anhydrases and bicarbonate transporters 3557 previously identified in Thalassiosira or Phaeodactylum diatoms (Tachibana et al. 3558

2011; Samukawa et al. 2014; Kikutani et al. 2016, Wallace et al., in prep) were 3559 queried against a BLAST database of protein sequences corresponding to transcripts 3560 that passed the expression threshold (at least 2 normalized reads mapped in at least 1 3561 library). Sequences that passed the similarity threshold (35% identity, 80% query 3562 coverage per HSP) were considered putative carbonic anhydrases or bicarbonate 3563 transporters. 3564

RESULTS AND DISCUSSION 3565

Chlorophyll, nutrients, and carbonate chemistry 3566

In these experiments, filtered seawater (including only the <180 µm size 3567 fraction) from Massachusetts Bay was incubated at three different CO2 levels, 3568 approximating a low/pre-industrial (less than 215 ppm pCO2), present-day ambient 3569 pCO2 in the open ocean (330-390 ppm pCO2), and future pCO2 projections (780-1320 3570

146

ppm pCO2) (Tans 2009; Le Quéré et al. 2016). Temperature, chlorophyll, dissolved 3571 macronutrients, and pCO2 were tracked over the course of the experiment, and the 3572 initial and final time points also included measurements of BSi, POC, PON, and POP. 3573

Metatranscriptome and 18S V4 rDNA amplicons were sequenced from the final time 3574 point. 3575

The target CO2 levels were steadily maintained over the duration of the 3576 incubation (Figure 1A). At the final time point, the chlorophyll in the high CO2 bottle 3577 was comparatively lower than the present-day open ocean pCO2 (330-390 ppm pCO2) 3578

- condition (1-way ANOVA; p<0.05) (Figure 1B). This corresponds to significant NO3 3579

- +NO2 (Figure 1C) and PO4 (Figure 1D) left unused in the high CO2 incubations 3580 relative to the present-day open ocean pCO2 incubations (1-way ANOVA; p<0.05). 3581

There was also significantly more Si(OH)4 left in the high CO2 incubations (Figure 3582

1E). This is consistent with the measurements of lower POC (Figure 1F), PON (Figure 3583

1G), and POP (Figure 1H), in the high CO2 condition relative to the present-day open 3584 ocean pCO2 (1-way ANOVA; p<0.05), although there was not a significant difference 3585 in the BSi measurements across CO2 conditions (Figure 1I) (1-way ANOVA; p<0.05). 3586

For each of these measurements (chlorophyll, macronutrients, PON/C/P), there was no 3587 statistical difference between the low CO2 condition and the present-day open ocean 3588 condition. 3589

The differences observed in the high CO2 condition relative to the present-day 3590 open ocean CO2 condition do not track over the course of the experiment – previous to 3591

- - the final time point, the drawdown of NO3 +NO2 (Figure 1C), PO4 (Figure 1D), and 3592

4 Si(OH) (Figure 1E) were similar across CO2 conditions. POC (Figure 1F), PON 3593

147

(Figure 1G), and POP (Figure 1H) were only measured at the initial and final time 3594 points, so we cannot compare trends over the course of the incubation to final time 3595 point observations. 3596

If diatoms grown in high CO2 are less able to tolerate stressful conditions like 3597 changing temperatures, an abrupt change in temperature at the final time point could 3598 account for the observed differences in the final time point relative to the course of the 3599 experiment. However, this doesn’t seem to be the case here, as the temperature 3600 fluctuations were minimal (<~3° C). 3601

An additional possibility is that there has been some differential grazing 3602 pressure on the phytoplankton in the high CO2 incubations, as the pre-filtering would 3603 not have removed all microzooplankton grazers. This could be significant, as 3604 micrograzers consume 67% of phytoplankton daily growth (Calbet & Landry 2004). 3605

While this could explain the loss of chlorophyll, the surplus of dissolved nutrients 3606

(Figure 1B, 1C, 1D, 1E) suggests that there is some other mechanism constraining 3607 phytoplankton biomass accumulation in the high CO2 final time point. Previous 3608 mesocosm studies have also found that microzooplankton grazing rates were not 3609 influenced by pCO2 level (Suffrian et al. 2008). One possibility is that phytoplankton 3610 viruses have increased in concentration or activity in the final time point high CO2 3611 condition. This could be important, as a 20% increase in viral load led to a decrease in 3612 bulk carbon fixation rates by nearly 50% (Suttle 1992). However, previous studies 3613 have reported no difference in viral infection across CO2 conditions (Tsiola et al. 3614

2017). An additional possibility is that high CO2 has led to a disturbance in 3615 intracellular ion and redox homeostasis and a decreased growth rate, which has been 3616

148

observed in the diatom T. oceanica (Wallace et al. in prep), and so seems to be the 3617 most likely explanation for the decreased chlorophyll and nutrient uptake observed in 3618 the high CO2 condition at the final time point. 3619

Metatranscriptome reads are dominated by Skeletonema and Thalassiosira 3620 diatoms 3621

Read mapping by both DIAMOND and BWA indicates that most of the 3622 metatranscriptome reads are recruiting to diatoms, most of which are either 3623

Skeletonema or Thalassiosira genera (Figure 2A). The presence of Skeletonema spp. 3624 and Thalassiosira spp. diatoms is corroborated by the diatom-specific V4 18S rDNA 3625 gene taxonomy assignment (Figure 2B). The presence of both Skeletonema and 3626

Thalassiosira diatom genera was also confirmed by light microscopy observations at 3627 the final time point. The timing of these incubations also coincides with a period of 3628 time where both Thalassiosira spp. and Skeletonema spp. should be present in 3629

Massachusetts Bay (Hunt et al. 2010). 3630

Rarefaction analysis from the 18S V4 rDNA amplicon sequencing are 3631 asymptotic (Supplemental Figure 1). This suggests that we have saturated the diatom 3632

V4 amplicon diversity at the species level in the PR2 database. Rarefaction of mapped 3633 reads from the metatranscriptome sequencing are approaching an asymptote 3634

(Supplemental Figure 2). This suggests that we are approaching saturating the 3635 diversity of transcripts detected (with at least 2 reads mapped) from Skeletonema spp. 3636 and Thalassiosira spp. in the MMETSP database. 3637

3638

149

Diatom genera show different patterns of expression of KOs for C, N, and P 3639 acquisition and metabolism across CO2 conditions 3640

Based on the differential KO expression analysis, we looked at broad trends in 3641 metabolism on the genus level across CO2 conditions at the final time point. Under 3642 low CO2, both Skeletonema spp. and Thalassiosira spp. upregulate the citrate cycle to 3643 release stored energy and carbon dioxide, but upregulate different enzymes within the 3644 cycle (Figure 3, Supplemental Table 1). While Skeletonema spp. upregulates fumarate 3645 hydratase (fumC) and citrate synthase (CS), Thalassiosira spp. upregulates succinyl- 3646

CoA synthetase (LSC2) and malate dehydrogenase (maeB) (Figure 3, Supplemental 3647

Table 1). This is consistent with previous laboratory studies in T. pseudonana that saw 3648 a downregulation of the citrate cycle in high CO2 (Hennon et al. 2015). 3649

Based on their expression profiles, Thalassiosira spp. and Skeletonema spp. 3650 seem to have distinct strategies for generating glutamate -- Thalassiosira spp. 3651 regenerates glutamate from glutamine (glnA and gltD) and Skeletonema spp. seems to 3652 preferentially regenerate glutamate from proline (1P5C), all of which are higher in low 3653

CO2 relative to their expression level in present day CO2 (Figure 3, Supplemental 3654

Table 1). The increased expression of these KOs (glnA, gltD, IP5C) in low CO2 could 3655 be indicative of shuttling more amino acids toward glutamate, which can fuel the 3656 citrate cycle via its conversion to α-ketoglutarate (Figure 3, Supplemental Table 1). 3657

Skeletonema spp. and Thalassiosira spp. differentially express KOs mediating 3658 their nitrogen acquisition strategies across CO2 conditions (Figure 4, Supplemental 3659

Table 1). Both diatoms downregulate nitrate transporters in high CO2 (NRT) (Figure 3660

4, Supplemental Table 1), consistent with the high CO2 condition incubations drawing 3661

150

- - down less NO2 +NO3 relative to the present day CO2 conditions at the final time point 3662

(Figure 1C). There is no compensation in the expression of ammonium transporters 3663

(AMT) across CO2 conditions (Figure 4, Supplemental Table 1). However, in low 3664

CO2, Thalassiosira spp. downregulates urease (URE), and in high CO2, Skeletonema 3665 spp. upregulates urease (URE), suggesting that both diatoms might hydrolyze more 3666 urea as CO2 increases (Figure 4, Supplemental Table 1), although rates of urea 3667 hydrolysis and urea levels were not measured in this experiment. These diatom genera 3668 also express KOs that suggest distinct approaches for supplying urea to the urease 3669 enzyme, as the DUR3 urea-proton symporter and the SLC14 urea transporter are not 3670 expressed in Skeletonema spp. in this experiment, but are constitutively expressed in 3671

Thalassiosira spp. (Figure 4, Supplemental Table 1). Similarly, Thalassiosira spp. 3672 upregulates an amino acid transporter (APA) in high CO2, which is not expressed in 3673

Skeletonema spp. (Figure 5, Supplemental Table 1). This is consistent with a previous 3674 metatranscriptome study that found Thalassiosira spp. highly expressed amino acid 3675 transporters, but Skeletonema spp. did not (Alexander et al. 2015). However, both 3676 diatom genera show a broad upregulation of KOs involved in the degradation of amino 3677 acids leucine, valine, and isoleucine in high CO2 (Figure 5, Supplemental Table 1). 3678

This suggests that these amino acids might be a source of recycled nitrogen for 3679

Skeletonema and Thalassiosira spp. The high CO2 decrease in expression of nitrate 3680 transporters and increase in expression of organic nitrogen metabolism genes is 3681 consistent with previous studies in Thalassiosira pseudonana that saw a decrease in 3682 nitrate uptake in elevated CO2 (Shi et al. 2015; Hennon et al. 2014). 3683

151

These two diatom genera show contrasting complements of phosphate 3684 metabolism KOs. While Skeletonema spp. constitutively expresses two phosphatase 3685 genes (ACP5 and phoD) involved in scavenging phosphate from internal pools and an 3686

SLC20 sodium-dependent phosphate transporter under low CO2, they are not 3687 expressed in Thalassiosira spp. (Figure 6, Supplemental Table 1). Both diatoms 3688 constitutively express an SLC34 sodium-dependent phosphate co-transporter, but it is 3689 lowly expressed in Thalassiosira spp. (Figure 6, Supplemental Table 1). This is 3690 consistent with a previous metatranscriptome study that found genes involved in 3691 phosphate uptake and scavenging were more highly expressed in Skeletonema than 3692

Thalassiosira (Alexander et al. 2015). 3693

Skeletonema spp. also remodels phosphate metabolism in high CO2, but 3694

Thalassiosira spp. does not. In high CO2, Skeletonema spp. upregulates two KOs 3695 involved in scavenging phosphate from inositol phosphate (inositol-phosphate 3696 phosphatases IMPL2 and VTC4) (Figure 6, Supplemental Table 1), a class of 3697 molecule that was upregulated in phosphate limited cultures of the diatoms 3698

Thalassiosira pseudonana (Dyhrman et al. 2012) and Phaeodactylum tricornutum 3699

(Feng et al. 2015). Similarly, in high CO2 Skeletonema spp. also upregulates a 3700 nucleotidase KO involved in scavenging phosphate from nucleotides (cysQ) (Figure 6, 3701

Supplemental Table 1), a class of molecule also upregulated in phosphate limited 3702 cultures of the diatom Thalassiosira pseudonana (Dyhrman et al. 2012). These KOs 3703

(IMPL2, VTC4, cysQ) are not differentially expressed in Thalassiosira spp. The 3704 observations that at the final time point the high CO2 condition incubations drew down 3705 less phosphate relative to the present day CO2 conditions (Figure 1G) and the high 3706

152

CO2 upregulation of genes involved in phosphate scavenging (Figure 6, Supplemental 3707

Table 1) suggest that high CO2 Skeletonema diatoms upregulate KOs to preferentially 3708 recycle internal pools of phosphate rather than take up exogenous phosphate, while 3709

CO2 status has little impact on the expression of phosphate scavenging KOs in 3710

Thalassiosira spp. (Figure 6, Supplemental Table 1). 3711

Expression patterns in ion and redox homeostasis KOs indicate diatoms 3712 experience increased oxidative stress in high CO2 3713

The final incubation time points in high CO2 saw lower chlorophyll, lower 3714

POC, PON, and POP, as well as less nutrient drawdown (1-way ANOVA; p<0.05). 3715

(Figure 1). One possible explanation is that high CO2 has led to a disturbance in 3716 diatoms’ intracellular ion and redox homeostasis, as has been observed in the diatom 3717

T. oceanica (Wallace et al. in prep). 3718

At the final time point, both diatoms show signs of increased oxidative stress 3719 in high CO2, as indicated by the differential expression of KOs involved in glutathione 3720 metabolism (Figure 7, Supplemental Table 1), a key mediator of oxidative stress 3721

(Noctor & Foyer 1998). Thalassiosira spp. shows a low CO2 downregulation and a 3722 high CO2 upregulation of glutamate-cysteine ligase (GCLC) and glutathione S- 3723 transferase (GST), while Skeletonema spp. shows a high CO2 upregulation of 3724 glutathione synthase (GSS) and glutathione reductase (GSR) (Figure 7). The 3725 upregulation of oxidative stress genes is consistent with a previous study that saw an 3726 upregulation of these genes in Thalassiosira oceanica grown in high CO2 (Wallace et 3727 al., in prep). In addition, the high CO2 upregulation of glutathione reductase and 3728 glutathione synthase in Skeletonema spp. is consistent with previous studies that report 3729

153

an upregulation of these genes in CO2-enriched Skeletonema marinoi cultures 3730

(Lauritano et al. 2015). Diatoms may be regulating glutathione metabolism as part of a 3731 mechanism to maintain redox homeostasis, as the carbonic anhydrases in P. 3732 tricornutum are redox regulated and a similar redox regulation could be happening in 3733

Skeletonema spp. and Thalassiosira spp. (Kikutani et al. 2012). 3734

Diatoms have distinct complements of carbon concentrating mechanism 3735 components that are regulated in response to CO2 3736

In addition to differential KO expression across genera, we also looked at 3737 differential expression on the per-gene level. Skeletonema spp. and Thalassiosira spp. 3738 express a similar suite of carbonic anhydrases (α, δ, γ, and ζ) (Figure 8, Supplemental 3739

Table 4) and bicarbonate transporters (SLC4 and SLC26) (Figure 9, Supplemental 3740

Table 4). If a carbonic anhydrase or bicarbonate transporter were involved in the 3741

CCM, it is expected that their expression would be upregulated in low CO2 and/or 3742 downregulated in high CO2. In the case of Thalassiosira, there are α, δ, and ζ carbonic 3743 anhydrases that are up in low CO2 or down in high CO2 (Figure 8, Supplemental Table 3744

4). This is in contrast with what is observed for Skeletonema, in which there are γ and 3745

ζ carbonic anhydrases that are up in low CO2 or down in high CO2 (Figure 8, 3746

Supplemental Table 4). This suggests that while the α and δ carbonic anhydrases are 3747 important for the CCM of Thalassiosira, the γ carbonic anhydrases are important in 3748

Skeletonema. It also suggests that the ζ carbonic anhydrases are key players in the 3749

CCM of both Skeletonema and Thalassiosira. The disparity in the CO2-sensitivity of 3750 the carbonic anhydrases may be partially explained by the fact that α and δ carbonic 3751 anhydrases have undergone many lineage-specific expansions or gene losses and that 3752

154

Skeletonema and Thalassiosira diatoms contain completely distinct subsets of the δ 3753 carbonic anhydrases (Shen et al. 2017). It also suggests that these diatoms have 3754 different abilities to substitute the typical zinc CA cofactor for another metal -- the δ 3755

CAs can substitute cobalt (Lane & Morel 2000), while the γ CAs can substitute iron 3756

(although this has only been observed in anaerobic conditions) (Tripp et al. 2004). The 3757 key role of the ζ carbonic anhydrases for these two diatom genera is particularly 3758 interesting, since this class of carbonic anhydrases is found in relatively few diatom 3759 species (Shen et al. 2017). 3760

Thalassiosira spp. also dynamically regulates the SLC4 and SLC26 3761 bicarbonate transporters in response to CO2 (Figure 9, Supplemental Table 4), 3762 suggesting that regulating bicarbonate transport is critical for its CCM (Figure 4). In 3763 contrast, the Skeletonema spp. SLC4 transporter response is relatively muted but the 3764

SLC26 transporters are more sensitive to CO2 (Figure 9, Supplemental Table 4). The 3765 sensitive expression of the SLC26 transporters suggests that Skeletonema may rely on 3766 diffusive CO2 uptake or the non-specific SLC26 anion transporters for bicarbonate 3767 uptake rather than the bicarbonate-specific SLC4 transporters (Alper & Sharma 2013). 3768

These different strategies in inorganic carbon uptake are consistent with the hypothesis 3769 that historic fluctuations in atmospheric CO2 have led to a diversification and re- 3770 configuration of the CCM within the diatoms (Young et al. 2016; Shen et al. 2017). 3771

3772

3773

CONCLUSIONS 3774

155

This study suggests that the CCM of Skeletonema and Thalassiosira use 3775 distinct CA isozymes that use different metal co-factors (Lane & Morel 2000; Tripp et 3776 al. 2004). We also show evidence suggesting distinct strategies for inorganic carbon 3777 uptake: while Thalassiosira uses substrate-specific SLC4 BCTs, Skeletonema seems to 3778 rely on diffusive CO2 transport or non-specific SLC26-mediated transport. This might 3779 influence community composition dynamics in high CO2 because diatoms that can 3780 downregulate the SLC4 active transporters in high CO2 might save some energy and 3781 gain a competitive advantage. The co-existence of these diatoms might be related to 3782 their CCMs that use different metal cofactors and they might also show unique 3783 responses to different metal regimes in high CO2. 3784

This study suggests that there are differences in N and P metabolism in these 3785 two diatoms. The significance of these differences in the context of ocean acidification 3786 is still unclear. Future studies should attempt to constrain the extent that transcript 3787 expression levels correlate with protein expression levels, cell physiology (e.g., 3788 cellular C:N:P ratios), and rate measurements (e.g., nitrate uptake). Additionally, the 3789 enzymatic activity of the CAs and BCTs should be characterized to confirm their 3790 putative function, as well as their flexibility of metal cofactor requirements. Results 3791 from this study showing distinct responses in nutrient metabolism is in agreement with 3792 previous transcriptome studies that suggest there are fundamental differences in N and 3793

P metabolism in these co-occuring diatoms (Alexander et al. 2015). Future 3794 experiments that combine multiple stressors (e.g., culturing in a range of CO2, N, and 3795

P levels) could give a more complete picture of how these differences would impact 3796 diatom community dynamics. 3797

156

3798

3799

3800

3801

3802

3803

3804

3805

3806

3807

3808

3809

3810

3811

3812

3813

3814

3815

3816

3817

3818

References 3819 3820 Alexander, H. et al., 2015. Metatranscriptome analyses indicate resource partitioning 3821

157

between diatoms in the field. Proceedings of the National Academy of Sciences, 3822

112(17), pp.E2182-E2190. 3823

Alper, S.L. & Sharma, A.K., 2013. The SLC26 Gene Family of Anion Transporters 3824

and Channels. Molecular Aspects of Medicine, 34(2–3), pp.494–515. 3825

Bach, L.T. et al., 2017. Simulated ocean acidification reveals winners and losers in 3826

coastal phytoplankton. PLoS ONE, 12(11), pp.1–23. 3827

Badger, M.R. et al., 1998. The diversity and coevolution of Rubisco, plastids, 3828

pyrenoids, and chloroplast-based CO2-concentrating mechanisms in algae. 3829

Canadian Journal of Botany, 76(6), pp.1052–1071. 3830

Bolger, A.M., Lohse, M. & Usadel, B., 2014. Trimmomatic: A flexible trimmer for 3831

Illumina sequence data. Bioinformatics, 30(15), pp.2114–2120. 3832

Buchfink, B., Xie, C. & Huson, D., 2015. Fast and sensitive protein alignment using 3833

DIAMOND. Nature Methods, 12(1), pp.59–60. 3834

Calbet, A. & Landry, M.R., 2004. Phytoplankton growth, microzooplankton grazing, 3835

and carbon cycling in marine systems. Limnology and Oceanography, 49(1), 3836

pp.51–57. 3837

Caldeira, K. & Wickett, M.E., 2003. Oceanography: anthropogenic carbon and ocean 3838

pH. Nature, 425(6956), p.365. 3839

Callahan, B.J. et al., 2016. DADA2: High-resolution sample inference from Illumina 3840

amplicon data. Nature Methods. 3841

Cohen, N.R. et al., 2017. Diatom Transcriptional and Physiological Responses to 3842

Changes in Iron Bioavailability across Ocean Provinces. Frontiers in Marine 3843

Science, 4(November). 3844

158

DOE, 1994. Handbook of methods for the analysis of the various parameters of the 3845

carbon dioxide system in sea water; version 2 A. G. Dickson & C. Goyet, eds., 3846

Dyhrman, S.T. et al., 2012. The transcriptome and proteome of the diatom 3847

Thalassiosira pseudonana reveal a diverse phosphorus stress response. PLoS 3848

ONE, 7(3). 3849

Feng, T.-Y. et al., 2015. Examination of metabolic responses to phosphorus limitation 3850

via proteomic analyses in the marine diatom Phaeodactylum tricornutum. 3851

Scientific Reports, 5(1), p.10373. 3852

Guillou, L. et al., 2013. The Protist Ribosomal Reference database (PR2): A catalog of 3853

unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy. 3854

Nucleic Acids Research. 3855

Hennon, G.M.M. et al., 2014. Acclimation conditions modify physiological response 3856

of the diatom Thalassiosira pseudonana to elevated CO2 concentrations in a 3857

nitrate-limited chemostat. Journal of Phycology, 50(2), pp.243–253. 3858

Hennon, G.M.M. et al., 2015. Diatom acclimation to elevated CO2 via cAMP 3859

signalling and coordinated gene expression. Nature Climate Change, 5(August), 3860

pp.761–766. 3861

Hewett-Emmett, D. & Tashian, R.E., 1996. Functional diversity, conservation, and 3862

convergence in the evolution of the α-, β-, and γ-carbonic anhydrase gene 3863

families. Molecular Phylogenetics and Evolution, 5(1), pp.50–77. 3864

Hopkinson, B.M. et al., 2011. Efficiency of the CO2-concentrating mechanism of 3865

diatoms. Proceedings of the National Academy of Sciences, 108(10), pp.3830– 3866

3837. 3867

159

Hunt, C.D. et al., 2010. Phytoplankton patterns in Massachusetts Bay-1992-2007. 3868

Estuaries and Coasts, 33(2), pp.448–470. 3869

Keeling, P.J. et al., 2014. The Marine Microbial Eukaryote Transcriptome Sequencing 3870

Project (MMETSP): Illuminating the Functional Diversity of Eukaryotic Life in 3871

the Oceans through Transcriptome Sequencing. PLoS Biology, 12(6). 3872

Khatiwala, S. et al., 2013. Global ocean storage of anthropogenic carbon. 3873

Biogeosciences, 10(4), pp.2169–2191. 3874

Kikutani, S. et al., 2012. Redox regulation of carbonic anhydrases via thioredoxin in 3875

chloroplast of the marine diatom Phaeodactylum tricornutum. Journal of 3876

Biological Chemistry, 287(24), pp.20689–20700. 3877

Kikutani, S. et al., 2016. Thylakoid luminal θ-carbonic anhydrase critical for growth 3878

and photosynthesis in the marine diatom Phaeodactylum tricornutum. 3879

Proceedings of the National Academy of Sciences, 113(35), pp.9828–9833. 3880

Kim, J.-M. et al., 2006. The effect of seawater CO2 concentration on growth of a 3881

natural phytoplankton assemblage in a controlled mesocosm experiment. 3882

Limnology and Oceanography, 51(4), pp.1629–1636. 3883

King, A.L. et al., 2015. Effects of CO2 on growth rate, C:N:P, and fatty acid 3884

composition of seven marine phytoplankton species. Marine Ecology Progress 3885

Series, 537, pp.59–69. 3886

Lane, T.W. & Morel, F.M.M., 2000. Regulation of Carbonic Anhydrase Expression by 3887

Zinc, Cobalt, and Carbon Dioxide in the Marine Diatom Thalassiosira 3888

weissflogii. Plant Physiology, 123(1), pp.345–352. 3889

Lauritano, C. et al., 2015. Key genes as stress indicators in the ubiquitous diatom 3890

160

Skeletonema marinoi. BMC Genomics, 16(1). 3891

Li, H., 2018. seqtk. https://github.com/lh3/seqtk. 3892

Li, H. et al., 2009. The Sequence Alignment/Map format and SAMtools. 3893

Bioinformatics, 25(16), pp.2078–2079. 3894

Li, H. & Durbin, R., 2010. Fast and accurate long-read alignment with Burrows- 3895

Wheeler transform. Bioinformatics, 26(5), pp.589-595 3896

Li, W. & Godzik, A., 2006. CD-HIT: A fast program for clustering and comparing 3897

large sets of protein or nucleotide sequences. Bioinformatics, 22(13), pp.1658– 3898

1659. 3899

Martin, M., 2011. Cutadapt removes adapter sequences from high-throughput 3900

sequencing reads. EMBnet.journal. 17(1), pp.10-12 3901

Mellett, T. et al., 2018. The biogeochemical cycling of iron, copper, nickel, cadmium, 3902

manganese, cobalt, lead, and scandium in a California Current experimental 3903

study. Limnology and Oceanography, 63, pp.S425–S447. 3904

Moriya, Y. et al., 2007. KAAS: An automatic genome annotation and pathway 3905

reconstruction server. Nucleic Acids Research, 35(SUPPL.2), pp.182–185. 3906

Noctor, G. & Foyer, C.H., 1998. ASCORBATE AND GLUTATHIONE: Keeping 3907

Active Oxygen Under Control. Annual Review of Plant Physiology and Plant 3908

Molecular Biology, 49(1), pp.249–279. 3909

Oksanen, J. et al., 2018. vegan: Community Ecology Package. https://cran.r- 3910

project.org/package=vegan. 3911

Parsons, T.R., Maita, Y. & Lalli, C.M., 1984. A Manual of Chemical and Biological 3912

Methods for Seawater Analysis, 3913

161

Le Quéré, C. et al., 2016. Global Carbon Budget 2016. Earth System Science Data, 3914

8(2), pp.605–649. 3915

Raven, J., 1991. Physiology of inorganic carbon acquisition and implications for 3916

resource use efficiency by marine phytoplankton: Relation to increased carbon 3917

dioxide and temperature. Plant, Cell and Environment, 14(8), pp.779–794. 3918

Sabine, C.L. et al., 2004. The Oceanic Sink for Anthropogenic CO2. Science, 3919

305(5682), pp.367–371. 3920

Samukawa, M. et al., 2014. Localization of putative carbonic anhydrases in the marine 3921

diatom, Thalassiosira pseudonana. Photosynthesis Research, 121(2–3), pp.235– 3922

249. 3923

Sett, S. et al., 2018. Shift towards larger diatoms in a natural phytoplankton 3924

assemblage under combined high-CO2 and warming conditions. Journal of 3925

Plankton Research, 00(June), pp.1–16. 3926

Shen, C., Dupont, C.L. & Hopkinson, B.M., 2017. The diversity of carbon dioxide- 3927

concentrating mechanisms in marine diatoms as inferred from their genetic 3928

content. Journal of Experimental Botany. 3929

Shi, D. et al., 2015. Interactive effects of light, nitrogen source, and carbon dioxide on 3930

energy metabolism in the diatom Thalassiosira pseudonana. Limnology and 3931

Oceanography, 60(5), pp.1805–1822. 3932

Solorzano, L. & Sharp, J.H., 1980. Determination of total dissolved phosphorus 3933

particulate phosphorus in natural waters. Limnology and Oceanography, 25(4), 3934

pp.754–758. 3935

Suffrian, K. et al., 2008. Microzooplankton grazing and phytoplankton growth in 3936

162

marine mesocosms with increased CO2 levels. Biogeosciences, 5(4), pp.1145– 3937

1156. 3938

Suttle, C.A., 1992. Inhibition of photosynthesis in phytoplankton by the submicron 3939

size fraction concentrated from seawater. Marine Ecology Progress Series, 87(1– 3940

2), pp.105–112. 3941

Tachibana, M. et al., 2011. Localization of putative carbonic anhydrases in two marine 3942

diatoms, Phaeodactylum tricornutum and Thalassiosira pseudonana. 3943

Photosynthesis Research, 109(1–3), pp.205–221. 3944

Tans, P., 2009. An Accounting of the Observed Increase in Oceanic and Atmospheric 3945

CO2 and the Outlook for the Future. Oceanography, 22(4), pp.26–35. 3946

Tortell, P.D. et al., 2002. CO2 effects on taxonomic composition and nutrient 3947

utilization in an Equatorial Pacific phytoplankton assemblage. Marine Ecology 3948

Progress Series, 236, pp.37–43. 3949

Tortell, P.D. et al., 2008. CO2 sensitivity of Southern Ocean phytoplankton. 3950

Geophysical Research Letters, 35(4), pp.1–5. 3951

Tripp, B.C. et al., 2004. A Role for Iron in an Ancient Carbonic Anhydrase. Journal of 3952

Biological Chemistry, 279(8), pp.6683–6687. 3953

Tsiola, A. et al., 2017. Ocean acidification and viral replication cycles: Frequency of 3954

lytically infected and lysogenic cells during a mesocosm experiment in the NW 3955

Mediterranean Sea. Estuarine, Coastal and Shelf Science, 186, pp.139–151. 3956

Wu, Z. et al., 2010. Empirical bayes analysis of sequencing-based transcriptional 3957

profiling without replicates. BMC Bioinformatics, 11(1), p.564. 3958

Young, J.N. et al., 2016. Large variation in the Rubisco kinetics of diatoms reveals 3959

163

diversity among their carbon-concentrating mechanisms. Journal of Experimental 3960

Botany, 67(11), pp.3445–3456. 3961

Zeebe, R.E. & Wolf-Gladrow, D., 2001. CO2 in Seawater: Equilibrium, Kinetics, 3962

Isotopes 1st ed. D. Halpern, ed., Amsterdam: Elsevier Oceanography Series. 3963

Zimmermann, J., Jahn, R. & Gemeinholzer, B., 2011. Barcoding diatoms: Evaluation 3964

of the V4 subregion on the 18S rRNA gene, including new primers and protocols. 3965

Organisms Diversity and Evolution, 11(3), pp.173–192. 3966

3967

3968

164

3969

A 1500 initial low 2 1000 * present high pre-dilution post-dilution

500 ppm CO * * 0

1/14 3/4/14 3/5/14 3/6/14 3/7/14 3/8/14 3/9/14 1 3/10/14 3/ 3/12/14 3/13/14 3/14/14 3/15/14 3/16/14 3/17/14 3/18/14 3/19/14 3/20/14 3/21/14 3/22/14 3/23/14 3/24/14

B F 100 15 -1 80

10 60

40 5 M POC

* μ 20 * g chlorophyll L μ 0 0

1/14 3/4/14 3/5/14 3/6/14 3/7/14 3/8/14 3/9/14 1 3/4/14 3/10/14 3/ 3/12/14 3/13/14 3/14/14 3/15/14 3/16/14 3/17/14 3/18/14 3/19/14 3/20/14 3/21/14 3/22/14 3/23/14 3/24/14 3/24/14

C 15 G 10 - 2 8 10 6 + NO

- 3 * 4 5 M BSi μ 2 M NO μ 0 0

1/14 3/4/14 3/5/14 3/6/14 3/7/14 3/8/14 3/9/14 1 3/4/14 3/10/14 3/ 3/12/14 3/13/14 3/14/14 3/15/14 3/16/14 3/17/14 3/18/14 3/19/14 3/20/14 3/21/14 3/22/14 3/23/14 3/24/14 3/24/14

D 1.5 H 0.5 0.4

3- 1.0 4 0.3 * 0.2 M POP M PO 0.5 μ

μ * 0.1

0.0 0.0

1/14 3/4/14 3/5/14 3/6/14 3/7/14 3/8/14 3/9/14 1 3/4/14 3/10/14 3/ 3/12/14 3/13/14 3/14/14 3/15/14 3/16/14 3/17/14 3/18/14 3/19/14 3/20/14 3/21/14 3/22/14 3/23/14 3/24/14 3/24/14

E 30 I 15 4

20 10 * M PON μ M Si(OH) 10 5 μ *

0 0

1/14 3/4/14 3/5/14 3/6/14 3/7/14 3/8/14 3/9/14 1 3/4/14 3/10/14 3/ 3/12/14 3/13/14 3/14/14 3/15/14 3/16/14 3/17/14 3/18/14 3/19/14 3/20/14 3/21/14 3/22/14 3/23/14 3/24/14 3/24/14

J 3

2

1

0 Temperature ºC Temperature -1

1/14 3/4/14 3/5/14 3/6/14 3/7/14 3/8/14 3/9/14 1 3/10/14 3/ 3/12/14 3/13/14 3/14/14 3/15/14 3/16/14 3/17/14 3/18/14 3/19/14 3/20/14 3/21/14 3/22/14 3/23/14 3/24/14 3970 3971 3972 3973 3974

165

-1 - - Figure 1. (A) ppm CO2, (B) Chlorophyll L , (C) µM NO3 +NO2 , (D) µM 4 PO4, (E) µM Si(OH) , (F) µM POC, (G) µM PON, (H) µM POP, (I) µM BSi, and (J) temperature °C for each incubation carboy. The x-axis indicates the date the variable was measured and the y-axis indicates the values and variables measured. The shapes and colors indicate the CO2 condition shown. Points and error bars indicate the mean and range of the measurement across the triplicate incubation carboys. The asterisks indicate significant differences relative to the present day pCO2 condition (1-way ANOVA; p < 0.05). Dilution days are indicated with black arrows. Solid shapes indicate that the measurement shown was taken before dilution, empty shapes indicate the measurement was taken post-dilution.

3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988

166

3989

Clustered MMETSP Database: Clustered MMETSP Database A Skeletonema & Thalassiosira only

DIAMOND DIAMOND DIAMOND BWA MAPQ60 Phylum Bacillariophyta Genera Species Species Counts Counts Counts Counts 2 Low CO Low 2 Present CO Present 2 High CO Thalassiosira oceanic Thalassiosira punctiger Thalassiosira gravid Thalassiosira antarctic Thalassiosira sp Thalassiosira rotul Thalassiosira weissflogi Thalassiosira miniscul Skeletonema menzeli Skeletonema dohrni Skeletonema costatu Skeletonema marino Othe Labyrinthist Alveolat Cryptophyt Chlorophyt Unknow Ciliophor Haptophyt Ochrophyt Dinophyt Bacillariophyt Othe Extubocellulu Amphipror Pseudo-nitzschi Asterionellopsi Fragilariopsi Chaetocero Ditylu Skeletonem Thalassiosir Thalassiosira oceanic Thalassiosira punctiger Thalassiosira gravid Thalassiosira antarctic Thalassiosira sp Thalassiosira rotul Thalassiosira weissflogi Thalassiosira miniscul Skeletonema menzeli Skeletonema dohrni Skeletonema costatu Skeletonema marino r r m a n a a a a a a a a s a a s a s s a . . a a a a i i i i m m i i a a a a a a a a i i c c f f Less Less More speci speci

PR2 Database B V4 Amplicon ASV’s

Low CO2

Present CO2

High CO2 Othe Porosir Polar-centric-Mediophyceae Grammonem Atthey Navicul Chaetocero Nitzschi Skeletonem Thalassiosir Othe Porosir Polar-centric-Mediophyceae Grammonem Atthey Navicul Chaetocero Nitzschi Skeletonem Thalassiosir Othe Porosir Polar-centric-Mediophyceae Grammonem Atthey Navicul Chaetocero Nitzschi Skeletonem Thalassiosir r r r a a a a a a a a a a a a s s s a a a a a a a a a

3990 3991 3992 Figure 2. A: Bar charts showing the relative distributions of 3993 metatranscriptome reads mapped to the MMETSP database using 3994 either DIAMOND or BWA, moving from less specific to more 3995 specific from left to right. B: DADA2 taxonomic assignment of 18S 3996 V4 rDNA amplicon reads using the PR2 database. 3997 3998 3999 167

HMGCS log2 fold change relative to present-day CO BCKDHB IVD HADHA 2 -8 -4 0 4 8 L-Leu OXCT = not expressed

ACADS HADH KO KO

APA BCKDHB HADHA Thal. amino low Skel. high acids L-Val ACADSB mmsB

ACAA1 - acetyl-CoA acyltransferase 1 (K07513) ACAA2 - acetyl-CoA acyltransferase 2 (K07508) ACADS - butyryl-CoA dehydrogenase (K00248) ACAA1 acetyl ACADSB - short/branched chain acyl-CoA dehydrogenase (K09478) CoA APA - amino acid/polyamine antiporter (K03294) ACADS BCKDHB - 2-oxoisovalerate dehydrogenase (K00167) HADH - 3-hydroxyacyl-CoA dehydrogenase (K00022) HADHA - enoyl-CoA hydratase (K07515) BCKDHB HADHA HADH ACAA2 PCCB HADHB - acetyl-CoA acyltransferase (K07509) L-Ile HMGCS - hydroxymethylglutaryl-CoA synthase (K01641) ACADSB IVD - isovaleryl-CoA dehydrogenase (K00253) mmsB - 3-hydroxyisobutyrate dehydrogenase (K00020) HADHB OXCT - 3-oxoacid CoA-transferase (K01027) PCCB - propionyl-CoA carboxylase (K01966)

GCLC

log2 fold change relative to present-day CO IP 2 log2 fold change relative to present-day CO -8 -4 0 4 8 2 cysQ -8 -4 0 4 8 IMPL2 VTC4 nucleotides GSS = not expressed = not expressed

ACP5 KO KO glutathione KO KO PO Thal. 4 Thal.

PME low Skel. high GSR low phoD Skel. high

SLC20 SLC34 ACP5 - tartrate-resistant acid phosphatase (K14379) GCLC - glutamate-cysteine ligase (K11204) cysQ - 3’(2’),5’-bisphosphate nucleotidase (K01082) glutathione GSR - glutathione reductase NADPH (K00383) IMPL2 - Inositol-phosphate phosphatase (K18649) (reduced) GSS - glutathione synthase (K21456) SLC20 - Sodium-dependent phosphate transporter (K14640) GST - glutathione S-transferase (K00799) SLC34 - Sodium-dependent phosphate cotransporter (K14683) GSR VTC4 - Inositol-phosphate phosphatase (K10047) PO4 PO4

L-Gln log2 fold change relative to present-day CO log2 fold change relative to present-day CO 2 2 -8 -4 0 4 8 -8 -4 0 4 8 gltD CS

glnA nirA = not expressed oxaloacetate = not expressed + maeB NH4 citrate KO KO KO KO NR URE α-ketoglutarate L-Glu Thal. Thal. low malate succinyl-CoA Skel. high low 1P5C Skel. high fumC LSC2 NRT SLC14 DUR3 AMT AMT - ammonium transporter (K03320) 1P5C - 1-pyrroline-5-carboxylate dehydrogenase (K00294) DUR3 - urea-proton symporter (K20989) CS - citrate synthase (K01647) NIRA - ferredoxin-nitrite reductase (K00366) L-Pro fumC - fumarate hydratase (K01679) NR - nitrate reductase (K10534) fumarate glnA - glutamine synthetase (K01915) NRT - nitrate/nitrite transporter (K02575) gltD - glutamate synthetase (K00266) - NH + URE - urease (K01427) succinate LSC2 - succinyl-CoA synthetase (K01900) NO3 urea 4 maeB - malate dehydrogenase (K00029)

4000 4001 4002 4003 4004 4005 Figure 3. A heat map showing broad metabolic rearrangements in the pathway for the 4006 citrate cycle in Skeletonema and Thalassiosira across CO2 conditions. The color 4007 indicates the log2-fold change of total reads mapped per genus per KEGG KO in low 4008 or high CO2 relative to the present day CO2 condition. 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028

168

HMGCS log2 fold change relative to present-day CO BCKDHB IVD HADHA 2 -8 -4 0 4 8 L-Leu OXCT = not expressed

ACADS HADH KO KO

APA BCKDHB HADHA Thal. amino low Skel. high acids L-Val ACADSB mmsB

ACAA1 - acetyl-CoA acyltransferase 1 (K07513) ACAA2 - acetyl-CoA acyltransferase 2 (K07508) ACADS - butyryl-CoA dehydrogenase (K00248) ACAA1 acetyl ACADSB - short/branched chain acyl-CoA dehydrogenase (K09478) CoA APA - amino acid/polyamine antiporter (K03294) ACADS BCKDHB - 2-oxoisovalerate dehydrogenase (K00167) HADH - 3-hydroxyacyl-CoA dehydrogenase (K00022) HADHA - enoyl-CoA hydratase (K07515) BCKDHB HADHA HADH ACAA2 PCCB HADHB - acetyl-CoA acyltransferase (K07509) L-Ile HMGCS - hydroxymethylglutaryl-CoA synthase (K01641) ACADSB IVD - isovaleryl-CoA dehydrogenase (K00253) mmsB - 3-hydroxyisobutyrate dehydrogenase (K00020) HADHB OXCT - 3-oxoacid CoA-transferase (K01027) PCCB - propionyl-CoA carboxylase (K01966)

GCLC log2 fold change relative to present-day CO IP 2 log2 fold change relative to present-day CO -8 -4 0 4 8 2 cysQ -8 -4 0 4 8 IMPL2 VTC4 nucleotides GSS = not expressed = not expressed

ACP5 KO KO glutathione KO KO PO Thal. 4 Thal.

PME low Skel. high GSR low phoD Skel. high

SLC20 SLC34 ACP5 - tartrate-resistant acid phosphatase (K14379) GCLC - glutamate-cysteine ligase (K11204) cysQ - 3’(2’),5’-bisphosphate nucleotidase (K01082) glutathione GSR - glutathione reductase NADPH (K00383) IMPL2 - Inositol-phosphate phosphatase (K18649) (reduced) GSS - glutathione synthase (K21456) SLC20 - Sodium-dependent phosphate transporter (K14640) GST - glutathione S-transferase (K00799) SLC34 - Sodium-dependent phosphate cotransporter (K14683) GSR PO PO VTC4 - Inositol-phosphate phosphatase (K10047) 4 4 4029 4030 4031

L-Gln log2 fold change relative to present-day CO 4032 log2 fold change relative to present-day CO 2 2 -8 -4 0 4 8 4033 -8 -4 0 4 8 gltD CS 4034 glnA nirA = not expressed 4035 oxaloacetate = not expressed + 4036 maeB NH4 citrate KO KO 4037 KO KO NR URE α-ketoglutarate L-Glu Thal. 4038 Thal.

low 4039 malate succinyl-CoA Skel. high low Skel. high 1P5C 4040 fumC LSC2 NRT SLC14 DUR3 AMT 4041 AMT - ammonium transporter (K03320) 1P5C - 1-pyrroline-5-carboxylate dehydrogenase (K00294) DUR3 - urea-proton symporter (K20989) 4042 CS - citrate synthase (K01647) NIRA - ferredoxin-nitrite reductase (K00366) L-Pro fumC - fumarate hydratase (K01679) NR - nitrate reductase (K10534) 4043 fumarate glnA - glutamine synthetase (K01915) NRT - nitrate/nitrite transporter (K02575) 4044 gltD - glutamate synthetase (K00266) NO - urea NH + URE - urease (K01427) succinate LSC2 - succinyl-CoA synthetase (K01900) 3 4 4045 maeB - malate dehydrogenase (K00029) 4046 4047 Figure 4. A heat map showing broad metabolic rearrangements in nitrogen uptake 4048 pathway members in Skeletonema and Thalassiosira across CO2 conditions. The color 4049 indicates the log2-fold change of total reads mapped per genus per KEGG KO in low 4050 or high CO2 relative to the present day CO2 condition. 4051 4052 4053 4054 4055 4056 4057 4058

4059

4060

4061

4062

4063

4064

4065

4066

169

8 2

4 high KO low 8 2 4067 0 = not expressed

4 KO Thal. Skel. high 4068

-4 low - urea-proton symporter (K20989) KO - ferredoxin-nitrite reductase (K00366) - ammonium transporter (K03320) - urease (K01427) - nitrate/nitrite transporter (K02575) - nitrate reductase (K10534) 4069 0 AMT DUR3 NIRA NR NRT URE log2 fold change relative to present-day CO 8 2 -8 = not expressed + KO Thal. Skel. 4070 4

+ -4

4 4 high AMT NH - glutamate-cysteine ligase (K11204)

- glutathione reductase NADPH (K00383) - glutathione synthase (K21456) low - glutathione S-transferase (K00799) NH KO 4071 log2 fold change relative to present-day CO GCLC GSR GSS GST -8 0 DUR3 4072 = not expressed URE conditions. The KO Thal. Skel. urea

2 -4 GSS GSR GSR GCLC SLC14 chain amino acid chain amino

(reduced) 4073 glutathione - glutathione - log2 fold change relative to present-day CO 3 - 2-oxoisovalerate dehydrogenase (K00167) - short/branched chain acyl-CoA dehydrogenase (K09478) - short/branched chain acyl-CoA -8 - hydroxymethylglutaryl-CoA synthase (K01641) - hydroxymethylglutaryl-CoA - acetyl-CoA acyltransferase (K07509) - acetyl-CoA - enoyl-CoA hydratase (K07515) - enoyl-CoA - butyryl-CoA dehydrogenase (K00248) - butyryl-CoA - acetyl-CoA acyltransferase 1 (K07513) - acetyl-CoA acyltransferase 2 (K07508) - acetyl-CoA NR - 3-hydroxyisobutyrate dehydrogenase (K00020) 4074 - 3-hydroxyacyl-CoA dehydrogenase (K00022) - 3-hydroxyacyl-CoA nirA - propionyl-CoA carboxylase (K01966) - propionyl-CoA NRT - 3-oxoacid CoA-transferase (K01027) - amino acid/polyamine antiporter (K03294) NO - isovaleryl-CoA dehydrogenase (K00253) - isovaleryl-CoA across CO across

ACAA1 ACAA2 ACADS ACADSB APA BCKDHB HADH HADHA HADHB HMGCS IVD mmsB OXCT PCCB branched 4075 CoA acetyl 4076 8 2 8 2

Thalassiosira 4077

PCCB OXCT HADH

mmsB

4 HMGCS high 4 high

and and

low KO low KO 4078 ACAA2 ACAA1 HADHB HADHA HADHA 0 0 4079 condition. condition.

= not expressed = not expressed otal reads mapped mapped per in or per genus low reads KO KEGG otal 2 IVD KO Thal. Skel. HADH KO Thal. Skel. ACADS ACADSB -4 -4 Skeletonema

4080 - Sodium-dependent phosphate transporter (K14640) - Sodium-dependent phosphate cotransporter (K14683) - Inositol-phosphate phosphatase (K18649) - tartrate-resistant acid phosphatase (K14379) - Inositol-phosphate phosphatase (K10047) HADHA - 3’(2’),5’-bisphosphate nucleotidase (K01082) log2 fold change relative to present-day CO - malate dehydrogenase (K00029) log2 fold change relative to present-day CO BCKDHB - succinyl-CoA synthetase (K01900) - succinyl-CoA - fumarate hydratase (K01679) - 1-pyrroline-5-carboxylate dehydrogenase (K00294) BCKDHB - glutamine synthetase (K01915) -8 - glutamate synthetase (K00266) -8

- citrate synthase (K01647) 4081 ACP5 cysQ IMPL2 SLC20 SLC34 VTC4 1P5C CS fumC glnA gltD LSC2 maeB ACADS ACADSB 4082 4 fold t of change VTC4 - gltD L-Val 4 L-Leu PO SLC34 IP BCKDHB 1P5C L-Gln L-Glu L-Pro

PO 4083 4 glnA IMPL2 SLC20 PO

L-Ile 4084 acids amino relative topresent day CO relative the A heat map showing broad metabolic rearrangements rearrangements in map metabolic A showing broad heat

CS 2 LSC2

citrate 4085 5. cysQ phoD ACP5 -ketoglutarate succinyl-CoA α APA 4086 Figure metabolism inpathway components color the indicates log2 high CO fumC maeB PME malate fumarate succinate 4087 oxaloacetate nucleotides 4088

4089

170

HMGCS log2 fold change relative to present-day CO BCKDHB IVD HADHA 2 -8 -4 0 4 8 L-Leu OXCT = not expressed

ACADS HADH KO KO

APA BCKDHB HADHA Thal. amino low Skel. high acids L-Val ACADSB mmsB

ACAA1 - acetyl-CoA acyltransferase 1 (K07513) ACAA2 - acetyl-CoA acyltransferase 2 (K07508) ACADS - butyryl-CoA dehydrogenase (K00248) ACAA1 acetyl ACADSB - short/branched chain acyl-CoA dehydrogenase (K09478) CoA APA - amino acid/polyamine antiporter (K03294) ACADS BCKDHB - 2-oxoisovalerate dehydrogenase (K00167) HADH - 3-hydroxyacyl-CoA dehydrogenase (K00022) HADHA - enoyl-CoA hydratase (K07515) BCKDHB HADHA HADH ACAA2 PCCB HADHB - acetyl-CoA acyltransferase (K07509) L-Ile HMGCS - hydroxymethylglutaryl-CoA synthase (K01641) ACADSB IVD - isovaleryl-CoA dehydrogenase (K00253) mmsB - 3-hydroxyisobutyrate dehydrogenase (K00020) HADHB OXCT - 3-oxoacid CoA-transferase (K01027) PCCB - propionyl-CoA carboxylase (K01966)

GCLC

log2 fold change relative to present-day CO IP 2 log2 fold change relative to present-day CO -8 -4 0 4 8 2 cysQ -8 -4 0 4 8 IMPL2 VTC4 nucleotides GSS = not expressed = not expressed

ACP5 KO KO glutathione KO KO PO Thal. 4 Thal.

PME low Skel. high GSR low phoD Skel. high

SLC20 SLC34 ACP5 - tartrate-resistant acid phosphatase (K14379) GCLC - glutamate-cysteine ligase (K11204) cysQ - 3’(2’),5’-bisphosphate nucleotidase (K01082) glutathione GSR - glutathione reductase NADPH (K00383) IMPL2 - Inositol-phosphate phosphatase (K18649) (reduced) GSS - glutathione synthase (K21456) SLC20 - Sodium-dependent phosphate transporter (K14640) GST - glutathione S-transferase (K00799) SLC34 - Sodium-dependent phosphate cotransporter (K14683) GSR VTC4 - Inositol-phosphate phosphatase (K10047) PO4 PO4 4090

L-Gln log2 fold change relative to present-day CO Figure 6. A heat map showing broad metaboliclog2 rearrangements fold change relative to in present-day phosphorous CO 2 4091 2 -8 -4 0 4 8 -8 -4 0 4 8 metabolism in Skeletonema and ThalassiosiragltD across CO2 conditions. The color 4092 indicates the log2-foldCS change of total reads mapped per genus per KEGG KO in low 4093 oxaloacetate glnA nirA = not expressed or high CO2 relative to the present day CO2 condition. = not expressed 4094 + maeB NH4 4095 citrate KO KO KO KO NR URE α-ketoglutarate L-Glu Thal. 4096 Thal. low malate succinyl-CoA Skel. high low 1P5C Skel. high fumC LSC2 NRT SLC14 DUR3 AMT 4097 AMT - ammonium transporter (K03320) 1P5C - 1-pyrroline-5-carboxylate dehydrogenase (K00294) DUR3 - urea-proton symporter (K20989) CS - citrate synthase (K01647) NIRA - ferredoxin-nitrite reductase (K00366) 4098 L-Pro fumC - fumarate hydratase (K01679) NR - nitrate reductase (K10534) fumarate glnA - glutamine synthetase (K01915) NRT - nitrate/nitrite transporter (K02575) gltD - glutamate synthetase (K00266) - NH + URE - urease (K01427) 4099 succinate LSC2 - succinyl-CoA synthetase (K01900) NO3 urea 4 maeB - malate dehydrogenase (K00029) 4100

4101

4102

4103

4104

4105

4106

171

HMGCS log2 fold change relative to present-day CO BCKDHB IVD HADHA 2 -8 -4 0 4 8 L-Leu OXCT = not expressed

ACADS HADH KO KO

APA BCKDHB HADHA Thal. amino low Skel. high acids L-Val ACADSB mmsB

ACAA1 - acetyl-CoA acyltransferase 1 (K07513) ACAA2 - acetyl-CoA acyltransferase 2 (K07508) ACADS - butyryl-CoA dehydrogenase (K00248) ACAA1 acetyl ACADSB - short/branched chain acyl-CoA dehydrogenase (K09478) CoA APA - amino acid/polyamine antiporter (K03294) ACADS BCKDHB - 2-oxoisovalerate dehydrogenase (K00167) HADH - 3-hydroxyacyl-CoA dehydrogenase (K00022) HADHA - enoyl-CoA hydratase (K07515) BCKDHB HADHA HADH ACAA2 PCCB HADHB - acetyl-CoA acyltransferase (K07509) L-Ile HMGCS - hydroxymethylglutaryl-CoA synthase (K01641) ACADSB IVD - isovaleryl-CoA dehydrogenase (K00253) mmsB - 3-hydroxyisobutyrate dehydrogenase (K00020) HADHB OXCT - 3-oxoacid CoA-transferase (K01027) PCCB - propionyl-CoA carboxylase (K01966) 4107

4108

4109 GCLC log2 fold change relative to present-day CO IP 2 log2 fold change relative to present-day CO 4110 -8 -4 0 4 8 2 cysQ -8 -4 0 4 8 IMPL2 VTC4 4111 nucleotides GSS = not expressed = not expressed 4112 ACP5 KO KO glutathione KO KO 4113 PO Thal. 4 Thal.

PME low 4114 Skel. high GSR low phoD Skel. high

SLC20 SLC34 4115 ACP5 - tartrate-resistant acid phosphatase (K14379) GCLC - glutamate-cysteine ligase (K11204) cysQ - 3’(2’),5’-bisphosphate nucleotidase (K01082) glutathione GSR - glutathione reductase NADPH (K00383) 4116 IMPL2 - Inositol-phosphate phosphatase (K18649) (reduced) GSS - glutathione synthase (K21456) SLC20 - Sodium-dependent phosphate transporter (K14640) GST - glutathione S-transferase (K00799) SLC34 - Sodium-dependent phosphate cotransporter (K14683) GSR 4117 VTC4 - Inositol-phosphate phosphatase (K10047) PO4 PO4 4118

4119 L-Gln log2 fold change relative to present-day CO log2 fold change relative to present-day CO 2 2 Figure 7. A heat map showing broad metabolic-8 rearrangements-4 in0 glutathione4 8 4120 -8 -4 0 4 8 gltD CS metabolism in Skeletonema and Thalassiosira across CO2 conditions. The color 4121 indicates the log2-fold change of total reads mapped per genus per KEGG KO in low 4122 oxaloacetate glnA nirA = not expressed = not expressed or high CO2 relative to the present day CO2 condition. 4123 + maeB NH4 4124 citrate KO KO KO KO NR URE α-ketoglutarate L-Glu Thal. 4125 Thal. low malate succinyl-CoA Skel. high low 1P5C Skel. high 4126 fumC LSC2 NRT SLC14 DUR3 AMT AMT - ammonium transporter (K03320) 4127 1P5C - 1-pyrroline-5-carboxylate dehydrogenase (K00294) DUR3 - urea-proton symporter (K20989) CS - citrate synthase (K01647) NIRA - ferredoxin-nitrite reductase (K00366) L-Pro fumC - fumarate hydratase (K01679) NR - nitrate reductase (K10534) 4128 fumarate glnA - glutamine synthetase (K01915) NRT - nitrate/nitrite transporter (K02575) gltD - glutamate synthetase (K00266) - NH + URE - urease (K01427) succinate LSC2 - succinyl-CoA synthetase (K01900) NO3 urea 4 4129 maeB - malate dehydrogenase (K00029) 4130

4131

172

4132

4133

4134

4135

4136

4137

4138

4139

4140

4141

4142

4143

4144

4145

4146

Figure 8. Expression values of the α, δ, γ, and ζ carbonic anhydrases from 4147 Skeletonema spp. and Thalassisoira spp. diatoms. Each point indicates a gene and the 4148 color indicates which genera of diatom the sequence is from. The x-axis is the 4149 expression value in present day pCO2 and the y-axis is the expression value in either 4150 low or high pCO2. The black line indicates the point at which the expression of a gene 4151 is the same in both conditions. Genes that were differentially expressed (>=0.95 4152 posterior probability of >2-fold change) are indicated by the black star) (Wu et al. 4153 2010). 4154 4155

4156

4157

4158

173

4159

4160

4161

4162

4163

4164

4165

4166

4167

4168

4169

4170

4171

4172

4173

Figure 9. Expression values of the SLC4 and SLC26 bicarbonate transporters from 4174 Skeletonema spp. and Thalassisoira spp. diatoms. Each point indicates a gene and the 4175 color indicates which genera of diatom the sequence is from. The x-axis is the 4176 expression value in present day pCO2 and the y-axis is the expression value in either 4177 low or high pCO2. The black line indicates the point at which the expression of a gene 4178 is the same in both conditions. Genes that were differentially expressed (>=0.95 4179 posterior probability of >2-fold change) are indicated by the black star) (Wu et al. 4180 2010). 4181 4182

4183

4184

4185

174

APPENDICES 4186

APPENDIX 1 4187

4188

Amphiprora sp. CCMP467 2 Asterionellopsis glacialis CCMP1581 1 4189 Attheya septentrionalis CCMP2084 2 Aulacoseira subarctica CCAP 1002/5 1 affinis CCMP159 2 Chaetoceros cf. neogracile RCC1993 1 Chaetoceros debilis MM31A-1 6 >400 Chaetoceros dichaeta CCMP1751 1 Chaetoceros neogracile CCMP1317 6 Corethron hystrix 308 1 Corethron pennatum L29A3 1 Cyclotella meneghiniana CCMP 338 112 Cylindrotheca closterium KMMCC:B-181 4 Detonula confervacea CCMP 353 46 Ditylum brightwellii GSO103 1 Ditylum brightwellii GSO104 2 Ditylum brightwellii GSO105 2 Ditylum brightwellii Pop1 (SS4) 3 Ditylum brightwellii Pop2 (SS10) 1 Extubocellulus spinifer CCMP396 65 Fragilariopsis kerguelensis L2-C3 18 200-400 Fragilariopsis kerguelensis L26-C5 30 Grammatophora oceanica CCMP 410 3 Leptocylindrus danicus CCMP1856 1 Leptocylindrus danicus var. apora B651 1 Leptocylindrus danicus var. danicus B650 2 Minutocellus polymorphus NH13 19 Minutocellus polymorphus RCC2270 27 Nitzschia punctata CCMP561 2 Proboscia alata PI-D3 7 Pseudo-nitzschia delicatissima B596 2 Pseudo-nitzschia delicatissima UNC1205 1 Pseudo-nitzschia fraudulenta WWA7 1 Rhizosolenia setigera CCMP 1694 3 Skeletonema costatum 1716 279 20-200 Skeletonema dohrnii SkelB 159 Skeletonema marinoi FE60 96 Skeletonema marinoi FE7 188 Skeletonema marinoi SM1012Den-03 248 Skeletonema marinoi SM1012Hels-07 179 Skeletonema marinoi UNC1201 67 Skeletonema marinoi skelA 233 Skeletonema menzelii CCMP793 117 Thalassionema nitzschioides L26-B 2 Thalassiosira antarctica CCMP982 61 Thalassiosira gravida GMp14c1 4 Thalassiosira miniscula CCMP1093 26 Thalassiosira oceanica CCMP1005 602 Thalassiosira punctigera Tpunct2005C2 6 <20 Thalassiosira rotula CCMP3096 3 Thalassiosira rotula GSO102 5 Thalassiosira sp. NH16 30 Thalassiosira weissflogii CCMP1010 13 Thalassiosira weissflogii CCMP1336 1 Thalassiothrix antarctica L6-D1 3 Copy #

175

4190

Supplemental Figure 2. Sel1/MYND zinc finger domain family homologs from diatoms in the MMETSP database. The color of each box indicates the number of homologs found in each species.

176

4191 Supplemental Table 1. Detailed table for T. oceanica and T. weissflogii showing the 4192 KEGG orthology or protein domain information assigned by KEGG automated 4193 annotation server or InterProScan. 4194 4195 Supplemental Table 2. Detailed table for T. oceanica and T. weissflogii showing the 4196 posterior probability of a 2-fold up or downregulation in silicon limitation relative to 4197 the nutrient replete condition, the gene length in nucleotides, and the expression values 4198 in nutrient replete (Re), silicon limited (Si), iron limited (Fe), and nitrate limited (N) 4199 conditions in RPKM and total reads mapped. 4200 4201 Supplemental Table 3. Hierarchical cluster assignments for the silicon-sensitive genes 4202 from T. oceanica and T. weissflogii and the log2 fold change expression value in 4203 silicon (Si), iron (Fe), or nitrate (N) limited conditions relative to the nutrient replete 4204 condition. 4205 4206 Supplemental Table 4. Silix gene family assignments for the translated proteins from 4207 the T. oceanica, T. weissflogii, and T. rotula transcriptomes as well as the translated 4208 proteins from the T. pseudonana genome (filtered set of gene models from Thaps3 and 4209 Thaps3_bd). Each Silix gene family assignment starts with the prefix “FAM”. The 4210 genes from the Thaps3 data start with a “jgi_THApse_” prefix, followed by the gene 4211 number assigned by JGI. The genes from the Thaps3_bd data start with a 4212 “jgi_THApbd_” prefix, followed by the gene number assigned by JGI. 4213 4214

4215

4216

4217

4218

177

APPENDIX 2 4219

Supplemental Table 1. All of the significantly differentially expressed genes 4220 across conditions and transcriptomes (>= 0.95 posterior probability of a 2-fold 4221 change in ASC) (Wu et al. 2010). The KO and description columns are from the 4222 KEGG annotations. The gene_ID column entries are formatted such that the first 4223 6 letters indicate which genus (capital letters) species (lower case letters) 4224 transcriptome the sequence originates and the following letters and numbers 4225 indicate the gene ID within that transcriptome. The counts and RPKM values are 4226 the output from the CLC mapping. The next six columns are the posterior 4227 probabilities of a 2-fold change compared to the 320-390 ppm pCO2 condition 4228 (“ambient”) as per ASC (Wu et al. 2010). “Low” refers to the <290 ppm 4229 condition, “projected” refers to the 690-1340 ppm pCO2 condition, and “max” 4230 refers to the 2900-5100 ppm pCO2 condition. The last 8 columns are from the 4231 InterProScan output. 4232 4233 Supplemental Table 2. A summary of the enzyme type, predicted localization, 4234 expression value, and ASC posterior probability of a 2 fold change (Wu et al. 4235 2010) for each gene hypothesized to play a role in the diatom CCM. This includes 4236 bicarbonate transporters, carbonic anhydrases, and genes thought to be involved 4237 in diatom C4 metabolism. The gene_ID column is formatted such that the first 6 4238 letters indicate which genus (capital letters) species (lower case letters) 4239 transcriptome the sequence originates and the following letters and numbers 4240 indicate the gene ID within that transcriptome. The additional worksheets provide 4241 the outputs from the different localization predictions and the “Predicted 4242 Localization” column is the interpretation of these results. 4243 4244 Supplemental Table 3. Selected putative CCM genes from the T. oceanica 4245 transcriptome (this study) and their predicted location when compared to the T. 4246 oceanica genome (Lommer et al. 2012) . Genes included in this table include 4247 bicarbonate transporters, carbonic anhydrases, and genes thought to be involved 4248 in diatom C4 metabolism. 4249 4250

178

Metatranscriptome

60000

40000

20000 Low CO2 (<215 ppm)

Present CO2 (330-390 ppm)

Future CO2 (>780 ppm)

Unique Transcripts Recovered ( >= 2 reads mapped) APPENDIX 3 4251 0

0 25 50 75 100 % of Library

Amplicon

50

40

30

Species 20

Low CO2 (<215 ppm)

Present CO2 (330-390 ppm) 10 Future CO2 (>780 ppm)

0 0 5000 10000 15000 20000 25000 Sample Size 4252

Supplemental Figure 1. A rarefaction curve of the 18S V4 rDNA amplicons. The x- 4253 axis indicates the number of amplicons sampled, and the y-axis indicates the number 4254 of species recovered. The color of the line indicates which pCO2 condition is being 4255 shown. Each triplicate carboy is represented. 4256 4257 4258 4259 4260 4261 4262

4263

4264

179

Metatranscriptome

60000

40000

20000 Low CO2 (<215 ppm)

Present CO2 (330-390 ppm)

Future CO2 (>780 ppm) Unique Transcripts Recovered ( >= 2 reads mapped) 0

0 25 50 75 100 % of Library Amplicon 4265

50 4266

40 Supplemental Figure 2. A rarefaction curve of the metatranscriptome reads mapped to 4267 a subset of transcripts30 from Skeletonema spp. and Thalassiosira spp.. The x-axis 4268 indicates the percentage of the library that was sampled, and the y-axis indicates the 4269 number of uniqueSpecies 20transcripts recovered with at least 2 reads mapped (BWA MEM 4270 Low CO2 (<215 ppm) Present CO (330-390 ppm) MAPQ60 filtering). The color of the line indicates which2 pCO2 condition is being 4271 10 Future CO (>780 ppm) shown. Replicate carboys were pooled before sequenci2 ng, so there is a single line for 4272 each CO conditions’ transcriptome library. 4273 2 0 0 5000 10000 15000 20000 25000 4274 Sample Size 4275

4276

4277

4278

4279

4280

4281

4282

4283

4284

180

4285

4286

Supplemental Table 1: Top: Total number of reads remaining in each library post- 4287 trimming. Middle: Total number of reads that mapped in DIAMOND to the entire 4288 MMETSP database, only the diatoms in the MMETSP database, and only to 4289 Skeletonema spp. or Thalassiosira spp. diatoms in the MMETSP database. For the 4290 DIAMOND mapping, the forward and reverse reads were mapped separately and the 4291 values shown here is the sum of the forward and reverse read counts. Bottom: Number 4292 of read pairs mapped to Skeletonema spp. or Thalassiosira spp. using a more stringent 4293 BWA mapping and MAPQ60 filtering in SAMtools. 4294 4295

4296

4297 low present high

post-trim library sizes 13,831,711 25,037,661 18,501,137

reads mapped to 73,566,310 143,481,946 159,523,246 MMETSP (DIAMOND) reads mapped to MMETSP diatoms 54,032,252 91,529,538 102,295,960 (DIAMOND) reads mapped to MMETSP Skeletonema 25,036,845 24,748,686 27,215,604 (DIAMOND) reads mapped to MMETSP Thalassiosira 10,706,149 26,492,162 28,780,506 (DIAMOND) reads mapped to Skeletonema (BWA 915,475 617,662 547,298 MAPQ60) reads mapped to Thalassiosira (BWA 826,575 1,263,129 1,069,587 MAPQ60) 4298 4299 4300

4301

4302

181

4303

Supplemental Table 2. All of the KEGG KOs from this experiment. The first two 4304 columns are the KEGG KO ID and description. The additional columns indicate the 4305 log2-fold change in low or high CO2 relative to the present day CO2 for Skeletonema 4306 spp. and Thalassiosira spp.. Also shown are the normalized expression values, the 4307 posterior probability of a KO being differentially expressed (Wu et al. 2010), and 4308 higher-level information about which metabolic processes each KO is involved in. 4309 4310 Supplemental Table 3. All of the MMETSP genes detected in this experiment (Keeling 4311 et al. 2014). Columns shown indicate the CAMNT or CAMPEP ID for each gene, the 4312 MMETSP grindstone ID, genus, and species that the sequenced originated from, the 4313 posterior probability of a log2-fold change in low or high CO2 relative to the present 4314 day CO2 (Wu et al. 2010), the normalized reads and TPM values, the log2-fold change 4315 values in low or high CO2 relative to the present day CO2, and higher-level information 4316 about which metabolic processes each KO is involved in. 4317 4318 Supplemental Table 4. A table summarizing all of the carbonic anhydrases and 4319 bicarbonate transporters in this dataset. Columns shown indicate the CAMNT or 4320 CAMPEP ID for each gene, the genus, and the species each sequence originated from. 4321 Also shown is the posterior probability of a log2-fold change in low or high CO2 4322 relative to the present day CO2 (Wu et al. 2010), the normalized reads, the log2-fold 4323 change values in low or high CO2 relative to the present day CO2, the enzyme type for 4324 each transcript (alpha, delta, gamma, or zeta carbonic anhydrase; SLC4 or SLC26 4325 bicarbonate transporter), and the log10-transformation of the normalized reads. 4326 4327

4328

4329

4330

4331

4332

4333

4334

4335

4336

182