Biogeography, Cultivation and Genomic Characterization of in the Red Sea

Dissertation by

Ahmed A. Shibl

In Partial Fulfillment of the Requirements

For the Degree of

Doctor of Philosophy

King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia

©December 2015 Ahmed A. Shibl All Rights Reserved

2

EXAMINATION COMMITTEE APPROVAL FORM

The dissertation by Ahmed A. Shibl is approved by the examination committee.

Committee Chairperson: ______Prof. Ulrich Stingl______

Committee Member: ______Prof. Xelu Moran______

Committee Member: ______Prof. Arnab Pain______

Committee Member: ______Prof. Rania Siam______

3

ABSTRACT

Biogeography, Cultivation and Genomic Characterization of Prochlorococcus in the Red Sea

Ahmed A. Shibl

Aquatic primary productivity mainly depends on pelagic phytoplankton.

The globally abundant marine picocyanobacteria Prochlorococcus comprises a significant fraction of the photosynthetic biomass in most tropical, oligotrophic oceans. The Red Sea is an enclosed narrow body of water characterized by continuous solar irradiance, and negligible annual rainfall, in addition to elevated temperatures and salinity levels, which mimics a global warming scenario.

Analysis of 16S rRNA sequences of communities indicated the predominance of a high-light adapted ecotype (HL II) of Prochlorococcus at the surface of the Northern and Central Red Sea. To this end, we analyzed the distribution of Prochlorococcus at multiple depths within and below the euphotic zone in different regions of the Red Sea, using clone libraries of the 16S–23S rRNA internal transcribed spacer (ITS) region. Results indicated a high diversity of Prochlorococcus ecotypes at the 100 m depth in the water column and an unusual dominance of HL II-related sequences in deeper waters of the Red Sea.

To further investigate the microdiversity of Prochlorococcus over a wider biogeographical scope, we used a 454-pyrosequencing approach to analyze rpoC1 gene pyrotags. Samples were collected from the surface of the water column to up to 500 m at 45 stations that span the Red Sea’s main basin from

4 north to south. Phylogenetic analysis of abundant rpoC1 OTUs revealed genotypes of recently discovered strains that belong to the high-light and low- light clades. In addition, we used a rapid community-profiling tool (GraftM) and quantitatively analyzed rpoC1 gene abundance from 45 metagenomes to assess the Prochlorococcus community structure across vertical and horizontal physicochemical gradients. Results revealed the clustering of samples according to their depth and a strong influence on ecotypic distribution by temperature and oxygen levels. In efforts to better understand how the cells survive the unusual features of the Red Sea, a Prochlorococcus strain of the HL II adapted clade from the euphotic zone was cultured, enabling morphological analyses and growth rates measurements for the strain. In addition, we successfully sequenced and annotated the of the strain, which was then used for genomic comparison with other ecotypes. Interestingly, the set of unique genes identified in the draft genome included genes encoding proteins involved in salt tolerance mechanisms. The expression level and pattern of these genes in the

Red Sea water column was explored through metatranscriptomic mapping and revealed their occurrence throughout, independent of the diel cycle. This led to the hypothesis that Prochlorococcus populations in the highly saline Red Sea are able to biosynthesize additional compatible solutes via several pathways to counterbalance the effects of salt stress. The results presented in this dissertation provide the first glimpse on how the environmental parameters of the

Red Sea can affect the evolution, diversity and distribution patterns of

Prochlorococcus ecotypes.

5

ACKNOWLEDGMENTS First and foremost, Praise be to Allah (AlHamdulilah). Knowing that there are so many people to thank for teaching me, helping me or simply being there for me during the past few years, I will try to mention them all. I would like to thank my committee members, especially my advisor Prof. Uli. He has been patient, motivating and extremely helpful in guiding me through the research I conducted in his lab. I would also like to extend my appreciation to the Core Labs within KAUST. Without the skills of technicians, operators and fellow researchers that work with KAUST facilities and equipment, a lot of my research would not have been possible, or complete. My lab group has been amazing. The Stingl lab members are really my family and friends before they are my colleagues, especially Tyas, Sou, Francy, Steffi, Luke, David, Fauzi, and Agathe. It is hard not to learn and benefit from such a friendly and supportive group. I only hope to work with people half as great as the people I was fortunate enough to be affiliated with here at KAUST. My sincere thanks also go to my advisor from AUC, Prof. Hamza el-Dorry and Prof. Rania Siam who put me on the research track during my Master’s degree. Without their encouragement, support, recommendation and trust I would not be writing this. My parents, sibilings and my grandmother are people I cannot thank enough. Their unconditional love is the fuel I live by everyday if not every hour. Anything I do, all I achieve and everything I become I will owe to them. I always hope to live up to their expectations and make them proud. One very important person in my life I am grateful for is Marcela. Thank you for being there during the bad times before the good ones. Thank you for being patient when I was stressed. Thank you for being next to me when everything else seemed to be against me. Last but not least, I want to give a shoutout to my good friend Taimur TY. It was those skype/chilling sessions that kept me going. See you soon brother. Finally, I extend my gratitude to all who wish me well and gave me their support, directly and indirectly.

6

TABLE OF CONTENTS

EXAMINATION COMMITTEE APPROVAL FORM ...... 2

ABSTRACT ...... 3

ACKNOWLEDGMENTS ...... 5

LIST OF ABBREVIATIONS ...... 8

LIST OF FIGURES ...... 9

LIST OF TABLES ...... 15

INTRODUCTION ...... 16 The Red Sea ...... 16 Environment and characteristics ...... 16 Marine ...... 19 Microbiology of the Red Sea ...... 21 The Prochlorococcus genus ...... 23 Discovery and taxonomy ...... 23 Phylogenetic and genomic analyses ...... 26 Ecology and evolution ...... 30 Metabolism and nutrient utilization ...... 33 Morphology and growth ...... 36 Isolation and application ...... 39 References ...... 42

OBJECTIVES ...... 56

CHAPTER ONE ...... 58 Abstract ...... 59 Introduction ...... 60 Methods and materials ...... 65 Results and Discussion ...... 69 Conclusions ...... 76 Data submission ...... 77 Acknowledgements ...... 77

7 References ...... 78 Figure legends ...... 82

CHAPTER TWO ...... 84 Abstract ...... 85 Introduction ...... 87 Materials and methods ...... 92 Results ...... 97 Discussion ...... 106 Conclusions ...... 112 Acknowledgement ...... 113 Figure legends ...... 113 Supplementary material ...... 115 References ...... 117 Supplementary tables and figures ...... 125

CHAPTER THREE ...... 129 Abstract ...... 130 Introduction ...... 131 Materials and Methods ...... 134 Results ...... 142 Discussion ...... 155 Conclusion ...... 161 Acknowledgement ...... 162 References ...... 163 Supplementary information ...... 171

CONCLUSIONS AND FUTURE PROSPECTS ...... 174

8

LIST OF ABBREVIATIONS

AAI Average Identity ALOHA A Long-Term Oligotrophic Habitat Assessment ANI Average Nucleotide Identity BLAST Basic Local Alignment Search Tool BML Below Mixed Layer COGS Cluster Of Orthologous Groups CSCG Conserved Single-Copy Genes DCM Deep Chlorophyll Maximum DNA Deoxyribonucleic Acid GASW Gulf of Aden Surface Water GAIW Gulf of Aden Intermediate Water GG Glucosyl-Glycerol GGA Glucosylglycerate HEPES Hydroxyethyl-1-Piperazineethanesulfonic Acid HL High-Light HNLC High Nutrient Low Chlorophyll ITS Internal Transcribed Spacer KAUST King Abdullah University Of Science And Technology KEGG Kyoto Encyclopedia Of Genes And KRSE KAUST Red Sea Expedition LL Low-Light OMZ Oxygen Minimum Zone OTU Operational taxonomic unit PCR Polymerase Chain Reaction RNA Ribonucleic Acid TAPS Tris [Hydroxymethyl] Methyl-3-Aminopropanesulfonic Acid TEM Transmission Electron Microscopy WHOI Woods Hole Oceanographic Institute

9 LIST OF FIGURES

INTRODUCTION Figure 1 The main basin of the Red Sea and surrounding countries. In the north, the Gulf Of Suez connects the water body to the Mediteranean Sea while the narrow strait of Bab Al-Mandab in the south connects it to the Indian Ocean...... 17 Figure 2 The phylogenetic distribution of the 12 major clades of Prochlorococcus as determined by the internal transcribed spacer (ITS) sequence diversity. 23 Figure 3 The relationship between temperature and the growth optima of the HL- adapted clades I and II...... 30 Figure 4 The HL adapted cells have higher growth rates at higher light intensities, while the LL cells are not able to survive above a certain light intensity threshold...... 36

CHAPTER 1 Figure 1 Partial map of the Red Sea indicating the location of sampling sites. Stations 2, 5, 18, 14, 27 and 31 were sampled during leg 1 (March) of the 2010 Kaust Red Sea Expedition (KRSE 2010). Stations 198, 197, 13, 4 were sampled during leg 1 (September–October) of KRSE 2011...... 60 Figure 2 Neighbor-joining phylogenetic tree with collapsed subgroups based on KRSE 2010 and KRSE 2011 Prochlorococcus 16S–23S rRNA ITS sequences from clone libraries from the Red Sea surface waters (10 m) and the water column. Major Prochlorococcus clades are indicated on the tree. The numbers in the triangular node indicate the number of sequences in each of the datasets (left: 2010, right: 2011). The tree was rooted with the ITS sequence of Synechococcus strain WH7803...... 67 Figure 3 Prochlorococcus community composition in the Red Sea. A) surface waters (10 m) at different stations sampled during KRSE 2010; b) water column of the north, coastal station 198 (KRSE 2011), including surface water samples from station 2 (KRSE 2010); c) water column of the north, open-water station 197 (KRSE 2011), including surface water samples from

10 station 5 (KRSE 2010); d) water column of the south, coastal station 13 (KRSE 2011); and e) water column of the south, open-water station 4 (KRSE 2011). The asterisk (*) indicates surface water samples from KRSE 2010. 68

CHAPTER 2 Figure 1 Map of the Red Sea’s main basin. Dark blue solid circles indicate stations at which the water column was sampled for pyrotag sequencing of the Prochlorococcus rpoC1 gene fragment. Yellow numbered stars represent the stations at which the water column was sampled for further metagenomic shotgun sequencing. All water samples were collected during KRSE2011 onboard the R/V Aegaeo...... 94 Figure 2 Metagenomic 16S rRNA abundance. The boxplot displays the relative abundance (%) of Prochlorococcus according to 16S rRNA gene fragment recruitment at different depths from the eight sequenced metagenomes. Center lines show the medians; box limits indicate the 25th and 75th percentiles as determined by R software; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles...... 95 Figure 3 Heat map based on taxonomic classification of rpoC1 pyrotags. The taxonomy heat map depicts the relative OTU abundance (%) of Prochlorococcus rpoC1 gene from 454-pyrotag data along the different depths of the water column. The samples in the heatmap are those with >50 sequences retrieved and are arranged by station, from south to north...... 97 Figure 4 (next page) Neighbor-joining tree of rpoC1 sequences. The phylogenetic tree was constructed using the p-distance model and 1,000 bootstrap replicates. The sequences included were obtained from ~50 Prochlorococcus and ten Synechococcus strains (CC9311, CC9902, CC9605, RCC307, RS9916, RS9917, WH5701, WH7803, WH7805, WH8102). The most abundant OTUs from this study are all shown on the tree except for “OTU31 (KT585662)” that fell within the collapsed LL II/ III group. Bootstrap values are represented on each node by symbols and the scale bar indicates the estimated sequence divergence. The outgroup used on the tree is the rpoC1 sequence of Synechococcus elongates strain

11 PCC6301. The branches representing major clades of Prochlorococcus are collapsed (except for HL II) for easier visualization. The accession number of each OTU and the number of sequences contained in them is shown at the branch tip...... 99 Figure 5 Principal component analysis (PCA) of the Prochlorococcus population based on the rpoC1 reads recruited from the metagenomes via GraftM. A) ordination plot of physicochemical parameters and the community structure of Prochlorococcus from all stations and depths. B) ordination plot of physicochemical parameters and the community structure of Prochlorococcus from surface samples (10, 25, and 50 m) only. The lengths of the lines represent the strength of the correlation. Symbols indicate the different stations (arranged from south to north) from which the metagenomic samples were obtained and colors indicate the different depths (green; 10, 25, and 50 m, orange; 100 m, blue; 200-500 m) from which they were collected...... 101 Figure S1 Alpha diversity indices of the six sampled depths from stations sampled for pyrotag sequencing. A) community richness was estimated using observed species and chao1 estimator. Community diversity was calculated using the shannon index. Analyses were executed and plotted in the r environment (R core team 2014, www.r-project.org). B) rarefaction curve based on the observed species. C) rarefaction curve based on the good’s coverage for the pyrotags. Both curves were generated using the mothur software package (Schloss et al., 2009)...... 122 Figure S2 Phylogenetic trees of reference Prochlorococcus/ Synechococcus ITS and rpoC1 sequences, generated using MEGA (Tamura et al., 2013). A) neighbor-joining tree of 16S-23S rRNA ITS sequences constructed using p- distance and 1,000 bootstrap iterations. The sequences were retrieved from all lineages available to date that represent the major phylogenetic clades of Prochlorococcus and ten Synechococcus strains (CC9311, CC9902, CC9605, RCC307, RS9916, RS9917, WH5701, WH7803, WH7805, WH8102). The HL II branch contains its sequences obtained from the 96 single cells sequenced by Kashtan et al (2014) and is collapsed for easier

12 representation. The Synechococcus branch is also collapsed for easier representation. Black solid circles represent bootstrap values more than 80% and the scale bar indicates the estimated sequence divergence. Accession numbers for ITS sequences of strains that do not have a sequenced genome are shown in brackets. The Synechococcus elongates strain PCC6301 was used as an outgroup. B) neighbor-joining tree of rpoC1 sequences constructed using p-distance and 1,000 bootstrap iterations. The sequences were retrieved from all lineages available to date that represent the major phylogenetic clades of Prochlorococcus and ten Synechococcus strains (CC9311, CC9902, CC9605, RCC307, RS9916, RS9917, WH5701, WH7803, WH7805, WH8102). The HL II branch contains rpoC1 sequences obtained from the 96 single cells sequenced by Kashtan et al (2014) and is collapsed for easier representation. The Synechococcus branch is also collapsed for easier representation. Black solid circles represent bootstrap values more than 80% and the scale bar indicates the estimated sequence divergence. Accession numbers for rpoC1 sequences of strains that do not have a sequenced genome are shown in brackets. The Synechococcus elongates strain PCC6301 was used as an outgroup...... 123 Figure S3 Heat map of the abundance distribution for the 38 rpoC1 OTUs that were found more than two times in at least 5% of the samples. The hierarchical clustering of the sampled stations and depths are shown on the left side of the map. The different shades of grey encompass the OTUs that belong to the HL- or LL-adapted clades. The bracket on the right side highlight the clustering of 100 m samples at which the LL-affiliated OTUs were found to be most abundant. The heatmap was generated in the R environment (R core team 2014, www.r-project.org)...... 124

CHAPTER 3 Figure 1 Visual characterizations of the Red Sea Prochlorococcus isolate RSP50. Left panel: visible cell growth in liquid media. Right panel: cytogram showing distinct red fluorescence signals of Prochlorococcus RSP50 relative to Synechococcus. To calculate growth rates for replicates of the Red Sea

13 strain, daily measurements of cell counts were taken for 14 days and the exponential growth phase seems to start at day six. RSP50 was characterized by a growth rate of 0.57 ± 0.01 (n = 3), i.e. a doubling time of 1.22 days. The replicates were grown in 20 ml of 75% seawater-PRO99 media and the maximum abundance reached over 5 × 107 cells per ml of culture...... 138 Figure 2 Growth curves of the Red Sea Prochlorococcus isolate RSP50 based on cytometric counts. A, B and C are replicates. The growth rate and doubling time are shown in the top left of the graph. The exponential growth phase is observed between days six to nine...... 139 Figure 3 Images of the Red Sea Prochlorococcus isolate RSP50 captured using tem with noticeable structures: thylakoids (yellow arrows), carboxysomes (white arrows), and electron-rich structures (blue arrows). The black scale bar indicates a length of 500 nm...... 140 Figure 4 Phylogenetic trees of a) 16S rRNA gene and b) ITS based on neighbor joining and maximum likelihood trees. Bootstrap values of 75% or more are indicated by solid black dots as shown. Scale bars represent estimated sequence divergence as shown. Synechococcus strains from the subcluster 5.1a were used as outgroup...... 140 Figure 5 Whole genome phylogeny based on 239 conserved single copy genes from 43 sequenced Prochlorococcus strains and eight Synechococcus strains plus the Red Sea strains RSP50...... 143 Figure 6 Venn diagram showing the number of orthologous genes shared by all 44 Prochlorococcus genomes, and the number of unique gene found in the RSP50 genome...... 144 Figure 7 Expression patterns of A) Photosystem Q protein (psbA) B) Sucrose phosphate synthase (spsA) in the water column of the Red Sea. The expression pattern of the psbA gene follows a diel cycle where the FPKM is high during the daytime and starts to decrease, as it gets dark. The spsA gene involvled in compatible solute biosynthesis does not follow a diel cycle due to its requirement throughout the day to counterbalance the high salnity of the Red Sea...... 148

14 Figure 8 Expression pattern of a) Glucosylglycerol-phosphate synthase b)Glycerol 3-phosphate dehydrogenase and c) Trehalose synthase in the water column of the Red Sea ...... 150 Figure S1 Heatmap showing the AAI values shared between the Red Sea strain RSP50 and the 43 Prochlorococcus strains that have sequenced genomes...... 169

15 LIST OF TABLES

Chapter 1 Table 1 Overview on physico-chemical characteristics of sampled stations including number of analyzed ITS sequences and phylogenetic diversity of the samples...... 64 Table 2 Correlations of environmental factors and the presence of certain ecotypes on phylogenetic diversity or presence of other factors ...... 66

Chapter 2 Table S1 Overview of the physicochemical characteristics of stations sampled for metagenomic shotgun sequencing and analysis during the KAUST Red Sea Expedition (KRSE) In 2011...... 121

Chapter 3 Table S1 Genome characteristics of the various Prochlorococcus strains that were sequenced, including the Red Sea strain highlighted in grey ...... 167 Table S2 List of unique genes in RSP50 discussed in this study...... 168

16 INTRODUCTION

The Red Sea

Environment and characteristics

The Red Sea (Figure 1) is a narrow basin located between 12.5° N and

30° N and occupies the narrow basin between the African and Asian continents.

With a total area of ~4.59 x 105 km2 (Head 1987), this isolated water body runs

~2,000 km long with a maximum width of 355 km and a maximum depth of slightly more than 2,500 m (Smeed 2004). At its northern end, the Red Sea is forked by the Sinai Peninsula between the Gulfs of Suez and Aqaba, allowing negligible water exchange. At the southern end, the Gulf of Aden connects the

Red Sea to the Indian Ocean through the slim Strait of Bab Al-Mandab. This shallow channel is the only connection with the global ocean that allows an intrusion of foreign water masses mostly during the winter and summer seasons.

The water mass identified in the winter season, Gulf of Aden Surface Water

(GASW), penetrates the surface layer of the Red Sea and has similar temperature and salinity levels to the main basin (Sofianos and Johns 2015). In the summer season, hydrographic surveys identified the Gulf of Aden

Intermediate Waters (GAIW) as the incoming water mass into the intermediate layer (30-100 m) of the Red Sea (Sofianos et al. 2002; Bower and Farrar 2015).

The newer, colder GAIW is less saline and more nutrient-rich than the receiving basin and flows northwards via wind fields and anticyclonic eddies (Churchill et al. 2015; Bower and Farrar 2015). The surface of the Red Sea harbors thousands of islands, several of which are famous and well studied (e.g. Farasan

17 Islands in the south), and most resulting from volcanic eruptions (Sheppard et al.

1992; Behairy et al. 1992). At the bottom of the central Red Sea, ~1,600 to 2,500 m deep, hydrothermal activity and geomorphology gave rise to mineral-rich brine pools. The brine deeps vary significantly in terms of size, depth, metalliferous deposits, and temperature where several of them are better explored than others

(e.g. Atlantis-II Deep) (Antunes et al. 2011). Recorded temperatures of the main basin range from ~22 to 32 °C, with maximum variation between the winter and summer seasons (Raitsos et al. 2013). Arid and humid deserts surround the Red

Sea, providing limited chances of water input and renewal at either end

(Sheppard et al. 1992; Da Silva 1994). Resulting from high temperatures and an evaporation rate exceeding annual rainfall amounts (Morcos 1970; Siddall et al.

2004), the Red Sea water salinity ranges from 36 to 41, rendering it one of the most saline water bodies in the world (Edwards 1987; Alraddadi 2013). The water column of the Red Sea counts as an extremely biodiverse marine environment, rich in flora, fauna, coral reefs and over 1,350 species of fish

(Goren and Dor 1994). The Red Sea is under investigated and little is known about most of its components, including the microbial planktonic food web. At the base of the marine food web come the numerically dominant phytoplankton, processing energy and controlling vital biogeochemical cycles.

18

Figure 1 The main basin of the Red Sea and surrounding countries. In the north, the Gulf of Suez connects the water body to the Mediteranean Sea while the narrow strait of Bab Al-Mandab in the south connects it to the Indian Ocean.

19 Marine microbial ecology

Marine microbial communities dominate the biomass in the Red Sea, which is believed to owe its name to the coloration of surface waters during

Trichodesmium blooms (Belogoroskaya 1970). It is considered a highly nutrient- deficient environment, with the southern region receiving more nutrients than the northern part, transported within the GAIW (Acker et al. 2008; Churchill et al.

2015). The Red Sea is underexplored in terms of the structure and function of the microbial populations it harbors. It remains unclear how distinct conditions in the

Red Sea affect the bacterioplankton diversity, distribution and adaptations. The observed intrusion of nutrient-rich GAIW through the southern Red Sea appears to enhance the productivity of the southern region (Churchill et al. 2015). In turn, this phenomenon affects the temporal and spatial distribution of primary producers and other phytoplankton, as reflected by the consistently higher Chl a concentrations in the southern end of the main basin (Raitsos et al. 2013;

Churchill et al. 2015). Studying the ecological adaptations of marine microbial assemblages will help predict the effects of global climate change on oceanic ecosystems. Indeed, this is particularly important because the Red Sea mimics a global warming scenario, and the dynamics of complex networks of microbial communities will explain their effects on biogeochemical processes. Since the beginning of the gene-sequencing era, molecular techniques have developed, improved and became more affordable. The rise of high-throughput sequencing

(e.g. Roche 454, Illumina, Oxford Nanopore, etc.) and DNA sequence analysis resulted in exponential amounts of datasets, which in turn raised the demand for bioinformatics tools to process and visualize the information. Advances in the

20 technologies and computational power allowed for the molecular analyses of entire marine environments using marker genes (Venter 2004) and paved the way for the marine ‘omics’ revolution. The study of microbial populations using

DNA and RNA in marine habitats has progressed as sequencing costs declined and annotation/ assembly pipelines became more standard. The combination of culture-dependent (e.g. transcriptomics) and culture-independent approaches

(e.g. single-cell genomics) continues to provide a large influx of data that unravel marine microbial community interactions and dynamics (Fuhrman et al. 2015).

In the last few years, the Marine Microbial Ecology lab has been targetting questions regarding the interactions and adaptations of microbial communities in the main basin of the Red Sea. Collecting samples from the water column and brine pools was facilitated by the research vessels (R/V Oceanus; WHOI, R/V

Aegaeo; HCMR and R/V Thuwal; KAUST) during the KAUST Red Sea

Expeditions (KRSE) in 2008, 2010, 2011, 2013 and 2015. The cruise tracks spanned the extreme southern, central and extreme northern end of the Red

Sea’s main basin and covered the coastal and open-water regions. I summarize some of the results from studies conducted in the Red Sea in the following section.

21

Microbiology of the Red Sea

Various microbiological studies were conducted on previously unexplored regions of the Red Sea, spanning surface waters, fish guts (Miyake et al. 2015) and deep-sea brine pools. The composition of archaeal and bacterial communities in different brine-seawater interfaces of the Red Sea revealed the presence of potentially novel niche-specific groups (Guan et al. 2015).

Phylogenetic analyses of functional marker gene sequences from clone libraries further uncovered the high diversity of sulfate-reducing and methanogenic populations (Guan et al. 2015). In addition, an array of cultivation efforts and genomic approaches has been used on microbes inhabiting the brines, to explore the characteristics enabling them to dominate such harsh environments

(Andre et al. 2011, Guan et al. 2014, Ngugi et al. 2015). As for the water column of the Red Sea, the composition of bacterial communities in the surface waters was assessed by pyrosequencing of the 16S rRNA gene (Ngugi et al. 2011). This study revealed the most dominant bacterioplankton groups in the upper euphotic zone and documented changes in microbial abundances along a latitudinal temperature-salinity gradient (Ngugi et al. 2011). For a higher resolution of the vertical niche partitioning of the two most abundant prokaryotic taxa in the Red

Sea water column, we investigated the internal transcribed spacer (ITS) region of

SAR11 (Ngugi and Stingl 2012) and Prochlorococcus (Shibl et al. 2014). In order to investigate the microbial adaptations and competitive strategies in the Red

Sea, a metagenomics-driven approach was used to quantify relative abundances of genetic functions (Thompson et al. 2013). Gene frequencies were then used to highlight and compare cyanophages, SAR11 and Prochlorococcus ecosystem-

22 specific adaptations among different oceans (Thompson et al. 2013).

Observation of the gene expression patterns in naturally occurring bacterial cells is facilitated with metatranscriptomics. We are able to assess microbial gene expression responses to the changing environmental conditions through diel transcript profiles at various depths in stations that span the Red Sea. Finally, on going cultivation efforts and genome sequencing of the abundant phototrophic and heterotrophic (Jimenez-Infante et al. 2015) taxa will drive us forward in understanding the physiology, metabolism and ecology of microbes inhabiting the water column of the Red Sea.

23

The Prochlorococcus genus

Discovery and taxonomy

The phototrophic cyanobacterium Prochlorococcus was discovered a few decades ago (Chisholm et al. 1988, 1992) for the first time in the ocean owing to technological advances in transmission electron microscopy and flow cytometry.

The increased sensitivity of flow cytometers used onboard allowed for the detection of a signal triggered by an unusual pigment (Olson et al. 1988).

Isolating and characterizing the first Prochlorococcus representative revealed that the faint red fluorescence detected was the divinyl-derivative of chlorophyll a

(Chisholm et al. 1992; Goericke and Repeta 1992). Evolutionary distance analyses of the 16S rRNA gene involving the other with similar pigment content, Prochlorothrix and Prochloron, placed them on distinct branches (Urbach et al. 1992). This highlighted the fact that Prochlorococcus evolved more recently and that the loss or gain of pigments occurred independently several times (Kettler et al. 2007; Partensky and Garczarek 2010).

Phylogenetically, it was consistently found that Prochlorococcus forms a sister group with its closest relative Synechococcus, and are believed to share a common ancestor (Urbach et al. 1998; Dufresne et al. 2008). However, neither genera was found to be related to the ancestral chloroplast lineage that gave rise to higher plants (LaRoche et al. 1996). Earlier physiological and molecular studies further resolved Prochlorococcus into several lineages that are defined by their light harvesting ability and temperature growth optima (Moore and

24 Chisholm 1999; Rocap et al. 2002; Johnson et al. 2006). This is described in more detail in the following section. Genomic taxonomy methods (e.g. Average amino acid identity and multilocus sequence analysis) applied to the complete genomes of cultured representatives uncovered new species of Prochlorococcus

(Thompson et al. 2013). More isolates and further species-identification resolution using this taxonomic framework will certainly reduce the species- ecotype incongruence.

Figure 2 The phylogenetic distribution of the 12 major clades of Prochlorococcus as determined by the internal transcribed spacer (ITS) sequence diversity.

25

26

Phylogenetic and genomic analyses

Based on their 16S rDNA phylogeny Prochlorococcus and Synechococcus

(Dufresne et al. 2008; Scanlan et al. 2009) share a common ancestor, forming a monophyletic group with all members of the Prochlorococcus genus sharing a sequence divergence of 3% (Moore et al 1998). Early cultivation efforts and molecular investigations identified strains with different adaptations to light and temperature (Moore et al. 1995, 1998). These physiological differences gave rise to the ecotypic characterization of Prochlorococcus. Ecotypes include strains that have adapted to a specific environmental niche; they are ecologically and genetically distinct subgroups that make up the microdiversity of a microbial species. In the case of Prochlorococcus, the intergenic transcriber spacer (ITS) sequence is used to resolve the genus into discrete lineages that are either adapted to high-light (HL) or low-light (LL) conditions (Rocap et al 2002). Many other marker genes (e.g. rpoC1, ntcA) (Penno et al. 2006; Tai and Palenik 2009) as well as concatenated protein-coding sequences of whole genomes (Kettler et al. 2007) consistently display the same tree pattern (Figure 2). Isolated and/ or sequenced ecotypes of Prochlorococcus are further divided into ~12 clades in total; six HL (I-VI) and six LL (I-VII) (Biller et al. 2015). Ecotypes with no cultured representative have been identified through environmental studies (Martiny et al.

2009; Lavin et al. 2010; Rusch et al. 2010; West et al. 2011; Huang et al. 2012;

Malmstrom et al. 2013; Biller et al. 2014). Alternative naming conventions have been used to label ecotypes (West and Scanlan 1999; Ahlgren et al. 2006;

Rusch et al. 2010; West et al. 2011); either using HL or LL as a prefix followed by a number, or using ‘e’ as a prefix followed by the strain’s name (e.g. LL IV or

27 eMIT9313, respectively). Here, I follow the HL and LL-based nomenclature system that is most commonly used across the literature. I use the terms

‘ecotype’ and ‘clade’ interchangeably to refer to the different HL and LL-adapted strains.

The Prochlorococcus genome is the smallest of any known photosynthetic and free-living organism (Dufresne et al. 2003; Kettler et al. 2007; Hess 2011).

Amongst all strains sequenced to date, the genome size ranges from 1.64 to

2.68 Mb, protein-coding genes range from 1716 to 3022 genes, and the G+C content ranges from 31 to 51% (Scanlan et al. 2009; Biller et al. 2014). The LL IV ecotypes retained relatively larger genomes (2.41-2.68 Mb, 2275-3022 genes) with higher G+C percentages (50-51%) (Rocap et al. 2003; Biller et al. 2014).

This clade seems to be the closest, phylogenetically, to Synechococcus, and has evolved the least within the Prochlorococcus genus (Kettler et al. 2007). All

Prochlorococcus ecotypes studied so far have a single chromosome and no plasmid, and some are believed to release extracellular vesicles that contain

DNA (Coleman et al. 2006; Biller et al. 2014).

The compactness of its genome is believed to be a key driver of the high biomass of Prochlorococcus in the oceans. The advantage of harboring the minimal gene content to survive off sunlight harvesting is reflected by the microbe’s global abundance (Dufresne et al. 2003). The ecological success of

Prochlorococcus is likely a consequence of genome streamlining, allowing for a decrease in cell size and in turn, a more efficient metabolic system in the competitive, low-nutrient environment (Scanlan et al. 2009; Swan et al. 2013;

Giovannoni et al. 2014). The loss of numerous protein-coding genes in the

28 Prochlorococcus genome is attributed to selective pressures of environmental niches (Dufresne et al. 2003; Coleman et al. 2006) or a dependence on co- occurring microbes (Morris et al. 2011; 2012). The methods of genome reduction

(Luo et al. 2011) and the evolutionary patterns of gene loss or gain (Kettler et al.

2007) among Prochlorococcus strains are distinct to ecotype populations in the water column (Coleman et al. 2006; Paul et al. 2010). The 1,250 core genes shared by 12 Prochlorococcus genomes (Kettler et al. 2007) are mostly organized in syntenic regions and appear to be responsible for the cell’s central metabolism and housekeeping functions (Dufresne et al. 2003; Kettler et al.

2007). Additional non-core genes of known (e.g. light harvesting and nutrient acquisition) and unknown functions are believed to explain ecotypic differentiation, and hence contribute to niche specialization (Coleman et al. 2006;

Scanlan et al. 2009). Maintaining flexible genes seems to be a feature of isolates from similar environments (Martiny et al. 2006; Coleman and Chisholm 2010) and mutations in some of them subject Prochlorococcus populations to lower growth rates (Avrani et al. 2011). Additionally, ambiguous partitioning seems to complement some discrepancies; for example the LL I ecotype has a low optimal light intensity (Zinser et al. 2007) and yet carries a large number of hli (high-light inducible) stress genes (Kettler et al. 2007), allowing it to occupy a broader range in a well-mixed water column (Malmstrom et al. 2010). Interestingly, a recent analyses on wild Prochlorococcus cells provided evidence for distinct genomic backbones in sequenced genomes of single cells collected from the mixed layer depth (Kashtan et al. 2014). The global dynamic stability of Prochlorococcus among seasonal time scales is hypothesized to be the result of coexisting

29 subpopulations that carry niche-specific flexible gene cassettes (Kashtan et al.

2014). As more Prochlorococcus genomes are sequenced, newly found genes

(i.e. non-core or flexible genes) are added to the pan-genome, currently estimated at 6,000 genes among cultured isolates (Kettler et al. 2007) and predicted to exceed 57,000 genes in the natural environment (Baumdicker et al.

2012).

Integrating molecular and physiological information from global studies will enhance our understanding of the ecology and evolution of Prochlorococcus. In light of this, a database driven portal (http://proportal.mit.edu/) has been set up to encompass cross-referenced genetic, metagenomic, and transcriptomic data about Prochlorococcus, Synechococcus, and their phages (Kelly et al. 2012).

30

Ecology and evolution

Prochlorococcus is numerically the most abundant photosynthetic microorganism in the global tropical and subtropical marine ecosystems, estimated at 2.8 x 1027 cells (Partensky et al. 1999; Johnson et al. 2006;

Flombaum et al. 2013). Prochlorococcus populations dominate highly stratified, nutrient-poor, warm waters (Johnson et al. 2006; Bouman et al. 2006;

Zwirglmaier et al. 2007) that cover the 40°N and 40°S latitude. Prevailing physicochemical parameters of oceanic regions control the spatial and temporal distribution of the genus. HL and LL Prochlorococcus ecotypes are partitioned throughout the water column according to preferred growth conditions, including nutrient concentrations and light availability (Malmstrom et al. 2010). Detailed comparisons based on genomes of strains adapted to contrasting light intensities highlighted the role of genes that define the fitness of each ecotype (Moore et al.

1995, 1998; Rocap et al. 2003). The depth-stratification of Prochlorococcus ecotypes is strongly influenced by the extent of light penetration through the water column (Malmstrom et al. 2010). The HL and LL ecotypes have adaptive pigment ratios that allow them to dominate the upper and lower photic zones, respectively. Indicative of photoadaptation, the concentration of chlorophyll a per cell, and the intensity of their fluorescence increase at depth, where

Prochlorococcus abundance is lower (Malmstrom et al. 2010; Karl and Church

2014).

Temperature optimum is also a prominent factor that distinguishes the HL- adapted clades (I and II) with cultured representatives (Figure 3) (Johnson et al.

2006; Zinser et al. 2007). In some oceanic regions, the temperature range

31 expands seasonally, leading to significant fluctuations in cellular abundance. In winter, when the surface water temperature is lower and the water column becomes well mixed, Prochlorococcus abundance declines whereas that of

Synechococcus increases (Olson et al. 1990; Lindell and Post 1995; Crosbie and

Furnas 2001; Bouman et al. 2006; Malmstrom et al. 2010). HL II is typically more abundant and has a higher temperature growth optimum (>29 °C) relative to the

HL I clade (<28 °C) (Zinser et al. 2007; Zwirglmaier et al. 2007; Malmstrom et al.

2010). HL-adapted ecotypes III-VI do not yet have cultured representatives.

However, sequence analyses seem to describe physiological attributes that match conditions found in specific niches (Huang et al. 2012). For example, sequences from clades III, IV and V (alternatively called HNLCs; high nutrient, low chlorophyll) were found in equatorial oceanic regions where iron is depleted, macronutrient concentrations are high, and chlorophyll a concentration is low

(Rusch et al. 2010; West et al. 2011; Malmstrom et al 2013). LL-adapted clades, usually found towards the bottom of the euphotic zone, have less cultured representatives and are generally more diverse, as reflected by their polyphyletic topology. LL I can tolerate sudden light fluctuations and hence can occupy the middle euphotic zone as an intermediate clade between the HL and LL groups

(Ahlgren et al. 2006; Johnson et al. 2006; Malmstrom et al. 2010). The LL II and

LL III clades, both found in the lower euphotic zones, are not easily distinguished phylogenetically (Rocap et al. 2002). Having similar global distribution patterns, genomic and metagenomic analyses consistently group them into a single clade

(Kettler et al. 2007; Martiny et al. 2009; Malmstrom et al. 2010). The LL IV clade is the most sensitive to light fluctuations and restricted to the deeper regions of

32 the water column (Ahlgren et al. 2006; Malmstrom et al. 2010). LL-adapted ecotypes V to VII do not have cultured representatives, but sequences analyses indicate distinct features. LL V and VI occupy the oxygen minimum zones (OMZ) of the water column (Lavin et al. 2010). Little information is available on the LL

VII clade, also known as NC1 (Martiny et al. 2009). In oceanic ecosystems, seasonal dynamics and global warming can shift Prochlorococcus community structures on different timescales (Mella-Flores et al. 2011; Fuhrman et al. 2015).

It is important to detect and identify genes of unknown functions in the genomes of different ecotypes to assess adaptive mechanism to particular extreme conditions.

Figure 3 The relationship between temperature and the growth optima of the HL-adapted clades I and II.

33 Metabolism and nutrient utilization

The Prochlorococcus genome contains the minimal collection of genes required for viability (Dufresne et al. 2003; Partensky and Garczarek 2010). For light harvesting and cellular energy production, the genes encoding photosystem

I and II proteins, ATP synthase, NADPH dehydrogenase, cytochrome b6f

(cytb6f), divinyl-chlorophyll a/b binding proteins and prochlorophyte chlorophyll binding proteins (Pcbs) are present (and highly conserved) in all lineages sequenced so far (Rocap et al. 2003; Scanlan et al. 2009; Zorz et al. 2015).

Prochlorococcus releases approximately 2-12 fg carbon cell-1 hour-1 from (Partensky et al. 1993; Zinser et al. 2009) and annually supplies up to half the global primary production, at around 4 gigatons fixed carbon

(Flombaum et al. 2013; Biller et al. 2014). Inorganic carbon is taken up as bicarbonate or carbon dioxide based on the transporters that different lineages posses (Scanlan et al. 2009), and is generally stored as glycogen (Zinser et al.

2009). Carbonic anhydrase converts bicarbonate to carbon dioxide, which is then fixed in the carboxysomes and catalyzed by the ribulose biphosphate carboxylase oxygenase (RuBisCO) (Ting et al. 2007; Scanlan et al. 2009; Zorz et al. 2015). Recently, a homolog of a high-affinity transporter was found in some

Prochlorococcus genomes, capable of taking up glucose (Gomez-Baena et al.

2008; del Carmen Munoz-Marin et al. 2013). However, more experiments on

Prochlorococcus are needed to confirm the ability to grow solely on glucose.

Moreover, genes encoding enzymes for glycolysis, the oxidative pentose phosphate pathway, an incomplete (TCA), peptidoglycan biosynthesis, biosynthesis, nucleic acid biosynthesis, and amino acid

34 biosynthesis are all present (Dufresne et al. 2003; Kettler et al. 2007; Zinser et al.

2009; Scanlan et al. 2009).

The small, free-living Prochlorococcus cell has a large surface area to volume ratio, with a low cellular quota of carbon, nitrogen and phosphorus

(Bertilsson et al. 2003; Heldal et al. 2003). All Prochlorococcus strains can grow on ammonium as a source of nitrogen supported by the presence of a suitable transporter system (Tolonen et al. 2006; Scanlan et al. 2009). However, they cannot fix dinitrogen (N2). Some strains were shown to take up nitrite (Moore et al. 2002), urea (Moore et al. 2002; Scanlan et al. 2009), cyanate (Kamennaya et al. 2008), and amino acids (Mary et al. 2008). None of the previously characterized Prochlorococcus isolates possessed the enzymes (i.e. nitrate permease and nitrate reductase) necessary to utilize nitrate (Moore et al. 2002;

Kettler et al. 2007; Scanlan et al. 2009). However, field data and metagenomic analyses demonstrated the ability of wild Prochlorococcus to take up labeled nitrate (Casey et al. 2007) and revealed the presence of nitrate assimilating genes in assembled genomes (Martiny et al. 2009). Eventually, it was confirmed that some Prochlorococcus strains are able to grow solely on nitrate, and the architecture of the assimilation genes in some HL and LL ecotypes were determined (Berube et al. 2014). Nitrogen scarcity triggers expression of the nitrogen transcriptional regulator (ntcA) amongst other assimilatory genes

(Lindell et al. 2002; Tolonen et al. 2006). Complete deprivation compromises the efficacy of the photosystem and the cell’s viability (Tolonen et al. 2006;

Thompson et al. 2011). All Prochlorococcus strains can grow on inorganic phosphate as a source of phosphorus (Moore et al. 2006; Scanlan et al. 2009). In

35 low-P environments, Prochlorococcus populations had widespread genes for the assimilation of phosphite and phosphonate (Scanlan et al. 2009; Feingersch et al. 2012; Martinez et al. 2012). Phosphorus depletion up-regulates the phosphorus response regulator gene (phoB) and risks the influx of toxic arsenate

(Saunders and Rocap 2015), while continuous deprivation eventually causes cell death (Martiny et al. 2006). Overall, nitrogen and phosphorus sensing, acquisition and metabolism differ between across strains that belong to the same clade, according to their genetic makeup (Moore et al. 2005, 2006; Martiny et al.

2006, 2009; Scanlan et al. 2009; Feingersch et al. 2012). The concentrations of metallic micronutrients vary amongst oceanic regions (Thompson et al. 2011) and are usually higher in the surface than the deep sea (Mann et al. 2002). Some trace metals, e.g. copper and zinc, are necessary for growth but toxic in high concentrations (Scanlan et al. 2009; Huertas et al. 2014). Iron is one of the most important metals for Prochlorococcus (Mann and Chisholm 2000), being a critical component in proteins involved in the photo-dependent electron transport system

(Raven et al. 1999). Ecotypes are differentially adapted to the varying concentrations of iron in the global oceans as reflected by their transcriptomic response (Thompson et al. 2011). Moreover, novel clades were discovered in high nutrient, low chlorophyll (HNLC) waters, where iron concentrations are low

(Rusch et al. 2010; West et al. 2011). Indeed, metagenomic assembly of the strains displayed a low number of iron-containing proteins and bringing them into culture will enhance our understanding of the exerted selective pressure on these populations (Rusch et al. 2010).

36 Morphology and growth

The coccoidal shape of the unicellular Prochlorococcus has a diameter of

0.5-0.8 µm, with the LL IV strains being generally larger (Chisholm et al. 1992;

Partensky et al. 1999). The non-motile Prochlorococcus cells possess a large surface area to volume ratio and have gram-negative staining properties. They contain divinyl Chl a and b, zeaxanthin, α-carotene, divinyl Chl a/b binding proteins but lack phycobilisomes (Goericke and Repeta 1992; Rippka et al. 2000) and therefore grows to a yellow/ lime-green color in seawater-based enrichments. Prochlorococcus possess peripheral and semi-concentric thylakoids, the site of photosynthesis, along the (Partensly et al.

1999; Ting et al. 2007). These intracytoplasmic lamellae have a distinct structure and organization that vary between ecotypes, giving each a suitable photosynthetic fitness related to its niche (Ting et al. 2007). For CO2-fixing, the cytoplasm contains clustered icosahedral carboxysomes, whose protein composition is encoded for by a highly conserved set of genes (Roberts et al.

2012). Prochlorococcus cells possess α-carboxysomes, which serve as compartments for 1A RuBisCO involved in the sequestration of carbon (Hess et al. 2001; Ting et al. 2007). Analyses of Prochlorococcus membrane-lipids revealed that they were mainly made up of glycolipids (Popendorf et al. 2011) and sulfoquinovosyldiacylglycerol (SQDG) (Van Mooy et al. 2006), an advantageous adaptation to low-P environments.

Maximum growth rates of Prochlorococcus have been reported in vivo and in situ (across water columns) under varying environmental conditions (Moore et al. 1995; Vaulot et al. 1995; Liu et al. 1995). In addition, growth rates were

37 assessed in terms of their dependency on temperature and light as well as their influence on oceanic gross primary production (Figures 3 and 4) (Liu et al. 1997;

Shalapyonok et al. 1998; Zinser et al. 2007). Different ratios of intrinsic divinyl

Chl a and b pigments control the light harvesting capability and photoinhibition of

Prochlorococcus ecotypes (Moore et al. 1995; Moore and Chisholm 1999). HL- adapted cells possess a low divinyl chlorophyll ratio (Chl b/a ratio), and are able to utilize light intensities that reach 970 µmol photons m−2 s−1 (Bruyant et al.

2005) with maximal growth rates between 70 and 500 µmol photons m−2 s−1

(Moore et al. 1995). In contrast, LL-adapted cells contain higher concentrations of pigments, especially a higher Chl b/a ratio (Hess et al. 2001) with maximal growth rates between 30 and 150 µmol photons m−2 s−1 (Moore et al. 1995). No known ecotype grows across the whole light range (Ting et al. 2002) and physiological and genomic analyses of light harvesting complexes showed a clear niche differentiation (Garczarek et al. 2001; Steglich et al. 2006). Similarly, no strain grows across the entire temperature range, 11-31 °C that can support growth, but optimal temperatures for all Prochlorococcus ecotypes is 24-28 °C

(Johnson et al. 2006; Zinser et al. 2007). Light-dark cycles in the natural environment force daily variations in the cellular processes and RNA expression patterns in Prochlorococcus populations (Zinser et al. 2009; Waldbauer et al.

2012). In most cases, the expression of genes related to a specific function, like photosynthesis or carbon fixation, coincides with the time of the day these processes occur (Holtzendorff et al. 2001, 2002; Zinser et al. 2009; Waldbauer et al. 2012). DNA synthesis does not follow the diel cycle but replication was found to occur in the afternoon, followed by cell division (Burbage and Binder 2007;

38 Zinser et al. 2009; Waldbaauer et al. 2012). Additionally, cellular division and mortality rates (i.e. biomass) of Prochlorococcus cells are tightly phased with the diel dynamics (Ribalet et al. 2015). Studying this synchrony sheds light on coexisting microbial metabolic activities and their influence on the oceanic ecosystem stability (Ottesen et al. 2014). Interestingly, Prochlorococcus lacks the kaiA gene, and hence lacks a full endogenous circadian clock system (Rocap et al. 2003; Axmann et al. 2009). Under continuous light, the genetic oscillatory system is disturbed, proposing a daily resetting mechanism rather than an internal circadian rhythm (Holtzendorff et al. 2008; Axmann et al. 2009).

Figure 4 The HL adapted cells have higher growth rates at higher light intensities, while the LL cells are not able to survive above a certain light intensity threshold.

39 Isolation and application

Obtaining clonal or axenic microbial isolates facilitates their physiological characterization, genome sequencing, and transcriptomic analysis. Since the discovery of Prochlorococcus and its ecological importance as a globally abundant photosynthetic marine (Chisholm et al. 1988, 1992), efforts continue to be made on bringing HL and LL-adapted strains into culture. The

Sargasso, Mediterranean, Arabian, Atlantic, and Pacific are all oceanic regions from which Prochlorococcus ecotypes have been cultured. Other regions (e.g.

South China Sea, Red Sea) show the presence of novel ecotypes from which we still don’t have isolated representatives (Martiny et al. 2009; Lavin et al. 2010;

Rusch et al. 2010; West et al. 2011; Huang et al. 2012; Malmstrom et al. 2013).

Prochlorococcus sensitivity (Mann et al. 2002; Morris et al. 2011) to heavy metals or fluctuations in light and temperature makes it challenging to purify, grow, and maintain in the lab. Enrichments can be obtained by separating Synechococcus, picoeukaryotes, and algae using filtration, flow cytometry or dilution to extinction methods (Chisholm et al. 1992; Rippka et al. 2000; Moore et al. 2005; Stingl et al. 2007). Careful preparations are required with culture ware, media composition, and incubation conditions. Vessels for seawater collection, glass culture tubes or polystyrene culture flasks are acid-washed and autoclaved prior to use. Liquid or solid medium, made of natural or artificial seawater, is enriched with specific nutrient compositions to meet each strain’s requirements. Media based on 0.1 or 0.2 µm filtered natural seawater from an oligotrophic environment have shown the most consistent yields of Prochlorococcus (Moore et al. 2007), although the artificial-based Turk’s Island Salt Mix has also been

40 successfully used before (Rippka et al. 2000). High-grade macronutrients

(nitrogen and phosphorus) and a mix of trace metals (or vitamins) is filter sterilized, autoclaved and added to the filtered seawater. For larger volumes, bicarbonate, HEPES, TAPS (pH 8) and air bubbling are required (Moore et al.

2007; Morris et al. 2011). The most commonly used media of culturing include

PRO2 and PRO99. Initial trials with semisolid media based on low melting agarose and seawater had extremely low growth yields (Tolonen et al. 2006;

Moore et al. 2007). Plating on Noble agar that carries a layer of streaked helper bacteria was efficient in producing Prochlorococcus colonies (Morris et al. 2008).

A helper bacterium (i.e. Alteromonas sp.) was shown to scavenge harmful hydrogen peroxide (Morris et al. 2011) and using it with the pour-on method had an extremely high recovery (Avarani et al. 2011; Lindell 2014). Once clonal axenic cultures are achieved, they are monitored using microscopy, flow cytometry or purity test broth after which they can be cryopreserved with DMSO, flash frozen in liquid nitrogen and stored at -80 °C (Moore et al. 2007). The difficulty of bringing axenic Prochlorococcus into industrial biomass makes it less feasible to use in biotechnological applications. To date, only a few studies revealed genomic potential for synthesis of useful secondary metabolites (Li et al. 2010; Shih et al. 2013), and it is yet to uncover the full potential of genetic functions in the Prochlorococcus genus.

Nevertheless, Prochlorococcus is an important model organism

(Partensky et al. 1999; Partensky and Garczarek 2010; Coleman and Chisholm

2007) for studying how genomic and phenotypic diversity affects the microbial

41 structure, function and stability in the highly dynamic oceans (Biller et al. 2014;

Karl and Church 2014; Fuhrman et al. 2015; Sunagawa et al. 2015).

42

References

Acker J, Leptoukh G, Shen S, Zhu T & Kempler S (2008). Remotely-sensed chlorophyll a observations of the northern Red Sea indicate seasonal variability and influence of coastal reefs. Journal of marine systems, 69(3), 191-204.

Alraddadi, T. M. (2013). Temporal changes in the Red Sea circulation and associated water masses. University of Southampton, Doctoral thesis, Ocean and Earth Science, 198 pp.

Antunes A, Ngugi DK, Ulrich SU (2011). Microbiology of the Red Sea (and other) deep-sea anoxic brine lakes. Environ Microbiol Rep 3:416–433

Ahlgren, N.A., & Rocap, G (2006). Culture isolation and culture-independent clone libraries reveal new marine Synechococcus ecotypes with distinctive light and N physiologies. Applied and environmental microbiology, 72(11), 7193-7204.

Avrani S, Wurtzel O, Sharon I, Sorek R, Lindell D (2011). Genomic island variability facilitates Prochlorococcus- coexistence. Nature 474:604–608

Axmann IM, Duhring U, Seeliger L, Arnold A, Vanselow JT, Kramer A, Wilde A (2009). Biochemical evidence for a timing mechanism in Prochlorococcus. J Bacteriol 191:5342–5347

Baumdicker, F., Hess, W.R., & Pfaffelhuber, P. (2012). The infinitely many genes model for the distributed genome of bacteria. Genome biology and evolution, 4(4), 443-456.

Behairy AKA, Sheppard CRC, El-Sayed MK (1992). A review of the geology of coral reefs in the Red Sea. UNEP, Nairobi, Kenya, UNEP regional seas reports and studies, 36

Belogorskaya EV (1970). Qualitative and quantitative distribution of phytoplankton in the Red Sea and Gulf of Aden in October– November 1963. Biol Morya (Vladivost)/Mar Biol 21:133–152

Bertilsson S, Berglund O, Karl DM, Chisholm SW (2003). Elemental composition of marine Prochlorococcus and Synechococcus: implications for the ecological stoichiometry of the sea. Limnol Oceanogr 48:1721–1731

43 Berube, PM., Biller, SJ., Kent, AG., Berta-Thompson, J.W., Roggensack, SE., oache-Johnson, K.H. & Chisholm, S.W. (2015). Physiology and evolution of nitrate acquisition in Prochlorococcus. The ISME journal, 9(5), 1195-1207.

Biller, S.J., Berube, P.M., Berta-Thompson, J.W., Kelly, L., Roggensack, S.E., Awad, L (2014). Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. Scientific data, 1:140034.

Biller, S.J., Berube, P.M., Lindell, D., & Chisholm, S.W. (2015). Prochlorococcus: the structure and function of collective diversity. Nature Reviews Microbiology, 13(1), 13-27.

Bouman HA, Ulloa O, Scanlan DJ, Zwirglmaier K, Li WKW, Platt T, Stuart V, Barlow R, Leth O, Clementson L, Lutz V, Fukasawa M, Watanabe S, Sathyendranath S (2006). Oceanographic basis of the global surface distribution of Prochlorococcus ecotypes. Science 312:918–921

Bower AS & Farrar JT (2015). Air–Sea Interaction and Horizontal Circulation in the Red Sea. In The Red Sea (pp. 329-342). Springer Berlin Heidelberg.

Bruyant F, Babin M, Genty B, Prasil O, Behrenfeld MJ, Claustre H, Bricaud A, Garczarek L, Holtzendorff J, Koblizek M, Dousova H, Partensky F (2005). Diel variations in the photosynthetic parameters of Prochlorococcus strain PCC 9511: Combined effects of light and cell cycle. Limnol Oceanogr 50:850–863

Burbage CD, Binder BJ (2007). Relationship between cell cycle and light-limited growth rate in oceanic Prochlorococcus (MIT9312) and Synechococcus (WH8103) (cyanobacteria). J Phycol 43:266–274

Casey JR, LomasMW, Mandecki J,Walker DE (2007). Prochlorococcus contributes to new production in the Sargasso Sea deep chlorophyll maximum. Geophys Res Lett 34. L10604

Chisholm SW., Olson R.J., Zettler E.R., Goericke R., Waterbury J.B., & Welschmeyer, N. A. (1988). A novel free-living prochlorophyte abundant in the oceanic euphotic zone.

Chisholm S.W., Frankel S.L., Goericke R., Olson, R.J., Palenik, B., Waterbury J.B & Zettler, E.R. (1992). Prochlorococcus marinus nov. gen. nov. sp.: an oxyphototrophic marine prokaryote containing divinyl chlorophyll a and b. Archives of Microbiology, 157(3), 297-300.

44 Churchill JH, Bower AS, McCorkle D C & Abualnaja Y (2015). The transport of nutrient-rich Indian Ocean water through the Red Sea and into coastal reef systems. Journal of Marine Research, 72(3), 165-181.

Coleman ML, Sullivan MB, Martiny AC, Steglich C, Barry K, Delong EF, Chisholm SW (2006). Genomic islands and the ecology and evolution of Prochlorococcus. Science 311:1768–1770

Coleman ML, Chisholm SW (2010). Ecosystem-specific selection pressures revealed through comparative population genomics. Proc Natl Acad Sci USA 107:18634–18639

Crosbie, N.D., & Furnas, M.J. (2001). Abundance, distribution and flow- cytometric characterization of picophytoprokaryote populations in central (17 S) and southern (20 S) shelf waters of the Great Barrier Reef. Journal of Plankton Research, 23(8), 809-828.

Da Silva A, Young C & Levitus S (1994). Algorithms and Procedures. Vol. 1, Atlas of surface marine data 1994 NOAA Atlas NESDIS 6: 83. del Carmen Munoz-Marin M, Luque I, Zubkhov MV, Hill PG, Diez J, Manuel Garcia-Fernandez J (2013). Prochlorococcus can use the Pro1404 transporter to take up glucose at nanomolar concentrations in the Atlantic Ocean. Proc Natl Acad Sci USA 110(21):8597–602.

Dufresne, A., Salanoubat, M., Partensky, F., Artiguenave, F., Axmann, I.M., Barbe, V., Duprat, S., Galperin, M.Y., Koonin, E.V. Oztas, S., Robert, C., Rogozin, I.B., Scanlan, D.J., Wolf, Y.I. & Hess, W.R. (2003). Genome sequence of the cyanobacterium Prochlorococcus marinus SS120, a nearly minimal oxyphototrophic genome. Proceedings of the National Academy of Sciences, 100(17), 10020-10025.

Dufresne A, Ostrowski M, Scanlan DJ, Garczarek L, Mazard S, Palenik BP, Paulsen IT, de Marsac NT, Wincker P, Dossat C, Ferriera S, Johnson J, Post AF, Hess WR, Partensky F (2008). Unraveling the genomic mosaic of a ubiquitous genus of marine cyanobacteria. Genome Biol 9:R90 Edwards FJ (1987). Climate and oceanography: Red Sea, Pergamon Press. Oxford 45-69.

Feingersch R, Philosof A,Mejuch T, Glaser F, Alalouf O, Shoham Y, Beja O (2012). Potential for phosphite and phosphonate utilization by Prochlorococcus.

45 ISME J. 6:827–834

Flombaum, P., Gallegos, J.L., Gordillo, R.A., Rincón, J., Zabala, L.L., Jiao, N (2013). Present and future global distributions of the marine Cyanobacteria Prochlorococcus and Synechococcus. Proceedings of the National Academy of Sciences, 110(24), 9824-9829.

Fuhrman JA, Cram JA & Needham DM (2015). Marine microbial community dynamics and their ecological interpretation. Nature Reviews Microbiology.

Garczarek L, Partensky F, Irlbacher H, Holtzendorff J, Babin M, Mary I, Thomas JC, Hess WR (2001). Differential expression of antenna and core genes in Prochlorococcus PCC 9511 (Oxyphotobacteria) grown under a modulated light– dark cycle. Environ Microbiol 3:168–175

Giovannoni, S.J., Thrash, J.C., & Temperton, B. (2014). Implications of streamlining theory for microbial ecology. The ISME journal, 8(8), 1553-1565.

Gomez-Baena G, Lopez-Lozano A, Gil-Martinez J, Lucena JM, Diez J, Candau P, Garcia-Fernandez JM (2008). Glucose uptake and its effect on gene expression in Prochlorococcus. PLoS One 3:e3416

Goericke, R., & Repeta, D.J. (1993). Chlorophylls a and b and divinyl chlorophylls a and b in the open subtropical North Atlantic Ocean. Marine Ecology-Progress Series, 101, 307-307.

Goren M, Dor M (1994). Updated check list of the fishes of the Red Sea. Israel Academy of Science and Humanities, Jerusalem, 120 pp

Guan Y, Hikmawan T., Antunes A., Ngugi D., & Stingl U. (2015). Diversity of methanogens and sulfate-reducing bacteria in the interfaces of five deep-sea anoxic brines of the Red Sea. Research in microbiology.

Head SM (1987). Red Sea fisheries. In: Edwards AJ, Head SM (eds) Red Sea: key environments. Pergamon Press, Oxford, pp 363–382

Heldal M, Scanlan DJ, Norland S, Thingstad F, Mann NH (2003). Elemental composition of single cells of various strains of marine Prochlorococcus and Synechococcus using X-ray microanalysis. Limnol Oceanogr 48:1732–1743

46 Hess, W.R., G. Rocap, C.S. Ting, F. Larimer, S. Stilwagen, J. Lamerdin, & Chisholm S.W (2001). The photosynthetic apparatus of Prochlorococcus: insights through comparative genomics. Photosynth. Res. 70:53–71.

Hess WR (2011). Cyanobacterial genomics for ecology and biotechnology. Curr Opin Microbiol 14:608–614

Holtzendorff J, Partensky F, Jacquet S, Bruyant F, Marie D, Garczarek L, Mary I, Vaulot D, Hess WR (2001). Diel expression of cell cycle-related genes in synchronized cultures of Prochlorococcus sp. strain PCC 9511. J Bacteriol 183:915–920

Holtzendorff J, Marie D, Post AF, Partensky F, Rivlin A, Hess WR (2002). Synchronized expression of ftsZ in natural Prochlorococcus populations of the Red Sea. Environ Microbiol 4:644–653

Holtzendorff J, Partensky F, Mella D, Lennon JF, Hess WR, Garczarek L (2008). Genome streamlining results in loss of robustness of the circadian clock in the marine cyanobacterium Prochlorococcus marinus PCC 9511. J Biol Rhythms 23:187–199

Huertas, M.J., López-Maury, L., Giner-Lamia, J., Sánchez-Riego, A.M., & Florencio, F.J. (2014). Metals in Cyanobacteria: Analysis of the Copper, Nickel, Cobalt and Arsenic Homeostasis Mechanisms. Life, 4(4), 865-886.

Huang S, Wilhelm SW, Harvey HR, Taylor K, Jiao N & Chen F (2012). Novel lineages of Prochlorococcus and Synechococcus in the global oceans. ISME J 6: 285-297.

Jimenez Infante F, Ngugi DK, Alam I, Rashid M, Baalawi W, Kamau AA, et al. (2014). Genomic differentiation among two strains of the PS1 clade isolated from geographically separated marine habitats. FEMS Microbiology Ecology.

Johnson ZI, Zinser ER, Coe A, McNulty NP, Woodward EM, Chisholm SW (2006). Niche partitioning among Prochlorococcus ecotypes along ocean scale environmental gradients. Science 311:1737–1740

Kamennaya NA, Chernihovsky M, Post AF (2008). The cyanate utilization capacity of marine unicellular cyanobacteria. Limnol Oceanogr 53:2485–2494

47 Karl, D.M., & Church, M.J. (2014). Microbial oceanography and the Hawaii Ocean Time-series programme. Nature Reviews Microbiology, 12(10), 699-713.

Kashtan, N., Roggensack, SE., Rodrigue, S., Thompson, J.W., Biller, S.J., Coe, A (2014). Single-cell genomics reveals hundreds of coexisting subpopulations in wild Prochlorococcus. Science, 344(6182), 416-420.

Kelly L, Huang KH, Ding HM, Chisholm SW (2012). ProPortal: a resource for integrated systems biology of Prochlorococcus and its phage. Nucleic Acids Res 40:D632–D640

Kettler GC, Martiny AC, Huang K et al. (2007). Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet 3: e231.

LaRoche J, van der StaayGWM, Partensky F, Ducret A, Aebersold R, Li R, Golden SS, Hiller RG, Wrench PM, Larkum AWD, Green BR (1996). Independent evolution of the prochlorophyte and green plant chlorophyll a/b lightharvesting proteins. Proc Natl Acad Sci USA 93:15244–15248

Lavin P, González B, Santibánez J, Scanlan DJ & Ulloa O (2010). Novel lineages of Prochlorococcus thrive within the oxygen minimum zone of the eastern tropical South Pacific. Environmental Microbiology Reports 2: 728-738.

Li B, Sher D, Kelly L, Shi YX, Huang K, Knerr PJ, Joewono I, Rusch D, Chisholm SW, van der Donk WA (2010). Catalytic promiscuity in the biosynthesis of cyclic peptide secondary metabolites in planktonic marine cyanobacteria. Proc Natl Acad Sci USA 107:10430–10435

Lindell D & Post AF (1995). Ultraphytoplankton succession is triggered by deep winter mixing in the Gulf of Aqaba (Eilat), Red Sea. Limnol Oceanogr 40: 1130- 1141.

Lindell D, Erdner D, Marie D, Prasil O, Le Gall F, Rippka R, Partensky F, Scanlan DJ, Post AF (2002). Nitrogen stress response of Prochlorococcus strain PCC 9511 (Oxyphotobacteria) involves contrasting regulation of ntcA and amt1. J Phycol 38:1113–1124

Liu, H., Campbell, L., & Landry, M. R. (1995). Growth and mortality rates of Prochlorococcus and Synechococcus measured with a selective inhibitor technique. Marine ecology progress series. Oldendorf, 116(1), 277-287.

48 Liu H, Nolla HA, Campbell L (1997). Prochlorococcus growth rate and contribution to primary production in the equatorial and subtropical North Pacific Ocean. Aquat Microb Ecol 12:39–47

Luo HW, Friedman R, Tang JJ, Hughes AL (2011). Genome reduction by deletion of paralogs in the marine cyanobacterium Prochlorococcus. Mol Biol Evol 28:2751–2760

Mann EL, Chisholm SW (2000). Iron limits the cell division rate of Prochlorococcus in the eastern equatorial Pacific. Limnol Oceanogr 45:1067– 1076

Mann EL, Ahlgren N, Moffett JW, Chisholm SW (2002). Copper toxicity and cyanobacteria ecology in the Sargasso Sea. Limnol Oceanogr 47:976–988

Martiny AC, Kathuria S, Berube PM (2009). Widespread metabolic potential for nitrite and nitrate assimilation among Prochlorococcus ecotypes. Proc Natl Acad Sci USA 106:10787–10792.

Martinez A, Osburne MS, Sharma AK, DeLong EF, Chisholm SW (2012). Phosphite utilization by the marine picocyanobacterium Prochlorococcus MIT9301. Environ Microbiol 14:1363–1377

Mary I, Garczarek L, Tarran GA, Kolowrat C, Terry MJ, Scanlan DJ, Burkill PH, Zubkov MV (2008). Diel rhythmicity in amino acid uptake by Prochlorococcus. Environ Microbiol .10:2124–2131

Malmstrom, R.R., Coe, A., Kettler, G.C., Martiny, A.C., Frias-Lopez, J., Zinser, E.R., & Chisholm, S.W. (2010). Temporal dynamics of Prochlorococcus ecotypes in the Atlantic and Pacific oceans. The ISME journal, 4(10), 1252-1264.

Malmstrom, R.R., Rodrigue, S., Huang, K.H., Kelly, L., Kern, S.E., Thompson, A., et al (2013). Ecology of uncultured Prochlorococcus clades revealed through single-cell genomics and biogeographic analysis. The ISME journal, 7(1), 184- 198.

Martiny AC, Coleman ML, Chisholm SW (2006) Phosphate acquisition genes in Prochlorococcus ecotypes: evidence for genome-wide adaptation. Proc Natl Acad Sci USA 103:12552–12557

49 Martiny AC, Tai AP, Veneziano D, Primeau F & Chisholm SW (2009). Taxonomic resolution, ecotypes and the biogeography of Prochlorococcus. Environ Microbiol 11: 823-832. Miyake S, Ngugi, D.K., & Stingl, U. (2015). Diet strongly influences the gut microbiota of surgeonfishes. Molecular ecology, 24(3), 656-672

Mella-Flores, D., Mazard, S., Jeanthon, C., Bendif, E.M., Ostrowski, M., Scanlan, D.J., & Mauriac, R. (2011). Is the distribution of Prochlorococcus and Synechococcus ecotypes in the Mediterranean Sea affected by global warming.

Moore LR, Goericke R, Chisholm SW (1995). Comparative physiology of Synechococcus and Prochlorococcus: influence of light and temperature on growth, pigments, fluorescence and absorptive properties. Mar Ecol Prog Ser 116:259–275

Moore LR, Rocap G, Chisholm SW (1998). Physiology and molecular phylogeny of coexisting Prochlorococcus ecotypes. Nature 393:464–467

Moore LR, Chisholm SW (1999). Photophysiology of the marine cyanobacterium Prochlorococcus: ecotypic differences among cultured isolates. Limnol Oceanogr 44:628–638

Moore LR, Post AF, Rocap G, Chisholm SW (2002). Utilization of different nitrogen sources by the marine cyanobacteria Prochlorococcus and Synechococcus. Limnol Oceanogr 47:989–996

Moore LR, Coe A, Zinser ER, Saito MA, Sullivan MB, Lindell D, Frois-Moniz K, Waterbury JB, Chisholm SW (2007). Culturing the marine cyanobacterium Prochlorococcus. Limnol Oceanogr 5:353–362

Morcos, SA (1970). Physical and chemical oceanography of the Red Sea. Oceanogr. Mar. Biol. Annu. Rev, 8(73), 202. Morris JJ, Kirkegaard R, Szul MJ, Johnson JI, Zinser ER (2008). Facilitation of robust growth of Prochlorococcus colonies and dilute liquid cultures by ‘‘helper’’ heterotrophic bacteria. Appl Environ Microbiol 74:4530–4534

Morris JJ, Johnson ZI, Szul MJ, Keller M, Zinser ER (2011). Dependence of the cyanobacterium Prochlorococcus on hydrogen peroxide scavenging microbes for growth at the ocean’s surface. PLoS One 6:e16805

Morris JJ, Lenski RE, Zinser ER (2012). The : evolution of dependencies through adaptive gene loss. mBIO 3:e00036

50 Ngugi DK, Antunes A, Brune A & Stingl U (2011). Biogeography of pelagic bacterioplankton across an antagonistic temperature–salinity gradient in the Red Sea. Mol Ecol 21: 388-405.

Ngugi DK & Stingl U (2012). Combined analyses of the ITS loci and the corresponding 16S rRNA genes reveal high micro- and macrodiversity of SAR11 populations in the Red Sea. PLoS One 7: e50274.

Ngugi DK, Blom J, Alam I, Rashid M, Ba-Alawi W, Zhang G, et al. (2015). Comparative genomics reveals adaptations of a halotolerant thaumarchaeon in the interfaces of brine pools in the Red Sea. The ISME Journal 9:396–411.

Olson, R.J., Chisholm, S.W., Zettler, E.R., Altabet, M.A., & Dusenberry, J.A. (1990). Spatial and temporal distributions of prochlorophyte picoplankton in the North Atlantic Ocean. Deep Sea Research Part A. Oceanographic Research Papers, 37(6), 1033-1051.

Ottesen, E.A., Young, C. R., Gifford, S.M., Eppley, J. M., Marin, R., Schuster, S.C & DeLong, E.F. (2014). Multispecies diel transcriptional oscillations in open ocean heterotrophic bacterial assemblages. Science, 345(6193), 207-212.

Partensky F, Hoepffner N, Li WK, Ulloa O & Vaulot D (1993). Photoacclimation of Prochlorococcus sp. (Prochlorophyta) strains isolated from the North Atlantic and the Mediterranean Sea. Plant Physiol 101: 285-296.

Partensky F, Hess WR & Vaulot D (1999). Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiol Mol Biol Rev 63: 106- 127. Partensky, F., & Garczarek, L. (2010). Prochlorococcus: advantages and limits of minimalism. Annual Review of Marine Science, 2, 305-331.

Penno S, Lindell D, Post AF (2006). Diversity of Synechococcus and Prochlorococcus populations determined from DNA sequences of the N regulatory gene ntcA. Environ Microbiol 8:1200–1211

Paul, S., Dutta, A., Bag, S.K., Das, S., & Dutta, C. (2010). Distinct, ecotype- specific genome and proteome signatures in the marine cyanobacteria Prochlorococcus. BMC genomics, 11(1), 103.

Popendorf KJ, Lomas MW, Van Mooy BAS (2011). Microbial sources of intact polar diacylglycerolipids in theWestern North Atlantic Ocean. Org Geochem

51 42:803–811

Raitsos DE, Pradhan Y, Brewin RJ, Stenchikov G & Hoteit I (2013). Remote sensing the phytoplankton seasonal succession of the Red Sea. e64909.

Raven, J.A., Evans M.C., and Korb R.E. (1999). The role of trace metals in photosynthetic electron transport in O2-evolving organisms. Photosynth Res 60:111–149.

Ribalet, F., Swalwell, J., Clayton, S., Jiménez, V., Sudek, S., Lin, Y & Armbrust, E. V. (2015). Light-driven synchrony of Prochlorococcus growth and mortality in the subtropical Pacific gyre. Proceedings of the National Academy of Sciences, 112(26), 8008-8012.

Rippka R, Coursin T, Hess WR, Lichtle C, Scanlan DJ, Palinska KA, Iteman I, Partensky F, Houmard J, Herdman M (2000). Prochlorococcus marinus Chisholm et al. 1992 subsp pastoris subsp. nov. strain PCC 9511, the first axenic chlorophyll a2/b2-containing cyanobacterium (Oxyphotobacteria). Int J Syst Evol Microbiol 50:1833–1847 Rocap G, Distel D, Waterbury JB, Chisholm SW (2002) Resolution of Prochlorococcus and Synechococcus ecotypes by using 16S-23S ribosomal DNA internal transcribed spacer sequences. Appl Environ Microbiol 68:1180- 1191

Rocap G, Larimer FW, Lamerdin J, Malfatti S, Chain P, Ahlgren N, Arellano A, Coleman M, Hauser L, Hess WR, Johnson ZI, Land M, Lindell D, Post AF, Regala W, Shah M, Shaw SL, Steglich C, Sullivan MB, Ting CS, Tolonen A, Webb EA, Zinser E, Chisholm SW (2003). Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature 424:1042–1047

Roberts EW, Cai F, Kerfeld CA, Cannon GC, Heinhorst S (2012). Isolation and characterization of the Prochlorococcus carboxysome reveal the presence of the novel shell protein CsoS1D. J Bacteriol 194:787–795

Rusch DB, Martiny AC, Dupont CL, Halpern AL & Venter JC (2010). Characterization of Prochlorococcus clades from iron-depleted oceanic regions. Proc Natl Acad Sci USA 107: 16184-16189.

Saunders, J.K., & Rocap, G. (2015). Genomic potential for arsenic efflux and methylation varies among global Prochlorococcus populations. The ISME journal.

52 Scanlan DJ, Ostrowski M, Mazard S, Dufresne A, Garczarek L, Hess WR, Post AF, Hagemann M, Paulsen I & Partensky F (2009). Ecological genomics of marine picocyanobacteria. Microbiol Mol Biol Rev 73: 249-299.

Sheppard C, Price A, Roberts C (1992). Marine ecology of the Arabian region: patterns and processes in extreme tropical environments. Academic Press, London

Shalapyonok A, Olson RJ, Shalapyonok LS (1998). Ultradian growth in the marine planktonic photosynthetic prokaryote Prochlorococcus. Appl Environ Microbiol 64:1066–1069

Shibl, A. A., Thompson, L. R., Ngugi, D. K., & Stingl, U. (2014). Distribution and diversity of Prochlorococcus ecotypes in the Red Sea. FEMS microbiology letters, 356(1), 118-126.

Shih PM,Wu DY, Latifi A, Axen SD, Fewer DP, Talla E, Calteau A, Cai F, de Marsac NT, Rippka R, Herdman M, Sivonen K, Coursin T, Laurent T, Goodwin L Nolan M, Davenport KW, Han CS, Rubin EM, Eisen JA,Woyke T, Gugger M, Kerfeld CA (2013). Improving the coverage of the cyanobacterial phylum using diversity-driven genome sequencing. Proc Natl Acad Sci USA 110:1053–1058

Siddall M, Smeed DA, Hemleben C, Rohling EJ, Schmelzer I, Peltier WR (2004). Understanding the Red Sea response to sea level. Earth Planet Sci Lett 225:421–434

Smeed DA (2004). Exchange through the Bab el Mandeb. Deep Sea Research Part II 51(4):455–474

Sofianos SS, Johns WE (2002). An Oceanic General Circulation Model (OGCM) investigation of the Red Sea circulation: 1. Exchange between the Red Sea and the Indian Ocean. J Geophys Res 107 (C11):3196

Sofianos S & Johns WE (2015). Water Mass Formation, Overturning Circulation, and the Exchange of the Red Sea with the Adjacent Basins. In The Red Sea (pp. 343-353). Springer Berlin Heidelberg.

Steglich C, Futschik M, Rector T, Steen R, Chisholm SW (2006). Genome-wide analysis of light sensing in Prochlorococcus. J Bacteriol 188:7796–7806

Stingl U, Tripp HJ, Giovannoni S (2007). Improvements of high-throughput

53 culturing yielded novel SAR11 strains and other abundant marine bacteria from the Oregon coast and the Bermuda Atlantic Time Series study site. ISME J 1:361–371

Sunagawa, S., Coelho, L.P., Chaffron, S., Kultima, J.R., Labadie, K., Salazar, G & Bork, P. (2015). Structure and function of the global ocean microbiome. Science, 348(6237), 1261359

Swan, B.K., Tupper, B., Sczyrba, A., Lauro, F.M., Martinez-Garcia, M., González, J.M., & Stepanauskas, R. (2013). Prevalent genome streamlining and latitudinal divergence of planktonic bacteria in the surface ocean. Proceedings of the National Academy of Sciences, 110(28), 11463-11468.

Tai V, Palenik B (2009). Temporal variation of Synechococcus clades at a coastal Pacific Ocean monitoring site. ISME J 3:903–915

Thompson AW, Huang K, Saito MA, Chisholm SW (2011). Transcriptome response of high- and low-light-adapted Prochlorococcus strains to changing iron availability. ISME J 5:1580–1594

Thompson, L. R., Field, C., Romanuk, T., Ngugi, D., Siam, R., Dorry, H., & Stingl, U. (2013). Patterns of ecological specialization among microbial populations in the Red Sea and diverse oligotrophic marine environments. Ecology and evolution, 3(6), 1780-1797.

Thompson, C.C., Silva, G.G., Vieira, N.M., Edwards, R., Vicente, A.C. & Thompson, F.L. (2013). Genomic taxonomy of the genus Prochlorococcus. Microbial ecology, 66(4), 752-762.

Ting, C.S., Rocap, G., King, J., & Chisholm, S.W (2002). Cyanobacterial photosynthesis in the oceans: the origins and significance of divergent light- harvesting strategies. Trends Microbiol 10:134–142.

Ting CS, Hsieh C, Sundararaman S, Mannella C, Marko M (2007). Cryo-electron tomography reveals the comparative three-dimensional architecture of Prochlorococcus, a globally important marine cyanobacterium. J Bacteriol 189:4485–4493

Tolonen AC, Aach J, Lindell D, Johnson ZI, Rector T, Steen R, Church GM, Chisholm SW (2006). Global gene expression of Prochlorococcus ecotypes in response to changes in nitrogen availability. Mol Syst Biol 2:53

54 Urbach, E., Robertson, D.L., & Chisholm, S.W. (1992). Multiple evolutionary origins of prochlorophytes within the cyanobacterial radiation.

Urbach, E., Scanlan, D.J., Distel, D.L., Waterbury, J.B., & Chisholm, S.W. (1998). Rapid diversification of marine picophytoplankton with dissimilar light-harvesting structures inferred from sequences of Prochlorococcus and Synechococcus (Cyanobacteria). Journal of molecular evolution, 46(2), 188-201.

Van Mooy BAS, Rocap G, Fredricks HF, Evans CT, Devol AH (2006). Sulfolipids dramatically decrease phosphorus demand by picocyanobacteria in oligotrophic marine environments. Proc Natl Acad Sci USA 103:8607–8612

Vaulot D, Marie D, Olson RJ, Chisholm SW (1995). Growth of Prochlorococcus, a photosynthetic prokaryote, in the equatorial Pacific Ocean. Science 268:1480– 1482

Venter JC, Remington, K. Heidelberg JF, Halpern AL, Rusch D, Eisen JA & Smith, H. O. (2004). Environmental genome shotgun sequencing of the Sargasso Sea. science, 304(5667), 66-74.

Waldbauer JR, Rodrigue S, Coleman ML, Chisholm SW (2012). Transcriptome and proteome dynamics of a light–dark synchronized bacterial cell cycle. PLoS One 7:e43432

West NJ & Scanlan DJ (1999) Niche-partitioning of Prochlorococcus populations in a stratified water column in the eastern North Atlantic Ocean. Appl Environ Microbiol 65: 2585-2591.

West NJ, Lebaron P, Strutton PG & Suzuki MT (2011). A novel clade of Prochlorococcus found in high nutrient low chlorophyll waters in the South and Equatorial Pacific Ocean. ISME J 5: 933-944.

Zinser ER, Johnson ZI, Coe A, Karaca E, Veneziano D, Chisholm SW (2007). Influence of light and temperature on Prochlorococcus ecotype distributions in the Atlantic Ocean. Limnol Oceanogr 52:2205–2220

Zinser ER, Lindell D, Johnson JI, Futschik ME, Steglich C, Coleman ML, Wright MA, Rector T, Steen R, McNulty NP, Thompson LR, Chisholm SW (2009). Choreography of the transcriptome, photophysiology, and cell cycle of a minimal photoautotroph, Prochlorococcus. PLoS One 4:e5135

55 Zorz, J.K., Allanach, J.R., Murphy, C.D., Roodvoets, M.S., Campbell, D.A., & Cockshutt, A.M. (2015). The RUBISCO to Photosystem II Ratio Limits the Maximum Photosynthetic Rate in Picocyanobacteria. Life, 5(1), 403-417.

Zwirglmaier K, Heywood JL, Chamberlain K, Woodward EMS, Zubkov MV, Scanlan DJ (2007). Basin scale distribution patterns of picocyanobacterial lineages in the Atlantic Ocean. Environ Microbiol 9:1278–1290

56 OBJECTIVES

This dissertation focuses on the cyanobacterial Prochlorococcus genus in the main basin of the Red Sea. Prochlorococcus directly contributes to a significant fraction of carbon fixation in the world. The ecology and adaptive mechanisms of the globally important Prochlorococcus sp in the understudied

Red Sea remains a mystery. The research within explores Prochlorococcus in terms of their diversity, distribution, genomic content, gene expression patterns, and the physiological characterization of an isolate. The Red Sea is an isolated water body and mimics a global warming scenario. With peculiar characteristics like oligotrophy, high annual surface temperatures and hypersalinity, it makes an intriguing ecosystem to investigate microbial dynamics.

Chapter One is a pilot study exploring the diversity and distribution of

Prochlorococcus in the Red Sea based on the intergenic transcriber spacer (ITS) region. I report the depth distribution in different seasons at the southern, central and northern regions using Prochlorococcus-specific primers. Chapter Two captures a higher phylogenetic resolution of Prochlorococcus ecotypes using the rpoC1 marker gene fragment over a wider geographical scope that encompasses

45 stations covering the central area and extreme ends. In addition to the rpoC1 pyrotags, and to bypass PCR bias, we analyzed fragments recruited from the

Red Sea metagenomes in order to link the Prochlorococcus ecotype stratification to environmental parameters in the water column. It is important to advance our understanding of the direction in which distribution and diversity patterns of

Prochlorococcus communities can shift in a warming world. As global oceans

57 become warmer and more stratified, Prochlorococcus ecotypes with the highest temperature optimum will increase in abundance and expand their habitat deeper into the water column. This is likely to affect the marine biogeochemical cycles and the bacterial communities within. Chapter Three narrows down the focus on a cultured isolate and sequenced genome of the Prochlorococcus HL II ecotype inhabiting the Red Sea. Physiological characterization and genome-wide comparison will explain the adaptations enforced by selective pressures in the

Red Sea ecosystem. In addition, we use metatranscriptomic datasets to assess cell metabolism, stress response, and nutrient assimilation gene expression patterns throughout the Red Sea water column. Maintaining cultured isolates allows us to investigate nutrient assimilation capabilities and genes of previously unknown function, among others.

In conclusion, the chapters of this dissertation aim to add significant pieces to the global and challenging puzzle of Prochlorococcus physiological and ecological interpretations.

58

CHAPTER ONE

Diversity and Distribution of Prochlorococcus in the Red Sea

Ahmed A Shibl1, Luke R Thompson2, David K Ngugi1, and Ulrich Stingl1

1Red Sea Research Center, King Abdullah University for Science and Technology, Saudi Arabia 2Knight Lab, University of California, San Diego

This is the final, accepted copy of the manuscript that was published in FEMS Microbiology Letters, 2014.

Citation: Shibl, A. A., Thompson, L. R., Ngugi, D. K., & Stingl, U. (2014). Distribution and diversity of Prochlorococcus ecotypes in the Red Sea. FEMS microbiology letters, 356(1), 118-126.

59 Abstract

Photosynthetic of the genus Prochlorococcus play a major role in global primary production in the world’s oligotrophic oceans. A recent study on pelagic bacterioplankton communities in the northern and central Red Sea indicated that the predominant cyanobacterial 16S rRNA gene sequence types were from Prochlorococcus cells belonging to a high light adapted ecotype (HL

II). In this study, we analyzed micro-diversity of Prochlorococcus sp. at multiple depths within and below the euphotic zone in the northern, central, and southern regions of the Red Sea, as well as in surface waters in the same locations, but in a different season. Prochlorococcus dominated the communities in clone libraries of the amplified 16S–23S rRNA internal transcribed spacer (ITS) region. Almost no differences were found between samples from coastal or open-water sites, but a high diversity of Prochlorococcus ecotypes was detected at 100-meter depth in the water column. Additionally, an unusual dominance of HL II-related sequences was observed in deeper waters. Our results indicate that the Red Sea harbors diverse Prochlorococcus lineages, but no novel ecotypes, despite its unusual physicochemical properties.

60

Introduction

The genus Prochlorococcus (Chisholm et al., 1988; 1992) is a highly abundant and widely distributed picocyanobacterium that accounts for a significant percentage of marine photosynthetic biomass and has an important ecological function in contributing to primary production and carbon cycling (Li, 1994;

Jardillier et al., 2010; Chavez et al., 2011), most pronounced across tropical and subtropical waters (Campbell et al., 1994; Buck et al., 1996; Partensky et al.,

1999). The species Prochlorococcus marinus is a genetically diverse group composed of distinct monophyletic clades that are distinguishable by their physiology and genomic features (ecotypes, for review see: Scanlan et al.,

2009). For example, differences in their light harvesting capabilities, mostly driven by ratios of their divinyl-chlorophyll a and b pigments (Partensky et al.,

1993; Hess et al., 2001), allow the cells to occupy specific niches in the water column of oceans around the globe (Moore et al., 1998; Johnson et al., 2006 and

Zwirglmaier et al., 2008).

Based on several marker genes (Penno et al., 2006; Jameson et al., 2008;

2010) and the more commonly used internal transcribed spacer region between the genes for 16S and 23S rRNA (ITS; West & Scanlan, 1999; Rocap et al.,

2002; Johnson et al., 2006), two distinct major lineages of Prochlorococcus are known to date: the high light (HL)-adapted clade that dominates upper surface layers and the low light (LL)-adapted clade that dominates the lower euphotic zone (Rocap et al., 2002; Brown & Fuhrman 2005; Martiny et al., 2009). In addition to the previously reported HL I and HL II ecotypes, several new groups have been discovered from different oceanic regimes. For example, Huang et al

61 (2011) reported a set of novel sequences that seemed to form monophyletic groups (referred to as HL III, HL IV, HL V and HL VI). The former three clades, detected in the surface waters of the Equatorial Pacific, clustered with the previously described HNLC (high-nutrient, low-chlorophyll) clade (Rusch et al.,

2010; West et al., 2011), whereas HL VI, which was found only in the lower euphotic zone (75–150 m) at low abundances, may resemble an

‘intermediate’ genotype (Huang et al., 2011). Similarly, the LL group currently contains four clades (LL I, LL II, LL III and LL IV). Recently, Martiny and colleagues detected a novel and abundant LL cluster in the Atlantic and Pacific

Oceans, NC1 (non-cultured), using PCR-amplified and cloned ITS sequences

(Martiny et al., 2009). The ongoing identification of novel clades indicates that the whole diversity of Prochlorococcus has not been fully recovered, thus increasing the chances of new discoveries, especially in currently under-sampled areas.

Despite the unique physical and chemical attributes of the Red Sea, it is one of the most poorly explored oligotrophic marine environments. Distinctive characteristics include high year-round solar irradiation and particularly high surface water temperatures (Stambler 2005; Edwards 1987). The Red Sea is an extremely saline marine water body (36 to 41 psu), mostly due to negligible freshwater inputs (Edwards 1987; da Silva 1994) and high evaporation rates. In terms of its microbiology, the bacterial (Ngugi et al., 2011) and picoeukaryotic

(Acosta et al., 2013) diversity in the northeastern half of the Red Sea is comparable to other tropical marine environments, but community assemblages showed fundamental differences along the temperature–salinity gradient from

North to South (Ngugi et al., 2011). More recently, the abundance of genetic

62 functions of pelagic microbes (including Prochlorococcus and SAR11) that might reveal ecosystem-specific adaptations in the Red Sea was quantified with a comparative metagenomics approach (Thompson et al., 2013).

Although the distribution and ecology of Prochlorococcus has been extensively studied in other oceanic regions, little is known about its micro- diversity in the main Red Sea basin. Lindell and Post investigated populations of phytoplankton in the northern Gulf of Aqaba and reported that the change in their distribution and abundance followed seasonal patterns in the region (Lindell and

Post 1995), with Prochlorococcus reaching maximal abundances in the summer.

Another study in the Gulf of Aqaba reported the dominance of Prochlorococcus

HL II-adapted cells throughout the water column, with no clear partitioning of LL- adapted populations (Fuller et al., 2005). Despite previous work on

Prochlorococcus in the Gulf of Aqaba, the conditions in the main basin of the Red

Sea are very different in terms of productivity and other physicochemical and hydrodynamic parameters, including the presence of several eddies that seem to have a major impact on distribution of nutrient rich waters from the Indian Ocean

(Stambler 2005; Ahmed et al., 2012; Raitsos et al., 2013), all of which could impact the structure of the residing microbial communities. Additionally, the main basin of the Red Sea displays a very interesting gradient system with antagonistic gradients of salinity and temperature, which was reflected in changes in the microbial communities from the northern Red Sea to the central part, based on 16S rRNA gene analyses (Ngugi et al., 2011). In this study,

Prochlorococcus and SAR11 clades dominated the surface waters, whereas

Synechococcus sequences were rare. The most prevalent Prochlorococcus

63 operational taxonomic units in the Red Sea were affiliated with the high light- adapted (HL II) ecotype. Unfortunately, 16S rRNA analyses do not provide enough phylogenetic resolution on ecotypes in Prochlorococcus (Scanlan et al.,

2009), and the presence of novel Red Sea adapted ecotypes could have been missed in this study.

In order to investigate the potential presence of novel ecotypes, distribution patterns and the seasonal variation of Prochlorococcus at different depths in the water column at the extreme ends of the Eastern Red Sea, we targeted the highly variable internal transcribed spacer region (ITS), and produced two datasets. One dataset is comprised of spring surface water samples from the northern, northern/central, and central regions of the Eastern

Red Sea, whereas the other dataset constitutes fall samples throughout the water column from the northern and southern ends of the Red Sea (Fig. 1).

64

Figure 1 Partial map of the Red Sea indicating the location of sampling sites. Stations 2, 5, 18, 14, 27 and 31 were sampled during Leg 1 (March) of the 2010 KAUST Red Sea Expedition (KRSE 2010). Stations 198, 197, 13, 4 were sampled during Leg 1 (September–October) of KRSE 2011.

65 Methods and materials

Sampling sites and water sample collection

The first set of Red Sea water samples was collected as described previously

(Ngugi et al., 2011) from 10-m surface water in transects from the coast to the open ocean in March 2010 as part of the 2nd R/V Aegaeo WHOI–KAUST Red

Sea Expedition (KRSE 2010). The second set of Red Sea water samples was collected as above in expanded transects from the coast to the open ocean in

September and October 2011 as part of the 3rd R/V Aegaeo KAUST Red Sea

Expedition (KRSE 2011). The sampling sites were as follows: northern, coastal station 198 (27.821 °N, 35.078 °E); northern, open-water station 197 (27.441 °N,

34.811 °E); southern, coastal station 13 (40.725 °N, 160 °E); and southern, open- water station 4 (40.357 °N and 1085 °E; Table 1, Fig. 1). Samples were collected at six depths (10, 25, 50, 100, 200, and 500 m depths or the deepest possible depth in more shallow areas). For all samples, in both expeditions, 20 L of seawater was collected and immediately filtered by serial filtration through a prefilter (mixed cellulose esters, 5.0 µm, Millipore) to remove suspended particles, followed by 1.2 µm and 0.1 µm filters (mixed cellulose esters, Millipore).

Individual filters were then sealed inside sterile zip-lock bags and frozen at –80

°C immediately on board of the ship.

DNA extraction, PCR amplification, cloning and sequencing

Total DNA was extracted from 0.1 µm filters obtained during KRSE 2010 and

2011 using a protocol modified from Rusch et al. (2007). In brief, the aqueous

DNA extracts (after the chloroform–isoamyl–alcohol extraction step) were washed with 10 mM TE buffer (10 mM Tris–HCl; 1 mM Na-EDTA, pH 8.0; Wright

66 et al. 2009) and concentrated using Amicon ultracentrifugation tubes (10 kDa,

Millipore) as described by the manufacturer. The purity of the extracted DNA was assessed with 1% agarose gel electrophoresis and quantified using Quant-iT

PicoGreen (Molecular Probes, Invitrogen). The picocyanobacterial 16S–23S rRNA internal transcribed spacer region (ITS) was amplified by PCR using primers designed to target Prochlorococcus lineages; forward primer, 2F for 16S rRNA: GAAGTCGTTACTYYAACCC and reverse primer, 3R for 23S rRNA:

TCATCGCCTCTGTGTGCC (Martiny et al., 2009). For the KRSE 2010 samples, each PCR reaction mixture (50 µl) contained 5.0 µl 10× PCR buffer (New

England Biolabs), 2.0 µl Magnesium Sulfate (50mM; Invitrogen), 1.25 µl dNTPs

(10 mM), 0.15 µl of each of the primers (100 µM), 0.1 µl Plantinum Taq DNA polymerase (5 U/ µl; Invitrogen), and 10 ng of template DNA. The PCRs were run with the following conditions, modified from Rocap et al., 2002: initial denaturation at 94 °C for 4 min; followed by 30 cycles at 94 °C for 4 min, 58 °C for 1 min, 72 °C for 6 min; and a final extension at 72 °C for 10 min. Amplicons were quality-assessed on agarose gels prior to purification with the QIAquick

PCR purification kit (Qiagen, Hilden, Germany). Purified PCR products were then cloned into the pDrive cloning vector (Qiagen) as per the manufacturer’s protocol.

KRSE 2011 samples were treated exactly the same, except that amplicons were purified with Exosap-IT (Affymetrix, Santa Clara, CA, USA) and purified products were cloned into the pCR 2.1 vector (Invitrogen, Carlsbad, CA,

USA) as per the manufacturer’s protocol. For both datasets, a total of 1,081 colonies with the correct insert size were bi-directionally sequenced with M13

67 primers on an ABI 3730 xl DNA Analyzer in the Biosciences Core Lab facility at

KAUST (Saudi Arabia), and sequences were assembled using the Sequencher sequence analysis software (v5.1; Gene Codes Corporation, Ann Arbor, MI,

USA; http://www.genecodes.com).

Phylogeny and OTU analyses of ITS sequences

Sequences were screened and aligned for OTU analyses using the MOTHUR software package (Schloss et al., 2009), checked for chimeras using

ChimeraSlayer (Haas et al., 2011) and Uchime (Edgar et al., 2011), and the resulting alignments were manually corrected using Geneious Pro software (v5.5; http://www.geneious.com/). Sequences from cultured representatives were retrieved from the Prochlorococcus Portal (http://proportal.mit.edu/), and were added to the alignment together with reference sequences of recently discovered clades. Phylogenetic diversity (PD) indices were calculated using MOTHUR for downstream statistical analysis. PD incorporates differences in phylogenetic distances in order to measure biodiversity across samples (Faith, 1992).

Statistical correlations among physical, chemical, and biological parameters were assessed with MINE (maximal information-based nonparametric exploration;

Reshef et al., 2011), which uses the MIC (maximal information coefficient) statistic and others to identify pairwise relationships in datasets. Phylogenetic trees for the ITS sequences were constructed using Neighbor Joining with

Jukes–Cantor corrected distances on the EPoS software (Griebel et al., 2008).

Bootstrap values based on 1,000 replicates are shown at the nodes. Trees were

68 inferred by maximum likelihood (GTR-GAMMA model using RAxML), and

MrBayes gave the same topologies.

Table 1 Overview on physico-chemical characteristics of sampled stations including number of analyzed ITS sequences and phylogenetic diversity of the samples.

69 Results and Discussion

Prochlorococcus diversity in surface waters

To identify the phylogenetic diversity of surface water (10 m) Prochlorococcus communities in the Eastern Red Sea, we PCR-amplified, cloned and sequenced the 16S–23S rRNA ITS region from samples collected at three coastal and three open-water sites on different transects in spring during KRSE 2010 (Fig. 1). From the total number of sequences analyzed in that dataset (Table 1, Fig. 2), ~94% belonged to the HL II ecotype and ~ 5% were affiliated to the HL VI clade, which was first described from the South China Sea (Huang et al., 2011), an oceanic region similar to the Red Sea in terms of nutrient concentrations and water temperatures. This result is in agreement with earlier reports of 16S rRNA gene sequences retrieved from the same region (Ngugi et al., 2011). In our dataset, no sequences fell within the HL I clade or any of the LL-adapted clades. Upon comparing coastal (stations 18 and 27) and open-water (stations 14 and 31) samples at the north–central and central sites, we noticed a slightly higher occurrence of HL VI-affiliated sequences at coastal areas (Fig. 3a). This coincides with higher Chl a concentrations reported in near-shore waters of the

Red Sea, due to shallower depths and higher nutrient availability (Qurban et al.,

2014). Interestingly, comparing coastal (station 2 and 198) and open-water

(station 5 and 197) samples at the northern sites close to the Gulf of Aqaba, there was no difference in the surface water Prochlorococcus communities, but a higher occurrence of HL VI-affiliated sequences in samples collected during

Spring 2010 compared to Fall 2011 (Fig. 3b and 3c). The higher abundances of

HL VI-affiliated sequences might be attributed to lower temperatures of the

70 surface water in the spring (Table 1), although statistical analysis over all data in

this study indicated that the abundance of HL VI was correlated more strongly

with salinity (negative correlation) than with the other factors (Table 2).

Additionally, the abundance of HL VI was negatively correlated with the

abundance of HL II, indicating that HL VI might have a higher fitness at lower

salinities and might in these conditions replace a fraction of the HL II population.

The similarity of Prochlorococcus communities between coastal and open-ocean

sites in both seasons might reflect their relatively close proximity and similar

physicochemical characteristics (Table 1).

Table 2 Correlations of environmental factors and the presence of certain ecotypes on phylogenetic diversity or presence of other factors

71

Figure 2 Neighbor-joining phylogenetic tree with collapsed subgroups based on KRSE 2010 and KRSE 2011 Prochlorococcus 16S–23S rRNA ITS sequences from clone libraries from the Red Sea surface waters (10 m) and the water column. Major Prochlorococcus clades are indicated on the tree. The numbers in the triangular node indicate the number of sequences in each of the datasets (left: 2010, right: 2011). The tree was rooted with the ITS sequence of Synechococcus strain WH7803.

a b c 72 100%# 100%#

80%# 80%#

60%# 60%#

40%# 40%#

20%# 20%#

0%# 0%# 18# 14# 27# 31# 10*# 10# 25# 50# 100# 200# 390# 10*$ 10$ 25$ 50$ 100$ 200$ 500$ Sta.on#number# Depth#(m)# Depth$(m)$

d e

100%#

80%#

60%#

40%#

20%#

100%# 90%# 0%# 80%# 10# 25# 50# 100# 200# 70%# 10# 25# 50# 100# 153# 60%# 50%# Depth#(m)# Depth#(m)# 40%# 30%# 20%# 10%# 0%# 10# 25# 50# 100# 153# Figure 3 ProchlorococcusDepth&(mbsl)& community composition in the Red Sea. a) Surface waters (10 m) at different stations sampled during KRSE 2010; b) Water column of the north, coastal station 198 (KRSE 2011), including surface water samples from station 2 (KRSE 2010); c) Water column of the north, open-water station 197 (KRSE 2011), including surface water samples from station 5 (KRSE 2010); d) Water column of the south, coastal station 13 (KRSE 2011); and e) Water column of the south, open-water station 4 (KRSE 2011). The asterisk (*) indicates surface water samples from KRSE 2010.

73 Prochlorococcus diversity along the water column

In order to investigate the depth distribution of Prochlorococcus ecotypes, we analyzed sequences from a range of depths in the water-column from the KRSE

2011 dataset (stations 198, 197, 13 and 4; Fig. 1). From all sequences analyzed across all depths, around 80% belonged to the HL-adapted clades and roughly

20% belonged to the LL-adapted clades (Table 1, Fig. 2). We observed very slight differences in Prochlorococcus community composition between coastal and open-water sites at 25 and 50 m depths. The percentage of sequences that clustered with HL VI at these depths was higher at coastal stations 13 (~ 50% and 40%) and 198 (~ 7% and 2.7%) than open-water station 4 (1%; for station

197, we retrieved only few sequences from those depths) (Fig. 3b–e). Only one sequence from 50 m depth at station 13 fell into the HL I group, and no sequences from any station clustered with the recently detected HNLC clades

(Rusch et al., 2010; West et al., 2011).

In the fall season, the water column in the Gulf of Aqaba, Red Sea, witnesses an erosion of its stratification as its surface layer deepens (Carlson et al., 2014) and deep convective mixing results in higher concentrations of nutrients at depths of up to 200 m (Qurban et al., 2014). In the southern end, monsoon winds cause an upwelling in the intermediate waters (100 ~ 200 m) of the Gulf of Aden as it moves towards the Red Sea (Siddall et al., 2002). For all stations, fewer HL II-affiliated sequences were found at 100 m than in any other depths, whereas sequences related to the HL VI clade where identified at coastal stations only (i.e. 11% from station 13 and 1 sequence recovered from station

198). At 100 m depth, the diversity of Prochlorococcus seemed to increase

74 significantly at stations 4, 13 and 197 (for station 198, we retrieved only three sequences at that depth). This increase was reflected by high values in phylogenetic diversity (Table 1). Although we do not have any evidence, we speculate that the low light concentrations at 100 m, together with the unusually high temperatures at depth in the Red Sea provide suitable habitats for a co- existence of diverse Prochlorococcus ecotypes. Statistical analysis indicated that the phylogenetic diversity was better (negatively) correlated with chlorophyll concentrations than with depth (Table 2).

The majority of sequences recovered from depths >100 m at all four stations in this study were affiliated with the HL II ecotype (Fig. 3b–e). HL- adapted Prochlorococcus populations were previously found in the Gulf of Aqaba

– the extreme northern Red Sea, at depths of up to 150 m (West et al., 2001) and 175 m (Steglich et al., 2003). Similar to this report, we show here that in the southern part of the Red Sea (stations 13 and 4), HL II populations dominate at

153 m and 200 m, respectively. In the Atlantic Ocean, the HL II clade dominated the water column at light levels as low as 1%, underscoring their ability to adapt to low light intensities (Jameson et al., 2010). Furthermore, previous reports suggested that HL II adapted cells dominate the Prochlorococcus communities at low latitudes, which was strongly correlated with prevailing higher temperatures

(Johnson et al., 2006). Temperatures in the deep-sea water mass (up to 2,500 m) of the Red Sea water column are uniform at ~22 °C throughout the year

(Manasrah et al., 2004; Ngugi & Stingl 2012). Additionally, deeper light penetration and seasonal mixing in the extreme ends of the Red Sea water column might explain the prevalence of HL II cells at 390 m and 500 m at

75 stations 198 and 197 near the Gulf of Aqaba. Interestingly, Jiao and colleagues

(2013) analyzed aphotic regions of the Western Pacific using flow cytometry and molecular analyses and reported the presence of high-light adapted

Prochlorococcus cells at 500 m depth and significant amounts of HL II cellular rRNA at 300 m in the Luzon Strait. The abundance of Prochlorococcus in aphotic zones might also be attributed to their ability to import organic compounds as demonstrated by Gomez-Baena et al. using several HL and LL strains, although gene expression studies were carried out on a LL-adapted strain (Gomez-Baena et al., 2008). Recently, potential mixotrophic metabolic pathways have been reported in Prochlorococcus and might further explain their presence in aphotic habitats (Muñoz-Marín et al., 2013). Although these are exciting hypotheses, we currently do not have any data on the actual abundance of Prochlorococcus in these depths as this study relied exclusively on PCR-based methods.

Furthermore, we also cannot exclude the possibility that these sequences originate from dead cells.

The low light clade of Prochlorococcus has been shown to be more genetically diverse than the high light group (Kettler et al., 2007). Sequences affiliated with LL-adapted clades (20% of total sequences) in our datasets appeared mostly in the intermediate water layers (100 to 200 m) at all stations except for station 198, where not many sequences were retrieved at these depths. Sequences belonging to clade LL I (~ 8% of total LL sequences) were detected at 100 m in stations 4, 13, 197 and at 390 m depth in station 198.

Sequences that belonged to clades LL II and III (~18% of total LL sequences)

76 were detected at 100 m in stations 4 and 13, but only LL II sequences were found in station 197 and neither LL II nor LL III sequences were detected in station 198, at any depth. LL IV affiliated sequences represented ~16% of total

LL sequences, and were found at all stations at depths of 100 and ~200 m (Fig.

2). Interestingly, the majority of LL-adapted sequences (~ 58%) appeared to cluster with the previously described non-cultured clade (NC1; Martiny et al.,

2009). At station 197 and a depth of 100 m, 69% of the sequences belonged to the NC1 group, whereas at 200 m this cluster only accounted for ~14%. NC1 sequences were also retrieved from stations 4 (100 m), and 13 (100 ~ 153 m;

Fig. 3). It has been suggested that the NC1 clade may not form a completely monophyletic group due to the presence of multiple lineages within (Martiny et al., 2009; Huang et al., 2011). No sequences clustered with the recently defined groups LL V and LL VI (Lavin et al., 2010).

Conclusions

The niche partitioning of HL and LL Prochlorococcus populations has been documented across many oceanic regions, but so far no study had focused on the Red Sea. We investigated the microdiversity of Prochlorococcus in the Red

Sea water column using samples collected during two separate cruises in various regions of this unique water body. For samples at 50 m and shallower, the diversity in the coastal and open-water areas was very similar, and we found that

HL II cells dominated the Prochlorococcus communities. In contrast to this, the diversity at 100 m exhibited varied compositions of both HL and LL clades. In particular, the majority of LL-affiliated sequences fell within the NC1 clade recently discovered in the Atlantic Ocean. Interestingly, HL II sequences were

77 again dominant at depths of up to 500 m, suggesting HL II cells are viable at depths with no light exposure. In general, this report indicates that

Prochlorococcus communities across the Red Sea are rather homogenous, and do not contain any novel ecotypes, despite its unusual physicochemical features.

Cultivation of Prochlorococcus cells belonging to HL and LL clades from the Red

Sea coupled with genomic and physiological characterization will provide explanations for their vertical distribution and relative abundance at different depths.

Data submission

ITS sequences produced in this study were deposited in Genbank under accession numbers KJ625632-KJ626165.

Acknowledgements

We thank AbdulAziz Al-Suwailem, the scientists, the captain and the crew of the

R/V Aegaeo for the KAUST Red Sea Expeditions in 2010 and 2011, and the

Coastal and Marine Resources Core Lab (CMRC) and the Biosciences Core

Laboratory (BCL) at King Abdullah University for Science and Technology

(KAUST) for their logistical and technical support.

78 REFERENCES

Acosta F, Ngugi DK & Stingl U (2013) Diversity of picoeukaryotes at an oligotrophic site off the Northeastern Red Sea Coast. Aquat Biosyst 1: 9-16.

Brown M & Fuhrman J (2005) Marine bacterial microdiversity as revealed by internal transcribed spacer analysis. Aquat Microb Ecol 41: 15-23.

Buck KR, Chavez FP & Campbell L (1996) Basin-wide distributions of living carbon components and the inverted trophic pyramid of the central gyre of the North Atlantic Ocean, Summer 1993. Aquat Microb Ecol 10: 283-298.

Campbell L, Nolla H & Vaulot D (1994) The importance of Prochlorococcus to community structure in the central North Pacific Ocean. Limnol Oceanogr 39: 954-96.

Carlson DF, Fredj E & Gildor H (2014) The annual cycle of vertical mixing and restratification in the Northern Gulf of Eilat/Aqaba (Red Sea) based on high temporal and vertical resolution observations. Deep Sea Research Part I: Oceanographic Research Papers 84: 1-17.

Chavez FP, Messié M & Pennington JT (2011) Marine primary production in relation to climate variability and change. Annual Review of Marine Science 3: 227-260.

Chisholm SW, Olson RJ, Zettler ER, Goericke R, Waterbury JB & Welschmeyer NA (1988) A novel free-living prochlorophyte abundant in the oceanic euphotic zone. Nature 334: 340-343.

Chisholm SW, Frankel SL, Goericke R, Olson RJ, Palenik B, Waterbury JB, West-Johnsrud L & Zettler ER (1992) Prochlorococcus marinus nov. gen. nov. sp.: an oxyphototrophic marine prokaryote containing divinyl chlorophyll a and b. Arch Microbiol 157: 297-300.

Da Silva A, Young C & Levitus S (1994) Algorithms and Procedures. Vol. 1, Atlas of surface marine data 1994 NOAA Atlas NESDIS 6: 83.

Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27: 2194–2200.

Edwards FJ (1987) Climate and oceanography: Red Sea, Pergamon Press, Oxford 45-69.

79 Faith DP (1992a) Conservation evaluation and phylogenetic diversity. Biol. Conserv 61:1–10.

Fuller NJ, West NJ, Marie D, Yallop M, Rivlin T, Post AF & Scanlan DJ (2005) Dynamics of community structure and phosphate status of picocyanobacterial populations in the Gulf of Aqaba, Red Sea. Limnol Oceanogr 50: 363-375.

Gómez-Baena G, López-Lozano A, Gil-Martínez J, Lucena JM, Diez J, Candau P & García-Fernández JM (2008) Glucose uptake and its effect on gene expression in Prochlorococcus. PloS One 3: e3416.

Griebel T, Brinkmeyer M & Bocker S (2008) EPoS: a modular software framework for phylogenetic analysis. Bioinformatics 24: 2399-2400.

Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, et al. (2011) Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res 21: 494–504.

Hess WR, Rocap G, Ting CS, Larimer F, Stilwagen S, Lamerdin J & Chisholm SW (2001) The photosynthetic apparatus of Prochlorococcus: insights through comparative genomics. Photosynthesis Res 70: 53-71.

Huang S, Wilhelm SW, Harvey HR, Taylor K, Jiao N & Chen F (2012) Novel lineages of Prochlorococcus and Synechococcus in the global oceans. ISME J 6: 285-297.

Jameson E, Joint I, Mann NH & Mühling M (2008) Application of a novel rpoC1- RFLP approach reveals that marine Prochlorococcus populations in the Atlantic gyres are composed of greater microdiversity than previously described. Microb Ecol 55: 141-151.

Jameson E, Joint I, Mann NH & Mühling M (2010) Detailed analysis of the microdiversity of Prochlorococcus populations along a North-South Atlantic Ocean transect. Environ Microbiol 12: 156-171.

Jardillier L, Zubkov MV, Pearman J & Scanlan DJ (2010) Significant CO2 fixation by small prymnesiophytes in the subtropical and tropical northeast Atlantic Ocean. The ISME journal 4: 1180-1192.

Jiao N, Luo T, Zhang R, Yan W, Lin Y, Johnson Z, Tian J, Yuan D, Yang Q & Sun J (2013) Presence of Prochlorococcus in the aphotic waters of the western Pacific Ocean. Biogeosciences Discussions 10: 9345-9371.

80 Johnson ZI, Zinser ER, Coe A, McNulty NP, Woodward EMS & Chisholm SW (2006) Niche partitioning among Prochlorococcus ecotypes along ocean-scale environmental gradients. Science 311: 1737-1740.

Kettler GC, Martiny AC, Huang K et al. (2007) Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet 3: e231.

Lavin P, González B, Santibánez J, Scanlan DJ & Ulloa O (2010) Novel lineages of Prochlorococcus thrive within the oxygen minimum zone of the eastern tropical South Pacific. Environmental Microbiology Reports 2: 728-738.

Li WKW (1994) Primary productivity of prochlorophytes, cyanobacteria, and eucaryotic ultraphytoplankton: measurements from flow cytometric sorting. Limnol Oceanogr 39: 169–175.

Lindell D & Post AF (1995) Ultraphytoplankton succession is triggered by deep winter mixing in the Gulf of Aqaba (Eilat), Red Sea. Limnol Oceanogr 40: 1130- 1141.

Manasrah R, Badran M, Lass HU & Fennel W (2004) Circulation and winter deep-water formation in the northern Red Sea. Oceanologia 46: 5-23.

Martiny AC, Tai AP, Veneziano D, Primeau F & Chisholm SW (2009) Taxonomic resolution, ecotypes and the biogeography of Prochlorococcus. Environ Microbiol 11: 823-832.

Moore LR, Rocap G & Chisholm SW (1998) Physiology and molecular phylogeny of coexisting Prochlorococcus ecotypes. Nature 393: 464-467.

Munoz-Marin MC, Luque I, Zubkov MV, Hill PG, Diez J & Garcia-Fernandez JM (2013) Prochlorococcus can use the Pro1404 transporter to take up glucose at nanomolar concentrations in the Atlantic Ocean. Proc Natl Acad Sci USA 110: 8597-8602.

Ngugi DK, Antunes A, Brune A & Stingl U (2011) Biogeography of pelagic bacterioplankton across an antagonistic temperature–salinity gradient in the Red Sea. Mol Ecol 21: 388-405.

Ngugi DK & Stingl U (2012) Combined analyses of the ITS loci and the corresponding 16S rRNA genes reveal high micro- and macrodiversity of SAR11 populations in the Red Sea. PLoS One 7: e50274.

81 Partensky F, Hoepffner N, Li WK, Ulloa O & Vaulot D (1993) Photoacclimation of Prochlorococcus sp. (Prochlorophyta) strains isolated from the North Atlantic and the Mediterranean Sea. Plant Physiol 101: 285-296.

Partensky F, Hess WR & Vaulot D (1999) Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiol Mol Biol Rev 63: 106- 127.

Penno S, Lindell D & Post AF (2006) Diversity of Synechococcus and Prochlorococcus populations determined from DNA sequences of the N- regulatory gene ntcA. Environ Microbiol 8: 1200-1211.

Qurban MA, Balala AC, Kumar S, Bhavya P & Wafar M (2014) Primary production in the northern Red Sea. J Mar Syst. doi: 10.1016/j.jmarsys.2014.01.006

Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M & Sabeti PC (2011) Detecting novel associations in large data sets. Science 334: 1518-1524.

Rocap G, Distel DL, Waterbury JB & Chisholm SW (2002) Resolution of Prochlorococcus and Synechococcus ecotypes by using 16S-23S ribosomal DNA internal transcribed spacer sequences. Appl Environ Microbiol 68: 1180- 1191.

Rusch DB, Halpern AL, Sutton G et al. (2007) The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol 5: e77.

Rusch DB, Martiny AC, Dupont CL, Halpern AL & Venter JC (2010) Characterization of Prochlorococcus clades from iron-depleted oceanic regions. Proc Natl Acad Sci USA 107: 16184-16189.

Scanlan DJ, Ostrowski M, Mazard S, Dufresne A, Garczarek L, Hess WR, Post AF, Hagemann M, Paulsen I & Partensky F (2009) Ecological genomics of marine picocyanobacteria. Microbiol Mol Biol Rev 73: 249-299.

Schloss PD, Westcott SL, Ryabin T et al. (2009) Introducing mothur: open- source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75: 7537-7541.

82 Siddall M, Smeed DA, Matthiesen S & Rohling EJ (2002) Modelling the seasonal cycle of the exchange flow in Bab el Mandab (Red Sea). Deep Sea Research Part I: Oceanographic Research Papers 49: 1551-1569.

Stambler N (2005) Bio-optical properties of the northern Red Sea and the Gulf of Eilat (Aqaba) during winter 1999. J Sea Res 54: 186-203.

Steglich C, Post AF & Hess WR (2003) Analysis of natural populations of Prochlorococcus spp. in the northern Red Sea using phycoerythrin gene sequences. Environ Microbiol 5: 681-690.

Thompson LR, Field C, Romanuk T, Ngugi D, Siam R, El Dorry H & Stingl U (2013) Patterns of ecological specialization among microbial populations in the Red Sea and diverse oligotrophic marine environments. Ecol Evol 3: 1780-1797.

West NJ & Scanlan DJ (1999) Niche-partitioning of Prochlorococcus populations in a stratified water column in the eastern North Atlantic Ocean. Appl Environ Microbiol 65: 2585-2591.

West NJ, Schonhuber WA, Fuller NJ, Amann RI, Rippka R, Post AF & Scanlan DJ (2001) Closely related Prochlorococcus genotypes show remarkably different depth distributions in two oceanic regions as revealed by in situ hybridization using 16S rRNA-targeted oligonucleotides. Microbiology 147: 1731-1744.

West NJ, Lebaron P, Strutton PG & Suzuki MT (2011) A novel clade of Prochlorococcus found in high nutrient low chlorophyll waters in the South and Equatorial Pacific Ocean. ISME J 5: 933-944.

Wright JJ, Lee S, Zaikova E, Walsh DA & Hallam SJ (2009) DNA extraction from 0.22 µM Sterivex filters and cesium chloride density gradient centrifugation. Journal of Visualized Experiments: JoVE.

Zwirglmaier K, Jardillier L, Ostrowski M, Mazard S, Garczarek L, Vaulot D, Not F, Massana R, Ulloa O & Scanlan DJ (2008) Global phylogeography of marine Synechococcus and Prochlorococcus reveals a distinct partitioning of lineages among oceanic biomes. Environ Microbiol 10: 147-161.

Figure legends

Figure 1. Partial map of the Red Sea indicating the location of sampling sites.

Stations 2, 5, 18, 14, 27 and 31 were sampled during Leg 1 (March) of the 2010

83 KAUST Red Sea Expedition (KRSE 2010). Stations 198, 197, 13, 4 were sampled during Leg 1 (September–October) of KRSE 2011.

Figure 2. Neighbor-joining phylogenetic tree with collapsed subgroups based on

KRSE 2010 and KRSE 2011 Prochlorococcus 16S–23S rRNA ITS sequences from clone libraries from the Red Sea surface waters (10 m) and the water column. Major Prochlorococcus clades are indicated on the tree. The numbers in the triangular node indicate the number of sequences in each of the datasets

(left: 2010, right: 2011). The tree was rooted with the ITS sequence of

Synechococcus strain WH7803.

Figure 3. Prochlorococcus community composition in the Red Sea. a) Surface waters (10 m) at different stations sampled during KRSE 2010; b) Water column of the north, coastal station 198 (KRSE 2011), including surface water samples from station 2 (KRSE 2010); c) Water column of the north, open-water station

197 (KRSE 2011), including surface water samples from station 5 (KRSE 2010); d) Water column of the south, coastal station 13 (KRSE 2011); and e) Water column of the south, open-water station 4 (KRSE 2011). The asterisk (*) indicates surface water samples from KRSE 2010.

84

CHAPTER TWO

Detailed pyrotag and metagenomic analyses indicate homogeneity of Prochlorococcus ecotypes in the water column of the Red Sea’s main basin.

Ahmed A Shibl1, Mohammed F. Haroon1, Luke R Thompson2, David K Ngugi1, and Ulrich Stingl1

1Red Sea Research Center, King Abdullah University for Science and Technology, Saudi Arabia 2Knight Lab, University of California, San Diego

The following chapter is the final version of the manuscript that was submitted to Frontiers in Marine Science; Aquatic Microbiology, currently under review

85 Abstract

The marine picocyanobacteria Prochlorococcus generally comprise a significant fraction of the global pelagic bacterioplankton community. In the Red

Sea, they account for around 91% of the phylum Cyanobacteria. Clone libraries of the intergenic spacer region (ITS) suggested the widespread presence of high- light adapted (HL II) ecotypes in the Red Sea, and the occurrence of low-light adapted (LL) ecotypes at intermediate depths in the water column. To obtain a more comprehensive dataset over a wider biogeographical scope, we analyzed

Prochlorococcus rpoC1 gene diversity (gamma subunit of DNA-directed RNA polymerase) from samples across various depths (up to 500 m) at 45 stations that spanned the Red Sea’s main basin from north to south, using a 454- pyrosequencing approach. In addition, we assessed the Prochlorococcus community structure in the northern, central and southern regions of the main basin by identifying and classifying rpoC1 sequences from 45 metagenomes, using a Hidden Markov Model (HMM) based on a set of reference

Prochlorococcus genomes. The Prochlorococcus community structure in our samples corroborated the dominance of HL II adapted cells in the surface waters.

Interestingly, we consistently found high abundances of LL adapted ecotypes co- occurring with HL ecotypes at 100 m depth. In deeper waters, the

Prochlorococcus community structure was dominated by HL II cells. Statistical analyses indicated that the vertical distribution in the water column is correlated with physicochemical gradients; temperature and oxygen seem to correlate with

HL II, while LL clades seem to be more influenced by Chl a, nutrients and salinity. Evolutionary relationships of the prevalent operational taxonomic units

86 (OTUs) indicate the presence of previously known and recently discovered ecotypes that belong to the HL and LL groups. The surge in microdiversity of

Prochlorococcus ecotypes is reflected by the occurrence of those OTUs in high abundance at 100 m depth. Our results suggest that the Prochlorococcus distribution in the main basin of the Red Sea is surprisingly homogeneous and provide evidence of a stable co-occurrence of ecotypic microdiversity at specific depths in the water column.

Keywords; Prochlorococcus, microbial ecology, Red Sea, rpoC1 gene, metagenomics, pyrosequencing, marine microbiology, marine science

87 Introduction

The Red Sea is a poorly explored oligotrophic marine environment with unique physicochemical properties, very different from the neighboring Gulf of

Aqaba (Klinker et al., 1978; Labiosa et al., 2003). The long, narrow basin is enclosed from the northern end by the Suez Canal and Gulf of Aqaba, and from the southern end by the Bab Al Mandab, reducing the water inflow to negligible levels during most of the year (Edwards 1987; Da Silva et al., 1994). Continuous exposure to solar irradiance cause surface waters to reach high temperatures

(23-32 °C) (Raitsos et al., 2013) and the water column to remain isothermal almost all year round (Edwards 1987). Combined with minimal precipitation in the

Arabian region and high evaporation rates (Morcos 1970), the main basin is rendered extremely saline (from 36-41 psu) (Edwards 1987). Remote sensing of

Chl a (a proxy for phytoplankton biomass), surface temperature and wind revealed that the succession of phytoplankton biomass followed a specific seasonality, and was attributed to the effects of mixing and stratification across meridional domains of the Red Sea (Brewin et al., 2013; Raitsos et al., 2013).

The southern Red Sea is the most productive region of the basin, as reflected by the peak in chlorophyll a (Chl a) levels during autumn (Raitsos et al., 2013). This is likely to due to the influx of colder water with higher nutrient concentrations from the Gulf of Aden through the strait of Bab Al-Mandab (Raitsos et al., 2013).

On a molecular level, bacterial (Ngugi et al., 2012) and picoeukaryotic (Acosta et al., 2013) diversity and community assemblages in the northeastern end of the

88 sea were shown to follow a temperature-salinity gradient across the Red Sea basin (Ngugi et al., 2011).

The marine picocyanobacteria Prochlorococcus (Chisholm et al., 1988), and its phylogenetic sister group Synechococcus, are the most abundant photosynthetic microorganisms typically occupying oceans between 45 °N and

45 °S (Olson et al., 1990; Johnson et al., 2006). Despite broad differences in their distribution across water columns and respective genetic diversity, both genera play important roles in the microbial food web and contribute to global primary productivity (Li 1994; Scanlan and West 2002; Jardillier et al., 2010).

Prochlorococcus naturally occurs in high cell numbers in the epipelagic zone of oligotrophic, tropical and subtropical waters where it receives sufficient light energy. Their widespread occurrence in a low nutrient environment is a result of a high surface area-to-volume ratio (Chisholm et al., 1992), an enhanced capacity to harvest sunlight (Partensky et al., 1993; Hess et al., 2001), and the ability to readily fix carbon at extremely low light intensities (Ting et al., 2002). In addition to those physiological advantages, Prochlorococcus is able to occupy the water column due to its high genetic microdiversity (Moore et al., 1998).

Subgroups within the Prochlorococcus genus have several different traits that render them well adapted to their respective (micro-) environments, giving them remarkable resilience within the dynamically complex oceans (Partensky et al.,

1999; Biller et al., 2015).

According to the distinct niches they dominate, Prochlorococcus is divided into high-light adapted (HL) and low-light adapted (LL) clades (Moore et al.,

1998; West and Scanlan 1999; Rocap et al., 2002). Cell adaptation to different

89 light optima arise from the variable ratios of unique divinyl derivatives of Chl b and a molecules that allow them to efficiently absorb the light irradiances they receive, even at much lower intensities (Moore et al., 1998; Ting et al., 2002).

The HL clade has a low divinyl Chl b/a ratio and typically occupies surface waters or surface mixed layers, while the LL clade grows at substantially lower light irradiances and dominates deeper waters (Goericke and Repeta 1993). These clades are further resolved, according to their intergenic transcribed spacer (ITS) region sequences, into several phylogenetically distinct HL (I, II, HNLCs, VI) and

LL (I-VII, NC1) ecotypes (Rocap et al., 2002; Martiny et al., 2009; Lavin et al.,

2010; Rusch et al., 2010; West et al., 2011; Huang et al., 2012). Recently,

Prochlorococcus ITS clone libraries suggested the predominance of HL II adapted cells throughout the water column of the Red Sea (Shibl et al., 2014), although the sampling in this study was limited to a small number of sites across the main basin. Because of their impact on global oceans and current climate change projections (Duarte, 2014), their ecotypic biogeographical distribution and seasonal variability has received a lot of attention (Partensky et al., 1999;

Schattenhofer et al., 2009; Flombaum et al., 2013). Although several novel lineages have been recently identified, representative strains have not been cultured yet. More isolates from each group will allow us to better investigate genomic adaptations that allow them to cope with changes in nutrient and light gradients of different environments (Scanlan et al., 2009).

Recently, through extensive single cell genomics, it was shown that fine- scale sequence diversity exists among coexisting subpopulations of

90 Prochlorococcus in the same drop of seawater (Kashtan et al., 2014). Culture- independent methods and oceanic metagenomes continue to shed light on previously unknown Prochlorococcus groups, which means the true diversity of

Prochlorococcus has not yet been fully elucidated (Rusch et al., 2010; Lavin et al., 2010, Huang et al., 2012). Much of the available information on their microdiversity and niche occupation come from, but is not limited to, 16S rRNA gene and ITS sequence analyses (Rocap et al., 2002; Brown and Fuhrman 2005;

Johnson et al., 2006; Zinser et al., 2006). High sequence similarity and/or length discrepancies have allowed for other less conserved protein-coding genes like psbB (encoding a binding antennae protein), cpeB (encoding β-phycoerythrin) and the nitrogen regulator ntcA (Urbach et al., 1998; Steglich et al., 2003; Penno et al., 2006) to be chosen as markers to analyze Prochlorococcus community structure. One example, rpoC1, the single copy gene encoding the gamma subunit of DNA-directed RNA polymerase, has been more frequently used as a genetic marker to consistently resolve picocyanobacterial communities in various studies (Palenik 1994; Toledo and Palenik 1997; Ferris and Palenik 1998; Ma

2004; Jameson et al., 2008, 2010; Gutierrez-Rodriguez et al., 2014). Phylogenies of rpoC1 gene fragments and comparison with restriction fragment length polymorphism-type (RFLP) sequences from contrasting light intensities in the

Atlantic Ocean led to the detection of putative novel lineages within the LL and

HL I clades (Jameson et al., 2008; 2010). Additional novel picocyanobacterial genotypes related to LL I were recently detected by rpoC1 clone libraries and qPCR assays on samples from the Costa Rica Dome. Spatial differences in picocyanobacterial populations and their taxonomic resolution was linked to the

91 strong interannual variability of the dome, a shallow oxycline and the water body’s environmental gradients (Gutierrez-Rodriguez et al., 2014). In this study, we report the detailed community structure of Prochlorococcus using shotgun metagenomics and PCR-based high throughput sequencing based on the rpoC1 marker gene of the water column. Our dataset represents a large-scale survey that spans various depths of the main basin. We investigate the correlation between Prochlorococcus microdiversity and the physicochemical gradients of the northern, central and southern regions of the Red Sea.

92

Materials and methods

Sample collection

Red Sea water-column samples for pyrotag sequencing and metagenomic shotgun sequencing were collected in September/October 2011 as part of the third R/V Aegaeo KAUST Red Sea Expedition (KRSE 2011). For pyrotag sequencing, sampling sites included twenty-two expanded transects from the coast to the open ocean across the main basin and encompassed six depths (10,

25, 50, 100, 200, 500 or the deepest possible depth at shallower stations). For metagenomic shotgun sequencing, sampling sites included 8 stations and encompassed the same six depths (Figure 1). We used a rosette sampler outfitted with 12 Niskin bottles (10 L/bottle) for water and a CTD unit (SBE

911plus, Sea-Bird Electronics, Bellevue, WA, USA) to capture salinity, temperature, fluorescence and oxygen readings (Table S1). Collected water samples were immediately filtered by serial filtration through a prefilter (mixed cellulose ester, 5.0 µm, Millipore) to remove suspended particles, followed by 1.2

µm and 0.1 µm filters (mixed cellulose ester, Millipore). Individual filters were immediately sealed inside sterile zip-lock bags and frozen at –20 °C onboard.

DNA extraction and rpoc1 pyrosequencing

Total gDNA was extracted from the 0.1-µm filters using a protocol modified from Rusch et al. (2007). In brief, the aqueous DNA extracts (after the chloroform–isoamyl–alcohol extraction step) were washed with 10 mM TE buffer

(10 mM Tris-HCl; 1 mM Na-EDTA, pH 8.0; Wright et al., 2009) and concentrated

93 using Amicon ultracentrifugation tubes (10 kDa, Millipore) as described by the manufacturer. The purity of the extracted DNA was assessed with 1% agarose gel electrophoresis and quantified using Quant-iT PicoGreen (Molecular Probes,

Invitrogen, Carlsbad, CA) as described by the manufacturer. The picocyanobacterial rpoC1 gene was amplified by PCR using primers that target

Prochlorococcus lineages; forward primer 5M_F (5’-

GARCARATHGTYTAYTTYA-‘3) and reverse primer SAC1039R (5’-

CYTGYTTNCCYTCDATDATRTC-‘3) (Tai and Palenik 2009; Gutierrez-Rodriguez et al., 2014). Each PCR mixture (15 µl) contained 7.5 µl of the 2X QIAGEN

Multiplex PCR Master Mix, 0.5 µl of each primer (10 µM) and 2 µl template DNA for samples >10 ng/µl or 4 µl template DNA for samples <10 ng/µl. The PCRs were run with the following conditions: initial denaturation at 94 °C for 15 min, followed by 35 cycles at 94 °C for 30 sec, 35 °C for 45 sec, 72 °C for 45 sec and a final extension at 72 °C for 10 min. A unique barcoded forward primer

(annealed to Lib-L) was added to each of the PCR purified products (2 µl) and another PCR was run with the same conditions except the annealing temperature was raised from 35 °C to 40 °C for 45 sec. Samples were then pooled at equimolar quantities to get 20 ng each, followed by DNA precipitation with isopropanol. Amplicons were purified with the QIAquick Gel Extraction Kit

(Qiagen, Hilden, Germany) and DNA concentrations were measured using the

Qubit BR instrument (Invitrogen, Inc., Carlsbad, CA, USA) as described by the manufacturer. Pyrosequencing was performed on a Roche 454 Genome

Sequencer FLX Titanium instrument (454 Life Sciences, Branford, CT, USA) according to the manufacturer’s protocol.

94 Pyrotag data processing

Resulting pyrotags were pre-processed by trimming primers/ adapters and denoised using the MOTHUR software package (Schloss et al., 2009). Chimeric sequences were detected and removed by Uchime (Edgar et al., 2011). A total of

32,437 optimized reads generated 3935 OTUs after removing singletons, with an average length of 436 nucleotides. For downstream analyses, we selected 113 samples that had at least 50 sequences retrieved from them. The community richness and evenness for each depth sampled was estimated using observed species, Chao1 (Chao, 1984), Shannon indices and rarefaction curves were generated using MOTHUR (Figure S1). First, we designed a reference database of rpoC1 sequences from all available Prochlorococcus genomes (Biller et al.,

2014; Kashtan et al., 2014) for downstream classification with GraftM v0.6.3

(github.com/geronimp/graftM), and phylogenetic analyses (Figure S2). The reference database was aligned and a pairwise distance matrix was generated in the Molecular Evolutionary Genetics Analysis software (MEGA v6.06; Tamura et al., 2013). The average nucleotide distance between all clades was calculated as

0.07 (0.02 at the amino acid level) and therefore a 93% similarity level was selected to reflect the information of different operational taxonomic units

(OTUs). MOTHUR (Schloss et al., 2009) was used for the taxonomic classification of the rpoC1 pyrosequencing reads at an 80% confidence level.

Phylogenetic analyses were carried out using the rpoC1 reference database alignment and the OTUs that were detected more than twice in at least 5% of the samples. Neighbor-joining and maximum-likelihood tree calculations were done in MEGA v6.06 (Tamura et al., 2013). Heat maps displaying Prochlorococcus

95 taxa and OTU abundance were generated using the R package (R Core Team

2014, www.R-project.org; Warnes et al., 2015).

Metagenomic analyses and data processing

Metagenomic shotgun libraries of the 45 samples were sequenced on the

Illumina platform (Illumina Inc., San Diego, CA, USA) according to the manufacturer’s protocol in the Biosciences Core Laboratory (BCL) at King

Abdullah University for Science and Technology (KAUST), as described in

Thompson et al., submitted. Raw Illumina metagenomic reads were trimmed using Trimmomatic (Bolger et al., 2014) and Phix contamination was removed using Bowtie 2 (Langmead and Salzberg 2012). Error correction was performed by SPAdes 3.5.0 (Nurk et al., 2013) and ribosomal contamination was removed by SortMeRNA (Kopylova et al., 2012).

The 45 metagenomes were analyzed for the relative abundance of

Prochlorococcus taxonomic groups using GraftM v0.6.3

(github.com/geronimp/graftM), employing a Hidden Markov Model (HMM) to identify and classify rpoC1 reads to the most granular phylogenetic level possible, using pplacer (Matsen et al., 2010). Firstly, we built custom-made reference packages specifically targeting the Prochlorococcus rpoC1 gene. A

GraftM reference package typically consists of an alignment file with taxonomic information and a phylogenetic tree. HMM profiles were generated from the alignment file of the publicly available Prochlorococcus rpoC1 genes and the maximum-likelihood tree was constructed using MEGA v6.06 (Tamura et al.,

96 2013) with a bootstrap value of 1,000 iterations. Ordination plots, PCA variation calculations, and boxplots were generated using the vegan package in the R environment (R Core Team 2014, www.R-project.org; Oksanen et al., 2015).

Nucleotide sequences and dataset accession numbers

The dataset generated by pyrosequencing for this study has been submitted to the NCBI Sequence Read Archive under the accession number:

SRP063380.

The OTU sequences reported in this study have been deposited in the

NCBI GenBank database under the following accession numbers: KT585635-

KT585672.

The metagenomic reads used in this study have been submitted to the

NCBI Sequence Read Archive under the following accession numbers:

SRR2102994-SRR2103038.

97 Results

Environmental conditions

Seawater samples were collected at six depths (10, 25, 50, 100, 200, and

500 m (or the deepest possible if depth < 500 m)) from 45 stations that spanned the main basin of the Red Sea. Among those 45, eight stations, representative of the southern, central and northern regions, were sampled for further metagenomic and nutrient analysis (Figure 1). For each of these eight stations, the same six depths were sampled, except for the shallow station number 12 where samples for three depths were retrieved, resulting in a total of 45 metagenomes to analyze. At these eight stations, temperatures ranged from 21.5 to 31.2 °C, and salinity ranged from 37 to 40 psu. Oxygen levels were 0.6 mg/L at the lowest, 4.6 mg/L at the highest and the surface layer of the water column was nutrient-poor relative to the deeper waters as indicated by the increase in overall nutrient concentrations with increasing depth (Table S1).

Relative abundance of Prochlorococcus in the Red Sea based on 16S rRNA genes

Based on the metagenomes presented in this study, the relative

Prochlorococcus 16S rRNA gene abundance within the Red Sea displayed a decrease with depth. Surface samples (10, 25, and 50 m) shared very similar relative abundances and displayed an average of 9.1% Prochlorococcus relative abundance, whereas at intermediate waters (i.e. 100 m) the abundance

98 decreased to 3%. The presence of Prochlorococcus cells in deeper samples at

200 and 500 m remained detectable but declined significantly to 0.5%. (Figure 2).

Figure 1 Map of the Red Sea’s main basin. Dark blue solid circles indicate stations at which the water column was sampled for pyrotag sequencing of the Prochlorococcus rpoC1 gene fragment. Yellow numbered stars represent the stations at which the water column was sampled for further metagenomic shotgun sequencing. All water samples were collected during KRSE2011 onboard the R/V Aegaeo . 99

Figure 2 Metagenomic 16S rRNA abundance. The boxpl ot displays the relative abundance (%) of

Prochlorococcus according to 16S rRNA gene fragment recruitment at different depths from the eight sequenced metagenomes. Center lines show the medians; box limits indicate the 25th and 75th percentiles as determin ed by R software; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles.

Prochlorococcus rpoC1 microdiversity and distribution According to Shannon, Chao1 and observed species (Sobs), the alpha

diversity metrics of the pyrotag dataset varied at different depths. Samples from

the surface (10, 25, 50 m) had the highest values for the Chao1 nonparametric

estimator of richness (3100, 3500, 3200, respectively) and Shannon evenness

(6.9, 6.9, 7, respectively). Among the deeper waters (100, 200, 500 m), the 100

m samples reported the highest richness (1600) and evenness (6.55) values.

100 The Chao1 richness estimator was higher for 200 m (1000) than 500 m (750) while the Shannon index of evenness was higher for the 200 m (4.9) compared to the 500 m (5.4). The observed species for surface samples ranged from 2125-

2500. For the 100 m samples, it was 1155, and deeper samples had the lowest observed species of 187 (for 200 m) and 250 (for 500 m) (Figure S1).

The relative abundance of Prochlorococcus clades based on the classification of the rpoC1 pyrotags is shown in a taxonomy heat map (Figure 3).

Using a reference database of available rpoC1 gene sequences for all cultured and sequenced strains, the classification tool in the MOTHUR software package

(Schloss et al., 2009) generated four groups; HL II, and three ‘unclassified’ groups; Prochlorococcus, HL, and LL. The heat map demonstrates the depth distribution of the Prochlorococcus community, from the southern to the northern region of the Red Sea (Figure 3). There is a clear homogeneous pattern across the whole basin of the Red Sea. OTUs belonging to HL II were present throughout the water column, but their relative abundance declined at 100 m while the other groups displayed an increase. This surge in ecotypic diversity, reflected by HL- and LL-affiliated OTUs, was followed by another dominance of

HL II in deeper waters (Figure 3).

101

Figure 3 Heat map based on taxonomic classification of rpoC1 pyrotags. The taxonomy heat map depicts the relative OTU abundance (%) of Prochlorococcus rpoC1 gene from 454-pyrotag data along the different depths of the water column. The samples in the heatmap are those with >50 sequences retrieved and are arranged by station, from south to north.

Phylogenetic analysis of abundant rpoC1 pyrotag OTUs Maximum likelihood and neighbor-joining trees were generated in the

Molecular Evolutionary Genetics Analysis software (MEGA v6.06; Tamura et al.,

2013) and included Prochlorococcus sequences from cultured isolates (Rocap et

al., 2003, Kettler et al., 2007), single cell amplified genomes (Malmstrom et al.,

2013, Kashtan et al., 2014), metagenomic assemblies (Rusch et al., 2010) and

new whole-genome assemblies (Biller et al., 2014) (Figure S2). A total of 38

abundant OTUs, their criteria being found more than twice in at least 5% of the

samples, were used to construct the maximum likelihood and neighbor-joining

trees with 1,000 bootstrap iterations. The majority of the OTUs (i.e. 34) clustered

with the HL II group and only four OTUs fell within the LL group. Within the HL II,

seven OTUs formed a cluster on their own. Within the LL group, one OTU

clustered with the LL II/ III clade and three others formed sequence clusters that

may represent some of the recently discovered lineages that do not have a

cultured representative yet. The number of sequences contained in each OTU

varied and is displayed on the tree next to the accession number (Figure 4). The

abundance of the four LL-affiliated OTUs was highest at almost all the 100 m

samples, which formed a cluster except for a few (Figure S3). Among the HL-

102 affiliated OTUs, OTU01 (KT585635) was the most abundant and was found in stations that spanned all the regions of the main basin. It is worth noting that this

OTU contained the highest number of sequences among the 38 OTUs. The homogeneity of the Prochlorococcus community in the Red Sea is also reflected by the distribution of the abundant OTUs across all the sampled stations. (Figure

S3).

Fragment recruitment from metagenomes

To circumvent the PCR-bias that may result from the amplification and pyrosequencing of the rpoC1 gene, Prochlorococcus populations in the metagenomes were analyzed. To show the relationship between various environmental parameters and the Prochlorococcus population distribution in the water column, we generated ordination plots based on the rpoC1 gene reads returned from the metagenomes. One plot encompasses all stations and depths

(Figure 5A) while the other concentrates on surface samples (10, 25, 50 m) to identify horizontal patterns (Figure 5B). There was no clear difference in the community structure of coastal and open-water samples in terms of their correlation with physicochemical properties. The samples clustered according to their depth, although the 100 m samples appear to be spread between surface and deeper samples (Figure 5A). Surface samples did not show a particular geographical pattern but seemed to be more influenced by temperature rather than nutrient levels (nitrate, phosphate and silicate). The HL and LL unclassified groups are positively correlated with nitrate, phosphate, silicate levels and Chl a, but negatively correlated with temperature and oxygen (Figure 5B).

103

Figure 4 (next page) Neighbor-joining tree of rpoC1 sequences. The phylogenetic tree was constructed using the p-distance model and 1,000 bootstrap replicates. The sequences included were obtained from ~50 Prochlorococcus and ten Synechococcus strains (CC9311, CC9902, CC9605, RCC307, RS9916, RS9917, WH5701, WH7803, WH7805, WH8102). The most abundant OTUs from this study are all shown on the tree except for “Otu31 (KT585662)” that fell within the collapsed LL II/ III group. Bootstrap values are represented on each node by symbols and the scale bar indicates the estimated sequence divergence. The outgroup used on the tree is the rpoC1 sequence of Synechococcus elongates strain PCC6301. The branches representing major clades of Prochlorococcus are collapsed (except for HL II) for easier visualization. The accession number of each OTU and the number of sequences contained in them is shown at the branch tip.

104

105

from which

Prochlorococcus

500 m) - ; 200 reads recruited from the

blue the community structure of rpoC1 and ; 100 m, from which the metagenomic samples

ge the community structure of oran and 50 m, lengths of the lines represent the strength of the

, and population based on the

25 physicochemical parameters of

(arranged from south to north)

Prochlorococcus physicochemical parameters of

lot Ordination plot

the different depths (green; 10, B) Ordination p

A)

from surface samples (10, 25, and 50 m) only. The

Symbols indicate the different stations igure 5 Principal component analysis (PCA) of the F metagenomes via GraftM. from all stations Prochlorococcus and depths. correlation. were obtained and colors indicate they were collected.

106 Discussion

A high diversity of Prochlorococcus has been reported for several oceanic regions with contrasting physicochemical properties (West and Scanlan 1999;

Ferris and Palenik 1998). Despite microbial community analyses conducted in the Gulf of Aqaba, (Lindell and Post 1995; Fuller et al., 2005), the main basin of the Red Sea remained understudied in terms of its microbiology. Ngugi et al.

(2011) analyzed the surface bacterial diversity of the northern and central Red

Sea using 16S rRNA sequences and reported the predominance of HL

Prochlorococcus and SAR11 OTUs. Here, we use pyrotag sequencing of the rpoC1 marker gene for its ability to resolve Prochlorococcus into its discrete HL- and LL-adapted lineages (Palenik 1994; Toledo and Palenik 1997; Ferris and

Palenik 1998). In addition, we apply shotgun metagenomic sequencing to bypass the caveats of bias introduced by PCR-based methods. Our samples were collected in the autumn season (September/ October) during which Chl a measurements peak in the extreme southern end of the Red Sea (Raitsos et al.,

2013). This occurs due to the inflow of cold, less saline and nutrient rich waters from the Indian Ocean, which can influence the bacterioplankton community in this region exclusively. The methods used in this study provide a robust, water column dataset that covers comparable regions in order to address the diversity and distribution of Prochlorococcus lineages in the entire Red Sea.

Based on fragment recruitment of the Prochlorococcus 16S rRNA gene in the metagenomes analyzed here, surface samples (10, 25, 50 m) displayed a

107 higher abundance of Prochlorococcus than deeper samples (100, 200, 500 m)

(Figure 2). In the northern Red Sea, taxonomic classification of bacterial 16S rRNA pyrotags from 10 m samples indicated an 18% relative abundance of

Prochlorococcus (Ngugi et al., 2011). Despite being generally more abundant in the photic zones of water masses (Partensky et al., 1999), the notable presence of Prochlorococcus has been reported in the aphotic zone of the Western Pacific

(Jiao et al., 2013). Similarly, the presence of Prochlorococcus and

Synechococcus was recently reported in all mesopelagic (i.e. ~ 600 m) metagenomic samples collected across the global oceans (Sunagawa et al.,

2015). However, this presence could be attributed to their attachment to sinking aggregates, providing the deep-sea with an energy source (Lochte and Turley

1988).

A heat map generated from all samples with ≥50 sequences shows a predominance of HL II sequences in the water column, along with a gradual increase of another less abundant HL (unclassified) group. Interestingly, we found Prochlorococcus HL and LL-unclassified sequences to be more abundant at 100 m than at any other depth in the water column. At this depth, HL II sequences become less abundant, but at 200 and 500 m they appear to dominate again (Figure 3). Jiao et al (2013) have shown the presence of

Prochlorococcus populations at such depths in the ocean. Based on phylogenetic analyses, the authors report the dominance of HL II affiliated cells in the aphotic zone of the Pacific, as well as the presence of HL VI affiliated cells at 100 m (Jiao et al., 2013). The presence of HL and LL groups at intermediate depths has

108 been reported before in several ocean bodies, possibly as a result of mixing

(West and Scanlan 1999; Penno et al., 2006; Garczarek et al., 2007; Malmstrom et al., 2010). The recently discovered HL VI clade in the South China Sea was postulated to have an intermediate light optimum and can therefore extend into the middle and lower euphotic zones (Huang et al., 2012). Moreover, the observed co-existence of ecotypes at 100 meters could be a result of their collective subpopulations that have distinct genomic backbones, which has been reported by Kashtan et al (2014) for HL-adapted ecotypes only. Within the LL- adapted clade, LL I and II/ III can occupy the middle euphotic zone due to their ability to tolerate light shock (Malmstrom et al., 2010). Other recently identified LL clades (LL V and VI) inhabit the oxygen minimum zones that can extend to the upper layers of the water column (Lavin et al., 2010). The presence of abundant

HL- and LL-affiliated OTUs at 100 m in all stations supports the hypothesis that this particular niche may display a stable co-occurrence of Prochlorococcus populations (Figure S3). This maybe because at 100 m the light intensities are not as high, whereas the temperature is still elevated, favoring the growth of both

LL cells and HL cells.

Correlation of Prochlorococcus distribution patterns with environmental parameters

Principal component analysis (PCA) generated ordination plots that visualize the relationship between environmental parameters and the

Prochlorococcus community structure in the water column (Figure 5). In the plot encompassing all stations and depths, PC1 explained 88.64% of the variation

109 (Figure 5A) while PC1 explained 92.8% of the variation in the surface samples only (10, 25, 50 m) (Figure 5B). The partitioning of samples from 200 and 500 m is suggestive of a strong correlation with high salinity and high levels of nitrate, phosphate and silicate (Figure 5A). The 100 m samples were correlated with chlorophyll, and the interesting taxonomic composition at this depth could be a feature of the chlorophyll maximum layer (DCM). Most of the surface samples

(10, 25 and 50 m) clustered together except for a few anomalies (Figure 5B). It is worth mentioning that these outcast points on the plot are all from southern stations, where the influx of a foreign water mass from the Indian Ocean may have affected the community (Churchill et al., 2015).

Despite having a temperature-salinity gradient across the Red Sea (Ngugi et al., 2011) and Chl a levels being drastically different at extreme ends (Raitsos et al., 2013), negligible difference was found in the community structure of

Prochlorococcus in the surface southern, central and northern regions. The slow influx of waters from the Gulf of Aqaba and the Strait of Bab Al Mandab may not suffice to cause a drastic change in the uniform pattern of Prochlorococcus distribution in the water column. Also, the sampling season in which we obtained this dataset did not display strong gradients and the seasonal effects may have an influence on the community structure. This homogeneity may also be attributed to the prevalence of the HL II ecotype in strongly stratified waters

(Bouman et al., 2011), allowing it to stably occupy the water column. It was observed that LL cells, having a larger genome and cell size, tend to favor highly mixed waters while some strains are more sensitive to light fluctuations than

110 others (Partensky et al., 1999; Zwirglmaier et al., 2008; Bouman et al., 2011).

Different pigment content in HL- and LL-adapted cells leads to different strains dominating at different depths (Campbell and Vaulot 1993; Moore et al., 1998) however, the depths up to which these members can reach is defined by the clarity of the water column (Partensky et al., 1999). In general, the LL clade distribution is less understood due to a lack of cultured representatives, but more potentially novel ecotypes are being termed ‘intermediate’ for their ability to occupy a broader sector of the water column and not necessarily deeper waters only. In the more stratified water column of the main basin Red Sea, this surge in diversity at 100 m could be a permanent phenomenon where distinct ecotypes of

Prochlorococcus are able to co-occur in one niche.

Phylogenetic affiliation of abundant rpoC1 OTUs

We retrieved rpoC1 sequences from all Prochlorococcus genomes available to-date to construct reference neighbor-joining and maximum likelihood trees. To confirm that the rpoC1 gene marker reflects the established genetic microdiversity of the genus, we compared the topology with trees constructed from the ITS sequences from the same genomes and other available ITS sequences (Figure S2). Phylogenetic analyses of the most abundant OTUs from the 454-pyrotag dataset revealed the presence of previously known clades and recently discovered ones. Unique OTUs that were detected twice in ≥ 5% of our samples were selected and 38 sequences were retrieved for phylogenetic inference (Figure 4). From those sequences, 34 clustered with the HL group, seven of which formed a separated branch within the HL II clade and were found

111 to be relatively abundant throughout the water column. One OTU in particular

(Otu15; KT585649) was more dominant at 500 m than the rest (Figure S3).

These OTUs may belong to the recently identified HL VI clade discovered in the

South China Sea (Huang et al., 2012). This novel group was detected in low abundance at the lower euphotic zone (50 - 150 m) based on 16S-23S rRNA ITS sequences but does not have any cultured representatives yet. Hence, we could not include a rpoC1 sequence for it in the tree used in our study. However, ITS clones affiliated with HL VI from the Red Sea were reported in low abundance at the 100 m depth (Shibl et al., 2014), and we suspect that the rpoC1 sequences retrieved in this study might belong to the same population.

The Prochlorococcus LL clade is underrepresented in terms of cultured isolates and sequence availability in databases, but more studies have unraveled putative novel lineages that occupy deeper regions of different oceans (Martiny et al.,

2009; Lavin et al., 2010; Huang et al., 2012). In our dataset, four OTUs (out of the 38) were affiliated with the LL group, one of which clustered within the LL II/

III clade and three of which formed monophyletic branches (Figure 4).

Interestingly, all of these LL-affiliated OTUs were most abundant at 100 m, emphasizing the source of diversity found at this depth. No rpoC1 gene reference sequences are available from the novel LL clades (Martiny et al., 2009; Lavin et al., 2010) and therefore no representatives could be included. However, finding

ITS clones from the Red Sea affiliated with the non-cultured group (NC1) from the Atlantic and Pacific suggests that they are closely related and could be from the same population (Shibl et al., 2014). It is worth noting that NC1 is believed to

112 be polyphyletic (Martiny et al., 2009), which could explain the several independent branches in our tree (Figure 4).

Conclusions

In this study, we expand the knowledge of the niche partitioning of HL and

LL adapted Prochlorococcus populations in the main basin of the oligotrophic

Red Sea. To elucidate the microdiversity within the genus, we applied 454 pyrotag sequencing using the marker gene rpoC1 and fragment recruitments of metagenomes from coastal and open water samples. Results indicated a high microdiversity of Prochlorococcus ecotypes at the 100 m depth in the water column and an unusual dominance of HL II-related sequences in deeper waters.

Phylogenetic analysis of abundant rpoC1 OTUs revealed genotypes that might belong to the recently identified ecotypes that belong to the high-light and low- light clades. Community profiling and quantitative analysis of rpoC1 gene abundance from 45 metagenomes revealed the clustering of samples according to depth and a strong correlation of ecotypic distribution with temperature and oxygen. Although the primary production and nutrient concentrations in the southern region are relatively higher than the central and northern regions of the

Red Sea, the diversity and distribution of Prochlorococcus ecotypes in the water column is homogenous. Future efforts in the isolation of strains from different ecotypes and metatranscriptomic analyses can provide explanations for their vertical distribution and abundance in photic and aphotic zones of the Red Sea.

113 Acknowledgement

We thank AbdulAziz Al-Suwailem, the scientists, the captain and the crew of the R/V Aegaeo for the KAUST Red Sea Expedition 2011, and the Coastal and Marine Resources Core Lab (CMRC) and the Biosciences Core Laboratory

(BCL) at King Abdullah University for Science and Technology (KAUST) for their logistical and technical support. This study was supported by the SABIC Career

Development Chair from KAUST to US.

Figure legends

Figure 1. Map of the Red Sea’s main basin. Dark blue solid circles indicate stations at which the water column was sampled for pyrotag sequencing of the

Prochlorococcus rpoC1 gene fragment. Yellow stars represent the stations at which the water column was sampled for further metagenomic shotgun sequencing. All water samples were collected during KRSE2011 onboard the

R/V Aegaeo.

Figure 2. Metagenomic 16S rRNA abundance. The boxplot displays the relative abundance (%) of Prochlorococcus according to 16S rRNA gene fragment recruitment at different depths from the eight sequenced metagenomes. Center lines show the medians; box limits indicate the 25th and 75th percentiles as determined by R software; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles.

Figure 3. Heat map based on taxonomic classification of rpoC1 pyrotags. The taxonomy heat map depicts the relative OTU abundance (%) of Prochlorococcus rpoC1 gene from 454-pyrotag data along the different depths of the water

114 column. The samples in the heatmap are those with >50 sequences retrieved and are arranged by station, from south to north.

Figure 4. Neighbor-joining tree of rpoC1 sequences. The phylogenetic tree was constructed using the p-distance model and 1,000 bootstrap replicates. The sequences included were obtained from ~50 Prochlorococcus and ten

Synechococcus strains (CC9311, CC9902, CC9605, RCC307, RS9916, RS9917,

WH5701, WH7803, WH7805, WH8102). The most abundant OTUs from this study are all shown on the tree except for “Otu31 (KT585662)” that fell within the collapsed LL II/ III group. Bootstrap values are represented on each node by symbols and the scale bar indicates the estimated sequence divergence. The outgroup used on the tree is the rpoC1 sequence of Synechococcus elongates strain PCC6301. The branches representing major clades of Prochlorococcus are collapsed (except for HL II) for easier visualization. The accession number of each OTU and the number of sequences contained in them is shown at the branch tip.

Figure 5. Principal component analysis (PCA) of the Prochlorococcus population based on the rpoC1 reads recruited from the metagenomes via GraftM. A)

Ordination plot of physicochemical parameters and the community structure of

Prochlorococcus from all stations and depths. B) Ordination plot of physicochemical parameters and the community structure of Prochlorococcus from surface samples (10, 25, and 50 m) only. The lengths of the lines represent the strength of the correlation. Symbols indicate the different stations (arranged from south to north) from which the metagenomic samples were obtained and

115 colors indicate the different depths (green; 10, 25, and 50 m, orange; 100 m, blue; 200-500 m) from which they were collected.

Supplementary material

Table S1. Overview of the physicochemical characteristics of stations sampled for metagenomic shotgun sequencing and analysis during the KAUST Red Sea

Expedition (KRSE) in 2011.

Figure S1. Alpha diversity indices of the six sampled depths from stations sampled for pyrotag sequencing. A) Community richness was estimated using observed species and Chao1 estimator. Community diversity was calculated using the Shannon index. Analyses were executed and plotted in the R environment (R Core Team 2014, www.R-project.org). B) Rarefaction curve based on the observed species. C) Rarefaction curve based on the Good’s coverage for the pyrotags. Both curves were generated using the MOTHUR software package (Schloss et al., 2009).

Figure S2. Phylogenetic trees of reference Prochlorococcus/ Synechococcus ITS and rpoC1 sequences, generated using MEGA (Tamura et al., 2013). A)

Neighbor-joining tree of 16S-23S rRNA ITS sequences constructed using p- distance and 1,000 bootstrap iterations. The sequences were retrieved from all lineages available to date that represent the major phylogenetic clades of

Prochlorococcus and ten Synechococcus strains (CC9311, CC9902, CC9605,

RCC307, RS9916, RS9917, WH5701, WH7803, WH7805, WH8102). The HL II branch contains ITS sequences obtained from the 96 single cells sequenced by

Kashtan et al (2014) and is collapsed for easier representation. The

Synechococcus branch is also collapsed for easier representation. Black solid

116 circles represent bootstrap values more than 80% and the scale bar indicates the estimated sequence divergence. Accession numbers for ITS sequences of strains that do not have a sequenced genome are shown in brackets. The

Synechococcus elongates strain PCC6301 was used as an outgroup. B)

Neighbor-joining tree of rpoC1 sequences constructed using p-distance and

1,000 bootstrap iterations. The sequences were retrieved from all lineages available to date that represent the major phylogenetic clades of

Prochlorococcus and ten Synechococcus strains (CC9311, CC9902, CC9605,

RCC307, RS9916, RS9917, WH5701, WH7803, WH7805, WH8102). The HL II branch contains rpoC1 sequences obtained from the 96 single cells sequenced by Kashtan et al (2014) and is collapsed for easier representation. The

Synechococcus branch is also collapsed for easier representation. Black solid circles represent bootstrap values more than 80% and the scale bar indicates the estimated sequence divergence. Accession numbers for rpoC1 sequences of strains that do not have a sequenced genome are shown in brackets. The

Synechococcus elongates strain PCC6301 was used as an outgroup.

Figure S3. Heat map of the abundance distribution for the 38 rpoC1 OTUs that were found more than two times in at least 5% of the samples. The hierarchical clustering of the sampled stations and depths are shown on the left side of the map. The different shades of grey encompass the OTUs that belong to the HL- or

LL-adapted clades. The bracket on the right side highlight the clustering of 100 m samples at which the LL-affiliated OTUs were found to be most abundant. The heatmap was generated in the R environment (R Core Team 2014, www.R- project.org).

117 REFERENCES

Acosta F., Ngugi D.K. & Stingl U. (2013) Diversity of picoeukaryotes at an oligotrophic site off the Northeastern Red Sea Coast. Aquat Biosyst 1: 9-16.

Biller, S. J., Berube, P. M., Berta-Thompson, J. W., Kelly, L., Roggensack, S. E., Awad, L., et al (2014). Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. Scientific data, 1:140034.

Biller, S. J., Berube, P. M., Lindell, D., & Chisholm, S. W. (2015). Prochlorococcus: the structure and function of collective diversity. Nature Reviews Microbiology, 13(1), 13-27.

Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, btu170.

Bouman, H. A., Ulloa, O., Barlow, R., Li, W. K., Platt, T., Zwirglmaier, K., Scanlan, D. J., & Sathyendranath, S. (2011). Water‐column stratification governs the community structure of subtropical marine picophytoplankton. Environmental microbiology reports, 3(4): 473-482.

Brewin, R. J., Raitsos, D. E., Pradhan, Y., & Hoteit, I. (2013). Comparison of chlorophyll in the Red Sea derived from MODIS-Aqua and in vivo fluorescence. Remote Sensing of Environment, 136, 218-224.

Brown M. & Fuhrman J. (2005) Marine bacterial microdiversity as revealed by internal transcribed spacer analysis. Aquat Microb Ecol 41: 15-23.

Campbell, L., & Vaulot, D. (1993). Photosynthetic picoplankton community structure in the subtropical North Pacific Ocean near Hawaii (station ALOHA). Deep Sea Research Part I: Oceanographic Research Papers, 40(10), 2043- 2060.

Chao, A. (1984). Nonparametric estimation of the number of classes in a population. Scandinavian Journal of statistics, 265-270.

Chisholm S.W., Olson R.J., Zettler E.R., Goericke R., Waterbury J.B. & Welschmeyer N.A. (1988) A novel free-living prochlorophyte abundant in the oceanic euphotic zone. Nature 334: 340-343.

118 Chisholm S.W., Frankel S.L., Goericke R., Olson R.J., Palenik B., Waterbury J.B., West-Johnsrud L. & Zettler E.R. (1992) Prochlorococcus marinus nov. gen. nov. sp.: an oxyphototrophic marine prokaryote containing divinyl chlorophyll a and b. Arch Microbiol 157: 297-300.

Churchill, J. H., Bower, A. S., McCorkle, D. C., & Abualnaja, Y. (2015). The transport of nutrient-rich Indian Ocean water through the Red Sea and into coastal reef systems. Journal of Marine Research, 72(3), 165-181.

Da Silva A., Young C. & Levitus S. (1994) Algorithms and Procedures. Vol. 1, Atlas of surface marine data 1994 NOAA Atlas NESDIS 6: 83.

Duarte, C. M. (2014). Global change and the future ocean: a grand challenge for marine sciences. Frontiers in Marine Science, 1, 63.

Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27: 2194–2200.

Edwards FJ (1987) Climate and oceanography: Red Sea, Pergamon Press, Oxford 45-69.

Ferris, M. J., & Palenik, B. (1998). Niche adaptation in ocean cyanobacteria. Nature, 396, 226-228.

Flombaum, P., Gallegos, J. L., Gordillo, R. A., Rincón, J., Zabala, L. L., Jiao, N., et al (2013). Present and future global distributions of the marine Cyanobacteria Prochlorococcus and Synechococcus. Proceedings of the National Academy of Sciences, 110(24), 9824-9829.

Fuller NJ, West NJ, Marie D, Yallop M, Rivlin T, Post AF & Scanlan DJ (2005) Dynamics of community structure and phosphate status of picocyanobacterial populations in the Gulf of Aqaba, Red Sea. Limnol Oceanogr 50: 363-375.

Garczarek, L., Dufresne, A., Rousvoal, S., West, N. J., Mazard, S., Marie, D., et al (2007). High vertical and low horizontal diversity of Prochlorococcus ecotypes in the Mediterranean Sea in summer. FEMS microbiology ecology, 60(2), 189- 206.

Goericke, R., & Repeta, D. J. (1993). Chlorophylls a and b and divinyl chlorophylls a and b in the open subtropical North Atlantic Ocean. Marine Ecology-Progress Series, 101, 307-307.

119

Gutiérrez-Rodríguez, A., Slack, G., Daniels, E. F., Selph, K. E., Palenik, B., & Landry, M. R. (2014). Fine spatial structure of genetically distinct picocyanobacterial populations across environmental gradients in the Costa Rica Dome. Limnology and Oceanography, 59(3), 705-723.

Hess WR, Rocap G, Ting CS, Larimer F, Stilwagen S, Lamerdin J & Chisholm SW (2001) The photosynthetic apparatus of Prochlorococcus: insights through comparative genomics. Photosynthesis Res 70: 53-71.

Huang S, Wilhelm SW, Harvey HR, Taylor K, Jiao N & Chen F (2012) Novel lineages of Prochlorococcus and Synechococcus in the global oceans. ISME J 6: 285-297.

Jameson E, Joint I, Mann NH & Mühling M (2008) Application of a novel rpoC1- RFLP approach reveals that marine Prochlorococcus populations in the Atlantic gyres are composed of greater microdiversity than previously described. Microb Ecol 55: 141-151.

Jameson E, Joint I, Mann NH & Mühling M (2010) Detailed analysis of the microdiversity of Prochlorococcus populations along a North‐South Atlantic Ocean transect. Environ Microbiol 12: 156-171.

Jardillier L, Zubkov MV, Pearman J & Scanlan DJ (2010) Significant CO2 fixation by small prymnesiophytes in the subtropical and tropical northeast Atlantic Ocean. The ISME journal 4: 1180-1192.

Jiao N, Luo T, Zhang R, Yan W, Lin Y, Johnson Z, Tian J, Yuan D, Yang Q & Sun J (2013) Presence of Prochlorococcus in the aphotic waters of the western Pacific Ocean. Biogeosciences Discussions 10: 9345-9371.

Johnson ZI, Zinser ER, Coe A, McNulty NP, Woodward EMS & Chisholm SW (2006) Niche partitioning among Prochlorococcus ecotypes along ocean-scale environmental gradients. Science 311: 1737-1740.

Kashtan, N., Roggensack, S. E., Rodrigue, S., Thompson, J. W., Biller, S. J., Coe, A., et al (2014). Single-cell genomics reveals hundreds of coexisting subpopulations in wild Prochlorococcus. Science, 344(6182), 416-420.

Kettler GC, Martiny AC, Huang K et al. (2007) Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet 3: e231.

120

Klinker, J., Reiss, Z., Kropach, C., Levanon, I., Harpaz, H., & Shapiro, Y. (1978). Nutrients and biomass distribution in the Gulf of Aqaba (Elat), Red Sea. Marine Biology, 45(1), 53-64.

Kopylova, E., Noé, L., & Touzet, H. (2012). SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics, 28(24), 3211-3217.

Labiosa, R. G., Arrigo, K. R., Genin, A., Monismith, S. G., & van Dijken, G. (2003). The interplay between upwelling and deep convective mixing in determining the seasonal phytoplankton dynamics in the Gulf of Aqaba: Evidence from SeaWiFS and MODIS. Limnology and oceanography, 48(6), 2355-2368.

Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), 357-359.

Lavin P, González B, Santibánez J, Scanlan DJ & Ulloa O (2010) Novel lineages of Prochlorococcus thrive within the oxygen minimum zone of the eastern tropical South Pacific. Environmental Microbiology Reports 2: 728-738.

Li WKW (1994) Primary productivity of prochlorophytes, cyanobacteria, and eucaryotic ultraphytoplankton: measurements from flow cytometric sorting. Limnol Oceanogr 39: 169–175.

Lindell D & Post AF (1995) Ultraphytoplankton succession is triggered by deep winter mixing in the Gulf of Aqaba (Eilat), Red Sea. Limnol Oceanogr 40: 1130- 1141.

Lochte, K., & Turley, C. M. (1988). Bacteria and cyanobacteria associated with phytodetritus in the deep sea. Nature 333: 67–69.

Ma, Y., Jiao, N. Z., & Zeng, Y. H. (2004). Natural community structure of cyanobacteria in the South China Sea as revealed by rpoC1 gene sequence analysis. Letters in applied microbiology, 39(4), 353-358.

Malmstrom, R. R., Coe, A., Kettler, G. C., Martiny, A. C., Frias-Lopez, J., Zinser, E. R., & Chisholm, S. W. (2010). Temporal dynamics of Prochlorococcus ecotypes in the Atlantic and Pacific oceans. The ISME journal, 4(10), 1252-1264.

121 Malmstrom, R. R., Rodrigue, S., Huang, K. H., Kelly, L., Kern, S. E., Thompson, A., et al (2013). Ecology of uncultured Prochlorococcus clades revealed through single-cell genomics and biogeographic analysis. The ISME journal, 7(1), 184- 198.

Martiny AC, Tai AP, Veneziano D, Primeau F & Chisholm SW (2009) Taxonomic resolution, ecotypes and the biogeography of Prochlorococcus. Environ Microbiol 11: 823-832.

Matsen, F. A., Kodner, R. B., & Armbrust, E. V. (2010). pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC bioinformatics, 11(1), 538.

Moore LR, Rocap G & Chisholm SW (1998) Physiology and molecular phylogeny of coexisting Prochlorococcus ecotypes. Nature 393: 464-467.

Morcos, S. A. (1970). Physical and chemical oceanography of the Red Sea. Oceanogr. Mar. Biol. Annu. Rev, 8(73), 202.

Ngugi DK, Antunes A, Brune A & Stingl U (2011) Biogeography of pelagic bacterioplankton across an antagonistic temperature–salinity gradient in the Red Sea. Mol Ecol 21: 388-405.

Ngugi DK & Stingl U (2012) Combined analyses of the ITS loci and the corresponding 16S rRNA genes reveal high micro- and macrodiversity of SAR11 populations in the Red Sea. PLoS One 7: e50274.

Nurk, S., Bankevich, A., Antipov, D., Gurevich, A. A., Korobeynikov, A., Lapidus, A., ... & Pevzner, P. A. (2013). Assembling single-cell genomes and mini- metagenomes from chimeric MDA products. Journal of Computational Biology, 20(10), 714-737.

Oksanen, J., Blanchet, F. G., Kindt, R., Legendre, P., Minchin, P. R., O'Hara, R. B., et al (2015). Package ‘vegan’. Community ecology package, version, 2-2.

Olson, R. J., Chisholm, S. W., Zettler, E. R., Altabet, M. A., & Dusenberry, J. A. (1990). Spatial and temporal distributions of prochlorophyte picoplankton in the North Atlantic Ocean. Deep Sea Research Part A. Oceanographic Research Papers, 37(6), 1033-1051.

122 Palenik, B. (1994). Cyanobacterial community structure as seen from RNA polymerase gene sequence analysis. Applied and Environmental Microbiology, 60(9), 3212-3219.

Partensky F, Hoepffner N, Li WK, Ulloa O & Vaulot D (1993) Photoacclimation of Prochlorococcus sp. (Prochlorophyta) strains isolated from the North Atlantic and the Mediterranean Sea. Plant Physiol 101: 285-296.

Partensky F, Hess WR & Vaulot D (1999) Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiol Mol Biol Rev 63: 106- 127.

Penno S, Lindell D & Post AF (2006) Diversity of Synechococcus and Prochlorococcus populations determined from DNA sequences of the N- regulatory gene ntcA. Environ Microbiol 8: 1200-1211.

R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2012.

Raitsos, D. E., Pradhan, Y., Brewin, R. J., Stenchikov, G., & Hoteit, I. (2013). Remote sensing the phytoplankton seasonal succession of the Red Sea. e64909.

Rocap G, Distel DL, Waterbury JB & Chisholm SW (2002) Resolution of Prochlorococcus and Synechococcus ecotypes by using 16S-23S ribosomal DNA internal transcribed spacer sequences. Appl Environ Microbiol 68: 1180- 1191.

Rocap, G., Larimer, F. W., Lamerdin, J., Malfatti, S., Chain, P., Ahlgren, N. A., et al (2003). Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature, 424(6952), 1042-1047.

Rusch DB, Halpern AL, Sutton G et al. (2007) The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol 5: e77.

Rusch DB, Martiny AC, Dupont CL, Halpern AL & Venter JC (2010) Characterization of Prochlorococcus clades from iron-depleted oceanic regions. Proc Natl Acad Sci USA 107: 16184-16189.

123 Scanlan, D. J., & West, N. J. (2002). Molecular ecology of the marine cyanobacterial genera Prochlorococcus and Synechococcus. FEMS microbiology ecology, 40(1), 1-12.

Scanlan DJ, Ostrowski M, Mazard S, Dufresne A, Garczarek L, Hess WR, Post AF, Hagemann M, Paulsen I & Partensky F (2009) Ecological genomics of marine picocyanobacteria. Microbiol Mol Biol Rev 73: 249-299.

Schattenhofer, M., Fuchs, B. M., Amann, R., Zubkov, M. V., Tarran, G. A., & Pernthaler, J. (2009). Latitudinal distribution of prokaryotic picoplankton populations in the Atlantic Ocean. Environmental microbiology, 11(8), 2078-2093.

Schloss PD, Westcott SL, Ryabin T et al. (2009) Introducing mothur: open- source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75: 7537-7541.

Shibl, A. A., Thompson, L. R., Ngugi, D. K., & Stingl, U. (2014). Distribution and diversity of Prochlorococcus ecotypes in the Red Sea. FEMS microbiology letters, 356(1), 118-126.

Steglich C., Post A.F. & Hess W.R. (2003) Analysis of natural populations of Prochlorococcus spp. in the northern Red Sea using phycoerythrin gene sequences. Environ Microbiol 5: 681-690.

Sunagawa, S., Coelho, L. P., Chaffron, S., Kultima, J. R., Labadie, K., Salazar, G., et al (2015). Structure and function of the global ocean microbiome. Science, 348(6237), 1261359.

Tai, V., & Palenik, B. (2009). Temporal variation of Synechococcus clades at a coastal Pacific Ocean monitoring site. The ISME journal, 3(8), 903-915.

Tamura, K., Stecher, G., Peterson, D., Filipski, A., & Kumar, S. (2013). MEGA6: molecular evolutionary genetics analysis version 6.0. Molecular biology and evolution, 30(12), 2725-2729.

Thompson, L. R., Williams, G. J., Haroon, M. F., Shibl, A., Larsen, P., Shorenstein, J., Knight, R., Stingl, U. (2015). A gene-centric view of microbial selection revealed in a 3D marine metagenomic map. Submitted

124 Ting, C. S., Rocap, G., King, J., & Chisholm, S. W. (2002). Cyanobacterial photosynthesis in the oceans: the origins and significance of divergent light- harvesting strategies. Trends in microbiology, 10(3), 134-142.

Toledo, G., & Palenik, B. (1997). Synechococcus diversity in the California current as seen by RNA polymerase (rpoC1) gene sequences of isolated strains. Applied and Environmental Microbiology, 63(11), 4298-4303.

Urbach, E., Scanlan, D. J., Distel, D. L., Waterbury, J. B., & Chisholm, S. W. (1998). Rapid diversification of marine picophytoplankton with dissimilar light- harvesting structures inferred from sequences of Prochlorococcus and Synechococcus (Cyanobacteria). Journal of molecular evolution, 46(2), 188-201.

Warnes, G. R., Bolker, B., Bonebakker, L., Gentleman, R., Huber, W., Liaw, A., et al (2014). gplots: Various R programming tools for plotting data. 2013. R package version, 2(1).

West N.J. & Scanlan D.J. (1999) Niche-partitioning of Prochlorococcus populations in a stratified water column in the eastern North Atlantic Ocean. Appl Environ Microbiol 65: 2585-2591.

West N.J., Lebaron P., Strutton P.G. & Suzuki M.T .(2011) A novel clade of Prochlorococcus found in high nutrient low chlorophyll waters in the South and Equatorial Pacific Ocean. ISME J 5: 933-944.

Wright J.J., Lee S., Zaikova E, Walsh D.A. & Hallam S.J. (2009) DNA extraction from 0.22 µM Sterivex filters and cesium chloride density gradient centrifugation. Journal of Visualized Experiments: JoVE.

Zinser, E. R., Coe, A., Johnson, Z. I., Martiny, A. C., Fuller, N. J., Scanlan, D. J., & Chisholm, S. W. (2006). Prochlorococcus ecotype abundances in the North Atlantic Ocean as revealed by an improved quantitative PCR method. Applied and environmental microbiology, 72(1), 723-732.

Zwirglmaier K., Jardillier L., Ostrowski M., Mazard S., Garczarek L., Vaulot D., Not F., Massana R., Ulloa O. & Scanlan D.J. (2008) Global phylogeography of marine Synechococcus and Prochlorococcus reveals a distinct partitioning of lineages among oceanic biomes. Environ Microbiol 10: 147-161.

Conflict of interest statement: The authors declare no conflict of interest.

125 Supplementary tables and figures

Table S1 Overview of the physicochemical characteristics of stations sampled for metagenomic shotgun sequencing and analysis during the KAUST Red Sea Expedition (KRSE) in 2011.

126

A B

C

Figure S1 Alpha diversity indices of the six sampled depths from stations sampled for pyrotag sequencing. A) Community richness was estimated using observed species and Chao1 estimator. Community diversity was calculated using the Shannon index. Analyses were executed and plotted in the R environment (R Core Team 2014, www.R-project.org). B) Rarefaction curve based on the observed species. C) Rarefaction curve based on the Good’s coverage for the pyrotags. Both curves were generated using the MOTHUR software package (Schloss et al., 2009).

A B

127

Figure S2 Phylogenetic trees of reference Prochlorococcus/ Synechococcus ITS and rpoC1 sequences, generated using MEGA (Tamura et al., 2013). A) Neighbor-joining tree of 16S-23S rRNA ITS sequences constructed using p- distance and 1,000 bootstrap iterations. The sequences were retrieved from all lineages available to date that represent the major phylogenetic clades of Prochlorococcus and ten Synechococcus strains (CC9311, CC9902, CC9605, RCC307, RS9916, RS9917, WH5701, WH7803, WH7805, WH8102). The HL II branch contains ITS sequences obtained from the 96 single cells sequenced by Kashtan et al (2014) and is collapsed for easier representation. The Synechococcus branch is also collapsed for easier representation. Black solid circles represent bootstrap values more than 80% and the scale bar indicates the estimated sequence divergence. Accession numbers for ITS sequences of strains that do not have a sequenced genome are shown in brackets. The Synechococcus elongates strain PCC6301 was used as an outgroup. B) Neighbor-joining tree of rpoC1 sequences constructed using p-distance and 1,000 bootstrap iterations. The sequences were retrieved from all lineages available to date that represent the major phylogenetic clades of Prochlorococcus and ten Synechococcus strains (CC9311, CC9902, CC9605, RCC307, RS9916, RS9917, WH5701, WH7803, WH7805, WH8102). The HL II branch contains rpoC1 sequences obtained from the 96 single cells sequenced by Kashtan et al (2014) and is collapsed for easier representation. The Synechococcus branch is also collapsed for easier representation. Black solid circles represent bootstrap values more than 80% and the scale bar indicates the estimated sequence divergence. Accession numbers for rpoC1 sequences of strains that do not have a sequenced genome are shown in brackets. The Synechococcus elongates strain PCC6301 was used as an outgroup.

128

Figure S3 Heat map of the abundance distribution for the 38 rpoC1 OTUs that were found more than two times in at least 5% of the samples. The hierarchical clustering of the sampled stations and depths are shown on the left side of the map. The different shades of grey encompass the OTUs that belong to the HL- or LL-adapted clades. The bracket on the right side highlight the clustering of 100 m samples at which the LL-affiliated OTUs were found to be most abundant. The heatmap was generated in the R environment (R Core Team 2014, www.R-project.org).

129

CHAPTER THREE

Isolation and genomic description of a Prochlorococcus strain isolated from the main basin of the Red Sea

Ahmed A Shibl1, Agathe Talaramin1, Luke R Thompson2, David K Ngugi1, Mohammed F. Haroon1 and Ulrich Stingl1

1Red Sea Research Center, King Abdullah University for Science and Technology, Saudi Arabia 2Knight Lab, University of California, San Diego

The following chapter is currently formatted into a manuscript to be sent to Environmental Microbiology or FEMS Microbiology Ecology.

130

Abstract

The Red Sea is a low-nutrient, high salinity water mass that receives minimal water influx from neighboring oceans. The high sea surface temperature of the

Red Sea and the high stratification of its water column resemble the effects of global warming on marine ecosystems. Inevitably, the microbial communities within regain adaptive traits that allow for their survival in such harsh conditions.

To better describe and understand the structure and function of these microorganisms, culturing efforts are necessary. Prochlorococcus is an important primary producer that dominates the water column in the main basin of the Red

Sea. Here, we report the isolation and genome sequencing of a Prochlorococcus strain that belongs to the HL II clade. In an environment as warm and saline year-round as the Red Sea, we ask what is the reason of their ecological success? We compare their genomic content with representative strains and identify unique genes potentially involved in salt tolerance and compatible solute biosynthesis. The particularities of the Red Sea specific strain are further investigated by exploring the expression pattern of these unique genes at different depths and across a diel cycle. Mapping the draft genome to an open- water metatranscriptome dataset showed that the unique genes are in fact independent of depth and light, and are probably an essential feature of the cell regardless of the niche and/or time of day. This study expands our knowledge on the selective adaptations that a critical microbe such as Prochlorococcus can go through in order to cope with climate change and global warming.

131 Introduction

The Red Sea is a peculiar environment characterized by an overall high salinity (from 36-41) (Edwards 1987; Alraddadi 2013) and elevated sea surface temperatures in the range of 23-32 °C (Raitsos et al., 2013). High dust depositions (17-420 µg m-3) occur seasonally (Chen et al., 2008) and a slow influx from the Indian Ocean creates a gradient of nutrient-rich waters in the southern end (Acker et al., 2008; Churchill et al., 2014). In addition, the oligotrophic nature of the main basin of the Red Sea renders it a relatively harsh marine ecosystem. Microbial community profiles from the northern part of the

Red Sea shows the dominance of Proteobacteria (SAR11) and Cyanobacteria in the surface waters, with Prochlorococcus abundance exceeding Synechococcus.

Moreover, the composition of the abundant phylotypes changes across the surface temperature-salinity gradient (Ngugi et al., 2011; 2012). The high-light adapted (HL II) ecotype of Prochlorococcus numerically dominates the Red Sea

(Ngugi et al., 2011; Shibl et al., 2014) and its population structure is highly homogeneous across open-ocean and coastal sites (Shibl et al., 2014). The

Prochlorococcus microdiversity is consistently high at 100 m across the whole basin despite contrasting physicochemical parameters and inflow of nutrient-rich waters at the southern end (Shibl et al., 2014; 2015, submitted, see chapter two).

Constituted of mainly Prochlorococcus and Synechococcus, picocyanobacteria are key players in the surface marine microbial systems, generating a global carbon biomass of 40-120 Tg (Garcia-Pichel et al., 2003).

Cells of the two genera are predicted to support up to 25% of the net marine primary productivity (Flombaum et al., 2013) with their large oceanic distribution

132 (Scanlan 2003; Zwirglmaier et al., 2008). They often occupy overlapping niches in the water column through their distinct ecotypes reflected by their phylogenetic microdiversity (Olson et al., 1990; Lindell and Post 1995; Rocap et al., 2002;

Scanlan and West 2002; Johnson et al., 2006). Prochlorococcus cells developed unique light harvesting capabilities and pigments, delineating into ecotypes that are adapted to high-light (HL) or low-light (LL) levels (Rocap et al., 2002; Martiny et al., 2009; Lavin et al., 2010; Rusch et al., 2010; West et al., 2011; Huang et al., 2012). Prochlorococcus cells are differentiated from Synechococcus and based on their cell size and pigmentation by using flow cytometry.

Prochlorococcus cells are smaller and have a more streamlined genome, ranging from 1.64 to 2.68 Mb (Rocap et al., 2002; Dufresne et al., 2003; Kettler et al.,

2007; Scanlan et al., 2009). Those attributes contribute to their ecological success across a wider range of the water column, but render them more difficult to bring into culture than Synechococcus (Swan et al., 2013; Giovannoni et al.,

2014; Biller et al., 2014).

Several approaches (e.g. single-cells, metagenomic assembly, pure cultures) are used to obtain more Prochlorococcus genomes to further investigate their genomic variation across contrasting environments (Kashtan et al., 2014; Biller et al., 2014; 2015). A significant degree of variability in gene composition is observed between HL and LL-adapted ecotypes (Kettler et al.,

2007; Partensky and Garczarek 2010). Indeed, genome streamlining and preferentially retained gene sets have allowed distinct lineages to occupy particular niches (Scanlan et al., 2009). Despite the presence of many hypotheticals, metagenomics and physiological experiments highlight genes

133 potentially involved in niche adaptation (Paul et al., 2010), nutrient assimilation

(Martiny et al., 2009; Berube et al., 2014) and Prochlorococcus- interactions (Morris et al., 2008; 2012; Sher et al., 2011). A more accurate assessment of gene expression profiles across spatial and temporal gradients is obtained by metatranscriptomic analyses (Shi et al., 2011; Stewart et al., 2012;

Ottesen et al., 2014; Dupont et al., 2015). Distinct Prochlorococcus ecotypes displayed differential transcriptomic responses to high irradiance (Mary 2004;

Mella-Flores et al., 2012), salt stress (Al-Hosani et al., 2015) and nutrient availability (Tolonen et al., 2006; Thompson et al., 2011; Reistetter et al., 2013).

The variation in genetic function and cellular metabolism across a diel cycle is indicative of Prochlorococcus adaptations that control their geographic distribution (Waldbauer et al., 2012; Ottesen et al., 2014; Dupont et al., 2015).

Here, we describe the isolation of a Red Sea-specific Prochlorococcus HL II strain (referred to as RSP50) dominating the main basin. We compare and report its genomic content and discuss possible adaptation mechanisms by investigating unique gene expression patterns across several depths in the Red

Sea water column.

134 Materials and Methods

Cultivation

Collection Water samples from 10 meters depth were collected for cultivation experiments from the central Red Sea at the Atlantis II site (22.18 °N, 38.49 °W) during the time-series cruise STARS 001 (Saudi Time-series Analysis of the Red

Sea) in December 2014. A rosette of 24 10-L Niskin bottles equipped with a CTD

(conductivity-temperature-density) device was used to collect water and record temperature, salinity and fluorescence measurements. In situ conditions of the seawater for isolation were 28.4 °C and a salinity of 39.5. Two dark polyethylene bottles (150 ml) were filled and kept in the shade on the ship.

Isolation Within 24 hours from collection, 200 ml of the seawater was filtered through superimposed 0.6-µm polycarbonate membranes. The filtrate was used as inoculate (20 ml) into 5 ml of PRO2 medium (Moore and Chisholm 1999;

Moore et al., 2007) in acid-washed glass tubes. After one week of growth, monitored by measuring chlorophyll fluorescence using a Turner-10 AU fluorometer (Turner Designs, CA, USA), a dilution to extinction approach in 96- well Teflon plates was used with ProMM medium (Stingl et al., 2007; Moore et al., 2007; Berube et al., 2015). Wells were monitored by eye until a color was visible. The whole volume of each colored well was then transferred into 100% seawater-based Pro99 medium (Moore et al. 2007) and maintained in acid- washed 30 ml glass culturing tubes. After two months of transfers into fresh medium, the cultures were finally maintained in 75 % seawater-based Pro99 medium and transferred weekly.

135 Growth conditions Cultures were incubated at 27 °C under continuous cool white fluorescent light at 50 µE m-2 s-1 in a Multitron Standard incubator (INFORS

HT, Switzerland). Light intensity was measured using the Quantum Scalar

Laboratory sensor (QSL-2100; Biospherical Instruments Inc., CA, USA). Larger volumes of the culture were prepared in clean 200 ml polycarbonate flasks to collect enough biomass for DNA extraction and a sufficient amount of cells for microscopic observations.

Growth measurements

For 14 consecutive days including the day of transfer, three biological replicates of the strain were gently vortexed (650 rpm) for five seconds and subsampled

(193 µl) into 2 ml cryotubes containing 6 µL of a filtered 25 % glutaraldehyde solution. The cryotubes were vortexed and incubated at 4 °C for 30 minutes and then flash frozen in liquid nitrogen and stored at -80 °C until processing.

After thawing, samples were transferred into 96 well plates, each well containing

1 µl of a diluted solution of 1 µm yellow green beads (Fluospheres, Life

Technologies, CA, USA) used as standard for fluorescence and scatter. Events were enumerated using a LSR Fortessa II analyzer (Becton Dickinson, CA,

USA), at a fixed average flow rate of 1 µl s-1. The syringe system injected 100 µl of sample for rinsing prior to each acquisition of 40 to 50 µl, depending on expected concentrations. The strain was identified according to the fluorescence of its natural pigments after excitation by a solid-state blue (488 nm) and a solid state red (640 nm) laser. Prochlorococcus were distinguished using two filters

(670/30 nm and 695/40 nm) collecting signals from chlorophyll. Flow cytometric analysis of the samples was done using the FCS Express 5 software (De Novo

136 software, CA, USA). Growth rates during the exponential phase were estimated based on Wood et al (2005).

Transmission electron microscopy

Cells in exponential growth phase were pelleted from 40 mL of cultures. Pellets were fixed with a solution of glutaraldehyde (2.5 % final concentration) in cacodylate buffer (0.1 mol L-1, pH 7.4) and treated with reduced osmium (1:1 mixture of 2% aqueous potassium ferrocyanide) for one hour, as described in

Karnovsky (1971). Pre-embedded in agar, the cells were then dehydrated using an ethanol series (70, 80, 90, 95 and 100 %). Blocks of 2 to 3 mm were infiltrated and embedded in epoxy resin. Sections 80 – 120 nm thick were placed onto copper grids and contrasted with lead citrate. Imaging was performed using a

Titan transmission electron microscope (80 – 300 kV, FEI, USA) operating at 200 kV. Images were recorded on a 2K x 2K couple-charged device camera (Gatan

Inc., USA).

Cell size measurements were obtained using the DigitalMicrograph software

(Gatan Microscopy Suite) and the ImageJ software (www.imagej.nih.gov), namely the longest segment of a cell (L) and its perpendicular segment or width

(W). The most common formula for cell volume used in the literature assumes a spherical shape of Prochlorococcus cells (Dufresne et al., 2003). Here, biovolumes were estimated in 3 different ways, using length (L), width (W), and the lowest of all measured W values (Wmin) in order to compare with previously published data and detect any bias in the biovolume estimations:

Sphere � = ! π ×(! )3 ! !

Ellipsoid � = ! π ×L×W2 !

137 ! Smallest ellipsoid � = π ×L×W×Wmin !

DNA extraction, library preparation and sequencing

DNA extraction DNA was extracted from pellets obtained via centrifugation of 15 mL of four replicated cultures using the DNeasy Kit (Qiagen), following the manufacturer’s protocol with a recommended pre-treatment for Gram-negative bacteria. Prior to library preparation, 16S rRNA gene and the intergenic transcribed spacer (ITS) gene region were amplified from the extracted DNA using the 16S rRNA universal primers 27F (5'-AGAGTTTGATCMTGGCTCAG-3’) and 1492R (5'-TACGGYTACCTTGTTACGACTT-3') genes and the ITS primers

Picocya16S-F (5’-TGGATCACCTCCTAACAGGG-3’) and Picocya23-R (5’-

CCTTCATCGCCTCTGTGTGCC-3’) (Cai et al., 2010; Huang et al., 2011; Jiao et al., 2014). The amplicons were Sanger sequenced in the Bioscience Core

Laboratory (BCL) facility at KAUST to confirm their identity.

Sequencing library preparation The extracted genomic DNA was sheared into

500 bp fragments according to the M220 focused-ultrasonicator protocol

(Covaris, MA, USA). Sequencing libraries were prepared using the TruSeq DNA sample preparation kits (Illumina, USA) and indexed with TruSeq adapters.

Quality control of the libraries were conducted using a Qubit assay (dsDNA kit,

Qubit, USA), qPCR and a DNA 2100 Bioanalyzer (Agilent Technologies Inc.,

USA). The libraries were sequenced on the MiSeq platform in the BCL facility at

KAUST.

Whole genome assembly and annotation

Raw reads were quality-checked using FastQC (Patel and Jain 2012) and quality-trimmed using Trimmomatic (Bolger et al., 2014) before being assembled

138 using SPAdes v3.1.1 with the modes for read error-correction and repair for misassemblies switched on (Bankevich et al., 2012). Genome completeness and fidelity were estimated by determining the presence (or absence) of a suite of

104 single-copy marker genes universally present in complete bacterial genomes as implemented in CheckM v0.9.7 (Parks et al., 2015). The high-quality contiguous contigs were subsequently annotated using the INDIGO pipeline

(Alam et al., 2013) as previously described (Jimenez Infante et al., 2014; Ngugi et al., 2015). Unique gene annotations were manually curated using the Argot2 server (Falda et al., 2012). To obtain average amino acid identity (AAI) values of orthologous proteins and for downstream phylogenetic inference, the genomes of all available Prochlorococcus strains were used (Biller et al., 2014). AAI values and shared orthologous genes were generated using the get_homologues software package (Contreras-Moreira and Vinuesa 2013) that uses bidirectional best-hit and OrthoMCL v5 (Camacho et al., 2009; Li et al., 2003) clustering algorithms and the CompareM toolbox

(https://github.com/dparks1134/CompareM) that uses the DIAMOND software

(Buchfink et al., 2015).

Phylogeny

16S rRNA gene sequences of the draft genome were aligned with the 16S rRNA sequences of the representative Prochlorococcus strains (Table S1) using

GeneiousPro V.8 (Kearse et al., 2012). Neighbor-joining trees were generated in

MEGA v6 (Tamura et al., 2013) using p-distance and 1,000 bootstraps while maximum likelihood trees were inferred using the GTR-GAMMA model of nucleotide substitution as estimated by the modelgenerator algorithm in EPoS

139 (Griebel et al., 2008) and 1,000 bootstraps. For a higher resolution of the ecotypes within Prochlorococcus, the same approach was followed using the full length ITS sequences from the genomes.

To construct a phylogenomic tree, 239 conserved single-copy genes (CSCGs) among all Prochlorococcus genomes were identified using Hal (Robbertse et al.,

2011). Neighbor-joining and maximum likelihood trees were constructed in

MEGA v6 (Tamura et al., 2013) using p-distance and the JTT (Jones-Taylor-

Thornton) model of amino acid substitution as estimated in ProtTest3.2 (Darriba et al., 2011) and 1,000 bootstraps.

RNA extraction, library preparation and sequencing

Sampling and extraction of RNA Seawater samples were collected in

March 2013 from the water column of an open-ocean site in the Red Sea (Kebrit

Deep; Latitude: 24.7233/ Longitude: 36.2767) during the KAUST Red Sea expedition (KRSE2013). Samples (x2 replicates) were collected every 4 hours over a 48-hour period at four depths: surface (10 m), below the mixed layer

(BML; 40 m; mixed layer ended at 35 m), chlorophyll maximum (DCM; 75 m), and oxygen minimum zone (OMZ; 420 m). For each timepoint and depth, 1 L of seawater was filtered using a peristaltic pump with two in-line filters in series: a

1.6 µm GF/A pre-filter (Whatman), followed by a 0.22 µm Sterivex filter

(Millipore). RNAlater (QIAGEN) was added immediately to fill the dead space of the filter. Samples were flash frozen in liquid nitrogen and stored at -20 °C onboard. RNA extraction was performed according to a protocol adapted from

Massana (1997), Beja (2002) and Stewart (2012). The RNAlater solution was expelled from the Sterivex filter and 2 ml of a lysozyme solution (1 mg/ml in lysis

140 buffer: 40 mM EDTA, 50 mM Tris pH 8.3, 0.73 M sucrose) was added to it. The filter was left to incubate at 37 °C for 45 minutes with rotation. Then 50 µl

Proteinase K solution (20 mg/ml; from 5PRIME) and 100 µl 20% SDS solution were added and the filter incubated at 55 °C for 2 hours with rotation. After incubation, the lysate was expelled to separate tubes and 1 ml of lysis buffer was added to the filter at 55 °C for 15 min for washing. The two lysates were pooled and 1.5 ml absolute ethanol was added. From this solution, RNA was extracted using the RNeasy Protect Bacteria Mini Kit (QIAGEN) according to the manufacturer’s protocol. Extracted RNA was then eluted with two volumes of

150 µl RNase-free water. The RNA samples were concentrated using a speed vacuum, from 250 µl to 60 µl. To remove DNA, the samples were treated with

DNaseI (Turbo DNA-free kit, Ambion) by adding 6 µl of 10x buffer and left to incubate at 37 °C for 30 min. This solution was purified using the RNeasy

MinElute Cleanup Kit (QIAGEN) and eluted with 14 µl RNase-free water.

RNA library preparation and sequencing The total RNA (range of 2-4 ng) was amplified using the ExpressArt C&E Bacterial NANO RNA Amplification Kit

(AMSBIO, UK), which preferentially amplifies mRNA (independent of poly-A tail) and selects against rRNA. Sequencing libraries were prepared using the paired- end TruSeq RNA sample preparation kits (Illumina, USA) following the manufacturer’s protocol. Libraries were sequenced on the HiSeq 2000 platform

(2 x 100 bp) and FASTQ files were generated using bcl2fastq v1.8.4 software in the BCL facility at KAUST.

Data-QC processing and genome mapping Low quality reads and sequencing adapters were removed using Trimmomatic v.0.32 (Bolger et al.,

141 2014). Sequence reads with <50 base pairs were discarded. ERNE-FILTER v.1.4.2 (Del Fabbro et al., 2013) and Bowtie2 v.2.2.4 (Langmead and Salzberg

2012) were used to identify PhiX contamination sequences and then remove them. Reads were error corrected using SPAdes (Bankevich et al., 2012).

Ribosomal RNA (rRNA) was removed using SortMeRNA v.2.0 (Kopylova et al.,

2012). The RNA sequence data have been deposited in NCBI GenBank with

Bioproject number PRJNA289956. Clean reads were mapped to the Red Sea

Prochlorococcus (RSP50) genome using Bowtie2 v.2.2.4 (Langmead and

Salzberg 2012) and gene expression were based on FPKM (fragments per kilobase of exon per million fragments mapped) calculated using an in-house script.

142

Results

Identification and growth of Prochlorococcus RSP50

Seawater used in culturing experiments was collected in acid-clean bottles and

protected from high irradiance. In this study, we started with enrichments of

Prochlorococcus from the open-ocean of the Red Sea using filtration techniques

and natural seawater-based media (Moore et al., 2002; 2007). During the series

of filtrations and dilution-to-extinction, the growth of cells were monitored by

fluorescence measurements and flow cytometry. Weekly transfers and incubation

at 27 °C and 50 µE m-2 s-1 continuous light resulted in a highly pure green-

yellowish culture (Figure 1, Figure 2) selected for further analyses. The level of

purity was verified by microscopy, flow cytometry using DAPI staining, and

growth in MPTB (Marine Purity Test Broth) media in the dark.

Figure 1 Visual characterizations of the Red Sea Prochlorococcus isolate RSP50. Left panel: visible cell growth in liquid media. Right panel: Cytogram showing distinct red fluorescence signals of Prochlorococcus RSP50 relative to Synechococcus. To calculate growth rates for replicates of the Red Sea strain, daily measurements of cell counts were taken for 14 days and the exponential growth phase seems to start at day six. RSP50 was characterized by a growth rate of 0.57 ± 0.01 (n = 3), i.e. a doubling time of 1.22 days. The replicates were grown in 20 ml of 75% Seawater-Pro99 media and the maximum abundance reached over 5 × 107 cells per ml of culture.

5.0E+05'5 x 105 143 Growth'rate:'0.57'±'0.01' Doubling'Ome'(days):'1.22'±'0.02' '' )' >1

'(cells'ml 5.0E+04'5 x 104

RSP50>A'

Abundance_log RSP50>B' RSP50>C'

5.0E+03'5 x 103 1' 2' 3' 4' 5' 6' 7' 8' 9' 10' 11' 12' 13' 14' Days'

Figure 2 Growth curves of the Red Sea Prochlorococcus isolate RSP50 based on cytometric counts. A, B and C are replicates. The growth rate and doubling time are shown in the top left of the graph. The exponential growth phase is observed between days six to nine.

TEM and morphology

Due to their small size, the cultured cells were concentrated by centrifugation to facilitate their preparation for transmission electron microscopy (Figure 3).

RSP50 cells systematically displayed a slightly or clearly ovoid shape (n=34 cells measured), with an overall width-to-length ratio of 0.75 (Table 1). Their complex internal thylakoidal structures were peripheric and lateral, i.e. often located on opposite sides of the cells if they were not connected. Hexagonal carboxysomes were located in the central cytoplasmic area, while dense structures (rich in electrons, seen as dark dots on the pictures) were often found in periphery, between the thylakoidal structures and the cell membrane.

144

Figure 3 Images of the Red Sea Prochlorococcus isolate RSP50 captured using TEM with noticeable structures: thylakoids (yellow arrows), carboxysomes (white arrows), and electron-rich structures (blue arrows). The black scale bar indicates a length of 500 nm.

16S and ITS phylogeny of Prochlorococcus RSP50

To observe the phylogenetic placement of the Red Sea Prochlorococcus strain, the full-length 16S rRNA and internal transcribed spacer (ITS) sequences from

43 Prochlorococcus and eight Synechococcus representatives were retrieved and aligned with the 16S and ITS sequences of the RSP50 strain. The RSP50

16S rRNA sequence clustered with those belonging to the high-light adapted clade (Figure 4a). A more clear resolution of the lineages using ITS indicated the affiliation of RSP50 to the HL II clade, closely related to strains AS9601, MIT9301 and SB (Figure 4b).

Figure 4 (next page) Phylogenetic trees of a) 16S rRNA gene and b) ITS based on neighbor joining and maximum likelihood trees. Bootstrap values of 75% or more are indicated by solid black dots as shown. Scale bars represent estimated sequence divergence as shown. Synechococcus strains from the subcluster 5.1A were used as outgroup.

a b 145

146 Comparative genomics

The draft RSP50 genome is 1,656,033 base pairs (bp) with 1,939 protein-coding genes and 31% GC content. The RSP50 strain had a 94.3% ANI (average nucleotide identity) and a genome overlap of 89.8% with the Prochlorococcus

AS9601 strain cultured from the Arabian Sea. Relative to the MIT9301 strain, the

Red Sea strain had 93.8% ANI and 90.7% genome overlap. Relative to the

MIT9215 strain, it had 91.4% ANI and 88.3% overlap and to the MIT9312 strain

90.1% ANI and 88.8% genome overlap. The amino acid identity (AAI) of the five closest strains to RSP50 was between 93-94%, but dropped significantly to 60-

65% with LL ecotypes (Figure S1). A summary of the genome characteristics of the Prochlorococcus RSP50 strain relative to all cultured and/or sequenced representatives are displayed in Table S1. The completeness of the draft RSP50 genome was calculated at approximately 97% according to a suite of single-copy marker genes (Parks et al., 2015). According to assigned COG functional categories, the highest representation of genes belonged to amino acid transport and metabolism followed by translation, ribosomal structure, biogenesis and then energy production. However, genes with a general function prediction only or of unknown function were both represented by 18% of genes assigned to COG categories. In addition, the RSP50 phylogeny was assessed using a concatenated alignment of 239 conserved single-copy genes (CSCG) shared between Prochlorococcus and Synechococcus representatives. Similar to the

ITS-based tree, the topology of the core gene-based tree supported the resolution of the Red Sea Prochlorococcus strain and placed RSP50 among its closest relatives AS9601, SB, MIT9301, MIT0604 and MIT9314 (Figure 5).

147

Figure 5 Whole genome phylogeny based on 239 conserved single copy genes from 43 sequenced Prochlorococcus strains and eight Synechococcus strains plus the Red Sea strains RSP50.

148 The orthologous genes were analyzed between all available Prochlorococcus genomes and the Red Sea RSP50 strain, and they were found to share 1,842 genes. The number of unique genes found in the Red Sea strain (i.e. no orthologs in any of the other 43 Prochlorococcus strains) was found to be 56 genes, the majority of which were either hypothetical or uncharacterized genes

(Figure 6). A total of 18 genes returned annotations from NCBI and verified using the Argot2 server (Falda et al., 2012). The majority of genes were involved in metabolism (e.g. starch and sucrose, amino sugar and nucleotide sugar) and biosynthesis (e.g. lipopolysaccharide and lysine).

Figure 6 Venn diagram showing the number of orthologous genes shared by all 44 Prochlorococcus genomes, and the number of unique gene found in the RSP50 genome.

149 The core functionality of the Prochlorococcus genes have already been covered broadly and compared extensively (Scanlan et al., 2009). The HL II

Prochlorococcus RSP50 strain harbors all the necessary genes found within its clade for photosynthesis, carbon fixation and nutrient acquisition. The main components of the photosynthetic apparatus and light harvesting structure is present, reflected by the presence of the psbA1 (Figure 7a) and pcbD genes

(encoding D1 protein of photosystem II and Chlorophyll a-b binding light harvesting protein respectively), chromophore-encoding pebA and pebB genes and a homolog for plastoquinol terminal oxidase (PTOX). Genes encoding for carboxysome shell proteins and the ribulose-1,5-bisphosphate carboxylase

(RuBisCo) subunits for carbon fixation are also all present. For nutrient acquisition, the RSP50 genome contains both genes necessary for nitrogen regulation and sensing (i.e. ntcA and glnB genes). For nitrogen acquisition and metabolism, this strain assimilates ammonium via the - glutamate synthase pathway and recycles nitrogen via the urea cycle using a urea transport system and the urease enzyme complex. Phosphate represents another critical requirement for the survival of phytoplankton, and is usually present in low-nanomolar concentrations. The genes for phosphorus regulation and sensing are present, encoding a two-component system made of a histidine kinase and a response regulator. The RSP50 strain carries a high-affinity phosphate transport protein system (encoded for by the pstSCAB genes) and a

DedA family putative alkaline phosphatase-like protein. The latter indicates that there might be alternative ways to assimilate phosphorus by utilizing organic

150 sources. Additionally, the genome has a phosphonate transport system encoded by the phnCDE gene set allowing the cells to access another phosphorus source.

Excessive exposure to light requires the RSP50 to remove reaction oxygen species (ROS) that are generated. A nickel-containing superoxide dismutase

(encoded for by sodN) and a number of peroxidases make up the existing ROS- scavenging system. Several photolyases (PhrAB and putative ones) and a number of high light inducible proteins (encoded for by hli) are also present to catalyze the repair of UV-damaged DNA. Prochlorococcus accumulate compatible solutes to tolerate the high salinity levels in the oceans. Typically, the strain RSP50 carries the stpA gene encoding a glucosyl-glycerolphosphate phosphatase thought to be involved in glucosyl-glycerate (GGA) biosynthesis. It also has the spsA gene encoding a sucrose-phosphate synthase, through which sucrose is accumulated as a compatible solute (Figure 7b).

However, we hypothesize that the particularity of the Red Sea could lead to the presence of distinct genes that contribute to the fitness of the RSP50 strain.

Indeed, the unique genomic component of the RSP50 strain revealed the presence of ggpS that encodes for a glucosylglycerol-phosphate synthase, glpA that encodes for a glycerol-3-phosphate dehydrogenase, and an alpha amylase catalytic protein.

Unique genes associated with salt-stress and their diel expression

These genes, found to be RSP50 specific, likely belonged to the biosynthesis of compatible solutes (Table S2). The glucosylglycerol-phosphate synthase (ggpS) is involved in the synthesis of glucosyl-glycerol (GG). It has not been found in

151 any Prochlorococcus genome before but many Synechococcus strains are known to accumulate GG. Another gene called glpA encodes a glycerol 3- phosphate dehydrogenase, which is involved in osmoregulation by providing glycerol 3-phosphate for the biosynthesis of glycerol. In the Red Sea, the expression pattern of the ggpS and glpA genes in the environment are almost identical to each other. However, the glpA gene expression level is always higher than ggpS, over a course of 48 hours as reflected by the FPKM. Both genes seem to be expressed more in the surface and below the mixed layer (BML) than at the deep chlorophyll maximum (DCM). There is no clear pattern according to the diel cycle and therefore very little correlation between light levels and the expression of these two genes (Figure 8). Another unique gene found in the

RSP50 genome encodes for the alpha amylase catalytic protein, also called maltose alpha-D-glucosyltransferase or trehalose synthase. This enzyme catalyzes the synthesis of trehalose from maltose, but no previously sequenced

Prochlorococcus genome carries it. However, the strain MED4 possesses a trehalase gene but no experiments have been conducted on that gene or pathway in picocyanobacteria. Investigating this gene’s expression pattern revealed that it is expression drops significantly in the DCM compared to the surface and BML. Moreover, the FPKM values for this gene seem to peak more often during the night than the day and are generally lower than that of ggpS and glpA genes.

A 152

B

Figure 7 Expression patterns of A) Photosystem Q protein (psbA) B) Sucrose phosphate synthase (spsA) in the water column of the Red Sea. The expression pattern of the psbA gene follows a diel cycle where the FPKM is high during the daytime and starts to decrease, as it gets dark. The spsA gene involvled in compatible solute biosynthesis does not follow a diel cycle due to its requirement throughout the day to counterbalance the high salnity of the Red Sea.

153

A

B

154

C

Figure 8 Expression pattern of A) Glucosylglycerol-phosphate synthase B) Glycerol 3-phosphate dehydrogenase C) Trehalose synthase in the water column of the Red Sea.

155 Discussion

In this study we describe the isolation of a Prochlorococcus strain from the open water of the Red Sea, referred to as RSP50 hereinafter. Since the first cultures in 1988 from the Sargasso Sea (Palenik and Haselkorn 1992), there have been many attempts at cultivating different ecotypes from various oceanic regions (Partensky et al., 1999; Biller et al., 2015). To differentiate

Prochlorococcus from Synechococcus and heterotrophic cells, flow cytometry is used to detect the red fluorescence of chlorophyll, along with the side or forward scatter. In turn, this made flow sorting from the enrichments feasible but can also be used to visualize the purity of cultures by adding fluorescent stains to the samples (Figure 1).

The strain was grown at 27 °C under constant light in natural seawater- based media Pro99 (Moore et al., 2007). After several weeks of serial dilutions and transfers, we were able to obtain a high level of purity in the Prochlorococcus cultures, assessed by fluorescence microscopy and no visible growth in MPTB.

Although we could not grow the cells in large volumes, we were able to obtain enough biomass for growth measurements over a course of 14-days as well as for TEM microscopy, using an osmium oxide treatment procedure. The growth rate (Figure 2) of RSP50 at 27 °C was highly similar to other HL II strains such as

MIT9312 which had a 0.61±0.01 growth rate at 27 °C and MIT9215 which had a

0.5±0.01 growth rate at 25 °C (Johnson et al., 2006). HL I and LL ecotypes generally have a lower temperature optima, and at 27 °C showed a slower growth rate and longer doubling time (Johnson et al., 2006; Zinser et al., 2007).

The RSP50 strain cell size was ~0.6 µm in diameter, typical of the HL adapted

156 cells of Prochlorococcus strains. Previously, cryoelectron tomography was used on Prochlorococcus to characterize the cell architecture of a HL (MED4) and LL- adapted (MIT9313) ecotype (Ting et al., 2007). Upon comparing the MED4 images with RSP50, we observed a high similarity in terms of the thylakoid structures near the cell periphery. Additionally, structures resembling carboxysomes were seen clustered in the central cytoplasmic space (Figure 3).

The similarity of the RSP50 with MED4 and the difference with MIT9313 cells, supports the previous observations that differences in cell size, shape and architecture have an impact on physiology and niche adaptation (Ting et al.,

2007).

Phylogenetic analyses showed that RSP50 is a HL II adapted ecotype

(Figure 4). This is consistent with previous findings that the most widespread

Prochlorococcus sequences in the water column of the main basin are affiliated with the HL II ecotype (Ngugi et al., 2011, Shibl et al., 2014; 2015 submitted).

Additionally, the topology of the tree generated from conserved single copy genes among all 43 sequenced Prochlorococcus strains (Biller et al., 2014) placed RSP50 closest to strains isolated from similar environments (Figure 5).

This was corroborated with the 94% AAI value was shared by RSP50 with

AS9601 and MIT9314 from the Arabian Sea and Gulf Stream respectively

(Figure S1). Despite these regions receiving high irradiance throughout the year, the difference with the Red Sea lie in their water exchange with neighboring oceans. The more isolated and stratified nature of the Red Sea makes it more oligotrophic, and less prone to water exchange and climate fluctuations compared to the Arabian Sea and Gulf Stream. A similar AAI value was shared

157 with strains that dominate the Western Pacific (SB), Sargasso Sea (MIT9301) and ALOHA station (MIT0604), all having the characteristic of being warm, nutrient-poor environments. Recently, the taxonomic structure of 13

Prochlorococcus strains was assessed using their genomes, and the authors demonstrated that AAI was the method with the highest resolution for species identification, at a cutoff of >94% (Thompson et al., 2013). They also suggested that a low number of novel species exist by scanning 111 marine metagenomes from the global ocean (Thompson et al., 2013). We propose that RSP50 might potentially be defined as a separate species based on its 94% AAI with several strains among the 43 Prochlorococcus strains used in this study.

At the genome level, RSP50 fits within the ranges typical of the HL II strains (Table S1) with a 1.66 Mb genome size and 31% G+C content. The classification of its protein coding genes into functional categories (COGS) highlighted the large number of genes with unknown functions in RSP50. This feature is common between COGS of different Prochlorococcus ecotypes (HL I,

LL I, LLII/ III, and LLIV) and remains an interesting realm that needs further investigation. However, a comparison of COGS from these ecotypes (HL I, LL I,

LLII/ III, and LLIV) showed that they were almost identical, having most of the genes cluster under i) energy production and conversion, ii) amino acid transport and metabolism, and iii) translation, ribosomal structure and biogenesis. The

RSP50 genome, like those belonging to the HL-adapted group, has all necessary genes required for photosynthetic light harvesting, pigment biosynthesis and carbon fixing. Genes that are typically lacking from HL II adapted strains were not found in RSP50, indicating that is went through the rigorous genome streamlining

158 process to establish the common minimalistic lifestyle of the Prochlorococcus genus. RSP50 is likely able to metabolize ammonium and urea as a nitrogen source, reflected by the presence of the gene cassettes responsible for sensing and assimilating those molecules. A high-affinity phosphate transport system indicates that phosphate can be utilized as a P-source from the oligotrophic Red

Sea. Perhaps the RSP50 strain can also utilize organic P sources, which requires phosphatase enzymes, reflected by the presence of DedA family alkaline phosphatase-like protein. Orthologs of this gene was found previously in several Prochlorococcus strains, and the variability in their domain structure likely reflects different capacities to utilize organic P sources (Moore et al., 2005).

Another organic P source that RSP50 is likely able to exploit are phosphonate

(Martiny et al., 2006; Ilikchyan et al., 2009), because similar to other strains, the genome carries genes (phnCDE) that encode for the specific ABC phosphonate transporter.

To counter the generated reactive oxygen species (ROS),

Prochlorococcus RSP50 possesses the sodN gene that encodes the Nickel- binding superoxide dismutase (Ni-SOD) (Dupont et al., 2008). To catalyze the reduction of various hydroxyperoxides, Prochlorococcus strains are also known to express peroxiredoxins (Prx) and thioredoxins (Trx) (Hosoya-Matsuda et al.,

2005; Florencio et al., 2006). In Prochlorococcus RSP50, no catalase or catalase-peroxidase genes are found, however, the presence of four genes encoding Prx- and Trx-proteins indicates the need to continuously detoxify the cells from ROS.

159 In addition, we identified a number of genes unique to the Red Sea strain

RSP50 (Figure 6, Table S2). Despite the majority of the unique pool being hypothetical or uncharacterized, three genes stood out. Those Red Sea specific genes were further subjected to functional inference on Argot2 (Falda et al.,

2012) based on clustering GO (gene ontology) terms retrieved by BLAST and

HMMER. The genes encoded for i) glucosylglycerol-phosphate synthase (ggpS) ii) glycerol 3-phosphate dehydrogenase (glpA) and iii) trehalose synthase/ alpha amylase catalytic protein. Interestingly, these genes are likely to be involved in the biosynthesis of compatible solutes used to counter high salinity levels.

Glucosylglycerol-phosphate synthase is found in all Synechococcus genomes but in none of the Prochlorococcus genomes so far. This enzyme catalyzes the synthesis of the intermediate glucosyl-glycerol phosphate (GG-P) which is then dephosphorylated by glucosyl-glycerolphosphate phosphatase (encoded for by stpA, found in all Prochlorococcus strains) into glucosylglycerol (GG) (Klahn et al., 2010). It is likely that RSP50 acquired ggpS to regain the ability to synthesize a N-free compatible solute that provides sufficient salt tolerance.

Glycerol-3-phosphate dehydrogenase catalyzes the reversible conversion of dihydroxyacetone phosphate (DHAP) to glycerol-3-phosphate. In turn, glycerol-3-phosphate is converted to GG-P (by the glucosylglycerol-phosphate synthase described above) and then GG. We hypothesize that this enzyme, unique to the Red Sea strain, might be involved in controlling the levels of cellular glycerol-3-phosphate, the precursor of the compatible solute GG. We also investigated the expression pattern of these two enzymes in the water column of the Red Sea in this study. Both enzymes (from the RSP50 genome) followed an

160 identical pattern at all depths, where glycerol-3-phosphate dehydrogenase was consistently expressed at higher FPKM values than glucosylglycerol-phosphate synthase. The expression pattern of both is independent of the diel cycle, suggesting that expression levels could depend solely on the cytolic levels of glycerol-3-phosphate (Figure 8a, b). More experiments are needed to verify the functional capability of those enzymes and confirm their role in the tolerance of high salinity levels.

Finally, the third unique enzyme of interest was trehalose synthase (also called maltose α-D-glucosyltransferase or alpha α amylase). This enzyme catalyzes the formation of trehalose from maltose, which represents one of three possible ways to synthesize trehalose in bacteria (Ruhal et al., 2013). Maltose might be taken up from the environment via an ABC transporter, but the substrate specificity for the majority of them in Prochlorococcus is unknown

(Hagemann 2010). Despite having no evidence of trehalose production or uptake in any picocyanobacterial strains, Prochlorococcus MED4 possesses a trehalase gene (treH) (Scanlan et al., 2009). Some Synechococcus strains carry the trehalose synthase gene but trehalose accumulation as a potential osmolyte has not been investigated so far (Hagemann 2010). FPKM values for the expression of this gene is considerably lower than that of glucosylglycerol-phosphate synthase and glycerol-3-phosphate dehydrogenase eat al depths, perhaps due to being less critical for salt tolerance (Figure 8c). This gene also peaks during the night and could possibly follow the diel cycle for optimal functioning. A recent transcriptome analysis of strain AS9601 grown under continuous light showed an increase in expression of genes involved in biosynthesis of glucosylglycerate

161 (GGA) and sucrose under salt stress, among many others of unknown function

(Al-Hosani et al., 2015). Similarly, the RSP50 strain might have an additional mechanism for salt tolerance by producing GG as an extra osmolyte to counter the high salinity levels of the Red Sea. Closer investigation of these genes, along with other unique components will help us understand how the RSP50 differentiated itself genetically to survive the Red Sea. Also, it is worth investigating how these genes became part of the Prochlorococcus gene pool, whether through lateral transfer or via mutational events under selective pressure.

Conclusion

The global dominance of environmentally critical organisms like Prochlorococcus is likely to expand and cover higher latitudes as a consequence of climate change. As a result, physiological adaptive mechanisms and genome-wide selection can give rise to subpopulations in the genera with differential genetic backbones. Sequencing more Prochlorococcus genomes and surveying oceanic regions with metagenomic and metatranscriptomic tools will unravel the potential functions of unique and unknown genes in their respective pan-genome. This study describes the isolation of a Red Sea Prochlorococcus strain RSP50 that belongs to the HL II ecotype. We explore the genomic content of RSP50 and identify a set of unique components that allows it to survive the Red Sea environment. Among the genes identified, three encode for enzymes potentially involved in the biosynthesis of compatible solutes that help cyanobacteria cells tolerate high salinity levels. We report their expression in the natural environment by mapping the RSP50 genome to the Red Sea metatranscriptomic dataset to

162 reveal their correlation with the diel cycle and their depth specificity. Gene expression did not follow a specific pattern but expression values were higher in the surface and BML than in the DCM. More experiments are required to test their functionality under different salt concentrations and better characterize the enzymes. This is the first study on Prochlorococcus in the Red Sea’s main basin that combines isolation, genome sequencing, and metatranscriptomic analyses.

Coupled with our understanding of their distribution and diversity patterns in the main basin of the Red Sea, we get closer to deciphering the current and future role of Prochlorococcus within the warming oceanic ecosystems.

Acknowledgement

The authors would like to acknowledge Rachid Sougrat at the Imaging Core Lab as well as Manal Andijani, Hicham Mansour, Alaguraj Dharmarajnadar and Hatim

Talal Almansouri at the Biosciences Core Lab in KAUST for their help with sample analysis. We are thankful to the crew and captain of the R/V Thuwal, and particularly to Francis Mallone, for making the sampling process smooth and easy. R/V Aegaeo captain and crew for KRSE2013.

163 REFERENCES

Acker, J., Leptoukh, G., Shen, S., Zhu, T., & Kempler, S. (2008). Remotely- sensed chlorophyll a observations of the northern Red Sea indicate seasonal variability and influence of coastal reefs. Journal of marine systems, 69(3), 191- 204.

Ahlgren, N. A., & Rocap, G. (2006). Culture isolation and culture-independent clone libraries reveal new marine Synechococcus ecotypes with distinctive light and N physiologies. Applied and environmental microbiology, 72(11), 7193-7204.

Alam I, Antunes A, Kamau AA, alawi WB, Kalkatawi M, Stingl U, et al. (2013). INDIGO – INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles. PLoS ONE 8:e82210.

Al-Hosani, S., Oudah, M. M., Henschel, A., & Yousef, L. F. (2015). Global transcriptome analysis of salt acclimated Prochlorococcus AS9601. Microbiological Research, 176, 21-28.

Alraddadi, T. M. (2013). Temporal changes in the Red Sea circulation and associated water masses. University of Southampton, Doctoral thesis, Ocean and Earth Science, 198 pp.

Biller, S.J., Berube, P.M., Berta-Thompson, J.W., Kelly, L., Roggensack, S.E., Awad, L (2014). Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. Scientific data, 1:140034.

Biller, S. J., Berube, P. M., Lindell, D., & Chisholm, S. W. (2015). Prochlorococcus: the structure and function of collective diversity. Nature Reviews Microbiology, 13(1), 13-27.

Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477.

Bolger AM, Lohse M, Usadel B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.

Chen, Y., Paytan, A., Chase, Z., Measures, C., Beck, A. J., Sañudo-Wilhelmy, S. A., & Post, A. F. (2008). Sources and fluxes of atmospheric trace elements to the

164 Gulf of Aqaba, Red Sea. Journal of Geophysical Research: Atmospheres (1984– 2012), 113(D5).

Churchill, J. H., Bower, A. S., McCorkle, D. C., & Abualnaja, Y. (2014). The transport of nutrient-rich Indian Ocean water through the Red Sea and into coastal reef systems. Journal of Marine Research, 72(3), 165-181.

Dufresne, A., Salanoubat, M., Partensky, F., Artiguenave, F., Axmann, I. M., Barbe, V., Duprat, S., Galperin, M. Y., Koonin, E. V. Oztas, S., Robert, C., Rogozin, I. B., Scanlan, D. J., Wolf, Y. I. & Hess, W. R. (2003). Genome sequence of the cyanobacterium Prochlorococcus marinus SS120, a nearly minimal oxyphototrophic genome. Proceedings of the National Academy of Sciences, 100(17), 10020-10025.

Dupont, C. L., K. Neupane, J. Shearer, and B. Palenik. 2008. Diversity, function and evolution of genes coding for putative Ni-containing superoxide dismutases. Environ. Microbiol. 10:1831–1843.

Dupont, C. L., McCrow, J. P., Valas, R., Moustafa, A., Walworth, N., Goodenough, U., Roth, R., Hogle, S. L., Bai, J., Johnson, Z. I., Mann, E., Palenik, B., Barbeau, K. A., Venter J. C. & Allen, A. E. (2015). Genomes and gene expression across light and productivity gradients in eastern subtropical Pacific microbial communities. The ISME journal, 9(5), 1076-1092.

Edwards FJ (1987) Climate and oceanography: Red Sea, Pergamon Press, Oxford 45-69.

Falda, M., Toppo, S., Pescarolo, A., Lavezzo, E., Di Camillo, B., Facchinetti, A., & Fontana, P. (2012). Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms. BMC bioinformatics, 13(Suppl 4), S14.

Flombaum, P., Gallegos, J. L., Gordillo, R. A., Rincón, J., Zabala, L. L., Jiao, N., et al (2013). Present and future global distributions of the marine Cyanobacteria Prochlorococcus and Synechococcus. Proceedings of the National Academy of Sciences, 110(24), 9824-9829.

Florencio, F.J., Perez-Perez, L., Lopez-Maury, A. Mata-Cabana, and M. Lindahl. 2006. The diversity and complexity of the cyanobacterial thioredoxin systems. Photosynth. Res. 89:157–171.

165 Fuller, N. J., Marie, D., Partensky, F., Vaulot, D., Post, A. F., & Scanlan, D. J. (2003). Clade-specific 16S ribosomal DNA oligonucleotides reveal the predominance of a single marine Synechococcus clade throughout a stratified water column in the Red Sea. Applied and Environmental Microbiology, 69(5), 2430-2443.

Garcia-Pichel, F., Belnap, J., Neuer, S., & Schanz, F. (2003). Estimates of global cyanobacterial biomass and its distribution. Algological Studies, 109(1), 213-227.

Giovannoni, S. J., Thrash, J. C., & Temperton, B. (2014). Implications of streamlining theory for microbial ecology. The ISME journal, 8(8), 1553-1565.

Hagemann, M. (2011). Molecular biology of cyanobacterial salt acclimation. FEMS microbiology reviews, 35(1), 87-123.

Hosoya-Matsuda, N., K. Motohashi, H. Yoshimura, A. Nozaki, K. Inoue, M. Ohmori, and T. Hisabori. (2005). Anti-oxidative stress system in cyanobacteria— significance of type II peroxiredoxin and the role of 1-Cys peroxiredoxin in Synechocystis sp. strain PCC 6803. J. Biol. Chem. 280:840–846.

Huang S, Wilhelm SW, Harvey HR, Taylor K, Jiao N & Chen F (2012) Novel lineages of Prochlorococcus and Synechococcus in the global oceans. ISME J 6: 285-297.

Ilikchyan, I. N., R. M. L. McKay, J. P. Zehr, S. T. Dyhrman, and G. S. Bullerjahn. 2009. Detection and expression of the phosphonate transporter gene phnD in marine and freshwater picocyanobacteria. Environ. Microbiol. 11:1314–1324.

Jimenez Infante F, Ngugi DK, Alam I, Rashid M, Baalawi W, Kamau AA, et al. (2014). Genomic differentiation among two strains of the PS1 clade isolated from geographically separated marine habitats. FEMS Microbiology Ecology. doi:10.1111/1574-6941.12348.

Johnson ZI, Zinser ER, Coe A, McNulty NP, Woodward EMS & Chisholm SW (2006) Niche partitioning among Prochlorococcus ecotypes along ocean-scale environmental gradients. Science 311: 1737-1740.

Karnovsky, M. J. (1971). Use of ferrocyanide-re- duced osmium tetroxide in electron microscopy. Paper presented at the l l t h Annual Meeting of the American Society for Cell Biology.

166 Kashtan, N., Roggensack, S., Rodrigue, S., Thompson, J. W., Biller, S. J., Coe, A., Chisholm, S. W. (2014). Single-cell genomics reveals hundreds of coexisting subpopulations in wild Prochlorococcus. Science, 344, 416-420.

Kettler GC, Martiny AC, Huang K et al. (2007) Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet 3: e231.

Klähn, S., Steglich, C., Hess, W. R., & Hagemann, M. (2010). Glucosylglycerate: a secondary compatible solute common to marine cyanobacteria from nitrogen- poor environments. Environmental microbiology, 12(1), 83-94.

Kopylova E, Noé L, Touzet H. (2012). SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics. doi:10.1093/bioinformatics/bts611.

Langmead B, Salzberg SL. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods 9:357–359.

Lavin P, González B, Santibánez J, Scanlan DJ & Ulloa O (2010) Novel lineages of Prochlorococcus thrive within the oxygen minimum zone of the eastern tropical South Pacific. Environmental Microbiology Reports 2: 728-738.

Lindell D & Post AF (1995) Ultraphytoplankton succession is triggered by deep winter mixing in the Gulf of Aqaba (Eilat), Red Sea. Limnol Oceanogr 40: 1130- 1141.

Masella AP, Bartram AK, Truszkowski JM, Brown DG, Neufeld JD. (2012). PANDAseq: paired-end assembler for illumina sequences. BMC Bioinformatics 13:31.

Martiny AC, Coleman ML, Chisholm SW (2006) Phosphate acquisition genes in Prochlorococcus ecotypes: evidence for genome-wide adaptation. Proc Natl Acad Sci USA 103:12552–12557

Martiny AC, Tai AP, Veneziano D, Primeau F & Chisholm SW (2009) Taxonomic resolution, ecotypes and the biogeography of Prochlorococcus. Environ Microbiol 11: 823-832.

Moore, L. R., M. Ostrowski, D. J. Scanlan, K. Feren, and T. Sweetsir. (2005). Ecotypic variation in phosphorus acquisition mechanisms within marine picocyanobacteria. Aquat. Microb. Ecol. 39:257–269.

167

Muhling, M., Fuller, N. J., Somerfield, P. J., Post, A. F., Wilson, W. H., Scanlan, D. J., Joint, I. & Mann, N. H. (2006). High resolution genetic diversity studies of marine Synechococcus isolates using rpoC1-based restriction fragment length polymorphism. Aquatic microbial ecology, 45(3), 263-275.

Nikolenko SI, Korobeynikov AI, Alekseyev MA. (2013). BayesHammer: Bayesian clustering for error correction in single-cell sequencing. BMC Genomics 14 Suppl 1:S7.

Ngugi DK, Antunes A, Brune A & Stingl U (2011) Biogeography of pelagic bacterioplankton across an antagonistic temperature–salinity gradient in the Red Sea. Mol Ecol 21: 388-405.

Ngugi DK & Stingl U (2012) Combined analyses of the ITS loci and the corresponding 16S rRNA genes reveal high micro- and macrodiversity of SAR11 populations in the Red Sea. PLoS One 7: e50274.

Ngugi DK, Blom J, Alam I, Rashid M, Ba-Alawi W, Zhang G, et al. (2015). Comparative genomics reveals adaptations of a halotolerant thaumarchaeon in the interfaces of brine pools in the Red Sea. The ISME Journal 9:396–411.

Olson, R. J., Chisholm, S. W., Zettler, E. R., Altabet, M. A., & Dusenberry, J. A. (1990). Spatial and temporal distributions of prochlorophyte picoplankton in the North Atlantic Ocean. Deep Sea Research Part A. Oceanographic Research Papers, 37(6), 1033-1051.

Ottesen, E. A., Young, C. R., Gifford, S. M., Eppley, J. M., Marin, R., Schuster, S. C., Scholin, C. A. & DeLong, E. F. (2014). Multispecies diel transcriptional oscillations in open ocean heterotrophic bacterial assemblages. Science, 345(6193), 207-212.

Palenik, B., & Haselkorn, R. (1992). Multiple evolutionary origins of prochlorophytes, the chlorophyllb-containing prokaryotes. Nature 355: 265–267.

Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research 25:1043–1055.

Patel RK, Jain M. (2012). NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS ONE 7:e30619.

168

Penno S, Lindell D & Post AF (2006) Diversity of Synechococcus and Prochlorococcus populations determined from DNA sequences of the N- regulatory gene ntcA. Environ Microbiol 8: 1200-1211.

Raitsos, D. E., Pradhan, Y., Brewin, R. J., Stenchikov, G., & Hoteit, I. (2013). Remote sensing the phytoplankton seasonal succession of the Red Sea. e64909.

Rocap G, Distel DL, Waterbury JB & Chisholm SW (2002) Resolution of Prochlorococcus and Synechococcus ecotypes by using 16S-23S ribosomal DNA internal transcribed spacer sequences. Appl Environ Microbiol 68: 1180- 1191.

Ruhal, R., Kataria, R., & Choudhury, B. (2013). Trends in bacterial trehalose metabolism and significant nodes of metabolic pathway in the direction of trehalose accumulation. Microbial biotechnology, 6(5), 493-502.

Rusch DB, Martiny AC, Dupont CL, Halpern AL & Venter JC (2010) Characterization of Prochlorococcus clades from iron-depleted oceanic regions. Proc Natl Acad Sci USA 107: 16184-16189.

Scanlan, D. J., & West, N. J. (2002). Molecular ecology of the marine cyanobacterial genera Prochlorococcus and Synechococcus. FEMS microbiology ecology, 40(1), 1-12.

Scanlan, D. J. (2003). Physiological diversity and niche adaptation in marine Synechococcus. Advances in microbial physiology, 47, 1-64.

Scanlan DJ, Ostrowski M, Mazard S, Dufresne A, Garczarek L, Hess WR, Post AF, Hagemann M, Paulsen I & Partensky F (2009) Ecological genomics of marine picocyanobacteria. Microbiol Mol Biol Rev 73: 249-299.

Six, C., Finkel, Z. V., Irwin, A. J., & Campbell, D. A. (2007). Light variability illuminates niche-partitioning among marine picocyanobacteria. PLoS One, 2(12), e1341.

Shi, Y., Tyson, G. W., Eppley, J. M., & DeLong, E. F. (2011). Integrated metatranscriptomic and metagenomic analyses of stratified microbial assemblages in the open ocean. The ISME journal, 5(6), 999-1013.

169 Shibl, A. A., Thompson, L. R., Ngugi, D. K., & Stingl, U. (2014). Distribution and diversity of Prochlorococcus ecotypes in the Red Sea. FEMS microbiology letters, 356(1), 118-126.

Stewart, F. J., Dalsgaard, T., Young, C. R., Thamdrup, B., Revsbech, N. P., Ulloa, O., Canfield, D. E. & DeLong, E. F. (2012). Experimental incubations elicit profound changes in community in OMZ bacterioplankton. PLoS One, 7(5), e37118.

Stingl U, Tripp HJ, Giovannoni S (2007). Improvements of high-throughput culturing yielded novel SAR11 strains and other abundant marine bacteria from the Oregon coast and the Bermuda Atlantic Time Series study site. ISME J 1:361–371

Swan, B. K., Tupper, B., Sczyrba, A., Lauro, F. M., Martinez-Garcia, M., González, J. M., ... & Stepanauskas, R. (2013). Prevalent genome streamlining and latitudinal divergence of planktonic bacteria in the surface ocean. Proceedings of the National Academy of Sciences, 110(28), 11463-11468.

Thompson, C.C., Silva, G.G., Vieira, N.M., Edwards, R., Vicente, A.C. & Thompson, F.L. (2013). Genomic taxonomy of the genus Prochlorococcus. Microbial ecology, 66(4), 752-762.

Ting CS, Hsieh C, Sundararaman S, Mannella C, Marko M (2007). Cryo-electron tomography reveals the comparative three-dimensional architecture of Prochlorococcus, a globally important marine cyanobacterium. J Bacteriol 189:4485–4493

Waldbauer, J. R., Rodrigue, S., Coleman, M. L., & Chisholm, S. W. (2012). Transcriptome and proteome dynamics of a light-dark synchronized bacterial cell cycle. PloS one, 7(8), e43432.

West N.J., Lebaron P., Strutton P.G. & Suzuki M.T .(2011) A novel clade of Prochlorococcus found in high nutrient low chlorophyll waters in the South and Equatorial Pacific Ocean. ISME J 5: 933-944.

Zinser ER, Johnson ZI, Coe A, Karaca E, Veneziano D, Chisholm SW (2007). Influence of light and temperature on Prochlorococcus ecotype distributions in the Atlantic Ocean. Limnol Oceanogr 52:2205–2220

170 Zwirglmaier K., Jardillier L., Ostrowski M., Mazard S., Garczarek L., Vaulot D., Not F., Massana R., Ulloa O. & Scanlan D.J. (2008) Global phylogeography of marine Synechococcus and Prochlorococcus reveals a distinct partitioning of lineages among oceanic biomes. Environ Microbiol 10: 147-161.

171

Supplementary information

Table S1 Genome characteristics of the various Prochlorococcus strains that were sequenced, including the R ed Sea strain highlighted in grey

172

Table S2 List of unique genes in RSP50 discussed in this study.

Figure S1 Heatmap showing the AAI values shared between the Red Sea strain RSP50 and the 43 Prochlorococcus strains that have sequenced genomes.173

174 CONCLUSIONS AND FUTURE PROSPECTS

Prochlorococcus is a globally important marine microbe that drives primary production in the tropical and subtropical oceans. The framework surrounding the genomic diversity and in turn niche distribution of these organisms is critical in understanding their ecological success in marine ecosystems. The Red Sea is an example of an oligotrophic marine ecosystem characterized with relatively higher temperatures, salinity levels and minimal water exchange. Analyzing the microbial populations inhabiting the main basin of the Red Sea revealed the numerical dominance of Prochlorococcus in the surface waters. The genomic differentiation of Prochlorococcus into distinct ecotypes drives their niche occupation. Chapter one of this thesis addresses the microdiversity of Prochlorococcus in the main basin of the Red Sea, which has not been previously explored. The ITS sequences detected belonged to HL II adapted cells as well as recently discovered LL cells. The second chapter looks at the distribution of Prochlorococcus across the Red Sea from the northern to the southern end, including coastal and open-ocean waters. The significance of this research lies in the biogeographical scope it covers and in the rpoC1 marker gene used, which resolves the diversity of Prochlorococcus to high detail. The large datasets generated in the study, either from pyrosequencing the rpoC1 gene or its metagenomic fragment recruitment, covered all regions of the Red

Sea, including the environmentally contrasting northern and southern ends. The high diversity of Prochlorococcus ecotypes at 100 m and the presence of highly abundant rpoC1 OTUs at 500 m raised interesting questions to be pursued; what

175 drives the cells to gain or lose their fitness at these depths, and what drives this niche partitioning? In addition, the first two chapters highlight the homogeneity of the Prochlorococcus community profile across the temperature-salinity gradient.

We are able to conclude that the temperature-salinity gradient across the main basin of the Red Sea does not influence Prochlorococcus distribution, and the higher microdiversity at specific depths remains consistent throughout. Since the

Red Sea resembles a global warming scenario, and the climate of oceanic regions of the world are inevitably changing, it was intriguing to examine the genomic characteristics that allow Prochlorococcus to survive. What mechanisms do Red Sea-specific strains use for adaptations? How does the unique component in the genome add to the cell’s fitness and help inhabit the water column? To this end, a HL II adapted Prochlorococcus strain was cultured in the lab for genomic and metatranscriptomic investigation. Chapter three describes this strain, and its genome. A few genes involved in salt acclimation were discovered to be unique to the Red Sea strain and the ability to synthesize more compatible solutes surely reflects the salt stress these organisms face. We further explored their expression pattern and saw continuous fluctuations of expression throughout the water column, identifying these genes as an important component for the survival tactics of Prochlorococcus.

It is worth noting that a large portion of the genes in the Red Sea specific strain was of unknown function or uncharacterized. This indicates that the strain may be utilizing several different adaptation mechanisms to deal with the harsh environment and it is worth identifying more functional and metabolic pathways that are specific or predominant in the Red Sea. This study contributes to the

176 identification of more genes in the flexible gene pool of the Prochlorococcus genus, and provides a solid platform to build evolutionary hypotheses on, regarding environmental selective pressure. As we target diverse

Prochlorococcus ecotypes from the Red Sea and sequence their genomes, we get closer to describing the global pan-genome of the genus in an ecological context.

The large metatranscriptomic datasets generated after sampling the Red

Sea during KRSE2013 in the Mediterranean and Red Sea are extremely valuable. By mapping more Prochlorococcus genomes (Red Sea strains and others) to the datasets from different oceanic regions, we will certainly be bale to determine the significance of unknown genes that can be further experimented with. The unique set of genes found can be further analyzed for better understanding the influence of climate change on the microbial communities, and by default on critical biogeochemical cycles. A more detailed analysis of the expression patterns of genes from the Red Sea specific strain will be important to test hypotheses about the forces that shape the evolution of such globally important marine microbes.