TRANSCRIPTION REGULATION AND GROWTH PHASE TRANSITION IN HYPERTHERMOACIDOPHILIC ARCHAEA

Kun Wang

©Kun Wang, Stockholm University 2018

ISBN print 978-91-7797-189-4
ISBN PDF 978-91-7797-190-0

Cover photo: Black Pool at Yellowstone National Park, USA. Photo taken by Kun Wang
Printed in Sweden by Universitetsservice US-AB, Stockholm 2018
Distributor: Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University

To Rolf Bernander

Summary

Organisms from the domain Archaea are ubiquitous on our planet and encompass a fascinating diversity of organisms. The genus Sulfolobus, which belongs to the phylum Crenarchaeota, includes hyperthermoacidophilic strains that grow optimally at 65-85 °C and pH 2-3. These organisms have been used as thermophilic model organisms to investigate archaeal DNA replication, transcription, cell cycle, etc.

This thesis focuses on archaea-specific transcription factors (TFs) in Sulfolobus acidocaldarius and on transcriptome changes during growth phase transition in Sulfolobus solfataricus, both hyperthermoacidophilic archaea, in order to expand our knowledge of archaeal transcription regulation and growth phase adaptation.

In paper 1, we studied the genome-wide binding sites of BarR, a β-alanine responsive Lrp family TF that activates the expression of a β-alanine aminotransferase gene located in a divergent operon in S. acidocaldarius. Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) revealed 21 binding regions, including the previously characterized barR/Saci_2137 intergenic region. However, only one additional operon, containing two glutamine synthase genes (Saci_2320 and Saci_2321), was found to be activated by BarR. This operon is a common target of LysM and Sa-Lrp, which indicates a regulatory network between different Lrp-like regulators.

In paper 2, we showed that a TetR family transcription factor, FadRSa, regulates fatty acid metabolism in S. acidocaldarius. FadRSa is encoded in a gene cluster, Saci_1103-Saci_1126, that mainly contains lipid degradation and fatty acid metabolism genes. ChIP-seq revealed four binding sites within the gene cluster, and RNA-seq analysis of a FadRSa deletion mutant strain further confirmed that the entire gene cluster is repressed by FadRSa. FadRSa binds DNA at a 16-bp motif with dyad symmetry, and binding of medium- to long-chain acyl-CoA molecules results in dissociation of FadRSa from the DNA. Although FadRSa is functionally and structurally similar to its bacterial counterparts, a fundamentally different ligand binding mode was observed.
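As an aside on what "dyad symmetry" means for a binding motif: a site with perfect dyad symmetry reads the same as its own reverse complement. The minimal sketch below counts how far a candidate site departs from that ideal. The function names and both 16-bp sequences are made up for illustration and are not the FadRSa motif reported in paper 2.

```python
# Illustrative sketch only: the sequences below are hypothetical,
# not the FadRSa binding motif.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dyad_mismatches(site: str) -> int:
    """Count positions where a site differs from its own reverse complement.

    0 means perfect dyad symmetry (a palindromic site).
    """
    site = site.upper()
    return sum(a != b for a, b in zip(site, reverse_complement(site)))


perfect_site = "TTGACTAATTAGTCAA"    # hypothetical 16-bp palindrome
imperfect_site = "TTGACTAATTAGTCAG"  # same site with one base changed

print(dyad_mismatches(perfect_site))
print(dyad_mismatches(imperfect_site))
```

A single base change breaks the symmetry at two positions, because the changed base and its mirror partner both stop matching.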

In paper 3, transcriptome data from S. solfataricus at four time points during growth (early exponential, late exponential, early stationary and late stationary phase) revealed a massive change in gene expression during growth phase transition. Out of a total of 2978 coding genes, 1067 (35.8%) were identified as differentially expressed. These comprised 456 induced genes, most of which were related to transposases, metabolism and stress response; 464 repressed genes, most of which were involved in translation, basic transcription, DNA replication, amino acid metabolism and defence mechanisms; and 147 genes with fluctuating profiles, including transporters, genes related to oxidation-reduction processes and a few metabolic genes.
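The counts quoted for paper 3 can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the summary; the variable and function names are illustrative.

```python
# Counts taken from the summary of paper 3; names are illustrative.
TOTAL_CODING_GENES = 2978
deg_counts = {"induced": 456, "repressed": 464, "fluctuating": 147}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


total_degs = sum(deg_counts.values())
print(f"DEGs: {total_degs} "
      f"({percent(total_degs, TOTAL_CODING_GENES):.1f}% of coding genes)")
for category, count in deg_counts.items():
    print(f"  {category}: {count} ({percent(count, total_degs):.1f}% of DEGs)")
```

The three categories sum to the stated total of 1067 differentially expressed genes, which is 35.8% of the 2978 coding genes.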

In summary, the studies of two metabolism-related TFs in S. acidocaldarius, BarR and FadRSa, shed light on their function and regulatory mechanisms. In addition, the transcriptome data of S. solfataricus not only reveal genome-wide alteration of gene expression during growth phase transition, but also provide a rich source of information for further studies by the archaea research community.

List of publications

I. Han Liu, Kun Wang, Ann-Christin Lindås, Eveline Peeters. (2016). The genome-scale DNA-binding profile of BarR, a β-alanine responsive transcription factor in the archaeon Sulfolobus acidocaldarius. BMC Genomics, 17(1):569

II. Kun Wang, David Sybers, Hassan Ramadan Maklad, Liesbeth Lemmens, Charlotte Lewyllie, Xiaoxiao Zhou, Christopher Bräsen, Bettina Siebers, Karin Valegård, Ann-Christin Lindås and Eveline Peeters. A bacterial-like FadR transcription factor regulates fatty acid metabolism in the archaeal model organism Sulfolobus acidocaldarius. Nature Communications (Under consideration after revision)

III. Kun Wang, Mohea Couturier, Anna Knöppel, Erik A. Pelve, Magnus Lundgren and Ann-Christin Lindås. Genome-wide transcription responses during the transition from exponential to stationary growth phase in the Archaeon Sulfolobus solfataricus. (Manuscript)

Abbreviations

5-FOA      5-fluoroorotic acid
BRE        TFB-recognition element
ChIP       chromatin immunoprecipitation
ChIP-seq   chromatin immunoprecipitation followed by deep sequencing
CRISPR     clustered regularly interspaced short palindromic repeats
DBD        DNA binding domain
DEGs       differentially expressed genes
EMSA       electrophoretic mobility shift assay
HTH        helix-turn-helix
INR        initiator element
NGS        next-generation sequencing
qRT-PCR    quantitative PCR
RHH        ribbon-helix-helix
RNAP       RNA polymerase
RNA-seq    RNA sequencing
S-layer    surface layer
TBP        TATA-binding protein
TFB        transcription factor B
TFE        transcription factor E
TFs        transcription factors
TSS        transcription start site
UV         ultraviolet
wHTH       winged helix-turn-helix

Contents

Introduction
Tree of Archaea
Characteristics of the archaea cell
Transcriptional control in archaea
Basal transcription
Transcription factors
Chromatin associated proteins
Growth phase adaptations
Growth phases physiology
Transition to stationary phase in Bacteria
Transition to stationary phase in Archaea
Model organisms for hyperthermophiles
Cell characterization
Methods used in the studies
Generation of knock-out strains
Chromatin immunoprecipitation
High throughput sequencing
Aim of the thesis
Project Summary
Project I. Characterization of archaeal TFs
Paper 1
Paper 2
Project II. Transcription responses during growth phase transition
Paper 3
Future perspective
Sammanfattning
Acknowledgment
References

Introduction

Life on Earth used to be divided into two kingdoms according to whether or not the cell contains a nucleus, namely eukaryotes and prokaryotes. In the late 1970s, Carl Woese challenged this division when he began phylogenetic studies comparing the sequences of the small subunit ribosomal RNA (16S rRNA) of methanogens, which were considered “bacteria” at that time [1, 2]. The result was revolutionary: he found that the prokaryotes were not a monophyletic group but encompassed two distinct phylogenetic groups, bacteria and archaebacteria. It appeared that archaebacteria, although prokaryotic, did not resemble bacteria more than they resembled eukaryotes at the molecular level. The scientific community was skeptical of the new classification method, but more and more experimental results supporting the division were reported in the decade following Woese’s proposal. It was shown that archaebacterial RNA polymerase is highly similar to eukaryotic RNA polymerase II [3], that the cell wall lacks true peptidoglycan, and that archaebacteria have a cell membrane composition and structure distinct from those of both bacteria and eukaryotes [4]. With this supporting evidence in hand, in 1990 the archaebacteria were renamed Archaea and formally proposed as one of the three domains of life: Archaea, Bacteria and Eukarya [5]. Today, comparing 16S rRNA sequences has become a common method to identify and distinguish between different species [6].

The most striking feature of archaea, and the first impression most people have of them, is their ability to thrive in extreme environments. Methanopyrus kandleri strain 116 holds the record for the upper temperature limit of life, 122 °C [7]. Halophilic archaea live in nearly saturated salt brines across the world, and plenty of strains are present in hot acidic springs. However, cultivation-independent methods such as metagenomics and single-cell sequencing have shown that archaea are ubiquitous on Earth and that most archaea are not extremophiles [8]. Archaea play important roles in biogeochemical cycles, the carbon and nitrogen cycles in particular. Archaea have also been found on our skin, in our intestine and in our oral cavity, and although no pathogens have been identified, their health implications are still worth investigating [9, 10].

As Archaea is the most recently identified domain of life, research on it is still in its infancy compared to that on bacteria and eukaryotes. However, research on archaea has been growing rapidly and has yielded invaluable insights into these fascinating organisms.

Tree of Archaea

In 1990, Woese showed by 16S ribosomal RNA sequence analysis that the domain Archaea could be divided into two phyla, Euryarchaeota and Crenarchaeota. Few strains had been identified at that time, and all of them were extremophiles, such as halophiles, thermophiles and methanogens. With advances in sequencing technologies [11], the phylogenetic tree of archaea has expanded enormously, revealing great archaeal diversity in both habitat and metabolism. Many new phyla have been suggested during the past 20 years, such as Thaumarchaeota [12], Korarchaeota [13], Aigarchaeota [14], Geoarchaeota [15], Nanoarchaeota [16] and Lokiarchaeota [17], although several are based solely on metagenomic sequence data. Three superphyla have also been introduced recently. TACK encompasses Thaum-, Aig-, Cren-, and Korarchaeota [18], with three additional phyla, Geo-, Bathy-, and Verstraetearchaeota, included later [19]. The DPANN superphylum consists of Diapherotrites, Parv-, Aenigm-, Nano-, and Nanohaloarchaea [20], and the Asgard superphylum contains Loki-, Thor-, Odin-, and Heimdallarchaeota [21]. Conventionally, archaea are also grouped by their physiology, such as halophiles, methanogens, hyperthermophiles, acidophiles and mesophiles.

In the TACK superphylum, Crenarchaeota mainly consists of thermophiles that have an optimal growth temperature higher than 45 °C and hyperthermoacidophiles that live in hot springs with temperatures above 80 °C and pH around 2. Another widely accepted phylum is Thaumarchaeota, which includes groups of mesophilic and low-temperature archaea that are distributed in marine and terrestrial environments [12], where they play important roles in the global carbon and nitrogen cycles. Aigarchaeota has only one genome sequence available, which encodes many distinctly eukaryotic features, such as a ubiquitin protein modifier system and a type I DNA topoisomerase. 16S rRNA sequence studies indicate that Aigarchaeota is widespread in thermal ecosystems [22]. Korarchaeota is another uncultured phylum, first discovered in a hot spring. It also has only one genome sequenced, from an enriched culture, and is placed close to the root of the archaeal tree based on phylogenetic analysis [23]. Geoarchaeota is another deep-rooted candidate phylum, found in a high-temperature acidic iron mat in Yellowstone National Park. Metagenome sequencing and fluorescence in situ hybridization (FISH) results show that they are one of the dominant groups in the Fe-mat samples in hot springs [15].

DPANN archaea have been found in diverse habitats, and many members of this superphylum are small and may live a symbiotic lifestyle. Nanoarchaeum equitans, the only cultured strain in the superphylum, is 0.4 μm in diameter and has a highly reduced genome of around 0.5 megabases. Because it lacks many essential biosynthesis pathways, N. equitans depends on its host Ignicoccus [16].

Euryarchaeota, as the name indicates, contains a wide range of species with different physiological traits, such as methanogens, halophiles, hyperthermophiles, and also thermoacidophiles. Methane production is unique to Archaea and exists only within Euryarchaeota. Methanogens are a diverse group of strictly anaerobic organisms whose habitats range from deep-sea hydrothermal vents to the human gut.

Newly sequenced phyla belonging to the Asgard superphylum continue to bring exciting discoveries and new implications for the emergence of the eukaryotic cell [19]. Asgard archaea contain an extensive number of eukaryotic signature proteins, including ubiquitin system proteins, GTPases, membrane remodelling proteins, actins, eukaryotic-like tubulins, ESCRT complex proteins, etc. Their discovery might rewrite the three-domain tree of life, as phylogenetic analyses place eukaryotes as a sister clade of Asgard archaea. However, additional genome sequences from this superphylum are needed to refine the exact position of the eukaryotes, and the isolation and cultivation of these archaea will provide more direct evidence of their cellular complexity and their relation to the eukaryotic cell.

It will not be surprising to see the tree of archaea continue to grow immensely in the future. Nonetheless, experimental verification of the protein homologues predicted from genomic studies and isolation of representative strains remain very important. They will give more insight into the diversity of archaeal metabolism, its biotechnological potential and, eventually, the evolutionary course of eukaryogenesis and of life on Earth in general.

Characteristics of the archaea cell

Archaeal cells look like bacteria under the microscope: they are unicellular and have no nucleus or other membrane-enclosed organelles. At the molecular level, however, more distinctive features appear.

One of the most distinctive traits of archaea lies in the composition and structure of the cell membrane. Bacteria and eukaryotes share common building blocks for their membranes, in which glycerol-3-phosphate (G3P) polar heads are bound to fatty acid chains by ester linkages, while archaea have glycerol-1-phosphate (G1P) and ether-linked isoprenoid chains [24]. There are exceptions: ether linkages and isoprenoid chains are found in some bacterial and eukaryotic membranes, and specific archaeal strains have been shown to contain fatty acids [25]. The chiral glycerol phosphate and the enzymes involved in synthesis of the phospholipids are unique to archaea and have evolved independently, which has given rise to the still ongoing debate about the membrane composition of the last universal common ancestor [26, 27]. Structurally, in addition to lipid bilayers, most archaeal species have a monolayer of tetraether lipids in which two glycerol diether moieties are connected. The tetraether monolayer has been shown to be more resistant to high temperature, acids and/or large metal ion gradients [28, 29]. In addition, isoprenoid side chains can be unsaturated or form ring structures depending on environmental conditions such as temperature and salinity [30]. However, tetraether monolayer membranes also exist in mesophilic archaea and do not display any specific phylogenetic distribution [31].

Motility is vital for microorganisms to adapt to environmental changes, and archaea possess distinct features here as well. The archaeal flagellum, named the archaellum, provides the power for the archaeal cell to move [32]. Even though the archaellum is highly similar to the type IV pilus, which bacteria use mainly for twitching, it works like a flagellum, by rotational motion. The power it produces makes Methanocaldococcus jannaschii the fastest moving organism on Earth, measured in body lengths travelled per second [33].

Metabolically, archaea share similarities with bacteria. They can be phototrophs, chemotrophs or heterotrophs, and they possess distinct metabolic pathways as well as variants of some classical pathways [34]. Methanogens belonging to Euryarchaeota are the only organisms that can produce methane by methanogenesis. In carbohydrate metabolism, modified Embden-Meyerhof and Entner-Doudoroff pathways have been found in different archaeal species. Research on archaeal metabolic pathways provides invaluable resources for industrial applications, especially the enzymes from extremophiles, which are stable at extreme temperature, pH and salinity [35].

Genetically, archaea possess both bacterial and eukaryotic characteristics. Archaea contain circular chromosomes ranging from 1.5 to 6 Mbp, and genes are grouped in operons as in bacteria, while their promoter regions contain a TATA box and BRE element, which they share with eukaryotes. To replicate the circular chromosome, bacteria use one replication origin, while archaea use one or multiple origins [36-38]. Although the origin structure in archaea is similar to that in bacteria, such as genes encoding the replication initia-

TATA binding protein (TBP), transcription factor B (TFB) and transcription factor E (TFE). However, to further regulate transcription, archaea take advantage of bacteria-like TFs. By analyzing 52 available archaeal genomes, Perez-Rueda and colleagues identified 3918 DNA-binding TFs, of which 53% have at least one bacterial homologue, 44% are archaea specific, 6% are conserved in all three domains and only 2% have homologues in eukaryotes (mainly in Ascomycetes) [48].

The transcription cycle comprises three phases: initiation, elongation and termination [47]. In brief, transcription initiation starts with TBP binding to the TATA box approximately 25 nucleotides upstream of the transcription start site (TSS) [49]. Upon binding, the promoter DNA is bent either by TBP alone (Methanocaldococcus jannaschii) or with the help of TFB (Sulfolobus acidocaldarius), depending on the archaeal species [50]. TFB binds the TFB-recognition element (BRE), which is immediately upstream of the TATA box, and determines the transcription polarity [49]. The TBP-TFB-DNA complex recruits RNAP and forms the preinitiation complex (PIC), which is sufficient to start transcription in vitro (Fig. 1). However, TFE, which is homologous to the α-subunit of eukaryotic TFIIE, binds the complex in vivo and facilitates the opening of the RNAP clamp and melting of the DNA [51-53]. Once the complex is assembled and the template DNA is loaded, the elongation factor complex Spt4/5, conserved in all three domains of life, displaces TFE, assists RNAP in escaping the promoter and enhances elongation processivity thereafter [54]. Transcription factor SI (TFSI), homologous to the eukaryotic elongation factor TFIIS, cleaves misincorporated nascent RNA during elongation to rescue arrested elongation complexes [55, 56]. Termination of transcription can be triggered by termination factors or by specific sequences (intrinsic terminators). The archaeal mechanism is similar to eukaryotic RNAP III termination, in which transcription is terminated at an oligo(T) sequence [57]. Recently, the first termination factor, euryarchaeal termination activity (Eta) [58], which is well conserved in euryarchaea, has been characterized in vivo and in vitro. Eta functions similarly to the bacterial Mfd, which binds upstream of the stalled transcription elongation complex and dissociates the complex to terminate transcription. The experiments also support the existence of other, unknown termination factors. After dissociation from the elongation complex, RNAP can form a new PIC and start the transcription cycle again. Elongation and termination of transcription in archaea are less well understood than in bacteria and eukaryotes, and more factors participating in these processes await characterization.
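The promoter geometry described above (a TATA box roughly 25 nucleotides upstream of the TSS, with the BRE immediately upstream of it) suggests a crude way to look for a TATA-box candidate in a sequence. The sketch below simply picks the most A/T-rich 8-mer in a window upstream of a known TSS. The window (-35 to -20), the 8-nt width, the function names and the test sequence are all assumptions made for illustration; this is not a validated promoter-prediction method or the approach used in the thesis.

```python
from typing import Optional, Tuple


def at_fraction(seq: str) -> float:
    """Fraction of A or T bases in a sequence."""
    return sum(base in "AT" for base in seq) / len(seq)


def best_tata_candidate(
    seq: str,
    tss: int,
    width: int = 8,                       # assumed TATA-box length
    window: Tuple[int, int] = (-35, -20)  # assumed search window vs. TSS
) -> Optional[Tuple[int, str]]:
    """Return (offset, k-mer) of the most A/T-rich k-mer starting in the window.

    `tss` is the 0-based index of the transcription start site in `seq`;
    offsets are relative to it (negative = upstream). Returns None if the
    window falls outside the sequence. Ties keep the most upstream k-mer.
    """
    seq = seq.upper()
    best = None
    for offset in range(window[0], window[1] + 1):
        start = tss + offset
        if start < 0 or start + width > len(seq):
            continue
        kmer = seq[start:start + width]
        score = at_fraction(kmer)
        if best is None or score > best[0]:
            best = (score, offset, kmer)
    return None if best is None else (best[1], best[2])


# Made-up sequence: an A/T-rich 8-mer placed 29 nt upstream of the TSS.
toy_promoter = "G" * 20 + "TTTATATA" + "G" * 21 + "ATGC"
toy_tss = 49
print(best_tata_candidate(toy_promoter, toy_tss))
```

In a real genome, A/T richness alone would give many false positives, so a position weight matrix built from mapped promoters would be the more usual choice.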

Archaea employ several strategies to regulate gene transcription along the transcription cycle. For example, in addition to the TATA box and BRE, there are the initiator element (INR) and the promoter proximal element (PPE), both of which have been shown to influence transcription output [54, 59]. As in eukaryotes, archaea normally harbor multiple orthologues of TBP and TFB, each with different binding motif preferences, and different combinations of TBPs and TFBs determine the level of transcription [54]. Beyond the general transcription factors, the two most studied transcription regulation strategies in archaea are regulation by specific transcription factors (TFs) and by chromatin binding proteins.

Transcription factors

Transcription factors modulate gene transcription by binding specific DNA sequences, normally near their target genes, in response to environmental stimuli such as nutrients, temperature, metal ions, etc. Upon ligand binding at the input (sensory) domain, the TF undergoes conformational changes at the output domain, or DNA binding domain (DBD), which lead to its binding to or release from the DNA. Based on their composition, TFs are mainly divided into two systems: one-component systems, in which the DBD and sensory domain are located within one protein, and two-component systems, in which the two domains are located in two separate proteins [60, 61]. In addition to the sensory domain and DBD, a typical two-component system contains a histidine kinase domain in the sensory protein and a phosphoryl group receiver domain in the response protein [60]. One-component systems are simpler and are considered precursors of two-component systems, as they are more widely distributed among prokaryotes and most protein domains in two-component systems can be found in one-component systems [62]. Archaea mainly use one-component systems for transcription regulation. Where two-component systems occur, they are mainly found in psychrophilic and mesophilic euryarchaeal species and are absent in Crenarchaeota and Korarchaeota [63, 64].

Archaea devote a smaller proportion of their protein-coding genes to TFs than bacteria do. In bacteria approximately 8% of open reading frames code for TFs, while in archaea the figure is only approximately 4%. Furthermore, archaea encode significantly smaller TFs (179 amino acids) than bacteria (236 amino acids) [48]. To compensate for the deficit of TFs, regulators can regulate each other or work cooperatively, which creates a hierarchical gene transcription network [65]. Recently, systematic models have been developed for two archaeal species (Halobacterium salinarum [66] and Methanococcus maripaludis [67]) by taking advantage of comprehensive transcriptomics and proteomics data. The networks reveal cross-regulation between TFs, and many novel TFs and interactions have been predicted. One protein-protein interaction study performed in Pyrococcus horikoshii showed that TFs can interact with each other, which implies that different TFs can form heterodimers or higher-order complexes to regulate transcription [68]. However, it is also possible that archaeal TFs have novel features that escaped discovery in this study [65]. The four most abundant TF families in archaea are ArsR (ion sensing regulators), HTH_3 (putative activator proteins), Lrp/AsnC (amino acid biosynthesis regulators) and TrmB (maltose-specific regulation), which are also represented in all investigated archaea and are homologous to bacterial counterparts [48, 69].

TFs can be activators or repressors, which usually depends on the binding location [44, 45] (Fig. 1). Repressors can occupy the TATA box or BRE, blocking TBP or TFB binding, which hampers PIC formation. In addition, they can bind downstream of the TATA box, which prevents the PIC from proceeding to the elongation phase. Binding sites for activators are normally located upstream of the BRE-TATA box region, and their binding facilitates recruitment or stabilization of the general transcription factors [65]. However, the situation can be more complex: one TF can be an activator for one set of genes and a repressor for another, or one TF can be either repressor or activator of the same gene depending on its concentration, as with the S. solfataricus TF Ss-LrpB. At low concentrations, Ss-LrpB binds regularly spaced binding sites upstream of the BRE-TATA region of its own gene, which activates transcription. At high concentrations, however, Ss-LrpB wraps DNA around itself and represses transcription of its own gene [70].
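The positional rule of thumb in this paragraph can be written down as a small classifier: a site upstream of the BRE-TATA region suggests activation, while a site overlapping or downstream of it suggests repression. The sketch below is a heuristic illustration only. The function name, the labels and the example coordinates (a BRE-TATA region spanning -36 to -23 relative to the TSS) are hypothetical, and, as the Ss-LrpB case shows, real regulators do not always follow the rule.

```python
def predict_effect(site_start: int, site_end: int,
                   bre_start: int, tata_end: int) -> str:
    """Guess a TF's effect from its binding-site position.

    All coordinates are relative to the TSS (negative = upstream).
    `bre_start`..`tata_end` spans the BRE-TATA region.
    """
    if site_end < bre_start:
        # Entirely upstream of BRE-TATA: may help recruit TBP/TFB.
        return "activator"
    if site_start > tata_end:
        # Entirely downstream of the TATA box: may block progression
        # of the PIC to the elongation phase.
        return "repressor_downstream"
    # Overlaps BRE or TATA box: may block TBP or TFB binding.
    return "repressor_overlap"


# Hypothetical BRE-TATA region for illustration.
BRE_START, TATA_END = -36, -23

for site in [(-60, -45), (-30, -20), (-15, 5)]:
    print(site, predict_effect(*site, BRE_START, TATA_END))
```

The rule takes no account of TF concentration or cooperative binding, so it can only flag a likely mode of action to test experimentally.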

DNA binding domains

The DNA binding domain (DBD) is critical for recognition and interaction between TFs and DNA and plays a key role in TF characterization and prediction. Based on profiles extracted from structures and sequences, DBDs have been classified into several families, from which TFs usually acquire their names. Helix-turn-helix (HTH), ribbon-helix-helix (RHH) and Zn-ribbon are the three most common DBDs in archaea [69].

The helix-turn-helix (HTH) motif, which is the most abundant DBD motif in archaea and bacteria, is also found in eukaryotes, although the eukaryotic versions are more distantly related according to sequence comparison [69, 71] (Fig. 2A). It functions in a variety of proteins, such as general and specific TFs, enzymes and other proteins, which are involved in diverse biological processes [72]. The core HTH domain comprises a three-helix bundle, and the C-terminal end of helix-3 is inserted into the major groove of the DNA and makes most of the contacts with DNA during binding [72]. The HTH DBD is considered one of the most ancient protein domains and has thus undergone extensive modification during evolution. Most modifications occur at the two termini of the domain, in the loop between helix-1 and helix-2, or in both locations, while the sharp turn between helix-2 and helix-3 is a characteristic feature conserved in all HTH proteins. Based on these modifications, HTH has been divided into many families, among which the winged HTH (wHTH) with 2-4 β-strands added at the C-terminal,

small molecules, including amino acids, sugars, metal ions and carbohydrates [62].

The leucine-responsive regulatory protein (Lrp) family is one of the best studied archaeal TF families, and its sensory domain mainly binds amino acids. The characterized Lrp family TFs bind different ligands: some, like FL5 and FL11 from Pyrococcus horikoshii, can interact with several different amino acids [80], while others can only bind one specific amino acid [81]. Aside from ligand binding, sensory domains in Lrp family regulators interact with each other and form higher-order oligomeric structures [44, 80], a process influenced by ligand binding, pH and protein concentration. TrmB is another metabolism-related TF family, which regulates the assimilation and metabolism of sugars such as trehalose, maltose and maltodextrin [82, 83].

Sensing temperature changes is vital for survival. To date, two TFs from hyperthermophiles, HSR1 and Phr, have been characterized [84]. Upon heat shock, both TFs relieve their repression of the transcription of several heat shock proteins. In the psychrophilic methanogen Methanococcoides burtonii, one two-component system, LtrK/LtrR, has been identified [85]. The sensory protein LtrK contains a transmembrane domain, a histidine kinase A domain and a HATPase domain, and the kinase domain exhibits higher autophosphorylation activity when the temperature is decreased from the optimal growth temperature. Although these proteins have been characterized, the exact sensing mechanisms of all of these TFs are still unknown.

Two redox sensing TFs, MsvR from methanogens and RosR from halophiles, are each unique to their group of archaea, which suggests that the strategies for maintaining redox homeostasis might be species specific [45]. MsvR utilizes a well-known thiol-based switch mechanism to sense the stress, in which cysteines in the sensory domain are oxidized to form disulfide bonds [86]. Under reducing conditions, MsvR represses its own gene and several oxidative stress response genes, such as F420H2 oxidase and rubredoxin. During oxidative stress, disulfide bonds formed within the sensory domain lead to conformational changes and release from the DNA [45, 87]. However, the sensing mechanism of RosR is still obscure, as there are no cysteines in the protein [88].

Chromatin associated proteins

The three domains of life use different DNA binding proteins to compact the genome. Bacteria use nucleoid-associated proteins to form the nucleoid, and eukaryotes employ four histone proteins, H2A, H2B, H3 and H4, to arrange the genome [89]. A wide variety of chromatin binding proteins have been identified in archaea, such as archaeal histones, Alba, Cren7, Sul7d and CC1, but their distribution and function are species dependent. Euryarchaeota mainly have homologues of the eukaryotic histone proteins H3 and H4, whereas Crenarchaeota mainly have small DNA binding proteins that are highly abundant and can bind double-stranded DNA and RNA [90]. Chromatin associated proteins are not only essential for organizing and compacting the genome, but also contribute to transcription regulation. Although we are still at the beginning of understanding the archaeal chromatin binding proteins, many interesting features shed light on their role in controlling gene expression.

Euryarchaeota encode two eukaryotic histone protein homologues, H3 and H4. They assemble into a tetramer, which is wrapped by 60 bp of DNA and forms a nucleosome structure [91, 92]. As in eukaryotic cells, the nucleosome structure is absent immediately upstream and downstream of transcription units and from several actively transcribed genes, such as ribosomal DNA [93]. This depletion hints at an interplay between transcription and histone proteins. However, archaeal histone proteins lack amino- and carboxy-terminal tails, which rules out post-translational modifications like those of their eukaryotic counterparts [90].

Another well-studied chromatin protein family is Alba, which is almost universally represented in archaea. In thermophilic archaea, Alba proteins are highly abundant and can account for as much as 4% of total cellular protein in Sulfolobus shibatae [94]. They bind DNA as homo- or heterodimers of paralogues with no apparent sequence specificity, and can form loops that bridge the DNA or stiff filaments along the DNA, depending on the concentration of Alba [90]. One of the most interesting features of Alba proteins is that they can be acetylated and deacetylated on lysine 16 by Pat (protein acetyltransferase) and Sir2 (silent information regulator 2), respectively [95]. Experiments show that non-acetylated Alba1 from Sulfolobus spp. represses transcription in vitro, while acetylated Alba1 has low affinity for DNA and has no effect on transcription. Therefore, one feasible way to control transcription in vivo is to change the acetylation state of Alba. In mesophilic euryarchaea, Alba is more of a specific transcription regulator and less of a chromatin remodelling protein [90]. A study of the mesophilic Methanococcus maripaludis showed that Alba was not abundant but bound DNA at specific sequences, and that deletion of Alba upregulated genes involved in carbon dioxide assimilation [96].

The boundary between transcription regulators and chromatin-binding proteins is not apparent in many cases, as illustrated by the Alba proteins mentioned above, which can perform dual roles in different species. There are also TFs that can modulate genome structure. Lrp regulators are universally represented in archaea and are involved in amino acid metabolism and central carbon metabolism. One Lrp family regulator from Sulfolobales, Sa-Lrp, has been characterized as a chromatin remodelling protein, since it binds DNA with low sequence specificity and introduces DNA wrapping, in addition to acting as a global gene regulator [97]. TrmB family regulators are usually involved in sugar metabolism in archaea. One TrmB-like protein, TrmBL2, was also identified as a chromatin-binding protein, which covers double-stranded or single-stranded DNA and forms thick, fibrous filaments [98, 99].

Experimental characterization of archaeal regulators employing genetically tractable archaeal systems and new technologies will bring a more in-depth understanding of the archaeal hybrid regulatory mechanisms, which combine bacterial-like regulators with a eukaryotic basal transcription apparatus.

Growth phase adaptations

Access to pure cultures is critical for physiological and molecular characterization of a species, as it provides experimental material such as cells, proteins and nucleic acids. Although only a fraction of all archaeal species have been successfully isolated and cultured in the laboratory compared to the vast number of species being sequenced, many important discoveries have been made from the cultured representatives. Continuous and batch cultures are two methods that can be used for cell culturing. The former maintains steady-state conditions for the cells by continuous inflow of new medium and outflow of cell culture [100, 101]. In the laboratory, however, cells are usually grown in batch culture in flasks, without supply of new medium or removal of culture. Unlike continuous culture, in which cells grow at a constant rate, the growth curve of a batch culture is generally divided into four distinct phases: lag phase, where cells adapt to the new environment and prepare for cell division; exponential phase, where cells divide at a constant rate; stationary phase, where the cell number is constant; and death phase, where the cell population starts declining [102].

Growth phases and the transitions between them have been extensively studied in bacteria to understand cell adaptation, stress response, and transcriptional regulation [102-107]. There are a few studies on stress response in archaea [106-109], but little is known about archaeal phase transition and stationary phase responses [110-112].

Growth phase physiology

In lag phase, cells encounter sudden changes in the growth environment, such as an upshift of nutrients, lower cell density, optimal pH and removal of toxic products. Although cells are not dividing, the lag phase is a metabolically active phase, which involves an increase in cell size, synthesis of components for cell division, and repair and de novo synthesis of macromolecules damaged in the previous culture [113]. Many factors affect the duration of the lag phase, such as inoculum volume, cell state of the inoculum, and composition of the new medium [113].

During exponential phase, the culture grows at a constant rate and cell number and mass increase exponentially. Although cells are dividing at the same rate, the culture is not synchronized, which means that it is a mix of cells at different stages of the cell cycle. Flow cytometry analysis of Sulfolobus cells in exponential growth phase shows that the majority of the cells are in the G2 phase of the cell cycle and contain two chromosomes; there are also newly generated cells that contain only one chromosome and cells that are replicating their chromosome [114]. The distribution of cells over the different cell cycle phases in an exponential phase culture depends on the cell cycle characteristics of each specific strain. Whereas Sulfolobus cultures have the majority of cells in G2 phase because the cells spend more than half of the cell cycle in G2 [115], cultures of the thaumarchaeon Nitrosopumilus maritimus contain more cells with one chromosome [116].
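
The constant growth rate during exponential phase can be made concrete with a short calculation. The following is a minimal sketch, not taken from the cited studies: the function names and the example cell counts are assumptions chosen for illustration. It estimates the doubling time from two cell counts measured during exponential growth, using N(t) = N0 · 2^(t/Td).

```python
import math


def doubling_time(n0: float, n1: float, elapsed_hours: float) -> float:
    """Doubling time (hours) from two cell counts taken during exponential growth."""
    if n0 <= 0 or n1 <= n0 or elapsed_hours <= 0:
        raise ValueError("expected 0 < n0 < n1 and elapsed_hours > 0")
    # N(t) = N0 * 2**(t / Td)  =>  Td = t * ln(2) / ln(N1 / N0)
    return elapsed_hours * math.log(2) / math.log(n1 / n0)


def cells_after(n0: float, doubling_hours: float, elapsed_hours: float) -> float:
    """Predicted cell count after elapsed_hours of unrestricted exponential growth."""
    return n0 * 2 ** (elapsed_hours / doubling_hours)


# Hypothetical example: a culture that grows from 1e7 to 8e7 cells/ml in 18 h
# has gone through three doublings, which corresponds to a doubling time of about 6 h.
td = doubling_time(1e7, 8e7, 18.0)
```

The relation only holds while growth is truly exponential; counts taken in lag or stationary phase would give misleading doubling times.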

The rapid growth in exponential phase results in depletion of nutrients or accumulation of toxic by-products, which leads to cessation of growth, and the cells enter stationary phase. During stationary phase, cells are reprogrammed to survive adverse conditions, for example through changes in cell morphology and diversion of more energy to homeostasis [117]. Bacterial cells become smaller in stationary phase as a result of reductive division and dwarfing [103]. This decrease in cell size has been observed in Sulfolobus as well, but the mechanism behind it is not known. Flow cytometry analysis of Sulfolobus also showed that there are still newborn cells in early stationary phase, but that later all cells are kept in G2 phase with two chromosomes [114].

When culture conditions deteriorate further, the cells can no longer maintain their integrity and the culture eventually enters the death phase, where the cell number declines.

Transition to stationary phase in bacteria

The stringent response is a well-studied survival mechanism of bacteria when they encounter stress or starvation [118]. Its characteristic features include downregulation of rRNA and ribosomal proteins and upregulation of stress proteins and amino acid biosynthesis [102, 119]. The response is mediated by the alarmones guanosine tetra- and pentaphosphate, (p)ppGpp, which accumulate during the transition from exponential growth phase to stationary phase [120]. However, the stringent response mechanisms differ between Gram-negative and Gram-positive bacteria [121, 122]. In Gram-negative bacteria, (p)ppGpp binds directly to RNAP subunits. In coordination with another RNAP-binding protein, DksA, (p)ppGpp destabilizes the RNAP-DNA open complex with various outcomes: genes encoding rRNA, tRNA, and ribosomal proteins are downregulated, while genes coding for amino acid biosynthesis, nutrient acquisition and stress response are upregulated [123, 124]. In Gram-positive bacteria, (p)ppGpp cannot bind RNAP directly, but instead binds specifically to two enzymes involved in the GTP synthesis pathway and inhibits GTP synthesis [125]. Decreased levels of cellular GTP relieve repression by the global TF CodY, which leads to transcription of more than 80 genes encoding proteins involved in stress responses such as sporulation [126]. (p)ppGpp also inhibits ribosome assembly by inhibiting the associated assembly GTPases in Gram-positive bacteria [127].

Sigma factors in bacteria determine RNA polymerase specificity and have significant roles in survival during stress and starvation. There are normally several sigma factors with different promoter specificities in bacteria that compete for binding to the core RNA polymerase [102]. In E. coli there are seven sigma factors, including one housekeeping factor, σ70, and six additional stress response sigma factors that can direct the RNA polymerase to different promoter regions according to external stimuli. Sigma factor S (σS) in E. coli is a master regulator that is upregulated several fold upon entry into stationary phase and affects 10% of the E. coli genes, which makes the cells more adapted and resistant to stationary phase conditions [128]. There are also other sigma factors, responsible for motility and heat shock response, which are upregulated in stationary phase [102].

The nucleoid structure also changes during growth, depending on the levels of a variety of nucleoid-associated proteins (NAPs), which not only have structural functions but are also implicated in transcription regulation [122]. In stationary phase, the conserved bacterial DNA-binding protein Dps is induced and becomes the most abundant protein [129]. Upon binding of Dps, the nucleoid forms a condensed complex, which protects the DNA from a variety of stresses, such as radiation and heat shock [130]. Other NAPs, like the histone-like protein H-NS and integration host factor (IHF), also show higher abundance in stationary phase [122]. In contrast, the factor for inversion stimulation (Fis) is the most abundant NAP in E. coli in exponential phase, but is almost absent in stationary phase [129].

Global TFs play significant roles in adaptation and stress response as well. In a role similar to that of the aforementioned CodY in Gram-positive bacteria, the E. coli TF Lrp regulates more than 400 (10%) stationary phase-related genes, most of which are activated [122]. The activated gene products, involved for example in amino acid biosynthesis, transport systems and catabolism, enable the cell to survive the unfavourable conditions. Transcription of lrp itself is induced by (p)ppGpp, and there are substantial overlaps between Lrp-responsive genes and genes regulated by (p)ppGpp and σS, which create an extensive network of growth phase transcription control [131].

Transition to stationary phase in Archaea

Four studies have focused on stringent control in archaea [112, 132-134]. None of the tested strains, from both Euryarchaeota and Crenarchaeota, had detectable (p)ppGpp. However, all strains exhibited stringent control of stable RNA (rRNA and tRNA), whose expression was repressed upon starvation [112]. As there is no (p)ppGpp present, the stringent response must be mediated by another, unknown mechanism.

As in bacteria, archaeal DNA-binding proteins and their effects on transcription are also growth phase dependent. Halobacterium salinarum contains several NAPs and one histone protein, HpyA. Their expression differs during growth, with the expression of dpsA and mc1 increasing in stationary phase while that of hpyA decreases [135]. Deletion of hpyA had a greater impact on stationary phase cells, with 220 genes showing significantly different expression compared with only 37 genes in exponentially growing cells [135]. The histone protein HTz in Thermococcus zilligii exhibits a similar trend to hpyA: the protein level of HTz decreases rapidly from early stationary phase and is completely undetectable in late stationary phase [136].

Model organisms for hyperthermophiles

Sulfolobus species belong to the phylum Crenarchaeota and are known as hyperthermoacidophiles that grow optimally at 65-85 °C and pH 2-3. They are found globally in hot springs and solfataric fields. Sulfolobus acidocaldarius and Sulfolobus solfataricus are two of the most studied species in the genus and have been used as model organisms in many seminal studies of, for example, the cell cycle, metabolism, DNA repair, transcription, translation, and replication [137]. S. acidocaldarius DSM639 was isolated from a hydrothermal spring in Yellowstone National Park in 1972 and was the first hyperthermophilic microorganism found [138]. S. solfataricus was first discovered at Solfatara near Naples in Italy in 1975 [139]. As their names indicate, both species have an irregular lobed cell shape (Fig. 3A) and can oxidize elemental sulfur heterotrophically or autotrophically to obtain energy. To adapt to the elevated temperature and acidic environment, Sulfolobus spp. mainly have tetraether monolayers in their cytoplasmic membrane, and the outer cell wall consists of a surface layer (S-layer) with glycoproteins [140, 141] (Fig. 3B). Adhesive pili and archaella are also observed on the surface of the cells [142] (Fig. 3C).

means the proteins can be feasibly expressed in mesophilic strains, such as Escherichia coli [145]. The stability usually also facilitates crystallization of the proteins, and many important results have been obtained by studying these two strains. For example, the first archaeal RNA polymerase structure was from S. solfataricus, which made a comparison between RNA polymerases from all three domains of life possible [146].

Third, the complete genome sequences of S. solfataricus and S. acidocaldarius were published in 2001 and 2005, respectively [137, 147]. S. acidocaldarius has a 2.3 Mbp circular genome encompassing 2292 predicted protein-coding genes, while S. solfataricus has a larger genome of about 2.9 Mbp with 3217 protein-coding genes. The availability of the genome sequences opens up great opportunities for genome-wide studies. For example, comparative genomics assisted predictions of gene function and revealed vast similarities between the transcription apparatuses of archaea and eukaryotes, and whole-genome microarray studies yielded valuable information on the archaeal cell cycle. However, nearly 40% of the predicted genes are hypothetical genes without functional annotation, which bear great potential for more exciting discoveries.

Fourth, Sulfolobus is the only genus in Crenarchaeota for which genetic tools are available [148]. The lack of peptidoglycan in the cell wall and the harsh culture conditions of the strains hamper the development of genetic tools for these organisms, as most antibiotics are not applicable in such situations. To construct knockout mutants, auxotrophic selection based on uracil or lactose has commonly been used in Sulfolobus [149]. A number of different shuttle vectors using Sulfolobus viruses or cryptic plasmids as backbone are available for reporter gene assays, recombinant protein overexpression, and construction of deletion mutants. Recently, genetic manipulation utilizing the intrinsic CRISPR type I and III systems in S. islandicus has been achieved and has shown potential for application in other archaea and bacteria that encode an active CRISPR-Cas system [150].

Methods used in the studies

Generation of knock-out strains

Since most antibiotics are not suitable as selection markers for Sulfolobus, a common approach to overcome this hurdle is to use a metabolically deficient mutant as the recipient strain [151]. A S. solfataricus lactose-deficient strain with an insertion in the lacS gene is widely used as a recipient strain for shuttle vectors [152]. The auxotrophic strain is unable to grow with lactose as sole carbon and energy source and reintroducing of the gene variant into the cell, transi-

have been constructed by fusing the Sulfolobus virus SSV1 or the plasmid pRN1 with standard E. coli plasmids and used in different studies of Sulfolobus [156, 157]. pMJ03 is one of the widely used vectors for reporter assays and protein overexpression in S. solfataricus. It was constructed by ligating the complete SSV1 genome with the pUC18 E. coli vector, and it also includes the reporter gene lacS under the control of the heat-responsive promoter of the alpha subunit of the thermosome (tf55a) and the selection marker genes pyrEF [156]. The vector is integrated into the host chromosome at a specific position and stably maintained under selection pressure. This shuttle vector was later improved by replacing the tf55a promoter with an arabinose-inducible promoter to avoid heat stress during protein overexpression. In addition, a series of shuttle vectors with the plasmid pRN1 as backbone have been successfully constructed and used in S. acidocaldarius for reporter assays and protein overexpression [153]. These shuttle vectors are stable and do not integrate into the chromosome, and their copy number ranges from 2 to 8 per cell [157]. To construct deletion mutants, two methods (Fig. 4) have been employed. One uses lactose for selection: insertion of lacS into the target gene complements the auxotrophy and disrupts the target (Fig. 4A). As S. acidocaldarius does not have lactose transporters, a method based on homologous recombination has been established for this species (Fig. 4B).

Chromatin immunoprecipitation

Protein-DNA interactions are critical for many fundamental cellular processes, such as transcription, replication, and translation. Chromatin immunoprecipitation (ChIP) assays were first reported in 1984 for studies of the binding between RNA polymerase and its target DNA [158]. Today, ChIP is still a powerful tool and is widely used to study in vivo protein-DNA interactions. It has also been adopted successfully in archaeal research, and a subset of TFs have been studied, such as Lrp [159-162] and TrmB family TFs [82, 83, 161]. The ChIP assay is generally conducted in four steps: crosslinking of the protein and DNA, fragmentation of the DNA, immunoprecipitation of the protein-DNA complex with specific antibodies, and DNA purification and downstream analysis (Fig. 5).

Protein-DNA interactions are dynamic, and in order to capture transient contacts the complex must first be stabilized. Ultraviolet (UV) light and formaldehyde are the two main agents that can introduce covalent bonds between a protein and its nucleic acid target sequence. UV irradiation forms irreversible bonds between protein and nucleic acid, so protease treatment has to be used to separate the DNA from the DNA-protein complex during DNA purification [163]. Formaldehyde crosslinking is, on the other hand, reversible and more efficient, which means that the complex can easily be disrupted by heating [164] and that the crosslinking time is shorter. The drawback of formaldehyde fixation is that it also induces crosslinks between proteins. Further, it is possible to perform ChIP without the crosslinking step when the interaction between the protein and DNA is highly stable, as for DNA and histone proteins [165].

Figure 5. Workflow of ChIP and downstream analysis methods. Cells are first crosslinked to maintain the in vivo interaction between proteins and DNA. After fixation, sonication is used to break the cells and fragment the chromatin to the preferred size (0.3-1 kb). Antibodies recognizing the specific protein are then mixed with the cell lysate, and the target protein and the DNA crosslinked to it are precipitated with the antibody. The protein-DNA complex obtained from immunoprecipitation is purified and analysed by various methods, such as PCR, microarray, and next generation sequencing. Figure adapted from [166].

Fragmentation can be achieved by either sonication or restriction enzyme cleavage. The choice of fragment length depends on the downstream application. Longer fragments (0.3-1 kb) are preferred for PCR and microarray analysis [167], whereas fragments with an average size of 300 bp are recommended for next generation sequencing.

After fragmentation, immunoprecipitation can be performed. The protein-DNA complex is recognized by protein-specific antibodies and isolated from the cell lysate with the help of agarose beads or magnetic beads to which the antibodies can bind [167].

Thereafter, the DNA fragments are purified and can be used in a variety of downstream analyses. PCR and quantitative PCR are used to confirm interaction between the target protein and a known region of DNA. To obtain genome-wide binding information, microarrays or, more commonly, next generation sequencing can be used [82, 168].

High throughput sequencing

It has been 40 years since the first generation DNA sequencing method, Sanger sequencing, was introduced [169]. The Sanger sequencing methodology dominated the sequencing market for 30 years until 2005, when massively parallel DNA sequencing, or next generation sequencing (NGS), appeared and soon took over. Nowadays, a draft human genome can easily be obtained for $1000 within days by NGS technologies, whereas the first human genome, published in 2001, took scientists from all over the world 10 years to complete and cost hundreds of millions of dollars [170]. Several sequencing platforms based on different techniques have been commercially available, including 454 sequencing (2005), Solexa/Illumina (2006), SOLiD (2007), Helicos (2009) and Ion Torrent (2010) [171]. However, after years of competition, Illumina dominates the market with a share of more than 80%, and its newly released NovaSeq 6000 system promises to produce up to 10 billion reads in a single run. Apart from Ion Torrent, all the other platforms have been retired from the market [172, 173].

For the Illumina system, sequences are produced in three steps: DNA library preparation, clonal amplification and sequencing [174]. During DNA library construction, DNA samples are fragmented and different adaptors are ligated to each end of the fragment using PCR amplification (Fig. 6A). Denatured DNA fragments will anneal to complementary oligonucleotides immobilized on the solid surface of the flow cell. After annealing, the ssDNA is clonally amplified by bridge amplification (Fig. 6B) and several millions of non-overlapping clusters will be formed in one single lane on a flow cell.

labelled with unique fluorophores, respectively. The fluorescence from the newly incorporated nucleotides is captured by the imaging system in each cycle (Fig. 6C). After imaging, the terminator group and fluorescent dyes are chemically removed to allow the next cycle of synthesis and imaging. The images are recorded and processed so that the sequences of millions of DNA templates (clusters) are obtained in parallel. The maximum read length, which equals the number of synthesis cycles, is currently limited to 300 bp, as longer reads give unacceptably high error rates [172].

A direct application of NGS is to obtain whole genome sequences of organisms, which has great implications for various research areas, such as metagenome sequencing of environmental or human samples, evolution of species, and the genetic basis of diseases [177]. Many other protocols adapting NGS are widely used. One of the most popular applications is RNA sequencing (RNA-seq) [178], which uses total RNA instead of genomic DNA as starting material for transcriptome analysis. The sequencing can be done in a strand-specific manner to differentiate between the sense and antisense strands. Several modified RNA-seq protocols are also used to infer transcription start sites (TSS) and for discovery and profiling of small noncoding RNAs. To map TF binding sites across the genome, chromatin immunoprecipitation followed by NGS (ChIP-seq) is the most reliable method [179]. Precise information at the whole genome level, obtained by diverse NGS-based approaches, has brought biological research to a new horizon.

NGS produces an immense amount of data in the form of short reads ranging from 50 to 400 bp, so the development of software is as important as the sequencing technologies themselves [172]. Fortunately, there are many open source and commercial software packages available for different analyses, such as sequence alignment, de novo genome assembly, mutation screening and copy number analysis. The software is also rapidly updated and is becoming easier to use for users with less bioinformatics background, which in turn promotes the application of NGS technologies [169].
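
To give a sense of the data volumes involved, the expected sequencing depth can be estimated as coverage = (number of reads × read length) / genome size. The following is a minimal sketch of that arithmetic. The read count, read length and target depth are assumptions chosen for illustration only; the genome size is the approximate 2.3 Mbp of S. acidocaldarius given earlier in the text.

```python
import math


def expected_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average number of times each base is sequenced (ignores mapping losses)."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


GENOME_BP = 2_300_000  # approximate S. acidocaldarius genome size (2.3 Mbp)

# Hypothetical run: one million single-end reads of 100 bp.
coverage = expected_coverage(1_000_000, 100, GENOME_BP)

# Hypothetical target: 50-fold average coverage with 100 bp reads.
n_reads_for_50x = reads_needed(50, 100, GENOME_BP)
```

These are averages over the whole genome; real coverage is uneven, and in ChIP-seq or RNA-seq the reads are deliberately concentrated in enriched or expressed regions.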

Aim of the thesis

This thesis consists of two projects with the aim to expand our knowledge of the hyperthermophilic model organisms S. acidocaldarius and S. solfataricus.

In Project I, two TFs from S. acidocaldarius have been studied in order to bring more insight into their roles in transcription regulation.
Paper 1: Genome-wide binding profiles of BarR
Paper 2: Characterization of FadR, a fatty acid metabolism regulator

In Project II, the global gene transcription changes of S. solfataricus during growth phase transition were investigated to bring more basic knowledge into the field of archaea cell biology (Paper 3).

Project Summary

Project I. Characterization of archaeal TFs

Paper 1 The genome-scale DNA-binding profile of BarR, a β-alanine responsive transcription factor in S. acidocaldarius

BarR from S. acidocaldarius has been identified as a β-alanine responsive activator in a previous study [81]. This is a novel Lrp regulator, as all other members of the Lrp family use α-amino acids as ligands. That study also showed that BarR binds to the intergenic region between barR and Saci_2137 (putative aminotransferase) and that the binding activates transcription of Saci_2137. To determine whether BarR has other target genes in the genome, ChIP-seq was used to reveal its genome-wide binding profile. Since β-alanine is the specific ligand of BarR, ChIP-seq was performed on cells cultured with or without exogenously added 10 mM β-alanine.

Genome-wide binding of BarR

In addition to the previously identified barR/Saci_2137 intergenic region, 20 more binding regions were identified across the genome with ChIP-seq under the two culture conditions. There is no apparent difference in gene expression profiles between the two culture conditions, which suggests that ligand binding does not strongly affect DNA binding by BarR in vivo.

ChIP-seq confirmed the in vivo association of BarR with the barR-Saci_2137 intergenic region that is responsible for autoregulation and for regulation of the aminotransferase expression. However, in addition to this intergenic region, binding extends into the coding sequence of the BarR target gene Saci_2137, resulting in a complex binding profile with three peak summits with similar motif sequences. Possibly, protein-protein interactions between several BarR molecules bound at these different sites result in the formation of a higher-order nucleoprotein structure.

Regulons of BarR

Quantitative reverse transcriptase-PCR was performed to investigate the effect of BarR deletion on the expression of the potential target genes. Only one additional operon, Saci_2320 and Saci_2321, encoding glutamate synthase enzymes, was significantly downregulated in ΔbarR versus wt in the presence of 10 mM β-alanine. The operator region of this operon is a hot spot for Lrp family TFs, as a previous study showed that the glutamine-responsive TF Sa-Lrp also binds here but has no effect on gene expression.

In summary, we obtained a genome-wide binding profile of BarR and found one additional BarR target operon, encoding glutamate synthase enzymes, which is also a binding target for another Lrp family TF, Sa-Lrp.

Paper 2 A TetR family regulator, FadR, regulates fatty acid metabolism in S. acidocaldarius

Archaeal cell membranes consist of isoprenoid chains instead of fatty acids, but small amounts of fatty acids have been detected in several archaea, including Sulfolobus strains. Their function and metabolic pathways in archaea are still unclear. S. acidocaldarius contains a gene cluster, Saci_1103-Saci_1126, encoding a wide range of fatty acid degradation-related genes, such as β-oxidation enzymes, putative acetyl-CoA acetyltransferase enzymes and lipid degradation genes. There is also a gene (Saci_1107) encoding a TetR family TF, which we later renamed FadRSa based on its structure and its involvement in fatty acid degradation in S. acidocaldarius.

Structure of the FadRSa

The crystal structure of FadRSa was determined to 2.4 Å resolution. In solution, the protein forms a homodimer and each subunit contains two functional domains: an N-terminal HTH DNA-binding domain and a C-terminal dimerization and ligand-binding domain. The structure closely resembles those of the FadR TFs from Thermus thermophilus and Bacillus sp., which repress genes involved in β-oxidation in the respective strains.

Genome-wide binding and regulon of FadRSa

ChIP-seq was used to reveal binding sites of FadRSa across the genome. A total of 14 peaks were identified, of which the highest were observed within the Saci_1103-Saci_1126 gene cluster. The two highest binding regions were located within the intergenic region of a divergent operon encoding the fadRSa gene itself and a putative esterase-encoding gene. Two additional peaks appeared within the coding sequence of gene Saci_1115 and in the intergenic region between a fatty acid β-oxidation operon and a copG transcription factor gene.

TF binding does not necessarily mean regulation. To infer whether or not FadRSa regulates the genes near its binding sites, RNA-seq was performed on the wt strain and the fadRSa deletion mutant. The two strains did not display differences in growth rate or cell morphology. Gene expression analysis revealed that all genes in the Saci_1103-Saci_1126 gene cluster were induced in the fadRSa deletion mutant, which indicates that FadRSa is a local repressor of this entire gene cluster. Although FadRSa also binds other genomic regions in vivo, no altered transcription of genes in these regions was observed in the RNA-seq analysis.
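
The distinction between binding and regulation amounts to intersecting two gene lists: genes located near ChIP-seq peaks and genes that are differentially expressed in the deletion mutant. The following toy sketch illustrates that logic only. The function name and the gene names are placeholders (assumptions), not real loci or results from the papers.

```python
def classify_targets(bound_genes, differentially_expressed):
    """Split genes by ChIP-seq binding and RNA-seq response in the deletion mutant."""
    bound = set(bound_genes)
    deg = set(differentially_expressed)
    return {
        # bound by the TF and changed in the mutant: candidate direct targets
        "bound_and_regulated": sorted(bound & deg),
        # bound by the TF but unchanged in the mutant: binding without regulation
        "bound_only": sorted(bound - deg),
        # changed in the mutant without nearby binding: possibly indirect effects
        "regulated_only": sorted(deg - bound),
    }


# Placeholder gene names, for illustration only.
result = classify_targets(
    bound_genes=["gene_A", "gene_B", "gene_C"],
    differentially_expressed=["gene_B", "gene_C", "gene_D"],
)
```

In practice both input lists depend on thresholds (peak calling cut-offs, fold change and significance limits), so the size of each category shifts with those choices.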

FadRSa-ligand interactions

Electrophoretic mobility shift assays (EMSAs) demonstrated that acyl-CoA molecules are specific ligands for FadRSa and that ligand binding can dissociate FadRSa from DNA. The co-crystal structure of FadRSa bound to its ligand (lauroyl-CoA) was solved at 1.90 Å resolution. Although FadRSa has a similar structure to its bacterial counterparts, the binding mode of the ligand was completely different. Instead of entering the binding pocket from within the dimer interface, lauroyl-CoA entered FadRSa from the outside. Upon binding, the distance between the two major DNA-binding helices (α3) of the HTH DBDs in the homodimer increased, which prevented FadRSa from binding to the DNA.

Mechanisms of DNA binding

FadRSa possesses the classic HTH DBD, with the recognition helix α3 from each subunit of the homodimer interacting with two neighbouring major grooves of the DNA. A FadRSa-DNA co-crystal structure was obtained and revealed a large number of residues involved in the interaction. These residues were substituted with alanine and tested in EMSA, which showed that substitutions of residues Y47, Y52 and Y53 all had negative effects on DNA binding, while G48 was more crucial, as the FadRSa G48A mutant had no ability to bind DNA.

In summary, we found that FadRSa is a structural and functional homologue of bacterial FadR regulators belonging to the TetR family, which repress β-oxidation of fatty acids. Acyl-CoA molecules are specific ligands for FadRSa and its homologues, but the ligand binding mode and the affinity for ligands of different lengths differ. RNA-seq showed that FadRSa is a local repressor of an entire lipid and fatty acid metabolism-related gene cluster, Saci_1103-Saci_1126, in which four binding regions were observed by ChIP-seq analysis and confirmed with EMSA.

Project II. Transcription responses during growth phase transition

Paper 3 Genome-wide transcription responses during transition from exponential growth to stationary phase in the archaeon Sulfolobus solfataricus

Transcriptome data of S. solfataricus at four different time points during growth (early exponential phase, late exponential phase, early stationary phase and late stationary phase) were studied and revealed a massive change in transcription profiles during growth phase transition. Out of a total of 2978 protein-coding genes, 1067 (35.8%) were identified as differentially expressed: upregulated (456 genes), downregulated (464 genes) and fluctuating (147 genes). The functions of the differentially expressed genes (DEGs) were investigated using the archaeal Clusters of Orthologous Genes (arCOGs) function categories.
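
The reported proportions can be checked with a few lines of arithmetic. The sketch below uses only the gene counts stated above; the variable names are illustrative, and the per-category shares are derived values that are not reported in the text.

```python
TOTAL_GENES = 2978  # protein-coding genes included in the analysis

deg_counts = {
    "upregulated": 456,
    "downregulated": 464,
    "fluctuating": 147,
}

# Total number of differentially expressed genes (DEGs).
total_deg = sum(deg_counts.values())

# Percentage of all protein-coding genes that are differentially expressed.
percent_deg = round(100 * total_deg / TOTAL_GENES, 1)

# Share of each expression pattern among the DEGs, in percent (derived here).
shares = {name: round(100 * n / total_deg, 1) for name, n in deg_counts.items()}
```

The three categories sum to 1067 genes, matching the stated total, and 1067 of 2978 corresponds to the reported 35.8%.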

Most of the DEGs belonging to arCOGs J (Translation), D (Cell cycle control), T (Signal transduction), M (Cell wall) and F (Nucleotide transport and metabolism) were clearly downregulated during growth.

Upregulated genes belonged mostly to arCOGs L (Recombination and repair), N (Cell motility), U (Intracellular trafficking), O (Post-translational modifications), X (Mobilome), I (Lipid transport and metabolism), P (Inorganic ion transport and metabolism) and Q (Secondary metabolites transport and metabolism).

The arCOGs K (Transcription), C (Energy production and conversion) and G (Carbohydrate transport and metabolism) contained approximately equal proportions of up- and downregulated genes. Metabolic pathways showed clear switches dependent on the growth phase. For example, gluconeogenesis and the reverse ribulose monophosphate pathway were shut down, whereas pyruvate and acetyl-CoA were instead actively synthesised from acetate. Genes involved in the electron transport chain were downregulated, while two quinol oxidase complexes showed opposite profiles: the SoxB complex showed continuous downregulation, but the SoxM complex was upregulated in stationary phase.

In summary, significant changes were observed during the growth of S. solfataricus, and careful analysis of the data will deepen our understanding of the growth phase transition in archaea. The data obtained may also help us to identify the signalling pathways that respond to the environmental signals accompanying growth phase transition. In addition, a comparison of the transcriptomes of the two related species S. solfataricus and S. acidocaldarius during growth phase transition will illustrate the differences in cell biology between the two species.

Future perspective

Two archaeal TFs, BarR and FadRSa, have been studied. Genome-wide identification of their binding sites by ChIP-seq revealed that only a small proportion of the sites had a regulatory function, which is consistent with other ChIP-seq studies showing that many specific TF binding events in the genome have no regulatory function [180]. The BarR binding region of the two target genes, barR and Saci_2137, extended into the coding sequence of Saci_2137 and displayed three evenly spaced peaks, which indicates a higher-order protein-DNA complex. For FadRSa, four binding peaks were observed in a gene cluster (Saci_1103-Saci_1126), and all 23 genes were repressed by FadRSa, as shown by the RNA-seq data. The repression mechanism is unknown, but it may involve long-distance interactions and loop formation. Further structural studies would provide more details about the interactions between DNA and TFs.

The goal of the TF studies is to construct gene regulatory networks (GRNs) in archaea in order to understand their dynamic transcription responses. The studies of both BarR and FadRSa point to interactions between TFs. The newly identified glutamine synthase operon that is activated by BarR is also a common target of several other Lrp regulators, and the expression of barR itself is under the control of Sa-Lrp. The gene cluster repressed by FadRSa contains one CopG family TF with unknown targets. To construct GRNs, functional characterization of the target genes is crucial. Only two of the 23 genes regulated by FadRSa have been characterized experimentally; although the β-oxidation genes are well annotated, their activity remains to be tested. The work included in this thesis represents only the tip of the iceberg of the complete GRN in archaea, and substantial work is needed to fill in the gaps.

The second project describes the transcription responses during batch culture growth of S. solfataricus. Significant alterations in transcription were observed at the transition from exponential phase to stationary phase, during which many factors, such as pH, nutrients and cell density, change in parallel. We cannot pinpoint all the events that take place, but making the data publicly available will enable the archaeal research community to make the best use of them. Almost one-third of the differentially expressed genes are annotated as hypothetical; continuous updating of annotation databases, improved methodology and experimental characterization will provide further valuable information on cell physiology and adaptation mechanisms.

Sammanfattning (Swedish summary)

Organisms within the domain Archaea are found all over our planet and represent many different fascinating life forms. The genus Sulfolobus belongs to the phylum Crenarchaeota and includes hyperthermoacidophilic species that grow optimally at 65-85 °C and pH 2-3. These organisms are used as model organisms for thermophiles to investigate DNA replication, transcription, translation, the cell cycle, etc. in archaea.

The focus of this thesis is the characterization of two transcription factors (TFs) from the hyperthermoacidophilic archaeon Sulfolobus acidocaldarius, together with the changes in the transcriptome of the closely related Sulfolobus solfataricus at the transition from exponential growth phase to stationary phase. The aim is to increase our knowledge of transcription regulation and of adaptation to different growth phases in archaea.

In Paper 1 we identified binding sites for BarR in the genome of S. acidocaldarius. BarR is a β-alanine-responsive TF of the Lrp family that activates the expression of β-alanine aminotransferase. Using chromatin immunoprecipitation followed by sequencing (ChIP-seq), 21 binding regions could be identified, including the previously characterized barR/Saci_2137 intergenic region. Only one additional operon, consisting of two glutamine synthase genes (Saci_2320 and Saci_2321), was shown to be activated by BarR. This operon is a common target of LyM and Sa-Lrp, which suggests that transcription factors of the Lrp family can interact with each other and form networks, thereby making regulation more specific.

In Paper 2 we showed that a transcriptional repressor of the TetR family, FadRSa, regulates fatty acid metabolism in S. acidocaldarius. fadRSa is located in a cluster of genes, Saci_1103-Saci_1126, that mainly contains genes responsible for lipid degradation and fatty acid metabolism. ChIP-seq revealed four binding sites within this gene cluster, and RNA sequencing confirmed that the expression of all genes in the cluster was repressed by FadRSa. FadRSa binds DNA at a 16-base-pair motif with dyad symmetry, and binding of medium- to long-chain acyl-CoA molecules to FadRSa resulted in dissociation from the DNA. Although FadRSa resembles its bacterial counterparts both functionally and structurally, a fundamentally different mode of ligand binding was observed.

In Paper 3, transcriptome data from four different growth phases of S. solfataricus (early exponential phase, exponential phase, early stationary phase and late stationary phase) were evaluated. The results showed a change in the expression of a large number of genes as the cells pass from exponential growth phase to stationary phase. Of a total of 2978 protein-coding genes, 1067 (35.8%) were identified as differentially expressed. These comprised 456 induced genes, mainly related to transposases, metabolism and stress response; 464 repressed genes, mainly involved in translation, basal transcription, DNA replication, amino acid metabolism and defence mechanisms; and 147 remaining genes that showed a fluctuating profile and are mainly associated with transporters, oxidation-reduction processes and a few metabolic genes.

In summary, the studies of the two transcription factors in S. acidocaldarius, BarR and FadRSa, have contributed to increased knowledge of their function and regulatory mechanisms. The transcriptome data from S. solfataricus not only show changes in gene expression at the transition between growth phases, but also provide a rich source of information for further studies of the cell biology of archaea.

Acknowledgment

I am deeply grateful to my late supervisor Rolf Bernander for accepting me as a PhD student and guiding me into the fascinating world of Archaea. His critical thinking, sense of humor and passion for research will be remembered forever.

I want to sincerely thank my supervisor Ann-Christin Lindås. It would have been impossible for me to finish this thesis without your consistent guidance and encouragement. Your positive attitude, kindness and enduring faith not only keep the laboratory running, but have also been enlightening and encouraging to me.

I would like to thank my co-supervisor Marie Öhman and my external examiner Per Ljungdahl for your suggestions and help throughout my PhD years.

My special thanks go to my collaborator Eveline Peeters for your great effort in our collaboration. Your scientific knowledge, organization and inspiring discussions made our projects run smoothly and productively. Sincere thanks also go to your PhD students, Han Liu in the BarR project and David Sybers, Liesbeth Lemmens and Hassan Ramadan Maklad in the FadR project, for their great contributions to the projects. I would also like to thank the other people involved in the FadR project: Bettina Siebers, Xiaoxiao Zhou and Christopher Bräsen, thank you for your contributions to the project and the manuscript. Karin Valegård, thank you for your great work and amazing protein structures.

I would like to thank Magnus Lundgren and Anna Knöppel for your substantial input to the Exp-Stat project. Many thanks to Erik Pelve for teaching me my first Linux command; your homemade quizzes really opened the door to Perl for me.

I always feel blessed to be a member of the Archaea group, and I am grateful to everyone in the group. Mattias, it is always enjoyable to chat with you; your cakes and party costumes are the best! Fredrik, it is great that we share a passion for green tea, dumplings and badminton, and many thanks for saving me from the poster printer, twice. Mohea, you are amazing: your energy is infinite and your laugh is penetrating. I will not forget your encouragement and support. Sabeen, I am glad to share the office with you and to hear about your children’s kindergarten life.

I want to express my gratitude to all my friends in our Lantis lunch group and the Frescati badminton group. Thanks to Xiongzhuo Tang, Yunpo Zhao, Xiao Wang, Simei Yu, Chen Hou, Lei Cheng, Ning Sun, Sifang Liao, Lidi Xu, Xiaoze, Liqun Yao, Wenjing Kang, Xin Li, Jia Sun, Yue Tang, Jiaxin Li, Xueping Sun. The Lantis food became delicious in your company. I will remember the wonderful time we have spent together over the years.

A million thanks to everyone at MBW, especially the people in the E5 and F4 corridors, for being so friendly and kind to me.

I would like to thank my parents, brother and sister-in-law for your endless love and support.

Finally, I want to thank my girlfriend, Wei, for all the fights and love.

References

1. Fox, G.E., et al., Classification of methanogenic bacteria by 16S ribosomal RNA characterization. Proc Natl Acad Sci U S A, 1977. 74(10): p. 4537-41. 2. Woese, C.R. and G.E. Fox, Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci U S A, 1977. 74(11): p. 5088-90. 3. Zillig, W., K.O. Stetter, and D. Janekovic, DNA-dependent RNA polymerase from the archaebacterium Sulfolobus acidocaldarius. Eur J Biochem, 1979. 96(3): p. 597-604. 4. Kandler, O. and H. Konig, Chemical composition of the peptidoglycan-free cell walls of methanogenic bacteria. Arch Microbiol, 1978. 118(2): p. 141-52. 5. Woese, C.R., O. Kandler, and M.L. Wheelis, Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A, 1990. 87(12): p. 4576-9. 6. Albers, S.V., et al., The legacy of Carl Woese and Wolfram Zillig: from phylogeny to landmark discoveries. Nat Rev Microbiol, 2013. 11(10): p. 713-9. 7. Takai, K., et al., Cell proliferation at 122 degrees C and isotopically heavy CH4 production by a hyperthermophilic methanogen under high-pressure cultivation. Proc Natl Acad Sci U S A, 2008. 105(31): p. 10949-54. 8. Schleper, C., G. Jurgens, and M. Jonuscheit, Genomic studies of uncultivated archaea. Nature Reviews Microbiology, 2005. 3(6): p. 479-488. 9. Probst, A.J., A.K. Auerbach, and C. Moissl-Eichinger, Archaea on Human Skin. PLoS One, 2013. 8(6). 10. Moissl-Eichinger, C., et al., Human age and skin physiology shape diversity and abundance of Archaea on skin. Sci Rep, 2017. 7. 11. Delong, E.F., et al., High Abundance of Archaea in Antarctic Marine Picoplankton. Nature, 1994. 371(6499): p. 695-697. 12. Brochier-Armanet, C., et al., Mesophilic Crenarchaeota: proposal for a third archaeal phylum, the Thaumarchaeota. Nat Rev Microbiol, 2008. 6(3): p. 245-52. 13. Barns, S.M., et al., Perspectives on archaeal diversity, thermophily and monophyly from environmental rRNA sequences. 
Proc Natl Acad Sci U S A, 1996. 93(17): p. 9188-93. 14. Nunoura, T., et al., Insights into the evolution of Archaea and eukaryotic protein modifier systems revealed by the genome of a novel archaeal group. Nucleic Acids Res, 2011. 39(8): p. 3204-23.

41 15. Kozubal, M.A., et al., Geoarchaeota: a new candidate phylum in the Archaea from high-temperature acidic iron mats in Yellowstone National Park. ISME J, 2013. 7(3): p. 622-34. 16. Huber, H., et al., A new phylum of Archaea represented by a nanosized hyperthermophilic symbiont. Nature, 2002. 417(6884): p. 63-7. 17. Spang, A., et al., Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature, 2015. 521(7551): p. 173-9. 18. Guy, L. and T.J. Ettema, The archaeal 'TACK' superphylum and the origin of eukaryotes. Trends Microbiol, 2011. 19(12): p. 580-7. 19. Spang, A., E.F. Caceres, and T.J.G. Ettema, Genomic exploration of the diversity, ecology, and evolution of the archaeal domain of life. Science, 2017. 357(6351). 20. Rinke, C., et al., Insights into the phylogeny and coding potential of microbial dark matter. Nature, 2013. 499(7459): p. 431-437. 21. Zaremba-Niedzwiedzka, K., et al., Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature, 2017. 541(7637): p. 353-358. 22. Hedlund, B.P., et al., Uncultivated thermophiles: current status and spotlight on 'Aigarchaeota'. Curr Opin Microbiol, 2015. 25: p. 136- 45. 23. Elkins, J.G., et al., A korarchaeal genome reveals insights into the evolution of the Archaea. Proc Natl Acad Sci U S A, 2008. 105(23): p. 8102-7. 24. Lombard, J., P. Lopez-Garcia, and D. Moreira, The early evolution of lipid membranes and the three domains of life. Nat Rev Microbiol, 2012. 10(7): p. 507-15. 25. Gattinger, A., M. Schloter, and J.C. Munch, Phospholipid etherlipid and phospholipid fatty acid fingerprints in selected euryarchaeotal monocultures for taxonomic profiling. FEMS Microbiol Lett, 2002. 213(1): p. 133-9. 26. Lombard, J., P. Lopez-Garcia, and D. Moreira, An ACP-independent fatty acid synthesis pathway in archaea: implications for the origin of phospholipids. Mol Biol Evol, 2012. 29(11): p. 3261-5. 27. 
Dacks, J.B., et al., The changing view of eukaryogenesis - fossils, cells, lineages and how they all come together. J Cell Sci, 2016. 129(20): p. 3695-3703. 28. Sprott, G.D., M. Meloche, and J.C. Richards, Proportions of diether, macrocyclic diether, and tetraether lipids in Methanococcus jannaschii grown at different temperatures. J Bacteriol, 1991. 173(12): p. 3907-10. 29. Macalady, J.L., et al., Tetraether-linked membrane monolayers in Ferroplasma spp: a key to survival in acid. Extremophiles, 2004. 8(5): p. 411-9. 30. Oger, P.M. and A. Cario, Adaptation of the membrane in Archaea. Biophys Chem, 2013. 183: p. 42-56.

42 31. Villanueva, L., J.S. Damste, and S. Schouten, A re-evaluation of the archaeal membrane lipid biosynthetic pathway. Nat Rev Microbiol, 2014. 12(6): p. 438-48. 32. Jarrell, K.F. and S.V. Albers, The archaellum: an old motility structure with a new name. Trends Microbiol, 2012. 20(7): p. 307-12. 33. Herzog, B. and R. Wirth, Swimming behavior of selected species of Archaea. Appl Environ Microbiol, 2012. 78(6): p. 1670-4. 34. Brasen, C., et al., Carbohydrate metabolism in Archaea: current insights into unusual enzymes and pathways and their regulation. Microbiol Mol Biol Rev, 2014. 78(1): p. 89-175. 35. Schiraldi, C., M. Giuliano, and M. De Rosa, Perspectives on biotechnological applications of archaea. Archaea, 2002. 1(2): p. 75- 86. 36. Lundgren, M., et al., Three replication origins in Sulfolobus species: synchronous initiation of chromosome replication and asynchronous termination. Proc Natl Acad Sci U S A, 2004. 101(18): p. 7046-51. 37. Norais, C., et al., Genetic and physical mapping of DNA replication origins in Haloferax volcanii. PLoS Genet, 2007. 3(5): p. e77. 38. Pelve, E.A., et al., Four chromosome replication origins in the archaeon Pyrobaculum calidifontis. Mol Microbiol, 2012. 85(5): p. 986-95. 39. Wu, Z., et al., DNA replication origins in archaea. Front Microbiol, 2014. 5: p. 179. 40. Grabowski, B. and Z. Kelman, Archeal DNA replication: eukaryal proteins in a bacterial context. Annu Rev Microbiol, 2003. 57: p. 487- 516. 41. Lange, C., et al., Genome-wide analysis of growth phase-dependent translational and transcriptional regulation in halophilic archaea. BMC Genomics, 2007. 8. 42. Lopez-Maury, L., S. Marguerat, and J. Bahler, Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation. Nature Reviews Genetics, 2008. 9(8): p. 583-593. 43. Bell, S.D. and S.P. Jackson, Mechanism and regulation of transcription in archaea. Curr Opin Microbiol, 2001. 4(2): p. 208-13. 44. Peeters, E., N. Peixeiro, and G. 
Sezonov, Cis-regulatory logic in archaeal transcription. Biochem Soc Trans, 2013. 41(1): p. 326-31. 45. Karr, E.A., Transcription Regulation in the Third Domain. Advances in Applied Microbiology, Vol 89, 2014. 89: p. 101-133. 46. Bell, S.D., Archaeal transcriptional regulation - variation on a bacterial theme? Trends Microbiol, 2005. 13(6): p. 262-265. 47. Werner, F. and D. Grohmann, Evolution of multisubunit RNA polymerases in the three domains of life. Nature Reviews Microbiology, 2011. 9(2): p. 85-98. 48. Perez-Rueda, E. and S.C. Janga, Identification and genomic analysis of transcription factors in archaeal genomes exemplifies their

43 functional architecture and evolutionary origin. Mol Biol Evol, 2010. 27(6): p. 1449-59. 49. Bell, S.D., et al., Orientation of the transcription preinitiation complex in archaea. Proc Natl Acad Sci U S A, 1999. 96(24): p. 13662-7. 50. Gietl, A., et al., Eukaryotic and archaeal TBP and TFB/TF(II)B follow different promoter DNA bending pathways. Nucleic Acids Res, 2014. 42(10): p. 6219-31. 51. Blombach, F., et al., Molecular Mechanisms of Transcription Initiation-Structure, Function, and Evolution of TFE/TFIIE-Like Factors and Open Complex Formation. J Mol Biol, 2016. 428(12): p. 2592-2606. 52. Nagy, J., et al., Complete architecture of the archaeal RNA polymerase open complex from single-molecule FRET and NPS. Nat Commun, 2015. 6. 53. Schulz, S., et al., TFE and Spt4/5 open and close the RNA polymerase clamp during the transcription cycle (vol 113, pg E1816, 2016). Proc Natl Acad Sci U S A, 2016. 113(20): p. E2871-E2871. 54. Gehring, A.M., J.E. Walker, and T.J. Santangelo, Transcription Regulation in Archaea. J Bacteriol, 2016. 198(14): p. 1906-1917. 55. Lange, U. and W. Hausner, Transcriptional fidelity and proofreading in Archaea and implications for the mechanism of TFS-induced RNA cleavage. Mol Microbiol, 2004. 52(4): p. 1133-1143. 56. Fouqueau, T., et al., The transcript cleavage factor paralogue TFS4 is a potent RNA polymerase inhibitor. Nat Commun, 2017. 8. 57. Santangelo, T.J., et al., Archaeal Intrinsic Transcription Termination In Vivo. J Bacteriol, 2009. 191(22): p. 7102-7108. 58. Walker, J.E., O. Luyties, and T.J. Santangelo, Factor-dependent archaeal transcription termination. Proc Natl Acad Sci U S A, 2017. 114(33): p. E6767-E6773. 59. Ao, X., et al., The Sulfolobus Initiator Element Is an Important Contributor to Promoter Strength. J Bacteriol, 2013. 195(22): p. 5216-5222. 60. Hoch, J.A., Two-component and phosphorelay signal transduction. Curr Opin Microbiol, 2000. 3(2): p. 165-170. 61. Zschiedrich, C.P., V. Keidel, and H. 
Szurmant, Molecular Mechanisms of Two-Component Signal Transduction. J Mol Biol, 2016. 428(19): p. 3752-75. 62. Ulrich, L.E., E.V. Koonin, and I.B. Zhulin, One-component systems dominate signal transduction in prokaryotes. Trends Microbiol, 2005. 13(2): p. 52-56. 63. Wuichet, K., B.J. Cantwell, and I.B. Zhulin, Evolution and phyletic distribution of two-component signal transduction systems. Curr Opin Microbiol, 2010. 13(2): p. 219-225. 64. Coulson, R.M., N. Touboul, and C.A. Ouzounis, Lineage-specific partitions in archaeal transcription. Archaea, 2007. 2(2): p. 117-25.

44 65. Tenorio-Salgado, S., A. Huerta-Saquero, and E. Perez-Rueda, New insights on gene regulation in archaea. Comput Biol Chem, 2011. 35(6): p. 341-6. 66. Darnell, C.L., et al., Systematic Discovery of Archaeal Transcription Factor Functions in Regulatory Networks through Quantitative Phenotyping Analysis. Msystems, 2017. 2(5). 67. Yoon, S.H., et al., A systems level predictive model for global gene regulation of methanogenesis in a hydrogenotrophic methanogen. Genome Research, 2013. 23(11): p. 1839-1851. 68. Usui, K., et al., Protein-protein interactions of the hyperthermophilic archaeon Pyrococcus horikoshii OT3. Genome Biol, 2005. 6(12). 69. Aravind, L. and E.V. Koonin, DNA-binding proteins and evolution of transcription regulation in the archaea. Nucleic Acids Res, 1999. 27(23): p. 4658-4670. 70. Peeters, E., et al., Ss-LrpB, a novel Lrp-like regulator of Sulfolobus solfataricus P2, binds cooperatively to three conserved targets in its own control region. Mol Microbiol, 2004. 54(2): p. 321-36. 71. Charoensawan, V., D. Wilson, and S.A. Teichmann, Genomic repertoires of DNA-binding transcription factors across the tree of life. Nucleic Acids Res, 2010. 38(21): p. 7364-77. 72. Aravind, L., et al., The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol Rev, 2005. 29(2): p. 231-62. 73. Perez-Rueda, E., J. Collado-Vides, and L. Segovia, Phylogenetic distribution of DNA-binding transcription factors in bacteria and archaea. Comput Biol Chem, 2004. 28(5-6): p. 341-50. 74. Schreiter, E.R. and C.L. Drennan, Ribbon-helix-helix transcription factors: variations on a theme. Nature Reviews Microbiology, 2007. 5(9): p. 710-720. 75. Chivers, P.T. and T.H. Tahirov, Structure of Pyrococcus horikoshii NikR: Nickel sensing and implications for the regulation of DNA recognition. J Mol Biol, 2005. 348(3): p. 597-607. 76. Sheppard, C. and F. Werner, Structure and mechanisms of viral transcription factors in archaea. Extremophiles, 2017. 
21(5): p. 829- 838. 77. Contursi, P., et al., Host and viral transcriptional regulators in Sulfolobus: an overview. Extremophiles, 2013. 17(6): p. 881-895. 78. Krishna, S.S., I. Majumdar, and N.V. Grishin, Structural classification of zinc fingers. Nucleic Acids Res, 2003. 31(2): p. 532- 550. 79. Chen, H.T., et al., Structure of a (Cys(3)His) zinc ribbon, a ubiquitous motif in archaeal and eucaryal transcription. Protein Science, 2000. 9(9): p. 1743-1752. 80. Yokoyama, K., et al., Feast/famine regulation by transcription factor FL11 for the survival of the hyperthermophilic archaeon Pyrococcus OT3. Structure, 2007. 15(12): p. 1542-54.

45 81. Liu, H., et al., BarR, an Lrp-type transcription factor in Sulfolobus acidocaldarius, regulates an aminotransferase gene in a beta-alanine responsive manner. Mol Microbiol, 2014. 92(3): p. 625-39. 82. Reichelt, R., et al., Genome-wide binding analysis of the transcriptional regulator TrmBL1 in Pyrococcus furiosus. BMC Genomics, 2016. 17: p. 40. 83. Schmid, A.K., et al., A single transcription factor regulates evolutionarily diverse but functionally linked metabolic pathways in response to nutrient availability. Mol Syst Biol, 2009. 5: p. 282. 84. Rohlin, L., et al., Heat shock response of Archaeoglobus fulgidus. J Bacteriol, 2005. 187(17): p. 6046-57. 85. Najnin, T., et al., Characterization of a temperature-responsive two component regulatory system from the Antarctic archaeon, Methanococcoides burtonii. Sci Rep, 2016. 6: p. 24278. 86. Sheehan, R., et al., The Methanosarcina acetivorans thioredoxin system activates DNA binding of the redox-sensitive transcriptional regulator MsvR. Journal of Industrial Microbiology & Biotechnology, 2015. 42(6): p. 965-969. 87. Isom, C.E., et al., Redox-sensitive DNA binding by homodimeric Methanosarcina acetivorans MsvR is modulated by cysteine residues. Bmc Microbiology, 2013. 13. 88. Sharma, K., et al., The RosR transcription factor is required for gene expression dynamics in response to extreme oxidative stress in a hypersaline-adapted archaeon. BMC Genomics, 2012. 13. 89. Luger, K., et al., Crystal structure of the nucleosome core particle at 2.8 angstrom resolution. Nature, 1997. 389(6648): p. 251-260. 90. Peeters, E., et al., The interplay between nucleoid organization and transcription in archaeal genomes. Nature Reviews Microbiology, 2015. 13(6): p. 333-341. 91. Sandman, K., et al., Archaeal histones and . Hyperthermophilic Enzymes, Pt C, 2001. 334: p. 116-129. 92. Pereira, S.L., et al., Archaeal nucleosomes. Proc Natl Acad Sci U S A, 1997. 94(23): p. 12633-12637. 93. 
Ammar, R., et al., Chromatin is an ancient innovation conserved between Archaea and Eukarya. Elife, 2012. 1. 94. Xue, H., et al., An abundant DNA binding protein from the hyperthermophilic archaeon Sulfolobus shibatae affects DNA supercoiling in a temperature-dependent fashion. J Bacteriol, 2000. 182(14): p. 3929-33. 95. Bell, S.D., et al., The interaction of Alba, a conserved archaeal, chromatin protein, with Sir2 and its regulation by acetylation. Science, 2002. 296(5565): p. 148-151. 96. Liu, Y., et al., The Sac10b Homolog in Methanococcus maripaludis Binds DNA at Specific Sites. J Bacteriol, 2009. 191(7): p. 2315-2329. 97. Vassart, A., et al., Sa-Lrp from Sulfolobus acidocaldarius is a versatile, glutamine-responsive, and architectural transcriptional regulator. Microbiologyopen, 2013. 2(1): p. 75-93.

46 98. Wierer, S., et al., TrmBL2 from Pyrococcus furiosus Interacts Both with Double-Stranded and Single-Stranded DNA. PLoS One, 2016. 11(5). 99. Efremov, A.K., et al., TrmBL2 Protein from Thermococcus Kodakarensis Competes with Histones for DNA Binding and Forms Filamentous Nucleoprotein Complexes that Affect DNA Structural State. Biophys J, 2015. 108(2): p. 73a-73a. 100. !!! INVALID CITATION !!! 101. Hoskisson, P.A. and G. Hobbs, Continuous culture - making a comeback? Microbiology-Sgm, 2005. 151: p. 3153-3159. 102. Llorens, J.M.N., A. Tormo, and E. Martinez-Garcia, Stationary phase in gram-negative bacteria. FEMS Microbiol Rev, 2010. 34(4): p. 476- 495. 103. Nystrom, T., Stationary-phase physiology. Annu Rev Microbiol, 2004. 58: p. 161-181. 104. Herman, P.K., Stationary phase in yeast. Curr Opin Microbiol, 2002. 5(6): p. 602-607. 105. Martinez, M.J., et al., Genomic analysis of stationary-phase and exit in Saccharomyces cerevisiae: Gene expression and identification of novel essential genes. Mol Biol Cell, 2004. 15(12): p. 5295-5305. 106. Shockley, K.R., et al., Heat shock response by the hyperthermophilic archaeon Pyrococcus furiosus. Appl Environ Microbiol, 2003. 69(4): p. 2365-71. 107. Hendrickson, E.L., et al., Global responses of Methanococcus maripaludis to specific nutrient limitations and growth rate. J Bacteriol, 2008. 190(6): p. 2198-205. 108. Frols, S., et al., Response of the hyperthermophilic archaeon Sulfolobus solfataricus to UV damage. J Bacteriol, 2007. 189(23): p. 8708-18. 109. Gotz, D., et al., Responses of hyperthermophilic crenarchaea to UV irradiation. Genome Biol, 2007. 8(10): p. R220. 110. Khatibi, P.A., et al., Impact of growth mode, phase, and rate on the metabolic state of the extremely thermophilic archaeon Pyrococcus furiosus. Biotechnol Bioeng, 2017. 111. Gagen, E.J., et al., The Proteome and Lipidome of Thermococcus kodakarensis across the Stationary Phase. Archaea-an International Microbiological Journal, 2016. 112. 
Cellini, A., et al., Stringent control in the archaeal genus Sulfolobus. Res Microbiol, 2004. 155(2): p. 98-104. 113. Rolfe, M.D., et al., Lag Phase Is a Distinct Growth Phase That Prepares Bacteria for Exponential Growth and Involves Transient Metal Accumulation. J Bacteriol, 2012. 194(3): p. 686-701. 114. Bernander, R. and A. Poplawski, Cell cycle characteristics of thermophilic archaea. J Bacteriol, 1997. 179(16): p. 4963-9. 115. Lindas, A.C. and R. Bernander, The cell cycle of archaea. Nat Rev Microbiol, 2013. 11(9): p. 627-38.

47 116. Pelve, E.A., et al., Cdv-based cell division and cell cycle organization in the thaumarchaeon Nitrosopumilus maritimus. Mol Microbiol, 2011. 82(3): p. 555-566. 117. Ishihama, A., Adaptation of gene expression in stationary phase bacteria. Curr Opin Genet Dev, 1997. 7(5): p. 582-8. 118. Dalebroux, Z.D. and M.S. Swanson, ppGpp: magic beyond RNA polymerase. Nat Rev Microbiol, 2012. 10(3): p. 203-12. 119. Chatterji, D. and A.K. Ojha, Revisiting the stringent response, ppGpp and starvation signaling. Curr Opin Microbiol, 2001. 4(2): p. 160-5. 120. Merrikh, H., A.E. Ferrazzoli, and S.T. Lovett, Growth phase and (p)ppGpp control of IraD, a regulator of RpoS stability, in Escherichia coli. J Bacteriol, 2009. 191(24): p. 7436-46. 121. Geiger, T. and C. Wolz, Intersection of the stringent response and the CodY regulon in low GC Gram-positive bacteria. International Journal of Medical Microbiology, 2014. 304(2): p. 150-5. 122. Pletnev, P., et al., Survival guide: Escherichia coli in the stationary phase. Acta Naturae, 2015. 7(4): p. 22-33. 123. Liu, K., A.N. Bittner, and J.D. Wang, Diversity in (p)ppGpp metabolism and effectors. Curr Opin Microbiol, 2015. 24: p. 72-9. 124. Gaca, A.O., C. Colomer-Winter, and J.A. Lemos, Many means to a common end: the intricacies of (p)ppGpp metabolism and its control of bacterial homeostasis. J Bacteriol, 2015. 197(7): p. 1146-56. 125. Kriel, A., et al., Direct regulation of GTP homeostasis by (p)ppGpp: a critical component of viability and stress resistance. Mol Cell, 2012. 48(2): p. 231-41. 126. Sonenshein, A.L., CodY, a global regulator of stationary phase and virulence in Gram-positive bacteria. Curr Opin Microbiol, 2005. 8(2): p. 203-207. 127. Corrigan, R.M., et al., ppGpp negatively impacts ribosome assembly affecting growth and antimicrobial tolerance in Gram-positive bacteria. Proc Natl Acad Sci U S A, 2016. 113(12): p. E1710-9. 128. 
Weber, H., et al., Genome-wide analysis of the general stress response network in Escherichia coli: sigmaS-dependent genes, promoters, and sigma factor selectivity. J Bacteriol, 2005. 187(5): p. 1591-603. 129. Azam, T.A., et al., Growth phase-dependent variation in protein composition of the Escherichia coli nucleoid. J Bacteriol, 1999. 181(20): p. 6361-6370. 130. Lee, S.Y., et al., Regulation of Bacterial DNA Packaging in Early Stationary Phase by Competitive DNA Binding of Dps and IHF. Sci Rep, 2015. 5: p. 18146. 131. Tani, T.H., et al., Adaptation to famine: a family of stationary-phase genes revealed by microarray analysis. Proc Natl Acad Sci U S A, 2002. 99(21): p. 13471-6. 132. Beauclerk, A.A., et al., Studies of the GTPase domain of archaebacterial . Eur J Biochem, 1985. 151(2): p. 245-55.

133. Cimmino, C., G.L. Scoarughi, and P. Donini, Stringency and Relaxation among the Halobacteria. J Bacteriol, 1993. 175(20): p. 6659-6662.
134. Scoarughi, G.L., C. Cimmino, and P. Donini, Lack of Production of (P)Ppgpp in Halobacterium-Volcanii under Conditions That Are Effective in the Eubacteria. J Bacteriol, 1995. 177(1): p. 82-85.
135. Dulmage, K.A., H. Todor, and A.K. Schmid, Growth-Phase-Specific Modulation of Cell Morphology and Gene Expression by an Archaeal Histone Protein. Mbio, 2015. 6(5).
136. Dinger, M.E., G.J. Baillie, and D.R. Musgrave, Growth phase-dependent expression and degradation of histones in the thermophilic archaeon Thermococcus zilligii. Mol Microbiol, 2000. 36(4): p. 876-85.
137. Chen, L., et al., The genome of Sulfolobus acidocaldarius, a model organism of the Crenarchaeota. J Bacteriol, 2005. 187(14): p. 4992-9.
138. Brock, T.D., K.M. Brock, R.T. Belly, and R.L. Weiss, Sulfolobus: A new genus of sulfur-oxidizing bacteria living at low pH and high temperature. Archiv für Mikrobiologie, 1972. 84(1): p. 14.
139. Derosa, M., A. Gambacorta, and J.D. Bulock, Extremely Thermophilic Acidophilic Bacteria Convergent with Sulfolobus-Acidocaldarius. Journal of General Microbiology, 1975. 86(Jan): p. 156-164.
140. Sleytr, U.B., et al., S-layers: principles and applications. FEMS Microbiol Rev, 2014. 38(5): p. 823-864.
141. Weiss, R.L., Subunit Cell-Wall of Sulfolobus-Acidocaldarius. J Bacteriol, 1974. 118(1): p. 275-284.
142. Albers, S.V. and B.H. Meyer, The archaeal cell envelope. Nat Rev Microbiol, 2011. 9(6): p. 414-26.
143. Ceballos, R.M., et al., Differential virus host-ranges of the Fuselloviridae of hyperthermophilic Archaea: implications for evolution in extreme environments. Front Microbiol, 2012. 3: p. 295.
144. Banerjee, A., et al., Insights into subunit interactions in the Sulfolobus acidocaldarius archaellum cytoplasmic complex. FEBS J, 2013. 280(23): p. 6141-9.
145. Lindas, A.C., et al., A unique cell division machinery in the Archaea. Proc Natl Acad Sci U S A, 2008. 105(48): p. 18942-6.
146. Hirata, A., B.J. Klein, and K.S. Murakami, The X-ray crystal structure of RNA polymerase from Archaea. Nature, 2008. 451(7180): p. 851-4.
147. She, Q., et al., The complete genome of the crenarchaeon Sulfolobus solfataricus P2. Proc Natl Acad Sci U S A, 2001. 98(14): p. 7835-7840.
148. Peng, N., et al., Genetic technologies for extremely thermophilic microorganisms of Sulfolobus, the only genetically tractable genus of crenarchaea. Sci China Life Sci, 2017. 60(4): p. 370-385.

149. Atomi, H., T. Imanaka, and T. Fukui, Overview of the genetic tools in the Archaea. Front Microbiol, 2012. 3.
150. Li, Y.J., et al., Harnessing Type I and Type III CRISPR-Cas systems for genome editing. Nucleic Acids Res, 2016. 44(4).
151. Leigh, J.A., et al., Model organisms for genetics in the domain Archaea: methanogens, halophiles, Thermococcales and Sulfolobales. FEMS Microbiol Rev, 2011. 35(4): p. 577-608.
152. Worthington, P., et al., Targeted disruption of the alpha-amylase gene in the hyperthermophilic archaeon Sulfolobus solfataricus. J Bacteriol, 2003. 185(2): p. 482-488.
153. Wagner, M., et al., Versatile genetic tool box for the crenarchaeote Sulfolobus acidocaldarius. Front Microbiol, 2012. 3.
154. Wagner, M., et al., Expanding and understanding the genetic toolbox of the hyperthermophilic genus Sulfolobus. Biochem Soc Trans, 2009. 37: p. 97-101.
155. Deng, L., et al., Unmarked gene deletion and host-vector system for the hyperthermophilic crenarchaeon Sulfolobus islandicus. Extremophiles, 2009. 13(4): p. 735-746.
156. Jonuscheit, M., et al., A reporter gene system for the hyperthermophilic archaeon Sulfolobus solfataricus based on a selectable and integrative shuttle vector. Mol Microbiol, 2003. 48(5): p. 1241-1252.
157. Berkner, S., et al., Small multicopy, non-integrative shuttle vectors based on the plasmid pRN1 for Sulfolobus acidocaldarius and Sulfolobus solfataricus, model organisms of the (cren-)archaea. Nucleic Acids Res, 2007. 35(12).
158. Gilmour, D.S. and J.T. Lis, Detecting protein-DNA interactions in vivo: distribution of RNA polymerase on specific bacterial genes. Proc Natl Acad Sci U S A, 1984. 81(14): p. 4275-9.
159. Liu, H., et al., The genome-scale DNA-binding profile of BarR, a beta-alanine responsive transcription factor in the archaeon Sulfolobus acidocaldarius. BMC Genomics, 2016. 17: p. 569.
160. Trong, N.D., et al., The genome-wide binding profile of the Sulfolobus solfataricus transcription factor Ss-LrpB shows binding events beyond direct transcription regulation. BMC Genomics, 2013. 14.
161. Nguyen-Duc, T., et al., Nanobody(R)-based chromatin immunoprecipitation/micro-array analysis for genome-wide identification of transcription factor DNA binding sites. Nucleic Acids Res, 2013. 41(5): p. e59.
162. Peeters, E. and D. Charlier, The Lrp family of transcription regulators in archaea. Archaea, 2010. 2010: p. 750457.
163. Zhang, L., et al., Detecting DNA-binding of proteins in vivo by UV-crosslinking and immunoprecipitation. Biochem Biophys Res Commun, 2004. 322(3): p. 705-11.
164. Toth, J. and M.D. Biggin, The specificity of protein-DNA crosslinking by formaldehyde: in vitro and in drosophila embryos. Nucleic Acids Res, 2000. 28(2): p. e4.

165. O'Neill, L.P. and B.M. Turner, Immunoprecipitation of native chromatin: NChIP. Methods, 2003. 31(1): p. 76-82.
166. Collas, P., The Current State of Chromatin Immunoprecipitation. Molecular Biotechnology, 2010. 45(1): p. 87-100.
167. Carey, M.F., C.L. Peterson, and S.T. Smale, Chromatin immunoprecipitation (ChIP). Cold Spring Harb Protoc, 2009. 2009(9): p. pdb prot5279.
168. Raha, D., M. Hong, and M. Snyder, ChIP-Seq: a method for global identification of regulatory elements in the genome. Curr Protoc Mol Biol, 2010. Chapter 21: p. Unit 21 19 1-14.
169. Shendure, J., et al., DNA sequencing at 40: past, present and future. Nature, 2017. 550(7676).
170. Green, E.D., E.M. Rubin, and M.V. Olson, The future of DNA sequencing. Nature, 2017. 550(7675): p. 179-181.
171. Reuter, J.A., D.V. Spacek, and M.P. Snyder, High-Throughput Sequencing Technologies. Mol Cell, 2015. 58(4): p. 586-597.
172. Goodwin, S., J.D. McPherson, and W.R. McCombie, Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics, 2016. 17(6): p. 333-351.
173. Mardis, E.R., Next-generation DNA sequencing methods. Annual Review of Genomics and Human Genetics, 2008. 9: p. 387-402.
174. Shendure, J. and H.L. Ji, Next-generation DNA sequencing. Nat Biotechnol, 2008. 26(10): p. 1135-1145.
175. Metzker, M.L., Sequencing technologies - the next generation. Nat Rev Genet, 2010. 11(1): p. 31-46.
176. Fuller, C.W., et al., The challenges of sequencing by synthesis. Nat Biotechnol, 2009. 27(11): p. 1013-1023.
177. Shendure, J. and E.L. Aiden, The expanding scope of DNA sequencing. Nat Biotechnol, 2012. 30(11): p. 1084-1094.
178. Wang, Z., M. Gerstein, and M. Snyder, RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics, 2009. 10(1): p. 57-63.
179. Park, P.J., ChIP-seq: advantages and challenges of a maturing technology. Nature Reviews Genetics, 2009. 10(10): p. 669-680.
180. Kemme, C.A., et al., Regulation of transcription factors via natural decoys in genomic DNA. Transcription, 2016. 7(4): p. 115-20.
