Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch

Year: 2019

Casting light on Asgardarchaeota metabolism in a sunlit microoxic niche

Bulzu, Paul-Adrian ; Andrei, Adrian-Ştefan ; Salcher, Michaela M ; Mehrshad, Maliheh ; Inoue, Keiichi ; Kandori, Hideki ; Beja, Oded ; Ghai, Rohit ; Banciu, Horia L

Abstract: Recent advances in phylogenomic analyses and increased genomic sampling of uncultured prokaryotic lineages have brought compelling evidence in support of the emergence of from within the archaeal of life (eocyte hypothesis)1,2. The discovery of Asgardarchaeota and its supposed position at the base of the eukaryotic tree of life3,4 provided cues about the long-awaited identity of the eocytic lineage from which the nucleated cells (Eukaryota) emerged. While it is apparent that Asgardarchaeota encode a plethora of eukaryotic-specific proteins (the highest number identified yet in )5, the lack of genomic information and metabolic characterization has precluded inferences about their lifestyles and the metabolic landscape that favoured the emergence of the protoeukaryote ancestor. Here, we use advanced phylogenetic analyses for inferring the deep ancestry of eukaryotes, and genome-scale metabolic reconstructions for shedding light on the metabolic milieu of Asgardarchaeota. In doing so, we: (1) show that Heimdallarchaeia (the closest eocytic lineage to eukaryotes to date) are likely to have a microoxic niche, based on their genomic potential, with aerobic metabolic pathways that are unique among (that is, the kynurenine pathway); (2) provide evidence of mixotrophy within Asgardarchaeota; and (3) describe a previously unknown family of rhodopsins encoded within the recovered genomes.

DOI: https://doi.org/10.1038/s41564-019-0404-y

Posted at the Zurich Open Repository and Archive, University of Zurich ZORA URL: https://doi.org/10.5167/uzh-183773 Journal Article Accepted Version

Originally published at: Bulzu, Paul-Adrian; Andrei, Adrian-Ştefan; Salcher, Michaela M; Mehrshad, Maliheh; Inoue, Keiichi; Kandori, Hideki; Beja, Oded; Ghai, Rohit; Banciu, Horia L (2019). Casting light on Asgardarchaeota metabolism in a sunlit microoxic niche. Nature Microbiology, 4(7):1129-1137. DOI: https://doi.org/10.1038/s41564-019-0404-y 1 Casting light on Asgardarchaeota metabolism in a sunlit microoxic niche

2

3 Paul-Adrian Bulzu1#, Adrian-Ştefan Andrei2#, Michaela M. Salcher2,3, Maliheh Mehrshad2, 4 Keiichi Inoue5, Hideki Kandori6, Oded Beja4, Rohit Ghai2*, Horia L. Banciu1,7

5 1Department of Molecular Biology and Biotechnology, Faculty of Biology and Geology, Babeş-Bolyai 6 University, Cluj-Napoca, Romania 7 2Department of Aquatic Microbial Ecology, Institute of Hydrobiology, Biology Centre of the Academy 8 of Sciences of the Czech Republic, České Budějovice, Czech Republic. 9 3Limnological Station, Institute of Plant and Microbial Biology, University of Zurich, Kilchberg, 10 Switzerland. 11 4Faculty of Biology, Technion Israel Institute of Technology, Haifa, Israel. 12 5The Institute for Solid State Physics, The University of Tokyo, Kashiwa, Japan 13 6Department of Life Science and Applied Chemistry, Nagoya Institute of Technology, Nagoya, Japan. 14 7Molecular Biology Center, Institute for Interdisciplinary Research in Bio-Nano-Sciences, Babeş- 15 Bolyai University, Cluj-Napoca, Romania 16

17

18

19

20 # These authors contributed equally to this work

21 *Corresponding author: Rohit Ghai

22 Institute of Hydrobiology, Department of Aquatic Microbial Ecology, Biology Centre of the Academy 23 of Sciences of the Czech Republic, Na Sádkách 7, 370 05, České Budějovice, Czech Republic. 24 Phone: +420 387 775 881

25 Fax: +420 385 310 248

26 E-mail: [email protected]

27

28

29

30

31

32

33

34 35

1

36 Recent advances in phylogenomic analyses and increased genomic sampling of uncultured 37 prokaryotic lineages have brought compelling evidence in support of the emergence of eukaryotes 38 from within the archaeal domain of life (eocyte hypothesis)1,2. The discovery of Asgardarchaeota and 39 its supposed position at the base of the eukaryotic tree of life3,4 provided cues about the long- 40 awaited identity of the eocytic lineage from which the nucleated cells (Eukaryota) emerged. While it 41 is apparent that Asgardarchaeota encode a plethora of eukaryotic-specific proteins (the highest 42 number identified yet in prokaryotes)5, the lack of genomic information and metabolic 43 characterization has precluded inferences about their lifestyles and the metabolic landscape that 44 favored the emergence of the protoeukaryote ancestor. Here, we use advanced phylogenetic 45 analyses for inferring the deep ancestry of eukaryotes, and genome-scale metabolic reconstructions 46 for shedding light on the metabolic milieu of Asgardarchaeota. In doing so, we: i) show that 47 Heimdallarchaeia (the closest eocytic lineage to eukaryotes to date) are likely to have a microoxic 48 niche, based on their genomic potential, with aerobic metabolic pathways unique among Archaea 49 (i.e. kynurenine pathway), ii) provide evidence of mixotrophy within Asgardarchaeota and iii) 50 describe a previously unknown family of rhodopsins encoded within the recovered genomes. 51 At the dawn of genomics, the eukaryotes were recognized as amalgamated genetic jigsaws that bore 52 components of both bacterial and archaeal descent6. This genomic chimerism served as a source of 53 speculation and debate over the nature of the protoeukaryote ancestors3,4,7 and inspirited a plethora 54 of hypothetical scenarios for the processes that led to eukaryogenesis6,8. The hypothesis that 55 eukaryotes emerged from within the archaeal radiation was put forward more than three decades 56 ago, when ribosomal structural patterns were shown to support a sister-group relationship between 57 eukaryotes and an extant archaeal lineage9. This already diversified archaeal group, which was found 58 to bear more similarity (in ribosomal morphology) with eukaryotes than to Bacteria and the rest of 59 the Archaea was denominated as Eocyta9. Subsequently, the eocyte hypothesis9 was overshadowed 60 by the three-domains tree of life, which depicts Archaea and Eukarya as monophyletic groups that 61 share a common ancestor10. More than two decades passed till the usage of increased taxonomic 62 sampling and advanced phylogenetic approaches fueled the revival of the eocyte hypothesis1,2, and 63 alongside the debate regarding the topology of the tree of life4,5,7. Even though, in light of recent 64 research, eukaryotes came into existence through the interplay between an archaeal host11 and a 65 bacterial endosymbiont12, the metabolic milieu of the ancestral prokaryotic lineages still remain 66 elusive. 67 Homology-based searches were employed to recover Asgardarchaeota-related contigs from de novo 68 metagenomic assemblies of two deep-sequenced, shallow, brackish lake sediment samples 69 (sediment pore-water salinities: 5.7% in Tekirghiol Lake and 3.9% in Amara Lake). By utilizing a 70 hybrid binning strategy and performing manual inspection and data curation, we obtained eleven 71 high- and medium-quality (> 50% completeness; < 2% contamination) and twenty-four low 72 completeness (<50% completeness; < 3% contamination) MAGs (metagenome-assembled genomes), 73 spanning three (out of four) evolutionary lineages within the superphylum: Lokiarchaeia (23), 74 Thorarchaeia (10), and Heimdallarchaeia (2). The maximum likelihood phylogenetic tree, based on 75 concatenation of small (SSU) and large (LSU) ribosomal RNA genes, pictured a topology in which 76 eukaryotes branched with high-support from within Asgardarchaeota (Archaea) (Supplementary 77 Figure 1a). Even more remarkably, in addition to recreating a previously described 78 Asgardarchaeota/Eukaryota branching pattern5, we provide support for a close evolutionary linkage 79 between Heimdallarchaeia (Asgardarchaeota superphylum) and eukaryotes (SH-aLRT=97.5; 80 UFBoot=100) (Supplementary Figure 1a). The genome-focused phylogeny of Asgardarchaeota 81 revealed a pattern of ancestry, divergence, and descent, in which Heimdallarchaeia comprise the 82 basal branch of the superphylum and Thor-/Odinarchaeia the youngest one (Figure 1a). Although 83 dissimilar in branching pattern with the SSU + LSU tree (Figure S1a), the phylogenomic one was 84 found to be robust (Figure 1a) and to support a topology brought into attention by an earlier study5. 85 The SR4-recoded13 Bayesian tree (maxdiff=0.1) resolved with high support (PP=1) the monophyly of

2

86 Asgardarchaeota/Eukaryota, but failed to confidently resolve the internal topology of the 87 superphylum and the branching point of eukaryotes (Figure 1b). Noteworthy, in both SR4-recoded 88 Bayesian (Figure 1b) and maximum likelihood phylogenies (Supplementary Figure 1b) the 89 eukaryotes caused branch instability for Heimdallarchaeia, which were attracted without statistical 90 support within the superphylum (PP=0.52; SH-aLRT=92.4 and UFBoot=88). To further substantiate 91 the phylogenetic connections between Asgardarchaeota members and eukaryotes, we screened all 92 recovered MAGs and the publicly available ones (14 in July 2018) for the presence of potential 93 eukaryotic signature proteins (ESP). Similar to previous reports3,5,14, the MAGs were found to be 94 highly enriched in ESP (Figure 1c), which further reinforced their ancestral linkage to eukaryotes. In 95 addition to the reported ESP5, we identified a potential subunit of the COPII vesicle coat complex 96 (associated with intracellular vesicle traffic and secretion) in Thorarchaeia and proteins that harbor 97 the N-terminal domain of folliculin - a -specific protein (known to be involved in 98 membrane trafficking in humans15) (Figure 1c) in Lokiarchaeia. Furthermore, we retrieved conclusive 99 hits for the ESP-related domains Ezrin/radixin/moesin C-terminal domain and active zone protein 100 ELKS in Lokiarchaeia. 101 102 Recent findings reporting the presence of a previously uncharacterized family of rhodopsins16 (i.e. 103 heliorhodopsins; abbreviated as HeR) in monoderms17 encouraged us to perform a dedicated 104 screening in all available Asgardarchaeota MAGs. The results indicated that one of the 105 Heimdallarchaeia MAGs (i.e Heimdallarchaeota RS678) encoded two HeR and what appears to be, as 106 suggested by the presence of a Exiguobacterium-like DTK motif18 and phylogenetic proximity, a type- 107 1 proton-pumping rhodopsin (Figure 2; Supplementary Table 4), suggestive of light sensitivity. 108 Remarkably, we found that the Asgardarchaeota MAGs recovered during this study encoded 109 rhodopsin sequences similar in membrane orientation to type-1 rhodopsins, and which organized 110 during phylogenetic analysis in a monophyletic clade (Standard Bootstrap: SBS=1) placed in-between 111 HeR and type-1 rhodopsins (Figure 2). Multiple sequence alignments showed: i) homology between 112 transmembrane helices 1, 6 and 7 of these rhodopsins and the type-1 rhodopsins, while helix 3 was 113 homologous to HeR and ii) presence of additional characteristic HeR motifs (e.g. RWxF motif similar 114 to RWxE of HeR rather than the RYxD motif in most type-1 rhodopsins, and replacement of a proline 115 residue (P91) conserved in type-1 rhodopsins by serine (S91) in both HeR and the ones that we 116 identified in Asgardarchaeota) (Supplementary Figure 2). Given their phylogenetically intermediate 117 position, as type-1 rhodopsins closest to HeR, and presence of features found in both type-1 and 118 HeR, we denote them as schizorhodopsins (schizo, ‘split’, plus ‘rhodopsin’, abbreviated as SzR). The 119 very recent discovery of HeR and their inconclusive functional role16,17 precludes tentative functional 120 assertions for SzR capacity in Asgardarchaeota. However, the plethora of rhodopsins that we 121 identified in Heimdallarchaeia (putative type-1 proton pumps, HeR and SzR), together with the SzR 122 found in Lokiarchaeia and Thorarchaeia suggests that, during their evolutionary history, 123 Asgardarchaeota were present in light-exposed habitats. Moreover, we consider that the primary 124 niche of these rhodopsin-bearing microbes is most likely the top, light exposed sediment layers. 125 Their recovery from deeper strata may be explained by the high deposition rates characteristic for 126 the sampling locations (typically a few cm per year)19. 127 The genome-scale metabolic reconstruction placed special emphasis on Heimdallarchaeia, since it 128 was suggested by the above-mentioned phylogenetic analyses to encompass the most probable 129 candidates (to date) for the archaeal protoeukaryote ancestor. While the anaerobic lifestyles 130 inferred for Loki-20 and Thorarchaeia14 were considered to be accompanied by autotrophy20 and 131 mixotrophy14, respectively, no consistent metabolic reconstructions exist for Heimdallarchaeia. The 132 physiology inferred here pointed towards mixotrophic lifestyles (for Asgardarchaeota), 133 simultaneously showing the presence of transporters for the uptake of exogenous organic matter 134 and the metabolic circuitry responsible for its catabolism (see Supplementary Discussion). 135 Noteworthy, we found oxygen-dependent metabolic pathways in Heimdallarchaeia, which will be 136 further presented in contrast to the ones harbored by the anaerobic Loki- and Thorarchaeia.

3

137 Heimdallarchaeia were inferred to possess components of the aerobic respiration blueprint: a 138 complete tricarboxylic acid cycle (TCA) supported by an electron transport chain (ETC) containing: 139 V/A-type ATPase, succinate dehydrogenase, NADH:quinone oxidoreductase, and the cytochrome c 140 oxidase (Figure 3). While in Thor- various components of the TCA were found to be missing, in 141 Lokiarchaeia the complete TCA was associated with: isocitrate dehydrogenases, 2-oxoglutarate- 142 ferredoxin oxidoreductases, and ATP-citrate lyases, pointing towards the presence of a reverse 143 tricarboxylic acid cycle (rTCA). Thus, in contrast to Heimdallarchaeia, which utilize TCA to fuel their

144 catabolic machinery (Figure 3), Lokiarchaeia use rTCA for autotrophic CO2 assimilation. While the 145 V/A-type ATPase complex appears to be complete in Loki- and Thorarchaeia, the other components 146 involved in the oxidative phosphorylation processes were not identified. 147 148 As nicotinamide adenine dinucleotide (NAD+) is an essential cofactor in redox biochemistry and 149 energetics21 (e.g. linking TCA and ETC), we investigated its de novo synthesis mechanisms (Figure 4). 150 As expected, all Asgardarchaeota phyla harbored the aspartate pathway - a set of metabolic 151 transformations that can occur both in presence or absence of oxygen22, and which are characteristic 152 for most prokaryotes and plastid-bearing eukaryotes (obtained through lateral gene transfer from 153 their cyanobacterial endosymbionts)21. Surprisingly, in addition to the aspartate pathway, 154 Heimdallarchaeia also encoded the exclusively aerobic kynurenine pathway of NAD+ biosynthesis23 155 (Figure 4), which is present in few bacterial groups and eukaryotes21. The phylogenetic 156 reconstruction and evolutionary history inferences showed that this pathway, which is considered to 157 be present in the protoeukaryote ancestor21, was probably acquired by the ancestor of 158 Heimdallarchaeia through lateral gene transfer from bacteria (Supplementary Figure 3). As far as the 159 authors are aware, Heimdallarchaeia are the first reported archaeal organisms harboring the aerobic 160 kynurenine pathway. Curiously, while Heimdall_LC_3 was found to contain the complete set of 161 genes required for both pathways, Heimdall_LC_2 and Heimdall_RS678 encoded exclusively the 162 genes affiliated with the kynurenine pathway (Figure 4). As the aspartate pathway was reported to 163 function in both oxygen absence (L-aspartate oxidase uses fumarate as electron acceptor)22 and

164 presence (L-aspartate oxidase uses O2 as electron acceptor), the existence of the kynurenine 165 pathway in Heimdall_LC_3 appears redundant. By corroborating the presence/absence pattern of 166 the aspartate pathway in Asgardarchaeota (Figure 4) with the reconstructed evolutionary history of 167 Heimdallarchaeia (Figure 1a, b; Supplementary Figure 1a, b) and blastp similarity searches (for 168 Heimdall_LC_3 L-aspartate oxidase), we inferred that this pathway functioned exclusively under 169 anaerobic conditions. Furthermore, the introgression of kynurenine genes in Heimdallarchaeia 170 appears to be caused by an expansion towards an oxygen-containing niche, which during 171 evolutionary history (from Heimdall_LC_3 to Heimdall_LC_2/Heimdall_RS678) favored the 172 xenologous replacement of the aspartate pathway with the kynurenine one. 173 174 Within the anaplerotic metabolism, the reversible transformation of pyruvate into acetyl-CoA and 175 formate can be accomplished by pyruvate formate lyases, which were inferred to be present in all 176 three phyla. Formate produced during this enzymatic process, or by the activity of arylformamidase 177 (kynurenime formamidase) in Loki- and Heimdallarchaeia could be further oxidized (by formate 178 dehydrogenases) and used for quinone/cytochrome pool reduction, or introduced into one-carbon 179 metabolism and utilized for the synthesis of purines, glycine, formylmethionine, etc. (Figure 3). 180 Uniquely in Heimdallarchaeia we inferred that formate could act as electron donor during aerobic 181 respiration through the actions of the heterotrimeric formate dehydrogenase O. This enzyme 182 facilitates the usage of formate under aerobic conditions, and together with nitrate reductase Z (also 183 present solely in Heimdallarchaeia) may participate in a formate to nitrate electron transport 184 pathway that is active when cells are shifted from aerobic to anaerobic conditions24. The presence of 185 genes encoding pyruvate oxidases (poxL) in Heimdallarchaeia (i.e. LC_2 and LC_3) implies further 186 oxygen usage, as the enzyme employs it in the pyruvate pool decarboxylation process (Figure 3). 187

4

188 Comparative genomic analyses also revealed that the three Asgardarchaeota phyla rely upon 189 (i.e. of Embden-Meyerhof-Parnas type) to fuel their metabolic machinery. Unexpectedly, 190 three Heimdallarchaeia MAGs (LC_3, AB_125 and AMARA_4) were found to employ non-canonical 191 ADP-dependent kinases that use ADP instead of the typical ATP as phosphoryl group donor25 in their 192 glycolytic pathways. Furthermore, they seemed to be bifunctional ADP-dependent 193 glucokinase/phosphofructokinases, which was puzzling since the presence of 6- 194 phosphofructokinases (LC_3 and AB_125) would render their bifunctionality redundant. In order to 195 elucidate the role of the putative bifunctional enzymes, we reconstructed the evolutionary history of 196 the ADP-dependent kinases and inferred that they possess glucokinase activity (based on tree 197 topology and the conserved functional residue E172) (Supplementary Figure 4). Additionally, we 198 observed that the deepest branching Heimdallarchaeia (LC_3) harbored the archaeal-type enzyme, 199 while the younger ones (AB_125 and AMARA_4) clustered together with the eukaryotic-type 200 (Supplementary Figure 4). Although it is easy to assume that cells under low energy conditions (e.g.

201 limiting O2 availability) could highly benefit from using residual ADP to activate sugar moieties and 202 fuel their glycolysis26, the metabolic advantage conferred by these ADP-dependent kinases is 203 unclear. 204 205 Although pentoses could be recycled via nucleotide degradation in all Asgardarchaeota phyla, their 206 synthesis differs between Loki-/Heimdallarchaeia that likely utilize the reverse ribulose 207 monophosphate pathway, and Thorarchaeia that employ the xylulose part (of the non-oxidative 208 branch) of the hexose monophosphate pathway. The identified homologues for ribulose 1,5- 209 bisphosphate carboxylase/oxygenase (RuBisCO) genes were found to appertain to types III (Loki- and 210 Heimdallarchaeia) and IV (Loki- and Thorarchaeia) (Supplementary Figure 5). While RuBisCO is a key

211 enzyme for CO2 fixation in the Calvin-Benson-Bassham cycle, the absence of phosphoribulokinase 212 renders this metabolic pathway highly improbable. However, we consider that the MAGs encoding 213 type III-like RuBisCO (assigned to Loki- and Heimdallarchaeia) utilize the nucleotide monophosphate 27 214 degradation pathway , performing CO2 fixation by linking nucleoside catabolism to 215 glycolysis/gluconeogenesis. This conclusion is supported by the co-occurrence of genes encoding for: 216 RuBisCO type III, AMP phosphorylases, ribose 1,5-bisphosphate isomerases, and carbonic 217 anhydrases. While carbon monoxide (CO) can be used as carbon and energy source in both aerobic 218 and anaerobic metabolisms28, the types of enzymes involved in the reaction are dependent upon the 219 available electron acceptor. Thus, while Heimdallarchaeia harbor all three major subunits of the 220 aerobic carbon monoxide dehydrogenases (CODH), Loki- and Thorarchaeia encoded the oxygen- 221 sensitive carbon monoxide dehydrogenase/acetyl-CoA synthase (CODH/ACS). We infer that while 222 Heimdallarchaeia use CO to obtain energy by shuttling the electrons generated from CO oxidation to 223 oxygen or nitrate, Thor- and Lokiarchaeia may utilize CO as both electron source and intermediary 224 substrate in the ancient Wood–Ljungdahl carbon fixation pathway29 (through CODH/ACS). 225 The mainstream theories on the subject of eukaryogenesis8 which date back to late 20th century have 226 been recently challenged by improved phylogenetic methods2 and increased genomic sampling5,20. 227 Even after experiencing a revival20, the current endosymbiotic theory fails to envision the 228 environmental and metabolic context from which the protoeukaryote ancestor emerged. In order to 229 study the phylogenetic relationships and the metabolic milieu of Asgardarchaeota we leveraged 230 metagenome-derived data. Despite the fact that the performed phylogenomic analyses recovered 231 an eocyte tree topology (in agreement with previous studies4,5), they did not reach consensus 232 regarding the branching point of eukaryotes from within the . Thus, while the SSU+LSU tree 233 showed the Eukaryota branching from within Heimdallarchaeia, the ribosomal protein one pointed 234 towards a sister-group relationship. As the available genomic data represents a fraction of the extant 235 Asgardarchaeota diversity, we consider that additional sampling will further improve the resolution 236 of the phylogenomic tree and clarify the phylogenetic relationship between Eukaryota and 237 Heimdallarchaeia. Overall, the performed metabolic reconstructions indicate that Heimdallarchaeia 238 are mixotrophic and have a facultative aerobic metabolism. The presence of oxygen-dependent

5

239 pathways in Heimdallarchaeia raises the possibility that the archaeal protoeukaryote ancestor could 240 have also been a facultative aerobe. Thus, based upon phylogenetic and metabolic reconstructions, 241 we propose a hypothesis (i.e. ‘aerobic-protoeukaryotes’ model) in which both the archaeal and 242 bacterial eukaryotic ancestors have an oxygen-dependent metabolism. This model surpasses some 243 of the theoretical shortcomings of the ‘hydrogen hypothesis’ by envisioning an endosymbiotic 244 association in which the primordial function of the bacterial counterpart (i.e. oxidative 245 phosphorylation) would not be detrimental to the existence of the archaeal host (caused by oxygen 246 exposure). This model is in agreement with a recent high-scale time-calibrated phylogenomic tree, 247 which shows that the archaeal-bacterial endosymbiosis that gave birth to the protoeukaryote 248 ancestor took place after the Great Oxidation Event30. Even though the present data offers an 249 updated perspective on the lifestyles of Asgardarchaeota, it is based on a fraction of the extant 250 environmental diversity. Thus, further studies will be needed in order to elucidate the lifestyle 251 strategies and evolutionary histories within the Asgardarchaeota. We consider that further 252 environmental exploration will undoubtedly offer access to additional eocytic lineages and improve 253 our apprehension of the evolution of Archaea and Eukaryota. 254 255 Methods 256 Sampling: Amara and Tekirghiol are naturally-formed shallow lakes in South-Eastern Romania that 257 harbor large deposits of organic-rich sediments (or ‘sapropels’). Amara Lake (44°36.30650 N, 258 27°19.52950 E; 32 m a.s.l.; 1.3 km31 area; maximum and average depths of 6 m and 2 m respectively) 259 is an oxbow lake with brackish water (salinity ca. 1%), originating from an early meander of the 260 Ialomița river (Romanian Plain) supposedly at the end of the Neolithic Black Sea transgression (ca. 261 3000 BC)32. Tekirghiol Lake (44°03.19017 N, 28°36.19083 E; 0.8 m a.s.l.; 11.6 km2 area; maximum and 262 average depths of 9 m and 3 m respectively) is a saline coastal lake (salinity ca. 6%) derived from a 263 marine lagoon which was isolated from the Black Sea by a narrow (~200 m wide) sand barrier, most 264 probably during the Phanagorian Black Sea regression (ca. 500-700 BC)31. Sediment sampling was 265 performed in the same manner in two campaigns, in both lakes. In the first sampling campaign, 266 sediment for exploratory chemical and metagenomics analyses was collected on 10 October 2017 at 267 12:00 in Tekirghiol Lake and on 11 October 2017 at 15:00 in Amara Lake. The successful recovery of 268 Asgardarchaeota genomes prompted a second sampling campaign for fine chemical profiling on 22 269 April 2018 at 12:00 in Tekirghiol Lake, on the site of the previous sampling. In Tekirghiol Lake, 270 sampling was performed in the shallow shore area of the lake (approx. 0.8 m depth) using a custom 271 sediment corer (1 m length, 10 cm diameter, sampling area of 78.5 cm2) with a sharpened bottom 272 rim and a removable plug. Five sediment layers were collected in 10 cm intervals (0 – 50 cm) and 273 deposited in sterile Falcon tubes. In Amara Lake the dense vegetation and increased water depth 274 (>1.5m) in the shore area hampered the usage of the custom sediment corer. Therefore, the 275 sampling was performed using a Petite Ponar dredge (Wildco, Saginaw, MI, USA) handled from an 276 inflatable boat. The grab penetrated the sediment layer to a depth of about 10 cm (sampling area 277 225 cm2). Mixed sediment samples obtained from three casts (within a 2 m radius) were collected 278 and deposited in sterile 1 L containers. Sediment samples were stored in the dark at 4 °C and 279 processed within 24 hours after collection. 280 Sediment chemical analyses: Chemical analyses were performed on both mixed samples (0 – 40 cm 281 Tekirghiol and 0 – 10 cm Amara) taken in 2017 and the vertical profile obtained from Tekirghiol Lake 282 in 2018. The leachable major ions were water-extracted using a sediment-to-(milli-Q) water ratio of 283 1:10 at room temperature. The suspension was centrifuged and the supernatant was filtered 284 through 0.22 μm-pore sized PTFE membranes. The obtained filtrate was further analyzed for ion 285 content (Supplementary Figure 10). The water extractable elements Na, K, Ca, Mg, P, Fe, and Mn, 286 were measured by inductively coupled plasma atomic emission spectrometry (ICP-AES) using Optima 287 5300DV spectrometer (Perkin Elmer, USA). Chloride (Cl-) ions were measured by titrimetric method. 2- 288 Sulphate (SO4 ) was assessed by ion chromatography on ICS-1500 (Dionex, Sunnyvale, CA, USA).

6

289 Dissolved carbon (DC) and dissolved inorganic carbon (DIC) were measured by catalytic combustion

290 and infrared detection of CO2 using a Multi N/C 2100S Analyser (Analytik Jena, Germany). Dissolved 291 organic carbon (DOC) was obtained by subtracting DIC from DC. Total dissolved nitrogen (DN) as 292 bound nitrogen (including free ammonia, ammonium, nitrite, nitrate, and organic nitrogen) was 293 analyzed by catalytic combustion followed by oxidation of nitrogen monoxide to nitrogen dioxide 294 with ozone and subsequent chemiluminescence detection. Ammonium ions were analysed by 295 spectrophotometry using Lambda 25 UV-VIS spectrophotometer (Perkin Elmer, Beaconsfield, UK) 296 following formation of colored complexes: indophenol blue complex (ammonium), yellow complex 297 formed with sulphosalicylic acid (nitrate), and red colored azo dye formed from diazonium salt in the 298 presence of N-(1-Naphthyl) ethylenediamine and sulphanilamide under acidic conditions (nitrite). 299 The concentration of sulphides was determined by methylene blue method after fixation of samples 300 with 2% (v/v) Zn-acetate. Humidity was estimated by loss-on-ignition (LOI) method following oven- 301 drying of sediments at 105oC for 24 h. The pH and salinity of pore water were measured with a 302 portable HI 9828 multiparameter (Hanna Instruments, Smithfield, RI, USA). All chemical analyses 303 were performed by E.A. Levei, A.M. Incze, and M. Șenilă at INCDO-INOE 2000 - Research Institute for 304 Analytical Instrumentation (Cluj-Napoca, Romania). 305 DNA extraction and purification: DNA was extracted from approximately 10 g of wet mixed 306 sediment samples (0 – 40 cm Tekirghiol and 0 – 10 cm Amara) collected in 2017 using the DNeasy 307 PowerMax Soil Kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. Extracted 308 DNA was further purified by passing it through humic acid removal columns (type IV-HRC) provided 309 in the ZR Soil Microbe DNA MiniPrep kit (Zymo Research, Irvine, CA, USA). Purified DNA was quality 310 checked and quantified using a ND-1000 NanoDrop spectrophotometer (Thermo Scientific, 311 Waltham, MA, USA). DNA integrity was assessed by agarose gel (1%) electrophoresis and ethidium 312 bromide staining. The samples were denominated as AMARA and TEKIR in accordance with their site 313 of origin. From each sample, 4 µg of pure DNA were vacuum dried in a SpeedVac concentrator 314 (Thermo Scientific, Waltham, MA, USA) and shipped for library construction and NGS sequencing to 315 Macrogen (Seoul, South Korea). 316 Sequencing and data preprocessing: Library preparation was performed by a commercial company 317 by using the TruSeq DNA PCR Free Library prep kit (Illumina). Whole-genome shotgun sequencing of 318 the 150 paired-end libraries (350bp insert size) was done using a HiSeq X (Illumina) platform. The 319 amount of total raw sequence data generated for each metagenome was: 64.5Gbp for Amara and 320 57.6 Gbp for Tekirghiol. Preprocessing of raw Illumina reads was carried out by using a combination 321 of software tools from the BBMap33 project (https://github.com/BioInfoTools/BBMap/). Briefly, 322 bbduk.sh was used to remove poor quality sequences (qtrim=rl trimq=18), to identify phiX and p- 323 Fosil2 control reads (k=21 ref=vectorfile ordered cardinality), and to remove Illumina adapters (k=21 324 ref=adapterfile ordered cardinality). 325 Abundance estimation for Loki- and Heimdallarchaeia: Preprocessed Illumina sets from Amara and 326 Tekirghiol lakes, as well as published3 set SRX684858 from Loki’s castle marine sediment 327 metagenome, were subsampled to 20 million reads by reformat.sh34. Each subset was queried for 328 putative RNA sequences by scanning with UBLAST35 against the non-redundant SILVA 329 SSURef_NR99_132 database36, that was priorly clustered at 85% sequence identity by UCLUST35. 330 Identified putative 16S rRNA sequences (e-value < 1e-5) were screened using SSU-ALIGN37. Resulting 331 bona fide 16S rRNA sequences were compared by blastn38 (e-value <1e-5) against the curated SILVA 332 SSURef_NR99_132 database. Matches with identity ≥ 80% and alignment length ≥90 bp were 333 considered for downstream analyses. Sequences assigned to Loki- and Heimdallarcheia were used to 334 calculate abundances for these taxa in their originating environments (Supplementary Table 1).

335 Environmental distribution of Heimdallarchaeia: In order to investigate the existence of a habitat 336 preference we extracted all the available environmental data from the SILVA database36 (version 337 132) that was associated with Heimdallarchaeia sequences.

7

338 Metagenome assembly and binning: De novo assembly of preprocessed paired-end Illumina reads 339 was done by Megahit39 v.1.1.1 with k-mer list: 39, 49, 69, 89, 109, 129, 149, and with default 340 parameters. Assembled contigs with minimum nucleotide fragment length of 3 kbp were binned by a 341 combination of -dependent and -independent methods. Protein coding genes were 342 predicted by MetaProdigal40. Taxonomy dependent binning was achieved by first assigning 343 taxonomy labels to the predicted genes by performing screenings with MMseqs241 against the NR 344 database. All contigs with a minimum of 30 % genes assigned to Asgardarchaeota were used for 345 taxonomy-independent binning. Mean base coverage for each contig was computed with bbwrap.sh 346 (default parameters) by mapping preprocessed reads from AMARA and TEKIR datasets to the 347 assembled contigs. Hybrid binning (based on tetranucleotide frequencies and coverage data) was 348 performed using MetaBAT242 with default parameters. Bin completeness, contamination and strain 349 heterogeneity were estimated using CheckM43 with default parameters. Poorly resolved bins (i.e. 350 contamination >10%, unbinned contigs) were further manually curated by a combination of 351 tetranucleotide frequency PCA graphs and repeated rounds of contamination/completeness 352 assessment by CheckM. Final curated bins with CheckM estimated completeness above 10% and 353 contamination below 3% were denominated as metagenome assembled genomes (MAGs). A total of 354 35 MAGs were recovered: 23 Lokiarchaeia, 10 Thorarchaeia and 2 Heimdallarchaeia (Supplementary 355 Table 1). Unbinned contigs were kept for further analyses (total nucleotide bases/site: 3.46 Mbp 356 Amara and 4.06 Mbp Tekirghiol). 357 Genome annotation: Publicly available Asgardarchaeota genomes were downloaded from the NCBI 358 Genome section (https://www.ncbi.nlm.nih.gov/genome). Coding sequences were predicted de 359 novo with Prokka44 for all available MAGs (35 from this study, 14 from NCBI – Accession 360 numbers can be found in Supplementary Table 2). BlastKOALA45 was used to assign KO identifiers (K 361 numbers) to orthologous genes (Supplementary Table 3). Inferences of metabolic pathways and 362 general biological functions were conducted with the online KEGG mapping tools 363 (https://www.genome.jp/kegg/kegg1b.html) using summarized KO numbers assigned to each group. 364 Odinarchaeia were not considered for metabolic reconstruction due to paucity of genome-level 365 data. Ribosomal RNA (rRNA)-coding regions (16S, 23S) and transfer RNA (tRNA)-coding regions were 366 predicted with Barrnap (https://github.com/tseemann/barrnap) and tRNAscan-SE46, respectively. All 367 predicted proteins were queried against NCBI NR, COGs (cluster of orthologous groups) and arCOGs 368 (archaeal cluster of orthologous groups)47. A locally installed version of InterProScan48 was used with 369 default settings to annotate protein domains. Potential eukaryote specific proteins (ESP) were 370 identified based on previously published lists of IPR domains5 (Supplementary Table 5) identified in 371 Asgardarchaeota. Previously unidentified ESP were searched based on key words related to 372 eukaryotic specific processes and/or structures. All IPR domains present exclusively in 373 Asgardarchaeota genomes assembled in this study were manually screened by querying accession 374 numbers against the online InterPro database (https://www.ebi.ac.uk/interpro/search/sequence- 375 search), for associations with eukaryotic specific domains. A previously identified5 ESP - DNA 376 polymerase epsilon, catalytic subunit (IPR029703) - was identified by querying all MAG proteomes 377 with human sequences (Supplementary Table 5). Several candidate ESP sequences were further 378 analyzed using jackhmmer49, Phyre250 and Phobius51. 379 Phylogenetic trees: While all the genomic data was used in metabolic reconstructions, the 380 phylogenetic analyses were performed only with MAGs that had >50% completeness. Due to the 381 challenges associated with reconstructing the evolutionary relationships between archaea and 382 eukayotes5, in our inferences we used only those MAGs (n= 8; Supplementary Table 1) that 383 harbored at least 75% of total phylogenetic markers (See Supplementary Table 7). 384 A total of 131 taxa were considered for concatenated small subunit (SSU) and larger subunit (LSU) 385 ribosomal RNA phylogenetic analyses, consisting of: 97 archaea (37 , 24 386 , 2 Bathyarchaeota, 15 , 3 Aigarchaeota, 2 , 14 387 Asgardarchaeota), 21 bacteria and 13 eukaryotes (Supplementary Table 6). SSU and LSU sequences

8

388 were aligned independently by PRANK52 (parameters: -DNA +F), trimmed using BMGE53 (–m 389 DNAPAM250:4 –g 0.5) and concatenated. Members of the DPANN group of Archaea (Diapherotrites, 390 Parvarchaeota, Aenigmarchaeota, , Nanohaloarchaeota, Woesearchaeota, and 391 Pacearchaeota) were not included due to their known tendency to cause phylogenetic artefacts5. 392 Maximum likelihood phylogeny for concatenated SSU+LSU gene sequences was inferred using IQ- 393 TREE (-m GTR+I+G4+F) with ultrafast bootstrapping -bb 1000 and Shimodaira-Hasegawa testing –alrt 394 100054,55. 395 A total of 93 taxa were considered for concatenated ribosomal protein phylogenomic analyses, 396 consisting of: 85 Archaea (25 Euryarchaeota, 22 Crenarchaeota, 2 Bathyarchaeota, 4 397 Thaumarchaeota, 1 Aigarchaeota, 3 Korarchaeota, 21 Asgardarchaeota, 7 DPANN) and 8 eukaryotes 398 (Supplementary Table 6). Selection criteria for phylogenomic trees of ribosomal proteins conserved 399 between archaea and eukaryotes have been previously described5. Amino-acid sequences for the 55 400 ribosomal proteins were queried and retrieved based on arCOG annotations. Markers not found in 401 the majority of organisms were discarded, obtaining a final set of 48 markers (Supplementary Table 402 7). Additionally, some proteins that were not identified by arCOG scanning were retrieved from NCBI 403 Protein (https://www.ncbi.nlm.nih.gov/protein). Sequences were aligned using PRANK (-protein +F), 404 trimmed with BMGE53 (-m BLOSUM30 -t AA -g 0.2 -b 3), concatenated, and subjected to SR4 amino 405 acid recoding13. Maximum likelihood trees were generated by IQ-TREE (-bb 1000, -alrt 1000) with 406 ultrafast bootstrapping54 and the custom ‘C60SR4’ model described in a previous study5. Bayesian 407 inference phylogenies were constructed using PhyloBayes MPI 1.856, using the CAT-Poisson model. 408 Four chains were run in parallel until estimated maxdiff values calculated by bpcomp (-x 5000 10) fell 409 below the recommended 0.3 threshold, indicating convergence between chains. 410 Multiple sequence alignment of rhodopsins: The three groups of rhodopsins (type-1, 411 schizorhodopsins and heliorhodopsins), were first aligned independently using T_Coffee57 412 (http://tcoffee.crg.cat/) in accurate mode, that employs protein structure information, wherever 413 available, or sequence comparisons with homologues in databases to improve accuracy. These 414 alignments were aligned to each other using the profile alignment mode in T_Coffee. 415 RuBisCO tree reconstruction: MUSCLE58 was used for aligning the sequences (n=146) of the large 416 subunit of RuBisCO (types I-III) and RuBisCO-like (type IV) (rbcL, K01601) proteins. Sequences not 417 generated in this study were recovered from previous studies59,60. For both alignments the maximum 418 likelihood tree was constructed with FastTree2 using a JTT model, a gamma approximation, and 100 419 bootstrap replicates. 420 Phylogenetic inference of Heimdallarchaeia glucokinases and kynurenine pathway proteins: ADP- 421 dependent phosphofructokinase/glucokinase protein sequences were identified by their assigned 422 KO number (K00918) in 3 MAGs (AMARA_4, Heimdall_AB_125, Heimdall_LC_3). Retrieved 423 sequences were used along with 49 other sugar kinases published in a previous study61. Protein 424 sequences of components of the kynurenine pathway - tryptophan 2,3-dioxygenase (TDO), 425 kynurenine 3-monooxygenase (KMO) and 3-hydroxyanthranilate 3,4-dioxygenase (HAAO) – that 426 were identified only in Heimdallarchaeia MAGs, were used along with sequences of corresponding 427 enzymes from 12 Eukaryotes and 15 Bacteria that were retrieved from NCBI RefSeq (Accession 428 numbers in Supplementary Table 8). MAFFT-L-INS-i62 (default parameters) and PRANK52 429 (parameters: -PROTEIN +F) were used for aligning sugar kinase and kynurenine pathway enzyme 430 sequences, respectively, followed by trimming using BMGE53 (-m BLOSUM30 -t AA -g 0.5 -b 3). Single 431 protein maximum likelihood trees were constructed with FastTree263, using an accurate search 432 strategy (-mlacc 2 –spr 4 –slownni), and 100 standard bootstrap replicates. 433 Data and code availability

434 Sequence data generated during this study has been deposited in the NCBI Sequence Read Archive 435 (SRA) under study number SRP155597 and linked to BioProject ID PRJNA483005. The Whole Genome

9

436 Shotgun project containing genome bins assembled in this study has been deposited at 437 DDBJ/ENA/GenBank under the accessions SDMS00000000-SDOA00000000. The versions described 438 in this paper are version SDMS01000000-SDOA01000000. Derived data supporting the findings 439 presented in this paper are available in the Figshare repository with identifier DOI: 440 10.6084/m9.figshare.702262, [https://figshare.com/s/eecba719e91d17c60c3f]. All other relevant 441 data supporting the findings of this study are available within the paper and its supplementary 442 information files. No custom code that is central to the conclusions of this study was generated. All 443 programs used in data analyses are listed in detail with their version numbers in the Nature Research 444 Reporting Summary linked to this article.

445

446 References

447 1. Cox, C. J., Foster, P. G., Hirt, R. P., Harris, S. R. & Embley, T. M. The archaebacterial origin of 448 eukaryotes. Proc. Natl. Acad. Sci. 105, 20356 LP-20361 (2008). 449 2. Williams, T. A., Foster, P. G., Cox, C. J. & Embley, T. M. An archaeal origin of eukaryotes 450 supports only two primary domains of life. Nature 504, 231 (2013). 451 3. Spang, A. et al. Complex archaea that bridge the gap between prokaryotes and eukaryotes. 452 Nature 521, 173 (2015). 453 4. Spang, A. et al. Asgard archaea are the closest prokaryotic relatives of eukaryotes. PLOS 454 Genet. 14, e1007080 (2018). 455 5. Zaremba-Niedzwiedzka, K. et al. Asgard archaea illuminate the origin of eukaryotic cellular 456 complexity. Nature 541, 353–358 (2017). 457 6. McInerney, J. O., O’Connell, M. J. & Pisani, D. The hybrid nature of the Eukaryota and a 458 consilient view of life on Earth. Nat. Rev. Microbiol. 12, 449 (2014). 459 7. Da Cunha, V., Gaia, M., Nasir, A. & Forterre, P. Asgard archaea do not close the debate about 460 the universal tree of life topology. PLOS Genet. 14, e1007215 (2018). 461 8. de Duve, C. The origin of eukaryotes: a reappraisal. Nature reviews. Genetics 8, 395–403 462 (2007). 463 9. Lake, J. A., Henderson, E., Oakes, M. & Clark, M. W. Eocytes: a new ribosome structure 464 indicates a with a close relationship to eukaryotes. Proc. Natl. Acad. Sci. 81, 3786– 465 3790 (1984). 466 10. Woese, C. R., Kandler, O. & Wheelis, M. L. Towards a natural system of organisms: proposal 467 for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. U. S. A. 87, 4576–4579 468 (1990). 469 11. Eme, L., Spang, A., Lombard, J., Stairs, C. W. & Ettema, T. J. G. Archaea and the origin of 470 eukaryotes. Nat. Rev. Microbiol. 15, 711–723 (2017). 471 12. Martijn, J., Vosseberg, J., Guy, L., Offre, P. & Ettema, T. J. G. Deep mitochondrial origin 472 outside the sampled alphaproteobacteria. Nature 557, 101–105 (2018). 473 13. Susko, E. & Roger, A. J. On reduced amino acid alphabets for phylogenetic inference. Mol. 474 Biol. Evol. 24, 2139–2150 (2007). 475 14. Liu, Y. et al. Comparative genomic inference suggests mixotrophic lifestyle for Thorarchaeota. 476 ISME J. 12, 1021–1031 (2018). 477 15. Dodding, M. P. Folliculin – A tumor suppressor at the intersection of metabolic signaling and 478 membrane traffic. Small GTPases 8, 100–105 (2017). 479 16. Pushkarev, A. et al. A distinct abundant group of microbial rhodopsins discovered using 480 functional metagenomics. Nature 558, 595–599 (2018). 481 17. Flores-Uribe, J. et al. Heliorhodopsins are absent in diderm (Gram-negative) bacteria: Some 482 thoughts and possible implications for activity. Environ. Microbiol. Rep. 0, (2019). 483 18. Petrovskaya, L. E. et al. Predicted bacteriorhodopsin from Exiguobacterium sibiricum is a 484 functional proton pump. FEBS Lett. 584, 4193–4196 (2010).

10

485 19. Alexe, M. Studiul lacurilor sărate din Depresiunea Transilvaniei. (Presa Universitară Clujeană, 486 2010). 487 20. Sousa, F. L., Neukirchen, S., Allen, J. F., Lane, N. & Martin, W. F. Lokiarchaeon is hydrogen 488 dependent. Nat. Microbiol. 1, 16034 (2016). 489 21. Ternes, C. M. & Schönknecht, G. Gene Transfers Shaped the Evolution of De Novo NAD(+) 490 Biosynthesis in Eukaryotes. Genome Biol. Evol. 6, 2335–2349 (2014). 491 22. Gazzaniga, F., Stebbins, R., Chang, S. Z., McPeek, M. A. & Brenner, C. Microbial NAD 492 Metabolism: Lessons from Comparative Genomics. Microbiol. Mol. Biol. Rev. 73, 529–541 493 (2009). 494 23. Kurnasov, O. et al. Aerobic tryptophan degradation pathway in bacteria: novel kynurenine 495 formamidase. FEMS Microbiol. Lett. 227, 219–227 (2003). 496 24. Abaibou, H., Pommier, J., Benoit, S., Giordano, G. & Mandrand-Berthelot, M. A. Expression 497 and characterization of the Escherichia coli fdo locus and a possible physiological role for 498 aerobic formate dehydrogenase. J. Bacteriol. 177, 7141–7149 (1995). 499 25. Brasen, C., Esser, D., Rauch, B. & Siebers, B. Carbohydrate metabolism in Archaea: current 500 insights into unusual enzymes and pathways and their regulation. Microbiol. Mol. Biol. Rev. 501 78, 89–175 (2014). 502 26. Dorr, C., Zaparty, M., Tjaden, B., Brinkmann, H. & Siebers, B. The of the 503 hyperthermophile Thermoproteus tenax. ATP-dependent and ADP-dependent 504 glucokinases, teo alternatives for glucose phosphorylation in Archaea. J. Biol. Chem. 278, 505 18744–18753 (2003). 506 27. Kono, T. et al. A RuBisCO-mediated carbon metabolic pathway in methanogenic archaea. Nat. 507 Commun. 8, 14007 (2017). 508 28. Techtmann, S. M. et al. Evidence for horizontal gene transfer of anaerobic carbon monoxide 509 dehydrogenases. Front. Microbiol. 3, 132 (2012). 510 29. Martin, W. F. Hydrogen, metals, bifurcating electrons, and proton gradients: The early 511 evolution of biological energy conservation. FEBS Lett. 586, 485–493 (2012). 512 30. Betts, H. et al. Integrated genomic and fossil evidence illuminates life’s early evolution and 513 eukaryote origins. Nat. Ecol. Evol. (2018). 514 31. Gastescu, P. & Teodorescu, D. C. The lakes of the Romanian Black Sea coast. Maninduced 515 changes, water regime, present state. Rev. Roum. Gėogr./Rom. Journ. Geogr. 60, 27–42 516 (2016). 517 32. Fedorov, P. V. Postglacial transgression of the Black Sea. Int. Geol. Rev. 14, 160–164 (1972). 518 33. Bushnell, B., Rood, J. & Singer, E. BBMerge - Accurate paired shotgun read merging via 519 overlap. PLoS One 12, e0185056 (2017). 520 34. Bushnell, B. BBMap short read aligner. Univ. California, Berkeley, California. URL 521 http//sourceforge. net/projects/bbmap (2016). 522 35. Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 523 2460–2461 (2010). 524 36. Pruesse, E. et al. SILVA: a comprehensive online resource for quality checked and aligned 525 ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 35, 7188–7196 526 (2007). 527 37. Nawrocki, E. P. Structural RNA Homology Search and Alignment using Covariance Models. 528 (Washington University School of Medicine, 2009). 529 38. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search 530 tool. J. Mol. Biol. 215, 403–410 (1990). 531 39. Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node 532 solution for large and complex metagenomics assembly via succinct de Bruijn graph. 533 Bioinformatics 31, 1674–1676 (2015). 534 40. Hyatt, D., LoCascio, P. F., Hauser, L. J. & Uberbacher, E. C. Gene and translation initiation site 535 prediction in metagenomic sequences. Bioinformatics 28, 2223–2230 (2012).

11

536 41. Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the 537 analysis of massive data sets. Nat. Biotechnol. 35, 1026 (2017). 538 42. Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately 539 reconstructing single genomes from complex microbial communities. PeerJ 3, e1165 (2015). 540 43. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing 541 the quality of microbial genomes recovered from isolates, single cells, and metagenomes. 542 Genome Res. 25, 1043–1055 (2015). 543 44. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 544 (2014). 545 45. Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG Tools for 546 Functional Characterization of Genome and Metagenome Sequences. J. Mol. Biol. 428, 726– 547 731 (2016). 548 46. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA 549 genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997). 550 47. Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Archaeal Clusters of Orthologous Genes (arCOGs): 551 An Update and Application for Analysis of Shared Features between , 552 , and . Life (Basel, Switzerland) 5, 818–840 (2015). 553 48. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 554 30, 1236–1240 (2014). 555 49. Finn, R. D. et al. HMMER web server: 2015 update. Nucleic Acids Res. 43, W30-8 (2015). 556 50. Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. E. The Phyre2 web portal 557 for protein modeling, prediction and analysis. Nat. Protoc. 10, 845–858 (2015). 558 51. Käll, L., Krogh, A. & Sonnhammer, E. L. L. Advantages of combined transmembrane topology 559 and signal peptide prediction—the Phobius web server. Nucleic Acids Res. 35, W429–W432 560 (2007). 561 52. Loytynoja, A. Phylogeny-aware alignment with PRANK. Methods Mol. Biol. 1079, 155–170 562 (2014). 563 53. Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): a new 564 software for selection of phylogenetic informative regions from multiple sequence 565 alignments. BMC Evol. Biol. 10, 210 (2010). 566 54. Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: Improving 567 the Ultrafast Bootstrap Approximation. Mol. Biol. Evol. 35, 518–522 (2018). 568 55. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: A Fast and Effective 569 Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol. Biol. Evol. 32, 570 268–274 (2015). 571 56. Lartillot, N., Rodrigue, N., Stubbs, D. & Richer, J. PhyloBayes MPI: phylogenetic reconstruction 572 with infinite mixtures of profiles in a parallel environment. Syst. Biol. 62, 611–615 (2013). 573 57. Notredame, C., Higgins, D. G. & Heringa, J. T-Coffee: A novel method for fast and accurate 574 multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000). 575 58. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. 576 Nucleic Acids Res. 32, 1792–1797 (2004). 577 59. Tabita, F. R. et al. Function, structure, and evolution of the RubisCO-like proteins and their 578 RubisCO homologs. Microbiol. Mol. Biol. Rev. 71, 576–599 (2007). 579 60. Wrighton, K. C. et al. RubisCO of a nucleoside pathway known from Archaea is found in 580 diverse uncultivated phyla in bacteria. ISME J. 10, 2702–2714 (2016). 581 61. Castro-Fernandez, V. et al. Reconstructed ancestral enzymes reveal that negative selection 582 drove the evolution of substrate specificity in ADP-dependent kinases. J. Biol. Chem. 292, 583 21218 (2017). 584 62. Katoh, K. MAFFT: a novel method for rapid multiple sequence alignment based on fast 585 Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002). 586 63. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2--approximately maximum-likelihood trees

12

587 for large alignments. PLoS One 5, e9490 (2010). 588 589 590 591 Correspondence and requests for materials should be addressed to R.G. ([email protected]). 592 Acknowledgements 593 We are thankful to Z. Keresztes, V. Muntean, T. Szőke-Nagy, M. Alexe, A. Cristea, and A. Baricz for 594 their technical support during sampling and sample preparation. The contribution of E. A. Levei and 595 M. Șenilă in performing chemical analyses is kindly acknowledged. P-A.B was supported by the 596 research grant PN-III-P4-ID-PCE-2016-0303 (Romanian National Authority for Scientific Research). 597 H.L.B. was supported by the research grants: PN-III-P4-ID-PCE-2016-0303 (Romanian National 598 Authority for Scientific Research) and STAR-UBB Advanced Fellowship-Intern (Babeș-Bolyai 599 University). A-Ş.A. was supported by the research grants: 17-04828S (Grant Agency of the Czech 600 Republic) and MSM200961801 (Academy of Sciences of the Czech Republic). M.M. was supported by 601 the Postdoctoral program PPPLZ L200961651 (Academy of Sciences of the Czech Republic). R.G. was 602 supported by the research grant 17-04828S (Grant Agency of the Czech Republic). 603 604 Contributions 605 H.L.B. and P-A.B. designed the study. P-A.B., A-Ş.A. and R.G. wrote the manuscript. P-A.B., A-Ş.A., 606 R.G., M.M.S and M.M. analyzed and interpreted the data. R.G., O.B., K.I. and H.K. performed 607 rhodopsin data analyses. All authors commented on and approved the manuscript. 608 609 Competing interests 610 The authors declare no competing interests. 611 612 613 Figures Legends 614 615 Figure 1. Asgardarchaeota phylogenomics. a, Maximum likelihood (LG+C60, general matrix and 60- 616 profile protein models of amino acid substitution) phylogeny of the Asgardarchaeota superphylum. 617 The green circles highlight UFBoot values higher than 95. b, Asgardarchaeota phylogeny generated 618 through Bayesian inference (CAT-Poisson, CAT model of amino acid substitution with uniform global 619 exchange rates). The posterior probability values are shown above the internal nodes. High support 620 for Eukaryota/Asgardarchaeota monophyly, and the low support for Eukaryota/Heimdallarchaeia 621 association is indicated by red and blue rectangles on the nodes respectively. The black arrow 622 indicates the unresolved position of Lokiarchaeia. Scale bars indicate the number of substitutions 623 per site. a,b, Both phylogenies are based on 48 concatenated ribosomal protein markers 624 (Supplementary Table 7) with a, 85 archaeal lineages (Supplementary Table 6) selected for ML- 625 inference and b, 85 archaeal + 8 eukaryotic lineages used for Bayesian inference (Supplementary 626 Table 6). Panel (c) provides a census of the eukaryotic signature proteins (ESP) found in all available 627 MAGs. The grey box highlights ESP identified during this study.

628 Figure 2. Phylogenetic analysis of rhodopsins. An unrooted maximum likelihood tree of all 629 Asgardarchaeota schizorhodopsins (n=6) identified in this work, heliorhodopsins and representative 630 known type-1 rhodopsins, is shown. The branches colored red are sequences from the 631 Asgardarchaeota. Bootstrap values on nodes are indicated by colored circles (see color key at the 632 right). A total of 392 sequences, spanning known rhodopsin famillies and including schizorhodopsins 633 retrieved in this study, were used for phylogenetic inference. All related data was deposited in 634 Figshare.

13

635 636 Figure 3. Metabolic reconstruction of Heimdallarchaeia. The text in the yellow panels depicts 637 names of pathways and metabolic processes. Abbreviations: ACSS - acetyl-CoA synthetase and 638 carbonic anhydrase; acyP - acylphosphatase; ampp - AMP phosphorylase; APRT – AMP 639 pyrophosphorylase; ArsC - arsenate reductase (glutaredoxin); BCAA - branched-chain amino acid; 640 CODH - carbon monoxide dehydrogenase; gcvPAB – glycine dehydrogenase; glyA – glycine 641 hydroxymethyltransferase; hmp – nitric oxide dioxygenase; maeA - malate dehydrogenase 642 (decarboxylating); PC - pyruvate carboxylase; PEPCK - phosphoenolpyruvate carboxykinase; pflD - 643 formate C-acetyltransferase; PFOR - pyruvate ferredoxin oxidoreductase; PK - ; poxL 644 – pyruvate oxidase; PPDK - pyruvate, phosphate dikinase; PPP - pentose phosphate pathway; Rpi - 645 ribose-5-phosphate isomerase; RuBisCO - Ribulose-1,5-bisphosphate carboxylase/oxygenase; SOD - 646 superoxide dismutase; TCA - tricarboxylic acid cycle.

647 Figure 4. De novo NAD+ synthesis pathways. The colored boxes show a schematic representation of 648 the kynurenine and aspartate pathways involved in de novo NAD+ synthesis. The presence of the 649 enzymes involved in these pathways is indicated for each MAG by using a colored circle. 650 Abbreviations: 3HAO - 3-hydroxyanthranilate 3,4-dioxygenase; AFMID - arylformamidase; ASO - L- 651 aspartate oxidase; KMO - kynurenine 3-monooxygenase; KYNU - kynureninase; NMNAT - 652 nicotinamide nucleotide adenylyltransferase; NS - NAD+ synthase; QPT - nicotinate-nucleotide 653 pyrophosphorylase; QS - quinolinate synthase; TDO - tryptophan 2,3-dioxygenase.

14 a b 0.67 2 Bathyarchaeota 1 Thor_MP8T_1 1 4 Thaumarchaeota 0.78 Thor_MP9T_1 1 Aigarchaeota 1 22 Crenarchaeota AMARA_8 1 1 AMARA_2 3 Korarchaeota Odin_LCB_4 Odinarchaeia Thor_MP11T_1 1 TEKIR_12S Thor_SMTZ1_45 1 TEKIR_14 Thor_AB_25 Thor_SMTZ1_83 Thorarchaeia Thor_SMTZ1_83 1 Thor_AB_25 TEKIR_12S Thor_MP11T_1 TEKIR_14 Thor_SMZ1_45 AMARA_2

Odin_LCB_4 Odinarchaeia Thorarchaeia 0.93 AMARA_8 AMARA_1S Thor_MP8T_1 Lokiarc_75 Thor_MP9T_1 TEKIR_21 1 8 TEKIR_8 Loki_CR_4 1 AMARA_1 Eukaryota

AMARA_1 Lokiarchaeia 93 92 0.52 TEKIR_8 Loki_CR_4 Lokiarc_75 Heimdall_LC_2 AMARA_1S Heimdall_RS678 TEKIR_21 Lokiarchaeia Heimdall_AB_125 Heimdall_LC_3 0.99 Heimdall -archaeia Heimdall_AB_125 Heimdall_LC_3 Heimdall_LC_2 Asgardarchaeota Asgardarchaeota Heimdall 0.2 Heimdall_RS678 -archaeia 0.3 Odin Heimdall c -archaeia Thorarchaeia -archaeia Lokiarchaeia Odin_LCB_4 Thor_SMTZ1_83 Thor_SMTZ1_45 Thor_SMTZ_45 Thor_MP9T_1 Thor_MP8T_1 Thor_MP11T_1 Thor_AB_25 TEKIR_2S-1 TEKIR_22 TEKIR_17 TEKIR_14 TEKIR_12S TEKIR_7 TEKIR_6 AMARA_8 AMARA_4S AMARA_2 Heimdall_RS678 Heimdall_LC_3 Heimdall_LC_2 Heimdall_AB_125 TEKIR_3 AMARA_4 Loki_GC14_75 Loki_CR_4 TEKIR_24 TEKIR_23 TEKIR_21 TEKIR_18 TEKIR_15 TEKIR_13 TEKIR_10 TEKIR_9 TEKIR_8 TEKIR_7S TEKIR_6S TEKIR_5S TEKIR_5 TEKIR_4 TEKIR_2 TEKIR_1 AMARA_7 AMARA_5S AMARA_5 AMARA_3 AMARA_2S AMARA_1S AMARA_1

DNA polymerase ε-like Topoisomerase IB RNA polymerase, subunit G (rpb8) Ribosomal protein L22e processing Ribosomal protein L28e/Mak16 Information

Tubulins Actin family (divergent) Actin/actin-like conserved site Arp2/3 complex subunit 2/4 Gelsolin-domain protein cytoskeleton Profilin Cell division/

ESCRT-I: Vps28-like ESCRT-I: steadiness box domain ESCRT-II: EAP30 domain

ESCRT-II: Vps25-like sorting

ESCRTIII: Vps2/24/46-like Endosomal

Ubiquitin-domain protein E2-like ubiquitin conjugating protein

E3 UFM1-protein ligase 1 system Ubiquitin

RAG-type GTPase domain Longin-domain protein Vacuolar fusion protein Mon1 RLC7 roadblock domain protein TRAPP-domain protein

Zinc finger, Sec23/Sec24-type rafficking T Arrestin-like proteins (C-terminal) machinery Vesicle coat complex COPII sub. Folliculin (N-terminal)

Ribophorin I Oligosaccharyl transf. OST3/OST6 Oligosaccharyl transf. STT3 OST complex

Figure 1. Asgardarchaeota phylogenomics. a, Maximum likelihood (LG+C60, general matrix and 60-profile protein models of amino acid substitution) phylogeny of the Asgardarchaeota superphylum. The green circles highlight UFBoot values higher than 95. b, Asgardarchaeota phylogeny generated through Bayesian inference (CAT-Poisson, CAT model of amino acid substitution with uniform global exchange rates). The posterior probability values are shown above the internal nodes. High support for Eukaryota/Asgardarchaeota monophyly, and the low support for Eukaryota/Heimdallarchaeia association is indicated by red and blue rectangles on the nodes respectively. The black arrow indicates the unresolved position of Lokiarchaeia. Scale bars indicate the number of substitutions per site. a,b, Both phylogenies are based on 48 concatenated ribosomal protein markers (Supplementary Table 7) with a, 85 archaeal lineages (Supplementary Table 6) selected for ML-inference and b, 85 archaeal + 8 eukaryotic lineages used for Bayesian inference (Supplementary Table 6). Panel (c) provides a census of the eukaryotic signature proteins (ESP) found in all available MAGs. The grey box highlights ESP identified during this study. Asgard schizorhodopsins

Heimdallarchaeia rhodopsin (type-1)

Figure 2. Phylogenetic analysis of rhodopsins. An unrooted maximum likelihood tree of all Asgardarchaeota schizorhodopsins (n=6) identified in this work, heliorhodopsins and representative known type-1 rhodopsins, is shown. The branches colored red are sequences from the Asgardarchaeota. Bootstrap values on nodes are indicated by colored circles (see color key at the right). A total of 392 sequences, spanning known rhodopsin famillies and including schizorhodopsins retrieved in this study, were used for phylogenetic inference. All related data was deposited in Figshare. Bacteriorhodopsin (putative H+ Glycoside Co2+ Polysaccharide Helio- pump) Simple Multiple Pentoside Zn2+ transport Peptide/ Polar Basic rhodopsin Schizorhodopsin 2+ + sugar sugars Hexuronide Cd (PST) Phosphonate Nickel BCAA AA Amine AA H Light

Glucose-1P Glucose-6P L-Tryptophan L-Aspartate As3+ Fe2+ Acr3 SOD catalase feoA/feoB 3+ 2O +2H+ H O H O As Fructose-6P Ribulose-5P Quinolinate 2 2 2 2 efeU Kynurenine 3+ Pathway + O 0.5 O As Fructose-1,6P2 Archaeal NAD 2 2 Fe2+/Zn2+ ArsA/ArsB ArsC PPP fieF pump APRT acyP Glyceraldehyde-3P Ribose-5P PRPP AMP Acetyl-P Acetate Phosphate/As5+ 5+ As ampp Cl- PiT family H2O2 Glycerate-1,3P2 CO Adenine Ribose-1,5P2 clcA Tetracyclines 2 ACSS Rpi CO2 Damaged/misfolded RuBisCO proteins Antibiotics Quinolones Glycerate-2P Ribulose-1,5P2 (MFS transporters) Fe3+ poxL Acetyl-CoA CO PPDK 2 O Archaellum ? Macrolides PFOR 2 oligopeptides Glycerate-3P PEP Pyruvate Acetyl-CoA Chemotactic PK 2C LonB protease signals Acylglycerol ATP Formate PC ATP pflD Citrate degradation ADP+Pi ADP+Pi 3-Phenylpropionic Formyl-THF PEPCK Oxaloacetate cis-Aconitate Tungstate Purines, acid CO fMet-tRNA 2 maeA Fatty Acid Malate D-Isocitrate Glycerol-3P THF β-Oxidation Tat Methenyl-THF CO2 Extracellular Glycine Glycerol-3P hydrolases Twin-arginine TCA dehydrogenase glyA gcvPAB Fumarate Protein CO2 DHAP Methylene-THF translocation - α-Ketoglutarate carbonic 2e Succinate anhydrase HS- Sec-dependent Serine CO +NH + CO 2- 2 4 2 CO2+H2O HCO3 Sec Succinyl-CoA HS - hmp cyanophycin N-β-aspartylarginine n NO + O NO - 3C 2H+ 2 3 H O ADP+Pi ATP - Sulfide:quinone Thiamine 2 2e + oxidoreductase Formate NAD + - NADH EA + H 2H +0.5O2 dehydrogenase O NO3 - - + Fumarate Succinate NO2 NO CO CO H e- 3 2 e- - EA - - EA EAe 2e 2e

+ Nitrate + + Formate CO +H CODH H Succinate H + 2 reductase 1 NO - NO - NADH 2H 2 3 (aerobic type) dehydrogenase cytochrome c oxidase V/A-type ATPase Nitrate/nitrite dehydrogenase transporter

EA XYZ

ABC-type Other Electron transporter transporters Antiporter acceptor Enzyme Rhodopsins

Figure 3. Metabolic reconstruction of Heimdallarchaeia. The text in the yellow panels depicts names of pathways and metabolic processes. Abbreviations: ACSS - acetyl-CoA synthetase and carbonic anhydrase; acyP - acylphosphatase; ampp - AMP phosphorylase; APRT - AMP pyrophosphorylase; ArsC - arsenate reductase (glutaredoxin); BCAA - branched-chain amino acid; CODH - carbon monoxide dehydrogenase; gcvPAB – glycine dehydrogenase; glyA - glycine hydroxymethyltransferase; hmp - nitric oxide dioxygenase; maeA - malate dehydrogenase (decarboxylating); PC - pyruvate carboxylase; PEPCK - phosphoenolpyruvate carboxykinase; pflD - formate C-acetyltransferase; PFOR - pyruvate ferredoxin oxidoreductase; PK - pyruvate kinase; poxL – pyruvate oxidase; PPDK - pyruvate, phosphate dikinase; PPP - pentose phosphate pathway; Rpi - ribose-5-phosphate isomerase; RuBisCO - Ribulose-1,5-bisphosphate carboxylase/oxygenase; SOD - superoxide dismutase; TCA - tricarboxylic acid cycle. TDO AFMID KMO KYNU 3HAO 3-hydroxy 2-Amino 3-hydroxy L-Tryptophan N-formyl L-kynurenine anthranilate 3-carboxymuconate kynurenine kynurenine (3-HA) semialdehyde Quinolinate The kynurenine pathway of tryptophan degradation ASO QS QPT Aspartate pathway L-Aspartate Iminoaspartate

Nicotinate Heimdall mononucleotide (NaMN) Thorarchaeia -archaeia Lokiarchaeia NMNAT

Nicotinate adenine dinucleotide Thor_SMTZ1_83 Thor_SMTZ1_45 Thor_SMTZ_45 Thor_MP9T_1 Thor_MP8T_1 Thor_MP11T_1 Thor_AB_25 TEKIR_2S-1 TEKIR_22 TEKIR_17 TEKIR_14 TEKIR_12S TEKIR_7 TEKIR_6 AMARA_8 AMARA_4S AMARA_2 Heimdall_RS678 Heimdall_LC_3 Heimdall_LC_2 Heimdall_AB_125 TEKIR_3 AMARA_4 Loki_GC14_75 Loki_CR_4 TEKIR_24 TEKIR_23 TEKIR_21 TEKIR_18 TEKIR_15 TEKIR_13 TEKIR_10 TEKIR_9 TEKIR_8 TEKIR_7S TEKIR_6S TEKIR_5S TEKIR_5 TEKIR_4 TEKIR_2 TEKIR_1 AMARA_7 AMARA_5S AMARA_5 AMARA_3 AMARA_2S AMARA_1S AMARA_1 (NaAD) TDO AFMID Aerobic KMO Tryptophan KYNU degradation NS 3HAO

Aspartate ASO Nicotinamide degradation QS adenine NAD+ dinucleotide QPT + NMNAT synthesis (NAD ) NS from quinolinate

Figure 4. De novo NAD+ synthesis pathways. The colored boxes show a schematic representation of the kynurenine and aspartate pathways involved in de novo NAD+ synthesis. The presence of the enzymes involved in these pathways is indicated for each MAG by using a colored circle. Abbreviations: 3HAO - 3-hydroxyanthranilate 3,4-dioxygenase; AFMID - arylformamidase; ASO - L-aspartate oxidase; KMO - kynurenine 3-monooxygenase; KYNU - kynureninase; NMNAT - nicotinamide nucleotide adenylyltransferase; NS - NAD+ synthase; QPT - nicotinate-nucleotide pyrophosphorylase; QS - quinolinate synthase; TDO - tryptophan 2,3-dioxygenase.