<<

Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Archaeology of Eukaryotic DNA Replication

Kira S. Makarova and Eugene V. Koonin

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894 Correspondence: [email protected]

Recent advances in the characterization of the archaeal DNA replication system together with comparative genomic analysis have led to the identification of several previously un- characterized archaeal proteins involved in replication and currently reveal a nearly com- plete correspondence between the components of the archaeal and eukaryotic replication machineries. It can be inferred that the archaeal ancestor of and even the last common ancestor of all extant possessed replication machineries that were compa- rable in complexity to the eukaryotic replication system. The eukaryotic replication system encompasses multiple paralogs of ancestral components such that heteromeric complexes in eukaryotes replace archaeal homomeric complexes, apparently along with subfunctionali- zation of the eukaryotic complex subunits. In the archaea, parallel, lineage-specific dupli- cations of many genes encoding replication machinery components are detectable as well; most of these archaeal paralogs remain to be functionally characterized. The archaeal rep- lication system shows remarkable plasticity whereby even some essential components such as DNA polymerase and single-stranded DNA-binding protein are displaced by unrelated proteins with analogous activities in some lineages.

ouble-stranded DNA is the molecule that Okazaki fragments (Kornberg and Baker 2005; Dcarries genetic information in all cellular Barry and Bell 2006; Hamdan and Richardson life-forms; thus, replication of this genetic ma- 2009; Hamdan and van Oijen 2010). Thus, it terial is a fundamental physiological process was a major surprise when it became clear that that requires high accuracy and efficiency the protein machineries responsible for this (Kornberg and Baker 2005). The general mech- complex process are drastically different, espe- anism and principles of DNA replication are cially in compared with archaea and common in all three domains of life—archaea, eukarya. The core components of the bacterial bacteria, and eukaryotes—and include recog- replication systems, such as DNA polymerase, nition of defined origins, melting DNA with primase, and replication helicase, are unrelated the aid of dedicated helicases, RNA priming or only distantly related to their counterparts in by the dedicated primase, recruitment of DNA the archaeal/eukaryotic replication apparatus polymerases and processivity factors, replica- (Edgell 1997; Leipe et al. 1999). tion fork formation, and simultaneous replica- The existence of two distinct molecular tion of leading and lagging strands, the latter via machines for replication has raised

Editors: Stephen D. Bell, Marcel Me´chali, and Melvin L. DePamphilis Additional Perspectives on DNA Replication available at www.cshperspectives.org Copyright # 2013 Cold Spring Harbor Laboratory Press; all rights reserved. Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963

1 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

K.S. Makarova and E.V. Koonin

obvious questions on the nature of the replica- Krupovic et al. 2010). Replicon fusion also is a tion system in the last universal common ances- plausible path from a single origin of replication tor (LUCA) of all extant cellular life-forms, and that is typical of bacteria to multiple origins three groups of hypotheses have been proposed present in archaea and eukaryotes. However, (Leipe et al. 1999; Forterre 2002; Koonin 2005, all the evidence in support of frequent replicon 2006, 2009; Glansdorff 2008; McGeoch and Bell fusion and the plausibility of replicon takeover 2008): (1) The replication systems in Bacte- notwithstanding, there is no evidence of dis- ria and in the archaeo–eukaryotic lineage placement of the bacterial replication apparatus originated independently from an RNA-ge- with the archaeal version introduced by mobile nome LUCA or from a noncellular ancestral elements, or vice versa, displacement of the ar- state that encompassed a mix of genetic ele- chaeal machinery with the bacterial version, de- ments with diverse replication strategies and spite the rapid accumulation of diverse bacterial molecular machineries. (2) The LUCA was a and archaeal genome sequences. Thus, the dis- typical cellular life-form that possessed either placement scenarios of DNA replication ma- the archaeal or the bacterial replication appara- chinery evolution are so far not supported by tus in which several key components have been comparative genomic data. replaced in the other major cellular lineage. (3) Regardless of the nature of the DNA replica- The LUCAwas a complex cellular life-form that tion system (if any) in the LUCA and the under- possessed both replication systems, so that the lying causes of the archaeo–bacterial dichotomy differentiation of the bacterial and the ar- of replication machineries, the similarity be- chaeo–eukaryotic replication machineries oc- tween the archaeal and eukaryotic replication curred as a result of genome streamlining in systems is striking (Table 1). Generally, the ar- both lines of descent that was accompanied by chaeal replication protein core appears to be differential loss of components. With regard to the same as the eukaryotic core, but eukaryotes the possible substitution of replication systems, typically possess multiple paralogous subunits a plausible mechanism could be replicon take- within complexes that are homomeric in ar- over (Forterre 2006; McGeoch and Bell 2008). chaea, many additional components interacting Under the replicon takeover hypothesis, mobile with the core ones and more complex regulation elements introduce into cells a new replication (Leipe et al. 1999; Bell and Dutta 2002; Bohlke system or its components, which can displace et al. 2002; Kelman and White 2005; Barry and the original replication system through one or Bell 2006). Thus, the archaeal replication system several instances of integration of the given ele- appears to be an ancestral version of the eukary- ment into the host genome accompanied by in- otic system and hence a good model for func- activation of the host replication genes and/or tional and structural studies aimed at gaining origins of replication. This scenario is compat- mechanistic insights into eukaryotic replication. ible with the experimental results showing that In the last few years, there has been substan- DNA replication DNA in Escherichia coli with tial progress in the study of the archaeal replica- an inactivated DnaAgene or origin of replica- tion systems that has led to an apparently com- tion can be rescued by the replication appara- plete delineation of all proteins that are essential tus of R1 or F1 plasmids integrated into the for replication (Berquist et al. 2007; Beattie and bacterial chromosome (Bernander et al. 1991; Bell 2011a; MacNeill 2011). The combination of Koppes 1992). Furthermore, genome analysis experimental, structural, and bioinformatics suggests frequent replicon fusion in archaea studies has led to the discovery of archaeal ho- and bacteria (McGeoch and Bell 2008); in par- mologs (orthologs) for several components of ticular, such events are implied by the observa- the replication system that have been previously tion that in archaeal , genes encoding deemed specific for eukaryotes (Barry and multiple paralogs of the replication helicase Bell 2006; MacNeill 2010, 2011; Makarova MCM and origins of replication are associated et al. 2012). Furthermore, complex evolutionary with mobile elements (Robinson and Bell 2007; events that involve multiple lineage-specific

2 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Evolution of DNA Replication

Table 1. The relationship between archaeal and eukaryotic replication systems Archaea Eukaryotes (projection for LACA) (projection for LECA) Comments ORC complex arORC1 Orc1, Cdc6 In LACA the ORC/Cdc6 complex probably arORC2 Orc2, Orc3, Orc4, Orc5 consisted of two distinct subunits, and in TFIIB or homologa Orc6 LECA of six distinct. WhiP or other Cdt1 Both complexes might possess additional Orc6 wHTH proteina and Cdt1 components. CMG complex Archaeal Cdc45/RecJ Cdc45 In many archaea and eukaryotes, CDC45/RecJ Mcm Mcm2, Mcm3, Mcm4, apparently contain inactive DHH Mcm5, Mcm6, Mcm7 phosphoesterase domains. Gins23 Gins2, Gins3 The RecJ family is triplicated in euryarchaea, Gins15 Gins1, Gins5 and some of the paralogs could be involved in Inactivated MCM Mcm10 repair. homologa MCM is independently duplicated in several lineages of euryarchaea. CMG activation factors — RecQ/Sld2 There is no evidence that kinases and — Treslin/Sld3 phosphatases in archaea are directly involved — TopBP1/Dpb11 in replication, although they probably STK CDK, DDK regulate division. PP2C PP2C Primases Prim1/p48 PriS In eukaryotes, Pol a is involved in priming by Prim2a/p58 PriL adding short DNA fragments to RNA DnaG — primers. In archaea, DnaG might be involved in priming specifically on the lagging strand. Polymerases PolB3 Pol a,Pold,Polz No eukaryotic homologs of DP2 are known, but PolB1 Pol 1 Zn fingers of Pol 1 are apparently derived DP1 B subunits of Pol a,Pold,Pol from DP2. z,Pol1 DP2 —

DNA polymerase sliding clamp and clamp loader RFCL RFC1 Eukaryotes have additional duplications of both RFCS RFC2, RFC3, RFC4, RFC4 RFCs and PCNA involved in checkpoint PCNA PCNA complexes (Rad27 and Rad1, Rad9, Hus1, respectively). Primer removal and gap closure RNase H2 RNase II There is a triplication of ligases (LigI, LigIII, Fen1 Fen1/EXO1, Rad2, Rad27 LigIV) in eukaryotes, but only LigI is directly Lig1 Lig1 involved in replication. In a few Halobacteria, ATP-dependent ligase is replaced by NAD-dependent ligase. SSB arRPA1_long Rpa1 In , RPA is displaced by the arRPA1_short and Rpa2 non-homologous ThermoSSB; two short RPA2 RPA forms in many euryarchaea; expansion arCOG05741a Rpa3 of short RPA forms in Halobacteria. For eukaryotic genes in Homo sapiens and Saccharomyces cerevisiae, gene names are indicated. Archaeal genes are denoted as in Barry and Bell (2006) or as introduced here. aNot confidently traced to LACA. Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

K.S. Makarova and E.V. Koonin

duplications, rearrangements, and gene archaea encode representatives of each of these loss, and in part seem to parallel the evolution groups. of the evolution of the replication system in Thermoproteales, Nanoarchaeum equitans, eukaryotes, have been delineated for a variety and several possess only one of replication proteins in several archaeal lin- ORC homolog that, in the latter two cases, eages (Tahirov et al. 2009; Chia et al. 2010; Kru- belongs to the arORC1 group. povic et al. 2010). Here we summarize these show evidence of ancestral bursts of duplication findings and present several additional case events in both arORC1 and arORC2 studies that show the complexity of evolution- (Fig. 2). The distinction between the two ances- ary scenarios for the components of the archaeal tral ORC subfamilies persists, but most of the replication machinery and new aspects of their lineage-specific paralogs show accelerated evo- relationship with the eukaryotic replication lutionary rates, and some have inactivated system. ATPase domains. Further duplications are de- tectable in several Halobacterial lineages giving rise to up to 20 ORC paralogs in Haloterrigena PREREPLICATION COMPLEX turkmenica. It appears that paralogization of DNA replication begins at specific sites in the ORC genes is driven by the appearance of genome known as the origins of replication. additional origins of replication including ac- Some archaea possess a single and others pos- quisition and integration of extra-chromosomal sess several replication origins, whereas in all elements (McGeoch and Bell 2008). In Sulfolo- eukaryotes replication starts from numerous, bus, it has been shown that the two origins are independent origins (Robinson and Bell 2005; recognized by distinct monomers or dimers of Coker et al. 2009). Typically, replication origins Orc1/Cdc6 paralogs homologs and that the encompass a distinct sequence signature, the three Orc1/Cdc6 genes are differentially ex- AT-rich box, that can be used to predict origins pressed in different growth phases and cell cycle in silico (Zhang and Zhang 2005). stages, suggesting a role for these proteins in The origin recognition complex (ORC) is a modulating the activity of the origins (Robin- hexameric protein complex that binds the ori- son et al. 2004). However, the functional differ- gins of DNA replication and recruits additional ences and the causes behind the distinct evolu- replication factors resulting in the formation of tionary constraints in the two major groups of the prereplicative complex (Fig. 1). Eukaryotic archaeal ORCs remain poorly understood. Orc1-5 and the closely related CDC6 are paral- The of the order Methano- ogous proteins that belong to the DnaA/ORC coccales are currently thought to lack ORC pro- of AAAþ ATPases (Iyer et al. 2004) and teins, and are assumed to pos- often also contain a carboxy-terminal helix– sess only one ORC that belongs to the arORC1 turn–helix (HTH) DNA-binding domain. All subfamily (Barry and Bell 2006). We identi- the six families of eukaryotic ORC subunits fied a subfamily of archaeal AAAþ ATPases have been traced back to the last eukaryotic com- (arCOG00472) that are fused to the carboxy- mon ancestor (LECA) (Makarova et al. 2005). terminal wHTH domain and clusters with the The Orc2 and Orc3 families are highly diverged arORC2 branch in the . The and contain inactivated ATPase domains. two major lineages represented in this subfam- Phylogenetic analysis of archaeal ORC ho- ily, and Thermococcales, are mologs reveals a complex evolutionary scenario exactly those that were missing from the arORC2 of gene duplications, losses, and extreme diver- clade. Moreover, the genetic context of these gence (Fig. 2). All this complexity notwith- genes in Methanococcales suggests their possi- standing, the phylogenetic tree clearly shows ble role in replication (Fig. 3). The onlyarchaeon the existence of two major groups of ORC pro- in which we were unable to identify an ORC teins, namely, the slow-evolving arORC1 and candidate is Methanopyrus kandleri AV19.Thus, the fast-evolving arORC2. The vast majority of it is most likely that the last common ancestor

4 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Evolution of DNA Replication

arORC1/arORC2 TFIIIB Orc1-5 Orc6

MCM Cdc6 Mcm2-7

Whip Cdt1 DDK CDK STK

PP2C PP2C

TopBP1/Dpb11 Sld2 RecQ4 treslin/Sld3 P P Mcm10

P

GINS(51 and 23) GINS1234/Pfs1, Sld1-3 Cdc45 RecJ/CDC45

Rpa1 arRPA1 PolB3 PriL PCNA Pol α B Prim1 PriS Prim2 arRPA2 Rpa2 Rpa3 P PolB1 DP1DP2 Pol δ Pol ε

DnaG Rfc1 Rfc2-4

Fen1/EXO1, Lig1 Lig1 Rad2, Rad27 Fen1

RNAseH2

Figure 1. Comparison of the archaeal (reconstructed LACA) and eukaryotic replication systems (reconstructed LECA). Orthologs are shown by shapes of the same color, and paralogs are denoted by similar colors. The gene product names are indicated as follows: for most eukaryotic genes, the Homo sapiens nomenclature is used, but in several cases, both H. sapiens and Saccharomyces cerevisiae gene names are given for clarity. Archaeal genes are named as in Barry and Bell (2006) or as discussed in the text. The dotted outline in some shapes indicates components that could not be confidently traced to LACA. Small circles with “P” inside indicate phosphory- lation.

of the extant archaea and the last archaeo–eu- and fast-evolving ORC paralogs remains un- karyotic ancestor (whatever the exact nature clear. In S. solfataricus, both the slow-evolving of this entity) already encoded at least two paral- arORC1 and twofast-evolving paralogs from the ogous ORC proteins. The nature of the func- arORC2 branch all recognize origins of replica- tional distinctions between the slow-evolving tion albeit with different specificities (Robinson

Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 5 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

100 13541875|Therm| volcanium GSS1 257076405|Therm| acidarmanus fer1 95 119719183|Therm|Thermofilum pendens Hrk 5 119719780|Therm|Thermofilum pendens Hrk 5 82 343484704|uncla|Candidatus Caldiarchaeum subterraneum 79 156938082|Therm|Ignicoccus hospitalis KIN4-I 84 118430906|Therm|Aeropyrum pernix K1 PDB:1W5S 100 302349122|Therm|Acidilobus saccharovorans 345-15 91 99 218884738|Therm|Desulfurococcus kamchatkensis 1221n 95 320100317|Therm|Desulfurococcus mucosus DSM 2162 126466115|Therm|Staphylothermus marinus F1 33 296241871|Therm|Thermosphaera aggregans DSM 11486 305663894|Therm|Ignisphaera aggregans DSM 17230 93 97 73 347524473|Therm|Pyrolobus fumarii 1A 124027782|Therm|Hyperthermus butylicus DSM 5456 93 77 15897673|Therm|Sulfolobus solfataricus P2 Cdc6_2 93 94 15920687|Therm|Sulfolobus tokodaii str- 7 100 70606689|Therm|Sulfolobus acidocaldarius DSM 639 72 332796207|Therm|Acidianus hospitalis W1 44 146304794|Therm|Metallosphaera sedula DSM 5348 159041494|Therm|Caldivirga maquilingensis IC-167 171184486|Therm|Thermoproteus neutrophilus V24Sta 79 100 99 18312143|Therm|Pyrobaculum aerophilum str- IM2 99 327311520|Therm|Thermoproteus uzoniensis 768-20 99 76 352683020|Therm|Thermoproteus tenax Kra 1 307594151|Therm|Vulcanisaeta distributa DSM 14429 343484216|uncla|Candidatus Caldiarchaeum subterraneum 66 170290623|Candi|Candidatus Korarchaeum cryptofilum OPF8 288930812|Archa|Ferroglobus placidus DSM 10642 84489444|Metha| stadtmanae DSM 3091 100 99 325958319|Metha| sp- AL-21 333988036|Metha|Methanobacterium sp- SWAN-1 88 148642731|Metha| smithii ATCC 35061 98 288559680|Metha|Methanobrevibacter ruminantium M1 70 99 Cdc6 93 20088984|Metha|Methanosarcina acetivorans C2A 100 71 336477428|Metha|Methanosalsum zhilinae DSM 4017 298675924|Metha|Methanohalobium evestigatum Z-7303 91774071|Metha|Methanococcoides burtonii DSM 6242 52 96 77 294495016|Metha|Methanohalophilus mahii DSM 5219 arORC2 100 147919891|envir|uncultured methanogenic archaeon RC-I 282163319|Metha|Methanocella paludicola SANAE 51 93 126178619|Metha|Methanoculleus marisnigri JR1 8 154150222|Metha|Methanoregula boonei 6A8 89 99 91 307352234|Metha|Methanoplanus petrolearius DSM 11571 88601748|Metha|Methanospirillum hungatei JF-1 50 219852613|Metha|Methanosphaerula palustris E1-9c 99 85 124485227|Metha|Methanocorpusculum labreanum Z 93 307353259|Metha|Methanoplanus petrolearius DSM 11571 87 100 307354165|Metha|Methanoplanus petrolearius DSM 11571 116754893|Metha|Methanosaeta thermophila PT 98 330507560|Metha|Methanosaeta concilii GP6 327401265|Archa|Archaeoglobus veneficus SNP6 97 288931679|Archa|Ferroglobus placidus DSM 10642 49 11498302|Archa|Archaeoglobus fulgidus DSM 4304 100 VNG1964H 90 Orc1, Orc9, VNG7070 56 100 110669150|Halob|Haloquadratum walsbyi DSM 16790 292656420|Halob|Haloferax volcanii DS2 333911012|Metha|Methanotorris igneus Kol 5 86 100 99 297619922|Metha|Methanococcus voltae A3 150399778|Metha|Methanococcus vannielii SB 93 45357596|Metha|Methanococcus maripaludis S2 94 98 64 150401570|Metha|Methanococcus aeolicus Nankai-3 94 336121689|Metha|Methanothermococcus okinawensis IH1 100 256810980|Metha|Methanocaldococcus fervens AG86 20 261402168|Metha|Methanocaldococcus vulcanius M7 arCOG00472 96 15668955|Metha|Methanocaldococcus jannaschii DSM 2661 296109010|Metha|Methanocaldococcus infernus ME 96 170290621|Candi|Candidatus Korarchaeum cryptofilum OPF8|Candi

96 170289743|Candi|Candidatus Korarchaeum cryptofilum OPF8 82 223477363|Therm| sp- AM4 93 57642033|Therm|Thermococcus kodakarensis KOD1 98 242399132|Therm|Thermococcus sibiricus MM 739 78 332158370|Therm| sp- NA2 100 14520416|Therm| GE5 26 170290997|Candi|Candidatus Korarchaeum cryptofilum OPF8 284162271|Archa|Archaeoglobus profundus DSM 5631 75 13542308|Therm| GSS1 41614852|Nanoa|Nanoarchaeum equitans Kin4-M 66 332795694|Therm|Acidianus hospitalis W1 146303484|Therm|Metallosphaera sedula DSM 5348 100 99 15922492|Therm|Sulfolobus tokodaii str- 7 18 70605854|Therm|Sulfolobus acidocaldarius DSM 639 66 15898963|Therm|Sulfolobus solfataricus P2 Cdc6_3 PDB:2QBY 100 118577010|Cenar|Cenarchaeum symbiosum A 161527513|marin|Nitrosopumilus maritimus SCM1 91 329766186|marin|Candidatus Nitrosoarchaeum limnia SFB1 91 340345785|marin|Candidatus Nitrosoarchaeum koreensis MY1 343484933|uncla|Candidatus Caldiarchaeum subterraneum 119719499|Therm|Thermofilum pendens Hrk 5|Therm|Thermofilum pendens Hrk 5 305662446|Therm|Ignisphaera aggregans DSM 17230 98 124027479|Therm|Hyperthermus butylicus DSM 5456 92 93 77 16 347522555|Therm|Pyrolobus fumarii 1A 126465697|Therm|Staphylothermus marinus F1 296241749|Therm|Thermosphaera aggregans DSM 11486 31 100 98 218884688|Therm|Desulfurococcus kamchatkensis 1221n 94 99 320100211|Therm|Desulfurococcus mucosus DSM 2162 87 118431069|Therm|Aeropyrum pernix K1 PDB:2V1U 94 302348548|Therm|Acidilobus saccharovorans 345-15 156937045|Therm|Ignicoccus hospitalis KIN4-I 15897201|Therm|Sulfolobus solfataricus P2 87 Cdc6_1 47 83 70606524|Therm|Sulfolobus acidocaldarius DSM 639 96 15920497|Therm|Sulfolobus tokodaii str- 7 67 332796464|Therm|Acidianus hospitalis W1 97 146302786|Therm|Metallosphaera sedula DSM 5348 242399565|Therm|Thermococcus sibiricus MM 739 100 96 337285081|Therm|Pyrococcus yayanosii CH1 14520340|Therm|Pyrococcus abyssi GE5 96 332158295|Therm|Pyrococcus sp- NA2 79 223477351|Therm|Thermococcus sp- AM4 57641836|Therm|Thermococcus kodakarensis KOD1 3 92 124484830|Metha|Methanocorpusculum labreanum Z 99 154149550|Metha|Methanoregula boonei 6A8 95 80 2 88603712|Metha|Methanospirillum hungatei JF-1 126177953|Metha|Methanoculleus marisnigri JR1 98 86 arORC1 307352169|Metha|Methanoplanus petrolearius DSM 11571 36 0 72 219850688|Metha|Methanosphaerula palustris E1-9c 11497860|Archa|Archaeoglobus fulgidus DSM 4304 284161129|Archa|Archaeoglobus profundus DSM 5631 84 99 288930408|Archa|Ferroglobus placidus DSM 10642 61 95 327400297|Archa|Archaeoglobus veneficus SNP6 100 Orc7 100 147920456|envir|uncultured methanogenic archaeon RC-I 93 282162671|Metha|Methanocella paludicola SANAE 64 99 116754748|Metha|Methanosaeta thermophila PT 90 330507685|Metha|Methanosaeta concilii GP6 35 20088900|Metha|Methanosarcina acetivorans C2A 336475960|Metha|Methanosalsum zhilinae DSM 4017 99 85 298673979|Metha|Methanohalobium evestigatum Z-7303 60 294494691|Metha|Methanohalophilus mahii DSM 5219 95 91773683|Metha|Methanococcoides burtonii DSM 6242 86 289595679|uncla| boonei T469 13541589|Therm|Thermoplasma volcanium GSS1 99 257076745|Therm|Ferroplasma acidarmanus fer1 100 99 48478329|Therm| torridus DSM 9790 54 312136232|Metha| fervidus DSM 2088 99 288559259|Metha|Methanobrevibacter ruminantium M1 100 148643324|Metha|Methanobrevibacter smithii ATCC 35061 9 84488832|Metha|Methanosphaera stadtmanae DSM 3091 325957869|Metha|Methanobacterium sp- AL-21 94 92 333988600|Metha|Methanobacterium sp- SWAN-1 96 Orc5, Orc4, Orc2*, Orc6*, Orc3* 90 54 88 Orc8, Orc10 336476214|Metha|Methanosalsum zhilinae DSM 4017 85 298675146|Metha|Methanohalobium evestigatum Z-7303 40 336476075|Metha|Methanosalsum zhilinae DSM 4017 161529191|marin|Nitrosopumilus maritimus SCM1 284164538|Halob|Haloterrigena turkmenica DSM 5511 98 100 300709908|Halob|Halalkalicoccus jeotgali B3 284162269|Archa|Archaeoglobus profundus DSM 5631

0.5

Figure 2. Phylogenetic analysis of the ORC family in archaea and eukaryotes. (Gray) Eukaryotes; (dark blue) Euryarchaeota, with the exception of (orange) Halobacteria; (light blue) ; (purple) deeply branched archaeal lineages (Thaumarchaeota, , ). (Legend continues on following page.) Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Evolution of DNA Replication

118380400|Cilio|Tetrahymena thermophila 94 52 66358398|Apico|Cryptosporidium parvum Iowa II 84995186|Apico|Theileria annulata strain Ankara 348681925|Oomyc|Phytophthora sojae 83 66824523|Mycet|Dictyostelium discoideum AX4 57 81 308809383|Chlor|Ostreococcus tauri 49659270|Strep|Arabidopsis thaliana 17 156389428|Metaz|Nematostella vectensis 99 99 32454746|Metaz|Homo sapiens 72 348518229|Metaz|Oreochromis niloticus 85 291236736|Metaz|Saccoglossus kowalevskii Orc4 87 69 198426976|Metaz|Ciona intestinalis 71004418|Fungi|Ustilago maydis 521

86 6325420|Fungi|Saccharomyces cerevisiae S288c 92 19112617|Fungi|Schizosaccharomyces pombe 972h 32 340522426|Fungi|Trichoderma reesei QM6a 302813646|Strep|Selaginella moellendorffii 87 17136788|Metaz|Drosophila melanogaster 167527239|Choan|Monosiga brevicollis MX1 90 221130070|Metaz|Hydra magnipapillata 100 25 62 32483369|Metaz|Homo sapiens 77 115948481|Metaz|Strongylocentrotus purpuratus 145614652|Fungi|Magnaporthe oryzae 70-15 Orc3 24 6323025|Fungi|Saccharomyces cerevisiae S288c 100 50311281|Fungi|Kluyveromyces lactis NRRL Y-1140 307104133|Chlor|Chlorella variabilis 68 145354524|Chlor|Ostreococcus lucimarinus CCE9901 167998931|Strep|Physcomitrella patens subsp- patens 12 99 66153722|Strep|Oryza sativa Japonica Group 70 97 15224316|Strep|Arabidopsis thaliana 86 88 5453830|Metaz|Homo sapiens 198428189|Metaz|Ciona intestinalis 167525942|Choan|Monosiga brevicollis MX1 95 Orc2 6319534|Fungi|Saccharomyces cerevisiae S288c 60 100 42 326469111|Fungi|Trichophyton tonsurans CBS 112818 49 19112935|Fungi|Schizosaccharomyces pombe 972h- 89 118384008|Cilio|Tetrahymena thermophila

91 17534571|Metaz|Caenorhabditis elegans 100 268532148|Metaz|Caenorhabditis briggsae 80 66826059|Mycet|Dictyostelium discoideum AX4 94 281207872|Mycet|Polysphondylium pallidum PN500 118398183|Cilio|Tetrahymena thermophila 66800637|Mycet|Dictyostelium discoideum AX4 95 99 19112164|Fungi|Schizosaccharomyces pombe 972h 38 6324068|Fungi|Saccharomyces cerevisiae S288c 30688490|Strep|Arabidopsis thaliana 11 82 308801743|Chlor|Ostreococcus tauri 75 71999676|Metaz|Caenorhabditis elegans Orc5 2 198426177|Metaz|Ciona intestinalis

91 115935317|Metaz|Strongylocentrotus purpuratus 99 156357653|Metaz|Nematostella vectensis 39 4505525|Metaz|Homo sapiens 348685832|Oomyc|Phytophthora sojae 98 17506005|Metaz|Caenorhabditis elegans 100 268566871|Metaz|Caenorhabditis briggsae 218189401|Strep|Oryza sativa Indica Group 100 224059306|Strep|Populus trichocarpa 94 92 30680045|Strep|Arabidopsis thaliana 12 198418008|Metaz|Ciona intestinalis 100 156407454|Metaz|Nematostella vectensis

0 47550985|Metaz|Strongylocentrotus purpuratus 0 4502703|Metaz|Homo sapiens 92 47214086|Metaz|Tetraodon nigroviridis Cdc6 89 71018825|Fungi|Ustilago maydis 521 99 19112702|Fungi|Schizosaccharomyces pombe 972h 99 97 340521289|Fungi|Trichoderma reesei QM6a 6322266|Fungi|Saccharomyces cerevisiae S288c 159113089|Diplo|Giardia lamblia ATCC 50803 72 17555708|Metaz|Caenorhabditis elegans 100 309357462|Metaz|Caenorhabditis briggsae AF16 32 66356538|Apico|Cryptosporidium parvum Iowa II 85 118387267|Cilio|Tetrahymena thermophila 83 85000897|Apico|Theileria annulata strain Ankara 15235420|Strep|Arabidopsis thaliana 100 224106169|Strep|Populus trichocarpa 88 30 115466830|Strep|Oryza sativa Japonica Group 90 308803492|Chlor|Ostreococcus tauri 90 99 76803807|Metaz|Homo sapiens Orc1 115709869|Metaz|Strongylocentrotus purpuratus 64 4007803|Fungi|Schizosaccharomyces pombe 71024009|Fungi|Ustilago maydis 521 91 98 3 6323475|Fungi|Saccharomyces cerevisiae S288c 99 6323575|Fungi|Saccharomyces cerevisiae S288c 66811242|Mycet|Dictyostelium discoideum AX4

0.5

Figure 2. (Continued) The MUSCLE program (Edgar 2004) was used for construction of sequence alignments. The tree was reconstructed using the FastTree program (Price et al. 2010) (293 sequences and 186 aligned positions for the archaeal tree and 84 sequences and 380 aligned positions for the eukaryotic tree). For Hal- obacteria, the branches are collapsed. For characterized genes of Halobacteria, Sulfolobus, and yeast, the conven- tional gene name or protein identifies are indicated in red on the right of the corresponding sequence or the corresponding branch. For two archaeal sequences for which the structure is solved, its PDB code is indicated. () The corresponding genes are missing in some Halobacteria. (Red) Fast-evolving branches; (green) slow-evolving branches.

Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 7 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

K.S. Makarova and E.V. Koonin

ORC HTH DUF66 arORC1 (4056) RFCL RFCS S7 CC1 Endo V Thermoproteales DUF66 arORC1 Endo V

HTH+Znf HTH MCM (5049) (5050) arORC2 (472) Methanococcales

TFIIB (cyclin) GTPase (358) (5526) RFCS TFIIB HIT (5525) Thermoproteales

DnaG DnaG (2253) , , Methanococcales

PolB2 (inactivated polymerase)

RecA (7300) PolB2 , Staphylothermus

PolB2 arORC2 Halobacteria

RPA MPP Lhr-like helicase arRPA1_short arRPA2 (1150) Methanomicrobia, Halobacteria

arRPA1_long RecA Methanobacteria, Methanomicrobia arRPA1_long Deacetylase HHT1 (324) Methanomicrobia, Halobacteria

Figure 3. Genomic context of selected genes for proteins involved in replication in archaea. Homologous genes are shown by arrows of the same color; genes are shown approximately to scale. (Blue) Replication genes; (magenta) genes coding for chromatin-binding protein and the respective modification ; (orange) genes for translation system components; (gray) uncharacterized genes. Genes for which the neighborhood is discussed in the text are marked by an outline. The arCOG numbers to which uncharacterized proteins are assigned are shown in parentheses. HTH, helix–turn–helix domain; S7, ribosomal protein S7; CC1, DNA- binding protein CC1; Endo V,homolog of endonuclease V; Znf, Zn finger; TFIIB, transcription initiation factor TFIIB; HIT, HIT family hydrolase; MPP, metallophosphatase superfamily protein; HHT1, .

et al. 2004). Furthermore, the loss of ORC1 and essential function of ORC1 in replication. In the colocalization of the ORC2 gene with repli- addition, the absence of recognizable ORC ho- cation machinery components in Methanococ- mologs in M. kandleri suggests that alternative, cales imply that on some occasions in the evo- uncharacterized mechanisms of archaeal repli- lution of archaea, ORC2 could take over the cation initiation might exist.

8 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Evolution of DNA Replication

The phylogenetic tree of the eukaryotic ing, domains of Cdt1 adopt the winged HTH ORC/CDC6 family also divides into two bran- fold (Lee et al. 2004; De Marco et al. 2009). In ches: the slow-evolving ORC1/CDC6 and fast- the Crenarchaeota Aeropyrum pernix and sever- evolving ORC2-3-4-5 (Fig. 2). Owing to the al Sulfolobus , a gene adjacent to the or- extreme sequence divergence, it is difficult to igin of replication region (ORI) encodes a precisely decipher the relationships between ar- winged-helix initiator protein (WhiP) (Robin- chaeal and eukaryotic ORC families, but it son and Bell 2007). Given that WhiP contains seems likely that arORC1 is the ancestor of two winged HTH domains, it has been pro- Orc1/CDC6, whereas arORC2 is the ancestor posed that this protein is the ortholog and func- of the ORC2-3-4-5 branch. However, in eukary- tional analog of Cdt1 (Robinson and Bell 2007). otes, the functional division between the slow- Typical WhiP proteins with two HTH domains evolving and fast-evolving ORC subunits is dif- so far have been detected only in Desulfuro- ferent from that in archaea because all of these coccales, although many other archaea encode proteins are components of the heteromeric closely related homologs of the amino-terminal complex involved in replication. HTH domain of WhiP, and some of these The sixth component of the ORC complex, proteins might interact with ORI regions. Orc6, is shown to be required for DNA replica- Moreover, in several archaeal genomes, HTH- tion in eukaryotes, and it is critical for ORC containing proteins are encoded adjacent to function (Duncker et al. 2009). Orc6 does not ORC subunits (Fig. 3), suggesting that these display any sequence similarity with ATPases. proteins also could perform specific roles in Structural modeling of the amino-terminal do- replication. main of metazoan Orc6 that is essential for rep- The resulting complete ORC–Cdc6–Cdt1 lication (Balasov et al. 2009) and the recently complex is responsible for the ATP-dependent resolved structure of the middle portion of loading of the replicative helicase, the minichro- human Orc6 both show that Orc6 contains two mosome maintenance (MCM) complex, in eu- cyclin-like domains similar to those in the tran- karyotes and probably in some if not most ar- scription factor TFIIB that belongs to a con- chaea and by inference in LACA (Fig. 1). served achaeo–eukaryotic protein family (Liu et al. 2011). Unlike Orc6, the TFIIB family pro- THE CMG COMPLEX teins, in addition to cyclin domains, also contain an amino-terminal Zn finger. However, in sev- The eukaryotic MCM complex consists of six eral archaea, the Zn-binding ligands are lacking paralogous proteins (Mcm2–7), all of which or the finger module is lost completely. Interest- are essential for cell viability and are required ingly, in several genomes of Thermoproteales, for the initiation of DNA replication and repli- the gene for TFIIB is encoded in a predicted cation fork progression (Bochman and Schwa- operon with the small subunit of clamp loader, cha 2009). Additional eukaryotic MCM protein replication factor C (RFC) (Fig. 3). Further- families, Mcm8 and Mcm9, might be ances- more, several archaeal lineages encode addition- tral but are not present in all species, and their al, functionally uncharacterized TFIIB paralogs function in replication is unclear (Bochman that typically contain a single cyclin-like do- and Schwacha 2009). Genetic, biochemical, and main. Therefore, it cannot be ruled out that structural studies have shown that the hexa- TFIIBand/oritsparalog performdualfunctions meric MCM complex is the replicative helicase in both transcription and replication (Fig. 1). that separates DNA strands during chromosom- In eukaryotes, the hexameric ORC recruits al replication (Bochman and Schwacha 2009). two additional components, namely, Cdc6, The conserved core structure of all MCM which is an apparent product of ancestral du- proteins consists of an a-helical amino-termi- plication of Orc1 gene in eukaryotes (Fig. 2), nal subdomain, an OB fold domain with an and Cdt1 (Duncker et al. 2009). Both the mid- inserted Zn finger, an AAAþ ATPase domain, dle and the carboxy-terminal, MCM-interact- and a carboxy-terminal HTH domain. The

Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 9 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

K.S. Makarova and E.V. Koonin

HTH domain can be confidently detected only Pacek et al. 2006; Labib and Gambus 2007). In in Mcm6. The structure of this domain has been addition to binding MCM proteins, the GINS solved, and it has been shown to bind Cdt1 (Wei complex also associates with Pol a-primase, the et al. 2010). The same core structure is con- protein complex that synthesizes the primers on served in MCM proteins from archaea includ- the lagging strand, and with the leading and ing the carboxy-terminal HTH domain. It ap- lagging strands polymerases, Pol 1 and Pol d, pears that all eukaryotic MCM paralogs are respectively (MacNeill 2010). In contrast to products of ancestral duplications arising from the eukaryotic Mcm2–7 complex, which does a single ancestral MCM protein (Chia et al. not show helicase activity without the associat- 2010; Krupovic et al. 2010). All archaea encode ed GINS and Cd45 proteins, in vitro experi- at least one MCM protein homolog, and serial ments with the archaeal MCM homohexamers duplication of the MCM genes occurred in mul- from several have revealed robust tiple lineages. At least two paralogs could be helicase activity that did not require additional traced back to common ancestor of Halobacter- subunits (Sakakibara et al. 2009). iales and independently to the common ances- Highly diverged archaeal GINS homologs tor of Methanococcales (Chia et al. 2010; Kru- have been originally identified in silico (Maka- povic et al. 2010). Other duplications and losses rova et al. 2005) and have been subsequently followed the ancestral duplication in Methano- shown to interact with MCM (Marinsek et al. coccales giving rise to up to eight MCM paral- 2006; Bell 2011). Archaea encode two distinct ogs in Methanococcus maripaludis C6 (Chia forms of the GINS proteins, one of which ap- et al. 2010; Walters and Chong 2010). Duplica- pears to have been derived from the other by tions occurred also within a few other narrow circular permutation of a small domain (Marin- archaeal lineages. It has been shown that all sek et al. 2006). These two forms are also pres- four MCM paralogs of M. maripaludis S2 could ent in eukaryotes, which have two ancestral form a heterohexameric complex (Walters and genes with one domain configuration (Psf2 Chong 2010). In many archaea, some of the and Psf3) and two genes with the permuted paralogous MCM genes are associated with mo- domain arrangement (Psf1 and Sld5). Although bile elements or viruses and show accelerated many archaea encode only one GINS protein evolution (Chia et al. 2010). To date, a single that belongs to the Psf1/Sld5 subfamily, Cren- genetic study has been published showing that archaeota, Thaumarchaeota,Korarchaeaon, and only one of the three MCM paralogs in the eu- encode proteins of both subfam- ryarchaeon T. kodakarensis is essential for cell ilies, suggesting that eukaryotes inherited both viability (Pan et al. 2011b). Interestingly, an in- forms from their archaeal ancestor (Marinsek tein inserted into one of the MCM genes, ap- et al. 2006). Recently, the structure of two parently early in the evolution of euryarchaea, GINS protein from T. kodakaraensis has been followed by inactivation of the ATPase domain solved, revealing that the backbone structure in several intein-containing MCM homologs; of each subunit and the tetrameric assembly all archaea that possess this inactivated MCM closely resemble those of the human GINS com- protein also encode an intact paralog. plex (Oyama et al. 2011). In eukaryotes, in vitro and in vivo experi- Until very recently, it was believed that ments have shown that the MCM complex, on Cdc45, which is present in all eukaryotes in its own, is not the active helicase but requires the one copy, is specific to eukaryotes (MacNeill association with two accessory factors, the tet- 2010). However, an in-depth computational rameric GINS complex (Sld5, Psf1-3) and the analysis has shown that the amino-terminal Cdc45 protein (Moyer et al. 2006; Pacek et al. region of Cdc45 contains a DHH phosphoes- 2006; Labib and Gambus 2007). This complex is tarase domain, suggesting that Cdc45 is the referred to as the CMG (Cdc45, MCM, GINS) eukaryotic ortholog of archaeal and bacterial complex and is thought to be the active replica- RecJ nucleases (Sanchez-Pulido and Ponting tive helicase unit in vivo (Moyer et al. 2006; 2011). Furthermore, the GINS complex of

10 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Evolution of DNA Replication

S. solfataricus, in addition to the association and developed some specialized function (ar- with MCM, binds a protein denoted RecJdbd COG00429, COG1107). Twootherclades (with- (RecJ-like DNA-binding domain), which is ho- in arCOG00427) retained significant levels of mologous to the carboxy-terminal domain of sequence similarity. Most of these proteins con- bacterial RecJ but lacks the nuclease domain tain an active DHH nuclease domain, suggest- (Marinsek et al. 2006). ing that they are active nucleases. Only one of Recently, the GINS complex of the euryar- these paralogs (arCOG00427_II) is often en- chaeon T.kodakarensis has been shown to inter- coded in a conserved neighborhood with the act with primase, MCM, DNA polymerase D, ribosomal proteins S15 and S3, and the Pcc1 PCNA, and the GINS-associated nuclease subunit of the KEOPS complex (Fig. 4B). (GAN) (Li et al. 2010, 2011). Unlike the RecJdbd Thermococci might have lost one paralog of S. solfataricus, GAN is a bona fide ortholog of (arCOG00427_I). Methanococcales encode an bacterial RecJ containing a DHH phosphoester- additional paralog (arCOG00433) that has lost ase domain with all the essential catalytic resi- the DHH domain. These genes are located in dues. Recent phylogenomic analysis of RecJ ho- genomic neighborhoods that encode no other mologs in archaea led to the identification of proteins involved in replication; thus, it is un- previously unsuspected homologs in Thermo- clear whether they are components of replica- proteales and revealed a complex scenario of tion systems. In Halobacteria, the RecJ orthologs RecJ family evolution in archaea and the origin (arCOG00428), which are encoded in the same of Cdc45 (Fig. 4A) (Makarova et al. 2012). conserved neighborhood, contain an inactivat- Under this scenario, the last common ances- ed DHH domain. Thus, inactivation of the RecJ- tor of all extant archaea possessed a single RecJ like nuclease that apparently became a dedicated ortholog that was encoded in the conserved replication protein seems to have occurred at neighborhood including also the S15, S3, and least twice independently in different archaeal Pcc1 genes (Fig. 4B). This ancestral protein was lineages (Fig. 4C). an active DHH nuclease and an essential com- In eukaryotes, MCM10 is an essential repli- ponent of the replication machinery. However, cation protein that interacts with the assem- the experiments with the S. solfataricus replica- bled preRC and is required for the recruitment tion system indicate that the nuclease domain is of Cdc45 (Wohlschlegel et al. 2002). It stays not required for replication; thus, the ancestral in the active replisome complex throughout RecJ protein might have performed additional the S phase and appears to be important for functions, for example, in repair, that required both replisome assembly and fork progression. the nuclease activity. In Crenarchaeota, the Mcm10 physically interacts with both PCNA DHH domain partially deteriorated, losing the and Pol a (Chattopadhyay and Bielinsky 2007). nuclease activity, and the RecJ homolog ap- The structure of the conserved middle domain parently became a dedicated replication system of Mcm10 revealed a unique arrangement of a component; the subsequent routes of evolution CCCH-type Zn-binding module and an OB- were notably different between the three major fold that jointly form a DNA-binding interface crenarchaeal branches—Sulfolobales, Desulfur- (Warren et al. 2008). The carboxy-terminal do- ococcales, and Themrorpoteales—resulting in main of MCM10 is composed of a second, extreme sequence divergence. The archaeal an- CCCC-type Zn-binding module that is not in- cestor of eukaryotes, the exact nature of which volved in DNA binding and a HTH domain. The remains elusive, also retained the RecJ ortholog CCCC-type Zn-binding module of MCM10 is (Cdc45) in which some but not all catalytic res- structurally similar to a corresponding module idues of the DHH domain are conserved; so far, in the amino-terminal oligomerization domain to our knowledge, there are no experimental of eukaryotic andarchaealMCMhelicases,some data showing a nuclease activity of Cdc45. In of which also possess a carboxy-terminal HTH Euryarchaeota, the RecJ gene seems to have un- domain (Robertson et al. 2010). In several fun- dergone triplication. One clade evolved very fast gi, Mcm10 additionally contains an amino-

Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 11 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

K.S. Makarova and E.V. Koonin

A

COG00429 RecJ clade ar

arCOG00428 COG1107

COG00427_II OG00427_I Cdc45

arCOG00432 ar

GAN arC

COG0608, RecJ

arCOG00425

OG01565 arCOG00424 arC

arC a ar COG2404 rCOG COG00430 OG00431 00423

Ppx1 clade arCOG01566

Prune OG01567 COG0618

rC

a Ppx1

COG1227 COG2404 clade

B Thermococcus kodakaraensis (arCOG00427) TK1251-TK1254 Caldiarchaeum subterraneum (arCOG00427) CSUB_C0444-CSUB_C0447 S15 RecJ Pcc1 S3 Nitrosopumilus maritimus (arCOG00427) Nmar_1508-Nmar_1511

Korarchaeum cryptofilum (arCOG00427) Kcr_1419-Kcr_1418 S15 RecJ Archaeoglobus fulgidus (arCOG00427) AF0699-AF0698 RecJ Pcc1 Hyperthermus butylicus (arCOG00432) Hbut_0431-Hbut_0428 Haloarcula marismortui (arCOG00428) rrnAC1426-rrnAC1429 S15 RecJ Inactivated DHH Pcc1 S3 Pyrobaculum aerophilum (arCOG05692) PAE3485-PAE3482 S15 RecJ Inactivated DHH Pcc1

C Crenarchaeota Thermoproteales Euryarchaeota Thaumarchaeota Methanococcales Korarchaeota Nanoarchaeota Thermococcales Thermoplasmatata Halobacteria Methanosacinales Desulfurococcales Sulfolobales Thepe Calma Thete Pyrca Thene Pyris Pyrar Pyrae Arcfu Korcr Naneq Calte Nitma Censy Metka Metin Metvu Metja Metae Metva Metmp Metst Metth Metsm

arCOG00427 2 111 1 1 122 2 2 2 2222 2 3 2 1 2 arCOG00428 arCOG00432 arCOG00433 arCOG05902 arCOG05692

Figure 4. RecJ homologs in archaea. (Red) Eukaryotes; (yellow) bacteria; the color code for archaea is as in Figure 2. (A) Phylogeny of the DHH superfamily of phosphoesterases. The tree was reconstructed using aligned blocks corresponding to the DHH domain (Makarova et al. 2012). The MUSCLE program (Edgar 2004) was used for the construction of sequence alignments (229 sequences in total, 83 aligned positions). The maximum likelihood (ML) phylogenetic tree was constructed by using the MOLPHY program (Adachi and Hasegawa 1992) with the JTT substitution matrix to perform local rearrangement of an original Fitch tree (Fitch and Margoliash 1967). The MOLPHY program was also used to compute RELL bootstrap values. arCOG or COG numbers or family names (for eukaryotes) are indicated for the corresponding branches. (Green) The location of the GAN protein. (B) Genomic context of the RecJ homologs in selected archaea. The designations are as in Figure 3. The arCOGsto which RecJ-like proteins are assigned are indicated in parentheses. (Legend continues on following page.)

12 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Evolution of DNA Replication

terminal P-loop ATPase domain (Fien et al. and Stillman 2010). The DDK phosphorylates 2004; Fien and Hurwitz 2006). The human one of the Mcm2/4/6 proteins (Sheu and MCM10 protein forms a hexameric structure Stillman 2010), whereas CDK phosphorylates similar to the structure formed by other MCM Mcm5, Sld2, Sld3, and Mcm10 (Tanaka et al. proteins (Okorokov et al. 2007). Taken together, 2007). This phosphorylation results in binding these observations suggest that Mcm10 might be of Cdc45 and Dpb11 proteins and activation of derived from the MCM protein family. Archaea the Mcm2–7 helicase (Sclafani and Holzen also encode several families of modified MCM 2007). The Sld2, Sld3, and Dpb11 proteins are homologs. Forexample, inactivated MCM para- conserved in most eukaryotes and are essential logs originated independently in M. kandleri regulators of replication initiation (Pospiech and in Thermococci (arCOG05761). In both et al. 2010). cases, Zn-binding amino are also substi- Dpb11 (known as TopBP1 in human) is a tuted, but the carboxy-terminal HTH domain is member of the BRCT domain superfamily that present. The function of these MCM homologs contains several BRCTrepeats (Bork et al. 1997; in archaea is unknown. Several Methanosarci- Garcia et al. 2005). In , the BRCT nales and possess a protein fam- domain is present only in NAD-dependent ily with a CCCC-type Zn finger clearly related to DNA ligase (Bork et al. 1997). The NAD-depen- that in MCM and an amino-terminal HTH do- dent ligase is an essential of DNA rep- main (arCOG02259 and arCOG02260, respec- lication and repair that is present in almost all tively), resembling the configuration of the car- bacteria (except for some intracellular symbi- boxy-terminal portion of eukaryotic MCM10. onts with the smallest genomes) but only in a Some of these proteins also contain an addition- few archaea, which probably acquired this gene al carboxy-terminal RepH/I/J family domain via horizontal transfer from bacteria. The eu- that is involved in replication of plasmid karyotic BRCT domains most likely evolved pNRC100 in Haloferax volcanii (Ng and Das- from the bacterial ones. Sarma 1993). Thus, a homolog of Mcm10 might The Sld3-like proteins show poor sequence have been present already in the archaeal ances- conservation even among eukaryotes. Recently, tor of eukaryotes (Fig. 1). however, it has been shown that an a-helical In eukaryotes, activation of the replicative conserved domain of Sld3 is present in the ma- helicase and loading of the replisome depend jority of eukaryotes (in animals and plants, this on two serine–threonine kinases, CDK (cyclin- protein is called treslin); however, homologs in dependent kinase) and DDK (Dbf4-dependent prokaryotes so far have not been detected (San- kinase) (Tanaka et al. 2007; Araki 2010; Sheu chez-Pulido et al. 2010).

Figure 4. (Continued) The protein IDs for this region in the corresponding genomes are indicated. (C) Phyletic patterns of RecJ-related arCOGs. The phyletic patterns for indicated arCOGs ([filled circles] presence; [empty circles] absence) are superimposed over the phylogenetic tree of archaea. The circles for proteins implicated in replication are shaded. The number of paralogs is indicated inside the circles for arCOG00427 (all other sub- families have one representative in each genome). The tree is a modified version of the consensus phylogeny of archaea (Makarova et al. 2010) with Caldiarchaeum subterraneum included in the Thaumarchaeota branch and several branches with the same distribution of RecJ-related subfamilies collapsed. Arcfu, Archaeoglobus fulgidus; Metsm, Methanobrevibacter smithii ATCC 35061; Metth, thermautotrophicus; Metst, Methanosphaera stadtmanae; Metmp, Methanococcus maripaludis S2; Metva, Methanococcus vannielii SB; Metae, Methanococcus aeolicus Nankai-3; Metja, Methanocaldococcus jannaschii; Metvu, Methanocaldococcus vulcanius M7; Metin, Methanocaldococcus infernus ME; Metka, Methanopyrus kandleri; Pyrae, Pyrobaculum aerophilum; Pyrar, Pyrobaculum arsenaticum DSM 13514; Pyris, Pyrobaculum islandicum DSM 4184; Thene, Thermoproteus neutrophilus V24Sta; Pyrca, Pyrobaculum calidifontis JCM 11548; Thete, Thermoproteus tenax; Calma, Caldivirga maquilingensis IC-167; Thepe, Thermofilum pendens Hrk 5; Censy, Cenarchaeum symbiosum; Nitma, Nitro- sopumilus maritimus SCM1; Naneq, Nanoarchaeum equitans; Korcr, Candidatus Korarchaeum cryptofilum OPF8; Calte, Candidatus Caldiarchaeum subterraneum.

Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 13 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

K.S. Makarova and E.V. Koonin

The Sld2 sequence is poorly conserved as mic contexts (Makarova and Koonin 2010; well, and this protein so far has been identified Makarova et al. 2010), but at this point, there only in Fungi. However, the animal RecQ4 is no evidence of the involvement of these ki- helicase contains an amino-terminal Sld2 do- nases in regulation of archaeal DNA replication. main (Capp et al. 2010). Wefound that the ami- no-terminal domain of plant RecQ (e.g., GI: THE ACTIVE REPLISOME 308804427) shows weak sequence similarity and compatible secondary structure with Sld2 Once the CMG complex is assembled and acti- (KS Makarova, unpubl.). This observation sug- vated, it begins separating the two DNA strands gests that the Sld2 domain is involved in the and forming a replication “bubble” from which regulation of prereplication complex formation the two replication forks, each containing a in all eukaryotes. Moreover, the LECA probably leading and a lagging strand, move away (Moyer contained the Sld2 domain fused with the et al. 2006; Pacek et al. 2006; Labib and Gambus RecQ-like helicase that might be directly in- 2007; MacNeill 2010). Once replication is in volved in replication (Capp et al. 2010; Pospiech progress, other proteins required for the bulk et al. 2010). Most bacteria encode a RecQ heli- DNA synthesis and protection of single-strand- case, whereas only a few archaea do. In Metha- ed DNA can be recruited to form the full repli- nomicrobia, the recQ genes are clearly trans- some complex at each replication fork (Fig. 1). ferred from bacteria. In contrast, Acidilobus saccharovorans, Aeropyrum pernix, and Korar- Primases chaeum cryptofilum possess a small subfamily of RecQ homologs that are highly diverged Similarly to the eukaryotic replication system, and contain a long amino-terminal region that archaeal DNA replication involves a two-sub- in K. cryptofilum encompasses a DEDDh-like unit primase (PriS, the catalytic subunit, and 30 –50 exonuclease domain and in the other PriL, the polymerase-interacting subunit) that two species probably is an inactivated derivative synthesizes an 8- to 12-nt RNA primer (Kuchta of this nuclease. This domain architecture re- and Stengel 2010). In eukaryotes, the primer is sembles the animal Werner syndrome RecQ- then elongated by Pol a complexed with the like helicase (Mushegian et al. 1997; Bernstein cognate B (small) subunit to form a covalent et al. 2010). It seems likely that this particular DNA–RNA hybrid that is required for the rep- archaeal subfamily of RecQ helicases is ances- licative polymerase to start elongation (Kuchta tral to the eukaryotic RecQ. An interesting ques- and Stengel 2010). In archaea, the RNA primer tion that probably can be addressed only by seems to be sufficient for the activation of the structure comparison is whether the extended replicative polymerase (Barry and Bell 2006). amino-terminal region of these archaeal RecQ- The evolution of these proteins in both eu- like proteins also contains a domain homo- karyotes and archaea appears to have involved logous to Sld2 and if they are involved in repli- primarily simple vertical descent because the cation. majority of archaeal and eukaryotic genomes Homologs of CDK and DDK (COG0515) encompass a single gene for each subunit. The serine–threonine kinases are present in most only deviation from this simple pattern is found major archaeal lineages (with the exception of in N. equitans, which encodes a hybrid protein Thaumarchaeota and Nanoarchaeota) and thus containing regions approximately correspond- can be confidently projected to the archaeal ing to the amino-terminal domain of the small ancestor of eukaryotes as well. Most likely, ser- subunit and the carboxy-terminal domain of ine/threonine protein phosphatase(s) might be the large subunit (NEQ395). present in this ancestral form as well. A function- In addition to the large and small subunits al link between kinase and phosphatase and cell of the archaeo–eukaryotic primase (PriSL), all division and membrane-remodeling systems in archaea possess at least one gene for the bac- archaea is strongly suggested by conserved geno- terial-type DnaG primase, although in contrast

14 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Evolution of DNA Replication

to the bacterial homologs, archaeal DnaG lacks dria and has a dedicated function in mitochon- the amino-terminal Zn-finger domain and the drial genome replication (Shutt and Gray 2006). carboxy-terminal domain that binds the bac- terial replicative helicase DnaB (Makarova et al. 1999). The recently characterized DnaG pro- DNA Polymerase Sliding Clamp tein of S. solfataricus has been shown to possess and Clamp Loader similar properties to the Escherichia coli primase The PCNA (proliferating cell nuclear antigen) and shows a fourfold faster rate of DNA priming functions as the sliding clamp, a trimer that en- than SsoPriSL (Zuo et al. 2010). It has been circles double-stranded DNA and freely slides proposed that DnaG might be involved in the along the DNA molecule (Beattie and Bell initiation of Okazaki fragment synthesis on the 2011b). The sliding clamp is required for the lagging strand (Zuo et al. 2010). The dnaG gene processivity of the DNA polymerase and coor- is often colocalized with the gene for the major dinates the function of multiple binding part- form of archaeal superfamily B DNA polymer- ners that are also required for replication and ase, although these genes might not be co-reg- repair processes. The sliding clamp is one of ulated given that they are oriented convergently the few universally conserved proteins involved (Fig. 5). In many , dnaG is encod- in replication (the bacterial ortholog is known ed in a predicted operon with uncharacterized as the DNA polymerase b subunit). In eukary- a-helical, potentially metal-binding protein of otes and many archaea, there is a single PCNA arCOG02254 that by inference might have a role gene the product of which forms a homotrimer; in replication (Fig. 3). Most eukaryotes (but not this gene can be projected back to the last animals in which this gene has been inactivated) common ancestor of archaea and the archaeal encode a DnaG protein that is typically fused to ancestor of eukaryotes. However, in eukar- a DnaB-like helicase domain and is not directly yotes and independently in Crenarchaea, the related to archaeal DnaG. Instead, this gene ap- PCNA gene is duplicated. Apparently there had pears to have been derived from the mitochon- been an ancestral duplication in Crenarchaea

arORC1 DP 1 DP 2

arORC2 RFCL RFCS GINS23 MCM (4059)

GTPase GINS23 MCM eIF-2β

GTPase PCNA PriS GINS15 L44 S27 eIF-2α Nop10

RimI PolB3 (major) DnaG

S15 CDC45/RecJ Pcc1 S3

Figure 5. Conserved genomic context of selected genes encoding replication proteins. The designations are as in Figure 3. S3, S15, S27, L44, The respective ribosomal proteins; RimI, GNATfamily acetyltransferase; eIF-2a and eIF-2b, the respective translation initiation factors; Nop10, an RNA-binding protein; Pcc1, subunit of the KEOPS complex.

Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 15 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

K.S. Makarova and E.V. Koonin

followed by an additional duplication in the domain polymerases (Burgers et al. 2001). In Desulfurococcales lineage (Chia et al. 2010). addition to the polymerase core domain, all Some species of Crenarchaea encode multiple these proteins contain an amino-terminal 30 – copies of PCNA that can form either homo- 50 exonuclease domain. All eukaryotes possess trimers or heterotrimers (Barry and Bell 2006; four paralogous B-family polymerases denoted Pan et al. 2011a). In eukaryotes, the ancestral Pol a,Pold,Pol1, and Pol z that are involved in duplication yielded at least three additional both DNA replication and repair (Makarova PCNA-like families, all of which are subunits et al. 2005; Kunkel and Burgers 2008). Archaea of the checkpoint 9-1-1 complex (components encode at least two paralogous B-family poly- HUS1, RAD1, and RAD9), which is involved in merases: the “major” one (present in all ar- DNA damage checkpoint control (Aravind et al. chaea), and PolB3 and the minor one (several 1999; Majka and Burgers 2004). lineages of methanogens lack this gene), which Replication factor C, the clamp loader, is are both projected to LACA PolB1 (Edgell et al. another universally conserved protein that is 1998; Rogozin et al. 2008; Tahirov et al. 2009). required for PCNA assembly around a DNA A small subset of archaea possesses another, di- molecule at template–primer junctions (Bloom vergent B-family polymerase (arCOG04926), 2009). In both archaea and eukaryotes, RFC is a which seems to contain active exonuclease and pentamer that consists of one large subunit Palm domains. This family is represented by (RFCL) encoded by a single gene and four small three paralogs in Methanococcoides burtonii subunits (RFCS) that are identical in most ar- and appears to be prone to duplication and chaea but are represented by four distinct para- HGT. In addition, several archaea encode a de- logs encoded by ancestral genes in eukaryotes rivative B-family polymerase, PolB2, in which (Makarova et al. 2005; Chia et al. 2010). Both both the exonuclease and the Palm domain ap- large and small subunits are paralogs and belong pear to be inactivated (Edgell et al. 1998; Rogo- to the AAAþ superfamily of ATPases (Iyer et al. zin et al. 2008). In Crenarachaeota, this gene 2004).Inaddition, early ineukaryoticevolution, often colocalizes with genes encoding a small further duplications gave rise to other families, a-helical protein from arCOG07300 and RadA such as Rad24, which were recruited for distinct protein, whereas in Euryarchaeota, it colocalizes roles in checkpoint complexes in parallel with with the arORC2 gene, suggestive of involve- the duplication of the corresponding PCNAs ment in replication (Fig. 3). and Chl12, a specific chromatid cohesion clamp In addition to the B-family polymerases, loader (Majka and Burgers 2004). In archaea, many archaea encode the unique D-family po- independent duplications of RFCS occurred lymerase (Cann et al. 1998), which is absent in in Methanomicrobia/Halobacteria, Thermopro- Crenarachaea but present in all deeply branch- teales, and a few smaller lineages independently, ing lineages (Thaumarchaea, Nanoarchaeon, often followed by acceleration of the evolution- Korarchaeon), suggesting that this polymerase ary rate of the “extra” copies (Chia et al. 2010). was present in LACA. The D-family polymer- Notably, in archaeal genomes, the “original” ases consist of two subunits. The large subunit paralogs of both PCNA and RFC that retain the DP2 is a large multidomain protein that forms a ancestral roles in replication are encoded in con- homodimer that is responsible for the polymer- served gene contexts adjacent to other proteins ase activity (Shen et al. 2001; Matsui et al. 2011). involved in replication, although some addi- The DP2 protein does not display any sequence tional paralogs are encoded in suggestive con- similarity with other protein families (except texts as well (Figs. 3 and 5). for two Zn fingers), but examination of con- served motifs suggests that it might be a highly diverged Palm-domain polymerase (Cann et al. DNA Polymerases 1998). The small subunit DP1 contains at least In archaea and eukaryotes, the main replicative two domains, an ssDNA-binding OB-fold polymerases belong to the B family of Palm- and the 30 –50 exonuclease domain of the

16 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Evolution of DNA Replication

metallophosphatase (MPP) family. The DP1 the major (PolB3) and minor (PolB1) forms protein is the ancestor of the small B subunit of archaeal polymerases, and the origin of the of eukaryotic replicative polymerases of the B small subunit from DP1 and the origin of the family that, however, have lost the catalytic ami- carboxy-terminal Zn finger from DP2 empha- no residues of the 30 –50 exonuclease (Ara- size the joint contributions of the B-family and vind and Koonin 1998; Klinge et al. 2009). Evi- D-family archaeal polymerases to the evolution dence has been presented that the D-family of the eukaryotic replication machinery. Fur- polymerase specializes in the replication of the thermore, these findings suggest that the ar- lagging strand, whereas the B-family polymerase chaeal ancestor of eukaryotes possessed a highly is involved in the leading-strand replication complex replication apparatus, possibly more (Henneke et al. 2005). Interestingly, all Crenar- complex than any known extant archaeon. achaea that have no family-D polymerase pos- sess at least one additional active polymerase of Primer Removal and Gap Closure the B family, suggesting that the two distinct B- family polymerases specialize on the leading- In both eukaryotes and archaea, RNase HII and and lagging-strand replication, respectively, as FEN1 flap endonuclease are responsible for re- is the case in eukaryotes. moval of RNA primers during replication (Fig. Phylogenetic analysis of archaeal, eukaryot- 1) (Barry and Bell 2006). Both enzymes are ic, and bacterial B-family DNA polymerases present in all archaea with only a few duplica- combined with an analysis of domain archi- tions observed. Eukaryotes have a single gene tectures indicates that eukaryotic B-family po- for RNase II, but ancestral duplications have lymerases, most likely, originated from two led to at least three FEN1-like families (Rad2, distinct archaeal ancestors, the major archaeal Rad27, EXO1), which, as shown in yeast, pos- form that gave rise to the catalytically active sess essentially the same activities and can com- amino-terminal domain of Pol 1, and the minor plement each other in both replication and re- form that became the common ancestor of Pol pair processes (Sun et al. 2003). a,Pold, and Pol z. All eukaryotic B-family po- Another essential replication enzyme is lymerases contain two carboxy-terminal Zn- DNA ligase, which is responsible for joining of finger modules. Interestingly, the carboxy-ter- the Okazaki fragments during lagging-strand minal module appears to be derived from the replication. The main ligase involved in DNA Zn finger in the DP2 subunit of archaeal D- replication in archaea is an ATP-dependent li- family DNA polymerases that are unrelated to gase (arCOG01347), which is encoded in the the B family, at least as judged by sequence com- vast majority of archaeal genomes and is an parison (Tahirov et al. 2009). The Zn finger of apparent ortholog of the three eukaryotic li- Pol 1 shows greater similarity to the counterpart gases (I, III, and IV) that evolved through in archaeal DP2 than the Zn fingers of other ancestral duplications (Ellenberger and Tom- eukaryotic B-family polymerases. The car- kinson 2008; Yutin and Koonin 2009). These boxy-terminal portion of eukaryotic Pol 1 con- enzymes share a common DNA-binding do- sists of two additional polymerase and exonu- main, a catalytic nucleotidyltransferase domain, clease domains, both inactivated; there are and an OB-fold domain (Martin and MacNeill indications that this module could be of bacte- 2002; Ellenberger and Tomkinson 2008). Only rial or bacteriophage origin (Tahirov et al. Ligase I in eukaryotes is directly involved in 2009). The presence of an inactivated exonucle- replication, whereas Ligase III and Ligase IV ase–polymerase module in Pol 1 parallels a sim- have adopted different roles in DNA repair ilar inactivation of both enzymatic domains in a and homologous recombination. Ligase III distinct subfamily of inactivated archaeal B- and Ligase IV contain carboxy-terminal BRCT family polymerases (Rogozin et al. 2008). domains that are involved in protein–pro- The apparent derivation of the large sub- tein interactions via phosphoserine recognition units of eukaryotic B-family polymerases from (Martin and MacNeill 2002). As pointed out

Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 17 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

K.S. Makarova and E.V. Koonin

above, a BRCT domain is also present in bacte- eukaryotic RPA1; and arRPA1_short, typically rial NAD-dependent DNA ligase, which is also with a single OB-fold domain. found in several archaea. So far only two ar- The diverged COG3390 (arRPA2) consists chaea, Halalkalicoccus jeotgali and Halorhabdus of short proteins most of which contain a single utahensis, lack the ATP-dependent ligase but OB-fold domain and acarboxy-terminal wHTH encode the NAD-dependent ligase, which in domain, the same domain arrangement as in the these organisms can be predicted to function eukaryotic RPA2 protein (Mer et al. 2000). So in replication. far, RPA2 has been detected only in Euryar- chaeota (Fig. 6B). It seems likely that arRPA2 evolved by duplication of the short form of Single-Stranded DNA-Binding Proteins arRPA1 followed by accelerated evolution. Sim- Single-stranded DNA-binding proteins (SSBs) ilar events have been detected in eukaryotes. For are essential components of the replication ma- example, fungal CDC13, Stn1, and Ten1, the chinery that prevent reannealing of the growing subunits of a heterotrimeric complex essential DNA chain with the template and protect for telomere maintenance, are paralogs of the ssDNA from degradation. In all three domains Rpa1–Rpa2–Rpa3 complex subunits but show of life, the major SSB proteins possess an OB- only remote sequence similarity to RPA1, RPA2, fold (Murzin 1993). In eukaryotes, the SSB, and RPA3, respectively (Sun et al. 2009, 2011). known as RPA, is a heterotrimer composed of Thus, most archaea possess at least two RPA three subunits of 70, 32, and 14 kDa (denoted genes, one long (arRPA1_long) and one or RPA1, RPA2, and RPA3 in yeast) that contain several short ones (arRPA1_short or arRPA2), four, one, and one OB-fold units, respectively. which are apparent ancestors of eukaryotic All three subunits are essential for the formation RPA1 and RPA2, respectively, and can be con- of a stable, functional RPA complex (Bochkarev fidently projected to LACA (Fig. 6B). Whether and Bochkareva 2004). Archaea show multiple or not LACA encoded a third short RPA (ar- variations of RPA domain organization, num- COG05741) remains uncertain. but, given the ber of homologs, and modes of interaction. presence of this gene in Korarchaeaon and Among the experimentally characterized RPA several recently sequenced genomes of deeply complexes in archaea, there is a homotetramer branching archaea (Fig. 6B) (Narasingarao et containing a single OB-fold in S. solfataricus al. 2012), it seems likely that this RPA is ancestral (Wadsworth and White 2001); single-subunit in archaea as well (Fig. 1). The major exceptions RPAs containing multiple OB-folds in metha- from this pattern of multiple RPA forms are nogens Methanosarcina acetivorans, Methano- Sulfolobales/Desulfurococcales, which possess caldococcus jannaschii, and Methanothermobac- only RPA1_short, and Thermoproteales, which ter thermautotrophicus (Robbins et al. 2005, and are the only known organisms without detect- references therein); and a heterotrimeric RPA in able OB-fold-containing SSBs. that consists of three distant- The apparent absence of a canonical RPA/ ly related subunits: RPA41 (COG1599), RPA32 SSB in Thermoproteales stimulated the search (COG3390), and RPA14 (arCOG05741), each for an alternative, which resulted in the iden- containing an OB-fold; in addition, RPA41 con- tification of a distinct ssDNA-binding protein tains a Zn-finger-like motif (Komori and Ishino in T. tenax that is unrelated to RPA and has 2001). been denoted ThermoSSB (Paytubi et al. 2012). To characterize general trends in the evolu- ThermoSSB(arCOG05578)containsanssDNA- tion of RPA in archaea, we reconstructed phy- binding domain with a novel fold and a leucine- logenetic trees for COG1599 and COG3390 zipperdomainthatmediatesdimerizationofthis (Fig. 6A). The COG1599 subtree (which we protein (Paytubi et al. 2012). ThermoSSB is pre- here denote arRPA1) divides into two clades: sent in all Thermoproteales with the exception arRPA1_long with several OB-fold domains of T. pendens, which encodes two RPA-like pro- and often a Zn finger homologous to that in teins. Thus, ThermoSSB perfectly complements

18 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

AB

13542016|Therm|Thermoplasma13542016|Therm|Thermoplasma volcaniumvolcanium GSS1 100 4847849148478491|Therm|Picrophilus|Therm| DSMDS 9790 2 93 257076905257076905|Therm|Ferroplasma|Therm|Ferroplasma acidarmanus fer1fer 77 3 80 55376365|Halob|Haloarcula marismortui ATCC 43049 Halobacteria 2 92 288931348|Archa|Ferroglobus placidus DSM 10642

Methanomicrobiales 2 118577123|Cenar|Cenarchaeum symbiosum A 161527601|marin|Nitrosopumilus maritimus SCM1 97 Archaeoglobi 96 340343954|marin|Candidatus Nitrosoarchaeum koreensis MY1 88 329766730|marin|Candidatus Nitrosoarchaeum limnia SFB1 Thermoplasmatata 91 20094877|Metha|Methanopyrus kandleri AV19 327401950|Archa|Archaeoglobus veneficus SNP6 41 83 Methanobacteria 312136409|Metha| DSM 2088 15679384|Metha|Methanothermobacter thermautotrophicus str- Delta H 93 83 89 288559372|Metha|Methanobrevibacter ruminantium M1

81 148643392|Metha|Methanobrevibacter smithii ATCC 35061 Euryarchaeota 36 325957925|Metha|Methanobacterium sp- AL-21 Metka 94 333988515|Metha|Methanobacterium sp- SWAN-1 78 71 41614994|Nanoa|Nanoarchaeum equitans Kin4-M Thermococcales 170290896|Candi|Candidatus Korarchaeum cryptofilum OPF8 91 57641896|Therm|Thermococcus57641896|Therm|Thermococcus kodakarensis KKOD1 Thermoproteales 84 223478091|Therm|Thermococcus223478091|Therm|Thermococcus sp- AM4AM4 2242398695|Therm|Thermococcus42398695|Therm|Thermococcus sibiricus MM 739 337284758337284758|Therm|Pyrococcus|Therm|Pyrococcus yayanosii CH1CH1 Crenarchaeota Thepe 90 74 94 114520503|Therm|Pyrococcus4520503|Therm|Pyrococcus abyssi GE5GE5 Desulfurococcales 87 3332158150|Therm|Pyrococcus32158150|Therm|Pyrococcus sp- NA2NA2 124485459|Metha|Methanocorpusculum labreanum Z 88604174|Metha|Methanospirillum hungatei JF-1 Sulfolobales 99 219850726|Metha|Methanosphaerula palustris E1-9c 53 84 154151637|Metha|Methanoregula boonei 6A8 29 67 Thaumarchaeota 60 126179860|Metha|Methanoculleus marisnigri JR1 11497994|Archa|Archaeoglobus fulgidus DSM 4304 Thaumarchaeota 79 297619538|Metha|Methanococcus voltae A3 Calte 45358595|Metha|Methanococcus maripaludis S2 92 83 150399109|Metha|Methanococcus vannielii SB Naneq 333910219|Metha|Methanotorris igneus Kol 5 90 Nanoarchaeota 98 296109213|Metha|Methanocaldococcus infernus ME 94 15669348|Metha|Methanocaldococcus jannaschii DSM 2661 Korcr 87 256810698|Metha|Methanocaldococcus fervens AG86 Korarchaeota 14 61 261403319|Metha|Methanocaldococcus vulcanius M7 150400778|Metha|Methanococcus aeolicus Nankai-3 98 336121122|Metha|Methanothermococcus okinawensis IH1 98 116754196|Metha|Methanosaeta thermophila PT 330508607|Metha|Methanosaeta concilii GP6 99 Halo_RPA1_1 85 294494712|Metha|Methanohalophilus mahii DSM 5219 52 67 298674005|Metha|Methanohalobium evestigatum Z-7303 arCOG05741 arRPA1_long arRPA1_short arRPA2 arCOG05578 arCOG03772 arCOG02261 75 11 91773712|Metha|Methanococcoides burtonii DSM 6242 arRPA1_long 72 20093424|Metha|Methanosarcina acetivorans C2A 79 336475982|Metha|Methanosalsum zhilinae DSM 4017 282165660|Metha|Methanocella paludicola SANAE 87 289596922|uncla| T469 333910932|Metha|Methanotorris igneus Kol 5 13542015|Therm|Thermoplasma volcanium GSS1 297620031|Metha|Methanococcus voltae A3 98 98 48478490|Therm|Picrophilus torridus DSM 9790 45357685|Metha|Methanococcus maripaludis S2 82 77 257076906|Therm|Ferroplasma acidarmanus fer1 96 150399872|Metha|Methanococcus vannielii SB 96 20094402|Metha|Methanopyrus kandleri AV19 305663511|Therm|Ignisphaera aggregans DSM 17230 60 223478132|Therm|Thermococcus sp- AM4 54 63 70606760|Therm|Sulfolobus acidocaldarius DSM 639 57641894|Therm|Thermococcus kodakarensis KOD1 15920719|Therm|Sulfolobus tokodaii str- 7 99 337284760|Therm|Pyrococcus yayanosii CH1 15899120|Therm|Sulfolobus solfataricus P2 83 87 28 80 332158152|Therm|Pyrococcus sp- NA2 3 332797833|Therm|Acidianus hospitalis W1 90 14520501|Therm|Pyrococcus abyssi GE5 97 84 99 146303452|Therm|Metallosphaera sedula DSM 5348 242398697|Therm|Thermococcus sibiricus MM 739 170289718|Candi|Candidatus Korarchaeum cryptofilum OPF8 87 81 45357683|Metha|Methanococcus maripaludis S2 44 302348760|Therm|Acidilobus saccharovorans 345-15 150399870|Metha|Methanococcus vannielii SB 124027704|Therm|Hyperthermus butylicus DSM 5456 62 94 333910930|Metha|Methanotorris igneus Kol 5 26 126466308|Therm|Staphylothermus marinus F1 87 297620033|Metha|Methanococcus voltae A3 95 296242163|Therm|Thermosphaera aggregans DSM 11486 126179197|Metha|Methanoculleus marisnigri JR1 97 77 320100590|Therm|Desulfurococcus mucosus DSM 2162 154151219|Metha|Methanoregula boonei 6A8 51 94 218883974|Therm|Desulfurococcus kamchatkensis 1221n 59 219852691|Metha|Methanosphaerula palustris E1-9c 85 156937744|Therm|Ignicoccus hospitalis KIN4-I 307353131|Metha|M.petrolearius DSM 11571 92 118431409|Therm|Aeropyrum pernix K1 80 88601895|Metha|M.hungatei JF-1 118576582|Cenar|Cenarchaeum symbiosum A 60 99 75 124485338|Metha|Methanocorpusculum labreanum Z 329764702|marin|Candidatus Nitrosoarchaeum limnia SFB1 99 11498385|Archa|Archaeoglobus fulgidus DSM 4304 340344367|marin|Candidatus Nitrosoarchaeum koreensis MY1 93 98 288930854|Archa|Ferroglobus placidus DSM 10642 69 161528013|marin|Nitrosopumilus maritimus SCM1 79 284162747|Archa|Archaeoglobus profundus DSM 5631 289596472|uncla|Aciduliprofundum boonei T469 76 81 96 288931688|Archa|Ferroglobus placidus DSM 10642 13542067|Therm|Thermoplasma volcaniumvolcanium GSS1 5 327400197|Archa|Archaeoglobus veneficus SNP6 48477745|Therm|Picrophilus48477745|Therm|Picrophilus torridus DSM 97909790 822 98 147921017|envir|uncultured methanogenic archaeon RC-I 8822 257076028|Therm|Ferroplasma257076028|Therm|Ferroplasma acidarmanus fer1 51 282165520|Metha|Methanocella paludicola SANAE 343485818|uncla|Candidatus Caldiarchaeum subterraneum 89 98 116754740|Metha|Methanosaeta thermophila PT 119719124|Therm|Thermofilum pendens Hrk 5 26 330507584|Metha|Methanosaeta concilii GP6 93 146303241|Therm|Metallosphaera sedula DSM 5348 91 336477022|Metha|Methanosalsum zhilinae DSM 4017 75 302348560|Therm|Acidilobus saccharovorans 345-15 83 294496320|Metha|Methanohalophilus mahii DSM 5219 289597160|uncla|Aciduliprofundum boonei T469 63 49 298675818|Metha|Methanohalobium evestigatum Z-7303 77 96 Halo_RPA1_2 95 91773337|Metha|Methanococcoides burtonii DSM 6242 92 99 282163689|Metha|Methanocella paludicola SANAE 51 20091836|Metha|Methanosarcina acetivorans C2A 94 147921611|envir|uncultured methanogenic archaeon RC-I 307352310|Metha|Methanoplanus petrolearius DSM 11571 42 80 20089479|Metha|Methanosarcina acetivorans C2A 219850710|Metha|Methanosphaerula palustris E1-9c 298675833|Metha|Methanohalobium evestigatum Z-7303 95 100 30 126180295|Metha|Methanoculleus marisnigri JR1 89 336476389|Metha|Methanosalsum zhilinae DSM 4017 75 154149726|Metha|Methanoregula boonei 6A8 83 91773348|Metha|Methanococcoides burtonii DSM 6242 124486472|Metha|Methanocorpusculum labreanum Z 219850711|Metha|Methanosphaerula palustris E1-9c 61 85 88603781|Metha|Methanospirillum hungatei JF-1 78 21 72 307352311|Metha|Methanoplanus petrolearius DSM 11571 Halo_RPA2_1 90 126180294|Metha|Methanoculleus marisnigri JR1 98 35 95 147921610|envir|uncultured methanogenic archaeon RC-I 124486473|Metha|Methanocorpusculum labreanum Z 84 282163688|Metha|Methanocella paludicola SANAE 154149727|Metha|Methanoregula boonei 6A8 69 91773347|Metha|Methanococcoides burtonii DSM 6242 98 Halo_RPA1_3 90 20089480|Metha|Methanosarcina acetivorans C2A 99 336476390|Metha|Methanosalsum zhilinae DSM 4017 89 Halo_RPA_4 95 75 98 116754739|Metha|Methanosaeta thermophila PT 14 298675832|Metha|Methanohalobium evestigatum Z-7303 75 330507585|Metha|Methanosaeta concilii GP6 Halo_RPA2_2 93 99 0.2 97 147921018|envir|uncultured methanogenic archaeon RC-I 289597159|uncla|Aciduliprofundum boonei T469 282165519|Metha|Methanocella paludicola SANAE 89 294496321|Metha|Methanohalophilus mahii DSM 5219 20 82 47 298675819|Metha|Methanohalobium evestigatum Z-7303 92 91773338|Metha|Methanococcoides burtonii DSM 6242 86 336477023|Metha|Methanosalsum zhilinae DSM 4017 77 20091837|Metha|Methanosarcina acetivorans C2A 284162746|Archa|Archaeoglobus profundus DSM 5631 84 93 327400196|Archa|Archaeoglobus veneficus SNP6 75 288931687|Archa|Ferroglobus placidus DSM 10642 11498386|Archa|Archaeoglobus fulgidus DSM 4304 0.5 arRPA1_short arRPA2

Figure 6. Phylogenetic analysis of the RPA family in archaea and eukarya. The designations and the method of tree reconstruction are as in Figure 2. (Green) The Thermococcales branches; (pink) the branches. Halobacterial branches are collapsed and numbered as follows: for arRPA1 from Halo_RPA1_1 to Halo_RPA1_4 and for arRPA2 from Halo_RPA2_1 to Halo_RPA2_2. “Long” RPAs are outlined by the red dotted line and “short” RPAs by the green dotted line. (A) Phylogenetic trees of the RPA1 and RPA2 families. RPA1 corresponds to COG1599 (167 sequences in total, 89 aligned positions), and RPA2 corresponds to COG03390 (76 sequences in total, 149 aligned positions). (B) The phyletic patterns of SSB/RPA proteins and their homologs in archaea. The designations are as in Figure 4C. (In panel B, circles with a yellow outline denote) RPA proteins from several organisms that could not be confidently aligned and thus are not present in the corresponding tree but included into the phyletic pattern. (Green outline) Those that do not include all representatives in the corresponding lineage. Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

K.S. Makarova and E.V. Koonin

the phyletic pattern of RPA such that SSB pro- with replication. However, this approach re- teins now have been identified in all archaea. quires caution because some housekeeping Some potential archaeal SSBs remain un- genes encoded in the same locus might not be assigned. In particular, an uncharacterized pa- functionally related but rather are highly ex- ralog of ThermoSSB (arCOG03772) shows a pressed genes that “hitchhike” with genes that broader phyletic distribution being present, in are not directly functional but are expressed sim- addition to Thermoproteales, in Sulfolobales/ ilarly (Rogozin et al. 2002). In particular, genes Desulfurococcales, Thermococcales, and Ar- involved in replication are often associated with chaeoglobi (Paytubi et al. 2012). In addition, genes coding for components of the translation there is at least one more protein family con- system (Fig. 5) (Berthon et al. 2008). Thus, un- taining OB-fold domains and distantly similar characterized genes in the respective operons to archaeal RPA2 (arCOG02261) that is present might not be involved in replication directly in most Methanococcales and Nanoarchaeum but are nevertheless interesting targets forexper- equitans. imental study (Figs. 3 and 5; Table 1). Recently, for example, analysis of gene neighborhoods led to the prediction that such genes as PACE12, GENOMIC CONTEXT AND PREDICTION OF a member of the GPN-loop GTPase family; NEW COMPONENTS OF THE REPLICATION and NudF, a NUDIX pyrophosphatase family MACHINERY member are involved in DNA replication and/ The results of comparative outlined or repair in archaea (Berthon et al. 2008). Here here show that despite the fundamental impor- we also identified several uncharacterized genes tanceof DNA replication, the protein machinery that could be involved in these processes (Figs. 3 responsible for this process shows remarkable and 5; Table 1). In addition, because replication evolutionary plasticity that involves multiple is a complex, metabolically costly process, many events of gene loss, non-orthologous displace- other proteins and cellular systems could be di- ment, and lineage-specific duplication, even rectly or indirectly involved in replication. Re- among the genes encoding core replication pro- cent analysis of proteins interacting with core teins. Examples of such events leading to com- replication proteins in T. kodakarensis suggests plex evolutionary scenarios are the substitution that there could be dozens of such partners, and a B-family polymerase for the D-family poly- for many of these, the specific function is not merase in Crenarchaeota and substitution of known (Li et al. 2010). Thus, all recent advances ThermoSSB for RPAin Thermoproteales. More- notwithstanding, the current understanding of over, some components of replication systems the archaeal replication system is still far from evolve rapidlyand often lose sequence similarity, being complete. which obscures their origin. However, in many such cases, the rapidlyevolving gene remains in a conserved and functionally coherent genomic GENERAL TRENDS IN THE EVOLUTION OF ARCHAEAL AND EUKARYOTIC neighborhood (Figs. 3 and 5). One such exam- REPLICATION SYSTEMS ple is the extremely divergent RecJ homolog in Thermoproteales and another is the numerous The study of archaeal replication is current- GINS proteins that are still annotated as “hypo- ly a dynamic research field that continues thetical” in sequenced archaeal genomes. These to identify previously uncharacterized protein highly divergent proteins are encoded in the components of the replication machinery, same neighborhoods as their better conserved most of which show new connections with the homologs, which facilitates functional predic- eukaryotic replication machinery. Comparative tion. Moreover, using the “guilt by association” genomic analysis reveals a (nearly) precise cor- principle (Aravind 2000; Galperin and Koonin respondence between the components of the ar- 2000), conservation of gene neighborhoods can chaeal and eukaryotic replication systems, with be used for prediction of new genes associated a few notable exceptions such as the CMG-

20 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Evolution of DNA Replication

activating proteins for which archaeal counter- diversity than eukaryotes, the RecJ family being parts have not been detected (Fig. 1; Table 1). the prime example. In other gene families, mul- Thus, it appears most likely that the archaeal tiple, independent duplications are traceable in ancestor of eukaryotes possessed a DNA repli- eukaryotes and in different lineages of archaea. cation apparatus that was as complex in its main The degree of paralogization of the archaeal rep- features as the eukaryotic replication apparatus. lication system remains underappreciated, and Given this essentially precise correspondence many divergent paralogs still await functional between the archaeal and eukaryotic replication characterization. systems, it can be expected that additional rep- In addition to the diversification via paral- lication proteins shared between archaea and ogization, archaeal replication systems show less eukaryotes eventually will be discovered (Fig. uniformity and a greater plasticity than the eu- 1; Table 1). A major innovation in eukaryotes karyotic counterparts. The striking examples in- could be the regulation of replication by phos- clude non-orthologous displacement of DNA phorylation of replisome subunits catalyzed by polymerases, SSB, and ligases. The actual scope dedicated kinases. Whether a counterpart to of this plasticity is still unexplored because even this regulatory circuit exists in archaea remains among the relatively few archaeal genomes se- to be determined. quenced to date, some gaps within the core of The close correspondence between the com- the replication machinery are apparent such as ponents of the archaeal and eukaryotic rep- the lack of ORC subunits in M. kandleri, sugges- lication systems notwithstanding, eukaryotic tive of additional displacements of essential replication does show increased complexity components. compared with archaeal replication. The prin- In summary, the recent experimental and cipal modality of this greater complexity is du- phylogenomic advances in the study of eukary- plication of the replication machinery compo- otic and archaeal replication systems clearly nents followed by subfunctionalization (Lynch complement each other and jointly are yielding and Force 2000; Ward and Durrett 2004) where- an increasingly complete picture of the organi- by heteromeric complexes in eukaryotes re- zation and evolution of the core replication ma- place homomeric complexes in archaea. The chinery. This obvious progress notwithstand- obvious cases in point are the ORC, MCM, ing, the identity and roles of more peripheral GINS, and RFC complexes. This evolutionary components of the replication apparatus, the trend in the replication machinery is part of a functions of paralogs of replicative genes, and general relationship between eukaryotes and especially, the ultimate origins of the complex their prokaryotic ancestors as exemplified by replication machinery that is inferred to have the proteasome, the exosome, and many other existed in LACA, remain poorly understood macromolecular complexes (Makarova et al. and are targets for future investigation. 2005). Although, on the whole, the eukaryotic rep- lication system is more complex than the ar- ACKNOWLEDGMENTS chaeal counterparts, complicated histories of lineage-specific expansion of gene families ac- The authors’ research is supported by the intra- companiedbyfunctionaldiversificationabound mural funds of the U.S. Department of Health in archaea. Unlike eukaryotes, where the main and Human Services (to the National Library of route to increased complexity is serial intra- Medicine). genomic gene duplication, in archaea, replicon fusion and more generally are major factors of evolution, leading REFERENCES in particular to pseudoparalogy (Makarova et Adachi J, Hasegawa M. 1992. MOLPHY: Programs for mo- al. 2005). Strikingly, in some families of proteins lecular . In Computer science monographs involved in replication, archaea attain far greater 27. Institute of Statistical Mathematics, Tokyo.

Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 21 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

K.S. Makarova and E.V. Koonin

Araki H. 2010. Cyclin-dependent kinase-dependent initia- Cann IK, Komori K, Toh H, Kanai S, Ishino Y. 1998. A tion of chromosomal DNA replication. Curr Opin Cell heterodimeric DNA polymerase: Evidence that members Biol 22: 766–771. of Euryarchaeota possess a distinct DNA polymerase. Aravind L. 2000. Guilt by association: Contextual informa- Proc Natl Acad Sci 95: 14250–14255. tion in genome analysis. Genome Res 10: 1074–1077. Capp C, Wu J, Hsieh TS. 2010. RecQ4: The second replica- Aravind L, Koonin EV. 1998. Phosphoesterase domains as- tive helicase? Crit Rev Biochem Mol Biol 45: 233–242. sociated with DNA polymerases of diverse origins. Nu- Chattopadhyay S, Bielinsky AK. 2007. Human Mcm10 reg- cleic Acids Res 26: 3746–3752. ulates the catalytic subunit of DNA polymerase-a and prevents DNA damage during replication. Mol Biol Cell Aravind L, Walker DR, Koonin EV. 1999. Conserved do- 18: 4085–4095. mains in DNA repair proteins and evolution of repair systems. Nucleic Acids Res 27: 1223–1242. Chia N, Cann I, Olsen GJ. 2010. Evolution of DNA replica- tion protein complexes in eukaryotes and Archaea. PLoS Balasov M, Huijbregts RP, Chesnokov I. 2009. Functional ONE 5: e10866. analysis of an Orc6 mutant in Drosophila. Proc Natl Acad Sci 106: 10672–10677. Coker JA, DasSarma P, Capes M, Wallace T, McGarrity K, Gessler R, Liu J, Xiang H, Tatusov R, Berquist BR, et al. Barry ER, Bell SD. 2006. DNA replication in the Archaea. 2009. Multiple replication origins of Halobacterium sp. Microbiol Mol Biol Rev 70: 15: 614–619. strain NRC-1: Properties of the conserved orc7-depen- Beattie TR, Bell SD. 2011a. Molecular machines in archaeal dent oriC1. J Bacteriol 191: 5253–5261. DNA replication. Curr Opin Chem Biol 15: 614–619. De Marco V, Gillespie PJ, Li A, Karantzelis N, Christo- Beattie TR, Bell SD. 2011b. The role of the DNA sliding doulou E, Klompmaker R, van Gerwen S, Fish A, clamp in Okazaki fragment maturation in archaea and Petoukhov MV, Iliou MS, et al. 2009. Quaternary struc- eukaryotes. Biochem Soc Trans 39: 70–76. ture of the human Cdt1–Geminin complex regulates Bell SD. 2011. DNA replication: Archaeal oriGINS. BMC DNA replication licensing. Proc Natl Acad Sci 106: Biol 9: 36. 19807–19812. Bell SP, Dutta A. 2002. DNA replication in eukaryotic cells. Duncker BP,Chesnokov IN, McConkey BJ. 2009. The origin Annu Rev Biochem 71: 333–374. recognition complex protein family. Genome Biol 10: 214. Bernander R, Dasgupta S, Nordstrom K. 1991. The E. coli Edgar RC. 2004. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids cell cycle and the plasmid R1 replication cycle in the Res 32: 1792–1797. absence of the DnaA protein. Cell 64: 1145–1153. Edgell DR, Doolittle WF.1997. Archaea and the origin(s) of Bernstein KA, Gangloff S, Rothstein R. 2010. The RecQ DNA replication proteins. Cell 89: 995–998. DNA helicases in DNA repair. Annu Rev Genet 44: 393–417. Edgell DR, Malik SB, Doolittle WF. 1998. Evidence of inde- pendent gene duplications during the evolution of ar- Berquist BR, DasSarma P, DasSarma S. 2007. Essential and chaeal and eukaryotic family B DNA polymerases. Mol non-essential DNA replication genes in the model halo- Biol Evol 15: 1207–1217. philic Archaeon, Halobacterium sp. NRC-1. BMC Genet Ellenberger T, Tomkinson AE. 2008. Eukaryotic DNA ligas- 8: 31. es: Structural and functional insights. Annu Rev Biochem Berthon J, Cortez D, Forterre P. 2008. Genomic context 77: 313–338. analysis in Archaea suggests previously unrecognized Fien K, Hurwitz J. 2006. yeast Mcm10p contains links between DNA replication and translation. Genome primase activity. J Biol Chem 281: 22248–22260. Biol 9: R71. Fien K, Cho YS, Lee JK, Raychaudhuri S, Tappin I, Hurwitz Bloom LB. 2009. Loading clamps for DNA replication and J. 2004. Primer utilization by DNA polymerase a-pri- repair. DNA Repair (Amst) 8: 570–578. mase is influenced by its interaction with Mcm10p. J Bochkarev A, Bochkareva E. 2004. From RPA to BRCA2: Biol Chem 279: 16144–16153. Lessons from single-stranded DNA binding by the OB- Fitch WM, Margoliash E. 1967. Construction of phyloge- fold. Curr Opin Struct Biol 14: 36–42. netic trees. Science 155: 279–284. Bochman ML, Schwacha A. 2009. The Mcm complex: Un- Forterre P. 2002. The orgin of DNA genomes and DNA winding the mechanism of a replicative helicase. Micro- replication proteins. Curr Opin Microbiol 5: 525–532. biol Mol Biol Rev 73: 652–683. Forterre P.2006. Three RNA cells for ribosomal lineages and Bohlke K, Pisani FM, Rossi M, Antranikian G. 2002. Archae- three DNAviruses to replicate their genomes: A hypoth- al DNA replication: Spotlight on a rapidly moving field. esis for the origin of cellular domain. Proc Natl Acad Sci 6: 1–14. 103: 3669–3674. Bork P, Hofmann K, Bucher P, Neuwald AF, Altschul SF, Galperin MY, Koonin EV. 2000. Who’s your neighbor? New Koonin EV. 1997. A superfamily of conserved domains computational approaches for functional genomics. Nat in DNA damage-responsive cell cycle checkpoint pro- Biotechnol 18: 609–613. teins. FASEB J 11: 68–76. Garcia V,Furuya K, Carr AM. 2005. Identification and func- Burgers PM, Koonin EV, Bruford E, Blanco L, Burtis KC, tional analysis of TopBP1 and its homologs. DNA Repair Christman MF,Copeland WC, Friedberg EC, Hanaoka F, (Amst) 4: 1227–1239. Hinkle DC, et al. 2001. Eukaryotic DNA polymerases: Glansdorff N, Xu Y, Labedan B. 2008. The last universal Proposal for a revised nomenclature. J Biol Chem 276: common ancestor: Emergence, constitution and genetic 43487–43490. legacy of an elusive forerunner. Biol Direct 3: 29.

22 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Evolution of DNA Replication

Hamdan SM, Richardson CC. 2009. Motors, switches, and mology with transcription factor TFIIB. Proc Natl Acad contacts in the replisome. Annu Rev Biochem 78: 205– Sci 108: 7373–7378. 243. Lynch M, Force A. 2000. The probability of duplicate gene Hamdan SM, van Oijen AM. 2010. Timing, coordination, preservation by subfunctionalization. Genetics 154: 459– and rhythm: Acrobatics at the DNA replication fork. J 473. Biol Chem 285: 18979–18983. MacNeill SA. 2010. Structure and function of the GINS Henneke G, Flament D, Hubscher U, Querellou J, Raffin JP. complex, a key component of the eukaryotic replisome. 2005. The hyperthermophilic euryarchaeota Pyrococcus Biochem J 425: 489–500. abyssi likely requires the two DNA polymerases D and MacNeill SA. 2011. Protein–protein interactions in the ar- B for DNA replication. J Mol Biol 350: 53–64. chaeal core replisome. Biochem Soc Trans 39: 163–168. Iyer LM, Leipe DD, Koonin EV,Aravind L. 2004. Evolution- þ Majka J, Burgers PM. 2004. The PCNA-RFC families of ary history and higher order classification of AAA DNA clamps and clamp loaders. Prog Nucleic Acid Res ATPases. J Struct Biol 146: 11–31. Mol Biol 78: 227–260. Kelman Z, White MF. 2005. Archaeal DNA replication and Makarova KS, Koonin EV. 2010. Two new families of the repair. Curr Opin Microbiol 8: 669–676. FtsZ-tubulin protein superfamily implicated in mem- Klinge S, Nunez-Ramirez R, Llorca O, Pellegrini L. 2009. 3D brane remodeling in diverse bacteria and archaea. Biol architecture of DNA Pol a reveals the functional core of Direct 5: 33. multi-subunit replicative polymerases. EMBO J 28: Makarova KS, Aravind L, Galperin MY, Grishin NV, Ta- 1978–1987. tusov RL, Wolf YI, Koonin EV. 1999. Comparative geno- Komori K, Ishino Y.2001. Replication protein A in Pyrococ- mics of the Archaea (Euryarchaeota): Evolution of con- cus furiosus is involved in homologous DNA recombina- served protein families, the stable core, and the variable tion. J Biol Chem 276: 25654–25660. shell. Genome Res 9: 608–628. Koonin EV.2006. Temporal order of evolution of DNA rep- Makarova KS, Wolf YI, Mekhedov SL, Mirkin BG, lication systems inferred by comparision of cellular and Koonin EV.2005. Ancestral paralogs and pseudoparalogs viral DNA polymerases. Biol Direct 1: 39. and their role in the emergence of the eukaryotic cell. Koonin EV.2009. On the origin of cells and viruses: primor- Nucleic Acids Res 33: 4626–4638. dial virus world scenario. Ann NY Acad Sci 1178: 47–64. Makarova KS, Yutin N, Bell SD, Koonin EV.2010. Evolution Koonin EV,Martin W. 2005. On the origin of genomes and of diverse cell division and vesicle formation systems in cells within inorganic compartments. Trends Genet 21: archaea. Nat Rev Microbiol 8: 731–741. 647–654. Makarova KS, Koonin EV, Kelman Z. 2012. The archaeal Koppes LJ. 1992. Nonrandom F-plasmid replication in Es- CMG (CDC45/RecJ, MCM, GINS) complex is a con- cherichia coli K-12. J Bacteriol 174: 2121–2123. served component of the DNA replication system in all Kornberg A, Baker TA. 2005. DNA replication, 2nd ed. Uni- archaea and eukaryotes. Biol Direct 7: 7. versity Science Books, Sausalito, CA. Marinsek N, Barry ER, Makarova KS, Dionne I, Koonin EV, Krupovic M, Gribaldo S, Bamford DH, Forterre P.2010. The Bell SD. 2006. GINS, a central nexus in the archaeal DNA evolutionary history of archaeal MCM helicases: A case replication fork. EMBO Rep 7: 539–545. study of vertical evolution combined with hitchhiking of Martin IV,MacNeill SA. 2002. ATP-dependent DNA ligases. mobile genetic elements. Mol Biol Evol 27: 2716–2732. Genome Biol 3: REVIEWS3005. Kuchta RD, Stengel G. 2010. Mechanism and evolution of Matsui I, Urushibata Y,Shen Y,Matsui E, Yokoyama H. 2011. DNA primases. Biochim Biophys Acta 1804: 1180–1189. Novel structure of an N-terminal domain that is crucial Kunkel TA, Burgers PM. 2008. Dividing the workload at a for the dimeric assembly and DNA-binding of an archae- eukaryotic replication fork. Trends Cell Biol 18: 521–527. al DNA polymerase D large subunit from Pyrococcus ho- Labib K, Gambus A. 2007. A key role for the GINS complex rikoshii. FEBS Lett 585: 452–458. at DNA replication forks. Trends Cell Biol 17: 271–278. McGeoch AT, Bell SD. 2008. Extra-chromosomal elements Lee C, Hong B, Choi JM, Kim Y, Watanabe S, Ishimi Y, and the evolution of cellular DNA replication machiner- Enomoto T, Tada S, Cho Y. 2004. Structural basis for ies. Nat Rev Mol Cell Biol 9: 569–574. inhibition of the replication licensing factor Cdt1 by Mer G, Bochkarev A, Gupta R, Bochkareva E, Frappier L, geminin. Nature 430: 913–917. Ingles CJ, Edwards AM, Chazin WJ. 2000. Structural ba- Leipe DD, Aravind L, Koonin EV. 1999. Did DNA replica- sis for the recognition of DNA repair proteins UNG2, tion evolve twice independently? Nucleic Acids Res 27: XPA, and RAD52 by replication factor RPA. Cell 103: 3389–3401. 449–456. Li Z, Santangelo TJ, Cubonova L, Reeve JN, Kelman Z. 2010. Moyer SE, Lewis PW, Botchan MR. 2006. Isolation of the Affinity purification of an archaeal DNA replication pro- Cdc45/Mcm2–7/GINS (CMG) complex, a candidate tein network. MBio 1: e00221-10. for the eukaryotic DNA replication fork helicase. Proc Li Z, Pan M, Santangelo TJ, Chemnitz W, Yuan W, Ed- Natl Acad Sci 103: 10236–10241. wards JL, Hurwitz J, Reeve JN, Kelman Z. 2011. A novel MurzinAG.1993.OB(oligonucleotide/oligosaccharidebind- DNA nuclease is stimulated by association with the GINS ing)-fold: Common structural and functional solution complex. Nucleic Acids Res 39: 6114–6123. for non-homologous sequences. EMBO J 12: 861–867. Liu S, Balasov M, WangH, Wu L, Chesnokov IN, Liu Y.2011. Mushegian AR, Bassett DE Jr, Boguski MS, Bork P, Structural analysis of human Orc6 protein reveals a ho- Koonin EV. 1997. Positionally cloned human disease

Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 23 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

K.S. Makarova and E.V. Koonin

genes: Patterns of evolutionary conservation and func- Rogozin IB, Makarova KS, Pavlov YI, Koonin EV. 2008. A tional motifs. Proc Natl Acad Sci 94: 5831–5836. highly conserved family of inactivated archaeal B family Narasingarao P, Podell S, Ugalde JA, Brochier-Armanet C, DNA polymerases. Biol Direct 3: 32. Emerson JB, Brocks JJ, Heidelberg KB, Banfield JF, Sakakibara N, Kelman LM, Kelman Z. 2009. Unwinding the Allen EE. 2012. De novo metagenomic assembly reveals structure and function of the archaeal MCM helicase. abundant novel major lineage of Archaea in hypersaline Mol Microbiol 72: 286–296. microbial communities. ISME J 6: 81–93. Sanchez-Pulido L, Ponting CP. 2011. Cdc45: The missing Ng WL, DasSarma S. 1993. Minimal replication origin of the RecJ ortholog in eukaryotes? Bioinformatics 27: 1885– 200-kilobase Halobacterium plasmid pNRC100. J Bacter- 1888. iol 175: 4584–4596. Sanchez-Pulido L, Diffley JF, Ponting CP. 2010. Okorokov AL, Waugh A, Hodgkinson J, Murthy A, Hong explains the functional similarities of Treslin/Ticrr and HK, Leo E, Sherman MB, Stoeber K, Orlova EV, Sld3. Curr Biol 20: R509–R510. Williams GH. 2007. Hexameric ring structure of human Sclafani RA, Holzen TM. 2007. Cell cycle regulation of DNA MCM10 DNA replication factor. EMBO Rep 8: 925–930. replication. Annu Rev Genet 41: 237–280. Oyama T,Ishino S, Fujino S, Ogino H, Shirai T,Mayanagi K, Shen Y,Musti K, Hiramoto M, Kikuchi H, Kawarabayashi Y, Saito M, Nagasawa N, Ishino Y, Morikawa K. 2011. Ar- Matsui I. 2001. Invariant Asp-1122 and Asp-1124 are chitectures of archaeal GINS complexes, essential DNA essential residues for polymerization catalysis of family replication initiation factors. BMC Biol 9: 28. D DNA polymerase from . J Biol Pacek M, Tutter AV, Kubota Y, Takisawa H, Walter JC. 2006. Chem 276: 27376–27383. Localization of MCM2–7, Cdc45, and GINS to the site of Sheu YJ, Stillman B. 2010. The Dbf4–Cdc7 kinase promotes DNA unwinding during eukaryotic DNA replication. S phase by alleviating an inhibitory activity in Mcm4. Mol Cell 21: 581–587. Nature 463: 113–117. Pan M, Kelman LM, Kelman Z. 2011a. The archaeal PCNA Shutt TE, Gray MW. 2006. Twinkle, the mitochondrial rep- proteins. Biochem Soc Trans 39: 20–24. licative DNA helicase, is widespread in the eukaryotic Pan M, Santangelo TJ, Li Z, Reeve JN, Kelman Z. 2011b. radiation and may also be the mitochondrial DNA pri- Thermococcus kodakarensis encodes three MCM homo- mase in most eukaryotes. J Mol Evol 62: 588–599. logs but only one is essential. Nucleic Acids Res 39: Sun X, Thrower D, Qiu J, Wu P,Zheng L, Zhou M, Bachant J, 9671–9680. Wilson DM 3rd, Shen B. 2003. Complementary func- tions of the Saccharomyces cerevisiae Rad2 family nucle- Paytubi S, McMahon SA, Graham S, Liu H, Botting CH, ases in Okazaki fragment maturation, mutation avoid- Makarova KS, Koonin EV, Naismith JH, White MF. ance, and chromosome stability. DNA Repair (Amst) 2: 2012. Displacement of the canonical single-stranded 925–940. DNA-binding protein in the Thermoproteales. Proc Natl Acad Sci 109: E398–E405. Sun J, Yu EY, Yang Y, Confer LA, Sun SH, Wan K, Lue NF, Lei M. 2009. Stn1–Ten1 is an Rpa2–Rpa3–like complex Pospiech H, Grosse F,Pisani FM. 2010. The initiation step of at telomeres. Genes Dev 23: 2900–2914. eukaryotic DNA replication. Subcell Biochem 50: 79–104. Sun J, Yang Y, Wan K, Mao N, Yu TY,Lin YC, DeZwaan DC, Price MN, Dehal PS, Arkin AP. 2010. FastTree 2—Approx- Freeman BC, Lin JJ, Lue NF,et al. 2011. Structural bases of imately maximum-likelihood trees for large alignments. dimerization of yeast telomere protein Cdc13 and its in- PLoS ONE 5: e9490. teraction with the catalytic subunit of DNA polymerase Robbins JB, McKinney MC, Guzman CE, Sriratana B, Fitz- alpha. Cell Res 21: 258–274. Gibbon S, Ha T, Cann IK. 2005. The Euryarchaeota, na- Tahirov TH, Makarova KS, Rogozin IB, Pavlov YI, Koonin ture’s medium for engineering of single-stranded DNA- EV.2009. Evolution of DNA polymerases: An inactivated binding proteins. J Biol Chem 280: 15325–15339. polymerase–exonuclease module in Pol 1 and a chimeric Robertson PD, Chagot B, Chazin WJ, Eichman BF. 2010. origin of eukaryotic polymerases from two classes of ar- Solution NMR structure of the C-terminal DNA binding chaeal ancestors. Biol Direct 4: 11. domain of Mcm10 reveals a conserved MCM motif. J Biol Tanaka S, Umemori T, Hirai K, Muramatsu S, Kamimura Y, Chem 285: 22942–22949. Araki H. 2007. CDK-dependent phosphorylation of Sld2 Robinson NP, Bell SD. 2005. Origins of DNA replication in and Sld3 initiates DNA replication in budding yeast. Na- the three domains of life. FEBS J 272: 3757–3766. ture 445: 328–332. Robinson NP, Bell SD. 2007. Extrachromosomal element Wadsworth RI, White MF. 2001. Identification and proper- capture and the evolution of multiple replication origins ties of the Crenarchaeal single-stranded DNA binding in archaeal chromosomes. Proc Natl Acad Sci 104: 5806– protein from Sulfolobus solfataricus. Nucleic Acids Res 5811. 29: 914–920. Robinson NP, Dionne I, Lundgren M, Marsh VL, Ber- Walters AD, Chong JP. 2010. An archaeal order with multi- nander R, Bell SD. 2004. Identification of two origins of ple minichromosome maintenance genes. Microbiology replication in the single chromosome of the archaeon 156: 1405–1414. Sulfolobus solfataricus. Cell 116: 25–38. Ward R, Durrett R. 2004. Subfunctionalization: How often Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, does it occur? How long does it take? Theor Popul Biol 66: Tatusov RL, Szekely LA, Koonin EV. 2002. Connected 93–100. gene neighborhoods in prokaryotic genomes. Nucleic Ac- Warren EM, Vaithiyalingam S, Haworth J, Greer B, ids Res 30: 2212–2223. Bielinsky AK, Chazin WJ, Eichman BF. 2008. Structural

24 Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Evolution of DNA Replication

basis for DNA binding by replication initiator Mcm10. Yutin N, Koonin EV. 2009. Evolution of DNA ligases of Structure 16: 1892–1901. nucleo-cytoplasmic large DNA viruses of eukaryotes: A Wei Z, Liu C, Wu X, Xu N, Zhou B, Liang C, Zhu G. 2010. case of hidden complexity. Biol Direct 4: 51. Characterization and structure determination of the Zhang R, Zhang CT. 2005. Identification of replication or- Cdt1 binding domain of human minichromosome igins in archaeal genomes based on the Z-curve method. maintenance (Mcm) 6. J Biol Chem 285: 12469–12473. Archaea 1: 335–346. Wohlschlegel JA, Dhar SK, Prokhorova TA, Dutta A, Zuo Z, Rodgers CJ, Mikheikin AL, Trakselis MA. 2010. Walter JC. 2002. Xenopus Mcm10 binds to origins of Characterization of a functional DnaG-type primase in DNA replication after Mcm2–7 and stimulates origin Archaea: Implications for a dual-primase system. JMol binding of Cdc45. Mol Cell 9: 233–240. Biol 397: 664–676.

Advanced Online Article. Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a012963 25 Downloaded from http://cshperspectives.cshlp.org/ on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press

Archaeology of Eukaryotic DNA Replication

Kira S. Makarova and Eugene V. Koonin

Cold Spring Harb Perspect Biol published online July 23, 2013

Subject Collection DNA Replication

Replication of Epstein−Barr Viral DNA Endoreplication Wolfgang Hammerschmidt and Bill Sugden Norman Zielke, Bruce A. Edgar and Melvin L. DePamphilis Replication Proteins and Human Disease Replication-Fork Dynamics Andrew P. Jackson, Ronald A. Laskey and Karl E. Duderstadt, Rodrigo Reyes-Lamothe, Nicholas Coleman Antoine M. van Oijen, et al. Break-Induced DNA Replication Helicase Activation and Establishment of Ranjith P. Anand, Susan T. Lovett and James E. Replication Forks at Chromosomal Origins of Haber Replication Seiji Tanaka and Hiroyuki Araki Regulating DNA Replication in Eukarya Poxvirus DNA Replication Khalid Siddiqui, Kin Fan On and John F.X. Diffley Bernard Moss Archaeology of Eukaryotic DNA Replication The Minichromosome Maintenance Replicative Kira S. Makarova and Eugene V. Koonin Helicase Stephen D. Bell and Michael R. Botchan Translesion DNA Polymerases DNA Replication Origins Myron F. Goodman and Roger Woodgate Alan C. Leonard and Marcel Méchali Human Papillomavirus Infections: Warts or Principles and Concepts of DNA Replication in Cancer? Bacteria, Archaea, and Eukarya Louise T. Chow and Thomas R. Broker Michael O'Donnell, Lance Langston and Bruce Stillman Chromatin and DNA Replication DNA Replication Timing David M. MacAlpine and Geneviève Almouzni Nicholas Rhind and David M. Gilbert

For additional articles in this collection, see http://cshperspectives.cshlp.org/cgi/collection/

Copyright © 2013 Cold Spring Harbor Laboratory Press; all rights reserved