bioRxiv preprint doi: https://doi.org/10.1101/2021.05.15.444277; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Assessing the ubiquity and origins of structural maintenance of chromosomes (SMC) proteins in eukaryotes

Mari Yoshinaga¹ and Yuji Inagaki¹,²

¹ Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan.
² Center for Computational Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan.

Corresponding author: Yuji Inagaki, [email protected]

Keywords: condensin, cohesin, chromosome assembly, chromosome segregation, DNA repair, ATPase.

ABSTRACT

Structural maintenance of chromosomes (SMC) protein complexes are common in Bacteria, Archaea, and Eukaryota. SMC proteins, together with the proteins related to SMC (SMC-related proteins), constitute a superfamily of ATPases. Bacteria and Archaea differ from eukaryotes in their repertoire of SMC proteins. A single type of SMC protein dimerizes in the bacterial and archaeal complexes, whereas eukaryotes possess six distinct SMC subfamilies (SMC1-6), constituting three heterodimeric complexes, namely cohesin, condensin, and the SMC5/6 complex.
Thus, to bridge the homodimeric SMC complexes in Bacteria and Archaea to the heterodimeric SMC complexes in Eukaryota, we need to invoke multiple duplications of an SMC gene followed by functional divergence. However, to our knowledge, the evolution of the SMC proteins in Eukaryota has not been examined for more than a decade. In this study, we reexamined the ubiquity of SMC1-6 in phylogenetically diverse eukaryotes that cover the major eukaryotic taxonomic groups recognized to date (101 species in total) and provide two novel insights into SMC evolution in eukaryotes. First, multiple secondary losses of SMC5 and SMC6 occurred during eukaryotic evolution. Second, the SMC proteins constituting cohesin and condensin (i.e., SMC1-4) and the pair SMC5 and SMC6 were derived from closely related but distinct ancestral proteins. Finally, we discuss how SMC1-4 evolved from the ancestral SMC protein(s) in the very early stage of eukaryotic evolution.

INTRODUCTION

Chromosomes comprise DNA molecules, which carry the genetic information, and a large number of proteins with diverse functions. In eukaryotes, cohesin and condensin, together with many other proteins, maintain the integrity of chromosome structure. Cohesin and condensin participate in protein complexes (Anderson et al. 2002) that bundle sister chromosomes together during mitosis (Christian H et al. 2008) and meiosis (Ishiguro 2019), and aggregate chromosomes (Sutani and Yanagida 1997), respectively. Cohesin is constituted by two Structural Maintenance of Chromosomes (SMC) proteins (SMC1 and SMC3) (Losada et al. 1998) and the accessory subunits Rad21/Scc1 and STAG1/Scc3 (Birkenbihl and Subramani 1995; Carramolino et al. 1997; Tóth et al. 1999). Condensin contains SMC2 and SMC4 (Hirano and Mitchison 1994) and a different set of accessory subunits, CAP-D2, CAP-G, and CAP-H (Hirano et al. 1997). There are two additional SMC proteins, SMC5 and SMC6 (Lehmann et al. 1995; Fousteri and Lehmann 2000), which comprise the “SMC5/6” complex together with six accessory proteins (Nse1-6) (Andrews et al. 2005; Fujioka et al. 2002; Hu et al. 2005; Pebernard et al. 2004; Pebernard et al. 2006) and are involved mainly in DNA repair but also in replication fork stability (Aragón 2018).

SMC proteins, together with MukB, Rad50, and RecN, belong to a large ATPase superfamily with unique structural characteristics (Niki et al. 1991; Funayama et al. 1999; Löwe et al. 2001). SMC proteins comprise a “head” that hydrolyzes ATP, a “hinge” that facilitates the dimerization of two SMC proteins (SMC1 and SMC3 in cohesin, SMC2 and SMC4 in condensin, and SMC5 and SMC6 in the SMC5/6 complex), and antiparallel coiled coils connecting the head and hinge (Melby et al. 1998). As ATPases, SMC proteins bear several conserved motifs, including Walker A (P-loop), Walker B, the ABC signature motif (C-loop), A-loop, D-loop, H-loop (switch motif), R-loop, and Q-loop, all of which are required for ATP binding and hydrolysis. In the ATPases belonging to the SMC superfamily, the Walker A motif, A-loop, R-loop, and Q-loop are located at the N-terminus of the molecule, remote from the rest of the motifs at the C-terminus (Palou et al. 2018). Thus, SMC proteins most likely form hairpin-like structures that bring all of the sequence motifs for ATP binding into close proximity in the tertiary structure (Melby et al. 1998).

The vast majority of the members of Bacteria and Archaea possess a single SMC protein for DNA strand aggregation.
In contrast to the eukaryotic SMC complexes containing heterodimeric SMC proteins, the SMC complexes in Bacteria and Archaea comprise two identical SMC proteins (i.e., homodimeric), together with accessory subunits (Britton et al. 1998; Soppa 2001). It is noteworthy that the SMC protein is not strictly conserved in Bacteria or Archaea (Soppa 2001). For instance, the absence of the conventional SMC protein in the Crenarchaeota genus Sulfolobus was experimentally shown to be complemented by proteins that are distantly related to the authentic SMC, namely coalescin (Takemata et al. 2019).

To our knowledge, no study on the diversity and evolution of SMC proteins sampled from phylogenetically diverse eukaryotes has been done since Cobbe and Heck (2004). Their phylogenetic analyses recovered individual clades of SMC1-6, and further united (i) the SMC1 and SMC4 clades, (ii) the SMC2 and SMC3 clades, and (iii) the SMC5 and SMC6 clades. Hereafter, we designate the three unions as “SMC1+4 clan,” “SMC2+3 clan,” and “SMC5+6 clan,” respectively. Based on the phylogenies inferred from the SMC proteins in the three domains of Life, the authors proposed that SMC1-6 arose through gene duplication events that occurred early in eukaryotic evolution.

The pioneering work by Cobbe and Heck (2004) was a significant first step to decipher the origin and evolution of SMC proteins in eukaryotes, although they provided no clear scenario explaining how a primordial SMC protein diversified into SMC1-6 prior to the divergence of the extant eukaryotes.
Thus, we reassessed the ubiquity of SMC1-6 in eukaryotes and the phylogenetic relationship among the six eukaryotic SMC subfamilies in this study. Fortunately, recent advances in sequencing technology allow us to search for SMC proteins in the transcriptome and/or genome data of phylogenetically much broader eukaryotic lineages than the metazoans, fungi (including a microsporidian), land plants, and trypanosomatids analyzed in Cobbe and Heck (2004). Furthermore, computer programs for maximum-likelihood (ML) phylogenetic methods, as well as hardware, have improved significantly since 2004. Thus, hundreds of SMC proteins from diverse eukaryotes can now be subjected to ML analyses, in contrast to Cobbe and Heck (2004), wherein only a distance tree was inferred from the alignment of 148 SMC sequences.

Our survey of SMC1-6 in 101 eukaryotes confirmed the early divergence of the six SMC subfamilies in eukaryotes, although secondary loss of SMC5 and SMC6 has most likely occurred in separate branches of the tree of eukaryotes. Moreover, the phylogenetic analysis of SMC1-6, bacterial and archaeal SMC, and Rad50/SbcC (304 sequences in total) disfavored the single origin of the six SMC subfamilies in eukaryotes and instead suggested that the ancestral molecule of SMC5 and SMC6 is distinct from that of SMC1-4. Finally, we explored multiple scenarios to explain how the repertoire of the SMC subfamilies was shaped in the very early evolution of eukaryotes.
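
As an illustration only, the bookkeeping behind a survey of this kind (recording which of SMC1-6 were detected in each species and listing the subfamilies that were not) might be sketched as follows. This is a minimal sketch and not the procedure used in this study: the function name `missing_subfamilies`, the species labels, and the toy detection table are all hypothetical placeholders.

```python
from typing import Dict, List, Set

# The six eukaryotic SMC subfamilies named in the text.
SUBFAMILIES: List[str] = ["SMC1", "SMC2", "SMC3", "SMC4", "SMC5", "SMC6"]


def missing_subfamilies(detected: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, for each species, the SMC subfamilies that were not detected.

    Species in which all six subfamilies were detected are omitted.
    """
    report: Dict[str, List[str]] = {}
    for species, found in detected.items():
        absent = [name for name in SUBFAMILIES if name not in found]
        if absent:
            report[species] = absent
    return report


# Hypothetical placeholder table (NOT data from this study).
toy = {
    "species_A": {"SMC1", "SMC2", "SMC3", "SMC4", "SMC5", "SMC6"},
    "species_B": {"SMC1", "SMC2", "SMC3", "SMC4"},
}

print(missing_subfamilies(toy))
```

A subfamily flagged as missing by such a table is only a candidate for secondary loss: failure to detect a homolog, particularly in transcriptome data, may also reflect incomplete sequence data.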