Posted on Authorea 19 Nov 2020. The copyright holder is the author/funder. All rights reserved. No reuse without permission. https://doi.org/10.22541/au.160577285.57603701/v1. This is a preprint and has not been peer reviewed. Data may be preliminary.

Sequence and evolutionary analysis of bacterial ribosomal S1 proteins

Running title: Bacterial ribosomal S1 proteins evolution

Evgeniya I. Deryusheva 1, Andrey V. Machulin 2, Maxim A. Matyunin 3 and Oxana V. Galzitskaya 3,4*

1 Institute for Biological Instrumentation, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", 142290, Pushchino, Moscow Region, Russia
2 Skryabin Institute of Biochemistry and Physiology of Microorganisms, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", 142290, Pushchino, Moscow Region, Russia
3 Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia
4 Institute of Theoretical and Experimental Biophysics, Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia

*Correspondence to: Oxana Galzitskaya, Laboratory of Bioinformatics and Proteomics, Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia. E-mail: [email protected]

November 19, 2020

Abstract

The multi-domain bacterial S1 protein is the largest and most functionally important ribosomal protein of the 30S subunit, which interacts with both mRNA and proteins. The family of ribosomal S1 proteins differs in the classical sense from a protein with tandem repeats and has a "bead-on-string" organization, where each repeat is folded into a globular domain. Based on our recent data, the study of evolutionary relationships for the bacterial phyla will provide evidence for one of the proposed theories of the evolutionary development of proteins with structural repeats: from multiple repeats assembles to single repeats, or vice versa. In this comparative analysis of 1333 S1 sequences that were identified in 24 different phyla, we demonstrate how such phyla can form independently/dependently during evolution. To our knowledge, this work is the first study of the evolutionary history of bacterial ribosomal S1 proteins. The collected and structured data can be useful to computer biologists as a resource for determining percent identity, amino acid composition and logo motifs, as well as dN/dS ratio in bacterial S1 protein. The obtained research data suggested that the evolutionary development of bacterial ribosomal S1 proteins evolved from multiple assemblies to single repeat. The presented data are integrated into the server, which can be accessed at http://oka.protres.ru:4200.
Keywords: ribosomal S1 proteins, evolutionary analysis, residue conservation, S1 server

Introduction

The discovery of specific signatures of evolution allows a new insight at bacterial evolution [1–5]. The emergence and evolution of bacterial ribosomal proteins is also an actual task [6,7]. The family of ribosomal S1 proteins is a unique family of proteins characterized by a different number of structural S1 domains, each of which has a number of specific characteristics [8]. The number of structural S1 domains in bacteria varies within a strictly limited range from one to six [9]. The family of ribosomal S1 proteins makes up about 20% of all bacterial proteins containing the S1 domain. Proteins of this family are not only a structural component of ribosomes, but also interact with mRNAs in vivo, participate in the initiation and translation of mRNAs, and interact with the mRNA-like part of the tmRNA molecule [10,11]. Like some other ribosomal proteins, the ribosomal S1 protein is an autogenic repressor of its own synthesis [12]. In addition, S1 can function outside the ribosome [13].

The role of the separate S1 domains is also being actively studied. Thus, a partial functional specialization of the domains of the S1 protein in Escherichia coli has been identified by many studies [14]. It is known that the first two domains are responsible for binding to ribosomes, while the next four are involved in interactions with mRNA [15]. In addition, the first two domains are responsible for the binding of S1 to the RNA replicase of phage Qβ [16]. The sixth domain has been shown to be optional for translation initiation and is not required for S1 activity in the replication of the Qβ phage, while the S1 fragment formed by the third, fourth and fifth domains increases the activity of the ribonuclease RegB of the T4 phage as efficiently as the whole S1 protein. Accordingly, S1 appears to consist of three main regions: the N-terminal region, formed from the first and the second domains and involved in interaction with other protein partners in the cell (ribosome, RegB and Qβ replicase); the intermediate region, formed by the third, fourth and fifth domains and involved in the interactions with RNAs (translation or replication initiation region, RegB substrates); and, finally, the sixth domain, the role of which remains to be elucidated [17].

Several attempts have been made to classify bacteria taking into account S1 protein sequences. Salah et al. [18] studied ribosomal S1 proteins from 13 bacterial phyla and used them to investigate the relationship between Gram-positive and Gram-negative bacteria, taking into account the different numbers of S1 domains and the pairwise alignment of 273 sequences; 12 phyla were identified in the family of ribosomal S1 proteins. The authors of another work [19] used the rpsA gene encoding the ribosomal S1 protein as a biomarker for the analysis of the 8 main types of mycobacteria, the differences between which were not revealed in the analysis of 16S rDNA. This work was carried out on 26 bacterial sequences. We have recently shown that the number of domains in S1 is a distinctive characteristic of the phylogenetic grouping of bacteria in the main phyla [9,17]. The studied data, containing 1453 S1 protein sequences, made it possible to identify bacterial ribosomal S1 proteins in 25 different phyla according to the List of Prokaryotic Names with Standing in Nomenclature. In addition, we searched for a conserved domain in the family of 30S ribosomal S1 proteins and hypothesized a possible evolutionary development of this family. The obtained data made it possible to group some bacterial phyla into superphyla according to the number of S1 domains [9].

Here we collect and analyze structured data about the features of the family of ribosomal S1 proteins and expand them with data on the percentage of identity, amino acid composition and logo motifs, as well as dN/dS ratios. The presented data are integrated in the S1 protein server, which can be accessed at http://oka.protres.ru:4200.

Materials and methods

Construction of ribosomal S1 proteins dataset

A representative dataset of records was selected as described in [9] (File S1). The analyzed dataset consists of 1333 records.

Realization

Algorithms for searching, collecting, presenting and analyzing data were implemented using the freely available programming language Python 3 (https://www.python.org/) in PyCharm v.2018. Various modules from Biopython (a set of biological computing tools written in Python) were used for bioinformatics analysis of bacterial S1 protein sequences.

Taxonomic diversity of bacteria

The bacteria were classified according to the main taxonomic categories (phylum, class, family, genus, type) according to the NCBI Taxonomic database (http://www.ncbi.nlm.nih.gov/taxonomy). Gram stain information for the studied bacteria was taken from the GOLD: Genomes Online Database (https://gold.jgi.doe.gov/).

Number and identification of structural domains in analyzed protein sequences

For each record, the values of the number of S1 domains corresponding to the field of each record (position, domain and repeats) were taken from the UniProt database and the SMART database. If there was no data on the number of domains in one of the analyzed databases (None), this number was taken equal to zero (these records were deleted from the analyzed dataset). The exact boundaries for each S1 domain were selected from the databases (about 1200 domains).

Alignment and sequence analysis

Multiple Sequence Alignment (ClustalW) was implemented by the MEGA service (https://www.megasoftware.net). In our work, we used the standard parameters of this program. To calculate the PID (percent sequence identity), each pair of amino acid sequences was aligned using the Bio.pairwise2.align.globalds function with the Bio.SubsMat.MatrixInfo.blosum62 matrix (Biopython). The PID was calculated as described in [9]. The sequence logos were created by the WebLogo 3 server (http://weblogo.threeplusone.com). Bio.SeqUtils.ProtParam.ProteinAnalysis was used to count the number of amino acids in a protein sequence. The Bio.codonalign.build and Bio.codonalign.codonseq.cal_dn_ds modules (using the Goldman and Yang model) were used to calculate dN/dS (ratio of non-synonymous to synonymous substitutions).

Web development

We used the Flask web framework (https://palletsprojects.com/p/flask/) to develop the websites. MongoDB (https://www.mongodb.com) was used as a cross-platform document-oriented database program. The Elm language (https://elm-lang.org) was used for declarative creation of graphical user interfaces based on a web browser. The S1 server can be accessed at http://oka.protres.ru:4200.
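The percent identity and amino acid composition calculations described under "Alignment and sequence analysis" can be illustrated with a minimal sketch. It is not the authors' implementation: the text computes PID "as described in [9]" after a Biopython global alignment, and the exact PID denominator is not stated here, so this sketch assumes that identity is counted over aligned columns in which neither sequence has a gap. The function names `percent_identity` and `composition_percent` and the toy sequences are hypothetical, and the input is assumed to be an already aligned pair using `-` for gaps.

```python
from collections import Counter


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped,
    and identity is the share of matching residues among remaining columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # gapped column, not counted
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def composition_percent(sequence: str) -> dict:
    """Percentage of each amino acid residue in a sequence (gaps ignored)."""
    counts = Counter(sequence.replace("-", ""))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {residue: 100.0 * n / total for residue, n in counts.items()}


if __name__ == "__main__":
    # Toy aligned pair (hypothetical, not taken from the dataset).
    print(percent_identity("ACDE-F", "ACDQGF"))
    print(composition_percent("AAKE"))
```

Other PID definitions divide by the alignment length or by the length of the shorter sequence, which gives lower values for gapped alignments, so the choice of denominator should be checked against [9] before comparing numbers with those reported on the server.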
studies we the as Numerous domain time, to to six same 6% belong and five-domain the and Four- sequences At and respectively, respectively. S1 1a.). three-, , sequences, bacterial of two-, and (Fig. studied studied numbers one-, respectively all all different 62%, Thus, of of and containing 1.2% 33% proteins. 55% and groups S1 represented: 2% of most bacterial 0.8%, are prevalence the 1%, proteins the of for account family estimate proteins the to S1 in possible domains domains it S1 S1 make structural six data contains obtained group The biodiversity this proteins in S1 sequences ribosomal of domains Bacterial S1 number structural predominant of the number phylum different However, a a classes contain six). (98%). be Proteobacteria can phylum to may other this one with Proteobacteria of lineage phylum (from S1 monophyletic proteins the have ribosomal a of bacterial always form superphylum has classes data, always phylum a some not - Proteobacteria that does (G+). the make suggested group would and bacteria been which domains, Gram-positive has S1 themselves, are It four in phyla has domains. from two phylum S1 ranging Chloroflexi these five domains the addition, S1 domains, structural In S1 of three number different four. a to contain can one have Cyanobac- which can Actinobacteria, Firmicutes, includes and Actinobacteria while superphylum Firmicutes proposed domains, and The S1 Deinococcus-Thermus, Planc- domains. teria, six phyla S1 six the contain and group basically five, structural to four, used six is and clan or proteins PVC some phylum. four, phyla the in this into indels one, to conserved Chlamydiae belonging Verrucomicrobia, characteristic has sequences tomycetes, and which the rRNA of 16S phylum, 98% of Bacteroidetes cover phyla, Gemmatimonadates Analysis domains , the some six (Chlorobi, is For containing domains proteins exception However, S1 Chlorobi S2). An six S1domains. the contains File with always Ignavibacteriae). 
combined 1, almost phyla is group and (Table group studied phylum FCB Bacteroidetes this the method the the of of staining So, into protein classifications Gram 1. Fibrobacteres literature Table and the to the added phyla, by in been division existing have classes the their taxonomic added and 6% have supergroups we and of respectively, into analysis, percentage Actinobacteria, further and the for Firmicutes and Here, to proteins belong S1 sequences ribosomal results of Bacteroidetes. in previous to 17% structural domains from and of little structural 16% number differ S1 Proteobacteria, phyla (the of 1. various Table number proteins of in the representation shown S1 of are ribosomal dataset distribution studied of The Phylum the distribution in phyla. superphyla) phylogenetic bacterial and different the phylum 24 domains domains, of in bacterial Features identified records in domains of 1333 proteins. S1 diversity. group containing 0.6% S1 three highest-level dataset have ribosomal about the Cyanobacteria a studied up is used phylum. all make we Tenericutes of study, S1 the 0.8% this proteins is only For two-domain group for these this represented account to in least domain, all belonging represented one of The Records most containing (50% The phylum proteins, proteins). records. Actinobacteria S1 S1 all the four-domain records. of to found all all 33% belong were of group up domains (47% this S1 make Firmicutes in four phyla bacteria and with main analyzed S1) Records all in proteins phylum. Almost bacteria four-domain Proteobacteria of cases. 
the grouping of phylogenetic to 33% the belong in for which hallmark proteins, a S1 is domain S1 demonstrate in to us domains allowed structural phyla sequences of S1 1453 number of the analysis that exhaustive extended proteins automated above, S1 mentioned As ribosomal of distribution phylogenetic of Features 9 osdrn 43S eune,w bandta bu 2 falrcrswr dnie ssix- as identified were records all of 62% about that obtained we sequences, S1 1453 Considering . > 8 falbceilioae eogt orpyao atra(i Four): (Big bacteria of phyla four to belong isolates bacterial all of 88% 29,30 27 26 u aa(al )dmntaeta h iooa S1 ribosomal the that demonstrate 1) (Table data Our . hssprru ntstetoms ersnaiephyla representative most two the unites supergroup This . n steeoeaueu akfrrveigprokaryotic reviewing for rank useful a therefore is and 4 28 sordt hw(al ) atrao the of bacteria 1), (Table show data our As . 9 31 5 falsuidsqecsbln to belong sequences studied all of 55% . o xml,teDeltaproteobacteria the example, For . 32 codn oour to According . 9 . Posted on Authorea 19 Nov 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160577285.57603701/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. n imcts(5 falfu-oanpoen 1 Fgr ) hl atriee n adsrc are Caldiserica and sequence of Bacteroidetes Almost percent Phyla highest S1) the studied. proteins domains, 2). four-domain proteins four (Figure all containing S1 proteins of S1) ribosomal S1 (52% proteins For all Actinobacteria four-domain group. phyla of this all the in cases of mono-representatives to 33% belong (45% in group Firmicutes this identified and in were bacteria domains analyzed S1 all four for (99%). with this identity 15% Records For sequence and 1). of phyla. 
Cyanobacteria in (Figure Cyanobacteria percent and phyla and highest cases Actinobacteria taxonomic the Proteobacteria of for have Between the strains 2% 14% for is Cyanobacteria 38%. and in identity phyla is identified group, sequence Proteobacteria identity are and the sequence proteins Actinobacteria of of the percent S1 percent the three-domain the group, rule, rep- phylum this some a Cyanobacteria also As have the domains; (mono-representative) Within S1 Proteobacteria (Acti- proteins. and three Terrabacteria) S1 13% has phy- (G+ superphylum) three-domain phyla is (Terrabacteria Actinobacteria Proteobacteria Cyanobacteria identity the phylum of the sequence resentatives containing individ- the to cases, of found within all percent In smallest, Proteobacteria) identity were the and phyla). the (Actinobacteria sequence group, Fig.1) Proteobacteria 15% %), rimosus this of and tomyces Actinobacteria sequences, and (24 Firmicutes) in group, percent this and For all phylum phyla Firmicutes, Proteobacteria (http://oka.protres.ru:4200/protein/5eb71e4b07439c8b4d90c98a/pid). highest taxonomic Actinobacteria Firmicutes; of the Between Actinobacteria, and the (0.8% nobacteria domains, the to (15%). bacteria two from belongs lum few containing phyla bacteria unique a proteins ual a (some only as S1 domains proteins For phyla, S1 mentioned S1 containing also studied one-domain is two the proteins all classify Actinobac- such of to example, In uniqueness possible The Firmicutes) (for seems proteins. and it 10% S1 data, (Bacteroidetes of from group our 18% on ranges to based Other identity Thus, sequence etc.) taxonomic (25%). the vancoresmycina Between Actinobacteria Actinobacteria, colatopsis phylum group, of and Tenericutes this mono-representatives. Tenericutes percent For are the (http://oka.protres.ru:4200/protein/5eb71e488886fe5b65803db9/pid). 
the Bacteroidetes; one-domain) phyla to and containing group, individual smallest, teria within proteins this the identity (S1 in sequence (58%), group phyla of phylum this percent Actinobacteria in highest phyla the the to one-domain, proteins. containing belongs S1 proteins bacterial of S1 sequences For the aligning amino by obtain revealed was to (Figure them code user UniProt the in the appropriate mentioned allows the within on As also proteins clicking S1 server by at containing dataset S1 five-domain the accessed The for of be records analysis 1c). 2b. any can PID for Figure a alignment which information in of sequence pairwise server, example acid shown of the An is into results 2a. phylum integrated Figure the Deinococcus-Thermus in are and given data domains) is each presented teins structural for The of identity percent number phyla. http://oka.protres.ru:4200 the the bacterial calculated to and the collected (according within we group sequences, and S1 phyla ribosomal bacterial proteins available S1 all For ribosomal most bacterial The of Firmicutes. and Identity representative. 
and Bacteroidetes considered Sequence Actinobacteria and Percent be from Proteobacteria can proteins from microbiomes and S1 S1 of containing proteins phyla analysis current containing four-domain bacterial for six-domain the our major the or dominate of are range analysis percentage groups phyla a isolation representative with the four for microorganisms, reflects these of either study diversity therefore 90%, we the and determining to studies 40 challenging, most from is in highlighted Four was Big phyla teobacteria the microbiology to of belong understanding not Bacteroidetes do and that Actinobacteria, Firmicutes, Proteobacteria, 9 and eaieylwpreto euneiett ohwti niiulpyaadbetween and phyla individual within both identity sequence of percent low relatively a , mcltpi mediterranei Amycolatopsis Aayi I) neapeo I nlsso ordmi otiigS pro- S1 containing four-domain of analysis PID a of example An PID). (Analysis and ircsi aeruginosaTAIHU98 Microcystis ciolnsfriuliensis Actinoplanes 34 ttesm ie h ubro irognssblnigt h Pro- the to belonging microorganisms of number the time, same the At . aetehgetpreto euneiett (85%). identity sequence of percent highest the have 5 2% aetehgetpreto euneidentity. sequence of percent highest the have (24%) 33,34 and Fg c) nfc,otiigisolates obtaining fact, In 1c.). (Fig. ircsi euioaDIANCHI905 aeruginosa Microcystis 17 . 35–37 ngnrl h dataset the general, In . Strep- Amy- Posted on Authorea 19 Nov 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160577285.57603701/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. h rtadscn oan nS rtis otiigtosrcua oan,as aealwpercentage low a have also domains, structural two domain. 
containing S1 proteins, the S1 containing in RNA- proteins proteins domains main other R68 S1 second from the and single-domain the differs and first perform known, mechanism The effectively is containing binding N64, RNA phylum) as the Tenericutes protein time, or (the H34, same residues eukaryotic class the acid At Mollicutes F22, and the function bacteria. F19, of binding archeal, some bacteria for residues parasitic only bacterial, of conserved conserved are other R68 of in and presence site strict ( binding domain The RNA S1 form (27%). which representation. other sequence in small each the Proteins to shown due domain. individual possibly As each exception, within the an conservatism Thus, in are increase domains an profiles. proteins five with such containing protein correlates in domains S1 structure S1 the secondary structural of the of of to analysis conservatism corresponding the high regions the considering sequence the when are identified conserved for were most profiles patterns logo shown specific create is can domains Some server. two user containing S1 of the proteins alignment Moreover, the S1 with using http://oka.protres.ru:4200). for analyzed motif, sequences profiles were analysis full-size motif domains) (Logo Logo structural 3 of of example Figure number An in the (by sequences. protein group S1 study the each in motifs logo The pro- S1 For motifs Logo Proteobacteria). Chlamydiae gamma phylum ( and the 15% from (37%). (beta from bacteria phylum muridarum ranges 45% domains, identity to within six sequence the identity ) containing the to teins of sequence and smallest percent Fusobacteria of the the Bacteroidetes, , percent group, (69%), and highest this Spirochaetes, phylum in the the Chlamidia phyla domains, the taxonomic include six Between to domains containing belongs S1 proteins phyla six S1 individual For containing phy- bacteria phyla. 
alpha, the Gram-negative the Chlamidia of rule, distribu- six for bacteria have a Verrucomicrobia 4% domains. The from Fusobacteria, and As and S1 S1 Fibrobacteres, Planctomycetes 3%, proteins Oligoflexia, Deferribacteres, proteins). Nitrospirae, Aquificae, 55%, ribosomal 1). S1 Ignavibacteriae, Acidobacteria, Also 15%, , (Figure bacteria), six-domain respectively. 23%, sulfur domains classes, all (green is Chlorobi S1 epsilon Proteobacteria of lum and phylum six (86% delta, the containing phylum gamma, within proteins beta, Proteobacteria classes as taxonomic the identified of to tion were belong records proteins the these of 62% About (Verru- (97%). 31% identity sequence to of percent Haloplasmatales) percent the and Deinococcus-Thermus group, Planctomycetes phylum this (53%) the example, in highest this group, phylum (for For phyla the (http://oka.protres.ru:4200/protein/5eb71e57aab9de6c16cb9d0d/pid). Deinococcus-Thermus 14% taxonomic domains, Chlamydiae) Between from the and five comicrobia (26%). ranges to containing phylum identity proteins belongs Proteobacteria sequence S1 phyla the of For to Verru- individual smallest (Fig.1). Haloplasmatales, within the proteins , identity and S1 make the sequence proteins ribosomal of S1 of studied bacteria five-domain percent rule, all in always a of found As have 1.2% also phyla. Deinococcus-Thermus, up are Chlamydiae phylum and domains class) Planctomycetes S1 Proteobacteria, Deinococci Five comicrobia, one domains. of S1 (consisting five monotypic the 14% of from Bacteria Oshkosh) varies group, / identity this For 1551 sequence CDC Firmicutes). of (strain and percent berculosis (Actinobacteria the 23% group, strains to Chloroflexi Planctomycetes) the this Actinobacteria to and in smallest Proteobacteria phyla the example, taxonomic (64%), (for phylum Between Actinobacteria the (27%). 
to phylum belongs phyla individual within identity http://oka.protres.ru:4200/protein/5eb71e488886fe5b65803db9/logo aetehgetpreto euneiett (95%). identity sequence of percent highest the have 7 a o eeldtkn noacutaayi ftelg oi fti group this of motif logo the of analysis account into taking revealed not was 9 igedmi 1poen aeantvr ihpretg fiett with identity of percentage high very not a have proteins S1 domain single , 40 ti osbeta o hs atrateRAbnigst sfre yseicamino specific by formed is site binding RNA the bacteria these for that possible is It . yoatru ueclss(tanAC 51 H37Rv) / 25618 ATCC (strain tuberculosis Mycobacterium aetehgetpreto euneiett (100%) identity sequence of percent highest the have hru parvatiensis Thermus 6 β- tad,wihcreaewt u ale aaon data earlier our with correlate which strands, and 38 nadto,a nraei h number the in increase an addition, In . hru thermophiles Thermus .Frti ru,rsde 1,F22, F19, residues group, this For ). hayi trachomatis Chlamydia and yoatru tu- Mycobacterium aetehighest the have and . Chlamydia 39 , Posted on Authorea 19 Nov 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160577285.57603701/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. ic tcndtriewihgnsaetems o es)cnevd swl sietf ee htmay that genes identify as well as conserved, feature, least) informative (or an most be evolution can the adaptive and are of genes genes periods of through which rates gone determine have evolutionary can the summarizes it ratio since dN/dS dN/dS and dN/dS The evolution, mutations protein with neutral selection. beneficial a indicating genes, 1 and protein encoding = with selection on dN/dS genes accordance purifying selection, acting dN/dS homologous mutations, (in selection the of neutral natural known, group between set of is each balance a non-synonymous As the for on of assess http://oka.protres.ru:4200). 
model) acting ratio to analysis, Yang the used (dN/dS and is calculated domains) ratio we Goldman structural proteins, of the number S1 (using the ribosomal substitutions bacterial synonymous of good to sequences a available but in functions, all are protein substitutions For or the synonymous data C-terminus of to obtained the efficiency non-synonymous the residue), from the of only F22 domain Ratio 3b). reduces conserved S1 (Fig protein the one the capabilities group of off of functional and this N-terminus position cutting its first in the not that the The from domain confirming (in domains fourth data L22 3b). S1 the domain, two experimental F19, (Figure second for the R68 residues, the specific For with Four and are site. agreement N64, binding retained. R68 RNA H34, are and a F22), form R68 H34 can for and that (conserved residues N64, L22 conserved domains F22, specific highest F19, no fourth the (68%) have has residues: and group domains domain third four sixth five this third The with in by the (39%). domains proteins formed between second other S1 the is determined among and for are first identity as, the identity of well between percentage are of as values values minimum domains, the maximum structural and the (71%) six an containing domains, form apparently proteins five residues S1 and R68 abundant of and sequence most H34 the the other F22, For of F19, among representativeness domains identity small three the of remaining site. Despite the the binding percentage for have RNA retained. maximum group, is of domains the this site percentage of structural has RNA-binding lowest bacteria the five domain the from have containing fourth residue domains the proteins R68 fifth group, S1 and (49%) this retained. of second domains is In the group residue) while the H34 (43%). 
(66%), conserved in identity binding identity the RNA of domains of the an position percentage formed for fourth the maximum residues conserved and (at R68 also and residue third are N64, R34 residues The F22, only These domain F19, domain, second domain. first group. the this For the this For for in group. conserved domains site. this other highly in among domain are third (66%) S1 The residues homology fourth domains. R68 of third and percentage and H34 maximum second third the the F22, the between also by between value has identity characterized found minimum also was is the value domain identity domain and maximum are (78%) first of the domains residues the domains, fourth site group, structural and R68 RNA-binding four this and containing the for proteins H34, form that S1 efficiency. For F19, assumed to binding be RNA seem proteins. can of H34) S1 degree It residue bacterial lower domain. conserved a containing third the three-domain this (42%). the of the for for domains position conserved in domains second the domain other the (at second among and R34 (57%) the first identity and the of R68, the percentage between between N64, maximum value found the minimum was has identity the group of domain and value third (53%) maximal the second domains the Moreover, the domains, third for structural residues and conserved three first are containing H34 proteins and S1 F19 For 3a). conserved. (Figure values are group site minimum proteins this binding and domains S1 in RNA maximum remaining from the domain the domains the of with residues second for R68 pairs the identified and and while been identity, first have 38% The identity have respectively. of domains 30%, structural and two 27% containing domains: within identity of 9 o h rtdmi nti ru fbcei,N4rsdeo h N-idn iei conserved. is site RNA-binding the of residue N64 bacteria, of group this in domain first the For . 
9 h rtdmi a oseiccnevdmtfrsde;frtescn oan only domain, second the for residues; motif conserved specific no has domain first The . 14,41 . 43 oee,i eldt,pstv eeto osntoccur, not does selection positive data, real in However, . 7 > 42 9 hsrtomaue h teghadmode and strength the measures ratio This . niaigpstv aatv rdiversifying) or (adaptive positive indicating 1 o h rtdmi nti ru,F9 F22 F19, group, this in domain first the For . < niaigngtv prfigo cleaning) or (purifying negative indicating 1 9 o hsdmi,teRAbnigsite binding RNA the domain, this For . Posted on Authorea 19 Nov 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160577285.57603701/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. tidadfut oan)apast evtlfrteatvt n ucinlt fteepoen.This proteins. proteins these of of part functionality central and the activity Thus, this the of site. for uniqueness binding vital the RNA confirms be strongest again to the once appears as which domains) R68, it and fourth consider ( N64, and to PNPase H34, from Mollicutes) (third us L22, (Tenericutes, domain allows proteins F19, S1 and domain the residues: among repeat single with (68%) five S1 identity identity by highest from highest sufficient formed the domains the is is has S1 has (six) group the domain domain this and in this S1 domain ) addition, the third In of 2001). The al., repeats domains. functions. et of other necessary (Andrade number the maximum repeats all the several perform its proteins, to of both of chain bacterial formed exception investigated one are the the from which For and with repeats, multiple repeats homologue, homologous single modern of with its examples oligomers of are from structure there However, repeating character. 
intrachain activitymulti-chain contained the functional necessarily reflect effective which may for ancestor, ancestor homo-oligomers common formed a that and from evolved repeat, repeat each repeats repeat for These functions single corresponding unclear. a the proteins still repeats, S1 is protein of evolution ribosomal nature their 30S the understanding bacterial of of problem The family the overall of development the evolutionary for Possible important percentage more low obviously rather is a the domain) reveals Discussion that (S1 Note group scaffold each domain. proteins S1 structure in these the of the domains of functioning amino individual that conservatism different the the indicating of identity, with between percentage associated of sequences domain the is structural the domains, which S1 explained six of same, the is and the alignment in K four almost region containing and is flexible E proteins residues a residues S1 acid as ‘disorder-promoting’ of well phyla of representative as predominance terminus, most The and domain. linkers S1 flexible the by of structure the in structural for and in specific W) represented acids F, is (Y, which amino domain, residues of S1 the tendencies is proteins position-specific S1 ribosomal of Amino unit a (Analysis structural by domains) basic the structural of above, of percentage mentioned number As the the calculated with and accordance collected (in we group composition, proteins, each acid S1 for ribosomal residues acid bacterial amino of sequences available all For gives composition unfortunately, acid ratio, interpret. dN/dS Amino to the (dN/dS difficult of selection analysis are phyla), negative that bacteria only results (dN/dS between ambiguous phyla) and selection for (within between negative domains proteins, and phyla) S1 S1 individual (within between four-domain proteins and the S1 (within example, For six-domain proteins (for S1 revealed group. 
was four-domain this selection the in positive 1.05: relatively sequence a = acid Actinobacteria, dN/dS nucleic phylum the the of of representatives some pairs all of characteristic (dN/dS selection phylum the negative for nexilis proteins, found S1 was two-domain (0.75) For and ratio dN/dS Tenericutes. high = phylum relatively (dN/dS < the specific a Proteobacteria Also, to a phylum relative 1.26). the species) example: to = Bacteroidetes relative (for all (dN/dS Tenericutes revealed protein not was and the selection 1.46) but positive of (some, Actinobacteria, region phylum phylogeny (dN/dS certain the of a selection branch negative in one only proteins, within observed and/or usually is domain) selection such because )as rdmnts eaieyhg Nd aiswr on for found were ratios dN/dS high Relatively predominates. also 1) ciolnsfriuliensis Actinoplanes β- Poebcei) o he-oanadfiedmi 1poen,ngtv eeto (dN/dS selection negative proteins, S1 five-domain and three-domain For (Proteobacteria). arlwt nadditional an with barrel hdccu wratislaviensis Rhodococcus 50 http://oka.protres.ru:4200 oeatossgetdta h omnacso ftefml a nedasingle a indeed was family the of ancestor common the that suggested authors Some . β- tad nrbsmlpoen 1 n r hrceitc fa additional an of characteristics are L and A S1. proteins ribosomal in strands β- rnhdaioais(,V ) scnb enfo al ,V ,FadTare T and F I, V, 2, Table from seen be can As I). V, (T, acids amino branched Friue)–08 n o h pair the for and 0.83 – (Firmicutes) 9 . α- ei ewe h hr n fourth and third the between helix < )wsms fe bevd oee,frsm ersnaie of representatives some for However, observed. often most was 1) .Tedt r rsne nTbe2. Table in presented are data The ). and β- aacroi denticolens Parascardovia strands 8 48,49 eeldapeoiac flrearomatic large of predominance a revealed egao sp. Beggiatoa 51 h oooioei tutr fan of structure homo-oligomeric The . β- uatru hallii Eubacterium strands .Frohrrpeettvsof representatives other For ). 
44–46 9 Friue)and (Firmicutes) n h N idn site binding RNA the and , 47 ealdaayi fthe of analysis Detailed . o 1snl domain single S1 For . < < 7 )i bevd For observed. is 1) )i bevd For observed. is 1) sarl,frthe for rule, a As . (Actinobacteria) Tyzzerella < α- .coli E. helix )is 1) Posted on Authorea 19 Nov 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160577285.57603701/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. hlmBceodts ln ihPoebcei,Friue,adAtnbcei,aeas mn h most the among also rhizosphere are the Actinobacteria, in and groups Firmicutes, bacterial four, Proteobacteria, common one, with 4). has along (Figure Bacteroidetes, Cytophagia Bacteroidetes Phylum class phylum The the Gemmatimonadates, within phyla). the domains for Bacteroidia number six class (constant and domains ribosomal and S1 the this Chlorobi six that contains show Fibrobacteres, in always data almost Ignavibacteriae, Our Ignavibacteriae group this group. and in ancient protein more Gemmatimonadates Chlorobi, S1 phylogenetically phyla a Fibrobacteres, considered phyla the is member Fibrobacteres include phylum main also the after authors named group Some bacteria of Bacteroidetes. superphylum of a and organisms most is group and of FCB largest genome The the the is data, PVC, other published superphylum each some of to to is phyla closer have changes According other proteins evolutionary can with to 4). 
This suggestion is consistent with experimental data. One of the well-studied proteins with six repeats of the S1 domain is the bacterial 30S ribosomal protein S1 from E. coli⁹. It was shown that cutting off one S1 domain from the C-terminus or two S1 domains from the N-terminus of the protein reduces only the efficiency of its functions, but not its functionality¹⁴,⁴¹.

As mentioned above, Proteobacteria accounts for 55% of all S1 sequences (Figure 1b). Within this phylum, the phylogenetic classes are represented by different numbers of S1 domains. Thus, S1 proteins of Acidithiobacillia and Epsilonproteobacteria have six S1 domains, and those of Alphaproteobacteria consist of five or six structural S1 domains (Figure 4)²⁹,⁵². The Oligoflexia class is characterized by the presence of four or six S1 domains; for Beta-proteobacteria, Gamma-proteobacteria, and Deltaproteobacteria, the number of S1 domains ranges from one to six. Note that Epsilonproteobacteria is considered to be the oldest class in this taxon⁵³. However, the Acidithiobacillales class was previously classified as part of Gamma-proteobacteria⁵⁴. Our data also support the separation of this class into a separate one, given its constant number of structural S1 domains (Figure 4).

Beta-proteobacteria are most closely related to Gamma-proteobacteria, and together they make up a taxon called Chromatibacteria⁵⁵,⁵⁶. Phylogenetic analyses of various proteins suggest that Alphaproteobacteria branched out later than most other phyla of Bacteria, along with Beta-proteobacteria and Gamma-proteobacteria⁵⁵,⁵⁶. At the same time, Alphaproteobacteria have five or six S1 domains, while Beta-proteobacteria and Gamma-proteobacteria, like Deltaproteobacteria, have different numbers of S1 domains. According to our data, these classes have the greatest diversity in the number of S1 domains within Proteobacteria (in addition to the Actinobacteria, Bacteroidetes, and Firmicutes phyla) in comparison with other phyla, where this number is constant or changes insignificantly.

The relationship of the phylum Aquificae to the Epsilonproteobacteria is supported by the specific conserved indel signature in inorganic pyrophosphatase, which is uniquely found in the species of these two phyla⁵⁷. In ref. 58, the authors also suggested that Aquificae are closely related to Proteobacteria. This closeness is due to frequent horizontal gene transfer due to common ecological niches. According to our data, bacteria from the phylum Aquificae and the class Epsilonproteobacteria have strictly six S1 domains.

The evolutionary development of representatives of the phylum Acidobacteria is often considered to be associated with that of bacteria belonging to the class Alphaproteobacteria⁵⁹,⁶⁰. Bacteria of the class Alphaproteobacteria have been associated with a copiotrophic lifestyle⁶¹. According to our data, representatives of both of these groups have six S1 domains.

The independent development of such phyla as Caldiserica, Deferribacteres, Fusobacteria, Nitrospirae/Tectomicrobia, and Spirochaetes is apparently reflected in the constant number of structural S1 domains in these bacteria. Moreover, the phylum Spirochaetes is considered in the literature to be a phylogenetically ancient and distinct group of microorganisms⁶². This phylum contains six S1 domains (Figure 4).

As mentioned above, the phyla Planctomycetes, Verrucomicrobia, and Chlamydiae are grouped in the PVC clan on the basis of the analysis of 16S rRNA and of characteristic conserved indels in some proteins²⁸. According to some published data, the genome of most organisms of the PVC superphylum is susceptible to evolutionary changes⁶³. The phyla Chlamydiae and Verrucomicrobia are considered evolutionarily closer to each other than to the other phyla of the PVC superphylum⁶⁴. Chlamydiae and Verrucomicrobia generally contain six S1 domains, while Planctomycetes have four, five, and six S1 domains (Figure 4).

The FCB group is a superphylum of bacteria named after the main member phyla Fibrobacteres, Chlorobi, and Bacteroidetes. Some authors also include the phyla Gemmatimonadetes and Ignavibacteriae in this group²⁷. It should be noted that these phyla are often at the same level on phylogenetic trees, while the Fibrobacteres phylum is considered a phylogenetically more ancient group. Our data show that the ribosomal S1 protein in this group almost always contains six S1 domains (a constant number of six domains for the phyla Fibrobacteres, Chlorobi, Ignavibacteriae, and Gemmatimonadetes, and for the class Bacteroidia within the phylum Bacteroidetes). The class Cytophagia of the phylum Bacteroidetes shows a variable number of S1 domains, including one and four (Figure 4). The phylum Bacteroidetes, along with Proteobacteria, Firmicutes, and Actinobacteria, is also among the most common bacterial groups in the rhizosphere⁶⁵. They have been found in soil samples from various locations, including cultivated fields, greenhouse soils, and unexploited areas⁶⁶. Note that for these phyla, the number of structural S1 domains can vary from one to six (Figure 4).

Terrabacteria are a supergroup containing the Actinobacteria, Cyanobacteria, Chloroflexi, Firmicutes, and Tenericutes phyla, as well as Deinococcus-Thermus²⁹,⁵². It is widely accepted that oxygenic photosynthesis developed in ancient lineages of Cyanobacteria⁶⁷, but very little is known about the nature and evolutionary history of anoxygenic phototrophy, and much of the understanding is based on assumptions and hypotheses derived from the few existing bacterial taxa in which this metabolism occurs. However, a number of studies have argued that one of the earliest forms of anoxygenic photosynthesis arose in the phylum Chloroflexi during the Archean Eon, before the invention of oxygenic photosynthesis in the phylum Cyanobacteria³²,⁶⁸,⁶⁹. Our data revealed three S1 domains in the phylum Cyanobacteria and four S1 domains in the phylum Chloroflexi. Note that Actinobacteria and Chloroflexi predominantly have four S1 domains.

According to another version, the phyla Actinobacteria and Chloroflexi are evolutionarily closer to each other⁷⁰,⁷¹; meanwhile, according to refs. 32 and 70, the phylum Deinococcus-Thermus (five S1 domains) is evolutionarily closer to the phylum Chloroflexi. The phylum Firmicutes, evolutionarily close to the phyla Actinobacteria, Cyanobacteria, Chloroflexi, and Deinococcus-Thermus, is more ancient than other phyla in the supergroup Terrabacteria; according to our data, it also predominantly has four S1 domains.

Note also that the bacterial 30S ribosomal protein S1 from the parasitic bacteria Mollicutes (phylum Tenericutes) effectively performs the basic functions of RNA binding⁴⁰. There is an assumption in the literature that mycoplasmas (Mollicutes) are a regressive branch of the evolution of some Gram-positive bacteria⁷². This hypothesis was confirmed experimentally and is considered in two possible variants: all mycoplasmas originate either from a common ancestor with Gram-positive bacteria, or from different Gram-positive bacteria⁷². Based on a comparison of the 16S rRNA oligonucleotide sequences of several species of mycoplasmas and Gram-positive bacteria from the genera Clostridium, Bacillus, Lactobacillus, and Streptococcus, a reasonable assumption was made about their evolutionary relationship with the phylum Firmicutes⁷³,⁷⁴. A more detailed analysis of 16S RNA sequences showed that mycoplasmas are phylogenetically closest to clostridia. In turn, the most likely ancestors of clostridia (Firmicutes) are Gram-positive bacteria with a low G+C content in their DNA⁷⁵. According to our data, the 30S ribosomal S1 protein from the phylum Tenericutes has one S1 domain.

Summarizing all the above, it can be argued that, firstly, the number of structural S1 domains in different phyla may coincide during the symbiotic life of microorganisms, and secondly, the greater the number of structural domains (basically six), the greater the likelihood that the phylum is phylogenetically more ancient. Moreover, the decrease in the number of structural S1 domains is related to the phylogenetic ranking of the bacteria.

Conclusions

Here we have collected and analyzed the percentage identity, the amino acid composition and logo motifs, and the dN/dS ratio in the largest and functionally important multidomain ribosomal protein. The collected and structured data were integrated into the server, which can be accessed at http://oka.protres.ru:4200. In this comparative analysis of 1333 S1 protein sequences that have been identified in 24 different phyla, we demonstrate how a change in the number of structural domains in the course of evolution, the presence of a conserved RNA binding site in each individual S1 domain, the specificity of the amino acid composition of the structural RNA-binding domain, and the ratio of non-synonymous to synonymous substitutions can be formed independently or dependently in bacterial phyla during evolution. Based on the obtained data, the study of the evolutionary relationships for the bacterial phyla suggests that the bacterial ribosomal S1 proteins evolved from multirepeat assemblies to a single repeat. At the same time, it is more likely that the terminal S1 domains are "cut off", while the more conservative central ones are preserved.

Acknowledgements

This research was funded by the Russian Science Foundation, grant number 18-14-00321.
References

1. Yutin N, Puigbò P, Koonin EV, Wolf YI. Phylogenomics of prokaryotic ribosomal proteins. Lespinet O, ed. PLoS One. 2012;7(5):e36972. doi:10.1371/journal.pone.0036972
2. Pilla SP, Bahadur RP. Residue conservation elucidates the evolution of r-proteins in ribosomal assembly and function. Int J Biol Macromol. 2019;140:323-329. doi:10.1016/j.ijbiomac.2019.08.127
3. Roberts E, Sethi A, Montoya J, Woese CR, Luthey-Schulten Z. Molecular signatures of ribosomal evolution. Proc Natl Acad Sci U S A. 2008;105(37):13953-13958. doi:10.1073/pnas.0804861105
4. Lecompte O, Ripp R, Thierry J-C, Moras D, Poch O. Comparative analysis of ribosomal proteins in complete genomes: an example of reductive evolution at the domain scale. Nucleic Acids Res. 2002;30(24):5382-5390. doi:10.1093/nar/gkf693
5. Ramulu HG, Groussin M, Talla E, Planel R, Daubin V, Brochier-Armanet C. Ribosomal proteins: Toward a next generation standard for prokaryotic systematics? Mol Phylogenet Evol. 2014;75(1):103-117. doi:10.1016/j.ympev.2014.02.013
6. Machulin A, Deryusheva E, Lobanov M, Galzitskaya O. Repeats in S1 proteins: flexibility and tendency for intrinsic disorder. Int J Mol Sci. 2019;20(10):2377. doi:10.3390/ijms20102377
7. Deryusheva EI, Machulin AV, Matyunin MA, Galzitskaya OV. Investigation of the relationship between the S1 domain and its molecular functions derived from studies of the tertiary structure. Molecules. 2019;24(20):3681. doi:10.3390/molecules24203681
8. Deryusheva EI, Machulin AV, Selivanova OM, Galzitskaya OV. Taxonomic distribution, repeats, and functions of the S1 domain-containing proteins as members of the OB-fold family. Proteins Struct Funct Bioinforma. 2017;85(4):602-613. doi:10.1002/prot.25237
9. Machulin AV, Deryusheva EI, Selivanova OM, Galzitskaya OV. The number of domains in the ribosomal protein S1 as a hallmark of the phylogenetic grouping of bacteria. PLoS One. 2019;14(8):e0221370. doi:10.1371/journal.pone.0221370
10. Sørensen MA, Fricke J, Pedersen S. Ribosomal protein S1 is required for translation of most, if not all, natural mRNAs in Escherichia coli in vivo. J Mol Biol. 1998;280(4):561-569. doi:10.1006/jmbi.1998.1909
11. Okada T, Wower IK, Wower J, Zwieb CW, Kimura M. Contribution of the second OB fold of ribosomal protein S1 from Escherichia coli to the recognition of tmRNA. Biosci Biotechnol Biochem. 2004;68(11):2319-2325. doi:10.1271/bbb.68.2319
12. Skouv J, Schnier J, Rasmussen MD, Subramanian AR, Pedersen S. Ribosomal protein S1 of Escherichia coli is the effector for the regulation of its own synthesis. J Biol Chem. 1990;265(28):17044-17049. http://www.ncbi.nlm.nih.gov/pubmed/2120211. Accessed January 19, 2017.
13. Subramanian AR. Structure and functions of ribosomal protein S1. Prog Nucleic Acid Res Mol Biol. 1983;28:101-142. http://www.ncbi.nlm.nih.gov/pubmed/6348874. Accessed November 2, 2012.
14. Boni IV, Artamonova VS, Dreyfus M. The last RNA-binding repeat of the Escherichia coli ribosomal protein S1 is specifically involved in autogenous control. J Bacteriol. 2000;182(20):5872-5879. doi:10.1128/JB.182.20.5872-5879.2000
15. Guerrier-Takada C, Subramanian AR, Cole PE. The activity of discrete fragments of ribosomal protein S1 in Q beta replicase function. J Biol Chem. 1983;258(22):13649-13652. http://www.ncbi.nlm.nih.gov/pubmed/6358207
16. Bisaglia M, Laalami S, Uzan M, Bontems F. Activation of the RegB endoribonuclease by the S1 ribosomal protein is due to cooperation between the S1 four C-terminal modules in a substrate-dependant manner. J Biol Chem. 2003;278(17):15261-15271. doi:10.1074/jbc.M212731200
17. Salah P, Bisaglia M, Aliprandi P, Uzan M, Sizun C, Bontems F. Probing the relationship between gram-negative and gram-positive S1 proteins by sequence analysis. Nucleic Acids Res. 2009;37(16):5578-5588. doi:10.1093/nar/gkp547
18. Deryusheva EI, Selivanova OM, Serdyuk IN. Loops and repeats in proteins as footprints of molecular evolution. Biochemistry (Mosc). 2012;77(13):1487-1499. doi:10.1134/S000629791213007X
19. Duan H, Liu G, Wang X, et al. Evaluation of the ribosomal protein S1 gene (rpsA) as a novel biomarker for Mycobacterium species identification. Biomed Res Int. 2015;2015:271728. doi:10.1155/2015/271728
20. Cock PJA, Antao T, Chang JT, et al. Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422-1423. doi:10.1093/bioinformatics/btp163
21. Mukherjee S, Stamatis D, Bertsch J, et al. Genomes OnLine database (GOLD) v.7: Updates and new features. Nucleic Acids Res. 2019;47(D1):D649-D659. doi:10.1093/nar/gky977
22. Letunic I, Bork P. 20 years of the SMART protein domain annotation resource. Nucleic Acids Res. 2018;46(D1):D493-D496. doi:10.1093/nar/gkx922
23. Bateman A, Martin MJ, O'Donovan C, et al. UniProt: A hub for protein information. Nucleic Acids Res. 2015;43(D1):D204-D212. doi:10.1093/nar/gku989
24. Crooks GE, Hon G, Chandonia J-M, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14(6):1188-1190. doi:10.1101/gr.849004
25. Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994;11(5):725-736. doi:10.1093/oxfordjournals.molbev.a040153
26. Ludwig W, Klenk H-P. Overview: A Phylogenetic Backbone and Taxonomic Framework for Procaryotic Systematics. In: Bergey's Manual® of Systematic Bacteriology. Springer New York; 2001:49-65. doi:10.1007/978-0-387-21609-6
27. Gupta RS. The phylogeny and signature sequences characteristics of Fibrobacteres, Chlorobi, and Bacteroidetes. Crit Rev Microbiol. 2004;30(2):123-143. doi:10.1080/10408410490435133
28. Gupta. Molecular signatures for the PVC clade (Planctomycetes, Verrucomicrobia, Chlamydiae, and Lentisphaerae) of bacteria provide insights into their evolutionary relationships. Front Microbiol. 2012;3:327. doi:10.3389/fmicb.2012.00327
29. Battistuzzi FU, Feijao A, Hedges SB. A genomic timescale of prokaryote evolution: insights into the origin of methanogenesis, phototrophy, and the colonization of land. BMC Evol Biol. 2004;4:44. doi:10.1186/1471-2148-4-44
30. Sekiguchi Y, Ohashi A, Parks DH, Yamauchi T, Tyson GW, Hugenholtz P. First genomic insights into members of a candidate bacterial phylum responsible for wastewater bulking. PeerJ. 2015;3(1):e740. doi:10.7717/peerj.740
31. Yarza P, Yilmaz P, Pruesse E, et al. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Microbiol. 2014;12(9):635-645. doi:10.1038/nrmicro3330
32. Hug LA, Baker BJ, Anantharaman K, et al. A new view of the tree of life. Nat Microbiol. 2016;1(5):16048. doi:10.1038/nmicrobiol.2016.48
33. Nikolaki S, Tsiamis G. Microbial diversity in the era of omic technologies. Biomed Res Int. 2013;2013:958719. doi:10.1155/2013/958719
34. Hugenholtz P. Exploring prokaryotic diversity in the genomic era. Genome Biol. 2002;3(2):REVIEWS0003. doi:10.1186/gb-2002-3-2-reviews0003
35. Bertani I, Abbruscato P, Piffanelli P, Subramoni S, Venturi V. Rice bacterial endophytes: Isolation of a collection, identification of beneficial strains and microbiome analysis. Environ Microbiol Rep. 2016;8(3):388-398. doi:10.1111/1758-2229.12403
36. Kolton M, Sela N, Elad Y, Cytryn E. Comparative genomic analysis indicates that niche adaptation of terrestrial Flavobacteria is strongly linked to plant glycan metabolism. Robinson DA, ed. PLoS One. 2013;8(9):e76704. doi:10.1371/journal.pone.0076704
37. Hartman K, van der Heijden MGA, Roussely-Provent V, Walser J-C, Schlaeppi K. Deciphering composition and function of the root microbiome of a legume plant. Microbiome. 2017;5(1):2. doi:10.1186/s40168-016-0220-z
38. Grishin SY, Deryusheva EI, Machulin AV, et al. Amyloidogenic Propensities of Ribosomal S1 Proteins: Bioinformatics Screening and Experimental Checking. Int J Mol Sci. 2020;21(15):5199. doi:10.3390/ijms21155199
39. Agrawal V, Kishan KVR. OB-fold: growing bigger with functional consistency. Curr Protein Pept Sci. 2003;4(3):195-206. doi:10.2174/1389203033487207
40. Sirand-Pugnet P, Lartigue C, Marenda M, et al. Being pathogenic, plastic, and sexual while living with a nearly minimal bacterial genome. PLoS Genet. 2007;3(5):744-758. doi:10.1371/journal.pgen.0030075
41. Amblar M, Barbas A, Gomez-Puertas P, Arraiano CM. The role of the S1 domain in exoribonucleolytic activity: substrate specificity and multimerization. RNA. 2007;13(3):317-327. doi:10.1261/rna.220407
42. Jeffares DC, Tomiczek B, Sojo V, dos Reis M. A beginners guide to estimating the non-synonymous to synonymous rate ratio of all protein-coding genes in a genome. Methods Mol Biol. 2015;1201:65-90. doi:10.1007/978-1-4939-1438-8
43. Kosiol C, Vinar T, da Fonseca RR, et al. Patterns of positive selection in six Mammalian genomes. Schierup MH, ed. PLoS Genet. 2008;4(8):e1000144. doi:10.1371/journal.pgen.1000144
44. Yang Z, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends Ecol Evol. 2000;15(12):496-503. doi:10.1016/S0169-5347(00)01994-7
45. Swanson WJ, Yang Z, Wolfner MF, Aquadro CF. Positive Darwinian selection drives the evolution of several female reproductive proteins in mammals. Proc Natl Acad Sci U S A. 2001;98(5):2509-2514. doi:10.1073/pnas.051605998
46. Anisimova M, Bielawski JP, Yang Z. Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol Evol. 2001;18(8):1585-1592. doi:10.1093/oxfordjournals.molbev.a003945
47. Bycroft M, Hubbard TJ, Proctor M, Freund SM, Murzin AG. The solution structure of the S1 RNA binding domain: a member of an ancient nucleic acid-binding fold. Cell. 1997;88(2):235-242. doi:10.1016/S0092-8674(00)81844-9
48. Bhattacharjee N, Biswas P. Position-specific propensities of amino acids in the β-strand. BMC Struct Biol. 2010;10(1):29. doi:10.1186/1472-6807-10-29
49. Richardson JS, Richardson DC. Natural β-sheet proteins use negative design to avoid edge-to-edge aggregation. Proc Natl Acad Sci U S A. 2002;99(5):2754-2759. doi:10.1073/pnas.052706099
50. Andrade MA, Perez-Iratxeta C, Ponting CP. Protein repeats: structures, functions, and evolution. J Struct Biol. 2001;134(2-3):117-131. doi:10.1006/jsbi.2001.4392
51. Ponting CP, Russell RB. Identification of distant homologues of fibroblast growth factors suggests a common ancestor for all beta-trefoil proteins. J Mol Biol. 2000;302(5):1041-1047. doi:10.1006/jmbi.2000.4087
52. Battistuzzi FU, Hedges SB. A major clade of prokaryotes with ancient adaptations to life on land. Mol Biol Evol. 2009;26(2):335-343. doi:10.1093/molbev/msn247
53. Robertson LA, Kuenen JG. The Prokaryotes. Vol 5 (Dworkin M, Falkow S, Rosenberg E, Schleifer K-H, Stackebrandt E, eds.). Springer New York; 2006. doi:10.1007/0-387-30745-1
54. Williams KP, Kelly DP. Proposal for a new class within the phylum Proteobacteria, Acidithiobacillia classis nov., with the type order Acidithiobacillales, and emended description of the class Gammaproteobacteria. Int J Syst Evol Microbiol. 2013;63(PART8):2901-2906. doi:10.1099/ijs.0.049270-0
55. Gupta RS. The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes. FEMS Microbiol Rev. 2000;24(4):367-402. doi:10.1111/j.1574-6976.2000.tb00547.x
56. Gupta RS, Sneath PHA. Application of the character compatibility approach to generalized molecular sequence data: Branching order of the proteobacterial subdivisions. J Mol Evol. 2007;64(1):90-100. doi:10.1007/s00239-006-0082-2
57. Griffiths E, Gupta RS. Signature sequences in diverse proteins provide evidence for the late divergence of the Order Aquificales. Int Microbiol. 2004;7(1):41-52. doi:10.2436/im.v7i1.9443
58. Boussau B, Guéguen L, Gouy M. Accounting for horizontal gene transfers explains conflicting hypotheses regarding the position of aquificales in the phylogeny of Bacteria. BMC Evol Biol. 2008;8(1):272. doi:10.1186/1471-2148-8-272
59. Kielak AM, Barreto CC, Kowalchuk GA, van Veen JA, Kuramae EE. The Ecology of Acidobacteria: Moving beyond Genes and Genomes. Front Microbiol. 2016;7(MAY):744. doi:10.3389/fmicb.2016.00744
60. Spring S, Schulze R, Overmann J, Schleifer K-H. Identification and characterization of ecologically significant prokaryotes in the sediment of freshwater lakes: molecular and cultivation studies. FEMS Microbiol Rev. 2000;24(5):573-590. doi:10.1111/j.1574-6976.2000.tb00559.x
61. Smit E, Leeflang P, Gommans S, van den Broek J, van Mil S, Wernars K. Diversity and seasonal fluctuations of the dominant members of the bacterial soil community in a wheat field as determined by cultivation and molecular methods. Appl Environ Microbiol. 2001;67(5):2284-2291. doi:10.1128/AEM.67.5.2284-2291.2001
62. Olson CM, Fikrig E, Anguita J. Host defenses to spirochetes. In: Clinical Immunology: Principles and Practice: Fourth Edition. Elsevier Inc.; 2013:338-345. doi:10.1016/B978-0-7234-3691-1.00016-7
63. Kamneva OK, Knight SJ, Liberles DA, Ward NL. Analysis of genome content evolution in pvc bacterial super-phylum: assessment of candidate genes associated with cellular organization and lifestyle. Genome Biol Evol. 2012;4(12):1375-1390. doi:10.1093/gbe/evs113
64. Cho JC, Vergin KL, Morris RM, Giovannoni SJ. Lentisphaera araneosa gen. nov., sp. nov, a transparent exopolymer producing marine bacterium, and the description of a novel bacterial phylum, Lentisphaerae. Environ Microbiol. 2004;6(6):611-621. doi:10.1111/j.1462-2920.2004.00614.x
65. Mendes R, Garbeva P, Raaijmakers JM. The rhizosphere microbiome: significance of plant beneficial, plant pathogenic, and human pathogenic microorganisms. FEMS Microbiol Rev. 2013;37(5):634-663. doi:10.1111/1574-6976.12028
66. Thomas F, Hehemann J-H, Rebuffet E, Czjzek M, Michel G. Environmental and gut bacteroidetes: the food connection. Front Microbiol. 2011;2(MAY):93. doi:10.3389/fmicb.2011.00093
67. Shih PM, Hemp J, Ward LM, Matzke NJ, Fischer WW. Crown group Oxyphotobacteria postdate the rise of oxygen. Geobiology. 2017;15(1):19-29. doi:10.1111/gbi.12200
68. Xiong J. Molecular Evidence for the Early Evolution of Photosynthesis. Science. 2000;289(5485):1724-1730. doi:10.1126/science.289.5485.1724
69. Blankenship RE. Origin and early evolution of photosynthesis. Photosynth Res. 1992;33(2):91-111. doi:10.1007/BF00039173
Figure legends

Figure 1. Schematic representative of the bacterial phyla, containing different number of structural S1 domains. Pie charts showing the distribution of bacterial S1 proteins: S1 protein phylum-level sequences distribution and distribution in the number of structural S1 domains. Phylogenetic distribution of bacterial genome projects in progress of 2020 (https://gold.jgi.doe.gov/statistics).

Figure 2. PID analysis of four-domain containing S1 proteins for bacterial phyla. PID analysis for five-domain containing S1 proteins within the phylum Deinococcus-Thermus. UniProt ID and amino acid sequence for five-domain containing S1 protein from Thermus thermophilus.

Figure 3. Logo motif representation for S1 proteins, containing two domains. The conserved residues F14, F17, R66 in the first domain and F17 and H29 in the second domain correspond to the RNA binding site. Logo profiles are generated by the WebLogo 3 server. The positions are given taking into account the shift due to the termini of the structural domains.

Figure 4. Images of the Logo motif for the third and fourth domains of S1 proteins containing six domains. The conserved residues F17, L20, H34, N68 and R70 correspond to the RNA binding site.

Table 1. Features of phylogenetic distribution of ribosomal S1 proteins.
Columns: No.; Gram staining method; Supergroups; Phylum; Class; Number of S1 domains; Ratio of sequence number from dataset, %.

Supergroup        Phylum                      Class
Terrabacteria     Actinobacteria              Actinobacteria, Coriobacteriia
Terrabacteria     Chloroflexi                 Caldilineae
Terrabacteria     Cyanobacteria               Gloeobacteria
Terrabacteria     Deinococcus-Thermus         Deinococci
Terrabacteria     Firmicutes                  Bacilli, Clostridia
Terrabacteria     Tenericutes                 Mollicutes
                  Acidobacteria               Acidobacteriia
                  Aquificae                   Aquificae
                  Caldiserica                 Caldisericia
FCB               Bacteroidetes               Bacteroidia, Cytophagia, Flavobacteriia, Sphingobacteriia
FCB               Chlorobi                    Chlorobia
FCB               Fibrobacteres               Fibrobacteria
FCB               Gemmatimonadetes            Gemmatimonadetes
FCB               Ignavibacteriae             Ignavibacteria
                  Deferribacteres             Deferribacteres
                  Fusobacteria                Fusobacteriia
                  Nitrospinae/Tectomicrobia   Nitrospinia
                  Nitrospirae                 Nitrospira
PVC               Chlamydiae                  Chlamydiia
PVC               Planctomycetes              Planctomycetia, Phycisphaerae
PVC               Verrucomicrobia             Methylacidiphilae
Proteobacteria    Proteobacteria              Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Acidithiobacillia, Oligoflexia
                  Spirochaetes                Spirochaetia
                  Synergistetes               Synergistia

Total amount of records: 1333

Table 2. Content of amino acid residues (%) in sequences of S1 ribosomal proteins by the number of structural domains.

Amino acid     Number of structural S1 domains
letter code    one     two      three    four     five     six
E              9.6     10.17    10.66    11.16    10.10    10.14
K              9.4     7.517    4.799    6.665    7.811    8.131
V              9.1     11.82    8.232    11.90    10.46    10.88
L              8.7     9.618    9.427    8.171    8.465    7.989
I              7.3     6.135    6.005    5.610    6.063    6.686
N              6.6     3.758    3.306    3.097    3.233    4.124
A              6.3     6.467    7.865    7.390    5.992    6.742
G              6.0     7.407    7.681    7.775    9.237    8.526
F              5.9     3.095    3.766    3.009    2.865    3.192
D              5.1     6.135    6.211    7.518    5.992    7.247
S              4.5     6.412    6.188    6.138    5.397    5.680
T              3.9     4.864    4.788    4.274    4.422    4.346
R              3.6     4.367    6.590    5.285    5.873    5.031
P              3.5     3.316    4.282    3.054    4.553    2.380
Q              3.0     2.653    3.995    3.659    3.863    3.020
M              2.3     1.824    2.353    0.987    1.486    1.524
Y              2.0     1.990    1.492    1.532    1.414    1.330
H              1.9     1.271    1.756    1.834    1.462    1.585
W              0.7     0.773    0.367    0.907    1.010    1.124
C              0.5     0.386    0.218    0.020    0.297    0.301